TransWikia.com

The missForest function in R doesn't work

Data Science Asked on July 12, 2021

I’m trying to use the missForest() function from the ‘missForest’ package, but I always get the same error message.

This is the code:

libraries:

library(dplyr)
library(naniar)
library(missForest)

data:

url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data'
crx <- read.csv(url, sep = ",", header = F)

Then I replace every "?" with a missing value (NA):

crx <- crx %>% replace_with_na_all(condition = ~.x == "?")

And then I apply missForest to impute the missing values:

crx <- missForest(crx)

And I get the following error message:

Error: Assigned data `mean(xmis[, t.co], na.rm = TRUE)` must be compatible with existing data.
ℹ Error occurred for column `V1`.
x Can't convert <double> to <character>.
Run `rlang::last_error()` to see where the error occurred.
20. stop(fallback)
19. signal_abort(cnd)
18. cnd_signal(error_assign_incompatible_type(x, value, j, value_arg, cnd_message(cnd)))
17. (function (cnd) { cnd_signal(error_assign_incompatible_type(x, value, j, value_arg, cnd_message(cnd))) ...
16. signalCondition(cnd)
15. signal_abort(cnd)
14. abort(message, class = c(class, "vctrs_error"), ...)
13. stop_vctrs(message, class = c(class, "vctrs_error_incompatible"), x = x, y = y, details = details, ...)
12. stop_incompatible(x, y, x_arg = x_arg, y_arg = y_arg, details = details, ..., message = message, class = c(class, "vctrs_error_incompatible_type"))
11. stop_incompatible_type(x = x, y = to, ..., x_arg = x_arg, y_arg = to_arg, action = "convert", details = details, message = message, class = class)
10. stop_incompatible_cast(x, to, x_arg = x_arg, to_arg = to_arg, `vctrs:::from_dispatch` = match_from_dispatch(...))
9. vec_default_cast(x = x, to = to, x_arg = x_arg, to_arg = to_arg, `vctrs:::from_dispatch` = `vctrs:::from_dispatch`, `vctrs:::df_fallback` = `vctrs:::df_fallback`, `vctrs:::s3_fallback` = `vctrs:::s3_fallback`)
8. (function () vec_default_cast(x = x, to = to, x_arg = x_arg, to_arg = to_arg, `vctrs:::from_dispatch` = `vctrs:::from_dispatch`, `vctrs:::df_fallback` = `vctrs:::df_fallback`, `vctrs:::s3_fallback` = `vctrs:::s3_fallback`))()
7. `vec_slice<-`(`*tmp*`, i, value = value[[j]])
6. withCallingHandlers(for (j in seq_along(x)) { xj <- x[[j]] vec_slice(xj, i) <- value[[j]] x[[j]] <- xj ...
5. tbl_subassign_row(xj, i, value, value_arg)
4. tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
3. `[<-.tbl_df`(`*tmp*`, is.na(xmis[, t.co]), t.co, value = NA_real_)
2. `[<-`(`*tmp*`, is.na(xmis[, t.co]), t.co, value = NA_real_)
1. missForest(crx)
tibble [690 × 16] (S3: tbl_df/tbl/data.frame)
 $ V1 : chr [1:690] "b" "a" "a" "b" ...
 $ V2 : chr [1:690] "30.83" "58.67" "24.50" "27.83" ...
 $ V3 : num [1:690] 0 4.46 0.5 1.54 5.62 ...
 $ V4 : chr [1:690] "u" "u" "u" "u" ...
 $ V5 : chr [1:690] "g" "g" "g" "g" ...
 $ V6 : chr [1:690] "w" "q" "q" "w" ...
 $ V7 : chr [1:690] "v" "h" "h" "v" ...
 $ V8 : num [1:690] 1.25 3.04 1.5 3.75 1.71 ...
 $ V9 : chr [1:690] "t" "t" "t" "t" ...
 $ V10: chr [1:690] "t" "t" "f" "t" ...
 $ V11: int [1:690] 1 6 0 5 0 0 0 0 0 0 ...
 $ V12: chr [1:690] "f" "f" "f" "t" ...
 $ V13: chr [1:690] "g" "g" "g" "g" ...
 $ V14: chr [1:690] "00202" "00043" "00280" "00100" ...
 $ V15: int [1:690] 0 560 824 3 0 0 31285 1349 314 1442 ...
 $ V16: chr [1:690] "+" "+" "+" "+" ...

I read on Stack Overflow that a possible fix is to convert the data to a plain data frame with as.data.frame(), but that didn’t work either; it returns this other error message:

argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
  missForest iteration 1 in progress...done!
Error in FUN(left, right) : non-numeric argument to binary operator

One Answer

First, you don't need the naniar package for a task as simple as replacing values. Base R can do it directly.

After loading the data, replace the ? values like this:

> crx = as.data.frame(crx) 
> crx[crx == '?'] <- NA     # Replace ? values with NA
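As an alternative sketch, the "?" markers can be turned into NA while the file is being read, using the na.strings argument of read.csv(). The example below uses a few made-up rows in the same shape as crx.data instead of downloading the file, and it assumes the "?" markers carry no surrounding whitespace. stringsAsFactors = TRUE is set explicitly so that text columns arrive as factors regardless of the R version.

```r
# Four illustrative rows (not the real data), with "?" marking missing values
raw <- "b,30.83,0,u\na,?,4.46,u\n?,24.50,0.5,y\nb,27.83,1.54,?"

crx_toy <- read.csv(
  text = raw,
  header = FALSE,
  na.strings = "?",         # treat "?" as NA while parsing
  stringsAsFactors = TRUE   # text columns become factors
)

str(crx_toy)
colSums(is.na(crx_toy))     # number of missing values per column
```

Because "?" never enters the data as a value, a column such as V2 in this toy sample is parsed as numeric rather than as text with a "?" level.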

summary(crx) then gives output like this:

> summary(crx)

    V1            V2            V3            V4         V5            V6            V7            V8         V9     
 ?   :  0   22.67  :  9   Min.   : 0.000   ?   :  0   ?   :  0   c      :137   v      :399   Min.   : 0.000   f:329  
 a   :210   20.42  :  7   1st Qu.: 1.000   l   :  2   g   :519   q      : 78   h      :138   1st Qu.: 0.165   t:361  
 b   :468   18.83  :  6   Median : 2.750   u   :519   gg  :  2   w      : 64   bb     : 59   Median : 1.000          
 NA's: 12   19.17  :  6   Mean   : 4.759   y   :163   p   :163   i      : 59   ff     : 57   Mean   : 2.223          
            20.67  :  6   3rd Qu.: 7.207   NA's:  6   NA's:  6   aa     : 54   j      :  8   3rd Qu.: 2.625          
            (Other):644   Max.   :28.000                         (Other):289   (Other): 20   Max.   :28.500          
            NA's   : 12                                          NA's   :  9   NA's   :  9                           
 V10          V11       V12     V13          V14           V15           V16    
 f:395   Min.   : 0.0   f:374   g:625   00000  :132   Min.   :     0.0   -:383  
 t:295   1st Qu.: 0.0   t:316   p:  8   00120  : 35   1st Qu.:     0.0   +:307  
         Median : 0.0           s: 57   00200  : 35   Median :     5.0          
         Mean   : 2.4                   00160  : 34   Mean   :  1017.4          
         3rd Qu.: 3.0                   00080  : 30   3rd Qu.:   395.5          
         Max.   :67.0                   (Other):411   Max.   :100000.0          
                                        NA's   : 13                             

Run str() to inspect each variable's type and levels:

> str(crx)
'data.frame':   690 obs. of  16 variables:
 $ V1 : Factor w/ 3 levels "?","a","b": 3 2 2 3 3 3 3 2 3 3 ...
 $ V2 : Factor w/ 350 levels "?","13.75","15.17",..: 158 330 91 127 45 170 181 76 312 257 ...
 $ V3 : num  0 4.46 0.5 1.54 5.62 ...
 $ V4 : Factor w/ 4 levels "?","l","u","y": 3 3 3 3 3 3 3 3 4 4 ...
 $ V5 : Factor w/ 4 levels "?","g","gg","p": 2 2 2 2 2 2 2 2 4 4 ...
 $ V6 : Factor w/ 15 levels "?","aa","c","cc",..: 14 12 12 14 14 11 13 4 10 14 ...
 $ V7 : Factor w/ 10 levels "?","bb","dd",..: 9 5 5 9 9 9 5 9 5 9 ...
 $ V8 : num  1.25 3.04 1.5 3.75 1.71 ...
 $ V9 : Factor w/ 2 levels "f","t": 2 2 2 2 2 2 2 2 2 2 ...
 $ V10: Factor w/ 2 levels "f","t": 2 2 1 2 1 1 1 1 1 1 ...
 $ V11: int  1 6 0 5 0 0 0 0 0 0 ...
 $ V12: Factor w/ 2 levels "f","t": 1 1 1 2 1 2 2 1 1 2 ...
 $ V13: Factor w/ 3 levels "g","p","s": 1 1 1 1 3 1 1 1 1 1 ...
 $ V14: Factor w/ 171 levels "?","00000","00017",..: 70 13 98 33 39 117 56 25 64 17 ...
 $ V15: int  0 560 824 3 0 0 31285 1349 314 1442 ...
 $ V16: Factor w/ 2 levels "-","+": 2 2 2 2 2 2 2 2 2 2 ...
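Note that the text columns in this output are factors. On R 4.0 or later, read.csv() defaults to stringsAsFactors = FALSE, so the same columns arrive as character vectors, as in the question's own str output. The "argument is not numeric or logical" warnings quoted in the question appear to come from mean() being applied to such character columns. A small sketch with made-up rows, showing the difference and one way to convert:

```r
# Same three-row sample read two ways; only stringsAsFactors differs
raw <- "b,30.83\na,58.67\na,24.50"
as_chr <- read.csv(text = raw, header = FALSE, stringsAsFactors = FALSE)
as_fct <- read.csv(text = raw, header = FALSE, stringsAsFactors = TRUE)

class(as_chr$V1)  # "character"
class(as_fct$V1)  # "factor"

# mean() on a character column warns and returns NA
res <- withCallingHandlers(
  mean(as_chr$V1, na.rm = TRUE),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
is.na(res)        # TRUE

# Convert every character column to a factor, leaving other columns alone
as_chr[] <- lapply(as_chr, function(col) {
  if (is.character(col)) factor(col) else col
})
str(as_chr)
```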

In the str() output, columns V2 and V14 are factors with more than 53 levels (categories). The randomForest package, which missForest uses internally, cannot handle categorical predictors with more than 53 categories, so these two columns can't be passed to missForest as they are. If you do, you will get an error like this:

Error in randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, : Can not handle categorical predictors with more than 53 categories.
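Columns over that limit can also be found programmatically rather than by reading the str() output. A minimal sketch on a made-up data frame (the column names few, many and num are illustrative only):

```r
set.seed(1)

# Toy data: one small factor, one high-cardinality factor, one numeric column
toy <- data.frame(
  few  = factor(sample(c("a", "b"), 200, replace = TRUE)),
  many = factor(sprintf("%05d", sample(0:999, 200, replace = TRUE))),
  num  = runif(200)
)

# Count levels for factor columns only
n_levels <- sapply(Filter(is.factor, toy), nlevels)
print(n_levels)

# Names of the factor columns that exceed the 53-category limit
too_many <- names(n_levels)[n_levels > 53]
print(too_many)
```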

So one option is to use all the variables except these two:

> crx_impute = missForest(crx[,c(1,3,4,5,6,7,8,9,10,11,12,13,15,16)])

Then use crx_impute$ximp to get the imputed data.
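An alternative to dropping V2 and V14, not part of the approach above, is to convert them to numeric, assuming they really hold numbers stored as text (as sample values like "30.83" and "00202" suggest). The sketch below uses made-up data and a hypothetical helper, to_model_types(), which turns numeric-looking character columns into numeric and all other character columns into factors. The missForest call only runs if the package is installed.

```r
set.seed(42)
n <- 60

# Toy data in the spirit of crx: text columns with "?" for missing values
toy <- data.frame(
  V1 = sample(c("a", "b", "?"), n, replace = TRUE, prob = c(0.45, 0.45, 0.10)),
  V2 = sprintf("%.2f", runif(n, 15, 70)),
  V3 = round(runif(n, 0, 28), 2),
  stringsAsFactors = FALSE
)
toy$V2[sample(n, 5)] <- "?"
toy[toy == "?"] <- NA

# Hypothetical helper: character -> numeric when every non-missing value
# parses as a number, otherwise character -> factor
to_model_types <- function(df) {
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)
    as_num <- suppressWarnings(as.numeric(col))
    if (all(is.na(as_num) == is.na(col))) as_num else factor(col)
  })
  df
}

clean <- to_model_types(toy)
str(clean)

# Impute only if missForest is available
if (requireNamespace("missForest", quietly = TRUE)) {
  imputed <- missForest::missForest(clean)$ximp
  print(anyNA(imputed))
}
```

Whether a numeric conversion is appropriate depends on what the columns mean. An identifier-like code would be better left out than treated as a quantity.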

Answered by Ankit Seth on July 12, 2021
