Data Science Asked on July 12, 2021
I’m trying to use the function missForest() of the library ‘missForest’ but I always get the same error message.
This is the code:
libraries:
library(dplyr)
library(naniar)
library(missForest)
data:
url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data'
crx <- read.csv(url, sep = ",", header = F)
Then I replace all the "?" with null values (NA):
crx <- crx %>% replace_with_na_all(condition = ~.x == "?")
And then I apply the missForest to get rid of the null values:
crx <- missForest(crx)
And I get the following error message:
Error: Assigned data `mean(xmis[, t.co], na.rm = TRUE)` must be compatible with existing data.
ℹ Error occurred for column `V1`.
x Can't convert <double> to <character>.
Run `rlang::last_error()` to see where the error occurred.
20. stop(fallback)
19. signal_abort(cnd)
18. cnd_signal(error_assign_incompatible_type(x, value, j, value_arg, cnd_message(cnd)))
17. (function (cnd) { cnd_signal(error_assign_incompatible_type(x, value, j, value_arg, cnd_message(cnd))) ...
16. signalCondition(cnd)
15. signal_abort(cnd)
14. abort(message, class = c(class, "vctrs_error"), ...)
13. stop_vctrs(message, class = c(class, "vctrs_error_incompatible"), x = x, y = y, details = details, ...)
12. stop_incompatible(x, y, x_arg = x_arg, y_arg = y_arg, details = details, ..., message = message, class = c(class, "vctrs_error_incompatible_type"))
11. stop_incompatible_type(x = x, y = to, ..., x_arg = x_arg, y_arg = to_arg, action = "convert", details = details, message = message, class = class)
10. stop_incompatible_cast(x, to, x_arg = x_arg, to_arg = to_arg, `vctrs:::from_dispatch` = match_from_dispatch(...))
9. vec_default_cast(x = x, to = to, x_arg = x_arg, to_arg = to_arg, `vctrs:::from_dispatch` = `vctrs:::from_dispatch`, `vctrs:::df_fallback` = `vctrs:::df_fallback`, `vctrs:::s3_fallback` = `vctrs:::s3_fallback`)
8. (function () vec_default_cast(x = x, to = to, x_arg = x_arg, to_arg = to_arg, `vctrs:::from_dispatch` = `vctrs:::from_dispatch`, `vctrs:::df_fallback` = `vctrs:::df_fallback`, `vctrs:::s3_fallback` = `vctrs:::s3_fallback`))()
7. `vec_slice<-`(`*tmp*`, i, value = value[[j]])
6. withCallingHandlers(for (j in seq_along(x)) { xj <- x[[j]] vec_slice(xj, i) <- value[[j]] x[[j]] <- xj ...
5. tbl_subassign_row(xj, i, value, value_arg)
4. tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
3. `[<-.tbl_df`(`*tmp*`, is.na(xmis[, t.co]), t.co, value = NA_real_)
2. `[<-`(`*tmp*`, is.na(xmis[, t.co]), t.co, value = NA_real_)
1. missForest(crx)
tibble [690 × 16] (S3: tbl_df/tbl/data.frame)
$ V1 : chr [1:690] "b" "a" "a" "b" ...
$ V2 : chr [1:690] "30.83" "58.67" "24.50" "27.83" ...
$ V3 : num [1:690] 0 4.46 0.5 1.54 5.62 ...
$ V4 : chr [1:690] "u" "u" "u" "u" ...
$ V5 : chr [1:690] "g" "g" "g" "g" ...
$ V6 : chr [1:690] "w" "q" "q" "w" ...
$ V7 : chr [1:690] "v" "h" "h" "v" ...
$ V8 : num [1:690] 1.25 3.04 1.5 3.75 1.71 ...
$ V9 : chr [1:690] "t" "t" "t" "t" ...
$ V10: chr [1:690] "t" "t" "f" "t" ...
$ V11: int [1:690] 1 6 0 5 0 0 0 0 0 0 ...
$ V12: chr [1:690] "f" "f" "f" "t" ...
$ V13: chr [1:690] "g" "g" "g" "g" ...
$ V14: chr [1:690] "00202" "00043" "00280" "00100" ...
$ V15: int [1:690] 0 560 824 3 0 0 31285 1349 314 1442 ...
$ V16: chr [1:690] "+" "+" "+" "+" ...
I read on Stack Overflow that a possible fix is to convert the data to a data frame with as.data.frame(), but that didn’t work either. It returns this other error message:
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
argument is not numeric or logical: returning NA
missForest iteration 1 in progress...done!
Error in FUN(left, right) : non-numeric argument to binary operator
First, you don't need the naniar package for a simple task like replacing values; base R can do it directly.
After reading the data, use this to replace the ? values with NA:
> crx = as.data.frame(crx)
> crx[crx == '?'] <- NA # Replace ? values with NA
The summary then looks like this:
> summary(crx)
V1         V2            V3               V4         V5         V6            V7            V8               V9
?   :  0   22.67  :  9   Min.   : 0.000   ?   :  0   ?   :  0   c      :137   v      :399   Min.   : 0.000   f:329
a   :210   20.42  :  7   1st Qu.: 1.000   l   :  2   g   :519   q      : 78   h      :138   1st Qu.: 0.165   t:361
b   :468   18.83  :  6   Median : 2.750   u   :519   gg  :  2   w      : 64   bb     : 59   Median : 1.000
NA's: 12   19.17  :  6   Mean   : 4.759   y   :163   p   :163   i      : 59   ff     : 57   Mean   : 2.223
           20.67  :  6   3rd Qu.: 7.207   NA's:  6   NA's:  6   aa     : 54   j      :  8   3rd Qu.: 2.625
           (Other):644   Max.   :28.000                         (Other):289   (Other): 20   Max.   :28.500
           NA's   : 12                                          NA's   :  9   NA's   :  9

V10     V11            V12     V13     V14           V15                V16
f:395   Min.   : 0.0   f:374   g:625   00000  :132   Min.   :     0.0   -:383
t:295   1st Qu.: 0.0   t:316   p:  8   00120  : 35   1st Qu.:     0.0   +:307
        Median : 0.0           s: 57   00200  : 35   Median :     5.0
        Mean   : 2.4                   00160  : 34   Mean   :  1017.4
        3rd Qu.: 3.0                   00080  : 30   3rd Qu.:   395.5
        Max.   :67.0                   (Other):411   Max.   :100000.0
                                       NA's   : 13
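The summary still lists ? with a count of 0 because assigning NA does not remove a factor level. A minimal sketch on a made-up one-column data frame (toy and grp are hypothetical names, not part of crx), assuming you want to drop such empty levels with base R's droplevels():

```r
# Made-up factor column that uses "?" as a missing-value marker.
toy <- data.frame(grp = factor(c("a", "?", "b", "a")))

toy[toy == "?"] <- NA          # same replacement as used for crx
levels_before <- levels(toy$grp)  # "?" is still a level, now with zero rows

toy <- droplevels(toy)         # drop levels that no longer occur in the data
levels_after <- levels(toy$grp)

print(levels_before)
print(levels_after)
```

This is optional for imputation; it only keeps the leftover ? level from showing up in later summaries.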
Run str() to get an idea of each variable's type and values:
> str(crx)
'data.frame': 690 obs. of 16 variables:
$ V1 : Factor w/ 3 levels "?","a","b": 3 2 2 3 3 3 3 2 3 3 ...
$ V2 : Factor w/ 350 levels "?","13.75","15.17",..: 158 330 91 127 45 170 181 76 312 257 ...
$ V3 : num 0 4.46 0.5 1.54 5.62 ...
$ V4 : Factor w/ 4 levels "?","l","u","y": 3 3 3 3 3 3 3 3 4 4 ...
$ V5 : Factor w/ 4 levels "?","g","gg","p": 2 2 2 2 2 2 2 2 4 4 ...
$ V6 : Factor w/ 15 levels "?","aa","c","cc",..: 14 12 12 14 14 11 13 4 10 14 ...
$ V7 : Factor w/ 10 levels "?","bb","dd",..: 9 5 5 9 9 9 5 9 5 9 ...
$ V8 : num 1.25 3.04 1.5 3.75 1.71 ...
$ V9 : Factor w/ 2 levels "f","t": 2 2 2 2 2 2 2 2 2 2 ...
$ V10: Factor w/ 2 levels "f","t": 2 2 1 2 1 1 1 1 1 1 ...
$ V11: int 1 6 0 5 0 0 0 0 0 0 ...
$ V12: Factor w/ 2 levels "f","t": 1 1 1 2 1 2 2 1 1 2 ...
$ V13: Factor w/ 3 levels "g","p","s": 1 1 1 1 3 1 1 1 1 1 ...
$ V14: Factor w/ 171 levels "?","00000","00017",..: 70 13 98 33 39 117 56 25 64 17 ...
$ V15: int 0 560 824 3 0 0 31285 1349 314 1442 ...
$ V16: Factor w/ 2 levels "-","+": 2 2 2 2 2 2 2 2 2 2 ...
Here you can see that columns V2 and V14 are factors with more than 53 levels (categories). The randomForest function, which missForest calls internally, cannot handle categorical predictors with more than 53 categories, so these columns cannot be passed to missForest as they are. If you include them, you will get an error like this:
Error in randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, : Can not handle categorical predictors with more than 53 categories.
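The level counts can also be checked programmatically instead of reading them off the str() output. A minimal sketch on a made-up data frame (toy and its column names id_like, group and amount are hypothetical), using base R's nlevels():

```r
# Toy data frame with made-up values, standing in for the real data.
set.seed(1)
toy <- data.frame(
  id_like = factor(sprintf("%05d", 1:60)),      # 60 distinct levels
  group   = factor(rep(c("a", "b", "c"), 20)),  # 3 levels
  amount  = runif(60)                           # numeric, not a factor
)

# Number of levels per column; non-factor columns count as 0.
n_levels <- sapply(toy, function(col) if (is.factor(col)) nlevels(col) else 0L)

# Columns with more than 53 categories, the limit named in the error message.
too_many <- names(n_levels)[n_levels > 53]
print(too_many)

# Keep only the remaining columns.
toy_ok <- toy[, !(names(toy) %in% too_many), drop = FALSE]
```

Applied to crx, the same check should flag the columns whose level counts in the str() output exceed 53.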
So your option is to leave these two variables out and use the others:
> crx_impute = missForest(crx[,c(1,3,4,5,6,7,8,9,10,11,12,13,15,16)])
Then you can use crx_impute$ximp to get your imputed data.
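Note that the V2 and V14 values look numeric in the str() output, so they may have been read as text only because of the ? markers. If that is the case, a possible alternative to dropping them is to let read.csv() treat ? as NA through its na.strings argument, so numeric columns stay numeric. A minimal sketch on three made-up rows (raw and toy are hypothetical, and this has not been run against the full crx file here):

```r
# Three made-up rows in a comma-separated layout, with "?" marking missing values.
raw <- "b,30.83,0,00202\na,?,4.46,00043\n?,24.50,0.5,?"

toy <- read.csv(text = raw, header = FALSE,
                na.strings = "?",          # parse "?" as NA
                stringsAsFactors = TRUE)   # text columns become factors

# V1 is a factor; V2, V3 and V4 are parsed as numbers.
str(toy)
```

For the real file you would pass the url instead of text = raw. Be aware that codes with leading zeros such as 00202 are then read as plain integers, which matters if that column is really an identifier rather than a quantity.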
Answered by Ankit Seth on July 12, 2021