Hi everybody,
I'm new to R and am trying to deal with missing values (NA) by imputation, but I got stuck.
The data looks something like this:
id   1111   0019   2059   6000
 1     NA     NA     NA     NA
 2     NA     NA     NA     NA
 3     NA     NA     NA     NA
 4     NA     NA     NA     NA
 5  17621   8520   8273    828
 6  77525  33805  40138   3582
 7  69884  25737  38496   5651
 8     NA     NA     NA     NA
 9     NA     NA     NA     NA
10     NA     NA     NA     NA
11     NA     NA     NA     NA
12  42365  14853  23338   4174
13  22188   8707  12032   1449
14  54738  21094  29265   4379
15  44200  17345  23968   2887
16   7685   2520   4380    785
17   9612   3174   5358   1080
18   8669   2999   4868    802
19     NA     NA     NA     NA
20     NA     NA     NA     NA
21     NA     NA     NA     NA
22     NA     NA     NA     NA
23     NA     NA     NA     NA
24     NA     NA     NA     NA
25  11465   5127   5430    908
I tried to do it with missForest:
imp <- missForest(dfmis)
and
imp <- missForest(dfmis, xtrue = dfa)
but got this error:
Error in randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, :
length of response must be the same as predictors...
I also tried mice:
imp <- mice(dfpmis, m=5, maxit = 50, method = 'pmm', seed = 500)
and got another error:
Error in terms.formula(tmp, simplify = TRUE) :
invalid term in model formula.
I'd appreciate any thoughts on this.
Thank you in advance,
DR
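A possible cause of the mice error, offered only as a guess since the data frame itself is not shown: column names such as 1111 or 0019 are not syntactically valid R names, and functions that build a model formula from column names can fail on them. A minimal sketch with a made-up data frame (toy is hypothetical, not the poster's data) of checking and repairing such names in base R:

```r
# Hypothetical toy data with the same style of column names as in the post.
toy <- data.frame(a = c(1, NA), b = c(NA, 2))
names(toy) <- c("1111", "0019")

# A name is syntactic if make.names() leaves it unchanged.
is_syntactic <- names(toy) == make.names(names(toy))
print(is_syntactic)

# Repair the names; make.names() prefixes "X" to names that start with a digit.
names(toy) <- make.names(names(toy), unique = TRUE)
print(names(toy))
```

Whether this is what actually triggered "invalid term in model formula" here cannot be confirmed from the post; it is simply a cheap thing to rule out first.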
Hi there,
See the example below, based on the iris dataset. In the first call we pass the original iris as xtrue to see the quality of the imputation, but you can simply run it without that dataset as well, as demonstrated with iris.imp2.
library(missForest)
#> Loading required package: randomForest
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
#> Loading required package: foreach
#> Loading required package: itertools
#> Loading required package: iterators
## Nonparametric missing value imputation on mixed-type data:
data(iris)
summary(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
#> 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
#> Median :5.800 Median :3.000 Median :4.350 Median :1.300
#> Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
#> 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
#> Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
#> Species
#> setosa :50
#> versicolor:50
#> virginica :50
#>
#>
#>
## The data contains four continuous and one categorical variable.
## Artificially produce missing values using the 'prodNA' function:
set.seed(81)
iris.mis <- prodNA(iris, noNA = 0.2)
summary(iris.mis)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
#> 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
#> Median :5.750 Median :3.000 Median :4.400 Median :1.300
#> Mean :5.828 Mean :3.070 Mean :3.855 Mean :1.169
#> 3rd Qu.:6.400 3rd Qu.:3.375 3rd Qu.:5.100 3rd Qu.:1.800
#> Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
#> NA's :24 NA's :32 NA's :33 NA's :32
#> Species
#> setosa :42
#> versicolor:40
#> virginica :39
#> NA's :29
#>
#>
#>
## Impute missing values providing the complete matrix for
## illustration. Use 'verbose' to see what happens between iterations:
iris.imp <- missForest(iris.mis, xtrue = iris, verbose = TRUE)
#> missForest iteration 1 in progress...done!
#> error(s): 0.206485 0.03448276
#> estimated error(s): 0.160313 0.05785124
#> difference(s): 0.01225256 0.1466667
#> time: 0.07 seconds
#>
#> missForest iteration 2 in progress...done!
#> error(s): 0.2115068 0.03448276
#> estimated error(s): 0.1439782 0.04132231
#> difference(s): 0.0001759815 0
#> time: 0.04 seconds
#>
#> missForest iteration 3 in progress...done!
#> error(s): 0.2164123 0.03448276
#> estimated error(s): 0.142713 0.04958678
#> difference(s): 4.654903e-05 0
#> time: 0.07 seconds
#>
#> missForest iteration 4 in progress...done!
#> error(s): 0.2204607 0.03448276
#> estimated error(s): 0.1429416 0.04958678
#> difference(s): 2.832941e-05 0
#> time: 0.04 seconds
#>
#> missForest iteration 5 in progress...done!
#> error(s): 0.2186308 0.03448276
#> estimated error(s): 0.1432276 0.04958678
#> difference(s): 3.899112e-05 0
#> time: 0.05 seconds
## The imputation finished after five iterations. The reported true
## NRMSE is 0.220 and the true PFC is 0.034; the estimated (out-of-bag)
## NRMSE is 0.143 and the PFC is 0.050 (see Details for the reason for
## taking iteration 4 instead of iteration 5 as the final value).
## The final results can be accessed directly. The estimated error:
iris.imp$OOBerror
#> NRMSE PFC
#> 0.14294158 0.04958678
## The true imputation error (if available):
iris.imp$error
#> NRMSE PFC
#> 0.22046067 0.03448276
## And of course the imputed data matrix (do not run this):
## iris.imp$Ximp
iris.imp$ximp
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.100000 3.500000 1.483978 0.2000000 setosa
#> 2 4.596743 3.000000 1.400000 0.2000000 setosa
#> 3 4.700000 3.200000 1.571645 0.2000000 setosa
#> 4 4.600000 3.308491 1.343283 0.2000000 setosa
#> 5 4.946217 3.600000 1.400000 0.2000000 setosa
#> 6 5.400000 3.900000 1.700000 0.3246167 setosa
#> 7 4.600000 3.400000 1.400000 0.3000000 setosa
#> 8 5.000000 3.400000 1.500000 0.2000000 setosa
#> 9 4.400000 3.015152 1.306128 0.1997000 setosa
#> 10 4.900000 3.100000 1.500000 0.1000000 setosa
#> 11 5.400000 3.700000 1.500000 0.2000000 setosa
#> 12 4.800000 3.175219 1.600000 0.2105000 setosa
#> 13 4.800000 3.247726 1.515808 0.1000000 setosa
#> 14 4.300000 3.000000 1.100000 0.1000000 setosa
#> 15 5.800000 3.793967 1.200000 0.2000000 setosa
#> 16 5.700000 4.400000 1.500000 0.4000000 setosa
#> 17 5.400000 3.900000 1.300000 0.4000000 setosa
#> 18 5.100000 3.500000 1.400000 0.3000000 setosa
#> 19 5.700000 3.800000 1.700000 0.3000000 setosa
#> 20 5.100000 3.800000 1.500000 0.3000000 setosa
#> 21 5.400000 3.400000 1.700000 0.2000000 setosa
#> 22 5.100000 3.760617 1.500000 0.4000000 setosa
#> 23 4.600000 3.600000 1.000000 0.2000000 setosa
#> 24 5.181750 3.300000 1.700000 0.3401667 setosa
#> 25 4.800000 3.400000 1.486224 0.2000000 setosa
#> 26 5.000000 3.000000 1.600000 0.3348667 setosa
#> 27 5.000000 3.400000 1.564650 0.4000000 setosa
#> 28 5.200000 3.500000 1.500000 0.2000000 setosa
#> 29 5.200000 3.400000 1.526502 0.2000000 setosa
#> 30 4.700000 3.200000 1.600000 0.2000000 setosa
#> 31 4.800000 3.100000 1.600000 0.2000000 setosa
#> 32 5.400000 3.992488 1.435600 0.4000000 setosa
#> 33 5.328964 4.100000 1.500000 0.2876667 setosa
#> 34 5.369470 4.200000 1.400000 0.2000000 setosa
#> 35 4.900000 3.100000 1.500000 0.2000000 setosa
#> 36 5.519200 2.487667 3.947292 1.1112500 versicolor
#> 37 5.121666 3.500000 1.483978 0.2000000 setosa
#> 38 4.900000 3.600000 1.410960 0.1000000 setosa
#> 39 4.400000 3.000000 1.300000 0.2000000 setosa
#> 40 5.100000 3.569493 1.500000 0.2301500 setosa
#> 41 5.000000 3.500000 1.300000 0.3000000 setosa
#> 42 4.500000 2.300000 1.300000 0.3000000 setosa
#> 43 4.400000 3.200000 1.372094 0.2000000 setosa
#> 44 5.000000 3.381917 1.600000 0.6000000 setosa
#> 45 5.100000 3.800000 1.900000 0.4000000 setosa
#> 46 4.800000 3.355694 1.400000 0.3000000 setosa
#> 47 5.100000 3.800000 1.600000 0.2000000 setosa
#> 48 4.600000 3.323424 1.400000 0.2000000 setosa
#> 49 5.300000 3.700000 1.500000 0.2000000 setosa
#> 50 5.000000 3.300000 1.400000 0.2000000 setosa
#> 51 7.000000 3.200000 4.992667 1.6422976 versicolor
#> 52 6.400000 2.839933 4.500000 1.5000000 versicolor
#> 53 6.900000 3.114000 4.900000 1.5000000 versicolor
#> 54 5.500000 2.300000 4.000000 1.0555000 versicolor
#> 55 6.121617 2.845333 4.600000 1.5000000 versicolor
#> 56 5.700000 2.800000 4.500000 1.3000000 versicolor
#> 57 6.300000 3.300000 4.836333 1.5734643 versicolor
#> 58 4.900000 2.203024 3.300000 1.0000000 versicolor
#> 59 6.600000 2.900000 4.600000 1.3000000 versicolor
#> 60 5.200000 2.700000 3.900000 1.4000000 versicolor
#> 61 5.000000 2.000000 3.463500 1.0000000 versicolor
#> 62 5.900000 3.000000 4.574878 1.5000000 versicolor
#> 63 6.000000 2.200000 4.000000 1.0000000 versicolor
#> 64 6.100000 2.900000 4.700000 1.4630000 versicolor
#> 65 5.600000 2.900000 3.600000 1.3000000 versicolor
#> 66 6.700000 3.100000 4.777488 1.4000000 versicolor
#> 67 5.600000 3.000000 4.500000 1.5000000 versicolor
#> 68 5.800000 2.454271 4.100000 1.0000000 versicolor
#> 69 6.200000 2.200000 4.500000 1.5000000 versicolor
#> 70 5.600000 2.500000 3.900000 1.1000000 versicolor
#> 71 5.900000 3.200000 4.800000 1.5193976 versicolor
#> 72 6.100000 2.610833 4.000000 1.2076000 versicolor
#> 73 6.048646 2.840600 4.541544 1.5000000 versicolor
#> 74 6.100000 2.800000 4.238548 1.2000000 versicolor
#> 75 6.400000 2.900000 4.300000 1.3000000 versicolor
#> 76 6.600000 3.000000 4.400000 1.4000000 versicolor
#> 77 6.800000 3.081000 4.800000 1.4000000 versicolor
#> 78 6.700000 3.135000 5.000000 1.6385476 versicolor
#> 79 6.000000 2.900000 4.500000 1.5000000 versicolor
#> 80 5.700000 2.600000 3.500000 1.1507500 versicolor
#> 81 5.500000 2.400000 3.904458 1.0555000 versicolor
#> 82 5.500000 2.400000 3.700000 1.0000000 versicolor
#> 83 5.800000 2.683000 3.900000 1.2000000 versicolor
#> 84 6.262917 2.700000 5.100000 1.5118333 versicolor
#> 85 5.400000 3.000000 4.270735 1.5000000 versicolor
#> 86 6.000000 3.400000 4.500000 1.6000000 versicolor
#> 87 6.700000 3.100000 4.866000 1.5000000 versicolor
#> 88 5.818619 2.300000 4.400000 1.3000000 versicolor
#> 89 5.600000 3.000000 4.100000 1.3000000 versicolor
#> 90 5.500000 2.500000 4.000000 1.1102500 versicolor
#> 91 5.500000 2.667500 4.400000 1.2000000 versicolor
#> 92 6.100000 2.873333 4.600000 1.4460000 versicolor
#> 93 5.800000 2.600000 3.814233 1.1432500 versicolor
#> 94 5.193333 2.300000 3.300000 1.0000000 versicolor
#> 95 5.600000 2.700000 4.200000 1.3000000 versicolor
#> 96 5.700000 2.732714 4.200000 1.2000000 versicolor
#> 97 5.929452 2.900000 4.200000 1.3000000 versicolor
#> 98 6.200000 2.900000 4.300000 1.3000000 versicolor
#> 99 5.100000 2.500000 3.660000 1.1000000 versicolor
#> 100 5.881819 2.800000 4.331931 1.3000000 versicolor
#> 101 6.300000 3.300000 6.000000 2.5000000 virginica
#> 102 6.213000 2.700000 5.100000 1.6864667 virginica
#> 103 7.100000 3.000000 5.900000 2.1000000 virginica
#> 104 6.300000 2.900000 4.959917 1.8000000 virginica
#> 105 6.500000 3.000000 5.800000 2.2000000 virginica
#> 106 7.600000 3.000000 6.600000 2.1000000 virginica
#> 107 4.900000 2.500000 4.500000 1.7000000 virginica
#> 108 7.300000 2.900000 6.300000 2.0173333 virginica
#> 109 6.700000 2.500000 5.221300 1.8000000 virginica
#> 110 7.340917 3.600000 6.100000 2.1470000 virginica
#> 111 6.500000 3.200000 5.100000 2.0000000 virginica
#> 112 5.930833 2.700000 4.995500 1.9000000 virginica
#> 113 6.800000 3.000000 5.500000 2.1000000 virginica
#> 114 5.700000 2.500000 5.000000 2.0000000 virginica
#> 115 5.800000 2.821250 5.100000 2.4000000 virginica
#> 116 6.400000 3.014500 5.300000 2.3000000 virginica
#> 117 6.500000 3.000000 5.500000 2.0942667 virginica
#> 118 7.700000 3.800000 6.700000 2.1200000 virginica
#> 119 7.700000 2.600000 6.900000 2.3000000 virginica
#> 120 6.000000 2.771500 5.000000 1.5000000 virginica
#> 121 6.900000 3.200000 5.700000 2.3000000 virginica
#> 122 5.600000 2.800000 4.900000 2.0000000 virginica
#> 123 7.700000 2.800000 6.700000 2.0000000 virginica
#> 124 6.300000 2.700000 4.900000 1.8000000 virginica
#> 125 6.700000 3.300000 5.700000 2.4345000 virginica
#> 126 7.200000 3.200000 6.000000 1.8000000 virginica
#> 127 6.200000 2.800000 4.800000 1.8000000 virginica
#> 128 6.100000 3.000000 4.900000 1.8340000 virginica
#> 129 6.451750 2.800000 5.600000 2.1000000 virginica
#> 130 7.200000 3.000000 5.800000 2.0546667 virginica
#> 131 5.912533 2.800000 4.984433 1.9000000 virginica
#> 132 7.900000 3.800000 6.400000 2.0000000 virginica
#> 133 6.400000 3.012000 5.600000 2.2000000 virginica
#> 134 6.300000 2.800000 5.100000 1.5000000 virginica
#> 135 6.100000 2.600000 5.600000 1.4000000 virginica
#> 136 7.700000 3.341500 6.100000 2.3000000 virginica
#> 137 6.300000 3.400000 5.600000 2.4000000 virginica
#> 138 6.668000 3.100000 5.500000 1.8000000 virginica
#> 139 6.234233 3.000000 4.800000 1.8000000 virginica
#> 140 6.900000 3.100000 5.516067 2.1000000 virginica
#> 141 6.700000 3.100000 5.363058 2.4000000 virginica
#> 142 6.900000 3.100000 5.100000 2.3000000 virginica
#> 143 5.800000 2.700000 5.039067 1.9570000 virginica
#> 144 6.800000 3.200000 5.900000 2.3000000 virginica
#> 145 6.700000 3.300000 5.700000 2.5000000 virginica
#> 146 6.503533 3.000000 5.200000 2.3000000 virginica
#> 147 5.894533 2.745250 5.000000 1.9000000 virginica
#> 148 6.500000 3.000000 5.200000 2.1320667 virginica
#> 149 6.200000 2.760083 5.400000 1.8063500 virginica
#> 150 6.419900 3.000000 5.100000 1.8000000 virginica
iris.imp2 <- missForest(iris.mis, verbose = TRUE)
#> missForest iteration 1 in progress...done!
#> estimated error(s): 0.1585091 0.0661157
#> difference(s): 0.01217872 0.1466667
#> time: 0.04 seconds
#>
#> missForest iteration 2 in progress...done!
#> estimated error(s): 0.1450789 0.03305785
#> difference(s): 0.0001785337 0
#> time: 0.06 seconds
#>
#> missForest iteration 3 in progress...done!
#> estimated error(s): 0.1417088 0.03305785
#> difference(s): 4.808566e-05 0
#> time: 0.05 seconds
#>
#> missForest iteration 4 in progress...done!
#> estimated error(s): 0.1402569 0.04958678
#> difference(s): 4.275054e-05 0
#> time: 0.05 seconds
#>
#> missForest iteration 5 in progress...done!
#> estimated error(s): 0.1453971 0.04132231
#> difference(s): 4.293823e-05 0
#> time: 0.06 seconds
iris.imp2
#> $ximp
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.100000 3.500000 1.492399 0.2000000 setosa
#> 2 4.634083 3.000000 1.400000 0.2000000 setosa
#> 3 4.700000 3.200000 1.563350 0.2000000 setosa
#> 4 4.600000 3.262682 1.355167 0.2000000 setosa
#> 5 4.960381 3.600000 1.400000 0.2000000 setosa
#> 6 5.400000 3.900000 1.700000 0.3121667 setosa
#> 7 4.600000 3.400000 1.400000 0.3000000 setosa
#> 8 5.000000 3.400000 1.500000 0.2000000 setosa
#> 9 4.400000 2.993748 1.286250 0.2023333 setosa
#> 10 4.900000 3.100000 1.500000 0.1000000 setosa
#> 11 5.400000 3.700000 1.500000 0.2000000 setosa
#> 12 4.800000 3.174998 1.600000 0.1998500 setosa
#> 13 4.800000 3.288842 1.490500 0.1000000 setosa
#> 14 4.300000 3.000000 1.100000 0.1000000 setosa
#> 15 5.800000 3.748451 1.200000 0.2000000 setosa
#> 16 5.700000 4.400000 1.500000 0.4000000 setosa
#> 17 5.400000 3.900000 1.300000 0.4000000 setosa
#> 18 5.100000 3.500000 1.400000 0.3000000 setosa
#> 19 5.700000 3.800000 1.700000 0.3000000 setosa
#> 20 5.100000 3.800000 1.500000 0.3000000 setosa
#> 21 5.400000 3.400000 1.700000 0.2000000 setosa
#> 22 5.100000 3.793417 1.500000 0.4000000 setosa
#> 23 4.600000 3.600000 1.000000 0.2000000 setosa
#> 24 5.129813 3.300000 1.700000 0.3025833 setosa
#> 25 4.800000 3.400000 1.506307 0.2000000 setosa
#> 26 5.000000 3.000000 1.600000 0.3373667 setosa
#> 27 5.000000 3.400000 1.575833 0.4000000 setosa
#> 28 5.200000 3.500000 1.500000 0.2000000 setosa
#> 29 5.200000 3.400000 1.529307 0.2000000 setosa
#> 30 4.700000 3.200000 1.600000 0.2000000 setosa
#> 31 4.800000 3.100000 1.600000 0.2000000 setosa
#> 32 5.400000 3.928617 1.506381 0.4000000 setosa
#> 33 5.288242 4.100000 1.500000 0.2885000 setosa
#> 34 5.299246 4.200000 1.400000 0.2000000 setosa
#> 35 4.900000 3.100000 1.500000 0.2000000 setosa
#> 36 5.517917 2.463967 3.946750 1.0655000 versicolor
#> 37 5.116759 3.500000 1.504399 0.2000000 setosa
#> 38 4.900000 3.600000 1.446333 0.1000000 setosa
#> 39 4.400000 3.000000 1.300000 0.2000000 setosa
#> 40 5.100000 3.571953 1.500000 0.2245278 setosa
#> 41 5.000000 3.500000 1.300000 0.3000000 setosa
#> 42 4.500000 2.300000 1.300000 0.3000000 setosa
#> 43 4.400000 3.200000 1.315833 0.2000000 setosa
#> 44 5.000000 3.327383 1.600000 0.6000000 setosa
#> 45 5.100000 3.800000 1.900000 0.4000000 setosa
#> 46 4.800000 3.319933 1.400000 0.3000000 setosa
#> 47 5.100000 3.800000 1.600000 0.2000000 setosa
#> 48 4.600000 3.253515 1.400000 0.2000000 setosa
#> 49 5.300000 3.700000 1.500000 0.2000000 setosa
#> 50 5.000000 3.300000 1.400000 0.2000000 setosa
#> 51 7.000000 3.200000 4.980000 1.6047500 versicolor
#> 52 6.400000 2.860452 4.500000 1.5000000 versicolor
#> 53 6.900000 3.104333 4.900000 1.5000000 versicolor
#> 54 5.500000 2.300000 4.000000 1.0635000 versicolor
#> 55 6.163576 2.674667 4.600000 1.5000000 versicolor
#> 56 5.700000 2.800000 4.500000 1.3000000 versicolor
#> 57 6.300000 3.300000 4.822000 1.5511667 versicolor
#> 58 4.900000 2.170533 3.300000 1.0000000 versicolor
#> 59 6.600000 2.900000 4.600000 1.3000000 versicolor
#> 60 5.200000 2.700000 3.900000 1.4000000 versicolor
#> 61 5.000000 2.000000 3.423410 1.0000000 versicolor
#> 62 5.900000 3.000000 4.570083 1.5000000 versicolor
#> 63 6.000000 2.200000 4.000000 1.0000000 versicolor
#> 64 6.100000 2.900000 4.700000 1.4675667 versicolor
#> 65 5.600000 2.900000 3.600000 1.3000000 versicolor
#> 66 6.700000 3.100000 4.769500 1.4000000 versicolor
#> 67 5.600000 3.000000 4.500000 1.5000000 versicolor
#> 68 5.800000 2.433817 4.100000 1.0000000 versicolor
#> 69 6.200000 2.200000 4.500000 1.5000000 versicolor
#> 70 5.600000 2.500000 3.900000 1.1000000 versicolor
#> 71 5.900000 3.200000 4.800000 1.5248333 versicolor
#> 72 6.100000 2.640183 4.000000 1.2080000 versicolor
#> 73 6.074643 2.903500 4.595483 1.5000000 versicolor
#> 74 6.100000 2.800000 4.291000 1.2000000 versicolor
#> 75 6.400000 2.900000 4.300000 1.3000000 versicolor
#> 76 6.600000 3.000000 4.400000 1.4000000 versicolor
#> 77 6.800000 3.078750 4.800000 1.4000000 versicolor
#> 78 6.700000 3.091000 5.000000 1.5900833 versicolor
#> 79 6.000000 2.900000 4.500000 1.5000000 versicolor
#> 80 5.700000 2.600000 3.500000 1.1715000 versicolor
#> 81 5.500000 2.400000 3.913850 1.0645000 versicolor
#> 82 5.500000 2.400000 3.700000 1.0000000 versicolor
#> 83 5.800000 2.658233 3.900000 1.2000000 versicolor
#> 84 6.149167 2.700000 5.100000 1.5301167 versicolor
#> 85 5.400000 3.000000 4.179583 1.5000000 versicolor
#> 86 6.000000 3.400000 4.500000 1.6000000 versicolor
#> 87 6.700000 3.100000 4.864000 1.5000000 versicolor
#> 88 5.843600 2.300000 4.400000 1.3000000 versicolor
#> 89 5.600000 3.000000 4.100000 1.3000000 versicolor
#> 90 5.500000 2.500000 4.000000 1.1090000 versicolor
#> 91 5.500000 2.683283 4.400000 1.2000000 versicolor
#> 92 6.100000 2.870667 4.600000 1.4639000 versicolor
#> 93 5.800000 2.600000 3.945000 1.2050000 versicolor
#> 94 5.167250 2.300000 3.300000 1.0000000 versicolor
#> 95 5.600000 2.700000 4.200000 1.3000000 versicolor
#> 96 5.700000 2.762633 4.200000 1.2000000 versicolor
#> 97 5.760267 2.900000 4.200000 1.3000000 versicolor
#> 98 6.200000 2.900000 4.300000 1.3000000 versicolor
#> 99 5.100000 2.500000 3.711750 1.1000000 versicolor
#> 100 6.016350 2.800000 4.399438 1.3000000 versicolor
#> 101 6.300000 3.300000 6.000000 2.5000000 virginica
#> 102 6.151167 2.700000 5.100000 1.6791167 virginica
#> 103 7.100000 3.000000 5.900000 2.1000000 virginica
#> 104 6.300000 2.900000 4.962833 1.8000000 virginica
#> 105 6.500000 3.000000 5.800000 2.2000000 virginica
#> 106 7.600000 3.000000 6.600000 2.1000000 virginica
#> 107 4.900000 2.500000 4.500000 1.7000000 virginica
#> 108 7.300000 2.900000 6.300000 2.0360000 virginica
#> 109 6.700000 2.500000 5.306533 1.8000000 virginica
#> 110 7.487500 3.600000 6.100000 2.1880000 virginica
#> 111 6.500000 3.200000 5.100000 2.0000000 virginica
#> 112 5.940250 2.700000 5.045000 1.9000000 virginica
#> 113 6.800000 3.000000 5.500000 2.1000000 virginica
#> 114 5.700000 2.500000 5.000000 2.0000000 virginica
#> 115 5.800000 2.884750 5.100000 2.4000000 virginica
#> 116 6.400000 3.033333 5.300000 2.3000000 virginica
#> 117 6.500000 3.000000 5.500000 2.1378333 virginica
#> 118 7.700000 3.800000 6.700000 2.1310000 virginica
#> 119 7.700000 2.600000 6.900000 2.3000000 virginica
#> 120 6.000000 2.780000 5.000000 1.5000000 virginica
#> 121 6.900000 3.200000 5.700000 2.3000000 virginica
#> 122 5.600000 2.800000 4.900000 2.0000000 virginica
#> 123 7.700000 2.800000 6.700000 2.0000000 virginica
#> 124 6.300000 2.700000 4.900000 1.8000000 virginica
#> 125 6.700000 3.300000 5.700000 2.4376667 virginica
#> 126 7.200000 3.200000 6.000000 1.8000000 virginica
#> 127 6.200000 2.800000 4.800000 1.8000000 virginica
#> 128 6.100000 3.000000 4.900000 1.8030000 virginica
#> 129 6.359500 2.800000 5.600000 2.1000000 virginica
#> 130 7.200000 3.000000 5.800000 2.0935000 virginica
#> 131 5.975000 2.800000 5.028000 1.9000000 virginica
#> 132 7.900000 3.800000 6.400000 2.0000000 virginica
#> 133 6.400000 3.053500 5.600000 2.2000000 virginica
#> 134 6.300000 2.800000 5.100000 1.5000000 virginica
#> 135 6.100000 2.600000 5.600000 1.4000000 virginica
#> 136 7.700000 3.261500 6.100000 2.3000000 virginica
#> 137 6.300000 3.400000 5.600000 2.4000000 virginica
#> 138 6.663450 3.100000 5.500000 1.8000000 virginica
#> 139 6.243000 3.000000 4.800000 1.8000000 virginica
#> 140 6.900000 3.100000 5.481000 2.1000000 virginica
#> 141 6.700000 3.100000 5.387250 2.4000000 virginica
#> 142 6.900000 3.100000 5.100000 2.3000000 virginica
#> 143 5.800000 2.700000 5.089250 1.9789167 virginica
#> 144 6.800000 3.200000 5.900000 2.3000000 virginica
#> 145 6.700000 3.300000 5.700000 2.5000000 virginica
#> 146 6.522100 3.000000 5.200000 2.3000000 virginica
#> 147 5.962750 2.760333 5.000000 1.9000000 virginica
#> 148 6.500000 3.000000 5.200000 2.1580000 virginica
#> 149 6.200000 2.771000 5.400000 1.8061667 virginica
#> 150 6.429000 3.000000 5.100000 1.8000000 virginica
#>
#> $OOBerror
#> NRMSE PFC
#> 0.14025694 0.04958678
#>
#> attr(,"class")
#> [1] "missForest"
Created on 2022-01-17 by the reprex package (v2.0.0)
Since I don't have a reprex from you, I can't be sure whether your data is in the correct format, so compare it with the example above.
Also, just to add: if you have rows that are entirely missing, missForest won't be able to impute them. That is literally impossible; it needs at least some information in a row to perform an imputation.
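The point about fully missing rows can be checked before imputing. A minimal sketch in base R, using a made-up data frame (toy and the other names are hypothetical), that counts the observed values per row and keeps only rows with at least one:

```r
# Hypothetical toy data: rows 1 and 2 carry no information at all.
toy <- data.frame(x = c(NA, NA, 5, 7, NA),
                  y = c(NA, NA, 2, NA, 4))

# Count observed (non-NA) values per row.
n_obs <- rowSums(!is.na(toy))

# Keep only rows with at least one observed value.
toy_kept <- toy[n_obs > 0, , drop = FALSE]
print(toy_kept)

# Columns that are entirely NA cannot be imputed either, so list them too.
all_na_cols <- names(toy_kept)[colSums(!is.na(toy_kept)) == 0]
print(all_na_cols)
```

The filtered data frame is what would then be handed to the imputation function.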
Hi GM, thank you for the reply. While reading it and making the reprex I noticed some errors in my code.
This is the data:
    id  c0011  c0019  c2059  c6000    c4444   c4419   c4459   c4460  date
 1   1     NA     NA     NA     NA       NA      NA      NA      NA  2020
 2   1     NA     NA     NA     NA       NA      NA      NA      NA  2020
 3   1     NA     NA     NA     NA       NA      NA      NA      NA  2020
 4   1     NA     NA     NA     NA       NA      NA      NA      NA  2020
 5   1  33989     NA     NA     NA       NA      NA      NA      NA  2021
 6   1   1653     NA     NA     NA       NA      NA      NA      NA  2021
 7   1    799     NA     NA     NA       NA      NA      NA      NA  2021
 8   1  22383     NA     NA     NA       NA      NA      NA      NA  2021
 9   2   4011   2139   1639    233   991470  746901  217421   27148  2020
10   2  17621   8520   8273    828   991470  746901  217421   27148  2020
11   2  77525  33805  40138   3582   991470  746901  217421   27148  2020
12   2  69884  25737  38496   5651   991470  746901  217421   27148  2020
13   2     NA     NA     NA     NA   906534  679373  202449   24712  2021
14   2     NA     NA     NA     NA   906534  679373  202449   24712  2021
15   2     NA     NA     NA     NA   906534  679373  202449   24712  2021
16   2     NA     NA     NA     NA   906534  679373  202449   24712  2021
17   3  42365  14853  23338   4174  1012683  339358  563151  110174  2020
18   3  22188   8707  12032   1449  1012683  339358  563151  110174  2020
19   3  54738  21094  29265   4379  1012683  339358  563151  110174  2020
20   3  44200  17345  23968   2887  1012683  339358  563151  110174  2020
21   3   7685   2520   4380    785  1012683  339358  563151  110174  2020
22   3   9612   3174   5358   1080  1012683  339358  563151  110174  2020
23   3   8669   2999   4868    802  1012683  339358  563151  110174  2020
24   3     NA     NA     NA     NA   375736  124121  209384   42231  2021
25   3     NA     NA     NA     NA   375736  124121  209384   42231  2021
26   3     NA     NA     NA     NA   375736  124121  209384   42231  2021
27   3     NA     NA     NA     NA   375736  124121  209384   42231  2021
28   3     NA     NA     NA     NA   375736  124121  209384   42231  2021
29   3     NA     NA     NA     NA   375736  124121  209384   42231  2021
30   3  11465   5127   5430    908   375736  124121  209384   42231  2021
and the code:
# import from xlsx a data set with number of clients per category
df <-
# relabel
names(df) <- c('id', 'c0011', 'c0019', 'c0059', 'c0060',
'c4444', 'c4419', 'c4459', 'c4460', 'date')
#> Error in names(df) <- c("id", "c0011", "c0019", "c0059", "c0060", "c4444", : names() applied to a non-vector
# convert id and date to factor and others to integer with lapply
df[c(1, 10)] <- lapply(df[c(1, 10)], as.factor)
#> Error in df[c(1, 10)]: object of type 'closure' is not subsettable
df[c(2:9)] <- lapply(df[c(2:9)], as.integer)
#> Error in df[c(2:9)]: object of type 'closure' is not subsettable
# drop missing rows and factor vars
dfmis <- df[-c(1:4), -c(1, 10)]
#> Error in df[-c(1:4), -c(1, 10)]: object of type 'closure' is not subsettable
# check
sapply(dfmis, class)
#> Error in lapply(X = X, FUN = FUN, ...): object 'dfmis' not found
# impute
imp <- missForest(dfmis, xtrue = df)
#> Error in missForest(dfmis, xtrue = df): could not find function "missForest"
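One thing worth checking in the last call, as far as I understand the missForest documentation: xtrue is meant to be a complete version of the same data (same rows and columns, no NA) and is used only to report the true imputation error. Here df would still contain NA and has more rows and columns than dfmis, so it does not qualify. A minimal sketch of a check one could run before passing xtrue; is_valid_xtrue is a hypothetical helper, not part of missForest, and the data are made up:

```r
# Hypothetical helper (not part of missForest): TRUE only if `xtrue` has the
# same shape and column names as `xmis` and contains no missing values.
is_valid_xtrue <- function(xmis, xtrue) {
  identical(dim(xmis), dim(xtrue)) &&
    identical(names(xmis), names(xtrue)) &&
    !any(is.na(xtrue))
}

# Toy data, made up for illustration.
full <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
mis <- full
mis$a[2] <- NA

print(is_valid_xtrue(mis, full))        # complete data of the same shape
print(is_valid_xtrue(mis[-1, ], full))  # different number of rows
print(is_valid_xtrue(mis, mis))         # reference still contains NA
```

When no complete reference data exist, which is the usual situation with real missing values, xtrue can simply be left out, as in the iris.imp2 call above.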
You have errors upon errors here. You need to resolve these before posting the reprex.
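df <- data.frame(first  = c(1, 1, 1, 1, 1, 1),
                 second = c(1, 1, 1, 1, 1, 1),
                 thirth = c(1, 1, NA, NA, 1, 1),
                 fourth = c(1, 1, 1, 1, 1, NA))
df
# Option 1: replace the missing values with zeros
# (work on a copy so that Option 2 still sees the NAs)
df0 <- df
df0$thirth[3:4] <- 0
df0$fourth[6] <- 0
df0
# Option 2: replace the missing values with the arithmetic mean of the
# observed values (na.rm = TRUE is needed, otherwise mean() returns NA)
df$thirth[3:4] <- mean(df$thirth, na.rm = TRUE)
df$fourth[6] <- mean(df$fourth, na.rm = TRUE)
df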
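Option 2 above can be generalized so that the positions of the NAs do not have to be typed by hand. A minimal sketch in base R, assuming all columns are numeric; impute_mean is a hypothetical helper name, and the data frame is the same toy data as above:

```r
# Same toy data as in the example above.
df <- data.frame(first  = c(1, 1, 1, 1, 1, 1),
                 second = c(1, 1, 1, 1, 1, 1),
                 thirth = c(1, 1, NA, NA, 1, 1),
                 fourth = c(1, 1, 1, 1, 1, NA))

# Hypothetical helper: replace NA in one numeric vector by the mean of its
# observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply it to every column; assigning into df_imp[] keeps the data frame
# structure instead of returning a plain list.
df_imp <- df
df_imp[] <- lapply(df_imp, impute_mean)
print(df_imp)
```

Keep in mind that mean imputation ignores the relationships between columns and reduces the variance of the imputed variable, so it is a quick workaround and not a replacement for a model-based method such as missForest.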
Indeed, there is much to learn; I'll work on that.
By the way, after exporting the modified data frame to *.txt and importing it again, the imputation works like a charm. (Just to satisfy my curiosity, lol.)
Thank you GM for the recommendations!
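The export-and-reimport step may have helped because read.table() infers the column types afresh from the text. This is only a guess, since the original import code is not shown. A minimal sketch of such a round trip through a temporary file, using a made-up data frame (toy is hypothetical) in which the numbers were stored as text:

```r
# Hypothetical toy data where the numbers were stored as text.
toy <- data.frame(c0011 = c("17621", NA, "69884"),
                  c0019 = c("8520", "33805", NA),
                  stringsAsFactors = FALSE)
print(sapply(toy, class))   # both columns are character

# Write to a tab-separated text file and read it back.
path <- tempfile(fileext = ".txt")
write.table(toy, path, sep = "\t", row.names = FALSE, quote = FALSE)
toy2 <- read.table(path, header = TRUE, sep = "\t", na.strings = "NA")
print(sapply(toy2, class))  # columns are re-inferred as integer

# Remove the temporary file again.
unlink(path)
```

Checking sapply(df, class) right after the first import would show whether the column types were the problem, without needing the detour through a file.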
Hi Miguel,
A much simpler workaround, many thanks.