Hello
I am wondering how to write a function that takes a data frame and a missing-value function as arguments, and returns a new data frame in which the missing values have been replaced with the values produced by the missing-value function.
I was wondering if anyone could help me with this?
Kind regards
Ronnie
Can you post a minimal reproducible example (a reprex) with dummy data, the code you have already tried, and your desired output?
You may also want to check out the functions section of R for Data Science and the section on functionals in Advanced R.
As @tbradley said, a reprex with example input data and desired output data would make it a lot easier for us to help you, but here is a simple example of one way to do this that you might be able to build on. However, there are many different ways to do this depending on your requirements.
suppressPackageStartupMessages(library(tibble))
t <- tibble::tribble(~id, ~value1, ~value2,
1,NA, 2,
2, NA, 5,
3, 5, NA,
NA, 9, NA)
t
#> # A tibble: 4 x 3
#> id value1 value2
#> <dbl> <dbl> <dbl>
#> 1 1.00 NA 2.00
#> 2 2.00 NA 5.00
#> 3 3.00 5.00 NA
#> 4 NA 9.00 NA
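# Returns the value used in place of NA
valueFcn <- function() {
  1000
}
# Replaces NAs in one column (given by name) with the result of valueFcn()
replace_na <- function(tbl, column, valueFcn) {
  tbl[is.na(tbl[[column]]), column] <- valueFcn()
  tbl
}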
replace_na(t, "value1", valueFcn)
#> # A tibble: 4 x 3
#> id value1 value2
#> <dbl> <dbl> <dbl>
#> 1 1.00 1000 2.00
#> 2 2.00 1000 5.00
#> 3 3.00 5.00 NA
#> 4 NA 9.00 NA
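Building on the same idea, here is a minimal sketch of a version that is a bit closer to the original question: the missing-value function receives the non-missing values of a column and computes the replacement from them (a mean, a median, and so on). The names `fill_missing`, `fill_fn` and `cols` are made up for illustration, and it assumes only numeric columns should be touched.

```r
# Hypothetical helper: fill NAs in the chosen numeric columns of a data frame.
# `fill_fn` is called with the non-missing values of a column and is expected
# to return a single replacement value.
fill_missing <- function(df, fill_fn, cols = names(df)) {
  for (col in cols) {
    values <- df[[col]]
    if (is.numeric(values) && anyNA(values)) {
      values[is.na(values)] <- fill_fn(values[!is.na(values)])
      df[[col]] <- values
    }
  }
  df
}

df <- data.frame(
  id     = c(1, 2, 3, NA),
  value1 = c(NA, NA, 5, 9),
  value2 = c(2, 5, NA, NA)
)

# Fill value1 and value2 with their column means; id is left alone
filled <- fill_missing(df, mean, cols = c("value1", "value2"))
filled
```

Passing `median` (or any function of your own that returns one number) instead of `mean` works the same way. The original `df` is not modified, because R copies the data frame inside the function.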
... and here is a tidyverse oriented version:
suppressPackageStartupMessages(library(tibble))
# Replaces each NA in a numeric vector with 1000, element by element
ifna <- function(value) {
  purrr::map_dbl(value, ~ if (!is.na(.)) {.} else {1000})
}
t <- tibble::tribble(
~id, ~value1, ~value2,
1,NA, 2,
2, NA, 5,
3, 5, NA,
NA, 9, NA)
dplyr::mutate_at(t, "value1", ifna)
#> # A tibble: 4 x 3
#> id value1 value2
#> <dbl> <dbl> <dbl>
#> 1 1.00 1000 2.00
#> 2 2.00 1000 5.00
#> 3 3.00 5.00 NA
#> 4 NA 9.00 NA
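For what it's worth, the element-by-element `map_dbl()` call is not strictly needed, since the replacement can be done on the whole column at once. A minimal sketch, assuming dplyr 1.0.0 or later is installed (for `across()`), using `dplyr::coalesce()` to substitute 1000 wherever the column is NA. The name `tbl` is just a stand-in for the example tibble.

```r
suppressPackageStartupMessages(library(dplyr))

tbl <- tibble::tribble(
  ~id, ~value1, ~value2,
  1,   NA,      2,
  2,   NA,      5,
  3,   5,       NA,
  NA,  9,       NA
)

# coalesce() returns the first non-missing value at each position,
# so NA entries in value1 become 1000 and everything else is kept
filled <- mutate(tbl, across(all_of("value1"), ~ coalesce(.x, 1000)))
filled
```

To fill several columns, pass more names to `all_of()`, for example `all_of(c("value1", "value2"))`.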
If that is actually what needs to be done, a far simpler way is:
t <- tibble::tribble(
~id, ~value1, ~value2,
1,NA, 2,
2, NA, 5,
3, 5, NA,
NA, 9, NA)
t[is.na(t[["value1"]]), "value1"] <- 100
Obviously, since "value1" is a string, it can be assigned to a variable and changed without any issue.
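To illustrate the point about keeping the column name in a variable, here is a small base R sketch that loops over two column names and applies the same replacement to each. The data frame `df` just mirrors the example data, and 100 is the same arbitrary fill value as above.

```r
df <- data.frame(
  id     = c(1, 2, 3, NA),
  value1 = c(NA, NA, 5, 9),
  value2 = c(2, 5, NA, NA)
)

# The column name lives in `col`, so the same line works for any column
for (col in c("value1", "value2")) {
  df[is.na(df[[col]]), col] <- 100
}
df
```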
Many thanks for your help!
I'm still struggling a little with implementing the above on my data set.
Apologies for not providing the data set with the question. It is in a CSV file, so unfortunately I can't upload it; instead I've included a section of it below:
,X,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
1,1,5.1,3.5,1.4,0.2,setosa
2,2,4.9,3,1.4,0.2,setosa
3,3,4.7,3.2,1.3,0.2,setosa
4,4,4.6,3.1,1.5,0.2,setosa
5,5,5,3.6,1.4,0.2,setosa
6,6,5.4,3.9,1.7,0.4,setosa
7,7,4.6,3.4,1.4,0.3,setosa
8,8,5,3.4,1.5,0.2,setosa
9,9,4.4,2.9,1.4,0.2,setosa
10,10,4.9,3.1,1.5,0.1,setosa
11,11,5.4,NA,1.5,0.2,setosa
12,12,4.8,NA,1.6,0.2,setosa
13,13,4.8,3,1.4,0.1,setosa
14,14,4.3,3,1.1,0.1,setosa
15,15,5.8,4,1.2,0.2,setosa
16,16,5.7,4.4,1.5,0.4,setosa
17,17,5.4,3.9,1.3,0.4,setosa
18,18,5.1,3.5,1.4,0.3,setosa
19,19,5.7,3.8,1.7,0.3,setosa
20,20,5.1,3.8,1.5,0.3,setosa
21,21,5.4,3.4,1.7,0.2,setosa
22,22,5.1,3.7,1.5,0.4,setosa
23,23,4.6,NA,1,0.2,setosa
24,24,5.1,3.3,1.7,0.5,setosa
25,25,4.8,3.4,1.9,0.2,setosa
26,26,5,3,1.6,0.2,setosa
27,27,5,3.4,1.6,0.4,setosa
28,28,5.2,3.5,1.5,0.2,setosa
29,29,5.2,3.4,1.4,0.2,setosa
30,30,4.7,3.2,1.6,0.2,setosa
31,31,4.8,3.1,1.6,0.2,setosa
32,32,5.4,3.4,1.5,0.4,setosa
33,33,5.2,NA,1.5,0.1,setosa
34,34,5.5,4.2,1.4,0.2,setosa
35,35,4.9,3.1,1.5,0.2,setosa
36,36,5,3.2,1.2,0.2,setosa
37,37,5.5,3.5,1.3,0.2,setosa
38,38,4.9,3.6,1.4,0.1,setosa
39,39,4.4,3,1.3,0.2,setosa
40,40,5.1,NA,1.5,0.2,setosa
41,41,5,3.5,1.3,0.3,setosa
42,42,4.5,2.3,1.3,0.3,setosa
43,43,4.4,3.2,1.3,0.2,setosa
44,44,5,3.5,1.6,0.6,setosa
45,45,5.1,3.8,1.9,0.4,setosa
46,46,4.8,3,1.4,0.3,setosa
47,47,5.1,NA,1.6,0.2,setosa
48,48,4.6,3.2,1.4,0.2,setosa
49,49,5.3,3.7,1.5,0.2,setosa
50,50,5,3.3,1.4,0.2,setosa
51,51,7,3.2,4.7,1.4,versicolor
52,52,6.4,3.2,4.5,1.5,versicolor
53,53,6.9,3.1,4.9,1.5,versicolor
54,54,5.5,NA,4,1.3,versicolor
55,55,6.5,2.8,4.6,1.5,versicolor
56,56,5.7,2.8,4.5,1.3,versicolor
57,57,6.3,3.3,4.7,1.6,versicolor
58,58,4.9,2.4,3.3,1,versicolor
59,59,6.6,2.9,4.6,1.3,versicolor
60,60,5.2,2.7,3.9,1.4,versicolor
61,61,5,2,3.5,1,versicolor
62,62,5.9,3,4.2,1.5,versicolor
63,63,6,NA,4,1,versicolor
64,64,6.1,2.9,4.7,1.4,versicolor
65,65,5.6,2.9,3.6,1.3,versicolor
66,66,6.7,3.1,4.4,1.4,versicolor
67,67,5.6,3,4.5,1.5,versicolor
68,68,5.8,2.7,4.1,1,versicolor
69,69,6.2,2.2,4.5,1.5,versicolor
70,70,5.6,2.5,3.9,1.1,versicolor
71,71,5.9,3.2,4.8,1.8,versicolor
72,72,6.1,2.8,4,1.3,versicolor
73,73,6.3,2.5,4.9,1.5,versicolor
74,74,6.1,2.8,4.7,1.2,versicolor
75,75,6.4,2.9,4.3,1.3,versicolor
76,76,6.6,3,4.4,1.4,versicolor
77,77,6.8,2.8,4.8,1.4,versicolor
78,78,6.7,3,5,1.7,versicolor
79,79,6,2.9,4.5,1.5,versicolor
80,80,5.7,2.6,3.5,1,versicolor
81,81,5.5,2.4,3.8,1.1,versicolor
82,82,5.5,2.4,3.7,1,versicolor
83,83,5.8,NA,3.9,1.2,versicolor
84,84,6,2.7,5.1,1.6,versicolor
85,85,5.4,3,4.5,1.5,versicolor
86,86,6,3.4,4.5,1.6,versicolor
87,87,6.7,3.1,4.7,1.5,versicolor
88,88,6.3,NA,4.4,1.3,versicolor
89,89,5.6,3,4.1,1.3,versicolor
90,90,5.5,2.5,4,1.3,versicolor
91,91,5.5,2.6,4.4,1.2,versicolor
92,92,6.1,3,4.6,1.4,versicolor
93,93,5.8,2.6,4,1.2,versicolor
94,94,5,2.3,3.3,1,versicolor
95,95,5.6,2.7,4.2,1.3,versicolor
96,96,5.7,3,4.2,1.2,versicolor
97,97,5.7,2.9,4.2,1.3,versicolor
98,98,6.2,2.9,4.3,1.3,versicolor
99,99,5.1,2.5,3,1.1,versicolor
100,100,5.7,2.8,4.1,1.3,versicolor
101,101,6.3,3.3,6,2.5,virginica
102,102,5.8,NA,5.1,1.9,virginica
103,103,7.1,3,5.9,2.1,virginica
104,104,6.3,2.9,5.6,1.8,virginica
I'm also trying to find a way that does use libraries.
Thank you again!
It's still not clear what it is that you want to do. What do you mean by "missing data"? There are a number of examples in this thread for replacing NAs.
Your example data, as is, has 7 columns but only 6 column headings. The header row has a leading comma with no value preceding it... just a typo? Or is something supposed to be there?
Your data looks like it was derived from datasets::iris, was it?
You should include an example of where the missing data is in your example and a criterion you might use to determine it is missing and what you would replace it with.
There are many people here who want to help you but your question, as is, is just too general. A specific example would do a lot to help us to help you.
t <- tibble::rowid_to_column(datasets::iris)
t
#> rowid Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 1 5.1 3.5 1.4 0.2 setosa
#> 2 2 4.9 3.0 1.4 0.2 setosa
#> 3 3 4.7 3.2 1.3 0.2 setosa
#> 4 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 5.0 3.6 1.4 0.2 setosa
#> 6 6 5.4 3.9 1.7 0.4 setosa
#> 7 7 4.6 3.4 1.4 0.3 setosa
#> 8 8 5.0 3.4 1.5 0.2 setosa
#> 9 9 4.4 2.9 1.4 0.2 setosa
#> 10 10 4.9 3.1 1.5 0.1 setosa
#> 11 11 5.4 3.7 1.5 0.2 setosa
#> 12 12 4.8 3.4 1.6 0.2 setosa
#> 13 13 4.8 3.0 1.4 0.1 setosa
#> 14 14 4.3 3.0 1.1 0.1 setosa
#> 15 15 5.8 4.0 1.2 0.2 setosa
#> 16 16 5.7 4.4 1.5 0.4 setosa
#> 17 17 5.4 3.9 1.3 0.4 setosa
#> 18 18 5.1 3.5 1.4 0.3 setosa
#> 19 19 5.7 3.8 1.7 0.3 setosa
#> 20 20 5.1 3.8 1.5 0.3 setosa
#> 21 21 5.4 3.4 1.7 0.2 setosa
#> 22 22 5.1 3.7 1.5 0.4 setosa
#> 23 23 4.6 3.6 1.0 0.2 setosa
#> 24 24 5.1 3.3 1.7 0.5 setosa
#> 25 25 4.8 3.4 1.9 0.2 setosa
#> 26 26 5.0 3.0 1.6 0.2 setosa
#> 27 27 5.0 3.4 1.6 0.4 setosa
#> 28 28 5.2 3.5 1.5 0.2 setosa
#> 29 29 5.2 3.4 1.4 0.2 setosa
#> 30 30 4.7 3.2 1.6 0.2 setosa
#> 31 31 4.8 3.1 1.6 0.2 setosa
#> 32 32 5.4 3.4 1.5 0.4 setosa
#> 33 33 5.2 4.1 1.5 0.1 setosa
#> 34 34 5.5 4.2 1.4 0.2 setosa
#> 35 35 4.9 3.1 1.5 0.2 setosa
#> 36 36 5.0 3.2 1.2 0.2 setosa
#> 37 37 5.5 3.5 1.3 0.2 setosa
#> 38 38 4.9 3.6 1.4 0.1 setosa
#> 39 39 4.4 3.0 1.3 0.2 setosa
#> 40 40 5.1 3.4 1.5 0.2 setosa
#> 41 41 5.0 3.5 1.3 0.3 setosa
#> 42 42 4.5 2.3 1.3 0.3 setosa
#> 43 43 4.4 3.2 1.3 0.2 setosa
#> 44 44 5.0 3.5 1.6 0.6 setosa
#> 45 45 5.1 3.8 1.9 0.4 setosa
#> 46 46 4.8 3.0 1.4 0.3 setosa
#> 47 47 5.1 3.8 1.6 0.2 setosa
#> 48 48 4.6 3.2 1.4 0.2 setosa
#> 49 49 5.3 3.7 1.5 0.2 setosa
#> 50 50 5.0 3.3 1.4 0.2 setosa
#> 51 51 7.0 3.2 4.7 1.4 versicolor
#> 52 52 6.4 3.2 4.5 1.5 versicolor
#> 53 53 6.9 3.1 4.9 1.5 versicolor
#> 54 54 5.5 2.3 4.0 1.3 versicolor
#> 55 55 6.5 2.8 4.6 1.5 versicolor
#> 56 56 5.7 2.8 4.5 1.3 versicolor
#> 57 57 6.3 3.3 4.7 1.6 versicolor
#> 58 58 4.9 2.4 3.3 1.0 versicolor
#> 59 59 6.6 2.9 4.6 1.3 versicolor
#> 60 60 5.2 2.7 3.9 1.4 versicolor
#> 61 61 5.0 2.0 3.5 1.0 versicolor
#> 62 62 5.9 3.0 4.2 1.5 versicolor
#> 63 63 6.0 2.2 4.0 1.0 versicolor
#> 64 64 6.1 2.9 4.7 1.4 versicolor
#> 65 65 5.6 2.9 3.6 1.3 versicolor
#> 66 66 6.7 3.1 4.4 1.4 versicolor
#> 67 67 5.6 3.0 4.5 1.5 versicolor
#> 68 68 5.8 2.7 4.1 1.0 versicolor
#> 69 69 6.2 2.2 4.5 1.5 versicolor
#> 70 70 5.6 2.5 3.9 1.1 versicolor
#> 71 71 5.9 3.2 4.8 1.8 versicolor
#> 72 72 6.1 2.8 4.0 1.3 versicolor
#> 73 73 6.3 2.5 4.9 1.5 versicolor
#> 74 74 6.1 2.8 4.7 1.2 versicolor
#> 75 75 6.4 2.9 4.3 1.3 versicolor
#> 76 76 6.6 3.0 4.4 1.4 versicolor
#> 77 77 6.8 2.8 4.8 1.4 versicolor
#> 78 78 6.7 3.0 5.0 1.7 versicolor
#> 79 79 6.0 2.9 4.5 1.5 versicolor
#> 80 80 5.7 2.6 3.5 1.0 versicolor
#> 81 81 5.5 2.4 3.8 1.1 versicolor
#> 82 82 5.5 2.4 3.7 1.0 versicolor
#> 83 83 5.8 2.7 3.9 1.2 versicolor
#> 84 84 6.0 2.7 5.1 1.6 versicolor
#> 85 85 5.4 3.0 4.5 1.5 versicolor
#> 86 86 6.0 3.4 4.5 1.6 versicolor
#> 87 87 6.7 3.1 4.7 1.5 versicolor
#> 88 88 6.3 2.3 4.4 1.3 versicolor
#> 89 89 5.6 3.0 4.1 1.3 versicolor
#> 90 90 5.5 2.5 4.0 1.3 versicolor
#> 91 91 5.5 2.6 4.4 1.2 versicolor
#> 92 92 6.1 3.0 4.6 1.4 versicolor
#> 93 93 5.8 2.6 4.0 1.2 versicolor
#> 94 94 5.0 2.3 3.3 1.0 versicolor
#> 95 95 5.6 2.7 4.2 1.3 versicolor
#> 96 96 5.7 3.0 4.2 1.2 versicolor
#> 97 97 5.7 2.9 4.2 1.3 versicolor
#> 98 98 6.2 2.9 4.3 1.3 versicolor
#> 99 99 5.1 2.5 3.0 1.1 versicolor
#> 100 100 5.7 2.8 4.1 1.3 versicolor
#> 101 101 6.3 3.3 6.0 2.5 virginica
#> 102 102 5.8 2.7 5.1 1.9 virginica
#> 103 103 7.1 3.0 5.9 2.1 virginica
#> 104 104 6.3 2.9 5.6 1.8 virginica
#> 105 105 6.5 3.0 5.8 2.2 virginica
#> 106 106 7.6 3.0 6.6 2.1 virginica
#> 107 107 4.9 2.5 4.5 1.7 virginica
#> 108 108 7.3 2.9 6.3 1.8 virginica
#> 109 109 6.7 2.5 5.8 1.8 virginica
#> 110 110 7.2 3.6 6.1 2.5 virginica
#> 111 111 6.5 3.2 5.1 2.0 virginica
#> 112 112 6.4 2.7 5.3 1.9 virginica
#> 113 113 6.8 3.0 5.5 2.1 virginica
#> 114 114 5.7 2.5 5.0 2.0 virginica
#> 115 115 5.8 2.8 5.1 2.4 virginica
#> 116 116 6.4 3.2 5.3 2.3 virginica
#> 117 117 6.5 3.0 5.5 1.8 virginica
#> 118 118 7.7 3.8 6.7 2.2 virginica
#> 119 119 7.7 2.6 6.9 2.3 virginica
#> 120 120 6.0 2.2 5.0 1.5 virginica
#> 121 121 6.9 3.2 5.7 2.3 virginica
#> 122 122 5.6 2.8 4.9 2.0 virginica
#> 123 123 7.7 2.8 6.7 2.0 virginica
#> 124 124 6.3 2.7 4.9 1.8 virginica
#> 125 125 6.7 3.3 5.7 2.1 virginica
#> 126 126 7.2 3.2 6.0 1.8 virginica
#> 127 127 6.2 2.8 4.8 1.8 virginica
#> 128 128 6.1 3.0 4.9 1.8 virginica
#> 129 129 6.4 2.8 5.6 2.1 virginica
#> 130 130 7.2 3.0 5.8 1.6 virginica
#> 131 131 7.4 2.8 6.1 1.9 virginica
#> 132 132 7.9 3.8 6.4 2.0 virginica
#> 133 133 6.4 2.8 5.6 2.2 virginica
#> 134 134 6.3 2.8 5.1 1.5 virginica
#> 135 135 6.1 2.6 5.6 1.4 virginica
#> 136 136 7.7 3.0 6.1 2.3 virginica
#> 137 137 6.3 3.4 5.6 2.4 virginica
#> 138 138 6.4 3.1 5.5 1.8 virginica
#> 139 139 6.0 3.0 4.8 1.8 virginica
#> 140 140 6.9 3.1 5.4 2.1 virginica
#> 141 141 6.7 3.1 5.6 2.4 virginica
#> 142 142 6.9 3.1 5.1 2.3 virginica
#> 143 143 5.8 2.7 5.1 1.9 virginica
#> 144 144 6.8 3.2 5.9 2.3 virginica
#> 145 145 6.7 3.3 5.7 2.5 virginica
#> 146 146 6.7 3.0 5.2 2.3 virginica
#> 147 147 6.3 2.5 5.0 1.9 virginica
#> 148 148 6.5 3.0 5.2 2.0 virginica
#> 149 149 6.2 3.4 5.4 2.3 virginica
#> 150 150 5.9 3.0 5.1 1.8 virginica
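If the goal is simply to fill the NA entries in Sepal.Width of the posted data, here is a rough base R sketch. It is only a guess at what is wanted: it assumes the replacement should be the mean of the non-missing Sepal.Width values, and it uses just five rows copied from the posted CSV (rows 9 to 13) so that it runs on its own. With the real file, `read.csv()` would be pointed at the file path instead of `text =`.

```r
# A few rows copied from the posted CSV, including its header line
csv_text <- ",X,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
9,9,4.4,2.9,1.4,0.2,setosa
10,10,4.9,3.1,1.5,0.1,setosa
11,11,5.4,NA,1.5,0.2,setosa
12,12,4.8,NA,1.6,0.2,setosa
13,13,4.8,3,1.4,0.1,setosa"

# read.csv() treats the string "NA" as a missing value by default
flowers <- read.csv(text = csv_text)

# Assumption: replace missing widths with the mean of the observed widths
missing <- is.na(flowers$Sepal.Width)
flowers$Sepal.Width[missing] <- mean(flowers$Sepal.Width, na.rm = TRUE)
flowers
```

The unnamed first column is probably the row names that `write.csv()` adds by default; writing the file with `row.names = FALSE` avoids it.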