Hi,
I want to recode multiple fields, e.g. gender, marital status and education, to numeric codes, e.g. male = 1, female = 2, and so on.
convert <-function(x,y,z)
{
if(x$y==z)
{x$y=as.factor(1)}
else{
x$y=as.factor(2)
}
}
train$Gender<-convert(train,Gender,"Male")
But this throws the error "argument is of length zero", which I believe is because the Gender column is character. Can someone help out?
Thanks in advance.
Frank
December 11, 2017, 8:08pm
The function has a few issues:
It needs to return something at the end (probably x). See ?return.
We cannot write x$y with y passed in programmatically, unfortunately. Instead, x[[y]] <- should work, where y is the column name given as a quoted string.
if and else are for control flow (determining which pieces of code get run), while you want this to apply in a vectorized way to each element of x$y.
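Putting those three points together, a minimal sketch of a corrected function might look like the following. The small train data frame here is a hypothetical stand-in for your data, and ifelse() is my choice for the vectorized test; other approaches would work too.

```r
# Hypothetical stand-in for the poster's `train` data
train <- data.frame(
  Gender = c("Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# x: data frame, y: column name as a quoted string, z: value to code as 1
convert <- function(x, y, z) {
  # ifelse() is vectorized, so every element of the column is tested
  x[[y]] <- factor(ifelse(x[[y]] == z, 1, 2), levels = c(1, 2))
  x  # return the modified data frame
}

# The function returns the whole data frame, so assign back to train
train <- convert(train, "Gender", "Male")
train$Gender
```

Note that the result is assigned to train rather than train$Gender, and that the column name is passed as "Gender" in quotes.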
I'd suggest approaching this with a lookup table and a left join. The tidyverse way is shown here:
Does anyone have any recommendations for efficiently recoding data in a tibble?
I regularly work with coded values that need to be converted into human-readable names. I like the explicitness that dplyr::recode() provides, but it becomes cumbersome if there are many different coded values and it can be slow with large datasets (see the benchmark() tests below).
An alternative I sometimes use is to create a named vector of the human-readable values and assign the code values to the names attribute. This works well if there's a 1:1 relationship between codes and human-readable values and it's faster than the recode method, but it's still slow enough that it has become a bottleneck for my dai…
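Applied to the gender example, the lookup table and left join idea could be sketched roughly as below. The names gender_lookup, Gender_code and train_coded are made up for illustration, and the data is a hypothetical stand-in for train.

```r
library(dplyr)

# Hypothetical example data
train <- tibble(Gender = c("Male", "Female", "Male"))

# Lookup table: one row per original value and its numeric code
gender_lookup <- tibble(
  Gender      = c("Male", "Female"),
  Gender_code = c(1, 2)
)

# left_join() keeps every row of train and adds the matching code;
# values missing from the lookup table get NA
train_coded <- left_join(train, gender_lookup, by = "Gender")
```

The same pattern extends to marital status or education by building one lookup table per field and joining each in turn.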
This also looks related, using case_when as a substitute for if/else:
Sometimes I feel I use a Tidyverse approach, but not the right one or perhaps a non Tidyverse process altogether is better. Here is an example of such a situation:
I have some standard data, ie. mpg and cyl from mtcars. I also have some label summary statistics that says Bad, Medium or good for a level of mpg for cars with certain cyl.
Note: In the example pretend that the summary labels came from elsewhere. I'm not interested in calculating summary stats in order to label the data.
The data:
library(dplyr)
library(tibble)
library(purrr)
cars <- rownames_to_column(mtcars[1:2]) %>% as_data_frame()
mpg_label <- data_frame(
cyl_l = rep(c(4, 6, 8), each = 3),
label_l = rep(…
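For the original gender question, a case_when version might look something like this sketch. The data and the Gender_code column name are hypothetical, and the final TRUE branch is an assumption about how unmatched values should be handled.

```r
library(dplyr)

# Hypothetical example data
train <- tibble(Gender = c("Male", "Female", "Male"))

train <- train %>%
  mutate(Gender_code = case_when(
    Gender == "Male"   ~ 1,
    Gender == "Female" ~ 2,
    TRUE               ~ NA_real_  # anything else becomes NA
  ))
```

Unlike if/else, case_when evaluates each condition for every row, so no loop is needed.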