Hi all,
I recently got into a problem with tidying up data after scraping a table off the web.
The basic problem is illustrated in the test_df
test_df <- tibble(
x = c(NA, NA, "foobar", "foobar"),
y = c("foobar", "foobar", "bar","bar"),
z = c("bar", "foo", NA, NA)
)
# A tibble: 4 x 3
x y z
<chr> <chr> <chr>
1 NA foobar bar
2 NA foobar foo
3 foobar bar NA
4 foobar bar NA
I basically needed to add a new column named "foobar" with the values which occur right next to the "foobar" entry.
The desired output is:
# A tibble: 4 x 4
x y z foobar
<chr> <chr> <chr> <chr>
1 NA foobar bar bar
2 NA foobar foo foo
3 foobar bar NA bar
4 foobar bar NA bar
I managed to do this with the following function:
select_row <- function(df, term){
ifelse(!is.na(df$x) & df$x == term, df$y,
ifelse(!is.na(df$y) & df$y == term, df$z, NA))}
and then using mutate
from dplyr:
test_df %>% mutate(foobar = select_row(., "foobar"))
This gives the desired output, but the function select_row()
is not pretty. Firstly, the column names are hard-coded. Secondly, if the data frame is large and we expect the value pair to be in any column, the ifelse statement in the function would become really large as well.
Does anybody have an idea how this function could be written in a more elegant and reusable way, i.e. without a large ifelse-statement and without having the column names hard-coded.
Any input would be highly appreciated.