Hi all!
I registered to ask this question, and this is also my first time trying out reprex(). I am curating data and need the code to be as easy as possible for others to understand.
I have code that does transform the data as intended, but I'm hoping it can be improved (see below).
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 3.5.3
#> Warning: package 'tibble' was built under R version 3.5.3
#> Warning: package 'tidyr' was built under R version 3.5.3
#> Warning: package 'purrr' was built under R version 3.5.3
I have categorical data (here: fruit_categorical) derived from a picklist that includes one “diverse” category, which allows a free-text specification. The specifications are stored in an additional variable (here: fruit_diverse). For manually selected strings in fruit_diverse, I want to “categorize” the string, i.e. move it into fruit_categorical and set fruit_diverse to NA.
Is there a way to mutate several variables based on the same condition?
example data
fruits <- data.frame(fruit_categorical = c("apple", "banana", "diverse", "diverse"),
fruit_diverse = c(NA_character_, NA_character_, "kiwi", "red kiwi"),
stringsAsFactors = FALSE)
fruits
#> fruit_categorical fruit_diverse
#> 1 apple <NA>
#> 2 banana <NA>
#> 3 diverse kiwi
#> 4 diverse red kiwi
wanted result
data.frame(fruit_categorical = c("apple", "banana", "kiwi", "diverse"),
fruit_diverse = c(NA_character_, NA_character_, NA_character_, "red kiwi"),
stringsAsFactors = FALSE)
#> fruit_categorical fruit_diverse
#> 1 apple <NA>
#> 2 banana <NA>
#> 3 kiwi <NA>
#> 4 diverse red kiwi
Wanted solution: something like a mutate_if that works on rows. This would make it very easy for somebody else to understand what is going on (the attempt below does not run):
fruits %>%
if (str_detect("^kiwi|^mango", fruit_diverse)) mutate(
fruit_categorical = fruit_diverse,
fruit_diverse = NA_character_
)
#> Warning in if (.) str_detect("^kiwi|^mango", fruit_diverse) else
#> mutate(fruit_categorical = fruit_diverse, : the condition has length > 1
#> and only the first element will be used
#> Error in if (.) str_detect("^kiwi|^mango", fruit_diverse) else mutate(fruit_categorical = fruit_diverse, : argument is not interpretable as logical
Best solution I can come up with. This is what I am currently using and would like to simplify:
fruits %>%
mutate(
defined_categorical = !is.na(fruit_diverse) & str_detect(fruit_diverse, "^kiwi|^mango"),
fruit_categorical = if_else(defined_categorical, fruit_diverse, fruit_categorical),
fruit_diverse = if_else(defined_categorical, NA_character_, fruit_diverse)
) %>%
select(-defined_categorical)
#> fruit_categorical fruit_diverse
#> 1 apple <NA>
#> 2 banana <NA>
#> 3 kiwi <NA>
#> 4 diverse red kiwi
Also not wanted, since it seems much more difficult to comprehend:
defined_regex <- "^kiwi|^mango"
fruits %>%
mutate(
fruit_categorical = if_else(!is.na(fruit_diverse) & str_detect(fruit_diverse, defined_regex), fruit_diverse, fruit_categorical),
# fruit_categorical was already updated on the line above, so it is tested here instead of fruit_diverse
fruit_diverse = if_else(str_detect(fruit_categorical, defined_regex), NA_character_, fruit_diverse)
)
#> fruit_categorical fruit_diverse
#> 1 apple <NA>
#> 2 banana <NA>
#> 3 kiwi <NA>
#> 4 diverse red kiwi
Created on 2019-04-15 by the reprex package (v0.2.1)
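For comparison, here is a minimal base R sketch of the "one condition, several columns" idea described above. It is not part of the reprex and is only one possible direction. It swaps str_detect() for base grepl(), and the name recategorize is a hypothetical choice for illustration. The condition is computed once as a logical vector and then used to assign into both columns for the matching rows only.

```r
fruits <- data.frame(
  fruit_categorical = c("apple", "banana", "diverse", "diverse"),
  fruit_diverse = c(NA_character_, NA_character_, "kiwi", "red kiwi"),
  stringsAsFactors = FALSE
)

# One logical vector marks the rows to recategorize (hypothetical name).
# grepl() returns FALSE rather than NA for NA input, so no separate
# is.na() check is needed here.
recategorize <- grepl("^kiwi|^mango", fruits$fruit_diverse)

# Apply the same condition to both columns: copy the string over first,
# then blank out the free-text column for those rows.
fruits$fruit_categorical[recategorize] <- fruits$fruit_diverse[recategorize]
fruits$fruit_diverse[recategorize] <- NA_character_

fruits
```

The order of the two assignments matters: fruit_diverse has to be read before it is overwritten with NA. Whether this is easier for others to read than the if_else() version is a judgment call, since it leaves the pipe.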