Combining the mutate() and if_else() functions

Hi everyone, I'm trying to understand why my code doesn't work with the second syntax. I'm new to R, and I don't understand why the quotation marks make a difference in my results. With the first syntax I get the result I want: any value less than 10 is labeled "short" and any value of 10 or more is labeled "tall". With the second syntax, however, every label is "tall", which is clearly not right. Why? Thank you.

First syntax:

```r
starwars %>%
  mutate(height_in_cm = height / 10) %>%
  mutate(height_evaluation = if_else(height_in_cm < 10,
                                     "short",
                                     "tall")) %>%
  View()
```

Second syntax:

```r
starwars %>%
  mutate("height in cm" = height / 10) %>%
  mutate(height_evaluation = if_else("height in cm" < 10,
                                     "short",
                                     "tall")) %>%
  View()
```

This takes some explaining, and I might not be completely precise.
The functions of the tidyverse, like mutate(), take a data frame as their first argument. Any bare (unquoted) names used in the function's other arguments are looked up first as column names in that data frame. In

```r
starwars %>%
  mutate(height_in_cm = height / 10)
```

mutate() gets the starwars data frame and looks for a column named height to do the calculation. The same happens in

```r
mutate(height_evaluation = if_else(height_in_cm < 10,
                                   "short",
                                   "tall"))
```

except that here the column, height_in_cm, is one your code created in the previous mutate() step.
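A minimal sketch of that lookup, assuming dplyr is installed. The tibble `df` and its heights are made-up values for illustration only, not the starwars data. Both bare names, `height` and `height_in_cm`, are resolved as columns of the data frame flowing through the pipe:

```r
library(dplyr)

# Hypothetical toy data, made up purely for illustration
df <- tibble(height = c(66, 96, 172, 202))

# Inside mutate(), the bare name `height` is looked up as a column of df;
# in the second mutate(), `height_in_cm` is found the same way because the
# first mutate() has already added it to the data frame
result <- df %>%
  mutate(height_in_cm = height / 10) %>%
  mutate(height_evaluation = if_else(height_in_cm < 10, "short", "tall"))

print(result)
```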
In

```r
mutate(height_evaluation = if_else("height in cm" < 10,
                                   "short",
                                   "tall"))
```

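the string "height in cm" is not treated as a column name at all but as plain text, so the condition handed to if_else() is just that text compared with 10. When a string is compared with a number, R first converts the number to the string "10" and then compares the two by sort order; digits sort before letters, so the comparison returns FALSE

```r
"height in cm" < 10
#> [1] FALSE
```

so you always get "tall". If you wrote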

```r
mutate(height_evaluation = if_else(20 < 10,
                                   "short",
                                   "tall"))
```

you would get the same result.
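A quick console check of why the condition is always FALSE. This is a sketch assuming R's usual comparison rules (the number is converted to a character string, then strings are compared by collation order) and a common locale where digits sort before letters. It also shows that a single FALSE condition gives a single "tall", which mutate() then recycles to every row:

```r
# Comparing a string with a number: R converts 10 to "10" first,
# so these two comparisons are equivalent
cmp_number <- "height in cm" < 10
cmp_string <- "height in cm" < "10"
print(c(cmp_number, cmp_string))

# Digits sort before letters, so "10" comes first
sorted <- sort(c("height in cm", "10"))
print(sorted)

# A length-1 condition gives a length-1 result; inside mutate() that
# single value is recycled to every row of the data frame
label <- dplyr::if_else("height in cm" < 10, "short", "tall")
print(label)
```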
In many functions outside of the tidyverse, column names and other names do need to be quoted. It just depends on how the function was written.
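For contrast, a small base R sketch where the column name has to be quoted. The data frame and its values are hypothetical; base::ifelse() has no data mask, so the column is pulled out explicitly with [[ ]] and a quoted name:

```r
# Hypothetical toy data frame; check.names = FALSE keeps the spaces in the name
df <- data.frame(`height in cm` = c(6.6, 17.2), check.names = FALSE)

# Base R's [[ ]] takes the column name as a quoted string
print(df[["height in cm"]] < 10)

# ifelse() does not look up bare names in df, so the column is extracted
# explicitly before the comparison
df$height_evaluation <- ifelse(df[["height in cm"]] < 10, "short", "tall")
print(df)
```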


For what it's worth, there are a number of ways you can adjust if_else("height in cm" < 10, "short", "tall") to get it to work. Here are three:

  • if_else(.data[["height in cm"]] < 10, "short", "tall")

  • if_else(get("height in cm") < 10, "short", "tall")

  • if_else(!!sym("height in cm") < 10, "short", "tall")
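A sketch that runs the three variants side by side on the starwars data, assuming a recent dplyr that re-exports `.data` and `sym()` from rlang. The object names `sw`, `fix_data`, `fix_get` and `fix_sym` are just illustrative. Rows with a missing height come out as NA in all three:

```r
library(dplyr)

# Create the column with the non-syntactic name, as in the question
sw <- starwars %>%
  mutate("height in cm" = height / 10)

# Each variant makes if_else() compare the column, not the text, with 10
fix_data <- sw %>%
  mutate(height_evaluation = if_else(.data[["height in cm"]] < 10, "short", "tall"))
fix_get <- sw %>%
  mutate(height_evaluation = if_else(get("height in cm") < 10, "short", "tall"))
fix_sym <- sw %>%
  mutate(height_evaluation = if_else(!!sym("height in cm") < 10, "short", "tall"))

# Compare the labels produced by the three variants
print(identical(fix_data$height_evaluation, fix_get$height_evaluation))
print(identical(fix_data$height_evaluation, fix_sym$height_evaluation))
print(table(fix_data$height_evaluation, useNA = "ifany"))
```

Of the three, .data[[ ]] is probably the most common choice in tidyverse code, since it states explicitly that the string names a column of the data frame.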


Got it, so the important thing for me to understand is that the quotation marks make R treat the name as text rather than as an existing column. I will make sure to keep this in mind moving forward! Thank you!

Thank you for the other possibilities around this problem! I will keep this in mind moving forward.

To follow up on the discussion, another way of explaining the behavior described by @FJCC is that the column name "height in cm" is non-syntactic: it breaks R's naming rules because it contains spaces. If you write it with backticks instead, `height in cm`, R treats it as a bare name and looks it up as a column.
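A minimal sketch of the backtick version next to the quoted-string version from the question, assuming dplyr and its bundled starwars data. The names `result` and `wrong` are only illustrative:

```r
library(dplyr)

# Backticks make `height in cm` a bare (non-syntactic) name, so mutate()
# looks it up as a column, just like height_in_cm in the first syntax
result <- starwars %>%
  mutate(`height in cm` = height / 10) %>%
  mutate(height_evaluation = if_else(`height in cm` < 10, "short", "tall"))

# The quoted-string version from the question, for comparison
wrong <- starwars %>%
  mutate("height in cm" = height / 10) %>%
  mutate(height_evaluation = if_else("height in cm" < 10, "short", "tall"))

print(table(result$height_evaluation, useNA = "ifany"))
print(table(wrong$height_evaluation, useNA = "ifany"))
```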
