Changing the order of factor levels in tidyverse

Hey everyone, I have two examples of code where I have a column with the factor datatype in the "gapminder" data set, and I'm trying to change the order of factor levels within that column.

In my first example, everything works fine. However, in my second example, when I don't assign the result to a new data frame, the order of my factor levels remains the same as it originally was: "Africa" "Americas" "Asia" "Europe" "Oceania". Why is that? Because I am a beginner to R, I would appreciate it if someone could explain it in very simple terms. Thank you.

1st Syntax Example:

hey <- gapminder %>%
  mutate(continent = factor(continent,
                            levels = c("Asia","Americas","Oceania","Europe","Africa")))

levels(hey$continent)

2nd Syntax Example:

gapminder %>%
  mutate(continent = factor(continent,
                            levels = c("Asia","Americas","Oceania","Europe","Africa")))

levels(gapminder$continent)

Functions in R take in arguments and return a value. If you don't store the returned value, it is printed to the console but not saved. In your second example,

gapminder %>%
  mutate(continent = factor(continent,
                            levels = c("Asia","Americas","Oceania","Europe","Africa")))

gapminder is passed to the mutate function, which returns a new data frame with the reordered levels in the continent column. That result is printed but not saved, and the original gapminder is left unchanged. Running

gapminder <- gapminder %>%
  mutate(continent = factor(continent,
                            levels = c("Asia","Americas","Oceania","Europe","Africa")))

would overwrite the original data frame. Or you could store the result in a new data frame.

NewDF <- gapminder %>%
  mutate(continent = factor(continent,
                            levels = c("Asia","Americas","Oceania","Europe","Africa")))
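
Here is a minimal sketch of the same idea in base R, so it runs without any extra packages. It assumes a small made-up data frame in place of gapminder, and it uses base transform() instead of mutate(). The names df, df2 and new_levels are only for illustration and are not from the thread.

```r
# Toy data frame (hypothetical stand-in for gapminder)
df <- data.frame(continent = factor(c("Africa", "Asia", "Europe")))
new_levels <- c("Europe", "Asia", "Africa")

# The result is printed to the console but not stored, so df is untouched
transform(df, continent = factor(continent, levels = new_levels))
levels(df$continent)
#> [1] "Africa" "Asia"   "Europe"

# Storing the result under a name keeps the reordered levels
df2 <- transform(df, continent = factor(continent, levels = new_levels))
levels(df2$continent)
#> [1] "Europe" "Asia"   "Africa"
```

The first call does compute the reordered factor, but nothing holds on to it after it is printed.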

Thank you! That makes sense. I do have a question though: why does this concept not apply to something like the code below? When I execute my code, I see that the hair_color column changes to a factor data type as I wanted, even though I did not assign the result to another data frame. Is it because we are not altering the actual data in the column, only the data type of the column?

starwars %>%
mutate(hair_color = as.factor(hair_color))

What you see is the class of the column changing in what is printed to the console. If you check the class of the column in the starwars data frame, it is still character.

library(dplyr)

data("starwars")
class(starwars$hair_color)
#> [1] "character"
starwars %>% mutate(hair_color = as.factor(hair_color))
#> # A tibble: 87 × 14
#>    name     height  mass hair_color skin_color eye_color birth_year sex   gender
#>    <chr>     <int> <dbl> <fct>      <chr>      <chr>          <dbl> <chr> <chr> 
#>  1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
#>  2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
#>  3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
#>  4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
#>  5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
#>  6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
#>  7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
#>  8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
#>  9 Biggs D…    183    84 black      light      brown           24   male  mascu…
#> 10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
#> # ℹ 77 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

#check the class
class(starwars$hair_color)
#> [1] "character"

starwars <- starwars %>% mutate(hair_color = as.factor(hair_color))
class(starwars$hair_color)
#> [1] "factor"

Created on 2024-01-19 with reprex v2.0.2

So what I'm getting from this is when we don't assign our original data frame into another data frame, even though we see our result in the console as "factor" for the hair_color column, our code did not actually change that column to factors because our resulting data frame was not saved/updated, and simply printed into the console. We must insert our data frame into another data frame in order for the column's class/data type to TRULY change. Would you agree with this summary of mine? Thank you for the constant feedback by the way! Much appreciated.

Your description is basically correct. It would be a little more correct to say that you assign a name to the result of the function. So,

starwars <- starwars %>% mutate(hair_color = as.factor(hair_color))

runs the right side on starwars and, whatever the result (other than an error), R assigns the name starwars to it. The name starwars is on both sides of the assignment <-, but R waits for the right side to finish and then names the result starwars. The result could be just a single number, and starwars would then no longer be a data frame. The name on the left side could be one that does not exist earlier in the code, or it could have been assigned earlier, in which case it now refers to the result of the most recent assignment. R is very open-minded about what gets assigned to a name. Not all languages are like that.
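
To illustrate the point about names, here is a small base R sketch. It assumes a made-up data frame called d, which is not from the thread. The same name is first reassigned to a sorted data frame and then to a single number.

```r
# Hypothetical toy data frame
d <- data.frame(a = c(3, 1, 2))

# The right side is evaluated first using the old d,
# then the name d is attached to the sorted result
d <- d[order(d$a), , drop = FALSE]
is.data.frame(d)
#> [1] TRUE

# Nothing stops the same name from being reused for a single number
d <- nrow(d)
is.data.frame(d)
#> [1] FALSE
d
#> [1] 3
```

After the last assignment, the data frame that d used to refer to is no longer reachable by that name.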
