Hey everyone, I have two examples of code where I have a column with the factor datatype in the "gapminder" data set, and I'm trying to change the order of factor levels within that column.
In my first example, everything works fine. However, in my second example, when I don't assign the "gapminder" data set to a made-up data frame, the order of my factor levels remains the same as it originally was: "Africa" "Americas" "Asia" "Europe" "Oceania". Why is that? Because I am a beginner to R, I would appreciate it if someone could explain it in very simple terms. Thank you.
Functions in R take in arguments and return a value. If you don't store the returned value, it is printed to the console but not saved. In your second code
Thank you! That makes sense. I do have a question though: why does this concept not apply to something like the code below? When I execute my code, I see that the hair_color column changes to a factor data type as I wanted, even though I did not assign my data frame to another data frame. Is it because we are not altering the actual data in the column, just the data type of the column?
What you see is the class of the column changing in what is printed to the console. If you check the class of the column in the starwars data frame, it is still character.
library(dplyr)
data("starwars")
class(starwars$hair_color)
#> [1] "character"
starwars %>% mutate(hair_color = as.factor(hair_color))
#> # A tibble: 87 × 14
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <fct> <chr> <chr> <dbl> <chr> <chr>
#> 1 Luke Sk… 172 77 blond fair blue 19 male mascu…
#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
#> 4 Darth V… 202 136 none white yellow 41.9 male mascu…
#> 5 Leia Or… 150 49 brown light brown 19 fema… femin…
#> 6 Owen La… 178 120 brown, gr… light blue 52 male mascu…
#> 7 Beru Wh… 165 75 brown light blue 47 fema… femin…
#> 8 R5-D4 97 32 <NA> white, red red NA none mascu…
#> 9 Biggs D… 183 84 black light brown 24 male mascu…
#> 10 Obi-Wan… 182 77 auburn, w… fair blue-gray 57 male mascu…
#> # ℹ 77 more rows
#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
# check the class
class(starwars$hair_color)
#> [1] "character"
starwars <- starwars %>% mutate(hair_color = as.factor(hair_color))
class(starwars$hair_color)
#> [1] "factor"
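The same pattern applies to the factor-level reordering from the original question. Here is a minimal sketch in base R. The vector `continent` and the ordering in `new_order` are made up for illustration and only stand in for the gapminder column; the original code may have used a different function to reorder the levels.

```r
# Hypothetical stand-in for the continent column (not the real gapminder data)
continent <- factor(c("Asia", "Europe", "Africa", "Americas", "Oceania"))

# Made-up target order, purely for illustration
new_order <- c("Oceania", "Europe", "Asia", "Americas", "Africa")

# The result is printed to the console but not stored,
# so `continent` still has its original (alphabetical) levels
factor(continent, levels = new_order)
levels_after_print <- levels(continent)

# Assigning the result back to the name makes the change stick
continent <- factor(continent, levels = new_order)
levels_after_assign <- levels(continent)
```

Comparing `levels_after_print` with `levels_after_assign` shows that only the assigned version carries the new order.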
So what I'm getting from this is when we don't assign our original data frame into another data frame, even though we see our result in the console as "factor" for the hair_color column, our code did not actually change that column to factors because our resulting data frame was not saved/updated, and simply printed into the console. We must insert our data frame into another data frame in order for the column's class/data type to TRULY change. Would you agree with this summary of mine? Thank you for the constant feedback by the way! Much appreciated.
does something to starwars and, whatever the result (other than an error), R assigns the name starwars to it. The name starwars is on both sides of the assignment <-, but R waits for the right side to finish and then names the result starwars. The result could be just a single number, and then starwars would no longer be a data frame. The name on the left side might not exist earlier in the code, or it might have been assigned earlier, in which case it now refers to the result of the most recent assignment. R is very open-minded about what gets assigned to a name. Not all languages are like that.
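A small sketch of that last point, using made-up names `x` and `y` rather than starwars: the right side is evaluated first, and the name then refers to whatever came out, even if it is a completely different kind of object.

```r
# Hypothetical example object; `x` starts out as a data frame
x <- data.frame(id = 1:3)
class_before <- class(x)

# The right side runs first, using the current `x`;
# afterwards the name `x` refers to the result, a single number
x <- nrow(x) * 2
class_after <- class(x)

# A name that did not exist earlier can be created the same way
y <- "now y exists"
```

After this runs, `class_before` is "data.frame" and `class_after` is "numeric", because `x` was simply re-pointed at the new value.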