Trying to understand the factor function and ordering

# Temperature
temperature_vector <- c("High", "Low", "High","Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, order = TRUE, levels = c("Low", "Medium", "High"))

I looked up the factor function on RDocumentation and I'm confused: I don't see an "order" argument there, only an "ordered" argument.

There are two kinds of "ordering" going on with the factor function. Let's start with this example:

temp_vec <- c("High", "Low", "High", "Low", "Medium")
temp_vec <- factor(temp_vec, levels = c("Low", "Medium", "High"))

class(temp_vec)
[1] "factor"

sort(temp_vec)
[1] Low    Low    Medium High   High  
Levels: Low Medium High

max(temp_vec)
Error in Summary.factor(c(3L, 1L, 3L, 1L, 2L), na.rm = FALSE) : 
  ‘max’ not meaningful for factors

temp_vec is now a factor (that is, its class is "factor"). Also, its levels have the order you gave it with the levels argument of the factor function. If you sort temp_vec it will be sorted in the order of the levels (so you can use factor to set a sorting order that is different from alphabetical). And if you create a regression model using temp_vec, the first level will be treated as the reference level.
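As a rough illustration of the two effects just described, the sketch below compares sorting the raw character vector with sorting the factor, and uses model.matrix to show which level ends up as the reference. The names raw_temp and temp_fac are made up for this example, and the reference-level behaviour assumes R's default treatment contrasts.

```r
# Illustrative names (not from the original post)
raw_temp <- c("High", "Low", "High", "Low", "Medium")
temp_fac <- factor(raw_temp, levels = c("Low", "Medium", "High"))

# Character vectors sort alphabetically; factors sort by level order
alpha_sorted <- sort(raw_temp)
level_sorted <- as.character(sort(temp_fac))
print(alpha_sorted)
print(level_sorted)

# With the default treatment contrasts, the first level ("Low") gets no
# dummy column of its own, which is what makes it the reference level
design_cols <- colnames(model.matrix(~ temp_fac))
print(design_cols)
```

Changing the order of the levels argument changes both the sort order and which level the model uses as its baseline.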

But note in the example above that R does not treat any level as less than or greater than another: max fails because the factor is unordered. That is, temp_vec is not an "ordinal" variable. As far as R is concerned it has three categories with no ordering in terms of "magnitude". With factor we've only set the order used for sorting and display.
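A small sketch of the same point, assuming a freshly built unordered factor (temp_fac is a name chosen for this example): the level order is stored as integer codes, but magnitude comparisons are not defined.

```r
# Illustrative name (not from the original post)
temp_fac <- factor(c("High", "Low", "High", "Low", "Medium"),
                   levels = c("Low", "Medium", "High"))

# For a plain factor, `<` is not meaningful: R warns and returns NA
cmp <- suppressWarnings(temp_fac[2] < temp_fac[1])
print(cmp)

# The level order still exists, but only as integer codes used for sorting
codes <- as.integer(temp_fac)
print(codes)

# is.ordered() confirms that this is not an ordinal variable
print(is.ordered(temp_fac))
```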

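To give temp_vec an order in terms of magnitude, we turn it into an ordered factor. Either of these will work (note that the argument is named ordered, not order; order = TRUE in your code works only because R partially matches argument names, so order is taken as an abbreviation of ordered):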

temp_vec <- factor(temp_vec, levels = c("Low", "Medium", "High"), ordered = TRUE)
temp_vec <- ordered(temp_vec, levels = c("Low", "Medium", "High"))

class(temp_vec)
[1] "ordered" "factor" 

max(temp_vec)
[1] High
Levels: Low < Medium < High

Note how the levels now have a magnitude order, with "Low" less than "Medium" and "Medium" less than "High". An ordered factor behaves differently from an unordered one: comparisons such as < and functions such as max now work, and R's modeling functions (such as lm or glm) by default code an ordered factor with polynomial contrasts (contr.poly) instead of the treatment contrasts used for unordered factors.
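To tie this back to the original question, here is a minimal sketch (variable names are invented for illustration) checking that order = TRUE and ordered = TRUE produce the same object, since factor has no order argument and R resolves the abbreviation by partial matching. It also shows comparisons and the default contrasts for an ordered factor, assuming the default contrasts option has not been changed.

```r
# Illustrative names (not from the original post)
raw_temp <- c("High", "Low", "High", "Low", "Medium")
lvls <- c("Low", "Medium", "High")

# `order` is not a formal argument of factor(); it partially matches `ordered`
via_order   <- factor(raw_temp, order = TRUE, levels = lvls)
via_ordered <- factor(raw_temp, ordered = TRUE, levels = lvls)
same_object <- identical(via_order, via_ordered)
print(same_object)

# Magnitude comparisons and max() are defined for ordered factors
below_high <- via_ordered < "High"
top_level  <- as.character(max(via_ordered))
print(below_high)
print(top_level)

# Ordered factors get polynomial contrasts by default (linear, quadratic)
contrast_cols <- colnames(contrasts(via_ordered))
print(contrast_cols)
```

Spelling the argument out as ordered = TRUE is the safer habit, since partial matching can become ambiguous if a function later gains another argument starting with the same letters.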

