There are two kinds of "ordering" going on with the factor function. Let's start with this example:
temp_vec <- c("High", "Low", "High", "Low", "Medium")
temp_vec <- factor(temp_vec, levels = c("Low", "Medium", "High"))
class(temp_vec)
[1] "factor"
sort(temp_vec)
[1] Low Low Medium High High
Levels: Low Medium High
max(temp_vec)
Error in Summary.factor(c(3L, 1L, 3L, 1L, 2L), na.rm = FALSE) :
‘max’ not meaningful for factors
temp_vec is now a factor (that is, its class is "factor"). Also, its levels have the order you gave them with the levels argument of the factor function. If you sort temp_vec, it will be sorted in the order of the levels (so you can use factor to set a sorting order that is different from alphabetical). And if you create a regression model using temp_vec, the first level will be treated as the reference level (under R's default treatment contrasts).
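Here is a small sketch of both effects, sort order and reference level. The vector sizes and the response y are made-up values for illustration only, and the coefficient names assume R's default contrasts option has not been changed:

```r
# Hypothetical data, for illustration only
sizes <- c("High", "Low", "High", "Low", "Medium")
y     <- c(10, 2, 11, 3, 6)

# Without a levels argument, the levels are sorted alphabetically
f_default <- factor(sizes)
levels(f_default)               # "High" "Low" "Medium"

# With explicit levels, sorting follows the order we chose
f_custom <- factor(sizes, levels = c("Low", "Medium", "High"))
as.character(sort(f_custom))    # "Low" "Low" "Medium" "High" "High"

# In a model, the first level ("Low") is absorbed into the intercept,
# so only the other two levels get their own coefficient
coef_names <- names(coef(lm(y ~ f_custom)))
coef_names                      # "(Intercept)" "f_customMedium" "f_customHigh"
```

If you only want to change the reference level, relevel(f_custom, ref = "Medium") moves one level to the front without retyping all of them.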
But note in the example above that the levels are not treated as if any level is less than or greater than another level, which is why max fails. That is, temp_vec is not an "ordinal" variable. It has three categories, but as far as R is concerned they have no ordering in terms of their "magnitude". With factor we've just changed the order for sorting purposes.
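You can see the same thing with the comparison operators. In this sketch (f and cmp are just illustrative names), equality works on a plain factor, but a less-than comparison only gives NA with a warning:

```r
f <- factor(c("High", "Low", "Medium"), levels = c("Low", "Medium", "High"))

is.ordered(f)    # FALSE

# Equality tests are fine for an unordered factor
f == "Low"       # FALSE TRUE FALSE

# Magnitude comparisons are not: R warns that '<' is not meaningful
# for factors and returns NA for every element
cmp <- suppressWarnings(f < "High")
cmp              # NA NA NA
```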
To give temp_vec an order in terms of magnitude, we turn it into an ordered factor. Either of these will work (note that there is an ordered argument, but not an order argument):
temp_vec <- factor(temp_vec, levels=c("Low", "Medium", "High"), ordered=TRUE)
temp_vec <- ordered(temp_vec, levels = c("Low", "Medium", "High"))
class(temp_vec)
[1] "ordered" "factor"
max(temp_vec)
[1] High
Levels: Low < Medium < High
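Note how the levels now have a magnitude order, with "Low" less than "Medium" and "Medium" less than "High". An ordered factor is different from a non-ordered factor: comparisons and functions such as max are now meaningful, and R's modeling functions (such as lm or glm) will treat the levels as ordered. By default they code an ordered factor with polynomial contrasts (contr.poly) rather than the treatment contrasts used for non-ordered factors.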
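As a rough check of what changes, the sketch below compares an ordered factor with a non-ordered copy of it. The names f_ord and f_unord are only for illustration, and the contrast column names assume the default contrasts option:

```r
f_ord <- factor(c("High", "Low", "High", "Low", "Medium"),
                levels = c("Low", "Medium", "High"), ordered = TRUE)

# Magnitude comparisons now work
f_ord < "High"                  # FALSE TRUE FALSE TRUE TRUE

# Same values and levels, but without the ordering
f_unord <- factor(f_ord, ordered = FALSE)

# Non-ordered: treatment contrasts, one column per non-reference level
colnames(contrasts(f_unord))    # "Medium" "High"

# Ordered: polynomial contrasts, linear and quadratic terms
colnames(contrasts(f_ord))      # ".L" ".Q"
```

So in a model, an ordered factor with three levels gives you a linear and a quadratic term instead of one coefficient per non-reference level.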