Change the integer value coding of a factor in R

Hi:

dat <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48), Self_control = c(65, 70, 60, 60, 60, 55, 60,
55, 70, 65, 60, 70, 65, 60, 60, 50, 55, 65, 70, 55, 55, 60, 50,
50, 50, 55, 80, 65, 70, 75, 75, 65, 45, 60, 85, 65, 70, 70, 80,
60, 30, 30, 30, 55, 35, 20, 45, 40), Sex = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("Female",
"Male"), class = "factor"), Alcohol = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("None",
"2 Pints", "4 Pints"), class = "factor")), row.names = c(NA,
48L), class = "data.frame")


# Factor:
dat$Sex <- factor(dat$Sex, levels=c("Male", "Female"),
                  labels=c("Male", "Female"))

dat$Sex <- relevel(dat$Sex, ref="Female")

levels(dat$Sex)

I want to see how this factor is encoded; it is probably Female = 1 and Male = 2. How can I change it to Female = 0 and Male = 1?

Level Encoding
1  Female        0
2   Male         1

R factors are stored as positive integer codes, one per level. The lowest possible code is 1.
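To see the encoding for yourself, something along these lines should work. This is only a minimal sketch: `sex` is a small stand-in factor invented for illustration, not the `dat$Sex` column from the question, and `sex01` is a hypothetical name for a derived 0/1 variable.

```r
# Small stand-in factor; `sex` is a hypothetical name used only for illustration
sex <- factor(c("Female", "Female", "Male"), levels = c("Female", "Male"))

# Table of levels and their integer codes (codes always run from 1 to nlevels)
encoding <- data.frame(Level = levels(sex), Encoding = seq_along(levels(sex)))
encoding

# The integer codes behind each observation
as.integer(sex)

# A separate 0/1 variable; note this is a plain integer vector, not a factor
sex01 <- as.integer(sex) - 1L
sex01

# With the default treatment contrasts, the reference level is already
# coded 0 when the factor is used in a model
contrasts(sex)
```

If the 0/1 coding is only needed for a regression, the last call suggests that no recoding is required, since the reference level (here Female) already enters the model as 0.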

OK, I understand now that the encoding of a factor is always a set of integers that starts at 1 and runs up to the number of levels.
What if I want to encode Female as 123 and Male as 256? I know that is not logical, I am just asking. Or am I confusing two things here: the internal integer representation of the levels and the level labels themselves?

You simply can't do that.
I suppose you could use vctrs to invent your own version of factors that works the way you wish, but they wouldn't be R factors per se.

Edit: I was wrong. Destix showed a way.

Just don't make the variable a factor.
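A minimal sketch of what that could look like, assuming the labels are kept as plain character strings and mapped through a named lookup vector. The names `sex`, `codes` and `sex_code` are made up for this example.

```r
# Plain character labels instead of a factor (hypothetical example data)
sex <- c("Female", "Male", "Male", "Female")

# Named lookup vector holding the arbitrary codes from the question
codes <- c(Female = 123L, Male = 256L)

# Index the lookup vector by label; unname() drops the label names
sex_code <- unname(codes[sex])
sex_code
```

The result is an ordinary integer vector, so arithmetic and comparisons behave as usual, but R will treat it as a number rather than a category in models.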

You could force the matter by introducing a set of dummy levels. For example,

levels_dummy <- as.character(1:300)
levels_dummy[123] <- "Female"
levels_dummy[256] <- "Male"
dat$Sex <- factor(dat$Sex, levels = levels_dummy)

gives

as.integer(dat$Sex)
 [1] 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 123 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256 256
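One thing to keep in mind with this trick: the factor now really has 300 levels, and dropping the unused ones resets the codes. A small sketch of that, using a two-element stand-in factor (`sex` and `padded` are hypothetical names, not the data from the question):

```r
# Two-observation stand-in factor (hypothetical data for illustration)
sex <- factor(c("Female", "Male"))

# Same padding approach as above: 300 placeholder levels,
# with the real labels placed at positions 123 and 256
levels_dummy <- as.character(1:300)
levels_dummy[123] <- "Female"
levels_dummy[256] <- "Male"
padded <- factor(sex, levels = levels_dummy)

as.integer(padded)               # codes follow the positions in levels_dummy
nlevels(padded)                  # all 300 levels are still there
as.integer(droplevels(padded))   # unused levels removed, codes start at 1 again
```

So functions that tabulate or model by level, such as table(), will see the 298 empty placeholder levels unless they are dropped, and dropping them loses the custom codes.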

This can be practical when your levels correspond to an ordinal position. An intuitive example is given in the factor() documentation with letters:

(ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))
as.integer(ff)
[1] 19 20  1 20  9 19 20  9  3 19

Here it makes sense to match the levels with the corresponding alphabet position.

Thank you @Destix for a good, didactic example.

@Destix is giving a very good suggestion, but be careful how you use it. Note that @Destix carefully used as.integer. If you don't do this you end up with something like

> dat$Sex[1] + dat$Sex[2]
[1] NA
Warning message:
In Ops.factor(dat$Sex[1], dat$Sex[2]) : ‘+’ not meaningful for factors
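For comparison, a short sketch of the conversion done explicitly. The vectors `sex` and `f` are made-up examples; the second part shows a related trap when the labels themselves look like numbers.

```r
# Hypothetical two-level factor
sex <- factor(c("Female", "Male"))

# Convert to integer codes first, then do the arithmetic
code_sum <- as.integer(sex[1]) + as.integer(sex[2])
code_sum

# Related trap: as.integer() returns the codes, not the labels
f <- factor(c("10", "20", "10"))
f_codes <- as.integer(f)                  # level codes
f_values <- as.numeric(as.character(f))   # the numbers written in the labels
f_codes
f_values
```

In other words, as.integer() always gives the internal codes, so going through as.character() is needed whenever the labels carry the numeric meaning.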
