Converting a variable with 10 numeric values into just 2 (0/1)


I have a variable that has the numeric values of 0 to 9 within that column.

I would like to convert these numbers so that the value of 0 remains 0 while the values of 1,2,3...9 all become 1.

I want to do this as I am running a logistic regression and I need this variable to become my target so if anyone can help me to convert these values into 1 that would be really amazing. This has frustrated me for the past hour because it seems so simple but I just can't get it. Thanks in advance!

Edit: I have added what I hope for the change to look like visually.

Weeks ------> Weeks
0 ----------> 0
1 ----------> 1
5 ----------> 1
9 ----------> 1
2 ----------> 1
7 ----------> 1
0 ----------> 0
2 ----------> 1
0 ----------> 0

You can do it as follows:


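library(dplyr)  # provides %>% and mutate()

# Small reproducible example standing in for your data
df <- data.frame(weeks = c(0, 1, 5, 0, 9, 7, 0, 2))

# Keep 0 as 0 and turn every other value into 1
df %>%
  mutate(weeks = ifelse(weeks == 0, 0, 1))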
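
If it helps, here is a small sketch that runs the same recode on the example values from the question and also shows a base R equivalent. It assumes dplyr is installed. The names weeks_binary, recoded and base_binary are just illustrative, not something from your data.

```r
library(dplyr)

# Example values taken from the table in the question
df <- data.frame(weeks = c(0, 1, 5, 9, 2, 7, 0, 2, 0))

# dplyr version: 0 stays 0, anything else becomes 1
# (written to a new, illustratively named column so the original is kept)
recoded <- df %>%
  mutate(weeks_binary = ifelse(weeks == 0, 0, 1))

# Base R equivalent: the comparison gives TRUE/FALSE,
# and as.integer() turns that into 1/0
base_binary <- as.integer(df$weeks != 0)

recoded$weeks_binary
# 0 1 1 1 1 1 0 1 0
```

Both versions give the same 0/1 pattern, so which one you use is mostly a matter of taste.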


Thank you so much!

So if my data frame has 200 values for the weeks variable, how do I change what goes in the "c(" part of the data frame?

The data.frame() line was only there to give a reproducible example of your data frame. I assume you already have a data.frame object in your workspace with a variable called weeks, right?


Yes, the entire csv file was renamed to "kits" with one of the variables named "weeks".

I tried the formula but it said the object "weeks" wasn't found:

df <- data.frame(weeks = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))

df %>%
mutate(weeks = ifelse(weeks == 0, 0, 1))

You just need kits %>% mutate(weeks = ifelse(weeks == 0, 0, 1))
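
One thing to watch: the pipe on its own only prints the result, so you probably want to assign it back to kits to keep the change. A rough sketch, assuming dplyr is installed; the kits data frame below is a made-up stand-in, since I don't have your csv:

```r
library(dplyr)

# Stand-in for the real kits data frame (made-up values, for illustration only)
kits <- data.frame(weeks = c(0, 3, 0, 9, 1, 0, 4))

# Assign the result back to kits, otherwise the recoded column is only printed
kits <- kits %>%
  mutate(weeks = ifelse(weeks == 0, 0, 1))

# Quick check: only 0 and 1 should remain
table(kits$weeks)
```

With your real data you would skip the data.frame() line and run the rest on the kits object you already loaded.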


Thank you very much for your help!!

Great. Please mark the solution for the benefit of those to follow.


Dichotomizing throws away information, reducing the statistical power and precision of your model. I don't know the context of your data or research problem but it's probably better to use a different model that uses all the information in the data and, if necessary, dichotomize the model predictions at the end, rather than dichotomizing the input data at the beginning.


Hi Joel, so the issue here is that I am trying to predict whether a customer signs up for a product. So the number of weeks is irrelevant; I just need to know whether they signed up, yes or no. The easiest way I could think of was to change the 1-9 values into 1 so that I could run a logistic regression. Is there something else I should consider?

I'm not sure. Does the number of weeks really not contain any information other than that they signed up? For example, could the number of weeks depend on some underlying propensity to sign up (e.g., some sort of interest in or need for the product) and/or on the likelihood of some event happening (e.g., seeing an ad or receiving a promotion)? It's hard to know what's appropriate without more context on the problem you're trying to solve, what other data you have that might be predictive of the outcome, and what you believe the data-generating process to be.
