Converting a variable with 10 numeric values into just 2 (0/1)

That20sShow · March 28, 2020, 11:08pm

Hello,

I have a variable that has the numeric values of 0 to 9 within that column.

I would like to convert these numbers so that the value of 0 remains 0 while the values of 1,2,3...9 all become 1.

I want to do this because I am running a logistic regression and need this variable to be my target. If anyone can help me convert these values into 1, that would be really amazing. This has frustrated me for the past hour because it seems so simple but I just can't get it. Thanks in advance!

Edit: I have added what I hope for the change to look like visually.

Weeks -----> Weeks
0 ---------> 0
1 ---------> 1
5 ---------> 1
9 ---------> 1
2 ---------> 1
7 ---------> 1
0 ---------> 0
2 ---------> 1
0 ---------> 0

johan809 · March 29, 2020, 12:00am

You can do it as follows:

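library(dplyr)

# Small reproducible example standing in for the real data
df <- data.frame(weeks = c(0, 1, 5, 0, 9, 7, 0, 2))

# 0 stays 0; every other value becomes 1
df %>%
  mutate(weeks = ifelse(weeks == 0, 0, 1))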

That20sShow · March 29, 2020, 12:15am

Thank you so much!

So if my data frame has 200 values for the weeks variable, how do I change what goes in the "c(" part of the data frame?

johan809 · March 29, 2020, 12:24am

The data.frame() line was just there to create a reproducible example of your data frame. I suppose you already have a data.frame object in your workspace with a variable called weeks, right?

That20sShow · March 29, 2020, 12:31am

Yes, the entire csv file was renamed to "kits" with one of the variables named "weeks".

I tried the formula but it said the object "weeks" wasn't found:

df <- data.frame(weeks = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))

df %>%
  mutate(weeks = ifelse(weeks == 0, 0, 1))

johan809 · March 29, 2020, 12:35am

You just need:

kits %>%
  mutate(weeks = ifelse(weeks == 0, 0, 1))
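One detail worth spelling out: mutate() returns a new data frame rather than changing kits in place, so the result has to be assigned back for the recoding to stick. A minimal sketch, assuming kits has a numeric weeks column; the small data frame built here is a hypothetical stand-in for the real CSV, and kits2 is just an illustrative second copy for the base R variant:

```r
library(dplyr)

# Hypothetical stand-in for the real `kits` data (normally read from the CSV)
kits <- data.frame(weeks = c(0, 1, 5, 9, 2, 7, 0, 2, 0))

# mutate() returns a new data frame; assign it back to keep the change
kits <- kits %>%
  mutate(weeks = ifelse(weeks == 0, 0, 1))

# Base R equivalent (no dplyr needed), shown on a fresh copy
kits2 <- data.frame(weeks = c(0, 1, 5, 9, 2, 7, 0, 2, 0))
kits2$weeks <- as.integer(kits2$weeks > 0)

# Count how many 0s and 1s the recoded column contains
table(kits$weeks)
```

Both versions leave missing values (NA) as NA instead of silently turning them into 0 or 1.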

That20sShow · March 29, 2020, 1:04am

Thank you very much for your help!!

johan809 · March 29, 2020, 1:25am

Great. Please mark the solution for the benefit of those to follow.

joels · March 29, 2020, 1:38am

Dichotomizing throws away information, reducing the statistical power and precision of your model. I don't know the context of your data or research problem but it's probably better to use a different model that uses all the information in the data and, if necessary, dichotomize the model predictions at the end, rather than dichotomizing the input data at the beginning.
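To make the "model first, dichotomize last" idea concrete, here is a rough sketch. It assumes, purely for illustration, that weeks can be treated as a Poisson count; the data are simulated and promo is a hypothetical predictor, so none of this reflects the real data set:

```r
set.seed(1)

# Simulated data, purely for illustration: `promo` is a hypothetical predictor
n <- 200
promo <- rbinom(n, 1, 0.5)
weeks <- rpois(n, lambda = exp(-0.5 + 0.8 * promo))
kits_sim <- data.frame(weeks = weeks, promo = promo)

# Model the full count instead of a 0/1 version of it
fit <- glm(weeks ~ promo, family = poisson, data = kits_sim)

# Expected count per row, then the implied probability of at least one week:
# under a Poisson model, P(weeks > 0) = 1 - exp(-mu)
mu <- predict(fit, type = "response")
p_any <- 1 - exp(-mu)

# Dichotomize the predictions at the end (the 0.5 cutoff is arbitrary)
pred_signed_up <- as.integer(p_any > 0.5)
```

Whether a Poisson model, or any count model, suits the real data depends on how weeks is actually generated.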

That20sShow · March 29, 2020, 3:42am

Hi Joel, so the issue here is that I am trying to predict whether a customer signs up for a product. The number of weeks is irrelevant; I just need to know: did they sign up, yes or no? The easiest way I could think of was to change the 1-9 values into a 1 so that I could run logistic regression. Is there something else I should consider?
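For reference, the plan described above (binary target, then logistic regression) might look roughly like this in base R. The data are simulated, age is a hypothetical predictor, and signed_up is an illustrative column name, so treat it as a sketch rather than a recipe for the real kits data:

```r
set.seed(42)

# Simulated stand-in for `kits`; `age` is a hypothetical predictor
n <- 200
age <- round(runif(n, 18, 70))
weeks <- rbinom(n, size = 9, prob = plogis(-3 + 0.04 * age))
kits_sim <- data.frame(weeks = weeks, age = age)

# Binary target: 1 if any weeks were recorded, 0 otherwise
kits_sim$signed_up <- as.integer(kits_sim$weeks > 0)

# Logistic regression on the binary target
fit <- glm(signed_up ~ age, family = binomial, data = kits_sim)

# Predicted probability of signing up for each row
p_hat <- predict(fit, type = "response")
head(p_hat)
```

Storing the 0/1 target in a new column keeps the original weeks values available in case they turn out to be useful later.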

joels · March 30, 2020, 3:34pm

I'm not sure. Does the number of weeks really not contain any information other than that they signed up? For example, could the number of weeks depend on some underlying propensity to sign up (e.g., some sort of interest in or need for a product) and/or the likelihood of some event happening (e.g., seeing an ad or receiving a promotion)? It's hard to know what's appropriate without more context on the problem you're trying to solve, what other data you have that might be predictive of the outcome, and what you believe the data-generating process to be.
