AC3112
July 21, 2021, 5:40pm
1
Hi there,
I wondered if someone could provide some guidance on how to recode a continuous variable into a categorical variable with certain value ranges.
For example, suppose I had a continuous variable 'Score' taken from a data set called 'test'. Score ranges from -1 to 1.
For scores < 0, I would like to classify it as "critical".
For 0 < score < 0.5, I would like to classify it as "poor".
And for score > 0.5, I would like to classify it as "good".
Would really appreciate anyone's help on the matter.
Best,
It's a bit of a hack, but...
Score <- runif(10, -1, 1) ## sample data
Score
Val <- rep("", 10) ## storage for the classification
Val[Score<0] <- "critical"
Val[0<=Score & Score<0.5] <- "poor"
Val[Score>=0.5] <- "good"
Val
AC3112
July 21, 2021, 6:02pm
3
Thanks @bloosmore - I appreciate your contribution.
I see you've generated a random distribution for Score. However, would this same procedure work if I used test$Score in place of Score?
If so, how would I define the storage for classification?
Would be appreciated.
Sure!
test <- data.frame(Score = runif(10, -1, 1)) ## sample data
test$Val <- rep("", length(test$Score))
test$Val[test$Score<0] <- "critical"
test$Val[0<=test$Score & test$Score<0.5] <- "poor"
test$Val[test$Score>=0.5] <- "good"
test
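Because runif() will practically never land exactly on 0 or 0.5, here is a small sketch of the same logical-indexing approach on hand-picked scores (made up for illustration, not from the original data), so the boundary handling is visible:

```r
# Hypothetical fixed scores, chosen to include the boundary values 0 and 0.5
test <- data.frame(Score = c(-1, -0.2, 0, 0.3, 0.5, 1))

# Pre-allocate a character column, then fill it by logical indexing
test$Val <- rep("", nrow(test))
test$Val[test$Score < 0] <- "critical"
test$Val[test$Score >= 0 & test$Score < 0.5] <- "poor"
test$Val[test$Score >= 0.5] <- "good"

test
```

With these conditions, a score of exactly 0 falls into "poor" and a score of exactly 0.5 falls into "good".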
AC3112
July 21, 2021, 6:11pm
5
Thanks. Really appreciate your help on the matter.
One more dumb question, apologies. I don't understand the operational purpose of `rep("", 10)`.
Would you be able to clarify that a little?
Note I simplified the above slightly...
The rep() function creates a character vector of the same length as the original data, initially filled with empty strings. It's basically just creating storage for the subsequent classifications. An alternative approach would be:
test$Val <- vector("character", length(test$Score))
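As a quick check, the sketch below compares the two ways of creating the storage, plus the base R shorthand character(). The variable names are only for illustration:

```r
n <- 10

a <- rep("", n)                 # repeat the empty string n times
b <- vector("character", n)     # character vector of length n, filled with ""
d <- character(n)               # shorthand for the same thing

identical(a, b)
identical(a, d)
```

All three give a length-10 character vector of empty strings, so which one to use is mostly a matter of taste.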
This is a pretty common operation, so there is actually a function that does it for you:
test <- data.frame(Score = runif(10, -1, 1)) ## sample data
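test$Val <- cut(test$Score,
                breaks = c(-1, 0, 0.5, 1),
                labels = c("critical", "poor", "good"),
                right = FALSE,          ## intervals closed on the left, e.g. [0, 0.5)
                include.lowest = TRUE)  ## with right = FALSE, also include the top break (1)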
This does the same thing as @bloosmore's approach, but cut() also has methods for other types of variables, such as dates.
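One thing to watch with cut(): by default the intervals are open on the left and closed on the right, so how values that sit exactly on a break are handled depends on the right and include.lowest arguments. A small sketch with made-up scores (not from the original data) comparing the defaults with left-closed intervals:

```r
# Hypothetical scores that sit exactly on the breaks
Score <- c(-1, -0.2, 0, 0.3, 0.5, 1)
brks  <- c(-1, 0, 0.5, 1)
labs  <- c("critical", "poor", "good")

# Defaults: intervals (-1, 0], (0, 0.5], (0.5, 1]; a score of exactly -1 becomes NA
default_cut <- cut(Score, breaks = brks, labels = labs)

# Left-closed: intervals [-1, 0), [0, 0.5), [0.5, 1]
left_closed <- cut(Score, breaks = brks, labels = labs,
                   right = FALSE, include.lowest = TRUE)

data.frame(Score, default_cut, left_closed)
```

The left-closed version matches the `<` / `>=` conditions used in the logical-indexing approach.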
AC3112
July 21, 2021, 6:20pm
8
Thank you. That makes sense
I guess thereafter, it is possible/necessary to transform the 'character' Val into a factor/categorical variable using the standard arguments?
cut() returns a factor, with an option (ordered_result = TRUE) to return an ordered factor; see ?cut
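For reference, a minimal sketch of the ordered option mentioned above, using made-up scores. An ordered factor lets you compare categories by rank:

```r
# Hypothetical scores, one per category
Score <- c(-0.4, 0.2, 0.8)

val <- cut(Score,
           breaks = c(-1, 0, 0.5, 1),
           labels = c("critical", "poor", "good"),
           ordered_result = TRUE)

is.ordered(val)   # check that the result is an ordered factor
levels(val)       # levels follow the order of the labels
val < "good"      # rank comparison against a level
```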
AC3112
July 21, 2021, 6:23pm
10
Thanks both. I really appreciate both contributions.
@AC3112 it is helpful to mark this post as solved, so that others know where to devote effort. See
If your question has been answered, don't forget to mark the solution!
How do I mark a solution?
Find the reply you want to mark as the solution and look for the row of small gray icons at the bottom of that reply. Click the one that looks like a box with a checkmark in it:
Hovering over the mark solution button shows the label, "Select if this reply solves the problem". If you don't see the mark solution button, try clicking the three dots button ( ••• ) to expand the full set of options.
When a solution is chosen, the icon turns green and the hover label changes to: "Unselect if this reply no longer solves the problem". Success!
AC3112
July 21, 2021, 6:33pm
12
Understood
I've marked the solution overall to balance contributions by both parties. Thanks again to you both for the help.
And yes, if you decide not to use cut(), you can always use:
test$Val <- as.factor(test$Val)
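One caveat worth knowing: as.factor() sorts the levels alphabetically, which here puts "good" before "poor". If the level order matters (for plots or models, say), factor() with explicit levels is an option. A sketch with made-up values:

```r
# Hypothetical classifications
Val <- c("poor", "good", "critical", "poor")

# as.factor() orders levels alphabetically
levels(as.factor(Val))

# factor() lets you state the intended order explicitly
f <- factor(Val, levels = c("critical", "poor", "good"))
levels(f)
```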
AC3112
July 21, 2021, 7:39pm
14
Hi guys,
Sorry to be a pain again.
When I recode this variable as instructed as above in my data set, the original variable Score and the new variable Val exist.
Is there any way to simply have Val replace Score?
Something like:
test <- data.frame(Score = runif(10, -1, 1)) ## sample data
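test$Score <- cut(test$Score,
                  breaks = c(-1, 0, 0.5, 1),
                  labels = c("critical", "poor", "good"),
                  right = FALSE,          ## intervals closed on the left, e.g. [0, 0.5)
                  include.lowest = TRUE)  ## with right = FALSE, also include the top break (1)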
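If Val has already been created and you only want to get rid of the original column afterwards, another option is to drop Score by assigning NULL to it. A small sketch with made-up scores:

```r
# Hypothetical data
test <- data.frame(Score = c(-0.4, 0.2, 0.8))

# Create the categorical column first
test$Val <- cut(test$Score,
                breaks = c(-1, 0, 0.5, 1),
                labels = c("critical", "poor", "good"))

# Then remove the original continuous column
test$Score <- NULL

names(test)
```

Overwriting Score directly, as shown above, is shorter; dropping it afterwards keeps the name Val, which may read more clearly later on.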