omario
October 6, 2021, 2:34am
1
I am working with the R programming language.
I created some data:
#PART 1
#create data
library(dplyr)
library(caret)
set.seed(123)
salary <- rnorm(1000,5,5)
height <- rnorm(1000,2,2)
my_data = data.frame(salary, height)
#PART 2
#create train and test data
train<-sample_frac(my_data, 0.7)
sid<-as.numeric(rownames(train)) # because rownames() returns character
test<-my_data[-sid,]
#PART 3
salary_quantiles = data.frame( train %>% summarise (quant_1 = quantile(salary, 0.33),
quant_2 = quantile(salary, 0.66),
quant_3 = quantile(salary, 0.99)))
> salary_quantiles
quant_1 quant_2 quant_3
1 3.005188 6.952076 16.98823
Question: I am trying to write an ifelse() statement that uses these quantiles (3.005188, 6.952076, 16.98823) as cut-points. So far I have typed the numbers in manually:
#PART 4
train$salary_type = as.factor(ifelse(train$salary < 3.005188, "A", ifelse( train$salary > 3.005188 & train$salary < 6.952076, "B", "C")))
Does anyone know if there is a way to do this without writing these numbers explicitly? For example:
train$salary_type = as.factor(ifelse(train$salary < salary_quantiles$quant_1 , "A", ifelse( train$salary > salary_quantiles$quant_1 & train$salary < salary_quantiles$quant_2, "B", "C")))
Is this possible to do in R?
Thanks!
Perhaps this?
salary_quantiles[[1]]
train$salary_type = as.factor(ifelse(train$salary < salary_quantiles[[1]], "A",
ifelse( train$salary > salary_quantiles[[1]] & train$salary < salary_quantiles[[2]], "B", "C")))
You could also use case_when() instead of ifelse() in the second bit of code.
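A minimal sketch of the case_when() version mentioned above, reusing the simulated data from the question. Two things here are assumptions and not code from the thread: the split uses sample() on row indices, and the cut-points are stored in a plain numeric vector named q. Note one small difference from the nested ifelse(): a salary exactly equal to the first cut-point lands in "B" here, whereas the original sends it to "C" (with rnorm() data such ties are practically impossible).

```r
library(dplyr)

set.seed(123)
my_data <- data.frame(salary = rnorm(1000, 5, 5),
                      height = rnorm(1000, 2, 2))

# Assumption: split by row index rather than by row names
train_idx <- sample(nrow(my_data), size = round(0.7 * nrow(my_data)))
train <- my_data[train_idx, ]

# Cut-points computed from the training data, kept as a plain numeric vector
q <- unname(quantile(train$salary, probs = c(0.33, 0.66)))

train <- train %>%
  mutate(salary_type = factor(case_when(
    salary < q[1] ~ "A",
    salary < q[2] ~ "B",   # reached only when salary >= q[1]
    TRUE          ~ "C"
  ), levels = c("A", "B", "C")))

# Base R alternative: cut() with the same cut-points
train$salary_type_cut <- cut(train$salary,
                             breaks = c(-Inf, q, Inf),
                             labels = c("A", "B", "C"),
                             right = FALSE)

table(train$salary_type)
```

Because case_when() stops at the first condition that is TRUE, the second branch does not need to repeat the lower bound. The cut() call with right = FALSE produces the same three classes without any nesting.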
omario
October 6, 2021, 2:53am
3
Thank you for your answer! I tried similar logic on a related problem. Suppose you have this data set:
head(test)
salary height
701 1.358904 1.6148796
702 -2.702212 1.0604070
703 1.534527 -4.0957218
704 5.594247 5.7373110
705 -1.823547 5.5808484
706 7.949913 -0.2021635
test$salary_type = as.factor(ifelse(test$salary < salary_quantiles$quant_1 , "A", ifelse( test$salary > salary_quantiles$quant_1 & test$salary < salary_quantiles$quant_2, "B", "C")))
But then this does not work:
test$height_pred = as.factor(ifelse(test$salary_type == "A", height_quanitles[[1]], ifelse(test$salary_type == "B", height_quanitles[[2]], height_quanitles[[3]])))
Error in .subset2(x, i, exact = exact) : subscript out of bounds
Do you know why this returns an error?
Thanks!
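The thread does not show how height_quanitles was created, so the cause can only be guessed at. That particular message is what `[[` gives when it is asked for a position the object does not have, for example `[[3]]` on a data frame with fewer than three columns. Below is a sketch of one way to do the lookup, under assumptions: the name height_quantiles, its definition (the same three probabilities applied to train$height), and the index-based split are all mine and not from the thread. It stores the cut-points in numeric vectors and indexes by the factor's integer codes instead of nesting ifelse(); the result is kept numeric rather than converted to a factor.

```r
set.seed(123)
my_data <- data.frame(salary = rnorm(1000, 5, 5),
                      height = rnorm(1000, 2, 2))

# Assumption: index-based split so that train and test cannot overlap
train_idx <- sample(nrow(my_data), size = round(0.7 * nrow(my_data)))
train <- my_data[train_idx, ]
test  <- my_data[-train_idx, ]

# Cut-points from the training data only, stored as plain numeric vectors
salary_q <- unname(quantile(train$salary, probs = c(0.33, 0.66)))
# Hypothetical definition; the original height_quanitles is not shown
height_quantiles <- unname(quantile(train$height, probs = c(0.33, 0.66, 0.99)))

# Label the test rows with the training cut-points
test$salary_type <- cut(test$salary,
                        breaks = c(-Inf, salary_q, Inf),
                        labels = c("A", "B", "C"),
                        right = FALSE)

# A factor's integer codes (A = 1, B = 2, C = 3) index the quantile vector
test$height_pred <- height_quantiles[as.integer(test$salary_type)]

head(test)
```

A side observation: the head(test) output above starts at row 701, which suggests that sample_frac() reset the row names of train to 1 through 700, so `my_data[-sid, ]` may simply have dropped the first 700 rows instead of the sampled ones. That is an inference from the printed output, not something confirmed in the thread; splitting on an index vector, as in the sketch, avoids depending on row names at all.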