I have a datafile with sales data that I want to clean. I want to make a new EndUser column based on an existing column.
I am used to working in SPSS.
What is wrong with this command?
if (sales$EndUser ="B") {sales$EndUserB <- "B2"}
You want to use the vectorised version of if, ifelse(). I used NA to fill rows where EndUser != "B", but you may want to do something else.
By the way, notice that the operator for comparing two values is ==, not =.
Df <- data.frame(EndUser = sample(c("A", "B", "C"),8, replace = TRUE ))
Df
#> EndUser
#> 1 C
#> 2 B
#> 3 A
#> 4 C
#> 5 A
#> 6 C
#> 7 B
#> 8 A
Df$EndUserB <- ifelse(Df$EndUser == "B", "B2", NA)
Df
#> EndUser EndUserB
#> 1 C <NA>
#> 2 B B2
#> 3 A <NA>
#> 4 C <NA>
#> 5 A <NA>
#> 6 C <NA>
#> 7 B B2
#> 8 A <NA>
Created on 2019-09-11 by the reprex package (v0.2.1)
Hi there,
The logical rule of equivalence is actually '=='. So try this...
if (sales$EndUser =="B") {sales$EndUserB <- "B2"}
Also, you could change your line of code somewhat, to assign in a different way...
sales$EndUserB <- if (sales$EndUser =="B") {"B2"} else {"something else"}
Regards,
Will
FJCC's answer is definitely better in this case. 'ifelse' is much faster!
Will
You don't even have to use if:
# just to make sure Df$EndUserB exists
Df$EndUserB <- NA
# Set all rows where EndUser == "B" to "B2"
Df$EndUserB[Df$EndUser=="B"] <- "B2"
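One thing to watch with this approach: if EndUser contains missing values, the comparison with == returns NA for those rows. A minimal sketch of one way around that, assuming a made-up Df with one NA (the data here are hypothetical, not the real sales file), is to test with %in%, which returns FALSE instead of NA:

```r
# Hypothetical data with one missing value (not the real sales data)
Df <- data.frame(EndUser = c("A", "B", NA, "B"), stringsAsFactors = FALSE)

# Create the new column as character, filled with NA
Df$EndUserB <- NA_character_

# == would give NA for the missing row; %in% gives FALSE there
is_b <- Df$EndUser %in% "B"

# Only rows where EndUser is "B" are changed
Df$EndUserB[is_b] <- "B2"
Df
```

Rows that do not match keep whatever value the column already had, which here is NA.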
Thanks a lot.
I tried the == (had it before as well) but get this error message
Error in if (sales$EndUser == "B") { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In if (sales$EndUser == "B") { :
the condition has length > 1 and only the first element will be used
I first used the if/else statement and made the else branch empty ("").
However, when I then run the second line (I have a whole list; for instance, "CC" needs to become "C", else empty), it overwrites the result of my first command.
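A note on the error quoted above: if() expects a single TRUE or FALSE, while comparing a whole column gives one value per row. In older versions of R only the first element was used (hence the warning), and "missing value where TRUE/FALSE needed" indicates that this first element was NA. Since R 4.2.0 a condition longer than one is an error in itself. A small sketch, using a hypothetical vector end_user in place of sales$EndUser:

```r
# Hypothetical stand-in for sales$EndUser; the first value is missing
end_user <- c(NA, "B", "CC")

# The comparison is evaluated for every element
cond <- end_user == "B"
cond          # NA TRUE FALSE
length(cond)  # 3, but if() wants a condition of length 1

# ifelse() applies the test element by element instead
result <- ifelse(cond, "B2", "")
result
```

Note that ifelse() returns NA where the test itself is NA, so rows with a missing EndUser stay missing rather than becoming "".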
If you have many value pairs, make a vector that maps the values as shown below.
Df <- data.frame(EndUser = c("B", "A", "C", "A", "C", "B", "A"))
Df
#> EndUser
#> 1 B
#> 2 A
#> 3 C
#> 4 A
#> 5 C
#> 6 B
#> 7 A
MapValues <- c("A" = "AZ", "B" = "B2", "C" = "CC")
Df$EndUserB <- MapValues[Df$EndUser]
Df
#> EndUser EndUserB
#> 1 B B2
#> 2 A AZ
#> 3 C CC
#> 4 A AZ
#> 5 C CC
#> 6 B B2
#> 7 A AZ
Created on 2019-09-11 by the reprex package (v0.2.1)
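A caveat on the lookup approach: a named vector is matched by label only when the index is a character vector. A factor index is treated as integer codes, and before R 4.0.0 data.frame() turned strings into factors by default, so the result can depend on the order of the factor levels. The sketch below uses a hypothetical factor end_user whose levels are deliberately in a different order from MapValues, and includes a value ("D") that has no mapping:

```r
MapValues <- c("A" = "AZ", "B" = "B2", "C" = "CC")

# Hypothetical factor column; level order differs from the order of MapValues
end_user <- factor(c("B", "A", "C", "D"), levels = c("C", "B", "A", "D"))

# Indexing with the factor uses its integer codes (2, 3, 1, 4)
by_code <- unname(MapValues[end_user])

# Converting to character first matches on the labels
by_name <- unname(MapValues[as.character(end_user)])

by_code
by_name
```

Values without an entry in the map come back as NA, which can be useful for spotting codes that were forgotten. With the real data this would probably be written as MapValues[as.character(sales$EndUser)].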
Thank you Stkrog.
When I run this I get
"Error in Df$EndUserB <- NA : object 'Df' not found"
And I would need EndUserB in my original datafile, to cross it to other columns later.
This is really frustrating. What kind of course or book would you recommend?
I left out the part where you populate the data.frame with your original data. I can see from your original post that the df is named "sales". Just replace "Df" in my post with "sales" and you should be home safe.
But the mapping solution from FJCC is actually much nicer as it handles NAs without initialization of sales$EndUserB.
As for a book, google "R for Data Science" by Hadley Wickham and Garrett Grolemund. I learned it from "An Introduction to R" by Venables & Smith and "Statistics with R" by Dalgaard, but that was ages ago.
Thanks. It now worked, yeah!
The solution of FJCC requires something after else, and I do not know what to put there. I tried to leave it empty, but then when I ran the next rule (C needs to become C2, or else empty), the B2 rows were empty again.
I will look into the books.
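On the question of what to put after "else" in ifelse(): one option is to pass the column's current contents as the third argument, so that rows the new rule does not touch keep their earlier value. A minimal sketch, assuming a small made-up sales data frame (the values are hypothetical):

```r
# Hypothetical stand-in for the real sales data
sales <- data.frame(EndUser = c("B", "C", "A", "B"), stringsAsFactors = FALSE)

# First rule: B becomes B2, everything else starts out empty
sales$EndUserB <- ifelse(sales$EndUser == "B", "B2", "")

# Second rule: C becomes C2; the "else" is the existing column,
# so the B2 rows from the first rule are kept
sales$EndUserB <- ifelse(sales$EndUser == "C", "C2", sales$EndUserB)
sales
```

For a long list of rules the lookup vector shown earlier in the thread is probably easier to maintain than a chain of ifelse() calls.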
We have a vintage-but-still-great thread here where people contributed lots of great books and other resources for learning R: What's your favorite intro to R?
There's at least one book out there that specifically targets people transitioning from SPSS:
Thanks a lot! will look into it.