Creating New DF Column that Assigns a Label based on Conditions from Another Column

griffin6 · October 7, 2020, 3:14pm

I have a data frame of about 390,000 observations where I'm trying to create a new column that outputs labels "ISR" and "Sales Lead" based on a certain person's name that's located in another column. For example, this is what I'm trying to get...

The "Assigned" column is the one that already exists. I want all of names 1 & 2 to come out as "ISR" in the new column and names 3-9 to come out as "Sales Lead". I've tried doing the mutate function, but that only seems to work with mathematic formulas.

Any help would be greatly appreciated.

andresrcs · October 7, 2020, 4:56pm

Hi,

This is an example of how to do it using dplyr::mutate() and a little help of the stringr package which enables the use of "Regular Expressions" to avoid too much typing.

library(dplyr)
library(stringr)

# Sample data on a copy/paste friendly format
sample_df <- data.frame(
  stringsAsFactors = FALSE,
          Assigned = c("Name 1", "Name 1", "Name 3", "Name 4", "Name 2"),
     Assigned_Type = c("ISR", "ISR", "Sales Lead", "Sales Lead", "ISR")
)

sample_df %>% 
    mutate(new_column = if_else(str_detect(Assigned, "(1|2)$"), "ISR", "Sales Lead"))
#>   Assigned Assigned_Type new_column
#> 1   Name 1           ISR        ISR
#> 2   Name 1           ISR        ISR
#> 3   Name 3    Sales Lead Sales Lead
#> 4   Name 4    Sales Lead Sales Lead
#> 5   Name 2           ISR        ISR

^{Created on 2020-10-07 by the reprex package (v0.3.0)}

If you want to learn more about data wrangling you can read this free ebook

Also, for future questions please try to provide a proper REPRoducible EXample (reprex) illustrating your issue.

griffin6 · October 7, 2020, 6:05pm

Thank you for the response, I wanted to mention that in my existing DF, I have 16 total variables, so how exactly would that go down in the formula?

ex: My DF is called "Activities" and I have other columns/variables labeled "Date", Account Name", "Event Outcome", etc. There are way too many dates for me to type each one out as "Date = c(1/1/2020, 1/2/2020, etc.)". Any help on this part?

andresrcs · October 7, 2020, 6:16pm

There is no need to do that, in my example that is to provide sample data on a copy/paste friendly format, you simply have to use your own data set instead. read the reprex guide in the link I gave you to understand why I have done it that way.

griffin6 · October 8, 2020, 1:33pm

Alright, that makes more sense after reading that. One more question, I have other names in the "Assigned" column that are neither ISR's nor Sales Leads and want to label them as "N/A". How would I go about doing that?

nirgrahamuk · October 8, 2020, 2:24pm

You can use dplyr case_when function to handle more than 2 cases.

griffin6 · October 8, 2020, 2:35pm

Thank you for the response. What part of the example below would I insert that function into?

andresrcs:

sample_df %>% 
    mutate(new_column = if_else(str_detect(Assigned, "(1|2)$"), "ISR", "Sales Lead"))
#>   Assigned Assigned_Type new_column
#> 1   Name 1           ISR        ISR
#> 2   Name 1           ISR        ISR
#> 3   Name 3    Sales Lead Sales Lead
#> 4   Name 4    Sales Lead Sales Lead
#> 5   Name 2           ISR        ISR

nirgrahamuk · October 8, 2020, 2:39pm

this sort of thing

sample_df %>% 
  mutate(new_column = case_when(str_detect(Assigned, "(1|2)$")~ "ISR",
                                str_detect(Assigned, "(3)$")~ "Sales Lead",
                                TRUE ~ "other"))

griffin6 · October 8, 2020, 2:41pm

Wonderful, thank you so much for the help!

system · October 15, 2020, 2:41pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.