Creating New DF Column that Assigns a Label based on Conditions from Another Column

I have a data frame of about 390,000 observations where I'm trying to create a new column that outputs labels "ISR" and "Sales Lead" based on a certain person's name that's located in another column. For example, this is what I'm trying to get...

Assigned | Assigned Type
Name 1 | ISR
Name 1 | ISR
Name 3 | Sales Lead
Name 4 | Sales Lead
Name 2 | ISR

The "Assigned" column is the one that already exists. I want all of names 1 & 2 to come out as "ISR" in the new column and names 3-9 to come out as "Sales Lead". I've tried doing the mutate function, but that only seems to work with mathematic formulas.

Any help would be greatly appreciated.

Hi,

This is an example of how to do it using dplyr::mutate() and a little help of the stringr package which enables the use of "Regular Expressions" to avoid too much typing.

library(dplyr)
library(stringr)

# Sample data on a copy/paste friendly format
sample_df <- data.frame(
  stringsAsFactors = FALSE,
          Assigned = c("Name 1", "Name 1", "Name 3", "Name 4", "Name 2"),
     Assigned_Type = c("ISR", "ISR", "Sales Lead", "Sales Lead", "ISR")
)

sample_df %>% 
    mutate(new_column = if_else(str_detect(Assigned, "(1|2)$"), "ISR", "Sales Lead"))
#>   Assigned Assigned_Type new_column
#> 1   Name 1           ISR        ISR
#> 2   Name 1           ISR        ISR
#> 3   Name 3    Sales Lead Sales Lead
#> 4   Name 4    Sales Lead Sales Lead
#> 5   Name 2           ISR        ISR

Created on 2020-10-07 by the reprex package (v0.3.0)

If you want to learn more about data wrangling you can read this free ebook

Also, for future questions please try to provide a proper REPRoducible EXample (reprex) illustrating your issue.

1 Like

Thank you for the response, I wanted to mention that in my existing DF, I have 16 total variables, so how exactly would that go down in the formula?

ex: My DF is called "Activities" and I have other columns/variables labeled "Date", Account Name", "Event Outcome", etc. There are way too many dates for me to type each one out as "Date = c(1/1/2020, 1/2/2020, etc.)". Any help on this part?

There is no need to do that, in my example that is to provide sample data on a copy/paste friendly format, you simply have to use your own data set instead. read the reprex guide in the link I gave you to understand why I have done it that way.

Alright, that makes more sense after reading that. One more question, I have other names in the "Assigned" column that are neither ISR's nor Sales Leads and want to label them as "N/A". How would I go about doing that?

You can use dplyr case_when function to handle more than 2 cases.

Thank you for the response. What part of the example below would I insert that function into?

this sort of thing

sample_df %>% 
  mutate(new_column = case_when(str_detect(Assigned, "(1|2)$")~ "ISR",
                                str_detect(Assigned, "(3)$")~ "Sales Lead",
                                TRUE ~ "other"))
1 Like

Wonderful, thank you so much for the help!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.