Replacing all numerical values in a column with text characters

Hello - I am working with multiple large datasets and am pooling information together in order to create a data frame. However, I would like to add in a column that will allow me to identify the dataset where my observations are coming from. Here is an example that I have created where I was able to add an extra column called 'EmployeeCompany' using the mutate function:

structure(list(EmployeeID = 1:4, EmploymentType = structure(c(2L, 
2L, 3L, 1L), .Label = c("Manager", "Mechanic", "Painter"), class = "factor"), 
    EmployeeCompany = 1:4), class = "data.frame", row.names = c(NA, 
-4L))

Is there a way to modify the mutate function so it replaces all the numerical values in the EmployeeCompany column with the company name Honda, which is the name of my dataset?

Also, I would like to repeat this process for a similar data frame, using Toyota as the employee company. Is there a function that will allow me to merge the two data frames at the end so the result looks like this:

EmployeeID EmploymentType EmployeeCompany
1 Mechanic Honda
2 Mechanic Honda
3 Painter Honda
4 Manager Honda
5 Painter Toyota
6 Manager Toyota
7 Manager Toyota
8 Mechanic Toyota

Thanks!

Are you reading your input files from a local source (CSVs, Excel sheets etc.)? If so, do the file names contain the name of the company you want to fill the column with? There may be a way to extract the company name from the file path.
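If the file names did carry the company name, a rough sketch of that idea could look like the following. The file names Honda.csv and Toyota.csv, the temporary folder, and the helper read_with_company() are all assumptions made up for illustration; they are not from the original data.

```r
library(dplyr, warn.conflicts = FALSE)

# Write two small example files to a temporary folder.
# The file names are hypothetical stand-ins for the real CSVs.
tmp <- tempfile("companies")
dir.create(tmp)
write.csv(data.frame(EmployeeID = 1:2, EmploymentType = c("Mechanic", "Painter")),
          file.path(tmp, "Honda.csv"), row.names = FALSE)
write.csv(data.frame(EmployeeID = 3:4, EmploymentType = c("Manager", "Mechanic")),
          file.path(tmp, "Toyota.csv"), row.names = FALSE)

paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and tag every row with the
# file name minus its folder and extension (".../Honda.csv" becomes "Honda").
read_with_company <- function(path) {
  read.csv(path, stringsAsFactors = FALSE) %>%
    mutate(EmployeeCompany = tools::file_path_sans_ext(basename(path)))
}

# Read every file and stack the results into one data frame
combined <- bind_rows(lapply(paths, read_with_company))
```

This only helps when the file names are reliable labels; otherwise the company name has to be supplied by hand.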

OK. Then perhaps something like this for the two-company scenario; it is not suitable if you have more. Let me know if that is the case and we can work out something better.

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3
library(tibble)

honda_df <- structure(
  list(EmployeeID = 1:4, 
       EmploymentType = structure(c(2L, 2L, 3L, 1L), 
                                  .Label = c("Manager", "Mechanic", "Painter"), 
                                  class = "factor")), 
  class = "data.frame", row.names = c(NA, -4L))

toyota_df <- structure(
  list(EmployeeID = 5:8, 
       EmploymentType = structure(c(3L, 2L, 2L, 1L), 
                                  .Label = c("Manager", "Mechanic", "Painter"), 
                                  class = "factor")), 
  class = "data.frame", row.names = c(NA, -4L))

honda_df %>% 
  add_column(EmployeeCompany = "Honda") %>% 
  bind_rows(toyota_df) %>% 
  mutate(EmployeeCompany = if_else(is.na(EmployeeCompany), 
                                   true = "Toyota", 
                                   false = EmployeeCompany))
#>   EmployeeID EmploymentType EmployeeCompany
#> 1          1       Mechanic           Honda
#> 2          2       Mechanic           Honda
#> 3          3        Painter           Honda
#> 4          4        Manager           Honda
#> 5          5        Painter          Toyota
#> 6          6       Mechanic          Toyota
#> 7          7       Mechanic          Toyota
#> 8          8        Manager          Toyota

Created on 2020-04-03 by the reprex package (v0.3.0)
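
For more than two companies, one possible approach (a sketch, not the only way) is to put the data frames in a named list and let bind_rows() build the company column from the list names through its .id argument. The data frames below are rebuilt with data.frame() so the example runs on its own; the select() at the end is only there to move the new column to the last position.

```r
library(dplyr, warn.conflicts = FALSE)

job_levels <- c("Manager", "Mechanic", "Painter")

honda_df <- data.frame(
  EmployeeID = 1:4,
  EmploymentType = factor(c("Mechanic", "Mechanic", "Painter", "Manager"),
                          levels = job_levels)
)

toyota_df <- data.frame(
  EmployeeID = 5:8,
  EmploymentType = factor(c("Painter", "Mechanic", "Mechanic", "Manager"),
                          levels = job_levels)
)

# The list names become the values of the new EmployeeCompany column.
# Further companies can be added as extra named list elements.
combined <- list(Honda = honda_df, Toyota = toyota_df) %>%
  bind_rows(.id = "EmployeeCompany") %>%
  select(EmployeeID, EmploymentType, EmployeeCompany)
```

Because the label comes from the list names, there is no need for the is.na() step, and adding a third company does not change the rest of the pipeline.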

Yes, it's a CSV file, but the file name doesn't contain the company's name. I am trying to add the company name directly in R without modifying the original dataset. Thanks
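
For that case, a minimal sketch would be to give mutate() a single string, which is recycled to every row, and then stack the labelled data frames with bind_rows(). The two data frames below are stand-ins for whatever read.csv() returns; only the copies in memory change, so the CSV files on disk stay as they are.

```r
library(dplyr, warn.conflicts = FALSE)

# Stand-ins for the data frames read from the two CSV files
honda_df <- data.frame(
  EmployeeID = 1:4,
  EmploymentType = c("Mechanic", "Mechanic", "Painter", "Manager"),
  stringsAsFactors = FALSE
)
toyota_df <- data.frame(
  EmployeeID = 5:8,
  EmploymentType = c("Painter", "Manager", "Manager", "Mechanic"),
  stringsAsFactors = FALSE
)

# A length-one value in mutate() fills the whole column.
# If EmployeeCompany already exists (for example as 1:4), it is overwritten.
combined <- bind_rows(
  honda_df %>% mutate(EmployeeCompany = "Honda"),
  toyota_df %>% mutate(EmployeeCompany = "Toyota")
)
```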