Using ifelse function on a list

I would like to replace names in my database with names i have in a list without having to write all the different names manually.
I have 56 different chemical names in a df with 2 columns, 1 column is the "Wrong" name and 1 column is the "Correct" names.
My Database dataframe has 2500 different Measurement of these different parametres and i need to change the names of the parametres.

So instead of this:
Database$Parametre <- with(Database, ifelse(Parametre == "Quicksilver", "Hg", ifelse(Parametre == "Arsenic", "As", ifelse(Parametre == "Lead", "Pb".... and so on for 50 chemicals.

I would like to change the name of the parametres according to my chemnames df.

Database$Parametre <- with(Database, ifelse(Parametre == ChemNames$Wrong,ChemNames$Corret, as.character(Parametre)))

But his gives me an error saying levels for the factors a different.

Hope this is understandable and hope you can help.

Hi @Rafn,

Take a look at this:

# Load libraries ----------------------------------------------------------
library("tidyverse")

# Create example data -----------------------------------------------------
look_up <- tribble(
  ~wrong, ~correct,
  "Quicksilver", "Hg",
  "Arsenic", "As",
  "Lead", "Pb"
)
data <- tibble(name = sample(x = c("Quicksilver", "Arsenic", "Lead"),
                             size = 2500,
                             replace = TRUE))

# Correct names -----------------------------------------------------------
data_corrected <- data %>%
  full_join(look_up, by = c("name" = "wrong"))

Yielding:

> look_up
# A tibble: 3 x 2
  wrong       correct
  <chr>       <chr>  
1 Quicksilver Hg     
2 Arsenic     As     
3 Lead        Pb     
> data
# A tibble: 2,500 x 1
   name       
   <chr>      
 1 Lead       
 2 Lead       
 3 Arsenic    
 4 Quicksilver
 5 Quicksilver
 6 Lead       
 7 Quicksilver
 8 Arsenic    
 9 Quicksilver
10 Arsenic    
# … with 2,490 more rows
> data_corrected
# A tibble: 2,500 x 2
   name        correct
   <chr>       <chr>  
 1 Lead        Pb     
 2 Lead        Pb     
 3 Arsenic     As     
 4 Quicksilver Hg     
 5 Quicksilver Hg     
 6 Lead        Pb     
 7 Quicksilver Hg     
 8 Arsenic     As     
 9 Quicksilver Hg     
10 Arsenic     As     
# … with 2,490 more rows

Hope it helps :slightly_smiling_face:

Hi @Rafn - Perhaps you can use something from this little example?

# Load libraries ----------------------------------------------------------
library("tidyverse")

# Define example data -----------------------------------------------------
d <- tribble(
  ~x1,
  "ab cd",
  "ab  cd",
  "ab   cd",
  "ab    cd",
  "ab     cd",
)

# Wrangle data ------------------------------------------------------------
d_clean <- d %>%
  mutate(x2 = str_squish(x1))

Yielding

> d
# A tibble: 5 x 1
  x1       
  <chr>    
1 ab cd    
2 ab  cd   
3 ab   cd  
4 ab    cd 
5 ab     cd
> d_clean
# A tibble: 5 x 2
  x1        x2   
  <chr>     <chr>
1 ab cd     ab cd
2 ab  cd    ab cd
3 ab   cd   ab cd
4 ab    cd  ab cd
5 ab     cd ab cd

Try to check ?stringr::str_squish :+1:

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

The NA happens, if there is not an exact 1:1 correspondence between your look-up and your data

Hello Leon

Thank you very much for the reponse.

This is exactly what i was looking for, unfortunately, sometimes i get NA's in the new column.
Is there any kind og characters this formula doesn't allow.

I have some more advances chemical names like "Nonylphenol-monoethoxylater (NP1EO)" and "Benzfluranthen b+j+k" which should be translated to "Nonylphenol Monoethoxylat" and "Benzo(b+j+k)fluoranthen"

Is it maybe because of the space in the name and if so how do i counter it?

image

I wrote all the space manually again and it fixed it, i think the data i got was using tabs instead of spaces.

Thank you very much.

Yep, that'd do it - tabs versus space is a classic. Great you got it working :+1:

Hello again Leon

Now my scripts is almost working, but the tabs versus spaces seems to be a problem somewhere else in my dataset, is there a function to detect all spaces and tabs in all columns of my data?