Pattern matching with rebus and stringr

I have the following tibble, where the part of each string before the first underscore identifies a product (e.g. 12370 for the first entry).

Products 
# A tibble: 1,895 x 1
   Bilder           
   <chr>            
 1 12370_00_18.jpg  
 2 12785_001_18.jpg 
 3 170424_001_18.jpg
 4 30117_001_19.jpg 
 5 30117_002_19.jpg 
 6 30117_003_19.jpg 
 7 30117_004_19.jpg 
 8 31031_001_19.jpg 
 9 31032_001_19.jpg 
10 31033_001_19.jpg 
# ... with 1,885 more rows

I want to add a column (e.g. Product) that contains only the product number, something like:

Products 
# A tibble: 1,895 x 2
   Bilder            Product
   <chr>             <chr>      
 1 12370_00_18.jpg   12370      
 2 12785_001_18.jpg  12785      
 3 170424_001_18.jpg 170424      
 4 30117_001_19.jpg  30117      
 5 30117_002_19.jpg  30117      
 6 30117_003_19.jpg  30117      
 7 30117_004_19.jpg  30117      
 8 31031_001_19.jpg  31031      
 9 31032_001_19.jpg  31032      
10 31033_001_19.jpg  31033      
# ... with 1,885 more rows

I already tried some functions from the stringr and rebus packages, but I could not get my match (with str_match()) to end right before the first underscore character.
Thank you very much.

Does this work for you?

library(tidyverse)

df <- data.frame(stringsAsFactors=FALSE,
                 Bilder = c("12370_00_18.jpg", "12785_001_18.jpg", "170424_001_18.jpg",
                            "30117_001_19.jpg", "30117_002_19.jpg", "30117_003_19.jpg",
                            "30117_004_19.jpg", "31031_001_19.jpg", "31032_001_19.jpg",
                            "31033_001_19.jpg")
)

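# "^\\d+" matches the run of digits at the start of each string,
# so the match stops right before the first underscore.
df %>% 
    mutate(Product = str_extract(Bilder, "^\\d+")) 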
#>               Bilder Product
#> 1    12370_00_18.jpg   12370
#> 2   12785_001_18.jpg   12785
#> 3  170424_001_18.jpg  170424
#> 4   30117_001_19.jpg   30117
#> 5   30117_002_19.jpg   30117
#> 6   30117_003_19.jpg   30117
#> 7   30117_004_19.jpg   30117
#> 8   31031_001_19.jpg   31031
#> 9   31032_001_19.jpg   31032
#> 10  31033_001_19.jpg   31033

Created on 2019-11-13 by the reprex package (v0.3.0.9000)
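
Since the question mentions str_match() and rebus, here is a rough sketch of two equivalent ways to stop the match before the first underscore. It assumes the product part never contains an underscore. The vector `bilder` and the result names are made up for illustration, and the rebus part only runs if that package is installed.

```r
library(stringr)

# Illustrative sample taken from the question's data
bilder <- c("12370_00_18.jpg", "170424_001_18.jpg", "30117_004_19.jpg")

# str_match() returns a character matrix: column 1 is the full match,
# column 2 is the first capture group (everything before the first "_").
product_match <- str_match(bilder, "^([^_]+)_")[, 2]

# str_extract() with a lookahead keeps the underscore out of the match.
product_extract <- str_extract(bilder, "^[^_]+(?=_)")

# Optional: the digits-at-start pattern assembled with rebus building blocks.
if (requireNamespace("rebus", quietly = TRUE)) {
  library(rebus)
  rx <- START %R% one_or_more(DGT)
  product_rebus <- str_extract(bilder, rx)
}
```

The capture-group version is probably closest to the original str_match() attempt: the underscore is part of the pattern, but only the group in column 2 is kept.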


Works perfectly, thank you very much!
