Delete values with different first character

Thienthanh · May 19, 2019, 2:59am

Hi everyone,

I have a table with a column named "Product ID" containing the values c("Gold 01", "Gold 02", "Gold 03", "Sold 02", "Sold 04").
I want to delete the rows whose Product ID starts with "S" (e.g. Sold 02, Sold 04).
Could you please help me do this in R?

Thank you so much!

FJCC · May 19, 2019, 3:44am

Do you want to filter the data to keep rows where the Product ID starts with G?

library(dplyr)
df <- data.frame(PRODUCT_ID = c("Gold 01", "Gold 02", "Sold 02", "Sold 04", "Gold 03"),
                 Number = 1:5)
df
#>   PRODUCT_ID Number
#> 1    Gold 01      1
#> 2    Gold 02      2
#> 3    Sold 02      3
#> 4    Sold 04      4
#> 5    Gold 03      5

df <- df %>% filter(grepl("^G", PRODUCT_ID))
df
#>   PRODUCT_ID Number
#> 1    Gold 01      1
#> 2    Gold 02      2
#> 3    Gold 03      5

Created on 2019-05-18 by the reprex package (v0.2.1)

Here is similar code but the filter condition keeps rows that do not start with S.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- data.frame(PRODUCT_ID = c("Gold 01", "Gold 02", "Sold 02", "Sold 04", "Gold 03"),
                 Number = 1:5)
df
#>   PRODUCT_ID Number
#> 1    Gold 01      1
#> 2    Gold 02      2
#> 3    Sold 02      3
#> 4    Sold 04      4
#> 5    Gold 03      5

df <- df %>% filter(!grepl("^S", PRODUCT_ID))
df
#>   PRODUCT_ID Number
#> 1    Gold 01      1
#> 2    Gold 02      2
#> 3    Gold 03      5

Created on 2019-05-18 by the reprex package (v0.2.1)
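For anyone who would rather not load dplyr, here is a rough base R sketch of the same "drop everything starting with S" filter. It reuses the example data from above; the name kept and the stringsAsFactors = FALSE argument are my own additions (startsWith() needs a character vector, and older R versions turn strings into factors by default).

```r
# Same example data as above, with PRODUCT_ID kept as character
df <- data.frame(PRODUCT_ID = c("Gold 01", "Gold 02", "Sold 02", "Sold 04", "Gold 03"),
                 Number = 1:5,
                 stringsAsFactors = FALSE)

# startsWith() is a literal prefix test, so no regular expression is needed
kept <- df[!startsWith(df$PRODUCT_ID, "S"), ]
kept
```

Unlike filter(), base subsetting keeps the original row names, so the remaining rows are labelled 1, 2 and 5 rather than being renumbered.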

Thienthanh · May 19, 2019, 4:00am

Thank you so much @FJCC!

I have another question. In this case, I know that my Product IDs start with G or S.

However, suppose I have a million Product IDs and I don't know whether they start with G, S, I, E, and so on. I only want to keep the ones that start with G and get rid of every row whose Product ID does not. How can I check whether any Product IDs start with other letters, and then delete them?

Thank you so much!

FJCC · May 19, 2019, 1:43pm

The first code I posted keeps only the rows that start with G. Doesn't that meet your request to keep G and get rid of the rows whose Product ID does not start with G?

I am sorry if I am missing the point.
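In case the question is how to see which first letters occur at all, one possible approach is to tabulate the first character of each ID before filtering. This is only a sketch: the IDs below are made up for illustration, and first_letter, counts and g_only are names I picked.

```r
# Made-up IDs with several different first letters
df <- data.frame(PRODUCT_ID = c("Gold 01", "Sold 02", "Iron 03", "Gold 04", "Emerald 05"),
                 stringsAsFactors = FALSE)

# Pull out the first character of every ID and count how often each occurs
first_letter <- substr(df$PRODUCT_ID, 1, 1)
counts <- table(first_letter)
counts

# Keep only the rows whose ID starts with G
g_only <- df[first_letter == "G", , drop = FALSE]
g_only
```

The table shows every starting letter present in the column, so it still works with a million rows where you cannot inspect the values by eye.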

Thienthanh · May 22, 2019, 8:17am

Hi @FJCC

If my ProductID has 2 characters "G":

#>   PRODUCT_ID
#> 1   GoldG 01
#> 2   SGold 02
#> 3   SoldG 02
#> 4   SoldE 04
#> 5   GoldG 03

and I want to remove only the Product IDs whose first character is "G" (e.g. GoldG 01).
Does df <- df %>% filter(grepl("^G", PRODUCT_ID)) still work?
Do I need to specify the position of the G in the text for the function to work?

Thank you so much!

P/S: Sorry if my English is not really good

FJCC · May 22, 2019, 2:09pm

The code

df <- df %>% filter(grepl("^G", PRODUCT_ID))

keeps only those rows where PRODUCT_ID starts with G. The ^ character is a regular expression anchor that matches the beginning of the text, so a G anywhere else in the value is ignored. If you are not familiar with regular expressions, there are many online tutorials where you can learn about them.
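To illustrate the difference the ^ makes, here is a small sketch using the IDs from the previous post. The vector names ids, starts_with_g and contains_g are just for this example.

```r
ids <- c("GoldG 01", "SGold 02", "SoldG 02", "SoldE 04", "GoldG 03")

# With ^ the pattern matches only when G is the first character
starts_with_g <- grepl("^G", ids)

# Without ^ the pattern matches a G at any position
contains_g <- grepl("G", ids)

ids[starts_with_g]    # the values that filter(grepl("^G", ...)) keeps
ids[!starts_with_g]   # negating with ! removes those values instead
```

So "SGold 02" and "SoldG 02" contain a G but do not match "^G". If the goal is to remove the IDs starting with G rather than keep them, putting ! in front of grepl() inside filter() should do it.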

Thienthanh · May 23, 2019, 6:27am

Thank you so much @FJCC

I'll take my time to learn regular expressions
