Delete values with different first character

Hi everyone,

I have a table with a column named "Product ID" with values: c("Gold 01", "Gold 02", "Gold 03", "Sold 02", "Sold 04")
I want to delete the rows whose Product ID starts with "S" (e.g. Sold 02, Sold 04).
Could you please show me how to do this in R?

Thank you so much!

Do you want to filter the data to keep rows where the Product ID starts with G?

library(dplyr)

df <- data.frame(PRODUCT_ID = c("Gold 01", "Gold 02", "Sold 02", "Sold 04", "Gold 03"),
                 Number = 1:5)
df
#>   PRODUCT_ID Number
#> 1    Gold 01      1
#> 2    Gold 02      2
#> 3    Sold 02      3
#> 4    Sold 04      4
#> 5    Gold 03      5

df <- df %>% filter(grepl("^G", PRODUCT_ID))
df
#>   PRODUCT_ID Number
#> 1    Gold 01      1
#> 2    Gold 02      2
#> 3    Gold 03      5

Created on 2019-05-18 by the reprex package (v0.2.1)

Here is similar code but the filter condition keeps rows that do not start with S.

library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union
df <- data.frame(PRODUCT_ID = c("Gold 01", "Gold 02", "Sold 02", "Sold 04", "Gold 03"),
                 Number = 1:5)
df
#>   PRODUCT_ID Number
#> 1    Gold 01      1
#> 2    Gold 02      2
#> 3    Sold 02      3
#> 4    Sold 04      4
#> 5    Gold 03      5

df <- df %>% filter(!grepl("^S", PRODUCT_ID))
df
#>   PRODUCT_ID Number
#> 1    Gold 01      1
#> 2    Gold 02      2
#> 3    Gold 03      5

Created on 2019-05-18 by the reprex package (v0.2.1)

Thank you so much @FJCC!

I have another question. In this case, I know that my ProductIDs start with G or S.

However, suppose I have a million ProductIDs and I don't know whether they start with G, S, I, E, and so on. I only want to keep G and get rid of other row/ Product ID which do not start with G. How can I detect or check whether a ProductID starts with another letter, and then delete those rows?

Thank you so much!

The first code I posted keeps only the rows that start with G. Doesn't that meet your request:

I only want to keep G and get rid of other row/ Product ID which do not start with G.

I am sorry if I am missing the point.
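
If the goal is also to see which first letters actually occur before filtering, a minimal sketch in base R could look like the following. The sample IDs and the names first_letter, letter_counts and df_g are made up for illustration; they are not from your data.

```r
# Hypothetical sample data; these IDs are invented for illustration
df <- data.frame(PRODUCT_ID = c("Gold 01", "Sold 02", "Iron 03", "Gold 04", "Emerald 05"),
                 Number = 1:5,
                 stringsAsFactors = FALSE)

# Extract the first character of every ID and count how often each one occurs
first_letter <- substr(df$PRODUCT_ID, 1, 1)
letter_counts <- table(first_letter)
letter_counts

# Keep only the rows whose ID starts with G, whatever the other letters are
df_g <- df[first_letter == "G", ]
df_g
```

The table is only a check on what is in the column. The filter itself does not need to know which other letters exist.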


What if "G" also appears in other positions in my ProductID:

#> 1 GoldG 01
#> 2 SGold 02
#> 3 SoldG 02
#> 4 SoldE 04
#> 5 GoldG 03

and I want to remove only the ProductIDs whose first character is "G" (e.g. GoldG 01).
Does the "df <- df %>% filter(grepl("^G", PRODUCT_ID))" still work?
Do I need to specify the position of G in the text for the function to work?

Thank you so much!

P/S: Sorry if my English is not really good :cry:

The code

df <- df %>% filter(grepl("^G", PRODUCT_ID))

keeps only those rows where PRODUCT_ID starts with G. In a regular expression, the ^ character is an anchor that matches the beginning of the text, so a G anywhere else in the ID is ignored. If you are not familiar with regular expressions, there are many online tutorials where you can learn about them.
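
To illustrate the effect of the anchor on the IDs from your last post, here is a small sketch in base R. The names ids, starts_with_g and contains_g are just for illustration, and the same grepl() conditions can be used inside filter().

```r
ids <- c("GoldG 01", "SGold 02", "SoldG 02", "SoldE 04", "GoldG 03")

# "^G" matches only when G is the very first character
starts_with_g <- grepl("^G", ids)
starts_with_g

# "G" without the anchor matches a G anywhere in the text
contains_g <- grepl("G", ids)
contains_g

# Keep the IDs that start with G ...
ids[starts_with_g]

# ... or drop them instead by negating the condition
ids[!starts_with_g]
```

So the position of G does not need to be given separately: the ^ already restricts the match to the first character.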

Thank you so much @FJCC

I'll take my time to learn regular expressions :blush:

