How to exclude specific data line format from a table

Hi all,
I am pretty new to R and am currently working on my first project.
Before I can use a table for my analysis, I have to do some cleanup. The contents of column A are stored as character strings.
In general, each entry in column A looks either like 123.456.789 or 12.345.678 (similar to an IP address) or like ABC50BC4304. I want to exclude all entries that look like IP addresses (123.456.789 and 12.345.678).
Do you have any idea how to exclude this specific format from column A? Thank you so much for your help.

You can do that with regular expressions.
This works in base R with grepl(), or with str_detect() from the stringr package:

columnA <- c("123.456.789", "12.345.678", "12..678", "ABC50BC4304")
pattern <- "^\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"

grepl(pattern, columnA)
#> [1]  TRUE  TRUE FALSE FALSE

library(stringr)

stringr::str_detect(columnA, pattern)
#> [1]  TRUE  TRUE FALSE FALSE

Created on 2021-08-02 by the reprex package (v2.0.0)
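
To actually drop the matching rows from the table, the logical vector can be negated and used as a row index. The following is only a minimal sketch in base R: the data frame name `df` and the column names `A` and `value` are assumptions made for illustration, since the original table was not shown.

```r
# Hypothetical example table; `df`, `A` and `value` are assumed names
df <- data.frame(
  A = c("123.456.789", "12.345.678", "12..678", "ABC50BC4304"),
  value = 1:4,
  stringsAsFactors = FALSE
)

# Pattern from the example: three dot-separated groups of 1 to 3 digits,
# anchored at the start of the string
pattern <- "^\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"

# grepl() is TRUE for the IP-like entries; negating it keeps all other rows
df_clean <- df[!grepl(pattern, df$A), ]

df_clean$A
#> [1] "12..678"     "ABC50BC4304"
```

Because the pattern is anchored only at the start, an entry such as "1.2.3abc" would also count as IP-like. If the whole entry has to consist of the three number groups, appending `$` to the pattern should restrict the match accordingly.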

Hi Han Oostdijk,
thank you so much for your help. Really appreciate it.
Best regards and have a nice evening
