Help with filtering dataset - string matching

I have this code currently:

VAPublicIP %>% filter(Site == "xxxxxx") %>% ggplot() + 
  geom_point(mapping = aes(x = Host, y = `Plugin ID`, color = Risk))

Let's say that for Site I want to match "def" in abcdef, defgh, bcdefg and so on. How do I modify this code to do that?

This question is very different from the one in your topic title. Please don't go off-topic in your own thread; ask this question in a new topic and provide a REPRoducible EXample (reprex).

I second the reprex request. That, along with a minimal sample dataset, would be very helpful in addressing your question.

Could you also clarify? It sounds like you have variables/columns named abcdef, defgh, and bcdefg, and you want to return only the rows in which the values in that variable are exactly equal to def. Does that sound right?

I’d check out the options in dplyr’s filter function for combining multiple statements, and look up the stringr package and its str_detect function.
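As a rough illustration of the str_detect suggestion, here is a minimal sketch. The data frame `toy` is a hypothetical stand-in for the real data, and `fixed()` is used on the assumption that the search text should be matched literally rather than as a regular expression.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the real VAPublicIP data frame
toy <- data.frame(
  Site = c("abcdef", "defgh", "bcdefg", "abc", "bcdfg"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose Site contains the literal text "def"
matches <- toy %>% filter(str_detect(Site, fixed("def")))
matches$Site
```

Several conditions can be combined inside filter() by separating them with commas (AND) or joining them with | (OR).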

Hi EconomiCurtis,

You are partly correct. Those examples are not column names but column values. I want my code to return the rows whose value in column "Site" contains "def". For example, it should match the values "abcdef", "defgh", and "bcdefg", but NOT "abc", "bcdfg", "deghij", or "efghi".

This is a sample of values in column "Site" (ignore the line number):

39 OfficeVPC-PublicIP(EXTERNAL)-VCP6-DCInfra        
40 OfficeVPC-PublicIP(EXTERNAL)-OfficeInfra          
44 Location1-PublicIP(EXTERNAL)-RDP               
45 Location1-PublicIP(EXTERNAL)-VCP2-1st          
46 Location1-PublicIP(EXTERNAL)-VCP2-2nd          
47 Location1-PublicIP(EXTERNAL)-Gate            
48 Location1-PublicIP(EXTERNAL)-Office         
49 VASNOC-PublicIP(EXTERNAL)                       
50 VASSOC-PublicIP(EXTERNAL)                       
51 Location1-PublicIP(EXTERNAL)-VCP2              
52 VSMServer-PublicIP(EXTERNAL)-VCP6               
53 Location1-PrivateIP(EXTERNAL)-Africa          
54 DEVOPS-PublicIP(EXTERNAL)-VCP6                  
55 VASNOC-PublicIP(EXTERNAL)-ByDomain              
56 CPT-PublicIP(EXTERNAL)-VCP6             
57 IXXSEM-PublicIP(EXTERNAL)-VCP6        
58 Webmail-PublicIP(EXTERNAL)-VCP6&Moscow 
59 Service-PublicIP(EXTERNAL)                 
60 Location1-PublicIP(EXTERNAL)-VCP6              
61 Location1-PublicIP(EXTERNAL)-Bangkok-NewIPRange 
62 DataCenter-PublicIP(EXTERNAL)-CCTV       
63 Location1-PublicIP(EXTERNAL)-VCP10              
64 Webmail-PublicIP(EXTERNAL)-VCP6&PD           
65 IXD-PublicIP(EXTERNAL)-TVCP6

So if I want only the values that contain VCP2, it should return just these:

Location1-PublicIP(EXTERNAL)-VCP2-1st
Location1-PublicIP(EXTERNAL)-VCP2-2nd
Location1-PublicIP(EXTERNAL)-VCP2

The code below does not return anything.

VAPublicIP %>% filter(Site == "VCP2") %>% ggplot() + 
  geom_point(mapping = aes(x = Host, y = `Plugin ID`, color = Risk))

How do I fix this?

Thanks

Hi,

Here is an example using grepl to filter with a regex. It keeps only the rows where Site contains your string of choice.

# Example data: Site holds the strings to search
VAPublicIP <- data.frame(V1 = 1:7, Site = c("abcdef", "defgh", "bcdefg", "abc", "bcdfg", "deghij", "efghi"))
# Keep only the rows whose Site contains "def"
VAPublicIP[grepl("def", VAPublicIP$Site, ignore.case = FALSE), ]

I set ignore.case = FALSE (which is also the default), but change it to TRUE if you want to match both def and DEF.
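For the VCP2 case from the question, the same approach might look like the sketch below. The data frame `sites_df` is a hypothetical stand-in holding a few of the posted Site values, and `fixed = TRUE` is an assumption on my part: it makes grepl treat the pattern as plain text, which can be safer when search strings contain characters such as parentheses.

```r
# Hypothetical stand-in with a handful of the Site values from the question
sites_df <- data.frame(
  Site = c(
    "Location1-PublicIP(EXTERNAL)-VCP2-1st",
    "Location1-PublicIP(EXTERNAL)-VCP2-2nd",
    "Location1-PublicIP(EXTERNAL)-VCP2",
    "Location1-PublicIP(EXTERNAL)-VCP6",
    "IXD-PublicIP(EXTERNAL)-TVCP6"
  ),
  stringsAsFactors = FALSE
)

# Keep rows whose Site contains the literal text "VCP2"
vcp2 <- sites_df[grepl("VCP2", sites_df$Site, fixed = TRUE), , drop = FALSE]
vcp2$Site
```

In the original pipeline, the same test should be able to replace Site == "VCP2" inside filter(), for example filter(grepl("VCP2", Site, fixed = TRUE)).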

Hi Sir,

Thank you. It works.

How can I count the number of rows returned by the code?

Hi,

Just use the nrow() function for a data frame or length() for a vector.

nrow(VAPublicIP)
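Note that nrow(VAPublicIP) on its own counts every row in the data frame. To count only the matching rows, the filter has to be applied first. A small sketch using the toy data from the earlier reply (the variable names `hits`, `n_by_sum`, and `n_by_nrow` are just illustrative):

```r
VAPublicIP <- data.frame(
  V1 = 1:7,
  Site = c("abcdef", "defgh", "bcdefg", "abc", "bcdfg", "deghij", "efghi"),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where Site contains "def"
hits <- grepl("def", VAPublicIP$Site)

# Option 1: TRUE counts as 1, so summing gives the number of matches
n_by_sum <- sum(hits)

# Option 2: subset first, then count the rows of the result
n_by_nrow <- nrow(VAPublicIP[hits, ])
```

Both give the same count; sum(hits) avoids building the subset when only the number is needed.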
