Removing characters from a string that contains brackets

I'm trying to format the output of the call that grabs my IP address so that it keeps just the numbers. The call is:

myIPaddress <- readLines("https://lnkd.in/drb3YQZV", warn = FALSE)

The variable looks like this (with the digits replaced by zeroes so as not to post my actual IP address):

> myIPaddress
[1] "{\"ip\":\"000.00.000.00\"}"

I want it to be formatted so it just has

> myIPaddress
[1] "000.00.000.00"

I've tried a couple of variations of gsub but am struggling. I'm trying to first get rid of everything at the beginning, so I tried:

> myIPaddress <- readLines("https://lnkd.in/drb3YQZV", warn = FALSE) %>% 
+   gsub(".*\\0",'')
Error in gsub(., ".*\\0", "") : 
  invalid regular expression '{"ip":"000.00.000.00"}', reason 'Invalid contents of {}'
In addition: Warning message:
In gsub(., ".*\\0", "") :
  TRE pattern compilation error 'Invalid contents of {}'

I think it's having a problem with the fact that some of the characters are inside the braces {}. I then found that I could add perl=TRUE, but that just removed everything, as shown below.

> myIPaddress <- readLines("https://lnkd.in/drb3YQZV", warn = FALSE) %>% 
+   gsub(".*\\1",'',perl=TRUE)
> myIPaddress
[1] ""

Doing it this way completely removed the whole address. Any help would be appreciated.
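
A note on the two attempts above, offered as a sketch rather than a definitive diagnosis: the error message shows the call as gsub(., ".*\\0", ""), which suggests the pipe placed the string in gsub's first argument, pattern, instead of x. That would explain both results: the JSON string was being compiled as a regular expression (hence the complaint about {}), and with perl=TRUE it compiled but was applied to the empty string ''. The base R snippet below reproduces this without the pipe; the names raw, piped_call, perl_call and fixed are made up for illustration.

```r
# Stand-in for the value returned by readLines() (zeroes instead of a real address).
raw <- "{\"ip\":\"000.00.000.00\"}"

# What the piped call amounts to: `raw` lands in the `pattern` slot,
# ".*\\0" becomes the replacement, and '' is the text being searched.
piped_call <- tryCatch(
  suppressWarnings(gsub(raw, ".*\\0", "")),
  error = function(e) "error"
)

# With perl = TRUE the braces are accepted as literal characters,
# but the text being searched is still the empty string.
perl_call <- gsub(raw, ".*\\1", "", perl = TRUE)

# Naming the arguments keeps the string in `x`;
# this pattern drops every character that is not a digit or a period.
fixed <- gsub(pattern = "[^0-9.]", replacement = "", x = raw)
print(fixed)
```

When piping with %>%, passing the data explicitly as a dot, for example gsub("[^0-9.]", "", .), should keep the string out of the pattern slot.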

Hi,

If I understand the question, you can use the {stringr} package like this:

library(stringr)

str_extract(myIPaddress, pattern = "\\d.*\\d")

This looks for the first and last digits in the string and extracts them along with everything between them.
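
To illustrate the "first digit to last digit" behaviour, here is a small base R sketch that applies the same pattern with regexpr() and regmatches() instead of str_extract(). The helper name first_match and the second input, with an extra "port" field, are hypothetical. They are only there to show that the greedy .* runs to the last digit anywhere in the string, so the pattern relies on the address being the only digits present.

```r
# Helper (illustrative name): return the first match of `pattern` in `x`.
first_match <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

ip_only <- "{\"ip\":\"000.00.000.00\"}"
# Hypothetical response carrying a second numeric field.
with_port <- "{\"ip\":\"000.00.000.00\",\"port\":80}"

first_match(ip_only, "\\d.*\\d")
#> [1] "000.00.000.00"
first_match(with_port, "\\d.*\\d")
#> [1] "000.00.000.00\",\"port\":80"
```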

Hope this is useful

That worked, thanks!

Maybe this helps?

myIPaddress<- "{\"ip\":\"000.00.000.00\"}"
(m1 <- gsub("^[^0123456789]+","",myIPaddress))
#> [1] "000.00.000.00\"}"
(m2 <- gsub("\"}$","",m1))
#> [1] "000.00.000.00"
Created on 2022-08-05 by the reprex package (v2.0.1)

Hello,

I personally think grep and the related functions in base R are confusing and unintuitive. I recommend the "stringr" package from the Tidyverse, which is much more elegant.

In your case, you want this:

library(stringr)

str_extract("{\"ip\":\"000.00.000.00\"}", "(\\d+\\.?)+")
#> [1] "000.00.000.00"

Created on 2022-08-04 by the reprex package (v2.0.1)

The regular expression I used starts at the first digit and then keeps matching runs of digits, each optionally followed by a period, until it reaches a character that is neither.
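
One thing worth keeping in mind with this pattern, shown here as a base R sketch rather than with stringr: because the period is optional, any run of digits qualifies as a match, and only the first match is returned. The helper name first_match and the "id" field in the second input are invented for illustration, and the stricter four-group pattern is just one possible refinement that assumes an IPv4-style address.

```r
# Helper (illustrative name): return the first match of `pattern` in `x`.
first_match <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

loose  <- "(\\d+\\.?)+"               # pattern from the answer above
strict <- "\\d{1,3}(\\.\\d{1,3}){3}"  # four digit groups separated by periods

ip_only <- "{\"ip\":\"000.00.000.00\"}"
# Hypothetical response where another number comes before the address.
with_id <- "{\"id\":7,\"ip\":\"000.00.000.00\"}"

first_match(ip_only, loose)
#> [1] "000.00.000.00"
first_match(with_id, loose)
#> [1] "7"
first_match(with_id, strict)
#> [1] "000.00.000.00"
```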

Hope this helps,
PJ

You have to refine the regular expression. Also, the stringr package is very handy:

text <- "{\"ip\":\"000.00.000.00\"}"

stringr::str_extract(text, "(\\d{1,3}\\.?){4}")
#> [1] "000.00.000.00"

Created on 2022-08-04 by the reprex package (v2.0.1)

Try str_extract from the stringr package.

library(stringr)
MyIP <- "{\"ip\":\"000.00.000.00\"}"
str_extract(MyIP, "\\d+\\.\\d+\\.\\d+\\.\\d+")
#> [1] "000.00.000.00"

Created on 2022-08-04 by the reprex package (v2.0.1)

I feel the need to explain the reason for my "mosterd na de maaltijd" answer (a Dutch saying, literally "mustard after the meal", meaning it arrived too late to be useful):
My answer was the first reaction to the question, and I answered it in the terms the OP used.
It was then put on hold awaiting clearance by a moderator.
Later I saw the answer by @W4tRb0y that uses str_extract, and indeed I think that is a better solution.
Because my answer was waiting for the moderator, I had the opportunity to delete it, but being curious why it had been held for moderation, I decided not to.
Today my solution was admitted, but without any reason given for the delay.
I'm still curious why my suggestion was held for moderation. Was it because of the use of gsub?

It is an auto-flagging system, most likely triggered this time by the IP-like address used in the example (my answer was auto-flagged too). As far as I know, auto-flags are cleared by a single person (the forum administrator), so it sometimes takes a while depending on his workload.
