R replace href content with anchor value

I have a paragraph of text, something like this:

"Lots of stuff written about <a href="figures/Climate.png" rel="lightbox" title="Figure 2.1">Figure 2.1</a> but i want to remove the link"

I want to use the base R functions sub()/gsub() or the tidyverse's stringr package to end up with:

"Lots of stuff written about Figure 2.1 but i want to remove the link"

I can't seem to find a regular expression that matches the opening "<a ...>" tag and the closing "</a>".

I used to use Perl for this kind of thing (but it's been a while), so I am interested in doing this in R.

Currently I use

  stringr::str_match_all(aline,"<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>")

to extract all of the linked items and their URLs, but I can't work out how to remove the tags and keep only the link text.
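One way to keep only the link text is to reuse that same pattern with stringr's str_replace_all() and refer to the second capture group (the link text) as "\\2" in the replacement. This is a minimal sketch, assuming the input is a single character string like the one in the question; the second string, `two`, is made up for illustration.

```r
library(stringr)

aline <- 'Lots of stuff written about <a href="figures/Climate.png" rel="lightbox" title="Figure 2.1">Figure 2.1</a> but i want to remove the link'

# Same pattern as in the question: group 1 captures the href value,
# group 2 captures the link text. Replacing each whole match with "\\2"
# keeps only the link text.
link_pattern <- "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>"

cleaned <- str_replace_all(aline, link_pattern, "\\2")
cleaned
#> [1] "Lots of stuff written about Figure 2.1 but i want to remove the link"

# The lazy (.*?) groups stop at the first closing quote and the first </a>,
# so several links in one string are each replaced separately.
two <- 'See <a href="a.png">Figure 1</a> and <a href="b.png">Figure 2</a>.'
cleaned_two <- str_replace_all(two, link_pattern, "\\2")
cleaned_two
#> [1] "See Figure 1 and Figure 2."
```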

How about this?

library(stringr)

text <- 'Lots of stuff written about <a href="figures/Climate.png" rel="lightbox" title="Figure 2.1">Figure 2.1</a> but i want to remove the link'

str_remove_all(text, "</?a[^>]*>")
#> [1] "Lots of stuff written about Figure 2.1 but i want to remove the link"

Created on 2020-04-20 by the reprex package (v0.3.0.9001)
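If you would rather stay in base R, a similar result can be sketched with gsub() by capturing the link text and replacing the whole <a ...>...</a> element with it. One design choice worth noting: the \\b after "a" stops the pattern from matching other tags that start with "a", such as <abbr> or <aside>, which a pattern like "</?a[^>]*>" would also strip. The `html` example string below is hypothetical.

```r
text <- 'Lots of stuff written about <a href="figures/Climate.png" rel="lightbox" title="Figure 2.1">Figure 2.1</a> but i want to remove the link'

# perl = TRUE uses PCRE; (.*?) lazily captures the link text up to the
# first </a>, and "\\1" puts that text back in place of the whole element.
no_links <- gsub("<a\\b[^>]*>(.*?)</a>", "\\1", text, perl = TRUE)
no_links
#> [1] "Lots of stuff written about Figure 2.1 but i want to remove the link"

# \\b requires a word boundary after "a", so <abbr> is left untouched.
html <- '<abbr title="HyperText Markup Language">HTML</abbr> and <a href="x.html">a link</a>'
kept_abbr <- gsub("<a\\b[^>]*>(.*?)</a>", "\\1", html, perl = TRUE)
kept_abbr
#> [1] "<abbr title=\"HyperText Markup Language\">HTML</abbr> and a link"
```

Note that "." does not match a newline by default, so if link text can span lines you may need to prefix the pattern with "(?s)".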


@andresrcs Thanks a million!