How to remove unicode "<U+200B>" from a variable

Hi everyone,
I scraped some data and one of my variables has the following structure:

x <- as.character(c("10 318[78]<U+200B>", "16 970[78]<U+200B>", "22 898[78]<U+200B>"))

However, the <U+200B> part only shows up with the str() function. When I print the variable, it just looks like this:

> x
[1] "10 318[78]" "16 970[78]" "22 898[78]"

The thing is that I can remove the [78] part from the values, but I can't remove the <U+200B> part, and for this reason I can't convert the variable to numeric either.
I tried with gsub("<U\\+200B>", "", x) but it didn't work.
How can I remove the <U+200B> from my variable?

Note that this is not how to create a string with an embedded Unicode character. I assume what you really wanted was

x <- c("10 318[78]\u200B", "16 970[78]\u200B", "22 898[78]\u200B")

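The \uxxxx notation is an escape for the Unicode character with hexadecimal code point xxxx. \u200B is the zero-width space, which str() displays as <U+200B>. That text is only a display form and not part of the string itself, which is why the pattern "<U\\+200B>" found nothing to replace.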
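
If you want to confirm that the invisible character really is there, something like the following should work. It is only a sketch using base R; the names s and last are made up for the illustration.

```r
# One value from the question, with a trailing zero-width space
s <- "10 318[78]\u200B"

# nchar() counts the invisible character too: 11, although only 10 are visible
nchar(s)

# Extract the final character and look up its code point
last <- substring(s, nchar(s))
utf8ToInt(last)  # 8203, which is 0x200B in decimal

# The hex digits in the escape are not case-sensitive
identical("\u200b", "\u200B")  # TRUE
```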

To remove it:

gsub("\u200b", "", x)
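
Since the end goal is a numeric variable, here is a rough sketch of the whole cleanup in base R. It assumes the thousands separator is an ordinary space, as it appears in the question; scraped data sometimes uses a non-breaking space instead, in which case step 3 would need a different pattern. The intermediate names (step1, step2, step3, x_num) are just for illustration.

```r
# The scraped values from the question, built with the \u escape
x <- c("10 318[78]\u200B", "16 970[78]\u200B", "22 898[78]\u200B")

# 1. Drop the zero-width space (U+200B)
step1 <- gsub("\u200B", "", x)

# 2. Drop the bracketed marker such as [78]
step2 <- gsub("\\[[0-9]+\\]", "", step1)

# 3. Drop the space used as a thousands separator (assumed to be a plain space)
step3 <- gsub(" ", "", step2, fixed = TRUE)

# 4. Convert to numeric
x_num <- as.numeric(step3)
x_num
```

If as.numeric() still returns NA with a warning, there is probably another invisible character left over, and nchar() on the cleaned strings is a quick way to check.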

Yeah, you're totally right. I didn't know how to create it correctly.
And thanks a lot, the solution worked out fine!
