Special Characters

Ledeth · August 17, 2020, 7:42pm

Excuse me, I don't speak English well, but as a last resort I decided to register with the community. I am finishing my semester and taking a research course for my degree, for which I had to use R, and honestly there are many things I do not know. My problem is the following: my database has unreadable characters in some variables, which makes it difficult to work with in R. Is there a way to handle them more easily in RStudio?

I will leave it here:

Nombre Institu~ Alineamiento P~ Asignaci\u0097~ Planificaci\u0~ Liderazgo Rol y dependen~ Gesti\u0097n d~ Alineaci\u0097~ Gesti\u0097n d~

1 "Direcci\x97n d~ 4 3 3 4 4 3 4 3
2 "Servicio de Im~ 4 3 4 3 3 4 4 4
3 "Servicio de Ev~ 3 4 4 4 4 4 4 2
4 "Servicio de Te~ 3 4 4 4 4 4 4 4
5 "Defensor\x92a ~ 4 3 4 4 4 3 4 4
6 "Caja de Previs~ 3 3 3 4 4 3 4 3
7 "Comisi\x97n Ch~ 2 3 3 3 3 3 3 4
8 "Superintendenc~ 3 3 3 3 3 3 3 3
9 "Superintendenc~ 4 3 4 4 4 3 4 3

My problem affects both rows and columns. If anyone can help me, I would greatly appreciate it. (I hope you will forgive me if I have broken any rules.)

technocrat · August 17, 2020, 9:20pm

Hi, and welcome.

English is a world language and everyone here speaks it differently, even those of us who have it as their only language. Your written English is concise and communicates the problem well.

There are three things to try:

Review the wiki UTF-8 page.
Save your source data with UTF-8 encoding. This depends on your text editor and operating system.
Make certain that your RStudio has UTF-8 set as default with File|Save with Encoding.
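
As a rough illustration of the second point, here is a minimal base R sketch that writes a tiny UTF-8 file and reads it back while declaring the encoding. The file and its contents are made up for the example, so adapt the names to your own data.

```r
# Hypothetical data: a two-line CSV written as raw UTF-8 bytes to a temp file
path <- tempfile(fileext = ".csv")
writeLines(c("Nombre,Liderazgo", "Direcci\u00f3n,4"), path, useBytes = TRUE)

# encoding = "UTF-8" tells R to mark the strings it reads as UTF-8
df <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

nombre <- df$Nombre[1]
validUTF8(nombre)   # TRUE when the bytes form valid UTF-8
Encoding(nombre)    # the encoding R has declared for this string
```

If validUTF8() returns FALSE for your real data, the file was probably saved in some other encoding and needs converting first.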

Come back with further questions.

Ledeth · August 17, 2020, 10:34pm

Thank you very much. I still could not solve the problem, but your reply helped me discover concepts I did not know and will surely need in order to solve it. Have a good day.

technocrat · August 17, 2020, 11:53pm

Come back if you have problems, please.

AlexisW · August 18, 2020, 5:23am

To fully understand the problem, you'll need to know what encoding your database uses (for example, MySQL long defaulted to the latin1 character set with the "latin1_swedish_ci" collation), and what functions you used to read or import your database content (readLines() has an encoding option).
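
One way to see what is actually stored is to look at the raw bytes with charToRaw(). This is only a minimal sketch in base R: the two strings are retyped from the printout in the question, so treat them as assumptions about the real data.

```r
# Values retyped from the question's printout (assumed, not the real data)
row_char <- "\x97"     # escape shown in the row values: a single raw byte
col_char <- "\u0097"   # escape shown in the column names: code point U+0097

charToRaw(row_char)    # one byte: 97
charToRaw(col_char)    # two bytes: c2 97 (the UTF-8 encoding of U+0097)
```

If the bytes differ like this, the rows and the headers probably went through different decoding steps on the way in.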

Anyway, based on the context, I think "\x97" in your rows (and "\u0097" in the column names) is supposed to be o with an acute accent (ó). This is unusual, since in Unicode that character is "\u00f3". A larger list of encodings suggests that the encoding here is the (old) Macintosh one (Apple has since switched to UTF-8 as well). So we can convert your text:

x <- "\x97"
x
#> [1] "—"  # not the character we expect
xx <- iconv(x, "mac", "UTF-8")
xx
#> [1] "ó"
Encoding(c(x, xx))
#> [1] "unknown" "UTF-8"

You can see the full list of conversions that iconv() supports with iconvlist(). Note that Encoding() only supports "latin1", "UTF-8" and "unknown" (and a special "bytes"), so you can't use Encoding(x) <- "mac" as one might have expected.
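
To apply the same conversion to a whole column, something like the following should work. It is a minimal sketch: the vector nombres is hypothetical, retyped from two of the truncated row values in the question, and the encoding name "mac" follows the example above (check iconvlist() if your platform does not accept it).

```r
# Hypothetical values retyped from the question (assumed to be Mac Roman bytes)
nombres <- c("Direcci\x97n", "Defensor\x92a", "Servicio")

# iconv() is vectorised, so a whole character column converts in one call
nombres_utf8 <- iconv(nombres, from = "mac", to = "UTF-8")

nombres_utf8
Encoding(nombres_utf8)   # non-ASCII results are marked "UTF-8"
```

In the old Macintosh encoding the byte 0x92 is í, so the second value should come out as "Defensoría". On a data frame this would be something like df$Nombre <- iconv(df$Nombre, "mac", "UTF-8"), where df and Nombre are placeholder names.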

In the column headers, the character appears as "\u0097", which gets translated to "¬ó" instead (I don't know why it's not the same as in the rows; it might depend on the source and the functions used). You can always replace it selectively with:

library(stringr)  # provides str_replace()
str_replace(x, "\u0097", "\x97")

And then run iconv(). Or, to go directly to the Unicode character:

str_replace(x, "\u0097", "\U00f3")
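
The same replacement can also be done in base R with gsub(), which avoids the extra package. This is a minimal sketch with hypothetical column names retyped from the question; the "\u0092" to í mapping is an assumption by analogy with the "\x92" seen in the rows.

```r
# Hypothetical column names retyped from the question's header
cols <- c("Nombre", "Asignaci\u0097n", "Gesti\u0097n")

# Mis-decoded characters and the letters they are assumed to stand for
from <- c("\u0097", "\u0092")
to   <- c("\u00f3", "\u00ed")   # o-acute, i-acute

# fixed = TRUE treats each pattern as literal text, not as a regular expression
for (i in seq_along(from)) {
  cols <- gsub(from[i], to[i], cols, fixed = TRUE)
}

cols
```

With a real data frame you would start from cols <- names(df) and finish with names(df) <- cols, where df is a placeholder for your own object.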
