Special Characters

Excuse me, I don't speak English well, but as a last resort I decided to register with the community. I am finishing my semester and taking a research course in my degree program, for which I have to use R, and honestly there are many things that I do not know. My problem is the following: some variables in my database contain unreadable characters, which makes it difficult to work with in R. Is there a way to handle them more easily within RStudio?

I will leave it here:

Nombre Institu~ Alineamiento P~ Asignaci\u0097~ Planificaci\u0~ Liderazgo Rol y dependen~ Gesti\u0097n d~ Alineaci\u0097~ Gesti\u0097n d~

1 "Direcci\x97n d~ 4 3 3 4 4 3 4 3
2 "Servicio de Im~ 4 3 4 3 3 4 4 4
3 "Servicio de Ev~ 3 4 4 4 4 4 4 2
4 "Servicio de Te~ 3 4 4 4 4 4 4 4
5 "Defensor\x92a ~ 4 3 4 4 4 3 4 4
6 "Caja de Previs~ 3 3 3 4 4 3 4 3
7 "Comisi\x97n Ch~ 2 3 3 3 3 3 3 4
8 "Superintendenc~ 3 3 3 3 3 3 3 3
9 "Superintendenc~ 4 3 4 4 4 3 4 3

My problem affects both rows and columns. If anyone can help me, I would greatly appreciate it. (I hope you will forgive me if I have broken any rules.)

Hi, and welcome.

Your written English is concise and communicates the problem well.

There are three things to try:

  1. Review the wiki UTF-8 page.
  2. Save your source data with UTF-8 encoding. This depends on your text editor and operating system.
  3. Make certain that RStudio has UTF-8 set as the default encoding, using File | Save with Encoding.

Come back with further questions.

Thank you very much. I still could not solve the problem, but your answer helped me find concepts that I did not know and that I will surely have to handle in order to solve it. Have a good day.

Come back if you have problems, please.

To fully understand the problem, you'll need to know what encoding your database uses (for example, MySQL historically defaulted to the latin1 character set with the "latin1_swedish_ci" collation) and what functions you used to read or import your database content (readLines(), for instance, has an encoding argument).
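When the source encoding is not documented, one way to narrow it down is to look at the raw bytes and try a few candidate encodings. The following is only a sketch: the sample value is taken from the rows shown in the question, the list of candidates is an assumption, and the encoding names that iconv() accepts vary by platform.

```r
# A value as it appears in the rows: the accented letter is the single byte 0x97.
x <- "Direcci\x97n"

# Inspect the raw bytes to see what is actually stored.
bytes <- charToRaw(x)
bytes

# Try a few candidate source encodings and compare the results by eye.
# These names are assumptions; use iconvlist() to see what your platform offers.
candidates <- c("latin1", "windows-1252", "macintosh")
res <- sapply(candidates, function(enc) iconv(x, from = enc, to = "UTF-8"))
res
```

The candidate that produces a sensible Spanish word is the most likely source encoding.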

Anyway, based on the context, I think that "\x97" in your rows (and "\u0097" in the column names) is supposed to be o with an acute accent (ó). This is unusual, as in Unicode it would be "\u00f3". A larger table of encodings suggests that the encoding here is the old Macintosh one (Mac OS Roman), in which byte 0x97 is ó; Apple has since switched to UTF-8. So we can convert your text:

x <- "\x97"
x
#> [1] "—"  # not the character we expect
xx <- iconv(x, "mac", "UTF-8")
xx
#> [1] "ó"
Encoding(c(x, xx))
#> [1] "unknown" "UTF-8"

You can see the full list of encodings that iconv() supports with iconvlist(). Note that Encoding() only supports "latin1", "UTF-8", "unknown", and the special "bytes", so you cannot use Encoding(x) <- "mac" as one might have expected.
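To apply the same conversion to a whole table rather than a single string, something along these lines may work. It is a minimal sketch: the data frame `datos` is a made-up stand-in for the imported data, and "macintosh" is the assumed source encoding (the name may need to be "mac" or another alias, depending on what iconvlist() reports on your system).

```r
# Hypothetical data frame standing in for the imported table.
datos <- data.frame(
  Nombre = c("Direcci\x97n", "Defensor\x92a", "Servicio"),
  Liderazgo = c(4, 4, 3),
  stringsAsFactors = FALSE
)

# Convert every character column from the assumed source encoding to UTF-8.
# Numeric columns are left untouched.
is_chr <- vapply(datos, is.character, logical(1))
datos[is_chr] <- lapply(datos[is_chr], iconv, from = "macintosh", to = "UTF-8")

datos$Nombre
Encoding(datos$Nombre)
```

Plain ASCII values such as "Servicio" pass through unchanged, so it is safe to run the conversion over the full column.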

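In the column headers, the character appears as "\u0097", which iconv() translates to "¬ó" rather than "ó", presumably because its UTF-8 form is the two bytes C2 97 and each byte is converted separately. (I don't know why it is not the same as in the rows; it might depend on the source and the functions used.) You can always replace it selectively with: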

stringr::str_replace(x, "\u0097", "\x97")

And then run iconv(). Or, to go directly to Unicode:

stringr::str_replace(x, "\u0097", "\u00f3")
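For several column names at once, the direct replacement can be sketched with base R's gsub() and a small lookup of damaged and intended characters. The column names below are hypothetical, and only the two codes seen in this thread ("\u0097" for ó and "\u0092" for í) are listed, so the lookup would need extending for other accented letters.

```r
# Hypothetical column names showing the same damage as in the printed table.
nombres <- c("Nombre", "Gesti\u0097n", "Evaluaci\u0097n", "Liderazgo")

# Damaged characters and the letters they are assumed to stand for.
malos  <- c("\u0097", "\u0092")
buenos <- c("\u00f3", "\u00ed")

# Replace every occurrence of each damaged character (fixed = TRUE: no regex).
for (i in seq_along(malos)) {
  nombres <- gsub(malos[i], buenos[i], nombres, fixed = TRUE)
}
nombres
```

With a real data frame, the same loop can be run on names(datos) and the result assigned back with names(datos) <- nombres.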
