Edit: Because apparently the strings never were UTF-8-encoded in the first place.
I'm trying to read in a CSV file that was saved from Excel, supposedly with UTF-8 encoding. The file contains only strings of German words. When I print the tibble, cells that contain a word with an umlaut ("ä" etc.) appear in quotes, and instead of the umlaut its ISO-8859-1 byte escape is shown.
For instance, store a file Test.csv with the content

    name;city
    Bärbel;Berlin

and read it with

    dat <- read_csv2("Test.csv")

When printing it, this is the output I see:
    > dat
    # A tibble: 1 x 2
      name        city  
      <chr>       <chr> 
    1 "B\xe4rbel" Berlin
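Presumably one could also run guess_encoding on the file itself rather than on an already-read string (a sketch; it assumes Test.csv sits in the current working directory):

```r
library(readr)

# Guess the encoding directly from the raw bytes of the file,
# before any locale is applied by read_csv2().
guess_encoding("Test.csv")
```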
The declared encoding apparently is right:

    > Encoding(dat$name)
    [1] "UTF-8"

but guessing from the raw bytes tells a different story:

    > guess_encoding(charToRaw(dat$name))
    # A tibble: 2 x 2
      encoding   confidence
      <chr>           <dbl>
    1 ISO-8859-1       0.42
    2 ISO-8859-2       0.42
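Inspecting the raw bytes directly seems to confirm this (a quick sketch; the string literal below just reproduces the value as it was read from the file):

```r
# The value as read from the file, with the raw byte 0xe4 embedded.
x <- "B\xe4rbel"

# 0xe4 is "ä" in ISO-8859-1, but it is not a valid byte sequence
# in UTF-8 -- which is why printing falls back to the escape form.
charToRaw(x)
#> [1] 42 e4 72 62 65 6c
```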
So, what is printed to the console is encoded in Latin-1? At least, when I read the file in via

    dat <- read_csv2("Test.csv", locale = locale(encoding = "ISO-8859-1"))

I get
    > dat
    # A tibble: 1 x 2
      name   city  
      <chr>  <chr> 
    1 Bärbel Berlin
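If the file has already been read with the wrong locale, I suppose the column could also be converted after the fact with iconv (a sketch, not necessarily the idiomatic fix):

```r
# Reinterpret the Latin-1 bytes of an already-read column as UTF-8.
dat$name <- iconv(dat$name, from = "ISO-8859-1", to = "UTF-8")
```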
My question, ultimately, is this: do I have to specify separately in what encoding the characters are stored and in what encoding they are printed to the console?
(Apologies if this is a bad question, but I'm quite new to R and I have thoroughly tried to find an answer elsewhere, to no avail. Any help would be much appreciated!)