Please help me with this question. I badly need an answer.
For English text this is of limited use, but when working with non-English text that contains non-ASCII characters it can be crucial. The function iconv()
lets you translate text between different encodings, and thus render it in a legible fashion.
Consider this example:
# Two sentences, one in Czech, the other in Slovak, heavily reliant on accented characters:
tea_leaves <- "P\xf8\xedli\xb9 \xbelu\xbbou\xe8k\xfd k\xf9\xf2 \xfap\xecl \xef\xe1belsk\xe9 \xf3dy. P\xe4\xbbt\xfd\xbed\xf2ov\xe9 v\xe5\xe8at\xe1 nerv\xf3zne \xb9tekaj\xfa na m\xf4jho \xefat\xb5a v t\xe0n\xed."
# this makes no sense...
cat(tea_leaves)
P��li� �lu�ou�k� k�� �p�l ��belsk� �dy. P�t��d�ov� v��at� nerv�zne �tekaj� na m�jho �at�a v t�n�.
# converting to UTF-8
legible <- iconv(tea_leaves, from = "ISO-8859-2", to = "UTF-8")
# this still makes no sense, but at least the accented characters look "right"
cat(legible)
Příliš žluťoučký kůň úpěl ďábelské ódy. Päťtýždňové vĺčatá nervózne štekajú na môjho ďatľa v tŕní.
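If you want to check whether a conversion is needed at all, a minimal sketch along these lines may help. It uses only base R (validUTF8(), iconv(), Encoding()); the short sample string and the variable names raw_text and converted are mine, chosen for illustration, and I am assuming the input really is ISO-8859-2.

```r
# Hypothetical sample: the Czech word for "horse" stored as ISO-8859-2 bytes
raw_text <- "k\xf9\xf2"

# The bytes are not valid UTF-8, which is why printing them gives garbage
print(validUTF8(raw_text))

# Translate to UTF-8, assuming the source encoding is ISO-8859-2
converted <- iconv(raw_text, from = "ISO-8859-2", to = "UTF-8")

# After conversion the string is valid UTF-8 and is marked as such
print(validUTF8(converted))
print(Encoding(converted))

# Three bytes in, three characters out (the UTF-8 form uses more bytes)
print(nchar(raw_text, type = "bytes"))
print(nchar(converted, type = "chars"))
```

If validUTF8() already returns TRUE for your input, the text is most likely fine as it is and the display problem lies elsewhere.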
Also note that:
- the "lege artis" way of storing text is UTF-8 encoding; use it whenever you can
- in addition to base iconv() there is the {stringi} package, which allows greater flexibility in manipulating character encodings
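
For the {stringi} route, here is a rough sketch of the same conversion, assuming the package is installed (install.packages("stringi")). The variable names are mine; stri_enc_isutf8(), stri_encode() and stri_enc_detect() are {stringi} functions, and the detection step is only a statistical guess that can be unreliable on short inputs.

```r
# First part of the Czech sentence from above, stored as ISO-8859-2 bytes
raw_text <- "P\xf8\xedli\xb9 \xbelu\xbbou\xe8k\xfd k\xf9\xf2"

# Guarded so the script still runs where {stringi} is not installed
if (requireNamespace("stringi", quietly = TRUE)) {
  # Is the input already valid UTF-8?
  print(stringi::stri_enc_isutf8(raw_text))

  # Convert, assuming the source encoding is ISO-8859-2
  legible_stri <- stringi::stri_encode(raw_text, from = "ISO-8859-2", to = "UTF-8")
  cat(legible_stri, "\n")

  # If the source encoding is unknown, ask for a ranked list of guesses
  print(stringi::stri_enc_detect(raw_text)[[1]])
}
```

The result should match what iconv() gives for the same input; the extra value of {stringi} is mostly in the helpers around it, such as the validity check and the encoding detection.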
Thanks a lot, it helped me greatly.