What is the use of the following: iconv(), UTF-8, UTF-8-MAC, converting text to vector?

Suraj_viswakarma · February 5, 2020, 12:44pm

Please help me with this question. I badly need an answer.

jlacko · February 5, 2020, 1:18pm

For English text this has limited use; when working with non-English texts that use non-ASCII characters it can be crucial. The function iconv() lets you translate your text from one encoding to another, and thus render it in a legible fashion.

Consider this example:

# Two sentences, one in Czech, the other in Slovak, heavily reliant on accented characters:
tea_leaves <- "P\xf8\xedli\xb9 \xbelu\xbbou\xe8k\xfd k\xf9\xf2 \xfap\xecl \xef\xe1belsk\xe9 \xf3dy. P\xe4\xbbt\xfd\xbed\xf2ov\xe9 v\xe5\xe8at\xe1 nerv\xf3zne \xb9tekaj\xfa na m\xf4jho \xefat\xb5a v t\xe0n\xed."

# printed as-is, without converting the encoding, this makes no sense...
cat(tea_leaves)
P��li� �lu�ou�k� k�� �p�l ��belsk� �dy. P�t��d�ov� v��at� nerv�zne �tekaj� na m�jho �at�a v t�n�.

# converting to UTF-8
legible <- iconv(tea_leaves, from = "ISO-8859-2", to = "UTF-8")

# this still makes no sense, but at least the accented characters look "right"
cat(legible)
Příliš žluťoučký kůň úpěl ďábelské ódy. Päťtýždňové vĺčatá nervózne štekajú na môjho ďatľa v tŕní.
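To see what the conversion actually changes, it can help to look at the underlying bytes. This is a minimal sketch using only base R; the variable names `latin2` and `utf8` are mine, and the single character "ř" is taken from the first word of the example above.

```r
# "ř" given as its single ISO-8859-2 (Latin-2) byte, as in the example above
latin2 <- "\xf8"

# before conversion: one byte
charToRaw(latin2)
#> [1] f8

# after conversion: UTF-8 needs two bytes for "ř" (U+0159)
utf8 <- iconv(latin2, from = "ISO-8859-2", to = "UTF-8")
charToRaw(utf8)
#> [1] c5 99

# iconv() also marks the result, so R knows how to display it
Encoding(utf8)
#> [1] "UTF-8"
```

The text is the same in both cases; only the byte representation differs, which is why declaring the wrong `from` encoding gives garbage rather than an error.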

Also note that:

the "lege artis" way of storing text is UTF-8 encoding; use it whenever you can
in addition to base iconv() there is the {stringi} package, which allows greater flexibility in manipulating character encodings
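As a rough illustration of the {stringi} route, here is a sketch that assumes the package is installed. `stri_encode()` plays the role of `iconv()`, and `stri_trans_nfc()` / `stri_trans_nfd()` switch between composed and decomposed forms of accented characters. The decomposed form is relevant to the UTF-8-MAC encoding mentioned in the question, which to my knowledge is a name used by the macOS iconv library for decomposed UTF-8; treat that link as an assumption and check it on your own platform.

```r
library(stringi)

# "Příliš" as ISO-8859-2 bytes, taken from the example above
latin2 <- "P\xf8\xedli\xb9"

# stringi counterpart of iconv(latin2, from = "ISO-8859-2", to = "UTF-8")
utf8 <- stri_encode(latin2, from = "ISO-8859-2", to = "UTF-8")

# composed form (NFC): each accented letter is a single code point
nfc <- stri_trans_nfc(utf8)
stri_length(nfc)
#> [1] 6

# decomposed form (NFD): ř, í and š become base letter + combining mark
nfd <- stri_trans_nfd(utf8)
stri_length(nfd)
#> [1] 9

# the two look the same on screen but are different strings,
# so normalize to one form before comparing
identical(stri_trans_nfc(nfd), nfc)
#> [1] TRUE
```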

Suraj_viswakarma · February 5, 2020, 2:19pm

Thanks a lot, it helped me greatly.
