Retrieve broken Vietnamese string variables in R

Hi everyone,
I have a dataset from Vietnam, but when I read it into R, the string variables come out broken.
I tried stri_trans_general from the stringi package, but it only works on a few columns.
I checked the raw dataset, and it seems those few columns were already broken when the dataset was exported from the survey collection platform.

"Du?c ch?t m?i"

By broken, I mean the text shows "?" or ">" instead of the actual characters.

Any recommendations on how I can recover these broken words in R?

Thank you

Hi,

If you can get the data from the 'survey collecting platform' to be exported in UTF-8, you can then import it in R with UTF-8 encoding and that should solve it.
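For what it's worth, here is a minimal sketch of the UTF-8 route in base R. The file, the column names and the Vietnamese text are all made up for illustration (the temporary CSV just stands in for a fresh UTF-8 export from the platform), so adapt them to your own data.

```r
# Illustrative Vietnamese text, written with Unicode escapes so the script
# is safe in any file encoding. This is NOT the poster's actual data.
original <- data.frame(
  id = 1:2,
  answer = c("D\u01b0\u1ee3c ch\u1ea5t m\u1edbi", "Ti\u1ebfng Vi\u1ec7t"),
  stringsAsFactors = FALSE
)

# Stand-in for a UTF-8 export from the survey platform (hypothetical file).
path <- tempfile(fileext = ".csv")
write.csv(original, path, row.names = FALSE, fileEncoding = "UTF-8")

# Declare the encoding on import instead of relying on the system default.
survey <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# Sanity checks: the strings are valid UTF-8 and survived the round trip.
utf8_ok <- all(validUTF8(survey$answer))
round_trip_ok <- identical(survey$answer, original$answer)
```

If both checks come back TRUE, the text made it through intact. On older Windows versions of R (before 4.2) the native encoding is not UTF-8, so results there may differ.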

If you can't get correct input data anymore, I think your only option is to substitute the characters back, using something like str_replace_all from the stringr package. Of course, this will only work if each symbol corresponds to exactly one letter...
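A rough sketch of what that substitution could look like, assuming stringr is installed. The broken values and the lookup table below are invented for the example; you would have to build the real mapping by comparing broken values against known correct ones.

```r
library(stringr)

# Hypothetical broken values; in this made-up column each symbol
# happens to stand for exactly one Vietnamese letter.
broken <- c("Vi?t Nam", "ti>ng Vi?t")

# Hypothetical lookup table. The names are regular expressions,
# so the literal "?" has to be escaped.
fixes <- c("\\?" = "\u1ec7", ">" = "\u1ebf")

# Apply every substitution in the lookup table to every string.
repaired <- str_replace_all(broken, fixes)
```

If the same "?" stands for different letters in different words, which may well be the case in your example, a lookup like this can't tell them apart and the original text is not recoverable this way.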

Good luck,
PJ

I guess that would be the best thing to do.
Thanks @pieterjanvc
