I have the raw data of a survey that was conducted in two languages, English and Hindi. The issue is that the Hindi characters in the CSV file have all been converted to junk characters. However, by looking at the English survey, I can identify the actual response behind each junk value.
My question is: how do I replace the individual junk responses with the corresponding English responses?
For instance:
library(tibble)  # provides tribble()
test_english <- tribble(
~id, ~age,
"a1", "25 years to 34 years",
"a2", "18 years to 24 years",
"a3", "less than 18 years",
"a4", "45 years and above",
"a5", "35 years to 44 years",
)
test_hindi <- tribble(
~id, ~age,
"b1", "25 वरà¥\u008dष से 34 वरà¥\u008dष",
"b2", "18 वरà¥\u008dष से 24 वरà¥\u008dष",
"b3", "18 वरà¥\u008dष से कम" ,
"b4", "45 वरà¥\u008dष और उससे अधिक",
"b5", "35 वरà¥\u008dष से 44 वरà¥\u008dष",
)
In the example above, I would like to reach an output that can be expressed as:
test_output <- tribble(
~id, ~age,
"b1", "25 years to 34 years",
"b2", "18 years to 24 years",
"b3", "less than 18 years",
"b4", "45 years and above",
"b5", "35 years to 44 years",
)
There are multiple responses that have to be "translated" from junk characters to English. As I understand it, I would have to search for these junk values in the raw data and perform a 1-to-1 replacement using another character vector. Which function would be appropriate for this?
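To make the 1-to-1 replacement idea concrete, here is a minimal sketch in base R that uses a named character vector as the lookup table. The ASCII strings such as `"junk_25_34"` are hypothetical placeholders for the junk values, which in practice would be copied exactly as they appear in the CSV. The names `test_junk` and `lookup` are also just for illustration, and this is only one possible approach.

```r
# Stand-in data: the "junk_*" strings are hypothetical placeholders for the
# garbled Hindi responses as they appear in the raw CSV.
test_junk <- data.frame(
  id  = c("b1", "b2", "b3", "b4", "b5"),
  age = c("junk_25_34", "junk_18_24", "junk_under_18",
          "junk_45_plus", "junk_35_44"),
  stringsAsFactors = FALSE
)

# Lookup table: names are the junk values, elements are the English responses.
lookup <- c(
  "junk_25_34"    = "25 years to 34 years",
  "junk_18_24"    = "18 years to 24 years",
  "junk_under_18" = "less than 18 years",
  "junk_45_plus"  = "45 years and above",
  "junk_35_44"    = "35 years to 44 years"
)

# Index the lookup by the junk column. Values that are not in the lookup
# come back as NA, so keep the original value in that case.
translated <- unname(lookup[test_junk$age])
test_output <- test_junk
test_output$age <- ifelse(is.na(translated), test_junk$age, translated)

print(test_output)
```

The same indexing should work on a tibble column. `dplyr::recode()` or a `left_join()` against a two-column lookup table would be possible alternatives if the rest of the workflow is already in the tidyverse.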