How to read a file that contains data with the wrong encoding?

The first lines are read by readr::read_tsv without any problem, but the error "Invalid multibyte sequence" is returned when n_max = Inf. I have tried setting the locale encoding to both "UTF-8" and "CP932", and the lines cannot be read with either. Something seems to be wrong with these lines. Is there any way to read the file without modifying it? Or, how can I fix the file (it is almost 3 GB)?
Here are some of the lines as read with locale encoding "UTF-8"; they appear to be the cause of the error.

column name: product_name

line 94715: "\xb1\xb7\xd4\xcf\xbe \xc4\xb8\xc6\xc2\xb7 150G"
line 94716: "\xb1\xb7\xd4\xcf\xbe \xba\xb8\xc4\xb3\xc6\xc2\xb9\xc0\xde\xcf \x80150"

Can anyone suggest a way to fix this?

Try encoding = "latin1".
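
For reference, a minimal sketch of how that suggestion might look with readr. The temporary file is only a stand-in built from the bytes quoted in the question, and the `id` column is made up for illustration. Since every byte value is a valid latin1 character, reading this way should avoid the "Invalid multibyte sequence" error, but it does not by itself mean the text is decoded correctly.

```r
library(readr)

# Stand-in file: one product_name made of the raw bytes from the question.
# The "id" column is hypothetical and only there to give the file two columns.
tmp <- tempfile(fileext = ".tsv")
con <- file(tmp, "wb")
writeBin(
  c(charToRaw("id\tproduct_name\n1\t"),
    as.raw(c(0xb1, 0xb7, 0xd4, 0xcf, 0xbe)),
    charToRaw(" 150G\n")),
  con
)
close(con)

# Read everything as character so that column type guessing cannot fail,
# and tell readr to treat the bytes as latin1.
dat <- read_tsv(
  tmp,
  locale = locale(encoding = "latin1"),
  col_types = cols(.default = col_character())
)
print(dat)
```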

Thank you for the reply.
It worked for 'product_name', and all the lines were read. However, something is wrong with other columns such as 'job', 'gender', etc.:
job
1 "\u0090³\u008eÐ\u0088õ\u0081E\u008cö\u0096±\u0088õ"
2 "\u0090³\u008eÐ\u0088õ\u0081E\u008cö\u0096±\u0088õ"
gender
1 "\u0092j\u0090«"
2 "\u0092j\u0090«"
age
1 "50\u0091ã"
2 "50\u0091ã"

There was also this warning:
Warning: 1 parsing failure.
row col expected actual file
17511453 -- 20 columns 11 columns '01studysample_SCIData.tsv'
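
One way to track down a row like this, sketched in base R on a small stand-in file (the real path would replace `tmp`), is to count the tab-separated fields on every line and list the lines that differ from the header. Note that readr's row numbers may not count the header line, so the two numbering schemes can differ by one.

```r
# Tiny stand-in for the real file: the third line is short on purpose.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5", "6\t7\t8"), tmp)

# Count fields per line. Quoting and comment handling are switched off so
# that stray quotes or '#' characters do not merge or hide fields, and blank
# lines are kept so that positions match physical line numbers.
n_fields <- count.fields(
  tmp,
  sep = "\t",
  quote = "",
  comment.char = "",
  blank.lines.skip = FALSE
)

# Physical line numbers (header is line 1) whose field count differs
# from the header's.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```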

It really confuses me. How can a file end up like this, with different encodings?
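
One possible reading of the output above is that the file is not mixed at all. The 'job', 'gender' and 'age' values look like two-byte CP932 characters shown one byte at a time as latin1. The 'product_name' bytes fall in the range CP932 uses for single-byte half-width katakana. The `\x80` byte in line 94716 might be what made the CP932 read fail. If that assumption holds, the latin1 result can be converted back to the original bytes and decoded again as CP932. A minimal base-R sketch, using two of the values shown above:

```r
# Values as they appeared after reading with encoding = "latin1".
garbled <- c("\u0092j\u0090\u00ab", "50\u0091\u00e3")

# Step 1: turn each character back into the single byte it was read from.
bytes <- iconv(garbled, from = "UTF-8", to = "ISO-8859-1")

# Step 2: reinterpret those bytes as CP932 (an assumption about the file)
# and convert to UTF-8. sub = "byte" keeps any undecodable byte visible
# as <xx> instead of turning the whole string into NA.
fixed <- iconv(bytes, from = "CP932", to = "UTF-8", sub = "byte")
print(fixed)
```

If the result is readable Japanese, the same two `iconv()` calls could be applied to each character column after reading with latin1. Any `<80>`-style markers left behind would point to bytes that are not valid in CP932.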
