Can't read a file whose name contains UTF-8 (Traditional Chinese) characters in RStudio 1.1.453

Hi,

I'm trying to use read_csv() to read my CSV file. The code is as follows:

library(readr)
ch4sample.exp1_path <- "D:/Rcode/presentation_sample_ppt/sample_data/family/最近一年內曾因家庭緣故影響工作之情形-按無法加班或無法延長工時分(年齡).csv"
ch4sample.exp1 <- read_csv(ch4sample.exp1_path, col_names = TRUE)

For various reasons, I need to use Traditional Chinese characters in my filenames.

Unfortunately, the R console shows the following error message:
Error in guess_header_(datasource, tokenizer, locale) : Cannot read file D:/Rcode/presentation_sample_ppt/sample_data/family/?€餈?撟游??曉?摰嗅滬蝻?敶勗?撌乩?銋?敶g???瘜??剜???撱園撌交??撟湧?).csv

The same code works in RStudio 1.1.447, so I don't think readr is the cause.

Could anyone help me with this problem?

Thanks!

BTW, the filename converted to Unicode escapes is:
\u6700\u8fd1\u4e00\u5e74\u5167\u66fe\u56e0\u5bb6\u5ead\u7de3\u6545\u5f71\u97ff\u5de5\u4f5c\u4e4b\u60c5\u5f62\uff0d\u6309\u7121\u6cd5\u52a0\u73ed\u6216\u7121\u6cd5\u5ef6\u9577\u5de5\u6642\u5206(\u5e74\u9f61)
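One way that escaped form could be used, offered as a sketch under assumptions rather than a confirmed fix: build the path from the `\u` escapes so the `.R` script itself stays pure ASCII, which means the encoding the script is saved in can no longer corrupt the name. The directory is copied from the question; `dir_path` and `file_name` are illustrative variable names.

```r
# Directory copied from the question (adjust for your machine)
dir_path <- "D:/Rcode/presentation_sample_ppt/sample_data/family"

# The filename spelled with \u escapes: R marks such strings as UTF-8,
# so the result does not depend on how this script file is encoded
file_name <- paste0(
  "\u6700\u8fd1\u4e00\u5e74\u5167\u66fe\u56e0\u5bb6\u5ead\u7de3\u6545",
  "\u5f71\u97ff\u5de5\u4f5c\u4e4b\u60c5\u5f62\uff0d\u6309\u7121\u6cd5",
  "\u52a0\u73ed\u6216\u7121\u6cd5\u5ef6\u9577\u5de5\u6642\u5206(\u5e74\u9f61).csv"
)
ch4sample.exp1_path <- file.path(dir_path, file_name)

# Only attempt the read if R can actually find the file under this name
if (file.exists(ch4sample.exp1_path)) {
  ch4sample.exp1 <- readr::read_csv(ch4sample.exp1_path, col_names = TRUE)
}
```

If `file.exists()` returns `FALSE` here while the file is visibly on disk, that would point at the path string rather than at readr.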

Roddy_Hung

You might use R's list.files() function to find out how R names these files, and refer to them that way.

For example, on my system:

> list.files()
[1] "community_test"          "community-sandbox.Rproj" 
[3] "poobär.r"
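A minimal sketch of that idea, assuming the directory from the question: take the path string that `list.files()` returns and pass it on directly, instead of retyping the Chinese filename. `find_csvs()` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: list CSV files exactly as R sees them, with full paths,
# so the returned strings can be handed straight to a reader
find_csvs <- function(dir) {
  list.files(dir, pattern = "\\.csv$", full.names = TRUE)
}

# Directory from the question; on a machine without it this is character(0)
csv_files <- find_csvs("D:/Rcode/presentation_sample_ppt/sample_data/family")
print(csv_files)

# Read by position rather than by a hand-typed name
if (length(csv_files) > 0) {
  ch4sample.exp1 <- readr::read_csv(csv_files[1], col_names = TRUE)
}
```

Selecting by position is fragile if the folder contents change; a regex on an ASCII part of the name (here, `.csv`) is the same idea with a narrower match.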

I think there are both encoding and naming problems here.
For the naming problem, you can follow @EconomiCurtis's approach to get the correct name of your file.
For the encoding problem, if you don't know the proper fix, try read_lines_raw() instead of a specific read_* function.
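A rough sketch of what inspecting the raw bytes could look like. `readr::read_lines_raw()` returns a list of raw vectors, one per line; `decode_raw_lines()` below is a hypothetical helper that tries to decode them with base R `iconv()` under a guessed source encoding (`"BIG5"` is only an assumption for Traditional Chinese files). Note that `read_lines_raw()` still opens the file by its path, so this helps with the file's contents rather than its name.

```r
# Hypothetical helper: decode a list of raw vectors (one per line) into
# UTF-8 text, assuming the bytes are in the encoding given by `from`
decode_raw_lines <- function(raw_lines, from = "UTF-8") {
  vapply(
    raw_lines,
    function(bytes) iconv(rawToChar(bytes), from = from, to = "UTF-8"),
    character(1)
  )
}

# Usage (requires readr; path from the question):
# raw_lines <- readr::read_lines_raw(ch4sample.exp1_path, n_max = 5)
# decode_raw_lines(raw_lines, from = "BIG5")  # NA = line not valid in that encoding
```

Trying a few candidate encodings and seeing which one yields no `NA`s is a cheap way to narrow the guess down.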


Can you also provide the output of sessionInfo(), so we can see which locale you're running with? It would be doubly helpful if you could provide a reproducible example: for example, a set of R code that creates a file with these Traditional Chinese characters in its path and then triggers the error when you try to read it.
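Something along these lines could serve as a starting point, sketched under assumptions: it creates a small CSV in a temporary directory under a Chinese name (年齡, written as `\u` escapes so the script is ASCII), then reads it back. Base `read.csv()` is used so the snippet runs without extra packages; swapping in `readr::read_csv(path)` is the variant that matters for this thread.

```r
# Hypothetical reprex: a CSV whose filename contains Traditional Chinese
tmp_dir <- tempfile("reprex_")
dir.create(tmp_dir)

path <- file.path(tmp_dir, "\u5e74\u9f61.csv")  # "年齡.csv"
writeLines(c("x,y", "1,2", "3,4"), path)

file.exists(path)     # does R find the file under this name?
Encoding(path)        # how is the path string marked?
df <- read.csv(path)  # replace with readr::read_csv(path) to test readr
print(df)

sessionInfo()         # includes the locale settings asked about above
```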

I'm also curious what the output of:

Encoding(ch4sample.exp1_path)

is, and whether using enc2utf8() to ensure it's UTF-8 makes a difference.
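A small sketch of that comparison, assuming the goal is to see both the encoding mark and whether R can find the file before and after conversion. `check_path_encoding()` is a hypothetical helper built only on base R (`enc2utf8()`, `Encoding()`, `file.exists()`).

```r
# Hypothetical helper: compare a path string before and after enc2utf8()
check_path_encoding <- function(path) {
  utf8_path <- enc2utf8(path)
  data.frame(
    form = c("original", "enc2utf8"),
    encoding = c(Encoding(path), Encoding(utf8_path)),  # "UTF-8", "latin1" or "unknown"
    exists = c(file.exists(path), file.exists(utf8_path)),
    stringsAsFactors = FALSE
  )
}

# Usage, with the path defined as in the question:
# check_path_encoding(ch4sample.exp1_path)
```

An `"unknown"` mark on the original means R treats the string as being in the native locale encoding, which on a Traditional Chinese Windows setup is typically not UTF-8.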
