csv import issue

Hi,
I can see an issue with csv import in R.
I have some data files provided to me in csv format. Unfortunately I cannot import them into R. However, when I open them in Excel and import them in Excel format, everything is fine. I know I can convert all the files before importing them, but I am just curious what the cause could be.

this doesn't work:

data.source.csv <- read.csv("P:/User/yyy.csv", header = TRUE, sep = ",")

but this is fine:

library(readxl)
data.source.xls <- read_excel("P:/User/yyy.xlsx")

and I've got that as a result:

data.frame(stringsAsFactors=FALSE,
                       þÿ.URN. = c("10BE022654416", "10BE022662462", "10BE022001922"),
                      QUESTION = c("Recommendation", "Recommendation", "Choice Dealer"),
                      VERBATIM = c("aaa meer!!!!!", "bbb gesteld", "ccceid"),
                      CONCEPTS = c("-[5|68|180]", "+[7|0|61]\t-[5|61|106]", "+[7|0|107]")
                  )

any thoughts? Maybe that is related to a weird character in the name of the first variable (þÿ.URN)?

Any chance you can share a link to a sample .csv file that reproduces the issue?


I had this happen to me and the solution was providing the encoding as UTF-16.

Does the following help at all?

data.source.csv <- read.csv("P:/User/yyy.csv", header = TRUE, sep = ",", encoding = "UTF-16")

Thank you but encoding did not help. I still have this error:

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 2 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 3 appears to contain embedded nulls
4: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 4 appears to contain embedded nulls
5: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 5 appears to contain embedded nulls
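
A possible reading of these warnings, offered as a sketch rather than a diagnosis of this particular file: in UTF-16 every ASCII character takes two bytes, one of which is 0x00, so a reader that treats the file as a single-byte encoding sees a null next to each character. In read.csv the encoding argument only marks strings as Latin-1 or UTF-8 and does not re-encode the input; fileEncoding is the argument that re-encodes. The snippet below uses an invented header string (not the real file) to show where the nulls come from:

```r
# Illustration only: the header text here is made up for the example.
header <- "URN,QUESTION"

# Convert the string to UTF-16 (little-endian) and keep the raw bytes.
bytes <- iconv(header, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]

# Each of the 12 ASCII characters becomes two bytes, one of them 0x00.
print(bytes)
n_bytes <- length(bytes)             # 24
n_nulls <- sum(bytes == as.raw(0))   # 12
cat(n_bytes, "bytes,", n_nulls, "of them null\n")
```

If that is what is happening here, passing fileEncoding instead of encoding would be the thing to try.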

I don't know how I could attach the file, as only pdfs and pictures are allowed.

Ok, I've got it in Google Drive: https://drive.google.com/open?id=1U5Psb8CxEBYx74RfquGWLlaN1aJWmJmo

You haven't made the file public, we don't have access to it.

Ooops, sorry.
Try now please: https://drive.google.com/open?id=1U5Psb8CxEBYx74RfquGWLlaN1aJWmJmo

I have a feeling this csv problem is an R issue which can be resolved only by converting the csv files to Excel before importing them into R. Unless there is a package that helps to import weird or corrupted csv files :thinking:

Your csv file has a rare encoding that I can't identify, but using "utf16" allows you to read the data, although you lose the special characters like "þÿ".

url <- "https://drive.google.com/uc?authuser=0&id=1U5Psb8CxEBYx74RfquGWLlaN1aJWmJmo&export=download"
read.csv(url, header = TRUE, fileEncoding = "utf16")
#>             URN       QUESTION       VERBATIM               CONCEPTS
#> 1 10BE022654416 Recommendation aaa meer!!!!!             -[5|68|180]
#> 2 10BE022662462 Recommendation   bbb gesteld  +[7|0|61]\t-[5|61|106]
#> 3 10BE022001922  Choice Dealer        ccceid              +[7|0|107]

Created on 2019-10-07 by the reprex package (v0.3.0.9000)

If you know what the exact encoding is, then you can specify it and have the special characters appear.
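
One way to narrow down the encoding, sketched under assumptions: the characters "þÿ" are how the bytes FE FF display in Latin-1, and FE FF is the byte-order mark (BOM) of big-endian UTF-16, so the file may simply be UTF-16BE with a BOM. The helper guess_bom below is hypothetical (it is not part of any package) and only checks the first bytes of a file; the demo file is built from made-up bytes rather than the original csv.

```r
# Hypothetical helper: guess a Unicode encoding from the file's BOM.
# It only covers UTF-8 and UTF-16; returns NA when no known BOM is found.
guess_bom <- function(path) {
  bytes <- readBin(path, what = "raw", n = 3)
  hex <- paste(as.character(bytes), collapse = " ")
  if (startsWith(hex, "ef bb bf")) {
    "UTF-8-BOM"
  } else if (startsWith(hex, "fe ff")) {
    "UTF-16BE"
  } else if (startsWith(hex, "ff fe")) {
    "UTF-16LE"
  } else {
    NA_character_
  }
}

# Build a small stand-in file: BOM (FE FF) followed by UTF-16BE text.
tmp <- tempfile(fileext = ".csv")
body <- iconv("URN,QUESTION\n10BE022654416,Recommendation\n",
              from = "UTF-8", to = "UTF-16BE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xfe, 0xff)), body), tmp)

# Inspect the BOM, then let fileEncoding re-encode the input on read.
detected <- guess_bom(tmp)
print(detected)
demo <- read.csv(tmp, header = TRUE, fileEncoding = "UTF-16")
print(demo)
```

With a BOM present, "UTF-16" is usually enough because the reader can work out the byte order from it; naming the byte order explicitly matters mainly for files that have no BOM.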


O wow! :clap:
That was really weird.
Thank you!
