read.csv() behaves differently on desktop vs. cloud

I was using RStudio Cloud for a project. I ran out of hours, so I downloaded all of the project's files and opened them in RStudio Desktop.

I have an .Rmd with a chunk that includes foo <- read.csv('foo.csv'). I then set a variable to a column of the data, bar=foo$bar. In the actual .csv, this column includes mostly numbers with a few '?' characters. I first remove these rows entirely. Looking at typeof(bar) in rstudio cloud shows that bar is of type character. Running as.numeric(bar) successfully converts it to type double.

Running this exact same file with the exact same .csv in RStudio Desktop reports that bar is of type integer, even before I remove the rows with '?' characters in the column. Running as.numeric(bar) now (after removing the question marks) returns a seemingly random list of values that are in no way related to the original values.

I am very confused how this is possible, as virtually everything in the environments is the same including the source and data files. I have the most up to date versions of all packages on my desktop version (testing before updating packages showed the same results on desktop).

I have a slight suspicion that CRLF/LF conversion is involved. My local OS is Windows 10; could the cloud server be Linux?

How could this be happening and how can I fix this?

My guess is this is due to an R version difference between the cloud and your desktop.

As of R 4.0.0, the default value of stringsAsFactors when creating a data frame (including via read.csv()) changed from TRUE to FALSE.

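It sounds like you have a version prior to 4.0 on your desktop. Because of the '?' entries, read.csv() sees the column as character data, and with stringsAsFactors = TRUE it then stores the column as a factor. A factor is an integer type under the hood: the distinct values are basically put into a lookup table (the levels) in sorted order, and each element is stored as its position in that table. Calling as.numeric() on a factor returns those positions, not the original numbers, which is why the values look random.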

On the cloud, which is running R >= 4.0, the column is read in as character without being converted to a factor. So when you use as.numeric() on it, you get the values you are expecting.
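To make the difference concrete, here is a minimal sketch. The values are made up to stand in for your bar column, and bar_chr and bar_fct are just illustrative names. It shows that a factor is stored as integer codes and that as.numeric() returns those codes rather than the numbers you see printed.

```r
# Hypothetical values standing in for the real 'bar' column
bar_chr <- c("10", "7", "?", "300")

# What read.csv() produces when stringsAsFactors = TRUE (the default before R 4.0)
bar_fct <- factor(bar_chr)

typeof(bar_chr)  # "character"
typeof(bar_fct)  # "integer"

# The lookup table; its order depends on the locale's sort order
levels(bar_fct)

# Positions in the lookup table, NOT the original numbers
as.numeric(bar_fct)

# Going through character first recovers the real values
keep <- bar_fct != "?"
as.numeric(as.character(bar_fct[keep]))  # 10 7 300
```

The exact codes you get from as.numeric(bar_fct) depend on how the levels happen to be sorted on your machine, so they are not shown here.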

On your desktop you can do any of the following:

  1. Update to the latest R
  2. Set stringsAsFactors = FALSE in the read.csv() call
  3. First convert the column to character, then to numeric, e.g. as.numeric(as.character(bar))
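As a rough sketch of option 2, assuming a tiny made-up file in place of your foo.csv (csv_path and its contents are hypothetical), passing stringsAsFactors explicitly should give the same result on both old and new R versions. The na.strings variant at the end is an alternative that is not part of the original suggestions: it asks read.csv() to treat '?' as missing, so the column can be parsed as numbers directly.

```r
# Hypothetical stand-in for foo.csv, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("bar", "10", "7", "?", "300"), csv_path)

# Being explicit avoids depending on the R version's default
foo <- read.csv(csv_path, stringsAsFactors = FALSE)

# Drop the '?' rows, then convert the remaining strings to numbers
foo <- foo[foo$bar != "?", , drop = FALSE]
bar <- as.numeric(foo$bar)
bar  # 10 7 300

# Alternative: read '?' as NA so the column is numeric from the start
foo_na <- read.csv(csv_path, na.strings = "?")
bar_na <- foo_na$bar

unlink(csv_path)
```

With the second approach the '?' rows stay in the data as NA, so you would still need to drop or otherwise handle them before analysis.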
