Preventing R from dropping the leading 0 in records

carmendw · July 30, 2018, 10:52pm

Hi,
I am reading a CSV file with a field that was set to a special format of 14 zeros, so that all 14 digits display even when the first digit is 0. But when R reads the file, it drops the leading 0, and those records end up with only 13 digits.

Is there a way to display the values properly, showing all 14 digits, including for the IDs that start with 0?


jcblum · July 30, 2018, 10:59pm

I'm having trouble understanding what the original data format and the imported data formats are. Can you provide a brief sample of the contents of your CSV file (the text itself, not a screenshot)?

Then, can you also provide the output of running str() on the imported data frame?

Finally, are you expecting the values in this field to behave like numbers (you use them in mathematical operations), or like text?

carmendw · July 30, 2018, 11:06pm

Hello @jcblum,

I originally wanted R to read the .xls file, but it would not read this one (I had two). My guess was that this one was too big, but I don't really know why R wouldn't read it.

The data went from Excel to CSV, with the field formatted as "special 00000000000000" so that all 14 digits are kept, because the numbers represent an ID made up of 14 digits (some start with 0). (I'm not sure how to show you the text, sorry. Any suggestions?)

This field can be text/character. It will not be used for calculations, but it will be a key for merging at some point. Also, the records that start with a zero do so for a reason; the zeros need to be there.

str() gives:

str(schools2015$CDSCode)
 num [1:13972] 1100170000000 1100170109835 1100170112607 1100170118489 1100170123968 ...

jcblum · July 30, 2018, 11:56pm

Ah, OK, I think I see.

CSV is a very simple format — literally, comma separated values, i.e. lines of plain text with commas in between the fields. So things like special formatting that you've chosen in Excel do not get preserved when exporting as CSV. To see what the contents of your CSV are, you can open it in a "text editor" — on Windows, the built-in one is Notepad. On Mac, there's TextEdit. If you're using RStudio, its Source editor serves the same function (and is more full-featured). You can share the contents of your CSV by opening it in a text editor and copy-pasting a few lines here.
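To illustrate why the formatting is lost, here is a minimal sketch that feeds a couple of CSV lines, written as the plain text a text editor would show, to `read.csv()` through its `text` argument. The `School` column and the second ID are made up for illustration. With the default settings, R guesses that the column is numeric, so the leading 0 of the first ID disappears:

```r
# A tiny CSV as plain text; School and the second CDSCode are hypothetical
csv_text <- "CDSCode,School
01100170109835,Alpha
19647330000000,Beta"

# Default import: read.csv() guesses the column type and chooses numeric
guessed <- read.csv(text = csv_text)
str(guessed$CDSCode)

# Printing the numbers in full shows only 13 digits for the first ID
sprintf("%.0f", guessed$CDSCode)
```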

So, first question:

When you open the CSV file in RStudio (click on it in the Files pane or open it via the File menu), or in another text editor, are the leading zeroes present?

If so, then the solution is fairly simple — you add some more details to the import code to make sure that variable is imported as text, instead of as a number (as it currently is). To get help with exactly how to do this, you'll need to post the code you used to import the CSV (there's more than one way to do that in R!).

carmendw · July 31, 2018, 12:19am

@jcblum Yes, when I select the file through the File menu and import it, the zeros are preserved.

schools2015 <- read.csv("C:\\Box Sync\\Karina Fastovsky\\Data\\Spatial\\Schools_Census\\schools2015_SJ15_0.csv")
schools2018 <- read.csv("C:\\Box Sync\\Karina Fastovsky\\Data\\Spatial\\Schools_Census\\schools2018_SJ15.csv")

jcblum · July 31, 2018, 3:57am

This should import the CDSCode field as character, leaving the rest of the fields to be imported according to the default (where R guesses the proper type). You can also explicitly specify a type for every field, but if the rest are importing according to your expectations there's no need to do so.

object_name <- read.csv("path\\to\\file.csv", colClasses = c(CDSCode = "character"))
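As a quick check, a sketch of the same `colClasses` fix applied to a small made-up CSV (all values hypothetical) shows the IDs staying as 14-character strings. Because the key is text, it also matches exactly when merging with another table; `tracts` here is a purely illustrative second data frame, not something from the original files:

```r
csv_text <- "CDSCode,School
01100170109835,Alpha
19647330000000,Beta"

# Keep CDSCode as text; the other columns are still guessed by read.csv()
schools <- read.csv(text = csv_text, colClasses = c(CDSCode = "character"))
str(schools$CDSCode)

# Every ID keeps all 14 characters, including the leading 0
nchar(schools$CDSCode)

# Hypothetical second table keyed on the same character IDs
tracts <- data.frame(
  CDSCode = c("01100170109835", "19647330000000"),
  Tract = c("T1", "T2"),
  stringsAsFactors = FALSE
)

# Character keys match exactly, so both rows join
merged <- merge(schools, tracts, by = "CDSCode")
merged
```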

I'm assuming you will fill in your own values for object_name and "path\\to\\file.csv". By the way, if you're using a lot of absolute paths in your code, you might want to look into adopting a more project-oriented workflow, which can help make the code easier to maintain over time.
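If a data frame has already been imported with the IDs as numbers, one possible fallback is to re-pad them with `sprintf()`. This assumes every valid ID is exactly 14 digits long, as described earlier in the thread; re-importing with `colClasses` remains the cleaner fix, since the padding is only correct if that assumption holds for every record. The format `%014.0f` means no decimals, a minimum width of 14, and zero-filling on the left, which mirrors the Excel format `00000000000000`:

```r
# Numeric IDs as they look after a default import; the first two come from
# the str() output earlier in the thread, the third is hypothetical
ids_num <- c(1100170000000, 1100170109835, 19647330000000)

# Pad back to 14 digits: no decimals, width 14, zero-filled on the left
ids_text <- sprintf("%014.0f", ids_num)
ids_text

# Applied to the original data, this would look like:
# schools2015$CDSCode <- sprintf("%014.0f", schools2015$CDSCode)
```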