How to read quoted numeric in csv file

read_csv

#1

I have a csv file where the numeric values are also quoted, as in

"20180901","1.234,45","tag1,tag2,tag3"
"20180905","43,50","tag6,tag2"

These values a read as string, how can I read them as numeric?
I tried this suggestion but cannot get it to work as expected, possibly because the decimal separator is ,.

thanks in advance for your help!


#2

Hey @rbauduin! I'm on my phone right now, so I can't test this, but have you tried to explicitly passing column types to readr::read_csv() using the col_types argument?


#3

Thanks for the suggestion @rensa. I had tried colClasses without success.
I use this command:

e <- read.csv("expenses.csv",header=FALSE, dec = ",")

and I don't have col_types available in the suggestions, and if I still pass it, I get the message it was left unused. What am I missing?


#4

@rensa is suggesting that you use the read_csv() function from the readr package, instead of read.csv() function from base R. In read_csv() you have the option to set the locale details, which allows you to change the default decimal and grouping marks. Once you tell read_csv() to look for commas as decimal separators and periods as grouping separators, it can figure out that column should be numeric instead of character.

library(tidyverse)

my_csv <- 
  '"20180901","1.234,45","tag1,tag2,tag3"\n
  "20180905","43,50","tag6,tag2"'

read_csv(
  file = my_csv,
  col_names = FALSE,
  locale = locale(decimal_mark = ",", grouping_mark = ".")
  )
#> # A tibble: 2 x 3
#>         X1     X2 X3            
#>      <dbl>  <dbl> <chr>         
#> 1 20180901 1234.  tag1,tag2,tag3
#> 2 20180905   43.5 tag6,tag2

Created on 2018-10-02 by the reprex package (v0.2.1)

If you are new to R, the Data Import chapter of R for Data Science is a great resource to get you started.


#5

It is not as elegant, but if your data is small you could just go:

x[grepl("^[[:digit:]]",x)] %>% gsub("\\.","",.) %>% gsub(",",".",.) %>% as.numeric


#6

thank you all for your answers!

I am using readr with the locale passed as suggested by @mfherman and it indeed detects the column as numeric.

For future readers, to install "tidyverse" on ubuntu linux I had to install the packages libxml2-dev, libcurl4-openssl-dev, libssl-dev.


#7

Thanks for the save, @mfherman :smiley: Readr's read_csv is more robust than the base read.csv in a lot of ways, IMHO :slight_smile: