Reading CSV files with different locales

Hi all!

I'm trying to read a semicolon-delimited file. It works just fine using read.csv (see below). But when I use readr and read_csv2, it uses "," as the decimal mark and "." as the grouping mark, so the numbers come out wrong. I assume it's because of my locale. But even when I specify "." as the decimal mark through the locale argument, it still doesn't work (see below). I know I can use read_delim for more control, but how can I use read_csv2 and specify that it should treat "." as the decimal mark?

Best, R

library(tidyverse)

dt_org <- read.csv("https://raw.githubusercontent.com/rmcelreath/rethinking/master/data/Howell1.csv",
               sep=";")

head(dt_org)
#>    height   weight age male
#> 1 151.765 47.82561  63    1
#> 2 139.700 36.48581  63    0
#> 3 136.525 31.86484  65    0
#> 4 156.845 53.04191  41    1
#> 5 145.415 41.27687  51    0
#> 6 163.830 62.99259  35    1

dt_tv <- read_csv2("https://raw.githubusercontent.com/rmcelreath/rethinking/master/data/Howell1.csv")
#> i Using '\',\'' as decimal and '\'.\'' as grouping mark. Use `read_delim()` for more control.
#> 
#> -- Column specification --------------------------------------------------------
#> cols(
#>   height = col_number(),
#>   weight = col_number(),
#>   age = col_character(),
#>   male = col_double()
#> )
head(dt_tv)
#> # A tibble: 6 x 4
#>   height    weight age    male
#>    <dbl>     <dbl> <chr> <dbl>
#> 1 151765 478256065 63        1
#> 2   1397 364858065 63        0
#> 3 136525  31864838 65        0
#> 4 156845 530419145 41        1
#> 5 145415  41276872 51        0
#> 6  16383  62992589 35        1

dt_tv2 <- read_csv2("https://raw.githubusercontent.com/rmcelreath/rethinking/master/data/Howell1.csv",
                locale = locale(decimal_mark = "."))
#> i Using '\',\'' as decimal and '\'.\'' as grouping mark. Use `read_delim()` for more control.
#> 
#> -- Column specification --------------------------------------------------------
#> cols(
#>   height = col_number(),
#>   weight = col_number(),
#>   age = col_character(),
#>   male = col_double()
#> )

head(dt_tv2)
#> # A tibble: 6 x 4
#>   height    weight age    male
#>    <dbl>     <dbl> <chr> <dbl>
#> 1 151765 478256065 63        1
#> 2   1397 364858065 63        0
#> 3 136525  31864838 65        0
#> 4 156845 530419145 41        1
#> 5 145415  41276872 51        0
#> 6  16383  62992589 35        1

Created on 2021-08-17 by the reprex package (v2.0.0)

  if (locale$decimal_mark == ".") {
    cli::cli_alert_info("Using {.val ','} as decimal and {.val '.'} as grouping mark. Use {.fn read_delim} for more control.")
    locale$decimal_mark <- ","
    locale$grouping_mark <- "."
  }

The code for read_csv2 (see the snippet above) simply doesn't allow it. If you set decimal_mark to ".", you are told that it is being switched to ",", and then that is exactly what happens.

Therefore it seems read_delim would indeed be the way to go.
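
For anyone finding this later, here is a minimal sketch of the read_delim route. It assumes a small temporary file standing in for Howell1.csv (the path and the two rows, copied from the head() output above, are only there for illustration); the delim and locale arguments are the part that matters.

```r
library(readr)

# Hypothetical stand-in for Howell1.csv: ";" between fields, "." as decimal mark
path <- tempfile(fileext = ".csv")
writeLines(c("height;weight;age;male",
             "151.765;47.82561;63;1",
             "139.700;36.48581;63;0"), path)

# read_delim() lets you set the field delimiter and the decimal mark independently
dt_delim <- read_delim(path, delim = ";",
                       locale = locale(decimal_mark = "."))

head(dt_delim)
```

Since "." is already the default decimal mark in locale(), the locale argument is only spelled out here to make the intent explicit.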

Thank you for the information!

Am I right in assuming that this is related to my locale, or is this the same for everyone regardless? It seems weird to have "," as the default decimal mark for everyone.

I should have read the documentation more carefully. It clearly states that read_csv2 uses ";" as the field separator and "," as the decimal mark. I still think it's a bit confusing, and I don't get why I can't change the decimal mark through the locale setting, but I guess that is just how it is.
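
To illustrate what the documentation describes, here is a small sketch of the kind of file read_csv2 is meant for: ";" between fields and "," as the decimal mark. The temporary file and its contents are made up for the example (the first two heights and weights from above, rewritten with decimal commas).

```r
library(readr)

# Hypothetical European-style file: ";" between fields, "," as decimal mark
path_eu <- tempfile(fileext = ".csv")
writeLines(c("height;weight",
             "151,765;47,82561",
             "139,700;36,48581"), path_eu)

# read_csv2() treats "," as the decimal mark, so these parse as ordinary doubles
dt_eu <- read_csv2(path_eu)

head(dt_eu)
```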
