How to prevent read_csv from generating exponents?

zhanhaopeh · March 7, 2019, 2:11pm

Hi,

I'm getting stuck on an interesting problem where my dataframe contains values of the following type: 1e1, 1e2, 1e3, 1e4... For example:

df <- data.frame(
   col1 = rep("1e1", 10)
)

And I need to perform some data transformation, and write the dataframe into a new csv using read_csv() function.

However, if you type a number with an “e” in excel, such as 1e1, it will automatically result in a scientific number: 1.00E+09. I understand that in excel, if you don’t want a scientific number, enter an apostrophe before the number: '1e1. But I'm unable to write a new csv file with values like '1e1, because then it would be interpreted as text in the output csv.

I need my values to remain in their original form as either text or factors, and I wish to still be able to work on this via csv files.

Does anyone know of a solution to this?

ab604 · March 7, 2019, 7:15pm

I think using col_types = "c" is what you are after, see below.

library(tidyverse)
# Make data frame
df <- data.frame(
        col1 = rep("1e1", 10))
# Write csv
df %>% write_csv("test.csv")

# Read csv as default
read_csv("test.csv")
#> Parsed with column specification:
#> cols(
#>   col1 = col_double()
#> )
#> # A tibble: 10 x 1
#>     col1
#>    <dbl>
#>  1    10
#>  2    10
#>  3    10
#>  4    10
#>  5    10
#>  6    10
#>  7    10
#>  8    10
#>  9    10
#> 10    10

# Read csv and set column as character
read_csv("test.csv", col_types = "c")
#> # A tibble: 10 x 1
#>    col1 
#>    <chr>
#>  1 1e1  
#>  2 1e1  
#>  3 1e1  
#>  4 1e1  
#>  5 1e1  
#>  6 1e1  
#>  7 1e1  
#>  8 1e1  
#>  9 1e1  
#> 10 1e1

^{Created on 2019-03-07 by the reprex package (v0.2.1)}

I'd suggest creating a reproducible example if this doesn't answer your question and you need more help.

All the best,

Alistair

zhanhaopeh · March 8, 2019, 1:56am

Hi Alistair,

Thanks for your solution and this is what I'm looking for!

However, what happens if I have multiple columns in my dataframe, and I want to apply col_types to only a particular column? For example:

# Make data frame
df <- data.frame(
        col1 = rep("1e1", 10),
        col2 = seq(1, 10)
)
# Write csv
df %>% write_csv("test.csv")

# Read csv and set column as character
read_csv("test.csv", col_types = "c") 

#Warning: 10 parsing failures.
#row col  expected    actual       file
#  1  -- 1 columns 2 columns 'test.csv'
#  2  -- 1 columns 2 columns 'test.csv'
#  3  -- 1 columns 2 columns 'test.csv'
#  4  -- 1 columns 2 columns 'test.csv'
#  5  -- 1 columns 2 columns 'test.csv'
#... ... ......... ......... ..........
#See problems(...) for more details.

How can I do this?

Yarnabrina · March 8, 2019, 2:19am

Hey, what's about this?

# loading libraries
library(magrittr)
library(readr)

# Make data frame
df <- data.frame(
  col1 = rep("1e1", 10),
  col2 = seq(1, 10)
)

# Write csv
df %>%
  write_csv(path = "test.csv")

# Read csv and set column as character
read_csv(file = "test.csv",
         col_types = cols(col1 = "c"))
#> # A tibble: 10 x 2
#>    col1   col2
#>    <chr> <dbl>
#>  1 1e1       1
#>  2 1e1       2
#>  3 1e1       3
#>  4 1e1       4
#>  5 1e1       5
#>  6 1e1       6
#>  7 1e1       7
#>  8 1e1       8
#>  9 1e1       9
#> 10 1e1      10

ab604 · March 8, 2019, 9:51am

In addition to Yarnabrina's reply, for more details checkout: vignette("readr") and scroll down to the column specification section onwards for lots of information and examples on what the defaults for read_csv are and how to override them.

system · March 29, 2019, 9:51am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.