Convert char to date and numeric

Hello everyone. This is my first post and reach out to the Community.

I've just completed the Google Data Analytics course and am working on my own case study. Rather than use a nicely cleaned publicly available data set I decided I would source my own data from Yahoo finance. I'm now wondering if this was a tad over ambitious. I thought I had formatted the data appropriately in Google Sheets but all three columns of my dataset are showing as type 'char' in R. I need the 'Date' column to be in date format and 'Close' column to be numeric.

I've tried all kinds of fixes from scouring stackoverflow.com etc. but nothing seems to work. Any pointers would be most welcome. Need to get this project off the ground! Markdown file below, I think? This is all new to me - the Google course is one thing, this is where the real work begins clearly.

Hi, can you provide a reproducible example?

Hi. I tried to do this but for some reason couldn't read the csv file?

install.packages("ggplot2")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
library(ggplot2)
install.packages("tidyverse")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
library(tidyverse)
install.packages("datapasta")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
library(datapasta)
install.packages("styler")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
#> also installing the dependencies 'R.methodsS3', 'R.oo', 'R.utils', 'R.cache', 'rprojroot'
library(styler)
install.packages("reprex")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
library(reprex)

safe_haven <- read.csv("Safe_Haven.csv")
#> Warning in file(file, "rt"): cannot open file 'Safe_Haven.csv': No such file or
#> directory
#> Error in file(file, "rt"): cannot open the connection

head(safe_haven)
#> Error in head(safe_haven): object 'safe_haven' not found
str(safe_haven)
#> Error in str(safe_haven): object 'safe_haven' not found

sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] reprex_2.0.1    styler_1.7.0    datapasta_3.1.0 forcats_0.5.1  
#>  [5] stringr_1.4.0   dplyr_1.0.9     purrr_0.3.4     readr_2.1.2    
#>  [9] tidyr_1.2.0     tibble_3.1.6    tidyverse_1.3.1 ggplot2_3.3.5  
#> 
#> loaded via a namespace (and not attached):
#>  [1] lubridate_1.8.0   assertthat_0.2.1  digest_0.6.29     utf8_1.2.2       
#>  [5] R6_2.5.1          cellranger_1.1.0  backports_1.4.1   evaluate_0.15    
#>  [9] httr_1.4.2        highr_0.9         pillar_1.7.0      rlang_1.0.2      
#> [13] readxl_1.4.0      rstudioapi_0.13   R.utils_2.11.0    R.oo_1.24.0      
#> [17] rmarkdown_2.14    munsell_0.5.0     broom_0.8.0       compiler_4.2.0   
#> [21] modelr_0.1.8      xfun_0.30         pkgconfig_2.0.3   htmltools_0.5.2  
#> [25] tidyselect_1.1.2  fansi_1.0.3       crayon_1.5.1      tzdb_0.3.0       
#> [29] dbplyr_2.1.1      withr_2.5.0       R.methodsS3_1.8.1 grid_4.2.0       
#> [33] jsonlite_1.8.0    gtable_0.3.0      lifecycle_1.0.1   DBI_1.1.2        
#> [37] magrittr_2.0.3    scales_1.2.0      cli_3.3.0         stringi_1.7.6    
#> [41] fs_1.5.2          xml2_1.3.3        ellipsis_0.3.2    generics_0.1.2   
#> [45] vctrs_0.4.1       tools_4.2.0       R.cache_0.15.0    glue_1.6.2       
#> [49] hms_1.1.1         fastmap_1.1.0     yaml_2.3.5        colorspace_2.0-3 
#> [53] rvest_1.0.2       knitr_1.39        haven_2.5.0

Created on 2022-05-02 by the reprex package (v2.0.1)

Minimal data set below

head(safe_haven)

          Date    Close Invest

1 04/01/2015 00:00 1,294.20 GOLD
2 04/02/2015 00:00 1,288.20 GOLD
3 04/03/2015 00:00 1,288.20 GOLD
4 04/04/2015 00:00 1,288.20 GOLD
5 04/05/2015 00:00 1,288.20 GOLD
6 04/06/2015 00:00 1,305.40 GOLD

str(safe_haven)
'data.frame': 15342 obs. of 3 variables:
Date : chr "04/01/2015 00:00" "04/02/2015 00:00" "04/03/2015 00:00" "04/04/2015 00:00" ... Close : chr "1,294.20" "1,288.20" "1,288.20" "1,288.20" ...
$ Invest: chr "GOLD" "GOLD" "GOLD" "GOLD" ...

Can you post the output of dput(head(safe_haven))?

Otherwise, this should probably work:

library(tidyverse)
safe_haven %>% 
  mutate(date = as.Date(date, format = "%d/%m/%Y %H:%M"))

The read error is possibly because you didn't put the csv file into your current working directory(wd). set the folder of the csv file as the wd or provide the full file path of the csv file might solve the problem:

# assuming that the directory "~/user/home/desktop/" is the folder of csv
setwd("~/user/home/desktop/")
read.csv("Safe_Haven.csv")
# or
read.csv("~/user/home/desktop/Safe_Haven.csv")

Use function as.Date to retrieve dates from chr, function strptime() to retrieve date-time form chr. The function as.numeric() is used to transform other datatypes into numeric. But in your example data, you may use str_remove to delete all the thousand separators at first.
In addition, you can combine using the function mutate from package dplyr to modify columns of a data.frame in clean syntax.

library(tidyverse)

safe_haven %>% mutate(
  Date_ = as.Date(Date,format = "%m/%d/%Y %H:%M"),
  Datetime_ = strptime(Date,format = '%m/%d/%Y %H:%M'),
  Close_ = as.numeric(str_remove_all(Close,','))
)
# A tibble: 6 x 6
  Date             Close    Invest Date_      Datetime_           Close_
  <chr>            <chr>    <chr>  <date>     <dttm>               <dbl>
1 04/01/2015 00:00 1,294.20 GOLD   2015-04-01 2015-04-01 00:00:00  1294.
2 04/02/2015 00:00 1,288.20 GOLD   2015-04-02 2015-04-02 00:00:00  1288.
3 04/03/2015 00:00 1,288.20 GOLD   2015-04-03 2015-04-03 00:00:00  1288.
4 04/04/2015 00:00 1,288.20 GOLD   2015-04-04 2015-04-04 00:00:00  1288.
5 04/05/2015 00:00 1,288.20 GOLD   2015-04-05 2015-04-05 00:00:00  1288.
6 04/06/2015 00:00 1,305.40 GOLD   2015-04-06 2015-04-06 00:00:00  1305.

# name the mutated column as the previous one to replace the original data:
safe_haven %>% mutate(
    Date = as.Date(Date,format = "%m/%d/%Y %H:%M"),
    Close = as.numeric(str_remove_all(Close,','))
)
# A tibble: 6 x 3
  Date       Close Invest
  <date>     <dbl> <chr> 
1 2015-04-01 1294. GOLD  
2 2015-04-02 1288. GOLD  
3 2015-04-03 1288. GOLD  
4 2015-04-04 1288. GOLD  
5 2015-04-05 1288. GOLD  
6 2015-04-06 1305. GOLD 
2 Likes
setwd("/cloud/project")
safe_haven <- read.csv('Safe_Haven.csv')


# Here is the output from dput(head(safe_haven))

dput(head(safe_haven))
#> structure(list(Date = c("04/01/2015 00:00", "04/02/2015 00:00", 
#> "04/03/2015 00:00", "04/04/2015 00:00", "04/05/2015 00:00", "04/06/2015 00:00"
#> ), Close = c("1,294.20", "1,288.20", "1,288.20", "1,288.20", 
#> "1,288.20", "1,305.40"), Invest = c("GOLD", "GOLD", "GOLD", "GOLD", 
#> "GOLD", "GOLD")), row.names = c(NA, 6L), class = "data.frame")

# After running your suggested code I get this erroe

library(tidyverse)
safe_haven %>% 
  mutate(date = as.Date(date, format = "%d/%m/%Y %H:%M"))
#> Error in `mutate()`:
#> ! Problem while computing `date = as.Date(date, format = "%d/%m/%Y
#>   %H:%M")`.
#> Caused by error in `as.Date.default()`:
#> ! do not know how to convert 'date' to class "Date"

Thank you so much for your help. After running your suggested code I get the same output as you

   Date       Close Invest
   <date>     <dbl> <chr> 
 1 2015-04-01 1294. GOLD  
 2 2015-04-02 1288. GOLD  
 3 2015-04-03 1288. GOLD  
 4 2015-04-04 1288. GOLD  
 5 2015-04-05 1288. GOLD  
 6 2015-04-06 1305. GOLD  
 7 2015-04-07 1298. GOLD  
 8 2015-04-08 1291. GOLD  
 9 2015-04-09 1282. GOLD  
10 2015-04-10 1293. GOLD  

(although when I run str(Safe_Haven) the date column still shows as 'chr')?

str(Safe_Haven)
spec_tbl_df [15,342 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Date  : chr [1:15342] "04/01/2015 00:00" "04/02/2015 00:00" "04/03/2015 00:00" "04/04/2015 00:00" ...
 $ Close : num [1:15342] 1294 1288 1288 1288 1288 ...
 $ Invest: chr [1:15342] "GOLD" "GOLD" "GOLD" "GOLD" ...
 - attr(*, "spec")=
  .. cols(
  ..   Date = col_character(),
  ..   Close = col_number(),
  ..   Invest = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

Sorry but when I'm trying to provide you with reproducible example, not able to read the file. Something I'm doing wrong? reprex::reprex()

library(tidyverse)

Safe_Haven %>% mutate(
  Date_ = as.Date(Date,format = "%m/%d/%Y %H:%M"),
  Datetime_ = strptime(Date,format = '%m/%d/%Y %H:%M'),
  Close_ = as.numeric(str_remove_all(Close,','))
)
#> Error in mutate(., Date_ = as.Date(Date, format = "%m/%d/%Y %H:%M"), Datetime_ = strptime(Date, : object 'Safe_Haven' not found

Safe_Haven %>% mutate(
  Date = as.Date(Date,format = "%m/%d/%Y %H:%M"),
  Close = as.numeric(str_remove_all(Close,','))
)
#> Error in mutate(., Date = as.Date(Date, format = "%m/%d/%Y %H:%M"), Close = as.numeric(str_remove_all(Close, : object 'Safe_Haven' not found
1 Like

You don't have to set the working directory if you use Projects. It is good practice for collaborating as file locations are different across machines. Also just easier.

1 Like

Watch carefully of the object name, I noticed that in your code, the data object named "safe_haven", but in later code which triggered error, you gave it "Safe_Haven".

Moreover, my code didn't save the result of mutate, you should save the result at first, like:

safe_haven_1 <- safe_haven %>% mutate(
    Date = as.Date(Date,format = "%m/%d/%Y %H:%M"),
    Close = as.numeric(str_remove_all(Close,','))
)

> str(safe_haven)
tibble [6 x 3] (S3: tbl_df/tbl/data.frame)
 $ Date  : chr [1:6] "04/01/2015 00:00" "04/02/2015 00:00" "04/03/2015 00:00" "04/04/2015 00:00" ...
 $ Close : chr [1:6] "1,294.20" "1,288.20" "1,288.20" "1,288.20" ...
 $ Invest: chr [1:6] "GOLD" "GOLD" "GOLD" "GOLD" ...
> str(safe_haven_1)
tibble [6 x 3] (S3: tbl_df/tbl/data.frame)
 $ Date  : Date[1:6], format: "2015-04-01" "2015-04-02" "2015-04-03" "2015-04-04" ...
 $ Close : num [1:6] 1294 1288 1288 1288 1288 ...
 $ Invest: chr [1:6] "GOLD" "GOLD" "GOLD" "GOLD" ...
1 Like

Thank you so much for your patience :grinning:

Highly suggest not using setwd() for setting working directories and instead use the package {here} in tandem with RProjects. It will make your code reproducible, streamline your workflow, and your working directories won't break as in above. See @jennybryan's blog post about why setwd() is poor and here() makes your life easier. Also want to add that using {lubridate} makes working with date/POSIXct variables a lot easier than base functions. Not that you can't use base functions as per the solution and get the same result but might be worth checking out {lubridate} at some point. Cheers

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.