Adjust dates in R

Jovani · August 26, 2021, 2:09am

Could you tell me why the dates are coming off weird? I will insert an executable code below. You may notice that date1 and date2 have the wrong dates.

The link to download the database df: df.xlsx - Google Sheets

df<-read_excel('C:/Users/Desktop/df.xlsx')

df<-subset(df,df$date2<df$date1) 

dim_data<-dim(df)

day<-c(seq.Date(from = as.Date(df$date2[1]),
                to = as.Date(df$date2[dim_data[1]]),
                by = "1 day"))

df_grouped<-matrix(nrow = 9,ncol=6)
colnames(df_grouped)<-c("Id","date1","date2","Week","D","DR01")


for (k in 1:9) {
  df_grouped[k,]<-c(subset(df,date2==day[k])$Id[1], 
                                  subset(df,date2==day[k])$date1[1],
                                  subset(df,date2==day[k])$date2[1], 
                                  subset(df,date2==day[k])$Week[1],
                                  subset(df,date2==day[k])$D[1],
                                  sum(subset(df,date2==day[k])$DR01))
}
> df_grouped
      Id  date1        date2        Week       D     DR01
 [1,] "1" "1626739200" "1617321600" "Friday"   NA    "21"
 [2,] "1" "1626739200" "1617408000" "Saturday" NA    "24"
 [3,] NA  NA           NA           NA         NA    "0" 
 [4,] NA  NA           NA           NA         NA    "0" 
 [5,] NA  NA           NA           NA         NA    "0" 
 [6,] NA  NA           NA           NA         NA    "0" 
 [7,] "1" "1626739200" "1617840000" "Thursday" "Hol" "21"
 [8,] "1" "1626739200" "1617926400" "Friday"   NA    "11"
 [9,] "1" "1626739200" "1618012800" "Saturday" NA    "14

andresrcs · August 26, 2021, 3:51am

df_grouped is not a data frame but a matrix and matrices can't contain mixed class variables so you get that values when date-time variables get coerced into text. Maybe you are trying to do something like this?

df %>%
    mutate(across(starts_with("date"), as.Date)) %>% 
    group_by(date2) %>% 
    summarise(Id = first(Id),
              date1 = first(date1),
              Week = first(Week),
              D = first(D),
              DR01 = sum(DR01))
#> # A tibble: 6 × 6
#>   date2         Id date1      Week     D      DR01
#>   <date>     <dbl> <date>     <chr>    <chr> <dbl>
#> 1 2021-04-02     1 2021-07-20 Friday   <NA>     21
#> 2 2021-04-03     1 2021-07-20 Saturday <NA>     24
#> 3 2021-04-08     1 2021-07-20 Thursday Hol      21
#> 4 2021-04-09     1 2021-07-20 Friday   <NA>     11
#> 5 2021-04-10     1 2021-07-20 Saturday <NA>     14
#> 6 2021-08-01     1 2021-07-20 Thursday <NA>     13

Jovani · August 26, 2021, 4:17am

Exactly @andresrcs ! Thank you very much!

However, I have a few questions:

I need the order to be shown in the sequence: id, date1, date2, Week, D and DR01. As it appeared, it is date 2 before id and date
is it possible to leave the date formats in dd-mm-yyyy?
I need to create a variable, that can be df_grouped for this you created, because I'll need to call this variable in another part of the code, how does it look then?

Thanks again!

FJCC · August 26, 2021, 1:08pm

The code below is a modified version of @andresrcs's code that rearranges the columns and stores the result in the variable df_grouped. It also shows how to change the date columns into character columns with a particular format. Since the formatting results in characters, those columns cannot be used in calculations as dates without transforming them back. If you want the dates shown in a particular format, I suggest you do that at the last step after all calculations are complete.
By the way, the original spreadsheet had the dates in the m/d/Y format when I opened it, I suppose because Google detected my US locale. That shows the dangers of using ambiguous date formats.

df <- readxl::read_excel('~/R/Play/df.xlsx')
df_grouped <- df %>%
   mutate(across(starts_with("date"), as.Date)) %>% 
   group_by(date2) %>% 
   summarise(Id = first(Id),
             date1 = first(date1),
             Week = first(Week),
             D = first(D),
             DR01 = sum(DR01)) %>% 
   select(Id,date1,date2,Week,D,DR01)
`summarise()` ungrouping output (override with `.groups` argument)
df_grouped
# A tibble: 6 x 6
     Id date1      date2      Week     D      DR01
  <dbl> <date>     <date>     <chr>    <chr> <dbl>
1     1 2021-07-20 2021-04-02 Friday   NA       21
2     1 2021-07-20 2021-04-03 Saturday NA       24
3     1 2021-07-20 2021-04-08 Thursday Hol      21
4     1 2021-07-20 2021-04-09 Friday   NA       11
5     1 2021-07-20 2021-04-10 Saturday NA       14
6     1 2021-07-20 2021-08-01 Thursday NA       13
 
df_grouped <- df_grouped %>% mutate(date1=format(date1,"%d/%m/%Y"),
                                    date2=format(date2,"%d/%m/%Y"))
 
df_grouped
# A tibble: 6 x 6
     Id date1      date2      Week     D      DR01
  <dbl> <chr>      <chr>      <chr>    <chr> <dbl>
1     1 20/07/2021 02/04/2021 Friday   NA       21
2     1 20/07/2021 03/04/2021 Saturday NA       24
3     1 20/07/2021 08/04/2021 Thursday Hol      21
4     1 20/07/2021 09/04/2021 Friday   NA       11
5     1 20/07/2021 10/04/2021 Saturday NA       14
6     1 20/07/2021 01/08/2021 Thursday NA       13

Jovani · August 26, 2021, 1:37pm

Thanks @andresrcs and @FJCC for helping me! Solved the problem! =)

system · September 2, 2021, 1:38pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.