Assign range of dates for each unique row in the dataframe.

dplyr
tidyverse
tabulizer

#1

I have a table which gives me the gpi score for a country by year.

EXISTING DATA IN THE FINAL DATAFRAME I HAVE:

gpi_year gpi_rank	gpi_country	     gpi_score
2018		1		Iceland		     1.096
2018		2		New Zealand	     1.192
2018		3		Austria		     1.274
.
.
.

2018		161		South Sudan	     3.508
2018		162		Afghanistan	     3.585
2018		163		Syria		     3.6

However, I need to add a date range for each individual row.

Example:
I will check if the row has 2018 if yes then i would duplicate the record for that country for each date starting from 1st Jan 2018 to 31st Dec 2018 (because it is 2018). i.e 365 dates for one single country.

Similarly if the row has the data for 2017 for any country then i need to duplicate the record for that country for each date starting from 1st Jan 2017 to 31st Dec 2017 i.e 365 dates for one single country.

Example: NEED A SIMILAR OUTPUT:

gpi_year  gpi_rank  gpi_country	 gpi_score	date
2018		1		Iceland		 1.096		01-01-2018
2018		1		Iceland		 1.096		02-01-2018
2018		1		Iceland		 1.096		03-01-2018
.
.
.
.
2018		1		Iceland		1.096		29-12-2018
2018		1		Iceland		1.096		30-12-2018
2018		1		Iceland		1.096		31-12-2018
2018		2		New Zealand	1.096		01-01-2018
2018		2		New Zealand 1.096		02-01-2018
2018		2		New Zealand 1.096		03-01-2018
.
.
.
.
2018		2		New Zealand 1.096		29-12-2018
2018		2		New Zealand 1.096		30-12-2018
2018		2		New Zealand 1.096		31-12-2018

NOTE: My data has 163 countries. Each country has the score for 2017 and 2018.
Therefore for each country i need to assign dates ranging from 1st Jan 2017 to 31st Jan 2017 and 1st Jan 2018 to 31st Jan 2018.


Why doesn't group_by %>% purr::f() behave the same way as summarise()?
#2

Providing reprex is always advisable if you want to get input on your problem as fast as possible. There is more info about how to do one here:

Your problem can be solved with a join and unnest from tidyr:

suppressPackageStartupMessages(library(tidyverse))

years <- tibble::tribble(
  ~year, ~dates,
  "2018",  seq(from = as.Date("2018-01-01"), to = as.Date("2018-12-31"), by = "1 day"),
  "2017",  seq(from = as.Date("2017-01-01"), to = as.Date("2017-12-31"), by = "1 day")
)

df <- tibble::tribble(
  ~gpi_year, ~gpi_country, ~gpi_score,
  "2018",      "Iceland",      1.096,
  "2018",      "South Sudan",  3.508,
  "2017",      "Iceland",      2,
  "2017",      "South Sudan",  1
)

options(pillar.sigfig = 4)

df %>%
  dplyr::left_join(years, by = c("gpi_year" = "year")) %>%
  tidyr::unnest(dates)
#> # A tibble: 1,460 x 4
#>    gpi_year gpi_country gpi_score dates     
#>    <chr>    <chr>           <dbl> <date>    
#>  1 2018     Iceland         1.096 2018-01-01
#>  2 2018     Iceland         1.096 2018-01-02
#>  3 2018     Iceland         1.096 2018-01-03
#>  4 2018     Iceland         1.096 2018-01-04
#>  5 2018     Iceland         1.096 2018-01-05
#>  6 2018     Iceland         1.096 2018-01-06
#>  7 2018     Iceland         1.096 2018-01-07
#>  8 2018     Iceland         1.096 2018-01-08
#>  9 2018     Iceland         1.096 2018-01-09
#> 10 2018     Iceland         1.096 2018-01-10
#> # ... with 1,450 more rows

Created on 2018-11-03 by the reprex package (v0.2.1)


#3

Probably the simplest solution

library(dplyr)
date = c(rep(as.Date(as.Date("2018-01-01"):as.Date("2018-12-31"), origin = '1970-01-01'), 163), 
rep(as.Date(as.Date("2017-01-01"):as.Date("2017-12-31"), origin = '1970-01-01'), 163))
date = paste(substr(date,9,10), substr(date,6,7), substr(date,1,4), sep= '-') 

df = mutate(df, date = date)

#4

I guess, I'm biased, but I would hardly call this a simple solution. You are introducing quite a few dependencies that have very little to do with the problem.

For example, 163 is really just an implementation detail in this case. If you happen to have 20 countries then you'll need to change it in 2 places.
Your date variable is way too hacky. It is not obvious at all what it does and why it works in this case.
Just my 0.02$


#5

Well, since some years ago, dplyr isn't a dependency, it's must have library and you better have it always, bc it makes life easier. The rest are just basic R functions. You can always replace 163 with nrow(df). I havn't even used lubridate in my example ; )


#6

Maybe I'm misunderstanding, but for this to work don't you need a starting data frame that's already been manipulated to make room for all the dates? I thought the OP's question was more about how to make the new rows, not how to make a sequence of dates.

For instance, if I try to adapt this solution to @mishabalyasin's sample data frame above:

library(tidyverse)

df <- tibble::tribble(
  ~gpi_year, ~gpi_country, ~gpi_score,
  "2018",      "Iceland",      1.096,
  "2018",      "South Sudan",  3.508,
  "2017",      "Iceland",      2,
  "2017",      "South Sudan",  1
)

date <- c(
  rep(as.Date(
    as.Date("2018-01-01"):as.Date("2018-12-31"),
    origin = '1970-01-01'), 2),
  
  rep(as.Date(
    as.Date("2017-01-01"):as.Date("2017-12-31"),
    origin = '1970-01-01'), 2)
)

date <- paste(substr(date, 9, 10), substr(date, 6, 7), substr(date, 1, 4), sep = '-')

df<- mutate(df, date = date)
#> Error in mutate_impl(.data, dots): Column `date` must be length 4 (the number of rows) or one, not 1460

Created on 2018-11-08 by the reprex package (v0.2.1)

I'm also not sure I see the virtue of converting an actual date vector into a character vector just to match the OP's preferred date format. For downstream analysis and plotting, I think dates should almost always be dates ā€” the formatting can be handled when it's time for a final presentation.

That said, if I were going to convert the dates to strings, I think I'd use format(date, "%d-%m-%Y") rather than paste(). format() has the distinct benefit of making it obvious at a glance which format was applied, which seems to me like a major win for maintainability.


#7

Hi @mishabalyasin

I came across this question while looking for something else..

When I run your code, I get this error message: Error: Each column must either be a list of vectors or a list of data frames [dates]

suppressPackageStartupMessages(library(tidyverse))
#> Warning: package 'ggplot2' was built under R version 3.4.4
#> Warning: package 'tidyr' was built under R version 3.4.4

years <- tibble::tribble(
 ~year, ~dates,
 "2018",  seq(from = as.Date("2018-01-01"), to = as.Date("2018-12-31"), by = "1 day"),
 "2017",  seq(from = as.Date("2017-01-01"), to = as.Date("2017-12-31"), by = "1 day")
)

df <- tibble::tribble(
 ~gpi_year, ~gpi_country, ~gpi_score,
 "2018",      "Iceland",      1.096,
 "2018",      "South Sudan",  3.508,
 "2017",      "Iceland",      2,
 "2017",      "South Sudan",  1
)

options(pillar.sigfig = 4)

df %>%
 dplyr::left_join(years, by = c("gpi_year" = "year")) %>%
 tidyr::unnest(dates)
#> Error: Each column must either be a list of vectors or a list of data frames [dates]


#8

I re-ran the code with reprex and it works. What is the version of packages you are using? You can easily share it with reprex::reprex(si = TRUE):

suppressPackageStartupMessages(library(tidyverse))

years <- tibble::tribble(
  ~year, ~dates,
  "2018",  seq(from = as.Date("2018-01-01"), to = as.Date("2018-12-31"), by = "1 day"),
  "2017",  seq(from = as.Date("2017-01-01"), to = as.Date("2017-12-31"), by = "1 day")
)

df <- tibble::tribble(
  ~gpi_year, ~gpi_country, ~gpi_score,
  "2018",      "Iceland",      1.096,
  "2018",      "South Sudan",  3.508,
  "2017",      "Iceland",      2,
  "2017",      "South Sudan",  1
)

options(pillar.sigfig = 4)

df %>%
  dplyr::left_join(years, by = c("gpi_year" = "year")) %>%
  tidyr::unnest()
#> # A tibble: 1,460 x 4
#>    gpi_year gpi_country gpi_score dates     
#>    <chr>    <chr>           <dbl> <date>    
#>  1 2018     Iceland         1.096 2018-01-01
#>  2 2018     Iceland         1.096 2018-01-02
#>  3 2018     Iceland         1.096 2018-01-03
#>  4 2018     Iceland         1.096 2018-01-04
#>  5 2018     Iceland         1.096 2018-01-05
#>  6 2018     Iceland         1.096 2018-01-06
#>  7 2018     Iceland         1.096 2018-01-07
#>  8 2018     Iceland         1.096 2018-01-08
#>  9 2018     Iceland         1.096 2018-01-09
#> 10 2018     Iceland         1.096 2018-01-10
#> # ... with 1,450 more rows

Created on 2018-11-09 by the reprex package (v0.2.1)

Session info
devtools::session_info()
#> ā”€ Session info ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  os       Ubuntu 18.04.1 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2018-11-09                  
#> 
#> ā”€ Packages ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€
#>  package     * version date       lib source        
#>  assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.5.1)
#>  backports     1.1.2   2017-12-13 [1] CRAN (R 3.5.1)
#>  base64enc     0.1-3   2015-07-28 [1] CRAN (R 3.5.1)
#>  bindr         0.1.1   2018-03-13 [1] CRAN (R 3.5.1)
#>  bindrcpp      0.2.2   2018-03-29 [1] CRAN (R 3.5.1)
#>  broom         0.5.0   2018-07-17 [1] CRAN (R 3.5.1)
#>  callr         3.0.0   2018-08-24 [1] CRAN (R 3.5.1)
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.5.1)
#>  cli           1.0.1   2018-09-25 [1] CRAN (R 3.5.1)
#>  colorspace    1.3-2   2016-12-14 [1] CRAN (R 3.5.1)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.1)
#>  devtools      2.0.1   2018-10-26 [1] CRAN (R 3.5.1)
#>  digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)
#>  dplyr       * 0.7.7   2018-10-16 [1] CRAN (R 3.5.1)
#>  evaluate      0.12    2018-10-09 [1] CRAN (R 3.5.1)
#>  fansi         0.4.0   2018-10-05 [1] CRAN (R 3.5.1)
#>  forcats     * 0.3.0   2018-02-19 [1] CRAN (R 3.5.1)
#>  fs            1.2.6   2018-08-23 [1] CRAN (R 3.5.1)
#>  ggplot2     * 3.1.0   2018-10-25 [1] CRAN (R 3.5.1)
#>  glue          1.3.0   2018-07-17 [1] CRAN (R 3.5.1)
#>  gtable        0.2.0   2016-02-26 [1] CRAN (R 3.5.1)
#>  haven         1.1.2   2018-06-27 [1] CRAN (R 3.5.1)
#>  hms           0.4.2   2018-03-10 [1] CRAN (R 3.5.1)
#>  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.5.1)
#>  httr          1.3.1   2017-08-20 [1] CRAN (R 3.5.1)
#>  jsonlite      1.5     2017-06-01 [1] CRAN (R 3.5.1)
#>  knitr         1.20    2018-02-20 [1] CRAN (R 3.5.1)
#>  lattice       0.20-35 2017-03-25 [4] CRAN (R 3.5.0)
#>  lazyeval      0.2.1   2017-10-29 [1] CRAN (R 3.5.1)
#>  lubridate     1.7.4   2018-04-11 [1] CRAN (R 3.5.1)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.1)
#>  modelr        0.1.2   2018-05-11 [1] CRAN (R 3.5.1)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 3.5.1)
#>  nlme          3.1-137 2018-04-07 [4] CRAN (R 3.5.0)
#>  pillar        1.3.0   2018-07-14 [1] CRAN (R 3.5.1)
#>  pkgbuild      1.0.2   2018-10-16 [1] CRAN (R 3.5.1)
#>  pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.5.1)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.1)
#>  plyr          1.8.4   2016-06-08 [1] CRAN (R 3.5.1)
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.1)
#>  processx      3.2.0   2018-08-16 [1] CRAN (R 3.5.1)
#>  ps            1.2.0   2018-10-16 [1] CRAN (R 3.5.1)
#>  purrr       * 0.2.5   2018-05-29 [1] CRAN (R 3.5.1)
#>  R6            2.3.0   2018-10-04 [1] CRAN (R 3.5.1)
#>  Rcpp          0.12.19 2018-10-01 [1] CRAN (R 3.5.1)
#>  readr       * 1.1.1   2017-05-16 [1] CRAN (R 3.5.1)
#>  readxl        1.1.0   2018-04-20 [1] CRAN (R 3.5.1)
#>  remotes       2.0.2   2018-10-30 [1] CRAN (R 3.5.1)
#>  rlang         0.3.0.1 2018-10-25 [1] CRAN (R 3.5.1)
#>  rmarkdown     1.10    2018-06-11 [1] CRAN (R 3.5.1)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.1)
#>  rvest         0.3.2   2016-06-17 [1] CRAN (R 3.5.1)
#>  scales        1.0.0   2018-08-09 [1] CRAN (R 3.5.1)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.1)
#>  stringi       1.2.4   2018-07-20 [1] CRAN (R 3.5.1)
#>  stringr     * 1.3.1   2018-05-10 [1] CRAN (R 3.5.1)
#>  testthat      2.0.1   2018-10-13 [1] CRAN (R 3.5.1)
#>  tibble      * 1.4.2   2018-01-22 [1] CRAN (R 3.5.1)
#>  tidyr       * 0.8.2   2018-10-28 [1] CRAN (R 3.5.1)
#>  tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.5.1)
#>  tidyverse   * 1.2.1   2017-11-14 [1] CRAN (R 3.5.1)
#>  usethis       1.4.0   2018-08-14 [1] CRAN (R 3.5.1)
#>  utf8          1.1.4   2018-05-24 [1] CRAN (R 3.5.1)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)
#>  xml2          1.2.0   2018-01-24 [1] CRAN (R 3.5.1)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.1)
#> 
#> [1] /home/misha/R/x86_64-pc-linux-gnu-library/3.5
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

#9

Looks like I was using an older version of R. I updated R, R Studio and all of the packages. The code seems to be working now.


#10

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.