Hi all,
I have a data frame with a column Arrivo (formatted as date) and a column Giorni (formatted as integer) holding a number of days (e.g. 2, 3, 6, etc.).
I would like to apply two functions to these columns. Specifically, I would like to duplicate each row as many times as the number in the column Giorni and, while duplicating these rows, create a new column called data.osservazione that is equal to Arrivo for the first copy and increases by one day for each further copy.
From this:
```
      No.  Casa Anno       Data Categoria Camera     Arrivo Stornata.il Giorni
1   2.867 SEELE 2019 03/09/2019       CDV    316 28/03/2020          NA      3
2 148.000 SEELE 2020 20/01/2020       CDS    105 29/03/2020          NA      3
3   3.684 SEELE 2019 16/11/2019        CD    102 02/04/2020          NA      5
```
It is a little disappointing to me that you only asked a question, while you also had something to offer: your code showing how you duplicated the rows.
If I understand you correctly, the following could be a way to do this (I simplified the data):
```r
library(dplyr)
library(tidyr)

data1 = data.frame(
  No. = c(2867, 148000, 3684),
  Arrivo = c('28/03/2020', '29/03/2020', '02/04/2020'),
  Giorni = c(3, 3, 5),
  Otherfields = c('CDV ...', 'CDS ...', 'CD ...')
)
# The original data looked like this:
#       No.  Casa Anno       Data Categoria Camera     Arrivo Stornata.il Giorni
# 1   2.867 SEELE 2019 03/09/2019       CDV    316 28/03/2020          NA      3
# 2 148.000 SEELE 2020 20/01/2020       CDS    105 29/03/2020          NA      3
# 3   3.684 SEELE 2019 16/11/2019        CD    102 02/04/2020          NA      5

data1 %>%
  tidyr::uncount(Giorni) %>%     # repeat each row Giorni times
  dplyr::group_by(No.) %>%       # one group per original row
  dplyr::mutate(
    seqnr = seq(1, n()) - 1,     # 0, 1, 2, ... within each group
    data.osservazione = as.Date(Arrivo, format = "%d/%m/%Y") + seqnr
  ) %>%
  dplyr::ungroup() %>%
  dplyr::select(-c(Arrivo, seqnr))
#> # A tibble: 11 x 3
#>       No. Otherfields data.osservazione
#>     <dbl> <chr>       <date>
#>  1   2867 CDV ...     2020-03-28
#>  2   2867 CDV ...     2020-03-29
#>  3   2867 CDV ...     2020-03-30
#>  4 148000 CDS ...     2020-03-29
#>  5 148000 CDS ...     2020-03-30
#>  6 148000 CDS ...     2020-03-31
#>  7   3684 CD ...      2020-04-02
#>  8   3684 CD ...      2020-04-03
#>  9   3684 CD ...      2020-04-04
#> 10   3684 CD ...      2020-04-05
#> 11   3684 CD ...      2020-04-06
```
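A possible simplification, assuming a tidyr version whose `uncount()` supports the `.id` argument: `.id` numbers the copies of each original row 1, 2, ..., Giorni, so the `group_by()`/`ungroup()` step and the manually built counter are not needed. It also keeps working if `No.` is not unique. A minimal sketch on the same simplified `data1`:

```r
library(dplyr)
library(tidyr)

# Same simplified data as above
data1 <- data.frame(
  No. = c(2867, 148000, 3684),
  Arrivo = c("28/03/2020", "29/03/2020", "02/04/2020"),
  Giorni = c(3, 3, 5),
  Otherfields = c("CDV ...", "CDS ...", "CD ...")
)

result <- data1 %>%
  # .id = "seqnr" adds a counter 1, 2, ..., Giorni for the copies of each row
  tidyr::uncount(Giorni, .id = "seqnr") %>%
  dplyr::mutate(
    # first copy keeps the arrival date, each further copy adds one day
    data.osservazione = as.Date(Arrivo, format = "%d/%m/%Y") + (seqnr - 1)
  ) %>%
  dplyr::select(-Arrivo, -seqnr)

result
```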
I basically created some data frames, extracted the variables I needed and attached them to the data frame with duplicated rows to get the final version.
I know it's redundant and not very efficient, but I'm still learning.
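For comparison, the same result can be reached in base R without building intermediate data frames. A minimal sketch, assuming the simplified columns used above (`No.`, `Arrivo` as a `dd/mm/yyyy` string, `Giorni`): `rep()` repeats each row index `Giorni` times, and `sequence()` produces a counter that restarts at 1 for every original row.

```r
# Simplified data, as in the dplyr example
data1 <- data.frame(
  No. = c(2867, 148000, 3684),
  Arrivo = c("28/03/2020", "29/03/2020", "02/04/2020"),
  Giorni = c(3, 3, 5)
)

# Repeat each row index Giorni times: 1 1 1 2 2 2 3 3 3 3 3
idx <- rep(seq_len(nrow(data1)), times = data1$Giorni)
out <- data1[idx, ]

# sequence() restarts at 1 for each original row: 1 2 3 1 2 3 1 2 3 4 5
out$data.osservazione <- as.Date(out$Arrivo, format = "%d/%m/%Y") +
  (sequence(data1$Giorni) - 1)

rownames(out) <- NULL  # drop the "1.1", "1.2", ... row names created by indexing
out
```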