Apply a function to a variable and concurrently create a new one

Hi all,
I have a data frame with a column Arrivo (formatted as a date) and a column Giorni (formatted as an integer) holding a number of days (e.g. 2, 3, 6, etc.).
I would like to do two things with these columns: duplicate each row as many times as the number in Giorni, and, while duplicating, create a new column called data.osservazione that starts at Arrivo and increases by one day for each successive copy.

From this:

  No.  Casa Anno       Data Categoria Camera     Arrivo Stornata.il Giorni
1   2.867 SEELE 2019 03/09/2019       CDV    316 28/03/2020          NA      3
2 148.000 SEELE 2020 20/01/2020       CDS    105 29/03/2020          NA      3
3   3.684 SEELE 2019 16/11/2019        CD    102 02/04/2020          NA      5

to this:

No. data.osservazione  Casa Anno       Data Categoria Camera            Arrivo
1 2867         3/28/2020 SEELE 2019 03/09/2019       CDV    316 3/28/2020 0:00:00
2 2867         3/29/2020 SEELE 2019 03/09/2019       CDV    316 3/28/2020 0:00:00
3 2867         3/30/2020 SEELE 2019 03/09/2019       CDV    316 3/28/2020 0:00:00
4  148         3/29/2020 SEELE 2020 20/01/2020       CDS    105 3/29/2020 0:00:00
5  148         3/30/2020 SEELE 2020 20/01/2020       CDS    105 3/29/2020 0:00:00
6  148         3/31/2020 SEELE 2020 20/01/2020       CDS    105 3/29/2020 0:00:00
  Stornata.il Giorni
1        #N/D      3
2        #N/D      3
3        #N/D      3
4        #N/D      3

I was able to duplicate the rows but I don't know how to concurrently create the new column with the values I need.

Please don't mind the date values in the columns, I'll fix them in the end.

Thanks in advance

It is a little disappointing to me that you only asked a question when you also had something to offer:
showing how you duplicated the rows.
If I understand you correctly, the following could be a way to do this (I simplified the data):

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

data1 = data.frame(
  No. = c(2867,148000,3684),
  Arrivo = c('28/03/2020','29/03/2020','02/04/2020'),
  Giorni = c(3, 3, 5),
  Otherfields = c('CDV ...','CDS ...','CD ...')
)

  #       No.  Casa Anno       Data Categoria Camera     Arrivo Stornata.il Giorni
  # 1   2.867 SEELE 2019 03/09/2019       CDV    316 28/03/2020          NA      3
  # 2 148.000 SEELE 2020 20/01/2020       CDS    105 29/03/2020          NA      3
  # 3   3.684 SEELE 2019 16/11/2019        CD    102 02/04/2020          NA      5


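data1 %>%
  # repeat each row Giorni times (uncount() drops the Giorni column by default)
  tidyr::uncount(Giorni) %>%
  # number the copies within each No.; this assumes No. is unique per original row
  dplyr::group_by(No.) %>%
  dplyr::mutate(
    seqnr = seq(1, n()) - 1,  # day offset: 0, 1, 2, ...
    data.osservazione = as.Date(Arrivo, "%d/%m/%Y") + seqnr
  ) %>%
  dplyr::ungroup() %>%
  dplyr::select(-c(Arrivo, seqnr))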
#> # A tibble: 11 x 3
#>       No. Otherfields data.osservazione
#>     <dbl> <chr>       <date>           
#>  1   2867 CDV ...     2020-03-28       
#>  2   2867 CDV ...     2020-03-29       
#>  3   2867 CDV ...     2020-03-30       
#>  4 148000 CDS ...     2020-03-29       
#>  5 148000 CDS ...     2020-03-30       
#>  6 148000 CDS ...     2020-03-31       
#>  7   3684 CD ...      2020-04-02       
#>  8   3684 CD ...      2020-04-03       
#>  9   3684 CD ...      2020-04-04       
#> 10   3684 CD ...      2020-04-05       
#> 11   3684 CD ...      2020-04-06

Created on 2020-05-22 by the reprex package (v0.3.0)
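
If No. is not guaranteed to be unique, grouping by it would number the copies incorrectly. A possible alternative that avoids grouping altogether is sketched below in base R, using the same simplified data1. The names idx, expanded and offset are only illustrative, and this is a sketch under those assumptions rather than a tested drop-in for the real data.

```r
# Same simplified data as above
data1 <- data.frame(
  No. = c(2867, 148000, 3684),
  Arrivo = c("28/03/2020", "29/03/2020", "02/04/2020"),
  Giorni = c(3, 3, 5),
  Otherfields = c("CDV ...", "CDS ...", "CD ..."),
  stringsAsFactors = FALSE
)

# Repeat each row index Giorni times: 1 1 1 2 2 2 3 3 3 3 3
idx <- rep(seq_len(nrow(data1)), times = data1$Giorni)
expanded <- data1[idx, ]
rownames(expanded) <- NULL

# sequence(c(3, 3, 5)) is 1 2 3 1 2 3 1 2 3 4 5; subtract 1 to get the day offset
offset <- sequence(data1$Giorni) - 1
expanded$data.osservazione <- as.Date(expanded$Arrivo, format = "%d/%m/%Y") + offset
```

Because the offset is built directly from Giorni in row order, the result does not depend on any column being a unique key.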

Hi @HanOostdijk, I'm sorry I didn't provide the only piece of code I had. I didn't think it was necessary, and I apologize.

I came up with a solution myself last night, but yours is much more concise and elegant. I really like it!

I'll show what I got so people can see the differences.

library(tidyverse)
library(purrr)
library(lubridate)

revenue <- revenue %>% mutate(Data = as_date(Data), Arrivo = as_date(Arrivo), `Stornata il` = as_date(`Stornata il`), Partenza = as_date(Partenza))

revenue <- revenue %>% mutate(media_die = Alloggio/Giorni)

revenue_1 <- revenue %>%  mutate(data_obs = Arrivo, id = 1:nrow(revenue))

revenue_2 <- revenue_1 %>% group_by(id, data_obs) %>% 
  complete(Giorni = sequence(Giorni)) %>% 
  ungroup() %>% 
  mutate(data_obs = data_obs + Giorni -1)

data_obs <- revenue_2$data_obs

revenue_3 <- revenue %>% map_df(.,rep, .$Giorni)

revenue_finale <- revenue_3 %>% mutate(data_obs = data_obs)

I basically created a few intermediate data frames, extracted the variable I needed, and attached it to the data frame with the duplicated rows to get the final version.
I know it's redundant and not very efficient, but I'm still learning :)
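
For comparison, the intermediate data frames can probably be avoided with the .remove and .id arguments of tidyr::uncount(), which keep the Giorni column and add a running counter per original row. The sketch below uses the simplified data1 from the earlier reply rather than the real revenue table, and the counter name giorno is just a placeholder.

```r
library(dplyr)
library(tidyr)

# Simplified stand-in for the real table
data1 <- data.frame(
  No. = c(2867, 148000, 3684),
  Arrivo = c("28/03/2020", "29/03/2020", "02/04/2020"),
  Giorni = c(3, 3, 5),
  Otherfields = c("CDV ...", "CDS ...", "CD ..."),
  stringsAsFactors = FALSE
)

result <- data1 %>%
  # parse the day/month/year strings into Date values
  mutate(Arrivo = as.Date(Arrivo, format = "%d/%m/%Y")) %>%
  # repeat each row Giorni times, keep Giorni, and add a 1..Giorni counter
  uncount(Giorni, .remove = FALSE, .id = "giorno") %>%
  # first copy keeps the arrival date, each later copy adds one day
  mutate(data.osservazione = Arrivo + giorno - 1) %>%
  select(-giorno)
```

This keeps every original column, so Arrivo and Giorni are still available next to data.osservazione.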

Again really thank you!

Thank you for sharing!
