Delete column depending on the date and code you choose

Could you help me with the following question: I want to make a new Y variable, which makes a relationship between a date and the code I choose, df1 dataset and the SPV dataset. I'll give an example to make it easier to understand.

Note that my df1 dataset for 07/07/2021, Code CDE, has values up to DR08. Columns DR09 through DR014 are 0. Therefore, I would like to make a new Y variable, which excludes these columns for that same day and code, but from the SPV variable, which in this case is DR09_DR09_DR09_PV through DR014_DR014_PV. So, the Y variable output table for that day and code would look like this:

enter image description here

If I choose day 09/07/2021, Code CDE, the columns DR013_DR013_PV and DR014_DR014_PV would be excluded from my SPV dataset, as DR013 and DR014 have values equal to 0 of df1 dataset. Therefore, the output table for this day and code would look like this :

enter image description here

So, my Y variable will depend on the day and code I choose.

  library(dplyr)
  library(tidyverse)
  library(lubridate)
    
df1 <- structure(
  list(date1= c("2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28",
               "2021-06-28","2021-06-28","2021-06-28"),
       date2 = c("2021-06-30","2021-06-30","2021-07-02","2021-07-07","2021-07-07","2021-07-09","2021-07-09","2021-07-09"),
       Code = c("FDE","ABC","ABC","ABC","CDE","FGE","ABC","CDE"),
       Week= c("Wednesday","Wednesday","Friday","Wednesday","Wednesday","Friday","Friday","Friday"),
       DR1 = c(4,1,4,3,3,4,3,5),
       DR01 = c(4,1,4,3,3,4,3,6), DR02= c(4,2,6,7,3,2,7,4),DR03= c(9,5,4,3,3,2,1,5),
       DR04 = c(5,4,3,3,6,2,1,9),DR05 = c(5,4,5,3,6,2,1,9),
       DR06 = c(2,4,3,3,5,6,7,8),DR07 = c(2,5,4,4,9,4,7,8),
       DR08 = c(0,0,0,1,2,0,0,0),DR09 = c(0,0,0,0,0,0,0,0),DR010 = c(0,0,0,0,0,0,0,0),DR011 = c(4,0,0,0,0,0,0,0), 
       DR012 = c(0,0,0,3,0,0,0,5),DR013 = c(0,0,1,0,0,0,2,0),DR014 = c(0,0,0,0,0,2,0,0)),
  class = "data.frame", row.names = c(NA, -8L))

> df1
       date1      date2 Code      Week DR1 DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR09 DR010 DR011 DR012 DR013 DR014
1 2021-06-28 2021-06-30  FDE Wednesday   4    4    4    9    5    5    2    2    0    0     0     4     0     0     0
2 2021-06-28 2021-06-30  ABC Wednesday   1    1    2    5    4    4    4    5    0    0     0     0     0     0     0
3 2021-06-28 2021-07-02  ABC    Friday   4    4    6    4    3    5    3    4    0    0     0     0     0     1     0
4 2021-06-28 2021-07-07  ABC Wednesday   3    3    7    3    3    3    3    4    1    0     0     0     3     0     0
5 2021-06-28 2021-07-07  CDE Wednesday   3    3    3    3    6    6    5    9    2    0     0     0     0     0     0
6 2021-06-28 2021-07-09  FGE    Friday   4    4    2    2    2    2    6    4    0    0     0     0     0     0     2
7 2021-06-28 2021-07-09  ABC    Friday   3    3    7    1    1    1    7    7    0    0     0     0     0     2     0
8 2021-06-28 2021-07-09  CDE    Friday   5    6    4    5    9    9    8    8    0    0     0     0     5     0     0

dmda<-"2021-07-07"
CodeChosse<-"CDE"

x<-df1 %>% select(starts_with("DR0"))

x<-cbind(df1, setNames(df1$DR1 - x, paste0(names(x), "_PV")))
PV<-select(x, date2,Week, Code, DR1, ends_with("PV"))

med<-PV %>%
  group_by(Code,Week) %>%
  summarize(across(ends_with("PV"), median))

SPV<-df1%>%
  inner_join(med, by = c('Code', 'Week')) %>%
  mutate(across(matches("^DR0\\d+$"), ~.x + 
                  get(paste0(cur_column(), '_PV')),
                .names = '{col}_{col}_PV')) %>%
  select(date1:Code, DR01_DR01_PV:last_col())

SPV<-data.frame(SPV)
> SPV
       date1      date2 Code DR01_DR01_PV DR02_DR02_PV DR03_DR03_PV DR04_DR04_PV DR05_DR05_PV DR06_DR06_PV DR07_DR07_PV DR08_DR08_PV DR09_DR09_PV DR010_DR010_PV DR011_DR011_PV DR012_DR012_PV DR013_DR013_PV DR014_DR014_PV
1 2021-06-28 2021-06-30  FDE            4          4.0            4          4.0          4.0          4.0          4.0          4.0          4.0            4.0            4.0            4.0              4            4.0
2 2021-06-28 2021-06-30  ABC            1         -0.5            3          2.5          2.5          2.5          2.5          1.5          2.0            2.0            2.0            0.5              2            2.0
3 2021-06-28 2021-07-02  ABC            4          3.0            5          4.5          5.5          1.5          2.0          3.5          3.5            3.5            3.5            3.5              3            3.5
4 2021-06-28 2021-07-07  ABC            3          4.5            1          1.5          1.5          1.5          1.5          2.5          2.0            2.0            2.0            3.5              2            2.0
5 2021-06-28 2021-07-07  CDE            3          3.0            3          3.0          3.0          3.0          3.0          3.0          3.0            3.0            3.0            3.0              3            3.0
6 2021-06-28 2021-07-09  FGE            4          4.0            4          4.0          4.0          4.0          4.0          4.0          4.0            4.0            4.0            4.0              4            4.0
7 2021-06-28 2021-07-09  ABC            3          4.0            2          2.5          1.5          5.5          5.0          3.5          3.5            3.5            3.5            3.5              4            3.5
8 2021-06-28 2021-07-09  CDE            5          5.0            5          5.0          5.0          5.0          5.0          5.0          5.0            5.0            5.0            5.0              5            5.0


mat1 <- df1 %>%
  filter(date2 == dmda, Code == CodeChosse) %>%
  select(starts_with("DR0")) %>%
  as.matrix()

(dropnums <- which(mat1 == 0))
(dropnames <- paste0("DR0", dropnums, "_DR0", dropnums, "_PV"))

SPV %>%
  filter(date2 == dmda, Code == CodeChosse) %>%
  select(-dropnames)
1 Like

It's almost certain what you did @nirgrahamuk, thanks for that! However, notice that for the day 09/07, code CDE, the code excluded DR08_DR08_PV, DR09_DR09_PV, DR10_DR10_PV and DR11_DR11_PV, however, it was to exclude only DR013_DR13_PV and DR014_DR14_PV. The idea is that it only excludes 0 values that have no later number. In this case, it has 5 as a later number, so it doesn't exclude DR08_DR08_PV, DR09_DR09_PV, DR10_DR10_PV and DR11_DR11_PV, only DR013_DR13_PV and DR014_DR14_PV. I don't know if it's more understandable now?

> SPV %>%
+ filter(date2 == dmda, Code == CodeChosse) %>%
+ select(-dropnames)  
       date1      date2 Code DR01_DR01_PV DR02_DR02_PV DR03_DR03_PV DR04_DR04_PV DR05_DR05_PV DR06_DR06_PV DR07_DR07_PV DR012_DR012_PV
1 2021-06-28 2021-07-09  CDE            5            5            5            5            5            5            5              5

The output for this case is:
enter image description here

ok.

mat1 <- df1 %>%
  filter(date2 == dmda, Code == CodeChosse) %>%
  select(starts_with("DR0")) %>%
  pivot_longer(cols = everything()) %>%
  arrange(desc(row_number())) %>%
  mutate(cs = cumsum(value)) %>%
  filter(cs == 0) %>%
  pull(name)

(dropnames <- paste0(mat1,"_",mat1, "_PV"))

SPV %>%
  filter(date2 == dmda, Code == CodeChosse) %>%
  select(-dropnames)
1 Like

That's exactly what I wanted @nirgrahamuk . Only one that was wrong, which is for the data 09/07, Code FGE, which has no columns to exclude, in this case, I would like the result to be the line corresponding to the day 09/07, code FGE, without excluding any column. Could you adjust this in the code? That's just what's missing, the rest is ok. Thanks @nirgrahamuk

@nirgrahamuk , do we need to use any_of function to solve this little problem?

yes


SPV %>%
  filter(date2 == dmda, Code == CodeChosse) %>%
  select(-any_of(dropnames))
1 Like

@nirgrahamuk, thank you so much for replying and for the great help!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.