Execute an operation on all variables beginning with the same name

I have data containing a number of tests. Depending on the dataset, the number of tests can differ, and I want to automate my code as much as I can. For example, I have some import steps and similar operations, and I don't know how to write them so that they adapt to the number of tests.

  memo <-paste(BITR$num_ref,BITR$prenom,BITR$nom,BITR$njf,BITR$cp,BITR$ddn,BITR$sexe,
               BITR$adresse1,BITR$adresse2,BITR$localite,
               BITR$date_test_1,BITR$type_test_1, BITR$resultat_1, BITR$resul_test_1, BITR$date_1, BITR$resul_1,
               BITR$date_test_2,BITR$type_test_2, BITR$resultat_2, BITR$resul_test_2, BITR$date_2, BITR$resul_2,
               BITR$date_test_3,BITR$type_test_3, BITR$resultat_3, BITR$resul_test_3, BITR$date_3, BITR$resul_3,
               BITR$date_test_4,BITR$type_test_4, BITR$resultat_4, BITR$resul_test_4, BITR$date_4, BITR$resul_4,
               BITR$date_test_5,BITR$type_test_5, BITR$resultat_5, BITR$resul_test_5, BITR$date_5, BITR$resul_5,
               BITR$date_test_6,BITR$type_test_6, BITR$resultat_6, BITR$resul_test_6, BITR$date_6, BITR$resul_6,
               BITR$date_test_7,BITR$type_test_7, BITR$resultat_7, BITR$resul_test_7, BITR$date_7, BITR$resul_7,
               sep="_")

data<- data.frame(id, prenom, nom, nomjf, ddn, sexe,cpNaiss,cp,ville,
                      date_testD1,type_testD1,resultatD1,result_testD1,date_D1,resul_D1,
                      date_testD2,type_testD2,resultatD2,result_testD2,date_D2,resul_D2,
                      date_testD3,type_testD3,resultatD3,result_testD3,date_D3,resul_D3,
                      date_testD4,type_testD4,resultatD4,result_testD4,date_D4,resul_D4,
                      date_testD5,type_testD5,resultatD5,result_testD5,date_D5,resul_D5,
                      date_testD6,type_testD6,resultatD6,result_testD6,date_D6,resul_D6,
                      date_testD7,type_testD7,resultatD7,result_testD7,date_D7,resul_D7,Class)

These are examples of my statements. I want my program to handle this automatically, without the user needing, for example, to add lines to the code. Perhaps with something like starts_with("date_testD"), but I don't know how.

Thank you for your time

Can you describe your data here, e.g. BITR and the variables used in your data.frame() call?

Selection helpers from tidyselect, such as starts_with(), must be used inside select() or another function that supports tidyselect, for example:

library(tidyverse)
BITR %>% select(
  num_ref,prenom,nom,njf,cp,ddn,sexe,adresse1,adresse2,localite,
  starts_with(c("date_test","type_test","resultat","resul_test","date","resul")))
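For comparison, a base-R sketch of prefix selection (not from the thread; the toy data frame `df` is hypothetical). Building one logical mask with `startsWith()` keeps the matching columns in their original data-frame order, and overlapping prefixes such as "date" and "date_test" cannot reorder or duplicate anything.

```r
# Toy data frame (hypothetical values) following the BITR naming pattern
df <- data.frame(num_ref = c("r1", "r2"),
                 date_test_1 = c("a", "b"), date_1 = c("c", "d"),
                 date_test_2 = c("e", "f"), date_2 = c("g", "h"))
prefixes <- c("date_test", "date")
# One TRUE/FALSE per column: does its name start with any of the prefixes?
keep <- Reduce(`|`, lapply(prefixes, function(p) startsWith(names(df), p)))
# Logical indexing returns columns in data-frame order, each at most once
selected <- df[, c("num_ref", names(df)[keep])]
names(selected)
```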

All my data are of type chr. Your solution works, but it doesn't keep my variables in order. It would also be good to build a data frame as in my second statement, but how can I handle the paste() in the first statement?

In the case of memo, repeating the same information is a clue that you aren't using enough abstraction. Since all the data for memo come from BITR, it's best to mention BITR only once rather than dozens of times.
R has a function, with(), that supports exactly that.
Similarly, when the names follow a pattern (i.e. they differ only by an integer), you can generate them in a loop. However, taking the time to craft that level of solution might not pay off if the data won't change, as copying, pasting and changing the numbers is probably quite fast.
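A minimal base-R sketch of generating the names instead of typing them, as an alternative to building the expression as text with parse(). It assumes the tests are numbered 1 to `n_tests`, and uses a small hypothetical stand-in for BITR with two tests and two stems.

```r
# Small hypothetical stand-in for BITR: two identity columns, two tests
BITR <- data.frame(num_ref = c("r1", "r2"), prenom = c("Ann", "Bob"),
                   date_test_1 = c("d11", "d21"), type_test_1 = c("t11", "t21"),
                   date_test_2 = c("d12", "d22"), type_test_2 = c("t12", "t22"))
n_tests <- 2                           # with the real data: 7
stems <- c("date_test_", "type_test_") # with the real data: all six prefixes
# Interleave the names: every stem for test 1, then every stem for test 2, ...
test_cols <- paste0(rep(stems, times = n_tests),
                    rep(seq_len(n_tests), each = length(stems)))
cols <- c("num_ref", "prenom", test_cols)
# do.call() hands each selected column to paste() as a separate argument
memo <- do.call(paste, c(BITR[cols], sep = "_"))
memo
```

Only `n_tests` and `stems` change when the number of tests changes; the paste() call itself stays the same.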

One thing I would emphasise is the reproducible example that I provided. It almost certainly took me more effort to build than it would have taken you, so in future, where you can, please try to provide one.


#set up 
library(tidyverse)
library(glue)
BITR <- tibble(as.data.frame(matrix(1:(52*5),nrow=5,ncol=52)))
names(BITR) <- c("num_ref","prenom","nom","njf","cp","ddn","sexe",
"adresse1","adresse2","localite",
"date_test_1","type_test_1", "resultat_1", "resul_test_1", "date_1", "resul_1",
"date_test_2","type_test_2", "resultat_2", "resul_test_2", "date_2", "resul_2",
"date_test_3","type_test_3", "resultat_3", "resul_test_3", "date_3", "resul_3",
"date_test_4","type_test_4", "resultat_4", "resul_test_4", "date_4", "resul_4",
"date_test_5","type_test_5", "resultat_5", "resul_test_5", "date_5", "resul_5",
"date_test_6","type_test_6", "resultat_6", "resul_test_6", "date_6", "resul_6",
"date_test_7","type_test_7", "resultat_7", "resul_test_7", "date_7", "resul_7")

BITR

#original code

memo <-paste(BITR$num_ref,BITR$prenom,BITR$nom,BITR$njf,BITR$cp,BITR$ddn,BITR$sexe,
             BITR$adresse1,BITR$adresse2,BITR$localite,
             BITR$date_test_1,BITR$type_test_1, BITR$resultat_1, BITR$resul_test_1, BITR$date_1, BITR$resul_1,
             BITR$date_test_2,BITR$type_test_2, BITR$resultat_2, BITR$resul_test_2, BITR$date_2, BITR$resul_2,
             BITR$date_test_3,BITR$type_test_3, BITR$resultat_3, BITR$resul_test_3, BITR$date_3, BITR$resul_3,
             BITR$date_test_4,BITR$type_test_4, BITR$resultat_4, BITR$resul_test_4, BITR$date_4, BITR$resul_4,
             BITR$date_test_5,BITR$type_test_5, BITR$resultat_5, BITR$resul_test_5, BITR$date_5, BITR$resul_5,
             BITR$date_test_6,BITR$type_test_6, BITR$resultat_6, BITR$resul_test_6, BITR$date_6, BITR$resul_6,
             BITR$date_test_7,BITR$type_test_7, BITR$resultat_7, BITR$resul_test_7, BITR$date_7, BITR$resul_7,
             sep="_")

memo2 <-with(data = BITR,
            expr = paste(num_ref,prenom,nom,njf,cp,ddn,sexe,
             adresse1,adresse2,localite,
             date_test_1,type_test_1, resultat_1, resul_test_1, date_1, resul_1,
             date_test_2,type_test_2, resultat_2, resul_test_2, date_2, resul_2,
             date_test_3,type_test_3, resultat_3, resul_test_3, date_3, resul_3,
             date_test_4,type_test_4, resultat_4, resul_test_4, date_4, resul_4,
             date_test_5,type_test_5, resultat_5, resul_test_5, date_5, resul_5,
             date_test_6,type_test_6, resultat_6, resul_test_6, date_6, resul_6,
             date_test_7,type_test_7, resultat_7, resul_test_7, date_7, resul_7,
             sep="_"))

identical(memo,memo2)

# manufacture the paste expression; it's shorter because 1:7 saves us repeating the text for each test
myreps <- function(n) glue("date_test_{n},type_test_{n}, resultat_{n}, resul_test_{n}, date_{n}, resul_{n}") %>%
  paste0(collapse=",\n")

(myexpr <- parse(text=glue(("paste(num_ref,prenom,nom,njf,cp,ddn,sexe,
      adresse1,adresse2,localite,\n {myreps(1:7)} ,
sep='_')"))))


memo3 <-with(data = BITR,
             expr = eval(myexpr))

identical(memo,memo3)
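If you want to keep the call-building idea but avoid turning text into code, a call object can be assembled directly from symbols with `as.name()` and `as.call()`, then evaluated with the data frame as the environment. This is a hedged sketch with a small hypothetical `BITR`; `cols` stands for whatever name vector you generate.

```r
# Small hypothetical stand-in for BITR
BITR <- data.frame(num_ref = c("r1", "r2"), nom = c("Dupont", "Martin"),
                   date_test_1 = c("d11", "d21"), type_test_1 = c("t11", "t21"))
cols <- c("num_ref", "nom", "date_test_1", "type_test_1")
# Build the call paste(num_ref, nom, date_test_1, type_test_1, sep = "_")
# from symbols, without parse(text = ...)
memo_call <- as.call(c(list(as.name("paste")), lapply(cols, as.name), list(sep = "_")))
print(memo_call)
# eval() accepts a data frame as envir, so the symbols resolve to its columns
memo <- eval(memo_call, envir = BITR)
memo
```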

You can try unite() to paste the columns together:

library(tidyverse)
BITR %>% select(
  num_ref,prenom,nom,njf,cp,ddn,sexe,adresse1,adresse2,localite,
  starts_with(c("date_test","type_test","resultat","resul_test","date","resul"))) %>%
  unite("memo", everything(),sep = "_")

As for the column order, you could either restructure your data, or generate a vector of column names in the desired order and use all_of() inside select() to select them:

vars <- expand_grid(
  A = c('date_test_',"type_test_","resultat_","resul_test_","date_","resul_"),
  B = 1:7
) %>% arrange(B) %>% str_glue_data("{A}{B}")

BITR %>% select(
  num_ref,prenom,nom,njf,cp,ddn,sexe,adresse1,adresse2,localite,all_of(vars)) %>%
  unite('memo',everything(),sep = "_")
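To avoid hard-coding `1:7`, the number of tests can be read off the column names. A base-R sketch, assuming every test has a `date_test_<n>` column and tests are numbered 1 to n with no gaps; `col_names` is a hypothetical example with three tests (with real data it would be `names(BITR)`).

```r
# Hypothetical column names for a dataset with 3 tests
stems <- c("date_test_", "type_test_", "resultat_", "resul_test_", "date_", "resul_")
col_names <- c("num_ref", "prenom",
               paste0(rep(stems, times = 3), rep(1:3, each = length(stems))))
# Keep only the "date_test_<n>" names and extract the integer n
date_test_cols <- grep("^date_test_[0-9]+$", col_names, value = TRUE)
n_tests <- max(as.integer(sub("^date_test_", "", date_test_cols)))
# Interleaved order: all six stems for test 1, then test 2, and so on
vars <- paste0(rep(stems, times = n_tests),
               rep(seq_len(n_tests), each = length(stems)))
vars[1:7]
```

The resulting `vars` is intended to follow the same interleaved order as the `expand_grid()` version, so it can be passed to `all_of()` or to `into =` unchanged.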

That's great! I imagine I can then reuse vars? For example, I now have a separate() step:

final<-separate(paires, memo, into = c('idR','prenomR','nomR','nomjfR','cpNaissR',
                                       'ddn_','sexe_','ad1_','ad2_','ville_',
                                       'date_test_1','type_test_1','resultat_1','result_test_1','date__1','resul__1',
                                       'date_test_2','type_test_2','resultat_2','result_test_2','date__2','resul__2',
                                       'date_test_3','type_test_3','resultat_3','result_test_3','date__3','resul__3',
                                       'date_test_4','type_test_4','resultat_4','result_test_4','date__4','resul__4',
                                       'date_test_5','type_test_5','resultat_5','result_test_5','date__5','resul__5',
                                       'date_test_6','type_test_6','resultat_6','result_test_6','date__6','resul__6',
                                       'date_test_7','type_test_7','resultat_7','result_test_7','date__7','resul__7'),sep='_')

Why doesn't something like this work?

final<-paires%>% separate(memo, into(
  num_ref,prenom,nom,njf,cp,ddn,sexe,adresse1,adresse2,localite,all_of(vars)) 

all_of() can only be used inside functions that support tidyselect, such as select().

In this case, the into argument just needs a character vector of names, not a column selection in tidyselect syntax, so it would be:

separate(
  paires,
  memo,
  sep = "_",
  into = c(
    'num_ref',
    'prenom',
    'nom',
    'njf',
    'cp',
    'ddn',
    'sexe',
    'adresse1',
    'adresse2',
    'localite',
    vars
  )
)
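A base-R counterpart to `separate()` that also reuses a name vector, as a hedged sketch with hypothetical strings. It splits on "_" and checks that every string produced exactly as many pieces as there are names. That check matters because the fields were joined with "_", so any value that itself contains an underscore would shift every following column.

```r
# Hypothetical pasted strings and target column names
memo <- c("r1_Ann_d11_t11", "r2_Bob_d21_t21")
into <- c("num_ref", "prenom", "date_test_1", "type_test_1")
parts <- strsplit(memo, "_", fixed = TRUE)
# Fail loudly if any value contained an extra (or missing) underscore
stopifnot(all(lengths(parts) == length(into)))
# rbind() the character vectors into a matrix, one row per string
final <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(final) <- into
final
```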

Thank you! I have the same issue when creating a data frame. If I write this:

df<- data.frame(id, prenom, nom, ddnR,  vars,Clas_SGRC)

it doesn't work because the row lengths differ: instead of my values, it takes the column names stored in vars. How can I deal with this?

If you are sure you are building the data frame from variables in the global environment, the function get() might help. We also need map() from purrr to do the loop.
I also used add_column() rather than bind_cols() here to control the position of the added columns.

data.frame(id, prenom, nom, ddnR, Clas_SGRC) %>% add_column(
  vars %>% map_df(~ data.frame(get(.x)) %>% `names<-`(.x)), .before = Clas_SGRC
)

Yes, my variables are values in the global environment. I don't know why, but I get this error:

Error:
! New columns must be compatible with `.data`.
x New columns have 14742 rows.
i `.data` has 351 rows.

Yet all my values in the global environment have 351 rows.

Sorry, my mistake: map_df() binds the pieces by row, which is why you got 42 × 351 = 14742 rows. We should use map_dfc() to bind them as columns:

data.frame(id, prenom, nom, ddnR, Clas_SGRC) %>% add_column(
  vars %>% map_dfc(~ data.frame(get(.x)) %>% `names<-`(.x)), .before = Clas_SGRC
)

:sweat_smile:

Error:
! Can't find columns ``, ``, ``, ``, ``, and 346 more in `.data`.

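Sorry again for my mistake, I didn't test the code very rigorously :face_with_hand_over_mouth:
Without quotes, .before = Clas_SGRC evaluates your global Clas_SGRC vector (351 values) instead of naming a column, hence the error. Try quoting it: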

data.frame(id, prenom, nom, ddnR, Clas_SGRC) %>% add_column(
  vars %>% map_dfc(~ data.frame(get(.x)) %>% `names<-`(.x)), .before = "Clas_SGRC"
)

Or, if Clas_SGRC is a data frame (so that data.frame() gives its columns their own names), replace .before = "Clas_SGRC" with .after = "ddnR".
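When the values live in the global environment, base R's `mget()` looks up a whole vector of names at once, which avoids the `get()`/`map_dfc()` loop. A minimal sketch with hypothetical objects; with real data, `vars` would be the 42-name vector built earlier, and all objects must have the same length.

```r
# Hypothetical objects standing in for the user's global values
id <- c(1, 2)
prenom <- c("Ann", "Bob")
date_test_1 <- c("d11", "d21")
type_test_1 <- c("t11", "t21")
Clas_SGRC <- c("A", "B")
vars <- c("date_test_1", "type_test_1")
# mget() returns a named list holding the value of every object named in vars
test_df <- as.data.frame(mget(vars, envir = environment()), stringsAsFactors = FALSE)
# Place the test columns between the identity columns and Clas_SGRC
df <- cbind(data.frame(id, prenom), test_df, data.frame(Clas_SGRC))
names(df)
```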
