Looping through a vector within the select function

Hi there,

I'm trying to make a loop where the iterative variable sits within the select() function (i.e. selecting one variable at a time to make a new table).

Here is my df:
age <- c("00to17", "18to24", "25to34")
m01 <- c(145, 163, 159)
m02 <- c(174, 178, 208)
m03 <- c(483, 573, 578)
df <- data.frame(age, m01, m02, m03)

I want to select each variable at a time to create three separate tables (1. age m01, 2. age m02, 3. age m03).

The code outside of a loop is:

df <- df %>%
select(age, m01)

How would I use a vector and loop within the select() function? The code below doesn't work.

vector <- c('m01', 'm02', 'm03')

for(i in seq_along(vector)){
df <- df %>%
select(age, i %in% vector[1:i])
vname <- paste0("table_", sep = ".", i)
assign(vname, table)
}

I would be really grateful for any suggestions.

Many thanks

I would do it like this.

library(rlang)
library(dplyr)
library(purrr)
age <- c("00to17", "18to24", "25to34")
m01 <- c(145, 163, 159)
m02 <- c(174, 178, 208)
m03 <- c(483, 573, 578)
df <- data.frame(age, m01, m02, m03)

MyFunc <- function(Nm, DF) {
  TableName <- paste("table", Nm, sep ="_")
  tmp <- select(DF, age, {{Nm}})
  assign(TableName, tmp, envir = .GlobalEnv)
}

Vec <- c("m01", "m02", "m03")
walk(Vec, MyFunc, DF = df)
1 Like

Hi @CM1

Here is something else that you could do with the use of the very useful list2env() function:

library(dplyr)

age <- c("00to17", "18to24", "25to34")
m01 <- c(145, 163, 159)
m02 <- c(174, 178, 208)
m03 <- c(483, 573, 578)
df <- data.frame(age, m01, m02, m03)

vector <- c('m01', 'm02', 'm03')
table_name <- paste0("table_", vector)
data_list <- lapply(vector, function(v){
  df[, c("age", v)]
  # select(df, age, {{v}}) # <- use this if you want to make use of dplyr::select() instead
})

names(data_list) <- table_name
list2env(x = data_list, envir = globalenv())
3 Likes

Yet another variant

library(tidyverse)

df <- data.frame(
  stringsAsFactors = FALSE,
               age = c("00to17", "18to24", "25to34"),
               m01 = c(145, 163, 159),
               m02 = c(174, 178, 208),
               m03 = c(483, 573, 578)
)

df %>% 
    pivot_longer(cols = -age, names_to = "variable", values_to = "value") %>% 
    nest_by(variable) %>%
    mutate(data = set_names(list(data), variable)) %>%
    list2env(x = .$data, envir = globalenv())
#> <environment: R_GlobalEnv>

Created on 2022-03-13 by the reprex package (v2.0.1)

2 Likes

This is brilliant! I haven't used a function before. Great to see how it works. Thanks so much

Thank you so much! I really like this solution and had not heard of the list2env function before. This works perfectly!

Thank you! It is really interesting to see the different ways of using the list2env() function. Really appreciate your help

You're welcome. data_list is not a function. I guess you mean the list2env() function?

Yes, sorry, I meant the list2env() function. Really helpful, thank you

You're very welcome!

Thanks so much again everyone. I've been trying to apply this to my work, but can't figure it out. I'm calculating direct age standardised rates according to the reprex and code below. This results in one table called ageadj_totals_40. What I need is to loop or use a function so that this process happens for each of the variables in "table", starting with mm_40, then mm_39... etc down to 0, resulting in 40 ageadj_totals_(iterative variable) tables. I'm selecting the variable (in this case "mm_40") when I define the "count" df. I think this should be doable, but I just can't figure it out. Any ideas very gratefully received!

library(epitools)
table <- data.frame(
  stringsAsFactors = FALSE,
      age_SIMD = c("00to17.01","18to24.01",
                   "25to34.01","35to44.01","45to54.01","55to64.01",
                   "65to74.01","75to84.01","85to94.01","95to00.01","00to17.02",
                   "18to24.02","25to34.02","35to44.02","45to54.02",
                   "55to64.02","65to74.02","75to84.02","85to94.02","95to00.02",
                   "00to17.03","18to24.03","25to34.03","35to44.03",
                   "45to54.03","55to64.03","65to74.03","75to84.03",
                   "85to94.03","95to00.03","00to17.04","18to24.04","25to34.04",
                   "35to44.04","45to54.04","55to64.04","65to74.04",
                   "75to84.04","85to94.04","95to00.04","00to17.05","18to24.05",
                   "25to34.05","35to44.05","45to54.05","55to64.05",
                   "65to74.05","75to84.05","85to94.05","95to00.05",
                   "00to17.06","18to24.06","25to34.06","35to44.06","45to54.06",
                   "55to64.06","65to74.06","75to84.06","85to94.06",
                   "95to00.06","00to17.07","18to24.07","25to34.07",
                   "35to44.07","45to54.07","55to64.07","65to74.07","75to84.07",
                   "85to94.07","95to00.07","00to17.08","18to24.08",
                   "25to34.08","35to44.08","45to54.08","55to64.08","65to74.08",
                   "75to84.08","85to94.08","95to00.08","00to17.09",
                   "18to24.09","25to34.09","35to44.09","45to54.09",
                   "55to64.09","65to74.09","75to84.09","85to94.09","95to00.09",
                   "00to17.10","18to24.10","25to34.10","35to44.10",
                   "45to54.10","55to64.10","65to74.10","75to84.10","85to94.10",
                   "95to00.10"),
         mm_40 = c(154,189,474,1115,2123,3895,
                   4169,3409,1185,93,163,181,582,1241,2193,3894,
                   4274,3376,1058,53,155,205,581,1324,2312,4183,
                   4700,3738,1446,208,298,305,924,1920,3180,5341,
                   6095,4478,1346,91,317,326,1108,2166,3548,5668,6145,
                   4403,1278,62,269,365,1178,2249,3493,5363,5557,
                   4113,1246,86,269,353,1327,2623,3781,5616,5824,
                   3906,1172,58,132,248,926,1946,2779,4087,4267,
                   3140,911,53,217,352,1223,2394,3476,4786,5084,3414,
                   983,46,156,283,1086,2275,3062,3594,3656,2282,
                   545,21),
         mm_39 = c(155,189,474,1118,2125,3890,
                   4169,3409,1185,93,165,181,585,1241,2193,3898,
                   4274,3376,1058,51,155,205,581,1324,2312,4183,
                   4700,3738,1441,208,298,305,954,1920,3180,5341,
                   6096,4475,1346,91,316,326,1106,2166,3546,5666,6145,
                   4403,1278,62,269,365,1178,2249,3493,5366,5557,
                   4116,1246,86,269,353,1327,2651,3781,5616,5826,
                   3906,1172,58,132,248,927,1949,2776,4087,4266,
                   3140,911,53,217,355,1223,2397,3479,4786,5084,3414,
                   983,46,158,286,1086,2278,3562,3594,3656,2286,
                   545,21),
         mm_38 = c(155,189,474,1118,2125,3890,
                   4169,3405,1185,93,165,181,586,1241,2193,3898,
                   4274,3376,1058,51,155,205,581,1324,2312,4183,
                   4706,3738,1441,208,258,305,924,1920,3180,5341,
                   6096,4478,1346,91,317,326,1158,2165,3548,5658,6145,
                   4403,1276,62,269,365,1178,2249,3493,5363,5557,
                   4113,1246,86,269,353,1527,2621,3781,5516,5824,
                   3906,1172,56,132,248,927,1949,2579,4087,4267,
                   3140,915,53,216,351,1223,2397,3478,4785,5084,3414,
                   983,46,158,283,1082,2276,3066,3596,3656,2286,
                   545,21),
             n = c(19287,8253,10836,15202,
                   15000,13123,7907,4679,1416,120,17660,8095,13256,
                   16172,14454,12781,7995,4686,1326,65,16796,7256,11602,
                   15251,14123,13055,8846,5345,1909,305,22684,9515,
                   13787,18506,17579,15696,10786,6186,1647,112,
                   21240,9220,15049,18024,16611,14903,10307,5895,1560,
                   76,22946,10096,15886,18946,16447,13967,9336,5490,
                   1556,104,21976,9841,15467,18684,16235,13676,9425,
                   5229,1463,89,15907,7649,12597,14218,11904,9952,
                   7020,4208,1124,61,17523,9096,12946,14586,13106,
                   10696,7847,4442,1188,71,15137,9593,13443,14100,
                   11373,7894,5585,2993,693,41)
   )

population <- table %>%
 separate("age_SIMD", c("age", "SIMD")) %>%
 select(age, SIMD, n) %>%
 pivot_wider(names_from = SIMD, values_from = n)

 x <- population
 x$age <- NULL
 standard <- rowMeans(x)

 count <- table %>%
 separate("age_SIMD", c("age", "SIMD")) %>%
 select(age, SIMD, mm_40) %>%
 pivot_wider(names_from = SIMD, values_from = mm_40)

 SIMD01<-ageadjust.direct(count$'01',population$'01',stdpop=standard)
 SIMD01 <- round(10^5*SIMD01,2)

SIMD02<-ageadjust.direct(count$'02',population$'02',stdpop=standard)
SIMD02 <- round(10^5*SIMD02,2)

SIMD03<-ageadjust.direct(count$'03',population$'03',stdpop=standard)
SIMD03 <- round(10^5*SIMD03,2)

SIMD04<-ageadjust.direct(count$'04',population$'04',stdpop=standard)
SIMD04 <- round(10^5*SIMD04,2)

SIMD05<-ageadjust.direct(count$'05',population$'05',stdpop=standard)
SIMD05 <- round(10^5*SIMD05,2)

SIMD06<-ageadjust.direct(count$'06',population$'06',stdpop=standard)
SIMD06 <- round(10^5*SIMD06,2)

SIMD07<-ageadjust.direct(count$'07',population$'07',stdpop=standard)
SIMD07 <- round(10^5*SIMD07,2)

SIMD08<-ageadjust.direct(count$'08',population$'08',stdpop=standard)
SIMD08 <- round(10^5*SIMD08,2)

SIMD09<-ageadjust.direct(count$'09',population$'09',stdpop=standard)
SIMD09 <- round(10^5*SIMD09,2)

SIMD10<-ageadjust.direct(count$'10',population$'10',stdpop=standard)
SIMD10 <- round(10^5*SIMD10,2)

ageadj_totals_40 <- data.frame(SIMD01, SIMD02, SIMD03, SIMD04, SIMD05, SIMD06, SIMD07, SIMD08,      SIMD09, SIMD10)
rownames(ageadj_totals_40) <- c("cruderate_40", "adjrate_40", "lci_40", "uci_40")

ageadj_totals_40 <- tibble::rownames_to_column(ageadj_totals_40, "VALUE")
ageadj_totals_40 <- ageadj_totals_40 %>%
pivot_longer(!VALUE, names_to = "SIMD", values_to = "count")

Unfortunately, the fact that your question is not reproducible makes it a bit difficult for us to provide any help. We have no information about the count and population datasets or the ageadjust.direct function. Helping you becomes easier if your question is reproducible. Take a look at this if you are not familiar with the term:

ageadj_totals_40 <- purrr::map_dfc(names(x),
              ~ {
                 ageadjust.direct(count[[.x]],
                                  population[[.x]],
                                  stdpop=standard) %>% `*`(10^5) %>%
                  round(2) %>% enframe(name=NULL,
                                       value=paste0("SIMD",.x))
               })


ageadj_totals_40$VALUE <- c("cruderate_40", "adjrate_40", "lci_40", "uci_40")
ageadj_totals_40 <- ageadj_totals_40 %>%
  pivot_longer(!VALUE, names_to = "SIMD", values_to = "count")

Thanks so much, yes I'll make it a clearer example next time!

Thanks so much, this is a really neat way to code this, with a massive reduction in code lines...

I am trying to perform this code for all the variables in the "counts" df - so this is the ageadj_totals_40 table, but I would like to make ageadj_totals_39, ageadj_totals_38... (the real df goes down to 1). Do you know how to loop through the mm_40 section of code below to allow that to happen (resulting in 40 tables, or ideally one long one)?

Many thanks, really appreciate your help

count <- table %>%
separate("age_SIMD", c("age", "SIMD")) %>%
select(age, SIMD, mm_40) %>%
pivot_wider(names_from = SIMD, values_from = mm_40)

Something like this?

library(epitools)
library(tidyverse)

table <- data.frame(
    stringsAsFactors = FALSE,
    age_SIMD = c("00to17.01","18to24.01",
                 "25to34.01","35to44.01","45to54.01","55to64.01",
                 "65to74.01","75to84.01","85to94.01","95to00.01","00to17.02",
                 "18to24.02","25to34.02","35to44.02","45to54.02",
                 "55to64.02","65to74.02","75to84.02","85to94.02","95to00.02",
                 "00to17.03","18to24.03","25to34.03","35to44.03",
                 "45to54.03","55to64.03","65to74.03","75to84.03",
                 "85to94.03","95to00.03","00to17.04","18to24.04","25to34.04",
                 "35to44.04","45to54.04","55to64.04","65to74.04",
                 "75to84.04","85to94.04","95to00.04","00to17.05","18to24.05",
                 "25to34.05","35to44.05","45to54.05","55to64.05",
                 "65to74.05","75to84.05","85to94.05","95to00.05",
                 "00to17.06","18to24.06","25to34.06","35to44.06","45to54.06",
                 "55to64.06","65to74.06","75to84.06","85to94.06",
                 "95to00.06","00to17.07","18to24.07","25to34.07",
                 "35to44.07","45to54.07","55to64.07","65to74.07","75to84.07",
                 "85to94.07","95to00.07","00to17.08","18to24.08",
                 "25to34.08","35to44.08","45to54.08","55to64.08","65to74.08",
                 "75to84.08","85to94.08","95to00.08","00to17.09",
                 "18to24.09","25to34.09","35to44.09","45to54.09",
                 "55to64.09","65to74.09","75to84.09","85to94.09","95to00.09",
                 "00to17.10","18to24.10","25to34.10","35to44.10",
                 "45to54.10","55to64.10","65to74.10","75to84.10","85to94.10",
                 "95to00.10"),
    mm_40 = c(154,189,474,1115,2123,3895,
              4169,3409,1185,93,163,181,582,1241,2193,3894,
              4274,3376,1058,53,155,205,581,1324,2312,4183,
              4700,3738,1446,208,298,305,924,1920,3180,5341,
              6095,4478,1346,91,317,326,1108,2166,3548,5668,6145,
              4403,1278,62,269,365,1178,2249,3493,5363,5557,
              4113,1246,86,269,353,1327,2623,3781,5616,5824,
              3906,1172,58,132,248,926,1946,2779,4087,4267,
              3140,911,53,217,352,1223,2394,3476,4786,5084,3414,
              983,46,156,283,1086,2275,3062,3594,3656,2282,
              545,21),
    mm_39 = c(155,189,474,1118,2125,3890,
              4169,3409,1185,93,165,181,585,1241,2193,3898,
              4274,3376,1058,51,155,205,581,1324,2312,4183,
              4700,3738,1441,208,298,305,954,1920,3180,5341,
              6096,4475,1346,91,316,326,1106,2166,3546,5666,6145,
              4403,1278,62,269,365,1178,2249,3493,5366,5557,
              4116,1246,86,269,353,1327,2651,3781,5616,5826,
              3906,1172,58,132,248,927,1949,2776,4087,4266,
              3140,911,53,217,355,1223,2397,3479,4786,5084,3414,
              983,46,158,286,1086,2278,3562,3594,3656,2286,
              545,21),
    mm_38 = c(155,189,474,1118,2125,3890,
              4169,3405,1185,93,165,181,586,1241,2193,3898,
              4274,3376,1058,51,155,205,581,1324,2312,4183,
              4706,3738,1441,208,258,305,924,1920,3180,5341,
              6096,4478,1346,91,317,326,1158,2165,3548,5658,6145,
              4403,1276,62,269,365,1178,2249,3493,5363,5557,
              4113,1246,86,269,353,1527,2621,3781,5516,5824,
              3906,1172,56,132,248,927,1949,2579,4087,4267,
              3140,915,53,216,351,1223,2397,3478,4785,5084,3414,
              983,46,158,283,1082,2276,3066,3596,3656,2286,
              545,21),
    n = c(19287,8253,10836,15202,
          15000,13123,7907,4679,1416,120,17660,8095,13256,
          16172,14454,12781,7995,4686,1326,65,16796,7256,11602,
          15251,14123,13055,8846,5345,1909,305,22684,9515,
          13787,18506,17579,15696,10786,6186,1647,112,
          21240,9220,15049,18024,16611,14903,10307,5895,1560,
          76,22946,10096,15886,18946,16447,13967,9336,5490,
          1556,104,21976,9841,15467,18684,16235,13676,9425,
          5229,1463,89,15907,7649,12597,14218,11904,9952,
          7020,4208,1124,61,17523,9096,12946,14586,13106,
          10696,7847,4442,1188,71,15137,9593,13443,14100,
          11373,7894,5585,2993,693,41)
)

table %>%
    separate("age_SIMD", c("age", "SIMD")) %>% 
    pivot_longer(starts_with("mm"), names_to = "mm", values_to = "count") %>% 
    group_by(age) %>% 
    mutate(standard = mean(n)) %>% 
    group_by(mm, SIMD) %>% 
    summarise(ageadj = list(round(ageadjust.direct(count, n, stdpop = standard)*10^5, 2))) %>%
    unnest_wider(ageadj)
#> `summarise()` has grouped output by 'mm'. You can override using the `.groups` argument.
#> # A tibble: 30 × 6
#> # Groups:   mm [3]
#>    mm    SIMD  crude.rate adj.rate    lci    uci
#>    <chr> <chr>      <dbl>    <dbl>  <dbl>  <dbl>
#>  1 mm_38 01        17535.   17261. 17001. 17525 
#>  2 mm_38 02        17642.   17569. 17306. 17836.
#>  3 mm_38 03        19953.   18037. 17778. 18298.
#>  4 mm_38 04        20549.   19622. 19374. 19873.
#>  5 mm_38 05        22198.   21475. 21209. 21743.
#>  6 mm_38 06        20840.   21377. 21106. 21649.
#>  7 mm_38 07        22327.   22775. 22494. 23060.
#>  8 mm_38 08        21617.   22085. 21766. 22408.
#>  9 mm_38 09        24018.   24567. 24243. 24895.
#> 10 mm_38 10        20988.   24341. 23972. 24716.
#> # … with 20 more rows

Created on 2022-03-15 by the reprex package (v2.0.1)

This is absolutely brilliant, thank you so much! I have been trying to work out a neat way of doing the for weeks... Very grateful.

The solution proposed by @andresrcs is quite complex. Here's my solution. It doesn't require any extra packages, pure R is enough. The code is much simpler, just take a look:

age <- c("00to17", "18to24", "25to34")
m01 <- c(145, 163, 159)
m02 <- c(174, 178, 208)
m03 <- c(483, 573, 578)
df <- data.frame(age, m01, m02, m03)

df

# regular expression to extract colnames of your interest:
my_cols <- grep(pattern = "^m", x = colnames(df), value = TRUE)

# collect all new data.frames as a list, to access them easily
lapply(my_cols, function(col) {
  df[, c("age", col)]  
})

Your solution doesn't seem to address the OP's request, can you elaborate on how this code is going to yield the requested output?

Have in mind that the OP has clarified their request beyond what has been stated on the first post,so you would have to read the complete thread to have a complete sense of what is required.

Hehe, You're right @andresrcs :wink: It's really bad style of asking for help, by the way. The output is not clear at any point in this thread, but as we can see You managed to figure out what the author had in her mind. Point for You :slight_smile: