Using paste0 to name a new df inside a function

jkdby · July 28, 2021, 1:36pm

Hello R community! I am working with a survey that has 37 items. I wrote a function that computes a mean for each item over three days and returns the data frame I need for a subsequent statistical test (not included here). It works well, but I do not want to copy and paste it 37 times. So I tried adding paste0 to my function, hoping that I could ultimately call it with a list of the 37 items and have it create the 37 data frames I need. However, as soon as I add the line with paste0, I get an error. Any ideas about how to get past this?

my_tdm <- 
  function(x) {
    paste0(x,"_tinydf") <<-  # note: this is the new line that breaks the function
    feelings_revcode_c %>% 
    select(record_id,day,x) %>% 
    pivot_wider(names_from = "day", values_from = x) %>% 
    mutate(tdm = (one+two+three)/3)
  }

my_tdm("achy") # note: achy is the name of an item in my survey. This call works when I remove the line with `paste0`; with that line included, it yields this error:

Error in paste0(x, "_tinydf") <<- (feelings_revcode_c %>% select(record_id,  : 
  object 'x' not found

Please see the data here:
structure(list(record_id = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "11", "13", "14", "15",
"16", "17", "19", "20", "21", "22", "23", "24", "25", "26", "27",
"29", "30", "31", "32", "34", "36", "37", "38", "39", "41", "42",
"43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53",
"54", "55", "56", "57", "59", "60", "62", "63", "64", "65", "66",
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
"91", "93", "94", "96", "97", "98", "99", "100", "101", "103",
"104", "105", "106", "107", "108", "109", "110", "111", "112",
"113", "114", "115", "116", "117", "118", "121", "122", "123",
"124", "125", "126", "127", "128", "129", "130", "131", "132",
"133", "134", "136", "137", "138", "139", "141", "142", "143",
"144", "145", "146", "147", "148", "149", "150", "151", "152",
"153", "154", "156", "158", "159", "160", "161", "162", "163",
"164", "165", "166", "167", "168", "169", "170", "171", "172"
), class = "factor"), day = structure(c(1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("baseline",
"one", "two", "three"), class = "factor"), achy = c(2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), angry = c(2L, 2L, 5L, 1L, 1L, 2L, 3L, 2L, 1L, 4L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L), annoy = c(5L, 5L, 5L, 1L, 1L,
2L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L)), row.names = c(NA,
-20L), groups = structure(list(day = structure(1:4, .Label = c("baseline",
"one", "two", "three"), class = "factor"), .rows = structure(list(
c(1L, 5L, 9L, 13L, 17L), c(2L, 6L, 10L, 14L, 18L), c(3L,
7L, 11L, 15L, 19L), c(4L, 8L, 12L, 16L, 20L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))

nirgrahamuk · July 28, 2021, 1:55pm


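my_tdm <-
  function(x) {
    # x is a single column name given as a string, e.g. "achy"
    result <- feelings_revcode_c %>%
      select(record_id, day, all_of(x)) %>%
      pivot_wider(names_from = "day", values_from = all_of(x)) %>%
      mutate(tdm = (one + two + three) / 3)
    # store a copy in the global environment under e.g. "achy_tinydf"
    assign(
      x = paste0(x, "_tinydf"),
      value = result, 
      envir = .GlobalEnv
    )
    # also return the result so the caller can use it directly
    result
  }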
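
For context on why the original line fails, as far as I understand it: R reads `paste0(x, "_tinydf") <<- value` as a call to a replacement function (`paste0<-`) applied to `x`, and `<<-` looks `x` up in the enclosing environment rather than inside the function, hence "object 'x' not found". `assign()` takes the name as a string instead. Below is a minimal base-R sketch of that idea; `make_named()`, the `"_demo"` suffix, and the `scratch` environment are made up for illustration and are not part of the original function.

```r
# Hypothetical helper (not from the thread): stores `value` under a
# name built at run time, in the environment given by `envir`.
make_named <- function(prefix, value, envir = .GlobalEnv) {
  new_name <- paste0(prefix, "_demo")   # e.g. "achy_demo"
  assign(new_name, value, envir = envir)
  invisible(new_name)                   # return the name that was used
}

# Use a scratch environment here so the global workspace stays clean.
scratch <- new.env()
make_named("achy", data.frame(a = 1:3), envir = scratch)

exists("achy_demo", envir = scratch, inherits = FALSE)
get("achy_demo", envir = scratch)
```

Passing an explicit environment is only a convenience for the demo; with the default `envir = .GlobalEnv` it behaves like the `assign()` call above.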

arthur.t · July 28, 2021, 2:02pm

When you say "item", what column in the data frame does that correspond to?

jkdby · July 28, 2021, 2:18pm

The addition of the assign function makes it work, thank you!

But I am not sure why passing a list of names to the function doesn't give me the output for several items. Do you know?

Although I tried it with a list of 37 character strings, for the reprex here the equivalent would be:

my_feeling_names <- as.list("achy", "angry", "annoy")

my_tdm(my_feeling_names)

But then I get this error:

Error: Must subset columns with a valid subscript vector.
x Subscript has the wrong type `list`.
ℹ It must be numeric or character.
Run `rlang::last_error()` to see where the error occurred.
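
Two things seem to be going on here. First, `as.list()` converts only its first argument, so `as.list("achy", "angry", "annoy")` is a list of length one. Second, `select()` wants a character vector of column names rather than a list, and `my_tdm()` handles one name per call. A minimal base-R sketch of the usual pattern follows; `describe_item()` is a hypothetical stand-in for `my_tdm()` so the example runs without the survey data.

```r
# as.list() ignores the extra arguments; c() builds the character vector.
length(as.list("achy", "angry", "annoy"))  # 1
my_feeling_names <- c("achy", "angry", "annoy")

# Hypothetical stand-in for my_tdm(): takes ONE column name as a string.
describe_item <- function(x) paste0(x, "_tinydf")

# Call the function once per name; the result is a list named by item.
results <- lapply(setNames(my_feeling_names, my_feeling_names), describe_item)
names(results)
results[["angry"]]
```

With the real function the call would presumably be `lapply(my_feeling_names, my_tdm)`, which runs `my_tdm()` once per item.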

jkdby · July 28, 2021, 2:21pm

@arthur.t
Items correspond to columns in my reprex, so those would be "achy", "angry", and "annoy".

arthur.t · July 28, 2021, 3:48pm

Does this return the result you want?

I would recommend against writing functions that modify objects that exist outside the function. Instead, I recommend finding a way to do the calculation with pivot, group_by, summarize functions.

library(tidyverse)

df <- structure(list(record_id = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "11", "13", "14", "15",
"16", "17", "19", "20", "21", "22", "23", "24", "25", "26", "27",
"29", "30", "31", "32", "34", "36", "37", "38", "39", "41", "42",
"43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53",
"54", "55", "56", "57", "59", "60", "62", "63", "64", "65", "66",
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
"91", "93", "94", "96", "97", "98", "99", "100", "101", "103",
"104", "105", "106", "107", "108", "109", "110", "111", "112",
"113", "114", "115", "116", "117", "118", "121", "122", "123",
"124", "125", "126", "127", "128", "129", "130", "131", "132",
"133", "134", "136", "137", "138", "139", "141", "142", "143",
"144", "145", "146", "147", "148", "149", "150", "151", "152",
"153", "154", "156", "158", "159", "160", "161", "162", "163",
"164", "165", "166", "167", "168", "169", "170", "171", "172"
), class = "factor"), day = structure(c(1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("baseline",
"one", "two", "three"), class = "factor"), achy = c(2L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), angry = c(2L, 2L, 5L, 1L, 1L, 2L, 3L, 2L, 1L, 4L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L), annoy = c(5L, 5L, 5L, 1L, 1L,
2L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L)), row.names = c(NA,
-20L), groups = structure(list(day = structure(1:4, .Label = c("baseline",
"one", "two", "three"), class = "factor"), .rows = structure(list(
c(1L, 5L, 9L, 13L, 17L), c(2L, 6L, 10L, 14L, 18L), c(3L,
7L, 11L, 15L, 19L), c(4L, 8L, 12L, 16L, 20L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))

df %>% 
  pivot_longer(-c(record_id, day)) %>%
  group_by(day, name) %>%
  summarize_at(vars(value), lst(mean, min, max, sd))
#> # A tibble: 12 x 6
#> # Groups:   day [4]
#>    day      name   mean   min   max    sd
#>    <fct>    <chr> <dbl> <int> <int> <dbl>
#>  1 baseline achy    1.6     1     3 0.894
#>  2 baseline angry   1.4     1     2 0.548
#>  3 baseline annoy   2       1     5 1.73 
#>  4 one      achy    1       1     1 0    
#>  5 one      angry   2       1     4 1.22 
#>  6 one      annoy   2.4     1     5 1.67 
#>  7 two      achy    1       1     1 0    
#>  8 two      angry   2.2     1     5 1.79 
#>  9 two      annoy   2.4     1     5 1.67 
#> 10 three    achy    1       1     1 0    
#> 11 three    angry   1.4     1     2 0.548
#> 12 three    annoy   1.2     1     2 0.447

Created on 2021-07-28 by the reprex package (v1.0.0)
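
The same pivot, group_by, summarize idea can probably be adapted to the three-day mean per respondent and item, without creating one data frame per item. The sketch below assumes one row per respondent and day; the `toy` data and the names `item`, `tdm_long` are made up for illustration, and `tdm` simply mirrors the column name used in the original function.

```r
library(dplyr)
library(tidyr)

# Toy data (made up for this sketch): 2 respondents, 4 days, 2 items.
toy <- tibble(
  record_id = rep(c("1", "2"), each = 4),
  day       = rep(c("baseline", "one", "two", "three"), times = 2),
  achy      = c(2, 1, 1, 1, 1, 1, 1, 1),
  angry     = c(2, 2, 5, 2, 1, 2, 3, 1)
)

tdm_long <- toy %>%
  filter(day != "baseline") %>%                        # keep days one to three
  pivot_longer(-c(record_id, day), names_to = "item") %>%
  group_by(record_id, item) %>%
  summarise(tdm = mean(value), .groups = "drop")       # three-day mean

tdm_long
```

The result has one row per respondent and item, so a later test can filter on `item` instead of looking up 37 separate objects.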

jkdby · July 28, 2021, 6:06pm

@nirgrahamuk I've also tried an alternative approach, using apply instead of passing a list of names, but I have had no luck with that either.

apply(feelings_revcode_c[,3:5],2, my_tdm)

Error message:

Error: Problem with `mutate()` input `tdm`.
x object 'one' not found
ℹ Input `tdm` is `(one + two + three)/3`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Values in `achy` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(achy = list)` to suppress this warning.
* Use `values_fn = list(achy = length)` to identify where the duplicates arise
* Use `values_fn = list(achy = summary_fun)` to summarise duplicates 
2: Values in `day` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(day = list)` to suppress this warning.
* Use `values_fn = list(day = length)` to identify where the duplicates arise
* Use `values_fn = list(day = summary_fun)` to summarise duplicates 
3: Values in `record_id` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(record_id = list)` to suppress this warning.
* Use `values_fn = list(record_id = length)` to identify where the duplicates arise
* Use `values_fn = list(record_id = summary_fun)` to summarise duplicates
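
As far as I can tell, the `apply()` call fails for a related reason: `apply(df, 2, f)` hands `f` the values of each column (after converting the data frame to a matrix), not the column name, so `my_tdm()` never receives the item name it expects. A small base-R sketch with a made-up data frame and a hypothetical `show_arg()` function:

```r
# Made-up data frame for illustration.
toy <- data.frame(achy = c(2L, 1L, 1L), angry = c(2L, 2L, 5L))

# Hypothetical function that just reports what it was given.
show_arg <- function(x) paste(x, collapse = ",")

# apply() passes the column VALUES to the function ...
by_values <- apply(toy, 2, show_arg)
by_values

# ... whereas iterating over names() passes each column NAME.
by_names <- sapply(names(toy), show_arg)
by_names
```

Iterating over the column names, for example `lapply(names(feelings_revcode_c)[3:5], my_tdm)`, would therefore be closer to what the function was written for.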
