Extract names of a dataset with purrr::map2() or purrr::imap()

simRock · September 27, 2022, 4:09am

I am trying to use purrr::map2() or purrr::imap() to find, in a large list of datasets, the datasets that contain a given variable. Essentially, I will loop through the list of datasets and print only the names of the datasets that have the variable of interest. When I do it with purrr::map(), the dataset name comes back as ".x[[i]]". Any help would be greatly appreciated. Thank you.

#load packages
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

#create fictitious datasets
df1 <- tibble(score_a=1:20,
              sex_a=rep(c("M", "F"), 10))
df2 <- tibble(score_b=1:20,
              sex_b=rep(c("M", "F"), 10))
df3 <- tibble(score_c=1:20,
              sex_c=rep(c("M", "F"), 10))

#create a function that returns the dataset that
#contains a given variable
get_dataset_name <- function(data, contains){
  
  data_var_names <- colnames(data) 

  dataname <- deparse(substitute(data))
  
  if(contains %in% data_var_names){
    return(dataname)
  }
}

#testing the function
get_dataset_name(data=df3, contains="score_c")
#> [1] "df3"


#creating a list of the all datasets
data_list <- list(df1, df2, df3)

#looping through a list of the dataset to find the dataset
#that includes the given variable

map(data_list, get_dataset_name, contains="score_c")
#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> [1] ".x[[i]]"

#I was hoping to obtain "df3" instead of ".x[[i]]"
#I read that purrr::map2() or purrr::imap() could solve
#the issue but I am not sure how to set it up
#Any help would be appreciated



# map2(.x=data_list, 
#      .y=names(data_list), 
#      ~get_dataset_name(data=.x, contains="score_c"),
#      nest(.x, name=.y)
# )


#imap(data_list, get_dataset_name, contains="score_c")

FactOREO · September 27, 2022, 7:30am

Hello,

My solution does not use functional programming, because of the main issue in your problem description:

# works, because you directly input df3 as argument for data
get_dataset_name(df3, 'score_c')
#> [1] "df3"
# your problem: this is how map() (or lapply()) passes each element to the function
get_dataset_name(data_list[[3]], 'score_c')
#> [1] "data_list[[3]]"

^{Created on 2022-09-27 with reprex v2.0.2}
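To make the cause concrete, here is a minimal sketch of what deparse(substitute()) actually captures. The helper show_expr() and the small data frames are hypothetical and only for illustration. The point is that substitute() returns the expression the caller wrote, not the name of the object, so anything that calls the function through an indexing expression loses the original name.

```r
# Hypothetical helper: returns the caller's argument expression as text
show_expr <- function(data) deparse(substitute(data))

# Small stand-in data (assumed for illustration)
df3 <- data.frame(score_c = 1:3)
data_list <- list(df1 = data.frame(score_a = 1:3), df3 = df3)

# Called with a bare symbol, the symbol itself is captured
direct <- show_expr(df3)
direct
#> [1] "df3"

# Called with an indexing expression, that expression is captured
indexed <- show_expr(data_list[[2]])
indexed
#> [1] "data_list[[2]]"

# Inside lapply() (or map()), the captured text is whatever expression
# the looping function builds internally, never "df1" or "df3"
looped <- unlist(lapply(data_list, show_expr))
```

This is why the name has to come from names(data_list) rather than from the function argument.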

Instead you could modify your function like this

is_var_here <- function(data, contains){
 
  data_var_names <- names(data) 

  contains %in% data_var_names
}

and then loop over your (named) list:

# assumes a named list, e.g. data_list <- list(df1 = df1, df2 = df2, df3 = df3)
result <- character(length(data_list))
for (l in seq_along(data_list)) {
  if (is_var_here(data_list[[l]], 'score_c')) {
    result[[l]] <- names(data_list)[l]
  } else {
    result[[l]] <- NA_character_
  }
}

result
#> [1] NA    NA    "df3"

If the data.frames inside the list are not named (that is, if names(data_list) is NULL), this will not work. But if they are, it is a feasible way.
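For comparison, the same idea can probably be written with purrr::imap_chr(), which is what the original question asked about. This is only a sketch: it assumes the list is named, and it uses small base data.frames as stand-ins for the tibbles in the question. imap_chr() passes each element as the first argument and its name as the second.

```r
library(purrr)

# Stand-in data (assumed for illustration), stored in a *named* list
df1 <- data.frame(score_a = 1:20)
df2 <- data.frame(score_b = 1:20)
df3 <- data.frame(score_c = 1:20)
data_list <- list(df1 = df1, df2 = df2, df3 = df3)

# For each element, return its name if it has the column, NA otherwise
result <- imap_chr(data_list, function(data, name) {
  if ("score_c" %in% names(data)) name else NA_character_
})

# Drop the NAs to keep only the matching dataset names
matches <- unname(result[!is.na(result)])
matches
#> [1] "df3"
```

As with the loop, the dataset names come from the names of the list, so an unnamed list would not give a useful result.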

Kind regards

nirgrahamuk · September 27, 2022, 12:01pm

Are you able to name the datasets within the list, so that the result is equivalent to
data_list <- list(df1=df1, df2=df2, df3=df3)
rather than data_list <- list(df1, df2, df3)?

If so, you could do:

get_dataset_name <- function(data, contains){
  contains %in% colnames(data) 
}

map(data_list, ~{get_dataset_name(.x,
                                  contains="score_c")}) |> 
  Filter(f = \(x)isTRUE(x)) |> names()

simRock · September 27, 2022, 2:53pm

Thank you, @nirgrahamuk. I can name the list as you suggested, but when I tried to run the code it gave me an error. What is the |> or the \(x) symbol? Is this a pipe operator? The code does not seem to run as is.

nirgrahamuk · September 27, 2022, 3:07pm

I'm using the native pipe operator and the anonymous function shorthand introduced in R 4.1.
If you are using an older version of R, use the magrittr pipe %>% instead of |> and function(x) in place of \(x).
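Spelled out for an older R, the pipeline might look like the sketch below. It assumes the magrittr package is installed (it comes along with purrr) and uses small stand-in data frames in a named list.

```r
library(purrr)
library(magrittr)  # provides %>% for R versions before 4.1

# Stand-in data (assumed for illustration), stored in a named list
data_list <- list(
  df1 = data.frame(score_a = 1:3),
  df2 = data.frame(score_b = 1:3),
  df3 = data.frame(score_c = 1:3)
)

# TRUE if the dataset has the column, FALSE otherwise
get_dataset_name <- function(data, contains) {
  contains %in% colnames(data)
}

# Same steps as above, with %>% and function(x) instead of |> and \(x)
found <- map(data_list, function(d) get_dataset_name(d, contains = "score_c")) %>%
  Filter(f = function(x) isTRUE(x)) %>%
  names()

found
#> [1] "df3"
```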

simRock · September 27, 2022, 3:47pm

Thank you very much. It works now. Just out of curiosity, is there a tidyverse equivalent of Filter(f = function(x) isTRUE(x))? I did not know about Filter() from base R, and I am not sure what it does.

nirgrahamuk · September 27, 2022, 3:49pm

It applies a filter to a list or vector, keeping the elements for which the function returns TRUE, similarly to how dplyr::filter works on data.frames.
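On the tidyverse side, purrr::keep() appears to be the closest counterpart: like Filter(), it retains the elements for which a predicate returns TRUE. The sketch below uses small stand-in data frames in a named list and applies the column check directly as the predicate.

```r
library(purrr)

# Stand-in data (assumed for illustration), stored in a named list
data_list <- list(
  df1 = data.frame(score_a = 1:3),
  df2 = data.frame(score_b = 1:3),
  df3 = data.frame(score_c = 1:3)
)

# Predicate: does this dataset have a column called "score_c"?
has_score_c <- function(d) "score_c" %in% colnames(d)

# purrr: keep the datasets that satisfy the predicate, then read their names
kept_purrr <- names(keep(data_list, has_score_c))

# base R: the same selection with Filter()
kept_base <- names(Filter(has_score_c, data_list))

kept_purrr
#> [1] "df3"
```

Both calls return the matching list elements with their names preserved, so names() gives the dataset names in one step.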

simRock · September 27, 2022, 4:10pm

Thank you very much @nirgrahamuk

simRock · September 27, 2022, 4:11pm

Thank you very much @FactOREO
