How to select data from a nested list using a loop?

eugenio.alladio · March 11, 2020, 10:29am

I have the following nested list, which I obtained after importing several .xy files into my workspace.
I'd like to extract only the data_block data frames from it, so that I end up with a list containing just the data_block of each file.

example_nested_list

I wrote the following code, but I keep getting the same error:

new_dat <- lapply(1:length(myfiles), function(x) NULL)
for (i in 1:length(myfiles)) {
  for (j in i) new_dat[[i]] <- myfiles[[i]][["dataset"]][[j]][["data_block"]]
}

Error in myfiles[[i]][["dataset"]][[j]] : subscript out of bounds

I'm quite new to R, so I guess this is a very silly question. I'd be very grateful if anybody could help me with this issue.

hendrikvanb · March 11, 2020, 11:21am

I'd strongly suggest using the {purrr} package when working with nested lists in R. I suspect the error you are getting may relate to the absence of a "data_block" element within one or more of the "dataset" elements in your list. Without a reprex, I unfortunately cannot confirm whether this is indeed the case. The best I could do was to generate some fake data that appears similar in structure to what you posted (again, it's difficult to discern the structure of your data purely from your screenshot) and then show how one could extract the desired elements using {purrr}:

# load library
library(purrr)

# create some mock data containing elements with varying degrees of "completeness"
myfiles <- list(
  list(
    dataset = list(
      'data_block' = as.matrix(1:2),
      'metadata_block' = list()
    )
  ),
  list(
    dataset = list(
      'data_block' = as.matrix(1:2),
      'metadata_block' = list()
    )
  ),
  list(
    dataset = list(
      'metadata_block' = list()
    )
  ),
  list(
  )
)

# use purrr::map to iterate over the myfiles list: for each element in the list,
# try to extract the 'data_block' element (if it exists) from the 'dataset' element (if it exists)
purrr::map(myfiles, function(x) {
  purrr::pluck(x, 'dataset', 'data_block')
})
#> [[1]]
#>      [,1]
#> [1,]    1
#> [2,]    2
#> 
#> [[2]]
#>      [,1]
#> [1,]    1
#> [2,]    2
#> 
#> [[3]]
#> NULL
#> 
#> [[4]]
#> NULL
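
Because the extraction path depends on the exact nesting, it may help to inspect the list before writing the extraction. The following is a minimal sketch using base R's str() with max.level; `example_files` is a made-up stand-in, not the original poster's data, and its shape is only an assumption about what an imported list might look like.

```r
# Hypothetical stand-in for an imported list; not the original poster's data.
example_files <- list(
  list(dataset = list(list(data_block = matrix(1:4, ncol = 2)))),
  list(dataset = list(list(data_block = matrix(5:8, ncol = 2))))
)

# Show only the top three levels of nesting to see where 'data_block' sits.
str(example_files, max.level = 3)

# Count how many entries each 'dataset' element holds.
dataset_lengths <- vapply(
  example_files,
  function(x) length(x[["dataset"]]),
  integer(1)
)
dataset_lengths
```

If 'dataset' turns out to be a list of entries rather than a list that directly holds 'data_block', the extraction needs one extra indexing step.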

dromano · March 11, 2020, 11:36am

@eugenio.alladio: Could you post a sample of the table myfiles as text? You can apply dput() to it and paste the output here, between a pair of triple backticks, like this:

```
<--- paste output of dput(myfiles) here
```

nirgrahamuk · March 11, 2020, 12:17pm

I tried to find a generic approach that could work on an arbitrary list hierarchy without a consistent structure. I found this crude solution; I'm hoping someone here might be in a position to refine it...

library(tidyverse)

convoluted_list_example <- list(
  list(
    "a",
    list(
      1,
      2
    )
  ),
  list(
    list("data_block" = c(9, 9, 9)),
    "xyz"
  ),
  list(
    list("qwerty",
      "data_block" = c(8, 8, 8)
    ),
    c(7, 8, 9)
  )
)

str(convoluted_list_example)

results_list <- list()

add_x_to_results_list <- function(x) {
  if (is.list(x)) {
    if (!is.null(names(x))) {
      if ("data_block" %in% names(x)) {
        l <- length(results_list)
      }
      results_list[[l + 1]] <<- x[["data_block"]]
    }
  }
}

processlist <- function(x) {
  if (length(x) > 1 & !"data_block" %in% names(x)) {
    walk(x, processlist)
  } else {
    add_x_to_results_list(x)
  }
}

processlist(convoluted_list_example)

results_list
[[1]]
[1] 9 9 9

[[2]]
[1] 8 8 8

eugenio.alladio · March 11, 2020, 12:55pm

Dear @nirgrahamuk @dromano @hendrikvanb,
following your suggestions, I am pasting an example of the dataset here:

myfiles<-list(structure(list(dataset = structure(list(list(data_block = structure(c(0.00055, 
                                                                           0.001383333, 0.002216667, 0.00305, 0.003883333, 0.004716667, 
                                                                           0.00555, 0.006383333, 0.007216667, 0.00805, 0.008883333, 0.009716667, 
                                                                           0.01055, 0.011383333, 0.012216667, 0.01305, 0.013883333, 0.014716667, 
                                                                           0.01555, 0.016383333, 0.017216667, 0.01805, 0.018883333, 0.019716667, 
                                                                           0.02055, 0.021383333, 0.022216667, 0.02305, 0.023883333, 0.024716667, 
                                                                           0.02555, 0.026383333, 0.027216667, 0.02805, 37694, 37700, 37701, 
                                                                           37662, 37672, 37671, 37690, 37684, 37731, 37739, 37721, 37682, 
                                                                           37735, 37756, 37746, 37708, 37753, 37746, 37754, 37722, 37734, 
                                                                           37721, 37680, 37685, 37717, 37718, 37696, 37653, 37664, 37661, 
                                                                           37638, 37680, 37729, 37749), .Dim = c(34L, 2L), .Dimnames = list(
                                                                             NULL, c("V1", "V2"))), metadata_block = structure(list(key = character(0), 
                                                                                                                                    value = character(0)), class = "data.frame", row.names = integer(0)))), .Names = ""), 
                    metadata = structure(list(key = character(0), value = character(0)), class = "data.frame", row.names = integer(0))), format_name = "SPECS SpecsLab2 xy", class = "rxylib"), 
     structure(list(dataset = structure(list(list(data_block = structure(c(0.000117, 
                                                                           0.00095, 0.001783333, 0.002616667, 0.00345, 0.004283333, 
                                                                           0.005116667, 0.00595, 0.006783333, 0.007616667, 0.00845, 
                                                                           0.009283333, 0.010116667, 0.01095, 0.011783333, 0.012616667, 
                                                                           0.01345, 0.014283333, 0.015116667, 0.01595, 0.016783333, 
                                                                           0.017616667, 0.01845, 0.019283333, 0.020116667, 0.02095, 
                                                                           0.021783333, 0.022616667, 0.02345, 0.024283333, 0.025116667, 
                                                                           0.02595, 0.026783333, 0.027616667, 38494, 38502, 38463, 38426, 
                                                                           38411, 38436, 38424, 38360, 38296, 38264, 38216, 38191, 38205, 
                                                                           38212, 38194, 38128, 38127, 38122, 38127, 38104, 38066, 38005, 
                                                                           37926, 37865, 37915, 37941, 37910, 37877, 37875, 37902, 37868, 
                                                                           37832, 37875, 37837), .Dim = c(34L, 2L), .Dimnames = list(
                                                                             NULL, c("V1", "V2"))), metadata_block = structure(list(
                                                                               key = character(0), value = character(0)), class = "data.frame", row.names = integer(0)))), .Names = ""), 
                    metadata = structure(list(key = character(0), value = character(0)), class = "data.frame", row.names = integer(0))), format_name = "SPECS SpecsLab2 xy", class = "rxylib"))

Unfortunately, I could not solve the problem using your helpful suggestions. @hendrikvanb's approach gave me the following result:

[[1]]
NULL

[[2]]
NULL

while @nirgrahamuk's code produced the following error:

processlist(myfiles)

Error in results_list[[l + 1]] <<- x[["data_block"]] :
object 'l' not found

I hope that having the dataset available here makes it easier to help. Thank you very much for your help and willingness!

hendrikvanb · March 11, 2020, 1:02pm

@eugenio.alladio: you need to take the actual structure of your data into account! The code I suggested was based on the fake data I created. The structure you provided is different. I was able to make it work on my machine with a very simple adjustment:

purrr::map(myfiles, function(x) {
  purrr::pluck(x, 'dataset', 1, 'data_block')
})
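
To illustrate why the extra position matters, here is a small sketch on a hypothetical miniature of the posted structure (`one_file` is an assumed name and made-up data). It relies on purrr::pluck() returning NULL for a path that does not exist, which would explain the NULL results reported above.

```r
library(purrr)

# Hypothetical miniature of the posted structure: 'dataset' is a list of
# entries, and each entry holds its own 'data_block'.
one_file <- list(dataset = list(list(data_block = matrix(1:4, ncol = 2))))

# Without the position, pluck() looks for 'data_block' directly inside
# 'dataset', finds nothing, and returns NULL instead of raising an error.
without_index <- pluck(one_file, "dataset", "data_block")

# With the position, pluck() first steps into the first entry of 'dataset'.
with_index <- pluck(one_file, "dataset", 1, "data_block")

is.null(without_index)
dim(with_index)
```

The silent NULL is convenient for ragged data, but it can also hide a wrong path, so checking the structure first is worthwhile.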

nirgrahamuk · March 11, 2020, 1:11pm

myfiles<-list(structure(list(dataset = structure(list(list(data_block = structure(c(0.00055, 
                                                                                    0.001383333, 0.002216667, 0.00305, 0.003883333, 0.004716667, 
                                                                                    0.00555, 0.006383333, 0.007216667, 0.00805, 0.008883333, 0.009716667, 
                                                                                    0.01055, 0.011383333, 0.012216667, 0.01305, 0.013883333, 0.014716667, 
                                                                                    0.01555, 0.016383333, 0.017216667, 0.01805, 0.018883333, 0.019716667, 
                                                                                    0.02055, 0.021383333, 0.022216667, 0.02305, 0.023883333, 0.024716667, 
                                                                                    0.02555, 0.026383333, 0.027216667, 0.02805, 37694, 37700, 37701, 
                                                                                    37662, 37672, 37671, 37690, 37684, 37731, 37739, 37721, 37682, 
                                                                                    37735, 37756, 37746, 37708, 37753, 37746, 37754, 37722, 37734, 
                                                                                    37721, 37680, 37685, 37717, 37718, 37696, 37653, 37664, 37661, 
                                                                                    37638, 37680, 37729, 37749), .Dim = c(34L, 2L), .Dimnames = list(
                                                                                      NULL, c("V1", "V2"))), metadata_block = structure(list(key = character(0), 
                                                                                                                                             value = character(0)), class = "data.frame", row.names = integer(0)))), .Names = ""), 
                             metadata = structure(list(key = character(0), value = character(0)), class = "data.frame", row.names = integer(0))), format_name = "SPECS SpecsLab2 xy", class = "rxylib"), 
              structure(list(dataset = structure(list(list(data_block = structure(c(0.000117, 
                                                                                    0.00095, 0.001783333, 0.002616667, 0.00345, 0.004283333, 
                                                                                    0.005116667, 0.00595, 0.006783333, 0.007616667, 0.00845, 
                                                                                    0.009283333, 0.010116667, 0.01095, 0.011783333, 0.012616667, 
                                                                                    0.01345, 0.014283333, 0.015116667, 0.01595, 0.016783333, 
                                                                                    0.017616667, 0.01845, 0.019283333, 0.020116667, 0.02095, 
                                                                                    0.021783333, 0.022616667, 0.02345, 0.024283333, 0.025116667, 
                                                                                    0.02595, 0.026783333, 0.027616667, 38494, 38502, 38463, 38426, 
                                                                                    38411, 38436, 38424, 38360, 38296, 38264, 38216, 38191, 38205, 
                                                                                    38212, 38194, 38128, 38127, 38122, 38127, 38104, 38066, 38005, 
                                                                                    37926, 37865, 37915, 37941, 37910, 37877, 37875, 37902, 37868, 
                                                                                    37832, 37875, 37837), .Dim = c(34L, 2L), .Dimnames = list(
                                                                                      NULL, c("V1", "V2"))), metadata_block = structure(list(
                                                                                        key = character(0), value = character(0)), class = "data.frame", row.names = integer(0)))), .Names = ""), 
                             metadata = structure(list(key = character(0), value = character(0)), class = "data.frame", row.names = integer(0))), format_name = "SPECS SpecsLab2 xy", class = "rxylib"))


results_list <- list()

add_x_to_results_list <- function(x) {
  if (is.list(x)) {
    if (!is.null(names(x))) {
      print("list has names")
      print(names(x))
      if ("data_block" %in% names(x)) {
        print("found a block")
        l <- length(results_list)
        results_list[[l + 1]] <<- x[["data_block"]]
      }
     
    }
  }
  else {
    print("not a list")
  }
}

processlist <- function(x) {
  if (length(x) > 1 & !"data_block" %in% names(x)) {
    print("walking x")
    purrr::walk(x, processlist)
  } else {
    print("add x")
    print(x)
    add_x_to_results_list(x)
    if(is.list(x))
      purrr::walk(x, processlist)
  }
}
# debug(add_x_to_results_list)
processlist(myfiles)

results_list
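
One possible refinement, offered only as a sketch, is to return the collected blocks from the recursion instead of writing to a global variable with `<<-`. The helper name `collect_data_blocks` and the test list `nested_example` are hypothetical and are not part of any package or of the posts above.

```r
# Recursively collect every element named 'data_block' from a nested list.
# collect_data_blocks is a hypothetical helper name, not part of any package.
collect_data_blocks <- function(x) {
  # Anything that is not a list (vectors, matrices, strings) cannot hold one.
  if (!is.list(x)) {
    return(list())
  }
  found <- list()
  element_names <- names(x)
  for (k in seq_along(x)) {
    is_block <- !is.null(element_names) &&
      identical(element_names[[k]], "data_block")
    if (is_block) {
      # Keep the block itself and do not descend into it.
      found <- c(found, list(x[[k]]))
    } else {
      # Otherwise search deeper and append whatever is found there.
      found <- c(found, collect_data_blocks(x[[k]]))
    }
  }
  found
}

# Made-up input mixing unnamed lists, named lists, and plain vectors.
nested_example <- list(
  list("a", list(1, 2)),
  list(list(data_block = c(9, 9, 9)), "xyz"),
  list(dataset = list(list(data_block = c(8, 8, 8), metadata_block = list())))
)

blocks <- collect_data_blocks(nested_example)
length(blocks)
```

Because the function returns its result, it can be called repeatedly without resetting any state between runs.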

nwerth · March 11, 2020, 1:36pm

This should work for your case:

new_dat <- lapply(myfiles, function(x) x[["dataset"]][[1]][["data_block"]])

As for the error you came across, let's walk through your code:

new_dat <- lapply(1:length(myfiles), function(x) NULL)
new_dat
# [[1]]
# NULL
#
# [[2]]
# NULL

So new_dat is just a list where every element is a NULL. Next comes the loop, which we'll test by just assigning i and j directly.

# for (i in 1:length(myfiles))
i <- 1
# for (j in i)
j <- i[1]

Allow me to pause here and point something out: i will always be a single integer, so for (j in i) runs exactly once and is equivalent to j <- i. This might be different if i were a list, and that might've been your intent. For that, you'd need to do something like for (i in myfiles). But R is a functional language, which means we can do cool stuff with functions. That's why I suggest using lapply.
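
The point about the inner loop can be checked with a small sketch. The names `visited`, `toy_files`, `one_file`, and `ids` are hypothetical and used only for illustration.

```r
# A for loop over a single number runs exactly once, with j equal to i.
i <- 2
visited <- numeric(0)
for (j in i) {
  visited <- c(visited, j)
}
visited

# Iterating over the list itself hands each element to the loop body directly.
# 'toy_files' is a hypothetical two-element list used only for illustration.
toy_files <- list(list(id = "first"), list(id = "second"))
ids <- character(0)
for (one_file in toy_files) {
  ids <- c(ids, one_file[["id"]])
}
ids
```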

Moving on to the assignment statement, everything works in the first iteration:

new_dat[[i]] <- myfiles[[i]][["dataset"]][[j]][["data_block"]]
new_dat
# [[1]]
#                V1    V2
#  [1,] 0.000550000 37694
#  [2,] 0.001383333 37700
#  [3,] 0.002216667 37701
#  [4,] 0.003050000 37662
#  [5,] 0.003883333 37672
#  ...

But things break down in the next iteration.

i <- 2
j <- i
new_dat[[i]] <- myfiles[[i]][["dataset"]][[j]][["data_block"]]
# Error in myfiles[[i]][["dataset"]][[j]] : subscript out of bounds

The problem is that the "dataset" element of myfiles[[i]] is a list of length 1, so asking for its second element with [[j]] raises an error.
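
If some files did contain more than one entry under "dataset", a nested loop could still be used, provided the inner index runs over the entries that actually exist. The sketch below assumes a made-up input `toy_files` in which the second file has two entries; it is an illustration, not the original data.

```r
# Hypothetical input: the first file holds one dataset entry, the second two.
toy_files <- list(
  list(dataset = list(list(data_block = matrix(1:4, ncol = 2)))),
  list(dataset = list(
    list(data_block = matrix(5:8, ncol = 2)),
    list(data_block = matrix(9:12, ncol = 2))
  ))
)

all_blocks <- vector("list", length(toy_files))
for (i in seq_along(toy_files)) {
  entries <- toy_files[[i]][["dataset"]]
  # The inner index runs over the entries that actually exist in this file,
  # so it can never exceed the length of 'dataset'.
  all_blocks[[i]] <- vector("list", length(entries))
  for (j in seq_along(entries)) {
    all_blocks[[i]][[j]] <- entries[[j]][["data_block"]]
  }
}

# Number of data blocks collected per file.
lengths(all_blocks)
```

Using seq_along() instead of 1:length() also avoids a stray iteration when a list is empty.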

eugenio.alladio · March 11, 2020, 1:53pm

Dear @nwerth,
thank you sincerely for your help, you totally solved my issue!
Thanks a lot also for the whole clear explanation! I'm sorry for the silly question!
