Extracting only numeric values from list of lists

GreyMerchant · March 9, 2022, 9:19am

Hello!

I am looking for the easiest way to strip my nested list of all non-numeric values, or to return only those lists (df) and components that are numeric. I don't mind it being flattened, but I want it in a format I can work with as numerics. I am sure there is some clever way of doing this with purrr, but I'm not entirely sure how best to approach it.

library(tidyverse)
library(purrr)



df <- list(a = list(a1 = list(1,2,3), 
                    b1 = list("a","b","c")),
           b = list(1,2,3,4,5),
           c = list(a1 = list(1,2,3), 
                    b1 = list("a","b","c"),
                    c1 = list("a","b", c3 = c(1,2,3))
           )
)

df2 <- df %>% purrr::flatten()

lapply(df2, function(x){
  is.numeric(x)
})

df3 <- 
  purrr::keep(df2,is.numeric)

df3

pieterjanvc · March 9, 2022, 12:28pm

Hi there,

Here is a way of doing this using the flatten() function from purrr:

library(tidyverse)

#Data
df <- list(a = list(a1 = list(1,2,3), 
                    b1 = list("a","b","c")),
           b = list(1,2,3,4,5),
           c = list(a1 = list(1,2,3), 
                    b1 = list("a","b","c"),
                    c1 = list("a","b", c3 = c(1,2,3))
           )
)

#Flatten until only one dimension (but keep type)
while(any(lengths(df) > 1)){
  df = flatten(df)
}

#Only keep numeric values
numVals = df[sapply(df, is.numeric)] %>% unlist()

numVals
#>  [1] 1 2 3 1 2 3 4 5 1 2 3 1 2 3

Created on 2022-03-09 by the reprex package (v2.0.1)
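For comparison, the same flatten-and-filter step can be sketched in base R with rapply(). This is only an illustration, not the approach used above: rapply() applies a function to leaves whose class matches `classes`, and with how = "unlist" the non-matching leaves get the default (NULL) and disappear. The name `num_vals` is just a placeholder.

```r
# Same example data as in the thread
df <- list(a = list(a1 = list(1, 2, 3),
                    b1 = list("a", "b", "c")),
           b = list(1, 2, 3, 4, 5),
           c = list(a1 = list(1, 2, 3),
                    b1 = list("a", "b", "c"),
                    c1 = list("a", "b", c3 = c(1, 2, 3))))

# Apply identity to numeric/integer leaves only; every other leaf gets the
# default (NULL) and is dropped when the result is unlisted
num_vals <- rapply(df, function(x) x,
                   classes = c("numeric", "integer"),
                   how = "unlist")

# The result carries path-like names; drop them to see just the values
unname(num_vals)
#> [1] 1 2 3 1 2 3 4 5 1 2 3 1 2 3
```

The names on `num_vals` hint at where each value came from, but the nesting itself is gone, just as with the flatten() loop.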

Hope this helps,
PJ

GreyMerchant · March 9, 2022, 12:30pm

Hi there,

This is definitely a start, but I would like to preserve some of the complex list structure. So I would like either to take out all the non-numeric bits, or to return only the parts of the structure that still contain numbers (lists with numbers, numeric vectors, or data frames with numeric columns).

pieterjanvc · March 9, 2022, 1:21pm

Hi,

Your original post suggested otherwise.

Anyway, I spent too much time trying to get this, but I wanted to see it through and found a solution that preserves the structure

#Data
df <- list(a = list(a1 = list(1,2,3), 
                    b1 = list("a","b","c")),
           b = list(1,2,3,4,5),
           c = list(a1 = list(1,2,3), 
                    b1 = list("a","b","c"),
                    c1 = list("a","b", c3 = c(1,2,3))
           )
)

#Recursive function that checks for numeric values
myFun = function(x){
  #inherits() returns a single TRUE/FALSE even when x has several classes
  if(inherits(x, "list")){
    
    #Go to next level if more dimensions
    y = lapply(x, myFun)
    
    #Ignore any NULL returns
    y = y[sapply(y, function(z){
      length(z) > 0
      })]
    
    return(y)
    
  } else {
    #Check for the logic
    if(all(is.numeric(x))){
      return(x)
    }
  }
}

myFun(df)
#> $a
#> $a$a1
#> $a$a1[[1]]
#> [1] 1
#> 
#> $a$a1[[2]]
#> [1] 2
#> 
#> $a$a1[[3]]
#> [1] 3
#> 
#> 
#> 
#> $b
#> $b[[1]]
#> [1] 1
#> 
#> $b[[2]]
#> [1] 2
#> 
#> $b[[3]]
#> [1] 3
#> 
#> $b[[4]]
#> [1] 4
#> 
#> $b[[5]]
#> [1] 5
#> 
#> 
#> $c
#> $c$a1
#> $c$a1[[1]]
#> [1] 1
#> 
#> $c$a1[[2]]
#> [1] 2
#> 
#> $c$a1[[3]]
#> [1] 3
#> 
#> 
#> $c$c1
#> $c$c1$c3
#> [1] 1 2 3

Created on 2022-03-09 by the reprex package (v2.0.1)
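Since data frames were also mentioned earlier in the thread, here is a hedged sketch of how the same recursive idea might be extended to them. It is an assumption about what is wanted (keep only the numeric columns of any data frame), and `keep_numeric` and `nested` are made-up names for illustration.

```r
keep_numeric <- function(x) {
  if (is.data.frame(x)) {
    # Assumption: for data frames, keep only the numeric columns
    x[vapply(x, is.numeric, logical(1))]
  } else if (is.list(x)) {
    # Recurse into the list, then drop branches that ended up empty
    Filter(length, lapply(x, keep_numeric))
  } else if (is.numeric(x)) {
    x
  } else {
    NULL
  }
}

# Small made-up example with a nested list and a data frame
nested <- list(
  a = list(a1 = list(1, 2), b1 = list("a", "b")),
  d = data.frame(n = c(1.5, 2.5), s = c("x", "y"))
)

str(keep_numeric(nested))
```

The data frame check has to come first because a data frame is itself a list, so is.list() would otherwise send it down the recursive branch column by column.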

Is this what you are looking for?

PJ

xvalda · March 9, 2022, 1:31pm

Hi @GreyMerchant

I can propose a shorter alternative, but it only indicates the structure; it doesn't preserve it.
@pieterjanvc 's solution is best for preserving the full structure.

df %>% unlist() %>% enframe() %>% filter(str_detect(value, "\\d+"))
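One thing to keep in mind with this kind of approach: unlist() on a list that mixes numbers and strings coerces everything to character, and a digit regex also matches strings such as "a1". A minimal base R sketch of one way to get back to real numerics, using a made-up list (`mixed`, `flat` and `nums` are placeholder names):

```r
mixed <- list(a = list(1, "x"), b = list("a1", 2.5))

# unlist() coerces to the most general type, here character
flat <- unlist(mixed)
is.character(flat)
#> [1] TRUE

# Strings that are not numbers become NA (the coercion warning is expected)
nums <- suppressWarnings(as.numeric(flat))
names(nums) <- names(flat)  # as.numeric() drops names, so restore them

# Keep only the entries that converted cleanly; "a1" is excluded
nums[!is.na(nums)]
```

Note that this cannot tell a number apart from a string that merely looks like one (for example "1" stored as character), so it is only suitable when that distinction does not matter.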

nirgrahamuk · March 9, 2022, 1:39pm

Pieter beat me to a solution.
Mine was

library(tidyverse)
library(purrr)



df <- list(
  a = list(
    a1 = list(1, 2, 3),
    b1 = list("a", "b", "c")
  ),
  b = list(1, 2, 3, 4, 5),
  c = list(
    a1 = list(1, 2, 3),
    b1 = list("a", "b", "c"),
    c1 = list("a", "b", c3 = c(1, 2, 3))
  )
)

test_and_do <- function(x) {
  if (is.list(x)) {
    process_list(x)
  } else if (is.numeric(x)) {
    return(x)
  } else {
    return(NA)
  }
}
process_list <- function(x) {
  sublist_results <- map(
    x,
    test_and_do
  )
  discard(sublist_results, function(x) {
    all(is.na(x))
  })
}

(df2 <- test_and_do(df))
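A possible caveat with using NA as the "drop me" marker is that a genuine numeric NA in the data would be discarded as well. If that matters, a variant along these lines might help. It is only a sketch, assuming purrr is installed, with NULL as the marker and purrr::compact() removing the empty results; `keep_num` is a made-up name.

```r
library(purrr)

keep_num <- function(x) {
  if (is.list(x)) {
    # Recurse, then remove NULLs and empty sublists
    compact(map(x, keep_num))
  } else if (is.numeric(x)) {
    x
  } else {
    NULL  # non-numeric leaves are marked for removal
  }
}

# A real missing value survives, the string does not
keep_num(list(a = NA_real_, b = "x", c = list("y", 2)))
```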

GreyMerchant · March 9, 2022, 2:16pm

Thank you Pieter! You're right, I did mention that flattened wouldn't be a problem. That would work for most cases, but there could be a point at which it leads to errors in this specific problem.

Thanks for the recursive function to solve this problem! This should help me get these similar enough for comparison with waldo.
