Subsetting a dataframe based on elements in a list and storing the results (subsetted dataframes) in a list

Hello! :smiley:

# I have got this dataframe:
a <- c(10,20,30,40)
b <- c(15,25,35,45)
c <- c(20,30,40,50)
df <- data.frame(a,b,c)

df
#    a  b  c
# 1 10 15 20
# 2 20 25 30
# 3 30 35 40
# 4 40 45 50

# And this list:
v1 <- c(1,2)
v2 <- c(2,4)
v3 <- c(3,4)
list <- list(v1,v2,v3)

list
# [[1]]
# [1] 1 2

# [[2]]
# [1] 2 4

# [[3]]
# [1] 3 4 

I would like to subset my dataframe based on the elements in the list. Each element of the list is a vector of row indices into the dataframe, and I would like to store the resulting subsetted dataframes in a new list.

Here is my solution: 
df1 <- df[list[[1]],]
df2 <- df[list[[2]],]
df3 <- df[list[[3]],]

df_list <- list(df1,df2,df3)

df_list
#   [[1]]
#   a  b  c
# 1 10 15 20
# 2 20 25 30

#   [[2]]
#   a  b  c
# 2 20 25 30
# 4 40 45 50

#  [[3]]
#   a  b  c
# 3 30 35 40
# 4 40 45 50

Unfortunately, my real list and dataframe are very large, so I would like to handle all components of the list in a single line of code. Something like this:

df_list <- df[list[[1:3]], ]

but this doesn't work.

Is there someone here who could help me?

Hi @Mad_Madru,
I am almost ashamed to post a solution here that relies on a for() loop:

# I have got this dataframe:
a <- c(10,20,30,40)
b <- c(15,25,35,45)
c <- c(20,30,40,50)
df <- data.frame(a,b,c)
df
#>    a  b  c
#> 1 10 15 20
#> 2 20 25 30
#> 3 30 35 40
#> 4 40 45 50

# And this list:
v1 <- c(1,2)
v2 <- c(2,4)
v3 <- c(3,4)
list1 <- list(v1,v2,v3)
list1
#> [[1]]
#> [1] 1 2
#> 
#> [[2]]
#> [1] 2 4
#> 
#> [[3]]
#> [1] 3 4

# How to extract a list of subsetted dataframes based on row range in list1?

out_lst <- vector(mode="list", length=length(list1))

for(ii in seq_along(list1)) {
  sub_df <- df[unlist(list1[ii])[1] : unlist(list1[ii])[2], ]
  out_lst[[ii]] <- sub_df
}

out_lst
#> [[1]]
#>    a  b  c
#> 1 10 15 20
#> 2 20 25 30
#> 
#> [[2]]
#>    a  b  c
#> 2 20 25 30
#> 3 30 35 40
#> 4 40 45 50
#> 
#> [[3]]
#>    a  b  c
#> 3 30 35 40
#> 4 40 45 50

Created on 2021-08-17 by the reprex package (v2.0.1)

However, I'm sure my indiscretion will prompt another forum member to supply a suitable solution using purrr :wink:

Hello @DavoWW,

Thank you for the quick reply. That looks good, but something went wrong with the second dataframe in the output list. Your result is:

#> [[2]]
#>    a  b  c
#> 2 20 25 30
#> 3 30 35 40
#> 4 40 45 50

But it should be:

#   [[2]]
#   a  b  c
# 2 20 25 30
# 4 40 45 50

I think it's because of your range definition. For the second component of the list, v2 <- c(2,4), your code also picks index 3, but it should pick only index 2 and index 4.
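To make the difference concrete, here is a small sketch (the names idx, by_range and by_index are only illustrative and not from the posts above). It contrasts expanding the two numbers into a range with using them directly as row indices:

```r
# Same example dataframe as in the question
df <- data.frame(a = c(10, 20, 30, 40),
                 b = c(15, 25, 35, 45),
                 c = c(20, 30, 40, 50))

idx <- c(2, 4)

# The colon operator builds every integer between the two endpoints,
# so row 3 is included as well
by_range <- df[idx[1]:idx[2], ]

# Passing the vector itself selects exactly the listed rows
by_index <- df[idx, ]

rownames(by_range)
rownames(by_index)
```

The two only agree when the indices happen to be consecutive, which is why the first and third components of the list looked correct.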

Another approach:

> df = data.frame(a=seq(10,40,10), b=seq(15,45,10), c=seq(20,50,10))
> lst = list(c(1,2), c(2,4), c(3,4))
> lapply(lst, \(x) df[x,])
[[1]]
   a  b  c
1 10 15 20
2 20 25 30

[[2]]
   a  b  c
2 20 25 30
4 40 45 50

[[3]]
   a  b  c
3 30 35 40
4 40 45 50

Hope this helps.
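As a side note on why the one-liner attempted in the question fails: as far as I understand it, double brackets with an index vector longer than one are treated as recursive indexing (element 1, then element 2 inside it, and so on) rather than as "components 1 to 3". The sketch below illustrates that under this assumption; res and subsets are just illustrative names.

```r
df <- data.frame(a = c(10, 20, 30, 40),
                 b = c(15, 25, 35, 45),
                 c = c(20, 30, 40, 50))
lst <- list(c(1, 2), c(2, 4), c(3, 4))

# lst[[1:3]] tries to drill down level by level and fails here,
# so the error is caught and its message kept as a string
res <- tryCatch(lst[[1:3]], error = function(e) conditionMessage(e))
res

# Single brackets keep the result a list of several components,
# and lapply() then applies the row subsetting to each of them
subsets <- lapply(lst[1:3], function(x) df[x, ])
length(subsets)
```

So the iteration over the list has to come from something like lapply(), not from the indexing operator itself.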

OK, this does it, but @Yarnabrina's solution is much neater.

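out_lst <- vector(mode="list", length=length(list1))

# list1[[ii]] is the vector of row indices itself, so the rows are
# picked individually instead of being expanded into a range
for(ii in seq_along(list1)) {
  out_lst[[ii]] <- df[list1[[ii]], ]
}

out_lst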

Hello @Yarnabrina,

Are you sure this:

lapply(lst, \(x) df[x,])

is right?

There is something missing or wrong... R says "unexpected token".

Pretty sure. If you are on the latest version of R, it should work with no issues (the \(x) shorthand needs R 4.1.0 or later).

\(x) df[x,]

and

function(x) df[x,]

are equivalent.
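If you want to check this yourself, here is a minimal sketch. It assumes the shorthand is only available from R 4.1.0, so that part is parsed from a string and guarded by a version check to keep the script loadable on older installations; res_function and res_lambda are illustrative names.

```r
df <- data.frame(a = c(10, 20, 30, 40),
                 b = c(15, 25, 35, 45),
                 c = c(20, 30, 40, 50))
lst <- list(c(1, 2), c(2, 4), c(3, 4))

# Portable spelling that parses on any R version
res_function <- lapply(lst, function(x) df[x, ])

# The backslash shorthand is a syntax error before R 4.1.0, so it is
# parsed from text only when the running version supports it
if (getRversion() >= "4.1.0") {
  res_lambda <- eval(parse(text = "lapply(lst, \\(x) df[x, ])"))
  print(identical(res_function, res_lambda))
}
```

On an older R the guarded part is simply skipped, and the function(x) version gives the same list of dataframes.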

This works perfectly with my data. Thank you. :upside_down_face:

Also,

library(purrr)  # provides map() and the %>% pipe

list1 %>%
  map(\(x) df[x,])

I have an older version of R (4.0.3).

lapply(list1, \(x) df[x,])

doesn't run. However, this

lapply(list1, function(x) df[x,])

works fine.

Thank you! :smiley:
