Using dplyr::rowwise() operations on multiple nested dataframes without apply or for loops

Hello,

I have a question about using the rowwise() function as updated for dplyr 1.0 (see "dplyr 1.0.0: working within rows").

This is awesome and exactly what I need, but is there a way to apply rowwise() operations when I am dealing with a dataframe nested inside a dataframe containing yet another dataframe?

I put together a reproducible example to illustrate:

library(dplyr) #need dplyr 1.0
library(tidyverse)
df <- tibble(
  student_id = 1:4, 
  test1 = 10:13, 
  test2 = 20:23, 
  test3 = 30:33, 
  test4 = 40:43,
  nest(
    tibble(
      nested_student_id = 1:4,
      nested_test1 = 10:13,
      nested_test2 = 20:23,
      nested_test3 = 30:33
    ),
    data = everything()  # name the list-column explicitly
  )
)
# make rowwise
df <- df %>% rowwise(data)
# now for each nested dataframe
for (i in 1:nrow(df)){
  # make a change to one to be able to tell the difference in values
  df$data[[i]]$nested_test1 <- df$data[[i]]$nested_test1+i
  # now make rowwise
  df$data[[i]] <- df$data[[i]] %>% rowwise()
}

Now I could perform rowwise() operations like so:

df %>% mutate(avg = mean(data$nested_test1, na.rm = T))
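As a minimal sketch of why this works: on a rowwise data frame, mutate() pulls each list-column element out with [[, so `data` refers to a single nested tibble and `data$x` is an ordinary vector. The tibble `toy` and its columns `id`, `data` and `x` are hypothetical and not part of the example above.

```r
library(dplyr)

# Hypothetical toy data: one small tibble per row in the list-column `data`
toy <- tibble(
  id   = 1:2,
  data = list(tibble(x = 1:3), tibble(x = 4:6))
)

# Under rowwise(), `data` is the nested tibble for the current row,
# so `data$x` is a plain numeric vector
toy_avg <- toy %>%
  rowwise() %>%
  mutate(avg = mean(data$x, na.rm = TRUE)) %>%
  ungroup()

toy_avg$avg  # 2 5
```

Each level of rowwise() unwraps only one layer of list-column, which matters once a second layer is added.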

But let's say there is one additional "layer" of dataframes hidden inside data:

test_df <- df %>% nest(data = -student_id)

Now, does the same type of operation become impossible with rowwise()?

test_df %>% mutate(avg = mean(data$data$nested_test1, na.rm = T))

I am able to get to the result, but I need to index through the elements:

test_df$data[[1]] %>% mutate(avg = mean(data$nested_test1, na.rm = T))

I could wrap it in a for loop and use it to index through the different elements, but that defeats the purpose of using rowwise() to avoid this type of solution.

Is there a way to use rowwise() operations this way?

Is anyone able to figure out what code would be able to give me the new column of data$avg averaging the column data$data$nested_test1?
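One possible approach, offered as a sketch rather than a definitive answer, is to nest one rowwise mutate() inside another so that each level unwraps one layer of list-column. The sample data below is a cut-down, hypothetical stand-in for the example above (only `student_id`, `test1` and `nested_test1` are kept), and `result` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Cut-down stand-in for the example: each student gets a nested tibble
# whose nested_test1 values are shifted by student_id
df <- tibble(student_id = 1:4, test1 = 10:13) %>%
  rowwise() %>%
  mutate(data = list(tibble(nested_test1 = 10:13 + student_id))) %>%
  ungroup()

# Add the extra layer: `data` now holds a tibble that itself contains `data`
test_df <- df %>% nest(data = -student_id)

result <- test_df %>%
  rowwise() %>%
  mutate(data = list(
    # Outer rowwise(): `data` here is the middle tibble for this student
    data %>%
      rowwise() %>%
      # Inner rowwise(): `data` is now the innermost tibble
      mutate(avg = mean(data$nested_test1, na.rm = TRUE)) %>%
      ungroup()
  )) %>%
  ungroup()

result$data[[1]]$avg  # mean of 11:14, i.e. 12.5
```

The outer mutate() has to wrap its result in list() because it replaces a list-column. Whether this scales well to a large real dataset is untested here.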

Sorry about having kept na.rm = T in all the code when it is unnecessary here; it's there because my real data needs it.

I would very much appreciate any guidance on getting this type of operation to work without for loops or apply family functions.

For the context of the real dataset that I am using, see this tweet: https://twitter.com/Esclaponr/status/1264240029557157888

Thank you for your help!
