Group rows by row name

dg.aragones · July 5, 2021, 11:14am

Hello all,

I am working with a large data frame. What I want is to simplify it by averaging rows that are closer by their row names. I will use a simple example:

r = 0.1 #Radius for averaging

data = data.frame(x1=c(0,60,0,76,7,0,11,0,87,0), 
           x2=c(4,33,0,116,0,0,48,0,63,0),
           x3=c(15,40,0,124,0,0,51,0,83,0), 
           row.names=c("705.35","705.4","726.4","726.45","726.5","727.4",
                       "727.45","757.4","757.45","757.5"))

Defining a given radius r, I would like to obtain a new data frame with rows averaged, grouping (in this example) rows c(1,2), c(3,4,5), c(6,7), c(8,9,10). I would like also the row names to be the mean of the previous ones.

Any help would be much appreciated, particularly using base R.

Thanks in advance for any help!

pieterjanvc · July 5, 2021, 12:07pm

Hi,

Here is a way of doing this

library(dplyr)
library(tibble)

#The data
r = 0.1 
data = data.frame(x1=c(0,60,0,76,7,0,11,0,87,0), 
                  x2=c(4,33,0,116,0,0,48,0,63,0),
                  x3=c(15,40,0,124,0,0,51,0,83,0), 
                  row.names=c("705.35","705.4","726.4","726.45","726.5","727.4",
                              "727.45","757.4","757.45","757.5"))

#Get the rowname values
data$rownames = as.numeric(rownames(data))
rowvals = data$rownames 

#Loop over the rowname vals to create groups
val = rowvals[1]
curGroup = 1
group = rep(1, length(rowvals))
for(i in 2:length(rowvals)){

  if(round(rowvals[i] - val, 2) > r){
    curGroup = curGroup + 1
    val = rowvals[i]
  } 
  
  group[i] = curGroup
}

#Add the group into to the data
data$groups = group
data
#>        x1  x2  x3 rownames groups
#> 705.35  0   4  15   705.35      1
#> 705.4  60  33  40   705.40      1
#> 726.4   0   0   0   726.40      2
#> 726.45 76 116 124   726.45      2
#> 726.5   7   0   0   726.50      2
#> 727.4   0   0   0   727.40      3
#> 727.45 11  48  51   727.45      3
#> 757.4   0   0   0   757.40      4
#> 757.45 87  63  83   757.45      4
#> 757.5   0   0   0   757.50      4

#Summarise by group and get all mean values 
data = data %>% group_by(groups) %>% 
  summarise(across(.fns = mean), .groups = "drop") %>% 
  column_to_rownames("rownames")

data
#>         groups       x1       x2       x3
#> 705.375      1 30.00000 18.50000 27.50000
#> 726.45       2 27.66667 38.66667 41.33333
#> 727.425      3  5.50000 24.00000 25.50000
#> 757.45       4 29.00000 21.00000 27.66667

^{Created on 2021-07-05 by the reprex package (v2.0.0)}

Hope this helps,
PJ

dg.aragones · July 5, 2021, 2:17pm

Yes, it helps a lot!

Thanks @pieterjanvc for the solution!

system · July 12, 2021, 2:17pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.