Applying a computation to sub-groups in a data frame?

Hey there,

I've got a beginner question about applying computations to data frames. I'm having some trouble articulating what I want to do, and so trouble finding answers by searching. I was hoping someone here could lend me a hand. Basically, I've got some (example) data that looks like this:

  ID Utilisation Class    NWCRT    NBCRT    NACRT Seed Total
1  1    0.128117     1 0.224152 0.139388 0.180320    0   0.3
2  2    0.070674     1 0.272605 0.077059 0.140413    0   0.3
3  3    0.010957     1 0.198594 0.012644 0.051620    0   0.3
4  4    0.050528     1 0.260191 0.055417 0.103483    0   0.3

What I want to do is transform it into a new frame with three columns, holding the mean jitter for each combination of total and class:

Jitter  Total  Class
   0.5    0.1      1
   0.4    0.1      2

I've created a handy function to compute the jitter for me, which I've defined here:

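library(dplyr)

# Mean jitter (NWCRT - NBCRT) over the rows of d that belong to class c
meanJitterByClass <- function(d, c) {
    z <- filter(d, Class == c)
    jitter <- c()
    for (r in seq_len(nrow(z))) {
        # Use the filtered rows z, not the full frame d
        jitter <- c(jitter, z$NWCRT[r] - z$NBCRT[r])
    }
    return(mean(jitter))
}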

Basically, I just call meanJitterByClass(filter(df, Total == 0.1), 1) and obtain the mean jitter for a particular class (here, class 1) out of the rows that have a particular total. But I would like to get this in a data frame with a mean jitter for each combination of class and total (so I can make a grouped barplot).
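As a sketch of the "each combination" step without a hand-written double loop, one could list the (Total, Class) pairs that actually occur with unique() and call the helper once per pair with mapply(). This assumes a base-R variant of the helper (same name and arguments, but vectorised subsetting instead of filter() and a loop), and uses only the four sample rows above as data, so the result has a single row; combos is a hypothetical name.

```r
# The four sample rows from the question (all Class 1, Total 0.3)
df <- data.frame(
  ID          = 1:4,
  Utilisation = c(0.128117, 0.070674, 0.010957, 0.050528),
  Class       = c(1, 1, 1, 1),
  NWCRT       = c(0.224152, 0.272605, 0.198594, 0.260191),
  NBCRT       = c(0.139388, 0.077059, 0.012644, 0.055417),
  NACRT       = c(0.180320, 0.140413, 0.051620, 0.103483),
  Seed        = c(0, 0, 0, 0),
  Total       = c(0.3, 0.3, 0.3, 0.3)
)

# Variant of the helper: mean of NWCRT - NBCRT over the rows of d with Class == c
meanJitterByClass <- function(d, c) {
  z <- d[d$Class == c, ]
  mean(z$NWCRT - z$NBCRT)
}

# Every (Total, Class) pair that occurs in the data, so no combination is empty
combos <- unique(df[, c("Total", "Class")])

# For each pair: keep the rows with that Total, then let the helper pick the Class
combos$Jitter <- mapply(
  function(tot, cls) meanJitterByClass(df[df$Total == tot, ], cls),
  combos$Total, combos$Class
)

print(combos[, c("Jitter", "Total", "Class")])
```

Comparing df$Total == tot is safe here only because tot is taken from the same column; comparing against a computed value such as i * 0.10 is a different matter.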

I've tried using a custom double for loop to iterate over all utilisations, then iterate over all classes, and append a row to a data frame. But it's not quite working, and I feel like there's a much more elegant way to do it in R.

I'd appreciate any help I can get!

Cheers,

~ Micrium

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:

Of course, here is a fully self-contained example (the data is fetched from a GitHub Gist):

library(dplyr)
library(lattice)

# Function to compute the mean jitter (NWCRT - NBCRT) for one class
meanJitterByClass <- function(d, c) {
    z <- filter(d, Class == c)
    jitter <- c()
    for (r in seq_len(nrow(z))) {
        # Use the filtered rows z, not the full frame d
        jitter <- c(jitter, z$NWCRT[r] - z$NBCRT[r])
    }
    return(mean(jitter))
}

# Data presentation mode
# Major groups (columns) are things like your Utilisation, or Chain Length
# Minor groups have a row label, like "Class" being "0", "1", or "2"

# Prior to reshape, we're going to format our data as follows:
# VALUES (i.e. jitter) | GROUP (i.e. Total) | SUBGROUP (i.e. Class)
# The CSV has no header row, so read it with header = FALSE and name the columns
raw <- read.csv(url("https://gist.githubusercontent.com/Micrified/cd3c00bbf8429e5701f0af31b54ed109/raw/c01e7a0a35dc6bceb7b14d6ed38df8210256a7c9/data.csv"),
                header = FALSE)
colnames(raw) <- c("ID", "Utilisation", "Class", "NWCRT", "NBCRT", "NACRT", "Seed", "Total")
df <- NULL

# For each group (major): the group column is called "Total" in the data,
# with possible values 0.1, 0.2, ..., 0.9
for (i in 1:9) {
    # near() compares with a tolerance; an exact Total == (i*0.10) misses e.g. 0.3
    subset <- filter(raw, near(Total, i * 0.10))
    # Three subgroups (minor): the subgroup column is called "Class", with values 0, 1, 2
    for (cl in 0:2) {
        df <- rbind(df, data.frame(Jitter = meanJitterByClass(subset, cl),
                                   Total = i * 0.10, Class = cl))
    }
}

# Not elegant?

# We can modify our data for a base barplot using reshape (??)
# base <- reshape()
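In a loop written with Total == (i*0.10) and 1:nrow(z), NA values can come from two places, which this small sketch isolates. First, i * 0.10 is not always bit-for-bit equal to the value parsed from text (assuming the file stores Total as decimal text such as 0.3), so the filter can silently match no rows. Second, for an empty frame 1:nrow(z) is 1:0, so the loop body still runs and indexes a missing row. The names totals, stored, exact and approx are illustrative only.

```r
totals <- (1:9) * 0.10                                     # what the loop computes: i * 0.10
stored <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)  # the values as written in the data

# Exact equality fails wherever the product rounds differently from the literal
exact <- totals == stored
print(which(!exact))   # 3 6 7

# Tolerance-based comparison; dplyr::near() does this with tol = sqrt(.Machine$double.eps)
approx <- abs(totals - stored) < sqrt(.Machine$double.eps)
print(all(approx))     # TRUE

# With zero matching rows, 1:nrow(z) is 1:0, i.e. c(1, 0), so the loop still runs once
# on a missing row (giving NA); seq_len(nrow(z)) is empty and the loop is skipped
print(1:0)
print(seq_len(0))
```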

Is this what you are trying to accomplish?

library(dplyr, warn.conflicts = FALSE)
raw <- read.csv(url("https://gist.githubusercontent.com/Micrified/cd3c00bbf8429e5701f0af31b54ed109/raw/c01e7a0a35dc6bceb7b14d6ed38df8210256a7c9/data.csv"),
                header = FALSE)
colnames(raw) <- c("ID", "Utilisation", "Class", "NWCRT", "NBCRT", "NACRT", "Seed", "Total")
STATS <- raw %>% mutate(jitter = NWCRT - NBCRT) %>% 
  group_by(Class, Total) %>% 
  summarize(MeanJitter = mean(jitter))
#> `summarise()` regrouping output by 'Class' (override with `.groups` argument)
STATS
#> # A tibble: 27 x 3
#> # Groups:   Class [3]
#>    Class Total MeanJitter
#>    <dbl> <dbl>      <dbl>
#>  1     0   0.1     0.180 
#>  2     0   0.2     0.281 
#>  3     0   0.3     0.355 
#>  4     0   0.4     0.508 
#>  5     0   0.5     0.618 
#>  6     0   0.6     0.780 
#>  7     0   0.7     0.880 
#>  8     0   0.8     1.12  
#>  9     0   0.9     1.24  
#> 10     1   0.1     0.0966
#> # ... with 17 more rows

Created on 2021-03-23 by the reprex package (v0.3.0)
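For the grouped barplot itself, one base-R route is to spread a long summary like STATS into a Class-by-Total matrix with tapply() and pass it to barplot() with beside = TRUE. This is a sketch under stated assumptions: plotGroupedJitter is a hypothetical helper name, and the two rows below are copied by hand from the printed output (Total = 0.1) so the example runs offline.

```r
# Spread a long summary (one row per Class/Total pair) into a Class-by-Total
# matrix, then draw one group of bars per Total with one bar per Class
plotGroupedJitter <- function(stats) {
  m <- tapply(stats$MeanJitter,
              list(Class = stats$Class, Total = stats$Total),
              mean)
  barplot(m, beside = TRUE, legend.text = TRUE,
          xlab = "Total", ylab = "Mean jitter")
  invisible(m)
}

# Two rows copied by hand from the STATS output above
STATS <- data.frame(
  Class      = c(0, 1),
  Total      = c(0.1, 0.1),
  MeanJitter = c(0.180, 0.0966)
)

m <- plotGroupedJitter(STATS)
print(m)
```

With the full 27-row STATS from the answer, the same call should give nine groups of three bars, with the legend labelled by class.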


Yes! This solved my problem perfectly. Thank you so much for helping me out!
