Applying a computation to sub-groups in a data frame?

Micrium · March 23, 2021, 12:19pm

Hey there,

I've got a beginner question about applying computations to data frames. I'm having trouble articulating what I want to do, which makes it hard to search for answers, so I was hoping someone here could lend me a hand. Basically, I've got some (example) data that looks like this:

  ID Utilisation Class    NWCRT    NBCRT    NACRT Seed Total
1  1    0.128117     1 0.224152 0.139388 0.180320    0   0.3
2  2    0.070674     1 0.272605 0.077059 0.140413    0   0.3
3  3    0.010957     1 0.198594 0.012644 0.051620    0   0.3
4  4    0.050528     1 0.260191 0.055417 0.103483    0   0.3

What I want to do is transform it into a new frame with three columns, where I've computed a jitter value:

Jitter Total Class
   0.5   0.1     1
   0.4   0.1     2

I've created a handy function to compute the jitter for me, which I've defined here:

meanJitterByClass <- function(d, c) {
    z <- filter(d, Class==c)
    jitter <- c()
    for (r in 1:nrow(z)) {
        jitter <- c(jitter, z$NWCRT[r] - z$NBCRT[r])
    }
    return (mean(jitter))
}
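
The loop in the function above can be replaced by a single vectorised subtraction on the filtered rows. A minimal base R sketch, assuming the columns `Class`, `NWCRT` and `NBCRT` shown earlier (the name `mean_jitter_by_class` and the toy data are made up for illustration):

```r
# Hypothetical helper: mean of NWCRT - NBCRT over the rows of one class.
mean_jitter_by_class <- function(d, cls) {
    z <- d[d$Class == cls, , drop = FALSE]  # keep only rows of the requested class
    mean(z$NWCRT - z$NBCRT)                 # vectorised subtraction, no loop needed
}

# Toy data, invented for this example
toy <- data.frame(
    Class = c(1, 1, 2),
    NWCRT = c(0.5, 0.7, 0.9),
    NBCRT = c(0.1, 0.1, 0.4)
)
mean_jitter_by_class(toy, 1)
```

If no row matches the requested class, this version returns `NaN`, because the mean of an empty numeric vector is `NaN`.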

Basically, I just call meanJitterByClass(filter(df, Total == 0.1), 1) and obtain the mean for a particular class, computed over the rows that have a particular total. But I would like a data frame that holds a mean jitter for each combination of class and total (so I can make a grouped barplot).

I've tried using a double for loop that iterates over all utilisations, then over all classes, and appends a row to a data frame. But it's not quite working, and I feel like there's a much more elegant way to do it in R.

I'd appreciate any help I can get!

Cheers,

~ Micrium

nirgrahamuk · March 23, 2021, 12:30pm

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners

A minimal reproducible example consists of the following items: a minimal dataset, necessary to reproduce the issue; and the minimal runnable code necessary to reproduce the issue, which can be run on the given dataset and includes the necessary information on the packages used. The dataset should be a data frame that is small enough to be (reasonably) pasted in a post, but big enough to reproduce your issue.

Micrium · March 23, 2021, 12:48pm

Of course, here is a self-contained example (the data is fetched from a GitHub Gist):

library(dplyr)
library(lattice)

# Function to compute jitter
meanJitterByClass <- function(d, c) {
    z <- filter(d, Class==c)
    jitter <- c()
    for (r in 1:nrow(z)) {
        jitter <- c(jitter, z$NWCRT[r] - z$NBCRT[r])
    }
    return (mean(jitter))
}

# Data presentation mode
# Major groups (columns) are things like your Utilisation, or Chain Length
# Minor groups have a row label, like "Class" being "0", "1", or "2"

# Prior to reshape, we're going to format our data as follows:
# VALUES (i.e. jitter) | GROUP (i.e. Total) | SUBGROUP (i.e. Class)
raw <- read.csv(url("https://gist.githubusercontent.com/Micrified/cd3c00bbf8429e5701f0af31b54ed109/raw/c01e7a0a35dc6bceb7b14d6ed38df8210256a7c9/data.csv"))
raw <- as.data.frame(raw)
colnames(raw) <- c("ID", "Utilisation", "Class", "NWCRT", "NBCRT", "NACRT", "Seed", "Total")
df <- NULL

# For each Group (major) (The group column is called "Total" in the data, and has possible values [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90])
for (i in 1:9) {

    # Create three Subgroups (minor) (The subgroup column is called "Class" and has possible values [0, 1, 2])
    subset <- filter(raw, Total==(i*0.10))
    df <- rbind(df, c(meanJitterByClass(subset, "0"), i, 0))
    df <- rbind(df, c(meanJitterByClass(subset, "1"), i, 1))
    df <- rbind(df, c(meanJitterByClass(subset, "2"), i, 2))
}

# Not elegant? Also, has NA inside?

# We can modify our data for a base barplot using reshape (??)
# base <- reshape()
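
On the NA values: one likely cause (an assumption, since it depends on the data) is that `Total == (i * 0.10)` compares floating-point numbers exactly. For some `i` the product is not bit-identical to the stored value, so the filter returns no rows, and `1:nrow(z)` then counts down from 1 to 0 instead of being empty. A small base R sketch of both effects, with invented values:

```r
# Exact comparison of doubles can fail even when the printed values match
3 * 0.10 == 0.3                    # FALSE on typical IEEE 754 platforms
isTRUE(all.equal(3 * 0.10, 0.3))   # TRUE: comparison with a tolerance

# With zero rows, 1:n yields c(1, 0); seq_len(n) yields an empty vector
n <- 0
1:n          # 1 0
seq_len(n)   # integer(0)

# A tolerant filter on a toy data frame (values invented for illustration)
toy <- data.frame(Total = c(0.1, 0.3, 0.3), Class = c(0, 1, 2))
i <- 3
near_total <- toy[abs(toy$Total - i * 0.10) < 1e-8, ]
nrow(near_total)
```

With dplyr loaded, `filter(raw, near(Total, i * 0.10))` expresses the same tolerant comparison. Separately, `read.csv()` defaults to `header = TRUE`, so if the file has no header row, its first record is consumed as column names rather than data.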

FJCC · March 23, 2021, 2:21pm

Is this what you are trying to accomplish?

library(dplyr, warn.conflicts = FALSE)
raw <- read.csv(url("https://gist.githubusercontent.com/Micrified/cd3c00bbf8429e5701f0af31b54ed109/raw/c01e7a0a35dc6bceb7b14d6ed38df8210256a7c9/data.csv"),
                header = FALSE)
colnames(raw) <- c("ID", "Utilisation", "Class", "NWCRT", "NBCRT", "NACRT", "Seed", "Total")
STATS <- raw %>% mutate(jitter = NWCRT - NBCRT) %>% 
  group_by(Class, Total) %>% 
  summarize(MeanJitter = mean(jitter))
#> `summarise()` regrouping output by 'Class' (override with `.groups` argument)
STATS
#> # A tibble: 27 x 3
#> # Groups:   Class [3]
#>    Class Total MeanJitter
#>    <dbl> <dbl>      <dbl>
#>  1     0   0.1     0.180 
#>  2     0   0.2     0.281 
#>  3     0   0.3     0.355 
#>  4     0   0.4     0.508 
#>  5     0   0.5     0.618 
#>  6     0   0.6     0.780 
#>  7     0   0.7     0.880 
#>  8     0   0.8     1.12  
#>  9     0   0.9     1.24  
#> 10     1   0.1     0.0966
#> # ... with 17 more rows

Created on 2021-03-23 by the reprex package (v0.3.0)
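
The `summarise()` message in the output above is informational: after summarising, the result is still grouped by `Class`. A minimal sketch, assuming dplyr 1.0.0 or later and using invented toy data, that computes the same kind of summary and passes `.groups = "drop"` to get an ungrouped result without the message:

```r
library(dplyr, warn.conflicts = FALSE)

# Toy data, invented for illustration; column names follow the thread
toy <- data.frame(
    Class = c(0, 0, 1, 1),
    Total = c(0.1, 0.1, 0.1, 0.2),
    NWCRT = c(0.5, 0.7, 0.9, 0.8),
    NBCRT = c(0.1, 0.1, 0.4, 0.3)
)

stats <- toy %>%
    group_by(Class, Total) %>%
    summarise(MeanJitter = mean(NWCRT - NBCRT), .groups = "drop")

stats                  # one row per Class/Total combination
is_grouped_df(stats)   # FALSE, because the grouping was dropped
```

Dropping the grouping is a matter of preference here; it mainly avoids surprises if later steps are not meant to operate per class.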

Micrium · March 23, 2021, 8:19pm

Yes! This solved my problem perfectly. Thank you so much for helping me out!
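
For the grouped barplot mentioned in the original question, a summary table with one row per class and total can be passed to lattice directly. A rough sketch, assuming a table shaped like `STATS` (the values below are invented, and the axis labels are only suggestions):

```r
library(lattice)

# Toy summary table in the same shape as STATS (values invented)
stats <- data.frame(
    Class = c(0, 1, 2, 0, 1, 2),
    Total = c(0.1, 0.1, 0.1, 0.2, 0.2, 0.2),
    MeanJitter = c(0.18, 0.10, 0.05, 0.28, 0.15, 0.08)
)

# One cluster of bars per Total, one bar per Class within each cluster
p <- barchart(
    MeanJitter ~ factor(Total),
    groups = factor(Class),
    data = stats,
    xlab = "Total",
    ylab = "Mean jitter",
    auto.key = list(title = "Class", columns = 3)
)
print(p)   # lattice plots are drawn when printed
```

Converting `Total` and `Class` to factors makes lattice treat them as categories rather than continuous values.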
