Find bar chart values

ggplot2
rstudio

#1

Greetings,

As a complete novice, I managed to create a bar chart of area measurements from two groups with the following code. Now I’d like to know the actual values (i.e., means) used for each bar and then test for group differences. Any help or suggestions with these questions are greatly appreciated.

roi1 <- ggplot(SampleData, aes(Gp2, fill = Gp2))
roi1 + geom_bar()
roi1 + stat_summary_bin(aes(y = lh.area.frontal), fun.y = "mean", geom = "bar", show.legend = TRUE)

Cheers,
Jason


#2

I'd go for something like this:

## some data
Data = data.frame(value = c(rnorm(10000),rnorm(10000,5)), groups = rep(c('a','b'), each = 10000))
## the mean values for each group
DataMeans <- sapply(split(Data$value, Data$groups), mean)
## checking that the output corresponds with the simulated data (mean of a = 0, mean of b = 5)
DataMeans
## output
> DataMeans
          a           b 
0.004444622 5.013350561 
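
The original question also asked how to test for group differences. A minimal base R sketch, assuming the same kind of simulated data as above: aggregate() returns the per-group means as a data frame, and t.test() with a formula compares the two groups. The set.seed() call and the names GroupMeans and GroupTest are additions for illustration. With the data from the first post, the formula would presumably be lh.area.frontal ~ Gp2 with data = SampleData.

```r
## simulated data in the same shape as above (set.seed added for reproducibility)
set.seed(1)
Data <- data.frame(value = c(rnorm(10000), rnorm(10000, 5)),
                   groups = rep(c("a", "b"), each = 10000))

## one row per group, holding the mean that a "mean" bar would display
GroupMeans <- aggregate(value ~ groups, data = Data, FUN = mean)
GroupMeans

## Welch two-sample t-test for the difference between the two groups
GroupTest <- t.test(value ~ groups, data = Data)
GroupTest$estimate  # the two group means again
GroupTest$p.value   # p-value for the difference in means
```

Note that t.test() with a formula only works when the grouping variable has exactly two levels.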

cheers
Fer


#3

Thanks for the quick response Fer,

This was extremely helpful. For calculating group means on multiple variables (e.g., 10+), do I have to list out each variable or is there a quicker way to specify multiple variables?

As an aside, what is the 'c' in the first line of code used for, what does it mean? I frequently see this in a lot of answers but have no idea what it means...:disappointed:

Cheers,
Jason


#4

If by multiple variables you mean that your grouping variable has several different values, the split function will do the job for you:

## some data
Data = data.frame(value = c(rnorm(10000),rnorm(10000,5), rnorm(10000,10)), groups = rep(c('a','b','d'), each = 10000))
## the mean values for each group
DataMeans <- sapply(split(Data$value, Data$groups), mean)
## checking that the output corresponds with the simulated data (mean of a = 0, mean of b = 5, mean of d = 10)
DataMeans
## output
> DataMeans
         a          b          d 
0.00673895 4.99842741 9.98537344 

But if you mean more variables (columns), and want the mean value for every combination of the variables' values, then you need to pass a list of the columns (in the argument called 'f'):

## some data
Data = data.frame(value = c(rnorm(10000),rnorm(10000,5), rnorm(10000,10)), groups = rep(c('a','b','d'), each = 10000), gender = rep(c('M','F'), each = 15000))
## the mean values for each group
DataMeans <- sapply(split(Data$value, f = list(Data$groups, Data$gender)), mean)
## checking that the output corresponds with the simulated data (a near 0, b near 5, d near 10; empty combinations give NaN)
DataMeans
## output
> DataMeans
         a.F          b.F          d.F          a.M          b.M          d.M 
         NaN  4.995547802 10.007088587  0.003088001  4.982541806          NaN 

As you can see, it reports NaN for the combinations that do not exist in the data set, so if you want only those that exist, then:

> DataMeans[is.finite(DataMeans)]
         b.F          d.F          a.M          b.M 
 4.995547802 10.007088587  0.003088001  4.982541806 
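
A possible alternative, sketched under the same simulated-data assumptions: aggregate() with a two-variable formula returns one row per combination that actually occurs in the data, so there are no NaN entries to filter out afterwards. The set.seed() call and the name ComboMeans are additions for illustration.

```r
## same simulated data as above (set.seed added for reproducibility)
set.seed(1)
Data <- data.frame(value = c(rnorm(10000), rnorm(10000, 5), rnorm(10000, 10)),
                   groups = rep(c("a", "b", "d"), each = 10000),
                   gender = rep(c("M", "F"), each = 15000))

## mean of value for each groups x gender combination present in the data
ComboMeans <- aggregate(value ~ groups + gender, data = Data, FUN = mean)
ComboMeans
```

The result is a data frame with columns groups, gender and value, which is often easier to pass on to plotting or further tests than a named vector.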

cheers
Fer

Edit: the 'c' comes from 'concatenate', and it means exactly that. I am concatenating three randomly generated sets of 10000 values, each normally distributed but with different means (0, 5 and 10); that is, appending one after another. It is used for creating vectors. So, if you want to create a vector with the values 4, 6, 8 and 10, then you just type Vector <- c(4,6,8,10)
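
A short sketch of how c() behaves in practice. The names Longer and Mixed are made up for illustration.

```r
## a vector built from individual values
Vector <- c(4, 6, 8, 10)
length(Vector)   # 4

## c() also accepts existing vectors and flattens everything into one vector
Longer <- c(Vector, 12, c(14, 16))
Longer           # 4 6 8 10 12 14 16

## mixing types coerces all elements to a common type
Mixed <- c(1, "a")
class(Mixed)     # "character"
```

This is why rep(c('a','b'), each = 10000) in the code above works: c('a','b') first builds a two-element character vector, and rep() then repeats each element.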


#5

Fer,
Thanks again for the helpful answers. As another concrete example, I have a dataset with variables (columns):
ID, Group, Sex, Age, ICV, lh.volume.1 to lh.volume.11
I want to calculate Group means of:
Age
ICV
lh.volume.1 to lh.volume.11

Then I want to test these means statistically. Right now there are 2 groups. So I plan to use t.test(variable~Group, dataset).

Regarding both of these goals, is there a way to choose multiple variables with a wildcard or regex? That way I can include all the lh.volumes at once?

Cheers,
Jason


#6

Minor pedantic point: the c() function is actually named for “combine” since it “combines values into a vector or list” — same idea, slightly different vocab: https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/c

You can find the documentation for a function in R by typing ?functionName or help("functionName") at the console.

You might be interested in this thread, which has lots of great resources for getting up to speed when you’re new to R: What's your favorite intro to R?


#7

Thanks for the clarification and link. I will definitely check it out in the near future!

My biggest question at the moment is how to code a t-test of Group with 11 other variables. Do I have to code each t-test separately or is there a way to do this all at once?

Cheers,
Jason


#8

Thanks for pointing that out. I am sure I read about the 'concatenate' origin in some books a long time ago (and in the function's help page, they refer to '...' as the objects to be 'concatenated'). "Combine" sounds a bit weird to me: if I think of 'combining' A, B and C in a vector, the first thing that comes to mind is the vector {ABC, ACB, BAC, BCA, CAB, CBA}.
But it is how it is :slight_smile:


#9

Interesting! For me, “concatenation” is dominated by associations with string concatenation so I’d think of “concatenating” A, B, and C as resulting in “ABC” (a vector of length 1). I agree that “combine” isn’t entirely felicitous either, due to the obvious associations with combinatorics, as you point out. Naming things is hard! Thankfully, after you’ve had R in your brain for a good long while you stop really thinking about what c() means other than what it does. :wink:


#10

Since this is a substantially separate question from the one that started this thread, can you please post it as a new topic? If you want your new topic to be automatically linked to this one, click the little :link: icon at the bottom of your original post and select New Topic from the popup.

(There are also some helpful guidelines for posting questions here in the FAQs: https://community.rstudio.com/faq)