Trying to do statistical analysis of specific variables in R

Hi there

I am quite a novice with R, and I am currently working through a large dataset of patient data with around 157 variables. I want to see how one variable and another differ across 2 different ethnic subgroups and both genders, but I am not sure how to do this.
In my dataset, Gender is coded as 0 for female and 1 for male, and ethnicity is coded from 1 to 15. I want to look at how these variables vary by ethnicity and gender. For example, what are the average height and age of females with ethnicity 2?

I would be incredibly grateful for any help.

Many thanks.

Here is an example of doing that sort of thing with some data I invented. It would not matter if there were more columns in the data; the code would be the same. I strongly recommend you read a book like R for Data Science.

set.seed(1)
DF <- data.frame(Gender = sample(0:1, 100, replace = TRUE),
                 Ethn = sample(1:15, 100, replace = TRUE),
                 Age = sample(18:40, 100, replace = TRUE),
                 Height = rnorm(100, 1.5, 0.1))
library(dplyr)
# Calculate the averages for every combination of Gender and Ethn
Stats <- DF %>% group_by(Gender, Ethn) %>% 
  summarize(AvgAge = mean(Age), AvgHt = mean(Height))
Stats
#> # A tibble: 29 x 4
#> # Groups:   Gender [2]
#>    Gender  Ethn AvgAge AvgHt
#>     <int> <int>  <dbl> <dbl>
#>  1      0     1   23.3  1.57
#>  2      0     2   20    1.45
#>  3      0     3   24.2  1.59
#>  4      0     4   29    1.63
#>  5      0     5   23    1.53
#>  6      0     6   22.2  1.49
#>  7      0     7   30.5  1.52
#>  8      0     8   25.7  1.47
#>  9      0     9   29.8  1.55
#> 10      0    10   26.4  1.42
#> # … with 19 more rows

# If you just want Ethn == 2
Stats <- DF %>% filter(Ethn == 2) %>% 
  group_by(Gender, Ethn) %>% 
  summarize(AvgAge = mean(Age), AvgHt = mean(Height))
Stats
#> # A tibble: 2 x 4
#> # Groups:   Gender [2]
#>   Gender  Ethn AvgAge AvgHt
#>    <int> <int>  <dbl> <dbl>
#> 1      0     2   20    1.45
#> 2      1     2   31.2  1.53

Created on 2019-12-09 by the reprex package (v0.2.1)
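
Since the question mentions two ethnic subgroups and many variables, here is a rough sketch of how the same approach might be extended. It assumes dplyr 1.0.0 or later (for across() and the .groups argument), which is newer than the version available when the example above was written. The choice of ethnicities 2 and 5, the name TwoGroups, and the Female/Male labels are only for illustration.

```r
library(dplyr)

# Same invented data as in the example above
set.seed(1)
DF <- data.frame(Gender = sample(0:1, 100, replace = TRUE),
                 Ethn = sample(1:15, 100, replace = TRUE),
                 Age = sample(18:40, 100, replace = TRUE),
                 Height = rnorm(100, 1.5, 0.1))

# Keep two ethnic subgroups (2 and 5 are arbitrary picks for illustration),
# label Gender using the coding from the question (0 = female, 1 = male),
# then compute the group size plus the mean and SD of several columns at once
TwoGroups <- DF %>%
  filter(Ethn %in% c(2, 5)) %>%
  mutate(Gender = factor(Gender, levels = c(0, 1),
                         labels = c("Female", "Male"))) %>%
  group_by(Gender, Ethn) %>%
  summarize(N = n(),
            across(c(Age, Height), list(Avg = mean, SD = sd)),
            .groups = "drop")
TwoGroups
```

The result has one row per Gender and Ethn combination, with columns named Age_Avg, Age_SD, Height_Avg and Height_SD. A group with a single row gets NA for its SD. If the real data contain missing values, mean() and sd() return NA unless na.rm = TRUE is passed, for example with list(Avg = ~ mean(.x, na.rm = TRUE)).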
