How to find a conditional mean?

Hi all. Since yesterday I have been trying to find the conditional mean of a variable. In my case, I have two variables: one is continuous (positive), and the other is binary (yes = 1, no = 0). I need the mean of the first (continuous) variable when the second variable equals 1 (yes), and then the mean of the same variable when the second variable equals 0 (no). I also need to include na.rm = TRUE, because some cells in the table are not filled in (NA). I have tried some commands, but they seem to be completely wrong. Here are some of my attempts (mydata is the data name; it was subset from the main data because I needed only one year out of all the given years; x1 is the continuous variable and x2 is the binary variable):

Part 1

if(mydata$x2 == 1) w <- mydata$x1
mean(w)
error: the condition has length > 1 and only the first element will be used

Part 2

mean(mydata[mydata$x2>0, "x1"])
Result: [1] NA

I also don't know how to add the na.rm = TRUE argument here.
Please help. Thanks.


Why you're getting errors

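  1. In Part 1, if() expects a single TRUE or FALSE, but mydata$x2 == 1 is a logical vector with one value per row, so only its first element is used. Select the rows by indexing instead of using if().
  2. In Part 2, the indexing is right; the result is NA most likely because x1 (or x2) contains missing values, and mean() returns NA if any of its inputs is NA. Add na.rm = TRUE: mean(mydata[mydata$x2>0, "x1"], na.rm = TRUE)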
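A minimal sketch of both fixes, assuming a small made-up mydata (the values are hypothetical, not the asker's data) with missing values in both columns. It shows why if() cannot be used here, then computes both conditional means with logical indexing and na.rm = TRUE:

```r
# Hypothetical data (not the asker's): x1 is continuous, x2 is binary
# (1 = yes, 0 = no), with missing values in both columns
mydata <- data.frame(
  x1 = c(2.5, 3.0, NA, 4.5, 1.0, 6.0),
  x2 = c(1,   0,   1,  NA,  0,   1)
)

# if() needs one TRUE/FALSE, but this comparison gives one value per row
length(mydata$x2 == 1)

# Logical indexing keeps every row where the condition is TRUE.
# A row where x2 is NA is returned as NA, so na.rm = TRUE removes it
# along with the rows where x1 itself is missing.
mean_yes <- mean(mydata[mydata$x2 == 1, "x1"], na.rm = TRUE)
mean_no  <- mean(mydata[mydata$x2 == 0, "x1"], na.rm = TRUE)
mean_yes
mean_no

# which() drops NA conditions, leaving only missing x1 values to remove
mean(mydata$x1[which(mydata$x2 == 1)], na.rm = TRUE)
```

With this data the conditional means are 4.25 for x2 = 1 and 2 for x2 = 0. If mydata is a tibble rather than a plain data frame, mydata[rows, "x1"] returns a one-column tibble instead of a numeric vector, so mydata$x1[rows] is the safer form there.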

Alternative

Use the by function, which splits a vector by a grouping variable and applies a function such as mean to each group.

Let me illustrate with an example:

dataset <- data.frame(continuous = rnorm(n = 10),
                      binary = sample(x = 0:1, size = 10, replace = TRUE))

dataset
#>     continuous binary
#> 1  -0.01978487      0
#> 2  -1.14185292      0
#> 3   0.20931787      0
#> 4  -0.63720730      0
#> 5   1.07750407      1
#> 6  -1.59274225      0
#> 7  -0.48722740      1
#> 8  -0.64151044      0
#> 9  -0.64111755      0
#> 10  0.99598287      1

# your method
mean(dataset[dataset$binary == 1, 1])
#> [1] 0.5287532
mean(dataset[dataset$binary == 0, 1])
#> [1] -0.6378425

# using by (extra arguments such as na.rm = TRUE are passed on to FUN)
by(data = dataset$continuous, INDICES = dataset$binary, FUN = mean, na.rm = TRUE)
#> dataset$binary: 0
#> [1] -0.6378425
#> -------------------------------------------------------- 
#> dataset$binary: 1
#> [1] 0.5287532

Created on 2019-03-19 by the reprex package (v0.2.1)
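Two other base R functions give the same grouped means. A minimal sketch on a small made-up data frame (the values are hypothetical) with one missing value: tapply() passes na.rm = TRUE through to mean(), and the formula method of aggregate() drops incomplete rows by default.

```r
# Hypothetical data with one missing value in the continuous column
dataset <- data.frame(
  continuous = c(1.5, 3, NA, 5, 2.5, 4),
  binary     = c(1,   0, 1,  0, 1,   0)
)

# tapply(): split 'continuous' by 'binary' and apply mean() to each group;
# arguments after FUN (here na.rm = TRUE) are passed on to mean()
group_means <- tapply(dataset$continuous, dataset$binary, mean, na.rm = TRUE)
group_means

# aggregate() with a formula drops rows containing NA by default
# (na.action = na.omit), so na.rm is not needed here
group_table <- aggregate(continuous ~ binary, data = dataset, FUN = mean)
group_table
```

Both give a mean of 4 for binary = 0 and 2 for binary = 1 on this data; tapply() returns a named vector, while aggregate() returns a data frame, which is handier if you want to keep working with the result as a table.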

Hope this helps.

PS

Please ask future questions in the form of a reproducible example (reprex). In this case it was not too difficult to work out what was going wrong, but more often than not it isn't. This great post explains how to make a reprex:
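One common way to include data in a reprex is dput(), which prints R code that recreates an object so others can paste it and run your commands. A minimal sketch, assuming a tiny hypothetical extract of mydata (for real data, something like dput(head(mydata, 10)) keeps it short):

```r
# Hypothetical small extract standing in for the real data
mydata <- data.frame(
  x1 = c(2.5, 3.0, NA, 4.5),
  x2 = c(1L, 0L, 1L, NA)
)

# dput() prints R code that recreates the object;
# paste its output into the question so others can run it
dput(mydata)

# Check that the printed code rebuilds the same data frame
code <- paste(capture.output(dput(mydata)), collapse = "\n")
rebuilt <- eval(parse(text = code))
all.equal(rebuilt, mydata)
```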


If you are not restricted to base R, another approach is a tidyverse-based solution like this one:

set.seed(123)
library(dplyr)

dataset <- data.frame(continuous = rnorm(n = 10),
                      binary = sample(x = 0:1, size = 10, replace = TRUE))
dataset %>% 
    group_by(binary) %>% 
    summarise(continuous_mean = mean(continuous, na.rm = TRUE))
#> # A tibble: 2 x 2
#>   binary continuous_mean
#>    <int>           <dbl>
#> 1      0          -0.566
#> 2      1           0.235

Created on 2019-03-19 by the reprex package (v0.2.1)

Here you can find a free online book that teaches how to use the tidyverse tools.


Thanks, dear friend. This is the second time you have helped me with R. I will take your advice into account. Thanks very much again.
