Aggregate command

Hello,
I am new to RStudio and need help with something.
My topic is the wage difference between males and females.
I do not know how to use the aggregate command to test the wage difference between males and females at the mean education level.

Please prepare a Reproducible Example (Reprex) for your question. We need to see the data in order to help you effectively.

Also, if your question is about a homework problem, please read the homework policy.

Please explain more about the desired calculation. The educ column has the values 0 2 5 8 11 15. What is meant by the "mean education level"?

mean(dat$educ)
9.25
I found a mean education value of 9.25 using this command. I need the wage difference between males and females at educ = 9.25.

The aggregate function calculates the mean, or whatever function you specify, of one column for each level of one or more other columns. You can use it to calculate the mean wage for each level of educ that is present in the data, for example. It will not interpolate between existing values of educ.
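To illustrate the point about aggregate, here is a minimal sketch. The column names hwage, educ and female follow the thread, but the data values and the 0/1 coding of female are invented for illustration, so the numbers mean nothing beyond showing the shape of the result.

```r
# Hypothetical data: column names follow the thread, values are invented.
dat <- data.frame(
  hwage  = c(5.0, 4.5, 6.0, 5.2, 7.5, 6.8, 9.0, 8.1, 11.0, 9.9, 14.0, 12.5),
  educ   = c(0, 0, 2, 2, 5, 5, 8, 8, 11, 11, 15, 15),
  female = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)  # assumed coding: 1 = female
)

# Mean hourly wage for each combination of educ and female.
wage_means <- aggregate(hwage ~ educ + female, data = dat, FUN = mean)
wage_means

# Only educ values present in the data appear; there is no row for educ = 9.25.
unique(wage_means$educ)
```

The result has one row per (educ, female) pair that exists in the data, which is why aggregate alone cannot answer a question about an educ value that was never observed.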

One way to calculate the wage at educ = 9.25 is to fit wage as a function of educ with the lm() function, then use the parameters of that fit to predict the wage at educ = 9.25. Is that something you are willing to try?

Yes. I think I need that lm() function solution.

Please give that method a try and ask for help, showing your code, if you get stuck. The steps to follow could be:

  1. Make a subset of the data with just females
  2. Use the lm() function to fit hwage ~ educ
  3. Use the coefficients of that fit to predict hwage at educ = 9.25
  4. Repeat steps 1 - 3 for the males.
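The steps above could look roughly like the sketch below. The data are invented and the 0/1 coding of female is an assumption; only the column names come from the thread. The prediction uses predict(), which for a straight-line fit gives the same value as intercept + slope * educ computed from the coefficients.

```r
# Hypothetical data: column names follow the thread, values are invented.
dat <- data.frame(
  hwage  = c(5.0, 4.5, 6.0, 5.2, 7.5, 6.8, 9.0, 8.1, 11.0, 9.9, 14.0, 12.5),
  educ   = c(0, 0, 2, 2, 5, 5, 8, 8, 11, 11, 15, 15),
  female = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)  # assumed coding: 1 = female
)

# Step 1: one subset per group.
females <- subset(dat, female == 1)
males   <- subset(dat, female == 0)

# Step 2: fit hwage ~ educ separately in each subset.
fit_f <- lm(hwage ~ educ, data = females)
fit_m <- lm(hwage ~ educ, data = males)

# Step 3: predict hwage at the overall mean of educ.
at_mean <- data.frame(educ = mean(dat$educ))
pred_f <- predict(fit_f, newdata = at_mean)
pred_m <- predict(fit_m, newdata = at_mean)

# Same prediction written out from the coefficients.
manual_f <- coef(fit_f)[["(Intercept)"]] + coef(fit_f)[["educ"]] * at_mean$educ

# Estimated male-minus-female wage gap at mean education.
unname(pred_m - pred_f)
```

Fitting each group separately lets the slope on educ differ between males and females, so the estimated gap depends on the educ value at which it is evaluated.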

reg=lm(hwage~educ+female,
How can I specify that educ=9.25 in this regression?

Once you have a fit you can use the predict() function to generate new values. Let's say I have data spanning roughly the range 1 - 3 and I want a prediction at 2.75:

DF <- data.frame(X = c(0.9, 1.1, 2.3, 2.1, 2.9, 3.3),
                 Y = c(6.7, 6.4, 9.3, 9.8, 12.0, 12.4),
                 Female = c("M", "F", "M", "F", "M", "F"))
FIT <- lm(Y ~ X + Female, data = DF)

NewDat <- data.frame(X = c(2.75, 2.75), Female = c("M", "F"))

predict(FIT, newdata = NewDat)
#>        1        2 
#> 11.20826 11.05944

NewDat$Y <- predict(FIT, newdata = NewDat)

NewDat
#>      X Female        Y
#> 1 2.75      M 11.20826
#> 2 2.75      F 11.05944

Created on 2019-12-15 by the reprex package (v0.2.1)
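Applied to the wage question, the same predict() pattern might look like the sketch below. The data are invented and the 0/1 coding of female is an assumption; the model formula hwage ~ educ + female is the one from the question.

```r
# Hypothetical data: column names follow the thread, values are invented.
dat <- data.frame(
  hwage  = c(5.0, 4.5, 6.0, 5.2, 7.5, 6.8, 9.0, 8.1, 11.0, 9.9, 14.0, 12.5),
  educ   = c(0, 0, 2, 2, 5, 5, 8, 8, 11, 11, 15, 15),
  female = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)  # assumed coding: 1 = female
)

reg <- lm(hwage ~ educ + female, data = dat)

# One row per group, both evaluated at the mean of educ.
new_dat <- data.frame(educ = mean(dat$educ), female = c(0, 1))
new_dat$hwage_hat <- predict(reg, newdata = new_dat)
new_dat

# Female-minus-male difference in predicted wage at mean education.
gap <- new_dat$hwage_hat[2] - new_dat$hwage_hat[1]

# With no educ:female interaction, this gap is just the coefficient on female.
all.equal(gap, unname(coef(reg)["female"]))
```

Because this model has no interaction term, the predicted gap is the same at every value of educ, and summary(reg) reports a t-test for the female coefficient. If the gap is expected to vary with education, an interaction (hwage ~ educ * female) would be needed, and the gap would then depend on the chosen educ value.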

Thank you for all your help.
