Find the regression and predicted values by grouping the categories

I have a data frame df. Is there a way to find the predicted values across groups? For example, for the data frame below, I have found the regression equation:

df <- structure(list(colA = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), colB = c(48, 34, 56, 34, 56, 78), colC = c(45, 
67, 87, 45, 34, 56)), class = "data.frame", row.names = c(NA, 
-6L))
library(lme4)  # provides lmList()
fit <- lmList(colB ~ colC | colA, data = df)  # one regression of colB on colC per level of colA
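In `lmList(colB ~ colC | colA, ...)` the part after `|` names the grouping factor, and `colB ~ colC` is fitted separately within each level of `colA`. The minimal base-R sketch below reproduces that idea with `split()` and one `lm()` per group, so the per-group equations can be inspected without lme4. It is an illustration of the technique, not how `lmList()` is implemented internally; the names `fits` and `coefs` are hypothetical.

```r
# Example data from the question (same values as the dput() output above)
df <- data.frame(
  colA = factor(c("A", "A", "A", "B", "B", "B")),
  colB = c(48, 34, 56, 34, 56, 78),
  colC = c(45, 67, 87, 45, 34, 56)
)

# split() returns one data frame per level of colA; fit colB ~ colC in each
fits <- lapply(split(df, df$colA), function(d) lm(colB ~ colC, data = d))

# Rows = groups, columns = "(Intercept)" and "colC" (the slope)
coefs <- t(sapply(fits, coef))
print(coefs)
```

For this data, group B's line works out to colB = 11 + 1 * colC, and group A's to roughly colB = 34.28 + 0.177 * colC.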

Now, is there a way to get the predicted values by inserting a new column (Predicted colB), like below?

df
  colA colB colC Predicted colB
1    A   48   45         
2    A   34   67
3    A   56   87
4    B   34   45
5    B   56   34
6    B   78   56

Predicted colB should hold the value of colB predicted from colC. For example, in the first row, what is the predicted colB when colC is 45?
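As a worked check of what the first row should contain, here is the ordinary least-squares calculation within group A only, writing x for colC (45, 67, 87) and y for colB (48, 34, 56). The slope is the usual ratio of the cross-deviation sum to the squared-deviation sum, and the prediction is the group mean of y plus the slope times the offset of x from its group mean:

$$
\begin{aligned}
\bar{x}_A &= \tfrac{45 + 67 + 87}{3} = \tfrac{199}{3}, \qquad \bar{y}_A = \tfrac{48 + 34 + 56}{3} = 46 \\
\hat{\beta}_{1,A} &= \frac{\sum_i (x_i - \bar{x}_A)(y_i - \bar{y}_A)}{\sum_i (x_i - \bar{x}_A)^2} = \frac{156}{7944/9} = \frac{117}{662} \approx 0.1767 \\
\hat{y}_1 &= \bar{y}_A + \hat{\beta}_{1,A}\,(45 - \bar{x}_A) = 46 - \frac{117}{662}\cdot\frac{64}{3} \approx 42.23
\end{aligned}
$$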

You could use predict() to get a vector of the predicted values. For example:

df$predicted_colB <- predict(fit)

# or with dplyr
library(dplyr)
df <- df %>%
  mutate(predicted_colB = predict(fit))
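If lme4 is not available, a single `lm()` with a group-by-slope interaction gives the same point predictions as fitting one line per group, because `colA * colC` expands to `colA + colC + colA:colC`, which lets each level of `colA` have its own intercept and slope. This is a swapped-in base-R technique, not what `lmList()` does; the pooled residual standard error differs from separate fits, but the fitted values do not. The column name `predicted_colB` follows the answer above.

```r
df <- data.frame(
  colA = factor(c("A", "A", "A", "B", "B", "B")),
  colB = c(48, 34, 56, 34, 56, 78),
  colC = c(45, 67, 87, 45, 34, 56)
)

# Separate intercept and slope for each level of colA, in one model
m <- lm(colB ~ colA * colC, data = df)

# predict() with newdata returns one value per row of df, in df's row order
df$predicted_colB <- predict(m, newdata = df)
print(df)
```

Passing `newdata = df` explicitly keeps the predictions aligned with the rows of `df`, whatever order the groups appear in.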

Note that if you wanted predictions for a new data frame (not the data the model was fitted on), you would need to use predict(fit, newdata = df).
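To illustrate the `newdata` case, here is a base-R sketch (using an interaction `lm()` rather than `lmList()`): the new data frame needs the same predictor columns the model uses, here `colA` and `colC`, and `predict()` returns one value per new row. The object name `new_obs` and the colC value of 50 are made up for illustration.

```r
df <- data.frame(
  colA = factor(c("A", "A", "A", "B", "B", "B")),
  colB = c(48, 34, 56, 34, 56, 78),
  colC = c(45, 67, 87, 45, 34, 56)
)
m <- lm(colB ~ colA * colC, data = df)

# Hypothetical new observations: expected colB at colC = 50 in each group
new_obs <- data.frame(
  colA = factor(c("A", "B"), levels = levels(df$colA)),
  colC = c(50, 50)
)
new_obs$predicted_colB <- predict(m, newdata = new_obs)
print(new_obs)
```

Setting `levels = levels(df$colA)` keeps the factor coding of the new data consistent with the data the model was fitted on.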


Thanks, it works. But I have a dataset with 290 rows, and when I use your code I get the error below. Can you guide me?

df$predicted_low <- predict(fit)

Error in names(val) <- rep(namVal, ngrps) : 
  'names' attribute [290] must be the same length as the vector [289]

My guess is that there's one row with a missing value in colB or colC (or both): the error says the fitted values have 289 elements while the data have 290 rows. It looks like predict() on lme4's lmList objects can't handle that.
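A small base-R illustration of this kind of length mismatch, using a hypothetical copy of the example data with one colC value set to NA: with the default `na.action` (normally `na.omit`), the fitted values skip the incomplete row, while `na.action = na.exclude` pads them back to full length with NA. This shows the general R mechanism with `lm()`; whether `lmList()` honours `na.exclude` is not something this sketch checks.

```r
df_na <- data.frame(
  colA = factor(c("A", "A", "A", "B", "B", "B")),
  colB = c(48, 34, 56, 34, 56, 78),
  colC = c(45, NA, 87, 45, 34, 56)  # hypothetical missing value in row 2
)

m_omit <- lm(colB ~ colA * colC, data = df_na)                         # default na.action
m_excl <- lm(colB ~ colA * colC, data = df_na, na.action = na.exclude)

n_omit <- length(predict(m_omit))  # one shorter than nrow(df_na)
p_excl <- predict(m_excl)          # same length as nrow(df_na), NA in row 2
```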

Try filtering the data before fitting it:

library(dplyr)

df_filtered <- df %>%
  filter(!is.na(colB), !is.na(colC))

fit <- lmList(colB ~ colC | colA, data = df_filtered)

df_filtered$predicted_colB <- predict(fit)
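The filtered data frame above only holds the complete rows. If the predictions should go back into the original data instead (with NA where a prediction is impossible), one option, sketched here with base R rather than `lmList()`, is to fit on the complete rows and then predict on the full data frame: `predict.lm()` uses `na.action = na.pass` for `newdata`, so a row with a missing colC gets NA instead of being dropped, and a row where only colB is missing still gets a prediction because the response is not needed to predict. The missing values below are hypothetical.

```r
df_na <- data.frame(
  colA = factor(c("A", "A", "A", "B", "B", "B")),
  colB = c(48, 34, 56, 34, NA, 78),  # hypothetical missing response in row 5
  colC = c(45, 67, NA, 45, 34, 56)   # hypothetical missing predictor in row 3
)

# lm() drops incomplete rows when fitting (default na.action)
m <- lm(colB ~ colA * colC, data = df_na)

# For newdata, every row is kept: row 3 gets NA, row 5 still gets a value
df_na$predicted_colB <- predict(m, newdata = df_na)
print(df_na)
```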

Hi,

What does the vertical line | mean here?
