using predict as part of a pipeline

issue

I have stored the results of a per-group regression in a data frame; so far so good.

I can call predict on a model pulled out of that data frame, so the regressions have worked.

I now want to use predict as part of a pipeline.

For each of the four rows in dataAndModel below, I want the predicted value from that row's model at that row's single data point x.

There are two issues:

  1. passing the regression equation so it is recognised in the pipeline by predict
  2. passing a single datapoint instead of a dataframe

I am seeking an output looking like
x forecastedValueofXforGivenModel
3 13
4 14
5 15
6 16

Thanks in advance for your comments


library(dplyr)
library(stats)

theData <- data.frame(type=c(1,1,2,2),x=c(1,2,3,4), y=c(11,12,13,14))
regressions <- theData %>%
  group_by(type) %>%
  do(myModel=lm(y ~ x, data=.))

regressions  # two rows (type 1 and type 2), each holding an <S3: lm> in myModel


evaluate <- data.frame(type=c(1,1,2,2), x=c(3,4,5,6))
fakeX <- data.frame(x=c(3,4,5,6))

dataAndModel <- evaluate %>% 
    merge(x=.,y=regressions)

{ # works so regression is working
  
  predict(dataAndModel[[1,"myModel"]],fakeX)
}


# now i want to use predict as part of a pipe

dataAndModel %>%                                     # does not work (not expected to, since x is not a data frame)
  mutate(yEst = predict(myModel,x))



dataAndModel %>%                           # does not work
  mutate(yEst = predict(myModel,fakeX))

Hi @boffin,

I am using a slightly different approach with the {purrr} and {tidyr} packages here but the final result is what you expect. Do not hesitate to ask questions if you have any:

# Load packages ----

library(dplyr)
library(purrr)
library(tidyr)


# Create dataset (+ fake dataset) ----

theData <- data.frame(
  type = c(1, 1, 2, 2),
  x = c(1, 2, 3, 4), 
  y = c(11, 12, 13, 14),
  fakeX = c(3, 4, 5, 6)
)


# Nest the data by type ----

nestedData <- theData %>%
  group_by(type) %>%
  nest(data = c(y, x), fakeX = fakeX) %>%
  ungroup()


# Run model for each type ----

nestedData <- nestedData %>%
  mutate(
    model = map(.x = data, ~ lm(y ~ x, data = .x))
  )

# Predict values using fake data ----

final_nested <- nestedData %>%
  mutate(
    yEst = map2(
      .x = model,
      .y = fakeX, 
      .f = ~ predict(object = .x, newData = .y)
    )
  )

final_nested

# A tibble: 2 × 5
   type data             fakeX            model  yEst     
  <dbl> <list>           <list>           <list> <list>   
1     1 <tibble [2 × 2]> <tibble [2 × 1]> <lm>   <dbl [2]>
2     2 <tibble [2 × 2]> <tibble [2 × 1]> <lm>   <dbl [2]>

# Select and unnest numeric columns ----

final_nested %>%
  select(type, data, fakeX, yEst) %>%
  unnest(cols = -type)

# A tibble: 4 × 5
   type     y     x fakeX  yEst
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1    11     1     3    11
2     1    12     2     4    12
3     2    13     3     5    13
4     2    14     4     6    14

Hi @gueyenono

Thank you for this reply. Yes, it is the answer, and I will mark it as such once you have replied on the following point.

I could not get your last statement to work

final_nested %>%                          
  select(type,  fakeX, yEst) %>%
  unnest(cols = -type)                       ### ERR  unexpected '<' in "<"

I tried

final_nested %>%
  select(type,  fakeX, yEst) %>%
  unnest(cols = !type)      #### same ERR unexpected '<' in "<"

the code below works and the whole suggested code works well, thank you for this example - thanks also for the introduction to nest/unnest & map/map2

final_nested %>%
  select(type,  fakeX, yEst) %>%
  unnest(cols = c(fakeX,yEst))

Can you advise on the status of the "unnest(cols = -type)" line?

thanks again

I'm not quite sure why it is not working for you, but using unnest(cols = c(fakeX, yEst)) is definitely the recommended approach.

...unnest(cols = -type) means "unnest all columns except the type column", but this is too general so you should stick with the code that worked.
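To make the difference between the two selections concrete, here is a minimal sketch on a hypothetical toy tibble (the name toy and its values are made up for illustration, not taken from the thread's objects). It assumes reasonably recent versions of dplyr and tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy tibble: one plain column and two list columns
toy <- tibble(
  type  = c(1, 2),
  fakeX = list(c(3, 4), c(5, 6)),
  yEst  = list(c(13, 14), c(15, 16))
)

# "Everything except type" versus naming the list columns explicitly
by_exclusion <- toy %>% unnest(cols = -type)
by_name      <- toy %>% unnest(cols = c(fakeX, yEst))

# Compare the two results
identical(by_exclusion, by_name)
```

Both calls select the same two columns here, so the results should match. The explicit form is safer because it keeps working as intended if another list column (such as model) is added later.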

Hi @gueyenono

I have noticed that the results were incorrect and have learnt a lot about how purrr works.

The code up to and including the regression worked. The code around final_nested needed the following changes:
(1) the argument name is newdata, not newData
(2) predict looks for a column named x inside the nested data, so earlier in the code I created nestedData2, where the nested column is named "x"
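Both points can be reproduced on a single model. The sketch below is an illustration only: it refits the type-1 data from the thread (y = x + 10), and the names fit, new_points, wrong, right and bad_name are hypothetical.

```r
# Model on the thread's type-1 data: y = x + 10
fit <- lm(y ~ x, data = data.frame(x = c(1, 2), y = c(11, 12)))

new_points <- data.frame(x = c(3, 4))

# (1) Misspelled argument: newData is swallowed by `...`, so predict()
#     silently falls back to the fitted values of the training data
wrong <- predict(fit, newData = new_points)

# Correct argument name: predictions at the new points
right <- predict(fit, newdata = new_points)

unname(wrong)
unname(right)

# (2) Column not named like the predictor: predict() cannot find x
#     (assuming no other object called x is visible to the model formula)
bad_name <- tryCatch(
  predict(fit, newdata = data.frame(fakeX = c(3, 4))),
  error = function(e) conditionMessage(e)
)
bad_name
```

The first call returns the fitted values for x = 1 and x = 2, which is why the earlier output showed yEst equal to y, with no error or warning to flag the typo.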

The question I had with -type (bottom of code) has now gone away.
regards
Boffin

library(dplyr)
library(purrr)
library(tidyr)


# Create dataset (+ fake dataset) ----

theData <- data.frame(
  type = c(1, 1, 2, 2),
  x = c(1, 2, 3, 4), 
  y = c(11, 12, 13, 14),
  fakeX = c(3, 4, 5, 6)
)


# Nest the data by type ----

nestedData1 <- theData %>%
  select(type,y,x) %>%
  group_by(type) %>%
  nest(data = c(y, x)) %>%  # fakeX is nested separately (nestedData2) so its inner column can be named x
  ungroup()

nestedData2 <- theData %>%
  select(type,fakeX) %>%
  group_by(type) %>%
  rename(x = fakeX) %>%  # rename fakeX to x, the predictor name the model expects
  nest(fakeX = x) %>%  # the list column is called fakeX, but its inner column is x, otherwise predict will not work
  ungroup()

nestedData <- merge(nestedData1,nestedData2)   # 3 objects in dataset are  type, fakeX, data
    
  
  
# Run model for each type ----

nestedData <- nestedData %>%
  mutate(
    model = map(.x = data, ~ lm(y ~ x, data = .x))
  )

## R complains that each model is statistically meaningless (only two points per fit) - not a problem if I had many data points

# Predict values using fake data ----

final_nested <- nestedData %>%
  mutate(
    yEst = map2(
      .x = model,
      .y = fakeX, # this line passes 'x' into the model
      .f = ~ predict(object = .x, newdata = .y)  # newdata not newData
    )
  )

final_nested



# check out the modelling
summary(nestedData[[1,"model"]])  # works 1 line
summary(nestedData[[2,"model"]])  # works 1 line
predict(object=nestedData[[1,"model"]],newdata = data.frame(x=6)) # 6*1 + 10 = 16


  # Select and unnest numeric columns ----


final_nested %>%
  select(type,  fakeX, yEst) %>%
  unnest(cols = -type)                       # now works


You are right regarding the typo in my code. Indeed, the argument is newdata and not newData. And glad you have been able to make it work to your liking. You should mark your own code as the solution.
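For reference, the layout asked for in the original question (one row per data point, each row paired with its model) can also be handled without nesting the new points. This is a minimal sketch rather than the method used above: it assumes map2_dbl from purrr, and the names models and result are hypothetical. Each x is wrapped in a one-row data frame so that predict receives a newdata column named x.

```r
library(dplyr)
library(purrr)
library(tidyr)

theData  <- data.frame(type = c(1, 1, 2, 2), x = c(1, 2, 3, 4), y = c(11, 12, 13, 14))
evaluate <- data.frame(type = c(1, 1, 2, 2), x = c(3, 4, 5, 6))

# One model per type, stored in a list column
models <- theData %>%
  nest(data = c(x, y)) %>%
  mutate(myModel = map(data, ~ lm(y ~ x, data = .x))) %>%
  select(type, myModel)

# Attach the matching model to every row, then predict row by row
result <- evaluate %>%
  left_join(models, by = "type") %>%
  mutate(
    yEst = map2_dbl(myModel, x, ~ predict(.x, newdata = data.frame(x = .y)))
  ) %>%
  select(x, yEst)

result
```

With the thread's data each model is y = x + 10, so yEst should come out as 13, 14, 15 and 16 for x = 3, 4, 5 and 6, matching the output requested in the question.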
