Trying to Return and Get Stats from the Last Three Rows of a Grouping Variable

Hi All,
I am trying to return and average the last three sales of a group. Then I want to create a new column with those values within the data frame I already have. I'll post reprex below.

library(tidyverse)

original_blue_chips <- data.frame("Title" = c('Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Joe Namath', 'Joe Namath', 'Joe Namath', 'Joe Namath'),
                               "Sale_date" = c(2009-08-12,2010-0-15,2011-10-12, 2012-09-01, 2009-08-12,2010-09-15,2011-10-12, 2012-09-01), 
                               "Price" = c(124, 156, 533, 174, 234, 299, 495, 485))

what_i_want <- data.frame("Title" = c('Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Joe Namath', 'Joe Namath', 'Joe Namath', 'Joe Namath'),
                          "Sale_date" = c(2009-08-12,2010-09-15,2011-10-12, 2012-09-01, 2009-08-12,2010-09-15,2011-10-12, 2012-09-01), 
                          "Price" = c(124, 156, 533, 174, 234, 299, 495, 485),
                          "Market_Value" = c(288, 288, 288, 288, 426, 426, 426, 426))

How about something like this:

suppressPackageStartupMessages(library(tidyverse))

original_blue_chips <- tibble("Title" = c('Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Joe Namath', 'Joe Namath', 'Joe Namath', 'Joe Namath'),
                                  "Sale_date" = c("2009-08-12", "2010-09-15","2011-10-12", "2012-09-01", "2009-08-12","2010-09-15","2011-10-12", "2012-09-01"), 
                                  "Price" = c(124, 156, 533, 174, 234, 299, 495, 485)) %>%
  mutate(Sale_date = as.Date(Sale_date))

what_i_want <- tibble("Title" = c('Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Joe Namath', 'Joe Namath', 'Joe Namath', 'Joe Namath'),
                          "Sale_date" = c("2009-08-12", "2010-09-15","2011-10-12", "2012-09-01", "2009-08-12","2010-09-15","2011-10-12", "2012-09-01"), 
                          "Price" = c(124, 156, 533, 174, 234, 299, 495, 485),
                          "Market_Value" = c(288, 288, 288, 288, 426, 426, 426, 426)) %>%
  mutate(Sale_date = as.Date(Sale_date))

original_blue_chips
#> # A tibble: 8 x 3
#>   Title         Sale_date  Price
#>   <chr>         <date>     <dbl>
#> 1 Mickey Mantle 2009-08-12   124
#> 2 Mickey Mantle 2010-09-15   156
#> 3 Mickey Mantle 2011-10-12   533
#> 4 Mickey Mantle 2012-09-01   174
#> 5 Joe Namath    2009-08-12   234
#> 6 Joe Namath    2010-09-15   299
#> 7 Joe Namath    2011-10-12   495
#> 8 Joe Namath    2012-09-01   485

what_i_want
#> # A tibble: 8 x 4
#>   Title         Sale_date  Price Market_Value
#>   <chr>         <date>     <dbl>        <dbl>
#> 1 Mickey Mantle 2009-08-12   124          288
#> 2 Mickey Mantle 2010-09-15   156          288
#> 3 Mickey Mantle 2011-10-12   533          288
#> 4 Mickey Mantle 2012-09-01   174          288
#> 5 Joe Namath    2009-08-12   234          426
#> 6 Joe Namath    2010-09-15   299          426
#> 7 Joe Namath    2011-10-12   495          426
#> 8 Joe Namath    2012-09-01   485          426

output <- original_blue_chips %>%
  group_by(Title) %>%
  mutate(order = rank(Sale_date),
         is_last_3 = order > (max(order) - 2.5),
         Market_Value = mean(if_else(is_last_3 == TRUE, Price, NA_real_), na.rm = TRUE))


output
#> # A tibble: 8 x 6
#> # Groups:   Title [2]
#>   Title         Sale_date  Price order is_last_3 Market_Value
#>   <chr>         <date>     <dbl> <dbl> <lgl>            <dbl>
#> 1 Mickey Mantle 2009-08-12   124     1 FALSE             288.
#> 2 Mickey Mantle 2010-09-15   156     2 TRUE              288.
#> 3 Mickey Mantle 2011-10-12   533     3 TRUE              288.
#> 4 Mickey Mantle 2012-09-01   174     4 TRUE              288.
#> 5 Joe Namath    2009-08-12   234     1 FALSE             426.
#> 6 Joe Namath    2010-09-15   299     2 TRUE              426.
#> 7 Joe Namath    2011-10-12   495     3 TRUE              426.
#> 8 Joe Namath    2012-09-01   485     4 TRUE              426.

Created on 2019-08-14 by the reprex package (v0.3.0)

If you feel like it, you can also drop the no-longer-necessary columns with output %>% select(-order, -is_last_3)

1 Like

You can use the tail function calculate the mean of just the last 3 observations in each group. I've also sorted by Sale_date to ensure we're getting the most recent 3 sales for each group. For example:

library(tidyverse)

original_blue_chips <- tibble("Title" = c('Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Mickey Mantle', 'Joe Namath', 'Joe Namath', 'Joe Namath', 'Joe Namath'),
                              "Sale_date" = as.Date(c("2009-08-12", "2010-09-15","2011-10-12", "2012-09-01", "2009-08-12","2010-09-15","2011-10-12", "2012-09-01")), 
                              "Price" = c(124, 156, 533, 174, 234, 299, 495, 485))

original_blue_chips %>%
  arrange(Title, Sale_date) %>% 
  group_by(Title) %>% 
  mutate(Market_Value = mean(tail(Price, 3)))
  Title         Sale_date  Price Market_Value
  <chr>         <date>     <dbl>        <dbl>
1 Joe Namath    2009-08-12   234         426.
2 Joe Namath    2010-09-15   299         426.
3 Joe Namath    2011-10-12   495         426.
4 Joe Namath    2012-09-01   485         426.
5 Mickey Mantle 2009-08-12   124         288.
6 Mickey Mantle 2010-09-15   156         288.
7 Mickey Mantle 2011-10-12   533         288.
8 Mickey Mantle 2012-09-01   174         288.

The code is a little harder to read, but you can also index the most recent 3 dates in each group without ordering the data:

original_blue_chips %>%
  group_by(Title) %>% 
  mutate(Market_Value = mean(Price[rank(Sale_date) %in% (n()-2):n()]))
1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.