Can I nest an extremely untidy data frame?


#1

Apologies for the unspecific question, but I'm trying to mold my data to work within a Bradley-Terry model in BradleyTerry2.

My issue is that, to account for additional variables in the model, I need to do some funky data manipulations, which include "mutating" (but not with mutate()) data frames as "columns".

I'm trying to then nest() this data based on season, but because of this weird structure, it won't work.

Any ideas? I attached reproducible code below, though it's in wrapr::build_frame() form. Also, if anybody has any ideas how to tidy the data so it'll work in a Bradley-Terry format, I'd love to hear it.

model_data <- wrapr::build_frame(
   "season"  , "away_player"     , "home_player"        , "faceoff_winner" |
   "20132014", "TOMAS.PLEKANEC"  , "KYLE.TURRIS"        , 1                |
   "20162017", "RYAN.JOHANSEN"   , "MARCUS.KRUGER"      , 0                |
   "20072008", "KAMIL.KREPS"     , "ADAM.MAIR"          , 0                |
   "20132014", "JORDAN.STAAL"    , "VERNON.FIDDLER"     , 1                |
   "20122013", "RICH.PEVERLEY"   , "KEVIN.PORTER"       , 0                |
   "20142015", "TOMAS.PLEKANEC"  , "KYLE.PALMIERI"      , 0                |
   "20152016", "LOGAN.COUTURE"   , "NICK.BONINO"        , 1                |
   "20162017", "HENRIK.SEDIN"    , "ALEX.WENNBERG"      , 0                |
   "20162017", "ALAN.QUINE"      , "RYAN.O'REILLY"      , 1                |
   "20152016", "BOYD.GORDON"     , "MIKAEL.GRANLUND"    , 0                |
   "20152016", "NATHAN.MACKINNON", "MATT.STAJAN"        , 1                |
   "20112012", "RYAN.KESLER"     , "DARREN.HELM"        , 0                |
   "20172018", "LUCAS.WALLMARK"  , "BRAYDEN.SCHENN"     , 0                |
   "20112012", "MAXIM.LAPIERRE"  , "ADAM.HENRIQUE"      , 0                |
   "20092010", "DAYMOND.LANGKOW" , "MATTHEW.LOMBARDI"   , 1                |
   "20172018", "NOLAN.PATRICK"   , "DEVIN.SHORE"        , 1                |
   "20112012", "BLAIR.JONES"     , "MATT.CULLEN"        , 1                |
   "20092010", "CLAUDE.GIROUX"   , "PATRIK.ELIAS"       , 1                |
   "20142015", "DERICK.BRASSARD" , "ANTOINE.VERMETTE"   , 1                |
   "20132014", "DAVID.LEGWAND"   , "BRAD.RICHARDSON"    , 1                |
   "20152016", "EVGENY.KUZNETSOV", "ANZE.KOPITAR"       , 0                |
   "20162017", "DAVID.KREJCI"    , "NICK.COUSINS"       , 0                |
   "20162017", "PAUL.STASTNY"    , "SERGEY.KALININ"     , 0                |
   "20162017", "KEVIN.HAYES"     , "JEAN-GABRIEL.PAGEAU", 1                |
   "20112012", "KEITH.AUCOIN"    , "JOHN.MITCHELL"      , 1                )

library(dplyr)  # provides %>% and as_tibble()
library(tidyr)  # provides nest(), used further down

model_data %>% as_tibble()

Trying to account for more variables

## Ugly -- but necessary -- step in adding in home-ice effects
model_data$home <- data.frame(name = model_data$home_player, at_home = 1)
model_data$away <- data.frame(name = model_data$away_player, at_home = 0)
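
One thing that may matter once these data frame columns are handed to BradleyTerry2: as far as I understand its documentation, the two player arguments need an ID factor with the same set of levels, named by the `id` argument. The sketch below is only an illustration of that preparation step in base R, not a tested model fit. It uses a three-row stand-in copied from the data above, and `all_players` is a name made up for this example.

```r
# Three-row stand-in for model_data, copied from the example data above
model_data <- data.frame(
  season = c("20132014", "20162017", "20142015"),
  away_player = c("TOMAS.PLEKANEC", "RYAN.JOHANSEN", "TOMAS.PLEKANEC"),
  home_player = c("KYLE.TURRIS", "MARCUS.KRUGER", "KYLE.PALMIERI"),
  faceoff_winner = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# One set of levels covering every player, whether listed home or away
# (all_players is a hypothetical helper name)
all_players <- sort(unique(c(model_data$home_player, model_data$away_player)))

# Same data frame columns as above, but with name as a factor on shared levels
model_data$home <- data.frame(
  name = factor(model_data$home_player, levels = all_players),
  at_home = 1
)
model_data$away <- data.frame(
  name = factor(model_data$away_player, levels = all_players),
  at_home = 0
)

# Check that both ID factors share the same levels
identical(levels(model_data$home$name), levels(model_data$away$name))
```

With R 4.0 or later, data.frame() no longer converts strings to factors by default, so making the factor explicit avoids depending on the R version.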

Trying (and failing) to nest

model_data %>% nest(-season)

#2

I don't know much about Bradley-Terry model data. It seems to need a very specific structure, with data frames nested as columns of a data.frame, and I don't think that is very compatible with the tidyverse tibble format.

If you make sure your data is not a tibble (as.data.frame(model_data)), tidyr::nest does not seem to throw an error. Tidyverse verbs generally work on both data.frames and tibbles.

You could try that and see if it suits you.

Otherwise, I would advise using base R to build your data with list and data.frame. That said, tidyverse's purrr may help you manipulate the list structure.

Hope it helps.
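
As a rough idea of what the base R route could look like, here is a minimal sketch that uses split() to get one data frame per season, keeping the data frame columns intact. It runs on a four-row stand-in copied from the question's data, and `by_season` is just a name chosen for the example.

```r
# Four-row stand-in for model_data, copied from the question's data
model_data <- data.frame(
  season = c("20132014", "20162017", "20132014", "20162017"),
  away_player = c("TOMAS.PLEKANEC", "RYAN.JOHANSEN", "JORDAN.STAAL", "HENRIK.SEDIN"),
  home_player = c("KYLE.TURRIS", "MARCUS.KRUGER", "VERNON.FIDDLER", "ALEX.WENNBERG"),
  faceoff_winner = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Same home-ice step as in the question
model_data$home <- data.frame(name = model_data$home_player, at_home = 1)
model_data$away <- data.frame(name = model_data$away_player, at_home = 0)

# split() returns a named list with one data frame per season
by_season <- split(model_data, model_data$season)

names(by_season)
sapply(by_season, nrow)

# The data frame columns survive inside each piece
is.data.frame(by_season[["20162017"]]$home)
```

From there, lapply() (or purrr::map()) can apply the same model-fitting function to each element of the list.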


# I don't have wrapr and tribble does the job
model_data <- tibble::tribble(
  ~ "season"  , ~ "away_player"   , ~ "home_player"      ,~ "faceoff_winner",
    "20132014", "TOMAS.PLEKANEC"  , "KYLE.TURRIS"        , 1            ,
    "20162017", "RYAN.JOHANSEN"   , "MARCUS.KRUGER"      , 0            ,
    "20072008", "KAMIL.KREPS"     , "ADAM.MAIR"          , 0            ,
    "20132014", "JORDAN.STAAL"    , "VERNON.FIDDLER"     , 1            ,
    "20122013", "RICH.PEVERLEY"   , "KEVIN.PORTER"       , 0            ,
    "20142015", "TOMAS.PLEKANEC"  , "KYLE.PALMIERI"      , 0            ,
    "20152016", "LOGAN.COUTURE"   , "NICK.BONINO"        , 1            ,
    "20162017", "HENRIK.SEDIN"    , "ALEX.WENNBERG"      , 0            ,
    "20162017", "ALAN.QUINE"      , "RYAN.O'REILLY"      , 1            ,
    "20152016", "BOYD.GORDON"     , "MIKAEL.GRANLUND"    , 0            ,
    "20152016", "NATHAN.MACKINNON", "MATT.STAJAN"        , 1            ,
    "20112012", "RYAN.KESLER"     , "DARREN.HELM"        , 0            ,
    "20172018", "LUCAS.WALLMARK"  , "BRAYDEN.SCHENN"     , 0            ,
    "20112012", "MAXIM.LAPIERRE"  , "ADAM.HENRIQUE"      , 0            ,
    "20092010", "DAYMOND.LANGKOW" , "MATTHEW.LOMBARDI"   , 1            ,
    "20172018", "NOLAN.PATRICK"   , "DEVIN.SHORE"        , 1            ,
    "20112012", "BLAIR.JONES"     , "MATT.CULLEN"        , 1            ,
    "20092010", "CLAUDE.GIROUX"   , "PATRIK.ELIAS"       , 1            ,
    "20142015", "DERICK.BRASSARD" , "ANTOINE.VERMETTE"   , 1            ,
    "20132014", "DAVID.LEGWAND"   , "BRAD.RICHARDSON"    , 1            ,
    "20152016", "EVGENY.KUZNETSOV", "ANZE.KOPITAR"       , 0            ,
    "20162017", "DAVID.KREJCI"    , "NICK.COUSINS"       , 0            ,
    "20162017", "PAUL.STASTNY"    , "SERGEY.KALININ"     , 0            ,
    "20162017", "KEVIN.HAYES"     , "JEAN-GABRIEL.PAGEAU", 1            ,
    "20112012", "KEITH.AUCOIN"    , "JOHN.MITCHELL"      , 1            )

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.4.4
# Make sure it is data.frame
model_data <- model_data %>% as.data.frame()
model_data$home <- data.frame(name = model_data$home_player, at_home = 1)
model_data$away <- data.frame(name = model_data$away_player, at_home = 0)
model_data %>% tidyr::nest(-season) %>% str(2)
#> Warning: package 'bindrcpp' was built under R version 3.4.4
#> 'data.frame':    9 obs. of  2 variables:
#>  $ season: chr  "20132014" "20162017" "20072008" "20122013" ...
#>  $ data  :List of 9
#>   ..$ :'data.frame': 3 obs. of  5 variables:
#>   ..$ :'data.frame': 6 obs. of  5 variables:
#>   ..$ :'data.frame': 1 obs. of  5 variables:
#>   ..$ :'data.frame': 1 obs. of  5 variables:
#>   ..$ :'data.frame': 2 obs. of  5 variables:
#>   ..$ :'data.frame': 4 obs. of  5 variables:
#>   ..$ :'data.frame': 4 obs. of  5 variables:
#>   ..$ :'data.frame': 2 obs. of  5 variables:
#>   ..$ :'data.frame': 2 obs. of  5 variables:

Created on 2018-07-26 by the reprex package (v0.2.0).


#3

Awesome -- it seems to work well! By the way, adding as_tibble() to the end of the last pipe turns the result back into a tibble and makes the printed output less verbose.

So:

model_data %>% tidyr::nest(-season) %>% as_tibble()

#4

Cool! I think this is why people write broom tidiers: it is so complicated to get model inputs and outputs into the structure you'd like!