tidyr::nest() isn't nesting on the column I expect


#1

I have a sample of a United Nations data set. I am trying to nest it so that there are two columns, one with the country name and one list-column holding the data for each country, like so:

# A tibble: 200 x 2
   country                         data             
   <chr>                           <list>           
 1 Afghanistan                     <tibble [34 x 3]>
 2 Argentina                       <tibble [34 x 3]>
 3 Australia                       <tibble [34 x 3]>
 4 Belarus                         <tibble [34 x 3]>
 5 Belgium                         <tibble [34 x 3]>

When I run the command inside DataCamp, it works as expected.

However, when I run it in RStudio, each year shows up as a row. I tried nest(-country), as well as simply listing all the variables I wanted to nest (leaving country out); in both cases year was left out of the nest instead of country.

library(tidyverse)
#> Warning: package 'tidyr' was built under R version 3.4.4
#> Warning: package 'purrr' was built under R version 3.4.4
#> Warning: package 'dplyr' was built under R version 3.4.4
#> Warning: package 'stringr' was built under R version 3.4.4
#> Warning: package 'forcats' was built under R version 3.4.4
library(countrycode)
#> Warning: package 'countrycode' was built under R version 3.4.4
library(broom)
#> Warning: package 'broom' was built under R version 3.4.4
library(tidyr)
library(reprex)

# Subset of data (put together using dput)

Sample_Data <-
    structure(list(
        year = structure(c(1997, 1997, 1997, 1999, 1999, 1999, 2001, 2001, 2001,
                           2003, 2003, 2003, 2005, 2005, 2005, 2007, 2007, 2007,
                           2009, 2009, 2009, 2011, 2011, 2011, 2013, 2013, 2013),
                         comment = ""),
        country = c("France", "United Kingdom", "United States",
                    "France", "United Kingdom", "United States",
                    "France", "United Kingdom", "United States",
                    "France", "United Kingdom", "United States",
                    "France", "United Kingdom", "United States",
                    "France", "United Kingdom", "United States",
                    "France", "United Kingdom", "United States",
                    "France", "United Kingdom", "United States",
                    "France", "United Kingdom", "United States"),
        total = c(69L, 69L, 69L, 68L, 68L, 68L, 65L, 67L, 67L,
                  76L, 76L, 76L, 74L, 74L, 73L, 76L, 77L, 77L,
                  69L, 69L, 69L, 65L, 65L, 65L, 64L, 64L, 64L),
        percent_yes = c(0.565217391304348, 0.565217391304348, 0.289855072463768,
                        0.558823529411765, 0.558823529411765, 0.235294117647059,
                        0.538461538461538, 0.537313432835821, 0.164179104477612,
                        0.565789473684211, 0.513157894736842, 0.171052631578947,
                        0.581081081081081, 0.581081081081081, 0.164383561643836,
                        0.526315789473684, 0.506493506493506, 0.116883116883117,
                        0.478260869565217, 0.492753623188406, 0.188405797101449,
                        0.569230769230769, 0.553846153846154, 0.261538461538462,
                        0.515625, 0.5, 0.203125)),
        class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
        row.names = c(NA, -27L), vars = "year", drop = TRUE,
        .Names = c("year", "country", "total", "percent_yes"),
        indices = list(0:2, 3:5, 6:8, 9:11, 12:14, 15:17, 18:20, 21:23, 24:26),
        group_sizes = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
        biggest_group_size = 3L,
        labels = structure(list(
            year = structure(c(1997, 1999, 2001, 2003, 2005, 2007, 2009, 2011, 2013),
                             comment = "")),
            class = "data.frame", row.names = c(NA, -9L),
            vars = "year", drop = TRUE, .Names = "year"))

# Attempt at nesting

nested <- Sample_Data %>%
    nest(-country)
#> Warning: package 'bindrcpp' was built under R version 3.4.4

Created on 2018-06-19 by the reprex package (v0.2.0).


#2

Hi,

try this:

Sample_Data %>% ungroup() %>% nest(-country)

Sample_Data is still grouped by year, so nest() keeps the grouping variable (year) out of the nested data instead of country. Ungrouping first lets nest(-country) work as expected.
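
For illustration, here is a minimal sketch of the fix on a small made-up tibble (`votes` and its values are hypothetical, not the original data). It assumes tidyr 1.0.0 or later, where the nested columns are named explicitly as nest(data = -country); nest(-country) is the older spelling.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, grouped by year the same way Sample_Data is
votes <- tibble(
  year = c(1997, 1997, 1997, 1999, 1999, 1999),
  country = rep(c("France", "United Kingdom", "United States"), times = 2),
  percent_yes = c(0.57, 0.57, 0.29, 0.56, 0.56, 0.24)
) %>%
  group_by(year)

# Drop the grouping first, then nest everything except country
nested <- votes %>%
  ungroup() %>%
  nest(data = -country)

nrow(nested)             # one row per country
names(nested$data[[1]])  # year and percent_yes now sit inside the list-column
```

After ungroup(), country is the only column left outside the list-column, which gives the two-column layout asked for in the first post.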


#3

That was exactly my problem.

Thank you!


#4

I had this issue, too.

Curious, how did you know that Sample_Data was still grouped by year? Is there a function that I can run to check if a data frame is still grouped? Or, did you simply know that grouping was an issue due to the syntax of nest()?


#5

Printing a tibble reports any grouping variables if you're doing things interactively. You can also check the class of the tibble using class(Sample_Data), and that'll include "grouped_df" if it's grouped.

It was just bad luck that the DF was grouped in this instance when the original poster used dput() to replicate it!
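
To make the class check concrete, here is a small sketch on a made-up tibble (`votes` is hypothetical, not the original data). It also shows one way to keep the grouping out of a shared example: call ungroup() before passing the data to dput().

```r
library(dplyr)

# Hypothetical toy data, grouped by year
votes <- tibble(
  year = c(1997, 1997, 1999, 1999),
  country = c("France", "United States", "France", "United States")
) %>%
  group_by(year)

class(votes)                   # includes "grouped_df"
inherits(votes, "grouped_df")  # TRUE while the grouping is in place

# Ungroup before sharing, so dput(plain) would not record the grouping
plain <- ungroup(votes)
inherits(plain, "grouped_df")  # FALSE
```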


#6

A couple of ways to check:

  • Within the structure of the sample data, you can see that one of the classes specified is "grouped_df", with a vars attribute of "year".
  • If you run str(Sample_Data), you'll see the above as well.
  • Running dplyr::group_vars(Sample_Data) will return "year", or character(0) if there are no grouping variables.
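
The group_vars() check can be sketched like this on a made-up tibble (`votes` is hypothetical, not the original data). dplyr::is_grouped_df() is an extra helper not mentioned above; it returns a plain TRUE or FALSE.

```r
library(dplyr)

# Hypothetical toy data: one ungrouped copy and one grouped by year
votes <- tibble(
  year = c(1997, 1997, 1999, 1999),
  country = c("France", "United States", "France", "United States")
)
grouped <- group_by(votes, year)

group_vars(grouped)     # "year"
group_vars(votes)       # character(0)

is_grouped_df(grouped)  # TRUE
is_grouped_df(votes)    # FALSE
```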