NULL factor levels with readr data-read

I'm in hour 7 of writing this book today, because deadline, which is certainly a problem, and I am not thinking straight. I'm also not sure if I should ask this question here or start somewhere else. I have tried searching to make sure I'm not duplicating another question, but everything I read is either people calling the levels() function on the wrong kind of object (like a data frame) or technical discussions that I can't follow (and that don't seem to apply to me).

It's also so elementary I'm a little ashamed to ask.

Loaded:
library(tidyverse)
library(datatables)
library(forcats)

These all work great:

View(chickwts)
names(chickwts)
[1] "weight" "feed"
levels(chickwts$feed)
[1] "casein" "horsebean" "linseed" "meatmeal" "soybean" "sunflower"

I have an artificial data set whose dput() I'll attach below. The filename is M&Ms.csv, with two variables: M&Ms and Brand X. 250 rows of colors, headed toward a test whether the two brands adhere to the M&Ms published proportion.

I load the data in with readr, using RStudio's import menu.

names(M_Ms)
[1] "M&Ms" "BrandX"
levels(M_Ms$`M&Ms`)
NULL
levels(M_Ms$BrandX)
NULL

Why the NULLs? I can't see a difference between the two data sets.

I've read ?levels and ?readr and the only thing I see is the following.

x : an object, for example, a factor.
value : A valid value for levels(x) . For the default method, NULL or a character vector.

By being inside the datasets package(?), chickwts is returning its variables' levels, and M&Ms isn't. By using readr, I'm making a tibble, which I thought was an extension of a data.frame, so I'm not sure why levels() doesn't work.

Even though these aren't really factor variables, do I need to make them factors (like Chapter 15 in R for Data Science)? Somehow I thought tidyverse lifted that burden from me.

structure(list(`M&Ms` = c("brown", "red", "red", "blue", "green",
"red", "red", "orange", "green", "orange", "brown", "green",
"yellow", "green", "blue", "yellow", "brown", "yellow", "red",
"green", "brown", "yellow", "yellow", "green", "yellow", "brown",
"brown", "blue", "blue", "brown", "blue", "yellow", "yellow",
"blue", "brown", "yellow", "blue", "orange", "yellow", "blue",
"red", "red", "red", "red", "blue", "yellow", "blue", "blue",
"brown", "yellow", "red", "red", "orange", "brown", "brown",
"orange", "brown", "brown", "yellow", "yellow", "red", "red",
"brown", "red", "blue", "red", "yellow", "brown", "orange", "red",
"red", "orange", "brown", "red", "yellow", "green", "brown",
"blue", "red", "brown", "brown", "brown", "red", "yellow", "brown",
"brown", "brown", "brown", "red", "blue", "brown", "red", "orange",
"brown", "brown", "blue", "brown", "green", "green", "brown",
"red", "brown", "brown", "red", "green", "red", "brown", "yellow",
"yellow", "yellow", "yellow", "brown", "red", "red", "orange",
"red", "brown", "red", "brown", "green", "blue", "yellow", "red",
"blue", "red", "red", "green", "red", "orange", "red", "brown",
"green", "orange", "red", "orange", "red", "yellow", "brown",
"yellow", "yellow", "blue", "brown", "brown", "yellow", "green",
"brown", "red", "brown", "yellow", "orange", "red", "red", "brown",
"green", "green", "brown", "brown", "yellow", "orange", "red",
"red", "yellow", "green", "yellow", "green", "red", "red", "brown",
"brown", "yellow", "green", "orange", "orange", "red", "blue",
"brown", "brown", "yellow", "blue", "yellow", "red", "brown",
"red", "red", "brown", "blue", "red", "yellow", "blue", "blue",
"yellow", "yellow", "brown", "yellow", "brown", "yellow", "brown",
"yellow", "brown", "blue", "blue", "orange", "green", "red",
"red", "red", "brown", "red", "red", "green", "yellow", "green",
"red", "red", "blue", "green", "blue", "brown", "green", "green",
"red", "red", "red", "yellow", "brown", "green", "yellow", "blue",
"red", "red", "yellow", "green", "brown", "yellow", "yellow",
"yellow", "yellow", "red", "blue", "green", "brown", "orange",
"yellow", "orange", "yellow", "brown", "orange", "red", "yellow",
"red"), BrandX = c("yellow", "red", "orange", "blue", "orange",
"green", "red", "yellow", "blue", "blue", "red", "red", "yellow",
"blue", "yellow", "orange", "red", "green", "brown", "brown",
"red", "red", "yellow", "yellow", "brown", "red", "blue", "orange",
"red", "red", "yellow", "orange", "yellow", "orange", "brown",
"blue", "blue", "green", "red", "orange", "brown", "orange",
"red", "red", "brown", "red", "yellow", "yellow", "brown", "orange",
"blue", "brown", "yellow", "red", "red", "brown", "brown", "yellow",
"brown", "blue", "orange", "orange", "brown", "orange", "yellow",
"orange", "red", "red", "blue", "red", "blue", "red", "blue",
"orange", "red", "orange", "red", "red", "brown", "brown", "brown",
"red", "yellow", "red", "blue", "yellow", "blue", "brown", "orange",
"yellow", "red", "orange", "green", "brown", "red", "orange",
"yellow", "red", "brown", "orange", "yellow", "orange", "yellow",
"brown", "blue", "brown", "red", "yellow", "yellow", "red", "brown",
"yellow", "brown", "orange", "brown", "blue", "red", "orange",
"yellow", "orange", "blue", "red", "blue", "red", "orange", "red",
"red", "green", "brown", "brown", "red", "brown", "red", "brown",
"yellow", "brown", "orange", "yellow", "brown", "orange", "red",
"red", "red", "red", "orange", "green", "brown", "yellow", "red",
"brown", "brown", "red", "yellow", "orange", "red", "blue", "blue",
"yellow", "brown", "yellow", "blue", "brown", "red", "orange",
"brown", "brown", "green", "brown", "blue", "red", "brown", "brown",
"brown", "blue", "yellow", "orange", "yellow", "yellow", "yellow",
"orange", "orange", "brown", "red", "brown", "orange", "yellow",
"red", "red", "yellow", "blue", "red", "green", "brown", "orange",
"red", "brown", "orange", "yellow", "yellow", "brown", "brown",
"red", "blue", "yellow", "red", "brown", "blue", "brown", "red",
"green", "yellow", "yellow", "red", "orange", "brown", "red",
"orange", "red", "red", "red", "brown", "blue", "yellow", "red",
"red", "brown", "red", "brown", "blue", "orange", "orange", "yellow",
"red", "red", "brown", "brown", "yellow", "orange", "brown",
"orange", "brown", "red", "yellow", "orange", "red", "brown",
"red", "orange", "yellow", "red")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -250L), spec = structure(list(
cols = list(`M&Ms` = structure(list(), class = c("collector_character",
"collector")), BrandX = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector"))), class = "col_spec"))

The short answer is yes, they need to be factors. Tidyverse/readr doesn't lift the burden of factors from you; rather, it puts all of that within your control and makes zero assumptions. readr reads text columns in as character vectors, and levels() returns NULL for anything that isn't a factor, so you need to convert a variable to a factor first.
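A quick way to see the difference between the two data sets is to check the class of each column. This is only a minimal sketch: `candy` is a made-up stand-in for the imported data (a plain data frame rather than a tibble, since the column type is what matters here).

```r
# Hypothetical stand-in for the imported data; values are the first
# three BrandX entries from the dput() in the question.
candy <- data.frame(
  BrandX = c("yellow", "red", "orange"),
  stringsAsFactors = FALSE
)

class(chickwts$feed)     # a factor, so it carries a levels attribute
class(candy$BrandX)      # a character vector, so it has no levels

is.factor(candy$BrandX)  # FALSE
levels(candy$BrandX)     # NULL
```

The same thing happens if `chickwts$feed` is turned back into a character vector: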

library(tidyverse)

nochickwts <- chickwts %>% mutate(feed = as.character(feed))

levels(nochickwts$feed)
#> NULL

levels(chickwts$feed)
#> [1] "casein"    "horsebean" "linseed"   "meatmeal"  "soybean"   "sunflower"

factor(nochickwts$feed)
#>  [1] horsebean horsebean horsebean horsebean horsebean horsebean horsebean
#>  [8] horsebean horsebean horsebean linseed   linseed   linseed   linseed  
#> [15] linseed   linseed   linseed   linseed   linseed   linseed   linseed  
#> [22] linseed   soybean   soybean   soybean   soybean   soybean   soybean  
#> [29] soybean   soybean   soybean   soybean   soybean   soybean   soybean  
#> [36] soybean   sunflower sunflower sunflower sunflower sunflower sunflower
#> [43] sunflower sunflower sunflower sunflower sunflower sunflower meatmeal 
#> [50] meatmeal  meatmeal  meatmeal  meatmeal  meatmeal  meatmeal  meatmeal 
#> [57] meatmeal  meatmeal  meatmeal  casein    casein    casein    casein   
#> [64] casein    casein    casein    casein    casein    casein    casein   
#> [71] casein   
#> Levels: casein horsebean linseed meatmeal soybean sunflower

# You'll want to use `as.factor()` within a `mutate`, and save over the original variable.
newchickwts <- nochickwts %>% mutate(feed = as.factor(feed))

# Examine levels
newchickwts %>% 
  pull(feed) %>% 
  levels()
#> [1] "casein"    "horsebean" "linseed"   "meatmeal"  "soybean"   "sunflower"

Created on 2019-02-14 by the reprex package (v0.2.1)
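For the M&Ms file itself, the same idea might look roughly like the sketch below. It is not taken from your actual import code: the temporary CSV is a four-row miniature built from the first rows of your dput(), and `csv_path`, `as_factors`, `as_text`, and `converted` are names made up for the illustration. It shows two options, asking readr for factors at import time with `col_factor()`, or reading as character and converting every text column afterwards.

```r
library(readr)
library(dplyr)

# Hypothetical miniature of the CSV; the real file has 250 rows.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "M&Ms,BrandX",
  "brown,yellow",
  "red,red",
  "red,orange",
  "blue,blue"
), csv_path)

# Option 1: ask readr for factors at import time.
# With no `levels` argument, col_factor() takes the levels from the data.
as_factors <- read_csv(csv_path, col_types = cols(.default = col_factor()))
levels(as_factors$`M&Ms`)
levels(as_factors$BrandX)

# Option 2: read as character, then convert all text columns at once.
# across() assumes dplyr >= 1.0.0.
as_text <- read_csv(csv_path, col_types = cols(.default = col_character()))
converted <- as_text %>%
  mutate(across(where(is.character), as.factor))
levels(converted$BrandX)
```

One difference to be aware of: as far as I understand the readr documentation, `col_factor()` keeps levels in the order they first appear in the file, while `as.factor()` sorts them alphabetically. If the level order matters for your test, it is safer to pass the levels explicitly.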

Thanks @apreshill! Wonderful explanation and great code.

You are very welcome, and best of luck with the book.

On the subject of teaching scope, as a developmental psychologist, I always think there are links between teaching code and what we know about how children learn words. Starting with one way is like teaching that :rabbit: = bunny. Later on, you might extend this label into a category, to understand that :rabbit: could be a bunny, a rabbit, Peter, or a hare; all are correct and different ways to refer to :rabbit:. But I probably wouldn't start there.

Also a plug for bookdown if you are writing an R book :slight_smile:

Also linking here to the new forcats cheatsheet:
https://github.com/rstudio/cheatsheets/blob/master/factors.pdf
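As a small taste of what forcats adds once a column is a factor, here is a hedged sketch using three of its helpers. `candy_colors`, `by_freq`, and `brown_first` are names invented for the example, and the values are just the first six M&Ms entries from the dput() in the question.

```r
library(forcats)

# First six M&Ms values from the question's dput()
candy_colors <- factor(c("brown", "red", "red", "blue", "green", "red"))

# Count how often each level occurs (returns a tibble with a column `n`)
fct_count(candy_colors)

# Reorder levels so the most common color comes first
by_freq <- fct_infreq(candy_colors)
levels(by_freq)[1]

# Move one level to the front, leaving the others in their existing order
brown_first <- fct_relevel(candy_colors, "brown")
levels(brown_first)
```

None of these change the data values; they only change which levels exist and in what order, which is what tables and plots pick up on.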
