unclass() command

Hi all,
I must be misunderstanding what the unclass() function does. I thought it worked on categorical columns whose values come from a small list of choices. R, being efficient, doesn't store the character strings over and over again, but rather stores an integer code that is keyed to the strings.

My example is a theoretical candies sample, with 250 candies from a plain M&Ms package and the same number from a generic Brand X package. You can get this 500-row data table here.

The two columns are Color and Type, both categorical, and I've executed as.factor() commands on both. Yet when I try unclass(Candies$Color), I get a list of 500 strings. I assumed I'd get 500 numbers and a list of levels.

Here is what I get when I unclass() a data frame column that is a factor.

DF <- data.frame(A = LETTERS[1:5])
str(DF)
#> 'data.frame':    5 obs. of  1 variable:
#>  $ A: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5

unclass(DF$A)
#> [1] 1 2 3 4 5
#> attr(,"levels")
#> [1] "A" "B" "C" "D" "E"

Created on 2019-09-25 by the reprex package (v0.2.1)

I can't reproduce your issue. With your sample data I get exactly what you expected, integer codes plus a list of levels:

url <- "https://uca802a4496fd381ad33b671b0f2.dl.dropboxusercontent.com/cd/0/get/ApNAeb2WJFLQGrMIUyaOgt3ScQWIWwYYlNqC7vHoIw5Sm4w8gYgxkcoNJC4TNR6aRgjiiCN0PFQHyinnzKfSJzredBqF1GcLD1TRR8puGKrrlg/file?_download_id=45217081711881946608196355865125572930518665905038785490700574325&_notify_domain=www.dropbox.com&dl=1"
candies <- read.csv(url)
unclass(candies$Color)
#>   [1] 2 5 5 1 3 5 5 4 3 4 2 3 6 3 1 6 2 6 5 3 2 6 6 3 6 2 2 1 1 2 1 6 6 1 2
#>  [36] 6 1 4 6 1 5 5 5 5 1 6 1 1 2 6 5 5 4 2 2 4 2 2 6 6 5 5 2 5 1 5 6 2 4 5
#>  [71] 5 4 2 5 6 3 2 1 5 2 2 2 5 6 2 2 2 2 5 1 2 5 4 2 2 1 2 3 3 2 5 2 2 5 3
#> [106] 5 2 6 6 6 6 2 5 5 4 5 2 5 2 3 1 6 5 1 5 5 3 5 4 5 2 3 4 5 4 5 6 2 6 6
#> [141] 1 2 2 6 3 2 5 2 6 4 5 5 2 3 3 2 2 6 4 5 5 6 3 6 3 5 5 2 2 6 3 4 4 5 1
#> [176] 2 2 6 1 6 5 2 5 5 2 1 5 6 1 1 6 6 2 6 2 6 2 6 2 1 1 4 3 5 5 5 2 5 5 3
#> [211] 6 3 5 5 1 3 1 2 3 3 5 5 5 6 2 3 6 1 5 5 6 3 2 6 6 6 6 5 1 3 2 4 6 4 6
#> [246] 2 4 5 6 5 6 5 4 1 4 3 5 6 1 1 5 5 6 1 6 4 5 3 2 2 5 5 6 6 2 5 1 4 5 5
#> [281] 6 4 6 4 2 1 1 3 5 4 2 4 5 5 2 5 6 6 2 4 1 2 6 5 5 2 2 6 2 1 4 4 2 4 6
#> [316] 4 5 5 1 5 1 5 1 4 5 4 5 5 2 2 2 5 6 5 1 6 1 2 4 6 5 4 3 2 5 4 6 5 2 4
#> [351] 6 4 6 2 1 2 5 6 6 5 2 6 2 4 2 1 5 4 6 4 1 5 1 5 4 5 5 3 2 2 5 2 5 2 6
#> [386] 2 4 6 2 4 5 5 5 5 4 3 2 6 5 2 2 5 6 4 5 1 1 6 2 6 1 2 5 4 2 2 3 2 1 5
#> [421] 2 2 2 1 6 4 6 6 6 4 4 2 5 2 4 6 5 5 6 1 5 3 2 4 5 2 4 6 6 2 2 5 1 6 5
#> [456] 2 1 2 5 3 6 6 5 4 2 5 4 5 5 5 2 1 6 5 5 2 5 2 1 4 4 6 5 5 2 2 6 4 2 4
#> [491] 2 5 6 4 5 2 5 4 6 5
#> attr(,"levels")
#> [1] "blue"   "brown"  "green"  "orange" "red"    "yellow"

From the docs

unclass returns (a copy of) its argument with its class attribute removed. (It is not allowed for objects which cannot be copied, namely environments and external pointers.)
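
To make the quoted sentence concrete, here is a minimal sketch on a small made-up factor (the vector `f` is purely illustrative and not part of the candies data). It shows that a factor is an integer vector carrying a "levels" attribute and a class attribute, and that unclass() drops only the class:

```r
# Hypothetical four-element factor, used only for illustration
f <- factor(c("red", "blue", "red", "green"))

class(f)    # "factor"
typeof(f)   # "integer": the underlying storage is integer codes

u <- unclass(f)

class(u)        # "integer": the class attribute is gone
attributes(u)   # only the "levels" attribute remains
as.integer(u)   # the bare codes, indexing into the sorted levels
```

The levels are sorted alphabetically by default, so the codes index into "blue", "green", "red" rather than following the order of first appearance.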

The initial read of the csv file returns an object of class data.frame, and as.factor() converts each of its columns to class factor.

'data.frame':	500 obs. of  2 variables:
 $ Color: Factor w/ 6 levels "blue","brown",..: 2 5 5 1 3 5 5 4 3 4 ...
 $ Type : Factor w/ 2 levels "Brand X","M&Ms": 2 2 2 2 2 2 2 2 2 2 ...

After unclass() on the whole data frame, the result is a plain list

str(object1)
List of 2
 $ Color: Factor w/ 6 levels "blue","brown",..: 2 5 5 1 3 5 5 4 3 4 ...
 $ Type : Factor w/ 2 levels "Brand X","M&Ms": 2 2 2 2 2 2 2 2 2 2 ...
 - attr(*, "row.names")= int [1:500] 1 2 3 4 5 6 7 8 9 10 ...

which gives you lists of 500 colors, 500 brands and 500 row names.
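
The difference between unclassing the whole data frame and unclassing a single column can be sketched as follows. The two-row data frame `df` is a made-up stand-in for the candies data, and `stringsAsFactors = TRUE` is set explicitly because the default changed in R 4.0:

```r
# Hypothetical miniature of the candies table
df <- data.frame(
  Color = c("red", "blue"),
  Type  = c("M&Ms", "Brand X"),
  stringsAsFactors = TRUE
)

# Unclassing the data frame strips "data.frame" and leaves a plain list;
# the columns inside it are still factors.
whole <- unclass(df)
is.list(whole)         # TRUE
is.data.frame(whole)   # FALSE
class(whole$Color)     # "factor"

# Unclassing one column strips "factor" and exposes the integer codes.
column <- unclass(df$Color)
is.integer(column)     # TRUE
attr(column, "levels")
```

So the integer codes only appear when unclass() is applied to the factor column itself, not to the data frame that holds it.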

I read the docs before I posted, but thanks for including it so others can see it.

This is exactly what I expected, except my unclass() call gives a list of strings rather than a list of numbers.

I do note that your initial read of the csv file yields numbers and mine yields strings. I used an import method from RStudio, so perhaps the difference lies in the underlying csv import that RStudio executes. Could you specify what you used to import the csv?
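
One possible explanation, offered only as an assumption since the original import code isn't shown: if the import leaves Color as a character column, and as.factor() is called without assigning its result back, the column stays character and unclass() simply returns the strings. A minimal sketch with a made-up four-row table (`stringsAsFactors = FALSE` mimics an import that keeps strings):

```r
# Hypothetical data imported with strings kept as character
candies <- data.frame(
  Color = c("brown", "red", "red", "blue"),
  stringsAsFactors = FALSE
)

# Calling as.factor() without assignment does not modify the column
invisible(as.factor(candies$Color))
is.character(candies$Color)            # TRUE: still strings
is.character(unclass(candies$Color))   # TRUE: unclass() has no class to strip

# Assigning the result back makes the column a factor
candies$Color <- as.factor(candies$Color)
codes <- unclass(candies$Color)
is.integer(codes)                      # TRUE: integer codes plus levels
```

Checking class(candies$Color) right before the unclass() call would confirm or rule this out.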

Sorry, I was just setting the table with the quote; it wasn't intended as an RTFM.

I used read.csv on the downloaded file, but @andresrcs has the better method.
