Hi all,
I must be misunderstanding what the unclass() function does. I thought it worked on categorical columns whose values are drawn from a small list of choices. R, being efficient, doesn't store the character strings over and over again, but rather stores an integer code that is keyed to the strings.
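To make that mental model concrete, here is a minimal sketch using a made-up vector of colors (not the actual Candies data). It only shows what a factor looks like underneath, assuming the column really is a factor.

```r
# Hypothetical color values, standing in for the real Candies$Color column
colors <- factor(c("Red", "Blue", "Red", "Green"))

# Levels are sorted alphabetically by default: "Blue" "Green" "Red"
levels(colors)

# unclass() drops the "factor" class, leaving the integer codes
# plus a "levels" attribute that maps codes back to strings
codes <- unclass(colors)
print(codes)

# The bare integer codes, without the attribute: 3 1 3 2
as.integer(colors)
```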
My example is a theoretical candies sample, with 250 candies from a plain M&Ms package and the same number from a generic Brand X package. You can get this 500-row data table here.
The two columns are Color and Type, both categorical, and I've executed as.factor() commands on both. Yet when I try unclass(Candies$Color), I get a list of 500 strings. I assumed I'd get 500 numbers and a list of levels.
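One possible cause, offered as a guess since the exact commands aren't shown in the post: as.factor() returns a new object and does not change the column in place, so the result has to be assigned back. The sketch below uses a small stand-in data frame with the same column names; the values are invented for illustration.

```r
# Small stand-in for the real 500-row table (values are hypothetical)
Candies <- data.frame(
  Color = c("Red", "Blue", "Red"),
  Type  = c("MM", "X", "MM"),
  stringsAsFactors = FALSE
)

# Calling as.factor() without assignment prints a factor,
# but Candies$Color itself is left unchanged
as.factor(Candies$Color)
before_class <- class(Candies$Color)   # still "character"
unclass(Candies$Color)                 # still a vector of strings

# Assign the result back so the column really becomes a factor
Candies$Color <- as.factor(Candies$Color)
class(Candies$Color)                   # "factor"
unclass(Candies$Color)                 # integer codes plus a levels attribute
```

Checking class(Candies$Color) right before the unclass() call should show whether this is what is happening.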
unclass returns (a copy of) its argument with its class attribute removed. (It is not allowed for objects which cannot be copied, namely environments and external pointers.)
The initial read of the csv file returns an object of class data.frame, and as.factor() applied to one of its columns returns that column as class factor.
I read the docs before I posted, but thanks for including it so others can see it.
This is exactly what I expected, except my unclass() call gives a list of strings rather than a list of numbers.
I do note that your initial read of the csv file has numbers and mine has strings. I used an import method from RStudio. Perhaps it lies in the underlying csv import that RStudio executes. Could you specify what you used to import the csv?
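For what it's worth, here is a rough sketch of how the import step alone can produce either strings or factor codes. It uses base read.csv() on an inline, made-up CSV rather than the real file, and the idea that the two imports differed in this way is only an assumption. Since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, and RStudio's import dialog can use readr, which also reads text columns as character.

```r
# Inline stand-in for the csv file (contents are hypothetical)
csv_text <- "Color,Type\nRed,MM\nBlue,X\nRed,MM"

# Text columns kept as character (the default in R >= 4.0.0)
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_chr$Color)      # "character"
unclass(as_chr$Color)    # strings, since there is no factor class to remove

# Text columns converted to factors on import
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_fct$Color)      # "factor"
unclass(as_fct$Color)    # integer codes plus a levels attribute
```

Running str(Candies) right after the import shows which of the two cases applies.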