Hi,

I'm not familiar with this specific type of linear model, but I think the difference arises from whether the variables are encoded as strings or as numbers.

A linear model can only work with numeric inputs, which it treats as values on a continuous scale. Categorical data like factors therefore have to be converted into one-hot vectors before they can be fitted (many models will do that for you). One-hot encoding generates a separate binary variable for every level of the factor, which is 1 if that level is present and 0 if it is absent. This increases the size of the model and can reduce performance if the number of levels is large and the dataset small.
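To make the one-hot idea concrete, here is a minimal sketch in plain Python. The colour values and the one_hot helper are made up for illustration; this is not what any particular modelling package does internally.

```python
def one_hot(values):
    """Return (levels, rows) with one 0/1 column per distinct level."""
    levels = sorted(set(values))  # fixed column order: one column per level
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows


# Hypothetical categorical variable with three levels
colours = ["blue", "red", "green", "red"]
levels, rows = one_hot(colours)

print(levels)
for colour, row in zip(colours, rows):
    print(colour, row)
```

Each observation gets exactly one 1, in the column of its own level. Many fitting routines drop one of these columns and use that level as the baseline, so check what yours does.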

If you present the variable as numeric values, the model will interpret them as if they were on a linear scale. For example, if you say blue = 1, red = 2 and green = 3, the model thinks that blue < red < green, and that the step from blue to red is the same size as the step from red to green. This is nonsensical for colours, and although results will be generated, they do not reflect what is actually going on.
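Here is a small sketch of how much the two encodings can differ, again in plain Python with made-up numbers (not your data or your model). It fits a straight line to the integer codes and compares that with the per-level fit. With a single categorical predictor, the least-squares fit of the one-hot model is simply the mean of each level, so I use that shortcut instead of a full regression routine.

```python
# Made-up data: the response is high for red and low for blue and green.
colours = ["blue", "blue", "red", "red", "green", "green"]
y = [10, 12, 30, 32, 11, 13]

# Encoding 1: arbitrary integer codes treated as one continuous predictor.
codes = {"blue": 1, "red": 2, "green": 3}
x = [codes[c] for c in colours]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
# Closed-form simple linear regression: slope = Sxy / Sxx
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
numeric_fit = {c: intercept + slope * code for c, code in codes.items()}

# Encoding 2: one indicator per level. For a single factor the
# least-squares fitted value of each level is that level's mean.
level_fit = {
    c: sum(yi for ci, yi in zip(colours, y) if ci == c) / colours.count(c)
    for c in codes
}

print("numeric codes:", numeric_fit)
print("per level:    ", level_fit)
```

The straight line has to put red halfway between blue and green, so it cannot show that red is the odd one out, while the per-level fit can.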

It's therefore important to know what inputs your models require, and how to generate the correct ones if needed. Again, some models will automatically convert factors to one-hot vectors, but they will always interpret numeric values as a continuous variable.

Hope this helps,

PJ