Error in eval(predvars, data, env) : object 'bachelor's' not found

Hello everyone.

I am doing an NLP project and running a classification algorithm (Random Forest). I have turned the tokens into variables, and the data frame holds the count of each token in the respective text.

When I run a Random Forest, I get the following error:

Error in eval(predvars, data, env) : object 'bachelor's' not found

I have checked other posts and most of the time this error is related to a typo, or that the variable does not exist in the data set.

The variable is present in the data frame, because running df$bachelor's returns the values of the column.

But I think the problem is related to this:
When I type df$bachelor's, it is automatically changed to df$`bachelor's`

So it must be the way the variable name is written, as it is not recognised by the Random Forest. But the problem does not stop there: some variables like '2736e' also return an error. Is it because of the numbers?

I don't understand what is going on.
I tried to replicate the code but I couldn't ...
How can I fix this?

Thank you in advance

P.S: Let me know if something is not clear.

Both bachelor's and 2736e are non-syntactic variable names. To reference this kind of variable you have to enclose the name in backticks (e.g. `bachelor's`), but keep in mind that using non-syntactic variable names is considered bad practice in R.
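
A minimal sketch of the backtick approach, using a made-up toy data frame (the column `label` and all the counts are invented for illustration) and base R's `lm()` as a stand-in for the classifier. Whether a given modelling package honours backticked names in its formula interface is not guaranteed, since some may rebuild the data frame internally, so treat this as a demonstration of the syntax only:

```r
# Toy document-term data frame; check.names = FALSE keeps the raw token names.
df <- data.frame(
  label = c(1, 0, 1, 0),
  "bachelor's" = c(2, 0, 1, 0),
  "2736e" = c(0, 1, 0, 3),
  check.names = FALSE
)

# Backticks let you reference a non-syntactic column name with `$`.
counts <- df$`bachelor's`

# The same quoting works inside a formula for base R modelling functions.
fit <- lm(label ~ `bachelor's` + `2736e`, data = df)
print(coef(fit))
```

If the model function still fails with backticked names, renaming the columns to syntactic names before fitting is likely the more robust route.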

The variables are basically tokens... do you know a way to solve this?

I'm not an NLP expert, but aren't you supposed to clean the text before tokenizing? I mean reducing words to their root (i.e. bachelor instead of bachelor's) and eliminating non-semantic elements (not sure if this is the right term) like 2736e.
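
As a rough illustration of that kind of cleaning, here is a base R sketch on a made-up token vector. It only lowercases, strips a trailing possessive 's (straight apostrophe only), and drops tokens containing digits. This is a crude stand-in for proper stemming, not a replacement for it:

```r
# Hypothetical tokens, as they might come out of a tokenizer.
tokens <- c("bachelor's", "Bachelor", "2736e", "degree", "Degree")

tokens <- tolower(tokens)                  # normalise case
tokens <- sub("'s$", "", tokens)           # strip a trailing possessive 's
tokens <- tokens[!grepl("[0-9]", tokens)]  # drop tokens that contain digits

# Counts per cleaned token; these names are now syntactic column names.
print(table(tokens))
```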

I see what you mean. I cleaned the data. I guess I will review that part again, because something is clearly off. Thanks for the help.
br

For syntactic column names with minimal effort, check out the janitor package, which has a clean_names() function.
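
A small sketch of the renaming step, again on an invented toy data frame. The first part uses base R's `make.names()`, which needs no extra package; the `janitor::clean_names()` call is only run if janitor happens to be installed, and the exact names it produces may differ from the base R result:

```r
# Toy data frame with non-syntactic token names (values are made up).
df <- data.frame(
  label = c(1, 0, 1, 0),
  "bachelor's" = c(2, 0, 1, 0),
  "2736e" = c(0, 1, 0, 3),
  check.names = FALSE
)

# Base R: convert every column name to a syntactic, unique name.
clean_base <- df
names(clean_base) <- make.names(names(clean_base), unique = TRUE)
print(names(clean_base))

# janitor, if available: clean_names() returns a copy with cleaned names.
if (requireNamespace("janitor", quietly = TRUE)) {
  clean_janitor <- janitor::clean_names(df)
  print(names(clean_janitor))
}
```

After either step the columns can be used in a formula without backticks.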

That is exactly what I was looking for!
