Hi there!
Is it possible to remove rows from a data frame when the string in a particular column does not appear often enough?
I am using a data frame to train a neural network. Three quarters of the data frame are used as training data
and the remaining quarter as test data. If a string appears only once in the data frame
and that row ends up in the test data, the neural network has never seen it and returns errors. Even
if I later get separate test data and use the entire data frame for training, it does not seem sensible to
keep a string in the training data that appears only once.
Unfortunately, my data frame is huge, with over 50000 rows, so I cannot check
every string by hand. Is there a way to tell R:
Count how often each string occurs in this column (preferably in these columns, since I use 2 columns with strings), and
if a row's entry occurs fewer than, let's say, n = 10 times in the entire data frame, remove that whole row from the data frame.
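To make it concrete, here is a rough base R sketch of what I am after. The names `df`, `col1`, `col2`, `value` and the helper `drop_rare_rows` are made up for illustration, and I use n = 2 instead of 10 so the toy data stays small. I am not sure this is the idiomatic way to do it:

```r
# Toy data; all names here are placeholders, not my real data frame.
df <- data.frame(
  col1  = c("a", "a", "a", "b", "a", "c"),
  col2  = c("x", "x", "y", "x", "x", "x"),
  value = 1:6,
  stringsAsFactors = FALSE
)

# Keep only rows whose string occurs at least n times in every column in `cols`.
# Counts are taken once, on the full data frame that is passed in.
drop_rare_rows <- function(data, cols, n) {
  keep <- rep(TRUE, nrow(data))
  for (col in cols) {
    tab <- table(data[[col]])  # frequency of each distinct string
    # Look up, for every row, how often its own string occurs in this column.
    counts <- as.vector(tab)[match(data[[col]], names(tab))]
    # Rows with NA in the column get no count and are dropped as well.
    keep <- keep & !is.na(counts) & counts >= n
  }
  data[keep, , drop = FALSE]
}

filtered <- drop_rare_rows(df, c("col1", "col2"), n = 2)
print(filtered)
```

With the toy data this keeps the rows where `value` is 1, 2 and 5. Since the counts are taken before anything is removed, a string could still end up with fewer than n rows after filtering, so I suppose the step would have to be repeated until nothing changes if that matters.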
Thank you in advance!