"Error in parse(text = x, keep.source = FALSE) : <text>:2:0: unexpected end of input" when using the stepclass() function

Hi, I am attempting feature selection and extraction on a dataset containing DNA information for humans and phages. I'm using the stepclass() function in the klaR package to see which variables are the most important in distinguishing between the human and phage DNA sequences, but whenever I try I get the error message above. I've tried the 'forward', 'backward' and 'both' directions and none of them work, so what's the issue?

model = stepclass(V1 ~ .,data = phage, method = "lda", direction = "forward", criterion = "AC")

where 'phage' is the dataset. Below are the first six rows:

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29
1 pos  T  C  G  A  C  G  G  G   A   T   C   A   C   G   A   G   G   T   C   A   G   G   A   G   A   T   C   G
2 pos  G  C  G  G  G  C  G  C   C   T   G   T   A   G   T   T   C   C   A   G   C   T   A   C   T   C   G   G
3 pos  T  C  C  A  G  C  C  T   G   G   G   C   G   A   C   A   G   A   G   C   G   A   G   A   C   T   C   C
4 pos  T  T  T  T  C  T  G  G   C   C   T   A   C   T   A   C   C   T   T   T   A   A   A   A   T   T   C   C
5 pos  T  T  A  A  A  C  T  T   G   C   A   C   C   A   A   T   G   T   C   T   G   C   T   C   T   T   T   T
6 pos  G  C  A  T  T  C  C  C   T   T   T   A   A   A   T   A   C   C   T   G   T   C   T   T   A   A   C   C
  V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51 V52 V53 V54 V55 V56
1   A   G   A   C   C   A   T   C   C   T   G   A   C   T   A   C   C   A   C   G   G   T   G   A   A   A   C
2   G   A   G   G   C   T   G   A   G   G   A   G   G   G   A   G   A   A   T   G   G   C   G   T   G   A   A
3   G   T   C   T   C   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A
4   T   G   T   T   G   C   A   T   T   T   C   T   T   T   G   T   A   T   T   T   A   C   A   A   G   G   A
5   T   T   T   T   T   A   A   T   G   T   T   T   T   T   G   G   T   A   C   T   C   T   G   G   G   C   A
6   T   C   C   T   A   C   T   T   T   T   A   T   T   T   C   C   T   A   C   T   C   C   T   T   T   C   C
  V57 V58 V59 V60 V61 V62 V63 V64 V65 V66 V67 V68 V69 V70 V71 V72 V73 V74 V75 V76 V77 V78 V79 V80 V81 V82 V83
1   C   C   C   G   T   C   T   C   T   A   C   T   A   A   A   A   A   A   A   A   T   A   C   A   A   A   A
2   C   C   C   G   G   G   A   G   G   C   G   G   A   G   C   T   T   G   C   A   G   T   G   A   G   C   C
3   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   G   A   A   A   A   G   G   A   T   T
4   A   A   A   G   A   C   T   G   A   A   C   T   T   T   T   T   C   T   C   A   T   C   A   A   A   A   C
5   G   A   C   T   T   C   A   G   T   T   T   T   T   T   A   A   A   A   A   A   T   A   A   A   G   A   T
6   A   C   A   C   A   C   A   T   G   C   A   T   A   C   A   A   T   C   C   T   T   T   A   C   C   T   T
  V84 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 V100 V101
1   A   A   C   T   A   G   C   C   A   G   G   C   A   T   G   G    T    G
2   G   A   G   A   T   T   G   T   G   C   C   A   C   T   G   C    A    C
3   G   T   A   A   G   A   G   T   T   A   C   T   G   T   T   A    C    A
4   T   A   G   C   T   T   T   T   T   T   C   T   C   A   C   A    G    G
5   T   C   T   A   A   T   G   C   A   G   C   T   A   T   C   T    T    G
6   T   T   A   A   A   G   A   A   T   C   A   T   T   A   A   G    A    C

(Note that 'pos' stands for human DNA, whereas 'neg' would stand for phage DNA)

Any help would be hugely appreciated.

It's probably an issue with your phage data frame.

Do you get an error when running on only, say, the first 10 rows of phage?

model <- stepclass(V1 ~ .,data = head(phage,n=10), method = "lda", direction = "forward", criterion = "AC")

edit:
I tried this on my own data. By reducing the rows, I ended up passing in a data frame in which the class variable contained only a single class, and it threw the error you described. So I would check the unique levels of V1 in phage. What are they, please?
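
A quick way to check this is to tabulate the class column for the full data and for the subset. The sketch below uses a made-up data frame (phage_demo and its values are hypothetical, not the real phage data) whose rows are ordered by class, so the first 10 rows contain only one class:

```r
# Toy stand-in for the real data (hypothetical values); V1 is the class label.
phage_demo <- data.frame(
  V1 = factor(c(rep("pos", 12), rep("neg", 8)), levels = c("neg", "pos")),
  V2 = factor(rep(c("A", "C", "G", "T"), times = 5))
)

# Class counts in the full data: both classes are present.
full_counts <- table(phage_demo$V1)
print(full_counts)

# Class counts in the first 10 rows: 'neg' has a count of zero here,
# because the toy rows are ordered by class.
head_counts <- table(head(phage_demo, n = 10)$V1)
print(head_counts)

# TRUE only if every class has at least one row in the subset.
has_all_classes <- all(head_counts > 0)
print(has_all_classes)
```

On the real data the equivalent calls would be table(phage$V1) and table(head(phage, n = 10)$V1).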

Hi nirgraham, I still get the same error when running the first 10 rows of the dataset. And by checking the unique levels, I presume you mean doing unique(phage$V1)? This returns just two levels: pos and neg.

Is it like the first or the second of these two examples?



vec_with_both_of_two_factors <- as.factor(c("pos","neg","pos"))
vec_with_one_of_two_factors <- vec_with_both_of_two_factors[vec_with_both_of_two_factors=='pos']
  
unique(vec_with_both_of_two_factors)
# > unique(vec_with_both_of_two_factors)
# [1] pos neg
# Levels: neg pos
unique(vec_with_one_of_two_factors)
# > unique(vec_with_one_of_two_factors)
# [1] pos
# Levels: neg pos
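
The "Levels:" line looks the same in both cases, so it is the values line that matters. As a small sketch of the difference (the variable names here are only for illustration), counting the classes that actually occur, or dropping unused levels, makes it explicit:

```r
f_both <- factor(c("pos", "neg", "pos"))
f_one  <- f_both[f_both == "pos"]

# levels() reports the declared levels, so it is identical for both vectors.
levels_same <- identical(levels(f_both), levels(f_one))

# Number of classes that actually occur in each vector.
n_present_both <- length(unique(f_both))
n_present_one  <- length(unique(f_one))

# droplevels() removes declared-but-unused levels.
levels_after_drop <- levels(droplevels(f_one))

print(c(both = n_present_both, one = n_present_one))
print(levels_after_drop)
```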

It's like the first one:

> unique(phage$V1)
[1] pos neg
Levels: neg pos

OK, so that's not the problem.
Can you run

dput(head(phage$V1,n=10))

and paste the results here? That would let us use the first 10 rows of phage in our own R sessions and try to debug.

Here are the results:

structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("neg", 
"pos"), class = "factor")

Thank you for all your help so far. I didn't expect to get this much support with it!

OK, sorry, I mistyped my request. This caused you to share only one of the columns of phage (V1) rather than the whole thing. But it does reveal that for those 10 records all values are 'pos', which is the problematic case from the second example I shared (only one class present), rather than the one that works well (the first, with both classes present).
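
For reference, sharing whole rows means calling dput() on the data frame itself, which for the real data would presumably be dput(head(phage, n = 10)). A minimal sketch on a made-up data frame (toy and its values are hypothetical) shows that the printed text rebuilds every column:

```r
# Toy stand-in for phage (hypothetical values).
toy <- data.frame(
  V1 = factor(c("pos", "pos", "neg"), levels = c("neg", "pos")),
  V2 = factor(c("T", "G", "A"), levels = c("A", "C", "G", "T"))
)

# dput() on the data frame prints R code that recreates all columns.
dput(head(toy, n = 2))

# Round trip: capture the printed text and evaluate it again.
txt <- capture.output(dput(head(toy, n = 2)))
rebuilt <- eval(parse(text = paste(txt, collapse = "\n")))

# Check that the rebuilt object matches the original rows.
print(isTRUE(all.equal(rebuilt, head(toy, n = 2))))
```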

Let's take a random sample of pos and neg records, test it (and share it), and see what happens.
This uses library(tidyverse):

# Sample 3 random rows from each class so that both classes are present
pos_df <- phage %>% filter(V1 == "pos")
rand_rows_1 <- sample.int(nrow(pos_df), size = 3, replace = FALSE)
pos_3 <- slice(pos_df, rand_rows_1)

neg_df <- phage %>% filter(V1 == "neg")
rand_rows_2 <- sample.int(nrow(neg_df), size = 3, replace = FALSE)
neg_3 <- slice(neg_df, rand_rows_2)

# bind_rows() stacks the two samples (union() would also drop duplicate rows)
shortphage <- bind_rows(pos_3, neg_3)

After preparing shortphage, you can share it like so:

dput(shortphage)

I suppose you could also test it yourself:

model = stepclass(V1 ~ .,data = shortphage, method = "lda", direction = "forward", criterion = "AC")

However, I'd be interested to debug it for you if you could share shortphage and it does error for you.
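
If you'd rather not load the tidyverse, the same per-class sampling can be sketched in base R. This runs on a made-up data frame (toy_phage, short_toy and the values are hypothetical), and the final table() call confirms that both classes made it into the sample before anything is passed to stepclass():

```r
set.seed(1)  # only to make the sketch reproducible

# Toy stand-in for phage (hypothetical data): class label plus two base columns.
bases <- c("A", "C", "G", "T")
toy_phage <- data.frame(
  V1 = factor(rep(c("pos", "neg"), times = c(20, 15)), levels = c("neg", "pos")),
  V2 = factor(sample(bases, 35, replace = TRUE), levels = bases),
  V3 = factor(sample(bases, 35, replace = TRUE), levels = bases)
)

# Row indices grouped by class, then 3 picked at random within each class.
idx_by_class <- split(seq_len(nrow(toy_phage)), toy_phage$V1)
picked <- unlist(lapply(idx_by_class, function(i) i[sample.int(length(i), 3)]))

short_toy <- toy_phage[picked, ]

# Each class should now appear exactly 3 times.
print(table(short_toy$V1))
```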