Must subset columns with a valid subscript vector. Can't convert from <double> to <integer> due to loss of precision

I am trying to fit a C5.0 decision tree model to the following dataset:

DT5_Example
   id      A     B      C     D     E PF
1   1 0.0045 0.765 0.0072 0.938 0.809  1
2   2 0.0022     1 0.0076 0.938     1  1
3   3 0.0030     1 0.0010 0.946     1  1
4   4 0.0054     1 0.0045 0.844     1  0
5   5 0.0046     1 0.0041 0.856     1  1
6   6 0.0048     1 0.0051 0.846     1  0
7   7 0.0038     1 0.0005 0.617 0.987  1
8   8 0.0275     1 0.0103 0.954     1  1
9   9 0.0017     1 0.0129 0.917     1  1
10 10 0.0139     1 0.0059 0.983     1  1

Below is my script:
A<-DT5_Example$A
B<-DT5_Example$B
C<-DT5_Example$C
D<-DT5_Example$D
E<-DT5_Example$E

vars<-c(A, B, C, D, E)

# Converting PF into a factor because it is the outcome variable

DT5_Example2<-DT5_Example %>%
mutate(PFcat=factor(PF, levels = c(0,1))) %>% collect()

# Fitting the C5.0 model to the data

install.packages("C50")
library(C50)
DT5_model<-C5.0(x=DT5_Example2[, vars], y = DT5_Example2$PFcat)
summary(DT5_model)

I received the following error message:
Error: Must subset columns with a valid subscript vector.
x Can't convert from <double> to <integer> due to loss of precision.

If I run the model with PF as an integer variable, I still receive the same message.

I have already googled this error and read related topics in the RStudio Community, but I have not been able to fix it. Any help will be appreciated. Thanks.

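DT5_Example2[, vars] asks for a subset of columns, so vars has to hold column positions (integers) or column names. But c(A, B, C, D, E) concatenates the values stored in those five columns, so vars ends up as a vector of doubles such as 0.0045, and those cannot be converted to integer column positions without loss of precision.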
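The difference can be sketched in base R with a cut-down copy of the example data (the first three rows of A to E). is_whole() is a hypothetical helper written only for this illustration, not a dplyr or tibble function: it reports whether every element of a numeric vector is a whole number, i.e. whether it could serve as an integer column position without dropping a fractional part.

```r
# Cut-down copy of the example data (first three rows of A to E)
df <- data.frame(
  A = c(0.0045, 0.0022, 0.0030),
  B = c(0.765, 1, 1),
  C = c(0.0072, 0.0076, 0.0010),
  D = c(0.938, 0.938, 0.946),
  E = c(0.809, 1, 1)
)

# c(A, B, ...) concatenates the column *values*, not column positions
vars_values <- c(df$A, df$B, df$C, df$D, df$E)
length(vars_values)  # 15: 3 rows x 5 columns
typeof(vars_values)  # "double"

# Hypothetical helper: TRUE only if every element is a whole number,
# i.e. it could become an integer column position without loss of precision
is_whole <- function(x) all(x == trunc(x))

is_whole(vars_values)  # FALSE: values like 0.0045 are not positions
is_whole(1:5)          # TRUE
typeof(1:5)            # "integer"
```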

library(C50)
library(dplyr)
DT5_Example <- data.frame(A = c(
  0.0045, 0.0022, 0.003, 0.0054, 0.0046, 0.0048,
  0.0038, 0.0275, 0.0017, 0.0139
), B = c(
  0.765, 1, 1, 1, 1, 1,
  1, 1, 1, 1
), C = c(
  0.0072, 0.0076, 0.001, 0.0045, 0.0041, 0.0051,
  5e-04, 0.0103, 0.0129, 0.0059
), D = c(
  0.938, 0.938, 0.946, 0.844,
  0.856, 0.846, 0.617, 0.954, 0.917, 0.983
), E = c(
  0.809, 1, 1,
  1, 1, 1, 0.987, 1, 1, 1
), PF = c(1, 1, 1, 0, 1, 0, 1, 1, 1, 1))

A <- DT5_Example$A
B <- DT5_Example$B
C <- DT5_Example$C
D <- DT5_Example$D
E <- DT5_Example$E

vars <- c(A, B, C, D, E)

vars
#>  [1] 0.0045 0.0022 0.0030 0.0054 0.0046 0.0048 0.0038 0.0275 0.0017 0.0139
#> [11] 0.7650 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
#> [21] 0.0072 0.0076 0.0010 0.0045 0.0041 0.0051 0.0005 0.0103 0.0129 0.0059
#> [31] 0.9380 0.9380 0.9460 0.8440 0.8560 0.8460 0.6170 0.9540 0.9170 0.9830
#> [41] 0.8090 1.0000 1.0000 1.0000 1.0000 1.0000 0.9870 1.0000 1.0000 1.0000

DT5_Example2 <- DT5_Example %>%
  dplyr::mutate(PFcat = factor(PF, levels = c(0, 1))) %>%
  dplyr::collect()

# give the required columns explicitly as integer positions (A to E are columns 1 to 5 in this data frame)
DT5_model <- C5.0(x = DT5_Example2[, 1:5], y = DT5_Example2$PFcat)
summary(DT5_model)
#> 
#> Call:
#> C5.0.default(x = DT5_Example2[, 1:5], y = DT5_Example2$PFcat)
#> 
#> 
#> C5.0 [Release 2.07 GPL Edition]      Wed Dec 29 18:15:53 2021
#> -------------------------------
#> 
#> Class specified by attribute `outcome'
#> 
#> Read 10 cases (6 attributes) from undefined.data
#> 
#> Decision tree:
#> 
#> D <= 0.846: 0 (3/1)
#> D > 0.846: 1 (7)
#> 
#> 
#> Evaluation on training data (10 cases):
#> 
#>      Decision Tree   
#>    ----------------  
#>    Size      Errors  
#> 
#>       2    1(10.0%)   <<
#> 
#> 
#>     (a)   (b)    <-classified as
#>    ----  ----
#>       2          (a): class 0
#>       1     7    (b): class 1
#> 
#> 
#>  Attribute usage:
#> 
#>  100.00% D
#> 
#> 
#> Time: 0.0 secs
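One caveat when moving this back to data laid out like the question's: if the id column comes first, positions 1:5 select id, A, B, C and D rather than A to E. Selecting by name avoids counting positions. A minimal base-R sketch, assuming the column names shown in the question and using its first four rows; the base-R factor() line stands in for the mutate() step, and the C5.0() call is left as a comment so the snippet runs without the C50 package installed.

```r
# Layout from the question: id, predictors A to E, outcome PF (first four rows)
DT5_Example <- data.frame(
  id = 1:4,
  A  = c(0.0045, 0.0022, 0.0030, 0.0054),
  B  = c(0.765, 1, 1, 1),
  C  = c(0.0072, 0.0076, 0.0010, 0.0045),
  D  = c(0.938, 0.938, 0.946, 0.844),
  E  = c(0.809, 1, 1, 1),
  PF = c(1, 1, 1, 0)
)

# Base-R equivalent of the mutate() step: PF as a two-level factor
DT5_Example$PFcat <- factor(DT5_Example$PF, levels = c(0, 1))

# With id in front, positions 1:5 include id and miss E
names(DT5_Example)[1:5]  # "id" "A" "B" "C" "D"

# A character vector of column names is also a valid column subscript
vars <- c("A", "B", "C", "D", "E")
predictors <- DT5_Example[, vars]
names(predictors)  # "A" "B" "C" "D" "E"

# With library(C50) loaded, the model call could then be:
# DT5_model <- C5.0(x = DT5_Example[, vars], y = DT5_Example$PFcat)
```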

Thank you so much for your help! It worked on my end with the real data set. I still have one question. I understand that the subset of predictors, [1:5], must be given as integers. However, the script you used to fix the issue did not include any conversion from double to integer, and the five predictors in DT5_Example2 are doubles. So was "dplyr::mutate(PFcat=factor(PF, levels = c(0,1))) %>% dplyr::collect()" the solution to this problem? Am I interpreting this correctly?

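1:5 works to subset DT5_Example2 as long as it has at least 5 variables (columns). It does not matter what type the columns are: integer, double, character, logical, or a mix. They just have to be referred to with an integer index. The mutate() step was already in your original script, so it was not the fix; what changed was replacing vars (the column values) with 1:5 (the column positions).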
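A small base-R sketch of that point, using a made-up four-column data frame whose columns have four different types. It also shows that, for a plain data.frame, whole-number doubles such as c(1, 3) pick the same columns as the integers c(1L, 3L); the values in the question's vars were fractional (0.0045 and so on), which is where the conversion failed.

```r
# Made-up data frame with four different column types
mixed <- data.frame(
  int = 1:3,
  dbl = c(0.5, 1.5, 2.5),
  chr = c("a", "b", "c"),
  lgl = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Integer positions select columns regardless of the columns' types
sub <- mixed[, 1:4]
col_types <- vapply(sub, class, character(1))
col_types  # int = "integer", dbl = "numeric", chr = "character", lgl = "logical"

# Whole-number doubles select the same columns as the equivalent integers
identical(mixed[, c(1, 3)], mixed[, c(1L, 3L)])  # TRUE
```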

Technocrat, thank you very much for your explanation! The solution to the problem and your explanation are very much appreciated.
