estim_ncpFAMD: error message (no defined columns)

Hi!

I am trying to prepare a data frame for an FAMD analysis using the missMDA package and the estim_ncpFAMD function, as my dataset has missing values in both qualitative and quantitative variables.

First, I tried to use estim_ncpMCA on my qualitative variables:

nb.mca <- estim_ncpMCA(quali,
                       ncp.max=5)

but the calculation takes forever (I ran it for an entire night and got no results).

So I tried the FAMD method on my complete dataset:

nb_FDMA <- estim_ncpFAMD(quanti_quali)

And I receive an error message:

Error in `[.data.frame`(jeu, , (nbquanti + 1):ncol(jeu), drop = F) : 
  no defined selected columns

Here is what the dataset looks like:

'data.frame':	2196 obs. of  49 variables:
 $ KL_HAB  : chr  "semi-erect" "semi-erect" "semi-erect" "erect" ...
 $ KL_LCO  : chr  "LCO_green" "LCO_green" "LCO_green" NA ...
 $ KL_PHC  : chr  "PHC_orange" "PHC_orange" "PHC_orange" "PHC_orange" ...
 $ KL_RCSH : chr  NA NA "RCSH_green" "RCSH_no difference" ...
 $ KL_ROSH : chr  "oblong/cylindrical" "oblong/cylindrical" "oblong/cylindrical" "oblong/cylindrical" ...
 $ KL_ROSHO: chr  "ROSHO_rounded" "ROSHO_rounded" "ROSHO_rounded-conical" "ROSHO_rounded-conical" ...
 $ KL_RSC  : chr  "RSC_orange" "RSC_orange" "RSC_orange" "RSC_orange" ...
 $ KL_RTIP : chr  "RTIP_rounded" "RTIP_rounded" "RTIP_blunt" "RTIP_blunt" ...
 $ KL_XC   : chr  "XC_orange" "XC_orange" "XC_orange" "XC_orange" ...
 $ QT_BO%2 : num  0 0 0 0 0 0 0 0 0 0 ...
 $ QT_BO%3 : num  0 0 0 0 1 NA 0 0 0 0 ...
 $ QT_BO%4 : num  0 0 0 0 1 0 0 0 0 0 ...
 $ QT_BOL2 : num  NA 0 3 3 0 3 3 3 3 0 ...
 $ QT_BOL3 : num  NA 0 3 3 3 NA 3 3 3 0 ...
 $ QT_BOL4 : num  3 0 3 3 3 3 5 3 3 0 ...
 $ QT_BRIX : num  NA 8.3 NA NA 10.4 NA NA NA NA 9.2 ...
 $ QT_BSD  : num  0 0 0 0 3 0 0 0 0 0 ...
 $ QT_BSDPH: num  NA 0 NA 0 0 NA NA NA 0 NA ...
 $ QT_CRA  : num  0 3 0 3 0 0 0 5 0 3 ...
 $ QT_DC   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ QT_FW   : num  3 3 3 3 3 3 3 3 3 5 ...
 $ QT_GP   : num  3 NA NA NA NA NA 5 5 NA 7 ...
 $ QT_ICC  : num  5 5 5 0 5 0 5 7 0 0 ...
 $ QT_INT  : num  NA 7 5 5 5 5 7 7 5 3 ...
 $ QT_LBL2 : num  1 0 1 0 0 3 0 0 3 0 ...
 $ QT_LBL3 : num  3 0 3 0 2 NA 0 0 3 0 ...
 $ QT_LBL4 : num  3 0 2 NA 2 3 0 1 3 0 ...
 $ QT_LTYP : num  2 2 2 NA 2 NA NA NA NA NA ...
 $ QT_MDW2 : num  0 0 0 0 0 0 0 2 0 0 ...
 $ QT_MDW3 : num  0 0 1 0 0 NA 0 2 0 1 ...
 $ QT_MDW4 : num  0 0 1 NA 0 0 0 2 0 1 ...
 $ QT_ND   : num  0 0 0 0 0 0 1 0 0 1 ...
 $ QT_PDC  : num  NA 95 102 37 41 NA 40 67 NA 109 ...
 $ QT_RED2 : num  1 0 0 0 0 0 0 0 0 0 ...
 $ QT_RED3 : num  1 0 0 0 0 NA 1 0 0 0 ...
 $ QT_RED4 : num  0 0 0 NA 0 0 0 0 0 0 ...
 $ QT_RFD  : num  0 0 0 0 NA 0 0 0 0 0 ...
 $ QT_ROBRA: num  0 1 5 0 0 0 0 0 0 0 ...
 $ QT_RODA : num  1.5 1.93 2.2 2.3 3.05 ...
 $ QT_ROLA : num  15 12.8 15 17.5 16.9 16 10.9 19 17 11.5 ...
 $ QT_ROSU : num  5 1 2 4 4 5 2 5 5 3 ...
 $ QT_RPOS : num  3 3 3 5 3 5 5 5 5 5 ...
 $ QT_SCL3 : num  0 0 0 0 0 NA 0 0 0 0 ...
 $ QT_SCL4 : num  0 0 0 NA 0 0 0 0 0 0 ...
 $ QT_SPH  : num  NA 0 NA 2 1 NA NA NA 0 NA ...
 $ QT_SV2  : num  NA NA NA NA NA NA NA 10 NA 10 ...
 $ QT_SV4  : num  NA NA NA NA NA NA 11 13 NA 13 ...
 $ QT_WLR  : num  3 3 3 5 5 5 5 3 5 3 ...
 $ QT_QPH  : num  NA 7 NA 7 5 NA NA NA 3 NA ...

Note that I have tried renaming the columns to remove the numbers and % symbols, but it didn't help.

And here is a reproducible example:

> test
       KL_RSC      KL_RTIP     KL_XC QT_BO%2 QT_BO%3 QT_BO%4
18 RSC_orange RTIP_rounded      <NA>      NA      NA   0.000
19       <NA>         <NA>      <NA>      NA      NA      NA
20 RSC_orange RTIP_pointed XC_orange       0   0.000   0.000
21 RSC_yellow RTIP_pointed XC_yellow       0   0.000   0.000
22 RSC_yellow RTIP_pointed XC_yellow       0   0.000   0.000
23 RSC_orange RTIP_rounded XC_orange       0   0.000   0.000
24 RSC_orange         <NA>      <NA>       0   5.000   5.000
25       <NA>         <NA>      <NA>       0   0.000   0.000
26 RSC_orange RTIP_pointed      <NA>       0   0.000   0.000
27 RSC_orange   RTIP_blunt XC_orange       0   0.588   1.176
28 RSC_orange         <NA> XC_orange       1   3.000   6.000
29 RSC_orange RTIP_rounded XC_orange       0   2.000   2.000
30 RSC_orange RTIP_pointed XC_yellow       0      NA   0.000

Thank you for your help!

It seems it only works if the categorical variables are encoded as factors, not characters. In your str() output we can see that your categorical columns are character:

'data.frame':	2196 obs. of  49 variables:
 $ KL_HAB  : chr  "semi-erect" ...
 $ KL_LCO  : chr  "LCO_green" ...
             ^

But if you check the example, it uses data(ozone), where the categorical columns are factors:

> str(ozone)
'data.frame':	112 obs. of  13 variables:
 $ maxO3 : int  87 NA 92 114 94 80 NA 79 101 106 ...
 ...
 $ vent  : Factor w/ 4 levels "Est", ...
 $ pluie : Factor w/ 2 levels ...

With your test data, converting to factor still gives an error, but for a different reason (my guess is that this test data set is too small to impute).

test <- read.table(text = "KL_RSC      KL_RTIP     KL_XC QT_BO%2 QT_BO%3 QT_BO%4
18 RSC_orange RTIP_rounded      <NA>      NA      NA   0.000
19       <NA>         <NA>      <NA>      NA      NA      NA
20 RSC_orange RTIP_pointed XC_orange       0   0.000   0.000
21 RSC_yellow RTIP_pointed XC_yellow       0   0.000   0.000
22 RSC_yellow RTIP_pointed XC_yellow       0   0.000   0.000
23 RSC_orange RTIP_rounded XC_orange       0   0.000   0.000
24 RSC_orange         <NA>      <NA>       0   5.000   5.000
25       <NA>         <NA>      <NA>       0   0.000   0.000
26 RSC_orange RTIP_pointed      <NA>       0   0.000   0.000
27 RSC_orange   RTIP_blunt XC_orange       0   0.588   1.176
28 RSC_orange         <NA> XC_orange       1   3.000   6.000
29 RSC_orange RTIP_rounded XC_orange       0   2.000   2.000
30 RSC_orange RTIP_pointed XC_yellow       0      NA   0.000",
na.strings = c("<NA>", "NA"))

test2 <- test

test2$KL_RSC <- as.factor(test2$KL_RSC)
test2$KL_RTIP <- as.factor(test2$KL_RTIP)
test2$KL_XC <- as.factor(test2$KL_XC)

library(missMDA)

nb_FDMA <- estim_ncpFAMD(test)
#> Error in `[.data.frame`(jeu, , (nbquanti + 1):ncol(jeu), drop = F): undefined columns selected

nb_FDMA <- estim_ncpFAMD(test2)
#> Error in impute(X, group = group, ncp = ncp, type = type, method = method, : The algorithm fails to converge. Choose a number of components (ncp) less or equal than 1 or a number of iterations (maxiter) less or equal than 999

Created on 2023-02-06 by the reprex package (v2.0.1)

And checking in the source code of the function, you can also see that, just before the line that fails, you have this:

jeu <- don[, c(which(sapply(don, is.numeric)), which(sapply(don, 
    is.factor))), drop = F]

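So the function assumes that columns are either numeric or factor; anything else is ignored. With character columns, nothing is left after the numeric columns, so the later selection of columns (nbquanti + 1):ncol(jeu) asks for a column that does not exist.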
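To make the failure concrete, here is a minimal base R sketch that mimics the selection quoted above on a toy data frame, without calling missMDA. The data and column names are made up, and the way nbquanti is computed here (the count of numeric columns) is an assumption on my part rather than something copied from the package source.

```r
# Toy data: two numeric columns and one character column (illustrative names).
don <- data.frame(
  x = c(1, NA, 3),
  y = c(0.5, 2, NA),
  g = c("a", "b", NA),
  stringsAsFactors = FALSE
)

# Same selection as in the quoted source: numeric columns first, then factors.
# The character column "g" is neither, so it is dropped here.
jeu <- don[, c(which(sapply(don, is.numeric)), which(sapply(don, is.factor))), drop = FALSE]
ncol(jeu)

# Assumption: nbquanti is the number of numeric columns.
nbquanti <- sum(sapply(don, is.numeric))

# (nbquanti + 1):ncol(jeu) is 3:2, a descending sequence c(3, 2),
# so column 3 is requested even though jeu only has 2 columns.
res <- tryCatch(
  jeu[, (nbquanti + 1):ncol(jeu), drop = FALSE],
  error = function(e) conditionMessage(e)
)
res
```

If the character column is converted to a factor first, jeu keeps all three columns and the same index becomes 3:3, which is a valid selection.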

Hi,

Thank you, I'll try converting to factors. Do you know how to do that for several columns at once?

If you're familiar with the tidyverse, here is a direct way:

library(dplyr)
test2 <- test |>
  mutate(across(where(is.character), as.factor))

In pure base R, you can rely on the fact that a data.frame is a list of columns, so you can use lapply() to apply as.factor() to each character column:

is_character <- which(sapply(test, is.character))

test3 <- test
test3[is_character] <- lapply(test3[is_character], as.factor)

all.equal(test2, test3)
#> [1] TRUE
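Before calling estim_ncpFAMD(), it may also be worth checking which columns are neither numeric nor factor, since those are the ones the function would drop. A small sketch on a toy data frame (the data and the names df, kept and dropped are only for illustration):

```r
# Toy data frame with one character, one factor and one numeric column.
df <- data.frame(
  colour = c("orange", NA, "yellow"),
  shape  = factor(c("round", "blunt", NA)),
  length = c(15, 12.8, NA),
  stringsAsFactors = FALSE
)

# TRUE for columns that are numeric or factor, FALSE otherwise.
kept <- sapply(df, function(col) is.numeric(col) || is.factor(col))

# Classes of the columns that would be left out; convert these before imputing.
dropped <- sapply(df[!kept], class)
dropped
```

An empty result means every column is already numeric or factor.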
