Data for package Rmixmod

Hello,
I've got a problem preparing the data for a training set in Rmixmod. I want to do semi-supervised learning with the data in data.train (dir == "I" | dir == "B"). Then I want to predict the data in data.predict. I want to cluster this data into 4 clusters (2 known clusters and 2 unknown clusters). But there is a problem when the data is read in: the message "Empty mixture component" appears. Does anybody have an idea what causes this problem? Thank you very much!


#>             A          B          C          D          E           F dir
#> 1  2.56823821  1.1585366  0.3401361 -0.2900232 -0.7588739 -1.32496513   G
#> 2  2.56823821  1.8902439 -0.2267574  1.6821346  0.2203182  0.63150236   G
#> 3  1.32754342  2.1341463  0.5668934 -1.2180974 -2.1052632  0.20920502   I
#> 4 -0.03722084  1.2804878 -1.3605442 -1.5661253 -1.0036720 -0.62761506   I
#> 5 -0.28535980 -0.4268293 -0.1133787  1.2180974 -1.0036720  0.06973501   B
#> 6 -0.53349876  0.6707317  0.9070295  1.4501160  1.8115055  2.85913529   B

Created on 2022-02-24 by the reprex package (v2.0.1)

#Preparation of data

data.train<-subset(abcdef, dir=="I"|dir=="B")
data.predict<-subset(abcdef, dir=="G")

#train
library(Rmixmod)
#> Warning: package 'Rmixmod' was built under R version 4.0.5
#> Loading required package: Rcpp
#> Warning: package 'Rcpp' was built under R version 4.0.5
#> Rmixmod v. 2.1.6 / URI: www.mixmod.org
set.seed(1234)
learn<-mixmodLearn(data.train[2:7],dataType="quantitative", knownLabels=data.train$dir)

#predict
new("MixmodPredict", data=data.predict[ ,2:7], classificationRule=learn["bestResult"])
getSlots("MixmodPredict")



Hi @Ande,
The knownLabels in the training set have to be a factor to get your code to run:

learn <- mixmodLearn(data.train[2:7], 
                     dataType="quantitative", 
                     knownLabels=as.factor(data.train$dir))
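
One related thing that may be worth checking (an assumption on my part, not something confirmed in this thread): if dir is already a factor in abcdef, subset() keeps the unused level "G", and as.factor() does not drop it. A minimal sketch with a made-up stand-in for abcdef (the name toy and its values are hypothetical):

```r
# Toy stand-in for abcdef; the values are made up for illustration
toy <- data.frame(
  A   = c(2.6, 2.6, 1.3, 0.0, -0.3, -0.5),
  dir = factor(c("G", "G", "I", "I", "B", "B"))
)
train <- subset(toy, dir == "I" | dir == "B")

# as.factor() on an existing factor leaves the unused level "G" in place
levels(as.factor(train$dir))
table(train$dir)

# droplevels() removes levels that have no observations
train$dir <- droplevels(train$dir)
levels(train$dir)
```

If the first levels() call still lists "G", the labels describe three groups although only two are present in the training rows, which might be related to the "Empty mixture component" message.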

Hi @DavoWW ,

thank you for your help, but unfortunately the problem remains. What surprises me is that when I swap the roles of the two subsets,

data.train<-subset(abcdef, dir=="G")
data.predict<-subset(abcdef, dir=="I"|dir=="B")

everything works.

The specific cause of the problem is still not clear to me, but I was able to fix it by using integer labels instead of factors (B, I, G) and by selecting the rows with the filter function.
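
For anyone who finds this later, here is a rough sketch of what recoding the labels as integers can look like. The vector dir.train and the names known and labels.int are hypothetical stand-ins, and the mixmodLearn call is only shown as a comment because it needs the real data:

```r
# Hypothetical stand-in for data.train$dir, kept as plain character values
dir.train <- c("I", "I", "B", "B")

# The two known classes, in the order that should map to 1 and 2
known <- c("B", "I")

# match() returns the position of each label in 'known', i.e. 1..K
labels.int <- match(dir.train, known)
labels.int

# Sanity checks: no unmatched label, and every class 1..K occurs at least once
stopifnot(!anyNA(labels.int))
stopifnot(all(seq_along(known) %in% labels.int))

# Assumed usage with the real training data (not run here):
# learn <- mixmodLearn(data.train[2:7], dataType = "quantitative",
#                      knownLabels = labels.int)
```

The second check is the useful one here: it fails as soon as one of the integer codes has no rows, which is the situation I would expect to lead to an empty component.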
