Warning Message: In daisy binary variables are treated as interval scaled

IPB · September 8, 2020, 1:58am

Hi, I'm running a multivariate cluster analysis with the daisy function. When I run it, I get a warning message saying that binary variables are being treated as interval-scaled variables, even though I specify their types in the code. Can somebody help me? (It is very important that each variable is treated as I specified.) Just to clarify, my data consists of 38 observations of 24 variables each, in this order: binary symmetric, binary asymmetric, ordinal, nominal, and numeric.

I would really appreciate it if somebody could help me.

Here is my code:

install.packages(c("cluster", "factoextra", "backports"))
# library() loads one package per call
library(cluster)
library(factoextra)
library(backports)
setwd("C:/Users/Isa/Desktop/Escritorio/Colegio/Monografía/Datos/Cluster")
data <- read.csv(file = "EncuestaR1.csv", header = TRUE, sep = ";", dec = ",", row.names = 1)
d <- daisy(data, metric = "gower",
           type = list(ordered = c(9, 10, 11, 12, 13), asymm = c(5, 6, 7, 8),
                       symm = c(1, 2, 3, 4), factor = c(14, 15, 16, 17, 18),
                       numeric = c(19, 20, 21, 22, 23, 24)))
d
round(as.matrix(d)[1:24, 1:24], 3)

nirgrahamuk · September 8, 2020, 10:07am

The documentation states:

Columns of mode numeric (i.e. all columns when x is a matrix) will be recognized as interval scaled variables, columns of class factor will be recognized as nominal variables, and columns of class ordered will be recognized as ordinal variables.

Here is an example of that. I think this will override concerns such as the type list.

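library(cluster)
data(agriculture)
set.seed(123)
# Add a random 0/1 column; while it is numeric it is treated as interval scaled
agriculture$binary <- sample.int(n = 2, size = nrow(agriculture), replace = TRUE) - 1

(d.agr <- daisy(agriculture, metric = "euclidean", stand = FALSE))

# Convert the same column to a factor so that it is recognized as nominal
agriculture$binary <- factor(agriculture$binary)

(d.agr2 <- daisy(agriculture, metric = "euclidean", stand = FALSE))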
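A quick way to check which type daisy assigned to each column is to compare the column classes with the `Types` attribute of the resulting dissimilarity object. The sketch below uses a made-up data frame (the name `toy` and its columns are hypothetical, not part of the question's data) and assumes the Gower metric, so that the `Types` attribute is present:

```r
library(cluster)

# Hypothetical toy data: two numeric columns and one 0/1 column.
toy <- data.frame(
  x = 1:12,
  y = (1:12) / 10,
  b = rep(c(0, 1), 6)
)

# daisy() takes the variable type from each column's class.
sapply(toy, class)

# Left numeric, b counts as interval scaled; the warning is muffled here.
d_num <- suppressWarnings(daisy(toy, metric = "gower"))
types_num <- attr(d_num, "Types")

# As a factor, the same column is picked up as nominal.
toy$b <- factor(toy$b)
d_fac <- daisy(toy, metric = "gower")
types_fac <- attr(d_fac, "Types")

types_num
types_fac
```

In the one-letter codes, "I" stands for interval scaled and "N" for nominal, so the change of class should show up directly in the second vector.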

I appreciate that it's your first time on the forum, so to help you engage successfully here in the future, please note that my example code is a reprex (reproducible example), which is easy for you to run and play with.

It's recommended that you create your own reprexes when you post about your issues, as it will improve your chances of getting support. A guide is available here: FAQ: How to do a minimal reproducible example (reprex) for beginners

IPB · September 8, 2020, 1:29pm

Hi, yes, I'm still learning how to make a reprex. I used your example in mine, and my results are still incongruous. (To make it easier, I changed the name of my data to "agriculture".)

If you want, I can upload my data, but please help me. Thanks.

setwd("C:/Users/Isa/Desktop/Escritorio/Colegio/Monografía/Datos/Cluster")

agriculture <- read.csv(file = "EncuestaR1.csv", header = TRUE, sep = ";", dec = ",", row.names = 1)

agriculture$binary <- sample.int(n = 2, size = nrow(agriculture), replace = TRUE) - 1

(d.agr <- daisy(agriculture, metric = "gower", type = list(symm = c(1, 2, 3, 4), asymm = c(5, 6, 7, 8)), stand = FALSE))

agriculture$binary <- factor(agriculture$binary)

f <- (d.agr4 <- daisy(agriculture, metric = "gower", type = list(symm = c(1, 2, 3, 4), asymm = c(5, 6, 7, 8)), stand = FALSE))

f

nirgrahamuk · September 8, 2020, 1:47pm

All you've done is add an extra variable to your existing problematic data.
Which column is it that is binary in EncuestaR1?

IPB · September 8, 2020, 2:06pm

The binary symmetric variables are columns 1, 2, 3, 4, and the binary asymmetric ones are columns 5, 6, 7, 8.

nirgrahamuk · September 8, 2020, 2:35pm

To me that implies they should all be factors.
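As a rough sketch of how column classes and the type list can work together, the code below builds a small made-up survey (the `survey` data frame, its column names, and the `collect_warnings` helper are all hypothetical, not the EncuestaR1 data). It assumes, following the documentation quoted earlier, that nominal and ordinal variables are declared through their class (factor, ordered factor), while the documented components of `type` are `ordratio`, `logratio`, `asymm` and `symm`. In this sketch the 0/1 columns stay numeric and are declared through `type`; it is one possible setup, not necessarily the only one.

```r
library(cluster)

# Hypothetical helper: run an expression and return its warning messages.
collect_warnings <- function(expr) {
  msgs <- character(0)
  withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  msgs
}

# Made-up survey, stored as plain numbers the way read.csv() would return it.
survey <- data.frame(
  s1  = rep(c(0, 1), 10),           # symmetric binary
  s2  = rep(c(0, 0, 1, 1), 5),      # symmetric binary
  a1  = rep(c(0, 0, 0, 1), 5),      # asymmetric binary
  a2  = rep(c(1, 0, 0, 0, 0), 4),   # asymmetric binary
  ord = rep(1:5, 4),                # ordinal, still numeric codes
  nom = rep(c(1, 2), each = 10),    # nominal with two categories, numeric codes
  num = seq(0.5, 10, by = 0.5)      # numeric
)
bin_types <- list(symm = 1:2, asymm = 3:4)

# Before conversion: the two-valued numeric column 'nom' is still numeric.
w_before <- collect_warnings(
  d_before <- daisy(survey, metric = "gower", type = bin_types)
)

# Declare ordinal and nominal columns through their class.
survey$ord <- factor(survey$ord, levels = 1:5, ordered = TRUE)
survey$nom <- factor(survey$nom)

w_after <- collect_warnings(
  d_after <- daisy(survey, metric = "gower", type = bin_types)
)

w_before
w_after
attr(d_after, "Types")
```

Comparing `w_before` with `w_after` shows whether the warning was coming from columns outside the `symm` and `asymm` lists, and the `Types` attribute lists the code daisy finally used for each column.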
