Could you help me interpret the results I obtained with the pvclust package? I am using two datasets, df1 and df2, which give two different scenarios. The reproducible code is below.
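```r
# Dataset df1 (scenario 1: 19 properties)
rm(list = ls())
library(rdist)
library(pvclust)
library(geosphere)
df1 <- structure(list(
  Propertie = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
  Latitude  = c(-23.8, -23.9, -23.5, -23.4, -23.6, -23.9, -23.2, -23.5, -23.8, -23.7,
                -23.8, -23.9, -23.4, -23.9, -23.9, -23.2, -23.3, -23.7, -23.8),
  Longitude = c(-49.1, -49.3, -49.4, -49.7, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6,
                -49.7, -49.2, -49.5, -49.8, -49.5, -49.3, -49.3, -49.2, -49.5),
  Waste     = c(526, 350, 526, 469, 285, 175, 175, 350, 350, 175,
                350, 175, 175, 364, 175, 175, 350, 45.5, 54.6)),
  class = "data.frame", row.names = c(NA, -19L))
# PVCLUST
coordinates <- subset(df1, select = c("Latitude", "Longitude"))
# distm() expects (longitude, latitude), hence the column swap; result in metres
d <- distm(coordinates[, 2:1])
diag(d) <- 1000000   # no effect: as.dist() keeps only the lower triangle
d <- as.dist(d)
mat <- as.matrix(d)  # symmetric matrix with a zero diagonal
mat <- t(mat)        # no-op, since mat is symmetric
# pvclust() clusters the COLUMNS of mat: each property is described by its
# column of distances to all properties, and method.dist = "euclidean" compares
# those columns, not the geodesic distance between the two properties.
# nboot = 10 is far below pvclust's default of 1000, so AU values are noisy.
fit <- pvclust(mat, method.hclust = "average", method.dist = "euclidean",
               nboot = 10)
plot(fit, hang = -1, cex = 0.8, main = "Average Linkage Clustering")
# Highlight clusters with AU >= 0.70
pvrect(fit, alpha = 0.70, pv = "au", type = "geq")
```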
```r
# Dataset df2 (scenario 2: 16 properties, identical to the first 16 rows of df1)
rm(list = ls())
library(rdist)
library(pvclust)
library(geosphere)
df2 <- structure(list(
  Propertie = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16),
  Latitude  = c(-23.8, -23.9, -23.5, -23.4, -23.6, -23.9, -23.2, -23.5,
                -23.8, -23.7, -23.8, -23.9, -23.4, -23.9, -23.9, -23.2),
  Longitude = c(-49.1, -49.3, -49.4, -49.7, -49.6, -49.6, -49.6, -49.6,
                -49.6, -49.6, -49.7, -49.2, -49.5, -49.8, -49.5, -49.3),
  Waste     = c(526, 350, 526, 469, 285, 175, 175, 350,
                350, 175, 350, 175, 175, 364, 175, 175)),
  class = "data.frame", row.names = c(NA, -16L))
# PVCLUST (same pipeline as for df1)
coordinates <- subset(df2, select = c("Latitude", "Longitude"))
d <- distm(coordinates[, 2:1])  # (longitude, latitude) order, metres
diag(d) <- 1000000   # no effect: as.dist() keeps only the lower triangle
d <- as.dist(d)
mat <- as.matrix(d)
mat <- t(mat)        # no-op, since mat is symmetric
fit <- pvclust(mat, method.hclust = "average", method.dist = "euclidean",
               nboot = 10)
plot(fit, hang = -1, cex = 0.8, main = "Average Linkage Clustering")
# Highlight clusters with AU >= 0.60 (0.70 was used for df1)
pvrect(fit, alpha = 0.60, pv = "au", type = "geq")
```
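Because both scenarios repeat the same pipeline, it may help to wrap it in one function so that both runs use identical settings. The sketch below assumes that is the goal; `run_scenario()` is a hypothetical helper name, and the default `nboot = 1000` (pvclust's own default) and the seed are assumptions, not the settings used above. It builds the same input matrix that the code above passes to `pvclust()`.

```r
library(pvclust)
library(geosphere)

# Hypothetical helper (not part of pvclust): runs the pipeline above on any
# data frame that has Latitude and Longitude columns.
run_scenario <- function(df, nboot = 1000, seed = 123) {
  # distm() expects (longitude, latitude); distances are in metres
  d <- distm(df[, c("Longitude", "Latitude")])
  # Same matrix as above: symmetric, zero diagonal, rows/columns labelled 1..n
  mat <- as.matrix(as.dist(d))
  set.seed(seed)  # makes the bootstrap resampling reproducible
  # As above, pvclust() clusters the columns of mat
  pvclust(mat, method.hclust = "average", method.dist = "euclidean",
          nboot = nboot)
}
```

For example, `fit1 <- run_scenario(df1)` and `fit2 <- run_scenario(df2)`, with both data frames defined as above but without the `rm(list = ls())` calls in between (they would delete `df1`). Plotting and `pvrect()` then work as before; passing the same `alpha` to `pvrect()` in both scenarios keeps the highlighted clusters comparable.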
I chose to focus on the AU (approximately unbiased) p-value because it is computed by multiscale bootstrap resampling and is a better approximation to an unbiased p-value than the ordinary bootstrap probability (BP), so it gives a better assessment of how strongly each cluster is supported by the data. The table below shows, for each scenario, the number of clusters, the AU value of each cluster, and the mean and standard deviation (SD) of those AU values. However, I am having a hard time interpreting these results.
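One way to build such a table directly from each `fit` is sketched below, assuming the clusters of interest are the ones that `pvrect()` highlights. `au_summary()` is a hypothetical helper; it relies on pvclust's `pvpick()`, which selects clusters with the same `alpha`, `pv` and `type` rule that `pvrect()` uses to draw rectangles, and on the `au` and `se.au` columns of `fit$edges`.

```r
library(pvclust)

# Hypothetical helper (not part of pvclust): AU p-values of the clusters
# selected by the same rule as pvrect(fit, alpha, pv = "au", type = "geq"),
# together with their mean and standard deviation.
au_summary <- function(fit, alpha) {
  picked <- pvpick(fit, alpha = alpha, pv = "au", type = "geq")
  idx <- picked$edges                   # row indices into fit$edges
  au <- fit$edges$au[idx]
  list(
    n_clusters = length(idx),
    members    = picked$clusters,       # property labels in each cluster
    au         = au,
    se_au      = fit$edges$se.au[idx],  # standard error of each AU value
    au_mean    = mean(au),              # NaN if no cluster is selected
    au_sd      = sd(au)                 # NA if fewer than two clusters
  )
}
```

For example, `au_summary(fit, alpha = 0.70)` summarizes scenario 1 with the threshold used above. The `se_au` values indicate how precise each AU estimate is; with `nboot = 10` they may be large, which matters when comparing AU means between scenarios.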