I want to start using ggplot

Dear community

When I tried to run ggplot, I got an error message, "data must be a <data.frame>, or an object coercible by fortify(), not an S3 object with class <pam>/<partition>."

What do I have to do?

x = read.csv("MYDATA.csv", header = TRUE, row.names = 1)

y=as.matrix(x)

for(i in 2:(nrow(y)-1)){
    pam(y,k=i,metric="euclidean")
}

kmed=pam(y,k=5)

fviz_cluster(kmed,star.plot=TRUE,frame.type=FALSE)
#I want to avoid overlapping labels in this graph.

ggplot(kmed)
#I expect that if I can run ggplot, I can enter geom_text_repel() and labels will be able to avoid overlapping.

Error in `fortify()`:
! `data` must be a <data.frame>, or an object coercible by `fortify()`, not an S3
  object with class <pam>/<partition>.
Run `rlang::last_error()` to see where the error occurred.

The error message is telling you that the input to ggplot() must be a data frame or an object that can be coerced into a data frame using the fortify() function. The pam() function, however, returns an object of class "pam" which is not a data frame and cannot be directly used as input to ggplot().
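To see this for yourself, you can inspect the object that pam() returns. The snippet below is a minimal sketch; the data (a standardized subset of the built-in USArrests data set) and the names dat and fit are assumptions chosen for illustration, not the original poster's data.

```r
library(cluster)

# Illustrative data: a standardized subset of the built-in USArrests data set
dat <- scale(USArrests[1:15, ])
fit <- pam(dat, k = 3)

class(fit)           # the classes named in the error message
is.data.frame(fit)   # FALSE, which is why ggplot() rejects it
is.list(fit)         # TRUE: the result is a list of components
names(fit)           # component names such as "medoids" and "clustering"

# The cluster assignment is a named integer vector, one entry per row of dat
head(fit$clustering)
```

The clustering component is the piece you would normally carry over into a data frame for plotting.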

To plot the results of the clustering, you can use the fviz_cluster() function from the factoextra package instead of ggplot(). It is specifically designed for visualizing clustering results and offers various options for customizing the plot.
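As far as I know, fviz_cluster() also has a repel argument that asks it to place the labels so they overlap less, which may be all that is needed here. A minimal sketch, again using an illustrative subset of USArrests rather than the original data:

```r
library(cluster)
library(factoextra)

# Illustrative data, not the original poster's file
dat <- scale(USArrests[1:15, ])
fit <- pam(dat, k = 3)

# repel = TRUE requests non-overlapping label placement;
# ellipse.type replaces the deprecated frame.type argument
p <- fviz_cluster(fit, repel = TRUE, ellipse.type = "convex")
p
```

Because the result is a ggplot object, further layers and themes can still be added with +.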

If you want to use ggplot() to plot the results of the clustering, you first need a data frame. as.data.frame() has no method for a "pam" object, so build one yourself by combining the original data with the cluster assignments stored in kmed$clustering. You can then use this data frame as input to ggplot() and use the geom_text_repel() function from the ggrepel package to avoid overlapping labels.

library(ggrepel)
# var1 and var2 are placeholders for two numeric columns of x
kmed_df <- data.frame(x, cluster = factor(kmed$clustering))
ggplot(kmed_df, aes(var1, var2, color = cluster)) + geom_point() + geom_text_repel(aes(label = rownames(kmed_df)))

Note that the above code is a general example; you should adapt it to the specific column names of your data frame.
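For a version that runs as written, here is a minimal sketch that builds the plotting data frame by hand from the data and the clustering component of the pam() result. The data set (a subset of USArrests), the choice of the Murder and Assault columns, and the names dat, fit and plot_df are all assumptions for illustration.

```r
library(cluster)
library(ggplot2)
library(ggrepel)

# Illustrative data: a standardized subset of the built-in USArrests data set
dat <- scale(USArrests[1:15, ])
fit <- pam(dat, k = 3)

# Assemble a data frame: two variables to plot, the cluster, and a label
plot_df <- data.frame(
  Murder  = dat[, "Murder"],
  Assault = dat[, "Assault"],
  cluster = factor(fit$clustering),
  label   = rownames(dat)
)

# geom_text_repel() nudges labels apart so they do not overlap
p <- ggplot(plot_df, aes(Murder, Assault, color = cluster)) +
  geom_point() +
  geom_text_repel(aes(label = label), show.legend = FALSE)
p
```

Unlike fviz_cluster(), this plots two raw variables rather than principal components, so the picture will look different when the data have more than two columns.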

library(cluster)
library(factoextra)
#> Loading required package: ggplot2
#> Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(ggplot2)

x <- data.frame(
  aaddr = c(
    "800 Coddingtown Ctr", "900 Coddingtown Ctr",
    "1363 N Mcdowell Blvd", "8270 Petaluma Hill Rd", "311 Rohnert Park Expy W",
    "100 Santa Rosa Plz"
  ),
  saledate = c("2005-11-21", "2005-11-21", "2017-04-13", "2015-04-03", "2022-06-24", "2015-07-06"),
  psf = c(105, 105, 186, 10.8, 164, 180),
  price = c(10562045, 7882123, 16416907, 1200000, 14000000, 12829406)
)
class(x)
#> [1] "data.frame"

# as.matrix(x) would return an all-character or
# all-numeric matrix, while a data frame can mix types;
# ggplot expects a data frame, and pam can take either
# y=as.matrix(x)
# the data I have at hand is mixed,
# so keep only the numeric columns and rename them
x <- x[3:4]
colnames(x) <- c("V1", "V2")
x
#>      V1       V2
#> 1 105.0 10562045
#> 2 105.0  7882123
#> 3 186.0 16416907
#> 4  10.8  1200000
#> 5 164.0 14000000
#> 6 180.0 12829406

# the loop does not return anything
r <- for (i in 2:(nrow(x) - 1)) pam(x, k = i, metric = "euclidean")
r
#> NULL

kmed <- pam(x, k = 5)

# call of the function uses deprecated parameter
fviz_cluster(kmed, star.plot = TRUE, frame.type = FALSE)
#> Warning: argument frame is deprecated; please use ellipse instead.
#> Warning: argument frame.type is deprecated; please use ellipse.type instead.


# the return value is a ggplot object, which is displayed
# but is not saved in the workspace
fviz_cluster(kmed, star.plot = TRUE)


# instead, assign it a name; the conventional name
# for a ggplot object is p
p <- fviz_cluster(kmed, star.plot = TRUE)

# kmed has already been used to create a ggplot
# object, and from now on it can be embellished
# with simply the + operator
# ggplot(kmed)

p +
  theme_minimal()

Created on 2023-01-26 with reprex v2.0.2
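The reprex above points out that the for loop discards every pam() result. If the intent of the loop was to compare different numbers of clusters, one option is to collect the fits with lapply() and compare their average silhouette widths. This is only a sketch under that assumption; the data, the range of k, and the names ks, fits and avg_sil are illustrative.

```r
library(cluster)

# Illustrative data: a standardized subset of the built-in USArrests data set
dat <- scale(USArrests[1:15, ])

# Candidate numbers of clusters (an arbitrary range for illustration)
ks <- 2:6

# lapply() keeps each fit, unlike a for loop whose body is not assigned
fits <- lapply(ks, function(k) pam(dat, k = k, metric = "euclidean"))

# Average silhouette width of each fit; larger values suggest better separation
avg_sil <- vapply(fits, function(f) f$silinfo$avg.width, numeric(1))
names(avg_sil) <- ks
avg_sil

# The k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
best_k
```

The chosen fit, fits[[which.max(avg_sil)]], can then be passed to fviz_cluster() in place of kmed.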

Thank you for your detailed advice!

I got a plot like yours!

I'm sorry for the late reply.

It was simply because my object was not a data frame...

Thanks to you, I can design my plot!
