Shading Selected Regions

Hi All,

I've recently been using a newly published package called ordinalEditDistance.

It is a clustering routine with some novel cluster performance metrics. These metrics are typically visualised as a frontier-type diagram.

However, the output of the example code differs from the published visuals. Could someone help with making the example output look more like the published output?

This would include a demonstration of how to add shaded regions to the demo diagram, and how to extend the regions so that the frontiers meet both axes.

Any help would be appreciated.

devtools::install_github("HannahJohns/ordinalEditDistance")
library(ordinalEditDistance)
library(parallel)
library(ggplot2)
library(tibble)
library(rPref)


#EXAMPLE DATA

df <- example_data

#DATA AS LIST

levelList <- by(example_data,example_data$id,function(df){
  df$state[order(df$step)]
})

#EVALUATING CLUSTER PERFORMANCE

cl <- makeCluster(round(0.6*parallel::detectCores()))
results <- evaluateClusters(levelList,
                            a=seq(0,1,length.out=11),
                            p=seq(1,5,length.out=11),
                            k = c(2,3,4,5),cl = cl)
stopCluster(cl)


## IDENTIFYING PARETO OUTPUT

pareto <- do.call("rbind",by(results,results$k,function(df){
  psel(df,high(distinctiveness) * low(deviation))
}))

## PLOT

ggplot(results,
       aes(x=1-deviation,
           y=distinctiveness,
           color=as.factor(k))
) +
  geom_point()+
  geom_point(data=pareto,size=4)+
  geom_step(data=pareto,direction = "vh")+
  geom_label(data=pareto,size=4,hjust=0,
             aes(label=sprintf("a=%0.2f, p=%0.2f",a,p))
  )

DEMO
[Image: screenshot of the demo output]

PUBLISHED
[Image: published output]

I have your example working, but I'm unsure about the clustering: is it the four levels of variable k?

Hi @technocrat. Yeah, the four levels are the values at which k was evaluated: c(2,3,4,5).

A first attempt:


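library(ordinalEditDistance)
library(parallel)
library(ggplot2)
library(tibble)
library(rPref)
library(dplyr)  # filter(), mutate(), arrange(), select(), distinct(), bind_rows()
library(purrr)  # map_dfr()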


#EXAMPLE DATA

df <- example_data


levelList <- by(example_data,example_data$id,function(df){
  df$state[order(df$step)]
})

#EVALUATING CLUSTER PERFORMANCE

cl <- makeCluster(round(0.6*parallel::detectCores()))
results <- evaluateClusters(levelList,
                            a=seq(0,1,length.out=11),
                            p=seq(1,5,length.out=11),
                            k = c(2,3,4,5),cl = cl)
stopCluster(cl)

raw_results <- results

results$k <- as.factor(raw_results$k)

pareto <- do.call("rbind",by(results,results$k,function(df){
  psel(df,high(distinctiveness) * low(deviation))
}))

make_shell_data <- function(kval){
  kval <- factor(kval,levels = levels(pareto$k))
  temp <- pareto |> na.omit() |> filter(k==kval) |> 
    mutate(x=1-deviation,y=distinctiveness,yx=y*x) |>
    arrange(desc(yx)) |> select(x,y,yx,k) |> distinct()
  res <- bind_rows(data.frame(x=0,y=max(temp$y),k=kval),temp,
                   data.frame(x=max(temp$x),y=0,k=kval),
                   data.frame(x=0,y=0,k=kval)) 
  res |> select(-yx)
}

(my_shell_data <- map_dfr(2:5,make_shell_data))



ggplot() + 
  geom_polygon(data=my_shell_data
               ,mapping = aes(x=x,y=y,fill=k),
               alpha=.2)+
  aes(x=1-deviation,
      y=distinctiveness,
      color=k) +
  geom_point(data = results)+ 
  geom_point(data=pareto,size=4)
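
A possible variation, as a minimal sketch only: if the shaded region should follow the same staircase as geom_step(direction = "vh") in the original plot, the polygon vertices can be built by sorting the Pareto points by x and inserting a corner between each pair, then closing the shape against both axes. The helper frontier_polygon() and the pareto_toy data below are hypothetical and the values are made up; they stand in for the real pareto data frame, with x playing the role of 1-deviation and y the role of distinctiveness.

```r
# Hypothetical helper (not part of ordinalEditDistance): returns the vertices
# of the staircase region under a Pareto front, closed against both axes.
frontier_polygon <- function(x, y) {
  ord <- order(x, -y)              # left to right; y falls along a Pareto front
  x <- x[ord]
  y <- y[ord]
  data.frame(
    # start on the y axis, drop vertically then move right at each point,
    # finish on the x axis and return to the origin
    x = c(0, rep(x, each = 2), 0),
    y = c(y[1], as.vector(rbind(y, c(y[-1], 0))), 0)
  )
}

# Toy Pareto points standing in for the real `pareto` data frame (made-up values)
pareto_toy <- data.frame(
  k = factor(c(2, 2, 2, 3, 3)),
  x = c(0.2, 0.5, 0.8, 0.3, 0.6),   # plays the role of 1 - deviation
  y = c(0.9, 0.6, 0.3, 0.7, 0.4)    # plays the role of distinctiveness
)

# One polygon per k, stacked into a single data frame
shade <- do.call(rbind, lapply(split(pareto_toy, pareto_toy$k), function(d) {
  cbind(frontier_polygon(d$x, d$y), k = d$k[1])
}))

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(shade, aes(x = x, y = y, fill = k, colour = k)) +
    geom_polygon(alpha = 0.2) +
    geom_point(data = pareto_toy, size = 4)
  print(p)
}
```

With real output, the same helper could be applied per level of k to 1-pareto$deviation and pareto$distinctiveness. Ordering by x keeps the polygon outline from crossing itself, which ordering by the product y*x does not guarantee.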


Thank you @nirgrahamuk. Great help!
