Group image counts in specific ranges

user124578 · July 3, 2020, 10:37pm

I have the data below, produced by a machine learning algorithm that counts cells in each image. I would like to visualise it in order to understand how the model is predicting the cell counts. The data shows the actual and the predicted counts. Is it possible to group the data into the ranges (0:100), (100:200), (200:300), (400:500)? I would like to understand how the model is performing within each of these ranges. I am not sure if there is a better way of presenting this, but this is what I have in mind. Many thanks.

data.frame(
      Image1 = c(294L, 249L, 14L, 278L, 54L, 311L, 202L, 238L, 601L, 275L),
      Image2 = c(294L, 205L, 14L, 244L, 55L, 300L, 234L, 178L, 597L, 293L),
      Image3 = c(269L, 395L, 14L, 401L, 59L, 258L, 236L, 363L, 608L, 354L),
      Image4 = c(298L, 249L, 13L, 312L, 57L, 258L, 218L, 290L, 580L, 254L),
      Image5 = c(295L, 182L, 15L, 323L, 64L, 200L, 219L, 326L, 565L, 292L),
      Image6 = c(297L, 328L, 10L, 344L, 56L, 292L, 225L, 298L, 545L, 217L),
      Image7 = c(270L, 319L, 17L, 334L, 69L, 216L, 244L, 293L, 582L, 253L),
      Image8 = c(301L, 266L, 11L, 354L, 59L, 342L, 213L, 271L, 585L, 305L),
      ValueType = as.factor(c("Pred","Actual",
                              "Pred","Actual","Pred","Actual","Pred","Actual",
                              "Pred","Actual")))

FJCC · July 4, 2020, 2:25am

The cut() function will make bins for data according to break points that you give it. I used it to bin the Actual values. To make the comparison between Actual and Pred values easier, I made a data frame for each type of value and then joined the two so that each row would have the associated actual and predicted values. Is that the kind of thing you are trying to do?

library(tidyr)
library(dplyr, warn.conflicts = FALSE)
DF <- data.frame(
  Image1 = c(294L, 249L, 14L, 278L, 54L, 311L, 202L, 238L, 601L, 275L),
  Image2 = c(294L, 205L, 14L, 244L, 55L, 300L, 234L, 178L, 597L, 293L),
  Image3 = c(269L, 395L, 14L, 401L, 59L, 258L, 236L, 363L, 608L, 354L),
  Image4 = c(298L, 249L, 13L, 312L, 57L, 258L, 218L, 290L, 580L, 254L),
  Image5 = c(295L, 182L, 15L, 323L, 64L, 200L, 219L, 326L, 565L, 292L),
  Image6 = c(297L, 328L, 10L, 344L, 56L, 292L, 225L, 298L, 545L, 217L),
  Image7 = c(270L, 319L, 17L, 334L, 69L, 216L, 244L, 293L, 582L, 253L),
  Image8 = c(301L, 266L, 11L, 354L, 59L, 342L, 213L, 271L, 585L, 305L),
  ValueType = as.factor(c("Pred","Actual",
                          "Pred","Actual","Pred","Actual","Pred","Actual",
                          "Pred","Actual")))

Pred <- filter(DF, ValueType == "Pred") %>% 
  mutate(ROW = row_number())
Actual <- filter(DF, ValueType == "Actual") %>% 
  mutate(ROW = row_number())

Predlng <- pivot_longer(data = Pred, cols = -c("ValueType", "ROW"), 
                       names_to = "Image", values_to = "value")

Actlng <- pivot_longer(data = Actual, cols = -c("ValueType", "ROW"), 
                       names_to = "Image", values_to = "value") %>% 
  mutate(BIN = cut(value, breaks = seq(0, 700, 100)))

AllDat <- inner_join(Actlng, Predlng, by = c("ROW", "Image"), suffix = c(".Act", ".Pred"))
AllDat
#> # A tibble: 40 x 7
#>    ValueType.Act   ROW Image  value.Act BIN       ValueType.Pred value.Pred
#>    <fct>         <int> <chr>      <int> <fct>     <fct>               <int>
#>  1 Actual            1 Image1       249 (200,300] Pred                  294
#>  2 Actual            1 Image2       205 (200,300] Pred                  294
#>  3 Actual            1 Image3       395 (300,400] Pred                  269
#>  4 Actual            1 Image4       249 (200,300] Pred                  298
#>  5 Actual            1 Image5       182 (100,200] Pred                  295
#>  6 Actual            1 Image6       328 (300,400] Pred                  297
#>  7 Actual            1 Image7       319 (300,400] Pred                  270
#>  8 Actual            1 Image8       266 (200,300] Pred                  301
#>  9 Actual            2 Image1       278 (200,300] Pred                   14
#> 10 Actual            2 Image2       244 (200,300] Pred                   14
#> # ... with 30 more rows

Created on 2020-07-03 by the reprex package (v0.3.0)
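One detail about cut() is worth a small illustration: by default the bins are closed on the right, so a count of exactly 200 falls in (100,200], not in a bin starting at 200. The sketch below uses a few counts taken from the data in this thread to compare the default with right = FALSE, which gives bins of the form [200,300). Whether half-open bins are what the original question intended is an assumption.

```r
# A few counts taken from the data frame in this thread
counts <- c(14L, 182L, 200L, 249L, 300L, 601L)
breaks <- seq(0, 700, 100)

# Default: intervals are closed on the right, e.g. (100,200]
right_closed <- cut(counts, breaks = breaks)

# right = FALSE: intervals are closed on the left, e.g. [200,300)
left_closed <- cut(counts, breaks = breaks, right = FALSE)

data.frame(counts, right_closed, left_closed)

# With the default settings the lowest break itself is excluded,
# so a count of 0 is not assigned to any bin
zero_default <- cut(0L, breaks = breaks)
zero_default
#> [1] <NA>
```

If an image can contain zero cells, either right = FALSE or include.lowest = TRUE keeps that value inside the first bin.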

user124578 · July 4, 2020, 7:16am

Many thanks for this. This is the kind of thing I am looking to do. Can you suggest the best way to visualise this, please? I have 23189 observations. Thanks

nirgrahamuk · July 4, 2020, 7:36am

I guess you would calculate the root mean squared error (RMSE) for each bin, and then it would be straightforward to visualise.
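A minimal sketch of that idea, assuming paired columns named value.Act and value.Pred as in the joined data frame earlier in the thread. To keep it self-contained, the paired data frame here is rebuilt from only the Image1 and Image2 columns of the posted data; the names paired, rmse_by_bin and p are just illustrative, and the bar chart is one possible presentation rather than the only sensible one.

```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# Actual/predicted pairs from the Image1 and Image2 columns of the posted data,
# matched as in the earlier join (n-th Actual row with n-th Pred row)
paired <- data.frame(
  value.Act  = c(249L, 278L, 311L, 238L, 275L, 205L, 244L, 300L, 178L, 293L),
  value.Pred = c(294L,  14L,  54L, 202L, 601L, 294L,  14L,  55L, 234L, 597L)
)

# Bin on the actual count, then compute RMSE and group size per bin
rmse_by_bin <- paired %>%
  mutate(BIN = cut(value.Act, breaks = seq(0, 700, 100))) %>%
  group_by(BIN) %>%
  summarise(n = n(),
            RMSE = sqrt(mean((value.Pred - value.Act)^2)))

rmse_by_bin

# One bar per bin; the bar height is the RMSE within that bin
p <- ggplot(rmse_by_bin, aes(x = BIN, y = RMSE)) +
  geom_col() +
  labs(x = "Actual cell count (bin)", y = "RMSE of predicted count")
print(p)
```

Keeping n in the summary helps when reading the plot, since an RMSE based on only a handful of images in a bin is much less reliable than one based on thousands.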
