Creating a % calculation to use in a plot

Currently I'm using a mosaicplot to compare accuracy against vowel length (split by whether someone knows Japanese). My code is working, but is there a way to calculate a % value for my JAcc column using boxplot or some other categorical-friendly function?

For that matter, I already get some weird results when I try to boxplot it.

Original mosaicplot code:

mosaicplot(table(JA1$JAcc, JA1$Length))

Attempt at boxplot code:

ggplot(data = JA1) +
  geom_boxplot(mapping = aes(x = JAcc, y = Length))

Results in:
[screenshot of the resulting boxplot]

We don't really have enough info to help you out. Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:


This should be the information needed:

> head(JA1)
# A tibble: 6 x 9
  Participant Experiment Question Order Length Quality Response JAcc        LengthLanguage
        <dbl>      <dbl>    <dbl> <dbl> <chr>  <chr>   <chr>    <chr>       <chr>         
1           1          1        1     1 short  u       long     Noincorrect No            
2           1          1        2     2 short  i       long     Noincorrect No            
3           1          1        3     3 long   e       long     Nocorrect   No            
4           1          1        4     4 short  a       short    Nocorrect   No            
5           1          1        5     5 long   o       short    Noincorrect No            
6           1          1        6     6 long   a       long     Nocorrect   No    
library(tidyverse)
ComprrJAcc <- unite(Comprr, JAcc, Japanese, Accuracy, sep = "")
# These two steps were run as separate operations, not together, if that matters
ComprrJAcc %>% filter(Experiment=="1") -> JA1

#this is the code I ran
ggplot(ComprrJAcc) +
  geom_boxplot(aes(x=JAcc, y=Length))

#Following is the console message
> ggplot(data = JA1) +
  +   geom_boxplot(aes(x=JAcc, y=Length))

The graph is the same as above:
[screenshot of the same boxplot]

Hopefully this is enough information to reproduce it.

You could get the % short/long length for each JAcc category. I've used simulated data which gives you an artificial 50% mean for each category. The means for your data should vary.

library(ggplot2)
JA1 <- expand.grid(Length=c("short","long"), 
            JAcc=c("Noincorrect","Nocorrect","Yescorrect","Yesincorrect"))

ggplot(JA1) +
  geom_boxplot(aes(x=JAcc, y=round(mean(Length=="long")*100, digits = 2)))

Created on 2019-11-22 by the reprex package (v0.3.0)
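A minimal base R sketch of getting a separate percentage for each JAcc category before plotting, assuming the six rows shown in head(JA1) above stand in for the full data. The names `counts` and `pct` are made up for illustration.

```r
# Six rows copied from the head(JA1) output earlier in the thread
JA1 <- data.frame(
  Length = c("short", "short", "long", "short", "long", "long"),
  JAcc   = c("Noincorrect", "Noincorrect", "Nocorrect", "Nocorrect",
             "Noincorrect", "Nocorrect"),
  stringsAsFactors = FALSE
)

# Cross-tabulate JAcc (rows) against Length (columns)
counts <- table(JA1$JAcc, JA1$Length)

# margin = 1 converts each row to proportions, so every JAcc
# category sums to 100% across short/long
pct <- round(prop.table(counts, margin = 1) * 100, 2)
print(pct)
```

Computing the table first keeps each percentage tied to its own category. As far as I know, an expression such as mean(...) written directly inside aes() is evaluated over the whole data frame rather than once per x category, so it would give the same value everywhere.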

It is not clear to me what your desired output is. You can't make a boxplot out of a categorical variable, but considering your variable selection, maybe you want to do something like this?

JA1 <- data.frame(stringsAsFactors=FALSE,
                  Participant = c(1, 1, 1, 1, 1, 1),
                  Experiment = c(1, 1, 1, 1, 1, 1),
                  Question = c(1, 2, 3, 4, 5, 6),
                  Order = c(1, 2, 3, 4, 5, 6),
                  Length = c("short", "short", "long", "short", "long", "long"),
                  Quality = c("u", "i", "e", "a", "o", "a"),
                  Response = c("long", "long", "long", "short", "short", "long"),
                  JAcc = c("Noincorrect", "Noincorrect", "Nocorrect", "Nocorrect",
                           "Noincorrect", "Nocorrect"),
                  LengthLanguage = c("No", "No", "No", "No", "No", "No")
)

library(tidyverse)

JA1 %>% 
    count(Length, JAcc, name = "Prop") %>%
    group_by(JAcc) %>% 
    mutate(Prop = Prop/sum(Prop) * 100) %>% 
    ggplot(aes(x=JAcc, y = Length, fill = Prop)) +
    geom_tile() +
    geom_text(aes(label = scales::number(Prop, 
                                         accuracy = 0.1,
                                         suffix = "%")),
              color = "white") +
    scale_fill_gradient(limits = c(0, 100))
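To see what the count / group_by / mutate steps hand to ggplot, the summary table can be built and printed on its own. This is only a sketch on the six sample rows; the column name `n` and the object name `props` are illustrative choices, not part of the code above.

```r
library(dplyr)

# Six rows copied from the head(JA1) output earlier in the thread
JA1 <- data.frame(
  Length = c("short", "short", "long", "short", "long", "long"),
  JAcc   = c("Noincorrect", "Noincorrect", "Nocorrect", "Nocorrect",
             "Noincorrect", "Nocorrect"),
  stringsAsFactors = FALSE
)

props <- JA1 %>%
  count(Length, JAcc, name = "n") %>%   # one row per Length/JAcc combination
  group_by(JAcc) %>%
  mutate(Prop = n / sum(n) * 100) %>%   # percentage within each JAcc category
  ungroup()

print(props)
```

Because the percentages are computed after group_by(JAcc), they add up to 100 within each JAcc category rather than across the whole table.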


I figured out why it was failing to give me what I wanted: I was mixing up my bar and boxplot functions. But this actually works wonderfully for what I was trying to get done, and it works better as a visualization than what I had in mind!

That makes much more sense, actually. I think this would be more appropriate.

library(tidyverse)

JA1 %>% 
    count(Length, JAcc, name = "Prop") %>%
    group_by(JAcc) %>% 
    mutate(Prop = Prop/sum(Prop) * 100) %>% 
    ggplot(aes(x = JAcc, y = Prop, fill = Length)) +
    geom_col() +
    geom_text(aes(label = scales::number(Prop, 
                                         accuracy = 0.1,
                                         suffix = "%")),
              color = "white",
              position = "stack",
              vjust = 1.5)
