Violin plot: binary data

Hello, could anyone help me? I need to make a violin plot from the data below. My goal is to show that these species reproduce in the same seasons in both hemispheres (north and south). If anyone knows the ggplot code for this, I would really appreciate it.

species hemisphere spring summer fall winter
Lutajnus synagris HN 1 1 0 0
Lutjanus analis HN 1 1 0 0
Lutjanus apodus HN 1 0 0 0
Lutjanus cyanopterus HN 1 1 0 0
Lutjanus griseus HN 1 0 0 0
Lutjanus jocu HN 1 1 1 1
Ocyurus chysurus HN 0 1 0 0
Lutjanus analis HS 1 1 1 1
Lutajnus jocu HS 1 1 1 1
Lutjanus gibbus HS 1 0 0 0
Lutjanus fulvus HS 0 1 0 0
Symphorichthys spilurus HS 1 1 1 1
Lutjanus bohar HS 1 1 1 1

First you should restructure your data into long format; then you can simply use ggplot and geom_violin.

# used for data clean up and function piping
library(tidyverse)

# used for data import
library(data.table)

# data.table file read
species <- fread("./rstudioQuestion.txt")

# convert data set to long
species <- species %>% gather(key = "season", value="value", "spring":"winter")

# assign hemisphere to x-axis, and values (breeding?) to the y-axis
plt <- ggplot(species, aes(hemisphere, value))
plt + geom_violin()

Hello, thanks for the help, but I am still having some problems running the code. I think it has something to do with the hemisphere variable.

library(tidyverse, quietly = TRUE)
binary_data <- data.frame(stringsAsFactors=FALSE,
                          Species= c(
                            "Lutajnus synagris",
                            "Lutjanus analis",
                            "Lutjanus apodus",
                            "Lutjanus cyanopterus",
                            "Lutjanus griseus",
                            "Lutjanus jocu",
                            "Ocyurus chysurus",
                            "Lutjanus analis",
                            "Lutajnus jocu",
                            "Lutjanus gibbus",
                            "Lutjanus fulvus",
                            "Symphorichthys spilurus",
                            "Lutjanus bohar"
                          ),
                          hemisphere= c(
                            "HN", 
                            "HN", 
                            "HN", 
                            "HN", 
                            "HN", 
                            "HN", 
                            "HN", 
                            "HS", 
                            "HS", 
                            "HS", 
                            "HS", 
                            "HS", 
                            "HS"
                            ),
                          spring= c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
                          summer= c(1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1),
                          fall= c(0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1),
                          winter= c(0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1)
)
                          
sp <- Species%>% gather(key = "season", value="value", "spring":"winter")

> sp <- Espécie %>% gather(key = "estação", value="value", "primavera":"inverno")
Error in eval(lhs, parent, parent) : object 'Espécie' not found

Could you please turn this into a self-contained reprex (short for reproducible example)? It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.


I think I lost you on the piping...

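Run ?gather to see that its first argument is a data frame. The pipe operator (%>%) takes the object on its left-hand side and passes it as the first argument of the function on its right-hand side, so the object you pipe in has to be your data frame (binary_data), not a column name.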

binary_data %>% gather(key="season", value ="value", "spring":"winter")  

is the same as

gather(binary_data, key="season", value ="value", "spring":"winter")  

Hope that helps!
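If it helps to see the equivalence run, here is a minimal sketch. The `toy` data frame is made up purely for illustration (it is not the species data), and it assumes the tidyr package is installed.

```r
library(tidyr)   # provides gather() and re-exports the %>% pipe

# Toy data frame, invented for illustration only
toy <- data.frame(id = c("a", "b"), spring = c(1, 0), winter = c(0, 1))

# Piped form: toy becomes the first argument of gather()
piped <- toy %>% gather(key = "season", value = "value", spring:winter)

# Direct form: toy is passed explicitly as the first argument
direct <- gather(toy, key = "season", value = "value", spring:winter)

# Both calls should produce exactly the same long-format data frame
identical(piped, direct)
```

The same logic applies to your data: whatever sits to the left of %>% must be an object that exists in your session, which is why piping a column name fails with "object not found".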

Violin plots are not meant for binary data; they show the distribution of a continuous variable. Maybe you need a different type of plot, something more like this:

library(tidyverse, quietly = TRUE)
binary_data <- data.frame(stringsAsFactors=FALSE,
                          Species = c("Lutajnus synagris", "Lutjanus analis", "Lutjanus apodus",
                                      "Lutjanus cyanopterus", "Lutjanus griseus", "Lutjanus jocu",
                                      "Ocyurus chysurus", "Lutjanus analis", "Lutajnus jocu", "Lutjanus gibbus",
                                      "Lutjanus fulvus", "Symphorichthys spilurus", "Lutjanus bohar"),
                          hemisphere = c("HN", "HN", "HN", "HN", "HN", "HN", "HN", "HS", "HS", "HS",
                                         "HS", "HS", "HS"),
                          spring = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
                          summer = c(1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1),
                          fall = c(0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1),
                          winter = c(0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1)
                          )
binary_data %>% 
    gather(Season, Presence, spring:winter) %>% 
    group_by(hemisphere, Season) %>% 
    summarise(species_count = sum(Presence)) %>% 
    ggplot(aes(x = Season, y = hemisphere)) +
    geom_point(aes(size = species_count, color = species_count)) +
    scale_size_continuous(range = c(3,18)) +
    guides(colour = guide_legend()) +
    labs(color = 'Number of Species',
         size = 'Number of Species') +
    theme_minimal()

This is what you would get with a violin plot:

binary_data %>%
    gather(key = "season", value="value", "spring":"winter") %>% 
    ggplot(aes(hemisphere, value)) + 
    geom_violin()

Thank you so much for your help!!

The problem with this violin plot is that I also have to correlate the season, not only the hemisphere. Do you understand?

I don't understand what you mean by "correlate"; a correlation coefficient in the usual sense is not defined between categorical variables.

As I said before, a violin plot needs a continuous variable, and you only have categorical variables, some of which are one-hot encoded.
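If the goal is to compare seasons across hemispheres, one possible alternative (just a sketch, not the only way to do it) is to plot the share of species reproducing in each season, side by side for the two hemispheres. The names `season_share`, `reproducing` and `share` are my own choices for illustration, and I use pivot_longer(), the newer tidyr replacement for gather(). The data are the same values posted above, with the species column left out because it is not needed for the summary.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same values as the posted data; species names omitted for brevity
binary_data <- data.frame(
  hemisphere = c(rep("HN", 7), rep("HS", 6)),
  spring = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
  summer = c(1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1),
  fall   = c(0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1),
  winter = c(0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1)
)

# Long format, then the proportion of species reproducing
# in each hemisphere/season combination
season_share <- binary_data %>%
  pivot_longer(spring:winter, names_to = "season", values_to = "reproducing") %>%
  mutate(season = factor(season, levels = c("spring", "summer", "fall", "winter"))) %>%
  group_by(hemisphere, season) %>%
  summarise(share = mean(reproducing), .groups = "drop")

# Dodged bars: one bar per hemisphere within each season
p <- ggplot(season_share, aes(x = season, y = share, fill = hemisphere)) +
  geom_col(position = "dodge") +
  labs(x = "Season", y = "Share of species reproducing", fill = "Hemisphere") +
  theme_minimal()
p
```

Using a share rather than a raw count keeps the two hemispheres comparable even though they have different numbers of species (7 vs. 6). With so few species per group, I would treat the plot as descriptive only.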

