Plotting a scatter plot with categorical data.

Hey R users: a newbie here.

I'm trying to get a plot in R that would look something like (A).

The Y axis is just names, which are not important at the moment. I guess you could say it is some kind of density plot, but with all points visible.

I used the regular plot function:

plot(data.frame(x,y))

and what I get is the numerical position of each Y category plotted instead of the names (B).

How do I get this organized so it looks like the first plot? Or is there a simpler way?

Thank you!
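
If staying in base R is the goal, stripchart() may be the closest thing to a "simpler way". The sketch below uses made-up data, so the values and the names name_a, name_b and name_c are only placeholders for the x and y in the question:

```r
# Hypothetical stand-in data: x is numeric, y holds the category names
set.seed(42)
x <- c(rnorm(30, mean = 10), rnorm(30, mean = 12), rnorm(30, mean = 15))
y <- factor(rep(c("name_a", "name_b", "name_c"), each = 30))

# One horizontal strip per category; jitter spreads overlapping points
# so that every point stays visible
stripchart(x ~ y, method = "jitter", jitter = 0.2, pch = 16,
           xlab = "x", ylab = "")
```

The formula x ~ y puts one strip per level of y on the vertical axis, labelled with the names rather than their numeric codes.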

# there are several ways to approach this
# let's use the penguins data to illustrate

# install penguins data
remotes::install_github("allisonhorst/palmerpenguins")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'palmerpenguins' from a github remote, the SHA1 (95e62697) has not changed since last install.
#>   Use `force = TRUE` to force installation

# load packages
library(tidyverse)
library(palmerpenguins)
library(ggbeeswarm)
library(ggforce)

# peek at penguins data
glimpse(penguins)
#> Rows: 344
#> Columns: 7
#> $ species           <chr> "Adelie", "Adelie", "Adelie", "…
#> $ island            <chr> "Torgersen", "Torgersen", "Torg…
#> $ culmen_length_mm  <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.…
#> $ culmen_depth_mm   <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.…
#> $ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 18…
#> $ body_mass_g       <dbl> 3750, 3800, 3250, NA, 3450, 365…
#> $ sex               <chr> "MALE", "FEMALE", "FEMALE", NA,…

# clunky jitter version
ggplot(data = penguins) +
  aes(x = body_mass_g, y = species) +
  geom_jitter()
#> Warning: Removed 2 rows containing missing values
#> (geom_point).


# lined up beeswarm version
ggplot(data = penguins) +
  aes(y = body_mass_g, x = species) +
  geom_beeswarm() +
  coord_flip()
#> Warning: Removed 2 rows containing missing values
#> (position_beeswarm).


# version that corresponds to geom_violin with geom_sina
ggplot(data = penguins) +
  aes(y = body_mass_g, x = species) +
  geom_sina() +
  coord_flip()
#> Warning: Removed 2 rows containing non-finite values
#> (stat_sina).


# geom_sina with geom_violin
ggplot(data = penguins) +
  aes(y = body_mass_g, x = species) +
  geom_violin() +
  geom_sina() +
  coord_flip()
#> Warning: Removed 2 rows containing non-finite values
#> (stat_ydensity).
#> Warning: Removed 2 rows containing non-finite values
#> (stat_sina).

Created on 2020-06-11 by the reprex package (v0.3.0)
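
The jitter version can also be made a little less clunky by controlling the jitter yourself. By default geom_jitter() moves points in both directions, so the continuous values shift slightly too. This is only a sketch, and it uses the mpg data that ships with ggplot2 (an assumption made so it runs without installing anything else) rather than the penguins:

```r
library(ggplot2)

# hwy is continuous, class is categorical
p_jitter <- ggplot(data = mpg) +
  aes(x = hwy, y = class) +
  # height: spread within each category band
  # width = 0: leave the continuous values where they are
  geom_jitter(height = 0.2, width = 0, alpha = 0.6)

p_jitter
```

The height of 0.2 and the alpha of 0.6 are arbitrary starting points; adjust them to taste.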

Sweet! Thank you so much! :star_struck:

Note that you can adjust how tightly the ggbeeswarm points are packed with the cex argument (the default is cex = 1). Larger values spread the points further apart.

library(tidyverse)
library(palmerpenguins)
library(ggbeeswarm)
library(ggforce)

ggplot(data = penguins) +
  aes(y = body_mass_g, x = species) +
  geom_beeswarm(cex = 0.5) +
  coord_flip()
#> Warning: Removed 2 rows containing missing values
#> (position_beeswarm).


ggplot(data = penguins) +
  aes(y = body_mass_g, x = species) +
  geom_beeswarm(cex = 1.5) +
  coord_flip()
#> Warning: Removed 2 rows containing missing values
#> (position_beeswarm).


ggplot(data = penguins) +
  aes(y = body_mass_g, x = species) +
  geom_beeswarm(cex = 2.5) +
  coord_flip()
#> Warning: Removed 2 rows containing missing values
#> (position_beeswarm).

Created on 2020-06-11 by the reprex package (v0.3.0)

Good tip!
I went with a mixture of geom_sina and sunflower plot.

A quick question: why do we need to use coord_flip? I know what it does, but it seems you cannot avoid it here by simply swapping the x and y aesthetics.

It appears to be an oddity of ggbeeswarm, which assumes that your categories are on the x axis and the continuous variable is on the y axis. This behavior was probably inherited from ggplot2. It will probably change in the future, as ggplot2 3.3.0 (released March 5, 2020) now has bi-directional geoms and stats. See https://www.tidyverse.org/blog/2020/03/ggplot2-3-3-0/

I would guess that the next version of ggbeeswarm will no longer make this assumption, as it is an extension of ggplot2.
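
To illustrate what bi-directional means for the geoms built into ggplot2, here is a small sketch that assumes ggplot2 3.3.0 or later and again uses the bundled mpg data as a stand-in for the penguins. The categorical variable goes straight on y and no coord_flip() is needed:

```r
library(ggplot2)

# Categorical variable mapped to y, continuous variable mapped to x
p_direct <- ggplot(data = mpg) +
  aes(x = hwy, y = class) +
  # geom_violin() detects the horizontal orientation from the discrete y
  geom_violin() +
  # points overlaid with a small spread within each category
  geom_jitter(height = 0.1, width = 0, alpha = 0.5)

p_direct
```

This only covers ggplot2's own geoms. Whether a given extension package does the same depends on the version you have installed, so check its documentation.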
