Visualizing data with widespread values

ryanmrof · August 13, 2020, 4:14pm

I've collected data for a project where most of my data points have a value less than 6. I want to display the data so that I can see the spread of data points per categorical variable on the x axis but am struggling to find the best method.

data:

"x", "y"
"A", 2.29
"A", 1.24
"A", 4.47
"A", 0.49
"A", 2.51
"A", 28.1
"A", 371.4
"A", 729.5
"B", 0.89
"B", 4.16
"B", 153.2
"D", 0.3
"D", 0.33
"D", 0.64
"D", 1.61
"D", 0.54
"D", 70.84
"D", 45.45
"E", 0.57
"E", 0.69
"E", 6.28
"E", 230.5
"C", 18.74

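For anyone who wants to reproduce this, the table above can be typed in as a data frame. This is only a minimal sketch: the object name `data` is taken from the plotting code below, and the quick count at the end is just a sanity check, not part of the original post.

```r
# Rebuild the posted values as a data frame called `data`,
# the name the plotting code refers to.
data <- data.frame(
  x = c(rep("A", 8), rep("B", 3), rep("D", 7), rep("E", 4), "C"),
  y = c(2.29, 1.24, 4.47, 0.49, 2.51, 28.1, 371.4, 729.5,  # A
        0.89, 4.16, 153.2,                                 # B
        0.3, 0.33, 0.64, 1.61, 0.54, 70.84, 45.45,         # D
        0.57, 0.69, 6.28, 230.5,                           # E
        18.74)                                             # C
)

# How many observations fall below 6, per group
table(data$x[data$y < 6])
```
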
So far, I've generated a dot-plot using the following command:

library(ggplot2)
library(ggthemes)   # theme_base()
library(tidyverse)
library(ggpubr)
library(ggforce)    # facet_zoom()

data_graph <- ggplot(data, aes(x = x, y = y, color = x)) +
  # raw observations
  geom_point(size = 10, alpha = 0.6) +
  labs(x = "x", y = "y") +
  scale_color_manual(values = c("#FF6666", "#9966CC", "#0099FF", "#66CC66", "#FF66CC"), name = "data", labels = c("A", "B", "C", "D", "E")) +
  # group mean (black dot) and standard error of the mean (error bar)
  stat_summary(geom = "point", fun = "mean", col = "black", size = 5) +
  stat_summary(fun.data = "mean_se", geom = "errorbar", width = 0.15) +
  scale_x_discrete(labels = c("A", "B", "C", "D", "E")) +
  # apply the complete theme first; a theme() call placed before it would be overwritten
  theme_base(base_size = 14, base_family = "Times New Roman") +
  theme(legend.title = element_blank())

When I tried facet_zoom() with ylim = c(0, 8) to zoom in on the data points with a y value below 8, I got the following error:

Error in `[<-`(`*tmp*`, !is.na(alpha), 4, value = alpha[!is.na(alpha)]) : (subscript) logical subscript too long
In addition: Warning message:
In rep(colour, length.out = length(alpha)) : 'x' is NULL so the result will be NULL

How do I get around this error? Is there another way in R to show only a subset of the data?

Thank you in advance.

Ryan

AlexisW · August 14, 2020, 12:16am

You can use

scale_y_continuous(limits = c(0,8))

To show the spread of your data, depending on the context, you could also consider a violin plot (geom_violin()) or a boxplot (geom_boxplot()), or switch the y axis to a log scale (scale_y_log10()).
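
As a rough sketch of the log-scale option combined with a boxplot: `df` here is assumed to hold the x/y columns from the question, and only a handful of the posted values are typed in to keep it short. Every y value is positive, so the log transform is defined for all of them.

```r
library(ggplot2)

# A few of the posted values; `df` is an assumed name for the data frame
df <- data.frame(
  x = c("A", "A", "A", "B", "B", "D", "D", "E", "E"),
  y = c(2.29, 28.1, 729.5, 0.89, 153.2, 0.3, 70.84, 0.57, 230.5)
)

p_log <- ggplot(df, aes(x = x, y = y, color = x)) +
  # hide the boxplot's own outlier marks, since every point is drawn below
  geom_boxplot(outlier.shape = NA) +
  # small horizontal jitter only, so the y values stay exact
  geom_jitter(width = 0.1, height = 0, size = 3, alpha = 0.6) +
  # log10 axis keeps both the sub-6 values and the large ones readable
  scale_y_log10()

p_log
```

On a log axis the small and large values share one panel, so nothing has to be cut off to see the spread.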

EDIT: and if you're not making any other change to the axis, you can use ylim() directly:

ggplot(df, aes(x = x, y = y, color = x)) +
  geom_point(size = 10, alpha=.6) +
  ylim(0,8)
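
One thing to keep in mind with ylim() or scale limits: values outside the range are turned into NA before any statistics are computed, so summaries such as the mean and standard error in stat_summary() are then based only on the points that remain. coord_cartesian(ylim = ...) zooms the view without dropping data. A small sketch of the difference, using four of the group A values from the question (the object names are made up for illustration):

```r
library(ggplot2)

# Four of the posted group A values, one of them far outside 0..8
df <- data.frame(x = "A", y = c(1.24, 2.29, 4.47, 729.5))

base <- ggplot(df, aes(x = x, y = y)) +
  geom_point(size = 4, alpha = 0.6) +
  stat_summary(fun = mean, geom = "point", shape = 4, size = 5)

# Scale limits: 729.5 is removed (with a warning) before the mean is taken
p_scale <- base + ylim(0, 8)

# Coordinate limits: all four values are kept, only the visible window changes
p_zoom <- base + coord_cartesian(ylim = c(0, 8))

layer_data(p_scale, 2)$y  # mean of the three values inside 0..8
layer_data(p_zoom, 2)$y   # mean of all four values
```

With coord_cartesian() the mean of all four values lies above the visible range, so the summary point is simply not shown in the zoomed view. Which behaviour is preferable depends on whether the summaries should describe all observations or only the ones on screen.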
