What would be the best graphs to visualize this dataset?

_shy · December 7, 2022, 12:36am

Hello all. Below are rows 1-20 of a dataset I am working on. The dataset is plant count data. There are 3 blocks, each with 4 plots. These 4 plots represent 4 different levels of seeding rate. 5 sample rings were randomly placed in each plot on 5 different dates, and the number of plants in each ring was counted on each date (sample_size is the number of plants counted). I am having a hard time visualizing this. I would like to know if there is a difference in the number of plants in relation to the seeding rate (plot 1, 2, 3 or 4) overall, if this result differs by date, and if it differs across the 3 blocks. I have been playing around with faceting, boxplots, and regression, but I am either not getting the visuals I am looking for or getting errors. Any thoughts on this are appreciated!

seed_count <-
structure(
list(
date = c(
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16"
),
block = c(
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L
),
plot = c(
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L
),
sample_size = c(
13L,
10L,
13L,
17L,
10L,
7L,
13L,
15L,
9L,
15L,
39L,
4L,
31L,
36L,
18L,
12L,
13L,
15L,
22L,
19L
)
),
row.names = c(NA, 20L),
class = "data.frame"
)

M_AcostaCH · December 7, 2022, 3:34am

Hi @_shy,
I see that you have time data. Maybe you could plot a timeline if the dates differ; in the example data, the date column is the same in every row.

Try this, but with the whole data set.

library(tidyverse)
# Mean count per date for each plot; lines are styled by colour, not fill
ggplot(seed_count, aes(x = date, y = sample_size, colour = factor(plot), group = plot)) +
  stat_summary(fun = mean, geom = "line")

_shy · December 7, 2022, 4:08am

@M_AcostaCH Thanks for the response. Could you explain more what you mean? In the whole dataset each plot is visited and 5 sample sets were counted at each plot on all 5 dates. Are you saying that because this is the case this ggplot would be a good visual? Also any suggestions for how a box plot or linear regression would work with this dataset? I would like to get some p values to represent the different seed rates (plots 1-4 represent the 4 seed rates).

M_AcostaCH · December 7, 2022, 7:29pm

Hi, it is difficult to give a good answer without seeing all the data. As I said, you could plot a time series, but a linear regression is a good way to understand the data better.

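library(tidyverse)

# Treat plot as a categorical variable (one level per seeding rate)
seed_count$plot <- as.factor(seed_count$plot)

# Total count per plot
ggplot(seed_count, aes(x = plot, y = sample_size, fill = plot)) +
  geom_col()

# Distribution of counts per plot
ggplot(seed_count, aes(x = plot, y = sample_size, fill = plot)) +
  geom_boxplot()

# Individual ring counts per plot
ggplot(seed_count, aes(x = plot, y = sample_size, fill = plot)) +
  geom_jitter(aes(colour = plot))

# Linear model; block is included to match the output below
model1 <- lm(sample_size ~ plot + block, data = seed_count)
summary(model1)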

# > summary(model1)
#
# Call:
# lm(formula = sample_size ~ plot + block, data = seed_count)
#
# Residuals:
#   Min    1Q Median    3Q   Max
# -21.6  -2.9    0.4   3.5  13.4
#
# Coefficients: (1 not defined because of singularities)
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   12.600      3.532   3.567  0.00257 **
# plot2         -0.800      4.995  -0.160  0.87476
# plot3         13.000      4.995   2.603  0.01924 *
# plot4          3.600      4.995   0.721  0.48148
# block             NA         NA      NA       NA
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 7.898 on 16 degrees of freedom
# Multiple R-squared:  0.3758, Adjusted R-squared:  0.2588
# F-statistic: 3.211 on 3 and 16 DF,  p-value: 0.05119
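
One way to read the plot coefficients in the summary above: with plot as a factor, the intercept is the mean count in plot 1, and each plotN coefficient is the difference between that plot's mean and the plot 1 mean. A minimal base-R sketch, using only the 20 rows posted at the top of the thread (block 1, one date), so it says nothing about dates or blocks:

```r
# First 20 rows from the original post: block 1, 6/11/16, five rings per plot
seed_count <- data.frame(
  plot = factor(rep(1:4, each = 5)),
  sample_size = c(13, 10, 13, 17, 10,
                  7, 13, 15, 9, 15,
                  39, 4, 31, 36, 18,
                  12, 13, 15, 22, 19)
)

# Mean count per plot
plot_means <- tapply(seed_count$sample_size, seed_count$plot, mean)
plot_means

# Model without block, since block is constant in these 20 rows
fit <- lm(sample_size ~ plot, data = seed_count)

# Intercept = mean of plot 1; other coefficients = difference from plot 1
coef(fit)
plot_means - plot_means[[1]]
```

The plot means here are 12.6, 11.8, 25.6 and 16.2, which is where the 12.600, -0.800, 13.000 and 3.600 estimates come from.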

jrkrideau · December 7, 2022, 8:58pm

I think @M_AcostaCH has a good point. We need to see more data. At the moment, block only equals 1. Can you give us some sample data with a mixture of blocks?

If I am reading that design correctly you should only have 60 rows of data so supplying the entire data set is reasonable. Data sets with 5,000 rows are another matter.

What date format are you using, d/m/y or m/d/y?

_shy · December 7, 2022, 9:55pm

@jrkrideau That makes sense; I see what you're saying. The data I have has 300 rows, so hopefully it is OK to include here (data below). I used dput() but don't know another way to make this dataset smaller for sharing. The date is in m/d/y format. I'm just having a hard time ordering things in the correct way to not get errors...

seed_count <-
structure(
list(
date = c(
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/11/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"6/24/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"7/21/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"8/16/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16",
"9/23/16"
),
block = c(
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L,
3L
),
plot = c(
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L,
1L,
1L,
1L,
1L,
1L,
2L,
2L,
2L,
2L,
2L,
3L,
3L,
3L,
3L,
3L,
4L,
4L,
4L,
4L,
4L
),
sample_size = c(
13L,
10L,
13L,
17L,
10L,
7L,
13L,
15L,
9L,
15L,
39L,
4L,
31L,
36L,
18L,
12L,
13L,
15L,
22L,
19L,
10L,
7L,
13L,
11L,
4L,
23L,
20L,
11L,
26L,
12L,
6L,
8L,
10L,
4L,
11L,
17L,
15L,
29L,
10L,
8L,
6L,
9L,
16L,
5L,
0L,
13L,
9L,
5L,
8L,
5L,
13L,
13L,
12L,
8L,
7L,
12L,
11L,
7L,
23L,
7L,
14L,
15L,
8L,
4L,
12L,
18L,
15L,
19L,
14L,
9L,
41L,
8L,
17L,
15L,
22L,
NA,
NA,
NA,
NA,
NA,
5L,
26L,
15L,
7L,
6L,
9L,
10L,
25L,
15L,
7L,
12L,
14L,
16L,
26L,
15L,
21L,
15L,
16L,
19L,
12L,
28L,
4L,
6L,
5L,
3L,
11L,
8L,
8L,
17L,
5L,
8L,
11L,
12L,
15L,
8L,
4L,
22L,
6L,
17L,
15L,
14L,
10L,
8L,
2L,
10L,
8L,
8L,
14L,
7L,
3L,
5L,
12L,
11L,
10L,
16L,
16L,
13L,
9L,
18L,
6L,
9L,
13L,
10L,
8L,
8L,
7L,
10L,
11L,
7L,
12L,
4L,
10L,
11L,
6L,
4L,
13L,
16L,
11L,
12L,
7L,
13L,
8L,
7L,
10L,
5L,
17L,
10L,
8L,
7L,
8L,
10L,
7L,
11L,
11L,
6L,
10L,
14L,
13L,
8L,
12L,
13L,
11L,
10L,
12L,
8L,
13L,
14L,
13L,
12L,
13L,
13L,
13L,
17L,
9L,
12L,
7L,
18L,
13L,
12L,
16L,
7L,
7L,
6L,
7L,
11L,
12L,
14L,
11L,
8L,
7L,
18L,
10L,
11L,
9L,
9L,
16L,
12L,
11L,
12L,
10L,
6L,
11L,
7L,
13L,
11L,
8L,
14L,
11L,
12L,
15L,
24L,
5L,
11L,
17L,
10L,
11L,
14L,
11L,
11L,
12L,
3L,
5L,
7L,
4L,
6L,
4L,
6L,
5L,
5L,
9L,
5L,
6L,
7L,
5L,
6L,
5L,
7L,
7L,
6L,
4L,
4L,
4L,
5L,
5L,
2L,
6L,
6L,
8L,
6L,
7L,
2L,
6L,
6L,
7L,
4L,
5L,
4L,
6L,
3L,
3L,
5L,
6L,
5L,
8L,
3L,
5L,
4L,
5L,
5L,
5L,
6L,
5L,
4L,
5L,
4L,
5L,
5L,
4L,
5L,
8L
)
),
row.names = c(NA, 300L),
class = "data.frame"
)
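
As a quick sanity check on the layout of the data above, here is a base-R sketch that rebuilds only the design columns (date, block, plot) in the same row order as the dput() output, parses the m/d/y strings as real dates, and confirms that every date, block and plot combination has 5 rings. The object names `design` and `cells` are just for illustration, and the counts themselves are left out.

```r
# The five sampling dates as they appear in the data (m/d/y strings)
dates <- c("6/11/16", "6/24/16", "7/21/16", "8/16/16", "9/23/16")

# Same row order as the dput() output: 60 rows per date,
# 20 rows per block within a date, 5 rings per plot within a block
design <- data.frame(
  date  = rep(dates, each = 60),
  block = rep(rep(1:3, each = 20), times = 5),
  plot  = rep(rep(1:4, each = 5), times = 15)
)

# Parse the strings so dates sort chronologically rather than alphabetically
design$date <- as.Date(design$date, format = "%m/%d/%y")

# 5 dates x 3 blocks x 4 plots x 5 rings
nrow(design)

# Number of rings in each date x block x plot cell
cells <- table(design$date, design$block, design$plot)
all(cells == 5)
```

Running the same `table()` call on the real `seed_count` columns is one way to spot a mislabelled row before plotting.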

_shy · December 7, 2022, 9:57pm

Thanks @M_AcostaCH, this is helpful. I added additional data in a comment above. Really, the trouble I am having is putting things in the right order and assigning the data to the correct place in the code to get a good graph; some of my attempts also gave an error.

jrkrideau · December 8, 2022, 2:13am

Here is one possible approach. You probably will have to install two or three packages.

library(tidyverse)
library(lubridate)
library(ggbeeswarm)
library(patchwork)

# Assumes the dput() data above was saved as seed_count.csv
seedcount <- read_csv("seed_count.csv") %>%
  mutate(date = mdy(date),        # dates are m/d/y
         plot = as.factor(plot))  # plot is a category, not a number

# Use == to filter; `block = 1` would be ignored and return every row
b1 <- subset(seedcount, block == 1)
b2 <- subset(seedcount, block == 2)
b3 <- subset(seedcount, block == 3)

block1  <- ggplot(b1, aes(plot, sample_size)) +
  geom_boxplot(show.legend = FALSE) + 
  geom_beeswarm(aes(colour = plot), show.legend = FALSE) + 
  facet_grid(. ~ date) + ggtitle("Block 1")

block2  <- ggplot(b2, aes(plot, sample_size)) +
  geom_boxplot(show.legend = FALSE) + 
  geom_beeswarm(aes(colour = plot), show.legend = FALSE) +
  facet_grid(. ~ date) + ggtitle("Block 2")

block3  <- ggplot(b3, aes(plot, sample_size)) +
  geom_boxplot(show.legend = FALSE) + 
  geom_beeswarm(aes(colour = plot), show.legend = FALSE) +
  facet_grid(. ~ date) + ggtitle("Block 3")

## Plot horizontally
block1 + block2 + block3  

# or 

## Plot vertically
block1 + block2 + block3 + plot_layout(nrow = 3)
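
A more compact variant of the same idea, as a sketch: instead of three separate plots combined with patchwork, a single `facet_grid(block ~ date)` call lays out one panel per block and date. This swaps `geom_beeswarm` for ggplot2's own `geom_jitter` to avoid the extra package. For brevity it uses only the 20 rows from the first post, so only one panel appears; with the full 300 rows the grid would have 3 rows and 5 columns.

```r
library(ggplot2)

# First 20 rows from the thread (block 1, 6/11/16); swap in the full data frame
seed_count <- data.frame(
  date = "6/11/16",
  block = 1L,
  plot = factor(rep(1:4, each = 5)),
  sample_size = c(13, 10, 13, 17, 10,
                  7, 13, 15, 9, 15,
                  39, 4, 31, 36, 18,
                  12, 13, 15, 22, 19)
)

# Real dates keep the facet columns in calendar order
seed_count$date <- as.Date(seed_count$date, format = "%m/%d/%y")

p <- ggplot(seed_count, aes(plot, sample_size)) +
  geom_boxplot(outlier.shape = NA) +   # hide outliers; the points show them
  geom_jitter(aes(colour = plot), width = 0.15, height = 0,
              show.legend = FALSE) +
  facet_grid(block ~ date, labeller = label_both) +
  labs(x = "Plot (seeding rate)", y = "Plants per ring")

print(p)
```

Putting blocks in rows and dates in columns makes it possible to compare seeding rates within a date by reading across one panel, and to compare blocks by reading down a column.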
