How Do I Show Levels In R?

I've been tasked with running a one-way ANOVA test on some data. However, the data used in examples online doesn't look like the data I was given. This is where I'm having trouble, because I need to find the levels to start the ANOVA process (see the steps below). Any suggestions on how I can get past the first step?


(1) show the levels --> levels(my_data$group)

(2) re-order the levels --> my_data$group <- ordered(my_data$group, levels = c("...", "...", "..."))

(3) use dplyr

(4) use ggpubr

(5) create boxplot and plotmeans

(6) compute ANOVA test
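To make the six steps above concrete, here is a minimal sketch that uses R's built-in PlantGrowth data set (a weight column and a three-level group factor) as a stand-in for my_data, which we don't have. It swaps base R's tapply() in for the dplyr step and leaves out the two plotting steps (4 and 5), so it runs without any extra packages.

```r
# PlantGrowth ships with R: columns weight (numeric) and group (factor).
# It stands in for my_data here.
my_data <- PlantGrowth

# (1) show the levels of the grouping factor
levels(my_data$group)   # "ctrl" "trt1" "trt2"

# (2) re-order the levels (here the order is simply spelled out explicitly)
my_data$group <- ordered(my_data$group, levels = c("ctrl", "trt1", "trt2"))

# (3) per-group means; base R tapply() stands in for dplyr's group_by()/summarise()
group_means <- tapply(my_data$weight, my_data$group, mean)
print(round(group_means, 3))

# (6) one-way ANOVA of weight across the groups, then its summary table
res.aov <- aov(weight ~ group, data = my_data)
summary(res.aov)
```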

Assuming your data is in a data.frame called tc_frame,
and that it has a factor variable called my_awesome_group whose levels you want to discover,
then typing levels(tc_frame$my_awesome_group) into the console would be the way to go!
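A small self-contained illustration, using a made-up tc_frame (only the names tc_frame and my_awesome_group come from the answer; the values are invented). One thing worth knowing: levels() only works on factors. Text columns, such as those typically read in from an Excel sheet, are stored as character, and levels() of a character column returns NULL, so convert it with factor() first.

```r
# Made-up stand-in for tc_frame; the values are purely illustrative
tc_frame <- data.frame(
  my_awesome_group = c("B", "A", "C", "A", "B"),
  value            = c(2.1, 3.4, 1.8, 2.9, 2.5),
  stringsAsFactors = FALSE   # keep text as character, as imported data often is
)

# A character column has no levels, so this returns NULL
levels(tc_frame$my_awesome_group)   # NULL

# Convert to a factor; by default the levels are the sorted unique values
tc_frame$my_awesome_group <- factor(tc_frame$my_awesome_group)
levels(tc_frame$my_awesome_group)   # "A" "B" "C"
```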


So I'm brand new at R and I have no idea what you just said (I'm so, so sorry). I have attached a picture of the Excel data that I uploaded to RStudio, but I'm having trouble getting the data into a shape I can run a one-way ANOVA on. Could you please help me walk through the basic steps?

That's ok. Everyone starts somewhere.
RStudio roughly breaks down like this:
[Screenshot: the RStudio window, divided into numbered sections]
Section 2 is the console, where you write R commands.

Your data frame is called Phylum_Data_for_R.
Which variable is the grouping variable whose levels you want to investigate?


So the top row holds the phyla (variable 1) and the left column holds the sites tested (variable 2). Each site has a certain amount of each phylum present. I don't know which variable to use because I don't really understand the role of groups and levels. What do you suggest?
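In the layout described, the phylum names are column headers rather than values in a column, so there is nothing for levels() to find. Below is a minimal base-R sketch, assuming a made-up miniature of that layout (hypothetical site and phylum names, two phylum columns instead of the real ones), of reshaping it so that phylum becomes a single factor column whose levels are the old column names.

```r
# Made-up miniature of the described layout: one row per site, one column per phylum
wide <- data.frame(
  site     = c("site_1", "site_2", "site_3"),
  phylum_A = c(12, 15, 9),
  phylum_B = c(30, 22, 27),
  stringsAsFactors = FALSE
)

# stack() turns the phylum columns into two columns:
# values (the amounts) and ind (a factor holding the old column names)
phylum_cols <- setdiff(names(wide), "site")
long <- stack(wide[phylum_cols])
names(long) <- c("amount", "phylum")

# stack() works column by column, so repeat the site labels once per phylum column
long$site <- rep(wide$site, times = length(phylum_cols))

levels(long$phylum)   # "phylum_A" "phylum_B"
```

With every phylum column included, phylum would have one level per phylum, and amount would be the single numeric column an ANOVA compares across those levels.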

It seems like you're hazy on your intentions.
Do you want to see whether there are statistically significant differences in the mean quantities measured across various sites, for some set of phyla?

Here is an example where I do that, and I think you could adapt it to cover your case.
I encourage you to run through it line by line (in the main editor window, press Ctrl+Enter to send the line under your cursor to the console to be run).

library(tidyverse)  # for data manipulation; run install.packages("tidyverse") in the console if you don't have this library yet
library(ggpubr)  # for visualisation; install.packages("ggpubr")
set.seed(42) # so that the random data, and therefore the analysis, is repeatable

made_up_df <- tibble(
  test_site_codes = letters[1:20],
  phylum_1 = rnorm(20, mean = 10, sd = 2),
  phylum_2 = rnorm(20, mean = 8, sd = 4)
)

# have a look at the data in table form
made_up_df

# this is a bad shape for what we want to do, so reorganise it into long form
df2 <- pivot_longer(made_up_df,
                    cols = starts_with("phylum_"),
                    names_to = "phylum",
                    values_to = "sampled_amount")

# see the preferred shape for the data
df2

# make phylum a grouping variable by converting it to a factor with levels
df2$phylum <- as.factor(df2$phylum)

# inspect the levels; they should be phylum_1 and phylum_2
levels(df2$phylum)

# visualise what we are investigating
ggboxplot(df2, x = "phylum", y = "sampled_amount",
          color = "phylum", palette = c("#00AFBB", "#E7B800"),
          ylab = "sampled amount", xlab = "phylum")

# Mean plots
# ++++++++++++++++++++
# Plot sampled_amount by phylum
# Add error bars: mean_se
# (other values include: mean_sd, mean_ci, median_iqr, ....)
ggline(df2, x = "phylum", y = "sampled_amount",
       add = c("mean_se", "jitter"))

# Compute the analysis of variance
res.aov <- aov(sampled_amount ~ phylum, data = df2)
# Summary of the analysis
summary(res.aov)
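As a sanity check on what aov() computes in the example above, this base-R sketch rebuilds the same kind of made-up data with stack() instead of pivot_longer() (so it needs no packages), pulls the F value and p-value out of the ANOVA summary, and compares them with a pooled-variance two-sample t-test. With exactly two groups the two tests are equivalent: F equals t squared and the p-values agree.

```r
# Base-R version of the made-up data (same seed and rnorm() calls as above)
set.seed(42)
made_up_df <- data.frame(
  test_site_codes = letters[1:20],
  phylum_1 = rnorm(20, mean = 10, sd = 2),
  phylum_2 = rnorm(20, mean = 8, sd = 4)
)

# Long format without tidyr: stack() the two phylum columns
df2 <- stack(made_up_df[c("phylum_1", "phylum_2")])
names(df2) <- c("sampled_amount", "phylum")

# One-way ANOVA; summary() returns a list whose first element is the ANOVA table
res.aov <- aov(sampled_amount ~ phylum, data = df2)
aov_table <- summary(res.aov)[[1]]
f_value <- aov_table[["F value"]][1]
p_value <- aov_table[["Pr(>F)"]][1]

# Pooled-variance t-test on the same two groups
tt <- t.test(sampled_amount ~ phylum, data = df2, var.equal = TRUE)
c(F = f_value, t_squared = unname(tt$statistic)^2,
  p_anova = p_value, p_ttest = tt$p.value)
```

Once there are three or more groups, only the ANOVA applies, which is why aov() is the general tool here.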

I've figured out the question you asked me before. I have 3 variables (D, HR, and C). Within each variable there are 4 levels (15B, 610B, 15F, and 610F). I know that an ANOVA is run with two columns. However, I have 53 columns (i.e. 53 phyla). What should I do? Thank you again for your patience. I'm learning a lot from our conversations.

It's always easier to drive towards a solution with a target in mind. Rather than driving mechanically towards running an anova function, I think it's best to have an actual scientific question under study that you want to understand, for which ANOVA may (or may not) be a good approach. What do you actually want to understand about your data? Is there a hypothesis you want to test?

Hopefully I'm not distracting from the previous paragraph by saying that your data, as you describe it, is 'not tidy': different variables are sharing living space within a single column. It's usually beneficial to separate out things that are separate. Here that would mean parsing the strings to split the D, HR and C values from the 15B, 610B, 15F and 610F values, putting each in its own column. But what would be good names for these columns? What are these codes telling us?
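Here is a minimal base-R sketch of that tidying, under clearly stated assumptions: the site labels are assumed to look like "D_15B" (code, underscore, sample), the column names treatment and sample_code are hypothetical placeholders for whatever the codes turn out to mean, and three phylum columns stand in for the 53. In the tidyverse style used earlier, tidyr's separate() and pivot_longer() would do the same two jobs.

```r
# Made-up miniature: site labels ASSUMED to be "<code>_<sample>";
# three phylum columns stand in for the real 53
phylum_data <- data.frame(
  site     = c("D_15B", "D_610B", "HR_15F", "C_610F"),
  phylum_1 = c(5, 7, 3, 6),
  phylum_2 = c(11, 9, 14, 10),
  phylum_3 = c(0, 2, 1, 4),
  stringsAsFactors = FALSE
)

# Split each label at the underscore into two new factor columns
# (treatment and sample_code are hypothetical names)
parts <- do.call(rbind, strsplit(phylum_data$site, "_", fixed = TRUE))
phylum_data$treatment   <- factor(parts[, 1], levels = c("D", "HR", "C"))
phylum_data$sample_code <- factor(parts[, 2], levels = c("15B", "610B", "15F", "610F"))

# Reshape to long form: every column that is not an ID column is a phylum,
# so the same code handles 3 or 53 phylum columns
id_cols     <- c("site", "treatment", "sample_code")
phylum_cols <- setdiff(names(phylum_data), id_cols)
long <- stack(phylum_data[phylum_cols])
names(long) <- c("amount", "phylum")

# stack() goes column by column, so repeat the ID columns once per phylum
n_phyla <- length(phylum_cols)
long$site        <- rep(phylum_data$site, times = n_phyla)
long$treatment   <- rep(phylum_data$treatment, times = n_phyla)
long$sample_code <- rep(phylum_data$sample_code, times = n_phyla)

str(long)
```

Each row is now one amount for one phylum at one site, with treatment and sample_code available as separate grouping factors for whichever question turns out to matter.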
