Vaginal microbiome CST

Hello everyone! I am currently doing microbiome analysis using 16S rRNA marker gene sequencing. I made all the necessary plots (phylum and genus abundance) and calculated the alpha and beta diversity parameters.
Reading about the vaginal microbiome, I always find that it is clustered into 4 CSTs (community state types) according to the dominant Lactobacillus species. Hierarchical clustering of the taxonomic profiles using Bray-Curtis distances and Ward linkage was first employed to define the vaginal CSTs.
I'm having difficulties doing it on my own. Can someone provide me with a good tutorial to do so? Thank you!
I'm using Bioconductor packages for the analysis.


So you need two things. First, you need to compute the Bray-Curtis "distance" (or dissimilarity) between your samples. There are packages that implement it, e.g. {ecodist} or {abdiv}, or you can reimplement it yourself.
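
If you want to go the "reimplement it yourself" route, here is a minimal base-R sketch, assuming a matrix with samples in rows and taxa in columns. It uses the usual definition, the sum of absolute differences divided by the sum of all abundances in the two samples. The function names bray_curtis() and bray_curtis_dist() and the toy counts are made up for illustration; they do not come from any package.

```r
# Bray-Curtis dissimilarity between two abundance vectors
# (hypothetical helper, not from a package)
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

# Pairwise dissimilarities for a samples-by-taxa matrix,
# returned as a "dist" object so it can be passed to hclust()
bray_curtis_dist <- function(m) {
  m <- as.matrix(m)
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- bray_curtis(m[i, ], m[j, ])
    }
  }
  as.dist(d)
}

# toy counts: 3 samples (rows) x 3 taxa (columns), made up for illustration
toy <- matrix(c(10, 0,  0,
                 5, 5,  0,
                 0, 0, 10),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("s1", "s2", "s3"),
                              c("taxA", "taxB", "taxC")))

bc <- bray_curtis_dist(toy)
bc
```

Note that two samples with all-zero counts give 0/0 (NaN), so empty samples should be filtered out first. For real data a package implementation is probably the safer and faster choice.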

Second, you will need to perform hierarchical clustering with Ward linkage. This can be done with the hclust() function (no other package needed). Looking at ?hclust, you will find that it has a method argument to specify the linkage (see the Details section about Ward: "ward.D2" is the option that implements Ward's criterion on unsquared dissimilarities), and a d argument, where you give it the distances you computed previously. You should be able to find tutorials about hclust(); it's a common function.

Here, following the {ecodist} example (this is not an endorsement, I have never used a Bray-Curtis distance), you could use code like this:

library(ecodist)

# load example dataset
data(graze)
# take a look
dim(graze)
graze[1:5,1:5]


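# compute Bray-Curtis distances on the abundance columns only
# (the first two columns are dropped, as in the {ecodist} example)
dists <- bcdist(graze[, -c(1:2)])

# check that we got a "dist" object, which is what hclust() expects
class(dists)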

hc <- hclust(dists,
             method = "ward.D2")

plot(hc)

# Decide on clusters: choose 4 clusters
my_cut <- cutree(hc, k = 4)

# number of observations in each cluster
table(my_cut)

# make a table for each observation, of label and cluster
data.frame(label = hc$labels,
           clust = my_cut)
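
Since CSTs are described by their dominant species, a possible next step is to check which taxon dominates each cluster. Below is a minimal sketch, assuming a relative-abundance matrix with samples in rows and taxa in columns, plus a cluster vector like the one returned by cutree(). The matrix abund, its values, the taxon labels, and the clusters vector are all invented for illustration; you would swap in your own table and my_cut.

```r
# toy relative-abundance table: 6 samples x 3 taxa (made-up values)
abund <- matrix(c(0.90, 0.05, 0.05,
                  0.85, 0.10, 0.05,
                  0.05, 0.90, 0.05,
                  0.10, 0.85, 0.05,
                  0.05, 0.05, 0.90,
                  0.10, 0.05, 0.85),
                ncol = 3, byrow = TRUE,
                dimnames = list(paste0("s", 1:6),
                                c("L_crispatus", "L_iners", "G_vaginalis")))

# stand-in for the output of cutree(); one cluster id per sample
clusters <- c(1, 1, 2, 2, 3, 3)
names(clusters) <- rownames(abund)

# mean abundance of each taxon within each cluster:
# rowsum() adds up the samples of each cluster, then we divide by cluster size
cluster_means <- rowsum(abund, group = clusters) / as.vector(table(clusters))

# taxon with the highest mean abundance in each cluster
dominant <- colnames(cluster_means)[apply(cluster_means, 1, which.max)]
names(dominant) <- rownames(cluster_means)
dominant
```

This only labels each cluster by its most abundant taxon on average; whether a cluster really corresponds to a published CST is something to check against the literature.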