Noob asks for help making a GLM to test specific diversity according to soil type

Hello there!

I'm here to ask for help with a GLM (Generalized Linear Model).
Some context: I am an intern working on the zooplankton of New Caledonia. For my report I have to do statistical tests, so I decided to test whether the type of soil has an impact on specific diversity (species diversity), to see whether the type of soil creates endemism.
My data table contains the results of 2 months of zooplankton sorting, done on samples from different ponds in New Caledonia.
So I have a column with the sample names (each corresponds to a site), a column with the species found (for example Daphnia cephalata), and a "landuse" column holding a number I obtained with QGIS, a mapping (GIS) application; each number corresponds to a type of soil.
My supervisor advised me to use a GLM because, according to him, my data are not suitable for other tests.
But hey, I suck at statistics and I've never done a GLM. My script in RStudio currently looks like this: :poop:
I'm joking, but it looks like this:

data <- read.table("fichier_tri_stat.csv" , fill = TRUE, header = TRUE, sep = ";" )
attach(data)
data

y<-cbind(species, landuse)
model<-glm(y~???, family=binomial(link="logit"))

And that's all, and it's probably wrong... So if you're good with RStudio and the subject interests you, I need hellllp.

PS: I'm French, so sorry for my bad English.
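A note on the glm() call above: cbind(species, landuse) binds two categorical columns, which is not what a binomial GLM expects. Its response should be either a 0/1 (absence/presence) variable or a two-column matrix cbind(successes, failures) of counts. As a sketch only, with invented data, one way to use a binomial GLM here is to model whether a given species is present in each sample as a function of soil type. The data frames samples and agg, their columns and the 0/1 values are hypothetical, not taken from the post.

```r
# Hypothetical data (not from the post): one row per sample, with a 0/1
# column saying whether one chosen species was found in that sample.
samples <- data.frame(
  sample  = paste0("S", 1:12),
  landuse = factor(c(10, 10, 10, 10, 20, 20, 20, 20, 30, 30, 30, 30)),
  present = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0)
)

# Logistic regression: probability of presence as a function of soil type.
# landuse is a factor, so each code is a category rather than a number.
fit_pa <- glm(present ~ landuse, family = binomial(link = "logit"),
              data = samples)
summary(fit_pa)

# Fitted probability of presence for each soil type
predict(fit_pa, newdata = data.frame(landuse = factor(c(10, 20, 30))),
        type = "response")

# The same model written with a two-column response:
# cbind(number of samples with the species, number without it)
agg <- data.frame(landuse   = factor(c(10, 20, 30)),
                  n_present = c(3, 1, 2),
                  n_absent  = c(1, 3, 2))
fit_agg <- glm(cbind(n_present, n_absent) ~ landuse,
               family = binomial(link = "logit"), data = agg)
coef(fit_agg)  # same estimates as coef(fit_pa)
```

With a factor predictor like this, the fitted probabilities equal the observed proportion of samples containing the species on each soil type (here 3/4, 1/4 and 2/4), and summary() tests each soil type against the first (reference) level.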

I may not be able to help much here but I will give it a try.

Let's start by clarifying the kind of data you have. Here is my understanding of the data. You have two columns of interest, species and landuse. Species is coded as a name, e.g. Daphnia cephalata. Landuse is a number that encodes a type of soil, so it is a category and not a continuous measure. Is all of that correct?

You want to see if "specific diversity" depends on landuse, but nothing in the data I described directly quantifies how many species are associated with each value of landuse. You just have the name of a species and the landuse of the site where it was observed. Wouldn't you need to summarize the data to show how many species are present for each value of landuse? Or am I completely misunderstanding your goal?

P.S. Your English is excellent!
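To make the summarizing step concrete: with one row per species record, species richness per sample (site) can be counted and kept alongside its landuse code. This is a base-R sketch with invented data; the column names sample, species and landuse are assumptions modeled on the columns described in the question.

```r
# Hypothetical long-format data modeled on the question: one row per
# species record, with the sample (site) and its landuse code.
records <- data.frame(
  sample  = c("S1", "S1", "S1", "S2", "S2", "S3", "S3", "S3", "S4"),
  species = c("A", "C", "D", "B", "C", "A", "D", "A", "C"),
  landuse = c(10, 10, 10, 10, 10, 20, 20, 20, 20)
)

# Species richness: number of distinct species per sample. Counting
# unique names avoids double-counting a species recorded twice (S3, "A").
richness <- aggregate(species ~ sample + landuse, data = records,
                      FUN = function(s) length(unique(s)))
names(richness)[names(richness) == "species"] <- "n_species"
richness
```

Each row of richness is then one sample with a count, which is the kind of table a count GLM (for example family = poisson with factor(landuse) as the predictor) could be fitted to.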

Thank you very much for your answer!
Yes, that's right. For example: 10 = Cropland, rainfed.

In fact, I would like to get RStudio to "read" which species are present at each site, and thus see whether the same species are found on the same types of soil. I don't know if that's possible.

PS: thanks, Google Translate :joy:

Here are a couple of ways to summarize the data. I invented a small data set to use in the example. The first summary shows how many species are found for each landuse value. The second summary is a cross table showing which species occur with each landuse value. If your data also has a column labeling the soil type of each landuse, you could summarize by soil type instead of by landuse.

library(dplyr)
#Invent some data
DF <- data.frame(Species = c("A","C","D","B","C","A","D","B","C","D"),
                  landuse = c(1,1,1,2,2,3,3,3,4,4))
#Display the data
DF
   Species landuse
1        A       1
2        C       1
3        D       1
4        B       2
5        C       2
6        A       3
7        D       3
8        B       3
9        C       4
10       D       4
 
#Count the distinct species per landuse category
SpeciesCount <- DF |> group_by(landuse) |> summarize(N = n_distinct(Species))
SpeciesCount
# A tibble: 4 x 2
  landuse     N
    <dbl> <int>
1       1     3
2       2     2
3       3     3
4       4     2

#Make a table of species and landuse
table(DF$Species, DF$landuse)
   
    1 2 3 4
  A 1 0 1 0
  B 0 1 1 0
  C 1 1 0 1
  D 1 0 1 1
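The cross table can also address the follow-up question of whether the same species turn up on the same soil types. As a minimal sketch on the same invented DF, the table is converted to presence/absence and the Jaccard similarity (species shared divided by species found in either category) is computed for each pair of landuse values. The helper jaccard() is hypothetical, written only for illustration.

```r
# Same toy data as in the answer above
DF <- data.frame(Species = c("A","C","D","B","C","A","D","B","C","D"),
                 landuse = c(1,1,1,2,2,3,3,3,4,4))

# Presence/absence matrix: rows = species, columns = landuse categories
pa <- unclass(table(DF$Species, DF$landuse)) > 0

# Hypothetical helper: Jaccard similarity of two presence/absence vectors
# = species present in both / species present in at least one
jaccard <- function(a, b) sum(a & b) / sum(a | b)

# Similarity between every pair of landuse categories
cats <- colnames(pa)
sim <- sapply(cats, function(i) sapply(cats, function(j) jaccard(pa[, i], pa[, j])))
round(sim, 2)
```

Values near 1 mean two categories share most of their species; values near 0 suggest distinct assemblages. In this toy example landuse 1 and 4 are the most similar (2 of 3 species shared). Base R's dist(t(pa) * 1, method = "binary") should return 1 minus these similarities.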

Thank you so much!
What you told me helped, and my supervisor also helped me in the meantime. I now have the results of my GLM, but interpreting them is a whole other story.
Here are the results of the GLM, along with the dispersion, residuals, uniformity and outlier plots.
I just saw that you can only post one picture per post, so I'm posting the GLM one; if you're interested, I can post the others.
[Image: GLM results]

I'm having trouble finding a site that explains the different plots I get. Do you know how to interpret them, or do you know a site that covers this?

Thanks, and have a nice weekend!
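On the plots: the four names mentioned (dispersion, residuals, uniformity, outliers) resemble the checks in the DHARMa package (testDispersion(), plotResiduals(), testUniformity(), testOutliers()), although the post doesn't say which tool produced them. As a base-R complement, here is a sketch of the most common first check for a count GLM, overdispersion, on simulated data. The Poisson family, the column names and the simulated counts are assumptions, not the model from the post.

```r
# Hypothetical per-sample species counts and soil types (simulated, not real)
set.seed(1)
dat <- data.frame(
  landuse   = factor(rep(c(10, 20, 30), each = 8)),
  n_species = c(rpois(8, 4), rpois(8, 2), rpois(8, 6))
)

# Poisson GLM: species richness per sample as a function of soil type
fit <- glm(n_species ~ landuse, family = poisson, data = dat)
summary(fit)

# Dispersion ratio: sum of squared Pearson residuals / residual df.
# Values near 1 are consistent with the Poisson assumption; values well
# above 1 suggest overdispersion.
disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
disp

# Likelihood-ratio test: does soil type improve on an intercept-only model?
anova(fit, test = "Chisq")

# Base R diagnostic plots: residuals vs fitted, Q-Q, scale-location, leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

If disp is clearly above 1, family = quasipoisson or a negative binomial model (MASS::glm.nb()) are common alternatives. For count data the base plot(fit) panels can be hard to read, which is one reason simulation-based residuals such as DHARMa's are popular.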
