EdgeR : What am I doing? My output is not correct :( Please help me !

MartaLagg · December 18, 2022, 11:07am

Hello!

Please, I need help… I do not understand what is going on with my code.
Let me explain to you : I have a RNAseq results organized in gene counts.

In my experimental design I have a total of 18 samples, that must compared in 3 different groups,
but not between groups. To be more clear, I have 18 samples and 6 conditions. I don’t care about the first 6 samples and I focus only on the second 12 samples.

So in these second samples (columns from 8 to 13 and from 14 to 19) I need to compare the samples from 8 to 13 between them, and the samples from 14 to 19 between them, but not comparison between group 8to13 and group 14to19 should be done.

So, what I do does not work and I don’t understand why.

Here my code, simplified...

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

charge the file, “ a gene count file”

RNAseq = read_xlsx("path/gene_count.xlsx")

head(RNAseq)

####GROUP 8-13, Where samples 8,9 and 10 are triplicates and samples 11,12,13 are triplicates#####

group<- factor(c(1,1,1,2,2,2))

D <- DGEList(counts=RNAseq[,8:13], genes=RNAseq[2], group = factor(group)) #faire l'analyse

keep <- filterByExpr(D)

D <- D[keep, keep.lib.sizes = FALSE]

t = as.matrix(D$samples$lib.size)

rownames(t) = rownames(D$samples)

D <- calcNormFactors(D, method = "TMM")

design <- model.matrix(~group)

DGM <- estimateDisp(D, design)

fit1 <- glmQLFit(DGM, design)

qlf1 <-glmQLFTest(fit1)

outputEDGER <- topTags(qlf1, n=Inf)

MYRESULT <- as.data.frame(outputEDGER)

write_xlsx(MYRESULT, "mypath/results.xlsx")

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Please help a molecular biologist to understand!!
I am becoming crazy, the output I obtain is no pvalue differences and I know that it is false because I have already the results analysed and there are differences, but I need to obtain the same….

I am hopeless, I do not understand
I hope someone can help

Have a nice day!

technocrat · December 18, 2022, 10:09pm

Do you get an error message here? Otherwise, a reprexis needed. See.

MartaLagg · December 18, 2022, 10:35pm

No, it works.
The only problem is that in my file there are 4 conditions, and I need to compare them by groups of two. If I remove manually the columns I don’t need then it works, but there is a problem in the code. Like I cannot select some columns in the matrix for analysing

technocrat · December 18, 2022, 10:41pm

(enough to run) plus code that illustrates the error is needed to provide any insight.

MartaLagg · January 16, 2023, 12:24pm

OK thank you you have been nice, have a good day

AlexisW · January 16, 2023, 5:00pm

As technocrat is pointing out, it's hard to tell if there is any problem without looking at the data: the sequence of commands you use seems mostly correct, but if the columns are not in the order you assume or something like that, we can't tell just from your code.

The few weird things I can see from here are, first, what technocrat already pointed out, the filtering command should usually look like:

D <- D[keep, , keep.lib.sizes = FALSE]

(the second , is important to specify that you keep all columns). But it's on you, as you have the data, to check what D looks like before and after filtering to ensure that the filtering operation does what you want it to do.

Another weird thing is that you define t, but never use it, maybe it's on purpose, maybe you're not doing what you're trying to do.

A third weird thing: you are both giving the groups when defining DGEList(), which means there is a design included in that object, but then you overwrite it with design <- model.matrix(~group). It's not necessarily wrong, but you need to check what the design matrix looks like to make sure you're doing the comparison you think you're doing.

A good check you should try is make an MDS plot, if you see your samples cluster in an unexpected way that could be the sample names that got mixed up.

Another aspect is that you're using a glm test with only two groups (which is perfectly fine), but in that simple case you could try using exactTest() as described in Section 2.10. The glmQLF approach is useful for more complex designs.

Of course you should also check the intermediary results: what do fit1 and qlf1 look like? Can you find the reason genes do not appear different? What if you set your p-value threshold to 0.1 instead of 0.05?

In short, we can't find the reason here, you really need to check each step and make the appropriate plots and diagnostics to check that you are indeed doing what you think you're doing.

If I understand that correctly, what you want to do is not to select columns when building the DGElist object: You'd create an object with all columns, and a full design matrix, do all the dispersion estimates etc on the full object, then do your tests with a contrast argument to only test pairs of conditions.

system · February 27, 2023, 5:00pm

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.