ncol(countData) == nrow(colData) is not TRUE

maggie1 · October 24, 2022, 8:32am

Proj_count_data = read.table (file = "/Volumes/eHDD40_10TB/PROJECTS/Ariane_RNA seq/Raw data_RNA seq_Ariane/Data/BL6/count_matrix.txt", header = T, row.names = 1, sep = '\t')

Proj_col_data = read.table(file = "/Volumes/eHDD40_10TB/PROJECTS/Ariane_RNA seq/Raw data_RNA seq_Ariane/Data/BL6/BL6Metadata.csv", header = T, sep = ';')

head(Proj_count_data,8)
boxplot(Proj_count_data)
hist(Proj_count_data [,1])
pseudoCount = log2(Proj_count_data + 1)
boxplot(pseudoCount)
hist(pseudoCount[,1])

library(DESeq2)
library(ggplot2)
library(reshape)
pseudoCount = as.data.frame(pseudoCount)
df = melt(pseudoCount, variable.name = "variable", value.name = "value") # reshape the matrix
write.table(df, file ="/Volumes/eHDD40_10TB/PROJECTS/Ariane_RNA seq/Raw data_RNA seq_Ariane/Data/BL6/count_matrixnamechanged.txt")
df_new = read.table (file = "/Volumes/eHDD40_10TB/PROJECTS/Ariane_RNA seq/Raw data_RNA seq_Ariane/Data/BL6/Copy of count_matrixnamechanged.txt", header = T, row.names = 1, sep = '\t')
df_new
df_new = data.frame(df_new, Condition = substr (df$value, 1,18))
ggplot(df_new, aes(x = value, y = X , fill = X)) + geom_boxplot() + xlab("") +
ylab(expression(log[2](count + 1)))
dim(as.matrix(Proj_col_data))
dim(as.matrix(Proj_count_data))

We're testing for the different conditions

dds = DESeqDataSetFromMatrix(countData = Proj_count_data, colData = Proj_col_data, design =~ Condition)
dds

dim(as.matrix(Proj_col_data))
[1] 489222 6
dim(as.matrix(Proj_count_data))
[1] 27179 18

dds = DESeqDataSetFromMatrix(countData = Proj_count_data, colData = Proj_col_data, design =~ Condition)
Error in DESeqDataSetFromMatrix(countData = Proj_count_data, colData = Proj_col_data, :
ncol(countData) == nrow(colData) is not TRUE

How can I fix the problem?

technocrat · October 24, 2022, 4:33pm

Do countData and colData have the same dim() and are they both square?

AlexisW · October 24, 2022, 5:32pm

The count matrix should have one row per gene, one column per sample. So in your dataset you appear to have 27,179 genes, and 18 samples.

The colData data.frame should have one row per sample and one column per descriptor (for example, one column for age, one column for treatment, etc.). If your design is correct, that means you should have a single column, called Condition (you can have more columns, but they will be ignored). Here you have 489,222 rows in your colData table, which implies 489,222 samples. That does not match the 18 samples in your count matrix, so DESeq2 doesn't know what to do with it.

The exact solution depends on what your data really looks like, and we can't answer that from the content of your question. If you created a pseudobulk matrix from scRNA-Seq, you might be correctly giving the pseudobulk counts in a cell x condition matrix, but forgot to create the pseudobulk metadata and are giving all the single cells to DESeq2. That would explain the 489k cells.
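One more observation, offered as a guess rather than a diagnosis: 489,222 is exactly 27,179 × 18, so the metadata file may be in long format, with one row per gene per sample. Below is a minimal base R sketch of collapsing such a table to one row per sample. The toy objects and the column names `Sample` and `Condition` are assumptions for illustration, not taken from the real files.

```r
# Toy stand-ins for the real objects; all names and values are hypothetical.
set.seed(1)
counts <- matrix(rpois(20, lambda = 10), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("S", 1:4)))

# Long-format metadata: one row per gene per sample (5 x 4 = 20 rows).
long_meta <- data.frame(
  Sample    = rep(colnames(counts), each = nrow(counts)),
  Condition = rep(c("ctrl", "ctrl", "treated", "treated"), each = nrow(counts))
)

# This is the comparison that DESeqDataSetFromMatrix complains about.
nrow(long_meta) == ncol(counts)   # FALSE: 20 rows vs 4 samples

# Collapse to one row per sample and label the rows by sample name.
col_data <- unique(long_meta[, c("Sample", "Condition")])
rownames(col_data) <- col_data$Sample
col_data$Condition <- factor(col_data$Condition)

# Now there is one metadata row per count-matrix column.
stopifnot(nrow(col_data) == ncol(counts))
```

If the real metadata has a different layout, the same idea applies: reduce it until each sample appears on exactly one row before passing it as `colData`.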

maggie1 · October 25, 2022, 7:12am

The colData looks like this:

Sorry, I'm pretty new to RStudio and I do not know how to resolve my problem.

maggie1 · October 25, 2022, 7:35am

maggie1 · October 25, 2022, 8:21am

No, and I know the problem is that they don't have the same dimensions. I don't know how to make them have the same dimensions.

AlexisW · October 25, 2022, 1:13pm

What is the ID? Are replicate 1a and 1b independent biological replicates?

What is the Condition? How come you have row 3 and row 21 that are identical (replicate 2a) but for the value of Condition? Does that value mean something?

technocrat · October 25, 2022, 8:54pm

Sorry to have missed the dim() output in your post.

Generically, if you need to have two objects where the number of rows of one is equal to the number of columns of another, you need to decide which ones to keep or add. That's usually a domain issue (it requires knowing something about what the underlying data represents) rather than a programming problem.

The DESeqDataSetFromMatrix signature is

DESeqDataSetFromMatrix(countData, colData, design, tidy = FALSE, ignoreRank = FALSE, ...)

and your call conforms, so the problem is in the two arguments, which, per documentation

Rows of colData correspond to columns of countData

and the names have to correspond as well, although you can remove names from both.

Finally, it helps to run the example code to see if what the function produces has the kind of information you expect

library(DESeq2)

countData <- matrix(1:100,ncol=4)
condition <- factor(c("A","A","B","B"))
dds <- DESeqDataSetFromMatrix(countData, DataFrame(condition), ~ condition)
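The point about names corresponding can be checked before calling DESeq2 at all. The following is a small base R sketch under the assumption that sample names are stored as the column names of the counts and the row names of the metadata. The sample names `s1` to `s4` and the conditions are made up.

```r
# Hypothetical example data; sample names and conditions are invented.
countData <- matrix(1:100, ncol = 4,
                    dimnames = list(NULL, c("s1", "s2", "s3", "s4")))
colData <- data.frame(condition = factor(c("B", "A", "B", "A")),
                      row.names = c("s3", "s1", "s4", "s2"))

# Do both objects describe the same set of samples?
same_samples <- setequal(rownames(colData), colnames(countData))

# Are they also in the same order? In this toy data they are not.
same_order <- identical(rownames(colData), colnames(countData))

# Reorder the metadata rows to follow the count-matrix columns.
colData <- colData[colnames(countData), , drop = FALSE]

stopifnot(same_samples,
          identical(rownames(colData), colnames(countData)))
```

Indexing by name, rather than sorting both objects separately, keeps each sample attached to its own condition while the rows are rearranged.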
