ncol(countData) == nrow(colData) is not TRUE

Proj_count_data = read.table (file = "/Volumes/eHDD40_10TB/PROJECTS/Ariane_RNA seq/Raw data_RNA seq_Ariane/Data/BL6/count_matrix.txt", header = T, row.names = 1, sep = '\t')

Proj_col_data = read.table(file = "/Volumes/eHDD40_10TB/PROJECTS/Ariane_RNA seq/Raw data_RNA seq_Ariane/Data/BL6/BL6Metadata.csv", header = T, sep = ';')

head(Proj_count_data,8)
boxplot(Proj_count_data)
hist(Proj_count_data [,1])
pseudoCount = log2(Proj_count_data + 1)
boxplot(pseudoCount)
hist(pseudoCount[,1])

library(DESeq2)
library(ggplot2)
library(reshape)
pseudoCount = as.data.frame(pseudoCount)
df = melt(pseudoCount, variable.name = "variable", value.name = "value") # reshape the matrix
write.table(df, file ="/Volumes/eHDD40_10TB/PROJECTS/Ariane_RNA seq/Raw data_RNA seq_Ariane/Data/BL6/count_matrixnamechanged.txt")
df_new = read.table (file = "/Volumes/eHDD40_10TB/PROJECTS/Ariane_RNA seq/Raw data_RNA seq_Ariane/Data/BL6/Copy of count_matrixnamechanged.txt", header = T, row.names = 1, sep = '\t')
df_new
df_new = data.frame(df_new, Condition = substr (df$value, 1,18))
ggplot(df_new, aes(x = value, y = X , fill = X)) + geom_boxplot() + xlab("") +
ylab(expression(log[2](count + 1)))
dim(as.matrix(Proj_col_data))
dim(as.matrix(Proj_count_data))

We're testing for the different conditions

dds = DESeqDataSetFromMatrix(countData = Proj_count_data, colData = Proj_col_data, design =~ Condition)
dds

dim(as.matrix(Proj_col_data))
[1] 489222 6
dim(as.matrix(Proj_count_data))
[1] 27179 18

dds = DESeqDataSetFromMatrix(countData = Proj_count_data, colData = Proj_col_data, design =~ Condition)
Error in DESeqDataSetFromMatrix(countData = Proj_count_data, colData = Proj_col_data, :
ncol(countData) == nrow(colData) is not TRUE

How can I fix the problem?

Do countData and colData have the same dim() and are they both square?


The count matrix should have one row per gene, one column per sample. So in your dataset you appear to have 27,179 genes, and 18 samples.

The colData data.frame should have one row per sample, and one column per descriptor (for example one column for age, one column for treatment, etc.). If your design is correct, that means you need a single column, called Condition (you can have more columns, but they will be ignored). Here you have 489,222 rows in your colData table, which implies 489,222 samples. That does not match the 18 samples in your count matrix, so DESeq2 doesn't know what to do with it.
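As a side observation, 27,179 x 18 = 489,222, so the row count of the metadata table would be consistent with a long-format (melted) table that has one row per gene-sample pair. That is only an inference from the numbers, not something confirmed in this thread. Under that assumption, a minimal sketch of collapsing such a table to one row per sample could look like the following; all object names, column names, and values here are hypothetical toy data, not the poster's files.

```r
set.seed(1)

# Toy count matrix: 5 genes x 4 samples (hypothetical names and values)
counts <- matrix(rpois(20, lambda = 10), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# A long-format table with one row per gene-sample pair, similar to melt() output
long <- data.frame(
  Sample    = rep(colnames(counts), each = nrow(counts)),
  Condition = rep(c("control", "control", "treated", "treated"),
                  each = nrow(counts)),
  value     = as.vector(counts),
  stringsAsFactors = FALSE
)

# The long table has genes * samples rows, far more than the number of samples
stopifnot(nrow(long) == nrow(counts) * ncol(counts))

# Collapse to one row per sample, keeping only sample-level descriptors
col_data <- unique(long[, c("Sample", "Condition")])
rownames(col_data) <- as.character(col_data$Sample)
col_data$Condition <- factor(col_data$Condition)

# The condition DESeq2 checks: one metadata row per count-matrix column
stopifnot(ncol(counts) == nrow(col_data))
stopifnot(identical(rownames(col_data), colnames(counts)))
```

If both checks pass on the real data, the `ncol(countData) == nrow(colData)` condition from the error message is satisfied.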

The exact solution depends on what your data really looks like; we can't answer that from the content of your question. If you created a pseudobulk matrix from scRNA-Seq, you might be correctly giving the pseudobulk counts in a gene x condition matrix, but forgot to create the pseudobulk metadata and are giving all the single cells to DESeq2, which would explain the 489k cells.


The colData looks like this:

Sorry, I'm pretty new to RStudio and I don't know how to resolve my problem.


No, and I know the problem is that they don't have the same dimensions. I don't know how to make them have the same dimensions.

What is the ID? Are replicate 1a and 1b independent biological replicates?

What is the Condition? How come you have row 3 and row 21 that are identical (replicate 2a) but for the value of Condition? Does that value mean something?


Sorry to have missed the dim() output in your post.

Generically, if you need to have two objects where the number of rows of one is equal to the number of columns of another, you need to decide which ones to keep or add. That's usually a domain issue (it requires knowing something about what the underlying data represents) rather than a programming problem.

The DESeqDataSetFromMatrix signature is

DESeqDataSetFromMatrix(countData, colData, design, tidy = FALSE, ignoreRank = FALSE, ...)

and your call conforms, so the problem is in the two arguments, which, per documentation

Rows of colData correspond to columns of countData

and the names have to correspond as well, although you can remove names from both.
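To illustrate the row/column correspondence, here is a hedged sketch in base R of lining up a metadata table with the columns of a count matrix by sample name. The objects `counts` and `meta`, the `ID` and `Condition` columns, and all values are hypothetical. Dropping duplicated IDs by keeping the first occurrence is only a placeholder: which rows to keep is the domain decision mentioned above.

```r
# Toy count matrix: 4 genes x 3 samples (hypothetical)
counts <- matrix(1:12, ncol = 3,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Toy metadata with a duplicated sample (s1), an extra sample (s4),
# and rows in a different order than the count-matrix columns
meta <- data.frame(
  ID        = c("s3", "s1", "s2", "s1", "s4"),
  Condition = c("B", "A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Placeholder rule: keep the first row for each ID
meta_unique <- meta[!duplicated(meta$ID), ]

# Find the metadata row for each count-matrix column, in column order
idx <- match(colnames(counts), meta_unique$ID)
stopifnot(!anyNA(idx))  # every sample in counts must have metadata

col_data <- meta_unique[idx, , drop = FALSE]
rownames(col_data) <- col_data$ID
col_data$Condition <- factor(col_data$Condition)

# Rows of col_data now correspond to columns of counts, by number and by name
stopifnot(nrow(col_data) == ncol(counts))
stopifnot(identical(rownames(col_data), colnames(counts)))
```

A table prepared this way has one row per column of the count matrix, in the same order, which is the arrangement the documentation sentence quoted above describes.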

Finally, it helps to run the example code to see if what the function produces has the kind of information you expect

library(DESeq2)

countData <- matrix(1:100,ncol=4)
condition <- factor(c("A","A","B","B"))
dds <- DESeqDataSetFromMatrix(countData, DataFrame(condition), ~ condition)
