How to avoid deleting the data when there are no empty cells in a column of a dataframe?

Hi,

I am analysing large datasets in R and came across an issue. For instance, I have a column in a data frame (df) with blank cells; here the -which command works as expected (see example below).

dput(df)
structure(list(Sample_title = c("Healthy Control, biological rep1", 
"Healthy Control, biological rep2", "Healthy Control, biological rep3", 
"Healthy Control, biological rep4"), Sample_accession = c("GSM542941", 
"GSM542942", "GSM542943", "GSM542944"), Patients = c("control", 
"control", "control", "control"), strain = c("none", "none", 
"none", "none"), Tissue = c("Blood", "", "", "Blood")), class = "data.frame", row.names = c(NA, 
-4L))

df = df[-which(df$Tissue == ""),]

dim(df)
[1] 2 5

I have a column in another data frame (df_1) without any blank cells, and when I run the -which command, all the data gets deleted. How do I avoid deleting the data and proceed to the next step without any changes to the data frame? (see example below)

dput(df_1)
structure(list(Sample_title = c("Healthy Control, biological rep1", 
"Healthy Control, biological rep2", "Healthy Control, biological rep3", 
"Healthy Control, biological rep4"), Sample_accession = c("GSM542941", 
"GSM542942", "GSM542943", "GSM542944"), Patients = c("control", 
"control", "control", "control"), strain = c("none", "none", 
"none", "none"), Tissue = c("Blood", "Blood", "Blood", "Blood"
)), class = "data.frame", row.names = c(NA, -4L))

df_1 = df_1[-which(df_1$Tissue == ""),]

dim(df_1)
[1] 0 5

Or is checking for blank cells before using -which the only way to do this?

sum(df$Tissue=="")
which(df$Tissue=="", arr.ind=TRUE)

Thank you,
Toufiq

Look at the steps of your query. When which(...) finds no matches it returns an empty vector (integer(0)), and negating an empty vector still gives an empty vector, so the subset selects no rows and you get an empty data frame. Ask instead for what you want to keep: df_1[which(df_1$Tissue != ""),]
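
To see this step by step, here is a minimal sketch on a cut-down copy of df_1 from the question (only two of its columns). The names blank_rows, negated, dropped_all, kept_which and kept_lgl are just illustrative, not anything from the original code:

```r
# Cut-down version of df_1 from the question (two of its columns)
df_1 <- data.frame(
  Sample_accession = c("GSM542941", "GSM542942", "GSM542943", "GSM542944"),
  Tissue = c("Blood", "Blood", "Blood", "Blood")
)

blank_rows <- which(df_1$Tissue == "")  # integer(0): no blank cells found
negated    <- -blank_rows               # still integer(0), not "all rows"

# A zero-length index selects zero rows, which is why the data disappears
dropped_all <- df_1[negated, ]

# Asking for the rows to keep works whether or not blanks exist
kept_which <- df_1[which(df_1$Tissue != ""), ]

# A plain logical index gives the same result here; note that an NA in
# Tissue would produce an NA row with this form, while which() skips it
kept_lgl <- df_1[df_1$Tissue != "", ]

nrow(dropped_all)
nrow(kept_which)
nrow(kept_lgl)
```

Either of the last two forms leaves df_1 untouched when there is nothing to remove, so no separate check is required before subsetting.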


@D.H.Slone
Thank you very much. This is indeed helpful. Perhaps checking for blank cells before using -which would be helpful, right? (see below)

sum(df$Tissue=="")
which(df$Tissue=="", arr.ind=TRUE)

You can run a check if you are curious (I usually do), but it is not needed for the code to work. You can also compare nrow(df) before and after an operation.
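
As a rough sketch of that kind of sanity check, again on a cut-down copy of df from the question (the variable names n_before, n_blank and n_after are only placeholders):

```r
# Cut-down version of df from the question (two of its columns)
df <- data.frame(
  Sample_accession = c("GSM542941", "GSM542942", "GSM542943", "GSM542944"),
  Tissue = c("Blood", "", "", "Blood")
)

n_before <- nrow(df)                 # row count before filtering
n_blank  <- sum(df$Tissue == "")     # how many blank cells there are

df <- df[which(df$Tissue != ""), ]   # keep only rows with a non-blank Tissue

n_after <- nrow(df)                  # row count after filtering

# The number of rows removed should match the number of blanks counted
stopifnot(n_before - n_after == n_blank)
```

If the stopifnot() line passes silently, the filter removed exactly the blank rows and nothing else.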


@D.H.Slone
Thank you very much.
