NAs introduced by coercion; Error in mutate_impl(.data, dots) : Column `XY` must be length 96 (the number of rows) or one, not 8

Hi everybody!
I'm an absolute newbie to R, and with help from a friend I put together the script below. I played around with it, trying different things, and now it seems I have broken it. When I try to convert one of the factors to numeric, I keep getting this error message:

NAs introduced by coercion
Error in mutate_impl(.data, dots) : 
  Column `DoseFactorX` must be length 96 (the number of rows) or one, not 8

I have no idea what I'm doing wrong. Is anyone able to save my life? I'm devastated.

for(j in 1:length(fileNames)){
# read all sheets from the current xlsx file into a named list
inpList <- list()
SheetNames <- openxlsx::getSheetNames(paste(inpDatadir,fileNames[j], sep=""))
for(i in 1:length(SheetNames)){
  inp <- read_xlsx(paste(inpDatadir,fileNames[j], sep=""), sheet = i, range = "B12:M20")
  inpList[[SheetNames[i]]] <- inp
}

 
for(i in 1:length(SheetNames)){

## extract dose and drug name from sheetnames
Sheetnam <- SheetNames[i]
DrugNames <- unlist(strsplit(Sheetnam, "_"))
Drug1 <- unlist(strsplit(DrugNames[2], "(?=[A-Za-z])(?<=[0-9])|(?=[0-9])(?<=[A-Za-z])", perl=TRUE))
Drug2 <- unlist(strsplit(DrugNames[3], "(?=[A-Za-z])(?<=[0-9])|(?=[0-9])(?<=[A-Za-z])", perl=TRUE))

celltype  <- DrugNames[1]
DrugYDose <- Drug1[2]
DrugYName <- Drug1[1]
DrugXDose <- Drug2[2]
DrugXName <- Drug2[1]

## format data to longlist
test <- as.data.frame(inpList[[i]])
doseFactorX=as.character(c(0,0,0.0625,0.0625,0.125,0.125,0.25,0.25,0.5,0.5,1,1))
doseFactorY=rep(as.character(c(0,0,0.25,0.25,0.5,0.5,1,1)), 12)

testMod <- test %>% gather(., key="column", value="measure") %>% mutate(DoseFactorX=doseFactorX[as.numeric(column)]) %>% mutate(DoseFactorY=doseFactorY) %>% select(-column)

testSummary <- testMod %>% group_by(DoseFactorX, DoseFactorY) %>% summarise(mean=mean(measure), SD=sd(measure))
mean100 <- testSummary$mean[which(testSummary$DoseFactorX==0&testSummary$DoseFactorY==0)]

testSummary <- testSummary %>% mutate(relPerc=mean*100/mean100) %>% mutate(relSD=SD*100/mean100)

DoseX0 <- testSummary %>% filter(DoseFactorX==0 & DoseFactorY!=0) %>% mutate(factor=DoseFactorY) %>% mutate(meanDX=mean) %>% ungroup()%>% select(factor, meanDX)
DoseY0 <- testSummary %>% filter(DoseFactorX!=0 & DoseFactorY==0) %>% mutate(factor=DoseFactorX) %>% mutate(meanDY=mean) %>% ungroup()%>% select(factor, meanDY)

testSummary_mod <- testSummary %>% filter(DoseFactorY!=0) %>% filter(DoseFactorX!=0) %>% left_join(., DoseY0, by=c("DoseFactorX"="factor"))%>% left_join(., DoseX0, by=c("DoseFactorY"="factor"))


testSummaryCDI <- testSummary_mod %>% dplyr::mutate(CDI_div=mean/(as.numeric(meanDY)*as.numeric(meanDX))) 

## plot heatmap CDI
DataMatrix <- testSummaryCDI  %>% select(DoseFactorY, DoseFactorX, CDI_div) %>% spread(., DoseFactorX, CDI_div) 
DataMatrix_fin <- DataMatrix %>% select(-DoseFactorY) %>% as.matrix(.)
rownames(DataMatrix_fin) <- DataMatrix$DoseFactorY

colLab <- as.numeric(colnames(DataMatrix_fin))*as.numeric(DrugXDose)
rowLab <- as.numeric(rownames(DataMatrix_fin))*as.numeric(DrugYDose)
BreaksVec <- seq(from=0, to=2, by=0.006)
colVec_pre <- colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(100)
colFin1 <- rep(colVec_pre[100], 117)
colFin2 <- rep(colVec_pre[1], 117)
colVec <- c(colFin2, colVec_pre, colFin1)
pheatmap(mat = DataMatrix_fin, color= colVec, breaks=BreaksVec, display_numbers = T, border_color = "black", drop_levels = T, kmeans_k = NA, fontsize=12, fontface="bold", labels_col=colLab, labels_row=rowLab,
     cluster_rows = F, cluster_cols = F, main = paste("CDI", celltype, DrugXName, "vs", DrugYName, sep= " "))


## plot heatmap mean
DataMatrix <- testSummary  %>% select(DoseFactorY, DoseFactorX, relPerc) %>% spread(., DoseFactorX, relPerc) 
DataMatrix_fin <- DataMatrix %>% select(-DoseFactorY) %>% as.matrix(.)
rownames(DataMatrix_fin) <- DataMatrix$DoseFactorY

colLab <- as.numeric(colnames(DataMatrix_fin))*as.numeric(DrugXDose)
rowLab <- as.numeric(rownames(DataMatrix_fin))*as.numeric(DrugYDose)
BreaksVec <- seq(from=0, to=201, by=1)
colVec_pre <- colorRampPalette(rev(brewer.pal(n = 7, name =
  "RdYlBu")))(102)
colFin <- rep(colVec_pre[102], 100)
colVec <- c(colVec_pre, colFin)
pheatmap(mat = DataMatrix_fin, color= colVec, breaks=BreaksVec, display_numbers = T, border_color = "black", drop_levels = T, kmeans_k = NA, fontsize=12, fontface="bold", labels_col=colLab, labels_row=rowLab, legend=F,
     cluster_rows = F, cluster_cols = F, main = paste("Viability", celltype, DrugXName, unitX, "vs", DrugYName, unitY,sep= " "))

}
}

It's a bit hard to tell what's going wrong from this script, since it's fairly long. Are you able to go through it step by step and isolate the problematic area?
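One way to step through it, as a rough sketch: the `test` data frame below is a made-up stand-in for one sheet (8 rows by 12 columns named "1" to "12", which is an assumption about what `read_xlsx` returns for your range), and `long`, `idx`, `n_rows`, `has_na`, `len_x` and `len_y` are names introduced here for illustration. Checking the length of each piece before it goes into `mutate()` usually shows which one is off.

```r
library(tidyr)

# Made-up stand-in for one sheet: 8 rows x 12 columns named "1".."12" (assumption)
set.seed(1)
test <- as.data.frame(matrix(runif(96), nrow = 8,
                             dimnames = list(NULL, as.character(1:12))))

# Dose vectors copied from the script in the question
doseFactorX <- as.character(c(0, 0, 0.0625, 0.0625, 0.125, 0.125,
                              0.25, 0.25, 0.5, 0.5, 1, 1))
doseFactorY <- rep(as.character(c(0, 0, 0.25, 0.25, 0.5, 0.5, 1, 1)), 12)

# Step 1: run the reshape on its own and count the rows
long <- gather(test, key = "column", value = "measure")
n_rows <- nrow(long)

# Step 2: the old column names are used as an index;
# any name that is not a plain number becomes NA here
idx <- suppressWarnings(as.numeric(long$column))
has_na <- anyNA(idx)

# Step 3: every vector handed to mutate() needs length n_rows (or 1)
len_x <- length(doseFactorX[idx])
len_y <- length(doseFactorY)

print(c(rows = n_rows, lenX = len_x, lenY = len_y))
print(has_na)
```

If `has_na` comes out `TRUE` on your real data, the column headers are not plain numbers, which is what produces the "NAs introduced by coercion" warning.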

Somewhere in a mutate() call in there you're passing a vector whose length doesn't match the number of rows in your data frame (96). mutate() accepts either one value per row or a single value, which it recycles across the whole column; that's why the message says "length 96 (the number of rows) or one". Something in your call has length 8 instead. Because we don't have the data, it's hard to tell exactly where.
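To illustrate that length rule, here is a small sketch on toy data. The data frame `toy` and the names `dose_y`, `one_value`, `bad` and `full` are made up for this example; only the eight dose values are taken from your script. Whether this is the exact spot that fails in your code is a guess.

```r
library(dplyr)

# Toy data with 96 rows, standing in for the reshaped plate (assumption)
toy <- data.frame(measure = seq_len(96))
dose_y <- c(0, 0, 0.25, 0.25, 0.5, 0.5, 1, 1)  # length 8

# A single value is recycled to all 96 rows
one_value <- mutate(toy, DoseFactorY = 0)

# A length-8 vector is neither 96 nor 1, so mutate() raises an error;
# tryCatch() captures it so the script keeps running
bad <- tryCatch(mutate(toy, DoseFactorY = dose_y), error = function(e) e)

# Repeating the 8 values 12 times gives one value per row
full <- mutate(toy, DoseFactorY = rep(dose_y, times = 12))

print(inherits(bad, "error"))
print(nrow(full))
```

So if a `rep(..., 12)` was dropped somewhere while you were experimenting, putting it back (or otherwise making the vector 96 long) should clear the length error.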

Ideally, you could create a minimal reproducible example, aka a reprex. It will help us help you if we can be sure we're all working with/looking at the same stuff.

install.packages("reprex")

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page. The reprex dos and don'ts are also useful.

What to do if you run into clipboard problems

If you run into problems with access to your clipboard, you can specify an outfile for the reprex, and then copy and paste the contents into the forum.

reprex::reprex(input = "fruits_stringdist.R", outfile = "fruits_stringdist.md")

For pointers specific to the community site, check out the reprex FAQ.
