Error: Can't subset columns that don't exist.

Hi all,

I am trying to compile some annotated genome files of a bacterial species. I have tried everything, but I keep getting the error "Error: Can't subset columns that don't exist". Does anyone know how I can fix it or what may be wrong?

image

Thank you in advance for your time.

I can't read that image.
If you want to share code or error messages, please do so as text.
You can format code in your forum posts as code chunks by putting a line of three backticks before and after it:

```
code formatted to be easily readable
```

When I look at the 'Environment' pane, I see that I have '0 obs. of 2 variables'. The error may come from there, but I don't know how to fix it.
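
A small sketch of where '0 obs. of 2 variables' may come from, assuming that entry is `dbcan_all`: the script builds it as a one-row data frame and immediately drops that row, so it starts out empty on purpose. If the loop stops with an error on its first pass, it would stay empty.

```r
# Same construction as in the script: one placeholder row, then dropped with [-1, ]
dbcan_all <- data.frame("GH" = "x", "file_name" = "a", stringsAsFactors = FALSE)[-1, ]

dim(dbcan_all)    # number of rows, number of columns
names(dbcan_all)  # the two column names survive the row removal
str(dbcan_all)    # compact summary, similar to what the Environment pane shows
```

So the empty data frame is probably a symptom of the loop failing rather than the cause of the error.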

Does your data have n_tools in it?
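
One way to check, as a rough sketch in base R: in the loop posted in this thread the sixth column is renamed to "#ofTools", while `filter()` refers to `n_tools`, so the name would not be found. The toy data frame below is made up for illustration and is not the real dbCAN output.

```r
# Toy stand-in for one imported file (values are invented for illustration)
file <- data.frame(
  a = c("g1", "g2"), b = c("GH13(1-100)", "N"), c = c("N", "N"),
  d = c("GH13", "N"), e = c("N", "Y"), f = c(2, 1)
)

# Renaming as in the posted loop: the last column is called "#ofTools"
names(file) <- c("gene", "hmmer", "hotpet", "diamond", "signalp", "#ofTools")
has_n_tools_before <- "n_tools" %in% names(file)

# Renaming the last column to the name that the filter step refers to
names(file)[6] <- "n_tools"
has_n_tools_after <- "n_tools" %in% names(file)

# Base R equivalent of filter(n_tools >= 2)
kept <- file[file$n_tools >= 2, ]
```

Running `names(file)` right after the `names(file) <- ...` line inside the loop should show whether the name used in `filter()` actually exists.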

I find a lot of this code suspect, especially the repeated calls to gsub with hardcoded values.
The gsub parameters are pattern, replacement, x,
so if there is a .txt in the file name it will become 19_BTHE, and only if there's an additional .txt would that one become VPI-5482, etc.
That seems unlikely to be what you want.
What is the intention here?
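
To illustrate the point about the chained calls, here is a minimal sketch with a hypothetical file name: once the first gsub has replaced the .txt, the later calls have nothing left to match and change nothing.

```r
file_name <- "19_BTHE.txt"   # hypothetical file name, for illustration only

# First call: ".txt" matches at the end, so it is replaced by "19_BTHE"
step1 <- gsub(".txt", "19_BTHE", file_name)

# Second call: there is no "txt" left in the string, so it is returned unchanged
step2 <- gsub(".txt", "VPI-5482", step1)

print(step1)
print(step2)
```

Under this assumption every file would end up with "19_BTHE" appended to its name, and the remaining replacement strings would never be used.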

I am using a colleague's script. As I am new to R, I may be reading the instructions wrong. Here is the part about gsub and the file name:

#import the text file names from dbCAN analysis, which have been saved as txt files in your computer
dbcan_files <- list.files("/path/to/dbcan/output/directory/",pattern="txt")
#to combine dbCAN file of each strain into one file
dbcan_all <- data.frame("GH"="x","file_name"="a",stringsAsFactors=F)[-1,]
for (f in dbcan_files) {
#import the dbCAN file of each strain
file <- read.delim(paste0("/path/to/dbcan/output/directory/",f))
#get the name of each strain
file_name <- basename(paste0("/path/to/dbcan/output/directory/",f))
file_name <- gsub(".txt", "", file_name)

#change the name of each column

names(file) <- c("gene", "hmmer", "hotpet", "diamond", "signalp", "n_tools")

About my files, they are like this:

That makes sense, in as much as it just reduces the file name so that it no longer contains .txt.
That's probably all the gsub'ing you need.
Your data is problematic because it has apparent column headings
'gene', 'id', 'hmmer', etc.,
of which id is probably meant to be part of gene.
You could probably change the read.delim call to have header = FALSE, skip = 1 to skip reading the header line.
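
A rough sketch of that suggestion, using a temporary file with made-up contents (the header text and the rows are assumptions, not the real files from this thread). The `col.names` argument is passed through to `read.table` and supplies the names directly, so the header line in the file is never interpreted.

```r
# Write a small tab-delimited file to a temporary location (invented contents)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Gene ID\tHMMER\tHotpep\tDIAMOND\tSignalp\t#ofTools",
  "gene_1\tGH13(10-300)\tN\tGH13\tN\t2",
  "gene_2\tN\tN\tGH2\tN\t1"
), tmp)

# Column names the rest of the script expects
cols <- c("gene", "hmmer", "hotpet", "diamond", "signalp", "n_tools")

# Skip the header line and assign the names while reading
file <- read.delim(tmp, header = FALSE, skip = 1, col.names = cols,
                   stringsAsFactors = FALSE)

str(file)
unlink(tmp)  # remove the temporary file
```

This only works if the number of names matches the number of columns in the file, so it is worth checking `ncol(file)` on one real file first.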

And please don't post images of text. They are hard to read, can't be quoted from, etc.


Thanks for the reply! Here it is:

> dbcan_files <- list.files("C:/Users/Lu/DBCA/",pattern="txt")
> dbcan_all <- data.frame("GH"="x","file_name"="a",stringsAsFactors=F)[-1,]
> for (f in dbcan_files) {
+   file <- read.delim(paste0("C:/Users/Lu/DBCA/",f))
+   file_name <- basename(paste0("C:/Users/Lu/DBCA/",f))
+   file_name <- gsub(".txt", "19_BTHE", file_name)
+   file_name <- gsub(".txt", "VPI-5482", file_name)
+   file_name <- gsub(".txt", "VPI-5482 (1)", file_name)
+   file_name <- gsub(".txt", "OF03-10BH", file_name)
+   file_name <- gsub(".txt", "OF05-16BH", file_name)
+   file_name <- gsub(".txt", "NCTC13706", file_name)
+   file_name <- gsub(".txt", "NCTC10582", file_name)
+   file_name <- gsub(".txt", "MGYG-HGUT-00196", file_name)
+   file_name <- gsub(".txt", "F9-2", file_name)
+   file_name <- gsub(".txt", "CL15T12C11", file_name)
+   file_name <- gsub(".txt", "bq_0049", file_name)
+   file_name <- gsub(".txt", "Bacteroides_thetaiotaomicron_81H8", file_name)
+   file_name <- gsub(".txt", "ATCC_29741", file_name)
+   file_name <- gsub(".txt", "AM51-2", file_name)
+   file_name <- gsub(".txt", "AM30-26", file_name)
+   file_name <- gsub(".txt", "AM26-17", file_name)
+   file_name <- gsub(".txt", "AM15-10", file_name)
+   file_name <- gsub(".txt", "AM09-21", file_name)
+   file_name <- gsub(".txt", "AF45-16BH", file_name)
+   file_name <- gsub(".txt", "AF28-2Y", file_name)
+   file_name <- gsub(".txt", "AF24-19LB", file_name)
+   file_name <- gsub(".txt", "AF14-20", file_name)
+   file_name <- gsub(".txt", "AF03-26", file_name)
+   file_name <- gsub(".txt", "AD135X1B", file_name)
+   file_name <- gsub(".txt", "7330", file_name)
+   file_name <- gsub(".txt", "2789STDY5834945", file_name)
+   file_name <- gsub(".txt", "2789STDY5834899", file_name)
+   file_name <- gsub(".txt", "2789STDY5834846", file_name)
+   file_name <- gsub(".txt", "2789STDY5608873", file_name)
+   file_name <- gsub(".txt", "19_BTHE", file_name)
+   file_name <- gsub(".txt", "14-106904-2", file_name)
+   names(file) <- c("gene", "hmmer", "hotpet", "diamond", "signalp", "#ofTools")
+   f_1 <- file%>%
+     filter(n_tools >= 2)%>%
+     mutate_all(as.character)  
+   f_1$hmmer <-sapply(strsplit(f_1$hmmer, "[()]"), function(x) x[[1]][1])
+   f_1$hmmer <-sapply(strsplit(f_1$hmmer, "_"), function(x) x[[1]][1])
+   f_1[f_1=="N"] <- NA
+   f_2 <- f_1 %>%
+     dplyr::select(gene, hmmer)%>%
+     filter(grepl("GH", hmmer))%>%
+     group_by(hmmer)%>%
+     dplyr::summarise(n = n())
+   names(f_2) <- c("GH", file_name)
+   dbcan_all <- full_join(dbcan_all,f_2,by=c("GH"="GH"))
+ }
Error: Problem with `filter()` input `..1`.
x object 'n_tools' not found
i Input `..1` is `n_tools >= 2`.
Run `rlang::last_error()` to see where the error occurred.

The goal is to combine all genome strains of a species (they are in .txt files, like 19_BTHE), and I use these packages:
library(dplyr)
library(tidyr)
library(textshape)#for column to rownames
library(ComplexHeatmap)
library(circlize)

Are your delimited files guaranteed to contain a column named n_tools in the header? If not, this is your bug.
Also, I'm skeptical about the gsub stuff:


file_name <- "somthing 19_BTHE another thing.txt"
gsub(".txt", "19_BTHE", file_name)
# result
# "somthing 19_BTHE another thing19_BTHE"
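
If the aim is only to drop the extension, a sketch of two options that should be safer than the bare ".txt" pattern: in a regular expression the unescaped dot matches any character, so escaping it and anchoring it to the end of the string avoids accidental matches. `tools::file_path_sans_ext` ships with base R; the second file name below is hypothetical.

```r
file_name <- "somthing 19_BTHE another thing.txt"

# Escaped dot and end-of-string anchor: only a literal ".txt" suffix is removed
stripped_regex <- sub("\\.txt$", "", file_name)

# Helper from the 'tools' package that removes the file extension
stripped_tools <- tools::file_path_sans_ext(file_name)

# For contrast: the unescaped dot also matches "_txt" in the middle of a name
over_matched <- gsub(".txt", "", "my_txt_notes.txt")

print(stripped_regex)
print(stripped_tools)
print(over_matched)
```

Either of the first two gives the strain name without the extension, which can then be used as the column name in the loop.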