Error: Can't subset columns that don't exist PLEASE HELP!

Hello. I need help getting this code to work. I am using the qiime2R package to make a PCoA plot. The code is attached below. I have tried select('sample_id', PC1, PC2) and select(sample_id, PC1, PC2). Neither works. I have also tried select(sample_id = 'sample_id', PC1, PC2). I know this column exists in the metadata; I have checked and re-checked. I have even tried renaming the column, and that still didn't work. Is there anything else you can suggest? I have also linked the tutorial I am using for further context. Thank you for your help.

RE: Tutorial: Integrating QIIME2 and R for data visualization and analysis using qiime2R - QIIME 2 Forum

See the FAQ: How to do a minimal reproducible example reprex for beginners.

As far as I can tell from the screenshot, you should assign the output to "result" rather than piping it on to ggplot, then use colnames(result) to confirm that every variable passed to ggplot on line 27 actually exists.

Thank you, but I do not know where I would use this. You say line 27, but I have no clue where exactly. I am not a coder... Thank you so much for your timely reply, though!

Line numbers refer to the screenshot, since no reprex is available.

colnames doesn't go here. Here's some made-up code to illustrate what I meant

result <- mydata %>% select(a,b,c)
colnames(result)
[1] "a" "b" "c"

So, knowing what is being passed to ggplot

 mydata %>% select(a,b,c) %>% ggplot(.,aes(x=A, y=b))

it is no surprise to receive an error message about unknown columns.

Note that the syntax differs. The dot (.) is the placeholder for the object received from %>%. If, instead, result had been saved, the call would be

ggplot(result, aes(x = a, y = b)) + ...
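
The same check can be sketched in base R with a made-up data frame. The helper name missing_cols and the toy columns are assumptions for illustration only; they are not part of dplyr or qiime2R. The point is that column names are case-sensitive, so asking for A when only a exists counts as a missing column.

```r
# Hypothetical helper: which of the requested columns are absent from df?
missing_cols <- function(df, wanted) {
  setdiff(wanted, colnames(df))
}

# Made-up data standing in for the real object
mydata <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# All three names exist, so nothing is reported
missing_cols(mydata, c("a", "b", "c"))

# "A" is reported because only lowercase "a" exists
missing_cols(mydata, c("A", "b"))
```

Running a check like this on the object just before it is passed to select() or ggplot() shows exactly which name is causing the "columns that don't exist" error.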

I am so sorry for continuously bugging you here but when I tried that it did not work. It still is giving me that error. I am at a loss...I figured out how to get the code into the thread so maybe this time I can actually see what you're talking about! Someone should give you a medal once this is done! Thanks again.

library(qiime2R)
library(tidyverse)
metadata <- readr::read_tsv('/Users/SamuraiSteve/Dropbox/gfic_data_analysis/qiime2/03_metadata/metadata_all-samples.txt')
uwunifrac <- read_qza('/Users/SamuraiSteve/Dropbox/gfic_data_analysis/qiime2/unweighted_unifrac_pcoa_results_dec11.qza')
shannon <- read_qza('/Users/SamuraiSteve/Dropbox/gfic_data_analysis/qiime2/shannon_vector.qza')$data %>% rownames_to_column('sample_id')
uwunifrac$data$Vectors %>% 
  select(sample_id, PC1, PC2) %>%
  left_join(metadata) %>%
  left_join(shannon) %>% 
  mutate(sample_site = gsub('door', 'Door', sample_site),
         sample_site = gsub('drain', 'Drain', sample_site),
         room_function = gsub('00_', '', room_function),
         room_function = gsub('02_', '', room_function),
         room_function = gsub('03_', '', room_function),
         room_function = gsub('04_', '', room_function),
         room_function = gsub('05_', '', room_function),
         room_function = fct_relevel(room_function, 'live animal', 'harvest',
                                     'fabrication and processing', 
                                     'product holding'),
         room_function = fct_recode(room_function,
                                    'fabrication &\nprocessing' = 'fabrication and processing'),
         room_function = fct_recode(room_function, 'product \nholding' = 'product holding'),
         listeria_present = as.factor(listeria_present),
         listeria_present = fct_relevel(listeria_present, 'yes', 'no')) %>% 
  ggplot(aes(x = PC1, y = PC2, color=room_function, shape=listeria_present, size=shannon)) + 
  geom_point(alpha=0.5) +
  theme_q2r() +
  scale_shape_manual(values=c(16,1), name="Listeria Presence") + 
  scale_size_continuous(name="Shannon Diversity") + 
  scale_color_discrete(name="Room Function")

Getting there. Still no visibility into the shannon object, though. (For a reprex, that can be provided by cutting and pasting the output of dput(shannon).)
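
If the full dput() output is too long to paste, a few rows are usually enough to show the structure. A minimal sketch, assuming a made-up data frame (shannon_demo is a hypothetical stand-in, not the real data):

```r
# Made-up stand-in for the real shannon object
shannon_demo <- data.frame(
  sample_id = paste0("sample_", 1:642),
  shannon_entropy = seq(1, 5, length.out = 642)
)

# Keep only the first few rows; the column names and types are what matter
small <- head(shannon_demo, 3)

# Prints a short structure(...) call that can be pasted into a post
dput(small)
```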

For now, what does

colnames(shannon)

return?

So dput(shannon) gives a really long list of output. I don't know if you need to see all of that, so I have provided just a snippet.

> colnames(shannon)
[1] "sample_id"    "shannon_entropy"
> dput(shannon)
structure(list(sample_id = c("1.CC.d.1", etc.), shannon_entropy = c(4.78933594656631, etc.)), row.names = c(NA, -642L), class = "data.frame")

Gak. Serves me right for using a tablet.

You’re right, the rows, which are what makes the dput output so long, aren’t necessary. What are the column names that carry over from the other objects into the last join?

So I think this was what you were after. These are the column names from the metadata.

> colnames(metadata)
 [1] "sample_id"              "barcode"                "run"                    "plate"                  "well"                  
 [6] "asvs_per_sample"        "sampling_event"         "sampling_date_bad"      "sampling_date"          "first_sampling_event"  
[11] "control"                "room"                   "event_room"             "room_order"             "room_order_number"     
[16] "room_product_type"      "room_function"          "room_function_grouping" "sample_site"            "sample_site_number"    
[21] "sample_site_exact"      "SourceSink"             "listeria_present"       "listeria_species"       "sampler"               
[26] "drain_color"            "collection_date"        "collection_time"        "surface_material"       "temperature_c"         
[31] "temperature_f"          "surface_moisture_notes" "surface_moisture"       "organic_matter_notes"   "organic_matter"        
[36] "pooled"

Still missing one object, uwunifrac$data$Vectors, and I think it holds the key: PC1 and PC2 appear to be intended to come from there, and I wonder whether they made it.

What does this do?

head(uwunifrac$data$Vectors)

Hello. So, after following your suggestion to look into uwunifrac$data$Vectors, I discovered that the sample ID column there is named SampleID, so that is the name I had to use. I also figured out that for "shannon" I was selecting a dataset and NOT a variable. It was supposed to be "shannons". So most of the problem came down to simple tab-completion errors and the stinking letter "S", haha. The code below worked beautifully, and I was even able to manipulate the graphic further. Thank you so much for your help!!!

metadata <- readr::read_tsv('/Users/SamuraiSteve/Dropbox/gfic_data_analysis/qiime2/03_metadata/metadata_all-samples.txt')
uwunifrac <- read_qza('/Users/SamuraiSteve/Dropbox/gfic_data_analysis/qiime2/unweighted_unifrac_pcoa_results.qza')
shannon <- read_qza('/Users/SamuraiSteve/Dropbox/gfic_data_analysis/qiime2/shannon_vector.qza')$data %>% rownames_to_column('SampleID')
uwunifrac$data$Vectors %>% 
  select(SampleID, PC1, PC2) %>%
  left_join(metadata) %>%
  left_join(shannon) %>%
  filter(room_function != '01_animal-swab', 
         room_function != '01_environment', 
         room_function != '01_hand-swab') %>%
  mutate(sample_site = gsub('door', 'Door', sample_site),
         sample_site = gsub('drain', 'Drain', sample_site),
         room_function = gsub('00_', '', room_function),
         room_function = gsub('02_', '', room_function),
         room_function = gsub('03_', '', room_function),
         room_function = gsub('04_', '', room_function),
         room_function = gsub('05_', '', room_function),
         room_function = fct_relevel(room_function, 'live animal', 'harvest',
                                     'fabrication and processing', 
                                     'product holding'),
         room_function = fct_recode(room_function,
                                    'fabrication &\nprocessing' = 'fabrication and processing'),
         room_function = fct_recode(room_function, 'product \nholding' = 'product holding'),
         listeria_present = as.factor(listeria_present),
         listeria_present = fct_relevel(listeria_present, 'yes', 'no')) -> data

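library(rcartocolor)  # provides carto_pal()
library(ggthemes)     # provides theme_few()

prism <- carto_pal(12, 'Prism')
# Hand-picked subset of the Prism colors; this overwrites the full palette above
prism <- c("#5F4690", "#38A6A5", "#0F8554", "#CC503E", "#94346E","#994E95")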

data %>% ggplot(aes(x = PC1, y = PC2, color=room_function, shape=listeria_present, size = shannons)) + 
  geom_point(alpha=0.3) +
  guides(color = guide_legend(override.aes = list(size = 4))) +
  guides(shape = guide_legend(override.aes = list(size = 4))) +
  theme_few() +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.99, hjust = 0.95),
        legend.spacing.x = unit(0.1, 'cm'),
        legend.spacing.y = unit(0.1, 'cm'),
        text = element_text(family = "Times New Roman",
                            size = 16)) +
  scale_shape_manual(values=c(16,1), name="Listeria Presence") +
  scale_size_continuous(name="Shannon Diversity") + 
  scale_color_manual(values = prism, name="Room Function")
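
For anyone hitting the same SampleID versus sample_id mismatch, one option is to name the key columns explicitly in the join instead of relying on automatic matching. This is only a sketch with made-up tables (vectors_demo and metadata_demo are hypothetical stand-ins), and it assumes the metadata still uses sample_id as in the colnames(metadata) output above.

```r
library(dplyr)

# Made-up stand-ins: the ordination table uses SampleID, the metadata uses sample_id
vectors_demo <- data.frame(
  SampleID = c("s1", "s2"),
  PC1 = c(0.1, -0.2),
  PC2 = c(0.3, 0.4)
)
metadata_demo <- data.frame(
  sample_id = c("s1", "s2"),
  room = c("A", "B")
)

# Map the left-hand key to the right-hand key by name
joined <- vectors_demo %>%
  select(SampleID, PC1, PC2) %>%
  left_join(metadata_demo, by = c("SampleID" = "sample_id"))

# The key keeps the left-hand name; metadata columns are appended
colnames(joined)
```

Spelling out by = also makes a wrong column name fail at the join, close to where the mistake is, rather than later in ggplot.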
