Error with TCGA package - TCGAbiolinks

I'm trying to download SKCM melanoma samples into R using the TCGAbiolinks package. I want the RNA-seq expression matrix along with the metadata. Pretty basic stuff.

This is the code right from the beginning:

library(TCGAbiolinks)
GDCprojects = getGDCprojects()
TCGAbiolinks:::getProjectSummary("TCGA-SKCM")

query_TCGA = GDCquery(
  project = "TCGA-SKCM",
  data.category  = "Transcriptome Profiling", 
  data.type = "Gene Expression Quantification",
  experimental.strategy = "RNA-Seq",
  workflow.type = "STAR - Counts",
  sample.type = c("Primary Tumor")) # picked primary
skcm_res = getResults(query_TCGA) # make results as table

GDCdownload(query = query_TCGA)
tcga_data = GDCprepare(query_TCGA)

However, I get this error:

> tcga_data = GDCprepare(query_TCGA)
|=================================================================================|100%                      Completed after 24 s 
Error in `vectbl_as_col_location()`:
! Can't subset columns past the end.
ℹ Locations 2, 3, and 4 don't exist.
ℹ There is only 1 column.
Run `rlang::last_error()` to see where the error occurred.
There were 50 or more warnings (use warnings() to see the first 50)

What does this mean and how do I fix this error? Thank you.

Note: Suggestions for other packages that might get the job done would be more than welcomed!

Hmm, an exact copy-paste of your commands seems to work on my computer (see below for log). So I would suggest:

  • check that GDCdownload() didn't lose the connection in the middle of the download. Do you get the same log as I have below?
  • restart your R session and rerun everything in the exact order given here
  • update the package and retry. See my sessionInfo below for the package versions; in particular, do you have TCGAbiolinks_2.24.3?
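
To make the first and third checks concrete, here is a rough sketch, not something I ran against your setup. find_missing_files() is a hypothetical helper written for this post (it is not part of TCGAbiolinks), and it assumes that the table returned by getResults() has a file_name column and that you downloaded into the default "GDCdata" directory.

```r
# Hypothetical helper (not part of TCGAbiolinks): return the expected file
# names that are absent from the download directory or are present but empty.
find_missing_files <- function(expected_names, download_dir = "GDCdata") {
  paths <- list.files(download_dir, recursive = TRUE, full.names = TRUE)
  complete <- paths[file.size(paths) > 0]   # ignore zero-byte files
  setdiff(expected_names, basename(complete))
}

# Possible usage after GDCdownload(), left commented out because it needs
# the query object and network access:
# packageVersion("TCGAbiolinks")   # the log below was produced with 2.24.3
# missing <- find_missing_files(getResults(query_TCGA)$file_name)  # file_name column is an assumption
# if (length(missing) > 0) {
#   unlink("GDCdata", recursive = TRUE)   # discard the partial download
#   GDCdownload(query = query_TCGA)       # and fetch everything again
# }
```

If the helper returns an empty vector, the download is probably complete, and updating the package in a fresh session would be the next thing to try.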

Console log:

> TCGAbiolinks:::getProjectSummary("TCGA-SKCM")
$file_count
[1] 21583

$data_categories
  file_count case_count               data_category
1       1892        469        Structural Variation
2       8024        470 Simple Nucleotide Variation
3       2814        470       Copy Number Variation
4       1850        469     Transcriptome Profiling
5       1425        470             DNA Methylation
6       2828        470            Sequencing Reads
7       1899        470                 Biospecimen
8        499        470                    Clinical
9        352        350          Proteome Profiling

$case_count
[1] 470

$file_size
[1] 2.492801e+13

> query_TCGA = GDCquery(
+   project = "TCGA-SKCM",
+   data.category  = "Transcriptome Profiling", 
+   data.type = "Gene Expression Quantification",
+   experimental.strategy = "RNA-Seq",
+   workflow.type = "STAR - Counts",
+   sample.type = c("Primary Tumor")) # picked primary
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-SKCM
--------------------
oo Filtering results
--------------------
ooo By experimental.strategy
ooo By data.type
ooo By workflow.type
ooo By sample.type
----------------
oo Checking data
----------------
ooo Checking if there are duplicated cases
ooo Checking if there are results for the query
-------------------
o Preparing output
-------------------
> query_TCGA
       results   project           data.category                      data.type
1 c("c3183.... TCGA-SKCM Transcriptome Profiling Gene Expression Quantification
  legacy access experimental.strategy file.type platform  sample.type barcode
1  FALSE     NA               RNA-Seq        NA       NA Primary ....      NA
  workflow.type
1 STAR - Counts
> skcm_res = getResults(query_TCGA) # make results as table
> GDCdownload(query = query_TCGA)
Downloading data for project TCGA-SKCM
GDCdownload will download 103 files. A total of 435.725194 MB
Downloading as: Thu_Jun_30_14_38_24_2022.tar.gz
Downloading: 100 MB     
> tcga_data = GDCprepare(query_TCGA)
|====================================================|100%                      Completed after 11 s 
Starting to add information to samples
 => Add clinical information to samples
 => Adding TCGA molecular information from marker papers
 => Information will have prefix 'paper_' 
skcm subtype information from:doi:10.1016/j.cell.2015.05.044
Available assays in SummarizedExperiment : 
  => unstranded
  => stranded_first
  => stranded_second
  => tpm_unstrand
  => fpkm_unstrand
  => fpkm_uq_unstrand
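
Once GDCprepare() succeeds, the expression matrix and the sample metadata you asked about can be pulled out of the returned SummarizedExperiment. A minimal sketch, assuming the assay names match the ones listed in the log above; extract_expression() is a hypothetical wrapper written for this post, while assayNames(), assay() and colData() come from the SummarizedExperiment package:

```r
# Hypothetical wrapper: pull one assay matrix plus the per-sample metadata
# out of a SummarizedExperiment such as the one returned by GDCprepare().
extract_expression <- function(se, assay_name = "unstranded") {
  # Stop early if the requested assay is not in the object
  stopifnot(assay_name %in% SummarizedExperiment::assayNames(se))
  list(
    counts   = SummarizedExperiment::assay(se, assay_name),  # genes x samples
    metadata = SummarizedExperiment::colData(se)             # one row per sample
  )
}

# Possible usage, commented out because it needs the prepared object:
# res <- extract_expression(tcga_data, "unstranded")
# dim(res$counts)
# head(rownames(res$metadata))
```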

Session Info:

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.24.3

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.8.0        Biobase_2.56.0             
 [3] httr_1.4.3                  tidyr_1.2.0                
 [5] bit64_4.0.5                 jsonlite_1.8.0             
 [7] R.utils_2.11.0              assertthat_0.2.1           
 [9] stats4_4.2.0                BiocFileCache_2.4.0        
[11] blob_1.2.3                  GenomeInfoDbData_1.2.8     
[13] progress_1.2.2              pillar_1.7.0               
[15] RSQLite_2.2.14              lattice_0.20-45            
[17] glue_1.6.2                  downloader_0.4             
[19] digest_0.6.29               GenomicRanges_1.48.0       
[21] XVector_0.36.0              rvest_1.0.2                
[23] colorspace_2.0-3            R.oo_1.24.0                
[25] Matrix_1.4-1                plyr_1.8.7                 
[27] XML_3.99-0.9                pkgconfig_2.0.3            
[29] biomaRt_2.52.0              zlibbioc_1.42.0            
[31] purrr_0.3.4                 scales_1.2.0               
[33] tzdb_0.3.0                  tibble_3.1.7               
[35] KEGGREST_1.36.2             generics_0.1.2             
[37] TCGAbiolinksGUI.data_1.16.0 IRanges_2.30.0             
[39] ggplot2_3.3.6               ellipsis_0.3.2             
[41] cachem_1.0.6                SummarizedExperiment_1.26.1
[43] BiocGenerics_0.42.0         cli_3.3.0                  
[45] magrittr_2.0.3              crayon_1.5.1               
[47] memoise_2.0.1               R.methodsS3_1.8.1          
[49] fansi_1.0.3                 xml2_1.3.3                 
[51] tools_4.2.0                 data.table_1.14.2          
[53] prettyunits_1.1.1           hms_1.1.1                  
[55] lifecycle_1.0.1             matrixStats_0.62.0         
[57] stringr_1.4.0               S4Vectors_0.34.0           
[59] munsell_0.5.0               DelayedArray_0.22.0        
[61] AnnotationDbi_1.58.0        Biostrings_2.64.0          
[63] compiler_4.2.0              GenomeInfoDb_1.32.2        
[65] rlang_1.0.2                 grid_4.2.0                 
[67] RCurl_1.98-1.6              rstudioapi_0.13            
[69] rappdirs_0.3.3              bitops_1.0-7               
[71] gtable_0.3.0                DBI_1.1.2                  
[73] curl_4.3.2                  R6_2.5.1                   
[75] knitr_1.39                  dplyr_1.0.9                
[77] fastmap_1.1.0               bit_4.0.4                  
[79] utf8_1.2.2                  filelock_1.0.2             
[81] readr_2.1.2                 stringi_1.7.6              
[83] Rcpp_1.0.8.3                vctrs_0.4.1                
[85] png_0.1-7                   dbplyr_2.1.1               
[87] tidyselect_1.1.2            xfun_0.31  
