Error with TCGA package - TCGAbiolinks

I'm trying to download the TCGA-SKCM (skin cutaneous melanoma) samples into R using the TCGAbiolinks package. I want the RNA-seq expression matrix along with the sample metadata. Pretty basic stuff.

This is the code right from the beginning:

library(TCGAbiolinks)

GDCprojects = getGDCprojects()

query_TCGA = GDCquery(
  project = "TCGA-SKCM",
  data.category  = "Transcriptome Profiling", 
  data.type = "Gene Expression Quantification",
  experimental.strategy = "RNA-Seq",
  workflow.type = "STAR - Counts",
  sample.type = c("Primary Tumor")) # picked primary
skcm_res = getResults(query_TCGA) # make results as table

GDCdownload(query = query_TCGA)
tcga_data = GDCprepare(query_TCGA)

However, I get this error:

> tcga_data = GDCprepare(query_TCGA)
|=================================================================================|100%                      Completed after 24 s 
Error in `vectbl_as_col_location()`:
! Can't subset columns past the end.
ℹ Locations 2, 3, and 4 don't exist.
ℹ There is only 1 column.
Run `rlang::last_error()` to see where the error occurred.
There were 50 or more warnings (use warnings() to see the first 50)

What does this mean, and how do I fix it? Thank you.

Note: Suggestions for other packages that might get the job done would be more than welcome!

Hmm, an exact copy-paste of your commands works on my computer (see the log below), so I would suggest:

  • check that GDCdownload() didn't lose the connection in the middle of the download. Do you get the same log as I do below?
  • restart the R session and rerun everything in exactly the order given here
  • update the package and retry. See my sessionInfo() output below for the package versions; in particular, do you have TCGAbiolinks_2.24.3?
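To check the first point without eyeballing the log, a minimal sketch like the one below scans the download folder for count files that look truncated. It assumes the files were saved under the default `GDCdata` directory with a `.tsv` extension, and that an intact STAR - Counts file has a tab-separated header with at least four columns after any `#` comment lines. `find_short_files()` is a hypothetical helper, not part of TCGAbiolinks. Your error ("There is only 1 column") is consistent with at least one file being read as a single column, but that is a guess on my part, not something I verified.

```r
# Hypothetical helper (NOT part of TCGAbiolinks): list downloaded count files
# whose first non-comment line has fewer tab-separated fields than expected.
# Assumption: intact STAR - Counts files are tab-separated with >= min_cols columns.
find_short_files <- function(dir = "GDCdata", pattern = "\\.tsv$", min_cols = 4) {
  files <- list.files(dir, pattern = pattern, recursive = TRUE, full.names = TRUE)
  short <- vapply(files, function(f) {
    # Read only the first few lines and drop comment lines starting with '#'
    lines <- readLines(f, n = 10, warn = FALSE)
    lines <- lines[!startsWith(lines, "#")]
    if (length(lines) == 0) return(TRUE)  # empty or comment-only file
    n_fields <- length(strsplit(lines[1], "\t", fixed = TRUE)[[1]])
    n_fields < min_cols
  }, logical(1))
  unname(files[short])
}
```

Run it after `GDCdownload()`. If it returns any paths, delete those files and run `GDCdownload(query = query_TCGA, files.per.chunk = 10)` again before `GDCprepare()`; downloading in smaller chunks should make a dropped connection less costly. To update the package, use `BiocManager::install("TCGAbiolinks")` and confirm with `packageVersion("TCGAbiolinks")`.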

Console log:

> TCGAbiolinks:::getProjectSummary("TCGA-SKCM")
[1] 21583

  file_count case_count               data_category
1       1892        469        Structural Variation
2       8024        470 Simple Nucleotide Variation
3       2814        470       Copy Number Variation
4       1850        469     Transcriptome Profiling
5       1425        470             DNA Methylation
6       2828        470            Sequencing Reads
7       1899        470                 Biospecimen
8        499        470                    Clinical
9        352        350          Proteome Profiling

[1] 470

[1] 2.492801e+13

> query_TCGA = GDCquery(
+   project = "TCGA-SKCM",
+   data.category  = "Transcriptome Profiling", 
+   data.type = "Gene Expression Quantification",
+   experimental.strategy = "RNA-Seq",
+   workflow.type = "STAR - Counts",
+   sample.type = c("Primary Tumor")) # picked primary
o GDCquery: Searching in GDC database
Genome of reference: hg38
oo Accessing GDC. This might take a while...
ooo Project: TCGA-SKCM
oo Filtering results
ooo By experimental.strategy
ooo By data.type
ooo By workflow.type
ooo By sample.type
oo Checking data
ooo Checking if there are duplicated cases
ooo Checking if there are results for the query
o Preparing output
> query_TCGA
       results   project           data.category                      data.type
1 c("c3183.... TCGA-SKCM Transcriptome Profiling Gene Expression Quantification
  legacy access experimental.strategy file.type platform  sample.type barcode
1  FALSE     NA               RNA-Seq        NA       NA Primary ....      NA
1 STAR - Counts
> skcm_res = getResults(query_TCGA) # make results as table
> GDCdownload(query = query_TCGA)
Downloading data for project TCGA-SKCM
GDCdownload will download 103 files. A total of 435.725194 MB
Downloading as: Thu_Jun_30_14_38_24_2022.tar.gz
Downloading: 100 MB     
> tcga_data = GDCprepare(query_TCGA)
|====================================================|100%                      Completed after 11 s 
Starting to add information to samples
 => Add clinical information to samples
 => Adding TCGA molecular information from marker papers
 => Information will have prefix 'paper_' 
skcm subtype information from:doi:10.1016/j.cell.2015.05.044
Available assays in SummarizedExperiment : 
  => unstranded
  => stranded_first
  => stranded_second
  => tpm_unstrand
  => fpkm_unstrand
  => fpkm_uq_unstrand
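Once GDCprepare() succeeds, tcga_data is a SummarizedExperiment, so the expression matrix and metadata you asked for can be pulled out with the standard SummarizedExperiment accessors (`assay()`, `colData()`, `rowData()`). A minimal sketch follows; `extract_expression()` is a hypothetical helper, and "unstranded" is just one of the assay names listed in the log above.

```r
library(SummarizedExperiment)

# Hypothetical helper: pull one assay plus sample and gene annotations
# out of the SummarizedExperiment returned by GDCprepare().
extract_expression <- function(se, assay_name = "unstranded") {
  stopifnot(assay_name %in% assayNames(se))
  list(
    counts  = assay(se, assay_name),      # genes x samples matrix
    samples = as.data.frame(colData(se)), # sample / clinical metadata
    genes   = as.data.frame(rowData(se))  # gene annotation
  )
}
```

For example, `res <- extract_expression(tcga_data, "unstranded")` gives `res$counts` for the count matrix and `res$samples` for the metadata, with one row per column of the matrix.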

Session Info:

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.24.3

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.8.0        Biobase_2.56.0             
 [3] httr_1.4.3                  tidyr_1.2.0                
 [5] bit64_4.0.5                 jsonlite_1.8.0             
 [7] R.utils_2.11.0              assertthat_0.2.1           
 [9] stats4_4.2.0                BiocFileCache_2.4.0        
[11] blob_1.2.3                  GenomeInfoDbData_1.2.8     
[13] progress_1.2.2              pillar_1.7.0               
[15] RSQLite_2.2.14              lattice_0.20-45            
[17] glue_1.6.2                  downloader_0.4             
[19] digest_0.6.29               GenomicRanges_1.48.0       
[21] XVector_0.36.0              rvest_1.0.2                
[23] colorspace_2.0-3            R.oo_1.24.0                
[25] Matrix_1.4-1                plyr_1.8.7                 
[27] XML_3.99-0.9                pkgconfig_2.0.3            
[29] biomaRt_2.52.0              zlibbioc_1.42.0            
[31] purrr_0.3.4                 scales_1.2.0               
[33] tzdb_0.3.0                  tibble_3.1.7               
[35] KEGGREST_1.36.2             generics_0.1.2             
[37] TCGAbiolinksGUI.data_1.16.0 IRanges_2.30.0             
[39] ggplot2_3.3.6               ellipsis_0.3.2             
[41] cachem_1.0.6                SummarizedExperiment_1.26.1
[43] BiocGenerics_0.42.0         cli_3.3.0                  
[45] magrittr_2.0.3              crayon_1.5.1               
[47] memoise_2.0.1               R.methodsS3_1.8.1          
[49] fansi_1.0.3                 xml2_1.3.3                 
[51] tools_4.2.0                 data.table_1.14.2          
[53] prettyunits_1.1.1           hms_1.1.1                  
[55] lifecycle_1.0.1             matrixStats_0.62.0         
[57] stringr_1.4.0               S4Vectors_0.34.0           
[59] munsell_0.5.0               DelayedArray_0.22.0        
[61] AnnotationDbi_1.58.0        Biostrings_2.64.0          
[63] compiler_4.2.0              GenomeInfoDb_1.32.2        
[65] rlang_1.0.2                 grid_4.2.0                 
[67] RCurl_1.98-1.6              rstudioapi_0.13            
[69] rappdirs_0.3.3              bitops_1.0-7               
[71] gtable_0.3.0                DBI_1.1.2                  
[73] curl_4.3.2                  R6_2.5.1                   
[75] knitr_1.39                  dplyr_1.0.9                
[77] fastmap_1.1.0               bit_4.0.4                  
[79] utf8_1.2.2                  filelock_1.0.2             
[81] readr_2.1.2                 stringi_1.7.6              
[83] Rcpp_1.0.8.3                vctrs_0.4.1                
[85] png_0.1-7                   dbplyr_2.1.1               
[87] tidyselect_1.1.2            xfun_0.31  
