LN12
June 26, 2022, 10:38pm
1
I'm trying to download SKCM melanoma samples into R using the TCGAbiolinks package. The data I want is the RNA-seq expression matrix, along with the metadata. Pretty basic stuff.
This is the code right from the beginning:
library(TCGAbiolinks)
GDCprojects = getGDCprojects()
TCGAbiolinks:::getProjectSummary("TCGA-SKCM")
query_TCGA = GDCquery(
  project = "TCGA-SKCM",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  experimental.strategy = "RNA-Seq",
  workflow.type = "STAR - Counts",
  sample.type = c("Primary Tumor")) # picked primary
skcm_res = getResults(query_TCGA) # make results as table
GDCdownload(query = query_TCGA)
tcga_data = GDCprepare(query_TCGA)
However, I get this error:
> tcga_data = GDCprepare(query_TCGA)
|=================================================================================|100% Completed after 24 s
Error in `vectbl_as_col_location()`:
! Can't subset columns past the end.
ℹ Locations 2, 3, and 4 don't exist.
ℹ There is only 1 column.
Run `rlang::last_error()` to see where the error occurred.
There were 50 or more warnings (use warnings() to see the first 50)
What does this mean and how do I fix this error? Thank you.
Note: Suggestions for other packages that might get the job done would be more than welcome!
Hmm, an exact copy-paste of your commands seems to work on my computer (see below for log). So I would suggest:
Check that GDCdownload() didn't lose the connection in the middle of the download. Do you have the same log as I have below?
Restart the R session and rerun everything in the exact order given here.
Update the package and retry. See my sessionInfo below for the package versions; in particular, do you have TCGAbiolinks_2.24.3?
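The first and third checks can be sketched in a few lines of base R. This is only a rough sketch, not part of TCGAbiolinks: has_min_version() and find_missing_files() are hypothetical helper names, and the sketch assumes that GDCdownload() wrote into its default "GDCdata" directory and that the table returned by getResults() has a file_name column.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# Hypothetical helper: which expected file names are absent (or empty)
# under the download directory? Empty files are treated as missing,
# since they may be left over from an interrupted download.
find_missing_files <- function(expected_names, dir = "GDCdata") {
  on_disk <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes <- file.size(on_disk)
  usable <- basename(on_disk[!is.na(sizes) & sizes > 0])
  setdiff(expected_names, usable)
}

# Usage with the objects from the question (not run here):
# if (!has_min_version("TCGAbiolinks", "2.24.3")) {
#   BiocManager::install("TCGAbiolinks")   # then restart R
# }
# missing <- find_missing_files(skcm_res$file_name)  # assumes a file_name column
# length(missing)   # anything above 0 suggests rerunning GDCdownload()
```

If find_missing_files() reports anything, an incomplete download would be one plausible reason for GDCprepare() reading a file with only one column.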
Console log:
> TCGAbiolinks:::getProjectSummary("TCGA-SKCM")
$file_count
[1] 21583
$data_categories
  file_count case_count               data_category
1       1892        469        Structural Variation
2       8024        470 Simple Nucleotide Variation
3       2814        470       Copy Number Variation
4       1850        469     Transcriptome Profiling
5       1425        470             DNA Methylation
6       2828        470            Sequencing Reads
7       1899        470                 Biospecimen
8        499        470                    Clinical
9        352        350          Proteome Profiling
$case_count
[1] 470
$file_size
[1] 2.492801e+13
> query_TCGA = GDCquery(
+ project = "TCGA-SKCM",
+ data.category = "Transcriptome Profiling",
+ data.type = "Gene Expression Quantification",
+ experimental.strategy = "RNA-Seq",
+ workflow.type = "STAR - Counts",
+ sample.type = c("Primary Tumor")) # picked primary
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-SKCM
--------------------
oo Filtering results
--------------------
ooo By experimental.strategy
ooo By data.type
ooo By workflow.type
ooo By sample.type
----------------
oo Checking data
----------------
ooo Checking if there are duplicated cases
ooo Checking if there are results for the query
-------------------
o Preparing output
-------------------
> query_TCGA
results project data.category data.type
1 c("c3183.... TCGA-SKCM Transcriptome Profiling Gene Expression Quantification
legacy access experimental.strategy file.type platform sample.type barcode
1 FALSE NA RNA-Seq NA NA Primary .... NA
workflow.type
1 STAR - Counts
> skcm_res = getResults(query_TCGA) # make results as table
> GDCdownload(query = query_TCGA)
Downloading data for project TCGA-SKCM
GDCdownload will download 103 files. A total of 435.725194 MB
Downloading as: Thu_Jun_30_14_38_24_2022.tar.gz
Downloading: 100 MB
> tcga_data = GDCprepare(query_TCGA)
|====================================================|100% Completed after 11 s
Starting to add information to samples
=> Add clinical information to samples
=> Adding TCGA molecular information from marker papers
=> Information will have prefix 'paper_'
skcm subtype information from:doi:10.1016/j.cell.2015.05.044
Available assays in SummarizedExperiment :
=> unstranded
=> stranded_first
=> stranded_second
=> tpm_unstrand
=> fpkm_unstrand
=> fpkm_uq_unstrand
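Once GDCprepare() finishes like this, the expression matrix and the metadata you asked about can be pulled out of the returned object. A minimal sketch, assuming tcga_data is a SummarizedExperiment with the assay names listed above; extract_expression() is a hypothetical helper name, not a TCGAbiolinks function:

```r
library(SummarizedExperiment)

# Hypothetical helper: split a SummarizedExperiment into its parts.
extract_expression <- function(se, assay_name = "unstranded") {
  counts <- assay(se, assay_name)   # genes in rows, samples in columns
  sample_meta <- colData(se)        # one row per sample
  gene_meta <- rowData(se)          # one row per gene
  # Samples must line up between the matrix and the metadata
  stopifnot(identical(colnames(counts), rownames(sample_meta)))
  list(counts = counts, sample_meta = sample_meta, gene_meta = gene_meta)
}

# Usage with the object from the log above (not run here):
# parts <- extract_expression(tcga_data, "unstranded")
# dim(parts$counts)
```

The "unstranded" assay holds the raw STAR counts; passing "tpm_unstrand" instead would return the TPM values from the same files.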
Session Info:
> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] TCGAbiolinks_2.24.3
loaded via a namespace (and not attached):
[1] MatrixGenerics_1.8.0 Biobase_2.56.0
[3] httr_1.4.3 tidyr_1.2.0
[5] bit64_4.0.5 jsonlite_1.8.0
[7] R.utils_2.11.0 assertthat_0.2.1
[9] stats4_4.2.0 BiocFileCache_2.4.0
[11] blob_1.2.3 GenomeInfoDbData_1.2.8
[13] progress_1.2.2 pillar_1.7.0
[15] RSQLite_2.2.14 lattice_0.20-45
[17] glue_1.6.2 downloader_0.4
[19] digest_0.6.29 GenomicRanges_1.48.0
[21] XVector_0.36.0 rvest_1.0.2
[23] colorspace_2.0-3 R.oo_1.24.0
[25] Matrix_1.4-1 plyr_1.8.7
[27] XML_3.99-0.9 pkgconfig_2.0.3
[29] biomaRt_2.52.0 zlibbioc_1.42.0
[31] purrr_0.3.4 scales_1.2.0
[33] tzdb_0.3.0 tibble_3.1.7
[35] KEGGREST_1.36.2 generics_0.1.2
[37] TCGAbiolinksGUI.data_1.16.0 IRanges_2.30.0
[39] ggplot2_3.3.6 ellipsis_0.3.2
[41] cachem_1.0.6 SummarizedExperiment_1.26.1
[43] BiocGenerics_0.42.0 cli_3.3.0
[45] magrittr_2.0.3 crayon_1.5.1
[47] memoise_2.0.1 R.methodsS3_1.8.1
[49] fansi_1.0.3 xml2_1.3.3
[51] tools_4.2.0 data.table_1.14.2
[53] prettyunits_1.1.1 hms_1.1.1
[55] lifecycle_1.0.1 matrixStats_0.62.0
[57] stringr_1.4.0 S4Vectors_0.34.0
[59] munsell_0.5.0 DelayedArray_0.22.0
[61] AnnotationDbi_1.58.0 Biostrings_2.64.0
[63] compiler_4.2.0 GenomeInfoDb_1.32.2
[65] rlang_1.0.2 grid_4.2.0
[67] RCurl_1.98-1.6 rstudioapi_0.13
[69] rappdirs_0.3.3 bitops_1.0-7
[71] gtable_0.3.0 DBI_1.1.2
[73] curl_4.3.2 R6_2.5.1
[75] knitr_1.39 dplyr_1.0.9
[77] fastmap_1.1.0 bit_4.0.4
[79] utf8_1.2.2 filelock_1.0.2
[81] readr_2.1.2 stringi_1.7.6
[83] Rcpp_1.0.8.3 vctrs_0.4.1
[85] png_0.1-7 dbplyr_2.1.1
[87] tidyselect_1.1.2 xfun_0.31