I think reproducibility is challenging to achieve for students and researchers with limited access to large data storage, and it is an important topic to discuss before jumping into a real-world work setting.
So I wanted to create a thread here to ask some questions. I also asked for help on the discussion board for my data science program, but I haven't received any response so far, so I would like to get some professional advice here on RStudio Community.
Thank you in advance.
I am trying to figure out the code to automatically download and unzip a large dataset from Kaggle.com. My goal is either to upload the file to my GitHub repo or to write code that lets others download the data easily. I have tried three ways.
First, the code below specifies where the zip file lives on the Kaggle website, creates a temp file, downloads the archive into it, and uses unzip() and read.csv() to access the data file.
# Create a temp file, download the Kaggle archive into it, then read the CSV
temp <- tempfile(fileext = ".zip")
url <- "https://www.kaggle.com/......../download/archive.zip"
download.file(url, temp, mode = "wb")  # mode = "wb" keeps the zip binary intact on Windows
steam_main_data <- read.csv(unzip(temp, files = "steam_reviews.csv"))  # read.csv, since the file is comma-separated
unlink(temp)
This is failing.
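I suspect the root cause is that Kaggle requires a logged-in session or an API token to download datasets, so download.file() likely receives an HTML login page instead of the zip. For reference, here is an untested sketch of the authenticated route using the official Kaggle CLI; "<owner>/<dataset>" is a placeholder for the real dataset slug:

# Untested sketch: assumes the Kaggle CLI is installed (pip install kaggle)
# and an API token is stored in ~/.kaggle/kaggle.json
system2("kaggle", c("datasets", "download",
                    "-d", "<owner>/<dataset>",  # placeholder for the real slug
                    "-p", tempdir(), "--unzip"))
steam_main_data <- read.csv(file.path(tempdir(), "steam_reviews.csv"))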
Second, I used the pins package: I uploaded the data file to OneDrive and Dropbox, registered it as a pin board, and tried to read the file from there.
library(pins)

# Register the Dropbox share link as a URL-based pin board
steam_board <- board_url(c(
  "steam_review_board" = "https://www.dropbox.com/........../steam_reviews.csv?dl=0"
))
steam_main_data_2 <- steam_board %>%
  pin_read("steam_review_board") %>%
  as.data.frame()
This is also failing.
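As far as I understand the pins documentation, board_url() can serve raw files, but pin_read() only works for pins written with pins metadata, so pin_download() may be the right call for a plain CSV. Here is an untested sketch of that variant, which also switches the Dropbox link from ?dl=0 to ?dl=1 so it points at the raw file instead of the preview page:

library(pins)

# Untested sketch: fetch the raw CSV with pin_download() and parse it locally
steam_board <- board_url(c(
  "steam_review_board" = "https://www.dropbox.com/........../steam_reviews.csv?dl=1"  # dl=1 forces a direct download
))
csv_path <- pin_download(steam_board, "steam_review_board")
steam_main_data_2 <- read.csv(csv_path)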
Third, I used the piggyback package to upload a file directly from my local file system to a GitHub repo as a release asset; this is the approach I most want to work.
library(piggyback)

# Upload the local CSV as an asset of the GitHub release tagged "v0.0.1"
pb_upload("C:/Users/...../...../...../steam_reviews.csv",
          repo = "my_repo",  # note: piggyback expects the "owner/repo" form here
          tag = "v0.0.1")
When I run this code, it looks like it is working at first, but the upload progress stays at 1% the whole time and never advances, so this is failing as well.
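I also wonder whether the file is simply too large, since GitHub limits release assets to 2 GB each. Beyond that, two things seem worth checking: that the release tagged "v0.0.1" actually exists before uploading (newer piggyback versions create one with pb_release_create(); older ones call it pb_new_release()), and how others would pull the file back down with pb_download(). Here is an untested sketch of both, where "user/my_repo" is a placeholder for the real "owner/repo" slug and a GitHub token is assumed to be set in the GITHUB_TOKEN environment variable:

library(piggyback)

# Untested sketch; "user/my_repo" is a placeholder for the real "owner/repo"
# and a GitHub token is assumed to be available in GITHUB_TOKEN
pb_release_create(repo = "user/my_repo", tag = "v0.0.1")  # create the release once

pb_upload("C:/Users/...../...../...../steam_reviews.csv",
          repo = "user/my_repo",
          tag = "v0.0.1")

# Anyone else could then fetch the file with:
pb_download("steam_reviews.csv",
            repo = "user/my_repo",
            tag = "v0.0.1",
            dest = ".")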
I would like to know about any other ways to solve this issue.
Thank you again.