This 2-day workshop covers how to analyze large amounts of data in R. We will focus on scaling up our analyses using the same dplyr verbs that we use in our everyday work, applying dplyr with data.table, databases, and Spark. We will also cover best practices for visualizing, modeling, and sharing results against these data sources. Where applicable, we will review recommended connection settings, security best practices, and deployment options.
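To illustrate the "same dplyr verbs" idea, here is a minimal sketch (the table and column names are hypothetical, not from the workshop materials): the identical pipeline runs against a local data frame, and with dbplyr or sparklyr the same verbs are translated to SQL or Spark behind the scenes.

```r
library(dplyr)

# Hypothetical local data frame standing in for a larger remote table
flights <- data.frame(
  carrier   = c("AA", "AA", "DL"),
  dep_delay = c(10, 5, 20)
)

# With a remote source, only the first step changes, e.g. (sketch):
# con     <- DBI::dbConnect(odbc::odbc(), ...)  # database connection
# flights <- tbl(con, "flights")                # lazy remote table

# The verbs themselves are unchanged
flights %>%
  group_by(carrier) %>%
  summarise(avg_delay = mean(dep_delay, na.rm = TRUE)) %>%
  arrange(desc(avg_delay))
```

With a remote backend, the pipeline stays lazy until you `collect()` the results back into R.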
Who should attend:
You should take this workshop if you want to learn how to work with big data in R. The data can be in memory, in a database (like SQL Server), or in a cluster (like Spark).
Hi! Is there anything we should do to prepare for this, other than having the suggested RStudio and R versions installed? Are there package prerequisites beyond dplyr?
Hi @skamanrev, thank you for your question. We plan to provide each student with a server in the cloud (AWS) that you will be able to access via a web browser on your machine. Please see this section of the GitHub repository for more info: https://github.com/rstudio-conf-2020/big-data#equipment
I'm trying to do the exercises provided in the Big Data repository, but I can't seem to find the datasets. Could you please upload the data or add a link to it?