Plot with ggplot

Noodlejoe · April 22, 2020, 9:43am

Hi, I'm trying to replicate the Apple mobility plot https://www.apple.com/covid19/mobility/ with ggplot and am having trouble with it. My data frame has the countries as rows and the dates as column names.
Any help would be greatly appreciated. I tried to upload a screenshot, but it didn't work.

martin.R · April 22, 2020, 9:47am

If you have the dates as columns, then you will need to reshape the data first:
https://tidyr.tidyverse.org/articles/tidy-data.html

Noodlejoe · April 22, 2020, 9:59am

Thanks, so I make the dates the rows and the countries the columns?

nirgrahamuk · April 22, 2020, 10:04am

Tidy data would have columns for country, date, and value (i.e. number_of_cases); each row would hold the information that belongs under those headings.

martin.R · April 22, 2020, 10:09am

No, you reshape the data to a 'long' format via e.g. tidyr::pivot_longer() such that your data looks like:

"USA", "2020-04-01", 100
"USA", "2020-04-02", 101
"Italy", "2020-04-01", 200 
"Italy", "2020-04-02", 201

Noodlejoe · April 22, 2020, 10:19am

maps1 %>%
pivot_longer(-Group.1, names_to = "Dates", values_to = "frequency")

This is still returning the same data frame as the original. Am I doing something wrong?

martin.R · April 22, 2020, 10:23am

We don't really have enough info to help you out. Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

FAQ: How to do a minimal reproducible example (reprex) for beginners

A minimal reproducible example consists of the following items:
A minimal dataset, necessary to reproduce the issue.
The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)
You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame

head(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.…

Noodlejoe · April 22, 2020, 1:22pm

Martin, thanks for your advice. I have now got the data frame reshaped. I tried to add my code using the datapasta package, but it was giving me grief too. I've managed to plot the data using ggplot and have loaded the geom_line layers from multiple data sets. It's all working, but how do I create a legend representing each country?

nirgrahamuk · April 22, 2020, 1:25pm

If it's from multiple datasets, you are making it more difficult than it needs to be.
One dataset, where the country variable can be the colour mapping of the geom, would be easier.

Noodlejoe · April 22, 2020, 1:32pm

How do I write that? Below is how I separated the data into per-country data frames and then plotted them.
But are you suggesting that I can plot it from the original data frame?

AustraliaDataMapMobilityTransit <- subset(transitData, region == "Australia")
BelgiumDataMapMobilityTransit <- subset(transitData, region == "Belgium")
BrazilDataMapMobilityTransit <- subset(transitData, region == "Brazil")
CanadaDataMapMobilityTransit <- subset(transitData, region == "Canada")
CzechRepublicDataMapMobilityTransit <- subset(transitData, region == "Czech Republic")

ggplot() +
  geom_line(data = AustraliaDataMapMobilityTransit, aes(x = Dates, y = frequency), colour = "red") +
  geom_line(data = BelgiumDataMapMobilityTransit, aes(x = Dates, y = frequency), colour = "blue") +
  geom_line(data = BrazilDataMapMobilityTransit, aes(x = Dates, y = frequency), colour = "yellow") +
  geom_line(data = CanadaDataMapMobilityTransit, aes(x = Dates, y = frequency), colour = "black") +
  geom_line(data = CzechRepublicDataMapMobilityTransit, aes(x = Dates, y = frequency), colour = "orange")

nirgrahamuk · April 22, 2020, 1:36pm


ggplot(data = transitData) +
  geom_line(mapping = aes(x = Dates, y = frequency, colour = region))
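
The reason this gives a legend is that `colour = region` sits inside `aes()`, so ggplot2 treats it as a mapping from data to colour rather than as a fixed colour. A small self-contained sketch of the same idea follows; the `toy` data frame is hypothetical and only borrows the column names used in this thread, and the `labs()` title is an illustrative choice.

```r
library(ggplot2)

# Hypothetical long-format data with the column names used in this thread
toy <- data.frame(
  region    = rep(c("Australia", "Belgium"), each = 3),
  Dates     = rep(as.Date("2020-04-01") + 0:2, times = 2),
  frequency = c(100, 80, 60, 100, 90, 70)
)

# colour = region is inside aes(), so ggplot2 draws one line per region
# and builds a legend keyed by region
p <- ggplot(data = toy) +
  geom_line(mapping = aes(x = Dates, y = frequency, colour = region)) +
  labs(colour = "Country")  # optional: rename the legend title
p
```

A colour given outside `aes()`, as in `colour = "red"`, is a constant and produces no legend entry.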

Noodlejoe · April 22, 2020, 1:42pm

Thanks, you are a legend. How do I plot only the top 5 and bottom 5?

Noodlejoe · April 22, 2020, 1:47pm

Top 5 countries and bottom 5 countries. Each country has approximately 100 rows, and there are roughly 90 countries in the list.

nirgrahamuk · April 22, 2020, 1:53pm

How do you calculate a ranking of the countries? I.e., do you have a summarisation over their dates in mind?

Noodlejoe · April 22, 2020, 2:06pm

The countries are listed in order, but there are 100 entries for each country before it moves to the next.

Noodlejoe · April 22, 2020, 2:10pm

Or is there a way to specify which countries I want to include?

Noodlejoe · April 22, 2020, 2:10pm

Say I want to graph Australia, Canada, America and Russia.

nirgrahamuk · April 22, 2020, 2:24pm

Sorry friend, you are a little all over the place...
You can certainly filter your data on the way into your graph.
The dplyr::filter() function is for that purpose.

filtered_data <- transitData %>% filter(region %in% c("Australia", "Canada"))

ggplot(data = filtered_data) +
  geom_line(mapping = aes(x = Dates, y = frequency, colour = region))
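
For the earlier top 5 / bottom 5 question, one possible approach is to summarise each country first and then filter on the ranked names. The thread never settles on how countries should be ranked, so ranking by mean `frequency` over all dates is purely an assumption here, and the `toy` data frame is a made-up stand-in for `transitData`.

```r
library(dplyr)

# Hypothetical stand-in for transitData: 12 regions, 3 dates each
toy <- data.frame(
  region    = rep(paste0("Country", 1:12), each = 3),
  Dates     = rep(as.Date("2020-04-01") + 0:2, times = 12),
  frequency = rep(1:12, each = 3) * 10 + rep(c(0, 1, 2), times = 12)
)

# Assumed ranking rule: mean frequency per region, highest first
ranked <- toy %>%
  group_by(region) %>%
  summarise(mean_frequency = mean(frequency)) %>%
  arrange(desc(mean_frequency))

# Names of the five highest-ranked and five lowest-ranked regions
keep <- c(head(ranked$region, 5), tail(ranked$region, 5))

# Keep every row (all dates) belonging to those regions
filtered_data <- toy %>% filter(region %in% keep)
```

The resulting `filtered_data` can then go into the same `ggplot()` call as before. A different ranking rule, such as the value on the latest date, would only change the `summarise()` step.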
