I used the following code to load my data into R:
if (!file.exists("ames-liquor.rds")) {
  url <- "https://github.com/ds202-at-ISU/materials/blob/master/03_tidyverse/data/ames-liquor.rds?raw=TRUE"
  download.file(url, "ames-liquor.rds", mode = "wb")
}
data <- readRDS("ames-liquor.rds")
Then I used this code to extract the geographic latitude and longitude:
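library(tidyverse)  # provides %>%, separate() and parse_number()
# Assuming Store Location is a WKT point, "POINT (longitude latitude)",
# the first number is the longitude and the second is the latitude.
data <- data %>%
  separate(remove = FALSE,
           col = 'Store Location', sep = " ",
           into = c("toss-it", "Longitude", "Latitude"))
data <- data %>% mutate(
  Longitude = parse_number(Longitude),
  Latitude = parse_number(Latitude)
)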
After that, I used this code to plot a scatterplot of the latitude and longitude of the store locations and to provide a visual breakdown of the liquor categories (by Category Name), including volume sold in the breakdown:
library(tidyverse)
library(janitor)
set.seed(123)
# Named data_sample so that it does not mask base R's sample()
data_sample <- sample_n(data, size = 12000, replace = FALSE)
new_data <- data_sample |> clean_names()
# Assuming store_location looks like "POINT (longitude latitude)" (WKT order),
# the second piece is the longitude and the third is the latitude.
final_data <- new_data |>
  separate(col = store_location,
           into = c("x", "longitude", "latitude"),
           sep = " ") |>
  select(category_name, latitude, longitude, volume_sold_liters) |>
  na.omit()
# Strip the leading "(" from the longitude and the trailing ")" from the latitude
final_data$longitude <- str_sub(final_data$longitude, start = 2)
final_data$latitude <- str_sub(final_data$latitude, end = -2)
final_data$longitude <- as.numeric(final_data$longitude)
final_data$latitude <- as.numeric(final_data$latitude)
final_data$category_name <- as_factor(final_data$category_name)
ggplot(final_data, aes(x = longitude, y = latitude)) +
  geom_point() +
  theme_bw() +
  xlab("LONGITUDE (degrees)") +
  ylab("LATITUDE (degrees)")
top_ten <- final_data |>
  group_by(category_name) |>
  summarise(total_volume = sum(volume_sold_liters)) |>
  filter(total_volume > 2600)
Now I am really confused about how to do the following two tasks:
* Find the daily liquor sales in Ames in 2021: summarize the number of sales, the volume of liquor sold, and the amount of money spent.
* Plot volume sold by day (use a scatterplot of volume by day and facet by month).
Please help me figure it out.
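
A minimal sketch of one possible approach to both tasks, not a definitive solution. It runs on a small made-up data frame (toy) so that it is self-contained. The column names date, volume_sold_liters and sale_dollars are assumptions about what clean_names() produces for this data set, and the month/day/year text format of date is also an assumption; adjust both to match the real data.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for the cleaned liquor data (one row per sale).
# Column names and the "mm/dd/yyyy" date format are assumptions.
toy <- tibble(
  date = c("01/04/2021", "01/04/2021", "02/15/2021", "12/30/2020"),
  volume_sold_liters = c(9, 1.5, 4.5, 3),
  sale_dollars = c(120, 30.5, 75, 40)
)

# Daily totals for 2021: number of sales, volume sold, money spent
daily <- toy |>
  mutate(date = mdy(date)) |>          # parse text dates into Date objects
  filter(year(date) == 2021) |>        # keep only sales from 2021
  group_by(date) |>
  summarise(
    n_sales = n(),
    total_volume = sum(volume_sold_liters, na.rm = TRUE),
    total_dollars = sum(sale_dollars, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(month = month(date, label = TRUE),  # month label used for faceting
         day = day(date))                    # day of month for the x axis

# Scatterplot of daily volume against day of month, one panel per month
p <- ggplot(daily, aes(x = day, y = total_volume)) +
  geom_point() +
  facet_wrap(~month) +
  theme_bw() +
  xlab("Day of month") +
  ylab("Volume sold (liters)")
print(p)
```

To try this on the real data, toy could be replaced with the cleaned data frame (for example new_data from above). If the date column is already stored as a Date, the mdy() step should be dropped.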