I'm so confused about how to work with this data for the following questions.

I use the code below to download the data:

# Download the file only if it is not already present
if (!file.exists("ames-liquor.rds")) {
  url <- ""
  download.file(url, "ames-liquor.rds", mode = "wb")
}
data <- readRDS("ames-liquor.rds")

I am confused about how to answer these questions:

  • How many observations are in the data?
  • How many different cities are in the data? (Variable City; careful, trick question!)
  • Different stores: how many different stores are in the data? Check first with Store Name, then with Store Number. Discuss the differences (give an example), and then answer the question of how many different stores are in the data set.

And then the data cleaning part:

  • how to extract geographic latitude and longitude from the variable Store Location
  • how to check variable types. Pick five variables that need to be converted to a different type and fix those.
  • how to use the package lubridate to convert the Date variable to a date. Then extract year, month and day from the variable Date

Hi, welcome!

Homework-inspired questions are welcome here, but you have to tell us what you have tried so far and what your specific problem is. We are more inclined to help you with specific coding problems than to do your work for you.

For guidelines on how to properly ask homework-related questions here, please read our homework policy.

Thank you. This is not a homework question; it's a sample question for an exam next month, and I'm really having a hard time with it.

That's somewhat of a homework too, isn't it?

You might want to have a look at a few functions. Since your downloaded data comes from a tidyverse-related source, I guess you will stick to the tidyverse. For your first tasks, you should read about nrow(), dplyr::n_distinct() and unique(). These are likely the minimal set of functions you are expected to use to answer those questions.
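A minimal sketch of the first three questions, run on a tiny made-up data frame (all values, including the "AMES"/"Ames" spellings and the store names and numbers, are hypothetical and not taken from ames-liquor.rds). One plausible reading of the City "trick" is that the same city appears with different capitalisation or trailing spaces, so the raw distinct count overstates the number of cities; check unique() on the real City column to see what the actual issue is. Comparing distinct names with distinct numbers, and listing numbers that carry more than one name, gives the kind of example the store question asks for.

```r
library(dplyr)

# Hypothetical toy data: made-up values using the column names from the question
liquor <- data.frame(
  City           = c("AMES", "Ames", "AMES", "AMES ", "Ames"),
  `Store Name`   = c("Quick Stop", "Quick Stop Ames", "Main St Liquor",
                     "Main St Liquor", "Quick Stop"),
  `Store Number` = c(1001, 1001, 2002, 2002, 1001),
  check.names = FALSE, stringsAsFactors = FALSE
)

# How many observations? One row per observation
nrow(liquor)                               # 5

# Raw distinct cities: "AMES", "Ames" and "AMES " are counted separately
n_distinct(liquor$City)                    # 3

# After normalising case and surrounding whitespace only one city remains
n_distinct(toupper(trimws(liquor$City)))   # 1

# Distinct stores by name vs. by number
n_distinct(liquor$`Store Name`)            # 3
n_distinct(liquor$`Store Number`)          # 2

# Store numbers that appear under more than one name (here: 1001)
liquor %>%
  distinct(`Store Number`, `Store Name`) %>%
  count(`Store Number`) %>%
  filter(n > 1)
```

On the real data, drop the toy data frame and run the same calls on the object returned by readRDS().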

Your second part is about working with geographical data, which can be done e.g. with the sf package (see here). To check variable types you can use str(), and you can coerce incorrectly typed columns with the family of as.*() functions (e.g. as.numeric(), as.integer(), as.character()). And some advice regarding the lubridate package can be found here.
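A minimal sketch of the type checks and the lubridate step, again on made-up data. The column names mirror the real ones, but two things here are assumptions to verify with str() on the actual file: that these columns arrived as text, and that Date is stored as month/day/year (hence mdy()). If the dates are in another order, ymd() or dmy() would be the matching parser.

```r
library(lubridate)

# Hypothetical toy data: every column stored as text, as can happen after import
liquor <- data.frame(
  Date             = c("01/15/2019", "02/28/2019", "12/03/2020"),
  `Zip Code`       = c("50010", "50014", "50010"),
  `Bottles Sold`   = c("12", "6", "24"),
  `Sale (Dollars)` = c("162.84", "81.42", "325.68"),
  check.names = FALSE, stringsAsFactors = FALSE
)
str(liquor)  # all columns show up as chr

# Coerce each column to the type it actually represents
liquor$`Bottles Sold`   <- as.integer(liquor$`Bottles Sold`)
liquor$`Sale (Dollars)` <- as.numeric(liquor$`Sale (Dollars)`)
liquor$`Zip Code`       <- as.factor(liquor$`Zip Code`)  # a category, not a quantity

# Assumed month/day/year text: parse with mdy(), then extract the parts
liquor$Date  <- mdy(liquor$Date)
liquor$Year  <- year(liquor$Date)
liquor$Month <- month(liquor$Date)
liquor$Day   <- day(liquor$Date)

str(liquor)  # Date is now a Date; Year, Month and Day are numeric
```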

Good luck on your exams, and feel free to ask further questions once you have tried something and get stuck on the code.
Kind regards

Hi, I downloaded the file and tried a few things:

filename <- "C:\\Users\\macosta\\Downloads\\ames-liquor.rds"  # change to your path

liquor <- readRDS(filename)

dim(liquor)  # dimensions: 661945 rows and 24 columns

str(liquor)  # check the structure of the data (column types)

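To get the coordinates into separate columns (note that a WKT POINT stores longitude first, then latitude, so the first number belongs in Lon):

# Packages: %>%, mutate() and across() (dplyr), separate() (tidyr), str_replace_all() (stringr)
library(dplyr)
library(tidyr)
library(stringr)

# Strip the "POINT" prefix and the parentheses, leaving "lon lat" as plain text
liquor$`Store Location` <- gsub("POINT", "", liquor$`Store Location`)

liquor$`Store Location` <- liquor$`Store Location` %>% str_replace_all("\\(|\\)", "")

# Print tibbles with 8 significant digits so the coordinates are not rounded
options(pillar.sigfig = 8)

# Split on the space into two numeric columns (longitude comes first)
liquor <- liquor %>%
  mutate(`Store Location` = trimws(`Store Location`)) %>%
  separate(`Store Location`, sep = " ", into = c("Lon", "Lat")) %>%
  mutate(across(c(Lon, Lat), as.numeric))

# First rows of the new columns; rows without a Store Location become NA:

# Lon           Lat
# <dbl>        <dbl>
# 1 -93.619455 42.022848
# 2 -93.669896 42.021605
# 3 -93.669896 42.021605
# 4  NA        NA
# 5  NA        NA
# 6 -93.618911 42.022854
# 7 -93.669896 42.021605
# 8 -93.619455 42.022848
# 9 -93.669896 42.021605
# 10  NA        NA

You could use the leaflet package to plot these coordinates.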
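As an alternative that needs no extra packages, base R's sub() with capture groups can pull both numbers out of the WKT text with a single pattern. This is a sketch that assumes every non-missing value looks exactly like "POINT (lon lat)"; anything else would become NA (with a coercion warning). The sample values are taken from the output above.

```r
# Sample values in the WKT "POINT (lon lat)" format, including a missing one
loc <- c("POINT (-93.619455 42.022848)", NA, "POINT (-93.669896 42.021605)")

# Capture the two numbers: group 1 is longitude (x), group 2 is latitude (y)
pattern <- "^POINT \\((-?[0-9.]+) (-?[0-9.]+)\\)$"
lon <- as.numeric(sub(pattern, "\\1", loc))
lat <- as.numeric(sub(pattern, "\\2", loc))

# NA inputs stay NA in both vectors
data.frame(Lon = lon, Lat = lat)
```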

Thank you so much for your reply!

For some general help with summary statistics, see summaries and descriptive info.
