After watching countless videos...help!
I am trying to scrape the "winners" from the Massachusetts Lottery website using rvest, with no luck. Here is the webpage: Massachusetts Lottery
Any help with the code would be appreciated.
I think you may need {RSelenium} rather than {rvest}, but I cannot check as my machine throws an error when I try to run it. This looks like a good tutorial: Web Scraping in R: Selenium, FireFox, and PhantomJS | Christopher Belanger, PhD
Thanks! I'll check out RSelenium.
You might want to concentrate on the Network tab of your browser's dev tools to figure out exactly how that data is fetched. In this particular case it comes from API calls like https://www.masslottery.com/api/v1/winners/query?start_index=0&count=25&sort=newestFirst, which you can make yourself through httr / httr2, or just point jsonlite at the URL:
api_query <- "https://www.masslottery.com/api/v1/winners/query?start_index=0&count=25&sort=newestFirst"
winners_resp <- jsonlite::fromJSON(api_query)
str(winners_resp)
#> List of 2
#> $ pageOfWinners :'data.frame': 25 obs. of 7 variables:
#> ..$ date_of_win : chr [1:25] "2024-08-26" "2024-08-26" "2024-08-26" "2024-08-26" ...
#> ..$ prize_amount_display: chr [1:25] "$100,000" "$100,000" "$20,000" "$20,000" ...
#> ..$ prize_amount_usd : int [1:25] 100000 100000 20000 20000 20000 20000 20000 20000 15000 10000 ...
#> ..$ identifier : chr [1:25] "mass_cash" "mass_cash" "433" "433" ...
#> ..$ name : chr [1:25] "Mass Cash" "Mass Cash" "Lifetime Millions" "Lifetime Millions" ...
#> ..$ retailer : chr [1:25] "Gulf Foodmart" "Gulf Foodmart" "Highland Farm" "Bridgeview Convenience Store" ...
#> ..$ retailer_location : chr [1:25] "Lanesboro" "Lanesboro" "Provincetown" "Tyngsboro" ...
#> $ totalNumberOfWinners: int 686169
tibble::as_tibble(winners_resp$pageOfWinners)
#> # A tibble: 25 × 7
#> date_of_win prize_amount_display prize_amount_usd identifier name retailer
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 2024-08-26 $100,000 100000 mass_cash Mass… Gulf Fo…
#> 2 2024-08-26 $100,000 100000 mass_cash Mass… Gulf Fo…
#> 3 2024-08-26 $20,000 20000 433 Life… Highlan…
#> 4 2024-08-26 $20,000 20000 433 Life… Bridgev…
#> 5 2024-08-26 $20,000 20000 433 Life… Abc Min…
#> 6 2024-08-26 $20,000 20000 billion-dol… BILL… 7-Eleve…
#> 7 2024-08-26 $20,000 20000 433 Life… Alltown…
#> 8 2024-08-26 $20,000 20000 billion-dol… BILL… Saratog…
#> 9 2024-08-26 $15,000 15000 keno Keno Amvets …
#> 10 2024-08-26 $10,000 10000 100x-cash-2… 100X… Colbea …
#> # ℹ 15 more rows
#> # ℹ 1 more variable: retailer_location <chr>
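If you'd rather go the httr2 route mentioned above, something along these lines should work. Treat it as a rough sketch: the helper names (build_winners_request, fetch_winners) and the user-agent string are made up for illustration, and it assumes the endpoint keeps accepting the same three query parameters as the URL above. It needs R 4.1+ for the native pipe.

```r
library(httr2)

# Hypothetical helper: builds (but does not send) the request.
# The query parameters mirror those in the URL shown above.
build_winners_request <- function(start_index = 0, count = 25) {
  request("https://www.masslottery.com/api/v1/winners/query") |>
    req_url_query(
      start_index = start_index,
      count = count,
      sort = "newestFirst"
    ) |>
    # Illustrative user-agent string, not required by the API as far as I know
    req_user_agent("winners-example (R httr2)") |>
    # Retry transient failures a couple of times
    req_retry(max_tries = 3)
}

# Hypothetical helper: sends the request and parses the JSON body.
# simplifyVector = TRUE makes the result resemble jsonlite::fromJSON() output.
fetch_winners <- function(start_index = 0, count = 25) {
  resp <- req_perform(build_winners_request(start_index, count))
  resp_body_json(resp, simplifyVector = TRUE)
}

# Usage (performs a real HTTP request, so it is left commented out):
# winners_resp <- fetch_winners()
# tibble::as_tibble(winners_resp$pageOfWinners)
```

Compared with pointing jsonlite at the URL, this mostly buys you retries and a place to set headers; for a one-off pull the jsonlite one-liner is probably enough.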
Feel free to play with the start_index and count parameters in the API request. It's often worth testing with your own values; in this case, for example, the record count is not fixed to 25 per request.
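To pull more than one page, you could step start_index forward in a loop. This is only a sketch under a few assumptions I have not verified: that start_index is a zero-based record offset (the example URL starts at 0), that a page size of 100 is accepted, and that every page comes back with the same columns so the pages can be row-bound. The function names and the one-second pause are my own choices.

```r
base_url <- "https://www.masslottery.com/api/v1/winners/query"

# Build the query URL for a single page
winners_url <- function(start_index, count) {
  sprintf(
    "%s?start_index=%d&count=%d&sort=newestFirst",
    base_url, as.integer(start_index), as.integer(count)
  )
}

# Offsets needed to cover n_wanted records,
# assuming start_index is a zero-based record offset
page_starts <- function(n_wanted, page_size) {
  seq(from = 0, by = page_size, length.out = ceiling(n_wanted / page_size))
}

# Fetch the pages one by one and stack them into a single data frame.
# page_size = 100 is an assumption; the server may cap it lower.
fetch_winners_pages <- function(n_wanted = 300, page_size = 100) {
  pages <- lapply(page_starts(n_wanted, page_size), function(s) {
    Sys.sleep(1) # pause between requests to go easy on the server
    jsonlite::fromJSON(winners_url(s, page_size))$pageOfWinners
  })
  # Trim in case the last page overshoots n_wanted
  head(do.call(rbind, pages), n_wanted)
}

# Usage (performs real HTTP requests, so it is left commented out):
# winners <- tibble::as_tibble(fetch_winners_pages(300, 100))
```

Keep in mind that new winners may be added while you page through, so rows can shift between requests; checking for duplicates afterwards is a reasonable precaution.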