Don't know how to scrape a specific site

JerreB · July 22, 2022, 11:49am

Hi everyone,

For my master's thesis I have to collect data from a website. All the data is publicly available, but collecting it by hand is very time-consuming.

I want to scrape the following website: https://www.colruyt.be/nl/producten
More specifically, I want to scrape the product name, the price, the weight, the price per weight, and the category on the little leaf label (A, B, C, D or E).

Can someone help me out? I could really use the help.

nirgrahamuk · July 22, 2022, 2:27pm

Try using rvest:
Easily Harvest (Scrape) Web Pages • rvest (tidyverse.org)
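A minimal sketch of the usual rvest workflow (parse the HTML, select elements with a CSS selector, extract their text). It runs on a small made-up HTML string instead of the real page so it works offline. The class names `card`, `card__text` and `card__price` are assumptions for illustration, not taken from the Colruyt site; the real selectors would have to be read from the page source with the browser's developer tools. It assumes rvest 1.0 or later for `html_elements()`, `html_element()` and `html_text2()`.

```r
library(rvest)

# A tiny, made-up page standing in for a product listing.
# The class names (card, card__text, card__price) are hypothetical.
page_html <- '
<div class="card">
  <span class="card__text">Product A</span>
  <span class="card__price">1,99</span>
</div>
<div class="card">
  <span class="card__text">Product B</span>
  <span class="card__price">3,49</span>
</div>'

# read_html() treats a string containing "<" as literal HTML, not a URL
page <- read_html(page_html)

# Select each product card first, then pull one field per card so that
# names and prices stay aligned (a card missing a field gives NA)
cards <- html_elements(page, ".card")
products <- data.frame(
  name  = html_text2(html_element(cards, ".card__text")),
  price = html_text2(html_element(cards, ".card__price")),
  stringsAsFactors = FALSE
)
products
```

Selecting the cards first and then one field per card keeps each name next to its own price; extracting names and prices as two independent vectors can misalign them when some card lacks a field. The same pattern would extend to weight, price per weight and the leaf label once their selectors are known.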

JerreB · July 22, 2022, 2:53pm

I did. It works on other sites, but not on this one.

nirgrahamuk · July 22, 2022, 3:25pm

What have you tried so far? What is your specific problem? We are more inclined to help with specific coding problems than to do your work for you.

Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

FAQ: How to do a minimal reproducible example (reprex) for beginners


JerreB · July 22, 2022, 3:41pm

This is what I have tried so far. For example, I want to get the titles of the products on the page:

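library(rvest)

# Download the product overview page and extract the text of every
# element with class "card__text" (intended: the product titles)
productname <- read_html("https://www.colruyt.be/nl/producten") %>%
  html_nodes(".card__text") %>%
  html_text()
productname
character(0)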

but I don't get any output, so I am doing something wrong, but I don't know what.
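An empty result (`character(0)`) means the CSS selector matched nothing in the HTML that `read_html()` downloaded. `read_html()` does not run JavaScript, so if the product cards are added by scripts in the browser (a common cause, not confirmed here for this site), they never appear in what rvest sees. The helper `diagnose_selector()` below is hypothetical, a sketch for checking that: it counts the matches and tests whether the class name occurs anywhere in the downloaded source. It assumes rvest 1.0 or later and a simple class selector such as `.card__text`.

```r
library(rvest)

# Hypothetical helper: how many elements match `selector` in the HTML
# that read_html() received, and does the class name occur in the source?
# Assumes a simple class selector such as ".card__text".
diagnose_selector <- function(html, selector) {
  page <- read_html(html)
  n_matches <- length(html_elements(page, selector))
  class_name <- sub("^\\.", "", selector)  # ".card__text" -> "card__text"
  in_source <- grepl(class_name, as.character(page), fixed = TRUE)
  list(matches = n_matches, class_in_source = in_source)
}

# Offline demo: an empty container that a script would fill in a browser
static_html <- '<div id="app"></div><script>/* products rendered here */</script>'
str(diagnose_selector(static_html, ".card__text"))

# Against the real page (needs network access; result not verified here):
# str(diagnose_selector("https://www.colruyt.be/nl/producten", ".card__text"))
```

If the match count is zero and the class name is also absent from the source, the selector is probably not the problem; the product data is likely being loaded separately, and the browser's developer tools (Network tab) can show which request delivers it.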

system · August 12, 2022, 3:41pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.