Hello, I want to scrape a website for demo purposes.
The page's HTML source looks like this:
<!DOCTYPE html><html lang="en-US" prefix="og: http://ogp.me/ns#" ng-app="TUMNewsApp"><head><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta charset="UTF-8"><meta name="robots" content="index, follow"><meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"><meta name="google-site-verification" content="NhJYhvGtruZZ2iJmsYz1KuofthSHyl3icQQYT3wba6k" /><link rel="shortcut icon"
[...]
target="_blank">tum.school.of.management</a> </span></div><div class="social_item"> <i class="icon-icn_tum_youtube"></i> <span> Subscribe to our channel!<br> <a href="//www.youtube.com/channel/UCXdFu0pi275lddSR1HjLg8A" target="_blank">TUM School of Management</a> </span></div></div><div class="social rss"> <a target="_blank" href="https://www.wi.tum.de/feed/rss/"><div class="social_item"> <i class="icon-icn_tum_RSS"></i> <span> <b>NEWS RSS abonnieren</b> </span></div> </a></div></div></div></div> <script>(function ( $ ) {
$( document ).ready( function () {
$('.feature_8871').slick({
infinite: false,
slidesToShow: 1,
slidesToScroll: 1,
arrows: true,
dots: true,
autoplay: true,
autoplaySpeed: 5000
});
var count = $('.feature_8871 .slick-dots li').length;
$(".feature_8871 .slick-dots li").each(function () {
$(this).find("button").remove();
});
});
})( jQuery );</script> </div></div></div></div><div class="vc_row wpb_row row bg_default"><div class="wpb_column vc_column_container vc_col-sm-12 col-xs-12 col-sm-12"><div class="vc_column-inner"><div class="wpb_wrapper"><div class="dhsv_vc_anker point"><div id="News-Archive" data-ankername="News Archive" class="ankerpoint"></div></div><div class="wpb_text_column wpb_content_element " ><div class="wpb_wrapper"><h3>News Archive</h3></div></div> <script language='javascript'>var beitraege = [ {
ID:'59888',
url:'https://www.wi.tum.de/wp-content/uploads/2019/02/Fotolia_208486536_S-300x169.jpg',
category:' <span>International</span> ',
tag:[45],
permalink: 'https://www.wi.tum.de/tum-ranked-in-the-first-league-with-study-quality/',
date:'8 May 2019',
title:'TUM ranked in the first league with study quality',
exerpt: 'CHE University Ranking: Students rate engineering programs Students give the Technical University of Munich (TUM) many high marks. This is seen in the latest rankings from the Centre for Higher … <br>Read More here <i class="icon-icn_tum_internlink"></i>'
}, {
ID:'59594',
url:'https://www.wi.tum.de/wp-content/uploads/2018/04/20170323_bwl_Flyer_AH_311912-300x200.jpg',
category:' <span>General</span> <span>Student Life</span> <span>Alumni</span> ',
tag:[15, 41, 222],
permalink: 'https://www.wi.tum.de/applications-open-for-the-social-impact-award-2019/',
date:'4 May 2019',
title:'Applications open for the Social Impact Award 2019',
exerpt: 'TUM School of Management students and graduates can now submit their projects for the Social Impact Award 2019. If your project study, Bachelor’s or Master’s thesis tackles a social issue … <br>Read More here <i class="icon-icn_tum_internlink"></i>'
}, {
ID:'59614',
url:'https://www.wi.tum.de/wp-content/uploads/2019/05/Fotolia_261607955_S_WP-sized-300x169.jpg',
category:' <span>General</span> <span>Studies</span> ',
tag:[15, 91],
permalink: 'https://www.wi.tum.de/subject-with-high-returns-why-business-studies-at-universities-in-germany-must-not-be-weakened-by-prof-dr-friedl-and-prof-dr-hutzschenreuter/',
date:'3 May 2019',
title:'Subject with high returns – Why business studies at universities in Germany must not be weakened by Prof. Dr. Friedl and Prof. Dr. Hutzschenreuter',
exerpt: 'On April 18th 2019, the Frankfurter Allgemeine Zeitung published an article by Prof. Dr. Gunther Friedl and Prof. Dr. Thomas Hutzschenreuter why business studies at universities in Germany must be … <br>Read More here <i class="icon-icn_tum_internlink"></i>'
}, {
[...]
I want to save the individual news entries into a data frame, one row per entry.
For example:
[...]
ID:'59888',
url:'https://www.wi.tum.de/wp-content/uploads/2019/02/Fotolia_208486536_S-300x169.jpg',
category:' <span>International</span> ',
tag:[45],
permalink: 'https://www.wi.tum.de/tum-ranked-in-the-first-league-with-study-quality/',
date:'8 May 2019',
title:'TUM ranked in the first league with study quality',
exerpt: 'CHE University Ranking: Students rate engineering programs Students give the Technical University of Munich (TUM) many high marks. This is seen in the latest rankings from the Centre for Higher … <br>Read More here <i class="icon-icn_tum_internlink"></i>'
[...]
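Each entry above is a JavaScript object literal inside the page's `beitraege` array, not HTML, so one option is to treat the script as plain text and pull the fields out with regular expressions. This is a minimal base-R sketch under stated assumptions: every field is written as `name:'value'` with single quotes and no escaped quotes inside, `tag` is a bare `[...]` list, and entries are separated by `}, {`, exactly as in the excerpt. `parse_beitraege()` and `sample_js` are hypothetical names, and this is not a general JavaScript parser.

```r
# parse_beitraege() is a hypothetical helper, not part of any package.
# Assumes each field is written as  name:'value'  (single quotes, no escaped
# quotes inside) and `tag` as a bare [..] list, as in the page excerpt.
parse_beitraege <- function(js) {
  # One chunk per object literal; the objects are separated by "}, {".
  chunks <- strsplit(js, "\\}\\s*,\\s*\\{", perl = TRUE)[[1]]
  chunks <- chunks[grepl("ID:\\s*'", chunks, perl = TRUE)]

  # Return the first capture group of `pattern` in each chunk (NA if absent).
  grab <- function(pattern) {
    vapply(chunks, function(chunk) {
      m <- regmatches(chunk, regexec(pattern, chunk, perl = TRUE))[[1]]
      if (length(m) == 2) m[2] else NA_character_
    }, character(1), USE.NAMES = FALSE)
  }
  # Pattern for a single-quoted field such as  date:'8 May 2019'
  quoted <- function(name) grab(paste0("\\b", name, ":\\s*'([^']*)'"))

  df <- data.frame(
    ID        = quoted("ID"),
    url       = quoted("url"),
    category  = quoted("category"),
    tag       = grab("\\btag:\\s*\\[([^\\]]*)\\]"),
    permalink = quoted("permalink"),
    date      = quoted("date"),
    title     = quoted("title"),
    exerpt    = quoted("exerpt"),  # the page itself spells the key "exerpt"
    stringsAsFactors = FALSE
  )
  # " <span>General</span> <span>Alumni</span> " -> "General, Alumni"
  df$category <- trimws(gsub("<[^>]+>", "",
                             gsub("</span>\\s*<span>", ", ", df$category, perl = TRUE)))
  df
}

# The first two entries from the question's excerpt (raw string needs R >= 4.0).
sample_js <- r"(var beitraege = [ {
ID:'59888',
url:'https://www.wi.tum.de/wp-content/uploads/2019/02/Fotolia_208486536_S-300x169.jpg',
category:' <span>International</span> ',
tag:[45],
permalink: 'https://www.wi.tum.de/tum-ranked-in-the-first-league-with-study-quality/',
date:'8 May 2019',
title:'TUM ranked in the first league with study quality',
exerpt: 'CHE University Ranking: Students rate engineering programs Students give the Technical University of Munich (TUM) many high marks. This is seen in the latest rankings from the Centre for Higher … <br>Read More here <i class="icon-icn_tum_internlink"></i>'
}, {
ID:'59594',
url:'https://www.wi.tum.de/wp-content/uploads/2018/04/20170323_bwl_Flyer_AH_311912-300x200.jpg',
category:' <span>General</span> <span>Student Life</span> <span>Alumni</span> ',
tag:[15, 41, 222],
permalink: 'https://www.wi.tum.de/applications-open-for-the-social-impact-award-2019/',
date:'4 May 2019',
title:'Applications open for the Social Impact Award 2019',
exerpt: 'TUM School of Management students and graduates can now submit their projects for the Social Impact Award 2019. If your project study, Bachelor’s or Master’s thesis tackles a social issue … <br>Read More here <i class="icon-icn_tum_internlink"></i>'
} ];)"

news <- parse_beitraege(sample_js)
news[, c("ID", "date", "category", "tag")]
```

On the live page, `sample_js` would be replaced by the text of the `<script>` element that contains `var beitraege`. Regular expressions break as soon as a value contains an escaped quote; in that case, evaluating the script with a JavaScript engine (for example the V8 package, via `v8()`, `$eval()` and `$get()`) and reading `beitraege` back as an R object is likely more robust.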
But I can't scrape these values with CSS selectors or XPath.
I tried this:
library(rvest)  # provides read_html(), html_nodes(), html_text() and the %>% pipe

tumNews <- read_html("https://www.wi.tum.de/about-2/news-events/")
tumNews %>%
  html_nodes(".boxview , .excerpt , .ng-binding+ .ng-binding") %>%
  html_text()
But this doesn't return the values I'm looking for. Can anyone help? Thanks in advance!
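A likely reason the selector returns nothing useful: `read_html()` downloads the HTML as the server sends it and does not run JavaScript. The page is an AngularJS app (`ng-app="TUMNewsApp"`), so elements such as `.ng-binding` are presumably created in the browser from the `beitraege` array, and in the raw HTML the entries exist only as text inside a `<script>` element. XPath can still select that element. The sketch below runs against a trimmed stand-in for the downloaded page (three fields of the first entry, copied from the excerpt above); on the real site, the stand-in would be replaced by `read_html("https://www.wi.tum.de/about-2/news-events/")`.

```r
library(rvest)  # also re-exports the %>% pipe

# Stand-in for the downloaded page: a trimmed copy of its inline script
# (first entry only, three fields). Raw string syntax needs R >= 4.0.
page <- read_html(r"(<html><body><h3>News Archive</h3>
<script language='javascript'>var beitraege = [ {
ID:'59888',
date:'8 May 2019',
title:'TUM ranked in the first league with study quality'
} ];</script>
</body></html>)")

# XPath selects the <script> element itself; the entries inside it are
# JavaScript text, not HTML nodes, so they can only be read as a string.
script_text <- page %>%
  html_node(xpath = "//script[contains(., 'var beitraege')]") %>%
  html_text()

# Pull every ID:'...' value out of the script text.
ids <- regmatches(script_text,
                  gregexpr("(?<=ID:')[^']*", script_text, perl = TRUE))[[1]]
ids
```

The resulting `script_text` holds the whole array, which can then be split into entries and fields with regular expressions or evaluated with a JavaScript engine.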