I'm currently writing some R code to extract information about the funding that various projects on a website have received.
I am using the rvest package.
Here is a sample of the HTML on the website:
<title>Project 2030 is launched</title>
<div data-name="category">Domestic news</div> <!--/category-->
<div data-name="funding">25000000</div><!--/funding-->
My question is: how can I do the same for the "funding" part - or more specifically, how can I extract the number 25000000? Using html_node("div#funding") or other variants does not seem to work.
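I'd recommend using XPath to identify the specific nodes you want. The selector "div#funding" looks for a div whose id is "funding", but in your HTML "funding" is the value of the data-name attribute, so that selector matches nothing. Discovering node identifiers using SelectorGadget or writing your own CSS selectors often works great, but it can fail you when the elements aren't marked up with distinctive ids or classes. Here's an example to grab the funding field: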
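A minimal sketch of the XPath approach, assuming the page looks like the sample in the question. The html string and the variable names page and funding are only for illustration; against the real site you would pass the page URL to read_html() instead.

```r
library(rvest)

# Sample HTML from the question, wrapped in a minimal document (assumption:
# the real page has the same <div data-name="..."> structure).
html <- '<html><head><title>Project 2030 is launched</title></head><body>
<div data-name="category">Domestic news</div>
<div data-name="funding">25000000</div>
</body></html>'

page <- read_html(html)

# XPath: select the <div> whose data-name attribute equals "funding",
# take its text, and convert it from character to numeric.
funding <- page %>%
  html_node(xpath = '//div[@data-name="funding"]') %>%
  html_text() %>%
  as.numeric()
```

If you prefer to stay with CSS selectors, an attribute selector should do the same job: html_node('div[data-name="funding"]'). Swapping "funding" for "category" in either form should pick up the category field as well.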