Hi @Krim, and welcome to RStudio Community!

The table of data on that page is loaded via JavaScript, which is why {rvest} on its own is not well suited to scraping it. The table takes a few seconds to load after you visit https://fundf10.eastmoney.com/jjjz_510300.html, so rvest::read_html() never sees it: it only captures what is available immediately after the page loads (i.e. the static HTML).
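You can see the problem for yourself with a plain {rvest} call (a quick sketch; this uses html_element() from rvest >= 1.0, and the exact result depends on what the static HTML contains, but the data rows won't be there):
library(rvest)
# The static HTML is fetched before the JavaScript runs, so the
# net-value table has no data at this point
read_html("https://fundf10.eastmoney.com/jjjz_510300.html") %>%
  html_element("#jztable") %>%
  html_table()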
So, we are going to use the {RSelenium} package for this task. It is a package that lets you drive your browser straight from your R code via a Selenium server. The only downside is that it takes a few steps to set up: not everything is ready out of the box when you install it with install.packages("RSelenium"), but I'll do my best to walk you through all the steps in as much detail as possible. Also, it is important to mention that I am a Windows user, so some details below may differ slightly on other systems.
Setup
- Install the latest version of Java: https://java.com/en/download/. Restart your computer once the installation finishes (see the quick check below to confirm it worked).
- Install Firefox: https://www.mozilla.org/en-US/firefox/new/. In my experience, Firefox is the easiest browser to drive from Selenium.
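To confirm that Java is visible from R after restarting, you can shell out to the java binary (base R only, nothing extra needed):
system2("java", "-version") # Should print the installed Java version; an error means Java is not on your PATH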
Connect to Selenium server from R
# Load packages ----
pacman::p_load(RSelenium, purrr, rvest, glue)
# Start a Selenium server
driver <- rsDriver(port = 4444L, browser = "firefox")
remote_driver <- driver$client
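If rsDriver() complains that the port is already in use (a common hiccup when a previous server is still running), just pick a different port, e.g.:
driver <- rsDriver(port = 4445L, browser = "firefox")
remote_driver <- driver$client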
Hopefully, everything has worked for you so far.
Mini tutorial
Now, let me walk you through a quick tutorial in which we will scrape the second page of the table on the webpage.
# Open browser ----
remote_driver$open() # This actually opens the Firefox browser on your machine
# Navigate to URL ----
url <- "https://fundf10.eastmoney.com/jjjz_510300.html"
remote_driver$navigate(url) # This loads the website in the browser we just opened
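Since the table takes a few seconds to render (the whole reason we are here), it is safest to pause briefly before querying any elements. A fixed wait is crude, but it keeps the example simple:
Sys.sleep(3) # Give the JavaScript a moment to fill in the table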
# Navigate to page 2 of the table ----
# ** Find page 2 button
page2_btn <- remote_driver$findElement(using = "css", value = ".pagebtns > label[value='2']")
# ** Move pointer to button
remote_driver$mouseMoveToLocation(webElement = page2_btn)
# ** Click on page 2 button
page2_btn$click() # Notice how the browser goes to page 2 of the table
# Find table element in HTML page ----
table_el <- remote_driver$findElement(using = "css", value = "#jztable")
# Scrape table ----
table_page2 <- table_el$getElementAttribute("innerHTML") %>%
  .[[1]] %>%
  read_html() %>%
  html_table() %>%
  .[[1]]
table_page2
# A tibble: 20 x 7
   净值日期   单位净值 累计净值 日增长率 申购状态 赎回状态 分红送配
   <chr>         <dbl>    <dbl> <chr>    <chr>    <chr>    <lgl>
 1 2021-07-07     5.19     2.09 1.17%    场内买入 场内卖出 NA
 2 2021-07-06     5.13     2.07 0.02%    场内买入 场内卖出 NA
 3 2021-07-05     5.13     2.07 0.09%    场内买入 场内卖出 NA
 4 2021-07-02     5.12     2.07 -2.82%   场内买入 场内卖出 NA
 5 2021-07-01     5.27     2.13 0.09%    场内买入 场内卖出 NA
 6 2021-06-30     5.26     2.12 0.69%    场内买入 场内卖出 NA
 7 2021-06-29     5.23     2.11 -1.10%   场内买入 场内卖出 NA
 8 2021-06-28     5.29     2.13 0.22%    场内买入 场内卖出 NA
 9 2021-06-25     5.28     2.13 1.70%    场内买入 场内卖出 NA
10 2021-06-24     5.19     2.10 0.19%    场内买入 场内卖出 NA
11 2021-06-23     5.18     2.09 0.52%    场内买入 场内卖出 NA
12 2021-06-22     5.15     2.08 0.63%    场内买入 场内卖出 NA
13 2021-06-21     5.12     2.07 -0.25%   场内买入 场内卖出 NA
14 2021-06-18     5.13     2.07 0.05%    场内买入 场内卖出 NA
15 2021-06-17     5.13     2.07 0.47%    场内买入 场内卖出 NA
16 2021-06-16     5.10     2.06 -1.63%   场内买入 场内卖出 NA
17 2021-06-15     5.19     2.10 -1.12%   场内买入 场内卖出 NA
18 2021-06-11     5.25     2.12 -0.81%   场内买入 场内卖出 NA
19 2021-06-10     5.29     2.13 0.69%    场内买入 场内卖出 NA
20 2021-06-09     5.25     2.12 0.11%    场内买入 场内卖出 NA
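A side note on the output: the date and growth-rate columns come back as character. If you want proper types, a minimal cleaning step could look like this (assuming you also load {dplyr}; the backticked Chinese column names are the ones shown in the output above):
library(dplyr)
table_page2 %>%
  mutate(
    `净值日期` = as.Date(`净值日期`),                       # Net value date as a proper Date
    `日增长率` = as.numeric(sub("%", "", `日增长率`)) / 100 # Daily growth rate as a proportion
  )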
Scrape the full table
The mini tutorial above shows all the steps needed to scrape the data from one specific page of the table. Now we will package those steps into a function and automate the scraping of all pages.
# Find total number of pages ----
div_page_btns <- remote_driver$findElements(using = "css", value = "div.pagebtns")
n_pages <- div_page_btns[[1]]$findChildElements(using = "css", value = "label[value]") %>%
  map_chr(~ unlist(.x$getElementText())) %>%
  as.numeric() %>%  # Button labels that aren't numbers become NA here...
  max(na.rm = TRUE) # ...which is why we drop NAs before taking the max
# Create function (it uses all the steps in the mini tutorial) ----
scrape_table_page <- function(page){
  message(glue::glue("Scraping data on page {page}."))

  # Find and click the button for the requested page
  page_btn <- remote_driver$findElement(using = "css", value = glue::glue("div.pagebtns > label[value = '{page}']"))
  remote_driver$mouseMoveToLocation(webElement = page_btn)
  page_btn$click()
  Sys.sleep(1) # Give the browser a second to load the data on the new page

  # Grab the table element and parse its HTML, as in the mini tutorial
  table_el <- remote_driver$findElement(using = "css", value = "#jztable")
  table_el$getElementAttribute("innerHTML") %>%
    .[[1]] %>%
    read_html() %>%
    html_table() %>%
    .[[1]]
}
Now we can apply the function to all pages:
mydata <- map_dfr(seq_len(n_pages), scrape_table_page)
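If the site is slow and a page occasionally fails to load in time, you can make the loop fault-tolerant with purrr::possibly() (a sketch; pages that error are simply skipped, since map_dfr() drops NULL results):
safe_scrape <- possibly(scrape_table_page, otherwise = NULL)
mydata <- map_dfr(seq_len(n_pages), safe_scrape)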
Let me know if you have questions.
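P.S. When you are done scraping, remember to shut everything down, otherwise the port stays occupied the next time you call rsDriver():
remote_driver$close() # Close the browser window
driver$server$stop()  # Stop the Selenium server and free the port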