Wikipedia table has observations in the wrong column after importing with rvest

Mubanga · August 8, 2021, 10:03pm

Hi folks,

I was looking at the rvest package and tried to use it to scrape a table from the Wikipedia article "List of 5G NR networks".

Here is my code:

library(rvest)
library(tidyverse)

url <- "https://en.wikipedia.org/wiki/List_of_5G_NR_networks"
wikipage <- read_html(url)
data_table <- html_elements(wikipage, "table")  # html_elements() supersedes html_nodes() from rvest 1.0.0

# pick the second table using pluck() from purrr
nr_table <- data_table %>%
  html_table(header = TRUE) %>%
  purrr::pluck(2)
# fill is deprecated according to the documentation, so I left the argument (fill = TRUE) out of html_table().

# When I view the imported table, I notice some observations are in the wrong column. For example, "Vodafone" and "n5: 10 MHz(Mar 2021)" both appear under the "Country or territory" column, which is not correct.
View(nr_table)
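A quick way to see where the shift comes from is to count the cells in each `<tr>` of the raw HTML. The sketch below uses a tiny made-up table (the country and operator names are placeholders, not data from the Wikipedia page) whose first-column cell has `rowspan="2"`; the row underneath it then holds only two cells, so a parser that ignores `rowspan` would slide them one column to the left. `xml2::xml_length()` counts the element children of each row, and the `[rowspan]` CSS selector picks out the spanning cells.

```r
library(rvest)

# Made-up miniature of the problem: the first column's cell spans two rows
html <- '<table>
  <tr><th>Country</th><th>Operator</th><th>Band</th></tr>
  <tr><td rowspan="2">Country A</td><td>Operator X</td><td>n78</td></tr>
  <tr><td>Operator Y</td><td>n5</td></tr>
</table>'

page <- read_html(html)
rows <- html_elements(page, "tr")

# Number of <th>/<td> children in each <tr>; the last row is one cell short
cells_per_row <- xml2::xml_length(rows)
print(cells_per_row)  # 3 3 2

# Cells that span several rows, with the number of rows each one covers
spanning <- html_elements(page, "td[rowspan], th[rowspan]")
spanning_info <- data.frame(
  text = html_text2(spanning),
  rowspan = html_attr(spanning, "rowspan"),
  stringsAsFactors = FALSE
)
print(spanning_info)
```

Running the same two checks on `purrr::pluck(data_table, 2)` instead of the toy table should show which rows of the real table are short, though that has not been verified against the live article.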

How can I make sure rvest's html_table() function preserves the table structure, so that observations do not get shifted into the wrong column (variable)?
Thanks in advance.

Mubanga · August 10, 2021, 11:23pm

I now understand why I am having issues with this particular Wikipedia table ("List of 5G NR networks"). The html_table() function makes a few assumptions, one of them being that no cell spans multiple rows. That is what causes my issue, since some cells in the country column of the Wikipedia table span multiple rows (via the HTML rowspan attribute). Based on this understanding from the documentation, I should be able to find a workaround soon.
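A minimal sketch of such a workaround, useful if html_table() in the installed rvest version does not copy merged cells down the way you need. It assumes the header is in the first row and that no cell uses colspan: walk the rows in order and, whenever a cell carries a rowspan attribute, copy its text into the same column of the rows it covers. `expand_rowspan_table()` is a hypothetical helper written for this example, not an rvest function; it only uses html_elements(), html_text2() and html_attr() from rvest 1.0 or later. The sample HTML is made up.

```r
library(rvest)

# Hypothetical helper (not part of rvest): turn one <table> node into a data
# frame, copying each rowspan cell's text down into the rows it covers.
# Assumptions: the first <tr> holds the headers and no cell uses colspan.
expand_rowspan_table <- function(table_node) {
  rows <- html_elements(table_node, "tr")
  header <- html_text2(html_elements(rows[[1]], "th, td"))
  n_col <- length(header)

  carry_text <- character(n_col)  # text being copied down, per column
  carry_left <- integer(n_col)    # rows still covered by that text, per column
  out <- vector("list", length(rows) - 1)

  for (i in seq_along(rows)[-1]) {
    cells <- html_elements(rows[[i]], "th, td")
    k <- 1                        # next unused cell in this <tr>
    values <- character(n_col)
    for (j in seq_len(n_col)) {
      if (carry_left[j] > 0) {
        # Column j is still covered by a rowspan cell from an earlier row
        values[j] <- carry_text[j]
        carry_left[j] <- carry_left[j] - 1L
      } else if (k <= length(cells)) {
        values[j] <- html_text2(cells[[k]])
        span <- as.integer(html_attr(cells[[k]], "rowspan", default = "1"))
        if (!is.na(span) && span > 1) {
          carry_text[j] <- values[j]
          carry_left[j] <- span - 1L
        }
        k <- k + 1
      } else {
        values[j] <- NA_character_  # row is shorter than the header
      }
    }
    out[[i - 1]] <- values
  }

  df <- as.data.frame(do.call(rbind, out), stringsAsFactors = FALSE)
  names(df) <- header
  df
}

# Usage on a made-up table whose first-column cell spans two rows
html <- '<table>
  <tr><th>Country</th><th>Operator</th><th>Band</th></tr>
  <tr><td rowspan="2">Country A</td><td>Operator X</td><td>n78</td></tr>
  <tr><td>Operator Y</td><td>n5</td></tr>
</table>'
result <- expand_rowspan_table(html_element(read_html(html), "table"))
print(result)  # "Country A" is repeated in both data rows
```

On the real page, something like `data_table %>% purrr::pluck(2) %>% expand_rowspan_table()` should apply the same idea, but this has not been checked against the live article; if that table also uses colspan or nested tables, the helper would need extending.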

system · August 31, 2021, 11:24pm

This topic was automatically closed 21 days after the last reply.