Hi!
I am trying to scrape a table from the web, and ultimately convert a column from a character string to numeric.
My complete setup:
library(readxl)
library(janitor)
library(tidyverse)
library(gt)
library(rvest)
library(reprex)
Scraping the table didn't produce any errors, but here's the code I used:
table_costs_messy <- list()
table_costs_messy <- read_html("https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2426671/") %>%
  html_node("table") %>%
  html_table(header = TRUE) %>%
  clean_names() %>%
  slice(-1) %>%
  slice(1:17)
Where the problem starts:
I've scraped a table and cleaned it, and have tried to remove special characters from two columns so that I will be able to convert those values to integers later. This part of the code does not throw error messages, but I think this may be where my problem starts.
table_costs_messy[table_costs_messy == "not applicable"] <- NA
colnames(table_costs_messy)[2:3] <- c("xF", "xM")
gsub(x = table_costs_messy, pattern = "\\$|\\*", "")
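A quick way to check what this step actually does, sketched on a small made-up data frame (`toy` and its values are hypothetical, not the scraped table). It assumes that `gsub()` returns a new value rather than modifying its input, and that it is meant to be applied to one character column at a time:

```r
# Hypothetical toy data standing in for the scraped table (values are made up)
toy <- data.frame(
  item = c("a", "b", "c"),
  xF   = c("$100*", "$250", NA),
  xM   = c("$75", "$300*", "$20"),
  stringsAsFactors = FALSE
)

# gsub() returns its result; without an assignment, `toy` itself is unchanged
invisible(gsub(x = toy, pattern = "\\$|\\*", replacement = ""))
unchanged <- identical(toy$xF, c("$100*", "$250", NA))

# Apply gsub() to each cost column separately and assign the result back
cost_cols <- c("xF", "xM")
toy[cost_cols] <- lapply(toy[cost_cols], function(col) gsub("\\$|\\*", "", col))
toy
```

If `unchanged` is `TRUE`, the unassigned call left the `$` and `*` characters in place, which would be consistent with the columns still holding non-numeric strings afterwards.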
Now that I've removed the unwanted characters, I'd hope to be able to make a new table. This is where I get an error suggesting that I have not properly removed the special characters.
table_costs_clean <- table_costs_messy %>%
  pivot_longer(cols = starts_with("x"),
               names_to = "Sex",
               names_prefix = "x",
               values_to = "Cost",
               values_ptypes = list(Cost = integer()),
               values_drop_na = FALSE
  )
table_costs_clean
#> Error: Lossy cast from <character> to <integer>.
#> * Locations: 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
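For comparison, a minimal sketch of the reshaping step on a small made-up tibble whose cost columns contain only digits or `NA` (`toy_clean` and its values are hypothetical). It assumes the error comes from values that still contain `$` or `*` when they are cast to integer, and it swaps the `values_ptypes` argument for an explicit `as.integer()` conversion after pivoting:

```r
library(tidyr)
library(dplyr)

# Hypothetical, already-cleaned toy data (made-up values, not the real table)
toy_clean <- tibble(
  item = c("a", "b"),
  xF   = c("100", NA),
  xM   = c("75", "300")
)

# Pivot while the values are still character, then convert explicitly
toy_long <- toy_clean %>%
  pivot_longer(
    cols = starts_with("x"),
    names_to = "Sex",
    names_prefix = "x",
    values_to = "Cost"
  ) %>%
  mutate(Cost = as.integer(Cost))

toy_long
```

Converting after the pivot keeps the reshaping and the type change as separate steps, so any value that fails to convert shows up as an `NA` with a warning that can be inspected, rather than stopping the whole pipeline.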
Thanks very much for any help or advice you might have!
Created on 2020-03-06 by the reprex package (v0.3.0)