I'm looking for a way to convert HTML character codes (e.g. &#179;, entity numbers I think they are called) to Unicode (e.g. \u00B3). My use case is that I get information like "Streamflow, ft&#179;/s", but I want to put it on a ggplot2 graph. Here is what I've come up with, but it's (a) not working and (b) not ideal:
```r
library(xml2)
unescape_html <- function(str){
  xml2::xml_text(xml2::read_html(paste0("<x>", str, "</x>")))
}
ugly_string <- "Streamflow, ft&#179;/s"
fancy_chars <- regmatches(ugly_string, gregexpr("&#\\d{3};", ugly_string))
replacement <- unescape_html(fancy_chars)
formatted_chars <- gsub(pattern = "&#\\d{3};",
                        replacement = replacement,
                        x = ugly_string)
formatted_chars
#> [1] "Streamflow, ftÂ³/s"
```
So it's close, but there's still a funky Â. The end goal is to get ugly_string onto a ggplot2 plot as a not-so-ugly axis label.
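The stray Â is most likely the classic mojibake pattern: the superscript three (U+00B3) is stored in UTF-8 as the two bytes 0xC2 0xB3, and if those bytes are read as Latin-1/Windows-1252 they show up as "Â³". Where exactly the UTF-8 label gets lost in the code above is an assumption on my part (it is common in non-UTF-8 Windows locales), but the mechanism itself can be reproduced in base R:

```r
# Build U+00B3 (superscript three) from its two UTF-8 bytes, 0xC2 0xB3.
sup3 <- rawToChar(as.raw(c(0xc2, 0xb3)))
Encoding(sup3) <- "UTF-8"
nchar(sup3)
#> [1] 1

# Label the very same bytes as Latin-1: R now reads two characters,
# 0xC2 ("Â") followed by 0xB3 ("³"), i.e. the stray "Â³".
garbled <- sup3
Encoding(garbled) <- "latin1"
nchar(garbled)
#> [1] 2

# Relabelling the bytes as UTF-8 recovers the single intended character,
# instead of deleting every "Â" the way gsub("Â", "", x) would.
repaired <- garbled
Encoding(repaired) <- "UTF-8"
identical(repaired, sup3)
#> [1] TRUE
```

So if a decoded label comes back with an extra Â, fixing its encoding mark is probably safer than stripping the character, since a real Â elsewhere in a label would be removed too.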
Have you looked at bquote()?
You can add this to your ggplot:

```r
+ labs(x = bquote("Streamflow, " ~ ft^3/s), y = "blablabla")
```
(Or the other way around depending on whether streamflow is the x or y axis.)
Here is an example with a silly graph:
```r
library(tidyverse)
my_dat <- tibble(
  streamflow = letters[1:10],
  y = 1:10
)
my_dat %>% ggplot(aes(streamflow, y)) +
  geom_point() +
  labs(x = bquote("Streamflow, " ~ ft^3/s), y = "y")
```
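What makes this work is that bquote() returns an unevaluated R expression (a call), not a character string; ggplot2 hands such expressions to plotmath, which draws ft^3 with a raised 3. bquote() can also splice in values computed at run time with .(). In the sketch below, `power` is a hypothetical variable standing in for something you might look up rather than type:

```r
# bquote() with no .() just returns the expression as written.
lab <- bquote("Streamflow, " ~ ft^3/s)
is.call(lab)
#> [1] TRUE

# .() substitutes the current value of `power` into the expression,
# so this builds the same call as the hand-written version above.
power <- 3
lab2 <- bquote("Streamflow, " ~ ft^.(power)/s)
identical(lab, lab2)
#> [1] TRUE
```

Either `lab` or `lab2` can be passed as labs(x = ...). The catch is that the expression's shape (which part is superscripted, where the ~ goes) is still written by hand for each kind of unit.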
Thanks! I can use bquote() when I know the equation. The issue is that I have a web service that spits out THOUSANDS of parameters with these HTML codes (maybe cubic meters, or degrees, or who-knows-what). I'm trying to write a function (or use one that already exists) to make the labels pretty without writing them by hand.
I have a functional one now:
```r
unescape_html <- function(str){
  # pull out the "&#NNN;" code(s) in the string
  fancy_chars <- regmatches(str, gregexpr("&#\\d{3};", str))
  # let xml2's HTML parser decode them
  unescaped <- xml2::xml_text(xml2::read_html(paste0("<x>", fancy_chars, "</x>")))
  # put the decoded character back into the original string
  fancy_chars <- gsub(pattern = "&#\\d{3};",
                      replacement = unescaped, x = str)
  # drop the stray "Â"
  fancy_chars <- gsub("Â", "", fancy_chars)
  return(fancy_chars)
}
unescape_html("Streamflow, ft&#179;/s")
#> [1] "Streamflow, ft³/s"
```
I'm just not confident how robust it is.
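One way to avoid both the xml2 round trip and the Â workaround is to decode the numeric codes directly in base R. This is a minimal sketch, assuming the labels only use numeric character references (decimal like &#179; or hex like &#xB3;); decode_numeric_entities is a hypothetical name, and named entities such as &deg; are passed through unchanged. Unlike the version above, every match gets its own replacement, so a label containing several different codes is handled:

```r
# Hypothetical helper: replaces numeric character references ("&#179;",
# "&#xB3;") with the characters they encode. Named entities are left as-is.
decode_numeric_entities <- function(x) {
  pattern <- "&#([0-9]+|[xX][0-9a-fA-F]+);"
  matches <- gregexpr(pattern, x, perl = TRUE)
  regmatches(x, matches) <- lapply(regmatches(x, matches), function(ents) {
    # Drop the leading "&#" and the trailing ";" to keep just the number.
    codes <- substr(ents, 3, nchar(ents) - 1)
    is_hex <- startsWith(codes, "x") | startsWith(codes, "X")
    num <- integer(length(codes))
    num[is_hex] <- strtoi(substring(codes[is_hex], 2), base = 16L)
    num[!is_hex] <- strtoi(codes[!is_hex], base = 10L)
    # One UTF-8 character per code point, one per match.
    intToUtf8(num, multiple = TRUE)
  })
  x
}

decode_numeric_entities(c("Streamflow, ft&#179;/s", "Temperature, &#xB0;C"))
```

The result is an ordinary UTF-8 character vector, so each element can go straight into labs(x = ...) with no bquote() needed, as long as the graphics device's font has the glyph.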
How about letting pandoc (via rmarkdown) do the decoding:
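```r
html_to_unicode <- function(x) {
  tmp <- tempfile(fileext = ".html")
  tmp_out <- tempfile(fileext = ".md")
  # clean up both temp files; a second on.exit() without add = TRUE
  # would replace the first, leaving tmp behind
  on.exit(unlink(c(tmp, tmp_out)), add = TRUE)
  # write the string as HTML and let pandoc convert it to Markdown,
  # decoding the character references along the way
  write(x, tmp)
  rmarkdown::pandoc_convert(tmp, output = tmp_out)
  # pandoc writes UTF-8, so mark the lines read back accordingly
  readLines(tmp_out, encoding = "UTF-8")
}
ugly_string <- "Streamflow, ft&#179;/s"
html_to_unicode(ugly_string)
#> [1] "Streamflow, ft³/s"
```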
Created on 2018-04-11 by the reprex package (v0.2.0).
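The pandoc route decodes the whole string at once, named entities included. If pandoc isn't available, a similar whole-string decode can be sketched with xml2, parsing each label separately so several different entities in one label are all handled. decode_html_labels is a hypothetical name, and the sketch assumes labels contain no literal "<" characters (those would be parsed as markup):

```r
library(xml2)

# Hypothetical helper: decodes every HTML entity (numeric like "&#179;"
# or named like "&deg;") in each label using libxml2's HTML parser.
decode_html_labels <- function(x) {
  vapply(x, function(s) {
    # Wrap the label in a throwaway element so it parses as an HTML fragment.
    doc <- read_html(paste0("<x>", s, "</x>"))
    # xml_text() returns the decoded text as UTF-8.
    xml_text(doc)
  }, character(1), USE.NAMES = FALSE)
}

decode_html_labels(c("Streamflow, ft&#179;/s", "Temperature, &deg;C"))
```

Parsing one document per label is slower than a regex, but for a few thousand axis labels it should still be quick, and it covers named entities that a numeric-only decoder would skip.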