OK, so I was able to scrape a data frame for you that has the Unicode code points and their UTF-8 encoding in binary. I'm only showing you a subset because the first several entries are <control> characters and blanks.
Because string encoding is, well, unpredictably weird, your results may vary, and you might want a different set of characters, but the method I used should work for the other combinations available on the site:
https://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=bin
library(tidyverse)
library(janitor)
library(rvest)
#> Loading required package: xml2
#>
#> Attaching package: 'rvest'
#> The following object is masked from 'package:purrr':
#>
#> pluck
#> The following object is masked from 'package:readr':
#>
#> guess_encoding
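url <- "https://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=bin"

# Read the page and pull out the code table; html_table() returns a list of data frames
utf8_enc <- url %>%
  read_html() %>%
  html_nodes(css = 'body > table.codetable') %>%
  html_table()

# Take the first table and tidy up its column names
utf8_enc_tab <- utf8_enc[[1]] %>%
  janitor::clean_names()

# Skip the control characters and blanks at the top of the table
utf8_enc_tab %>%
  slice(70:80)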
#>    unicodecode_point character utf_8_bin                   name
#>  1            U+0045         E  01000101 LATIN CAPITAL LETTER E
#>  2            U+0046         F  01000110 LATIN CAPITAL LETTER F
#>  3            U+0047         G  01000111 LATIN CAPITAL LETTER G
#>  4            U+0048         H  01001000 LATIN CAPITAL LETTER H
#>  5            U+0049         I  01001001 LATIN CAPITAL LETTER I
#>  6            U+004A         J  01001010 LATIN CAPITAL LETTER J
#>  7            U+004B         K  01001011 LATIN CAPITAL LETTER K
#>  8            U+004C         L  01001100 LATIN CAPITAL LETTER L
#>  9            U+004D         M  01001101 LATIN CAPITAL LETTER M
#> 10            U+004E         N  01001110 LATIN CAPITAL LETTER N
#> 11            U+004F         O  01001111 LATIN CAPITAL LETTER O
Created on 2019-02-28 by the reprex package (v0.2.1)
I did write it out to a csv, but I suggest you do the scraping on your own machine, since these things vary from OS to OS.
You can then use the scraped table as a lookup:
utf8_enc_tab <- utf8_enc[[1]]
utf8_enc_tab <- as_tibble(utf8_enc_tab) %>%
  janitor::clean_names()

x <- "abc"
# Split the string into single characters
characters <- strsplit(x, "")[[1]]
char_frame <- tibble(chars = characters)

# Add each character's bits via pryr, then join on the scraped table to compare
char_frame <- char_frame %>%
  mutate(bits = pryr::bits(chars)) %>%
  left_join(utf8_enc_tab, by = c("chars" = "character"))

char_frame
#> # A tibble: 3 x 5
#>   chars bits     unicodecode_point utf_8_bin name
#>   <chr> <chr>    <chr>             <chr>     <chr>
#> 1 a     01100001 U+0061            01100001  LATIN SMALL LETTER A
#> 2 b     01100010 U+0062            01100010  LATIN SMALL LETTER B
#> 3 c     01100011 U+0063            01100011  LATIN SMALL LETTER C
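If you only need the bits and not the character names, you may be able to skip the scraping and compute the UTF-8 bytes locally. Here is a minimal base R sketch of that idea; `utf8_bits()` and `string_bits()` are names I made up for illustration (they are not from any package), and I'm assuming you want multi-byte characters shown as space-separated bytes.

```r
# Sketch: UTF-8 bits for each character of a string, using base R only.
# `utf8_bits` and `string_bits` are illustrative helper names, not package functions.

# Bits of a single character, one 8-bit group per UTF-8 byte
utf8_bits <- function(ch) {
  bytes <- charToRaw(enc2utf8(ch))
  per_byte <- vapply(
    bytes,
    # rawToBits() returns the least significant bit first, so reverse it
    function(b) paste(rev(as.integer(rawToBits(b))), collapse = ""),
    character(1)
  )
  paste(per_byte, collapse = " ")
}

# One row per character of the input string
string_bits <- function(x) {
  chars <- strsplit(enc2utf8(x), "")[[1]]
  data.frame(
    chars = chars,
    bits = vapply(chars, utf8_bits, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

string_bits("abc")
```

For plain ASCII input like "abc" the `bits` column should agree with `utf_8_bin` in the scraped table, so the join above is a handy cross-check. The scraped table is still what you want if you need the `name` column.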