Hello! Yesterday I asked a question, which someone very kindly answered for me. The problem is that their answer doesn't work on my machine, and neither of us understands why.
Problem: Take some foreign-language text and output a list of its words with a count of how often each one occurs. Here is the R code:
library(tidyverse)

# Sample Russian text held in a one-column tibble
a <- tibble(
  text = "Привет, друзья! Меня зовут Макс и добро пожаловать на мой подкаст!
Да, наконец-то, наконец-то я запустил, я сделал свой подкаст!
Ухуууу! И я очень, очень, очень рад этому!"
)

# Split on single spaces, lower-case, strip non-alphanumeric characters, then count
a <- sapply(a, function(x) strsplit(x, split = " ")) %>%
  unlist() %>%
  tolower() %>%
  as_tibble() %>%
  mutate(value = str_replace_all(value, "[^[:alnum:]]", "")) %>%
  count(value)

a
The person who wrote it gets a tidy list of the Russian words in alphabetical order, each with a count. But when I run the same code, this is what I see in the console:
> library(tidyverse)
>
> a <- tibble(
+ text = "Привет, друзья! Меня зовут Макс и добро пожаловать на мой подкаст!
+ Да, наконец-то, наконец-то я запустил, я сделал свой подкаст!
+ Ухуууу! И я очень, очень, очень рад этому!"
+ )
>
>
>
> a <- sapply(a, function(x) strsplit(x, split = " ")) %>%
+ unlist() %>%
+ tolower() %>%
+ as_tibble() %>%
+ mutate(value = str_replace_all(value, "[^[:alnum:]]", "")) %>%
+ count(value)
>
> a
# A tibble: 22 x 2
value n
<chr> <int>
1 "" 6
2 "<U+0434><U+0430>" 1
3 "<U+0434><U+043E><U+0431><U+0440><U+043E>" 1
4 "<U+0434><U+0440><U+0443><U+0437><U+044C><U+044F>" 1
5 "<U+0437><U+0430><U+043F><U+0443><U+0441><U+0442><U+0438><U+043B>" 1
6 "<U+0437><U+043E><U+0432><U+0443><U+0442>" 1
7 "<U+0438>" 2
8 "<U+043C><U+0430><U+043A><U+0441>" 1
9 "<U+043C><U+0435><U+043D><U+044F>" 1
10 "<U+043C><U+043E><U+0439>" 1
# ... with 12 more rows
I'm assuming it's something in my settings, since the code works fine for them but not for me. Any ideas what I need to change in RStudio, or maybe on my computer? Many thanks.
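
In case it helps with diagnosing this, here is a minimal sketch of checks I can run to report my setup. It rests on an assumption that may be wrong: that the `<U+....>` output comes from my locale or encoding settings rather than from the code itself. The variable names `x` and `info` are just placeholders, and the test word is written with `\u` escapes so that it does not depend on how the script file itself is encoded.

```r
# R version and platform; encoding behaviour can differ between versions and OSes
R.version.string
.Platform$OS.type

# Current locale and what R reports about it (assumption: a non-UTF-8 locale is the suspect)
Sys.getlocale()
info <- l10n_info()
info

# A known Cyrillic word built from Unicode escapes, independent of file encoding
x <- "\u043f\u0440\u0438\u0432\u0435\u0442"

Encoding(x)    # how R has marked the string's encoding
nchar(x)       # number of characters R sees in the string
utf8ToInt(x)   # the underlying Unicode code points
print(x)       # how the console renders it via print()
writeLines(x)  # how the console renders it when written directly
```

If `nchar(x)` and `utf8ToInt(x)` look right but `print(x)` still shows `<U+....>` codes, that would suggest the data is intact and only the display is affected. I can post the output of all of these if that is useful.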