I have a personal chat exported from Telegram that I would like to get into a tidy format with three columns: name, timestamp, and message text. Optionally, it would be useful to have a separate tag for whether each entry is a text message, reply, forwarded message, or sticker. I tried following the guide on this page to no avail, perhaps because the structure of the JSON export has changed.
Here is the head of the current structure:
{
"name": "Grace 🧤",
"type": "personal_chat",
"id": 2730825451,
"messages": [
{
"id": 1980499,
"type": "message",
"date": "2020-01-01T00:00:02",
"from": "Henry",
"from_id": 4325636679,
"text": "It's 2020..."
},
{
"id": 1980500,
"type": "message",
"date": "2020-01-01T00:00:04",
"from": "Henry",
"from_id": 4325636679,
"text": "Fireworks!"
},
{
"id": 1980501,
"type": "message",
"date": "2020-01-01T00:00:05",
"from": "Grace 🧤 🍒",
"from_id": 4720225552,
"text": "You're a minute late!"
},
{
> str(tele.json)
List of 4
$ name : chr "Grace <U+0001F9E4>"
$ type : chr "personal_chat"
$ id : num 4.72e+09
$ messages:List of 312397
..$ :List of 6
.. ..$ id : num 1980499
.. ..$ type : chr "message"
.. ..$ date : chr "2020-01-01T00:00:02"
.. ..$ from : chr "Henry"
.. ..$ from_id: num 4.33e+09
.. ..$ text : chr "It's 2020.."
..$ :List of 6
.. ..$ id : num 1980500
.. ..$ type : chr "message"
.. ..$ date : chr "2020-01-01T00:00:04"
.. ..$ from : chr "Henry"
.. ..$ from_id: num 4.33e+09
.. ..$ text : chr "Fireworks!"
I tried importing it as such:
library(rjson)
tele.json <- fromJSON(file = "twentytwenty.json")
# Replicating the example on the website given, I get NULL
rlist::list.filter(tele.json[["chats"]][["list"]],
.[["name"]] == "Henry")
NULL
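A minimal sketch of why that lookup comes back NULL, assuming the parsed object has the shape shown by str() above: the top level only has name, type, id and messages, so there is no "chats" element to descend into, and indexing a missing element silently returns NULL. The small tele_demo list below is a hand-built stand-in for the real export, not the real data.

```r
# Stand-in for the parsed export, mirroring the str() output above.
tele_demo <- list(
  name = "Grace",
  type = "personal_chat",
  id = 2730825451,
  messages = list(
    list(id = 1980499, type = "message", date = "2020-01-01T00:00:02",
         from = "Henry", from_id = 4325636679, text = "It's 2020..."),
    list(id = 1980500, type = "message", date = "2020-01-01T00:00:04",
         from = "Henry", from_id = 4325636679, text = "Fireworks!")
  )
)

# The top-level names show there is no "chats" element in this structure.
top_level <- names(tele_demo)

# Indexing a missing element returns NULL, and NULL[["list"]] is NULL too.
missing_path <- tele_demo[["chats"]][["list"]]
is.null(missing_path)

# The messages sit directly under "messages"; filter on each message's "from".
henry_msgs <- Filter(function(m) identical(m[["from"]], "Henry"),
                     tele_demo[["messages"]])
length(henry_msgs)
```

With the real object, the same Filter() call on tele.json[["messages"]] should return Henry's messages, provided the sender name matches exactly.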
Each message looks something like this:
$messages[[974]]
$messages[[974]]$id
[1] 1981527
$messages[[974]]$type
[1] "message"
$messages[[974]]$date
[1] "2020-01-01T21:39:51"
$messages[[974]]$from
[1] "Henry"
$messages[[974]]$from_id
[1] 4325636679
$messages[[974]]$text
[1] "The quick brown fox jumped over a lazy dog"
# Prints till [[1000]] then truncates
[ reached getOption("max.print") -- omitted 311397 entries ]
Replies look like this (I show the str output, as I was unable to find an example in the print output):
..$ :List of 7
.. ..$ id : num 1980589
.. ..$ type : chr "message"
.. ..$ date : chr "2020-01-01T00:13:43"
.. ..$ from : chr "Grace <U+0001F9E4> <U+0001F352>"
.. ..$ from_id : num 4.72e+09
.. ..$ reply_to_message_id: num 1980585
.. ..$ text : chr "I like trains~"
I presume forwarded messages would be different too, but I was unable to locate an example.
Stickers look like this, which I would like to label so I can count or remove them:
$messages[[969]]
$messages[[969]]$id
[1] 1981522
$messages[[969]]$type
[1] "message"
$messages[[969]]$date
[1] "2020-01-01T21:39:24"
$messages[[969]]$from
[1] "Grace \U0001f9e4 \U0001f352"
$messages[[969]]$from_id
[1] 4720225552
$messages[[969]]$file
[1] "(File not included. Change data exporting settings to download.)"
$messages[[969]]$thumbnail
[1] "(File not included. Change data exporting settings to download.)"
$messages[[969]]$media_type
[1] "sticker"
$messages[[969]]$sticker_emoji
[1] "\U0001f60d"
$messages[[969]]$width
[1] 512
$messages[[969]]$height
[1] 512
$messages[[969]]$text
[1] ""
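One possible way to get from this list to the tidy table described at the top, sketched in base R. The helper names (flatten_text, classify_message, messages_to_df) are made up for illustration. Two things are assumptions rather than facts from the excerpts above: that forwarded messages carry a forwarded_from field, and that text may occasionally be a list of pieces rather than a single string. The timestamps carry no time zone, so parsing them as UTC is also just a convenient assumption.

```r
# Hypothetical helper: collapse a message's text field into one string.
# Assumption: text is usually a single string, but may be a list mixing plain
# strings and sub-lists that carry their own "text" element.
flatten_text <- function(txt) {
  if (is.null(txt)) return("")
  if (is.character(txt) && length(txt) == 1) return(txt)
  parts <- vapply(txt, function(p) {
    piece <- if (is.list(p)) p[["text"]] else p
    if (is.null(piece)) "" else as.character(piece)[1]
  }, character(1))
  paste(parts, collapse = "")
}

# Hypothetical helper: tag each message by the extra fields it carries.
classify_message <- function(m) {
  if (identical(m[["media_type"]], "sticker")) return("sticker")
  if (!is.null(m[["forwarded_from"]])) return("forwarded")  # assumed field name
  if (!is.null(m[["reply_to_message_id"]])) return("reply")
  "text"
}

# Hypothetical helper: build a data frame with one row per message.
messages_to_df <- function(messages) {
  data.frame(
    name = vapply(messages, function(m) {
      if (is.null(m[["from"]])) NA_character_ else as.character(m[["from"]])
    }, character(1)),
    timestamp = as.POSIXct(
      vapply(messages, function(m) m[["date"]], character(1)),
      format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"  # time zone is an assumption
    ),
    text = vapply(messages, function(m) flatten_text(m[["text"]]), character(1)),
    kind = vapply(messages, classify_message, character(1)),
    stringsAsFactors = FALSE
  )
}

# Three messages copied from the excerpts in this post (sender emojis dropped).
demo_messages <- list(
  list(id = 1980500, type = "message", date = "2020-01-01T00:00:04",
       from = "Henry", from_id = 4325636679, text = "Fireworks!"),
  list(id = 1980589, type = "message", date = "2020-01-01T00:13:43",
       from = "Grace", from_id = 4720225552,
       reply_to_message_id = 1980585, text = "I like trains~"),
  list(id = 1981522, type = "message", date = "2020-01-01T21:39:24",
       from = "Grace", from_id = 4720225552,
       media_type = "sticker", text = "")
)

chat_df <- messages_to_df(demo_messages)
table(chat_df$kind)
```

On the real data this would be called as messages_to_df(tele.json[["messages"]]); stickers can then be counted or dropped by filtering on the kind column.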
Changes in the emojis attached to the saved name also seem like they might be an issue, though I suppose it would be possible to filter by using str_detect on the name in a case-insensitive fashion.
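A rough sketch of two ways to collapse the name variants, using base R's grepl() in place of str_detect so that no package is needed. The senders table is a small made-up example (the first "Grace" row stands for a hypothetical earlier display name). The second option relies on the assumption that from_id stays the same when the display name changes, which is what the excerpts above suggest but which I have not verified for the whole export.

```r
# Made-up example of sender names as they might appear across the export.
senders <- data.frame(
  name = c("Henry", "Grace \U0001f9e4", "Grace \U0001f9e4 \U0001f352"),
  from_id = c(4325636679, 4720225552, 4720225552),
  stringsAsFactors = FALSE
)

# Option 1: case-insensitive pattern match on the display name.
senders$person <- ifelse(
  grepl("grace", senders$name, ignore.case = TRUE), "Grace",
  ifelse(grepl("henry", senders$name, ignore.case = TRUE), "Henry", NA_character_)
)

# Option 2: look the person up by from_id (assumed stable across renames).
# sprintf("%.0f", ...) avoids scientific notation when building the key.
id_lookup <- c("4325636679" = "Henry", "4720225552" = "Grace")
senders$person_by_id <- unname(id_lookup[sprintf("%.0f", senders$from_id)])

senders[, c("person", "person_by_id")]
```

The id-based lookup avoids false matches if another contact's name happens to contain the same substring.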
Appreciate any help!