Hi, I'm trying to do some analysis on my Facebook Messages data, but I'm running into some encoding issues. Facebook lets you download your data as .json files. I want to analyze some of the emojis and text used, but I'm having difficulty converting them. For example, in the raw .json file, I see:
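The exact strings vary from message to message, but here's the pattern, recreated with a made-up example since I can't share my real messages:

```r
library(jsonlite)

# made-up example reproducing what I see: four \u00XX escapes
# in the raw file where a single emoji should be
x <- fromJSON('["\\u00f0\\u009f\\u0098\\u008a"]')
print(x)  # "ð\u009f\u0098\u008a" -- garbled characters, not an emoji
```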
From what I read in a similar post, the file is technically valid JSON, since \uXXXX escapes are allowed in strings. The problem is that Facebook writes each raw UTF-8 byte as its own \u00XX escape, so a parser decodes them as individual Latin-1 characters instead of the intended emoji. That's why it's not loading properly.
Maybe you can run a find-and-replace over those escape sequences in the files to turn them back into proper characters before loading.
Thanks for the link. Since I don't control how these .json files are created, and there are many of them, I guess I'd need to write a bash script to loop through them all and do the find/replace? Or is there a way I can do this processing within R?
You could do this in R (see the sketch below), but I think the easier solution might be to change the export itself. How did you download it? Is there any option to choose a different format or encoding when downloading?
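If the download options don't pan out, here's a minimal sketch of the R route; the folder name is a guess, so point it at wherever your export lives:

```r
# loop over every file in the export and apply whatever
# find/replace or re-encoding fix you settle on
files <- list.files("messages", pattern = "\\.json$",
                    recursive = TRUE, full.names = TRUE)
for (f in files) {
  txt <- readLines(f, warn = FALSE, encoding = "UTF-8")
  # ... fix txt here ...
  writeLines(txt, f, useBytes = TRUE)
}
```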
It looks like Facebook gives you the option of JSON or HTML for an export, but that's about it. I'm surprised a company like that would produce such badly encoded JSON. I can see what the HTML files look like, but I figured JSON would be easier to parse.
I found this, which might be related, but I'm not sure how I can solve it in R.
Hi PJ,
Thanks so much for the example. Since that's not the only messy Unicode in the files, I'll just work through them and replace the broken sequences as I encounter them.
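For the record, my stopgap looks something like this; the pattern/replacement pair is just the first one I've worked out, and `content` stands in for wherever your message text ends up:

```r
# swap each garbled sequence for the emoji it should have been;
# "\u00f0\u009f\u0098\u008a" is how U+1F60A (smiley) comes out of the parser
content <- gsub("\u00f0\u009f\u0098\u008a", "\U0001F60A", content, fixed = TRUE)
```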
I also found this thread on SO, so it looks like a known issue. Maybe I can use some of the code there via reticulate and fix this systematically.
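If I get that working, I'd expect it to look roughly like this. The latin1-to-utf8 round-trip is the idea from the SO answers; the wrapper function, the recursive walker, and the file path are my own guesses:

```r
library(reticulate)

# parse the JSON in Python, then round-trip every string through
# latin1 -> utf8 to turn the escaped bytes back into real characters
py_run_string("
import json

def load_fixed(path):
    with open(path, 'rb') as f:
        obj = json.load(f)
    def fix(o):
        if isinstance(o, str):
            return o.encode('latin1').decode('utf8')
        if isinstance(o, list):
            return [fix(v) for v in o]
        if isinstance(o, dict):
            return {k: fix(v) for k, v in o.items()}
        return o
    return fix(obj)
")

# hypothetical path -- point this at one of the export files
msgs <- py$load_fixed("messages/inbox/somechat/message_1.json")
```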