We have gathered historical tweets regarding public organisations and want to run a sentiment analysis on them. The file type is ".jsonl", meaning that every line is a separate JSON object (see screenshot). We need to convert the information to a normal dataframe to start working on the data. Can somebody please explain which code we need? I can't find clear answers on the internet on how to deal with JSON lines.
We don't really have enough info to help you out. Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.
I can get you this far, but no further... there's an issue: each JSON object doesn't necessarily have the same dimensions as the others, so making one single table to hold them all is fraught. My solution works as far as giving you a list of tables, one for each of the 400 JSON lines.
library(tidyverse)
library(jsonlite)
library(purrr)

fileName <- "twitter_premium.jsonl"
# read every line of the file into a character vector
tpj <- readLines(fileName)
# wrap each line in brackets so fromJSON parses it as a one-element array
tpj2 <- paste0("[", tpj, "]")
# parse each line into its own tibble -- this works
list_of_json <- map(tpj2, ~ as_tibble(jsonlite::fromJSON(.)))
# binding them into one table errors because the structures are inconsistent
try_one_table <- map_dfr(tpj2, ~ as_tibble(jsonlite::fromJSON(.)))
#Error: Argument 13 can't be a list containing data frames
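That said, if you just want the whole file in one flat data frame, jsonlite can read newline-delimited JSON directly, which sidesteps the line-by-line parsing. A minimal sketch (assuming the file is named `twitter_premium.jsonl` as above; I haven't run this against your data, so column names are illustrative):

```r
library(jsonlite)

# stream_in() reads NDJSON (one JSON object per line) straight into a
# data frame, padding missing fields with NA across records
tweets <- stream_in(file("twitter_premium.jsonl"))

# flatten() unnests nested objects into dotted column names,
# e.g. user.screen_name; fields that are genuinely ragged
# (lists of hashtags, etc.) remain as list-columns
tweets_flat <- flatten(tweets)

tibble::as_tibble(tweets_flat)
```

List-columns are fine for sentiment work: you can pull the text column out directly, and `tidyr::unnest()` can expand the ragged ones later if you need them.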