how to parse a json file from a txt

gfuse · January 5, 2020, 11:05am

How can I parse a json file on R Studio if there are more json in a string?
Thanks

andresrcs · January 5, 2020, 2:36pm

Hi, welcome!

We don't really have enough info to help you out. Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

gfuse · January 7, 2020, 2:50pm

Hi, thanks for your help!
I need to parse two columns in which I have: id (with only one value) and parsed_parameters (which has more json fields inside).

I send you an example:

My dataset is in csv format, but I have tried also to transform it in a txt file or in json file and I tried to import it in R but I cannot parse the two columns.
I would like to obtain each parameter in one column, and If an ID does not have a parameter I need NA

Can you help me please?

I have tried this R code but it does not work:
setwd("D:/")
df <- read.delim("status_changes_1.txt", header = TRUE, na.strings = -1)
install.packages("jsonlite")
library(jsonlite)
filepath<-"status_changes_1.txt"
prova <- fromJSON(filepath)

andresrcs · January 7, 2020, 4:37pm

A screenshot is not very useful, please follow the guide I gave you to learn how to share sample data on a copy/paste friendly format(or at least share a link to a sample file), that would make things a lot easier for people trying to help you.

gfuse · January 7, 2020, 4:46pm

Ok I try.
This is what I obtain from the command head(df)

head(df)
id.parsed_parameters
1
b39;"Pr"=>"342", "Err"=>"0", "DryT"=>"0", "Lang"=>"7", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "Opt8"=>"0"
2 c52;"Pr"=>"675", "Err"=>"255", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "PrPh"=>"4", "Temp"=>"255", "DelVal"=>"255",
3
130d;"Pr"=>"999", "Err"=>"1", "DryT"=>"0", "Lang"=>"201", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "Opt8"=>"0", "Opt9"=>"1", "PrPh"=>"0", "Temp"=>"40", "FillR"=>"0", "Steam"=>"1"

Thanks

andresrcs · January 7, 2020, 5:13pm

That is still not copy/paste friendly (you need to read the guide more carefully) and also shows that you haven't correctly imported the data into R, a link to a sample file would be useful to help you.

gfuse · January 8, 2020, 9:11am

I cannot understand you, I am sorry. If you give me your email I send you the file.
The question for you is: with a json string, how can I parse it?
I cannot send you more than I have already sent. This is the problem.

Thanks
Giada

andresrcs · January 8, 2020, 2:00pm

What part of this reprex guide is confusing to you?

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

It depends on the structure of the JSON file, that is why I'm asking for sample data.

gfuse · January 9, 2020, 7:59am

Ok I think now the reprex is ready. Here below:

setwd("D:/")
df <- read.delim("status_changes_1.txt", header = TRUE, na.strings = -1)
str(df)
names(df)
head(df)

install.packages("datapasta")
datapasta::df_paste(head(df,5)[,c('appliance_id.parsed_parameters')]

df_1 <- data.frame(stringsAsFactors = FALSE,
appliance_id = c('b39c5f0c-a614-47bb-a43c-87e312d211ef'),
parsed_parameters = c("Pr"=>"8", "Err"=>"0", "DryT"=>"0", "Lang"=>"7", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "Opt8"=>"0", "Opt9"=>"0", "PrPh"=>"0", "Temp"=>"0", "FillR"=>"0", "Steam"=>"0", "DelVal"=>"0", "MachMd"=>"2", "PrCode"=>"129", "SLevel"=>"0", "SpinSp"=>"16", "RemTime"=>"600", "RecipeId"=>"0", "DisTestOn"=>"0", "DisTestRes"=>"0", "WiFiStatus"=>"0", "CheckUpState"=>"0")
)

install.packages("jsonlite")
library(jsonlite)
filepath<-"status_changes_1.txt"
prova <- fromJSON(df)

I think the problem is that I have more json strings in one string.

Thanks again
Giada

andresrcs · January 9, 2020, 2:01pm

Sadly, no, I don't have access to "status_changes_1.txt" so I can't run your code, reproduce your issue and try to give you a solution, as I said before it would be much easier for anyone willing to help you if you upload a sample file to a cloud storage service (like dropbox, google drive, box, etc.) and share a link.

I can only try to give you a solution based on the incorrectly imported data you are showing.

library(tidyverse)

df_1 <- data.frame(stringsAsFactors = FALSE,
                   appliance_id = c('b39c5f0c-a614-47bb-a43c-87e312d211ef'),
                   parsed_parameters = '"Pr"=>"8", "Err"=>"0", "DryT"=>"0", "Lang"=>"7", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "Opt8"=>"0", "Opt9"=>"0", "PrPh"=>"0", "Temp"=>"0", "FillR"=>"0", "Steam"=>"0", "DelVal"=>"0", "MachMd"=>"2", "PrCode"=>"129", "SLevel"=>"0", "SpinSp"=>"16", "RemTime"=>"600", "RecipeId"=>"0", "DisTestOn"=>"0", "DisTestRes"=>"0", "WiFiStatus"=>"0", "CheckUpState"=>"0"'
)

df_1 %>%
    separate_rows(parsed_parameters, sep = ",") %>% 
    separate(parsed_parameters, c("parameter", "value"), sep = "=>") %>% 
    mutate_if(is.character, ~str_remove_all(., "\\\"")) %>% 
    mutate(value = as.numeric(value)) %>% 
    spread(parameter, value) %>% 
    as_tibble() # This is just for friendly console printing
#> # A tibble: 1 x 29
#>   appliance_id ` CheckUpState` ` DelVal` ` DisTestOn` ` DisTestRes` ` DryT`
#>   <chr>                  <dbl>     <dbl>        <dbl>         <dbl>   <dbl>
#> 1 b39c5f0c-a6…               0         0            0             0       0
#> # … with 23 more variables: ` Err` <dbl>, ` FillR` <dbl>, ` Lang` <dbl>, `
#> #   MachMd` <dbl>, ` Opt1` <dbl>, ` Opt2` <dbl>, ` Opt3` <dbl>, ` Opt4` <dbl>,
#> #   ` Opt5` <dbl>, ` Opt6` <dbl>, ` Opt7` <dbl>, ` Opt8` <dbl>, ` Opt9` <dbl>,
#> #   ` PrCode` <dbl>, ` PrPh` <dbl>, ` RecipeId` <dbl>, ` RemTime` <dbl>, `
#> #   SLevel` <dbl>, ` SpinSp` <dbl>, ` Steam` <dbl>, ` Temp` <dbl>, `
#> #   WiFiStatus` <dbl>, Pr <dbl>

^{Created on 2020-01-09 by the reprex package (v0.3.0.9000)}

gfuse · January 9, 2020, 3:13pm

Ok thanks.
here the link: https://www.dropbox.com/s/n8iqqzf6yek7g8t/parameters.csv?dl=0

many thanks!!
Giada

andresrcs · January 9, 2020, 6:36pm

Well, that file is not on a JSON format, it seems like you have already processed it and put the result in a .csv file, so I'm going to keep proposing the same solution.

library(tidyverse)

sample_data <- read_delim("https://www.dropbox.com/s/n8iqqzf6yek7g8t/parameters.csv?dl=1", delim = ";", quote = "\"")

sample_data %>%
    separate_rows(parameters, sep = ",") %>% 
    separate(parameters, c("parameter", "value"), sep = "=>") %>%
    drop_na() %>% 
    mutate_if(is.character, ~str_remove_all(., "[\\\"\\s]")) %>% 
    mutate(value = as.numeric(value)) %>% 
    pivot_wider(id_cols = id, names_from = parameter, values_from = value)
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 664 rows [691,
#> 732, 802, 1582, 1807, 1947, 2189, 2812, 3966, 4256, 4460, 4895, 5991, 6227,
#> 6596, 6654, 6867, 7192, 7489, 7775, ...].
#> Warning: NAs introducidos por coerción
#> # A tibble: 9,993 x 286
#>       id    Pr   Err  DryT  Lang  Opt1  Opt2  Opt3  Opt4  Opt5  Opt6  Opt7  Opt8
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     1     0     0     0     7     0     0     0     0     0     0     0     0
#>  2     2    55   255    NA    NA     0     0     0     0     0     0     0    NA
#>  3     3    12     0     0    21     0     0     0     0     0     0     0     0
#>  4     4     1     0     0    20     0     0     0     0     0     0     0     0
#>  5     5    90   255     0    NA     0     0     0     0     0     0     0     0
#>  6     6     3   255     0    NA     0     0     0     0     0     0     0     0
#>  7     7   234   255     0    NA     0     0     0     0     0     0     0     0
#>  8     8    16     0     0     0     0     0     0     0     0     0     0     0
#>  9     9    16     0     0     1     0     0     0     0     0     0     0     0
#> 10    10     3   255    NA    NA     0     0     0     0     0     0     0    NA
#> # … with 9,983 more rows, and 273 more variables: Opt9 <dbl>, PrPh <dbl>,
#> #   Temp <dbl>, FillR <dbl>, Steam <dbl>, DelVal <dbl>, MachMd <dbl>,
#> #   PrCode <dbl>, SLevel <dbl>, SpinSp <dbl>, RemTime <dbl>, RecipeId <dbl>,
#> #   DisTestOn <dbl>, DisTestRes <dbl>, WiFiStatus <dbl>, CheckUpState <dbl>,
#> #   Eco <dbl>, Reset <dbl>, OpzProg <dbl>, Program <dbl>, ExtraDry <dbl>,
#> #   MissSalt <dbl>, OpenDoor <dbl>, TreinUno <dbl>, MissRinse <dbl>,
#> #   StartStop <dbl>, StatoWiFi <dbl>, DelayStart <dbl>, MetaCarico <dbl>,
#> #   StatoDWash <dbl>, CodiceErrore <dbl>, OpenDoorOpt <dbl>, Det <dbl>,
#> #   T0R <dbl>, T0W <dbl>, TIW <dbl>, NtcD <dbl>, NtcW <dbl>, PF <dbl>,
#> #   Cmp <dbl>, Fan <dbl>, Ice <dbl>, Lck <dbl>, Spr <dbl>, Door <dbl>,
#> #   Heat <dbl>, Wifi <dbl>, CmpFq <dbl>, Error <dbl>, FrSet <dbl>, FzSet <dbl>,
#> #   FrHiTmp <dbl>, FzHiTmp <dbl>, IncrDoorCnt <dbl>, Soft <dbl>, motS <dbl>,
#> #   numF <dbl>, rED0 <dbl>, unbC <dbl>, unbF <dbl>, APSoff <dbl>, chartL <dbl>,
#> #   APSfreq <dbl>, DPrgCnt <dbl>, DetPreW <dbl>, DetWarn <dbl>,
#> #   forwarded <dbl>, SPrgCnt <dbl>, SoftPreW <dbl>, SoftWarn <dbl>,
#> #   WaterHard <dbl>, BMO <dbl>, DBN <dbl>, MMS <dbl>, RMO <dbl>, BaSt <dbl>,
#> #   DOFU <dbl>, MOLW <dbl>, AMin1 <dbl>, AMin2 <dbl>, AMin3 <dbl>, AMin4 <dbl>,
#> #   AMin5 <dbl>, AMin6 <dbl>, AMin7 <dbl>, AMin8 <dbl>, AgiSt <dbl>,
#> #   FiErr <dbl>, TMIHM <dbl>, Tarea <dbl>, Tdays <dbl>, LUTime <dbl>,
#> #   LockSt <dbl>, AvgSpeed <dbl>, lock <dbl>, Z1hot <dbl>, Z1low <dbl>,
#> #   Z1pan <dbl>, Z2hot <dbl>, Z2low <dbl>, …

^{Created on 2020-01-09 by the reprex package (v0.3.0.9000)}

gfuse · January 10, 2020, 8:02am

great! many thanks for your help
But if at the end of the script I write View(sample_data) I see again only two columns. How can I obtain a View of one column for each parameter?

Best

andresrcs · January 10, 2020, 10:31am

tidyverse functions do not perform in-place modifications, they create a new data frame instead, if you want changes to persist, you have to explicitly assign them to an object (it could be the same one of you want to override).

new_dataframe <- sample_data %>%
    separate_rows(parameters, sep = ",") %>% 
    separate(parameters, c("parameter", "value"), sep = "=>") %>%
    drop_na() %>% 
    mutate_if(is.character, ~str_remove_all(., "[\\\"\\s]")) %>% 
    mutate(value = as.numeric(value)) %>% 
    pivot_wider(id_cols = id, names_from = parameter, values_from = value)

gfuse · January 13, 2020, 1:25pm

I'm sorry, I have another question for you:
I need to make a pie chart with the % of the count of ID per each parameters, for example:
For the Pr = 1, I have 3000 id and I need to represent the % in a pie chart, respect to other values.
For the Pr = 2, I have 4500 id etc......

I have used pie() and also ggplot2 but it doesn't work...How can I do?
Thanks again

andresrcs · January 13, 2020, 1:35pm

This is a very different question, we like to keep things tidy around here, so please ask this on a new topic and don't forget to provide a REPRoducible EXample (reprex) for your new question.

gfuse · January 24, 2020, 1:19pm

Hi, I have a question regarding the json parsing for which you have helped me.

In this step:
sample_data %>%
separate_rows(parsed_parameters, sep = ",") %>%
separate(parsed_parameters, c("parsed_parameters", "value"), sep = "=>") %>%
drop_na() %>%
mutate_if(is.character, ~str_remove_all(., "[\"\s]")) %>%
mutate(value = as.numeric(value)) %>%
pivot_wider(id_cols = id, names_from = parsed_parameters, values_from = value)
view(sample_data)

new_dataframe <- sample_data %>%
separate_rows(parsed_parameters, sep = ",") %>%
separate(parsed_parameters, c("parsed_parameters", "value"), sep = "=>") %>%
drop_na() %>%
mutate_if(is.character, ~str_remove_all(., "[\"\s]")) %>%
mutate(value = as.numeric(value)) %>%
pivot_wider(id_cols = id, names_from = parsed_parameters, values_from = value)

I have noted that the columns which contain timestamp or character are not well parsed for example the parameter BMO or DOFU. they contain na but it is not correct.

How can I see every value?

Thanks in advance
Giada

andresrcs · January 24, 2020, 8:33pm

The sample data is no longer available at the link you provided so I don't have data to work with.

system · February 14, 2020, 8:39pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.