Parse a json field from a csv which shows only the max value among a serie of values

gfuse · February 7, 2020, 10:49am

Hi,
I need help on one issue:
I need to parse a json field called "payload" in the csv, and I would like to obtain twon columns for each ID: Counters and last_counters
For each of these two columns I want to see the maximun value.
For example: in the json counters I can have these values: 10, 11, 130, 150.
I want to see only the value 150

at this link you find the csv:
https://www.dropbox.com/s/g7o1p0m1i9kenln/statistic_example.xlsx?dl=0

Can you help me plase?
Thanks!

Giada

pieterjanvc · February 7, 2020, 12:14pm

Hi,

Could you explain in a bit more detail what the final data structure would look like (generate a data frame with some mock values). I'm confused about the columns per ID.

Do you want a separate data frame per id, or each ID a new column? The latter is not possible as the columns would be different length.
Some IDs don't have last_counters so how would you handle that?
What should happen if both are empty?

Again, please generate an example of the final table (or whatever you want to create) so we can see what you want. You can use the reprex guide to help you. A reprex consists of the minimal code and data needed to recreate the issue/question you're having. You can find instructions how to build and share one here:

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

Hope this helps,
PJ

gfuse · February 7, 2020, 1:03pm

Thanks for your response!
I want to see a dataframe with three columns:

id ---> as-is
counters --- > with only the max value of the series
last counters --- > with only the max value of the series

I can parse the json with the following code but I cannot how to sho the max value:

install.packages("tidyverse")
library(tidyverse)
setwd("D:/")
statistics_total <- read_delim("statistic_example.csv", delim = ";", quote = """)

statistics_total %>%
separate_rows(payload, sep = ",") %>%
separate(payload, c("payload", "value"), sep = ":") %>%
drop_na() %>%
mutate_if(is.character, ~str_remove_all(., "[\"\s]")) %>%
mutate(value = as.numeric(value)) %>%
pivot_wider(id_cols = id, names_from = payload, values_from = value)

new_dataframe <- statistics_total %>%
separate_rows(payload, sep = ",") %>%
separate(payload, c("payload", "value"), sep = ":") %>%
drop_na() %>%
mutate_if(is.character, ~str_remove_all(., "[\"\s]")) %>%
mutate(value = as.numeric(value)) %>%
pivot_wider(id_cols = appliance_id, names_from = payload, values_from = value)

pieterjanvc · February 7, 2020, 2:23pm

Hi,

just to be clear: the original file is JSON text pasted in a csv file right? There is no original valid JSON file?

PJ

gfuse · February 7, 2020, 2:42pm

Yes, right. I made a query from the database and I downloaded it as a csv.

pieterjanvc · February 7, 2020, 10:21pm

Hi,

How about this:

library(stringr)
library(dplyr)
library(purrr)

#This should be the same as your csv file when read
myData = data.frame(
  stringsAsFactors = FALSE,
                ID = c(1L,2L,3L,4L,5L,6L,7L,8L,
                       9L,10L,11L,12L,13L,14L,15L,16L,17L,18L,19L,
                       20L,21L,22L,23L,24L),
           payload = c("{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"10,65538,244,10,11,13,18\",\"last_counters\":\"10,3,244,10,11,13,18\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}","{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"13908,13105,13876,13414,13880,13876,13873,13364,13413,14137,13112,14137,14133,13107,13876,13366,13107,13105,14129,12897,13366,13365,13413,13619,13618,13625,13616,13620,13414,13410,13621,13113,0,13361,13362,14130,13921,13365,13107\"}",
                       "{\"counters\":\"13868,13105,13876,13414,13880,13876,13873,13364,13413,14137,13112,14137,14133,13107,13876,13366,13107,13105,14129,12897,13366,13365,13413,13619,13618,13625,13616,13620,13414,13410,13621,13113,0,13622,13366,13625,13365,14136,13366\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"12905,14391,25138,78130,13366,12851,77874,78130,13363,13874,13620,25908,13108,12853,14645,77876,13365,26165,90675,13620,14645,91186,79926,77874,12595,12598,13365,25142,13878,14389,14643,25910,0,14133,13877,12343,13110,12852,25136\",\"last_counters\":\"0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}","{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"\",\"last_counters\":\"\"}",
                       "{\"counters\":\"13925,13105,13876,13414,13880,13876,13873,13364,13413,14137,13112,14137,14133,13107,13876,13366,13107,13105,14129,12897,13366,13365,13413,13619,13618,13625,13616,13620,13414,13410,13621,13113,0,13412,13362,13873,13880,13366,14179\"}",
                       "{\"counters\":\"12902,14391,25138,12595,13366,12851,12339,12595,13363,13874,13620,25908,13108,12853,14645,12341,13365,26165,25140,13620,14645,25651,13110,13109,25908,12853,14390,14390,25142,25398,14134,24885,0,14131,12339,13623,25908,25654,13156\"}",
                       "{\"counters\":\"13826,13105,13876,13414,13880,13876,13873,13364,13413,14137,13112,14137,14133,13107,13876,13366,13107,13105,14129,12897,13366,13365,13413,13619,13618,13625,13616,13620,13414,13410,13621,13113,0,14179,13880,13154,14134,13366,13367\"}",
                       "{\"counters\":\"13919,13105,13876,13414,13880,13876,13873,13364,13413,14137,13112,14137,14133,13107,13876,13366,13107,13105,14129,12897,13366,13365,13413,13619,13618,13625,13616,13620,13414,13410,13621,13113,0,13365,13107,13112,14135,13625,13364\"}",
                       "{\"counters\":\"0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0\"}")
)

#Extract the values from the JSON string per line
extractData = map(myData$payload, function(x){
  data = str_match_all(x, ":\"([\\d,\\.]+)\"")[[1]][,2]
  
  #Sometimes there is no last_counters, in that case return NA for that
  if(length(data) == 2){
    list(
      max(str_split(data[1], ",") %>% unlist %>% as.integer()),
      max(str_split(data[2], ",") %>% unlist %>% as.integer())
    )
  } else {
    list(
      max(str_split(data[1], ",") %>% unlist %>% as.integer()),
      NA
    )
  }
})

#Build a data frame with the IDs and the extracted data
myData = data.frame(ID = myData$ID, 
                    maxCounter = sapply(extractData, "[[", 1),
                    maxLastCounter = sapply(extractData, "[[", 2))

There are a LOT of NAs in this data frame, so I'm not sure what you want to use it for but here it is anyway

Hope this helps,
PJ

gfuse · February 8, 2020, 9:34am

Thanks! Buy i have Only One question: you insert for each id the value: id=c(1,2,3....) Buy i have a lot of id.. how i can insert the entire column without insert each value manually?
The same thing with counters and last counters....i have around 10.000 rows....

Thanks again

andresrcs · February 8, 2020, 12:00pm

That is just sample data for the sake of making the example reproducible, you are not supposed to manually type those values, you have to use your actual data instead.

gfuse · February 10, 2020, 1:29pm

Perfect, it works! here n example of the output obtained
Many thanks for your help!

andresrcs · February 10, 2020, 1:33pm

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it:

system · February 24, 2020, 12:26pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.