how to parse a json file from a txt

How can I parse a json file on R Studio if there are more json in a string?
Thanks

Hi, welcome!

We don't really have enough info to help you out. Could you ask this with a minimal REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

1 Like

Hi, thanks for your help!
I need to parse two columns in which I have: id (with only one value) and parsed_parameters (which has more json fields inside).

I send you an example:

My dataset is in csv format, but I have tried also to transform it in a txt file or in json file and I tried to import it in R but I cannot parse the two columns.
I would like to obtain each parameter in one column, and If an ID does not have a parameter I need NA

Can you help me please?

I have tried this R code but it does not work:
setwd("D:/")
df <- read.delim("status_changes_1.txt", header = TRUE, na.strings = -1)
install.packages("jsonlite")
library(jsonlite)
filepath<-"status_changes_1.txt"
prova <- fromJSON(filepath)

A screenshot is not very useful, please follow the guide I gave you to learn how to share sample data on a copy/paste friendly format(or at least share a link to a sample file), that would make things a lot easier for people trying to help you.

Ok I try.
This is what I obtain from the command head(df)

head(df)
id.parsed_parameters
1
b39;"Pr"=>"342", "Err"=>"0", "DryT"=>"0", "Lang"=>"7", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "Opt8"=>"0"
2 c52;"Pr"=>"675", "Err"=>"255", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "PrPh"=>"4", "Temp"=>"255", "DelVal"=>"255",
3
130d;"Pr"=>"999", "Err"=>"1", "DryT"=>"0", "Lang"=>"201", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "Opt8"=>"0", "Opt9"=>"1", "PrPh"=>"0", "Temp"=>"40", "FillR"=>"0", "Steam"=>"1"

Thanks

That is still not copy/paste friendly (you need to read the guide more carefully) and also shows that you haven't correctly imported the data into R, a link to a sample file would be useful to help you.

I cannot understand you, I am sorry. If you give me your email I send you the file.
The question for you is: with a json string, how can I parse it?
I cannot send you more than I have already sent. This is the problem.

Thanks
Giada

What part of this reprex guide is confusing to you?

It depends on the structure of the JSON file, that is why I'm asking for sample data.

Ok I think now the reprex is ready. Here below:

setwd("D:/")
df <- read.delim("status_changes_1.txt", header = TRUE, na.strings = -1)
str(df)
names(df)
head(df)

install.packages("datapasta")
datapasta::df_paste(head(df,5)[,c('appliance_id.parsed_parameters')]

df_1 <- data.frame(stringsAsFactors = FALSE,
appliance_id = c('b39c5f0c-a614-47bb-a43c-87e312d211ef'),
parsed_parameters = c("Pr"=>"8", "Err"=>"0", "DryT"=>"0", "Lang"=>"7", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "Opt8"=>"0", "Opt9"=>"0", "PrPh"=>"0", "Temp"=>"0", "FillR"=>"0", "Steam"=>"0", "DelVal"=>"0", "MachMd"=>"2", "PrCode"=>"129", "SLevel"=>"0", "SpinSp"=>"16", "RemTime"=>"600", "RecipeId"=>"0", "DisTestOn"=>"0", "DisTestRes"=>"0", "WiFiStatus"=>"0", "CheckUpState"=>"0")
)

install.packages("jsonlite")
library(jsonlite)
filepath<-"status_changes_1.txt"
prova <- fromJSON(df)

I think the problem is that I have more json strings in one string.

Thanks again
Giada

Sadly, no, I don't have access to "status_changes_1.txt" so I can't run your code, reproduce your issue and try to give you a solution, as I said before it would be much easier for anyone willing to help you if you upload a sample file to a cloud storage service (like dropbox, google drive, box, etc.) and share a link.

I can only try to give you a solution based on the incorrectly imported data you are showing.

library(tidyverse)

df_1 <- data.frame(stringsAsFactors = FALSE,
                   appliance_id = c('b39c5f0c-a614-47bb-a43c-87e312d211ef'),
                   parsed_parameters = '"Pr"=>"8", "Err"=>"0", "DryT"=>"0", "Lang"=>"7", "Opt1"=>"0", "Opt2"=>"0", "Opt3"=>"0", "Opt4"=>"0", "Opt5"=>"0", "Opt6"=>"0", "Opt7"=>"0", "Opt8"=>"0", "Opt9"=>"0", "PrPh"=>"0", "Temp"=>"0", "FillR"=>"0", "Steam"=>"0", "DelVal"=>"0", "MachMd"=>"2", "PrCode"=>"129", "SLevel"=>"0", "SpinSp"=>"16", "RemTime"=>"600", "RecipeId"=>"0", "DisTestOn"=>"0", "DisTestRes"=>"0", "WiFiStatus"=>"0", "CheckUpState"=>"0"'
)

df_1 %>%
    separate_rows(parsed_parameters, sep = ",") %>% 
    separate(parsed_parameters, c("parameter", "value"), sep = "=>") %>% 
    mutate_if(is.character, ~str_remove_all(., "\\\"")) %>% 
    mutate(value = as.numeric(value)) %>% 
    spread(parameter, value) %>% 
    as_tibble() # This is just for friendly console printing
#> # A tibble: 1 x 29
#>   appliance_id ` CheckUpState` ` DelVal` ` DisTestOn` ` DisTestRes` ` DryT`
#>   <chr>                  <dbl>     <dbl>        <dbl>         <dbl>   <dbl>
#> 1 b39c5f0c-a6…               0         0            0             0       0
#> # … with 23 more variables: ` Err` <dbl>, ` FillR` <dbl>, ` Lang` <dbl>, `
#> #   MachMd` <dbl>, ` Opt1` <dbl>, ` Opt2` <dbl>, ` Opt3` <dbl>, ` Opt4` <dbl>,
#> #   ` Opt5` <dbl>, ` Opt6` <dbl>, ` Opt7` <dbl>, ` Opt8` <dbl>, ` Opt9` <dbl>,
#> #   ` PrCode` <dbl>, ` PrPh` <dbl>, ` RecipeId` <dbl>, ` RemTime` <dbl>, `
#> #   SLevel` <dbl>, ` SpinSp` <dbl>, ` Steam` <dbl>, ` Temp` <dbl>, `
#> #   WiFiStatus` <dbl>, Pr <dbl>

Created on 2020-01-09 by the reprex package (v0.3.0.9000)

Ok thanks.
here the link: https://www.dropbox.com/s/n8iqqzf6yek7g8t/parameters.csv?dl=0

many thanks!!
Giada

Well, that file is not on a JSON format, it seems like you have already processed it and put the result in a .csv file, so I'm going to keep proposing the same solution.

library(tidyverse)

sample_data <- read_delim("https://www.dropbox.com/s/n8iqqzf6yek7g8t/parameters.csv?dl=1", delim = ";", quote = "\"")

sample_data %>%
    separate_rows(parameters, sep = ",") %>% 
    separate(parameters, c("parameter", "value"), sep = "=>") %>%
    drop_na() %>% 
    mutate_if(is.character, ~str_remove_all(., "[\\\"\\s]")) %>% 
    mutate(value = as.numeric(value)) %>% 
    pivot_wider(id_cols = id, names_from = parameter, values_from = value)
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 664 rows [691,
#> 732, 802, 1582, 1807, 1947, 2189, 2812, 3966, 4256, 4460, 4895, 5991, 6227,
#> 6596, 6654, 6867, 7192, 7489, 7775, ...].
#> Warning: NAs introducidos por coerción
#> # A tibble: 9,993 x 286
#>       id    Pr   Err  DryT  Lang  Opt1  Opt2  Opt3  Opt4  Opt5  Opt6  Opt7  Opt8
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     1     0     0     0     7     0     0     0     0     0     0     0     0
#>  2     2    55   255    NA    NA     0     0     0     0     0     0     0    NA
#>  3     3    12     0     0    21     0     0     0     0     0     0     0     0
#>  4     4     1     0     0    20     0     0     0     0     0     0     0     0
#>  5     5    90   255     0    NA     0     0     0     0     0     0     0     0
#>  6     6     3   255     0    NA     0     0     0     0     0     0     0     0
#>  7     7   234   255     0    NA     0     0     0     0     0     0     0     0
#>  8     8    16     0     0     0     0     0     0     0     0     0     0     0
#>  9     9    16     0     0     1     0     0     0     0     0     0     0     0
#> 10    10     3   255    NA    NA     0     0     0     0     0     0     0    NA
#> # … with 9,983 more rows, and 273 more variables: Opt9 <dbl>, PrPh <dbl>,
#> #   Temp <dbl>, FillR <dbl>, Steam <dbl>, DelVal <dbl>, MachMd <dbl>,
#> #   PrCode <dbl>, SLevel <dbl>, SpinSp <dbl>, RemTime <dbl>, RecipeId <dbl>,
#> #   DisTestOn <dbl>, DisTestRes <dbl>, WiFiStatus <dbl>, CheckUpState <dbl>,
#> #   Eco <dbl>, Reset <dbl>, OpzProg <dbl>, Program <dbl>, ExtraDry <dbl>,
#> #   MissSalt <dbl>, OpenDoor <dbl>, TreinUno <dbl>, MissRinse <dbl>,
#> #   StartStop <dbl>, StatoWiFi <dbl>, DelayStart <dbl>, MetaCarico <dbl>,
#> #   StatoDWash <dbl>, CodiceErrore <dbl>, OpenDoorOpt <dbl>, Det <dbl>,
#> #   T0R <dbl>, T0W <dbl>, TIW <dbl>, NtcD <dbl>, NtcW <dbl>, PF <dbl>,
#> #   Cmp <dbl>, Fan <dbl>, Ice <dbl>, Lck <dbl>, Spr <dbl>, Door <dbl>,
#> #   Heat <dbl>, Wifi <dbl>, CmpFq <dbl>, Error <dbl>, FrSet <dbl>, FzSet <dbl>,
#> #   FrHiTmp <dbl>, FzHiTmp <dbl>, IncrDoorCnt <dbl>, Soft <dbl>, motS <dbl>,
#> #   numF <dbl>, rED0 <dbl>, unbC <dbl>, unbF <dbl>, APSoff <dbl>, chartL <dbl>,
#> #   APSfreq <dbl>, DPrgCnt <dbl>, DetPreW <dbl>, DetWarn <dbl>,
#> #   forwarded <dbl>, SPrgCnt <dbl>, SoftPreW <dbl>, SoftWarn <dbl>,
#> #   WaterHard <dbl>, BMO <dbl>, DBN <dbl>, MMS <dbl>, RMO <dbl>, BaSt <dbl>,
#> #   DOFU <dbl>, MOLW <dbl>, AMin1 <dbl>, AMin2 <dbl>, AMin3 <dbl>, AMin4 <dbl>,
#> #   AMin5 <dbl>, AMin6 <dbl>, AMin7 <dbl>, AMin8 <dbl>, AgiSt <dbl>,
#> #   FiErr <dbl>, TMIHM <dbl>, Tarea <dbl>, Tdays <dbl>, LUTime <dbl>,
#> #   LockSt <dbl>, AvgSpeed <dbl>, lock <dbl>, Z1hot <dbl>, Z1low <dbl>,
#> #   Z1pan <dbl>, Z2hot <dbl>, Z2low <dbl>, …

Created on 2020-01-09 by the reprex package (v0.3.0.9000)

1 Like

great! many thanks for your help
But if at the end of the script I write View(sample_data) I see again only two columns. How can I obtain a View of one column for each parameter?

Best

tidyverse functions do not perform in-place modifications, they create a new data frame instead, if you want changes to persist, you have to explicitly assign them to an object (it could be the same one of you want to override).

new_dataframe <- sample_data %>%
    separate_rows(parameters, sep = ",") %>% 
    separate(parameters, c("parameter", "value"), sep = "=>") %>%
    drop_na() %>% 
    mutate_if(is.character, ~str_remove_all(., "[\\\"\\s]")) %>% 
    mutate(value = as.numeric(value)) %>% 
    pivot_wider(id_cols = id, names_from = parameter, values_from = value)
2 Likes

I'm sorry, I have another question for you:
I need to make a pie chart with the % of the count of ID per each parameters, for example:
For the Pr = 1, I have 3000 id and I need to represent the % in a pie chart, respect to other values.
For the Pr = 2, I have 4500 id etc......

I have used pie() and also ggplot2 but it doesn't work...How can I do?
Thanks again

This is a very different question, we like to keep things tidy around here, so please ask this on a new topic and don't forget to provide a REPRoducible EXample (reprex) for your new question.

1 Like

Hi, I have a question regarding the json parsing for which you have helped me.

In this step:
sample_data %>%
separate_rows(parsed_parameters, sep = ",") %>%
separate(parsed_parameters, c("parsed_parameters", "value"), sep = "=>") %>%
drop_na() %>%
mutate_if(is.character, ~str_remove_all(., "[\"\s]")) %>%
mutate(value = as.numeric(value)) %>%
pivot_wider(id_cols = id, names_from = parsed_parameters, values_from = value)
view(sample_data)

new_dataframe <- sample_data %>%
separate_rows(parsed_parameters, sep = ",") %>%
separate(parsed_parameters, c("parsed_parameters", "value"), sep = "=>") %>%
drop_na() %>%
mutate_if(is.character, ~str_remove_all(., "[\"\s]")) %>%
mutate(value = as.numeric(value)) %>%
pivot_wider(id_cols = id, names_from = parsed_parameters, values_from = value)

I have noted that the columns which contain timestamp or character are not well parsed for example the parameter BMO or DOFU. they contain na but it is not correct.

How can I see every value?

Thanks in advance
Giada

The sample data is no longer available at the link you provided so I don't have data to work with.