How to parse a json field

Hello,
I have a problem with a json field in a csv. I should parse the field Parsed_parameters in this dataset_example:

The code I use is:

install.packages("tidyverse")
library(tidyverse)
setwd("D:/")
status_washing <- read_delim("dataset_example.csv", delim = ";", quote = "\"")

status_washing %>%
  separate_rows(parsed_parameters, sep = ",") %>% 
  separate(parsed_parameters, c("parsed_parameters", "value"), sep = "=>") %>%
  drop_na() %>% 
  mutate_if(is.character, ~str_remove_all(., "[\\\"\\s]")) %>% 
  mutate(value = as.numeric(value)) %>% 
  pivot_wider(id_cols = appliance_id, names_from = parsed_parameters, values_from = value)
view(status_washing)

new_dataframe <- status_washing %>%
  separate_rows(parsed_parameters, sep = ",") %>% 
  separate(parsed_parameters, c("parsed_parameters", "value"), sep = "=>") %>%
  drop_na() %>% 
  mutate_if(is.character, ~str_remove_all(., "[\\\"\\s]")) %>% 
  pivot_wider(id_cols = id, names_from = parsed_parameters, values_from = value)

but the result is:

Values in `value` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(value = list)` to suppress this warning.
* Use `values_fn = list(value = length)` to identify where the duplicates arise
* Use `values_fn = list(value = summary_fun)` to summarise duplicates

How can I see each value for each parameter in column?

Thanks
Giada

1 Like

Entries like ""DryT""=>""1"", are going to be difficult to parse. You mentioned json field. Does the data come from a json file originally, and then converted to csv? If so, a package designed to read json, such as rjson will avoid the problem.

Yes it was a json file riginally, then converted to csv.
How can rjson help me?
I think the code I wrote it's correct, because ""DryT""=>""1"" is parsed correctly, but I cannot see values inside of each column. For example I obtain:
DryT
instead of 1.

The code shows me an error:
Values in value are not uniquely identified; output will contain list-cols.

  • Use values_fn = list(value = list) to suppress this warning.
  • Use values_fn = list(value = length) to identify where the duplicates arise
  • Use values_fn = list(value = summary_fun) to summarise duplicates

Thanks for your help
Giada

I obtain:
DryT
"null"
instead of 1

Working with the json file directly removes an unnecessary layer of complication by converting to csv and then parsing it into a dataframe.

I only have the csv, not the original json

Could work better with

pivot_wider(names_from = parsed_parameters, values_from = value)

Omitting id_cols parameter.

> status_washing %>%
+   separate_rows(parsed_parameters, sep = ",") %>% 
+   separate(parsed_parameters, c("parsed_parameters", "value"), sep = "=>") %>%
+   drop_na() %>% 
+   mutate_if(is.character, ~str_remove_all(., "[\\\"\\s]")) %>% 
+   mutate(value = as.numeric(value)) %>% 
+   pivot_wider(names_from = parsed_parameters, values_from = value)
# A tibble: 13 x 43
    user status_id machine_id    Pr   Err   T0R   T0W   TIW  DryT  Lang  NtcD  NtcW  Opt1  Opt2
   <dbl>     <dbl>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1        22        123    14     0     0     0     0     1     6   682   429     0     0
 2     2        33        123    14     0     0     0     0     1     6   811   401     0     0
 3     3        44        123    14     0     0     0     0     1     6   639   442     0     0
 4     4        55        123    14     0     0     0     0     1     6   639   442     0     0
 5     5        66        123    14     0     0     0     0     1     6   316   160     0     0
 6     6        77        123     6     0    46    62     0     0     6   274   156     0     0
 7     7        88        123     6     0     0    20     0     0     6   264   147     0     0
 8     8        99        123     6     0    57    66     0     0     6   261   150     0     0
 9     9       110        123    13     0     0     0     0     1     6   535   253     0     0
10    10       121        123    13     0     0     0     0     1     6   649   436     0     0
11    11       132        123    13     0     0     0     0     1     6   268   150     0     0
12    12       143        123     6     0    51    64     0     0     6   265   147     0     0
13    13       154        123     6     0     0    11     0     0     6   283   162     0     0
# … with 29 more variables: Opt3 <dbl>, Opt4 <dbl>, Opt5 <dbl>, Opt6 <dbl>, Opt7 <dbl>,
#   Opt8 <dbl>, Opt9 <dbl>, PrPh <dbl>, Temp <dbl>, motS <dbl>, numF <dbl>, unbC <dbl>,
#   unbF <dbl>, FillR <dbl>, Steam <dbl>, APSoff <dbl>, DelVal <dbl>, MachMd <dbl>,
#   PrCode <dbl>, SLevel <dbl>, SpinSp <dbl>, chartL <dbl>, APSfreq <dbl>, RemTime <dbl>,
#   RecipeId <dbl>, DisTestOn <dbl>, DisTestRes <dbl>, WiFiStatus <dbl>, CheckUpState <dbl>
1 Like

Thanks but It says:
Error in separate_rows(parsed_parameters, sep = ",") :
object 'parsed_parameters' not found

why?

That is a copy-pasting artifact, sorry about that.
I copied straight from the console, that adds + as continuation.

Ready to copy:

status_washing %>%
  separate_rows(parsed_parameters, sep = ",") %>% 
  separate(parsed_parameters, c("parsed_parameters", "value"), sep = "=>") %>%
  drop_na() %>% 
  mutate_if(is.character, ~str_remove_all(., "[\\\"\\s]")) %>% 
  mutate(value = as.numeric(value)) %>% 
  pivot_wider(names_from = parsed_parameters, values_from = value)

This is the same code you've provided, minus id_cols = .
The more explicit id_cols = c(user, status_id, machine_id) would work as well.
See the documentation of help("pivot_wider"):

id_cols
A set of columns that uniquely identifies each observation. Defaults to all columns in data except for the columns specified in names_from and values_from. Typically used when you have additional variables that is directly related.

1 Like

Now it works.
many thanks for your help!

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.