Flattening the Data of Different Data Types

rstudio
sparklyr

#1

Introduction

I used the sparklyr package in R to read the data and build its schema. After reading it, I have a dataframe with the following structure in R.

R Database Schema

root
    |-- contributors : string
    |-- created_at : string
    |-- entities (struct)
    |     |-- hashtags (array) : [string]
    |     |-- media (array)
    |     |     |-- additional_media_info (struct)
    |     |     |       |-- description : string
    |     |     |       |-- embeddable : boolean
    |     |     |       |-- monetizable : boolean
    |     |     |-- display_url : string
    |     |     |-- id : long
    |     |     |-- id_str : string
    |     |-- urls (array)     
    |-- extended_entities (struct)
    |-- retweeted_status (struct)
    |-- user (struct)
    
I want to flatten this structure and create a new dataframe like the one below:

root
    |-- contributors : string
    |-- created_at : string
    |-- entities (struct)
    |-- entities.hashtags (array) : [string]
    |-- entities.media (array)
    |-- entities.media.additional_media_info (struct)
    |-- entities.media.additional_media_info.description : string
    |-- entities.media.additional_media_info.embeddable : boolean
    |-- entities.media.additional_media_info.monetizable : boolean
    |-- entities.media.display_url : string
    |-- entities.media.id : long
    |-- entities.media.id_str : string
    |-- entities.urls (array)     
    |-- extended_entities (struct)
    |-- retweeted_status (struct)
    |-- user (struct)

Problem Statement

The nested columns are of different data types (structs, arrays, and primitive types), and my actual schema is similar to the structure above. If required, I can upload a snapshot of the schema. I want to flatten these columns so that every nested field becomes a top-level column. Any solution would be appreciated.
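To make the target layout concrete, here is a minimal sketch in plain R (no Spark needed) that walks a nested schema and returns the dotted path of every leaf field. It lists leaves only, not the intermediate struct and array nodes. It assumes the schema is available as nested R lists in Spark's JSON schema layout (`type`, `fields`, `elementType`); `sdf_schema_json()` from sparklyr.nested should give something of this shape, but that is an assumption to check against your own data. `leaf_paths()`, `field()`, `struct()` and `array_of()` are hypothetical helpers written for this illustration, and the schema is a trimmed, hand-written version of the one above.

```r
# Hypothetical helper: return the dotted path of every leaf field in a schema
# expressed as nested lists in Spark's JSON schema layout.
leaf_paths <- function(type, prefix = NULL) {
  if (!is.list(type)) return(prefix)              # primitive type, e.g. "string"
  if (identical(type$type, "array")) {
    return(leaf_paths(type$elementType, prefix))  # look inside the array elements
  }
  if (identical(type$type, "struct")) {
    out <- lapply(type$fields, function(f) {
      leaf_paths(f$type, paste(c(prefix, f$name), collapse = "."))
    })
    return(unlist(out))
  }
  prefix                                          # other complex types (e.g. map) kept as leaves
}

# Small constructors so the example schema stays readable (illustration only).
field    <- function(name, type) list(name = name, type = type)
struct   <- function(...) list(type = "struct", fields = list(...))
array_of <- function(element) list(type = "array", elementType = element)

# Trimmed, hand-written version of the schema in the question.
schema <- struct(
  field("contributors", "string"),
  field("created_at", "string"),
  field("entities", struct(
    field("hashtags", array_of("string")),
    field("media", array_of(struct(
      field("additional_media_info", struct(
        field("description", "string"),
        field("embeddable", "boolean")
      )),
      field("id", "long")
    )))
  ))
)

paths <- leaf_paths(schema)
print(paths)
```

The resulting character vector can then drive whatever flattening step is used, since it works for any nesting depth.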


#2

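Have you tried the sparklyr.nested extension for sparklyr (https://mitre.github.io/sparklyr.nested/)?

You would have to install it from GitHub with:

devtools::install_github("mitre/sparklyr.nested")

Then load it and unnest the nested column:

library(sparklyr)
library(sparklyr.nested)

# Unnest the `entities` column
your_data %>%
  sdf_unnest(entities)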

Please refer to the extension docs for examples and additional information.


#3

Hi @javierluraschi, thanks for the reply.

sdf_unnest only digs one level deep into the schema. Fields nested 2 or 3 levels deep are still nested (just one level shallower) after sdf_unnest is executed, so you are not guaranteed totally flat data after calling it. My schema is 2 to 4 levels deep, so I cannot fully flatten the data with sdf_unnest. (I have gone through the reference link you gave.)
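One possible way around the one-level limit, sketched under assumptions: Spark SQL lets you address nested struct fields with dotted names at any depth, so a single SELECT can pull every nested field up to the top level. The snippet below only builds the query string in plain R. `build_flatten_sql()` is a hypothetical helper, `tweets` is an assumed table name, and the list of paths is typed in by hand. Underscores are used in the aliases because dots in column names need backtick quoting in Spark.

```r
# Hypothetical helper: build one SELECT that lifts dotted nested paths
# to top-level columns, aliasing "a.b.c" as "a_b_c".
build_flatten_sql <- function(paths, table) {
  aliases <- gsub(".", "_", paths, fixed = TRUE)
  cols <- sprintf("%s AS %s", paths, aliases)
  sprintf("SELECT %s FROM %s", paste(cols, collapse = ", "), table)
}

# Paths typed in by hand for illustration; "tweets" is an assumed table name.
paths <- c("created_at", "entities.hashtags", "entities.media.id")
sql <- build_flatten_sql(paths, "tweets")
print(sql)

# With a live connection `sc` and the data registered as table "tweets"
# (both assumed here), the query could be run with:
# flat <- sparklyr::sdf_sql(sc, sql)
```

As far as I know, a field that sits under an array (such as `entities.media.id`) comes back as an array column rather than one row per element, so an explode step would still be needed if you want a row for each media item.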