My sparklyr R code is below. The JSON file is read successfully and the nested columns are selected with invoke(). However, the nested structure contains many repeated column names. In the following code, I want to rename the 'hashtags' column.
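library(sparklyr)
library(sparklyr.nested)  # provides sdf_schema_viewer()

# `conf` is assumed to be defined earlier, e.g. with spark_config()
sc <- spark_connect(master = "local", config = conf, version = "2.2.0")
sample_tbl <- spark_read_json(sc, name = "example", path = "example.json", memory = FALSE, overwrite = TRUE)
sdf_schema_viewer(sample_tbl)  # display the nested schema

# Get the underlying Spark DataFrame so its Java methods can be invoked
df <- spark_dataframe(sample_tbl)

# Build a list of Column objects, addressing nested fields with dot notation
parsedCol <- list(
  invoke(df, "col", "created_at"),
  invoke(df, "col", "entities.hashtags"),
  invoke(df, "col", "entities.media.additional_media_info.description"),
  invoke(df, "col", "entities.media.additional_media_info.embeddable"),
  invoke(df, "col", "entities.media.additional_media_info.monetizable")
)

# Select the columns and register the result as a Spark table
out <- df %>% invoke("select", parsedCol)
sdf_register(out, "parsedSample_tbl")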
Any solution would be appreciated.
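
One possible approach, offered as a sketch rather than a confirmed fix: Spark's Column class has an alias method, so it should be possible to invoke it on each column object before the select. The snippet assumes `df` is the Spark DataFrame obtained from spark_dataframe(sample_tbl) as above, and the new name "tweet_hashtags" is purely hypothetical.

```r
library(sparklyr)

# Assumption: `df` is the jobj returned by spark_dataframe(sample_tbl).
# "tweet_hashtags" is a hypothetical name chosen for illustration.
parsedCol <- list(
  invoke(df, "col", "created_at"),
  # Rename the nested column by calling Column.alias() on the Spark side
  invoke(invoke(df, "col", "entities.hashtags"), "alias", "tweet_hashtags")
)

out <- invoke(df, "select", parsedCol)
sdf_register(out, "parsedSample_tbl")
```

Aliasing each column at selection time, instead of renaming after the select, keeps repeated nested names such as `description` distinct, because every column receives its own name before the results are flattened into one table.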