Working with sparklyr objects

I'm starting to work with R in Databricks and have an issue using the geohashTools package. I am trying to convert lat/long to geohashes using the gh_encode function from the geohashTools library within Databricks. While I have solved one problem, I have created another.

Using the tbl function, I first access a Hive table, then I attempted to use the following example code to manipulate the data:

my_tble <- my_tbl %>%
  mutate(geo_hash = gh_encode(lat, long, precision = 9L))

but this results in some sort of parse error. So what I did was use the collect function to convert the Hive table into an R data.frame. Running the same command on that, the gh_encode function works. But now I have an R data.frame, so I thought I could convert it to a Spark DataFrame using SparkR::as.DataFrame(acled_temp). But now I have a Spark DataFrame.

What I can't figure out is why the mutate command has worked just fine except when using the geohashTools functions. Converting object types causes another set of problems. Any idea why mutate(geo_hash = gh_encode(lat, long, precision = 9L)) would error out in the first place?

Jeff

Because when you use the tbl() function, all dplyr commands get translated into SQL and executed on the Hive database under the hood, and there is no SQL translation for gh_encode(). That is why collecting the data and applying the function locally works: when working locally, no SQL translation is required.
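To see the translation step for yourself, here is a minimal sketch that uses dbplyr's simulated Hive backend, so no cluster is needed. The column names lat and long come from the question; the simulated table, the round() call, and the positional precision argument are only there for illustration. A function dbplyr knows how to translate is turned into SQL, while an unknown one such as gh_encode() should simply be passed through by name to the database, which has no such function.

```r
library(dplyr)
library(dbplyr)

# lazy_frame() simulates a remote table; nothing is actually connected.
# simulate_hive() makes dbplyr use its Hive SQL translation rules.
remote <- lazy_frame(lat = 0, long = 0, con = simulate_hive())

# round() has a SQL translation; gh_encode() does not, so dbplyr
# is expected to emit the call as-is and leave it to the database.
query <- remote %>%
  mutate(
    lat_rounded = round(lat, 2),
    geo_hash    = gh_encode(lat, long, 9L)
  )

# Render the SQL that would be sent to the database.
generated_sql <- as.character(sql_render(query))
cat(generated_sql, "\n")
```

Note that gh_encode() is never evaluated in R here; it only shows up as text inside the generated query.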

That at least makes sense.

Would there be a way to extract the lat and long columns, run the gh_encode command, then append the results back onto the original Hive data set? Or can I convert the R data.frame back to a Hive object or data.frame?

You would have to write the result back into the Hive database, maybe with copy_to().
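A rough sketch of that round trip, assuming a sparklyr connection called sc and a remote table with lat and long columns (the helper names and the table name in the usage note are made up for illustration). It collects the data, encodes it locally with geohashTools, and copies the result back with copy_to(). Keep in mind that collect() pulls the whole table into local R memory, so this only suits data that fits there.

```r
library(dplyr)
library(geohashTools)

# Hypothetical helper: add a geohash column to a local data.frame
# that has lat and long columns.
add_geohash_locally <- function(df, precision = 9L) {
  df %>%
    mutate(geo_hash = gh_encode(lat, long, precision = precision))
}

# Hypothetical helper: collect a remote table, encode locally,
# and copy the result back to Spark under a new table name.
# `sc` is assumed to be an open sparklyr connection.
encode_and_copy_back <- function(sc, remote_tbl, name, precision = 9L) {
  local_df <- collect(remote_tbl)                  # remote table -> R data.frame
  encoded  <- add_geohash_locally(local_df, precision)
  copy_to(sc, encoded, name = name, overwrite = TRUE)
}
```

Usage would look something like my_tbl_gh <- encode_and_copy_back(sc, my_tbl, "my_tbl_geohash"). As far as I know, copy_to() registers a temporary table in the Spark session, so if you need a permanent Hive table you would probably follow up with something like sparklyr::spark_write_table().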
