sparklyr and custom-made package

Hello everyone,

It's my first time trying to use sparklyr.

I would like to use my custom-made library, which is not published in CRAN.

As far as I know, I can evaluate a user-defined function via spark_apply(), but spark_apply() doesn't handle nested functions well.

My question is this: I'm connecting locally via spark_connect() and trying to execute a function from my custom-made library, but it isn't working. Is this because of the nested functions in my package, or for some other reason?

Is there any workaround that would let me use my package's functions directly?

Thank you in advance,
Angelica

Hi Angelica,

Have you already considered not using spark_apply()? What functionality does your library provide that isn't available in Spark or one of the available extensions?

Assuming you do need spark_apply(), before using your library, make sure a subset of your data can be transformed in spark_apply() by running something similar to:

data %>% head(n = 1000) %>% spark_apply(~ .x)

Note that spark_apply() expects a data frame as input, and your R transformation must also return a data frame. If your library returns a matrix or some other object, you will have to manually convert the output into a data frame.
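For instance, a minimal sketch of wrapping a matrix-returning function (here cor(), as a stand-in for your own function) so that it satisfies the data-frame contract; this assumes a working local Spark installation:

```r
library(sparklyr)
library(dplyr)

# assumes Spark is installed locally (e.g. via spark_install())
sc <- spark_connect(master = "local")
sdf <- sdf_copy_to(sc, mtcars, overwrite = TRUE)

# cor() returns a matrix, so convert it to a data frame
# before returning it from the closure
result <- sdf %>%
  spark_apply(function(df) {
    m <- cor(df)
    as.data.frame(m)
  })

result
spark_disconnect(sc)
```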

Thank you so much for your direct response.

I would like to use my library because it provides models built with specific optimization techniques that can't be found in any other package. Since I know my functions work and produce correct results, I would like to keep using them.

That's why I'd like to use spark_apply() and adapt the inputs and outputs to the required data frame structure.

Of course my library is based on other R libraries.
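Putting the pieces above together, a hedged sketch of what that could look like: as far as I understand, with packages = TRUE (the default), spark_apply() makes the packages installed in the local library available to the workers, so a locally installed, non-CRAN package can be called inside the closure. Here mypackage and fit_model() are hypothetical placeholders for your own library and function:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
sdf <- sdf_copy_to(sc, mtcars, overwrite = TRUE)

# mypackage::fit_model() is a hypothetical custom function;
# whatever it returns must be converted into a data frame
result <- sdf %>%
  spark_apply(function(df) {
    out <- mypackage::fit_model(df)
    as.data.frame(out)
  }, packages = TRUE)

result
spark_disconnect(sc)
```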
