Specifying StreamingContext Duration in sparklyr

hi all,

I'm new to spark (and sparklyr) and therefore trying out a few things. My plan is to have a producer (sending some random data) over a socket to the consumer. This is working great. The core looks as follows:

producer:

while(TRUE){
    writeLines("Listening...")
    con <- socketConnection(
      host = "localhost",
      port = 9998L,
      blocking = TRUE,
      server = TRUE,
      open = "r+"
    )

    writeLines(sample(1:10000, 1), con)
    close(con)
  }

consumer:

sc <- spark_connect(master = "local")
while(TRUE){
    stream <- stream_read_scoket(sc, options = list(host = "localhost", port = 9998L))
    print(stream)
 }

This prints the expected output one by one (old entries do not show up in the stream anymore).
I now wanted to calculate the mean of the received numbers in the last 20 seconds. Of course I could collect the numbers and store the previous values myself. But I think that's not how it's supposed to be and I'd like to do that calulation in the spark framework without collecting first. I think the correct way to do that is to define a Duration in the "StreamingContext" https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/streaming/StreamingContext.html . But I didn't find a way the define this when using sparklyr.

I'd be happy to get a hint how such a duration could be defined in sparklyr.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.