I am new to Spark and have only heard about it from our professor. Despite Googling and scratching my head several times, I still don’t really understand Spark.
My basic question is: if dplyr is awesome, why do we need sparklyr?
Things I understand:
- Apache Spark is not a database, so no package is (or was) necessary.
- It just increases computation speed.
So unless our IT department has Spark installed on its machines, we should not use it.
Please help me to increase my knowledge.