Implement Spark in a company that has a datacenter



I am new to Spark, and I was wondering whether a company that owns a datacenter can easily deploy an HDFS storage system and the Spark framework on it, instead of using cloud services like AWS.

In that case, does anyone know of a tutorial, or have any tips, on how to achieve this?

(By the way, I use the R programming language and would be interested in using the sparklyr package.)

Thanks for your help.


FYI, I manually removed the upvote/downvote counts from your header, and I see you've deleted your question from Stack Overflow. Since this is an R forum, you might want to tailor your question to this audience, for whom using R is more than incidental.


Hi @John78! Implementing Hadoop at a data center will depend heavily on the type and brand of infrastructure currently in place. Some Hadoop vendors have partnered with hardware vendors to make this kind of deployment easier; for example, I know that EMC's advanced NAS, Isilon, can be used as the storage layer for the Cloudera and Hortonworks distributions. I think your best bet is to contact the Hadoop providers and discuss your particular case with them to get a more tailor-made plan.

Regarding R and Spark, I would encourage you to visit the sparklyr documentation to get more information about how to use sparklyr.
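To give a feel for what this looks like, here is a minimal sketch of connecting to Spark from R with sparklyr. It assumes the sparklyr and dplyr packages are installed; the `master` value is an assumption too — `"local"` works for a laptop test, while on a Hadoop cluster you would typically use `"yarn"` instead.

```r
library(sparklyr)
library(dplyr)

# For a first local test (no cluster required), sparklyr can download
# its own Spark build:
# spark_install()

# Connect to Spark. On a real HDFS/YARN cluster you would use
# master = "yarn" (an assumption; depends on your cluster setup).
sc <- spark_connect(master = "local")

# Copy an R data frame into Spark and query it with dplyr verbs,
# which sparklyr translates to Spark SQL:
mtcars_tbl <- copy_to(sc, mtcars, "mtcars")

mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg)) %>%
  collect()

spark_disconnect(sc)
```

The appeal of this workflow is that the same dplyr code runs against data in HDFS once you point `spark_connect()` at your cluster, so you can prototype locally and scale up later.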