Software Engineer (Spark and MLflow)
paid // remote // full-time
The exponential growth of digital information creates new challenges and opportunities that require specialized tools to help scale compute environments and data science teams. Tools like Apache Spark and MLflow support distributed computing for processing large datasets and managing models across large data science teams. It is our goal to make such tools first-class citizens for the R community, to empower their users and help R remain a leading platform for data science and machine learning.
As a software engineer working on Spark and MLflow, you will be responsible for the majority of the engineering effort in Spark, MLflow, and exciting new initiatives for the R community that intersect distributed computing, modeling workflows, deep learning, and beyond. Your responsibilities range from supporting existing and new versions to enabling new features, building new R packages, and contributing to the Spark and MLflow communities at large. It is our hope that these efforts will build bridges between the machine learning and data science communities, and result in innovative new research and applications.
Responsibilities of the position include:
- Maintain existing and future versions of the sparklyr and mlflow packages.
- Fix high-priority issues in the sparklyr and mlflow packages.
- Tackle the hard technical challenges that arise when using Spark and MLflow from R.
- Write and troubleshoot tests across versions and environments.
- Work with community advocates to prioritize new features and resolve technical issues.
- Expand the scope of distributed computing in R by implementing new features and packages to support cutting-edge advancements in computing frameworks, simulation, deep learning, distributed training, etc.
- Support the engineering infrastructure for Apache-like projects, for instance, by writing release scripts, documentation, etc.
The software engineer will be a member of the Multiverse team, which currently includes Daniel Falbel, Sigrid Keydana, Kevin Kuo and Javier Luraschi. Our team also works closely with Edgar Ruiz, Andrie de Vries, Max Kuhn and J.J. Allaire.
Required qualifications
- Experience shipping professional software including CRAN packages.
- Experience in Scala or C/C++.
- Experience in data science, machine learning or distributed computing.
- Ability to work autonomously and independently on difficult problems.
Desired qualifications
- Committer in the Spark project.
- Experience releasing and maintaining CRAN packages.
- Experience in R, Python or Julia.
- Experience working in open source projects.
About us
- We welcome all talented engineers and are committed to a culture that represents diversity in all its forms.
- We prioritize giving engineers “focus time” to get deep work done. We minimize meetings and attempt to operate asynchronously.
- We are a learning organization and take mentorship and career growth seriously. We hope to learn from you and we anticipate that you will also deepen your skills, influence, and leadership as a result of working at RStudio.
- We operate under a unique sustainable business model: 50% of the engineering we do at RStudio is open source. We are profitable and we plan to be around twenty years from now.
Notable
- 100% distributed team (or come into one of our offices in Seattle or Boston) with minimal travel
- Competitive compensation with great benefits, including:
  - medical/dental/vision insurance (100% of premiums covered)
  - 401(k) matching
  - a home office allowance or reimbursement for a coworking space
  - a profit-sharing system
- Flexible environment with a generous vacation policy
Apply now: https://hire.withgoogle.com/public/jobs/rstudiocom/view/P_AAAAAACAAJZOAkvGkRJTHI