Dear R Community,
My goal: Be able to provide guidance on building data infrastructure.
Some background and a "thank you" note:
I started learning R about three years ago, motivated by wanting to become more efficient in my day-to-day work. I was frustrated with MS Excel and felt there must be something that suited me better. Today, I can honestly say that R has changed my life. By now, I feel confident using the language, am fluent in the tidyverse packages, and don't shy away from learning how to use purrr, build apps in Shiny, or make my slides with xaringan. All of that is thanks to a wonderful community that truly supports newbies on their journey to using a programming language.
My skill gap:
Over the past six months, I have reached the next hurdle, one for which I know there must be a more efficient approach out there: data engineering and building efficient ETL (extract, transform, load) pipelines. While I have become a go-to consultant in my field for data analysis questions, I am increasingly being asked about the data engineering side of the work, and I also see that this is where most companies have their bottlenecks.
A very typical situation:
- Households/customers receive a service several times per week.
- The service includes the delivery and collection of goods from the household/customer. GPS data is collected with external software (e.g. MapMyRun), and QR codes are used to identify customers.
- Customers pay for services using mobile money. The payment data is accessed through external software (e.g. Quickbooks) and downloaded as XLSX files.
- Account managers make phone calls to customers around once a month to interview them about their satisfaction with the service. These lists are kept on paper, and the knowledge stays with the account managers.
Companies in the illustrated example are interested in:
- Business intelligence: Automated descriptive data analysis of performed services (e.g. a Shiny dashboard app).
- Detailed analyses that trigger action: Exploratory analysis of positive and negative customer satisfaction (e.g. a customer missing a payment indicates low satisfaction and prompts the account manager to make a phone call).
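To make the second point more concrete, here is a minimal sketch in base R of the kind of rule I have in mind. Everything in it is an assumption for illustration: the `payments` data frame, its column names, the function name `flag_missed_payments`, and the 30-day threshold are all made up, and real data would first have to be read from the XLSX exports.

```r
# Hypothetical payment records (made-up data for illustration only).
payments <- data.frame(
  customer_id = c("A", "A", "B", "C", "C"),
  paid_on = as.Date(c("2019-01-05", "2019-02-04", "2019-01-10",
                      "2019-01-20", "2019-02-25")),
  stringsAsFactors = FALSE
)

# Return the customers whose most recent payment is more than
# `max_gap_days` days before `as_of` (the threshold is an assumption).
flag_missed_payments <- function(payments, as_of, max_gap_days = 30) {
  # Latest payment date per customer, as days since the epoch.
  last_paid <- tapply(as.numeric(payments$paid_on), payments$customer_id, max)
  days_since <- as.numeric(as_of) - last_paid
  overdue <- days_since > max_gap_days
  data.frame(
    customer_id = names(last_paid)[overdue],
    days_since_payment = as.vector(days_since[overdue]),
    stringsAsFactors = FALSE
  )
}

# Customers an account manager might want to call.
to_call <- flag_missed_payments(payments, as_of = as.Date("2019-03-01"))
print(to_call)
```

A script like this could run on a schedule and hand the resulting list to the account managers, but how best to feed it with fresh data is exactly the part I am unsure about.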
My questions to this community are:
- Which Data Engineering tools do I need to learn to be able to provide these companies with support?
- Are there specific courses available that you would recommend?
What I already know and have set up:
- I am the administrator of a cloud server that I have established for myself.
- I am able to work with the Unix CLI, provided that I have clear instructions.
- I am becoming more proficient in finding solutions on StackOverflow for problems I encounter.
- I have an RStudio Server and a Shiny Server running on my cloud server, thanks to Dean Attali's great tutorial.
- I have a MySQL server running on my cloud server.
- I know how to use cron jobs with RStudio to run automated analyses.
- I have started out by reading this article by Nate Kupp: "Getting started: the 3 stages of data infrastructure".
I appreciate you reading this and am looking forward to a discussion.