Structure of Data Analyst/Science Teams

kentm · November 15, 2017, 2:19am

I’m curious to learn more about how other people’s organizations structure their data teams. Specifically, I’d be interested to learn about:

Data Infrastructure (how do analysts get access to data and support others’ access to data?)
Data Reporting (how do analysts present results to the organization at different levels?)
Data Collaboration (how do analysts work with others to generate ideas, and with other analysts who use different tools?)

Now a little bit about my background / organization. I was brought in about a year ago as one of the main data analysts. The organization consists of a few software engineering teams developing different applications for the organization, but there is no real “data focused” team. I started by building out a database that various R applications could easily access. Next I built some Shiny applications focusing on metrics and data access. One of the main problems I have faced is how to best support “real time” data access for individuals who only know Excel. My solution so far is a Shiny application sitting on top of the database that allows individuals to explore the data interactively. The simple approach would be emailing Excel sheets every day, but that is a less than ideal solution. More recently I have been focusing on analyses in the form of R Markdown reports that get circulated to other interested teams. I’m hoping to build out some sort of web repository (maybe Shiny?) for these types of reports, but this is still in its infancy.

How do others interact with their organizations to get, analyze, and report on results? What are the main problems you face?

DaveRGP · November 20, 2017, 4:14pm

Interesting question. We're in a somewhat similar position. No direct solutions yet (I'll keep you posted XD), but for distributing R Markdown reports on a regular basis, maybe look at RStudio Connect. It's an enterprise-level interface that lets you distribute reports.

Our current solution is automated R + Python jobs running on schedules on small EC2 instances, but this is somewhat labour-intensive as a deployment method compared with what Connect looks like it does. We haven't tested it yet, but will soon.
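
For illustration only (this is not our actual setup), a scheduled job of this kind might look roughly like the sketch below: a small Python script that a scheduler such as cron calls to render one R Markdown report through `Rscript`. The function names and the file paths are hypothetical, and it assumes R and the rmarkdown package are installed on the instance.

```python
import subprocess
from pathlib import Path


def build_render_command(rmd_path: str, output_dir: str) -> list[str]:
    """Build the Rscript call that renders one R Markdown file.

    Paths are embedded in an R string literal, so quotes and
    backslashes are rejected rather than escaped (a simplification).
    """
    for value in (rmd_path, output_dir):
        if '"' in value or "\\" in value:
            raise ValueError(f"unsupported character in path: {value!r}")
    expr = f'rmarkdown::render("{rmd_path}", output_dir = "{output_dir}")'
    return ["Rscript", "-e", expr]


def render_report(rmd_path: str, output_dir: str) -> None:
    """Render the report; raises CalledProcessError if Rscript fails."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_render_command(rmd_path, output_dir), check=True)
```

A scheduler entry would then call something like `render_report("reports/weekly_metrics.Rmd", "output")` (hypothetical paths) at whatever interval the report needs. Keeping the command builder separate from the call that runs it makes the job easy to check without R installed.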