Here's what I'll say: if you're inserting into a database, whether it's Redshift, Postgres, etc., it's going to be slow. I messed with this for weeks and determined that not only was it slow, it was also dropping rows (which is obviously bad).
Our setup looks like the following.
Person puts a .csv file (usually about 400k-900k rows) in an S3 bucket. An AWS Lambda written in Python fires on the object-creation event, which calls our Python code.
The Python code SSHs into another EC2 instance and nohups a bash script, which calls specific Rscripts based on the file. Looks something like:
#!/bin/bash
Rscript script1 &
Rscript script2 &
Rscript script3 &
Rscript script4
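For reference, the SSH-and-nohup step can be sketched roughly like this in Python (assuming paramiko; the host, key path, script path, and log path here are hypothetical placeholders, not our actual setup):

```python
def build_nohup_cmd(script, log="/tmp/run_models.log"):
    # nohup plus a trailing '&' lets the SSH session close while R keeps running
    return f"nohup bash {script} > {log} 2>&1 &"

def kick_off_processing(host, user, key_path, script):
    import paramiko  # assumed SSH library; imported lazily so the sketch stands alone
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, username=user, key_filename=key_path)
    client.exec_command(build_nohup_cmd(script))
    client.close()
```

The important part is the trailing `&` with output redirected, so the Lambda can return immediately while the R jobs run for however long they need.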
Each script uses aws.s3 and reads the file from S3 into an object, and each code base does the EXACT same processing. We join each row against our lead table (28 million rows) row by row, then parse, calculate, and fit a model over the top. We write the results to independent locations where our "combinator" is running in the background. This is a Python program with almost no overhead; basically it loops over and over, looking for files named a certain way in a certain location. The loop breaks when the counter hits its setting based on the files being created. This way it doesn't matter whether Rscript4 finishes first or last: once it's done and creates the .zip, the background job will pick it up and will automatically exit ONCE it grabs all the files.
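A minimal sketch of what a combinator loop like that could look like (the directory, file pattern, and poll interval are made-up placeholders, not our real naming convention):

```python
import glob
import os
import time

def combine(result_dir, expected_count, pattern="chunk_*.zip", poll_secs=5):
    """Poll result_dir until expected_count zips matching pattern appear,
    then return their paths. Completion order doesn't matter."""
    while True:
        found = sorted(glob.glob(os.path.join(result_dir, pattern)))
        if len(found) >= expected_count:
            return found  # all chunks present; caller merges and exits
        time.sleep(poll_secs)
```

Since the counter is just "how many files exist yet", the loop exits as soon as the last script drops its .zip, regardless of which one that is.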
We predominantly use aws.s3, data.table, and just base R. The trick is the splitting of the data. The Lambda handler basically does a record count on the number of rows in the input file and, depending on the file, splits it X ways. This changes because our lead sample will be different per file. So we chunk into smaller pieces and run them 3x, 4x, 8x, etc. concurrently using the bash script.
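The count-and-split step might look something like this sketch (it reads the whole file into memory for simplicity, and the function and prefix names are made up for illustration):

```python
import csv

def split_csv(path, out_prefix, n_chunks):
    """Split a CSV into n_chunks roughly equal pieces, repeating the header
    in each so every Rscript can read its chunk independently."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    chunk_size = -(-len(rows) // n_chunks)  # ceiling division
    paths = []
    for i in range(n_chunks):
        chunk = rows[i * chunk_size:(i + 1) * chunk_size]
        out_path = f"{out_prefix}_{i + 1}.csv"
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(chunk)
        paths.append(out_path)
    return paths
```

In practice the handler would pick `n_chunks` per file, then the bash script fires one Rscript per chunk.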
Once we have the full result sets, that's when we insert them into MSSQL.
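One way to keep that final insert fast is pyodbc's `fast_executemany`, which batches rows instead of making a round trip per row. This is a sketch under that assumption, not our exact code; the table and column names are placeholders:

```python
def build_insert_sql(table, columns):
    # Parameterized INSERT so pyodbc can batch the rows
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"

def bulk_insert(rows, conn_str, table, columns):
    import pyodbc  # assumed driver; imported lazily so the sketch stands alone
    conn = pyodbc.connect(conn_str)
    cur = conn.cursor()
    cur.fast_executemany = True  # send rows in bulk, not one at a time
    cur.executemany(build_insert_sql(table, columns), rows)
    conn.commit()
    conn.close()
```

Batching the insert once, after the combinator has everything, is the whole point of not writing to the database from inside the per-row R loops.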
Some food for thought: I ran a test today. Microsoft R Open on a Windows EC2 environment took 2 hours 41 minutes. The machine we actually run this stuff on, R 3.5.0 on a CentOS build, finished in 39 minutes. However, Linux has less overhead in general, so this isn't a completely fair comparison.
I would start with data.table, openBLAS, and aws.s3, and maybe mix in some Python.
I don't know if you can PM me or anything but maybe I can help some more.