@konradino based on the files you shared with me, this is a network issue. The response body contains a large (>75 MB) JSON file, so the time spent transferring it over the network is what causes the discrepancy between local performance and RStudio Connect performance.
Consider the following script, which uses curl to make requests to both a local endpoint and a remote endpoint hosted on RStudio Connect:

    #!/bin/bash
    # Write the CSV header; each field is a curl --write-out variable.
    echo 'url,time_namelookup,time_connect,time_appconnect,time_pretransfer,time_redirect,time_starttransfer,time_total,size_upload,speed_upload,size_download,speed_download' > timings.csv
    # Send the same request to each endpoint ten times.
    for url in '<local-endpoint>' '<rstudio-connect-endpoint>'
    do
      for i in {1..10}
      do
        # -d sends test.json as the POST body, -o /dev/null discards the response, -w appends one CSV row.
        curl -d "@test.json" -w "%{url_effective},%{time_namelookup},%{time_connect},%{time_appconnect},%{time_pretransfer},%{time_redirect},%{time_starttransfer},%{time_total},%{size_upload},%{speed_upload},%{size_download},%{speed_download}\\n" -o /dev/null -X POST "${url}" >> timings.csv
      done
    done

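This script creates a .csv file containing timing details for each request. Analyzing the results shows that nearly all of the time discrepancy between the two APIs is accounted for by download and upload speed. In the download figures below, the local endpoint transfers at roughly 27 to 30 MB/s (about 3 seconds per request), while the RStudio Connect endpoint transfers at roughly 7 to 10 MB/s (about 8 to 12 seconds per request).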
| url | size_download (bytes) | speed_download (bytes/s) | time_total (s) | download_time (s) |
|---|---|---|---|---|
| Local Endpoint | 82540139 | 27285996 | 3.025613 | 3.025000 |
| Local Endpoint | 82540139 | 27467600 | 3.005494 | 3.005000 |
| Local Endpoint | 82540139 | 27349283 | 3.018092 | 3.018000 |
| Local Endpoint | 82540139 | 30401524 | 2.715642 | 2.715000 |
| Local Endpoint | 82540139 | 26921115 | 3.066180 | 3.066000 |
| Local Endpoint | 82540139 | 27458462 | 3.006494 | 3.006000 |
| Local Endpoint | 82540139 | 27313083 | 3.022872 | 3.022000 |
| Local Endpoint | 82540139 | 27331171 | 3.020641 | 3.020000 |
| Local Endpoint | 82540139 | 28141881 | 2.933028 | 2.933000 |
| Local Endpoint | 82540139 | 27550113 | 2.996060 | 2.996000 |
| RSC Endpoint | 82540139 | 6719868 | 12.283238 | 12.283000 |
| RSC Endpoint | 82540139 | 6846963 | 12.055490 | 12.055000 |
| RSC Endpoint | 82540139 | 9036581 | 9.134867 | 9.134001 |
| RSC Endpoint | 82540139 | 8575598 | 9.625306 | 9.625001 |
| RSC Endpoint | 82540139 | 7688881 | 10.735571 | 10.735000 |
| RSC Endpoint | 82540139 | 9607745 | 8.591900 | 8.591000 |
| RSC Endpoint | 82540139 | 9847308 | 8.382065 | 8.382000 |
| RSC Endpoint | 82540139 | 9207958 | 8.964228 | 8.964000 |
| RSC Endpoint | 82540139 | 9427771 | 8.755820 | 8.755000 |
| RSC Endpoint | 82540139 | 9588770 | 8.608930 | 8.608001 |
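The download_time column appears to be size_download divided by speed_download. As a rough sketch of how that comparison could be reproduced from timings.csv, the function below averages time_total and the derived download time per URL. The function name `summarize_timings` is made up for illustration, and the column positions assume the header written by the script above.

```shell
#!/bin/bash
# Summarize a timings.csv produced by the curl loop: for each URL, print
# url,request_count,mean_time_total,mean_download_time.
# Assumed column positions: 1=url, 8=time_total, 11=size_download, 12=speed_download.
summarize_timings() {
  awk -F',' '
    NR == 1 { next }            # skip the header row
    $12 > 0 {                   # skip rows with no download speed (failed requests)
      n[$1]++
      total[$1] += $8           # time_total in seconds
      dl[$1]    += $11 / $12    # bytes / (bytes per second) = seconds
    }
    END {
      for (u in n)
        printf "%s,%d,%.3f,%.3f\n", u, n[u], total[u] / n[u], dl[u] / n[u]
    }
  ' "$1"
}
```

Calling `summarize_timings timings.csv` prints one line per endpoint. If the two means are close for each URL, the transfer itself accounts for almost all of the request time.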
To improve performance, a couple of options come to mind:
- Use a different serializer
  - Depending on the downstream consumer of this API, you could serialize the response into a more compact format than JSON to cut down on its size.
- Create a paginated response
  - Instead of returning everything to the client at once, have the endpoint return only a few records at a time and provide a mechanism for requesting subsequent records.
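Before changing the serializer, it may help to estimate how much a smaller payload would actually save. The sketch below is only a rough guide: it assumes one response has been saved to a file, and it uses gzip purely as a stand-in for a more compact format, so real serializer sizes will differ. The function name `estimate_savings` and the file name in the usage note are hypothetical.

```shell
#!/bin/bash
# Compare the raw size of a saved response with its gzip-compressed size and
# estimate the transfer time of each at a given speed (bytes per second).
# gzip is only a proxy here for "a more compact serialization".
estimate_savings() {
  local file="$1" speed="$2"
  local raw gz
  raw=$(wc -c < "$file")             # size of the response as saved
  gz=$(gzip -c "$file" | wc -c)      # size after gzip compression
  awk -v raw="$raw" -v gz="$gz" -v speed="$speed" 'BEGIN {
    printf "raw_bytes=%d gzip_bytes=%d raw_seconds=%.2f gzip_seconds=%.2f\n",
           raw, gz, raw / speed, gz / speed
  }'
}
```

For example, `estimate_savings response.json 9000000` would use a speed close to what the RStudio Connect endpoint showed in the measurements above.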