Ah! So you need to do batch processing after acknowledging receipt of the data.
tl;dr: keep communication attempts to a minimum. Transferring data is cheap; connecting is expensive.
The first thing I would look into is reducing the number of requests made to plumber and to the database.
In a perfect world, a round-trip internet request takes about 200 ms with 0 ms of processing time. So 10,000 requests × 0.2 s/request ≈ 2,000 s, i.e. 30+ minutes of wasted time. Completing your task in 20 minutes is already beating the odds.
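Back-of-envelope, in R (the 200 ms figure is just an assumed typical round trip):

requests <- 10000
round_trip_s <- 0.2          # assumed ~200 ms per round trip
requests * round_trip_s / 60 # [1] 33.33333 -> minutes of pure connection overhead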
Since you already have the information in hand at the time of /addParameters, you could change the script to insert multiple values at once. Similar to:
INSERT INTO MyTable
( Column1, Column2, Column3 )
VALUES
('John', 123, 'Lloyds Office'),
('Jane', 124, 'Lloyds Office'),
('Billy', 125, 'London Office'),
('Miranda', 126, 'Bristol Office');
(https://stackoverflow.com/a/452934/591574)
This would reduce the SQL queries to 300 (vs 10k). (It could be reduced even further if all the information could be sent at once.)
Something like this (assuming the names and values vectors are the same length):
a <- letters[1:10]
b <- 11:20

sql_txt <- paste0(
  "INSERT INTO database\n (Parameter, Value)\nVALUES\n ",
  paste0(
    "(?parameter", seq_along(a), ", ?actualVal", seq_along(b), ")",
    collapse = ",\n "
  )
)
cat(sql_txt)
INSERT INTO database
 (Parameter, Value)
VALUES
 (?parameter1, ?actualVal1),
 (?parameter2, ?actualVal2),
 (?parameter3, ?actualVal3),
 (?parameter4, ?actualVal4),
 (?parameter5, ?actualVal5),
 (?parameter6, ?actualVal6),
 (?parameter7, ?actualVal7),
 (?parameter8, ?actualVal8),
 (?parameter9, ?actualVal9),
 (?parameter10, ?actualVal10)
sql_params <- append(
  setNames(as.list(a), paste0("parameter", seq_along(a))),
  setNames(as.list(b), paste0("actualVal", seq_along(b)))
)
str(sql_params)
List of 20
$ parameter1 : chr "a"
$ parameter2 : chr "b"
$ parameter3 : chr "c"
$ parameter4 : chr "d"
$ parameter5 : chr "e"
$ parameter6 : chr "f"
$ parameter7 : chr "g"
$ parameter8 : chr "h"
$ parameter9 : chr "i"
$ parameter10: chr "j"
$ actualVal1 : int 11
$ actualVal2 : int 12
$ actualVal3 : int 13
$ actualVal4 : int 14
$ actualVal5 : int 15
$ actualVal6 : int 16
$ actualVal7 : int 17
$ actualVal8 : int 18
$ actualVal9 : int 19
$ actualVal10: int 20
conn <- poolCheckout(pool)
sql <- DBI::sqlInterpolate(
  conn = conn,
  sql = sql_txt,
  .dots = sql_params
)
DBI::dbExecute(conn, sql)
poolReturn(conn)
(No for-loop, yay!)
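As an aside, if you don't need named placeholders, DBI can build the batched insert for you. A minimal sketch with DBI::dbAppendTable (assuming the table really is named database with exactly the columns Parameter and Value, as above):

conn <- poolCheckout(pool)
DBI::dbAppendTable(
  conn,
  "database",                          # assumed table name from the SQL above
  data.frame(Parameter = a, Value = b) # one row per (parameter, value) pair
)
poolReturn(conn)

dbAppendTable() builds a parameterized insert under the hood, so it also avoids manual escaping.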
Final product:
#* Add parameters to DB
#* @param parameter1
#* @param parameter2
# ...
#* @get /addParameters
function(parameter1, parameter2) {
  print("Running addParameters")

  # set up params
  ParametersValues <- list(parameter1, parameter2)
  ParametersNames <- c("Parameter 1 Name", "Parameter 2 Name")

  # create GIANT sql command
  sql_txt <- paste0(
    "INSERT INTO database\n (Parameter, Value)\nVALUES\n ",
    paste0(
      "(?parameter", seq_along(ParametersValues),
      ", ?actualVal", seq_along(ParametersNames), ")",
      collapse = ",\n "
    )
  )

  # set up parameters for sql command
  sql_params <- append(
    setNames(
      as.list(ParametersNames),
      paste0("parameter", seq_along(ParametersNames))
    ),
    setNames(
      as.list(ParametersValues),
      paste0("actualVal", seq_along(ParametersValues))
    )
  )

  conn <- poolCheckout(pool)
  sql <- DBI::sqlInterpolate(
    conn = conn,
    sql = sql_txt,
    .dots = sql_params
  )
  DBI::dbExecute(conn, sql)
  poolReturn(conn)

  # return the time it completed inserting data
  paste0("Done! ", Sys.time())
}
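To sanity-check the endpoint locally, something like this should work (the file name and port are assumptions):

# in one R session: serve the API (assuming the endpoint lives in plumber.R)
pr <- plumber::plumb("plumber.R")
pr$run(port = 8000)

# in another R session (or any HTTP client): hit the endpoint
httr::GET(
  "http://localhost:8000/addParameters",
  query = list(parameter1 = "foo", parameter2 = "bar")
)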
I'm hoping this can be done without the use of pool, since the number of queries will be MUCH smaller. I'm also hoping the execution time will be MUCH faster, letting you do it in real time without needing later or callr.
If the queries still take too long, I would look into the callr R package. It lets you do your data processing in another R session. later will not be of much use to you, as it still blocks the main thread. callr has a function, r_bg, which starts an R background process and executes a function in it.
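A minimal example just to show the shape of r_bg:

bg <- callr::r_bg(
  function(x, y) x + y,     # runs in a fresh background R session
  args = list(x = 1, y = 2)
)
bg$wait()       # block until the background process finishes
bg$get_result() # [1] 3

In the plumber handler below, we deliberately do NOT wait(), so the request can return immediately while the insert runs in the background.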
Putting it all together, it'd be something like
#* Add parameters to DB
#* @param parameter1
#* @param parameter2
# ...
#* @get /addParameters
function(parameter1, parameter2) {
  print("Running addParameters")

  # set up params
  ParametersValues <- list(parameter1, parameter2)
  ParametersNames <- c("Parameter 1 Name", "Parameter 2 Name")

  # create GIANT sql command
  sql_txt <- paste0(
    "INSERT INTO database\n (Parameter, Value)\nVALUES\n ",
    paste0(
      "(?parameter", seq_along(ParametersValues),
      ", ?actualVal", seq_along(ParametersNames), ")",
      collapse = ",\n "
    )
  )

  # set up parameters for sql command
  sql_params <- append(
    setNames(
      as.list(ParametersNames),
      paste0("parameter", seq_along(ParametersNames))
    ),
    setNames(
      as.list(ParametersValues),
      paste0("actualVal", seq_along(ParametersValues))
    )
  )

  # exec in R background process
  pr <- callr::r_bg(
    function(sql_, sql_params_) {
      # all functions must be namespaced or re-`library()`-ed in this func
      # open conn
      conn <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") # change for real info
      # set up
      sql <- DBI::sqlInterpolate(
        conn = conn,
        sql = sql_,
        .dots = sql_params_
      )
      # exec
      DBI::dbExecute(conn, sql)
      # close conn
      DBI::dbDisconnect(conn)
    },
    # pass in parameters to the background function
    args = list(
      sql_ = sql_txt,
      sql_params_ = sql_params
    )
  )

  paste0("Done! ", Sys.time())
}
Caveat... this will create a new R process for each request, and spawning 300 R processes all at once will break your machine.
End goal: I would instead work hard to send ALL 10k cells' worth of info at once and build a single SQL insert statement in the main R thread (without callr/later, etc.). If the function takes over 10 s (or longer than is tolerable), try using callr to submit the statement from a background R process.
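If you control the client, that could be a hypothetical bulk endpoint like the sketch below (the JSON shape, table name, and use of req$postBody are all assumptions):

#* Add ALL parameters in one request
#* @post /addParametersBulk
function(req) {
  # client POSTs a JSON array like [{"parameter": "...", "value": ...}, ...]
  payload <- jsonlite::fromJSON(req$postBody) # data.frame with parameter/value cols
  conn <- poolCheckout(pool)
  on.exit(poolReturn(conn))
  DBI::dbAppendTable(
    conn,
    "database", # assumed table name from above
    data.frame(Parameter = payload$parameter, Value = payload$value)
  )
  paste0("Done! ", Sys.time())
}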
Hope this helps!