Best practices for monitoring R processes?

I was wondering if there are best practices for monitoring R processes. I don't mean keeping track of memory or an R session; rather, in Python I have the option to send events and messages to Datadog. Is there something like that in R — a standardized way to send metrics to a service?

I would want to send metrics like accuracy, row counts, etc. to an external service.


Datadog is an implementation detail; I guess you are interested in logging in general? There are multiple logging packages: https://daroczig.github.io/logger/articles/migration.html. There is also rsyslog (https://cran.r-project.org/web/packages/rsyslog/index.html), and probably many others, depending on what exactly you want to do.

None of them are standardized, though. They all do something similar, just in slightly different ways.
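
To give a feel for the API, here is a minimal sketch with the logger package; the threshold, appender, and file path are just illustrative choices, not requirements:

```r
# install.packages("logger")
library(logger)

# Log INFO and above, to both the console and a file (path is hypothetical)
log_threshold(INFO)
log_appender(appender_tee("/tmp/model-run.log"))

log_info("Run started")
log_info("Test accuracy: {round(0.9312, 3)}")  # glue-style interpolation
log_warn("Row count lower than yesterday")
```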


Cool! I had never heard of those packages! They seem to be logging-focused; is there also something for alerting?

What would you describe as alerting? For example, there is this package: http://dirk.eddelbuettel.com/code/rpushbullet.html. It can even send an SMS once something has happened. But I would say it's still basically logging; just the way it is presented is different.
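
As an illustration of that package, a note-style push could look like the sketch below; it assumes you have already stored your Pushbullet API key in ~/.rpushbullet.json as the package documents, and the metric and threshold are made up:

```r
library(RPushbullet)

test_error <- 0.18  # hypothetical metric from a model run
if (test_error > 0.15) {
  # Push a note to all registered devices
  pbPost(type = "note",
         title = "Model run alert",
         body = sprintf("Test error %.2f exceeded threshold 0.15", test_error))
}
```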

In general, I'd be more inclined to use R to log, and then move the logic that actually alerts about something to a different service that specializes in that (like Datadog in your example, or Kibana, Grafana, etc.).


To me, logging keeps track of all states in a process; it has several levels: DEBUG, INFO, WARNING, and ERROR.
Monitoring, on the other hand, is very specific: collecting metrics about the state and performance of the process, to aggregate and analyze.
Monitoring is about understanding how the system behaves, and we usually don't look at logs until something is off.

My use case is this: I have several processes that run every day on Google infrastructure, each using new data. I want to keep track of the test error and row counts for every run without going through the logs every day.

I think it'll ultimately depend heavily on your infrastructure and what you already have. One thing that came to mind is to use InfluxDB and send your metrics to it (through https://github.com/dleutnant/influxdbr, for example). You can then visualize them with Grafana and, I assume, set up alerting in Grafana itself.
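
A minimal sketch of that route, assuming an InfluxDB 1.x instance on localhost and a database called metrics (both hypothetical, as are the measurement and field names):

```r
library(influxdbr)
library(xts)

# Connect to a (hypothetical) local InfluxDB 1.x instance
con <- influx_connection(host = "localhost", port = 8086)

# Wrap today's metrics in an xts object, timestamped now
run_metrics <- xts(cbind(accuracy = 0.93, row_count = 120000),
                   order.by = Sys.time())

# Write one point to the "model_runs" measurement
influx_write(run_metrics, con = con, db = "metrics",
             measurement = "model_runs")
```

Grafana can then read the model_runs measurement directly as a data source and alert on it.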

Alternatively, you can still send logs for that to, e.g., Elasticsearch (through https://github.com/ropensci/elastic) and then use Kibana to visualize the results and build dashboards or set up alerts there.
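
Roughly like this, assuming a local Elasticsearch instance and a hypothetical model-runs index (field names made up to match your metrics):

```r
library(elastic)

# Connect to a (hypothetical) local Elasticsearch instance
es <- connect(host = "localhost", port = 9200)

# Index one document per run; Kibana can then chart these fields over time
docs_create(es, index = "model-runs",
            body = list(timestamp = format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ",
                                           tz = "UTC"),
                        accuracy  = 0.93,
                        row_count = 120000))
```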


Thank you! This thread gives me a lot to think about. I might write about it soon.

