Splunk configuration - tagging centralised logs with the app/markdown content name

I'm currently investigating ways to get our RStudio Connect application logs into our Splunk store for centralised dashboards, monitoring, investigations, etc.

With help from the admin guide (docs/admin/files-directories.html#application-logs), I've found the separate app stdout and stderr logs in folders like /var/lib/rstudio-connect/jobs/45/yGVowwKSbridm8KU/. I've also seen that those folders contain files like bundle, which can help me reference back to the /var/lib/rstudio-connect/apps and /var/lib/rstudio-connect/bundles folders.

However, I'm struggling to work out how to ingest the logs into Splunk with app/markdown metadata.
For example, if I'm running a Shiny app deployed as "EarthquakeTracker", then I'd like a tag in Splunk that lets me search for all "EarthquakeTracker" logs.

Has anyone already looked at this area? For example, how have you got the logs into Splunk, with each log tagged with at least the RStudio Connect information about which app is running?

I'm wondering if I can do something at the admin level by adding a line to the log files, e.g. somehow getting RStudio Connect to inject the Shiny/Markdown/Plumber name into stdout alongside the "Starting R with process ID" header. If I can find a way to identify the currently running app (e.g. from the session or serverInfo), then I can probably do this during Rprofile.site startup.
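A minimal R sketch of the Rprofile.site idea. It assumes your Connect version exposes the content's GUID to the R process through an environment variable; `CONNECT_CONTENT_GUID` is used here as an assumption, so check the admin guide for your release. The helper name and the `[connect-tag]` prefix are hypothetical. It builds one tagged line with the GUID, process ID and a UTC timestamp, which you would write to stderr at startup so it appears in the job's log output.

```r
# Hypothetical helper for Rprofile.site: build one tagged header line per R process.
# ASSUMPTION: the content GUID is available as CONNECT_CONTENT_GUID; adjust the
# variable name to whatever your Connect version actually sets.
connect_log_header <- function(env_var = "CONNECT_CONTENT_GUID") {
  guid <- Sys.getenv(env_var, unset = "unknown")
  stamp <- format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  sprintf("[connect-tag] guid=%s pid=%d time=%s", guid, Sys.getpid(), stamp)
}

# In Rprofile.site you would then write the line to stderr at startup:
# cat(connect_log_header(), "\n", sep = "", file = stderr())
```

Because Rprofile.site is sourced by every R session on the server, the line would also appear in non-Connect sessions, which is why the GUID falls back to "unknown".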


@slodge your feedback continues to amaze, thank you! :smile:

This is definitely a rough edge (log aggregation) that I'm personally really excited about improving! I haven't tried this myself, and I'm long overdue to do so!

Do you have an example or working Splunk config that you are currently working against? I'm not sure what tools are at your disposal in Splunk: is there a way to tag things after the fact? There is definitely a mapping from "app ID" (45 in the case above) to, say, the title of the app. The easiest way to get that mapping is with the connectapi package (the get_content() function), but that would probably be "after the fact" tagging, i.e. it won't be part of the path or log file at runtime. You could always write that mapping to a file on disk, though.

One noteworthy "problem" with the title is that titles can change over time. I think you would want to use something immutable like the App ID or App GUID for the tagging, ideally with a lookup key within Splunk that gives it a "pretty" name. Is that a "thing"? Would it be a problem if changing the title disassociated the content or created a new tag in Splunk?
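One way to get a "pretty name" lookup while tagging on the immutable GUID is to export the mapping as a CSV that Splunk can use as a lookup table. A base-R sketch: the input is assumed to be a data frame with `guid`, `id` and `title` columns, for example the output of connectapi::get_content(client) (verify the column names against your connectapi version). The function name and output columns are hypothetical.

```r
# Hypothetical helper: write a GUID -> id/title lookup table for Splunk.
# `content` is assumed to have guid, id and title columns (e.g. from
# connectapi::get_content(client); check the names in your version).
write_content_lookup <- function(content, path) {
  lookup <- data.frame(
    content_guid = as.character(content$guid),
    content_id = as.character(content$id),
    content_title = as.character(content$title),
    stringsAsFactors = FALSE
  )
  # Titles can be missing or change over time; the GUID stays the stable key.
  lookup$content_title[is.na(lookup$content_title)] <- ""
  utils::write.csv(lookup, path, row.names = FALSE)
  invisible(lookup)
}
```

Re-running this on a schedule would keep the titles current, while log events keyed on the GUID (or the numeric ID seen in the jobs folder path) stay associated with the same content after a rename.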

Thanks @cole

I've talked a bit with one of our in-house splunkers.

Ideally we'd like to have every "log line" in Splunk tagged with:

  • app id
  • app name
  • deployment version (id)
  • user name (from session)
  • time (UTC)
  • process id
  • log level (message, warning or error... plus debug would be nice?)

There are ways we can look up some of that info, but by far the simplest would be if it came from the folder/file structure and/or from structured log lines (e.g. JSON written to disk). Could you look at some way to (optionally) change the way RStudio Connect logs data?
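To illustrate what such a structured line could look like, here is a base-R sketch that formats one JSON object per log line with the fields listed above. This is not something Connect does; the field names are hypothetical, and the minimal escaping only handles backslashes, double quotes, tabs and newlines (a package such as jsonlite would be more robust).

```r
# Escape the characters that would break a JSON string literal.
json_escape <- function(x) {
  x <- gsub("\\", "\\\\", x, fixed = TRUE)  # backslashes first
  x <- gsub("\"", "\\\"", x, fixed = TRUE)
  x <- gsub("\t", "\\t", x, fixed = TRUE)
  gsub("\n", "\\n", x, fixed = TRUE)
}

# Hypothetical formatter: one JSON object per log line with the wished-for fields.
format_log_json <- function(msg, level = "message", app_id = NA, app_name = NA,
                            bundle_id = NA, user = NA) {
  fields <- c(
    time = format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"),
    level = level,
    app_id = as.character(app_id),
    app_name = as.character(app_name),
    bundle_id = as.character(bundle_id),
    user = as.character(user),
    pid = as.character(Sys.getpid()),
    msg = msg
  )
  fields[is.na(fields)] <- ""  # unknown fields become empty strings
  pairs <- sprintf("\"%s\":\"%s\"", names(fields), json_escape(fields))
  paste0("{", paste(pairs, collapse = ","), "}")
}
```

Writing such lines with cat(..., file = stderr()) would put them in the job's stderr log, where Splunk could extract the fields from the JSON.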

PS: I'm away from home and the office this week, and in and out over the coming weeks, so replies may be a bit sporadic. Sorry!


Thanks for that background!! That's very useful feedback for us!

Unfortunately, Connect doesn't have a way to change log output / structure today, although this is something that we are very interested in exploring. The only option today would be some type of "pre-aggregator" that parses the logs / rewrites them. I think the level of effort there is enough to discourage the exercise - the easiest path is probably our improving the logging process / flexibility out of the box.
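For a sense of what such a "pre-aggregator" step involves, here is a rough base-R sketch. It takes the app ID and job key from a job directory path of the form shown in the question (/var/lib/rstudio-connect/jobs/45/yGVowwKSbridm8KU/) and prefixes every line of one log file with them. The directory layout and the `stdout`/`stderr` file names are assumptions based on that example path, the function name is hypothetical, and it must run as a user allowed to read those files.

```r
# Hypothetical pre-aggregator step: tag each line of a job log with the
# app ID and job key taken from the job directory path.
# ASSUMPTION: layout is <jobs root>/<app id>/<job key>/<stream file>.
tag_job_log <- function(job_dir, stream = "stdout") {
  job_key <- basename(job_dir)
  app_id <- basename(dirname(job_dir))
  log_file <- file.path(job_dir, stream)
  if (!file.exists(log_file)) return(character(0))
  lines <- readLines(log_file, warn = FALSE)
  sprintf("app_id=%s job_key=%s stream=%s %s", app_id, job_key, stream, lines)
}
```

A real version would also need to track which lines it has already forwarded and handle log rotation, which is part of the effort referred to above.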

One option I have not mentioned yet: the connectapi package makes use of some private/internal APIs, one of which exposes the logs associated with "jobs". Unfortunately, there is no way to "stream" this output, so it is pretty one-dimensional, but the API would give you access to much of this information. If you're interested in that approach, get_jobs() and get_job() are the functions you will want to explore!

Is there any way of finding out whether this is on the roadmap and likely to be in near- or medium-term releases? I know you make good posts when you release versions, but are there any forward-looking posts anywhere? (Or is this information we need to get from our client manager/success manager?)

So still hoping RStudio can change the way the logs are logged...

But in the meantime, I had a play with connectapi, and we can get something out using code like:

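library(tidyverse)
library(connectapi)

# Connect to the server with an API key stored in an environment variable
client <- connect(
  host = 'https://connect.ourdomain.com',
  api_key = Sys.getenv("CONNECT_PROD_API")
)

content <- connectapi::content_item(client, guid = "7e61ec...our_content_guid...9d0765")

# List the jobs (R processes) recorded for this content item
jobs <- content$jobs()

# Fetch the full details, including captured output, for each job
full_job_info <-
  jobs %>% 
  map("key") %>% 
  map(~ content$job(.x)) 

# message() writes to stderr, so search both output streams
matches <- 
  full_job_info %>% 
  map(~ c(.x$stdout, .x$stderr)) %>% 
  map(~ str_match_all(.x, "OurSearchTrigger ([^\\s]*)"))

# matches are in...
matches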

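where the value we search for was logged as shown below. Note that message() writes to stderr and pastes its arguments together without a separator, so the space has to be part of the trigger string for the pattern above to match:

message("OurSearchTrigger ", value)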
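Because message() concatenates its arguments with no separator, it is worth checking that the logged text and the search pattern actually agree. A self-contained base-R round trip; "abc123" is a placeholder value, and base regmatches()/regexec() stand in for stringr here.

```r
# Capture the text message() would emit, without printing it.
logged <- tryCatch(
  message("OurSearchTrigger ", "abc123"),
  message = function(m) conditionMessage(m)
)
# logged is "OurSearchTrigger abc123\n"; extract the value after the trigger.
pattern <- "OurSearchTrigger ([^[:space:]]+)"
hit <- regmatches(logged, regexec(pattern, logged))[[1]]
value <- hit[2]

# Without the explicit space the text is "OurSearchTriggerabc123 \n",
# so the same pattern finds nothing (regexec reports -1).
no_space <- tryCatch(message("OurSearchTrigger", "abc123", " "),
                     message = function(m) conditionMessage(m))
no_match <- regexec(pattern, no_space)[[1]][1]
```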

Whoops! Sorry for never responding to this, @slodge :smile: (I had a baby on 8/6/2020, so that partly explains my absence, haha.)

In any case, we try to be a bit cagey about providing too much in the way of forward-looking posts so that (1) our engineers have the freedom to deliver quality software without being beholden to a deadline / customer expectations, and (2) we are free to pivot based on pressing needs, etc.

That said, this is definitely on our roadmap and is a pain point that I personally wrestle with often. It is part of a larger arc of "admin-facing" features that make our products work better in enterprise environments, the cloud, etc. It is definitely a priority for us, and something we are actively working towards. Unfortunately, we just don't have any information yet on how long it will take or whether it will meet your needs.

The list you gave above of the information that would be ideal to have is gold! Any other tidbits that you have of "this would be our dream world" are super useful to us as we scope / plan / execute on dev work :smile:

Hey @slodge,

Sorry for not responding to this! We are unfortunately a bit cagey about our development and release plans / timelines / etc. as occasionally we alter our plans based on (1) whether a feature turns out to be more complex than originally planned, (2) whether we misperceived the customer value, (3) whether something else comes out of nowhere as a very high priority.

I will say that we have been doing a lot of thinking about logging in the last few months. I am very hopeful that this will make its way into product improvements in the next few releases, although there is nothing specific to share yet.

I have you in mind as someone that I want to petition for feedback as we get closer to a release! (And this thread, in particular, has been very useful as we have worked through possibilities).

Definitely feel free to reach out to your customer success representative if/when you have roadmap questions / suggestions / etc. (Or share feedback here! Although sometimes our responsiveness struggles :sweat_smile:)

We have open-sourced an in-house solution for exactly this issue: forwarding logs to Splunk. The forum post is here: Announcing a way to forward logs from RSConnect to Elasticsearch, Splunk, and many others


Just bumping this topic again: structured logging is now available for the RStudio Connect log files, but sadly not for the application log files.

I'm still keen to try to do this application log collection well without running additional software if we possibly can...
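Without running additional software, one partial workaround is for the app itself to write self-describing lines. A base-R sketch: it intercepts message() and warning() conditions raised while evaluating an expression, re-emits each as a single line on stderr with a timestamp, level, tag and process ID, and muffles the original output. The function name and line format are hypothetical, and the `tag` value (e.g. a content GUID) has to be supplied by you. It does not change anything Connect itself writes, nor the permissions on the log files.

```r
# Hypothetical wrapper: re-emit message()/warning() conditions raised by `expr`
# as single tagged lines on stderr, then muffle the original output.
with_tagged_logs <- function(expr, tag = "unknown") {
  emit <- function(level, text) {
    # Drop message()'s trailing newline and flatten any embedded ones.
    text <- gsub("\n", " ", sub("\n$", "", text))
    stamp <- format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
    cat(sprintf("%s level=%s tag=%s pid=%d %s\n",
                stamp, level, tag, Sys.getpid(), text),
        file = stderr())
  }
  withCallingHandlers(
    expr,
    message = function(m) {
      emit("message", conditionMessage(m))
      invokeRestart("muffleMessage")
    },
    warning = function(w) {
      emit("warning", conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
}
```

For example, wrapping a Shiny server function's body in with_tagged_logs(..., tag = "EarthquakeTracker") would turn its message() and warning() calls into lines that a Splunk field extraction could parse.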

It's another 6 months later... so thought I'd ping this again to find out if there's any more news from the roadmap here.

We're definitely using the new view on app logs...

<3 these... although it would be fab if:

  • there was also a manual way to refresh the list of log files (sometimes it doesn't seem to contain the actual log of the current running app)
  • the find box worked across multiple log files
  • there was a way to view these logs without opening the app page itself (e.g. when I'm trying to see the logs for another user without polluting the logs with a new session)

We're also successfully using the structured logs on the main connect log files too - that seems to work well with Splunk :+1:

I have to admit, though, that we're still fighting the lack of structure and the root permissions on the app-specific log files: they remain stubbornly hard to use in Splunk (although part of this might still be my lack of Linux and Splunk skills!). Is there any roadmap news on when/if they might change?