July 20, 2021
Webhook destinations have been available in LogStream since 2020 (LogStream version 2.4.4), and Packs since July of 2021. In this blog post we’ll cover using Webhooks to trigger incidents in the PagerDuty API, and the Cribl Webhook Pagerduty Pack created to demonstrate how Packs make deployment easier.
LogStream’s core competency is providing an observability pipeline: Streaming events from sources to destinations with a library of functions to route, reduce, transform, enrich, and replay that data. Normally this entails taking some subset of events from a given log producer and delivering them to one or more destinations.
The Webhook destination adds a new wrinkle to LogStream’s arsenal. It lets you call out to external services, using industry-standard HTTPS and JSON, to trigger events elsewhere. If a service you use accepts a JSON payload via POST, PUT, or PATCH, LogStream can use it to bridge the gap between your machine data and third-party functionality.
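To make that concrete, here is a minimal sketch of the kind of request a webhook-style sender puts together. It only builds a description of the request and sends nothing. The helper name, the example.com URL, and the event shape are all placeholders I made up for illustration, not LogStream internals.

```javascript
// Hypothetical helper: describes the HTTP request a webhook-style sender would make.
// The URL and the event shape are placeholders, not LogStream internals.
const ALLOWED_METHODS = ['POST', 'PUT', 'PATCH'];

function buildWebhookRequest(url, method, event) {
  if (!ALLOWED_METHODS.includes(method)) {
    throw new Error(`Unsupported method: ${method}`);
  }
  return {
    url,
    method,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event), // the event travels as a JSON document
  };
}

const webhookRequest = buildWebhookRequest(
  'https://example.com/hook',
  'POST',
  { message: 'queue is filling' }
);
console.log(webhookRequest.method, webhookRequest.body);
```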
In this case, we’re going to use the Webhook feature to fire based on data in the Cribl Internal Metrics data source. If any outputs on workers begin queuing up, an event will be sent to PagerDuty where their process of acknowledgement and resolution can be followed.
Let’s get started.
Destinations in LogStream have three options when the destination service can’t be reached: Drop the event, block the pipeline, or queue the data. In the case of queuing the data, LogStream will locally store the events until the service is available again, at which point the queue will drain. But… turns out, disk resources aren’t unlimited! Who knew?!
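If it helps to see those three behaviors side by side, here is a toy model. It is a sketch of the idea only, not how LogStream implements backpressure. The mode names and the state object are assumptions of mine, and the in-memory array stands in for what would be disk storage in the real product.

```javascript
// Toy model of the three backpressure behaviors; not LogStream's implementation.
function deliver(event, state) {
  if (state.serviceUp) {
    state.delivered.push(event);
    return 'delivered';
  }
  switch (state.mode) {
    case 'drop':
      return 'dropped';          // the event is discarded
    case 'block':
      return 'blocked';          // the caller has to hold the event and retry
    case 'queue':
      state.queue.push(event);   // stored locally until the service returns
      return 'queued';
    default:
      throw new Error(`Unknown mode: ${state.mode}`);
  }
}

// Drain the queue once the service is reachable again.
function drain(state) {
  while (state.serviceUp && state.queue.length > 0) {
    state.delivered.push(state.queue.shift());
  }
}

const pqState = { mode: 'queue', serviceUp: false, queue: [], delivered: [] };
deliver({ id: 1 }, pqState);
deliver({ id: 2 }, pqState);
console.log(pqState.queue.length); // 2

pqState.serviceUp = true;
drain(pqState);
console.log(pqState.queue.length, pqState.delivered.length); // 0 2
```

The longer the service stays down in queue mode, the more the queue grows, which is exactly the condition we want an alert for.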
I absolutely recommend having a proper disk-space monitor on all your servers, regardless of persistent queuing (PQ) settings. Nagios, or any of the hundreds of other monitoring tools, is purpose-built for this. But by watching LogStream’s internal metrics logs, we can have another canary in the coal mine.
If you’ve pulled up the Monitoring panel in LogStream, you’ve seen internal metrics at work. But did you know you can treat those metrics like any other log source? Once you enable the source, you can route, transform, and aggregate any way you see fit before delivery to any of our supported destinations. (Hint: We have a Splunk app for LogStream monitoring.) So that will be our first stop.
Navigate to the Sources screen, and click the Cribl Internal icon. Then enable the Cribl Internal metrics log source. The default values are fine.
Next, navigate to the Destinations configuration screen. Select Webhook.
Click Add New, and use the following URL:
Your config screen should look something like this:
Additionally, under Post-Processing, clear the System fields input.
As usual, all the fancy work is done in the pipeline. I’m going to walk through the functions used in the pipeline here, but you can get the same functionality with the Pack, discussed in the last section of this post.
Navigate to Pipelines and create a new one. I’ve named mine trigger_pipe.
Next, let’s add a Drop function so that only the events we want to examine continue through the rest of the pipeline. For queue watching, we’ll use this filter expression, and enable the Final flag:
!((_metric == 'cribl.logstream.pq.queue_size') && (_value > 0) )
In other words, if the event is not a pq.queue_size event with _value > 0, it will be dropped.
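Here is a quick way to convince yourself the filter does what we want. This sketch wraps the same expression in a plain JavaScript function and runs it against a few made-up events. In LogStream the expression refers to fields directly (`_metric`, `_value`); the `event.` prefix and the sample data are only there so it runs outside the product.

```javascript
// The Drop function's filter from above: true means "drop this event".
const shouldDrop = (event) =>
  !((event._metric == 'cribl.logstream.pq.queue_size') && (event._value > 0));

// Sample events are invented for illustration.
const sampleMetrics = [
  { _metric: 'cribl.logstream.pq.queue_size', _value: 2048 }, // queue is growing: keep
  { _metric: 'cribl.logstream.pq.queue_size', _value: 0 },    // empty queue: drop
  { _metric: 'example.other.metric', _value: 500 },           // unrelated metric: drop
];

const survivors = sampleMetrics.filter((event) => !shouldDrop(event));
console.log(survivors);
```

Only the first sample event makes it past the Drop function.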
Next, we’ll add the Eval function. This is the core of the whole pipeline. We need to define the fields the PagerDuty API needs while integrating relevant info from the event into the payload. The screenshot below shows the basics. You’ll need to enter your own integration token from PagerDuty in place of the routing_key shown below.
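As a rough illustration of what the Eval step produces, here is a sketch that builds the same set of fields in plain JavaScript. The top-level names (`routing_key`, `event_action`, `dedup_key`, `payload`) match the Keep Fields list in this post, and `summary`, `source`, and `severity` are the payload fields PagerDuty's Events API v2 asks for. The event field names (`host`, `output`), the dedup_key format, the wording of the summary, and the placeholder key are my assumptions, so adjust them to match your own events.

```javascript
// Sketch of the fields an Eval step might add. The input field names (host, output),
// the dedup_key format, and the placeholder routing key are assumptions.
function toPagerDutyEvent(event, routingKey) {
  return {
    routing_key: routingKey,                     // your PagerDuty integration token
    event_action: 'trigger',                     // open (or update) an incident
    dedup_key: `${event.host}:${event.output}`,  // one incident per host-output pair
    payload: {
      summary: `Persistent queue growing on ${event.host} for output ${event.output} (queue_size=${event._value})`,
      source: event.host,
      severity: 'warning',
    },
  };
}

const pagerDutyEvent = toPagerDutyEvent(
  { _metric: 'cribl.logstream.pq.queue_size', _value: 2048, host: 'worker-1', output: 'splunk_hec' },
  'YOUR_INTEGRATION_KEY'
);
console.log(JSON.stringify(pagerDutyEvent, null, 2));
```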
In Keep Fields, enter:
routing_key event_action dedup_key payload*
In Drop Fields, enter: *
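The combination of a Keep list with a Drop list of `*` can look odd at first. The sketch below models it under the assumption, implied by these settings, that a Keep match wins over a Drop match. The helper names and the sample event are hypothetical, and the wildcard handling is a simplification of my own.

```javascript
// Convert a simple wildcard pattern such as "payload*" into an anchored RegExp.
function wildcardToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

// Assumption: a field that matches a keep pattern survives even if a drop pattern matches too.
function applyKeepAndDrop(event, keepPatterns, dropPatterns) {
  const keep = keepPatterns.map(wildcardToRegExp);
  const drop = dropPatterns.map(wildcardToRegExp);
  const result = {};
  for (const [name, value] of Object.entries(event)) {
    const kept = keep.some((re) => re.test(name));
    const dropped = drop.some((re) => re.test(name));
    if (kept || !dropped) result[name] = value;
  }
  return result;
}

// Hypothetical event as it might look right after the Eval expressions run.
const evaluatedEvent = {
  _metric: 'cribl.logstream.pq.queue_size',
  _value: 2048,
  host: 'worker-1',
  routing_key: 'YOUR_INTEGRATION_KEY',
  event_action: 'trigger',
  dedup_key: 'worker-1:splunk_hec',
  payload: { summary: 'Persistent queue growing', source: 'worker-1', severity: 'warning' },
};

const trimmedEvent = applyKeepAndDrop(
  evaluatedEvent,
  ['routing_key', 'event_action', 'dedup_key', 'payload*'],
  ['*']
);
console.log(Object.keys(trimmedEvent)); // [ 'routing_key', 'event_action', 'dedup_key', 'payload' ]
```

The result is an event that carries only what PagerDuty needs, with the original metric fields stripped away.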
Next, add a Suppress function to limit how many events we send to PagerDuty. We’re going to use dedup_key (defined in the Eval above) as the Key Expression, and use a 5-minute window (300 seconds) as the Suppression Period. In other words, for every host-output combination, we’ll allow one event through every 5 minutes.
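The suppression idea is simple enough to sketch in a few lines. This is a minimal model of "one event per key per period", not the Suppress function's actual implementation. Timestamps are passed in as plain seconds to keep the example deterministic, and the key values are made up.

```javascript
// Minimal suppression sketch: allow one event per key per period.
function createSuppressor(periodSeconds) {
  const windowStart = new Map(); // key -> start time (in seconds) of the current window
  return function allow(key, nowSeconds) {
    const start = windowStart.get(key);
    if (start === undefined || nowSeconds - start >= periodSeconds) {
      windowStart.set(key, nowSeconds); // first event for this key, or the window expired
      return true;
    }
    return false; // still inside the window: suppress
  };
}

const allowEvent = createSuppressor(300);
console.log(allowEvent('worker-1:splunk_hec', 0));   // true
console.log(allowEvent('worker-1:splunk_hec', 120)); // false
console.log(allowEvent('worker-2:splunk_hec', 120)); // true
console.log(allowEvent('worker-1:splunk_hec', 300)); // true
```

Because the key is the dedup_key, a second worker with a backed-up queue still gets its own alert while repeats from the first worker are held back.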
Finally, create a new route with a Filter matching __inputId=='cribl:CriblMetrics', and point it at your new pipeline and your Webhook destination.
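Conceptually, the route is just a filter paired with a pipeline and a destination. The sketch below is a simplified model where the first matching route wins. The route names, the destination ID `pagerduty_webhook`, and the catch-all route are hypothetical. Only the filter expression and the trigger_pipe pipeline name come from this post.

```javascript
// Simplified routing model: the first route whose filter matches handles the event.
// Route names, the output ID, and the catch-all route are hypothetical.
const routeTable = [
  {
    name: 'pq_alerts',
    filter: (event) => event.__inputId == 'cribl:CriblMetrics',
    pipeline: 'trigger_pipe',
    output: 'pagerduty_webhook',
  },
  { name: 'catch_all', filter: () => true, pipeline: 'main', output: 'default' },
];

function pickRoute(event) {
  return routeTable.find((route) => route.filter(event));
}

console.log(pickRoute({ __inputId: 'cribl:CriblMetrics' }).pipeline); // trigger_pipe
console.log(pickRoute({ __inputId: 'some:other_input' }).pipeline);   // main
```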
You can force a queuing action by blocking access to a destination, or stopping a service. In my lab, I used a Splunk HEC endpoint for delivery of Datagen-created events. To test failure, I simply stopped the Splunk service. Within 30 seconds, a new incident should show up in your PagerDuty control panel.
Using an aggressive filtering option and a Suppress function, we can pare down our stream to just a few key events to trigger Webhook events, and in turn, PagerDuty incidents. While we used internal metrics for this example, any event in your pipeline could be used as a trigger just as easily. The Webhook destination adds more flexibility to an already robust set of options in LogStream.
LogStream Packs simplify deployment. There are 2 primary cases that Packs play into:
In both cases, it can be cumbersome to ensure pipelines, routes, lookups, and other artifacts are all accounted for in each install.
The Pack concept solves this by letting you bundle these things into a portable file format, easily installed into any LogStream 3.0+ installation. The PagerDuty API example above isn’t terribly complicated, but distributing it using a Pack makes it all the easier. No chance of typos, no missed steps in the Pipeline. Just drop it in, fill in your integration token, and you’re off to the races. When creating more complex Pipelines that involve lookups and other elements, the Pack advantage will be even more clear.
In fact, I’ve added an extra tweak to this Pack as a demonstration. Instead of hard-coding your integration token in an Eval, I’ve included a lookup file. Based on matching the host that sent the internal message, you can have different tokens. Maybe your staging team has a different PagerDuty setup than your production team. Since it’s a Pack, there’s no need for you to bother with creating the lookup. The plumbing’s all in place, just put your values in.
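The lookup approach boils down to a host-to-token table. Here is a minimal sketch of that idea. The host names, the placeholder keys, and the fallback behavior are assumptions of mine, and the Pack's actual lookup file may be laid out differently.

```javascript
// Hypothetical lookup table: host -> PagerDuty integration token.
const tokenLookup = new Map([
  ['staging-worker-1', 'STAGING_INTEGRATION_KEY'],
  ['prod-worker-1', 'PROD_INTEGRATION_KEY'],
]);

// Return the token for a host, or a fallback when the host is not listed.
function routingKeyFor(host, fallback = null) {
  return tokenLookup.has(host) ? tokenLookup.get(host) : fallback;
}

console.log(routingKeyFor('staging-worker-1'));               // STAGING_INTEGRATION_KEY
console.log(routingKeyFor('unknown-host', 'DEFAULT_KEY'));    // DEFAULT_KEY
```

Keeping the tokens in a table like this means adding a team is a one-line change to the lookup rather than an edit to the pipeline.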
Once you’ve signed up for LogStream, you can find the Pack for Cribl Internal Metrics-triggered PagerDuty alerts in Cribl’s GitHub. Included with the Pack is a Readme with notes on how to configure it. The preceding blog post (hopefully) was interesting and showed you something new. But with Packs, we don’t require explanations at this level of detail. Grab the Pack, install it, and follow the directions.
The fastest way to get started with Cribl LogStream is to sign up at Cribl.Cloud. You can process up to 1 TB of throughput per day at no cost. Sign up and start using LogStream within a few minutes.