It’s not uncommon for machine data systems to send and receive duplicate or repeated events. This can happen for a variety of reasons, for example:
Misconfiguration (a.k.a. layer-8 problems) on the source, intermediary, or aggregation systems may cause duplicate data to be sent out.
Bugs or software defects may cause a data source to occasionally (or worse, frequently) send duplicates.
Unreliable networks or connectivity issues may force sources to resend data.
Using certain protocols (e.g., UDP) may also cause out-of-order events or duplicates.
Repeated, low-value events that indicate fast cycles or loops (e.g., network device interfaces flapping rapidly).
The functional impact of duplicates depends on the situation and the use cases the system serves. For instance, a relatively low number of dupes may be insignificant when the system is mostly used by humans for troubleshooting. On the other hand, a large number of dupes can throw off security and operational analytics, especially when they depend on summarization or statistical aggregations.
From a system perspective, a large number of dupes can impose significant stress on performance and storage. Searches, queries, dashboard views, and so on will all be impacted by the added number of events to sift through, which degrades the end-user experience. Storage will also be consumed faster, which means higher-value data will likely be aged out unnecessarily or prematurely.
When dupes occur, it becomes necessary to eliminate them. Cribl puts the machine data administrator in control of the flow and provides functionality to reduce dupes inline, before data is forwarded to a downstream receiver.
The Suppress function plays a central role here. It works as follows: first, users define a key expression to uniquely identify events in a stream. The function then checks whether that key has been seen before; if it has, the event is dropped, otherwise it is allowed through. In pseudo-code:
for each event e:
    if haveWeSeenThis(e.key):
        drop(e)
    else:
        letItThru(e)
Deduplication keys are not stored forever. Instead, they operate on a user-defined time window: as the window moves forward in time, new keys are inserted and older ones expire.
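To make that windowing behavior concrete, here’s a minimal sketch in TypeScript. This is purely illustrative, not Cribl’s implementation; the Suppressor class, keyOf callback, and windowSec parameter are hypothetical names:

```typescript
// Illustrative time-windowed deduplicator: a key only counts as "seen"
// for windowSec seconds, after which it expires and is allowed again.
type Event = { time: number; [field: string]: unknown };

class Suppressor {
  private seen = new Map<string, number>(); // key -> time of first sighting

  constructor(
    private windowSec: number,
    private keyOf: (e: Event) => string,
  ) {}

  allow(e: Event): boolean {
    const key = this.keyOf(e);
    const firstSeen = this.seen.get(key);
    if (firstSeen !== undefined && e.time - firstSeen < this.windowSec) {
      return false; // duplicate within the current window: suppress it
    }
    this.seen.set(key, e.time); // new key, or expired window: allow it
    return true;
  }
}
```

A production version would also evict expired keys to bound memory, which is exactly why keeping the suppression window short matters (more on that below).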
The key expression allows the administrator to surgically articulate how to uniquely identify events. This can be as coarse as the entire event content or as fine as a very specific combination of an event’s fields.
The nuanced benefit that the key expression brings is critical. For instance, in the sample above, even though some events look identical, you can’t actually deduplicate them based on their full content, because time is different in each one. In fact, given the nature of the timestamp field (i.e., typically monotonically increasing), it should be excluded from all your deduplication efforts. For this particular source, we can construct a key expression that targets only the relevant fields, providing a more precise basis for deduplication. For example, this expression will compose a key using the values of the channel, message, and src fields:

`${channel}:${message}:${src}`
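As a quick illustration (with made-up field values; this simply mirrors how a JavaScript-style template literal evaluates, and is not Cribl’s internals):

```typescript
// Evaluate the key expression above against a sample event (hypothetical values).
const keyOf = (e: { channel: string; message: string; src: string }): string =>
  `${e.channel}:${e.message}:${e.src}`;

keyOf({ channel: "auth", message: "login failed", src: "10.0.0.5" });
// -> "auth:login failed:10.0.0.5"
// A later event with a different timestamp but the same three field values
// yields the same key, so it is treated as a duplicate and suppressed.
```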
The Suppress function offers a few configuration options that can be tuned to meet precise deduplication requirements. Let’s take a look.
Key Expression: This is used to build a key to uniquely identify events to suppress. For example, in our case: `${channel}:${message}:${src}`
Number to Allow: Number of events to allow per window. Typically this will be 1.
Suppression Period: The time window, in seconds, during which events are suppressed after ‘Number to Allow’ events are received. It is recommended to keep this to a reasonable value, say, less than 60 seconds or so: the greater this number, the larger the corpus of keys to store and compare against.
Drop Suppressed Events: This control indicates whether suppressed events should be dropped or just decorated with suppress=1. Decorating is useful when other functions down the pipeline need to be invoked on duplicate events, or to assist downstream systems if they are to remove duplicates themselves.
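Putting these options together, the logic they describe might look roughly like this. The config field names below are hypothetical, chosen to mirror the option names above; this is an assumed sketch of the semantics, not Cribl’s actual configuration schema or code:

```typescript
// Hypothetical config mirroring the options above (illustrative names only).
interface SuppressConfig {
  keyExpr: (e: Record<string, unknown>) => string; // Key Expression
  numToAllow: number;                              // Number to Allow
  periodSec: number;                               // Suppression Period
  dropSuppressed: boolean;                         // Drop Suppressed Events
}

type WindowState = { start: number; count: number };

function suppress(
  e: { time: number } & Record<string, unknown>,
  cfg: SuppressConfig,
  state: Map<string, WindowState>,
): Record<string, unknown> | null {
  const key = cfg.keyExpr(e);
  let w = state.get(key);
  if (!w || e.time - w.start >= cfg.periodSec) {
    w = { start: e.time, count: 0 }; // window expired: start a fresh one
    state.set(key, w);
  }
  w.count++;
  if (w.count <= cfg.numToAllow) return e; // within allowance: pass through
  return cfg.dropSuppressed
    ? null                    // drop the duplicate...
    : { ...e, suppress: 1 };  // ...or just decorate it for downstream use
}
```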
Notice how, in this sample, events #2 and #3 have the same key (see the expression above), and #3 has been dropped, as indicated by the strikethrough/greyed-out styling.
Indicating the number of events dropped is always done in context; that is, no more of those loose, separate “Last message repeated N times” events. The next event that qualifies to be allowed through will be decorated with suppressCount=N, where N represents the number of events dropped in the previous time window.
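Continuing the sketch above, that decoration could be tracked per key like so (again, assumed behavior for illustration, not Cribl’s code):

```typescript
// Track how many events were suppressed per key, then attach that count
// to the next event allowed through after the window expires.
const suppressedCounts = new Map<string, number>();

function recordSuppressed(key: string): void {
  suppressedCounts.set(key, (suppressedCounts.get(key) ?? 0) + 1);
}

function decorateAllowed(
  key: string,
  e: Record<string, unknown>,
): Record<string, unknown> {
  const n = suppressedCounts.get(key) ?? 0;
  suppressedCounts.delete(key); // reset for the next window
  return n > 0 ? { ...e, suppressCount: n } : e; // e.g. suppressCount=42
}
```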
Depending on the key expression, the number of identified duplicate or repeated events can be quite significant.
Removing or minimizing duplicates results in a better experience across the board: analytics and end-system performance increase, fewer CPU and storage resources are wasted, and overall user experience improves. Cribl makes the process of dealing with duplicates in real time easy, and powerful key expressions allow for custom, fine-grained targeting of repeated patterns. Give it a try today!
—
If you’d like more details on configuration, see our documentation, join us in Slack #cribl, tweet at us @cribl_io, or contact us via hello@cribl.io. We’d love to help you!
Enjoy it! — The Cribl Team