Cribl offers a suite of tools designed to optimize data pipelines, with different components tailored for managing and orchestrating data flows at scale across different teams and data sources. One of the biggest problems with building and running a multi-team data engine is isolation. This blog will cover how we at Cribl have handled the challenge of data, configuration, and access isolation for growing teams.
Specifically, we will look at three features (Cloud Workspaces, Worker Groups, and Stream Projects) and at how each provides isolation and a tailored experience for different levels of data access and control.
How Did We Get Here?
I started at Cribl a few months after the concept of Worker Groups was introduced into what was then called Cribl LogStream (I’m dating myself here, as we dropped the log back in 2022). Before that, a single node was processing our customers’ data. As with all product investments at Cribl, we start with the job to be done. For the remainder of this blog, I will use this model to explain how each capability in the portfolio should be used as defined by the problem it solves and the job it intends to do for our customers.
The job was: “Give me a way to isolate data at the network, compute, and storage layer to provide secure access to data for my different teams.” With Worker Groups, we also introduced a new persona into the Cribl world: the Group Admin. This user is responsible for everything within a specific group but is not an administrator of Cribl as a whole, or even of other groups. Isolation achieved!
A few years later, another problem statement emerged: “I want to give access to members of my teams who don’t need to know or don’t have time to learn all of the ins and outs of the Cribl infrastructure. All they need to do is build and test pipelines.” The job to be done was: “Give Cribl Admins a way to give access to streams of data to their data experts without revealing or giving access to superuser/admin capabilities at the group level.” This provides more isolation and a tailored experience for the Data Expert persona.
The data expert just needs access to their part of the data. You could spin up a new Worker Group and grant them Group Admin access, but then they would have to learn every part of the solution just to manage their part of the data. Additionally, many data sources are multiplexed, meaning that a single source can carry multiple datatypes: one S3 pull alone can contain data that is useful to several teams. Projects allow all of that data to be managed by one Worker Group but routed to separate Projects for use by the appropriate data experts.
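To make the multiplexing idea concrete, here is a minimal Python sketch of one collected feed being split into per-Project streams. This is a conceptual model only, not Cribl's configuration format or API: the `SUBSCRIPTIONS` table, the `sourcetype` field, the project names, and `route_to_projects` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical subscriptions: project name -> predicate deciding whether
# an event belongs to that project. These names are illustrative only.
SUBSCRIPTIONS = {
    "security": lambda e: e["sourcetype"] == "cloudtrail",
    "networking": lambda e: e["sourcetype"] == "vpcflow",
}


def route_to_projects(events, subscriptions):
    """Fan a single multiplexed feed out to every project whose predicate matches."""
    routed = defaultdict(list)
    for event in events:
        for project, matches in subscriptions.items():
            if matches(event):
                routed[project].append(event)
    return dict(routed)


# One S3-style pull carrying several datatypes at once.
events = [
    {"sourcetype": "cloudtrail", "msg": "ConsoleLogin"},
    {"sourcetype": "vpcflow", "msg": "ACCEPT 10.0.0.1"},
    {"sourcetype": "elb", "msg": "GET /health"},
]

routed = route_to_projects(events, SUBSCRIPTIONS)
```

In this sketch the Worker Group still collects everything once, while each team only ever sees the slice its subscription matches. Events that no project subscribes to (the `elb` record here) are simply not delivered to any team.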
Lately, another problem statement has been coming up, specifically in the Cloud: “I need to have multiple unique environments with their own configurations, access rules, team members, and network isolation for providing full data management to unique parts of my organization.”
This last model is very common in state and federal governments as well as multinational corporations, where business units may have completely different teams of Super Admins, Group Admins, and Data Experts but would still like one place where billing and metering are combined, so that they share a single spend with Cribl. The job here is to provide an on-demand way of building new environments that still sit under the banner of the main organization.
Cribl.Cloud Workspaces
Cribl.Cloud introduced Workspaces to give customers on-demand, full environment isolation that provides:
Environment-Level Isolation: Each workspace is an isolated environment with its own access VPC, compute, storage, and networking. Workspaces are a push-button way to expand Cribl to new parts of your org or stand up sandboxes, dev, and test environments.
Large-scale team management: Workspaces are integrated with the Cribl authentication and authorization tools to allow fast and secure onboarding of teams and users to specific environments without leaking data or configurations to other environments.
Fully managed services: Workspaces in Cribl.Cloud integrate with existing cloud services for networking, compute, and storage. They provide a flexible and efficient way of delivering the Cribl data management suite with little management overhead.
Cribl.Cloud Workspaces are ideal for organizations looking to use Cribl products across dedicated environments while keeping centralized management, administration, and billing. They offer isolation, security, and scalability for customers who require multiple unique enterprise environments.
Cribl Stream Worker Groups
Worker Groups in Cribl Stream are clusters of worker nodes that process your data. Worker Groups are fundamental to isolating and securing data collection and processing.
These groups provide:
Next-level isolation: Worker Groups can be isolated using RBAC rules, which can be managed in the app or via a customer-defined IdP like Okta or Active Directory.
Medium team management: Worker Groups serve as a key building block for an effective multi-team deployment of Cribl Stream. Teams assigned to a Worker Group can only work with the data collected and processed by that Worker Group.
Dedicated data flows: Worker Groups can scale to process hundreds of TBs daily for dedicated sources, pipelines, and destinations.
Cribl Stream Projects
Stream Projects are a relatively new feature in Cribl that allows for the most granular and isolated organization and management of end-to-end data flows for teams working with similar data sources. Projects enable:
Deeper levels of isolation: Projects provide the highest degree of isolation as they can be set up to provide a subset of data to a team, apply masking and filtering before sending it to the team’s project, and restrict where the Project team can send the data.
Granular team management: Projects enable small teams to manage their data within a single Worker Group without seeing or affecting other teams. This is essential for teams whose members may need different access levels to different data sources.
Per-team version control: Projects support per-project version control, making it easier to manage changes and collaborate across teams. This feature facilitates safer deployments and easier rollback procedures.
A simplified experience: Projects ensure that data experts can access what they need without managing the surrounding infrastructure. They see just the source subscriptions, pipelines, and destinations assigned to their team.
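The three controls described for Projects (a subset of the data, masking before delivery, and a restricted set of destinations) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `Project` class, its fields, the `"****"` mask value, and the destination names are hypothetical and do not reflect how Cribl Stream stores or enforces Project settings.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Project:
    """Hypothetical model of what a Project team is allowed to see and do."""
    name: str
    event_filter: Callable[[dict], bool]          # which events the team receives
    masked_fields: set = field(default_factory=set)        # fields hidden before delivery
    allowed_destinations: set = field(default_factory=set)  # where the team may send data

    def subscribe(self, events):
        """Return only matching events, with sensitive fields masked."""
        visible = []
        for event in events:
            if self.event_filter(event):
                visible.append({
                    key: ("****" if key in self.masked_fields else value)
                    for key, value in event.items()
                })
        return visible

    def send(self, destination, events):
        """Refuse any destination the admin has not assigned to this project."""
        if destination not in self.allowed_destinations:
            raise PermissionError(f"{self.name} may not send to {destination}")
        return len(events)  # stand-in for an actual delivery


payments = Project(
    name="payments",
    event_filter=lambda e: e["app"] == "payments",
    masked_fields={"user"},
    allowed_destinations={"payments-analytics"},
)

events = [
    {"app": "payments", "user": "alice", "status": 200},
    {"app": "inventory", "user": "bob", "status": 500},
]

visible = payments.subscribe(events)
sent = payments.send("payments-analytics", visible)
```

The point of the sketch is that the filtering, masking, and destination rules are set by the admin around the team, so the data expert works only with `visible` and never touches the raw feed or the rest of the Worker Group.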
Cribl’s product architecture is designed to facilitate multi-team engagement through varying degrees of data and configuration isolation, with the added benefit of stronger data security and scalability.
You can isolate entire environments with Workspaces in Cribl.Cloud, isolate teams within environments with Stream Worker Groups, and finally, isolate specific data feeds and workflows with Stream Projects.
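One way to picture how the three levels nest is as a path of the form workspace, then Worker Group, then Project, where a grant at any level covers everything beneath it and nothing beside or above it. The Python sketch below is a deliberately simplified mental model, not Cribl's actual permission system: the `can_access` function, the tuple-based scopes, and the example names are all hypothetical.

```python
def can_access(grant, resource):
    """A grant covers a resource when the grant's scope is a prefix of the resource path."""
    return resource[:len(grant)] == grant


# Hypothetical scopes, written as (workspace, worker_group, project).
workspace_admin = ("emea",)                   # admin of one Workspace
group_admin = ("emea", "edge")                # Group Admin of one Worker Group
data_expert = ("emea", "edge", "payments")    # member of one Project

checks = {
    "workspace admin reaches a project": can_access(
        workspace_admin, ("emea", "edge", "payments")),
    "group admin reaches own project": can_access(
        group_admin, ("emea", "edge", "payments")),
    "group admin reaches sibling group": can_access(
        group_admin, ("emea", "core")),
    "data expert reaches group settings": can_access(
        data_expert, ("emea", "edge")),
    "data expert reaches other workspace": can_access(
        data_expert, ("amer", "edge", "payments")),
}

for description, allowed in checks.items():
    print(f"{description}: {allowed}")
```

Under this model, the first two checks succeed and the last three fail, which mirrors the intent of each level: a Workspace bounds the environment, a Worker Group bounds a team, and a Project bounds a single team's data flows.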