Searchception! Iterative Search Through Prior Search Results

March 19, 2024

An analyst’s process often involves searching through a given set of data many times, refining the question and the analytics performed each time. Cribl Search was originally designed to be stateless: every execution ran the search against the original dataset provider(s). A new feature now allows searching into previously cached results, which speeds up certain types of iterative search development.

Background and Motivation

All searches in Cribl Search generate result sets, which are cached by Search. Clicking on the History tab will show you the list of Search jobs executed by the current user. Each ad-hoc Search has a JobId, indicated in one column on the History panel. Saved and Scheduled Searches also have a Search Name, since that’s part of their configuration.

Clicking the row of an executed Search displays its previous results. Importantly, that Search DOES NOT re-execute; rather, the cached results of the previous execution are displayed. As a result, those results display very quickly, since they have already been fetched from the original source and any transformations have already been applied.

Due to the iterative nature of Search construction, it’s often useful to start with a dataset sample and then add incremental Search operators that perform transformations, summarizations, arithmetic and statistical functions, and data enrichment. Traditionally, when the Analyst adds one or more elements to their Search pipeline, they re-execute the Search, incurring both time and cost in Cribl Credits.

Beginning with Search 4.4.4, users can reuse previous Search results as the source for new Searches using the virtual table $vt_results.

How It Works

Virtual tables are built-in ways to access certain types of metadata internal to Cribl Search. Each one begins with $vt_ , and they each display different types of metadata. $vt_list shows all of the Virtual Tables your user has access to.

You can use a virtual table name in place of a dataset name in your search to see the contents of that virtual table. The simple search dataset=$vt_list will show you all of your Virtual Tables, and one of them is $vt_results. That virtual table requires one of two parameters: jobId or jobName. As mentioned above, jobId can be used for any ad-hoc Search, and jobName can be used with Saved or Scheduled Searches. Searching into $vt_results without specifying either parameter generates an error result set reminding you of those two parameters.

Examples

The easiest way to experiment with $vt_results is to run a simple Search. Let’s use the built-in dataset that comes with Cribl Search: cribl_search_sample. There are three dataSources that are currently pre-populated in cribl_search_sample: syslog, VPC Flowlogs, and web server access_common logs. For this example, we can choose any of those data types, so let’s get a quick sample of access_common log data.

dataset=cribl_search_sample dataSource=access_common | limit 10000

Note that this retrieves a large sample of those logs: 10,000 events. Depending on your Internet connection speed, that should take 10-20 seconds to capture and download. Once the search is finished, click the History tab and choose the search row at the top. This displays the full result set but should take only a second or two, much faster than the original search. Then click the “Metrics” tab and highlight the Search Job ID in the upper-left corner of the Details modal. Press Cmd+C (Mac) or Ctrl+C (Windows) to copy the Job ID.

Next, let’s create a new search using the $vt_results virtual table with the Search Job ID we copied from the Details modal.

dataset=$vt_results jobId=<copied_job_id>

This search will immediately return the exact results from the prior search. This time, however, you can append more Operators to the Search pipeline. For example:

dataset=$vt_results jobId=<copied_job_id> | summarize count() by status

You can also try:

dataset=$vt_results jobId=<copied_job_id> | summarize count() by request

This Search should run almost instantaneously. One important note: the time picker in the Search UI is ignored when using $vt_results.
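Grouping by both fields at once is a natural next iteration on the same cached results. The line below is only a sketch: it assumes summarize accepts a comma-separated list of grouping fields, as similar query languages do, and it reuses the placeholder job ID from above.

dataset=$vt_results jobId=<copied_job_id> | summarize count() by status, request

If that assumption holds, each row of the output should pair one status value with one request value and its event count, which makes it easier to pick interesting combinations to drill into.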

Now let’s take a look at the raw events, using two of the values from those previous searches.

dataset=$vt_results jobId=1708317608676.YpHC8L (status=400 and request=GET*)

Again, the results should come back quickly. Once you’ve established that this is the exact Search you want, you can go back and substitute the original dataset parameters, without the result limit, so that a true time range becomes your primary filter. As a reminder, it’s always good practice to first pipe an unknown result set to | count to ensure you’re not asking to return more results to your browser than your results limit allows.

dataset=cribl_search_sample dataSource=access_common (status=400 and request=GET*) | count

Once you understand how many events there will be, you can remove the count at the end of the pipeline.

dataset=cribl_search_sample dataSource=access_common (status=400 and request=GET*)
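The same count check can be run against cached results before pulling raw events from them, which may be useful when the cached sample is itself large. A sketch that combines only the pieces shown earlier, where <copied_job_id> is a placeholder for your own Job ID:

dataset=$vt_results jobId=<copied_job_id> (status=400 and request=GET*) | count

Because this reads from the cache, it should not touch the original dataset provider.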

Usage with Saved / Scheduled Searches

All of the previous examples use jobId to identify previous job results, because the system automatically caches your search job results under a generated Job ID. However, if you want to refer to a Search that you have saved explicitly using the Save action, you can refer to it by its saved name. For example, let’s create a new search and save it after it executes:

dataset="cribl_search_sample" dataSource=*vpc* | limit 1000

Click on Actions… and then Save Search. Enter a name like “test_flowlog_sample” and click on Save.

Since you just ran this search, you should be able to execute a $vt_results search that refers to it by name:

dataset=$vt_results jobName="test_flowlog_sample"

This requires that the saved search has been executed at least once and that its results haven’t been deleted by the automatic cleanup mechanism specified in Settings->Limits->Search History TTL.
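As with jobId, further operators can be appended to a search that uses jobName. A minimal sketch, assuming the saved search name “test_flowlog_sample” from this example and using only the limit operator shown earlier, previews a handful of the cached flow log events:

dataset=$vt_results jobName="test_flowlog_sample" | limit 10

If the cached results have expired, re-running the saved search should repopulate them.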

If you take this one step further and specify a Schedule for this search, you can guarantee that it has been executed at least once. When you search for a Scheduled Search by name, you will get the results from the most recent execution.

Named $vt_results datasets can be used in ad hoc searches and dashboards. One advantage of using $vt_results in dashboards backed by scheduled searches is that those dashboards execute faster. As of Search v4.5.0, the “Parent Search” feature in Dashboards allows this technique to be used transparently within a Search panel, without explicitly using $vt_results in your search.

One final note for advanced users: it’s possible to combine the results of multiple previous Searches simply by specifying multiple jobId or jobName parameters with a boolean or. For example, you could specify:

dataset=$vt_results (jobId=<job_id_1> or jobId=<job_id_2>)

This syntax creates the union of both result sets and allows you to search into them.
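The combined result set can be piped onward like any other. As a sketch, assuming both prior jobs returned access_common events with a status field (<job_id_1> and <job_id_2> are placeholders for your own Job IDs):

dataset=$vt_results (jobId=<job_id_1> or jobId=<job_id_2>) | summarize count() by status

This should count status values across both cached result sets in a single pass.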

Conclusion

Using $vt_results enables analysts to iterate quickly without re-executing the base search. It’s helpful to be able to grab a small sample of data from a dataset and then experiment with the structure of your Search. This allows fast iteration and avoids both the time and expense of additional Searches against the original dataset providers. In addition, you can use $vt_results, or the Parent Search feature available as of v4.5.0, in inline searches in Dashboards to provide fast access to Scheduled Search results with customized additional transformations.


 
