Screenshot 2019-09-18 21.35.56

Extracting Timestamps from Messy Logs

Last edited: February 14, 2020

In this post we’ll walk through how to use the Auto Timestamp function to extract time fields from:

  • Events with multiple timestamps

  • Events with timestamp offset instead of full timestamps

Events with multiple timestamps

Assume we have events that look like this:

Code example
Sep 20 12:03:55 PA-VM 1,2019/09/20 13:03:58,CRIBL,TRAFFIC,end,2049,2019/09/20 14:03:58,314.817.108.226,10.0.0.102,314.817.108.226,10.0.2.65,cribl,,,incomplete,vsys1,untrusted,trusted,ethernet1/3,ethernet1/2,log-forwarding-default,2018/09/20 13:03:58,574326,1,53722,8088,53722,8088,0x400064,tcp,allow,296,296,0,4,2018/09/20 13:03:45,7,any,0,730277,0x0,United States,10.0.0.0-10.255.255.255,0,4,0,aged-out,0,0,0,0,,PA-VM,from-policy,,,0,,0,,N/A,0,0,0,0

Notice how there are three timestamps, with two different formats and all showing different times:

Timestamp-1: Sep 20 12:03:55, Timestamp-2: 2019/09/20 13:03:58Timestamp-3: 2019/09/20 14:03:58

When the Auto Timestamp function is used with default settings, it will inspect the first 150 bytes (configurable) of the event and extract the first valid timestamp that it encounters. In this case, it will be Timestamp-1.

example1-auto-1024x289

But, what if we wanted to extract Timestamp-3? We have two options:

  • Change function’s Scan Offset to tell it to start looking for a valid timestamp deeper into the event. This will require precise knowledge of the location of the timestamp. E.g.,

example1-offset-1024x289
  •  If the exact location is not known, then we can use Add Timestamp to define a regex to capture a timestamp pattern and provide corresponding timestamp format. E.g,

example1

Events with timestamp offsets instead of full timestamps

Assume we have a dataset that look like this:

Code example
0.012 2020-02-05 17:35:53; Factorio 0.18.3 (build 49258, win64, alpha) 0.013 Operating system: Windows 10 (build 18363) 0.013 Program arguments: "E:\GOG\Factorio\bin\x64\factorio.exe" "--force-opengl" 0.013 Read data path: E:/GOG/Factorio/data 0.013 Write data path: C:/Users/Nate/AppData/Roaming/Factorio [626596/1907076MB] 0.013 Binaries path: E:/GOG/Factorio/bin 0.032 System info: [CPU: AMD Ryzen 7 2700X Eight-Core Processor, 16 cores, RAM: 6553/16310 MB, page: 9579/24406 MB, virtual: 4251/134217727 MB, extended virtual: 0 MB] 0.032 Display options: [FullScreen: 1] [VSync: 1] [UIScale: custom (150.0%)] [Native DPI: 1] [Screen: 255] [Special: lmw] [Lang: en] 0.036 Available displays: 1 0.036 [0]: \\.\DISPLAY1 - NVIDIA GeForce GTX 1070 {0x05, [0,0], 2560x1440, 32bit, 59Hz} 0.684 Initialised OpenGL:[0] GeForce GTX 1070/PCIe/SSE2; driver: 3.3.0 NVIDIA 432.00 0.684 [Extensions] s3tc:yes; KHR_debug:yes; ARB_clear_texture:yes, ARB_copy_image:yes 0.684 [Version] 3.3 0.684 Verbose GraphicsInterfaceOpenGL.cpp:896: [Caps] Tex:32768, TexArr:2048, TexBufSz:131072kB; TexUnits:192; UboSz:64kB 0.684 Graphics settings preset: very-high 0.684 Dedicated video memory size 8192 MB 1.095 Verbose PipelineStateObject.cpp:85: Time to load shaders: 0.411126 seconds. 1.149 Desktop composition is active. 1.150 Graphics options: [Graphics quality: high] [Video memory usage: all] [Light scale: 100%] [DXT: high-quality] [Color: 32bit] 1.150 [Max threads (load/render): 32/16] [Max texture size: 0] [Tex.Stream.: 0] [Rotation quality: normal] [Other: STDC] [B:0,C:0,S:100]

Notice how only the first line has a full timestamp, all other lines have offsets off of that.

To properly assign accurate timestamp to events that follow Event 1 (Line 1) we have to use a cool property of Auto Timestamp (use Last Event’s Time) and then two other functions, namely Regex Extract and Eval.

  • Configure Auto Timestamp  to use Last Event’s Time if it can’t extract a valid timestamp. This effectively means that all events will “inherit” Event 1’s time.

auto-1000x375
  • Use Regex Extract to extract the offset of all events, except for Event 1. Notice the negative lookahead in the Regex field.

auto-regex-1000x420
  • Use Eval to add the offset to the _time field of each event. Convert it to number/float first.

auto-regex-eval-1000x419

The timeoffset field can be Removed in Eval or an internal field can be used instead. E.g. __timeoffset.

 

If you’re trying to deploy LogStream on your environment or have any questions please join our community Slack– we’d love to help you out. If you’re looking to join a fast-paced, innovative team drop us a line at hello@cribl.io– we’re hiring!

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl’s suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

More from the blog

get started

Choose how to get started

See

Cribl

See demos by use case, by yourself or with one of our team.

Try

Cribl

Get hands-on with a Sandbox or guided Cloud Trial.

Free

Cribl

Process up to 1TB/day, no license required.