It’s time to throw another birthday party for the Cribl Sandbox. The platform turns 4 on Nov 25! 🎉 Crazy how the time flies. Take a trip down memory lane with us as we recall how we leveled up the Sandbox platform after our earlier challenges. This is the first in a series of (upcoming) blogs detailing the Sandbox architecture.
In the Beginning There was a Git Log
Here’s the first git log from our Cribl Sandbox repo:
brendan cribl-sandbox % git log $(git rev-list --max-parents=0 HEAD)
commit de467bee72cf45d69f8c6e3e5d638d2b1ff72d9b
Author: Clint Sharp <redacted>
Date: Mon Nov 25 12:13:25 2019 -0800
initial commit
Since we launched, things have definitely changed. Initially, we compiled all Sandbox assets into a single Docker image and ran everything on the Amazon ECS Fargate platform. It was revolutionary. It was amazing. Sandboxes, however, took 5-10 minutes to start from cold boot. DNS issues plagued us. Ultimately, it wasn’t the ideal user experience we envisioned.
At this point, we were trying various things to improve speed, performance, and the internal developer experience. Somebody suggested Kubernetes, and we were off to the races rearchitecting the platform. Today, the platform runs on a few EC2 instances that make up the EKS cluster.
Our goal with the Sandbox platform is to keep the barrier to entry low. What’s the point of making your product hard to use and learn? Unfortunately, this mindset comes at a price in terms of security. Believe it or not, two years ago you could launch a Cribl Sandbox by just putting any email address in the form! That led to crypto mining bots taking advantage of the Sandbox platform. (Egress traffic filtering, what’s that?) After racking up a five-figure Sandbox bill in one month, we implemented an email-based one-time passcode system to ensure that email addresses were valid and that the requester was actually human.
Who Wants to Be a Millionaire? A Million Codes, That Is…
Did you know that six digits give you one million possible codes? We’ve found this to be pretty effective at keeping the riffraff out of the Sandbox, along with some pretty strict firewall rules and a WAF to prevent misbehaving folks from accessing the API.
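To make the arithmetic concrete, here is a minimal Python sketch of drawing a six-digit code and estimating how likely a guesser is to hit it. It is an illustration only, not the Sandbox's actual generator; the function names and the use of Python's `secrets` module are assumptions.

```python
import secrets

CODE_DIGITS = 6  # six digits -> 10**6 = 1,000,000 possible codes


def generate_code(digits: int = CODE_DIGITS) -> str:
    """Return a zero-padded numeric code drawn uniformly from 0 .. 10**digits - 1."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def guess_probability(attempts: int, digits: int = CODE_DIGITS) -> float:
    """Chance that `attempts` distinct guesses include the one correct code."""
    total = 10 ** digits
    return min(attempts, total) / total


if __name__ == "__main__":
    print(generate_code())
    print(guess_probability(5))
```

Zero-padding matters here: without it, codes below 100000 would come out shorter than six digits, and the usable code space would quietly shrink.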
So, how do you quickly implement an email verification system? Use commercial software that does this for you! Cribl.Cloud was being built at the time, so we decided to use its SendGrid account.
We layered Twilio into the mix, et voilà, verification codes were sent and verified before the Sandboxes were started.
For historical reference, this is what Sandbox verification emails looked like:
Functional, but not elegant. I must lean into the screen to read that code even with my glasses on. Using Twilio and SendGrid also created additional headaches for the Technical Marketing Engineering (TME) team. TME is a separate organization from Cribl Engineering, which develops and ships the Cribl software and the Cribl.Cloud platform, so we don’t have administrative access to SendGrid and Twilio. It was nearly impossible to troubleshoot email delivery issues, and users were frustrated when they didn’t receive codes. Twilio also charges $0.10 per verification code. In the grand scheme of things, this isn’t a huge amount, but it starts to add up every month with hundreds of Sandbox users per week.
So we asked ourselves, what if we could do it better, more cost-effectively, and improve visibility into the email delivery process simultaneously? Time to level up!
We built our verification code system from the ground up using Amazon Simple Email Service (SES) for email delivery, DynamoDB for temporary code storage, and nodemailer to format emails and send them through SES. And with the magic of editing, it only took a few minutes!
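The "temporary code storage" part of that design can be sketched in a few lines. The Python below is a hedged illustration, not our production code: an in-memory dictionary stands in for DynamoDB, and the `CodeStore` class, the 10-minute lifetime, and the single-use rule are all assumptions made for the example.

```python
import hmac
import secrets
import time
from typing import Callable, Dict, Tuple

CODE_TTL_SECONDS = 600  # hypothetical 10-minute lifetime; the post does not state one


class CodeStore:
    """In-memory stand-in for a table of temporary verification codes."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # email (lowercased) -> (code, expiry as epoch seconds)
        self._items: Dict[str, Tuple[str, float]] = {}

    def issue(self, email: str) -> str:
        """Create a fresh code for this address, replacing any earlier one."""
        code = f"{secrets.randbelow(10 ** 6):06d}"
        self._items[email.lower()] = (code, self._clock() + CODE_TTL_SECONDS)
        return code

    def verify(self, email: str, submitted: str) -> bool:
        """Check a submitted code; the stored code is consumed either way."""
        item = self._items.pop(email.lower(), None)
        if item is None:
            return False
        code, expires_at = item
        if self._clock() > expires_at:
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        return hmac.compare_digest(code.encode(), submitted.encode())


if __name__ == "__main__":
    store = CodeStore()
    sent = store.issue("someone@example.com")
    print(store.verify("someone@example.com", sent))
```

Consuming the code on every attempt is a deliberately strict choice for this sketch: one wrong guess forces a new email, which keeps brute-forcing a million-code space impractical. A real table would more likely lean on a per-item expiry attribute than on an in-process clock.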
The benefit of switching to nodemailer is that we have complete control over the message template. We took this as an opportunity to improve the aesthetics and usability of the message. Here’s what an email looks like today:
Much improved! Cribl branding, check. Readable code, check. Email address for help, check. Moving to SES, we can now obtain feedback notification messages through SNS topics. We’ve configured the SNS topic to invoke a very simple Lambda function that takes the feedback data and writes it to a new object in an S3 bucket. Feedback data is stored in an ordered prefix structure:
/ses/${domain}/${year}/${month}/${date}/${hour}/${notification_type}-${epochmillis}.json
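For a sense of how little the Lambda has to do, here is a rough Python sketch of the two interesting steps: unwrapping the SES notification from the SNS event and building the object key. It is not our actual function. The helper names are made up, taking the domain from the sender address in `mail.source` is an assumption, and so are the zero-padded UTC date parts, the receive-time timestamp, and dropping the leading slash from the key. The S3 write itself is left out to keep the example dependency-free.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Iterator


def iter_ses_notifications(event: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield each SES notification carried in an SNS-triggered Lambda event.

    SNS delivers the SES notification as a JSON string in Records[].Sns.Message.
    """
    for record in event.get("Records", []):
        yield json.loads(record["Sns"]["Message"])


def build_object_key(notification: Dict[str, Any], received_at: datetime) -> str:
    """Build ses/<domain>/<year>/<month>/<date>/<hour>/<type>-<epochmillis>.json."""
    # Assumption: "domain" is the domain part of the sending address.
    source = notification["mail"]["source"].strip("<> ")
    domain = source.rsplit("@", 1)[-1].lower()
    ts = received_at.astimezone(timezone.utc)
    # Integer arithmetic avoids float rounding in the millisecond value.
    epoch_millis = int(ts.timestamp()) * 1000 + ts.microsecond // 1000
    kind = notification["notificationType"]
    return (
        f"ses/{domain}/{ts:%Y}/{ts:%m}/{ts:%d}/{ts:%H}/"
        f"{kind}-{epoch_millis}.json"
    )


def handler(event: Dict[str, Any], context: Any = None) -> list:
    """Return the keys that would be written; a real handler would put each
    notification to S3 under its key instead of just collecting the names."""
    now = datetime.now(timezone.utc)
    return [build_object_key(n, now) for n in iter_ses_notifications(event)]
```

The ordered prefix is what makes the data cheap to search later: a query for one domain and one day only has to list the objects under that path.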
Stepping up our Observability Game
With the introduction of Dashboards in Cribl Search this summer, it’s never been easier to visualize data. Our TME team used this feature addition to build a simple dashboard providing near-real-time visibility of email code delivery from Amazon SES. We’ll show you what that looks like below. With our data stored in S3 as JSON objects, all we needed to do was add a new Data source with the correct bucket path token syntax:
And with that, it is now as easy as writing some searches! Here are the actual queries powering the dashboard we built:
Total number of messages:
dataset="cribl_sandbox_email_logs" domain="sandbox.cribl.io" | summarize count=count()
Total number of messages by type:
dataset="cribl_sandbox_email_logs" domain="sandbox.cribl.io" | summarize Status=count() by notificationType
Top delivery domains:
dataset="cribl_sandbox_email_logs" domain="sandbox.cribl.io" notificationType="Delivery" | extend recipient=delivery.recipients.0 | extract source="recipient" type=regex @'@(?<recipient_domain>.+)$' | summarize count=count() by recipient_domain | top 10 by count
Get bounce reasons:
dataset="cribl_sandbox_email_logs" domain="sandbox.cribl.io" notificationType="Bounce" | extend recipient=mail.destination.0, reason=bounce.bouncedRecipients.0.diagnosticCode | project recipient, reason | render table
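If the query syntax is new to you, the last two searches boil down to something like the Python below. This is only an illustration of what the pipelines compute over SES notification objects, not how Cribl Search executes them; the function names are invented for the example.

```python
import re
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Same pattern as the extract step, in Python's named-group syntax.
DOMAIN_RE = re.compile(r"@(?P<recipient_domain>.+)$")


def top_delivery_domains(
    events: Iterable[Dict[str, Any]], limit: int = 10
) -> List[Tuple[str, int]]:
    """Count Delivery notifications by the domain of the first recipient."""
    counts: Counter = Counter()
    for event in events:
        if event.get("notificationType") != "Delivery":
            continue
        recipient = event["delivery"]["recipients"][0]
        match = DOMAIN_RE.search(recipient)
        if match:
            counts[match.group("recipient_domain")] += 1
    return counts.most_common(limit)


def bounce_reasons(
    events: Iterable[Dict[str, Any]]
) -> List[Tuple[str, Optional[str]]]:
    """Return (recipient, diagnosticCode) for each Bounce notification."""
    rows = []
    for event in events:
        if event.get("notificationType") != "Bounce":
            continue
        recipient = event["mail"]["destination"][0]
        reason = event["bounce"]["bouncedRecipients"][0].get("diagnosticCode")
        rows.append((recipient, reason))
    return rows
```

Both functions look only at the first recipient, mirroring the `.0` index in the queries. That fits verification emails, which go to a single address.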
All put together, it looks like the screenshot below. We’ve even taken advantage of the new Dashboard Inputs functionality which makes inspecting different time ranges and data sets easy peasy lemon squeezy.
From zero visibility to full visibility with Cribl Search. 🙌 Our data has helped us identify that verification codes are being delivered to multiple users whose email filters then block them. Bounced messages are logged with full explanations from SES, which allows us to reach out to users proactively instead of waiting for complaints.
Oh, and this whole setup costs us less than $0.25 per month to run. Excelsior!
Goat-bye For Now
We’ll be back with more blogs detailing the Sandbox architecture. We may be a little biased, but we think it’s a pretty neat platform. How many other companies encourage you to tool around in fully functioning free environments? There’s so much to cover, though, we’ll need… an entire series!
If you haven’t tried out our Sandboxes, or it’s been a while since you’ve taken a look, we encourage you to give them a try. We’re constantly updating them with new content as we add features to the Cribl product suite. Join us on our Community Slack if you would like to share your experience with our Sandboxes!
Sign up for a free Cribl.Cloud account and let us help you get control of and better visibility from your data – no matter what data store or format it’s in!