Data Observability: POC, Implementation and Process with Monte Carlo

A modern data stack has a number of components that you can’t live without. Until about a month ago, I’m embarrassed to say, I had never really thought about data observability, except perhaps in a different form such as “notification and monitoring”. After a discussion with my friends over at Monte Carlo, I was excited to dig deeper into it.

First, let us level set on the definition of Data Observability. Monte Carlo defines observability as:

Data Observability refers to the monitoring, tracking, and triaging of incidents to prevent downtime.

Tactically, I agree that this definition is appropriate. However, I would suggest a more holistic definition that fits the “why” our business is interested in:

Data observability refers to the ability of a business to completely observe its data pipelines, from ingestion to reporting, so that confidence in data integrity and consistency remains high. Data observability also gives the business the ability to view and respond to data opportunities as they happen, keeping data up to date and improving customer service.

We are currently implementing a POC with Monte Carlo to see how much we can leverage the platform. In the last 12 months, we have built an entire data infrastructure from the ground up (including the team). Now we are delivering automated reports at full force, as well as a full business intelligence platform with Looker.

In a previous article (which I need to update), I talked about the 4 Pillars of Analytics. As my career has grown and my responsibilities have increased, I have realized that this is just a subset of the entire view of an analytics infrastructure. As I have learned as a CTO and now as VP of Analytics at Shiftkey, there are more things to think about than just the executable stack. The pillars are still correct (Collection, Validation, Enhancement and Accessibility), but stepping back as a leader who is accountable for the platform, you need to understand the modern stack approach to engineering infrastructure. Ask yourself these questions:

  1. How will I know if something is wrong in the pipeline (availability, quality, etc)?
  2. When will I know if something is wrong in the pipeline?
  3. How is my analytics infrastructure being used and how often?
  4. What opportunities do I have to remove technical debt and complexity?
  5. If I make a change, do I know what is affected downstream of that change?
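
As a rough illustration of questions 1, 2 and 5, here is a minimal Python sketch of a freshness check and a downstream impact lookup. This is not Monte Carlo's API or how the platform works internally; the table names, the lineage map and both helper functions are hypothetical and exist only to make the questions concrete.

```python
from collections import deque
from datetime import datetime, timedelta

# Hypothetical lineage map: each asset points to the assets built directly from it.
LINEAGE = {
    "raw_events": ["stg_events"],
    "stg_events": ["fct_events"],
    "fct_events": ["weekly_report", "bi_dashboard"],
}


def is_stale(last_loaded_at, now, max_age):
    """Return True when a table has not been refreshed within max_age."""
    return now - last_loaded_at > max_age


def downstream_of(table, lineage):
    """Return every asset reachable from `table`, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(table, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited through another path
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 7, 0)
    last_load = datetime(2024, 1, 1, 0, 0)
    # Questions 1 and 2: is something wrong, and when did it go wrong?
    if is_stale(last_load, now, max_age=timedelta(hours=6)):
        print("stg_events is stale; last load was", last_load)
    # Question 5: what is affected downstream of a change?
    print(downstream_of("stg_events", LINEAGE))
```

In practice an observability platform would derive the lineage and the freshness thresholds from metadata and query logs rather than from a hand-written dictionary, which is a large part of the appeal.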

I used to think of these items as just the table stakes of monitoring and notifications, but they are much more than that. Your team needs that full view, both to onboard new people quickly and to understand the impact of the changes they are making. Observability gives you the ability to make more informed decisions quickly. Isn’t that what data is all about?

Next Steps

In the coming weeks, I will drill into my POC with Monte Carlo and write about my experience and process so you can get a peek behind the curtain. I’ll also try to explain my thought process for each phase, with one article for each of the following topics:

Initial Implementation (40m)

Digging in: Data Observability: Monte Carlo > First Observability Functionality

Digging in: More refined use cases

Learning AI: How the platform learns about us

Observability: What have we learned about ourselves

Observability: A-Ha moments

POC: Overall Impressions