Building a New Data Platform: Weeks 6-8

Welcome to the next entry in my series chronicling the journey of building a new data platform, week by week. This project is not just about the technical aspects of platform development but also about the business insights, stakeholder interactions, and the tools that make it all possible. I invite you to come along for the ride.

I recently joined Gallo Mechanical as their VP of Analytics. Gallo is a national leader in the construction services industry, and from day one of speaking to them, I knew this was a place I wanted to be. The success of Gallo is a testament to the strength and focus of its leadership, family, and employees. I'm excited to help bring a data and analytics mindset to the day-to-day and strategic operations of the business.


In case you missed it, here are the previous weeks:
Week 1 – Laying the foundation
Week 2 & 3 – Foundation & Planning
Week 4 – Data Modeling
Week 5 – More Modeling

Let’s dive into Weeks 6-8.

Where has my time gone these last few weeks?

BI Platform, Modeling, Architecture Updates and Stakeholder Interviews

Structure

The more I get into things here at Gallo, the more I realize I was overcomplicating the data structure. There are really only a few entities that need to be at the core of everything. We do have three separate companies that all use the same input systems in different ways (more on that in the future).

As I looked through the data models I had built, and after more conversations with stakeholders, I had to rework all the tables and remove the joins I had included. One thing you will see is that, many times, you should let the BI platform handle those joins.

Here are a few items to note if you are leveraging Snowflake for your warehouse.

  1. Snowflake does not enforce primary and foreign key constraints, so keys aren’t inherently available on ingested data. Depending on the source, you might need to define these keys (primary and foreign) on your tables yourself. I went through and added these keys manually.
  2. The BI platform you choose might leverage these keys to set up the joins.

Here is an example:

-- SET PRIMARY KEY
ALTER TABLE [database].[schema].[table] ADD PRIMARY KEY ([column]);

-- Company ID
ALTER TABLE [database].[schema].[table]
ADD FOREIGN KEY ([column]) REFERENCES [database].[schema].[table]([column]);
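
Once the constraints are added, Snowflake can report them back to you with its SHOW commands. A quick sanity check might look like this (the table name is a placeholder):

```sql
-- Verify the constraints were created (placeholder names)
SHOW PRIMARY KEYS IN TABLE my_db.my_schema.my_table;

-- Foreign keys defined on this table that reference other tables
SHOW IMPORTED KEYS IN TABLE my_db.my_schema.my_table;
```

This is also a useful check before pointing a BI tool at the schema, since a missed key means a missed join.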

BI POCs (Thoughtspot vs Sigma)

It is critical that you choose the platform that is right for your company, NOT what you are used to. Get a feel for the people who are going to be using it. That defines success.

I ran a POC for each of these platforms. My initial thought was that we are a company that uses Excel heavily, and Sigma looked like a good choice to bridge that gap. If you write good SQL and want the familiarity of Excel to drive data forward, Sigma is a great choice. It was really easy to understand and to build out dashboards. In about 6 hours of work, I was blown away by how much I got done, having never used it before. The sales and support teams were excellent and were there for any questions I had.

The Thoughtspot team was excellent as well. The resource was a little more hands-on. The platform itself was more feature rich and leverages Natural Language Processing (NLP) through the Sage interface. For the business users, this is a killer feature. They could ask questions like “what is the profit, by division, in Texas for the last 6 months?” I was impressed by the results. Having tried, and failed, to do this with another (more expensive) product, it was crazy to see the results with so little setup. Setting up the data was a little more involved but made for a much better end result.

In the end, after meeting with stakeholders, they really want to move beyond Excel, so that part of it was a non-factor. The rest of the decision was based on the structure of the data and confidence in a more methodical process. We are going with Thoughtspot. If I had wanted to move quickly, Sigma would have been the better choice, but it would have meant more work later on. Writing SQL for every view, including the joins, would have been fast but less scalable. The pricing model for Thoughtspot was also much more transparent. The Essential plan was exactly what we needed, and the price was right there on the site.

Focus

Beyond the POCs, my focus has been setting up the correct data structures. I've settled on ingesting the raw data, creating the core tables (incremental loads), and then building views for current state. This is different from how I set things up at Shiftkey because of the way the data is stored in the VISTA platform.
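
As a rough sketch of the incremental-load step in Snowflake SQL (all table and column names below are hypothetical, not my actual schema), the core table can be maintained with a MERGE that only picks up rows newer than what it already holds:

```sql
-- Incrementally load a core table from the raw layer (hypothetical names)
MERGE INTO core.jobs AS tgt
USING (
    SELECT job_id, company_id, status, updated_at
    FROM raw.vista_jobs
    -- Only rows changed since the last load
    WHERE updated_at > (SELECT COALESCE(MAX(updated_at), '1900-01-01') FROM core.jobs)
) AS src
ON tgt.job_id = src.job_id
WHEN MATCHED THEN UPDATE SET
    company_id = src.company_id,
    status     = src.status,
    updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (job_id, company_id, status, updated_at)
VALUES (src.job_id, src.company_id, src.status, src.updated_at);
```

The raw layer stays untouched, so the core tables can always be rebuilt from scratch if the logic changes.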

I also spent time with the main stakeholders, including a very long meeting with the Finance team to understand how the data is input. This meeting was eye-opening and caused a number of pretty substantial changes to my data models. It took me almost a week to rework the data tables.

I have also been setting up the views for Thoughtspot to leverage.
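
A current-state view on top of a core table might look like the following (again, names are illustrative). Note that it deliberately contains no joins, since the idea is to let Thoughtspot handle those via the keys:

```sql
-- Latest row per job, exposed to the BI layer (illustrative names)
CREATE OR REPLACE VIEW analytics.v_jobs_current AS
SELECT job_id, company_id, status, updated_at
FROM core.jobs
-- Keep only the most recent record for each job
QUALIFY ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY updated_at DESC) = 1;
```

Snowflake's QUALIFY clause makes this kind of "latest record" filtering a one-liner without a subquery.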

Where we are…

Core Platform: Snowflake
Ingestion: Portable.io
Aggregation: Make.com (short term until summer)
: why not dbt, you ask? Because it's time consuming and would delay me getting anything out until late summer/fall 2024
BI: Thoughtspot
Data Observability / Monitoring: TBD

Action Summary

  • Reporting platform POCs, selection, and implementation
  • Reworked core aggregation tables
  • Built views to be leveraged in Thoughtspot
  • Stakeholder interviews

Looking Ahead

Now I just need to make sure the contracts for Snowflake and Thoughtspot get signed this week. It’s always good to get these done before end of quarter. I have a conference in Orlando coming up at the end of January, so my next week will be spent building out all of the views that the reporting platform will leverage.


Join me next week as we delve deeper into the intricacies of platform development and witness how a concept gradually transforms into a tangible, functioning entity. Your thoughts and suggestions are always welcome, so feel free to email me your comments at john@slingspace.com.