The most common question I get from people who have, or are building, a data team is “Where do I start?” I’ve created this series of items as a blueprint for building and running a successful team. This is not about the people; that is a post for another day. The focus of this post is the process and components outside of the people.
Data Strategy
Everything starts with the plan and the goal. Before anything else, you need to define both and present them to senior leadership to gain support and establish confidence.
Setting up a data strategy is crucial for any organization aiming to leverage data effectively to drive decision-making, enhance operations, and achieve business goals. Here’s a concise guide to creating a robust data strategy:
- Define Objectives: Clearly outline the organization’s goals and how data can support them. Identify specific key performance indicators (KPIs) to measure success. Defining these requires working with stakeholders on the high-level business objectives for the company. If you have aspirations for advanced data components (generative AI, LLMs, etc.), then outline the business cases and the changes you expect will be needed to support these initiatives.
Where it goes wrong: Defining metrics that do not align with company objectives and building KPIs on quantity of work. Don’t do this. Focus on what drives business outcomes. For AI, teams that just talk about technology with no understanding of how it provides value are a huge waste of resources.
- Data Governance: Establish data governance policies to ensure data quality, security, and compliance. Designate data stewards and define data access rights to maintain data integrity. Select tools that allow you to maintain ongoing governance, and review your plans quarterly.
Where it goes wrong: Establishing the policy but not sticking to it. Relying solely on people to monitor and enforce it is not scalable. You need stewards and accountability.
- Data Collection: Identify relevant data sources, both internal and external, that align with your objectives. Implement data collection methods and tools while considering data privacy and consent.
Where it goes wrong: Collecting everything is not the way. I used to feel like it was super important to collect everything, but that taxes your data team, the infrastructure, and your relationship with engineering. Instead, find the data points that matter, work on a general set of contracts, and start small.
- Data Storage and Infrastructure: Determine suitable data storage solutions, such as data warehouses or cloud-based services. Ensure scalability and consider factors like data volume and velocity. Decide if you want to begin with a simple system and work your way toward more semantic processes. I’d recommend (if money allows) implementing a data platform like Databricks that can accelerate your growth. Also consider data science components and how they will fit into the workflow.
Where it goes wrong: Not building a data lake or data warehouse from the start or as soon as possible. I’ve also seen some teams build analytics directly on top of their production databases. Don’t be afraid of cloud data structures.
- Data Integration: Create a cohesive ecosystem by integrating various data sources. This involves cleaning, normalizing, and transforming data into a unified format. It also includes the critical component of mapping and schema alignment (data contracts).
Where it goes wrong: Not establishing a strong relationship and contracts with engineering teams. It is not the role of data teams to build business logic; the ideal is to get the business logic embedded into the log. Another issue is that teams will not centralize the data at the semantic business layer. Don’t leave it up to BI to define core aggregations; you will lose control of the context and definitions of core metrics.
- Data Analysis and Insights: Employ data analytics tools to extract meaningful insights. Use techniques like data mining, machine learning, and visualization to gain actionable intelligence.
Where it goes wrong: Three things really destroy productivity: too many tools, analysts taking on every request, and too many dashboards. Focus on the business goals and align teams on that. Stop everything else.
Tool: One of my amazing former analysts, Harrison Palmer, introduced me to Hex, and it is my new go-to for report generation.
- Data-driven Decision Making: Foster a culture of data-driven decision-making across the organization. Encourage stakeholders to rely on data insights to make informed choices. Be a strong partner and consultant.
Where it goes wrong: Three things stand out. First, not setting up tools that ensure data reliability and consistency. It is crucial to set up the platform so that stakeholders know your data is flowing correctly (data observability). Second, you need to be a consulting partner for your stakeholders. Build that relationship and help them ask the right questions. Third, once a decision is made, follow up with the results of that decision. Most teams forget there is another side of the equation.
- Data Security: Implement robust security measures to safeguard sensitive data. Conduct regular audits and risk assessments to identify vulnerabilities. Also, have procedures and processes in place for role-based access to the platform and BI.
Where it goes wrong: Teams let anything go. Everyone can access everything in favor of speed. This may seem good, but a lack of controls is a huge risk and can delay any potential IPO or acquisition.
- Data Integrity: Put a system in place that instills confidence in data and ensures that you are building on a solid foundation. The system should be set up to monitor and track the lineage of your data.
Where it goes wrong: Thinking that this is a valid place to cut costs. That can lead to inconsistent data. Early on, you need an observability tool to make sure the data points you are leveraging for decisions are accurate. You are only as good as the data you make available. You need that trust.
Tool: Monte Carlo is the hands-down leader in the observability space.
- Training and Awareness: Educate employees about the data strategy, its significance, and how it aligns with the organization’s objectives. Offer training on data tools and techniques. I treat data like a product, so if your teams aren’t in the monthly product reviews, you should have monthly data reviews that showcase all the amazing business value they are bringing to the table.
Where it goes wrong: Training: Not putting dollars or time in the budget for team training is going to come back to haunt you. The industry is changing, and you need to make sure your team stays up to speed. Awareness: People need to know that you are aligned with the business. If someone says “I don’t know what the data team is working on,” you have failed.
- Continuous Improvement: Regularly review and adapt the data strategy as business needs evolve. Stay updated with industry trends and emerging technologies to remain competitive. Listen to your staff and stakeholders. They know the challenges and opportunities that exist.
Where it goes wrong: Data leaders sometimes think they have all the answers themselves. News flash: you do not. Making too many changes that are not needed can cause you and your team pain. Also, if you don’t measure the impact, you have no idea if you are improving.
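To make the data contract idea from the Data Integration item a little more concrete, here is a minimal Python sketch of what checking records against a contract might look like. Everything in it is an assumption for illustration: the `FieldSpec` class, the `validate_record` helper, and the `order_created` field names are hypothetical, and a real contract would be agreed on with your engineering team and most likely enforced by whatever tooling you already use.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One field in a (hypothetical) data contract."""
    name: str
    dtype: type
    required: bool = True


# Hypothetical contract for an "order_created" event. The field names are
# illustrative only; the real ones come from the agreement with engineering.
ORDER_CREATED_CONTRACT = [
    FieldSpec("order_id", str),
    FieldSpec("customer_id", str),
    FieldSpec("amount_usd", float),
    FieldSpec("coupon_code", str, required=False),
]


def validate_record(record: dict[str, Any], contract: list[FieldSpec]) -> list[str]:
    """Return a list of human-readable contract violations (empty if none)."""
    violations = []
    for spec in contract:
        # A missing or null field only counts as a violation when required.
        if spec.name not in record or record[spec.name] is None:
            if spec.required:
                violations.append(f"missing required field: {spec.name}")
            continue
        value = record[spec.name]
        if not isinstance(value, spec.dtype):
            violations.append(
                f"{spec.name}: expected {spec.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
    return violations


good = {"order_id": "o-1", "customer_id": "c-9", "amount_usd": 19.99}
bad = {"order_id": "o-2", "amount_usd": "19.99"}

print(validate_record(good, ORDER_CREATED_CONTRACT))  # no violations
print(validate_record(bad, ORDER_CREATED_CONTRACT))   # two violations
```

The point is less the code than the habit: start with a small set of fields that matter, write the expectations down, and check them automatically instead of relying on people to catch drift.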
By following these steps, an organization can establish a comprehensive data strategy that maximizes the potential of its data assets and propels it towards success in an increasingly data-driven world.