Why I Can't Shut Up About Warehouse-native Data Platforms

At last we have data platforms that bring us closer to our customers and let us create value for them with our data

Sep 22, 2023

Warehouse-native data solutions are popping up all over the place. In MarTech, solutions like Rudderstack and Hightouch are ways to make your warehouse the source of truth for digital marketing while NetSpring (disclosure: I’m an advisor), Houseware, and Kubit bring product analytics into the warehouse. Even incumbents like Amplitude are starting in on warehouse-native solutions. Given how many times a new data platform innovation has been hailed as the second coming only to end in large cloud provider bills, it’s easy to be skeptical of this latest iteration. However, warehouse-native stands to finally unlock the value of data across the business and is so approachable that we’ve reached a point where there’s no excuse for our platforms to not be warehouse-native. Its ability to unlock our data for customers stands to be game-changing in everything from product development to marketing to customer service.

How We Got Here

A brief history of data platforms goes something like this:

Pre-digital Products (until late ’00s): Products were offline and generated data at relatively low volume and velocity. Offline data analyses could easily keep up with product development. All was right in the world.
Dawn of Digital Products (~2010): Products became digital, allowing them to quickly evolve and produce high volumes of data. Data tools were ill-equipped for this change and our offline tools and techniques fell way behind the speed of product. Products were now built online in prod while data was relegated to offline warehouses, creating a barrier between the work of data teams and our customers.
Siloed Data Tools (~2015): Nature and SaaS vendors both abhor a vacuum, so we started getting point solutions (e.g., Segment in MarTech and Amplitude in product analytics) to our data problems. They were built to be performant enough to solve our velocity problem, but in the process created a new problem as each solution came with its own data silo. This led to partial solutions as marketing platforms contradicted product platforms which contradicted finance and so on. We spent a lot of money on solutions, but the results were mixed.
Quasi-Prod Warehouse Platforms (late ’10s): The silos were necessary because while warehouses of that era could hold all of the data, they were too slow to meet the platforms’ use cases. The advent of Snowflake, Databricks, BigQuery, etc. gave us data warehouses that are performant to quasi-prod levels (not really prod level in all cases, but fast enough for most of our use cases). This enabled us to use them as the source of truth for low-latency experiences instead of solution-specific silos.

I know - great history lesson, John, but. . .

I do, and you should too!

Earning Our Keep By Reuniting With Our Customers

Data are valuable to the extent that they help us grow the business by better serving customers. Dashboards can be useful, but directly impacting the customer through marketing that causes the right person to buy, product development that grows our market penetration, and customer service that keeps the customers we have can be much more important to our bottom line. The acceleration of online products circa 2010 necessitated a great wall between data teams and the customer that relegated us to dashboard-wielding spectators. The move to warehouse-native knocks down that wall and gets us back in the game.

This unlocks many of the use cases for data that we expected to get years ago but couldn’t because siloed data led us to views of the customer that were clearly wrong and eroded trust in our solutions. Now our marketing efforts can easily incorporate inputs from across the enterprise - for example, the segmentation computed offline by an analyst is now trivial to incorporate into a live marketing campaign. Product analytics can look at the whole customer, not just the data sent from the front-end instrumentation. . .no more cases of “product analytics says retention rate is x%, but finance says it’s y%” that lead to people distrusting product reporting and just yolo’ing their roadmap on the basis of a few customer feedback sessions. While many of these things might have been theoretically possible before - anything is possible with enough code - in practice most people had to throw in the towel.

The other unlock comes from who can do the work - anyone who can write SQL. In a warehouse-native world, if you can write to a table, you can reach the customer. No longer is an analyst dependent on an engineer in order for their good ideas to reach customers. The data team can be at the forefront of delivering personalized customer experiences while engineering can keep its focus on building the product experiences that only it can build. At last the data team is back on the field and can earn its keep through more than just drowning executives in dashboards.
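To make the "if you can write to a table, you can reach the customer" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a cloud warehouse, and every name in it (the orders and marketing_segments tables, their columns, the 500 spend threshold) is a hypothetical chosen for illustration, not something taken from any of the products mentioned. The analyst's whole job is the SQL in build_segments; read_segment plays the part of whatever warehouse-native tool is pointed at that table.

```python
import sqlite3


def build_segments(conn):
    """Analyst's step: compute a segment per customer and write it to a table.

    Table names and the 500 threshold are hypothetical.
    """
    conn.execute("DROP TABLE IF EXISTS marketing_segments")
    conn.execute(
        """
        CREATE TABLE marketing_segments AS
        SELECT customer_id,
               CASE WHEN SUM(amount) >= 500 THEN 'high_value'
                    ELSE 'standard'
               END AS segment
        FROM orders
        GROUP BY customer_id
        """
    )


def read_segment(conn, segment):
    """Stand-in for a downstream tool reading the analyst's table."""
    rows = conn.execute(
        "SELECT customer_id FROM marketing_segments "
        "WHERE segment = ? ORDER BY customer_id",
        (segment,),
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    # In-memory SQLite stands in for the warehouse in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("c1", 300.0), ("c1", 250.0), ("c2", 40.0), ("c3", 900.0)],
    )
    build_segments(conn)
    return read_segment(conn, "high_value")


if __name__ == "__main__":
    print(demo())
```

Nothing in the sketch requires an engineer to ship code to production: once the segment lives in a table, the campaign side only needs read access to it.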

Where Are We Now

In many ways, the silo solutions were pretty good outside of the fact that they were silos. This makes the solution straightforward - take the same idea and build it to work with the warehouse. This isn’t as easy as just taking the existing data model and exposing it in the warehouse - much like API design requires some thought, there is work that goes into making a data model interoperable with other data models - but this also isn’t an intractable problem. Existing solution providers (e.g., Amplitude) are attempting to retrofit warehouse-nativeness into their products while new providers (e.g., NetSpring, Kubit, Rudderstack) are going warehouse-native from the start. We will see more warehouse-native entrants in existing spaces as well as new solutions facilitated by the ability to source data from across the enterprise - for example, customer service seems like an area ripe for a solution in which a comprehensive view of the customer is made available so that better experiences can be provided.

Where Do You Go From Here

In a nutshell, there are three things to do to accelerate your journey to delivering customer benefit through the adoption of a warehouse-native data strategy:

Start expecting any platforms you purchase to be warehouse-native. It’s 2023 and we have the technology. . .there’s no longer an excuse for silos!

Stop buying and developing in existing data silos. . .this is just stuff you’ll have to undo later, and it will delay your time to benefit.

Continue building out your data warehouse as your single source of truth on your customers. . .warehouse-native will unlock the value in the investment you’ve already made and make it easier in the future to realize more value from your investment.

Like any other platform investment, start with an identification and understanding of your most pressing data problems and look for solutions to them - don’t let this become an exercise in solutions in search of a problem. Consider the extent to which data from across the enterprise will be necessary to solve the problem and the level of curation the data requires (low levels of curation can often be solved through lighter-weight ETL and reverse ETL tools). Problems best solved with curated data from across the business are the ones most likely to benefit immediately from warehouse-native solutions such as the ones mentioned earlier as well as the new solutions that continue to hit the market.

And if all of this sounds exciting to you too but a little overwhelming, let’s talk.

Move Fast and Count Things
