Pragmatic Orthodoxy - Data Signals #1 - 03.11.25
                            
Hello,
welcome to a new bi-weekly format of this newsletter. I have been thinking for a while about how I can share interesting content I come across. It shouldn't be just another "here is a list of great articles I read" type of post. Those are great, but there are already plenty of them.
So I developed this signals format. The basic idea is that I highlight interesting ideas and thoughts I come across while reading a post, spend some time with an AI model riffing and brainstorming over them, and then compile a new edition of this post featuring a specific thought or idea. You can simply read it, or dive deeper by reading the referenced source content. At the end I describe the whole workflow, so you get an idea of how this format is created.
Before we start, I am running a free workshop in two weeks:
From Product Analytics to Growth Intelligence: The Metrics That Actually Explain Growth
We still have 25 seats open. Claim yours here.
Signals #1: When the Old Orthodoxies Crumble
I read a lot of articles about data architecture and data modeling, preferably from people building actual systems rather than just theorizing, and something keeps showing up. The orthodoxies we all learned in the 2000s and 2010s? They're being quietly replaced. Not with chaos or some flashy new paradigm, but with something more pragmatic.
The pattern I'm seeing: start simple, add complexity only when reality demands it. And the interesting part is how this same idea shows up across completely different domains. Let me walk you through what I mean.
Signal 1: Storage is Cheap, Your Time is Not
Source: Zach Wilson & Sahar Massachi - "Stop Using Slowly-Changing Dimensions!"
Link: https://blog.dataexpert.io/p/the-data-warehouse-setup-no-one-taught
Look, we're still teaching Slowly-Changing Dimension Type 2 patterns like it's 2008. Zach and Sahar call this out directly. These patterns were designed for an era when storage was expensive and we had to be clever about how we tracked changes over time. I am guilty of this as well. It's baked into muscle memory.
Their alternative? Just date-stamp your data snapshots. Embrace intentional duplication. Boom. The insight that landed for me: "Storage is cheap. Your time is not."
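To make that concrete, here is a minimal sketch of the snapshot idea in Python with DuckDB. This is my illustration, not code from the article, and all table and column names (`source_customers`, `dim_customers_snapshot`, `snapshot_date`) are made up. The point is just that history becomes a full, date-stamped copy per load instead of SCD2 effective-from/effective-to bookkeeping.

```python
# Minimal sketch: date-stamped snapshots instead of SCD Type 2.
# Assumes DuckDB (pip install duckdb); all table and column names are hypothetical.
import duckdb

con = duckdb.connect()  # in-memory warehouse, good enough for the sketch

# Pretend this is the operational source table we snapshot every day.
con.execute("""
    CREATE TABLE source_customers AS
    SELECT * FROM (VALUES (1, 'Alice', 'Berlin'), (2, 'Bob', 'Paris'))
        AS t(customer_id, name, city)
""")

# One table holds every snapshot; each load appends a full, date-stamped copy.
con.execute("""
    CREATE TABLE dim_customers_snapshot AS
    SELECT *, CURRENT_DATE AS snapshot_date FROM source_customers LIMIT 0
""")
con.execute("""
    INSERT INTO dim_customers_snapshot
    SELECT *, CURRENT_DATE FROM source_customers
""")

# "As of" queries are just a filter on snapshot_date: no effective_from /
# effective_to columns, no merge logic, just intentional duplication.
print(con.execute("""
    SELECT * FROM dim_customers_snapshot
    WHERE snapshot_date = (SELECT max(snapshot_date) FROM dim_customers_snapshot)
""").fetchall())
```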
Here's what they propose: a three-tier architecture (raw → silver → gold; to be fair, I don't like any of these medallion references, mine are raw, core, analytics, but we all like different names) where everything starts virtual. Just views. You only materialize tables when performance data actually proves a query path is hot. No premature optimization. No complex versioning schemes until you have evidence you need them.
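And here is roughly what "everything starts virtual" could look like, continuing the same sketch. Again my own illustration, not Zach and Sahar's code: the gold layer stays a view until your query timings say otherwise, and only that one proven-hot path gets swapped for a physical table.

```python
# Continuing the sketch above: the "gold" layer starts life as a view.
con.execute("""
    CREATE VIEW gold_customers_per_city AS
    SELECT city, count(*) AS customers
    FROM dim_customers_snapshot
    WHERE snapshot_date = (SELECT max(snapshot_date) FROM dim_customers_snapshot)
    GROUP BY city
""")

# Hypothetical decision point: in reality this comes from query logs and timings,
# not a hardcoded flag.
QUERY_PATH_IS_PROVEN_HOT = False

if QUERY_PATH_IS_PROVEN_HOT:
    # Swap the view for a table with the same name and shape, so downstream
    # consumers don't have to change. The definition is repeated for simplicity.
    con.execute("DROP VIEW gold_customers_per_city")
    con.execute("""
        CREATE TABLE gold_customers_per_city AS
        SELECT city, count(*) AS customers
        FROM dim_customers_snapshot
        WHERE snapshot_date = (SELECT max(snapshot_date) FROM dim_customers_snapshot)
        GROUP BY city
    """)
```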
One takeaway: The bottleneck shifted from storage costs to engineering team bandwidth. We should be optimizing for human time, not disk space.
Takeaway for me: switch a current local project to data snapshots and see how it goes and how it feels.
Now here's where it gets interesting. This "virtual first, physicalize when proven" thinking? It shows up in an unexpected place—data modeling patterns that span Kimball, Data Vault, and even graph databases.
Signal 2: One Model, Many Projections
Source: Robert Anderson - "One Model, Many Projections: Toggling Stars, Graph, and Data Vault"
Link: https://medium.com/@rdo.anderson/one-model-many-projections-toggling-stars-graph-and-data-vault-9d9d937e0986
Robert's HOOK pattern does something clever. Define identity, time, and proof once in your semantic layer. Then render them as whatever shape you need: Kimball star schema for BI, property graphs for lineage, Data Vault for audit trails. All of it starts as virtual views.
The key insight for me: these aren't competing models that require separate implementations. They're different projections of the same semantic foundation. You create Kimball dimensions as views over your core layer. Need a hot fact table for a heavy BI workload? Fine, materialize just that specific aggregation. Need graph traversals? Add adjacency caches only when deep traversals prove slow.
Think about what this means—you never remodel, you re-render. New use case? Add a projection. Don't rebuild the foundation.
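A rough sketch of what "projections over one foundation" can look like is below. This is not the HOOK pattern itself, and the table, view, and column names are all made up; it only illustrates that a Kimball-style dimension and a Data-Vault-style hub can both be plain views over the same core layer.

```python
# Sketch: one core table, several virtual projections over it.
# Not the HOOK pattern itself, just the "re-render, don't remodel" idea.
import duckdb

con = duckdb.connect()

# The single semantic foundation: identity, time, and the descriptive payload.
con.execute("""
    CREATE TABLE core_customer AS
    SELECT * FROM (VALUES
        (1, 'crm', DATE '2025-01-01', 'Alice', 'Berlin'),
        (2, 'crm', DATE '2025-01-01', 'Bob',   'Paris')
    ) AS t(customer_id, source_system, loaded_at, name, city)
""")

# Projection 1: a Kimball-style dimension for BI tools.
con.execute("""
    CREATE VIEW dim_customer AS
    SELECT customer_id AS customer_key, name, city FROM core_customer
""")

# Projection 2: a Data-Vault-style hub for audit trails.
con.execute("""
    CREATE VIEW hub_customer AS
    SELECT DISTINCT customer_id, source_system, loaded_at FROM core_customer
""")

# A new use case means a new projection (another view), not a remodel.
```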
One takeaway: Stop choosing between modeling approaches. Build the semantic foundation once, project it many ways.
I am doing this right now in my own stack. My semantic layer is a YAML model that defines the core entities, how they are built up, and how you use them for analytics work. Based on that, I can materialize them in any way.
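For a flavor of what I mean, here is a simplified, made-up sketch (not my actual model): a tiny YAML definition plus a helper that renders each entity into a view. The YAML schema, entity names, and the `render_views` function are illustrative only.

```python
# Sketch: a tiny YAML "semantic layer" rendered into view DDL.
# The YAML schema here is hypothetical, just enough to show the idea.
import yaml  # pip install pyyaml

SEMANTIC_MODEL = """
entities:
  customer:
    source: raw_customers
    columns: [customer_id, name, city]
  order:
    source: raw_orders
    columns: [order_id, customer_id, ordered_at, amount]
"""

def render_views(model_yaml: str) -> list[str]:
    """Turn each entity definition into a CREATE VIEW statement for the core layer."""
    model = yaml.safe_load(model_yaml)
    statements = []
    for name, spec in model["entities"].items():
        cols = ", ".join(spec["columns"])
        statements.append(
            f"CREATE OR REPLACE VIEW core_{name} AS SELECT {cols} FROM {spec['source']}"
        )
    return statements

for ddl in render_views(SEMANTIC_MODEL):
    print(ddl)
```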
The thing both Zach and Robert are assuming: you need semantic clarity first. You need to know what your entities are and how they relate before you can toggle between physical representations. That's where conceptual modeling stops being just tactical and becomes strategic.
Signal 3: Discovery Path vs Design Path
Source: Juha Korpela - "Conceptual Modeling: Thinking Beyond Solution Design"
Link: https://commonsensedata.substack.com/p/conceptual-modeling-thinking-beyond
Juha's Substack is quite new, so make sure to subscribe.
Juha challenges something I think most of us take for granted: that data modeling exists only to design specific solutions. He proposes conceptual models actually operate on two paths. There's a DESIGN path—yeah, creating specific solutions. But there's also a DISCOVERY path—building enterprise semantic knowledge that accumulates over time.
The historical failure we've all seen? Those massive Enterprise Data Models that tried to comprehensively model entire organizations top-down. They collapsed under their own scope. Every time.
Juha's alternative makes a lot more sense to me: domain-driven consolidation. Multiple models organized around business domains, built bottom-up from actual solution work. Not a monolith, but an "interlinked collection." I do this in my projects as well.
Here's the payoff: semantic discoveries from solution-level work feed an enterprise repository, which then accelerates future solution designs. It's a self-reinforcing loop where tactical work builds strategic assets.
One takeaway: Don't try to model "everything." Model domains, then connect them. Bottom-up emergence beats top-down declaration.
So what's the pattern underneath all of this?
Notice what ties these together. Zach says materialize only proven hot paths. Robert says create projections only for actual use cases. Juha says build domains from real solution work, not abstract enterprise schemas.
Signal 4: Evidence-Based Complexity
What I'm seeing across these signals is a philosophical shift I'd call evidence-based complexity.
The old orthodoxy was prevention-based. Design for scale upfront. Normalize aggressively. Create comprehensive models before building anything. The implicit assumption: change is expensive, so get it right the first time.
The new orthodoxy is adaptation-based. Start with the simplest thing that works. Instrument it. Let reality show you where complexity is justified. The implicit assumption: change is cheap now (storage is cheap, compute is cheap, refactoring is manageable), so invest complexity only where it delivers proven value. We could call this agile, but that term was burned a long time ago.
This doesn't mean "move fast and break things." It means "move deliberately and let data guide investment." Zach's three-tier virtual architecture. Robert's toggle-to-physicalize pattern. Juha's bottom-up domain consolidation. All variations on the same theme.
One takeaway: Treat architectural complexity as a response to measured constraint, not a hedge against imagined future needs.
How this edition works:
I collect interesting posts in Readwise's Reader (I am using the Mac app), where I also follow the blogs I am interested in. Discovery usually happens in three areas: LinkedIn (far less than it used to be), Reader (blog RSS), and ChatGPT or Claude Research (when I have a topic I want to learn more about). I scan the posts or watch the videos at higher speed (my attention span for work topics is difficult) and mark ideas, thoughts, and concepts I find interesting. Then I extract them and distill them, together with my thoughts, via an LLM (Claude Code at the moment) into my Obsidian vault.
When it's time for a signals edition, I use the LLM again to discover potential topics. Then we find 2-3 sources in my vault and develop together a main thought that ties them together, plus a storyline to tell it. This goes down to very specific bullet points. I then let the model write the final version based on my writing style.
If you have any feedback - hit reply and let me know.