💣 Data setups are exploding and I am very curious to see where we will be in 5 years.
I have a fantastic read to share, and I warn you: it’s a long one and it gets quite technical on the database side. But it’s an important one:
The last 2-3 years in data were largely about hailing the resurrection of SQL. I think the rise of dbt played a huge part in it. Why? Because it provided a great approach to overcoming the problems that big SQL-based data models have.
But to me it doesn’t feel like we are close to a solution. When I work with new companies and new setups, dbt and SQL-based modeling are great. You can start quickly and extend your model in a scalable way. Everything seems to be under control.
When I work with teams with mature data models, I feel pain. These are great models. But the effort to extend, refactor, test and optimize them leaves me pretty unsatisfied. In software development we have found plenty of approaches to make huge systems extendable, scalable and reliable. Quick development and high quality are characteristics of a good setup. But I don’t have the feeling that we have that in data modeling.
So at night, I still dream of a dbt-like framework that relies on something like pandas.
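To make that dream a bit more concrete, here is a minimal sketch of what it could feel like: models as plain Python functions over DataFrames, composed the way dbt models reference each other. All names here (stg_orders, orders_per_customer) are my own illustrative assumptions, not an existing framework.

```python
import pandas as pd

def stg_orders(raw: pd.DataFrame) -> pd.DataFrame:
    # "Staging model": clean and type the raw input,
    # roughly what a stg_ model does in dbt.
    out = raw.copy()
    out["amount"] = out["amount"].astype(float)
    return out[out["status"] == "completed"]

def orders_per_customer(stg: pd.DataFrame) -> pd.DataFrame:
    # "Mart model": aggregate the staged data downstream.
    return (
        stg.groupby("customer_id", as_index=False)
        .agg(order_count=("order_id", "count"),
             revenue=("amount", "sum"))
    )

# Tiny fabricated input, just to show the pipeline running.
raw = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": ["a", "a", "b"],
    "status": ["completed", "completed", "cancelled"],
    "amount": ["10.0", "5.0", "7.5"],
})

result = orders_per_customer(stg_orders(raw))
```

The appeal for me: each model is a unit-testable, refactorable function, so the usual software-engineering toolbox (types, tests, IDE refactoring) applies directly, which is exactly what I miss in large SQL codebases.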
What do you think?