The great unbundling of data

Open formats, composable platforms, and AI agents.

The data warehouse is no longer the centre of gravity.
It’s just one node in a growing constellation.

Modern data strategies aren’t about centralising everything in one place. They’re about composability—the ability to mix and match components, adapt to change, and build around the edges.

At the heart of that shift? Open data formats.

Once a backend technical detail, open formats are now a front-line strategic choice—shaping how companies design infrastructure, adopt tools, and prepare for AI-native workflows.

We’re seeing a new set of standards emerge:

  • Open table formats like Iceberg and Delta Lake
  • In-memory engines like DuckDB and Polars
  • Columnar storage formats like Arrow and Parquet

These aren’t just better technologies. They represent a shift in mindset—from monolithic systems to modular ones. From vendor lock-in to architectural freedom.

Because in 2025, data strategy is defined by flexibility.

We now live in a world where:

  • Storage is disaggregated from compute
  • Query engines are interchangeable
  • AI can generate analytics code on the fly
  • Collaboration spans departments, clouds, and tools


In this world, open formats are the common language. They enable optionality. They allow data to live where it’s needed, to move when it must, and to be accessed by any engine, tool, or agent that’s fit for the job.

We’re also seeing this reflected in the market.

The Databricks acquisition of Neon is one example: combining operational databases and lakehouses under one roof—unified by shared, open foundations.

Analytics databases like QuestDB and OneTick are aligning too. QuestDB now natively supports Parquet. OneTick is evolving in the same direction. The playbook is clear: support open formats, interoperate broadly, and avoid dead ends.

But maybe the most interesting force accelerating this shift is AI.

As generative systems become more capable, the interface between human and data is no longer code—it’s intention. And when code is generated on demand, the burden shifts to the structure beneath it. The format becomes more important than the syntax.

Open formats ensure that data is usable, accessible, and queryable by whatever comes next—whether it’s a traditional engine or an LLM-driven agent.

This leads to a deeper question:

What if the data lakehouse isn’t the endgame?

For years, we’ve treated the centralised lakehouse as the answer to everything. But as AI agents evolve, the need to consolidate all data in one place starts to look less compelling.

Instead, a more likely future is one where intelligent agents orchestrate analytics across distributed data sources. Rather than pipe everything into one central location, these agents tap into accessible open formats where the data already lives.

This has major implications—not just for architecture, but for cost, latency, and security. It makes open formats not just a convenience, but a requirement.

Over the past few months, I’ve spoken with CDOs, CAIOs, chief architects, and engineering leads. One pattern keeps emerging: the firms leaning into openness—open tables, open catalogs, composable architectures—are moving faster. They’re navigating change with less friction and building systems that are more resilient to whatever comes next.

This is the real shift underway.

Not just a new format or engine—but a new philosophy of data:

  • Modular over monolithic
  • Open over proprietary
  • Federated over centralised
  • Designed for agents, not just analysts

Two trends, one direction: open formats and AI agents both point to the same place.
Modular, open data platforms are the future.

James Corcoran