The great unbundling of data

Open formats, composable platforms, and AI agents.

The data warehouse is no longer the centre of gravity.
It’s just one node in a growing constellation.

Modern data strategies aren’t about centralising everything in one place. They’re about composability—the ability to mix and match components, adapt to change, and build around the edges.

At the heart of that shift? Open data formats.

Once a backend technical detail, open formats are now a front-line strategic choice—shaping how companies design infrastructure, adopt tools, and prepare for AI-native workflows.

We’re seeing a new set of standards emerge:

  • Open table formats like Iceberg and Delta Lake
  • In-memory engines like DuckDB and Polars
  • Columnar storage formats like Arrow and Parquet

These aren’t just better technologies. They represent a shift in mindset—from monolithic systems to modular ones. From vendor lock-in to architectural freedom.

Because in 2025, data strategy is defined by flexibility.

We now live in a world where:

  • Storage is disaggregated from compute
  • Query engines are interchangeable
  • AI can generate analytics code on the fly
  • Collaboration spans departments, clouds, and tools


In this world, open formats are the common language. They enable optionality. They allow data to live where it’s needed, to move when it must, and to be accessed by any engine, tool, or agent that’s fit for the job.

We’re also seeing this reflected in the market.

The Databricks acquisition of Neon is one example: combining operational databases and lakehouses under one roof—unified by shared, open foundations.

Analytics databases like QuestDB and OneTick are aligning too. QuestDB now natively supports Parquet. OneTick is evolving in the same direction. The playbook is clear: support open formats, interoperate broadly, and avoid dead ends.

But maybe the most interesting force accelerating this shift is AI.

As generative systems become more capable, the interface between human and data is no longer code—it’s intention. And when code is generated on demand, the burden shifts to the structure beneath it. The format becomes more important than the syntax.

Open formats ensure that data is usable, accessible, and queryable by whatever comes next—whether it’s a traditional engine or an LLM-driven agent.

This leads to a deeper question:

What if the data lakehouse isn’t the endgame?

For years, we’ve treated the centralised lakehouse as the answer to everything. But as AI agents evolve, the need to consolidate all data in one place starts to look less compelling.

Instead, a more likely future is one where intelligent agents orchestrate analytics across distributed data sources. Rather than pipe everything into one central location, these agents tap into accessible open formats where the data already lives.

This has major implications—not just for architecture, but for cost, latency, and security. It makes open formats not just a convenience, but a requirement.

Over the past few months, I’ve spoken with CDOs, CAIOs, chief architects, and engineering leads. One pattern keeps emerging: the firms leaning into openness—open tables, open catalogs, composable architectures—are moving faster. They’re navigating change with less friction and building systems that are more resilient to whatever comes next.

This is the real shift underway.

Not just a new format or engine—but a new philosophy of data:

  • Modular over monolithic
  • Open over proprietary
  • Federated over centralised
  • Designed for agents, not just analysts

Two trends, one direction: open formats and AI agents both point to the same place.
Modular, open data platforms are the future.

James Corcoran