By Craxel Founder and CEO David Enga
June 17, 2025
What does it say about the Databricks and Snowflake engines that, to provide fast enough query for AI, they need to turn to Postgres?
Here's what is going on. Scanning engines (index-free approaches) can't provide hot query at scale because they have to scan too much unnecessary data. For example, Google's BigQuery white paper notes that scanning 1 TB of data in one second would require 10,000 disk drives and 5,000 processors. Even for Google, it is hard to split the data into enough pieces to get that sort of parallelization, which leaves you with slow query times. Index-free scanning engines exist in the first place because relational engines like Postgres can't index data fast enough, owing to concurrency problems in their underlying data structures (e.g., B-trees). To scale, you therefore have to shard across many Postgres instances. This is very expensive.
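To see where numbers like that come from, here is a back-of-the-envelope sketch in Python. The per-disk and per-core throughput figures are my own illustrative assumptions, not values from the BigQuery paper, but they show why brute-force scanning needs enormous parallelism to hit interactive latencies.

```python
# Rough scan-cost arithmetic: how much hardware does it take to scan a
# dataset within a latency target when there is no index at all?
# The throughput constants below are illustrative assumptions.

DATA_BYTES = 1 * 10**12      # 1 TB to scan
TARGET_SECONDS = 1.0         # interactive latency target
DISK_MBPS = 100              # assumed sequential read rate per disk
CORE_MBPS = 200              # assumed rate at which one core can filter data

required_mbps = DATA_BYTES / 10**6 / TARGET_SECONDS

disks = required_mbps / DISK_MBPS
cores = required_mbps / CORE_MBPS

print(f"Aggregate throughput needed: {required_mbps:,.0f} MB/s")
print(f"Disks needed at {DISK_MBPS} MB/s each: {disks:,.0f}")
print(f"Cores needed at {CORE_MBPS} MB/s each: {cores:,.0f}")
# With these assumptions: ~10,000 disks and ~5,000 cores to scan 1 TB
# in one second, the order of magnitude the BigQuery paper describes.
```

An index changes the arithmetic entirely: the query reads only the data it needs instead of making a full pass over everything.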
In my opinion, this is a capitulation by Databricks and Snowflake, as their customers look at other solutions, such as AWS Aurora (another Postgres variant), for faster query for AI. Is there any other explanation? Transactional use cases? If so, that just highlights that you can't get those with index-free engines. So what do you get with them? I'm afraid the answer may be very convenient infrastructure for generating monthly reports.
Everything in data revolves around the indexing problem. Craxel's algorithmic innovation, an O(1) indexing solution that works for complex data, means you can organize data at line speed with transactional consistency and get hot query at petabyte scale without the cost of these older technologies.
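Craxel's algorithm is proprietary and not shown here, but a generic sketch makes the point about why indexing matters for hot query: a constant-time index lookup touches one record, while an index-free engine has to touch all of them. The data and timings below are purely illustrative.

```python
import random
import time

# Generic illustration only, not Craxel's algorithm: contrast answering a
# point query by scanning every record versus going through a constant-time
# (hash-based) index.

N = 1_000_000
records = [(i, f"payload-{i}") for i in range(N)]
index = {key: pos for pos, (key, _) in enumerate(records)}  # O(1) per lookup

target = random.randrange(N)

t0 = time.perf_counter()
scan_hit = next(rec for rec in records if rec[0] == target)  # touches ~N/2 records
t1 = time.perf_counter()
index_hit = records[index[target]]                           # touches 1 record
t2 = time.perf_counter()

assert scan_hit == index_hit
print(f"full scan:    {t1 - t0:.4f} s")
print(f"index lookup: {t2 - t1:.6f} s")
```

The hard part, which this toy skips, is building and maintaining that index at ingest speed, under concurrency, for complex data, which is exactly the problem described above.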
Finally, with Craxel's Black Forest, you get instant access to fully connected data (knowledge graphs) at massive scale. Anyone want to try to build connected data (e.g., a graph) with Databricks, Snowflake, Aurora, or Postgres?
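For anyone who wants to take that challenge literally, here is what a multi-hop graph query looks like on a plain relational engine. SQLite stands in for Postgres or Aurora, and the schema and data are made up for illustration.

```python
import sqlite3

# Modeling "connected data" in an ordinary relational engine: edges become
# rows, and every hop in the graph becomes another pass of a recursive join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src TEXT, dst TEXT);
    INSERT INTO edges VALUES
        ('alice', 'acme_corp'),
        ('acme_corp', 'shell_co'),
        ('shell_co', 'offshore_acct');
""")

rows = conn.execute("""
    WITH RECURSIVE reachable(node, depth) AS (
        SELECT dst, 1 FROM edges WHERE src = 'alice'
        UNION ALL
        SELECT e.dst, r.depth + 1
        FROM edges e JOIN reachable r ON e.src = r.node
        WHERE r.depth < 3
    )
    SELECT node, depth FROM reachable;
""").fetchall()

for node, depth in rows:
    print(f"{depth} hop(s) from alice: {node}")
```

It works for a toy graph, but each additional hop is another join pass over the edge table, and on deep or high-fanout graphs the intermediate results explode. A system built for connected data makes the traversal itself the primitive instead of reassembling the graph join by join.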
The headline is wrong - Postgres hasn't won. Craxel's customers are the winners.