New Approaches To the Old-School Relational Database
August 2, 2022
Editor’s Note: As two investors keeping a close eye on the world of data, Kaitlyn Henry from OpenView and Sam Broner from Dorm Room Fund share their perspectives on the history of the great relational database debate and new opportunities for the category.
Data infrastructure is all the rage. Whether you’re an engineer trying to choose the most scalable architecture, an operator trying to future-proof your business, or a founder trying to improve data utility, companies large and small are trying new things, questioning the status quo, and building large, diverse data teams to support new initiatives.
Databases are a fundamental part of this conversation. There has been no shortage of database innovation in the past 20 years as we’ve gone through everything from SQL to NoSQL, and from Platform as a Service (PaaS) to object-relational mappers (ORMs). While many different types of databases have had their moments in the spotlight, the room for innovation has by no means disappeared.
SQL vs. NoSQL: An age-old debate
If you’ve been a developer in the last 50 years, you have almost certainly used a relational SQL database or written SQL code—and you may not have exactly enjoyed it. Hating relational databases has been a topic of discussion in the developer community since at least 2007, and remains present in conference keynotes and tech blogs to this day.
Relational databases and the SQL query language were built in an era when a relatively small amount of data lived in one place and could be neatly organized into tables and columns. Introduced in the early 1970s, they promised consistent, structured data, but had shortcomings when it came to scale. Those shortcomings were only exacerbated as companies started producing more data and shifting to the cloud.
Unfortunately, these changes have only made relational databases increasingly difficult to work with. Despite widespread recognition of their use in most applications, it took a lot of effort for developers to realize just how powerful these databases could be. The rise of the data architect and database administrator function only seemed to make things worse. You’d often see friction created between these database specialists and groaning developers saying, “just give me the data.”
These shortcomings led to the rise of NoSQL databases, which traded neat, transactional data models for horizontal scale and a renewed focus on developer experience. MongoDB is the poster child of this era, IPOing in 2017 and commanding a $19B market cap as of July 2022.
Relational databases: What’s old is new again
It’s good to see the debate over SQL versus NoSQL die down over the past few years as the discussion shifts toward finding the right tool for the job.
As investors, we’re starting to see operators recognize that relational databases do have a place in the modern data stack. Founders are also reimagining the relational database from first principles, with an emphasis on three key areas:
- Developer experience;
- Fault tolerance and reliability; and
- Purpose-built databases.
Developer experience
Poor developer experience was a big part of what gave rise to the NoSQL era in the first place. “MongoDB is not the most sophisticated database, but it’s captured huge market share because they focused on developer experience,” said Yury Selivanov, co-founder and CEO of EdgeDB. “They threw out SQL and gave developers simple APIs. That resonated with the community.”
Companies that were perhaps better suited for the transactional nature of a relational database opted for MongoDB instead, because of its ease of use.
“I was surprised, initially, to learn that a company like Stripe uses MongoDB,” said Selivanov. “But they were probably so fed up with bad UX that they accepted the tradeoffs.”
Abstraction layers like ORMs and PaaS have made major strides in developer experience. Companies like Hasura, Prisma, Supabase, and others have amassed large developer followings and made it easier to build data-driven applications.
It’s a crowded category for new entrants in the startup landscape, but we continue to see improvements for developers working with databases. Abstraction layers, in particular, give developers more room to maneuver.
Companies like EdgeDB and PlanetScale are pursuing this strategy, and we expect others like them to emerge in the future.
Fault tolerance & reliability
Even with cloud providers claiming to improve service uptime, many products continue to focus on resilience and fault tolerance. These products attempt to push the boundaries described by the CAP theorem (consistency, availability, and partition tolerance) by aiming to deliver on all three goals.
In 2012, Google’s Spanner (later offered commercially as Cloud Spanner) paved the way for the creation of more horizontally scalable and container-native services, now known as “distributed SQL” databases. Since then, large and enduring products have been built in the space, like CockroachDB (valued at $5B in December 2021), Apache Cassandra, and Azure Cosmos DB, along with new unicorn-valued companies like Yugabyte.
Purpose-built database experiences
As databases enter their sixth decade of existence, we are starting to see big companies being built on specialized database software. Though it’s unlikely that any one particular speciality database will take over the world, we do believe that the database footprint inside an organization will get more heterogeneous over time.
Within purpose-built databases, we’re particularly excited about real-time databases, in-database machine learning (ML), and next-generation online analytical processing (OLAP) databases.
Real-time databases
The need for real-time data is no longer relegated to traditionally niche industries, like IoT or financial services. As businesses of all shapes and sizes become data-driven and automate more processes, getting information in real time is becoming increasingly important.
Rather than tweaking existing infrastructure to fit real-time use cases, we’re seeing developers opt for in-memory data stores like Redis, which keep data in RAM rather than on disk, or for newer, purpose-built real-time databases like Materialize.
Real-time libraries like Microsoft’s Fluid Framework make it easier to solve for consistency (the “C” in the CAP theorem) across distributed databases that deliver a real-time experience to customers. Companies focused on caching layers, like Readyset, can also provide a real-time experience, in particular on top of a SQL database.
In-database machine learning
Additionally, more companies are finding ways to tap into machine learning (ML) and predictive analytics, especially in common go-to-market issues like customer churn or forecasting inventory. In-database machine learning makes the data-to-predictions workflow as lightweight as possible, and will likely become a cornerstone feature of every SQL database in the future.
Major cloud players are already supporting in-database machine learning, like Amazon Redshift ML and BigQuery ML from Google. Startups like MindsDB are also helping developers add in-database ML to their existing SQL databases.
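As a rough illustration of what “in-database” means here, the sketch below registers a scoring function with SQLite so that predictions are computed inside a SQL query, rather than by exporting rows to a separate ML pipeline. Everything in it is an assumption made for the example: the `customers` table, the `churn_score` function, and the hand-picked weights are hypothetical, and this is not the syntax used by Redshift ML, BigQuery ML, or MindsDB.

```python
import math
import sqlite3

# Hypothetical, hand-picked weights standing in for a trained churn model.
WEIGHTS = {"bias": -1.0, "days_inactive": 0.1, "support_tickets": 0.5}


def churn_score(days_inactive, support_tickets):
    """Logistic score between 0 and 1; higher means more likely to churn."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["days_inactive"] * days_inactive
        + WEIGHTS["support_tickets"] * support_tickets
    )
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Register the model as a SQL function so scoring happens inside the query.
conn.create_function("churn_score", 2, churn_score)

conn.execute(
    "CREATE TABLE customers "
    "(name TEXT, days_inactive INTEGER, support_tickets INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("acme", 2, 0), ("globex", 30, 4), ("initech", 5, 0)],
)

# Predictions are just another column in the result set.
at_risk = conn.execute(
    "SELECT name, churn_score(days_inactive, support_tickets) AS score "
    "FROM customers "
    "WHERE churn_score(days_inactive, support_tickets) > 0.5 "
    "ORDER BY score DESC"
).fetchall()
```

The point of the design is that the data never leaves the database: the person writing the query only needs SQL, not a separate training and serving stack.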
Next-generation OLAP databases
Cloud data warehouses like Snowflake have been a big part of the analytics boom that we’ve seen over the past few years. They’re a big asset for companies trying to become data-driven. Unsurprisingly, a growing number of data engineering tools, such as dbt and Airbyte, have emerged to make data warehouses easier to work with.
But these data warehouses aren’t without their shortcomings, as latency continues to be a struggle for certain analytics use cases, particularly for user-facing ones. Newer data warehouse vendors are actively trying to improve this.
There’s also a new generation of OLAP databases that aim to reduce query time and achieve sub-second latency. Rockset was an early commercial player in this category. More recently, we’ve seen commercial entities being built off of other next-gen OLAP databases such as Apache Druid, where Imply has made it their business to make Druid easier to use and deploy.
It’s not clear exactly which real-time data use cases will be best suited for these new OLAP players. That said, it is clear that they will be one of many tools in a developer’s tool belt moving forward.
Both old and new databases can be friends
General-purpose databases like MySQL should embrace this heterogeneity, not fight it, by making it easy to integrate with purpose-built databases. We’re already starting to see this happen, with technology like PostgreSQL’s foreign data wrappers making it easier to access external databases.
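The idea of reaching into a second database from the first can be sketched without a server. The example below uses SQLite’s `ATTACH DATABASE` as a stand-in: it is an analogy for the pattern, not PostgreSQL’s foreign data wrapper mechanism, and the `customers` and `page_views` tables are hypothetical.

```python
import sqlite3

# The "primary" database the application already uses.
conn = sqlite3.connect(":memory:")
# A second, separate database attached under the name "analytics".
conn.execute("ATTACH DATABASE ':memory:' AS analytics")

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE analytics.page_views (customer_id INTEGER, views INTEGER)"
)
conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)", [(1, "acme"), (2, "globex")]
)
conn.executemany(
    "INSERT INTO analytics.page_views VALUES (?, ?)",
    [(1, 10), (1, 5), (2, 7)],
)

# One query joins tables that live in two different databases.
rows = conn.execute(
    "SELECT c.name, SUM(v.views) "
    "FROM main.customers AS c "
    "JOIN analytics.page_views AS v ON v.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
```

From the application’s point of view there is a single connection and a single query, even though the data lives in two places.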
There are also many opportunities to make it easier to query and run computations on data living in multiple databases. While the query language GraphQL has made major strides in this arena, it still lacks the ability to do more complex calculations on data across multiple sources.
Ariga, the company commercializing an open source project called Atlas, takes a declarative approach to schema management. It does all of the schema migration work that GraphQL users might have to do in something like Liquibase today. Ultimately, Atlas makes it easier to query multiple databases as if they were one endpoint.
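For readers unfamiliar with the declarative style, the sketch below shows the general idea: describe the schema you want, compare it with the schema you have, and let a tool work out the statements in between. This is a toy written for this article, not Atlas’s algorithm or API. The helper names `current_schema` and `plan_migration` are hypothetical, it runs against SQLite, and it only handles additive changes (new tables and new columns).

```python
import sqlite3


def current_schema(conn):
    """Read {table: {column: type}} from a SQLite connection."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # Table names come from the database itself, so they are trusted here.
    return {
        table: {
            col[1]: col[2] for col in conn.execute(f"PRAGMA table_info({table})")
        }
        for table in tables
    }


def plan_migration(current, desired):
    """Return the SQL statements that move `current` toward `desired`."""
    statements = []
    for table, columns in desired.items():
        if table not in current:
            cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols})")
            continue
        for name, ctype in columns.items():
            if name not in current[table]:
                statements.append(
                    f"ALTER TABLE {table} ADD COLUMN {name} {ctype}"
                )
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")

# The desired end state, declared as data rather than as migration steps.
desired = {
    "users": {"id": "INTEGER", "email": "TEXT", "created_at": "TEXT"},
    "teams": {"id": "INTEGER", "name": "TEXT"},
}

plan = plan_migration(current_schema(conn), desired)
for statement in plan:
    conn.execute(statement)
```

Running the planner a second time against the migrated database yields an empty plan, which is the property that makes the declarative approach attractive: the desired state is the source of truth, and the steps are derived.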
Relational databases are not the final frontier
We’ve barely scratched the surface of new opportunities in the database ecosystem. There are many interesting use cases for time series databases, graph databases, peer-to-peer databases leveraging the InterPlanetary File System (IPFS), and more.
If you’re building a database for the next generation of the data-driven world, we’d love to meet you. You can find Kaitlyn on LinkedIn and Sam on Twitter at @SamBroner.