All hail the return of the general-purpose database • The Register

By Daniel G. Barnes Last updated Feb 15, 2022

Paid feature In a tech industry as complex as this, it pays to keep things simple where you can. Databases are no exception. So why do cloud service providers offer multiple databases for different tasks when one was enough before? This article examines how traditional SQL relational database systems have evolved to serve as simpler general-purpose platforms for cloud applications.

First, let’s address the fragmentation of database wallets. Customer expectations have evolved to the point where the traditional RDBMS is inadequate for many, warns Andrew Oliver, senior director of product marketing for MariaDB Corporation.

“Modern apps are still active,” he says. “For many companies, there is no room for failure when the system must survive not only failures, but also the deployment of new software, services and features.”

This was not a problem until web-scale applications evolved. An online service that cannot scale efficiently will suffer from availability issues when it reaches capacity. If your online storefront is showing timeout errors instead of last minute freebie options, that’s a problem.

The limits of RDBMS scaling

The traditional RDBMS was not built with this kind of massive scale in mind. Vendors have solved this problem by increasing the size of their databases. This usually meant hosting them in big-iron servers with memory architectures carefully matched to the software. To scale, you replace your server with an even larger server.

Scaling was an expensive, vendor-bound proposition. Over time, the industry has evolved to more easily increase capacity as it scales. First, he developed distributed systems. Then the cloud made things even simpler. It promised elastic compute and storage capacities to handle the kinds of volatile workloads associated with large-scale web applications. It’s harder to dynamically scale the capacity of an RDBMS running on hardware you’ve already paid for.

Reliability is also an issue. Despite all its promises of never failing, the cloud sometimes takes databases offline. Cloud service providers are selling the idea of multiple Availability Zones to support redundancy, but traditional client-server RDBMS was not designed for that.

“In 2021, my database is running on someone else’s server in someone else’s data center connected to many infrastructures, and I can’t control or predict its future configuration “, says Oliver. “I need software that doesn’t require me to trust the underlying infrastructure.”

One way to support the scalability of RDBMS in the cloud is to use read replicas, with one node for writes and any number of nodes for reads. This allows applications to scale, at least for read requests.

However, the changing nature of today’s large-scale applications has created a need beyond read replicas. Whether you’re updating your location in real time or unwittingly giving your data to an advertising network, there are plenty of apps where the writes outnumber or equal the reads.

The shift to polyglot persistence

Businesses have responded to these challenges by moving away from relational models. This often took a long time. Amazon began to move away from its Oracle-based RDBMS after the database began to fatigue under the weight of so many queries, but didn’t. shut down your last Oracle server until 2019. He developed databases based on key-value stores, columnar architectures and document structures. NoSQL providers like MongoDB and Couchbase have long offered relational alternatives to the market.

Their creators have adapted these systems for applications with needs for resilience, scalability and flexibility. The underlying implication is that every database was the right tool for the right job, otherwise known in database circles as polyglot persistence.

The problem with this emerging fad is that NoSQL databases have often dropped features built into the venerable general-purpose relational model. One of them was support for ACID transactions.

ACID is the gold standard for consistency models. He rocks Atomicity (all operations in a transaction must succeed, otherwise none succeed), VSconsistency (all transactions leave the structurally sound database), Isolution (the transactions do not conflict), and Durability (all transactions are permanent).

Many NoSQL databases focus on BASE consistency, which is more forgiving. Basique AAvailability means that the database is working most of the time. Sthe oft state means stores (such as replicas don’t need to be write-consistent all the time), regarding Eeventual consistency (these stores may update lazily, perhaps at play time). Although most NoSQL databases later added some form of ACID transaction, they still do not offer the same guarantee or level of consistency performance as traditional RDBMSs.

Many applications need ACID’s stricter standards, along with other things that NoSQL databases often leave out, like the ability to use joins for efficient data storage. These more demanding applications include many back-office functions such as finance, where you want to ensure that everyone’s account is balanced at all times.

Companies often bridge the functional gap between these two databases by creating customer-facing NoSQL systems while letting the RDBMS reconcile critical or financial data behind the scenes. This, however, creates complex architectures, while increasing maintenance and support overhead.

Multiple databases of different types mean two sets of developer skills, plus glue code for each database to communicate with the others. The more of these systems you have in your infrastructure, the more expensive skills and glue become.

The Rise of Distributed SQL

How can companies avoid finding themselves in this delicate situation? “What we really need is a database that can handle scale, survive multiple outages across cloud Availability Zones, and provide true consistency,” says Oliver. In short, a versatile platform that properly supports cloud-based models without giving up the benefits of RDBMS.

This is where distributed SQL comes in. This technology is actually a hybrid of SQL and NoSQL technologies, providing the regular SQL-based RDBMS structure while supporting the distributed operations that NoSQL systems are known for.

Distributed SQL partitions and replicates relational tables in the background, allowing them to run on multiple read and write nodes in a scalable environment. MariaDB Xpand, a distributed SQL database, handles this partitioning automatically by using a hash of a fragment key to create index partitions and algorithmically assign them to nodes.

The product automatically rebalances partitions to maintain performance on all available nodes and avoid “hot spots”. This rebalancing also supports the addition and removal (accidental or otherwise) of nodes, always keeping the data consistent and available.

To ensure adequate replication and resiliency, each shard has a set number of copies so that if one node goes missing, another replica is still there. This gives customers at least two replicas of each partition of each table, each running on different Availability Zones.

Distributed SQL in MariaDB SkySQL

MariaDB SkySQL, the cloud version of the open source database, additional support for SQL distributed in May 2021. In addition to traditional row-based indexes, it provides columnar indexes, which allow for more flexibility in data searches while using less space. Traditional row-based indexes require a dedicated composite index for each specific combination of columns they want to use in a query. Columnar indexes can be combined in any way to create a variety of indexes, allowing developers to perform powerful ad hoc queries and intraday operational analyzes against a transactional database.

SkySQL also handles semi-structured documents, as do many NoSQL databases. JSON, the lingua franca of key-value and document database stores, has been part of the SQL standard since SQL:2016. The cloud database can store a JSON document in a table. It can retrieve a JSON document from a row through an SQL projection (a subset of columns in a row).

MariaDB has also added other features to position its cloud database service as a versatile product that can handle things seamlessly for users. These include a database proxy that handles node failures independent of the client. It provisions a single IP address for the application to communicate with all nodes. Instead of returning failed transactions to the application, the database automatically handles transaction replay in the background. This eliminates error handling for failed nodes on the application side.

Benefits of a Generic Data Platform

An SQL database encompassing these properties brings several benefits, not the least of which is the scalability and performance of the distributed database operation. Developers don’t have to write different code to support cloud operations. They can use the SQL language they know to manage their distributed databases, but can still access JSON storage and manipulation if they need it.

Simplifying the architecture of a distributed database in this way makes it possible to reduce development time, and therefore overall costs. It provides customers with a single source of truth that supports ACID-level consistency without having to assemble a bunch of databases serving different applications. It does all of these things while providing the benefits of a scalable cloud architecture.

For many companies, distributed SQL is the middle ground between old and new. This gives them a proven and trusted relationship model that has powered larger organizations, along with the benefits of cloud-based environments.

In a world of many choices, distributed SQL and cloud computing have made the relationship more relevant than ever.

Sponsored by MariaDB.

data storage real time

All hail the return of the general-purpose database • The Register

Related posts: