Unraveling the Intricacies of NoSQL Databases

How do NoSQL databases work? Simply Explained!

Summary

This video offers a comprehensive overview of NoSQL databases and explains why big companies increasingly rely on them. It starts with the limitations of traditional relational databases, which scale mainly vertically and struggle with large volumes of data because they must maintain complex relationships between tables. NoSQL databases, in contrast, scale both vertically and horizontally: by reducing each item to a key-value pair, they can distribute data across many servers. The video covers the advantages of NoSQL, such as scalability and schemaless flexibility, as well as the trade-offs: eventual consistency and efficient retrieval mainly by primary key. Examples such as AWS DynamoDB and Google Cloud BigTable illustrate how far these systems scale in cloud infrastructure. Finally, it explains the two readings of the name 'NoSQL': 'not only SQL' and 'non-relational'.

Highlights

Relational databases struggle with scaling due to complex data relationships 🧩.
NoSQL simplifies data into key-value pairs for easy distribution 🗝️.
Horizontal scaling in NoSQL is like adding more buildings, not floors 🌆.
Data in NoSQL is partitioned and stored across multiple servers 📚.
During high demand, NoSQL efficiently handles millions of queries per second ⚙️.

Key Takeaways

NoSQL databases scale better than relational databases due to key-value storage 🌟.
They support both vertical and horizontal scaling, unlike traditional databases 🏢➡️🏢.
NoSQL databases are schemaless, allowing flexible data structures 🌐.
Examples of NoSQL databases include DynamoDB, BigTable, and CosmosDB 🌩️.
NoSQL databases offer enormous scalability: during Prime Day 2019, Amazon's NoSQL database peaked at 45 million requests per second ⚡.

Overview

NoSQL databases have surged in popularity, particularly among large enterprises needing to manage vast amounts of data efficiently. These systems stand out by simplifying the data storage process into key-value pairs, which eases the distribution of data across multiple servers. This capability allows for what is known as horizontal scaling, akin to adding more buildings rather than just floors to increase capacity, making them far more scalable than traditional relational databases.
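To make the key-value model concrete, here is a minimal Python sketch, assuming an in-memory dictionary stands in for a single database server. The `put` and `get` functions and the sample barcodes are hypothetical, chosen for illustration: each item is a barcode key paired with a JSON document, the store never looks inside that document, and a query on a field within the value has to scan every item.

```python
import json
from typing import Dict, Optional

# Hypothetical in-memory stand-in for one database server: every item is a
# (key, value) pair and stands on its own, with no links to other items.
store: Dict[str, str] = {}

def put(key: str, document: dict) -> None:
    # The value is stored as an opaque JSON string; the store never inspects it.
    store[key] = json.dumps(document)

def get(key: str) -> Optional[dict]:
    # Lookup by primary key: a single dictionary access.
    raw = store.get(key)
    return json.loads(raw) if raw is not None else None

# Product barcode as key, JSON document as value (made-up sample data).
put("5012345678900", {"name": "Coffee beans", "price": 7.99,
                      "description": "1 kg, medium roast"})
# No schema: a second item may carry entirely different fields.
put("4006381333931", {"name": "Highlighter", "colors": ["yellow", "pink"]})

print(get("5012345678900")["name"])  # Coffee beans

# A query on a field inside the value cannot use the key, so every item
# must be read and decoded (a full scan).
cheap = [key for key, raw in store.items()
         if json.loads(raw).get("price", float("inf")) < 10.00]
print(cheap)  # ['5012345678900']
```

Because the store treats each value as opaque, it can decide where an item lives from the key alone; the price query at the end is the kind of access that becomes expensive once the data is spread across many servers.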

One of the significant benefits of NoSQL databases is their schemaless nature. Unlike relational databases, where every row must conform to a predefined table structure, NoSQL items need not share the same structure, so the data model can evolve with the application without extensive restructuring. However, NoSQL has trade-offs. Data can be retrieved efficiently mainly by primary key, so a query such as "all orders above a certain amount" is costly unless the access pattern was planned for in advance. NoSQL databases are also eventually consistent: each partition is mirrored across several servers and new writes are copied between them in the background, so a read issued immediately after a write may reach a mirror that does not have the update yet.
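The replication behaviour behind eventual consistency can be sketched with a small simulation. This is a simplified model, not how any particular database implements it: the `Partition` and `Replica` classes are hypothetical, a write lands on the first mirror, and an explicit `replicate()` call stands in for the background copy. The `consistent=True` read is modeled as asking the mirror that accepted the write; real systems typically route such reads through a leader or a quorum of replicas.

```python
import random
from typing import Dict, List, Optional, Tuple

class Replica:
    """One mirror of a partition, holding its own copy of the items."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}

class Partition:
    """A partition mirrored across several replicas (simplified model).

    A write is stored on the first replica immediately; copying it to the
    other replicas is deferred until replicate() runs, which stands in for
    background replication.
    """

    def __init__(self, replica_count: int = 3) -> None:
        self.replicas = [Replica(f"replica-{i}") for i in range(replica_count)]
        self.pending: List[Tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.replicas[0].data[key] = value  # one mirror stores the new item
        self.pending.append((key, value))   # the others will catch up later

    def replicate(self) -> None:
        # Background copy: apply every pending write to the remaining mirrors.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()

    def read(self, key: str, consistent: bool = False) -> Optional[str]:
        # Eventually consistent read: any mirror may answer, even a stale one.
        # Consistent read: modeled here as asking the mirror that took the write.
        replica = self.replicas[0] if consistent else random.choice(self.replicas)
        return replica.data.get(key)

p = Partition()
p.write("order-1", "shipped")
print(p.replicas[1].data.get("order-1"))  # None: this mirror has not caught up
print(p.read("order-1", consistent=True))  # shipped
p.replicate()
print(p.read("order-1"))                   # shipped, whichever mirror answers
```

Before `replicate()` runs, an eventually consistent read has a chance of landing on a mirror that returns `None`; afterwards every mirror agrees, which is the "eventual" part.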

Despite these trade-offs, NoSQL databases such as AWS's DynamoDB, Google Cloud's BigTable, and Azure's CosmosDB are heavily promoted by cloud providers because they are easier to scale, and self-hosted options such as Cassandra, Scylla, CouchDB, and MongoDB are also available. Their capacity for massive request volumes was on display during Amazon's Prime Day in 2019, when Amazon's NoSQL database peaked at 45 million requests per second. Still, NoSQL complements rather than replaces relational databases; both will be around for the foreseeable future. The name 'NoSQL' itself has two readings: 'not only SQL', since some NoSQL databases partially understand SQL, and 'non-relational', since they cannot easily store relational data.

How do NoSQL databases work? Simply Explained! Transcript

00:00 - 00:30 NoSQL databases have become very popular. Big companies rely on them to store hundreds of petabytes of data and run millions of queries per second. But what is a NoSQL database? How does it work, and why does it scale so much better than traditional, relational databases? Let's start by quickly explaining the problem with relational databases like MySQL, MariaDB, SQL Server, and the like. These are built to store relational data as efficiently as possible. You can have a table for customers, orders, and products, linking together logically:
00:30 - 01:00 customers place orders and orders contain products. This tight organization is great for managing your data, but it comes at a cost: relational databases have a hard time scaling. They have to maintain these relationships, and that's an intensive process, requiring a lot of memory and compute power. So for a while, you can keep upgrading your database server, but at some point, it won't be able to handle the load.
01:00 - 01:30 In technical terms, we say that relational databases can scale vertically, but not horizontally, whereas NoSQL databases can scale both vertically and horizontally. You can compare this to a building: vertically scaling means adding more floors to an existing building, while horizontal scaling means adding more buildings. You intuitively understand that vertical scaling is only possible to a certain extent, while horizontal scaling is much more powerful. Why do NoSQL databases scale so well?
01:30 - 02:00 Well, first of all, they do away with these costly relationships. In NoSQL, every item in the database stands on its own. This simple modification means that they're essentially key-value stores. Each item in the database only has two fields: a unique key and a value. For instance: when you want to store product information, you can use the product's bar code as the key and the product name as the value. This seems restrictive, but the value can be something like a JSON document containing
02:00 - 02:30 more data, like price and description. This simpler design is why NoSQL databases scale better. If a single database server is not enough to store all your data or handle all the queries, you can split the workload across two or more servers. Each server will then be responsible for only a part of your database. To give an example: Apple runs a NoSQL database that consists of 75,000 servers. In NoSQL terms, these parts of your database are called partitions, and this brings up a
02:30 - 03:00 question. If your database is split across potentially thousands of partitions, how do you know where an item is stored? That's where the primary key comes in. Remember, NoSQL databases are key-value stores, and the key determines which partition an item is stored on. Behind the scenes, NoSQL databases use a hash function to convert each item's primary key into a number that falls into a fixed range.
03:00 - 03:30 Say, between 0 and 100. This hash value and the range are then used to determine where to store an item. If your database is small enough or doesn't get many requests, you can put everything on a single server. This one will then be responsible for the entire range. If that server is becoming overloaded, you can add a secondary server, which means that the range will be split in half. Server 1 will be responsible for all items with a hash between 0 and 50, while server
03:30 - 04:00 2 will store everything between 50 and 100. Theoretically, you've now doubled your database capacity: both in terms of storage and in the number of queries you can execute. This range is also called a keyspace. It's a simple system that solves two problems: where to store new items and where to find existing ones. All you have to do is calculate the hash of an item's key and keep track of which server is responsible for which part of the keyspace.
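The keyspace scheme described above can be sketched in a few lines of Python, assuming the video's toy keyspace of 0 to 100 and equal-sized, contiguous ranges per server. The function names and server labels are hypothetical, and MD5 stands in for whatever hash function a real database uses (Python's built-in `hash()` is avoided because it is salted per process for strings). Ranges are half-open, so with two servers, hashes 0 to 49 go to server 1 and 50 to 99 go to server 2.

```python
import hashlib
from typing import List, Tuple

KEYSPACE = 100  # the video's toy range; real systems use far larger keyspaces

def key_hash(key: str) -> int:
    """Map a primary key to a number in the range [0, KEYSPACE)."""
    # MD5 serves only as a stable, well-spread hash here, not for security.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % KEYSPACE

def assign_ranges(servers: List[str]) -> List[Tuple[int, int, str]]:
    """Split the keyspace into equal, contiguous, half-open ranges."""
    step = KEYSPACE / len(servers)
    return [(round(i * step), round((i + 1) * step), server)
            for i, server in enumerate(servers)]

def locate(key: str, ranges: List[Tuple[int, int, str]]) -> str:
    """Return the server whose range contains the key's hash."""
    h = key_hash(key)
    for start, end, server in ranges:
        if start <= h < end:
            return server
    raise ValueError(f"hash {h} is not covered by any range")

one_server = assign_ranges(["server-1"])
two_servers = assign_ranges(["server-1", "server-2"])
print(two_servers)  # [(0, 50, 'server-1'), (50, 100, 'server-2')]

for key in ["order-17", "order-42", "product-5012345678900"]:
    # The same hash picks the server for both storing and finding an item.
    print(key, key_hash(key), locate(key, one_server), "->", locate(key, two_servers))
```

Re-splitting the keyspace when a server is added changes which server owns some keys, so a real system also has to move those items; this sketch only computes the new ownership.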
04:00 - 04:30 Now, in this example, the range of 0 to 100 is a bit small. It would only allow you to split up your database into 100 pieces at most. So, real NoSQL databases have much bigger keyspaces, allowing them to scale almost without restrictions. Besides great scalability, NoSQL is schemaless, which means that items in the database don't need to have the same structure. Each one can be completely different. In a relational database, you have to define your table's structure, and then each item
04:30 - 05:00 must conform to it. Changing this structure isn't straightforward and could even lead to loss of data. Not having a schema can be a big advantage if your application and data structure are constantly evolving. At this point, it's clear that NoSQL databases have certain advantages over relational ones. But that's not to say that relational databases are obsolete, far from it. NoSQL is more limited in the way you can retrieve your data, only allowing you to retrieve items
05:00 - 05:30 by their primary key. Finding orders by ID is no problem, but finding all orders above a certain amount would be very inefficient. Relational databases, on the other hand, have no trouble with this. There are workarounds for this issue, but only if you know how you're going to access your data. And that might not always be the case. Another downside is that NoSQL databases are eventually consistent. When you write a new item to the database and try to read it back straight away, it
05:30 - 06:00 might not be returned. As I've explained, NoSQL splits your database into partitions. But each partition is mirrored across multiple servers. That way, a server can go down without much impact. When you write a new item to the database, one of these mirrors will store the new item and then copy it to the others in the background. This process might take a little bit of time. So when you read that item, the NoSQL database might try to read it from a mirror that doesn't
06:00 - 06:30 have it yet. This is not a big issue in practice because data is replicated in just a few milliseconds. And if you need strong consistency, most NoSQL databases offer it as an option. So, in summary: both NoSQL and relational databases will be around for the foreseeable future, each with its own strengths and weaknesses. Now that you know how NoSQL works, let's look at a few examples. Cloud providers heavily promote NoSQL because they can scale it more easily.
06:30 - 07:00 AWS has DynamoDB, Google Cloud has BigTable, and Azure has CosmosDB. To give you another example of their scalability: during Amazon Prime Day in 2019, Amazon's NoSQL database peaked at 45 million requests per second. That's mind-boggling! But you can also run NoSQL databases yourself with software like Cassandra (which was developed by Facebook), Scylla, CouchDB, MongoDB, and more.
07:00 - 07:30 Before ending this video, let's quickly talk about the name "NoSQL." It's a bit confusing as it can be interpreted in two ways. First up: "NoSQL" can mean "not only SQL," pointing to the fact that some NoSQL databases partially understand the SQL query language, on top of their own query capabilities. And secondly, it's often called "NoSQL" in the sense of "non-relational" because it can't easily store relational data. So that was it for this video.
07:30 - 08:00 Please subscribe if you learned something from it, and I hope to see you in the next video!