Get the latest AI workflows to boost your productivity and business performance, delivered weekly by expert consultants. Enjoy step-by-step guides, weekly Q&A sessions, and full access to our AI workflow archive.
Summary
This video offers a comprehensive overview of NoSQL databases, highlighting their growing popularity among big companies. It starts by explaining the limitations of traditional relational databases, which scale mainly vertically and struggle with handling large volumes of data due to their complex relationships. In contrast, NoSQL databases scale both vertically and horizontally by simplifying data storage into key-value pairs, allowing for easier distribution across multiple servers. The video delves into the advantages of NoSQL, such as better scalability and schemaless flexibility, while also addressing the trade-offs, like eventual consistency and limitations in complex data retrieval. Examples of popular NoSQL systems, such as AWS DynamoDB and Google Cloud BigTable, illustrate their massive scalability and use in modern computing infrastructures. Finally, the meaning behind the term 'NoSQL' is discussed, clarifying its dual interpretation.
Highlights
Relational databases struggle with scaling due to complex data relationships đ§Š.
NoSQL simplifies data into key-value pairs for easy distribution đī¸.
Horizontal scaling in NoSQL is like adding more buildings, not floors đ.
Data in NoSQL is partitioned and stored across multiple servers đ.
During high demand, NoSQL efficiently handles millions of queries per second âī¸.
Key Takeaways
NoSQL databases scale better than relational databases due to key-value storage đ.
They support both vertical and horizontal scaling, unlike traditional databases đĸâĄī¸đĸ.
NoSQL databases are schemaless, allowing flexible data structures đ.
Examples of NoSQL databases include DynamoDB, BigTable, and CosmosDB đŠī¸.
NoSQL databases offer great scalability, as seen in systems like Amazon's during Prime Day âĄ.
Overview
NoSQL databases have surged in popularity, particularly among large enterprises needing to manage vast amounts of data efficiently. These systems stand out by simplifying the data storage process into key-value pairs, which eases the distribution of data across multiple servers. This capability allows for what is known as horizontal scaling, akin to adding more buildings rather than just floors to increase capacity, making them far more scalable than traditional relational databases.
One of the significant benefits of NoSQL databases is their schemaless nature, which permits a flexible approach to data structure. Unlike relational databases that require a fixed schema, NoSQL allows for a dynamic arrangement of data, adapting to evolving needs without extensive restructuring. However, NoSQL isn't without its trade-offs, which include challenges in complex data retrieval and eventual consistency, wherein real-time data retrieval might not always reflect the very latest updates immediately across all servers.
Despite these challenges, NoSQL databases like AWS's DynamoDB, Google Cloud's BigTable, and Azure's CosmosDB remain pivotal in the IT landscape, offering robust solutions for handling large-scale operations. Their ability to manage massive volumes of requests was showcased on occasions like Amazon's Prime Day, which saw unprecedented spikes in queries. The term 'NoSQL' encapsulates this world of flexible, scalable database solutions, emphasizing their ability to complement rather than completely replace traditional database systems.
How do NoSQL databases work? Simply Explained! Transcription
00:00 - 00:30 NoSQL databases have become very popular. Big companies rely on them to store hundreds
of petabytes of data and run millions of queries per second. But what is a NoSQL database? How does it work, and why does it scale so
much better than traditional, relational databases? Let's start by quickly explaining the problem
with relational databases like MySQL, MariaDB, SQL Server, and alike. These are built to store relational data as
efficiently as possible. You can have a table for customers, orders,
and products, linking together logically:
00:30 - 01:00 customers place orders and orders contain
products. This tight organization is great for managing
your data, but it comes at a cost: relational databases have a hard time scaling. They have to maintain these relationships,
and that's an intensive process, requiring a lot of memory and compute power. So for a while, you can keep upgrading your
database server, but at some point, it won't be able to handle the load.
01:00 - 01:30 In technical terms, we say that relational
databases can scale vertically, but not horizontally, whereas NoSQL databases can scale both vertically
and horizontally. You can compare this to a building: vertically
scaling means adding more floors to an existing building, while horizontal scaling means adding
more buildings. You intuitively understand that vertical scaling
is only possible to a certain extend, while horizontal scaling is much more powerful. Why do NoSQL databases scale so well?
01:30 - 02:00 Well, first of all, they do away with these
costly relationships. In NoSQL, every item in the database stands
on its own. This simple modification means that they're
essentially key-value stores. Each item in the database only has two fields:
a unique key and a value. For instance: when you want to store product
information, you can use the product's bar code as the key and the product name as the
value. This seems restrictive, but the value can
be something like a JSON document containing
02:00 - 02:30 more data, like price and description. This simpler design is why NoSQL databases
scale better. If a single database server is not enough
to store all your data or handle all the queries, you can split the workload across two or more
servers. Each server will then be responsible for only
a part of your database. To give an example: Apple runs a NoSQL database
that consists of 75,000 servers. In NoSQL terms, these parts of your database
are called partitions, and it brings up a
02:30 - 03:00 question. If your database is split across potentially
thousands of partitions, how do you know where an item is stored? That's where the primary key comes in. Remember, NoSQL databases are key-value stores,
and the key determines on what partition an item will be stored. Behind-the-scenes, NoSQL databases use a hash
function to convert each item's primary key into a number that falls into a fixed range.
03:00 - 03:30 Say between 0 and 100. This hash value and the range is then used
to determine where to store an item. If your database is small enough or doesn't
get many requests, you can put everything on a single server. This one will then be responsible for the
entire range. If that server is becoming overloaded, you
can add a secondary server, which means that the range will be split in half. Server 1 will be responsible for all items
with a hash between 0 and 50, while server
03:30 - 04:00 2 will store everything between 50 and 100. Theoretically, you've now doubled your database
capacity: both in terms of storage and in the number of queries you can execute. This range is also called a keyspace. It's a simple system that solves two problems:
where to store new items and where to find existing ones. All you have to do is calculate the hash of
an item's key and keep track of which server is responsible for which part of the keyspace.
04:00 - 04:30 Now, in this example, the range of 0 to 100
is a bit small. It would only allow you to split up your database
into 100 pieces at most. So, real NoSQL databases have much bigger
key spaces, allowing them to scale almost without restrictions. Besides great scalability, NoSQL is schemaless,
which means that items in the database don't need to have the same structure. Each one can be completely different. In a relational database, you have to define
your table's structure, and then each item
04:30 - 05:00 must conform to it. Changing this structure isn't straightforward
and could even lead to loss of data. Not having a schema can be a big advantage
if your application and data structure is constantly evolving. At this point, it's clear that NoSQL databases
have certain advantages over relational ones. But that's not to say that relational databases
are obsolete, far from it. NoSQL is more limited in the way you can retrieve
your data, only allowing you to retrieve items
05:00 - 05:30 by their primary key. Finding orders by ID is no problem, but finding
all orders above a certain amount would be very inefficient. Relational databases, on the other hand, have
no trouble with this. There are workarounds for this issue, but
only if you know how you're going to access your data. And that might not always be the case. Another downside is that NoSQL databases are
eventually consistent. When you write a new item to the database
and try to read it back straight away, it
05:30 - 06:00 might not be returned. As I've explained, NoSQL splits your database
into partitions. But each partition is mirrored across multiple
servers. That way, a server can go down without much
impact. When you write a new item to the database,
one of these mirrors will store the new item and then copy it to the others in the background. This process might take a little bit of time. So when you read that item, the NoSQL database
might try to read it from a mirror that doesn't
06:00 - 06:30 have it yet. This is not a big issue in practice because
data is replicated in just a few milliseconds. And if you want consistency, most NoSQL databases
do have that option. So, in summary: both NoSQL and relational
databases will be around for the foreseeable future. Each with their own strengths and weaknesses. So now you know how NoSQL works, let's look
at a few examples. Cloud providers heavily promote NoSQL because
they can scale it more easily.
06:30 - 07:00 AWS has DynamoDB, Google Cloud has BigTable,
and Azure has CosmosDB. To give you another example of their scalability:
during Amazon Prime Day in 2019, Amazon's NoSQL database peaked at 45 million requests
per second. That's mind-boggling! But you can also run NoSQL databases yourself
with software like Cassandra (which was developed by Facebook), Scylla, CouchDB, MongoDB, and
more.
07:00 - 07:30 Before ending this video, let's quickly talk
about the name "NoSQL." It's a bit confusing as it can be interpreted
in two ways. First up: "NoSQL" can mean "not only SQL,"
pointing to the fact that some NoSQL databases partially understand the SQL query language,
on top of their own query capabilities. And secondly, it's often called "NoSQL" in
the sense of "non-relational" because it can't easily store relational data. So that was it for this video.
07:30 - 08:00 Please subscribe if you learned something
from it, and I hope to see you in the next video!