What is Database Sharding?

Jun 08, 2021

So, how do you query this database? I would optimize the queries using an SQL optimizer. Well, let's say we have a lot of data, so optimizing queries is, you know, old school. So I could create an index on the table. All right, indexing is nice, but we are looking for something serious. We have a lot of data. So can we use NoSQL? No, we're not going to learn anything new. Now, for the last time, what do you think we should do? So I'll shard; I'll use sharding. Hmm. Well, hired. What is sharding?

Let's say you have a pizza and you can't eat it all by yourself. So you divide it into slices and call your friends, say eight of them. Now each of these friends gets a slice of pizza. What you have effectively done is partition the pizza, one portion per friend. In the same way, we can have several servers, each taking a share of the requests that are sent to the system. So how does a server fit into the pizza model? User IDs starting at 0 go to the first slice, user IDs starting at 100 go to the next, those starting at 200 go to the one after that, and so on. What you've effectively done is take all the requests you had and map them onto the pizza, so that each slice is served by one server, with the slice chosen, in this case, by the user ID.
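The user ID ranges above can be sketched in a few lines of Python. This is only an illustration: the range size of 100 comes from the example in the talk, while the number of shards and the function name `shard_for_user` are assumptions made up for the sketch.

```python
# Hypothetical setup: each shard owns a contiguous block of 100 user IDs,
# mirroring the "0 starts here, 100 starts here, 200 ..." picture.
RANGE_SIZE = 100
NUM_SHARDS = 8  # illustrative: one shard per pizza slice


def shard_for_user(user_id: int) -> int:
    """Return the index of the shard that owns this user ID."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    shard = user_id // RANGE_SIZE
    if shard >= NUM_SHARDS:
        raise ValueError(f"user_id {user_id} is outside the sharded range")
    return shard
```

With these numbers, user 42 lands on shard 0 and user 250 lands on shard 2. Note that the number of shards is fixed up front, which is exactly the kind of rigidity that becomes a problem later.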


The key thing to keep in mind here is that we couldn't eat the entire pizza alone. We needed friends to finish it efficiently. And when you get your friends together, you're effectively tearing the pizza into pieces: you're partitioning it. This type of partitioning, which uses some kind of key to divide data into pieces and assign them to different servers, is called horizontal partitioning. Horizontal partitioning depends on a key, which is an attribute of the data you are storing. You can contrast this with vertical partitioning.

Vertical partitioning, which uses columns to partition the data, is covered in a link in the description below. But we are focusing on horizontal partitioning, and specifically on one concept: sharding. Now, we mentioned that sharding is taking an attribute of the data and splitting on it so that each server gets a shard. What I mean by servers here is database servers. Contrast this with the normal servers we have been talking about so far. Normal servers are application servers. They handle data, but they try to be as stateless as possible to keep things decoupled, clean, and nice.

Database servers take care of the core of the data, okay? And we can't afford to make mistakes here. Consistency is important. It is one of the key attributes of any database: whatever you persist in it is what you read back later. And there is some kind of synchronization: if one person makes an update, the next request will read that update. So that's consistency. We also look at availability, which means the database should not fail or stay down. You want your application to run all the time, but consistency trumps availability when it comes to data, in most cases.

There is more to think about. What should you shard your data on? In our case, we have used the user ID, but in apps like Tinder, which use location, you can shard on location. Then if a person says, "find me all users in city X", X maps to one shard, and all you need to do is read that shard, which is what this database server, say server number seven, can do for you. That shard will be smaller, it will be easier to maintain, and it will probably give you faster performance.

So those are all the good things about sharding. What about the problems? The first problem to take into account is joins across shards. If a join spans shards, the query has to go to two different shards, pull the data out of each, and then join it across the network. That is going to be extremely expensive. So that is one of the problems. The second one shows up when you look at the pizza and realize that it is completely inflexible. The shards are inflexible. You can't have more slices of pizza or fewer slices of pizza.
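To see why a cross-shard join is expensive, here is a minimal sketch in Python that uses in-memory dictionaries as stand-ins for two database shards. The `users` and `orders` tables, their contents, and the function name are all hypothetical. The point is that every shard has to be contacted, and the join itself happens in the application after the data has crossed the network.

```python
# Hypothetical stand-ins for two database shards. Users and orders are
# sharded independently, so a user's orders may live on another shard.
SHARDS = [
    {"users": {1: "alice"},
     "orders": [{"order_id": 10, "user_id": 2}]},
    {"users": {2: "bob"},
     "orders": [{"order_id": 11, "user_id": 1},
                {"order_id": 12, "user_id": 2}]},
]


def orders_with_user_names(shards):
    """Join orders to user names across all shards, in application code."""
    # Step 1: one (simulated) network round trip per shard to fetch users.
    users = {}
    for shard in shards:
        users.update(shard["users"])

    # Step 2: another round trip per shard to fetch orders, then join locally.
    joined = []
    for shard in shards:
        for order in shard["orders"]:
            joined.append((order["order_id"], users[order["user_id"]]))
    return sorted(joined)
```

A common way to avoid this cost is to choose the shard key so that rows that are usually joined together end up on the same shard.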

Once it's cut, it's cut. But we want our database servers to be flexible in number. One of the really good algorithms for this is consistent hashing. You should take a look at that. One system that is often mentioned here is Memcached. Well, it doesn't really implement consistent hashing itself; you can use application logic on top of Memcached to get the job done. So it's not a dead end, but it is a problem that you can't have a dynamic number of shards. Now, to overcome this problem, what we do is take a shard that contains too much data and dynamically split it into pieces.
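Consistent hashing, mentioned above, can be sketched as a ring of hashed positions. The following is a minimal Python illustration, not how any particular database or cache client implements it. The class name `HashRing`, the use of MD5, and the 100 virtual nodes per server are assumptions chosen for the sketch.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, replicas=100):
        self._replicas = replicas
        self._ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place several virtual nodes per server to even out the load.
        for i in range(self._replicas):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(pos, s) for pos, s in self._ring if s != server]

    def lookup(self, key):
        """Return the server owning the first position at or after the key."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

The useful property is that adding a server only moves keys onto the new server. Keys never shuffle between the existing servers, which is what makes the number of shards flexible.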

So this slice of pizza is itself like a pizza to us. When it grows enough, it becomes a very large slice, and then we divide it into smaller pieces. There will be some kind of manager for each particular shard, which routes requests to the correct mini slice, so to speak, within that single slice of pizza. Using this technique, which is hierarchical sharding, we can get rid of the inflexibility. So point number two is no longer a big deal. Now, one of the smart things to do here is to create an index on these shards.
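The idea of a shard that splits itself into mini slices can be sketched as a small tree. This Python example is an assumption-laden illustration: the class name `Shard`, the split-in-half rule, and the `max_rows` threshold are invented for the sketch, and a real system would move data between servers rather than between in-memory dictionaries.

```python
class Shard:
    """A shard for user IDs in [low, high) that splits when it grows too big."""

    def __init__(self, low, high, max_rows=4):
        self.low, self.high, self.max_rows = low, high, max_rows
        self.rows = {}        # holds data while this shard is a leaf
        self.children = None  # (left, right) sub-shards after a split

    def _child_for(self, user_id):
        mid = (self.low + self.high) // 2
        return self.children[0] if user_id < mid else self.children[1]

    def put(self, user_id, value):
        if not self.low <= user_id < self.high:
            raise KeyError(f"{user_id} is outside [{self.low}, {self.high})")
        if self.children is not None:
            # This shard now only acts as the manager that routes requests.
            self._child_for(user_id).put(user_id, value)
            return
        self.rows[user_id] = value
        if len(self.rows) > self.max_rows and self.high - self.low > 1:
            self._split()

    def _split(self):
        mid = (self.low + self.high) // 2
        self.children = (Shard(self.low, mid, self.max_rows),
                         Shard(mid, self.high, self.max_rows))
        rows, self.rows = self.rows, {}
        for user_id, value in rows.items():
            self._child_for(user_id).put(user_id, value)

    def get(self, user_id):
        if self.children is not None:
            return self._child_for(user_id).get(user_id)
        return self.rows[user_id]
```

Callers keep talking to the top-level shard. Only the shard that became too large pays the cost of splitting, and the other shards are untouched.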

Depending on what your queries require, this index can be on a completely different attribute from the user ID. A good example is finding all the people in New York who are over 50 years old. If the shards are keyed by city ID, then New York lands, let's say, here, and within that shard you can index on age. This way you can find all users from New York within a certain age range. So all your queries are fast. That's the most important thing about sharding: your read performance goes up and your write performance goes up, because each query is focused on one particular shard.
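The New York example can be sketched as a shard that keeps its own secondary index on age. This is a minimal Python illustration with hypothetical names (`CityShard`, `users_older_than`) and made-up users. It also assumes each user is added only once, since updating a user would require removing the old index entry.

```python
import bisect


class CityShard:
    """One shard (for example, one city) with a secondary index on age."""

    def __init__(self):
        self._users = {}   # user_id -> (name, age)
        self._by_age = []  # sorted list of (age, user_id): the index

    def add_user(self, user_id, name, age):
        # Assumes user_id is new; updates would need the old entry removed.
        self._users[user_id] = (name, age)
        bisect.insort(self._by_age, (age, user_id))

    def users_older_than(self, min_age):
        """Return names of users strictly older than min_age, by age."""
        start = bisect.bisect_right(self._by_age, (min_age, float("inf")))
        return [self._users[uid][0] for _, uid in self._by_age[start:]]


# The shard key (city) picks the shard; the age index works within it.
shards = {"new_york": CityShard(), "boston": CityShard()}
shards["new_york"].add_user(1, "alice", 34)
shards["new_york"].add_user(2, "bob", 50)
shards["new_york"].add_user(3, "carol", 61)
shards["new_york"].add_user(4, "dave", 72)
shards["boston"].add_user(5, "erin", 80)
```

Calling `shards["new_york"].users_older_than(50)` only touches the New York shard and uses a binary search on the age index instead of scanning every user.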

But what happens if a shard fails? Let's say there is some kind of electrical problem there. In that case you could have something like a master-slave architecture, which is a very common architecture. In it, you have multiple slaves that copy the master. Whenever there is a write request, it always goes to the master. The master holds the most up-to-date copy, while the slaves continually poll the master and read from it. What happens then is that if there is a read request, it can be distributed among the slaves.

Whereas if there is a write request, it always goes to the master. In case the master fails, the slaves choose a new master among themselves, right? So there is a nice degree of fault tolerance here, rather than a single point of failure. Conceptually, it's pretty easy. You just take your data, split it into chunks, essentially into ranges, and then persist it in different places. But when it comes to practical application, this is quite difficult, because consistency of this kind is hard to achieve. And if you're just getting started with your system and thinking about sharding, I suggest you keep that in mind.
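The read and write routing described above can be sketched with plain dictionaries standing in for database nodes. This Python example is a deliberate simplification under stated assumptions: the class name `ReplicatedShard` is hypothetical, replication is a manual full copy, and the "election" simply promotes the first slave, where a real system would use a proper election protocol.

```python
import itertools


class ReplicatedShard:
    """One shard with a master and several slaves (illustrative only)."""

    def __init__(self, num_slaves=2):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]
        self._next_slave = itertools.cycle(range(num_slaves))

    def write(self, key, value):
        # Writes always go to the master.
        self.master[key] = value

    def replicate(self):
        # Stand-in for slaves continually pulling changes from the master.
        for slave in self.slaves:
            slave.clear()
            slave.update(self.master)

    def read(self, key):
        # Reads are spread across the slaves in round-robin order.
        return self.slaves[next(self._next_slave)].get(key)

    def fail_over(self):
        # Simplified election: promote the first slave to master.
        # Assumes at least one slave remains afterwards to serve reads.
        self.master = self.slaves.pop(0)
        self._next_slave = itertools.cycle(range(len(self.slaves)))
```

A read issued after `write` but before `replicate` returns nothing, because the slaves have not caught up yet. That gap is a small-scale version of why consistency is hard to achieve in practice.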

Mechanisms like indexing, or NoSQL databases that internally use these kinds of concepts, are out-of-the-box, well-known solutions, and using them is probably the way to go before sharding a database. Even harder than sharding is hitting the Like and Subscribe buttons at the same time. If you can do that, you'll get notifications for more videos, and I'll see you next time.
