WhatsApp System Design: Chat Messaging Systems for Interviews

Jun 08, 2021
Hello everyone, this is GKCS. This is a video about WhatsApp design. It is a chat-based application, so once you know how to design WhatsApp, you will be able to design largely any chat-based application. The special thing about WhatsApp is that it has group messaging and read receipts. Those are the two key features that people look for in a typical systems design interview. There are other features we'll talk about, and some that we probably shouldn't bring up in an interview at all, so the point is to choose the features we cover so that we can actually finish in the time we have.
Now, of all the features, you can ask your interviewer: would you like this one? Would you like that one? You should probably start simple, with things you already know, because I've noticed that the interviewer usually says yes to the first feature you ask about. One of the things I'm comfortable with is group messaging. WhatsApp groups can have at most 200 people, so group messaging is something I understand to a large extent. Image sharing is another good question: will images be shared in these messages? The almost obvious answer is yes, we will allow sharing images or videos.
Read receipts are a good question too. If you have used WhatsApp, you will know about sent, delivered and read receipts: those tick marks appear depending on what stage the message is in. The last two things are not critical to an application in terms of features, but they are good to think about from an engineering point of view. The first is: is the person online, and if not, when were they last seen in the chat? The second is: are the chats temporary or permanent?
So if you take a look at Snapchat, or even WhatsApp in some ways, they're much more temporary than a lot of the office messaging apps. The reason is that you want a lot of privacy; you want to give the user a lot of power. Plus, you save a lot of storage space if chats are stored only in the users' apps. But if you need some kind of compliance, or if it is official communication, then you'll want that message to be stored somewhere forever. That's another thing we'll ask.
WhatsApp, so to speak, offers only temporary chats: if you delete the application and your friend also deletes it, those chat messages are lost forever. One thing I would like to say is that image sharing has already been covered on this channel. If you want to see how it is done, watch the Tinder video. It explains how images can be stored, retrieved, and so on, in a sensible engineering way. So we have four features left for this video, and the first one we'll pick up is group messaging. Before we move on to group messaging, we first need to talk about how one person sends a message to another.
That's a one-to-one chat, and that's our requirement: one-to-one chat. Alright, this is what we're going to arrive at, so let's go step by step. Many of the things I will discuss here are in the system design playlist, so check it out. Things like load balancing and message queues I'm going to use as abstractions, as building blocks to fulfill all the features we've talked about. If you want the details, you can always go there. Avoiding a single point of failure is also something quite important in WhatsApp's architecture.
So check those out. Now let's get started. You have the application installed on your cell phone, and you connect to WhatsApp in the cloud. The place you connect to is called the gateway. The reason for this is that you will use an external protocol when you talk to WhatsApp, but WhatsApp may be speaking a different protocol with its internal services. The main reason is that you don't need as much security internally. You don't need those big headers that HTTP gives you when you talk internally, because a lot of the security mechanisms are taken care of on the gateway itself.
So once you connect to the gateway, let's say you are sending a message to person B. You are person A, and person A connects to the gateway. The gateway then needs to send the message to person B somehow. You could store the information about which users are connected to which box on the gateway. In that case you would need some kind of user-to-box mapping, right? The gateway service, which is a microservice itself, would need to store the information that this user ID is currently connected to box number two.
So if these are boxes number 1, 2 and 3, then there must be information saying that B is connected to box two and A is connected to box one. Storing this type of information on the boxes is expensive. Why? Because maintaining a TCP connection itself requires some memory. You want to increase the maximum number of connections a single box can hold, and you don't want that memory wasted on keeping information about who is connected to which box. The second thing is that this information would be duplicated on all three servers.
Either it is being duplicated, or there is some caching mechanism, or there is some database that is handling it. This is transient information, so there will be a lot of updates, and that is not good. There is a lot of coupling that I can see in this system. So what you want is a dumb connection. This TCP connection should be dumb in the sense that it just takes information and gives information; it doesn't know what it's doing. The one you ask about who is connected to which box is a microservice itself, and this can be the sessions microservice.
What does the sessions microservice store? Who is connected to which box. The information that we were storing here, handled by the gateway, has been decoupled from it and moved to the sessions microservice. You can see that there are multiple servers, to avoid a single point of failure. So when user A sends a message, it actually makes a request to send a message with B's user ID. When the gateway receives this request, it is quite dumb. It doesn't know what to do with it; it just forwards it to the session service, okay?
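As a rough sketch of the lookup table the sessions microservice keeps, here is a minimal in-memory version. The class and method names (`SessionRegistry`, `connect`, `box_for`) are hypothetical, and a real service would keep this in a replicated store rather than a Python dict.

```python
from typing import Dict, Optional


class SessionRegistry:
    """Hypothetical stand-in for the sessions microservice's user-to-box table."""

    def __init__(self) -> None:
        self._box_by_user: Dict[str, int] = {}

    def connect(self, user_id: str, box: int) -> None:
        # Called when a user opens a connection to a gateway box.
        self._box_by_user[user_id] = box

    def disconnect(self, user_id: str) -> None:
        # Called when the connection drops; unknown users are ignored.
        self._box_by_user.pop(user_id, None)

    def box_for(self, user_id: str) -> Optional[int]:
        # Returns None when the user is not connected anywhere.
        return self._box_by_user.get(user_id)


registry = SessionRegistry()
registry.connect("A", 1)  # A is connected to box one
registry.connect("B", 2)  # B is connected to box two
```

The gateways themselves hold no such table here; they only ask the registry, which is the decoupling the text argues for.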
The session service is effectively a router. When it receives this request to send a message to user B, it finds out where user B is, that is, which box user B is connected to. Then it routes the message by sending it to gateway two, which sends it on to user B. So A has sent a message to B. Interesting. But how does that final bit work, where gateway two, a server, sends a message to B? This cannot be done using HTTP.
HTTP is a client-to-server protocol: the client sends requests and the server gives responses. Therefore, you cannot send a message from the server to the client; you can only send requests from the client to the server. There are ways to work around this using HTTP itself. One of them is long polling, in which case every minute or so B asks: hey, are there any new messages for me? And then the gateway or the session service, whichever you want, can send B the message.
Of course, this is not real time, and for chat applications it is very important to be real time. So HTTP is not something we can use, and we need another protocol over TCP. What we are really looking for is WebSockets. WebSockets are very good for chat applications. The main reason is that they allow peer-to-peer style communication: A sends to B, B sends to A, and there are no client or server semantics once the connection is open. So with that, the server can literally send a message to the client.
Okay, so we're happy that B got the message. Now what? B got the message, which means it has been delivered. At this point, user A must be notified that the message has been delivered. But there's a step I missed. When the message arrives at the gateway and reaches the session service, the session service can send a parallel response to gateway one saying: okay, I received the message, and it will be sent to user B whenever possible. The message can be stored in, let's say, a separate database for chat. Because it's stored in the database, it's safe, it's persistent, and the service will keep retrying the message until user B receives it.
So A is guaranteed that B will eventually receive the message, and A should then get the sent receipt. The session service just responds to gateway one saying, okay, I have the message, and gateway one passes that on to user A. The sent receipt is done when this whole flow completes. Next, when B receives the message, how do we give A a delivery receipt? Once you send the message to B and B actually receives it, B should respond. I mean, B should go back to gateway two and say that it received the message.
That's an acknowledgment, sent back over the same TCP connection. When gateway two receives this acknowledgment, it passes it to the session service saying: hey, this message was received. The message contains a "to" and a "from" field. So the session service knows the message has been received by the person in the "to" field, which is B, and that the person in the "from" field, A, should get a delivery receipt. So the session service again finds out where A is. That's box number one, so it sends the delivery receipt there.
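The receipt flow above can be sketched as a small state machine. This is only an illustration under assumptions: the names (`ReceiptTracker`, `accept`, `acknowledge`) are made up, and a READ state is included on the assumption that read receipts travel the same path as delivery receipts.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Status(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


class ReceiptTracker:
    """Hypothetical per-message status tracking inside the session service."""

    def __init__(self) -> None:
        # message_id -> (sender, recipient, current status)
        self._messages: Dict[str, Tuple[str, str, Status]] = {}
        # Receipts to push to senders: (to_user, message_id, status name)
        self.receipts: List[Tuple[str, str, str]] = []

    def accept(self, message_id: str, sender: str, recipient: str) -> None:
        # The server has stored the message, so the sender gets a sent receipt.
        self._messages[message_id] = (sender, recipient, Status.SENT)
        self.receipts.append((sender, message_id, Status.SENT.name))

    def acknowledge(self, message_id: str, status: Status) -> None:
        # The recipient acknowledged; only ever move the status forward.
        sender, recipient, current = self._messages[message_id]
        if status <= current:
            return  # duplicate or stale acknowledgment, nothing new to report
        self._messages[message_id] = (sender, recipient, status)
        self.receipts.append((sender, message_id, status.name))


tracker = ReceiptTracker()
tracker.accept("m1", sender="A", recipient="B")
tracker.acknowledge("m1", Status.DELIVERED)
tracker.acknowledge("m1", Status.DELIVERED)  # repeated ack is ignored
```

Ignoring acknowledgments that do not advance the status keeps a retried acknowledgment from producing a second receipt for A.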
A receives the delivery receipt. And of course you can think about how read receipts will work. The moment a person opens the app and opens this chat tab, they send a message saying "read", and the exact same flow takes care of read receipts as well. Alright, that's a lot to digest, so go over it once more if you want. That's the first feature: sent, delivered and read receipts for the sender. The second feature we are talking about is pretty simple: last seen, or whether a person is online right now. At large scale, I mean when there are millions of users, everything gets complicated.
But one of the main architectural things we can do here is this. Simply put, B just wants to know when A was last online, so this information needs to be stored somewhere. The server could ask A directly, but that would be stupid. So A isn't even in the picture now; the only messages sent and received are between B and the server. B asks the server: when was A last online? There needs to be an entry in some table saying that this user was last online at this time.
So that's some entry here with a particular timestamp. The only question left is how you maintain this row, this key-value pair of user and last-seen timestamp. Every time user A performs an activity, basically sending a message, reading a message, or making any kind of request to the server, it should be recorded as an activity, and the current timestamp should be persisted in this table. That way we can say that whenever A did something, she was definitely online, which means the last-seen timestamp should be updated accordingly.
B can then be told whether A is online or not. One key detail is that if A was online three seconds ago, B should not be told "last seen three seconds ago". Instead, the label displayed should be "online"; she probably just hasn't done anything in the last three seconds. You can set this threshold to whatever you want, maybe 10 seconds, maybe 15 or 20 seconds. The important thing is that a user is shown either as online or as last seen at least that long ago. The last-seen entry is a bit tricky to update, even when you are tracking all the activities.
So what I will do is have a microservice, the last seen microservice, and whenever a user sends a request to the gateway, it will track that user activity. Every time there is an activity, the user sends a request to the gateway, and when they do, I will record that they were last seen at this time. Now, interestingly, there may be some requests that are sent not by the user but by the application itself. For example, the app polls for messages: maybe you're offline, you're not using the app, but you want your app to notify you every time there's a message.
For example, sending a delivery receipt is not the user's activity. So the client has to be smart about saying whether a request is a user activity or something the application is doing by itself. The client sends two types of requests: one type is user activity, and the other is, say, system-generated or application requests. This can be a flag on the request itself. If it is an application request, do not send it to the last seen service. If it is a user activity, send it to the last seen service, which will update the last-seen timestamp for this user.
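A minimal sketch of the last seen service described above, assuming a 15-second online threshold and a boolean flag on each request. The names (`LastSeenService`, `record`, `presence`) and the label wording are illustrative assumptions; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from typing import Dict

ONLINE_THRESHOLD_SECONDS = 15.0  # assumed value; the text suggests 10 to 20 seconds


class LastSeenService:
    """Hypothetical last-seen store: one timestamp per user."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, float] = {}

    def record(self, user_id: str, now: float, user_activity: bool) -> None:
        # Application-generated requests (polling, delivery receipts)
        # must not move the last-seen timestamp.
        if user_activity:
            self._last_seen[user_id] = now

    def presence(self, user_id: str, now: float) -> str:
        seen = self._last_seen.get(user_id)
        if seen is None:
            return "unknown"
        if now - seen <= ONLINE_THRESHOLD_SECONDS:
            return "online"
        return f"last seen {int(now - seen)} seconds ago"


service = LastSeenService()
service.record("A", now=100.0, user_activity=True)   # A sends a message
service.record("A", now=160.0, user_activity=False)  # app sends a delivery receipt
```

With these calls, a query at time 103 reports A as online, while a query at time 170 reports the gap since time 100, because the application request at 160 was not counted.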
That way, user B can tell whether the user is online, or at least when they were last seen, by querying this service. So feature three is also done. Very well, we are very close to completing this chat messaging app. As you can see, it's a pretty complicated architecture, but we're getting through it piece by piece. There are certain things I'd like to leave out. One is the load balancer, because we've already talked about it, so I won't go into how the load balancer balances load across the system.
One interesting thing that we haven't talked about in the series is service discovery, or heartbeat maintenance. That will be covered in a separate video, but it is quite interesting. You can check out some blogs; I'll probably post them in the description below. The authentication service is another thing I will talk about later. It's pretty simple, but it's worth covering as a basic principle, so that will also be taken up later. As you can see, these four services are not really specific to WhatsApp, so to speak.
The profile service, the image service, sending emails and sending SMS are all very generic services. So what is specific to a chat application? Sending messages. Now, you can see that there are five users drawn here. The red ones are in one group and the green ones are in another. Every time a red user sends a message, it must go to all the other red users. That is the group messaging feature. This red user is connected to gateway one, while the other red users are connected to gateway two. So let's say we send a group message from this user.
The problem is that if the session service also stores all the group information, say that the red group has these three users and they are connected to these boxes, it's too complicated for the session service to handle. It's something you can decouple, and that's what we've done: the information about who belongs to which group lives in a group service. Now, when the session service receives a message from a red user, it asks the group service who the other members of this group are. The group service responds, say, that there are 10 members in this group with these user IDs.
Now the session service looks these up in its own database. Typically this information will be cached as much as possible, but from its database it can determine where these users, those 10 users, are connected. It has each user ID mapped to a connection, and that connection tells it which box, which gateway, the user is on. With this information, it can route the message to each of these users one by one. What happens if the group has too many members? WhatsApp actually sets a maximum of 200. Many chat apps try to cap it at 500 or 600.
The main reason is that otherwise you fan out the request too much. If you've seen the Instagram design video, when a celebrity posts something they are effectively messaging sometimes millions of people, and that's not practical. There you either batch process the posts or have the followers pull them. In a chat app you want the messages to be as real-time as possible, so you really can't rely much on a pull mechanism. Instead, you limit the number of people in a group. 200 is a fairly reasonable number compared to millions.
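The group fan-out with a member cap might look roughly like this. It is a sketch under assumptions: `GroupService`, `fan_out` and the plain dict standing in for the session service's user-to-box table are all hypothetical, and users who are not connected are simply skipped here rather than queued.

```python
from typing import Dict, List, Set

MAX_GROUP_SIZE = 200  # the cap mentioned in the text


class GroupService:
    """Hypothetical group membership store: group ID -> user IDs."""

    def __init__(self) -> None:
        self._members: Dict[str, Set[str]] = {}

    def add_member(self, group_id: str, user_id: str) -> None:
        members = self._members.setdefault(group_id, set())
        if user_id not in members and len(members) >= MAX_GROUP_SIZE:
            raise ValueError("group is full")
        members.add(user_id)

    def members(self, group_id: str) -> Set[str]:
        return set(self._members.get(group_id, set()))


def fan_out(group_id: str, sender: str, groups: GroupService,
            box_by_user: Dict[str, int]) -> Dict[int, List[str]]:
    """Return, for each gateway box, the recipients to push the message to."""
    plan: Dict[int, List[str]] = {}
    for user_id in sorted(groups.members(group_id) - {sender}):
        box = box_by_user.get(user_id)
        if box is not None:  # not connected: skipped in this sketch
            plan.setdefault(box, []).append(user_id)
    return plan


groups = GroupService()
for user in ("red1", "red2", "red3"):
    groups.add_member("red", user)

box_by_user = {"red1": 1, "red2": 2, "red3": 2}
plan = fan_out("red", "red1", groups, box_by_user)  # {2: ['red2', 'red3']}
```

Because the cap bounds the size of `members`, the loop in `fan_out` does at most a couple of hundred lookups per message, which is what makes pushing in real time practical.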
Yes, it is a very reasonable number. So what we're going to do is limit the number of users in a group to some number X, and we're going to assume that the session service can handle sending these messages over WebSockets to the relevant users. Now let's get into the details of this mechanism. We have the basics of how it's going to work, but the details are important. The first thing in this architecture is that a lot of users will connect to my gateways, so these gateways are going to be memory hungry. That is the reason we have separated out the session service.
That's a good way to reduce memory usage. The second thing you can look at is parsing the message. Maybe the message is sent over HTTP, maybe it's a JSON message, and so on. You don't really want to parse the message into an object, do some clever things with it, and find out whether or not it's authenticated, all on the gateway itself. You want to keep as many responsibilities as you can away from the gateways, because they hold the WebSocket connections. Those are expensive: those are real users connected to your box. So you would send an unparsed message to the session service or whoever you send it to.
A smart way to send an unparsed message to any service you want is to have it go through a parser microservice. You don't really need too many servers for it, just enough. I'll call it the parser and unparser microservice. What it does is take the unparsed message and turn it into a sensible message. So if your internal protocol is not HTTP but something written directly over TCP, you might use something like Thrift, which Facebook uses internally. So I'd say Thrift. So you can parse the message here. What is the advantage?
Let me reiterate: you receive an unparsed message here. If you just send the unparsed message forward, the gateway itself does no work on it. The unparsed message is converted into a sensible programming-language object by this parser, and it is then directed to the right place. That is one way to reduce the memory footprint. What are the other key areas we should focus on? The group ID to user ID mapping. This is a one-to-many mapping: a group can have many user IDs. To reduce a lot of duplication of this information, we opt for something called consistent hashing.
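As a rough illustration of consistent hashing on the group ID, here is a small hash ring with virtual nodes. The box names, the replica count and the choice of SHA-256 are assumptions made for the sketch, not details from the video.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    # Any stable hash works; SHA-256 is used here for determinism across runs.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring that maps a group ID to one group-service box."""

    def __init__(self, boxes: Iterable[str], replicas: int = 100) -> None:
        # Each box is placed on the ring many times (virtual nodes)
        # so that groups spread evenly across boxes.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{box}#{i}"), box) for box in boxes for i in range(replicas)
        )
        self._keys: List[int] = [point for point, _ in self._ring]

    def box_for(self, group_id: str) -> str:
        # Walk clockwise to the first point after the key, wrapping around.
        index = bisect.bisect(self._keys, _hash(group_id)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["group-box-1", "group-box-2", "group-box-3"])
owner = ring.box_for("red-group")  # always the same box for the same group ID
```

Each box then only needs the membership lists for the groups that hash to it, and when a box is removed, only the groups that lived on that box move elsewhere.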
You should take a look at that. Consistent hashing helps you reduce memory usage on servers by delegating only part of the information to each box. Check out the video if you're not sure what this is. Consistent hashing lets you route the request to the correct box. What should you route on? The group ID. If the request is routed on the group ID, then the box it lands on can tell you which users belong to this group. That takes care of the routing mechanism. Now consider the case where the group service fails: you send the message to the box and it fails.
What do you do then? You can retry, but you can only retry if you know which request you were supposed to send next. One of the mechanisms for this is message queues. We've discussed these in the playlist so I won't go into too much detail, but message queues are good in the sense that once you hand a message to the message queue, it takes care of sending the message, maybe now, maybe 10 seconds later, maybe 15 seconds later. Those are configurable options, and so is how many times it is going to retry.
All of this is configurable in the message queue. If the message queue cannot send the message even after five retries, it can tell you that it failed. You then propagate the error to the client, saying no, I couldn't send this group message. That's fine too, but you do need to tell the client that it failed or was cancelled. When the group service does receive the message, it can send a response saying yes, I received the message. The session service then sends a response to the gateway, and the user who sent the original message gets a sent tick.
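The retry behaviour described above can be sketched as a small function. This is not how any particular message queue is implemented; `deliver_with_retries`, its parameters and the injected `sleep` are assumptions chosen to show the configurable retry count and delay.

```python
import time
from typing import Callable, List


def deliver_with_retries(send: Callable[[], bool],
                         max_retries: int = 5,
                         delay_seconds: float = 10.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try once, then retry up to max_retries times; False means give up."""
    for attempt in range(1 + max_retries):
        if send():
            return True
        if attempt < max_retries:
            sleep(delay_seconds)  # wait before the next attempt
    return False  # the caller should propagate the failure to the client


attempts: List[int] = []


def flaky_send() -> bool:
    # Hypothetical downstream call that fails twice, then succeeds.
    attempts.append(1)
    return len(attempts) >= 3


def no_wait(_seconds: float) -> None:
    # Skip real sleeping so the example runs instantly.
    return None


ok = deliver_with_retries(flaky_send, sleep=no_wait)
failed = deliver_with_retries(lambda: False, max_retries=5, sleep=no_wait)
```

Here `ok` ends up true after three attempts, while `failed` is false after the initial attempt plus five retries, which is the point at which the error goes back to the client.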
Group receipts, when it comes to delivered or read, are quite expensive. The main reason is that everyone has to say yes, I got the message, and all of that finally has to come back to the sender. So we won't get into that; many chat apps don't even have it, and that's okay. The last interesting thing when it comes to chat messages, and group messages especially, is that you need idempotency. There is a whole video I made about idempotency, again using the Tinder messaging example; take a look at it for the technical details.
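Assuming the point here is idempotency, one common way to get it is to give every message a unique ID and ignore IDs that have already been processed, so that a retried message is not delivered twice. The class below is a hypothetical sketch of that idea, not the approach from the referenced video.

```python
from typing import List, Set


class IdempotentReceiver:
    """Hypothetical receiver that processes each message ID at most once."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.delivered: List[str] = []

    def handle(self, message_id: str, text: str) -> bool:
        # Returns True if the message was processed, False if it was a duplicate.
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.delivered.append(text)
        return True


receiver = IdempotentReceiver()
first = receiver.handle("m1", "hello group")
retry = receiver.handle("m1", "hello group")  # same ID arrives again after a retry
```

The retried message is recognised by its ID and dropped, so the group sees "hello group" only once even though it was sent twice.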
This architecture is actually very resilient, and as a chat system it will work quite well. Here are some tips and tricks that you may only know if you have worked on messaging systems. I'll give you an example. I was reading a blog post saying that Facebook Messenger deprioritizes messages when there is a big event, say New Year or a festival like Diwali in India. There will be a lot of messages: everyone will wish each other happy Diwali or happy New Year, and that puts a big load on the system.
So all the principles of rate limiting come in here, where messages that are not very important are not accepted. Or sometimes you just drop messages instead of delaying them. The best thing you can do is deprioritize messages. Things like last seen can be ignored; the entire feature can be switched off. Has this message been delivered? Has it been read? Those aren't as important as getting the message to the user. First and foremost, the server receives the message and acknowledges it. That's all the user needs to know. That is more important than seeing whether the person has read the message or not.
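One way to picture this deprioritization is a simple admission check that sheds less important request types as load rises. The priorities, the load thresholds and the name `admit` are all assumptions for illustration; the only thing taken from the text is the ordering, with chat messages and their acknowledgment first and last seen and read receipts last.

```python
from typing import Dict

# Lower number means more important (assumed ordering based on the text).
PRIORITY: Dict[str, int] = {
    "message": 0,
    "sent_receipt": 0,
    "delivered_receipt": 1,
    "read_receipt": 2,
    "last_seen": 2,
}


def admit(kind: str, load: float) -> bool:
    """Decide whether to accept a request, given load between 0.0 and 1.0."""
    priority = PRIORITY[kind]
    if load >= 0.9:
        return priority == 0  # heavy load: only messages and their acks
    if load >= 0.7:
        return priority <= 1  # moderate load: drop read receipts and last seen
    return True               # normal load: accept everything


normal = [kind for kind in PRIORITY if admit(kind, load=0.3)]
festival = [kind for kind in PRIORITY if admit(kind, load=0.95)]
```

Under normal load every request type is admitted; at a load of 0.95 only `message` and `sent_receipt` get through, so the core path keeps working.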
So by deprioritizing unimportant messages, you keep the system healthy and working well instead of not working at all. So check out the course; it's really useful when you design systems like these. That takes care of the last requirement we had, which was group messaging. Yes, it was the number one requirement, and it is the last one to be met. Very good, thank you very much for listening, and thank you for going through this system design. If you have questions or suggestions, you can leave them in the comments below.
If you liked the video, hit the like button, and if you want notifications for future videos, hit the subscribe button. Oh, and I'll post a poll, so say what you want to see next time. See you.