
How ChatGPT Works Technically | ChatGPT Architecture

Mar 15, 2024
In this video, we take a look at how ChatGPT works. We learned a lot from making this video. We hope you learn something too. Let's dig deeper.

ChatGPT was launched on November 30, 2022. It reached 100 million monthly active users in just two months. It took Instagram two and a half years to reach the same milestone. This made ChatGPT the fastest-growing app ever.

How does ChatGPT work? At the heart of ChatGPT is an LLM, or large language model. The current LLM for ChatGPT is GPT-3.5. ChatGPT could also use the latest GPT-4 model, but there aren't many technical details about GPT-4 that we can talk about yet.
What is a large language model? A large language model is a type of neural network-based model that is trained with massive amounts of text data to understand and generate human language. The model uses training data to learn the statistical patterns and relationships between words in the language and then uses this knowledge to predict upcoming words, one word at a time. An LLM is usually characterized by its size and the number of parameters it contains. The largest GPT-3.5 model has 175 billion parameters spread across 96 layers of the neural network, making it one of the largest deep learning models ever created.
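To make "predicting upcoming words, one word at a time" concrete, here is a minimal Python sketch of an autoregressive generation loop. The `next_token_probs` function is a hypothetical stand-in for a trained model, not anything from OpenAI's implementation:

```python
import random

# Hypothetical stand-in for a trained LLM: given the tokens so far,
# return a probability distribution over the next token. A real model
# computes this with billions of learned parameters.
def next_token_probs(tokens):
    vocab = ["the", "cat", "sat", "on", "mat", "."]
    weights = [random.random() for _ in vocab]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(vocab, weights)}

def generate(prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        # Sample the next token from the predicted distribution,
        # append it, and repeat -- one token at a time.
        next_tok = random.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(next_tok)
    return tokens

print(generate(["the", "cat"]))
```

A real LLM differs only in how `next_token_probs` is computed; the outer loop is essentially the same.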
The input to and output from the model are organized as tokens. Tokens are numerical representations of words or, more precisely, parts of words. Numbers are used instead of words because they can be processed more efficiently. GPT-3.5 was trained on a large amount of data from the Internet. The source data set contains 500 billion tokens. Put another way, the model was trained on hundreds of billions of words. The model was trained to predict the next token given a sequence of input tokens. It can generate text that is grammatically correct and semantically similar to the Internet data it was trained on.
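To see tokens concretely, the snippet below uses OpenAI's open-source tiktoken library (assuming it is installed with `pip install tiktoken`). The `cl100k_base` encoding is the one used by the GPT-3.5-turbo family:

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the token encoding used by the gpt-3.5-turbo models.
enc = tiktoken.get_encoding("cl100k_base")

text = "ChatGPT predicts one token at a time."
token_ids = enc.encode(text)
print(token_ids)  # a list of integers, one per token
# Each integer maps back to a word or a piece of a word:
print([enc.decode([t]) for t in token_ids])
```

Common words become single tokens while rarer words split into several pieces, which is why token counts and word counts differ.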
But without proper guidance, the model can also generate outputs that are untruthful, toxic, or reflect harmful sentiments. Even with that serious drawback, the model is already useful, although in a very structured way: it can be "taught" to perform tasks in natural language using carefully designed text prompts. From this arose the new field of "prompt engineering." To make the model safer and capable of chatbot-style question-and-answer, the model is further fine-tuned into the version used in ChatGPT. Fine-tuning is the process that turns a model that does not quite align with human values into one that ChatGPT can use.
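As a concrete, hypothetical example of such a carefully designed prompt: a base model only knows how to continue text, so the task is demonstrated rather than programmed.

```python
# A hypothetical few-shot prompt. The base model has only learned to
# continue text, so we show it the pattern we want it to follow.
prompt = """Translate English to French.

English: Hello, how are you?
French: Bonjour, comment allez-vous ?

English: The weather is nice today.
French:"""

# The most likely continuation of this text is the French translation --
# the "task" is carried entirely by the prompt.
```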
This process is called Reinforcement Learning from Human Feedback, or RLHF. OpenAI explains how they ran RLHF on the model, but it's not easy for non-ML people to understand. Let's try to understand it with an analogy. Imagine GPT-3.5 as a highly skilled chef who can prepare a wide variety of dishes. Fine-tuning GPT-3.5 with RLHF is like refining this chef's skills to make their dishes more delicious. Initially, the chef is trained on a large data set of recipes and cooking techniques. However, sometimes the chef doesn't know which dish to prepare for a specific customer request. To help with this, we collect feedback from real people to create a new data set.
The first step is to create a comparison data set. We ask the chef to prepare several dishes for a given request and then have people rank the dishes by taste and presentation. This helps the chef understand which dishes customers prefer. The next step is reward modeling. The chef uses this feedback to create a "reward model," which is like a guide for understanding customers' preferences. The higher the reward, the better the dish. Next, we train the model with PPO, or Proximal Policy Optimization. In this analogy, the chef practices preparing dishes while following the reward model, using PPO to improve their skills.
This is like the chef comparing their current dish to a slightly different version and learning which one is better according to the reward model. This process is repeated several times, and the chef refines their skills based on updated customer feedback. With each iteration, the chef gets better at preparing dishes that satisfy customers' preferences. To put it another way, GPT-3.5 is fine-tuned with RLHF by collecting feedback from people, creating a reward model based on their preferences, and then iteratively improving the model's performance using PPO. This allows GPT-3.5 to generate better responses tailored to specific user requests.
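The sketch below is a heavily simplified, hypothetical rendering of the stages described above, written with PyTorch (an assumed dependency). Stage one trains a toy reward model on pairwise human comparisons using a Bradley-Terry-style loss; stage two is only outlined in comments, since a full PPO loop is well beyond a short example:

```python
import torch
import torch.nn as nn

# --- Stage 1: reward modeling from human comparisons -----------------
# Toy reward model: maps a response "embedding" to a scalar score.
# (In practice the reward model is itself a large transformer.)
reward_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def comparison_loss(preferred, rejected):
    # Bradley-Terry-style objective: push the preferred response's
    # reward above the rejected one's.
    return -torch.log(
        torch.sigmoid(reward_model(preferred) - reward_model(rejected))
    ).mean()

# Fake embeddings for pairs of responses where humans preferred the first.
preferred, rejected = torch.randn(32, 8), torch.randn(32, 8)
for _ in range(100):
    optimizer.zero_grad()
    comparison_loss(preferred, rejected).backward()
    optimizer.step()

# --- Stage 2: PPO fine-tuning (schematic only) ------------------------
# The policy (the LLM) generates a response, the frozen reward model
# scores it, and PPO's clipped objective keeps each update close to
# the previous policy:
#   L = min(ratio * advantage, clip(ratio, 1 - eps, 1 + eps) * advantage)
# where ratio = new_prob / old_prob of the generated tokens.
```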
Now that we understand how the model is trained and fine-tuned, let's take a look at how the model is used in ChatGPT to answer a prompt. Conceptually, it's as simple as feeding the prompt into the ChatGPT model and returning the output. In reality, it's a little more complicated. First, ChatGPT knows the context of the chat conversation. This is done through the ChatGPT user interface, which feeds the model the entire previous conversation every time a new prompt is entered. This is called conversational prompt injection. It is how ChatGPT appears to be context-aware. Second, ChatGPT includes primary prompt engineering.
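Here is a minimal sketch of conversational prompt injection using OpenAI's Python client (assuming `pip install openai`, an `OPENAI_API_KEY` in the environment, and an illustrative model name): the model itself is stateless, so the front end re-sends the entire history on every turn.

```python
from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

client = OpenAI()
history = []  # the full conversation so far

def chat(user_message):
    history.append({"role": "user", "content": user_message})
    # The model is stateless: context comes from re-sending every
    # previous message along with the new one.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("What is a large language model?"))
print(chat("How many parameters does it have?"))  # "it" resolves via history
```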
These are pieces of instruction injected before and after the user's prompt to guide the model toward a conversational tone. These instructions are invisible to the user. Third, the prompt is passed to the Moderation API to warn about or block certain types of unsafe content. The generated output is also likely passed through the Moderation API before being returned to the user. And that concludes our journey into the fascinating world of ChatGPT. A lot of engineering went into creating the models used by ChatGPT. The technology behind it is constantly evolving, opening doors to new possibilities and reshaping the way we communicate.
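Putting the last two pieces together, the hidden instructions and safety checks might look roughly like this sketch. The system prompt text and the exact flow are assumptions; the moderation calls use OpenAI's real Moderation API:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical hidden instructions wrapped around the user's prompt.
SYSTEM_PROMPT = "You are a helpful assistant. Respond in a conversational tone."

def moderated_chat(user_message):
    # Screen the incoming prompt with the Moderation API first.
    if client.moderations.create(input=user_message).results[0].flagged:
        return "Sorry, I can't help with that."
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # invisible to the user
            {"role": "user", "content": user_message},
        ],
    )
    reply = response.choices[0].message.content
    # The generated output is likely screened the same way before display.
    if client.moderations.create(input=reply).results[0].flagged:
        return "Sorry, I can't show that response."
    return reply
```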
Fasten your seat belt and enjoy the ride. If you like our videos, you might also like our system design newsletter. It covers topics and trends in large-scale system design. Trusted by 250,000 readers. Subscribe at blog.bytebytego.com
