Database Normalization 1NF 2NF 3NF

Apr 07, 2024
Hello, I'm Jesper. I make videos about data architecture in digital transformation here on YouTube. Today I'm going to discuss data normalization. The language of data normalization is both mathematics and philosophy, and I think you'll understand this as the video progresses. It doesn't explain everything, but it gives us a deeper understanding of a particular type of data called structured data, and of how to connect structured data and do more with it: automation, analytics, prediction, artificial intelligence, all those fun and good things. Coincidentally, or maybe not, it also gives us an idea of the other side of data, unstructured data: things that are in spreadsheets, things that are generated on the Internet. In a nutshell, it is a perfect starting point for a greater and deeper understanding of data, how it works, how it is connected, and what you can potentially do with it.
Normalization uses a data modeling language to show how data is connected and the nature of these connections, or relationships, to tell a story. This language is radically different from the language we normally use. It even has its own alphabet, called cardinality, but this will be covered in a separate video. We are used to process-based thinking. As in planning, we use processes, process flows, workflows, arrows and so on to represent how we do things in business: the steps and sequences to achieve something. In life, we often describe ourselves as a process. If we are asked to describe ourselves, we often describe what we do, not who we are. It is now becoming philosophical: a process describes what we do, data describes who we are. Data can exist without the process, while the process must have data to exist.
You could say that data is persistent, while the process is not. That raises the question of the process, what we do, versus who we are. It is very philosophical and certainly worthy of a serious conversation about DNA. But Edgar Codd came into the picture and wanted more than great conversations. He reduced data relationships to mathematics, and in 1970 he launched the relational model, which is a systematic approach to connecting and maintaining data based on mathematical rules. Technology companies like Oracle, IBM, Microsoft, Amazon and Google used his relational model to create their own relational databases. Popular open source databases like MySQL are also based on it. But that's all technology, so let's forget about technology for now.
Dr. Codd provides five rules for normalizing data, where each rule builds on the other, starting with first normal form and ending with fifth normal form. Normalization is a gateway to a deeper understanding of data because it addresses what gives more meaning to the data, which is its relationships with other data.

The magic of data lies in its relationships and the types of relationships, called cardinality. Simply put, normalization is about connecting data in the right way. The first three normalization rules deal with basic concepts, while the last two deal with exceptions. Therefore, for practical reasons, normalization usually refers to third normal form. And remember that, to be in third normal form, data must also be in first and second normal form. So the focus today is normalization up to third normal form. First normal form is about atomic values and unique identifiers. Let's say we want to model employees and their skills, and we've been given this spreadsheet of data with the task of normalizing it.
I've used spreadsheets as an example to make it easier to understand, but the correct term is table or entity. First normal form specifies that the following actions must be performed on the data. Number one: each cell can never contain more than one value. For example, a cell cannot contain both the skill ID and the skill name, so we need to split them into separate columns. Number two: each row must be unique, i.e. one column, or a combination of columns, must be able to uniquely identify the row. This is called the primary key. In this example, name and address would be a potential primary key, but often the primary key is generated by the system.
In our case, we will add a computer-generated primary key. The primary key is of great importance and features prominently in all the other normalization rules. Number three: the name of each column must be unique, and in this case we need to rename our skill columns to make them unique. Number four: there must be no repeating groups. Repeating groups are removed and placed in a new spreadsheet or table. Now we have two spreadsheets or tables with nice rows of data: each row is uniquely identified, each cell has no more than one value, and there are no repeating groups. Welcome to first normal form. But the fun doesn't end here.
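
The four first normal form steps can be sketched in Python. This is only an illustration: the sample rows, the "id: name" cell format, and the names `raw_rows` and `split_skill_cell` are assumptions, since the transcript does not show the actual spreadsheet contents.

```python
# Hypothetical raw spreadsheet rows. Each skill cell packs "id: name" together,
# and the skill columns form a repeating group (skill_1, skill_2).
raw_rows = [
    {"name": "Ann", "address": "1 Main St", "job_name": "Analyst",
     "skill_1": "10: SQL", "skill_2": "20: Python"},
    {"name": "Bob", "address": "2 Oak Ave", "job_name": "Engineer",
     "skill_1": "20: Python", "skill_2": ""},
]

def split_skill_cell(cell):
    """Split a combined 'id: name' cell into two atomic values."""
    skill_id, skill_name = cell.split(":", 1)
    return int(skill_id.strip()), skill_name.strip()

employees = []        # one row per employee
employee_skills = []  # one row per (employee, skill) pair

# enumerate() stands in for a system-generated primary key.
for employee_id, row in enumerate(raw_rows, start=1):
    employees.append({"employee_id": employee_id, "name": row["name"],
                      "address": row["address"], "job_name": row["job_name"]})
    # Move the repeating group into its own table, one skill per row.
    for column in ("skill_1", "skill_2"):
        if row[column]:
            skill_id, skill_name = split_skill_cell(row[column])
            employee_skills.append({"employee_id": employee_id,
                                    "skill_id": skill_id,
                                    "skill_name": skill_name})
```

In this sketch the second table is identified by the combination of `employee_id` and `skill_id`, which is what the next rule examines.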
Second normal form imposes new rules and states that all data must depend on the primary key. So let's examine the first spreadsheet: the name, address and job name are all related to the employee ID, so it is already in second normal form. Yes, that's great. But what about the skill name in the second spreadsheet, which relates to the skill ID but not the employee ID? Second normal form stipulates that any column that does not depend on the full primary key must be split into its own spreadsheet or table, so we need to create one more spreadsheet, called employee skill. A primary key that links to other spreadsheets or tables is also called a foreign key, so in this case the employee ID is a foreign key to employee and the skill ID is a foreign key to skill.
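
As a rough sketch of the second normal form step, the check "does this column depend on the full key?" can be tested against sample data and the table then split. The rows and the helper name `determines` are hypothetical, and a check on sample data can only suggest a dependency, not prove it.

```python
# Hypothetical first-normal-form table keyed by (employee_id, skill_id).
employee_skills_1nf = [
    {"employee_id": 1, "skill_id": 10, "skill_name": "SQL"},
    {"employee_id": 1, "skill_id": 20, "skill_name": "Python"},
    {"employee_id": 2, "skill_id": 20, "skill_name": "Python"},
]

def determines(rows, key_columns, column):
    """True if rows that agree on key_columns always agree on column."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen and seen[key] != row[column]:
            return False
        seen.setdefault(key, row[column])
    return True

# skill_name follows from skill_id alone, i.e. from only part of the key.
partial_dependency = determines(employee_skills_1nf, ["skill_id"], "skill_name")

# Split: skill holds the name, employee_skill keeps only the two foreign keys.
skill = {row["skill_id"]: row["skill_name"] for row in employee_skills_1nf}
employee_skill = [(row["employee_id"], row["skill_id"])
                  for row in employee_skills_1nf]
```

Here `skill` plays the role of the skill table and `employee_skill` the linking table, with each skill name stored once.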
Now I have three spreadsheets or tables with good rows of data, where each column depends on the full primary key. Yes, again, welcome to second normal form. But Dr. Codd still was not happy, because he introduced a further set of rules called third normal form. Third normal form also focuses on the primary key and states that the primary key must completely define all columns, and columns cannot depend on any other keys. So let's examine our spreadsheets again. In skill, the skill ID defines the skill name, and the skill name is not related to any other key, so third normal form is satisfied. In employee skill, the employee ID and the skill ID have no other columns, and therefore third normal form is satisfied. In employee, the employee ID defines the name and address, and the name and address are not related to any other key, so they comply with third normal form. But the employee ID does not define the job name, so the job name violates third normal form.
This means that the job name needs to be split into its own spreadsheet or table, and for consistency we create a computer-generated job ID. Because the job ID links the employee and the job, we need to create a new job ID column in employee. As we discussed in second normal form, any primary key that links spreadsheets or tables also becomes a foreign key. Now we have four spreadsheets or tables with good rows of data, where the primary key defines every non-key column. Welcome to third normal form. In summary, third normal form has transformed one non-normalized spreadsheet or table into four normalized spreadsheets or tables.
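
One way to picture the four resulting tables is as a small SQLite schema built with Python's standard `sqlite3` module. The table names, column names and sample rows are assumptions chosen to match the example, not something shown in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE job (
    job_id   INTEGER PRIMARY KEY,
    job_name TEXT NOT NULL
);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL,
    job_id      INTEGER NOT NULL REFERENCES job(job_id)
);
CREATE TABLE skill (
    skill_id   INTEGER PRIMARY KEY,
    skill_name TEXT NOT NULL
);
CREATE TABLE employee_skill (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    skill_id    INTEGER NOT NULL REFERENCES skill(skill_id),
    PRIMARY KEY (employee_id, skill_id)
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO job VALUES (?, ?)", [(1, "Analyst"), (2, "Engineer")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Ann", "1 Main St", 1), (2, "Bob", "2 Oak Ave", 2)])
conn.executemany("INSERT INTO skill VALUES (?, ?)", [(10, "SQL"), (20, "Python")])
conn.executemany("INSERT INTO employee_skill VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 20)])

# Joining along the foreign keys rebuilds a flat, spreadsheet-like view.
flat_rows = conn.execute("""
    SELECT e.name, j.job_name, s.skill_name
    FROM employee e
    JOIN job j ON j.job_id = e.job_id
    JOIN employee_skill es ON es.employee_id = e.employee_id
    JOIN skill s ON s.skill_id = es.skill_id
    ORDER BY e.employee_id, s.skill_id
""").fetchall()
```

Each fact is stored once, and the joins recover the original wide view on demand.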
I hope this explained normalization and how to normalize data to third normal form. If you liked this video, please like and subscribe. I hope to see you in my next video.
