Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller ones and defining relationships between them.
It helps to:
Without Normalization:
| Student_ID | Student_Name | Subject1 | Subject2 |
|---|---|---|---|
| 101 | Ram | Math | Science |
| 102 | Riya | Math | English |
| 101 | Ram | Math | Science |
π Here, Ram is repeated. It wastes space and can cause errors.row level redundancy removes by primary key make student as primary key
To avoid:
Rule:
π Bad Table (Not in 1NF):
| Roll | Name | Subjects |
|---|---|---|
| 1 | Amit | Math, Science |
| 2 | Riya | English, History |
π Good Table (1NF):
| Roll | Name | Subject |
|---|---|---|
| 1 | Amit | Math |
| 1 | Amit | Science |
| 2 | Riya | English |
| 2 | Riya | History |
π£ Interview Question:
Q: What is 1NF? Why is repeating groups bad in a table?
Answer:
1NF (First Normal Form) ensures that each column in a table contains only **atomic (indivisible)** values and each record is unique.
Repeating groups are bad because:
Example:
β Not in 1NF (Repeating groups):
+------------+-----------------------+
| Student_ID | Subjects |
+------------+-----------------------+
| 101 | Math, English, Hindi |
+------------+-----------------------+
β In 1NF (Atomic values):
+------------+----------+
| Student_ID | Subject |
+------------+----------+
| 101 | Math |
| 101 | English |
| 101 | Hindi |
+------------+----------+
If a table has a composite primary key (made from two or more columns), and a non-key column depends only on part of that keyβnot the wholeβ then it's a partial dependency.
Student_Course| Roll_No | Course_ID | Course_Name | Student_Name |
|---|---|---|---|
| 101 | C01 | DBMS | Deepak |
| 101 | C02 | OS | Deepak |
| 102 | C01 | DBMS | Shreeram |
Primary Key: (Roll_No, Course_ID)
Partial Dependencies: Student_Name depends only on Roll_No, and Course_Name depends only on
Course_ID.
β This violates 2NF.
We split the table into three:
Student| Roll_No | Student_Name |
|---|---|
| 101 | Deepak |
| 102 | Shreeram |
Course| Course_ID | Course_Name |
|---|---|
| C01 | DBMS |
| C02 | OS |
Student_Course| Roll_No | Course_ID |
|---|---|
| 101 | C01 |
| 101 | C02 |
| 102 | C01 |
β Now the design is in 2NF. All non-key columns fully depend on the full primary key.
π£ Interview Tip:
Q: What is 2NF?
A: 2NF ensures that all non-key attributes are fully dependent on the entire primary
key. It removes partial dependencies from a table with a composite primary key.
Every column in the table should depend directly on the primary key.
If a column depends on another column (not on the primary key), thatβs called a transitive
dependency.
| Roll | Name | Dept_ID | Dept_Name |
|---|---|---|---|
| 1 | Amit | 10 | Computer |
| 2 | Riya | 20 | Electrical |
Problem: Dept_Name depends on Dept_ID, and Dept_ID
depends on Roll.
So Dept_Name is not directly dependent on the primary key
Roll.
This is called a transitive dependency β
If a table is not in 3NF:
| Roll | Name | Dept_ID |
|---|---|---|
| 1 | Amit | 10 |
| 2 | Riya | 20 |
| Dept_ID | Dept_Name |
|---|---|
| 10 | Computer |
| 20 | Electrical |
β
Now every column in the Student table depends only on the primary key Roll.
π This design follows Third Normal Form (3NF)!
π£ Interview Tip:
Q: What is Transitive Dependency?
A: It happens when a column depends on another column instead of directly on the
primary key.
Example: Dept_Name depends on Dept_ID, not on Roll.
A super key is a column (or set of columns) that can uniquely identify each row in a table.
| Teacher | Subject | Dept |
|---|---|---|
| Ram | Math | Science |
| Ram | Physics | Science |
Issue: Teacher β Dept, but Teacher is not a super key.
This breaks the rule of BCNF.
| Teacher | Dept |
|---|---|
| Ram | Science |
| Subject | Teacher |
|---|---|
| Math | Ram |
| Physics | Ram |
β Now every dependency follows the rule: the left side is a super key. This design is in BCNF.
π£ Interview Question:
Q: What is the difference between 3NF and BCNF?
A: In 3NF, some non-super key dependencies are allowed. In BCNF, the left side of
every dependency must be a super key.
Rule:
A student can have multiple skills and multiple projects.
π Bad Table (Not in 4NF):
| Student | Skill | Project |
|---|---|---|
| Riya | Java | WebApp |
| Riya | C++ | WebApp |
| Riya | Java | GameDesign |
| Riya | C++ | GameDesign |
β Split into:
Skill Table:
| Student | Skill |
|---|---|
| Riya | Java |
| Riya | C++ |
Project Table:
| Student | Project |
|---|---|
| Riya | WebApp |
| Riya | GameDesign |
| Form | Rule | Main Focus | Problem It Solves | Example Issue |
|---|---|---|---|---|
| 1NF (First Normal Form) |
- Only atomic (single) values in each cell - No repeating groups or arrays |
Make each cell contain only one value | Removes multi-valued or repeated fields | A student has multiple phone numbers in one cell |
| 2NF (Second Normal Form) |
- Must be in 1NF - No partial dependency (non-key column depends only on part of composite key) |
Remove partial dependency | Solves redundancy caused by composite keys | Student_Name depends only on Roll_No, not full key (Roll_No, Course_ID) |
| 3NF (Third Normal Form) |
- Must be in 2NF - No transitive dependency (non-key depends on another non-key) |
Remove indirect dependency | Prevents update and inconsistency problems | Dept_Name depends on Dept_ID, not Roll directly |
| BCNF (Boyce-Codd Normal Form) |
- Must be in 3NF - Left side of every dependency must be a super key |
Stronger version of 3NF | Removes all anomalies where non-super key determines something | Teacher β Dept but Teacher is not a super key |
Sometimes, we combine tables back together to:
π Example:
Normalized (2 tables):
StudentCourseDenormalized (1 table):
| Student | Course | Teacher |
|---|---|---|
| Riya | Math | Ram |
Denormalization increases redundancy but improves speed in some cases.
π£ Interview Question:
What is denormalization? When do we use it?
This HTML note is ready for easy revision and can be saved for offline use!