Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller ones and defining relationships between them.
It helps to:
Without Normalization:
Student_ID | Student_Name | Subject1 | Subject2 |
---|---|---|---|
101 | Ram | Math | Science |
102 | Riya | Math | English |
101 | Ram | Math | Science |
π Here, Ram is repeated. It wastes space and can cause errors.row level redundancy removes by primary key make student as primary key
To avoid:
Rule:
π Bad Table (Not in 1NF):
Roll | Name | Subjects |
---|---|---|
1 | Amit | Math, Science |
2 | Riya | English, History |
π Good Table (1NF):
Roll | Name | Subject |
---|---|---|
1 | Amit | Math |
1 | Amit | Science |
2 | Riya | English |
2 | Riya | History |
π£ Interview Question:
Q: What is 1NF? Why is repeating groups bad in a table?
Answer:
1NF (First Normal Form) ensures that each column in a table contains only **atomic (indivisible)** values and each record is unique.
Repeating groups are bad because:
Example:
β Not in 1NF (Repeating groups):
+------------+-----------------------+ | Student_ID | Subjects | +------------+-----------------------+ | 101 | Math, English, Hindi | +------------+-----------------------+
β In 1NF (Atomic values):
+------------+----------+ | Student_ID | Subject | +------------+----------+ | 101 | Math | | 101 | English | | 101 | Hindi | +------------+----------+
If a table has a composite primary key (made from two or more columns), and a non-key column depends only on part of that keyβnot the wholeβ then it's a partial dependency.
Student_Course
Roll_No | Course_ID | Course_Name | Student_Name |
---|---|---|---|
101 | C01 | DBMS | Deepak |
101 | C02 | OS | Deepak |
102 | C01 | DBMS | Shreeram |
Primary Key: (Roll_No, Course_ID)
Partial Dependencies: Student_Name depends only on Roll_No, and Course_Name depends only on
Course_ID.
β This violates 2NF.
We split the table into three:
Student
Roll_No | Student_Name |
---|---|
101 | Deepak |
102 | Shreeram |
Course
Course_ID | Course_Name |
---|---|
C01 | DBMS |
C02 | OS |
Student_Course
Roll_No | Course_ID |
---|---|
101 | C01 |
101 | C02 |
102 | C01 |
β Now the design is in 2NF. All non-key columns fully depend on the full primary key.
π£ Interview Tip:
Q: What is 2NF?
A: 2NF ensures that all non-key attributes are fully dependent on the entire primary
key. It removes partial dependencies from a table with a composite primary key.
Every column in the table should depend directly on the primary key.
If a column depends on another column (not on the primary key), thatβs called a transitive
dependency.
Roll | Name | Dept_ID | Dept_Name |
---|---|---|---|
1 | Amit | 10 | Computer |
2 | Riya | 20 | Electrical |
Problem: Dept_Name
depends on Dept_ID
, and Dept_ID
depends on Roll
.
So Dept_Name
is not directly dependent on the primary key
Roll
.
This is called a transitive dependency β
If a table is not in 3NF:
Roll | Name | Dept_ID |
---|---|---|
1 | Amit | 10 |
2 | Riya | 20 |
Dept_ID | Dept_Name |
---|---|
10 | Computer |
20 | Electrical |
β
Now every column in the Student table depends only on the primary key Roll
.
π This design follows Third Normal Form (3NF)!
π£ Interview Tip:
Q: What is Transitive Dependency?
A: It happens when a column depends on another column instead of directly on the
primary key.
Example: Dept_Name depends on Dept_ID, not on Roll.
A super key is a column (or set of columns) that can uniquely identify each row in a table.
Teacher | Subject | Dept |
---|---|---|
Ram | Math | Science |
Ram | Physics | Science |
Issue: Teacher β Dept
, but Teacher
is not a super key.
This breaks the rule of BCNF.
Teacher | Dept |
---|---|
Ram | Science |
Subject | Teacher |
---|---|
Math | Ram |
Physics | Ram |
β Now every dependency follows the rule: the left side is a super key. This design is in BCNF.
π£ Interview Question:
Q: What is the difference between 3NF and BCNF?
A: In 3NF, some non-super key dependencies are allowed. In BCNF, the left side of
every dependency must be a super key.
Rule:
A student can have multiple skills and multiple projects.
π Bad Table (Not in 4NF):
Student | Skill | Project |
---|---|---|
Riya | Java | WebApp |
Riya | C++ | WebApp |
Riya | Java | GameDesign |
Riya | C++ | GameDesign |
β Split into:
Skill Table:
Student | Skill |
---|---|
Riya | Java |
Riya | C++ |
Project Table:
Student | Project |
---|---|
Riya | WebApp |
Riya | GameDesign |
Form | Rule | Main Focus | Problem It Solves | Example Issue |
---|---|---|---|---|
1NF (First Normal Form) |
- Only atomic (single) values in each cell - No repeating groups or arrays |
Make each cell contain only one value | Removes multi-valued or repeated fields | A student has multiple phone numbers in one cell |
2NF (Second Normal Form) |
- Must be in 1NF - No partial dependency (non-key column depends only on part of composite key) |
Remove partial dependency | Solves redundancy caused by composite keys | Student_Name depends only on Roll_No, not full key (Roll_No, Course_ID) |
3NF (Third Normal Form) |
- Must be in 2NF - No transitive dependency (non-key depends on another non-key) |
Remove indirect dependency | Prevents update and inconsistency problems | Dept_Name depends on Dept_ID, not Roll directly |
BCNF (Boyce-Codd Normal Form) |
- Must be in 3NF - Left side of every dependency must be a super key |
Stronger version of 3NF | Removes all anomalies where non-super key determines something | Teacher β Dept but Teacher is not a super key |
Sometimes, we combine tables back together to:
π Example:
Normalized (2 tables):
Student
Course
Denormalized (1 table):
Student | Course | Teacher |
---|---|---|
Riya | Math | Ram |
Denormalization increases redundancy but improves speed in some cases.
π£ Interview Question:
What is denormalization? When do we use it?
This HTML note is ready for easy revision and can be saved for offline use!