A Complete Step-by-Step Normalization Walkthrough
A practical step by step normalization example demonstrates how to transform a flat file with redundancy into a robust relational schema. This process systematically moves data through First, Second, Third, and Boyce-Codd Normal Forms, eliminating update anomalies, saving storage space, and ensuring data integrity.
Scenario: The Unnormalized Order System
Imagine a small business that tracks customer orders. The initial data is stored in a single, flat table. This design is intuitive for a beginner but is fundamentally flawed for a database system.
We need to analyze this raw data to identify dependencies and normalize the structure.
Initial Requirements and Raw Data
The business requirements are straightforward. We need to track who bought what, how much they paid, and the status of the item.
The initial table structure looks like this:
- OrderID: A unique number for each order.
- CustomerName: The name of the person buying.
- CustomerPhone: The contact number.
- Item: The name of the product.
- ItemPrice: The price of that specific item.
- OrderStatus: Whether the order is “Shipped” or “Pending”.
Here is a sample of the raw data stored in this single table:
OrderID, CustomerName, CustomerPhone, Item, ItemPrice, OrderStatus
101, John Smith, 555-0100, Laptop, 999.00, Shipped
101, John Smith, 555-0100, Mouse, 25.00, Shipped
102, Jane Doe, 555-0101, Keyboard, 50.00, Pending
103, Bob Jones, 555-0102, Monitor, 300.00, Shipped
103, Bob Jones, 555-0102, Mouse, 25.00, Pending
Step 1: Achieving First Normal Form (1NF)
The first rule of normalization is First Normal Form (1NF). It requires that every column contains atomic values and that there are no repeating groups.
Eliminating Repeating Groups
In our raw data example, the table already appears flat, meaning we don’t have columns like “Item1”, “Item2”, “Item3”. However, the composite primary key is a critical issue here.
Identifying the Problem
Currently, OrderID alone cannot be the primary key. If we look at the data, OrderID 101 appears twice because John Smith ordered two different items (Laptop and Mouse).
To satisfy 1NF, every row must be unique. We must identify a composite primary key consisting of OrderID and Item.
1NF Structure
Once we accept that every row is now unique based on the combination of Order and Item, the table is technically in 1NF.
The attributes remain the same, but we acknowledge that we cannot use OrderID or Item alone to identify a record.
Step 2: Achieving Second Normal Form (2NF)
Second Normal Form (2NF) builds upon 1NF. It requires that there are no partial dependencies.
A partial dependency exists if a non-key attribute depends on only part of a composite primary key.
Analyzing Partial Dependencies
Our current primary key is composite: (OrderID, Item).
Let’s look at the non-key attributes:
- CustomerPhone: This depends on the Customer, not on the specific Item being ordered.
- ItemPrice: This depends on the Item, not on the Order.
- OrderStatus: This depends on the Order.
The 2NF Violation
Because ItemPrice relies solely on Item, it has a partial dependency on the primary key.
Similarly, CustomerName and CustomerPhone rely solely on OrderID (assuming one customer makes one order in this simplified view, or we need to trace the customer link).
The Decomposition Action
To fix this, we must split the table. We separate the data that depends fully on the OrderID and the data that depends fully on the Item.
We create three tables:
- Orders Table: Contains
OrderIDandCustomerName,CustomerPhone,OrderStatus. - Items Table: Contains
ItemandItemPrice. - OrderDetails Table: Contains the link between Orders and Items (
OrderID,Item).
After this decomposition, every non-key attribute depends on the whole primary key of its new table.
Step 3: Achieving Third Normal Form (3NF)
Third Normal Form (3NF) addresses transitive dependencies. A transitive dependency occurs when a non-key attribute depends on another non-key attribute.
Identifying Transitive Dependencies
Let’s review the “Orders” table we created in Step 2. It currently holds:
OrderID(Primary Key)CustomerNameCustomerPhoneOrderStatus
The Hidden Problem
While CustomerName depends on OrderID, does CustomerPhone depend on OrderID?
Not directly. It depends on the customer.
If we change the customer’s name, we change the name in the Order table. But if we change the phone number, we must update it in the Order table for that customer.
This creates a transitive dependency: OrderID → CustomerName → CustomerPhone.
The Decomposition Action
We must separate the Customer information from the Order information.
We introduce a Customers table with CustomerID, CustomerName, and CustomerPhone.
We then update the Orders table to hold only the CustomerID as a foreign key.
Similarly, we look at the Items table from Step 2. It has Item and ItemPrice.
If we add a product category (e.g., “Electronics”), the price might depend on the category, but not strictly on the item name alone if items vary by category. However, in our strict case, ItemPrice depends entirely on Item, so the Items table is already in 3NF.
Step 4: Achieving Boyce-Codd Normal Form (BCNF)
BCNF is a stricter version of 3NF. It ensures that for every functional dependency, the determinant must be a candidate key.
Checking for Remaining Issues
Let’s look closely at our Orders table again. We have a foreign key CustomerID and an OrderStatus.
What if an OrderID is also a CustomerID? This shouldn’t happen in our design, but we must check the logic.
Suppose we add a constraint: “If an order status is ‘Cancelled’, the customer cannot be assigned a discount.”
This implies a dependency: OrderStatus → DiscountEligibility.
If OrderStatus determines a rule but is not a candidate key, we violate BCNF.
Applying BCNF Rules
In our specific “step by step normalization example,” we generally assume standard relationships. However, to strictly satisfy BCNF, we might need to split the Order status logic if it affects business rules differently than the customer ID does.
Our final schema typically looks like this:
- Customers (
CustomerID,Name,Phone) - Orders (
OrderID,CustomerID,Status) - OrderDetails (
OrderID,ItemID,Quantity) - Items (
ItemID,Name,Price)
Every determinant in this final structure is a candidate key. This confirms the schema is in BCNF.
Verification: Why We Did This
We have successfully moved from a single table with high redundancy to a set of normalized tables.
This was not just about structure; it was about preventing errors.
Eliminated Update Anomalies
Previously, if a customer changed their phone number, we had to update every single order row belonging to them. Now, we update the Customers table once.
Deleting an item that was never ordered caused data loss because the price was tied to the order. Now, the price is stored independently.
Preventing Insert Anomalies
Before, we could not add a new product to the catalog without creating a fake order for it. Now, we can insert items into the Items table independently of any orders.
Ensuring Data Integrity
By using foreign keys in the OrderDetails and Orders tables, the database ensures that we cannot associate an order with a non-existent item or a non-existent customer.
Conclusion
Normalization is a necessary discipline for any relational database. While it requires more initial effort to design, it pays off in long-term maintenance and data consistency.
This step by step normalization example illustrates the logical path from a flat file to a normalized schema. It shows that the goal is not just to follow rules, but to isolate changes and protect data integrity.
Key Takeaways
- Atomic Values: Ensure every column holds a single value to satisfy 1NF.
- Composite Keys: Identify partial dependencies to separate attributes in 2NF.
- Transitive Dependencies: Separate dependent attributes in 3NF to prevent cascading updates.
- BCNF Check: Verify that every determinant is a candidate key for maximum integrity.
- Trade-offs: Normalization reduces redundancy but may increase query complexity.