Data Normalization: Introduction and Class Discussion

First Normal Form (1NF)

  • Each column must contain only one valueFor example, an address field should be broken up into one or more fields for the street address; and one field each for the city, state, zipcode, and country.

Second Normal Form (2NF)

  • The table must already be 1NF (i.e. compliant in First Normal Form)
  • Identify foreign keys where values are duplicated and create new tables. For example, information about an artist should not be stored in a table about paintings on exhibit: only the artist’s name or ID as a foreign key should be stored in the table containing information about the paintings while the information about the artists (year born, year died, etc) should be moved to a separate table in which the artist’s name or ID is the primary key.Note: This is the step when we break up tables to set up 1:many relationships.

Third Normal Form (3NF)

  • The table must already be 1NF (i.e. compliant in First Normal Form)
  • The table must also already be 2NF (i.e. compliant in Second Normal Form)
  • When a table is 3NF, every column is independent of every other non-key column.If the database is already 2NF, this step might not be necessary but is a good review of all fields. For example, in the table of paintings that we worked on in class, the country where the painting is currently owned should not be added to the paintings table but rather to a table of museums or collectors which includes geographic location and then related to the table of paintings in that way. Sometimes the database is already in 3NF if 2NF was done completely; but often, this is a good check to be sure that columns in a given table which depend on outside tables were not inadvertently added.

Note:

Remember that in the “real world”, that practical concerns about convenience and performance will override some of the theoretical notions of normalization! So a design that includes some controlled data duplication to speed up performance (such as storing postal and zip codes, which could otherwise be looked up against the information in the street, city, and state or province of an address) will be preferred over a design that strictly goes “by the rules”. This is part of the art of database design: where theory and practice meet.