Database Normalization and Relationships                      

   Contact
   Search
   C
   C++
   Visual Basic
   Java
   JavaScript
   DHTML
   Style Sheets
   About
   Normalization
   Active X
   TDC Binding
   PHP
   Perl and CGI
   Flash
   XML
   SQL
   Messages
   Chat
   MCSE
   Linux
   Cabling   
   ActionScript
   Downloads
   E-Cards   
 
    
    

 The goal of normalization is to remove data redundancy.  Normalization consists of three basic forms, though some now argue four or more.  Normalization Forms are progressive, that is to say that for data to be in a higher form of normalization it must also meet all the requirements of each lower form beneath it.  To list the process for attaining these forms: 

1NF -
First Normal Form sets very basic rules for an organized database and may split larger tables with repeating fields into smaller tables:

  1. Eliminate duplicative columns from the same table.

  2. Create separate tables for each group of related data and identify each row with a unique column or set of columns.  This column or set of columns has a unique an non-repeating value and is called the primary key.

  3. There should be no duplicate rows.

  4. A relational table, by definition, is in first normal form.  All values of the columns contain no repeating values. 

  5. There may be repeating rows (records) though not columns, and non-key columns may not be completely dependent on the primary key.

2NF - Second normal form removes even more duplicating data and splits tables into smaller tables once again:

  1. Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

  2. Create relationships between these new tables and their predecessors through the use of foreign keys.

  3. Table can not have a composite primary key.

  4. It is already in 1NF and every non-key column is fully dependent on the primary key. 

3NF - Third normal form removes repeating fields and splits tables further:

  1. Remove columns that are not dependent upon the primary key so that all field items depend only on the primary key.

  2. Table already meets requirements for 1NF and 2NF.

 

To list examples of these three forms, starting with a raw, un-normalized table, we can outline four basic stages of normalization:

  1. An Un-Normalized Table

The fields enclosed in the inner parentheses, (PRODUCT-ID, PRODUCT-desc and QUANTITY) are in a repeating group.  ORDER-ID is underlined, since it alone is acting as the primary key.

Table Structure = ORDERS(ORDER-ID, ORDER-Date, (PRODUCT-ID, PRODUCT-desc, QUANTITY))

Records

ORDER-ID

ORDER-DATE

PRODUCT-ID

PRODUCT-desc

QUANTITY

1

2037

7/16/2003

199

213

796

Heat Sink
Fan
Thermal Paste

2
4
2

2

2456

11/10/2004

201

117

CPU – AMD 3400 64
Mainboard – KF572

3
4

3

2798

2/15/2005

209

300Gb SATA Drive

5

 

  1. 1st Normal Form

Contains no repeating groups within a record.  In order to convert the un-normalized table above to 1NF, I must expand the primary key to include the key of the repeating group.  In this case, we could create another table that uses these values as foreign keys or we could combine ORDER-ID and PRODUCT-ID into a compound primary key.

Table Structure = ORDERS(ORDER-ID, ORDER-Date, (PRODUCT-ID, PRODUCT-desc, QUANTITY))

Records

ORDER-ID

ORDER-DATE

PRODUCT-ID

PRODUCT-desc

QUANTITY

1

2037

7/16/2003

199

Heat Sink

2

2

2037

7/16/2003

213

Fan

4

3

2037

7/16/2003

796

Thermal Paste

2

4

2456

11/10/2004

201

CPU – AMD 3400 64

3

5

2456

11/10/2004

117

Mainboard – KF572

4

6

2798

2/15/2005

209

300Gb SATA Drive

5

 

  1. 2nd Normal Form

The table is in 1NF and all fields that are not part of the primary key are functionally dependent on the primary key.  If using a compound primary key, every field must be functionally dependent on both key values.  Therefore, if a table were in 1NF and utilized a single primary key, it would already be in 2NF.

To convert to 2NF, any fields that are not functionally dependent on the entire compound primary key may need to be split into smaller tables.  In the 1NF table above, the ORDER-DATE field is not functionally dependent on the entire primary key, but only in the ORDER-ID part of it.  It is a candidate for splitting.  PRODUCT-desc also does not depend on the entire compound primary key, but only on PRODUCT-ID.  It too is a candidate for splitting.  Only the QUANTITY field functionally depends on both a PRODUCT-ID and an ORDER-ID.  To reach 2NF, split this table into three tables:

1st Table Structure = ORDERS(ORDER-ID, ORDER-DATE, QUANTITY)

Records

ORDER-ID

ORDER-DATE

QUANTITY

1

2037

7/16/2003

4

2

2456

11/10/2004

6

3

2798

2/15/2005

3

 
2nd Table Structure = PRODUCTS(PRODUCT-ID, PRODUCT-desc)

Records

PRODUCT-ID

PRODUCT-desc

1

117

Mainboard – KF572

2

199

Heat Sink

3

201

CPU – AMD 3400 64

4

209

300 Gb SATA Drive

5

213

Fan

6

796

Thermal Paste


3rd Table Structure = TRANSACTIONS(ORDER-ID, PRODUCT-ID, QUANTITY)

Records

ORDER-ID

PRODUCT-ID

1

2037

199

2

2037

213

3

2037

796

4

2456

201

5

2456

117

6

2798

209


After splitting the original table into these three tables, all fields are functionally dependent on the compound primary key.  The TRANSACTIONS table is in 2NF, but not in 3NF since it contains a compound primary key.

 

  1. 3rd Normal Form

The table is in 2NF and there are no non-key fields that are dependent on other non-key fields.  To put it bluntly, the key, the whole key, and nothing but the key.  It usually involves removing a compound primary key.  To convert the above TRANSACTIONS table to 3NF, we could assign it a TRANSACTION ID instead of using the compound primary key of ORDER-ID and PRODUCT-ID.

3NF Version of Table Structure Above = TRANSACTIONS(TRANSACTION_ID, ORDER_ID, PRODUCT_ID)

Records

TRANSACTION_ID

ORDER-ID

PRODUCT-ID

1

0001

2037

199

2

0002

2037

213

3

0003

2037

796

4

0004

2456

201

5

0005

2456

117

6

0006

2798

209

 If we have a table like so:

PC-CUSTOMERS(CUSTOMER-ID, NAME, ADDRESS, TECH-ID, TECH-NAME)

Record

CUSTOMER-ID

NAME

ADDRESS

TECH-ID

TECH-NAME

1

1001

Smith

1800 Lane

8007

Scotty

2

1002

Jones

2950 St.

8008

LaForge

3

1003

Germany

31 Ave.

8009

Kim

Each customer has a technician assigned to them.  By definition, the table is in 1NF since it has no repeating groups.  By definition, it can be considered to be in 2NF since it has a single (non-compound) primary key and all fields not part of the primary key depend on the primary key.  It is not yet in 3NF, however, since the TECH-NAME non-key field is functionally dependent on the TECH-ID field, and not just the primary key.  To convert this table to 3NF, all records that have a functional dependency on another non-key field must be removed, so we will split the table into 2 tables:

Table 1 = CUSTOMERS(CUSTOMER-ID, NAME, ADDRESS, TECH-ID)

Record

CUSTOMER-ID

NAME

ADDRESS

TECH-ID

1

1001

Smith

1800 Lane

8007

2

1002

Jones

2950 St.

8008

3

1003

Germany

31 Ave.

8009


Table 2 = TECHNICIANS(TECH-ID, TECH-NAME)

Record

TECH-ID

TECH-NAME

1

8007

Scotty

2

8008

LaForge

3

8009

Kim

Since each field in each table depends only on the primary key and there are no compound primary keys, it is in 3NF.

These are the three standard forms of normalization.  Often a relational database will stop after reaching 3NF.  However, there are also several other forms of normalization that may be pursued:

4NF – The controversial and debated 4th Normal Form implies:

  1. A relation is in 4NF if it has no multi-valued dependencies.

DKNF – Also, in Domain Key Normal Form, a key uniquely identifies each row in a table. Domains are sets of permissible values for attributes. Domain restrictions are enforced to ensure the database is kept free from modification anomalies.

Friend

Fav.Color

City

Zip Code

BobC

Pink

A

23456

Meredith

Blue

B

45675

Marissa

Red

C

39485

SusieQ

Lavender

D

12395

If this were a table and you wanted to delete a friend, you would also inadvertently delete a zip code and city combination.  It would be better to keep the city zip code combination, and this could be done by splitting the table into two tables and then combining them when needed.  

These level of normalization interact with the following entity relationships in the form of primary and foreign key fields:

  1. One-To-One – Both tables can have only one record on either side of the relationship. Each primary key value relates to only one (or no) record in the related table.

  2. One-To-Many - The primary key table contains only one record that relates to none, one, or many records in the related table.

  3. Many-To-Many - Each record in both tables can relate to any number of records (or no records) in the other table. As an example, if you have many brothers and sisters, your brothers and sisters also have many brothers and sisters. Many-to-many relationships require a third table, known as an associate or linking table.  This third table is necessary since relational systems can't directly accommodate the many-to-many relationship.

The basic purpose of a data dictionary is thorough documentation of a system for posterity's sake, code and data maintenance, and future modifications.  By definition, a data dictionary may consist of a collection of the following components:

1. Data Element Definitions = field names, format, codes,
                                              and descriptions.

2. Table Definitions = names, elements, and descriptions.

3. Database Schema = key connections and relationship structure.

4. Entity-relationship model of data = diagram of entities and
                                                           relationship types.

5. Database security model = associating users and groups of users
                                              with data and object access rights.

 

 

6. Data Stores = inventory of data in a system.

7. Data Structures = arrangements of data attributed objects that
                               define the organization of a single instance of a
                               data flow.

8. Data Elements = a descriptive property or characteristic of an
                              entity.

9. Data Flows = input of data to a process or output of data from a
                         process.

10. Processes = work performed on or in response to incoming data
                        flows or conditions.

11. External entities = a person, OU, system or object that is
                                   outside the system but with which it interacts
                                   with.

In conclusion, the data dictionary becomes the documentation of a database and may be exceptionally useful in designing the schema and in subsequent iterations of the SDLC.  Normalization is the key to eliminating redundancy in records and tables and storing information efficiently.  Proper entity relationships ensure data integrity and that transactions are processed properly.  This is to say, data dictionaries, normalization and entity relationships all play important roles in designing a good relational database.