The goal of
normalization is to remove data redundancy.
Normalization consists of three basic
forms, though some now argue four or more. Normalization Forms
are progressive, that is to say that for data to be in a higher
form of normalization it must also meet all the requirements of
each lower form beneath it. To list the process for attaining
these forms:
1NF -
First
Normal Form sets very basic rules for an organized database and
may split larger tables with repeating fields into smaller
tables:
-
Eliminate duplicative columns from the same table.
-
Create separate tables for each group of related data and
identify each row with a unique column or set of columns.
This column or set of columns has a unique an non-repeating
value and is called the primary key.
-
There should be no duplicate rows.
-
A relational table, by
definition, is in first normal form. All values of the
columns contain no repeating values.
-
There may be repeating rows
(records) though not columns, and non-key columns may not be
completely dependent on the primary key.
2NF
- Second normal form removes even more duplicating data and
splits tables into smaller tables once again:
-
Remove subsets of data that apply to multiple rows of a
table and place them in separate tables.
-
Create relationships between these new tables and their
predecessors through the use of foreign keys.
-
Table can not have a composite primary key.
-
It
is already in 1NF and every non-key column is fully
dependent on the primary key.
3NF
- Third normal form removes repeating fields and splits tables
further:
-
Remove columns that are not dependent upon the primary key
so that all field items depend only on the primary key.
-
Table already meets requirements for 1NF and 2NF.
To list
examples of these three forms, starting with a raw,
un-normalized table, we can outline four basic stages of
normalization:
-
An Un-Normalized Table
The fields enclosed in the inner
parentheses, (PRODUCT-ID, PRODUCT-desc and QUANTITY) are in a
repeating group. ORDER-ID is underlined, since it alone is
acting as the primary key.
Table Structure = ORDERS(ORDER-ID,
ORDER-Date, (PRODUCT-ID, PRODUCT-desc, QUANTITY))
|
Records |
ORDER-ID |
ORDER-DATE |
PRODUCT-ID |
PRODUCT-desc |
QUANTITY |
|
1 |
2037 |
7/16/2003 |
199
213
796 |
Heat Sink
Fan
Thermal Paste |
2
4
2 |
|
2 |
2456 |
11/10/2004 |
201
117 |
CPU – AMD 3400 64
Mainboard – KF572 |
3
4 |
|
3 |
2798 |
2/15/2005 |
209 |
300Gb SATA Drive |
5 |
-
1st Normal Form
Contains no repeating groups within a
record. In order to convert the un-normalized table above to
1NF, I must expand the primary key to include the key of the
repeating group. In this case, we could create another table
that uses these values as foreign keys or we could combine
ORDER-ID and PRODUCT-ID into a compound primary key.
Table Structure = ORDERS(ORDER-ID,
ORDER-Date, (PRODUCT-ID, PRODUCT-desc, QUANTITY))
|
Records |
ORDER-ID |
ORDER-DATE |
PRODUCT-ID |
PRODUCT-desc |
QUANTITY |
|
1 |
2037 |
7/16/2003 |
199 |
Heat Sink |
2 |
|
2 |
2037 |
7/16/2003 |
213 |
Fan |
4 |
|
3 |
2037 |
7/16/2003 |
796 |
Thermal Paste |
2 |
|
4 |
2456 |
11/10/2004 |
201 |
CPU – AMD 3400 64 |
3 |
|
5 |
2456 |
11/10/2004 |
117 |
Mainboard – KF572 |
4 |
|
6 |
2798 |
2/15/2005 |
209 |
300Gb SATA Drive |
5 |
-
2nd Normal Form
The table is in 1NF and all fields that
are not part of the primary key are functionally dependent on
the primary key. If using a compound primary key, every field
must be functionally dependent on both key values. Therefore,
if a table were in 1NF and utilized a single primary key, it
would already be in 2NF.
To convert to 2NF, any fields that are not
functionally dependent on the entire compound primary key may
need to be split into smaller tables. In the 1NF table above,
the ORDER-DATE field is not functionally dependent on the entire
primary key, but only in the ORDER-ID part of it. It is a
candidate for splitting. PRODUCT-desc also does not depend on
the entire compound primary key, but only on PRODUCT-ID. It too
is a candidate for splitting. Only the QUANTITY field
functionally depends on both a PRODUCT-ID and an ORDER-ID. To
reach 2NF, split this table into three tables:
1st Table Structure = ORDERS(ORDER-ID,
ORDER-DATE, QUANTITY)
|
Records |
ORDER-ID |
ORDER-DATE |
QUANTITY |
|
1 |
2037 |
7/16/2003 |
4 |
|
2 |
2456 |
11/10/2004 |
6 |
|
3 |
2798 |
2/15/2005 |
3 |
2nd Table Structure =
PRODUCTS(PRODUCT-ID, PRODUCT-desc)
|
Records |
PRODUCT-ID |
PRODUCT-desc |
|
1 |
117 |
Mainboard – KF572 |
|
2 |
199 |
Heat Sink |
|
3 |
201 |
CPU – AMD 3400 64 |
|
4 |
209 |
300 Gb SATA Drive |
|
5 |
213 |
Fan |
|
6 |
796 |
Thermal Paste |
3rd Table Structure = TRANSACTIONS(ORDER-ID,
PRODUCT-ID, QUANTITY)
|
Records |
ORDER-ID |
PRODUCT-ID |
|
1 |
2037 |
199 |
|
2 |
2037 |
213 |
|
3 |
2037 |
796 |
|
4 |
2456 |
201 |
|
5 |
2456 |
117 |
|
6 |
2798 |
209 |
After splitting the original table into these three tables,
all fields are functionally dependent on the compound
primary key. The TRANSACTIONS table is in 2NF, but not in
3NF since it contains a compound primary key.
-
3rd Normal Form
The table is in 2NF and there are no
non-key fields that are dependent on other non-key fields. To
put it bluntly, the key, the whole key, and nothing but the
key. It usually involves removing a compound primary key.
To convert the above TRANSACTIONS table to 3NF, we could assign
it a TRANSACTION ID instead of using the compound primary key of
ORDER-ID and PRODUCT-ID.
3NF
Version of Table Structure Above = TRANSACTIONS(TRANSACTION_ID,
ORDER_ID, PRODUCT_ID)
|
Records |
TRANSACTION_ID |
ORDER-ID |
PRODUCT-ID |
|
1 |
0001 |
2037 |
199 |
|
2 |
0002 |
2037 |
213 |
|
3 |
0003 |
2037 |
796 |
|
4 |
0004 |
2456 |
201 |
|
5 |
0005 |
2456 |
117 |
|
6 |
0006 |
2798 |
209 |
If we have a table like so:
PC-CUSTOMERS(CUSTOMER-ID,
NAME, ADDRESS, TECH-ID, TECH-NAME)
|
Record |
CUSTOMER-ID |
NAME |
ADDRESS |
TECH-ID |
TECH-NAME |
|
1 |
1001 |
Smith |
1800 Lane |
8007 |
Scotty |
|
2 |
1002 |
Jones |
2950 St. |
8008 |
LaForge |
|
3 |
1003 |
Germany |
31 Ave. |
8009 |
Kim |
Each customer has a technician assigned to
them. By definition, the table is in 1NF since it has no
repeating groups. By definition, it can be considered to be
in 2NF since it has a single (non-compound) primary key and
all fields not part of the primary key depend on the primary
key. It is not yet in 3NF, however, since the TECH-NAME
non-key field is functionally dependent on the TECH-ID
field, and not just the primary key. To convert this table
to 3NF, all records that have a functional dependency on
another non-key field must be removed, so we will split the
table into 2 tables:
Table 1 = CUSTOMERS(CUSTOMER-ID,
NAME, ADDRESS, TECH-ID)
|
Record |
CUSTOMER-ID |
NAME |
ADDRESS |
TECH-ID |
|
1 |
1001 |
Smith |
1800 Lane |
8007 |
|
2 |
1002 |
Jones |
2950 St. |
8008 |
|
3 |
1003 |
Germany |
31 Ave. |
8009 |
Table 2 = TECHNICIANS(TECH-ID, TECH-NAME)
|
Record |
TECH-ID |
TECH-NAME |
|
1 |
8007 |
Scotty |
|
2 |
8008 |
LaForge |
|
3 |
8009 |
Kim |
Since each field in each table depends only on the primary
key and there are no compound primary keys, it is in 3NF.
These
are the three standard forms of normalization. Often a
relational database will stop after reaching 3NF. However,
there are also several other forms of normalization that may
be pursued:
4NF
– The controversial and debated 4th Normal Form
implies:
-
A relation is in 4NF if it has no multi-valued
dependencies.
DKNF –
Also,
in Domain Key Normal Form, a key uniquely
identifies each row in a table. Domains are sets of
permissible values for attributes. Domain restrictions are
enforced to ensure the database is kept free from
modification anomalies.
|
Friend |
Fav.Color |
City |
Zip Code |
|
BobC |
Pink |
A |
23456 |
|
Meredith |
Blue |
B |
45675 |
|
Marissa |
Red |
C |
39485 |
|
SusieQ |
Lavender |
D |
12395 |
If
this were a table and you wanted to delete a friend, you
would also inadvertently delete a zip code and city
combination. It would be better to keep the city zip code
combination, and this could be done by splitting the table
into two tables and then combining them when needed.
These level of normalization interact with the following
entity relationships in the form of primary and foreign key
fields:
-
One-To-One
– Both tables can have only one record on either side of
the relationship. Each primary key value relates to only
one (or no) record in the related table.
-
One-To-Many
- The primary key table contains only one record that
relates to none, one, or many records in the related
table.
-
Many-To-Many
- Each record in both tables can relate to any number of
records (or no records) in the other table. As an
example, if you have many brothers and sisters, your
brothers and sisters also have many brothers and
sisters. Many-to-many relationships require a third
table, known as an associate or linking table. This
third table is necessary since relational systems can't
directly accommodate the many-to-many relationship.
The basic
purpose of a data dictionary is thorough documentation of a
system for posterity's sake, code and data maintenance, and
future modifications. By definition, a data dictionary may
consist of a collection of the following components:
1. Data
Element Definitions = field names, format, codes,
and descriptions.
2. Table Definitions = names, elements, and descriptions.
3. Database Schema = key connections and relationship
structure.
4. Entity-relationship model of data = diagram of
entities and
relationship types.
5. Database security model = associating users and groups
of users
with data and
object access rights.
6. Data
Stores = inventory of data in a system.
7. Data Structures = arrangements of data attributed
objects that
define the organization of a
single instance of a
data flow.
8. Data Elements = a descriptive property or
characteristic of an
entity.
9. Data Flows = input of data to a process or output of
data from a
process.
10. Processes = work performed on or in response to
incoming data
flows or conditions.
11. External entities = a person, OU, system or object
that is
outside the system but with
which it interacts
with.
In conclusion, the data dictionary becomes the documentation
of a database and may be exceptionally useful in designing
the schema and in subsequent iterations of the SDLC.
Normalization is the key to eliminating redundancy in
records and tables and storing information efficiently.
Proper entity relationships ensure data integrity and that
transactions are processed properly. This is to say, data
dictionaries, normalization and entity relationships all
play important roles in designing a good relational
database.
