Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm in the process of designing my first database and I'm trying to follow the normalization rules up to 3NF. I am using the following definitions for normalization:

  • 1NF: No repeating groups and a primary key is identified.
  • 2NF: Partial key dependencies are removed. All non-key columns are fully dependent on the primary key.
  • 3NF: Transitive dependencies and non-key dependencies are removed.

The table is for storing information about pet food. Here is a basic example of the table before any normalization:

╔════════════╦═════════╦═════════════╦════════════╦═══════════╦═══════════════╦═══════╗
║ Brand      ║ Flavor  ║ Animal Type ║ Breed Size ║ Age Group ║ Ingredients   ║ Price ║
╠════════════╬═════════╬═════════════╬════════════╬═══════════╬═══════════════╬═══════╣
║ Purina     ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Chicken, Salt ║ 18.99 ║
╠════════════╬═════════╬═════════════╬════════════╬═══════════╬═══════════════╬═══════╣
║ BlueWilder ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Chicken, Salt ║ 18.99 ║
╚════════════╩═════════╩═════════════╩════════════╩═══════════╩═══════════════╩═══════╝

Following the 1NF definition above, I added a primary key column. In this case, the only issue was in the Ingredients column containing multiple entries per record:

╔═════════╦════════════╦═════════╦═════════════╦════════════╦═══════════╦═════════════╦═══════╗
║ Food ID ║ Brand      ║ Flavor  ║ Animal Type ║ Breed Size ║ Age Group ║ Ingredients ║ Price ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═════════════╬═══════╣
║ 1       ║ Purina     ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Chicken     ║ 18.99 ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═════════════╬═══════╣
║ 1       ║ Purina     ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Salt        ║ 18.99 ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═════════════╬═══════╣
║ 2       ║ BlueWilder ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Chicken     ║ 18.99 ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═════════════╬═══════╣
║ 2       ║ BlueWilder ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Salt        ║ 18.99 ║
╚═════════╩════════════╩═════════╩═════════════╩════════════╩═══════════╩═════════════╩═══════╝

There is a repeating group now because of the Ingredients column. To fix this, I created two separate tables; An ingredients table, and an associative table that connects the ingredients table and food table together:

╔═══════════════╦════════════╗
║ Ingredient ID ║ Ingredient ║
╠═══════════════╬════════════╣
║ 1             ║ Chicken    ║
╠═══════════════╬════════════╣
║ 2             ║ Salt       ║
╚═══════════════╩════════════╝

╔═════════╦═══════════════╗
║ Food ID ║ Ingredient ID ║
╠═════════╬═══════════════╣
║ 1       ║ 1             ║
╠═════════╬═══════════════╣
║ 1       ║ 2             ║
╠═════════╬═══════════════╣
║ 2       ║ 1             ║
╠═════════╬═══════════════╣
║ 2       ║ 2             ║
╚═════════╩═══════════════╝

Now, I can remove the Ingredients column from the food table:

╔═════════╦════════════╦═════════╦═════════════╦════════════╦═══════════╦═══════╗
║ Food ID ║ Brand      ║ Flavor  ║ Animal Type ║ Breed Size ║ Age Group ║ Price ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═══════╣
║ 1       ║ Purina     ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ 18.99 ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═══════╣
║ 2       ║ BlueWilder ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ 18.99 ║
╚═════════╩════════════╩═════════╩═════════════╩════════════╩═══════════╩═══════╝

At this point, I am unsure of how to proceed.

My line of thinking is this: If any of the non-primary key columns change its value, the Food ID must be different. For example, the brand, flavor, breed size, age group, ingredients, and price can all be the same. But if the animal type is a dog, and then changes to a cat, it must be a different record. This applies to all the non-primary key columns. Following this thinking, all the non-primary key columns ARE fully dependent on the primary key, and there are no transitive dependencies.

Is my line of thinking correct? Is 2NF and 3NF, by the definitions I listed above, already satisfied?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
124 views
Welcome To Ask or Share your Answers For Others

1 Answer

To save on screen space I'll rename attributes:

  • Brand: BRD
  • Flavor: FLA
  • Animal: ANM
  • Breed size: SIZ
  • Age group: AGP
  • Ingredient: ING
  • Price: PRI

Method 1

Normalization, as in textbooks.

Step 1.1

Due to repeating groups in ingredients, the value in the table is not a relation, hence the table does not represent a relational variable, hence it is not in 1NF. The solution is to make sure that the ingredient column (ING) has exactly one ingredient per row-- as you have done. However, no new attributes are added (no new ID). Now we have (in 1NF):

R {BRD, FLA, ANM, SIZ, AGP, ING, PRI}

The whole heading is the key.

Step 1.2

From FDs:

  • {BRD, FLA} -> {ANM}
  • {BRD, FLA} -> {SIZ}
  • {BRD, FLA} -> {AGP}
  • {BRD, FLA} -> {PRI}

by applying union rule for FDs and Heath's theorem:

 R1 {BRD, FLA, ANM, SIZ, AGP, PRI}
KEY {BRD, FLA}

 R2 {BRD, FLA, ING}
KEY {BRD, FLA, ING}

Done. That's it, if I got the FDs correctly.
Both are in BCNF, with confidence I would say that R1 is in 5NF, R2 in 6NF.

Method 2

Database design is predicate design.

Not a formal normalization method as found in most textbooks, but a design method which results in tables being in high NF (5NF, 6NF).

In the first step, the problem (business domain) is verbalized using simple predicates and associated constraints. The problem can be then reasoned about in natural language using logic.

A simple predicate can not be decomposed without losing information, its matching relvar is in 6NF.

In the second step, these simple predicates (and matching relvars) can be combined, making sure not to introduce redundancy and possibility of logical errors, namely contradiction. Once done, relvars (tables) are expected to be in 5NF.

Step 2.1

Describing the problem using simple predicates and matching 6NF relvars. Not verbalizing constraints (it gets long-winded), just stating them.

-- Brand BRD exists.
--
brand {BRD}
   PK {BRD}
-- Flavor FLA exists.
--
flavor {FLA}
    PK {FLA}
-- Animal type ANM exists.
--
animal {ANM}
    PK {ANM}
-- Breed size SIZ exists.
--
bsize {SIZ}
   PK {SIZ}
-- Age group AGP exists.
--
age {AGP}
 PK {AGP}
-- Ingredient ING exists.
--
ingredient {ING}
        PK {ING}
-- Pet food with flavor FLA made by brand BRD
-- is for animal type ANM.
--
food {BRD, FLA, ANM}
  PK {BRD, FLA}

 FK1 {BRD} REFERENCES brand  {BRD}
 FK2 {FLA} REFERENCES flavor {FLA}
 FK3 {ANM} REFERENCES animal {ANM}
-- Pet food with flavor FLA made by brand BRD
-- is recommended for breed size SIZ.
--
food_bsize {BRD, FLA, SIZ}
        PK {BRD, FLA}

       FK1 {BRD, FLA} REFERENCES
      food {BRD, FLA}

       FK2 {SIZ} REFERENCES bsize {SIZ}
-- Pet food with flavor FLA made by brand BRD
-- is recommended for breed age group AGP.
--
food_age {BRD, FLA, AGP}
      PK {BRD, FLA}

       FK1 {BRD, FLA} REFERENCES
      food {BRD, FLA}

       FK2 {AGP} REFERENCES age {AGP}
-- Pet food with flavor FLA made by brand BRD
-- is priced at PRI Euros per unit.
--
price {BRD, FLA, PRI}
   PK {BRD, FLA}

   FK {BRD, FLA} REFERENCES
 food {BRD, FLA}
-- Pet food with flavor FLA made by brand BRD
-- contains ingredient ING.
--
recipe {BRD, FLA, ING}
    PK {BRD, FLA, ING}

   FK1 {BRD, FLA} REFERENCES
  food {BRD, FLA}

   FK2 {ING} REFERENCES ingredient {ING}

Step 2.2

Just by looking at the keys, we can see that food, food_bsize, food_age, and price tables can be combined.

-- Pet food with flavor FLA made by brand BRD
-- is for animal type ANM, recommended for
-- breed size SIZ, breed age group AGP; priced
-- at PRI Euros per unit.
--
food_ {BRD, FLA, ANM, SIZ, AGP, PRI}
   PK {BRD, FLA}

 FK1 {BRD} REFERENCES brand  {BRD}
 FK2 {FLA} REFERENCES flavor {FLA}
 FK3 {ANM} REFERENCES animal {ANM}
 FK4 {SIZ} REFERENCES bsize  {SIZ}
 FK5 {AGP} REFERENCES age    {AGP}

If we decide not to keep the first six tables defining domains, then the final result is as in the first method:

food_ {BRD, FLA, ANM, SIZ, AGP, PRI}
   PK {BRD, FLA}


recipe {BRD, FLA, ING}
    PK {BRD, FLA, ING}

    FK {BRD, FLA} REFERENCES food_
       {BRD, FLA}

However, in a real-world project, you will likely need brand, flavor, animal, bsize, and age tables to constrain domains. There is no rule in normalization stating that you should have them.
Also, not all attributes may be known at the sime time, so it is not likely that you'll combine all of food, food_bsize, food_age, and price tables into food_. This will depend on the business process and optionality of attributes.

Adding IDs

Adding surrogate keys (IDs) has nothing to do with normalization. You may need to add them for other reasons, take a look at this example.


Note:

All attributes (columns) NOT NULL

KEY = PK or AK

PK = Primary Key
AK = Alternate Key   (Unique)
SK = Proper Superkey (Unique)
FK = Foreign Key

I always use the second method for design. Have never seen the first one used outside of a formal DB course of some kind; and people asking for help on SO. For some reason, beyond my understanding, method 1 is taught in DB courses under topic "DB design" before method 2. Most schools do not teach method 2 at all. Go figure.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...