Information System Analysis and Design: Normalization

Normalization

Normalization process is the establishment of the database structure so that most of the ambiguity can be removed. Normalization stage, starting from the most mild (1NF) to most stringent (5NF). Usually only up to the level of 3NF or BCNF because already sufficient to generate the table-a table of good quality. Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics—insertion, update, and deletion anomalies that could lead to a loss of data integrity.

Normal form (normal form) is a database of the scheme class relations defined for the purpose of high integrity and maintainability. Creation of a normal form is called normalization. Normalization achieved with analysis dependence among the attributes of each individual associated with that relation.

Normalization database
Normalization be used to optimizing table structures, increase speed, the income data is the same, more efficient in the use of storage media, reduce redundancy, avoid anomalies (insertion anomalies, deletion anomalies, update anomalies), improved data integrity.

A table saying good (efficient) or if the normal 3 to meet the following criteria:

If there is decomposition (decomposition) table, it must be guaranteed safe the composition (Lossless-Join Decomposition). That is, after the table is described a new table-table, the table-table can generate a new table with the same exact.
Maintain dependence on the functional changes in data (Dependency preservation).
Does not violate Boyce-Code Normal Form (BCNF).

If the three criteria (BCNF) can’t be met, then at least the table does not violate the Normal Form of the third stage (3rd Normal Form / 3NF).

Functional Dependency

Functional Dependency attributes describe the relationship in a relationship.
An attribute said functionally dependant on the other, if we use the value attribute to determine the value of the other attributes.
Symbol that is used to represent à functional dependency. Read the functional set.
Notation: A à B
A and B are attributes of a table. A means of determining the functional B or B depends on A, if and only if there are 2 rows of data with the same value of A, then B is also the same value
Notation: A xà B
It is the opposite of the previous notation.

Functional Dependency of the table value

Functional Dependency:

• NRP à Nama

• Mata_Kuliah, NRP à Nilai

Non Functional Dependency:

• Mata_Kuliah à NRP

• NRP à Nilai

Nrp à Name
Because for each value Nrp the same, then the value of the same name
(Mata_kuliah, NRP) à Value
Because the value of attributes depending on the NRP and Mata_kuliah together. In another sense Mata_kuliah for the NRP and the same, they also rated the same, because Mata_kuliah and the NRP is a key (is unique).

NRP ≠> Mata_kuliah
Value ≠> NRP

First Normal Form - 1NF

Relations, there is a condition in First Normal Form (1NF) if and only if all domains are covered only atomic value, for example, there is no recurrence group (domains) in a tuple. Advantage of the 1NF relation compared Unnormalized (UNRs) is a simplification in the form of representation and ease of use in developing a query language. lack is the need to duplicate data, Most of the system relations (not all) require a form of relations in 1NF.

A table on the form said to be normal if I did not reside in the unnormalized form of a table, where there is a kind of field multiplication and field that allows a null (empty).
May not be found:

Many attributes of value (Multi valued attributes).
Attributes composite or a combination of both.
Price is the domain attribute must be atomic rates.

Second Normal Form - 2NF

A super key is a set of one or more attribute, which, taken in particular where the possible for us to uniquely identify one entity or relationship.

A Candidate key is a subset of attributes-the attributes that super key is also super key and not reducible to the other super key. A primary key is selected from the set of candidate key for use in relation to an index of the ownership of one or more of the attribute that can be defined from the unique value of one or more of the attribute called functional dependency.

Given a relation (R), a set of (B) is functionally dependent on another set of attributes (A) if, at one time, each value A associated with a value B, this is a form of the FD A → symbol with B. A relation is in second position to its normal form (2NF) if and only if the relationship is also in 1NF and every attribute non key fully dependent on its primary key. 2NF requires that any FD in the relation must include all components of the primary key as the determinant, either direct or transitive.

Why should 2NF, consider the benefits of the 1NF R. paper, pub-id, and editor-created duplicate id. For each author of the paper. If the editor of publications for a paper change, some also in the tuple must be updated. Finally, if one paper on the take, all the symbol tupple should be removed. This form will give the side effects on the deletion of information that a associated with the auth-id name and auth-auth-addr.

One way to do this is to transform the relations into two or more of the 2NF relations.

example R1: paper-id, id-auth-name → Inst, Inst-addrR2: auth-id → auth-name, auth-addrR3: paper-pub-id → id, editor-id.

Normal form 2NF met in a table if it meets the form of 1NF, and all the attributes than the primary key, have a full Functional Dependency on primary key.
A table does not meet 2NF, if there are attributes that dependency (Functional Dependency) are only partial (only depending on the part of the primary key).

If there are attributes that have no dependence on the primary key, then the attributes must be moved or removed.
Y is full if it is said to delete an Functional dependency X attribute A from X means that Y is no longer dependent functional.

Y said if deleting a partial attribute A from X means that Y is functionally dependent. Functional dependency .
R depend on the full functional primary key R.ÎRelation scheme R in the form 2NF if every non-primary key attribute A .

Third Normal Form - 3NF

Normal form 3NF fulfilled if the form meets 2NF, and if there are no non-primary key attribute that has a dependence on non-primary key attributes ofthe other (transitive dependenc

ies).

A relationship in the Third Normal Form (3NF) if and only if the relation is in 2NF and every non key attribute is non transitive dependent on primary key

Boyce-Codd Normal Form (BNCF)

Boyce-Codd Normal Form constraint has a stronger form of the Normal third. To be BNCF, relations must be in the form of first normal and forced each of the attributes depends on the function in the super key attributes.
In the example below there is a relationship seminar, is the Primary Key NPM + Seminar.

Students may take one or two seminars. Each seminar requires 2 each of the students and led by one of the 2 seminar. Each leader can only take one s

eminar course. NPM in this example and show a seminar guide.

Normal form of the fourth and fifth

Relations in the fourth normal form (NF 4) if the relation in BCNF and dependency tdak contain many values. To remove the dependency of many values from a relation, we divide the relationship into two new relations. Each relation contains two attributes that have a lot of relationship value.

Relations in fifth normal form (5NF) deal with the property called the join without any loss of information (lossless join). Fifth normal form (also called the 5 NF PJNF (projection join normal f

orm). The case is very rare and appear difficult to detect in practice.

Rules of Data Normalization

Eliminate Repeating Groups

In the original member list, each member name is followed by any databases that the member has experience with. Some might know many, and others might not know any. To answer the question, "Who knows DB2?" we need to perform an awkward scan of the list looking for references to DB2. This is inefficient and an extremely untidy way to store information.

Moving the known databases into a seperate table helps a lot. Separating the repeating groups of databases from the member information results in first normal form. The MemberID in the database table matches the primary key in the member table, providing a foreign ke

y for relating the two tables with a join operation. Now we can answer the question by looking in the database table for "DB2" and getting the list of members.

Eliminate Redundant Data

In the Database Table, the primary key is made up of the MemberID and the DatabaseID. This makes sense for other attributes like "Where Learned" and "Skill Level" attributes, since they will be different for every member/database combination. But the database name depends o

nly on the DatabaseID. The same database name will appear redundantly every time its associated ID appears in the Database Table.

Suppose you want to reclassify a database - give it a different DatabaseID. The change has to be made for every member that lists that database! If you miss some, you'll have several members with the same database under different IDs. This is an update anomaly.

Or suppose the last member listing a particular database leaves

the group. His records will be removed from the system, and the database will not be stored anywhere! This is a delete anomaly. To avoid these problems, we need second normal form.

To achieve this, separate the attributes depending on both parts of the key from those depending only on the DatabaseID. This results in two tables: "Database" which gives the name for each DatabaseID, and "MemberDatabase" which lists the databases for each member.

Now we can reclassify a database in a single operation: look up the DatabaseID in the "Database" table and change its name. The result will instantly be available throughou

t the application.

Eliminate Columns Not Dependent On Key

The Member table satisfies first normal form - it contains no repeating groups. It satisfies second normal form - since it doesn't have a multivalued key. But the key is MemberID, and the company name and location describe only a company, not a member. To achieve third normal form, they must be moved into a separate table. Since they describe a company, CompanyCode becomes the key of the new "Company" table.

The motivation for this is the same for second normal form: we want to avoid update and delete anomalies. For example, suppose no members from the IBM

were currently stored in the database. With the previous design, there would be no record of its existence, even though 20 past members were from IBM!

BCNF. Boyce-Codd Normal Form

Boyce-Codd Normal Form states mathematically that:

A relation R is said to be in BCNF if whenever X -> A holds in R, and A is not in X, then X is a candidate key for R.

BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys.

Basically, a humorous way to remember BCNF is that all functional dependencies are:
"The key, the whole key, and nothing but the key, so help me Codd."

Isolate Independent Multiple Relationships

This applies primarily to key-only associative tables, and appears as a ternary relationship, but has incorrectly merged 2 distinct, independent relationships.

The way this situation starts is by a business request list the one shown below. This could be any 2 M:M relationships from a single entity. For instance, a member could k

now many software tools, and a software tool may be used by many members. Also, a member could have recommended many books, and a book could be recommended by many members.

So, to resolve the two M:M relationships, we know that we should resolve them separately, and that would give us 4th normal form. But, if we were to combine them into a single table, it might look right (it is in 3rd normal form) at first. This is shown below, and violates 4th normal form.

To get a picture of what is wrong, look at some sample data, shown below. The first few records look right, where Bill knows ERWin and recommends the ERWin Bible for everyone to read. But something is wrong with Mary and Steve. Mary didn't recommend a book, and Steve Doesn't know any software tools. Our solution has forced us to do strange things like create dummy records in both Book and Software to allow the record in the association, since it is key only table.

The correct solution, to cause the model to be in 4th normal form, is to ensure that all M:M relationships are resolved independently if they are indeed independent, as shown below.

Isolate Semantically Related Multiple Relationships

modify the original business diagram and add a link between the books and the software tools, indicating which books deal with which software tools, as shown below.

This makes sense after the discussion on Rule 4, and again we may be tempted to resolve the multiple M:M relationships into a single association, which would now violate 5th normal form. The ternary association looks identical to the one shown in the 4th normal form example, and is also going to have trouble displaying the information correctly. This time we would have even more trouble because we can't show the relationships between books and software unless we have a member to link to, or we have to add our favorite dummy member record to allow the record in the association table.

The solution, as before, is to ensure that all M:M relationships that are independent are resolved independently, resulting in the model shown below. Now information about members and books, members and software, and books and software are all stored independently, even though they are all very much semantically related. It is very tempting in many situations to combine the multiple M:M relationships because they are so similar. Within complex business discussions, the lines can become blurred and the correct solution not so obvious.

Optimal Normal Form

At this point, we have done all we can with Entity-Relationship Diagrams (ERD). Most people will stop here because this is usually pretty good. However, another modeling style called Object Role Modeling (ORM) can display relationships that cannot be expressed in ERD. Therefore there are more normal forms beyond 5th. With Optimal Normal Form (OMF), it is defined as a model limited to only simple (elemental) facts, as expressed in ORM.

Domain-Key Normal Form

This level of normalization is simply a model taken to the point where there are no opportunities for modification anomalies.

"if every constraint on the relation is a logical consequence of the definition of keys and domains"

Constraint "a rule governing static values of attributes"

Key "unique identifier of a tuple"

Domain "description of an attribute’s allowed values"

Trivial functional dependency

A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is

· functionally dependent on X, and

· not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency

A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.

Multi valued dependency

A multi valued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Super key

A super key is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct super keys. {Employee ID, Employee Address, Skill} would be a super key for the "Employees' Skills" table; {Employee ID, Skill} would also be a super key.

Candidate key

A candidate key is a minimal super key, that is, a super key for which we can say that no proper subset of it is also a super key. {Employee Id, Skill} would be a candidate key for the "Employees' Skills" table.

Non-prime attribute

A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key

Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.

Information System Analysis and Design

Sunday, April 26, 2009

Normalization

No comments:

Post a Comment

Followers

Blog Archive

About Me