Sunday, April 26, 2009

Normalization

Normalization

Normalization process is the establishment of the database structure so that most of the ambiguity can be removed. Normalization stage, starting from the most mild (1NF) to most stringent (5NF). Usually only up to the level of 3NF or BCNF because already sufficient to generate the table-a table of good quality. Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics—insertion, update, and deletion anomalies that could lead to a loss of data integrity.

Normal form (normal form) is a database of the scheme class relations defined for the purpose of high integrity and maintainability. Creation of a normal form is called normalization. Normalization achieved with analysis dependence among the attributes of each individual associated with that relation.

Normalization database
Normalization be used to optimizing table structures, increase speed, the income data is the same, more efficient in the use of storage media, reduce redundancy, avoid anomalies (insertion anomalies, deletion anomalies, update anomalies), improved data integrity.

A table saying good (efficient) or if the normal 3 to meet the following criteria:

  • If there is decomposition (decomposition) table, it must be guaranteed safe the composition (Lossless-Join Decomposition). That is, after the table is described a new table-table, the table-table can generate a new table with the same exact.
  • Maintain dependence on the functional changes in data (Dependency preservation).
  • Does not violate Boyce-Code Normal Form (BCNF).

If the three criteria (BCNF) can’t be met, then at least the table does not violate the Normal Form of the third stage (3rd Normal Form / 3NF).

Functional Dependency

Functional Dependency attributes describe the relationship in a relationship.
An attribute said functionally dependant on the other, if we use the value attribute to determine the value of the other attributes.
Symbol that is used to represent à functional dependency. Read the functional set.
Notation: A à B
A and B are attributes of a table. A means of determining the functional B or B depends on A, if and only if there are 2 rows of data with the same value of A, then B is also the same value
Notation: A xà B
It is the opposite of the previous notation.






Functional Dependency of the table value

Functional Dependency:

NRP à Nama

Mata_Kuliah, NRP à Nilai

Non Functional Dependency:

Mata_Kuliah à NRP

NRP à Nilai

Nrp à Name
Because for each value Nrp the same, then the value of the same name
(Mata_kuliah, NRP) à Value
Because the value of attributes depending on the NRP and Mata_kuliah together. In another sense Mata_kuliah for the NRP and the same, they also rated the same, because Mata_kuliah and the NRP is a key (is unique).

NRP ≠> Mata_kuliah
Value ≠> NRP

First Normal Form - 1NF

Relations, there is a condition in First Normal Form (1NF) if and only if all domains are covered only atomic value, for example, there is no recurrence group (domains) in a tuple. Advantage of the 1NF relation compared Unnormalized (UNRs) is a simplification in the form of representation and ease of use in developing a query language. lack is the need to duplicate data, Most of the system relations (not all) require a form of relations in 1NF.

A table on the form said to be normal if I did not reside in the unnormalized form of a table, where there is a kind of field multiplication and field that allows a null (empty).
May not be found:

  • Many attributes of value (Multi valued attributes).
  • Attributes composite or a combination of both.
  • Price is the domain attribute must be atomic rates.

Second Normal Form - 2NF

A super key is a set of one or more attribute, which, taken in particular where the possible for us to uniquely identify one entity or relationship.

A Candidate key is a subset of attributes-the attributes that super key is also super key and not reducible to the other super key. A primary key is selected from the set of candidate key for use in relation to an index of the ownership of one or more of the attribute that can be defined from the unique value of one or more of the attribute called functional dependency.

Given a relation (R), a set of (B) is functionally dependent on another set of attributes (A) if, at one time, each value A associated with a value B, this is a form of the FD A → symbol with B. A relation is in second position to its normal form (2NF) if and only if the relationship is also in 1NF and every attribute non key fully dependent on its primary key. 2NF requires that any FD in the relation must include all components of the primary key as the determinant, either direct or transitive.

Why should 2NF, consider the benefits of the 1NF R. paper, pub-id, and editor-created duplicate id. For each author of the paper. If the editor of publications for a paper change, some also in the tuple must be updated. Finally, if one paper on the take, all the symbol tupple should be removed. This form will give the side effects on the deletion of information that a associated with the auth-id name and auth-auth-addr.

  • One way to do this is to transform the relations into two or more of the 2NF relations.

example R1: paper-id, id-auth-name → Inst, Inst-addrR2: auth-id → auth-name, auth-addrR3: paper-pub-id → id, editor-id.

Normal form 2NF met in a table if it meets the form of 1NF, and all the attributes than the primary key, have a full Functional Dependency on primary key.
A table does not meet 2NF, if there are attributes that dependency (Functional Dependency) are only partial (only depending on the part of the primary key).

If there are attributes that have no dependence on the primary key, then the attributes must be moved or removed.
Y is full if it is said to delete an Functional dependency X attribute A from X means that Y is no longer dependent functional.

Y said if deleting a partial attribute A from X means that Y is functionally dependent. Functional dependency .
R depend on the full functional primary key R.ÎRelation scheme R in the form 2NF if every non-primary key attribute A .

Third Normal Form - 3NF

Normal form 3NF fulfilled if the form meets 2NF, and if there are no non-primary key attribute that has a dependence on non-primary key attributes ofthe other (transitive dependenc

ies).

A relationship in the Third Normal Form (3NF) if and only if the relation is in 2NF and every non key attribute is non transitive dependent on primary key

Boyce-Codd Normal Form (BNCF)

Boyce-Codd Normal Form constraint has a stronger form of the Normal third. To be BNCF, relations must be in the form of first normal and forced each of the attributes depends on the function in the super key attributes.
In the example below there is a relationship seminar, is the Primary Key NPM + Seminar.

Students may take one or two seminars. Each seminar requires 2 each of the students and led by one of the 2 seminar. Each leader can only take one s

eminar course. NPM in this example and show a seminar guide.

Normal form of the fourth and fifth

Relations in the fourth normal form (NF 4) if the relation in BCNF and dependency tdak contain many values. To remove the dependency of many values from a relation, we divide the relationship into two new relations. Each relation contains two attributes that have a lot of relationship value.

Relations in fifth normal form (5NF) deal with the property called the join without any loss of information (lossless join). Fifth normal form (also called the 5 NF PJNF (projection join normal f

orm). The case is very rare and appear difficult to detect in practice.

Rules of Data Normalization

  1. Eliminate Repeating Groups

In the original member list, each member name is followed by any databases that the member has experience with. Some might know many, and others might not know any. To answer the question, "Who knows DB2?" we need to perform an awkward scan of the list looking for references to DB2. This is inefficient and an extremely untidy way to store information.

Moving the known databases into a seperate table helps a lot. Separating the repeating groups of databases from the member information results in first normal form. The MemberID in the database table matches the primary key in the member table, providing a foreign ke

y for relating the two tables with a join operation. Now we can answer the question by looking in the database table for "DB2" and getting the list of members.












  1. Eliminate Redundant Data

In the Database Table, the primary key is made up of the MemberID and the DatabaseID. This makes sense for other attributes like "Where Learned" and "Skill Level" attributes, since they will be different for every member/database combination. But the database name depends o

nly on the DatabaseID. The same database name will appear redundantly every time its associated ID appears in the Database Table.

Suppose you want to reclassify a database - give it a different DatabaseID. The change has to be made for every member that lists that database! If you miss some, you'll have several members with the same database under different IDs. This is an update anomaly.

Or suppose the last member listing a particular database leaves

the group. His records will be removed from the system, and the database will not be stored anywhere! This is a delete anomaly. To avoid these problems, we need second normal form.

To achieve this, separate the attributes depending on both parts of the key from those depending only on the DatabaseID. This results in two tables: "Database" which gives the name for each DatabaseID, and "MemberDatabase" which lists the databases for each member.

Now we can reclassify a database in a single operation: look up the DatabaseID in the "Database" table and change its name. The result will instantly be available throughou

t the application.









  1. Eliminate Columns Not Dependent On Key

The Member table satisfies first normal form - it contains no repeating groups. It satisfies second normal form - since it doesn't have a multivalued key. But the key is MemberID, and the company name and location describe only a company, not a member. To achieve third normal form, they must be moved into a separate table. Since they describe a company, CompanyCode becomes the key of the new "Company" table.

The motivation for this is the same for second normal form: we want to avoid update and delete anomalies. For example, suppose no members from the IBM

were currently stored in the database. With the previous design, there would be no record of its existence, even though 20 past members were from IBM!







  1. BCNF. Boyce-Codd Normal Form

Boyce-Codd Normal Form states mathematically that:

A relation R is said to be in BCNF if whenever X -> A holds in R, and A is not in X, then X is a candidate key for R.

BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys.

Basically, a humorous way to remember BCNF is that all functional dependencies are:
"The key, the whole key, and nothing but the key, so help me Codd."

  1. Isolate Independent Multiple Relationships

This applies primarily to key-only associative tables, and appears as a ternary relationship, but has incorrectly merged 2 distinct, independent relationships.

The way this situation starts is by a business request list the one shown below. This could be any 2 M:M relationships from a single entity. For instance, a member could k

now many software tools, and a software tool may be used by many members. Also, a member could have recommended many books, and a book could be recommended by many members.










So, to resolve the two M:M relationships, we know that we should resolve them separately, and that would give us 4th normal form. But, if we were to combine them into a single table, it might look right (it is in 3rd normal form) at first. This is shown below, and violates 4th normal form.








To get a picture of what is wrong, look at some sample data, shown below. The first few records look right, where Bill knows ERWin and recommends the ERWin Bible for everyone to read. But something is wrong with Mary and Steve. Mary didn't recommend a book, and Steve Doesn't know any software tools. Our solution has forced us to do strange things like create dummy records in both Book and Software to allow the record in the association, since it is key only table.

The correct solution, to cause the model to be in 4th normal form, is to ensure that all M:M relationships are resolved independently if they are indeed independent, as shown below.


  1. Isolate Semantically Related Multiple Relationships

modify the original business diagram and add a link between the books and the software tools, indicating which books deal with which software tools, as shown below.


This makes sense after the discussion on Rule 4, and again we may be tempted to resolve the multiple M:M relationships into a single association, which would now violate 5th normal form. The ternary association looks identical to the one shown in the 4th normal form example, and is also going to have trouble displaying the information correctly. This time we would have even more trouble because we can't show the relationships between books and software unless we have a member to link to, or we have to add our favorite dummy member record to allow the record in the association table.


The solution, as before, is to ensure that all M:M relationships that are independent are resolved independently, resulting in the model shown below. Now information about members and books, members and software, and books and software are all stored independently, even though they are all very much semantically related. It is very tempting in many situations to combine the multiple M:M relationships because they are so similar. Within complex business discussions, the lines can become blurred and the correct solution not so obvious.


  1. Optimal Normal Form

At this point, we have done all we can with Entity-Relationship Diagrams (ERD). Most people will stop here because this is usually pretty good. However, another modeling style called Object Role Modeling (ORM) can display relationships that cannot be expressed in ERD. Therefore there are more normal forms beyond 5th. With Optimal Normal Form (OMF), it is defined as a model limited to only simple (elemental) facts, as expressed in ORM.

  1. Domain-Key Normal Form

This level of normalization is simply a model taken to the point where there are no opportunities for modification anomalies.

"if every constraint on the relation is a logical consequence of the definition of keys and domains"

Constraint "a rule governing static values of attributes"

Key "unique identifier of a tuple"

Domain "description of an attribute’s allowed values"

Trivial functional dependency

A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is

· functionally dependent on X, and

· not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency

A transitive dependency is an indirect functional dependency, one in which XZ only by virtue of XY and YZ.

Multi valued dependency

A multi valued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Super key

A super key is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct super keys. {Employee ID, Employee Address, Skill} would be a super key for the "Employees' Skills" table; {Employee ID, Skill} would also be a super key.

Candidate key

A candidate key is a minimal super key, that is, a super key for which we can say that no proper subset of it is also a super key. {Employee Id, Skill} would be a candidate key for the "Employees' Skills" table.

Non-prime attribute

A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key

Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.

Friday, April 3, 2009

DATA FLOW DIAGRAM

DATA FLOW DIAGRAM
Data Flow Diagram (DFD) is a means of making the model system allows the professional to describe the system process as a functional network that is connected to one with flow data, either manually or computerized. DFD is a tool of making a model that provides emphasis only on the function of the system. DFD is a tool that system design-oriented flow data with the concept of decomposition can be used to depiction and analysis of system design that is easily
communicated by professionals and users to the system the program.
Also called Data Flow Diagram (DFD) to describe the system modules in a smaller, easier and less understanding of the field to understand the computer system that will be done.

Diagram context
• include one process and expressed scope from a system
• it's the highest level from DFD which expressed all input into the system and output from system
• System is bounded by the boundary (it's shown by the dashed line)
• there isn't any storage

This diagram is the highest-level diagram of the DFD describes the relationship with the environment outside the system.
1. Specify the name of the system.
2. Determine the limits of the system.
3. Determine the terminator are present in the system.
4. Determine what is received / given terminator from / to the system.
5. Drawing context diagram.

Zero diagram
1. Describes process of DFD.
2. Providing views on the overall system in which, showing the main function or process that is, the flow of data and entity external
3. At this level of data storage possible
4. To process that does not detailed again on the next level then added to the symbol '*' or 'P' at the end of the process
5. Balance of input and output (balancing) between 0 to diagram context diagram should be maintained

Detailed Diagram
Detailed Diagram is diagrams that decompose a process that is in the diagram zero level or above
Numbering level in the DFD.
In one level there should be no more than 7 units and the maximum of 9, when more should be done in the decomposition

SPECIFICATION PROCESS
Each process in the DFD must have a specification process
At the top level method is used to describe the process can use a sentence with descriptive
At a more detailed level, namely on the bottom (functional primitive) requires specification of a more structured
Specification process will be the guideline for the programmer to make this program (coding)
Method used in the specification process: the process of disintegration in the form of a story, decision table, decision tree.

ON FOREIGN
1. Something that is outside the system, but it provides data in the system or to provide data from the system
2. Symbol with the notation
3. External entity not part of the system including
4. Naming:
• The name of the form of noun
• Terminal may not have the same name except the same object
Sure every system has a limit system (Boundary) that a separate system with the outside environment. System will receive input and produce output to the outside environment. Unity outside (external entity) is the unit (entity) in environment outside the system that can be a person, organization or other system that is located outside in environment that will provide input or receive output from the system.

DATA FLOW
• Is the place flow information
• Depicted with the straight line that connects the components of the system
• Data flow direction is indicated with arrows and lines give the name on the flow of data that flows
• Flow data flow between processes, data storage and data flow indicates that the form of data input for the system
Data flow in DFD is given an arrow symbol. This data flow between processes, store data, and external entity. This data flow represents the flow of data can be input to the system or the results of the processing system.
1. Packet of data
When two or more of a source to the same destination the same, it should be considered as a single stream of data, because two or more data flows together as a package. The data flow must be indicated as with the data flow, even though for example consist of several documents.
2. Diverging data flow
Data flow that spreads shows a number of copies of the same data flow from the same source to different purposes.
3. Converging data flow
Gather flow data showed that some of the data flow that is different from different sources to join with the same goal.
4. Cracked the concept and purpose of data flow
All data flow must be generated from a process (can be one or both, that is derived from a process or not come from a process but a process leading to or coming from a process and to a process). Concept is important because the flow of data is one of the results of a process or will be used to perform a process.

Processes
• The process is what is done by the system
• The process can process data flows or data entry into the flow of data out
• Transform the process of working one or more of the input data into one or more of the output data in accordance with the desired specifications
• Each process has one or more inputs and produce one or more output
• The process also often called bubble

Guidelines of the process:
1. Name of the process consists of a verb and noun, which reflects the function of the process
2. Do not use the process as part of the name of a bubble
3. May not have some process that has the same name
4. The process should be given a number. Order number wherever possible to follow the flow of the process or sequence, but the sequence number does not mean that the absolute is a process in chronological order

A process is an activity or work done by the people, machines, or the results of a computer data that flow into the process to produce data that will flow out of the process. for physical data flow diagrams (PDFD), the process can be done by people, machinery, computers and for logical data flow diagram (LDFD), a process only shows the process from the computer.
a. Identification data
This identification included of a number that shows the reference number of the process.
b. Process name
the name of the process shows what is done by the process.
c. Processing
PDFD to indicate that the process is not only the process of the computer but also a manual process, such as the process is done by people, machinery, and so forth. This shows the processor who or where the process is done. LDFD, here for as a processor is program that the process referred.

DFD error location is:
• Process does not have any input but the output, this error is called black holes, because the data into the process and the like disappeared into a black hole.
• The input process but did not receive the input, the error is called the miracle.

Data storage
a. Data storage is a storage place for data that exists in the system
b. Symbol with a pair of parallel lines or two lines with one side open side
c. The process can retrieve data from or provide data to the database
d. Guidelines of the name:
• The name should reflect the data storage
• When his name more than one word must be marked with the number


Data dictionary
Work to help the system to interpret the application in detail and organize all elements of the data used in the system precisely so that the system analyst and have a basic understanding of the same input, output, storage and process. At analysis, the data dictionary is used as a means of communication between the systems analyst with the user. At the system design, data dictionary is used to design input, reports and databases. Flow data on the global DAD, further details can be seen in the data dictionary.


Contain of Data dictionary
1. Name of data flow: must note that readers who need further explanation about a flow of data can find it easily.
2. Alias: alias or other name of the data can be written when there is.
3. Forms of data: used to segment the data dictionary to use when designing the system.
4. Flow data: indicates from which data flows and where the data.
5. Description: to give an explanation of the meaning of the data flow.
6. Period: at the occurrence of the data showed, that the DD can know when the input is input to the system, when the program is done, and when the report produced.
7. Volume: the volume should be noted is the volume average (the average number of flow data that flows in a certain period) and the peak volume (highest volume). Volume is how much to use external storage to be used.
8. Structure data: shows the flow of data recorded in the DD of the items whatever.

Balancing IN DFD
a. The flow of data into and out of a process must be the same as the flow of data into and out of the details of the process on the level / levels below.
b. Name of the data flow into and out of the process must match the name of the flow of data into and out of the details of the process.
c. Number and the name of an entity outside the process must be equal to the number of names and entities outside of the details of the process.
d. The issues that must be considered in the DFD which have more than one level:
• There must be a balance between input and output of one level and next level.
• Balance between level 0 and level 1 at the input / output of stream data to or from the terminal on level 0, while the balance between level 1 and level 2 is seen on the input / output of stream data to / from the process concerned.
• Name of the flow of data, data storage and terminals at each level must be the same if the same object.

Restrictions in DFD
a) Flow data may not be from outside the entity directly to other outside entities without going through a process.
b) Flow data may not be from the savings directly to the data to outside entities without going through a process.
c) Flow data may not be saving the data directly from the savings and other data without going through a process.
d) Flow data from one process directly to the other without going through the process of saving data should / be avoided as much as possible