Conception and Modeling of Relational DatabasesArticle contents:
The Entity Relationship Model
The creation of the structure for a relational database is an iterative process in which a technical data model is created on the basis of a business data model. The physical data model determines exactly how the data are structured in a DBMS. The procedure is roughly comparable with the design of classes in an object-oriented system, except the goal is not the implementation of a system but the implementation of a database structure.
As with system design, a graphical modeling language can also be used for the design and documentation of databases. Usually, an Entity Relationship Diagram (or ER Diagram) is used to display the all-important elements of a database structure.
Unlike the universal UML class diagram that can be used to model data structures and systems, the ER diagram is only used to model data schemas. In contrast to UML as the standard modeling language for IT systems, there are several variants of the ER diagram with differing notation elements.
Elements of an ER Model
The elements in an ER Model are entities, attributes, keys, and relationships. In practice, artificial keys (surrogate keys) are often used as primary keys. An artificial key is a purely technical attribute that carries no other information and has no equivalent in the world outside the DBMS. It is often a whole number and is automatically filled and managed by the DBMS.
Entities can be related to each other. A relationship is two-digit/two-figures if two entities are related. But relationships can also be multi-digit, i.e., they can relate more than two entities. A relationship itself can also have its own attributes.
Graphical Notation of Entities and Attributes in ER Models
The most important elements in graphical ER diagrams are presented in the following three notation forms: Chen notation, Martin notation (or crow’s foot notation), and as UML class diagrams.
Picture shows entities and attributes in Chen notation. In this notation, entities are represented by a rectangle and attributes by an oval that are connected to the corresponding entity. Primary keys are identified by underlining the attribute name.
Martin notation (crow’s foot notation)
Figure 13 shows an entity with attributes in Martin notation. The name of the entity is noted over a rectangle. The primary key is noted in the upper part of the rectangle (the key-icon is optional), all other attributes are in the lower part.
UML class diagram
The UML class diagram is used to model the entity as a class with attributes, as shown in picture. Primary keys are noted as property values of the attribute in curly brackets after the name.
In practice, the UML and the Martin notations are more compact because the modeling of the individual attributes does not take up much space and the affiliation of attributes to relations can be determined at a glance.
Normal Forms of Databases
In order to avoid possible errors when working with stored data, the structure of tables in databases is evaluated and improved on the basis of objective criteria, in particular on whether the structure conforms to identified normal forms. Normalization optimizes the structure of a database and the normal form of a database is a measure of this structure. It provides information on how redundant data are stored in a database. Data contain redundancy if parts can be omitted without any loss of information. If data are stored redundantly in a database, the risk of inconsistencies is high. When creating the structure of a database table, you should therefore attempt to keep redundancy as minimal as possible. The higher the normal form, the less redundant the data are. To determine the normal form, consider the structure of attributes of a relation and their interdependencies. Figure shows the dependencies of the normal forms explained in more detail. In addition, the Boyce Codd normal form and the fourth normal form take further steps above the third normal form. For most practical projects the third normal form is sufficient. Often the third normal form won’t be achieved because of reasons of efficiency.
Depending on requirements for data stock, a data schema will be transferred into the normal form that is needed by normalization. Using the example given in Table, the first, second, and third normal forms of tables are explained below. The example in Table shows which vendor supplied which customer with which article. There are three columns in the table that contain information about the vendor, customer, and article supplied.
|Table 24: Table for Article Delivery Example|
|MediaService, Hattstadt||Sophie Weber, 182932||Edge Of Tomorrow; EAN 838883|
|MovieGroup, Hamburg||Sophie Weber, 182932||Cry Baby; EAN 1222233|
|MovieGroup, Hamburg||Ralf Fischer, 488389||In the Morning; ISBU XX-NHN-223|
|FilmService, Berlin||Maria Bolz, 299376||Edge Of Tomorrow; EAN 838883|
First Normal Form
The following conditions must be fulfilled for a table to be in the first normal form:
- All attributes must contain single values. Composite values are not allowed. Exactly one value must be stored for each attribute.AND
- Each individual data record can be uniquely identified using a primary key.
In Table 24, these criteria are not met because all attributes are composite values. The normalization step for the first normal form splits all occurring compound values, value enumerations, and nested values, and creates new columns for the sub-elements. Key attributes are identified by which data records in the table can be uniquely distinguished from each other. Furthermore, no primary key has yet been defined. Table 25 shows our example table placed into the first normal form. The composite attributes supplier, customer, and article have been broken down into their individual components and defined as separate attributes. In addition, a composite primary key has been determined, namely the attributes “CustomerNumber” and “ArticleID.” In Table 25 and all subsequent tables, attributes used as primary keys are set in boldface.
|Table 25: First Normal Form with Compound Primary Key|
|MediaService||Hattstadt||Weber||Sophie||182932||Edge of Tomorrow||EAN 838883|
|Movie Group||Hamburg||Weber||Sophie||182932||Cry Baby||EAN 1222233|
|Movie Group||Hamburg||Fischer||Ralf||488389||In the Morning||ISBU XX-NHN-223|
|FilmService||Berlin||Bolz||Maria||299376||Edge of Tomorrow||EAN 838883|
Table 26 shows a possible alternative where an artificial key is selected as the primary key. The column ArticleDeliveryID has been added to the table for this purpose.
|Table 26: First Normal Form with Transformed Table Including Simple Primary Key. Part 1|
With the first normal form, it is possible to sort according to certain attributes and to clearly reference and change data records.
Second Normal Form
The second normal form is only relevant for tables with compound primary keys. In order to achieve the second normal form, all criteria for the first normal form must be met and, each non-key attribute is always dependent on the entire compound primary key and not on only a part of the key.
If the primary key consists of at least two attributes, all other attributes of the relation must always be dependent on the entire key. It is not sufficient for an attribute to be dependent on only part of a key.
Table 25, which is in the first normal form, shows the dependencies illustrated in Figure 16:
- The attributes CustomerLastName and CustomerFirstName are dependent on the customer number, since the attributes CustomerLastName and CustomerFirstName follow from the attribute CustomerNumber. However, the customer number is only part of the primary key.
- The ArticleName attribute is dependent on the ArticleID, since the ArticleID attribute can be used to infer the value of ArticleName. However, ArticleID is only part of the primary key.
Tables 27, 28, and 29 show the results of Table 25 transformed into the second normal form. The dependencies on parts of the primary key have been resolved by creating two new tables: Customer and Item.
|Table 27: ArticleDelivery, Second Normal Form|
|Table 28: Customer, Second Normal Form|
|Table 29: ArticleDelivery, Second Normal Form|
|Edge of Tomorrow||EAN 838883|
|Cry Baby||EAN 1222233|
|In the Morning||ISBU XX-NHN-233|
There is no second normal form for Table 26 shown above because a simple primary key was selected for this table.
By normalizing to the second normal form, each relation or table represents exactly one issue. Redundancy is eliminated by placing previously repeated information, such as article or customer, in separate tables and referencing these tables in the other tables. In the example, CustomerLastName and ArticleName are now stored exactly once as opposed to several times, as
was the case in the first normal form.
Third Normal Form
For a table to be in the third normal form, first the criteria of the first normal form must be fulfilled and in the case of tables with a composite primary key, the criteria of the second normal form. In addition, the following must apply:
Any non-key attribute depends directly on the key attribute. There are no functional dependencies between attributes that are not part of the primary key.
With the second normal form, the remaining non-key attributes of a table are thematically linked to each other. However, it must be examined to see whether there are dependencies between the attributes that should be stored in a separate table.
Figure 17 shows the dependencies between the SupplierName and SupplierCity attributes from Table 27. These two attributes have dependencies that are not related to any part of the primary key. Therefore, these dependencies must first be resolved in order to reach the third normal form.
The third normal form of the original Table 24 is shown in Figure 18. A total of four tables have been created.
The third normal form of the original Table 24 with a newly introduced simple primary key is shown in Figure 19. Four tables were also created for this third normal form.
In the third normal form, all the transitive dependencies of the data are immediately visible, since they are directly represented by the structure of the tables.