BUSINESS INFORMATION SYSTEMS
International Business Administration
2021-2022
Source:Valacich, S. J. & Schneider, C. (2018). Information Systems Today: Managing in
the Digital World, 8/e, Global Edition. Harlow, UK: Pearson Education Ltd. ISBN 978-1-
292-21597-6.
,Business Information Systems
Lecture 0
Data management
Why is data management important?
There are three key problems caused by ‘bad’ data management practices:
1. Data redundancy: presence of duplicate data in multiple places
2. Data inconsistency: same attribute has different values
3. Dataset inflexibility: how can you add some data later?
Relational database: data is not stored in 1 big table, but in separate tables that are linked
to each other with keys.
Relational databases are based on a data model, which defines the characteristics of data
within entities and maps the relationship between entities.
Basic elements of relational databases:
Data model (or database): a collection of related entities
Entity: something you collect data about (students, classes, employees, customers, projects,
etc.) Entities are also called tables, where each row is a record, and each column is an
attribute.
Attribute (or field/property): a characteristic of an entity (e.g. Student has attributes
studentID, first_name, last_name, date_of_birth, etc.)
Record: collection of data on attributes about a single instance of an entity (‘1234567’,
‘Adam’, ‘Johnson’, 01-01-2001’, etc.)
1
,Normalization
Normalization is a technique to manage the trade-off between data redundancy and ease of
use / processing speed in databases. It is a standardized technique that transforms
databases iteratively accordingly to an increasingly strict set of rules.
A database satisfies a normal form if every table in that database satisfied the normal form.
1NF (1st normal form) – requirements:
1. The domain of each attribute contains only atomic values
2. The value of each attribute contains a single value from that domain
Domain: the space of possible values that a specific attribute derives it values from (i.e. what
can be entered as a value into this field).
Atomic value: a value that cannot be split into multiple logically distinct parts.
1NF (1st normal form) – illustration
1. The domain of each attribute contains only atomic values: An address (e.g. De
Boelelaan 1105, 1081 HV Amsterdam) should be split into distinct parts: street
names, street number, postal code, city.
2. The value of each attribute contains a single value from that domain: on a shopping
list the attribute item can only be one thing, for example “bread”. An item cannot
van multiple values, such as “bread, milk”.
2NF (2nd normal form) – requirements
Second normal form addresses redundancy. A table satisfies 2NF if…
1. … it satisfies 1NF
2. … in case there is a composite key, there are no partial dependencies
Composite key: a record is uniquely identified by a combination of multiple attributes.
Partial dependency: when a non-key attribute depends on a subset of the composite key.
2NF – illustration
Imagine a table of student course grades. Composite key would be a combination of the
attributes ‘student’, ‘course’ and ‘year’ (assuming for simplicity you can only get a course
grade once a year).
2
, If we include attributes that only relate to student in this table, for example ‘birth year’ or
‘nationality’, these attributes would have partial dependency on the key of the table.
a) The easiest fix to get a table in 2NF is to add a unique primary key in the form of an
ID (gradeID would be a good solution based on the previous example).
b) A different solution would be to split the table up into separate tables, where each
table contains information that exclusively relates to a part of the composite key (in
the previous example move student-specific info to a Student table).
3NF (3rd normal form) - requirements
A table satisfies 3NF if…
1. … it satisfies 2NF (and thus automatically 1NF)
2. … there are no transitive dependencies
Transitive dependency: attributes are functionally dependent on other non-key attributes.
3NF - Illustration
In the example of student grades, if we use solution a and add a unique gradeID to the table
to satisfy 2NF, we would now create the next problem: attributes ‘birth year’ and
‘nationality’ are dependent on the (now) non-key attribute ‘student’.
a) The main solution is to split the table up into separate tables, where each table
contains information that exclusively relates to a part of the composite key (in the
previous example move student-specific info to a Student table).
b) Because the student name might not be unique, it makes sense to add a studentID to
this table too.
Normalization – summary:
Through normalization steps, we split the flat file table into multiple related tables. No
redundancy, no inconsistency (but still room for human error!). Lower storage requirements.
Increased complexity of required SQL coding.
Entity Relationship Diagrams (ERD)
Entity-relationship diagrams describe the relationships between things of interest (entities).
An ERD is based on four elemental concepts:
• Entities
• Attributes
• Relationships
• Cardinalities
Entities represent things that are of interest (i.e., about which you want to store some data):
player, student, course, employee, assessment etc.
Attributes describe an entity (i.e., they represent properties of an entity). Player -> name,
country, date of birth.
3