Data Management Literature I: Database systems
Database systems: design, implementation, and management (8th edition), P. Rob and C. Coronel
1.1 Data vs. Information
Data are raw facts which have not yet been processed to reveal their meaning. Information is the result of processing raw data to reveal its meaning. Raw data must be properly formatted for storage, processing and presentation. Also, to reveal meaning, information requires context.
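As a small illustration of the distinction, the sketch below turns a list of raw facts into information by processing it and attaching context. The survey answers, the question text and the helper name `summarize` are all hypothetical and serve only as an example.

```python
from collections import Counter

# Raw data: hypothetical survey answers. On their own they reveal little.
raw_responses = ["yes", "no", "yes", "yes", "no", "yes"]


def summarize(responses):
    """Process raw answers into information: the percentage share of each answer."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


# Context (the question that was asked) is what gives the numbers meaning.
question = "Would you recommend the product?"
information = summarize(raw_responses)
print(question, information)
```

The raw list and the summary hold the same facts; only the processed, contextualized form can support a decision.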
Information can be used as the foundation for decision making. Good decision making is the key to organizational survival in a global environment. We are entering a knowledge age. Information is the bedrock of knowledge. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment.
Data management is a discipline that focuses on the generation, storage and retrieval of data.
1.6 Database systems
A file system contains separate and unrelated files. A database system consists of logically related data stored in a single logical data repository. Because of this single repository, the database represents a big change in the way end-user data is stored, accessed and managed.
Advantages of database management systems (DBMS) over file systems:
• Possibility to eliminate most of the file system’s data inconsistency, data anomaly, data dependency and structural dependency problems
• Current DBMS software stores the data structures, the relationships between those structures and the access paths to those structures, all in a central location
• Current DBMS software takes care of defining, storing, and managing all required access paths to those components
Note that the DBMS is one of several crucial components of a database system.
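The data inconsistency problem mentioned above can be sketched in a few lines. The example assumes two hypothetical "files" (plain dictionaries) that each keep their own copy of a customer record, and contrasts them with a single repository built with Python's standard sqlite3 module. All table, column and customer names are invented for the illustration.

```python
import sqlite3

# File-system style: two unrelated "files" each hold their own copy of the address.
sales_file = {"C001": {"name": "Ann", "city": "Utrecht"}}
billing_file = {"C001": {"name": "Ann", "city": "Utrecht"}}

# The customer moves, but only one file gets updated: the copies now disagree.
sales_file["C001"]["city"] = "Leiden"
files_agree = sales_file["C001"]["city"] == billing_file["C001"]["city"]
print("files agree:", files_agree)

# Database style: the customer is stored once and referenced by its key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "CREATE TABLE invoice ("
    " invoice_no INTEGER PRIMARY KEY,"
    " customer_id TEXT REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer VALUES ('C001', 'Ann', 'Utrecht')")
conn.execute("INSERT INTO invoice VALUES (1, 'C001')")

# One update in one place; every query that joins to the customer sees it.
conn.execute("UPDATE customer SET city = 'Leiden' WHERE id = 'C001'")
city_on_invoice = conn.execute(
    "SELECT c.city FROM invoice i JOIN customer c ON c.id = i.customer_id"
    " WHERE i.invoice_no = 1"
).fetchone()[0]
print("city seen via invoice:", city_on_invoice)
```

Because the address lives in exactly one row, there is no second copy that can fall out of date.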
1.6.1 The database system environment
A database system is an organization of components that define and regulate the collection, storage, management and use of data within a database. The environment consists of five components:
• Hardware: the system’s physical devices
• Software: three types of software are needed
  o Operating system software: manages all hardware components and enables all other software to run on the computer (for example, Windows)
  o DBMS software: manages data within the database system (for example, Microsoft SQL Server or Oracle)
  o Application programs and utility software: used to access and manipulate data in the DBMS and to manage the computer environment in which access and manipulation take place (for example, a graphical user interface)
• People: all users of the database system. Five types can be identified:
  o System administrators: oversee the database system’s general operations
  o Database administrators: manage the DBMS and ensure it works properly
  o Database designers: design the database structure. They are, in effect, the database architects. If the database design is poor, the whole environment can be useless. They determine what data are to be entered into the database.
  o System analysts and programmers: design and implement the application programs. They create entry screens, reports and procedures.
  o End users: use application programs to run the organization’s daily operations
• Procedures: instructions and rules that govern the design and use of the database system. They also provide a way to monitor and audit the data and information.
• Data: the collection of facts stored in the database.
Database systems can be created and managed at different levels of complexity and with varying adherence to precise standards. Managers must take into account that database solutions must be cost-effective and technically and strategically effective.
1.6.2 DBMS functions
A DBMS performs important functions to ensure the integrity and consistency of data.
• Data dictionary management: definitions of the data elements and their relationships (metadata) are stored in a data dictionary. The DBMS uses it to look up the required data component structures and relationships, so you don’t have to code this. Also, any change made in the database structure is automatically recorded in the dictionary. This provides data abstraction and removes structural and data dependency from the system.
• Data storage management: creates and manages the complex structures required for data storage, so you don’t have to define and program the physical data characteristics. It is also important for performance tuning, which relates to the activities that make the database perform more efficiently in terms of storage and access speed.
• Data transformation and presentation: transforms entered data to conform to the required structures.
• Security management: a security system that enforces user security and data privacy. It determines who can access what data items and who can perform which functions.
• Multiuser access control: sophisticated algorithms ensure that multiple users can access the database concurrently without compromising the integrity of the database.
• Backup and recovery management: ensures data safety and integrity. Current systems facilitate routine and special backup and restore procedures.
• Data integrity management: promotes and enforces integrity rules to minimize data redundancy and maximize data consistency.
• Database access languages and application program interfaces: a query language (nonprocedural language) provides data access. Structured Query Language (SQL) is the standard query language and data access standard for DBMSs.
• Database communication interfaces: DBMSs accept end-user requests via multiple network environments. For instance, users can access the database via various browsers, and the DBMS can automatically publish reports or connect to third-party systems to distribute information.
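Two of the functions listed above, data dictionary management and backup, can be observed directly in a small DBMS. The following is a minimal sketch using SQLite through Python's standard sqlite3 module; SQLite is chosen here only as a convenient stand-in, and the `employee` table and its columns are hypothetical. Other DBMSs expose their dictionary through different catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL)"
)

# Data dictionary lookup: SQLite records every table definition in sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

# A structural change is recorded in the dictionary automatically.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
columns_after = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

# Backup management: copy the whole database into a second connection.
conn.execute("INSERT INTO employee (name, salary) VALUES ('Ann', 3000)")
conn.commit()
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)  # Connection.backup is available from Python 3.7
rows_in_backup = backup_conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

print(tables, columns, columns_after, rows_in_backup)
```

Note that the application never wrote any code to track the table structure; it only asked the DBMS for it.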
1.6.3 Managing the database system: a shift in focus
The role of the human components shifts from programming to broader aspects of managing the organization’s data resources and the administration of the complex database software. The database system facilitates a more sophisticated use of data resources as long as the database is designed to make use of that available power.
Database systems also have disadvantages:
• Increased costs: hardware, software and human skills requirements.
• Management complexity: database systems interface with many different technologies and have a significant impact on a company’s resources and culture. Security issues must be assessed constantly.
• Maintaining currency: the system must be kept current to maximize efficiency. Also, personnel training costs are high because technology develops rapidly.
• Vendor dependence: given the heavy investment in technology and personnel training, companies might be reluctant to change database vendors. As a result, they are unlikely to obtain price advantages and are limited in their choice of components.
• Frequent upgrade/replacement cycle: vendors frequently update their products. Upgrades cost money, and so does training the users and administrators.
2.1 Data modeling and data models
Data modeling is the first step in designing a database. It is the process of creating a specific data model for a given problem domain. A data model is a relatively simple representation (often graphical) of more complex real-world data structures. Within the database environment, the data model represents data structures and their characteristics, relations, constraints, transformations, and other constructs to support a specific problem domain.
An implementation-ready data model should contain at least the following components:
• A description of the data structure that will store the end-user data
• A set of enforceable rules to guarantee the integrity of the data
• A data manipulation methodology to support the real-world data transformations
The
importance
of
data
models
Data
models
can
facilitate
interaction
among
the
designer,
the
applications
programmer
and
the
end
user.
Data
models
are
a
communication
tool.
Every
user
(programmers,
managers,
employees,
etc.)
view
data
differently.
A
sound
data
environment
requires
an
overall
database
blueprint
based
on
an
appropriate
data
model.
When
a
good
blueprint
is
available,
it
doesn’t
matter
that
the
users’
view
of
data
is
different.
Keep in mind that a house blueprint and a data model are both abstractions. You cannot live in the blueprint, and you cannot draw the required data out of the data model.
2.3 Data model basic building blocks
There are four main building blocks of all data models:
• Entities: anything about which data are to be collected and stored. An entity represents a particular type of object, and each entity occurrence is unique and distinguishable.
• Attributes: a characteristic of an entity.
• Relationship: an association among entities.
  o One-to-many (1:M or 1..*): “A painter paints many different paintings, but each one of them is painted by only one painter”
  o Many-to-many (M:N or *..*): “An employee may learn many job skills and each job skill may be learned by many employees”
  o One-to-one (1:1 or 1..1): “A retail company’s management structure may require that each of its stores be managed by a single employee. In turn, each store manager manages only a single store”
  Each relationship is bidirectional: it can be read in both directions.
• Constraint: a restriction placed on the data. Constraints help ensure data integrity.
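The three relationship types above, using the same painter, employee and store examples, might be expressed in a relational schema roughly as follows. This is a sketch under assumptions: the table and column names are invented, the sample rows are illustrative, and SQLite through Python's sqlite3 module is used only as a convenient DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    -- 1:M  a painter paints many paintings; each painting has exactly one painter
    CREATE TABLE painter  (painter_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE painting (
        painting_id INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        painter_id  INTEGER NOT NULL REFERENCES painter(painter_id)
    );

    -- M:N  employees and job skills, resolved through a bridge table
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE skill    (skill_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee_skill (
        emp_id   INTEGER REFERENCES employee(emp_id),
        skill_id INTEGER REFERENCES skill(skill_id),
        PRIMARY KEY (emp_id, skill_id)
    );

    -- 1:1  each store has one manager; UNIQUE keeps a manager to a single store
    CREATE TABLE store (
        store_id   INTEGER PRIMARY KEY,
        manager_id INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id)
    );
""")

# One painter (entity occurrence) related to many paintings.
conn.execute("INSERT INTO painter VALUES (1, 'Vermeer')")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(1, "The Milkmaid", 1), (2, "View of Delft", 1)],
)
paintings_by_painter = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE painter_id = 1").fetchone()[0]

# The 1:1 constraint rejects a second store for the same manager.
conn.execute("INSERT INTO employee VALUES (1, 'Ann')")
conn.execute("INSERT INTO store VALUES (10, 1)")
try:
    conn.execute("INSERT INTO store VALUES (11, 1)")
    second_store_allowed = True
except sqlite3.IntegrityError:
    second_store_allowed = False

print(paintings_by_painter, second_store_allowed)
```

Here entities become tables, attributes become columns, relationships become foreign keys, and constraints become declarative rules such as NOT NULL and UNIQUE.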
3.1 A logical view of data
Relational databases allow data to be grouped into tables and relationships to be set between those tables. A relational model enables you to view data logically rather than physically. It allows the designer to focus on the logical representation of data and its relationships, rather than on the physical storage details.
3.1.1 Tables and their characteristics
The logical view of the relational database is facilitated by the creation of data relationships based on a logical construct known as a relation. A table contains a group of related entity occurrences.
Characteristics of a relational table:
• A table is perceived as a two-dimensional structure composed of rows and columns
• Each table row (tuple) represents a single entity occurrence within the entity set
• Each table column represents an attribute, and each column has a distinct name
• Each row/column intersection represents a single data value
• All values in a column must conform to the same data format
  o Numeric: data on which you can perform meaningful arithmetic procedures
  o Character: text data or string data
  o Date: calendar dates in a special format
  o Logical: a true or false condition
• Each column has a specific range of values known as the attribute domain
• The order of the rows and columns is immaterial to the DBMS
• Each table must have an attribute or a combination of attributes that uniquely identifies each row
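Several of these characteristics can be seen in a small table definition. The sketch below assumes a hypothetical `student` table and uses SQLite via Python's sqlite3 module. SQLite has no dedicated date or logical type, so the example stores dates as ISO text and true/false as 0 or 1; other DBMSs offer native types for both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        stu_num    INTEGER PRIMARY KEY,                      -- uniquely identifies each row
        stu_name   TEXT    NOT NULL,                         -- character data
        stu_dob    TEXT    NOT NULL,                         -- date, kept as YYYY-MM-DD text
        stu_gpa    REAL    CHECK (stu_gpa BETWEEN 0 AND 4),  -- numeric, domain 0 to 4
        stu_active INTEGER CHECK (stu_active IN (0, 1))      -- logical true/false
    )
""")

# Each tuple is one entity occurrence; the insertion order carries no meaning.
rows = [(2, "Smith", "1990-05-01", 3.2, 1), (1, "Bowser", "1989-11-12", 2.8, 0)]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", rows)

# Row order is immaterial: request an explicit order whenever one is needed.
names = [r[0] for r in conn.execute(
    "SELECT stu_name FROM student ORDER BY stu_num")]

# Column order is immaterial: columns are addressed by name, not by position.
gpa = conn.execute("SELECT stu_gpa FROM student WHERE stu_num = 2").fetchone()[0]

# The identifying attribute must be unique, so a duplicate key is rejected.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Other', '1991-01-01', 3.0, 1)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(names, gpa, duplicate_allowed)
```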