This document covers all reading material and lectures discussed in the Business Intelligence & Business Analytics course in the spring of 2023.
Note that the last Python exercises are not included in this summary!
Full list of material:
- Database systems: design, impleme...
Business Intelligence & Business Analytics
Spring 2023 – Anne van den Eijnden
Intro to Business Intelligence
The Database
Data Modelling and Data Models
Data Warehousing
Data lake
OLAP Business Databases & Dashboards
Extraction, Transformation, and Load (ETL)
OLAP Business Databases
Performance Dashboards
Business Analytics
Overview of the Data Mining Process
CRISP-DM
Data Visualization
Regression Models
Performance Measures
Prediction accuracy measures
Judging Classifier Performance
Naïve Bayes
K-Nearest Neighbors
Decision Trees
Association Rules
Collaborative Filtering
Clustering
Distance Measures
SQL
Python
Intro to Business Intelligence
Business Intelligence is an umbrella term that combines the processes, technologies, and tools
needed to transform data into information, information into knowledge, and knowledge into
plans that drive profitable business action.
Data are raw facts that have not yet been processed to reveal their meaning
- You transform raw data into a data summary to provide more insight
- Raw data must be properly formatted for storage, processing, and presentation
- Structured data: the result of taking unstructured data and formatting it to facilitate storage,
use and the generation of information
Information is the result of processing raw data to reveal its meaning
- Information requires context
- Information can be used as the foundation for decision making
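As a minimal sketch of the step from data to information, the Python snippet below summarizes a handful of raw sales records into revenue per region. The records, the field names, and the summarize_revenue helper are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical raw data: individual sales facts without any summary or context
raw_sales = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 50.0},
    {"region": "South", "amount": 30.0},
]


def summarize_revenue(records):
    """Aggregate raw sales records into total revenue per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


# The summary adds context (revenue per region) that the raw rows lack
revenue_per_region = summarize_revenue(raw_sales)
print(revenue_per_region)
```

The summary per region is the kind of result a decision maker can act on, whereas the individual rows are not.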
Knowledge is the body of information and facts about a specific subject
- Implies familiarity, awareness, and understanding of information as it applies to an
environment
Data management is a discipline that focuses on the proper generation, storage and retrieval of
data
The Database
A database is a shared, integrated computer structure that stores a collection of:
- End-user data: raw facts of interest to the end user
- Metadata: through which the end-user data are integrated and managed
A database management system (DBMS) is a collection of programs that manages the database
structure and controls access to the data stored in the database
- Serves as intermediary between the user and the database
- Presents the end user with a single, integrated view of the data in the database
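The difference between end-user data and metadata can be sketched with Python's built-in sqlite3 module, with SQLite standing in for the DBMS. The customer table and its contents are hypothetical; sqlite_master is the catalog table in which SQLite keeps its table definitions.

```python
import sqlite3

# An in-memory SQLite database; SQLite plays the role of the DBMS here
conn = sqlite3.connect(":memory:")

# Hypothetical table holding end-user data
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.executemany("INSERT INTO customer (name) VALUES (?)", [("Alice",), ("Bob",)])

# End-user data: the raw facts of interest to the end user
end_user_data = conn.execute(
    "SELECT customer_id, name FROM customer ORDER BY customer_id"
).fetchall()

# Metadata: the table definitions that the DBMS stores about the data
metadata = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(end_user_data)
print(metadata)
conn.close()
```

The program never touches the database file structure directly: every request goes through the DBMS, which is the intermediary role described above.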
Advantages of DBMS
- Improved data sharing
- Improved data security
- Better data integration
- Minimized data inconsistency
- Improved data access
- Improved decision making
- Increased end-user productivity
Types of Databases
A single-user DB supports only one user at a time
- A desktop DB is a single-user DB that runs on a personal computer
A multi-user DB supports multiple users at the same time
- A workgroup DB supports a relatively small number of users (<50)
- An enterprise DB supports many users (>50)
A centralized DB supports data located at a single site
A distributed DB supports data distributed across several different sites
An operational DB is designed primarily to support a company’s day-to-day operations
A data warehouse focuses primarily on storing data used to generate information required to
make tactical or strategic decisions
Database Systems
A database system refers to an organization of components that define and regulate the collection,
storage, management, and use of data within a database environment
Hardware refers to all of the system’s physical devices
Software:
- Operating system software manages all hardware components and makes it possible for all
other software to run on the computer
- DBMS software manages the database within the database system
- Application programs and utility software are used to access and manipulate data in the
DBMS and to manage the computer environment in which data access and manipulation
take place
Users:
- System administrators oversee the database system’s general operations
- Database administrators manage the DBMS and ensure that the database is functioning
properly
- Database designers design the database structure
- System analysts and programmers design and implement the application programs
- End users are the people who use the application programs to run the organization
Procedures are the instructions and rules that govern the design and use of the database system
Data covers the collection of facts stored in the database
DBMS Functions
- Data dictionary management: stores definitions of the data elements and their relationships
(metadata) in a data dictionary
- Data storage management: creates and manages the complex structures required for data
storage
o Performance tuning: activities that make the database perform more efficiently in terms
of storage and access speed
- Data transformation and presentation: transforms entered data to conform to required data
structures and transforms data to make it conform to the user’s logical expectations
- Security management: creates a security system that enforces user security and data privacy
- Multiuser access control: uses sophisticated algorithms to ensure that multiple users can
access the database concurrently without compromising the integrity of the database
- Backup and recovery management: provides backup and data recovery to ensure data safety
and integrity
- Database access languages and application programming interfaces: data access through a
query language (SQL)
- Database communication interfaces:
o End users can generate answers to queries by filling in screen forms through their
preferred web browser
o Automatically publish predefined reports on a website
o Connect third-party systems to distribute information via e-mail or other applications
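Three of the functions listed above (data dictionary management, backup and recovery management, and access through a query language) can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions rather than a description of how a production DBMS implements them: the product table is hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, price REAL NOT NULL)"
)
source.execute("INSERT INTO product (price) VALUES (9.99)")
source.commit()

# Data dictionary management: ask the DBMS for the stored column definitions.
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in source.execute("PRAGMA table_info(product)")]

# Backup and recovery management: copy the whole database to a second connection
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Database access language: query the copy with SQL
restored = backup.execute("SELECT product_id, price FROM product").fetchall()

print(columns)
print(restored)
```

Connection.backup is available from Python 3.7 onwards; the copy can be queried independently of the source database.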
Data Modelling and Data Models
Data modelling, the first step in designing a database, refers to the process of creating a specific
data model for a determined problem domain
A data model is a relatively simple representation, usually graphical, of more complex real-world
data structures
- A communication tool that fosters improved understanding of the organization
- Done properly, the final data model is in effect a ‘blueprint’ containing all the instructions to
build a database that will meet all end-user requirements
An implementation-ready data model should contain at least the following components:
- A description of the data structure that will store the end-user data
- A set of enforceable rules to guarantee the integrity of the data
- A data manipulation methodology to support the real-world data transformation
Basic Building Blocks
An entity is anything about which data are to be collected and stored
- Each entity occurrence is unique and distinct
An attribute is a characteristic of an entity
A relationship describes an association among entities (bidirectional)
- One-to-Many (1..*)
- Many-to-Many (*..*)
- One-to-One (1..1)
A constraint is a restriction placed on the data to ensure data integrity
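The building blocks above can be sketched in Python with the built-in sqlite3 module. The entities customer and invoice, their attributes, and the try_insert helper are hypothetical. The foreign key implements a one-to-many relationship (one customer, many invoices), and the CHECK clause is an example of a constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Entity CUSTOMER with the attributes customer_id and name
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# Entity INVOICE: customer_id implements the one-to-many relationship,
# and the CHECK clause is a constraint on the attribute total
conn.execute("""
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        total       REAL NOT NULL CHECK (total >= 0)
    )
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO invoice (customer_id, total) VALUES (1, 25.0)")
conn.execute("INSERT INTO invoice (customer_id, total) VALUES (1, 40.0)")


def try_insert(sql):
    """Return True if the DBMS accepts the statement, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Violates the CHECK constraint (negative total)
negative_total_ok = try_insert(
    "INSERT INTO invoice (customer_id, total) VALUES (1, -5.0)"
)
# Violates the relationship (customer 99 does not exist)
unknown_customer_ok = try_insert(
    "INSERT INTO invoice (customer_id, total) VALUES (99, 10.0)"
)
invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
print(negative_total_ok, unknown_customer_ok, invoice_count)
```

Both invalid rows are rejected by the DBMS itself, so the integrity rules do not depend on every application remembering to check them.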
Relational Table
- A table is perceived as a two-dimensional structure composed of rows and columns
- Each row (tuple) represents a single entity occurrence within the entity set
- Each column represents an attribute, and each column has a distinct name
- Each row/column intersection represents a single data value
- All values in a column must conform to the same data format
- Each column has a specific range of values known as the attribute domain
- The order of rows and columns is immaterial to the DBMS
- Each table must have a key: an attribute or a combination of attributes that uniquely
identifies each row
- Tables within the database share common attributes that enable the tables to be linked
together
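How a shared attribute links two tables can be sketched with a join, again using Python's built-in sqlite3 module. The department and employee tables and their contents are hypothetical; dept_id is the common attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL);
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        dept_id  INTEGER NOT NULL
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Finance');
    INSERT INTO employee VALUES (1, 'Alice', 10), (2, 'Bob', 20), (3, 'Carol', 10);
""")

# Link the two tables through the attribute they share (dept_id)
rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

for emp_name, dept_name in rows:
    print(emp_name, dept_name)
```

The ORDER BY clause is needed because the order of rows is immaterial to the DBMS: without it, no particular row order is guaranteed.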
Keys
The key’s role is based on a concept known as determination
- If you know the value of attribute A, you can look up the value of attribute B (A → B)
- Attribute B is functionally dependent on attribute A if each value in column A determines
one and only one value in column B
Composite key: a key that is composed of more than one attribute
- Any attribute that is part of a key is known as a key attribute
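Determination can be checked on sample data with a few lines of Python. The determines function and the enrollment rows below are hypothetical. Note that the check only shows whether a dependency holds in the given rows; it does not prove that the dependency is a rule of the problem domain.

```python
def determines(rows, determinant, dependent):
    """Return True if the determinant attributes determine the dependent attribute.

    rows is a list of dicts; determinant is a tuple of attribute names
    (one name for a single-attribute key, several for a composite key).
    """
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = row[dependent]
        # A second, different value for the same key breaks the dependency
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical enrollment data
enrollments = [
    {"student_id": 1, "course": "BI", "grade": 8},
    {"student_id": 1, "course": "SQL", "grade": 7},
    {"student_id": 2, "course": "BI", "grade": 6},
]

# student_id alone does not determine grade: student 1 has two different grades
single = determines(enrollments, ("student_id",), "grade")

# The composite key (student_id, course) does determine grade in this sample
composite = determines(enrollments, ("student_id", "course"), "grade")

print(single, composite)
```

In this sample, student_id and course are both key attributes of the composite key, while neither is a key on its own.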