Summary – Business Intelligence and Data Management 2019 – by Darya Krapyva
Content
Lecture 1 – Intro to BI + Data management
Lecture 2 – Data Warehousing
Lecture 3 – OLAP business databases & reporting
Lecture 4 – Data mining introduction (CH 1, 2, 3)
Lecture 5 – Regression Analysis (CH2, 6)
Lecture 6 – Classification with k nearest neighbors (CH 7)
Lecture 7 – Classification with Naive Bayes (CH8)
Lecture 8 – Performance measures (CH5)
Lecture 9 – Decision trees (CH9)
Lecture 10 – Association rules (CH14)
Lecture 11 – Clustering (CH 15)
SQL LAB SESSIONS PART A, B, C, D - ANSWERS
Shmueli, Galit, Nitin R. Patel, and Peter C. Bruce, Data Mining for Business Analytics, Wiley, 2016, ISBN 9781118729274.
Created by: Darya Krapyva - 2019
Lecture 1 – Intro to BI + Data management
Part 1: Introduction
Data Management – Managing data as a valuable resource.
Business Intelligence – Data-driven decision making: transforming data into meaningful information/knowledge to support
business decision-making.
Data, information and knowledge
Data – items that are the most elementary descriptions of things, events, activities and transactions (Raw symbols).
They can come in both structured and unstructured form, and can be internal or external.
Information – organized data that has meaning and value (Formatted data). It is the result of processing raw data to reveal its
meaning.
Knowledge – processed data or information that is applicable to a business decision problem (Data relationships).
Taxonomy of Business Intelligence: Methods
1. Descriptive analytics – Use data to understand past & present (OLAP, DBM and Data warehousing framework).
KPIs are often put into a dashboard view to provide (real-time) insights.
2. Predictive analytics – Predict future behaviour based on past performance (regression and clustering).
3. Prescriptive analytics - Make decisions or recommendations to achieve the best performance.
Taxonomy of Business Intelligence: Function - Marketing analytics, Sales analytics, HR analytics, Financial analytics etc.
Part 2: Introduction to Business Intelligence
From DSS to Intelligence or Analytics (two views):
1. Business Intelligence: data warehousing and descriptive analytics.
Business Analytics: predictive and prescriptive analytics.
2. “Within this course, Business Intelligence = Business Analytics.”
BI is an umbrella term that combines the processes, technologies, and tools needed to transform data into information,
information into knowledge, and knowledge into plans that drive profitable business action (Sharda 2014). → process definition.
Another definition is that it is information and knowledge that enable business decision-making (Sabherwal, 2011). → product
definition.
The objective of the BI product is to provide historical, current and predictive
views of business operations. Information/knowledge that could relate to:
- Understanding customer preferences
- Coping with competition
- Identifying opportunities
- Enhancing internal efficiency
BI Solution – Support the BI process by utilizing BI tools.
BI product - information and knowledge that supports decision making.
BI tools - data warehousing, knowledge management and statistics.
A performance dashboard is a combination of techniques.
Part 3: Introduction to Databases
Database – A collection of related tables, designed, maintained and utilized by multiple users, with software to update & query
the data. Manipulation of data is possible using a query language. Because the data is divided into smaller portions (tables), it is
important to be able to connect these portions again (joining tables).
A database consists of the following Database Elements: Data (the database), software, hardware and users.
Database management system (DBMS) is the software that controls the data (Oracle, DB2, MySQL). It manages the data
within the database and offers ways to manipulate the data using a query language.
The DBMS contains a data dictionary in which the required data component structures can be looked up and, if needed,
changed. It also creates and manages the complex structures required for data storage and helps with performance tuning
(increasing efficiency) across the multiple physical data files present.
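The data dictionary idea can be illustrated with a small sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in DBMS; the notes themselves name Oracle, DB2 and MySQL, whose catalogs look different. The `employees` table and its columns are hypothetical. The sketch looks up the stored structure and then changes it.

```python
import sqlite3

# Assumption: SQLite stands in for the DBMS; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# sqlite_master is SQLite's catalog: it lists the objects defined in the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employees)")]

# Changing the structure: add a column, then look the structure up again.
conn.execute("ALTER TABLE employees ADD COLUMN department TEXT")
columns_after = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]

print(tables)
print(columns)
print(columns_after)
conn.close()
```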
“As you can see, the DBMS receives Structured Query Language (SQL) queries from the client and accesses the database
for file access. As a result, data is transported from the database to the requesting client.”
Database systems allow users to:
1. Organise (CREATE)
2. Store (INSERT)
3. Update (UPDATE)
4. Delete (DELETE)
5. Retrieve (SELECT)
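As a minimal sketch, the five operations listed above can be run one after another against an in-memory SQLite database using Python's built-in sqlite3 module. The `customers` table, its columns and the sample rows are hypothetical and only serve as an illustration.

```python
import sqlite3

# In-memory database; table, columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. Organise (CREATE): define the table structure.
cur.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# 2. Store (INSERT): add records.
cur.executemany(
    "INSERT INTO customers (customer_id, name, city) VALUES (?, ?, ?)",
    [(1, "Alice", "Utrecht"), (2, "Bob", "Leiden"), (3, "Carol", "Delft")],
)

# 3. Update (UPDATE): change a field of an existing record.
cur.execute("UPDATE customers SET city = ? WHERE customer_id = ?", ("Amsterdam", 2))

# 4. Delete (DELETE): remove a record.
cur.execute("DELETE FROM customers WHERE customer_id = ?", (3,))

# 5. Retrieve (SELECT): query what is left.
remaining = cur.execute(
    "SELECT customer_id, name, city FROM customers ORDER BY customer_id"
).fetchall()
print(remaining)
conn.close()
```

Two rows remain after the delete, and the second one carries the updated city.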
Database Terminology – A database consists of separate tables with uniquely defined names (employees, customers
and orders). A table is a named, structured list of data of a specific type. Every table is divided into Fields (columns) and
Records (rows).
Part 4: Relational Databases
Relational Databases allow data to be grouped into tables and relationships to be set between these tables. You can
use a Join line to link different tables using a common field. Such a Join line indicates the relationship between two tables
(customers and orders).
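The join between customers and orders can be sketched as follows, again assuming SQLite via Python's sqlite3 module. The column names (`customer_id` as the common field, `order_id`, `amount`) and the sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables that share the common field customer_id.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# The JOIN ... ON clause plays the role of the join line:
# it links each order to its customer through the common field.
joined = conn.execute("""
SELECT c.name, o.order_id, o.amount
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
ORDER BY o.order_id
""").fetchall()
print(joined)
conn.close()
```

Each order row is matched with the customer whose `customer_id` it carries, so a customer with two orders appears twice in the result.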
Keys are important as they establish relationships among tables, and they ensure the integrity of data. There are different kinds
of keys to be found in tables:
• Primary key (PK) – Field(s) that uniquely identify each record in a table (can never be null). In a relational table draft,
primary keys are underlined. Examples: Bno (Book), Rno (Reader), Bno + Rno + Loan date (Loan).
• Keys – consist of 1 or more attributes that determine other attributes. A key’s role is based on determination: A → B, C,
D. If you know A, you can look up B, C, D (so these are functionally dependent on A).
• Composite key – a key that is composed of more than one attribute.
• Composite Primary key – a primary key made up of more than one attribute, typically when a new table combines the
keys of other tables (Bno from Book and Rno from Reader, together with Loan date, identify a Loan).
• Super key – Any key that uniquely identifies each row (Author, Title, Bno)
• Candidate key – Super key without unnecessary attributes (Bno)
• Relational schema – Textual representation of the database tables. Primary key attributes are underlined.
Book (Bno, Author, Title); Loan (Bno, Rno, Loan date, Return date)
• Foreign key - Attribute whose values match the primary key values in the related table.
• Secondary key – Key strictly used for data retrieval (does not need to yield a unique outcome).
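The Book / Reader / Loan example from the list above can be written down as a schema to show what the keys enforce. This is a sketch under assumptions: SQLite via Python's sqlite3 module, column names `LoanDate` and `ReturnDate` for "Loan date" and "Return date", a `Name` column on Reader, and made-up sample rows. The helper `rejected` is hypothetical and only reports whether the DBMS refused a statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Book   (Bno INTEGER PRIMARY KEY, Author TEXT, Title TEXT);
CREATE TABLE Reader (Rno INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Loan (
    Bno        INTEGER NOT NULL REFERENCES Book(Bno),    -- foreign key to Book
    Rno        INTEGER NOT NULL REFERENCES Reader(Rno),  -- foreign key to Reader
    LoanDate   TEXT NOT NULL,
    ReturnDate TEXT,
    PRIMARY KEY (Bno, Rno, LoanDate)                     -- composite primary key
);
INSERT INTO Book   VALUES (1, 'Author A', 'Title X');
INSERT INTO Reader VALUES (7, 'Reader R');
INSERT INTO Loan   VALUES (1, 7, '2019-02-01', NULL);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Same (Bno, Rno, LoanDate) combination again: violates the composite primary key.
duplicate_pk_rejected = rejected("INSERT INTO Loan VALUES (1, 7, '2019-02-01', NULL)")
# Bno 99 does not exist in Book: violates the foreign key.
unknown_book_rejected = rejected("INSERT INTO Loan VALUES (99, 7, '2019-02-02', NULL)")
# Same book and reader on a different date is a new, valid loan.
new_loan_rejected = rejected("INSERT INTO Loan VALUES (1, 7, '2019-03-01', NULL)")

print(duplicate_pk_rejected, unknown_book_rejected, new_loan_rejected)
```

The first two inserts are refused and the third is accepted, which is the "integrity of data" role of keys in practice.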
Class exercise – identify the Primary Key and Foreign Key
“In short, the Primary Key is the candidate key chosen to be the unique row identifier.
The choice of a PK is based on the designer or end-user requirements. Each primary key value must be unique to ensure the
entity integrity (null not permitted in PK).”
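The difference between a super key and a candidate key can be checked mechanically on sample rows. The sketch below is plain Python; the four Book rows are invented (Bno 1 and Bno 4 are assumed to be two copies of the same book), and the function names are hypothetical. Note that sample data can only refute a key. Whether an attribute set really is a key depends on the meaning of the data, which is a design decision.

```python
from itertools import combinations

# Invented sample rows; Bno 1 and Bno 4 are two copies of the same title.
rows = [
    {"Bno": 1, "Author": "Author A", "Title": "Title X"},
    {"Bno": 2, "Author": "Author A", "Title": "Title Y"},
    {"Bno": 3, "Author": "Author B", "Title": "Title X"},
    {"Bno": 4, "Author": "Author A", "Title": "Title X"},
]

def is_superkey(attrs, rows):
    """True if no two rows share the same combination of values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Superkeys with no unnecessary attributes, judged on the given rows only."""
    attrs = list(rows[0])
    found = []
    # Try small attribute sets first, so any superkey that contains an
    # already-found key is recognised as non-minimal and skipped.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            contains_smaller_key = any(set(key) <= set(combo) for key in found)
            if is_superkey(combo, rows) and not contains_smaller_key:
                found.append(combo)
    return found

print(is_superkey(("Author", "Title", "Bno"), rows))  # super key from the notes
print(is_superkey(("Author", "Title"), rows))         # two copies share author and title
print(candidate_keys(rows))
```

On these rows (Author, Title, Bno) is a super key, but only Bno survives as a candidate key, so Bno is the natural choice of primary key.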