Index
Articles BI DM (Week 1-3)
Art. 1 Database Management (1.1, 1.6, 2.1-2.3, 3.1, 3.2, 3.6, 5.1, 5.2)
1. Database systems
2. Data models
3. The relational database model
5. Normalization of the database tables
Art. 2 Data Warehouse Design
1. Introduction to Data warehousing
1.1 Decision Support Systems
1.2 Data Warehousing
1.3 Data Warehouse Architectures
1.3.1 Single-Layer Architecture
1.3.2 Two-Layer Architecture
1.3.3 Three-Layer Architecture
1.3.4 An Additional Architecture Classification
Art. 3 Multidimensional Database Technology
BI DM Book (Week 4-11)
Part 1 Preliminaries
Chapter 1: Introduction
1.1 What is business analytics?
1.3 Data mining and related terms
1.4 Big data
1.7 Terminology and notation
Chapter 2: Overview of the data mining process
2.2 Core ideas in data mining
2.3 The steps in data mining
2.4 Preliminary steps
2.5 Predictive power and overfitting
2.8 Automating data mining solutions
Part 2 Data exploration and dimension reduction
Chapter 3: Data Visualization
3.1 Uses of Data visualization
3.2 Data examples
3.3 Basic charts: bar charts, line graphs and scatter plots
3.4 Multidimensional visualization
3.5 Specialized visualizations
Chapter 4: Dimension Reduction
4.1 Introduction
4.2 Curse of Dimensionality
4.3 Practical considerations
4.4 Data summaries
Part 3 Performance evaluation
Chapter 5: Evaluating predictive performance
5.1 Introduction
5.2 Evaluating predictive performance
5.3 Judging classifier performance
5.4 Judging ranking performance
5.5 Oversampling
Part 4 Prediction and classification methods
Chapter 6: Multiple linear regression
6.1 Introduction
6.2 Explanatory vs. predictive modelling
6.3 Estimating the regression equation and prediction
6.4 Variable selection in linear regression
Chapter 7: k-Nearest-neighbours (k-NN)
7.1 The k-NN classifier (categorical outcome)
7.2 k-NN for a numerical response
7.3 Advantages and shortcomings of k-NN algorithms
Chapter 8: The Naïve Bayes classifier
8.1 Introduction
8.2 Applying the full (exact) Bayesian classifier
8.3 Advantages and shortcomings of the Naïve Bayes classifier
Chapter 9: Classification and Regression Trees
9.1 Introduction
9.2 Classification trees
9.3 Evaluating the performance of a classification tree
9.4 Avoiding overfitting
9.5 Classification rules from trees
9.6 Classification trees for more than two classes
9.7 Regression trees
9.8 Advantages, weaknesses, and extensions
9.9 Improving prediction: multiple trees
Part 5 Mining relationships among records
Chapter 14: Association rules and collaborative filtering
14.1 Association rules
14.2 Collaborative filtering
14.3 Summary
Chapter 15: Cluster analysis
15.1 Introduction
15.2 Measuring distance between two observations
15.3 Measuring distance between two clusters
15.4 Hierarchical (agglomerative) clustering
15.5 Non-hierarchical clustering: the k-means algorithm
Articles BI DM (Week 1-3)
Art. 1 Database Management (1.1, 1.6, 2.1-2.3, 3.1, 3.2, 3.6, 5.1, 5.2)
1. Database systems
1.1 Data vs. Information
Data are raw facts. The word raw indicates that the facts have not yet been processed to reveal
their meaning. Keep in mind that raw data must be properly formatted for storage, processing,
and presentation.
Information is the result of processing raw data to reveal its meaning. To reveal meaning,
information requires context.
Data are the foundation of information, which is the bedrock of knowledge—that is, the body of
information and facts about a specific subject. Knowledge implies familiarity, awareness, and
understanding of information as it applies to an environment. A key characteristic of knowledge
is that “new” knowledge can be derived from “old” knowledge.
Let’s summarize some key points:
Data constitute the building blocks of information.
Information is produced by processing data.
Information is used to reveal the meaning of data.
Accurate, relevant, and timely information is the key to good decision making.
Good decision making is the key to organizational survival in a global environment.
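The point that information is produced by processing data can be illustrated with a minimal Python sketch. The survey answers and the `summarize` helper are hypothetical and serve only as an example: the raw list says little on its own, while the percentages computed from it, together with the context of how many respondents there were, carry meaning.

```python
from collections import Counter

# Hypothetical raw data: individual survey answers, not yet processed.
raw_responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no"]


def summarize(responses):
    """Process raw answers into information: the share of each answer in percent."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


summary = summarize(raw_responses)

# Presenting the result with context (what was counted, out of how many)
# is what turns the numbers into usable information.
for answer, pct in sorted(summary.items()):
    print(f"{answer}: {pct}% of {len(raw_responses)} respondents")
```

The same raw list could be processed in other ways (for example, per region or per week); which summary counts as useful information depends on the decision it has to support.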
Data management is a discipline that focuses on the proper generation, storage, and retrieval of
data.
1.6 Database systems
Unlike the file system, with its many separate and unrelated files, the database system consists
of logically related data stored in a single logical data repository. (The “logical” label reflects the
fact that, although the data repository appears to be a single unit to the end user, its contents
may actually be physically distributed among multiple data storage facilities and/or locations.)
The current generation of DBMS software stores not only the data structures, but also the
relationships between those structures and the access paths to those structures, all in a central
location. The DBMS also takes care of defining, storing, and managing all required access paths
to those components.
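As a rough illustration of a DBMS keeping structures, relationships, and access paths in one place, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is chosen here only as a convenient example and is not named in the source; the `customer` and `invoice` tables and the index name are hypothetical.

```python
import sqlite3

# One logical repository; an in-memory database is assumed for the example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when enabled

conn.executescript("""
    -- Data structures (tables)
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Relationship: every invoice must belong to an existing customer
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
    -- Access path: an index used to find a customer's invoices quickly
    CREATE INDEX idx_invoice_customer ON invoice(customer_id);
""")

# The DBMS records the structures and access paths in its own catalog.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
print(catalog)

# The stored relationship is enforced by the DBMS, not by application code.
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice VALUES (10, 1, 250.0)")
try:
    conn.execute("INSERT INTO invoice VALUES (11, 99, 80.0)")  # no customer 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("invoice for unknown customer rejected:", rejected)
```

In a file system approach, each application program would have to keep track of these relationships and lookups itself; here they are declared once and managed centrally.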
1.6.1 The database system environment
The term database system refers to an organization of components that define and regulate the
collection, storage, management, and use of data within a database environment.
From a general management point of view, the database system is composed of five major
parts: hardware, software, people, procedures, and data.
1. Hardware: Hardware refers to all of the system’s physical devices.
2. Software: Although the most readily identified software is the DBMS itself, to make the database
system function fully, three types of software are needed: operating system software, DBMS
software, and application programs and utilities.
o Operating system software manages all hardware components and makes it possible for
all other software to run on the computers.