Chapter 1 Introduction
1.1 What is Business Analytics?
Business analytics (BA) is the practice of bringing quantitative data to bear on decision-making. It includes a range of data analysis methods, from simple reporting to sophisticated statistical and machine-learning techniques.
Business intelligence (BI) refers to data visualization and reporting for understanding the past and the
present. It has evolved into effective tools and practices, such as creating interactive dashboards that
allow the user not only to access real-time data but also to directly interact with it.
Business analytics now typically includes BI as well as sophisticated data analysis methods used for
exploring relationships between measurements, predicting new records, and forecasting future values.
1.2 What is Data Mining?
Data mining refers to the statistical and machine-learning methods within business analytics that inform decision making, often in automated fashion. Prediction is typically an important component, often at the individual level.
1.3 Data Mining and Related Terms
Data mining stands at the confluence of the fields of statistics and machine learning (also known as
artificial intelligence). However, classical statistics developed when computing was costly and data were scarce, whereas in data mining applications both data and computing power are plentiful.
Another major difference is the focus in statistics on inference from a sample to the population. In
contrast, the focus in machine learning is on predicting individual records.
Data mining is vulnerable to the danger of overfitting, where a model is fit so closely to the available
sample of data that it describes not merely structural characteristics of the data but random
peculiarities as well.
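As a rough illustration of this danger, the following sketch (Python with NumPy; the data are simulated) fits polynomials of two different degrees to the same noisy sample and compares the error on the sample with the error on fresh data:

import numpy as np

# Simulated sample: the true structure is linear, plus random noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(scale=0.3, size=x.size)
x_new = np.linspace(0, 1, 20)
y_new = 2 * x_new + rng.normal(scale=0.3, size=x_new.size)

for degree in (1, 9):
    coefs = np.polyfit(x, y, degree)                            # fit to the sample
    fit_err = np.mean((np.polyval(coefs, x) - y) ** 2)          # error on the sample
    new_err = np.mean((np.polyval(coefs, x_new) - y_new) ** 2)  # error on fresh data
    print(degree, round(fit_err, 3), round(new_err, 3))
# The degree-9 fit hugs the sample (low fit_err) but typically does worse on
# fresh data, because it has modeled the random peculiarities as well.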
We use the term machine learning to refer to algorithms that learn directly from data, especially
local patterns, often in layered or iterative fashion. In contrast, we use statistical models to refer to
methods that apply global structure to the data.
1.4 Big Data
Big data is a relative term: the amount of data is judged by reference to the past and to the methods and devices available to deal with them. The challenge big data presents is often characterized by the four V’s:
- Volume refers to the amount of data.
- Velocity refers to the speed at which it is being generated and changed.
- Variety refers to the different types of data being generated.
- Veracity refers to the fact that data are generated by organic, distributed processes and are not subject to the controls or quality checks that apply to data collected for a study.
Most large organizations face both the challenge and the opportunity of big data because most
routine data processes now generate data that can be stored and, possibly, analyzed.
1.5 Data Science
Data science is a mix of skills in the areas of statistics, machine learning, math, programming,
business, and IT. However, it is a rare individual who combines deep skills in all the constituent areas.
This book focuses on developing the statistical and machine learning models that will eventually be
plugged into a deployed system.
1.6 Why Are There So Many Different Methods?
The usefulness of a method can depend on factors such as the size of the dataset, the types of patterns that exist in the data, whether the data meet some underlying assumptions of the method, how noisy the data are, and the particular goal of the analysis.
Different methods can lead to different results, and their performance can vary. It is therefore
customary in data mining to apply several different methods and select the one that appears most
useful for the goal at hand.
Chapter 2 Overview of the Data Mining Process
2.1 Introduction
This book focuses on predictive analytics, the tasks of classification and prediction as well as pattern
discovery. Not covered are OLAP (online analytical processing) and SQL (structured query language), since they do not involve statistical modeling or automated algorithmic methods.
2.2 Core Ideas in Data Mining
Classification
A common task in data mining is to predict the value of a categorical variable (e.g. the recipient of an
offer can respond or not respond). Similar data where the classification is known are used to develop
rules, which are then applied to the data with the unknown classification.
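A minimal sketch of this train-then-apply pattern, assuming scikit-learn and made-up offer-response data:

from sklearn.linear_model import LogisticRegression

# Made-up offer-response data: predictors are [age, past purchases],
# outcome is 1 = responded, 0 = did not respond.
X_train = [[25, 1], [40, 6], [33, 2], [52, 8], [29, 0], [47, 5]]
y_train = [0, 1, 0, 1, 0, 1]

clf = LogisticRegression()
clf.fit(X_train, y_train)            # learn classification rules from labeled records

X_new = [[38, 4]]                    # a record whose class is unknown
print(clf.predict(X_new))            # predicted class, 0 or 1
print(clf.predict_proba(X_new))      # estimated class probabilities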
Prediction
Here, we are trying to predict the value of a numerical variable (e.g. amount of purchase).
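The same pattern applies to numerical outcomes; a minimal sketch, again assuming scikit-learn and made-up purchase data:

from sklearn.linear_model import LinearRegression

# Made-up data: predictors are [income, store visits per month],
# outcome is the purchase amount in dollars.
X_train = [[40, 2], [55, 4], [70, 3], [30, 1], [90, 6]]
y_train = [120.0, 210.0, 180.0, 80.0, 340.0]

reg = LinearRegression().fit(X_train, y_train)
print(reg.predict([[60, 3]]))        # predicted purchase amount for a new record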
Association Rules and Recommendation Systems
Association rule analysis, or affinity analysis, is designed to find general association patterns between items in large databases (“what goes with what”). For example, grocery stores can use such information for product bundling, and the same idea can help predict future symptoms for returning patients.
Online recommendation systems (e.g. Netflix) use collaborative filtering, which is a method that
generates “what goes with what” at the individual user level. Recommendation systems aim to
deliver personalized recommendations to users with a wide range of preferences.
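At its core, affinity analysis counts which items co-occur across many transactions. The following plain-Python sketch illustrates the idea on made-up basket data; real systems use dedicated algorithms such as Apriori to scale this to large databases:

from itertools import combinations
from collections import Counter

# Made-up market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

# Count how often each pair of items appears together.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

n = len(transactions)
for pair, count in pair_counts.most_common(3):
    print(pair, "support =", round(count / n, 2))   # fraction of baskets with the pair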
Predictive Analytics
Classification, prediction, association rules, and collaborative filtering constitute the analytical methods employed in predictive analytics.
Data Reduction and Dimension Reduction
The performance of data mining algorithms is often improved when the number of variables is
limited, and when large numbers of records can be grouped into homogeneous groups. The process
of consolidating a large number of records into a smaller set is termed data reduction. Methods for
reducing the number of cases are often called clustering.
Reducing the number of variables is typically called dimension reduction, which improves predictive
power, manageability, and interpretability.
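Both forms of reduction can be sketched with scikit-learn on simulated data: k-means clustering groups records into homogeneous clusters (data reduction), and principal components analysis replaces many variables with a few new ones (dimension reduction):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))       # simulated: 200 records, 10 variables

# Data reduction: group the 200 records into 3 homogeneous clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])           # cluster membership of the first 10 records

# Dimension reduction: replace the 10 variables with 2 principal components.
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)               # (200, 2)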
Data Exploration and Visualization
Exploration is used for data cleaning and manipulation as well as for visual discovery and hypothesis
generation.
Exploration by creating charts and dashboards is called data visualization or visual analytics. For
numerical variables we use histograms and boxplots to learn about the distribution of their values, to
detect outliers, and to find other information that is relevant to the analysis. Similarly, for categorical
variables we use bar charts.
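A minimal sketch of these chart types, assuming pandas and matplotlib and a made-up dataset with one numerical and one categorical variable:

import pandas as pd
import matplotlib.pyplot as plt

# Made-up dataset with one numerical and one categorical variable.
df = pd.DataFrame({
    "purchase": [120, 80, 95, 300, 150, 110, 90, 500, 130, 105],
    "segment":  ["A", "B", "A", "C", "B", "A", "B", "C", "A", "B"],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
df["purchase"].plot.hist(ax=axes[0], title="Histogram")        # distribution of values
df["purchase"].plot.box(ax=axes[1], title="Boxplot")           # spot outliers
df["segment"].value_counts().plot.bar(ax=axes[2], title="Bar chart")
plt.tight_layout()
plt.show()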
Supervised and Unsupervised Learning
For supervised learning algorithms we must have data available in which the value of the outcome of
interest is known. These training data are the data from which the algorithm “learns” or is “trained”
about the relationship between predictor variables and the outcome variable. The algorithm is then
applied to the validation data where the outcome is known, to see how well it does in comparison to
other models. It is prudent to save a third sample with known outcomes (the test data) to assess how well the finally selected model will perform. The model can then be
used to classify or predict the outcome of interest in new cases where the outcome is unknown.
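A common way to create the three partitions is to split the data twice; a minimal sketch assuming scikit-learn, with simulated data standing in for real records:

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # simulated predictors
y = rng.integers(0, 2, size=1000)         # simulated known outcomes

# First set aside the test data, then split the rest into training and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(X_rest, y_rest, test_size=0.25, random_state=1)

print(len(X_train), len(X_valid), len(X_test))   # 600 200 200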
Unsupervised learning algorithms are those used where there is no outcome variable to predict or
classify. Association rules, dimension reduction methods, and clustering techniques are all
unsupervised learning methods.
2.3 The Steps in Data Mining
Here is a list of the steps taken in a typical data mining effort:
1. Develop an understanding of the purpose of the data-mining project. The most serious errors in
analytics projects result from a poor understanding of the problem.
2. Obtain the dataset to be used in the analysis. This often involves random sampling from a large
database. It may also involve pulling together data from different databases or sources. The
databases could be internal (e.g. past purchases made by customers) or external (e.g. credit
ratings).
3. Explore, clean, and preprocess the data. This step involves verifying that the data are in reasonable condition (e.g. checking for missing data, reasonable ranges of values, outliers, and consistency in the definitions of fields); steps 2 and 3 are illustrated in the sketch after this list.
4. Reduce the data dimension, if necessary. Dimension reduction can involve operations such as
eliminating unneeded variables, transforming variables, and creating new variables.
5. Determine the data-mining task. This involves translating the general question or problem into a
more specific data-mining question.
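As noted in step 3, here is a minimal pandas sketch of steps 2 and 3; the file name, sample size, and the 'age' range check are hypothetical:

import pandas as pd

# Hypothetical customer file; in practice the data would come from one or
# more internal or external databases.
df = pd.read_csv("customers.csv")

# Step 2: random sampling from a large dataset.
sample = df.sample(n=1000, random_state=1)

# Step 3: basic exploration and cleaning checks.
print(sample.isna().sum())                        # missing values per field
print(sample.describe())                          # ranges, to spot impossible values
sample = sample.drop_duplicates()                 # remove duplicate records
sample = sample[sample["age"].between(0, 120)]    # hypothetical range check on an "age" field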