Chapter 1
Business analytics (BA) is the practice and art of bringing quantitative data to bear
on decision making. It includes a range of data analysis methods.
The next level of analytics is business intelligence (BI). It refers to data visualization and
reporting for understanding what happened and what is happening.
Data mining refers to business analytics methods that go beyond counts, descriptive
techniques, reporting and methods based on business rules. Data mining methods can
cope with huge amounts of (big) data and extract value from it. Synonyms for data mining:
predictive analytics, predictive modelling and machine learning.
Machine learning vs. statistics: they are not the same. Statistics focuses on the
‘average effect’ in a population, while machine learning focuses on predicting individual records.
With data mining there is a risk of overfitting: the model fits the available sample so closely
that it also captures random noise, a problem that classical statistical inference guards against.
Definition of machine learning in this book: algorithms that learn directly from data.
Definition of statistical models: methods that apply global structure to data. Many
practitioners use the term machine learning to refer to all the methods in this book.
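A minimal sketch of the overfitting risk described above. The numbers are made up for illustration and are not from the book. It compares a straight line (a model that imposes global structure) with a "memorizer" that simply returns the outcome of the closest training record; both function names are hypothetical.

```python
# Toy data invented for this sketch: y is roughly 2 * x plus noise.
train = [(1, 2.5), (2, 3.4), (3, 6.8), (4, 7.3), (5, 10.9), (6, 11.2)]
valid = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8), (4.6, 9.2), (5.4, 10.8)]


def fit_line(data):
    """Least-squares line: imposes one global structure on all records."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_memorizer(data):
    """Returns the outcome of the closest training record (1-nearest neighbour)."""
    return lambda x: min(data, key=lambda rec: abs(rec[0] - x))[1]


def mean_abs_error(model, data):
    """Average absolute difference between predicted and actual outcome."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)


line = fit_line(train)
memorizer = fit_memorizer(train)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name,
          "train error:", round(mean_abs_error(model, train), 3),
          "validation error:", round(mean_abs_error(model, valid), 3))
```

On this toy data the memorizer reproduces the training records perfectly but does worse than the line on the validation records, which is the pattern overfitting produces.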
Big data is a relative term. Its challenges are often described by the four V’s: volume,
velocity (speed), variety and veracity (the data is generated organically, so it is not subject
to quality standards).
Data science is a mix of skills in the areas of business, statistics, machine learning,
math, programming and IT. A data scientist is a rare individual who combines deep skills in all
constituent areas.
Chapter 2
The core of the book focuses on what’s called predictive analytics: the tasks of
classification and prediction as well as pattern discovery, which have become key elements
of a business analytics function.
Core ideas in data mining: classification is perhaps the most basic form of business
analytics. A person pays or does not pay, responds or does not respond, etc. A common task in
data mining is to examine data where the classification is unknown or will occur in the future,
with the goal of predicting what it is or will be. Prediction is similar, except that we are
trying to predict the value of a numerical variable rather than a class (yes or no). In these
notes, prediction refers to predicting the value of a continuous variable.
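A small sketch of the difference between classification and prediction. The customer records are invented, and the k-nearest-neighbours approach is only one possible technique, chosen here because it is short; the same neighbours give a class in one case and a number in the other.

```python
from collections import Counter

# Hypothetical records: (income in thousands, age, responded?, amount spent)
records = [
    (30, 25, "no", 20.0),
    (35, 30, "no", 25.0),
    (60, 40, "yes", 80.0),
    (65, 45, "yes", 90.0),
    (70, 50, "yes", 100.0),
]


def nearest(new, k=3):
    """Return the k records closest to the new (income, age) pair."""
    def squared_distance(rec):
        return (rec[0] - new[0]) ** 2 + (rec[1] - new[1]) ** 2
    return sorted(records, key=squared_distance)[:k]


def classify(new, k=3):
    """Classification: majority class among the nearest records."""
    votes = Counter(rec[2] for rec in nearest(new, k))
    return votes.most_common(1)[0][0]


def predict(new, k=3):
    """Prediction: average numerical outcome among the nearest records."""
    neighbours = nearest(new, k)
    return sum(rec[3] for rec in neighbours) / len(neighbours)


new_customer = (62, 42)
print("class:", classify(new_customer))
print("predicted amount:", predict(new_customer))
```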
Association rules, or affinity analysis, are designed to find general association patterns
between items in large databases.
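A rough sketch of how an association between two items might be quantified. The transactions are invented, and the two measures used here (support and confidence) are standard in association-rule work but are not defined in these notes, so treat them as an assumption of the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

# How often each item and each (alphabetically ordered) pair of items occurs.
item_counts = Counter(item for basket in transactions for item in basket)
pair_counts = Counter(
    pair for basket in transactions for pair in combinations(sorted(basket), 2)
)


def support(a, b):
    """Share of all baskets that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(transactions)


def confidence(a, b):
    """Share of baskets containing a that also contain b (rule: a -> b)."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


print("support(bread, milk):", support("bread", "milk"))
print("confidence(bread -> milk):", confidence("bread", "milk"))
print("confidence(butter -> bread):", confidence("butter", "bread"))
```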
Online recommendation systems (Amazon & Netflix) use collaborative filtering, a
method that uses an individual user’s preferences (based on history, behaviour etc.) together
with those of other users to generate recommendations.
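A minimal sketch of user-based collaborative filtering, under strong simplifying assumptions: the ratings are invented, similarity is plain cosine similarity on items both users rated, and only the single most similar user is consulted. Real recommendation systems are far more elaborate.

```python
from math import sqrt

# Hypothetical ratings (1-5) by user and item.
ratings = {
    "ann":  {"A": 5, "B": 4, "C": 1},
    "bob":  {"A": 5, "B": 5, "C": 1, "D": 5},
    "cara": {"A": 1, "B": 2, "C": 5, "D": 1},
}


def similarity(u, v):
    """Cosine similarity over the items that both users have rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in shared))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def most_similar(user):
    """The other user whose past ratings look most like this user's."""
    others = [u for u in ratings if u != user]
    return max(others, key=lambda u: similarity(user, u))


def recommend(user, min_rating=4):
    """Items the most similar user rated highly and this user has not rated."""
    neighbour = most_similar(user)
    return sorted(
        item for item, rating in ratings[neighbour].items()
        if item not in ratings[user] and rating >= min_rating
    )


print(most_similar("ann"), recommend("ann"))
```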
Classification, prediction, and, to some extent, association rules and collaborative
filtering constitute the analytical methods employed in predictive analytics.
The process of consolidating a large number of records (or cases) into a smaller set is
called data reduction. Methods for reducing the number of cases are often called clustering.
Reducing the number of variables is called dimension reduction, which is a common step
before deploying supervised learning methods on the data.
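A sketch of reducing the number of cases by clustering, assuming a simple one-dimensional k-means with fixed starting centers (the values, the starting centers and the function name are all invented for illustration). Nine records are summarized by three cluster centers. Dimension reduction is not covered by this sketch.

```python
def k_means_1d(values, centers, iterations=10):
    """Basic k-means on single numbers: assign each value to its nearest
    center, then move each center to the mean of its assigned values."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical monthly spend (in hundreds) for nine customers.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0, 10.0])

for center, members in zip(centers, clusters):
    print(round(center, 2), members)
```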
Exploration is one of the earliest stages of engaging with the data and is about
understanding the global landscape of the data and detecting unusual values. Methods are:
looking at different aggregations, checking individual values and the relationships between them,
and creating charts and dashboards → data visualization or visual analytics.
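A short sketch of two of the exploration methods mentioned above: an aggregation (mean per region) and a check for unusual values. The sales figures are invented, and the rule used to flag unusual values (more than three median absolute deviations from the median) is just one common convention, not something these notes prescribe.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sales records: (region, amount).
sales = [
    ("north", 120), ("north", 135),
    ("south", 110), ("south", 980),
    ("east", 125), ("east", 130),
]

# Aggregation: average amount per region.
by_region = defaultdict(list)
for region, amount in sales:
    by_region[region].append(amount)
region_means = {region: mean(amounts) for region, amounts in by_region.items()}

# Unusual values: far from the median relative to the typical deviation.
amounts = [amount for _, amount in sales]
center = median(amounts)
mad = median([abs(a - center) for a in amounts])
unusual = [(r, a) for r, a in sales if abs(a - center) > 3 * mad]

print(region_means)
print("unusual:", unusual)
```

Note how the single unusual value also distorts the aggregate for its region, which is why individual values are worth checking alongside the summaries.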
Fundamental distinction among data mining techniques: supervised learning
algorithms are those used in classification and prediction. You need training data, in which
the outcome is known, so the algorithm can ‘train’ and learn from it. Then you need validation
data to benchmark the model against other models, and after that you can use the model on cases
where the outcome is unknown (example: simple linear regression model). Unsupervised learning
algorithms are those used where there is no outcome variable to predict or classify. Association
rules, dimension reduction methods and clustering techniques are examples of unsupervised methods.
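The supervised workflow described above can be sketched as follows, assuming a simple linear regression on invented data: partition the records, train on one part, compare against a naive benchmark on the validation part, then apply the model to a new case. The 60/40 split, the seed and the benchmark (always predicting the training mean) are arbitrary choices for this example.

```python
import random

# Hypothetical records (x, y); y is roughly 3 + 2 * x plus noise.
records = list(zip(range(1, 11),
                   [5.3, 6.8, 9.4, 10.7, 13.2, 14.9, 17.5, 18.6, 21.3, 22.8]))

# Partition: random 60% for training, remaining 40% for validation.
rng = random.Random(1)
shuffled = records[:]
rng.shuffle(shuffled)
train, valid = shuffled[:6], shuffled[6:]


def fit_simple_linear_regression(data):
    """Return (intercept, slope) of the least-squares line."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_error(model, data):
    return sum(abs(model(x) - y) for x, y in data) / len(data)


intercept, slope = fit_simple_linear_regression(train)
train_mean_y = sum(y for _, y in train) / len(train)


def regression(x):
    return intercept + slope * x


def baseline(x):
    # Naive benchmark: ignore x and always predict the training mean.
    return train_mean_y


print("validation error, regression:", round(mean_abs_error(regression, valid), 3))
print("validation error, baseline:  ", round(mean_abs_error(baseline, valid), 3))

# Use the trained model on a new case whose outcome is unknown.
prediction = regression(12)
print("prediction for x = 12:", round(prediction, 2))
```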
List of steps to be taken in a typical data mining effort:
1. Develop an understanding of the purpose of the data mining project
2. Obtain the data set to be used in the analysis
3. Explore, clean and preprocess the data
4. Reduce the data dimension, if necessary
5. Determine the data mining task (classification, prediction, clustering etc.)
6. Partition the data (for supervised tasks)
7. Choose the data mining techniques to be used
8. Use algorithms to perform the tasks (iterative process)
9. Interpret the results of the algorithms
10. Deploy the model
These steps encompass the steps in the SEMMA methodology, developed by SAS: