Summary

Samenvatting Data Science Methods for MADS (EBM216A05)


Course
Data Science Methods for MADS (EBM216A05)

Institution
Rijksuniversiteit Groningen (RuG)

Comprehensive summary for the course Data Science Methods for MADS. Both the lectures and mandatory literature are in the summary. Additional information has also been used to better understand certain concepts.


Uploaded on February 5, 2021
Number of pages 34
Written in 2020/2021
Type Summary

Education
Marketing Analytics & Data Science

nikkinuman

DATA SCIENCE METHODS

1. MACHINE LEARNING
Machine learning
Herbert Alexander Simon: “Learning is any process by which a system improves performance from
experience. Machine learning is concerned with computer programs that automatically improve their
performance through experience”
  Machine learning
o Learns from the data
o Builds predictions for new data based on past data
o Improves performance through experience
o More past data → better model → higher accuracy of predicted outcomes

 Many ways in which the machine learns
o Supervised learning
 Uses labeled data to train the model
  Dependence techniques (there is a dependent variable to predict)
 Correct results are known and given as an input to the model during the learning
process
 Seeks to learn from a training set to predict the output when given an input, less
concerned with the “true” linkage between variables.
  Methods: logistic regression, support vector machines (SVM), k-nearest neighbours, Naïve
Bayes, decision trees and artificial neural networks
 Ensemble methods: meta-learning algorithms that combine multiple
individual learners (e.g. random forest, gradient boosted trees, XGBoost).
 Probabilistic graphical models: use (un)directed graphs to encode the
conditional dependence between random variables (e.g. Bayesian networks,
Markov random fields)
 Deep neural networks: artificial neural networks with more than one
hidden layer (e.g. convolutional neural networks, recurrent neural
networks).
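
As a hedged illustration of supervised learning, the sketch below implements a minimal k-nearest-neighbour classifier in plain Python. The customer features, the labels "low" and "high", and the function name knn_predict are hypothetical and do not come from the course material; the point is only that the correct labels are given to the model as input.

```python
from collections import Counter
import math

# Hypothetical labeled training set: (visits per month, spend per visit) -> segment.
train = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.2), "high"), ((4.8, 5.1), "high"), ((5.2, 4.9), "high"),
]

def knn_predict(train, x, k=3):
    """Predict the label of x by majority vote among its k nearest labeled neighbours."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], x))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict(train, (1.1, 1.0)))  # a new point close to the "low" examples
print(knn_predict(train, (5.0, 5.0)))  # a new point close to the "high" examples
```

The model never estimates how visits or spend cause the segment; it only maps inputs to outputs, which matches the predictive focus described above.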

o Unsupervised learning
 Groups observations based on similar characteristics
 No labeled data
 Interdependence techniques (no dependent variable is singled out)
 Goal is to find hidden patterns in the data.
 Methods: clustering (K-means, hierarchical, DBSCAN), dimensionality reduction
(PCA, singular value decomposition, factor analysis).
 Topic models: discover and extract semantic structures from textual data
(e.g. Latent Dirichlet Allocation (LDA))
 Representation learning: allows a system to automatically discover the
representation needed for feature detection or classification from raw data
(e.g. autoencoder, word embedding, network embedding)
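
A minimal sketch of unsupervised learning, assuming K-means with two clusters on a small hypothetical data set. No labels are supplied; the algorithm only sees the observations. Using the first and last observation as starting centroids is a simplification made here so that the run is deterministic.

```python
import math

# Hypothetical unlabeled observations (no outcome variable is given).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (4.8, 5.1), (5.2, 4.9)]

def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Assumption: fixed starting centroids; real implementations usually pick them
# at random or with a scheme such as k-means++.
labels, centroids = kmeans(points, [points[0], points[-1]])
print(labels)
print(centroids)
```

The cluster numbers themselves carry no meaning; interpreting what each group represents is left to the analyst.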

o Reinforcement learning
 Based on feedback: the model learns from the feedback it receives
 Feedback on the output is fed back into the machine learning model
 Thus, the learning agent interacts with the environment by taking actions and
observing feedback in order to optimize a certain objective function
 Methods: multi-armed bandit, dynamic programming, SARSA, n-step temporal
difference, deep Q-network.
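
The feedback loop in reinforcement learning can be sketched with an epsilon-greedy multi-armed bandit. This is an illustration under stated assumptions: the three click probabilities, the exploration rate of 0.1 and the function name epsilon_greedy_bandit are hypothetical choices, not values from the course.

```python
import random

def epsilon_greedy_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which arm (e.g. which ad) pays off best purely from reward feedback."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)       # how often each arm was chosen
    estimates = [0.0] * len(true_probs)  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_probs))  # explore: try a random arm
        else:
            arm = max(range(len(true_probs)), key=lambda a: estimates[a])  # exploit
        # The environment returns feedback: reward 1 with the arm's hidden probability.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts

# Hypothetical hidden click probabilities of three ads.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_arm, [round(e, 2) for e in estimates], counts)
```

The agent is never told which arm is best; it takes actions, observes rewards, and shifts its choices toward the arm with the highest estimated reward.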

o Ma & Sun (2020) name some other machine learning methods
 Semi-supervised learning = the output is known for only a subset of data
 Transfer learning = adjusting an existing model, which is trained using a different
dataset for a different purpose, based on the current training data set, for the task
at hand
 Active learning = only a limited number of training instances is available at first; the
algorithm can acquire additional instances to improve predictive accuracy, but doing so is
costly, so it has to determine which training instances are most important

  Input of data → Machine learning model → Output according to algorithm applied

  Iteration = a term used in machine learning that indicates the number of times the algorithm’s
parameters are updated. Any machine learning procedure consists of multiple iterations: the model
makes a prediction, checks whether it is right, and adjusts. 100 iterations = 100 retries in which the
machine corrects its errors
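
To make the idea of an iteration concrete, here is a small sketch that fits a one-parameter model with gradient descent. The data, the learning rate and the function name fit_slope are assumptions for illustration; each pass through the loop is one iteration, i.e. one parameter update.

```python
# Hypothetical data generated by y = 2 * x; the model is y_hat = w * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def fit_slope(xs, ys, iterations=100, learning_rate=0.01):
    """Update the single parameter w once per iteration using the squared-error gradient."""
    w = 0.0
    losses = []
    for _ in range(iterations):
        errors = [w * x - y for x, y in zip(xs, ys)]          # check how wrong we are
        losses.append(sum(e * e for e in errors) / len(xs))   # mean squared error
        gradient = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        w -= learning_rate * gradient                          # correct the parameter
    return w, losses

w, losses = fit_slope(xs, ys)
print(round(w, 4), losses[0], losses[-1])
```

The recorded loss shrinks from the first iteration to the last, which is the "improves through experience" behaviour in miniature.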

Difference between data mining and machine learning
Data mining
 The process of discovering patterns in a data set
 Predates machine learning
 Performed using programming methods and algorithms
 Helps to extract useful data from large amounts of raw data
 Helps us to understand the data and make it usable
 Involves manual efforts to find knowledge and insights in data
 Part of the Knowledge Discovery in Databases (KDD) process
o Non-trivial process of identifying implicit, valid, novel, potentially useful, and
understandable patterns in data
o Database → Data warehouse → Data mining → Evaluation → Knowledge

Machine learning
 Techniques that make computers learn new things without being explicitly programmed
 Based on pattern recognition, computational learning and artificial intelligence
 Main uses of machine learning are predictive analysis and classification
 Algorithms / models train the system to identify patterns / learn about new insights

Strengths and weaknesses of machine learning methods
Strengths
 Ability to handle unstructured data (e.g. texts, images) and data of hybrid formats (e.g.
combination of texts, images).
 Ability to handle large data volume: millions of observations are the norm.
 Flexible model structure: increases the chance of capturing true linkage between input and
output variables.
 Strong predictive performance in real-world settings.
Weaknesses
 Not easy to interpret, but
1) many ML methods have statistical foundations with interpretable parameters,
2) post-hoc interpretation techniques exist and
3) models have been adapted for interpretation.
 Relationship typically correlational instead of causal: predictive focus causes little focus on
endogeneity.
 Unproven on analysing individual consumer level heterogeneity and dynamics

Over-fitting and under-fitting in machine learning
Overfitting = good performance on the training data, poor generalization to other data
 An overfitted model has more parameters than the underlying data can justify and is
therefore overly complex.

The model function has too much complexity (parameters) to fit the true
function correctly.

Underfitting = poor performance on the training data and poor generalization to other data
 An underfitted model does not have enough parameters to capture the trends in the underlying system.

The model function does not have enough complexity (parameters) to fit
the true function correctly
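
The contrast can be sketched by fitting polynomials of different degree to noisy data. Everything here is an assumption made for illustration: the true function is taken to be x squared, the noise level is 0.1, and the helpers fit_poly and mse are hypothetical names. Degree 0 stands for an underfitted model, degree 2 matches the true function, and degree 6 has more parameters than the ten training points can justify.

```python
import random

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) w = A^T y."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the monomial basis 1, x, x^2, ...
    aug = [[sum(x ** (i + j) for x in xs) for j in range(n)]
           + [sum(y * x ** i for x, y in zip(xs, ys))]
           for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        rest = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - rest) / aug[i][i]
    return w

def mse(w, xs, ys):
    """Mean squared error of the polynomial with coefficients w."""
    preds = [sum(c * x ** k for k, c in enumerate(w)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

rng = random.Random(0)
x_train = [-1 + 2 * i / 9 for i in range(10)]
y_train = [x * x + rng.gauss(0, 0.1) for x in x_train]   # assumed true function: x^2
x_test = [-0.95 + 0.2 * i for i in range(10)]             # unseen points
y_test = [x * x + rng.gauss(0, 0.1) for x in x_test]

results = {}
for degree in (0, 2, 6):
    w = fit_poly(x_train, y_train, degree)
    results[degree] = (mse(w, x_train, y_train), mse(w, x_test, y_test))
    print(degree, results[degree])
```

Training error can only fall as the degree grows, so a low training error alone says nothing about generalization. The error on the unseen points is the figure to compare; how much the degree-6 model suffers there depends on the particular noise draw.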

Artificial Intelligence (AI) vs. machine learning vs. deep learning
Artificial intelligence = Automated systems that make split-second context-dependent decisions.
Generally implemented using machine learning algorithms. It is about making the machine behave in
ways that would be called intelligent if a human were behaving like that. This term is often used to
describe machines that mimic “cognitive” functions that humans associate with the human mind such
as learning and problem solving.
Machine learning = a computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves
with experience E.
Deep learning = representation-learning methods with multiple levels of representation. Obtained by
composing simple but non-linear modules that each transform the representation at one level into a
representation at a higher, slightly more abstract level. The adjective “deep” in deep learning comes
from the use of multiple layers in the network. Deep learning is concerned with an unbounded number
of (hidden) layers which permits practical application and optimized implementation.

AI-driven marketing industry trends
Marketing trends
 Interactive and media-rich: ML can generate insights in the (mobile) interactions between
consumers and the firm
 Personalization and targeting: ML methods are propelling personalization and targeting to a
new level. ML assists in context-dependent targeting
 Real-time optimization and automation: ML methods are the go-to solutions for
optimization and automation
 Customer journey focus: ML can help firms to master the decision journey

Marketing practices
 Customer engagement: AI-driven innovations are rapidly reshaping engagement practices
 Search: ML can improve the relevance and robustness of search results
 Recommendation: ML is used to effectively match products and consumers
 Attribution: ML is able to generate accurate performance feedback and can help improve
channel design and allocation

Review of machine learning literature in marketing (Ma & Sun, 2020)
SVM: one of the first ML methods introduced to marketing. SVM predicts better than logit models,
logistic regression and hierarchical Bayes, i.e. it outperforms these traditional methods.

Traditional text-mining: text-mining process includes downloading, cleaning, information extraction,
chunking, and identification of semantic relationships. Can help firms to identify response-worthy
reviews.

Topic models: can be used to identify topics from consumer search queries and webpages, and can
also show that topics from those two sources are related. They have been applied not only to text, but
also to other marketing settings where semantic structure exists (e.g. predicting purchases, user profiling).

Deep learning: most frequently used in marketing for analyzing text and images. It has been used to
evaluate feature importance in predicting conversion, to investigate the impact of images on
demand, to show that photos are more predictive of restaurant survival than reviews, and to extract
image features to predict a person’s attractiveness.

Tree ensembles: were used to show that personalization improved the clicks to the top position and
that the return to personalization varied with user history and query type.

Causal forest: recent advancements have made it possible to use ML for causal research. Was used to
investigate how information disclosure affects pharmaceutical companies’ payments to physicians.
