Summary Data Science Methods for MADS (EBM216A05)
Course
Data Science Methods for MADS (EBM216A05)
Institution
Rijksuniversiteit Groningen (RuG)
Comprehensive summary for the course Data Science Methods for MADS. Both the lectures and mandatory literature are in the summary. Additional information has also been used to better understand certain concepts.
1. MACHINE LEARNING
Machine learning
Herbert Alexander Simon: “Learning is any process by which a system improves performance from
experience. Machine learning is concerned with computer programs that automatically improve their
performance through experience”
How machines learn
o Learn from the data
o Build predictions for new data based on past data
o Improve performance through experience
o More past data > better model > higher accuracy of current outcomes
Many ways in which the machine learns
o Supervised learning
Uses labeled data to train the model
Dependent techniques
Correct results are known and given as an input to the model during the learning
process
Seeks to learn from a training set to predict the output when given an input, less
concerned with the “true” linkage between variables.
Methods: logistic regression, SVM, k-nearest neighbours, Naïve Bayes, decision trees
and artificial neural networks
Ensemble methods: meta-learning algorithms that combine multiple
individual learners (e.g. random forest, gradient boosted trees, XGBoost).
Probabilistic graphical models: use (un)directed graphs to encode the
conditional dependence between random variables (e.g. Bayesian networks,
Markov random fields)
Deep neural networks: artificial neural networks with more than one
hidden layer (e.g. convolutional neural networks, recurrent neural
networks).
o Unsupervised learning
Clustering data based on similar characteristics
No labeled data
Interdependent techniques
Goal is to find hidden patterns in the data.
Clustering, factor analysis
Methods: clustering (K-means, hierarchical, DBSCAN), dimensionality reduction
(PCA, singular value decomposition, factor analysis).
Topic models: discover and extract semantic structures from textual data
(e.g. Latent Dirichlet Allocation (LDA))
Representation learning: allows a system to automatically discover the
representation needed for feature detection or classification from raw data
(e.g. autoencoder, word embedding, network embedding)
o Reinforcement learning
Based on feedback; learn from feedback given
Feedback on the output is fed back into the machine learning model
Thus, the learning agent interacts with the environment by taking actions and
observing feedback in order to optimize a certain objective function
Methods: multi-armed bandit, dynamic programming, SARSA, n-step temporal
difference, deep Q-network.
o Ma & Sun (2020) name some other machine learning methods
Semi-supervised learning = the output is known for only a subset of data
Transfer learning = adjusting an existing model, which was trained on a different
dataset for a different purpose, to the task at hand using the current training
data set
Active learning = only limited training instances are available at first; the
algorithm can acquire additional ones to improve predictive accuracy, but determining
the most important training instances is costly
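To make the contrast between supervised and unsupervised learning concrete, here is a minimal Python sketch on made-up toy data. The data points, the labels and the helper names (`fit_nearest_centroid`, `predict`, `two_means`) are assumptions for illustration and do not come from the course material. The supervised part learns one centroid per known label; the unsupervised part receives the same points without labels and groups them with a simple 2-cluster k-means.

```python
from statistics import mean

# Hypothetical toy data: (visits per month, spend per month) for six customers.
points = [(1.0, 1.2), (1.5, 0.8), (0.9, 1.1), (8.0, 8.5), (8.4, 7.9), (7.8, 8.2)]
# Labels are only available in the supervised setting.
labels = ["low", "low", "low", "high", "high", "high"]


def centroid(group):
    """Mean point of a non-empty list of 2-D points."""
    return (mean(p[0] for p in group), mean(p[1] for p in group))


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def fit_nearest_centroid(points, labels):
    """Supervised: use the known labels to learn one centroid per class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: centroid(g) for y, g in groups.items()}


def predict(model, p):
    """Assign a new point to the class with the closest centroid."""
    return min(model, key=lambda y: sq_dist(model[y], p))


def two_means(points, n_iter=10):
    """Unsupervised: 2-cluster k-means, no labels used.

    Starts from the first and last point (a simple deterministic choice).
    """
    centroids = [points[0], points[-1]]
    assignment = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(2), key=lambda c: sq_dist(centroids[c], p)) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(2):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = centroid(members)
    return assignment


model = fit_nearest_centroid(points, labels)
new_customer_label = predict(model, (7.5, 8.0))
clusters = two_means(points)
```

The supervised model can name the group of a new customer because it was given labels during training. The k-means routine only returns cluster indices; interpreting what each cluster means is left to the analyst.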
Input of data > Machine learning model > Output according to algorithm applied
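The feedback loop described under reinforcement learning can be sketched with the simplest method from that list, a multi-armed bandit. The following is a minimal epsilon-greedy agent, not the course's implementation; the reward probabilities, the function name `epsilon_greedy_bandit` and all parameter values are assumptions chosen for illustration. The agent takes an action (pulls an arm), observes the reward, and feeds that reward back into its value estimates.

```python
import random


def epsilon_greedy_bandit(reward_probs, n_steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit.

    reward_probs are the (hypothetical) true success probabilities of the
    arms, unknown to the agent. Returns pull counts and value estimates.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated reward.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Feedback from the environment: reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Feed the reward back into the estimate (incremental mean).
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy_bandit([0.1, 0.5, 0.9])
best_arm = counts.index(max(counts))
```

With arms this far apart, the agent ends up pulling the most rewarding arm far more often than the others, while the small exploration rate keeps the estimates of the other arms from going stale.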
Iteration = a term used in machine learning that indicates the number of times the algorithm’s
parameters are updated. Any machine learning process consists of multiple iterations: the algorithm
does something, checks whether it is right, and tries again. 100 iterations = 100 retries in which the
machine corrects its errors
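As an illustration of what "100 iterations" means, the sketch below fits a single parameter w in y = w * x with gradient descent. This is one common way of updating parameters, not necessarily the one used in the course; the data, the learning rate and the function name `fit_slope` are assumptions. In every iteration the model predicts, measures its error, and corrects w a little.

```python
def fit_slope(xs, ys, n_iterations=100, learning_rate=0.01):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    history = []
    for _ in range(n_iterations):
        # Do something: predict with the current parameter.
        errors = [w * x - y for x, y in zip(xs, ys)]
        # Check how wrong it is.
        mse = sum(e * e for e in errors) / len(xs)
        history.append(mse)
        # Correct: move w against the gradient of the error.
        gradient = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        w -= learning_rate * gradient
    return w, history


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # generated with a true slope of 3
w, history = fit_slope(xs, ys)
```

After 100 updates w is very close to the true slope of 3, and the recorded error in `history` shrinks from one iteration to the next.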
Difference between data mining and machine learning
Data mining
The process of discovering patterns in a data set
Before machine learning
Perform data mining by using programming methods and algorithms
Helps to extract useful data from large amounts of raw data
Helps us to understand the data and make it usable
Involves manual efforts to find knowledge and insights in data
Part of the Knowledge Discovery in Databases (KDD) process
o Non-trivial process of identifying implicit, valid, novel, potentially useful, and
understandable patterns in data
o Database > Data warehouse > Data mining > Evaluation > Knowledge
Machine learning
Techniques to make computers learn new things without being explicitly programmed
Based on pattern recognition, computational learning and artificial intelligence
Main uses of machine learning are predictive analysis and classification
Algorithms / models train the system to identify patterns / learn about new insights
Strengths and weaknesses of machine learning methods
Strengths
o Ability to handle unstructured data (e.g. texts, images) and data of hybrid formats (e.g. combination of texts, images).
o Ability to handle large data volume: millions of observations are the norm.
o Flexible model structure: increases the chance of capturing true linkage between input and output variables.
o Strong predictive performance in real-world settings.
Weaknesses
o Not easy to interpret, but 1) many ML methods have statistical foundations with interpretable parameters, 2) post-hoc interpretation techniques exist and 3) models have been adapted for interpretation.
o Relationship typically correlational instead of causal: predictive focus causes little focus on endogeneity.
o Unproven on analysing individual consumer level heterogeneity and dynamics.
Over-fitting and under-fitting in machine learning
Overfitting = good performance on the training data, poor generalization to other data
An overfitted model has more parameters than the underlying data can justify and is
therefore overly complex.
The model function has too much complexity (too many parameters) to fit the true
function correctly.
Underfitting = poor performance on the training data and poor generalization to other data
An underfitted model does not have enough parameters to capture the trends in the underlying system.
The model function does not have enough complexity (parameters) to fit
the true function correctly
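The difference between the two failure modes can be shown with a small experiment. The sketch below is an illustration under stated assumptions, not course material: it uses k-nearest-neighbour regression, where flexibility is controlled by k instead of by a parameter count (k = 1 is very flexible, k = all points is very rigid), and the data with its artificial alternating noise is made up so that the result is deterministic.

```python
def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model (fitted on train) on data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def true_function(x):
    """Hypothetical true relationship between input and output."""
    return 0.5 * x


# Training data: the true function plus alternating noise of +1 / -1.
train = [
    (float(i), true_function(i) + (1.0 if i % 2 == 0 else -1.0))
    for i in range(20)
]
# Test data: noise-free points in between the training inputs.
test = [(i + 0.5, true_function(i + 0.5)) for i in range(19)]

# For each k: (training error, test error).
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 4, 20)}
for k, (train_error, test_error) in results.items():
    print(f"k={k:2d}  train MSE={train_error:.3f}  test MSE={test_error:.3f}")
```

With k = 1 the model reproduces the training data perfectly, noise included, and does clearly worse on the test points (overfitting). With k = 20 it predicts the overall mean everywhere and is poor on both sets (underfitting). The intermediate k = 4 averages the noise away and has the lowest test error of the three.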
Artificial Intelligence (AI) vs. machine learning vs. deep learning
Artificial intelligence = automated systems that make split-second, context-dependent decisions.
Generally implemented using machine learning algorithms. It is about making the machine behave in
ways that would be called intelligent if a human were behaving like that. This term is often used to
describe machines that mimic “cognitive” functions that humans associate with the human mind, such as
learning and problem solving.
Machine learning = a computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves
with experience E.
Deep learning = representation-learning methods with multiple levels of representation. These are
obtained by composing simple but non-linear modules that each transform the representation at one
level into a representation at a higher, slightly more abstract level. The adjective “deep” in deep
learning comes from the use of multiple layers in the network. Deep learning is concerned with an
unbounded number of (hidden) layers, which permits practical application and optimized implementation.
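What "composing simple but non-linear modules" means can be seen in a very small network. The sketch below is an assumption-laden illustration: the weights are hand-picked instead of learned, the names (`dense`, `relu`, `tiny_network`) are made up, and it has only one hidden layer, whereas deep networks stack many. It computes XOR, which a single linear module on its own cannot represent.

```python
def relu(values):
    """Simple non-linearity: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Linear module: one weighted sum (plus bias) per output unit."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights, chosen for illustration only.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def tiny_network(x1, x2):
    """Input > hidden representation > output, one module after another."""
    hidden = relu(dense([x1, x2], HIDDEN_W, HIDDEN_B))
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]


outputs = [tiny_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # → [0.0, 1.0, 1.0, 0.0]
```

The hidden layer turns the raw inputs into a new representation (how many inputs are on, and whether both are on); the output layer then only needs a linear combination of that representation. Deeper networks repeat this step, each layer building on the representation of the previous one.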
AI-driven marketing industry trends
Marketing trends
Interactive and media-rich: ML can generate insights in the (mobile) interactions between
consumers and the firm
Personalization and targeting: ML methods are propelling personalization and targeting to a
new level. ML assists in context-dependent targeting
Real-time optimization and automation: ML methods are the go-to solutions for
optimization and automation
Customer journey focus: ML can help firms to master the decision journey
Marketing practices
Customer engagement: AI-driven innovations are rapidly reshaping engagement practices
Search: ML can improve the relevance and robustness of search results
Recommendation: ML is used to effectively match products and consumers
Attribution: ML is able to generate accurate performance feedback and can help improve
channel design and allocation
Review of machine learning literature in marketing (Ma & Sun, 2020)
SVM: one of the first ML methods introduced to marketing. SVM predicts better than logit models,
logistic regression and hierarchical Bayes, i.e. it outperforms traditional methods.
Traditional text-mining: text-mining process includes downloading, cleaning, information extraction,
chunking, and identification of semantic relationships. Can help firms to identify response-worthy
reviews.
Topic models: can be used to identify topics from consumer search queries and webpages, and can
also show that topics from those two sources are related. Have been applied not only to text, but also
to other marketing settings where semantic structure exists (e.g. predicting purchases, user profiling).
Deep learning: most frequently used in marketing for analyzing text and images. Was used to
evaluate feature importance in predicting conversion. Was used to investigate the impact of images on
demand. Was used in a study showing that photos are more predictive of restaurant survival than
reviews. Was used to extract image features to predict a person’s attractiveness.
Tree ensembles: were used to show that personalization improved the clicks to the top position and
that the return to personalization varied with user history and query type.
Causal forest: recent advancements have made it possible to use ML for causal research. Was used to
investigate how information disclosure affects pharmaceutical companies’ payments to physicians.