Summary: Data Science Methods for MADS (EBM216A05)


Comprehensive summary for the course Data Science Methods for MADS. The summary covers both the lectures and the required literature. Additional material has also been used to clarify certain concepts.


  • 5 February 2021
  • 34 pages
  • 2020/2021
  • Summary
nikkinuman
DATA SCIENCE METHODS

1. MACHINE LEARNING
Machine learning
Herbert Alexander Simon: “Learning is any process by which a system improves performance from
experience. Machine learning is concerned with computer programs that automatically improve their
performance through experience”
 Machine learning
o Learns the data
o Build predictions of new data based on past data
o Improve performance through experience
o More past data > better model > higher accuracy of current outcomes

 Many ways in which the machine learns
o Supervised learning
 Uses labeled data to train the model
 Dependence techniques (there is a designated dependent variable)
 Correct results are known and given as an input to the model during the learning
process
 Seeks to learn from a training set to predict the output when given an input, less
concerned with the “true” linkage between variables.
 Methods: logistic regression, support vector machines (SVM), k-nearest neighbours, Naïve
Bayes, decision trees and artificial neural networks
 Ensemble methods: meta-learning algorithms that combine multiple
individual learners (e.g. random forest, gradient boosted trees, XGBoost).
 Probabilistic graphical models: use (un)directed graphs to encode the
conditional dependence of the random variable (e.g. Bayesian networks,
Markov random fields)
 Deep neural networks: artificial neural networks with more than one
hidden layer (e.g. convolutional neural networks, recurrent neural
networks).
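
As an illustration of learning from labeled data, the following is a minimal sketch of a k-nearest neighbour classifier in plain Python. The training set (recency in weeks, number of purchases, and a churn label) and the function name `knn_predict` are hypothetical and chosen only for illustration; they do not come from the course material.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest_labels = [train_y[i] for i in order[:k]]
    # The most frequent label among the k neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled data: (recency in weeks, number of purchases) -> known outcome.
train_X = [(1, 9), (2, 8), (3, 7), (10, 1), (12, 2), (11, 0)]
train_y = ["stays", "stays", "stays", "churns", "churns", "churns"]

prediction_a = knn_predict(train_X, train_y, (2, 7))
prediction_b = knn_predict(train_X, train_y, (11, 1))
print(prediction_a, prediction_b)
```

The correct labels are supplied during training, and the model only has to map a new input to an output; it makes no claim about why the variables are linked.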

o Unsupervised learning
 Clustering data based on similar characteristics
 No labeled data
 Interdependence techniques (no designated dependent variable)
 Goal is to find hidden patterns in the data.
 Clustering, factor analysis
 Methods: clustering (K-means, hierarchical, DBSCAN), dimensionality reduction
(PCA, singular value decomposition, factor analysis).
 Topic models: discover and extract semantic structures from textual data
(e.g. Latent Dirichlet Allocation (LDA))
 Representation learning: allows a system to automatically discover the
representation needed for feature detection or classification from raw data
(e.g. autoencoder, word embedding, network embedding)
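
To make the idea of finding structure without labels concrete, here is a minimal sketch of K-means in plain Python. The six two-dimensional points and the choice of initial centroids are hypothetical; a real application would typically use random restarts and a convergence check.

```python
import math

def kmeans(points, centroids, n_iterations=10):
    """Plain K-means: alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(n_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled data with two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
initial_centroids = [points[0], points[3]]

centroids, assignments = kmeans(points, initial_centroids)
print(centroids, assignments)
```

No labels are passed in; the grouping emerges only from the similarity between observations.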

o Reinforcement learning
 Based on feedback; learn from feedback given
 Feedback is given to output and put back into the machine learning model
 Thus, the learning agent interacts with the environment by taking actions and
observing feedback in order to optimize a certain objective function
 Methods: multi-armed bandit, dynamic programming, SARSA, n-step temporal
difference, deep Q-network.
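
The feedback loop can be sketched with an epsilon-greedy multi-armed bandit. This is only one simple strategy among many, and the click rates, the number of rounds and the value of `epsilon` below are hypothetical values picked for illustration.

```python
import random

def epsilon_greedy_bandit(true_rates, n_rounds=5000, epsilon=0.1, seed=0):
    """Learn which arm pays best purely from observed rewards."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms  # running average reward per arm
    total_reward = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit the best estimate
        # The environment returns feedback (reward 1 or 0) for the chosen action.
        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        # Feed the reward back into the estimate (incremental mean update).
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return estimates, pulls, total_reward

# Hypothetical click-through rates of three ad variants, unknown to the agent.
estimates, pulls, total_reward = epsilon_greedy_bandit([0.05, 0.10, 0.30])
print(estimates, pulls, total_reward)
```

The agent is never told the correct action; it only observes the reward of the action it took and uses that to adjust future choices.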

o Ma & Sun (2020) name some other machine learning methods
 Semi-supervised learning = the output is known for only a subset of data
 Transfer learning = adjusting an existing model, which is trained using a different
dataset for a different purpose, based on the current training data set, for the task
at hand
 Active learning = only a limited number of training instances is available at first; the
algorithm can acquire additional instances to improve predictive accuracy, but doing so is
costly, so it has to determine which training instances are the most important

 Input of data → machine learning model → output according to the algorithm applied

 Iteration = a term used in machine learning for one update of the algorithm’s parameters; the
number of iterations is the number of times the parameters are updated. Any machine learning
procedure consists of multiple iterations: it makes a prediction, checks how wrong it is, and
adjusts. 100 iterations = 100 re-tries in which the machine corrects errors
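
The notion of an iteration can be illustrated with gradient descent on a one-parameter model. This is a minimal sketch: the data (generated by y = 2x), the learning rate and the function name `fit_slope` are assumptions made for the example.

```python
def fit_slope(xs, ys, n_iterations=100, learning_rate=0.01):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    history = []
    for _ in range(n_iterations):
        # Check how wrong the current parameter is: gradient of the mean squared error.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # One parameter update = one iteration.
        w -= learning_rate * gradient
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        history.append(loss)
    return w, history

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # hypothetical data generated by y = 2x

w, history = fit_slope(xs, ys)
print(w, history[0], history[-1])
```

Each pass through the loop is one re-try: the error after the last iteration is far smaller than after the first, and `w` ends up close to the true slope of 2.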

Difference between data mining and machine learning
Data mining
 The process of discovering patterns in a data set
 Before machine learning
 Perform data mining by using programming methods and algorithms
 Helps to extract useful data from large amounts of raw data
 Helps us to understand the data and make it usable
 Involves manual efforts to find knowledge and insights in data
 Part of the Knowledge Discovery in Databases (KDD) process
o Non-trivial process of identifying implicit, valid, novel, potentially useful, and
understandable patterns in data
o Database → data warehouse → data mining → evaluation → knowledge

Machine learning
 Techniques to make computers learn new things without explicitly programming
 Based on pattern recognition, computational learning and artificial intelligence
 Main uses of machine learning are predictive analysis and classification
 Algorithms / models train the system to identify patterns / learn about new insights

Strengths and weaknesses of machine learning methods
Strengths
 Ability to handle unstructured data (e.g. texts, images) and data of hybrid formats (e.g. a
combination of texts and images).
 Ability to handle large data volumes: millions of observations are the norm.
 Flexible model structure: increases the chance of capturing the true linkage between input and
output variables.
 Strong predictive performance in real-world settings.

Weaknesses
 Not easy to interpret, but 1) many ML methods have statistical foundations with interpretable
parameters, 2) post-hoc interpretation techniques exist and 3) models have been adapted for
interpretation.
 Relationships are typically correlational instead of causal: the predictive focus leaves little
attention for endogeneity.
 Unproven for analysing individual consumer-level heterogeneity and dynamics.




Over-fitting and under-fitting in machine learning
Overfitting = good performance on the training data, poor generalization to other data
 An overfitted model has more parameters than the underlying data can justify, which makes it
overly complex.

The model function has too much complexity (too many parameters) to fit the true
function correctly.




Underfitting = poor performance on the training data and poor generalization to other data
 An underfitted model does not have enough parameters to capture the trends in the underlying system.

The model function does not have enough complexity (parameters) to fit
the true function correctly.
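
The contrast between the two failure modes can be sketched by comparing training and test error for three models of increasing flexibility. Everything here is an assumption made for illustration: the data come from a hypothetical true function y = 2x with alternating +1/-1 noise on the training labels, and a memorizing 1-nearest-neighbour model stands in for an over-parameterized model.

```python
def mse(model, xs, ys):
    """Mean squared error of `model` on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: true function y = 2x; training labels carry alternating +1/-1 noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
test_x = [i + 0.25 for i in range(9)]
test_y = [2 * x for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)

# Too simple: always predicts the training mean (cannot capture the trend).
def constant_model(x):
    return mean_y

# Adequate complexity: least-squares straight line.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
intercept = mean_y - slope * mean_x

def line_model(x):
    return intercept + slope * x

# Too flexible: memorizes every training point, including its noise.
def memorizer(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

models = {"constant": constant_model, "line": line_model, "memorizer": memorizer}
train_mse = {name: mse(m, train_x, train_y) for name, m in models.items()}
test_mse = {name: mse(m, test_x, test_y) for name, m in models.items()}

for name in models:
    print(f"{name:10s} train MSE = {train_mse[name]:.3f}  test MSE = {test_mse[name]:.3f}")
```

The constant model is poor on both sets (underfitting), the memorizer is perfect on the training data but worse than the line on the test data (overfitting), and the line generalizes best.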




Artificial Intelligence (AI) vs. machine learning vs. deep learning
Artificial intelligence = Automated systems that make split-second context-dependent decisions.
Generally implemented using machine learning algorithms. It is about making the machine behave in
ways that would be called intelligent if a human were behaving like that. This term is often used to
describe machines that mimic “cognitive” functions that humans associate with the human mind, such as
learning and problem solving.
Machine learning = a computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves
with experience E.
Deep learning = representation-learning methods with multiple levels of representation. Obtained by
composing simple but non-linear modules that each transform the representation at one level into a
representation at a higher, slightly more abstract level. The adjective “deep” in deep learning comes
from the use of multiple layers in the network. Deep learning is concerned with an unbounded number
of (hidden) layers which permits practical application and optimized implementation.
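
The idea of composing simple but non-linear modules can be sketched as a forward pass through a tiny network. The weights below are hand-picked, hypothetical values and are not trained; the sketch only shows how each layer transforms the representation produced by the previous one.

```python
def dense(inputs, weights, biases):
    """Linear module: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def relu(values):
    """Simple non-linearity: negative values are set to zero."""
    return [max(0.0, v) for v in values]

def forward(x, layers):
    """Pass the input through every hidden layer, then through a linear output layer."""
    representation = x
    for weights, biases in layers[:-1]:
        # Each hidden layer maps the current representation to a new one.
        representation = relu(dense(representation, weights, biases))
    weights, biases = layers[-1]
    return dense(representation, weights, biases)

# Hypothetical hand-picked weights: two hidden layers followed by an output layer.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer 1: 2 inputs -> 2 units
    ([[1.0, 1.0]], [0.0]),                     # hidden layer 2: 2 units -> 1 unit
    ([[2.0]], [1.0]),                          # output layer: 1 unit -> 1 output
]

output = forward([3.0, 1.0], layers)
print(output)
```

With these particular weights the network computes 2·|x1 − x2| + 1, a function that a single linear layer cannot represent; the non-linearity between the layers is what makes the stacking useful.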


AI-driven marketing industry trends
Marketing trends
 Interactive and media-rich: ML can generate insights in the (mobile) interactions between
consumers and the firm
 Personalization and targeting: ML methods are propelling personalization and targeting to a
new level. ML assists in context-dependent targeting
 Real-time optimization and automation: ML methods are the go-to solutions for
optimization and automation
 Customer journey focus: ML can help firms to master the decision journey

Marketing practices
 Customer engagement: AI-driven innovations are rapidly reshaping engagement practices
 Search: ML can improve the relevance and robustness of search results
 Recommendation: ML is used to effectively match products and consumers
 Attribution: ML is able to generate accurate performance feedback and can help improve
channel design and allocation

Review of machine learning literature in marketing (Ma & Sun, 2020)
SVM: one of the first ML methods introduced to marketing. In the reviewed studies, SVM predicts
better than logit models, logistic regression and hierarchical Bayes, i.e. it outperforms traditional methods.

Traditional text-mining: text-mining process includes downloading, cleaning, information extraction,
chunking, and identification of semantic relationships. Can help firms to identify response-worthy
reviews.

Topic models: can be used to identify topics from consumer search queries and webpages, and can
also show that topics from those two sources are related. They have been applied not only to text, but
also to other marketing settings where semantic structure exists (e.g. predicting purchases, user profiling).

Deep learning: most frequently used in marketing for analyzing text and images. It was used to
evaluate feature importance in predicting conversion, to investigate the impact of images on
demand, in a study showing that photos are more predictive of restaurant survival than
reviews, and to extract image features to predict a person’s attractiveness.

Tree ensembles: were used to show that personalization improved the clicks to the top position and
that the return to personalization varied with user history and query type.

Causal forest: recent advancements have made it possible to use ML for causal research. Was used to
investigate how information disclosure affects pharmaceutical companies’ payments to physicians.
