Data Science Methods (DSM) all papers of reading list summary
Table of contents
LECTURE 1
Blattberg, R.C., Kim, B.-D., and Neslin, S.A. (2008), Chapter 11.4 "Evaluation of Statistical Models" (pp. 308-320) in "Database Marketing: Analyzing and Managing Customers," Springer, New York
Kübler, R., Wieringa, J.E., and Pauwels, K.H. (2017), "Machine Learning and Big Data," Chapter 19 in: Leeflang et al. (eds), "Advanced Methods for Modeling Markets," Springer, New York
De Haan, E., Verhoef, P.C., and Wiesel, T. (2015), "The predictive ability of different customer feedback metrics for retention," International Journal of Research in Marketing, 32(2), 195-206
LECTURE 2
Neslin, S.A., Gupta, S., Kamakura, W., Lu, J., and Mason, C.H. (2006), "Defection detection: Measuring and understanding the predictive accuracy of customer churn models," Journal of Marketing Research, 43(2), 204-211
Lemmens, A., and Croux, C. (2006), "Bagging and boosting classification trees to predict churn," Journal of Marketing Research, 43(2), 276-286
LECTURE 3
Holtrop, N., Wieringa, J.E., Gijsenberg, M.J., and Verhoef, P.C. (2017), "No future without the past? Predicting churn in the face of customer privacy," International Journal of Research in Marketing, 34(1), 154-172
LECTURE 4
Hastie et al., "The Elements of Statistical Learning," Chapter 3
Hastie et al., "The Elements of Statistical Learning," Chapter 7
Hastie et al., "The Elements of Statistical Learning," Chapter 9.4
LECTURE 5
Misra, K., Schwartz, E.M., and Abernethy, J. (2019), "Dynamic online pricing with incomplete information using multiarmed bandit experiments," Marketing Science, 38(2), 226-252
Russo, D.J., Van Roy, B., Kazerouni, A., Osband, I., and Wen, Z., "A Tutorial on Thompson Sampling," Chapters 1-4
LECTURE 6
Verhoef, P.C., Kooge, E., Walk, N., and Wieringa, J.E. (2022), "Creating Value with Data Analytics in Marketing," Routledge, Chapter 6 ("Customer Privacy and Data Security")
Tsalikis, J., and Fritzsche, D.J. (2013), "Business Ethics: A Literature Review with a Focus on Marketing Ethics," Citation Classics from the Journal of Business Ethics, 337-404
Ethics Guidelines for Trustworthy AI, Chapters I and II
LECTURE 7
Athey, S., and Imbens, G.W. (2015), "Machine learning methods for estimating heterogeneous causal effects," stat, 1050(5), 1-26
*The key takeaways and the summaries may, of course, contain overlapping information.
LECTURE 1
Chapter 11.4 "Evaluation of Statistical Models" (p. 308-320) from Blattberg, R.C., Kim, B.-
D., and Neslin, S.A. (2008), "Database Marketing: Analyzing and Managing Customers"
Springer, New York.
Chapter 11.4, "Evaluation of Statistical Models" (pp. 308-320), of Blattberg, R.C., Kim, B.-D., and Neslin, S.A. (2008), "Database Marketing: Analyzing and Managing Customers," discusses how statistical models are used to make predictions or to understand the relationships between variables, and how the accuracy of these models can be evaluated.
In this chapter, the authors cover a number of topics related to model evaluation, including validation data sets, cross-validation techniques, the calculation of prediction error, and measures of model fit such as R-squared. They also discuss overfitting, which occurs when a model is too complex and does not generalize well to new data, and the importance of choosing an appropriate level of model complexity.
Overall, this chapter provides an overview of the methods and techniques used to evaluate
the performance of statistical models in the context of database marketing.
There are several methods and techniques that are commonly used to evaluate the
performance of statistical models in the context of database marketing:
1. Validation data sets: A model can be trained on one set of data and tested on a
separate, independent validation set in order to evaluate its performance. This allows for an
estimate of how the model will perform on new, unseen data.
2. Cross-validation: This is a technique in which the data is split into a number of folds,
and the model is trained and tested on different combinations of the folds. This can provide
a more robust estimate of model performance, as it uses multiple train-test splits.
3. Prediction error: The difference between the predicted values and the actual values
can be used to measure the accuracy of a model. Common measures of prediction error
include mean absolute error (MAE), mean squared error (MSE), and root mean squared
error (RMSE).
4. Model fit measures: Measures such as R-squared can be used to evaluate the degree to which the model explains the variance in the data. A higher R-squared value indicates a better in-sample fit, although a very high value can also be a sign of overfitting.
5. Overfitting: It is important to avoid overfitting, which occurs when a model is too complex and does not generalize well to new data. One way to avoid overfitting is to use regularization techniques, which constrain the model to prevent it from becoming too complex.
Overall, the choice of evaluation methods will depend on the specific goals and context of
the modeling project, as well as the nature of the data and the type of model being used.
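As a concrete illustration of the evaluation tools listed above (a sketch of our own, not code from the chapter; all function names are hypothetical), the following plain-Python snippet computes the common prediction-error measures, an R-squared, and k-fold splits for cross-validation:

```python
# Illustrative sketch (not from the chapter): holdout/CV building blocks
# and the standard prediction-error measures.
import math
import random

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))

def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k roughly equal folds;
    each fold serves once as the test set while the rest form the train set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy example: five observations with predictions from some fitted model.
actual = [3.0, 5.0, 2.5, 7.0, 4.5]
predicted = [2.8, 5.4, 2.0, 6.5, 4.9]
print("MAE:", round(mae(actual, predicted), 3))
print("RMSE:", round(rmse(actual, predicted), 3))
print("R^2:", round(r_squared(actual, predicted), 3))
print("5-fold split of 10 cases:", k_fold_indices(10, 5))
```

In practice one would compute these measures on the holdout (validation) data, not on the estimation sample, which is exactly the point the chapter makes about overfitting.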
A few additional key takeaways from Chapter 11.4 of Blattberg, R.C., Kim, B.-D., and Neslin, S.A. (2008), "Database Marketing: Analyzing and Managing Customers":
1. It is important to carefully evaluate the performance of statistical models in order to
ensure that they are reliable and accurate.
2. There are a variety of methods and techniques that can be used to evaluate model
performance, including the use of validation data sets, cross-validation, prediction error
measures, and model fit measures.
3. It is important to avoid overfitting, which occurs when a model is too complex and does not generalize well to new data.
4. Choosing an appropriate level of model complexity is important for achieving good
model performance. A model that is too simple may not capture the complexity of the data,
while a model that is too complex may overfit the data.
5. Model evaluation should be an ongoing process, as the performance of a model can
change over time as the underlying data changes. It is important to periodically re-evaluate
and update models as needed to ensure that they continue to perform well.
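Takeaways 3 and 4 can be made concrete with a small sketch of our own (not an example from the chapter): for a single-predictor regression without an intercept, the ridge-regularized coefficient has the closed form beta = sum(x*y) / (sum(x^2) + lambda), so a larger penalty lambda shrinks the coefficient toward zero, constraining model complexity:

```python
# Illustrative sketch (not from the chapter): ridge regularization for a
# one-predictor, no-intercept regression. Increasing the penalty `lam`
# shrinks the estimated slope toward zero.

def ridge_slope(x, y, lam):
    """Closed-form ridge estimate: sum(x*y) / (sum(x^2) + lam).
    With lam = 0 this reduces to the ordinary least-squares slope."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi ** 2 for xi in x) + lam)

# Toy data: y is roughly 2 * x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

for lam in (0.0, 1.0, 10.0):
    print("lambda =", lam, "-> slope =", round(ridge_slope(x, y, lam), 3))
```

The shrinkage trades a little bias for lower variance, which is why regularized models often generalize better to new data than an unconstrained fit.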
Kübler, R., Wieringa, J.E. and Pauwels, K.H. (2017) "Machine Learning and Big Data"
Chapter 19 in: Leeflang et al. (eds), "Advanced Methods for Modeling Markets" Springer,
New York.
The key takeaways of "Machine Learning and Big Data," Chapter 19 in "Advanced Methods for Modeling Markets" by Kübler, R., Wieringa, J.E. and Pauwels, K.H. (2017), are:
1. The volume of data available to marketers has increased significantly in recent years,
including data from consumer handscan panels, brand tracking, clickstream data, and social
media interactions.
2. These data sources can be used to build models that combine consumer actions with
mindset metrics and to understand customers' online decision journeys.
3. The increasing volume of social media data provides new opportunities for marketing
research and can be combined with existing data sources to further develop the field.
4. Big data analytics can be used to create value for customers and firms.
Overall, the chapter discusses the ways in which the availability of large amounts of data, including data from social media, is transforming marketing research and the opportunities it provides for understanding and predicting consumer behavior.
Summary of the introduction:
In the last decade, there has been a significant increase in data available to marketers. This
data includes consumer actions and mindset metrics, clickstream data, and social media
interaction data. Combining these data sources allows for further development in marketing
research and provides new opportunities for insights. Predictive modeling has long been
used in marketing research and practice, and machine learning has recently become more
prevalent in marketing due to the increase in data availability. Machine learning allows for
the development of algorithms that can classify customers and make real-time advertising