PREDICTIVE AND PRESCRIPTIVE ANALYTICS
LECTURE 1: LINEAR PROGRAMMING
Analytics

The point of analytics is always to make better decisions by extracting knowledge from data.
There are three kinds: descriptive, predictive and prescriptive analytics.

With predictive analytics you want to predict something about the future. Predictive analytics is a
subfield of machine learning; the difference lies in what is being predicted. Predictive analytics'
predictions are about the future, while machine learning predictions are about unseen data (which
need not lie in the future).

Prescriptive Analytics

Prescriptive analytics prescribes which actions or decisions (e.g. to maximize sales) would most
likely optimize the outcome of a certain decision process, by making use of auxiliary data. It tells
you what to do with the predictions produced in the predictive analytics step.

Auxiliary data = data from an external source that is incorporated in, or linked in some way to, the
data collected by the study.




Example of prescriptive analytics: Amazon, like any other commercial site, recommends other
products to buy based on products you have already bought. In this way it prescribes to you what to
buy; this is Amazon's way of increasing retention. There is always an optimization model behind it.

Prescriptive Analytics has four main components:

- Optimization
- IT and Big Data
- Data Understanding
- Statistics and Data Mining

Prescriptive Analytics is mainly a mix of Predictive Analytics and Optimization:




It uses predictive data to prescribe actions for the future, which carries uncertainty. This is handled
by using dynamic programming, stochastic optimization, reinforcement learning…

The definition of prescriptive analytics states: “A set of mathematical techniques that
computationally determine a set of high-value alternative actions or decisions given a complex set of
objectives, requirements and constraints, with the goal of improving business performance.”

Main takeaways of this definition: it is a mathematical model, it is built computationally, we will
come up with an objective that has constraints and, most importantly, we will IMPROVE something.

There is also a focus on decision making using auxiliary data: meaning we will make predictions that
help us make better decisions and increase our decision power.

1. Data Representation:
   a. Uncertain quantities of interest: Y_1, …, Y_n, where Y ∈ 𝒴 ⊆ ℝ^(d_y).
   b. Auxiliary data on associated covariates: X_1, …, X_n, where X ∈ 𝒳 ⊆ ℝ^(d_x).
2. Objective Function:
   a. Minimization of an uncertain cost: v = c(z; Y).
3. Standard Model:
   a. Standard stochastic cost: v_stoch = min_z E[c(z; Y)].
4. Model with Auxiliary Data:
   a. Cost conditioned on auxiliary data: v*(x) = min_z E[c(z; Y) | X = x].
5. Full-information Optimal Decision:
   a. v*(x) represents the optimal cost attainable with complete knowledge of the data.
6. Predictive Prescriptions:
   a. Predictive prescriptions ẑ_n(x) are decisions made based on the observed covariates x.
   b. Goal: achieve E[c(ẑ_n(x); Y) | X = x] ≈ v*(x), with this cost lower than v_stoch.


In other words: prescriptive analytics uses uncertain quantities Y and covariates X to minimize the
uncertain cost v = c(z; Y). The standard model minimizes the unconditional expected cost v_stoch,
while the model with auxiliary data minimizes the expected cost v*(x) conditioned on X = x. The goal
is to make predictive prescriptions ẑ_n(x) that approximate the optimal decision, achieving a cost
close to v*(x) and lower than v_stoch.
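
To make the notation concrete, below is a minimal sketch of a predictive prescription on synthetic data: a kNN-style newsvendor in which the order ẑ_n(x) minimizes the cost averaged over the k historical observations nearest to x, rather than over all observations (which would be the standard stochastic model). All names and numbers (prescribe, cost, prices, the demand model) are made up for illustration.

```python
# A sketch of a predictive prescription: a kNN newsvendor on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 1, size=(n, 1))           # covariates, e.g. a weather index
Y = 50 + 40 * X[:, 0] + rng.normal(0, 5, n)  # demand that depends on X

price, unit_cost = 10.0, 6.0  # hypothetical sell price and purchase cost

def cost(z, y):
    """c(z; Y): cost (negative profit) of ordering z units when demand is y."""
    return unit_cost * z - price * np.minimum(z, y)

def prescribe(x, k=20):
    """z_n(x): minimize the cost averaged over the k nearest neighbours of x."""
    idx = np.argsort(np.abs(X[:, 0] - x))[:k]
    grid = np.linspace(0, 120, 241)  # candidate order quantities
    return grid[np.argmin([cost(z, Y[idx]).mean() for z in grid])]

# Conditioning on X shifts the decision; the standard model min E[c(z; Y)]
# would prescribe one fixed order quantity for every x.
print(prescribe(0.2), prescribe(0.9))
```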

Prescriptive Performance

The performance metric used in prescriptive analytics is the Coefficient of Prescriptiveness (P). It
measures the efficacy of a predictive prescription. It is a unitless measure that lies between 0 and
1, with 0 being not prescriptive and 1 being highly prescriptive.

The formula is 1 minus a ratio that compares:

- the minimal cost achieved when utilizing auxiliary data, relative to the minimal cost achieved
  with perfect information (i.e., using the actual values), with
- the minimal cost achieved when excluding the auxiliary data, relative to the minimal cost achieved
  with perfect information.

In symbols: P = 1 − (cost with auxiliary data − cost with perfect information) / (cost without
auxiliary data − cost with perfect information).



In other words, it is a measure used in prescriptive analytics to quantify the effectiveness or impact
of utilizing auxiliary data in decision-making processes. It represents the degree to which
incorporating auxiliary data closes the gap between decisions made without it and decisions made
with perfect information.
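
As a small worked example (all cost numbers are made up), the computation reduces to one line:

```python
# Sketch of the coefficient of prescriptiveness with hypothetical costs.
cost_with_aux = 120.0      # out-of-sample cost of prescriptions using auxiliary data
cost_without_aux = 200.0   # cost of the prescription that ignores auxiliary data
cost_perfect_info = 100.0  # cost achievable with perfect information (actual values)

P = 1 - (cost_with_aux - cost_perfect_info) / (cost_without_aux - cost_perfect_info)
print(P)  # 0.8: auxiliary data closes 80% of the gap to perfect information
```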

Optimization Methods

Optimization methods vary in how they handle uncertainty and complexity. Exact methods, such as
linear programming for fractional and linear problems, typically provide the best solutions. Integer
programming, dealing with integer and linear problems, also offers exact solutions. (Meta-)heuristics
and (non-)linear programming, addressing integer/fractional and linear/non-linear problems
respectively, provide good solutions but may not guarantee optimality.

When conducting prescriptive analytics in Python, a linear optimization problem can be tackled using
the PuLP package. Additionally, sensitivity analysis can be performed using two key quantities:
shadow prices and reduced costs. A shadow price is the rate of change of the objective value when the
right-hand side of a constraint is increased by 1 unit. A reduced cost is the amount by which the
coefficient of a decision variable must change before that variable takes a positive value in the
optimal solution. Notably, the reduced cost equals the shadow price of the variable's non-negativity
constraint. It is essential to note that both shadow prices and reduced costs are only valid within
the allowable decrease/increase limits.
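
A minimal sketch of such an analysis with PuLP, on a made-up two-variable production problem (all coefficients are hypothetical); after solving, PuLP exposes the shadow price (pi) and slack per constraint and the reduced cost (dj) per variable:

```python
# Toy LP in PuLP with sensitivity analysis (shadow prices and reduced costs).
import pulp

prob = pulp.LpProblem("toy_lp", pulp.LpMaximize)
x = pulp.LpVariable("x", lowBound=0)
y = pulp.LpVariable("y", lowBound=0)

prob += 3 * x + 2 * y, "profit"            # objective
prob += 2 * x + y <= 100, "labour"         # resource constraints
prob += x + 3 * y <= 90, "material"

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))

for name, c in prob.constraints.items():
    # pi: objective change per unit increase of the constraint's RHS
    print(name, "shadow price:", c.pi, "slack:", c.slack)
for v in prob.variables():
    # dj: how much the variable's objective coefficient must improve
    # before the variable becomes positive in the optimal solution
    print(v.name, "value:", v.varValue, "reduced cost:", v.dj)
```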

A Branch & Bound procedure begins by solving the relaxed Linear Programming (LP) problem with
real-valued variables. It then divides the solution space into subsets through branching, ensuring
the subsets are mutually exclusive and exhaustive. During the bounding phase, the relaxation bound of
each branch is compared against the best candidate solution found so far, and branches that cannot do
better are discarded. This method systematically explores the solution space, leading to efficient
identification of optimal solutions while minimizing computational effort.
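
A minimal sketch of the idea on a made-up 0/1 knapsack instance, where the LP relaxation (the fractional knapsack, solvable greedily once items are sorted by value density) provides the bound at each node:

```python
# Branch-and-bound sketch for a 0/1 knapsack (all numbers made up).
ITEMS = [(60, 10), (100, 20), (120, 30)]  # (value, weight), sorted by value/weight
CAPACITY = 50
best = 0  # incumbent: best feasible integer solution found so far

def relaxation_bound(idx, value, weight):
    """Upper bound from the LP relaxation: fill remaining capacity greedily,
    allowing a fractional final item."""
    bound, cap = value, CAPACITY - weight
    for v, w in ITEMS[idx:]:
        if w <= cap:
            bound, cap = bound + v, cap - w
        else:
            return bound + v * cap / w  # fractional last item
    return bound

def branch(idx, value, weight):
    """Branch on taking/skipping item idx; prune subtrees whose bound
    cannot beat the incumbent."""
    global best
    if weight > CAPACITY:
        return  # infeasible branch
    best = max(best, value)  # every feasible subset is a candidate solution
    if idx == len(ITEMS) or relaxation_bound(idx, value, weight) <= best:
        return  # bounding step: this subtree cannot improve the incumbent
    branch(idx + 1, value + ITEMS[idx][0], weight + ITEMS[idx][1])  # take item
    branch(idx + 1, value, weight)                                  # skip item

branch(0, 0, 0)
print(best)  # 220: take the second and third item
```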

LECTURE 2: PREDICTIVE AND PRESCRIPTIVE ANALYTICS
When conducting a case study or any sort of business problem, the CRISP-DM model is the go-to
process to follow. It stands for Cross Industry Standard Process for Data Mining and involves the
following steps: business understanding, data understanding, data preparation, modeling, evaluation
and deployment.

Business & Data Understanding

These go hand in hand. You start by understanding what the business problem is, what the company
in question is dealing with and what they want to do to tackle it. This is followed by taking a
closer look at the data for the first time: making some first descriptive statistics, checking for
missing values…

Data Preparation

Here you create a basetable, which you will use to build models. Split your dataset into training,
validation and test sets. Check whether all variables are necessary and remove those that are not,
either with PCA or, preferably, with Recursive Feature Elimination. This variable selection can
decrease the computation time substantially. The major strength of this approach is that it selects
variables using a well-performing prediction algorithm and an established variable importance score.

Recursive Feature Elimination (RFE) = a useful algorithm for feature selection, particularly when
dealing with numerous predictors. It works by iteratively removing the least important predictors
until a stopping criterion is met. Random Forest (RF) is well suited for RFE due to its reliable
variable importance scores and because it does not exclude variables from the final prediction
equation. However, correlations among predictors should be carefully considered. Other tree-based
methods like gradient boosting are also viable options for RFE.

scikit-learn provides built-in functions for both Recursive Feature Elimination (RFE) and Recursive
Feature Elimination with Cross-Validation (RFECV). The main difference between the two is that RFE
requires manually selecting the number of features, while RFECV automates this process by
conducting cross-validation to determine the optimal number of features. Running RFE can be time-
consuming, but several strategies can decrease the computation time. These include reducing the
number of features or tuning the parameters of the random forest classifier, such as decreasing the
number of trees or limiting the depth of each tree. Additionally, methods like parallelization,
feature sampling or model caching can also be effective in reducing running time.
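
A minimal sketch of both variants with scikit-learn on synthetic data, using a deliberately small random forest to keep the running time down:

```python
# RFE vs RFECV with a random forest on synthetic classification data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, RFECV

X, y = make_classification(n_samples=500, n_features=25, n_informative=5,
                           random_state=42)

rf = RandomForestClassifier(n_estimators=100, max_depth=5, n_jobs=-1,
                            random_state=42)

# RFE: you choose the number of features to keep yourself.
rfe = RFE(estimator=rf, n_features_to_select=5, step=1).fit(X, y)
print("RFE selected:", rfe.get_support(indices=True))

# RFECV: cross-validation picks the number of features for you.
rfecv = RFECV(estimator=rf, step=1, cv=5, scoring="roc_auc", n_jobs=-1).fit(X, y)
print("RFECV optimal number of features:", rfecv.n_features_)
```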

Next to RFE, Boruta is another common variable selection method used with Random Forests. It
gradually eliminates features that are less relevant than randomly generated variables, while
BorutaShap enhances this process by incorporating Shapley values and providing the flexibility to use
any tree-based model. Both methods follow a similar procedure, involving the addition and shuffling
of shadow variables (= copies of all variables), ranking features, and iteratively selecting those
that surpass importance thresholds. BorutaShap is more flexible than Boruta since any tree-based
model can be used and the importance measure can be Shapley values or mean decrease in Gini. Its
computational overhead is also lower thanks to a sampling procedure that takes the smallest possible
subset of the data at each iteration.
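
A minimal sketch of Boruta using the boruta package's BorutaPy (assuming it is installed, e.g. via pip install Boruta); the data here is synthetic:

```python
# Boruta via the boruta package (BorutaPy) on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

X, y = make_classification(n_samples=500, n_features=15, n_informative=4,
                           random_state=1)

rf = RandomForestClassifier(max_depth=5, n_jobs=-1, random_state=1)

# BorutaPy adds shuffled shadow copies of all features and confirms only
# the features whose importance beats the best shadow feature.
boruta = BorutaPy(estimator=rf, n_estimators="auto", random_state=1)
boruta.fit(X, y)  # expects numpy arrays, not DataFrames

print("Confirmed features:", np.where(boruta.support_)[0])
print("Tentative features:", np.where(boruta.support_weak_)[0])
```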

Choosing between RFE and Boruta depends on the goal: RFE aims for a minimal-optimal variable subset
for optimal classification (prediction), whereas Boruta aims to identify all relevant variables for a
comprehensive understanding. While both methods offer similar performance and computation time,
Boruta tends to be slightly more accurate.

Alternative variable selection methods include:

- Filter methods: gauge correlations between predictors and responses (see the sketch after this list)
  o Pearson correlation, Fisher score, Cramer’s V…
- Wrapper methods: use external search procedures for subset selection
o RFE…
- Embedded methods: optimize objective functions while considering the feature count
o Lasso, Ridge…
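
As a small illustration of a filter method, the sketch below ranks features by a univariate ANOVA F score with scikit-learn, independently of any downstream model (data is synthetic):

```python
# Filter method: univariate feature scoring with SelectKBest.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           random_state=0)
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print("Top features:", selector.get_support(indices=True))
```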

