PREDICTIVE AND PRESCRIPTIVE ANALYTICS
LECTURE 1: LINEAR PROGRAMMING
Analytics

The point of analytics is always to make better decisions by extracting knowledge from data.
There are three kinds: descriptive, predictive and prescriptive analytics.

With predictive analytics you want to predict something about the future. Predictive analytics is a
subfield of machine learning; the difference lies in what is being predicted. Predictive analytics makes
predictions about the future, while machine learning makes predictions on unseen data in general.

Prescriptive Analytics

Prescriptive analytics prescribes which actions/decisions (e.g. maximize sales) would most likely
optimize the outcome of a certain decision process, by the use of auxiliary data. It tells you what to
do with the predictions from the predictive analytics part.

Auxiliary data = data from an external source that is incorporated or linked in some way to the data
collected by the study.

An example of prescriptive analytics: Amazon, or any other commercial site, will recommend other
products to buy based on products you have already bought. This way they are prescribing to you what
to buy, which is Amazon's way of increasing retention. There is always an optimization model behind
it.

Prescriptive Analytics has four main components to it:

- Optimization
- IT and Big Data
- Data Understanding
- Statistics and Data Mining

Prescriptive Analytics is mainly a mix of Predictive Analytics and Optimization.

It uses predictive data to prescribe decisions about the future, which holds uncertainty. This is done
by using dynamic programming, stochastic optimization, reinforcement learning…

The definition of prescriptive analytics states: “A set of mathematical techniques that
computationally determine a set of high-value alternative actions or decisions given a complex set of
objectives, requirements and constraints, with the goal of improving business performance.”

Main takeaways of this definition: it is a mathematical model, it is built computationally, we will
come up with an objective subject to constraints, and most importantly we will IMPROVE something.

Also, there is a focus on decision making using auxiliary data: we will make predictions that help
us make better decisions and increase our decision power.

1. Data Representation:
   a. Uncertain quantities of interest: Y_1, …, Y_N, where Y ∈ 𝒴 ⊆ R^(d_y).
   b. Auxiliary data on associated covariates: X_1, …, X_N, where X ∈ 𝒳 ⊆ R^(d_x).
2. Objective Function:
   a. Minimization of an uncertain cost: v = c(z; Y).
3. Standard Model:
   a. Standard stochastic cost: v_stoch = min_z E[c(z; Y)].
4. Model with Auxiliary Data:
   a. Cost conditioned on auxiliary data: v*(x) = min_z E[c(z; Y) | X = x].
5. Full-information Optimal Decision:
   a. v*(x) represents the optimal cost attainable with complete knowledge of the relationship
      between X and Y.
6. Predictive Prescriptions:
   a. Predictive prescriptions z_N(x) are decisions made based on the observed covariates x.
   b. Goal: achieve E[c(z_N(x); Y) | X = x] ≈ v*(x), i.e. a cost close to v*(x) and below v_stoch.


In other words: prescriptive analytics uses uncertain quantities Y and covariates X to minimize the
uncertain cost v = c(z; Y). The standard model minimizes the expected cost v_stoch without using the
covariates, while the model with auxiliary data minimizes the expected cost v*(x) conditioned on
X = x. The goal is to make predictive prescriptions z_N(x) that approximate the optimal decision,
achieving a cost close to v*(x) and less than v_stoch.
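
As a concrete instance (an illustration of the setup above, not an example from the slides): in a
newsvendor problem, z is an order quantity, Y the uncertain demand, X holds covariates such as
weather or past sales, and the cost is

c(z; Y) = h · max(z − Y, 0) + b · max(Y − z, 0),

with h the per-unit cost of excess stock and b the per-unit cost of unmet demand. The standard model
picks one order quantity for everyone (a fixed quantile of the demand distribution); the model with
auxiliary data picks an order quantity per observed x (a quantile of the demand distribution
conditional on X = x).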

Prescriptive Performance

The performance metric used in prescriptive analytics is the Coefficient of Prescriptiveness (P). It
measures the efficacy of a predictive prescription. It is a unitless measure that lies between 0 and 1,
with 0 being not prescriptive and 1 being highly prescriptive.

The formula is 1 minus the ratio of two excess costs:

- the cost achieved when utilizing the auxiliary data, minus the minimal cost achieved with
  perfect information (i.e., using the actual values);
- over the cost achieved when excluding the auxiliary data, minus the minimal cost achieved
  with perfect information.
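
Using the symbols defined above, this can be written (a reconstruction; the exact notation on the
slides may differ) as:

P = 1 − (v_prescription − v*) / (v_stoch − v*)

where v_prescription is the cost achieved by the predictive prescription z_N(x), v_stoch the cost of
the decision that ignores the covariates, and v* the full-information minimal cost. P = 0 means the
auxiliary data adds nothing beyond the data-poor decision; P = 1 means the prescription performs as
well as the perfect-information benchmark.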

In other words, it is a measure used in prescriptive analytics to quantify the effectiveness or impact
of utilizing auxiliary data in decision-making processes. It represents the degree to which
incorporating auxiliary data improves decision-making compared to making decisions without it,
measured relative to the perfect-information benchmark.

Optimization Methods

Optimization methods vary in their handling of uncertainty and complexity. Exact methods, such as
linear programming for fractional (continuous) variables with linear relationships, typically provide
the best solutions. Integer programming, dealing with integer variables and linear relationships, also
offers exact solutions. On the other hand, (meta-)heuristics and non-linear programming, addressing
integer/fractional variables and linear/non-linear relationships respectively, provide good solutions
but may not guarantee optimality.

When conducting prescriptive analytics in Python, a linear optimization problem can be tackled using
the PuLP package. Additionally, sensitivity analysis can be performed using two key techniques:
shadow prices and reduced costs. Shadow prices represent the rate of change in the objective value
when the right-hand side of a constraint is increased by 1 unit. Reduced costs, on the other hand,
indicate the amount by which the objective coefficient of a decision variable must change before
that variable takes a positive value. Notably, the reduced cost is equivalent to the shadow price of
the non-negativity constraint of the variable. It is essential to note that both shadow prices and
reduced costs are only valid within the allowable decrease/increase limits.
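
As a sketch of how this looks with PuLP (the product-mix numbers are illustrative, not from the
course): after solving, PuLP exposes shadow prices as the constraint attribute .pi and reduced costs
as the variable attribute .dj.

import pulp

# Illustrative product-mix LP: maximize profit subject to two capacity limits.
prob = pulp.LpProblem("production_mix", pulp.LpMaximize)
x1 = pulp.LpVariable("product_1", lowBound=0)
x2 = pulp.LpVariable("product_2", lowBound=0)

prob += 30 * x1 + 20 * x2, "profit"              # objective
prob += 2 * x1 + 1 * x2 <= 100, "machine_hours"  # capacity constraint 1
prob += 1 * x1 + 3 * x2 <= 90, "labour_hours"    # capacity constraint 2

prob.solve(pulp.PULP_CBC_CMD(msg=False))

print("objective:", pulp.value(prob.objective))
for name, con in prob.constraints.items():
    # shadow price: change in objective per unit increase of this constraint's RHS
    print(name, "shadow price:", con.pi)
for var in prob.variables():
    # reduced cost: the shadow price of the variable's non-negativity constraint
    print(var.name, "reduced cost:", var.dj)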

A Branch & Bound procedure, begins by solving the relaxed Linear Programming (LP) problem with
real values. It then divides the solution space into subsets through branching, ensuring they are
mutually exclusive and exhaustive. During the bounding phase, candidate solutions are checked
against a lower bound at each branch, discarding any that are worse. This method systematically
explores the solution space, leading to efficient identification of optimal solutions while minimizing
computational effort.
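
A minimal sketch of the procedure on a small illustrative maximization problem, reusing PuLP for the
LP relaxations (the problem data and helper names are my own):

import math
import pulp

def solve_relaxation(bounds):
    # LP relaxation of: max 5x + 4y  s.t.  6x + 4y <= 24,  x + 2y <= 6
    prob = pulp.LpProblem("relaxation", pulp.LpMaximize)
    x = pulp.LpVariable("x", lowBound=bounds["x"][0], upBound=bounds["x"][1])
    y = pulp.LpVariable("y", lowBound=bounds["y"][0], upBound=bounds["y"][1])
    prob += 5 * x + 4 * y
    prob += 6 * x + 4 * y <= 24
    prob += x + 2 * y <= 6
    if pulp.LpStatus[prob.solve(pulp.PULP_CBC_CMD(msg=False))] != "Optimal":
        return None, None
    return pulp.value(prob.objective), {"x": x.value(), "y": y.value()}

best_value, best_solution = float("-inf"), None
nodes = [{"x": (0, None), "y": (0, None)}]  # root node: only non-negativity

while nodes:
    bounds = nodes.pop()
    obj, sol = solve_relaxation(bounds)
    if obj is None or obj <= best_value:
        continue  # bounding: prune infeasible nodes and nodes that cannot beat the incumbent
    fractional = {v: val for v, val in sol.items() if abs(val - round(val)) > 1e-6}
    if not fractional:
        best_value, best_solution = obj, sol  # integral solution: new incumbent
        continue
    var, val = next(iter(fractional.items()))  # branching: split on a fractional variable
    lo, hi = bounds[var]
    nodes.append({**bounds, var: (lo, math.floor(val))})  # subsets are mutually
    nodes.append({**bounds, var: (math.ceil(val), hi)})   # exclusive and exhaustive

print(best_value, best_solution)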

LECTURE 2: PREDICTIVE AND PRESCRIPTIVE ANALYTICS
When conducting a case study or any sort of business problem, the CRISP-DM model is the go-to
process. It stands for Cross Industry Standard Process for Data Mining and involves the following
steps: business understanding, data understanding, data preparation, modeling, evaluation and
deployment.

, Business & Data Understanding

These go hand in hand. You start by understanding what the business problem is, what the company
in question is dealing with and what they want to do to tackle it. This is followed by taking a closer
look at the data for the first time: making some first descriptive statistics, checking for missing values …

Data Preparation

Here you create a basetable, which you will use to build models. Split your dataset into training, test
and validation sets. Check whether all variables are necessary and remove those that are not, either
by PCA or, preferably, by using Recursive Feature Elimination. This variable selection can decrease
the computation time by a lot. The major strength of this approach is that it selects variables using a
well-performing prediction algorithm and an established variable importance score.

Recursive Feature Elimination (RFE) = a useful algorithm for feature selection, particularly when
dealing with numerous predictors. It works by iteratively removing the least important predictors
until a stopping criterion is met. Random Forest (RF) is well-suited for RFE due to its reliable variable
importance scores and because it does not exclude variables from the final prediction equation.
However, correlations among predictors should be carefully considered. Other tree-based methods
like gradient boosting are also viable options for RFE.

scikit-learn provides built-in functions for both Recursive Feature Elimination (RFE) and Recursive
Feature Elimination with Cross-Validation (RFECV). The main difference between the two is that RFE
requires manually selecting the number of features, while RFECV automates this process by
conducting cross-validation to determine the optimal number of features. Running RFE can be time-
consuming, but several strategies can decrease the computation time. These include reducing the
number of features or tuning parameters of the random forest classifier, such as decreasing the
number of trees or limiting the depth of each tree. Additionally, other methods like parallelization,
feature sampling, or model caching can also be effective in reducing running time.
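
A minimal sketch of both, on synthetic data (the dataset and all parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, RFECV

X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           random_state=42)

# A smaller forest (fewer trees, limited depth) keeps RFE's repeated fits cheap.
rf = RandomForestClassifier(n_estimators=100, max_depth=10, n_jobs=-1,
                            random_state=42)

# RFE: the number of features to keep must be chosen manually.
rfe = RFE(estimator=rf, n_features_to_select=10, step=1).fit(X, y)

# RFECV: cross-validation picks the number of features automatically.
rfecv = RFECV(estimator=rf, step=1, cv=5, scoring="roc_auc", n_jobs=-1).fit(X, y)

print("RFE kept:", rfe.support_.sum(), "features")
print("RFECV kept:", rfecv.n_features_, "features")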

Next to RFE, Boruta is another common variable selection method used with Random Forests. It
gradually eliminates features that are less relevant than randomly generated variables, while
BorutaShap enhances this process by incorporating Shapley values and providing flexibility with any
tree-based model. Both methods follow a similar procedure, involving the addition and shuffling of
shadow variables (= copies of all variables), ranking features, and iteratively selecting those that
surpass importance thresholds. BorutaShap is more flexible than Boruta, since any tree-based model
can be used and Shapley values replace the mean decrease in Gini as the importance score.
Computational overhead is also lowered thanks to a sampling procedure that takes the smallest
possible subset at each iteration.
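
A minimal Boruta sketch, assuming the third-party boruta package (BorutaPy); the dataset and
parameters are illustrative:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           random_state=42)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=42)

# Boruta adds shuffled "shadow" copies of every feature and keeps only the
# features whose importance beats the best shadow feature.
boruta = BorutaPy(rf, n_estimators="auto", random_state=42)
boruta.fit(X, y)  # BorutaPy expects numpy arrays

print("Confirmed features:", np.where(boruta.support_)[0])
print("Tentative features:", np.where(boruta.support_weak_)[0])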

Choosing between RFE and Boruta depends on the goal: RFE aims for minimal-optimal variable
subsets for optimal classification (prediction), whereas Boruta aims to identify all relevant variables
for a comprehensive understanding. While both methods offer similar performance and computation
time, Boruta tends to be slightly more accurate.

Alternative variable selection methods include:

- Filter methods: gauge correlations between predictors and responses
o Pearson correlation, Fisher score, Cramer’s V…
- Wrapper methods: use external search procedures for subset selection
o RFE…
- Embedded methods: optimize objective functions while considering the feature count (see the
  sketch after this list)
o Lasso, Ridge…
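
As a minimal sketch of an embedded method, here is Lasso-style (L1-penalised) selection via
scikit-learn's SelectFromModel; the dataset and the value of C are illustrative:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           random_state=42)

# The L1 penalty drives irrelevant coefficients to exactly zero, so feature
# selection happens while the model itself is being fitted.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(lasso).fit(X, y)

print("selected features:", selector.get_support().sum())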
