Summary of Business Analytics


This document is a summary of the lectures by dr. Bettina Siflinger. The course is a major course in the joint Data Science bachelor of Tilburg University and Technische Universiteit Eindhoven. It contains a summary of the lecture notes and remarks from the teacher.


  • 12 January 2021
  • 1 February 2021
  • 34 pages
  • 2020/2021
  • Lecture notes
  • Dr. Bettina Siflinger
  • All lectures
JBM040: Business Analytics
Quartile 2: 2020 – 2021
Teacher: dr. B.M. Siflinger b.m.siflinger@uvt.nl



Business analytics is the ability of firms/organizations to collect, analyse and act on data.





Introduction & Important concepts in probability and statistics
Part 1. Introduction
There are two estimation problems:
1. Prediction: Develop a formula for making predictions about the dependent
variable, based on the observed values of the independent variables.
General question you ask yourself: What happens?

2. Causal analysis: Independent variables are regarded as causes of the
dependent variable. The goal is to determine whether a particular independent
variable really affects the dependent variable, and to estimate the magnitude of
that effect, if any.
General question you ask yourself: Why does it happen?

Now, consider the data generating process for a linear model:
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$$
with outcome $y$, regressors $x_1, \dots, x_k$, and “true” parameters $\beta_0, \dots, \beta_k$. Its error term is $u \sim N(0, \sigma^2 I)$. You should make an assumption about the relationship between $x = (x_1, \dots, x_k)$ and $u$: $E(u \mid x) = 0$.
- $E(u \mid x)$ indicates whether $u$ is related to $x$: under $E(u \mid x) = 0$, the mean of $u$ does not depend on $x$.
- $I$ is the identity matrix, with 1's on the diagonal and 0's elsewhere.

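The main goal of OLS is to obtain the estimates $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ that minimize the sum of squared residuals:
$$\sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik}\right)^2$$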
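A minimal sketch of this minimization in plain Python, assuming a small in-memory data set: it solves the normal equations $(Z'Z)\hat\beta = Z'y$, where $Z$ holds the regressors plus a column of ones for the intercept, by Gaussian elimination. The function names `ols` and `ssr` and the toy data are hypothetical; for real data a numerical library routine would be the safer choice.

```python
def ols(X, y):
    """Return OLS estimates [b0, b1, ..., bk] that minimize the sum of squared residuals.

    X holds one row (x1, ..., xk) per observation; an intercept column of ones
    is prepended here. Solves the normal equations (Z'Z) b = Z'y.
    """
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]  # Z'Z
    c = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]            # Z'y
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    return b


def ssr(X, y, b):
    """Sum of squared residuals for coefficients b (intercept first)."""
    return sum((yi - b[0] - sum(bj * xj for bj, xj in zip(b[1:], row))) ** 2
               for row, yi in zip(X, y))


# Hypothetical toy data with one regressor
X = [[0], [1], [2], [3], [4]]
y = [1, 3, 2, 6, 5]
b = ols(X, y)  # approximately [1.2, 1.1]
```

Any other choice of coefficients, for example `[b[0] + 0.1, b[1]]`, gives a larger `ssr(X, y, ...)` than the OLS estimates.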

OLS serves both estimation problems. They have different quantities of interest, but
the same calculations are involved:

Predictive modelling: estimate the conditional mean $E(y \mid x)$:
$$\hat E(y \mid x) = \hat\beta_0 + \hat\beta_1 x_1 + \dots + \hat\beta_k x_k$$

Causal estimation: estimate the partial derivative (slope parameter) with respect
to some $x_j$:
$$\widehat{\frac{\partial E(y \mid x)}{\partial x_j}} = \hat\beta_j$$

Both goals can be achieved simultaneously by OLS under the zero conditional mean assumption $E(u \mid x) = 0$, since then
$$E(y \mid x) = E(x\beta + u \mid x) = x\beta + E(u \mid x) = x\beta$$

The prediction procedure is interested in the regression line that fits the data as
closely as possible. $E(u \mid x)$ does not play a role, because the prediction is based only on
what you observe, and $u$ is not observed. OLS then gives the best fit to the data
according to the least squares criterion.

Causal estimation is interested in a particular $\beta_j$. The causal interpretation of $\beta_j$
fails if $E(u \mid x_j) \neq 0$: the partial derivative of $E(u \mid x)$ with respect to $x_j$ must be
zero for $\partial E(y \mid x)/\partial x_j$ to equal $\beta_j$. Otherwise, OLS gives a biased estimate of $\beta_j$.
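The consequence can be illustrated with a small simulation. This is a sketch under assumed values: $\beta_0 = 1$, $\beta_1 = 2$, and a hypothetical error $u = \gamma x + e$ with $e \sim N(0, 1)$, so that $E(u \mid x) = \gamma x$. With $\gamma = 0$ the OLS slope lands close to $\beta_1$; with $\gamma = 0.5$ it lands close to $\beta_1 + \gamma = 2.5$. That line is still the right one for predicting $y$ from $x$, since then $E(y \mid x) = 1 + 2.5x$, but its slope is no longer the causal effect $\beta_1$.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the OLS line of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(gamma, n=50_000, beta0=1.0, beta1=2.0, seed=0):
    """Simulate y = beta0 + beta1*x + u with u = gamma*x + e, so E(u|x) = gamma*x.

    gamma = 0 satisfies the zero conditional mean assumption; gamma != 0 violates it.
    Returns the OLS (intercept, slope) of y on x.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    u = [gamma * xi + rng.gauss(0, 1) for xi in x]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    return simple_ols(x, y)


_, slope_ok = simulate(gamma=0.0)      # close to beta1 = 2
_, slope_biased = simulate(gamma=0.5)  # close to beta1 + gamma = 2.5, not 2
```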

All these methods can be used in econometrics.
Econometrics: “based upon the development of statistical methods for estimating
economic relationships, testing economic theories, and evaluating and implementing
government and business policy”. Its goal is to infer whether one variable has a
causal effect on another variable. For this, you can use ceteris paribus analysis:
investigate the effect of $x_j$ on $y$ when all other factors are held fixed.
Problem: mostly observational data is available.
Solution: impose assumptions to simulate a ceteris paribus analysis. Make sure
that $x_j$ and $u$ are independent.
In an exercise, a regression may be estimated based on two parameters. However,
other factors can also influence the outcome. Because of omitted variable bias, the
estimated regression coefficient $\hat b$ is then biased: $\hat b$ is only unbiased if $cov(x, u) = 0$:
$$\hat b = \frac{cov(x, y)}{cov(x, x)} = \frac{cov(x, xb + u)}{cov(x, x)} = b + \frac{cov(x, u)}{cov(x, x)}$$
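A numerical check of this decomposition, as a sketch with hypothetical values: the true effect of $x$ is $b = 2$, and an omitted variable $z = 0.6x + \text{noise}$ enters the outcome with coefficient $1.5$, so it ends up in $u$. In any sample, the slope $cov(x, y)/cov(x, x)$ equals $b + cov(x, u)/cov(x, x)$ up to rounding, and here it lands near $2 + 1.5 \cdot 0.6 = 2.9$ instead of $2$.

```python
import random


def cov(s, t):
    """Sample covariance (divisor n) of two equally long lists."""
    n = len(s)
    ms, mt = sum(s) / n, sum(t) / n
    return sum((si - ms) * (ti - mt) for si, ti in zip(s, t)) / n


rng = random.Random(1)
n, b, c = 50_000, 2.0, 1.5                    # hypothetical effects of x and of the omitted z
x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.6 * xi + rng.gauss(0, 1) for xi in x]  # omitted variable, correlated with x
u = [c * zi + rng.gauss(0, 1) for zi in z]    # z is not modelled, so it ends up in u
y = [b * xi + ui for xi, ui in zip(x, u)]

b_hat = cov(x, y) / cov(x, x)                 # OLS slope of y on x alone
decomposition = b + cov(x, u) / cov(x, x)     # b + cov(x, u) / cov(x, x)
```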

Part 2. Probability theory: Random variables
The probability distribution is a function that describes the probability of
obtaining each possible value that a random variable $X$ can take on. A discrete
random variable takes a list of outcomes $x_1, \dots, x_k$ with probabilities
$p_1, \dots, p_k$. A continuous random variable takes values in a continuum.

These random variables have an expected value $E(X)$ or $\mu$, which is the probability-weighted
average of all possible values of $X$. The calculation of the expected value differs by
type of random variable:
- Discrete RV: $E(X) = \sum_{j=1}^{k} x_j p_j$
- Continuous RV: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, where $f$ is the probability density function of $X$
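Both formulas can be evaluated directly. The sketch below uses hypothetical helper names: exact fractions for a fair die ($E(X) = 3.5$), and a midpoint-rule approximation of the integral for an assumed exponential density $f(x) = 2e^{-2x}$, $x \ge 0$, whose mean is $0.5$. Truncating the integral to a finite interval is an approximation.

```python
import math
from fractions import Fraction


def expected_discrete(outcomes, probs):
    """E(X) = sum_j x_j p_j for a discrete random variable."""
    assert abs(sum(probs) - 1) < 1e-12, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(outcomes, probs))


def expected_continuous(pdf, lo, hi, steps=200_000):
    """Approximate E(X) = integral of x f(x) dx with the midpoint rule on [lo, hi].

    The integral over (-inf, inf) is truncated to [lo, hi], so choose bounds
    outside of which the density is negligible.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x * pdf(x)
    return total * h


die_mean = expected_discrete(range(1, 7), [Fraction(1, 6)] * 6)  # Fraction(7, 2)
exp_mean = expected_continuous(lambda x: 2 * math.exp(-2 * x), 0.0, 20.0)  # about 0.5
```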


The expected value has the following properties:
- For a constant $c$: $E(c) = c$
- For constants $a$ and $b$: $E(aX + b) = aE(X) + b$
- For constants $a_1, \dots, a_n$ and random variables $X_1, \dots, X_n$: $E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i)$
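These properties can be checked exactly by enumeration. In this sketch (a hypothetical example), $X_1$ is a fair die roll and $X_2 = X_1^2$, so the two random variables are dependent; linearity of the expected value still holds, because it does not require independence.

```python
from fractions import Fraction

# Hypothetical example: X1 is a fair die roll and X2 = X1**2 (dependent on X1)
outcomes = range(1, 7)
p = Fraction(1, 6)


def E(g):
    """Expected value of g(X1), enumerating the outcomes of a fair die."""
    return sum(g(k) * p for k in outcomes)


a1, a2 = 3, -2
lhs = E(lambda k: a1 * k + a2 * k ** 2)                # E(a1 X1 + a2 X2), computed directly
rhs = a1 * E(lambda k: k) + a2 * E(lambda k: k ** 2)   # a1 E(X1) + a2 E(X2)
const = E(lambda k: 5)                                 # E(c) = c
affine = E(lambda k: 4 * k + 1)                        # equals 4 E(X1) + 1
```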


The variance measures the expected squared distance between $X$ and its mean $\mu$:
$$Var(X) = \sigma^2 = E\left[(X - \mu)^2\right] = E(X^2) - \mu^2$$
It has the following properties:
- If $X = c$ is constant: $Var(X) = 0$
- For constants $a$ and $b$: $Var(a + bX) = b^2 Var(X)$
- Standard deviation: $sd(X) = \sqrt{Var(X)}$
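A short exact check of the two variance formulas and of the properties, as a sketch on a fair die (a hypothetical example): $Var(X) = 91/6 - (7/2)^2 = 35/12$, and with $a = 4$, $b = 3$ the variance of $a + bX$ is $9 \cdot 35/12$. The helper name `moments` is illustrative only.

```python
import math
from fractions import Fraction


def moments(values, probs):
    """Return (mean, variance) of a discrete RV, computing Var both ways and checking they agree."""
    mu = sum(v * p for v, p in zip(values, probs))
    var_def = sum((v - mu) ** 2 * p for v, p in zip(values, probs))        # E[(X - mu)^2]
    var_short = sum(v ** 2 * p for v, p in zip(values, probs)) - mu ** 2  # E(X^2) - mu^2
    assert var_def == var_short
    return mu, var_def


die = list(range(1, 7))
probs = [Fraction(1, 6)] * 6
mu, var = moments(die, probs)                          # 7/2 and 35/12
_, var_lin = moments([4 + 3 * v for v in die], probs)  # Var(a + bX) with a = 4, b = 3
_, var_const = moments([5] * 6, probs)                 # a constant has variance 0
sd = math.sqrt(var)
```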

The covariance measures the linear dependence between the random variables
$X$ and $Y$:
$$Cov(X, Y) = \sigma_{XY} = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - E(X)E(Y)$$
It has the following properties:
- If $X$ and $Y$ are independent: $Cov(X, Y) = 0$
- For constants $a_1, b_1, a_2, b_2$: $Cov(a_1 X + b_1, a_2 Y + b_2) = a_1 a_2 Cov(X, Y)$
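These properties can be verified exactly on small joint distributions. The sketch below (a hypothetical example) represents a joint probability mass function as a dictionary `{(x, y): p}`. For two independent dice the covariance is $0$; for $X$ a die roll and $Y = X + Z$, with $Z$ an independent die roll, it equals $Var(X) = 35/12$.

```python
from fractions import Fraction
from itertools import product


def cov(joint):
    """Cov(X, Y) = E(XY) - E(X)E(Y) for a joint pmf given as {(x, y): p}."""
    ex = sum(x * p for (x, y), p in joint.items())
    ey = sum(y * p for (x, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey


def transform(joint, a1, b1, a2, b2):
    """Joint pmf of (a1 X + b1, a2 Y + b2); assumes a1 and a2 are nonzero."""
    return {(a1 * x + b1, a2 * y + b2): p for (x, y), p in joint.items()}


# Two independent fair dice: every pair has probability 1/36
indep = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}
# Dependent pair: X is a die roll and Y = X + Z, Z an independent die roll
dep = {(x, x + z): Fraction(1, 36) for x, z in product(range(1, 7), repeat=2)}

c_indep = cov(indep)                             # 0
c_dep = cov(dep)                                 # 35/12
c_scaled = cov(transform(dep, 2, 1, -3, 4))      # 2 * (-3) * Cov(X, Y)
```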

The correlation coefficient indicates how strongly two random variables are linearly
related. Its value always lies within the range $[-1, 1]$.
$$Corr(X, Y) = \frac{Cov(X, Y)}{sd(X)\,sd(Y)} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$
It has the following properties:
- $Cov(X, Y)$ and $Corr(X, Y)$ have the same sign
- $Cov(X, Y) = 0 \rightarrow Corr(X, Y) = 0$
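A sketch of the sample analogue on simulated data, with hypothetical values: an exact increasing linear relation gives a correlation of $1$, and an assumed noisy negative relation $y = -0.5x + e$ gives a value strictly between $-1$ and $0$ (about $-0.5/\sqrt{1.25} \approx -0.45$), with the same sign as the covariance.

```python
import math
import random


def corr(x, y):
    """Sample correlation Cov(x, y) / (sd(x) sd(y)), using divisor n throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cxy / (sx * sy)


rng = random.Random(2)
x = [rng.gauss(0, 1) for _ in range(10_000)]
noisy = [-0.5 * a + rng.gauss(0, 1) for a in x]  # negatively related to x

r_exact = corr(x, [2 * a + 1 for a in x])  # 1 up to rounding
r_noisy = corr(x, noisy)                   # negative, strictly inside (-1, 0)
```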

The variance of sums of random variables has the following properties:
- For constants $a$ and $b$: $Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y)$
- If $X$ and $Y$ are uncorrelated: $Var(X + Y) = Var(X) + Var(Y) = Var(X - Y)$
- If $X_1, \dots, X_n$ are pairwise uncorrelated random variables and $a_1, \dots, a_n$ are constants: $Var(a_1 X_1 + \dots + a_n X_n) = \sum_{i=1}^{n} a_i^2 Var(X_i)$
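An exact check of these rules, as a sketch on a hypothetical example: $X$ and $Z$ are independent fair dice (hence uncorrelated), and $Y = X + Z$ is correlated with $X$. With $a = 2$, $b = -1$ the general formula gives $Var(2X - Y) = Var(X - Z) = 35/6$, and for the uncorrelated pair $Var(X + Z) = Var(X - Z) = Var(X) + Var(Z)$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X and Z are independent fair dice, Y = X + Z
joint = {(x, z): Fraction(1, 36) for x, z in product(range(1, 7), repeat=2)}


def E(f):
    """E[f(X, Z)] under the joint pmf above."""
    return sum(f(x, z) * p for (x, z), p in joint.items())


def var(f):
    """Var[f(X, Z)] = E[f^2] - E[f]^2."""
    return E(lambda x, z: f(x, z) ** 2) - E(f) ** 2


def cov(f, g):
    """Cov[f(X, Z), g(X, Z)] = E[fg] - E[f]E[g]."""
    return E(lambda x, z: f(x, z) * g(x, z)) - E(f) * E(g)


def X(x, z):
    return x


def Z(x, z):
    return z


def Y(x, z):
    return x + z  # correlated with X


a, b = 2, -1
lhs = var(lambda x, z: a * X(x, z) + b * Y(x, z))
rhs = a ** 2 * var(X) + b ** 2 * var(Y) + 2 * a * b * cov(X, Y)
```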
The conditional expectation of $Y$ given $X$ is denoted by $E(Y \mid X)$. It describes how
the mean of $Y$ changes with the value of $X$.

It has the following properties:
- For a function $c(X)$: $E(c(X) \mid X) = c(X)$
- For functions $a(X)$ and $b(X)$: $E\left[a(X)Y + b(X) \mid X\right] = a(X)E(Y \mid X) + b(X)$
- If $X$ and $Y$ are independent: $E(Y \mid X) = E(Y)$

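The Law of iterated expectations (LIE): $E(E(Y \mid X)) = E(Y)$. For a discrete $X$ taking values
$x_1, \dots, x_n$ with probabilities $p_1, \dots, p_n$, $E(Y)$ is a weighted average of the $E(Y \mid X = x_k)$ with weights $p_k$:
$$E(Y) = \sum_{k=1}^{n} p_k E(Y \mid X = x_k)$$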
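Conditional expectations and the LIE can be computed exactly for a small joint distribution. In this sketch (a hypothetical example), $X$ and $Z$ are independent fair dice and $Y = X + Z$, so $E(Y \mid X = x) = x + 3.5$ and the weighted average $\sum_k p_k E(Y \mid X = x_k)$ equals $E(Y) = 7$. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X and Z are independent fair dice, Y = X + Z; pmf keyed by (x, y)
joint = {(x, x + z): Fraction(1, 36) for x, z in product(range(1, 7), repeat=2)}


def marginal_x(joint):
    """P(X = x) for each x, summing the joint pmf over y."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px


def cond_mean(joint, x0, g=lambda x, y: y):
    """E(g(X, Y) | X = x0) = sum_y g(x0, y) P(X = x0, Y = y) / P(X = x0)."""
    px0 = sum(p for (x, _), p in joint.items() if x == x0)
    return sum(g(x, y) * p for (x, y), p in joint.items() if x == x0) / px0


px = marginal_x(joint)
e_y = sum(y * p for (_, y), p in joint.items())          # E(Y)
lie = sum(px[x] * cond_mean(joint, x) for x in px)      # sum_k p_k E(Y | X = x_k)
```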


Part 3. Finite sample properties
From here on, random variables are also denoted by lowercase letters. Finite
sample properties are the properties of an estimator that hold for any sample
size. Take a random sample $(y_1, y_2, \dots, y_n)$ from a population distribution
depending on an unknown parameter $\theta$. An estimator of $\theta$ is a rule that assigns a
value of $\theta$ to each possible outcome of the sample:
- Natural estimator for $\mu$ (the mean): $\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$
- Estimator $\hat\theta$ for $\theta$: $\hat\theta = h(y_1, y_2, \dots, y_n)$, where $h$ is some function of the random variables
The estimator $\hat\theta$ is a RV because it depends on a random sample. It is an
unbiased estimator if $E(\hat\theta) = \theta$ for all possible values of $\theta$. Unbiasedness is a
finite sample property, so it does not depend on the sample size. The bias of an estimator
$\hat\theta$ is $Bias(\hat\theta) = E(\hat\theta) - \theta$.
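A Monte Carlo sketch of the bias definition, under assumed values ($\mu = 10$, $\sigma = 2$, small samples of size $n = 5$ from a normal population). The sample mean $\bar y$ comes out unbiased even for this small $n$. As a contrasting example chosen here, not taken from the lectures, the variance estimator that divides by $n$ has expected value $\frac{n-1}{n}\sigma^2$, so its bias is $-\sigma^2/n = -0.8$.

```python
import random


def simulate_bias(mu=10.0, sigma=2.0, n=5, reps=100_000, seed=3):
    """Estimate Bias = E(theta_hat) - theta for two estimators by simulation.

    Draws reps samples of size n from N(mu, sigma^2) and averages each estimator.
    Returns (bias of the sample mean, bias of the divisor-n variance estimator).
    """
    rng = random.Random(seed)
    sum_mean, sum_var = 0.0, 0.0
    for _ in range(reps):
        ys = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = sum(ys) / n
        sum_mean += ybar
        sum_var += sum((y - ybar) ** 2 for y in ys) / n  # divides by n, not n - 1
    return sum_mean / reps - mu, sum_var / reps - sigma ** 2


bias_mean, bias_var = simulate_bias()  # about 0 and about -sigma^2 / n = -0.8
```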

The sampling variance of the sample mean is $Var(\bar y) = \frac{\sigma^2}{n}$. Among unbiased
estimators, the one with the smallest variance is preferred.
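A simulation sketch comparing two unbiased estimators of $\mu$, under assumed values ($\sigma = 3$, $n = 10$, normal population): the sample mean $\bar y$, whose variance should be close to $\sigma^2/n = 0.9$, and the first observation $y_1$ alone, a hypothetical competitor that is also unbiased but has variance $\sigma^2 = 9$. The sample mean is preferred because its variance is smaller.

```python
import random


def pvar(values):
    """Variance with divisor n of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def sampling_variances(mu=0.0, sigma=3.0, n=10, reps=50_000, seed=4):
    """Simulated variances of two unbiased estimators of mu: the sample mean and y_1."""
    rng = random.Random(seed)
    means, firsts = [], []
    for _ in range(reps):
        ys = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(ys) / n)  # sample mean
        firsts.append(ys[0])       # first observation only
    return pvar(means), pvar(firsts)


var_mean, var_first = sampling_variances()  # about 0.9 and about 9
```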


