Statistics & Methodology
Lecture I
Statistical reasoning
The foundation of all good statistical analysis is a deliberate, careful, and thorough
consideration of uncertainty. The purpose of statistics is to systematize the way we account
for uncertainty when making data-based decisions.
Data scientists must scrutinize large amounts of data and extract useful knowledge from
them. To convert these data into knowledge, data scientists apply various data-analytic
techniques.
Statistical inference
When doing statistical inference, we focus on how certain variables relate to each other. For
example: does increased spending on advertising correlate with more sales? Is there a relation
between the number of liquor stores in an area and the amount of crime?
Statistical inference is the process of drawing conclusions about populations or scientific
truths from data.
Types of variables
Categorical
- Nominal: unordered categories, such as gender and marital status.
- Ordinal: ordered categories, such as level of education.
Numerical
- Discrete: a variable that can only take on a certain number of values, with no in-between
values, such as the number of cars parked in a lot or the number of heads in a series of
coin tosses.
- Continuous: a variable that can take on an infinite number of values, such as time and
weight.


Probability distributions
Probability distributions quantify how likely it is to observe each possible value of some
probabilistic entity, i.e. a list of all possible values of a random variable, along with their
probabilities.
Binomial distribution
Also called a discrete probability distribution, as it is used for discrete variables. A random
variable has a binomial distribution if the following conditions are met:
- There are a fixed number of trials (n);
- Each trial has two possible outcomes (success or failure);
- The probability of success (p) is the same for each trial;
- The trials are independent.

There are several ways to find the binomial distribution of a random variable. First, you can
use the formula:

P(X = k) = (n choose k) · p^k · (1 − p)^(n − k), where (n choose k) = n! / (k! (n − k)!)

However, you could also use so-called binomial tables.
Two of the most interesting properties of a distribution are the expected value and the
variance. The expected value is the mean of the distribution: the average value of all possible
values of a random variable. The variance of a distribution is the average squared distance of
each value from the expected value. For a binomial distribution, the expected value is np and
the variance is np(1 − p).
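As a quick illustration, the formula and the two properties can be computed directly. This is a minimal sketch with made-up numbers (10 fair coin tosses, so n = 10 and p = 0.5):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable: C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5                 # hypothetical example: 10 fair coin tosses
expected_value = n * p          # E[X] = np
variance = n * p * (1 - p)      # Var(X) = np(1 - p)

print(binom_pmf(5, n, p))       # probability of exactly 5 heads = 0.24609375
print(expected_value, variance) # 5.0 2.5
```

Summing `binom_pmf(k, n, p)` over all k from 0 to n gives 1, as any probability distribution requires.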


Normal distribution
Also called a continuous probability distribution. A variable has a normal distribution if its
values fall along a smooth, continuous, bell-shaped curve.
Each normal distribution has its own mean (µ) and
standard deviation (σ).
Note: if n is large enough, you can use the normal
distribution as well.
One special form of the normal distribution is the
standard normal distribution, or Z-distribution. This
has a mean of zero and a standard deviation of 1. A
value on the Z-distribution represents the number of
standard deviations the data is above or below the
mean; this value is called a z-score: z = (x − µ) / σ.
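A small sketch of the z-score computation, using hypothetical values (a distribution with µ = 100 and σ = 15):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mu) / sigma

# Hypothetical example values: a score of 130 on a scale with mean 100, sd 15
print(z_score(130, 100, 15))  # 2.0 -> two standard deviations above the mean
print(z_score(85, 100, 15))   # -1.0 -> one standard deviation below the mean
```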


Sampling distributions
A sampling distribution quantifies the possible values of a test statistic over infinite repeated
sampling.
In general, the mean of the sampling distribution equals the mean of the entire population.
This makes sense; the average of the averages from all samples is the average of the
population that the samples came from. Variability in the sampling distribution is measured in
terms of the standard error, which is calculated with the following formula:

SE = σx / √n

Where σx is the population standard deviation and n is the sample size. Because n is the
denominator in this formula, the standard error decreases as n increases. This means that
larger samples give more precision and less variation from sample to sample. And because σx
is the numerator in this formula, the standard error increases as the population standard
deviation increases. This makes sense; it is harder to estimate the population's average when
the population varies a lot to begin with.
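This relationship can be checked with a small simulation; the population parameters here (µ = 50, σx = 10, samples of n = 25) are made up for illustration. The standard deviation of many sample means should approximate σx/√n = 2:

```python
import random
import statistics

random.seed(1)

# Hypothetical population parameters and sample size
mu, sigma_x, n = 50.0, 10.0, 25
theoretical_se = sigma_x / n ** 0.5   # sigma_x / sqrt(n) = 2.0

# Approximate the sampling distribution of the mean by repeated sampling
sample_means = [
    statistics.mean(random.gauss(mu, sigma_x) for _ in range(n))
    for _ in range(20000)
]

print(theoretical_se)                  # 2.0
print(statistics.stdev(sample_means))  # close to 2.0, as the formula predicts
```

Re-running with a larger n shrinks the spread of the sample means, matching the 1/√n behaviour described above.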

Interpretation
In a loose sense, each point on the curve says: there is a … probability of observing the
corresponding value in any given sample.


Statistical testing
In practice, we may want to distill the information from the probability distributions into a
simple statistic so we can make a judgement. One way to distill this information and control
for uncertainty is through statistical testing.
In parametric testing, two common tests are the t-test and the z-test. The t-test is a
statistical test used to compare population means for two independent samples. The t-test is
most appropriate when the sample size is small (< 30) and the population standard
deviation is unknown. The z-test is most appropriate when the population variance is known
and the sample size is large (> 30).


T-test vs. Z-test
- Meaning: both are parametric tests that identify how the means of two sets of data differ
from one another; the t-test applies when the variance is unknown, the z-test when the
variance is known.
- Distribution: Student's t-distribution (t-test) versus the normal distribution (z-test).
- Population variance: unknown (t-test) versus known (z-test).
- Sample size: small (< 30) for the t-test, large (> 30) for the z-test.
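As a sketch of the t-test case, the t statistic for two independent samples with unknown variances (here Welch's version, which does not assume equal variances) can be computed by hand; the two groups below are made-up data:

```python
import statistics
from math import sqrt

def two_sample_t(x, y):
    """Welch's t statistic: difference in means divided by the standard error
    of that difference, with each group's variance estimated from the sample."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.mean(x), statistics.mean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)  # sample variances
    return (m1 - m2) / sqrt(v1 / n1 + v2 / n2)

# Hypothetical measurements from two independent groups
group_a = [5.1, 4.9, 5.6, 5.2, 4.8]
group_b = [4.2, 4.4, 4.0, 4.5, 4.1]
print(two_sample_t(group_a, group_b))  # roughly 5.26: a large difference in means
```

A t statistic this far from 0 would be compared to a Student t-distribution to obtain a p-value, as described in the next section.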


P-value
A test statistic by itself is just a number, therefore we need to compare the statistic to some
objective reference. This is done by computing a sampling distribution of the test statistic.
To quantify how exceptional our estimated test statistic is, we compare the estimated value
to a sampling distribution of the test statistic, assuming no effect (the null hypothesis). If our
estimated statistic would be very unusual in a population where the null hypothesis is true,
we reject the null and claim a statistically significant effect.
We can find the probability associated with a range of values by computing the area of the
corresponding slice from the distribution.

By calculating the area in the null distribution that exceeds our estimated test statistic, we can
compute the probability of observing the given test statistic, or a more extreme one, if the null
hypothesis were true. In other words, we can compute the probability of having sampled the
data we observed from a population in which there is no true mean difference.
If your test statistic is close to 0, or at least within the range where most of the results should
fall, then you cannot reject the null hypothesis. If your test statistic is out in the tails of the
distribution, the results of this sample do not support the claim, so we reject the
null hypothesis.
You can be more specific about your conclusion by noting exactly how far out of the
distribution the test statistic falls. You do this by looking up the test statistic in the distribution
and finding the probability of being at that value or beyond it. This is called the p-value and it
tells you how likely it was that you would have gotten your sample results if the null hypothesis
were true. The farther out the test statistic is on the tails of the distribution, the smaller the
p-value will be, and the more evidence you have against the null hypothesis.
To make a proper decision about whether or not to reject the null hypothesis, you determine
your cutoff probability for your p-value before doing a hypothesis test. This cutoff is called the
alpha level (α). If the p-value is greater than or equal to α, you cannot reject the null
hypothesis. If the p-value is smaller than α, you reject the null hypothesis.
Note: if you do not use a directional hypothesis, your test should be two-tailed.
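The decision rule above can be sketched for a two-tailed z-test, using the standard normal CDF; the test statistic (z = 2.1) and α = 0.05 are hypothetical example values:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_tailed_p(z):
    """P(|Z| >= |z|) under the null: the area in both tails beyond the statistic."""
    return 2 * (1 - normal_cdf(abs(z)))

alpha = 0.05
z = 2.1                    # hypothetical estimated test statistic
p = two_tailed_p(z)
print(round(p, 4))         # 0.0357
print(p < alpha)           # True -> reject the null hypothesis
```

Note how the familiar cutoff works out: for z = 1.96 the two-tailed p-value is almost exactly 0.05, which is why 1.96 is the critical value at α = 0.05.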
Interpretation
There is a … probability of observing a test statistic
at least as large as the estimated test statistic, if the
null hypothesis is true. The p-value has the same
logic as proof by contradiction.
Note: we cannot say that there is a … probability of
observing exactly the estimated test statistic, if the
null hypothesis is true. This is because the probability of
observing any individual point on a continuous
distribution is exactly zero.
