STUDY GUIDE INTERMEDIATE STATISTICS I UNIVERSITY COLLEGE 2020, Discovering statistics using IBM SPSS statistics, summary

Study guide Intermediate Statistics I for Erasmus University College students. It is especially designed for EUC students, and the included chapters are named in the description, but of course you can still use it for your own course! Good luck with your exams. Discovering Statist...

Chapters covered: 2.1-2.8, 5.1-5.4, 6.1-6.5.6, 8.1-8.11, 18.1-18.3.5, 18.4-18.5.7
Date: 2 December 2020
Pages: 43
Academic year: 2020/2021
Type: Summary
Author: EUCstudent

STUDY GUIDE INTERMEDIATE STATISTICS I ERASMUS UNIVERSITY COLLEGE 2020

PBL 1
Chapter 2: Everything you never wanted to know about statistics
2.2 Building statistical models
Just as an engineer builds a scale model of a real structure to predict how it will behave, scientists build (statistical) models of real-world processes in an attempt to predict how these processes operate under certain conditions. Such a model must represent the data collected (the observed data).
Fit: The degree to which a statistical model represents the data collected is known as the fit of the model.
Good fit: The model closely resembles reality, so if the engineer uses it to make predictions about the real world, she can be confident that these predictions will be accurate.
Moderate fit: There are some similarities to reality but also some important differences.
Poor fit: Any predictions based on this model are likely to be completely inaccurate.
2.3 Populations and samples
Population: The complete set of observations a researcher is interested in.
Sample: A subset of a population, often taken for the purpose of statistical inference.
Linear model: A model based upon a straight line.
Because linear models are used so widely, they introduce two types of bias:
1. Many models in the scientific literature might not be the ones that fit the data best.
2. Many data sets might not have been published because a linear model was a poor fit (even though a non-linear model might have fitted well).
Both biases arise because researchers did not consider a non-linear model.
Scatter plot: A scatter plot of two variables shows the values of one variable on the Y-axis and the values of the other variable on the X-axis. Scatter plots are well suited for revealing the relationship between the two variables.
Positive association: There is a positive association between variables X and Y if smaller values of X are associated with smaller values of Y and larger values of X are associated with larger values of Y.
Negative association: There is a negative association between variables X and Y if smaller values of X are associated with larger values of Y and larger values of X are associated with smaller values of Y. A perfect negative linear association gives r = -1.
Linear relationship: There is a perfect linear relationship between two variables if the points of their scatterplot fall on a straight line (r = 1 for a positive and r = -1 for a negative relationship). The relationship is still linear if the points diverge from the line, as long as the divergence is random rather than systematic.
Correlation: The correlation measures the direction and strength of the linear relationship (association) between two quantitative variables (interval or ratio level). Correlation is usually written as r and ranges from -1 to 1. It is symmetric (the correlation of X with Y is the same as that of Y with X), and its size is unaffected by linear transformations of either variable (multiplying a variable by a negative number only flips the sign).
Third variable problem: A third variable may be responsible for the correlation between two other variables.
Covariance: The covariance between variables X and Y is an unstandardized measure of the linear association between them:
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
(Pearson) correlation: The correlation measures the direction and strength of a linear relation. It is the standardized version of the covariance, so its value does not depend on the measurement scale of the variables:
$$r = \frac{\mathrm{cov}(x, y)}{s_x s_y}$$
where $s_x$ and $s_y$ are the standard deviations of the two variables. Values near -1 or +1 indicate a strong negative or positive relation, respectively. Values near 0 indicate a weak relation.
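As an illustration of the covariance and Pearson correlation defined above, here is a minimal sketch in Python (standard library only). The data and the function names `covariance` and `pearson_r` are hypothetical. The covariance is the sum of cross-products of deviations divided by n - 1, and r divides it by the product of the two sample standard deviations. The last lines check the claims that a linear transformation changes the covariance but not the size of r.

```python
from statistics import mean, stdev


def covariance(x, y):
    """Sample covariance: sum of cross-product deviations divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Pearson's r: the covariance standardized by both sample standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))


# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

# A linear transformation of y (doubling and adding 10 points)
score_rescaled = [2 * s + 10 for s in score]

print(covariance(hours, score), covariance(hours, score_rescaled))  # covariance doubles
print(pearson_r(hours, score), pearson_r(hours, score_rescaled))    # r is unchanged
print(pearson_r(hours, [-s for s in score]))                        # sign flips
```

Because r is scale-free, it can be compared across studies that measured the same variables in different units; the covariance cannot.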



2.4 Statistical models
Outcome = (model) + error
Statistical models are made up of variables and parameters.
Parameter: Whereas variables measure data, parameters describe the relations between those variables. They are constants that represent some truth about the measured variables (e.g., a regression coefficient). A parameter is a value calculated for a population.
Statistic: A value computed in a sample to estimate a parameter.
2.4.1 The mean as a statistical model
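Three measures of the center of a distribution: the mean, the median and the mode.
Mean: The mean is the sum of the observations divided by the number of observations:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
It lies at the center (the balance point) of the distribution on the x-axis, and the deviations from it are used to compute the variance and standard deviation.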
Median: The median is the value that splits the numerically ordered observations into two equal parts; it is the middle value of all observations. If there is no single middle value (which is the case when there is an even number of observations), the median is computed by taking the mean of the two middle observations. The median is denoted by M.
Mode: The mode is the most frequent value. If multiple values share the highest frequency, the data have multiple modes.
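The median rule for odd and even numbers of observations, and the possibility of several modes, can be checked with a short Python sketch. The helper `median_by_hand` and the data are hypothetical; `statistics.median` and `statistics.multimode` are standard-library functions used as a cross-check.

```python
from statistics import median, multimode


def median_by_hand(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_n = [3, 7, 1, 9, 5]        # sorted: 1 3 5 7 9 -> single middle value
even_n = [3, 7, 1, 9, 5, 8]    # sorted: 1 3 5 7 8 9 -> mean of 5 and 7
print(median_by_hand(odd_n), median(odd_n))
print(median_by_hand(even_n), median(even_n))

# Two values share the highest frequency, so the data have two modes
print(multimode([2, 4, 4, 6, 6, 8]))
```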
Measures of the variability of a distribution (measures of fit): variance, standard deviation, percentiles, quartiles and the interquartile range (IQR).
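Variance (mean squared error): The average of the squared deviations from the mean, also called the mean squared error. In a sample, the sum of squared deviations is divided by the degrees of freedom, n - 1:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
Standard deviation: The square root of the variance, $s = \sqrt{s^2}$. It indicates how much the observations typically deviate from the mean, in the original units of measurement.
Sum of squared errors (SS): $SS = \sum_{i=1}^{n}(x_i - \bar{x})^2$, which is the same as the total error.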
Degrees of freedom (df): The degrees of freedom of an estimate is the number of independent pieces of information on which the estimate is based. A value that is fully determined by the other values (and so is not free to vary) does not add a degree of freedom. In general, the degrees of freedom for an estimate equals the number of values minus the number of parameters estimated en route to the estimate in question. For the sample variance, the mean is estimated first, so the denominator of the variance is the degrees of freedom, n - 1.
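The chain from mean to sum of squared errors, variance and standard deviation can be traced step by step in a minimal Python sketch. The data and the function name `spread` are hypothetical; `statistics.variance` and `statistics.stdev` (which also divide by n - 1) serve as a cross-check. The deviation check shows why one degree of freedom is lost.

```python
from math import sqrt
from statistics import variance, stdev


def spread(values):
    """Return mean, deviations, SS, df, sample variance and SD of a list of numbers."""
    n = len(values)
    m = sum(values) / n
    deviations = [x - m for x in values]
    ss = sum(d ** 2 for d in deviations)  # sum of squared errors (total error)
    df = n - 1                            # one parameter (the mean) was estimated first
    var = ss / df
    return m, deviations, ss, df, var, sqrt(var)


scores = [2, 4, 4, 5, 6, 9]  # hypothetical data
m, deviations, ss, df, var, sd = spread(scores)
print(m, ss, df, var, sd)

# The deviations always sum to zero, so once n - 1 of them are known the last one is fixed:
# it is not free to vary, which is why only n - 1 degrees of freedom remain.
print(deviations[-1] == -sum(deviations[:-1]))

# Cross-check against the standard library (which also divides by n - 1)
print(variance(scores), stdev(scores))
```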
2.4.3 Estimating parameters
The parameter estimate we find is the one with the least error given the data we have.
Method of least squares: A method of estimating parameters (such as the mean, or a regression coefficient) that is based on minimizing the sum of squared errors. The parameter estimate will be the value, out of all possible values, that has the smallest sum of squared errors.
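To make the least-squares idea concrete, the sketch below fits the simplest model, outcome = estimate + error, by brute force: it tries many candidate values and keeps the one with the smallest sum of squared errors. This grid search is only an illustration (in practice the minimum is found analytically, and for this model it is the mean); the data and the function name are hypothetical.

```python
def sum_of_squared_errors(values, estimate):
    """Total error of the one-parameter model: outcome_i = estimate + error_i."""
    return sum((x - estimate) ** 2 for x in values)


scores = [2, 4, 4, 5, 6, 9]  # hypothetical data, mean = 5

# Brute-force least squares: try candidate estimates 0.00, 0.01, ..., 10.00
candidates = [i / 100 for i in range(1001)]
best = min(candidates, key=lambda b: sum_of_squared_errors(scores, b))

print(best, sum_of_squared_errors(scores, best))  # the mean minimizes the SSE
print(sum_of_squared_errors(scores, 4.0), sum_of_squared_errors(scores, 6.0))  # larger
```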
Standard normal distribution: A normal distribution with mean 0 and standard deviation 1, written N(0, 1).
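Scores are usually brought onto the N(0, 1) scale by standardizing them (z-scores: subtract the mean, divide by the standard deviation). A minimal Python sketch with hypothetical data; the helper `z_scores` is illustrative, and `statistics.NormalDist` provides the standard normal distribution.

```python
from statistics import NormalDist, mean, stdev


def z_scores(values):
    """Standardize: subtract the mean and divide by the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


scores = [2, 4, 4, 5, 6, 9]  # hypothetical data
z = z_scores(scores)
print(round(mean(z), 10), round(stdev(z), 10))  # 0 and 1 after standardizing

standard_normal = NormalDist(mu=0, sigma=1)  # N(0, 1)
print(standard_normal.cdf(1.96))  # area below z = 1.96, about 0.975
```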


2.5.1 The standard error
Sampling variation: The extent to which a statistic (the mean, median, t, F, etc.) varies in samples taken from the same population.
Sampling distribution (probability distribution): The distribution of possible values of a given statistic that we could expect to get from a given population.
Standard deviation of the sampling distribution of the sample mean: If we have a population with mean μ and standard deviation σ and we repeatedly draw random samples of n observations from this population, then the standard deviation of the sampling distribution of $\bar{x}$ is given by:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The standard deviation of the sampling distribution of the sample mean is also referred to as the standard error (of the mean) (SE). It is a measure of how representative a sample is likely to be of the population. Because σ is usually unknown, it is estimated by the sample standard deviation s, giving $SE = s/\sqrt{n}$.
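The standard error formula is simple arithmetic, but it shows one practical consequence: because n sits under a square root, quadrupling the sample size only halves the standard error. A minimal Python sketch; the sample, σ = 15 and the function names are hypothetical.

```python
from math import sqrt
from statistics import stdev


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n), with s the sample SD."""
    return stdev(sample) / sqrt(len(sample))


def sigma_over_root_n(sigma, n):
    """Standard error when the population SD sigma is known."""
    return sigma / sqrt(n)


sample = [2, 4, 4, 5, 6, 9]  # hypothetical sample, n = 6
print(standard_error(sample))

# Quadrupling n halves the standard error: 15/sqrt(25) = 3.0, 15/sqrt(100) = 1.5
print(sigma_over_root_n(15, 25), sigma_over_root_n(15, 100))
```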
Central limit theorem (CLT): For any population with finite mean μ and finite non-zero variance σ², the sampling distribution of the sample mean approaches a normal distribution with mean μ and standard deviation σ/√n as n increases:
$$\bar{x} \approx N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
when n is "large enough", even when the population itself is not normally distributed. The approximation holds when (1) n is large enough (a common rule of thumb is n > 30) and (2) the observations are independent.
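The central limit theorem can be checked by simulation. The sketch below assumes a strongly right-skewed population (an exponential distribution with rate 1, so μ = 1 and σ = 1; this choice is ours, not the source's), draws many samples of n = 40 independent observations, and looks at the distribution of the sample means. The seed, sample size and number of repetitions are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` samples of size n from a skewed exponential population
    (rate 1, so mu = 1 and sigma = 1) and return the mean of each sample."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


n = 40
means = simulate_sample_means(n, reps=5000)

print(mean(means))          # close to mu = 1
print(stdev(means))         # close to sigma / sqrt(n)
print(1 / sqrt(n))          # sigma / sqrt(n) = 0.158...

# Share of sample means within mu +/- 1.96 * sigma/sqrt(n): roughly 0.95 if normal
se = 1 / sqrt(n)
print(sum(abs(m - 1) < 1.96 * se for m in means) / len(means))
```

Even though individual exponential values are far from normal, about 95% of the simulated means fall within 1.96 standard errors of μ, as the normal approximation predicts.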
2.5.2.1 Calculating confidence intervals
Confidence interval: An interval of reasonable values for a population parameter, such as the population mean. A confidence interval is a range of scores likely to contain the parameter being estimated. Intervals can be constructed to be more or less likely to contain the parameter: 95% of 95% confidence intervals contain the true value of the parameter, whereas 99% of 99% confidence intervals do. The wider the confidence interval, the more uncertainty there is about the value of the parameter. An interval has a confidence level C, where C is the probability that the interval will capture the true parameter value in repeated samples. For the mean (with σ known, or n large):
$$\bar{x} \pm z^{*}\,\frac{\sigma}{\sqrt{n}}$$
where $z^{*}$ is the value of the standard normal distribution that leaves an area of (1 - C)/2 in the upper tail (for C = 95%, $z^{*} = 1.96$).
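A z-based confidence interval for the mean can be computed in a few lines of Python. In this sketch the sample mean, the known σ and n are hypothetical, and the critical value z* is obtained from `statistics.NormalDist().inv_cdf`, using the upper-tail area (1 - C)/2. Comparing the 95% and 99% intervals shows that more confidence comes at the cost of a wider interval.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, level=0.95):
    """CI for the mean: xbar +/- z* * sigma / sqrt(n)."""
    # z* leaves an area of (1 - level) / 2 in the upper tail of N(0, 1)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical: sample mean 100, known sigma 15, n = 36 (so sigma/sqrt(n) = 2.5)
ci95 = z_confidence_interval(100, 15, 36)
ci99 = z_confidence_interval(100, 15, 36, level=0.99)
print(ci95)  # roughly (95.1, 104.9)
print(ci99)  # wider: more confidence costs precision
```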
2.5.2.3 Calculating confidence intervals in small samples
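t-distribution: Used for confidence intervals in small samples, where σ has to be estimated by the sample standard deviation s; using the t-distribution counteracts the resulting bias (an interval based on the standard normal distribution would be too narrow). The distributions are bell-shaped and symmetric about 0, but their precise form depends on their degrees of freedom. We use the notation t(k) for a t-distribution with k degrees of freedom. t-distributions have more probability in the tails than the standard normal distribution, but as the degrees of freedom increase, the t-distribution approaches the standard normal distribution.
$$CI = \bar{x} \pm t_{n-1}\,\frac{s}{\sqrt{n}}$$
where $t_{n-1}$ is the critical value of t(n - 1) for the chosen confidence level.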
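The small-sample interval uses the same structure as the z-interval, with a t critical value instead. Python's standard library has no t-distribution, so this sketch hard-codes the tabulated two-sided 95% critical value for 9 degrees of freedom (2.262); for other sample sizes you would look up t(n - 1) in a table or use a library such as SciPy (`scipy.stats.t.ppf`). The sample and function name are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Critical value of t(9) leaving 2.5% in the upper tail (95% two-sided), read from a
# t-table. Assumption: for other sample sizes, look up t(n - 1) in a table or use a
# library function such as scipy.stats.t.ppf(0.975, df).
T_CRIT_95_DF9 = 2.262


def t_confidence_interval(sample, t_crit):
    """CI for the mean in a small sample: xbar +/- t(n-1) * s / sqrt(n)."""
    se = stdev(sample) / sqrt(len(sample))
    xbar = mean(sample)
    return xbar - t_crit * se, xbar + t_crit * se


sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical, n = 10 -> df = 9
lo_t, hi_t = t_confidence_interval(sample, T_CRIT_95_DF9)
lo_z, hi_z = t_confidence_interval(sample, 1.96)  # same formula with z* instead of t
print(lo_t, hi_t)   # about 12.369 to 14.631
print(lo_z, hi_z)   # narrower: about 12.52 to 14.48
```

The heavier tails of t(9) give a wider interval than the normal-based one, which is exactly the correction for estimating σ from only ten observations.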
2.5.2.4 Showing confidence intervals visually
By showing confidence intervals graphically, we can see whether they overlap or not (and thus whether the means could come from the same population).
If they do not overlap, there are two possible explanations:
1. Both confidence intervals contain the mean of their own population, but the samples come from different populations.
2. Both samples come from the same population, but one of the intervals does not contain the population mean.
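The visual check of whether two intervals overlap can also be written as a one-line test: two closed intervals overlap when each starts before the other ends. A minimal Python sketch with hypothetical 95% intervals; the function name is illustrative. It only answers "overlap or not"; deciding between the two explanations above still requires reasoning about the study.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a = (lower, upper) and b = (lower, upper) share a value."""
    return a[0] <= b[1] and b[0] <= a[1]


ci_group_1 = (95.1, 104.9)   # hypothetical 95% CIs
ci_group_2 = (101.3, 111.0)
ci_group_3 = (106.2, 115.8)
print(intervals_overlap(ci_group_1, ci_group_2))  # True
print(intervals_overlap(ci_group_1, ci_group_3))  # False
```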
2.6.1 Null hypothesis significance testing
Significance test: There are two approaches.
1. Fisher: A significance test is conducted and the probability value reflects the strength of the evidence against the null hypothesis.
p < 0.01: the data provide strong evidence that the null hypothesis is false.
0.01 < p < 0.05: the null hypothesis is rejected, but with less confidence.
0.05 < p < 0.10: weak evidence; the null hypothesis is not rejected. Higher probabilities provide less evidence that the null hypothesis is false.
This approach is more suitable for scientific research.
2. Neyman and Pearson: Specify an α level before analyzing the data. If the analysis results in a probability value below the α level, the null hypothesis is rejected; if not, the null hypothesis is not rejected. How far the probability value falls below α does not matter.
This approach is more suitable for yes/no decisions.
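The difference between the two approaches can be expressed as two small decision functions. This is a sketch of the rules as summarized above, not an official procedure: the labels and function names are hypothetical, values exactly on a boundary are assigned to the higher band (the source does not say), and α = 0.05 is only a default.

```python
def fisher_evidence(p):
    """Fisher: read the p-value as graded strength of evidence against H0."""
    if p < 0.01:
        return "strong evidence against H0"
    if p < 0.05:
        return "H0 rejected, but with less confidence"
    if p < 0.10:
        return "weak evidence; H0 not rejected"
    return "even less evidence against H0"


def neyman_pearson(p, alpha=0.05):
    """Neyman-Pearson: alpha is fixed before the analysis; only the side of alpha matters."""
    return "reject H0" if p < alpha else "do not reject H0"


for p in (0.001, 0.03, 0.049, 0.07):
    print(p, fisher_evidence(p), neyman_pearson(p))
```

Note how p = 0.001 and p = 0.049 get different Fisher readings but the same Neyman-Pearson decision.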
Alternative hypothesis: The prediction that there will be an effect.
Null hypothesis: The hypothesis that your prediction is wrong and the predicted effect does not exist.
Probability value: The probability of an outcome (at least as extreme as the one observed) given that the null hypothesis is true. It is not the probability of the hypothesis given the outcome.
Significance level: The probability value below which the null hypothesis is rejected is called the significance level (α). If the null hypothesis is rejected, it only means that the effect is unlikely to be exactly zero; it does not tell whether the effect is important or large. A statistically significant result also does not prove that the effect is real: it means that a result this extreme would be unlikely if the null hypothesis were true.
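Hypotheses can be directional (one-tailed: the direction of the effect is predicted) or non-directional (two-tailed: only the existence of an effect is predicted).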
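How the probability value depends on whether the hypothesis is directional can be shown with a one-sample z-test. This sketch assumes a known population σ and a standard-normal test statistic; the numbers (mean 104.5 against H0 value 100, σ = 15, n = 36) and function names are hypothetical. A one-tailed p-value only counts the predicted tail, so it is half the two-tailed value when the effect goes in the predicted direction.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(xbar, mu0, sigma, n):
    """Standardized distance between the sample mean and the H0 value."""
    return (xbar - mu0) / (sigma / sqrt(n))


def p_value(z, directional=False):
    """Probability of a test statistic at least this extreme if H0 is true."""
    std_normal = NormalDist()  # N(0, 1)
    if directional:
        # one-tailed, assuming the hypothesis predicts a positive effect (large z)
        return 1 - std_normal.cdf(z)
    # two-tailed: a result this extreme in either direction counts
    return 2 * (1 - std_normal.cdf(abs(z)))


z = z_statistic(xbar=104.5, mu0=100, sigma=15, n=36)  # hypothetical: z = 1.8
print(z, p_value(z), p_value(z, directional=True))
```

With α = 0.05, this hypothetical result would be significant under the directional hypothesis (p ≈ 0.036) but not under the non-directional one (p ≈ 0.072).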
2.6.1.4 Test statistic
