4.4C Applied Multivariate Data Analysis: Summary of the Field Book

Summary of Field's book for the course 4.4C Applied Multivariate Data Analysis. The summary covers chapters 2, 3, 6, 8, 9, 11, 12, 13, 14, 15, 16, and 17.

  • 23 January 2022
  • 52 pages
  • 2021/2022
Seller: KenzaS
Field Book – Ch. 2, 3, 6, 8, 9, 11, 12, 13, 14, 15, 16, 17

Chapter 2: The Spine of Statistics

2.1 What will this chapter tell me?
How we can use the properties of data to go beyond our observations and draw inferences
about the world at large.

2.2 What is the SPINE of statistics?
Standard error
Parameters
Interval estimates (Confidence intervals)
Null hypothesis significance testing
Estimation

2.3 Statistical models
Scientists build (statistical) models of real-world processes to predict how these processes
operate under certain conditions. The degree to which a statistical model represents the
data collected is known as the fit of the model.

Outcome = Model + Error
This means that the data we observe can be predicted from the model we choose to fit plus
some amount of error.

2.4 Populations and Samples
Scientists are usually interested in finding results that apply to an entire population of
entities. We rarely have access to every member of a population. Therefore, we collect data
from a smaller subset of the population known as a sample.

2.5 P is for Parameters
Statistical models are made up of variables and parameters. Variables are measured
constructs that vary across entities in the sample. In contrast, parameters are not measured
and are (usually) constants believed to represent some fundamental truth about the
relations between variables in the model.
We can predict values of an outcome variable based on a model. The form of the model
changes, but there will always be some error in prediction, and there will always be
parameters that tell us about the shape or form of the model.

2.5.1 The mean as a statistical model
The mean value is a hypothetical value: it is a model created to summarize the data and
there will be error in prediction. A hat on a symbol in an equation means that it is an
estimate.
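
As a minimal sketch of the mean as a model, assuming a small hypothetical data set (`scores` is
invented for illustration and is not taken from the summary), each observed score can be
written as the model (the mean) plus an error:

```python
from statistics import mean

# Hypothetical scores for five entities (an assumption for illustration).
scores = [1, 2, 3, 3, 4]

# The model: a single estimated parameter, the sample mean.
model = mean(scores)

# Error (deviance) for each entity: observed score minus the model's prediction.
errors = [x - model for x in scores]

# Outcome = Model + Error holds for every entity.
reconstructed = [model + e for e in errors]

print(model)
print(errors)
```

The errors sum to zero (up to floating-point rounding), which is why they are squared before
being added up when the fit is assessed.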

2.5.2 Assessing the fit of a model: sums of squares and variance revisited
The error or deviance for a particular entity is the score predicted by the model for that
entity subtracted from the corresponding observed score.

The sum of squares (SS) can be used to assess the total error in any model: square the error
for each entity and add the squared errors together. To estimate the mean squared error (also
known as the variance) in the population, we divide the SS by the degrees of freedom
(df = n - 1), which gives SS/df.
We can use the sum of squared errors and the mean squared error to assess the fit of a
model.

Degrees of freedom relate to the number of observations that are free to vary.
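
The calculation can be sketched as follows, assuming a hypothetical data set (`scores` is an
invented example). The result is compared with `statistics.variance`, which also divides by
n - 1:

```python
from statistics import mean, variance

# Hypothetical scores (an assumption for illustration).
scores = [1, 2, 3, 3, 4]

m = mean(scores)

# Sum of squared errors: squared deviation of each score from the model (the mean).
ss = sum((x - m) ** 2 for x in scores)

# Degrees of freedom: number of observations minus one.
df = len(scores) - 1

# Mean squared error, i.e. the variance estimate.
mse = ss / df

print(ss, df, mse)
print(variance(scores))  # same quantity computed by the standard library
```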

2.6 E is for estimating parameters
The equation for the mean is designed to estimate that parameter in a way that minimizes the
error. That doesn’t necessarily mean that the value is a good fit to the data, but it is a
better fit than any other value you might have chosen.
This section has focused on the principle of minimizing the sum of squared errors, which is
known as the method of least squares or ordinary least squares (OLS). However, there are
other estimation methods as well.
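
A small sketch can illustrate the least-squares idea. It assumes a hypothetical data set and
simply tries a grid of candidate values (the grid and the helper `sse` are illustrative
choices, not part of the summary) to see which one gives the smallest sum of squared errors:

```python
from statistics import mean

def sse(data, b):
    """Sum of squared errors when the single value b is used as the model."""
    return sum((x - b) ** 2 for x in data)

# Hypothetical scores (an assumption for illustration).
scores = [1, 2, 3, 3, 4]

# Candidate parameter values 0.0, 0.1, ..., 5.0.
candidates = [i / 10 for i in range(0, 51)]

# The candidate with the smallest sum of squared errors.
best = min(candidates, key=lambda b: sse(scores, b))

print(best, mean(scores))
```

On this grid the best candidate coincides with the sample mean, which is what the least-squares
principle predicts.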

2.7 S is for standard error
To go beyond the data we need to look at how representative our samples are of the
population of interest. The population mean, μ, is the parameter we’re trying to estimate.
But since we don’t have access to the whole population we use a sample, from which we get
the sample mean. If we take multiple samples we get different means; this illustrates
sampling variation. Because the samples contain different members of the population, their
means vary.

A sampling distribution is the frequency distribution of sample means (or of whatever
parameter you’re trying to estimate) from the same population. If we had thousands of
samples (a hypothetical, "unicorn" idea), the average of all the sample means would be the
population mean. The standard deviation of the sample means would tell us how widely sample
means spread around the population mean, and so how representative of the population a
sample mean is likely to be. The standard deviation of sample means is known as the standard
error of the mean (SE), or standard error for short. It would be calculated by taking the
difference between each sample mean and the overall mean, squaring those differences, adding
them up, and dividing by the number of samples. Finally, the square root of this value would
be taken.
In practice, we don’t take that many samples. The central limit theorem tells us that as
samples get large (N > 30; for smaller samples the sampling distribution follows a
t-distribution), the sampling distribution is normal, with a mean equal to the population
mean and a standard deviation of

$\sigma_{\bar{X}} = \frac{s}{\sqrt{N}}$
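
The relation between the two routes to the standard error can be sketched with a simulation.
Everything here is an assumption made for illustration: the simulated normal population (mean
100, SD 15), the sample size of 50, the number of repeated samples, and the random seed.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for illustration).
population = [random.gauss(100, 15) for _ in range(20_000)]
n = 50

# Route 1: take many samples and compute the standard deviation of their means
# (dividing by the number of samples, as described in the text).
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]
empirical_se = pstdev(sample_means)

# Route 2: estimate the standard error from a single sample as s / sqrt(N).
one_sample = random.sample(population, n)
estimated_se = stdev(one_sample) / sqrt(n)

print(empirical_se, estimated_se)
```

The two values should be close to each other, although the single-sample estimate varies from
sample to sample.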

2.8 I is for (confidence) interval
We can use the estimated parameters and standard error to calculate boundaries within
which we believe the population value will fall, called confidence intervals.

2.8.1 Calculating confidence intervals
Rather than fixating on a single value from the sample (the point estimate), we could use an
interval estimate instead: we use our sample value as the midpoint but set a lower and
upper limit as well. Typically, we look at 95% confidence intervals: they are limits
constructed such that, for a certain percentage of samples (here 95%), the true value of the
population parameter falls within the limits. To calculate the confidence interval, we need
to know the limits within which 95% of sample means will fall.

Lower boundary of confidence interval = $\bar{X} - (1.96 \times SE)$
Upper boundary of confidence interval = $\bar{X} + (1.96 \times SE)$
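
A minimal sketch of this calculation, assuming a hypothetical sample of 40 values (the data and
the helper name `ci95` are invented for illustration) and estimating SE as s divided by the
square root of N:

```python
from math import sqrt
from statistics import mean, stdev

def ci95(data):
    """95% confidence interval for the mean using the z-value 1.96."""
    m = mean(data)
    se = stdev(data) / sqrt(len(data))  # standard error of the mean
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical sample of 40 scores (an assumption for illustration).
data = [20 + (i % 7) for i in range(40)]

lower, upper = ci95(data)
print(lower, mean(data), upper)
```

The sample mean sits exactly at the midpoint of the interval.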

2.8.2 Calculating other confidence intervals
Sometimes we want to calculate other types of confidence intervals such as 99% or 90%.
Then you need to find the corresponding z-value. For a 95% interval, for example,
$(1 - 0.95)/2 = 0.025$; looking up this tail probability in the table gives z = 1.96. For
other confidence levels, replace 1.96 in the formula by the new z-value.
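
Instead of a printed table, the z-value can be obtained from the standard normal distribution
in Python's standard library. This is a sketch; the helper name `z_for_confidence` is an
assumption introduced for illustration:

```python
from statistics import NormalDist

def z_for_confidence(level):
    """z-value that leaves (1 - level) / 2 in each tail of the standard normal."""
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```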

2.8.3 Calculating confidence intervals in small samples
For smaller samples, the sampling distribution follows a t-distribution rather than a normal
distribution. So to construct a confidence interval in a small sample we use the same
principle as before, but instead of the value for z we use the value for t with n - 1
degrees of freedom.
Lower boundary of confidence interval = $\bar{X} - (t_{n-1} \times SE)$
Upper boundary of confidence interval = $\bar{X} + (t_{n-1} \times SE)$
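
A small-sample interval can be sketched as below. Python's standard library has no
t-distribution, so this sketch assumes a short lookup table of standard two-tailed 95% critical
values for a few degrees of freedom; the table `T_CRIT_95`, the helper `t_ci95`, and the data
are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed 95% critical values of t for selected degrees of freedom
# (a small excerpt of a standard t-table; other df are not covered here).
T_CRIT_95 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045}

def t_ci95(data):
    """95% confidence interval for the mean using t with n - 1 degrees of freedom."""
    n = len(data)
    t = T_CRIT_95[n - 1]  # raises KeyError if df is not in the excerpt above
    m = mean(data)
    se = stdev(data) / sqrt(n)
    return m - t * se, m + t * se

# Hypothetical small sample (an assumption for illustration).
scores = [1, 2, 3, 3, 4]
lower, upper = t_ci95(scores)
print(lower, upper)
```

Because the critical value of t is larger than 1.96 for small samples, the interval is wider
than the one based on z.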

2.8.4 Showing confidence intervals visually
The confidence interval is usually displayed using something called an error bar, which looks
like the letter ‘I’. If the bars of any two means do not overlap, then we can infer that
these means are likely to come from different populations: they are significantly different.

2.9 N is for null hypothesis significance testing
2.9.1 Fisher’s p-value
Only when there is a 5% chance (or 0.05 probability) or less of getting the result we have
(or one more extreme) if no effect exists are we confident enough to accept that the effect
is genuine.
Fisher’s basic point was that you should calculate the probability of an event and evaluate
this probability within the research context.

2.9.2 Types of hypothesis
In contrast to Fisher, Neyman and Pearson believed that scientific statements should be split
into testable hypotheses. The hypothesis or prediction from your theory would normally be
that an effect will be present, the alternative hypothesis (H1, sometimes called
experimental hypothesis). The null hypothesis (H0) is the opposite of the alternative
hypothesis and usually states that an effect is absent. The null hypothesis is useful because
it gives us a baseline against which to evaluate how plausible our alternative hypothesis is.
We can talk only in terms of the probability of obtaining a particular result or statistic if,
hypothetically speaking, the null hypothesis were true.
Hypotheses can be directional or non-directional. A directional hypothesis states that an
effect will occur and also states the direction of the effect (e.g., less chocolate; tested
one-tailed). A non-directional hypothesis states that an effect will occur, but not its
direction (e.g., a different amount of chocolate; tested two-tailed).
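
The difference between the two kinds of hypothesis shows up in how the p-value is computed. The
sketch below assumes a test statistic z that follows a standard normal distribution when the
null hypothesis is true, and a directional hypothesis that predicts a positive effect; the
helper name `p_values` is invented for illustration.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a standard normal test statistic."""
    std_normal = NormalDist()
    # Directional (one-tailed): probability of a value at least this large.
    one_tailed = 1 - std_normal.cdf(z)
    # Non-directional (two-tailed): probability of a value at least this extreme
    # in either direction.
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed

one_tailed, two_tailed = p_values(1.96)
print(round(one_tailed, 3), round(two_tailed, 3))
```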

2.9.3 The process of NHST
NHST is a blend of Fisher’s idea of using the probability value p as an index of the weight of
evidence against a null hypothesis, and Neyman and Pearson’s idea of testing a null
hypothesis against an alternative hypothesis.




2.9.4 Test statistics
Systematic variation is variation that can be explained by the model we’ve fitted to the data.
Unsystematic variation is not attributable to the effect we’re investigating and cannot be
explained by the model we’ve fitted. The simplest way to test whether the model fits the
data, or whether our hypothesis is a good explanation of the data we have observed, is to
compare the systematic variation against the unsystematic variation.
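$\text{test statistic} = \frac{\text{signal}}{\text{noise}} = \frac{\text{variance explained by the model}}{\text{variance not explained by the model}} = \frac{\text{parameter}}{\text{sampling variation in the parameter}} = \frac{\text{effect}}{\text{error}}$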
The exact form of the calculation changes depending on which test statistic you’re
calculating.
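
One concrete instance of this ratio is a one-sample t-statistic, sketched here under
assumptions: the data, the null value `mu0`, and the helper name `one_sample_t` are invented
for illustration. The effect is the difference between the sample mean and the null value, and
the error is the standard error of the mean.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Test statistic = effect / error for a one-sample comparison with mu0."""
    effect = mean(data) - mu0                  # signal: distance from the null value
    error = stdev(data) / sqrt(len(data))      # noise: standard error of the mean
    return effect / error

# Hypothetical scores and null value (assumptions for illustration).
scores = [1, 2, 3, 3, 4]
t_stat = one_sample_t(scores, mu0=2)
print(t_stat)
```

Other tests use different definitions of effect and error, but the signal-to-noise structure is
the same.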
