4.4C Applied Multivariate Data Analysis: Summary of Field's Book

Summary of Field's book for the course 4.4C Applied Multivariate Data Analysis. The summary includes chapters 2, 3, 6, 8, 9, 11, 12, 13, 14, 15, 16, and 17.

  • January 23, 2022
  • 2021/2022
Field Book – Ch. 2,3,6,8,9,11, 12, 13, 14, 15, 16, 17

Chapter 2: The Spine of Statistics

2.1 What will this chapter tell me?
How we can use the properties of data to go beyond our observations and draw inferences
about the world at large.

2.2 What is the SPINE of statistics?
Standard error
Parameters
Interval estimates (Confidence intervals)
Null hypothesis significance testing
Estimation

2.3 Statistical models
Scientists build (statistical) models of real-world processes to predict how these processes
operate under certain conditions. The degree to which a statistical model represents the
data collected is known as the fit of the model.

Outcome = Model + Error
This means that the data we observe can be predicted from the model we choose to fit plus
some amount of error.

2.4 Populations and Samples
Scientists are usually interested in finding results that apply to an entire population of
entities. We rarely have access to every member of a population. Therefore, we collect data
from a smaller subset of the population known as a sample.

2.5 P is for Parameters
Statistical models are made up of variables and parameters. Variables are measured
constructs that vary across entities in the sample. In contrast, parameters are not measured
and are (usually) constants believed to represent some fundamental truth about the
relations between variables in the model.
We can predict values of an outcome variable based on a model. The form of the model
changes, but there will always be some error in prediction, and there will always be
parameters that tell us about the shape or form of the model.

2.5.1 The mean as a statistical model
The mean is a hypothetical value: it is a model created to summarize the data, and
there will be error in prediction. A hat on a symbol in an equation means that it is an estimate.

2.5.2 Assessing the fit of a model: sums of squares and variance revisited
The error or deviance for a particular entity is the score predicted by the model for that
entity subtracted from the corresponding observed score.

The sum of squares (SS) can be used to assess the total error in any model: square the error
for each entity and add the squared errors together. To estimate the mean squared error
(also known as the variance) in the population, we divide the SS by the degrees of freedom
(df = n − 1), giving SS/df.
We can use the sum of squared errors and the mean squared error to assess the fit of a
model.

Degrees of freedom relate to the number of observations that are free to vary.
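
As a rough illustration of the quantities above, the sketch below treats the mean as the model and computes the errors, the sum of squares and the variance. The five scores are made up for the example and the variable names are not from the book.

```python
# Hypothetical scores for five entities (made up for illustration).
scores = [1, 2, 3, 3, 4]

n = len(scores)
mean = sum(scores) / n          # the model: a single estimated parameter

# Error (deviance) per entity: observed score minus the score the model predicts.
errors = [x - mean for x in scores]

sum_of_squares = sum(e ** 2 for e in errors)   # total squared error (SS)
df = n - 1                                     # degrees of freedom
variance = sum_of_squares / df                 # mean squared error (SS / df)

print(f"mean = {mean:.2f}")
print(f"SS = {sum_of_squares:.2f}, df = {df}, variance = {variance:.2f}")
```

The raw errors sum to (approximately) zero, which is why they are squared before being added up.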

2.6 E is for estimating parameters
The equation for the mean is designed to estimate that parameter so that the error is minimized.
That doesn’t necessarily mean that the value is a good fit to the data, but it is a better fit
than any other value you might have chosen.
This section has focused on the principle of minimizing the sum of squared errors, which is
known as the method of least squares or ordinary least squares (OLS). However, there are
other estimation methods as well.
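
One way to see the least squares principle at work is to try many candidate values and check which one gives the smallest sum of squared errors. The sketch below does this with a simple grid search; the scores, the grid and the function name are assumptions made for the example, not part of the book.

```python
# Hypothetical scores (made up for illustration).
scores = [1, 2, 3, 3, 4]

def sum_of_squared_errors(data, guess):
    """Total squared error when `guess` is used to predict every score."""
    return sum((x - guess) ** 2 for x in data)

mean = sum(scores) / len(scores)

# Candidate parameter values from 0.0 to 5.0 in steps of 0.1.
candidates = [i / 10 for i in range(0, 51)]

# The candidate with the smallest sum of squared errors.
best = min(candidates, key=lambda c: sum_of_squared_errors(scores, c))

print(f"mean = {mean}, best candidate = {best}")
print(f"SS at the mean = {sum_of_squared_errors(scores, mean):.2f}")
print(f"SS at 3.0      = {sum_of_squared_errors(scores, 3.0):.2f}")
```

For these scores the best candidate on the grid coincides with the mean, and any other value (such as 3.0) gives a larger sum of squared errors.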

2.7 S is for standard error
To go beyond the data we need to look at how representative our samples are of the
population of interest. The population mean, μ, is the parameter we’re trying to estimate.
Because we don’t have access to the whole population, we use a sample and calculate
the sample mean. If we take multiple samples we get different means. This illustrates
sampling variation: samples vary because they contain different members of the
population.

A sampling distribution is the frequency distribution of sample means (or of whatever
parameter you’re trying to estimate) from the same population. If we could take thousands
of samples (a unicorn idea), the average of all the sample means would be the population
mean. The standard deviation of the sample means would tell us how widely they spread
around the population mean, and so how representative of the population a sample mean is
likely to be. The standard deviation of sample means is known as the standard error of the
mean (SE), or standard error for short. It would be calculated by taking the difference
between each sample mean and the overall mean, squaring those differences, adding them up,
and then dividing by the number of samples. Finally, the square root of this value would
need to be taken.
In practice, we don’t take that many samples. The central limit theorem tells us that as
samples get large (more than 30; smaller samples follow a t-distribution), the sampling
distribution is normal, with a mean equal to the population mean and a standard deviation of

$$\sigma_{\bar{X}} = \frac{s}{\sqrt{N}}$$
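
The "thousands of samples" idea can be tried out in a small simulation. The sketch below is only an illustration under assumed values (a simulated normal population with mean 50 and SD 10, samples of 40, 5,000 repetitions); none of these numbers come from the book. It compares the standard deviation of many sample means with the estimate s/√N obtained from a single sample.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 normally distributed scores.
population = [random.gauss(50, 10) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 40               # size of each sample
n_samples = 5_000    # the "unicorn" number of repeated samples

sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(n_samples)
]

# Standard deviation of the sample means: the standard error, found the long way.
empirical_se = statistics.pstdev(sample_means)

# What we can do in practice: estimate the standard error from one sample.
one_sample = random.sample(population, n)
estimated_se = statistics.stdev(one_sample) / n ** 0.5

print(f"population mean          = {mu:.2f}")
print(f"mean of sample means     = {statistics.fmean(sample_means):.2f}")
print(f"SD of sample means (SE)  = {empirical_se:.2f}")
print(f"sigma / sqrt(n)          = {sigma / n ** 0.5:.2f}")
print(f"s / sqrt(n), one sample  = {estimated_se:.2f}")
```

The first two standard error values should agree closely; the single-sample estimate is noisier because it relies on one sample's standard deviation.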

2.8 I is for (confidence) interval
We can use the estimated parameters and standard error to calculate boundaries within
which we believe the population value will fall, called confidence intervals.

2.8.1 Calculating confidence intervals
Rather than fixating on a single value from the sample (the point estimate), we could use an
interval estimate instead: we use our sample value as the midpoint but set a lower and
upper limit as well. Typically, we look at 95% confidence intervals: they are limits
constructed such that, for a certain percentage of samples (here 95%), the true value of the
population parameter falls within the limits. To calculate the confidence interval, we need
to know the limits within which 95% of sample means will fall.

Lower boundary of confidence interval = X̄ − (1.96 × SE)
Upper boundary of confidence interval = X̄ + (1.96 × SE)
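
The two boundary formulas can be sketched as a small function, followed by a simulation of what "95% of samples" means. The function name and all simulation settings (population mean 100, SD 15, samples of 50, 2,000 repetitions) are assumptions chosen for the example.

```python
import random
import statistics

def confidence_interval(sample_mean, standard_error, z=1.96):
    """Return (lower, upper) limits: mean minus/plus z times the standard error."""
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Worked example with made-up numbers: mean 10, SE 2.
lower, upper = confidence_interval(10.0, 2.0)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")

# Simulation: how often does the interval contain the true population mean?
random.seed(7)
MU, SIGMA, N, REPS = 100.0, 15.0, 50, 2_000   # assumed values

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    se = statistics.stdev(sample) / N ** 0.5
    lo, hi = confidence_interval(statistics.fmean(sample), se)
    if lo <= MU <= hi:
        hits += 1

coverage = hits / REPS
print(f"Intervals containing the population mean: {coverage:.1%}")
```

The proportion should land close to 95%. It tends to fall slightly short because the standard error is itself estimated from each sample.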

2.8.2 Calculating other confidence intervals
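Sometimes we want to calculate other confidence intervals, such as 99% or 90%. You then need
to find the matching z-value. For a 95% interval, the probability in each tail is

$$\frac{1 - 0.95}{2} = 0.025$$

Looking this value up in the table gives z = 1.96. For other confidence levels, replace 0.95
in this calculation and replace 1.96 in the formula with the new z-value.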
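
Instead of a printed table, the z-value can be looked up with the standard normal distribution in Python's standard library. This is a minimal sketch; the function name `z_for_confidence` is made up for the example.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """z-value that cuts off (1 - level) / 2 in each tail of the standard normal."""
    tail = (1 - level) / 2            # e.g. 0.025 for a 95% interval
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} interval: z = {z_for_confidence(level):.2f}")
```

A higher confidence level gives a larger z-value and therefore a wider interval.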

2.8.3 Calculating confidence intervals in small samples
For smaller samples, the sampling distribution follows a t-distribution rather than a normal
distribution. To construct a confidence interval in a small sample we use the same principle
as before, but instead of the value for z we use the value for t with n − 1 degrees of
freedom.
Lower boundary of confidence interval = X̄ − (t_{n−1} × SE)
Upper boundary of confidence interval = X̄ + (t_{n−1} × SE)
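
A small-sample interval can be sketched as follows. Python's standard library has no t-distribution, so this example assumes a tiny lookup of two-tailed 95% critical values copied from a standard t-table; the sample of ten scores and the function name are made up.

```python
import statistics

# Two-tailed 95% critical values of t from a standard t-table.
# Only a few degrees of freedom are listed; this is not a general lookup.
T_CRITICAL_95 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045}

def t_confidence_interval(sample):
    """95% confidence interval for the mean using t with n - 1 degrees of freedom."""
    n = len(sample)
    df = n - 1
    if df not in T_CRITICAL_95:
        raise ValueError(f"no tabled t-value for df = {df}")
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    margin = T_CRITICAL_95[df] * se
    return mean - margin, mean + margin

sample = [4, 6, 5, 7, 3, 5, 6, 4, 5, 5]   # hypothetical scores, n = 10
lower, upper = t_confidence_interval(sample)
print(f"95% CI using t (df = 9): [{lower:.3f}, {upper:.3f}]")
```

Because the tabled t-value for 9 degrees of freedom (2.262) is larger than 1.96, the interval is wider than the z-based interval for the same data.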

2.8.4 Showing confidence intervals visually
The confidence interval is usually displayed using something called an error bar, which looks
like the letter ‘I’. If the bars of any two means do not overlap, we can infer that these
means come from different populations: they are significantly different.

2.9 N is for null hypothesis significance testing
2.9.1 Fisher’s p-value
Only when there is a 5% chance (0.05 probability) or less of getting the result we have (or
one more extreme) if no effect exists are we confident enough to accept that the effect is
genuine. Fisher’s basic point was that you should calculate the probability of an event and
evaluate this probability within the research context.

2.9.2 Types of hypothesis
In contrast to Fisher, Neyman and Pearson believed that scientific statements should be split
into testable hypotheses. The hypothesis or prediction from your theory would normally be
that an effect will be present, the alternative hypothesis (H1, sometimes called
experimental hypothesis). The null hypothesis (H0) is the opposite of the alternative
hypothesis and usually states that an effect is absent. The null hypothesis is useful because
it gives us a baseline against which to evaluate how plausible our alternative hypothesis is.
We can talk only in terms of the probability of obtaining a particular result or statistic if,
hypothetically speaking, the null hypothesis were true.
Hypotheses can be directional or non-directional. A directional hypothesis states that an
effect will occur and also states the direction of the effect (e.g. less chocolate; this is
tested one-tailed). A non-directional hypothesis states that an effect will occur, but not
its direction (e.g. the amount of chocolate will change; this is tested two-tailed).
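
The difference between a directional and a non-directional hypothesis shows up in how the probability is read off the distribution. The sketch below assumes a test statistic that follows a standard normal distribution under the null hypothesis, and a made-up observed value of z = −1.80; the function names are illustrative only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def p_two_tailed(z):
    """Non-directional: probability of a result at least this extreme in either tail."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

def p_one_tailed_less(z):
    """Directional ('less'): probability of a result this low or lower."""
    return STANDARD_NORMAL.cdf(z)

z = -1.80   # hypothetical test statistic
print(f"two-tailed p        = {p_two_tailed(z):.3f}")
print(f"one-tailed p (less) = {p_one_tailed_less(z):.3f}")

# If the effect had gone in the opposite direction, the one-tailed p is large.
print(f"one-tailed p (less) for z = +1.80: {p_one_tailed_less(1.80):.3f}")
```

With these numbers the one-tailed probability is below 0.05 while the two-tailed probability is not, and a result in the unpredicted direction gives a one-tailed probability close to 1.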

2.9.3 The process of NHST
NHST is a blend of Fisher’s idea of using the probability value p as an index of the weight of
evidence against a null hypothesis, and Neyman and Pearson’s idea of testing a null
hypothesis against an alternative hypothesis.




2.9.4 Test statistics
Systematic variation is variation that can be explained by the model we’ve fitted to the data.
Unsystematic variation is not attributable to the effect we’re investigating and cannot be
explained by the model we’ve fitted. The simplest way to test whether the model fits the
data, or whether our hypothesis is a good explanation of the data we have observed, is to
compare the systematic variation against the unsystematic variation.
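$$\text{Test statistic} = \frac{\text{signal}}{\text{noise}} = \frac{\text{variance explained by model}}{\text{variance not explained by model}} = \frac{\text{parameter}}{\text{sampling variation in the parameter}} = \frac{\text{effect}}{\text{error}}$$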
The exact form of the calculation changes depending on which test statistic you’re
calculating.
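
As one concrete instance of the signal-to-noise idea, the sketch below computes a one-sample t statistic: the difference between the sample mean and a null value (effect) divided by the standard error (error). The choice of this particular statistic, the scores and the null value of 4 are assumptions made for the example.

```python
import statistics

def one_sample_t(sample, null_value):
    """Effect (mean minus null value) divided by error (standard error of the mean)."""
    n = len(sample)
    effect = statistics.fmean(sample) - null_value    # signal
    error = statistics.stdev(sample) / n ** 0.5       # noise
    return effect / error

sample = [4, 6, 5, 7, 3, 5, 6, 4, 5, 5]   # hypothetical scores
t = one_sample_t(sample, null_value=4)
print(f"t = {t:.2f}")
```

The larger the effect relative to the error, the larger the test statistic and the less likely the result would be if the null hypothesis were true.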
