Summary for Intermediate Statistics 1 EUC


This summary is based on all the content of the course Intermediate statistics 1 from EUC and summarizes the material discussed in the book and lectures.


  • 25 January 2024
  • 24 pages
  • 2023/2024
  • Summary
lillianLouise
WEEK 1: basic statistics

Statistical inference: a process in which we draw conclusions about populations based on
sample data. The goal is to make generalizations or predictions about a population
parameter using information obtained from a subset of that population (the sample).
Statistical inference involves two main approaches: estimation and hypothesis testing.

Parameter: a numerical characteristic that describes a specific aspect of an entire
population. It is a fixed, often unknown value that is estimated from sample data in order to
make generalizations about the population (e.g. the population mean or the population
standard deviation).

Levels of measurement:
- Nominal: the data can only be categorized
- Ordinal: the data can be categorized and ranked
- Interval: the data can be categorized and ranked, and the values are evenly spaced
- Ratio: the data can be categorized and ranked, the values are evenly spaced, and there is a natural zero.

Law of large numbers: as the sample size increases, the sample mean approaches the
population mean.
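
The law of large numbers can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the normal population with mean 50 and standard deviation 10, the sample sizes, and the helper name `sample_mean_for_sizes` are hypothetical and not taken from the course material.

```python
import random
import statistics


def sample_mean_for_sizes(pop_mean, pop_sd, sizes, seed=0):
    """Draw one random sample per size and return {size: sample mean}."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    result = {}
    for n in sizes:
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        result[n] = statistics.fmean(sample)
    return result


# Assumed population: normal with mean 50 and standard deviation 10.
lln_means = sample_mean_for_sizes(50.0, 10.0, sizes=[10, 100, 10_000, 100_000])
for n, m in lln_means.items():
    print(f"n = {n:>7}: sample mean = {m:.3f}, distance from 50 = {abs(m - 50.0):.3f}")
```

The printed distance tends to shrink as n grows, although any single small sample can still land close to the population mean by chance.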

Central limit theorem: if you take sufficiently large samples from a population, the sample
means will be approximately normally distributed, even if the population itself isn’t normally
distributed.
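
A simulation can make the central limit theorem concrete. The sketch below assumes a right-skewed exponential population (mean 1, standard deviation 1), a sample size of 50, and 2000 repetitions; these choices and the helper name `simulate_sample_means` are illustrative assumptions, not part of the course.

```python
import math
import random
import statistics


def simulate_sample_means(n, replicates, seed=1):
    """Return the means of `replicates` samples of size n from an exponential population."""
    rng = random.Random(seed)
    means = []
    for _ in range(replicates):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # right-skewed, mean 1, sd 1
        means.append(statistics.fmean(sample))
    return means


sample_size = 50
clt_means = simulate_sample_means(sample_size, replicates=2000)
standard_error = 1.0 / math.sqrt(sample_size)  # population sd divided by sqrt(n)

# For a normal distribution, about 95% of values lie within 1.96 sd of the mean.
within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in clt_means) / len(clt_means)

print(f"mean of sample means: {statistics.fmean(clt_means):.3f}")
print(f"sd of sample means:   {statistics.stdev(clt_means):.3f} (theory: {standard_error:.3f})")
print(f"share within 1.96 standard errors: {within:.3f}")
```

Even though individual exponential values are strongly skewed, the share of sample means within 1.96 standard errors should come out close to the 95% expected under a normal distribution.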

Mean: The mean, also known as the average, is calculated by adding up all the values in a
dataset and then dividing by the number of values. The mean is sensitive to extreme values
(outliers) in the dataset.
Median: The median is the middle value when a dataset is ordered from smallest to largest.
If there is an even number of observations, the median is the average of the two middle
values. The median is not influenced by extreme values and is especially useful with skewed
distributions.
Mode: The mode is the value that occurs most frequently in a dataset. A dataset may have
one mode (unimodal), two modes (bimodal), or more (multimodal).
Unlike the mean and median, the mode can be applied to categorical data as well.
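
The three measures can be compared on a tiny dataset. This is a minimal sketch using Python's standard `statistics` module; the numbers and the colour list are made-up examples.

```python
import statistics

# Hypothetical dataset with one extreme value (40).
values = [2, 3, 3, 4, 5, 6, 40]
without_outlier = values[:-1]

print("with outlier:   ", statistics.mean(values), statistics.median(values), statistics.mode(values))
print("without outlier:", statistics.mean(without_outlier), statistics.median(without_outlier))

# The mode also works for categorical data; multimode returns every most-frequent value.
colours = ["red", "blue", "red", "green", "blue"]
print("modes of colours:", statistics.multimode(colours))
```

Dropping the outlier moves the mean from 9 to about 3.83, while the median only shifts from 4 to 3.5. The colour list has two modes, so it is bimodal.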
________________________________________________________________________

Right-skewed (positively skewed): The right tail of the distribution is longer than the left tail.
The majority of the data points are concentrated on the left side, and the distribution extends
to the right. In a histogram, it looks like a tail stretching to the right.

Left-skewed (negatively skewed): The left tail of the distribution is longer than the right tail.
The majority of data points are concentrated on the right side, and the distribution extends to
the left. In a histogram, it looks like a tail stretching to the left.




Bimodal distribution: a distribution with two distinct modes or peaks, indicating that the data
contain two prominent groups or patterns.
________________________________________________________________________

NHST: stands for Null Hypothesis Significance Testing. It is a common method used in
statistical inference to assess the evidence against a null hypothesis.

Here's a breakdown of the key components:

- Null Hypothesis (H0): This is a statement of no effect or no difference in the
population. It is often a default assumption that there is no change, no effect, or no
relationship.

- Alternative Hypothesis (H1 or Ha): This is the complement of the null hypothesis, stating
that there is an effect, difference, or relationship in the population.

- Test Statistic: A statistic calculated from the sample data that helps us decide
whether to reject the null hypothesis. A test statistic is a measure of agreement
between your sample data and what is expected under the null hypothesis.

- Significance Level (α): This is the probability of rejecting the null hypothesis when it is
actually true. Commonly used values are 0.05 or 0.01.

- P-value: The probability of obtaining a test statistic as extreme as, or more extreme
than, the one observed in the sample data, assuming the null hypothesis is true.

- Decision Rule: If the p-value is less than the chosen significance level (α), we reject
the null hypothesis in favor of the alternative hypothesis.
-> p < α : reject the null hypothesis


- Conclusion: Based on the decision rule, we make a conclusion about whether there
is enough evidence to reject the null hypothesis and support the alternative
hypothesis.
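
The components above can be traced in one small worked example. The sketch below assumes a hypothetical coin-flip study (60 heads in 100 flips, H0: the coin is fair) and uses an exact binomial calculation; the helper `binomial_two_sided_p` is an illustrative name, and its symmetric definition of "more extreme" is only appropriate for a null probability of 0.5.

```python
from math import comb


def binomial_two_sided_p(heads, flips, p_null=0.5):
    """Exact two-sided p-value for a fair-coin null hypothesis (p_null = 0.5).

    Sums the probabilities of all outcomes at least as far from the expected
    count as the observed one.
    """
    expected = flips * p_null
    observed_distance = abs(heads - expected)
    total = 0.0
    for k in range(flips + 1):
        if abs(k - expected) >= observed_distance:
            total += comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
    return total


alpha = 0.05            # significance level, chosen before looking at the data
heads, flips = 60, 100  # hypothetical data; the test statistic is the head count
p_value = binomial_two_sided_p(heads, flips)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

Here the p-value lands a little above 0.05, so under the decision rule the null hypothesis is not rejected at α = 0.05.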

Two-tailed tests: used when the alternative hypothesis is non-directional.
- A non-directional hypothesis states that a population parameter (such as a mean or
regression coefficient) is not equal to a certain value (such as 0). Two-tailed tests are
appropriate for most studies.
- If you’re calculating a confidence interval, choose two-tailed.

One-tailed tests: used when the alternative hypothesis is directional.
- A directional hypothesis states that a population parameter is greater than or less
than a certain value.
- Your alternative hypothesis is directional if it includes words such as “greater than,”
“less than,” “increases,” “decreases,” or the “<” or “>” sign. If it doesn’t include these
(or similar), it is probably non-directional.
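
The difference between the two kinds of test shows up in how the p-value is read off the standard normal distribution. The sketch below is one possible way to compute it with `statistics.NormalDist`; the function name `z_p_value`, the labels for the alternatives, and the value z = 1.8 are assumptions made for illustration.

```python
from statistics import NormalDist


def z_p_value(z, alternative="two-sided"):
    """p-value of a z statistic under the standard normal distribution."""
    std_normal = NormalDist()  # mean 0, sd 1
    if alternative == "greater":    # H1: parameter > hypothesized value
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":       # H1: parameter < hypothesized value
        return std_normal.cdf(z)
    if alternative == "two-sided":  # H1: parameter != hypothesized value
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z_observed = 1.8  # hypothetical test statistic
for alt in ("two-sided", "greater", "less"):
    print(f"{alt:>9}: p = {z_p_value(z_observed, alt):.4f}")
```

With z = 1.8 and α = 0.05, the one-tailed "greater" test rejects the null hypothesis while the two-tailed test does not, because the two-tailed p-value is twice as large.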

Confidence intervals:
estimate ± critical value ⋅ standard error
$$\bar{x} \pm z^{*} \cdot \frac{\sigma}{\sqrt{n}}$$
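
As a worked instance of estimate ± critical value ⋅ standard error, the sketch below computes a z-based interval for a mean. The inputs (sample mean 100, known σ = 15, n = 36, 95% confidence) are hypothetical, and `z_confidence_interval` is just an illustrative helper name.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for sample_mean ± z* · sigma / sqrt(n)."""
    # Two-tailed critical value: leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    standard_error = sigma / math.sqrt(n)
    margin = z_star * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical numbers: mean 100, known sigma 15, n = 36.
lower, upper = z_confidence_interval(100.0, 15.0, 36)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

For these inputs the standard error is 15 / 6 = 2.5 and z* is about 1.96, giving an interval of roughly (95.10, 104.90).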


Type I error: (false-positive) occurs if an investigator rejects a null hypothesis that is actually
true in the population.
Type II error: (false-negative) occurs if the investigator fails to reject a null hypothesis that is
actually false in the population.
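
The link between the significance level and the Type I error can be checked by simulation. The sketch below repeatedly tests a null hypothesis that is true by construction and counts how often it is rejected anyway; the population (normal, mean 0, known σ = 1), the sample size, and the number of repetitions are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=25, replicates=5000, seed=2):
    """Share of two-tailed z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(replicates):
        # H0 is true here: the population mean really is 0 (sigma = 1 is known).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (fmean(sample) - 0.0) / (1.0 / math.sqrt(n))
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1  # a false positive
    return rejections / replicates


rate = type_i_error_rate()
print(f"false-positive rate: {rate:.3f}")
```

The observed rate should be close to α = 0.05, which is what it means for α to be the probability of rejecting a true null hypothesis.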
________________________________________________________________________

Z-test:
Objective: The objective of a z-test is to determine whether a sample statistic is significantly
different from a known or hypothesized population parameter. This test is commonly used
when the population standard deviation is known, or the sample size is large (n > 30), allowing
for the use of the standard normal distribution (z-distribution).

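Formula:
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

$$Z \approx \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$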
-> Look up in the z-table the probability of obtaining this value or a more extreme one (the p-value).
-> p < α : reject the null hypothesis
-> p ≥ α : fail to reject the null hypothesis
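
The z-test steps can be sketched in a few lines of Python. The summary numbers (sample mean 52, hypothesized mean 50, σ = 6, n = 36) and the small list passed to `sample_sd` are hypothetical, and the function names are illustrative; the p-value is computed with `statistics.NormalDist` instead of a printed table and is two-tailed by assumption.

```python
import math
from statistics import NormalDist


def sample_sd(values):
    """Sample standard deviation s with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def z_test(sample_mean, mu0, sd, n):
    """Return (z, two-tailed p-value); sd is sigma if known, otherwise s."""
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary data: sample mean 52, H0 mean 50, sigma 6, n = 36.
z, p_value = z_test(52.0, 50.0, 6.0, 36)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f} -> {decision}")

# When sigma is unknown, s from the sample takes its place (large n assumed).
print(f"s = {sample_sd([2, 4, 4, 4, 5, 5, 7, 9]):.3f}")
```

In this example the standard error is 6 / 6 = 1, so z = 2.0 and the two-tailed p-value is about 0.0455, which is below α = 0.05.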
