Summary for Intermediate statistics 1 EUC

This summary is based on all the content of the course Intermediate statistics 1 from EUC and summarizes the material discussed in the book and lectures.

Preview 3 out of 24  pages

  • January 25, 2024
  • 24
  • 2023/2024
  • Summary
WEEK 1: basic statistics

Statistical inference: a process in which we draw conclusions about populations based on
sample data. The goal is to make generalizations or predictions about a population
parameter using information obtained from a subset of that population (the sample).
Statistical inference involves two main approaches: estimation and hypothesis testing.

Parameter: a numerical characteristic that describes a specific aspect of an entire
population. It is a fixed, usually unknown value that we estimate from sample data in order
to generalize to the population (examples: population mean, population standard
deviation).

Levels of measurement:
- Nominal: the data can only be categorized
- Ordinal: the data can be categorized and ranked
- Interval: the data can be categorized, ranked, and evenly spaced
- Ratio: the data can be categorized, ranked, evenly spaced, and has a natural zero.

Law of large numbers: as the sample size (the number of observations) increases, the
sample mean tends to get closer to the population mean.
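
The law of large numbers can be illustrated with a small simulation. This is a minimal sketch, not part of the course material: the fair six-sided die (population mean 3.5), the seed, and the function name sample_mean are assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical population: a fair six-sided die, population mean = 3.5
POPULATION_MEAN = 3.5

def sample_mean(n):
    """Mean of n simulated die rolls."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# The error of the sample mean tends to shrink as n grows
for n in (10, 100, 10_000, 100_000):
    m = sample_mean(n)
    print(f"n = {n:>7}: sample mean = {m:.3f}, error = {abs(m - POPULATION_MEAN):.3f}")
```

For small n the sample mean can still be far from 3.5; the law only describes what happens as n becomes large.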

Central limit theorem: if you take sufficiently large samples from a population, the sample
means will be approximately normally distributed, even if the population itself isn’t
normally distributed (provided the population has a finite variance).
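
The central limit theorem can also be checked by simulation. The sketch below assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1; the sample size of 50, the number of samples, and all names are illustrative choices, not values from the course.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def sample_means(sample_size, n_samples=5_000):
    """Means of repeated samples from an exponential population (mean 1, sd 1)."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

means = sample_means(sample_size=50)
center = statistics.fmean(means)
spread = statistics.stdev(means)

# In a normal distribution roughly 68% of values lie within one sd of the mean
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}  (population mean is 1)")
print(f"sd of sample means:   {spread:.3f}  (theory: 1 / sqrt(50), about 0.141)")
print(f"share within one sd:  {within_one_sd:.2f}")
```

Although single observations are far from normal here, the sample means cluster around the population mean with a spread close to σ/√n.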

Mean: The mean, also known as the average, is calculated by adding up all the values in a
dataset and then dividing by the number of values. The mean is sensitive to extreme values
(outliers) in the dataset.
Median: The median is the middle value when a dataset is ordered from smallest to largest.
If there is an even number of observations, the median is the average of the two middle
values. The median is not influenced by extreme values and is especially useful with skewed
distributions.
Mode: The mode is the value that occurs most frequently in a dataset. A dataset may have
one mode (unimodal), two modes (bimodal), or more (multimodal).
Unlike the mean and median, the mode can be applied to categorical data as well.
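
The three measures can be compared with Python's standard statistics module. This is a small sketch on made-up data: the incomes list (with one deliberate outlier) and the colors list are hypothetical.

```python
import statistics

incomes = [20, 22, 22, 25, 27, 30, 250]  # hypothetical data with one outlier

print(statistics.mean(incomes))    # pulled upward by the outlier
print(statistics.median(incomes))  # middle value of the ordered data
print(statistics.mode(incomes))    # most frequent value

# The mode also works for categorical data
colors = ["red", "blue", "red", "green"]
print(statistics.mode(colors))

# multimode returns every mode, which reveals a bimodal dataset
print(statistics.multimode([1, 1, 2, 3, 3]))
```

The single value 250 moves the mean well above most observations, while the median and mode stay with the bulk of the data.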
________________________________________________________________________

Right-skewed (positively skewed): The right tail of the distribution is longer than the left tail.
The majority of the data points are concentrated on the left side, and the distribution extends
to the right. In a histogram, it looks like a tail stretching to the right.

Left-skewed (negatively skewed): The left tail of the distribution is longer than the right tail.
The majority of data points are concentrated on the right side, and the distribution extends to
the left. In a histogram, it looks like a tail stretching to the left.
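
The direction of skew can also be expressed as a number. The sketch below uses the moment-based skewness coefficient (the average cubed z-score), which is one common definition and not necessarily the one used in the course; the function name and both datasets are hypothetical.

```python
import statistics

def skewness(data):
    """Moment-based skewness: mean of cubed z-scores (using the population sd)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return statistics.fmean(((x - mean) / sd) ** 3 for x in data)

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 15]   # long tail to the right
left_skewed = [-x for x in right_skewed]       # mirror image, long tail to the left

print(f"right-skewed data: skewness = {skewness(right_skewed):.2f}")  # positive
print(f"left-skewed data:  skewness = {skewness(left_skewed):.2f}")   # negative
```

A positive value corresponds to a right-skewed distribution and a negative value to a left-skewed one.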




Bimodal distribution: has two distinct modes or peaks, indicating that the data has two
prominent groups or patterns.
________________________________________________________________________

NHST: stands for Null Hypothesis Significance Testing. It is a common method used in
statistical inference to assess the evidence against a null hypothesis.

Here's a breakdown of the key components:

- Null Hypothesis (H0): This is a statement of no effect or no difference in the
population. It is often a default assumption that there is no change, no effect, or no
relationship.

- Alternative Hypothesis (H1 or Ha): This is the opposite of the null hypothesis, stating
that there is an effect, difference, or relationship in the population.

- Test Statistic: A statistic calculated from the sample data that helps us decide
whether to reject the null hypothesis. A test statistic is a measure of agreement
between your sample data and what is expected under the null hypothesis.

- Significance Level (α): This is the probability of rejecting the null hypothesis when it is
actually true. Commonly used values are 0.05 or 0.01.

- P-value: The probability of obtaining a test statistic as extreme as, or more extreme
than, the one observed in the sample data, assuming the null hypothesis is true.

- Decision Rule: If the p-value is less than the chosen significance level (α), we reject
the null hypothesis in favor of the alternative hypothesis.
-> p < α : reject the null hypothesis


- Conclusion: Based on the decision rule, we make a conclusion about whether there
is enough evidence to reject the null hypothesis and support the alternative
hypothesis.

Two-tailed tests: used when the alternative hypothesis is non-directional.
- A non-directional hypothesis states that a population parameter (such as a mean or
regression coefficient) is not equal to a certain value (such as 0). Two-tailed tests are
appropriate for most studies.
- If you’re calculating a confidence interval, choose two-tailed.

One-tailed tests: used when the alternative hypothesis is directional.
- A directional hypothesis states that a population parameter is greater than or less
than a certain value.
- Your alternative hypothesis is directional if it includes words such as “greater than,”
“less than,” “increases,” “decreases,” or the “<” or “>” sign. If it doesn’t include these
(or similar), it is probably non-directional.
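
The choice between one-tailed and two-tailed tests changes the p-value for the same test statistic. The following is a minimal sketch assuming a test statistic that follows the standard normal distribution under H0; the function name, the dictionary keys, and the value z = 1.8 are illustrative.

```python
from statistics import NormalDist

def p_values(z):
    """p-values for a standard-normal test statistic under each alternative."""
    std_normal = NormalDist()  # mean 0, sd 1
    return {
        "two_tailed": 2 * (1 - std_normal.cdf(abs(z))),  # H1: parameter != value
        "greater": 1 - std_normal.cdf(z),                # H1: parameter > value
        "less": std_normal.cdf(z),                       # H1: parameter < value
    }

for name, p in p_values(1.8).items():
    print(f"{name:>10}: p = {p:.4f}")
```

With z = 1.8 and α = 0.05, the right-tailed test rejects H0 while the two-tailed test does not, which is why the direction of the hypothesis should be fixed before looking at the data.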

Confidence intervals:
estimate ± critical value ⋅ standard error
\[ \bar{x} \pm z^{*} \cdot \frac{\sigma}{\sqrt{n}} \]
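
As a worked example of the confidence interval formula, the sketch below computes a z-based interval for a mean with a known population standard deviation. The function name and the numbers (sample mean 100, σ = 15, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sd (sigma) is known."""
    # Two-sided critical value z*, about 1.96 for 95% confidence
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)  # critical value times standard error
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=100, sigma=15, n=36)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (95.10, 104.90)
```

Here the standard error is 15 / √36 = 2.5, so the margin of error is roughly 1.96 ⋅ 2.5 ≈ 4.9.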


Type I error: (false-positive) occurs if an investigator rejects a null hypothesis that is actually
true in the population.
Type II error: (false-negative) occurs if the investigator fails to reject a null hypothesis that is
actually false in the population.
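
Both error types can be estimated by simulation. This sketch repeatedly runs a two-tailed z-test on simulated normal samples, once when H0 is true and once when it is false; the constants (α = 0.05, σ = 1, n = 25, true mean 0.5 under the alternative) and all names are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(2)  # fixed seed so the run is reproducible

ALPHA = 0.05
MU_0, SIGMA, N = 0.0, 1.0, 25  # hypothetical H0 mean, known sd, sample size
TRIALS = 4_000

def rejects_h0(true_mean):
    """Draw one sample and run a two-tailed z-test of H0: mu = MU_0."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU_0) / (SIGMA / sqrt(N))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < ALPHA

# H0 is true: every rejection is a Type I error
type_1_rate = sum(rejects_h0(true_mean=0.0) for _ in range(TRIALS)) / TRIALS
# H0 is false (true mean 0.5): every failure to reject is a Type II error
type_2_rate = sum(not rejects_h0(true_mean=0.5) for _ in range(TRIALS)) / TRIALS

print(f"estimated Type I error rate:  {type_1_rate:.3f}")
print(f"estimated Type II error rate: {type_2_rate:.3f}")
```

The Type I error rate should land close to the chosen α, while the Type II error rate depends on the true effect size and the sample size.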
________________________________________________________________________

Z-test:
Objective: The objective of a z-test is to determine whether a sample statistic is significantly
different from a known or hypothesized population parameter. This test is commonly used
when the population standard deviation is known, or the sample size is large (n > 30), allowing
for the use of the standard normal distribution (z-distribution).

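Formula:
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

\[ Z \approx \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]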
-> Look up in the standard normal (z) table the probability of obtaining this value or a more extreme one (the p-value).
-> p < α : reject the null hypothesis
-> p ≥ α : fail to reject the null hypothesis
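
The whole procedure can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the course's own code: the function name z_test, the two-tailed alternative, and the sample values are hypothetical. The table lookup is replaced by the standard normal CDF from the statistics module.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def z_test(sample, mu_0, sigma=None, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu_0; uses s when sigma is not given."""
    n = len(sample)
    # stdev() divides by n - 1, matching the formula for s
    sd = sigma if sigma is not None else stdev(sample)
    z = (fmean(sample) - mu_0) / (sd / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

# Hypothetical sample with known population sd of 6; sample mean is 104
scores = [103, 98, 110, 105, 101, 107, 99, 109, 104]
z, p, decision = z_test(scores, mu_0=100, sigma=6)
print(f"z = {z:.2f}, p = {p:.4f}, decision: {decision}")
```

In this example z = (104 − 100) / (6 / √9) = 2, which gives a two-tailed p-value of about 0.046, so H0 is rejected at α = 0.05 but would not be at α = 0.01.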

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller lillianLouise. Stuvia facilitates payment to the seller.