Summary of Statistiek 1A (psychology)

This is a summary of Statistiek 1A, a first-year psychology course. It covers the lectures and the book.


  • March 17, 2023
  • 29
  • 2022/2023
  • Summary
Statistiek 1A summary
Chapter 1: Looking at data - distributions
H1.1: Data
A statistical analysis starts with a set of data. We construct a set of data by first deciding which cases
we want to study. For each case, we record information about characteristics (variables).
- Cases: the objects described by a set of data.
- Variable: characteristic of a case.
Different cases can have different values of the variables.
- Label: special variable used in some data sets to uniquely identify different cases.

Two types of variables:
- Categorical variable: places a case into one of several groups or categories.
- Quantitative variable: takes numerical values for which arithmetic operations such as adding
and averaging make sense.

We sometimes use the term observation to describe the data for a particular case.
Spreadsheets are useful for doing simple computations.

Another important part of the description of any quantitative variable is its unit of measurement.

Key characteristics of a data set:
1. Who? What cases do the data describe? How many cases does the data set contain?
2. What? How many variables do the data contain? What are the exact definitions of these
variables? What are the units of measurement for each quantitative variable?
3. Why? What purpose do the data have? Do we hope to answer some specific question?

Converting a count to a rate is an example of adjusting one variable to create another.
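
As a minimal sketch of the count-to-rate adjustment above: the function name `rate_per`, the scaling constant, and the numbers are illustrative assumptions, not taken from the course material.

```python
def rate_per(count, population, per=100_000):
    """Convert a raw count into a rate per `per` members of the population."""
    # Multiply before dividing so whole-number inputs stay exact.
    return count * per / population


# Hypothetical numbers: 50 events in a population of 200,000.
rate = rate_per(count=50, population=200_000)
print(rate)  # 25.0 events per 100,000
```

The rate makes groups of different sizes comparable, which the raw count does not.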

H1.2: Displaying distributions with graphs
For each variable, the cases generally will have different values. The distribution of a variable
describes how the values of a variable vary from case to case. We can use graphical and numerical
descriptions for a distribution.

Exploratory data analysis helps us examine data to describe their main features. It uses graphs and
numerical summaries to describe the variables in a data set and the relations among them.

Predictive analytics: interest in using our descriptions to predict something in the future.

The distribution of a categorical variable lists the categories and gives either the count or the
percent of cases that fall in each category.
An alternative to the percent is the proportion: the count divided by the sum of the counts.
- Percent is the proportion times 100.
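
The distribution of a categorical variable can be tabulated in a few lines. This is a sketch with made-up data; the variable `majors` and its values are hypothetical.

```python
from collections import Counter

# Hypothetical categorical data: one value per case.
majors = ["psychology", "biology", "psychology", "law", "psychology"]

counts = Counter(majors)                  # count of cases per category
total = sum(counts.values())              # sum of the counts

# Proportion = count / sum of counts; percent = proportion times 100.
proportions = {cat: n / total for cat, n in counts.items()}
percents = {cat: 100 * n / total for cat, n in counts.items()}

for cat in counts:
    print(cat, counts[cat], proportions[cat], percents[cat])
```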

The use of graphical methods allows us to see this information and other characteristics of the data
easily.
Two types of graphs for categorical variables: bar graphs and pie charts.
- Pie charts naturally use percents. A bar graph can use counts or percents.

Two ways to show quantitative variables: stemplots and histograms.

• Stemplots:
- A stemplot (stem-and-leaf plot) gives a quick picture of the shapes of a distribution while
including the actual numerical values in the graph. It works best for small data sets.
When you want to compare two related distributions, you can use a back-to-back stemplot. The
leaves on each side are ordered out from the common stem.

Two modifications of the basic stemplot helpful in different situations:
- You can double the number of stems in a plot by splitting stems: separating each stem into
two, one with leaves 0-4 and the other with leaves 5-9.
- When the observed values have many digits, you can simplify the plot by trimming the
numbers, removing the last digit or digits before making a stemplot.
The purpose of a stemplot is to display the shape of a distribution.
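
The stemplot construction described above, including split stems, can be sketched as follows. The function `stemplot` is a hypothetical helper that assumes non-negative whole numbers, with the tens as stem and the units as leaf.

```python
from collections import defaultdict


def stemplot(values, split=False):
    """Return the lines of a stemplot (stem = tens, leaf = units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # With split stems, leaves 0-4 and 5-9 go on separate rows.
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)

    lo, hi = min(values) // 10, max(values) // 10
    halves = (0, 1) if split else (0,)
    lines = []
    for stem in range(lo, hi + 1):        # keep empty stems to show gaps
        for half in halves:
            leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem:>2} | {leaves}")
    return lines


scores = [12, 15, 21, 23, 23, 38, 40]     # made-up observations
print("\n".join(stemplot(scores)))
#  1 | 25
#  2 | 133
#  3 | 8
#  4 | 0
```

Trimming can be done before plotting, for example with `[v // 10 for v in values]` to drop the last digit.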

• Histograms:
A histogram breaks the range of values of a variable into classes and displays only the count or
percent of the observations that fall into each class.
- Classes are of equal width.
- Histograms do not display the actual values observed; stemplots do. We prefer stemplots
for small data sets.

For a histogram you have to count the number of individuals in each class. Each count is called a
frequency, and a table of frequencies for all classes is a frequency table.

Use histograms of percents for comparing several distributions that have different numbers of
observations.
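
A frequency table with equal-width classes, including the percents used to compare distributions of different sizes, could be built like this. The helper `frequency_table`, the convention that a class includes its lower bound but not its upper bound, and the data are assumptions made for illustration.

```python
def frequency_table(values, start, width, n_classes):
    """Return (lower, upper, count, percent) for each equal-width class.

    A class covers lower <= value < upper (an assumed convention).
    """
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[int(index)] += 1

    table = []
    for i, count in enumerate(counts):
        lower = start + i * width
        table.append((lower, lower + width, count, 100 * count / len(values)))
    return table


values = [3, 7, 12, 14, 18, 21, 25, 29]   # made-up observations
for lower, upper, count, percent in frequency_table(values, 0, 10, 3):
    print(f"{lower}-{upper}: {count} ({percent}%)")
# 0-10: 2 (25.0%)
# 10-20: 3 (37.5%)
# 20-30: 3 (37.5%)
```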

Too few classes in a histogram will give a ‘skyscraper’ graph, with all the values in a few classes with
tall bars.
Too many classes will give a ‘pancake’ graph, with most classes having one or no observations.

Differences between histograms and bar graphs:
- A histogram shows the distribution of counts or percents among the values of a single
quantitative variable. The classes define a range of values, and the heights of the bars
represent the count of values within the given range.
- A bar graph compares the counts or percents of different values for a single categorical
variable.
- Bar graphs have a blank space between the bars to separate the items being compared.
Histograms have no blank space to indicate that all values of the variable are covered.

Examining a distribution
- Look for the overall pattern and for striking deviations from the pattern.
- Describe the overall pattern by its shape, center, and spread.
- An important kind of deviation is an outlier, an individual value that falls outside the overall
pattern.

We can describe the center of a distribution by its midpoint, the value with roughly half the
observations taking smaller values and half taking larger values.
Describe the spread of a distribution by giving the smallest and largest values.
The extreme values of a distribution are in a tail of the distribution.

Mode: a major peak in the distribution.
- Unimodal: one peak in the distribution.
- Bimodal: two peaks in the distribution.
- Trimodal: three peaks in the distribution.

A distribution is symmetric if the patterns of values smaller and larger than its midpoint are mirror
images of each other.
It is skewed to the right if the right tail (larger values) is much longer than the left tail (smaller
values). It is skewed to the left if the left tail is much longer than the right tail.

Outliers can be one of the most important characteristics of a data set.

A time plot of a variable plots each observation against the time at which it was measured. Always
put time on the horizontal scale of your plot and the variable you are measuring on the vertical scale.
A time plot can reveal interesting patterns in a set of data.

H1.3: Describing distributions with numbers
Measuring center:
The two common measures of center are the mean and the median.
- Mean: average value.
- Median: middle value.
Mean:
X-bar: mean of all x-values.
Important weakness of the mean as a measure of center: the mean is sensitive to the influence of a
few extreme observations. Because it cannot resist the influence of extreme observations, it is not a
resistant measure of center.
A resistant measure is sometimes called a robust measure.



Median:
The median M is the midpoint of a distribution. Half the observations are smaller than the median,
and the other half are larger than the median.
The median's location in the ordered list is (n+1) / 2.
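
The location rule can be turned into a short sketch. The function name `median_by_location` is hypothetical. When n is even, (n+1)/2 falls halfway between two positions, and the usual convention of averaging the two middle observations is used here.

```python
def median_by_location(values):
    """Find the median via its location (n + 1) / 2 in the ordered list."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) / 2                # 1-based position

    if location.is_integer():             # odd n: a single middle value
        return ordered[int(location) - 1]

    # Even n: the location ends in .5, so average the two neighbours.
    lower = ordered[int(location) - 1]    # int() truncates, e.g. 2.5 -> 2
    upper = ordered[int(location)]
    return (lower + upper) / 2


print(median_by_location([5, 1, 3]))      # location 2   -> 3
print(median_by_location([4, 1, 3, 2]))   # location 2.5 -> 2.5
```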

Comparing the mean and the median
- The median is more resistant than the mean.
- The mean and median are the most common measures of the center of a distribution. For a
symmetric distribution they are close together. In a skewed distribution, the mean is farther
out in the long tail than is the median.
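
A small illustration of why the median is more resistant than the mean, using made-up values and Python's standard `statistics` module:

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]             # hypothetical, roughly symmetric
with_outlier = values + [400]             # add one extreme observation

print(mean(values), median(values))       # 35 and 35: close together
print(mean(with_outlier))                 # about 95.8: pulled toward the outlier
print(median(with_outlier))               # 36.5: barely moves
```

The single extreme value drags the mean far into the long right tail, while the median shifts only slightly.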

Measuring spread: The quartiles
The simplest useful numerical description of a distribution consists of both a measure of center and a
measure of spread.
- The upper quartile is the median of the upper half of the data. The lower quartile is the
median of the lower half of the data.
- With the median, the quartiles divide the data into four equal parts: 25% of the data in each
part.
- The pth percentile of a distribution is the value that has p% of the observations fall at or
below it.
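
The quartile and percentile definitions above can be sketched as follows. Two assumptions are made here: when n is odd, the overall median is left out of both halves, and the pth percentile is taken to be the smallest observation with at least p% of the data at or below it. Software packages use several slightly different rules, so results may differ.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For odd n, skip the middle observation itself (assumed convention).
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


def percentile(values, p):
    """Smallest observation with at least p% of the data at or below it."""
    ordered = sorted(values)
    n = len(ordered)
    for v in ordered:
        at_or_below = sum(1 for x in ordered if x <= v)
        if 100 * at_or_below / n >= p:
            return v


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # (2.5, 5, 7.5)
print(percentile(list(range(1, 11)), 30))       # 3
```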

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller viviannezwiers. Stuvia facilitates payment to the seller.
