Summary of Statistiek 1A, psychology

Course
Statistiek 1 B (PSBA108)

Institution
Rijksuniversiteit Groningen (RuG)

This is a summary of Statistiek 1A from the first year of psychology. It covers the lectures and the book.

Uploaded on 17 March 2023
Number of pages: 29
Written in 2022/2023
Type: Summary

Statistiek 1A summary
Chapter 1: Looking at data-distributions
H1.1: Data
A statistical analysis starts with a set of data. We construct a set of data by first deciding which cases
we want to study. For each case, we record information about characteristics (variables).
- Cases: the objects described by a set of data.
- Variable: characteristic of a case.
Different cases can have different values of the variables.
- Label: special variable used in some data sets to uniquely identify different cases.

Two types of variables:
- Categorical variable: places a case into one of several groups or categories.
- Quantitative variable: takes numerical values for which arithmetic operations such as adding
and averaging make sense.

We sometimes use the term observation to describe the data for a particular case.
Spreadsheets are useful for doing simple computations.

Another important part of the description of any quantitative variable is its unit of measurement.

Key characteristics of a data set:
1. Who? What cases do the data describe? How many cases does the data set contain?
2. What? How many variables does the data contain? What are the exact definitions of these
variables? What are the units of measurement for each quantitative variable?
3. Why? What purpose do the data have? Do we hope to answer some specific question?

Converting a count to a rate is an example of adjusting one variable to create another.
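
As a minimal sketch of this adjustment, the snippet below turns a count into a rate per fixed number of cases. The function name `rate_per`, the default base of 100,000, and the numbers are assumptions made up for illustration; they do not come from the book.

```python
def rate_per(count, population, per=100_000):
    """Convert a raw count into a rate per `per` cases."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return count * per / population


# Hypothetical example: 50 events in a population of 200,000.
print(rate_per(50, 200_000))
```

A rate makes groups of different sizes comparable, which a raw count cannot do.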

H1.2: Displaying distributions with graphs
For each variable, the cases generally will have different values. The distribution of a variable
describes how the values of a variable vary from case to case. We can use graphical and numerical
descriptions for a distribution.

Exploratory data analysis helps us examine data to describe their main features. It uses graphs and
numerical summaries to describe the variables in a data set and the relations among them.

Predictive analytics: interest in using our descriptions to predict something in the future.

The distribution of a categorical variable lists the categories and gives either the count or the
percent of cases that fall in each category.
An alternative to the percent is the proportion: the count divided by the sum of the counts.
- Percent is the proportion times 100.
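
The relation between count, proportion, and percent can be sketched as follows. The function name `categorical_distribution` and the example categories are hypothetical, chosen only to illustrate the definitions above.

```python
from collections import Counter


def categorical_distribution(values):
    """List each category with its count, proportion, and percent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: {
            "count": n,
            "proportion": n / total,      # count divided by the sum of the counts
            "percent": 100 * n / total,   # proportion times 100
        }
        for category, n in counts.items()
    }


# Hypothetical data: one categorical variable recorded for four cases.
majors = ["psychology", "biology", "psychology", "law"]
for category, summary in categorical_distribution(majors).items():
    print(category, summary)
```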

The use of graphical methods allows us to see this information and other characteristics of the data
easily.
Two types of graphs for categorical variables: bar graphs and pie charts.
- Pie charts naturally use percents. A bar graph can use counts or percents.

Two ways to show quantitative variables: stemplots and histograms.

• Stemplots:
- A stemplot (stem-and-leaf plot) gives a quick picture of the shape of a distribution while
including the actual numerical values in the graph. Works best for small numbers of observations.
When you want to compare two related distributions, you can use a back-to-back stemplot. The
leaves on each side are ordered out from the common stem.

Two modifications of the basic stemplot helpful in different situations:
- You can double the number of stems in a plot by splitting stems: separating each stem into
two, one with leaves 0-4 and the other with leaves 5-9.
- When the observed values have many digits, you can simplify the plot by trimming the
numbers, removing the last digit or digits before making a stemplot.
The purpose of a stemplot is to display the shape of a distribution.
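
The basic stemplot and its two modifications can be sketched in a few lines. This is an illustrative sketch only: it assumes non-negative whole numbers and uses the last digit as the leaf, and the names `stemplot` and `trim` are hypothetical.

```python
def trim(values, digits=1):
    """Trim numbers by dropping the last `digits` digits (no rounding)."""
    return [v // 10 ** digits for v in values]


def stemplot(values, split=False):
    """Return the lines of a stemplot; with split=True each stem is split
    into one row for leaves 0-4 and one row for leaves 5-9."""
    ordered = sorted(values)
    if not ordered:
        return []
    rows = {}
    # Create every stem between the smallest and largest, even if it stays empty,
    # so that gaps in the distribution remain visible.
    for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1):
        for upper_half in ((False, True) if split else (False,)):
            rows[(stem, upper_half)] = []
    for v in ordered:
        stem, leaf = divmod(v, 10)
        rows[(stem, split and leaf >= 5)].append(leaf)
    return [f"{stem} | {''.join(map(str, leaves))}" for (stem, _), leaves in rows.items()]


# Hypothetical data.
for line in stemplot([12, 15, 21, 34, 38], split=True):
    print(line)
```

Trimming first, for example `stemplot(trim(values))`, keeps the plot readable when the observed values have many digits.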

• Histograms:
A histogram breaks the range of values of a variable into classes and displays only the count or
percent of the observations that fall into each class.
- Classes are of equal width.
- Histograms do not display the actual values observed; stemplots do. We prefer stemplots
for small data sets.

For a histogram you have to count the number of individuals in each class. Each count is called a
frequency, and a table of frequencies for all classes is a frequency table.

Use histograms of percents for comparing several distributions that have different numbers of
observations.

Too few classes in a histogram will give a ‘skyscraper’ graph, with all the values in a few classes with
tall bars.
Too many classes will give a ‘pancake’ graph, with most classes having one or no observations.
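
The counting step behind a histogram can be sketched as a frequency table over classes of equal width. The function name `frequency_table`, the choice that each class includes its lower bound but not its upper bound, and the data are assumptions for illustration.

```python
def frequency_table(values, start, width, n_classes):
    """Count observations in `n_classes` classes of equal `width`, beginning at `start`.

    Each class covers lower <= value < upper. Returns one tuple per class:
    ((lower, upper), frequency, percent).
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"value {v} falls outside the classes")
        counts[index] += 1
    total = len(values)
    return [
        ((start + i * width, start + (i + 1) * width), c, 100 * c / total)
        for i, c in enumerate(counts)
    ]


# Hypothetical data: five observations in two classes of width 5.
for bounds, frequency, percent in frequency_table([1, 2, 2, 7, 9], start=0, width=5, n_classes=2):
    print(bounds, frequency, percent)
```

Changing `width` and `n_classes` on the same data shows how the choice of classes changes the picture of the distribution.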

Differences between histograms and bar graphs:
- A histogram shows the distribution of counts or percents among the values of a single
quantitative variable. The classes define a range of values, and the heights of the bars
represent the count of values within the given range.
- A bar graph compares the counts or percents of different values for a single categorical
variable.
- Bar graphs have a blank space between the bars to separate the items being compared.
Histograms have no blank space to indicate that all values of the variable are covered.

Examining a distribution
- Look for the overall pattern and for striking deviations from the pattern.
- Describe the overall pattern by its shape, center, and spread.
- An important kind of deviation is an outlier, an individual value that falls outside the overall
pattern.

We can describe the center of a distribution by its midpoint, the value with roughly half the
observations taking smaller values and half taking higher values.
Describe the spread of a distribution by giving the smallest and largest values.
The extreme values of a distribution are in a tail of the distribution.

Mode: a major peak in the distribution.
- Unimodal: one peak in the distribution.
- Bimodal: two peaks in the distribution.
- Trimodal: three peaks in the distribution.

A distribution is symmetric if the patterns of values smaller and larger than its midpoint are mirror
images of each other.
It is skewed to the right if the right tail (larger values) is much longer than the left tail (smaller
values). It is skewed to the left if the left tail is much longer than the right tail.

Outliers can be one of the most important characteristics of a data set.

A time plot of a variable plots each observation against the time at which it was measured. Always
put time on the horizontal scale of your plot and the variable you are measuring on the vertical scale.
A time plot can reveal interesting patterns in a set of data.

H1.3: Describing distributions with numbers
Measuring center:
The two common measures of center are the mean and the median.
- Mean: average value.
- Median: middle value.
Mean:
X-bar: mean of all x-values, found by adding the values and dividing by the number of observations.
Important weakness of the mean as measure of center: the mean is sensitive to the influence of a
few extreme observations. The mean cannot resist the influence of extreme observations, so it is not a
resistant measure of the center.
A resistant measure is sometimes called a robust measure.

Median:
The median M is the midpoint of a distribution. Half the observations are smaller than the median,
and the other half are larger than the median.
Finding the median’s location in the ordered list: (n + 1) / 2

Comparing the mean and the median
- The median is more resistant than the mean.
- The mean and median are the most common measures of the center of a distribution. For a
symmetric distribution they are close together. In a skewed distribution, the mean is farther
out in the long tail than is the median.
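
The difference in resistance can be checked with a small sketch. The data are made up for illustration, and averaging the two middle values when n is even is a common convention assumed here, since the location (n + 1) / 2 then falls between two observations.

```python
def mean(values):
    """Mean (x-bar): the sum of the values divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Median M: the value at location (n + 1) / 2 in the ordered list."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        location = (n + 1) // 2          # 1-based position of the middle value
        return ordered[location - 1]
    # Even n: assumed convention, average the two middle values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Hypothetical data: the second list replaces 6 with the extreme value 60.
symmetric = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]
print(mean(symmetric), median(symmetric))
print(mean(with_outlier), median(with_outlier))
```

One extreme observation pulls the mean toward the long tail, while the median stays where it was.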

Measuring spread: The quartiles
The simplest useful numerical description of a distribution consists of both a measure of center and a
measure of spread.
- The upper quartile is the median of the upper half of the data. The lower quartile is the
median of the lower half of the data.
- With the median, the quartiles divide the data into four equal parts: 25% of the data in each
part.
- The pth percentile of a distribution is the value that has p% of the observations fall at or
below it.
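
The quartiles and the percentile idea can be sketched as follows. One assumption is labeled in the code: when n is odd, the overall median is left out of both halves, which is one common convention and may differ from what software reports. The function names are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile) using medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # Assumed convention: for odd n the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def percent_at_or_below(values, x):
    """Percent of the observations that fall at or below x."""
    return 100 * sum(1 for v in values if v <= x) / len(values)


# Hypothetical data.
print(quartiles([1, 2, 3, 4, 5, 6, 7]))
print(percent_at_or_below([1, 2, 3, 4], 3))
```

If `percent_at_or_below(values, x)` returns p, then x sits at the pth percentile of these data in the sense defined above.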

Who am I buying this summary from?

Stuvia is a marketplace, so you are buying this document not from Stuvia but from the seller viviannezwiers. Stuvia handles the payment to the seller.
