Summary Statistics for Educational Scientists Part 3, lecture notes and exercises

Notes from the lectures of Statistics for Educational Scientists Part 3 (), combined with information from the PowerPoint slides. Also contains all the exercises used in the lectures, solved and often with an explanation of why one answer is correct.

  • 17 September 2024
  • 129 pages
  • 2023/2024
  • Summary
Author: xanalaenen
Statistics for Educational Scientists, Part 3: Data-analytic process

1 Example


Research: motivation and creativity (Amabile, 1985)
● Research question: “What is the influence of extrinsic or intrinsic motivation on creativity?”
● Experimental design
○ 47 students randomly assigned to two groups
○ Task: write a poem
○ Group 1: list of intrinsic reasons (n = 24)
○ Group 2: list of extrinsic reasons (n = 23)
○ Poems rated by 12 poets on a 40-point scale

A questionnaire was given to creative writers to rank intrinsic and extrinsic reasons for writing:




Creativity scores in two motivation groups, and their summary statistics:

The averages differ, but can we conclude that there are
mean differences in the population?





2 Data analysis workflow


2.1 Preparations

What to do during preparations for data analysis?
● Is the research question clear?
● Evaluate the design of the experiment
○ In our example we have an experiment where participants are randomly assigned to
two conditions
○ Causal inference may be possible
● Check data
○ Ex.: forgetting a decimal point
○ Ex.: score higher than maximum score

2.2 Exploratory data analysis

During exploratory data analysis we use descriptive statistics to …
● Become familiar with the data
● Tentatively seek answers to research questions
● Detect outliers
● Uncover interesting aspects of the data


Example - Exploratory data analysis


A histogram for each group so we can
see how data are distributed amongst
the groups

We can see that people who had to list
intrinsic motivations generally have
higher scores than people who had to
list extrinsic motivations




Boxplots give an idea about the
distribution of scores in terms of
quartiles

We can also look at symmetry,
skewness, …
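A boxplot is drawn from a handful of numbers: the quartiles, the median, the whiskers and any outliers. The minimal sketch below computes them for one group, assuming the common 1.5 × IQR rule for outliers and Python's `statistics.quantiles` with the "inclusive" method; packages such as SPSS may use a different quartile definition, so hinge values can differ slightly. The scores are hypothetical and only serve to illustrate.

```python
import statistics


def boxplot_summary(scores):
    """Numbers a boxplot is built from, using the 1.5 x IQR rule for outliers."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [y for y in scores if low_fence <= y <= high_fence]
    return {
        "lower_whisker": min(inside),  # smallest score that is not an outlier
        "q1": q1,
        "median": median,  # the line inside the box
        "q3": q3,
        "upper_whisker": max(inside),  # largest score that is not an outlier
        "outliers": sorted(y for y in scores if y < low_fence or y > high_fence),
        "mean": statistics.mean(scores),  # NOT shown by a boxplot
    }


# Hypothetical scores, for illustration only
summary = boxplot_summary([10, 12, 13, 13, 14, 15, 30])
```

Here the mean (about 15,3) is pulled up by the outlier 30, while the median stays at 13. This is why a boxplot tells you about medians and quartiles, not about means.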





A bit of practice…

Look at the boxplots. Which statement is wrong?

A. The boys' (J) and girls' (M) groups achieve the same
maximum score but a different minimum score
B. The boys' group (J) average is 13, while the
girls' group (M) average is 15
C. The girls' boxplot (M) is symmetric, while the boys'
boxplot (J) is skewed
D. For the boys’ boxplot (J), we are dealing with an
outlier, but not for the girls’ boxplot (M)
E. No idea

B is the wrong statement, because boxplots show the
medians, NOT the averages/means


2.3 Statistical inference

We see differences between the groups, but are these differences also present in the population?
To check this, we need to formulate hypotheses.

Step-by-step guide for statistical inference
1. Formulate H0 and H1
2. Select the significance level α (optional)
3. Calculate the test statistic
4. Determine the p-value
5. Decision (optional)

Notations:
● $Y_{ij}$: score of person i in group j on the dependent variable
● $n_j$: number of observations in group j
● $\bar{Y}_j$: sample mean in group j


We start with a research question, which we then translate into a statistical model to answer the
research question → we use this model to formulate hypotheses
1. Formulate models and hypotheses
● H0: µ1 = µ2 (restricted model)
○ $Y_{i1} \overset{iid}{\sim} N(\mu, \sigma^2), \quad i = 1, \dots, n_1$
○ $Y_{i2} \overset{iid}{\sim} N(\mu, \sigma^2), \quad i = 1, \dots, n_2$
○ $Y_{ij} = \mu + \varepsilon_{ij}, \quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$
● H1: µ1 ≠ µ2 (unrestricted model)
○ $Y_{i1} \overset{iid}{\sim} N(\mu_1, \sigma^2), \quad i = 1, \dots, n_1$
○ $Y_{i2} \overset{iid}{\sim} N(\mu_2, \sigma^2), \quad i = 1, \dots, n_2$
○ $Y_{ij} = \mu_j + \varepsilon_{ij}, \quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$
● iid = independent and identically distributed: observations are independent and come
from the same distribution





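2. Test statistic: choice and value
● What do we know about the distribution of $\bar{Y}_1 - \bar{Y}_2$ across different samples?
○ It is normally distributed
○ with mean $\mu_1 - \mu_2$
○ and standard deviation $\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
○ σ is unknown, so we estimate it using the sample variances
● $$t = \frac{(\bar{Y}_2 - \bar{Y}_1) - (\mu_2 - \mu_1)}{SE(\bar{Y}_2 - \bar{Y}_1)} = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{SE(\bar{Y}_2 - \bar{Y}_1)}$$
○ where $SE(\bar{Y}_2 - \bar{Y}_1) = \sqrt{\frac{(n_1 - 1)S_1'^2 + (n_2 - 1)S_2'^2}{n_1 + n_2 - 2}} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
○ and $S_j'^2 = \frac{1}{n_j - 1}\sum_{i=1}^{n_j}\left(Y_{ij} - \bar{Y}_j\right)^2$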
3. Derive the sampling distribution, determine the p-value and (optionally) make a decision
● Given that H0 is true: $t \sim t_{df = n_1 + n_2 - 2}$
● A sampling distribution is based on repeated sampling
○ If we conduct the experiment many times, we get different data each time, which we can
plot in different histograms to look at the distribution of scores
○ By replicating the experiment many times, we get the sampling distribution
of the statistic
● Deciding whether we reject or accept the null hypothesis
○ Compare the value of the test statistic with the t-distribution, using tables or SPSS
○ Determine the rejection region: the region of the distribution where we reject the null
hypothesis
○ If the test is two-sided, there is a rejection region on both sides of the
distribution, so the one-sided tail probability is doubled (2 × probability)
● Optional
○ Compare the p-value with α to decide whether the result of the test is statistically
significant
○ Make a decision to reject or accept H0
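4. Effect size determination
● 100(1 − α)% confidence interval (CI) for the difference between two means
● $$CI = (\bar{Y}_2 - \bar{Y}_1) \pm t^{*}_{n_1 + n_2 - 2} \times SE(\bar{Y}_2 - \bar{Y}_1)$$
○ where $t^{*}_{n_1 + n_2 - 2}$ is the critical value with upper-tail probability α/2
● Effect size helps evaluate "practical significance"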
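The four steps above can be tried out on simulated data. The sketch below computes the pooled t-statistic from raw scores, with $S_j'^2$ as defined in step 2, and then mimics the "repeated sampling" idea of step 3: it replicates a two-group experiment many times with H0 true and counts how often |t| exceeds the critical value 2,014 (df = 45, α = 0,05, the value used later in these notes). The group sizes mirror the motivation example; µ = 15 and σ = 5 are hypothetical values chosen only for the simulation.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    """S'^2: sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_t(group1, group2):
    """Pooled-variance two-sample t = (Ybar2 - Ybar1) / SE(Ybar2 - Ybar1)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sample_variance(group1)
                  + (n2 - 1) * sample_variance(group2)) / df
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return (mean(group2) - mean(group1)) / se, se, df


def simulate_under_h0(n1=24, n2=23, mu=15.0, sigma=5.0, reps=4000, seed=2024):
    """Replicate the experiment many times with equal population means (H0 true).

    mu and sigma are hypothetical; only n1 and n2 come from the example.
    """
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        g1 = [rng.gauss(mu, sigma) for _ in range(n1)]
        g2 = [rng.gauss(mu, sigma) for _ in range(n2)]
        t_values.append(pooled_t(g1, g2)[0])
    return t_values


t_values = simulate_under_h0()
t_crit = 2.014  # two-sided critical value for df = 45 and alpha = 0.05
rejection_rate = sum(abs(t) > t_crit for t in t_values) / len(t_values)
```

If the sampling distribution of t under H0 really is a t-distribution with 45 df, `rejection_rate` should land close to α = 0,05, and a histogram of `t_values` would be centred on 0.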


A bit of practice…

Which t-test statistic do you obtain from the data below?

A. t = (-) 0,09
B. t = (-) 0,32
C. t = (-) 0,87
D. t = (-) 1,63

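$$t = \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{\frac{(n_1 - 1)S_1'^2 + (n_2 - 1)S_2'^2}{n_1 + n_2 - 2} \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

$$|t| = \frac{15 - 13{,}8571}{\sqrt{\frac{(15 - 1) \times 2{,}8284^2 + (14 - 1) \times 4{,}1111^2}{15 + 14 - 2} \times \left(\frac{1}{15} + \frac{1}{14}\right)}} = \frac{1{,}1429}{\sqrt{12{,}2857 \times 0{,}1381}} = 0{,}87$$

→ answer C (the sign of t depends on which group is subtracted from which)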
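The arithmetic of this exercise can be double-checked with a few lines of code. This is a sketch that takes the summary numbers from the worked solution (n = 15 with S' = 2,8284, n = 14 with S' = 4,1111, and a mean difference of 15 − 13,8571); `pooled_se` is a hypothetical helper name.

```python
import math


def pooled_se(sd1, n1, sd2, n2):
    """Return SE(Ybar2 - Ybar1) and the pooled variance, from SDs and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2)), pooled_var


se, pooled_var = pooled_se(2.8284, 15, 4.1111, 14)
t = (15 - 13.8571) / se  # magnitude only; the sign depends on the group order
```

The result is t ≈ 0,877, which matches option C (0,87).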





Example - intrinsic and extrinsic motivation

Step 1: formulate models and hypotheses

Restricted model (H0: µ1 = µ2)
Assumptions:
● Scores follow a normal distribution with mean µ
● µ is the same in both groups

Unrestricted model (H1: µ1 ≠ µ2)
Assumptions:
● µ is different in the two groups
● The variance (and thus the standard deviation) is the same in both groups

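Step 2: test statistic - choice and calculation

$$t = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{SE(\bar{Y}_2 - \bar{Y}_1)} = \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{\frac{(n_1 - 1)S_1'^2 + (n_2 - 1)S_2'^2}{n_1 + n_2 - 2} \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

With the intrinsic group as group 1 (n1 = 24, Ȳ1 = 19,88, S1' = 4,44) and the extrinsic group as group 2 (n2 = 23, Ȳ2 = 15,74, S2' = 5,25), the difference is taken as intrinsic minus extrinsic, so that t is positive (in a two-sided test only the size of t matters):

$$t = \frac{19{,}88 - 15{,}74}{\sqrt{\frac{(24 - 1) \times 4{,}44^2 + (23 - 1) \times 5{,}25^2}{24 + 23 - 2} \times \left(\frac{1}{24} + \frac{1}{23}\right)}} = 2{,}92$$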
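The same calculation from the group summaries, as a small sketch (the function name `pooled_t_from_summary` is hypothetical). It also returns the standard error, which step 4 reuses.

```python
import math


def pooled_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Pooled two-sample t for (mean_a - mean_b), from means, SDs (S') and sizes."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, se, df


# Intrinsic group (a) minus extrinsic group (b), numbers from the example
t, se, df = pooled_t_from_summary(19.88, 4.44, 24, 15.74, 5.25, 23)
```

This gives t ≈ 2,92 with SE ≈ 1,42 and df = 45.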

Step 3: derive the sampling distribution, determine the p-value and (optionally) make a decision

Compare the value of the test statistic (2,92) with the t-distribution with df = 45.
Because the test is two-sided, there is a rejection region on both sides of the distribution, so the
p-value is twice the probability that the t-score is larger than 2,92.

SPSS: p = 0,0054
Table D: 0,0025 < one-sided p < 0,005, BUT the test is two-sided, so 0,005 < p < 0,01

p = 0,0054 < α = 0,05 → we reject H0
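To see where the SPSS value comes from without a table, the two-sided p-value can be computed from the t-density directly. The sketch below swaps in a plain numerical technique (Simpson's rule integration of the Student t density using only the `math` module); it is not how SPSS computes it, just an illustration of "twice the upper-tail probability".

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > |t|), as 0.5 minus the area under the density on [0, |t|]."""
    b = abs(t)
    h = b / steps  # Simpson's rule needs an even number of steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3


def two_sided_p(t, df):
    """Two rejection regions, so the one-sided tail probability is doubled."""
    return 2 * upper_tail(t, df)


p = two_sided_p(2.92, 45)
```

The result lies between 0,005 and 0,01, in line with both the SPSS output and the Table D bracket.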

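Step 4: effect size determination

Critical value: $t^{*}_{45}$ with upper-tail probability p = 0,05 / 2 = 0,025, which gives $t^{*}_{45} = 2{,}014$

If α = 0,05, then
$$95\%\ CI = (\bar{Y}_2 - \bar{Y}_1) \pm t^{*}_{45} \times SE(\bar{Y}_2 - \bar{Y}_1) = 4{,}14 \pm 2{,}014 \times 1{,}42 \rightarrow [1{,}2801;\ 6{,}9999]$$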
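The interval is just "estimate ± critical value × standard error". A minimal sketch with the numbers from this step (the helper name `mean_difference_ci` is hypothetical):

```python
def mean_difference_ci(diff, se, t_crit):
    """Confidence interval: difference in means +/- t* x SE."""
    margin = t_crit * se
    return diff - margin, diff + margin


lower, upper = mean_difference_ci(4.14, 1.42, 2.014)
```

This reproduces [1,2801; 6,9999]. Because 0 lies outside the 95% interval, the interval agrees with rejecting H0 at α = 0,05.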





Interpreting the size of a p-value
● Is there evidence of a difference?
○ 0 < p < 0,01: convincing
○ 0,01 < p < 0,05: moderate
○ 0,05 < p < 0,10: suggestive, but inconclusive
○ p > 0,10: NO
● ATTENTION
○ The p-value is NOT the probability that H0 is wrong
○ The p-value depends strongly on n, so it is not a measure of the effect size
○ Do not say "the p-value is significant"; it is the result (e.g. the difference between the means) that is statistically significant
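The verbal scale above can be written as a lookup. This is a sketch; how to treat p-values that fall exactly on a boundary (0,01, 0,05, 0,10) is an assumption here, since the scale itself only gives ranges.

```python
def evidence_against_h0(p):
    """Map a p-value to the verbal evidence scale from the notes."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.01:
        return "convincing"
    if p < 0.05:
        return "moderate"
    if p < 0.10:
        return "suggestive, but inconclusive"
    return "no evidence"
```

For the motivation example, `evidence_against_h0(0.0054)` returns "convincing".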


A bit of practice…

Determine the p-value using the following data. Compare with α = 0,05. What conclusion do you
draw?

A. 0,025 < p < 0,05 → reject H0
B. 0,025 < p < 0,05 → accept H0
C. 0,05 < p < 0,10 → reject H0
D. 0,05 < p < 0,10 → accept H0
E. No idea

df = 23 + 20 − 2 = 41
Table D → look at df = 40 (if the exact df is not in the table, we always use the next smaller df)
t = 1,836 lies between the critical values for one-sided tail probabilities 0,05 and 0,025 → × 2 (two-sided): 0,05 < p < 0,10
p > α, so we accept H0 → answer D
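The Table D lookup can be mimicked in code. The sketch below hard-codes part of the df = 40 row of a standard t-table (one-sided tail probabilities with their critical values) and returns the bracket for p; doubling for a two-sided test is optional. The row values are standard t-table entries, typed in here by hand.

```python
# Part of the df = 40 row: (one-sided tail probability, critical value)
DF40_ROW = [(0.10, 1.303), (0.05, 1.684), (0.025, 2.021),
            (0.01, 2.423), (0.005, 2.704), (0.0025, 2.971)]


def bracket_p(t, row, two_sided=True):
    """Return (lower, upper) bounds for p; None means beyond this part of the table."""
    t = abs(t)
    upper = None  # p is at most this tail probability
    lower = None  # p is larger than this tail probability
    for tail_prob, crit in row:  # row is ordered by increasing critical value
        if t >= crit:
            upper = tail_prob
        elif lower is None:
            lower = tail_prob
    factor = 2 if two_sided else 1
    return (None if lower is None else factor * lower,
            None if upper is None else factor * upper)
```

`bracket_p(1.836, DF40_ROW)` gives (0,05; 0,10), as in the solution, and `bracket_p(2.92, DF40_ROW)` gives (0,005; 0,01), the bracket found for the motivation example.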


A bit of practice…

What is the critical t-value at an α of 0,01? Use the following data:

A. t* = 2.326
B. t* = 2.403
C. t* = 2.576
D. t* = 2.678
E. No idea

df = 35 + 20 − 2 = 53 → look at df = 50 in Table D
Two-sided test, so the one-sided tail probability is p = 0,01 / 2 = 0,005 → t* = 2,678 → answer D
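Instead of reading the critical value from a table, it can be found numerically by searching for the t at which the upper tail equals α/2. This sketch assumes a two-sided test and swaps in bisection combined with Simpson's rule integration of the t density (standard library only); the search interval [0; 20] is an assumption that is fine for the df values in these notes.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > |t|) via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3


def critical_value(alpha, df, two_sided=True):
    """Bisection: find t* such that the upper tail equals alpha/2 (or alpha)."""
    target = alpha / 2 if two_sided else alpha
    lo, hi = 0.0, 20.0  # assumed search interval
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > target:
            lo = mid  # tail still too large, so t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2
```

`critical_value(0.01, 50)` reproduces the table value 2,678. With the exact df = 53 the value is slightly smaller, which is why using the next smaller df in the table is the conservative choice.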


2.4 Interpretation

What should be present in the interpretation of data?
● Formulate a conclusion
○ Answer the research questions
○ Use substantive terminology
● Summarize results using plots
○ If only two groups are compared, it is not necessary to show the results in a plot
○ With multiple groups, however, it is useful
● State findings’ limitations
○ Randomization: causal inference possible
○ Random sampling: if the assumption of a random sample is questionable, strictly speaking no
inference to the population is possible




Example - Interpretation extrinsic and intrinsic motivation

“There is strong evidence that a lower creativity score for a poem is obtained after completing the
extrinsic questionnaire (M=16) compared to the intrinsic questionnaire (M=20), (t(45) = 2.9, p< 0.01,
two-sided). The estimated difference is 4 points on a 40-point scale. The 95% confidence interval
for the decrease in creativity score due to extrinsic motivation ranges from 1 to 7 points.”


A bit of practice…

You want to investigate whether boys score significantly more or less on a test than girls. Based on
SPSS, you obtain the output below. Complete the conclusion below (with α = 0,05).




“There is significant evidence that the group “BOYS” (M = 12,0714) scores differently on average on
the test than the group “GIRLS” (M = 15) (t(27) = 2,638, p = 0,014). The estimated difference is 2,93
points, with a 95% CI of 0,65 to 5,21 points.”
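The numbers in this conclusion hang together: since t = difference / SE, the SE can be recovered from the SPSS t-value, and the CI follows from it. A sketch, assuming the difference is taken as girls minus boys and using the table value t* = 2,052 for df = 27 at α = 0,05 (two-sided); the function name is hypothetical.

```python
def ci_from_t_output(mean_a, mean_b, t_value, t_crit):
    """Recover SE from t = diff / SE, then build diff +/- t* x SE."""
    diff = mean_a - mean_b
    se = diff / t_value
    margin = t_crit * se
    return diff, se, (diff - margin, diff + margin)


# Girls (M = 15) minus boys (M = 12,0714), t(27) = 2,638, t* = 2,052
diff, se, (lower, upper) = ci_from_t_output(15, 12.0714, 2.638, 2.052)
```

This gives a difference of about 2,93 points and an interval of about 0,65 to 5,21 points, matching the conclusion.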


A bit of practice…

A researcher conducts a t-test to compare the extent to which 30 professors and 32 teaching
assistants attach importance to guided self-study. When she wants to check the effect size, she
obtains the following 99% CI: [1,01; 8,99]. What is the value of the corresponding test statistic?

A. 2,660
B. 2,9949
C. 3,3333
D. 5
E. No idea


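$$CI = (\bar{Y}_2 - \bar{Y}_1) \pm t^{*}_{n_1 + n_2 - 2} \times SE(\bar{Y}_2 - \bar{Y}_1)$$
→ C = 99%, df = 30 + 32 − 2 = 60, $t^{*}_{60} = 2{,}66$

Center of CI = (8,99 + 1,01) / 2 = 5
Calculate the margin m: (upper − center) = 8,99 − 5 = 3,99

$$t^{*}_{60} \times SE(\bar{Y}_2 - \bar{Y}_1) = 3{,}99 \rightarrow SE(\bar{Y}_2 - \bar{Y}_1) = \frac{3{,}99}{2{,}66} = 1{,}5$$

$$t = \frac{\bar{Y}_2 - \bar{Y}_1}{SE(\bar{Y}_2 - \bar{Y}_1)} = \frac{5}{1{,}5} = 3{,}3333$$
→ answer C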
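The reasoning can be run backwards in a few lines: the centre of the CI is the estimated difference, half its width is t* × SE, and t = difference / SE. A sketch with the numbers from this exercise (the function name is hypothetical):

```python
def t_from_ci(lower, upper, t_crit):
    """Recover the test statistic from a symmetric CI and its critical value."""
    center = (lower + upper) / 2  # estimated difference in means
    margin = upper - center  # equals t* x SE
    se = margin / t_crit
    return center / se, center, se


t, center, se = t_from_ci(1.01, 8.99, 2.66)
```

This gives a centre of 5, SE = 1,5 and t ≈ 3,3333 (option C).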




In reality, the data analysis workflow is often more complicated!
Models rely on certain assumptions, which do not always hold.



Statistics for Educational Scientists, Part 3: analysis of variance with one
factor (ANOVA-1)

1 Example


Example - Validation of Boston Naming Test (BNT)

46 primary school children

Variables:
● Independent variable (IV): divided amongst 4 groups
○ Modality-specific speech and language development disorder (STOS)
○ STOS with behavioral disorders
○ STOS with generalized cognitive impairments
○ Children without STOS (control group)
● Dependent variable (DV): number of items correct (min. 0 - max. 60)




Children get 60 different pictures, and have to name the object that is portrayed on the picture

Research questions:
● Are there differences between the population means of the four groups?
● Is there a difference between children without STOS and children with STOS? (contrast
analysis)


2 Notation and introduction to one-way ANOVA


Notation:
● Yij: score of person i in group j on the dependent variable (DV)
● nj: number of observations in group j
● N: total number of observations
● a: number of groups
● $\bar{Y}_j$: sample mean in group j
● $\bar{Y}$: overall sample mean

Data structure → it is usually useful to look at the data by tabulating them
Software such as SPSS usually works with a participant dataset: one row per participant





There are 3 columns:
● Person identification number
● Dependent variable
● Group

This is the general data structure you get
when you load a data set into software
such as SPSS

For our example of Boston Naming Test,
our dependent variable is the score a
participant gets on the test (0-60)
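With data in this three-column layout (person id, score, group), the quantities from the notation list ($n_j$, $\bar{Y}_j$, N, a and $\bar{Y}$) follow from grouping the rows. A minimal sketch; the rows in the usage example are hypothetical, not the Boston Naming Test data.

```python
from collections import defaultdict


def describe_long_format(rows):
    """rows: iterable of (person_id, score, group) tuples, one per participant."""
    by_group = defaultdict(list)
    for _person_id, score, group in rows:
        by_group[group].append(score)
    n_j = {g: len(s) for g, s in by_group.items()}  # observations per group
    group_means = {g: sum(s) / len(s) for g, s in by_group.items()}  # Ybar_j
    all_scores = [y for s in by_group.values() for y in s]
    big_n = len(all_scores)  # N: total number of observations
    a = len(by_group)  # number of groups
    grand_mean = sum(all_scores) / big_n  # Ybar: overall sample mean
    return n_j, group_means, big_n, a, grand_mean


# Hypothetical rows: (person id, score out of 60, group)
rows = [(1, 40, 1), (2, 44, 1), (3, 30, 2), (4, 34, 2), (5, 36, 2)]
n_j, group_means, big_n, a, grand_mean = describe_long_format(rows)
```

Note that the overall mean is the mean of all N scores, which equals the group means weighted by $n_j$, not their plain average when the groups differ in size.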




3 Exploratory data analysis


We want to get an idea about certain patterns in the data → a preliminary indication of what’s going
on with the data

We can put the data into boxplots
This is going to visualize the distribution of our dependent variable
Boxplots CANNOT tell us the mean of a distribution


Example - Boston Naming Test




We can see an outlier in the distribution of our normal group (group 1)
The second group (STOS) has a much larger variability compared to the other groups in the
experiment




