This file contains very extensive lecture notes that follow the lectures almost word for word. I have also included the answers to the assignments in my lecture notes (the lecturer's answer is shown in green). With these notes I ... the exam comfortably...
Advanced Research Methods
Teacher: Erika van Elsas (erika.vanelsas@ru.nl)
The course will give an overview of the most commonly used advanced research
methods in political science. Using concrete examples, students will learn the
basic principles behind each method, the strengths and limitations of the method, and the
basic conditions that must be met before one can apply it.
Learning goals:
Critically reflect on methodological choices
Choose appropriate research methods for the analysis of political phenomena
Justify my own methodological choices in a research design
Select and justify an appropriate research design for my master thesis.
What to do:
I need a good understanding of the basic research methodology and research methods.
Read the Dutch summary from Lena;
Refresh my knowledge by watching the brief video on statistical modelling under
content;
Slides of PSRM II
What does the course entail?
• There are seven individual assignments (you don’t have to hand them in)
• Tuesday = instruction
• Thursday = discussion and asking questions about the assignment.
• Exam: written, open questions.
Terms and concepts Explanation
Qualitative methods
Descriptive You want to describe, hence descriptive research. We do this by
calculating a number of statistics.
Measurement levels Nominal = Qualitative data. The values have no ordering; they are
simply categories such as race, colour or gender.
Ordinal = Qualitative data. There is an ordering involved, one value ranks higher than the
other: agree/not agree, MBO – HBO – University.
Interval = Quantitative data. No natural zero point (birth years, temperature in
Celsius).
Ratio = Quantitative data. There is a natural zero point: length, weight, distance, profit.
Distributions The distribution of your data. The more we increase our sample size,
the more the sampling distribution takes the bell shape that we know
as the normal distribution. The mean indicates the top of that
distribution: the central value.
Population and sample The population is the whole group we want to say something about
(for example, the whole Dutch population). To draw inferences about this population, we use a sample.
Statistical inference We use statistics because we want to give a concise representation of a large
amount of information. The output of our statistical model gives us the opportunity
to make generalised statements about a larger, unobserved population.
This relies on sampling theory and the central limit theorem: the idea that our sample
can be seen as just one instance of a large distribution of possible samples (the
sampling distribution).
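The idea of a sampling distribution can be illustrated with a small simulation. The sketch below is not from the lectures: the skewed population, the sample sizes and the seed are assumptions chosen only for illustration. It draws many samples from a non-normal population and shows that the sample means cluster around the population mean, with less spread as the sample size grows.

```python
import random
import statistics

def sample_means(population, n, draws, rng):
    """Draw `draws` random samples of size n and return the mean of each sample."""
    return [statistics.mean(rng.choices(population, k=n)) for _ in range(draws)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# A deliberately skewed (non-normal), hypothetical population.
population = [rng.expovariate(1.0) for _ in range(10_000)]

means_small = sample_means(population, n=10, draws=2_000, rng=rng)
means_large = sample_means(population, n=100, draws=2_000, rng=rng)

# The spread of the sampling distribution shrinks as the sample size grows.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)
print(spread_small, spread_large)
```

Each list of means is one simulated sampling distribution; a single real study corresponds to just one draw from it.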
Hypothesis test A hypothesis is your expectation; with a hypothesis test you examine whether the data support that expectation.
Null-hypothesis What you assume beforehand: things stay the same, nothing changes.
Statistical significance Statistical significance is a determination made by an analyst that the results in the
data are not explainable by chance alone. Statistical hypothesis testing is the method
by which the analyst makes this determination. This test provides a p-value, which is
the probability of observing results at least as extreme as those in the data, assuming the
results are truly due to chance alone. A p-value of 5% or lower is often considered to
be statistically significant.
Confidence interval A confidence interval is a range of values around the sample mean (x̄) that is
likely to contain the population parameter. Once we know the confidence interval, we have two
numbers that give us an indication of where the actual population mean lies: if we drew many
samples and built a 95% interval around each sample mean, about 95% of those intervals would
contain the actual population mean.
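As a rough illustration of how the two numbers of a confidence interval are obtained, the sketch below computes an approximate 95% interval as the sample mean plus or minus 1.96 standard errors. The grades are made-up values, and the function name `mean_confidence_interval` is my own. The 1.96 comes from the normal distribution; with a sample this small, a critical value from the t-distribution would give a somewhat wider interval.

```python
import math
import statistics

def mean_confidence_interval(values, z=1.96):
    """Approximate 95% CI for the mean: sample mean +/- z * standard error."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * se, mean + z * se

# Hypothetical exam grades, for illustration only.
scores = [6.5, 7.0, 5.5, 8.0, 6.0, 7.5, 6.5, 7.0, 8.5, 6.5]
low, high = mean_confidence_interval(scores)
print(round(low, 2), round(high, 2))
```

The interval is centred on the sample mean and becomes narrower when the sample is larger or the data are less dispersed.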
Mean The average, a measure of central tendency. The population mean is denoted by the Greek letter mu (μ).
Mode The value that occurs most often in your sample.
Median The middle value. Line up all the values from small to large and cut
your data in the middle; the median is the value at position (n + 1) / 2.
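The three measures of central tendency can be checked on a tiny example. The observations below are hypothetical; the sketch finds the median by hand through the position (n + 1) / 2 and compares it with Python's `statistics` module.

```python
import statistics

values = [3, 7, 7, 2, 9]             # hypothetical observations, odd n
ordered = sorted(values)             # line the values up from small to large
position = (len(ordered) + 1) // 2   # (n + 1) / 2, counting from 1
median_by_hand = ordered[position - 1]

mean_value = statistics.mean(values)      # arithmetic mean
mode_value = statistics.mode(values)      # most frequent value
median_value = statistics.median(values)  # middle value

# With an even number of values, the median is the average of the two middle values.
median_even = statistics.median([2, 3, 7, 9])
print(mean_value, mode_value, median_value, median_even)
```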
Dispersion Is a descriptive statistic. It describes something about our variable: the
variability of that variable, so how much the individual observations differ from
each other overall.
Variance Is never negative. The term variance refers to a statistical measurement of the
spread between numbers in a data set: the average squared deviation from the mean.
Standard deviation A standard deviation (or σ) is a measure of how dispersed the data is in relation to
the mean. Low standard deviation means data are clustered around the mean, and
high standard deviation indicates data are more spread out.
Standard error The standard error tells you how accurate the mean of any given sample from that
population is likely to be compared to the true population mean. When the standard
error increases, i.e. the means are more spread out, it becomes more likely that any
given mean is an inaccurate representation of the true population mean.
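Variance, standard deviation and standard error build on each other, which a short calculation makes visible. The sample below is hypothetical. Note that `statistics.variance` computes the sample variance (dividing by n - 1), which is an assumption about which variant is meant here.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

variance = statistics.variance(values)        # sample variance, divides by n - 1
std_dev = math.sqrt(variance)                 # same as statistics.stdev(values)
std_error = std_dev / math.sqrt(len(values))  # spread of the sample mean

print(variance, std_dev, std_error)
```

The standard error is always smaller than the standard deviation for n > 1, and it shrinks further as the sample size increases.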
Residual The differences between the observed values (Yi) and the predicted values (Ŷi) are
called residuals (= the part that is left unexplained). We use these residuals to
estimate the model.
Correlation Correlation is a statistical term that describes the degree to which two variables move
in coordination with each other. Correlation does not equal causation.
t-test A type of inferential statistic that is used to determine whether there is a significant
difference between the means of two groups. It is a hypothesis-testing tool: a t-test
looks at the t-statistic, the t-distribution values
and the degrees of freedom (df) to determine the statistical significance.
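To show where the t-statistic comes from, here is a minimal sketch that computes it by hand for two hypothetical groups. It uses Welch's variant (no assumption of equal variances), which is my choice for the illustration and not necessarily the version used in the course. Turning the t-statistic into a p-value additionally requires the t-distribution and the degrees of freedom, which this sketch leaves out.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """t-statistic for a difference in means, not assuming equal variances."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Standard error of the difference between the two group means.
    se_diff = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se_diff

# Hypothetical scores for two groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]
t_value = welch_t(group_a, group_b)
print(t_value)
```

The larger the difference in means relative to its standard error, the larger the absolute t-value.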
ANOVA (for t-test and z-tests) Analysis of variance. ANOVA is an analysis tool used in statistics that splits the
observed aggregate variability found inside a data set into two parts: systematic
factors and random factors. Analysts use the ANOVA test to determine the influence
that independent variables have on the dependent variable in a regression study.
OLS/multilinear regression analysis Ordinary Least Squares is a type of linear least squares method for estimating the
unknown parameters in a linear regression model. OLS chooses the parameters by
the principle of least squares: minimizing the sum of the squares of the differences
between the observed values of the dependent variable in
the given dataset and the values predicted by the linear function of the independent
variable.
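The least squares principle can be worked through for the simplest case of one X variable. The sketch below uses the closed-form solution for a single predictor; the data (study hours and grades) and the function name `ols_simple` are assumptions for illustration only.

```python
import statistics

def ols_simple(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed and predicted Y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: weekly study hours and grades.
hours = [1, 2, 3, 4, 5]
grade = [2, 4, 5, 4, 5]

intercept, slope = ols_simple(hours, grade)
residuals = [g - (intercept + slope * h) for h, g in zip(hours, grade)]
ssr = sum_squared_residuals(hours, grade, intercept, slope)
print(intercept, slope, ssr)
```

Any other line through the same data, for example one with a slightly different slope, yields a larger sum of squared residuals, which is what "least squares" means.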
Intercept Or constant. The intercept is the expected mean value of Y when all X
variables equal zero.
Unstandardized regression coefficient Unstandardized scales → you interpret the coefficients on the original scales
of the variables.
Standardized scales → we translate our variables into z-scores, so that we can
express our coefficients as standardized coefficients.
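The difference between the two kinds of coefficient can be seen by running the same simple regression twice: once on the original scales and once on z-scores. The data and helper names below are hypothetical, and the sketch covers only the one-predictor case.

```python
import statistics

def slope(x, y):
    """OLS slope for a regression of y on a single x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data: weekly study hours and grades.
hours = [1, 2, 3, 4, 5]
grade = [2, 4, 5, 4, 5]

b = slope(hours, grade)                         # change in grade per extra hour
beta = slope(z_scores(hours), z_scores(grade))  # change in SDs of grade per SD of hours
print(b, beta)
```

The standardized coefficient equals the unstandardized one multiplied by sd(X) / sd(Y), which makes effects of variables measured on different scales comparable.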
Dummy variable A dummy variable is a numerical variable used in regression analysis to
represent subgroups of the sample in your study. Dummy variables are useful
because they enable us to use a single regression equation to represent multiple
groups. This means that we don’t need to write out separate equation models for
each subgroup. The dummy variables act like ‘switches’ that turn various
parameters on and off in an equation. You create variables that
substitute for our categories: dummies for our categories.
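The 'switch' idea can be sketched in a few lines. The education levels reuse the MBO/HBO/University example from the measurement levels entry; the respondents, the choice of MBO as reference category and all coefficient values are made up for illustration.

```python
# Hypothetical education levels for five respondents.
education = ["MBO", "HBO", "University", "HBO", "MBO"]

# "MBO" is the reference category, so 3 categories need only 2 dummies.
dummy_names = ["HBO", "University"]
dummies = [{name: int(level == name) for name in dummy_names} for level in education]

def predict(row, b0=2.0, b_hbo=0.5, b_uni=1.2):
    """One equation for all groups; the coefficients are made-up illustration values."""
    return b0 + b_hbo * row["HBO"] + b_uni * row["University"]

predictions = [predict(row) for row in dummies]
print(dummies)
print(predictions)
```

For the reference group both switches are off, so the prediction is just the intercept; each dummy coefficient is the difference between its group and the reference group.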
Moderation/interaction effect The effect that occurs when a third variable changes the
nature of the relationship between a predictor and an
outcome, particularly in analyses such as multiple
regression.
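In a regression with an interaction term, moderation means that the effect of X depends on the value of the third variable. The sketch below assumes the common specification Y = b0 + b_x·X + b_z·Z + b_xz·X·Z, which the notes do not spell out here, and uses made-up coefficient values.

```python
def effect_of_x(b_x, b_xz, z):
    """Effect of a one-unit increase in X on Y, given the moderator value z.

    Assumes the model Y = b0 + b_x*X + b_z*Z + b_xz*X*Z.
    """
    return b_x + b_xz * z

# Hypothetical coefficients: X has an effect of 0.5 when Z = 0,
# and that effect grows by 0.3 for every unit of Z.
effect_when_z0 = effect_of_x(b_x=0.5, b_xz=0.3, z=0)
effect_when_z1 = effect_of_x(b_x=0.5, b_xz=0.3, z=1)
print(effect_when_z0, effect_when_z1)
```

Without the interaction term (b_xz = 0), the effect of X would be the same for every value of Z.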
Probabilistic We are not perfectly predicting each
observed case; we are searching for the causal effect that best describes what is
going on in all cases combined.
Simple/Multiple linear regression The goal is to better explain the variation in Y by adding X variables to the equation. For
example, you want to look at the number of hours that students spend on studying.
Some students study 10 hours per week, others 20 hours. Then you look at which
variable can explain this variance. If you add more X variables, you can better
explain the variation.
Simple regression focuses on one X variable.
Multiple regression extends your model by adding more X variables.
b-coefficients The interpretation of each b coefficient (each effect estimated by the model) changes
in a multiple linear regression, because b now represents the effect of X1 while
keeping all other X variables constant.
Least squares method This method finds the line that minimizes the (squared) distances between all of the dots and the
regression line. It is the optimal solution. The line describes the relationship between X and
Y.
Unexplained variance All the (squared) residuals added up together.
In the figure, the blue line is the regression line and the unexplained variance consists of all the red lines
running to the black dots.
Linearity Regression assumptions: functional form.
Each one-unit increase in X leads to a fixed change in Y. If X increases by 1, Y changes
by a certain fixed amount.
Additivity Regression assumptions: functional form.
Means that Y is the result of a simple adding up of the components of the model.
Normally distributed errors The error term is normally distributed with a mean of
zero. This means that the error term is normally
distributed around the regression line.
Homoscedasticity The variance in the error is constant across all observations. If the variance becomes
larger for higher values of X, this assumption is violated (heteroscedasticity).
In the notation ε ~ N(0, σ²), N stands for normally
distributed, and between brackets you see the expected
mean of zero and the single term for the variance (sigma squared; a single term
means there is only one variance).
Independent error assumption This assumption means that there is no correlation between the error term of
one observation and another, or between groups of
observations.
Violations of this assumption typically occur in time series data and in nested multilevel data. This
assumption is important to keep in mind.
Exogeneity This means that the effect of X on Y is not caused by any other variable or by a
reverse causal effect of Y on X.
Session 1 – Intro and causal mechanisms
What is causality? What are causal mechanisms? Is the testing of causal mechanisms a
necessary condition for establishing causality? In this meeting, we’ll talk about different
definitions of causality, the importance of causal mechanisms and how they relate to
analysis at the micro and macro level.
Agenda
Introduction to ARM
Causal inference
Causation as robust dependence & its critiques
Methodological individualism
Introduction to ARM