Summary with everything for midterm Statistics PM CIS
Course
Statistiek for PM CIS (800957B6)
Institution
Tilburg University (UVT)
Summary with everything you need to learn for the midterm Statistics for Pre-master Communication & Information Sciences (Tilburg University). Including notes of the lectures, information from Practical Units, notes for Reporting results and a summary of the chapters from the book.
Null hypothesis H0:
- No relationship between age and the number of wrinkles you have.
- Women are equally likely as men to wear a skirt.
Alternative hypothesis H1:
- Positive relationship between age and the number of wrinkles you have. The older people
are, the more wrinkles they have.
- Women are more likely to wear a skirt or dress than men.
Experiment: you manipulate something
- This is supposed to have an effect: CAUSE -> EFFECT
The manipulated variable is the independent variable, and the effect is the dependent variable.
Correlational design: you can’t manipulate the variable (people get more wrinkles as they get older,
but you can’t make people get older)
Independent variable:
- If experiment: the proposed cause, which is manipulated
- If correlational design: a predictor variable
Dependent variable:
- If experiment: the proposed effect
- If correlational design: an outcome variable
Categorical variables: entities are divided into distinct categories (gender, animal person)
- Binary or dichotomous: only 2 categories (dead or alive)
- Nominal variables: more than 2 categories
- Ordinal variable: same as nominal, but the categories have a logical order
Continuous variables: entities get a distinct score (age, leisure ranking)
- Interval variable: equal intervals on the variable represent equal differences in the property
being measured (distance/order/equality)
- Ratio variable: same as interval, but ratios of scores on the scale also need to make sense
(weight in kilograms, number of calories in a beverage)
Examples:
- Coffee: both
- Sleep: continuous
- Military rank: ordinal
- Extraversion: interval
Ordinal variables are often treated as continuous, because this allows for more analyses.
PU1
The rows represent the so-called cases. Participants can be cases, but so can schools and any other
object that possesses characteristics.
You can change the measurement level of your variables. For some analyses jamovi will not allow
you to enter nominal variables and not even display them in the menu because it makes no sense to
perform those types of analyses on nominal variables. To prevent misery, make sure the
measurement levels of your variables are correctly specified.
(change ‘data type’ to integer so there are no decimals anymore)
H1
Why do we use statistics and not our common sense:
When we are presented with a strong argument that contradicts our pre-existing beliefs,
we find it hard to even perceive it as a strong argument. Even worse, when we are
presented with a weak argument that agrees with our pre-existing biases, almost no one can
see that the argument is weak.
There are a lot of critical questions that you can’t answer with statistics, but the answers to
those questions will have a huge impact on how you analyze and interpret data.
H2.1
The theoretical construct: the thing that you’re trying to take a measurement of, like “age”,
“gender” or an “opinion”. Theoretical constructs can’t be directly observed, and they are often
a bit vague.
The measure: refers to the method or the tool that you use to make your observations. A question
in a survey, a behavioral observation or a brain scan could all count as a measure.
Operationalization: the process by which we take a meaningful but somewhat vague concept and
turn it into a precise measurement
The process of operationalization can involve several different things:
- Being precise about what you are trying to measure.
- Determining what method you will use to measure it.
- Defining the set of allowable values that the measurement can take.
Variable: what we end up with when we apply our measure to something in the world. That is,
variables are the actual “data” that we end up with in our data sets.
H2.2
Scales of measurement:
- Binary or dichotomous: only 2 categories (dead or alive)
- Nominal scale variable (categorical variable): no particular relationship between the different
possibilities. (gender)
- Ordinal scale variable: there is a way to order the different possibilities.
- Interval scale variable: the differences between the numbers are
interpretable, but the variable doesn’t have a “natural” zero value. (temperature)
- Ratio variable: same as interval, but the zero value also needs to be meaningful. (weight)
- Continuous variable: one in which, for any two values that you can think of, it’s always
logically possible to have another value in between.
- Discrete variable: isn’t continuous. For a discrete variable it’s sometimes the case that
there’s nothing in the middle.
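The difference between an interval and a ratio scale can be checked with a little arithmetic. This is a minimal sketch; the temperatures and weights are made-up values, and the conversion to Kelvin is added here only to show what a scale with a true zero looks like.

```python
# Interval scale (temperature in degrees Celsius): differences are meaningful.
celsius_a, celsius_b = 10.0, 20.0
difference = celsius_b - celsius_a      # a 10-degree difference makes sense

# The ratio of Celsius values is NOT meaningful, because 0 degrees Celsius is an arbitrary zero.
naive_ratio = celsius_b / celsius_a     # 2.0, but 20 degrees is not "twice as hot"

# Kelvin has a true zero, so ratios are interpretable there.
kelvin_a = celsius_a + 273.15
kelvin_b = celsius_b + 273.15
kelvin_ratio = kelvin_b / kelvin_a      # only slightly above 1

# Ratio scale (weight in kilograms): zero means "no weight", so ratios make sense.
weight_ratio = 80.0 / 40.0              # 80 kg really is twice 40 kg

print(difference, naive_ratio, round(kelvin_ratio, 3), weight_ratio)
```

The same two temperatures give a ratio of 2 in Celsius but only about 1.04 in Kelvin, which is why ratios on an interval scale should not be interpreted.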
H2.3
Reliability: tells you how precisely you are measuring something
It refers to the repeatability or consistency of your measurement
- Test-retest reliability: relates to consistency over time. If we repeat the measurement at a
later date do we get the same answer?
- Inter-rater reliability: relates to consistency across people. If someone else repeats the
measurement (e.g., someone else rates my intelligence) will they produce the same answer?
- Parallel forms reliability: relates to consistency across theoretically-equivalent
measurements. If I use a different set of bathroom scales to measure my weight does it give
the same answer?
- Internal consistency reliability: if a measurement is constructed from lots of different parts
that perform similar functions (e.g., a personality questionnaire result is added up across
several questions) do the individual parts tend to give similar answers.
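Test-retest reliability can be illustrated with a correlation between two measurement occasions. This is only a sketch: the scores are invented, and using the Pearson correlation as the consistency index is an assumption made for illustration, not something stated in the notes.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)


# Hypothetical scores of the same six people, measured at two points in time.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

r = pearson_r(time1, time2)
print(round(r, 2))  # close to 1 means the measurement is consistent over time
```

A value near 1 suggests that people keep roughly the same relative position at both occasions; a value near 0 would suggest an unreliable measure.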
H2.4
Y: to be explained
X1, X2: doing the explaining
Dependent variable: being explained = Y
Independent variable (IV): used to do the explaining = X
If there really is a relationship between Y and X: we can say that Y depends on X, and if we have
designed our study “properly” then X isn’t dependent on anything else.
Use X (the predictors) to make guesses about Y (the outcomes)
H2.5
Experimental research: the researcher controls all aspects of the study, especially what participants
experience during the study. The researcher manipulates or varies the predictor variables (IVs) but
allows the outcome variable (DV) to vary naturally. The idea here is to deliberately vary the
predictors (IVs) to see if they have any causal effects on the outcomes.
Randomisation: randomly assign people to different groups, and then give each group a different
treatment (assign them different values of the predictor variables). Randomisation minimises (but does not
eliminate) the possibility that there are any systematic differences between groups.
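Random assignment can be sketched in a few lines. The participant labels, group names, and the helper `randomly_assign` are hypothetical and only serve as an illustration.

```python
import random


def randomly_assign(participants, groups, seed=None):
    """Shuffle the participants and deal them out over the groups in turn."""
    rng = random.Random(seed)       # a seed makes the assignment reproducible
    shuffled = list(participants)   # copy, so the original list is untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, person in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(person)
    return assignment


# Hypothetical participants P01 ... P20 and two conditions.
participants = [f"P{i:02d}" for i in range(1, 21)]
assignment = randomly_assign(participants, ["treatment", "control"], seed=1)

for group, members in assignment.items():
    print(group, members)
```

Because chance decides who ends up in which group, characteristics such as age or gender are spread over the groups without any systematic pattern, although small chance differences can still occur.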
Non-experimental research: any study in which the researcher doesn’t have as much control as they
do in an experiment.
- Quasi-experimental research: an experiment but we don’t control the predictors (IVs). We
can still use statistics to analyse the results, but have to be more careful and circumspect.
- Case studies: aims to provide a very detailed description of one or a few instances.
H2.6
Validity: tells you how accurate the measure is.
- Internal validity: refers to the extent to which you are able to draw the correct conclusions
about the causal relationships between variables.
- External validity: relates to the generalisability or applicability of your findings.
- Construct validity: a question of whether you’re measuring what you want to be measuring.
- Face validity: whether or not a measure “looks like” it’s doing what it’s supposed to.
- Ecological validity: the entire set up of the study should closely approximate the real world
scenario that is being investigated
H2.7
Worries in validity:
- Confounder: an additional, often unmeasured variable that turns out to be related to both
the predictors and the outcome. The existence of confounders threatens the internal validity
of the study because you can’t tell whether the predictor causes the outcome, or if the
confounding variable causes it.
- Artefact: when the result is only true in the special situation that you test your study in. The
possibility that your result is an artefact describes a threat to your external validity, because
it raises the possibility that you can’t generalise or apply your results to the actual
population that you care about.
Confounders are a bigger concern for non-experimental studies. The more control you have over
what happens during the study, the more you can prevent confounders from affecting the results.
Artefactual results are a bigger concern for experimental studies than for non-experimental studies.
That is why a lot of studies are non-experimental, because what the researcher is trying to do is
examine human behaviour in a more naturalistic context.
History effects: the possibility that specific events may occur during the study that might influence
the outcome measure. (A flood makes the next participants think differently about handling risk.)
Maturational effects: changes in people over time (getting older, getting tired, ...)
Repeated testing: a history effect in which the “event” that influences the second measurement is
the first measurement itself. (People are nervous at time 1 but might be calmer at time 2; or people
might do better on an intelligence test at time 2 because at time 1 they learned the general rules of how
to solve “intelligence-test-style” questions.)
Selection bias: for example, the people selected into two groups differ in characteristics such as
gender, and the treatment works better on females than on males.
Differential attrition = Heterogeneous attrition: the attrition effect is different for different groups.
(When lots of people start dropping out. Not random, because the people that remain are more
conscientious, more tolerant of boredom, etc., than those that leave.)
- Homogeneous attrition: the attrition effect is the same for all groups, treatments or
conditions. (When the easily bored participants are dropping out at about the same time)
Non-response bias: You mail out a survey to 1000 people but only 300 of them reply. The 300
people who replied are almost certainly not a random subsample.
- Also, not everyone will answer every question > The problem of missing data: If the data
that is missing was “lost” randomly, then it’s not a big problem. If it’s missing systematically,
then it can be a big problem.
Regression to the mean: refers to any situation where you select data based on an extreme value on
some measure. Because the variable has natural variation it almost certainly means that when you
take a subsequent measurement the later measurement will be less extreme than the first one.
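Regression to the mean can be made visible with a small simulation. This is a sketch under stated assumptions: each observed score is a stable "true" score plus random noise, and all numbers (mean 100, standard deviation 10, top 10% selected) are invented for illustration.

```python
import random

rng = random.Random(42)   # fixed seed so the run is reproducible
n = 10_000

# Each person has a stable true score; every test adds independent noise.
true_score = [rng.gauss(100, 10) for _ in range(n)]
test1 = [t + rng.gauss(0, 10) for t in true_score]
test2 = [t + rng.gauss(0, 10) for t in true_score]

# Select the people with the most extreme (top 10%) scores on the first test.
cutoff = sorted(test1)[int(0.9 * n)]
selected = [i for i in range(n) if test1[i] >= cutoff]

mean1 = sum(test1[i] for i in selected) / len(selected)
mean2 = sum(test2[i] for i in selected) / len(selected)

print(round(mean1, 1), round(mean2, 1))
```

The selected group still scores above average on the second test, but clearly less extremely than on the first, even though nothing was done to them in between.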
Experimenter bias: the experimenter can accidentally end up influencing the results of the
experiment by subtly communicating the “right answer” or the “desired behaviour” to the
participants.
Reactivity/ demand effects: people alter their performance because of the attention that the study
focuses on them.
- The good participant tries to be too helpful to the researcher. He or she seeks to figure out
the experimenter’s hypotheses and confirm them.
- The negative participant does the exact opposite of the good participant. He or she seeks to
break or destroy the study or the hypothesis in some way.
- The faithful participant is unnaturally obedient. He or she seeks to follow instructions
perfectly, regardless of what might have happened in a more realistic setting.
- The apprehensive participant gets nervous about being tested or studied, so much so that
his or her behaviour becomes highly unnatural, or overly socially desirable.
Placebo effect: a situation where the mere fact of being treated causes an improvement in outcomes.
(If you give people a drug and tell them that it’s a cure for a disease, they will tend to get better
faster because of that belief.)
Situation, measurement and sub-population effects: The choice of sub-population from which you
draw your participants, the location, timing and manner in which you run your study and the tools
that you use to make your measurements might all be influencing the results.
Fraud, deception and self-deception
- Data fabrication: people just make up the data, sometimes with good intentions (the
researcher believes that it reflects the truth); on other occasions, the fraud is deliberate and
malicious.
- Hoaxes: often jokes intended to be discovered, but often with the aim of discrediting someone or some field.
- Data misrepresentation: often the data don’t actually say what the researchers think they
say due to a lack of sophistication in the data analyses.
- Study “misdesign”: a researcher designs a study that has built-in flaws that are never
reported in the paper. The data that are reported are completely real and are correctly
analysed, but they are produced by a study that is actually quite wrongly put together.
- Data mining & post hoc hypothesizing: when you keep trying to analyze your data in lots of
different ways, you’ll eventually find something that “looks” like a real effect but isn’t.
- Publication bias & self-censoring: non-reporting of negative results.
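Why data mining produces effects that “look” real can be shown with a simulation on pure noise. This is a rough sketch, not a method from the notes: it assumes two groups drawn from the same normal distribution with a known standard deviation of 1, and it uses a simple z-test with the conventional 1.96 cut-off.

```python
import random
from math import sqrt

rng = random.Random(2024)   # fixed seed so the run is reproducible
n_per_group = 30
n_analyses = 1000
false_positives = 0

for _ in range(n_analyses):
    # Both groups come from the SAME population: there is no real effect.
    group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    mean_diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
    # z-test for a difference in means with known standard deviation 1.
    z = mean_diff / sqrt(2 / n_per_group)
    if abs(z) > 1.96:
        false_positives += 1

share = false_positives / n_analyses
print(false_positives, share)
```

Roughly 1 in 20 of these analyses crosses the cut-off by chance alone, so a researcher who tries many analyses and reports only the "hits" will almost always have something to report.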
H3.2
Analyses can be selected from the analysis ribbon or menu along the top. Selecting an analysis will
present an ‘options panel’ for that particular analysis, allowing you to assign different variables to
different parts of the analysis, and select different options. At the same time, the results for the
analysis will appear in the right ‘Results panel’ and will update in real-time as you make changes to
the options.
When you have the analysis set up correctly you can dismiss the analysis options by clicking the
arrow to the top right of the optional panel. If you wish to return to these options, you can click on
the results that were produced. In this way, you can return to any analysis that you (or say, a
colleague) created earlier.
If you decide you no longer need a particular analysis, you can remove it with the results context
menu. Right-clicking on the analysis results will bring up a menu and by selecting ‘Analysis’ and then
‘Remove’ the analysis can be removed. But more on this later. First, let’s take a more detailed look at
the spreadsheet view.
H3.3
Variables:
- ID: persons name, id number
- Nominal: text labels (gender)
- Ordinal: nominal with an order (agree, strongly agree, etc.)
- Continuous (=interval =ratio): height or weight
Computed Variables are those which take their value by performing a computation on other
variables. Computed Variables can be used for different purposes, like log transforms, z-scores, sum-
scores, negative scoring and means.
V functions perform their calculation on a variable as a whole. (For example, MEAN(A, B) will
produce the mean of A and B for each row. VMEAN(A) gives the mean of all the values in A.)
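The difference between a row-wise function and a V function can be mimicked in Python. This is only an analogy with invented values for A and B; it is not jamovi code, and using the sample standard deviation for the z-scores is an assumption.

```python
from statistics import mean, stdev

# Two hypothetical variables (columns) with four rows each.
A = [2, 4, 6, 8]
B = [4, 4, 10, 0]

# Like MEAN(A, B): one value per row, computed across the variables.
row_means = [(a + b) / 2 for a, b in zip(A, B)]

# Like VMEAN(A): a single value, computed over the whole variable.
column_mean = mean(A)

# A z-score as a computed variable combines both ideas: each row's value is
# compared with the mean and standard deviation of the whole variable.
z_scores = [(a - column_mean) / stdev(A) for a in A]

print(row_means)
print(column_mean)
print([round(z, 2) for z in z_scores])
```

The row-wise result has as many values as there are rows, while the whole-variable result is one number that is the same for every row.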
Jamovi also provides an “R Syntax Mode”. In this mode jamovi produces equivalent R code for each
analysis. To change to syntax mode, select the Application menu to the top right of jamovi (a button
with three vertical dots) and click the “Syntax mode” checkbox there. You can turn off syntax mode
by clicking this a second time.
In syntax mode analyses continue to operate as before but now they produce R syntax, and ‘ascii
output’ like an R session. Like all results objects in jamovi, you can right click on these items
(including the R syntax) and copy and paste them, for example into an R session. At present, the
provided R syntax does not include the data import step and so this must be performed manually in
R. There are many resources explaining how to import data into R and if you are interested we
recommend you take a look at these; just search on the interweb.
H3.4
Loading data from csv files:
- Heading: Does the first row of the file contain the names for each variable - a ‘header’ row?
The booksales.csv file has a header, so that’s a yes.
- Decimal: What character is used to specify the decimal point? In English speaking countries
this is almost always a period (i.e., .). That’s not universally true though, many European
countries use a comma.
- Quote: What character is used to denote a block of text? That’s usually going to be a double
quote mark ("). It is for the booksales.csv file.
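The three settings above (header, decimal, quote) can be sketched with Python's built-in csv module. The file content below is invented and is not the real booksales.csv; it assumes a European-style file with semicolons between values and a comma as the decimal mark.

```python
import csv
import io

# Hypothetical file content: header row, quoted text, decimal commas.
raw = 'Title;Price\n"Stats, made easy";12,50\n"Data 101";8,99\n'

# DictReader treats the first row as the header (the variable names).
# quotechar tells the reader which character encloses a block of text.
reader = csv.DictReader(io.StringIO(raw), delimiter=";", quotechar='"')

rows = []
for row in reader:
    # Convert the decimal comma to a period before turning text into a number.
    row["Price"] = float(row["Price"].replace(",", "."))
    rows.append(row)

for row in rows:
    print(row["Title"], row["Price"])
```

Because the title is quoted, the comma inside "Stats, made easy" is read as part of the text and not as a decimal mark or separator.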
H3.5
Loading data from text files:
- Header: the first row contains the column names. If that’s not the case, open the file in a
spreadsheet programme such as Open Office and add the header row manually.
- Sep: As the name “comma separated value” indicates, values are separated by commas.
- Quote: It’s conventional in csv files to include a quoting character for textual data. As you
can see by looking at the booksales.csv file, this is usually a double quote character, ". But
sometimes there is no quoting character at all, or you might see a single quote mark ' used
instead.
- Skip: sometimes the first few rows have nothing to do with the actual data.
- Missing values: The data file needs to include a “special” value to indicate that the entry is
missing. By default, jamovi assumes that this value is 99, for both numeric and text data, so
you should make sure that, where necessary, all missing values in the csv file are replaced
with 99 (or -9999; whichever you choose) before opening / importing the file into jamovi.
Once you have opened / imported the file into jamovi all the missing values are converted to
blank or greyed out cells in the jamovi spreadsheet view. You can also change the missing
value for each variable as an option in the Data - Setup view.
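Why the missing-value code matters can be shown with a short sketch. The ages below are invented, and the code 99 is used as the “special” value mentioned above; this mimics the idea in plain Python and is not how jamovi works internally.

```python
MISSING_CODE = 99   # the "special" value that marks a missing entry

# Hypothetical ages; two participants did not answer.
ages = [23, 31, 99, 27, 99, 19]

# Turn the code into a real "missing" marker (None), like a blank cell.
cleaned = [None if value == MISSING_CODE else value for value in ages]

# Only the observed values are used in the analysis.
observed = [value for value in cleaned if value is not None]
mean_observed = sum(observed) / len(observed)

# Forgetting to declare the code treats 99 as a real age and inflates the mean.
naive_mean = sum(ages) / len(ages)

print(cleaned)
print(mean_observed, round(naive_mean, 1))
```

If the missing-value code is not specified correctly, the codes are analysed as if they were real scores, which can distort every statistic computed on that variable.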
Open Excel files first in Excel or another spreadsheet programme that can handle Excel files, and
then export the data as a csv file before opening / importing the csv file into jamovi.