Summary: Intro. To Research In Marketing Spring

All information/material from the lectures, summarized.

Preview: 3 of 22 pages

  • 17 March 2022
  • 22 pages
  • 2021/2022
  • Summary
FantaNaranja
DV = dependent (outcome) variable, metric: ratio/interval = continuous
IV = independent (explanatory) variable, nonmetric: ordinal/nominal = categorical

L1: Chapter 2 Book p. 31-88 (see images)

1. Univariate profiling: the starting point for understanding the nature of any variable is to
characterize the shape of its distribution. → Histogram.

2. Bivariate profiling: relationships between pairs of variables. → Scatterplot. This is a
graph of data points based on two metric variables.
Points falling along a straight line indicate a linear relationship (correlation). A curved
pattern can denote a nonlinear relationship, and a random pattern may indicate no relationship.
Bivariate profiling, examining group differences: use a boxplot, a pictorial representation
of the data distribution of a metric variable for each group of a nonmetric variable.
Boxplot: the box holds the middle 50% of the data, from the 25th to the 75th percentile. The
line inside the box is the median. A longer box means a bigger spread (higher standard
deviation). Whiskers: lines extending from each end of the box.
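
The scatterplot patterns described above can also be checked with a number. The sketch below assumes Pearson's product-moment correlation as the measure of linear relationship; the function name `pearson_r` and the toy data are illustrative and do not come from the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of metric values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [-2, -1, 0, 1, 2]
straight = [2 * v + 1 for v in x]  # points on a straight line
curved = [v ** 2 for v in x]       # points on a symmetric curve

r_straight = pearson_r(x, straight)  # perfect linear relationship
r_curved = pearson_r(x, curved)      # no linear relationship
print(r_straight, r_curved)
```

In this toy data the curved pattern has a correlation of zero even though y depends completely on x, which is why the scatterplot itself should be inspected and not only the coefficient.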

Outliers = observations 1.0 to 1.5 quartiles (25%-37.5%) away from the box.
Extreme values = observations more than 1.5 quartiles away from the box.
Both are depicted by symbols outside the whiskers.
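
A minimal sketch of the boxplot quantities, under a stated assumption: "quartiles away from the box" is read here as distance measured in box lengths (Q3 minus Q1). That reading, the function name `box_distances` and the data are illustrative. The two thresholds are parameters because software packages use different cut-offs (1.5 and 3 box lengths is a common convention).

```python
from statistics import quantiles

def box_distances(values, outlier_at=1.0, extreme_at=1.5):
    """Label each value by its distance from the box, measured in box lengths."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # assumed to be non-zero
    labels = {}
    for v in values:
        if v > q3:
            distance = (v - q3) / box_length
        elif v < q1:
            distance = (q1 - v) / box_length
        else:
            distance = 0.0  # inside the box
        if distance > extreme_at:
            labels[v] = "extreme"
        elif distance >= outlier_at:
            labels[v] = "outlier"
        else:
            labels[v] = "regular"
    return (q1, median, q3), labels

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 15, 30]
box, labels = box_distances(data)
print(box)                      # 25th percentile, median, 75th percentile
print(labels[15], labels[30])   # 15 is 1.1 box lengths above the box, 30 is 4.1
```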

3. Multivariate profiling: to compare observations characterized on a multivariate profile
we can use multivariate graphical displays. Three types:
1. Direct portrayal of the data values.
2. Mathematical transformation of the original data into a mathematical relationship, e.g.
Andrews' Fourier transformation.
3. Iconic representation, most commonly a face; high representativeness.

MISSING DATA, we need to worry because:
1. Practical impact: the reduction of sample size n.
2. Substantive impact: results based on data with a nonrandom missing data process could
be biased.

4 step process for identifying missing data and applying remedies:

STEP 1: Determine the type of missing data
Ignorable missing data: missing data that are expected and are accommodated by the technique.
• When taking a sample, the part of the population that is not included in the sample is missing.
• Missing data that are part of the survey design: respondents skip questions depending on earlier answers.
• Censored data: respondents cannot give complete information (because of time or death).
Not ignorable missing data:
• Known missing data processes: many are known to the researcher (e.g. failure to complete the
survey), and some remedies can be used.
• Unknown missing data processes are less easily identified (e.g. refusal to answer because
of the sensitive nature of a question). When the missing data are random, remedies may be available.

STEP 2: Determine the extent of missing data
Missing data under 10% can generally be ignored.
Before step 3, the researcher should consider deleting individual cases and/or variables.
Variables with as little as 15% missing data are candidates for deletion, but higher levels
(20-30%) can often be remedied.
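
The extent of missing data can be tabulated per variable and per case before deciding on deletion. This is a sketch only: the `survey` rows, the use of `None` for a missing value and both function names are assumptions made for illustration.

```python
def missing_share_by_variable(rows):
    """Percentage of missing (None) values per variable."""
    variables = rows[0].keys()
    return {
        var: 100 * sum(row[var] is None for row in rows) / len(rows)
        for var in variables
    }

def missing_share_by_case(rows):
    """Percentage of missing (None) values per case (row)."""
    return [
        100 * sum(value is None for value in row.values()) / len(row)
        for row in rows
    ]

# Hypothetical survey data, one dict per respondent.
survey = [
    {"age": 34, "income": 52, "satisfaction": 7},
    {"age": 41, "income": None, "satisfaction": 6},
    {"age": 29, "income": 38, "satisfaction": 9},
    {"age": None, "income": None, "satisfaction": 8},
    {"age": 55, "income": 61, "satisfaction": 5},
]

per_variable = missing_share_by_variable(survey)
per_case = missing_share_by_case(survey)

# Variables at or above the 15% level from the notes are candidates for deletion.
deletion_candidates = [var for var, pct in per_variable.items() if pct >= 15]
print(per_variable, per_case, deletion_candidates)
```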
STEP 3: Diagnose the randomness of the missing data process
• Missing data are termed missing at random (MAR) if the missing values of Y depend
on X, but not on Y. → results are not generalizable
• A higher level of randomness is termed missing completely at random (MCAR): the
cases with missing data are indistinguishable from cases with complete data.
Diagnostic tests for levels of randomness:
1. Form 2 groups: cases with missing data for Y and cases with valid values for Y. → Test whether significant differences exist between the groups on other variables.
2. Second approach: an overall test of randomness, to determine whether the data can be classified as MCAR.
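
The first diagnostic can be sketched as a two-group comparison. The example assumes Welch's t statistic as the test of difference (the notes do not name a specific test) and uses hypothetical variables `age` (X) and `income` (Y). It computes the statistic only, not a p-value.

```python
from math import sqrt
from statistics import mean, variance

def split_by_missing(rows, y, x):
    """Values of x for cases where y is missing, and for cases where y is valid."""
    x_when_y_missing = [r[x] for r in rows if r[y] is None and r[x] is not None]
    x_when_y_valid = [r[x] for r in rows if r[y] is not None and r[x] is not None]
    return x_when_y_missing, x_when_y_valid

def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical data: income is missing mostly for older respondents.
rows = [
    {"age": 61, "income": None},
    {"age": 58, "income": None},
    {"age": 64, "income": None},
    {"age": 30, "income": 40},
    {"age": 35, "income": 45},
    {"age": 40, "income": 52},
    {"age": 33, "income": 38},
]

missing_group, valid_group = split_by_missing(rows, y="income", x="age")
t = welch_t(missing_group, valid_group)
print(round(t, 2))
```

A large absolute t value suggests that missingness on income is related to age, so the data would not qualify as MCAR.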

STEP 4: Select the imputation method
Imputation is the process of estimating the missing value based on valid values of other
variables in the sample.

Imputation methods compared (advantages, disadvantages, best used when):

Complete data: use only cases with complete data
  Advantages: simplest; default for programs
  Disadvantages: reduction in n; affected by nonrandom processes
  Best used when: large n; strong relationships; low missing data

All available data
  Advantages: maximizes use of data; results in largest n
  Disadvantages: varying sample n; out-of-range values
  Best used when: low missing data; moderate relationships

Case substitution: replace the entire observation
  Advantages: realistic values
  Disadvantages: must have similar additional cases
  Best used when: additional cases are available

Hot and cold deck imputation (hot: other observation in the list; cold: external case input)
  Advantages: replaces missing data with actual values from similar cases
  Disadvantages: must define a suitably similar case
  Best used when: an established replacement value is known

Mean substitution: replace missing values with the mean value
  Advantages: easy; all cases complete
  Disadvantages: reduces variance; lowers correlation
  Best used when: low missing data; strong relationships

Regression imputation: predict missing data from its relationship to other variables
  Advantages: employs relationships; values are connected
  Disadvantages: reduces generalizability; needs strong relationships
  Best used when: moderate/high missing data; strong relationships

Model-based methods: involve missing data
  Advantages: accommodates nonrandom and random data processes; best representation
  Disadvantages: complex model; requires specialized software; not available
  Best used when: only method that can accommodate nonrandom missing data
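
One trade-off from the comparison, that mean substitution completes all cases but reduces variance, can be seen in a few lines. This is a sketch with made-up income values; `mean_substitute` is an illustrative name and `None` stands for a missing value.

```python
from statistics import mean, variance

def mean_substitute(values):
    """Replace missing (None) entries with the mean of the valid entries."""
    valid = [v for v in values if v is not None]
    fill = mean(valid)
    return [fill if v is None else v for v in values]

income = [30, None, 45, 50, None, 55, 60]  # hypothetical values, in thousands
valid_only = [v for v in income if v is not None]
completed = mean_substitute(income)

variance_before = variance(valid_only)  # based on the 5 valid cases
variance_after = variance(completed)    # based on all 7 cases after substitution
print(completed)
print(variance_before, variance_after)
```

The imputed cases sit exactly on the mean, so they add nothing to the sum of squared deviations while increasing n. That is why the variance shrinks.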

Under 10%: any imputation method.
10%-20%: all-available, hot deck, case substitution and regression for MCAR; model-based for MAR.
Over 20%: regression for MCAR and model-based for MAR.
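
The rule of thumb above can be written as a small lookup. Two choices in this sketch are assumptions: exactly 20% is placed in the middle band, and "hot deck case" is read as two methods (hot deck and case substitution). The function name is hypothetical.

```python
def recommended_imputation(percent_missing, process):
    """Return the imputation methods suggested in the notes.

    percent_missing: share of missing data, 0-100.
    process: "MCAR" or "MAR".
    """
    if process not in ("MCAR", "MAR"):
        raise ValueError("process must be 'MCAR' or 'MAR'")
    if percent_missing < 10:
        return ["any imputation method"]
    if percent_missing <= 20:
        if process == "MCAR":
            return ["all-available", "hot deck", "case substitution", "regression"]
        return ["model-based"]
    # More than 20% missing data.
    if process == "MCAR":
        return ["regression"]
    return ["model-based"]

print(recommended_imputation(5, "MAR"))
print(recommended_imputation(15, "MCAR"))
print(recommended_imputation(35, "MAR"))
```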

Outliers:
- Practical: 20 persons have incomes between 30k and 60k, with an average of 48k. When 1 person
with an income of 1 million is added, the average rises to more than 90k. The researcher must
assess whether the value is retained or eliminated due to its undue influence on results.

- Substantive: the outlier must be judged on how representative it is of the population. If there is a
group of millionaires it can be retained, but if he is the only one, it may be deleted.
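
The income example can be checked with simple arithmetic. The notes give only the average of the 20 incomes, so the sketch assumes every income equals 48k. This does not change the result, because the mean depends only on the total.

```python
from statistics import mean

# Assumption: all 20 incomes are set to the stated average of 48k.
incomes = [48_000] * 20
with_millionaire = incomes + [1_000_000]

mean_before = mean(incomes)
mean_after = mean(with_millionaire)  # (20 * 48,000 + 1,000,000) / 21
print(mean_before, round(mean_after))
```

A single case out of 21 moves the average from 48k to about 93k, which is the undue influence the researcher has to judge.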

Why do outliers occur?
1. Procedural error: data entry error, mistake in coding
2. Extraordinary event: a unique real observation, researcher decides if it fits objective
3. Extraordinary observations with no explanation: researcher decides (most deleted)
4. Unique in their combination: not high or low, but unique in combination.

Detecting outliers:
- Univariate detection: standardize the variable and look at where each case falls in the
distribution, then decide. A standard score of 2.5 or higher marks an outlier.
- Bivariate detection: pairs of variables can be assessed jointly through a scatterplot. The downside
is that with 5 variables we already need 10 graphs.
- Multivariate detection: more than 2 variables. Bivariate detection becomes inadequate because it
needs a lot of graphs, each limited to 2 dimensions. Use the Mahalanobis D² measure: a high D² value
marks an observation farther removed from the general distribution.
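
The univariate and multivariate detection rules can be sketched as follows. Function names and data are illustrative. The D² function handles two variables only, to keep the matrix algebra readable, and uses the sample covariance matrix (divisor n - 1).

```python
from statistics import mean, stdev

def univariate_outliers(values, threshold=2.5):
    """Values whose absolute standard score is at or above the threshold."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s >= threshold]

def mahalanobis_d2(points):
    """Mahalanobis D² of each (x, y) point from the sample centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in points
    ]

# Univariate: 19 ordinary scores and one score of 30.
scores = [8, 9, 10, 11, 12] * 3 + [10] * 4 + [30]
flagged = univariate_outliers(scores)

# Multivariate: x and y rise together, except for the last case (low x, high y).
points = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (2, 6)]
d2 = mahalanobis_d2(points)
most_unusual = points[d2.index(max(d2))]
print(flagged, most_unusual)
```

The case (2, 6) is not extreme on x or on y alone, but it has the highest D² because its combination of values runs against the pattern of the other cases.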

Four important statistical assumptions
1. Normality: the shape of the distribution. Departures from normality: kurtosis (peakedness or
flatness) and skewness (balance of the distribution).
2. Homoscedasticity: its violation, heteroscedasticity, is often the result of nonnormality of 1 or more variables.
3. Linearity: nonlinear relationships can be remedied by transforming variables to make the relationship linear.
4. Absence of correlated errors: when correlated errors go undetected, serious biases can occur.
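
Skewness and kurtosis can be computed directly from the moments of a variable. The sketch assumes the simple moment-based formulas, with kurtosis reported as excess kurtosis so that a normal distribution scores 0 on both. Statistical packages often apply small-sample corrections that give slightly different values.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a list of numbers."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

skew_sym, kurt_sym = skewness_and_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_kurtosis(right_skewed)
print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

A positive skewness means a tail to the right, and a negative excess kurtosis means a flatter distribution than the normal.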

Lecture 1

Multivariate Analysis: ‘Broadly speaking, it refers to all statistical methods that
simultaneously analyze multiple measurements on each individual or object under
investigation’

Nonmetric measurement scales:
Nominal: unique definition (brand name, gender, student ANR)
  Statistics: %, mode, chi-square test
Ordinal: indicates 'order', sequence (level of education)
  Statistics: percentiles, median, rank correlation
Metric measurement scales:
Interval: arbitrary origin (attribute scores, price index)
  Statistics: arithmetic average, range, standard deviation, product-moment correlation
Ratio: unique origin, zero point (age, cost, number of customers)
  Statistics: geometric average, coefficient of variation
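
As an illustration of which statistics fit which scale, the sketch below computes one or two of the listed measures per scale with Python's `statistics` module. All data values are invented for the example.

```python
from statistics import geometric_mean, mean, median, mode, stdev

brands = ["A", "B", "A", "C", "A"]   # nominal: only counting makes sense
education = [1, 2, 2, 3, 3, 3, 4]    # ordinal codes, 1 = lowest level
attribute_scores = [3, 4, 4, 5, 4]   # interval: differences are meaningful
costs = [1.0, 4.0, 16.0]             # ratio: true zero point, ratios are meaningful

brand_mode = mode(brands)
education_median = median(education)
score_mean = mean(attribute_scores)
score_sd = stdev(attribute_scores)
cost_geometric_mean = geometric_mean(costs)
cost_cv = stdev(costs) / mean(costs)  # coefficient of variation

print(brand_mode, education_median, score_mean, score_sd)
print(cost_geometric_mean, cost_cv)
```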

Reliability: Is the measure ‘consistent’, correctly registered?
Validity: Does the measure capture the concept it is supposed to measure?
