Summary of practicals

Course
Missing Data Theory and Causal Effects

Institution
Universiteit Utrecht (UU)

All practicals of MDCE summarized.

Published on 7 April 2021
Number of pages: 16
Written in 2020/2021
Type: Summary

Course
Minor methoden en statistiek

willemijnvanes

Notes on practicals
Practical 1

You can recognize a function in R by the parentheses that follow its name.

If you create a data frame with
dat3a <- as.data.frame(mat3)
from an object whose properties are not correct, the resulting data frame is not correct either.
Therefore, you should create the data frame from the data itself
dat3b <- data.frame(V1 = vec1, V2 = vec2)
when your objects are both numerical and character
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c("A", "B", "C", "D", "E", "F")
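A minimal sketch of the difference. The notes do not show how mat3 was made, so building it with cbind() is an assumption made here for illustration.

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c("A", "B", "C", "D", "E", "F")

# Assumption: mat3 is a matrix made from both vectors.
# A matrix holds a single data type, so the numbers are coerced to character.
mat3 <- cbind(vec1, vec2)
dat3a <- as.data.frame(mat3)

# Built from the vectors directly, each column keeps its own type.
dat3b <- data.frame(V1 = vec1, V2 = vec2)

str(dat3a)  # vec1 is no longer numeric
str(dat3b)  # V1 is numeric
```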

Factor = a categorical variable with a numerical representation.
With the factor() function you can change the labels of a factor, for example assign the label "Utrecht" to the value 1.
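As an illustration, the sketch below labels a numeric code vector. Only "Utrecht" for the value 1 comes from the notes; the other two city labels and the data are hypothetical.

```r
# Hypothetical coding: 1 = Utrecht, 2 = Amsterdam, 3 = Rotterdam
city_code <- c(1, 2, 1, 3)

city <- factor(city_code,
               levels = c(1, 2, 3),
               labels = c("Utrecht", "Amsterdam", "Rotterdam"))

city               # prints the labels
levels(city)       # the categories
as.integer(city)   # the underlying numerical representation
```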

The dim function gives an overview of the dimensions of a dataset (rows and columns)
dim(boys)
With the head and tail functions, you get the first or last 6 cases. This is a quick way of inspecting your
dataset.

Labels for missing data: <NA> (non-numeric data) or NA (numeric data). Both mean "not available".

The exclamation mark (!) negates a logical value: it turns TRUE into FALSE and FALSE into TRUE.
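A small sketch of how the negation combines with is.na(), using a made-up vector x:

```r
x <- c(4, NA, 7)

miss <- is.na(x)   # FALSE TRUE FALSE
obs  <- !miss      # TRUE FALSE TRUE

x[obs]             # only the observed values: 4 7
```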

To inspect your data you can use different functions:
The structure function gives you an overview of the measurement levels, the class of the
variables, and the first few values of each variable
str(boys)
The summary function gives you information about the distribution for numeric data, and the
frequency table for categorical data, for all the variables.
summary(boys)
If you want to explore a certain variable (column), you use the dollar sign ($). For example the
standard deviation of age in the dataset boys.
sd(boys$age)
If a variable contains missing values, we cannot calculate a standard deviation without telling R how to deal with the missingness.
na.rm = TRUE
means remove the missing values. So, then you will only calculate the standard deviation on
the observed data.
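The effect of na.rm can be seen on a small made-up data frame (toy is hypothetical and stands in for boys):

```r
toy <- data.frame(age = c(10, 12, NA, 16))

sd_raw <- sd(toy$age)                 # NA, because one value is missing
sd_obs <- sd(toy$age, na.rm = TRUE)   # computed on the three observed values

sd_raw
sd_obs
```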

If you want to select data based on two combined conditions, you need two separate evaluations, joined with &.
mean(subset(boys, age < 15 & reg != "north")$age, na.rm = TRUE)
Within subset() you specify your two conditions, and then you use only the variable age of the subset.
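A sketch of the same selection on a made-up data frame (toy and its values are hypothetical). Rows where a condition evaluates to NA are dropped by subset().

```r
toy <- data.frame(
  age = c(10, 12, 14, 16, NA),
  reg = c("north", "south", "east", "south", "west")
)

# Both conditions must hold: younger than 15 AND not from region north
sel <- subset(toy, age < 15 & reg != "north")

sel
mean_age <- mean(sel$age, na.rm = TRUE)
mean_age
```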

When you have loaded a dataset you can open its help page with
?mammalsleep
and it gives you information about the variable names.

To calculate the correlations on each completely observed pair, use
cor(sleepdata, use = "pairwise.complete.obs")
Exclude the categorical columns, for example column one, by using
cor(sleepdata[, -1], use = "pairwise.complete.obs")
However, the correlation matrix has many decimals, so take this into account with the round
function. You can for example round the correlations to two decimals
round(cor(sleepdata[, -1], use = "pairwise.complete.obs"), 2)
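The steps above can be sketched on a small made-up data frame. The column names and values are hypothetical stand-ins for sleepdata.

```r
toy <- data.frame(
  species = c("a", "b", "c", "d", "e"),   # categorical, excluded below
  bw  = c(1.0, 2.5, NA, 4.0, 6.0),
  brw = c(5.0, 9.0, 11.0, NA, 20.0),
  ts  = c(12.0, 10.5, 9.0, 8.0, NA)
)

# Drop column one, use every pair of columns on its jointly observed rows,
# and round the result to two decimals
cors <- round(cor(toy[, -1], use = "pairwise.complete.obs"), 2)
cors
```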

Convenient functions: any object in the workspace can be saved. save.image() saves the whole workspace, save() saves the objects you name.
save.image("Practical_X.RData")
save(sleepdata, file = "Sleepdata.RData")
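A minimal round trip with save() and load(), writing to a temporary folder so nothing in the working directory is touched. The object toy and the file name are hypothetical.

```r
toy <- data.frame(x = 1:3)

path <- file.path(tempdir(), "Toy.RData")
save(toy, file = path)

rm(toy)                # remove the object from the workspace
loaded <- load(path)   # restores toy and returns the names of the restored objects

loaded
toy
```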

If you want to exclude cases, you can do this with the names of the species
exclude <- c("Echidna", "Lesser short-tailed shrew", "Musk shrew")
which <- sleepdata$species %in% exclude
The object which is a logical vector with one element per row of the data, and when you index with it you only get
back the rows for which it says TRUE. So your new dataset without the excluded
cases would be
sleepdata2 <- sleepdata[!which, ]
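The same exclusion on a made-up data frame (species list shortened, brain weights invented). The sketch names the logical vector drop_row, a hypothetical name, so that it does not share a name with the base R function which().

```r
toy <- data.frame(
  species = c("Echidna", "Cat", "Musk shrew", "Dog"),
  brw     = c(25, 25.6, 0.33, 70)   # invented values
)

exclude  <- c("Echidna", "Musk shrew")
drop_row <- toy$species %in% exclude   # TRUE for rows to exclude

toy2 <- toy[!drop_row, ]               # keep the rows that are NOT excluded
toy2
```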

When plotting your variables, you use ~ which indicates that you want to model something,
based on something else. It separates the outcome part from the predictor, allowing for a
visual representation.
plot(brw ~ species, data = sleepdata2)

If you want to find all your cases that are more than one standard deviation above the
mean, you take several steps
sd.brw <- sd(sleepdata2$brw)
mean.brw <- mean(sleepdata2$brw)
which <- sleepdata2$brw > (mean.brw + (1 * sd.brw))
as.character(sleepdata2$species[which])
So, you calculate the standard deviation and the mean of brain weight, then you make a new
object (this overwrites the earlier object which). With which you flag the
cases that are more than one standard deviation above the mean, and you display the species for
which which is TRUE as a character vector.
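The steps above can be sketched on made-up brain weights (toy and the name big are hypothetical):

```r
toy <- data.frame(
  species = c("a", "b", "c", "d", "e"),
  brw     = c(1, 2, 3, 4, 100)
)

sd.brw   <- sd(toy$brw)
mean.brw <- mean(toy$brw)

# TRUE for cases more than one standard deviation above the mean
big <- toy$brw > (mean.brw + 1 * sd.brw)

big_species <- as.character(toy$species[big])
big_species
```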

Practical 2

Object names in R are case-sensitive. This means that
a <- 100
A <- 200
are different objects, each with its own value.

To learn more about the data, use one of the two following help commands
help(nhanes)
?nhanes
To get an overview of the data, use
summary(nhanes)

When you want to explore the missingness in the dataset you can use the summary command,
or
apply(nhanes, MARGIN = 2, FUN = function(x) sum(is.na(x)))
The code ‘applies’ the function that calculates the sum (sum()) over the missings (is.na) on a
set of data (x). The nice thing about apply is that you can apply functions on two-dimensional
objects. In this case you execute a function that calculates the sum of missings (FUN =
function(x) sum(is.na(x))) over the columns (MARGIN = 2) of object nhanes. If you would
change MARGIN = 2 to MARGIN = 1, you would do the same, but over the rows of nhanes.
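A sketch of both margins on a small made-up data frame (toy is hypothetical and only borrows the variable names of nhanes):

```r
toy <- data.frame(
  age = c(1, 2, NA, 3),
  bmi = c(NA, 22.5, NA, 30.1),
  chl = c(187, NA, NA, 199)
)

# Number of missing values per column (MARGIN = 2)
miss_per_col <- apply(toy, MARGIN = 2, FUN = function(x) sum(is.na(x)))

# Number of missing values per row (MARGIN = 1)
miss_per_row <- apply(toy, MARGIN = 1, FUN = function(x) sum(is.na(x)))

miss_per_col
miss_per_row
```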

The function colMeans() calculates the mean of numerical columns
colMeans(nhanes, na.rm=TRUE)
However, you have to specify how you would like to handle the missing values. By using
na.rm=TRUE
it tells R that you would like to remove (rm) the missings (na).
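The difference with and without na.rm, on a made-up data frame (toy is hypothetical):

```r
toy <- data.frame(
  age = c(1, 2, NA, 3),
  bmi = c(NA, 22.5, NA, 30.1),
  chl = c(187, NA, NA, 199)
)

means_raw <- colMeans(toy)                # NA for every column with a missing value
means_obs <- colMeans(toy, na.rm = TRUE)  # means of the observed values only

means_raw
means_obs
```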

To determine how many cases would be available if only the complete cases were used, there
are multiple ways
1 You could look at the data and determine the number of completely observed cases
2 You could use the missing data pattern to deduce the number of cases for which the pattern
1 1 1 1 (everything observed) holds.
3 You could use code to determine the number of cases (rows) that have no missings. For
example:
nrow(na.omit(nhanes))
It performs listwise deletion on the object you use the function on. In other words, it removes
any incomplete row.
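Option 3 can be sketched on a made-up data frame (toy is hypothetical). complete.cases() is shown next to na.omit() as an equivalent way of counting.

```r
toy <- data.frame(
  age = c(1, 2, NA, 3, 2),
  bmi = c(NA, 22.5, NA, 30.1, 27.4),
  chl = c(187, NA, NA, 199, 204)
)

# Listwise deletion: keep only rows without any missing value
n_complete <- nrow(na.omit(toy))

# Same count via a logical vector marking the complete rows
n_complete2 <- sum(complete.cases(toy))

n_complete
```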

To check the missing data pattern, use
md.pattern(nhanes)
Looking at the missing data pattern is always useful (but may be difficult for datasets with
many variables). It can give you an indication on how much information is missing and how
the missingness is distributed.
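To see what such a pattern table contains, the sketch below tabulates the patterns with base R only. It is a rough stand-in for illustration and not how md.pattern() is implemented; toy is a made-up data frame.

```r
toy <- data.frame(
  age = c(1, 2, NA, 3, 2),
  bmi = c(NA, 22.5, NA, 30.1, 27.4),
  chl = c(187, NA, NA, 199, 204)
)

# 1 = observed, 0 = missing, one row per case
observed <- !is.na(toy)

# Collapse each row into a pattern string such as "1 1 1"
pattern <- apply(observed, 1, function(r) paste(as.integer(r), collapse = " "))

# How often does each pattern occur?
pattern_counts <- table(pattern)
pattern_counts
```

The count for the pattern "1 1 1" is the number of completely observed cases.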

If you want to create a missingness indicator to indicate if your variable is missing or not
missing you create a new vector
rbmi <- is.na(nhanes$bmi)
rbmi
You create a new vector rbmi (you can see it as a variable) that indicates whether bmi is
missing (TRUE) or not missing (FALSE), with the same length as the old variable.

To test whether the missingness in one variable depends on another variable, perform a t-test with
t.test(age ~ rbmi, data=nhanes)
You test here whether the missingness in bmi depends on age.
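A sketch of the indicator and the test on made-up data (toy and its values are hypothetical). Unlike in the notes, the indicator is stored as a column of the data frame here, which keeps the example self-contained.

```r
toy <- data.frame(
  age = c(20, 25, 30, 35, 40, 50, 45, 55),
  bmi = c(22.1, 24.3, 21.7, 26.0, NA, NA, NA, NA)
)

# TRUE where bmi is missing, FALSE where it is observed
toy$rbmi <- is.na(toy$bmi)

# Compare mean age between cases with observed and with missing bmi
res <- t.test(age ~ rbmi, data = toy)

res$estimate   # mean age per group (rbmi FALSE, rbmi TRUE)
res$p.value
```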

With a bivariate dataset you can calculate the correlation between the variables with the
following code
cor(data)

With partially incomplete data you can use ad hoc imputation methods to impute the missing
values.
First you need to evaluate the means and correlation of the incomplete data set.
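A sketch of that first step, followed by mean imputation as one example of an ad hoc method. The notes do not say which methods the practical uses, so the choice of mean imputation is an assumption, and toy is a made-up data frame.

```r
toy <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(2, NA, 6, 8, 10)
)

# Means and correlation of the incomplete data
means_inc <- colMeans(toy, na.rm = TRUE)
cor_inc   <- cor(toy, use = "pairwise.complete.obs")

# Assumption: mean imputation, replacing each NA by the observed column mean
imputed <- toy
for (v in names(imputed)) {
  imputed[[v]][is.na(imputed[[v]])] <- mean(imputed[[v]], na.rm = TRUE)
}

imputed
colMeans(imputed)   # same means as the observed data
cor(imputed)        # compare with cor_inc
```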
