Summary STA130 Midterm Aid Sheet

This is the study document I used to study for the midterm. We were able to use an aid sheet during the exam, and you can use this to inspire the content and layout of yours. I would add more information on confidence intervals, and more code examples.

  • January 17, 2023
  • 2
  • 2022/2023
  • Summary
ralwab
Modern Stats+DS software/programming/computational tools → mathematical+algorithmic data/statistical analysis methodologies →
explained+advocated w/ written+verbal communication → facilitate data-driven and evidence-based decision making
Learning first learning, structured course material is good → it’s faster to learn and troubleshoot problems yourself
Jupyterhub is a cloud-based service → run R/Rstudio from any web browser. Jupyterhub > Rstudio GUI IDE program that wraps… > R) > tidyverse
R Markdown Reproducibility (text+outputs+code)
R methods+algorithms usually built-in/loaded from packages → most R users don’t build algorithms/data types
tidyverse Key set of R packages that help facilitate modern stats+DS
bias survivorship bias → look at data that survived and doesn’t look at group with no data
alpha significance
Basic Functions glimpse() → summary printout shows variables vertically & shows no. of rows
head() → output is tibble & doesn’t show total no. of rows & can see n rows
c() → vector | all() → output is boolean | sum() → translate logical TRUE to numeric 1 and logical FALSE to numeric 0
help() | names() → column names
data/variable types numerical (cont, disc) | categorical (nom, ord, bin → binary categorical variables = logical T/F boolean variables) | 123 & 1.23 are the same type for R (double)

Coercion
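A minimal sketch of the basic functions and of coercion, in base R only. The vector `x` and its values are made up for illustration and are not from the course.

```r
# c() builds a vector; these values are hypothetical
x <- c(3, 8, NA, 5)
is_big <- x > 4                # logical vector: FALSE TRUE NA TRUE

typeof(123)                    # "double": 123 and 1.23 are stored the same way
typeof(123L)                   # "integer": the L suffix asks for an integer

all(x > 0, na.rm = TRUE)       # TRUE: every non-missing value is positive
sum(is_big, na.rm = TRUE)      # 2: TRUE is coerced to 1 and FALSE to 0
sum(is.na(x))                  # 1: number of missing values

mixed <- c(1, "a", TRUE)       # mixed types are coerced to character
typeof(mixed)                  # "character"
```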




Visualisation Func coord_flip(), order geom_bar, labs(x= , y= )
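One possible way to combine these plotting functions, assuming the ggplot2 package is installed. The `pets` data frame is hypothetical, and ordering the bars through factor levels is just one approach, not necessarily the one used in the course.

```r
library(ggplot2)

# Hypothetical data, for illustration only
pets <- data.frame(pet = c("cat", "dog", "dog", "fish", "dog", "cat"))

# Order the bars by frequency by setting the factor levels explicitly
pets$pet <- factor(pets$pet, levels = names(sort(table(pets$pet))))

p <- ggplot(pets, aes(x = pet)) +
  geom_bar() +                   # bar height = count of each category
  coord_flip() +                 # swap axes to get horizontal bars
  labs(x = "Pet", y = "Count")   # axis labels
```

Calling `print(p)` draws the chart.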
Distributional Characteristics 1st → centre/location: median, mean, mode
2nd → Spread/scale statistics: IQR, variance, SD
3rd/Higher order characteristics → skewness+modality+outliers
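The centre and spread statistics above can be computed with base R. The `scores` vector is hypothetical and chosen so that one large value pulls the mean above the median.

```r
# Hypothetical scores with one high outlier (98)
scores <- c(55, 62, 68, 70, 71, 74, 98)

centre_mean   <- mean(scores)     # pulled upward by the outlier
centre_median <- median(scores)   # 70: the middle value, robust to the outlier

spread_iqr <- IQR(scores)         # 7.5: distance between the 1st and 3rd quartiles
spread_var <- var(scores)         # sample variance
spread_sd  <- sd(scores)          # sample SD, the square root of the variance
```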
Truly tidy data Rows→ observations | columns→ variables | cell→ single measurement
Tidy data benefits Can use same tools in similar ways for diff datasets vs hard to reuse untidy data & one-time approaches
print vs head print → outputs n number of rows indicated.
Data Wrangling Functions (dplyr) select() → extract subset of variables | remove variable w/ ‘-’ and rename w/ ‘=’ vs dplyr::rename()
filter() → extract rows based on conditions in one+ columns & filter(is.na())
arrange() → sort observation based on values in one or more variables & desc()
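mutate() → make new column w/ interesting variables & case_when(<condition> ~ <value>, eg. b>=a ~ “Female”) → in case_when, the condition goes left of ‘~’ and the assigned value goes right of it. In a model formula, ‘~’ = response (L) DEPENDS ON explanatory variables (R)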
Aggregation functions → summarise(n=n() → sample size *counts every row, including rows with NA values, <obj>=sum(), median(), mean(),
var(), sd(), IQR(), quantile(<obj>, 0.75), min(), max())
group_by() %>% → group rows by column values
is.na() | !is.na()
na.rm = TRUE → argument (not a function) that ignores/excludes NA, eg. mean(x, na.rm = TRUE)
Other: n_distinct()
%in% → see if an element is in dataframe/vector | levels() and nlevels()
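A short sketch chaining the wrangling functions above, assuming the dplyr package is installed. The `students` data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data; none of these values come from the course
students <- data.frame(
  name  = c("Ana", "Ben", "Cy", "Dee", "Eli"),
  year  = c(1, 2, 1, 2, 1),
  score = c(82, 67, NA, 91, 74)
)

wrangled <- students %>%
  filter(!is.na(score)) %>%                        # drop rows with a missing score
  mutate(result = case_when(
    score >= 80 ~ "high",                          # condition ~ value
    TRUE        ~ "other"                          # fallback for all other rows
  )) %>%
  select(student = name, year, score, result) %>%  # rename with new = old
  arrange(desc(score))                             # highest score first

by_year <- students %>%
  group_by(year) %>%
  summarise(
    n          = n(),                              # counts rows, NA included
    n_scored   = sum(!is.na(score)),               # counts non-missing scores
    mean_score = mean(score, na.rm = TRUE)         # na.rm = TRUE skips NA
  )

n_distinct(students$year)                          # 2 distinct years
"Dee" %in% students$name                           # TRUE
```

Note that `n()` reports 3 rows for year 1 even though only 2 of them have a score, which is why a separate non-missing count can be useful.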
Inference Theoretical populations vs Actual samples → population-(sampling)->sample-(inference)->population
Sample statistic


x̄ →
Hypothesis Testing Functions




[i] → indexing into a vector, matrix, array, list or dataframe
Steps 1. Null Hypothesis → assumed value of parameter H0 : p=0.5 (gives the sampling distribution that the observed test stat is compared against) & Alternative Hypothesis → H1 : p≠0.5 (Null is FALSE)
2. Set α-significance level → reject H0 for p-values less than α. α is also the probability of a Type I error (rejecting a true H0) … a Type II error is failing to reject a false H0
3. Simulate Sampling Distribution assuming NULL is TRUE
4. Compute p-value → The probability [can be approximated by simulation] of observing a test statistic that is as or more extreme than the one we got if the NULL Hypothesis is actually TRUE
5. “Reject H0 at α-significance level” if p-value is less than α OTHERWISE “fail to reject H0 at α-significance level”
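The five steps can be sketched in base R for H0 : p=0.5. The sample size, observed count, seed and number of repetitions are all hypothetical, and `rbinom()` is used here as one convenient way to simulate under the null; a `sample()`-based loop would work as well.

```r
set.seed(130)                    # arbitrary seed so the simulation is reproducible

n_obs     <- 40                  # hypothetical sample size
obs_count <- 26                  # hypothetical observed number of successes (26/40 = 0.65)
p_null    <- 0.5                 # step 1: value of p assumed under H0
alpha     <- 0.05                # step 2: significance level
reps      <- 10000               # number of simulated samples

# Step 3: simulate the sampling distribution assuming H0 is TRUE
sim_counts <- rbinom(reps, size = n_obs, prob = p_null)

# Step 4: two-sided p-value = share of simulated counts at least as far
# from the null expectation (n_obs * p_null) as the observed count
expected <- n_obs * p_null
p_value  <- mean(abs(sim_counts - expected) >= abs(obs_count - expected))

# Step 5: compare the p-value with alpha
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
```

Working with counts instead of proportions keeps the "as or more extreme" comparison free of floating-point rounding issues.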
Example Two Sample Hypothesis Test 1. pick α=0.05 & placebo: 0.58 & actual: 0.75
2. Test stat → observed difference between the groups (sample estimates of μ1=0.58 & μ2=0.75): 0.75-0.58 = 0.17
3. H0 : μ1=μ2 → μ1-μ2=0 & H1 : μ1≠μ2
4. Simulate sampling distribution assuming NULL is TRUE → set.seed() and n repetitions
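A sketch of the simulation step for this example as a permutation test in base R. The sheet gives no group sizes or raw data, so the 100 observations per group, the 0/1 outcomes, the seed and the number of repetitions are all assumptions made only to reproduce the 0.58 and 0.75 group values.

```r
set.seed(130)                                  # arbitrary seed for reproducibility

# Hypothetical data: 100 per group, built to give group means 0.58 and 0.75
outcome <- c(rep(c(1, 0), times = c(58, 42)),  # placebo group
             rep(c(1, 0), times = c(75, 25)))  # actual treatment group
group   <- rep(c("placebo", "actual"), each = 100)

# Observed test statistic: difference in group means (0.75 - 0.58)
obs_diff <- mean(outcome[group == "actual"]) - mean(outcome[group == "placebo"])

# Under H0 the group labels are interchangeable, so shuffle them repeatedly
reps <- 5000
sim_diffs <- replicate(reps, {
  shuffled <- sample(group)                    # random relabelling of all rows
  mean(outcome[shuffled == "actual"]) - mean(outcome[shuffled == "placebo"])
})

# Two-sided p-value: share of shuffled differences as or more extreme
p_value <- mean(abs(sim_diffs) >= abs(obs_diff))
alpha   <- 0.05
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
```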
