False:
Establishing association is easy: it only requires that two variables generally move
together. That does not mean one causes the other, so showing a causal link takes more
work. - answer Is the Following True or False: Establishing causality is generally easier
than establishing association
True:
The point of randomization is to make the treatment and control groups as similar as
possible, so that differences in outcome are unlikely to come from other variables. -
answer Is the Following True or False: Randomizing treatment and control reduces the
risk of potential confounders
False:
Think about it like smoking and cancer. Smoking and cancer are correlated, but smoking
is not a sufficient cause: not every smoker gets cancer. - answer Is the Following True
or False: If x is not a sufficient condition for y, then x is not correlated with y.
False:
For a list, L*2 only repeats the list twice; it does not double the values. - answer Is
the Following True or False: Suppose L is a list in Python containing only numeric
values, then L*2 will return all the numeric values doubled.
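A quick sketch of the difference, using a made-up list (the values are only for illustration):

```python
# Made-up example list; any list of numbers behaves the same way.
L = [1, 2, 3]

# Multiplying a list by an integer repeats it.
repeated = L * 2  # [1, 2, 3, 1, 2, 3]

# To double each value, loop over the elements instead.
doubled = [2 * x for x in L]  # [2, 4, 6]

print(repeated)
print(doubled)
```

A NumPy array behaves differently: multiplying an array by 2 does double every element.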
False:
A statistic varies from one random sample to the next. It estimates the population
parameter, but chance alone can make it differ; for example, a random sample that happens
to include several outliers will not match the parameter. - answer True or False: A
statistic calculated from a random sample must be equal to the population parameter
True:
Yes, that is by definition the idea of a test statistic. - answer Is the Following True
or False: The test statistic tells us the relative plausibility of the null vs
alternative hypothesis.
True - answer Is the Following True or False:
If we fail to reject the null hypothesis when the null is false, we have committed a
Type 2 error
False:
The empirical distribution is the distribution of the values we actually observe.
The sampling distribution is the distribution of a statistic across the many samples we
could have drawn.
So if we roll a die 3 times and tabulate the results, that is an empirical distribution;
if we make a computer simulate rolling a die 3 times over and over and record the average
each time, that approximates a sampling distribution. - answer Is the Following True or
False: The sampling distribution is the empirical distribution of a given sample's values.
True:
This is what a bootstrap is: we resample from our observations with replacement. -
answer Is the Following True or False:
When calculating the sampling distribution for a bootstrap, we use replacement with our
observations
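A minimal sketch of resampling with replacement, assuming NumPy is available. The observations, the seed, and the helper name bootstrap_means are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed so the run is repeatable
observations = np.array([2.1, 3.4, 1.9, 5.0, 4.2])  # made-up sample

def bootstrap_means(sample, repetitions, rng):
    """Resample with replacement and record the mean of each resample."""
    n = len(sample)
    means = np.empty(repetitions)
    for i in range(repetitions):
        # replace=True lets the same observation be drawn more than once
        resample = rng.choice(sample, size=n, replace=True)
        means[i] = resample.mean()
    return means

means = bootstrap_means(observations, 1000, rng)
print(means[:5])
```

Each resample has the same size as the original sample; without replacement every resample would just be the original sample reordered.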
False:
About 95 out of 100 intervals built this way will contain the true value, but any given
interval either contains it or does not, so it does not have a probability. - answer Is
the Following True or False:
A given 95% confidence interval captures the true value of a parameter with a
probability of 95%
True:
Yes, as you add more and more data points, the sample statistics bunch closer together
and individual outliers matter less. - answer Is the Following True or False:
As our sample size increases, the standard deviation of the sampling distribution gets
smaller
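False:
There are decreasing marginal returns: the standard error of a sample mean shrinks with
the square root of the sample size. - answer Is the Following True or False:
The standard error decreases in a linear fashion, i.e. increasing the sample size by a
factor of 2 decreases the error by a factor of 2.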
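A small sketch of the decreasing returns, assuming the usual formula for the standard error of a sample mean (population SD divided by the square root of the sample size). The SD of 10 and the sample sizes are made up:

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

se_100 = standard_error(10.0, 100)
se_200 = standard_error(10.0, 200)
se_400 = standard_error(10.0, 400)

# Doubling the sample size shrinks the error by sqrt(2), not by 2.
ratio = se_100 / se_200

print(se_100, se_200, se_400, ratio)
```

To cut the standard error in half, the sample size has to go up by a factor of 4, as in the step from 100 to 400.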
False - answer Is the Following True or False:
The data used in data science is always numeric
True:
That is a feature of r. - answer Is the Following True or False:
The correlation r is unitless.
True. - answer Is the Following True or False:
A given 90% confidence interval captures the true value of the parameter or it does not
The basic idea of association is that it is a measure of how related two variables are.
It is unitless because it is just a measure of how one thing compares to another. -
answer association
To standardize association, we take the points, subtract the mean from them, and then
divide them by the standard deviation. This puts everything in unitless, standard terms:
how many standard deviations each point is from the mean. Also, z-scores are centered at
0, and 1 is one standard deviation away. This allows us to compare data. - answer How
does association relate to z scores
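A minimal sketch of converting to z-scores, assuming NumPy. The helper name standard_units and the heights are made up for illustration:

```python
import numpy as np

def standard_units(values):
    """Subtract the mean, then divide by the standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])  # made-up data
z = standard_units(heights_cm)

print(z)         # unitless: how many SDs each height is from the mean
print(z.mean())  # approximately 0
print(z.std())   # approximately 1
```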
The correlation coefficient is written as r. It measures how strong a linear association
is and lies between -1 and 1. Association is about how two variables are related in
general; correlation is about how closely they follow a straight line. - answer What is
correlation, the correlation coefficient, and how do they relate to linearity.
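One way to sketch r is as the average product of the two variables in standard units. This assumes NumPy; the helper names and data are made up, and np.corrcoef is used only as a cross-check:

```python
import numpy as np

def standard_units(values):
    """Convert values to z-scores."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def correlation(x, y):
    """r as the mean of the products of the z-scores."""
    return np.mean(standard_units(x) * standard_units(y))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # made-up data
y_up = 2 * x + 1                            # perfect upward line
y_down = -3 * x + 10                        # perfect downward line
y_noisy = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
r_noisy = correlation(x, y_noisy)

print(r_up, r_down, r_noisy)
print(np.corrcoef(x, y_noisy)[0, 1])  # NumPy's built-in r for comparison
```

Points exactly on a line give r of 1 or -1 depending on the direction of the slope; scattered points land somewhere in between.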
Linear regression uses this correlation to fit a straight line that lets us predict
where a point will be. The line should be chosen so that it minimizes error. - answer
What is a linear regression
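A minimal sketch of building the line from r, assuming the standard least squares formulas (slope = r * SD of y / SD of x, and the line passes through the point of means). The data and helper name are made up; np.polyfit is only a cross-check:

```python
import numpy as np

def regression_line(x, y):
    """Return (slope, intercept) of the least squares line, built from r."""
    r = np.corrcoef(x, y)[0, 1]
    slope = r * y.std() / x.std()
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

slope, intercept = regression_line(x, y)
prediction_at_6 = slope * 6 + intercept

print(slope, intercept, prediction_at_6)
print(np.polyfit(x, y, 1))  # NumPy's least squares fit for comparison
```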
A residual is our observed value minus our predicted value (y hat). It is often what we
want to minimize. Least squares regression minimizes the sum of the squared residuals,
so points far from the line (outliers) count more heavily. - answer What is a residual
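A short sketch of residuals for a least squares line, assuming NumPy. The data is made up, and the "nudged" slope is only there to show that a different line has a larger squared error:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Least squares line.
slope, intercept = np.polyfit(x, y, 1)

predicted = slope * x + intercept
residuals = y - predicted  # observed minus predicted

sum_squared = np.sum(residuals ** 2)

# Any other line, such as one with a slightly different slope, does worse.
nudged_residuals = y - ((slope + 0.1) * x + intercept)
nudged_sum_squared = np.sum(nudged_residuals ** 2)

print(residuals)
print(sum_squared, nudged_sum_squared)
```

Because the residuals are squared before being added up, one point far from the line contributes much more than several points close to it.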
This is where the residuals are more spread out along some parts of our line than along
others; we would like the spread to be roughly even.
Also note that if there are stretches where all the points are above or below the line of
best fit, it probably indicates a non-linear relationship. - answer What is non-constant
error variance or heteroscedasticity
As stated above, r is our correlation coefficient. Squaring it gets us r^2, which tells
us how much of the variation in y is explained by a linear relationship with x. -
answer What is r^2
When we take a sample, we are trying our best to estimate a population. There is
naturally going to be error in this, and we want to know how much our random sample would
vary from any other we could have taken. So we use a large sample and resample from it
with replacement to make many other possible samples, and then calculate the slope for
each. - answer What is the purpose of bootstrapping the slope
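A minimal sketch of bootstrapping the slope, assuming NumPy. The data is simulated around a made-up true slope of 2, and the helper names and seed are hypothetical:

```python
import numpy as np

def fit_slope(x, y):
    """Slope of the least squares line through (x, y)."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

def bootstrap_slopes(x, y, repetitions, rng):
    """Resample (x, y) pairs with replacement and refit the slope each time."""
    n = len(x)
    slopes = np.empty(repetitions)
    for i in range(repetitions):
        # Draw row indices with replacement so each x stays paired with its y.
        idx = rng.integers(0, n, size=n)
        slopes[i] = fit_slope(x[idx], y[idx])
    return slopes

rng = np.random.default_rng(42)             # arbitrary seed
x = np.linspace(0, 10, 50)                  # made-up sample
y = 2.0 * x + rng.normal(0, 1.0, size=50)   # true slope of 2 plus noise

slopes = bootstrap_slopes(x, y, 500, rng)
print(slopes.mean(), slopes.std())
```

The spread of the recorded slopes gives a sense of how much the slope could change from one sample to another.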
It is where we record all the slopes that were calculated from our bootstrap and plot
them. - answer What is a sampling distribution
Say, for instance, we choose a 95% confidence interval. It is the range that holds the
center 95% of the estimates from our bootstrap, meaning from the 2.5th to the 97.5th
percentile. About 95% of intervals built this way capture the true value. That is not
the same as a 95% chance that the true value falls within a particular interval: once
the interval is calculated, the value is binary and either is or is not in this range. -
answer What is a confidence interval
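A minimal sketch of taking the center 95% of a set of bootstrap estimates, assuming NumPy. The estimates here are just the numbers 1 to 1000 as a stand-in, and the helper name percentile_interval is made up:

```python
import numpy as np

def percentile_interval(estimates, level=95):
    """Return the (lower, upper) bounds holding the middle `level` percent."""
    tail = (100 - level) / 2  # 2.5 for a 95% interval
    lower = np.percentile(estimates, tail)
    upper = np.percentile(estimates, 100 - tail)
    return lower, upper

# Stand-in for bootstrap estimates (for example, resampled slopes).
estimates = np.arange(1, 1001)

lower, upper = percentile_interval(estimates, 95)

# Share of the estimates that fall inside the interval.
inside = np.mean((estimates >= lower) & (estimates <= upper))

print(lower, upper, inside)
```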
Classification is taking what we understand about the independent variables and using
them to help us predict the category of the dependent variable. - answer What is
classification
Observations are our dependent variables, or more or less what we want to predict. Our
attributes are the independent variables by which we go about trying to predict the
observation. - answer What are observations in machine learning?
What are attributes in machine learning?