All the lectures are about data-driven decision making.
Data brings in information, but it can also bring in garbage. When we do not distinguish the
information from the garbage, we fall into the trap of "garbage in, garbage out": garbage data
coming in leads to garbage decisions.
FSA Lecture 3 1
You can have the best model, but if your data is garbage, your results will be garbage.
You also need good models: even with perfect data, a garbage model will give you garbage
results.
There is one exception: the data may contain some garbage (outliers, duplicates, missing
values), yet good models, namely models that can deal with that garbage, will still lead to
reliable results.
That is the use of robust models: models that are robust to these types of problems in the data.
Either we avoid the garbage by cleaning the data, or we design the models to be robust so
that they still make reliable decisions in the presence of garbage.
➔ Importance of data cleaning (and of doing it efficiently)
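The robustness idea can be illustrated with the simplest possible "model", a summary statistic. This is a minimal sketch with made-up numbers (not taken from the lecture): the mean is not robust, so a single garbage value drags it far away, whereas the median, a standard robust estimator, barely moves. The function name `summarize` is hypothetical.

```python
from statistics import mean, median


def summarize(values):
    """Return (mean, median) of a list of numbers."""
    return mean(values), median(values)


# Hypothetical readings; the values are illustrative only.
clean = [21.0, 22.5, 19.8, 20.4, 21.7]
# The same readings plus one garbage value (e.g. a typo: 2150.0 instead of 21.50).
dirty = clean + [2150.0]

for label, data in (("clean", clean), ("dirty", dirty)):
    m, med = summarize(data)
    print(f"{label}: mean={m:.2f}, median={med:.2f}")
```

With the garbage value added, the mean jumps from about 21 to about 376, while the median only moves from 21.0 to 21.35.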
Data cleaning is unavoidable when handling data. Fortunately, a big part of it can be
automated: laid down in routines and therefore delegated to algorithms that do the work.
Even so, we find that data scientists spend most of their time collecting, cleaning and
organizing data.
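As an illustration of delegating routine checks to an algorithm, here is a minimal diagnostic sketch using only the Python standard library. The function name `diagnose`, the record layout (tuples, with `None` marking a missing value) and the outlier rule (Tukey's fences, 1.5 × IQR beyond the quartiles) are assumptions chosen for the example, not something the lecture prescribes. It counts the three kinds of dirty data discussed in these notes: duplicates, missing values and outliers.

```python
from statistics import quantiles


def diagnose(rows, column, k=1.5):
    """Count duplicate rows, missing values and outliers in one numeric column.

    rows   -- list of tuples, one tuple per record (hashable, so they can be compared)
    column -- index of the numeric column to inspect; None marks a missing value
    k      -- fence multiplier for the IQR outlier rule (1.5 is Tukey's usual choice)
    """
    # Duplicates: rows identical to a row already seen.
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)

    # Missing values in the chosen column.
    values = [row[column] for row in rows]
    missing = sum(v is None for v in values)

    # Outliers: observed values more than k * IQR outside the quartiles.
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = sum(v < low or v > high for v in observed)

    return {"duplicates": duplicates, "missing": missing, "outliers": outliers}


# Hypothetical records: (id, measurement).
rows = [
    ("A", 21.0), ("B", 22.5), ("C", None), ("D", 19.8),
    ("B", 22.5), ("E", 2150.0), ("F", 20.4), ("G", 21.7),
]
print(diagnose(rows, column=1))  # prints {'duplicates': 1, 'missing': 1, 'outliers': 1}
```

Because the check is a plain function, it can be run on every new batch of data without manual inspection, which is the kind of automation the paragraph above refers to.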
Data cleaning is a bit like going to the doctor. First the doctor needs to diagnose what is
going on, and then propose a treatment. Here we also first diagnose the type of dirty data:
duplicates, missing values or outliers. Depending on the type of dirty data, we apply a
different solution. Duplicates are removed. Missing values can also be removed, or we can do
imputation, which means replacing each missing value with a reasonable number. Similarly,
outliers can either be removed or replaced with a reasonable value.
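The treatments described above can be sketched in the same way. This is a hypothetical, minimal version: duplicates are dropped (keeping the first occurrence), missing values are imputed with the median of the observed values, and values outside Tukey's fences are replaced by the median of the values inside them. The function names and the choice of the median as the "reasonable value" are assumptions; the mean, a domain-specific default, or simply dropping the affected rows are other options.

```python
from statistics import median, quantiles


def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def impute_median(values):
    """Replace missing values (None) with the median of the observed values."""
    fill = median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def replace_outliers(values, k=1.5):
    """Replace values outside Tukey's fences with the median of the values inside them."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    fill = median([v for v in values if low <= v <= high])
    return [v if low <= v <= high else fill for v in values]


# Hypothetical records: (id, measurement); None marks a missing value.
rows = [
    ("A", 21.0), ("B", 22.5), ("C", None), ("D", 19.8),
    ("B", 22.5), ("E", 2150.0), ("F", 20.4), ("G", 21.7),
]

unique = remove_duplicates(rows)                # 1. duplicates: remove
values = impute_median([r[1] for r in unique])  # 2. missing values: impute
cleaned = replace_outliers(values)              # 3. outliers: replace
print(cleaned)
```

Imputing before handling outliers is safe here only because the median is itself robust to the outlier; imputing with the mean would have copied the influence of the garbage value (about 376 instead of 21.35) into the filled-in cell.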