Lecture 1: Fundamental Concepts, Applications, and Process of Data
Science
Objectives of this lecture:
• Explain and understand concepts central to this course such as AI, machine learning,
data mining, etc.
• Explain the importance of data for business
• Identify applications and tasks that can be solved by data analytics and support decision
making
• Explain and apply the analytics process model for developing data-driven business
solutions
Terminology and Fundamental Concepts of Data Science:
• Querying and reporting:
o You know exactly what you are looking for.
o SQL: SELECT * FROM CUSTOMERS WHERE AGE > 45
• OLAP: Online Analytical Processing:
o GUI to query large data collections in real time
o Pre-programmed dimensions of analysis (→ faster to find information than with
querying)
o Summary level
⇒ For both Querying and OLAP: No modeling or pattern finding.
[Figure: OLAP GUI example]
⇒ Classic Business Intelligence: You know what you are looking for → Query/OLAP
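A minimal sketch of the difference between the two, using Python's built-in sqlite3 module. The table contents, the extra columns (name, region, revenue) and the GROUP BY query are assumptions made up for illustration; only the AGE > 45 filter comes from the notes. The first query is plain querying/reporting, the second mimics the summary-level view an OLAP tool would offer along one pre-defined dimension (region).

```python
import sqlite3

# In-memory database with a small, made-up CUSTOMERS table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, age INTEGER, region TEXT, revenue REAL)"
)
rows = [
    ("Ann", 52, "North", 120.0),
    ("Bob", 38, "North", 80.0),
    ("Cleo", 61, "South", 200.0),
    ("Dev", 47, "South", 50.0),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)

# Querying and reporting: you know exactly which records you want.
older = conn.execute(
    "SELECT name FROM customers WHERE age > 45 ORDER BY name"
).fetchall()

# OLAP-style summary: aggregate along one dimension of analysis (region).
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(revenue) FROM customers "
    "GROUP BY region ORDER BY region"
).fetchall()

print(older)    # customers older than 45
print(summary)  # (region, number of customers, total revenue)
```

In both cases the question is fixed in advance; neither query builds a model or discovers a pattern that was not asked for.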
• Data Science: “A set of fundamental principles that guide extraction of knowledge from
data”
• Data Mining: “The extraction of knowledge from data, via technologies that incorporate
these principles”
• Big Data: “Data that is so large that traditional data storage and processing systems are
unable to deal with it”
⇒ You don’t know what you are looking for, or you want to find new, intricate patterns in the (big) data
→ Data Mining (to create value from unprocessed data)
Technologies:
⇒ Last decade: AI has come to rely more and more on ML, and ML on DL, but the three terms are
not synonyms!
Examples of Applications of Data Science:
A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its
whereabouts are unknown.
The incident occurred on the downtown train line, which runs from Covington and Ashland
stations. In an email to Ohio news outlets, the U.S. Department of Energy said it is working with
the Federal Railroad Administration to find the thief. “The theft of this nuclear material will
have significant negative consequences on public and environmental health, our workforce
and the economy of our nation,” said Tom Hicks, the U.S. Energy Secretary, in a statement.
“The safety of people, the environment and the nation’s nuclear stockpile is our highest
priority,” Hicks said. “We will get to the bottom of this and make no excuses.”
⇒ What could be the relevance of this article regarding AI?
o This text was partly written by an A.I.: only the part in bold was written by a
human. The rest was filled in by the A.I. using only the first sentence as a base.
⇒ What could be the relevance of these pictures regarding AI?
o Both people are images generated by A.I.
Concerns?
• Modern ML techniques are very good at learning complex patterns in data to solve
certain types of predefined tasks
• Data science harnesses these techniques to solve commercial and business issues to
create value
Data:
• At the basis of all of this: data!
• What is data?
o Raw stream of facts
Sometimes big:
• The Large Hadron Collider (LHC at CERN) has 150 million sensors, together generating
about 40 million measurements per second
• Walmart registers more than a million customer transactions per hour
Data as a strategic asset:
• Data can lead to better decision making through data science
• Data → information/knowledge
• Data is a valuable asset
Which types of decisions to support through data science:
• Decisions for which discoveries need to be made:
o Usually high impact
o E.g., prediction of demand shocks in times of crisis
• Decisions that repeat, especially at massive scale:
o Because the decision is repeated so often, even a small increase in accuracy
obtained through data analysis adds up to a large benefit.
o E.g., credit scoring
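A back-of-the-envelope sketch of why small accuracy gains matter for repeated decisions. All numbers below (one million decisions, accuracy going from 90% to 91%) are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the lecture.

```python
def extra_correct_decisions(n_decisions: int,
                            accuracy_before: float,
                            accuracy_after: float) -> float:
    """Expected number of additional correct decisions after an accuracy gain."""
    return n_decisions * (accuracy_after - accuracy_before)


# Hypothetical scenario: one million credit decisions, accuracy up by one point.
gain = extra_correct_decisions(1_000_000, 0.90, 0.91)

# Round to hide floating-point noise in the subtraction.
print(round(gain))
```

A one-point improvement is barely noticeable for a single decision, but multiplied over a million decisions it changes the outcome of roughly ten thousand of them.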
The Data Science Process:
Important technology: Machine learning
⇒ Learns from data.
o But what is learning?
Learning:
• We usually learn a function:
y = f(x)
• f: a mathematical or logical formula:
o Can be learned using algorithms that learn f(x) from data, from examples
o E.g.: f() a program to identify cats in video data
o Gets better with more examples → Remember: Machine learning
OR:
o The mapping of x to y can be hardcoded (the programmer specifies what the
program does) → the solution is thus not “learned”
Example:
• y = f(x) looks suspiciously like linear regression:
⇒ But often more complex!
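A minimal sketch contrasting a hardcoded f with a learned one, for the simplest case where f is a straight line. The "true" relationship y = 2x + 1, the noise level and the function names are assumptions made up for this illustration; the fit uses the standard closed-form least-squares formulas for simple linear regression.

```python
import random


def f_hardcoded(x):
    """Hardcoded mapping: a person wrote the rule, nothing is learned."""
    return 2.0 * x + 1.0


def fit_line(xs, ys):
    """Learn slope and intercept from (x, y) examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training examples: points near the line y = 2x + 1 with a little noise.
rng = random.Random(0)
xs = [i * 0.2 for i in range(51)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

slope, intercept = fit_line(xs, ys)


def f_learned(x):
    """Learned mapping: parameters were estimated from the examples."""
    return slope * x + intercept


print(slope, intercept)
print(f_hardcoded(3.0), f_learned(3.0))
```

The learned function ends up close to the hardcoded one, but its parameters came from the examples rather than from a programmer. Real machine learning models typically use a far more complex f than a straight line.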