Lecture 1: Fundamental Concepts, Applications, and Process of Data
Science
Objectives of this lecture:
• Explain and understand concepts central to this course such as AI, machine learning,
data mining, etc.
• Explain the importance of data for business
• Identify applications and tasks that can be solved by data analytics and support decision
making
• Explain and apply the analytics process model for developing data-driven business
solutions
Terminology and Fundamental Concepts of Data Science:
• Querying and reporting:
o You know exactly what you are looking for.
o SQL: SELECT * FROM CUSTOMERS WHERE AGE > 45
• OLAP: Online Analytical Processing:
o GUI to query large data collections in real time
o Pre-programmed dimensions of analysis (→ faster to find information than with querying)
o Summary level
⇒ For both querying and OLAP: no modeling or pattern finding.
[Figure: OLAP GUI example]
⇒ Classic Business Intelligence: you know what you are looking for → Query/OLAP
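The difference between a plain query and an OLAP-style summary can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only an illustration: the CUSTOMERS table and its AGE column come from the SQL example above, while the NAME, REGION, and SPEND columns and all rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical CUSTOMERS table. Only the table name
# and the AGE column come from the notes; the other columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (NAME TEXT, AGE INTEGER, REGION TEXT, SPEND REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        ("Ann", 52, "North", 120.0),
        ("Bob", 38, "North", 80.0),
        ("Cem", 47, "South", 200.0),
        ("Dee", 61, "South", 50.0),
    ],
)

# Querying: you know exactly what you are looking for.
older_customers = conn.execute(
    "SELECT NAME, AGE FROM CUSTOMERS WHERE AGE > 45 ORDER BY NAME"
).fetchall()

# OLAP-style summary: aggregate along a predefined dimension (here: REGION).
spend_by_region = conn.execute(
    "SELECT REGION, COUNT(*), SUM(SPEND) FROM CUSTOMERS "
    "GROUP BY REGION ORDER BY REGION"
).fetchall()

print(older_customers)
print(spend_by_region)
conn.close()
```

Both statements only retrieve or summarize what is already stored. Neither one fits a model or discovers a pattern, which is the point made above about classic Business Intelligence.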
• Data Science: “A set of fundamental principles that guide extraction of knowledge from
data”
• Data Mining: “The extraction of knowledge from data, via technologies that incorporate
these principles”
• Big Data: “Data that is so large that traditional data storage and processing systems are
unable to deal with it”
⇒ You don’t know what you are looking for, or you want to find new, intricate patterns in the (big) data
→ Data Mining (to create value from unprocessed data)
Technologies:
⇒ Last decade: AI has come to rely more and more on ML (machine learning), and ML on DL
(deep learning), but the terms are not synonyms!
Examples of Applications of Data Science:
A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its
whereabouts are unknown.
The incident occurred on the downtown train line, which runs from Covington and Ashland
stations. In an email to Ohio news outlets, the U.S. Department of Energy said it is working with
the Federal Railroad Administration to find the thief. “The theft of this nuclear material will
have significant negative consequences on public and environmental health, our workforce
and the economy of our nation,” said Tom Hicks, the U.S. Energy Secretary, in a statement.
“The safety of people, the environment and the nation’s nuclear stockpile is our highest
priority,” Hicks said. “We will get to the bottom of this and make no excuses.”
⇒ What could be the relevance of this article regarding AI?
o This text was partly written by an AI: only the part in bold was written by a
human. The rest was filled in by an AI using only the first sentence as a base.
⇒ What could be the relevance of these pictures regarding AI?
o Both people shown are AI-generated images.
Concerns?
• Modern ML techniques are very good at learning complex patterns in data to solve
certain types of predefined tasks
• Data science harnesses these techniques to solve commercial and business issues to
create value
Data:
• At the basis of all of this: data!
• What is data?
o Raw stream of facts
Sometimes big:
• The Large Hadron Collider (LHC at CERN) has 150 million sensors, together generating
about 40 million measurements per second
• Walmart registers more than a million customer transactions per hour
Data as a strategic asset:
• Data can lead to better decision making through data science
• Data → information/knowledge
• Data is a valuable asset
Which types of decisions to support through data science:
• Decisions for which discoveries need to be made:
o Usually high impact
o E.g., prediction of demand shocks in times of crisis
• Decisions that repeat, especially at massive scale:
o Decision-making can benefit from even small increases in accuracy obtained
through data analysis.
o E.g., credit scoring
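Why small accuracy gains matter for repeated decisions can be made concrete with a little arithmetic. The sketch below is purely illustrative: the number of decisions, the error rates, and the cost per wrong decision are hypothetical figures chosen for the example, not values from the lecture.

```python
def expected_losses(n_decisions, error_rate, cost_per_error):
    # Expected total cost of wrong decisions across all repetitions.
    return n_decisions * error_rate * cost_per_error


# Hypothetical figures: one million credit decisions, 1000 lost per wrong one.
n_decisions = 1_000_000
cost_per_error = 1000.0

baseline = expected_losses(n_decisions, 0.050, cost_per_error)  # 5.0% errors
improved = expected_losses(n_decisions, 0.049, cost_per_error)  # 4.9% errors
savings = baseline - improved

print(f"Baseline losses: {baseline:,.0f}")
print(f"Improved losses: {improved:,.0f}")
print(f"Savings from a 0.1 percentage point improvement: {savings:,.0f}")
```

Under these assumed numbers, an improvement of a tenth of a percentage point is worth roughly a million in avoided losses, simply because the same decision is taken so many times.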
The Data Science Process:
Important technology: Machine learning
⇒ Learns from data.
o But what is learning?
Learning:
• We usually learn a function:
y = f(x)
• f: a mathematical or logical formula:
o Can be learned using algorithms that learn f(x) from data, from examples
o E.g., f() is a program that identifies cats in video data
o Gets better with more examples → Remember: this is machine learning
OR:
o The mapping of x to y can be hardcoded into the program → the solution is thus
not “learned”
Example:
• y = f(x) looks suspiciously like linear regression
⇒ But f is often more complex!
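The contrast between a hardcoded and a learned f can be sketched as follows. This is a minimal illustration that assumes the simplest linear form, y = slope * x + intercept, and uses ordinary least squares as the learning algorithm. The function names and the training examples are hypothetical.

```python
def hardcoded_f(x):
    # Hardcoded mapping: the rule is written by the programmer, nothing is learned.
    return 2.0 * x + 1.0


def fit_line(xs, ys):
    # Learn slope and intercept from examples with ordinary least squares.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training examples that follow y = 2x + 1 exactly (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)


def learned_f(x):
    # Learned mapping: the parameters come from the data, not from the programmer.
    return slope * x + intercept


print(slope, intercept)
print(learned_f(10.0), hardcoded_f(10.0))
```

Both functions give the same answer here because the examples were generated from the hardcoded rule. The difference is where the rule comes from: `learned_f` would adapt if the examples changed, while `hardcoded_f` would not.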