Data-Driven Artificial Intelligence
Lecture 1: Introduction
AI breakthroughs
- 2016: AlphaGo beats Go master Lee Se-dol
- 2021: DALL-E: creating images from text, e.g. type in “an oil pastel drawing
of an annoyed cat in a spaceship”
- 2022: ChatGPT: ask any question
The hype curve on AI 2023
Power of data
Data flood
- 1000 MB = 1 gigabyte (GB); 1000 GB = 1 terabyte (TB)
- 1000 TB = 1 petabyte (PB); 1000 PB = 1 exabyte (EB)
- 1000 EB = 1 zettabyte (ZB); 1000 ZB = 1 yottabyte (YB)
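Each unit in the list above is 1000 times the previous one, so a conversion is just a power of 1000. A minimal sketch; the names `UNITS` and `convert` are illustrative and not part of the lecture:

```python
# Decimal data units from the list above, smallest to largest.
UNITS = ["MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def convert(value, from_unit, to_unit):
    """Convert between decimal data units; each step is a factor of 1000."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    if steps >= 0:
        return value * 1000 ** steps      # larger unit -> smaller unit
    return value / 1000 ** (-steps)       # smaller unit -> larger unit

zettabyte_in_mb = convert(1, "ZB", "MB")   # 1000 ** 5 megabytes
gb_as_tb = convert(5000, "GB", "TB")       # 5.0 terabytes
```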
Big data
- Volume: quantity of generated and stored data
- Variety: type and nature of the data
- Velocity: speed at which the data is generated and
processed
AI and big data drive the 4th Industrial Revolution
Business decisions:
- Almost all activities in running businesses involve decision making: main
task of managers
- Decision analysis for dealing with the decision problems in a structured
way: studied for a long time in business research (decision support
systems)
Business intelligence is composed of methods that enhance efficiency and
facilitate decision making by integrating information and processes with the use
of tools that transform data into useful and actionable knowledge.
From data to information to knowledge
Business intelligence and decision support
- Conventional decision support: emphasis on
deduction
- Business intelligence: emphasis on
induction
- BI: data-driven (AI) decision support
Degree of intelligence (focus of this course: descriptive and predictive)
- Descriptive analytics: uses data to understand past and current
business performance
o Answers questions: what has occurred? How much did we sell in
each region? What type of customer is she (regarding return
behavior)?
o Techniques and methods:
Reporting, summarization, visualization
Segmentation: clustering, association rules
- Predictive analytics: analyses past performance to predict the future;
what will occur?
o Answers questions: how much will we sell in each region?
o Techniques and methods:
Regression and classification
Text mining/natural language processing
- Prescriptive analytics: identifies the best alternatives to minimize or
maximize some objective: what should occur?
o Answers questions: how much should we produce to maximize
profit?
o Techniques and methods:
Mainly optimization techniques: mathematical optimization
models; heuristics
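The three levels above can be contrasted on one toy data set. This is only a sketch: the sales records, the price and the unit cost are invented for illustration, and the brute-force search stands in for a proper optimization model.

```python
from collections import defaultdict

# Hypothetical sales records (region, month, units sold); invented for illustration.
sales = [
    ("North", 1, 100), ("North", 2, 110), ("North", 3, 120),
    ("South", 1, 80), ("South", 2, 85), ("South", 3, 90),
]

# Descriptive: what has occurred? Total units sold per region.
totals = defaultdict(int)
for region, _month, units in sales:
    totals[region] += units

# Predictive: what will occur? Fit a least-squares line to one region's history.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    return slope, mean_y - slope * mean_x

north = [(m, u) for r, m, u in sales if r == "North"]
slope, intercept = fit_line([m for m, _ in north], [u for _, u in north])
forecast_month_4 = slope * 4 + intercept

# Prescriptive: what should occur? Choose the production quantity that
# maximizes profit for the forecast demand (assumed price and unit cost).
def profit(quantity, demand, price=10.0, unit_cost=6.0):
    return price * min(quantity, demand) - unit_cost * quantity

best_quantity = max(range(0, 201), key=lambda q: profit(q, forecast_month_4))
```

With these made-up numbers the forecast for month 4 is 130 units, and producing exactly the forecast demand gives the highest profit.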
The human factor becomes less important.
Some risks should be taken seriously.
Just because something can be
measured doesn’t mean that it should
be measured.
The EU approach to AI is human centric.
7 requirements for trustworthy AI: human agency and oversight,
robustness and safety, privacy and data governance, transparency,
diversity/non-discrimination/fairness, societal and environmental well-
being, accountability
The EU AI Act regulates AI applications.
The five tribes of AI
There is no clear consensus on the definition of AI. Russell & Norvig: a program
that:
- Acts like a human
- Thinks like a human (human-like patterns of thinking steps)
- Acts or thinks rationally (logically, correctly)
Master algorithm hypothesis: “all knowledge – past, present and future – can
be derived from data by a single, universal learning algorithm”.
The five tribes
Tribe          | Origin            | Master algorithm        | Strength                 | Methods/technology
Symbolists     | Logic, philosophy | Inverse deduction       | Structure inference      | Inverse deduction, reasoning
Connectionists | Neuroscience      | Back-propagation        | Estimating parameters    | Back-propagation, deep learning
Evolutionaries | Biology           | Evolutionary algorithms | Structure learning       | Evolutionary algorithms
Bayesians      | Statistics        | Probabilistic inference | Weighing evidence        | HMM, graphical models
Analogizers    | Psychology        | Kernel machines         | Mapping to new instances | kNN, SVM, …
Symbolic AI
- Often referred to as good old-fashioned AI (GOFAI); the dominant AI school in the
1970s and 1980s
- Many aspects of intelligence achieved by manipulation of symbols and
symbolic solvers
- Identifying and extracting regularities
o Propositional logic
o Rules form: A implies B
- Deductive reasoning: the process of drawing deductive
inferences
o An inference is valid if its conclusion follows logically from
its premises
- Inverse reasoning (induction): identifying missing components
that block deductive reasoning
o New knowledge created through generalization
- Applications: expert systems, automated theorem provers, ontologies,
automated planning and scheduling systems
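Deductive reasoning over rules of the form "A implies B" can be sketched as forward chaining: keep applying rules whose premises are already known until nothing new follows. The rule base and fact names below are invented for illustration, and real expert systems use far richer inference engines.

```python
def forward_chain(facts, rules):
    """Apply rules (premises -> conclusion) until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if set(premises) <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rule base: each rule is (premises, conclusion).
rules = [
    (("rain",), "wet_ground"),
    (("wet_ground", "cold"), "ice"),
]

derived = forward_chain({"rain", "cold"}, rules)
```

Here `derived` contains the two starting facts plus `wet_ground` and `ice`. Every conclusion follows logically from the premises, which is what makes the inference valid.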
Connectionist AI
- Learning is what the brain does;
knowledge is stored in the connections
between neurons
- The dominant tribe today!
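To illustrate "knowledge is stored in connections", here is a minimal sketch of a single sigmoid neuron that learns the logical AND function by gradient descent. With only one neuron the update is the one-layer special case of back-propagation. The learning rate, epoch count and function names are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, epochs=5000, lr=0.5):
    """Train one sigmoid neuron with per-sample gradient descent."""
    w = [0.0, 0.0]   # connection weights: this is where the knowledge ends up
    b = 0.0          # bias term
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = out - target   # gradient of cross-entropy loss w.r.t. the pre-activation
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b

def predict(w, b, x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

# Truth table of logical AND as training data.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_neuron(and_samples)
predictions = [round(predict(w, b, x)) for x, _ in and_samples]
```

After training, nothing but the three numbers `w[0]`, `w[1]` and `b` encodes the learned function.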
Evolutionary AI: learning is about
natural selection
In 1975, John Henry Holland wrote the ground-breaking book on genetic
algorithms: “Adaptation in Natural and Artificial Systems”
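A genetic algorithm mimics natural selection with three steps: selection, crossover and mutation. The sketch below is a toy version, not Holland's original formulation. The fitness function (count the 1-bits), the population size and the mutation rate are all assumptions chosen for illustration.

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=50,
           mutation_rate=0.02, seed=0):
    """Toy genetic algorithm over bit strings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]       # selection: keep the fitter half
        children = [parents[0][:]]           # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)   # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Fitness = number of 1-bits, so the ideal individual is all ones.
best = evolve(sum)
```

Because the best individual is always carried over, the best fitness never decreases from one generation to the next.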
Bayesian AI: learning is uncertain inference (challenge: dealing with noisy,
incomplete information)
Bayes’ theorem tells us how to update our beliefs in light of
new evidence
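Bayes' theorem reads P(H|E) = P(E|H) P(H) / P(E), where P(E) sums over the cases "H true" and "H false". A minimal sketch of one belief update follows. The function name and the test numbers (1% prior, 90% detection rate, 5% false-positive rate) are invented for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# One positive test result raises the belief from 1% to about 15%.
after_one_test = posterior(0.01, 0.9, 0.05)

# The posterior becomes the new prior: a second positive result gives about 77%.
after_two_tests = posterior(after_one_test, 0.9, 0.05)
```

Feeding the posterior back in as the next prior is how beliefs accumulate evidence step by step, even when each observation is noisy.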
Analogy-based AI: learning is about
recognizing similarities
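The simplest analogy-based learner is k-nearest neighbours (kNN): a new instance gets the majority label of the k most similar known instances. A minimal sketch, using Euclidean distance as the similarity measure. The training points and labels are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled points forming two well-separated groups.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B"),
]

label_near_a = knn_predict(train, (2, 2))
label_near_b = knn_predict(train, (8.5, 8.5))
```

There is no training step here: the stored examples themselves are the model, and prediction is purely a similarity lookup.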
Data pre-processing
CRISP-DM framework: cross-industry
standard process for data mining
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
Data understanding: descriptive analytics
- Basic statistical descriptions of data
- Data visualization
Data preparation consists of the following four tasks:
1. Data integration
2. Data cleaning
3. Data reduction
4. Data transformation & discretization
Data integration:
- Example: six different databases used
for data collection -> Order
Management System (OMS), Orders,
Customers, Early Customer Solution
(ECS), Products, Transaction Margin
- Data source problems:
o Different original purpose
o Different database schemas
o Different information detail/granularity
o Different data semantics
o Different file formats
- Solutions:
o Schema mapping
o Conversion into standards (e.g. MySQL, RDF triples, XML schemas)
o Creation of new schema & vocabulary (e.g. for variable names)
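Schema mapping and conversion into a common standard can be sketched as follows. The two sources, their field names and the unified vocabulary are hypothetical and do not reflect the actual databases from the example above.

```python
# Hypothetical records from two sources with different schemas and formats.
oms_orders = [{"OrderID": "A1", "CustNo": 7, "Total": "19.90"}]
web_orders = [{"order_id": "B2", "customer": "7", "amount_cents": 2500}]

# Schema mapping: source field name -> field name in the unified schema.
SCHEMA_MAPS = {
    "oms": {"OrderID": "order_id", "CustNo": "customer_id", "Total": "amount"},
    "web": {"order_id": "order_id", "customer": "customer_id", "amount_cents": "amount"},
}

# Conversion into a common standard: types and units per unified field.
CONVERTERS = {
    "oms": {"customer_id": int, "amount": float},
    "web": {"customer_id": int, "amount": lambda cents: cents / 100},
}

def integrate(record, source):
    """Rename fields to the unified schema, then normalize types and units."""
    mapped = {SCHEMA_MAPS[source][key]: value for key, value in record.items()}
    for field, convert in CONVERTERS[source].items():
        mapped[field] = convert(mapped[field])
    mapped["source"] = source   # keep provenance for later data cleaning
    return mapped

unified = ([integrate(r, "oms") for r in oms_orders]
           + [integrate(r, "web") for r in web_orders])
```

Keeping the mappings as data rather than code makes it easy to add a further source: only two more dictionary entries are needed.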
- Challenges: