Machine Learning and Reasoning for Health (XM_0102)
L1: Introduction
ML is to automatically learn patterns from data. A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with E.
KR (knowledge representation): representing information about the world, and knowledge about
solving complex tasks, in a form that a computer can use.
Characteristics of medical data: Sparsity (many zeros/NAs; sparse data requires a lot more
instances), Missing values (result in more noise, potential biases in models), Many assumptions
needed to process the data (e.g., a diagnostic code that is not provided means the patient did not
have the disease; when these assumptions do not hold, the learned models can be inaccurate).
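The diagnostic-code assumption above can be made explicit in code. The sketch below is only an illustration: the function name `encode_diagnoses`, the `closed_world` flag and the example codes are hypothetical and not part of the course material. It shows the difference between treating an unrecorded code as "disease absent" and treating it as "unknown".

```python
def encode_diagnoses(recorded_codes, vocabulary, closed_world=True):
    """Turn a patient's recorded diagnostic codes into one feature per code.

    With closed_world=True an unrecorded code is encoded as 0 (assumed absent).
    With closed_world=False it is encoded as None (unknown), so the decision
    on how to handle it is left to a later missing-value step.
    """
    absent_value = 0 if closed_world else None
    return {code: (1 if code in recorded_codes else absent_value)
            for code in vocabulary}


# Hypothetical example: only one of two codes is recorded for this patient.
vocabulary = ["E11", "I10"]
assumed_absent = encode_diagnoses({"E11"}, vocabulary)
left_unknown = encode_diagnoses({"E11"}, vocabulary, closed_world=False)
```

If the assumption is wrong for a given code (for instance because it is rarely registered), the zeros produced by the first call are noise rather than facts, which is the inaccuracy the notes warn about.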
Missing values
● Why values are missing: variables are often not measured continuously, are often measured
only for a reason, and registration itself can be poor
● Types: MCAR (missing data is independent of the observed and unobserved data), MAR
(missing data is systematically related to the observed but not the unobserved data),
MNAR (missing data is systematically related to the unobserved data, i.e., related to
events or factors which are not measured by the researcher → highest bias risk)
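The three mechanisms can be illustrated with a small simulation. The sketch below uses made-up data (age and blood pressure, with hypothetical missingness probabilities; none of the numbers come from the course material) and compares the mean of the values that remain observed with the true mean.

```python
import random


def simulate_missingness(n=10000, seed=0):
    """Mean of the observed blood-pressure values under MCAR, MAR and MNAR."""
    rng = random.Random(seed)
    ages = [rng.uniform(20, 80) for _ in range(n)]
    # Hypothetical relation: blood pressure rises with age, plus noise.
    bps = [100 + 0.5 * age + rng.gauss(0, 10) for age in ages]

    def observed_mean(missing_prob):
        # Keep a value when a random draw exceeds its missingness probability.
        kept = [bp for age, bp in zip(ages, bps)
                if rng.random() >= missing_prob(age, bp)]
        return sum(kept) / len(kept)

    return {
        "true": sum(bps) / n,
        # MCAR: same probability for everyone.
        "MCAR": observed_mean(lambda age, bp: 0.3),
        # MAR: depends only on the observed variable (age).
        "MAR": observed_mean(lambda age, bp: 0.6 if age < 50 else 0.1),
        # MNAR: depends on the value that goes missing (blood pressure itself).
        "MNAR": observed_mean(lambda age, bp: 0.6 if bp > 130 else 0.1),
    }


means = simulate_missingness()
```

In this toy setup the mean of the observed values stays close to the true mean under MCAR and drifts away from it under MAR and MNAR. Under MAR the drift could in principle be corrected using the observed age; under MNAR the information needed for the correction is itself missing.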
Injecting (medical) knowledge into machine learning:
● Feature selection
○ Learning theory says
○ Assume we have M hypotheses, then:
○ Basically, given enough training examples, we can approximate the
out-of-sample error arbitrarily well by the in-sample error. The statement comes
with a probabilistic caveat: it only holds with probability at least 1−δ.
However, the failure probability δ can be chosen to be arbitrarily small. Note that every
finite hypothesis set is PAC learnable.
○ The more features, the more hypotheses we will need, and the more instances
we will need
○ Feature selection (reducing the number of features) can reduce model complexity
(e.g., for ERF, explainability is measured by the number of terms in the model/average rule
length of DT/proportion of expert knowledge terms in DT)
● Feature abstraction: we reduce model complexity by creating features at a higher level of
abstraction (e.g., using coding structures/ontologies)
● Missing value handling → use the closed world assumption (CWA): absence of a fact means
it is assumed to be false; knowledge can help determine whether the CWA is correct
● ML techniques that embed knowledge (ExpertRuleFit)
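The formulas belonging to the learning-theory bullets under feature selection are not present in these notes. A standard result that matches the surrounding description is the Hoeffding inequality combined with a union bound over a finite hypothesis set; it is given here as an assumption about what was meant, not as the lecture's exact formulation. The symbols N (number of training examples), ε (tolerance), and E_in / E_out (in-sample and out-of-sample error of the selected hypothesis g) are introduced for this illustration; M and δ are as in the notes.

```latex
P\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\big] \;\le\; 2M e^{-2\epsilon^{2} N}
\qquad\Longrightarrow\qquad
N \;\ge\; \frac{1}{2\epsilon^{2}} \ln\frac{2M}{\delta}
```

The right-hand condition follows from setting δ = 2M e^{−2ε²N} and solving for N. For instance, with M = 1000, ε = 0.05 and δ = 0.05 it asks for roughly N ≥ 2120 examples. The required N grows with M, which is the sense in which more features (and hence more hypotheses) call for more instances.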
Module 1: Machine Learning with Prior Knowledge
L2: Design Patterns & Expert Rule Fit
There are two kinds of AI systems
● System 1: thinking fast (data driven) → statistical
○ Scalability: performs worse with less data → “sample inefficiency”
○ Not explainable: black box
● System 2: thinking slow (knowledge driven) → symbolic
○ Scalability: performs worse with more data → “combinatorial explosion”
○ Explainable because of rules
Make patterns by coupling elementary components
● Informed learning with prior knowledge (to improve performance of a learning algorithm)
○ Use of symbolic reasoner to improve the performance of a subsequent learning
algorithm
○ Use knowledge for data abstraction
○ Use knowledge for data completion → compensate for incompleteness of data
(Expert) RuleFit
● RuleFit: rule-based ML ensemble method that combines several base models (e.g.,
boosting methods) in order to produce one optimal predictive model → classification model
○ The ensemble takes the form of a generalized linear model
○ Two stages of RuleFit
i. Ensemble generation
● Tree ensemble generation via stochastic gradient boosting, conversion of trees
into rules, rule cleaning, inclusion of linear terms
● Resulting ensemble consists of rules + linear terms
ii. Lasso-regularized regression
● Learning the weight coefficients by solving an optimization
problem; we regularize/penalize the weight coefficients 𝛼’s and 𝛽’s to
shrink the initial set of rules down to the truly informative ones.
○ Removal of uninformative rules and linear terms
● RuleFit strengths/limitations
○ Strengths: comparatively high interpretability, potential for knowledge discovery,
state-of-the-art predictive performance
○ Limitations: training data dependency, limited expert acceptance
● Three stages of Expert RuleFit
1. Expert Knowledge Acquisition
2. Combined (Rule) Ensemble Generation
3. Expert Knowledge-Aware Regularization
○ Adaptive lasso with customised penalty values v for expert knowledge
● Aims of Expert RuleFit: combine the strengths of ML and knowledge representation
○ Exploiting the “learning ability” from the data
○ Exploiting the knowledge that we already have (we extend our knowledge by
learning from data)
○ Adding knowledge for “explainability” (rules from experts are more acceptable for
experts)
■ Goal: improve performance and increase human trust in model results
(explainability)
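The regularization stage, and the effect of a lower penalty for expert rules, can be sketched in a few lines. This is a minimal illustration under several assumptions, not the RuleFit or Expert RuleFit implementation: it uses squared-error loss instead of a classification loss, a hand-written coordinate-descent lasso, binary rule features only (no linear terms), toy data, and hypothetical names (`weighted_lasso`, `penalty_factors`). The penalty factors play the role of the customised penalty values v mentioned above.

```python
from itertools import product


def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, penalty_factors, n_sweeps=50):
    """Coordinate descent for (1/2n)*||y - b - Xw||^2 + lam * sum_j v_j*|w_j|."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre the columns so the intercept can be recovered at the end.
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]
    resid = [yi - y_mean for yi in y]  # residual while all weights are zero
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j, col in enumerate(cols):
            var_j = sum(c * c for c in col) / n
            if var_j == 0.0:
                continue
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, lam * penalty_factors[j]) / var_j
            resid = [r - (new_w - w[j]) * c for c, r in zip(col, resid)]
            w[j] = new_w
    intercept = y_mean - sum(m * wj for m, wj in zip(col_means, w))
    return intercept, w


# Toy data: every combination of three binary rules (1 = rule fires).
# Rule 0: strong learned rule, rule 1: weak expert rule, rule 2: uninformative.
rows = [list(bits) for bits in product([0, 1], repeat=3)]
y = [2.0 * r[0] + 0.3 * r[1] for r in rows]

# Same penalty for every rule (plain lasso).
_, w_plain = weighted_lasso(rows, y, lam=0.1, penalty_factors=[1.0, 1.0, 1.0])
# Halved penalty for the expert rule (expert knowledge-aware penalty).
_, w_expert = weighted_lasso(rows, y, lam=0.1, penalty_factors=[1.0, 0.5, 1.0])
```

On this toy data the uninformative rule receives a weight of exactly zero in both fits. The weak expert rule is removed under the uniform penalty but survives with a small weight once its penalty is reduced, which is the behaviour the expert knowledge-aware regularization aims for.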
Some methods/models
● Boosting: iteratively training a base classifier on re-weighted versions of the training data
and then combining the trained sequence of classifiers through a weighted sum of their
estimates. At each iteration: weights are increased for those cases that have been most frequently
misclassified and weights are decreased for correctly classified cases; emphasis on
instances that are difficult to predict.
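The re-weighting loop described above can be sketched as follows. The notes do not name a specific boosting variant, so this example assumes an AdaBoost-style update with one-dimensional decision stumps as base classifiers; the data and all function names are made up for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: predicts `polarity` left of the threshold, else its opposite."""
    return polarity if x < threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    points = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for thr in thresholds:
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, thr, pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best


def adaboost(xs, ys, n_rounds):
    """Train n_rounds stumps on re-weighted data; returns (alpha, thr, pol) tuples."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, thr, pol = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Misclassified cases (y * prediction = -1) gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted sum of the stump predictions."""
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, xs, ys):
    return sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)


# Toy data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
acc_one_round = accuracy(adaboost(xs, ys, 1), xs, ys)
acc_three_rounds = accuracy(adaboost(xs, ys, 3), xs, ys)
```

On this toy data a single stump gets 7 of the 10 training cases right, while the weighted sum of three stumps classifies all of them correctly, because each later stump concentrates on the cases the earlier ones got wrong.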