Data science and ethics
Contents
Introduction .................................................................................................................................. 6
Course and Evaluation........................................................................................................ 6
Why care? ........................................................................................................................... 6
1. Expected from society ............................................................................................................. 6
2. Huge potential risks ................................................................................................................. 6
3. Potential benefits .................................................................................................................... 7
4. Future ...................................................................................................................................... 7
5. SciFi becomes Sci ..................................................................................................................... 7
Goal of the course .................................................................................................................. 8
Ethics in the News................................................................................................................... 8
Data science ethics ................................................................................................................. 8
Trolley Problem .................................................................................................................. 9
Ethics of self-driving cars .................................................................................................... 9
Data, Algorithms and Models........................................................................................... 10
Different Roles.................................................................................................................. 11
FAT ........................................................................................................................................ 11
FAT Flow: a Data Science Ethics Framework .................................................................... 12
FAT Flow: Concepts and Techniques ................................................................................ 13
FAT Flow: Cautionary Tales .............................................................................................. 13
Subjectivity of ethics ........................................................................................................ 13
Discussion Case 1....................................................................................................................... 14
Fair Data Gathering .......................................................................................................... 14
Transparent Data Gathering............................................................................................. 14
Discussion Case 2....................................................................................................................... 14
Fair Data Preparation ....................................................................................................... 15
Transparent Data Preparation ......................................................................................... 15
Fair Data Modelling .......................................................................................................... 15
Transparent Data Modeling ............................................................................................. 15
Fair Model Evaluation ...................................................................................................... 15
Transparent Model Evaluation ......................................................................................... 16
Fair Model Deployment ................................................................................................... 16
Transparent Model Deployment ...................................................................................... 16
Beyond Data Science Ethics .................................................................................................. 16
Ethical AI Frameworks .......................................................................................................... 16
IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems (2018) ............ 16
Ethics guidelines for trustworthy AI (2019) ..................................................................... 17
White House Executive Order on Maintaining American Leadership in Artificial Intelligence, Feb. 2019 ..................................................................................................... 17
ISO .................................................................................................................................... 17
Discussion Case 3....................................................................................................................... 17
Ethical Data Gathering ............................................................................................................. 18
Privacy and GDPR ................................................................................................................. 18
Privacy .............................................................................................................................. 18
GDPR ................................................................................................................................. 20
GDPR key concepts .................................................................................................................... 20
Discussion Case 1....................................................................................................................... 24
CIA ......................................................................................................................................... 24
Privacy Mechanisms: Encryption and hashing ..................................................................... 24
Symmetric encryption ...................................................................................................... 26
Asymmetric encryption .................................................................................................... 26
Encryption for data protection......................................................................................... 28
Hashing ............................................................................................................................. 29
Quantum Computing ........................................................................................................ 32
Obfuscation ...................................................................................................................... 33
Government Backdoor ......................................................................................................... 33
Public data ............................................................................................................................ 35
Clearview.AI...................................................................................................................... 36
Bias ........................................................................................................................................ 36
Sample Bias ...................................................................................................................... 37
Experimentation ................................................................................................................... 39
Summary data gathering ...................................................................................................... 41
Ethical Data Preprocessing ....................................................................................................... 41
Input Selection ................................................................................................................. 41
Discrimination against sensitive groups: Data Preprocessing for non-discrimination ........ 42
Measuring ......................................................................................................................... 42
Proxies for discrimination.......................................................................................................... 42
Methods ........................................................................................................................... 43
1. Massaging: Relabeling ........................................................................................................... 43
2. Reweighing ............................................................................................................................ 45
3. Sampling ................................................................................................................................ 47
Experiments ............................................................................................................................... 47
Conclusions................................................................................................................................ 48
Privacy ................................................................................................................................... 49
Defining Target Variable................................................................................................... 49
Measuring Fairness (Revisited) ........................................................................................ 49
COMPAS case............................................................................................................................. 50
Methods to include privacy .............................................................................................. 50
Anonymizing Data ..................................................................................................................... 50
Online Re-identification ................................................................................................... 53
Conclusion ....................................................................................................................... 55
Data Preprocessing and Modelling: Privacy ............................................................................. 55
Data preprocessing ............................................................................................................... 55
K-anonymity ..................................................................................................................... 55
Recap k-anonymity .................................................................................................................... 55
L-diversity ......................................................................................................................... 56
T-closeness ....................................................................................................................... 58
Differential privacy ........................................................................................................... 59
Privacy loss parameter ε............................................................................................................ 62
How do we add this noise? ....................................................................................................... 63
Assumption 1: Single Count Query. Needed? ........................................................................... 64
Assumption 2: trusted data curator .......................................................................................... 66
Conclusion ........................................................................................................................ 68
Ethical Modelling: Including Privacy and Preferences ............................................................. 69
Including Privacy ................................................................................................................... 69
Differential Privacy ........................................................................................................... 69
Zero Knowledge Proofs .................................................................................................... 69
Homomorphic Encryption ................................................................................................ 70
Secure Multi-Party Computation ................................................................................. 72
Applications ............................................................................................................................... 74
Federated Learning .......................................................................................................... 75
Federated Averaging ................................................................................................................. 76
Applications ............................................................................................................................... 77
Overview........................................................................................................................... 77
Including Preferences ........................................................................................................... 78
Including domain knowledge: monotonicity constraints................................................. 78
Trolley problem ................................................................................................................ 79
Including Ethical Preferences .................................................................................................... 79
Ethical Modelling: Including fairness and Explainable AI ......................................................... 81
Fairness in modeling stage: measures and methods ........................................................... 81
Measures .......................................................................................................................... 81
Measuring fairness of Y’ ............................................................................................................ 81
Methods ........................................................................................................................... 83
COMPAS ........................................................................................................................... 83
Including Fairness in Modeling ......................................................................................... 84
Explainable AI ....................................................................................................................... 85
Why the need for explanations .............................................................................................. 85
Trust........................................................................................................................................... 85
Compliance ................................................................................................................................ 87
Insight ........................................................................................................................................ 87
Improve ..................................................................................................................................... 87
Comprehensible and Explaining ....................................................................................... 88
Global and instance-based explanation methods............................................................ 89
Explanations .............................................................................................................................. 89
ANN/SVM Rule Extraction ......................................................................................................... 90
SVM Rule Extraction .................................................................................................................. 91
Linear Models ............................................................................................................................ 93
Instance-based explanations ..................................................................................................... 93
Advantages ................................................................................................................................ 97
Challenges ................................................................................................................................. 98
Conclusion ................................................................................................................................. 98
Ethical Reporting ...................................................................................................................... 98
Ethical Reporting .............................................................................................................. 98
p-Hacking ................................................................................................................................... 99
Multiple comparisons .............................................................................................................. 100
Case 1: Twitter to predict stock market .................................................................................. 101
Case 2: Reporting in credit scoring .......................................................................................... 103
Introduction to validation ....................................................................................................... 103
Quantitative validation ............................................................................................................ 104
Qualitative validation .............................................................................................................. 108
The advertising technology industry .............................................................................. 108