Summary of all the lectures and the course manual of the Data Science and Ethics course by prof. David Martens at the University of Antwerp (first semester of master programs) – 2 January 2025, academic year 2024/2025, 33 pages

SUMMARY – DATA SCIENCE AND ETHICS

Introduction
Why should we care about ethics when it comes to data science?
- It is expected by society (Gen Z in particular cares about social justice and ethics)
- There are huge potential risks
o For humans: physical and mental well-being, privacy and discrimination
o For businesses: reputational and financial risks
- But, there are also many potential benefits to caring about ethics
o It can improve the accuracy and fairness of the data and the model, but can also be a
marketing instrument
- Digitalization in general, and AI in particular, will be part of the future

Data scientists and business students are not inherently unethical, but they are not trained to think about ethics

Data science ethics
= about what is right and wrong when doing data science

Data science can be used for good intentions (reduce crime, improve medical diagnoses, increase
profitability), but also for bad ones (data leaks, discrimination)

Responsible AI = the development and application of AI that is aligned with moral values in society - about
what is right and wrong when you’re developing or using AI

Utilitarianism vs. deontology
Utilitarianism: consequentialism, focuses on the result of the act – you choose the action that results in the
highest net benefit (to you or to a group of people) because the result justifies the act
→ The action is moral if its consequences are – the theory can therefore be used to justify immoral acts

Deontology: there are some things that you cannot do, the action should be moral as well, the ends do not
justify the means

Aristotle: “Moral behavior can be found at the mean between two extremes – excess (using it all without
any concern for the ethical consequences) and deficiency (not using it at all)”
→ This can be applied to data science as well: using all available data for any possible application (without
concern for privacy, discrimination or transparency) vs. using no data at all

Data science equilibrium
We have to find a balance between the ethical concerns and the utility of data science
→ The bigger the ethical concerns, the stronger the data science ethics practices needed to keep the
balance

Link of the trolley problem to data science: AI is used in many self-driving cars

Data, algorithms and models
Data = facts or information, especially when examined and used to find out things or to make decisions
Algorithm = a set of rules that must be followed when solving a particular problem
Prediction or AI model = the decision-making formula, which has been learnt from data by a prediction/AI
algorithm

Personal data = data that is relating to an identifiable person
Behavioral data = data that provides evidence of actions that you took (e.g. Facebook likes, location data)
Sensitive data = data related to race, ethnicity, political opinion, religion, sexual orientation
FAT Flow
= a framework for data science ethics, consists of three dimensions
1) The stages in the data science process



2) Evaluation criterion
Fair, accountable, and transparent
3) Role of the human
Data subject, data scientist, manager, and model subject
FAT
Fair = treating people equally without favoritism or discrimination
1) Privacy – fair to the data subject’s privacy rights
Privacy = a state in which one is not observed or disturbed by others (= a human right)
2) Discrimination – not discriminating against sensitive groups
Accountable = required or expected to justify actions or decisions – responsible
Responsible: having an obligation to do something or having control over or care for someone as part of
one’s job or role. This obligation has three components:
1) Implement appropriate and effective measures to ensure that principles are complied with
2) Demonstrate the compliance of the measures upon request (to regulators for example)
3) Recognize potential negative consequences
Transparent = easy to perceive or detect
1) Transparency of the process
o Is crucial for fairness and accountability
2) Explainable AI – explain the thought process of the AI model

Different roles of the data science process
Data subject: the person whose data is being used
Data scientist: the person who is performing the data science
Manager: the person who manages and signs off on a data science project
Model subject: the person to whom the model is being applied

Overview of fairness and transparency concepts covered in the course


If the sample is biased, then the AI model will be biased as well




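The effect of a biased sample can be shown with a toy sketch (the groups and numbers below are made up purely for illustration, not taken from the course):

```python
# Toy population: two equally-sized groups, so a fair sample should be 50/50.
population = [("A", 1), ("A", 0), ("B", 1), ("B", 0)] * 250  # 1000 people

# A biased gathering process that reaches all of group A but few of group B:
sample = [p for i, p in enumerate(population) if p[0] == "A" or i % 10 == 0]

def share(group, data):
    """Fraction of `data` belonging to `group`."""
    return sum(1 for g, _ in data if g == group) / len(data)

print(share("A", population))  # 0.5 – balanced population
print(share("A", sample))      # ~0.91 – the sample over-represents group A
```

A model trained on such a sample mainly reflects group A's patterns, which is exactly the "biased sample → biased model" point above.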
Ethics guidelines for trustworthy AI
Trustworthy AI has three components: it should be lawful, ethical, and robust

AI act
Takes a risk-based approach: the bigger the ethical concerns an AI application raises, the higher its risk
category

Ethical Data Gathering
Questions that need to be considered:
- For fairness
o Fair to the data subject and model subject: is the privacy of the data subject and model
subject respected, when gathering their data?
o Fair to the model subject: is a sufficient sample included for all sensitive groups?
- For transparency
o Transparent to the data subject and model subject: what data is used, for what purposes,
and for how long?
o Transparent to the model subject: if A/B testing is performed, is the user aware of this and
did they give informed consent?
o Transparent to the data scientist: how is the data gathered? Was specific over- or undersampling
of certain groups considered?
o Transparent to the manager: how is the data gathered?

Privacy and GDPR
Privacy
- A lot of personal data is out there (locations → how much time we spend at the doctor's, …)
o Regulated: companies cannot use it unless it is, for example, improving their services
- A lot of personal data can be predicted (based on social media behavior, they can predict your
sexuality for example)
o The dangerous thing: you don't have to know any personal data to predict pregnancies,
sexual orientation, … → very sensitive information can be predicted
- Once personal data is shared online, it’s hard to make it private again

Solutions: awareness, regulations, or technology (e.g. facial recognition)

Privacy is a human right – it is about the protection of personal data

Cambridge Analytica
Was a political consulting firm that used the Facebook information of over 80 million users without their
permission, which was then used for targeted political advertising
→ Information used: page likes, birthday, city, …
→ The data was obtained through an app – people were paid to take a test and provide their Facebook
information, but unwittingly, the data of the users' Facebook friends was sent along as well
→ This was against Facebook's policy → Facebook removed the app and suspended Cambridge Analytica,
but the damage had already been done

GDPR
= General Data Protection Regulation

- Privacy and data protection of European citizens, also applicable to non-European companies if they
collect data on EU citizens
- Applicable since 2018
- Fines up to €20 million or 4% of the company's annual worldwide turnover (whichever is higher)

Key concepts
1) Personal data – any information relating to an individual (private, personal, or public life)
2) Anonymisation – data cannot be traced back to an individual (the owner is not re-identifiable)
→ Not mentioned in GDPR
3) Pseudonymisation – the processing of personal data in such a way that it can no longer be attributed to a
data subject without the use of additional information, e.g. through encryption
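A minimal sketch of pseudonymisation using a keyed hash (HMAC, Python stdlib); the key and identifiers are made up. The secret key plays the role of the "additional information": records remain linkable for analysis, but without the key a pseudonym cannot be attributed back to the data subject:

```python
import hmac
import hashlib

# The 'additional information' – kept separately and securely by the controller
SECRET_KEY = b"kept-separately-by-the-controller"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a stable pseudonym.
    The same input always maps to the same pseudonym, so records can still be
    linked across datasets, but without SECRET_KEY the mapping cannot be redone."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

# Example: replace the data subject's email before analysis
record = {"email": "alice@example.com", "purchases": 3}
record["email"] = pseudonymise(record["email"])
```

Note this is pseudonymisation, not anonymisation: whoever holds SECRET_KEY can re-link the data, so it still counts as personal data under GDPR.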
When does GDPR allow processing of personal data?
1) Unambiguous consent of the data subject
2) To fulfill a contract to which the data subject is party
3) Compliance with a legal obligation
4) Protection of vital interests of the data subjects
5) Performance of a task carried out in the public interest
6) Legitimate interest (subject to a balancing act between the data subject's rights and the interests of
the controller)
a. The company can process personal data in order to carry out tasks related to your business
activities

Unambiguous consent is a very complex notion – it may be better to have short terms & conditions that
people actually read and therefore accept intentionally, instead of very long ones that no one reads and
everyone simply accepts

Processing of personal data
The controller of the data shall be responsible for and be able to demonstrate the compliance with the
principles relating to processing of personal data

Encryption and hashing
Encryption
= to encode a message or information in such a way that only authorized persons can access it

Historically
→ Shift cipher (move each letter a few positions down the alphabet)
→ Shave the head of the messenger, write a message, let the hair grow back and send the messenger
→ Enigma
o Electro-mechanical machine used by Germans in WW II
o The state of the machine is defined by settings of rotors and plugs
o Each typed letter changes the state of the machine and outputs some other letter
o Only if two machines (the sender's and the receiver's) start in the same state will the same
letters be produced → but there are 10^6 possible states
→ The receiver would have to type the encoded letters on his machine, and the
original message would come out
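The shift cipher above fits in a few lines of Python (a toy illustration, not part of the course material):

```python
def shift_encrypt(text: str, key: int) -> str:
    """Shift each letter `key` positions down the alphabet (Caesar/shift cipher)."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr(base + (ord(ch) - base + key) % 26))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

def shift_decrypt(text: str, key: int) -> str:
    """Decrypting is just shifting back the other way."""
    return shift_encrypt(text, -key)

print(shift_encrypt("HELLO", 3))  # KHOOR
```

With only 25 possible keys, trying them all by hand already breaks it, which is why later schemes moved to far larger key spaces.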

Symmetric encryption
The same key is used for encrypting and decrypting the message

→ Weakness: knowing that most messages start or end with “Hello”, “Greetings”, … simplifies
decryption: once you find the key that decrypts these known words, you can
decrypt the whole message
→ DES – Data Encryption Standard
o One of the first major standards in symmetric-key encryption – 56-bit key
o Flaw: the key is too small – a brute-force attack will find it
→ AES – Advanced Encryption Standard
o 128-, 192-, or 256-bit keys → more secure
→ Challenges
o How to share keys: the exchange may be insecure or overheard
o How to manage keys: if multiple users need to communicate with one another, a lot of keys
have to be shared before communicating (n users need n(n-1)/2 pairwise keys)
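Both properties of symmetric encryption – the same key encrypts and decrypts, and known plaintext leaks information about the key – can be shown with a toy XOR cipher (an illustration only; real systems use AES, and repeating-key XOR is thoroughly broken):

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key. Applying it twice restores the
    original, so the same key both encrypts and decrypts (symmetric)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

message = b"Hello, Bob!"
key = b"secret"
ciphertext = xor_cipher(message, key)
assert xor_cipher(ciphertext, key) == message  # same key decrypts

# Known-plaintext weakness: if an attacker guesses the message starts with
# "Hello", XORing that guess against the ciphertext leaks the key bytes.
leaked = bytes(c ^ p for c, p in zip(ciphertext, b"Hello"))
print(leaked)  # b'secre' – the start of the key
```

This is exactly the "most messages start with Hello" attack described above: each correctly guessed plaintext byte reveals one key byte.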



