Lecture 1. Introduction
Data protection means protecting data against:
- Unauthorized access
- Misuse (using data for a purpose other than the one the data subject expects)
- Manipulation
- Loss
Data protection is also a legal term, used in the GDPR (General Data Protection
Regulation), where it is limited to personal data. The same types of technologies can be
used to protect other kinds of data (e.g., business data).
Different data protection technologies (DPTs) can target different parts of the architecture.
They can target:
- The data processing (secure multiparty computation; homomorphic encryption:
performing operations on encrypted data whose results are still meaningful once the
data is decrypted)
- The data itself (anonymization, pseudonymization, etc.)
- The entire ICT system (access control, anonymous communication, etc.)
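To make the idea of computing on encrypted data concrete, here is a minimal sketch using textbook RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product. The tiny primes, the constant names, and the helper functions are all assumptions chosen for illustration; this is not a secure scheme and not necessarily the one treated in the lecture.

```python
# Toy textbook RSA with tiny primes: insecure, for illustration only.
P, Q = 61, 53                      # hypothetical small primes
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt an integer smaller than N with the public key."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Decrypt a ciphertext with the private key."""
    return pow(ciphertext, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Multiply two plaintexts by operating on their ciphertexts only."""
    return (c1 * c2) % N


a, b = 7, 6
product_cipher = multiply_encrypted(encrypt(a), encrypt(b))
result = decrypt(product_cipher)   # equals (a * b) % N
print(result)
```

The party that calls `multiply_encrypted` never sees `a` or `b`, yet the decrypted result is their product (as long as it stays below `N`).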
Some applications of DPT’s are:
- Privacy-preserving machine learning: being able to learn from data without violating
the privacy of the people the data was gathered from.
- Blockchain: To some extent the blockchain is anonymous (that’s why it’s used by
criminals), but not fully. There are some techniques to further improve this
anonymity.
Anonymity and Pseudonymity
In both cases, you want to make sure that activities cannot be linked to a natural person.
With anonymity, you want there to be no link to any identity, like with election ballots.
Pseudonymity means that activities are linked to an identifier (a pseudonym, such as an ID)
that does not directly reveal the person. An e-mail address is another example of a
pseudonym. Multiple activities can now be grouped together by the pseudonym under which
they were performed, and from this it may be possible to infer who the person is (e.g.,
because the location data gives away their home/work address).
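The linkability of pseudonymous records can be illustrated with a short sketch. It derives a pseudonym from an identity with a keyed hash (HMAC-SHA256) and then groups activities per pseudonym. The key, the e-mail addresses, and the event data are all made up for the example, and a keyed hash is only one possible way to generate pseudonyms.

```python
import hashlib
import hmac
from collections import defaultdict

SECRET_KEY = b"hypothetical-secret-key"  # assumption: known only to the data holder


def pseudonym(identity: str, key: bytes = SECRET_KEY) -> str:
    """Map an identity to a stable identifier that does not show the identity."""
    digest = hmac.new(key, identity.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def group_by_pseudonym(records):
    """Collect all activities recorded under the same pseudonym."""
    profiles = defaultdict(list)
    for who, activity in records:
        profiles[who].append(activity)
    return dict(profiles)


# Made-up activity log: (identity, observed location)
events = [
    ("alice@example.org", "home"),
    ("bob@example.org", "gym"),
    ("alice@example.org", "office"),
]
pseudonymized = [(pseudonym(who), where) for who, where in events]
profiles = group_by_pseudonym(pseudonymized)
print(profiles)
```

The names are gone from `pseudonymized`, but both of Alice's locations still end up in one profile, which is exactly the information an adversary could use to re-identify her.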
Measuring data protection
There is no general standard method available. Many different metrics have been proposed
to measure different aspects of data protection in different contexts. E.g., for location
data, a metric of data protection is the radius within which you know the person to be (if
you know the location to within 10 m, it is less protected than if you know it to within
10 km).
Often, these types of metrics describe the protection against a given type of adversary
(e.g., one who tries to reveal the person, change the data, etc.).
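As a rough illustration of the radius metric, the sketch below coarsens a position by snapping it to the centre of a square grid cell and reports the resulting uncertainty radius. Positions are assumed to be plain (x, y) coordinates in metres on a flat plane, and the function names are hypothetical; real location data would need proper geodesy.

```python
import math


def coarsen(position, cell_size):
    """Replace a position by the centre of the grid cell that contains it."""
    x, y = position
    return (
        (x // cell_size + 0.5) * cell_size,
        (y // cell_size + 0.5) * cell_size,
    )


def uncertainty_radius(cell_size):
    """Largest possible distance between the true and the reported position."""
    return cell_size * math.sqrt(2) / 2


true_position = (1234.0, 5678.0)          # made-up coordinates in metres
reported = coarsen(true_position, 1000)   # 1 km grid
error = math.dist(true_position, reported)
print(reported, error, uncertainty_radius(1000))
```

A larger `cell_size` gives a larger uncertainty radius and therefore, under this metric, better protection at the cost of less precise data.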
𝑘-anonymity: refers to the number 𝑘 of people who could be behind a pseudonymized or
anonymized data record. For example, when looking at course surveys for a Computer
Science course, a data record that answered female for gender has few possible people
behind it, due to the small number of women studying computer science.
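The value of 𝑘 for a data set can be computed by grouping records on the attributes an adversary could know (often called quasi-identifiers) and taking the size of the smallest group. The sketch below does this for a made-up course survey; the field names and records are assumptions for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for all quasi-identifiers (0 for an empty data set)."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[attribute] for attribute in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Made-up survey answers for a Computer Science course
survey = [
    {"gender": "male", "rating": 4},
    {"gender": "male", "rating": 5},
    {"gender": "male", "rating": 3},
    {"gender": "male", "rating": 4},
    {"gender": "male", "rating": 2},
    {"gender": "female", "rating": 5},
]

print(k_anonymity(survey, ["gender"]))
```

With only one record answering female, 𝑘 is 1 for this data set: that rating can be attributed to a single person by anyone who knows who the one woman in the course is.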
𝑘-anonymity is not enough. It is possible that all 𝑘 entries have the same value for the
field you are interested in (e.g., all 𝑘 entries have the same disease, which is what you
want to find out) (homogeneity attack), or that you can infer the value of that field from
your background knowledge (background knowledge attack).
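A homogeneity attack can be detected by checking, per group of indistinguishable records, whether the sensitive field takes only one value. The following sketch does this on made-up medical records; the attribute names and values are assumptions for illustration.

```python
from collections import defaultdict


def homogeneous_groups(records, quasi_identifiers, sensitive):
    """Return the quasi-identifier groups in which every record has the
    same value for the sensitive attribute."""
    seen_values = defaultdict(set)
    for record in records:
        key = tuple(record[attribute] for attribute in quasi_identifiers)
        seen_values[key].add(record[sensitive])
    return [key for key, values in seen_values.items() if len(values) == 1]


# Made-up records; every group has k = 3
patients = [
    {"zip": "12**", "age": "20-29", "disease": "flu"},
    {"zip": "12**", "age": "20-29", "disease": "flu"},
    {"zip": "12**", "age": "20-29", "disease": "flu"},
    {"zip": "34**", "age": "30-39", "disease": "flu"},
    {"zip": "34**", "age": "30-39", "disease": "asthma"},
    {"zip": "34**", "age": "30-39", "disease": "diabetes"},
]

leaky = homogeneous_groups(patients, ["zip", "age"], "disease")
print(leaky)
```

Both groups contain three records, yet anyone known to be in the first group is revealed to have the flu, even though the individual record cannot be pinpointed.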
Lecture 2. Context of DPTs I
Information security is defined as “protecting information and information systems from
unauthorized access, use, disclosure, disruption, modification, or destruction in order to
provide integrity, confidentiality, and availability.”
The CIA principles denote the three principles of security:
- Confidentiality: preserving authorized restrictions on access and disclosure.
- Integrity: guarding against improper information modification or destruction.
- Availability: ensuring timely and reliable access to and use of information.
Privacy is a very vague concept and has a long history of philosophical and legal
interpretations. Different ways of looking at privacy are:
- Privacy as confidentiality: “disclosure of information is … prevented, or … in a
way that cannot be linked back to the individual”.
- Privacy as control: “providing individuals with means to control the disclosure of their
information”.
- Privacy as practice: “improving transparency and enabling identity construction”.
The privacy taxonomy has four dimensions:
- Purpose: For what purpose can the data be used?
(Single reason single time, reuse for the same single reason, reuse for a selected set
of purposes, reuse for any purpose related to the single purpose, anything)
- Visibility: Who is permitted to access the data?
(Owner, House (data collector, e.g., Google or Facebook), Third parties (authorized by
owner or the data collector), Anyone)
- Granularity: What level of detail is made available?
(Existential (the fact that the data exists), partial (parts of the data are available,
e.g., a part of the zip code), specific (the full data))
- Retention: How long is the data kept?
(Until a given date or indefinitely)
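The four dimensions can be written down as a small data structure. The sketch below is one possible encoding, under the assumption that the options within each dimension are ordered from most to least restrictive, so that a request is acceptable when it stays at or below the policy on every dimension. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Optional


class Purpose(IntEnum):
    SINGLE = 1
    REUSE_SAME = 2
    REUSE_SELECTED = 3
    REUSE_ANY_RELATED = 4
    ANY = 5


class Visibility(IntEnum):
    OWNER = 1
    HOUSE = 2
    THIRD_PARTY = 3
    ANYONE = 4


class Granularity(IntEnum):
    EXISTENTIAL = 1
    PARTIAL = 2
    SPECIFIC = 3


@dataclass(frozen=True)
class PrivacySetting:
    purpose: Purpose
    visibility: Visibility
    granularity: Granularity
    retain_until: Optional[date]  # None means "keep indefinitely"


def permits(policy: PrivacySetting, request: PrivacySetting) -> bool:
    """True if the request is no broader than the policy on all four dimensions."""
    within_levels = (
        request.purpose <= policy.purpose
        and request.visibility <= policy.visibility
        and request.granularity <= policy.granularity
    )
    if policy.retain_until is None:
        within_retention = True
    else:
        within_retention = (
            request.retain_until is not None
            and request.retain_until <= policy.retain_until
        )
    return within_levels and within_retention


policy = PrivacySetting(Purpose.REUSE_SAME, Visibility.HOUSE,
                        Granularity.PARTIAL, date(2030, 1, 1))
ok_request = PrivacySetting(Purpose.SINGLE, Visibility.HOUSE,
                            Granularity.EXISTENTIAL, date(2026, 1, 1))
too_broad = PrivacySetting(Purpose.SINGLE, Visibility.THIRD_PARTY,
                           Granularity.PARTIAL, date(2026, 1, 1))
print(permits(policy, ok_request), permits(policy, too_broad))
```

Here `too_broad` is rejected only because it asks for third-party visibility while the policy stops at the data collector.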
Techniques to support privacy
The following figure shows the lifecycle of the design of a system and which
privacy-related activities take place in each phase. Note that privacy should also be
kept in mind during the later phases.
Privacy enhancing technologies (PETs) are also called
privacy preserving technologies (PPTs).
Personally Identifiable Information (PII) is a term used in
the USA that is very similar to (but not the same as) the
European term ‘personal data’.
Legal background
The leading source of data protection law in Europe is the GDPR (General Data
Protection Regulation), which was adopted in April 2016 and has applied since May 2018.
It replaced the Data Protection Directive. Note the difference: a directive is set at the
European level and must be transposed into national law by each member state, whereas a
regulation applies directly in all member states without national transposition.
Some basic notions and terms used in the GDPR are:
If the controller also does the processing, there is no processor but only a controller.
The GDPR applies if at least one of the following criteria is true:
- Processing takes place in the context of the activities of an establishment (e.g.,
branch or subsidiary) of the controller / processor in the EU, even if the processing
takes place outside the EU.
- Processing takes place relating to goods and services offered to data subjects in the
EU.
- Processing is related to monitoring of data subjects in the EU.
- The controller is established in a place where member state law applies by virtue of
public international law (e.g., the embassy of a member state).
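The "at least one criterion" structure of this list can be captured in a few lines. The sketch below is only a restatement of the four criteria as a boolean check, not a legal test; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingContext:
    # Controller/processor has an establishment in the EU whose activities
    # form the context of the processing (wherever the processing happens)
    eu_establishment: bool
    # Goods or services are offered to data subjects in the EU
    offers_goods_or_services_in_eu: bool
    # Data subjects in the EU are monitored
    monitors_subjects_in_eu: bool
    # Member state law applies due to public international law (e.g., an embassy)
    member_state_law_via_international_law: bool


def gdpr_applies(context: ProcessingContext) -> bool:
    """The GDPR applies as soon as at least one criterion holds."""
    return any((
        context.eu_establishment,
        context.offers_goods_or_services_in_eu,
        context.monitors_subjects_in_eu,
        context.member_state_law_via_international_law,
    ))


# Made-up example: a shop outside the EU that sells to customers in the EU
webshop = ProcessingContext(False, True, False, False)
print(gdpr_applies(webshop))
```

The example shop has no establishment in the EU, but offering goods to data subjects in the EU is sufficient on its own.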
There are 7 principles relating to processing of personal data:
The GDPR mentions multiple legal bases to process personal data: