Data Analytics and Privacy
Week 1 - Introduction
• Chapter 1 from ‘Data Science and Analytics with Python’ by J. Rogel-Salazar
• Lecture - 01/11/21 - Fundamentals of Data Analytics/ Science
Week 2 - Evolution of privacy & data protection: the interplay of society, technology, and law
• Chapter 1, paragraphs 1.1 and 1.2 of the FRA Handbook on European Data Protection Law
• Lecture - 08/11/21
Week 3 - GDPR : basic portions and fundamental principles
• Page 115-159 of the FRA Handbook on European Data Protection Law
• Lecture - 15/11/21
Week 4 - Transparency
• Circumvention by design - dark patterns in cookie consent for online news outlets
• Lecture 4 - 22/11/21
Week 5 - Privacy by Design Strategies and Implementation
• Resolution No. 984/2018
• ENISA report on Privacy by Design in big data
• Lecture 5 - 29/11/21
Week 6 - The principles of purpose limitation and lawful grounds for government processing
• Handbook on European Data Protection law
• Tijmen H.A. Wisman and Rein J.L. Tijm, Radical Reframing of the Purpose Limitation Principle
• Lecture 6 - 06/12/21
Week 7 - Risk Profiling and Automated Decision-Making
• Handbook on European Data Protection law
• Lilian Edwards and Michael Veale, Enslaving the algorithm: from a ‘right to an explanation’ to a ‘right to better decisions’?
• Lecture 7 - 13/12/21
Week 1 - Introduction
• Chapter 1 from ‘Data Science and Analytics with Python’ by J. Rogel-Salazar
Python : popular programming language available for various platforms and widely used both in
business and academia
Data require tools that enable us to draw conclusions and make decisions based on the evidence they provide.
Chapter covers :
- What data science is and how it is related to various disciplines
- Characteristics of a good data scientist
- Composition of a data science team
- Overview of the typical workflow in a data science and analytics project
- The trials and tribulations in the work cycle of a data scientist
Statistics : originally understood as the analysis and interpretation of information about states
Science : organised knowledge, based on testable evidence and predictions
Use of data : evidence to support decision making
Data science is not simply the direct use of statistics or the systematisation of data.
What is data science ?
-> extraction of knowledge and insight from various sources of data, and the skills required to
achieve this range from programming to design, and from mathematics to storytelling.
-> data science and analytics is a sort of portmanteau for a number of overlapping tasks related
to data - from collection, provision, and preparation, analysis and visualisation, curation and
storage - that exploit tools from empirical sciences, mathematics, business intelligence, machine
learning and artificial intelligence.
-> aim of these tasks is to enable effective, pragmatic and actionable decisions
Careful storage and analysis of data delivers a very competitive edge.
Examples of data science products best explained by the questions they aim to answer, e.g. :
- What product will sell better in conjunction with another popular product? -> market basket
analysis
- Who will be declared Prime Minister (or President, or winner; depending on the flavour of the
government system of interest) in the next general election? -> predictive analytics
- How can customers be encouraged to spend a longer time in an online portal? -> e-commerce
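Market basket analysis is commonly expressed through the support and confidence of item pairs: how often two products appear together, and how often buying one goes with buying the other. The following is a minimal Python sketch, assuming a tiny hypothetical list of baskets; `TRANSACTIONS` and `pair_rules` are illustrative names, not taken from the chapter.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions: each set is one customer's basket.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]


def pair_rules(transactions):
    """Return {(a, b): (support, confidence)} for every ordered item pair a -> b.

    support(a, b)     = share of baskets containing both a and b
    confidence(a->b)  = baskets with a and b / baskets with a
    """
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in basket)
    pair_counts = Counter(
        pair
        for basket in transactions
        for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), both in pair_counts.items():
        support = both / n
        rules[(a, b)] = (support, both / item_counts[a])
        rules[(b, a)] = (support, both / item_counts[b])
    return rules


if __name__ == "__main__":
    rules = pair_rules(TRANSACTIONS)
    # Products bought together with "bread", ranked by confidence.
    for (a, b), (sup, conf) in sorted(rules.items(), key=lambda kv: -kv[1][1]):
        if a == "bread":
            print(f"{a} -> {b}: support={sup:.2f}, confidence={conf:.2f}")
```

Counting only pairs keeps the idea visible; on real data an algorithm such as Apriori is typically used to prune the search over larger itemsets.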
Predictive analytics does not tell us the future; instead, it allows us to forecast. For data to be
useful, it should be available and it has to be timely. Realising that data may not be fit for
answering the questions at hand is difficult, but it is important to bear in mind.
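To make the difference between a forecast and a prediction of the future concrete, here is a minimal sketch of one very simple forecasting technique, a least-squares linear trend, swapped in for illustration. It assumes the past trend simply continues; the function name `linear_trend_forecast` and the sample series are hypothetical.

```python
def linear_trend_forecast(history, steps_ahead=1):
    """Fit y = intercept + slope * t by ordinary least squares on t = 0..n-1,
    then extrapolate `steps_ahead` periods past the last observation.

    The result is only an estimate under the assumption that the trend holds.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, history)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n - 1 + k) for k in range(1, steps_ahead + 1)]


if __name__ == "__main__":
    # Hypothetical weekly sales figures.
    print(linear_trend_forecast([10, 12, 14, 16], steps_ahead=2))
```

If the data are stale or not fit for the question, the fitted line will still produce numbers, which is exactly why timeliness and fitness of the data matter.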
Data science process is iterative.
Steps that a data science project may follow :
- Question identification : without a clear question, there is no insight. Breaking down the
problem into smaller questions is useful
- Data acquisition : identify appropriate sources of suitable and useful data
- Data munging : if there is no insight without a question, then there is no data without data
munging -> the most time-consuming task
- Model construction : every model needs to be evaluated, in terms of effectiveness and
accuracy, against the testing dataset to decide whether the model is suitable for deployment
- Representation : data visualisation is more of an art than a science -> data representation
should be accurate, simple and provide clarity to the story being communicated
- Interaction : the effectiveness of a model needs to be monitored. Think of this process as an
upward spiral where constant iterations provide improvement and new insights, as new and
follow-up questions arise naturally through the process
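The model construction step can be illustrated with a hold-out evaluation: fit on a training set, measure on a separate testing set, and only then decide on deployment. This is a minimal sketch; the data, the threshold "model", the 0.9 deployment bar and all function names are hypothetical, chosen only to keep the example small.

```python
import random

# Hypothetical labelled data: (feature, label) pairs where label = 1 if x >= 5.
DATA = [(x, int(x >= 5)) for x in range(10)]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of `rows` and hold out a fraction of it as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(threshold, rows):
    """Share of rows where 'predict 1 if x >= threshold' matches the label."""
    correct = sum((x >= threshold) == bool(y) for x, y in rows)
    return correct / len(rows)


def fit_threshold(train):
    """Toy model: choose the threshold with the best accuracy on the training set."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))


if __name__ == "__main__":
    train, test = train_test_split(DATA)
    threshold = fit_threshold(train)
    test_acc = accuracy(threshold, test)
    # Hypothetical deployment bar: deploy only if held-out accuracy is high enough.
    print(f"threshold={threshold}, test accuracy={test_acc:.2f}, deploy={test_acc >= 0.9}")
```

Evaluating on rows the model never saw is what guards against a model that merely memorises its training data; in the iterative view above, a failed check sends the project back round the spiral.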
The data science team and its stakeholders should always have the following questions in mind:
- What data was used and why?
- Where was the data acquired from and who owns it?
- Was the entire dataset used? Is a sample representative of the entire population?
- Were there any outliers? Have they been considered in the analysis?
- What assumptions were made when applying the model/algorithm? Are they easily relaxed/
strengthened?
- What does the result of the model mean to the process/business/product?
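The outlier question can be made concrete with one common rule of thumb, Tukey's fences based on the interquartile range (IQR). The notes do not prescribe a method, so this is a swapped-in technique shown as a minimal sketch; `iqr_outliers` and the sample values are hypothetical. Whether flagged values are removed, kept or investigated remains an analysis decision that should be recorded.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical daily order counts with one suspicious spike.
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```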
• Lecture - 01/11/21 - Fundamentals of Data Analytics/ Science
Data analytics vs data science
-> Data analysis is a process of inspecting, cleaning, transforming and modelling data with the
goal of discovering useful information, informing conclusions and supporting decision-making.
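The four stages in this definition (inspecting, cleaning, transforming, modelling) can be sketched as separate steps. This is a minimal illustration assuming hypothetical customer spend records; the field names, the min-max scaling and the mean as the "model" are illustrative choices, not part of the lecture.

```python
# Hypothetical raw records: monthly spend per customer, with missing/bad entries.
RAW = [
    {"customer": "a", "spend": "120"},
    {"customer": "b", "spend": ""},
    {"customer": "c", "spend": "80"},
    {"customer": "d", "spend": "n/a"},
    {"customer": "e", "spend": "100"},
]


def _is_number(text):
    """True if `text` parses as a float."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def inspect_rows(rows):
    """Inspect: count rows whose 'spend' field is not a valid number."""
    return sum(1 for r in rows if not _is_number(r["spend"]))


def clean(rows):
    """Clean: drop invalid rows and convert 'spend' to float."""
    return [{**r, "spend": float(r["spend"])} for r in rows if _is_number(r["spend"])]


def transform(rows):
    """Transform: add 'scaled', the min-max rescaling of 'spend' into [0, 1]."""
    lo = min(r["spend"] for r in rows)
    hi = max(r["spend"] for r in rows)
    span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
    return [{**r, "scaled": (r["spend"] - lo) / span} for r in rows]


def model(rows):
    """Model: the simplest possible summary, the mean spend."""
    return sum(r["spend"] for r in rows) / len(rows)


if __name__ == "__main__":
    cleaned = clean(RAW)
    print("invalid rows:", inspect_rows(RAW))
    print("scaled:", [r["scaled"] for r in transform(cleaned)])
    print("mean spend:", model(cleaned))
```

Keeping each stage as its own function makes it easy to report what was dropped or changed, which feeds directly into the "Was the entire dataset used?" question.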
-> Data science is an interdisciplinary field that uses scientific methods, processes, algorithms
and systems to extract knowledge and insights from structured and unstructured data.
Data analytics -> data related tasks, from collection, preparation, analysis and visualisation, to
curation and storage
E.g. movie recommendation system, customer segmentation, sentiment analysis model, credit card
fraud detection
Chart :
- Summarises what we can do with data
- The higher we move up, the more useful the conclusions we can draw from data become
What about data ?
Big data is data with 3Vs :
1. Volume : enormous amounts of data
2. Velocity : real time stream of data
3. Variety : data from a range of sensors, with different types
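Velocity means data may arrive faster than it can be stored and analysed afterwards, so it is often processed record by record. As a minimal sketch of that idea, the running mean below updates in constant memory without keeping the stream; `running_mean` and the simulated sensor are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, using O(1) memory."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental update, no history kept
        yield mean


def simulated_sensor(n: int) -> Iterator[float]:
    """Hypothetical sensor feed: yields readings one at a time."""
    for i in range(n):
        yield 20.0 + (i % 3)


if __name__ == "__main__":
    for current in running_mean(simulated_sensor(6)):
        print(f"{current:.2f}")
```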
What makes the privacy of Big Data a different problem from traditional privacy?
The scale makes it a different problem. There is a lack of control and transparency. Data can also
be reused, and it enables inference and re-identification.
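Re-identification often needs no names at all: if an "anonymised" dataset keeps quasi-identifiers such as postcode, birth year and sex, joining it with a public dataset on those attributes can single people out. The following is a minimal sketch of such a linkage attack; both datasets, the chosen quasi-identifiers and the `link` function are hypothetical.

```python
# Hypothetical "anonymised" records: names removed, quasi-identifiers kept.
ANONYMISED = [
    {"zip": "1012", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "1012", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "3511", "birth_year": 1985, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public dataset (e.g. a membership list) with names.
PUBLIC = [
    {"name": "Alice", "zip": "1012", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "zip": "1012", "birth_year": 1990, "sex": "M"},
    {"name": "Carol", "zip": "9999", "birth_year": 1970, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(anonymised, public, keys=QUASI_IDENTIFIERS):
    """Re-identify records whose quasi-identifier combination matches
    exactly one named person in the public dataset."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for record in anonymised:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # unique match: the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches


if __name__ == "__main__":
    print(link(ANONYMISED, PUBLIC))
```

At Big Data scale, more datasets are available to join against, so the chance that some combination of attributes is unique grows, which is part of why scale changes the privacy problem.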