Summary: Little Bites of Big Data for Public Policy (ISBN 9781506383507), Governance and Policymaking (620089-M-6), Tilburg University
Summary Governance and Policymaking
LECTURE 1
Introduction to data science in the public sector
What is public sector data science?
Helping governments to improve governance and make society better by using data science.
Rewind: statistics and the state
There is a long history between the state and the use of data/statistics. This is discussed in Woolf’s (1989) article, which explains historically why the state came to use data:
- At first it was more or less by coincidence: church data or insurance data happened to be useful for policymakers;
- In the 17th-18th century mainly descriptive statistics were used;
- In the 19th century mainly predictive statistics were used;
- Later, even more sophisticated data gathering and analysis became popular: the internet of things, big data, artificial intelligence, etcetera.
Woolf: statistics have been the compass of statesmen at least since the 19th century. This leads to the expectation that governments also do better. Is this actually the case? Not always, according to Kettl (chapter 1):
- We think that governments do not perform well;
- We think that in order to perform better, they need more and better evidence;
- BUT, knowing is difficult.
Is big data the solution then?
- Nowadays everything we do generates information (internet of things);
- Much more data is available (big data);
- Can this be digested into insights that improve government decisions?
Kettl: although big data has potential, it is not that easy. He mentions three major challenges:
1. We do not know everything and we never can;
2. Some of what we know is wrong;
3. Policymakers are not bound by evidence to make decisions.
We need to take into account the law of supply and demand and close the supply-demand gap. Kettl mentions five principles
that need to be taken into account:
1. Evidence is of no use unless its consumers want and use it;
2. Get the story and get it right;
3. Capture what the evidence says in clear language;
4. Sell the story, make the evidence convincing;
5. Evidence has to speak above the noise.
What is big data?
In Kitchin’s article big data is characterized as follows:
- 3 V’s (velocity, volume, variety);
- Exhaustive;
- Relational;
- Flexible.
Big data is not so much about “bigness”, but more about how the data is being produced. This is what sets big data apart from regular data.
Kitchin: big data creates a whole new epistemological approach (see graph for a refresher about ontology and epistemology). We are currently entering the fourth epistemology. This is kind of the end of theory: we no longer need to test hypotheses; we just use big data to see what correlations we can find.
Anderson (2008): petabytes allow us to say “correlation is enough.” We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all. There is no reason to cling to our old ways.
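To make this “correlation is enough” idea concrete, here is a minimal sketch (invented variable names and simulated data, not a real policy dataset): instead of testing a hypothesis, we simply scan all pairwise correlations in a dataset and only afterwards ask what the strongest ones might mean.

```python
# Hypothesis-free pattern finding: rank all pairwise correlations in a dataset.
# The indicators below are hypothetical placeholders, simulated for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
traffic = rng.normal(100, 20, 500)
df = pd.DataFrame({
    "traffic_volume": traffic,
    "air_quality_idx": 80 - 0.3 * traffic + rng.normal(0, 5, 500),  # built-in link
    "er_visits": rng.normal(30, 5, 500),                            # pure noise
})

# Compute every pairwise correlation and rank by absolute strength.
corr = df.corr().abs()
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)  # keep each pair once
pairs = corr.where(upper).stack().sort_values(ascending=False)
print(pairs)  # the "patterns" an analyst would then try to interpret
```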
An example of the end of theory is Spotify’s Discover Weekly: it recommends music you might like based on algorithms, not theories. It is completely based on data, and Spotify fine-tunes it based on whether you actually listen to the Discover Weekly tracks.
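As a rough illustration of such data-only recommendation (this is not Spotify’s actual algorithm, just a hypothetical sketch), item-item collaborative filtering recommends tracks that co-occur with what a user already plays, using nothing but the play-count matrix:

```python
# Item-item collaborative filtering on a tiny, made-up play-count matrix.
import numpy as np

# Hypothetical play counts: rows = users, columns = tracks.
plays = np.array([
    [5, 3, 0, 0],
    [4, 0, 0, 1],
    [0, 0, 4, 5],
    [0, 1, 5, 4],
], dtype=float)

# Cosine similarity between tracks (columns), derived purely from the data.
norms = np.linalg.norm(plays, axis=0)
sim = (plays.T @ plays) / np.outer(norms, norms)

# Score unheard tracks for user 0 by similarity to what they already play.
user = plays[0]
scores = sim @ user
scores[user > 0] = -np.inf          # never re-recommend tracks already played
print("recommend track:", int(np.argmax(scores)))
```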
This new epistemological approach is characterized as being:
- Exhaustive;
- View from nowhere;
- Free from bias;
- Speak for themselves.
This sounds a little bit extreme; therefore Kitchin argues that we should not push it this far yet, since there are some flaws:
- All these data can never cover all billions of people in the world;
- There are still people behind the collection of this data, so it is never free of bias;
Weapons of math destruction: the models are opaque and only as good as the data on which they are based. If the data is biased, the model is also biased and can reinforce discrimination (see the sketch after this list).
- Data does not speak completely for itself. It is never context free and you always need to interpret it.
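A toy sketch of that “weapons of math destruction” point (simulated data, hypothetical variable names): if historical decisions in the training data were discriminatory, a model fitted on them picks up the discrimination as if it were a neutral pattern.

```python
# A model trained on biased historical decisions learns and reproduces the bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, n)                 # protected attribute (0/1), hypothetical
skill = rng.normal(0, 1, n)                   # what we actually care about
# Historical hiring decisions: partly skill, partly discrimination against group 1.
hired = (skill + 1.5 * (group == 0) + rng.normal(0, 0.5, n) > 0.5).astype(int)

model = LogisticRegression().fit(np.column_stack([skill, group]), hired)
print("weight on skill:", round(model.coef_[0][0], 2))
print("weight on group:", round(model.coef_[0][1], 2))  # non-zero => bias learned
```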
Instead of this epistemological approach, we should be a bit more modest and focus on data-driven science, in which we should be:
- Abductive (forming a conclusion from the information that is known);
- Reflexive.
If you have trouble differentiating deduction, induction, and abduction, thinking about their roots might help. All three words
are based on Latin ducere, meaning "to lead." The prefix de- means "from," and deduction derives from generally accepted
statements or facts. The prefix in- means "to" or "toward," and induction leads you to a generalization. The prefix ab- means
"away," and you take away the best explanation in abduction.
Discussion questions
In the conclusion, Woolf laments the assumed neutrality associated with statistics:
- Why is statistics not neutral according to Woolf? Do you agree?
o It is not neutral since it depends on the quality of the data collection and the context the data is collected in
(the type of country, democratic or not).
- Can the use of big data lead to more neutrality? Why/not according to you?
- What would Kitchin have to say about this?
o Even when it comes to big data, the data that is available is not exhaustive for everyone and everything.
Therefore, you should look at the data source and characteristics of your data.
What makes big data different from other data?
- What does Kitchin have to say about this?
o 3 V’s;
o Not so much the “bigness”, but the way it is produced.
- Is policymaking with big data easier than with regular data? Why/not?
o It can have potential regarding the accuracy of the results: more data means more reliable results;
o However, the results are only as good as the data they are based on;
o It can also bring different difficulties, such as bias, privacy concerns etcetera.
Woolf (1989) – Statistics and the Modern State
The study of statistics is integral to the development of the modern state and modern society. If statistics and administration
nowadays appear tied by an umbilical cord, historically this was not always the case:
- In the 17th century the earliest mathematically sophisticated utilization of statistics was mainly a byproduct of its use in other branches, for example insurance companies and churches;
- 17th – 18th century: descriptive statistics.
- 19th century: predictive statistics. The sources of statistical innovation are multiple and often unexpected, reflecting the
very complexity of the motivation and orientation of activities in civil society as much as the interest of the state
- Now: even more sophisticated data gathering and analysis techniques (IoT, Big Data, AI).
In the period of intense statistical activities that laid the basis for “modern” quantitative enquiries in the western world, the
general direction behind these developments differed fundamentally from nation to nation. The fundamental difference is
reflected in the definitions of the term itself. Germany, for example, defined statistics as the “science dealing with the facts of a
state”. Statistics meant the ordered and numerical description of all aspects of the state. In such definitions, statistics was
understood as quintessentially descriptive. Its purpose was to collect and classify information about the material world.
Statistics long remained as distant from economic theory as it was from mathematics. The opposition between the statisticians
and economists was profound and helps explain why statistics continued to be rejected by successive schools of economists well
into the nineteenth century.
The genealogy of statistics
In recent years the moral statistics of the mid-nineteenth century has attracted considerable attention. This is not surprising
given the apparent prominence in all advanced western societies of these surveys of aspects of social life, and the identification
made by both their practitioners and later researchers between these statistics and the origin of modern social science.
The rapid development of moral sciences from the late 18th century led to a huge growth in demand for information about
society. Alongside its status as a scientific tool of observation and classification, the rapid success of statistics at the turn of the
18th century can be attributed to its almost structural penchant for environmental explanations.
The connection between statistics and the state
What initially appeared as a major but straightforward administrative task of assembling facts in numerical form to provide
description of a virtually unchanging society was rapidly brought up against the realities of the dynamics of economic and social
change.
Conclusion
In the history of statistics since the 17th and 18th centuries, it is evident that over the long-term national differences gave way to
similarities in the adoption and diffusion of statistical approaches. A fundamental distinction can be made between liberal or
democratic countries and those with a strong state tradition. In the former, educated elites in society forced the pace to make
their administrations extend their statistical activities. In the latter, the state was always the major sponsor and agent of
statistical methods and usually refused to render the information public.
These differences between elite and state initiatives became more blurred as awareness circulated rapidly of the utility of statistics to demonstrate social and economic explanations, to shape policy, and to enact reforms. By the late 19th century
statistics was absorbed into a mode of thought and argument in both the social sciences and public policy in all western polities
irrespective of their political diversity. A cause and consequence of this has been the erroneous assumption of neutrality attributed to
statistics.
Kettl (2017)
Chapter 1 – Knowing better
For all the battles we have about public policy, we can all agree on at least two things:
1. We can do much better;
2. One way to do better is to know better what to do.
However, knowing turns out to be a lot harder than it looks. Research that Dunning conducted with one of his students, Kruger, found that incompetent people cannot see how incompetent they are. In his research, Dunning asked people their opinion about non-existent world events. None of us knows everything, so the Dunning-Kruger effect applies to all of us.
This complicates the two foundations of our problems with a third one:
3. When it comes to public policy, we do not think the government does well. We think it can and it should do better. We
think we can do better by knowing more, but we think we know more than we do, often do not recognize that we do
not know and think that those who disagree with us are idiots.
The problem of figuring out what we know and then determining what to do about it is getting bigger, and the problem is
growing faster than we can keep up. We are awash in an accelerating supply of information, which we call the big data
movement. All of this information generates huge piles of big data that can be digested into insights that can improve our
decisions. In some cases we can use traditional statistics to wrestle these data into meaning, but in far more cases we need new
and better tools, which can provide better insights. Making good sense of all the data is the goal of this book.
Doing without knowing (everything)
Decision makers do not always follow what we know, and their decisions certainly do not always lead us in the right direction.
Consider, for example, the war in Iraq. The United States argued that it needed to go to war against Iraq because Iraq was stockpiling weapons of mass destruction and was on the verge of using them. In fact, there were no weapons of mass destruction. The analysts who made the case for war overstated the evidence.
What accounts for such problems, here and in other cases? There are three challenges:
1. We do not know everything – and we never can;
2. Some of what we know is wrong;
3. We do not need evidence to make decisions: policymakers do not always listen to the policy analysts. They often argue
that policy analysts tend to overestimate the amount and distinctiveness of the information for social problem solving.
For some problems, people will always depend heavily on ordinary knowledge – information that flows from experience
and common sense. It is not always clear to the policymakers what value policy analysis adds.
The law of supply and demand
These challenges lead to a central fact: no matter how much evidence analysts slide before the government’s policymakers, they
will not use it unless it is useful to them. Unless policymakers want it and use it, producing more of it will not affect policy one
bit.
There is an understandable dilemma at the core of much analysis about government. Analysts look at government and know it can be better – and they are right – but too often they are frustrated. Analysts’ answers are not always the ones that policymakers find easy to accept. Sometimes the analysts do not have enough contact with the policymakers to know what problems most need analysis. Therefore, a lot of analysts who focus on supplying analysis get discouraged by the gap between the things they say and the actions that policy officials take. This is called the supply-side problem.
On the other hand, we have the demand-side problem. Policymakers sometimes have little patience for the rigor and arcane methods of policy analysis. Additionally, analysts often speak in a language that policymakers cannot translate. While policymakers may not say it, they often trust their own instincts more than the studies of the analysts.
This produces a gap between the supply side and the demand side of the analysis: analysts spend a lot of time producing evidence that policymakers do not use, and policymakers spend a lot of time making mistakes that better evidence could help them avoid.
We can do better if we know better. We can know better about what works, but too often there is a gap between the
knowing and the doing.
Making evidence speak
Making policy better requires closing the supply-demand gap in public policy evidence. We need to find a balance between the
evidence that analysts supply and the evidence that policymakers demand. This leads to the following five principles:
1. Evidence is of no use to anyone unless its consumers want it and use it. This is the challenge of balancing supply and
demand;
2. It is important to get the story, and get it right. This is the challenge of data analytics;
3. Capture what the evidence says in clear language;
4. Sell the story: make the evidence convincing;
5. Evidence has to speak above the noise.