Data-driven Control
Chapter 5: Introduction to Data Analytics in Accounting
Introduction
Data is proliferating at an exponential rate. This data proliferation is caused by increasing
computer processing power, increasing storage capacity and increasing bandwidth.
To understand the scope of the data revolution, it is important to consider the four V’s of big
data: volume, velocity, variety and veracity.
Big data: data sets characterized by huge amounts (volume) of frequently updated data
(velocity) in various formats (variety), for which the quality may be suspect (veracity).
- Data volume: the amount of data created and stored by an organization.
- Data velocity: the pace at which data is created and stored.
- Data variety: the different forms data can take.
- Data veracity: the quality or trustworthiness of data.
To be successful in the future with big data, it is important to understand more than tools and
techniques; it is critical to develop an appropriate mindset that allows you to think about data
holistically. A mindset is a mental attitude, a way of thinking, or frame of mind. Mindsets are
powerful collections of beliefs and thoughts that shape how you think and feel and what you
do. In the accounting domain, one critical mindset for future accountants to develop is the
analytics mindset: a way of thinking that centres on the correct use of data and analysis for
decision making.
According to EY, an analytics mindset is the ability to:
1. Ask the right questions
2. Extract, transform and load relevant data
3. Apply appropriate data analytic techniques
4. Interpret and share the results with stakeholders
Ask the right questions (1)
Data can be defined as facts that are collected, recorded, stored and processed by a system. To
start the process of transforming data into information, one must have a question or desired
outcome. Start by establishing objectives that are SMART:
- Specific: needs to be directed and focused to produce a meaningful answer
- Measurable: must be amenable to data analysis and thus the inputs to answering the
question must be measurable with data
- Achievable: should be able to be answered and the answer should cause a decision
maker to take an action
- Relevant: should relate to the objectives of the organization or the situation under
consideration
- Timely: must have a defined time horizon for answering
Extract, transform and load relevant data (2)
ETL process: a set of procedures for blending data. The acronym stands for extract,
transform and load data. The ETL process is often the most time-consuming part of the
analytics mindset process. One of the reasons the ETL process can be so time-consuming is
that the process typically differs for every different program, database, or system that stores or
uses data.
Extracting data
Extracting data is the first step in the ETL process. It has three steps:
1. Understand data needs and the data available
After defining the needed data, the next step is to understand the data itself, which entails
understanding things like location, accessibility, and structure of the data.
o Structured data: data that is highly organized and fits in fixed fields. For
example, accounting data like a general ledger, data in a relational database
and most spreadsheets.
o Unstructured data: data that has no uniform structure. For example: images,
audio files, documents, social media and presentations.
o Semi-structured: data that has some organization but is not fully organized to
be inserted into a relational database. Examples include data stored in CSV or
XML files, or various forms of streamed data.
Data warehouses typically store only structured data or data that has been transformed to
structured data.
Given the immense size of data warehouses, it is often more efficient to process data in
smaller data repositories holding structured data for a subset of an organization, called data
marts.
Data lake: collection of structured, semi-structured and unstructured data stored in a single
location. The size of data lakes can cause problems if they become so large that important
data becomes dark data. Dark data is information the organization has collected and stored
that would be useful for analysis but is not analysed and is thus generally ignored.
Data lakes can also become data swamps: data repositories that are not accurately
documented so that the stored data cannot be properly identified and analysed.
Typically, the best way to understand the structure of data is to consult the data dictionary.
The data dictionary should contain metadata, which is data that describes other data.
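A data dictionary can be as simple as a table of metadata about each field. A minimal sketch (Python with pandas; the field names and descriptions are hypothetical):

import pandas as pd

# Hypothetical data dictionary: metadata describing the fields of an invoices table.
data_dictionary = pd.DataFrame([
    {"field": "invoice_id", "type": "text", "description": "Unique invoice number"},
    {"field": "invoice_date", "type": "date", "description": "Date the invoice was issued"},
    {"field": "amount", "type": "decimal", "description": "Invoice total including VAT"},
])
print(data_dictionary.to_string(index=False))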
2. Perform the data extraction
Data extraction may require receiving permission from the data owner. The data owner is the
person or function in the organization who is accountable for the data and can give permission
to access and analyse the data.
With permission from the data owner, the data will then need to be extracted into separate
files or into a flat file. A flat file is a text file that contains data from multiple tables or sources
and merges that data into a single row. When including data in a flat file, a delimiter (field
separator) needs to be used to distinguish fields on a single line. A delimiter is a character, or
series of characters, that marks the end of one field and the beginning of the next field.
Examples are a pipe delimiter (recommended), commas and tabs.
A text qualifier is two characters that indicate the beginning and ending of a field and tell the
program to ignore any delimiters contained between the characters.
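A minimal sketch of reading a pipe-delimited flat file with double quotes as the text qualifier (Python's csv module; the customer records are made up). The quotes tell the parser to ignore the pipe that appears inside the second customer name:

import csv
import io

# Hypothetical flat file: pipe as delimiter, double quotes as text qualifier.
flat_file = (
    "customer_id|customer_name|balance\n"
    '1001|"Smit | Zonen BV"|2500.00\n'
    "1002|Jansen BV|830.25\n"
)

reader = csv.reader(io.StringIO(flat_file), delimiter="|", quotechar='"')
for row in reader:
    print(row)  # the quoted pipe stays inside the name; it is not treated as a field break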
3. Verify the data extraction quality and document what you have done
Once the data has been extracted, it is best practice to verify that the extraction was complete,
accurate and approved. An additional verification step used by auditors is to reperform the
data extraction for a sample of records and compare the smaller data extract with the full data
extract. The final data extraction best practice is to create a new data dictionary containing all
of the information about the fields in the data extraction.
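A sketch of the auditor-style verification step, assuming both extracts are available as CSV files sharing an invoice_id key and an amount field (the file and column names are assumptions):

import pandas as pd

# Assumed file names: the full extract and a re-extracted sample of records.
full_extract = pd.read_csv("full_extract.csv")
sample_extract = pd.read_csv("sample_extract.csv")

# Match each sampled record to the full extract on the (assumed) key field.
merged = sample_extract.merge(full_extract, on="invoice_id", suffixes=("_sample", "_full"))

# Flag records where the two extractions disagree on the amount.
mismatches = merged[merged["amount_sample"] != merged["amount_full"]]
print(f"{len(mismatches)} mismatching records out of {len(sample_extract)} sampled")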
Transforming data
Given the amount of time spent on and complexity of transforming data, chapter 6 discusses it
in detail. It has four steps:
1. Understand the data and the desired outcome
o A data dictionary should be created if there isn’t one.
o Also important to understand the specifications for the transformed data.
2. Standardize, structure and clean the data
o Most time-consuming part of the transformation process.
o Transforming data requires being resourceful and creative in understanding
and fixing all the issues that exist in the data (see the sketch after this list).
3. Validate data quality and the desired outcome
o It is critical to validate the data after it has been transformed to make sure the
data is free from errors.
o Making sure the data meets the data requirements is critical to a successful
transformation process.
4. Document the transformation process
o Update the data dictionary once again.
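A minimal sketch of step 2 (standardize, structure and clean) on a small, made-up extract (Python with pandas):

import pandas as pd

# Hypothetical extract with inconsistent text, dates stored as text and a missing amount.
df = pd.DataFrame({
    "customer": [" Alpha BV", "beta bv", "Alpha BV "],
    "invoice_date": ["15/01/2023", "01/02/2023", "28/02/2023"],
    "amount": ["1,250.00", "980.5", None],
})

# Standardize text: remove stray whitespace and use a consistent case.
df["customer"] = df["customer"].str.strip().str.upper()

# Structure dates as real date values instead of text.
df["invoice_date"] = pd.to_datetime(df["invoice_date"], format="%d/%m/%Y")

# Clean numbers: drop the thousands separator and convert to numeric;
# the missing value stays NaN so it can be checked in the validation step.
df["amount"] = pd.to_numeric(df["amount"].str.replace(",", ""), errors="coerce")
print(df)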
Loading data
If the data has been properly transformed, this process is relatively quick and easy. However,
there are a few important considerations when loading the data.
1. The transformed data must be in a format and structure acceptable to the
receiving software.
2. Programs may treat some data formats differently than expected. It is important to
understand how the new program will interpret data formats (see the sketch below).
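A sketch of this second consideration: the same field can be interpreted differently depending on how it is loaded (Python with pandas; the account codes are made up):

import io
import pandas as pd

# Hypothetical extract in which account codes start with a leading zero.
csv_text = "account_code,amount\n0450,100.00\n0720,250.00\n"

as_numbers = pd.read_csv(io.StringIO(csv_text))                            # default interpretation
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"account_code": str})  # explicit text format

print(as_numbers["account_code"].tolist())  # [450, 720]       -> leading zeros silently lost
print(as_text["account_code"].tolist())     # ['0450', '0720'] -> preserved as intended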
Once the data is successfully loaded into the new program, it is important to update or create a
new data dictionary, as the person who loaded the data often is not available to answer
questions when the data is used.
Apply appropriate data analytic techniques (3)
There are four categories of data analytics:
1. Descriptive analytics: information that results from examination of data to understand
the past; it answers the question “what happened?”.
o For example, the computation of accounting ratios, such as ROI or gross
margin (see the sketch after this list).
2. Diagnostic analytics: information that attempts to determine causal relationships; it
answers the question “why did this happen?”.
o For example, does increasing the IT budget in an organization increase
employee efficiency and effectiveness?
3. Predictive analytics: information that results from analyses that focus on predicting
the future; it answers the question “what might happen in the future?”.
o Forecasting future events like stock prices or currency exchange rates.
4. Prescriptive analytics: information that results from analyses to provide a
recommendation of what should happen; it answers the question “what should be
done?”.
o For example, the creation of algorithms that predict whether an individual or
company will pay back their loan and then make a recommendation about
whether a loan should be extended or not.
o They can be programmed to take an action. In the example, the loan may be
granted without the need for a human to review the application.
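A minimal sketch of descriptive analytics: computing gross margin per product line from made-up sales figures (Python with pandas):

import pandas as pd

# Hypothetical sales data used to answer "what happened?" with an accounting ratio.
sales = pd.DataFrame({
    "product_line": ["Hardware", "Software", "Services"],
    "revenue": [500_000, 300_000, 200_000],
    "cost_of_sales": [350_000, 120_000, 140_000],
})

# Gross margin = (revenue - cost of sales) / revenue, expressed as a percentage.
sales["gross_margin_pct"] = ((sales["revenue"] - sales["cost_of_sales"]) / sales["revenue"] * 100).round(1)
print(sales[["product_line", "gross_margin_pct"]])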
Having a working knowledge of techniques and tools means that when the supervisor asks you
to do a task, you have experience performing similar tasks, but you likely need to review how
to do something. With some effort, however, you would be able to complete the task with
little to no help from others. Mastery of content means that if your supervisor comes to you
with a project, you could immediately work on that project and have a deep understanding of
the techniques and tools necessary to complete the project.
Interpret and share the results with stakeholders (4)
Interpreting results requires human judgement. One common way people interpret results
incorrectly relates to correlation and causation. Correlation indicates that two things happen
at the same time. Causation means that the occurrence of one thing causes the occurrence of a
second thing.
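A small illustration of the difference (made-up numbers): two sales series can be almost perfectly correlated without either one causing the other, for example because both are driven by the same warm weather:

import numpy as np

# Hypothetical monthly sales: both rise together, but neither causes the other.
ice_cream_sales = np.array([20, 35, 50, 65, 80, 95])
sunglasses_sales = np.array([15, 30, 42, 60, 78, 90])

correlation = np.corrcoef(ice_cream_sales, sunglasses_sales)[0, 1]
print(round(correlation, 3))  # close to 1: strong correlation, no causation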
A second common misinterpretation results from psychological research which provides
evidence of systematic biases in the way people interpret results.
You should strive to interpret results objectively, making sure you fully understand what the
results of analyses mean. This takes training and practice to do effectively.
Sharing data analytics results with others is often called data storytelling or storytelling. Data
storytelling is the process of translating often complex data analyses into easier-to-understand
terms to enable better decision making. There are several components of a successful data
story:
1. Remember the question that initiated the analytics process. As part of generating an
appropriate question, the story designer considers the objectives of the stakeholders.
2. Consider the audience. For example, include their experience with the particular data
and data analytics in general, how much detail they will desire to answer the question
or whether the stakeholders need a quick or in-depth answer.
3. The use of data visualization. Data visualization is the use of a graphical
representation of data to convey meaning. A common way to display data
visualizations is with a data dashboard. A data dashboard is a display of important
data points, metrics and key performance indicators in easily understood data
visualizations (a small example follows this list). Good principles of visualization
design include:
o Choosing the right type of visualization
o Simplifying the presentation of data
o Emphasizing what is important
o Representing the data ethically
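A minimal visualization sketch (Python with matplotlib; the figures are made up): a single, clearly labelled bar chart of gross margin per product line, keeping the presentation simple and emphasizing only what matters:

import matplotlib.pyplot as plt

# Hypothetical gross margin percentages per product line.
product_lines = ["Hardware", "Software", "Services"]
gross_margin_pct = [30.0, 60.0, 30.0]

fig, ax = plt.subplots()
ax.bar(product_lines, gross_margin_pct)
ax.set_ylabel("Gross margin (%)")
ax.set_title("Gross margin by product line")
plt.show()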
Additional data analytics considerations
To understand data analytics, it is important to cover two additional topics.
1. Automating steps within the analytics process
Automation: the application of machines to automatically perform a task once performed by
humans. Business automation ranges on a spectrum from very basic to very complex.
o Very basic automation requires a well-defined process so that a programmer can
design all of the logic needed to perform every step of the task. Often carried out with