Lecture notes: Advanced Data Management (320102-M-6)

In this document you will find all lecture notes of Advanced Data Management.

Preview: 10 of 34 pages

  • 9 December 2021
  • 34 pages
  • 2021/2022
  • Lecture notes
  • Arash Saghafi
  • All lectures
  • Seller: kelseydeweerd
Lecture 1 26-10-2021

Learning objectives
● Explain the concept of data management
● Recognize the role of data in business
● Examine the process of analyzing the application domain
● Design a business case for DM

What is data?
Facts (measurements or statistics) used as a basis for
reasoning, inference, or analysis
● Reasoning, inference, or analysis ➔ value creation

Definition
DMBOK: “data management is the development, execution,
and supervision of plans, policies, programs and practices that
deliver, control, protect and enhance the value of data and information throughout their life
cycles”.




Advanced data management
● Understand the application/business ➔ information modelling
● Evaluate and interpret the data ➔ ontology engineering
● Comply and keep up with an ever changing landscape ➔ governance, risk
assessment, IT management
● Apply tools to generate value ➔ business intelligence and analytics

Understand the domain (DM perspective)
Modelling the domain allows us to understand the role of technology and its data
requirements.
What is a model?
● A model is a formal representation of the target domain, using constructs and
construction rules
We build models to describe a domain in unambiguous ways
● Analysis of existing domain
● Planning or designing a future state
In order to reason about the phenomena in that domain and communicate between the
stakeholders
● Also used in building a business case (communication)
Using models we can explore, observe, analyze, explain and predict phenomena in the
domain.
And build (or plan/design) artifacts that operate in the domain


Uses of models in DM
● Understand the business in order to generate value
● Data governance
● Integration and metadata management
● Improving data security and quality

A more abstract view of information systems
Information systems are models or representations of real-world phenomena and
applications (Representation Theory).
Information systems consist of three structures:
● Deep structure: meanings and facts about real-world phenomena in the form of data
and business rules
○ DM and business process management
● Surface structure: features such as user interfaces that allow users to engage with
the deep structure
○ DM and IT management / system design
● Physical structure: the infrastructure (e.g., hardware and network) that enables the
implementation of surface and deep structures
○ DM and (physical) architecture

Application of models in DM




Business case
Before committing resources, businesses need to have a (somewhat) clear picture of what
the expected costs and benefits are (van Gils 2020) ➔ businesses need to understand the
rationale for undertaking an initiative.
Conducting such an analysis is challenging because a large part of data management falls
under the category of complex systems, where a problem cannot be analyzed down to one
ultimate solution.
Conceptual models allow us to simplify and formalize the complexity.

Business case - systems thinking
To tackle the complexity, systems thinking conceptualizes the problem as:
● System made of interacting components
● Interactions happen within an environment
● The systems fulfil a goal




Two methods to analyze complex systems
● Soft Systems Methodology including Rich Pictures where the interactions and roles
are modelled
● System Dynamics where cause and effect relationships among various variables
are studied (using causal loops)

Example of Rich Picture - E-Commerce




Example of causal loops - E-Commerce




Resolution of a project can be:
● Successful: on time, on budget, with all features and functions.
● Challenged: project completed, but either over time, over budget, or with fewer
features than originally specified.
● Failed: cancelled at some point during development.
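
The three outcomes can be read as a simple decision rule. The sketch below is one hypothetical encoding; the function name and the boolean parameters are illustrative assumptions and not part of the lecture material.

```python
def project_resolution(cancelled, on_time, on_budget, all_features):
    """Classify a project outcome using the three categories above."""
    if cancelled:
        # Cancelled at some point during development.
        return "failed"
    if on_time and on_budget and all_features:
        return "successful"
    # Completed, but late, over budget, or with fewer features.
    return "challenged"


# Hypothetical project: completed with all features and on budget, but late.
late_project = project_resolution(
    cancelled=False, on_time=False, on_budget=True, all_features=True
)
```

Under this reading, a single missed criterion is enough to move a completed project from successful to challenged.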

Reasons for project failures
● Incomplete requirements
● Lack of user involvement
● Lack of resources
● Unrealistic expectations
● Lack of executive support
● Changing requirements and specifications
● Lack of planning
● Didn’t need it any longer
● Lack of IT management
● Technology illiteracy
These causes are mostly human-related: not enough time is spent measuring, comparing, and communicating.

Information Systems Research
The information systems research domain is the confluence of people, organizations, and
technology (Hevner et al. 2004).
We identify a business problem and build a solution in the form of an artifact or theory
(Gregor and Hevner 2013).
● In building a solution, we borrow from the knowledge base and
● The research outcome contributes back to the knowledge base

Contribution (Gregor and Hevner 2013)
Knowledge Contribution Framework (p. 345)




A good contribution / solution
The contribution could be either in the process or the product (e.g., faster performance, or
better results).
Paraphrasing Izak Benbasat: a good contribution in information systems has to be
● New
● True
● Interesting

Goal of the project
Think about a data management issue
● Take inspiration from the readings or a known business case
Propose / discuss / present / design a solution for that data management issue that is
● New, true, and interesting




Lecture 2 02-11-2021

Learning objectives
● Analyze the value of information within organizations
● Distinguish between different interpretations of data
● Examine and use a self-service business intelligence tool (Tableau - Lab)

Setting
Sources of information on the WWW have increased
● Web Technologies have evolved
Companies with more than 1000 employees store, on average, over 235 terabytes of digital
information.
Users have become content providers
● Social networks
● Crowdsourcing and citizen science
● …

Big data
● Big data applies to information that can’t be processed or analyzed using traditional
processes or tools
● Organizations have access to a wealth of information, but they don’t know how to get
value out of it because it is sitting in its most raw form or in a semi-structured or
unstructured format
○ Zikopoulos, Paul, and Chris Eaton. Understanding big data: Analytics for
enterprise class hadoop and streaming data. McGraw-Hill Osborne Media,
2011

Characteristics of Big Data
The 4 V’s ➔ Volume, Variety, Velocity, Veracity.
Or
● Multiple sources of data
○ Usually unstructured or semi-structured
● Multiple users
● Multiple and unanticipated applications

Challenge
Premise
● By managing the 4 V’s of the big data, better decisions could be made that may
improve the company’s competitiveness, efficiency, insight, profitability, and more.
In other words
● Value lies in extracting knowledge from data

Business intelligence (BI) is an umbrella term that includes the applications, infrastructure
and tools, and best practices that enable access to and analysis of information to improve
and optimize decisions and performance (Gartner group).
Business intelligence reveals insights from raw data.




Prediction: precision vs. recall

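$$\text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$$
The denominator is the number of all positive predictions.
$$\text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$
The denominator is the number of all positive cases in reality.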
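
As a minimal sketch of the two ratios above, the following computes precision and recall from lists of actual and predicted labels. The function name and the example data are hypothetical and chosen only for illustration.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for two equal-length lists of booleans."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    # Guard against empty denominators (no positive predictions or no positive cases).
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical data: True means "positive case".
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, True, False, False, False, False]
precision, recall = precision_recall(actual, predicted)
```

In this example there are 3 true positives, 2 false positives, and 1 false negative, so precision is 3/5 and recall is 3/4. Predicting "positive" more often tends to raise recall at the cost of precision.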




Evaluating BI knowledge
BI knowledge typically takes the form of patterns
● Patterns are recurring characteristics that could help identify a phenomenon
Pattern quality
● Objective evaluation based on statistical strengths of findings
○ E.g., 100% of MIM students have a bachelor’s degree
○ Precision and recall
● Subjective evaluation based on human judgment and depends on expectations of a
particular domain
○ Expected vs. unexpected (e.g., “sales are high in December” is an expected
pattern)
○ Actionable vs. unactionable (e.g., “Deliveries by Supplier A are always
delayed”)

Characteristics of BI data
● Historic
○ Data describing changes of a phenomenon throughout time (at least two time
points)
■ E.g., John scored 98% on his Math 101 test and 70% on Math 102
● Aggregate
○ Data representing a larger population
■ E.g., Enrolment in the IM program grew by 20% compared to last year




Common BI systems architecture
● Data store
○ Practically, a duplicate of the traditional Transaction Processing System.
Instead of overloading the Transaction Processing System, the analysis and
reporting is delegated to the Data Store (on the duplicate data)
● Data warehouse
○ A central repository (or warehouse) where all the current and historic data
across an organization is collected
■ Integrating and making sense of large amounts of disparate data is
difficult and time-consuming
● Data marts
○ A subset of data from a data warehouse where the views are tailored for
specific applications
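
The relationship between a data warehouse and a data mart can be sketched in a few lines. This is a toy illustration, not a real warehouse implementation: the records, field names, and the `build_data_mart` helper are all hypothetical.

```python
# A toy "warehouse": current and historic records from across the organization.
warehouse = [
    {"year": 2020, "department": "sales", "region": "EU", "amount": 120},
    {"year": 2021, "department": "sales", "region": "EU", "amount": 150},
    {"year": 2021, "department": "sales", "region": "US", "amount": 200},
    {"year": 2021, "department": "hr", "region": "EU", "amount": 30},
]


def build_data_mart(rows, department, columns):
    """Return the subset of rows for one department, restricted to the given columns."""
    return [
        {column: row[column] for column in columns}
        for row in rows
        if row["department"] == department
    ]


# A data mart tailored to sales reporting.
sales_mart = build_data_mart(warehouse, "sales", ["year", "region", "amount"])
total_sales_2021 = sum(row["amount"] for row in sales_mart if row["year"] == 2021)
```

The mart holds only the three sales records and only the columns the sales application needs, which is what "views tailored for specific applications" amounts to in this simplified setting.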

Common BI systems architecture




Van Gils (2020), p. 100

Human roles in BI data management
● Data owner
○ Person who is ultimately accountable for a data set and ensures data’s fitness
for the required application
■ E.g., Head of the Audit Department is accountable for the overall
quality of accounting and financial data
● Data steward
○ Person with hands-on responsibility for managing data output
■ E.g., Senior Product Manager who is directly responsible for data
related to a given product/service
● Data user
○ Uses data for applications and negotiates with the data owner for access
■ E.g., analyst working on a BI project




Current trends
● Information is democratized
● Self-service business intelligence
○ Enabling users to become more self-reliant and less dependent on the IT
organization by making BI tools easier to use ➔ efficiency and speed of
analysis
○ Downside: interpretation mistakes
● Visualization
○ A picture is worth a thousand words
○ The common language in boardrooms and social media
● Internet-of-Things (IoT)
○ Everything is connected
○ Sensors generate data around the clock
● Micro-targeting vs. macro-targeting
○ Behavioral science + Big Data Analysis




Lecture 3 09-11-2021

Recap
● Self-service business intelligence
● Visual analytics
● Big data
○ The 4 Vs
○ Data-driven decisions can positively affect companies’ competitiveness,
efficiency, profitability, and more.
○ Thus, companies are motivated to turn data into assets

Learning objectives
● Explain different data quality dimensions
● Identify data abstraction levels
● Discuss data classification
● Identify the impact of classification decisions

Information Quality (IQ)
The notion of information quality depends on how the data is applied
● Example: for financial analysis of Fortune 500 companies, data in units of thousands
of dollars would be sufficient, but for auditing the financial statements, we need
precision to the cent
Overall, information quality is conceptualized as fitness for use for specific purposes.

Information quality is usually evaluated in terms of its dimensions
● Accuracy
● Reliability / consistency
● Timeliness (currency)
● Completeness
From users’ perspective (subjective): ease of manipulation and value.

Information Quality Dimensions (Wand and Wang)
Accuracy: data represents the correct state of the real world
● Example: Ashley and John have written a test. If we enter Ashley’s mark for John in
the system, the mapping would be inaccurate.
○ Anything other than John’s grade makes the data inaccurate




Reliability: dependability of the output information, or correctness of the analyzed data
● Example: in the Tableau exercise, we saw that California had the highest amount of
sales. That is a correct statement with regard to the data.

Timeliness (currency): whether the data is up-to-date, and available on time
● Example: financial data are timely as they are updated in real-time (i.e., without
delay) and they are available to retrieve 24/7


Completeness: the ability of the information system to represent every relevant state of the
real-world system.




Figure: completeness, complete (with some redundancies) vs. incomplete

Figure: clarity (example shown: low clarity)
Figure: meaningful states (example shown: not meaningful)
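
These dimensions can be read as properties of a mapping from real-world states to information system states. The sketch below checks such a mapping. It rests on two assumptions that are not stated in the notes: that "low clarity" means several real-world states share one system state, and that a "not meaningful" state is a system state with no real-world counterpart. The function name and example states are hypothetical.

```python
def check_representation(real_world_states, system_states, mapping):
    """Inspect a mapping {real-world state: system state} for representation problems."""
    # Incomplete: a real-world state that the system cannot represent.
    unmapped = sorted(s for s in real_world_states if s not in mapping)

    # Group real-world states by the system state they map to.
    represented_by = {}
    for rw_state, sys_state in mapping.items():
        represented_by.setdefault(sys_state, []).append(rw_state)

    # Assumed reading of "low clarity": one system state stands for several real-world states.
    ambiguous = {s: sorted(rws) for s, rws in represented_by.items() if len(rws) > 1}

    # Assumed reading of "not meaningful": a system state that represents nothing.
    meaningless = sorted(s for s in system_states if s not in represented_by)

    return {
        "complete": not unmapped,
        "unmapped": unmapped,
        "ambiguous": ambiguous,
        "meaningless": meaningless,
    }


# Hypothetical grading system that cannot record an absent student.
report = check_representation(
    real_world_states={"pass", "fail", "absent"},
    system_states={"P", "F", "X"},
    mapping={"pass": "P", "fail": "F"},
)
```

In this example the system is incomplete because "absent" has no representation, and "X" is a system state that corresponds to nothing in the real world.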




Data abstraction levels
Data codifies what we know about the world in the form of facts used as a basis for
reasoning, inference, or analysis.
● Conceptual level
○ Used for understanding and communication regarding a specific application
domain and is usually technology-agnostic. As in most business cases, it
models business concepts and their relationships (covered in week 1)
● Logical level
○ Deciding how to structure the data so that it becomes suitable for the
application in the information system
● Physical level
○ Considers how data is stored and transmitted between systems and takes the
technology infrastructure into consideration

Logical level
Traditionally, structure is imposed on data in the form of classification
● Classification is an abstraction mechanism used to represent phenomena with
common properties
Classes are supposed to provide cognitive economy
● Maximize the information that can be inferred about the phenomenon (Parsons and
Wand 2012)
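
One way to picture cognitive economy is to treat a class as a set of shared properties: once an instance is known to belong to a class, all of those properties can be inferred without being observed. The sketch below is a simplified illustration with hypothetical class definitions and helper names, not the formal model of Parsons and Wand.

```python
# Hypothetical classes, each defined by the properties its members share.
class_definitions = {
    "Student": {"has_student_number", "enrolled_in_program"},
    "Employee": {"has_contract", "receives_salary"},
}


def classes_of(instance_properties, definitions):
    """Return the classes whose defining properties the instance all possesses."""
    return sorted(
        name
        for name, required in definitions.items()
        if required <= instance_properties
    )


def inferable_properties(class_names, definitions):
    """Return every property that follows from membership in the given classes."""
    inferred = set()
    for name in class_names:
        inferred |= definitions[name]
    return inferred


observed = {"has_student_number", "enrolled_in_program", "plays_piano"}
memberships = classes_of(observed, class_definitions)
known_from_label = inferable_properties(["Employee"], class_definitions)
```

Knowing only the label "Employee" already tells us the person has a contract and receives a salary. Properties outside any class definition, such as `plays_piano`, are not captured by the classification at all.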

Theoretical background on classification
Classification is not inherent to real world phenomena, but is an artifact of the human mind.
Classes are created in order to comprehend phenomena by grouping them based on
similarity (Lakoff 1987).
Schema theory (Derry 1996): humans form mental models to construct an understanding of
the phenomena they observe. Learning is a form of active construction of mental models.


