100% tevredenheidsgarantie Direct beschikbaar na je betaling Lees online óf als PDF Geen vaste maandelijkse kosten
logo-home
COMPLETE Summary - Interactive Data Transformation (2023) €6,99
In winkelwagen

Samenvatting

COMPLETE Summary - Interactive Data Transformation (2023)

 1 keer verkocht

Includes everything from: Lecture 1: Data Management Systems, Relational, and SQL Lecture 2: Entity Relationships, and Translating from a Natural Lecture 3: Translating ERD to DB Schema, and Database Normalization Lecture 4: Evolution of Data Management, Big Data, and Data Intensive Systems...

[Meer zien]

Voorbeeld 3 van de 23  pagina's

  • 8 juni 2023
  • 23
  • 2022/2023
  • Samenvatting
Alle documenten voor dit vak (1)
avatar-seller
anneTBKIM
Interactive Data Transformation
Unit 4 – Year 2022/2023


Contents
Lecture 1: Data Management Systems, Relational, and SQL .............................................................. 2
Relational data model ............................................................................................................................ 3
Single table queries using SQL ............................................................................................................. 4
Lecture 2: Entity Relationships, and Translating from a Natural Language Perspective................ 5
Basic concepts ........................................................................................................................................ 5
Lecture 3: Translating ERD to DB Schema, and Database Normalization ....................................... 8
Transforming entity relationship diagram to relational schema ..................................................... 8
Normalization ........................................................................................................................................ 11
Lecture 4: Evolution of Data Management, Big Data, and Data Intensive Systems .................... 12
Evolution of Data Management ......................................................................................................... 12
Big data and its challenges .................................................................................................................. 12
Big data analytics .................................................................................................................................. 13
Reasons for going beyond traditional RDBMS ................................................................................ 13
Lecture 5: The Spark Ecosystem, RDDs, Programming Model, and PySpark ................................ 17
Lambda expressions ............................................................................................................................. 17
Apache Spark ........................................................................................................................................ 18
Programming models ........................................................................................................................... 19
Lecture 6: Data Transformations with SQL, Entity Recognition, Data Cleaning Tools ................ 20
Views ...................................................................................................................................................... 21
Functions ............................................................................................................................................... 21
Creating and populating ...................................................................................................................... 22
Data from websites, integration & cleaning, entity extraction & resolution .............................. 22

,Lecture 1: Data Management Systems, Relational, and SQL
Reasons for Database Management Systems (DBMS):
In the early days database applications were built on top of file systems,
Drawbacks:
• Data redundancy and inconsistency: multiple file formats, duplication in different files
• Difficulty in accessing data: need to write a new program to carry out each new task
• Data isolation: multiple files and formats
• Integrity problems: integrity constraints become buried in program code rather than being
stated explicitly
- Hard to add new constraints or change existing ones
• Atomicity of updates: failures leave data in an inconsistent state with partial updates carried
out
• Concurrent access by multiple users: needed for performance, uncontrolled concurrent
accesses can lead to inconsistencies
• Security problems: hard to provide user access to some, but not all, data

A database (DB) is a collection of data with the same structure, including correlations and
relationships, that is defined for a particular use and used by several users
A database management system (DBMS) is a collection of programs over DB that specify the
data types, structure, constraints, and store this on a disk, retrieve, update, and mange access
rights
• A black box interacting between users/applications and the database
• The ultimate goal is to separate data from application
- Provide an interface that the application programmer must follow
- Allow system administrator to make modifications without having an impact on the user
- Users can change their view of the data without having to worry about how it is stored

There are different layers within the DBMS:
• External layer: communication with users
- Analysis of user requests (queries)
- Access control
- Answer presentation
• Logical layer:
- Optimization of queries
- Resolving conflicting accesses, i.e., multiple users
- Guarantees constant availability even in case of failures
• Internal layer:
- Storing the data
- Software for structuring the data
- Efficient access methods, i.e., keys, indices, etc.

System development life cycle
1. Planning: develop a preliminary understanding of the business situation and how information
systems might help solve the problem
• Enterprise modeling: analyze current data processing, the general business functions and
their database needs, and justify need for new data and databases in support of business
• Conceptual data modeling: identify scope of database requirements for proposed
information system and analyze overall data requirements for business function(s) supported
by database
• Understand current data processing
• Understand general business functions and needs

, 2. Analysis: analyze the business situation thoroughly to determine requirements and to
structure those requirements
• Conceptual data modeling:
- Develop preliminary conceptual data model, including entities and relationships
- Compare to enterprise data model
- Develop detailed conceptual data model, including entities, relationships, attributes, and
business rules
- Make conceptual data model consistent with other models of information systems
- Populate repository with all conceptual database specifications
• Output: conceptual schema
• Corresponds to a detailed technology independent specification of the overall organizational
data structure

3. Design
• Logical: representation of the DB; transform the conceptual schema into a logic schema,
which describes the data in terms of the data management technology that will be used to
implement the database
• Physical: the set of specifications that describe how data from a logical schema are stored in
a computer’s secondary memory by a specific database management system

4. Implementation: a designer writes, tests, and installs the program/scripts that access, create,
or modify the database
• Finalize all database documentation, train users, and put procedures into place for the
ongoing support of the information system users
• Populate with data
• Install application(s) and test
• Complete documentation and training materials

5. Maintenance: add, delete, or change characteristics of the structure of the database in order
to meet changing business conditions, to correct errors in database design, or to improve the
processing speed of database applications
• Monitor the operation and usefulness of the system
• Repair by fixing errors in database and applications
• Enhance by analyzing the database and applications to ensure that evolving information
requirements are met

Different types of DBMS:
• Traditional DBMS: text and numerical data
• Multimedia DBMS: multimedia data
• Spatial DBMS: geographic and geometric data
• Data warehouses
Relational data model
A relational model is an approach to managing data by representing it grouped into relations

A relational DBMS (RDBMS) is a database management system that manages data as a collection
of tables in which all relationships are represented by common values in related tables

Dit zijn jouw voordelen als je samenvattingen koopt bij Stuvia:

Bewezen kwaliteit door reviews

Bewezen kwaliteit door reviews

Studenten hebben al meer dan 850.000 samenvattingen beoordeeld. Zo weet jij zeker dat je de beste keuze maakt!

In een paar klikken geregeld

In een paar klikken geregeld

Geen gedoe — betaal gewoon eenmalig met iDeal, creditcard of je Stuvia-tegoed en je bent klaar. Geen abonnement nodig.

Direct to-the-point

Direct to-the-point

Studenten maken samenvattingen voor studenten. Dat betekent: actuele inhoud waar jij écht wat aan hebt. Geen overbodige details!

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Je krijgt een PDF, die direct beschikbaar is na je aankoop. Het gekochte document is altijd, overal en oneindig toegankelijk via je profiel.

Tevredenheidsgarantie: hoe werkt dat?

Onze tevredenheidsgarantie zorgt ervoor dat je altijd een studiedocument vindt dat goed bij je past. Je vult een formulier in en onze klantenservice regelt de rest.

Van wie koop ik deze samenvatting?

Stuvia is een marktplaats, je koop dit document dus niet van ons, maar van verkoper anneTBKIM. Stuvia faciliteert de betaling aan de verkoper.

Zit ik meteen vast aan een abonnement?

Nee, je koopt alleen deze samenvatting voor €6,99. Je zit daarna nergens aan vast.

Is Stuvia te vertrouwen?

4,6 sterren op Google & Trustpilot (+1000 reviews)

Afgelopen 30 dagen zijn er 65004 samenvattingen verkocht

Opgericht in 2010, al 15 jaar dé plek om samenvattingen te kopen

Begin nu gratis
€6,99  1x  verkocht
  • (0)
In winkelwagen
Toegevoegd