[23-24] Interactive Data Transformation complete summary IM

A complete summary of the lecture slides, recorded videos, and live lectures. Passed the course with a 7.5 by only studying this summary.

  • 13 November 2022
  • 61 pages
  • 2021/2022
  • Summary

IMstudentTiU2122
Summary
Interactive Data
Transformation

Table of Contents
Lecture 1: Database management systems, relational data models, and SQL
1.1. Database management systems
1.2. Relational data model
1.3. Single table queries using SQL
Lecture 2: Entity relationship and translating from a natural language specification
2.1. Basic concepts
2.2. Relationships, degrees & cardinalities
2.3. Generalization & specialization
Lecture 3: Transforming ERD to relational schema, and normalization
3.1. Transforming ERDs
3.2. Data normalization
Lecture 4: Evolution of data management, big data, and data intensive systems
4.1. Evolution of data management
4.2. Big data analytics
4.3. Reasons for going beyond traditional RDBMS
4.4. Storage layer
4.5. Computation layer
Lecture 5: The Spark ecosystem, RDDs, programming model, and PySpark
5.1. Lambda expressions
5.2. Apache Spark
5.3. RDDs
5.4. Programming model
Lecture 6: Data transformations with SQL, entity recognition, data cleaning tools, and more
6.1. Processing multiple tables
6.2. Views
6.3. Functions
6.4. Creating & populating
6.5. Data from websites, integration & cleaning, and entity extraction & resolution
6.6. Integration & cleaning

Lecture 1: Database management systems, relational data models, and SQL

1.1. Database management systems
Reasons for using a database management system (DBMS): it offers solutions to the following problems:
• Data redundancy and consistency: multiple file formats, duplication in different files.
• Difficulty in accessing data: need to write a new program to carry out each new task.
• Data isolation: multiple files and formats.
• Integrity problems: integrity constraints (e.g., account balance > 0) become “buried” in
program code rather than being stated explicitly. Hard to add new constraints or change
existing ones.
• Atomicity of updates: transfer of funds from one account to another should either be
complete or not happen at all. Failures may leave data in an inconsistent state with partial
updates carried out.
• Concurrent access by multiple users: uncontrolled concurrent accesses can lead to
inconsistencies.
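o Example: two people read the same balance (e.g., €100) and then withdraw money at the
same time (e.g., €50 for person A, €70 for person B). Each withdrawal is checked against
the original €100, so both succeed although together they exceed the balance, and the
last write overwrites the other.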
• Security problems: hard to provide user access to some, but not all, data.
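The integrity and atomicity points can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS; the account table, the amounts, and the helper names transfer and balances are assumptions made for illustration, not part of the course material. The constraint "balance > 0" is declared once in the schema instead of being buried in program code, and a transfer that would violate it is rolled back as a whole.

```python
import sqlite3

# Hypothetical schema: an in-memory SQLite database with one account table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"  # integrity constraint stated explicitly
)
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 10)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` from source to target; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was undone as well.
        return False


def balances(conn):
    """Return the current balances as {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, 1, 2, 50), balances(conn))  # allowed: account 1 keeps a positive balance
print(transfer(conn, 1, 2, 70), balances(conn))  # rejected: balances are unchanged
```

In the second call the credit to account 2 is executed first and then undone when the debit fails, which is the "complete or not happen at all" behaviour described above.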

Database (DB): shared collection of data with the same structure, including correlations and
relationships for a common purpose.

DBMS: a collection of programs that manages the database structure and controls access to the data
stored in the database. It offers functions and methods to build and manipulate the data. It can be
seen as a black box interacting between users/applications and the database.




Goals of a DBMS: separate data from application.
• Provide an interface that the application programmer must follow.
• Allow the system administrator to make modifications without having an impact on the user,
for example when improving or reconfiguring systems.
• Users can change their view of the data without having to worry about how it is stored.
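The separation of data from application can be sketched as follows, again using Python's sqlite3 module as a stand-in DBMS. All table, view, and function names here are hypothetical. The application only queries a view named customer; the way the data is stored underneath is reorganized without the application query changing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Initial storage (hypothetical): one table, exposed to applications through a view.
conn.executescript("""
    CREATE TABLE customer_raw (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customer_raw VALUES (1, 'Ann', 'Tilburg'), (2, 'Bob', 'Breda');
    CREATE VIEW customer AS SELECT id, name, city FROM customer_raw;
""")


def customer_names(conn):
    """Application code: it only knows the `customer` view, not the tables behind it."""
    return [row[0] for row in conn.execute("SELECT name FROM customer ORDER BY id")]


before = customer_names(conn)

# The administrator reorganizes storage: names and cities move to separate tables,
# and the view is redefined on top of the new layout.
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (person_id INTEGER REFERENCES person(id), city TEXT);
    INSERT INTO person SELECT id, name FROM customer_raw;
    INSERT INTO address SELECT id, city FROM customer_raw;
    DROP VIEW customer;
    DROP TABLE customer_raw;
    CREATE VIEW customer AS
        SELECT p.id, p.name, a.city FROM person p JOIN address a ON a.person_id = p.id;
""")

after = customer_names(conn)
print(before == after)  # the application sees the same data through the same interface
```

The function customer_names plays the role of the application: it is identical before and after the storage change.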





Layers of a DBMS (architecture):
• Internal layer: stores and structures the data and offers efficient access methods.
• Logical layer: optimizes queries, resolves conflicting accesses by multiple users, and
guarantees constant availability (even in case of failures).
• External layer: communicates with users, analyses user requests/queries, controls access, and
presents the answers.
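As a rough illustration of how these layers cooperate, the sketch below asks SQLite (through Python's sqlite3 module) how it plans to execute the same query before and after an index is added. SQLite is only a stand-in here, and the customer table and index name are assumptions. The query text stays the same; the DBMS chooses a different access method once a more efficient one is available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# The user-facing query never changes.
QUERY = "SELECT name FROM customer WHERE city = ?"


def plan(conn):
    """Return the human-readable steps of the plan SQLite picks for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Tilburg",)).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


plan_without_index = plan(conn)

# Adding an access method at the storage level (hypothetical index name).
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
plan_with_index = plan(conn)

print(plan_without_index)  # a full scan of the table
print(plan_with_index)     # a search that uses idx_customer_city
```

The exact wording of the plan differs between SQLite versions, so the output is best read as a description rather than a stable format.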




Development process / life cycle of a DBMS:
• Planning: develop a preliminary understanding of the business situation and how information
systems might help solve the problem. Steps include analyzing the current data processing and
general business functions and needs.
• Analysis: analyze the business situation thoroughly to determine requirements and to
structure those requirements. The output is a conceptual schema/ERD that corresponds to a
detailed, technology independent specification of the overall organizational data structure.
• Logical design: representation of the database. Transform the conceptual schema, i.e., the
outcome of the previous step, into the terms of the data management system.
• Physical design: the set of specifications that describe how data are stored in a computer’s
secondary memory by a specific database management system.
• Implementation: build the database implementation, populate it with data, install and test
applications, and complete documentation and training materials.
• Maintenance: monitor the operation and usefulness of the system. Repair errors in the
database and applications. Enhance by analyzing the database and applications to ensure that
evolving information requirements are met.



