[23-24] Interactive Data Transformation complete summary IM


A complete summary of the lecture slides, recorded videos, and live lectures. Passed the course with a 7.5 by only studying this summary.

13 November 2022 • 61 pages • 2021/2022 • Summary

Summary: Interactive Data Transformation

Table of Contents
Lecture 1: Database management systems, relational data models, and SQL
1.1. Database management systems
1.2. Relational data model
1.3. Single table queries using SQL
Lecture 2: Entity relationship and translating from a natural language specification
2.1. Basic concepts
2.2. Relationships, degrees & cardinalities
2.3. Generalization & specialization
Lecture 3: Transforming ERD to relational schema, and normalization
3.1. Transforming ERDs
3.2. Data normalization
Lecture 4: Evolution of data management, big data, and data intensive systems
4.1. Evolution of data management
4.2. Big data analytics
4.3. Reasons for going beyond traditional RDBMS
4.4. Storage layer
4.5. Computation layer
Lecture 5: The Spark ecosystem, RDDs, programming model, and PySpark
5.1. Lambda expressions
5.2. Apache Spark
5.3. RDDs
5.4. Programming model
Lecture 6: Data transformations with SQL, entity recognition, data cleaning tools, and more
6.1. Processing multiple tables
6.2. Views
6.3. Functions
6.4. Creating & populating
6.5. Data from websites, integration & cleaning, and entity extraction & resolution
6.6. Integration & cleaning

Lecture 1: Database management systems, relational data models, and SQL

1.1. Database management systems
A database management system (DBMS) addresses the following problems of file-based data storage:
• Data redundancy and inconsistency: multiple file formats and duplication of the same data in different files.
• Difficulty in accessing data: a new program has to be written for each new task.
• Data isolation: data is scattered over multiple files and formats.
• Integrity problems: integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly, which makes it hard to add new constraints or change existing ones.
• Atomicity of updates: a transfer of funds from one account to another should either complete or not happen at all. Failures may otherwise leave the data in an inconsistent state with only part of the updates carried out.
• Concurrent access by multiple users: uncontrolled concurrent access can lead to inconsistencies.
o Example: two people read the same balance (e.g., €100) and then withdraw money at the same time (e.g., €50 for person A, €70 for person B). Each withdrawal is checked against the original €100, so both succeed and the later write overwrites the earlier one, although the account never held €120.
• Security problems: it is hard to give a user access to some, but not all, of the data.
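
The integrity and atomicity points can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name account, the helper functions transfer and balances, and the amounts are assumptions made for illustration and do not come from the lecture. The constraint balance > 0 is stated once in the schema instead of being buried in program code, and the transfer runs as one transaction that is rolled back as a whole when the constraint is violated.

```python
import sqlite3

# Hypothetical schema for illustration; names and amounts are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"  # explicit integrity constraint
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst: either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit is undone as well.
        return False


def balances(conn):
    """Return the current balances as {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account"))


ok_first = transfer(conn, 1, 2, 70)   # 100 -> 30, allowed
ok_second = transfer(conn, 1, 2, 70)  # would make account 1 negative, rejected
print(ok_first, ok_second, balances(conn))
```

In the rejected transfer the credit to account 2 has already been executed when the debit fails, so the unchanged balance of account 2 afterwards shows that the partial update was rolled back.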

Database (DB): a shared collection of data with the same structure, including the correlations and relationships among the data, kept for a common purpose.

DBMS: a collection of programs that manages the database structure and controls access to the data
stored in the database. It offers functions and methods to build and manipulate the data. It can be
seen as a black box interacting between users/applications and the database.




Goals of a DBMS: separate the data from the applications that use it.
• Provide an interface that the application programmer must follow.
• Allow the system administrator to make modifications, for example improving or reconfiguring systems, without any impact on the user.
• Users can change their view of the data without having to worry about how it is stored.
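
A minimal sketch of this separation, again with Python's sqlite3 module. The tables customer_raw and customer_v2, the view customer, and the function customer_names are hypothetical names chosen for illustration. The application only queries the view, so the administrator can reorganize the underlying storage without the application code changing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Storage as first designed (hypothetical table, for illustration only).
conn.execute(
    "CREATE TABLE customer_raw ("
    " id INTEGER PRIMARY KEY, first TEXT, last TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer_raw VALUES (?, ?, ?, ?)",
    [(1, "Ada", "Lovelace", "London"), (2, "Alan", "Turing", "London")],
)
conn.commit()

# The interface the application programmer works against.
conn.execute(
    "CREATE VIEW customer AS "
    "SELECT id, first || ' ' || last AS name FROM customer_raw"
)


def customer_names(conn):
    """Application code: it only knows the view, not how data is stored."""
    return [row[0] for row in conn.execute("SELECT name FROM customer ORDER BY id")]


before = customer_names(conn)

# The administrator restructures storage and redefines the view to match.
conn.execute(
    "CREATE TABLE customer_v2 (id INTEGER PRIMARY KEY, full_name TEXT, city TEXT)"
)
conn.execute(
    "INSERT INTO customer_v2 "
    "SELECT id, first || ' ' || last, city FROM customer_raw"
)
conn.execute("DROP VIEW customer")
conn.execute("DROP TABLE customer_raw")
conn.execute("CREATE VIEW customer AS SELECT id, full_name AS name FROM customer_v2")
conn.commit()

after = customer_names(conn)  # same function, same result, different storage
print(before == after)
```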





Layers of a DBMS (architecture):
• Internal layer: stores and structures the data and offers efficient access methods.
• Logical layer: optimizes queries, resolves conflicting accesses by multiple users, and guarantees constant availability (even in case of failures).
• External layer: communicates with users, analyzes user requests/queries, controls access, and presents the answers.
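
As a small illustration of the layers working together, the sketch below asks SQLite (through Python's sqlite3 module) how it would execute the same query before and after an index is added. The table account, the index idx_account_owner, and the helper plan are assumptions made for this example. The query text stays the same, while the optimizer chooses a different access method once the storage side offers one. The exact wording of the plan differs between SQLite versions, so only the index name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
)


def plan(conn, sql, params=()):
    """Return SQLite's query plan for sql as one human-readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the textual description of the step.
    return " | ".join(row[-1] for row in rows)


query = "SELECT balance FROM account WHERE owner = ?"

plan_before = plan(conn, query, ("alice",))  # no index yet: full table scan

# New access method on the storage side.
conn.execute("CREATE INDEX idx_account_owner ON account(owner)")

plan_after = plan(conn, query, ("alice",))   # optimizer can now use the index

print(plan_before)
print(plan_after)
```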




Development process / life cycle of a DBMS:
• Planning: develop a preliminary understanding of the business situation and how information
systems might help solve the problem. Steps include analyzing the current data processing and
general business functions and needs.
• Analysis: analyze the business situation thoroughly to determine requirements and to
structure those requirements. The output is a conceptual schema/ERD that corresponds to a
detailed, technology independent specification of the overall organizational data structure.
• Logical design: representation of the database. Transform the conceptual schema, i.e., the outcome of the previous step, into the data model of the chosen data management system.
• Physical design: the set of specifications that describe how data are stored in a computer’s
secondary memory by a specific database management system.
• Implementation: build database implementation, populate with data, install and test
applications, complete documents and training materials.
• Maintenance: monitor the operation and usefulness of the system. Repair errors in the
database and applications. Enhance by analyzing the database and applications to ensure that
evolving information requirements are met.
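
The later phases can be sketched on a toy example with Python's sqlite3 module. The entities customer and purchase, their columns, and the index name are hypothetical and only stand in for the output of a real analysis phase. The comments mark which statement would belong to which phase under that assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Logical design: the (hypothetical) conceptual schema "a customer places
# purchases" expressed as relations with primary and foreign keys.
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE purchase ("
    " purchase_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id),"
    " total_cents INTEGER NOT NULL)"
)

# Physical design: a storage-level decision for this specific DBMS.
conn.execute("CREATE INDEX idx_purchase_customer ON purchase(customer_id)")

# Implementation: populate the database with data ...
with conn:
    conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO purchase VALUES (?, ?, ?)",
        [(10, 1, 2500), (11, 1, 1200)],
    )

# ... and test it: a query the application needs, plus a constraint check.
total = conn.execute(
    "SELECT SUM(total_cents) FROM purchase WHERE customer_id = 1"
).fetchone()[0]

try:
    with conn:
        conn.execute("INSERT INTO purchase VALUES (12, 99, 100)")  # no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(total, orphan_rejected)
```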



