Summary The weekly readings of Data Wrangling and Data Analysis - INFOMDWR

Institution: Universiteit Utrecht (UU)

Summaries of the weekly readings for the data wrangling and data analysis course in the Applied Data Science Master at UU

Uploaded on 6 November 2021
Number of pages: 130
Written in: 2021/2022
Type: Summary

Database System Concepts
Week 2 - Chapter 1 to Chapter 4.1

Chapter 1: Introduction

Database Management Systems

Database system applications:

- Enterprise information
  - Sales
  - Accounting
  - Human resources
  - Manufacturing
  - Online retailers

- Banking and finance
  - Banking
  - Credit card transactions
  - Finance

- Universities

- Airlines

- Telecommunications

Disadvantages of keeping information in a file-processing system:

- Data redundancy and inconsistency
  - Files created over a long period by different programmers may have different structures, and the programs that use them may be written in different programming languages.
  - Duplicate copies of the same data in different files lead to higher storage and access costs.
  - Data inconsistency: the various copies of the same data may no longer agree.

- Difficulty in accessing data
  - Conventional file-processing systems do not allow needed data to be retrieved in a convenient and efficient manner.

- Data isolation
  - Data are scattered across various files in different formats.
  - Writing new applications to retrieve the appropriate data is therefore difficult.

- Integrity problems
  - Data values stored in the database must satisfy certain consistency constraints.
  - When new constraints are added, it is difficult to change the programs to enforce them.

- Atomicity problems
  - An action, such as a funds transfer, must occur in its entirety or not at all; this is difficult to ensure in a conventional file-processing system.

- Concurrent-access anomalies
  - When many users update the data simultaneously, the system must supervise their interaction so that concurrent updates do not leave the data inconsistent.

- Security problems
  - Users should be able to access only the information they need, not all the information in the database.
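The atomicity problem is easiest to see with a funds transfer that fails halfway. The sketch below uses SQLite (via Python's built-in `sqlite3`) purely as a stand-in database system; the `account` table, the `transfer` helper and the "simulated failure" are hypothetical. Because both updates run inside one transaction, the debit is rolled back when the failure occurs, which a plain file-processing system would have to implement by hand.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one transaction (hypothetical helper)."""
    # `with conn:` commits if the block succeeds and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        # Simulate a crash between the debit and the credit for large transfers.
        if amount > 100:
            raise RuntimeError("simulated failure mid-transfer")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 100)])

transfer(conn, "A", "B", 50)       # succeeds: both updates are kept
try:
    transfer(conn, "A", "B", 200)  # fails after the debit: the debit is undone
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
print(balances)  # {'A': 450, 'B': 150}
```

No money disappears: the half-finished second transfer leaves no trace, so the two balances still add up to the original 600.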

The difficulties mentioned above prompted the development of database systems.

View of Data

Data abstraction:

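The need for efficient data retrieval has led designers to use complex data structures to represent data in the database. Because many database users are not computer trained, developers hide this complexity from them through several levels of abstraction.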

- Physical level:
  - The lowest level of abstraction; describes how the data are actually stored.
  - Describes complex low-level data structures in detail.

- Logical level:
  - Describes what data are stored in the database and what relationships exist among those data.

- View level:
  - The highest level of abstraction.
  - Describes only part of the entire database.
  - Exists to simplify users' interaction with the system.
  - The system may provide many views for the same database.
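A view is the most concrete handle on the view level. In this sketch (SQLite as a stand-in system; the `instructor` table, its rows and the `instructor_directory` view are hypothetical), the logical level stores salaries, but the view exposes only names and departments, so a user working through it never sees the salary column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full (hypothetical) instructor relation, including salaries.
conn.execute("CREATE TABLE instructor (id TEXT, name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [("10101", "Srinivasan", "Comp. Sci.", 65000), ("12121", "Wu", "Finance", 90000)],
)

# View level: a simplified window onto part of the database, without salaries.
conn.execute("CREATE VIEW instructor_directory AS SELECT name, dept_name FROM instructor")

cursor = conn.execute("SELECT * FROM instructor_directory ORDER BY name")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)  # ['name', 'dept_name']
print(rows)     # [('Srinivasan', 'Comp. Sci.'), ('Wu', 'Finance')]
```

Several such views could be defined over the same table, one per group of users.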

Instances and schemas:

- The collection of information stored in the database at a particular moment is called an instance of the database.
- The overall design of the database is called the database schema. Schemas are changed infrequently, if at all.
  - The physical schema describes the database design at the physical level.
  - The logical schema describes the database design at the logical level.
  - A database may also have several schemas at the view level, sometimes called subschemas, which describe different views of the database.

- Application programs are said to exhibit physical data independence if they do not depend on the physical schema. They therefore do not need to be rewritten if the physical schema changes.
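The distinction between instance and schema, and the idea of physical data independence, can be checked directly in a small sketch. Again SQLite stands in for a database system, and the `department` table and index name are hypothetical. Inserting rows changes the instance while the stored table definition (the schema) stays identical; adding an index changes only the physical level, and the unchanged application query returns the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical schema: the design of a (hypothetical) department relation.
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget INTEGER)")

def schema_sql():
    """Return the stored definition of the department table, i.e. its schema."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'department'"
    ).fetchone()[0]

def row_count():
    """Return how many rows the current instance holds."""
    return conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]

schema_before = schema_sql()
conn.executemany("INSERT INTO department VALUES (?, ?, ?)",
                 [("Biology", "Watson", 90000), ("Physics", "Watson", 70000)])
conn.execute("INSERT INTO department VALUES ('History', 'Painter', 50000)")
# The instance changed, but the schema text is identical.
print(row_count(), schema_sql() == schema_before)  # 3 True

# The same application query, run before and after a physical-level change.
QUERY = "SELECT dept_name FROM department WHERE building = ? ORDER BY dept_name"
before_index = conn.execute(QUERY, ("Watson",)).fetchall()
conn.execute("CREATE INDEX idx_department_building ON department (building)")
after_index = conn.execute(QUERY, ("Watson",)).fetchall()
print(after_index == before_index)  # True
```

The index may make the query faster on a large table, but the program issuing the query did not need to change.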

Data models:

Underlying the structure of the database is the data model. This is a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints.

Data models can be classified into four different categories:

1. Relational model
   - Uses a collection of tables to represent both data and the relationships among those data.
   - Tables are known as relations. The columns of a table correspond to the attributes of the record type.
   - It is the most widely used data model.

2. Entity-Relationship (E-R) model
   - Uses a collection of basic objects, called entities, and the relationships among these objects.
   - It is widely used in database design.

3. Object-based data model
   - Object-oriented programming has become the dominant software-development methodology, which led to the development of an object-oriented data model.
   - The object-oriented model can be seen as extending the E-R model with notions of encapsulation, methods (functions), and object identity.
   - The object-relational data model combines features of the object-oriented data model and the relational data model.

4. Semistructured data model
   - Permits the specification of data where individual data items of the same type may have different sets of attributes. This contrasts with the models above, in which every data item of a given type must have the same set of attributes.
   - The Extensible Markup Language (XML) is widely used to represent semistructured data.
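To make the semistructured case concrete, the sketch below parses a small XML document with Python's standard `xml.etree.ElementTree`. The catalogue, ISBNs and titles are made up for illustration. Both `<book>` elements are of the same type, yet one has an `edition` field and the other an `author` field, which a relational table with fixed columns would not allow without NULLs or schema changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: two items of the same type with different sets of fields.
DOC = """
<catalog>
  <book isbn="111"><title>Intro to Databases</title><edition>2</edition></book>
  <book isbn="222"><title>Data Wrangling Notes</title><author>A. Writer</author></book>
</catalog>
"""

root = ET.fromstring(DOC.strip())
# Collect, per book, the names of whatever child elements it happens to have.
fields_by_isbn = {
    book.get("isbn"): sorted(child.tag for child in book)
    for book in root.findall("book")
}
for isbn, fields in fields_by_isbn.items():
    print(isbn, fields)
# 111 ['edition', 'title']
# 222 ['author', 'title']
```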

Database Languages

A database system provides a data-definition language (DDL) to specify the database schema and a data-manipulation language (DML) to express database queries and updates. In practice these are not two separate languages; they form parts of a single database language, such as SQL.

Data Manipulation Language (DML):

A DML allows users to access and manipulate data as organized by the appropriate data model. The types of access are:

- Retrieval of information stored in the database
- Insertion of new information into the database
- Deletion of information from the database
- Modification of information stored in the database

There are two basic types of DML:

- Procedural DMLs: the user specifies what data are needed and how to get those data.
- Declarative (non-procedural) DMLs: the user specifies what data are needed without specifying how to get those data.

- A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language. SQL is the most widely used query language.
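The four kinds of access and the declarative/procedural distinction can be shown side by side. In this sketch (SQLite as a stand-in; the `instructor` table and its rows are hypothetical), SQL statements perform insertion, modification, deletion and retrieval. The "procedural" retrieval is only an imitation in Python: it walks every record and tests the condition itself, whereas the declarative SQL query states the condition and leaves the access strategy to the system. Both return the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")

# Insertion, modification and deletion of information.
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)",
                 [("Wu", "Finance", 90000), ("Mozart", "Music", 40000),
                  ("Einstein", "Physics", 95000)])
conn.execute("UPDATE instructor SET salary = salary + 2000 WHERE dept_name = 'Music'")
conn.execute("DELETE FROM instructor WHERE name = 'Einstein'")

# Retrieval, declarative style: state WHAT is wanted; the system decides how.
declarative = conn.execute(
    "SELECT name FROM instructor WHERE salary > 50000 ORDER BY name"
).fetchall()

# Retrieval, procedural style (imitated in Python): spell out HOW,
# visiting every record and testing the condition ourselves.
procedural = []
for name, _dept, salary in conn.execute("SELECT name, dept_name, salary FROM instructor"):
    if salary > 50000:
        procedural.append((name,))
procedural.sort()

print(declarative, declarative == procedural)  # [('Wu',)] True
```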

- The levels of abstraction discussed also apply to manipulating data.

Data Definition Language (DDL):

- The DDL is used to specify the database schema through a set of definitions, as well as additional properties of the data.

- Data storage and definition language: a special type of DDL used to specify the storage structure and access methods of the database system. Its statements define the implementation details of the database schemas, which are usually hidden from users.

- The data values stored in the database must satisfy certain consistency constraints. Because arbitrary constraints can be costly to test, database systems implement integrity constraints that can be tested with minimal overhead.

- Domain constraints:
  - A domain of possible values (e.g. integers, character strings, dates) must be associated with every attribute.
  - They are the most elementary form of integrity constraint.
  - They are tested easily by the system whenever a new data item is entered into the database.

- Referential integrity:
  - Ensures that a value that appears in one relation for a given set of attributes also appears in a certain set of attributes in another relation.
  - Database modifications can violate referential integrity.
  - The normal procedure is to reject the action that caused the violation.

- Assertions:
  - An assertion is any condition that the database must always satisfy.
  - When an assertion is created, the system tests it for validity. If it is valid, any future modification to the database is allowed only if it does not cause the assertion to be violated.
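Domain constraints and referential integrity can be declared in the DDL and then watched in action. This is a sketch with SQLite as the stand-in system and hypothetical `department`/`instructor` tables. Two SQLite specifics are assumed here: its column types do not strictly enforce domains on ordinary tables, so a `CHECK` clause stands in for a domain constraint, and foreign keys are enforced only after `PRAGMA foreign_keys = ON`. SQLite does not support SQL's `CREATE ASSERTION`, so assertions are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, budget INTEGER CHECK (budget > 0))")
conn.execute("""CREATE TABLE instructor (
    name TEXT,
    dept_name TEXT REFERENCES department (dept_name),  -- referential integrity
    salary INTEGER CHECK (salary >= 0)                  -- stands in for a domain constraint
)""")
conn.execute("INSERT INTO department VALUES ('Physics', 70000)")

def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "valid row": try_insert("INSERT INTO instructor VALUES ('Einstein', 'Physics', 95000)"),
    "domain violation": try_insert("INSERT INTO instructor VALUES ('Gold', 'Physics', -1)"),
    "referential violation": try_insert(
        "INSERT INTO instructor VALUES ('Kim', 'Elec. Eng.', 80000)"),
}
print(results)
# {'valid row': True, 'domain violation': False, 'referential violation': False}
```

As the text describes, the offending modifications are simply rejected; only the valid row ends up in `instructor`.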

Sold on Stuvia by seller jsstudent.