Cheat sheet for Data Management for Data Analytics (JBI050)
26 views 4 purchases
Course
Data management for data analytics (JBI050)
Institution
Technische Universiteit Eindhoven (TUE)
This is a cheat sheet for the course Data Management for Data Analytics provided at the TU/e. All information is summarized on two sides of an A4 paper, color coded based on the subject.
Closure – Armstrong rules Algorithms for 𝑭+ and 𝜶+ RA Operators
A1. Reflexivity 𝑖𝑓 𝛽 ⊆ 𝛼, 𝑡ℎ𝑒𝑛 𝛼 → 𝛽
𝐹 + := 𝐹 Select from r based on predicate p
A2. Augmentation 𝑖𝑓 𝛼 → 𝛽, 𝑡ℎ𝑒𝑛 𝛾𝛼 → 𝛾𝛽 𝝈𝒑 ሺ𝒓ሻ
A3. Transitivity 𝑖𝑓 𝛼 → 𝛽, 𝑎𝑛𝑑 𝛽 → 𝛾, 𝑡ℎ𝑒𝑛 𝛼 → 𝛾 Repeat
For each fd 𝑓 in 𝐹 +
𝚷𝑨𝒊 ,…,𝑨𝒏 ሺ𝒓ሻ Keep the specified columns
The following rules can be inferred from A1-A3 Apply A1 and A2 on 𝑓 and add resulting fd’s to 𝐹 +
A4. Union 𝑖𝑓 𝛼 → 𝛽, 𝑎𝑛𝑑 𝛼 → 𝛾, 𝑡ℎ𝑒𝑛 𝛼 → 𝛽𝛾 For each pair of fd’s 𝑓1 and 𝑓2 𝒓∪𝒔 Union of two tables
A5. Decomposition 𝑖𝑓 𝛼 → 𝛽𝛾, 𝑡ℎ𝑒𝑛 𝛼 → 𝛽 𝑎𝑛𝑑 𝛼 → 𝛾 If A3 can be applied, add resulting fd to 𝐹 +
A6. Pseudotransitivity 𝑖𝑓 𝛼 → 𝛽, 𝑎𝑛𝑑 𝛾𝛽 → 𝛿, 𝑡ℎ𝑒𝑛 𝛼𝛾 → 𝛿 𝒓−𝒔 Set difference of two tables
Until 𝐹 + does not change any further
Connect all tuples from one relation
BCNF – No redundancies 𝒓×𝒔
Result := 𝛼 to all tuples from the other relation
For each FD 𝜶 → 𝜷 ∈ 𝑭+, one of the following holds: While there are still changes to result
𝝆𝒙ሺ𝑨𝟏 ,…,𝑨𝒏ሻሺ𝒓ሻ Rename table(columns)
𝛼 → 𝛽 is trivial (𝛽 ⊆ 𝛼ሻ For each 𝛽 → 𝛾
If 𝛽 ⊆ result, then result := result ∪ 𝛾
𝛼 is a superkey for the whole schema 𝑅 ← Assignment
If we have only one table: Canonical cover 𝒓∩𝒔 The intersection of two tables
Check whether the FDs in 𝐹 are a violation of BCNF Attributes are extraneous for 𝑭 = {… , 𝜶 → 𝜷, … } if:
If 𝑭′ = {… , 𝜶 − {𝑨} → 𝜷, … } ⟺ 𝑭. Natural join. Connect the tables on
If we have a decomposition: 𝒓⋈𝒔 the common columns
Check if 𝛽 is in ሺ𝛼 − {𝐴}ሻ+
𝐹
For every set of attributes 𝛼 ⊆ 𝑅𝑖 , check that 𝛼 + either
includes no attribute of 𝑅𝑖 − 𝛼, or all attributes of 𝑅𝑖 . If 𝑭′ = {… , 𝜶 → 𝜷 − {𝑩}, … } ⟺ 𝑭 All tuples in r associated with all
𝒓÷𝒔 tuples in s
Check if ሺ𝛼 → 𝐵ሻ is in 𝐹 ′+: 𝛽 ∈ ሺ𝛼ሻ+
𝐹′
If it only has 2 attributes, it is always in BCNF.
Each left side of the FD in 𝑭𝒄 should be unique:
Decompose into (𝛼 → 𝛽, and 𝛼 → 𝛾 ⇒ 𝛼 → 𝛽𝛾) Ordering the results
𝑅1 = 𝛼 ∪ 𝛽 (a table with just 𝛼 and 𝛽)
𝑅2 = 𝑅 − ሺ𝛽 − 𝛼ሻ (still has 𝛼but no longer has 𝛽) SELECT operators
SELECT … FROM … Order by A in descending
WHERE … order (ASC is default)
Result := {R} SELECT A ሺΠሻ select the specified columns ORDER BY A DESC
Done := False
DISTINCT / ALL Are duplicates allowed. ALL is default. Order by A in ascending
Compute 𝐹 +
… ORDER BY A
order, output the first n
While (not done) do: * Asterisk Select all attributes LIMIT n results.
If (there is a schema 𝑅𝑖 in result that is not in BCNF)
Arithmetic Operations with + - * /, these can be
Let 𝛼 → 𝛽 be a nontrivial FD that holds on 𝑅𝑖 such
expressions used on constants and attributes
that 𝛼 → 𝑅𝑖 is not in 𝐹 + and 𝛼 ∩ 𝛽 = ∅. Null values
Result := ሺ𝑟𝑒𝑠𝑢𝑙𝑡 – {𝑅𝑖 }ሻ ∪ ሺ{𝑅𝑖 − 𝛽}ሻ ∪ {𝛼 ∪ 𝛽} AS Rename attributes
Else done := True
SELECT B FROM … Select all B that have null
FROM operators WHERE A is NULL values in A
3NF – Preserve dependencies
SQL generally just ignores them
For each FD 𝜶 → 𝜷 ∈ 𝑭+, one of the following holds: FROM 𝑹𝟏 , … , 𝑹𝒏 (×ሻ use the specified relations
In BCNF Views
AS Rename the tables
Each attribute B in 𝛽 − 𝛼 is contained in a candidate
key for R
WHERE operators CREATE VIEW V AS The query is stored in the
database and can be used
Check for each non-trivial 𝛼 → 𝛽 if 𝛼 is a superkey. If not, <query> in FROM
verify if each attribute in 𝛽 ∈ a candidate key of 𝑅.
WHERE P (𝜎) where P is True
V1 depends directly on V2 if V2 is used in defining V1
A BETWEEN x AND y where A ≥ x and A≤y
Let 𝐹𝑐 be a canonical cover for 𝐹 V is recursive if it depends on itself
𝑖 := 0 Find patterns using:
For each FD 𝛼 → 𝛽 in 𝐹𝑐 A LIKE “%Main%” % - matches any substring View expansion is basically computing the closure.
If none of the schemas 𝑅𝑗 , 1 ≤ 𝑗 ≤ 𝑖 contains 𝛼 𝛽 _ - matches any character Find any 𝑉𝑖 in query and replace it by the expression that
𝑖 := 𝑖+1 defines 𝑉𝑖 until no more views are present in the query.
𝑅𝑗 :=𝛼 𝛽 Subqueries are nested select-from-where expressions
A IN () Whether values are in the subquery Derived relations
If none of the schemas 𝑅𝑗 1 ≤ 𝑗 ≤ 𝑖 contains a candidate
key for 𝑅
EXISTS () Is the result of the subquery empty
𝑖:= 𝑖+1 WITH max_A() AS ( Define a temporary
𝑅𝑖 := any candidate key for 𝑅 Checks whether the subquery has Select Max(A) from R) table, that is only
UNIQUE () available to the query
duplicates (True if empty)
If some 𝑅𝑗 is a subset for some other 𝑅𝑘 , then remove it SELECT A FROM R, max_A in question.
Return ሺ𝑅1 , … , 𝑅𝑖 ሻ
Set operators Common table expressions: subquery in FROM clause
Check for lossless-ness and dependency preservation Triggers
() UNION () (∪) the union of two sets
The decomposition is lossless if:
() INTERSECT () (∩) the intersection Statements that are automatically executed as a side effect
∀𝑟 𝑟 = Π𝑅1 ሺ𝑟ሻ ⋈ Π𝑅2 ሺ𝑟ሻ of a modification to the database.
() EXCEPT () (−) the set difference Specifies when: upon which events, under which
At least one of the following FDs is in 𝑭+ :
conditions
𝑅1 ∩ 𝑅2 → 𝑅1 or 𝑅1 ∩ 𝑅2 → 𝑅2
The default is DISTINCT, they automatically remove duplicates Specifies what: the actions to take
It is dependency preserving if:
Aggregate functions
ሺ𝑭𝟏 ∪ … ∪ 𝑭𝒏 ሻ+ = 𝑭+ Create trigger name
{before or after} {insertion, deletion, or update}
SELECT AVG (A) SELECT SUM (A) on relation
Result = 𝛼
While (changes to result) do SELECT MIN (A) SELECT COUNT ( DISTINCT A) Referencing new row as new
For each 𝑅𝑖 in the decomposition COUNT(*) counts all tuples Referencing old row as old
T=ሺresult ∩ 𝑅𝑖 ሻ+ ∩ 𝑅𝑖 SELECT MAX (A) For each {row or statement} when condition
Result = result ∪ 𝑡 begin
If result contains all attributes in 𝛽, then 𝛼 → 𝛽 is preserved SELECT A, AGG(B) A must be in SELECT clause. …
end
Apply this test on 𝐹, not on 𝐹 +
FROM … Get aggregate values for B per
GROUP BY A value in A.
… GROUP BY A Alternative to WHERE when
HAVING P on agg(B) using grouped data.
The benefits of buying summaries with Stuvia:
Guaranteed quality through customer reviews
Stuvia customers have reviewed more than 700,000 summaries. This how you know that you are buying the best documents.
Quick and easy check-out
You can quickly pay through credit card or Stuvia-credit for the summaries. There is no membership needed.
Focus on what matters
Your fellow students write the study notes themselves, which is why the documents are always reliable and up-to-date. This ensures you quickly get to the core!
Frequently asked questions
What do I get when I buy this document?
You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.
Satisfaction guarantee: how does it work?
Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.
Who am I buying these notes from?
Stuvia is a marketplace, so you are not buying this document from us, but from seller NienkeUr. Stuvia facilitates payment to the seller.
Will I be stuck with a subscription?
No, you only buy these notes for $3.77. You're not tied to anything after your purchase.