Design
Database
- Massive
- Persistent
- Safe
- Multi-user
- Convenient
- Efficient
- Reliable
UML
Unified Modeling Language: PlantUML
Classes
for data modeling:
- add PK(primary key)
- drop methods
-----------
| student |
|---------|
|sID PK |
|sName |
|GPA |
|---------|
|<methods>|
-----------
Associations
relationships between objects of 2 classes:
- one to one:
1..1 --- 1..1. - many to one:
* --- 1..1 - one to many:
1..1 --- *. - many to many:
* --- *.
----------- ---------
| student | |college|
|---------| | |
|sID PK |x..y Apply m..n| |
|sName |-------------------| |
|GPA | | |
|---------| | |
|<methods>| | |
----------- ---------
Associations Classes
- classes store information of relationship edge between 2 data classes
- unnecessary if 0..1 or 1..1
c1 * --- 1..1 c2
information of relationship edge can stored in c1
owing to every object of c1 only associated with 1 object of c2
Subclasses
children classes
Relational Algebra
Operators
- select operator σ(sigma):
σ(sID < 100 ^ sAge > 20)Table_Nameset constraints - project operator π(pi) :
π(sID, GPA)Table_Nameselect certain columns - cross-product operator x: Table1 x Table2,
m tuples(rows) x n tuples(rows) =>
m*ntuples(rows) - natural join operator ∞: σ(E1.A1 = E2.A1 ^ E1.A2 = E2.A2 ...) (E1 x E2)
- theta join operator ∞(condition): σ(condition) (E1 x E2), call condition as ϴ
- difference operator -: matching schemas => change rows/tuples
- union/intersection operator ∪ / ∩: matching schemas => change rows/tuples
- rename operator ρ: change schemas(attributes name),
different schemas
<=>same schemas (union/intersection/self-join) - assign statement :=
- tree notation
π(sID, GPA) (σ(sID < 100 ^ GPA > 3.7) Student)
Relational Design
Decomposition
- start with mega-relations: including all attributes
- decompose into smaller relations(BCNF/4NF)
Functional Dependencies
- A -> B =>
1-1/n-1mapping - Key sets: closure of sets contains all attributes
Assuming relation R(A, B, C, D, ..., G) and closure of A, B {A, B},
A->C->D, B->E->F, F->G => {A, B}+ = {A, B, C, ..., G}, then {A, B} is a key.
If no such closure exists, then treat all attributes as a key.
BCNF
Boyce-codd normal form:
For each A -> B having A is super key && B isn't key,
A -> B -> C does not exist.
Here's the algorithm:
/*
* @brief fixed point algorithm just like most algorithms from compiler
*
* by decomposing to transform non-key dependent attributes to key dependent attributes
*/
compute FDs for R
compute key for R using its FDs
while (there is relation R' aren't in BCNF) {
pick any R' with A -> B that violates BCNF (A is not its key)
decompose R' into R1(A, B) and R2(A, rest)
compute FDs for R1 and R2
compute keys for R1 and R2 using their FDs
}
Multi Valued Dependencies
A -> B && rest attributes=>A ->> B.A ->> B(1-n mapping),A ->> C(1-n mapping), noB -> C/C ->> B,B * Credundant tuples/rows.A ->>B && A ->>C=>A ->> B∩C.A ->>B && B ->>C=>A ->> C-B.
4NF
Fourth normal form:
If A ->> B then A is key && B isn't key,
here's the algorithm:
/*
* @brief fixed point algorithm just like most algorithms from compiler
*
* by decomposing to transform non-key dependent attributes to key dependent attributes
*/
compute FDs and MVDs for R
compute key for R using its FDs
while (there is relation R' aren't in 4NF) {
pick any R' with A ->> B that violates 4NF(A is not its key)
decompose R' into R1(A, B) and R2(A, rest)
compute FDs and MVDs for R1 and R2
compute keys for R1 and R2 using their FDs
}
Normalized Design
- every row has the same number of columns
- every row has a unique key(PRIMARY KEY)
- everything in a row is all relevant to unique key
- everything in a row is all relevant to each other
note
(id, name, birth, major, grade)normalized to(id, name, birth)+(id, major, grade):gradeis not relevant tostudent id.(name, os, lang)normalized to(name, os)+(name, lang):osisn't relevant tolang.