Skip to main content

Design

Database

  • Massive
  • Persistent
  • Safe
  • Multi-user
  • Convenient
  • Efficient
  • Reliable

UML

Unified Modeling Language: PlantUML

Classes

for data modeling:

  • add PK(primary key)
  • drop methods
-----------
| student |
|---------|
|sID PK |
|sName |
|GPA |
|---------|
|<methods>|
-----------

Associations

relationships between objects of 2 classes:

  • one to one: 1..1 --- 1..1.
  • many to one: * --- 1..1
  • one to many: 1..1 --- *.
  • many to many: * --- *.
-----------                   ---------
| student | |college|
|---------| | |
|sID PK |x..y Apply m..n| |
|sName |-------------------| |
|GPA | | |
|---------| | |
|<methods>| | |
----------- ---------

Associations Classes

  • classes store information of relationship edge between 2 data classes
  • unnecessary if 0..1 or 1..1
c1 * --- 1..1 c2
information of relationship edge can stored in c1
owing to every object of c1 only associated with 1 object of c2

Subclasses

children classes

Relational Algebra

Operators

  • select operator σ(sigma): σ(sID < 100 ^ sAge > 20)Table_Name set constraints
  • project operator π(pi) : π(sID, GPA)Table_Name select certain columns
  • cross-product operator x: Table1 x Table2, m tuples(rows) x n tuples(rows) => m*n tuples(rows)
  • natural join operator ∞: σ(E1.A1 = E2.A1 ^ E1.A2 = E2.A2 ...) (E1 x E2)
  • theta join operator ∞(condition): σ(condition) (E1 x E2), call condition as ϴ
  • difference operator -: matching schemas => change rows/tuples
  • union/intersection operator ∪ / ∩: matching schemas => change rows/tuples
  • rename operator ρ: change schemas(attributes name), different schemas <=> same schemas (union/intersection/self-join)
  • assign statement :=
  • tree notation
π(sID, GPA) (σ(sID < 100 ^ GPA > 3.7) Student)

Relational Design

Decomposition

  • start with mega-relations: including all attributes
  • decompose into smaller relations(BCNF/4NF)

Functional Dependencies

  • A -> B => 1-1/n-1 mapping
  • Key sets: closure of sets contains all attributes

Assuming relation R(A, B, C, D, ..., G) and closure of A, B {A, B}, A->C->D, B->E->F, F->G => {A, B}+ = {A, B, C, ..., G}, then {A, B} is a key. If no such closure exists, then treat all attributes as a key.

BCNF

Boyce-codd normal form:

For each A -> B having A is super key && B isn't key, A -> B -> C does not exist. Here's the algorithm:

/*
* @brief fixed point algorithm just like most algorithms from compiler
*
* by decomposing to transform non-key dependent attributes to key dependent attributes
*/

compute FDs for R
compute key for R using its FDs

while (there is relation R' aren't in BCNF) {
pick any R' with A -> B that violates BCNF (A is not its key)
decompose R' into R1(A, B) and R2(A, rest)
compute FDs for R1 and R2
compute keys for R1 and R2 using their FDs
}

Multi Valued Dependencies

  • A -> B && rest attributes => A ->> B.
  • A ->> B (1-n mapping), A ->> C (1-n mapping), no B -> C/C ->> B, B * C redundant tuples/rows.
  • A ->>B && A ->>C => A ->> B∩C.
  • A ->>B && B ->>C => A ->> C-B.

4NF

Fourth normal form:

If A ->> B then A is key && B isn't key, here's the algorithm:

/*
* @brief fixed point algorithm just like most algorithms from compiler
*
* by decomposing to transform non-key dependent attributes to key dependent attributes
*/

compute FDs and MVDs for R
compute key for R using its FDs

while (there is relation R' aren't in 4NF) {
pick any R' with A ->> B that violates 4NF(A is not its key)
decompose R' into R1(A, B) and R2(A, rest)
compute FDs and MVDs for R1 and R2
compute keys for R1 and R2 using their FDs
}

Normalized Design

  • every row has the same number of columns
  • every row has a unique key(PRIMARY KEY)
  • everything in a row is all relevant to unique key
  • everything in a row is all relevant to each other
note
  • (id, name, birth, major, grade) normalized to (id, name, birth) + (id, major, grade): grade is not relevant to student id.
  • (name, os, lang) normalized to (name, os) + (name, lang): os isn't relevant to lang.