Back to UWaterloo
 DBMS supports an underlying data model, and authorize users
 offer durability and concurrency control
 a schema instance is a collection of data
 set notation can describe subsets of data, this is a query: \(\{ (x_1, x_2, \dots, x_k)  \phi \}\), find some tuple given a logical equation \(\phi\)

answer to the query above over DB is the relation \(\{ (\theta(x_1), \dots, \theta(x_k)) \  \ DB, \theta \models\phi \}\)
 so the relation is a set of results that satisfy our logical conditions \(\phi\) for tuples being mapped over our valuation \(\theta\)
 example forgets about finite constraint, and models a three column table to compute addition

\(\{(x,y)  PLUS(x,y,5)\}\) is a query for all pairs that sum to 5
 apply predicate logic for more expressive queries
relational model organizes information in a finite number of relations

universe is a set of values D with equality

relation, canonically a table
 has a predicate name R, and arity k of R (number of arguments, in this case number of columns)

relation instance is \(R \subseteq D^k\)

database has a signature defining a finite set \(\rho = (R_1, \dots, R_n)\) of predicate names

instance is a relation Ri for each \(R_i\), bold means relation for each predicate name
 notation: \(DB = (D, =, R_1, \dots, R_n)\), for example \(DB = (Z, =, PLUS, TIMES)\)

AUTHOR
might be some \(R_i\), the label, its relation instance Ri might have arity of 2, id and name, part of the universe of tuples \(Z \times Names\)
 on truthiness, when a relationship between objects (tuples) is present, it is true, and false if relationship is absent
 a valuation is a function \(\theta\) that maps variable names to values in the universe D: \(\theta : \{x_1, x_2, \dots\} \rightarrow D\)

\(\theta[x \mapsto v]\) defines x to map to v in our function
 the truth of a formula \(\phi\) (truth being existence of a relation) depends of a database instance and valuation
integrity constraints are laws, that must be true in every valid database instance
 addition is commutative
 is a binary function
 is a total function
 is monotone
example:
 every boss manages a unique department: \(\forall x_1,y_1, x_2, y_2 ,z.EMP(x_1,y_1,z) \land .EMP(x_2,y_2,z) \rightarrow y_1 = y_2\)
builds on the relational calculus
 SQL is a standard, has many implementations
 has three parts: Data Manipulation, Data Definition, Data Control
First Order SQL
Summary:
 captures all of relational calculus
 polynomial runtime (PTIME)
 shortcomings: no aggregation (counting) or recursion (path in graph)
Details:
 simple \(\exists, \land\) queries use
SELECT, FROM, WHERE
to declare

R AS p
defines a tuple variable (correlation) p, with attribute names \(a_1, a_2, \dots a_n\), similar to relational calculus where we would say \(R(p.a_1, p.a_2, \dots, p.a_n)\)

AS
keyword is optional: SELECT r.publication FROM wrote r
 this is even still equivalent to
SELECT publication FROM wrote
 for \(\lnot, \lor\) we need set operators
UNION, EXCEPT, INTERSECT
 we can use this to connect queries together, but both queries must return the same signature (be unioncompatible)

OR
in the WHERE
clause looks like UNION
, but often doesn't cover the empty set case
 of course with \(\lnot, \exists\) we can form \(\forall\) queries now
 for nested queries (something like \((A \lor B) \land C\)) can use the keyword
WITH
to name the results of a child query for use in another query
WITH
can be omitted by inlining queries SELECT book.title FROM (SELECT * FROM books) book

WHERE
subqueries are syntactic sugar that allow you to inline queries that look like predicates, with keywords IN, SOME, ALL, EXISTS, NOT
SELECT title FROM publication WHERE pubid in (SELECT pubid FROM article)

parametric subqueries allow queries to depend on (attributes from) the main query
 semantically, subqueries let us say "All x in R s.t. part of x doesn't appear in S"
Example: List all authors who always publish with someone else
SELECT DISTINCT a1.name
FROM author a1, a2
WHERE NOT EXISTS (
SELECT * FROM publications p, wrote w
WHERE p.pubid == w.publication
AND a1.aid = w.author
AND a2.aid NOT IN (
SELECT author
FROM wrote
WHERE publication = p.pubid
AND author <> a1.aid
)
)
Modifying Database
More SQL syntax for actually modifying state based on first order logic
Aggregation
Extension of first order SQL
 finding a number of tuples (counting), min/max, mapping values over a column
Data Modeling and EntityRelationship Model
 anomalies in schema design lead to violations of transactional consistency
 functional dependencies let us reason about ways to perform schema decomposition, thereby avoiding anomalies

computeX+(X, F) is an algorithm for determining which columns can be determined from the key + functional dependency set pair (by continuously adding any attributes to the set of X+ that are determined by elements in X+ currently)
 a decomposition is dependency preserving if we get some equivalent F' and none of the functional dependencies are interrelational (requires a join)
 Normalization is the process of decomposing a schema (set of relation schemas), so that it is in some standard form
 BCNF is the most desirable form, followed by 3NF
 multivalued dependencies capture anomalies/redundancies not captured by functional dependencies (when X >> Y, means a set of values for Y is determined instead of a single unique value)
 a dependency basis is a way to partition a relation schema so that we can further decompose schemas with multivalued dependencies, since splitting righthand sides of dependencies (as in minimal cover) doesn't work for multivalued (because it uniquely defines a large set, the cartesian product of all rightside attributes' values)
 breaking down the example for CourseTeacherHourRoomStudentGrade relation, "dependency basis for X = C is RX = [Y1 = T, Y2 = HR, Y3 = SG] s.t. F = C >> Z iff Z  C is a union of some of the partitions of RX"
 5NF is even stronger, using join dependencies, but has no axioms
Query Execution
We use relational algebra for implementing query execution
 all relational calculus queries are representable in relational algebra
 naive implementations of each operator are intuitive, but slow
 one optimization is to use indexes
 calculating the cost of different physical plans can be done using the simple cost model for I/O, using several parameters to estimate relative cost of operations, based on assumptions of uniformity and independence
Transaction Execution