Sequential Programs

Back to UWaterloo

MIPS [aside]

Abstraction: labels & variables

Relocation and Linking

Stack: register $30 stores the stack pointer, which points to the top of the stack

Evaluating Expressions

  1. Using the stack (see the sketch after this list)
  2. Using temp vars/virtual registers
  3. Using temp vars with operations on registers
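A minimal Scala sketch of strategy 1 (not the course's Code DSL; names are illustrative): the interpreter's recursion stands in for the explicit stack that generated code would use to save e1's value while e2 is evaluated.

sealed trait Expr
case class Num(n: Int) extends Expr
case class BinOp(e1: Expr, op: (Int, Int) => Int, e2: Expr) extends Expr

def evaluate(e: Expr): Int = e match {
  case Num(n) => n
  case BinOp(e1, op, e2) =>
    val v1 = evaluate(e1) // generated code: evaluate e1 into $3, then push $3
    val v2 = evaluate(e2) // evaluate e2 into $3; e1's value is safe on the stack
    op(v1, v2)            // pop into $4, apply op to $4, $3
}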

If statements:

if (e1 op e2) T else E

evaluate(e1)   # result in $3
t1 = $3        # save e1's value in a temp
evaluate(e2)   # result in $3
$4 = t1        # reload e1's value
compare $4 and $3; branch to labelElse if the test fails
  T
jump to labelEnd
Define(labelElse)
  E
Define(labelEnd)
# CALLER
Stack.allocate(parameters)
LIS(Reg.targetPC)
Use(proc)
JALR(Reg.targetPC)

# PROCEDURE/CALLEE
Define(proc)
Reg.savedParamPointer = Reg.allocated  # Reg.allocated holds the parameter chunk just allocated by the caller
Stack.allocate(frame)
dynamicLink = Reg.framePointer  # saved frame pointer of caller
# savedPC is on the Stack, this is needed for nested procedures
savedPC = Reg.savedPC # Reg 31
Reg.framePointer = Reg.allocated
paramPointer = Reg.savedParamPointer
# body
Reg.savedPC = savedPC
Reg.framePointer = dynamicLink
Stack.pop # pops frame chunk
Stack.pop # pops parameters chunk
JR(Reg.savedPC) 

On the stack:

A5: don't clobber Reg.result

Static link
A static link is a pointer to the frame of the statically enclosing procedure of the currently executing procedure
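A small (hypothetical) example of why this is needed: inner reads n out of outer's frame, which is reached through the static link, not through inner's caller.

def outer(n: Int): Int = {
  def inner(m: Int): Int = m + n // n lives in outer's frame; reach it via the static link
  inner(1) + inner(2)
}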

First-class functions

Heap

A data structure that manages memory so it can be allocated and freed

Objects

Objects can be thought of in terms of closures

def newCounter: (() => Int, Int => Unit) = {
  var value = 0
  def get() = value
  def incrementBy(amount: Int): Unit = {
    value = value + amount
  }
  (get _, incrementBy _) // returns a pair of functions that share an environment stored on the HEAP
}
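Usage, to see the shared environment in action:

val (get, incrementBy) = newCounter
incrementBy(5)
incrementBy(2)
// get() == 7: both closures read and write the same heap-allocated `value`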

A6 Tutorial

// Testing your closures locally

def increaseBy(increment: Int): (Int)=> Int = {
  def procedure(x:Int) = { x + increment }
  procedure
}

def main(a: Int, b: Int) = (increaseBy(a))(b)

// now reimplemented in Lacs
val increment = new Variable("increment")
val increaseBy = new Procedure("increaseBy",
  Seq(increment), Seq())

val x = new Variable("x")

// 4th arg is an Option: None, or Some(statically enclosing Procedure)
val procedure = new Procedure("procedure", Seq(x), Seq(), Some(increaseBy))

// some helper function
def v(variable: Variable): Code = read(Reg.result, variable)
// now define the body of the procedure
// make sure variable access handles outer-scope variables (similar to a regular read, but through the static link)
procedure.code = binOp(v(x), plus, v(increment)) // increment is an outer variable

increaseBy.code = Closure(procedure)

val a = new Variable("a")
val b = new Variable("b")

val parameterVar = new Variable("parameterVar")
val main = new Procedure("main", Seq(a,b), Seq())

// you can't hardcode a paramChunk: which parameter chunk holds the outer variables isn't known until runtime
main.code = CallClosure(call(increaseBy, v(a)), Seq(v(b)), Seq(parameterVar))

val machineCode = compilerA6(Seq(main, increaseBy, procedure))
val endState = A1.loadAndRun(machineCode.words, Word(encodeSigned(1)), Word(encodeSigned(2)))

Finite Automata

In the theory of computation, a deterministic finite automaton (DFA) is a deterministic finite state machine that accepts or rejects finite strings of symbols, producing a unique run of the automaton for each input string. DFAs recognize exactly the regular languages, which makes them useful for lexical analysis and pattern matching

[ASIDE] A regular language is a formal language (a set of strings over an alphabet, constrained by rules) that can be expressed with regular expressions.

Formally, a DFA is a 5-tuple <Σ, Q, q0, A, δ> where Σ is the input alphabet, Q is the finite set of states, q0 in Q is the start state, A ⊆ Q is the set of accepting states, and δ: Q x Σ -> Q is the transition function

DFA recognition algorithm: start in q0 and follow δ on each input symbol; accept iff the state reached after consuming the whole word is in A.
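A minimal sketch in Scala (assumed representation: states as Ints, δ partial and returning None when stuck):

case class DFA(start: Int, accepting: Set[Int], delta: (Int, Char) => Option[Int])

def recognize(dfa: DFA, w: String): Boolean =
  w.foldLeft(Option(dfa.start)) { (state, c) =>
    state.flatMap(q => dfa.delta(q, c)) // follow δ; stays None once stuck
  }.exists(dfa.accepting)               // accept iff the final state is in A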

A non-deterministic finite automaton (NFA) has multiple choices of transitions between states

Formally, an NFA is a 5-tuple <Σ, Q, q0, A, δ>, the same as a DFA except that δ: Q x Σ -> 2^Q is a transition function to a set of states, where 2^Q is the power set of Q

NFA recognition algorithm: track the set of all states reachable on the input read so far (sketch below).

A word w is accepted by an NFA if δ*(q0, w) ∩ A ≠ {}, i.e. at least one of the states reachable on w is accepting
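The matching sketch, tracking the whole set of reachable states:

case class NFA(start: Int, accepting: Set[Int], delta: (Int, Char) => Set[Int])

def recognize(nfa: NFA, w: String): Boolean = {
  val reachable = w.foldLeft(Set(nfa.start)) { (states, c) =>
    states.flatMap(q => nfa.delta(q, c)) // all states reachable after reading c
  }
  (reachable intersect nfa.accepting).nonEmpty // δ*(q0, w) ∩ A ≠ {}
}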

An NFA-ε additionally allows ε-transitions (ε is the empty string), which change state without consuming input; this makes machines easy to compose. To eliminate all epsilon-transitions from an NFA:

  1. compute the ε-closure of every state
  2. add a direct transition q -a-> r whenever some q' in the ε-closure of q has a transition q' -a-> r
  3. make q accepting if its ε-closure contains an accepting state
  4. drop the ε-transitions

Scanning

Breaking a string into tokens, using a DFA for the token language constrained by the maximal munch property: always take the longest possible next token

Let L be the language of tokens; then L* is the language of words that can be scanned

Method (a Scala sketch follows these steps):

  1. run the DFA for L until it gets stuck
  2. if stuck in a non-accepting state, backtrack to the last-visited accepting state; if no accepting state was visited, ERROR
  3. output a token corresponding to the accepting state
  4. reset the DFA to its start state and repeat from 1 (this is why we had to add epsilon transitions for L* earlier)
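A self-contained sketch of the method (same DFA shape as the earlier sketch; tokens are just the matched substrings):

def scan(start: Int, accepting: Set[Int],
         delta: (Int, Char) => Option[Int], input: String): Option[List[String]] = {
  def munch(pos: Int, acc: List[String]): Option[List[String]] =
    if (pos == input.length) Some(acc.reverse)
    else {
      var state: Option[Int] = Some(start)
      var i = pos
      var lastAccept = -1 // end of the longest accepting prefix seen so far
      while (state.isDefined && i < input.length) { // 1. run until stuck
        state = state.flatMap(q => delta(q, input(i)))
        i += 1
        if (state.exists(accepting)) lastAccept = i
      }
      if (lastAccept < 0) None // 2. no accepting state visited: ERROR
      else munch(lastAccept, input.substring(pos, lastAccept) :: acc) // 3. emit token; 4. restart
    }
  munch(0, Nil)
}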

Regular Expressions

A regular expression expresses a regular language, so L below is a function that maps each regular expression to the set of strings it represents.

ε        L(ε) = {ε}                                # empty string
a in Σ   L(a) = {a}                                # literal character
ø        L(ø) = {}                                 # empty set
R1 | R2  L(R1|R2) = L(R1) ∪ L(R2)                  # alternation; R1, R2 are regexes
R1R2     L(R1R2) = {xy | x in L(R1), y in L(R2)}   # concatenation
R*       L(R*) = {x1 x2 .. xn | for all i, xi in L(R), n ≥ 0}  # Kleene star

Kleene's Theorem: the following are equivalent:

  1. there exists a DFA specifying L
  2. there exists an NFA without epsilon-transitions specifying L
  3. there exists an NFA specifying L (possibly with epsilon-transitions)
  4. there exists a regex specifying L
  5. L is regular

Context-Free Grammars

Regular expressions can't express unlimited recursive structures. We want nesting in programming languages, so we need something more powerful than regular languages.

A context-free grammar (CFG) is a set of recursive rewriting rules (productions) used to generate patterns of strings. Regular expressions use iteration; context-free grammars use recursion

You can represent all regular languages with CFGs, but not all context-free languages can be described by regular expressions/DFAs/NFAs

To generate a string of terminal symbols from a CFG: start from the start symbol S and repeatedly replace some nonterminal with the right-hand side of one of its productions, until only terminals remain

canonical derivations: leftmost and rightmost; fixing one canonical order makes derivations correspond one-to-one with parse trees, which is what matters when parsing

Parsing

CYK

CYK starts with S, the start nonterminal, as alpha. We want to expand the nonterminals in alpha so that the input string is derived by the CFG, so we check all possible chains of production rules until a valid parse tree is found. The algorithm follows four cases, each trying to reduce alpha, the current string of terminal and nonterminal symbols, toward the base case where alpha is empty. Since alpha is our current state, we keep checking its leftmost symbol to see if it can consume the input, the end goal being an empty alpha and empty input (i.e. all of the input is covered by a parse tree of terminal/nonterminal symbols). The expensive case is when the leading symbol of alpha is a nonterminal, since the input must then be split in two, with the nonterminal deriving the first partition and the rest of alpha deriving the second. There are a quadratic number of substrings of the input to check, each with a linear number of split points, so the time complexity is O(n^3). Dynamic programming eliminates the repeated subproblems with a memoization table.
- bottom-up parsing, with dynamic programming to save results in a table
- given w, does S =>* w?
- sub-problem: given x, does alpha =>* x, where alpha is a sequence of terminals and nonterminals and x is a sequence of characters
- w = yxz for some y, z; x is a substring of w
- A -> gamma alpha in P: alpha is a suffix of the RHS of a rule in P
- so we test every such suffix alpha against every substring x, in O(|w|^3) time: for each of the O(|w|^2) substrings, check O(|w|) partitions
- the memoization table has a row for each suffix alpha and a column for each substring x of w; each cell records whether alpha =>* x
- cases for filling our table (a sketch follows this list):
- alpha = epsilon (empty string): true iff x = epsilon
- alpha = a beta (leading terminal): true iff x = ay AND beta =>* y
- alpha = A (single nonterminal): true iff there exists A -> gamma in P s.t. gamma =>* x
- alpha = A beta: true iff there exist y, z where x = yz and A =>* y and beta =>* z
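A memoized recognizer following these four cases (a sketch; it assumes the grammar has no left recursion, since this naive memoization would otherwise loop, and that every nonterminal has an entry in the productions map):

sealed trait Sym
case class T(c: Char) extends Sym      // terminal
case class N(name: String) extends Sym // nonterminal

def recognizes(productions: Map[N, Seq[Seq[Sym]]], start: N, w: String): Boolean = {
  val memo = collection.mutable.Map[(Seq[Sym], String), Boolean]()
  def derives(alpha: Seq[Sym], x: String): Boolean =
    memo.getOrElseUpdate((alpha, x), alpha match {
      case Seq()          => x.isEmpty                                          // alpha = epsilon
      case T(a) +: beta   => x.nonEmpty && x.head == a && derives(beta, x.tail) // alpha = a beta
      case Seq(a: N)      => productions(a).exists(gamma => derives(gamma, x))  // alpha = A
      case (a: N) +: beta =>                                                    // alpha = A beta: split x = yz
        (0 to x.length).exists(i => derives(Seq(a), x.take(i)) && derives(beta, x.drop(i)))
    })
  derives(Seq(start), w)
}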

Earley's Algorithm

Context Sensitive Analysis

Reject programs that satisfy the grammar but are still invalid (e.g. type errors, undeclared variables)

Lacs Types

A type is a collection of values. It is an interpretation of a sequence of bits

A type system is a set of rules that compute the type of an expression from the types of its subexpressions

Using premise-conclusion notation from proofs, type inference rules:

Γ |- NUM: Int    in the context Γ (capital gamma), the symbol table, NUM has type Int
Γ(ID) = τ        Γ must be fed through all rules so that our symbol table is accessible when needed
____________
Γ |- ID: τ

Let E in {expras, expra, expr, term, factor}

Γ |- E1 : Int
Γ |- E2 : Int
_____________
Γ |- E1 op E2 : Int   where op is an arithmetic operation

Type checking the tree takes two passes: the first pass collects all "defdef"s and creates a new ProcedureScope for each; the second pass creates a symbol table for each scope.
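The rules above, transcribed as a tiny checker (a sketch, not the Lacs typer; note how Γ is threaded through every call):

sealed trait Type
case object IntType extends Type

sealed trait Expr
case class Num(n: Int) extends Expr
case class Id(name: String) extends Expr
case class BinOp(e1: Expr, op: String, e2: Expr) extends Expr

def typeOf(gamma: Map[String, Type], e: Expr): Type = e match {
  case Num(_) => IntType                   // Γ |- NUM : Int
  case Id(x)  => gamma(x)                  // Γ(ID) = τ  gives  Γ |- ID : τ
  case BinOp(e1, _, e2) =>                 // premises: both operands have type Int
    require(typeOf(gamma, e1) == IntType)  // throws on an ill-typed program
    require(typeOf(gamma, e2) == IntType)
    IntType                                // conclusion: Γ |- E1 op E2 : Int
}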

Memory Management

Heap is a data structure to manage memory that can be allocated/freed at any time

The stack and heap grow toward each other; by our convention, the stack grows by subtraction and the heap by addition.

We can structure our heap as blocks of memory, each block holding metadata about its size and whether it is free or used, with a linked list threading together the free blocks. A fragmented heap is one whose free space is split into many small blocks, so a large allocation can fail even when enough total memory is free.
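A first-fit free list over a simulated heap (a sketch; real allocators also coalesce adjacent free blocks on release):

case class Block(addr: Int, size: Int)

class FreeList(heapSize: Int) {
  private var free: List[Block] = List(Block(0, heapSize)) // linked list of free blocks

  def allocate(size: Int): Option[Int] =
    free.find(_.size >= size).map { b =>  // first fit
      free = free.filterNot(_ == b)
      if (b.size > size)                  // return the leftover as a smaller block
        free ::= Block(b.addr + size, b.size - size)
      b.addr
    }

  def release(addr: Int, size: Int): Unit =
    free ::= Block(addr, size)            // without coalescing, blocks only ever shrink
}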

Garbage Collector

Figures out the extent of objects for you and frees them implicitly.

Cheney's Algorithm

A stop-and-copy algorithm, which splits our total heap size in half and performs collection by copying live objects (ones still reachable through pointers) from one semispace (the "from-space") to the other semispace (the "to-space")

The algorithm works in 2 parts:

  1. copy every object directly referenced by the roots into to-space, leaving a forwarding pointer at each old location
  2. scan the copied objects in order (breadth-first), copying each object they reference and updating their pointers using the forwarding pointers
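A sketch of the two parts over a simulated object graph (forwarding pointers stand in for to-space addresses; the queue plays the role of the region between the scan and free pointers):

sealed trait Field
case class Data(v: Int) extends Field
class Ptr(var to: Obj) extends Field
class Obj(val fields: Array[Field]) { var forward: Obj = null }

def collect(roots: Seq[Ptr]): Unit = {
  val unscanned = collection.mutable.Queue[Obj]()
  def copy(o: Obj): Obj = {
    if (o.forward == null) {                // not yet copied to to-space
      o.forward = new Obj(o.fields.clone()) // the copy; o.forward is the forwarding pointer
      unscanned.enqueue(o.forward)
    }
    o.forward
  }
  roots.foreach(r => r.to = copy(r.to))     // part 1: copy objects referenced by roots
  while (unscanned.nonEmpty)                // part 2: breadth-first scan of the copies
    unscanned.dequeue().fields.foreach {
      case p: Ptr => p.to = copy(p.to)      // copy children, fix pointers via forwarding
      case _      => ()
    }
}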

Lambda Calculus

For expressing computation based on function abstraction and application. It is Turing-complete and thus acts as a universal model of computation

Lambda calculus consists of a language of lambda terms, defined by a formal syntax, and a set of transformation rules

Church Encoding

A means of representing data and operators in lambda calculus (since in untyped lambda calculus we only have functions)

Church numerals represent the natural number n by the number of times some function f is applied to x:

0 = λf.λx. x
1 = λf.λx. f x
2 = λf.λx. f (f x)
n = λf.λx. fⁿ x
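Church numerals transcribe directly into Scala (a sketch):

type Church[A] = (A => A) => A => A

def zero[A]: Church[A] = f => x => x                         // apply f zero times
def succ[A](n: Church[A]): Church[A] = f => x => f(n(f)(x))  // one more application of f
def plus[A](m: Church[A], n: Church[A]): Church[A] = f => x => m(f)(n(f)(x))

def two[A]: Church[A] = succ(succ(zero))
// decode by counting applications: two((_: Int) + 1)(0) == 2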

Exam