Digital Computers

Summary

Assembly and Data

Input/Output

Memory System

Caches

Virtual Memory

Processors

Notes

Part 3: ISA & Addressing Modes

Intro to Addressing

Instruction Sets

Representative Notations

Addressing Modes

Stacks

Part 4: ARM

ARM ISA Characteristics

Addressing Modes
Memory Operations
Instructions

All instructions can take registers or literals
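
For example (a minimal sketch in GNU ARM syntax):

    add r0, r1, r2      @ both source operands are registers: r0 = r1 + r2
    add r0, r1, #10     @ second operand is a literal (immediate): r0 = r1 + 10
    mov r3, #0xFF       @ a literal also works with mov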

Assembler

The assembler converts a source program into an object program in machine language

Conditional

ARM - conditionals
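
A minimal sketch of conditional execution (GNU ARM syntax; the label is made up):

    cmp   r0, r1        @ compare r0 and r1, setting the condition flags
    addgt r2, r2, #1    @ executed only if r0 > r1 (signed)
    beq   done          @ branch taken only if r0 == r1
    done:               @ made-up target label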

Part 5: Basic IO

A system bus is an example of an interconnection network, joining I/O devices, memory devices and the CPU

Program Controlled I/O

Interrupt Service Routine

- A special subroutine, called the interrupt service routine (ISR), is responsible for handling the service requested by the interrupting device and leaving the processor in a consistent state when it returns
 - The difference between an ISR and an ordinary subroutine is that the ISR has to copy many registers to the stack to preserve their values (see the sketch below)
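
A rough sketch of the register-saving idea (ARM-style syntax; the ISR name, device register address and return mechanism are assumptions, and the exact details depend on the processor):

    my_isr:
        push  {r4-r7, lr}        @ save registers this ISR will modify, plus the return address
        ldr   r4, =0x40001000    @ hypothetical device status register address
        ldr   r5, [r4]           @ read the status to service/acknowledge the request
        pop   {r4-r7, lr}        @ restore the saved registers
        bx    lr                 @ return to the interrupted program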

Interrupt Nesting

Long-running ISRs can delay the servicing of higher-priority devices; interrupt nesting lets a higher-priority interrupt preempt the ISR that is currently running

Processor Control Registers

In addition to the processor status (PS) register, other control registers are present


mov r2, ps        @ copy the processor status (PS) register into r2

Part 6: IO Organization

Basic Hardware

I/O Interface

Bus Operation

Part 7: Memory Systems

Array of simple memory cells, each storing a single bit of information

DRAM
DMA

Direct Memory Access (DMA) uses a hardware unit that handles memory transfer operations on behalf of the processor. Program-controlled I/O is taxing on the processor, with lots of overhead just to access a single byte. The processor delegates the transfer to the DMA controller, which raises an interrupt (serviced by an ISR) when it is finished.

Caches & Memory Hierarchy

The hierarchy of memory components goes from fastest to slowest, with a speed/cost trade-off

Caches are faster than main memory because of their hardware implementation: the cache sits on the CPU, is built from fast transistor (SRAM) cells, and accesses don't have to go over the external bus. Additionally, the cache is smaller, so there is less to check.

Cache Writing Protocol for Store R2, (R3):

Mapping Functions determine the location in the cache for each memory address

Cache Structure
Replacement Algorithms

For set-associative and fully-associative caches, a miss may require evicting a block, and which block to evict is determined by a replacement algorithm.

Cache performance

For C the time required to access a block in the cache, M the time for an access that misses and goes to memory (including the miss penalty) and h the hit rate, the average access time is intuitively tavg = hC + (1-h)M.

If we're given M' instead, the latency of accessing memory alone, then tavg = C + (1-h)M': every access, hit or miss, incurs the time taken to access the cache, and a miss adds the extra memory latency on top.
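
For example (made-up numbers): with h = 0.95, C = 1 cycle and M' = 20 cycles, tavg = 1 + 0.05•20 = 2 cycles. Equivalently, M = C + M' = 21 cycles and hC + (1-h)M = 0.95•1 + 0.05•21 = 2 cycles, so the two forms agree.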

For two levels of cache (L1 and L2), intuitively tavg = h1•C1 + (1-h1)(h2•C2 + (1-h2)M)
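
For example (made-up numbers): with h1 = 0.9, C1 = 1 cycle, h2 = 0.8, C2 = 10 cycles and M = 100 cycles, tavg = 0.9•1 + 0.1•(0.8•10 + 0.2•100) = 0.9 + 0.1•28 = 3.7 cycles.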

Solving Cache Problems

Given a-bit addresses, main memory holds 2^a bytes or words (depending on the addressing scheme), with a cache of c bytes/words and b bytes/words per cache block. The cache holds c/b blocks. For a direct-mapped cache, the word bits are the first (rightmost) log2(b) bits, the block bits are the next log2(c/b) bits, and the tag bits are the remainder.
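
For example (made-up parameters): with 16-bit byte addresses (2^16 bytes of main memory), a 256-byte direct-mapped cache and 16-byte blocks, the cache holds 256/16 = 16 blocks; the word bits are log2(16) = 4, the block bits are log2(16) = 4, and the tag is the remaining 16 - 4 - 4 = 8 bits.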

For final-cache-contents questions, use the timeline table method: visually update each cache block along a timeline of accesses. Then you can quickly figure out which block is the least recently used.
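
For example (made-up access sequence): with a fully-associative cache of 2 blocks and accesses A, B, A, C, the timeline shows A and B filling the cache, then A being re-used on a hit; at the miss on C the least recently used block is B, so B is evicted.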

Virtual Memory

Programs are written using the full 2^32 address space, but physical memory capacity is often less than this (2 GB, for example). Virtual memory backs the rest of the address space on disk, bringing pages into physical memory as needed.

Lab4: ISR

  1. generate a random delay of 5-25 s in R6
  2. display it on the LEDs
  3. delay 1 s, then decrement your ## in R6, then display
  4. if R4 ≤ 0, flash the LEDs on and off for 1 sec each until you press the button (ISR)

Part 8: Basic Processing

The CPU has a control unit, an ALU and registers. Fundamentally, an instruction goes through 5 steps: fetch, decode, execute, memory access and write-back.

Processor has hardware to implement every instruction in the ISA.

The datapath of a processor holds the components that implement the ISA.

Design of Register file:

  1. Fetch gets whatever is at the PC address.
  2. Decode
  3. Execute
  4. Memory access
  5. Write back

Part 9: Pipelining

Pipelining allows multiple instructions to be in progress concurrently, each using a different part (stage) of the processor

Issues

For pipelining problems, be aware of how the register files are being updated. Forwarding is important to understand: it can be done by adding a mux that feeds the output of one stage back to an input of an earlier stage so values can be shared. An intermediate register may hold a wrong (stale) value; the added mux lets us select the actual correct value from the output of a previous instruction instead. In general, we don't want to add a mux in the decode stage, since the mux is expensive time-wise there; adding it to other stages is negligible since they already involve memory operations.

A change made by some stage at a given clock cycle is not noticed until the next cycle.

| instr | PC    | R4   | RA | RM | RZ | RY |
|-------|-------|------|----|----|----|----|
| 1 F   | 37C00 | 1000 | -  | -  | -  | -  |
| 2 D   | 37C04 | "    |    |    |    |    |
| 3 X   |       |      |    |    |    |    |
| 4 M   |       |      |    |    |    |    |
| 5 W   |       |      |    |    |    |    |

Memory Stalls

Branches alter sequential execution, and their effect is unknown until the execute stage. The pipeline needs to squash the instructions fetched down the wrong path; the time lost is the branch penalty.
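
For example, in the 5-stage pipeline above, if the branch outcome is only known at the end of the execute stage, the two instructions fetched after the branch must be squashed, so the branch penalty is 2 cycles.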