Three views of an OS:
Threads expose parallelism, enabling faster program execution and better processor utilization through less blocking
Preemption is forcing a running thread to stop so that another thread has a chance to run, accomplished using interrupts (e.g. a timer-based interrupt)
A critical section is a section of code that accesses a shared resource (e.g. shared memory) and must not be executed by more than one thread at a time
Hardware-specific synchronization instruction, xchg:
provides a way to test and set a lock in a single atomic operation (instead of multiple)
OS/161 has locks which block instead of spinning
Wait channels are used to implement thread blocking
wchan_sleep
blocks a thread, and causes a context switch like thread_yield
wchan_wake
unblocks a thread sleeping on a certain (or any) wait channel
A semaphore keeps track of how many resources are available
// cv_wait, like the lock acquire implementation?
cv_wait(struct lock *my_lock, struct cv *my_cv) {
    lock_release(my_lock);           // give up the lock before sleeping
    wchan_sleep(my_cv->cv_wchan);    // block on the CV's wait channel (field name illustrative)
    lock_acquire(my_lock);           // reacquire before returning
}
// (in OS/161 the wait channel is locked before lock_release so a wakeup
// arriving between the release and the sleep can't be lost)
Bounded Buffer can be written with locks and CVs: lock, then while the buffer is full cv.wait(notfull, mutex), then insert the item, cv.signal(notempty, mutex), unlock.
CV implementation: 1 lock and 4 CVs for assignment 1. Don't use timers. Use pressure-plate-like behaviour for fairness: let k more cars through, or if the intersection clears, switch.
If you own the lock that you try to acquire, you get a deadlock
2 techniques for prevention
Back off and retry: if you can't get the second lock, give up the first so others can make progress:
acquire(A)
while (!try_acquire(B)) {
    release(A)
    acquire(A)
}
// now guaranteed to have A and B
Not great because a context switch isn't guaranteed, and we are spinning
Resource ordering, assign all resources a unique number and then you must acquire resources in strictly increasing order
Suppose that a threaded program has N queues of items. The program needs to support an operation called Transfer(i, j). Each call to Transfer will transfer a single item from the ith queue to the jth queue, unless there is nothing in the ith queue, in which case the call will not affect the queues. The program will have multiple concurrent threads, each of which may call Transfer zero or more times. How would you use locks to ensure that Transfer operations are atomic? Specifically, how many locks would you use, what would each lock protect, and when would the locks be acquired and released to ensure that transfers are atomic?
a process is the environment created to run a program in
a process has an array of threads, data, files
kernel abstracts physical RAM as virtual memory, remapping each process's addresses to start at 0, so every process is isolated and each thinks it is the only thing running; processes don't directly talk to each other
process management calls: creation, destruction, synchronization, attribute management
fork creates an identical clone, the child; returns 0 if it's the child (and the child's pid in the parent)
exit terminates, stores a final message (status code) and turns the process into a zombie
waitpid makes the process go to sleep until the process with the given pid terminates; can only be called on children
You need synchronization around the global parts of sys4 implementation
System calls to implement:
proc_create_runprogram() to set up the process (a struct proc p = kmalloc / memcopy-like thing)
as_copy(), pass it the old and new address spaces; it allocates memory and copies it from the parent
thread_fork() creates the new thread; needs an entry point function to set up the kernel stack so that it can return to user mode
// entry point for the child thread
entrypt(void *tf, unsigned long unused) {
    // take the parent's trap frame and put it on the kernel stack
    struct trapframe t = *(struct trapframe *)tf;
    // set up the return value for fork, which needs to go in the trap frame for this thread
    t.tf_v0 = 0;       // fork returns 0 in the child
    t.tf_a3 = 0;       // signal no error
    t.tf_epc += 4;     // advance the program counter past the syscall instruction
    mips_usermode(&t); // fake the return from the exception
}
copy the parent's trap frame to the heap, and pass it to the child (so no synchronization issue)
If the kernel panics on a page fault, it's just telling you you're getting a segfault: doing something with memory you're not allowed to do
Warning: Do hex math on exam
In our address space, code starts at 0, followed by data. Data grows up. The stack starts on the far right (top of the address space) and grows down, to the left.
In OS-161, stack is set by OS.
Each process has an address space. OS allocates some ram for the process based on its request (code + data size). Core map is a data structure that is used to track memory that's used vs. free. Each address space is mapped with respect to 0, and uses fake addresses.
Aside: modern operating systems randomize position of code, data, stack and heap to hide any information about the process structure itself
Address Translation: performed in hardware by the MMU. Every time there's a context switch, the kernel tells the MMU how to translate this process's addresses. We have three goals
Dynamic Relocation: most efficient scheme in both time and space.
Paging: divide physical memory into small fixed-size chunks (frames, or physical pages). Now we can request ANY free frames (they don't need to be contiguous)
V -> virtual addr
PGSIZE = page size = frame size
PG# = V / PGSIZE
OFFSET = V % PGSIZE
FRAME# = lookup(PG#) (valid? — check the page table entry's valid bit)
PHYSADDR = FRAME# x FRAMESIZE + OFFSET
= FRAME# x PGSIZE + OFFSET
= first address of frame + offset into frame
#OF_PGS = VMemSize / PGSIZE (not virtual address size)
= 2^16 / 2^12
= 2^4 pages
#BITSPERPG = log (#OF_PGS) (these are bits to address pages)
= log (2^4)
= 4
#OFFRAMES = PhysMemSize / FRAMESIZE
BITSPERFRAME = log (#OFFRAMES)
BITSFOROFFSET = log (PGSIZE)
practice:
a) bitsforoffset = 16
pages = 2^16
b) bitsforpageaddr = 16
c) 48
Vmemsize = 2^16
Physmemsize = 2^18
pagesize = 2^12
pages = 2^4
pageaddrbits = 4
phys = framesize x frame# + offset
page# = 0x1
frame# = 0x26 (from the page table)
phys = 2^12 * 0x26 + 0x2C
= 0x1000 * 0x26 + 0x2C
= 0x2602C
(other practice answers: 0x1502C, 0x32800, 0x14024)
so the format of the solution for "given v find p using page tables": first calculate b, the # of bits needed for the page address; take the first b bits of v and look them up in our page table to get f, the frame number. Finally "append" f and the offset together.
Page table lookups are slow. The idea of the TLB: when an address needs to be translated, check the cache before going out from the CPU to memory. Improves performance, especially on sequential reads (like programs, arrays, etc...)
A page fault is when the TLB does not have the page cached already, so it throws an exception to be handled by the kernel (as in software-managed TLBs)
We want to try to limit page faults. This can be done by
What is the clock replacement algorithm for cache eviction?
Modified round robin heuristic with a skip bit. Every time an entry is used, set its skip bit to 1. When evicting, go round robin and skip any entries that have the bit set to 1, zeroing each as it's passed. Evict the first entry whose bit is 0.
So far, been putting entire address space of each process in memory (with a large space between the stack and the heap).
Modern OS's use demand paging: leave pages on disk, and only load frames into memory the exact moment they're needed. Downside: every time you address something that's not in RAM, you need to fetch it from disk (a slow process)
Executable and linking format (ELF) files
cs350-objdump can dump them
the load_elf function reads the header, which contains all the information you need to know about a program (code segment, virtual memory address, offset, rodata / read-only data / constants)
50/50 split between user and kernel address space
In OS161/MIPS, a given physical address could be reached from 3 possible spots: kuseg (through the TLB), kseg0, or kseg1
Scheduling is a problem of ordering jobs based on parameters like arrival time, duration/size, and priority. Simple approaches include round robin and shortest job first. Preemption can be used so that jobs don't have to run in one pass; they can be stopped in between. This allows for round robin with a time quantum, or shortest remaining time. In the context of operating systems, jobs are processes, and often processes do not have a known duration. Jobs can also block when doing CPU scheduling.
CFS (Linux's completely fair scheduler) is a weighted scheduling system that tries to balance process runtimes based on weight, which is semantically a process's priority. At any time, the amount of effective CPU time a process gets should be approximately its percentage of the total weight of all jobs (job weight / sum of all weights). We measure this with virtual runtime, which can be calculated as actual runtime * (sum of all weights / job weight), and CFS will attempt to keep virtual runtime equal across all processes.
This is implemented in two possible ways.
What are device registers vs device drivers?
Device drivers are the kernel code that interacts with hardware devices.
Device registers are either status or command, (read/write) and act as an interface for sending and receiving information from a hardware device.
Some CPUs access device registers with special I/O instructions, in and out.
Example timer device registers:
Current time: status register
Restart-on-expiry: command register
Countdown: status + command register
Beep: command register
Data transfer to/from devices is facilitated by a data buffer. Two options for transferring to or from this buffer: program-controlled I/O, where the CPU copies the data itself, or DMA, where the device moves the data to/from memory directly
Went over disks. Old hard drives had spinning platters for large data storage. Disks are divided into tracks and have blocks/sectors on each track. Disks use a mechanical head that seeks across tracks while the disk spins. We want to minimize seek time (time for the head to move across tracks), rotational latency (how fast the disk spins and how much it has to rotate to line up the starting block), and finally transfer time (time taken to spin through the number of sequential blocks). In practice, software can only optimize seek time.
What is the elevator algorithm for disk head scheduling?
Always seek in one direction (either inwards or outwards) until there are no more read requests in the given direction, then go the other direction. In this way, read/write operations are batched, with a maximum latency guarantee of two full end to end seeks, preventing starvation.
Interface:
Persistent data structures:
Directories are a special case of files. They have an i-number and an i-node.
Directories use hard links to the files in them. A hard link is a mapping from a pathname to an i-number. I-numbers are unique; pathnames are not. Once all hard links to a file are deleted, the file is effectively deleted too. Hard links cannot be made to directories, in order to prevent cycles in the file system structure.
Very Simple File System (VSFS) is an implementation of the logical file system. It uses i-nodes to store file metadata. An i-node is a fixed size (256 bytes), containing the file size, number of data blocks, and up to 12 pointers to data blocks (either direct or with some degree of indirection depending on the implementation of the file system; we covered double and triple). VSFS has all the i-nodes at the start of the file system, in addition to the superblock, the i-node bitmap, and the data-block bitmap. Bitmaps are a simple block of binary data recording which data/i-node locations are in use vs. free, giving fast (constant-time) lookup of free i-nodes or data blocks. Finally, the superblock holds file-system-wide information like i-node capacity and the location of the i-node table.
A file system might observe power failure or unexpected shutdown, and must be tolerant to such failure. Two options to do so are: