Selected exam questions for system construction

Some exam questions for System Construction

1. Assume you use the Raspberry Pi as a computer to monitor a flight. You have a good basic runtime system such as Minos in place, but you don't know about the quality of the code written for the different real-time tasks. In particular, you want to be protected against getting stuck in an infinite loop. (a) Is this at all a problem in Minos? (b) How would you protect yourself?

In Minos, the assumption is made that all tasks run to completion within their given time frame. If there was an infinite loop, that would not be the case. Effectively the task's stack would get obliterated on the next context switch. That would at least get the system unstuck but it might end up in an unwanted state.

To protect myself I could try to formally verify the code to ensure all loops terminate.

Better: I could install a watchdog that periodically checks whether the realtime tasks are still doing something useful.

2. Suppose you had, say, an encryption chip on your system supporting SPI. Assume you had to read a 32-bit secret via SPI (sending a challenge → response). Explain on the lowest level what happens (draw waveform).

Ah yes, encryption chips, my favourite!

The encryption chip would be the SPI slave in this scenario and my system the master. The master selects the chip on the SS wire and starts toggling the clock on the SCLK wire. The chip then starts sending on the MISO wire.

SS    ¯¯¯¯____________________________¯¯¯¯¯ Switches in the middle of a clock cycle
SCLK  ______¯_¯_¯_¯_¯_¯_¯_¯_¯_¯_¯_¯______ Phase: Sample on rising or falling edge of clock? here: rising || Polarity: Is the idle state high or low? here: low
MISO  _____________¯¯__¯¯__¯¯__¯¯__¯¯______ Data sent is (0 0 0) 0 1 0 1 0 1 0 1 0 1
MOSI  _____¯¯__¯¯__________________________ Data sent is 1 0 1 (0 0 0 0 0 0 0 0 0 0)

3. Properties of the ARM instruction set [regularity, 3 operands, conditional execution]?

  • It uses 32-bit instructions, many of which execute in one cycle
  • Its instructions have 3 operands
  • It uses a load/store architecture, no memory operands
  • It has index optimized instructions, e.g. store multiple and load multiple
  • All instructions can be executed conditionally using predication

4. Explain the module loading concept of Oberon. What happens when a command like A.B gets executed?

The list of loaded modules is checked for the presence of A. If it is there, procedure B in A is executed right away.

If it is not there, module A has to be loaded first. A's object file is read and the code is loaded. If A imports further modules, these have to go through the same process: The list of loaded modules is checked and modules are transitively imported. In this case, the reference counters of already loaded modules have to be increased (reference counter = number of loaded modules which import this module)

Once the code of A is loaded and all dependencies are loaded, all procedure calls from A to its dependencies have to be fixed up. Because the exact location of imported modules is only known at runtime, the locations of procedure entry points have to be patched into A's code at this point. A has a list of the positions of procedure calls within its own code; these are now updated with the absolute position of the imported module's procedures.

After the module is finally loaded, its body is executed. After that, finally, procedure B is executed as well.
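
A rough sketch of this loading logic in C; Module, FindLoaded, ReadObjectFile, FixupImports and CallBody are invented names standing in for the real loader, not Oberon's actual API:

/* Hypothetical sketch of recursive module loading (names invented for illustration). */
typedef struct Module Module;
struct Module {
    const char *name;
    int refcount;                  /* number of loaded modules importing this one */
    const char *importNames[16];   /* names of imported modules, from the object file */
    Module *imports[16];           /* resolved dependencies */
    int nimports;
    /* code, data, entry point table ... */
};

Module *FindLoaded(const char *name);       /* search the list of loaded modules */
Module *ReadObjectFile(const char *name);   /* read code, data, import names */
void    FixupImports(Module *m);            /* patch procedure addresses into m's code */
void    CallBody(Module *m);                /* run the module's initialization body */

Module *Load(const char *name) {
    Module *m = FindLoaded(name);
    if (m != NULL) {                        /* already loaded: just bump the reference count */
        m->refcount++;
        return m;
    }
    m = ReadObjectFile(name);               /* load code and data */
    for (int i = 0; i < m->nimports; i++)   /* transitively load dependencies */
        m->imports[i] = Load(m->importNames[i]);
    FixupImports(m);                        /* addresses are now known, so patch the calls */
    CallBody(m);                            /* execute the module body once */
    return m;
}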

5. Assume modules A,B and C where C and B both import A. How do you make sure that when A is changed, inconsistent code in B and C is not executed? What if you want to make this more finegrained: inconsistencies only in affected portions of the code? What goes to the symbol file and what goes to the object file?

Whenever A is compiled, a fingerprint of A's interface is calculated. This fingerprint changes whenever e.g. exported procedure signatures in A change. Whenever B or C are compiled, they store the fingerprint of A as they saw it at compile time. At run time, when the import is resolved, the fingerprint B and C have stored is compared to the actual fingerprint of A. If it doesn't match, that indicates the interface of A is different from what B or C would expect, hence the code is inconsistent.

To make it more finegrained, one could store multiple fingerprints, e.g. one per exported procedure. In that case, only those procedures which actually changed would fail to execute, the rest could still be executed safely.

The symbol file contains all exported procedure signatures and data structures. The object file contains all code and data as well as the fingerprints.
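
A tiny sketch of the load-time check under these assumptions (Fingerprint, Import and the field names are invented for illustration):

/* Hypothetical load-time consistency check. */
typedef unsigned int Fingerprint;

typedef struct {
    const char *name;
    Fingerprint interfaceFP;   /* fingerprint of this module's own interface */
} Module;

typedef struct {
    const char *name;          /* name of the imported module, e.g. "A" */
    Fingerprint expectedFP;    /* A's fingerprint as seen when B or C was compiled */
} Import;

int CheckImport(const Import *imp, const Module *loaded) {
    /* If the fingerprints differ, A's interface changed after B/C was compiled:
       loading is refused instead of executing inconsistent code. */
    return imp->expectedFP == loaded->interfaceFP;
}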

6. Single processor system (e.g. Minos). Sketch the classical, simple memory layout (heap from bottom, stack from top). Why is there an unmapped page between top and bottom? (stack overflow protection). What is the difference between stack and heap memory in that context? (heap memory allocation is explicit, while stack memory is usually allocated implicitly).

Virtual memory:

  • Top: RAM disk
  • Stack: grows from the top, implicitly allocated
  • One unmapped page - required for overflow protection. If the heap or stack overflows, a page fault occurs here.
  • Heap: grows from the bottom, explicitly allocated and de-allocated
  • Kernel image
  • first 1 MB (unmapped to trap null pointers)

7. Multi processor system (e.g. A2). Simple memory layout? Stacks? How many processes possible?

We need multiple active stacks at the same time. They are allocated page by page on the page heap using virtual addressing. The number of processes is practically bounded by how much memory is available.

8. Briefly sketch the life cycles of activities in A2. Where are the synchronous / asynchronous context switches?

The states are Ready, Running, Awaiting Condition, Awaiting Lock and Terminated. Activities start in Ready, when they get scheduled they transition to Running. When they try to acquire a lock they transition to Awaiting Lock. When they use the await command they transition to Awaiting Condition. When they acquire the lock or the condition gets fulfilled respectively, they go back to Ready. When the activity completes, it transitions to Terminated.

Synchronous context switches happen when a process transitions to Awaiting Condition or Awaiting Lock. Since it needs to wait anyway, it yields execution. Asynchronous context switches are caused by interrupts and can happen anytime, they don't show up in the lifecycle per se.

9. Single processor, preemptive system, regular tasks with priorities. Single stack?

It's possible if all tasks can run to completion within their given time frame.

10. How can a multiprocessor system be started?

First, only one processor starts and sets up the boot program at a fixed address. It then sends a startup command to the other processors via APIC, which then all run the boot program. This program sets up the correct runtime environment (32bit), initializes registers, MMU, interrupt handling etc. and finally sets the started flag and sets up the scheduler.

The first processor (bootprocessor) then proceeds to boot the console.

11. The Oberon System presented in the third part of the course (Paul Reed's lectures) provides a message based system for the GUI. Messages are identified by a dynamic type test system (dynamic module loading!). This requires fast type tests. How can this be implemented?

Each instance of a type has a pointer to its type descriptor. That type descriptor contains the type names of the type itself and its parent and grandparent type. To perform a type test, the object's type descriptor is accessed and all types therein are compared one-by-one to the desired type. This is fast, as no complex address resolution is required, however it does have limitations (cannot check arbitrarily deep hierarchies)
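
A minimal C sketch of such a fixed-depth type test; the descriptor layout and names are invented, but the idea is the table of ancestor descriptors described above:

/* Sketch of a fast type test with a fixed-depth ancestor table. */
#define MAX_EXT 3   /* limited extension depth, as described above */

typedef struct TypeDesc TypeDesc;
struct TypeDesc {
    const TypeDesc *ancestors[MAX_EXT];  /* the type itself, its parent, its grandparent */
};

typedef struct {
    const TypeDesc *td;   /* type tag: pointer to the type descriptor */
    /* ... object fields ... */
} Object;

/* "obj IS T": compare the descriptor's entries against T's descriptor. */
int IsInstanceOf(const Object *obj, const TypeDesc *t) {
    for (int i = 0; i < MAX_EXT; i++)
        if (obj->td->ancestors[i] == t) return 1;
    return 0;
}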

12. Properties of the ARM architecture particularly suitable for operating systems? (operating modes with shadowed registers). What is a FIQ / Fast Interrupt on ARM?

A fast interrupt is a prioritized interrupt that can occur while a regular interrupt is being handled - that's what the specification says.

Fast interrupts provide dedicated banked registers (r8-r14, where r14 is used for the return address). If the interrupt handler code can make do with only these banked registers (no other registers, no stack), then it is not necessary to save or restore any registers explicitly when entering and leaving the interrupt, making interrupt handling faster.

13. Briefly explain what SPI is. Explain how the communication works.

SPI (Serial Peripheral Interface) is a synchronous full duplex communication interface with a small range. It supports one master and multiple slaves (with slave select). There are separate wires for both directions of communication (MISO and MOSI), one wire for the Slave Select and one wire for the clock signal, which is generated by the master.

The master first selects the slave by signalling on the SS wire. Then it starts toggling the clock. Both chips can then transmit data by setting the respective wire high or low for a full clock cycle per bit. The bit is sampled on the edge of the clock (rising or falling, depending on phase).

14. Explain the Active Cells Compute model. What is hybrid compilation in that context?

In Active Cells, computation is defined as a network of cells, which are independent specialized computation units that can be defined in software or hardware. There is no direct resource sharing between cells, but they have defined inputs and outputs which are connected to each other in a network. Cells are kept simple, and there is no operating system. It is easy to create bespoke systems.

The whole system is described in a high level language, which is compiled down to a hardware description. The cells correspond to TRMs and the networks are the wirings between them. The code of the cells can be compiled down to instruction code; only the network (and hardware implementation modules) needs to go all the way down to HDL. When the implementation of a cell changes, only the corresponding instruction code needs to be updated, and that can be patched into the final bitstream without requiring a full synthesis. Changes to the network still require synthesis, but they happen more rarely.

15. What is GPIO? How can it be used together with built-in I/O drivers on a typical ARM platform?

GPIO (General purpose I/O) is a set of pins which can either act as digital inputs or outputs, to be defined by the user at run time. On a typical platform, drivers are available for dedicated modes (e.g. SPI mode). It is defined which pins would serve which function in which mode, and the mode is selected by setting specific registers.

16. What does it mean to activate a command in an environment with dynamic (module) loading, what happens when a command is activated? What kind of information is necessary during loading and what kind of information for module compilation? How to check consistency at load time? (fine- and coarse grained fingerprinting)

The command is part of a module. It is executed directly if that module is already loaded; otherwise, the module is loaded first. To load a module, we need the module's code and data as well as the same information for all its dependencies. For compilation, the code of the module is required (in a HLL) as well as the symbol files of its dependencies, which contain only the dependency module's interface information (procedure signatures etc.) To check consistency at load time, we use fingerprints of the module's interfaces, which are stored in the module's object file as well as in each importing module's object file.

17. Why does a procedure that shall be used for an interrupt have to be flagged as interrupt procedure? On Arm an additional parameter has to be supplied, why? (different meaning of the return link)

Interrupt procedures have a different calling convention - when they return, they cannot simply use a branch and link instruction, but they also need to restore the processor state register. The interrupt procedure also needs to be installed in the corresponding interrupt vector.

The parameter is whether or not to use a fast interrupt (FIQ). In fast interrupts, a fresh set of shadow registers is used, alleviating the need for storing the old register values. That way, the interrupt handler is executed faster, although some restrictions apply (there mustn't be any local variables in memory)

18. What are interprocessor interrupts required for?

When one processor requires another processor to do some action, e.g. flushing the MMU's caches (TLB) after a change in memory mappings, or shutting down the system.

19. How do you avoid race conditions (data races) on commonly used memory resource (such as a global variable)? On a single processor system? On a multiprocessor system?

On a single processor system, an easy way is to disable context switching for a brief time period, that way it is guaranteed that no other process interferes with accessing the resource.

On multiprocessor systems, instructions such as CAS can be used which atomically set a value based on a condition. Because the operation is atomic, no other process can interfere.
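
As a sketch of the multiprocessor case in C11, where atomic_compare_exchange_weak plays the role of CAS for updating a shared global counter:

#include <stdatomic.h>

atomic_int counter = 0;   /* the shared global variable */

void increment(void) {
    int old = atomic_load(&counter);
    /* retry until no other processor changed the value between our load and the CAS */
    while (!atomic_compare_exchange_weak(&counter, &old, old + 1)) {
        /* on failure, old has been updated to the current value; just try again */
    }
}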

20. Sketch the programming model of A2. Semantics?

Instead of using objects, locks, and threads as separate concepts, A2 knows "Active Objects", where each object manages its own thread and serves as a monitor for locking.

21. Explain the difference between synchronous and asynchronous context switches. How can we make use of the information?

In a synchronous CS, the process currently running yields execution (either explicitly through a yield command or implicitly by waiting for a lock/condition). In an asynchronous CS, the process currently running does not anticipate the switch as it is caused by an external event (interrupt).

In a synchronous CS, the process is aware that it is going to be preempted, and thus we can get away with not storing all of its state. The process can restore the state on its own, because it knows that it has to do so. We only need to store its stack pointer and instruction pointer. In an async CS, we have to store the entire state (all registers); the process will not be able to resume otherwise.

22. Sketch the life cycles of activities in A2.

The states are Ready, Running, Awaiting Condition, Awaiting Lock and Terminated. Activities start in Ready, when they get scheduled they transition to Running. When they try to acquire a lock they transition to Awaiting Lock. When they use the await command they transition to Awaiting Condition. When they acquire the lock or the condition gets fulfilled respectively, they go back to Ready. When the activity completes, it transitions to Terminated.

23. Classical simple memory layout. How to implement dynamics using the MMU? Difference between Heap and Stack (w.r.t. the visibility of the operations)?

The stack grows from top, the heap grows from bottom. Each process has its own stack in virtual memory. Whenever a page fault occurs, a new page is allocated for that process, i.e. pages are implicitly allocated and the process is unaware. To allocate heap space, the process has to do it on its own, so it will be aware of the new allocation.

24. How do you implement a spinlock using CAS? Semantics of CAS?

while CAS(lock, 0, 1) != 0 {}  // try to change lock from 0 (free) to 1 (held); spin until it was free
// lock acquired
lock = 0
// lock released

CAS takes three arguments - a variable (memory location), the expected old value, and the new value. Only if the actual value of the memory location equals the expected old value is it atomically replaced with the new value. The instruction returns the actual old value, so the caller can tell whether the swap happened.
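
The same spinlock written against C11 atomics, as a sketch (not the course's Active Oberon code):

#include <stdatomic.h>

atomic_int lock = 0;   /* 0 = free, 1 = held */

void acquire(void) {
    int expected;
    do {
        expected = 0;          /* we can only take the lock if it is currently free */
    } while (!atomic_compare_exchange_weak(&lock, &expected, 1));
}

void release(void) {
    atomic_store(&lock, 0);
}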

25. Single processor system. How to implement mutual exclusion? [switch off interrupts]

Yeah, well, switch off interrupts: with interrupts disabled on a single processor, no context switch can happen, so the critical section runs to completion without interference.

26. Multi processor system. How to implement mutual exclusion (low level)?

Use locks built on the atomic CAS instruction (see question 24).

27. Explain the difference between the execution model of Java and that of Active Oberon on monitors.

Java monitors use Signal-and-continue: The signalling process continues execution after signalling. The process which got signalled enters the monitor's entry queue. However, it is possible for different processes to get scheduled between the signalling and signalled process.

Active Objects use Signal-and-exit: The signalling process executes to completion after signalling (which, in A2, is done by fulfilling a condition or releasing a lock). The process which got signalled gets scheduled immediately after; it has priority over other processes.

28. When a process is de-scheduled by the timer handler, the full state is somewhere on the stack. Is it enough to store SP and FP or does the state have to be stored somewhere?

Since the process got preempted, we cannot be exactly sure which subset of registers etc. it currently uses, so we have to store the entire state somewhere.

29. Explain the priority scheduling of A2. How do you make sure that a new process with higher priority than a running thread will actually be executed immediately?

Each object has queues for activities which are awaiting a condition of, or a lock currently held by, this object. When the running activity gets de-scheduled, these queues are checked first, and activities therein have priority over those in the global ready queue.

30. Do you remember why the first page of our system was unmapped? Pitfall? (large arrays)

The first page is unmapped to trap null pointers - dereferencing them causes a page fault. Pitfall: with a large array or record, an access through a nil pointer plus a large offset can land beyond the unmapped page and will not be caught.

More advanced questions

31. What is the ABA problem? How can it be solved? How does thread local storage help?

The ABA problem occurs when a thread observes a part of a global state and then assumes the entire state is unchanged if that partial observation remains unchanged. An example with a stack could be: Thread A wants to push an element. It first looks at the head, then sets the new element's next to the head and finally sets the head pointer to the new element using CAS. If, in the meantime, other threads pop the head, push something else, and then push the same head again, the CAS will succeed even though the stack has been modified. This can lead to inconsistencies in the data structure and corruption.

The ABA problem can be solved e.g. with pointer tagging or hazard pointers. Pointer tagging is not a real solution, it simply makes the issue less likely. Hazard pointers involve each thread publishing the addresses it is currently accessing in a "hazard list", which every thread has to check before reusing or freeing a node. The issue is that this list requires one entry per thread, of which there could be arbitrarily many. The hazard pointers could be placed in thread local storage.

It is also possible to maintain a node pool in thread local storage. Nodes can be re-used, but each thread has its own pool, so they will not interfere with each other.
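
A minimal sketch of pointer tagging in C11, which, as said above, only makes ABA less likely: a version counter is packed next to the value so that the CAS checks both. The layout and names are invented for illustration.

#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    uint32_t index;   /* which node is the current head (index into a node array) */
    uint32_t tag;     /* incremented on every update; makes A -> B -> A visible */
} Head;

_Atomic uint64_t head;   /* both fields live in one atomically updatable word */

static uint64_t pack(Head h)       { return ((uint64_t)h.tag << 32) | h.index; }
static Head     unpack(uint64_t w) { return (Head){ (uint32_t)w, (uint32_t)(w >> 32) }; }

int try_replace_head(uint32_t newIndex) {
    uint64_t old = atomic_load(&head);
    Head h = unpack(old);
    Head n = { newIndex, h.tag + 1 };   /* bump the tag on every successful update */
    return atomic_compare_exchange_strong(&head, &old, pack(n));
}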

32. Multi processor system, memory layout. How would you implement stack growing facilities in software? (without the MMU)

Use heap blocks with information at the end indicating which block comes next: whenever the stack needs to be expanded, an ExpandStack function is called which manages the jump to a different memory location (a new heap block) for whatever function comes next.

33. What is the difference between preemptive and cooperative multitasking. What are the advantages and disadvantages of cooperative multitasking? What is implicit cooperative multitasking and how does it help for lockfree programming?

In cooperative multitasking, processes yield execution from time to time to allow others to execute. In preemptive, processes don't need to yield, but get preempted through interrupts. Cooperative multitasking allows for easy processor-local exclusion by simply not yielding, and the context switch itself is simpler because not all process state needs to be stored. However, one needs to assume all processes are actually cooperative; if arbitrary code from unverified sources is run, that might not be the case.

In implicit cooperative multitasking, the compiler automatically inserts yield code periodically, so the programmer doesn't need to worry about that. It is possible to declare an uncooperative block, in which no yields get inserted, which allows for simple processor-local mutual exclusion.

34. Lock free scheduler: we are using instances of lock-free queues for the ready list. Do you know the problem that can appear when a thread gets scheduled with: get new process from queue, put currently running process to queue, task switch?

It is possible for some other processor to dequeue the task that was just enqueued - but has not actually been suspended yet - and start running it while it is still executing on the first processor's stack.

Processor 1                                 Processor 2

current := GetCurrentTask() # Task A
next := Dequeue(readyqueue) # Task B
Enqueue(current, readyqueue)
                                            current := GetCurrentTask() # Task C
                                            Enqueue(current, readyqueue)
                                            next := Dequeue(readyqueue) # Task A
SwitchTo(next) begins # runs on stack of
                      # Task A
                                            SwitchTo(next) # switches to Task A
SwitchTo(next) ends

Here, two processors run on the stack of Task A, which will likely corrupt the stack!

35. Given a four character 7-segment display hardware on some FPGA board with the following hardware implementation (in order to save bits) [draw display + multiplexer]. You want to output numbers. Using the active cells model, what would you suggest to implement in software / in HDL?

Without the drawing I can't really answer this...

36. Assume you have a programming language that is claimed to be safe. What kind of functionality has to be supported in OS development that is unsafe (get and set values in address, set and get special registers like processor status register, special data type for low-level memory access (BYTE) )

  • Setting and getting values at specific addresses
  • Modifying special registers such as processor state
  • Pointers.

37. What does an object file in a system with dynamic loading consist of (dependency information: imports / exports, entries to be fixuped, code and data)

Code, data, imports, exports, fixups.

38. What is a fixup?

The location of a reference to an external procedure or type inside a module's code.

In dynamic module loading, the exact location of procedures in memory is unknown at compile time and has to be inferred at load time. For that, the importing module contains a list of fixups which, at load time, need to be replaced by the actual addresses of these procedures.
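
A hypothetical sketch in C of what applying such a fixup list could look like; the structures and field names are invented, not the actual Oberon object-file format:

#include <stdint.h>

typedef struct {
    uint32_t offset;      /* position of the call/reference inside the module's code */
    int moduleIndex;      /* which imported module it refers to */
    int entryIndex;       /* which exported procedure of that module */
} Fixup;

typedef struct Module {
    uint8_t  *code;
    uint32_t *entries;        /* addresses of this module's exported procedures */
    struct Module **imports;
    Fixup *fixups;
    int nfixups;
} Module;

void ApplyFixups(Module *m) {
    for (int i = 0; i < m->nfixups; i++) {
        Fixup *f = &m->fixups[i];
        Module *dep = m->imports[f->moduleIndex];
        uint32_t addr = dep->entries[f->entryIndex];   /* now known, at load time */
        *(uint32_t *)(m->code + f->offset) = addr;     /* patch the address into the code */
    }
}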

39. How can a dynamic stack be implemented? (On a system with virtual memory: unmapped pages / page fault handler, system without virtual memory: stack allocation by compiler)

40. Describe in more detail how the stack expansion and reduction would be implemented with virtual memory / with heap blocks. Elaborate the data-structures on the stack. Think about parameter passing. (ideas only, typical question to be elaborated during exam).

Virtual memory: When a page fault occurs on a stack access, allocate a new page and retry the instruction. When a process exits, de-allocate all of its stack pages.

Heap blocks: The process stores the current "stack limit", once it gets close to that, a stack expansion call is made which allocates and "jumps" to a different heap block. When execution returns from a heap block, that block can be de-allocated.
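
A rough sketch of the virtual-memory variant in C; address_in_stack_region, map_new_page and kill_current_process are hypothetical helpers standing in for the real MMU and process code:

#include <stdint.h>
#include <stdbool.h>

bool address_in_stack_region(uintptr_t addr);   /* is this inside the process's stack area? */
void map_new_page(uintptr_t page_base);         /* allocate and map a fresh page */
void kill_current_process(void);

#define PAGE_SIZE 4096u

void page_fault_handler(uintptr_t fault_addr) {
    if (address_in_stack_region(fault_addr)) {
        map_new_page(fault_addr & ~(uintptr_t)(PAGE_SIZE - 1));  /* grow the stack */
        /* return from the fault; the faulting instruction is retried */
    } else {
        kill_current_process();   /* a real access violation, e.g. a null pointer */
    }
}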

41. Describe a mechanism to check if a system is still alive / to check if a certain process does something "often enough" (watchdog): check register state in timer interrupt

Watchdog: Have a special register which the process has to periodically write to. The watchdog process is run periodically (e.g. through timer interrupts), checks if the register has been set, and resets it. If the register was not set, the watchdog could for example restart the process in question.
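
A minimal sketch of this in C, assuming a periodic timer interrupt; alive_flag, monitored_task_step and watchdog_timer_interrupt are invented names, and a real system would use a hardware register or per-task flags:

#include <stdbool.h>

volatile bool alive_flag = false;   /* the monitored task sets this regularly */

void monitored_task_step(void) {
    /* ... do the real-time work ... */
    alive_flag = true;              /* "I am still making progress" */
}

void watchdog_timer_interrupt(void) {
    if (!alive_flag) {
        /* the task did not check in since the last tick: recover, e.g. restart it */
        /* restart_task(); */
    }
    alive_flag = false;             /* reset for the next period */
}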

42. Describe what has to be done to implement an Unload functionality in a system with dynamic module loading

Reference counters need to be stored for each loaded module: a count of how many other modules depend on it. When a module is to be unloaded, its reference counter has to be 0. It is then unloaded, and all its dependency modules are checked: their reference counters are decreased by 1, and if they are now 0, that module is unloaded as well, and its dependency modules are checked, etc.
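
A small sketch of the unload logic with reference counters, using invented C structures (the real module structure holds much more):

typedef struct Module Module;
struct Module {
    int refcount;      /* number of loaded modules importing this one */
    Module **imports;
    int nimports;
};

int Unload(Module *m) {
    if (m->refcount != 0) return 0;        /* still imported by someone: refuse */
    /* remove m from the list of loaded modules, free its code and data ... */
    for (int i = 0; i < m->nimports; i++) {
        Module *dep = m->imports[i];
        dep->refcount--;                   /* m no longer imports dep */
        if (dep->refcount == 0)
            Unload(dep);                   /* cascade to now-unused dependencies */
    }
    return 1;
}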

43. Describe a very simple scheduler that supports tasks with (periodic) high-, low- and (non-periodic) background priority. What is a sufficient condition to make this scheduler work? (high- and low-priority tasks have to run to completion within their periods.) Sketch a running scenario and comment on the stack. Is one stack enough?

Tasks are scheduled at intervals. Higher priority tasks preempt lower priority ones, and all tasks can run to completion within their given timeframe. They all share the same stack, which is made possible by that latter assumption.

44. Describe the Serial Peripheral Bus. How does the communication work? SCLK, MOSI, MISO, SS – how do you control this in software? Roughly describe bitbanging implementation.

For each bit (a rough C sketch follows the list):

  • Set MOSI to the current bit
  • Wait half a clock cycle
  • Toggle the clock to its active level and sample MISO
  • Wait half a clock cycle
  • Toggle the clock back to its idle level
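
A rough, compilable sketch of that loop in C; gpio_write, gpio_read, delay_half_cycle and the PIN_* names are hypothetical platform helpers. Slave select is assumed to be asserted before and released after the whole transfer, and the polarity matches the waveform from question 2 (clock idle low, sampling on the rising edge):

#include <stdint.h>

void gpio_write(int pin, int value);   /* hypothetical GPIO helpers */
int  gpio_read(int pin);
void delay_half_cycle(void);

enum { PIN_SCLK, PIN_MOSI, PIN_MISO, PIN_SS };

uint8_t spi_transfer_byte(uint8_t out) {
    uint8_t in = 0;
    for (int bit = 7; bit >= 0; bit--) {          /* MSB first */
        gpio_write(PIN_MOSI, (out >> bit) & 1);   /* put the next bit on MOSI */
        delay_half_cycle();
        gpio_write(PIN_SCLK, 1);                  /* active clock edge */
        in = (uint8_t)((in << 1) | gpio_read(PIN_MISO));   /* sample MISO on that edge */
        delay_half_cycle();
        gpio_write(PIN_SCLK, 0);                  /* back to idle */
    }
    return in;
}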

45. How did the SPI shift registers in the LED display driver allow for multi-digits control?

Say each digit requires 7 bits. If I want to control 2 digits, I send the 2x7 bits for both of them in sequence. The first digit receives 7 bits, and then when the next ones arrive, starts shifting these 7 bits out to the 2nd digit. After all 14 bits, each display has exactly "its" bits in the shift register. This can be daisy chained arbitrarily.

46. What is a memory-mapped I/O register?

A register which can be accessed at a specific memory location even though it does not correspond to actual memory

47. What is an active object (object with activity plus a monitor construct). What is the advantage of implementing active objects in the language (and not as library => peer managed vs. system managed implementation). Semantics of active objects.

Because active objects are part of the language, every object brings its lock/monitor and its activity by default, and the compiler can generate and check the synchronization code (system managed). Implemented only as a library, threads and locks have to be wired up and respected by the programmer (peer managed), which is easier to get wrong.

48. How is a multi-processor system booted? (One dedicated start processor activating the other processors during system startup)

First, only one processor starts and sets up the boot program at a fixed address. It then sends a startup command to the other processors via APIC, which then all run the boot program. This program sets up the correct runtime environment (32bit), initializes registers, MMU, interrupt handling etc. and finally sets the started flag and sets up the scheduler.

The first processor (bootprocessor) then proceeds to boot the console.

49. Show the life cycles of activities in the active object system / a system with monitors. When does which transition take place?

Active Objects: The states are Ready, Running, Awaiting Condition, Awaiting Lock and Terminated. Activities start in Ready, when they get scheduled they transition to Running. When they try to acquire a lock they transition to Awaiting Lock. When they use the await command they transition to Awaiting Condition. When they acquire the lock or the condition gets fulfilled respectively, they go back to Ready. When the activity completes, it transitions to Terminated.

Monitors: The states are new, runnable, waiting, blocked, and terminated. Threads are initially new, and once they're started they become runnable. When they invoke wait(), they become waiting until notified, at which point they go back to runnable. When they try to acquire a lock, they become blocked until the lock is acquired, at which point they go back to runnable. Once finished, they become terminated.

It's essentially the same, isn't it? The monitor model just makes no distinction between ready and running.

50. What kind of data structures support threads / processes in a system with a monitor / the active object system? (running array, ready queue, object queues with conditions and locks). What kind of action takes place during transitions in the process state diagram (ready -> running, ready -> waiting etc.)?

ready -> running: The process gets scheduled on a processor and starts execution

running -> awaiting condition: The process has invoked an await() call and is now waiting for a condition to turn true. It is de-scheduled, its await queues are checked, and a new process is scheduled.

running -> awaiting lock: The process is trying to acquire a lock and is waiting for it to become available. It is de-scheduled, its await queues are checked, and then a new process is scheduled.

running -> terminated: The process has finished execution. Its await queues are checked and a new process is scheduled.

51. How can stack allocation be supported in a system with virtual memory for multiple processes ? Is the stack conceptually unbounded? (no, virtual memory also has to be divided)

Page-wise allocation again: each process gets its own stack region in virtual memory, and pages are mapped on demand via page faults. The stack is not conceptually unbounded, though, since the virtual address space also has to be divided among the processes.

52. In the course we distinguished between synchronous and asynchronous context switches. Why? (performance for synchronous switches) Explain what has to be done for context switch (sync->sync, async->sync, async->async, sync->async)

  • From sync: Store fp, sp, pc

  • From async: Store entire state

  • To sync: Restore fp, sp, pc

  • To async: Restore entire state (copy state)

53. In A2, certain objects provide a monitor. Where is the monitor information (lock queues etc.) stored? (in the header of the object, not in the type descriptor).

Each object has two queues, one for objects awaiting a lock and one for objects awaiting a condition. They're stored in the object header.

54. In a system with (active or passive) objects, objects have to provide certain metadata. Which? (type extension information, method tables, garbage collection flags, lock queues, array size information). Where are the metadata stored? (in type descriptor or in object(header))

  • Type extension information: "Inheritance", stored in type descriptor
  • Method tables: Stored in the type descriptor (one table per type, shared by all instances)
  • Garbage collection flags: Stored in Type descriptor
  • Lock queues: Stored in object header
  • Array size info: Stored with the object itself (in its header), since the length is per instance, not per type

55. What is a type descriptor and what is it used for? How to implement a type check / type guard? Sketch a type descriptor lookup.

Type descriptor contents:

  • Size of allocated object
  • 3 entries for extensions (inheritance)
  • Pointer field offsets for garbage collection

Required for type checks (IS) and for allocating a new object

To implement a type check, the object's type tag (=address of type descriptor) is resolved and the extensions are compared to the reference type.

56. When monitors are implemented using constructs such as "Lock", "Unlock" and "Await", behind the scenes fine-granular locks have to be used. Why and which? (locks on the shared data structures, i.e. lock queues / wait queues)

The implementation relies on shared data structures - each object has a queue for other objects which wait on a lock/condition. Access to these structures needs to be exclusive to prevent race conditions.

57. In A2, waiting conditions are evaluated when a process exits the lock of an object. How can that be accomplished? (Boxing of waiting conditions in a procedure). Special question: what can go wrong? (thread-local memory such as the thread id might be wrong during evaluation of the condition).

The condition evaluation is boxed into a procedure, a reference to which is stored in the await queue. Because this procedure may be evaluated by a different process than the one that registered the condition, thread-local state (such as the thread id) can be wrong during evaluation.

58. Where are the following metadata typically stored: type tag, array descriptor, lock/ await queues?

  • type tag: object header
  • array descriptor: type descriptor (?)
  • queues: object header

59. What is the problem with locks?

Deadlocks can occur: a cycle of processes, each waiting for a lock held by another.

Livelocks can also occur: no process actually holds the lock, but the processes keep reacting to each other (e.g. backing off and retrying) without ever making progress.

Lastly, starvation is possible: a process could potentially wait forever to acquire a lock.

60. What is CAS, semantics. Can you replace it with atomic registers? (special question, not handled in the course)

CAS (Compare and Swap): Atomically update a value iff the assumption about the previous value is correct.

CAS(location, old, new): ATOMIC
    old_value = location.value
    if (location.value == old):
        location.value = new
    return old_value

CAS can not be replaced with atomic registers. Atomic registers have a consensus number of 1, while CAS' consensus number is infinite. (Consensus number = maximum number of threads which can solve the consensus problem using this primitive. As is apparent, atomic registers are useless as soon as we have more than 1 thread.)

61. What is the key idea of cooperative multitasking w.r.t. lock-free algorithms? (uncooperative sections, bounded number of processors). What about interrupts? (virtual processors)

In Coop Multitasking, scheduling can be disabled in specific sections (=uncooperative sections). The process will not get de-scheduled inside these sections, which provides the following guarantee: At most (#processors - 1) other processes run concurrently. Thanks to this, certain lock-free algorithms can be implemented more efficiently, e.g. hazard pointers with one pointer per processor instead of per thread.

To account for interrupts, we add a "virtual processor" which would run the interrupt code. We keep separate hazard pointers for that one, which increases the number of potential concurrent processes by 1, but still provides an upper bound.

62. When you write a scheduler in a lock-free kernel, then the enqueueing and dequeueing of tasks into lists runs while the context switch takes place. What is the potential problem and how do you solve it?

In the naive task switch, the next task is dequeued, then the current (now suspended) task is enqueued, and finally the switch to the next task happens.

It is possible for some other processor to dequeue the task that was just enqueued - but has not actually been suspended yet - and start running it while it is still executing on the first processor's stack.

Processor 1                                 Processor 2

current := GetCurrentTask() # Task A
next := Dequeue(readyqueue) # Task B
Enqueue(current, readyqueue)
                                            current := GetCurrentTask() # Task C
                                            Enqueue(current, readyqueue)
                                            next := Dequeue(readyqueue) # Task A
SwitchTo(next) begins # runs on stack of
                      # Task A
                                            SwitchTo(next) # switches to Task A
SwitchTo(next) ends

Here, two processors run on the stack of Task A, which will likely corrupt the stack!

To mitigate, we need to run the enqueue on the new thread instead of the old one, to prevent any other processor from picking it up before it is really de-scheduled.
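
A sketch of that corrected ordering with invented names; the key point is that Enqueue(old, ...) runs only after the stack switch:

typedef struct Task Task;
typedef struct Queue Queue;

Task *GetCurrentTask(void);
Task *Dequeue(Queue *q);
void  Enqueue(Task *t, Queue *q);
void  SwitchTo(Task *next, Task *old);   /* switches stacks, then calls AfterSwitch(old) */

extern Queue *readyqueue;

/* first code executed on the new task's stack after the switch: */
void AfterSwitch(Task *old) {
    Enqueue(old, readyqueue);    /* only now may another processor pick up 'old' */
}

void Reschedule(void) {
    Task *old  = GetCurrentTask();
    Task *next = Dequeue(readyqueue);
    SwitchTo(next, old);         /* the enqueue of 'old' is deferred until after the switch */
}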

63. What types of parallelism may be exploited using the dataflow approach? (task parallelism, stream parallelism / pipelines and data parallelism)

Task parallelism: Independent tasks can be executed concurrently.

Data parallelism: The same task can be executed on different data concurrently.

Stream parallelism: Multiple tasks in sequence can be executed concurrently in a pipeline.

64. Problems with current commercial multicore processor architecture? Memory Bottleneck, Synchronisation overhead, Cache invalidation

  • Memory bottleneck: Access to shared memory is tricky and slow, and all processors compete for it.
  • Synchronisation overhead: Coordinating access to shared data costs time on every core involved.
  • Caches: For shared memory, if one processor writes to it, the corresponding cache lines of all other processors need to be invalidated, which causes nasty overheads.

65. What is the advantage of using 18-bits for each instruction on a TRM? Disadvantage?

Higher code density, lower decoding overhead - the FPGA's block RAMs are 18 bits wide, so an 18-bit instruction fills exactly one memory word.

Jumps cannot be encoded in a single instruction (addresses are too large)

66. What is the advantage of dynamically building hardware for dataflow programs? (less congestion, less energy consumption)

  • No global memory (no congestion)
  • No interrupts - no OS
  • Easy to integrate dedicated hardware
  • Low power consumption

67. If you were to write a hybrid compiler to generate and deploy cell networks on FPGAs, how would you implement it? Where runs what?

We can separate the actual component implementation and the hardware specification. The high level implementation is compiled to an HDL (e.g. Verilog) which is then synthesized into a bitstream using platform-specific tools and settings.

The cell network has to be synthesized. However, the cells can be implemented as TRMs and the code running within them can be compiled to some machine code, which can be patched into the final bitstream for flashing. This alleviates the need for a full synthesis when only the cell code changes.

68. What kind of signals do you minimally need in a system of cooperating processes (Active Cells) for point-to-point communication? Valid, ready, data in / out + select for multiplexing network.

The sender provides the following signals:

  • Valid (output signal): set when valid data can be read
  • Select
  • Data out (output signal): the actual data

The receiver provides the following:

  • Ready (output signal): set when receiver is ready to process data
  • Select
  • Data In (input signal): where the actual data is read