Icosian Reflections

…a tendency to systematize and a keen sense

that we live in a broken world.

[CS161] On Scheduling

This is a very technical post, largely for the benefit of the students of CS161: Operating Systems, for which I am a Teaching Fellow this semester. It may be useful to you if you're interested in operating systems for some reason, but if you're not in a CS mood today, maybe just move along.

Why Do We Schedule, Master Bruce?

A scheduler, as you know, is responsible for determining which threads run, for how long, and in what order. As much as possible, it should give the shared illusion that each process is running constantly to completion, using the entire processor. To this end, there are three major desiderata:

  • That interactive threads (in particular, user-interactive threads) are responsive.
  • That no process starves.
  • That the system, on average, runs quickly.

These high-level desiderata factor into the low-level conditions that:

  • Threads which block expecting a response are rescheduled promptly after waking.
  • Time is allocated more-or-less fairly, subject to:
  • Processes closer to completion are prioritized (recall that shortest-time-to-completion-first is provably optimal in total average time)
  • ...but in any case, do not starve even the long-running processes too much (exponentially is a good benchmark -- since that, analytically, places a finite cap on the total runlength of a process).

Many of the designs I've seen from the class in A2 design documents fail in at least one of these respects, so I think it may be useful to go over common design failures and good designs.

indicates a potential issue of which you should be cautious. Check that your system doesn't have this as an issue!


Timeslicing and Timedicing

Not every thread blocks voluntarily; after running a thread for a certain period of time, you will need to deschedule it and let someone else run. Very often, we make the decision about the maximum timeslice a thread will be allowed when we schedule it. (Every design I've seen so far does this, at least.)

However, beware: It is not always the case that the thread wants to (or is capable of) using the entire slice given to it. If you consistently charge a thread for using the entire timeslice when it only uses a fraction of it, you are not allocating things very fairly. This is a failure of round-robin schedulers using constant-size timeslices.

You can remedy this in at least two ways:

  • Don't charge the thread the entire slice, by crediting it the remainder, probably as an opportunity to be scheduled sooner than "back of the queue". Just giving it more time next time probably doesn't work, since threads which block a lot tend to keep blocking a lot, and the banked extra time doesn't really help. This also has potential issues if you aren't aware of how much remainder you're crediting, and simply give a constant amount of credit to everyone if they manage to block before their slice finishes.
  • Do charge the entire slice, whatever, but next time, give it a shorter (more appropriate) slice. Compensate by making it run more often, so that it doesn't get throttled just for being blocky. Don't just shorten the slice, since that only limits the amount of time it gets to run.

Queue Like a Brit

A common way to do the bookkeeping of "shorter timeslice, scheduled more often" is by using multi-level feedback queues where threads in lower-numbered queues are scheduled more often, but are given shorter timeslices. The simplest re-queueing algorithm is to simply enqueue a descheduled thread up or down a level depending on whether it blocked or overran, in the hopes that it'll either reach equilibrium, the top, or the bottom.

One useful MLFQ thing is to always insert new threads at the top (fast) end -- the downside of accidentally putting a runny thread at the blocky end is much less than the downwide of putting a blocky thread at the runny end, or even the middle. (Why?)

However, the naive algorithm (just pop threads off the queues in fixed proportions) for doing this has problems -- imagine if you have \(N\) runny blocky threads, and one runny one: the blocky threads will, on average, run for \(\frac1{2N}\) of the time, and the runny thread will get the other \(\frac12\) to itself. This disadvantages threads in overpopulated queues both in timeshare, and (for blocky threads) in responsiveness.

You can design around this by (1) attempting to ensure that queues do not become overpopulated (2) ensuring that the only queues that become overpopulated are those which contain threads you're fine with disadvantaging (say...long-running and runny) or (3) popping more frequently off more-populated queues.

Are We Done Yet?

Recall that the provably optimal solution for minimizing total runtime in the non-streaming, non-blocking case is to run to completion the process which will take the shortest amount of time, followed by the next-shortest, and so on. Now, this result does not hold if processes need to block for intervals, or new jobs enter as you're running, and says nothing about responsiveness, but it's still important to remember. In 161, we would like to see you schedule intelligently w/r/t estimated-time-to-completion, running threads you expect to exit soon for more time on average than threads you do not expect to exit soon.

It is a common assumption that threads that have run for a long time so far can be expected to run for even longer (and so have a longer estimated time-to-completion than threads that are short-run-so-far). This assumption has some interesting implications for the distribution over runlengths of jobs on your system, which you should discuss on the Piazza post I made about it. But for now, it's okay to take it as valid.

Nevertheless, it is not okay to always run shorter-run(-so-far) threads before any longer-run ones. Imagine, for example, the case of the Tortise and the Rabbits -- the Tortise is long-running and never forks new processes; the Rabbits are short-running individually, but spawn new Rabbits fairly rapidly. So the Tortise will never run when it gets older than the average Rabbit lifespan, and in particularly egregious cases, Rabbits may get swamped even by their own children!

This problem presents itself even in softer prioritize-the-young cases -- in particular, if long-running threads are deprioritized exponentially in age, there is a maximum runlength beyond which they may never be able to pass, if there is a constant buzz of short-running threads also happening. This is a problem, even in the randomized case.


This is not a comprehensive list, but it touches on at least three problems I actually saw in designs submitted in A2 DesDocs.