 Languages and concurrent programs

Zac Kinkaid -- U. Toronto

Notes legibility estimate: LOW

Automatic analysis of algorithms

We'd like to know things like "What numeric types are used here? Are these array accesses in-bounds?"

Today: Proving the absence of faults in multi-threaded programs.
Multi-threaded programs are a great target for automated analysis, since they're so notoriously difficult for humans to reason about.

History:

• Floyd -- program invariants
• Ashcroft, Manna -- extended Floyd to multithreaded programs by treating a multiprogram as a nondeterministic single-threaded program -- difficulty: doesn't scale in #threads
• ??, Reese -- (for data-independent threads) Prove each thread correct individually; then check that the reasoning doesn't interfere across threads -- difficulty extending to data-dependent threads, which requires some cleverness, which is difficult to automate

in common: attempts to reason about multi-threaded code using (extensions of) sequential reasoning

The big problem

Given (program, property), does property(program) hold?

Problem!

Turing: Halting indeterminacy
Rice's Theorem: Every nontrivial property is undecideable

Solution

Allow either one-sided error or indeterminacy/divergence

Example

x = 2; y = 1
while(y != 0):
y = randInt() mod x
x = y + 2

Consider let the set of reachable states be points on the integral lattice; while we can't compute anything meaningful on all of them, we can reason about geometric supersets of them (here, the set $y-x\leq1$)

Invariants from Constraints

Write a constraint system defined by program structure; prove by induction that the invariants continue to hold

Again, this extends badly

Data-flow graphs

Draw some sequential graphs side-by-side; add additional cross-thread dataflow links (aware of active synchronization). Think: When you're imposing synchronization on your programs, you're really pruning unwanted dataflow.

Multiple-threading: Need to add data-flow links within threads (representing cross-instance interaction), but don't need extra instances. Again, aware of synchronization.

Method: Just assume you already have the invariants, then:

-->> [Interference Analysis] - (data-flow edges) -> [Invariant Generation] - ("invariants") -->> {repeat}

...once (if?) the process converges, the "invariants" you have are actual invariants. Think of it as "induction on # context switches" -- so it terminates finitely, since there's an upper limit on # data-flow edges.

Q: Why don't you just add all the possible data-flow edges, and run it in one pass? A: Given inductive invariants, some edges may not actually be material, so an $n$-pass algorithm can give you a more precise system of invariants.

Live-Fire

(In many instances, DUET can prove more assertions in less time than conventional benchmarking.)

Q: Do you have a sense of what things DUET has trouble with? A: The fact that we use simple pointer analysis to resolve memory queries gives us a lot of imprecision.

iDFGs

Problem: You lose information about your program as soon as you assume it's a data-flow graph.

The Question: Can you find explicit parallelism in a complete proof system?

Godel: Nope!

Revised Question: Can you find explicit parallelism in a proof system complete wrt Floyd proofs?

Inductive Data Flow Graph -- analogously to data-flow graphs, but the links are hypotheses which satisfice proof obligations.

How do you think about programs?

• A single nondeterministic control-flow graph
• An infinite series of traces

So: Let's look at an iDFG as a language recognizer. Every iDFG edge enforces an ordering constraint on logic nodes; so if every ordering constraint is satisfied, then the trace is recognized by the iDFG.

Thm: If there is a Floyd proof of the correctness of \$P\$ (on the full cartesian-product nondet-sequential-graph), then there is an iDFG proof, with size polynomial in the data complexity of \$P\$.

What's Next?

What do you do when you find a bug?

• Report it to the user.
• Suggest a program fix? Say...inserting missing synchronization?

iDFGs are a great platform for this, since they're structurally separate from the program themselves -- can represent a subset of program traces; can represent a superset of program traces. (Can suggest allowable parallelizations that don't add additional strange behavior.)

We program at such a low level -- we think about red-black trees, not in terms of abstract sets. How do we reason about high-level program behavior?

• Satisfiability-modulo-theories solvers -- a very general platform for doing automated reasoning
• Compositional program analysis

Q: Are there non-obvious challenges in extending an static-analysis solver to deal with the situation where one of your agents exhibits potentially unknown behavior, for example, attempting to prove an operating system kernel secure versus the space of possible user programs? A: Well, nondeterministic actions are supported, but [something called reflection] is a hard problem in even sequential static analysis, and probably an even harder one in parallel programs.