Interrupts in OS code: let's reason about them. Yes, this means concurrency.
Presented as part of the 2016 HCSS conference.
ABSTRACT
Existing modelled and veried operating systems (OS's) typically run on uniprocessor platforms and run with interrupts mostly disabled. This makes formal reasoning more tractable: execution is mostly sequential.
The eChronos OS is a real-time OS used in tightly constrained devices such as medical implants, running on (uniprocessor) embedded micro-controllers with no memory-protection support. It is used in the DARPA-funded HACMS program, where it runs the ight control software of a high-assurance quadcopter.
To provide low latency, the eChronos OS runs with interrupts enabled and provides a preemptive scheduler. This means that external hardware device interrupts may happen at any time, including during execution of OS code, and that application tasks may be preempted by the scheduler to run a more "urgent" task. The notion of urgency is captured by priorities given to tasks, and the main job of the scheduler is to guarantee that the task executing is always the highest-priority runnable task.
In terms of verification, this means that concurrency reasoning is required: at any given point in time, a hardware interrupt may happen as OS code is running (including scheduler code), leading to execution switching to interrupt handling code. Formally reasoning about concurrent execution signicantly increase the complexity: application and OS instructions may be interleaved with handler instructions.
In our work we explicitly model the effect of interrupts and their handling by the hardware and OS. We provide a general formal model of the interleaving between OS code, application code and interrupt handlers. We then instantiate this model to formalise the scheduling behavior of the eChronos OS, and prove the main scheduler property mentioned above.
Our work is all formalised and machine-checked in the Isabelle theorem prover. It adapts the Owicki-Gries methods for concurrency reasoning to model the non-deterministic occurrences of interrupts. Our model supports both direct and delayed calls to the scheduler, as well as nested interrupts.
Formal verification has increased significantly the reliability and security of software systems in recent years. Tackling the concurrency induced by interrupts, and more generally providing frameworks for reasoning about interleaved execution of low-level OS code, will make it possible to apply high-assurance techniques to a wider range of domains, including real-time and multicore.
------
Acknowledgements NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre-of-Excellence Program.
BIO
June Andronick is a Senior Researcher at Data61|CSIRO (formerly NICTA). Her research focuses on increasing the reliability of critical software systems, by mathematically proving that the code behaves as expected and satisfies security and safety requirements. She contributed to the seL4 correctness proof and now focuses on concurrency reasoning for OS code. She leads the concurrency software verification research in Data61, and is deputy leader of the Trustworthy Systems group. She was recognised in 2011 by MIT's Technology Review as one of the world's top young innovators (TR35). She holds a PhD in Computer Science from the University of Paris-Sud, France.