Testing a real-time operating system on a single core is hard. Testing it on four cores is a nightmare. And testing a multi-processor locking protocol like DFLP (Distributed Flexible Locking Protocol), one that physically migrates tasks across silicon, on a non-deterministic SMP (Symmetric Multiprocessing) system? That’s where traditional testing just stops working.

Here’s the thing: you cannot reliably test for absence of deadlock by running test cases. You can run a million iterations of spfmlp01 on a quad-core Leon3 simulator and never hit the one interleaving where Core 2’s dispatcher fires mid-migration while Core 0 is holding a sticky lock and Core 3 is doing a FIFO handoff. The state space of a 4-CPU RTEMS system executing concurrent lock operations is just too large. Standard testing can prove the presence of bugs, never their absence.

This is why the 350-hour GSoC 2026 DFLP timeline sets aside real time for Formal Verification using Promela and SPIN.


1. Why Testing Fails for SMP Locking

Let’s put some numbers on it. Four CPUs, each capable of running one of three tasks, each task can be in one of five states (idle, requesting, waiting, holding, releasing). Minimum state space:

(5 states × 3 tasks)^4 CPUs = 50,625 unique system states

And that’s a massive undercount. Once you factor in the RTEMS dispatcher internals (Thread_Control lifecycle flags, Per_CPU_Control dispatch disable counters, Scheduler_Node sticky bits, interrupt timing), the real number of reachable states goes into the millions.

A test case asserts against one specific execution path. Even our carefully written spfmlp01, spfmlp02, and spfmlp03 tests only validate a handful of deterministic scenarios: “Task A acquires, Task B waits, Task A releases, Task B gets it.” They just can’t reach the explosive universe of non-deterministic interleavings that real 4-CPU hardware runs into.

The Heisenbugs

The bugs hiding in these unexplored interleavings are called Heisenbugs. They vanish when you try to observe them. Adding a printf to debug them changes the timing just enough to alter which core wins the dispatch race, and poof, the bug disappears. The INTERNAL_ERROR_BAD_THREAD_DISPATCH_DISABLE_LEVEL panic I chased during the FMLP port was exactly this kind of bug: it only showed up under specific Per_CPU_Control counter asymmetries that were basically impossible to reproduce on demand.


2. Enter Promela and SPIN

Promela (Process Meta-Language) is a modeling language built for concurrent systems. Instead of writing C code that executes a protocol, you write a Promela model that describes the protocol’s state machine.

SPIN (Simple Promela Interpreter) is the verification engine. Given a Promela model, SPIN doesn’t just run it. It exhaustively explores every single reachable state in the model, checking assertions at every step. It doesn’t sample the state space. It covers it completely.

What We Model

For DFLP, the Promela model needs to capture the critical concurrent actors:

  1. Processes: Each CPU core is modeled as an independent Promela proctype. Each task attempting to acquire a DFLP lock is another proctype.
  2. Shared State: The DFLP lock itself (owner field, FIFO wait queue), the Per_CPU_Control dispatch disable counters, the thread pin_level, and the STATES_LIFE_IS_CHANGING lifecycle flag.
  3. Atomic Sections: The _Thread_queue_Acquire_critical() and _Thread_queue_Release() boundaries in the RTEMS SuperCore map to Promela atomic { } blocks, representing indivisible kernel operations.
  4. Non-determinism: The scheduler’s dispatching decisions (which core runs which task when) are modeled using Promela’s if :: guard1 -> ... :: guard2 -> ... fi non-deterministic choice, forcing SPIN to explore every possible scheduling decision.

What We Prove

SPIN doesn’t just check “does it crash?” It verifies temporal properties, mathematical statements about all possible futures of the system:

  • !deadlock (Freedom from Deadlock): SPIN exhaustively verifies that there is no reachable state where all tasks are permanently blocked, each waiting on a lock held by another. This gets validated across the entire state graph.
  • Mutual Exclusion: At no point in any execution can two tasks simultaneously be in the “holding” state for the same DFLP lock. In Promela: assert(!(task_a_holds && task_b_holds)).
  • FIFO Ordering: If Task A enters the wait queue before Task B, Task A must acquire the lock before Task B. This is modeled using Promela sequence variables and verified as a Linear Temporal Logic (LTL) property.
  • Migration Safety: A task’s pin_level must be zero before any _Thread_Set_CPU() migration call. A non-zero pin_level during migration is exactly what triggers the INTERNAL_ERROR_BAD_THREAD_DISPATCH_DISABLE_LEVEL fatal panic.

3. Why This Justifies the 350-Hour Scope

Building these Promela models from scratch takes serious effort. Here’s why:

3.1: No Existing RTEMS 7 Promela Models

There are no pre-existing formal models of the RTEMS 7 SuperCore dispatcher. The SPIN models for DFLP have to be built from first principles by reading the C implementation and translating the state machine logic into Promela. That means manually modeling:

  • The _Thread_queue_Enqueue_sticky() / _Thread_queue_Surrender_sticky() paths.
  • The dispatch disable/enable symmetry across Per_CPU_Control boundaries.
  • The scheduler helping protocol and priority aggregation via Red-Black Trees.
  • The _Thread_Set_CPU() migration path and how it interacts with the STATES_LIFE_IS_CHANGING protection bit.

3.2: State Space Explosion Management

Raw SPIN verification on a naively modeled 4-CPU system will just eat all your memory. Practical SPIN work requires careful abstraction: replacing concrete RTEMS priority values with symbolic booleans (high/low), bounding queue lengths, and using partial order reduction to collapse equivalent interleavings. Getting these abstractions right (tight enough to fit in memory, loose enough to still catch bugs) is where the real engineering challenge lives.

3.3: Iterative Model Refinement

Formal verification is not “write model, press button, done.” It’s an iterative loop:

  1. Write initial model.
  2. SPIN finds a spurious counterexample (a “bug” that only exists in your model, not the real code).
  3. Refine the model to eliminate the spurious state while keeping all real states.
  4. SPIN either finds a real bug or confirms correctness.
  5. Update the C implementation if a real bug was found.
  6. Go back to step 1.

Each iteration can take days of careful analysis.


4. The Payoff: Mathematical Confidence

When SPIN reports "0 errors found" after exhaustively exploring millions of states, that’s not a test passing. That’s a mathematical proof that the property holds for every possible execution of the model. No amount of fuzz testing or randomized runs can give you this level of assurance.

For a safety-critical RTOS like RTEMS, where missing a deadline can mean a satellite loses attitude control or a surgical robot moves unexpectedly, formal verification isn’t nice-to-have. It’s the only responsible approach for something as dangerous as migrating live real-time tasks across CPU cores during execution.


5. Wrapping Up

Standard SMP regression tests like spfmlp01-03 are necessary but not sufficient. They validate the happy path, the specific execution sequence the test author imagined. They can’t reach the millions of non-deterministic interleavings that a quad-core RTEMS system will hit in the field.

Promela/SPIN closes this gap by exhaustively proving that critical safety properties (no deadlock, mutual exclusion, FIFO fairness, migration safety) hold for every possible execution. Building these models for the RTEMS 7 SuperCore dispatcher is a serious undertaking and forms a core pillar of the 350-hour GSoC 2026 DFLP timeline.

The FMLP port taught me the hard way that INTERNAL_ERROR_BAD_THREAD_DISPATCH_DISABLE_LEVEL hides in the cracks between CPU cores. SPIN makes sure those cracks are sealed shut, permanently and provably.


TL;DR

  • The Problem: Traditional testing can’t reliably detect race conditions in SMP systems. The state space of a 4-CPU RTEMS dispatcher is way too large, and bugs (Heisenbugs) only show up under specific, unreproducible interleavings.
  • The Tool: Promela models the locking protocol’s state machine. SPIN exhaustively explores every reachable state, checking assertions at every step. Not sampling. Complete coverage.
  • What Gets Proven: Freedom from deadlock (!deadlock), mutual exclusion, FIFO ordering, and migration safety (pin_level == 0 before _Thread_Set_CPU()).
  • Why 350 Hours: No existing RTEMS 7 Promela models exist. Building them from scratch requires translating the SuperCore dispatcher logic, managing state space explosion, and iterating through model refinement cycles.
  • The Payoff: When SPIN says “0 errors,” it’s a mathematical proof, not a passed test. For safety-critical systems, this is the only responsible way to verify a protocol that migrates real-time tasks across silicon.