If you spend any time developing hard real-time systems on modern multi-core hardware, you will eventually face the absolute nightmare that is multi-processor resource contention. When multiple Central Processing Unit (CPU) cores fight for the same piece of memory, the mathematically proven safety of your Real-Time Operating System (RTOS) gets put to the ultimate test.
In RTEMS (Real-Time Executive for Multiprocessor Systems), the answer to this complex puzzle for clustered Symmetric Multiprocessing (SMP) environments is currently being forged: The Flexible Multiprocessor Locking Protocol, universally known as FMLP.
Note: The native integration of FMLP deep into the RTEMS 7 SuperCore is currently an active upstream effort (see MR #882). I am actively porting it from the original research implementation repository based on the formal paper “Supporting Multiprocessor Resource Synchronization Protocols in RTEMS” (Shi et al., 2021).
In this post, we’ll break down FMLP. We’ll define the acronyms, look at why it was created, explain how its two distinct variants work under the hood, and cover why it is considered a masterpiece of real-time scheduling mathematics.
Why We Need FMLP: Priority Inversion and Starvation
Before we can appreciate the mechanics of FMLP, we have to understand the two massive problems it was built to solve: Priority Inversion and Starvation.
What is Priority Inversion?
Imagine three tasks running on your system:
- Task Low (Priority 10): A background logging task.
- Task Medium (Priority 5): A GUI rendering task.
- Task High (Priority 1): A critical flight-control task.
Task Low acquires a software lock (a Semaphore) to write some data. Before it finishes, Task High wakes up and urgently needs that exact same lock. Task High is now “blocked” waiting for Task Low.
This is already bad, but it gets worse. Task Medium wakes up. Because Task Medium has a higher priority than Task Low, it preempts Task Low. Now, your absolutely critical flight-control task is waiting for a GUI renderer to finish, just so the logging task can finally run and release the lock.
This chain reaction is called Priority Inversion, and in a hard real-time system it causes missed deadlines and catastrophic failures.
What is Starvation?
To solve Priority Inversion, traditional protocols use priority queuing: when multiple tasks wait for a lock, the highest-priority task gets it next.
But what if high-priority tasks keep asking for the lock? A medium-priority task might sit in the waiting queue forever, perpetually passed over by newer, higher-priority arrivals. This is called Starvation.
FMLP was designed specifically by real-time researchers to mathematically conquer both of these problems simultaneously in a multi-core SMP environment.
The Fix: Strict FIFO Queuing
The defining characteristic of FMLP is its wait queue discipline. Instead of sorting waiting tasks by their priority, FMLP strictly uses FIFO (First-In, First-Out) queuing.
When a task requests an FMLP lock that is currently taken, it gets in line. If you are the first task in line, you get the lock next. Period. It does not matter if a task with maximum system priority gets in line behind you; they must wait their turn.
This solves Starvation completely. Every task is guaranteed a mathematically bounded maximum wait time that depends only on how many tasks are waiting ahead of it, not on the shifting priorities of the rest of the system.
But wait. If we use FIFO, aren’t we re-introducing Priority Inversion? If a High Priority task is stuck behind a Low Priority task in the FIFO queue, isn’t that a disaster?
This is where FMLP splits into its two brilliant variants to protect the system.
FMLP-Short: Spinning for Microseconds
Not all critical sections (the code protected by a lock) are created equal. Some just update a single counter in RAM, taking a fraction of a microsecond. Putting a task to sleep only to wake it a microsecond later wastes far more CPU time on context switches than the critical section itself consumes.
FMLP-Short (FMLP-S) is engineered specifically for these lightning-fast operations.
- Spinning, not Sleeping: When an FMLP-S lock is taken, waiting tasks do not go to sleep (“suspend”). Instead, they perform a Busy Wait (Spinning): the processor loops continuously, checking whether the lock is free. Tasks spin in strict FIFO order.
- Priority Ceiling: To ensure the task currently holding the lock finishes as fast as possible, FMLP-S uses a Priority Ceiling protocol. The moment a task acquires the lock, its priority is raised to the “ceiling” (the maximum priority configured for that scheduler instance).
- Local Non-Preemptive Execution: Furthermore, FMLP-S disables preemption locally on the executing core. Nothing can interrupt the lock holder.
The Result: The lock holder blasts through the ultra-short critical section at maximum priority with zero interruptions. The waiting tasks, spinning on the lock word, detect the release immediately, and the first task in the FIFO queue takes the lock with no context-switch latency.
FMLP-Long: Suspending for Heavy Lifting
If your critical section involves copying massive chunks of memory or waiting for a slow hardware peripheral to respond, spinning in a loop is a terrible idea. You would be burning CPU cycles that other tasks desperately need.
This is where FMLP-Long (FMLP-L) takes over.
- Suspending, not Spinning: When an FMLP-L lock is taken, waiting tasks do not spin. They are immediately Suspended (put to sleep). They are placed into a FIFO wait queue, entirely freeing up their respective CPU cores to run other tasks.
- Priority Inheritance: FMLP-L abandons the Priority Ceiling used in the Short variant. Instead, it relies on the Priority Inheritance Protocol.
- If Task Low holds the lock, and Task High requests the lock and goes to sleep in the FIFO queue, the RTEMS kernel immediately boosts Task Low’s priority to match Task High.
- Task Low “inherits” the urgency of the tasks waiting for it. This guarantees that Task Low will not be preempted by medium-priority tasks, completely stopping the Priority Inversion chain reaction we discussed earlier!
The Result: Tasks wait efficiently in their sleep, ordered fairly by FIFO, while the lock holder inherits the highest priority among the tasks waiting in the queue to ensure it finishes without delay.
Hardware Reality: Clustered Scheduling and “Sticky” Tasks
To truly understand how FMLP maps to silicon, we must understand its limits regarding hardware topology.
FMLP is designed for Clustered Scheduling. In RTEMS SMP, you don’t just dump all your CPU cores into one giant pool. You group them into “clusters” (usually grouping cores that share an L2 or L3 physical CPU cache).
FMLP is what researchers call a Sticky protocol. When a task acquires an FMLP lock, that task remains bound to its originally assigned cluster or CPU core.
- If a task on Core 2 requests a lock, it executes the protected code natively on Core 2.
- FMLP does not natively migrate tasks across the SoC (System on Chip) simply to access a lock.
This “sticky scheduling” is phenomenal for ensuring CPUs don’t constantly lose their L1 cache data by migrating tasks needlessly. It keeps execution predictable and fast.
However, as noted by recent proposals extending RTEMS locking capabilities (such as the Distributed Flexible Locking Protocol, DFLP), this sticky nature means FMLP cannot migrate a task to reach a hardware peripheral (such as a DMA controller) that is hardwired to a different core. FMLP handles software contention cleanly, but it relies on other protocols to route tasks to physical destinations.
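Clustered scheduling itself is set up at application build time through the RTEMS configuration macros. The sketch below shows roughly what a two-cluster, four-core layout looks like; the macro names follow the RTEMS clustered-scheduler configuration API, but you should verify them against the Classic API Guide for your RTEMS version:

```c
/* Configuration sketch: two clusters of two cores each (hedged --
 * check macro names against your RTEMS version's documentation). */
#define CONFIGURE_MAXIMUM_PROCESSORS 4

#define CONFIGURE_SCHEDULER_USER
#include <rtems/scheduler.h>

/* One fixed-priority SMP scheduler instance per cluster. */
RTEMS_SCHEDULER_PRIORITY_SMP(cluster_a, 256);
RTEMS_SCHEDULER_PRIORITY_SMP(cluster_b, 256);

#define CONFIGURE_SCHEDULER_TABLE_ENTRIES \
  RTEMS_SCHEDULER_TABLE_PRIORITY_SMP(cluster_a, rtems_build_name('C', 'L', 'A', ' ')), \
  RTEMS_SCHEDULER_TABLE_PRIORITY_SMP(cluster_b, rtems_build_name('C', 'L', 'B', ' '))

/* Cores 0-1 form cluster A, cores 2-3 form cluster B. */
#define CONFIGURE_SCHEDULER_ASSIGNMENTS \
  RTEMS_SCHEDULER_ASSIGN(0, RTEMS_SCHEDULER_ASSIGN_PROCESSOR_MANDATORY), \
  RTEMS_SCHEDULER_ASSIGN(0, RTEMS_SCHEDULER_ASSIGN_PROCESSOR_MANDATORY), \
  RTEMS_SCHEDULER_ASSIGN(1, RTEMS_SCHEDULER_ASSIGN_PROCESSOR_MANDATORY), \
  RTEMS_SCHEDULER_ASSIGN(1, RTEMS_SCHEDULER_ASSIGN_PROCESSOR_MANDATORY)
```

A task created in cluster A that takes an FMLP lock stays in cluster A for the whole critical section; the configuration above is what defines the cluster boundaries the protocol respects.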
Why This Matters: Mastering Determinism
For architects of flight-control systems or surgical robots, speed is secondary to mathematical guarantees. Before FMLP, massive SMP clusters meant juggling Priority Inversion bottlenecks or accepting unfair queueing that risked task starvation. By integrating FMLP’s strict FIFO fairness and O(log n) scalability directly into the RTEMS 7 SuperCore, we are providing the bedrock for hard real-time determinism on modern multi-core hardware. FMLP allows developers to scale to dense SMP silicon without ever sacrificing the predictability that mission-critical systems demand.
Wrapping Up
The Flexible Multiprocessor Locking Protocol is exactly what its name implies: Flexible.
By natively acknowledging that short memory updates and long hardware wait-states are fundamentally different beasts, FMLP gives RTEMS developers surgical tools.
- FMLP-S trades CPU cycles (spinning) to eliminate context-switch latency for microsecond operations.
- FMLP-L puts tasks to sleep and utilizes mathematical Priority Inheritance to safely orchestrate long operations without grinding the multi-core system to a halt.
Coupled with an unwavering commitment to strict FIFO queue fairness to entirely eliminate task starvation, FMLP stands as one of the most mechanically robust synchronization tools available to modern embedded engineers. As the work to fully integrate it into the RTEMS 7 SuperCore concludes, it will provide developers with a bedrock of mathematically proven determinism for their most complex multi-core applications.
TL;DR
- What is FMLP? The Flexible Multiprocessor Locking Protocol. It’s a real-time protocol actively being ported into RTEMS 7 (via MR #882) to handle multi-core memory contention.
- The Main Goal: It uses strict FIFO (First-In, First-Out) waiting queues to guarantee fairness and completely eliminate Starvation (where lower priority tasks wait forever).
- FMLP-Short: Designed for microsecond-long operations. Waiting tasks “spin” (busy-loop) instead of going to sleep, saving context-switch time. The lock holder gets a massive, localized priority boost (Priority Ceiling) to finish instantly.
- FMLP-Long: Designed for slow operations. Waiting tasks suspend (go to sleep) in the FIFO queue, freeing up the CPU. The lock holder inherits the maximum priority of the waiting tasks (Priority Inheritance) to prevent Priority Inversion.
- Sticky Scheduling: FMLP binds tasks strictly to their currently assigned CPU cluster, ensuring execution remains fast and predictable without constantly flushing CPU caches.