On a single-core processor, making sure two tasks don’t corrupt the same piece of memory is relatively simple, because only one task ever runs at a time; the danger is being preempted halfway through an update. You can use tricks like Priority Inheritance (briefly boosting a task’s priority so it gets out of the way faster), and if all else fails, you simply disable hardware interrupts so nothing can preempt the current task.

But when you move to a multi-core environment—specifically Symmetric Multiprocessing (SMP), where multiple processing cores share the exact same memory—those old tricks stop working. Disabling interrupts on Core 0 doesn’t stop Core 1 from trampling your data.

To guarantee response times (determinism) across multiple cores, a Real-Time Operating System like RTEMS (Real-Time Executive for Multiprocessor Systems) relies on multiprocessor locking protocols with formally proven worst-case blocking bounds. Let’s look at how these protocols have evolved: where we started with MrsP, how we optimized things with FMLP, and what the upcoming DFLP solves for modern hardware.


The Priority Inversion Problem

Before diving into the protocols themselves, we need to understand the main villain they are trying to defeat: Priority Inversion.

Imagine a low-priority task grabs a software lock so it can safely read a sensor. Suddenly, an urgent, high-priority task wakes up and needs that exact same lock. The high-priority task is now “blocked” and must wait. If a medium-priority task wakes up, it can preempt the low-priority task, causing the urgently needed high-priority task to wait even longer. This chain reaction is unacceptable in hard real-time systems (like flight controllers or robotic arms).

To fix this, operating systems temporarily boost the low-priority task’s priority so it can finish its work quickly and release the lock. In an SMP system with multiple cores, coordinating these priority boosts safely without causing deadlocks is incredibly complex.

This brings us to our first major solutions.


Early Multi-Core Locking: OMIP and MrsP

To handle cross-core priority inversion, RTEMS introduced protocols like OMIP (O(m) Independence-Preserving Protocol) and MrsP (Multiprocessor Resource Sharing Protocol).

MrsP, in particular, was a massive leap forward. Designed for clustered SMP scheduling, MrsP utilizes a priority ceiling mechanism. Every lock is assigned a maximum “ceiling” priority. When a task acquires a MrsP lock, its priority is immediately yanked up to that ceiling.

Furthermore, if a high-priority task on Core A gets blocked waiting for a lock held by a preempted low-priority task on Core B, MrsP employs a “Scheduler Helping Protocol.” The waiting high-priority task actually donates its CPU time to the low-priority lock owner, allowing it to temporarily migrate and finish its computation using the waiting task’s CPU core.

The Catch: While MrsP provides excellent bounds for shared memory, its core philosophy relies on what is essentially Sticky Scheduling. The task accessing the resource is “sticky” to its originally assigned processors or clusters. While MrsP allows temporary migrations to donate CPU time (helping), the protocol does not migrate a task for the purpose of reaching localized physical hardware.


Improving Wait Times with FMLP

As RTEMS matured, developers realized they needed a protocol that handled both very short and very long critical sections well, while providing strict fairness. This led to FMLP (Flexible Multiprocessor Locking Protocol).

FMLP actually consists of two vastly different algorithms tailored for different lengths of critical sections:

  1. FMLP-Short (FMLP-S): Designed for critical sections that run for at most a few microseconds. In FMLP-Short, an executing task disables preemption locally and is boosted to a priority ceiling. But the real magic is the wait queue: waiting tasks do not go to sleep. Instead, they perform a busy wait (spin-lock) in strict FIFO (First-In, First-Out) order. Spinning wastes CPU cycles, but for such short waits it is actually cheaper than paying the heavy context-switch penalty of putting a task to sleep.
  2. FMLP-Long (FMLP-L): Designed for lengthy critical sections (like copying massive buffers or waiting for a slow peripheral). FMLP-Long abandons the “priority ceiling” and instead uses Priority Inheritance. More importantly, waiting tasks do not spin. They are suspended (put to sleep) in a FIFO queue. This frees up the CPU for other tasks while guaranteeing that tasks are served exactly in the order they arrived.

The Catch: Like MrsP, FMLP relies entirely on Sticky Scheduling. In FMLP, tasks do not migrate across clusters. The executing task remains securely tied to its home CPU.

Why is Sticky Scheduling a problem?

Imagine your System on Chip (SoC) has a hardware cryptography engine or a DMA (Direct Memory Access) controller physically wired only to Core 1. If a task residing on Core 3 needs to use that crypto-engine, locking it via MrsP or FMLP doesn’t actually solve your problem. The task gets the software lock safely, but from Core 3, it still cannot physically reach the hardware localized to Core 1!


The Next Step: Addressing Physical Hardware with DFLP

To solve this core-localized hardware problem, we need an entirely different approach. A locking protocol shouldn’t just manage priorities—it needs to actively route tasks to physical destinations.

Slated for upcoming RTEMS releases (and driven heavily by current Google Summer of Code projects), the Distributed Flexible Locking Protocol (DFLP) introduces a major paradigm shift: Temporary Task Migration.

How DFLP Solves the Hardware Problem

Unlike sticky scheduling in MrsP or FMLP, DFLP binds a semaphore (a locking mechanism) to a designated “Synchronization Processor.” Let’s revisit our crypto-engine example:

  1. A task executing natively on Core 3 requests the DFLP lock for the crypto-engine (which is physically tied to Core 1).
  2. If the lock is available, the operating system’s kernel temporarily pauses the task on Core 3 and physically migrates the task to Core 1 using direct internal scheduler updates (_Thread_Set_CPU).
  3. The task executes its protected code natively on Core 1, giving it perfect, localized physical access to the crypto-engine.
  4. Once the work is done and the lock is released, DFLP seamlessly migrates the task back home to Core 3.

If the lock is already taken, DFLP relies on strict FIFO queuing and priority boosting to ensure the waiting tasks are processed fairly and predictably, never violating hard real-time deadlines.


Wrapping Up

The evolution of locking protocols in RTEMS directly mirrors how much more complicated multi-core hardware has become over the last decade:

  1. MrsP provided the theoretical and practical foundation for safe, cross-core memory sharing through Priority Ceilings and CPU time donation.
  2. FMLP heavily optimized the wait queues, bringing strict FIFO fairness and introducing the vital split between spin-based (FMLP-Short) and suspend-based (FMLP-Long) locking.
  3. DFLP conquers the physical reality of asymmetric silicon chips, overcoming the limits of sticky scheduling by combining real-time locking with the ability to dynamically migrate threads exactly where the physical hardware needs them to be.

By implementing these protocols deep inside its SuperCore, RTEMS ensures that whether you’re safely updating a simple array in RAM or orchestrating access to a complex hardware accelerator glued to a specific core, you have exactly the right formally analyzed protocol for the job.


TL;DR

  • The Problem: In multi-core systems, tasks fighting for locks can trigger Priority Inversion or system deadlock.
  • MrsP: Solved basic cross-core sharing by boosting a lock holder’s priority to a maximum Ceiling. If the holder is preempted, waiting tasks explicitly donate their CPU time to let it finish quicker.
  • FMLP: Eliminated starvation using strict FIFO wait queues. It splits into FMLP-Short (spinning in place for very short critical sections) and FMLP-Long (suspending tasks to sleep under Priority Inheritance).
  • DFLP: Overcomes the limitation of “Sticky Scheduling” by proactively using Temporary Task Migration to ship tasks natively to the Core-Localized hardware.