If you’ve followed the evolution of locking in RTEMS (Real-Time Executive for Multiprocessor Systems), you know that managing hard real-time systems in a multi-core SMP (Symmetric Multiprocessing) environment is a monumental challenge.
Protocols like MrsP (Multiprocessor Resource Sharing Protocol) and FMLP (Flexible Multiprocessor Locking Protocol) beautifully solved the software side of this equation. MrsP introduced Priority Ceilings so tasks can safely share resources across clusters. FMLP introduced strict FIFO (First-In, First-Out) queuing to eliminate task starvation.
But there was a lingering, physical problem involving system hardware that neither of them could solve.
Enter the Distributed Flexible Locking Protocol (DFLP).
Note: DFLP is currently the centerpiece of my Google Summer of Code (GSoC) 2026 proposal. I am actively working on natively porting it into the RTEMS 7 SuperCore, working directly from the original research implementation repository and the formal literature by Dr. Junjie Shi and Dr. Kuan-Hsun Chen.
In this deep dive, we’re going to explore exactly what the Core-Localized Hardware problem is, how sticky scheduling fails to address it, and how DFLP utilizes the extreme mechanism of Temporary Task Migration to solve it.
1. The Physical Problem: Core-Localized Hardware
Modern silicon isn’t perfectly symmetric. Inside your System on Chip (SoC), you might have a dozen CPU cores, but the peripheral hardware is often physically glued to specific regions of the silicon.
Imagine you have a hardware Cryptography Engine or a DMA (Direct Memory Access) controller that is hard-wired exclusively to Core 1.
Now, imagine a high-priority task assigned to Core 3 urgently needs to use that Cryptography Engine to decrypt an incoming network packet. The developer dutifully wraps the crypto-engine in a standard MrsP or FMLP lock to prevent race conditions.
The Failure of “Sticky Scheduling”
Here is where the architecture breaks down: both MrsP and FMLP are “sticky” protocols. When a task acquires a lock, it must execute the locked critical section on its home processor.
If our task on Core 3 acquires the lock, it safely has software ownership… but it is still physically on Core 3. It cannot physically reach the crypto-engine hardware hard-wired to Core 1. The software protocol worked flawlessly, but the hardware topology defeated it.
2. The DFLP Paradigm: “If the Mountain Won’t Come to Muhammad…”
DFLP fundamentally re-invents what a locking protocol is supposed to do. A lock shouldn’t just be a state machine for priorities—it needs to act as a physical travel agent.
Unlike sticky protocols, DFLP binds its locking semaphores to a designated “Synchronization Processor”. In our example, the semaphore protecting the crypto-engine would configure Core 1 as its Synchronization Processor.
When a task needs the resource, DFLP doesn’t just grant it a priority boost. It initiates Temporary Task Migration.
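As a rough sketch of the idea (the names here are illustrative, not the actual RTEMS SuperCore API), the key difference from a sticky lock is that a DFLP semaphore records its Synchronization Processor at creation time:

```c
#include <stdint.h>

/* Hypothetical sketch: unlike a sticky MrsP/FMLP lock, a DFLP semaphore
 * carries the identity of the core its critical sections must run on.
 * Both names below are illustrative, not the real RTEMS API. */
typedef struct {
  uint32_t sync_cpu; /* the designated Synchronization Processor */
  int      locked;   /* 0 = free, 1 = held */
} dflp_semaphore;

void dflp_semaphore_init(dflp_semaphore *sem, uint32_t sync_cpu) {
  sem->sync_cpu = sync_cpu; /* e.g. Core 1, next to the crypto-engine */
  sem->locked = 0;
}
```

In our running example, the semaphore guarding the crypto-engine would be initialized with Core 1 as its `sync_cpu`.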
The Migration Mechanic (The “Seize” Protocol)
Here is the exact state machine of what happens under the RTEMS SuperCore hood when you request a DFLP lock (the “Seize” operation):
- Request: The task executing on Core 3 requests the crypto-engine lock.
- Protection: Before moving anything, DFLP protects the “in-flight” thread. By setting specific SuperCore lifecycle flags (like STATES_LIFE_IS_CHANGING), DFLP tells the kernel: “This thread is in a transient state. Do not dispatch it, time it out, or hit it with asynchronous signals while I am moving it!”
- Migration: The kernel overrides the task’s native affinity via _Thread_Set_CPU(), ripping the task out of Core 3’s scheduler block and injecting it into Core 1’s scheduler.
- Execution: The task wakes up natively on Core 1. It is now physically adjacent to the crypto-engine and executes the critical section with zero hardware bottlenecks.
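The sequence above can be modeled in a few lines of plain C. This is a toy model, not the real implementation: thread_set_cpu() and the life_is_changing flag stand in for _Thread_Set_CPU() and STATES_LIFE_IS_CHANGING, and no actual scheduler is involved.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model of the Seize sequence: protect, migrate, execute. */
typedef struct {
  uint32_t home_cpu;          /* where the task normally runs (Core 3) */
  uint32_t current_cpu;       /* where it is running right now */
  bool     life_is_changing;  /* transient-state protection flag */
} task_model;

static void thread_set_cpu(task_model *t, uint32_t cpu) {
  t->current_cpu = cpu;       /* override the task's native affinity */
}

void dflp_seize(task_model *t, uint32_t sync_cpu) {
  t->life_is_changing = true;   /* Protection: freeze lifecycle events */
  thread_set_cpu(t, sync_cpu);  /* Migration: inject into the sync core */
  t->life_is_changing = false;  /* Execution: task now runs locally */
}
```

After `dflp_seize(&task, 1)`, a task whose home is Core 3 is executing on Core 1, with its home affinity remembered for the trip back.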
Returning Home (The “Surrender” Protocol)
Once the task finishes decrypting the packet, it “Surrenders” the lock.
- If the queue is empty: DFLP safely un-pins the task, calls _Thread_Set_CPU() again, and seamlessly teleports the task back to its home on Core 3 so it can finish the rest of its execution.
- If tasks are waiting (The Handoff): If another task is in the FIFO wait queue for the crypto-engine, DFLP performs an “Atomic Handoff.” Our task goes back to Core 3, while the core-migration sequence is simultaneously triggered for the waiting task, pulling it onto Core 1 instantly.
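Continuing the toy model from before (again, illustrative names only, no real scheduler): the Surrender path sends the owner home and, if the FIFO queue is non-empty, pulls the head waiter onto the Synchronization Processor.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the Surrender sequence, including the Atomic Handoff. */
typedef struct {
  uint32_t home_cpu;
  uint32_t current_cpu;
} task_model;

/* Returns the new lock owner, or NULL if the queue was empty. */
task_model *dflp_surrender(task_model *owner, task_model **fifo,
                           size_t waiting, uint32_t sync_cpu) {
  owner->current_cpu = owner->home_cpu;  /* teleport the owner back home */
  if (waiting == 0)
    return NULL;                         /* empty queue: simply un-pin */
  task_model *next = fifo[0];            /* strict FIFO: head of the line */
  next->current_cpu = sync_cpu;          /* pull the next task onto the sync core */
  return next;
}
```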
3. How DFLP Defeats Deadlock and Starvation
Moving tasks across CPU boundaries while holding real-time locks is incredibly dangerous. If two tasks try to migrate to the same core while waiting on interdependent locks, you can easily trigger a catastrophic deadlock.
Priority Boosting + FIFO
DFLP prevents these disasters by combining its migration mechanism with the brilliant queuing mechanics pioneered by FMLP-Short:
- Strict FIFO Queuing: Just like FMLP, waiting tasks form a strict First-In, First-Out line. This absolutely guarantees that no task will ever suffer from Starvation (waiting forever while newer, higher-priority tasks cut in line).
- Priority Boosting: Once a task migrates to the Synchronization Processor to execute its critical section, it cannot be allowed to dawdle. Using Priority Boosting, the executing task instantly inherits a priority ceiling. This guarantees that standard tasks physically residing on the Synchronization Processor cannot preempt the migrated visitor, defeating Priority Inversion.
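The boosting rule itself is tiny. As a sketch (the function name is mine, not an RTEMS symbol): while a migrated task executes on the Synchronization Processor, it runs at the semaphore's ceiling whenever the ceiling outranks its base priority. Note the RTEMS convention that a numerically smaller value means a higher priority.

```c
#include <stdint.h>

/* Sketch of the boosting rule: the effective priority is the stronger
 * (numerically smaller, in RTEMS convention) of the task's base priority
 * and the semaphore's ceiling. Illustrative name, not a real RTEMS API. */
uint32_t dflp_effective_priority(uint32_t base_priority, uint32_t ceiling) {
  return (ceiling < base_priority) ? ceiling : base_priority;
}
```

A low-importance task with base priority 100 entering a ceiling-5 critical section executes at 5, so resident tasks on the Synchronization Processor cannot preempt it mid-operation.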
4. Deep SuperCore Danger: Pin Levels
Building this into the RTEMS 7 SuperCore is an exercise in extreme caution. Because task migration fundamentally alters the dispatch state of a thread, DFLP has to strictly evaluate the thread’s pin level (a counter tracking whether the thread is pinned to its current processor) before moving it. Attempting to physically rip a thread out of a core while its pin level is unbalanced will trigger a fatal INTERNAL_ERROR_BAD_THREAD_DISPATCH_DISABLE_LEVEL panic. My implementation relies on atomic primitives and invariant kernel checks to make sure the thread is cleanly pinned during its physical journey across the silicon.
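A drastically simplified stand-in for that invariant check looks like this (the real SuperCore logic is more involved; the struct and function names here are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the pin-level invariant: migration is only legal
 * when every pin has been matched by an unpin, i.e. the counter is
 * balanced at zero. Illustrative names, not the real SuperCore types. */
typedef struct {
  uint32_t pin_level;  /* 0 = unpinned, safe to migrate */
} thread_model;

bool dflp_can_migrate(const thread_model *t) {
  /* An unbalanced pin level at this point would correspond to the fatal
   * INTERNAL_ERROR_BAD_THREAD_DISPATCH_DISABLE_LEVEL panic in the kernel. */
  return t->pin_level == 0;
}
```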
5. Why This Matters: Unlocking Silicon
Historically, deploying a Real-Time OS on core-localized hardware forced a brutal compromise: either fracture the application into rigid AMP (Asymmetric Multiprocessing) message-passing designs or accept poor scaling via global SMP queues. DFLP resolves this conflict. By implementing Temporary Task Migration, DFLP allows developers to treat asymmetric Systems-on-Chip (SoC) as a unified playground. You get the programming simplicity of SMP combined with the surgical hardware utilization of AMP. RTEMS won’t just run on modern silicon—it will dynamically route tasks to exactly where the physical hardware resources live.
6. Wrapping Up
The introduction of the Distributed Flexible Locking Protocol (DFLP) represents the final missing puzzle piece in the modern RTEMS SMP architecture.
It takes the mathematical priority-safety of MrsP, combines it with the starvation-free FIFO fairness of FMLP, and breaks the final mold of sticky-scheduling by physically routing real-time tasks to the silicon that needs them. While there is a slight jitter cost incurred by migrating a scheduler block across CPUs, the ability to finally access isolated, core-localized hardware with mathematical bounds makes it a mandatory tool for next-generation embedded deployments.
TL;DR
- What is DFLP? The Distributed Flexible Locking Protocol. It’s an advanced multi-processor lock being ported to RTEMS 7 SMP that enables Temporary Task Migration.
- The Hardware Problem: Legacy protocols like FMLP and MrsP use “Sticky Scheduling”—tasks cannot leave their originally assigned CPUs. Therefore, they cannot physically reach specialized hardware (like a crypto-engine) hard-wired to an isolated core across the SoC.
- The DFLP Solution: DFLP explicitly binds a lock to a “Synchronization Processor.” When a task needs the lock, DFLP physically rips it from its home CPU and teleports it to the remote processor. The task executes perfectly local to the hardware, then migrates back home.
- Mathematical Safety: DFLP uses strict FIFO waiting queues and localized Priority Boosting to ensure migrating tasks aren’t starved by other tasks or preempted mid-operation, maintaining hard real-time safety.