Karthikey Kadati | Linux Kernel Contributor

This isn’t a tutorial, and it isn’t a sales pitch. This is an engineering record.

Over the past months, I’ve been working deep inside the RTEMS (Real-Time Executive for Multiprocessor Systems) kernel: porting locking protocols, debugging kernel panics, writing YAML specifications, and patching infrastructure across four separate repositories. This post documents what I built, what broke, what I learned, and why it all leads to the Distributed Flexible Locking Protocol (DFLP) for GSoC 2026.

If you want the TL;DR: I ported the Flexible Multiprocessor Locking Protocol (FMLP) into the RTEMS 7 SuperCore, authored 7 Merge Requests across 4 repositories, wrote 3 regression tests that pass on a quad-core Leon3 simulator, and in the process internalized the exact Thread_Control, Scheduler_Node, and Per_CPU_Control machinery that DFLP’s Task Migration will manipulate. This post is the proof.

1. Earning Trust: Infrastructure First

Before touching the SuperCore’s locking infrastructure, I started where every new contributor should: at the edges. These contributions are not glamorous, but they show that I can navigate the full RTEMS ecosystem (kernel, BSP, toolchain, CI) without hand-holding.

SPARC Optimization (MR #835)

Commit: 3ec0ca37ff

On its own, the register keyword in C is a legacy hint that modern compilers largely ignore. On SPARC, the %g7 register holds the thread-local self pointer, so it needs to be pinned to a specific hardware register for the kernel’s entire lifetime. The old code relied on the plain register keyword for this. My patch replaces it with GCC’s __asm__ register variable syntax, which is the correct way to pin a global register on SPARC.

This is a one-line change, but it required understanding:

The SPARC register windowing model and why %g7 is a global (not windowed) register.
GCC inline assembly syntax for register variables.
The RTEMS BSP build pipeline to make sure the change didn’t break the LEON3 toolchain.
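
For readers who have not seen it, the GCC syntax in question looks roughly like the sketch below. The variable name tls_self and the two accessor functions are hypothetical and are not the identifiers used in the RTEMS tree; the non-SPARC fallback is there only so the snippet also builds on a development host.

```c
#include <stddef.h>

/*
 * Sketch of a GCC global register variable. The name tls_self is
 * hypothetical; it is not the identifier used in the RTEMS sources.
 */
#if defined(__sparc__) && defined(__GNUC__)
/* Bind the variable to the global register %g7 for the whole program. */
register void *tls_self __asm__( "g7" );
#else
/* Host fallback so the sketch builds on non-SPARC machines. */
static void *tls_self;
#endif

/* Store a new self pointer (on SPARC this writes %g7 directly). */
static inline void set_tls_self( void *self )
{
  tls_self = self;
}

/* Read the self pointer back (on SPARC this reads %g7 directly). */
static inline void *get_tls_self( void )
{
  return tls_self;
}
```

Because %g7 is a global register and not part of a register window, a value bound this way stays visible across function calls without any save or restore traffic.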

POSIX Test Cleanup (MR #879)

Commit: c8a880a45c

The psximfs and psximfs04 tests triggered unused-parameter diagnostics (-Werror=unused-parameter) on strict compilers. The fix: explicit (void) casts on the unused Init arguments. Simple, but it matters: once -Werror is in effect, a single warning is a build failure.
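
The pattern itself is tiny. The sketch below assumes a stand-in typedef named task_argument in place of rtems_task_argument so that it builds without the RTEMS headers; the init_calls counter is also hypothetical and exists only to make the function observable.

```c
#include <stdint.h>

/* Stand-in for rtems_task_argument so the sketch builds without RTEMS. */
typedef uintptr_t task_argument;

/* Hypothetical counter, used only to observe that Init ran. */
static int init_calls;

/* Init ignores its argument; the cast marks that as deliberate. */
static void Init( task_argument argument )
{
  (void) argument; /* silences -Wunused-parameter without changing behavior */
  ++init_calls;
}
```

The cast generates no code. It only records, for the compiler and for the reader, that the parameter is unused on purpose.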

CI YAML Formatting (MR #1109)

Commit: 69ecb4abf5

456 insertions, 456 deletions. A full reformat of spec/build/testsuites/sptests/grp.yml to satisfy the RTEMS CI formatting mandate. No functional change, pure hygiene. But without it, the CI pipeline rejects every MR that touches the test suite build specs.

RSB Python Version Enforcement (MR #199)

Commit: 0237d50

The RTEMS Source Builder silently produced broken output on Python 3.5. My patch adds an early check to sb-check that requires Python 3.6 or newer and fails fast with a clear error message. It references Issue #113.

2. The FMLP Port: 29 Files, +2,393 Lines, One Protocol (MR #882)

This is the centerpiece. The Flexible Multiprocessor Locking Protocol had existed only in a research repository maintained by Junjie Shi and Kuan-Hsun Chen at TU Dortmund. My job was to port it natively into the RTEMS 7 SuperCore, integrate it into the Classic API, and prove it works.

The port is structured as four clean, bisectable commits:

Commit 1: Import

b42fd0d96d

Raw import of the 2020 research patch. This brings in the core algorithms for both variants:

FMLP-Short (FMLP-S): Busy-wait (spinning) with Priority Ceiling. Uses _Thread_queue_Operations_FIFO for strict FIFO ordering.
FMLP-Long (FMLP-L): Suspension-based with Priority Inheritance. Uses _Thread_queue_Operations_priority_inherit for PI-ordered waiting.
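
The difference between the two wait orders can be illustrated with a toy model. This is not the thread queue code: the waiter struct and both selection functions are hypothetical, the queue is assumed to be non-empty and stored in arrival order, and a lower number means a higher priority.

```c
#include <stddef.h>

/* Hypothetical waiter record; a lower number means a higher priority. */
typedef struct {
  int      task_id;
  unsigned priority;
} waiter;

/* FIFO order: the longest-waiting task (index 0) is always served next. */
static size_t next_fifo( const waiter *queue, size_t count )
{
  (void) queue;
  (void) count;
  return 0;
}

/* Priority order: highest priority first, arrival order breaks ties. */
static size_t next_by_priority( const waiter *queue, size_t count )
{
  size_t best = 0;

  for ( size_t i = 1; i < count; ++i ) {
    /* Strict comparison keeps the earlier arrival on equal priority. */
    if ( queue[ i ].priority < queue[ best ].priority ) {
      best = i;
    }
  }

  return best;
}
```

With waiters arriving as (task 1, priority 5), (task 2, priority 3), (task 3, priority 3), FIFO order serves task 1 first, while priority order serves task 2.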

Commit 2: Modernize (The War Story)

32d16c0c6c

This is where the real engineering happened. The research code was written against an older RTEMS internal API. Modernizing it meant refactoring every function to use current SuperCore interfaces, splitting inline implementations, and reformatting to RTEMS coding standards.

And this is where the system started panicking.

The `INTERNAL_ERROR_BAD_THREAD_DISPATCH_DISABLE_LEVEL` Saga

The bug: after acquiring an FMLP-S lock and releasing it under specific multi-core scheduling conditions, the kernel would fire INTERNAL_ERROR_BAD_THREAD_DISPATCH_DISABLE_LEVEL. Immediate, unrecoverable fatal error.

Here’s the root cause, traced across three layers of the SuperCore:

Layer 1: Sticky Scheduling. FMLP-S uses what the RTEMS SuperCore calls “Sticky Scheduling.” When a task acquires an FMLP-S lock, _FMLPS_Claim_ownership() calls:

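/* Disable dispatching; cpu_self records the CPU whose counter was raised. */
cpu_self = _Thread_queue_Dispatch_disable( queue_context );
_FMLPS_Release( fmlps, queue_context );
/* Apply the priority update and increment the thread's sticky level. */
_Thread_Priority_update_and_make_sticky( executing );
/* Re-enable dispatching on the CPU captured above. */
_Thread_Dispatch_enable( cpu_self );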

The _Thread_Priority_update_and_make_sticky() call increments the thread’s sticky level and adjusts the dispatch disable counter on the current CPU’s Per_CPU_Control. This pins the thread. The scheduler can’t migrate it while the sticky lock is held.

Layer 2: The Cross-CPU Problem. On surrender, the path has to call _Thread_Priority_update_and_clean_sticky() to decrement the sticky level. But here’s the problem: if the thread has been helped by a remote CPU’s scheduler (via the Scheduler Helping Protocol), the “current CPU” might not be the same CPU where the dispatch disable was originally incremented.

The result: _Thread_Dispatch_disable_critical() incremented CPU A’s counter. _Thread_Dispatch_enable() decremented CPU B’s counter. CPU A’s counter is now permanently elevated. At the next dispatch point, the kernel detects the asymmetry and panics.

Layer 3: The Fix. The solution required making sure that the _Thread_Dispatch_disable_critical() / _Thread_Dispatch_enable() pair always operates on the same Per_CPU_Control pointer. The cpu_self variable has to be captured before any operation that could cause scheduler helping or thread migration, and it has to be passed through the entire surrender path.

In _FMLPS_Surrender(), the correct sequence is:

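/* Remove the ceiling priority before dispatching is disabled. */
_FMLPS_Remove_priority( executing, &fmlps->Ceiling_priority, queue_context );
/* ... */
/* Capture the Per_CPU_Control whose disable level is incremented here. */
cpu_self = _Thread_Dispatch_disable_critical(
  &queue_context->Lock_context.Lock_context
);
_FMLPS_Release( fmlps, queue_context );
/* Apply the priority update and decrement the thread's sticky level. */
_Thread_Priority_update_and_clean_sticky( executing );
/* Enable on the captured CPU, never on a freshly looked-up "current" CPU. */
_Thread_Dispatch_enable( cpu_self );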

The priority removal happens before the dispatch disable, and the cpu_self pointer captured at disable time is the exact same pointer used at enable time. No cross-CPU asymmetry.
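
The asymmetry is easier to see in a toy model than in the kernel. The sketch below is not RTEMS code: cpu_model, current_cpu, and every function in it are hypothetical stand-ins that reproduce only the counter arithmetic described above, and the level is deliberately signed so that an unbalanced decrement shows up as a negative value.

```c
#include <stdbool.h>

#define CPU_COUNT 4

/* Toy stand-in for Per_CPU_Control: only the dispatch disable level. */
typedef struct {
  int dispatch_disable_level; /* signed so an imbalance becomes visible */
} cpu_model;

static cpu_model cpus[ CPU_COUNT ];
static unsigned  current_cpu; /* CPU the modelled thread executes on */

static void reset_model( void )
{
  for ( unsigned i = 0; i < CPU_COUNT; ++i ) {
    cpus[ i ].dispatch_disable_level = 0;
  }
  current_cpu = 0;
}

/* Raise the level of the current CPU and hand that CPU back to the caller. */
static cpu_model *dispatch_disable( void )
{
  cpu_model *cpu_self = &cpus[ current_cpu ];

  ++cpu_self->dispatch_disable_level;
  return cpu_self;
}

/* Lower the level of exactly the CPU the caller passes in. */
static void dispatch_enable( cpu_model *cpu_self )
{
  --cpu_self->dispatch_disable_level;
}

/* Buggy surrender: looks up the current CPU again at enable time. */
static void surrender_buggy( unsigned helping_cpu )
{
  (void) dispatch_disable();               /* increments the original CPU */
  current_cpu = helping_cpu;               /* thread now runs elsewhere */
  dispatch_enable( &cpus[ current_cpu ] ); /* decrements the wrong CPU */
}

/* Fixed surrender: reuses the pointer captured at disable time. */
static void surrender_fixed( unsigned helping_cpu )
{
  cpu_model *cpu_self = dispatch_disable();

  current_cpu = helping_cpu;
  dispatch_enable( cpu_self );
}

/* True when every CPU is back at level zero. */
static bool levels_balanced( void )
{
  for ( unsigned i = 0; i < CPU_COUNT; ++i ) {
    if ( cpus[ i ].dispatch_disable_level != 0 ) {
      return false;
    }
  }
  return true;
}
```

After surrender_buggy( 1 ) on a fresh model, CPU 0 is left at level 1 and CPU 1 at level -1. After surrender_fixed( 1 ), every level is back at zero, which is the invariant the real surrender path has to preserve.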

Debugging this took days. It was invisible on uniprocessor configurations. It only showed up on the quad-core SIS Leon3 simulator under specific task arrival patterns. In retrospect, it was the single most valuable lesson of the entire port.

Commit 3: Classic API Integration

5d93112e12

This commit wires the SuperCore implementation into RTEMS’s user-facing Semaphore Manager:

attr.h: Added RTEMS_FLEXIBLE_MULTIPROCESSOR_LOCKING_SHORT and RTEMS_FLEXIBLE_MULTIPROCESSOR_LOCKING_LONG attribute flags.
semcreate.c: Routes the new flags through attrimpl.h to instantiate FMLPS_Control or FMLPL_Control blocks.
semobtain.c / semrelease.c: Delegates to _FMLPS_Seize() / _FMLPL_Seize() and their surrender counterparts.
semdelete.c / semflush.c / semsetpriority.c: Full lifecycle support.
mon-sema.c: Updated the system monitor so it correctly identifies FMLP semaphore types instead of showing a generic error.
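
The attribute rules that the creation path has to enforce can be sketched as a small predicate. This assumes the constraints stated in the documentation work (binary semaphores only, short and long variants mutually exclusive). The ATTR_ bit values and the function name are hypothetical; the real constants live in attr.h and use a different encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not the attr.h encoding. */
#define ATTR_BINARY_SEMAPHORE 0x01u
#define ATTR_FMLP_SHORT       0x02u
#define ATTR_FMLP_LONG        0x04u

/* Returns true if the FMLP-related part of the attribute set is consistent. */
static bool fmlp_attributes_valid( uint32_t attribute_set )
{
  bool is_short = ( attribute_set & ATTR_FMLP_SHORT ) != 0;
  bool is_long  = ( attribute_set & ATTR_FMLP_LONG ) != 0;

  if ( !is_short && !is_long ) {
    return true; /* not an FMLP semaphore, nothing to check here */
  }

  if ( is_short && is_long ) {
    return false; /* the two variants are mutually exclusive */
  }

  /* Either variant is only meaningful on a binary semaphore. */
  return ( attribute_set & ATTR_BINARY_SEMAPHORE ) != 0;
}
```

In this model, a set containing both FMLP flags is rejected, and so is either flag without the binary semaphore attribute.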

Commit 4: Regression Tests

aadaeaa31b

Three test suites, each targeting a different aspect:

Test	Protocol	What It Validates
`spfmlp01`	FMLP-L	Suspension logic, multi-processor mutual exclusion, correct unblock ordering.
`spfmlp02`	FMLP-S	Single-processor spinning, contention behavior, mutex semantics.
`spfmlp03`	FMLP-S	FIFO ordering at equal priority, Priority Ceiling Protocol, ceiling violation rejection.

All three pass on the SIS Leon3 simulator configured with 4 CPUs. Each test includes a .doc file describing the directives and concepts tested, and a .scn file documenting expected console output.
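
The ceiling check that spfmlp03 exercises can be modelled in a few lines. This is a sketch under assumptions, not the SuperCore code: task_model, obtain_status, and obtain_with_ceiling are hypothetical names, and the model assumes the Classic API convention that a lower number means a higher priority and that a task whose priority is already above the ceiling is rejected.

```c
/* Hypothetical status codes for the model. */
typedef enum {
  OBTAIN_SUCCESSFUL,
  OBTAIN_CEILING_VIOLATED
} obtain_status;

/* Hypothetical task record; a lower number means a higher priority. */
typedef struct {
  unsigned current_priority;
} task_model;

/* Reject callers above the ceiling, otherwise raise them to the ceiling. */
static obtain_status obtain_with_ceiling(
  task_model *task,
  unsigned    ceiling_priority
)
{
  if ( task->current_priority < ceiling_priority ) {
    /* The task already outranks the ceiling: leave it untouched. */
    return OBTAIN_CEILING_VIOLATED;
  }

  task->current_priority = ceiling_priority;
  return OBTAIN_SUCCESSFUL;
}
```

A task at priority 5 obtaining a lock with ceiling 3 runs at priority 3 while it holds the lock; a task at priority 2 is rejected and keeps its priority.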

3. Closing the Loop: Specification and Documentation

A protocol isn’t “ported” until it’s specified, documented, and testable through the formal RTEMS pipeline. This required work in two more repositories.

YAML Specifications (rtems-central, MR #15)

Branch: fmlp-definitions | Commits: aef88c2b, f348b3ef

I authored YAML specification files that define FMLP’s requirements in a format consumable by spec2modules.py, RTEMS’s requirement-to-code traceability toolchain:

spec/rtems/sem/req/fmlp-obtain.yml
spec/rtems/sem/req/fmlp-prio-change-while-waiting.yml
spec/rtems/sem/req/fmlp-uniprocessor-scheduler.yml
spec/score/tq/req/enqueue-fmlp.yml
spec/score/tq/req/surrender-fmlp.yml
spec/score/tq/req/timeout-fmlp.yml

These specs feed the pre-qualification infrastructure. In safety-critical domains (DO-178C, ISO 26262), every kernel behavior has to trace from a formal requirement to a test case. These YAML files are that link.

C-User Documentation (rtems-docs, MR #206)

Branch: feature-fmlp-docs | 4 commits

Commit	What It Does
`505777a`	Adds the BibTeX citation for Shi et al. 2021, the foundational FMLP research paper.
`4ebecec`	Documents `RTEMS_FLEXIBLE_MULTIPROCESSOR_LOCKING_SHORT` and `_LONG` attributes, their constraints (binary semaphores only), and mutual exclusivity rules.
`2d56393`	Explains FMLP-S vs FMLP-L differences, adds fatal error documentation for sticky lock violations.
`226a7b4`	Adds FMLP-S and FMLP-L to the glossary. Updates index files so the new sections are discoverable in the c-user guide.

4. The Contribution Map

#	Repository	MR	Files Changed	Summary
1	`rtems`	#882	29 (+2,393 lines)	FMLP-S/L SuperCore port + Classic API + 3 regression tests
2	`rtems-central`	#15	6+	YAML requirement specs for pre-qualification
3	`rtems-docs`	#206	4+	C-User documentation, glossary, BibTeX citation
4	`rtems`	#835	1	SPARC `register` to `__asm__` register variable
5	`rtems`	#879	1	POSIX unused-parameter warning fix
6	`rtems`	#1109	1 (+456/-456)	CI YAML formatting compliance
7	`rtems-source-builder`	#199	1 (+8/-2)	Python 3.6 version enforcement

Total: 7 Merge Requests across 4 repositories.

5. The Bridge: From FMLP to DFLP

Everything above is prologue. Here’s the thesis:

FMLP’s Sticky Scheduling is what showed me the need for DFLP’s Task Migration.

When I implemented _FMLPS_Claim_ownership() and _FMLPS_Surrender(), I had to deeply understand why sticky scheduling exists: it pins a thread to a CPU so that Per_CPU_Control dispatch counters stay symmetric. It solves software contention well. But it creates a fundamental physical limitation: a task pinned to Core 3 can never reach hardware on Core 1.

DFLP takes the opposite approach. Instead of pinning, it migrates. Instead of incrementing a sticky counter, it manipulates Scheduler_Node assignments, overrides CPU affinity via _Thread_Set_CPU(), and uses the STATES_LIFE_IS_CHANGING lifecycle bit to protect the thread during the in-flight window.

The SuperCore primitives are the same ones I already know:

Thread_Control: the TCB I traced through the dispatch panic.
Scheduler_Node: the priority aggregation structure I wired into FMLP’s ceiling logic.
Per_CPU_Control: the dispatch counter whose asymmetry I debugged for days.
_Thread_queue_Enqueue_sticky() / _Thread_queue_Surrender_sticky(): the exact enqueue paths DFLP will replace with migration-aware equivalents.
Red-Black Trees: the priority aggregation infrastructure that will track cross-CPU priority boosts during migration.

Why 350 Hours is Realistic

The DFLP port is not an exploratory research project. It’s a structured engineering task:

1. Import the DFLP research implementation (same source repository as FMLP).
2. Modernize it for RTEMS 7 (same process I already completed for FMLP).
3. Integrate into the Classic API (same semaphore creation, obtain, release, delete paths).
4. Write regression tests (same test framework, same Leon3 simulator).
5. Author YAML specifications (same spec2modules.py pipeline).
6. Write documentation (same c-user guide structure).
7. Formally verify with Promela/SPIN (the new piece that justifies the expanded timeline).

Steps 1-6 are identical in structure to the FMLP work. Step 7 is the new engineering challenge. The formal verification models have to be built from scratch because no existing Promela models of the RTEMS 7 SuperCore dispatcher exist.

6. Conclusion

With FMLP I solved the software contention problem: I debugged the dispatch disable symmetry, authored the sticky scheduling paths, wrote the ceiling priority logic, and validated the result with regression tests on a quad-core Leon3 simulator. I am now moving on to the core-localized hardware problem via DFLP.

The 350-hour GSoC 2026 timeline is not an estimate. It’s a structured work plan built on the exact same architectural patterns, development workflow, and testing infrastructure that this DevLog documents.

The SuperCore is not a black box to me. I’ve been inside it. I’ve broken it. I’ve fixed it. And I’m ready to extend it.

TL;DR

7 Merge Requests across 4 repositories (rtems, rtems-central, rtems-docs, rtems-source-builder).
The FMLP Port (MR #882): 29 files, +2,393 lines. Ported FMLP-Short (spinning/ceiling) and FMLP-Long (suspension/PI) into the RTEMS 7 SuperCore. Debugged INTERNAL_ERROR_BAD_THREAD_DISPATCH_DISABLE_LEVEL, a Per_CPU_Control dispatch disable counter asymmetry caused by cross-CPU sticky scheduling.
The Tests: spfmlp01 (FMLP-L suspension), spfmlp02 (FMLP-S spinning), spfmlp03 (FIFO + ceiling). All pass on the quad-core Leon3 simulator.
Tooling: YAML requirement specs (MR #15) for spec2modules.py pre-qualification. C-User documentation (MR #206) with BibTeX, attributes, fatal errors, glossary.
Infrastructure: SPARC register optimization (MR #835), POSIX warning fix (MR #879), CI YAML formatting (MR #1109), RSB Python 3.6 check (MR #199).
The Bridge: FMLP’s Sticky Scheduling showed me the need for DFLP’s Task Migration. The SuperCore primitives (Thread_Control, Scheduler_Node, Per_CPU_Control, RB-Trees) are internalized. The 350-hour DFLP timeline is a structured engineering plan, not an estimate.