When we talk about Symmetric Multiprocessing (SMP), the word “symmetric” sets a very specific expectation: every processor in the system is created equal. In a pure, theoretical SMP model, any thread can execute on any core at any given time, with the exact same access to memory, hardware peripherals, and interrupts.
But as any embedded systems engineer knows, the theoretical model rarely survives contact with real silicon. Real multi-core hardware is intrinsically asymmetric at the physical level. This discrepancy creates what we call the Core-Localized Hardware Problem.
Let’s explore why this problem exists, why it breaks pure SMP paradigms, and how RTEMS bridges the gap between SMP software runtimes and asymmetric hardware realities.
1. The Illusion of Perfect Symmetry
In modern systems-on-chip (SoCs), resources are localized to specific cores for performance and power-efficiency reasons. Some typical core-localized resources include:
- Private Memory and Caches: While main memory (RAM) is shared, L1 caches (and often L2 caches) are strictly private to each core. Furthermore, in NUMA (Non-Uniform Memory Access) architectures, different cores have significantly different latencies when accessing different banks of memory.
- Private Peripheral Interrupts (PPIs): On architectures like ARM, the Generic Interrupt Controller (GIC) explicitly defines PPIs. These are interrupts physically routed to one specific core (for example, a core-local timer or watchdog).
- Asymmetric Peripherals: An SoC might feature a hardware cryptography engine or a DSP block that only resides on the bus adjacent to Core 0.
The Problem for Pure SMP
If a pure SMP Real-Time Operating System blindly schedules a high-priority task to run on whichever core happens to be idle, it walks right into a trap:
- Lost Interrupts: If a task that relies on a Private Peripheral Interrupt migrates from Core 0 to Core 1, it will suddenly stop receiving its hardware interrupts. Core 1 physically cannot see the PPI intended for Core 0.
- Cache Thrashing: If a task is forcibly migrated to a new core every time it yields, it leaves its “hot” L1 cache behind, and the new core must refetch the task’s working set from main memory. In hard real-time systems, this variability in execution time breaks determinism and ruins Worst-Case Execution Time (WCET) calculations.
2. AMP: The Natural Solution
Asymmetric Multiprocessing (AMP) avoids the Core-Localized Hardware Problem entirely by embracing the asymmetry.
In an AMP system, you run a distinct, separate instance of an Operating System (or bare-metal application) on each core. The OS on Core 0 is statically configured to handle Core 0’s PPIs and local peripherals. The OS on Core 1 handles Core 1’s resources.
Because tasks never migrate between OS instances, they never leave their private caches behind, and they never lose their localized hardware. The software maps perfectly 1:1 with the uneven hardware layout. However, this comes at the cost of massive software complexity: developers must manually balance the load, and the cores must communicate over complex message-passing frameworks (like the RTEMS Multiprocessing Manager / MPCI).
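To give a feel for the AMP programming model, here is a hedged sketch of Classic API message passing between nodes via the Multiprocessing Manager. It assumes an application already configured for multiprocessing; the object name, queue parameters, and error handling are illustrative, not taken from any real system.

```c
#include <rtems.h>

/* Node 1: create a message queue visible to all nodes via RTEMS_GLOBAL. */
static void producer_node_setup( void )
{
  rtems_id          queue_id;
  rtems_status_code sc;

  sc = rtems_message_queue_create(
    rtems_build_name( 'I', 'P', 'C', 'Q' ),
    8,                          /* at most 8 pending messages */
    sizeof( uint32_t ),         /* each message is one word   */
    RTEMS_GLOBAL | RTEMS_FIFO,  /* RTEMS_GLOBAL exports it over MPCI */
    &queue_id
  );
  if ( sc != RTEMS_SUCCESSFUL ) {
    /* handle error */
  }
}

/* Node 2: look the queue up across all nodes, then send to it. */
static void consumer_node_send( void )
{
  rtems_id          queue_id;
  uint32_t          word = 42;
  rtems_status_code sc;

  sc = rtems_message_queue_ident(
    rtems_build_name( 'I', 'P', 'C', 'Q' ),
    RTEMS_SEARCH_ALL_NODES,
    &queue_id
  );
  if ( sc == RTEMS_SUCCESSFUL ) {
    (void) rtems_message_queue_send( queue_id, &word, sizeof( word ) );
  }
}
```

The RTEMS_GLOBAL attribute is what exports the queue across the MPCI link; without it, each node’s objects are invisible to its neighbors. Every such channel has to be designed by hand, which is exactly the complexity cost described above.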
3. How RTEMS Solves Core-Localization in SMP
The Holy Grail is keeping the programming simplicity of modern SMP—where the OS manages the workload—while respecting the core-localized realities of the hardware. RTEMS tackles this through several advanced scheduling controls:
Clustered Scheduling
RTEMS doesn’t force a single, global queue for all cores. Instead, it utilizes Clustered Scheduling. The system designer can partition cores into clusters (e.g., grouping cores that share an L2 cache into one cluster). High-priority tasks are scheduled only within their assigned cluster. This bounds thread migration to cores that share physical locality, eliminating catastrophic cache-thrashing across the entire SoC.
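Concretely, clusters are described at application configuration time. The fragment below sketches two EDF-SMP clusters on a four-core part using the Classic API configuration macros; the macro names follow recent RTEMS releases (5/6) and the cluster names are invented, so treat this as a starting point against your version’s documentation rather than a drop-in configuration.

```c
#define CONFIGURE_MAXIMUM_PROCESSORS 4

#define CONFIGURE_SCHEDULER_EDF_SMP

#include <rtems/scheduler.h>

/* Two scheduler instances: one per cluster (names are illustrative). */
RTEMS_SCHEDULER_EDF_SMP( io );
RTEMS_SCHEDULER_EDF_SMP( work );

#define CONFIGURE_SCHEDULER_TABLE_ENTRIES \
  RTEMS_SCHEDULER_TABLE_EDF_SMP( io,   rtems_build_name( 'I', 'O', ' ', ' ' ) ), \
  RTEMS_SCHEDULER_TABLE_EDF_SMP( work, rtems_build_name( 'W', 'O', 'R', 'K' ) )

/* Processors 0-1 form the "io" cluster, processors 2-3 the "work"
 * cluster -- e.g. matching which cores share an L2 cache. */
#define CONFIGURE_SCHEDULER_ASSIGNMENTS \
  RTEMS_SCHEDULER_ASSIGN( 0, RTEMS_SCHEDULER_ASSIGN_PROCESSOR_MANDATORY ), \
  RTEMS_SCHEDULER_ASSIGN( 0, RTEMS_SCHEDULER_ASSIGN_PROCESSOR_MANDATORY ), \
  RTEMS_SCHEDULER_ASSIGN( 1, RTEMS_SCHEDULER_ASSIGN_PROCESSOR_MANDATORY ), \
  RTEMS_SCHEDULER_ASSIGN( 1, RTEMS_SCHEDULER_ASSIGN_PROCESSOR_MANDATORY )
```

Tasks created in a cluster stay within it: migration is bounded to cores that share physical locality.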
Task Processor Affinity
RTEMS allows developers to define Task Affinity. If a task interacts with a crypto-engine that is physically close to Core 0, or relies on an ARM PPI hardwired to Core 1, the developer can explicitly set the affinity of that task to only execute on that specific processor. The scheduler respects this core-localization, never migrating the task away from the hardware it needs.
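Affinity is set through the POSIX-style `cpu_set_t` API. A minimal sketch, assuming processor 0 hosts the core-local device (the helper name is mine, not an RTEMS API):

```c
#include <rtems.h>

/* Restrict the calling task to processor 0 -- e.g. the core adjacent
 * to a crypto-engine or the target of a hardwired PPI. */
static rtems_status_code pin_self_to_cpu0( void )
{
  cpu_set_t cpuset;

  CPU_ZERO( &cpuset );
  CPU_SET( 0, &cpuset );   /* only processor 0 is eligible */

  return rtems_task_set_affinity( RTEMS_SELF, sizeof( cpuset ), &cpuset );
}
```

From this point on, the scheduler will never dispatch the task to any other processor, so its core-local interrupts and cache stay reachable.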
Thread Pinning
Sometimes a task only needs local hardware for a fraction of a millisecond. RTEMS provides an internal mechanism called Thread Pinning (_Thread_Pin). A thread can temporarily pin itself to its current processor, ensuring that the scheduler will not migrate it during a critical section where it is manipulating per-processor data structures or local timers.
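Inside the kernel, the pattern looks roughly like the sketch below. Note that `_Thread_Pin`/`_Thread_Unpin` live in the superscore and are not part of the public Classic API, and the signatures shown are approximations from recent RTEMS sources; application code should rely on affinity instead.

```c
/* Kernel-internal sketch only -- NOT a public API.  Signatures are
 * approximate; see cpukit/include/rtems/score/threadimpl.h in your
 * RTEMS source tree for the real declarations. */
Thread_Control *executing = _Thread_Get_executing();

_Thread_Pin( executing );
/* ... manipulate per-processor data or a core-local timer; the
 * scheduler will not migrate this thread while the pin is held ... */
_Thread_Unpin( executing, _Per_CPU_Get() );
```

Because the pin is scoped to a short critical section, the thread regains full migratability the moment it no longer needs the local hardware.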
Interrupt Locks (Instead of Disabling Interrupts)
In uniprocessor systems, disabling local interrupts is a quick way to protect data. On SMP, disabling Core 0’s interrupts does nothing to stop Core 1. RTEMS replaces these legacy paradigms with Interrupt Locks, combining SMP ticket locks with local interrupt disabling. This allows safe atomic access to shared state without ignoring the reality of the multi-processor interrupt layout.
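In the Classic API this looks as follows; a hedged sketch in which the lock name and shared counter are illustrative:

```c
#include <rtems.h>

static uint32_t shared_counter;

/* One lock protecting the counter.  On SMP builds this expands to a
 * ticket lock plus local interrupt disabling; on uniprocessor builds
 * it degenerates to plain interrupt disabling. */
RTEMS_INTERRUPT_LOCK_DEFINE( static, counter_lock, "counter" )

static void increment_counter( void )
{
  rtems_interrupt_lock_context lock_context;

  /* Disables interrupts locally AND takes the SMP lock, so neither a
   * local ISR nor another core can race on shared_counter. */
  rtems_interrupt_lock_acquire( &counter_lock, &lock_context );
  ++shared_counter;
  rtems_interrupt_lock_release( &counter_lock, &lock_context );
}
```

The same source works on both uniprocessor and SMP configurations, which is why RTEMS deprecates bare interrupt-disable critical sections in favor of these locks.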
4. Where RTEMS Currently Lags (Limitations & Future Work)
As powerful as RTEMS is, it is important to understand where the framework’s multiprocessing capabilities currently lag behind bleeding-edge requirements or general-purpose OS counterparts:
- No Dynamic Load Balancing in AMP: In RTEMS’s AMP implementation (the Multiprocessing Manager / MPCI), tasks cannot be dynamically migrated from an overloaded node to an idle node at runtime. The distribution of tasks to processors must be completely mapped out during the application design phase.
- Arbitrary Processor Affinity is a Proof-of-Concept: While task affinity works for basic pinning, the RTEMS scheduler that supports arbitrary subsets of processors as affinities is still officially considered a proof-of-concept implementation.
- Scaling Beyond 32 Cores: The default global fixed-priority scheduler in RTEMS SMP is optimized for up to 32 processors. While RTEMS has experimental support for structures like MCS locks (which scale better across massive NUMA systems with 64+ cores), massive-scale NUMA is not yet the out-of-the-box standard.
Conclusion
The “Core-Localized Hardware Problem” shows that true symmetry is largely a software abstraction floating on top of asymmetric silicon. While AMP systems accept this asymmetry by physically dividing the software, RTEMS SMP empowers developers to use modern multiprocessing intelligently. By utilizing Clustered Scheduling and Task Affinity, RTEMS allows you to reap the throughput benefits of SMP without abandoning the hard real-time determinism demanded by core-localized hardware.
TL;DR
- The Hardware Problem: Real silicon is rarely perfectly symmetric. Some hardware (like a crypto-engine, Private Peripheral Interrupts, or private L1 caches) is physically hard-wired to a specific core.
- The SMP Flaw: If an OS freely migrates a task to whichever core is idle, the task leaves its localized hardware behind or loses its warm private cache.
- The RTEMS Solution: Instead of physically splitting the OS (AMP), RTEMS preserves real-time localization through Clustered Scheduling, explicit Task Affinity binding, and temporary Thread Pinning.