In our article covering the scheduling algorithms of real-time operating systems (RTOS), we stated that they can run tasks in a way that gives the impression of multitasking. This is achieved by giving the RTOS the ability to interrupt the currently executing task and start executing another one. At some point, the interrupted task should resume its operation. When that happens, the microprocessor must be put in the same state it was in when the task was interrupted. This is done using a mechanism called context switching.
Each task uses a specific set of resources when it is executing. These include CPU registers, system status flags, access to memory (heap, stack), etc. Together, these resources are what we call the task state (also known as the task execution context). Context switching is the process of saving the state of the current task (so that it can be restored later) and replacing it with the previously saved state of another task.
Task context switching guarantees that each task sees the CPU as its own. This mimics the behavior of real multitasking, where each task should have its own dedicated CPU.
Context Switching Basics
Context switching is not a mechanism used only in real-time operating systems. Every microprocessor uses some form of context switching when an exception occurs and a service routine has to be executed. In most modern CPU architectures the exception context switching is usually handled partly by the hardware (some registers are automatically saved) and partly by the compiler-generated code.
Task context switching in real-time operating systems is implemented as part of their source code. Although it is handled using software, context switching is hardware dependent, as the resources needed may differ from one microprocessor to another. This means that the code for task context switching must be ported for each CPU architecture.
Implementation Details
Now we will analyze how task context switching can be implemented. To start, we should make sure that each task has its own private stack. In addition to serving as a regular application stack, this private stack is also used to store the task state (CPU registers, return address, stack pointer value, etc.). In basic “bare-bone” applications we usually have only one stack for the whole program. The obvious question is: how can we implement an individual stack for each task? This is accomplished by modifying the value of the stack pointer register. The stack is just a section of volatile memory (RAM) that we “reserve” for stack operations. The location of this section is pointed to by the stack pointer. The basic principle is that each task has a specific area of memory for its stack. The start address of this stack is stored in a variable, so it can be loaded when the task is activated.
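To make the idea of private stacks more concrete, here is a minimal sketch in C. It assumes a full-descending stack (as used on ARM Cortex-M), where the stack pointer starts just past the highest word of the reserved area and moves down on every push. The names `STACK_WORDS`, `stack_initial_top` and `stacks_init` are hypothetical and are not taken from any particular RTOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64u  /* hypothetical stack size, in 32-bit words */

/* Each task gets its own region of RAM reserved for its stack. */
static uint32_t task1_stack[STACK_WORDS];
static uint32_t task2_stack[STACK_WORDS];

/* Variables holding the stack address to load when a task is activated. */
static uint32_t *task1_sp;
static uint32_t *task2_sp;

/* Assumption: full-descending stack, so an empty stack starts
 * one word past the end of the reserved area. */
static uint32_t *stack_initial_top(uint32_t *stack, size_t words)
{
    assert(stack != NULL && words > 0);
    return stack + words;
}

static void stacks_init(void)
{
    task1_sp = stack_initial_top(task1_stack, STACK_WORDS);
    task2_sp = stack_initial_top(task2_stack, STACK_WORDS);
}
```

On real hardware, the value held in `task1_sp` or `task2_sp` would be written to the CPU stack pointer register by a few lines of architecture-specific code when the corresponding task is activated.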
Memory Allocation
RTOS kernel objects such as tasks, semaphores, etc. can be allocated dynamically or statically (during compilation). For dynamic task allocation, the RTOS usually provides different schemes. For example:
- the space for the task is allocated once and never freed
- the space allocated for the task on the heap is freed once the task has completed its operation
The most suitable dynamic allocation scheme depends on the application complexity and the resource constraints of the embedded system.
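The allocation options above can be sketched as follows. This is only an illustration under stated assumptions: it uses the C library `malloc()`/`free()` in place of an RTOS heap, and `task_mem_t`, `task_mem_alloc`, `task_mem_use_static` and `task_mem_release` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t *stack;       /* base address of the task's stack area */
    size_t    stack_words; /* stack size in 32-bit words */
    int       on_heap;     /* 1 if the stack was obtained from the heap */
} task_mem_t;

/* Dynamic allocation: the stack area is taken from the heap at run time. */
static int task_mem_alloc(task_mem_t *m, size_t words)
{
    assert(m != NULL && words > 0);
    m->stack = malloc(words * sizeof(uint32_t));
    if (m->stack == NULL) {
        return -1; /* out of heap memory */
    }
    m->stack_words = words;
    m->on_heap = 1;
    return 0;
}

/* Static allocation: the caller supplies a buffer reserved at compile time. */
static void task_mem_use_static(task_mem_t *m, uint32_t *buf, size_t words)
{
    assert(m != NULL && buf != NULL && words > 0);
    m->stack = buf;
    m->stack_words = words;
    m->on_heap = 0;
}

/* "Free on completion" scheme: called once the task has finished.
 * Under the "allocate once, never free" scheme this is simply never called. */
static void task_mem_release(task_mem_t *m)
{
    assert(m != NULL);
    if (m->on_heap) {
        free(m->stack);
    }
    m->stack = NULL;
    m->stack_words = 0;
    m->on_heap = 0;
}
```

Never freeing avoids heap fragmentation and keeps memory usage predictable, while freeing on completion lets short-lived tasks give their memory back at the cost of a more complex heap.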

Now let’s see how much memory an RTOS task requires. We are not focusing on a specific RTOS distribution and we will try to cover the things that are common across all of them.
Each created task should have a task control information memory area and a stack memory area. This is shown on the left side of Fig. 1. The task control information area has a fixed size and may include:
- task’s name (a pointer to the C function implementing the task)
- debug information
- the size of the task’s stack
- top of the stack pointer (address)
- task priority
All tasks are placed in the heap memory.
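As an illustration, the task control information area could be described by a C structure like the one below. This is a sketch, not the layout of any specific RTOS: the type `tcb_t`, its field names, the helper `tcb_init` and the example task `blink_task` are all hypothetical, and a full-descending stack is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*task_fn_t)(void *arg);

typedef struct {
    uint32_t   *top_of_stack; /* saved stack pointer of the task */
    uint32_t   *stack_base;   /* lowest address of the stack area */
    size_t      stack_words;  /* size of the task's stack, in 32-bit words */
    const char *name;         /* human-readable name, useful for debugging */
    task_fn_t   entry;        /* C function implementing the task */
    uint8_t     priority;     /* task priority */
} tcb_t;

/* Fill in the control information for a task whose stack is already reserved. */
static void tcb_init(tcb_t *tcb, const char *name, task_fn_t entry,
                     uint32_t *stack, size_t stack_words, uint8_t priority)
{
    assert(tcb != NULL && entry != NULL && stack != NULL && stack_words > 0);
    tcb->name = name;
    tcb->entry = entry;
    tcb->stack_base = stack;
    tcb->stack_words = stack_words;
    tcb->priority = priority;
    /* Assumption: full-descending stack, so an empty stack starts at the end. */
    tcb->top_of_stack = stack + stack_words;
}

/* Hypothetical task body, used only to have something to point at. */
static void blink_task(void *arg)
{
    (void)arg;
}
```

The structure has a fixed size regardless of how large the task's stack is, because it only stores a pointer to the stack area and its size.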
Context Switching Flow
As a final step, let’s analyze a context switching flow using an example involving two tasks: Task 1 and Task 2. Task 1 is currently running, while Task 2, which has a higher priority, has just become ready to run. This situation requires a context switch, and the steps involved are the following:
1. Task 1 is executing.
2. The RTOS tick exception is generated.
3. The hardware automatically saves some registers onto the current task’s stack. Which registers are saved depends on the CPU architecture.
   - ARM Cortex-M automatically saves R0-R3, R12, LR (R14), the return address and xPSR.
4. The RTOS handler function for the tick exception stores any additional registers that are part of the current task state (see Fig. 2).
   - For ARM Cortex-M this handler function should save registers R4-R11 and R14.
5. The stack pointer value (the address of the last register pushed onto the stack) is saved in Task 1’s task control information area.
6. The handler checks whether a higher-priority task is waiting to run; in our case, this is Task 2. The CPU stack pointer register is loaded with the stack pointer value stored in Task 2’s task control information area.
7. We are still in the RTOS handler function for the tick exception, but now the stack pointer points to the last entry in Task 2’s stack area. The handler now restores the registers that it saves on entry (step 4). Note that the context saved on entry was Task 1’s, but the context being restored is Task 2’s.
8. When the handler exits, the hardware automatically restores the registers it saved in step 3. Note again that the hardware is restoring the context of Task 2, because the stack pointer was loaded with Task 2’s stack address in step 6.
9. Now we are out of the exception handling routine. Execution continues, but not in the task that was interrupted (Task 1); Task 2 runs instead. The context switch is complete!
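The flow above can be simulated in portable C. A real port performs these steps in a few lines of assembly inside the exception handler; the sketch below only models the CPU as a structure so the stack manipulation can be followed on a PC. All names (`cpu_t`, `task_t`, `context_switch`, etc.) are hypothetical, and the register counts follow the ARM Cortex-M example from the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_REGS     8   /* R0-R3, R12, LR, return address, xPSR */
#define SW_REGS     9   /* R4-R11, R14 */
#define STACK_WORDS 64

typedef struct {
    uint32_t  hw[HW_REGS]; /* registers saved automatically by the hardware */
    uint32_t  sw[SW_REGS]; /* registers saved by the RTOS handler */
    uint32_t *sp;          /* stack pointer register */
} cpu_t;

typedef struct {
    uint32_t  stack[STACK_WORDS]; /* private stack of the task */
    uint32_t *saved_sp;           /* stack pointer kept in the task control area */
} task_t;

/* Give every simulated register a recognizable value derived from 'base'. */
static void cpu_fill(cpu_t *cpu, uint32_t base)
{
    for (int i = 0; i < HW_REGS; i++) cpu->hw[i] = base + (uint32_t)i;
    for (int i = 0; i < SW_REGS; i++) cpu->sw[i] = base + 16u + (uint32_t)i;
}

/* Full-descending push: decrement the stack pointer, then store. */
static void push_words(cpu_t *cpu, const uint32_t *src, int n)
{
    for (int i = n - 1; i >= 0; i--) *--cpu->sp = src[i];
}

/* Matching pop: load, then increment the stack pointer. */
static void pop_words(cpu_t *cpu, uint32_t *dst, int n)
{
    for (int i = 0; i < n; i++) dst[i] = *cpu->sp++;
}

/* Build an initial saved context so the task can be switched in. */
static void task_prepare(task_t *t, uint32_t base)
{
    cpu_t tmp;
    cpu_fill(&tmp, base);
    tmp.sp = t->stack + STACK_WORDS;
    push_words(&tmp, tmp.hw, HW_REGS);
    push_words(&tmp, tmp.sw, SW_REGS);
    t->saved_sp = tmp.sp;
}

static void context_switch(cpu_t *cpu, task_t *from, task_t *to)
{
    assert(to->saved_sp != NULL);
    push_words(cpu, cpu->hw, HW_REGS); /* hardware saves part of the context */
    push_words(cpu, cpu->sw, SW_REGS); /* handler saves the remaining registers */
    from->saved_sp = cpu->sp;          /* remember where 'from' left off */
    cpu->sp = to->saved_sp;            /* switch to the other task's stack */
    pop_words(cpu, cpu->sw, SW_REGS);  /* handler restores its registers */
    pop_words(cpu, cpu->hw, HW_REGS);  /* hardware restores on exception exit */
}
```

The save and restore code is identical for every task. Which task resumes is decided only by the value loaded into the stack pointer between the two halves of `context_switch`.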

All the “magic” of performing a task context switch is directly related to the manipulation of the stack pointer value for achieving an individual stack area for each task.
I am confused by the comment “All tasks are placed in the heap memory.” I thought the stack and the heap share the RAM, that the heap is for dynamic allocation, and that the stack area is for all function calls and other stack information. Could you explain what you mean by that comment?
Your statement is correct. The heap and the stack both share the available RAM space. RTOS tasks (aka threads) can be dynamically created and deleted during program execution, and thus their task memory structures (see Fig. 1) are placed in the heap region of the memory. The stack for each RTOS task is also part of this memory structure stored in the heap. Each task has its own isolated stack.
On a side note: some RTOS implementations allow creating statically allocated tasks, which do not use the heap region.
Thanks for the question Carlos!