In programming, there are many definitions of what a function is, but in general a function is a named piece of code that performs a specific task. Some languages use other names for functions (e.g. methods, subroutines). The body of a function is ordinary code; what sets it apart is that some additional operations have to be performed when entering and exiting it. In this article, we explain how a function call is handled by a microprocessor, with examples for the ARM Cortex-M4.

It is important to have a basic knowledge of how a function call is handled by the processor. Functions make a program more manageable, but each function call requires additional instructions to be executed. How many and what type of instructions are required depends on the instruction set of the microprocessor and on the compiler that is used. The rules that define these operations are known as the function calling convention.

These are the common resources required by most of the calling conventions:

  • Branch instruction – it modifies the program counter so we can go to the function.
  • CPU registers – they can hold some of the function arguments.
  • Stack – it stores some of the function arguments, local variables, return address of the function, etc.
  • Return instruction – so we can return to the address of the instruction that has to be executed after the function is completed.

Entering a Function (Function Prologue)

A computer program is executed instruction by instruction, and if it weren’t for the various branch instructions, the flow of the program would be pretty straightforward. A function can be called many times from different places in the program, and its code is placed at a specific memory location. Using a branch instruction, we can set the program counter to the memory address where the function code begins and continue executing from there. We also need to store the address that the program counter should return to after exiting the function. This is the address of the next instruction after the function call. On the Cortex-M4 the bl instruction occupies 4 bytes, so in the example shown below this is address 0x00e8:

00e2:movs r0, #7  
00e4:bl   0x32 <function1> ; from here we branch to the code of function1
00e8:mov  r5,r7  ; the program counter should go to this address (0x00e8) after exiting function1

For branching to the start address of the function we use the bl instruction. This instruction also stores the return address in register r14 (the link register (LR)).

How are function arguments placed in the memory?

Before calling the branch instruction for entering the function, we must place the arguments of the function in general-purpose registers or the stack, depending on the case. This operation is known as marshaling the arguments.

The C language itself does not define how this marshaling of the arguments should be handled. Whether function arguments are put on the stack or in CPU registers depends on the microprocessor architecture and the compiler. The procedure call standard for the ARM architecture defines that the first four 32-bit arguments of a function are placed in registers r0-r3 and the rest are pushed onto the stack.

Placing a variable on the stack can be done using the following instructions:

mov r0, #10 ; put the value 10 in register r0
str r0, [sp,#0] ; store the value of r0 in the first position in the stack

Registers r0-r3 are also known as caller-save or scratch registers. The called function can modify them freely, and it’s up to the caller to save their values, if needed, before branching to the function code.

Registers r4-r11 are known as callee-save registers. This means that their values before entering the function and after returning from it must match. If the function needs to use them, their original values have to be saved and then restored right before returning. This is done by pushing these registers onto the stack as the first thing when the function is entered, for example with the following code:

push {r7,r8,r9} ; push the registers that will be used by the function, so they can be restored before returning to the caller

Returning From a Function (Function Epilogue)

When we reach the end of the function, we need to set the program counter to the address of the instruction that has to be executed next (the instruction following the bl instruction used for entering the function). This return address is held in the link register (r14), where it was stored automatically when the bl instruction was executed.

As already mentioned, if any callee-save registers were pushed onto the stack, they have to be popped before returning from the function. This can be done with the following instruction:

pop {r7,r8,r9} ; retrieve the saved values of r7,r8,r9 registers from the stack

The return value is passed back in the low registers of the r0-r3 range: a 32-bit value is returned in r0 and a 64-bit value in r0 and r1. The bx instruction is used for returning from a function in the following way:

bx lr ; branch to the address that is stored in lr(r14) register (this is the return address of the function)

Complete ARM Cortex-M Example

In this section, we will use a simple C function called from the main function of a program and show what assembly code is generated after compilation.

The function we are using is called sum and it returns the sum of two 32-bit integers.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint32_t result;

/* Return the sum of two 32-bit unsigned integers. */
uint32_t sum (uint32_t a, uint32_t b) {

	return a+b;
}

int main(void) {

    /* Enter an infinite loop. */
    while(1) {
        /* All other program logic removed for clarity of the example */
        result = sum(5,10);
    }
    return 0 ;
}

Shown below is the assembly code generated by a compiler configured for maximum optimization.

000000e2:   movs r1, #10  ; put the second argument of the function (b) in r1
000000e4:   movs r0, #5   ; put the first argument of the function (a) in r0
000000e6:   bl 0x25d8 <sum> ; from here we go to the subroutine code 
000025d8:   adds r0, r0, r1  ;Sum register r0 and r1 and store the result in r0
000025da:   bx  lr ; return from the function

The assembly code of the same program is shown below, but this time the compiler is configured for no optimization. This results in more instructions for performing the same task. The compiler generates additional code for the stack frame pointer and stores all arguments on the stack, even though, as already mentioned, they are passed in registers r0-r3.

000000e2:  movs r1, #10  ; put the second argument of the function (b) in r1
000000e4:  movs r0, #5   ; put the first argument of the function (a) in r0
000000e6:  bl 0x25d8 <sum> ; from here we go to the subroutine code
000025d8:   push    {r7}  ; r7 is a callee-save register that is used in this function, so save it first
000025da:   sub     sp, #12  ; reserve 12 bytes (3 words) on the stack; a and b are stored in two of them
000025dc:   add     r7, sp, #0 ; r7 now points to the same address as sp, with no offset
000025de:   str     r0, [r7, #4]  ; store r0 (variable a) in stack 
000025e0:   str     r1, [r7, #0]  ; store r1 (variable b) in stack
000025e2:   ldr     r2, [r7, #4] ; load variable a into r2
000025e4:   ldr     r3, [r7, #0] ; load variable b into r3
000025e6:   add     r3, r2       ; add r3 and r2 and store result in r3
000025e8:   mov     r0, r3   ; move the result to r0
000025ea:   adds    r7, #12  ; r7 now holds the value sp had before the sub instruction
000025ec:   mov     sp, r7  ; return sp to its original position
000025ee:   ldr.w   r7, [sp], #4  ; restore r7 original value
000025f2:   bx      lr  ; return from the function

