To better understand the little-endian and big-endian formats, we first have to look at how memory is accessed. The memory space of a microprocessor is expected to be byte-addressable. A CPU with a 32-bit address bus can therefore access a total of 2^32 addresses, each holding 1 byte.
Why do we need byte-addressable memory in a 32-bit processor? Because programming languages have data types smaller than 32 bits. For example, the C language supports 8-bit, 16-bit, 32-bit, and 64-bit data types. Can we use only 32-bit variables? Sure, we can, but that would use the available memory inefficiently. This is why modern 32-bit processors support 8-bit, 16-bit, and 32-bit transfers (see Fig. 1). There are memory access instructions for each supported data size. For example, ARM Cortex-M processors have the following versions of load instructions: LDRB (load byte), LDRH (load halfword), LDR (load word), etc.
Endianness refers to the way bytes are ordered when a data item with a size bigger than 1 byte (e.g. 32-bit variable) is placed in memory or transmitted over a communication interface.
There are two types of endianness:
- Little-endian – The bytes are ordered with the least significant byte placed at the lowest address.
- Big-endian – The bytes are ordered with the most significant byte placed at the lowest address.
A comparison of how data items are placed in memory using little-endian and big-endian is shown in Fig. 3. For simplicity, we are using memory with 8 addresses. Data 1 is stored first and, as a 32-bit variable, takes the first four memory addresses (remember that each address stores 1 byte). The lowest memory address for Data 1 is address 0. In little-endian format, Data 1 is stored with its least significant byte (0x29) at the lowest address, the next byte (0xA4) at the following address, and so on. In big-endian format, Data 1 is stored with its most significant byte (0x65) at address 0, the next byte (0x73) at address 1, and so on.
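The layout described above can be checked on a real machine. The following is a minimal sketch, assuming Data 1 has the value 0x6573A429 (the value implied by the byte values quoted above). The function names `bytes_in_memory_order` and `is_little_endian` are hypothetical helpers introduced only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the four bytes of a 32-bit value into out[]
   in increasing address order, i.e. out[0] is the byte at the lowest address. */
void bytes_in_memory_order(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: returns 1 if the least significant byte sits at the
   lowest address (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    uint8_t b[4];
    bytes_in_memory_order(0x6573A429u, b); /* assumed value of Data 1 */
    return b[0] == 0x29;
}
```

On a little-endian CPU the array is expected to hold 0x29, 0xA4, 0x73, 0x65 in that order; on a big-endian CPU the order is reversed. Using `memcpy` rather than a pointer cast keeps the sketch free of alignment and aliasing concerns.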
There is no clear advantage to using one endianness format over the other. Many processors use big-endian and many use little-endian. There are also processors that can be configured to use one or the other.
Endianness does not have a big impact on the microprocessor hardware implementation, because it is relevant only to the operations that access data in memory. Keep in mind that the CPU registers and all other processor building blocks have no sense of endianness; they just operate on data of a certain size (e.g. 32 bits). Once data is retrieved from memory, it is placed in the CPU registers properly: for a 32-bit architecture, the most significant bit is at position 31 and the least significant bit is at position 0.
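One consequence of registers having no endianness is that arithmetic on values, such as shifting and masking, gives the same result on any CPU. The sketch below illustrates this; `byte_of` is a hypothetical helper name, and the sketch assumes the caller passes an index between 0 and 3.

```c
#include <stdint.h>

/* Hypothetical helper: return byte number `index` of `value`, where index 0
   is the least significant byte and index 3 the most significant one.
   Assumption: index is in the range 0..3 (larger shifts are undefined). */
uint8_t byte_of(uint32_t value, unsigned index)
{
    /* The shift operates on the value itself, not on its memory layout,
       so the result does not depend on the endianness of the CPU. */
    return (uint8_t)(value >> (8u * index));
}
```

For example, `byte_of(0x6573A429u, 0)` yields 0x29 and `byte_of(0x6573A429u, 3)` yields 0x65 on both little-endian and big-endian processors.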
Software Point of View
Should we bother with endianness when we use a high-level language like C? The short answer is yes. Although we can write programs without knowing the type of endianness used by the processor, we should be aware of the following scenarios where little-endian and big-endian formats are important:
- Communication between devices – There are situations where the communication protocol has a different endianness than the processor. A common example is the internet protocol (TCP/IP) suite, which is defined as big-endian. If it is used by a little-endian processor, we must make sure the endianness difference is taken into account and the byte order of the data to be sent or received is properly managed.
- Accessing memory using pointers – Endianness has an impact in cases when we use pointers and type casting. A practical example in C is shown in the code below.
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t data_1 = 0x6573A442;
        uint8_t data_2, data_3;
        uint8_t *p_data;

        data_2 = (uint8_t) data_1;    /* data_2 gets the value 0x42 (independent of the endianness) */
        p_data = (uint8_t *) &data_1; /* point p_data at the address of the data_1 variable */
        data_3 = *p_data;             /* What will the value of data_3 be on a big-endian CPU and on a little-endian CPU? */

        /* Little-endian: data_3 will have the value 0x42 (least significant byte of data_1)
           Big-endian:    data_3 will have the value 0x65 (most significant byte of data_1) */
        printf("data_2 = 0x%02X, data_3 = 0x%02X\n", (unsigned) data_2, (unsigned) data_3);
        return 0;
    }
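For the communication scenario in the list above, one common way to manage byte order is to build the big-endian byte sequence explicitly with shifts, so the same code works on any CPU. This is a minimal sketch, not a definitive implementation; `put_be32` and `get_be32` are hypothetical helper names introduced for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: write `value` to buf[0..3] in big-endian
   (network) byte order, most significant byte first. */
void put_be32(uint8_t buf[4], uint32_t value)
{
    buf[0] = (uint8_t)(value >> 24);
    buf[1] = (uint8_t)(value >> 16);
    buf[2] = (uint8_t)(value >> 8);
    buf[3] = (uint8_t)value;
}

/* Hypothetical helper: read a big-endian 32-bit value from buf[0..3]. */
uint32_t get_be32(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

Because the shifts work on values rather than on memory, no check of the host endianness is needed. On POSIX systems, `htonl()` and `ntohl()` from `<arpa/inet.h>` serve a similar purpose for 32-bit values.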
In general, endianness does not affect the way we write code, and it does not even affect the way we design microprocessors that much. However, in some instances, little-endian and big-endian formats have the potential to cause issues if not properly taken care of. Knowing the endianness of the microprocessor, all devices, and communication protocols that are going to be used in an embedded system is highly recommended.
“data_2 = (uint8_t) data_1; /* data_2 will get a value 0x42 (independent of the endianness) */” – I guess this is wrong.
Typecasting should be affected by the endianness. data_2 would be different on LE and BE.
If you are not directly manipulating memory addresses (using pointers) in your C program, then endianness does not affect the behavior of type casting.
In the particular example “data_2 = (uint8_t) data_1;” we have data_1 (a 32-bit data type) that is cast to an 8-bit data type, and the expected result is that we keep only the least significant byte of data_1. In our case, this least significant byte has a value of 0x42. It will be stored at a different address depending on the endianness (as shown in Fig. 3), but data_2 will always get the correct value of 0x42.