Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753252AbXKWAQQ (ORCPT ); Thu, 22 Nov 2007 19:16:16 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751409AbXKWAQA (ORCPT ); Thu, 22 Nov 2007 19:16:00 -0500 Received: from mtaout01-winn.ispmail.ntl.com ([81.103.221.47]:13413 "EHLO mtaout01-winn.ispmail.ntl.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751388AbXKWAP7 (ORCPT ); Thu, 22 Nov 2007 19:15:59 -0500 From: Daniel Drake To: linux-kernel@vger.kernel.org Cc: davem@davemloft.net Cc: kune@deine-taler.de Cc: johannes@sipsolutions.net Subject: [RFC] Documentation about unaligned memory access Message-Id: <20071123001554.12F8B9D4A1F@zog.reactivated.net> Date: Fri, 23 Nov 2007 00:15:53 +0000 (GMT) Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8893 Lines: 259 Being spoilt by the luxuries of i386/x86_64 I've never really had a good grasp on unaligned memory access problems on other architectures and decided it was time to figure it out. As a result I've written this documentation which I plan to submit for inclusion as Documentation/unaligned_memory_access.txt Before I do so, any comments on the following? Thanks, Daniel UNALIGNED MEMORY ACCESSES ========================= Linux runs on a wide variety of architectures which have varying behaviour when it comes to memory access. This document presents some details about unaligned accesses, why you need to write code that doesn't cause them, and how to write such code! What's the definition of an unaligned access? ============================================= Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0). For example, reading 4 bytes of data from address 0x10000004 is fine, but reading 4 bytes of data from address 0x10000005 would be an unaligned memory access. Why unaligned access is bad =========================== Most architectures are unable to perform unaligned memory accesses. Any unaligned access causes a processor exception. Some architectures have an exception handler implemented in the kernel which corrects the memory access, but this is very expensive and is not true for all architectures. You cannot rely on the exception handler to correct your memory accesses. In summary: if your code causes unaligned memory accesses to happen, your code will not work on some platforms, and will perform *very* badly on others. You may be wondering why you have never seen these problems on your own architecture. Some architectures (such as i386 and x86_64) do not have this limitation, but nevertheless it is important for you to write portable code that works everywhere. Natural alignment ================= The rule we mentioned earlier forms what we refer to as natural alignment: When accessing N bytes of memory, the base memory address must be evenly divisible by N, i.e. addr % N == 0 When writing code, assume the target architecture has natural alignment requirements. Sidenote: in reality, only a few architectures require natural alignment on all sizes of memory access. However, again we must consider ALL supported architectures; natural alignment is the only way to achieve full portability. Code that doesn't cause unaligned access ======================================== At first, the concepts above may seem a little hard to relate to actual coding practice. After all, you don't have a great deal of control over memory addresses of certain variables, etc. Fortunately things are not too complex, as in most cases, the compiler ensures that things will work for you. For example, take the following structure: struct foo { u16 field1; u32 field2; u8 field3; }; Let us assume that an instance of the above structure resides in memory starting at address 0x10000000. With a basic level of understanding, it would not be unreasonable to expect that accessing field2 would cause an unaligned access. You'd be expecting field2 to be located at offset 2 bytes into the structure, i.e. address 0x10000002, but that address is not evenly divisible by 4 (remember, we're reading a 4 byte value here). Fortunately, the compiler understands the alignment constraints, so in the above case it would insert 2 bytes of padding inbetween field1 and field2. Therefore, for standard structure types you can always rely on the compiler to pad structures so that accesses to fields are suitably aligned (assuming you do not cast the field to a type of different length). Similarly, you can also rely on the compiler to align variables and function parameters to a naturally aligned scheme, based on the size of the type of the variable. Sidenote: in the above example, you may wish to reorder the fields in the above structure so that the overall structure uses less memory. For example, moving field3 to sit inbetween field1 and field2 (where the padding is inserted) would shrink the overall structure by 1 byte: struct foo { u16 field1; u8 field3; u32 field2; }; Sidenote: it should be obvious by now, but in case it is not, accessing a single byte (u8 or char) can never cause an unaligned access, because all memory addresses are evenly divisible by 1. Code that causes unaligned access ================================= With the above in mind, let's move onto a real life example of a function that can cause an unaligned memory access. The following function adapted from include/linux/etherdevice.h is an optimized routine to compare two ethernet MAC addresses for equality. unsigned int compare_ether_addr(const u8 *addr1, const u8 *addr2) { const u16 *a = (const u16 *) addr1; const u16 *b = (const u16 *) addr2; return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0; } In the above function, the reference to a[0] causes 2 bytes (16 bits) to be read from memory starting at address addr1. Think about what would happen if addr1 was an odd address, such as 0x10000003. (Hint: it'd be an unaligned access) Despite the potential unaligned access problems with the above function, it is included in the kernel anyway but is documented to only work on 16-bit-aligned addresses. It is up to the caller to ensure this alignment or not use this function at all. This alignment-unsafe function is still useful as it is a decent optimization for the cases when you can ensure alignment. Here is another example of code that could cause unaligned accesses: void myfunc(u8 *data, u32 value) { [...] *((u32 *) data) = cpu_to_le32(value); [...] } This code will cause unaligned accesses every time the data parameter points to an address that is not evenly divisible by 4. Consider the following structure: struct foo { u16 field1; u32 field2; u8 field3; } __attribute__((packed)); It's the same structure as we looked at earlier, but the packed attribute has been added. This attribute ensures that the compiler never inserts any padding and the structure is laid out in memory exactly as is suggested above. The packed attribute is useful when you want to use a C struct to represent some data that comes in a fixed arrangement 'off the wire'. It should be clear why accessing fields of an instance of that structure could cause unaligned accesses in some situations. Even if the instance started at an address such as 0x10000000 where accessing field1 would not cause an unaligned access, accessing field2 would be reading 4 bytes from 0x10000002, which, is an unaligned access. The compiler didn't jump to your rescue and insert padding because you asked it not to. In summary, the 3 main scenarios where you may run into unaligned access problems involve: 1. Recasting variables to types of different lengths 2. Pointer arithmetic followed by access to at least 2 bytes of data 3. Accessing elements of packed structures Avoiding unaligned accesses =========================== Going back to an earlier example: void myfunc(u8 *data, u32 value) { [...] *((u16 *) data) = cpu_to_le32(value); [...] } To avoid the unaligned memory access, you could rewrite it as follows: void myfunc(u8 *data, u32 value) { [...] value = cpu_to_le32(value); memcpy(data, value, sizeof(value)); [...] } It's safe to assume that memcpy will always copy bytewise and hence will never cause an unaligned access. Recall an example packed structure from earlier: struct foo { u16 field1; u32 field2; u8 field3; } __attribute__((packed)); The following code will potentially cause 2 unaligned accesses: writing to field2, then reading from field2: void myfunc2(u32 some_data) { struct foo myinstance; u32 tmp; myinstance.field2 = some_data; tmp = myinstance.field2 * 2; } When writing this code, you should be aware that field2 acccesses are potentially unaligned therefore the above will break on some systems. The kernel provides two macros to simplify handling of situations such as the above: void myfunc2(u32 some_data) { struct foo myinstance; u32 tmp; put_unaligned(tmp, &myinstance.field2); tmp = get_unaligned(&myinstance.field2); } These macros work from pointers to the unaligned data, and work for memory accesses of any length (not just 32 bits as in the example above). You could even use put_unaligned() rather than memcpy() in order to solve the bug in the first example (myfunc()) given above. -- Author: Daniel Drake With help from: Johannes Berg, Uli Kunitz. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/