Being spoilt by the luxuries of i386/x86_64 I've never really had a good
grasp on unaligned memory access problems on other architectures and decided
it was time to figure it out. As a result I've written this documentation
which I plan to submit for inclusion as
Documentation/unaligned_memory_access.txt
Before I do so, any comments on the following?
Thanks,
Daniel
UNALIGNED MEMORY ACCESSES
=========================
Linux runs on a wide variety of architectures which have varying behaviour
when it comes to memory access. This document presents some details about
unaligned accesses, why you need to write code that doesn't cause them,
and how to write such code!
What's the definition of an unaligned access?
=============================================
Unaligned memory accesses occur when you try to read N bytes of data starting
from an address that is not evenly divisible by N (i.e. addr % N != 0).
For example, reading 4 bytes of data from address 0x10000004 is fine, but
reading 4 bytes of data from address 0x10000005 would be an unaligned memory
access.
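For illustration, here is a minimal user-space sketch of that rule (the helper
name is made up for this example):

#include <stdio.h>

/* Nonzero if an N-byte access starting at addr would be naturally
 * aligned, i.e. addr % N == 0. */
static int is_aligned_for(unsigned long addr, unsigned long n)
{
	return (addr % n) == 0;
}

int main(void)
{
	printf("%d\n", is_aligned_for(0x10000004UL, 4));	/* prints 1: aligned */
	printf("%d\n", is_aligned_for(0x10000005UL, 4));	/* prints 0: unaligned */
	return 0;
}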
Why unaligned access is bad
===========================
Most architectures are unable to perform unaligned memory accesses. Any
unaligned access causes a processor exception.
Some architectures have an exception handler implemented in the kernel which
corrects the unaligned access, but this fixup is very expensive and not all
architectures provide such a handler. You cannot rely on the exception handler
to correct your memory accesses.
In summary: if your code causes unaligned memory accesses to happen, your code
will not work on some platforms, and will perform *very* badly on others.
You may be wondering why you have never seen these problems on your own
architecture. Some architectures (such as i386 and x86_64) do not have this
limitation, but nevertheless it is important for you to write portable code
that works everywhere.
Natural alignment
=================
The rule we mentioned earlier forms what we refer to as natural alignment:
When accessing N bytes of memory, the base memory address must be evenly
divisible by N, i.e. addr % N == 0
When writing code, assume the target architecture has natural alignment
requirements.
Sidenote: in reality, only a few architectures require natural alignment
on all sizes of memory access. However, again we must consider ALL supported
architectures; natural alignment is the only way to achieve full portability.
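If you are curious what alignment the compiler assumes for a given type, gcc's
__alignof__ extension reports it. A small user-space sketch; the printed values
are ABI-dependent (typical values are noted in the comment):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Commonly 1, 2, 4 and 8, although the alignment of 64-bit types
	 * in particular varies between architectures and ABIs. */
	printf("u8: %zu, u16: %zu, u32: %zu, u64: %zu\n",
	       __alignof__(uint8_t), __alignof__(uint16_t),
	       __alignof__(uint32_t), __alignof__(uint64_t));
	return 0;
}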
Code that doesn't cause unaligned access
========================================
At first, the concepts above may seem a little hard to relate to actual
coding practice. After all, you don't have a great deal of control over
memory addresses of certain variables, etc.
Fortunately things are not too complex, as in most cases, the compiler
ensures that things will work for you. For example, take the following
structure:
struct foo {
u16 field1;
u32 field2;
u8 field3;
};
Let us assume that an instance of the above structure resides in memory
starting at address 0x10000000. With a basic level of understanding, it would
not be unreasonable to expect that accessing field2 would cause an unaligned
access. You'd be expecting field2 to be located at offset 2 bytes into the
structure, i.e. address 0x10000002, but that address is not evenly divisible
by 4 (remember, we're reading a 4 byte value here).
Fortunately, the compiler understands the alignment constraints, so in the
above case it would insert 2 bytes of padding inbetween field1 and field2.
Therefore, for standard structure types you can always rely on the compiler
to pad structures so that accesses to fields are suitably aligned (assuming
you do not cast the field to a type of different length).
Similarly, you can also rely on the compiler to align variables and function
parameters to a naturally aligned scheme, based on the size of the type of
the variable.
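You can observe the inserted padding with a small user-space sketch, using the
standard fixed-width types as stand-ins for the kernel's u8/u16/u32 (the exact
offsets are ABI-dependent, but the values in the comment are what common Linux
ABIs produce):

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

struct foo {
	u16 field1;
	u32 field2;
	u8 field3;
};

int main(void)
{
	/* On common Linux ABIs this prints "0 4 8 12": 2 bytes of padding
	 * after field1, and 3 bytes of tail padding after field3. */
	printf("%zu %zu %zu %zu\n",
	       offsetof(struct foo, field1), offsetof(struct foo, field2),
	       offsetof(struct foo, field3), sizeof(struct foo));
	return 0;
}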
Sidenote: in the above example, you may wish to reorder the fields in the
above structure so that the overall structure uses less memory. For example,
moving field3 to sit inbetween field1 and field2 (where the padding is
inserted) would shrink the overall structure by 1 byte:
struct foo {
u16 field1;
u8 field3;
u32 field2;
};
Sidenote: it should be obvious by now, but in case it is not, accessing a
single byte (u8 or char) can never cause an unaligned access, because all
memory addresses are evenly divisible by 1.
Code that causes unaligned access
=================================
With the above in mind, let's move on to a real-life example of a function
that can cause an unaligned memory access. The following function adapted
from include/linux/etherdevice.h is an optimized routine to compare two
ethernet MAC addresses for equality.
unsigned int compare_ether_addr(const u8 *addr1, const u8 *addr2)
{
const u16 *a = (const u16 *) addr1;
const u16 *b = (const u16 *) addr2;
return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
}
In the above function, the reference to a[0] causes 2 bytes (16 bits) to
be read from memory starting at address addr1. Think about what would happen
if addr1 was an odd address, such as 0x10000003. (Hint: it'd be an unaligned
access)
Despite the potential unaligned access problems with the above function, it
is included in the kernel anyway but is documented to only work on
16-bit-aligned addresses. It is up to the caller to ensure this alignment or
not use this function at all. This alignment-unsafe function is still useful
as it is a decent optimization for the cases when you can ensure alignment.
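If a caller cannot guarantee 16-bit alignment, a byte-wise comparison avoids
the problem at the cost of more loads. A sketch of such an alternative (this
is not the kernel's actual fallback helper, and the name is invented):

unsigned int compare_ether_addr_bytewise(const u8 *addr1, const u8 *addr2)
{
	unsigned int i, diff = 0;

	/* Single-byte accesses are always aligned, so this is safe for
	 * any addr1/addr2. */
	for (i = 0; i < 6; i++)
		diff |= addr1[i] ^ addr2[i];
	return diff != 0;
}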
Here is another example of code that could cause unaligned accesses:
void myfunc(u8 *data, u32 value)
{
[...]
*((u32 *) data) = cpu_to_le32(value);
[...]
}
This code will cause unaligned accesses every time the data parameter points
to an address that is not evenly divisible by 4.
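For example, a hypothetical caller like the following would trigger the
problem:

u8 buf[8];

[...]
/* An offset of 1 byte into a buffer has no particular alignment, so
 * this call will usually make myfunc() perform an unaligned 4-byte
 * store. */
myfunc(buf + 1, 0x12345678);
[...]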
Consider the following structure:
struct foo {
u16 field1;
u32 field2;
u8 field3;
} __attribute__((packed));
It's the same structure as we looked at earlier, but the packed attribute has
been added. This attribute ensures that the compiler never inserts any padding
and the structure is laid out in memory exactly as is suggested above.
The packed attribute is useful when you want to use a C struct to represent
some data that comes in a fixed arrangement 'off the wire'.
It should be clear why accessing fields of an instance of that structure could
cause unaligned accesses in some situations. Even if the instance started at
an address such as 0x10000000 where accessing field1 would not cause an
unaligned access, accessing field2 would be reading 4 bytes from 0x10000002,
which is an unaligned access. The compiler didn't jump to your rescue and
insert padding because you asked it not to.
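The same considerations apply to any structure describing an 'off the wire'
format. For example, a hypothetical packet header like the following (the
field names and layout are invented for illustration) has multi-byte members
at odd offsets by design, so its fields will frequently be unaligned in
memory:

struct wire_hdr {
	u8 version;	/* offset 0 */
	u16 length;	/* offset 1 */
	u32 seq;	/* offset 3 */
} __attribute__((packed));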
In summary, the 3 main scenarios where you may run into unaligned access
problems involve:
1. Recasting variables to types of different lengths
2. Pointer arithmetic followed by access to at least 2 bytes of data
3. Accessing elements of packed structures
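A short sketch of scenario 2, with a buffer layout invented for illustration:

void parse_packet(u8 *buf)
{
	u32 seq;

	[...]
	/* buf + 3 has no guaranteed alignment, so this 4-byte read is
	 * potentially unaligned. */
	seq = *((u32 *) (buf + 3));
	[...]
}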
Avoiding unaligned accesses
===========================
Going back to an earlier example:
void myfunc(u8 *data, u32 value)
{
[...]
*((u16 *) data) = cpu_to_le32(value);
[...]
}
To avoid the unaligned memory access, you could rewrite it as follows:
void myfunc(u8 *data, u32 value)
{
[...]
value = cpu_to_le32(value);
memcpy(data, value, sizeof(value));
[...]
}
It's safe to assume that memcpy will always copy bytewise and hence will
never cause an unaligned access.
Recall an example packed structure from earlier:
struct foo {
u16 field1;
u32 field2;
u8 field3;
} __attribute__((packed));
The following code will potentially cause 2 unaligned accesses: writing to
field2, then reading from field2:
void myfunc2(u32 some_data)
{
struct foo myinstance;
u32 tmp;
myinstance.field2 = some_data;
tmp = myinstance.field2 * 2;
}
When writing this code, you should be aware that field2 accesses are
potentially unaligned, and therefore the above will break on some systems. The
kernel provides two macros to simplify handling of situations such as the
above:
void myfunc2(u32 some_data)
{
struct foo myinstance;
u32 tmp;
put_unaligned(some_data, &myinstance.field2);
tmp = get_unaligned(&myinstance.field2);
}
These macros work from pointers to the unaligned data, and work for memory
accesses of any length (not just 32 bits as in the example above). You could
even use put_unaligned() rather than memcpy() in order to solve the bug in
the first example (myfunc()) given above.
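For example, that first function could look like this:

void myfunc(u8 *data, u32 value)
{
	[...]
	/* put_unaligned() stores the 4-byte value safely regardless of
	 * the alignment of data. */
	put_unaligned(cpu_to_le32(value), (u32 *) data);
	[...]
}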
--
Author: Daniel Drake <[email protected]>
With help from: Johannes Berg, Uli Kunitz.
On Nov 22, 2007 4:15 PM, Daniel Drake <[email protected]> wrote:
> Before I do so, any comments on the following?
>
< above case it would insert 2 bytes of padding inbetween field1 and field2.
> above case it would insert 2 bytes of padding in between field1 and field2.
< moving field3 to sit inbetween field1 and field2 (where the padding is
> moving field3 to sit in between field1 and field2 (where the padding is
--
avuton
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?
...
> You may be wondering why you have never seen these problems on your own
> architecture. Some architectures (such as i386 and x86_64) do not have this
> limitation, but nevertheless it is important for you to write portable code
> that works everywhere.
Also, x86 doesn't prohibit unaligned accesses, but I believe they have a
significant performance cost and are best avoided where possible.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
Thank you for working proactively on these problems.
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.
Not all. Some simply produce the wrong answer - thats oh so much more
exciting.
> You may be wondering why you have never seen these problems on your own
> architecture. Some architectures (such as i386 and x86_64) do not have this
> limitation, but nevertheless it is important for you to write portable code
> that works everywhere.
Its usually faster if you don't misalign on x86 as well.
Alan
Robert Hancock <[email protected]> writes:
>
> Also, x86 doesn't prohibit unaligned accesses,
That depends, e.g. for SSE2 they can be forbidden.
> but I believe they have
> a significant performance cost and are best avoided where possible.
On Opteron the typical cost of a misaligned access is a single cycle
and some possible penalty to load-store forwarding.
On Intel it is a bit worse, but not all that much. Unless you do
a lot of accesses in a loop, it's not really worth
caring about too much.
-Andi
On Nov 22, 2007, at 20:29:11, Alan Cox wrote:
>> Most architectures are unable to perform unaligned memory
>> accesses. Any unaligned access causes a processor exception.
>
> Not all. Some simply produce the wrong answer - thats oh so much
> more exciting.
As one example, the MicroBlaze soft-core processor family designed
for use on Xilinx FPGAs will (by default) simply forcibly zero the
lower bits of the unaligned address, such that the following code
will fail mysteriously:
const char foo[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07 };
printf("0x%08lx 0x%08lx 0x%08lx 0x%08lx\n",
*((u32 *)(foo+0)),
*((u32 *)(foo+1)),
*((u32 *)(foo+2)),
*((u32 *)(foo+3)));
Instead of outputting:
0x00010203 0x01020304 0x02030405 0x03040506
It will output:
0x00010203 0x00010203 0x00010203 0x00010203
Other embedded architectures have very similar problems. Some may
provide an "unaligned data access" exception, but offer insufficient
information to repair the damage and resume execution.
Cheers,
Kyle Moffett
On Fri, 23 Nov 2007, Alan Cox wrote:
> Its usually faster if you don't misalign on x86 as well.
i'm not sure if i agree with "usually"... but i know you (alan) are
probably aware of the exact requirements of the hw.
for everyone else:
on intel x86 processors an access is unaligned only if it crosses a
cacheline boundary (64 bytes). otherwise it's aligned. the penalty for
crossing a cacheline boundary varies from ~12 cycles (core2) to many
dozens of cycles (p4).
on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16
bytes. the penalty is a mere 3 cycles if an access crosses the specified
boundary.
if you're making <= 4 byte accesses i recommend not worrying about
alignment on x86. it's pretty hard to beat the hardware support.
i curse all the RISC and embedded processor designers who pretend
unaligned accesses are something evil and to be avoided. in case you're
worried, MIPS patent 4,814,976 expired in december 2006 :)
-dean
dean gaudet <[email protected]> writes:
> on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16
> bytes. the penalty is a mere 3 cycles if an access crosses the specified
> boundary.
Worth noting though, is that atomic accesses that cross cache lines on
an Opteron system is going to lock down the Hypertransport fabric for
you during the operation -- which is obviously not so nice.
--
Arne.
On Nov 23 2007 00:15, Daniel Drake wrote:
>
>What's the definition of an unaligned access?
>=============================================
>
>Unaligned memory accesses occur when you try to read N bytes of data starting
>from an address that is not evenly divisible by N (i.e. addr % N != 0).
>For example, reading 4 bytes of data from address 0x10000004 is fine, but
>reading 4 bytes of data from address 0x10000005 would be an unaligned memory
>access.
>
Try shorter numbers, like 0x10005 :)
>Code that doesn't cause unaligned access
>========================================
In written style, not using n't contracted forms might be preferable.
>Sidenote: in the above example, you may wish to reorder the fields in the
>above structure so that the overall structure uses less memory. For example,
>moving field3 to sit inbetween field1 and field2 (where the padding is
>inserted) would shrink the overall structure by 1 byte:
>
> struct foo {
> u16 field1;
> u8 field3;
> u32 field2;
> };
>
>Sidenote: it should be obvious by now, but in case it is not, accessing a
>single byte (u8 or char) can never cause an unaligned access, because all
>memory addresses are evenly divisible by 1.
Sidenote: You would want an alignment like this:
struct foo {
uint32_t field2;
uint16_t field1;
uint8_t field3;
};
>Consider the following structure:
> struct foo {
> u16 field1;
> u32 field2;
> u8 field3;
> } __attribute__((packed));
>
>It's the same structure as we looked at earlier, but the packed attribute has
>been added. This attribute ensures that the compiler never inserts any padding
>and the structure is laid out in memory exactly as is suggested above.
>
>The packed attribute is useful when you want to use a C struct to represent
>some data that comes in a fixed arrangement 'off the wire'.
>
In the packed case, does GCC not automatically output extra instructions to
avoid running into unaligned accesses?
>To avoid the unaligned memory access, you could rewrite it as follows:
>
> void myfunc(u8 *data, u32 value)
> {
> [...]
> value = cpu_to_le32(value);
> memcpy(data, value, sizeof(value));
> [...]
> }
>
>It's safe to assume that memcpy will always copy bytewise and hence will
>never cause an unaligned access.
>
Usually it copies register-sized chunks where possible, and single bytes at the
left and right edges if they are unaligned. That's how glibc memcpy does it;
I am not sure how complete the kernel memcpy is in this regard.
On Fri, Nov 23, 2007 at 12:15:53AM +0000, Daniel Drake wrote:
> Why unaligned access is bad
> ===========================
>
> Most architectures are unable to perform unaligned memory accesses. Any
> unaligned access causes a processor exception.
"Some architectures are unable to perform unaligned memory accesses,
either an exception is generated, or the data
access is silently invalid. In architectures that allow unaligned
access, natural aligned accesses are usually faster than non-aligned."
> In summary: if your code causes unaligned memory accesses to happen, your code
> will not work on some platforms, and will perform *very* badly on others.
*very* -> *slower*
> Natural alignment
> =================
Please move this definition before "Why unaligned access is bad".
Also, it would be nice to have a table of ISAs:
ISA            Need natural    Need alignment
               alignment       by x
--------------------------------------------
m68k           No              2
powerpc/ppc    Yes             Word size
x86            No              No
x86_64         No              No
--
Heikki Orsila Barbie's law:
[email protected] "Math is hard, let's go shopping!"
http://www.iki.fi/shd
On Thursday 22 November 2007 04:15:53 pm Daniel Drake wrote:
> Fortunately things are not too complex, as in most cases, the compiler
> ensures that things will work for you. For example, take the following
> structure:
>
> struct foo {
> u16 field1;
> u32 field2;
> u8 field3;
> };
>
> Fortunately, the compiler understands the alignment constraints, so in the
> above case it would insert 2 bytes of padding inbetween field1 and field2.
> Therefore, for standard structure types you can always rely on the compiler
> to pad structures so that accesses to fields are suitably aligned (assuming
> you do not cast the field to a type of different length).
It would also insert 3 bytes of padding after field3, in order to satisfy
alignment constraints for arrays of these structures.
> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For
> example, moving field3 to sit inbetween field1 and field2 (where the
> padding is inserted) would shrink the overall structure by 1 byte:
>
> struct foo {
> u16 field1;
> u8 field3;
> u32 field2;
> };
It will actually shrink it by 4 bytes, for the very same reason.
-- Vadim Lobanov
Daniel Drake wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?
From the viewpoint of yours truly (and I am a teacher of operating system classes), this is a long-expected document which is going to be very useful, especially for newbies. My students often make alignment mistakes in their code, and your article will definitely make my job much easier.
Thank you, Daniel, for your work.
Dmitri
On Fri, 23 Nov 2007 00:15:53 +0000 (GMT)
Daniel Drake <[email protected]> wrote:
> Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> grasp on unaligned memory access problems on other architectures and decided
> it was time to figure it out. As a result I've written this documentation
> which I plan to submit for inclusion as
> Documentation/unaligned_memory_access.txt
>
> Before I do so, any comments on the following?
>
A very nice, and much needed document. I think you should include one thing though:
memcpy() is _only_ safe when one of the pointers is char* or void*. If it is anything more complex than that, gcc will assume alignment and optimise based on that. E.g. memcpy() of two longs generates the same assembly as doing an assignment.
(Technically it is no different for char* and void*, but since they have byte alignment, gcc can't really do anything creative.)
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
On Sat, Nov 24, 2007 at 02:34:41PM +0100, Pierre Ossman wrote:
> On Fri, 23 Nov 2007 00:15:53 +0000 (GMT)
> Daniel Drake <[email protected]> wrote:
>
> > Being spoilt by the luxuries of i386/x86_64 I've never really had a good
> > grasp on unaligned memory access problems on other architectures and decided
> > it was time to figure it out. As a result I've written this documentation
> > which I plan to submit for inclusion as
> > Documentation/unaligned_memory_access.txt
> >
> > Before I do so, any comments on the following?
> >
>
> A very nice, and much needed document. I think you should include one thing though:
>
> memcpy() is _only_ safe when one of the pointers is char* or void*. If it is anything more complex than that, gcc will assume alignment and optimise based on that. E.g. memcpy() of two long:s generates the same assembly as doing an assignment.
Dumb memcpy (while (len--) { *d++ = *s++ }) will have alignment problems
in any case. Intelligent ones, like the one provided in glibc, first copy
bytes till output is aligned (C file) *or* size is a multiple (i686 asm file)
of word size, and then it copies word-by-word.
Linux's x86_64 memcpy does the opposite, copies 64bit words, and then
copies the last bytes.
So, in effect, as long as no packed structures are used, memcpy should
be safer on *int, etc., than *char, as the compiler ensures
word-alignment.
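A simplified C sketch of the "align the destination first" strategy described
above (this is not the actual glibc or kernel code):

#include <stddef.h>

void *memcpy_sketch(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Copy single bytes until the destination is word-aligned. */
	while (n && ((unsigned long) d % sizeof(unsigned long)) != 0) {
		*d++ = *s++;
		n--;
	}

	/* Copy word-wise, but only if the source is now aligned as well;
	 * real implementations handle a misaligned source with shifts or
	 * hardware unaligned loads instead of falling back to bytes. */
	if (((unsigned long) s % sizeof(unsigned long)) == 0) {
		while (n >= sizeof(unsigned long)) {
			*(unsigned long *) d = *(const unsigned long *) s;
			d += sizeof(unsigned long);
			s += sizeof(unsigned long);
			n -= sizeof(unsigned long);
		}
	}

	/* Copy any remaining bytes one at a time. */
	while (n--)
		*d++ = *s++;

	return dst;
}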
--
lfr
0/0
On Sat, 24 Nov 2007 15:50:52 +0000
Luciano Rocha <[email protected]> wrote:
>
> Dumb memcpy (while (len--) { *d++ = *s++ }) will have alignment problems
> in any case. Intelligent ones, like the one provided in glibc, first copy
> bytes till output is aligned (C file) *or* size is a multiple (i686 asm file)
> of word size, and then it copies word-by-word.
>
> Linux's x86_64 memcpy does the opposite, copies 64bit words, and then
> copies the last bytes.
>
> So, in effect, as long as no packed structures are used, memcpy should
> be safer on *int, etc., than *char, as the compiler ensures
> word-alignment.
>
It most certainly does not. gcc will assume that an int* has int alignment. memcpy() is a builtin, which gcc can translate to pretty much anything. And C specifies that a pointer to foo, will point to a real object of type foo, so gcc can't be blamed for the unsafe typecasts. I have tested this the hard way, so this is not just speculation.
E.g., we have the following struct:
struct foo
{
u8 a[4];
u32 b;
};
This struct will have a size of 8 bytes and an alignment of 4 bytes (caused by the member b). Now take the following code:
void copy_foo(struct foo *dst, struct foo *src)
{
*dst = *src;
}
On a platform that supports 64-bit loads and stores (e.g. AVR32, where I got hit by this), this will generate:
LD r1, (src)
ST r1, (dst)
Now if I replace that with:
void copy_foo(struct foo *dst, struct foo *src)
{
memcpy(dst, src, sizeof(struct foo));
}
then it will generate the same code. So I cannot use copy_foo() to transfer a struct foo either out of, or into a packet buffer.
In other words, memcpy() does _not_ save you from alignment issues. If you cast from char* or void* to something else, you better be damn sure the alignment is correct because gcc will assume it is.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> On Sat, 24 Nov 2007 15:50:52 +0000
> Luciano Rocha <[email protected]> wrote:
>
> >
> > Dumb memcpy (while (len--) { *d++ = *s++ }) will have alignment problems
> > in any case. Intelligent ones, like the one provided in glibc, first copy
> > bytes till output is aligned (C file) *or* size is a multiple (i686 asm file)
> > of word size, and then it copies word-by-word.
> >
> > Linux's x86_64 memcpy does the opposite, copies 64bit words, and then
> > copies the last bytes.
> >
> > So, in effect, as long as no packed structures are used, memcpy should
> > be safer on *int, etc., than *char, as the compiler ensures
> > word-alignment.
> >
>
> It most certainly does not. gcc will assume that an int* has int alignment. memcpy() is a builtin, which gcc can translate to pretty much anything. And C specifies that a pointer to foo, will point to a real object of type foo, so gcc can't be blamed for the unsafe typecasts. I have tested this the hard way, so this is not just speculation.
Yes, on *int and other assumed aligned pointers, gcc uses its internal
version.
However, my point is that those pointers, unless speaking of packed
structures, can safely be assumed aligned, while char*/void* can't.
> In other words, memcpy() does _not_ save you from alignment issues. If you cast from char* or void* to something else, you better be damn sure the alignment is correct because gcc will assume it is.
Nothing does, even memcpy doesn't check alignment of the source, or
alignment at all in some assembly implementations (only word-copy,
without checking if at word-boundary).
--
lfr
0/0
On Sat, 24 Nov 2007 17:22:36 +0000
Luciano Rocha <[email protected]> wrote:
> On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> > It most certainly does not. gcc will assume that an int* has int alignment. memcpy() is a builtin, which gcc can translate to pretty much anything. And C specifies that a pointer to foo, will point to a real object of type foo, so gcc can't be blamed for the unsafe typecasts. I have tested this the hard way, so this is not just speculation.
>
> Yes, on *int and other assumed aligned pointers, gcc uses its internal
> version.
>
> However, my point is that those pointers, unless speaking of packed
> structures, can safely be assumed aligned, while char*/void* can't.
>
I get the sensation we're violently in agreement here, just misunderstanding each other. :)
_My_ point was that the documentation should mention that normal, unpacked C objects have alignments that influence the code generated by __builtin_memcpy(). As such, one should always make sure to have either src or dst be char*/void* when alignment cannot be guaranteed. The example in the documentation has this, but it isn't explicit that this is required.
Rgds
--
-- Pierre Ossman
Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
On Sat, 24 Nov 2007 17:22:36 +0000
Luciano Rocha <[email protected]> wrote:
> Nothing does, even memcpy doesn't check alignment of the source, or
> alignment at all in some assembly implementations (only word-copy,
> without checking if at word-boundary).
An out-of-line implementation can only do that if the architecture
allows unaligned loads and stores. Since it has no clue about the types
involved, it must assume that both pointers as well as the length may be
misaligned.
gcc, on the other hand, knows exactly what types are involved, so when
it expands its own builtin-memcpy inline it can optimize it based on
the required alignment of those types. So when you cast between types
with different alignment requirements, you must make sure the result is
properly aligned, or you need to use get_unaligned()/put_unaligned()
to override gcc's assumptions.
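For example, a sketch reusing Pierre's struct foo from earlier in this thread,
reading its u32 member out of a byte buffer whose alignment is unknown:

u32 read_foo_b(const u8 *packet_buf)
{
	const struct foo *f = (const struct foo *) packet_buf;

	/* The cast alone would let gcc assume 4-byte alignment for b;
	 * get_unaligned() overrides that assumption. */
	return get_unaligned(&f->b);
}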
Btw, some versions of avr32-gcc (I think it was 4.0.x) assumed packed
structs were properly aligned too, with disastrous results. gcc-4.1
handles packed structs correctly as far as I can tell.
Håvard
On Sat, Nov 24, 2007 at 06:35:25PM +0100, Pierre Ossman wrote:
> On Sat, 24 Nov 2007 17:22:36 +0000
> Luciano Rocha <[email protected]> wrote:
>
> > On Sat, Nov 24, 2007 at 05:19:31PM +0100, Pierre Ossman wrote:
> > > It most certainly does not. gcc will assume that an int* has int alignment. memcpy() is a builtin, which gcc can translate to pretty much anything. And C specifies that a pointer to foo, will point to a real object of type foo, so gcc can't be blamed for the unsafe typecasts. I have tested this the hard way, so this is not just speculation.
> >
> > Yes, on *int and other assumed aligned pointers, gcc uses its internal
> > version.
> >
> > However, my point is that those pointers, unless speaking of packed
> > structures, can safely be assumed aligned, while char*/void* can't.
> >
>
> I get the sensation we're violently in agreement here, just misunderstanding each other. :)
That's it. :)
Sorry for the noise,...
--
lfr
0/0
On Thursday 22 November 2007 16:15, Daniel Drake wrote:
> In summary: if your code causes unaligned memory accesses to happen, your
> code will not work on some platforms, and will perform *very* badly on
> others.
Although understanding alignment is important, there is another
extreme - what I call "sadistic alignment". It's when data is being
aligned even if it will definitely run on an arch which doesn't require
this (arch/x86/*), or data being aligned to ridiculously large boundary.
Like gcc aligning any char array bigger than 31 bytes to 32 bytes.
Bytes, not bits. Try to compile this with -O2:
static char s1[] = "12345678901234567890123456789012";
static char s2[] = "12345678901234567890123456789012";
void f(char*);
void g() {
f(s1);
f(s2);
}
$ hexdump -Cv t.o
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 38 01 00 00 00 00 00 00 34 00 00 00 00 00 28 00 |8.......4.....(.|
00000030 0a 00 07 00 55 89 e5 83 ec 08 c7 04 24 40 00 00 |....U.......$@..|
00000040 00 e8 fc ff ff ff c7 04 24 00 00 00 00 e8 fc ff |........$.......|
00000050 ff ff c9 c3 00 00 00 00 00 00 00 00 00 00 00 00 |................| <=== HERE
00000060 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 |1234567890123456|
00000070 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 |7890123456789012|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| <=== HERE
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| <=== HERE
000000a0 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 |1234567890123456|
000000b0 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 |7890123456789012|
000000c0 00 00 00 00 00 47 43 43 3a 20 28 47 4e 55 29 20 |.....GCC: (GNU) |
000000d0 34 2e 30 2e 33 20 28 55 62 75 6e 74 75 20 34 2e |4.0.3 (Ubuntu 4.|
000000e0 30 2e 33 2d 31 75 62 75 6e 74 75 35 29 00 00 2e |0.3-1ubuntu5)...|
000000f0 73 79 6d 74 61 62 00 2e 73 74 72 74 61 62 00 2e |symtab..strtab..|
43 bytes wasted!
Thankfully, it is fixed in later gcc versions.
Please do not succumb to "alignment scare" in your doc.
--
vda
On Fri, 23 Nov 2007, Heikki Orsila wrote:
> On Fri, Nov 23, 2007 at 12:15:53AM +0000, Daniel Drake wrote:
> > Why unaligned access is bad
> > ===========================
> >
> > Most architectures are unable to perform unaligned memory accesses. Any
> > unaligned access causes a processor exception.
>
> "Some architectures are unable to perform unaligned memory accesses,
> either an exception is generated, or the data
> access is silently invalid. In architectures that allow unaligned
> access, natural aligned accesses are usually faster than non-aligned."
>
> > In summary: if your code causes unaligned memory accesses to happen, your code
> > will not work on some platforms, and will perform *very* badly on others.
>
> *very* -> *slower*
>
> > Natural alignment
> > =================
>
> Please move this definition before "Why unaligned access is bad".
>
> Also, it would be nice to have a table of ISAs:
>
> ISA Need Need
> natural alignment
> alignment by x
> --------------------------------------------
> m68k No 2
`No' for >= 68020.
`Yes' for < 68020.
> powerpc/ppc Yes Word size
> x86 No No
> x86_64 No No
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Sun, Nov 25, 2007 at 12:16:08PM +0100, Geert Uytterhoeven wrote:
> > ISA Need Need
> > natural alignment
> > alignment by x
> > --------------------------------------------
> > m68k No 2
>
> `No' for >= 68020.
> `Yes' for < 68020.
My bad, yes..
mc68020+ No No
(mc68000/010 No 2) (not for Linux)
--
Heikki Orsila Barbie's law:
[email protected] "Math is hard, let's go shopping!"
http://www.iki.fi/shd
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).
Should clarify that you mean "with power-of-two N" - even more
strictly this depends on the processor, but I'm pretty sure there is
none which supports aligned accesses of N==3...
Olaf
> mc68020+ No No
> (mc68000/010 No 2) (not for Linux)
Actually ucLinux has been persuaded to run on m68000.
On Nov 23, 2007 1:15 AM, Daniel Drake <[email protected]> wrote:
[...]
>
> Before I do so, any comments on the following?
>
[...]
> void myfunc(u8 *data, u32 value)
> {
> [...]
> value = cpu_to_le32(value);
> memcpy(data, value, sizeof(value));
> [...]
> }
I suppose you mean:
memcpy(data, &value, sizeof(value));
/DM
On Fri, 23 Nov 2007, Arne Georg Gleditsch wrote:
> dean gaudet <[email protected]> writes:
> > on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16
> > bytes. the penalty is a mere 3 cycles if an access crosses the specified
> > boundary.
>
> Worth noting though, is that atomic accesses that cross cache lines on
> an Opteron system is going to lock down the Hypertransport fabric for
> you during the operation -- which is obviously not so nice.
ooh awesome, i hadn't measured that before.
on a 2 node sockF / revF with a random pointer chase running on cpu 0 /
node 0 i see the avg load-to-load cache miss latency jump from 77ns to
109ns when i add an unaligned lock-intensive workload on one core of node
1. the worst i can get the pointer chase latency to is 273ns when i add
two threads on node 1 fighting over an unaligned lock.
on a 4 node (square) the worst case i can get seems to be an increase from
98ns with no antagonist to 385ns with 6 antagonists fighting over an
unaligned lock on the other 3 nodes.
cool.
-dean
> Sidenote: in the above example, you may wish to reorder the fields in the
> above structure so that the overall structure uses less memory. For example,
> moving field3 to sit inbetween field1 and field2 (where the padding is
> inserted) would shrink the overall structure by 1 byte:
>
> struct foo {
> u16 field1;
> u8 field3;
> u32 field2;
> };
You can reorder to u32, u16, u8 order and save another byte :)
A reference to pahole could be appropriate here, and probably a small
note that some large existing structures like netdev have deliberate
holes to achieve cache alignment.
johannes
> Going back to an earlier example:
> void myfunc(u8 *data, u32 value)
> {
> [...]
> *((u16 *) data) = cpu_to_le32(value);
> [...]
typo? should it be a u32 cast?
> To avoid the unaligned memory access, you could rewrite it as follows:
>
> void myfunc(u8 *data, u32 value)
> {
> [...]
> value = cpu_to_le32(value);
> memcpy(data, value, sizeof(value));
> [...]
> }
I think you should use put_unaligned here as well. Or maybe just reorder
this vs. the section below where you use get/put_unaligned.
johannes
On Fri, Nov 23, 2007 at 01:43:29PM +0200, Heikki Orsila wrote:
> On Fri, Nov 23, 2007 at 12:15:53AM +0000, Daniel Drake wrote:
> > Why unaligned access is bad
> > ===========================
> >
> > Most architectures are unable to perform unaligned memory accesses. Any
> > unaligned access causes a processor exception.
>
> "Some architectures are unable to perform unaligned memory accesses,
> either an exception is generated, or the data
> access is silently invalid. In architectures that allow unaligned
> access, natural aligned accesses are usually faster than non-aligned."
>
> > In summary: if your code causes unaligned memory accesses to happen, your code
> > will not work on some platforms, and will perform *very* badly on others.
>
> *very* -> *slower*
>
> > Natural alignment
> > =================
>
> Please move this definition before "Why unaligned access is bad".
>
> Also, it would be nice to have a table of ISAs:
>
> ISA Need Need
> natural alignment
> alignment by x
> --------------------------------------------
> m68k No 2
> powerpc/ppc Yes Word size
> x86 No No
> x86_64 No No
arm32 Yes 2 for 16bit data, 4 for 32bit
Note, if the unaligned handler is running, the alignment will be fixed
by the fault handler (at the cost of taking a fault). If the unaligned
handler is turned off, you get a "free" shift of the data instead.
--
Ben ([email protected], http://www.fluff.org/)
'a smiley only costs 4 bytes'
Em Mon, Nov 26, 2007 at 03:47:06PM +0100, Johannes Berg escreveu:
>
> > Sidenote: in the above example, you may wish to reorder the fields in the
> > above structure so that the overall structure uses less memory. For example,
> > moving field3 to sit inbetween field1 and field2 (where the padding is
> > inserted) would shrink the overall structure by 1 byte:
> >
> > struct foo {
> > u16 field1;
> > u8 field3;
> > u32 field2;
> > };
>
> You can reorder to u32, u16, u8 order and save another byte :)
>
> A reference to pahole could be appropriate here, and probably a small
> note that some large existing structures like netdev have deliberate
> holes to achieve cache alignment.
shameless plug:
https://ols2006.108.redhat.com/2007/Reprints/melo-Reprint.pdf
- Arnaldo
On Nov 23, 2007, at 5:43 AM, Heikki Orsila wrote:
> On Fri, Nov 23, 2007 at 12:15:53AM +0000, Daniel Drake wrote:
>> Why unaligned access is bad
>> ===========================
>>
>> Most architectures are unable to perform unaligned memory accesses.
>> Any
>> unaligned access causes a processor exception.
>
> "Some architectures are unable to perform unaligned memory accesses,
> either an exception is generated, or the data
> access is silently invalid. In architectures that allow unaligned
> access, natural aligned accesses are usually faster than non-aligned."
>
>> In summary: if your code causes unaligned memory accesses to
>> happen, your code
>> will not work on some platforms, and will perform *very* badly on
>> others.
>
> *very* -> *slower*
>
>> Natural alignment
>> =================
>
> Please move this definition before "Why unaligned access is bad".
>
> Also, it would be nice to have a table of ISAs:
>
> ISA Need Need
> natural alignment
> alignment by x
> --------------------------------------------
> m68k No 2
> powerpc/ppc Yes Word size
On ppc it varies from processor to processor whether misaligned data is
fixed up or causes an exception. However, it's highly recommended to be
naturally aligned. I'm not sure I follow what is meant by the second
column (need alignment by x).
- k
On Fri, 23 November 2007 00:15:53 +0000, Daniel Drake wrote:
>
> What's the definition of an unaligned access?
> =============================================
>
> Unaligned memory accesses occur when you try to read N bytes of data starting
> from an address that is not evenly divisible by N (i.e. addr % N != 0).
> For example, reading 4 bytes of data from address 0x10000004 is fine, but
> reading 4 bytes of data from address 0x10000005 would be an unaligned memory
> access.
The wording could also apply to a DMA of 8k from a 4k-aligned address.
But I don't have a good idea how to improve it.
> It's safe to assume that memcpy will always copy bytewise and hence will
> never cause an unaligned access.
s/always copy/always behave as if copying/
memcpy usually copies at least wordwise, possibly even in bigger chunks.
But that is just the inner loop. Unaligned bytes at the beginning/end
receive special treatment.
Jörn
--
The rabbit runs faster than the fox, because the rabbit is running for
his life while the fox is only running for his dinner.
-- Aesop