Subject: GCC, unaligned access and UB in the Linux kernel

Hi everyone,

I noticed that in the Linux kernel we have a define called
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS that's used in a fair amount of places
with the following purpose: when it's set, unaligned access is supported by the
CPU so we can do it directly, otherwise fall-back to some logic where a byte at
the time is read/written. For example, check the implementation of
do_strncpy_from_user():
https://elixir.bootlin.com/linux/latest/source/lib/strncpy_from_user.c#L15


That's nice and seems to work today as expected, but there's one problem:
unaligned access is UB according to the ISO C standard, no matter if the
architecture supports it or not. Also, GCC doesn't have any option similar to "-
fno-strict-aliasing" to make unaligned access non-UB. At the moment, they won't
consider introducing such an option either. Check this bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93031

Starting from GCC 8.x, the compiler introduced some new optimizations that
assume correct alignment which can break some code[1]. However, unaligned access
like the following [from do_strncpy_from_user()]:

*(unsigned long *)(dst+res) = c;

Still generate correct instructions because:

1. There's no aliasing involved [1]
2. SIMD instructions are not allowed in the kernel [2]

But that doesn't mean at all that things won't change in the future. At any
point, some optimization in a newer compiler might generate incorrect code even
for the above-mentioned example. Therefore, while I understand compiler
engineers' point of view (they provide a compiler with an ISO-compliant
behaviour), I'm concerned about the correctness of all the code that assumes
unaligned access (on some architectures) in C is just fine.

Mitigations
------------
In my understanding, the simplest way to force GCC to emit a single MOV
instruction with unaligned access, without risking any kind of UB, is to use
__builtin_memcpy(). It works great, but it requires fixing the code in many
places. Also, maybe using get_unaligned()/put_unaligned() is the right thing to
do? The problem is that the put_unaligned_* inline functions don't use
__builtin_memcpy() and are defined like:

static __always_inline void put_unaligned_le32(u32 val, void *p)
{
*((__le32 *)p) = cpu_to_le32(val);
}

So, still UB. To make the compiler happy, maybe we should make them use
__builtin_memcpy()?


Conclusion
-------------
What do you think about all of this? I realize that this is not a big urgent
problem *right now*, but at some point it might become one. How do you believe
this problem should be addressed in Linux?


Thanks,
Vlad



Footnotes
---------------------------------------------------
[1] If aliasing is involved, even with -fno-strict-aliasing, unaligned access
WILL break some code, today. Check the following example:

int h(int *p, int *q){
*p = 1;
*q = 1;
return *p;
}

typedef __attribute__((__may_alias__)) int I;

I k(I *p, I *q){
*p = 1;
*q = 1;
return *p;
}

Starting from GCC 8.1, both h() and k() will always return 1, when compiled with
-O2, even with -fno-strict-aliasing.

[2] Some SIMD instructions have alignment requirements that recent compilers
might just start to assume to be true, in my current understanding. In general,
SIMD instructions can be emitted automatically by the compiler because of auto-
vectorization. But, fortunately, that *cannot* happen in the kernel because we
build with -fno-mmx, -fno-sse, -fno-avx etc.

--
Vladislav Valtchev (vvaltchev)


2021-05-04 20:44:59

by Florian Weimer

[permalink] [raw]
Subject: Re: GCC, unaligned access and UB in the Linux kernel

* Vladislav K. Valtchev:

> Hi everyone,
>
> I noticed that in the Linux kernel we have a define called
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS that's used in a fair amount of places
> with the following purpose: when it's set, unaligned access is supported by the
> CPU so we can do it directly, otherwise fall-back to some logic where a byte at
> the time is read/written. For example, check the implementation of
> do_strncpy_from_user():
> https://elixir.bootlin.com/linux/latest/source/lib/strncpy_from_user.c#L15
>
>
> That's nice and seems to work today as expected, but there's one problem:
> unaligned access is UB according to the ISO C standard, no matter if the
> architecture supports it or not. Also, GCC doesn't have any option similar to "-
> fno-strict-aliasing" to make unaligned access non-UB. At the moment, they won't
> consider introducing such an option either. Check this bug:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93031
>
> Starting from GCC 8.x, the compiler introduced some new optimizations that
> assume correct alignment which can break some code[1]. However, unaligned access
> like the following [from do_strncpy_from_user()]:
>
> *(unsigned long *)(dst+res) = c;
>
> Still generate correct instructions because:
>
> 1. There's no aliasing involved [1]
> 2. SIMD instructions are not allowed in the kernel [2]
>
> But that doesn't mean at all that things won't change in the future. At any
> point, some optimization in a newer compiler might generate incorrect code even
> for the above-mentioned example. Therefore, while I understand compiler
> engineers' point of view (they provide a compiler with an ISO-compliant
> behaviour), I'm concerned about the correctness of all the code that assumes
> unaligned access (on some architectures) in C is just fine.
>
> Mitigations
> ------------
> In my understanding, the simplest way to force GCC to emit a single MOV
> instruction with unaligned access, without risking any kind of UB, is to use
> __builtin_memcpy(). It works great, but it requires fixing the code in many
> places. Also, maybe using get_unaligned()/put_unaligned() is the right thing to
> do? The problem is that the put_unaligned_* inline functions don't use
> __builtin_memcpy() and are defined like:
>
> static __always_inline void put_unaligned_le32(u32 val, void *p)
> {
> *((__le32 *)p) = cpu_to_le32(val);
> }
>
> So, still UB. To make the compiler happy, maybe we should make them use
> __builtin_memcpy()?
>
>
> Conclusion
> -------------
> What do you think about all of this? I realize that this is not a big urgent
> problem *right now*, but at some point it might become one. How do you believe
> this problem should be addressed in Linux?
>
>
> Thanks,
> Vlad
>
>
>
> Footnotes
> ---------------------------------------------------
> [1] If aliasing is involved, even with -fno-strict-aliasing, unaligned access
> WILL break some code, today. Check the following example:
>
> int h(int *p, int *q){
> *p = 1;
> *q = 1;
> return *p;
> }
>
> typedef __attribute__((__may_alias__)) int I;
>
> I k(I *p, I *q){
> *p = 1;
> *q = 1;
> return *p;
> }
>
> Starting from GCC 8.1, both h() and k() will always return 1, when compiled with
> -O2, even with -fno-strict-aliasing.
>
> [2] Some SIMD instructions have alignment requirements that recent compilers
> might just start to assume to be true, in my current understanding. In general,
> SIMD instructions can be emitted automatically by the compiler because of auto-
> vectorization. But, fortunately, that *cannot* happen in the kernel because we
> build with -fno-mmx, -fno-sse, -fno-avx etc.

Cc:ing linux-toolchains.

__attribute__ ((aligned (1))) can be used to reduce alignment, similar
to attribute packed on structs. If that doesn't work for partially
overlapping accesses, that's probably a compiler bug.

2021-05-04 20:54:54

by Willy Tarreau

[permalink] [raw]
Subject: Re: GCC, unaligned access and UB in the Linux kernel

Hi Florian,

On Tue, May 04, 2021 at 10:35:39PM +0200, Florian Weimer wrote:
> > [1] If aliasing is involved, even with -fno-strict-aliasing, unaligned access
> > WILL break some code, today. Check the following example:
> >
> > int h(int *p, int *q){
> > *p = 1;
> > *q = 1;
> > return *p;
> > }
> >
> > typedef __attribute__((__may_alias__)) int I;
> >
> > I k(I *p, I *q){
> > *p = 1;
> > *q = 1;
> > return *p;
> > }
> >
> > Starting from GCC 8.1, both h() and k() will always return 1, when compiled with
> > -O2, even with -fno-strict-aliasing.
> >
> > [2] Some SIMD instructions have alignment requirements that recent compilers
> > might just start to assume to be true, in my current understanding. In general,
> > SIMD instructions can be emitted automatically by the compiler because of auto-
> > vectorization. But, fortunately, that *cannot* happen in the kernel because we
> > build with -fno-mmx, -fno-sse, -fno-avx etc.
>
> Cc:ing linux-toolchains.
>
> __attribute__ ((aligned (1))) can be used to reduce alignment, similar
> to attribute packed on structs. If that doesn't work for partially
> overlapping accesses, that's probably a compiler bug.

Indeed, for me it fixes the example above with gcc-8.4:

Before:
0000000000000020 <k>:
20: c7 07 01 00 00 00 movl $0x1,(%rdi)
26: b8 01 00 00 00 mov $0x1,%eax
2b: c7 06 01 00 00 00 movl $0x1,(%rsi)
31: c3 retq

After:
0000000000000020 <k>:
20: c7 07 01 00 00 00 movl $0x1,(%rdi)
26: c7 06 01 00 00 00 movl $0x1,(%rsi)
2c: 8b 07 mov (%rdi),%eax
2e: c3 retq

That's good to know :-)

Willy

2021-05-05 10:53:41

by David Laight

[permalink] [raw]
Subject: RE: GCC, unaligned access and UB in the Linux kernel

From: Vladislav K. Valtchev
> Sent: 04 May 2021 19:08
>
> Hi everyone,
>
> I noticed that in the Linux kernel we have a define called
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS that's used in a fair amount of places
> with the following purpose: when it's set, unaligned access is supported by the
> CPU so we can do it directly, otherwise fall-back to some logic where a byte at
> the time is read/written. For example, check the implementation of
> do_strncpy_from_user():
> https://elixir.bootlin.com/linux/latest/source/lib/strncpy_from_user.c#L15
>
>
> That's nice and seems to work today as expected, but there's one problem:
> unaligned access is UB according to the ISO C standard, ...

The C standard isn't relevant.

...
>
> Mitigations
> ------------
> In my understanding, the simplest way to force GCC to emit a single MOV
> instruction with unaligned access, without risking any kind of UB, is to use
> __builtin_memcpy(). It works great, but it requires fixing the code in many
> places. Also, maybe using get_unaligned()/put_unaligned() is the right thing to
> do? The problem is that the put_unaligned_* inline functions don't use
> __builtin_memcpy() and are defined like:
>
> static __always_inline void put_unaligned_le32(u32 val, void *p)
> {
> *((__le32 *)p) = cpu_to_le32(val);
> }
>
> So, still UB. To make the compiler happy, maybe we should make them use
> __builtin_memcpy()?

That doesn't help you at all.
If the compiler is given any hint that the address is aligned
it will use potentially misaligned memory accesses.
So if the above was:
*(int __attribute__((packed)) *)p = val;
and the caller had:
int *p = maybe_unaligned();
put_unaligned_le32(0, p);

Then gcc will generate a 32bit write even on sparc.
__builtin_memcpy() will expand to exactly the same 32bit write.

This is nothing new - at least 20 years.

Basically, the C standard only allows you to cast a pointer
to 'void *' and then back to its original type.
GCC makes use of this for various optimisations and will
track the alignment through (void *) casts.

I believe that, even for the simd instructions, gcc will pick
the 'misalign supporting' version if the data is marked
with the relevant alignment.

Oh - and the loop vectorisation code is a pile of crap.
You almost never want it - look at the code it generates
for a loop you go round a few times.
It you are doing 10000 iteractions you'll need to unroll
it yourself to get decent performance.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Subject: Re: GCC, unaligned access and UB in the Linux kernel

On Wed, 2021-05-05 at 10:41 +0000, David Laight wrote:
> > That's nice and seems to work today as expected, but there's one problem:
> > unaligned access is UB according to the ISO C standard, ...
>
> The C standard isn't relevant.

Forget that I mentioned it. Unaligned access is UB for GCC, no matter the
architecture. Please check all the comments in this WONTFIX bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93031

> That doesn't help you at all.

As I mentioned, TODAY that is not a problem. But as the GCC guys insist that
they consider it UB, one morning we might wake up and discover that with the
latest version of GCC, the kernel is misbehaving.

> If the compiler is given any hint that the address is aligned
> it will use potentially misaligned memory accesses.
> So if the above was:
> *(int __attribute__((packed)) *)p = val;
> and the caller had:
> int *p = maybe_unaligned();
> put_unaligned_le32(0, p);
>

Yeah, but you changed the code. That's what I'm talking about.
The code that I quoted is *current* code existing in the kernel that still works
because GCC doesn't have *yet* an optimization that breaks it. The GCC guys
explicitly claim we should NOT write code like that, not even on architectures
that support unaligned access natively, because it's UB and they might break it
any moment.

> Then gcc will generate a 32bit write even on sparc.
> __builtin_memcpy() will expand to exactly the same 32bit write.

I don't understand what you want mean with that comparison other than it's
possible to achieve the same result in another way.

__builtin_memcpy() does what we want it to do, and doesn't require
maybe_unaligned() nor __attribute__((packed)), nor it emits more instructions.
Therefore, it looks to me that put_unaligned_le32() could be re-implemented with
__builtin_memcpy() without any drawbacks.

> This is nothing new - at least 20 years.
>
> Basically, the C standard only allows you to cast a pointer
> to 'void *' and then back to its original type.
> GCC makes use of this for various optimisations and will
> track the alignment through (void *) casts.

The compiler cannot track the alignment across function calls in different
translation units. It can only assume that the alignment is already correct
(ehm, runtime checking is out of the table of course, except for UBSAN).

Today, when we read/write an integer, GCC emits a MOV instruction (on x86) and
it doesn't care about the alignment. That's why it works and, honestly, I wish
it to continue to work like that. But, that's not necessarily what will happen.


> I believe that, even for the simd instructions, gcc will pick
> the 'misalign supporting' version if the data is marked
> with the relevant alignment.
>
> Oh - and the loop vectorisation code is a pile of crap.
> You almost never want it - look at the code it generates
> for a loop you go round a few times.
> It you are doing 10000 iteractions you'll need to unroll
> it yourself to get decent performance.


In kernel we don't care about the SIMD case, because such instructions are
disabled. I just mentioned it, for the sake of completeness. When compiling user
applications, SIMD instruction are normally allowed and some of them have
alignment requirements that the compiler might assume as given instead of
checking, because dereferencing unaligned pointers is UB.


--
Vladislav Valtchev (vvaltchev)