2021-05-14 10:42:16

by Arnd Bergmann

[permalink] [raw]
Subject: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

From: Arnd Bergmann <[email protected]>

The get_unaligned()/put_unaligned() helpers are traditionally architecture
specific, with the two main variants being the "access-ok.h" version
that assumes unaligned pointer accesses always work on a particular
architecture, and the "le-struct.h" version that casts the data to a
byte aligned type before dereferencing, for architectures that cannot
always do unaligned accesses in hardware.

Based on the discussion linked below, it appears that the access-ok
version is not realiable on any architecture, but the struct version
probably has no downsides. This series changes the code to use the
same implementation on all architectures, addressing the few exceptions
separately.

I've included this version in the asm-generic tree for 5.14 already,
addressing the few issues that were pointed out in the RFC. If there
are any remaining problems, I hope those can be addressed as follow-up
patches.

Arnd

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
Link: https://lore.kernel.org/lkml/[email protected]/
Link: git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git unaligned-rework-v2


Arnd Bergmann (13):
asm-generic: use asm-generic/unaligned.h for most architectures
openrisc: always use unaligned-struct header
sh: remove unaligned access for sh4a
m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
powerpc: use linux/unaligned/le_struct.h on LE power7
asm-generic: unaligned: remove byteshift helpers
asm-generic: unaligned always use struct helpers
partitions: msdos: fix one-byte get_unaligned()
apparmor: use get_unaligned() only for multi-byte words
mwifiex: re-fix for unaligned accesses
netpoll: avoid put_unaligned() on single character
asm-generic: uaccess: 1-byte access is always aligned
asm-generic: simplify asm/unaligned.h

arch/alpha/include/asm/unaligned.h | 12 --
arch/arm/include/asm/unaligned.h | 27 ---
arch/ia64/include/asm/unaligned.h | 12 --
arch/m68k/Kconfig | 1 +
arch/m68k/include/asm/unaligned.h | 26 ---
arch/microblaze/include/asm/unaligned.h | 27 ---
arch/mips/crypto/crc32-mips.c | 2 +-
arch/openrisc/include/asm/unaligned.h | 47 -----
arch/parisc/include/asm/unaligned.h | 6 +-
arch/powerpc/include/asm/unaligned.h | 22 ---
arch/sh/include/asm/unaligned-sh4a.h | 199 --------------------
arch/sh/include/asm/unaligned.h | 13 --
arch/sparc/include/asm/unaligned.h | 11 --
arch/x86/include/asm/unaligned.h | 15 --
arch/xtensa/include/asm/unaligned.h | 29 ---
block/partitions/ldm.h | 2 +-
block/partitions/msdos.c | 2 +-
drivers/net/wireless/marvell/mwifiex/pcie.c | 10 +-
include/asm-generic/uaccess.h | 4 +-
include/asm-generic/unaligned.h | 141 +++++++++++---
include/linux/unaligned/access_ok.h | 68 -------
include/linux/unaligned/be_byteshift.h | 71 -------
include/linux/unaligned/be_memmove.h | 37 ----
include/linux/unaligned/be_struct.h | 37 ----
include/linux/unaligned/generic.h | 115 -----------
include/linux/unaligned/le_byteshift.h | 71 -------
include/linux/unaligned/le_memmove.h | 37 ----
include/linux/unaligned/le_struct.h | 37 ----
include/linux/unaligned/memmove.h | 46 -----
net/core/netpoll.c | 4 +-
security/apparmor/policy_unpack.c | 2 +-
31 files changed, 131 insertions(+), 1002 deletions(-)
delete mode 100644 arch/alpha/include/asm/unaligned.h
delete mode 100644 arch/arm/include/asm/unaligned.h
delete mode 100644 arch/ia64/include/asm/unaligned.h
delete mode 100644 arch/m68k/include/asm/unaligned.h
delete mode 100644 arch/microblaze/include/asm/unaligned.h
delete mode 100644 arch/openrisc/include/asm/unaligned.h
delete mode 100644 arch/powerpc/include/asm/unaligned.h
delete mode 100644 arch/sh/include/asm/unaligned-sh4a.h
delete mode 100644 arch/sh/include/asm/unaligned.h
delete mode 100644 arch/sparc/include/asm/unaligned.h
delete mode 100644 arch/x86/include/asm/unaligned.h
delete mode 100644 arch/xtensa/include/asm/unaligned.h
delete mode 100644 include/linux/unaligned/access_ok.h
delete mode 100644 include/linux/unaligned/be_byteshift.h
delete mode 100644 include/linux/unaligned/be_memmove.h
delete mode 100644 include/linux/unaligned/be_struct.h
delete mode 100644 include/linux/unaligned/generic.h
delete mode 100644 include/linux/unaligned/le_byteshift.h
delete mode 100644 include/linux/unaligned/le_memmove.h
delete mode 100644 include/linux/unaligned/le_struct.h
delete mode 100644 include/linux/unaligned/memmove.h

--
2.29.2

Cc: Amitkumar Karwar <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Florian Fainelli <[email protected]>
Cc: Ganapathi Bhat <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: James Morris <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: John Johansen <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: "Richard Russon (FlatCap)" <[email protected]>
Cc: Russell King <[email protected]>
Cc: "Serge E. Hallyn" <[email protected]>
Cc: Sharvari Harisangam <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vladimir Oltean <[email protected]>
Cc: Xinming Hu <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]




2021-05-14 10:45:32

by Arnd Bergmann

[permalink] [raw]
Subject: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

From: Arnd Bergmann <[email protected]>

As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected
behavior when faced with overlapping unaligned pointers. The kernel's
unaligned/access-ok.h header technically invokes undefined behavior
that happens to usually work on the architectures using it, but if the
compiler optimizes code based on the assumption that undefined behavior
doesn't happen, it can create output that actually causes data corruption.

A related problem was previously found on 32-bit ARMv7, where most
instructions can be used on unaligned data, but 64-bit ldrd/strd causes
an exception. The workaround was to always use the unaligned/le_struct.h
helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM:
8715/1: add a private asm/unaligned.h").

The same solution should work on all other architectures as well, so
remove the access-ok.h variant and use the other one unconditionally on
all architectures, picking either the big-endian or little-endian version.

With this, the arm specific header can be removed as well, and the
only file including linux/unaligned/access_ok.h gets moved to including
the normal file.

Fortunately, this made almost no difference to the object code produced
by gcc-11. On x86, s390, powerpc, and arc, the resulting binary appears
to be identical to the previous version, while on arm64 and m68k there
are minimal differences that looks like an optimization pass went into
a different direction, usually using fewer stack spills on the new
version.

Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
---
arch/arm/include/asm/unaligned.h | 25 -----------
arch/mips/crypto/crc32-mips.c | 2 +-
include/asm-generic/unaligned.h | 13 +-----
include/linux/unaligned/access_ok.h | 68 -----------------------------
4 files changed, 3 insertions(+), 105 deletions(-)
delete mode 100644 arch/arm/include/asm/unaligned.h
delete mode 100644 include/linux/unaligned/access_ok.h

diff --git a/arch/arm/include/asm/unaligned.h b/arch/arm/include/asm/unaligned.h
deleted file mode 100644
index 3c5248fb4cdc..000000000000
--- a/arch/arm/include/asm/unaligned.h
+++ /dev/null
@@ -1,25 +0,0 @@
-#ifndef __ASM_ARM_UNALIGNED_H
-#define __ASM_ARM_UNALIGNED_H
-
-/*
- * We generally want to set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on ARMv6+,
- * but we don't want to use linux/unaligned/access_ok.h since that can lead
- * to traps on unaligned stm/ldm or strd/ldrd.
- */
-#include <asm/byteorder.h>
-
-#if defined(__LITTLE_ENDIAN)
-# include <linux/unaligned/le_struct.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned __get_unaligned_le
-# define put_unaligned __put_unaligned_le
-#elif defined(__BIG_ENDIAN)
-# include <linux/unaligned/be_struct.h>
-# include <linux/unaligned/generic.h>
-# define get_unaligned __get_unaligned_be
-# define put_unaligned __put_unaligned_be
-#else
-# error need to define endianess
-#endif
-
-#endif /* __ASM_ARM_UNALIGNED_H */
diff --git a/arch/mips/crypto/crc32-mips.c b/arch/mips/crypto/crc32-mips.c
index faa88a6a74c0..0a03529cf317 100644
--- a/arch/mips/crypto/crc32-mips.c
+++ b/arch/mips/crypto/crc32-mips.c
@@ -8,13 +8,13 @@
* Copyright (C) 2018 MIPS Tech, LLC
*/

-#include <linux/unaligned/access_ok.h>
#include <linux/cpufeature.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#include <asm/mipsregs.h>
+#include <asm/unaligned.h>

#include <crypto/internal/hash.h>

diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index d79df721ae60..36bf03aaa674 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -8,22 +8,13 @@
*/
#include <asm/byteorder.h>

-/* Set by the arch if it can handle unaligned accesses in hardware. */
-#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-# include <linux/unaligned/access_ok.h>
-#endif
-
#if defined(__LITTLE_ENDIAN)
-# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-# include <linux/unaligned/le_struct.h>
-# endif
+# include <linux/unaligned/le_struct.h>
# include <linux/unaligned/generic.h>
# define get_unaligned __get_unaligned_le
# define put_unaligned __put_unaligned_le
#elif defined(__BIG_ENDIAN)
-# ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-# include <linux/unaligned/be_struct.h>
-# endif
+# include <linux/unaligned/be_struct.h>
# include <linux/unaligned/generic.h>
# define get_unaligned __get_unaligned_be
# define put_unaligned __put_unaligned_be
diff --git a/include/linux/unaligned/access_ok.h b/include/linux/unaligned/access_ok.h
deleted file mode 100644
index 167aa849c0ce..000000000000
--- a/include/linux/unaligned/access_ok.h
+++ /dev/null
@@ -1,68 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_UNALIGNED_ACCESS_OK_H
-#define _LINUX_UNALIGNED_ACCESS_OK_H
-
-#include <linux/kernel.h>
-#include <asm/byteorder.h>
-
-static __always_inline u16 get_unaligned_le16(const void *p)
-{
- return le16_to_cpup((__le16 *)p);
-}
-
-static __always_inline u32 get_unaligned_le32(const void *p)
-{
- return le32_to_cpup((__le32 *)p);
-}
-
-static __always_inline u64 get_unaligned_le64(const void *p)
-{
- return le64_to_cpup((__le64 *)p);
-}
-
-static __always_inline u16 get_unaligned_be16(const void *p)
-{
- return be16_to_cpup((__be16 *)p);
-}
-
-static __always_inline u32 get_unaligned_be32(const void *p)
-{
- return be32_to_cpup((__be32 *)p);
-}
-
-static __always_inline u64 get_unaligned_be64(const void *p)
-{
- return be64_to_cpup((__be64 *)p);
-}
-
-static __always_inline void put_unaligned_le16(u16 val, void *p)
-{
- *((__le16 *)p) = cpu_to_le16(val);
-}
-
-static __always_inline void put_unaligned_le32(u32 val, void *p)
-{
- *((__le32 *)p) = cpu_to_le32(val);
-}
-
-static __always_inline void put_unaligned_le64(u64 val, void *p)
-{
- *((__le64 *)p) = cpu_to_le64(val);
-}
-
-static __always_inline void put_unaligned_be16(u16 val, void *p)
-{
- *((__be16 *)p) = cpu_to_be16(val);
-}
-
-static __always_inline void put_unaligned_be32(u32 val, void *p)
-{
- *((__be32 *)p) = cpu_to_be32(val);
-}
-
-static __always_inline void put_unaligned_be64(u64 val, void *p)
-{
- *((__be64 *)p) = cpu_to_be64(val);
-}
-
-#endif /* _LINUX_UNALIGNED_ACCESS_OK_H */
--
2.29.2


2021-05-14 20:03:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

On Fri, May 14, 2021 at 3:02 AM Arnd Bergmann <[email protected]> wrote:
>
> I've included this version in the asm-generic tree for 5.14 already,
> addressing the few issues that were pointed out in the RFC. If there
> are any remaining problems, I hope those can be addressed as follow-up
> patches.

This continues to look great to me, and now has the even simpler
remaining implementation.

I'd be tempted to just pull it in for 5.13, but I guess we don't
actually have any _outstanding_ bug in this area (the bug was in our
zlib code, required -O3 to trigger, has been fixed now, and the biggy
case didn't even use "get_unaligned()").

So I guess your 5.14 timing is the right thing to do.

Linus

2021-05-14 20:30:50

by Vineet Gupta

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

On 5/14/21 12:22 PM, Linus Torvalds wrote:
> On Fri, May 14, 2021 at 11:52 AM Vineet Gupta
> <[email protected]> wrote:
>> Wasn't the new zlib code slated for 5.14. I don't see it in your master yet
> You're right, I never actually committed it, since it was specific to
> ARC and -O3

Well, not really, the issue manifested in ARC O3 testing, but I showed
the problem existed for arm64 gcc too.

> and I wasn't entirely happy with the amount of testing it
> got (with Heiko pointing out that the s390 stuff needed more fixes for
> the change).

With his addon patch everything seemed hunky dory.

> The patch below is required on top of your patch to make it compile
> for s390 as well.
> Tested with kernel image decompression, and also btrfs with file
> compression; both software and hardware compression.
> Everything seems to work.

> So in fact it's not even queued up for 5.14 due to this all, I just dropped it.

But Why. Can't we throw it in linux-next for 5.14. I promise to test it
- and will likely hit any corner cases. Also for the time being we could
force just that file/files to build for -O3 to stress test the aspects
that were fragile.

>>> and the biggy
>>> case didn't even use "get_unaligned()").
>> Indeed this series is sort of orthogonal to that bug, but IMO that bug
>> still exists in 5.13 for -O3 build, granted that is not enabled for !ARC.
> Right, the zlib bug is still there.
>
> But Arnd's series wouldn't even fix it: right now inffast has its own
> - ugly and slow - special 2-byte-only version of "get_unaligned()",
> called "get_unaligned16()".

I know that's why said they are orthogonal.


> And because it's ugly and slow, it's not actually used for
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
>
> Vineet - maybe the fix is to not take my patch to update to a newer
> zlib, but to just fix inffast to use the proper get_unaligned(). Then
> Arnd's series _would_ actually fix all this..

OK if you say so.

-Vineet

2021-05-14 20:49:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

On Fri, May 14, 2021 at 12:45 PM Vineet Gupta
<[email protected]> wrote:
>
> Well, not really, the issue manifested in ARC O3 testing, but I showed
> the problem existed for arm64 gcc too.

.. but not with a supported kernel configuration.

> > So in fact it's not even queued up for 5.14 due to this all, I just dropped it.
>
> But Why.

I just didn't have time to deal with it during the merge window. If
you keep it alive, that's all fine and good.

Linus

2021-05-19 05:55:22

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

On Fri, May 14, 2021 at 12:00:55PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <[email protected]>
>
> As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected
> behavior when faced with overlapping unaligned pointers. The kernel's
> unaligned/access-ok.h header technically invokes undefined behavior
> that happens to usually work on the architectures using it, but if the
> compiler optimizes code based on the assumption that undefined behavior
> doesn't happen, it can create output that actually causes data corruption.
>
> A related problem was previously found on 32-bit ARMv7, where most
> instructions can be used on unaligned data, but 64-bit ldrd/strd causes
> an exception. The workaround was to always use the unaligned/le_struct.h
> helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM:
> 8715/1: add a private asm/unaligned.h").
>
> The same solution should work on all other architectures as well, so
> remove the access-ok.h variant and use the other one unconditionally on
> all architectures, picking either the big-endian or little-endian version.

FYI, gcc 10 had a bug where it miscompiled code that uses "packed structs" to
copy between overlapping unaligned pointers
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).

I'm not sure whether the kernel will run into that or not, and gcc has since
fixed it. But it's worth mentioning, especially since the issue mentioned in
this commit sounds very similar (overlapping unaligned pointers), and both
involved implementations of DEFLATE decompression.

Anyway, partly due to the above, in userspace I now only use memcpy() to
implement {get,put}_unaligned_*, since these days it seems to be compiled
optimally and have the least amount of problems.

I wonder if the kernel should do the same, or whether there are still cases
where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but
it was fixed in gcc 6.

- Eric

2021-05-19 14:13:32

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

On Mon, May 17, 2021 at 11:53 PM Eric Biggers <[email protected]> wrote:
> On Fri, May 14, 2021 at 12:00:55PM +0200, Arnd Bergmann wrote:
> > From: Arnd Bergmann <[email protected]>
> >
> > As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected
> > behavior when faced with overlapping unaligned pointers. The kernel's
> > unaligned/access-ok.h header technically invokes undefined behavior
> > that happens to usually work on the architectures using it, but if the
> > compiler optimizes code based on the assumption that undefined behavior
> > doesn't happen, it can create output that actually causes data corruption.
> >
> > A related problem was previously found on 32-bit ARMv7, where most
> > instructions can be used on unaligned data, but 64-bit ldrd/strd causes
> > an exception. The workaround was to always use the unaligned/le_struct.h
> > helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM:
> > 8715/1: add a private asm/unaligned.h").
> >
> > The same solution should work on all other architectures as well, so
> > remove the access-ok.h variant and use the other one unconditionally on
> > all architectures, picking either the big-endian or little-endian version.
>
> FYI, gcc 10 had a bug where it miscompiled code that uses "packed structs" to
> copy between overlapping unaligned pointers
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).

Thank you for pointing this out

> I'm not sure whether the kernel will run into that or not, and gcc has since
> fixed it. But it's worth mentioning, especially since the issue mentioned in
> this commit sounds very similar (overlapping unaligned pointers), and both
> involved implementations of DEFLATE decompression.

I tried reproducing this on the kernel deflate code with the kernel.org gcc-10.1
and gcc-10.3 crosstool versions but couldn't quite get there with Vineet's
preprocessed source https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

Trying with both the original get_unaligned() version in there and the
packed-struct
variant, I get the same output from gcc-10.1 and gcc-10.3 when I compile those
myself for arc hs4x , but it's rather different from the output that Vineet got
and I don't know how to spot whether the problem exists in any of those
versions.

> Anyway, partly due to the above, in userspace I now only use memcpy() to
> implement {get,put}_unaligned_*, since these days it seems to be compiled
> optimally and have the least amount of problems.
>
> I wonder if the kernel should do the same, or whether there are still cases
> where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but
> it was fixed in gcc 6.

It would have to be memmove(), not memcpy() in this case, right?
My feeling is that if gcc-4.9 and gcc-5 produce correct but slightly slower
code, we can live with that, unlike the possibility of gcc-10.{1,2} producing
incorrect code.

Since the new asm/unaligned.h has a single implementation across all
architectures, we could probably fall back to a memmove based version for
the compilers affected by the 94994 bug, but I'd first need to have a better
way to test regarding whether given combination of asm/unaligned.h and
compiler version runs into this bug.

I have checked your reproducer and confirmed that it does affect x86_64
gcc-10.1 -O3 with my proposed version of asm-generic/unaligned.h, but
does not trigger on any other version (4.9 though 9.3, 10.3 or 11.1), and not
on -O2 or "-O3 -mno-sse" builds or on arm64, but that doesn't necessarily
mean it's safe on these.

Arnd

2021-05-19 18:19:55

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann <[email protected]> wrote:
> >
> > I wonder if the kernel should do the same, or whether there are still cases
> > where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but
> > it was fixed in gcc 6.
>
> It would have to be memmove(), not memcpy() in this case, right?

No, it would simply be something like

#define __get_unaligned_t(type, ptr) \
({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; })

#define get_unaligned(ptr) \
__get_unaligned_t(typeof(*(ptr)), ptr)

but honestly, the likelihood that the compiler generates something
horrible (possibly because of KASAN etc) is uncomfortably high.

I'd prefer the __packed thing. We don't actually use -O3, and it's
considered a bad idea, and the gcc bug is as such less likely than
just the above generating unacceptable code (we have several cases
where "bad code generation" ends up being an actual bug, since we
depend on inlining and depend on some code sequences not generating
calls etc).

But I hate how gcc is buggy in so many places here, and the
straightforward thing is made to explicitly not work.

I absolutely despise compiler people who think it's ok to generate
known bad code based on pointless "undefined behavior" arguments - and
then those same clever optimizations break even when you do things
properly. It's basically intellectual dishonesty - doing known
fragile things, blaming the user when it breaks, but then not
acknowledging that the fragile shit they did was broken even when the
user bent over backwards.

Linus

2021-05-19 18:20:17

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

On Tue, May 18, 2021 at 4:56 PM Linus Torvalds
<[email protected]> wrote:
>
> On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann <[email protected]> wrote:
> > >
> > > I wonder if the kernel should do the same, or whether there are still cases
> > > where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but
> > > it was fixed in gcc 6.
> >
> > It would have to be memmove(), not memcpy() in this case, right?
>
> No, it would simply be something like
>
> #define __get_unaligned_t(type, ptr) \
> ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; })
>
> #define get_unaligned(ptr) \
> __get_unaligned_t(typeof(*(ptr)), ptr)
>
> but honestly, the likelihood that the compiler generates something
> horrible (possibly because of KASAN etc) is uncomfortably high.
>
> I'd prefer the __packed thing. We don't actually use -O3, and it's
> considered a bad idea, and the gcc bug is as such less likely than
> just the above generating unacceptable code (we have several cases
> where "bad code generation" ends up being an actual bug, since we
> depend on inlining and depend on some code sequences not generating
> calls etc).

I think the important question is whether we know that the bug that Eric
pointed to can only happen with -O3, or whether it is something in
gcc-10.1 that got triggered by -O3 -msse on x86-64 but could equally
well get triggered on some other architecture without -O3 or
vector instructions enabled.

From the gcc fix at
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9fa5b473b5b8e289b
it looks like this code path is entered when compiling with
-ftree-loop-vectorize, which is documented as

'-ftree-loop-vectorize'
Perform loop vectorization on trees. This flag is enabled by
default at '-O3' and by '-ftree-vectorize', '-fprofile-use', and
'-fauto-profile'.

-ftree-vectorize is set in arch/arm/lib/xor-neon.c
-O3 is set for the lz4 and zstd compression helpers and for wireguard.

To be on the safe side, we could pass -fno-tree-loop-vectorize along
with -O3 on the affected gcc versions, or use a bigger hammer
(not use -O3 at all, always set -fno-tree-loop-vectorize, ...).

Arnd

2021-05-19 18:24:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

On Tue, May 18, 2021 at 5:42 AM Arnd Bergmann <[email protected]> wrote:
>
> To be on the safe side, we could pass -fno-tree-loop-vectorize along
> with -O3 on the affected gcc versions, or use a bigger hammer
> (not use -O3 at all, always set -fno-tree-loop-vectorize, ...).

I personally think -O3 in general is unsafe.

It has historically been horribly buggy. It's gotten better, but this
case clearly shows that "gotten better" really isn't that high of a
bar.

Very few projects use -O3, which is obviously part of why it's buggy.
But the other part of why it's buggy is that vectorization is simply
very complicated, and honestly, judging by the last report the gcc
people don't care about being careful. They literally are ok with
knowingly generating an off-by-one range check, because "it's
undefined behavior".

With that kind of mentality, I'm not personally all that inclined to
say "sure, use -O3". We know it has bugs even for the well-defined
cases.

> -O3 is set for the lz4 and zstd compression helpers and for wireguard.

I'm actually surprised wireguard would use -O3. Yes, performance is
important. But for wireguard, correctness is certainly important too.
Maybe Jason isn't aware of just how bad gcc -O3 has historically been?

And -O3 has often generated _slower_ code, in addition to the bugs.
It's not like it's a situation where "-O3 is obviously better than
-O2". There's a reason -O2 is the default.

And that tends to be even more true in the kernel than in many user
space programs (ie smaller loops, generally much higher I$ miss rates
etc).

Jason? How big of a deal is that -O3 for wireguard wrt the normal -O2?
There are known buggy gcc versions that aren't ancient.

Of the other cases, that xor-neon.c case actually makes sense. For
that file, it literally exists _only_ to get a vectorized version of
the trivial xor_8regs loop. It's one of the (very very few) cases of
vectorization we actually want. And in that case, we might even want
to make things easier - and more explicit - for the compiler by making
the xor_8regs loops use "restrict" pointers.

That neon case actually wants and needs that tree-vectorization to
DTRT. But maybe it doesn't need the actual _loop_ vectorization? The
xor_8regs code is literally using hand-unrolled loops already, exactly
to make it as simple as possible for the compiler (but the lack of
"restrict" pointers means that it's not all that simple after all, and
I assume the compiler generates conditionals for the NEON case?

lz4 is questionable - yes, upstream lh4 seems to use -O3 (good), but
it also very much uses unaligned accesses, which is where the gcc bug
hits. I doubt that it really needs or wants the loop vectorization.

zstd looks very similar to lz4.

End result: at a minimum, I'd suggest using
"-fno-tree-loop-vectorize", although somebody should check that NEON
case.

And I still think that using O3 for anything halfway complicated
should be considered odd and need some strong numbers to enable.

Linus

2021-05-19 18:28:49

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

Hi Linus,

On Tue, May 18, 2021 at 6:12 PM Linus Torvalds
<[email protected]> wrote:
> I'm actually surprised wireguard would use -O3. Yes, performance is
> important. But for wireguard, correctness is certainly important too.
> Maybe Jason isn't aware of just how bad gcc -O3 has historically been?
> Jason? How big of a deal is that -O3 for wireguard wrt the normal -O2?
> There are known buggy gcc versions that aren't ancient.

My impression has always been that O3 might sometimes generate slower
code, but not that it generates buggy code so commonly. Thanks for
letting me know.

I have a habit of compulsively run IDA Pro after making changes (brain
damage from too many years as a "security person" or something), to
see what the compiler did, and I've just been doing that with O3 since
the beginning of the project, so that's what I wound up optimizing
for. Or sometimes I'll work little things out in Godbolt's compiler
explorer. It's not like it matters much most of the time, but
sometimes I enjoy the golf. Anyway, I've never noticed it producing
any clearly wrong code compared to O2. But I'm obviously not testing
on all compilers or on all architectures. So if you think there's
danger lurking somewhere, it seems reasonable to change that to O2.

Comparing gcc 11's output between O2 and O3, it looks like the primary
difference is that the constant propagation is much less aggressive
with O2, and less inlining in general also means that some stores and
loads to local variables across static function calls aren't being
coalesced. A few null checks are removed too, where the compiler can
prove them away.

So while I've never seen issues with that code under O3, I don't see a
super compelling speed up anywhere either, but rather a bunch of
places that may or may not be theoretically faster or slower on some
system, maybe. I can queue up a patch for the next wireguard series I
send to Dave.

Jason

2021-05-19 18:35:28

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

On Tue, May 18, 2021 at 6:12 PM Linus Torvalds
<[email protected]> wrote:
> On Tue, May 18, 2021 at 5:42 AM Arnd Bergmann <[email protected]> wrote:
>
> Of the other cases, that xor-neon.c case actually makes sense. For
> that file, it literally exists _only_ to get a vectorized version of
> the trivial xor_8regs loop. It's one of the (very very few) cases of
> vectorization we actually want. And in that case, we might even want
> to make things easier - and more explicit - for the compiler by making
> the xor_8regs loops use "restrict" pointers.
>
> That neon case actually wants and needs that tree-vectorization to
> DTRT. But maybe it doesn't need the actual _loop_ vectorization? The
> xor_8regs code is literally using hand-unrolled loops already, exactly
> to make it as simple as possible for the compiler (but the lack of
> "restrict" pointers means that it's not all that simple after all, and
> I assume the compiler generates conditionals for the NEON case?

Right, I think there is an ongoing debate over how to best handle this
one in clang, since that does not do any vectorization for this file
unless the pointers are marked "restrict". As far as I remember, there
are some callers that want to do the xor in place though, which
means restrict is wrong.

> lz4 is questionable - yes, upstream lh4 seems to use -O3 (good), but
> it also very much uses unaligned accesses, which is where the gcc bug
> hits. I doubt that it really needs or wants the loop vectorization.

I ran some limited speed tests with the lz4 sources that come with Ubuntu,
using gcc-10.3 on an AMD Zen1 Threadripper with 10GB of /dev/urandom
input.
This package patches the sources to use -O2 and no vectorization,
which turns out to be the fastest combination for my stupid test as well.

The results are barely above noise, but it appears that -O2
-ftree-loop-vectorize
makes it slightly slower than just -O2, while -O3 is even slower than
that regardless of -fno-tree-loop-vectorize/-ftree-loop-vectorize.

I see that Nobuhiro Iwamatsu (Cc'd) changed the Debian lz4 package to
use -O2, but I don't see an explanation for it. I also see that the lz4 sources
force -O2 on ppc64 because -O3 causes a 30% slowdown from vectorization.
The kernel version is missing the bit that does this.

> zstd looks very similar to lz4.

> End result: at a minimum, I'd suggest using
> "-fno-tree-loop-vectorize", although somebody should check that NEON
> case.

> And I still think that using O3 for anything halfway complicated
> should be considered odd and need some strong numbers to enable.

Agreed. I think there is a fairly strong case for just using -O2 on lz4
and backport that to stable.
Searching for lz4 bugs with -O3 also finds several reports including
one that I sent myself:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702

I see that user space zstd is built with -O3 in Debian, but it the changelog
also lists "Improved : better speed on clang and gcc -O2, thanks to Eric
Biggers", so maybe Eric has some useful ideas on whether we should
just use -O2 for the in-kernel version.

Arnd

2021-05-19 18:36:15

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

From: Linus Torvalds
> Sent: 18 May 2021 15:56
>
> On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann <[email protected]> wrote:
> > >
> > > I wonder if the kernel should do the same, or whether there are still cases
> > > where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but
> > > it was fixed in gcc 6.
> >
> > It would have to be memmove(), not memcpy() in this case, right?
>
> No, it would simply be something like
>
> #define __get_unaligned_t(type, ptr) \
> ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; })

You still need something to ensure that gcc can't assume that
'ptr' has an aligned type.
If there is an 'int *ptr' visible in the call chain no amount
of (void *) casts will make gcc forget the alignment.
So the memcpy() will get converted to an aligned load-store pair.
(This has always caused grief on sparc.)

A cast though (long) might be enough, as might a cast to a __packed
struct pointer type.
Using a union of the two pointer types might be ok - but might
generate a store/load to stack.
An alternative is an asm statement with input and output of different
pointer types but using the same register for both.
That ought to force the compile to forget any tracked type
and value.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-05-19 18:37:04

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

On Tue, May 18, 2021 at 10:51:23PM +0200, Arnd Bergmann wrote:
>
> > zstd looks very similar to lz4.
>
> > End result: at a minimum, I'd suggest using
> > "-fno-tree-loop-vectorize", although somebody should check that NEON
> > case.
>
> > And I still think that using O3 for anything halfway complicated
> > should be considered odd and need some strong numbers to enable.
>
> Agreed. I think there is a fairly strong case for just using -O2 on lz4
> and backport that to stable.
> Searching for lz4 bugs with -O3 also finds several reports including
> one that I sent myself:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702
>
> I see that user space zstd is built with -O3 in Debian, but it the changelog
> also lists "Improved : better speed on clang and gcc -O2, thanks to Eric
> Biggers", so maybe Eric has some useful ideas on whether we should
> just use -O2 for the in-kernel version.
>

In my opinion, -O2 is a good default even for compression code. I generally
don't see any benefit from -O3 in compression code I've written.

That being said, -O2 is what I usually use during development. Other people
could write code that relies on -O3 to be optimized well.

The Makefiles for lz4 and zstd use -O3 by default, which is a little concerning.
I do expect that they're still well-written enough to do well with -O2 too, but
it would require doing benchmarks to tell for sure. (As Arnd noted, it happens
that I did do such benchmarks on zstd about 5 years ago, and I found an issue
where some functions weren't marked inline when they should be, causing them to
be inlined at -O3 but not at -O2. That got fixed.)

- Eric

2021-12-16 17:29:56

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

Hi Arnd,

(replying to an old thread as this came up in the discussion regarding
misaligned loads and stored in siphash() when compiled for ARM
[f7e5b9bfa6c8820407b64eabc1f29c9a87e8993d])

On Fri, 14 May 2021 at 12:02, Arnd Bergmann <[email protected]> wrote:
>
> From: Arnd Bergmann <[email protected]>
>
> The get_unaligned()/put_unaligned() helpers are traditionally architecture
> specific, with the two main variants being the "access-ok.h" version
> that assumes unaligned pointer accesses always work on a particular
> architecture, and the "le-struct.h" version that casts the data to a
> byte aligned type before dereferencing, for architectures that cannot
> always do unaligned accesses in hardware.
>
> Based on the discussion linked below, it appears that the access-ok
> version is not realiable on any architecture, but the struct version
> probably has no downsides. This series changes the code to use the
> same implementation on all architectures, addressing the few exceptions
> separately.
>
> I've included this version in the asm-generic tree for 5.14 already,
> addressing the few issues that were pointed out in the RFC. If there
> are any remaining problems, I hope those can be addressed as follow-up
> patches.
>

I think this series is a huge improvement, but it does not solve the
UB problem completely. As we found, there are open issues in the GCC
bugzilla regarding assumptions in the compiler that aligned quantities
either overlap entirely or not at all. (e.g.,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363)

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used in many places to
conditionally emit code that violates C alignment rules. E.g., there
is this example in Documentation/core-api/unaligned-memory-access.rst:

bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
{
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
return fold == 0;
#else
...

(which now deviates from its actual implementation, but the point is
the same) where CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used in the
wrong way (IMHO).

The pattern seems to be

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
// ignore alignment rules, just cast to a more aligned pointer type
#else
// use unaligned accessors, which could be either cheap or expensive,
// depending on the architecture
#endif

whereas the following pattern makes more sense, I think, and does not
violate any C rules in the common case:

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
// use unaligned accessors, which are cheap or even entirely free
#else
// avoid unaligned accessors, as they are expensive; instead, reorganize
// the data so we don't need them (similar to setting NET_IP_ALIGN to 2)
#endif

The only remaining problem here is reinterpreting a char* pointer to a
u32*, e.g., for accessing the IP address in an Ethernet frame when
NET_IP_ALIGN == 2, which could suffer from the same UB problem again,
as I understand it.

In the 32-bit ARM case (v6+) [which is admittedly an outlier] this
makes a substantial difference, as ARMv6 does have efficient unaligned
accessors (load/store word or halfword may be used on misaligned
addresses) but requires that load/store double-word and load/store
multiple are only used on 32-bit aligned addresses. GCC does the right
thing with the unaligned accessors, but blindly casting away
misalignment may result in alignment traps if the compiler happened to
emit load-double or load-multiple instructions for the memory access
in question.

Jason already verifed that in the siphash() case, the aligned and
unaligned versions of the code actually compile to the same machine
code on x86, as the unaligned accessors just disappear. I suspect this
to be the case for many instances where
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is being used, mostly in the
networking stack.

So I intend to dig a bit deeper into this, and perhaps propose some
changes where the interpretation of
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is documented more clearly, and
tweaked according to my suggestion above (while ensuring that codegen
does not suffer, of course)

Thoughts, concerns, objections?


--
Ard.




> Link: https://lore.kernel.org/lkml/[email protected]/
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
> Link: https://lore.kernel.org/lkml/[email protected]/
> Link: git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git unaligned-rework-v2
>
>
> Arnd Bergmann (13):
> asm-generic: use asm-generic/unaligned.h for most architectures
> openrisc: always use unaligned-struct header
> sh: remove unaligned access for sh4a
> m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> powerpc: use linux/unaligned/le_struct.h on LE power7
> asm-generic: unaligned: remove byteshift helpers
> asm-generic: unaligned always use struct helpers
> partitions: msdos: fix one-byte get_unaligned()
> apparmor: use get_unaligned() only for multi-byte words
> mwifiex: re-fix for unaligned accesses
> netpoll: avoid put_unaligned() on single character
> asm-generic: uaccess: 1-byte access is always aligned
> asm-generic: simplify asm/unaligned.h
>
> arch/alpha/include/asm/unaligned.h | 12 --
> arch/arm/include/asm/unaligned.h | 27 ---
> arch/ia64/include/asm/unaligned.h | 12 --
> arch/m68k/Kconfig | 1 +
> arch/m68k/include/asm/unaligned.h | 26 ---
> arch/microblaze/include/asm/unaligned.h | 27 ---
> arch/mips/crypto/crc32-mips.c | 2 +-
> arch/openrisc/include/asm/unaligned.h | 47 -----
> arch/parisc/include/asm/unaligned.h | 6 +-
> arch/powerpc/include/asm/unaligned.h | 22 ---
> arch/sh/include/asm/unaligned-sh4a.h | 199 --------------------
> arch/sh/include/asm/unaligned.h | 13 --
> arch/sparc/include/asm/unaligned.h | 11 --
> arch/x86/include/asm/unaligned.h | 15 --
> arch/xtensa/include/asm/unaligned.h | 29 ---
> block/partitions/ldm.h | 2 +-
> block/partitions/msdos.c | 2 +-
> drivers/net/wireless/marvell/mwifiex/pcie.c | 10 +-
> include/asm-generic/uaccess.h | 4 +-
> include/asm-generic/unaligned.h | 141 +++++++++++---
> include/linux/unaligned/access_ok.h | 68 -------
> include/linux/unaligned/be_byteshift.h | 71 -------
> include/linux/unaligned/be_memmove.h | 37 ----
> include/linux/unaligned/be_struct.h | 37 ----
> include/linux/unaligned/generic.h | 115 -----------
> include/linux/unaligned/le_byteshift.h | 71 -------
> include/linux/unaligned/le_memmove.h | 37 ----
> include/linux/unaligned/le_struct.h | 37 ----
> include/linux/unaligned/memmove.h | 46 -----
> net/core/netpoll.c | 4 +-
> security/apparmor/policy_unpack.c | 2 +-
> 31 files changed, 131 insertions(+), 1002 deletions(-)
> delete mode 100644 arch/alpha/include/asm/unaligned.h
> delete mode 100644 arch/arm/include/asm/unaligned.h
> delete mode 100644 arch/ia64/include/asm/unaligned.h
> delete mode 100644 arch/m68k/include/asm/unaligned.h
> delete mode 100644 arch/microblaze/include/asm/unaligned.h
> delete mode 100644 arch/openrisc/include/asm/unaligned.h
> delete mode 100644 arch/powerpc/include/asm/unaligned.h
> delete mode 100644 arch/sh/include/asm/unaligned-sh4a.h
> delete mode 100644 arch/sh/include/asm/unaligned.h
> delete mode 100644 arch/sparc/include/asm/unaligned.h
> delete mode 100644 arch/x86/include/asm/unaligned.h
> delete mode 100644 arch/xtensa/include/asm/unaligned.h
> delete mode 100644 include/linux/unaligned/access_ok.h
> delete mode 100644 include/linux/unaligned/be_byteshift.h
> delete mode 100644 include/linux/unaligned/be_memmove.h
> delete mode 100644 include/linux/unaligned/be_struct.h
> delete mode 100644 include/linux/unaligned/generic.h
> delete mode 100644 include/linux/unaligned/le_byteshift.h
> delete mode 100644 include/linux/unaligned/le_memmove.h
> delete mode 100644 include/linux/unaligned/le_struct.h
> delete mode 100644 include/linux/unaligned/memmove.h
>
> --
> 2.29.2
>
> Cc: Amitkumar Karwar <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Florian Fainelli <[email protected]>
> Cc: Ganapathi Bhat <[email protected]>
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: James Morris <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: John Johansen <[email protected]>
> Cc: Jonas Bonn <[email protected]>
> Cc: Kalle Valo <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Rich Felker <[email protected]>
> Cc: "Richard Russon (FlatCap)" <[email protected]>
> Cc: Russell King <[email protected]>
> Cc: "Serge E. Hallyn" <[email protected]>
> Cc: Sharvari Harisangam <[email protected]>
> Cc: Stafford Horne <[email protected]>
> Cc: Stefan Kristiansson <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Vladimir Oltean <[email protected]>
> Cc: Xinming Hu <[email protected]>
> Cc: Yoshinori Sato <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
>

2021-12-16 17:43:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

On Thu, Dec 16, 2021 at 9:29 AM Ard Biesheuvel <[email protected]> wrote:
>
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used in many places to
> conditionally emit code that violates C alignment rules. E.g., there
> is this example in Documentation/core-api/unaligned-memory-access.rst:
>
> bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
> {
> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
> ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));
> return fold == 0;
> #else

It probably works fine in practice - the one case we had was really
pretty special, and about the vectorizer doing odd things.

But I think we should strive to convert these to use
"get_unaligned()", since code generation is fine. It still often makes
sense to have that test for the config variable, simply because the
approach might be different if we know unaligned accesses are slow.

So I'll happily take patches that do obvious conversions to
get_unaligned() where they make sense, but I don't think we should
consider this some huge hard requirement.

Linus

2021-12-16 17:49:36

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

From: Ard Biesheuvel
> Sent: 16 December 2021 17:30
>
> Hi Arnd,
>
> (replying to an old thread as this came up in the discussion regarding
> misaligned loads and stored in siphash() when compiled for ARM
> [f7e5b9bfa6c8820407b64eabc1f29c9a87e8993d])
>
> On Fri, 14 May 2021 at 12:02, Arnd Bergmann <[email protected]> wrote:
> >
> > From: Arnd Bergmann <[email protected]>
> >
> > The get_unaligned()/put_unaligned() helpers are traditionally architecture
> > specific, with the two main variants being the "access-ok.h" version
> > that assumes unaligned pointer accesses always work on a particular
> > architecture, and the "le-struct.h" version that casts the data to a
> > byte aligned type before dereferencing, for architectures that cannot
> > always do unaligned accesses in hardware.

I'm pretty sure the compiler is allowed to 'read through' that cast
and still do an aligned access.
It has always been hard to get the compiler to 'forget' about known/expected
alignment - typically trying to stop memcpy() faulting on sparc.
Real function calls are usually required - but LTO may scupper that.

> >
> > Based on the discussion linked below, it appears that the access-ok
> > version is not realiable on any architecture, but the struct version
> > probably has no downsides. This series changes the code to use the
> > same implementation on all architectures, addressing the few exceptions
> > separately.
> >
> > I've included this version in the asm-generic tree for 5.14 already,
> > addressing the few issues that were pointed out in the RFC. If there
> > are any remaining problems, I hope those can be addressed as follow-up
> > patches.
> >
>
> I think this series is a huge improvement, but it does not solve the
> UB problem completely. As we found, there are open issues in the GCC
> bugzilla regarding assumptions in the compiler that aligned quantities
> either overlap entirely or not at all. (e.g.,
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363)

I think we can stop the compiler merging unaligned requests by adding a byte-sized
memory barrier for the base address before and after the access.
That should still support complex addressing modes (esp on x86).

Another option is to do the misaligned access from within an asm statement.
While architecture dependant, it only really depends on the syntax of the ld/st
instruction.
The compiler can't merge those because it doesn't know whether the data is
'frobbed' before/after the memory access.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-12-16 19:15:42

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

On Thu, Dec 16, 2021 at 06:29:40PM +0100, Ard Biesheuvel wrote:
> I think this series is a huge improvement, but it does not solve the
> UB problem completely. As we found, there are open issues in the GCC
> bugzilla regarding assumptions in the compiler that aligned quantities
> either overlap entirely or not at all. (e.g.,
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363)

That isn't open, it was closed as INVALID back in May.

(Naturally) aligned quantities only overlap if they are the same datum.
This follows directly from the definition of (naturally) aligned. There
is no mystery here.

All unaligned data need to be marked up properly.

> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used in many places to
> conditionally emit code that violates C alignment rules.

Most of this is ABI, not C. It is the ABI that requires certain
alignments. Ignoring that plain does not work, but even if it would
you will end up with much slower generated code.

> whereas the following pattern makes more sense, I think, and does not
> violate any C rules in the common case:
>
> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> // use unaligned accessors, which are cheap or even entirely free
> #else
> // avoid unaligned accessors, as they are expensive; instead, reorganize
> // the data so we don't need them (similar to setting NET_IP_ALIGN to 2)
> #endif

Yes, this looks more reasonable.

> The only remaining problem here is reinterpreting a char* pointer to a
> u32*, e.g., for accessing the IP address in an Ethernet frame when
> NET_IP_ALIGN == 2, which could suffer from the same UB problem again,
> as I understand it.

The problem is never casting a pointer to pointer to character type, and
then later back to an appriopriate pointer type. These things are both
required to work. The problem always is accessing something as if it
was something of another type, which is not valid C. This however is
exactly what -fno-strict-aliasing allows, so that works as well.

But this does not have much to do with alignment.


Segher

2021-12-17 12:35:00

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

From: Segher Boessenkool
> Sent: 16 December 2021 18:56
...
> > The only remaining problem here is reinterpreting a char* pointer to a
> > u32*, e.g., for accessing the IP address in an Ethernet frame when
> > NET_IP_ALIGN == 2, which could suffer from the same UB problem again,
> > as I understand it.
>
> The problem is never casting a pointer to pointer to character type, and
> then later back to an appriopriate pointer type.
> These things are both required to work.

I think that is true of 'void *', not 'char *'.
'char' is special in that 'strict aliasing' doesn't apply to it.
(Which is actually a pain sometimes.)

> The problem always is accessing something as if it
> was something of another type, which is not valid C. This however is
> exactly what -fno-strict-aliasing allows, so that works as well.

IIRC the C language only allows you to have pointers to valid data items.
(Since they can only be generated by the & operator on a valid item.)
Indirecting any other pointer is probably UB!

This (sort of) allows the compiler to 'look through' casts to find
what the actual type is (or might be).
It can then use that information to make optimisation choices.
This has caused grief with memcpy() calls that are trying to copy
a structure that the coder knows is misaligned to an aligned buffer.

So while *(unaligned_ptr *)char_ptr probably has to work.
If the compiler can see *(unaligned_ptr *)(char *)int_ptr it can
assume the alignment of the 'int_ptr' and do a single aligned access.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2021-12-17 13:54:23

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] Unify asm/unaligned.h around struct helper

On Fri, Dec 17, 2021 at 12:34:53PM +0000, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 16 December 2021 18:56
> ...
> > > The only remaining problem here is reinterpreting a char* pointer to a
> > > u32*, e.g., for accessing the IP address in an Ethernet frame when
> > > NET_IP_ALIGN == 2, which could suffer from the same UB problem again,
> > > as I understand it.
> >
> > The problem is never casting a pointer to pointer to character type, and
> > then later back to an appriopriate pointer type.
> > These things are both required to work.
>
> I think that is true of 'void *', not 'char *'.

No, see 6.3.2.3/7. Both are allowed (and behave the same in fact).

> 'char' is special in that 'strict aliasing' doesn't apply to it.
> (Which is actually a pain sometimes.)

That has nothing to do with it. Yes, you can validly access any memory
as a character type, but that has nothing to do with what pointer casts
are allowed and which are not.

> > The problem always is accessing something as if it
> > was something of another type, which is not valid C. This however is
> > exactly what -fno-strict-aliasing allows, so that works as well.
>
> IIRC the C language only allows you to have pointers to valid data items.
> (Since they can only be generated by the & operator on a valid item.)

Not so. For example you are explicitly allowed to have pointers one
past the last element of an array (and do arithmetic on that!), and of
course null pointers are a thing.

C allows you to make up pointers from integers as well. This is
perfectly fine to do. Accessing anything via such pointers might well
be not standard C, of course.

> Indirecting any other pointer is probably UB!

If a pointer points to an object, indirecting it gives an lvalue of that
object. It does not matter how you got that pointer, all that matters
is that it points at a valid object.

> This (sort of) allows the compiler to 'look through' casts to find
> what the actual type is (or might be).
> It can then use that information to make optimisation choices.
> This has caused grief with memcpy() calls that are trying to copy
> a structure that the coder knows is misaligned to an aligned buffer.

This is 6.5/7.

Alignment is 6.2.8 but it doesn't actually come into play at all here.

> So while *(unaligned_ptr *)char_ptr probably has to work.

Only if the original pointer points to an object that is correct
(including correctly aligned) for such an lvalue.

> If the compiler can see *(unaligned_ptr *)(char *)int_ptr it can
> assume the alignment of the 'int_ptr' and do a single aligned access.

It is undefined behaviour to have an address in int_ptr that is not
correctly aligned for whatever type it points to.


Segher