LinuxLists.cc - a question about IP checksum helper for arm64

2018-07-06 16:28:35

Subject: a question about IP checksum helper for arm64

Hi Robin, Luke,

Recently I bumped into an error when running GCC undefined behavior
sanitizer:

UBSAN: Undefined behaviour in
kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
load of misaligned address ffffffc198c8b254 for type 'const
__int128 unsigned'
which requires 16 byte alignment

The relevant code:

tmp = *(const __uint128_t *)iph;
iph += 16;
ihl -= 4;
tmp += ((tmp >> 64) | (tmp << 64));
sum = tmp >> 64;
do {
sum += *(const u32 *)iph;
iph += 4;
} while (--ihl);

But, I checked the generated disassembly, it doesn't look like anything
special is generated taking advantage of that.

I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be
emitted, but don't see it.

There were some prior discussions about GCC behavior, like this thread:
https://patchwork.kernel.org/patch/9081911/ , in which you talked about
the difference between GCC4 and GCC5.3. It looks to me this is regressed
in Linaro GCC6.4 build.

I have not checked newer GCC versions.

Will it be more stable to just do this with inline assembly instead of
relying on __uint128_t data type?

GCC documentation says __int128 is supported for targets which have an
integer mode wide enough to hold 128 bits. aarch64 doesn't have such an
integer mode.

Thanks

Bo

2018-07-09 11:55:22

by Robin Murphy

[permalink] [raw]

Subject: Re: a question about IP checksum helper for arm64

Hi Bo,

On 06/07/18 17:27, Bo Yan wrote:
> Hi Robin, Luke,
>
> Recently I bumped into an error when running GCC undefined behavior
> sanitizer:
>
> UBSAN: Undefined behaviour in
> kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
>        load of misaligned address ffffffc198c8b254 for type 'const
> __int128 unsigned'
>        which requires 16 byte alignment

What's your config and reproducer here? I've had UBSan enabled a few
times since that patch went in and never noticed anything. I've just
tried it with 4.18-rc3 and indeed don't see anything from just booting
the machine and making some network traffic. It does indeed fire if I
also turn on CONFIG_UBSAN_ALIGNMENT, but then it's almost lost among a
million other warnings for all manner of types - that's to be expected
since, as the help text says, "Enabling this option on architectures
that support unaligned accesses may produce a lot of false positives."

>
> The relevant code:
>
>         tmp = *(const __uint128_t *)iph;
>         iph += 16;
>         ihl -= 4;
>         tmp += ((tmp >> 64) | (tmp << 64));
>         sum = tmp >> 64;
>         do {
>                 sum += *(const u32 *)iph;
>                 iph += 4;
>         } while (--ihl);
>
> But, I checked the generated disassembly, it doesn't look like anything
> special is generated taking advantage of that.
>
> I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be
> emitted, but don't see it.

My regular toolchain is currently Linaro 7.2.1-2017.11, but I also tried
the last GCC 6 I had installed (6.3.1-2017.05), and for both at -O2 I
see LDP emitted as expected for most of the identifiable int128 accesses
(both in a standalone test harness and a quick survey of kernel code via
'aarch64-linux-gnu-objdump -S net/ipv4/*.o'). Of course, there may well
be places where the compiler gets clever enough to elide all or part of
that load where data is already held in registers - I've not audited
*that* closely - but the whole point of having a pure C implementation
is that it can be aggressively inlined more than inline asm ever could.

> There were some prior discussions about GCC behavior, like this thread:
> https://patchwork.kernel.org/patch/9081911/ , in which you talked about
> the difference between GCC4 and GCC5.3. It looks to me this is regressed
> in Linaro GCC6.4 build.
>
> I have not checked newer GCC versions.
>
> Will it be more stable to just do this with inline assembly instead of
> relying on __uint128_t data type?
>
> GCC documentation says __int128 is supported for targets which have an
> integer mode wide enough to hold 128 bits. aarch64 doesn't have such an
> integer mode.

Yet AArch64 GCC definitely does support __uint128_t, or this code
wouldn't even build ;)

Robin.

>
> Thanks
>
> Bo

2018-07-09 19:55:59

by Bo Yan

[permalink] [raw]

Subject: Re: a question about IP checksum helper for arm64

Hi Robin,

That UBSAN error prompted me to check the generated instructions. The
error by itself doesn't make sense to me because there is no requirement
for 128b alignment on ldp/stp.

With 4.18-rc3, when I build for the default "defconfig" in
arch/arm64/configs/, I see the disassembled code shows two ldr instead
of a ldp.

One example is in function "ip_rcv" (/net/ipv4/ip_input.c). After
disassembling vmlinux, I see following:

if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
ffff0000089bcdb8: 38776b05 ldrb w5, [x24,x23]
static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
{
__uint128_t tmp;
u64 sum;

tmp = *(const __uint128_t *)iph;
ffff0000089bcdbc: f9400481 ldr x1, [x4,#8]
ffff0000089bcdc0: f8776b00 ldr x0, [x24,x23]
ffff0000089bcdc4: 12000ca5 and w5, w5, #0xf
ffff0000089bcdc8: 510014a3 sub w3, w5, #0x5

This is done with "make ARCH=arm64 CROSS_COMPILE=... defconfig", so the
default optimization level is -O2.

I tried the same test as you did: aarch64-linux-gnu-objdump -S -d *.o
in net/ipv4. The result is inconsistent. In some instances, I do see ldp
instruction being generated, in some other cases, it's two ldr. For
example, in inet_gro_receive and ip_mc_check_igmp, it's compiled as I
expected. For ip_rcv, it's not.

So it looks like this is not very consistent, but it also looks like in
majority of cases it generates the ldp instructions.

On 07/09/2018 04:54 AM, Robin Murphy wrote:
> Hi Bo,
>
> On 06/07/18 17:27, Bo Yan wrote:
>> Hi Robin, Luke,
>>
>> Recently I bumped into an error when running GCC undefined behavior
>> sanitizer:
>>
>> UBSAN: Undefined behaviour in
>> kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
>>         load of misaligned address ffffffc198c8b254 for type 'const
>> __int128 unsigned'
>>         which requires 16 byte alignment
>
> What's your config and reproducer here? I've had UBSan enabled a few
> times since that patch went in and never noticed anything. I've just
> tried it with 4.18-rc3 and indeed don't see anything from just booting
> the machine and making some network traffic. It does indeed fire if I
> also turn on CONFIG_UBSAN_ALIGNMENT, but then it's almost lost among a
> million other warnings for all manner of types - that's to be expected
> since, as the help text says, "Enabling this option on architectures
> that support unaligned accesses may produce a lot of false positives."
>
>>
>> The relevant code:
>>
>>          tmp = *(const __uint128_t *)iph;
>>          iph += 16;
>>          ihl -= 4;
>>          tmp += ((tmp >> 64) | (tmp << 64));
>>          sum = tmp >> 64;
>>          do {
>>                  sum += *(const u32 *)iph;
>>                  iph += 4;
>>          } while (--ihl);
>>
>> But, I checked the generated disassembly, it doesn't look like
>> anything special is generated taking advantage of that.
>>
>> I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be
>> emitted, but don't see it.
>
> My regular toolchain is currently Linaro 7.2.1-2017.11, but I also tried
> the last GCC 6 I had installed (6.3.1-2017.05), and for both at -O2 I
> see LDP emitted as expected for most of the identifiable int128 accesses
> (both in a standalone test harness and a quick survey of kernel code via
> 'aarch64-linux-gnu-objdump -S net/ipv4/*.o'). Of course, there may well
> be places where the compiler gets clever enough to elide all or part of
> that load where data is already held in registers - I've not audited
> *that* closely - but the whole point of having a pure C implementation
> is that it can be aggressively inlined more than inline asm ever could.
>
>> There were some prior discussions about GCC behavior, like this
>> thread: https://patchwork.kernel.org/patch/9081911/ , in which you
>> talked about the difference between GCC4 and GCC5.3. It looks to me
>> this is regressed in Linaro GCC6.4 build.
>>
>> I have not checked newer GCC versions.
>>
>> Will it be more stable to just do this with inline assembly instead of
>> relying on __uint128_t data type?
>>
>> GCC documentation says __int128 is supported for targets which have an
>> integer mode wide enough to hold 128 bits. aarch64 doesn't have such
>> an integer mode.
>
> Yet AArch64 GCC definitely does support __uint128_t, or this code
> wouldn't even build ;)
>
> Robin.
>
>>
>> Thanks
>>
>> Bo