Hi Robin, Luke,
Recently I bumped into an error when running GCC undefined behavior
sanitizer:
UBSAN: Undefined behaviour in
kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
load of misaligned address ffffffc198c8b254 for type 'const
__int128 unsigned'
which requires 16 byte alignment
The relevant code:
tmp = *(const __uint128_t *)iph;
iph += 16;
ihl -= 4;
tmp += ((tmp >> 64) | (tmp << 64));
sum = tmp >> 64;
do {
sum += *(const u32 *)iph;
iph += 4;
} while (--ihl);
But, I checked the generated disassembly, it doesn't look like anything
special is generated taking advantage of that.
I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be
emitted, but don't see it.
There were some prior discussions about GCC behavior, like this thread:
https://patchwork.kernel.org/patch/9081911/ , in which you talked about
the difference between GCC4 and GCC5.3. It looks to me this is regressed
in Linaro GCC6.4 build.
I have not checked newer GCC versions.
Will it be more stable to just do this with inline assembly instead of
relying on __uint128_t data type?
GCC documentation says __int128 is supported for targets which have an
integer mode wide enough to hold 128 bits. aarch64 doesn't have such an
integer mode.
Thanks
Bo
Hi Bo,
On 06/07/18 17:27, Bo Yan wrote:
> Hi Robin, Luke,
>
> Recently I bumped into an error when running GCC undefined behavior
> sanitizer:
>
> UBSAN: Undefined behaviour in
> kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
> load of misaligned address ffffffc198c8b254 for type 'const
> __int128 unsigned'
> which requires 16 byte alignment
What's your config and reproducer here? I've had UBSan enabled a few
times since that patch went in and never noticed anything. I've just
tried it with 4.18-rc3 and indeed don't see anything from just booting
the machine and making some network traffic. It does indeed fire if I
also turn on CONFIG_UBSAN_ALIGNMENT, but then it's almost lost among a
million other warnings for all manner of types - that's to be expected
since, as the help text says, "Enabling this option on architectures
that support unaligned accesses may produce a lot of false positives."
>
> The relevant code:
>
> tmp = *(const __uint128_t *)iph;
> iph += 16;
> ihl -= 4;
> tmp += ((tmp >> 64) | (tmp << 64));
> sum = tmp >> 64;
> do {
> sum += *(const u32 *)iph;
> iph += 4;
> } while (--ihl);
>
> But, I checked the generated disassembly, it doesn't look like anything
> special is generated taking advantage of that.
>
> I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be
> emitted, but don't see it.
My regular toolchain is currently Linaro 7.2.1-2017.11, but I also tried
the last GCC 6 I had installed (6.3.1-2017.05), and for both at -O2 I
see LDP emitted as expected for most of the identifiable int128 accesses
(both in a standalone test harness and a quick survey of kernel code via
'aarch64-linux-gnu-objdump -S net/ipv4/*.o'). Of course, there may well
be places where the compiler gets clever enough to elide all or part of
that load where data is already held in registers - I've not audited
*that* closely - but the whole point of having a pure C implementation
is that it can be aggressively inlined more than inline asm ever could.
> There were some prior discussions about GCC behavior, like this thread:
> https://patchwork.kernel.org/patch/9081911/ , in which you talked about
> the difference between GCC4 and GCC5.3. It looks to me this is regressed
> in Linaro GCC6.4 build.
>
> I have not checked newer GCC versions.
>
> Will it be more stable to just do this with inline assembly instead of
> relying on __uint128_t data type?
>
> GCC documentation says __int128 is supported for targets which have an
> integer mode wide enough to hold 128 bits. aarch64 doesn't have such an
> integer mode.
Yet AArch64 GCC definitely does support __uint128_t, or this code
wouldn't even build ;)
Robin.
>
> Thanks
>
> Bo
Hi Robin,
That UBSAN error prompted me to check the generated instructions. The
error by itself doesn't make sense to me because there is no requirement
for 128b alignment on ldp/stp.
With 4.18-rc3, when I build for the default "defconfig" in
arch/arm64/configs/, I see the disassembled code shows two ldr instead
of a ldp.
One example is in function "ip_rcv" (/net/ipv4/ip_input.c). After
disassembling vmlinux, I see following:
if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
ffff0000089bcdb8: 38776b05 ldrb w5, [x24,x23]
static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
{
__uint128_t tmp;
u64 sum;
tmp = *(const __uint128_t *)iph;
ffff0000089bcdbc: f9400481 ldr x1, [x4,#8]
ffff0000089bcdc0: f8776b00 ldr x0, [x24,x23]
ffff0000089bcdc4: 12000ca5 and w5, w5, #0xf
ffff0000089bcdc8: 510014a3 sub w3, w5, #0x5
This is done with "make ARCH=arm64 CROSS_COMPILE=... defconfig", so the
default optimization level is -O2.
I tried the same test as you did: aarch64-linux-gnu-objdump -S -d *.o
in net/ipv4. The result is inconsistent. In some instances, I do see ldp
instruction being generated, in some other cases, it's two ldr. For
example, in inet_gro_receive and ip_mc_check_igmp, it's compiled as I
expected. For ip_rcv, it's not.
So it looks like this is not very consistent, but it also looks like in
majority of cases it generates the ldp instructions.
On 07/09/2018 04:54 AM, Robin Murphy wrote:
> Hi Bo,
>
> On 06/07/18 17:27, Bo Yan wrote:
>> Hi Robin, Luke,
>>
>> Recently I bumped into an error when running GCC undefined behavior
>> sanitizer:
>>
>> UBSAN: Undefined behaviour in
>> kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
>> load of misaligned address ffffffc198c8b254 for type 'const
>> __int128 unsigned'
>> which requires 16 byte alignment
>
> What's your config and reproducer here? I've had UBSan enabled a few
> times since that patch went in and never noticed anything. I've just
> tried it with 4.18-rc3 and indeed don't see anything from just booting
> the machine and making some network traffic. It does indeed fire if I
> also turn on CONFIG_UBSAN_ALIGNMENT, but then it's almost lost among a
> million other warnings for all manner of types - that's to be expected
> since, as the help text says, "Enabling this option on architectures
> that support unaligned accesses may produce a lot of false positives."
>
>>
>> The relevant code:
>>
>> tmp = *(const __uint128_t *)iph;
>> iph += 16;
>> ihl -= 4;
>> tmp += ((tmp >> 64) | (tmp << 64));
>> sum = tmp >> 64;
>> do {
>> sum += *(const u32 *)iph;
>> iph += 4;
>> } while (--ihl);
>>
>> But, I checked the generated disassembly, it doesn't look like
>> anything special is generated taking advantage of that.
>>
>> I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be
>> emitted, but don't see it.
>
> My regular toolchain is currently Linaro 7.2.1-2017.11, but I also tried
> the last GCC 6 I had installed (6.3.1-2017.05), and for both at -O2 I
> see LDP emitted as expected for most of the identifiable int128 accesses
> (both in a standalone test harness and a quick survey of kernel code via
> 'aarch64-linux-gnu-objdump -S net/ipv4/*.o'). Of course, there may well
> be places where the compiler gets clever enough to elide all or part of
> that load where data is already held in registers - I've not audited
> *that* closely - but the whole point of having a pure C implementation
> is that it can be aggressively inlined more than inline asm ever could.
>
>> There were some prior discussions about GCC behavior, like this
>> thread: https://patchwork.kernel.org/patch/9081911/ , in which you
>> talked about the difference between GCC4 and GCC5.3. It looks to me
>> this is regressed in Linaro GCC6.4 build.
>>
>> I have not checked newer GCC versions.
>>
>> Will it be more stable to just do this with inline assembly instead of
>> relying on __uint128_t data type?
>>
>> GCC documentation says __int128 is supported for targets which have an
>> integer mode wide enough to hold 128 bits. aarch64 doesn't have such
>> an integer mode.
>
> Yet AArch64 GCC definitely does support __uint128_t, or this code
> wouldn't even build ;)
>
> Robin.
>
>>
>> Thanks
>>
>> Bo