Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp904168imm; Mon, 9 Jul 2018 12:55:59 -0700 (PDT) X-Google-Smtp-Source: AAOMgpc7IymeQFvic8qh2k/MsSWZHzjajtvCvEHgzy+XvJgUeal8G0hkStsTlDx+UiV6NqjZBAiz X-Received: by 2002:a63:1811:: with SMTP id y17-v6mr13839961pgl.356.1531166159358; Mon, 09 Jul 2018 12:55:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1531166159; cv=none; d=google.com; s=arc-20160816; b=ZgLPMrDNC6xcLN4vzJoIJzVHdaNPOK+Wnt0x8idxVlLy/0au9bzHJEdnZDTL1L5AOj ril1LgdM6zkv9I17lFvQsWpLbitRS710/c02lwlxzqFvjugCeBKnOzGIkswLw9/vCh/g htH7z98c6fjFpyigyABMPZmzJzrDtn3GNn/5KKKnTXzOoalfvTe2YkmF16RvzHKRCIel iVCcpiEqS+2oQWXnNbkZ0k2CqJd3bJ+ZvwXc/nI/LwlPPBELyws5MpaZ97dBQFucny9h DqMIZqoU43yzWMVelUg2nn+PMzzy6En5zDTfemDxOviQkxyyLIgeI3pF+PXqpyCo3pVJ bSsg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:arc-authentication-results; bh=CcLXmmWB4ZN0TmZYCiBgO5TjX5fwyNFSvx1lRSl1nWY=; b=O2A6YzAv5KzIWYy5Jgrf9AECNCgOyp8/QyyTWXi4p70jVuUDPVVcXPBugwJZtkcJTq aOMn7NtLq5mRvS9YUJjUgQq+kJDVn7ZIQQDkinL92Z+hA0iUgAjpcJNXBDjaCTF1DNDY Qs7ec95AMykZNl/BTeEM62I3S8kyPmwsbjIrnNR/smJI20wnYELrdSeuhAFpzWBAMVli GDppNrdIlbVkwHeU/E/WMqOCtFddioyGSuzd6N61SXZJZ+OVnszsgbN9IDOzw/j9Dk2r hkysDvpZllJQBUUN7rqvwOfN2pboV5VhzQh2EIt4Tegoq/foYNY1bi5g1CtAiyZ6Atje oBkA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=nvidia.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p26-v6si14032992pgv.344.2018.07.09.12.55.45; Mon, 09 Jul 2018 12:55:59 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=nvidia.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933149AbeGITxz convert rfc822-to-8bit (ORCPT + 99 others); Mon, 9 Jul 2018 15:53:55 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:1635 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932893AbeGITxw (ORCPT ); Mon, 9 Jul 2018 15:53:52 -0400 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1, AES128-SHA) id ; Mon, 09 Jul 2018 12:53:49 -0700 Received: from HQMAIL108.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Mon, 09 Jul 2018 12:53:48 -0700 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Mon, 09 Jul 2018 12:53:48 -0700 Received: from [172.17.136.14] (172.17.136.14) by HQMAIL108.nvidia.com (172.18.146.13) with Microsoft SMTP Server (TLS) id 15.0.1347.2; Mon, 9 Jul 2018 19:53:51 +0000 Subject: Re: a question about IP checksum helper for arm64 To: Robin Murphy CC: , References: <141d76cc-d43b-5412-fb39-426d7c2261b9@nvidia.com> <6a3e91ff-ce9f-8120-732e-a33dad6d0146@arm.com> X-Nvconfidentiality: public From: Bo Yan Message-ID: <988c3225-a961-5de0-eb22-7366891f65da@nvidia.com> Date: Mon, 9 Jul 2018 12:53:50 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <6a3e91ff-ce9f-8120-732e-a33dad6d0146@arm.com> X-Originating-IP: [172.17.136.14] X-ClientProxiedBy: HQMAIL105.nvidia.com (172.20.187.12) To HQMAIL108.nvidia.com (172.18.146.13) Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Robin, That UBSAN error prompted me to check the generated instructions. The error by itself doesn't make sense to me because there is no requirement for 128b alignment on ldp/stp. With 4.18-rc3, when I build for the default "defconfig" in arch/arm64/configs/, I see the disassembled code shows two ldr instead of a ldp. One example is in function "ip_rcv" (/net/ipv4/ip_input.c). After disassembling vmlinux, I see following: if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl))) ffff0000089bcdb8: 38776b05 ldrb w5, [x24,x23] static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl) { __uint128_t tmp; u64 sum; tmp = *(const __uint128_t *)iph; ffff0000089bcdbc: f9400481 ldr x1, [x4,#8] ffff0000089bcdc0: f8776b00 ldr x0, [x24,x23] ffff0000089bcdc4: 12000ca5 and w5, w5, #0xf ffff0000089bcdc8: 510014a3 sub w3, w5, #0x5 This is done with "make ARCH=arm64 CROSS_COMPILE=... defconfig", so the default optimization level is -O2. I tried the same test as you did: aarch64-linux-gnu-objdump -S -d *.o in net/ipv4. The result is inconsistent. In some instances, I do see ldp instruction being generated, in some other cases, it's two ldr. For example, in inet_gro_receive and ip_mc_check_igmp, it's compiled as I expected. For ip_rcv, it's not. So it looks like this is not very consistent, but it also looks like in majority of cases it generates the ldp instructions. On 07/09/2018 04:54 AM, Robin Murphy wrote: > Hi Bo, > > On 06/07/18 17:27, Bo Yan wrote: >> Hi Robin, Luke, >> >> Recently I bumped into an error when running GCC undefined behavior >> sanitizer: >> >> UBSAN: Undefined behaviour in >> kernel-4.9/arch/arm64/include/asm/checksum.h:34:6 >>         load of misaligned address ffffffc198c8b254 for type 'const >> __int128 unsigned' >>         which requires 16 byte alignment > > What's your config and reproducer here? I've had UBSan enabled a few > times since that patch went in and never noticed anything. I've just > tried it with 4.18-rc3 and indeed don't see anything from just booting > the machine and making some network traffic. It does indeed fire if I > also turn on CONFIG_UBSAN_ALIGNMENT, but then it's almost lost among a > million other warnings for all manner of types - that's to be expected > since, as the help text says, "Enabling this option on architectures > that support unaligned accesses may produce a lot of false positives." > >> >> The relevant code: >> >>          tmp = *(const __uint128_t *)iph; >>          iph += 16; >>          ihl -= 4; >>          tmp += ((tmp >> 64) | (tmp << 64)); >>          sum = tmp >> 64; >>          do { >>                  sum += *(const u32 *)iph; >>                  iph += 4; >>          } while (--ihl); >> >> But, I checked the generated disassembly, it doesn't look like >> anything special is generated taking advantage of that. >> >> I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be >> emitted, but don't see it. > > My regular toolchain is currently Linaro 7.2.1-2017.11, but I also tried > the last GCC 6 I had installed (6.3.1-2017.05), and for both at -O2 I > see LDP emitted as expected for most of the identifiable int128 accesses > (both in a standalone test harness and a quick survey of kernel code via > 'aarch64-linux-gnu-objdump -S net/ipv4/*.o'). Of course, there may well > be places where the compiler gets clever enough to elide all or part of > that load where data is already held in registers - I've not audited > *that* closely - but the whole point of having a pure C implementation > is that it can be aggressively inlined more than inline asm ever could. > >> There were some prior discussions about GCC behavior, like this >> thread: https://patchwork.kernel.org/patch/9081911/ , in which you >> talked about the difference between GCC4 and GCC5.3. It looks to me >> this is regressed in Linaro GCC6.4 build. >> >> I have not checked newer GCC versions. >> >> Will it be more stable to just do this with inline assembly instead of >> relying on __uint128_t data type? >> >> GCC documentation says __int128 is supported for targets which have an >> integer mode wide enough to hold 128 bits. aarch64 doesn't have such >> an integer mode. > > Yet AArch64 GCC definitely does support __uint128_t, or this code > wouldn't even build ;) > > Robin. > >> >> Thanks >> >> Bo