From: Linus Torvalds
Date: Tue, 18 May 2021 06:12:03 -1000
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
To: Arnd Bergmann, "Jason A. Donenfeld"
Cc: Eric Biggers, linux-arch, Vineet Gupta, Russell King, Herbert Xu, "David S. Miller", Thomas Bogendoerfer, Linux ARM, Linux Kernel Mailing List, "open list:HARDWARE RANDOM NUMBER GENERATOR CORE", "open list:BROADCOM NVRAM DRIVER"
References: <20210514100106.3404011-1-arnd@kernel.org> <20210514100106.3404011-8-arnd@kernel.org>
X-Mailing-List: linux-crypto@vger.kernel.org

On Tue, May 18, 2021 at 5:42 AM Arnd Bergmann wrote:
>
> To be on the safe side, we could pass -fno-tree-loop-vectorize along
> with -O3 on the affected gcc versions, or use a bigger hammer
> (not use -O3 at all, always set -fno-tree-loop-vectorize, ...).

I personally think -O3 in general is unsafe. It has historically been
horribly buggy.
It's gotten better, but this case clearly shows that "gotten better"
really isn't that high of a bar.

Very few projects use -O3, which is obviously part of why it's buggy.
But the other part of why it's buggy is that vectorization is simply
very complicated, and honestly, judging by the last report the gcc
people don't care about being careful. They literally are ok with
knowingly generating an off-by-one range check, because "it's
undefined behavior".

With that kind of mentality, I'm not personally all that inclined to
say "sure, use -O3". We know it has bugs even for the well-defined
cases.

> -O3 is set for the lz4 and zstd compression helpers and for wireguard.

I'm actually surprised wireguard would use -O3. Yes, performance is
important. But for wireguard, correctness is certainly important too.
Maybe Jason isn't aware of just how bad gcc -O3 has historically been?

And -O3 has often generated _slower_ code, in addition to the bugs.
It's not like it's a situation where "-O3 is obviously better than
-O2". There's a reason -O2 is the default.

And that tends to be even more true in the kernel than in many user
space programs (ie smaller loops, generally much higher I$ miss rates
etc).

Jason? How big of a deal is that -O3 for wireguard wrt the normal -O2?
There are known buggy gcc versions that aren't ancient.

Of the other cases, that xor-neon.c case actually makes sense. For
that file, it literally exists _only_ to get a vectorized version of
the trivial xor_8regs loop. It's one of the (very very few) cases of
vectorization we actually want. And in that case, we might even want
to make things easier - and more explicit - for the compiler by making
the xor_8regs loops use "restrict" pointers.

That neon case actually wants and needs that tree-vectorization to
DTRT. But maybe it doesn't need the actual _loop_ vectorization?
The xor_8regs code is literally using hand-unrolled loops already,
exactly to make it as simple as possible for the compiler (but the
lack of "restrict" pointers means that it's not all that simple after
all, and I assume the compiler generates conditionals for the NEON
case?)

lz4 is questionable - yes, upstream lz4 seems to use -O3 (good), but
it also very much uses unaligned accesses, which is where the gcc bug
hits. I doubt that it really needs or wants the loop vectorization.

zstd looks very similar to lz4.

End result: at a minimum, I'd suggest using
"-fno-tree-loop-vectorize", although somebody should check that NEON
case.

And I still think that using -O3 for anything halfway complicated
should be considered odd and need some strong numbers to enable.

             Linus
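
A minimal sketch of the "restrict" idea for the xor_8regs loops, for
illustration only. The function name xor_8regs_2_restrict, the
WORDS_PER_LINE macro and the assert() precondition are assumptions made
for this example; it is not the kernel's actual code. The point is just
that, with both pointers marked "restrict", the compiler is told the
two buffers never overlap, so it should not need a runtime overlap
check before combining the eight hand-unrolled statements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: number of words handled per unrolled iteration. */
#define WORDS_PER_LINE 8

/*
 * XOR the buffer p2 into p1, eight words at a time.
 * "restrict" promises the compiler that p1 and p2 do not alias.
 * Hypothetical helper, modelled loosely on the xor_8regs idea.
 */
void xor_8regs_2_restrict(size_t bytes,
			  unsigned long *restrict p1,
			  const unsigned long *restrict p2)
{
	size_t lines = bytes / sizeof(unsigned long) / WORDS_PER_LINE;

	/* Sketch-only precondition: the length is a whole number of lines. */
	assert(bytes % (WORDS_PER_LINE * sizeof(unsigned long)) == 0);

	while (lines--) {
		/* Hand-unrolled body: straight-line code, no inner loop. */
		p1[0] ^= p2[0];
		p1[1] ^= p2[1];
		p1[2] ^= p2[2];
		p1[3] ^= p2[3];
		p1[4] ^= p2[4];
		p1[5] ^= p2[5];
		p1[6] ^= p2[6];
		p1[7] ^= p2[7];
		p1 += WORDS_PER_LINE;
		p2 += WORDS_PER_LINE;
	}
}
```

A file like this could be built with, for example, "gcc -O3
-fno-tree-loop-vectorize" to try the suggestion above; whether the
unrolled body still gets vectorized for NEON in that configuration is
exactly the thing that would need checking in the generated code.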