From: Ard Biesheuvel
Date: Sun, 8 Nov 2020 21:14:50 +0100
Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization
To: Arvind Sankar
Cc: Arnd Bergmann, Adrian Ratiu, Nick Desaulniers, Russell King, Linux Kernel Mailing List, clang-built-linux, Nathan Chancellor, kernel@collabora.com, Linux ARM
In-Reply-To: <20201108180942.GA226037@rani.riverdale.lan>
References: <20201106051436.2384842-1-adrian.ratiu@collabora.com> <20201106051436.2384842-3-adrian.ratiu@collabora.com> <20201108174014.GA219672@rani.riverdale.lan> <20201108180942.GA226037@rani.riverdale.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 8 Nov 2020 at 19:10, Arvind Sankar wrote:
>
> On Sun, Nov 08, 2020 at 12:40:14PM -0500, Arvind Sankar wrote:
> > On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote:
> > > Due to a Clang bug [1] neon autoloop vectorization does not happen or
> > > happens badly with no gains and considering previous GCC experiences
> > > which generated unoptimized code which was worse than the default asm
> > > implementation, it is safer to default clang builds to the known good
> > > generic implementation.
> > >
> > > The kernel currently supports a minimum Clang version of v10.0.1, see
> > > commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1").
> > >
> > > When the bug gets eventually fixed, this commit could be reverted or,
> > > if the minimum clang version bump takes a long time, a warning could
> > > be added for users to upgrade their compilers like was done for GCC.
> > >
> > > [1] https://bugs.llvm.org/show_bug.cgi?id=40976
> > >
> > > Signed-off-by: Adrian Ratiu
> > > ---
> > >  arch/arm/include/asm/xor.h | 3 ++-
> > >  arch/arm/lib/Makefile      | 3 +++
> > >  arch/arm/lib/xor-neon.c    | 4 ++++
> > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> > > index aefddec79286..49937dafaa71 100644
> > > --- a/arch/arm/include/asm/xor.h
> > > +++ b/arch/arm/include/asm/xor.h
> > > @@ -141,7 +141,8 @@ static struct xor_block_template xor_block_arm4regs = {
> > >  NEON_TEMPLATES; \
> > >  } while (0)
> > >
> > > -#ifdef CONFIG_KERNEL_MODE_NEON
> > > +/* disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976 */
> > > +#if defined(CONFIG_KERNEL_MODE_NEON) && !defined(CONFIG_CC_IS_CLANG)
> > >
> > >  extern struct xor_block_template const xor_block_neon_inner;
> > >
> > > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> > > index 6d2ba454f25b..53f9e7dd9714 100644
> > > --- a/arch/arm/lib/Makefile
> > > +++ b/arch/arm/lib/Makefile
> > > @@ -43,8 +43,11 @@ endif
> > >  $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
> > >  $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
> > >
> > > +# disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976
> > > +ifndef CONFIG_CC_IS_CLANG
> > >  ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
> > >  NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
> > >  CFLAGS_xor-neon.o += $(NEON_FLAGS)
> > >  obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
> > >  endif
> > > +endif
> > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > index e1e76186ec23..84c91c48dfa2 100644
> > > --- a/arch/arm/lib/xor-neon.c
> > > +++ b/arch/arm/lib/xor-neon.c
> > > @@ -18,6 +18,10 @@ MODULE_LICENSE("GPL");
> > >   * Pull in the reference implementations while instructing GCC (through
> > >   * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> > >   * NEON instructions.
> > > +
> > > + * On Clang the loop vectorizer is enabled by default, but due to a bug
> > > + * (https://bugs.llvm.org/show_bug.cgi?id=40976) vectorization is broken
> > > + * so xor-neon is disabled in favor of the default reg implementations.
> > >   */
> > >  #ifdef CONFIG_CC_IS_GCC
> > >  #pragma GCC optimize "tree-vectorize"
> > > --
> > > 2.29.0
> > >
> >
> > It's actually a bad idea to use #pragma GCC optimize. This is basically
> > the same as tagging all the functions with __attribute__((optimize)),
> > which GCC does not recommend for production use, as it _replaces_
> > optimization options rather than appending to them, and has been
> > observed to result in dropping important compiler flags.
> >
> > There've been a few discussions recently around other such cases:
> > https://lore.kernel.org/lkml/20201028171506.15682-1-ardb@kernel.org/
> > https://lore.kernel.org/lkml/20201028081123.GT2628@hirez.programming.kicks-ass.net/
> >
> > For this file, given that it is supposed to use -ftree-vectorize for the
> > whole file anyway, is there any reason it's not just added to CFLAGS via
> > the Makefile? This seems to be the only use of pragma optimize in the
> > kernel.
>
> E.g., this shows that the pragma results in dropping -fno-strict-aliasing.
> https://godbolt.org/z/1nfrKT
>
> The first function does not use vectorization because s and s->a might
> alias.
>

Thanks, Arvind.

I wasn't aware of this issue at the time, but I agree that we should
replace the #pragma with a command-line option in this case. And given
that we already set CFLAGS_xor-neon.o in the Makefile, adding it there
would have been more straightforward to begin with.
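The userspace C sketch below makes Arvind's aliasing point concrete. It is not the code behind the godbolt link. The names struct buf, xor_into_reload() and xor_into_hoisted() are hypothetical and were chosen only for illustration. Whether a particular GCC version actually vectorizes either loop depends on that version and the flags in use, so treat the comments as the reasoning rather than as guaranteed compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container: the destination pointer itself lives in memory. */
struct buf {
	unsigned long *a;
};

/*
 * Reads s->a on every iteration. Under -fno-strict-aliasing (which the
 * kernel builds with), a store to s->a[i] is allowed to overwrite the
 * pointer s->a itself, so the compiler has to reload it after each store.
 * That can keep the loop from being vectorized. If a pragma silently drops
 * -fno-strict-aliasing, GCC may apply type-based aliasing rules instead
 * (an unsigned long store cannot modify an unsigned long * object) and
 * vectorize the loop.
 */
void xor_into_reload(struct buf *s, const unsigned long *b, size_t n)
{
	assert(s != NULL && b != NULL);
	for (size_t i = 0; i < n; i++)
		s->a[i] ^= b[i];
}

/*
 * Copies the pointer into a local first. No store in the loop can change
 * a local, so the address no longer has to be reloaded, whatever the
 * aliasing flags are.
 */
void xor_into_hoisted(struct buf *s, const unsigned long *b, size_t n)
{
	assert(s != NULL && b != NULL);
	unsigned long *a = s->a;

	for (size_t i = 0; i < n; i++)
		a[i] ^= b[i];
}
```

Hoisting the pointer into a local, as the second function does, avoids the issue in the source code itself. The approach agreed in this thread is different: keep the file's normal compiler flags and pass -ftree-vectorize through CFLAGS_xor-neon.o in arch/arm/lib/Makefile, instead of using #pragma GCC optimize.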