Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp2131007pxb; Sat, 7 Nov 2020 10:13:56 -0800 (PST) X-Google-Smtp-Source: ABdhPJxPBFBk+vPXAxSKLxIV3QXmbo84vkJwX60W2jSoepWHa6coj1OzfrKmOKcyQ/2EPlTSuCr/ X-Received: by 2002:a05:6402:b6e:: with SMTP id cb14mr7958795edb.308.1604772836593; Sat, 07 Nov 2020 10:13:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1604772836; cv=none; d=google.com; s=arc-20160816; b=rtY6n6+F/KPDaEuF0TkYv5wFN+bCAW6awPbWm5kmzmhF3aaFIcjO2Zqzow4v/IJfD4 nFGOuk+AA6EQq6uftr/gQgt7VTngzquJpWbzqa52fyi2AV/NBiNCBneYpMRQWh3mHtrl RD11PWhFkZydjr2zzlj9ZTHnt9NE37LwDONNyoy9sM70RSCR9p2fRqbsDXuN4S4v0vkd uxZc7mxLqj/DIbpxS+EmvWPbpmNHXcXes2MpQ9JNcLb6k1NkvFNLFnBTpAytBVr6TvW0 9NwXoow8ePNOivozeCtUv38ooLAr3SwBHg+7JjMRtaByf3QVATcuX3IuzXjwTYOX7xoS 9L0A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:references :in-reply-to:subject:cc:to:from; bh=VG0bxgy1qgcv5qc4TNNaIiTzUNsg+MWEbj6FogyrDbY=; b=pIMxxUcPD26i4imXDXASJXWRVG9M9QiHp+lP6J8RYkOHr11I1tYFFTJqct0AmVW+ki BT737HbnCU1SwakVay0T2H8YTsWqHt2ypu7e8T19VrqR8qT9H5pv7HU0+YB27gdMSnFF 1pvY5zpXnn66AljMkr/AUKF9Wp8lVm/xZAASq1JW+cBZ5G1+X8gmnHHiYgD7JwmEisJD qpsfZCNYVc9C6FSUl7NRQxe1DJlN+2y0S5FQLqF84vW6F4r1sKEsWripTPpFyOQACDip 8vRt367qvrFI097CJ9bc7b4Oq8sbBsKQgaCvY4PxalCng0ZTJnzGSp8NjFcENwSounUC zzSg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=collabora.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id c7si3689842edx.218.2020.11.07.10.13.32; Sat, 07 Nov 2020 10:13:56 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=collabora.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727871AbgKGSHz (ORCPT + 99 others); Sat, 7 Nov 2020 13:07:55 -0500 Received: from bhuna.collabora.co.uk ([46.235.227.227]:47912 "EHLO bhuna.collabora.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726333AbgKGSHz (ORCPT ); Sat, 7 Nov 2020 13:07:55 -0500 Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: aratiu) with ESMTPSA id 803CB1F45390 From: Adrian Ratiu To: Nick Desaulniers Cc: Nathan Chancellor , Arnd Bergmann , Linux ARM , clang-built-linux , Russell King , LKML , Collabora Kernel ML , Ard Biesheuvel Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization In-Reply-To: References: <20201106051436.2384842-1-adrian.ratiu@collabora.com> <20201106051436.2384842-3-adrian.ratiu@collabora.com> <20201106101419.GB3811063@ubuntu-m3-large-x86> <87wnyyvh56.fsf@collabora.com> Date: Sat, 07 Nov 2020 20:07:47 +0200 Message-ID: <87tuu1ujkc.fsf@collabora.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 06 Nov 2020, Nick Desaulniers wrote: > On Fri, Nov 6, 2020 at 3:50 AM Adrian Ratiu > wrote: >> >> Hi Nathan, >> >> On Fri, 06 Nov 2020, Nathan Chancellor >> wrote: >> > + Ard, who wrote this code. >> > >> > On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote: >> >> Due to a Clang bug [1] neon autoloop vectorization does not >> >> happen or happens badly with no gains and considering >> >> previous GCC experiences which generated unoptimized code >> >> which was worse than the default asm implementation, it is >> >> safer to default clang builds to the known good generic >> >> implementation. The kernel currently supports a minimum >> >> Clang version of v10.0.1, see commit 1f7a44f63e6c >> >> ("compiler-clang: add build check for clang 10.0.1"). When >> >> the bug gets eventually fixed, this commit could be reverted >> >> or, if the minimum clang version bump takes a long time, a >> >> warning could be added for users to upgrade their compilers >> >> like was done for GCC. [1] >> >> https://bugs.llvm.org/show_bug.cgi?id=40976 Signed-off-by: >> >> Adrian Ratiu >> > >> > Thank you for the patch! We are also tracking this here: >> > >> > https://github.com/ClangBuiltLinux/linux/issues/496 >> > >> > It was on my TODO to revist getting the warning eliminated, >> > which likely would have involved a patch like this as well. >> > >> > I am curious if it is worth revisting or dusting off Arnd's >> > patch in the LLVM bug tracker first. I have not tried it >> > personally. If that is not a worthwhile option, I am fine >> > with this for now. It would be nice to try and get a fix >> > pinned down on the LLVM side at some point but alas, finite >> > amount of resources and people :( >> >> I tested Arnd's kernel patch from the LLVM bugtracker [1], but >> with the Clang v10.0.1 I still get warnings like the following >> even though the __restrict workaround seems to affect the >> generated instructions: >> >> ./include/asm-generic/xor.h:15:2: remark: the cost-model >> indicates that interleaving is not beneficial >> [-Rpass-missed=loop-vectorize] >> ./include/asm-generic/xor.h:11:1: remark: List vectorization >> was possible but not beneficial with cost 0 >= 0 >> [-Rpass-missed=slp-vectorizer] xor_8regs_2(unsigned long bytes, >> unsigned long *__restrict p1, unsigned long *__restrict p2) > > If it's just a matter of overruling the cost model #pragma clang > loop vectorize(enable) > > will do the trick. > > Indeed, ``` diff --git a/include/asm-generic/xor.h > b/include/asm-generic/xor.h index b62a2a56a4d4..8796955498b7 > 100644 --- a/include/asm-generic/xor.h +++ > b/include/asm-generic/xor.h @@ -12,6 +12,7 @@ > xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned > long *p2) > { > long lines = bytes / (sizeof (long)) / 8; > > +#pragma clang loop vectorize(enable) > do { > p1[0] ^= p2[0]; p1[1] ^= p2[1]; > @@ -32,6 +33,7 @@ xor_8regs_3(unsigned long bytes, unsigned long > *p1, unsigned long *p2, > { > long lines = bytes / (sizeof (long)) / 8; > > +#pragma clang loop vectorize(enable) > do { > p1[0] ^= p2[0] ^ p3[0]; p1[1] ^= p2[1] ^ p3[1]; > @@ -53,6 +55,7 @@ xor_8regs_4(unsigned long bytes, unsigned long > *p1, unsigned long *p2, > { > long lines = bytes / (sizeof (long)) / 8; > > +#pragma clang loop vectorize(enable) > do { > p1[0] ^= p2[0] ^ p3[0] ^ p4[0]; p1[1] ^= p2[1] ^ > p3[1] ^ p4[1]; > @@ -75,6 +78,7 @@ xor_8regs_5(unsigned long bytes, unsigned long > *p1, unsigned long *p2, > { > long lines = bytes / (sizeof (long)) / 8; > > +#pragma clang loop vectorize(enable) > do { > p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; p1[1] ^= > p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; > ``` seems to generate the vectorized code. > > Why don't we find a way to make those pragma's more toolchain > portable, rather than open coding them like I have above rather > than this series? Hi Nick, Thank you very much for the suggestion. I agree. If a toolchain portable way can be found to realiably trigger the optimization, I will gladly replace this patch. :) Will work on it starting Monday then report back my findings or, if I can get it to work in a satisfying manner, send a v2 series directly. The first patch is still needed because it's more of a general cleanup as Nathan correctly observed. Regards, Adrian > > -- > Thanks, > ~Nick Desaulniers