Dear all,
This is v2 of the patch series at
id:[email protected]
Tested on next-20201112 using GCC 10.2.0 and Clang 10.0.1.
Kind regards,
Adrian
Changes in v2:
- Dropped the patch which disabled Clang vectorization (Nick)
- Added new patch to move pragmas to makefile cmdline options
(Arvind and Ard)
Adrian Ratiu (1):
arm: lib: xor-neon: move pragma options to makefile
Nathan Chancellor (1):
arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
arch/arm/lib/Makefile | 2 +-
arch/arm/lib/xor-neon.c | 17 -----------------
2 files changed, 1 insertion(+), 18 deletions(-)
--
2.29.2
Using a pragma like GCC optimize is a bad idea because it tags
all functions with an __attribute__((optimize)) which replaces
optimization options rather than appending so could result in
dropping important flags. Not recommended for production use.
Because these options should always be enabled for this file,
it's better to set them via command line. tree-vectorize is on
by default in Clang, but it doesn't hurt to make it explicit.
Suggested-by: Arvind Sankar <[email protected]>
Suggested-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Adrian Ratiu <[email protected]>
---
arch/arm/lib/Makefile | 2 +-
arch/arm/lib/xor-neon.c | 10 ----------
2 files changed, 1 insertion(+), 11 deletions(-)
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 6d2ba454f25b..12d31d1a7630 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -45,6 +45,6 @@ $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
- CFLAGS_xor-neon.o += $(NEON_FLAGS)
+ CFLAGS_xor-neon.o += $(NEON_FLAGS) -ftree-vectorize -Wno-unused-variable
obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
endif
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index e1e76186ec23..62b493e386c4 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -14,16 +14,6 @@ MODULE_LICENSE("GPL");
#error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
#endif
-/*
- * Pull in the reference implementations while instructing GCC (through
- * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
- * NEON instructions.
- */
-#ifdef CONFIG_CC_IS_GCC
-#pragma GCC optimize "tree-vectorize"
-#endif
-
-#pragma GCC diagnostic ignored "-Wunused-variable"
#include <asm-generic/xor.h>
struct xor_block_template const xor_block_neon_inner = {
--
2.29.2
From: Nathan Chancellor <[email protected]>
Drop warning because kernel now requires GCC >= v4.9 after
commit 6ec4476ac825 ("Raise gcc version requirement to 4.9").
Reported-by: Nick Desaulniers <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Adrian Ratiu <[email protected]>
---
arch/arm/lib/xor-neon.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index b99dd8e1c93f..e1e76186ec23 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -19,15 +19,8 @@ MODULE_LICENSE("GPL");
* -ftree-vectorize) to attempt to exploit implicit parallelism and emit
* NEON instructions.
*/
-#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
+#ifdef CONFIG_CC_IS_GCC
#pragma GCC optimize "tree-vectorize"
-#else
-/*
- * While older versions of GCC do not generate incorrect code, they fail to
- * recognize the parallel nature of these functions, and emit plain ARM code,
- * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
- */
-#warning This code requires at least version 4.6 of GCC
#endif
#pragma GCC diagnostic ignored "-Wunused-variable"
--
2.29.2
On Thu, Nov 12, 2020 at 1:23 PM Adrian Ratiu <[email protected]> wrote:
>
> From: Nathan Chancellor <[email protected]>
>
> Drop warning because kernel now requires GCC >= v4.9 after
> commit 6ec4476ac825 ("Raise gcc version requirement to 4.9").
>
> Reported-by: Nick Desaulniers <[email protected]>
> Signed-off-by: Nathan Chancellor <[email protected]>
> Signed-off-by: Adrian Ratiu <[email protected]>
Link: https://github.com/ClangBuiltLinux/linux/issues/496
Link: https://github.com/ClangBuiltLinux/linux/issues/503
Reviewed-by: Nick Desaulniers <[email protected]>
> ---
> arch/arm/lib/xor-neon.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index b99dd8e1c93f..e1e76186ec23 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -19,15 +19,8 @@ MODULE_LICENSE("GPL");
> * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> * NEON instructions.
> */
> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> +#ifdef CONFIG_CC_IS_GCC
> #pragma GCC optimize "tree-vectorize"
> -#else
> -/*
> - * While older versions of GCC do not generate incorrect code, they fail to
> - * recognize the parallel nature of these functions, and emit plain ARM code,
> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
> - */
> -#warning This code requires at least version 4.6 of GCC
> #endif
>
> #pragma GCC diagnostic ignored "-Wunused-variable"
> --
> 2.29.2
>
--
Thanks,
~Nick Desaulniers
On Thu, Nov 12, 2020 at 1:23 PM Adrian Ratiu <[email protected]> wrote:
>
> Using a pragma like GCC optimize is a bad idea because it tags
> all functions with an __attribute__((optimize)) which replaces
> optimization options rather than appending so could result in
> dropping important flags. Not recommended for production use.
>
> Because these options should always be enabled for this file,
> it's better to set them via command line. tree-vectorize is on
> by default in Clang, but it doesn't hurt to make it explicit.
>
> Suggested-by: Arvind Sankar <[email protected]>
> Suggested-by: Ard Biesheuvel <[email protected]>
> Signed-off-by: Adrian Ratiu <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
> ---
> arch/arm/lib/Makefile | 2 +-
> arch/arm/lib/xor-neon.c | 10 ----------
> 2 files changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index 6d2ba454f25b..12d31d1a7630 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -45,6 +45,6 @@ $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
>
> ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
> NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
> - CFLAGS_xor-neon.o += $(NEON_FLAGS)
> + CFLAGS_xor-neon.o += $(NEON_FLAGS) -ftree-vectorize -Wno-unused-variable
> obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
> endif
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index e1e76186ec23..62b493e386c4 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -14,16 +14,6 @@ MODULE_LICENSE("GPL");
> #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> #endif
>
> -/*
> - * Pull in the reference implementations while instructing GCC (through
> - * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> - * NEON instructions.
> - */
> -#ifdef CONFIG_CC_IS_GCC
> -#pragma GCC optimize "tree-vectorize"
> -#endif
> -
> -#pragma GCC diagnostic ignored "-Wunused-variable"
> #include <asm-generic/xor.h>
>
> struct xor_block_template const xor_block_neon_inner = {
> --
> 2.29.2
>
--
Thanks,
~Nick Desaulniers
On Thu, Nov 12, 2020 at 11:24:57PM +0200, Adrian Ratiu wrote:
> Using a pragma like GCC optimize is a bad idea because it tags
> all functions with an __attribute__((optimize)) which replaces
> optimization options rather than appending so could result in
> dropping important flags. Not recommended for production use.
>
> Because these options should always be enabled for this file,
> it's better to set them via command line. tree-vectorize is on
> by default in Clang, but it doesn't hurt to make it explicit.
>
> Suggested-by: Arvind Sankar <[email protected]>
> Suggested-by: Ard Biesheuvel <[email protected]>
> Signed-off-by: Adrian Ratiu <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
> ---
> arch/arm/lib/Makefile | 2 +-
> arch/arm/lib/xor-neon.c | 10 ----------
> 2 files changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index 6d2ba454f25b..12d31d1a7630 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -45,6 +45,6 @@ $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
>
> ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
> NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
> - CFLAGS_xor-neon.o += $(NEON_FLAGS)
> + CFLAGS_xor-neon.o += $(NEON_FLAGS) -ftree-vectorize -Wno-unused-variable
> obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
> endif
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index e1e76186ec23..62b493e386c4 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -14,16 +14,6 @@ MODULE_LICENSE("GPL");
> #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> #endif
>
> -/*
> - * Pull in the reference implementations while instructing GCC (through
> - * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> - * NEON instructions.
> - */
> -#ifdef CONFIG_CC_IS_GCC
> -#pragma GCC optimize "tree-vectorize"
> -#endif
> -
> -#pragma GCC diagnostic ignored "-Wunused-variable"
> #include <asm-generic/xor.h>
>
> struct xor_block_template const xor_block_neon_inner = {
> --
> 2.29.2
>
On Thu, 12 Nov 2020 at 22:23, Adrian Ratiu <[email protected]> wrote:
>
> From: Nathan Chancellor <[email protected]>
>
> Drop warning because kernel now requires GCC >= v4.9 after
> commit 6ec4476ac825 ("Raise gcc version requirement to 4.9").
>
> Reported-by: Nick Desaulniers <[email protected]>
> Signed-off-by: Nathan Chancellor <[email protected]>
> Signed-off-by: Adrian Ratiu <[email protected]>
Again, this does not do what it says on the tin.
If you want to disable the pragma for Clang, call that out in the
commit log, and don't hide it under a GCC version change.
Without the pragma, the generated code is the same as the generic
code, so it makes no sense to build xor-neon.ko at all, right?
> ---
> arch/arm/lib/xor-neon.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index b99dd8e1c93f..e1e76186ec23 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -19,15 +19,8 @@ MODULE_LICENSE("GPL");
> * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> * NEON instructions.
> */
> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> +#ifdef CONFIG_CC_IS_GCC
> #pragma GCC optimize "tree-vectorize"
> -#else
> -/*
> - * While older versions of GCC do not generate incorrect code, they fail to
> - * recognize the parallel nature of these functions, and emit plain ARM code,
> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
> - */
> -#warning This code requires at least version 4.6 of GCC
> #endif
>
> #pragma GCC diagnostic ignored "-Wunused-variable"
> --
> 2.29.2
>
On Thu, 12 Nov 2020 at 22:23, Adrian Ratiu <[email protected]> wrote:
>
> Using a pragma like GCC optimize is a bad idea because it tags
> all functions with an __attribute__((optimize)) which replaces
> optimization options rather than appending so could result in
> dropping important flags. Not recommended for production use.
>
> Because these options should always be enabled for this file,
> it's better to set them via command line. tree-vectorize is on
> by default in Clang, but it doesn't hurt to make it explicit.
>
> Suggested-by: Arvind Sankar <[email protected]>
> Suggested-by: Ard Biesheuvel <[email protected]>
> Signed-off-by: Adrian Ratiu <[email protected]>
> ---
> arch/arm/lib/Makefile | 2 +-
> arch/arm/lib/xor-neon.c | 10 ----------
> 2 files changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index 6d2ba454f25b..12d31d1a7630 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -45,6 +45,6 @@ $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
>
> ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
> NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
> - CFLAGS_xor-neon.o += $(NEON_FLAGS)
> + CFLAGS_xor-neon.o += $(NEON_FLAGS) -ftree-vectorize -Wno-unused-variable
> obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
> endif
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index e1e76186ec23..62b493e386c4 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -14,16 +14,6 @@ MODULE_LICENSE("GPL");
> #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> #endif
>
> -/*
> - * Pull in the reference implementations while instructing GCC (through
> - * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> - * NEON instructions.
> - */
> -#ifdef CONFIG_CC_IS_GCC
> -#pragma GCC optimize "tree-vectorize"
> -#endif
> -
> -#pragma GCC diagnostic ignored "-Wunused-variable"
> #include <asm-generic/xor.h>
>
> struct xor_block_template const xor_block_neon_inner = {
> --
> 2.29.2
>
So what is the status now here? How does putting -ftree-vectorize on
the command line interact with Clang?
Hi Ard,
On Fri, 13 Nov 2020, Ard Biesheuvel <[email protected]> wrote:
> On Thu, 12 Nov 2020 at 22:23, Adrian Ratiu
> <[email protected]> wrote:
>>
>> From: Nathan Chancellor <[email protected]>
>>
>> Drop warning because kernel now requires GCC >= v4.9 after
>> commit 6ec4476ac825 ("Raise gcc version requirement to 4.9").
>>
>> Reported-by: Nick Desaulniers <[email protected]>
>> Signed-off-by: Nathan Chancellor <[email protected]>
>> Signed-off-by: Adrian Ratiu <[email protected]>
>
> Again, this does not do what it says on the tin.
>
> If you want to disable the pragma for Clang, call that out in
> the commit log, and don't hide it under a GCC version change.
I am not doing anything for Clang in this series.
The option to auto-vectorize in Clang is enabled by default but
doesn't work for some reason (likely to do with how it computes
the cost model, so maybe not even a bug at all) and if we enable
it explicitly (e.g. via a Clang specific pragma) we get some
warnings we currently do not understand, so I am not changing the
Clang behaviour at the recommendation of Nick.
So this is only for GCC as the "tin" says :) We can fix clang
separately as the Clang bug has always been present and is
unrelated.
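
For illustration only, here is a minimal sketch of what "enabling it
explicitly" could look like. It assumes a simplified stand-in for the
reference loops pulled in from asm-generic/xor.h; xor_words_2 is a
hypothetical name, not a kernel function. "#pragma clang loop
vectorize(enable)" is Clang's documented loop hint, guarded here so that
other compilers never see it. Whether Clang actually emits NEON code for
such a loop still depends on its cost model and target flags.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the reference XOR loops:
 * XOR each word of src into the matching word of dst.
 */
void xor_words_2(size_t words, unsigned long *dst, const unsigned long *src)
{
	size_t i;

#ifdef __clang__
	/* Clang-specific loop hint; it must directly precede the loop. */
#pragma clang loop vectorize(enable)
#endif
	for (i = 0; i < words; i++)
		dst[i] ^= src[i];
}
```
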
>
> Without the pragma, the generated code is the same as the
> generic code, so it makes no sense to build xor-neon.ko at all,
> right?
>
Yes that is correct and that is the reason why in v1 I opted to
not build xor-neon.ko for Clang anymore, but that got NACKed, so
here I'm fixing the low hanging fruit: the very obvious & clear
GCC problems.
>> ---
>> arch/arm/lib/xor-neon.c | 9 +--------
>> 1 file changed, 1 insertion(+), 8 deletions(-)
>>
>> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
>> index b99dd8e1c93f..e1e76186ec23 100644
>> --- a/arch/arm/lib/xor-neon.c
>> +++ b/arch/arm/lib/xor-neon.c
>> @@ -19,15 +19,8 @@ MODULE_LICENSE("GPL");
>> * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
>> * NEON instructions.
>> */
>> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
>> +#ifdef CONFIG_CC_IS_GCC
>> #pragma GCC optimize "tree-vectorize"
>> -#else
>> -/*
>> - * While older versions of GCC do not generate incorrect code, they fail to
>> - * recognize the parallel nature of these functions, and emit plain ARM code,
>> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
>> - */
>> -#warning This code requires at least version 4.6 of GCC
>> #endif
>>
>> #pragma GCC diagnostic ignored "-Wunused-variable"
>> --
>> 2.29.2
>>
On Fri, 13 Nov 2020, Ard Biesheuvel <[email protected]> wrote:
> On Thu, 12 Nov 2020 at 22:23, Adrian Ratiu
> <[email protected]> wrote:
>>
>> Using a pragma like GCC optimize is a bad idea because it tags
>> all functions with an __attribute__((optimize)) which replaces
>> optimization options rather than appending so could result in
>> dropping important flags. Not recommended for production use.
>>
>> Because these options should always be enabled for this file,
>> it's better to set them via command line. tree-vectorize is on
>> by default in Clang, but it doesn't hurt to make it explicit.
>>
>> Suggested-by: Arvind Sankar <[email protected]>
>> Suggested-by: Ard Biesheuvel <[email protected]>
>> Signed-off-by: Adrian Ratiu <[email protected]>
>> ---
>> arch/arm/lib/Makefile | 2 +-
>> arch/arm/lib/xor-neon.c | 10 ----------
>> 2 files changed, 1 insertion(+), 11 deletions(-)
>>
>> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
>> index 6d2ba454f25b..12d31d1a7630 100644
>> --- a/arch/arm/lib/Makefile
>> +++ b/arch/arm/lib/Makefile
>> @@ -45,6 +45,6 @@ $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
>>
>> ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
>> NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
>> - CFLAGS_xor-neon.o += $(NEON_FLAGS)
>> + CFLAGS_xor-neon.o += $(NEON_FLAGS) -ftree-vectorize -Wno-unused-variable
>> obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
>> endif
>> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
>> index e1e76186ec23..62b493e386c4 100644
>> --- a/arch/arm/lib/xor-neon.c
>> +++ b/arch/arm/lib/xor-neon.c
>> @@ -14,16 +14,6 @@ MODULE_LICENSE("GPL");
>> #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
>> #endif
>>
>> -/*
>> - * Pull in the reference implementations while instructing GCC (through
>> - * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
>> - * NEON instructions.
>> - */
>> -#ifdef CONFIG_CC_IS_GCC
>> -#pragma GCC optimize "tree-vectorize"
>> -#endif
>> -
>> -#pragma GCC diagnostic ignored "-Wunused-variable"
>> #include <asm-generic/xor.h>
>>
>> struct xor_block_template const xor_block_neon_inner = {
>> --
>> 2.29.2
>>
>
> So what is the status now here? How does putting
> -ftree-vectorize on the command line interact with Clang?
Clang needs to be fixed separately: -ftree-vectorize does not
change anything there because the option is enabled by default.
I know it sucks to have such a silent failure, but it's always
been there (the "upgrade your GCC" warning during Clang builds was
bogus) and I do not want to rush a Clang fix without fully
understanding it.
Warning Clang users that the optimization doesn't work was
discussed but dropped because users can't do anything about it.
If we are positively certain this is a kernel bug and not a Clang
bug (i.e. the xor-neon use case is not enabling/triggering the
optimization properly) I could add a TODO comment in the code
FWIW.
Adrian
On Fri, 13 Nov 2020 at 12:05, Adrian Ratiu <[email protected]> wrote:
>
> Hi Ard,
>
> On Fri, 13 Nov 2020, Ard Biesheuvel <[email protected]> wrote:
> > On Thu, 12 Nov 2020 at 22:23, Adrian Ratiu
> > <[email protected]> wrote:
> >>
> >> From: Nathan Chancellor <[email protected]>
> >>
> >> Drop warning because kernel now requires GCC >= v4.9 after
> >> commit 6ec4476ac825 ("Raise gcc version requirement to 4.9").
> >>
> >> Reported-by: Nick Desaulniers <[email protected]>
> >> Signed-off-by: Nathan Chancellor <[email protected]>
> >> Signed-off-by: Adrian Ratiu <[email protected]>
> >
> > Again, this does not do what it says on the tin.
> >
> > If you want to disable the pragma for Clang, call that out in
> > the commit log, and don't hide it under a GCC version change.
>
> I am not doing anything for Clang in this series.
>
> The option to auto-vectorize in Clang is enabled by default but
> doesn't work for some reason (likely to do with how it computes
> the cost model, so maybe not even a bug at all) and if we enable
> it explicitly (e.g. via a Clang specific pragma) we get some
> warnings we currently do not understand, so I am not changing the
> Clang behaviour at the recommendation of Nick.
>
> So this is only for GCC as the "tin" says :) We can fix clang
> separately as the Clang bug has always been present and is
> unrelated.
>
But you are adding the IS_GCC check here, no? Is that equivalent? IOW,
does Clang today identify as GCC <= 4.6?
> >
> > Without the pragma, the generated code is the same as the
> > generic code, so it makes no sense to build xor-neon.ko at all,
> > right?
> >
>
> Yes that is correct and that is the reason why in v1 I opted to
> not build xor-neon.ko for Clang anymore, but that got NACKed, so
> here I'm fixing the low hanging fruit: the very obvious & clear
> GCC problems.
>
>
Fair enough.
> >> ---
> >> arch/arm/lib/xor-neon.c | 9 +--------
> >> 1 file changed, 1 insertion(+), 8 deletions(-)
> >>
> >> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> >> index b99dd8e1c93f..e1e76186ec23 100644
> >> --- a/arch/arm/lib/xor-neon.c
> >> +++ b/arch/arm/lib/xor-neon.c
> >> @@ -19,15 +19,8 @@ MODULE_LICENSE("GPL");
> >> * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> >> * NEON instructions.
> >> */
> >> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> >> +#ifdef CONFIG_CC_IS_GCC
> >> #pragma GCC optimize "tree-vectorize"
> >> -#else
> >> -/*
> >> - * While older versions of GCC do not generate incorrect code, they fail to
> >> - * recognize the parallel nature of these functions, and emit plain ARM code,
> >> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
> >> - */
> >> -#warning This code requires at least version 4.6 of GCC
> >> #endif
> >>
> >> #pragma GCC diagnostic ignored "-Wunused-variable"
> >> --
> >> 2.29.2
> >>
On Fri, 13 Nov 2020, Ard Biesheuvel <[email protected]> wrote:
> On Fri, 13 Nov 2020 at 12:05, Adrian Ratiu
> <[email protected]> wrote:
>>
>> Hi Ard,
>>
>> On Fri, 13 Nov 2020, Ard Biesheuvel <[email protected]> wrote:
>> > On Thu, 12 Nov 2020 at 22:23, Adrian Ratiu
>> > <[email protected]> wrote:
>> >>
>> >> From: Nathan Chancellor <[email protected]>
>> >>
>> >> Drop warning because kernel now requires GCC >= v4.9 after
>> >> commit 6ec4476ac825 ("Raise gcc version requirement to
>> >> 4.9").
>> >>
>> >> Reported-by: Nick Desaulniers <[email protected]>
>> >> Signed-off-by: Nathan Chancellor <[email protected]>
>> >> Signed-off-by: Adrian Ratiu <[email protected]>
>> >
>> > Again, this does not do what it says on the tin.
>> >
>> > If you want to disable the pragma for Clang, call that out in
>> > the commit log, and don't hide it under a GCC version change.
>>
>> I am not doing anything for Clang in this series.
>>
>> The option to auto-vectorize in Clang is enabled by default but
>> doesn't work for some reason (likely to do with how it computes
>> the cost model, so maybe not even a bug at all) and if we
>> enable it explicitly (e.g. via a Clang specific pragma) we get
>> some warnings we currently do not understand, so I am not
>> changing the Clang behaviour at the recommendation of Nick.
>>
>> So this is only for GCC as the "tin" says :) We can fix clang
>> separately as the Clang bug has always been present and is
>> unrelated.
>>
>
> But you are adding the IS_GCC check here, no? Is that
> equivalent? IOW, does Clang today identify as GCC <= 4.6?
>
I see what you mean now. Thanks.
Yes, Clang identifies as a GCC older than 4.6, so the code is not,
strictly speaking, equivalent. The warning to upgrade GCC doesn't
make sense for Clang, but I should mention its removal in the
commit message as well.
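
To spell out the difference, a small sketch, assuming the version
numbers are passed in as plain integers; old_check_passes is a
hypothetical helper written for this mail, not kernel code. It mirrors
the preprocessor condition that the patch removes. Clang defines
__GNUC__ and __GNUC_MINOR__ for compatibility and, as far as I know,
reports 4.2 by default, so under Clang the old condition was false and
the #else branch with the #warning was taken.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the removed preprocessor test
 *   __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
 * with the compiler version passed in as plain integers.
 */
bool old_check_passes(int gnuc, int gnuc_minor)
{
	return gnuc > 4 || (gnuc == 4 && gnuc_minor >= 6);
}
```

With these inputs, old_check_passes(4, 2) is false (the Clang case),
while old_check_passes(4, 9) and old_check_passes(10, 2) are true. With
the new CONFIG_CC_IS_GCC test the pragma is still skipped for Clang,
but the warning is no longer emitted.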
>> >
>> > Without the pragma, the generated code is the same as the
>> > generic code, so it makes no sense to build xor-neon.ko at all,
>> > right?
>> >
>>
>> Yes that is correct and that is the reason why in v1 I opted to
>> not build xor-neon.ko for Clang anymore, but that got NACKed, so
>> here I'm fixing the low hanging fruit: the very obvious & clear
>> GCC problems.
>>
>>
>
> Fair enough.
>
>> >> ---
>> >> arch/arm/lib/xor-neon.c | 9 +--------
>> >> 1 file changed, 1 insertion(+), 8 deletions(-)
>> >>
>> >> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
>> >> index b99dd8e1c93f..e1e76186ec23 100644
>> >> --- a/arch/arm/lib/xor-neon.c
>> >> +++ b/arch/arm/lib/xor-neon.c
>> >> @@ -19,15 +19,8 @@ MODULE_LICENSE("GPL");
>> >> * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
>> >> * NEON instructions.
>> >> */
>> >> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
>> >> +#ifdef CONFIG_CC_IS_GCC
>> >> #pragma GCC optimize "tree-vectorize"
>> >> -#else
>> >> -/*
>> >> - * While older versions of GCC do not generate incorrect code, they fail to
>> >> - * recognize the parallel nature of these functions, and emit plain ARM code,
>> >> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
>> >> - */
>> >> -#warning This code requires at least version 4.6 of GCC
>> >> #endif
>> >>
>> >> #pragma GCC diagnostic ignored "-Wunused-variable"
>> >> --
>> >> 2.29.2
>> >>