Add a function to the bitops test suite to assert that the bitops
helpers correctly fold constant expressions (or trigger a build bug
otherwise). This should work on all the optimization levels supported
by Kbuild.
The function doesn't perform any runtime tests and gets optimized out to
nothing after passing the build assertions.
At the time of writing, we have not yet verified that this works on all
architectures. Architectures which fail that test should adjust their
arch/*/include/asm/bitops.h by either:
- using __builtin_constant_p() to select between the architecture
specific asm and the generic __builtin implementation if the
architecture specific asm produces better code (similar to [1], as
sketched below).
- always calling the generic __builtin implementation if the architecture
specific asm does not produce better code (similar to [2]).
[1] commit 146034fed6ee ("x86/asm/bitops: Use __builtin_ffs() to evaluate
constant expressions")
[2] patch "x86/asm/bitops: Replace __fls() by its generic builtin
implementation"
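For illustration, the first approach boils down to the pattern below.
This is a minimal stand-alone sketch with a hypothetical arch_ffs() and
a dummy identity asm standing in for the architecture specific
implementation; it is not code from any real architecture:

	static inline __attribute__((always_inline)) int arch_ffs(int x)
	{
		int r;

		if (__builtin_constant_p(x))
			return __builtin_ffs(x); /* folds to a constant */

		/* dummy stand-in for the architecture specific asm */
		asm("" : "=r" (r) : "0" (__builtin_ffs(x)));
		return r;
	}

	int main(void)
	{
		/* folds to "return 0" at -O2 */
		return arch_ffs(1 << 10) == 11 ? 0 : 1;
	}

With this shape, a call such as arch_ffs(BIT(10)) reduces to the
constant 11 at compile time, which is what the test added below asserts
for the kernel's own helpers.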
Link: https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Yury Norov <[email protected]>
Signed-off-by: Vincent Mailhol <[email protected]>
---
This is a follow-up to this thread:
https://lore.kernel.org/all/Yw8hJS9f6SofG4%2F6@yury-laptop/
in which I promised to eventually implement a check.
This patch requires the series below in order to work on x86:
https://lore.kernel.org/lkml/[email protected]/
Because that series is not yet merged, I am sending this as an RFC for
now.
Concerning architectures other than x86, I am fine with writing patches,
but I will not be able to test them (I do not have the build environments).
One idea would be to add this patch to any kind of CI which runs on
all architectures and see which architectures need to be fixed before
making this mainstream. Tell me what you think.
---
lib/Kconfig.debug | 4 ++++
lib/test_bitops.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 3fc7abffc7aa..233a82cd3b6e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2300,6 +2300,10 @@ config TEST_BITOPS
compilations. It has no dependencies and doesn't run or load unless
explicitly requested by name. for example: modprobe test_bitops.
+ In addition, check that the compiler is able to fold the bitops
+ function into a compile-time constant (given that the argument is also
+ a compile-time constant) and trigger a build bug otherwise.
+
If unsure, say N.
config TEST_VMALLOC
diff --git a/lib/test_bitops.c b/lib/test_bitops.c
index 3b7bcbee84db..e6e3d22ce52a 100644
--- a/lib/test_bitops.c
+++ b/lib/test_bitops.c
@@ -50,6 +50,50 @@ static unsigned long order_comb_long[][2] = {
};
#endif
+static void __init test_bitops_const_eval(void)
+{
+ int res;
+ u64 res64;
+
+ /*
+ * On any supported optimization level (-O2, -Os) and if
+ * invoked with a compile-time constant argument, the compiler
+ * must be able to fold into a constant expression all the bit
+ * find functions. Namely: __ffs(), ffs(), ffz(), __fls(),
+ * fls() and fls64(). Otherwise, trigger a build bug.
+ */
+
+ /* __ffs(BIT(10)) == 10 */
+ res = __ffs(BIT(10)) == 10;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* ffs(BIT(10)) == 11 */
+ res = ffs(BIT(10)) == 11;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* ffz(~BIT(10)) == 10 */
+ res = ffz(~BIT(10)) == 10;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* __fls(BIT(10)) == 10 */
+ res = __fls(BIT(10)) == 10;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* fls(BIT(10)) == 11 */
+ res = fls(BIT(10)) == 11;
+ BUILD_BUG_ON(!__builtin_constant_p(res));
+ BUILD_BUG_ON(!res);
+
+ /* fls64(BIT_ULL(10)) == 11 */
+ res64 = fls64(BIT_ULL(10)) == 11;
+ BUILD_BUG_ON(!__builtin_constant_p(res64));
+ BUILD_BUG_ON(!res64);
+}
+
static int __init test_bitops_startup(void)
{
int i, bit_set;
@@ -94,6 +138,8 @@ static int __init test_bitops_startup(void)
if (bit_set != BITOPS_LAST)
pr_err("ERROR: FOUND SET BIT %d\n", bit_set);
+ test_bitops_const_eval();
+
pr_info("Completed bitops test\n");
return 0;
--
2.35.1
Add a function in the bitops test suite to assert that the bitops
helpers correctly fold constant expressions (or trigger a build bug
otherwise). This should work on all the optimization levels supported
by Kbuild.
The function doesn't perform any runtime tests and gets optimized out to
nothing after passing the build assertions.
Architectures which fail that test should adjust their
arch/*/include/asm/bitops.h in order to use the compiler's generic
__builtin implementation if the argument is a constant expression
(similar to [1]).
[1] commit 146034fed6ee ("x86/asm/bitops: Use __builtin_ffs() to evaluate
constant expressions")
Suggested-by: Yury Norov <[email protected]>
Signed-off-by: Vincent Mailhol <[email protected]>
---
This is tested on x86, x86_64, arm and arm64.
Other architectures were not tested. My idea would be to add this
patch to any kind of CI which runs on all architectures (linux-next?)
and see if anything breaks.
Or maybe I should send a message to the maintainers of all of the
arch/*/include/asm/bitops.h files?
Tell me what you think.
** Changelog **
v1 -> v2:
- Drop the RFC patch. v1 was not ready to be applied on x86 because
of pending changes in arch/x86/include/asm/bitops.h. This was
finally fixed by Nick in commit 3dae5c43badf ("x86/asm/bitops: Use
__builtin_clz{l|ll} to evaluate constant expressions"). Thanks Nick!
- Update the commit description.
- Introduce the test_const_eval() macro to factorize code.
- No functional changes.
Link: https://lore.kernel.org/all/[email protected]/
---
lib/Kconfig.debug | 4 ++++
lib/test_bitops.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cc7d53d9dc01..c97d818dbc30 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2454,6 +2454,10 @@ config TEST_BITOPS
compilations. It has no dependencies and doesn't run or load unless
explicitly requested by name. for example: modprobe test_bitops.
+ In addition, check that the compiler is able to fold the bitops
+ function into a compile-time constant (given that the argument is also
+ a compile-time constant) and trigger a build bug otherwise.
+
If unsure, say N.
config TEST_VMALLOC
diff --git a/lib/test_bitops.c b/lib/test_bitops.c
index 3b7bcbee84db..49e2f76575e4 100644
--- a/lib/test_bitops.c
+++ b/lib/test_bitops.c
@@ -50,6 +50,35 @@ static unsigned long order_comb_long[][2] = {
};
#endif
+/* Assert that a boolean expression can be folded in a constant and is true. */
+#define test_const_eval(test_expr) \
+({ \
+ /* Evaluate once so that compiler can fold it. */ \
+ bool __test_expr = test_expr; \
+ \
+ BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \
+ BUILD_BUG_ON(!__test_expr); \
+})
+
+static void test_bitops_const_eval(void)
+{
+ /*
+ * On any supported optimization level (-O2, -Os) and if
+ * invoked with a compile-time constant argument, the compiler
+ * must be able to fold into a constant expression all the bit
+ * find functions. Namely: __ffs(), ffs(), ffz(), __fls(),
+ * fls() and fls64(). Otherwise, trigger a build bug.
+ */
+ const int n = 10;
+
+ test_const_eval(__ffs(BIT(n)) == n);
+ test_const_eval(ffs(BIT(n)) == n + 1);
+ test_const_eval(ffz(~BIT(n)) == n);
+ test_const_eval(__fls(BIT(n)) == n);
+ test_const_eval(fls(BIT(n)) == n + 1);
+ test_const_eval(fls64(BIT_ULL(n)) == n + 1);
+}
+
static int __init test_bitops_startup(void)
{
int i, bit_set;
@@ -94,6 +123,8 @@ static int __init test_bitops_startup(void)
if (bit_set != BITOPS_LAST)
pr_err("ERROR: FOUND SET BIT %d\n", bit_set);
+ test_bitops_const_eval();
+
pr_info("Completed bitops test\n");
return 0;
--
2.25.1
Hi Vincent,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.7-rc3 next-20231130]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Vincent-Mailhol/lib-test_bitops-add-compile-time-optimization-evaluations-assertions/20231130-182837
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20231130102717.1297492-1-mailhol.vincent%40wanadoo.fr
patch subject: [PATCH v2] lib: test_bitops: add compile-time optimization/evaluations assertions
config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20231201/[email protected]/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231201/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
In file included from <command-line>:
In function 'test_bitops_const_eval',
inlined from 'test_bitops_startup' at lib/test_bitops.c:126:2:
>> include/linux/compiler_types.h:435:45: error: call to '__compiletime_assert_183' declared with attribute error: BUILD_BUG_ON failed: !__builtin_constant_p(__test_expr)
435 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:416:25: note: in definition of macro '__compiletime_assert'
416 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:435:9: note: in expansion of macro '_compiletime_assert'
435 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^~~~~~~~~~~~~~~~
lib/test_bitops.c:59:9: note: in expansion of macro 'BUILD_BUG_ON'
59 | BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \
| ^~~~~~~~~~~~
lib/test_bitops.c:74:9: note: in expansion of macro 'test_const_eval'
74 | test_const_eval(__ffs(BIT(n)) == n);
| ^~~~~~~~~~~~~~~
vim +/__compiletime_assert_183 +435 include/linux/compiler_types.h
eb5c2d4b45e3d2 Will Deacon 2020-07-21 421
eb5c2d4b45e3d2 Will Deacon 2020-07-21 422 #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 423 __compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 424
eb5c2d4b45e3d2 Will Deacon 2020-07-21 425 /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21 426 * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 427 * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21 428 * @msg: a message to emit if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21 429 *
eb5c2d4b45e3d2 Will Deacon 2020-07-21 430 * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 431 * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21 432 * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21 433 */
eb5c2d4b45e3d2 Will Deacon 2020-07-21 434 #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @435 _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21 436
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Vincent,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.7-rc4 next-20231206]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Vincent-Mailhol/lib-test_bitops-add-compile-time-optimization-evaluations-assertions/20231130-182837
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20231130102717.1297492-1-mailhol.vincent%40wanadoo.fr
patch subject: [PATCH v2] lib: test_bitops: add compile-time optimization/evaluations assertions
config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20231206/[email protected]/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231206/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
>> lib/test_bitops.c:74:2: error: call to '__compiletime_assert_178' declared with 'error' attribute: BUILD_BUG_ON failed: !__builtin_constant_p(__test_expr)
74 | test_const_eval(__ffs(BIT(n)) == n);
| ^
lib/test_bitops.c:59:2: note: expanded from macro 'test_const_eval'
59 | BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \
| ^
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^
note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
include/linux/compiler_types.h:423:2: note: expanded from macro '_compiletime_assert'
423 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:416:4: note: expanded from macro '__compiletime_assert'
416 | prefix ## suffix(); \
| ^
<scratch space>:18:1: note: expanded from here
18 | __compiletime_assert_178
| ^
1 error generated.
vim +74 lib/test_bitops.c
62
63 static void test_bitops_const_eval(void)
64 {
65 /*
66 * On any supported optimization level (-O2, -Os) and if
67 * invoked with a compile-time constant argument, the compiler
68 * must be able to fold into a constant expression all the bit
69 * find functions. Namely: __ffs(), ffs(), ffz(), __fls(),
70 * fls() and fls64(). Otherwise, trigger a build bug.
71 */
72 const int n = 10;
73
> 74 test_const_eval(__ffs(BIT(n)) == n);
75 test_const_eval(ffs(BIT(n)) == n + 1);
76 test_const_eval(ffz(~BIT(n)) == n);
77 test_const_eval(__fls(BIT(n)) == n);
78 test_const_eval(fls(BIT(n)) == n + 1);
79 test_const_eval(fls64(BIT_ULL(n)) == n + 1);
80 }
81
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
This series makes sure that all the bitops operations (namely __ffs(),
ffs(), ffz(), __fls(), fls(), fls64()) correctly fold constant
expressions given that their argument is also a constant expression.
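Concretely, "fold" means that a call with a constant argument is itself
reduced to an integer constant at compile time. Below is a minimal
user-space illustration of the property being asserted, built at -O2 as
Kbuild does; it uses the raw GCC/clang builtin rather than the kernel's
ffs(), so it is only an analogy for what this series tests:

	#include <stdio.h>

	int main(void)
	{
		/*
		 * At -O2, __builtin_ffs(1 << 10) is folded to 11, so the
		 * whole comparison becomes the compile-time constant 1 and
		 * __builtin_constant_p() reports it as such.
		 */
		int folded = __builtin_ffs(1 << 10) == 11;

		printf("constant: %d, value: %d\n",
		       __builtin_constant_p(folded), folded);
		return 0;
	}

The fifth patch asserts the same thing with BUILD_BUG_ON(), but against
the kernel's own implementations of the bit-find functions.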
The first two patches optimize the m68k architecture bitops functions,
the third and fourth optimize the hexagon architecture bitops functions,
and the fifth and final patch adds tests to assert that the constant
folding occurs and that the result is accurate.
This is tested on arm, arm64, hexagon, m68k, x86 and x86_64. For the
other architectures, I am putting my trust in the kernel test robot to
send a report if any of them still lacks bitops optimizations.
---
** Changelog **
v2 -> v3:
- Add patches 1/5 and 2/5 to optimize m68k architecture bitops.
Thanks to the kernel test robot for reporting!
- Add patches 3/5 and 4/5 to optimize hexagon architecture bitops.
Thanks to the kernel test robot for reporting!
- Patch 5/5: mark test_bitops_const_eval() as __always_inline and pass
n (the bit number under test) as a parameter. Previously, only
BIT(10) was tested. Add tests for BIT(0) and BIT(31).
Link: https://lore.kernel.org/all/[email protected]/
v1 -> v2:
- Drop the RFC patch. v1 was not ready to be applied on x86 because
of pending changes in arch/x86/include/asm/bitops.h. This was
finally fixed by Nick in commit 3dae5c43badf ("x86/asm/bitops: Use
__builtin_clz{l|ll} to evaluate constant expressions").
Thanks Nick!
- Update the commit description.
- Introduce the test_const_eval() macro to factorize code.
- No functional change.
Link: https://lore.kernel.org/all/[email protected]/
Vincent Mailhol (5):
m68k/bitops: force inlining of all bitops functions
m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
expressions
hexagon/bitops: force inlining of all bitops functions
hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
expressions
lib: test_bitops: add compile-time optimization/evaluations assertions
arch/hexagon/include/asm/bitops.h | 37 ++++++++----
arch/m68k/include/asm/bitops.h | 99 +++++++++++++++++--------------
lib/Kconfig.debug | 4 ++
lib/test_bitops.c | 32 ++++++++++
4 files changed, 118 insertions(+), 54 deletions(-)
--
2.25.1
The inline keyword actually does not guarantee that the compiler will
inline a function. Whenever the goal is to actually inline a
function, __always_inline should always be preferred instead.
On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
$ size --format=GNU vmlinux.before vmlinux.after
text data bss total filename
60449738 70975612 2288988 133714338 vmlinux.before
60446534 70972412 2289596 133708542 vmlinux.after
Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
Signed-off-by: Vincent Mailhol <[email protected]>
---
arch/m68k/include/asm/bitops.h | 87 +++++++++++++++++-----------------
1 file changed, 44 insertions(+), 43 deletions(-)
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 14c64a6f1217..ae0457d582b8 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -28,7 +28,7 @@
* So we use the best form possible on a given platform.
*/
-static inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -38,7 +38,7 @@ static inline void bset_reg_set_bit(int nr, volatile unsigned long *vaddr)
: "memory");
}
-static inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -47,7 +47,7 @@ static inline void bset_mem_set_bit(int nr, volatile unsigned long *vaddr)
: "di" (nr & 7));
}
-static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
{
__asm__ __volatile__ ("bfset %1{%0:#1}"
:
@@ -71,7 +71,7 @@ arch___set_bit(unsigned long nr, volatile unsigned long *addr)
set_bit(nr, addr);
}
-static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -81,7 +81,7 @@ static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
: "memory");
}
-static inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -90,7 +90,7 @@ static inline void bclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
: "di" (nr & 7));
}
-static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
{
__asm__ __volatile__ ("bfclr %1{%0:#1}"
:
@@ -114,7 +114,7 @@ arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
clear_bit(nr, addr);
}
-static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -124,7 +124,7 @@ static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
: "memory");
}
-static inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
@@ -133,7 +133,7 @@ static inline void bchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
: "di" (nr & 7));
}
-static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
+static __always_inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
{
__asm__ __volatile__ ("bfchg %1{%0:#1}"
:
@@ -160,8 +160,8 @@ arch___change_bit(unsigned long nr, volatile unsigned long *addr)
#define arch_test_bit generic_test_bit
#define arch_test_bit_acquire generic_test_bit_acquire
-static inline int bset_reg_test_and_set_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bset_reg_test_and_set_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -173,8 +173,8 @@ static inline int bset_reg_test_and_set_bit(int nr,
return retval;
}
-static inline int bset_mem_test_and_set_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bset_mem_test_and_set_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -185,8 +185,8 @@ static inline int bset_mem_test_and_set_bit(int nr,
return retval;
}
-static inline int bfset_mem_test_and_set_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bfset_mem_test_and_set_bit(int nr, volatile unsigned long *vaddr)
{
char retval;
@@ -213,8 +213,8 @@ arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
return test_and_set_bit(nr, addr);
}
-static inline int bclr_reg_test_and_clear_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bclr_reg_test_and_clear_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -226,8 +226,8 @@ static inline int bclr_reg_test_and_clear_bit(int nr,
return retval;
}
-static inline int bclr_mem_test_and_clear_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bclr_mem_test_and_clear_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -238,8 +238,8 @@ static inline int bclr_mem_test_and_clear_bit(int nr,
return retval;
}
-static inline int bfclr_mem_test_and_clear_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bfclr_mem_test_and_clear_bit(int nr, volatile unsigned long *vaddr)
{
char retval;
@@ -266,8 +266,8 @@ arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
return test_and_clear_bit(nr, addr);
}
-static inline int bchg_reg_test_and_change_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bchg_reg_test_and_change_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -279,8 +279,8 @@ static inline int bchg_reg_test_and_change_bit(int nr,
return retval;
}
-static inline int bchg_mem_test_and_change_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bchg_mem_test_and_change_bit(int nr, volatile unsigned long *vaddr)
{
char *p = (char *)vaddr + (nr ^ 31) / 8;
char retval;
@@ -291,8 +291,8 @@ static inline int bchg_mem_test_and_change_bit(int nr,
return retval;
}
-static inline int bfchg_mem_test_and_change_bit(int nr,
- volatile unsigned long *vaddr)
+static __always_inline int
+bfchg_mem_test_and_change_bit(int nr, volatile unsigned long *vaddr)
{
char retval;
@@ -319,8 +319,8 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
return test_and_change_bit(nr, addr);
}
-static inline bool xor_unlock_is_negative_byte(unsigned long mask,
- volatile unsigned long *p)
+static __always_inline bool
+xor_unlock_is_negative_byte(unsigned long mask, volatile unsigned long *p)
{
#ifdef CONFIG_COLDFIRE
__asm__ __volatile__ ("eorl %1, %0"
@@ -350,8 +350,8 @@ static inline bool xor_unlock_is_negative_byte(unsigned long mask,
#include <asm-generic/bitops/ffz.h>
#else
-static inline int find_first_zero_bit(const unsigned long *vaddr,
- unsigned size)
+static __always_inline int
+find_first_zero_bit(const unsigned long *vaddr, unsigned size)
{
const unsigned long *p = vaddr;
int res = 32;
@@ -376,8 +376,8 @@ static inline int find_first_zero_bit(const unsigned long *vaddr,
}
#define find_first_zero_bit find_first_zero_bit
-static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
- int offset)
+static __always_inline int
+find_next_zero_bit(const unsigned long *vaddr, int size, int offset)
{
const unsigned long *p = vaddr + (offset >> 5);
int bit = offset & 31UL, res;
@@ -406,7 +406,8 @@ static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
}
#define find_next_zero_bit find_next_zero_bit
-static inline int find_first_bit(const unsigned long *vaddr, unsigned size)
+static __always_inline int
+find_first_bit(const unsigned long *vaddr, unsigned size)
{
const unsigned long *p = vaddr;
int res = 32;
@@ -431,8 +432,8 @@ static inline int find_first_bit(const unsigned long *vaddr, unsigned size)
}
#define find_first_bit find_first_bit
-static inline int find_next_bit(const unsigned long *vaddr, int size,
- int offset)
+static __always_inline int
+find_next_bit(const unsigned long *vaddr, int size, int offset)
{
const unsigned long *p = vaddr + (offset >> 5);
int bit = offset & 31UL, res;
@@ -465,7 +466,7 @@ static inline int find_next_bit(const unsigned long *vaddr, int size,
* ffz = Find First Zero in word. Undefined if no zero exists,
* so code should check against ~0UL first..
*/
-static inline unsigned long ffz(unsigned long word)
+static __always_inline unsigned long ffz(unsigned long word)
{
int res;
@@ -488,7 +489,7 @@ static inline unsigned long ffz(unsigned long word)
*/
#if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
!defined(CONFIG_M68000)
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
{
__asm__ __volatile__ ("bitrev %0; ff1 %0"
: "=d" (x)
@@ -496,7 +497,7 @@ static inline unsigned long __ffs(unsigned long x)
return x;
}
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
{
if (!x)
return 0;
@@ -518,7 +519,7 @@ static inline int ffs(int x)
* the libc and compiler builtin ffs routines, therefore
* differs in spirit from the above ffz (man ffs).
*/
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
{
int cnt;
@@ -528,7 +529,7 @@ static inline int ffs(int x)
return 32 - cnt;
}
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
{
return ffs(x) - 1;
}
@@ -536,7 +537,7 @@ static inline unsigned long __ffs(unsigned long x)
/*
* fls: find last bit set.
*/
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
{
int cnt;
@@ -546,7 +547,7 @@ static inline int fls(unsigned int x)
return 32 - cnt;
}
-static inline unsigned long __fls(unsigned long x)
+static __always_inline unsigned long __fls(unsigned long x)
{
return fls(x) - 1;
}
--
2.25.1
The compiler is not able to do constant folding on "asm volatile" code.
Evaluate whether or not the function argument is a constant expression
and if this is the case, return an equivalent builtin expression.
On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB.
$ size --format=GNU vmlinux.before vmlinux.after
text data bss total filename
60446534 70972412 2289596 133708542 vmlinux.before
60429746 70978876 2291676 133700298 vmlinux.after
Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
Signed-off-by: Vincent Mailhol <[email protected]>
---
arch/m68k/include/asm/bitops.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index ae0457d582b8..3f89b9dccc33 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -470,6 +470,9 @@ static __always_inline unsigned long ffz(unsigned long word)
{
int res;
+ if (__builtin_constant_p(word))
+ return __builtin_ctzl(~word);
+
__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
: "=d" (res) : "d" (~word & -~word));
return res ^ 31;
@@ -491,6 +494,9 @@ static __always_inline unsigned long ffz(unsigned long word)
!defined(CONFIG_M68000)
static __always_inline unsigned long __ffs(unsigned long x)
{
+ if (__builtin_constant_p(x))
+ return __builtin_ctzl(x);
+
__asm__ __volatile__ ("bitrev %0; ff1 %0"
: "=d" (x)
: "0" (x));
@@ -523,6 +529,9 @@ static __always_inline int ffs(int x)
{
int cnt;
+ if (__builtin_constant_p(x))
+ return __builtin_ffs(x);
+
__asm__ ("bfffo %1{#0:#0},%0"
: "=d" (cnt)
: "dm" (x & -x));
@@ -541,6 +550,9 @@ static __always_inline int fls(unsigned int x)
{
int cnt;
+ if (__builtin_constant_p(x))
+ return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
__asm__ ("bfffo %1{#0,#0},%0"
: "=d" (cnt)
: "dm" (x));
--
2.25.1
The inline keyword actually does not guarantee that the compiler will
inline a function. Whenever the goal is to actually inline a
function, __always_inline should always be preferred instead.
Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
Signed-off-by: Vincent Mailhol <[email protected]>
---
arch/hexagon/include/asm/bitops.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 160d8f37fa1a..950d4acc2edc 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -28,7 +28,7 @@
* @nr: bit number to clear
* @addr: pointer to memory
*/
-static inline int test_and_clear_bit(int nr, volatile void *addr)
+static __always_inline int test_and_clear_bit(int nr, volatile void *addr)
{
int oldval;
@@ -52,7 +52,7 @@ static inline int test_and_clear_bit(int nr, volatile void *addr)
* @nr: bit number to set
* @addr: pointer to memory
*/
-static inline int test_and_set_bit(int nr, volatile void *addr)
+static __always_inline int test_and_set_bit(int nr, volatile void *addr)
{
int oldval;
@@ -78,7 +78,7 @@ static inline int test_and_set_bit(int nr, volatile void *addr)
* @nr: bit number to set
* @addr: pointer to memory
*/
-static inline int test_and_change_bit(int nr, volatile void *addr)
+static __always_inline int test_and_change_bit(int nr, volatile void *addr)
{
int oldval;
@@ -103,17 +103,17 @@ static inline int test_and_change_bit(int nr, volatile void *addr)
* Rewrite later to save a cycle or two.
*/
-static inline void clear_bit(int nr, volatile void *addr)
+static __always_inline void clear_bit(int nr, volatile void *addr)
{
test_and_clear_bit(nr, addr);
}
-static inline void set_bit(int nr, volatile void *addr)
+static __always_inline void set_bit(int nr, volatile void *addr)
{
test_and_set_bit(nr, addr);
}
-static inline void change_bit(int nr, volatile void *addr)
+static __always_inline void change_bit(int nr, volatile void *addr)
{
test_and_change_bit(nr, addr);
}
@@ -200,7 +200,7 @@ arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr)
*
* Undefined if no zero exists, so code should check against ~0UL first.
*/
-static inline long ffz(int x)
+static __always_inline long ffz(int x)
{
int r;
@@ -217,7 +217,7 @@ static inline long ffz(int x)
* This is defined the same way as ffs.
* Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
*/
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
{
int r;
@@ -238,7 +238,7 @@ static inline int fls(unsigned int x)
* the libc and compiler builtin ffs routines, therefore
* differs in spirit from the above ffz (man ffs).
*/
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
{
int r;
@@ -260,7 +260,7 @@ static inline int ffs(int x)
* bits_per_long assumed to be 32
* numbering starts at 0 I think (instead of 1 like ffs)
*/
-static inline unsigned long __ffs(unsigned long word)
+static __always_inline unsigned long __ffs(unsigned long word)
{
int num;
@@ -278,7 +278,7 @@ static inline unsigned long __ffs(unsigned long word)
* Undefined if no set bit exists, so code should check against 0 first.
* bits_per_long assumed to be 32
*/
-static inline unsigned long __fls(unsigned long word)
+static __always_inline unsigned long __fls(unsigned long word)
{
int num;
--
2.25.1
The compiler is not able to do constant folding on "asm volatile" code.
Evaluate whether or not the function argument is a constant expression
and if this is the case, return an equivalent builtin expression.
Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
Signed-off-by: Vincent Mailhol <[email protected]>
---
arch/hexagon/include/asm/bitops.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 950d4acc2edc..12c6ad1ea2ed 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -204,6 +204,9 @@ static __always_inline long ffz(int x)
{
int r;
+ if (__builtin_constant_p(x))
+ return __builtin_ctzl(~x);
+
asm("%0 = ct1(%1);\n"
: "=&r" (r)
: "r" (x));
@@ -221,6 +224,9 @@ static __always_inline int fls(unsigned int x)
{
int r;
+ if (__builtin_constant_p(x))
+ return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
asm("{ %0 = cl0(%1);}\n"
"%0 = sub(#32,%0);\n"
: "=&r" (r)
@@ -242,6 +248,9 @@ static __always_inline int ffs(int x)
{
int r;
+ if (__builtin_constant_p(x))
+ return __builtin_ffs(x);
+
asm("{ P0 = cmp.eq(%1,#0); %0 = ct0(%1);}\n"
"{ if (P0) %0 = #0; if (!P0) %0 = add(%0,#1);}\n"
: "=&r" (r)
@@ -264,6 +273,9 @@ static __always_inline unsigned long __ffs(unsigned long word)
{
int num;
+ if (__builtin_constant_p(word))
+ return __builtin_ctzl(word);
+
asm("%0 = ct0(%1);\n"
: "=&r" (num)
: "r" (word));
@@ -282,6 +294,9 @@ static __always_inline unsigned long __fls(unsigned long word)
{
int num;
+ if (__builtin_constant_p(word))
+ return BITS_PER_LONG - 1 - __builtin_clzl(word);
+
asm("%0 = cl0(%1);\n"
"%0 = sub(#31,%0);\n"
: "=&r" (num)
--
2.25.1
Add a function in the bitops test suite to assert that the bitops
helpers correctly fold constant expressions (or trigger a build bug
otherwise). This should work on all the optimization levels supported
by Kbuild.
The added function doesn't perform any runtime tests and gets
optimized out to nothing after passing the build assertions.
Suggested-by: Yury Norov <[email protected]>
Signed-off-by: Vincent Mailhol <[email protected]>
---
lib/Kconfig.debug | 4 ++++
lib/test_bitops.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cc7d53d9dc01..c97d818dbc30 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2454,6 +2454,10 @@ config TEST_BITOPS
compilations. It has no dependencies and doesn't run or load unless
explicitly requested by name. for example: modprobe test_bitops.
+ In addition, check that the compiler is able to fold the bitops
+ function into a compile-time constant (given that the argument is also
+ a compile-time constant) and trigger a build bug otherwise.
+
If unsure, say N.
config TEST_VMALLOC
diff --git a/lib/test_bitops.c b/lib/test_bitops.c
index 3b7bcbee84db..99b612515eb6 100644
--- a/lib/test_bitops.c
+++ b/lib/test_bitops.c
@@ -50,6 +50,34 @@ static unsigned long order_comb_long[][2] = {
};
#endif
+/* Assert that a boolean expression can be folded in a constant and is true. */
+#define test_const_eval(test_expr) \
+({ \
+ /* Evaluate once so that compiler can fold it. */ \
+ bool __test_expr = test_expr; \
+ \
+ BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \
+ BUILD_BUG_ON(!__test_expr); \
+})
+
+/*
+ * On any supported optimization level (-O2, -Os) and if invoked with
+ * a compile-time constant argument, the compiler must be able to fold
+ * into a constant expression all the bit find functions. Namely:
+ * __ffs(), ffs(), ffz(), __fls(), fls() and fls64(). Otherwise,
+ * trigger a build bug.
+ */
+static __always_inline void test_bitops_const_eval(unsigned int n)
+{
+ test_const_eval(__ffs(BIT(n)) == n);
+ test_const_eval(ffs(BIT(n)) == n + 1);
+ test_const_eval(ffz(~BIT(n)) == n);
+ test_const_eval(__fls(BIT(n)) == n);
+ test_const_eval(fls(BIT(n)) == n + 1);
+ test_const_eval(fls64(BIT_ULL(n)) == n + 1);
+ test_const_eval(fls64(BIT_ULL(n + 32)) == n + 33);
+}
+
static int __init test_bitops_startup(void)
{
int i, bit_set;
@@ -94,6 +122,10 @@ static int __init test_bitops_startup(void)
if (bit_set != BITOPS_LAST)
pr_err("ERROR: FOUND SET BIT %d\n", bit_set);
+ test_bitops_const_eval(0);
+ test_bitops_const_eval(10);
+ test_bitops_const_eval(31);
+
pr_info("Completed bitops test\n");
return 0;
--
2.25.1
Hi Vincent,
Thanks for your patch!
On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
<[email protected]> wrote:
> The inline keyword actually does not guarantee that the compiler will
> inline a function. Whenever the goal is to actually inline a
> function, __always_inline should always be preferred instead.
>
> On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
>
> $ size --format=GNU vmlinux.before vmlinux.after
> text data bss total filename
> 60449738 70975612 2288988 133714338 vmlinux.before
> 60446534 70972412 2289596 133708542 vmlinux.after
With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different
(i.e. a size increase):
allyesconfig:
text data bss total filename
58878600 72415994 2283652 133578246 vmlinux.before
58882250 72419706 2284004 133585960 vmlinux.after
atari_defconfig:
text data bss total filename
4112060 1579862 151680 5843602 vmlinux-v6.7-rc8
4117008 1579350 151680 5848038
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
The next patch offsets that for allyesconfig, but not for atari_defconfig.
> Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
> test_and_set_bit and friends")
Please don't split lines containing tags.
> Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
>
> Signed-off-by: Vincent Mailhol <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
<[email protected]> wrote:
>
> The compiler is not able to do constant folding on "asm volatile" code.
>
> Evaluate whether or not the function argument is a constant expression
> and if this is the case, return an equivalent builtin expression.
>
> On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB.
>
> $ size --format=GNU vmlinux.before vmlinux.after
> text data bss total filename
> 60446534 70972412 2289596 133708542 vmlinux.before
> 60429746 70978876 2291676 133700298 vmlinux.after
Still a win with gcc 9.5.0-1ubuntu1~22.04:
allyesconfig:
text data bss total filename
58882250 72419706 2284004 133585960
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
58863570 72419546 2285860 133568976
vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs
So an even bigger win...
atari_defconfig:
text data bss total filename
4117008 1579350 151680 5848038
vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
4116940 1579534 151712 5848186
vmlinux-v6.7-rc8-2-m68k-bitops-use-__builtin_clz,ctzl,ffs
... but a small size increase here.
>
> Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl()
> to evaluate constant expressions")
Please don't split lines containing tags.
> Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
>
> Signed-off-by: Vincent Mailhol <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Tue. 2 Jan. 2024 at 19:28, Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Vincent,
>
> Thanks for your patch!
Thanks for the review and for running the benchmark.
> On Sun, Dec 17, 2023 at 8:13 AM Vincent Mailhol
> <[email protected]> wrote:
> > The inline keyword actually does not guarantee that the compiler will
> > inline a function. Whenever the goal is to actually inline a
> > function, __always_inline should always be preferred instead.
> >
> > On an allyesconfig, with GCC 13.2.1, it saves roughly 5 KB.
> >
> > $ size --format=GNU vmlinux.before vmlinux.after
> > text data bss total filename
> > 60449738 70975612 2288988 133714338 vmlinux.before
> > 60446534 70972412 2289596 133708542 vmlinux.after
>
> With gcc 9.5.0-1ubuntu1~22.04, the figures are completely different
> (i.e. a size increase):
Those results are not normal: there should not be such a big
discrepancy between two versions of the same compiler. I double
checked everything and found out that I made a mistake when computing
the figures: I am not sure what exactly, but at some point the ASLR
seeds (or some other similar randomization feature) got reset, and so
the decrease I witnessed was just a "lucky roll".
After rerunning the benchmark (making sure to keep the same seeds), I got
similar results to yours:
text data bss total filename
60449738 70975356 2288988 133714082
vmlinux_allyesconfig.before_this_series
60446534 70979068 2289596 133715198
vmlinux_allyesconfig.after_first_patch
60429746 70979132 2291676 133700554
vmlinux_allyesconfig.final_second_patch
Note that there is still some randomness in the data segment,
as shown in these other benchmarks I ran:
text data bss total filename
60449738 70976124 2288988 133714850
vmlinux_allyesconfig.before_this_series
60446534 70980092 2289596 133716222
vmlinux_allyesconfig.after_first_patch
60429746 70979388 2291676 133700810
vmlinux_allyesconfig.after_second_patch
text data bss total filename
60449738 70975612 2288988 133714338
vmlinux_allyesconfig.before_this_series
60446534 70980348 2289596 133716478
vmlinux_allyesconfig.after_first_patch
60429746 70979900 2291676 133701322
vmlinux_allyesconfig.after_second_patch
But the error margin is within 1K.
So, in short, I inlined some functions which I shouldn't have. I am
preparing a v4 in which I will only inline the bit-find functions
(namely: __ffs(), ffs(), ffz(), __fls(), fls() and fls64()). Here are
the new figures:
text data bss total filename
60453552 70955485 2288620 133697657
vmlinux_allyesconfig.before_this_series
60450304 70953085 2289260 133692649
vmlinux_allyesconfig.after_first_patch
60433536 70952637 2291340 133677513
vmlinux_allyesconfig.after_second_patch
N.B. The new figures were taken after a rebase, so do not try to compare
them with the previous benchmarks. I will send the v4 soon, after I finish
updating the patch descriptions and double checking things.
Concerning the other functions in bitops.h, there may be some other
ones worth an __always_inline. But I will narrow the scope of this
series to only the bit-find functions. If a good Samaritan wants to
investigate the other functions, go ahead!
Yours sincerely,
Vincent Mailhol
> allyesconfig:
>
> text data bss total filename
> 58878600 72415994 2283652 133578246 vmlinux.before
> 58882250 72419706 2284004 133585960 vmlinux.after
>
> atari_defconfig:
>
> text data bss total filename
> 4112060 1579862 151680 5843602 vmlinux-v6.7-rc8
> 4117008 1579350 151680 5848038
> vmlinux-v6.7-rc8-1-m68k-bitops-force-inlining
>
> The next patch offsets that for allyesconfig, but not for atari_defconfig.
>
> > Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of
> > test_and_set_bit and friends")
>
> Please don't split lines containing tags.
>
> > Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
> >
> > Signed-off-by: Vincent Mailhol <[email protected]>
>
> Reviewed-by: Geert Uytterhoeven <[email protected]>
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
This series adds a compile test to make sure that all the bitops
operations (namely __ffs(), ffs(), ffz(), __fls(), fls(), fls64())
correctly fold constant expressions given that their argument is also
a constant expression. The other functions from bitops.h are out of
scope.
So far, only the m68k and the hexagon architectures lack such
optimization. To this end, the first two patches optimize the m68k
architecture bitops functions, and the third and fourth optimize the
hexagon architecture bitops functions.
The fifth and final patch adds the compile-time tests to assert that
the constant folding occurs and that the result is accurate.
This is tested on arm, arm64, hexagon, m68k, x86 and x86_64. For the
other architectures, I am putting my trust in the kernel test robot to
send a report if any of them still lacks bitops optimizations. The
kernel test robot did not complain about v3, giving me confidence that
all architectures are now properly optimized.
---
** Changelog **
v3 -> v4:
- Only apply the __always_inline to the bit-find functions, do not
touch other functions from bitops.h. I discovered that the
benchmark done in the v3 was incorrect (refer to the thread for
details). The scope was thus narrowed down to the bit-find
functions for which I could demonstrate the gain in the benchmark.
- Add benchmarks for hexagon (patches 3/5 and 4/5). Contrary to the
m68k benchmark, which uses an allyesconfig, the hexagon benchmark
uses a defconfig. The reason is just that the allyesconfig did not
work on the first try in my environment (even before applying this
series), and I did not spend effort troubleshooting it.
- Add Geert's review tag to patch 2/5. Despite also receiving the tag
for patch 1/5, I did not apply it there because of new changes in
that patch.
- Do not split the lines containing tags.
Link: https://lore.kernel.org/all/[email protected]/
v2 -> v3:
- Add patches 1/5 and 2/5 to optimize m68k architecture bitops.
Thanks to the kernel test robot for reporting!
- Add patches 3/5 and 4/5 to optimize hexagon architecture bitops.
Thanks to the kernel test robot for reporting!
- Patch 5/5: mark test_bitops_const_eval() as __always_inline and pass
n (the bit number under test) as a parameter. Previously, only
BIT(10) was tested. Add tests for BIT(0) and BIT(31).
Link: https://lore.kernel.org/all/[email protected]/
v1 -> v2:
- Drop the RFC patch. v1 was not ready to be applied on x86 because
of pending changes in arch/x86/include/asm/bitops.h. This was
finally fixed by Nick in commit 3dae5c43badf ("x86/asm/bitops: Use
__builtin_clz{l|ll} to evaluate constant expressions").
Thanks Nick!
- Update the commit description.
- Introduce the test_const_eval() macro to factorize code.
- No functional change.
Link: https://lore.kernel.org/all/[email protected]/
Vincent Mailhol (5):
m68k/bitops: force inlining of all bit-find functions
m68k/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
expressions
hexagon/bitops: force inlining of all bit-find functions
hexagon/bitops: use __builtin_{clz,ctzl,ffs} to evaluate constant
expressions
lib: test_bitops: add compile-time optimization/evaluations assertions
arch/hexagon/include/asm/bitops.h | 25 +++++++++++++++++++-----
arch/m68k/include/asm/bitops.h | 26 ++++++++++++++++++-------
lib/Kconfig.debug | 4 ++++
lib/test_bitops.c | 32 +++++++++++++++++++++++++++++++
4 files changed, 75 insertions(+), 12 deletions(-)
--
2.43.0
The inline keyword actually does not guarantee that the compiler will
inline a function. Whenever the goal is to actually inline a
function, __always_inline should always be preferred instead.
__always_inline is also needed for further optimizations which will
come up in a follow-up patch.
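To see why, consider the minimal stand-alone sketch below (hypothetical
user-space code, not taken from the kernel): __builtin_constant_p() only
returns true when the check is evaluated in the caller, where the
argument really is a constant expression. If the helper were emitted out
of line, its parameter would never be a constant and the builtin path
added in the next patch could never be taken.

	/*
	 * Build with -O2 (as Kbuild does). With plain "inline" the
	 * compiler is free to emit an out-of-line copy in which
	 * __builtin_constant_p(x) is always false; __always_inline
	 * forces the check to be evaluated in the caller.
	 */
	static inline __attribute__((always_inline)) int sees_a_constant(int x)
	{
		return __builtin_constant_p(x);
	}

	int main(void)
	{
		return sees_a_constant(42) ? 0 : 1; /* returns 0 at -O2 */
	}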
Inline all the bit-find functions which have a custom m68k assembly
implementation, namely: __ffs(), ffs(), ffz(), __fls(), fls().
On a Linux v6.7 allyesconfig with GCC 13.2.1, it does not impact the
final size, meaning that, overall, those functions were already inlined
by modern GCC:
$ size --format=GNU vmlinux.before vmlinux.after
text data bss total filename
60457956 70953665 2288644 133700265 vmlinux.before
60457964 70953697 2288644 133700305 vmlinux.after
Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
Signed-off-by: Vincent Mailhol <[email protected]>
---
arch/m68k/include/asm/bitops.h | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 14c64a6f1217..a8b23f897f24 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -465,7 +465,7 @@ static inline int find_next_bit(const unsigned long *vaddr, int size,
* ffz = Find First Zero in word. Undefined if no zero exists,
* so code should check against ~0UL first..
*/
-static inline unsigned long ffz(unsigned long word)
+static __always_inline unsigned long ffz(unsigned long word)
{
int res;
@@ -488,7 +488,7 @@ static inline unsigned long ffz(unsigned long word)
*/
#if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
!defined(CONFIG_M68000)
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
{
__asm__ __volatile__ ("bitrev %0; ff1 %0"
: "=d" (x)
@@ -496,7 +496,7 @@ static inline unsigned long __ffs(unsigned long x)
return x;
}
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
{
if (!x)
return 0;
@@ -518,7 +518,7 @@ static inline int ffs(int x)
* the libc and compiler builtin ffs routines, therefore
* differs in spirit from the above ffz (man ffs).
*/
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
{
int cnt;
@@ -528,7 +528,7 @@ static inline int ffs(int x)
return 32 - cnt;
}
-static inline unsigned long __ffs(unsigned long x)
+static __always_inline unsigned long __ffs(unsigned long x)
{
return ffs(x) - 1;
}
@@ -536,7 +536,7 @@ static inline unsigned long __ffs(unsigned long x)
/*
* fls: find last bit set.
*/
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
{
int cnt;
@@ -546,7 +546,7 @@ static inline int fls(unsigned int x)
return 32 - cnt;
}
-static inline unsigned long __fls(unsigned long x)
+static __always_inline unsigned long __fls(unsigned long x)
{
return fls(x) - 1;
}
--
2.43.0
The compiler is not able to do constant folding on "asm volatile" code.
Evaluate whether or not the function argument is a constant expression
and if this is the case, return an equivalent builtin expression.
On linux 6.7 with an allyesconfig and GCC 13.2.1, it saves roughly 11 KB.
$ size --format=GNU vmlinux.before vmlinux.after
text data bss total filename
60457964 70953697 2288644 133700305 vmlinux.before
60441196 70957057 2290724 133688977 vmlinux.after
Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Vincent Mailhol <[email protected]>
---
arch/m68k/include/asm/bitops.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index a8b23f897f24..02ec8a193b96 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -469,6 +469,9 @@ static __always_inline unsigned long ffz(unsigned long word)
{
int res;
+ if (__builtin_constant_p(word))
+ return __builtin_ctzl(~word);
+
__asm__ __volatile__ ("bfffo %1{#0,#0},%0"
: "=d" (res) : "d" (~word & -~word));
return res ^ 31;
@@ -490,6 +493,9 @@ static __always_inline unsigned long ffz(unsigned long word)
!defined(CONFIG_M68000)
static __always_inline unsigned long __ffs(unsigned long x)
{
+ if (__builtin_constant_p(x))
+ return __builtin_ctzl(x);
+
__asm__ __volatile__ ("bitrev %0; ff1 %0"
: "=d" (x)
: "0" (x));
@@ -522,6 +528,9 @@ static __always_inline int ffs(int x)
{
int cnt;
+ if (__builtin_constant_p(x))
+ return __builtin_ffs(x);
+
__asm__ ("bfffo %1{#0:#0},%0"
: "=d" (cnt)
: "dm" (x & -x));
@@ -540,6 +549,9 @@ static __always_inline int fls(unsigned int x)
{
int cnt;
+ if (__builtin_constant_p(x))
+ return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
__asm__ ("bfffo %1{#0,#0},%0"
: "=d" (cnt)
: "dm" (x));
--
2.43.0
The inline keyword actually does not guarantee that the compiler will
inline a function. Whenever the goal is to actually inline a
function, __always_inline should always be preferred instead.
__always_inline is also needed for further optimizations which will
come up in a follow-up patch.
Inline all the bit-find functions which have a custom hexagon assembly
implementation, namely: __ffs(), ffs(), ffz(), __fls(), fls().
On a Linux v6.7 defconfig with clang 17.0.6, it does not impact the
final size, meaning that, overall, those functions were already inlined
by modern clang:
$ size --format=GNU vmlinux.before vmlinux.after
text data bss total filename
4827900 1798340 364057 6990297 vmlinux.before
4827900 1798340 364057 6990297 vmlinux.after
Reference: commit 8dd5032d9c54 ("x86/asm/bitops: Force inlining of test_and_set_bit and friends")
Link: https://git.kernel.org/torvalds/c/8dd5032d9c54
Signed-off-by: Vincent Mailhol <[email protected]>
---
arch/hexagon/include/asm/bitops.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 160d8f37fa1a..e856d6dbfe16 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -200,7 +200,7 @@ arch_test_bit_acquire(unsigned long nr, const volatile unsigned long *addr)
*
* Undefined if no zero exists, so code should check against ~0UL first.
*/
-static inline long ffz(int x)
+static __always_inline long ffz(int x)
{
int r;
@@ -217,7 +217,7 @@ static inline long ffz(int x)
* This is defined the same way as ffs.
* Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
*/
-static inline int fls(unsigned int x)
+static __always_inline int fls(unsigned int x)
{
int r;
@@ -238,7 +238,7 @@ static inline int fls(unsigned int x)
* the libc and compiler builtin ffs routines, therefore
* differs in spirit from the above ffz (man ffs).
*/
-static inline int ffs(int x)
+static __always_inline int ffs(int x)
{
int r;
@@ -260,7 +260,7 @@ static inline int ffs(int x)
* bits_per_long assumed to be 32
* numbering starts at 0 I think (instead of 1 like ffs)
*/
-static inline unsigned long __ffs(unsigned long word)
+static __always_inline unsigned long __ffs(unsigned long word)
{
int num;
@@ -278,7 +278,7 @@ static inline unsigned long __ffs(unsigned long word)
* Undefined if no set bit exists, so code should check against 0 first.
* bits_per_long assumed to be 32
*/
-static inline unsigned long __fls(unsigned long word)
+static __always_inline unsigned long __fls(unsigned long word)
{
int num;
--
2.43.0
Add a function in the bitops test suite to assert that the bitops
helper functions correctly fold into constant expressions (or trigger
a build bug otherwise). This should work on all the optimization
levels supported by Kbuild.
The added function does not perform any runtime tests and gets
optimized out to nothing after passing the build assertions.
Suggested-by: Yury Norov <[email protected]>
Signed-off-by: Vincent Mailhol <[email protected]>
---
lib/Kconfig.debug | 4 ++++
lib/test_bitops.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4405f81248fb..85f8638b3ae6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2439,6 +2439,10 @@ config TEST_BITOPS
compilations. It has no dependencies and doesn't run or load unless
explicitly requested by name. for example: modprobe test_bitops.
+ In addition, check that the compiler is able to fold the bitops
+ function into a compile-time constant (given that the argument is also
+ a compile-time constant) and trigger a build bug otherwise.
+
If unsure, say N.
config TEST_VMALLOC
diff --git a/lib/test_bitops.c b/lib/test_bitops.c
index 3b7bcbee84db..99b612515eb6 100644
--- a/lib/test_bitops.c
+++ b/lib/test_bitops.c
@@ -50,6 +50,34 @@ static unsigned long order_comb_long[][2] = {
};
#endif
+/* Assert that a boolean expression can be folded in a constant and is true. */
+#define test_const_eval(test_expr) \
+({ \
+ /* Evaluate once so that compiler can fold it. */ \
+ bool __test_expr = test_expr; \
+ \
+ BUILD_BUG_ON(!__builtin_constant_p(__test_expr)); \
+ BUILD_BUG_ON(!__test_expr); \
+})
+
+/*
+ * On any supported optimization level (-O2, -Os) and if invoked with
+ * a compile-time constant argument, the compiler must be able to fold
+ * into a constant expression all the bit find functions. Namely:
+ * __ffs(), ffs(), ffz(), __fls(), fls() and fls64(). Otherwise,
+ * trigger a build bug.
+ */
+static __always_inline void test_bitops_const_eval(unsigned int n)
+{
+ test_const_eval(__ffs(BIT(n)) == n);
+ test_const_eval(ffs(BIT(n)) == n + 1);
+ test_const_eval(ffz(~BIT(n)) == n);
+ test_const_eval(__fls(BIT(n)) == n);
+ test_const_eval(fls(BIT(n)) == n + 1);
+ test_const_eval(fls64(BIT_ULL(n)) == n + 1);
+ test_const_eval(fls64(BIT_ULL(n + 32)) == n + 33);
+}
+
static int __init test_bitops_startup(void)
{
int i, bit_set;
@@ -94,6 +122,10 @@ static int __init test_bitops_startup(void)
if (bit_set != BITOPS_LAST)
pr_err("ERROR: FOUND SET BIT %d\n", bit_set);
+ test_bitops_const_eval(0);
+ test_bitops_const_eval(10);
+ test_bitops_const_eval(31);
+
pr_info("Completed bitops test\n");
return 0;
--
2.43.0
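As a side note, the fold check from the patch above can be reproduced
outside the kernel tree as a small compile test. Below is a minimal
sketch, assuming GCC (or a clang recent enough to support the error
function attribute) and at least -O1; all names are illustrative and not
taken from the patch:

/* fold_check.c -- compile test only, e.g.: gcc -O2 -c fold_check.c
 * The error attribute mimics what the kernel's BUILD_BUG_ON() relies
 * on: the call below is kept (and the build fails) only if the
 * compiler could not prove the condition true at compile time.
 */
#include <stdbool.h>

#define ASSERT_CONST_TRUE(expr)						\
do {									\
	extern void fold_check_failed(void)				\
		__attribute__((__error__("expression did not fold")));	\
	bool __e = (expr);						\
									\
	if (!__builtin_constant_p(__e) || !__e)				\
		fold_check_failed();					\
} while (0)

static inline __attribute__((__always_inline__)) int my_fls(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

void check_fold(void)
{
	ASSERT_CONST_TRUE(my_fls(1u << 10) == 11);
	ASSERT_CONST_TRUE(__builtin_ffs(1 << 10) == 11);
}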
The compiler is not able to do constant folding on "asm volatile" code.
Evaluate whether or not the function argument is a constant expression
and if this is the case, return an equivalent builtin expression.
On linux 6.7 with an allyesconfig and clang 17.0.6, it saves roughly
4 KB.
$ size --format=GNU vmlinux.before vmlinux.after
text data bss total filename
4827900 1798340 364057 6990297 vmlinux.before
4827072 1795060 364057 6986189 vmlinux.after
Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions")
Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
Signed-off-by: Vincent Mailhol <[email protected]>
---
arch/hexagon/include/asm/bitops.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index e856d6dbfe16..c74a639c84f3 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -204,6 +204,9 @@ static __always_inline long ffz(int x)
{
int r;
+ if (__builtin_constant_p(x))
+ return __builtin_ctzl(~x);
+
asm("%0 = ct1(%1);\n"
: "=&r" (r)
: "r" (x));
@@ -221,6 +224,9 @@ static __always_inline int fls(unsigned int x)
{
int r;
+ if (__builtin_constant_p(x))
+ return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
+
asm("{ %0 = cl0(%1);}\n"
"%0 = sub(#32,%0);\n"
: "=&r" (r)
@@ -242,6 +248,9 @@ static __always_inline int ffs(int x)
{
int r;
+ if (__builtin_constant_p(x))
+ return __builtin_ffs(x);
+
asm("{ P0 = cmp.eq(%1,#0); %0 = ct0(%1);}\n"
"{ if (P0) %0 = #0; if (!P0) %0 = add(%0,#1);}\n"
: "=&r" (r)
@@ -264,6 +273,9 @@ static __always_inline unsigned long __ffs(unsigned long word)
{
int num;
+ if (__builtin_constant_p(word))
+ return __builtin_ctzl(word);
+
asm("%0 = ct0(%1);\n"
: "=&r" (num)
: "r" (word));
@@ -282,6 +294,9 @@ static __always_inline unsigned long __fls(unsigned long word)
{
int num;
+ if (__builtin_constant_p(word))
+ return BITS_PER_LONG - 1 - __builtin_clzl(word);
+
asm("%0 = cl0(%1);\n"
"%0 = sub(#31,%0);\n"
: "=&r" (num)
--
2.43.0
On Sun, 28 Jan 2024, Vincent Mailhol wrote:
> The compiler is not able to do constant folding on "asm volatile" code.
>
> Evaluate whether or not the function argument is a constant expression
> and if this is the case, return an equivalent builtin expression.
>
> On linux 6.7 with an allyesconfig and GCC 13.2.1, it saves roughly 11 KB.
>
> $ size --format=GNU vmlinux.before vmlinux.after
> text data bss total filename
> 60457964 70953697 2288644 133700305 vmlinux.before
> 60441196 70957057 2290724 133688977 vmlinux.after
>
> Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions")
> Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
>
> Reviewed-by: Geert Uytterhoeven <[email protected]>
> Signed-off-by: Vincent Mailhol <[email protected]>
> ---
> arch/m68k/include/asm/bitops.h | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index a8b23f897f24..02ec8a193b96 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -469,6 +469,9 @@ static __always_inline unsigned long ffz(unsigned long word)
> {
> int res;
>
> + if (__builtin_constant_p(word))
> + return __builtin_ctzl(~word);
> +
> __asm__ __volatile__ ("bfffo %1{#0,#0},%0"
> : "=d" (res) : "d" (~word & -~word));
> return res ^ 31;
If the builtin has the desired behaviour, why do we reimplement it in asm?
Shouldn't we abandon one or the other to avoid having to prove (and
maintain) their equivalence?
> @@ -490,6 +493,9 @@ static __always_inline unsigned long ffz(unsigned long word)
> !defined(CONFIG_M68000)
> static __always_inline unsigned long __ffs(unsigned long x)
> {
> + if (__builtin_constant_p(x))
> + return __builtin_ctzl(x);
> +
> __asm__ __volatile__ ("bitrev %0; ff1 %0"
> : "=d" (x)
> : "0" (x));
> @@ -522,6 +528,9 @@ static __always_inline int ffs(int x)
> {
> int cnt;
>
> + if (__builtin_constant_p(x))
> + return __builtin_ffs(x);
> +
> __asm__ ("bfffo %1{#0:#0},%0"
> : "=d" (cnt)
> : "dm" (x & -x));
> @@ -540,6 +549,9 @@ static __always_inline int fls(unsigned int x)
> {
> int cnt;
>
> + if (__builtin_constant_p(x))
> + return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
> +
> __asm__ ("bfffo %1{#0,#0},%0"
> : "=d" (cnt)
> : "dm" (x));
>
On Sun. 28 Jan. 2024 at 14:39, Finn Thain <[email protected]> wrote:
> On Sun, 28 Jan 2024, Vincent Mailhol wrote:
>
> > The compiler is not able to do constant folding on "asm volatile" code.
> >
> > Evaluate whether or not the function argument is a constant expression
> > and if this is the case, return an equivalent builtin expression.
> >
> > On linux 6.7 with an allyesconfig and GCC 13.2.1, it saves roughly 11 KB.
> >
> > $ size --format=GNU vmlinux.before vmlinux.after
> > text data bss total filename
> > 60457964 70953697 2288644 133700305 vmlinux.before
> > 60441196 70957057 2290724 133688977 vmlinux.after
> >
> > Reference: commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions")
> > Link: https://git.kernel.org/torvalds/c/fdb6649ab7c1
> >
> > Reviewed-by: Geert Uytterhoeven <[email protected]>
> > Signed-off-by: Vincent Mailhol <[email protected]>
> > ---
> > arch/m68k/include/asm/bitops.h | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> > index a8b23f897f24..02ec8a193b96 100644
> > --- a/arch/m68k/include/asm/bitops.h
> > +++ b/arch/m68k/include/asm/bitops.h
> > @@ -469,6 +469,9 @@ static __always_inline unsigned long ffz(unsigned long word)
> > {
> > int res;
> >
> > + if (__builtin_constant_p(word))
> > + return __builtin_ctzl(~word);
> > +
> > __asm__ __volatile__ ("bfffo %1{#0,#0},%0"
> > : "=d" (res) : "d" (~word & -~word));
> > return res ^ 31;
>
> If the builtin has the desired behaviour, why do we reimplement it in asm?
> Shouldn't we abandon one or the other to avoid having to prove (and
> maintain) their equivalence?
The asm is meant to produce better results when the argument is not a
constant expression. Below commit is a good illustration of why we
want both the asm and the builtin:
https://git.kernel.org/torvalds/c/146034fed6ee
I say "is meant", because I did not assert whether this is still true.
Note that there are some cases in which the asm is not better anymore,
for example, see this thread:
https://lore.kernel.org/lkml/[email protected]/
but I did not receive more answers, so I stopped trying to investigate
the subject.
If you want, you can check the produced assembly of both the asm and
the builtin for both clang and gcc, and if the builtin is always
either better or equivalent, then the asm can be removed. That said, I
am not spending more effort there after being ghosted once (c.f. above
thread).
> > @@ -490,6 +493,9 @@ static __always_inline unsigned long ffz(unsigned long word)
> > !defined(CONFIG_M68000)
> > static __always_inline unsigned long __ffs(unsigned long x)
> > {
> > + if (__builtin_constant_p(x))
> > + return __builtin_ctzl(x);
> > +
> > __asm__ __volatile__ ("bitrev %0; ff1 %0"
> > : "=d" (x)
> > : "0" (x));
> > @@ -522,6 +528,9 @@ static __always_inline int ffs(int x)
> > {
> > int cnt;
> >
> > + if (__builtin_constant_p(x))
> > + return __builtin_ffs(x);
> > +
> > __asm__ ("bfffo %1{#0:#0},%0"
> > : "=d" (cnt)
> > : "dm" (x & -x));
> > @@ -540,6 +549,9 @@ static __always_inline int fls(unsigned int x)
> > {
> > int cnt;
> >
> > + if (__builtin_constant_p(x))
> > + return x ? BITS_PER_TYPE(x) - __builtin_clz(x) : 0;
> > +
> > __asm__ ("bfffo %1{#0,#0},%0"
> > : "=d" (cnt)
> > : "dm" (x));
> >
From: Vincent MAILHOL
> Sent: 28 January 2024 06:27
>
> On Sun. 28 Jan. 2024 at 14:39, Finn Thain <[email protected]> wrote:
> > On Sun, 28 Jan 2024, Vincent Mailhol wrote:
> >
> > > The compiler is not able to do constant folding on "asm volatile" code.
> > >
> > > Evaluate whether or not the function argument is a constant expression
> > > and if this is the case, return an equivalent builtin expression.
> > >
...
> > If the builtin has the desired behaviour, why do we reimplement it in asm?
> > Shouldn't we abandon one or the other to avoid having to prove (and
> > maintain) their equivalence?
>
> The asm is meant to produce better results when the argument is not a
> constant expression. Below commit is a good illustration of why we
> want both the asm and the builtin:
>
> https://git.kernel.org/torvalds/c/146034fed6ee
>
> I say "is meant", because I did not assert whether this is still true.
> Note that there are some cases in which the asm is not better anymore,
> for example, see this thread:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> but I did not receive more answers, so I stopped trying to investigate
> the subject.
>
> If you want, you can check the produced assembly of both the asm and
> the builtin for both clang and gcc, and if the builtin is always
> either better or equivalent, then the asm can be removed. That said, I
> am not spending more effort there after being ghosted once (c.f. above
> thread).
I don't see any example there of why the __builtin_xxx() versions
shouldn't be used all the time.
(The x86-64 asm blocks contain unrelated call instructions and objdump
wasn't passed -d to show what they were.
One even has the 'return thunk' pessimisation showing.)
I actually suspect the asm versions predate the builtins.
Does (or can) the outer common header use the __builtin functions
if no asm version exists?
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Sun. 28 janv. 2024 at 21:16, David Laight <[email protected]> wrote:
> From: Vincent MAILHOL
> > Sent: 28 January 2024 06:27
> >
> > On Sun. 28 Jan. 2024 at 14:39, Finn Thain <[email protected]> wrote:
> > > On Sun, 28 Jan 2024, Vincent Mailhol wrote:
> > >
> > > > The compiler is not able to do constant folding on "asm volatile" code.
> > > >
> > > > Evaluate whether or not the function argument is a constant expression
> > > > and if this is the case, return an equivalent builtin expression.
> > > >
> ...
> > > If the builtin has the desired behaviour, why do we reimplement it in asm?
> > > Shouldn't we abandon one or the other to avoid having to prove (and
> > > maintain) their equivalence?
> >
> > The asm is meant to produce better results when the argument is not a
> > constant expression. Below commit is a good illustration of why we
> > want both the asm and the builtin:
> >
> > https://git.kernel.org/torvalds/c/146034fed6ee
> >
> > I say "is meant", because I did not assert whether this is still true.
> > Note that there are some cases in which the asm is not better anymore,
> > for example, see this thread:
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > but I did not receive more answers, so I stopped trying to investigate
> > the subject.
> >
> > If you want, you can check the produced assembly of both the asm and
> > the builtin for both clang and gcc, and if the builtin is always
> > either better or equivalent, then the asm can be removed. That said, I
> > am not spending more effort there after being ghosted once (c.f. above
> > thread).
>
> I don't see any example there of why the __builtin_xxx() versions
> shouldn't be used all the time.
> (The x86-64 asm blocks contain unrelated call instructions and objdump
> wasn't passed -d to show what they were.
> One even has the 'return thunk' pessimisation showing.)
Fair. My goal was not to point to the assembly code but to this sentence:
However, for non constant expressions, the kernel's ffs() asm
version remains better for x86_64 because, contrary to GCC, it
doesn't emit the CMOV assembly instruction
I should have been clearer. Sorry for that.
But the fact remains: on x86, some of the asm produced more optimized
code than the builtin.
> I actually suspect the asm versions predate the builtins.
This seems true. The __builtin versions were introduced in:
generic: Implement generic ffs/fls using __builtin_* functions
https://git.kernel.org/torvalds/c/048fa2df92c3
at which point the asm implementations already existed on m68k.
> Does (or can) the outer common header use the __builtin functions
> if no asm version exists?
Yes, this would be extremely easy. You just need to
#include <asm-generic/bitops/builtin-__ffs.h>
#include <asm-generic/bitops/builtin-ffs.h>
#include <asm-generic/bitops/builtin-__fls.h>
#include <asm-generic/bitops/builtin-fls.h>
and remove all the asm implementations. If you give me your green
light, I can do that change. This is trivial. The only thing I am not
ready to do is to compare the produced assembly code and confirm
whether or not it is better to remove asm code.
Thoughts?
Yours sincerely,
Vincent Mailhol
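For reference, those asm-generic wrappers are just one-line inline
functions around the compiler builtins. Roughly (paraphrased from memory,
so double-check against include/asm-generic/bitops/ in the tree):

/* approximation of include/asm-generic/bitops/builtin-__ffs.h */
static __always_inline unsigned long __ffs(unsigned long word)
{
	return __builtin_ctzl(word);
}

/* approximation of include/asm-generic/bitops/builtin-fls.h */
static __always_inline int fls(unsigned int x)
{
	return x ? sizeof(x) * 8 - __builtin_clz(x) : 0;
}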
From: Vincent MAILHOL
> Sent: 28 January 2024 13:28
...
> Fair. My goal was not to point to the assembly code but to this sentence:
>
> However, for non constant expressions, the kernel's ffs() asm
> version remains better for x86_64 because, contrary to GCC, it
> doesn't emit the CMOV assembly instruction
The comment in the code is:
* AMD64 says BSFL won't clobber the dest reg if x==0; Intel64 says the
* dest reg is undefined if x==0, but their CPU architect says its
* value is written to set it to the same as before, except that the
* top 32 bits will be cleared.
I guess gcc isn't willing to act on hearsay!
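For context, the x86_64 ffs() that this comment belongs to leans on
exactly that behaviour to avoid the CMOV; roughly (paraphrased from
arch/x86/include/asm/bitops.h, treat it as a sketch rather than the
authoritative source):

static __always_inline int ffs(int x)
{
	int r;

	/*
	 * Pre-load the destination with -1: if x == 0, BSF is trusted
	 * to leave it untouched, so the function returns 0 without a
	 * conditional move.
	 */
	asm("bsfl %1,%0"
	    : "=r" (r)
	    : "rm" (x), "0" (-1));

	return r + 1;
}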
>
> I should have been more clear. Sorry for that.
>
> But the fact remains, on x86, some of the asm produced more optimized
> code than the builtin.
>
> > I actually suspect the asm versions predate the builtins.
>
> This seems true. The __bultins were introduced in:
>
> generic: Implement generic ffs/fls using __builtin_* functions
> https://git.kernel.org/torvalds/c/048fa2df92c3
I was thinking of compiler versions, not kernel source commits.
You'd need to be doing some serious software archaeology.
> when the asm implementation already existed in m68k.
>
> > Does (or can) the outer common header use the __builtin functions
> > if no asm version exists?
>
> Yes, this would be extremely easy. You just need to
>
> #include <asm-generic/bitops/builtin-__ffs.h>
> #include <asm-generic/bitops/builtin-ffs.h>
> #include <asm-generic/bitops/builtin-__fls.h>
> #include <asm-generic/bitops/builtin-fls.h>
>
> and remove all the asm implementations. If you give me your green
> light, I can do that change. This is trivial. The only thing I am not
> ready to do is to compare the produced assembly code and confirm
> whether or not it is better to remove asm code.
>
> Thoughts?
Not for me to say.
But the builtins are likely to generate reasonable code.
So unless the asm can be better (like trusting undocumented
x86 cpu behaviour), using them is probably best.
For the ones you are changing, it may be best to propose using
the builtins all the time.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Sun, 28 Jan 2024, Vincent MAILHOL wrote:
> > > The asm is meant to produce better results when the argument is not
> > > a constant expression.
Is that because gcc's implementation has to satisfy requirements that are
excessively stringent for the kernel's purposes? Or is it a compiler
deficiency only affecting certain architectures?
> ... The only thing I am not ready to do is to compare the produced
> assembly code and confirm whether or not it is better to remove asm
> code.
>
If you do the comparison and find no change, you get to say so in the
commit log, and everyone is happy.
Hi Finn and David,
Sorry for the late feedback, I did not have much time during weekdays.
On Monday. 29 Jan. 2024 at 07:34, Finn Thain <[email protected]> wrote:
> On Sun, 28 Jan 2024, Vincent MAILHOL wrote:
> > > > The asm is meant to produce better results when the argument is not
> > > > a constant expression.
>
> Is that because gcc's implementation has to satisfy requirements that are
> excessively stringent for the kernel's purposes? Or is it a compiler
> deficiency only affecting certain architectures?
I just guess that GCC guys followed the Intel datasheet while the
kernel guys like to live a bit more dangerously and rely on some not
so well defined behaviours... But I am really not the person to whom
you should ask.
I just want to optimize the constant folding and this is the only
purpose of this series. I am absolutely not an asm expert. That's also why I
am reluctant to compare the asm outputs.
> > ... The only thing I am not ready to do is to compare the produced
> > assembly code and confirm whether or not it is better to remove asm
> > code.
> >
>
> If you do the comparison and find no change, you get to say so in the
> commit log, and everyone is happy.
Without getting into details, here is a quick comparison of the gcc- and
clang-generated asm for all the bitops builtins:
https://godbolt.org/z/Yb8nMKnYf
To the extent of my limited knowledge, it looks rather OK for gcc, but
for clang... it seems to fall back to the default software implementation.
So are we fine with the current patch? Or maybe clang support is not
important for m68k? I do not know, so tell me :)
Yours sincerely,
Vincent Mailhol
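For anyone who prefers to redo the comparison locally instead of on
godbolt, a minimal sketch (the bfffo asm is copied from the m68k fls()
quoted earlier in the thread; the cross-compiler name and the
availability of clang's experimental m68k backend are assumptions):

/* compare_fls.c -- e.g.:
 *   m68k-linux-gnu-gcc -O2 -S compare_fls.c
 *   clang --target=m68k-linux-gnu -O2 -S compare_fls.c
 * then compare the two .s outputs side by side.
 */
unsigned int fls_builtin(unsigned int x)
{
	return x ? 32 - (unsigned int)__builtin_clz(x) : 0;
}

unsigned int fls_bfffo(unsigned int x)
{
	int cnt;

	__asm__ ("bfffo %1{#0,#0},%0"
		 : "=d" (cnt)
		 : "dm" (x));
	return 32 - cnt;
}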
On Sun, 4 Feb 2024, Vincent MAILHOL wrote:
> Sorry for the late feedback, I did not have much time during weekdays.
>
> On Monday. 29 Jan. 2024 at 07:34, Finn Thain <[email protected]> wrote:
> > On Sun, 28 Jan 2024, Vincent MAILHOL wrote:
> > > > > The asm is meant to produce better results when the argument is not
> > > > > a constant expression.
> >
> > Is that because gcc's implementation has to satisfy requirements that are
> > excessively stringent for the kernel's purposes? Or is it a compiler
> > deficiency only affecting certain architectures?
>
> I just guess that GCC guys followed the Intel datasheet while the
> kernel guys like to live a bit more dangerously and rely on some not
> so well defined behaviours... But I am really not the person to whom
> you should ask.
>
> I just want to optimize the constant folding and this is the only
> purpose of this series. I am absolutely not an asm expert. That's also why I
> am reluctant to compare the asm outputs.
>
How does replacing asm with a builtin prevent constant folding?
> > > ... The only thing I am not ready to do is to compare the produced
> > > assembly code and confirm whether or not it is better to remove asm
> > > code.
> > >
> >
> > If you do the comparison and find no change, you get to say so in the
> > commit log, and everyone is happy.
>
> Without getting into details, here is a quick comparison of the gcc- and
> clang-generated asm for all the bitops builtins:
>
> https://godbolt.org/z/Yb8nMKnYf
>
> To the extent of my limited knowledge, it looks rather OK for gcc, but
> for clang... seems that this is the default software implementation.
>
> So are we fine with the current patch? Or maybe clang support is not
> important for m68k? I do not know so tell me :)
>
Let's see if I understand.
You are proposing that the kernel source carry an unquantified
optimization, with inherent complexity and maintenance costs, just for the
benefit of users who choose a compiler that doesn't work as well as the
standard compiler. Is that it?
At some point in the future, when clang comes up to scratch with gcc and the
builtin reaches parity with the asm, I wonder if you will then remove both
your optimization and the asm, to eliminate the aforementioned complexity
and maintenance costs. Is there an incentive for that work?
On Mon. 5 Feb. 2024 at 08:13, Finn Thain <[email protected]> wrote:
> On Sun, 4 Feb 2024, Vincent MAILHOL wrote:
(...)
> Let's see if I understand.
>
> You are proposing that the kernel source carry an unquantified
> optimization, with inherent complexity and maintenance costs, just for the
> benefit of users who choose a compiler that doesn't work as well as the
> standard compiler. Is that it?
My proposal is quantified, cf. my commit message:
On an allyesconfig, with GCC 13.2.1, it saves roughly 8 KB.
"Saving roughly 8 KB" is a quantification.
And clang use in the kernel is standardized:
https://www.kernel.org/doc/html/latest/process/changes.html#current-minimal-requirements
GCC may be predominant, but, although optional, clang v11+ is
officially supported, and thus my patches should not neglect it.
This is why I am asking whether or not clang support is important
for m68k. If you tell me it is not, then fine, I will remove all
the asm (by the way, the patch is already ready). But if there are
even a few users who care about clang for m68k, then I do not think we
should penalize them and I would not sign-off a change which
negatively impacts some users.
The linux-m68k mailing list should know better than me if people care
about clang support. So let me reiterate the question from my previous
message: is clang support important for you?
I would like a formal acknowledgement from the linux-m68k mailing list
before sending a patch that may negatively impact some users. Thank
you.
> At some point in the future when clang comes up to scratch with gcc and the
> builtin reaches parity with the asm, I wonder if you will then remove both
> your optimization and the asm, to eliminate the afore-mentioned complexity
> and maintenance costs. Is there an incentive for that work?
I will not follow the evolution of clang support for the m68k builtins.
The goal of this series is to add a test asserting that all
architectures correctly do constant folding on the bit-find
operations (fifth patch of this series). It happens that the m68k and
hexagon architectures fail that test, so I am fixing them as a
one-shot. After this series, I have no plans to do further work on the
m68k and hexagon architectures. Thank you for your understanding.
Yours sincerely,
Vincent Mailhol
On Mon, 5 Feb 2024, Vincent MAILHOL wrote:
>
> This is why I am asking whether or not clang support is important or not
> for m68k. If you tell me it is not, then fine, I will remove all the asm
> (by the way, the patch is already ready). But if there are even a few
> users who care about clang for m68k, then I do not think we should
> penalize them and I would not sign-off a change which negatively impacts
> some users.
>
If clang support is important then clang's builtins are important. So why
not improve those instead? That would resolve the issue in a win-win.
On Mon. 5 Feb. 2024 at 18:48, Finn Thain <[email protected]> wrote:
> On Mon, 5 Feb 2024, Vincent MAILHOL wrote:
>
> >
> > This is why I am asking whether or not clang support is important or not
> > for m68k. If you tell me it is not, then fine, I will remove all the asm
> > (by the way, the patch is already ready). But if there are even a few
> > users who care about clang for m68k, then I do not think we should
> > penalize them and I would not sign-off a change which negatively impacts
> > some users.
> >
>
> If clang support is important then clang's builtins are important. So why
> not improve those instead? That would resolve the issue in a win-win.
I am deeply sorry, but with all due respect, this request is unfair.
I will not fix the compiler.
Let me repeat my question for the third time: are you (or any other
m68k maintainer) ready to acknowledge that we can degrade the
performance for clang m68k users? With that acknowledgement, I will
remove the asm and replace it with the builtins.
Yours sincerely,
Vincent Mailhol
GCC and clang provide a set of builtin functions which can be used as
a replacement for ffs(), __ffs(), fls() and __fls().
Using the builtins comes with a trade-off.
Pros:
- Simpler code, easier to maintain
- Guaranteed constant folding
Cons:
- clang does not produce optimized code. Ref:
https://godbolt.org/z/Yb8nMKnYf
Despite the cons, use the builtins unconditionally and remove any
existing assembly implementation.
With that done, also remove the find_first_zero_bit(), find_next_zero_bit(),
find_first_bit() and find_next_bit() assembly implementations.
Instead, rely on the generic functions from linux/find.h which will
themselves rely on the builtins we just set up.
Not-signed-off-by: Vincent Mailhol <[email protected]>
---
As written earlier, I do not want to sign off on a patch which can
degrade the performance for m68k clang users. But because it is
already written, here it is.
If someone wants to pick this up, go ahead. Just make sure to remove
any reference to me. I hereby grant permission for anyone to reuse
this patch, with or without modifications, under the sole condition
that my name is removed.
---
arch/m68k/include/asm/bitops.h | 215 +--------------------------------
1 file changed, 5 insertions(+), 210 deletions(-)
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 14c64a6f1217..44aac8ced9fc 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -340,218 +340,13 @@ static inline bool xor_unlock_is_negative_byte(unsigned long mask,
#endif
}
-/*
- * The true 68020 and more advanced processors support the "bfffo"
- * instruction for finding bits. ColdFire and simple 68000 parts
- * (including CPU32) do not support this. They simply use the generic
- * functions.
- */
-#if defined(CONFIG_CPU_HAS_NO_BITFIELDS)
-#include <asm-generic/bitops/ffz.h>
-#else
-
-static inline int find_first_zero_bit(const unsigned long *vaddr,
- unsigned size)
-{
- const unsigned long *p = vaddr;
- int res = 32;
- unsigned int words;
- unsigned long num;
-
- if (!size)
- return 0;
-
- words = (size + 31) >> 5;
- while (!(num = ~*p++)) {
- if (!--words)
- goto out;
- }
-
- __asm__ __volatile__ ("bfffo %1{#0,#0},%0"
- : "=d" (res) : "d" (num & -num));
- res ^= 31;
-out:
- res += ((long)p - (long)vaddr - 4) * 8;
- return res < size ? res : size;
-}
-#define find_first_zero_bit find_first_zero_bit
-
-static inline int find_next_zero_bit(const unsigned long *vaddr, int size,
- int offset)
-{
- const unsigned long *p = vaddr + (offset >> 5);
- int bit = offset & 31UL, res;
-
- if (offset >= size)
- return size;
-
- if (bit) {
- unsigned long num = ~*p++ & (~0UL << bit);
- offset -= bit;
-
- /* Look for zero in first longword */
- __asm__ __volatile__ ("bfffo %1{#0,#0},%0"
- : "=d" (res) : "d" (num & -num));
- if (res < 32) {
- offset += res ^ 31;
- return offset < size ? offset : size;
- }
- offset += 32;
-
- if (offset >= size)
- return size;
- }
- /* No zero yet, search remaining full bytes for a zero */
- return offset + find_first_zero_bit(p, size - offset);
-}
-#define find_next_zero_bit find_next_zero_bit
-
-static inline int find_first_bit(const unsigned long *vaddr, unsigned size)
-{
- const unsigned long *p = vaddr;
- int res = 32;
- unsigned int words;
- unsigned long num;
-
- if (!size)
- return 0;
-
- words = (size + 31) >> 5;
- while (!(num = *p++)) {
- if (!--words)
- goto out;
- }
-
- __asm__ __volatile__ ("bfffo %1{#0,#0},%0"
- : "=d" (res) : "d" (num & -num));
- res ^= 31;
-out:
- res += ((long)p - (long)vaddr - 4) * 8;
- return res < size ? res : size;
-}
-#define find_first_bit find_first_bit
-
-static inline int find_next_bit(const unsigned long *vaddr, int size,
- int offset)
-{
- const unsigned long *p = vaddr + (offset >> 5);
- int bit = offset & 31UL, res;
-
- if (offset >= size)
- return size;
-
- if (bit) {
- unsigned long num = *p++ & (~0UL << bit);
- offset -= bit;
-
- /* Look for one in first longword */
- __asm__ __volatile__ ("bfffo %1{#0,#0},%0"
- : "=d" (res) : "d" (num & -num));
- if (res < 32) {
- offset += res ^ 31;
- return offset < size ? offset : size;
- }
- offset += 32;
-
- if (offset >= size)
- return size;
- }
- /* No one yet, search remaining full bytes for a one */
- return offset + find_first_bit(p, size - offset);
-}
-#define find_next_bit find_next_bit
-
-/*
- * ffz = Find First Zero in word. Undefined if no zero exists,
- * so code should check against ~0UL first..
- */
-static inline unsigned long ffz(unsigned long word)
-{
- int res;
-
- __asm__ __volatile__ ("bfffo %1{#0,#0},%0"
- : "=d" (res) : "d" (~word & -~word));
- return res ^ 31;
-}
-
-#endif
-
#ifdef __KERNEL__
-#if defined(CONFIG_CPU_HAS_NO_BITFIELDS)
-
-/*
- * The newer ColdFire family members support a "bitrev" instruction
- * and we can use that to implement a fast ffs. Older Coldfire parts,
- * and normal 68000 parts don't have anything special, so we use the
- * generic functions for those.
- */
-#if (defined(__mcfisaaplus__) || defined(__mcfisac__)) && \
- !defined(CONFIG_M68000)
-static inline unsigned long __ffs(unsigned long x)
-{
- __asm__ __volatile__ ("bitrev %0; ff1 %0"
- : "=d" (x)
- : "0" (x));
- return x;
-}
-
-static inline int ffs(int x)
-{
- if (!x)
- return 0;
- return __ffs(x) + 1;
-}
-
-#else
-#include <asm-generic/bitops/ffs.h>
-#include <asm-generic/bitops/__ffs.h>
-#endif
-
-#include <asm-generic/bitops/fls.h>
-#include <asm-generic/bitops/__fls.h>
-
-#else
-
-/*
- * ffs: find first bit set. This is defined the same way as
- * the libc and compiler builtin ffs routines, therefore
- * differs in spirit from the above ffz (man ffs).
- */
-static inline int ffs(int x)
-{
- int cnt;
-
- __asm__ ("bfffo %1{#0:#0},%0"
- : "=d" (cnt)
- : "dm" (x & -x));
- return 32 - cnt;
-}
-
-static inline unsigned long __ffs(unsigned long x)
-{
- return ffs(x) - 1;
-}
-
-/*
- * fls: find last bit set.
- */
-static inline int fls(unsigned int x)
-{
- int cnt;
-
- __asm__ ("bfffo %1{#0,#0},%0"
- : "=d" (cnt)
- : "dm" (x));
- return 32 - cnt;
-}
-
-static inline unsigned long __fls(unsigned long x)
-{
- return fls(x) - 1;
-}
-
-#endif
+#include <asm-generic/bitops/builtin-ffs.h>
+#include <asm-generic/bitops/builtin-__ffs.h>
+#include <asm-generic/bitops/builtin-fls.h>
+#include <asm-generic/bitops/builtin-__fls.h>
+#include <asm-generic/bitops/ffz.h>
/* Simple test-and-set bit locks */
#define test_and_set_bit_lock test_and_set_bit
--
2.43.0
On Mon, 5 Feb 2024, Vincent MAILHOL wrote:
> On Mon. 5 Feb. 2024 at 18:48, Finn Thain <[email protected]> wrote:
> > On Mon, 5 Feb 2024, Vincent MAILHOL wrote:
> >
> > If clang support is important then clang's builtins are important. So
> > why not improve those instead? That would resolve the issue in a
> > win-win.
>
> I am deeply sorry, but with all due respect, this request is unfair. I
> will not fix the compiler.
>
Absolutely. It is unfair. Just as your proposal was unfair to maintainers.
That patch would make a compiler deficiency into a maintenance burden.