From: Arnd Bergmann <[email protected]>
In some configurations __const_iowrite32_copy() does not get inlined
and gcc runs into the BUILD_BUG():
In file included from <command-line>:
In function '__const_memcpy_toio_aligned32',
inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:203:3,
inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:199:20:
include/linux/compiler_types.h:487:45: error: call to '__compiletime_assert_538' declared with attribute error: BUILD_BUG failed
487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:468:25: note: in definition of macro '__compiletime_assert'
468 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:487:9: note: in expansion of macro '_compiletime_assert'
487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
| ^~~~~~~~~~~~~~~~
arch/arm64/include/asm/io.h:193:17: note: in expansion of macro 'BUILD_BUG'
193 | BUILD_BUG();
| ^~~~~~~~~
Add a check to ensure that the argument is in fact a constant before
calling into __const_memcpy_toio_aligned32().
Fixes: ead79118dae6 ("arm64/io: Provide a WC friendly __iowriteXX_copy()")
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm64/include/asm/io.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 4ff0ae3f6d66..44913f227060 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -199,7 +199,8 @@ void __iowrite32_copy_full(void __iomem *to, const void *from, size_t count);
static inline void __const_iowrite32_copy(void __iomem *to, const void *from,
size_t count)
{
- if (count == 8 || count == 4 || count == 2 || count == 1) {
+ if (__builtin_constant_p(count) &&
+ (count == 8 || count == 4 || count == 2 || count == 1)) {
__const_memcpy_toio_aligned32(to, from, count);
dgh();
} else {
--
2.39.2
On Tue, May 28, 2024, at 14:08, Arnd Bergmann wrote:
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 4ff0ae3f6d66..44913f227060 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -199,7 +199,8 @@ void __iowrite32_copy_full(void __iomem *to, const
> void *from, size_t count);
> static inline void __const_iowrite32_copy(void __iomem *to, const void
> *from,
> size_t count)
> {
> - if (count == 8 || count == 4 || count == 2 || count == 1) {
> + if (__builtin_constant_p(count) &&
> + (count == 8 || count == 4 || count == 2 || count == 1)) {
> __const_memcpy_toio_aligned32(to, from, count);
> dgh();
> } else {
I ran into the same issue on __const_iowrite64_copy()
now, will send an updated patch. I also noticed that
the __iowrite32_copy() and __iowrite64_copy() look
redundant after my change, so I wonder if we should
just remove those and rename __const_iowrite32_copy()
and __const_iowrite64_copy() accordingly.
Arnd
On Tue, May 28, 2024 at 02:08:38PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <[email protected]>
>
> In some configurations __const_iowrite32_copy() does not get inlined
> and gcc runs into the BUILD_BUG():
>
> In file included from <command-line>:
> In function '__const_memcpy_toio_aligned32',
> inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:203:3,
> inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:199:20:
> include/linux/compiler_types.h:487:45: error: call to '__compiletime_assert_538' declared with attribute error: BUILD_BUG failed
> 487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> | ^
> include/linux/compiler_types.h:468:25: note: in definition of macro '__compiletime_assert'
> 468 | prefix ## suffix(); \
> | ^~~~~~
> include/linux/compiler_types.h:487:9: note: in expansion of macro '_compiletime_assert'
> 487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> | ^~~~~~~~~~~~~~~~~~~
> include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
> 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> | ^~~~~~~~~~~~~~~~~~
> include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
> 59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> | ^~~~~~~~~~~~~~~~
> arch/arm64/include/asm/io.h:193:17: note: in expansion of macro 'BUILD_BUG'
> 193 | BUILD_BUG();
> | ^~~~~~~~~
>
> Add a check to ensure that the argument is in fact a constant before
> calling into __const_memcpy_toio_aligned32().
>
> Fixes: ead79118dae6 ("arm64/io: Provide a WC friendly __iowriteXX_copy()")
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> arch/arm64/include/asm/io.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 4ff0ae3f6d66..44913f227060 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -199,7 +199,8 @@ void __iowrite32_copy_full(void __iomem *to, const void *from, size_t count);
> static inline void __const_iowrite32_copy(void __iomem *to, const void *from,
> size_t count)
> {
> - if (count == 8 || count == 4 || count == 2 || count == 1) {
> + if (__builtin_constant_p(count) &&
> + (count == 8 || count == 4 || count == 2 || count == 1)) {
> __const_memcpy_toio_aligned32(to, from, count);
> dgh();
> } else {
I don't think this is the right fix.
The idea was that this was checked in __iowrite32_copy(), which does:
#define __iowrite32_copy(to, from, count) \
(__builtin_constant_p(count) ? \
__const_iowrite32_copy(to, from, count) : \
__iowrite32_copy_full(to, from, count))
.. and so __const_iowrite32_copy() should really be marked as __always_inline,
and the same for __const_memcpy_toio_aligned32(), to guarantee that both get
inlined and such that __const_memcpy_toio_aligned32() sees a constant.
The same reasoning applies to __const_iowrite64_copy() and
__const_memcpy_toio_aligned64().
Checking for a constant in __const_iowrite32_copy() doesn't guarantee
that __const_memcpy_toio_aligned32() is inlined and will actually see a
constant.
Does diff the below you for you?
Mark.
---->8----
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 4ff0ae3f6d669..f4350aae92d5d 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -153,8 +153,9 @@ extern void __memset_io(volatile void __iomem *, int, size_t);
* emit the large TLP from the CPU.
*/
-static inline void __const_memcpy_toio_aligned32(volatile u32 __iomem *to,
- const u32 *from, size_t count)
+static __always_inline void
+__const_memcpy_toio_aligned32(volatile u32 __iomem *to, const u32 *from,
+ size_t count)
{
switch (count) {
case 8:
@@ -196,8 +197,8 @@ static inline void __const_memcpy_toio_aligned32(volatile u32 __iomem *to,
void __iowrite32_copy_full(void __iomem *to, const void *from, size_t count);
-static inline void __const_iowrite32_copy(void __iomem *to, const void *from,
- size_t count)
+static __always_inline void
+__const_iowrite32_copy(void __iomem *to, const void *from, size_t count)
{
if (count == 8 || count == 4 || count == 2 || count == 1) {
__const_memcpy_toio_aligned32(to, from, count);
@@ -212,8 +213,9 @@ static inline void __const_iowrite32_copy(void __iomem *to, const void *from,
__const_iowrite32_copy(to, from, count) : \
__iowrite32_copy_full(to, from, count))
-static inline void __const_memcpy_toio_aligned64(volatile u64 __iomem *to,
- const u64 *from, size_t count)
+static __always_inline void
+__const_memcpy_toio_aligned64(volatile u64 __iomem *to, const u64 *from,
+ size_t count)
{
switch (count) {
case 8:
@@ -255,8 +257,8 @@ static inline void __const_memcpy_toio_aligned64(volatile u64 __iomem *to,
void __iowrite64_copy_full(void __iomem *to, const void *from, size_t count);
-static inline void __const_iowrite64_copy(void __iomem *to, const void *from,
- size_t count)
+static __always_inline void
+__const_iowrite64_copy(void __iomem *to, const void *from, size_t count)
{
if (count == 8 || count == 4 || count == 2 || count == 1) {
__const_memcpy_toio_aligned64(to, from, count);
On Wed, May 29, 2024, at 13:14, Mark Rutland wrote:
>> {
>> - if (count == 8 || count == 4 || count == 2 || count == 1) {
>> + if (__builtin_constant_p(count) &&
>> + (count == 8 || count == 4 || count == 2 || count == 1)) {
>> __const_memcpy_toio_aligned32(to, from, count);
>> dgh();
>> } else {
>
> I don't think this is the right fix.
>
> The idea was that this was checked in __iowrite32_copy(), which does:
>
> #define __iowrite32_copy(to, from, count) \
> (__builtin_constant_p(count) ? \
> __const_iowrite32_copy(to, from, count) : \
> __iowrite32_copy_full(to, from, count))
>
> ... and so __const_iowrite32_copy() should really be marked as __always_inline,
> and the same for __const_memcpy_toio_aligned32(), to guarantee that both get
> inlined and such that __const_memcpy_toio_aligned32() sees a constant.
>
> The same reasoning applies to __const_iowrite64_copy() and
> __const_memcpy_toio_aligned64().
>
> Checking for a constant in __const_iowrite32_copy() doesn't guarantee
> that __const_memcpy_toio_aligned32() is inlined and will actually see a
> constant.
>
> Does diff the below you for you?
Yes, your version addresses both failures I ran into, and
I think all other theoretical cases.
I would prefer to combine both though, using __always_inline
to force the compiler to pick the inline version over
__iowrite32_copy_full() even when it is optimizing for size
and it decides the inline version is larger, but removing
the extra complexity from the macro.
According to Jason, he used a macro here to be sure
that the compiler can detect an inline function argument
as constant when the value is known but it is not
a constant value according to the C standard.
This was indeed a problem in older versions of clang
that missed a lot of optimizations in the kernel, but
clang-8 and higher were changed to have the same behavior
as gcc here, so it is no longer necessary now that the
older versions are unable to build kernels.
Arnd
On Wed, May 29, 2024 at 02:29:37PM +0200, Arnd Bergmann wrote:
> On Wed, May 29, 2024, at 13:14, Mark Rutland wrote:
> >> {
> >> - if (count == 8 || count == 4 || count == 2 || count == 1) {
> >> + if (__builtin_constant_p(count) &&
> >> + (count == 8 || count == 4 || count == 2 || count == 1)) {
> >> __const_memcpy_toio_aligned32(to, from, count);
> >> dgh();
> >> } else {
> >
> > I don't think this is the right fix.
> >
> > The idea was that this was checked in __iowrite32_copy(), which does:
> >
> > #define __iowrite32_copy(to, from, count) \
> > (__builtin_constant_p(count) ? \
> > __const_iowrite32_copy(to, from, count) : \
> > __iowrite32_copy_full(to, from, count))
> >
> > ... and so __const_iowrite32_copy() should really be marked as __always_inline,
> > and the same for __const_memcpy_toio_aligned32(), to guarantee that both get
> > inlined and such that __const_memcpy_toio_aligned32() sees a constant.
> >
> > The same reasoning applies to __const_iowrite64_copy() and
> > __const_memcpy_toio_aligned64().
> >
> > Checking for a constant in __const_iowrite32_copy() doesn't guarantee
> > that __const_memcpy_toio_aligned32() is inlined and will actually see a
> > constant.
> >
> > Does diff the below you for you?
>
> Yes, your version addresses both failures I ran into, and
> I think all other theoretical cases.
>
> I would prefer to combine both though, using __always_inline
> to force the compiler to pick the inline version over
> __iowrite32_copy_full() even when it is optimizing for size
> and it decides the inline version is larger, but removing
> the extra complexity from the macro.
Sorry, I'm not sure what you mean here. I don't see anything handling
optimizing for size today so I'm not sure what change your suggesting to
force the use of the inline version; AFAICT that'd always be forced for
a suitable constant size.
What change are you suggesting?
> According to Jason, he used a macro here to be sure
> that the compiler can detect an inline function argument
> as constant when the value is known but it is not
> a constant value according to the C standard.
>
> This was indeed a problem in older versions of clang
> that missed a lot of optimizations in the kernel, but
> clang-8 and higher were changed to have the same behavior
> as gcc here, so it is no longer necessary now that the
> older versions are unable to build kernels.
Getting rid of the macro is fine by me.
Mark.
On Wed, May 29, 2024, at 17:08, Mark Rutland wrote:
> On Wed, May 29, 2024 at 02:29:37PM +0200, Arnd Bergmann wrote:
>> On Wed, May 29, 2024, at 13:14, Mark Rutland wrote:
>>
>> Yes, your version addresses both failures I ran into, and
>> I think all other theoretical cases.
>>
>> I would prefer to combine both though, using __always_inline
>> to force the compiler to pick the inline version over
>> __iowrite32_copy_full() even when it is optimizing for size
>> and it decides the inline version is larger, but removing
>> the extra complexity from the macro.
>
> Sorry, I'm not sure what you mean here. I don't see anything handling
> optimizing for size today so I'm not sure what change your suggesting to
> force the use of the inline version; AFAICT that'd always be forced for
> a suitable constant size.
>
> What change are you suggesting?
What I meant is that reason gcc chooses to not inline
the macro is when we build with CONFIG_CC_OPTIMIZE_FOR_SIZE.
Since it doesn't know that __const_memcpy_toio_aligned64()
is intended to be small after inlining, it sometimes
decides against it, which (with just my patch) would
fall back to the out-of-line __iowrite32_copy_full()
while trying to generate smaller code.
The __always_inline annotation just overrides the
calculation.
Arnd
On Wed, May 29, 2024 at 06:15:57PM +0200, Arnd Bergmann wrote:
> On Wed, May 29, 2024, at 17:08, Mark Rutland wrote:
> > On Wed, May 29, 2024 at 02:29:37PM +0200, Arnd Bergmann wrote:
> >> On Wed, May 29, 2024, at 13:14, Mark Rutland wrote:
>
> >>
> >> Yes, your version addresses both failures I ran into, and
> >> I think all other theoretical cases.
> >>
> >> I would prefer to combine both though, using __always_inline
> >> to force the compiler to pick the inline version over
> >> __iowrite32_copy_full() even when it is optimizing for size
> >> and it decides the inline version is larger, but removing
> >> the extra complexity from the macro.
> >
> > Sorry, I'm not sure what you mean here. I don't see anything handling
> > optimizing for size today so I'm not sure what change your suggesting to
> > force the use of the inline version; AFAICT that'd always be forced for
> > a suitable constant size.
> >
> > What change are you suggesting?
>
> What I meant is that reason gcc chooses to not inline
> the macro is when we build with CONFIG_CC_OPTIMIZE_FOR_SIZE.
>
> Since it doesn't know that __const_memcpy_toio_aligned64()
> is intended to be small after inlining, it sometimes
> decides against it, which (with just my patch) would
> fall back to the out-of-line __iowrite32_copy_full()
> while trying to generate smaller code.
>
> The __always_inline annotation just overrides the
> calculation.
Ah, ok.
I think what you're suggesting is:
* Add the __always_inline annotations, as in my patch.
* Move the __builtin_constant_p check into __const_iowrite32_copy(), as in your
patch.
* Remove the __iowrite32_copy() macro and rename __const_iowrite32_copy() to
__iowrite32_copy(), removing the redundant logic.
Assuming so, that makes total sense to me.
Mark.