2024-05-28 12:30:04

by Arnd Bergmann

[permalink] [raw]
Subject: [PATCH] arm64/io: add constant-argument check

From: Arnd Bergmann <[email protected]>

In some configurations __const_iowrite32_copy() does not get inlined
and gcc runs into the BUILD_BUG():

In file included from <command-line>:
In function '__const_memcpy_toio_aligned32',
inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:203:3,
inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:199:20:
include/linux/compiler_types.h:487:45: error: call to '__compiletime_assert_538' declared with attribute error: BUILD_BUG failed
487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:468:25: note: in definition of macro '__compiletime_assert'
468 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:487:9: note: in expansion of macro '_compiletime_assert'
487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
| ^~~~~~~~~~~~~~~~
arch/arm64/include/asm/io.h:193:17: note: in expansion of macro 'BUILD_BUG'
193 | BUILD_BUG();
| ^~~~~~~~~

Add a check to ensure that the argument is in fact a constant before
calling into __const_memcpy_toio_aligned32().

Fixes: ead79118dae6 ("arm64/io: Provide a WC friendly __iowriteXX_copy()")
Signed-off-by: Arnd Bergmann <[email protected]>
---
arch/arm64/include/asm/io.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 4ff0ae3f6d66..44913f227060 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -199,7 +199,8 @@ void __iowrite32_copy_full(void __iomem *to, const void *from, size_t count);
static inline void __const_iowrite32_copy(void __iomem *to, const void *from,
size_t count)
{
- if (count == 8 || count == 4 || count == 2 || count == 1) {
+ if (__builtin_constant_p(count) &&
+ (count == 8 || count == 4 || count == 2 || count == 1)) {
__const_memcpy_toio_aligned32(to, from, count);
dgh();
} else {
--
2.39.2



2024-05-28 15:30:47

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] arm64/io: add constant-argument check

On Tue, May 28, 2024, at 14:08, Arnd Bergmann wrote:

> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 4ff0ae3f6d66..44913f227060 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -199,7 +199,8 @@ void __iowrite32_copy_full(void __iomem *to, const
> void *from, size_t count);
> static inline void __const_iowrite32_copy(void __iomem *to, const void
> *from,
> size_t count)
> {
> - if (count == 8 || count == 4 || count == 2 || count == 1) {
> + if (__builtin_constant_p(count) &&
> + (count == 8 || count == 4 || count == 2 || count == 1)) {
> __const_memcpy_toio_aligned32(to, from, count);
> dgh();
> } else {

I ran into the same issue on __const_iowrite64_copy()
now, will send an updated patch. I also noticed that
the __iowrite32_copy() and __iowrite64_copy() look
redundant after my change, so I wonder if we should
just remove those and rename __const_iowrite32_copy()
and __const_iowrite64_copy() accordingly.

Arnd

2024-05-29 11:14:43

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH] arm64/io: add constant-argument check

On Tue, May 28, 2024 at 02:08:38PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann <[email protected]>
>
> In some configurations __const_iowrite32_copy() does not get inlined
> and gcc runs into the BUILD_BUG():
>
> In file included from <command-line>:
> In function '__const_memcpy_toio_aligned32',
> inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:203:3,
> inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:199:20:
> include/linux/compiler_types.h:487:45: error: call to '__compiletime_assert_538' declared with attribute error: BUILD_BUG failed
> 487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> | ^
> include/linux/compiler_types.h:468:25: note: in definition of macro '__compiletime_assert'
> 468 | prefix ## suffix(); \
> | ^~~~~~
> include/linux/compiler_types.h:487:9: note: in expansion of macro '_compiletime_assert'
> 487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> | ^~~~~~~~~~~~~~~~~~~
> include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
> 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> | ^~~~~~~~~~~~~~~~~~
> include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
> 59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
> | ^~~~~~~~~~~~~~~~
> arch/arm64/include/asm/io.h:193:17: note: in expansion of macro 'BUILD_BUG'
> 193 | BUILD_BUG();
> | ^~~~~~~~~
>
> Add a check to ensure that the argument is in fact a constant before
> calling into __const_memcpy_toio_aligned32().
>
> Fixes: ead79118dae6 ("arm64/io: Provide a WC friendly __iowriteXX_copy()")
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
> arch/arm64/include/asm/io.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 4ff0ae3f6d66..44913f227060 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -199,7 +199,8 @@ void __iowrite32_copy_full(void __iomem *to, const void *from, size_t count);
> static inline void __const_iowrite32_copy(void __iomem *to, const void *from,
> size_t count)
> {
> - if (count == 8 || count == 4 || count == 2 || count == 1) {
> + if (__builtin_constant_p(count) &&
> + (count == 8 || count == 4 || count == 2 || count == 1)) {
> __const_memcpy_toio_aligned32(to, from, count);
> dgh();
> } else {

I don't think this is the right fix.

The idea was that this was checked in __iowrite32_copy(), which does:

#define __iowrite32_copy(to, from, count) \
(__builtin_constant_p(count) ? \
__const_iowrite32_copy(to, from, count) : \
__iowrite32_copy_full(to, from, count))

.. and so __const_iowrite32_copy() should really be marked as __always_inline,
and the same for __const_memcpy_toio_aligned32(), to guarantee that both get
inlined and such that __const_memcpy_toio_aligned32() sees a constant.

The same reasoning applies to __const_iowrite64_copy() and
__const_memcpy_toio_aligned64().

Checking for a constant in __const_iowrite32_copy() doesn't guarantee
that __const_memcpy_toio_aligned32() is inlined and will actually see a
constant.

Does diff the below you for you?

Mark.

---->8----
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 4ff0ae3f6d669..f4350aae92d5d 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -153,8 +153,9 @@ extern void __memset_io(volatile void __iomem *, int, size_t);
* emit the large TLP from the CPU.
*/

-static inline void __const_memcpy_toio_aligned32(volatile u32 __iomem *to,
- const u32 *from, size_t count)
+static __always_inline void
+__const_memcpy_toio_aligned32(volatile u32 __iomem *to, const u32 *from,
+ size_t count)
{
switch (count) {
case 8:
@@ -196,8 +197,8 @@ static inline void __const_memcpy_toio_aligned32(volatile u32 __iomem *to,

void __iowrite32_copy_full(void __iomem *to, const void *from, size_t count);

-static inline void __const_iowrite32_copy(void __iomem *to, const void *from,
- size_t count)
+static __always_inline void
+__const_iowrite32_copy(void __iomem *to, const void *from, size_t count)
{
if (count == 8 || count == 4 || count == 2 || count == 1) {
__const_memcpy_toio_aligned32(to, from, count);
@@ -212,8 +213,9 @@ static inline void __const_iowrite32_copy(void __iomem *to, const void *from,
__const_iowrite32_copy(to, from, count) : \
__iowrite32_copy_full(to, from, count))

-static inline void __const_memcpy_toio_aligned64(volatile u64 __iomem *to,
- const u64 *from, size_t count)
+static __always_inline void
+__const_memcpy_toio_aligned64(volatile u64 __iomem *to, const u64 *from,
+ size_t count)
{
switch (count) {
case 8:
@@ -255,8 +257,8 @@ static inline void __const_memcpy_toio_aligned64(volatile u64 __iomem *to,

void __iowrite64_copy_full(void __iomem *to, const void *from, size_t count);

-static inline void __const_iowrite64_copy(void __iomem *to, const void *from,
- size_t count)
+static __always_inline void
+__const_iowrite64_copy(void __iomem *to, const void *from, size_t count)
{
if (count == 8 || count == 4 || count == 2 || count == 1) {
__const_memcpy_toio_aligned64(to, from, count);


2024-05-29 12:30:09

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] arm64/io: add constant-argument check

On Wed, May 29, 2024, at 13:14, Mark Rutland wrote:
>> {
>> - if (count == 8 || count == 4 || count == 2 || count == 1) {
>> + if (__builtin_constant_p(count) &&
>> + (count == 8 || count == 4 || count == 2 || count == 1)) {
>> __const_memcpy_toio_aligned32(to, from, count);
>> dgh();
>> } else {
>
> I don't think this is the right fix.
>
> The idea was that this was checked in __iowrite32_copy(), which does:
>
> #define __iowrite32_copy(to, from, count) \
> (__builtin_constant_p(count) ? \
> __const_iowrite32_copy(to, from, count) : \
> __iowrite32_copy_full(to, from, count))
>
> ... and so __const_iowrite32_copy() should really be marked as __always_inline,
> and the same for __const_memcpy_toio_aligned32(), to guarantee that both get
> inlined and such that __const_memcpy_toio_aligned32() sees a constant.
>
> The same reasoning applies to __const_iowrite64_copy() and
> __const_memcpy_toio_aligned64().
>
> Checking for a constant in __const_iowrite32_copy() doesn't guarantee
> that __const_memcpy_toio_aligned32() is inlined and will actually see a
> constant.
>
> Does diff the below you for you?

Yes, your version addresses both failures I ran into, and
I think all other theoretical cases.

I would prefer to combine both though, using __always_inline
to force the compiler to pick the inline version over
__iowrite32_copy_full() even when it is optimizing for size
and it decides the inline version is larger, but removing
the extra complexity from the macro.

According to Jason, he used a macro here to be sure
that the compiler can detect an inline function argument
as constant when the value is known but it is not
a constant value according to the C standard.

This was indeed a problem in older versions of clang
that missed a lot of optimizations in the kernel, but
clang-8 and higher were changed to have the same behavior
as gcc here, so it is no longer necessary now that the
older versions are unable to build kernels.

Arnd

2024-05-29 15:09:41

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH] arm64/io: add constant-argument check

On Wed, May 29, 2024 at 02:29:37PM +0200, Arnd Bergmann wrote:
> On Wed, May 29, 2024, at 13:14, Mark Rutland wrote:
> >> {
> >> - if (count == 8 || count == 4 || count == 2 || count == 1) {
> >> + if (__builtin_constant_p(count) &&
> >> + (count == 8 || count == 4 || count == 2 || count == 1)) {
> >> __const_memcpy_toio_aligned32(to, from, count);
> >> dgh();
> >> } else {
> >
> > I don't think this is the right fix.
> >
> > The idea was that this was checked in __iowrite32_copy(), which does:
> >
> > #define __iowrite32_copy(to, from, count) \
> > (__builtin_constant_p(count) ? \
> > __const_iowrite32_copy(to, from, count) : \
> > __iowrite32_copy_full(to, from, count))
> >
> > ... and so __const_iowrite32_copy() should really be marked as __always_inline,
> > and the same for __const_memcpy_toio_aligned32(), to guarantee that both get
> > inlined and such that __const_memcpy_toio_aligned32() sees a constant.
> >
> > The same reasoning applies to __const_iowrite64_copy() and
> > __const_memcpy_toio_aligned64().
> >
> > Checking for a constant in __const_iowrite32_copy() doesn't guarantee
> > that __const_memcpy_toio_aligned32() is inlined and will actually see a
> > constant.
> >
> > Does diff the below you for you?
>
> Yes, your version addresses both failures I ran into, and
> I think all other theoretical cases.
>
> I would prefer to combine both though, using __always_inline
> to force the compiler to pick the inline version over
> __iowrite32_copy_full() even when it is optimizing for size
> and it decides the inline version is larger, but removing
> the extra complexity from the macro.

Sorry, I'm not sure what you mean here. I don't see anything handling
optimizing for size today so I'm not sure what change your suggesting to
force the use of the inline version; AFAICT that'd always be forced for
a suitable constant size.

What change are you suggesting?

> According to Jason, he used a macro here to be sure
> that the compiler can detect an inline function argument
> as constant when the value is known but it is not
> a constant value according to the C standard.
>
> This was indeed a problem in older versions of clang
> that missed a lot of optimizations in the kernel, but
> clang-8 and higher were changed to have the same behavior
> as gcc here, so it is no longer necessary now that the
> older versions are unable to build kernels.

Getting rid of the macro is fine by me.

Mark.

2024-05-29 16:16:30

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] arm64/io: add constant-argument check

On Wed, May 29, 2024, at 17:08, Mark Rutland wrote:
> On Wed, May 29, 2024 at 02:29:37PM +0200, Arnd Bergmann wrote:
>> On Wed, May 29, 2024, at 13:14, Mark Rutland wrote:

>>
>> Yes, your version addresses both failures I ran into, and
>> I think all other theoretical cases.
>>
>> I would prefer to combine both though, using __always_inline
>> to force the compiler to pick the inline version over
>> __iowrite32_copy_full() even when it is optimizing for size
>> and it decides the inline version is larger, but removing
>> the extra complexity from the macro.
>
> Sorry, I'm not sure what you mean here. I don't see anything handling
> optimizing for size today so I'm not sure what change your suggesting to
> force the use of the inline version; AFAICT that'd always be forced for
> a suitable constant size.
>
> What change are you suggesting?

What I meant is that reason gcc chooses to not inline
the macro is when we build with CONFIG_CC_OPTIMIZE_FOR_SIZE.

Since it doesn't know that __const_memcpy_toio_aligned64()
is intended to be small after inlining, it sometimes
decides against it, which (with just my patch) would
fall back to the out-of-line __iowrite32_copy_full()
while trying to generate smaller code.

The __always_inline annotation just overrides the
calculation.

Arnd

2024-05-29 16:40:48

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH] arm64/io: add constant-argument check

On Wed, May 29, 2024 at 06:15:57PM +0200, Arnd Bergmann wrote:
> On Wed, May 29, 2024, at 17:08, Mark Rutland wrote:
> > On Wed, May 29, 2024 at 02:29:37PM +0200, Arnd Bergmann wrote:
> >> On Wed, May 29, 2024, at 13:14, Mark Rutland wrote:
>
> >>
> >> Yes, your version addresses both failures I ran into, and
> >> I think all other theoretical cases.
> >>
> >> I would prefer to combine both though, using __always_inline
> >> to force the compiler to pick the inline version over
> >> __iowrite32_copy_full() even when it is optimizing for size
> >> and it decides the inline version is larger, but removing
> >> the extra complexity from the macro.
> >
> > Sorry, I'm not sure what you mean here. I don't see anything handling
> > optimizing for size today so I'm not sure what change your suggesting to
> > force the use of the inline version; AFAICT that'd always be forced for
> > a suitable constant size.
> >
> > What change are you suggesting?
>
> What I meant is that reason gcc chooses to not inline
> the macro is when we build with CONFIG_CC_OPTIMIZE_FOR_SIZE.
>
> Since it doesn't know that __const_memcpy_toio_aligned64()
> is intended to be small after inlining, it sometimes
> decides against it, which (with just my patch) would
> fall back to the out-of-line __iowrite32_copy_full()
> while trying to generate smaller code.
>
> The __always_inline annotation just overrides the
> calculation.

Ah, ok.

I think what you're suggesting is:

* Add the __always_inline annotations, as in my patch.

* Move the __builtin_constant_p check into __const_iowrite32_copy(), as in your
patch.

* Remove the __iowrite32_copy() macro and rename __const_iowrite32_copy() to
__iowrite32_copy(), removing the redundant logic.

Assuming so, that makes total sense to me.

Mark.