2023-02-27 16:48:04

by Uros Bizjak

[permalink] [raw]
Subject: [PATCH] x86/ACPI/boot: Improve __acpi_acquire_global_lock

Improve __acpi_acquire_global_lock by using a temporary variable.
This enables compiler to perform if-conversion and improves generated
code from:

...
72a: d1 ea shr %edx
72c: 83 e1 fc and $0xfffffffc,%ecx
72f: 83 e2 01 and $0x1,%edx
732: 09 ca or %ecx,%edx
734: 83 c2 02 add $0x2,%edx
737: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
73b: 75 e9 jne 726 <__acpi_acquire_global_lock+0x6>
73d: 83 e2 03 and $0x3,%edx
740: 31 c0 xor %eax,%eax
742: 83 fa 03 cmp $0x3,%edx
745: 0f 95 c0 setne %al
748: f7 d8 neg %eax

to:

...
72a: d1 e9 shr %ecx
72c: 83 e2 fc and $0xfffffffc,%edx
72f: 83 e1 01 and $0x1,%ecx
732: 09 ca or %ecx,%edx
734: 83 c2 02 add $0x2,%edx
737: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
73b: 75 e9 jne 726 <__acpi_acquire_global_lock+0x6>
73d: 8d 41 ff lea -0x1(%rcx),%eax

BTW: the compiler could generate:

lea 0x2(%rcx,%rdx,1),%edx

instead of:

or %ecx,%edx
add $0x2,%edx

but unwated conversion from add to or when bits are known to be zero
prevents this improvement. This is GCC PR108477.

No functional change intended.

Signed-off-by: Uros Bizjak <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
---
arch/x86/kernel/acpi/boot.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 1c38174b5f01..577403c69983 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1853,13 +1853,14 @@ early_param("acpi_sci", setup_acpi_sci);

int __acpi_acquire_global_lock(unsigned int *lock)
{
- unsigned int old, new;
+ unsigned int old, new, val;

old = READ_ONCE(*lock);
do {
- new = (((old & ~0x3) + 2) + ((old >> 1) & 0x1));
+ val = (old >> 1) & 0x1;
+ new = (old & ~0x3) + 2 + val;
} while (!try_cmpxchg(lock, &old, new));
- return ((new & 0x3) < 3) ? -1 : 0;
+ return val ? 0 : -1;
}

int __acpi_release_global_lock(unsigned int *lock)
--
2.39.2



2023-03-20 17:19:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] x86/ACPI/boot: Improve __acpi_acquire_global_lock

On Mon, Feb 27, 2023 at 5:48 PM Uros Bizjak <[email protected]> wrote:
>
> Improve __acpi_acquire_global_lock by using a temporary variable.
> This enables compiler to perform if-conversion and improves generated
> code from:
>
> ...
> 72a: d1 ea shr %edx
> 72c: 83 e1 fc and $0xfffffffc,%ecx
> 72f: 83 e2 01 and $0x1,%edx
> 732: 09 ca or %ecx,%edx
> 734: 83 c2 02 add $0x2,%edx
> 737: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
> 73b: 75 e9 jne 726 <__acpi_acquire_global_lock+0x6>
> 73d: 83 e2 03 and $0x3,%edx
> 740: 31 c0 xor %eax,%eax
> 742: 83 fa 03 cmp $0x3,%edx
> 745: 0f 95 c0 setne %al
> 748: f7 d8 neg %eax
>
> to:
>
> ...
> 72a: d1 e9 shr %ecx
> 72c: 83 e2 fc and $0xfffffffc,%edx
> 72f: 83 e1 01 and $0x1,%ecx
> 732: 09 ca or %ecx,%edx
> 734: 83 c2 02 add $0x2,%edx
> 737: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
> 73b: 75 e9 jne 726 <__acpi_acquire_global_lock+0x6>
> 73d: 8d 41 ff lea -0x1(%rcx),%eax
>
> BTW: the compiler could generate:
>
> lea 0x2(%rcx,%rdx,1),%edx
>
> instead of:
>
> or %ecx,%edx
> add $0x2,%edx
>
> but unwated conversion from add to or when bits are known to be zero
> prevents this improvement. This is GCC PR108477.
>
> No functional change intended.
>
> Signed-off-by: Uros Bizjak <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> ---
> arch/x86/kernel/acpi/boot.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 1c38174b5f01..577403c69983 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -1853,13 +1853,14 @@ early_param("acpi_sci", setup_acpi_sci);
>
> int __acpi_acquire_global_lock(unsigned int *lock)
> {
> - unsigned int old, new;
> + unsigned int old, new, val;
>
> old = READ_ONCE(*lock);
> do {
> - new = (((old & ~0x3) + 2) + ((old >> 1) & 0x1));
> + val = (old >> 1) & 0x1;
> + new = (old & ~0x3) + 2 + val;
> } while (!try_cmpxchg(lock, &old, new));
> - return ((new & 0x3) < 3) ? -1 : 0;
> + return val ? 0 : -1;

I would prefer

if (val)
return -1;

return 0;

> }
>
> int __acpi_release_global_lock(unsigned int *lock)
> --

Otherwise I agree that this change would be an improvement.

>