2024-01-24 10:59:25

by Uros Bizjak

Subject: [PATCH] x86/asm: Implement local_xchg using CMPXCHG without lock prefix

Implement local_xchg() using the CMPXCHG instruction without the LOCK prefix.
XCHG is expensive due to its implied LOCK prefix. The processor
cannot prefetch cachelines if XCHG is used.

Signed-off-by: Uros Bizjak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
---
arch/x86/include/asm/local.h | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 73dba8b94443..f9af6908aa2f 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -131,8 +131,20 @@ static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
(typeof(l->a.counter) *) old, new);
}

-/* Always has a lock prefix */
-#define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))
+/*
+ * Implement local_xchg using CMPXCHG instruction without lock prefix.
+ * XCHG is expensive due to the implied lock prefix. The processor
+ * cannot prefetch cachelines if XCHG is used.
+ */
+static __always_inline long
+local_xchg(local_t *l, long n)
+{
+	long c = local_read(l);
+
+	do { } while (!local_try_cmpxchg(l, &c, n));
+
+	return c;
+}

/**
* local_add_unless - add unless the number is already a given value
--
2.31.1



2024-03-01 12:39:04

by tip-bot2 for Jacob Pan

Subject: [tip: locking/core] locking/x86: Implement local_xchg() using CMPXCHG without the LOCK prefix

The following commit has been merged into the locking/core branch of tip:

Commit-ID: e807c2a37044a51de89d6d4f8a1f5ecfb3752f36
Gitweb: https://git.kernel.org/tip/e807c2a37044a51de89d6d4f8a1f5ecfb3752f36
Author: Uros Bizjak <[email protected]>
AuthorDate: Wed, 24 Jan 2024 11:58:16 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 01 Mar 2024 12:54:25 +01:00

locking/x86: Implement local_xchg() using CMPXCHG without the LOCK prefix

Implement local_xchg() using the CMPXCHG instruction without the LOCK prefix.
XCHG is expensive due to the implied LOCK prefix. The processor
cannot prefetch cachelines if XCHG is used.

Signed-off-by: Uros Bizjak <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/local.h | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/local.h b/arch/x86/include/asm/local.h
index 73dba8b..59aa966 100644
--- a/arch/x86/include/asm/local.h
+++ b/arch/x86/include/asm/local.h
@@ -131,8 +131,20 @@ static inline bool local_try_cmpxchg(local_t *l, long *old, long new)
(typeof(l->a.counter) *) old, new);
}

-/* Always has a lock prefix */
-#define local_xchg(l, n) (xchg(&((l)->a.counter), (n)))
+/*
+ * Implement local_xchg using CMPXCHG instruction without the LOCK prefix.
+ * XCHG is expensive due to the implied LOCK prefix. The processor
+ * cannot prefetch cachelines if XCHG is used.
+ */
+static __always_inline long
+local_xchg(local_t *l, long n)
+{
+	long c = local_read(l);
+
+	do { } while (!local_try_cmpxchg(l, &c, n));
+
+	return c;
+}

/**
* local_add_unless - add unless the number is already a given value