2020-04-17 14:52:57

by Alex Shi

Subject: [PATCH 2/2] locking/rtmutex: optimize rt_mutex_cmpxchgs

Check l->owner first so that the costly cmpxchg is skipped when it would fail anyway.

Suggested-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Alex Shi <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
---
kernel/locking/rtmutex.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index cfdd5b93264d..232727a4a220 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -141,8 +141,10 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
* set up.
*/
#ifndef CONFIG_DEBUG_RT_MUTEXES
-# define rt_mutex_cmpxchg_acquire(l,c,n) (cmpxchg_acquire(&l->owner, c, n) == c)
-# define rt_mutex_cmpxchg_release(l,c,n) (cmpxchg_release(&l->owner, c, n) == c)
+# define rt_mutex_cmpxchg_acquire(l, c, n) \
+ (l->owner == c && cmpxchg_acquire(&l->owner, c, n) == c)
+# define rt_mutex_cmpxchg_release(l, c, n) \
+ (l->owner == c && cmpxchg_release(&l->owner, c, n) == c)

/*
* Callers must hold the ->wait_lock -- which is the whole purpose as we force
--
1.8.3.1


2020-04-26 16:37:23

by Thomas Gleixner

Subject: Re: [PATCH 2/2] locking/rtmutex: optimize rt_mutex_cmpxchgs

Alex Shi <[email protected]> writes:

> Check l->owner first so that the costly cmpxchg is skipped when it would fail anyway.

I don't see what that buys.

It actually adds an extra conditional in the non-contended case, which
is the case we are optimizing for.

In the contended case, i.e. when l->owner != c, the cmpxchg cost is
completely irrelevant compared to the slowpath costs.

> #ifndef CONFIG_DEBUG_RT_MUTEXES
> -# define rt_mutex_cmpxchg_acquire(l,c,n) (cmpxchg_acquire(&l->owner, c, n) == c)
> -# define rt_mutex_cmpxchg_release(l,c,n) (cmpxchg_release(&l->owner, c, n) == c)
> +# define rt_mutex_cmpxchg_acquire(l, c, n) \
> + (l->owner == c && cmpxchg_acquire(&l->owner, c, n) == c)
> +# define rt_mutex_cmpxchg_release(l, c, n) \
> + (l->owner == c && cmpxchg_release(&l->owner, c, n) == c)

This kind of micro-optimization is more than dubious, especially without
numbers backing up the benefit.
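
For readers who want to see the two fast-path forms side by side, here is a
minimal user-space sketch using C11 atomics. It is not the kernel code:
struct toy_lock, toy_cmpxchg_acquire() and toy_check_then_cmpxchg_acquire()
are hypothetical names invented for illustration, and the C11 calls only
approximate the kernel's cmpxchg_acquire(). The point is that the second form
pays for an extra load and branch even when the lock is free, and only saves
the atomic operation when the owner word already differs from the expected
value.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rt_mutex; only the owner word is modelled. */
struct toy_lock {
	_Atomic(void *) owner;
};

/* Form before the patch: always issue the compare-and-swap. */
static bool toy_cmpxchg_acquire(struct toy_lock *l, void *c, void *n)
{
	void *expected = c;

	/* Succeeds only if owner == c; acquire ordering on success. */
	return atomic_compare_exchange_strong_explicit(&l->owner, &expected, n,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Form after the patch: plain load first, compare-and-swap only on a match. */
static bool toy_check_then_cmpxchg_acquire(struct toy_lock *l, void *c, void *n)
{
	/* Extra load and branch, taken on every call including the uncontended one. */
	if (atomic_load_explicit(&l->owner, memory_order_relaxed) != c)
		return false;

	return toy_cmpxchg_acquire(l, c, n);
}
```

Both functions return the same result for any given owner value; they differ
only in which instructions run on the way there, which is why measurements
would be needed to show a benefit either way.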

Thanks,

tglx