2021-09-20 11:33:43

by Davidlohr Bueso

Subject: [PATCH -tip 0/2] locking/rwbase: Two reader optimizations

Hi,

Patch 1 is a barrier optimization that came out of auditing the
ordering of the reader fastpath.

Patch 2 is a resend of the previous, broken patch that attempts
to use wake_q for the read_unlock() slowpath.

Tested on v5.15.y-rt. Applies against tip/urgent.

Thanks!

Davidlohr Bueso (2):
locking/rwbase: Optimize rwbase_read_trylock
locking/rwbase: Lockless reader waking up a writer

kernel/locking/rtmutex.c | 19 +++++++++++++------
kernel/locking/rwbase_rt.c | 11 +++++++----
2 files changed, 20 insertions(+), 10 deletions(-)

--
2.26.2


2021-09-20 11:43:56

by Davidlohr Bueso

Subject: [PATCH 1/2] locking/rwbase: Optimize rwbase_read_trylock

Instead of a full barrier around the RMW instruction, micro-optimize
for weakly ordered archs such that we only provide the required
ACQUIRE semantics when taking the read lock.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/rwbase_rt.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 88191f6e252c..a9034784a5a0 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -59,8 +59,7 @@ static __always_inline int rwbase_read_trylock(struct rwbase_rt *rwb)
 	 * set.
 	 */
 	for (r = atomic_read(&rwb->readers); r < 0;) {
-		/* Fully-ordered if cmpxchg() succeeds, provides ACQUIRE */
-		if (likely(atomic_try_cmpxchg(&rwb->readers, &r, r + 1)))
+		if (likely(atomic_try_cmpxchg_acquire(&rwb->readers, &r, r + 1)))
 			return 1;
 	}
 	return 0;
@@ -183,7 +182,7 @@ static inline void __rwbase_write_unlock(struct rwbase_rt *rwb, int bias,

 	/*
 	 * _release() is needed in case that reader is in fast path, pairing
-	 * with atomic_try_cmpxchg() in rwbase_read_trylock(), provides RELEASE
+	 * with atomic_try_cmpxchg_acquire() in rwbase_read_trylock().
 	 */
 	(void)atomic_add_return_release(READER_BIAS - bias, &rwb->readers);
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
--
2.26.2
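
The ACQUIRE/RELEASE pairing described in this patch can be illustrated outside the kernel with C11 atomics. This is only a minimal userspace sketch, not the kernel code: struct rwb_model, the model_*() helpers and MODEL_READER_BIAS are hypothetical names, the writer bias handling is reduced to "counter is 0 while write-locked", and C11 memory orders only approximate the kernel's memory model.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Assumption: stands in for the kernel's READER_BIAS. Any value that
 * makes the counter negative is enough for this model.
 */
#define MODEL_READER_BIAS INT_MIN

struct rwb_model {
	atomic_int readers;	/* negative while the reader fast path is open */
	int data;		/* plain data published by the writer */
};

/* Reader fast path: only ACQUIRE ordering is requested on success. */
static bool model_read_trylock(struct rwb_model *rwb)
{
	int r = atomic_load_explicit(&rwb->readers, memory_order_relaxed);

	while (r < 0) {
		/* On failure, r is reloaded with the current counter value. */
		if (atomic_compare_exchange_weak_explicit(&rwb->readers, &r, r + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

static void model_read_unlock(struct rwb_model *rwb)
{
	atomic_fetch_sub_explicit(&rwb->readers, 1, memory_order_release);
}

/* Writer unlock: the RELEASE pairs with the ACQUIRE in model_read_trylock(). */
static void model_write_unlock(struct rwb_model *rwb)
{
	/* In this model the counter is exactly 0 while write-locked. */
	assert(atomic_load_explicit(&rwb->readers, memory_order_relaxed) == 0);
	atomic_fetch_add_explicit(&rwb->readers, MODEL_READER_BIAS,
				  memory_order_release);
}
```

A reader that wins the compare-exchange is guaranteed to observe everything the writer stored before its release, which is all the read lock needs; stores the reader made before attempting the trylock are not ordered by it, and that is the ordering the full barrier additionally provided.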

2021-09-20 11:44:14

by Davidlohr Bueso

Subject: [PATCH 2/2] locking/rwbase: Lockless reader waking up a writer

Use the RT-lock safe wake_q to allow waking up the writer
without having to hold the wait_lock across the operation.

While wake_q is ideally used for batching wakeups, single wakeup
usage has still been shown to be beneficial versus the cost of
try_to_wake_up() when the lock is contended. It also avoids
having irqs disabled during the wakeup window, although
preemption remains disabled.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
kernel/locking/rtmutex.c | 19 +++++++++++++------
kernel/locking/rwbase_rt.c | 6 +++++-
2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6bb116c559b4..1581674d640b 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -446,19 +446,26 @@ static __always_inline void rt_mutex_adjust_prio(struct task_struct *p)
 }
 
 /* RT mutex specific wake_q wrappers */
-static __always_inline void rt_mutex_wake_q_add(struct rt_wake_q_head *wqh,
-						struct rt_mutex_waiter *w)
+static __always_inline void rt_mutex_wake_q_add_task(struct rt_wake_q_head *wqh,
+						     struct task_struct *task,
+						     unsigned int wake_state)
 {
-	if (IS_ENABLED(CONFIG_PREEMPT_RT) && w->wake_state != TASK_NORMAL) {
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && wake_state != TASK_NORMAL) {
 		if (IS_ENABLED(CONFIG_PROVE_LOCKING))
 			WARN_ON_ONCE(wqh->rtlock_task);
-		get_task_struct(w->task);
-		wqh->rtlock_task = w->task;
+		get_task_struct(task);
+		wqh->rtlock_task = task;
 	} else {
-		wake_q_add(&wqh->head, w->task);
+		wake_q_add(&wqh->head, task);
 	}
 }
 
+static __always_inline void rt_mutex_wake_q_add(struct rt_wake_q_head *wqh,
+						struct rt_mutex_waiter *w)
+{
+	rt_mutex_wake_q_add_task(wqh, w->task, w->wake_state);
+}
+
 static __always_inline void rt_mutex_wake_up_q(struct rt_wake_q_head *wqh)
 {
 	if (IS_ENABLED(CONFIG_PREEMPT_RT) && wqh->rtlock_task) {
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index a9034784a5a0..8cb58758af3d 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -147,6 +147,7 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 {
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
 	struct task_struct *owner;
+	DEFINE_RT_WAKE_Q(wqh);
 
 	raw_spin_lock_irq(&rtm->wait_lock);
 	/*
@@ -157,9 +158,12 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 	 */
 	owner = rt_mutex_owner(rtm);
 	if (owner)
-		wake_up_state(owner, state);
+		rt_mutex_wake_q_add_task(&wqh, owner, state);
 
+	/* Pairs with the preempt_enable() in rt_mutex_wake_up_q() */
+	preempt_disable();
 	raw_spin_unlock_irq(&rtm->wait_lock);
+	rt_mutex_wake_up_q(&wqh);
 }

static __always_inline void rwbase_read_unlock(struct rwbase_rt *rwb,
--
2.26.2
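
The deferred-wakeup pattern this patch relies on (record the task while holding wait_lock, wake it only after the lock is dropped) can be sketched in userspace C. Everything below is a hypothetical model: struct model_task, struct model_wake_q, struct model_lock and the helper functions are invented for illustration, the "wakeup" is just a flag store, and the kernel's task reference counting and preempt_disable()/preempt_enable() pairing are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task: "waking" it just sets a flag. */
struct model_task {
	atomic_bool woken;
};

/* Holds at most one pending wakeup, like the single-writer case above. */
struct model_wake_q {
	struct model_task *task;
};

struct model_lock {
	atomic_flag wait_lock;		/* models rtm->wait_lock as a spinlock */
	struct model_task *owner;	/* writer waiting for readers to leave */
};

static void model_wait_lock(struct model_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->wait_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static void model_wait_unlock(struct model_lock *l)
{
	atomic_flag_clear_explicit(&l->wait_lock, memory_order_release);
}

/* Only records the task; no wakeup work happens here. */
static void model_wake_q_add(struct model_wake_q *wq, struct model_task *t)
{
	assert(wq->task == NULL);
	wq->task = t;
}

/* Performs the recorded wakeup, if any; called without wait_lock held. */
static void model_wake_up_q(struct model_wake_q *wq)
{
	if (wq->task) {
		atomic_store_explicit(&wq->task->woken, true,
				      memory_order_release);
		wq->task = NULL;
	}
}

static void model_read_unlock_slowpath(struct model_lock *l)
{
	struct model_wake_q wq = { .task = NULL };

	model_wait_lock(l);
	if (l->owner)
		model_wake_q_add(&wq, l->owner);
	model_wait_unlock(l);

	/* The comparatively expensive wakeup runs outside the lock. */
	model_wake_up_q(&wq);
}
```

The point of the split is that the critical section shrinks to a pointer store, so other tasks contending on the lock are not held up by the wakeup itself.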

2021-09-21 03:41:48

by Waiman Long

Subject: Re: [PATCH -tip 0/2] locking/rwbase: Two reader optimizations

On 9/20/21 1:20 AM, Davidlohr Bueso wrote:
> Hi,
>
> Patch 1 is a barrier optimization that came up from the reader
> fastpath ordering auditing.
>
> Patch 2 is a resend of the previous broken patch that attempts
> to use wake_q for read_unlock() slowpath.
>
> Tested on v5.15.y-rt. Applies against tip/urgent.
>
> Thanks!
>
> Davidlohr Bueso (2):
> locking/rwbase: Optimize rwbase_read_trylock
> locking/rwbase: Lockless reader waking up a writer
>
> kernel/locking/rtmutex.c | 19 +++++++++++++------
> kernel/locking/rwbase_rt.c | 11 +++++++----
> 2 files changed, 20 insertions(+), 10 deletions(-)
>
> --
> 2.26.2
>
Your patches look good to me.

Acked-by: Waiman Long <[email protected]>

Subject: [tip: locking/core] locking/rwbase: Optimize rwbase_read_trylock

The following commit has been merged into the locking/core branch of tip:

Commit-ID: c78416d122243c92992a1d1063f17ddd0bc80e6c
Gitweb: https://git.kernel.org/tip/c78416d122243c92992a1d1063f17ddd0bc80e6c
Author: Davidlohr Bueso <[email protected]>
AuthorDate: Sun, 19 Sep 2021 22:20:30 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 07 Oct 2021 13:51:07 +02:00

locking/rwbase: Optimize rwbase_read_trylock

Instead of a full barrier around the RMW instruction, micro-optimize
for weakly ordered archs such that we only provide the required
ACQUIRE semantics when taking the read lock.

Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Waiman Long <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/locking/rwbase_rt.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 15c8110..6fd3162 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -59,8 +59,7 @@ static __always_inline int rwbase_read_trylock(struct rwbase_rt *rwb)
 	 * set.
 	 */
 	for (r = atomic_read(&rwb->readers); r < 0;) {
-		/* Fully-ordered if cmpxchg() succeeds, provides ACQUIRE */
-		if (likely(atomic_try_cmpxchg(&rwb->readers, &r, r + 1)))
+		if (likely(atomic_try_cmpxchg_acquire(&rwb->readers, &r, r + 1)))
 			return 1;
 	}
 	return 0;
@@ -187,7 +186,7 @@ static inline void __rwbase_write_unlock(struct rwbase_rt *rwb, int bias,

 	/*
 	 * _release() is needed in case that reader is in fast path, pairing
-	 * with atomic_try_cmpxchg() in rwbase_read_trylock(), provides RELEASE
+	 * with atomic_try_cmpxchg_acquire() in rwbase_read_trylock().
 	 */
 	(void)atomic_add_return_release(READER_BIAS - bias, &rwb->readers);
 	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);