2021-10-20 11:11:40

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

In some cases, rcuwait_wake_up can be called even if the actual likelihood
of a wakeup is very low. If CONFIG_PREEMPT_RCU is active, the resulting
rcu_read_lock/rcu_read_unlock pair can be relatively expensive, and in
fact it is unnecessary when there is no w->task to keep alive: the
memory barrier before the read is what matters in order to avoid missed
wakeups.

Therefore, do an early check of w->task right after the barrier, and skip
rcu_read_lock/rcu_read_unlock unless there is someone waiting for a wakeup.

Running kvm-unit-test/vmexit.flat with APICv disabled, most interrupt
injection tests (tscdeadline*, self_ipi*, x2apic_self_ipi*) improve
by around 600 cpu cycles.

Cc: Davidlohr Bueso <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Reported-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
kernel/exit.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 91a43e57a32e..a38a08dbf85e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -234,8 +234,6 @@ int rcuwait_wake_up(struct rcuwait *w)
int ret = 0;
struct task_struct *task;

- rcu_read_lock();
-
/*
* Order condition vs @task, such that everything prior to the load
* of @task is visible. This is the condition as to why the user called
@@ -245,10 +243,22 @@ int rcuwait_wake_up(struct rcuwait *w)
* WAIT WAKE
* [S] tsk = current [S] cond = true
* MB (A) MB (B)
- * [L] cond [L] tsk
+ * [L] cond [L] rcuwait_active(w)
+ * task = rcu_dereference(w->task)
*/
smp_mb(); /* (B) */

+#ifdef CONFIG_PREEMPT_RCU
+ /*
+ * The cost of rcu_read_lock() dominates for preemptible RCU,
+ * avoid it if possible.
+ */
+ if (!rcuwait_active(w))
+ return ret;
+#endif
+
+ rcu_read_lock();
+
task = rcu_dereference(w->task);
if (task)
ret = wake_up_process(task);
--
2.27.0


2021-10-20 11:19:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

On Wed, Oct 20, 2021 at 07:06:38AM -0400, Paolo Bonzini wrote:
> In some cases, rcuwait_wake_up can be called even if the actual likelihood
> of a wakeup is very low. If CONFIG_PREEMPT_RCU is active, the resulting
> rcu_read_lock/rcu_read_unlock pair can be relatively expensive, and in
> fact it is unnecessary when there is no w->task to keep alive: the
> memory barrier before the read is what matters in order to avoid missed
> wakeups.
>
> Therefore, do an early check of w->task right after the barrier, and skip
> rcu_read_lock/rcu_read_unlock unless there is someone waiting for a wakeup.
>
> Running kvm-unit-test/vmexit.flat with APICv disabled, most interrupt
> injection tests (tscdeadline*, self_ipi*, x2apic_self_ipi*) improve
> by around 600 cpu cycles.

*how* ?!?

AFAICT, rcu_read_lock() for PREEMPT_RCU is:

WRITE_ONCE(current->rcu_read_lock_nesting, READ_ONCE(current->rcu_read_lock_nesting) + 1);
barrier();

Paul?

2021-10-20 11:40:13

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

On 20/10/21 13:17, Peter Zijlstra wrote:
> AFAICT, rcu_read_lock() for PREEMPT_RCU is:
>
> WRITE_ONCE(current->rcu_read_lock_nesting, READ_ONCE(current->rcu_read_lock_nesting) + 1);
> barrier();

rcu_read_unlock() is the expensive one if you need to go down
rcu_read_unlock_special().

Paolo

> Paul?

2021-10-20 11:54:41

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

On 20/10/21 13:17, Peter Zijlstra wrote:
> On Wed, Oct 20, 2021 at 07:06:38AM -0400, Paolo Bonzini wrote:
>> In some cases, rcuwait_wake_up can be called even if the actual likelihood
>> of a wakeup is very low. If CONFIG_PREEMPT_RCU is active, the resulting
>> rcu_read_lock/rcu_read_unlock pair can be relatively expensive, and in
>> fact it is unnecessary when there is no w->task to keep alive: the
>> memory barrier before the read is what matters in order to avoid missed
>> wakeups.
>>
>> Therefore, do an early check of w->task right after the barrier, and skip
>> rcu_read_lock/rcu_read_unlock unless there is someone waiting for a wakeup.
>>
>> Running kvm-unit-test/vmexit.flat with APICv disabled, most interrupt
>> injection tests (tscdeadline*, self_ipi*, x2apic_self_ipi*) improve
>> by around 600 cpu cycles.
>
> *how* ?!?
>
> AFAICT, rcu_read_lock() for PREEMPT_RCU is:
>
> WRITE_ONCE(current->rcu_read_lock_nesting, READ_ONCE(current->rcu_read_lock_nesting) + 1);
> barrier();
>
> Paul?

Wanpeng, can you share your full .config?

Paolo

2021-10-20 12:07:08

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

On Wed, 20 Oct 2021 at 19:53, Paolo Bonzini <[email protected]> wrote:
>
> On 20/10/21 13:17, Peter Zijlstra wrote:
> > On Wed, Oct 20, 2021 at 07:06:38AM -0400, Paolo Bonzini wrote:
> >> In some cases, rcuwait_wake_up can be called even if the actual likelihood
> >> of a wakeup is very low. If CONFIG_PREEMPT_RCU is active, the resulting
> >> rcu_read_lock/rcu_read_unlock pair can be relatively expensive, and in
> >> fact it is unnecessary when there is no w->task to keep alive: the
> >> memory barrier before the read is what matters in order to avoid missed
> >> wakeups.
> >>
> >> Therefore, do an early check of w->task right after the barrier, and skip
> >> rcu_read_lock/rcu_read_unlock unless there is someone waiting for a wakeup.
> >>
> >> Running kvm-unit-test/vmexit.flat with APICv disabled, most interrupt
> >> injection tests (tscdeadline*, self_ipi*, x2apic_self_ipi*) improve
> >> by around 600 cpu cycles.
> >
> > *how* ?!?
> >
> > AFAICT, rcu_read_lock() for PREEMPT_RCU is:
> >
> > WRITE_ONCE(current->rcu_read_lock_nesting, READ_ONCE(current->rcu_read_lock_nesting) + 1);
> > barrier();
> >
> > Paul?
>
> Wanpeng, can you share your full .config?

Yes, in the attachment.

Wanpeng


Attachments:
config (244.00 kB)

2021-10-20 12:07:11

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

On 20/10/21 14:01, Wanpeng Li wrote:
> Yes, in the attachment.
>
> Wanpeng

This one does not have CONFIG_PREEMPT=y, let alone CONFIG_PREEMPT_RCU.
It's completely impossible that this patch has an effect without those
options.

Paolo

2021-10-20 12:13:56

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

On Wed, 20 Oct 2021 at 20:04, Paolo Bonzini <[email protected]> wrote:
>
> On 20/10/21 14:01, Wanpeng Li wrote:
> > Yes, in the attachment.
> >
> > Wanpeng
>
> This one does not have CONFIG_PREEMPT=y, let alone CONFIG_PREEMPT_RCU.
> It's completely impossible that this patch has an effect without those
> options.

Sorry, should be this one in the attachment.

Wanpeng


Attachments:
kvm_rcu_config (244.17 kB)

2021-10-22 04:16:11

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

On Wed, Oct 20, 2021 at 7:39 PM Paolo Bonzini <[email protected]> wrote:
>
> On 20/10/21 13:17, Peter Zijlstra wrote:
> > AFAICT, rcu_read_lock() for PREEMPT_RCU is:
> >
> > WRITE_ONCE(current->rcu_read_lock_nesting, READ_ONCE(current->rcu_read_lock_nesting) + 1);
> > barrier();
>
> rcu_read_unlock() is the expensive one if you need to go down
> rcu_read_unlock_special().
>

If "actual likelihood of a wakeup is very low." as stated in the changelog,
the likelihood of rcu_read_unlock_special() is also very low.

rcu_read_lock() for PREEMPT_RCU is a function call, is it relevant?

(It is possible to remove the function call if the include-hell can
be resolved or remove the function call via LTO or just remove the
function call in X86 via percpu.)

Thanks
Lai

> Paolo
>
> > Paul?
>

2021-10-22 14:01:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] rcuwait: do not enter RCU protection unless a wakeup is needed

On Wed, Oct 20, 2021 at 08:08:41PM +0800, Wanpeng Li wrote:
> On Wed, 20 Oct 2021 at 20:04, Paolo Bonzini <[email protected]> wrote:
> >
> > On 20/10/21 14:01, Wanpeng Li wrote:
> > > Yes, in the attachment.
> > >
> > > Wanpeng
> >
> > This one does not have CONFIG_PREEMPT=y, let alone CONFIG_PREEMPT_RCU.
> > It's completely impossible that this patch has an effect without those
> > options.
>
> Sorry, should be this one in the attachment.

Uhhmmm.. you have lockdep enabled. You know you shouldn't be doing
performance measurements with lockdep on, right?