2020-08-21 08:58:20

by Peter Zijlstra

Subject: [PATCH v2 03/11] sched,idle,rcu: Push rcu_idle deeper into the idle path

Lots of things take locks; due to a wee bug, rcu_lockdep didn't notice
that the locking tracepoints were using RCU.

Push rcu_idle_{enter,exit}() as deep as possible into the idle paths;
this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.

Specifically, sched_clock_idle_wakeup_event() will use ktime, which
will use seqlocks, which will tickle lockdep, and
stop_critical_timings() uses locks.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Marco Elver <[email protected]>
---
drivers/cpuidle/cpuidle.c | 12 ++++++++----
kernel/sched/idle.c       | 22 ++++++++--------------
2 files changed, 16 insertions(+), 18 deletions(-)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -145,13 +145,14 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
* executing it contains RCU usage regarded as invalid in the idle
* context, so tell RCU about that.
*/
- RCU_NONIDLE(tick_freeze());
+ tick_freeze();
/*
* The state used here cannot be a "coupled" one, because the "coupled"
* cpuidle mechanism enables interrupts and doing that with timekeeping
* suspended is generally unsafe.
*/
stop_critical_timings();
+ rcu_idle_enter();
drv->states[index].enter_s2idle(dev, drv, index);
if (WARN_ON_ONCE(!irqs_disabled()))
local_irq_disable();
@@ -160,7 +161,8 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
* first CPU executing it calls functions containing RCU read-side
* critical sections, so tell RCU about that.
*/
- RCU_NONIDLE(tick_unfreeze());
+ rcu_idle_exit();
+ tick_unfreeze();
start_critical_timings();

time_end = ns_to_ktime(local_clock());
@@ -229,16 +231,18 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
/* Take note of the planned idle state. */
sched_idle_set_state(target_state);

- trace_cpu_idle_rcuidle(index, dev->cpu);
+ trace_cpu_idle(index, dev->cpu);
time_start = ns_to_ktime(local_clock());

stop_critical_timings();
+ rcu_idle_enter();
entered_state = target_state->enter(dev, drv, index);
+ rcu_idle_exit();
start_critical_timings();

sched_clock_idle_wakeup_event();
time_end = ns_to_ktime(local_clock());
- trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
+ trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);

/* The cpu is no longer idle or about to enter idle. */
sched_idle_set_state(NULL);
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -54,17 +54,18 @@ __setup("hlt", cpu_idle_nopoll_setup);

static noinline int __cpuidle cpu_idle_poll(void)
{
+ trace_cpu_idle(0, smp_processor_id());
+ stop_critical_timings();
rcu_idle_enter();
- trace_cpu_idle_rcuidle(0, smp_processor_id());
local_irq_enable();
- stop_critical_timings();

while (!tif_need_resched() &&
- (cpu_idle_force_poll || tick_check_broadcast_expired()))
+ (cpu_idle_force_poll || tick_check_broadcast_expired()))
cpu_relax();
- start_critical_timings();
- trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
+
rcu_idle_exit();
+ start_critical_timings();
+ trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());

return 1;
}
@@ -91,7 +92,9 @@ void __cpuidle default_idle_call(void)
local_irq_enable();
} else {
stop_critical_timings();
+ rcu_idle_enter();
arch_cpu_idle();
+ rcu_idle_exit();
start_critical_timings();
}
}
@@ -158,7 +161,6 @@ static void cpuidle_idle_call(void)

if (cpuidle_not_available(drv, dev)) {
tick_nohz_idle_stop_tick();
- rcu_idle_enter();

default_idle_call();
goto exit_idle;
@@ -178,21 +180,17 @@ static void cpuidle_idle_call(void)
u64 max_latency_ns;

if (idle_should_enter_s2idle()) {
- rcu_idle_enter();

entered_state = call_cpuidle_s2idle(drv, dev);
if (entered_state > 0)
goto exit_idle;

- rcu_idle_exit();
-
max_latency_ns = U64_MAX;
} else {
max_latency_ns = dev->forced_idle_latency_limit_ns;
}

tick_nohz_idle_stop_tick();
- rcu_idle_enter();

next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
call_cpuidle(drv, dev, next_state);
@@ -209,8 +207,6 @@ static void cpuidle_idle_call(void)
else
tick_nohz_idle_retain_tick();

- rcu_idle_enter();
-
entered_state = call_cpuidle(drv, dev, next_state);
/*
* Give the governor an opportunity to reflect on the outcome
@@ -226,8 +222,6 @@ static void cpuidle_idle_call(void)
*/
if (WARN_ON_ONCE(irqs_disabled()))
local_irq_enable();
-
- rcu_idle_exit();
}

/*
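
To make the ordering concrete, here is a condensed sketch of
cpuidle_enter_state() as it reads after this patch (abridged from the
diff above; the comments are editorial, not from the kernel source):

	trace_cpu_idle(index, dev->cpu);	/* tracepoint: needs RCU watching */
	time_start = ns_to_ktime(local_clock());

	stop_critical_timings();		/* takes locks, so RCU must still watch */
	rcu_idle_enter();			/* RCU stops watching this CPU */
	entered_state = target_state->enter(dev, drv, index);
	rcu_idle_exit();			/* RCU watching again */
	start_critical_timings();

	sched_clock_idle_wakeup_event();	/* ktime/seqlocks: needs RCU and lockdep */
	time_end = ns_to_ktime(local_clock());
	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);

Only the low-level ->enter() callback now runs with RCU not watching,
which is why the _rcuidle tracepoint variants above can be dropped.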



2020-08-27 01:22:44

by Joel Fernandes

Subject: Re: [PATCH v2 03/11] sched,idle,rcu: Push rcu_idle deeper into the idle path

On Fri, Aug 21, 2020 at 10:47:41AM +0200, Peter Zijlstra wrote:
> Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
> that the locking tracepoints were using RCU.
>
> Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
> this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.
>
> Specifically, sched_clock_idle_wakeup_event() will use ktime which
> will use seqlocks which will tickle lockdep, and
> stop_critical_timings() uses lock.

I was wondering if those tracepoints should just use the _rcuidle variant of
the trace call. But that's a terrible idea, considering that it would add
unwanted overhead, I think.
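
For context, a rough and simplified sketch of what a trace_foo_rcuidle()
call expanded to around v5.9 (see __DECLARE_TRACE in
include/linux/tracepoint.h; details abridged from memory). The extra
enter/exit pair on every tracepoint hit is the overhead in question:

	/* simplified: the rcuidle flavour of a tracepoint */
	if (rcuidle) {
		/* momentarily make RCU watch again, just for the probe */
		idx = srcu_read_lock_notrace(&tracepoint_srcu);
		rcu_irq_enter_irqson();
	}
	/* ... iterate over and call the registered probes ... */
	if (rcuidle) {
		rcu_irq_exit_irqson();
		srcu_read_unlock_notrace(&tracepoint_srcu, idx);
	}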

Reviewed-by: Joel Fernandes (Google) <[email protected]>

thanks,

- Joel


2020-08-27 01:27:43

by Joel Fernandes

Subject: Re: [PATCH v2 03/11] sched,idle,rcu: Push rcu_idle deeper into the idle path

On Wed, Aug 26, 2020 at 09:18:26PM -0400, Joel Fernandes wrote:
> On Fri, Aug 21, 2020 at 10:47:41AM +0200, Peter Zijlstra wrote:
> > Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
> > that the locking tracepoints were using RCU.
> >
> > Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
> > this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.
> >
> > Specifically, sched_clock_idle_wakeup_event() will use ktime which
> > will use seqlocks which will tickle lockdep, and
> > stop_critical_timings() uses lock.
>
> I was wondering if those tracepoints should just use _rcuidle variant of the
> trace call. But that's a terrible idea considering that would add unwanted
> overhead I think.
>
> Reviewed-by: Joel Fernandes (Google) <[email protected]>

BTW, if the tracepoint machinery is converted to use the RCU-trace flavor,
then these kinds of issues go away, no? That RCU flavor is always watching.
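
A minimal sketch of the reader side of that flavor (rcu_read_lock_trace()
from include/linux/rcupdate_trace.h); a converted probe would look roughly
like the hypothetical example below:

	#include <linux/rcupdate_trace.h>

	/* hypothetical probe, for illustration only */
	static void my_probe(void *data, unsigned int state, unsigned int cpu)
	{
		rcu_read_lock_trace();	/* RCU-trace readers are legal even on idle CPUs */
		/* ... dereference RCU-protected data, emit the event ... */
		rcu_read_unlock_trace();
	}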

thanks,

- Joel


2020-08-27 07:49:59

by Peter Zijlstra

Subject: Re: [PATCH v2 03/11] sched,idle,rcu: Push rcu_idle deeper into the idle path

On Wed, Aug 26, 2020 at 09:24:19PM -0400, Joel Fernandes wrote:
> On Wed, Aug 26, 2020 at 09:18:26PM -0400, Joel Fernandes wrote:
> > On Fri, Aug 21, 2020 at 10:47:41AM +0200, Peter Zijlstra wrote:
> > > Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
> > > that the locking tracepoints were using RCU.
> > >
> > > Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
> > > this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.
> > >
> > > Specifically, sched_clock_idle_wakeup_event() will use ktime which
> > > will use seqlocks which will tickle lockdep, and
> > > stop_critical_timings() uses lock.
> >
> > I was wondering if those tracepoints should just use _rcuidle variant of the
> > trace call. But that's a terrible idea considering that would add unwanted
> > overhead I think.
> >
> > Reviewed-by: Joel Fernandes (Google) <[email protected]>
>
> BTW, if tracepoint is converted to use RCU-trace flavor, then these kinds of
> issues go away, no? That RCU flavor is always watching.

All trace_*_rcuidle() and RCU_NONIDLE() usage is a bug IMO.

Ideally RCU-trace goes away too.
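
For reference, RCU_NONIDLE() as defined around this time
(include/linux/rcupdate.h, shown slightly simplified) just force-wakes RCU
around a single statement, which is exactly the per-call papering-over
being objected to:

	#define RCU_NONIDLE(a) \
		do { \
			rcu_irq_enter_irqson(); \
			do { a; } while (0); \
			rcu_irq_exit_irqson(); \
		} while (0)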

2020-08-27 07:54:25

by Thomas Gleixner

Subject: Re: [PATCH v2 03/11] sched,idle,rcu: Push rcu_idle deeper into the idle path

On Thu, Aug 27 2020 at 09:47, peterz wrote:
> On Wed, Aug 26, 2020 at 09:24:19PM -0400, Joel Fernandes wrote:
>> On Wed, Aug 26, 2020 at 09:18:26PM -0400, Joel Fernandes wrote:
>> > On Fri, Aug 21, 2020 at 10:47:41AM +0200, Peter Zijlstra wrote:
>> > > Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
>> > > that the locking tracepoints were using RCU.
>> > >
>> > > Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
>> > > this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.
>> > >
>> > > Specifically, sched_clock_idle_wakeup_event() will use ktime which
>> > > will use seqlocks which will tickle lockdep, and
>> > > stop_critical_timings() uses lock.
>> >
>> > I was wondering if those tracepoints should just use _rcuidle variant of the
>> > trace call. But that's a terrible idea considering that would add unwanted
>> > overhead I think.
>> >
>> > Reviewed-by: Joel Fernandes (Google) <[email protected]>
>>
>> BTW, if tracepoint is converted to use RCU-trace flavor, then these kinds of
>> issues go away, no? That RCU flavor is always watching.
>
> All trace_*_rcuidle() and RCU_NONIDLE() usage is a bug IMO.

It's the same problem as low-level entry/exit. And that stuff is a hack
which papers over the problem instead of fixing it from the ground up. But
we are talking about tracing, right?

Thanks,

tglx

Subject: [tip: locking/core] sched,idle,rcu: Push rcu_idle deeper into the idle path

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 1098582a0f6c4e8fd28da0a6305f9233d02c9c1d
Gitweb: https://git.kernel.org/tip/1098582a0f6c4e8fd28da0a6305f9233d02c9c1d
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 07 Aug 2020 20:50:19 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:41:53 +02:00

sched,idle,rcu: Push rcu_idle deeper into the idle path

Lots of things take locks; due to a wee bug, rcu_lockdep didn't notice
that the locking tracepoints were using RCU.

Push rcu_idle_{enter,exit}() as deep as possible into the idle paths;
this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.

Specifically, sched_clock_idle_wakeup_event() will use ktime, which
will use seqlocks, which will tickle lockdep, and
stop_critical_timings() uses locks.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
drivers/cpuidle/cpuidle.c | 12 ++++++++----
kernel/sched/idle.c       | 22 ++++++++--------------
2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 2fe4f3c..9bcda41 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -145,13 +145,14 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
* executing it contains RCU usage regarded as invalid in the idle
* context, so tell RCU about that.
*/
- RCU_NONIDLE(tick_freeze());
+ tick_freeze();
/*
* The state used here cannot be a "coupled" one, because the "coupled"
* cpuidle mechanism enables interrupts and doing that with timekeeping
* suspended is generally unsafe.
*/
stop_critical_timings();
+ rcu_idle_enter();
drv->states[index].enter_s2idle(dev, drv, index);
if (WARN_ON_ONCE(!irqs_disabled()))
local_irq_disable();
@@ -160,7 +161,8 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
* first CPU executing it calls functions containing RCU read-side
* critical sections, so tell RCU about that.
*/
- RCU_NONIDLE(tick_unfreeze());
+ rcu_idle_exit();
+ tick_unfreeze();
start_critical_timings();

time_end = ns_to_ktime(local_clock());
@@ -229,16 +231,18 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
/* Take note of the planned idle state. */
sched_idle_set_state(target_state);

- trace_cpu_idle_rcuidle(index, dev->cpu);
+ trace_cpu_idle(index, dev->cpu);
time_start = ns_to_ktime(local_clock());

stop_critical_timings();
+ rcu_idle_enter();
entered_state = target_state->enter(dev, drv, index);
+ rcu_idle_exit();
start_critical_timings();

sched_clock_idle_wakeup_event();
time_end = ns_to_ktime(local_clock());
- trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
+ trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);

/* The cpu is no longer idle or about to enter idle. */
sched_idle_set_state(NULL);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 6bf3498..ea3a098 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -54,17 +54,18 @@ __setup("hlt", cpu_idle_nopoll_setup);

static noinline int __cpuidle cpu_idle_poll(void)
{
+ trace_cpu_idle(0, smp_processor_id());
+ stop_critical_timings();
rcu_idle_enter();
- trace_cpu_idle_rcuidle(0, smp_processor_id());
local_irq_enable();
- stop_critical_timings();

while (!tif_need_resched() &&
- (cpu_idle_force_poll || tick_check_broadcast_expired()))
+ (cpu_idle_force_poll || tick_check_broadcast_expired()))
cpu_relax();
- start_critical_timings();
- trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
+
rcu_idle_exit();
+ start_critical_timings();
+ trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());

return 1;
}
@@ -91,7 +92,9 @@ void __cpuidle default_idle_call(void)
local_irq_enable();
} else {
stop_critical_timings();
+ rcu_idle_enter();
arch_cpu_idle();
+ rcu_idle_exit();
start_critical_timings();
}
}
@@ -158,7 +161,6 @@ static void cpuidle_idle_call(void)

if (cpuidle_not_available(drv, dev)) {
tick_nohz_idle_stop_tick();
- rcu_idle_enter();

default_idle_call();
goto exit_idle;
@@ -178,21 +180,17 @@ static void cpuidle_idle_call(void)
u64 max_latency_ns;

if (idle_should_enter_s2idle()) {
- rcu_idle_enter();

entered_state = call_cpuidle_s2idle(drv, dev);
if (entered_state > 0)
goto exit_idle;

- rcu_idle_exit();
-
max_latency_ns = U64_MAX;
} else {
max_latency_ns = dev->forced_idle_latency_limit_ns;
}

tick_nohz_idle_stop_tick();
- rcu_idle_enter();

next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
call_cpuidle(drv, dev, next_state);
@@ -209,8 +207,6 @@ static void cpuidle_idle_call(void)
else
tick_nohz_idle_retain_tick();

- rcu_idle_enter();
-
entered_state = call_cpuidle(drv, dev, next_state);
/*
* Give the governor an opportunity to reflect on the outcome
@@ -226,8 +222,6 @@ exit_idle:
*/
if (WARN_ON_ONCE(irqs_disabled()))
local_irq_enable();
-
- rcu_idle_exit();
}

/*

2020-08-27 22:31:10

by Joel Fernandes

Subject: Re: [PATCH v2 03/11] sched,idle,rcu: Push rcu_idle deeper into the idle path

On Thu, Aug 27, 2020 at 09:47:48AM +0200, [email protected] wrote:
> On Wed, Aug 26, 2020 at 09:24:19PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 26, 2020 at 09:18:26PM -0400, Joel Fernandes wrote:
> > > On Fri, Aug 21, 2020 at 10:47:41AM +0200, Peter Zijlstra wrote:
> > > > Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
> > > > that the locking tracepoints were using RCU.
> > > >
> > > > Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
> > > > this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.
> > > >
> > > > Specifically, sched_clock_idle_wakeup_event() will use ktime which
> > > > will use seqlocks which will tickle lockdep, and
> > > > stop_critical_timings() uses lock.
> > >
> > > I was wondering if those tracepoints should just use _rcuidle variant of the
> > > trace call. But that's a terrible idea considering that would add unwanted
> > > overhead I think.
> > >
> > > Reviewed-by: Joel Fernandes (Google) <[email protected]>
> >
> > BTW, if tracepoint is converted to use RCU-trace flavor, then these kinds of
> > issues go away, no? That RCU flavor is always watching.
>
> All trace_*_rcuidle() and RCU_NONIDLE() usage is a bug IMO.
>
> Ideally RCU-trace goes away too.

I was thinking that unless the rcu_idle_enter/exit calls coincide with points
in the kernel where instrumentation is not allowed, there is always a chance
somebody wants to use tracepoints after rcu_idle_enter() or before
rcu_idle_exit(). In that case, trace_*_rcuidle() is unavoidable unless you
can move the rcu_idle_enter/exit calls deeper, as you are doing.

Maybe objtool can help with that?

The other solution is RCU-trace, if you can't push the rcu_idle_enter/exit
calls any deeper and still want to get rid of the trace_*_rcuidle thingies.
Mathieu and I were talking at LPC about converting tracepoints to RCU-trace,
and we can work on it if that's the right direction.

thanks,

- Joel

2020-09-02 07:09:40

by Peter Zijlstra

Subject: Re: [PATCH v2 03/11] sched,idle,rcu: Push rcu_idle deeper into the idle path

On Thu, Aug 27, 2020 at 06:30:01PM -0400, Joel Fernandes wrote:
> On Thu, Aug 27, 2020 at 09:47:48AM +0200, [email protected] wrote:

> > All trace_*_rcuidle() and RCU_NONIDLE() usage is a bug IMO.
> >
> > Ideally RCU-trace goes away too.
>
> I was thinking that unless the rcu_idle_enter/exit calls coincide with points
> in the kernel where instrumentation is not allowed, there is always a chance
> somebody wants to use tracepoints after rcu_idle_enter or before exit. In
> this case, trace_*_rcuidle() is unavoidable unless you can move the
> rcu_idle_enter/exit calls deeper as you are doing.
>
> Maybe objtool can help with that?

The eventual goal is to mark idle code noinstr too. There's more work
before we can do that.
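
For readers unfamiliar with the annotation: noinstr places a function in a
special text section where objtool can verify that nothing instrumentable
(tracepoints, KASAN, and so on) is reached from it. A sketch of the stated
end state, with a hypothetical function name for illustration:

	/* hypothetical innermost idle routine; noinstr forbids instrumentation */
	static noinstr void arch_cpu_idle_inner(void)
	{
		raw_safe_halt();	/* or the architecture's mwait/wfi equivalent */
	}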