2013-05-14 16:03:03

by Frederic Weisbecker

Subject: [GIT PULL] Nohz fixes

Ingo,

Please pull the timers/urgent branch that can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/urgent

Thanks,
Frederic
---

Steven Rostedt (2):
nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled
nohz: Warn if the machine can not perform nohz_full


kernel/time/tick-sched.c | 5 +++++
lib/Kconfig.debug | 2 ++
2 files changed, 7 insertions(+), 0 deletions(-)


2013-05-14 16:03:06

by Frederic Weisbecker

Subject: [PATCH 2/2] nohz: Warn if the machine can not perform nohz_full

From: Steven Rostedt <[email protected]>

If the user configures NO_HZ_FULL and defines nohz_full=XXX on the
kernel command line, or enables NO_HZ_FULL_ALL, but nohz fails
due to the machine having an unstable clock, warn about it.

We do not want users thinking that they are getting the benefit
of nohz when their machine can not support it.

Signed-off-by: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bc67d42..cfc798b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -178,6 +178,11 @@ static bool can_stop_full_tick(void)
 	 */
 	if (!sched_clock_stable) {
 		trace_tick_stop(0, "unstable sched clock\n");
+		/*
+		 * Don't allow the user to think they can get
+		 * full NO_HZ with this machine.
+		 */
+		WARN_ONCE(1, "NO_HZ FULL will not work with unstable sched clock");
 		return false;
 	}
 #endif
--
1.7.5.4
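
The patch relies on WARN_ONCE() firing only the first time the check fails, even though can_stop_full_tick() may run many times. Below is a minimal user-space sketch of that behaviour. It is a model, not kernel code: MODEL_WARN_ONCE, warnings_printed and model_can_stop_full_tick() are hypothetical names introduced for illustration, and the real WARN_ONCE() also dumps a stack trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (illustration only). */
int warnings_printed;

/*
 * Hypothetical stand-in for the kernel's WARN_ONCE(): each call site
 * owns a static flag, so its message is printed at most once.
 */
#define MODEL_WARN_ONCE(cond, msg)					\
	do {								\
		static bool warned;					\
		if ((cond) && !warned) {				\
			warned = true;					\
			warnings_printed++;				\
			fprintf(stderr, "WARNING: %s\n", (msg));	\
		}							\
	} while (0)

/* Models only the sched-clock check touched by the patch. */
bool model_can_stop_full_tick(bool sched_clock_stable)
{
	assert(warnings_printed >= 0);
	if (!sched_clock_stable) {
		MODEL_WARN_ONCE(1, "NO_HZ FULL will not work with unstable sched clock");
		return false;
	}
	return true;
}
```

Calling model_can_stop_full_tick(false) repeatedly returns false every time but prints the warning only on the first call, which is the effect the patch wants: the user is told once that full NO_HZ cannot work on this machine.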

2013-05-14 16:03:16

by Frederic Weisbecker

Subject: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

From: Steven Rostedt <[email protected]>

Trying to test the nohz_full code, I was not able to get it to work.
Finally I enabled the tick_stop tracepoint and it showed:

tick_stop: success=no msg=perf events running

I talked to Frederic Weisbecker about this and he informed me that
perf is used by the lockup detector. I checked the code, and sure
enough it is.

As perf is always running when LOCKUP_DETECTOR is enabled, which
will always disable nohz_full from working, instead of confusing
users, disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled.

When perf is changed such that it does not prevent nohz_full from
working, then we can and should remove this constraint.

Signed-off-by: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
lib/Kconfig.debug | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 566cf2b..1364d09 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -174,6 +174,8 @@ config DEBUG_SHIRQ
 config LOCKUP_DETECTOR
 	bool "Detect Hard and Soft Lockups"
 	depends on DEBUG_KERNEL && !S390
+	# Lockup detector currently prevents NO_HZ_FULL from working
+	depends on !NO_HZ_FULL
 	help
 	  Say Y here to enable the kernel to act as a watchdog to detect
 	  hard and soft lockups.
--
1.7.5.4

2013-05-15 08:37:49

by Peter Zijlstra

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Tue, May 14, 2013 at 06:02:51PM +0200, Frederic Weisbecker wrote:
> From: Steven Rostedt <[email protected]>
>
> Trying to test the nohz_full code, I was not able to get it to work.
> Finally I enabled the tick_stop tracepoint and it showed:
>
> tick_stop: success=no msg=perf events running
>
> I talked to Frederic Weisbecker about this and he informed me that
> perf is used by the lockup detector. I checked the code, and sure
> enough it is.
>
> As perf is always running when LOCKUP_DETECTOR is enabled, which
> will always disable nohz_full from working, instead of confusing
> users, disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled.
>
> When perf is changed such that it does not prevent nohz_full from
> working, then we can and should remove this constraint.

That's a bit contradictory in function, you want the NMI watchdog to
cover all code, so disabling whilst entering NO_HZ state is going to
make it not cover some code - *fail*.

Rather I would suggest disabling the NMI watchdog's runtime default; so
you can still enable it with something like:

echo 1 > /proc/sys/kernel/nmi_watchdog
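
If the watchdog becomes a runtime default as suggested, a tool may want to check the current setting before relying on nohz_full. The following is a small user-space sketch under the assumption that the sysctl file holds a single integer ("0" or "1") followed by a newline. The helper names parse_watchdog_setting() and read_watchdog_setting() are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse sysctl text such as "0\n" or "1\n".
 * Returns 1 if enabled, 0 if disabled, -1 if the text is not a number.
 */
int parse_watchdog_setting(const char *text)
{
	int value;

	assert(text != NULL);
	if (sscanf(text, "%d", &value) != 1)
		return -1;
	return value != 0;
}

/*
 * Read the setting from a file such as /proc/sys/kernel/nmi_watchdog.
 * Returns -1 if the file cannot be opened or parsed.
 */
int read_watchdog_setting(const char *path)
{
	char buf[16];
	int result = -1;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) != NULL)
		result = parse_watchdog_setting(buf);
	fclose(f);
	return result;
}
```

A caller would pass "/proc/sys/kernel/nmi_watchdog" to read_watchdog_setting() and treat -1 as "unknown", for instance on kernels built without the lockup detector.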

2013-05-15 14:47:42

by Don Zickus

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, May 15, 2013 at 10:37:29AM +0200, Peter Zijlstra wrote:
> On Tue, May 14, 2013 at 06:02:51PM +0200, Frederic Weisbecker wrote:
> > From: Steven Rostedt <[email protected]>
> >
> > Trying to test the nohz_full code, I was not able to get it to work.
> > Finally I enabled the tick_stop tracepoint and it showed:
> >
> > tick_stop: success=no msg=perf events running
> >
> > I talked to Frederic Weisbecker about this and he informed me that
> > perf is used by the lockup detector. I checked the code, and sure
> > enough it is.
> >
> > As perf is always running when LOCKUP_DETECTOR is enabled, which
> > will always disable nohz_full from working, instead of confusing
> > users, disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled.
> >
> > When perf is changed such that it does not prevent nohz_full from
> > working, then we can and should remove this constraint.
>
> That's a bit contradictory in function, you want the NMI watchdog to
> cover all code, so disabling whilst entering NO_HZ state is going to
> make it not cover some code - *fail*.
>
> Rather I would suggest disabling the NMI watchdog's runtime default; so
> you can still enable it with something like:
>
> echo 1 > /proc/sys/kernel/nmi_watchdog

Coming into the middle of the thread is always hard, but why/how does perf
disable nohz_full? I didn't think the hardware events of perf would cause
problems as they are no different than an irq. Curious.

Cheers,
Don

2013-05-15 15:27:24

by Steven Rostedt

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, 2013-05-15 at 11:06 -0400, Don Zickus wrote:

> > That's a bit contradictory in function, you want the NMI watchdog to
> > cover all code, so disabling whilst entering NO_HZ state is going to
> > make it not cover some code - *fail*.

Well, when NO_HZ_FULL is set, it covers no code :-)

> >
> > Rather I would suggest disabling the NMI watchdog's runtime default; so
> > you can still enable it with something like:
> >
> > echo 1 > /proc/sys/kernel/nmi_watchdog

Yeah, just disabling it via run time might work.


>
> Coming into the middle of the thread is always hard, but why/how does perf
> disable nohz_full? I didn't think the hardware events of perf would cause
> problems as they are no different than an irq. Curious.

Right now perf requires a tick, not sure exactly why, but you can look
at the code in perf_event_task_tick(). Thus if NO_HZ_FULL sees that a
perf tick is pending, it won't disable ticks. Unfortunately, the
watchdogs, both NMI and soft lockup, use the perf infrastructure to
trigger NMIs or interrupts. This adds a perf element on the rotate list
and keeps NO_HZ_FULL from *ever* activating.

-- Steve
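
The dependency described above can be sketched as a tiny user-space model: as long as anything sits on the perf rotate list, the "can the tick stop?" check fails, and enabling the watchdog puts an entry there permanently. Everything here (struct cpu_model, the enum, the function names) is hypothetical and only mirrors the two failure reasons that appear in this thread. The real can_stop_full_tick() checks more conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Reasons the model can give for keeping the tick running. */
enum tick_blocker {
	TICK_CAN_STOP,
	BLOCKED_BY_PERF_EVENTS,		/* "perf events running" */
	BLOCKED_BY_UNSTABLE_CLOCK	/* "unstable sched clock" */
};

/* Hypothetical model of one CPU; not a kernel structure. */
struct cpu_model {
	int rotate_list_len;	/* perf contexts waiting for tick-driven rotation */
	bool sched_clock_stable;
};

/* Mirrors the order of the two checks discussed in this thread. */
enum tick_blocker model_tick_blocker(const struct cpu_model *cpu)
{
	assert(cpu->rotate_list_len >= 0);
	if (cpu->rotate_list_len > 0)
		return BLOCKED_BY_PERF_EVENTS;
	if (!cpu->sched_clock_stable)
		return BLOCKED_BY_UNSTABLE_CLOCK;
	return TICK_CAN_STOP;
}

/* The watchdog registers a perf event that is never removed. */
void model_watchdog_enable(struct cpu_model *cpu)
{
	cpu->rotate_list_len++;
}
```

In this model, a CPU with a stable clock and an empty rotate list reports TICK_CAN_STOP, and a single call to model_watchdog_enable() flips it to BLOCKED_BY_PERF_EVENTS for good, which matches the tick_stop trace line from the patch description.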

2013-05-15 15:59:21

by Frederic Weisbecker

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, May 15, 2013 at 10:37:29AM +0200, Peter Zijlstra wrote:
> On Tue, May 14, 2013 at 06:02:51PM +0200, Frederic Weisbecker wrote:
> > From: Steven Rostedt <[email protected]>
> >
> > Trying to test the nohz_full code, I was not able to get it to work.
> > Finally I enabled the tick_stop tracepoint and it showed:
> >
> > tick_stop: success=no msg=perf events running
> >
> > I talked to Frederic Weisbecker about this and he informed me that
> > perf is used by the lockup detector. I checked the code, and sure
> > enough it is.
> >
> > As perf is always running when LOCKUP_DETECTOR is enabled, which
> > will always disable nohz_full from working, instead of confusing
> > users, disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled.
> >
> > When perf is changed such that it does not prevent nohz_full from
> > working, then we can and should remove this constraint.
>
> That's a bit contradictory in function, you want the NMI watchdog to
> cover all code, so disabling whilst entering NO_HZ state is going to
> make it not cover some code - *fail*.
>
> Rather I would suggest disabling the NMI watchdog's runtime default; so
> you can still enable it with something like:
>
> echo 1 > /proc/sys/kernel/nmi_watchdog
>

Sounds good, and we then warn the user about that.

2013-05-15 16:08:51

by Don Zickus

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, May 15, 2013 at 11:27:02AM -0400, Steven Rostedt wrote:
> > Coming into the middle of the thread is always hard, but why/how does perf
> > disable nohz_full? I didn't think the hardware events of perf would cause
> > problems as they are no different than an irq. Curious.
>
> Right now perf requires a tick, not sure exactly why, but you can look
> at the code in perf_event_task_tick(). Thus if NO_HZ_FULL sees that a
> perf tick is pending, it won't disable ticks. Unfortunately, the
> watchdogs, both NMI and soft lockup, use the perf infrastructure to
> trigger NMIs or interrupts. This adds a perf element on the rotate list
> and keeps NO_HZ_FULL from *ever* activating.

Ok. Thanks. I don't know what the rotate list is for (nor what it does
in general). But I'll poke around.

Cheers,
Don

2013-05-15 16:55:56

by Peter Zijlstra

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, May 15, 2013 at 11:06:53AM -0400, Don Zickus wrote:
> Coming into the middle of the thread is always hard, but why/how does perf
> disable nohz_full? I didn't think the hardware events of perf would cause
> problems as they are no different than an irq. Curious.

Yah, right :-) So I think what happens is that when we enter kernel
space for whatever reason (say NMIs for a watchdog), we kill the magic
NO_HZ state that allows a single task that's stuck in userspace to
effectively have the tick disabled.

But yeah, the initial patch was a tad light on detail, so I'll have to
let Steve and Frederic expand / correct.

2013-05-15 16:59:30

by Peter Zijlstra

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, May 15, 2013 at 11:27:02AM -0400, Steven Rostedt wrote:
> Right now perf requires a tick, not sure exactly why, but you can look
> at the code in perf_event_task_tick(). Thus if NO_HZ_FULL sees that a
> perf tick is pending, it won't disable ticks. Unfortunately, the
> watchdogs, both NMI and soft lockup, use the perf infrastructure to
> trigger NMIs or interrupts. This adds a perf element on the rotate list
> and keeps NO_HZ_FULL from *ever* activating.
>

Hmm.. Stephane had a bunch of patches converting the rotation thing to
an hrtimer. I seem to have forgotten what happened to them but I can't
seem to find them merged.

I'll go look.

That leaves the frequency stuff, but the watchdog doesn't use that.

At which point we could run the watchdog without perf_event_task_tick().

2013-05-15 17:04:06

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, 2013-05-15 at 18:59 +0200, Peter Zijlstra wrote:

> At which point we could run the watchdog without perf_event_task_tick().

At which point we can drop the disable LOCKUP_DETECTOR when NO_HZ_FULL
is enabled ;-)

-- Steve

2013-05-15 17:12:04

by Peter Zijlstra

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, May 15, 2013 at 06:59:15PM +0200, Peter Zijlstra wrote:
> On Wed, May 15, 2013 at 11:27:02AM -0400, Steven Rostedt wrote:
> > Right now perf requires a tick, not sure exactly why, but you can look
> > at the code in perf_event_task_tick(). Thus if NO_HZ_FULL sees that a
> > perf tick is pending, it won't disable ticks. Unfortunately, the
> > watchdogs, both NMI and soft lockup, use the perf infrastructure to
> > trigger NMIs or interrupts. This adds a perf element on the rotate list
> > and keeps NO_HZ_FULL from *ever* activating.
> >
>
> Hmm.. Stephane had a bunch of patches converting the rotation thing to
> an hrtimer. I seem to have forgotten what happened to them but I can't
> seem to find them merged.
>
> I'll go look.
>
> That leaves the frequency stuff, but the watchdog doesn't use that.
>
> At which point we could run the watchdog without perf_event_task_tick().

Found them:

[email protected]

Looks like they were stuck in my inbox and never applied, so I just did.
They should appear in tip soonish.

2013-05-15 18:07:37

by Don Zickus

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, May 15, 2013 at 07:11:53PM +0200, Peter Zijlstra wrote:
> On Wed, May 15, 2013 at 06:59:15PM +0200, Peter Zijlstra wrote:
> > On Wed, May 15, 2013 at 11:27:02AM -0400, Steven Rostedt wrote:
> > > Right now perf requires a tick, not sure exactly why, but you can look
> > > at the code in perf_event_task_tick(). Thus if NO_HZ_FULL sees that a
> > > perf tick is pending, it won't disable ticks. Unfortunately, the
> > > watchdogs, both NMI and soft lockup, use the perf infrastructure to
> > > trigger NMIs or interrupts. This adds a perf element on the rotate list
> > > and keeps NO_HZ_FULL from *ever* activating.
> > >
> >
> > Hmm.. Stephane had a bunch of patches converting the rotation thing to
> > an hrtimer. I seem to have forgotten what happened to them but I can't
> > seem to find them merged.
> >
> > I'll go look.
> >
> > That leaves the frequency stuff, but the watchdog doesn't use that.
> >
> > At which point we could run the watchdog without perf_event_task_tick().
>
> Found them:
>
> [email protected]
>
> Looks like they were stuck in my inbox and never applied, so I just did.
> They should appear in tip soonish.

That was easy. Next problem! :-p

Thanks Peter!

Cheers,
Don

2013-05-16 08:12:28

by Peter Zijlstra

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Wed, May 15, 2013 at 01:04:01PM -0400, Steven Rostedt wrote:
> On Wed, 2013-05-15 at 18:59 +0200, Peter Zijlstra wrote:
>
> > At which point we could run the watchdog without perf_event_task_tick().
>
> At which point we can drop the disable LOCKUP_DETECTOR when NO_HZ_FULL
> is enabled ;-)
>

Can we? The thing I'm worried about is RCU (of course!). ISTR we rely on RCU
working in NMI context. AFAIR for RCU to work, we need to come out of our magic
NO_HZ state since that would've put RCU into EQS.

Frederic, PaulMck?

2013-05-16 11:38:18

by Frederic Weisbecker

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Thu, May 16, 2013 at 10:10:27AM +0200, Peter Zijlstra wrote:
> On Wed, May 15, 2013 at 01:04:01PM -0400, Steven Rostedt wrote:
> > On Wed, 2013-05-15 at 18:59 +0200, Peter Zijlstra wrote:
> >
> > > At which point we could run the watchdog without perf_event_task_tick().
> >
> > At which point we can drop the disable LOCKUP_DETECTOR when NO_HZ_FULL
> > is enabled ;-)
> >
>
> Can we? The thing I'm worried about is RCU (of course!). ISTR we rely on RCU
> working in NMI context. AFAIR for RCU to work, we need to come out of our magic
> NO_HZ state since that would've put RCU into EQS.
>
> Frederic, PaulMck?

But they are protected inside rcu_nmi_*() functions, that's the only thing we need.
If this interrupts userspace then we resume back to it quickly after the NMI and
re-enter EQS.

No need to restart the tick for that. A remote CPU that wants a quiescent state
from the dyntick CPU will notice soon enough the EQS.

We can certainly drop the perf tick for NMI watchdog:

1) As long as there are no flexible events competing on the CPU, no rotation
should be needed.

2) We don't want event throttling for the watchdog. There is even a hack to
handle that:

/* Callback function for perf event subsystem */
static void watchdog_overflow_callback(struct perf_event *event,
		 struct perf_sample_data *data,
		 struct pt_regs *regs)
{
	/* Ensure the watchdog never gets throttled */
	event->hw.interrupts = 0;
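
Why resetting the interrupt count in the callback avoids throttling can be shown with a simplified user-space model. It assumes an event is throttled once it overflows more than a fixed number of times between ticks, and that a tick clears the count. The limit MODEL_MAX_PER_TICK, struct event_model and both functions are hypothetical. The real perf accounting is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-tick overflow limit for the model. */
#define MODEL_MAX_PER_TICK 3U

struct event_model {
	unsigned int interrupts;	/* overflows seen since the last tick */
	bool throttled;			/* true once the limit was exceeded */
	bool is_watchdog;		/* watchdog events reset their own count */
};

/* One counter overflow. Returns true if the handler got to run. */
bool model_overflow(struct event_model *ev)
{
	assert(ev != NULL);
	if (ev->throttled)
		return false;
	if (++ev->interrupts > MODEL_MAX_PER_TICK) {
		ev->throttled = true;
		return false;
	}
	/* Same trick as the watchdog callback quoted above. */
	if (ev->is_watchdog)
		ev->interrupts = 0;
	return true;
}

/* A tick is assumed to clear the count and unthrottle the event. */
void model_tick(struct event_model *ev)
{
	assert(ev != NULL);
	ev->interrupts = 0;
	ev->throttled = false;
}
```

In the model an ordinary event needs model_tick() to recover after its fourth overflow, while an event with is_watchdog set keeps firing indefinitely without any tick. That is the property that would let the watchdog run without perf_event_task_tick().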

2013-05-16 15:07:17

by Paul E. McKenney

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Thu, May 16, 2013 at 10:10:27AM +0200, Peter Zijlstra wrote:
> On Wed, May 15, 2013 at 01:04:01PM -0400, Steven Rostedt wrote:
> > On Wed, 2013-05-15 at 18:59 +0200, Peter Zijlstra wrote:
> >
> > > At which point we could run the watchdog without perf_event_task_tick().
> >
> > At which point we can drop the disable LOCKUP_DETECTOR when NO_HZ_FULL
> > is enabled ;-)
> >
>
> Can we? The thing I'm worried about is RCU (of course!). ISTR we rely on RCU
> working in NMI context. AFAIR for RCU to work, we need to come out of our magic
> NO_HZ state since that would've put RCU into EQS.
>
> Frederic, PaulMck?

Not sure I understand the question, but hopefully the verbiage below helps.

Only RCU read-side critical sections need to work in NMI context,
and RCU hooks into nmi_enter() and nmi_exit() to handle this, and this
will work in NO_HZ_FULL in the same way that it works for NO_HZ_IDLE.

But if there are no NMIs, RCU doesn't care. In other words, RCU needs
to know about NMIs so that it can deal with any RCU read-side critical
sections in the NMI handlers, but RCU doesn't rely on NMIs happening at
any particular time or frequency.

Thanx, Paul
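
The nmi_enter()/nmi_exit() hooks described above can be illustrated with a small user-space model. The point it tries to show is that an NMI moves the CPU out of the extended quiescent state (EQS) and back again without touching the tick. The structure, the even/odd counter convention and all function names are assumptions made for this sketch, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU model; field names are illustrative only. */
struct cpu_model {
	unsigned long dynticks;	/* model convention: even = EQS, odd = RCU watching */
	int nmi_nesting;	/* depth of NMI handlers currently running */
	bool nmi_left_eqs;	/* did the outermost NMI pull the CPU out of EQS? */
	bool tick_stopped;	/* never modified by the NMI hooks below */
};

bool model_rcu_watching(const struct cpu_model *cpu)
{
	return (cpu->dynticks & 1UL) != 0;
}

/* On NMI entry, RCU must start watching this CPU if it was in EQS. */
void model_nmi_enter(struct cpu_model *cpu)
{
	if (cpu->nmi_nesting++ == 0 && !model_rcu_watching(cpu)) {
		cpu->dynticks++;
		cpu->nmi_left_eqs = true;
	}
}

/* On exit from the outermost NMI, return to EQS if we came from there. */
void model_nmi_exit(struct cpu_model *cpu)
{
	assert(cpu->nmi_nesting > 0);
	if (--cpu->nmi_nesting == 0 && cpu->nmi_left_eqs) {
		cpu->dynticks++;
		cpu->nmi_left_eqs = false;
	}
}
```

Starting from a CPU in EQS with tick_stopped set, an enter/exit pair makes RCU watch the CPU only for the duration of the handler and leaves tick_stopped untouched.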

2013-05-16 17:57:56

by Peter Zijlstra

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Thu, May 16, 2013 at 08:07:06AM -0700, Paul E. McKenney wrote:
> On Thu, May 16, 2013 at 10:10:27AM +0200, Peter Zijlstra wrote:
> > On Wed, May 15, 2013 at 01:04:01PM -0400, Steven Rostedt wrote:
> > > On Wed, 2013-05-15 at 18:59 +0200, Peter Zijlstra wrote:
> > >
> > > > At which point we could run the watchdog without perf_event_task_tick().
> > >
> > > At which point we can drop the disable LOCKUP_DETECTOR when NO_HZ_FULL
> > > is enabled ;-)
> > >
> >
> > Can we? The thing I'm worried about is RCU (of course!). ISTR we rely on RCU
> > working in NMI context. AFAIR for RCU to work, we need to come out of our magic
> > NO_HZ state since that would've put RCU into EQS.
> >
> > Frederic, PaulMck?
>
> Not sure I understand the question, but hopefully the verbiage below helps.
>
> Only RCU read-side critical sections need to work in NMI context,
> and RCU hooks into nmi_enter() and nmi_exit() to handle this, and this
> will work in NO_HZ_FULL in the same way that it works for NO_HZ_IDLE.
>
> But if there are no NMIs, RCU doesn't care. In other words, RCU needs
> to know about NMIs so that it can deal with any RCU read-side critical
> sections in the NMI handlers, but RCU doesn't rely on NMIs happening at
> any particular time or frequency.

I suppose the fundamental question was: will receiving NMIs negate NO_HZ_FULL's
functionality? That is, will the getting of NMIs make us drop out of NO_HZ_FULL
and re-enable all sorts of things?

Because clearly RCU needs to exit from EQS, which might (or might not) mean
leaving NO_HZ_FULL.

I'm not entirely up-to-date on those details.

2013-05-16 17:59:11

by Peter Zijlstra

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Thu, May 16, 2013 at 01:38:12PM +0200, Frederic Weisbecker wrote:
> On Thu, May 16, 2013 at 10:10:27AM +0200, Peter Zijlstra wrote:
> > On Wed, May 15, 2013 at 01:04:01PM -0400, Steven Rostedt wrote:
> > > On Wed, 2013-05-15 at 18:59 +0200, Peter Zijlstra wrote:
> > >
> > > > At which point we could run the watchdog without perf_event_task_tick().
> > >
> > > At which point we can drop the disable LOCKUP_DETECTOR when NO_HZ_FULL
> > > is enabled ;-)
> > >
> >
> > Can we? The thing I'm worried about is RCU (of course!). ISTR we rely on RCU
> > working in NMI context. AFAIR for RCU to work, we need to come out of our magic
> > NO_HZ state since that would've put RCU into EQS.
> >
> > Frederic, PaulMck?
>
> But they are protected inside rcu_nmi_*() functions, that's the only thing we need.
> If this interrupts userspace then we resume back to it quickly after the NMI and
> re-enter EQS.
>
> No need to restart the tick for that. A remote CPU that wants a quiescent state
> from the dyntick CPU will notice soon enough the EQS.

Right.. I just wasn't sure how much damage the RCU EQS stop/(re)start did to
the entire NO_HZ_FULL situation.

2013-05-16 18:33:05

by Steven Rostedt

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Thu, 2013-05-16 at 19:56 +0200, Peter Zijlstra wrote:

> I suppose the fundamental question was: will receiving NMIs negate NO_HZ_FULL's
> functionality? That is, will the getting of NMIs make us drop out of NO_HZ_FULL
> and re-enable all sorts of things?

It shouldn't. The nmi_enter() notifies RCU that it can no longer ignore
this CPU, whereas nmi_exit() tells it that it can ignore it again, as it has
re-entered user space.

>
> Because clearly RCU needs to exit from EQS, which might (or might not) mean
> leaving NO_HZ_FULL.

Yep, but the two are pretty much agnostic from each other.

We only need to leave NO_HZ_FULL if RCU (or anything for that matter)
required having a tick again. But as Paul said, getting an NMI in idle
won't restart the tick, so there's no need to restart it here either.

Now if an NMI were to do a call_rcu() then it would require a tick. But
NMIs doing call_rcu() has much bigger issues to worry about ;-)

-- Steve

>
> I'm not entirely up-to-date on those details.

2013-05-16 23:14:31

by Frederic Weisbecker

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Thu, May 16, 2013 at 02:32:58PM -0400, Steven Rostedt wrote:
> On Thu, 2013-05-16 at 19:56 +0200, Peter Zijlstra wrote:
>
> > I suppose the fundamental question was: will receiving NMIs negate NO_HZ_FULL's
> > functionality? That is, will the getting of NMIs make us drop out of NO_HZ_FULL
> > and re-enable all sorts of things?
>
> It shouldn't. The nmi_enter() notifies RCU that it can no longer ignore
> this CPU, whereas nmi_exit() tells it that it can ignore it again, as it has
> re-entered user space.
>
> >
> > Because clearly RCU needs to exit from EQS, which might (or might not) mean
> > leaving NO_HZ_FULL.
>
> Yep, but the two are pretty much agnostic from each other.
>
> We only need to leave NO_HZ_FULL if RCU (or anything for that matter)
> required having a tick again. But as Paul said, getting an NMI in idle
> won't restart the tick, so there's no need to restart it here either.
>
> Now if an NMI were to do a call_rcu() then it would require a tick. But
> NMIs doing call_rcu() has much bigger issues to worry about ;-)

Actually even calling call_rcu() won't restart the tick because the callback
and the grace period lifecycle that come along are handled by the RCU nocb
kthreads. If you have migrated these kthreads accordingly this is handled in the
housekeeping CPU. Of course calling call_rcu() from an NMI involves more problems ;)

In fact we never need to restart the tick for RCU. Even round-trips in the kernel
that are potentially longer than irqs/nmis, such as IO syscalls/exceptions, are
fine because they are either actually short and quickly return to user mode, or they
sleep and go idle so the result is the same: RCU idle mode.

There is just a possible exception that is not yet completely handled: if
a task stays in the kernel too long without sleeping, it may extend a
grace period dangerously (there is no tick to report quiescent states).
In this case we should restart the tick. This is only half implemented
currently: RCU sends IPIs to CPUs that cause these excessive grace period
extensions, but the CPU that receives that IPI doesn't yet detect the issue
and doesn't restart the tick. That's in the TODO list.

2013-05-17 07:41:08

by Paul E. McKenney

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Thu, May 16, 2013 at 07:56:02PM +0200, Peter Zijlstra wrote:
> On Thu, May 16, 2013 at 08:07:06AM -0700, Paul E. McKenney wrote:
> > On Thu, May 16, 2013 at 10:10:27AM +0200, Peter Zijlstra wrote:
> > > On Wed, May 15, 2013 at 01:04:01PM -0400, Steven Rostedt wrote:
> > > > On Wed, 2013-05-15 at 18:59 +0200, Peter Zijlstra wrote:
> > > >
> > > > > At which point we could run the watchdog without perf_event_task_tick().
> > > >
> > > > At which point we can drop the disable LOCKUP_DETECTOR when NO_HZ_FULL
> > > > is enabled ;-)
> > > >
> > >
> > > Can we? The thing I'm worried about is RCU (of course!). ISTR we rely on RCU
> > > working in NMI context. AFAIR for RCU to work, we need to come out of our magic
> > > NO_HZ state since that would've put RCU into EQS.
> > >
> > > Frederic, PaulMck?
> >
> > Not sure I understand the question, but hopefully the verbiage below helps.
> >
> > Only RCU read-side critical sections need to work in NMI context,
> > and RCU hooks into nmi_enter() and nmi_exit() to handle this, and this
> > will work in NO_HZ_FULL in the same way that it works for NO_HZ_IDLE.
> >
> > But if there are no NMIs, RCU doesn't care. In other words, RCU needs
> > to know about NMIs so that it can deal with any RCU read-side critical
> > sections in the NMI handlers, but RCU doesn't rely on NMIs happening at
> > any particular time or frequency.
>
> I suppose the fundamental question was: will receiving NMIs negate NO_HZ_FULL's
> functionality? That is, will the getting of NMIs make us drop out of NO_HZ_FULL
> and re-enable all sorts of things?
>
> Because clearly RCU needs to exit from EQS, which might (or might not) mean
> leaving NO_HZ_FULL.
>
> I'm not entirely up-to-date on those details.

My belief is that NMIs won't cause NO_HZ_FULL to kick that CPU out of
adaptive-ticks mode, but I must defer to Frederic on that.

Of course, the NMI -will- cause OS jitter on whichever CPU handles it,
which some people would want to avoid.

Thanx, Paul

2013-05-19 16:18:51

by Paul E. McKenney

Subject: Re: [PATCH 1/2] nohz: Disable LOCKUP_DETECTOR when NO_HZ_FULL is enabled

On Thu, May 16, 2013 at 02:32:58PM -0400, Steven Rostedt wrote:
> On Thu, 2013-05-16 at 19:56 +0200, Peter Zijlstra wrote:
>
> > I suppose the fundamental question was: will receiving NMIs negate NO_HZ_FULL's
> > functionality? That is, will the getting of NMIs make us drop out of NO_HZ_FULL
> > and re-enable all sorts of things?
>
> It shouldn't. The nmi_enter() notifies RCU that it can no longer ignore
> this CPU, whereas nmi_exit() tells it that it can ignore it again, as it has
> re-entered user space.
>
> >
> > Because clearly RCU needs to exit from EQS, which might (or might not) mean
> > leaving NO_HZ_FULL.
>
> Yep, but the two are pretty much agnostic from each other.
>
> We only need to leave NO_HZ_FULL if RCU (or anything for that matter)
> required having a tick again. But as Paul said, getting an NMI in idle
> won't restart the tick, so there's no need to restart it here either.
>
> Now if an NMI were to do a call_rcu() then it would require a tick. But
> NMIs doing call_rcu() has much bigger issues to worry about ;-)

Someone invoking call_rcu() from an NMI handler will get what they deserve,
good and hard! ;-)

Thanx, Paul

> -- Steve
>
> >
> > I'm not entirely up-to-date on those details.