2017-06-20 21:34:10

by Liang, Kan

[permalink] [raw]
Subject: [PATCH] kernel/watchdog: fix spurious hard lockups

From: Kan Liang <[email protected]>

Some users reported spurious NMI watchdog timeouts.

We now have more and more systems where the Turbo range is wide enough
that the NMI watchdog expires faster than the soft watchdog timer that
updates the interrupt tick the NMI watchdog relies on.

This problem was originally added by commit 58687acba592
("lockup_detector: Combine nmi_watchdog and softlockup detector").
Previously the NMI watchdog would always check jiffies, which were
ticking fast enough. But now the backing is quite slow so the expire
time becomes more sensitive.

For mainline the right fix is to switch the NMI watchdog to reference
cycles, which tick always at the same rate independent of turbo mode.
But this is requires some complicated changes in perf, which are too
difficult to backport. Since we need a stable fix too just increase the
NMI watchdog rate here to avoid the spurious timeouts. This is not an
ideal fix because a 3x as large Turbo range could still fail, but for
now that's not likely.

Signed-off-by: Kan Liang <[email protected]>
Cc: [email protected]
Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and
softlockup detector")
---

The right fix for mainline can be found here.
perf/x86/intel: enable CPU ref_cycles for GP counter
perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
https://patchwork.kernel.org/patch/9779087/
https://patchwork.kernel.org/patch/9779089/

kernel/watchdog_hld.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 54a427d1f344..0f7c6e758b82 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -164,7 +164,7 @@ int watchdog_nmi_enable(unsigned int cpu)
firstcpu = 1;

wd_attr = &wd_hw_attr;
- wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+ wd_attr->sample_period = 3 * hw_nmi_get_sample_period(watchdog_thresh);

/* Try to register using hardware perf events */
event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
--
2.11.0


2017-06-20 22:04:02

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups

On Tue, 20 Jun 2017 14:33:09 -0700 [email protected] wrote:

> From: Kan Liang <[email protected]>
>
> Some users reported spurious NMI watchdog timeouts.
>
> We now have more and more systems where the Turbo range is wide enough
> that the NMI watchdog expires faster than the soft watchdog timer that
> updates the interrupt tick the NMI watchdog relies on.
>
> This problem was originally added by commit 58687acba592
> ("lockup_detector: Combine nmi_watchdog and softlockup detector").
> Previously the NMI watchdog would always check jiffies, which were
> ticking fast enough. But now the backing is quite slow so the expire
> time becomes more sensitive.
>
> For mainline the right fix is to switch the NMI watchdog to reference
> cycles, which tick always at the same rate independent of turbo mode.
> But this is requires some complicated changes in perf, which are too
> difficult to backport. Since we need a stable fix too just increase the
> NMI watchdog rate here to avoid the spurious timeouts. This is not an
> ideal fix because a 3x as large Turbo range could still fail, but for
> now that's not likely.
>
> ...
>
> The right fix for mainline can be found here.
> perf/x86/intel: enable CPU ref_cycles for GP counter
> perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
> https://patchwork.kernel.org/patch/9779087/
> https://patchwork.kernel.org/patch/9779089/

Presumably the "right fix" will later be altered to revert this
one-line workaround?

2017-06-20 22:34:27

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups



On 06/20/2017 05:33 PM, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Some users reported spurious NMI watchdog timeouts.
>
> We now have more and more systems where the Turbo range is wide enough
> that the NMI watchdog expires faster than the soft watchdog timer that
> updates the interrupt tick the NMI watchdog relies on.
>

Hmm ... odd that I haven't seen this. We're running a pretty wide
variety of systems here. Do you have a reproducer? I'd like to see
this occur on production HW.

P.

2017-06-20 23:00:15

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups

On Tue, Jun 20, 2017 at 06:34:23PM -0400, Prarit Bhargava wrote:
>
>
> On 06/20/2017 05:33 PM, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > Some users reported spurious NMI watchdog timeouts.
> >
> > We now have more and more systems where the Turbo range is wide enough
> > that the NMI watchdog expires faster than the soft watchdog timer that
> > updates the interrupt tick the NMI watchdog relies on.
> >
>
> Hmm ... odd that I haven't seen this. We're running a pretty wide
> variety of systems here. Do you have a reproducer? I'd like to see
> this occur on production HW.

It only happens on a few specific CPU SKUs with a very wide Turbo range.
Reproducer is typically some stress workload that turbos very high.

-Andi

2017-06-21 00:12:10

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups



On 06/20/2017 07:00 PM, Andi Kleen wrote:
> On Tue, Jun 20, 2017 at 06:34:23PM -0400, Prarit Bhargava wrote:
>>
>>
>> On 06/20/2017 05:33 PM, [email protected] wrote:
>>> From: Kan Liang <[email protected]>
>>>
>>> Some users reported spurious NMI watchdog timeouts.
>>>
>>> We now have more and more systems where the Turbo range is wide enough
>>> that the NMI watchdog expires faster than the soft watchdog timer that
>>> updates the interrupt tick the NMI watchdog relies on.
>>>
>>
>> Hmm ... odd that I haven't seen this. We're running a pretty wide
>> variety of systems here. Do you have a reproducer? I'd like to see
>> this occur on production HW.
>
> It only happens on a few specific CPU SKUs with a very wide Turbo range.

Which ones?

> Reproducer is typically some stress workload that turbos very high.

So stress the single Turbo Max core? Or any core?

P.

>
> -Andi
>

2017-06-21 12:40:33

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH] kernel/watchdog: fix spurious hard lockups


> >
> > The right fix for mainline can be found here.
> > perf/x86/intel: enable CPU ref_cycles for GP counter perf/x86/intel,
> > watchdog: Switch NMI watchdog to ref cycles on x86
> > https://patchwork.kernel.org/patch/9779087/
> > https://patchwork.kernel.org/patch/9779089/
>
> Presumably the "right fix" will later be altered to revert this one-line
> workaround?

The "right fix" itself will not touch the watchdog rate. I will modify the
changelog to notify the people who want to do the backport.

As my understanding, it's not harmful even if we don't revert the
workaround. It can still detect the hardlockup, only takes
a tiny bit longer.

Thanks,
Kan

2017-06-21 13:41:00

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups

On Tue, Jun 20, 2017 at 02:33:09PM -0700, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Some users reported spurious NMI watchdog timeouts.
>
> We now have more and more systems where the Turbo range is wide enough
> that the NMI watchdog expires faster than the soft watchdog timer that
> updates the interrupt tick the NMI watchdog relies on.
>
> This problem was originally added by commit 58687acba592
> ("lockup_detector: Combine nmi_watchdog and softlockup detector").
> Previously the NMI watchdog would always check jiffies, which were
> ticking fast enough. But now the backing is quite slow so the expire
> time becomes more sensitive.
>
> For mainline the right fix is to switch the NMI watchdog to reference
> cycles, which tick always at the same rate independent of turbo mode.
> But this is requires some complicated changes in perf, which are too
> difficult to backport. Since we need a stable fix too just increase the
> NMI watchdog rate here to avoid the spurious timeouts. This is not an
> ideal fix because a 3x as large Turbo range could still fail, but for
> now that's not likely.

As this is an Intel problem, we should at least restrict it to
arch/x86/kernel/apic/hw_nmi.c. I don't want to penalize other arches yet.

>
> Signed-off-by: Kan Liang <[email protected]>
> Cc: [email protected]
> Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and
> softlockup detector")
> ---
>
> The right fix for mainline can be found here.
> perf/x86/intel: enable CPU ref_cycles for GP counter
> perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
> https://patchwork.kernel.org/patch/9779087/
> https://patchwork.kernel.org/patch/9779089/

Does that mean this fix is restricted to just -stable then? Otherwise I am
confused why we should take this patch, if you have a better fix above.

Cheers,
Don

>
> kernel/watchdog_hld.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 54a427d1f344..0f7c6e758b82 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -164,7 +164,7 @@ int watchdog_nmi_enable(unsigned int cpu)
> firstcpu = 1;
>
> wd_attr = &wd_hw_attr;
> - wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
> + wd_attr->sample_period = 3 * hw_nmi_get_sample_period(watchdog_thresh);
>
> /* Try to register using hardware perf events */
> event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
> --
> 2.11.0
>

2017-06-21 13:47:53

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups

On Wed, Jun 21, 2017 at 12:40:28PM +0000, Liang, Kan wrote:
>
> > >
> > > The right fix for mainline can be found here.
> > > perf/x86/intel: enable CPU ref_cycles for GP counter perf/x86/intel,
> > > watchdog: Switch NMI watchdog to ref cycles on x86
> > > https://patchwork.kernel.org/patch/9779087/
> > > https://patchwork.kernel.org/patch/9779089/
> >
> > Presumably the "right fix" will later be altered to revert this one-line
> > workaround?
>
> The "right fix" itself will not touch the watchdog rate. I will modify the
> changelog to notify the people who want to do the backport.
>
> As my understanding, it's not harmful even if we don't revert the
> workaround. It can still detect the hardlockup, only takes
> a tiny bit longer.

It depends on you perspective of harmful. :-) There are folks that would
like that sampling rate to be more accurate, so they can detect problems
soon than later. You just took an input of 'watchdog_thresh' and blindly
multiplied it by 3, which can confuse an end user who thought they setup a 5
second threshold but instead it turned into a 15 second one. :-(

Cheers,
Don

2017-06-21 14:02:37

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH] kernel/watchdog: fix spurious hard lockups

>
> On Tue, Jun 20, 2017 at 02:33:09PM -0700, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > Some users reported spurious NMI watchdog timeouts.
> >
> > We now have more and more systems where the Turbo range is wide
> enough
> > that the NMI watchdog expires faster than the soft watchdog timer that
> > updates the interrupt tick the NMI watchdog relies on.
> >
> > This problem was originally added by commit 58687acba592
> > ("lockup_detector: Combine nmi_watchdog and softlockup detector").
> > Previously the NMI watchdog would always check jiffies, which were
> > ticking fast enough. But now the backing is quite slow so the expire
> > time becomes more sensitive.
> >
> > For mainline the right fix is to switch the NMI watchdog to reference
> > cycles, which tick always at the same rate independent of turbo mode.
> > But this is requires some complicated changes in perf, which are too
> > difficult to backport. Since we need a stable fix too just increase
> > the NMI watchdog rate here to avoid the spurious timeouts. This is not
> > an ideal fix because a 3x as large Turbo range could still fail, but
> > for now that's not likely.
>
> As this is an Intel problem, we should at least restrict it to
> arch/x86/kernel/apic/hw_nmi.c. I don't want to penalize other arches yet.
>

Sure, I will modify the patch.

> >
> > Signed-off-by: Kan Liang <[email protected]>
> > Cc: [email protected]
> > Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and
> > softlockup detector")
> > ---
> >
> > The right fix for mainline can be found here.
> > perf/x86/intel: enable CPU ref_cycles for GP counter perf/x86/intel,
> > watchdog: Switch NMI watchdog to ref cycles on x86
> > https://patchwork.kernel.org/patch/9779087/
> > https://patchwork.kernel.org/patch/9779089/
>
> Does that mean this fix is restricted to just -stable then? Otherwise I am
> confused why we should take this patch, if you have a better fix above.
>

The "real fix" is still under discussion. It includes some complicated changes, and
also breaks perf RDPMC users. It may take some times to be merged.
This issue is critical, because we already had some reports from users.
It's better that we can have this patch for both -stable and mainline.

Thanks,
Kan

> Cheers,
> Don
>
> >
> > kernel/watchdog_hld.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c index
> > 54a427d1f344..0f7c6e758b82 100644
> > --- a/kernel/watchdog_hld.c
> > +++ b/kernel/watchdog_hld.c
> > @@ -164,7 +164,7 @@ int watchdog_nmi_enable(unsigned int cpu)
> > firstcpu = 1;
> >
> > wd_attr = &wd_hw_attr;
> > - wd_attr->sample_period =
> hw_nmi_get_sample_period(watchdog_thresh);
> > + wd_attr->sample_period = 3 *
> > +hw_nmi_get_sample_period(watchdog_thresh);
> >
> > /* Try to register using hardware perf events */
> > event = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
> > watchdog_overflow_callback, NULL);
> > --
> > 2.11.0
> >

2017-06-21 14:10:34

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH] kernel/watchdog: fix spurious hard lockups



.
> On Wed, Jun 21, 2017 at 12:40:28PM +0000, Liang, Kan wrote:
> >
> > > >
> > > > The right fix for mainline can be found here.
> > > > perf/x86/intel: enable CPU ref_cycles for GP counter
> > > > perf/x86/intel,
> > > > watchdog: Switch NMI watchdog to ref cycles on x86
> > > > https://patchwork.kernel.org/patch/9779087/
> > > > https://patchwork.kernel.org/patch/9779089/
> > >
> > > Presumably the "right fix" will later be altered to revert this
> > > one-line workaround?
> >
> > The "right fix" itself will not touch the watchdog rate. I will modify
> > the changelog to notify the people who want to do the backport.
> >
> > As my understanding, it's not harmful even if we don't revert the
> > workaround. It can still detect the hardlockup, only takes a tiny bit
> > longer.
>
> It depends on you perspective of harmful. :-) There are folks that would like
> that sampling rate to be more accurate, so they can detect problems soon
> than later. You just took an input of 'watchdog_thresh' and blindly
> multiplied it by 3, which can confuse an end user who thought they setup a 5
> second threshold but instead it turned into a 15 second one. :-(

Now, it cannot get accurate threshold because of the Turbo.
We can only get an accurate one after applying the "right fix".
Right, we should revert the workaround once the "right fix" is applied.

Thanks,
Kan


2017-06-21 21:02:57

by Marc Herbert

[permalink] [raw]
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups

On 20/06/17 17:12, Prarit Bhargava wrote:
>>> Hmm ... odd that I haven't seen this. We're running a pretty wide
>>> variety of systems here. Do you have a reproducer? I'd like to see
>>> this occur on production HW.

"Production" is where this patch was born and still lives right now:

https://chromium-review.googlesource.com/c/506327/

>> It only happens on a few specific CPU SKUs with a very wide Turbo range.
>
> Which ones?
>

The ones with turbo mode > 2.5 x TSC_MHz when you stress them hard and long
enough. Simple maths: just compare the soft and hard timers in the code.

The factor 3 moves the condition to: turbo > 7.5 x TSC_MHz.

2017-06-28 11:41:06

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups

On Tue 20-06-17 14:33:09, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Some users reported spurious NMI watchdog timeouts.
>
> We now have more and more systems where the Turbo range is wide enough
> that the NMI watchdog expires faster than the soft watchdog timer that
> updates the interrupt tick the NMI watchdog relies on.

AFAIR the watchdog doesn't rely on deferred timers so this would suggest
that a standard hrtimer can expire much later than programmed, right?
If that is the case how come other parts of the system do not break. We
do rely on hrtimers on many other places?
--
Michal Hocko
SUSE Labs

2017-06-28 13:24:24

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH] kernel/watchdog: fix spurious hard lockups


> > From: Kan Liang <[email protected]>
> >
> > Some users reported spurious NMI watchdog timeouts.
> >
> > We now have more and more systems where the Turbo range is wide
> enough
> > that the NMI watchdog expires faster than the soft watchdog timer that
> > updates the interrupt tick the NMI watchdog relies on.
>
> AFAIR the watchdog doesn't rely on deferred timers so this would suggest
> that a standard hrtimer can expire much later than programmed, right?

The softlockup watchdog relies on hrtimers.
The hardlockup watchdog (NMI watchdog) relies on perf subsystem and
using unhalted CPU cycles.
When the softlockup watchdog expires, it updates the hrtimer_interrupts.
When the NMI watchdog expires, it will check the hrtimer_interrupts, and
determine if it's a hardlockup.
The design was to make the softlockup watchdog runs with 2.5 times the
rate of NMI watchdog. So it guarantees that the hrtimer_interrupts is
updated before the NMI watchdog expires.
That works well if Turbo-Mode is disabled.
However, when Turbo-Mode is enabled, unhalted CPU cycles might run
much faster than expected, even faster than softlockup watchdog.
So the softlockup watchdog will not get a chance to update the
hrtimer_interrupts, which will trigger false positives.


Thanks,
Kan

> If that is the case how come other parts of the system do not break. We do
> rely on hrtimers on many other places?
> --
> Michal Hocko
> SUSE Labs

2017-06-28 14:30:01

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] kernel/watchdog: fix spurious hard lockups

On Wed 28-06-17 13:24:08, Liang, Kan wrote:
>
> > > From: Kan Liang <[email protected]>
> > >
> > > Some users reported spurious NMI watchdog timeouts.
> > >
> > > We now have more and more systems where the Turbo range is wide
> > enough
> > > that the NMI watchdog expires faster than the soft watchdog timer that
> > > updates the interrupt tick the NMI watchdog relies on.
> >
> > AFAIR the watchdog doesn't rely on deferred timers so this would suggest
> > that a standard hrtimer can expire much later than programmed, right?
>
> The softlockup watchdog relies on hrtimers.
> The hardlockup watchdog (NMI watchdog) relies on perf subsystem and
> using unhalted CPU cycles.
> When the softlockup watchdog expires, it updates the hrtimer_interrupts.
> When the NMI watchdog expires, it will check the hrtimer_interrupts, and
> determine if it's a hardlockup.
> The design was to make the softlockup watchdog runs with 2.5 times the
> rate of NMI watchdog. So it guarantees that the hrtimer_interrupts is
> updated before the NMI watchdog expires.
> That works well if Turbo-Mode is disabled.
> However, when Turbo-Mode is enabled, unhalted CPU cycles might run
> much faster than expected, even faster than softlockup watchdog.
> So the softlockup watchdog will not get a chance to update the
> hrtimer_interrupts, which will trigger false positives.

So it is not the hrtimer which doesn't fire but rather the NMI events
fire too quickly, right?
--
Michal Hocko
SUSE Labs

2017-06-28 14:32:38

by Liang, Kan

[permalink] [raw]
Subject: RE: [PATCH] kernel/watchdog: fix spurious hard lockups


> > > > From: Kan Liang <[email protected]>
> > > >
> > > > Some users reported spurious NMI watchdog timeouts.
> > > >
> > > > We now have more and more systems where the Turbo range is wide
> > > enough
> > > > that the NMI watchdog expires faster than the soft watchdog timer
> > > > that updates the interrupt tick the NMI watchdog relies on.
> > >
> > > AFAIR the watchdog doesn't rely on deferred timers so this would
> > > suggest that a standard hrtimer can expire much later than programmed,
> right?
> >
> > The softlockup watchdog relies on hrtimers.
> > The hardlockup watchdog (NMI watchdog) relies on perf subsystem and
> > using unhalted CPU cycles.
> > When the softlockup watchdog expires, it updates the hrtimer_interrupts.
> > When the NMI watchdog expires, it will check the hrtimer_interrupts,
> > and determine if it's a hardlockup.
> > The design was to make the softlockup watchdog runs with 2.5 times the
> > rate of NMI watchdog. So it guarantees that the hrtimer_interrupts is
> > updated before the NMI watchdog expires.
> > That works well if Turbo-Mode is disabled.
> > However, when Turbo-Mode is enabled, unhalted CPU cycles might run
> > much faster than expected, even faster than softlockup watchdog.
> > So the softlockup watchdog will not get a chance to update the
> > hrtimer_interrupts, which will trigger false positives.
>
> So it is not the hrtimer which doesn't fire but rather the NMI events fire too
> quickly, right?

Yes, the NMI fires too quickly.

Thanks,
Kan