2017-06-02 20:15:04

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Tue, May 30, 2017 at 11:26:58AM +1000, Nicholas Piggin wrote:
> Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
> HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
>
> LOCKUP_DETECTOR provides the boot, sysctl, and programming interfaces
> for lockup detectors. An architecture that defines HAVE_NMI_WATCHDOG
> need not use this if it has a very basic watchdog or uses its own
> options and interfaces (e.g., sparc). touch_nmi_watchdog() will
> continue to call their arch_touch_nmi_watchdog().
>
> HARDLOCKUP_DETECTOR_PERF is the perf-based lockup detector.
>
> HARDLOCKUP_DETECTOR is the framework for arch NMI_WATCHDOG hard lockup
> detectors that conform to the LOCKUP_DETECTOR interfaces.

Hi Nick,

Sorry for the late response. I did some sanity testing on your patches on
x86_64 and they seem to work fine. I don't think I have any real issues with
the patches (without making time-consuming cleanup changes).

My last concern is wrapping my head around the config options.

HAVE_NMI_WATCHDOG seems to have a dual meaning, I think.

In the sparc case, it uses the HARDLOCKUP_DETECTOR framework (hence the
original split out of watchdog_hld.c). Actually more like the
SOFTLOCKUP_DETECTOR framework, of which HARDLOCKUP_DETECTOR is a part.

In your ppc64 case, it means do _not_ use the HARDLOCKUP_DETECTOR or
SOFTLOCKUP_DETECTOR framework. Instead just the bare bones LOCKUP_DETECTOR.


If so, the following is a little confusing to me..


<snip>

> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index e4587ebe52c7..a3afe3e10278 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -801,10 +801,27 @@ config LOCKUP_DETECTOR
> The frequency of hrtimer and NMI events and the soft and hard lockup
> thresholds can be controlled through the sysctl watchdog_thresh.
>
> +config SOFTLOCKUP_DETECTOR
> + bool "Detect Soft Lockups"
> + depends on LOCKUP_DETECTOR
> +
> +config HARDLOCKUP_DETECTOR_PERF
> + bool
> + select SOFTLOCKUP_DETECTOR

Perhaps add a 'depends on (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)'

> +
> +#
> +# arch/ can define HAVE_NMI_WATCHDOG to provide their own NMI watchdog
> +# rather than the perf based detector.
> +#
> +# The arch may do its own thing, or select HARDLOCKUP_DETECTOR, in which
> +# case it should conform to HARDLOCKUP_DETECTOR interfaces and settings
> +# (e.g., sysctl and cmdline).
> +#
> config HARDLOCKUP_DETECTOR
> - def_bool y
> - depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> - depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> + bool "Detect Hard Lockups"
> + depends on LOCKUP_DETECTOR
> + depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)
> + select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG

Here is my confusion with HAVE_NMI_WATCHDOG

It seems like you can only select HARDLOCKUP_DETECTOR if !HAVE_NMI_WATCHDOG,
which would break sparc, I think.

And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
dependency on !HAVE_NMI_WATCHDOG???

Yeah, the config options are confusing.

If the above is right then we might need something like

depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI

(to replace the depends on !HAVE_NMI_WATCHDOG.. line)
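
In other words, the whole entry would then read roughly like this (untested
sketch, just spelling out the idea):

config HARDLOCKUP_DETECTOR
	bool "Detect Hard Lockups"
	depends on LOCKUP_DETECTOR
	depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
	select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG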

Cheers,
Don

>
> config BOOTPARAM_HARDLOCKUP_PANIC
> bool "Panic (Reboot) On Hard Lockups"
> @@ -826,7 +843,7 @@ config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
>
> config BOOTPARAM_SOFTLOCKUP_PANIC
> bool "Panic (Reboot) On Soft Lockups"
> - depends on LOCKUP_DETECTOR
> + depends on SOFTLOCKUP_DETECTOR
> help
> Say Y here to enable the kernel to panic on "soft lockups",
> which are bugs that cause the kernel to loop in kernel
> @@ -843,7 +860,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>
> config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
> int
> - depends on LOCKUP_DETECTOR
> + depends on SOFTLOCKUP_DETECTOR
> range 0 1
> default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
> default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
> @@ -851,7 +868,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
> config DETECT_HUNG_TASK
> bool "Detect Hung Tasks"
> depends on DEBUG_KERNEL
> - default LOCKUP_DETECTOR
> + default SOFTLOCKUP_DETECTOR
> help
> Say Y here to enable the kernel to detect "hung tasks",
> which are bugs that cause the task to be stuck in
> --
> 2.11.0
>


2017-06-03 06:10:23

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Fri, 2 Jun 2017 16:15:00 -0400
Don Zickus <[email protected]> wrote:

> On Tue, May 30, 2017 at 11:26:58AM +1000, Nicholas Piggin wrote:
> > Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
> > HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
> >
> > LOCKUP_DETECTOR provides the boot, sysctl, and programming interfaces
> > for lockup detectors. An architecture that defines HAVE_NMI_WATCHDOG
> > need not use this if it has a very basic watchdog or uses its own
> > options and interfaces (e.g., sparc). touch_nmi_watchdog() will
> > continue to call their arch_touch_nmi_watchdog().
> >
> > HARDLOCKUP_DETECTOR_PERF is the perf-based lockup detector.
> >
> > HARDLOCKUP_DETECTOR is the framework for arch NMI_WATCHDOG hard lockup
> > detectors that conform to the LOCKUP_DETECTOR interfaces.
>
> Hi Nick,
>
> Sorry for the late response. I did some sanity testing on your patches on
> x86_64 and they seem to work fine. I don't think I have any real issues with
> the patches (without making time-consuming cleanup changes).
>
> My last concern is wrapping my head around the config options.
>
> HAVE_NMI_WATCHDOG seems to have a dual meaning, I think.

Yeah it's not the clearest. I think we need another pass over config
options to start straightening them out.

It means the arch has a hardlockup detector, so it has the
arch_touch_nmi_watchdog() and you can't also select the perf HLD.

> In the sparc case, it uses the HARDLOCKUP_DETECTOR framework (hence the
> original split out of watchdog_hld.c). Actually more like the
> SOFTLOCKUP_DETECTOR framework, of which HARDLOCKUP_DETECTOR is a part.

Well, yes, it uses some of the start/stop framework for the SLD, but
doesn't use much of the lockup detector stuff beyond that (it does not use
most of the boot options, sysctl parameters, etc.). So sparc
is a little odd.

I would hope to convert it over to something more like the powerpc patch and
make it a first-class HLD, but it seems not all options are 100% compatible
so it would need some careful testing.

>
> In your ppc64 case, it means do _not_ use the HARDLOCKUP_DETECTOR or
> SOFTLOCKUP_DETECTOR framework. Instead just the bare bones LOCKUP_DETECTOR.

It does use the HLD framework. The subsequent patch for powerpc adds
a PPC64 case in the dependencies.

>
>
> If so, the following is a little confusing to me..
>
>
> <snip>
>
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index e4587ebe52c7..a3afe3e10278 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -801,10 +801,27 @@ config LOCKUP_DETECTOR
> > The frequency of hrtimer and NMI events and the soft and hard lockup
> > thresholds can be controlled through the sysctl watchdog_thresh.
> >
> > +config SOFTLOCKUP_DETECTOR
> > + bool "Detect Soft Lockups"
> > + depends on LOCKUP_DETECTOR
> > +
> > +config HARDLOCKUP_DETECTOR_PERF
> > + bool
> > + select SOFTLOCKUP_DETECTOR
>
> Perhaps add a 'depends on (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)'

Kconfig is pretty clunky, I was struggling to make it do the right
thing... I could try.

>
> > +
> > +#
> > +# arch/ can define HAVE_NMI_WATCHDOG to provide their own NMI watchdog
> > +# rather than the perf based detector.
> > +#
> > +# The arch may do its own thing, or select HARDLOCKUP_DETECTOR, in which
> > +# case it should conform to HARDLOCKUP_DETECTOR interfaces and settings
> > +# (e.g., sysctl and cmdline).
> > +#
> > config HARDLOCKUP_DETECTOR
> > - def_bool y
> > - depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> > - depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> > + bool "Detect Hard Lockups"
> > + depends on LOCKUP_DETECTOR
> > + depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)
> > + select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
>
> Here is my confusion with HAVE_NMI_WATCHDOG
>
> It seems like you can only select HARDLOCKUP_DETECTOR if !HAVE_NMI_WATCHDOG,
> which would break sparc, I think.
>
> And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
> dependency on !HAVE_NMI_WATCHDOG???

I don't think so -- sparc today does not select HARDLOCKUP_DETECTOR.

After this patch, it always selects the _PERF detector because that's
the only one available. See the powerpc patch which adds the PPC64
exception here.

Yes, it's a bit clunky. I think we can subsequently remove HAVE_NMI_WATCHDOG
and replace it with something a bit saner and clean up some of these
convoluted cases.

Thanks,
Nick

2017-06-06 16:50:01

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Sat, Jun 03, 2017 at 04:10:05PM +1000, Nicholas Piggin wrote:
> > My last concern is wrapping my head around the config options.
> >
> > HAVE_NMI_WATCHDOG seems to have a dual meaning, I think.
>
> Yeah it's not the clearest. I think we need another pass over config
> options to start straightening them out.
>
> It means the arch has a hardlockup detector, so it has the
> arch_touch_nmi_watchdog() and you can't also select the perf HLD.

Ok, agreed.

>
> > In the sparc case, it uses the HARDLOCKUP_DETECTOR framework (hence the
> > original split out of watchdog_hld.c). Actually more like the
> > SOFTLOCKUP_DETECTOR framework, of which HARDLOCKUP_DETECTOR is a part.
>
> Well, yes, it uses some of the start/stop framework for the SLD, but
> doesn't use much of the lockup detector stuff beyond that (it does not use
> most of the boot options, sysctl parameters, etc.). So sparc
> is a little odd.
>
> I would hope to convert it over to something more like the powerpc patch and
> make it a first-class HLD, but it seems not all options are 100% compatible
> so it would need some careful testing.
>

Ok. More comments on sparc below..

> >
> > In your ppc64 case, it means do _not_ use the HARDLOCKUP_DETECTOR or
> > SOFTLOCKUP_DETECTOR framework. Instead just the bare bones LOCKUP_DETECTOR.
>
> It does use the HLD framework. The subsequent patch for powerpc adds
> a PPC64 case in the dependencies.

Ah, ok.

>
> >
> >
> > If so, the following is a little confusing to me..
> >
> >
> > <snip>
> >
> > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > > index e4587ebe52c7..a3afe3e10278 100644
> > > --- a/lib/Kconfig.debug
> > > +++ b/lib/Kconfig.debug
> > > @@ -801,10 +801,27 @@ config LOCKUP_DETECTOR
> > > The frequency of hrtimer and NMI events and the soft and hard lockup
> > > thresholds can be controlled through the sysctl watchdog_thresh.
> > >
> > > +config SOFTLOCKUP_DETECTOR
> > > + bool "Detect Soft Lockups"
> > > + depends on LOCKUP_DETECTOR
> > > +
> > > +config HARDLOCKUP_DETECTOR_PERF
> > > + bool
> > > + select SOFTLOCKUP_DETECTOR
> >
> > Perhaps add a 'depends on (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)'
>
> Kconfig is pretty clunky, I was struggling to make it do the right
> thing... I could try.
>
> >
> > > +
> > > +#
> > > +# arch/ can define HAVE_NMI_WATCHDOG to provide their own NMI watchdog
> > > +# rather than the perf based detector.
> > > +#
> > > +# The arch may do its own thing, or select HARDLOCKUP_DETECTOR, in which
> > > +# case it should conform to HARDLOCKUP_DETECTOR interfaces and settings
> > > +# (e.g., sysctl and cmdline).
> > > +#
> > > config HARDLOCKUP_DETECTOR
> > > - def_bool y
> > > - depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> > > - depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> > > + bool "Detect Hard Lockups"
> > > + depends on LOCKUP_DETECTOR
> > > + depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)
> > > + select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> >
> > Here is my confusion with HAVE_NMI_WATCHDOG
> >
> > It seems like you can only select HARDLOCKUP_DETECTOR if !HAVE_NMI_WATCHDOG,
> > which would break sparc, I think.
> >
> > And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
> > dependency on !HAVE_NMI_WATCHDOG???
>
> I don't think so -- sparc today does not select HARDLOCKUP_DETECTOR.

Well yes, but your patch changes the definition of HARDLOCKUP_DETECTOR to
HARDLOCKUP_DETECTOR_PERF and then recreates HARDLOCKUP_DETECTOR.

For example look at kernel/watchdog.c::watchdog_enabled (line 38)

sparc has HAVE_NMI_WATCHDOG set, but that will disable HARDLOCKUP_DETECTOR,
which I believe means sparc's nmi_watchdog is disabled on boot and has to
be manually enabled?


I _think_ having

depends on LOCKUP_DETECTOR
depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG

will work because your new definition of HARDLOCKUP_DETECTOR is a
combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??

Did I get that right?


I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
HAVE_PERF_NMI_WATCHDOG and then use those two to determine
HARDLOCKUP_DETECTOR. Would that make the config options slightly less
confusing?
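
Roughly something like this, where HAVE_PERF_NMI_WATCHDOG is only a
hypothetical name for the sake of the example, not an existing option:

config HARDLOCKUP_DETECTOR
	bool "Detect Hard Lockups"
	depends on LOCKUP_DETECTOR
	depends on HAVE_NMI_WATCHDOG || HAVE_PERF_NMI_WATCHDOG
	select HARDLOCKUP_DETECTOR_PERF if HAVE_PERF_NMI_WATCHDOG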

Thoughts?

Cheers,
Don

>
> After this patch, it always selects the _PERF detector because that's
> the only one available. See the powerpc patch which adds the PPC64
> exception here.
>
> Yes, it's a bit clunky. I think we can subsequently remove HAVE_NMI_WATCHDOG
> and replace it with something a bit saner and clean up some of these
> convoluted cases.
>
> Thanks,
> Nick

2017-06-07 03:50:44

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Tue, 6 Jun 2017 12:49:58 -0400
Don Zickus <[email protected]> wrote:

> On Sat, Jun 03, 2017 at 04:10:05PM +1000, Nicholas Piggin wrote:

> > > > config HARDLOCKUP_DETECTOR
> > > > - def_bool y
> > > > - depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> > > > - depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> > > > + bool "Detect Hard Lockups"
> > > > + depends on LOCKUP_DETECTOR
> > > > + depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)
> > > > + select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> > >
> > > Here is my confusion with HAVE_NMI_WATCHDOG
> > >
> > > It seems like you can only select HARDLOCKUP_DETECTOR if !HAVE_NMI_WATCHDOG,
> > > which would break sparc, I think.
> > >
> > > And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
> > > dependency on !HAVE_NMI_WATCHDOG???
> >
> > I don't think so -- sparc today does not select HARDLOCKUP_DETECTOR.
>
> Well yes, but your patch changes the definition of HARDLOCKUP_DETECTOR to
> HARDLOCKUP_DETECTOR_PERF and then recreates HARDLOCKUP_DETECTOR.
>
> For example look at kernel/watchdog.c::watchdog_enabled (line 38)
>
> sparc has HAVE_NMI_WATCHDOG set, but that will disable HARDLOCKUP_DETECTOR,
> which I believe means sparc's nmi_watchdog is disabled on boot and has to
> be manually enabled?

Ahh okay because sparc is using watchdog_nmi_enable/disable from the
softlockup watchdog, which checks watchdog_enabled.

>
>
> I _think_ having
>
> depends on LOCKUP_DETECTOR
> depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
> select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
>
> will work because your new definition of HARDLOCKUP_DETECTOR is a
> combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??
>
> Did I get that right?

Well in some ways, except that most of the NMI watchdogs do not seem to
heed the HARDLOCKUP_DETECTOR configuration sysctls and commands.

NMI_WATCHDOG by itself was supposed to be for an arch that wants to do
its own thing completely.

sparc is somewhere in the middle. It uses some of the HLD stuff, but
not all. That makes it a bit tricky.


> I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
> HAVE_PERF_NMI_WATCHDOG and then use those two to determine
> HARDLOCKUP_DETECTOR. Would that make the config options slightly less
> confusing?

This would probably be the right direction to go in, but it will take
slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
meaning that an arch has its own watchdog and does not want any HLD
stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.

While transitioning, we could add a new option instead,

HAVE_ARCH_HARDLOCKUP_DETECTOR

I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
HLD. Possibly you could just change the name to be a bit more regular,
HAVE_PERF_NMI_HARDLOCKUP_DETECTOR
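
As a rough sketch of the transitional option in arch/Kconfig (name and help
text not final):

config HAVE_ARCH_HARDLOCKUP_DETECTOR
	bool
	select HAVE_NMI_WATCHDOG
	help
	  The arch provides its own hardlockup detector and conforms to the
	  lockup detector interfaces and parameters.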

Thanks,
Nick

2017-06-08 16:05:07

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Wed, Jun 07, 2017 at 01:50:26PM +1000, Nicholas Piggin wrote:
> >
> > I _think_ having
> >
> > depends on LOCKUP_DETECTOR
> > depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
> > select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> >
> > will work because your new definition of HARDLOCKUP_DETECTOR is a
> > combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??
> >
> > Did I get that right?
>
> Well in some ways, except that most of the NMI watchdogs do not seem to
> heed the HARDLOCKUP_DETECTOR configuration sysctls and commands.
>
> NMI_WATCHDOG by itself was supposed to be for an arch that wants to do
> its own thing completely.
>
> sparc is somewhere in the middle. It uses some of the HLD stuff, but
> not all. That makes it a bit tricky.

Hmm, I can see sparc relying on SOFTLOCKUP, but the HARDLOCKUP code with
your patches seems really small. The only sysctls are nmi_watchdog and
hardlockup_panic. The former is needed, but the latter is only implemented in
the HARDLOCKUP_DETECTOR_PERF case.

So by having HAVE_NMI_WATCHDOG, you sorta need a sysctl knob which the
HARDLOCKUP_DETECTOR seems to provide on top of the default knobs.

With sparc being special and needing SOFTLOCKUP to call its nmi
enable/disable hooks.


Is there a particular chunk of code you had in mind that did not make sense
with HARDLOCKUP_DETECTOR enabled?

>
>
> > I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
> > HAVE_PERF_NMI_WATCHDOG and then use those two to determine
> > HARDLOCKUP_DETECTOR. Would that make the config options slightly less
> > confusing?
>
> This would probably be the right direction to go in, but it will take
> slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
> meaning that an arch has its own watchdog and does not want any HLD
> stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.
>
> While transitioning, we could add a new option instead,
>
> HAVE_ARCH_HARDLOCKUP_DETECTOR
>
> I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
> HLD. Possibly you could just change the name to be a bit more regular,
> HAVE_PERF_NMI_HARDLOCKUP_DETECTOR

Actually, I don't think I can just rename it as it has a specific use to let
OPROFILE know the perf events are being NMI triggered as opposed to IRQ
triggered.

Though I like the direction you are going. Then arches either have one or
the other. Or in the ppc case it is dependent on what ppc platform is being
used.

Then the HARDLOCKUP_DETECTOR needs one or the other to work correctly with
the arch/<arch>/Kconfig explicitly stating which one to use?

Cheers,
Don



>
> Thanks,
> Nick

2017-06-12 08:07:53

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Thu, 8 Jun 2017 12:05:02 -0400
Don Zickus <[email protected]> wrote:

> On Wed, Jun 07, 2017 at 01:50:26PM +1000, Nicholas Piggin wrote:
> > >
> > > I _think_ having
> > >
> > > depends on LOCKUP_DETECTOR
> > > depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
> > > select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> > >
> > > will work because your new definition of HARDLOCKUP_DETECTOR is a
> > > combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??
> > >
> > > Did I get that right?
> >
> > Well in some ways, except that most of the NMI watchdogs do not seem to
> > heed the HARDLOCKUP_DETECTOR configuration sysctls and commands.
> >
> > NMI_WATCHDOG by itself was supposed to be for an arch that wants to do
> > its own thing completely.
> >
> > sparc is somewhere in the middle. It uses some of the HLD stuff, but
> > not all. That makes it a bit tricky.
>
> Hmm, I can see sparc relying on SOFTLOCKUP, but the HARDLOCKUP code with
> your patches seems really small. The only sysctls are nmi_watchdog and
> hardlockup_panic. The former is needed, but the latter is only implemented in
> the HARDLOCKUP_DETECTOR_PERF case.
>
> So by having HAVE_NMI_WATCHDOG, you sorta need a sysctl knob which the
> HARDLOCKUP_DETECTOR seems to provide on top of the default knobs.

I would say there are a few others, like the cpumask, threshold, panic,
backtrace, etc.

> With sparc being special and needing SOFTLOCKUP to call its nmi
> enable/disable hooks.

Yes, although we could just remove that dependency (it's minimal code
to start them up with hotplug).

That said, I would hope to actually go the other way in general and move
architectures to using the generic parameters as much as possible.

> Is there a particular chunk of code you had in mind that did not make sense
> with HARDLOCKUP_DETECTOR enabled?

Perhaps the couple of other archs that have their own NMI watchdogs.

>
> >
> >
> > > I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
> > > HAVE_PERF_NMI_WATCHDOG and then use those two to determine
> > > HARDLOCKUP_DETECTOR. Would that make the config options slightly less
> > > confusing?
> >
> > This would probably be the right direction to go in, but it will take
> > slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
> > meaning that an arch has its own watchdog and does not want any HLD
> > stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.
> >
> > While transitioning, we could add a new option instead,
> >
> > HAVE_ARCH_HARDLOCKUP_DETECTOR
> >
> > I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
> > HLD. Possibly you could just change the name to be a bit more regular,
> > HAVE_PERF_NMI_HARDLOCKUP_DETECTOR
>
> Actually, I don't think I can just rename it as it has a specific use to let
> OPROFILE know the perf events are being NMI triggered as opposed to IRQ
> triggered.
>
> Though I like the direction you are going. Then arches either have one or
> the other. Or in the ppc case it is dependent on what ppc platform is being
> used.

Okay, glad we're on the same page conceptually :)

>
> Then the HARDLOCKUP_DETECTOR needs one or the other to work correctly with
> the arch/<arch>/Kconfig explicitly stating which one to use?

Yeah I guess the arch would advertise it has the PERF_HLD or ARCH_HLD if
it provides its own. HARDLOCKUP_DETECTOR option would then depend on
one of the two being defined.
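
Roughly (names only illustrative at this point):

config HARDLOCKUP_DETECTOR
	bool "Detect Hard Lockups"
	depends on LOCKUP_DETECTOR
	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF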

I could try redoing the series with those changes to Kconfig and see how
it looks?

Thanks,
Nick

2017-06-12 20:42:01

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Mon, Jun 12, 2017 at 06:07:39PM +1000, Nicholas Piggin wrote:
> > > This would probably be the right direction to go in, but it will take
> > > slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
> > > meaning that an arch has its own watchdog and does not want any HLD
> > > stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.
> > >
> > > While transitioning, we could add a new option instead,
> > >
> > > HAVE_ARCH_HARDLOCKUP_DETECTOR
> > >
> > > I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
> > > HLD. Possibly you could just change the name to be a bit more regular,
> > > HAVE_PERF_NMI_HARDLOCKUP_DETECTOR
> >
> > Actually, I don't think I can just rename it as it has a specific use to let
> > OPROFILE know the perf events are being NMI triggered as opposed to IRQ
> > triggered.
> >
> > Though I like the direction you are going. Then arches either have one or
> > the other. Or in the ppc case it is dependent on what ppc platform is being
> > used.
>
> Okay, glad we're on the same page conceptually :)
>
> >
> > Then the HARDLOCKUP_DETECTOR needs one or the other to work correctly with
> > the arch/<arch>/Kconfig explicitly stating which one to use?
>
> Yeah I guess the arch would advertise it has the PERF_HLD or ARCH_HLD if
> it provides its own. HARDLOCKUP_DETECTOR option would then depend on
> one of the two being defined.
>
> I could try redoing the series with those changes to Kconfig and see how
> it looks?

Yeah, if you wouldn't mind. Sorry for dragging this out, but I feel like we
are getting close to having this defined properly, which would allow us to
split the code up correctly in the future.

Cheers,
Don

2017-06-13 16:11:36

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Mon, 12 Jun 2017 16:41:56 -0400
Don Zickus <[email protected]> wrote:

> On Mon, Jun 12, 2017 at 06:07:39PM +1000, Nicholas Piggin wrote:
> > > > This would probably be the right direction to go in, but it will take
> > > > slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
> > > > meaning that an arch has its own watchdog and does not want any HLD
> > > > stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.
> > > >
> > > > While transitioning, we could add a new option instead,
> > > >
> > > > HAVE_ARCH_HARDLOCKUP_DETECTOR
> > > >
> > > > I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
> > > > HLD. Possibly you could just change the name to be a bit more regular,
> > > > HAVE_PERF_NMI_HARDLOCKUP_DETECTOR
> > >
> > > Actually, I don't think I can just rename it as it has a specific use to let
> > > OPROFILE know the perf events are being NMI triggered as opposed to IRQ
> > > triggered.
> > >
> > > Though I like the direction you are going. Then arches either have one or
> > > the other. Or in the ppc case it is dependent on what ppc platform is being
> > > used.
> >
> > Okay, glad we're on the same page conceptually :)
> >
> > >
> > > Then the HARDLOCKUP_DETECTOR needs one or the other to work correctly with
> > > the arch/<arch>/Kconfig explicitly stating which one to use?
> >
> > Yeah I guess the arch would advertise it has the PERF_HLD or ARCH_HLD if
> > it provides its own. HARDLOCKUP_DETECTOR option would then depend on
> > one of the two being defined.
> >
> > I could try redoing the series with those changes to Kconfig and see how
> > it looks?
>
> Yeah, if you wouldn't mind. Sorry for dragging this out, but I feel like we
> are getting close to having this defined properly, which would allow us to
> split the code up correctly in the future.

How's this for a replacement patch 3? I think the Kconfig works out much
better now.
--

Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.

LOCKUP_DETECTOR implies the general boot, sysctl, and programming
interfaces for the lockup detectors.

An architecture that wants to use a hard lockup detector must define
HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.

Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides
the minimum arch_touch_nmi_watchdog, and it otherwise does its own
thing and does not implement the LOCKUP_DETECTOR interfaces.

sparc is unusual in that it has started to implement some of the
interfaces, but not fully yet. It should probably be converted to
a full HAVE_HARDLOCKUP_DETECTOR_ARCH.

Signed-off-by: Nicholas Piggin <[email protected]>
---
arch/Kconfig | 23 ++++
arch/powerpc/Kconfig | 1 +
arch/powerpc/kernel/setup_64.c | 2 +-
arch/x86/Kconfig | 1 +
arch/x86/kernel/apic/hw_nmi.c | 2 +-
include/linux/nmi.h | 29 +++--
kernel/Makefile | 2 +-
kernel/sysctl.c | 16 +--
kernel/watchdog.c | 238 ++++++++++++++++++++++++++---------------
kernel/watchdog_hld.c | 32 ------
lib/Kconfig.debug | 26 +++--
11 files changed, 231 insertions(+), 141 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 6c00e5b00f8b..878addc6f141 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -288,6 +288,29 @@ config HAVE_PERF_EVENTS_NMI
subsystem. Also has support for calculating CPU cycle events
to determine how many clock cycles in a given period.

+
+config HAVE_HARDLOCKUP_DETECTOR_PERF
+ bool
+ depends on HAVE_PERF_EVENTS_NMI
+ help
+ The arch chooses to use the generic perf-NMI-based hardlockup
+ detector. Must define HAVE_PERF_EVENTS_NMI.
+
+config HAVE_NMI_WATCHDOG
+ bool
+ help
+ The arch provides a low level NMI watchdog. It provides
+ asm/nmi.h, and defines its own arch_touch_nmi_watchdog().
+
+config HAVE_HARDLOCKUP_DETECTOR_ARCH
+ bool
+ select HAVE_NMI_WATCHDOG
+ help
+ The arch chooses to provide its own hardlockup detector, which is
+ a superset of the HAVE_NMI_WATCHDOG. It also conforms to config
+ interfaces and parameters provided by hardlockup detector subsystem.
+
+
config HAVE_PERF_REGS
bool
help
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f7c8f9972f61..7aba96ec3378 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -202,6 +202,7 @@ config PPC
select HAVE_OPTPROBES if PPC64
select HAVE_PERF_EVENTS
select HAVE_PERF_EVENTS_NMI if PPC64
+ select HAVE_HARDLOCKUP_DETECTOR_PERF if HAVE_PERF_EVENTS_NMI
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE if SMP
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f35ff9dea4fb..ab650905f75a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -727,7 +727,7 @@ struct ppc_pci_io ppc_pci_io;
EXPORT_SYMBOL(ppc_pci_io);
#endif

-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
u64 hw_nmi_get_sample_period(int watchdog_thresh)
{
return ppc_proc_freq * watchdog_thresh;
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4ccfacc7232a..3c084149b5d1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -157,6 +157,7 @@ config X86
select HAVE_PCSPKR_PLATFORM
select HAVE_PERF_EVENTS
select HAVE_PERF_EVENTS_NMI
+ select HAVE_HARDLOCKUP_DETECTOR_PERF if HAVE_PERF_EVENTS_NMI
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
select HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index c73c9fb281e1..d6f387780849 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -19,7 +19,7 @@
#include <linux/init.h>
#include <linux/delay.h>

-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
u64 hw_nmi_get_sample_period(int watchdog_thresh)
{
return (u64)(cpu_khz) * 1000 * watchdog_thresh;
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index bd387ef8bccd..8aa01fd859fb 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -11,13 +11,21 @@
#endif

#ifdef CONFIG_LOCKUP_DETECTOR
+void lockup_detector_init(void);
+#else
+static inline void lockup_detector_init(void)
+{
+}
+#endif
+
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
extern void touch_softlockup_watchdog_sched(void);
extern void touch_softlockup_watchdog(void);
extern void touch_softlockup_watchdog_sync(void);
extern void touch_all_softlockup_watchdogs(void);
extern unsigned int softlockup_panic;
-extern unsigned int hardlockup_panic;
-void lockup_detector_init(void);
+extern int soft_watchdog_enabled;
+extern atomic_t watchdog_park_in_progress;
#else
static inline void touch_softlockup_watchdog_sched(void)
{
@@ -31,9 +39,6 @@ static inline void touch_softlockup_watchdog_sync(void)
static inline void touch_all_softlockup_watchdogs(void)
{
}
-static inline void lockup_detector_init(void)
-{
-}
#endif

#ifdef CONFIG_DETECT_HUNG_TASK
@@ -63,15 +68,18 @@ static inline void reset_hung_task_detector(void)

#if defined(CONFIG_HARDLOCKUP_DETECTOR)
extern void hardlockup_detector_disable(void);
+extern unsigned int hardlockup_panic;
#else
static inline void hardlockup_detector_disable(void) {}
#endif

-#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
extern void arch_touch_nmi_watchdog(void);
#else
+#if !defined(CONFIG_HAVE_NMI_WATCHDOG)
static inline void arch_touch_nmi_watchdog(void) {}
#endif
+#endif

/**
* touch_nmi_watchdog - restart NMI watchdog timeout.
@@ -141,15 +149,18 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
}
#endif

-#ifdef CONFIG_LOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
u64 hw_nmi_get_sample_period(int watchdog_thresh);
+#endif
+
+#ifdef CONFIG_LOCKUP_DETECTOR
extern int nmi_watchdog_enabled;
-extern int soft_watchdog_enabled;
extern int watchdog_user_enabled;
extern int watchdog_thresh;
extern unsigned long watchdog_enabled;
+extern struct cpumask watchdog_cpumask;
extern unsigned long *watchdog_cpumask_bits;
-extern atomic_t watchdog_park_in_progress;
+extern int __read_mostly watchdog_suspended;
#ifdef CONFIG_SMP
extern int sysctl_softlockup_all_cpu_backtrace;
extern int sysctl_hardlockup_all_cpu_backtrace;
diff --git a/kernel/Makefile b/kernel/Makefile
index 72aa080f91f0..4cb8e8b23c6e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -82,7 +82,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o
obj-$(CONFIG_KGDB) += debug/
obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
obj-$(CONFIG_SECCOMP) += seccomp.o
obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4dfba1a76cc3..46b189fd865b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -880,6 +880,14 @@ static struct ctl_table kern_table[] = {
#endif
},
{
+ .procname = "watchdog_cpumask",
+ .data = &watchdog_cpumask_bits,
+ .maxlen = NR_CPUS,
+ .mode = 0644,
+ .proc_handler = proc_watchdog_cpumask,
+ },
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+ {
.procname = "soft_watchdog",
.data = &soft_watchdog_enabled,
.maxlen = sizeof (int),
@@ -889,13 +897,6 @@ static struct ctl_table kern_table[] = {
.extra2 = &one,
},
{
- .procname = "watchdog_cpumask",
- .data = &watchdog_cpumask_bits,
- .maxlen = NR_CPUS,
- .mode = 0644,
- .proc_handler = proc_watchdog_cpumask,
- },
- {
.procname = "softlockup_panic",
.data = &softlockup_panic,
.maxlen = sizeof(int),
@@ -904,6 +905,7 @@ static struct ctl_table kern_table[] = {
.extra1 = &zero,
.extra2 = &one,
},
+#endif
#ifdef CONFIG_HARDLOCKUP_DETECTOR
{
.procname = "hardlockup_panic",
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 03e0b69bb5bf..deb010505646 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -29,15 +29,55 @@
#include <linux/kvm_para.h>
#include <linux/kthread.h>

+/* Watchdog configuration */
static DEFINE_MUTEX(watchdog_proc_mutex);

-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+int __read_mostly nmi_watchdog_enabled;
+
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
+
+/* boot commands */
+/*
+ * Should we panic when a soft-lockup or hard-lockup occurs:
+ */
+unsigned int __read_mostly hardlockup_panic =
+ CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
+/*
+ * We may not want to enable hard lockup detection by default in all cases,
+ * for example when running the kernel as a guest on a hypervisor. In these
+ * cases this function can be called to disable hard lockup detection. This
+ * function should only be executed once by the boot processor before the
+ * kernel command line parameters are parsed, because otherwise it is not
+ * possible to override this in hardlockup_panic_setup().
+ */
+void hardlockup_detector_disable(void)
+{
+ watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+}
+
+static int __init hardlockup_panic_setup(char *str)
+{
+ if (!strncmp(str, "panic", 5))
+ hardlockup_panic = 1;
+ else if (!strncmp(str, "nopanic", 7))
+ hardlockup_panic = 0;
+ else if (!strncmp(str, "0", 1))
+ watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+ else if (!strncmp(str, "1", 1))
+ watchdog_enabled |= NMI_WATCHDOG_ENABLED;
+ return 1;
+}
+__setup("nmi_watchdog=", hardlockup_panic_setup);
+
#else
unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
#endif
-int __read_mostly nmi_watchdog_enabled;
+
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
int __read_mostly soft_watchdog_enabled;
+#endif
+
int __read_mostly watchdog_user_enabled;
int __read_mostly watchdog_thresh = 10;

@@ -45,15 +85,9 @@ int __read_mostly watchdog_thresh = 10;
int __read_mostly sysctl_softlockup_all_cpu_backtrace;
int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
#endif
-static struct cpumask watchdog_cpumask __read_mostly;
+struct cpumask watchdog_cpumask __read_mostly;
unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);

-/* Helper for online, unparked cpus. */
-#define for_each_watchdog_cpu(cpu) \
- for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
-
-atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
-
/*
* The 'watchdog_running' variable is set to 1 when the watchdog threads
* are registered/started and is set to 0 when the watchdog threads are
@@ -72,7 +106,27 @@ static int __read_mostly watchdog_running;
* of 'watchdog_running' cannot change while the watchdog is deactivated
* temporarily (see related code in 'proc' handlers).
*/
-static int __read_mostly watchdog_suspended;
+int __read_mostly watchdog_suspended;
+
+/*
+ * These functions can be overridden if an architecture implements its
+ * own hardlockup detector.
+ */
+int __weak watchdog_nmi_enable(unsigned int cpu)
+{
+ return 0;
+}
+void __weak watchdog_nmi_disable(unsigned int cpu)
+{
+}
+
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+
+/* Helper for online, unparked cpus. */
+#define for_each_watchdog_cpu(cpu) \
+ for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
+
+atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);

static u64 __read_mostly sample_period;

@@ -120,6 +174,7 @@ static int __init softlockup_all_cpu_backtrace_setup(char *str)
return 1;
}
__setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
static int __init hardlockup_all_cpu_backtrace_setup(char *str)
{
sysctl_hardlockup_all_cpu_backtrace =
@@ -128,6 +183,7 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
}
__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
#endif
+#endif

/*
* Hard-lockup warnings should be triggered after just a few seconds. Soft-
@@ -213,18 +269,6 @@ void touch_softlockup_watchdog_sync(void)
__this_cpu_write(watchdog_touch_ts, 0);
}

-/* watchdog detector functions */
-bool is_hardlockup(void)
-{
- unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
-
- if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
- return true;
-
- __this_cpu_write(hrtimer_interrupts_saved, hrint);
- return false;
-}
-
static int is_softlockup(unsigned long touch_ts)
{
unsigned long now = get_timestamp();
@@ -237,21 +281,21 @@ static int is_softlockup(unsigned long touch_ts)
return 0;
}

-static void watchdog_interrupt_count(void)
+/* watchdog detector functions */
+bool is_hardlockup(void)
{
- __this_cpu_inc(hrtimer_interrupts);
-}
+ unsigned long hrint = __this_cpu_read(hrtimer_interrupts);

-/*
- * These two functions are mostly architecture specific
- * defining them as weak here.
- */
-int __weak watchdog_nmi_enable(unsigned int cpu)
-{
- return 0;
+ if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
+ return true;
+
+ __this_cpu_write(hrtimer_interrupts_saved, hrint);
+ return false;
}
-void __weak watchdog_nmi_disable(unsigned int cpu)
+
+static void watchdog_interrupt_count(void)
{
+ __this_cpu_inc(hrtimer_interrupts);
}

static int watchdog_enable_all_cpus(void);
@@ -502,57 +546,6 @@ static void watchdog_unpark_threads(void)
kthread_unpark(per_cpu(softlockup_watchdog, cpu));
}

-/*
- * Suspend the hard and soft lockup detector by parking the watchdog threads.
- */
-int lockup_detector_suspend(void)
-{
- int ret = 0;
-
- get_online_cpus();
- mutex_lock(&watchdog_proc_mutex);
- /*
- * Multiple suspend requests can be active in parallel (counted by
- * the 'watchdog_suspended' variable). If the watchdog threads are
- * running, the first caller takes care that they will be parked.
- * The state of 'watchdog_running' cannot change while a suspend
- * request is active (see related code in 'proc' handlers).
- */
- if (watchdog_running && !watchdog_suspended)
- ret = watchdog_park_threads();
-
- if (ret == 0)
- watchdog_suspended++;
- else {
- watchdog_disable_all_cpus();
- pr_err("Failed to suspend lockup detectors, disabled\n");
- watchdog_enabled = 0;
- }
-
- mutex_unlock(&watchdog_proc_mutex);
-
- return ret;
-}
-
-/*
- * Resume the hard and soft lockup detector by unparking the watchdog threads.
- */
-void lockup_detector_resume(void)
-{
- mutex_lock(&watchdog_proc_mutex);
-
- watchdog_suspended--;
- /*
- * The watchdog threads are unparked if they were previously running
- * and if there is no more active suspend request.
- */
- if (watchdog_running && !watchdog_suspended)
- watchdog_unpark_threads();
-
- mutex_unlock(&watchdog_proc_mutex);
- put_online_cpus();
-}
-
static int update_watchdog_all_cpus(void)
{
int ret;
@@ -604,6 +597,81 @@ static void watchdog_disable_all_cpus(void)
}
}

+#else /* SOFTLOCKUP */
+static int watchdog_park_threads(void)
+{
+ return 0;
+}
+
+static void watchdog_unpark_threads(void)
+{
+}
+
+static int watchdog_enable_all_cpus(void)
+{
+ return 0;
+}
+
+static void watchdog_disable_all_cpus(void)
+{
+}
+
+static void set_sample_period(void)
+{
+}
+#endif /* SOFTLOCKUP */
+
+/*
+ * Suspend the hard and soft lockup detector by parking the watchdog threads.
+ */
+int lockup_detector_suspend(void)
+{
+ int ret = 0;
+
+ get_online_cpus();
+ mutex_lock(&watchdog_proc_mutex);
+ /*
+ * Multiple suspend requests can be active in parallel (counted by
+ * the 'watchdog_suspended' variable). If the watchdog threads are
+ * running, the first caller takes care that they will be parked.
+ * The state of 'watchdog_running' cannot change while a suspend
+ * request is active (see related code in 'proc' handlers).
+ */
+ if (watchdog_running && !watchdog_suspended)
+ ret = watchdog_park_threads();
+
+ if (ret == 0)
+ watchdog_suspended++;
+ else {
+ watchdog_disable_all_cpus();
+ pr_err("Failed to suspend lockup detectors, disabled\n");
+ watchdog_enabled = 0;
+ }
+
+ mutex_unlock(&watchdog_proc_mutex);
+
+ return ret;
+}
+
+/*
+ * Resume the hard and soft lockup detector by unparking the watchdog threads.
+ */
+void lockup_detector_resume(void)
+{
+ mutex_lock(&watchdog_proc_mutex);
+
+ watchdog_suspended--;
+ /*
+ * The watchdog threads are unparked if they were previously running
+ * and if there is no more active suspend request.
+ */
+ if (watchdog_running && !watchdog_suspended)
+ watchdog_unpark_threads();
+
+ mutex_unlock(&watchdog_proc_mutex);
+ put_online_cpus();
+}
+
#ifdef CONFIG_SYSCTL

/*
@@ -810,9 +878,11 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
* a temporary cpumask, so we are likely not in a
* position to do much else to make things better.
*/
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
if (smpboot_update_cpumask_percpu_thread(
&watchdog_threads, &watchdog_cpumask) != 0)
pr_err("cpumask update failed\n");
+#endif
}
}
out:
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 90d688df6ce1..295a0d84934c 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,39 +22,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);

-/* boot commands */
-/*
- * Should we panic when a soft-lockup or hard-lockup occurs:
- */
-unsigned int __read_mostly hardlockup_panic =
- CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
static unsigned long hardlockup_allcpu_dumped;
-/*
- * We may not want to enable hard lockup detection by default in all cases,
- * for example when running the kernel as a guest on a hypervisor. In these
- * cases this function can be called to disable hard lockup detection. This
- * function should only be executed once by the boot processor before the
- * kernel command line parameters are parsed, because otherwise it is not
- * possible to override this in hardlockup_panic_setup().
- */
-void hardlockup_detector_disable(void)
-{
- watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
-}
-
-static int __init hardlockup_panic_setup(char *str)
-{
- if (!strncmp(str, "panic", 5))
- hardlockup_panic = 1;
- else if (!strncmp(str, "nopanic", 7))
- hardlockup_panic = 0;
- else if (!strncmp(str, "0", 1))
- watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
- else if (!strncmp(str, "1", 1))
- watchdog_enabled |= NMI_WATCHDOG_ENABLED;
- return 1;
-}
-__setup("nmi_watchdog=", hardlockup_panic_setup);

void arch_touch_nmi_watchdog(void)
{
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e4587ebe52c7..f87a559dd178 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -801,10 +801,24 @@ config LOCKUP_DETECTOR
The frequency of hrtimer and NMI events and the soft and hard lockup
thresholds can be controlled through the sysctl watchdog_thresh.

+config SOFTLOCKUP_DETECTOR
+ bool "Detect Soft Lockups"
+ depends on LOCKUP_DETECTOR
+
+config HARDLOCKUP_DETECTOR_PERF
+ bool
+ select SOFTLOCKUP_DETECTOR
+
+#
+# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
+# lockup detector rather than the perf based detector.
+#
config HARDLOCKUP_DETECTOR
- def_bool y
- depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
- depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
+ bool "Detect Hard Lockups"
+ depends on LOCKUP_DETECTOR
+ depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
+ select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
+ select HARDLOCKUP_DETECTOR_ARCH if HAVE_HARDLOCKUP_DETECTOR_ARCH

config BOOTPARAM_HARDLOCKUP_PANIC
bool "Panic (Reboot) On Hard Lockups"
@@ -826,7 +840,7 @@ config BOOTPARAM_HARDLOCKUP_PANIC_VALUE

config BOOTPARAM_SOFTLOCKUP_PANIC
bool "Panic (Reboot) On Soft Lockups"
- depends on LOCKUP_DETECTOR
+ depends on SOFTLOCKUP_DETECTOR
help
Say Y here to enable the kernel to panic on "soft lockups",
which are bugs that cause the kernel to loop in kernel
@@ -843,7 +857,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC

config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
int
- depends on LOCKUP_DETECTOR
+ depends on SOFTLOCKUP_DETECTOR
range 0 1
default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
@@ -851,7 +865,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
config DETECT_HUNG_TASK
bool "Detect Hung Tasks"
depends on DEBUG_KERNEL
- default LOCKUP_DETECTOR
+ default SOFTLOCKUP_DETECTOR
help
Say Y here to enable the kernel to detect "hung tasks",
which are bugs that cause the task to be stuck in
--
2.11.0

2017-06-14 14:09:42

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Wed, Jun 14, 2017 at 02:11:18AM +1000, Nicholas Piggin wrote:
> > Yeah, if you wouldn't mind. Sorry for dragging this out, but I feel like we
> > are getting close to having this defined properly, which would allow us to
> > split the code up correctly in the future.
>
> How's this for a replacement patch 3? I think the Kconfig works out much
> better now.

Hi Nick,

I think you made this much clearer, thank you! I am good with this.


Hi Babu,

Can you give this patchset (and particularly this version of patch 3) a try
on sparc to make sure we didn't break anything? I believe this should
resolve the issue you noticed where the nmi watchdog is not started on boot. Thanks!

Cheers,
Don

> --
>
> Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
> HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
>
> LOCKUP_DETECTOR implies the general boot, sysctl, and programming
> interfaces for the lockup detectors.
>
> An architecture that wants to use a hard lockup detector must define
> HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.
>
> Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides
> the minimum arch_touch_nmi_watchdog, and it otherwise does its own
> thing and does not implement the LOCKUP_DETECTOR interfaces.
>
> sparc is unusual in that it has started to implement some of the
> interfaces, but not fully yet. It should probably be converted to
> a full HAVE_HARDLOCKUP_DETECTOR_ARCH.
>
> Signed-off-by: Nicholas Piggin <[email protected]>
> ---
> arch/Kconfig | 23 ++++
> arch/powerpc/Kconfig | 1 +
> arch/powerpc/kernel/setup_64.c | 2 +-
> arch/x86/Kconfig | 1 +
> arch/x86/kernel/apic/hw_nmi.c | 2 +-
> include/linux/nmi.h | 29 +++--
> kernel/Makefile | 2 +-
> kernel/sysctl.c | 16 +--
> kernel/watchdog.c | 238 ++++++++++++++++++++++++++---------------
> kernel/watchdog_hld.c | 32 ------
> lib/Kconfig.debug | 26 +++--
> 11 files changed, 231 insertions(+), 141 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 6c00e5b00f8b..878addc6f141 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -288,6 +288,29 @@ config HAVE_PERF_EVENTS_NMI
> subsystem. Also has support for calculating CPU cycle events
> to determine how many clock cycles in a given period.
>
> +
> +config HAVE_HARDLOCKUP_DETECTOR_PERF
> + bool
> + depends on HAVE_PERF_EVENTS_NMI
> + help
> + The arch chooses to use the generic perf-NMI-based hardlockup
> + detector. Must define HAVE_PERF_EVENTS_NMI.
> +
> +config HAVE_NMI_WATCHDOG
> + bool
> + help
> + The arch provides a low level NMI watchdog. It provides
> + asm/nmi.h, and defines its own arch_touch_nmi_watchdog().
> +
> +config HAVE_HARDLOCKUP_DETECTOR_ARCH
> + bool
> + select HAVE_NMI_WATCHDOG
> + help
> + The arch chooses to provide its own hardlockup detector, which is
> + a superset of the HAVE_NMI_WATCHDOG. It also conforms to config
> + interfaces and parameters provided by hardlockup detector subsystem.
> +
> +
> config HAVE_PERF_REGS
> bool
> help
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index f7c8f9972f61..7aba96ec3378 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -202,6 +202,7 @@ config PPC
> select HAVE_OPTPROBES if PPC64
> select HAVE_PERF_EVENTS
> select HAVE_PERF_EVENTS_NMI if PPC64
> + select HAVE_HARDLOCKUP_DETECTOR_PERF if HAVE_PERF_EVENTS_NMI
> select HAVE_PERF_REGS
> select HAVE_PERF_USER_STACK_DUMP
> select HAVE_RCU_TABLE_FREE if SMP
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index f35ff9dea4fb..ab650905f75a 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -727,7 +727,7 @@ struct ppc_pci_io ppc_pci_io;
> EXPORT_SYMBOL(ppc_pci_io);
> #endif
>
> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> u64 hw_nmi_get_sample_period(int watchdog_thresh)
> {
> return ppc_proc_freq * watchdog_thresh;
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 4ccfacc7232a..3c084149b5d1 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -157,6 +157,7 @@ config X86
> select HAVE_PCSPKR_PLATFORM
> select HAVE_PERF_EVENTS
> select HAVE_PERF_EVENTS_NMI
> + select HAVE_HARDLOCKUP_DETECTOR_PERF if HAVE_PERF_EVENTS_NMI
> select HAVE_PERF_REGS
> select HAVE_PERF_USER_STACK_DUMP
> select HAVE_REGS_AND_STACK_ACCESS_API
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index c73c9fb281e1..d6f387780849 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -19,7 +19,7 @@
> #include <linux/init.h>
> #include <linux/delay.h>
>
> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> u64 hw_nmi_get_sample_period(int watchdog_thresh)
> {
> return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index bd387ef8bccd..8aa01fd859fb 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -11,13 +11,21 @@
> #endif
>
> #ifdef CONFIG_LOCKUP_DETECTOR
> +void lockup_detector_init(void);
> +#else
> +static inline void lockup_detector_init(void)
> +{
> +}
> +#endif
> +
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> extern void touch_softlockup_watchdog_sched(void);
> extern void touch_softlockup_watchdog(void);
> extern void touch_softlockup_watchdog_sync(void);
> extern void touch_all_softlockup_watchdogs(void);
> extern unsigned int softlockup_panic;
> -extern unsigned int hardlockup_panic;
> -void lockup_detector_init(void);
> +extern int soft_watchdog_enabled;
> +extern atomic_t watchdog_park_in_progress;
> #else
> static inline void touch_softlockup_watchdog_sched(void)
> {
> @@ -31,9 +39,6 @@ static inline void touch_softlockup_watchdog_sync(void)
> static inline void touch_all_softlockup_watchdogs(void)
> {
> }
> -static inline void lockup_detector_init(void)
> -{
> -}
> #endif
>
> #ifdef CONFIG_DETECT_HUNG_TASK
> @@ -63,15 +68,18 @@ static inline void reset_hung_task_detector(void)
>
> #if defined(CONFIG_HARDLOCKUP_DETECTOR)
> extern void hardlockup_detector_disable(void);
> +extern unsigned int hardlockup_panic;
> #else
> static inline void hardlockup_detector_disable(void) {}
> #endif
>
> -#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> extern void arch_touch_nmi_watchdog(void);
> #else
> +#if !defined(CONFIG_HAVE_NMI_WATCHDOG)
> static inline void arch_touch_nmi_watchdog(void) {}
> #endif
> +#endif
>
> /**
> * touch_nmi_watchdog - restart NMI watchdog timeout.
> @@ -141,15 +149,18 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
> }
> #endif
>
> -#ifdef CONFIG_LOCKUP_DETECTOR
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
> u64 hw_nmi_get_sample_period(int watchdog_thresh);
> +#endif
> +
> +#ifdef CONFIG_LOCKUP_DETECTOR
> extern int nmi_watchdog_enabled;
> -extern int soft_watchdog_enabled;
> extern int watchdog_user_enabled;
> extern int watchdog_thresh;
> extern unsigned long watchdog_enabled;
> +extern struct cpumask watchdog_cpumask;
> extern unsigned long *watchdog_cpumask_bits;
> -extern atomic_t watchdog_park_in_progress;
> +extern int __read_mostly watchdog_suspended;
> #ifdef CONFIG_SMP
> extern int sysctl_softlockup_all_cpu_backtrace;
> extern int sysctl_hardlockup_all_cpu_backtrace;
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 72aa080f91f0..4cb8e8b23c6e 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -82,7 +82,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o
> obj-$(CONFIG_KGDB) += debug/
> obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
> obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
> -obj-$(CONFIG_HARDLOCKUP_DETECTOR) += watchdog_hld.o
> +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
> obj-$(CONFIG_SECCOMP) += seccomp.o
> obj-$(CONFIG_RELAY) += relay.o
> obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 4dfba1a76cc3..46b189fd865b 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -880,6 +880,14 @@ static struct ctl_table kern_table[] = {
> #endif
> },
> {
> + .procname = "watchdog_cpumask",
> + .data = &watchdog_cpumask_bits,
> + .maxlen = NR_CPUS,
> + .mode = 0644,
> + .proc_handler = proc_watchdog_cpumask,
> + },
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> + {
> .procname = "soft_watchdog",
> .data = &soft_watchdog_enabled,
> .maxlen = sizeof (int),
> @@ -889,13 +897,6 @@ static struct ctl_table kern_table[] = {
> .extra2 = &one,
> },
> {
> - .procname = "watchdog_cpumask",
> - .data = &watchdog_cpumask_bits,
> - .maxlen = NR_CPUS,
> - .mode = 0644,
> - .proc_handler = proc_watchdog_cpumask,
> - },
> - {
> .procname = "softlockup_panic",
> .data = &softlockup_panic,
> .maxlen = sizeof(int),
> @@ -904,6 +905,7 @@ static struct ctl_table kern_table[] = {
> .extra1 = &zero,
> .extra2 = &one,
> },
> +#endif
> #ifdef CONFIG_HARDLOCKUP_DETECTOR
> {
> .procname = "hardlockup_panic",
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 03e0b69bb5bf..deb010505646 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -29,15 +29,55 @@
> #include <linux/kvm_para.h>
> #include <linux/kthread.h>
>
> +/* Watchdog configuration */
> static DEFINE_MUTEX(watchdog_proc_mutex);
>
> -#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
> +int __read_mostly nmi_watchdog_enabled;
> +
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
> +
> +/* boot commands */
> +/*
> + * Should we panic when a soft-lockup or hard-lockup occurs:
> + */
> +unsigned int __read_mostly hardlockup_panic =
> + CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
> +/*
> + * We may not want to enable hard lockup detection by default in all cases,
> + * for example when running the kernel as a guest on a hypervisor. In these
> + * cases this function can be called to disable hard lockup detection. This
> + * function should only be executed once by the boot processor before the
> + * kernel command line parameters are parsed, because otherwise it is not
> + * possible to override this in hardlockup_panic_setup().
> + */
> +void hardlockup_detector_disable(void)
> +{
> + watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
> +}
> +
> +static int __init hardlockup_panic_setup(char *str)
> +{
> + if (!strncmp(str, "panic", 5))
> + hardlockup_panic = 1;
> + else if (!strncmp(str, "nopanic", 7))
> + hardlockup_panic = 0;
> + else if (!strncmp(str, "0", 1))
> + watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
> + else if (!strncmp(str, "1", 1))
> + watchdog_enabled |= NMI_WATCHDOG_ENABLED;
> + return 1;
> +}
> +__setup("nmi_watchdog=", hardlockup_panic_setup);
> +
> #else
> unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> #endif
> -int __read_mostly nmi_watchdog_enabled;
> +
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> int __read_mostly soft_watchdog_enabled;
> +#endif
> +
> int __read_mostly watchdog_user_enabled;
> int __read_mostly watchdog_thresh = 10;
>
> @@ -45,15 +85,9 @@ int __read_mostly watchdog_thresh = 10;
> int __read_mostly sysctl_softlockup_all_cpu_backtrace;
> int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
> #endif
> -static struct cpumask watchdog_cpumask __read_mostly;
> +struct cpumask watchdog_cpumask __read_mostly;
> unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>
> -/* Helper for online, unparked cpus. */
> -#define for_each_watchdog_cpu(cpu) \
> - for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
> -
> -atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
> -
> /*
> * The 'watchdog_running' variable is set to 1 when the watchdog threads
> * are registered/started and is set to 0 when the watchdog threads are
> @@ -72,7 +106,27 @@ static int __read_mostly watchdog_running;
> * of 'watchdog_running' cannot change while the watchdog is deactivated
> * temporarily (see related code in 'proc' handlers).
> */
> -static int __read_mostly watchdog_suspended;
> +int __read_mostly watchdog_suspended;
> +
> +/*
> + * These functions can be overridden if an architecture implements its
> + * own hardlockup detector.
> + */
> +int __weak watchdog_nmi_enable(unsigned int cpu)
> +{
> + return 0;
> +}
> +void __weak watchdog_nmi_disable(unsigned int cpu)
> +{
> +}
> +
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> +
> +/* Helper for online, unparked cpus. */
> +#define for_each_watchdog_cpu(cpu) \
> + for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
> +
> +atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
>
> static u64 __read_mostly sample_period;
>
> @@ -120,6 +174,7 @@ static int __init softlockup_all_cpu_backtrace_setup(char *str)
> return 1;
> }
> __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> {
> sysctl_hardlockup_all_cpu_backtrace =
> @@ -128,6 +183,7 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> }
> __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
> #endif
> +#endif
>
> /*
> * Hard-lockup warnings should be triggered after just a few seconds. Soft-
> @@ -213,18 +269,6 @@ void touch_softlockup_watchdog_sync(void)
> __this_cpu_write(watchdog_touch_ts, 0);
> }
>
> -/* watchdog detector functions */
> -bool is_hardlockup(void)
> -{
> - unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> -
> - if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> - return true;
> -
> - __this_cpu_write(hrtimer_interrupts_saved, hrint);
> - return false;
> -}
> -
> static int is_softlockup(unsigned long touch_ts)
> {
> unsigned long now = get_timestamp();
> @@ -237,21 +281,21 @@ static int is_softlockup(unsigned long touch_ts)
> return 0;
> }
>
> -static void watchdog_interrupt_count(void)
> +/* watchdog detector functions */
> +bool is_hardlockup(void)
> {
> - __this_cpu_inc(hrtimer_interrupts);
> -}
> + unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
>
> -/*
> - * These two functions are mostly architecture specific
> - * defining them as weak here.
> - */
> -int __weak watchdog_nmi_enable(unsigned int cpu)
> -{
> - return 0;
> + if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> + return true;
> +
> + __this_cpu_write(hrtimer_interrupts_saved, hrint);
> + return false;
> }
> -void __weak watchdog_nmi_disable(unsigned int cpu)
> +
> +static void watchdog_interrupt_count(void)
> {
> + __this_cpu_inc(hrtimer_interrupts);
> }
>
> static int watchdog_enable_all_cpus(void);
> @@ -502,57 +546,6 @@ static void watchdog_unpark_threads(void)
> kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> }
>
> -/*
> - * Suspend the hard and soft lockup detector by parking the watchdog threads.
> - */
> -int lockup_detector_suspend(void)
> -{
> - int ret = 0;
> -
> - get_online_cpus();
> - mutex_lock(&watchdog_proc_mutex);
> - /*
> - * Multiple suspend requests can be active in parallel (counted by
> - * the 'watchdog_suspended' variable). If the watchdog threads are
> - * running, the first caller takes care that they will be parked.
> - * The state of 'watchdog_running' cannot change while a suspend
> - * request is active (see related code in 'proc' handlers).
> - */
> - if (watchdog_running && !watchdog_suspended)
> - ret = watchdog_park_threads();
> -
> - if (ret == 0)
> - watchdog_suspended++;
> - else {
> - watchdog_disable_all_cpus();
> - pr_err("Failed to suspend lockup detectors, disabled\n");
> - watchdog_enabled = 0;
> - }
> -
> - mutex_unlock(&watchdog_proc_mutex);
> -
> - return ret;
> -}
> -
> -/*
> - * Resume the hard and soft lockup detector by unparking the watchdog threads.
> - */
> -void lockup_detector_resume(void)
> -{
> - mutex_lock(&watchdog_proc_mutex);
> -
> - watchdog_suspended--;
> - /*
> - * The watchdog threads are unparked if they were previously running
> - * and if there is no more active suspend request.
> - */
> - if (watchdog_running && !watchdog_suspended)
> - watchdog_unpark_threads();
> -
> - mutex_unlock(&watchdog_proc_mutex);
> - put_online_cpus();
> -}
> -
> static int update_watchdog_all_cpus(void)
> {
> int ret;
> @@ -604,6 +597,81 @@ static void watchdog_disable_all_cpus(void)
> }
> }
>
> +#else /* SOFTLOCKUP */
> +static int watchdog_park_threads(void)
> +{
> + return 0;
> +}
> +
> +static void watchdog_unpark_threads(void)
> +{
> +}
> +
> +static int watchdog_enable_all_cpus(void)
> +{
> + return 0;
> +}
> +
> +static void watchdog_disable_all_cpus(void)
> +{
> +}
> +
> +static void set_sample_period(void)
> +{
> +}
> +#endif /* SOFTLOCKUP */
> +
> +/*
> + * Suspend the hard and soft lockup detector by parking the watchdog threads.
> + */
> +int lockup_detector_suspend(void)
> +{
> + int ret = 0;
> +
> + get_online_cpus();
> + mutex_lock(&watchdog_proc_mutex);
> + /*
> + * Multiple suspend requests can be active in parallel (counted by
> + * the 'watchdog_suspended' variable). If the watchdog threads are
> + * running, the first caller takes care that they will be parked.
> + * The state of 'watchdog_running' cannot change while a suspend
> + * request is active (see related code in 'proc' handlers).
> + */
> + if (watchdog_running && !watchdog_suspended)
> + ret = watchdog_park_threads();
> +
> + if (ret == 0)
> + watchdog_suspended++;
> + else {
> + watchdog_disable_all_cpus();
> + pr_err("Failed to suspend lockup detectors, disabled\n");
> + watchdog_enabled = 0;
> + }
> +
> + mutex_unlock(&watchdog_proc_mutex);
> +
> + return ret;
> +}
> +
> +/*
> + * Resume the hard and soft lockup detector by unparking the watchdog threads.
> + */
> +void lockup_detector_resume(void)
> +{
> + mutex_lock(&watchdog_proc_mutex);
> +
> + watchdog_suspended--;
> + /*
> + * The watchdog threads are unparked if they were previously running
> + * and if there is no more active suspend request.
> + */
> + if (watchdog_running && !watchdog_suspended)
> + watchdog_unpark_threads();
> +
> + mutex_unlock(&watchdog_proc_mutex);
> + put_online_cpus();
> +}
> +
> #ifdef CONFIG_SYSCTL
>
> /*
> @@ -810,9 +878,11 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
> * a temporary cpumask, so we are likely not in a
> * position to do much else to make things better.
> */
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> if (smpboot_update_cpumask_percpu_thread(
> &watchdog_threads, &watchdog_cpumask) != 0)
> pr_err("cpumask update failed\n");
> +#endif
> }
> }
> out:
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 90d688df6ce1..295a0d84934c 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -22,39 +22,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
>
> -/* boot commands */
> -/*
> - * Should we panic when a soft-lockup or hard-lockup occurs:
> - */
> -unsigned int __read_mostly hardlockup_panic =
> - CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
> static unsigned long hardlockup_allcpu_dumped;
> -/*
> - * We may not want to enable hard lockup detection by default in all cases,
> - * for example when running the kernel as a guest on a hypervisor. In these
> - * cases this function can be called to disable hard lockup detection. This
> - * function should only be executed once by the boot processor before the
> - * kernel command line parameters are parsed, because otherwise it is not
> - * possible to override this in hardlockup_panic_setup().
> - */
> -void hardlockup_detector_disable(void)
> -{
> - watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
> -}
> -
> -static int __init hardlockup_panic_setup(char *str)
> -{
> - if (!strncmp(str, "panic", 5))
> - hardlockup_panic = 1;
> - else if (!strncmp(str, "nopanic", 7))
> - hardlockup_panic = 0;
> - else if (!strncmp(str, "0", 1))
> - watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
> - else if (!strncmp(str, "1", 1))
> - watchdog_enabled |= NMI_WATCHDOG_ENABLED;
> - return 1;
> -}
> -__setup("nmi_watchdog=", hardlockup_panic_setup);
>
> void arch_touch_nmi_watchdog(void)
> {
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index e4587ebe52c7..f87a559dd178 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -801,10 +801,24 @@ config LOCKUP_DETECTOR
> The frequency of hrtimer and NMI events and the soft and hard lockup
> thresholds can be controlled through the sysctl watchdog_thresh.
>
> +config SOFTLOCKUP_DETECTOR
> + bool "Detect Soft Lockups"
> + depends on LOCKUP_DETECTOR
> +
> +config HARDLOCKUP_DETECTOR_PERF
> + bool
> + select SOFTLOCKUP_DETECTOR
> +
> +#
> +# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
> +# lockup detector rather than the perf based detector.
> +#
> config HARDLOCKUP_DETECTOR
> - def_bool y
> - depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> - depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> + bool "Detect Hard Lockups"
> + depends on LOCKUP_DETECTOR
> + depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
> + select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
> + select HARDLOCKUP_DETECTOR_ARCH if HAVE_HARDLOCKUP_DETECTOR_ARCH
>
> config BOOTPARAM_HARDLOCKUP_PANIC
> bool "Panic (Reboot) On Hard Lockups"
> @@ -826,7 +840,7 @@ config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
>
> config BOOTPARAM_SOFTLOCKUP_PANIC
> bool "Panic (Reboot) On Soft Lockups"
> - depends on LOCKUP_DETECTOR
> + depends on SOFTLOCKUP_DETECTOR
> help
> Say Y here to enable the kernel to panic on "soft lockups",
> which are bugs that cause the kernel to loop in kernel
> @@ -843,7 +857,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>
> config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
> int
> - depends on LOCKUP_DETECTOR
> + depends on SOFTLOCKUP_DETECTOR
> range 0 1
> default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
> default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
> @@ -851,7 +865,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
> config DETECT_HUNG_TASK
> bool "Detect Hung Tasks"
> depends on DEBUG_KERNEL
> - default LOCKUP_DETECTOR
> + default SOFTLOCKUP_DETECTOR
> help
> Say Y here to enable the kernel to detect "hung tasks",
> which are bugs that cause the task to be stuck in
> --
> 2.11.0
>

2017-06-15 02:16:18

by Babu Moger

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

Hi Don,

On 6/14/2017 9:09 AM, Don Zickus wrote:
> On Wed, Jun 14, 2017 at 02:11:18AM +1000, Nicholas Piggin wrote:
>>> Yeah, if you wouldn't mind. Sorry for dragging this out, but I feel like we
>>> are getting close to having this defined properly, which would allow us to
>>> split the code up correctly in the future.
>> How's this for a replacement patch 3? I think the Kconfig works out much
>> better now.
> Hi Nick,
>
> I think you made this much clearer, thank you! I am good with this.
>
>
> Hi Babu,
>
> Can you give this patchset (and particularly this version of patch 3) a try
> on sparc to make sure we didn't break anything? I believe this should
> resolve the start nmi watchdog on boot issue you noticed. Thanks!

There is still one problem with the patch.

# cat /proc/sys/kernel/watchdog
1
# cat /proc/sys/kernel/nmi_watchdog
0

The problem is setting the initial value for "nmi_watchdog".
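As far as I can tell this is because sparc selects HAVE_NMI_WATCHDOG but not
HARDLOCKUP_DETECTOR, so after the split the watchdog_enabled initialization in
kernel/watchdog.c reduces to the #else branch and the NMI_WATCHDOG_ENABLED bit
never gets set by default (the "nmi_watchdog" proc value just mirrors that bit,
if I read the proc handlers right). Rough sketch of what the preprocessor sees
on sparc, not tested:

#ifdef CONFIG_HARDLOCKUP_DETECTOR	/* not set on sparc */
unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
#else					/* taken on sparc, so the NMI bit stays clear */
unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
#endif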

We need something like the following patch (or similar) on top to address this.
============================================
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5397c63..0105856 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -34,9 +34,13 @@

int __read_mostly nmi_watchdog_enabled;

-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
+#else
+unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
+#endif

+#ifdef CONFIG_HARDLOCKUP_DETECTOR
/* boot commands */
/*
* Should we panic when a soft-lockup or hard-lockup occurs:
@@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
return 1;
}
__setup("nmi_watchdog=", hardlockup_panic_setup);
-
-#else
-unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
#endif

#ifdef CONFIG_SOFTLOCKUP_DETECTOR


>
> Cheers,
> Don
>
>> --
>>
>> Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
>> HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
>>
>> LOCKUP_DETECTOR implies the general boot, sysctl, and programming
>> interfaces for the lockup detectors.
>>
>> An architecture that wants to use a hard lockup detector must define
>> HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.
>>
>> Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides
>> the minimum arch_touch_nmi_watchdog, and it otherwise does its own
>> thing and does not implement the LOCKUP_DETECTOR interfaces.
>>
>> sparc is unusual in that it has started to implement some of the
>> interfaces, but not fully yet. It should probably be converted to
>> a full HAVE_HARDLOCKUP_DETECTOR_ARCH.
>>
>> Signed-off-by: Nicholas Piggin <[email protected]>
>> ---
>> arch/Kconfig | 23 ++++
>> arch/powerpc/Kconfig | 1 +
>> arch/powerpc/kernel/setup_64.c | 2 +-
>> arch/x86/Kconfig | 1 +
>> arch/x86/kernel/apic/hw_nmi.c | 2 +-
>> include/linux/nmi.h | 29 +++--
>> kernel/Makefile | 2 +-
>> kernel/sysctl.c | 16 +--
>> kernel/watchdog.c | 238 ++++++++++++++++++++++++++---------------
>> kernel/watchdog_hld.c | 32 ------
>> lib/Kconfig.debug | 26 +++--
>> 11 files changed, 231 insertions(+), 141 deletions(-)
>>
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 6c00e5b00f8b..878addc6f141 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -288,6 +288,29 @@ config HAVE_PERF_EVENTS_NMI
>> subsystem. Also has support for calculating CPU cycle events
>> to determine how many clock cycles in a given period.
>>
>> +
>> +config HAVE_HARDLOCKUP_DETECTOR_PERF
>> + bool
>> + depends on HAVE_PERF_EVENTS_NMI
>> + help
>> + The arch chooses to use the generic perf-NMI-based hardlockup
>> + detector. Must define HAVE_PERF_EVENTS_NMI.
>> +
>> +config HAVE_NMI_WATCHDOG
>> + bool
>> + help
>> + The arch provides a low level NMI watchdog. It provides
>> + asm/nmi.h, and defines its own arch_touch_nmi_watchdog().
>> +
>> +config HAVE_HARDLOCKUP_DETECTOR_ARCH
>> + bool
>> + select HAVE_NMI_WATCHDOG
>> + help
>> + The arch chooses to provide its own hardlockup detector, which is
>> + a superset of the HAVE_NMI_WATCHDOG. It also conforms to config
>> + interfaces and parameters provided by hardlockup detector subsystem.
>> +
>> +
>> config HAVE_PERF_REGS
>> bool
>> help
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index f7c8f9972f61..7aba96ec3378 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -202,6 +202,7 @@ config PPC
>> select HAVE_OPTPROBES if PPC64
>> select HAVE_PERF_EVENTS
>> select HAVE_PERF_EVENTS_NMI if PPC64
>> + select HAVE_HARDLOCKUP_DETECTOR_PERF if HAVE_PERF_EVENTS_NMI
>> select HAVE_PERF_REGS
>> select HAVE_PERF_USER_STACK_DUMP
>> select HAVE_RCU_TABLE_FREE if SMP
>> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
>> index f35ff9dea4fb..ab650905f75a 100644
>> --- a/arch/powerpc/kernel/setup_64.c
>> +++ b/arch/powerpc/kernel/setup_64.c
>> @@ -727,7 +727,7 @@ struct ppc_pci_io ppc_pci_io;
>> EXPORT_SYMBOL(ppc_pci_io);
>> #endif
>>
>> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>> u64 hw_nmi_get_sample_period(int watchdog_thresh)
>> {
>> return ppc_proc_freq * watchdog_thresh;
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 4ccfacc7232a..3c084149b5d1 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -157,6 +157,7 @@ config X86
>> select HAVE_PCSPKR_PLATFORM
>> select HAVE_PERF_EVENTS
>> select HAVE_PERF_EVENTS_NMI
>> + select HAVE_HARDLOCKUP_DETECTOR_PERF if HAVE_PERF_EVENTS_NMI
>> select HAVE_PERF_REGS
>> select HAVE_PERF_USER_STACK_DUMP
>> select HAVE_REGS_AND_STACK_ACCESS_API
>> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
>> index c73c9fb281e1..d6f387780849 100644
>> --- a/arch/x86/kernel/apic/hw_nmi.c
>> +++ b/arch/x86/kernel/apic/hw_nmi.c
>> @@ -19,7 +19,7 @@
>> #include <linux/init.h>
>> #include <linux/delay.h>
>>
>> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>> u64 hw_nmi_get_sample_period(int watchdog_thresh)
>> {
>> return (u64)(cpu_khz) * 1000 * watchdog_thresh;
>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>> index bd387ef8bccd..8aa01fd859fb 100644
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -11,13 +11,21 @@
>> #endif
>>
>> #ifdef CONFIG_LOCKUP_DETECTOR
>> +void lockup_detector_init(void);
>> +#else
>> +static inline void lockup_detector_init(void)
>> +{
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>> extern void touch_softlockup_watchdog_sched(void);
>> extern void touch_softlockup_watchdog(void);
>> extern void touch_softlockup_watchdog_sync(void);
>> extern void touch_all_softlockup_watchdogs(void);
>> extern unsigned int softlockup_panic;
>> -extern unsigned int hardlockup_panic;
>> -void lockup_detector_init(void);
>> +extern int soft_watchdog_enabled;
>> +extern atomic_t watchdog_park_in_progress;
>> #else
>> static inline void touch_softlockup_watchdog_sched(void)
>> {
>> @@ -31,9 +39,6 @@ static inline void touch_softlockup_watchdog_sync(void)
>> static inline void touch_all_softlockup_watchdogs(void)
>> {
>> }
>> -static inline void lockup_detector_init(void)
>> -{
>> -}
>> #endif
>>
>> #ifdef CONFIG_DETECT_HUNG_TASK
>> @@ -63,15 +68,18 @@ static inline void reset_hung_task_detector(void)
>>
>> #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>> extern void hardlockup_detector_disable(void);
>> +extern unsigned int hardlockup_panic;
>> #else
>> static inline void hardlockup_detector_disable(void) {}
>> #endif
>>
>> -#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
>> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
>> extern void arch_touch_nmi_watchdog(void);
>> #else
>> +#if !defined(CONFIG_HAVE_NMI_WATCHDOG)
>> static inline void arch_touch_nmi_watchdog(void) {}
>> #endif
>> +#endif
>>
>> /**
>> * touch_nmi_watchdog - restart NMI watchdog timeout.
>> @@ -141,15 +149,18 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
>> }
>> #endif
>>
>> -#ifdef CONFIG_LOCKUP_DETECTOR
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>> u64 hw_nmi_get_sample_period(int watchdog_thresh);
>> +#endif
>> +
>> +#ifdef CONFIG_LOCKUP_DETECTOR
>> extern int nmi_watchdog_enabled;
>> -extern int soft_watchdog_enabled;
>> extern int watchdog_user_enabled;
>> extern int watchdog_thresh;
>> extern unsigned long watchdog_enabled;
>> +extern struct cpumask watchdog_cpumask;
>> extern unsigned long *watchdog_cpumask_bits;
>> -extern atomic_t watchdog_park_in_progress;
>> +extern int __read_mostly watchdog_suspended;
>> #ifdef CONFIG_SMP
>> extern int sysctl_softlockup_all_cpu_backtrace;
>> extern int sysctl_hardlockup_all_cpu_backtrace;
>> diff --git a/kernel/Makefile b/kernel/Makefile
>> index 72aa080f91f0..4cb8e8b23c6e 100644
>> --- a/kernel/Makefile
>> +++ b/kernel/Makefile
>> @@ -82,7 +82,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o
>> obj-$(CONFIG_KGDB) += debug/
>> obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
>> obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
>> -obj-$(CONFIG_HARDLOCKUP_DETECTOR) += watchdog_hld.o
>> +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
>> obj-$(CONFIG_SECCOMP) += seccomp.o
>> obj-$(CONFIG_RELAY) += relay.o
>> obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index 4dfba1a76cc3..46b189fd865b 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -880,6 +880,14 @@ static struct ctl_table kern_table[] = {
>> #endif
>> },
>> {
>> + .procname = "watchdog_cpumask",
>> + .data = &watchdog_cpumask_bits,
>> + .maxlen = NR_CPUS,
>> + .mode = 0644,
>> + .proc_handler = proc_watchdog_cpumask,
>> + },
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>> + {
>> .procname = "soft_watchdog",
>> .data = &soft_watchdog_enabled,
>> .maxlen = sizeof (int),
>> @@ -889,13 +897,6 @@ static struct ctl_table kern_table[] = {
>> .extra2 = &one,
>> },
>> {
>> - .procname = "watchdog_cpumask",
>> - .data = &watchdog_cpumask_bits,
>> - .maxlen = NR_CPUS,
>> - .mode = 0644,
>> - .proc_handler = proc_watchdog_cpumask,
>> - },
>> - {
>> .procname = "softlockup_panic",
>> .data = &softlockup_panic,
>> .maxlen = sizeof(int),
>> @@ -904,6 +905,7 @@ static struct ctl_table kern_table[] = {
>> .extra1 = &zero,
>> .extra2 = &one,
>> },
>> +#endif
>> #ifdef CONFIG_HARDLOCKUP_DETECTOR
>> {
>> .procname = "hardlockup_panic",
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 03e0b69bb5bf..deb010505646 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -29,15 +29,55 @@
>> #include <linux/kvm_para.h>
>> #include <linux/kthread.h>
>>
>> +/* Watchdog configuration */
>> static DEFINE_MUTEX(watchdog_proc_mutex);
>>
>> -#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
>> +int __read_mostly nmi_watchdog_enabled;
>> +
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>> unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
>> +
>> +/* boot commands */
>> +/*
>> + * Should we panic when a soft-lockup or hard-lockup occurs:
>> + */
>> +unsigned int __read_mostly hardlockup_panic =
>> + CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
>> +/*
>> + * We may not want to enable hard lockup detection by default in all cases,
>> + * for example when running the kernel as a guest on a hypervisor. In these
>> + * cases this function can be called to disable hard lockup detection. This
>> + * function should only be executed once by the boot processor before the
>> + * kernel command line parameters are parsed, because otherwise it is not
>> + * possible to override this in hardlockup_panic_setup().
>> + */
>> +void hardlockup_detector_disable(void)
>> +{
>> + watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
>> +}
>> +
>> +static int __init hardlockup_panic_setup(char *str)
>> +{
>> + if (!strncmp(str, "panic", 5))
>> + hardlockup_panic = 1;
>> + else if (!strncmp(str, "nopanic", 7))
>> + hardlockup_panic = 0;
>> + else if (!strncmp(str, "0", 1))
>> + watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
>> + else if (!strncmp(str, "1", 1))
>> + watchdog_enabled |= NMI_WATCHDOG_ENABLED;
>> + return 1;
>> +}
>> +__setup("nmi_watchdog=", hardlockup_panic_setup);
>> +
>> #else
>> unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
>> #endif
>> -int __read_mostly nmi_watchdog_enabled;
>> +
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>> int __read_mostly soft_watchdog_enabled;
>> +#endif
>> +
>> int __read_mostly watchdog_user_enabled;
>> int __read_mostly watchdog_thresh = 10;
>>
>> @@ -45,15 +85,9 @@ int __read_mostly watchdog_thresh = 10;
>> int __read_mostly sysctl_softlockup_all_cpu_backtrace;
>> int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
>> #endif
>> -static struct cpumask watchdog_cpumask __read_mostly;
>> +struct cpumask watchdog_cpumask __read_mostly;
>> unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>>
>> -/* Helper for online, unparked cpus. */
>> -#define for_each_watchdog_cpu(cpu) \
>> - for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
>> -
>> -atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
>> -
>> /*
>> * The 'watchdog_running' variable is set to 1 when the watchdog threads
>> * are registered/started and is set to 0 when the watchdog threads are
>> @@ -72,7 +106,27 @@ static int __read_mostly watchdog_running;
>> * of 'watchdog_running' cannot change while the watchdog is deactivated
>> * temporarily (see related code in 'proc' handlers).
>> */
>> -static int __read_mostly watchdog_suspended;
>> +int __read_mostly watchdog_suspended;
>> +
>> +/*
>> + * These functions can be overridden if an architecture implements its
>> + * own hardlockup detector.
>> + */
>> +int __weak watchdog_nmi_enable(unsigned int cpu)
>> +{
>> + return 0;
>> +}
>> +void __weak watchdog_nmi_disable(unsigned int cpu)
>> +{
>> +}
>> +
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>> +
>> +/* Helper for online, unparked cpus. */
>> +#define for_each_watchdog_cpu(cpu) \
>> + for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
>> +
>> +atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
>>
>> static u64 __read_mostly sample_period;
>>
>> @@ -120,6 +174,7 @@ static int __init softlockup_all_cpu_backtrace_setup(char *str)
>> return 1;
>> }
>> __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>> static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>> {
>> sysctl_hardlockup_all_cpu_backtrace =
>> @@ -128,6 +183,7 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>> }
>> __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>> #endif
>> +#endif
>>
>> /*
>> * Hard-lockup warnings should be triggered after just a few seconds. Soft-
>> @@ -213,18 +269,6 @@ void touch_softlockup_watchdog_sync(void)
>> __this_cpu_write(watchdog_touch_ts, 0);
>> }
>>
>> -/* watchdog detector functions */
>> -bool is_hardlockup(void)
>> -{
>> - unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
>> -
>> - if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
>> - return true;
>> -
>> - __this_cpu_write(hrtimer_interrupts_saved, hrint);
>> - return false;
>> -}
>> -
>> static int is_softlockup(unsigned long touch_ts)
>> {
>> unsigned long now = get_timestamp();
>> @@ -237,21 +281,21 @@ static int is_softlockup(unsigned long touch_ts)
>> return 0;
>> }
>>
>> -static void watchdog_interrupt_count(void)
>> +/* watchdog detector functions */
>> +bool is_hardlockup(void)
>> {
>> - __this_cpu_inc(hrtimer_interrupts);
>> -}
>> + unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
>>
>> -/*
>> - * These two functions are mostly architecture specific
>> - * defining them as weak here.
>> - */
>> -int __weak watchdog_nmi_enable(unsigned int cpu)
>> -{
>> - return 0;
>> + if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
>> + return true;
>> +
>> + __this_cpu_write(hrtimer_interrupts_saved, hrint);
>> + return false;
>> }
>> -void __weak watchdog_nmi_disable(unsigned int cpu)
>> +
>> +static void watchdog_interrupt_count(void)
>> {
>> + __this_cpu_inc(hrtimer_interrupts);
>> }
>>
>> static int watchdog_enable_all_cpus(void);
>> @@ -502,57 +546,6 @@ static void watchdog_unpark_threads(void)
>> kthread_unpark(per_cpu(softlockup_watchdog, cpu));
>> }
>>
>> -/*
>> - * Suspend the hard and soft lockup detector by parking the watchdog threads.
>> - */
>> -int lockup_detector_suspend(void)
>> -{
>> - int ret = 0;
>> -
>> - get_online_cpus();
>> - mutex_lock(&watchdog_proc_mutex);
>> - /*
>> - * Multiple suspend requests can be active in parallel (counted by
>> - * the 'watchdog_suspended' variable). If the watchdog threads are
>> - * running, the first caller takes care that they will be parked.
>> - * The state of 'watchdog_running' cannot change while a suspend
>> - * request is active (see related code in 'proc' handlers).
>> - */
>> - if (watchdog_running && !watchdog_suspended)
>> - ret = watchdog_park_threads();
>> -
>> - if (ret == 0)
>> - watchdog_suspended++;
>> - else {
>> - watchdog_disable_all_cpus();
>> - pr_err("Failed to suspend lockup detectors, disabled\n");
>> - watchdog_enabled = 0;
>> - }
>> -
>> - mutex_unlock(&watchdog_proc_mutex);
>> -
>> - return ret;
>> -}
>> -
>> -/*
>> - * Resume the hard and soft lockup detector by unparking the watchdog threads.
>> - */
>> -void lockup_detector_resume(void)
>> -{
>> - mutex_lock(&watchdog_proc_mutex);
>> -
>> - watchdog_suspended--;
>> - /*
>> - * The watchdog threads are unparked if they were previously running
>> - * and if there is no more active suspend request.
>> - */
>> - if (watchdog_running && !watchdog_suspended)
>> - watchdog_unpark_threads();
>> -
>> - mutex_unlock(&watchdog_proc_mutex);
>> - put_online_cpus();
>> -}
>> -
>> static int update_watchdog_all_cpus(void)
>> {
>> int ret;
>> @@ -604,6 +597,81 @@ static void watchdog_disable_all_cpus(void)
>> }
>> }
>>
>> +#else /* SOFTLOCKUP */
>> +static int watchdog_park_threads(void)
>> +{
>> + return 0;
>> +}
>> +
>> +static void watchdog_unpark_threads(void)
>> +{
>> +}
>> +
>> +static int watchdog_enable_all_cpus(void)
>> +{
>> + return 0;
>> +}
>> +
>> +static void watchdog_disable_all_cpus(void)
>> +{
>> +}
>> +
>> +static void set_sample_period(void)
>> +{
>> +}
>> +#endif /* SOFTLOCKUP */
>> +
>> +/*
>> + * Suspend the hard and soft lockup detector by parking the watchdog threads.
>> + */
>> +int lockup_detector_suspend(void)
>> +{
>> + int ret = 0;
>> +
>> + get_online_cpus();
>> + mutex_lock(&watchdog_proc_mutex);
>> + /*
>> + * Multiple suspend requests can be active in parallel (counted by
>> + * the 'watchdog_suspended' variable). If the watchdog threads are
>> + * running, the first caller takes care that they will be parked.
>> + * The state of 'watchdog_running' cannot change while a suspend
>> + * request is active (see related code in 'proc' handlers).
>> + */
>> + if (watchdog_running && !watchdog_suspended)
>> + ret = watchdog_park_threads();
>> +
>> + if (ret == 0)
>> + watchdog_suspended++;
>> + else {
>> + watchdog_disable_all_cpus();
>> + pr_err("Failed to suspend lockup detectors, disabled\n");
>> + watchdog_enabled = 0;
>> + }
>> +
>> + mutex_unlock(&watchdog_proc_mutex);
>> +
>> + return ret;
>> +}
>> +
>> +/*
>> + * Resume the hard and soft lockup detector by unparking the watchdog threads.
>> + */
>> +void lockup_detector_resume(void)
>> +{
>> + mutex_lock(&watchdog_proc_mutex);
>> +
>> + watchdog_suspended--;
>> + /*
>> + * The watchdog threads are unparked if they were previously running
>> + * and if there is no more active suspend request.
>> + */
>> + if (watchdog_running && !watchdog_suspended)
>> + watchdog_unpark_threads();
>> +
>> + mutex_unlock(&watchdog_proc_mutex);
>> + put_online_cpus();
>> +}
>> +
>> #ifdef CONFIG_SYSCTL
>>
>> /*
>> @@ -810,9 +878,11 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
>> * a temporary cpumask, so we are likely not in a
>> * position to do much else to make things better.
>> */
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>> if (smpboot_update_cpumask_percpu_thread(
>> &watchdog_threads, &watchdog_cpumask) != 0)
>> pr_err("cpumask update failed\n");
>> +#endif
>> }
>> }
>> out:
>> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
>> index 90d688df6ce1..295a0d84934c 100644
>> --- a/kernel/watchdog_hld.c
>> +++ b/kernel/watchdog_hld.c
>> @@ -22,39 +22,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>> static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>> static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
>>
>> -/* boot commands */
>> -/*
>> - * Should we panic when a soft-lockup or hard-lockup occurs:
>> - */
>> -unsigned int __read_mostly hardlockup_panic =
>> - CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
>> static unsigned long hardlockup_allcpu_dumped;
>> -/*
>> - * We may not want to enable hard lockup detection by default in all cases,
>> - * for example when running the kernel as a guest on a hypervisor. In these
>> - * cases this function can be called to disable hard lockup detection. This
>> - * function should only be executed once by the boot processor before the
>> - * kernel command line parameters are parsed, because otherwise it is not
>> - * possible to override this in hardlockup_panic_setup().
>> - */
>> -void hardlockup_detector_disable(void)
>> -{
>> - watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
>> -}
>> -
>> -static int __init hardlockup_panic_setup(char *str)
>> -{
>> - if (!strncmp(str, "panic", 5))
>> - hardlockup_panic = 1;
>> - else if (!strncmp(str, "nopanic", 7))
>> - hardlockup_panic = 0;
>> - else if (!strncmp(str, "0", 1))
>> - watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
>> - else if (!strncmp(str, "1", 1))
>> - watchdog_enabled |= NMI_WATCHDOG_ENABLED;
>> - return 1;
>> -}
>> -__setup("nmi_watchdog=", hardlockup_panic_setup);
>>
>> void arch_touch_nmi_watchdog(void)
>> {
>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>> index e4587ebe52c7..f87a559dd178 100644
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -801,10 +801,24 @@ config LOCKUP_DETECTOR
>> The frequency of hrtimer and NMI events and the soft and hard lockup
>> thresholds can be controlled through the sysctl watchdog_thresh.
>>
>> +config SOFTLOCKUP_DETECTOR
>> + bool "Detect Soft Lockups"
>> + depends on LOCKUP_DETECTOR
>> +
>> +config HARDLOCKUP_DETECTOR_PERF
>> + bool
>> + select SOFTLOCKUP_DETECTOR
>> +
>> +#
>> +# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
>> +# lockup detector rather than the perf based detector.
>> +#
>> config HARDLOCKUP_DETECTOR
>> - def_bool y
>> - depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
>> - depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
>> + bool "Detect Hard Lockups"
>> + depends on LOCKUP_DETECTOR
>> + depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
>> + select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
>> + select HARDLOCKUP_DETECTOR_ARCH if HAVE_HARDLOCKUP_DETECTOR_ARCH
>>
>> config BOOTPARAM_HARDLOCKUP_PANIC
>> bool "Panic (Reboot) On Hard Lockups"
>> @@ -826,7 +840,7 @@ config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
>>
>> config BOOTPARAM_SOFTLOCKUP_PANIC
>> bool "Panic (Reboot) On Soft Lockups"
>> - depends on LOCKUP_DETECTOR
>> + depends on SOFTLOCKUP_DETECTOR
>> help
>> Say Y here to enable the kernel to panic on "soft lockups",
>> which are bugs that cause the kernel to loop in kernel
>> @@ -843,7 +857,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>>
>> config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
>> int
>> - depends on LOCKUP_DETECTOR
>> + depends on SOFTLOCKUP_DETECTOR
>> range 0 1
>> default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
>> default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
>> @@ -851,7 +865,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
>> config DETECT_HUNG_TASK
>> bool "Detect Hung Tasks"
>> depends on DEBUG_KERNEL
>> - default LOCKUP_DETECTOR
>> + default SOFTLOCKUP_DETECTOR
>> help
>> Say Y here to enable the kernel to detect "hung tasks",
>> which are bugs that cause the task to be stuck in
>> --
>> 2.11.0
>>

2017-06-15 03:04:17

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Wed, 14 Jun 2017 21:16:04 -0500
Babu Moger <[email protected]> wrote:

> Hi Don,
>
> On 6/14/2017 9:09 AM, Don Zickus wrote:
> > On Wed, Jun 14, 2017 at 02:11:18AM +1000, Nicholas Piggin wrote:
> >>> Yeah, if you wouldn't mind. Sorry for dragging this out, but I feel like we
> >>> are getting close to having this defined properly, which would allow us to
> >>> split the code up correctly in the future.
> >> How's this for a replacement patch 3? I think the Kconfig works out much
> >> better now.
> > Hi Nick,
> >
> > I think you made this much clearer, thank you! I am good with this.
> >
> >
> > Hi Babu,
> >
> > Can you give this patchset (and particularly this version of patch 3) a try
> > on sparc to make sure we didn't break anything? I believe this should
> > resolve the start nmi watchdog on boot issue you noticed. Thanks!
>
> There is still one problem with the patch.
>
> # cat /proc/sys/kernel/watchdog
> 1
> # cat /proc/sys/kernel/nmi_watchdog
> 0
>
> The problem is setting the initial value for "nmi_watchdog".
>
> We need something like the following patch (or similar) on top to address this.
> ============================================
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 5397c63..0105856 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -34,9 +34,13 @@
>
> int __read_mostly nmi_watchdog_enabled;
>
> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR) ||
> defined(CONFIG_HAVE_NMI_WATCHDOG)
> unsigned long __read_mostly watchdog_enabled =
> SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
> +#else
> +unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> +#endif
>
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> /* boot commands */
> /*
> * Should we panic when a soft-lockup or hard-lockup occurs:
> @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
> return 1;
> }
> __setup("nmi_watchdog=", hardlockup_panic_setup);
> -
> -#else
> -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> #endif
>
> #ifdef CONFIG_SOFTLOCKUP_DETECTOR

Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
also relies on the watchdog_enabled value.

I guess I can fold your incremental patch in. I hope we could get
sparc quickly to adopt the complete HAVE_HARDLOCKUP_DETECTOR_ARCH soon
afterwards though, so we only have 2 cases -- complete hardlockup
detector, or the very bare minimum NMI_WATCHDOG.
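
To make that concrete: a full HAVE_HARDLOCKUP_DETECTOR_ARCH conversion would
basically mean the arch providing non-weak versions of the kernel/watchdog.c
hooks and honouring the common watchdog_enabled/watchdog_thresh settings.
Rough sketch only -- the hook signatures are the ones from the patch, the
bodies are placeholders, not sparc code:

#include <linux/nmi.h>

/* arch override of the __weak hook in kernel/watchdog.c */
int watchdog_nmi_enable(unsigned int cpu)
{
	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
		return 0;
	/* arm the arch hard lockup detector on this cpu, scaled by watchdog_thresh */
	return 0;
}

/* arch override of the __weak hook in kernel/watchdog.c */
void watchdog_nmi_disable(unsigned int cpu)
{
	/* disarm the arch hard lockup detector on this cpu */
}

void arch_touch_nmi_watchdog(void)
{
	/* reset this cpu's hard lockup progress marker */
}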

Thanks,
Nick

2017-06-15 15:15:06

by Babu Moger

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

Nick,

On 6/14/2017 10:04 PM, Nicholas Piggin wrote:
> On Wed, 14 Jun 2017 21:16:04 -0500
> Babu Moger <[email protected]> wrote:
>
>> Hi Don,
>>
>> On 6/14/2017 9:09 AM, Don Zickus wrote:
>>> On Wed, Jun 14, 2017 at 02:11:18AM +1000, Nicholas Piggin wrote:
>>>>> Yeah, if you wouldn't mind. Sorry for dragging this out, but I feel like we
>>>>> are getting close to having this defined properly, which would allow us to
>>>>> split the code up correctly in the future.
>>>> How's this for a replacement patch 3? I think the Kconfig works out much
>>>> better now.
>>> Hi Nick,
>>>
>>> I think you made this much clearer, thank you! I am good with this.
>>>
>>>
>>> Hi Babu,
>>>
>>> Can you give this patchset (and particularly this version of patch 3) a try
>>> on sparc to make sure we didn't break anything? I believe this should
>>> resolve the start nmi watchdog on boot issue you noticed. Thanks!
>> There is still one problem with the patch.
>>
>> # cat /proc/sys/kernel/watchdog
>> 1
>> # cat /proc/sys/kernel/nmi_watchdog
>> 0
>>
>> The problem is setting the initial value for "nmi_watchdog".
>>
>> We need something like the following patch (or similar) on top to address this.
>> ============================================
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 5397c63..0105856 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -34,9 +34,13 @@
>>
>> int __read_mostly nmi_watchdog_enabled;
>>
>> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
>> +#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
>> unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
>> +#else
>> +unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
>> +#endif
>>
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>> /* boot commands */
>> /*
>> * Should we panic when a soft-lockup or hard-lockup occurs:
>> @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
>> return 1;
>> }
>> __setup("nmi_watchdog=", hardlockup_panic_setup);
>> -
>> -#else
>> -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
>> #endif
>>
>> #ifdef CONFIG_SOFTLOCKUP_DETECTOR
> Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
> also relies on the watchdog_enabled value.
>
> I guess I can fold your incremental patch in. I hope we could get
Sure. Please go ahead.
> sparc quickly to adopt the complete HAVE_HARDLOCKUP_DETECTOR_ARCH soon
> afterwards though, so we only have 2 cases -- complete hardlockup
Sure. Sounds good. Will look at it later.
> detector, or the very bare minimum NMI_WATCHDOG.
>
> Thanks,
> Nick

2017-06-15 15:51:26

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Thu, Jun 15, 2017 at 01:04:01PM +1000, Nicholas Piggin wrote:
> > +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> > /* boot commands */
> > /*
> > * Should we panic when a soft-lockup or hard-lockup occurs:
> > @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
> > return 1;
> > }
> > __setup("nmi_watchdog=", hardlockup_panic_setup);
> > -
> > -#else
> > -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> > #endif
> >
> > #ifdef CONFIG_SOFTLOCKUP_DETECTOR
>
> Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
> also relies on the watchdog_enabled value.
>
> I guess I can fold your incremental patch in. I hope we could get
> sparc quickly to adopt the complete HAVE_HARDLOCKUP_DETECTOR_ARCH soon
> afterwards though, so we only have 2 cases -- complete hardlockup
> detector, or the very bare minimum NMI_WATCHDOG.

Hi Nick,

I agree. Let's move forward with this temp fix just to get things in the
kernel for initial testing. Then follow up with a cleanup patch. The idea
is we can always revert the cleanup patch if things still don't quite work.

Thoughts?

Cheers,
Don

2017-06-15 15:59:15

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Thu, 15 Jun 2017 11:51:22 -0400
Don Zickus <[email protected]> wrote:

> On Thu, Jun 15, 2017 at 01:04:01PM +1000, Nicholas Piggin wrote:
> > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> > > /* boot commands */
> > > /*
> > > * Should we panic when a soft-lockup or hard-lockup occurs:
> > > @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
> > > return 1;
> > > }
> > > __setup("nmi_watchdog=", hardlockup_panic_setup);
> > > -
> > > -#else
> > > -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> > > #endif
> > >
> > > #ifdef CONFIG_SOFTLOCKUP_DETECTOR
> >
> > Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
> > also relies on the watchdog_enabled value.
> >
> > I guess I can fold your incremental patch in. I hope we could get
> > sparc quickly to adopt the complete HAVE_HARDLOCKUP_DETECTOR_ARCH soon
> > afterwards though, so we only have 2 cases -- complete hardlockup
> > detector, or the very bare minimum NMI_WATCHDOG.
>
> Hi Nick,
>
> I agree. Let's move forward with this temp fix just to get things in the
> kernel for initial testing. Then follow up with a cleanup patch. The idea
> is we can always revert the cleanup patch if things still don't quite work.
>
> Thoughts?

Hi Don,

Yeah that sounds good to me. Would you like me to re-test things
and resend the series?

Thanks,
Nick

2017-06-15 18:42:50

by Don Zickus

[permalink] [raw]
Subject: Re: [PATCH 3/4] watchdog: Split up config options

On Fri, Jun 16, 2017 at 01:59:00AM +1000, Nicholas Piggin wrote:
> On Thu, 15 Jun 2017 11:51:22 -0400
> Don Zickus <[email protected]> wrote:
>
> > On Thu, Jun 15, 2017 at 01:04:01PM +1000, Nicholas Piggin wrote:
> > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> > > > /* boot commands */
> > > > /*
> > > > * Should we panic when a soft-lockup or hard-lockup occurs:
> > > > @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
> > > > return 1;
> > > > }
> > > > __setup("nmi_watchdog=", hardlockup_panic_setup);
> > > > -
> > > > -#else
> > > > -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> > > > #endif
> > > >
> > > > #ifdef CONFIG_SOFTLOCKUP_DETECTOR
> > >
> > > Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
> > > also relies on the watchdog_enabled value.
> > >
> > > I guess I can fold your incremental patch in. I hope we could get
> > > sparc quickly to adopt the complete HAVE_HARDLOCKUP_DETECTOR_ARCH soon
> > > afterwards though, so we only have 2 cases -- complete hardlockup
> > > detector, or the very bare minimum NMI_WATCHDOG.
> >
> > Hi Nick,
> >
> > I agree. Let's move forward with this temp fix just to get things in the
> > kernel for initial testing. Then follow up with a cleanup patch. The idea
> > is we can always revert the cleanup patch if things still don't quite work.
> >
> > Thoughts?
>
> Hi Don,
>
> Yeah that sounds good to me. Would you like me to re-test things
> and resend the series?

Yes, please. Thanks!

Cheers,
Don