2004-11-05 08:34:50

by Nigel Cunningham

Subject: IO_APIC NMI Watchdog not handled by suspend/resume.

Hi all.

Tracking down SMP problems, I've found that if you boot with
nmi_watchdog=1 (IO_APIC), the watchdog continues to run while suspend is
doing sensitive things like restoring the original kernel. I don't know
enough to provide a patch to disable it so thought I'd ask if someone
could volunteer to fix this?

Regards,

Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901

You see, at just the right time, when we were still powerless, Christ
died for the ungodly. -- Romans 5:6


2004-11-05 16:41:43

by Zwane Mwaikambo

Subject: Re: IO_APIC NMI Watchdog not handled by suspend/resume.

Hi Nigel

On Fri, 5 Nov 2004, Nigel Cunningham wrote:

> Tracking down SMP problems, I've found that if you boot with
> nmi_watchdog=1 (IO_APIC), the watchdog continues to run while suspend is
> doing sensitive things like restoring the original kernel. I don't know
> enough to provide a patch to disable it so thought I'd ask if someone
> could volunteer to fix this?

Use enable/disable_lapic_nmi_watchdog, but first check whether
nmi_watchdog == NMI_IO_APIC, in which case (on CPU 0) you'd call
disable/enable_timer_nmi_watchdog instead. Something like:

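void swsuspend_disable_nmi_watchdog(void)
{
	/* IO-APIC watchdog: CPU 0 stops the timer-based watchdog */
	if ((nmi_watchdog == NMI_IO_APIC) && (smp_processor_id() == 0)) {
		disable_timer_nmi_watchdog();
		return;
	}

	/* every other case: stop this CPU's local APIC watchdog */
	disable_lapic_nmi_watchdog();
}

void swsuspend_enable_nmi_watchdog(void)
{
	/* resume path: same decision as above, with the enable routines */
	if ((nmi_watchdog == NMI_IO_APIC) && (smp_processor_id() == 0)) {
		enable_timer_nmi_watchdog();
		return;
	}

	enable_lapic_nmi_watchdog();
}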

Do note that this has to be run on all processors. Holla if there is
anything else.
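A small userspace model of the "run on all processors" note, for illustration only: it is not kernel code, and nmi_action_for(), model_suspend(), model_resume() and saved_nmi_mode are hypothetical names. It applies the same per-CPU decision as the sketch above to every CPU, and adds one assumption of its own: the watchdog mode is recorded once, before anything is disabled, so resume decides from the same value as suspend even if a disable routine changes nmi_watchdog (whether it does may depend on the kernel version). In the kernel itself the per-CPU loop would be an IPI-based helper such as on_each_cpu(), whose argument list has changed between kernel versions.

```c
/* Illustrative stand-ins for the kernel's nmi_watchdog modes. */
enum nmi_mode { NMI_NONE, NMI_IO_APIC, NMI_LOCAL_APIC };

/* Which family of kernel routines a CPU would call. */
enum nmi_action { USE_TIMER_ROUTINE, USE_LAPIC_ROUTINE };

/* Same decision as swsuspend_{disable,enable}_nmi_watchdog(): CPU 0 in
 * IO-APIC mode uses the timer routine, every other case the LAPIC one. */
static enum nmi_action nmi_action_for(enum nmi_mode mode, int cpu)
{
	if (mode == NMI_IO_APIC && cpu == 0)
		return USE_TIMER_ROUTINE;
	return USE_LAPIC_ROUTINE;
}

/* Hypothetical: mode captured at suspend time, reused at resume time. */
static enum nmi_mode saved_nmi_mode = NMI_NONE;

static void model_suspend(enum nmi_mode live_mode, int ncpus,
			  enum nmi_action out[])
{
	saved_nmi_mode = live_mode;            /* record before disabling */
	for (int cpu = 0; cpu < ncpus; cpu++)  /* run on every processor */
		out[cpu] = nmi_action_for(saved_nmi_mode, cpu);
}

static void model_resume(int ncpus, enum nmi_action out[])
{
	for (int cpu = 0; cpu < ncpus; cpu++)  /* decide from the saved mode */
		out[cpu] = nmi_action_for(saved_nmi_mode, cpu);
}
```

With nmi_watchdog=1 on a 4-CPU box, model_suspend() picks the timer routine for CPU 0 and the LAPIC routine for CPUs 1 to 3, and model_resume() makes the same choices.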

Thanks,
Zwane

2004-11-05 21:18:31

by Nigel Cunningham

Subject: Re: IO_APIC NMI Watchdog not handled by suspend/resume.

Hi.

Thanks! I'll give it a go.

Regards,

Nigel


2004-11-06 09:52:31

by Nigel Cunningham

Subject: Re: IO_APIC NMI Watchdog not handled by suspend/resume.

Hi.

On Sat, 2004-11-06 at 03:41, Zwane Mwaikambo wrote:
> Hi Nigel
>
> On Fri, 5 Nov 2004, Nigel Cunningham wrote:
>
> > Tracking down SMP problems, I've found that if you boot with
> > nmi_watchdog=1 (IO_APIC), the watchdog continues to run while suspend is
> > doing sensitive things like restoring the original kernel. I don't know
> > enough to provide a patch to disable it so thought I'd ask if someone
> > could volunteer to fix this?
>
> Use enable/disable_lapic_nmi_watchdog but first check to see whether
> nmi_watchdog == NMI_IO_APIC in which case you'd then call
> disable/enable_timer_nmi_watchdog. Something like;

Huh! I must have been blind; those routines are right above the lapic
code I was looking at last night!

Thanks!

Nigel

2004-11-11 11:13:03

by Pavel Machek

Subject: Re: IO_APIC NMI Watchdog not handled by suspend/resume.

Hi!

> Tracking down SMP problems, I've found that if you boot with
> nmi_watchdog=1 (IO_APIC), the watchdog continues to run while suspend is
> doing sensitive things like restoring the original kernel. I don't know
> enough to provide a patch to disable it so thought I'd ask if someone
> could volunteer to fix this?

When we debated this on the x86-64 lists, our conclusion was 'the critical
section should take less than 5 seconds, and the watchdog only touches its
own variables, so stopping it should not be needed'. [On x86-64, the
watchdog is enabled even on UP.]
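The 5-second figure comes from how the watchdog detects a lockup. A simplified userspace model, assuming the scheme of the i386 nmi_watchdog_tick() of this period: on every watchdog NMI, a CPU compares its timer-interrupt count with the value it saw last time, and reports a lockup after about 5*nmi_hz consecutive NMIs with no change. struct wd_cpu, wd_tick() and stall_trips_watchdog() are hypothetical names, not kernel functions. The model also reflects the second half of the argument: the check reads and writes only its own per-CPU state.

```c
#include <stdbool.h>

/* Hypothetical per-CPU watchdog state. */
struct wd_cpu {
	unsigned long last_irq_sum;   /* timer-interrupt count at last progress */
	unsigned int alert_counter;   /* NMIs seen since then with no progress */
};

/* One watchdog NMI; returns true when a lockup would be reported. */
static bool wd_tick(struct wd_cpu *c, unsigned long irq_sum, unsigned int nmi_hz)
{
	if (irq_sum == c->last_irq_sum) {
		c->alert_counter++;
		return c->alert_counter >= 5 * nmi_hz;  /* ~5 s without a tick */
	}
	c->last_irq_sum = irq_sum;    /* progress: reset the counter */
	c->alert_counter = 0;
	return false;
}

/* Model a stall of stall_ms with interrupts off: the timer-interrupt count
 * stops advancing while NMIs keep arriving at nmi_hz per second. */
static bool stall_trips_watchdog(unsigned int stall_ms, unsigned int nmi_hz)
{
	struct wd_cpu c = { 0, 0 };
	unsigned long irqs = 1;
	unsigned int nmis = stall_ms * nmi_hz / 1000;

	wd_tick(&c, irqs, nmi_hz);    /* last timer tick before the stall */
	for (unsigned int i = 0; i < nmis; i++)
		if (wd_tick(&c, irqs, nmi_hz))   /* count frozen during stall */
			return true;
	return false;
}
```

In this model a 4-second stall passes and a 6-second one trips the watchdog, whatever the NMI rate.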

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-11 20:30:17

by Nigel Cunningham

Subject: Re: IO_APIC NMI Watchdog not handled by suspend/resume.

Hi.

On Thu, 2004-11-11 at 10:30, Pavel Machek wrote:
> Hi!
>
> > Tracking down SMP problems, I've found that if you boot with
> > nmi_watchdog=1 (IO_APIC), the watchdog continues to run while suspend is
> > doing sensitive things like restoring the original kernel. I don't know
> > enough to provide a patch to disable it so thought I'd ask if someone
> > could volunteer to fix this?
>
> When we debated this at x86-64 lists, our conclusion was 'critical
> section should take less than 5 seconds, and watchdog only touches its
> own variables, so stopping it should not be needed'. [on x86-64,
> watchdog is enabled even on up].

I've since come to the same conclusion; it turns out that the SMP problems
came from a bug in freezing workthreads, which I've now fixed. I have a
perfectly stable system now. That reminds me: since that code was merged,
I should send the patch to Andy. Will do so shortly.

Regards,

Nigel

2004-11-11 20:39:02

by Nigel Cunningham

Subject: Re: IO_APIC NMI Watchdog not handled by suspend/resume.

Hi.

On Thu, 2004-11-11 at 10:30, Pavel Machek wrote:
> Hi!
>
> > Tracking down SMP problems, I've found that if you boot with
> > nmi_watchdog=1 (IO_APIC), the watchdog continues to run while suspend is
> > doing sensitive things like restoring the original kernel. I don't know
> > enough to provide a patch to disable it so thought I'd ask if someone
> > could volunteer to fix this?
>
> When we debated this at x86-64 lists, our conclusion was 'critical
> section should take less than 5 seconds, and watchdog only touches its
> own variables, so stopping it should not be needed'. [on x86-64,
> watchdog is enabled even on up].

Oh... oops... Must be too early in the morning!

It's not merged, so I don't have to send the fix.

By the way, the sysdev slowness comes from time.c; I'm about to try
reducing the number of get_cmos_time() calls, which should speed it up
by at least 2 seconds.
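Why fewer get_cmos_time() calls should save whole seconds: assuming, as in the i386 CMOS-reading code of this period, that each read first spins until the RTC's update-in-progress flag falls, a read returns on a second boundary, and an immediately following read must wait almost a full second for the next one. The sketch below is a hypothetical virtual-clock model (cmos_read_returns_at() and cost_of_reads() are illustrative names, not kernel code) showing that dropping two back-to-back reads saves about two seconds.

```c
/* Hypothetical virtual-clock model; all times are in milliseconds. */

/* Assumed cost of one CMOS clock read: spin until the next RTC second
 * boundary (falling edge of the update-in-progress flag) and return then.
 * A read starting exactly on a boundary has just missed it and waits a
 * full second. */
static unsigned long cmos_read_returns_at(unsigned long now_ms)
{
	return (now_ms / 1000 + 1) * 1000;
}

/* Total time spent by `calls` back-to-back reads starting at start_ms. */
static unsigned long cost_of_reads(unsigned long start_ms, unsigned int calls)
{
	unsigned long t = start_ms;

	for (unsigned int i = 0; i < calls; i++)
		t = cmos_read_returns_at(t);
	return t - start_ms;
}
```

Starting 250 ms into a second, one read costs 750 ms and three back-to-back reads cost 2750 ms, so cutting two calls saves 2 s; the real saving depends on where in the second the first read starts.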

Nigel

2004-11-12 01:15:16

by Zwane Mwaikambo

Subject: Re: IO_APIC NMI Watchdog not handled by suspend/resume.

On Fri, 12 Nov 2004, Nigel Cunningham wrote:

> On Thu, 2004-11-11 at 10:30, Pavel Machek wrote:
> > Hi!
> >
> > > Tracking down SMP problems, I've found that if you boot with
> > > nmi_watchdog=1 (IO_APIC), the watchdog continues to run while suspend is
> > > doing sensitive things like restoring the original kernel. I don't know
> > > enough to provide a patch to disable it so thought I'd ask if someone
> > > could volunteer to fix this?
> >
> > When we debated this at x86-64 lists, our conclusion was 'critical
> > section should take less than 5 seconds, and watchdog only touches its
> > own variables, so stopping it should not be needed'. [on x86-64,
> > watchdog is enabled even on up].
>
> I've since decided this too; it turns out that the SMP problems were a
> function of a problem with freezing workthreads, which I've since fixed.
> I have a perfectly stable system now. Which reminds me, since that code
> was merged, I should send the patch to Andy. Will do so shortly.

Could you please Cc me? I (really) wanted to work on that code but got
interrupted by a house move.

Thanks,
Zwane