2004-01-25 05:26:23

by Huw Rogers

Subject: 2.6.2-rc1 / ACPI sleep / irqbalance / kirqd / pentium 4 HT problems on Uniwill N258SA0

Uniwill N258SA0 (http://www.uniwill.com/Product/N258SA0/N258SA0.html), aka
Hypersonic Aviator NX6, Fujitsu-Siemens AMILO D 1840 Widescreen, etc.
SiS 648FX chipset, SiS 900 Ethercard, AMI BIOS, ATI AV350/M10 128MB.
My machine: Hyperthreaded P4 2.8GHz, 0.5GB PC3200 RAM.

Installed Fedora. Upgraded to 2.6.2-rc1 per
http://thomer.com/linux/migrate-to-2.6.html.

Applied kernel patches:
- SiS AGP (http://lkml.org/lkml/2004/1/20/233)
(needed to run ATI's 3.7 fglrx drivers on the SiS/M10 combo)
- ACPI 20031203 (http://acpi.sourceforge.net/)

All good, but ACPI sleep doesn't work and neither does userland IRQ
balancing with Arjan's irqbalance (http://people.redhat.com/arjanv/irqbalance/),
a standard part of the Fedora install.

irqbalance just locks up the machine totally, hard power-off needed, no
traces in the logs. Probably some issue (race?) with it writing to
/proc/irq/X/smp_affinity. And how is irqbalance supposed to play with
kirqd anyway? Grepping this list and others doesn't give any kind of an
answer. But disabling it gives all interrupts to cpu0 (looking at
/proc/interrupts). kirqd apparently only balances between CPU packages,
not between HT siblings (info gleaned from this list).
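
For anyone wanting to check the distribution described above, here is a minimal Python sketch (not part of the original post). It assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ with per-CPU counts) and that smp_affinity takes a hexadecimal CPU bitmask. The function names and the sample text are made up for illustration.

```python
def affinity_mask(cpus):
    """Return the hex bitmask string for a set of CPU numbers (bit N = CPU N)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def per_cpu_totals(interrupts_text):
    """Sum interrupt counts per CPU from /proc/interrupts-style text."""
    lines = interrupts_text.strip().splitlines()
    cpus = lines[0].split()            # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the IRQ label ("0:", "ERR:"); the counts follow it.
        counts = fields[1:1 + len(cpus)]
        for i, value in enumerate(counts):
            if value.isdigit():        # skip controller/device name columns
                totals[i] += int(value)
    return dict(zip(cpus, totals))


# Invented sample showing everything landing on CPU0, as in the report.
SAMPLE = """\
           CPU0       CPU1
  0:     123456          0    IO-APIC-edge  timer
  1:        789          0    IO-APIC-edge  i8042
 11:       4321          0   IO-APIC-level  eth0
ERR:          0
"""

if __name__ == "__main__":
    print(per_cpu_totals(SAMPLE))
    print(affinity_mask([0, 1]))
```

Writing the mask to /proc/irq/X/smp_affinity needs root, and on the machine described here that write is exactly what is suspected of hanging, so the sketch only computes the value.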

Anyway, sleep/suspend/standby functionality (important to most laptop
users, need to close the lid and go): This checkin to
kernel/power/main.c seems to disable suspend with SMP (!?):

--- 1.3/kernel/power/main.c Sat Jan 24 20:44:47 2004
+++ 1.4/kernel/power/main.c Sat Jan 24 20:44:47 2004
@@ -172,6 +172,12 @@
         if (down_trylock(&pm_sem))
                 return -EBUSY;
 
+        /* Suspend is hard to get right on SMP. */
+        if (num_online_cpus() != 1) {
+                error = -EPERM;
+                goto Unlock;
+        }
+
         if ((error = suspend_prepare(state)))
                 goto Unlock;

... which, given the prevalence of hyperthreaded CPUs on laptops, is
fighting a trend. I backed out the above with an #if 0, then tried
echo -n 1 > /proc/acpi/sleep again. This time I got:

Stopping tasks: ===================================================================
stopping tasks failed (1 tasks remaining)
Restarting tasks...<6> Strange, kirqd not stopped
done

kirqd just wouldn't stop.
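
To make the guard quoted above concrete, here is a small user-space model in Python (an illustration only, not kernel code). The function name `enter_state_guard` is hypothetical; it mirrors the `num_online_cpus() != 1` check and returns a negative errno value in the same style as the patch. `os.cpu_count()` is only a rough stand-in for the kernel's online-CPU count.

```python
import errno
import os


def enter_state_guard(online_cpus):
    """Model of the SMP check: refuse suspend unless exactly one CPU is online.

    Returns 0 when suspend may proceed, or -EPERM as the kernel patch does.
    """
    if online_cpus != 1:
        return -errno.EPERM
    return 0


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs, so an HT sibling counts as a CPU.
    cpus = os.cpu_count() or 1
    result = enter_state_guard(cpus)
    if result == 0:
        print("suspend would proceed")
    else:
        print("suspend refused:", os.strerror(-result))
```

A hyperthreaded P4 shows up as two logical CPUs, which is why the check triggers on this laptop even though there is a single physical package.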

Tried booting with acpi=off and apm=smp to force APM, then ran
apm --suspend, but it put the machine into an LCD-blanked state that it
couldn't get out of without another hard power cycle.

Questions: Why does irqbalance lock up the machine and how is it
supposed to collaborate with kirqd? How is ACPI suspend supposed to work
on any recent laptop if SMP is barred? Why doesn't kirqd stop when asked
to by ACPI suspend once that restriction is bypassed?

A lot of effort is going into swsusp/pmdisk - but a lot of laptop users
prefer S1 to S4, as it's faster and more reliable. It'd be nice to see a
simpler "spin down the hard drive, reduce CPU clock speed to a minimum,
and power down display/ether/wireless/usb/PCMCIA" working ahead of
hibernation.


2004-01-25 19:50:35

by Zwane Mwaikambo

Subject: Re: 2.6.2-rc1 / ACPI sleep / irqbalance / kirqd / pentium 4 HT problems on Uniwill N258SA0

On Sun, 25 Jan 2004, Huw Rogers wrote:

> irqbalance just locks up the machine totally, hard power-off needed, no
> traces in the logs. Probably some issue (race?) with it writing to
> /proc/irq/X/smp_affinity. And how is irqbalance supposed to play with
> kirqd anyway? Grepping this list and others doesn't give any kind of an
> answer. But disabling it gives all interrupts to cpu0 (looking at
> /proc/interrupts). kirqd apparently only balances between CPU packages,
> not between HT siblings (info gleaned from this list).

Does this happen with the 'noirqbalance' kernel parameter?
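
One way to confirm that the parameter actually reached the kernel after a reboot is to look at the kernel command line. A minimal Python sketch, assuming /proc/cmdline is readable; the helper name is made up here.

```python
def has_kernel_param(cmdline, name):
    """True if `name` appears as a bare flag or as name=value on the command line."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""   # not on Linux, or /proc not mounted
    print("noirqbalance set:", has_kernel_param(cmdline, "noirqbalance"))
```

Matching whole tokens rather than substrings avoids a false positive from a longer parameter that merely starts with the same letters.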

2004-01-26 23:33:23

by Bill Davidsen

Subject: Re: 2.6.2-rc1 / ACPI sleep / irqbalance / kirqd / pentium 4 HT problems on Uniwill N258SA0

Huw Rogers wrote:
| Uniwill N258SA0 (http://www.uniwill.com/Product/N258SA0/N258SA0.html) aka
| Hypersonic Aviator NX6, Fujitsu-Siemens AMILO D 1840 Widescreen, etc.).
| SiS 648FX chipset, SiS 900 Ethercard, AMI BIOS, ATI AV350/M10 128Mb.
| My machine: Hyperthreaded P4 2.8GHz, .5Gb PC3200 RAM.
|
| Installed Fedora. Upgraded to 2.6.2-rc1 per
| http://thomer.com/linux/migrate-to-2.6.html.
|
| Applied kernel patches:
| - SiS AGP (http://lkml.org/lkml/2004/1/20/233)
| (needed to run ATI's 3.7 fglrx drivers on the SiS/M10 combo)
| - ACPI 20031203 (http://acpi.sourceforge.net/)
|
| All good, but ACPI sleep doesn't work and neither does userland IRQ
| balancing with Arjan's irqbalance (http://people.redhat.com/arjanv/irqbalance/),
| a standard part of the Fedora install.

Let me ask a question which probably has an obvious answer... why do you
care to balance the irq on the siblings of a single CPU? Is there some
hidden value I totally miss?

Noting that WBEL-3.0 balances all of the interrupts *except* NICs, I am
sure I don't understand the benefits of balancing between siblings, but
I'm sure someone will enlighten me.
--
bill davidsen
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2004-01-27 09:00:26

by Pavel Machek

Subject: Re: 2.6.2-rc1 / ACPI sleep / irqbalance / kirqd / pentium 4 HT problems on Uniwill N258SA0

Hi!

> irqbalance just locks up the machine totally, hard power-off needed, no
> traces in the logs. Probably some issue (race?) with it writing to
> /proc/irq/X/smp_affinity. And how is irqbalance supposed to play with
> kirqd anyway? Grepping this list and others doesn't give any kind of an
> answer. But disabling it gives all interrupts to cpu0 (looking at
> /proc/interrupts). kirqd apparently only balances between CPU packages,
> not between HT siblings (info gleaned from this list).
>
> Anyway, sleep/suspend/standby functionality (important to most laptop
> users, need to close the lid and go): This checkin to
> kernel/power/main.c seems to disable suspend with SMP (!?):
>
> --- 1.3/kernel/power/main.c Sat Jan 24 20:44:47 2004
> +++ 1.4/kernel/power/main.c Sat Jan 24 20:44:47 2004
> @@ -172,6 +172,12 @@
> if (down_trylock(&pm_sem))
> return -EBUSY;
>
> + /* Suspend is hard to get right on SMP. */
> + if (num_online_cpus() != 1) {
> + error = -EPERM;
> + goto Unlock;
> + }
> +
> if ((error = suspend_prepare(state)))
> goto Unlock;
>
> ... which, given the prevalence of hyperthreaded CPUs on laptops, is
> fighting a trend. I backed out the above with a #if 0 then tried echo -n
> 1>/proc/acpi/sleep again. This time I got:

Well, no sleep developers have SMP or HT machines, AFAICT.

If you back that out... well you are on your own.

> A lot of effort is going into swsusp/pmdisk - but a lot of laptop users
> prefer S1 to S4, as it's faster and more reliable. It'd be nice to see a
> simpler "spin down the hard drive, reduce CPU clock speed to a minimum,
> and power down display/ether/wireless/usb/PCMCIA" working ahead of
> hibernation.

As far as I can see, no one is interested in S1. If you want to help
with it... [There's no need to stop tasks/stop devices on non-broken
hardware. Unfortunately there's a lot of broken hw out there, so I'm
not sure we can do it by default.]
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-01-27 15:38:43

by Bart Samwel

Subject: Re: 2.6.2-rc1 / ACPI sleep / irqbalance / kirqd / pentium 4 HT problems on Uniwill N258SA0

Pavel Machek wrote:
>>Anyway, sleep/suspend/standby functionality (important to most laptop
>>users, need to close the lid and go): This checkin to
>>kernel/power/main.c seems to disable suspend with SMP (!?):
>>
>>--- 1.3/kernel/power/main.c Sat Jan 24 20:44:47 2004
>>+++ 1.4/kernel/power/main.c Sat Jan 24 20:44:47 2004
>>@@ -172,6 +172,12 @@
>> if (down_trylock(&pm_sem))
>> return -EBUSY;
>>
>>+ /* Suspend is hard to get right on SMP. */
>>+ if (num_online_cpus() != 1) {
>>+ error = -EPERM;
>>+ goto Unlock;
>>+ }
>>+
>> if ((error = suspend_prepare(state)))
>> goto Unlock;
>>
>>... which, given the prevalence of hyperthreaded CPUs on laptops, is
>>fighting a trend. I backed out the above with a #if 0 then tried echo -n
>>1>/proc/acpi/sleep again. This time I got:
>
>
> Well, no sleep developers have SMP or HT machines, AFAICT.
>
> If you back that out... well you are on your own.

Just a random thought: if I understand it correctly, CPU hotplugging is
intended to be able to take CPUs online and offline one by one, am I
right? Well, when that infrastructure's ready, this can probably be made
to work for SMP by taking all the other CPUs offline first. They're all
going to go offline because of the suspend anyway, so it shouldn't make
much difference. :)

-- Bart
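
A sketch of the ordering Bart describes, in Python. This assumes a CPU hotplug interface of the form /sys/devices/system/cpu/cpuN/online, which later kernels provide but which was not available in the 2.6.2-rc1 tree under discussion. The function name is hypothetical, and the code only plans the writes rather than performing them.

```python
def plan_offline(online_cpus, keep=0):
    """Return (path, value) writes that would leave only `keep` online.

    CPUs are taken down highest-numbered first; the kept CPU is never touched.
    """
    if keep not in online_cpus:
        raise ValueError("the CPU to keep must currently be online")
    writes = []
    for cpu in sorted(online_cpus, reverse=True):
        if cpu != keep:
            writes.append(("/sys/devices/system/cpu/cpu%d/online" % cpu, "0"))
    return writes


if __name__ == "__main__":
    # For an HT laptop with logical CPUs 0 and 1, only cpu1 needs to go.
    for path, value in plan_offline([0, 1]):
        print("echo %s > %s" % (value, path))
```

With only one CPU left online, a check like num_online_cpus() != 1 would no longer refuse the suspend; bringing the siblings back on resume would be the same writes with "1".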

2004-01-27 19:28:14

by Nigel Cunningham

Subject: Re: 2.6.2-rc1 / ACPI sleep / irqbalance / kirqd / pentium 4 HT problems on Uniwill N258SA0

Hi.

I have SMP working under 2.4, and am not far away from having it for
2.6. There is just one file that needs changing, but I need to learn
some x86 assembly first. If someone already knows x86 assembly and wants
to get it going first, I'll happily apply the patch.

Regards,

Nigel

> > Well, no sleep developers have SMP or HT machines, AFAICT.

--
My work on Software Suspend is graciously brought to you by
LinuxFund.org.

2004-01-27 20:58:54

by Pavel Machek

Subject: Re: 2.6.2-rc1 / ACPI sleep / irqbalance / kirqd / pentium 4 HT problems on Uniwill N258SA0

Hi!

> >>Anyway, sleep/suspend/standby functionality (important to most laptop
> >>users, need to close the lid and go): This checkin to
> >>kernel/power/main.c seems to disable suspend with SMP (!?):
> >>
> >>--- 1.3/kernel/power/main.c Sat Jan 24 20:44:47 2004
> >>+++ 1.4/kernel/power/main.c Sat Jan 24 20:44:47 2004
> >>@@ -172,6 +172,12 @@
> >> if (down_trylock(&pm_sem))
> >> return -EBUSY;
> >>
> >>+ /* Suspend is hard to get right on SMP. */
> >>+ if (num_online_cpus() != 1) {
> >>+ error = -EPERM;
> >>+ goto Unlock;
> >>+ }
> >>+
> >> if ((error = suspend_prepare(state)))
> >> goto Unlock;
> >>
> >>... which, given the prevalence of hyperthreaded CPUs on laptops, is
> >>fighting a trend. I backed out the above with a #if 0 then tried echo -n
> >>1>/proc/acpi/sleep again. This time I got:
> >
> >
> > Well, no sleep developers have SMP or HT machines, AFAICT.
> >
> > If you back that out... well you are on your own.
>
> Just a random thought: if I understand it correctly, CPU hotplugging is
> intended to be able to take CPUs online and offline one by one, am I
> right? Well, when that infrastructure's ready, this can probably be made
> to work for SMP by taking all the other CPUs offline first. They're all
> going to go offline because of the suspend anyway, so it shouldn't make
> much difference. :)

That was the original plan, but CPU hotplug is unlikely to get into 2.6,
AFAICT. (And Nigel has another solution.)

Pavel

--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]