Hi!
I've just had a go at fixing the issue with my implementation not
suspending the sysdevs (I believe swsusp does the same). In the process,
I reworked the MTRR support so it's not treated as a sysdev. Instead,
when we're saving cpu state, the mtrr_save function function is called.
When we go to restore CPU state, each CPU calls a function that resets
it's MTRRs and the 'main' cpu then frees the saved data. This is working
well here (did a dozen plus suspends on the trot), but I want to check
that it sounds like the right solution to you.
Perhaps this method should be made more generic? (Are there likely to be
other per-cpu state savers needed?)
One thing I have noticed is that by adding the sysdev suspend/resume
calls, I've gained a few seconds delay. I'll see if I can track down the
cause.
Regards,
Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901
Everyone lives by faith. Some people just don't believe it.
Want proof? Try to prove that the theory of evolution is true.
>
>Hi!
>
>I've just had a go at fixing the issue with my implementation not
>suspending the sysdevs (I believe swsusp does the same). In the
process,
>I reworked the MTRR support so it's not treated as a sysdev. Instead,
>when we're saving cpu state, the mtrr_save function function is called.
>When we go to restore CPU state, each CPU calls a function that resets
>it's MTRRs and the 'main' cpu then frees the saved data. This is
working
>well here (did a dozen plus suspends on the trot), but I want to check
>that it sounds like the right solution to you.
>
>Perhaps this method should be made more generic? (Are there likely to
be
>other per-cpu state savers needed?)
>
>One thing I have noticed is that by adding the sysdev suspend/resume
>calls, I've gained a few seconds delay. I'll see if I can track down
the
>cause.
Is the problem MTRR resume must be with IRQ enabled, right? Could we
implement a method sysdev resume with IRQ enabled? MTRR driver isn't the
only case. The ACPI Link device is another case, it's a sysdev (it must
resume before any PCI device resumed), but its resume (it uses semaphore
and non-atomic kmalloc) can't invoked with IRQ enabled. I guess cpufreq
driver is another case when suspend/resume SMP is supported.
Thanks,
Shaohua
Hi.
On Wed, 2004-10-27 at 12:48, Li, Shaohua wrote:
> >
> >Hi!
> >
> >I've just had a go at fixing the issue with my implementation not
> >suspending the sysdevs (I believe swsusp does the same). In the
> process,
> >I reworked the MTRR support so it's not treated as a sysdev. Instead,
> >when we're saving cpu state, the mtrr_save function function is called.
> >When we go to restore CPU state, each CPU calls a function that resets
> >it's MTRRs and the 'main' cpu then frees the saved data. This is
> working
> >well here (did a dozen plus suspends on the trot), but I want to check
> >that it sounds like the right solution to you.
> >
> >Perhaps this method should be made more generic? (Are there likely to
> be
> >other per-cpu state savers needed?)
> >
> >One thing I have noticed is that by adding the sysdev suspend/resume
> >calls, I've gained a few seconds delay. I'll see if I can track down
> the
> >cause.
> Is the problem MTRR resume must be with IRQ enabled, right? Could we
That's right.
> implement a method sysdev resume with IRQ enabled? MTRR driver isn't the
> only case. The ACPI Link device is another case, it's a sysdev (it must
> resume before any PCI device resumed), but its resume (it uses semaphore
> and non-atomic kmalloc) can't invoked with IRQ enabled. I guess cpufreq
> driver is another case when suspend/resume SMP is supported.
I'll see if I can find some time to prepare something. Might be a bit
slow; humans don't multitask very well, and I'm trying to at the moment
:>
Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901
Everyone lives by faith. Some people just don't believe it.
Want proof? Try to prove that the theory of evolution is true.
On Tuesday 26 October 2004 09:48 pm, Li, Shaohua wrote:
> >One thing I have noticed is that by adding the sysdev suspend/resume
> >calls, I've gained a few seconds delay. I'll see if I can track down
> the
> >cause.
> Is the problem MTRR resume must be with IRQ enabled, right? Could we
> implement a method sysdev resume with IRQ enabled?
If I understand correctly the point of classifying device as sysdev is
that it (device) is essential for the system and must be suspended last
and resumed first, presumably with interrupts off. IRQ controller comes
to mind...
--
Dmitry
Hi.
On Wed, 2004-10-27 at 13:20, Dmitry Torokhov wrote:
> On Tuesday 26 October 2004 09:48 pm, Li, Shaohua wrote:
> > >One thing I have noticed is that by adding the sysdev suspend/resume
> > >calls, I've gained a few seconds delay. I'll see if I can track down
> > the
> > >cause.
> > Is the problem MTRR resume must be with IRQ enabled, right? Could we
> > implement a method sysdev resume with IRQ enabled?
>
> If I understand correctly the point of classifying device as sysdev is
> that it (device) is essential for the system and must be suspended last
> and resumed first, presumably with interrupts off. IRQ controller comes
> to mind...
Yes, but could we not do something like the process with regular
devices. ie a call with interrupts disabled and then a similar call with
interrupts enabled?
Regards,
Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901
Everyone lives by faith. Some people just don't believe it.
Want proof? Try to prove that the theory of evolution is true.
On Tuesday 26 October 2004 10:17 pm, Nigel Cunningham wrote:
> Hi.
>
> On Wed, 2004-10-27 at 13:20, Dmitry Torokhov wrote:
> > On Tuesday 26 October 2004 09:48 pm, Li, Shaohua wrote:
> > > >One thing I have noticed is that by adding the sysdev suspend/resume
> > > >calls, I've gained a few seconds delay. I'll see if I can track down
> > > the
> > > >cause.
> > > Is the problem MTRR resume must be with IRQ enabled, right? Could we
> > > implement a method sysdev resume with IRQ enabled?
> >
> > If I understand correctly the point of classifying device as sysdev is
> > that it (device) is essential for the system and must be suspended last
> > and resumed first, presumably with interrupts off. IRQ controller comes
> > to mind...
>
> Yes, but could we not do something like the process with regular
> devices. ie a call with interrupts disabled and then a similar call with
> interrupts enabled?
>
Yes, re-reading the parent post it seems that's what required for ACPI.
Doing a pass with IRQ ON for regular devices, then IRQ OFF, then IRQ on
again for sysdevs and then again with IRQ off. Man, that's getting messy...
Well, I understand that ACPI is using semaphore and a GFP_KERNEL, but what
is the problem with MTRR? I understand that they should be set with IRQ
off but I highly doibt that enabling IRQ at the end is a requirement.
I think what is described in the commnet is rather a "normal flow of events".
--
Dmitry
Hi.
On Wed, 2004-10-27 at 13:50, Dmitry Torokhov wrote:
> Well, I understand that ACPI is using semaphore and a GFP_KERNEL, but what
> is the problem with MTRR? I understand that they should be set with IRQ
> off but I highly doibt that enabling IRQ at the end is a requirement.
> I think what is described in the commnet is rather a "normal flow of events".
The real problem with MTRRs is SMP support: smp_call_function doesn't
like IRQs disabled.
I got around a similar issue with saving CPU state (for suspend-to-disk)
by using the same general sequence that I later discovered described in
arch/i386/kernel/cpu/mtrr/main.c (set_mtrr header) for saving CPU
contexts. I extended it yesterday to do the MTRR settings as well,
before Shaohua pointed to the more general need.
Regards,
Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901
Everyone lives by faith. Some people just don't believe it.
Want proof? Try to prove that the theory of evolution is true.
Hi!
> >One thing I have noticed is that by adding the sysdev suspend/resume
> >calls, I've gained a few seconds delay. I'll see if I can track down
> the
> >cause.
> Is the problem MTRR resume must be with IRQ enabled, right? Could we
> implement a method sysdev resume with IRQ enabled? MTRR driver isn't
> the
MTRR does not deserve to be sysdev. It is not essential for the
system, it only makes it slow.
> only case. The ACPI Link device is another case, it's a sysdev (it must
> resume before any PCI device resumed), but its resume (it uses semaphore
> and non-atomic kmalloc) can't invoked with IRQ enabled. I guess cpufreq
> driver is another case when suspend/resume SMP is supported.
I do not see how enabling interrupts before setting up IRQs is good
idea.
What about this one, instead?
* ACPI Link device should allocate with GFP_ATOMIC
* during suspend, locks can't be taken. (We stop userland, etc). So it
should be okay to down_trylock() and panic if that does not work.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
>
>> >One thing I have noticed is that by adding the sysdev suspend/resume
>> >calls, I've gained a few seconds delay. I'll see if I can track down
>> the
>> >cause.
>> Is the problem MTRR resume must be with IRQ enabled, right? Could we
>> implement a method sysdev resume with IRQ enabled? MTRR driver isn't
>> the
>
>MTRR does not deserve to be sysdev. It is not essential for the
>system, it only makes it slow.
It's a CPU driver, cpufreq driver is the same.
>
>> only case. The ACPI Link device is another case, it's a sysdev (it
must
>> resume before any PCI device resumed), but its resume (it uses
semaphore
>> and non-atomic kmalloc) can't invoked with IRQ enabled. I guess
cpufreq
>> driver is another case when suspend/resume SMP is supported.
>
>I do not see how enabling interrupts before setting up IRQs is good
>idea.
>
>What about this one, instead?
>
>* ACPI Link device should allocate with GFP_ATOMIC
>
>* during suspend, locks can't be taken. (We stop userland, etc). So it
>should be okay to down_trylock() and panic if that does not work.
Hmm, the only problem is ACPI link device resume code will call into
ACPI CA code. We can't change ACPI CA (its code is very huge).
Thanks,
Shaohua
>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]] On Behalf Of Pavel Machek
>Sent: Wednesday, October 27, 2004 3:01 AM
>To: Li, Shaohua
>Cc: [email protected]; Patrick Mochel; Linux Kernel
>Mailing List
>Subject: Re: Fixing MTRR smp breakage and suspending sysdevs.
>
>Hi!
>
>> >One thing I have noticed is that by adding the sysdev suspend/resume
>> >calls, I've gained a few seconds delay. I'll see if I can track down
>> the
>> >cause.
>> Is the problem MTRR resume must be with IRQ enabled, right? Could we
>> implement a method sysdev resume with IRQ enabled? MTRR driver isn't
>> the
>
>MTRR does not deserve to be sysdev. It is not essential for the
>system, it only makes it slow.
>
>> only case. The ACPI Link device is another case, it's a
>sysdev (it must
>> resume before any PCI device resumed), but its resume (it
>uses semaphore
>> and non-atomic kmalloc) can't invoked with IRQ enabled. I
>guess cpufreq
>> driver is another case when suspend/resume SMP is supported.
>
>I do not see how enabling interrupts before setting up IRQs is good
>idea.
>
>What about this one, instead?
>
>* ACPI Link device should allocate with GFP_ATOMIC
>
>* during suspend, locks can't be taken. (We stop userland, etc). So it
>should be okay to down_trylock() and panic if that does not work.
Actually, I am trying another approach for Link Device.
- Temporarily enable interrupts during Link Device resume. Turn off all
the external interrupts at suspend time. They will remain suspended
until interrupt device resumes.
Something like the patch below. Only part I don't like is controlling
the resume order by Makefiles and the link order. Probably we can fix
that by sorting the sysdev list at the boottime, depending on our
ordering requirements. I think, the resume order we need to maintain is
something like this: irqrouter, pit/timer, i8259, lapic, ioapic, others
Thanks,
-Venki
--- linux-2.6.9-latest/arch/i386/kernel/i8259.c.org 2004-10-26
21:58:32.000000000 -0700
+++ linux-2.6.9-latest/arch/i386/kernel/i8259.c 2004-10-27
13:06:12.000000000 -0700
@@ -266,6 +266,10 @@ static int i8259A_resume(struct sys_devi
static int i8259A_suspend(struct sys_device *dev, u32 state)
{
save_ELCR(irq_trigger);
+ /* Stop all the interrupts from PIC until resume */
+ outb(0xff, PIC_MASTER_IMR);
+ outb(0xff, PIC_SLAVE_IMR);
+
return 0;
}
--- linux-2.6.9-latest/arch/i386/kernel/io_apic.c.org 2004-10-26
22:05:58.000000000 -0700
+++ linux-2.6.9-latest/arch/i386/kernel/io_apic.c 2004-10-26
22:06:53.000000000 -0700
@@ -2302,6 +2302,7 @@ static int ioapic_suspend(struct sys_dev
*(((int *)entry) + 1) = io_apic_read(dev->id, 0x11 + 2 *
i);
*(((int *)entry) + 0) = io_apic_read(dev->id, 0x10 + 2 *
i);
}
+ clear_IO_APIC();
spin_unlock_irqrestore(&ioapic_lock, flags);
return 0;
--- linux-2.6.9-latest/drivers/acpi/pci_link.c.org 2004-10-26
22:13:20.000000000 -0700
+++ linux-2.6.9-latest/drivers/acpi/pci_link.c 2004-10-27
13:31:43.000000000 -0700
@@ -714,9 +714,12 @@ irqrouter_resume(
{
struct list_head *node = NULL;
struct acpi_pci_link *link = NULL;
+ unsigned long flags;
ACPI_FUNCTION_TRACE("irqrouter_resume");
+ local_save_flags(flags);
+ local_irq_enable();
list_for_each(node, &acpi_link.entries) {
link = list_entry(node, struct acpi_pci_link, node);
@@ -727,6 +730,7 @@ irqrouter_resume(
acpi_pci_link_resume(link);
}
+ local_irq_restore(flags);
return_VALUE(0);
}
--- linux-2.6.9-latest/Makefile.org 2004-10-27 12:38:19.000000000
-0700
+++ linux-2.6.9-latest/Makefile 2004-10-27 13:06:10.000000000 -0700
@@ -571,7 +571,7 @@ libs-y := $(libs-y1) $(libs-y2)
# System.map is generated to document addresses of all kernel symbols
vmlinux-init := $(head-y) $(init-y)
-vmlinux-main := $(core-y) $(libs-y) $(drivers-y) $(net-y)
+vmlinux-main := $(drivers-y) $(core-y) $(libs-y) $(net-y)
vmlinux-all := $(vmlinux-init) $(vmlinux-main)
vmlinux-lds := arch/$(ARCH)/kernel/vmlinux.lds
>>
>>> >One thing I have noticed is that by adding the sysdev
suspend/resume
>>> >calls, I've gained a few seconds delay. I'll see if I can track
down
>>> the
>>> >cause.
>>> Is the problem MTRR resume must be with IRQ enabled, right? Could we
>>> implement a method sysdev resume with IRQ enabled? MTRR driver isn't
>>> the
>>
>>MTRR does not deserve to be sysdev. It is not essential for the
>>system, it only makes it slow.
>>
>>> only case. The ACPI Link device is another case, it's a
>>sysdev (it must
>>> resume before any PCI device resumed), but its resume (it
>>uses semaphore
>>> and non-atomic kmalloc) can't invoked with IRQ enabled. I
>>guess cpufreq
>>> driver is another case when suspend/resume SMP is supported.
>>
>>I do not see how enabling interrupts before setting up IRQs is good
>>idea.
>>
>>What about this one, instead?
>>
>>* ACPI Link device should allocate with GFP_ATOMIC
>>
>>* during suspend, locks can't be taken. (We stop userland, etc). So it
>>should be okay to down_trylock() and panic if that does not work.
>
>
>Actually, I am trying another approach for Link Device.
>
>- Temporarily enable interrupts during Link Device resume. Turn off all
the
>external interrupts at suspend time. They will remain suspended until
>interrupt device resumes.
>
>Something like the patch below. Only part I don't like is controlling
the
>resume order by Makefiles and the link order. Probably we can fix that
by
>sorting the sysdev list at the boottime, depending on our ordering
>requirements. I think, the resume order we need to maintain is
something
>like this: irqrouter, pit/timer, i8259, lapic, ioapic, others
Turn off PIC/IOAPIC in suspend time doesn't mean the PIC/IOAPIC is
disabled in resume. BIOS possibly turns them on in resume.
Thanks,
Shaohua
Hi!
> >What about this one, instead?
> >
> >* ACPI Link device should allocate with GFP_ATOMIC
> >
> >* during suspend, locks can't be taken. (We stop userland, etc). So it
> >should be okay to down_trylock() and panic if that does not work.
>
>
> Actually, I am trying another approach for Link Device.
>
> - Temporarily enable interrupts during Link Device resume. Turn off all
> the external interrupts at suspend time. They will remain suspended
> until interrupt device resumes.
Hmm, perhaps you should turn off external interrupts during resume
time...
But none of this is going to help. You want interrupts so that you can
allocate with GFP_KERNEL, right? But disk is stopped at this point, if
vm system attempts to swap, you are deadlocked, anyway.
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms
Hi!
> >> Is the problem MTRR resume must be with IRQ enabled, right? Could we
> >> implement a method sysdev resume with IRQ enabled? MTRR driver isn't
> >> the
> >
> >MTRR does not deserve to be sysdev. It is not essential for the
> >system, it only makes it slow.
> It's a CPU driver, cpufreq driver is the same.
Well, it drives part of cpu. Fortunately that part of cpu is not required for
other drivers. cpufreq definitely should not be sysdev. If mtrr is not needed for drivers (I think it is not), it should not be a sysdev.
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms
Hi!
> > >One thing I have noticed is that by adding the sysdev suspend/resume
> > >calls, I've gained a few seconds delay. I'll see if I can track down
> > the
> > >cause.
Sysdev suspend/resume calls should be added to swsusp1, too. Can
someone verify that this fixes stuff? I should get some sleep...
Pavel
--- clean/kernel/power/disk.c 2004-10-01 00:30:32.000000000 +0200
+++ linux/kernel/power/disk.c 2004-10-29 00:30:40.000000000 +0200
@@ -102,6 +116,7 @@
static void finish(void)
{
+ sysdev_resume();
device_resume();
platform_finish();
enable_nonboot_cpus();
@@ -133,8 +148,12 @@
free_some_memory();
disable_nonboot_cpus();
- if ((error = device_suspend(PM_SUSPEND_DISK)))
+ if ((error = device_suspend(PM_SUSPEND_DISK))) {
+ printk("Some devices failed to suspend\n");
goto Finish;
+ }
+
+ sysdev_suspend(POWER_SUSPEND_DISK);
return 0;
Finish:
--- clean/kernel/power/swsusp.c 2004-10-19 14:16:29.000000000 +0200
+++ linux/kernel/power/swsusp.c 2004-10-29 00:32:26.000000000 +0200
@@ -825,6 +812,7 @@
int swsusp_write(void)
{
int error;
+ sysdev_resume();
device_resume();
lock_swapdevices();
error = write_suspend_image();
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
On Fri, 2004-10-29 at 08:38, Pavel Machek wrote:
> Hi!
>
> > > >One thing I have noticed is that by adding the sysdev suspend/resume
> > > >calls, I've gained a few seconds delay. I'll see if I can track down
> > > the
> > > >cause.
Don't think I actually mentioned the case: it's the pit timer, it
appears (number before is jiffies). Interestingly, there is a delay in
suspending, but it only shows after we exit the sysdev call (when
interrupts are reenabled? Haven't looked more closely).
Suspending System Devices
Suspending type 'irqrouter':
4294741499: Starting global drivers irqrouter0
4294741499: Starting auxillary drivers.
Suspending type 'ioapic':
4294741499: Starting global drivers ioapic0
4294741499: Starting auxillary drivers.
4294741499: Starting generic driver.
4294741499: Done.
Suspending type 'lapic':
4294741499: Starting global drivers lapic0
4294741499: Starting auxillary drivers.
4294741499: Starting generic driver.
4294741499: Done.
Suspending type 'timer':
4294741499: Starting global drivers timer0
4294741499: Starting auxillary drivers.
Suspending type 'pit':
4294741499: Starting global drivers pit0
4294741499: Starting auxillary drivers.
4294741499: Starting generic driver.
4294741499: Done.
Suspending type 'i8259':
4294741499: Starting global drivers i82590
4294741499: Starting auxillary drivers.
4294741499: Starting generic driver.
4294741499: Done.
Suspending type 'cpu':
4294741499: Starting global drivers cpu0
4294741499: Starting auxillary drivers.
4294741499: Starting global drivers cpu1
4294741499: Starting auxillary drivers.
Back from sysdev_suspend.
sysdev_resume
Resuming System Devices
Resuming type 'cpu':
4294742128: cpu0
4294742128: Starting auxillary drivers.
4294742128: Starting global drivers cpu0
4294742128: Done.
4294742128: cpu1
4294742128: Starting auxillary drivers.
4294742128: Starting global drivers cpu1
4294742128: Done.
Resuming type 'i8259':
4294742128: i82590
4294742128: Starting generic driver.
4294742128: Starting auxillary drivers.
4294742128: Starting global drivers i82590
4294742128: Done.
Resuming type 'pit':
4294742128: pit0
4294742128: Starting generic driver.
4294772030: Starting auxillary drivers.
4294772030: Starting global drivers pit0
4294772030: Done.
Resuming type 'timer':
4294772030: timer0
4294772030: Starting generic driver.
4294772030: Starting auxillary drivers.
4294772030: Starting global drivers timer0
4294772030: Done.
Resuming type 'lapic':
4294772030: lapic0
4294772030: Starting generic driver.
4294772030: Starting auxillary drivers.
4294772030: Starting global drivers lapic0
4294772030: Done.
Resuming type 'ioapic':
4294772030: ioapic0
4294772030: Starting generic driver.
4294772030: Starting auxillary drivers.
4294772030: Starting global drivers ioapic0
4294772030: Done.
Resuming type 'irqrouter':
4294772030: irqrouter0
4294772030: Starting generic driver.
4294772030: Starting auxillary drivers.
4294772030: Starting global drivers irqrouter0
4294772030: Done.
power up suspend device tree.
done
Regards,
Nigel
--
Nigel Cunningham
Pastoral Worker
Christian Reformed Church of Tuggeranong
PO Box 1004, Tuggeranong, ACT 2901
Everyone lives by faith. Some people just don't believe it.
Want proof? Try to prove that the theory of evolution is true.
>> >> Is the problem MTRR resume must be with IRQ enabled, right? Could
we
>> >> implement a method sysdev resume with IRQ enabled? MTRR driver
isn't
>> >> the
>> >
>> >MTRR does not deserve to be sysdev. It is not essential for the
>> >system, it only makes it slow.
>> It's a CPU driver, cpufreq driver is the same.
>
>Well, it drives part of cpu. Fortunately that part of cpu is not
required
>for
>other drivers. cpufreq definitely should not be sysdev. If mtrr is not
>needed for drivers (I think it is not), it should not be a sysdev.
mtrr can not be a sysdev, but this is current cpufreq driver model. I
guess you need convince Dominik.
Thanks,
Shaohua