2009-01-29 00:33:47

by Parag Warudkar

Subject: 2.6.29-rc3: tg3 dead after resume


With 2.6.29-rc3 suspend/resume has started working on my workstation again
(did not resume with rc2 - not sure when it broke) but tg3 is dead
after resume.

This is similar to the issue reported back in Jul 2007 -
http://kerneltrap.org/mailarchive/linux-kernel/2007/8/1/154073/thread
which was fixed with a patch to unconditionally save/restore pci config
space - that one is still in tg3.c.

After resume tg3 complains that no firmware is running and eth0 is
non-existent. Rmmoding and modprobing tg3 again causes some timeouts and
errors from tg3 and the link still doesn't work.

Reboot fixes it.


Parag


2009-01-29 01:10:02

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Wed, 28 Jan 2009, Parag Warudkar wrote:
>
> This is similar to the issue reported back in Jul 2007 -
> http://kerneltrap.org/mailarchive/linux-kernel/2007/8/1/154073/thread
> which was fixed with a patch to unconditionally save/restore pci config
> space - that one is still in tg3.c.

In fact, the new PCI suspend/restore code should have made that
unnecessary, since the PCI layer now makes sure that a save/restore is
done even if the driver hadn't done it.

But at the same time, still having the driver do it certainly shouldn't
have _hurt_ anything either. But it's quite possible that the tg3 thing is
very sensitive to the exact order things happen in - there's a lot of
comments about bugs in there ;)

> After resume tg3 complains that no firmware is running and eth0 is
> non-existent. Rmmoding and modprobing tg3 again causes some timeouts and
> errors from tg3 and the link still doesn't work.

That seems to imply that even the reset failed, which is interesting.

But it also possibly means that the problem is not necessarily the driver
itself, but some cached state that we keep around in "struct pci_dev" even
across a module load/unload.

For example, if we get the "dev->current_state" cache wrong, then we may
not actually end up changing it when we should, because we think we
already match the target state. I don't _think_ that is it, but that's the
kind of thing that could happen.
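
To make the stale-cache scenario concrete, here is a minimal userspace model, not kernel code. Every name in it (struct fake_dev, set_power_state_cached, the two-state enum) is hypothetical and only mirrors the idea of a cached "current_state"; it makes no claim about how the real PCI core is written.

```c
#include <stdbool.h>

/* Hypothetical two-state model; the real PCI power states are richer. */
enum power_state { STATE_D0 = 0, STATE_D3HOT = 3 };

struct fake_dev {
    enum power_state hw_state;      /* what the hardware is actually in */
    enum power_state cached_state;  /* what software believes it is in */
    int hw_writes;                  /* number of writes that reached hardware */
};

/*
 * Change the power state, but skip the hardware write when the cache
 * already matches the target. Returns true if a write was issued.
 */
static bool set_power_state_cached(struct fake_dev *dev, enum power_state target)
{
    if (dev->cached_state == target)
        return false;               /* cache says nothing to do */

    dev->hw_state = target;
    dev->cached_state = target;
    dev->hw_writes++;
    return true;
}
```

If the cache says D0 while the hardware is really in D3hot, a request for D0 is silently skipped and the device stays powered down. That is the failure mode in question, and it would survive a module unload/reload if the cache lives in a structure that outlasts the driver.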

Can you do a

lspci -vvxxx -s [tg3-device]

before-and-after suspend? Is there some state that looks like it got
corrupted?

Linus

2009-01-29 01:49:45

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Wed, 28 Jan 2009, Linus Torvalds wrote:

> For example, if we get the "dev->current_state" cache wrong, then we may
> not actually end up changing it when we should, because we think we
> already match the target state. I don't _think_ that is it, but that's the
> kind of thing that could happen.
>
> Can you do a
>
> lspci -vvxxx -s [tg3-device]
>
> before-and-after suspend? Is there some state that looks like it got
> corrupted?

Sure, diff -u below. There are differences but not sure if they are
abnormal or expected.

Also, BTW, reverting the only tg3 specific commit -
commit 9e9fd12dc0679643c191fc9795a3021807e77de4
Author: Matt Carlson <[email protected]>
Date: Mon Jan 19 16:57:45 2009 -0800

tg3: Fix firmware loading

did not help.

parag@parag-desktop:~$ diff -u lspci-pre-suspend lspci-post-suspend
--- lspci-pre-suspend 2009-01-28 20:35:37.070584068 -0500
+++ lspci-post-suspend 2009-01-28 20:36:56.922471408 -0500
@@ -12,7 +12,7 @@
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Vendor Specific Information <?>
Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable+
- Address: 00000000fee0f00c Data: 41c9
+ Address: 00000000fee0f00c Data: 41d1
Capabilities: [d0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
@@ -36,15 +36,15 @@
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 04 20 48 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 20 00 64
-50: 03 58 fc 00 00 00 00 78 09 e8 78 00 7d c9 08 78
-60: 00 00 00 00 00 00 00 00 98 02 02 a0 00 00 18 76
-70: f2 10 00 00 c0 00 00 00 2c 00 00 00 00 00 00 00
-80: 3c 10 07 13 00 00 00 00 34 00 13 04 82 70 08 fc
-90: 19 be 00 01 00 00 00 b7 00 00 00 00 14 00 00 00
-a0: 00 00 00 00 4c 01 00 00 00 00 00 00 3e 01 00 00
-b0: 00 00 00 00 00 00 00 36 00 00 00 00 00 00 00 00
+50: 03 58 fc 00 00 00 00 78 09 e8 78 00 7e cb 08 a8
+60: 00 00 00 00 00 00 00 00 9a 02 02 a0 00 00 00 10
+70: 72 10 00 00 c0 00 00 00 2c 00 00 00 00 00 00 00
+80: 3c 10 07 13 00 00 00 00 00 00 00 00 fe 70 08 fc
+90: 11 be 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 80 00 00 0e 00 00 00 00 00 00 00
d0: 10 00 01 00 a0 8f 00 00 00 50 10 00 11 64 03 00
e0: 40 00 11 10 00 00 00 00 05 d0 81 00 0c f0 e0 fe
-f0: 00 00 00 00 c9 41 00 00 00 00 00 00 00 00 00 00
+f0: 00 00 00 00 d1 41 00 00 00 00 00 00 00 00 00 00


Parag

2009-01-29 02:11:20

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Wed, 28 Jan 2009, Parag Warudkar wrote:
>
> Sure, diff -u below. There are differences but not sure if they are
> abnormal or expected.

Well, they're all in the "extended set", ie not the basic registers that
the PCI layer saves. The PCI layer normally just saves the low 16 dwords,
along with the PCI[EX] capability thing.

None of the PCI save/restore routines have ever saved the extended state
(well, "ever" is a strong word - I think we long ago used to pass in how
many bytes we wanted saved, but got rid of it), and it certainly didn't
change with the recent PCI suspend/resume changes.
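
As a rough check of the "extended set" observation, the sketch below embeds the rows that changed in Parag's diff and reports where the first changed byte sits relative to the first 16 dwords (0x40 bytes). The byte values are copied from the diff above; the struct and function names are hypothetical helpers for illustration only.

```c
#include <stddef.h>

#define STD_HEADER_BYTES 0x40   /* 16 dwords of standard config space */

struct cfg_row {
    unsigned int base;          /* config-space offset of the first byte */
    unsigned char pre[16];      /* bytes before suspend */
    unsigned char post[16];     /* bytes after resume */
};

/* Only the rows marked -/+ in the diff, in ascending offset order. */
static const struct cfg_row changed_rows[] = {
    { 0x50, {0x03,0x58,0xfc,0x00,0x00,0x00,0x00,0x78,0x09,0xe8,0x78,0x00,0x7d,0xc9,0x08,0x78},
            {0x03,0x58,0xfc,0x00,0x00,0x00,0x00,0x78,0x09,0xe8,0x78,0x00,0x7e,0xcb,0x08,0xa8} },
    { 0x60, {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x98,0x02,0x02,0xa0,0x00,0x00,0x18,0x76},
            {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x9a,0x02,0x02,0xa0,0x00,0x00,0x00,0x10} },
    { 0x70, {0xf2,0x10,0x00,0x00,0xc0,0x00,0x00,0x00,0x2c,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
            {0x72,0x10,0x00,0x00,0xc0,0x00,0x00,0x00,0x2c,0x00,0x00,0x00,0x00,0x00,0x00,0x00} },
    { 0x80, {0x3c,0x10,0x07,0x13,0x00,0x00,0x00,0x00,0x34,0x00,0x13,0x04,0x82,0x70,0x08,0xfc},
            {0x3c,0x10,0x07,0x13,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xfe,0x70,0x08,0xfc} },
    { 0x90, {0x19,0xbe,0x00,0x01,0x00,0x00,0x00,0xb7,0x00,0x00,0x00,0x00,0x14,0x00,0x00,0x00},
            {0x11,0xbe,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00} },
    { 0xa0, {0x00,0x00,0x00,0x00,0x4c,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x3e,0x01,0x00,0x00},
            {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00} },
    { 0xb0, {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x36,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
            {0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00} },
    { 0xf0, {0x00,0x00,0x00,0x00,0xc9,0x41,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
            {0x00,0x00,0x00,0x00,0xd1,0x41,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00} },
};

#define N_ROWS (sizeof(changed_rows) / sizeof(changed_rows[0]))

/* Lowest config-space offset whose byte changed, or -1 if none did. */
static int first_changed_offset(void)
{
    for (size_t r = 0; r < N_ROWS; r++)
        for (int i = 0; i < 16; i++)
            if (changed_rows[r].pre[i] != changed_rows[r].post[i])
                return (int)changed_rows[r].base + i;
    return -1;
}

/* Number of changed bytes that fall inside the first 16 dwords. */
static int changed_in_std_header(void)
{
    int n = 0;

    for (size_t r = 0; r < N_ROWS; r++)
        for (int i = 0; i < 16; i++)
            if (changed_rows[r].pre[i] != changed_rows[r].post[i] &&
                changed_rows[r].base + i < STD_HEADER_BYTES)
                n++;
    return n;
}
```

With this data the first changed byte is at offset 0x5c and nothing below 0x40 differs, which matches the reading that the standard header came back intact and only device-specific registers changed.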

I get the feeling that it's some odd tg3 issue. That tg3 driver does have
that special

/* Make sure register accesses (indirect or otherwise)
* will function correctly.
*/
pci_write_config_dword(tp->pdev,
TG3PCI_MISC_HOST_CTRL,
tp->misc_host_ctrl);

in its own version of setting the power state, and maybe that really
_must_ happen before we actually set the state back to PCI_D0. That sounds
very odd, but hey..

I added Matt Carlson to the cc, since he seems to be the main tg3
authority here.

Matt: the whole discussion is on netdev and the kernel mailing list, but
the short version is that -rc3 suspends and resumes for Parag again
(unlike -rc2), but tg3 doesn't appear to resume properly. The generic PCI
layer now does more at resume time (very early, when interrupts are still
off), see

- pci_pm_resume_noirq ->
pci_pm_default_resume_noirq() ->
pci_restore_standard_config()

for more of the details (basically it always does that
"pci_restore_state()" and tries to bring the device back to PCI_D0).

Linus

2009-01-29 02:20:23

by Matt Carlson

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Wed, Jan 28, 2009 at 06:10:37PM -0800, Linus Torvalds wrote:
>
>
> On Wed, 28 Jan 2009, Parag Warudkar wrote:
> >
> > Sure, diff -u below. There are differences but not sure if they are
> > abnormal or expected.
>
> Well, they're all in the "extended set", ie not the basic registers that
> the PCI layer saves. The PCI layer normally just saves the low 16 dwords,
> along with the PCI[EX] capability thing.
>
> None of the PCI save/restore routines have ever saved the extended state
> (well, "ever" is a strong word - I think we long ago used to pass in how
> many bytes we wanted saved, but got rid of it), and it certainly didn't
> change with the recent PCI suspend/resume changes.
>
> I get the feeling that it's some odd tg3 issue. That tg3 driver does have
> that special
>
> /* Make sure register accesses (indirect or otherwise)
> * will function correctly.
> */
> pci_write_config_dword(tp->pdev,
> TG3PCI_MISC_HOST_CTRL,
> tp->misc_host_ctrl);
>
> in its own version of setting the power state, and maybe that really
> _must_ happen before we actually set the state back to PCI_D0. That sounds
> very odd, but hey..
>
> I added Matt Carlson to the cc, since he seems to be the main tg3
> authority here.
>
> Matt: the whole discussion is on netdev and the kernel mailing list, but
> the short version is that -rc3 suspends and resumes for Parag again
> (unlike -rc2), but tg3 doesn't appear to resume properly. The generic PCI
> layer now does more at resume time (very early, when interrupts are still
> off), see
>
> - pci_pm_resume_noirq ->
> pci_pm_default_resume_noirq() ->
> pci_restore_standard_config()
>
> for more of the details (basically it always does that
> "pci_restore_state()" and tries to bring the device back to PCI_D0).

Thanks Linus. I'm looking over the diffs Parag sent and I already see
some suspicious register settings. Let me think about this some more
and then I'll jump into the discussion.

2009-01-29 18:42:38

by Matt Carlson

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Wed, Jan 28, 2009 at 05:49:18PM -0800, Parag Warudkar wrote:
>
>
> On Wed, 28 Jan 2009, Linus Torvalds wrote:
>
> > For example, if we get the "dev->current_state" cache wrong, then we may
> > not actually end up changing it when we should, because we think we
> > already match the target state. I don't _think_ that is it, but that's the
> > kind of thing that could happen.
> >
> > Can you do a
> >
> > lspci -vvxxx -s [tg3-device]
> >
> > before-and-after suspend? Is there some state that looks like it got
> > corrupted?
>
> Sure, diff -u below. There are differences but not sure if they are
> abnormal or expected.
>
> Also, BTW, reverting the only tg3 specific commit -
> commit 9e9fd12dc0679643c191fc9795a3021807e77de4
> Author: Matt Carlson <[email protected]>
> Date: Mon Jan 19 16:57:45 2009 -0800
>
> tg3: Fix firmware loading
>
> did not help.
>
> parag@parag-desktop:~$ diff -u lspci-pre-suspend lspci-post-suspend
> --- lspci-pre-suspend 2009-01-28 20:35:37.070584068 -0500
> +++ lspci-post-suspend 2009-01-28 20:36:56.922471408 -0500
> @@ -12,7 +12,7 @@
> Capabilities: [50] Vital Product Data <?>
> Capabilities: [58] Vendor Specific Information <?>
> Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable+
> - Address: 00000000fee0f00c Data: 41c9
> + Address: 00000000fee0f00c Data: 41d1
> Capabilities: [d0] Express (v1) Endpoint, MSI 00
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> <4us, L1 unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> @@ -36,15 +36,15 @@
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
> 30: 00 00 04 20 48 00 00 00 00 00 00 00 03 01 00 00
> 40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 20 00 64
> -50: 03 58 fc 00 00 00 00 78 09 e8 78 00 7d c9 08 78
> -60: 00 00 00 00 00 00 00 00 98 02 02 a0 00 00 18 76
> -70: f2 10 00 00 c0 00 00 00 2c 00 00 00 00 00 00 00
> -80: 3c 10 07 13 00 00 00 00 34 00 13 04 82 70 08 fc
> -90: 19 be 00 01 00 00 00 b7 00 00 00 00 14 00 00 00
> -a0: 00 00 00 00 4c 01 00 00 00 00 00 00 3e 01 00 00
> -b0: 00 00 00 00 00 00 00 36 00 00 00 00 00 00 00 00
> +50: 03 58 fc 00 00 00 00 78 09 e8 78 00 7e cb 08 a8
> +60: 00 00 00 00 00 00 00 00 9a 02 02 a0 00 00 00 10
> +70: 72 10 00 00 c0 00 00 00 2c 00 00 00 00 00 00 00
> +80: 3c 10 07 13 00 00 00 00 00 00 00 00 fe 70 08 fc
> +90: 11 be 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> +a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> +b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 80 00 00 0e 00 00 00 00 00 00 00
> d0: 10 00 01 00 a0 8f 00 00 00 50 10 00 11 64 03 00
> e0: 40 00 11 10 00 00 00 00 05 d0 81 00 0c f0 e0 fe
> -f0: 00 00 00 00 c9 41 00 00 00 00 00 00 00 00 00 00
> +f0: 00 00 00 00 d1 41 00 00 00 00 00 00 00 00 00 00

O.K. These differences can probably be attributed to the driver's chip
reset failure. For some reason, the driver has lost communication with
the firmware through the device's shared memory. A cascading series of
errors will probably be the consequence.

Can you apply the following test patch and see if it helps? The patch
does two things. First, it enables a bit which should restore firmware
communication. If that fixes the problem, then let me know and I'll
spin a proper patch.

In the event that it doesn't work, the patch goes on to test the memory
mapping by simply printing the register value at offset 0x0. The value
should be the device's vendor ID and device ID. Please post the
results so that I can verify it.


diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 8b3f846..39fce42 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -7227,6 +7227,11 @@ static int tg3_init_hw(struct tg3 *tp, int reset_phy)
{
tg3_switch_clocks(tp);

+ printk( KERN_NOTICE "%s: Reg value at offset 0x0 is 0x%x\n",
+ tp->dev->name, tr32(0x0) );
+
+ tw32(MEMARB_MODE, tr32(MEMARB_MODE) | MEMARB_MODE_ENABLE);
+
tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);

return tg3_reset_hw(tp, reset_phy);

2009-01-29 22:07:33

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Thu, 29 Jan 2009, Matt Carlson wrote:

> Can you apply the following test patch and see if it helps? The patch
> does two things. First, it enables a bit which should restore firmware
> communication. If that fixes the problem, then let me know and I'll
> spin a proper patch.
>
> In the event that it doesn't work, the patch goes on to test the memory
> mapping by simply printing the register value at offset 0x0. The value
> should be the device's vendor ID and device ID. Please post the
> results so that I can verify it.
>
>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index 8b3f846..39fce42 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -7227,6 +7227,11 @@ static int tg3_init_hw(struct tg3 *tp, int reset_phy)
> {
> tg3_switch_clocks(tp);
>
> + printk( KERN_NOTICE "%s: Reg value at offset 0x0 is 0x%x\n",
> + tp->dev->name, tr32(0x0) );
> +
> + tw32(MEMARB_MODE, tr32(MEMARB_MODE) | MEMARB_MODE_ENABLE);
> +
> tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);
>
> return tg3_reset_hw(tp, reset_phy);
>

Hi Matt,

Thanks for the patch. It didn't help with resume - but below is the
output after patching, let me know if you need more details.

( Looks like 0xffffffff is invalid/corrupted device id /vendor id? )

[ 163.856001] tg3 0000:0e:00.0: restoring config space at offset 0xc (was 0x0, writing 0x20040000)
[ 163.856001] tg3 0000:0e:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
[ 163.856001] tg3 0000:0e:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100006)

[snip]

[ 164.450277] pcieport-driver 0000:1e:00.0: setting latency timer to 64
[ 164.450415] pcieport-driver 0000:1e:01.0: setting latency timer to 64
[ 164.450493] tg3 0000:0e:00.0: restoring config space at offset 0xc (was 0x0, writing 0x20040000)
[ 164.451110] serial 00:08: activated

[snip]

[ 168.913863] Restarting tasks ... done.
[ 170.332953] tg3 0000:0e:00.0: wake-up capability disabled by ACPI
[ 170.332960] tg3 0000:0e:00.0: PME# disabled
[ 170.333047] tg3 0000:0e:00.0: irq 54 for MSI/MSI-X
[ 170.333250] eth0: Reg value at offset 0x0 is 0xffffffff
[ 170.394281] [drm] Loading R500 Microcode
[ 170.394330] [drm] Num pipes: 1
[ 171.726650] tg3: eth0: No firmware running.
[ 183.119745] ADDRCONF(NETDEV_UP): eth0: link is not ready


Parag

2009-01-29 22:22:39

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Thursday 29 January 2009, Matt Carlson wrote:
> On Wed, Jan 28, 2009 at 06:10:37PM -0800, Linus Torvalds wrote:
> >
> >
> > On Wed, 28 Jan 2009, Parag Warudkar wrote:
> > >
> > > Sure, diff -u below. There are differences but not sure if they are
> > > abnormal or expected.
> >
> > Well, they're all in the "extended set", ie not the basic registers that
> > the PCI layer saves. The PCI layer normally just saves the low 16 dwords,
> > along with the PCI[EX] capability thing.
> >
> > None of the PCI save/restore routines have ever saved the extended state
> > (well, "ever" is a strong word - I think we long ago used to pass in how
> > many bytes we wanted saved, but got rid of it), and it certainly didn't
> > change with the recent PCI suspend/resume changes.
> >
> > I get the feeling that it's some odd tg3 issue. That tg3 driver does have
> > that special
> >
> > /* Make sure register accesses (indirect or otherwise)
> > * will function correctly.
> > */
> > pci_write_config_dword(tp->pdev,
> > TG3PCI_MISC_HOST_CTRL,
> > tp->misc_host_ctrl);
> >
> > in its own version of setting the power state, and maybe that really
> > _must_ happen before we actually set the state back to PCI_D0. That sounds
> > very odd, but hey..
> >
> > I added Matt Carlson to the cc, since he seems to be the main tg3
> > authority here.
> >
> > Matt: the whole discussion is on netdev and the kernel mailing list, but
> > the short version is that -rc3 suspends and resumes for Parag again
> > (unlike -rc2), but tg3 doesn't appear to resume properly. The generic PCI
> > layer now does more at resume time (very early, when interrupts are still
> > off), see
> >
> > - pci_pm_resume_noirq ->
> > pci_pm_default_resume_noirq() ->
> > pci_restore_standard_config()
> >
> > for more of the details (basically it always does that
> > "pci_restore_state()" and tries to bring the device back to PCI_D0).
>
> Thanks Linus. I'm looking over the diffs Parag sent and I already see
> some suspicious register settings. Let me think about this some more
> and then I'll jump into the discussion.

FWIW, I can't reproduce the problem with tg3 on my testbox. Suspend to RAM
and resume seem to work correctly on it.

Thanks,
Rafael

2009-01-29 22:23:21

by Matt Carlson

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Thu, Jan 29, 2009 at 02:06:35PM -0800, Parag Warudkar wrote:
>
>
> On Thu, 29 Jan 2009, Matt Carlson wrote:
>
> > Can you apply the following test patch and see if it helps? The patch
> > does two things. First, it enables a bit which should restore firmware
> > communication. If that fixes the problem, then let me know and I'll
> > spin a proper patch.
> >
> > In the event that it doesn't work, the patch goes on to test the memory
> > mapping by simply printing the register value at offset 0x0. The value
> > should be the device's vendor ID and device ID. Please post the
> > results so that I can verify it.
> >
> >
> > diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> > index 8b3f846..39fce42 100644
> > --- a/drivers/net/tg3.c
> > +++ b/drivers/net/tg3.c
> > @@ -7227,6 +7227,11 @@ static int tg3_init_hw(struct tg3 *tp, int reset_phy)
> > {
> > tg3_switch_clocks(tp);
> >
> > + printk( KERN_NOTICE "%s: Reg value at offset 0x0 is 0x%x\n",
> > + tp->dev->name, tr32(0x0) );
> > +
> > + tw32(MEMARB_MODE, tr32(MEMARB_MODE) | MEMARB_MODE_ENABLE);
> > +
> > tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);
> >
> > return tg3_reset_hw(tp, reset_phy);
> >
>
> Hi Matt,
>
> Thanks for the patch. It didn't help with resume - but below is the
> output after patching, let me know if you need more details.
>
> ( Looks like 0xffffffff is invalid/corrupted device id /vendor id? )
>
> [ 163.856001] tg3 0000:0e:00.0: restoring config space at offset 0xc (was 0x0, writing 0x20040000)
> [ 163.856001] tg3 0000:0e:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
> [ 163.856001] tg3 0000:0e:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100006)
>
> [snip]
>
> [ 164.450277] pcieport-driver 0000:1e:00.0: setting latency timer to 64
> [ 164.450415] pcieport-driver 0000:1e:01.0: setting latency timer to 64
> [ 164.450493] tg3 0000:0e:00.0: restoring config space at offset 0xc (was 0x0, writing 0x20040000)
> [ 164.451110] serial 00:08: activated
>
> [snip]
>
> [ 168.913863] Restarting tasks ... done.
> [ 170.332953] tg3 0000:0e:00.0: wake-up capability disabled by ACPI
> [ 170.332960] tg3 0000:0e:00.0: PME# disabled
> [ 170.333047] tg3 0000:0e:00.0: irq 54 for MSI/MSI-X
> [ 170.333250] eth0: Reg value at offset 0x0 is 0xffffffff
                                                 ^^^^^^^^^^
So here is our problem. For some reason the memory mapped IO is
failing. I'll have to think about how and why that might happen.

FWIW, I can suspend and resume using the latest linux-2.6 kernel
on a machine with a similar chip here. The problem doesn't seem to
affect all Broadcom devices.
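
For readers following along, here is a small sketch of how the test value can be interpreted. It assumes the register at offset 0x0 uses the standard PCI layout (vendor ID in the low 16 bits, device ID in the high 16 bits) and treats an all-ones read as "the access never reached the device", which is the usual result of a failed PCI read. The helper names are hypothetical and are not part of tg3.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_ID_BROADCOM 0x14e4   /* Broadcom's PCI vendor ID */

struct pci_ids {
    uint16_t vendor;
    uint16_t device;
};

/* An all-ones dword is what a read returns when no device responds. */
static bool mmio_read_failed(uint32_t val)
{
    return val == 0xffffffffu;
}

/* Split the dword at offset 0x0 into its vendor and device halves. */
static struct pci_ids split_id_register(uint32_t val)
{
    struct pci_ids ids;

    ids.vendor = (uint16_t)(val & 0xffffu);
    ids.device = (uint16_t)(val >> 16);
    return ids;
}
```

A healthy read would split into vendor 0x14e4 plus the chip's device ID; the logged 0xffffffff fails the first check before any decoding makes sense.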

> [ 170.394281] [drm] Loading R500 Microcode
> [ 170.394330] [drm] Num pipes: 1
> [ 171.726650] tg3: eth0: No firmware running.
> [ 183.119745] ADDRCONF(NETDEV_UP): eth0: link is not ready
>
>
> Parag
>

2009-01-29 22:36:07

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Thu, 29 Jan 2009, Matt Carlson wrote:

> FWIW, I can suspend and resume using the latest linux-2.6 kernel
> on a machine with a similar chip here. The problem doesn't seem to
> affect all Broadcom devices.

It is failing for me on HP xw6600 workstation, if that helps in any way.

Parag

2009-01-29 23:04:38

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Thursday 29 January 2009, Parag Warudkar wrote:
>
> On Thu, 29 Jan 2009, Matt Carlson wrote:
>
> > Can you apply the following test patch and see if it helps? The patch
> > does two things. First, it enables a bit which should restore firmware
> > communication. If that fixes the problem, then let me know and I'll
> > spin a proper patch.
> >
> > In the event that it doesn't work, the patch goes on to test the memory
> > mapping by simply printing the register value at offset 0x0. The value
> > should be the device's vendor ID and device ID. Please post the
> > results so that I can verify it.
> >
> >
> > diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> > index 8b3f846..39fce42 100644
> > --- a/drivers/net/tg3.c
> > +++ b/drivers/net/tg3.c
> > @@ -7227,6 +7227,11 @@ static int tg3_init_hw(struct tg3 *tp, int reset_phy)
> > {
> > tg3_switch_clocks(tp);
> >
> > + printk( KERN_NOTICE "%s: Reg value at offset 0x0 is 0x%x\n",
> > + tp->dev->name, tr32(0x0) );
> > +
> > + tw32(MEMARB_MODE, tr32(MEMARB_MODE) | MEMARB_MODE_ENABLE);
> > +
> > tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);
> >
> > return tg3_reset_hw(tp, reset_phy);
> >
>
> Hi Matt,
>
> Thanks for the patch. It didn't help with resume - but below is the
> output after patching, let me know if you need more details.

[--snip--]

In the meantime I tried to rework tg3 suspend/resume so that it uses the new
PCI core capability of handling the PCI-specific parts of both operations.

The patch is appended, please see if it makes any difference.

Thanks,
Rafael

---
drivers/net/tg3.c | 70 +++++++++++++++++++-----------------------------------
1 file changed, 25 insertions(+), 45 deletions(-)

Index: linux-2.6/drivers/net/tg3.c
===================================================================
--- linux-2.6.orig/drivers/net/tg3.c
+++ linux-2.6/drivers/net/tg3.c
@@ -13330,18 +13330,13 @@ static void __devexit tg3_remove_one(str
}
}

-static int tg3_suspend(struct pci_dev *pdev, pm_message_t state)
+#ifdef CONFIG_PM
+
+static int tg3_suspend(struct device *device)
{
+ struct pci_dev *pdev = to_pci_dev(device);
struct net_device *dev = pci_get_drvdata(pdev);
struct tg3 *tp = netdev_priv(dev);
- pci_power_t target_state;
- int err;
-
- /* PCI register 4 needs to be saved whether netif_running() or not.
- * MSI address and data need to be saved if using MSI and
- * netif_running().
- */
- pci_save_state(pdev);

if (!netif_running(dev))
return 0;
@@ -13363,50 +13358,19 @@ static int tg3_suspend(struct pci_dev *p
tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE;
tg3_full_unlock(tp);

- target_state = pdev->pm_cap ? pci_target_state(pdev) : PCI_D3hot;
-
- err = tg3_set_power_state(tp, target_state);
- if (err) {
- int err2;
-
- tg3_full_lock(tp, 0);
-
- tp->tg3_flags |= TG3_FLAG_INIT_COMPLETE;
- err2 = tg3_restart_hw(tp, 1);
- if (err2)
- goto out;
-
- tp->timer.expires = jiffies + tp->timer_offset;
- add_timer(&tp->timer);
-
- netif_device_attach(dev);
- tg3_netif_start(tp);
-
-out:
- tg3_full_unlock(tp);
-
- if (!err2)
- tg3_phy_start(tp);
- }
-
- return err;
+ return 0;
}

-static int tg3_resume(struct pci_dev *pdev)
+static int tg3_resume(struct device *device)
{
+ struct pci_dev *pdev = to_pci_dev(device);
struct net_device *dev = pci_get_drvdata(pdev);
struct tg3 *tp = netdev_priv(dev);
int err;

- pci_restore_state(tp->pdev);
-
if (!netif_running(dev))
return 0;

- err = tg3_set_power_state(tp, PCI_D0);
- if (err)
- return err;
-
netif_device_attach(dev);

tg3_full_lock(tp, 0);
@@ -13430,13 +13394,29 @@ out:
return err;
}

+struct dev_pm_ops tg3_pm_ops = {
+ .suspend = tg3_suspend,
+ .resume = tg3_resume,
+ .freeze = tg3_suspend,
+ .thaw = tg3_resume,
+ .poweroff = tg3_suspend,
+ .restore = tg3_resume,
+};
+
+#define TG3_PM_OPS (&tg3_pm_ops)
+
+#else /* !CONFIG_PM */
+
+#define TG3_PM_OPS NULL
+
+#endif /* !CONFIG_PM */
+
static struct pci_driver tg3_driver = {
.name = DRV_MODULE_NAME,
.id_table = tg3_pci_tbl,
.probe = tg3_init_one,
.remove = __devexit_p(tg3_remove_one),
- .suspend = tg3_suspend,
- .resume = tg3_resume
+ .driver.pm = TG3_PM_OPS,
};

static int __init tg3_init(void)

2009-01-29 23:10:47

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Thursday 29 January 2009, Parag Warudkar wrote:
>
> On Thu, 29 Jan 2009, Matt Carlson wrote:
>
> > FWIW, I can suspend and resume using the latest linux-2.6 kernel
> > on a machine with a similar chip here. The problem doesn't seem to
> > affect all Broadcom devices.
>
> It is failing for me on HP xw6600 workstation, if that helps in any way.

Hm, I have an xw4600 nearby, will try tomorrow.

Thanks,
Rafael

2009-01-29 23:41:52

by Matt Carlson

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Thu, Jan 29, 2009 at 03:03:37PM -0800, Rafael J. Wysocki wrote:
> On Thursday 29 January 2009, Parag Warudkar wrote:
> >
> > On Thu, 29 Jan 2009, Matt Carlson wrote:
> >
> > > Can you apply the following test patch and see if it helps? The patch
> > > does two things. First, it enables a bit which should restore firmware
> > > communication. If that fixes the problem, then let me know and I'll
> > > spin a proper patch.
> > >
> > > In the event that it doesn't work, the patch goes on to test the memory
> > > mapping by simply printing the register value at offset 0x0. The value
> > > should be the device's vendor ID and device ID. Please post the
> > > results so that I can verify it.
> > >
> > >
> > > diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> > > index 8b3f846..39fce42 100644
> > > --- a/drivers/net/tg3.c
> > > +++ b/drivers/net/tg3.c
> > > @@ -7227,6 +7227,11 @@ static int tg3_init_hw(struct tg3 *tp, int reset_phy)
> > > {
> > > tg3_switch_clocks(tp);
> > >
> > > + printk( KERN_NOTICE "%s: Reg value at offset 0x0 is 0x%x\n",
> > > + tp->dev->name, tr32(0x0) );
> > > +
> > > + tw32(MEMARB_MODE, tr32(MEMARB_MODE) | MEMARB_MODE_ENABLE);
> > > +
> > > tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);
> > >
> > > return tg3_reset_hw(tp, reset_phy);
> > >
> >
> > Hi Matt,
> >
> > Thanks for the patch. It didn't help with resume - but below is the
> > output after patching, let me know if you need more details.
>
> [--snip--]
>
> In the meantime I tried to rework tg3 suspend/resume so that it uses the new
> PCI core capability of handling the PCI-specific parts of both operations.
>
> The patch is appended, please see if it makes any difference.
>
> Thanks,
> Rafael
>
> ---
> drivers/net/tg3.c | 70 +++++++++++++++++++-----------------------------------
> 1 file changed, 25 insertions(+), 45 deletions(-)
>
> Index: linux-2.6/drivers/net/tg3.c
> ===================================================================
> --- linux-2.6.orig/drivers/net/tg3.c
> +++ linux-2.6/drivers/net/tg3.c
> @@ -13330,18 +13330,13 @@ static void __devexit tg3_remove_one(str
> }
> }
>
> -static int tg3_suspend(struct pci_dev *pdev, pm_message_t state)
> +#ifdef CONFIG_PM
> +
> +static int tg3_suspend(struct device *device)
> {
> + struct pci_dev *pdev = to_pci_dev(device);
> struct net_device *dev = pci_get_drvdata(pdev);
> struct tg3 *tp = netdev_priv(dev);
> - pci_power_t target_state;
> - int err;
> -
> - /* PCI register 4 needs to be saved whether netif_running() or not.
> - * MSI address and data need to be saved if using MSI and
> - * netif_running().
> - */
> - pci_save_state(pdev);
>
> if (!netif_running(dev))
> return 0;
> @@ -13363,50 +13358,19 @@ static int tg3_suspend(struct pci_dev *p
> tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE;
> tg3_full_unlock(tp);
>
> - target_state = pdev->pm_cap ? pci_target_state(pdev) : PCI_D3hot;
> -
> - err = tg3_set_power_state(tp, target_state);

tg3_set_power_state() does way more than configuring the power
management registers to the desired state though. It sets up WOL,
configures the chip clocks, etc. This isn't safe to remove.

> - if (err) {
> - int err2;
> -
> - tg3_full_lock(tp, 0);
> -
> - tp->tg3_flags |= TG3_FLAG_INIT_COMPLETE;
> - err2 = tg3_restart_hw(tp, 1);
> - if (err2)
> - goto out;
> -
> - tp->timer.expires = jiffies + tp->timer_offset;
> - add_timer(&tp->timer);
> -
> - netif_device_attach(dev);
> - tg3_netif_start(tp);
> -
> -out:
> - tg3_full_unlock(tp);
> -
> - if (!err2)
> - tg3_phy_start(tp);
> - }
> -
> - return err;
> + return 0;
> }
>
> -static int tg3_resume(struct pci_dev *pdev)
> +static int tg3_resume(struct device *device)
> {
> + struct pci_dev *pdev = to_pci_dev(device);
> struct net_device *dev = pci_get_drvdata(pdev);
> struct tg3 *tp = netdev_priv(dev);
> int err;
>
> - pci_restore_state(tp->pdev);
> -
> if (!netif_running(dev))
> return 0;
>
> - err = tg3_set_power_state(tp, PCI_D0);

...and here tg3_set_power_state() restores our ability to communicate
with the chip via MMIO. Also, after restoring the power state to D0,
the chip is switched back from VAux to VMain. This isn't safe either.

> - if (err)
> - return err;
> -
> netif_device_attach(dev);
>
> tg3_full_lock(tp, 0);
> @@ -13430,13 +13394,29 @@ out:
> return err;
> }
>
> +struct dev_pm_ops tg3_pm_ops = {
> + .suspend = tg3_suspend,
> + .resume = tg3_resume,
> + .freeze = tg3_suspend,
> + .thaw = tg3_resume,
> + .poweroff = tg3_suspend,
> + .restore = tg3_resume,
> +};
> +
> +#define TG3_PM_OPS (&tg3_pm_ops)
> +
> +#else /* !CONFIG_PM */
> +
> +#define TG3_PM_OPS NULL
> +
> +#endif /* !CONFIG_PM */
> +
> static struct pci_driver tg3_driver = {
> .name = DRV_MODULE_NAME,
> .id_table = tg3_pci_tbl,
> .probe = tg3_init_one,
> .remove = __devexit_p(tg3_remove_one),
> - .suspend = tg3_suspend,
> - .resume = tg3_resume
> + .driver.pm = TG3_PM_OPS,
> };
>
> static int __init tg3_init(void)
>
>

2009-01-30 00:11:49

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Friday 30 January 2009, Matt Carlson wrote:
> On Thu, Jan 29, 2009 at 03:03:37PM -0800, Rafael J. Wysocki wrote:
> > On Thursday 29 January 2009, Parag Warudkar wrote:
> > >
> > > On Thu, 29 Jan 2009, Matt Carlson wrote:
> > >
> > > > Can you apply the following test patch and see if it helps? The patch
> > > > does two things. First, it enables a bit which should restore firmware
> > > > communication. If that fixes the problem, then let me know and I'll
> > > > spin a proper patch.
> > > >
> > > > In the event that it doesn't work, the patch goes on to test the memory
> > > > mapping by simply printing the register value at offset 0x0. The value
> > > > should be the device's vendor ID and device ID. Please post the
> > > > results so that I can verify it.
> > > >
> > > >
> > > > diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> > > > index 8b3f846..39fce42 100644
> > > > --- a/drivers/net/tg3.c
> > > > +++ b/drivers/net/tg3.c
> > > > @@ -7227,6 +7227,11 @@ static int tg3_init_hw(struct tg3 *tp, int reset_phy)
> > > > {
> > > > tg3_switch_clocks(tp);
> > > >
> > > > + printk( KERN_NOTICE "%s: Reg value at offset 0x0 is 0x%x\n",
> > > > + tp->dev->name, tr32(0x0) );
> > > > +
> > > > + tw32(MEMARB_MODE, tr32(MEMARB_MODE) | MEMARB_MODE_ENABLE);
> > > > +
> > > > tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);
> > > >
> > > > return tg3_reset_hw(tp, reset_phy);
> > > >
> > >
> > > Hi Matt,
> > >
> > > Thanks for the patch. It didn't help with resume - but below is the
> > > output after patching, let me know if you need more details.
> >
> > [--snip--]
> >
> > In the meantime I tried to rework tg3 suspend/resume so that it uses the new
> > PCI core capability of handling the PCI-specific parts of both operations.
> >
> > The patch is appended, please see if it makes any difference.
> >
> > Thanks,
> > Rafael
> >
> > ---
> > drivers/net/tg3.c | 70 +++++++++++++++++++-----------------------------------
> > 1 file changed, 25 insertions(+), 45 deletions(-)
> >
> > Index: linux-2.6/drivers/net/tg3.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/net/tg3.c
> > +++ linux-2.6/drivers/net/tg3.c
> > @@ -13330,18 +13330,13 @@ static void __devexit tg3_remove_one(str
> > }
> > }
> >
> > -static int tg3_suspend(struct pci_dev *pdev, pm_message_t state)
> > +#ifdef CONFIG_PM
> > +
> > +static int tg3_suspend(struct device *device)
> > {
> > + struct pci_dev *pdev = to_pci_dev(device);
> > struct net_device *dev = pci_get_drvdata(pdev);
> > struct tg3 *tp = netdev_priv(dev);
> > - pci_power_t target_state;
> > - int err;
> > -
> > - /* PCI register 4 needs to be saved whether netif_running() or not.
> > - * MSI address and data need to be saved if using MSI and
> > - * netif_running().
> > - */
> > - pci_save_state(pdev);
> >
> > if (!netif_running(dev))
> > return 0;
> > @@ -13363,50 +13358,19 @@ static int tg3_suspend(struct pci_dev *p
> > tp->tg3_flags &= ~TG3_FLAG_INIT_COMPLETE;
> > tg3_full_unlock(tp);
> >
> > - target_state = pdev->pm_cap ? pci_target_state(pdev) : PCI_D3hot;
> > -
> > - err = tg3_set_power_state(tp, target_state);
>
> tg3_set_power_state() does way more than configuring the power
> management registers to the desired state though. It sets up WOL,
> configures the chip clocks, etc. This isn't safe to remove.

OK, so it requires more care to be taken.

However, suspend-resume seems to work on my test box with this patch applied,
although admittedly I haven't tested WoL.

I still am interested if it makes any difference for Parag.

> > - if (err) {
> > - int err2;
> > -
> > - tg3_full_lock(tp, 0);
> > -
> > - tp->tg3_flags |= TG3_FLAG_INIT_COMPLETE;
> > - err2 = tg3_restart_hw(tp, 1);
> > - if (err2)
> > - goto out;
> > -
> > - tp->timer.expires = jiffies + tp->timer_offset;
> > - add_timer(&tp->timer);
> > -
> > - netif_device_attach(dev);
> > - tg3_netif_start(tp);
> > -
> > -out:
> > - tg3_full_unlock(tp);
> > -
> > - if (!err2)
> > - tg3_phy_start(tp);
> > - }
> > -
> > - return err;
> > + return 0;
> > }
> >
> > -static int tg3_resume(struct pci_dev *pdev)
> > +static int tg3_resume(struct device *device)
> > {
> > + struct pci_dev *pdev = to_pci_dev(device);
> > struct net_device *dev = pci_get_drvdata(pdev);
> > struct tg3 *tp = netdev_priv(dev);
> > int err;
> >
> > - pci_restore_state(tp->pdev);
> > -
> > if (!netif_running(dev))
> > return 0;
> >
> > - err = tg3_set_power_state(tp, PCI_D0);
>
> ...and here tg3_set_power_state() restores our ability to communicate
> with the chip via MMIO.

If that were the case, it wouldn't work after a resume with the patch applied.
Still, it does work, at least with the chip I have here.

> Also, after restoring the power state to D0, the chip is switched back from
> VAux to VMain.

Are you sure this actually happens? On my box the chip is in D0 already
when the BIOS returns control to the kernel.

> This isn't safe either.

Anyway, I'd very much prefer to separate the generic PCI operations from the
device-specific code.

Thanks,
Rafael
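
The split Rafael prefers can be pictured with a small user-space mock. This is only a sketch under stated assumptions: `mock_dev`, `mock_pm_ops`, `core_suspend()` and `core_resume()` are hypothetical stand-ins, not the kernel's `struct device`, `struct dev_pm_ops` or PCI core functions, and the real core does considerably more (power state transitions, noirq phases, legacy callbacks).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct device or dev_pm_ops. */
struct mock_dev {
    bool config_saved;  /* owned by the bus core */
    bool hw_quiesced;   /* owned by the driver */
};

struct mock_pm_ops {
    int (*suspend)(struct mock_dev *dev);
    int (*resume)(struct mock_dev *dev);
};

/* Driver side: device-specific work only, no config-space handling. */
static int drv_suspend(struct mock_dev *dev)
{
    dev->hw_quiesced = true;
    return 0;
}

static int drv_resume(struct mock_dev *dev)
{
    dev->hw_quiesced = false;
    return 0;
}

static const struct mock_pm_ops drv_pm_ops = {
    .suspend = drv_suspend,
    .resume  = drv_resume,
};

/* Core side: run the driver callback (if any), then do the generic part. */
static int core_suspend(struct mock_dev *dev, const struct mock_pm_ops *ops)
{
    if (ops && ops->suspend) {
        int err = ops->suspend(dev);

        if (err)
            return err;
    }
    dev->config_saved = true;   /* generic save happens regardless of driver */
    return 0;
}

/* Core side: generic restore first, then hand over to the driver. */
static int core_resume(struct mock_dev *dev, const struct mock_pm_ops *ops)
{
    dev->config_saved = false;  /* saved state consumed by the restore */
    if (ops && ops->resume)
        return ops->resume(dev);
    return 0;
}
```

In this mock a driver without callbacks still gets its generic state saved, which is the property Linus describes for the new PCI layer. Matt's objection is that tg3's device-specific part (WoL setup, clocks, VAux/VMain switching) is larger than it looks and cannot simply be dropped along with the generic calls.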

2009-01-30 18:41:04

by Matt Carlson

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Thu, Jan 29, 2009 at 02:35:44PM -0800, Parag Warudkar wrote:
>
>
> On Thu, 29 Jan 2009, Matt Carlson wrote:
>
> > FWIW, I can suspend and resume using the latest linux-2.6 kernel
> > on a machine with a similar chip here. The problem doesn't seem to
> > affect all Broadcom devices.
>
> It is failing for me on HP xw6600 workstation, if that helps in any way.
>
> Parag

O.K. Let's test some more assumptions. Can you apply the following
patch and observe the system logs when the device is first loaded and
again after resume. The patch looks at the pci command register to
verify that memory space IO is indeed enabled. (It should be.) This is
all that should be needed for MMIO to work.

If the PCI_COMMAND message doesn't match, then it means that the
PCI_COMMAND register isn't getting restored for some reason. If they do
match, then something else in the system is not getting restored
correctly.


diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 8b3f846..67bb29f 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -7225,8 +7225,17 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
*/
static int tg3_init_hw(struct tg3 *tp, int reset_phy)
{
+ u16 cmd;
+
tg3_switch_clocks(tp);

+ pci_read_config_word(tp->pdev, PCI_COMMAND, &cmd);
+
+ printk(KERN_NOTICE "%s: PCI_COMMAND reg = 0x%x (bit 1 is %s)\n",
+ tp->dev->name, cmd, (cmd & PCI_COMMAND_MEMORY) ? "on" : "off");
+ printk(KERN_NOTICE "%s: Reg value at offset 0x0 is 0x%x\n",
+ tp->dev->name, tr32(0x0));
+
tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);

return tg3_reset_hw(tp, reset_phy);

2009-01-30 22:32:18

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Rafael J. Wysocki wrote:

>
> I still am interested if it makes any difference for Parag.

No difference - tg3 is still dead after resume.

BTW, I applied the patch on top of the one Matt gave earlier.
Machine booted with original tg3 which I rmmod'ed and then insmod'ed the
new one (with Matt's and Rafael's patch) and then attempted a
suspend-resume.

Is there a reason to try fresh boot with patched tg3 and without
loading old module - I guess I will try that one later as well.

Thanks
Parag

2009-01-30 22:37:27

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Parag Warudkar wrote:
>
> BTW, I applied the patch on top of the one Matt gave earlier.
> Machine booted with original tg3 which I rmmod'ed and then insmod'ed the
> new one (with Matt's and Rafael's patch) and then attempted a
> suspend-resume.
>
> Is there a reason to try fresh boot with patched tg3 and without
> loading old module - I guess I will try that one later as well.

As long as you didn't suspend/resume with the old module loaded, it really
shouldn't matter.

Linus

2009-01-30 22:50:59

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Matt Carlson wrote:

>
> On Thu, Jan 29, 2009 at 02:35:44PM -0800, Parag Warudkar wrote:
> >
> >
> > On Thu, 29 Jan 2009, Matt Carlson wrote:
> >
> > > FWIW, I can suspend and resume using the latest linux-2.6 kernel
> > > on a machine with a similar chip here. The problem doesn't seem to
> > > affect all Broadcom devices.
> >
> > It is failing for me on HP xw6600 workstation, if that helps in any way.
> >
> > Parag
>
> O.K. Let's test some more assumptions. Can you apply the following
> patch and observe the system logs when the device is first loaded and
> again after resume. The patch looks at the pci command register to
> verify that memory space IO is indeed enabled. (It should be.) This is
> all that should be needed for MMIO to work.
>
> If the PCI_COMMAND message doesn't match, then it means that the
> PCI_COMMAND register isn't getting restored for some reason. If they do
> match, then something else in the system is not getting restored
> correctly.
>
Here is the output after applying the patch (fresh boot btw) -

[ 29.698877] eth0: PCI_COMMAND reg = 0x406 (bit 1 is on)
[ 29.698880] eth0: Reg value at offset 0x0 is 0x167b14e4
[ 29.758169] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 31.295087] tg3: eth0: Link is up at 100 Mbps, full duplex.
[ 31.295090] tg3: eth0: Flow control is off for TX and off for RX.
[ 31.297574] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 41.872007] eth0: no IPv6 routers present
^^^ Pre-Suspend

[ 245.924484] eth0: PCI_COMMAND reg = 0x406 (bit 1 is on)
[ 245.924487] eth0: Reg value at offset 0x0 is 0xffffffff
[ 247.317971] tg3: eth0: No firmware running.
[ 258.710634] ADDRCONF(NETDEV_UP): eth0: link is not ready
^^^ Post-Suspend

So it looks like the memory space IO is enabled before and after suspend.
The device/vendor id goes 0xffffffff after resume - just like before.
Does that one matter? (Firmware may be looking at it?)
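
The two logged values can be decoded by hand. Below is a minimal user-space sketch in C, assuming the standard PCI command register layout (bit 0 I/O space, bit 1 memory space, bit 2 bus master, bit 10 INTx disable) and, as Matt described, that the dword at offset 0x0 carries the vendor ID in its low 16 bits and the device ID in its high 16 bits. The macro and helper names are made up for illustration and are not part of tg3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI command register bits. The values match the kernel's
 * PCI_COMMAND_* constants; they are redefined so the sketch stands alone. */
#define CMD_IO        0x0001u  /* bit 0: respond to I/O space */
#define CMD_MEMORY    0x0002u  /* bit 1: respond to memory space */
#define CMD_MASTER    0x0004u  /* bit 2: bus mastering */
#define CMD_INTX_OFF  0x0400u  /* bit 10: INTx disabled */

/* True when the device is allowed to decode memory (MMIO) cycles. */
static bool cmd_memory_enabled(uint16_t cmd)
{
    return (cmd & CMD_MEMORY) != 0;
}

/* Low half of the dword at offset 0x0. */
static uint16_t id_vendor(uint32_t reg0)
{
    return (uint16_t)(reg0 & 0xffffu);
}

/* High half of the dword at offset 0x0. */
static uint16_t id_device(uint32_t reg0)
{
    return (uint16_t)(reg0 >> 16);
}

/* An all-ones read usually means no device completed the cycle. */
static bool read_looks_dead(uint32_t value)
{
    return value == 0xffffffffu;
}
```

With the values from the log, 0x406 has the memory, bus master and INTx-disable bits set and the I/O bit clear, and 0x167b14e4 splits into vendor 0x14e4 (Broadcom) and device 0x167b. The post-resume 0xffffffff is the all-ones pattern, so the MMIO read most likely never reached the chip even though its own command register permits memory decoding.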

>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index 8b3f846..67bb29f 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -7225,8 +7225,17 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
> */
> static int tg3_init_hw(struct tg3 *tp, int reset_phy)
> {
> + u16 cmd;
> +
> tg3_switch_clocks(tp);
>
> + pci_read_config_word(tp->pdev, PCI_COMMAND, &cmd);
> +
> + printk(KERN_NOTICE "%s: PCI_COMMAND reg = 0x%x (bit 1 is %s)\n",
> + tp->dev->name, cmd, (cmd & PCI_COMMAND_MEMORY) ? "on" : "off");
> + printk(KERN_NOTICE "%s: Reg value at offset 0x0 is 0x%x\n",
> + tp->dev->name, tr32(0x0));
> +
> tw32(TG3PCI_MEM_WIN_BASE_ADDR, 0);
>
> return tg3_reset_hw(tp, reset_phy);
>
>

2009-01-30 22:55:45

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Friday 30 January 2009, Parag Warudkar wrote:
>
> On Fri, 30 Jan 2009, Rafael J. Wysocki wrote:
>
> >
> > I still am interested if it makes any difference for Parag.
>
> No difference - tg3 is still dead after resume.

Thanks for testing.

Well, I'm not sure if tg3 is at fault, really.

What happens if you unload tg3 before suspend and load it back after the
resume?

Rafael

2009-01-30 23:06:46

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume


On Fri, 30 Jan 2009, Parag Warudkar wrote:
>
> [ 245.924484] eth0: PCI_COMMAND reg = 0x406 (bit 1 is on)
> [ 245.924487] eth0: Reg value at offset 0x0 is 0xffffffff
> [ 247.317971] tg3: eth0: No firmware running.
> [ 258.710634] ADDRCONF(NETDEV_UP): eth0: link is not ready
> ^^^ Post-Suspend
>
> So it looks like the memory space IO is enabled before and after suspend.
> The device/vendor id goes 0xffffffff after resume - just like before.
> Does that one matter? (Firmware may be looking at it?)

One thing strikes me - are there any bridges between the host (CPU) and
that tg3 device?

Because we obviously have two people who say that their tg3 suspend/resume
works fine, so the tg3 driver is obviously not _totally_ broken. So I'm
wondering if there is something funny in between the CPU and the tg3, like
a hotplug bridge that needs magic to wake up properly.

Because clearly the PCI config space addresses are working fine, but the
thing is, while PCI config space accesses are routed by the device number
(and the bridges notion of secondary bridging), the PCI memory space
routing is based on address. So a PCI bridge can easily get one right (in
fact, it's really hard to get config space accesses wrong without the
bridges being _totally_ screwed up), while not routing the other at all.

So just do that "lspci -vvxxx" for the whole box, before and after, and
send us the "before" and the "diff -u before after" thing, and maybe that
shows something interesting. Because some bridge chip being confused would
also explain why a total re-init of the whole tg3 chip by a driver unload
and reload doesn't seem to help.

Linus
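
The distinction Linus draws can be sketched in a few lines of user-space C. The sketch assumes the standard PCI-to-PCI bridge (type 1) header layout: secondary and subordinate bus numbers at offsets 0x19 and 0x1a, Memory Base and Memory Limit at 0x20 and 0x22. It ignores the prefetchable window and the bridge's own command register, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Routing-relevant fields of a PCI-to-PCI bridge (type 1) config header. */
struct bridge_route {
    uint8_t  secondary;    /* offset 0x19: first bus behind the bridge */
    uint8_t  subordinate;  /* offset 0x1a: last bus behind the bridge */
    uint32_t mem_start;    /* decoded from Memory Base, offset 0x20 */
    uint32_t mem_end;      /* decoded from Memory Limit, offset 0x22 */
};

/* Read a little-endian 16-bit register from a raw config-space dump. */
static uint16_t cfg_le16(const uint8_t *cfg, unsigned int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* cfg must hold at least the first 0x24 bytes of the bridge's config space. */
static struct bridge_route bridge_route_parse(const uint8_t *cfg)
{
    struct bridge_route r;

    r.secondary   = cfg[0x19];
    r.subordinate = cfg[0x1a];
    /* Bits 15:4 of each register are address bits 31:20. The window is
     * 1 MiB granular, so the low 20 bits of the limit are all ones. */
    r.mem_start = (uint32_t)(cfg_le16(cfg, 0x20) & 0xfff0u) << 16;
    r.mem_end   = ((uint32_t)(cfg_le16(cfg, 0x22) & 0xfff0u) << 16) | 0xfffffu;
    return r;
}

/* Config cycles are routed by bus number. */
static bool bridge_forwards_config(const struct bridge_route *r, uint8_t bus)
{
    return bus >= r->secondary && bus <= r->subordinate;
}

/* Memory cycles are routed by address; start > end means the window is off. */
static bool bridge_forwards_mem(const struct bridge_route *r, uint32_t addr)
{
    return r->mem_start <= r->mem_end &&
           addr >= r->mem_start && addr <= r->mem_end;
}
```

Feeding it the bytes that Parag's lspci dump later shows for 00:09.0 (offset 0x18: `00 10 40`, offset 0x20: `30 f0 30 f0`) yields buses 0x10 through 0x40 and the window f0300000-f03fffff, matching lspci's own decoding. The two checks are independent, which is the point: a bridge whose bus numbers survived resume but whose memory window did not would still answer config reads for the device while MMIO reads come back as all ones.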

2009-01-30 23:08:15

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Rafael J. Wysocki wrote:
>
> Well, I'm not sure if tg3 is at fault, really.
>
> What happens if you unload tg3 before suspend and load it back after the
> resume?

Yes, good thing to test. See my previous email - maybe it's a bridge.

Linus

2009-01-30 23:13:56

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Rafael J. Wysocki wrote:

> On Friday 30 January 2009, Parag Warudkar wrote:
> >
> > On Fri, 30 Jan 2009, Rafael J. Wysocki wrote:
> >
> > >
> > > I still am interested if it makes any difference for Parag.
> >
> > No difference - tg3 is still dead after resume.
>
> Thanks for testing.
>
> Well, I'm not sure if tg3 is at fault, really.
>
> What happens if you unload tg3 before suspend and load it back after the
> resume?

This time it fails with a different error on loading after the
suspend/resume cycle -

[ 1196.873608] tg3 0000:0e:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1196.873620] tg3 0000:0e:00.0: setting latency timer to 64
[ 1196.880017] tg3 0000:0e:00.0: PME# disabled
[ 1196.996270] tg3: (0000:0e:00.0) phy probe failed, err -19
[ 1197.508033] tg3: Problem fetching invariants of chip, aborting.
[ 1197.508048] tg3 0000:0e:00.0: PCI INT A disabled

Parag

2009-01-30 23:32:30

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Parag Warudkar wrote:
>
> On Fri, 30 Jan 2009, Rafael J. Wysocki wrote:
>
> > On Friday 30 January 2009, Parag Warudkar wrote:
> > >
> > > On Fri, 30 Jan 2009, Rafael J. Wysocki wrote:
> > >
> > > >
> > > > I still am interested if it makes any difference for Parag.
> > >
> > > No difference - tg3 is still dead after resume.
> >
> > Thanks for testing.
> >
> > Well, I'm not sure if tg3 is at fault, really.
> >
> > What happens if you unload tg3 before suspend and load it back after the
> > resume?
>
> This time it fails with a different error on loading after the
> suspend/resume cycle -
>
> [ 1196.873608] tg3 0000:0e:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> [ 1196.873620] tg3 0000:0e:00.0: setting latency timer to 64
> [ 1196.880017] tg3 0000:0e:00.0: PME# disabled
> [ 1196.996270] tg3: (0000:0e:00.0) phy probe failed, err -19
> [ 1197.508033] tg3: Problem fetching invariants of chip, aborting.
> [ 1197.508048] tg3 0000:0e:00.0: PCI INT A disabled

It seems like something between the tg3 chip and the host CPU doesn't work
correctly after resume, Linus is right.

I wonder if this change makes any difference:

--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -501,6 +501,9 @@ static int pci_pm_suspend(struct device
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend(dev, PMSG_SUSPEND);

+ if (!drv || !drv->pm)
+ return 0;
+
if (drv && drv->pm && drv->pm->suspend) {
error = drv->pm->suspend(dev);
suspend_report_result(drv->pm->suspend, error);


Rafael

2009-01-30 23:33:30

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Linus Torvalds wrote:
>
> So just do that "lspci -vvxxx" for the whole box, before and after, and
> send us the "before" and the "diff -u before after" thing, and maybe that
> shows something interesting. Because some bridge chip being confused would
> also explain why a total re-init of the whole tg3 chip by a driver unload
> and reload doesn't seem to help.

It might also be instructive to see the same thing for a working kernel. I
assume plain 2.6.28 works for you?

Linus

2009-01-30 23:45:55

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Linus Torvalds wrote:

>
> Because we obviously have two people who say that their tg3 suspend/resume
> works fine, so the tg3 driver is obviously not _totally_ broken. So I'm
> wondering if there is something funny in between the CPU and the tg3, like
> a hotplug bridge that needs magic to wake up properly.
>
> Because clearly the PCI config space addresses are working fine, but the
> thing is, while PCI config space accesses are routed by the device number
> (and the bridges notion of secondary bridging), the PCI memory space
> routing is based on address. So a PCI bridge can easily get one right (in
> fact, it's really hard to get config space accesses wrong without the
> bridges being _totally_ screwed up), while not routing the other at all.
>
> So just do that "lspci -vvxxx" for the whole box, before and after, and
> send us the "before" and the "diff -u before after" thing, and maybe that
> shows something interesting. Because some bridge chip being confused would
> also explain why a total re-init of the whole tg3 chip by a driver unload
> and reload doesn't seem to help.

Totally worth having this problem from a "getting an opportunity to
understand" standpoint. This confirms my long-standing suspicion that bugs
in the Linux kernel are merely the handiwork of a few clever people to get
more people to understand and contribute :)

Anyhow, here is the pre-suspend lspci -vvxxx output followed by diff -u -

00:00.0 Host bridge: Intel Corporation 5400 Chipset Memory Controller Hub (rev 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 255
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit- Queue=0/1 Enable-
Address: fee00000 Data: 0000
Capabilities: [6c] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Latency L0 <1us, L1 <4us
ClockPM- Suprise+ LLActRep+ BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
Capabilities: [100] Advanced Error Reporting <?>
00: 86 80 03 40 04 01 10 00 20 00 00 06 10 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 01 00 00
40: 00 00 00 00 00 00 00 00 84 14 00 00 82 01 00 00
50: 01 58 03 c8 08 00 00 00 05 6c 02 00 00 00 e0 fe
60: 00 00 00 00 00 00 00 00 23 02 63 00 10 00 42 00
70: 01 80 00 00 0f 00 00 00 41 4c 19 00 40 00 41 30
80: 80 0c 00 00 c0 03 40 01 00 00 00 00 00 00 00 00
90: 17 00 00 00 0a 00 00 00 00 00 00 00 41 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 01 00 71 fe 0c 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 ff 07 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 03 41 00 00 60 00 00 00 00 21 11 00
e0: 01 02 01 02 f9 32 3a 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:01.0 PCI bridge: Intel Corporation 5400 Chipset PCI Express Port 1 (rev 20)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=80, subordinate=80, sec-latency=0
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit- Queue=0/1 Enable+
Address: fee0f00c Data: 4171
Capabilities: [6c] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #4, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <4us
ClockPM- Suprise+ LLActRep+ BwNot+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
Slot # 0, PowerLimit 150.000000; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd Off, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
Changed: MRL- PresDet+ LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
Capabilities: [b0] Subsystem: Hewlett-Packard Company Device 1307
Capabilities: [100] Advanced Error Reporting <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
00: 86 80 21 40 07 05 10 00 20 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 80 80 00 f0 00 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 01 06 00
40: 00 00 00 00 00 00 00 00 84 14 00 00 02 03 00 00
50: 01 58 03 c8 08 00 00 00 05 6c 03 00 0c f0 e0 fe
60: 71 41 00 00 00 00 00 00 00 00 00 00 10 b0 42 01
70: 01 80 00 00 0f 00 00 00 02 4d 39 04 40 00 01 10
80: 00 4b 00 00 c0 03 08 00 00 00 00 00 00 00 00 00
90: 17 00 00 00 0a 00 00 00 00 00 00 00 42 00 01 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:05.0 PCI bridge: Intel Corporation 5400 Chipset PCI Express Port 5 (rev 20)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=60, subordinate=60, sec-latency=0
I/O behind bridge: 00001000-00001fff
Memory behind bridge: f0000000-f00fffff
Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit- Queue=0/1 Enable+
Address: fee0f00c Data: 4179
Capabilities: [6c] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #8, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <4us
ClockPM- Suprise+ LLActRep+ BwNot+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
Slot # 0, PowerLimit 150.000000; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd Off, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet+ LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
Capabilities: [b0] Subsystem: Hewlett-Packard Company Device 1307
Capabilities: [100] Advanced Error Reporting <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
00: 86 80 25 40 07 05 10 00 20 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 60 60 00 10 10 00 00
20: 00 f0 00 f0 01 d0 f1 df 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 01 0a 00
40: 00 00 00 00 00 00 00 00 84 14 00 00 02 03 00 00
50: 01 58 03 c8 08 00 00 00 05 6c 03 00 0c f0 e0 fe
60: 79 41 00 00 00 00 00 00 00 00 00 00 10 b0 42 01
70: 01 80 00 00 0f 00 00 00 02 4d 39 08 40 00 01 31
80: 00 4b 00 00 c0 03 48 01 00 00 00 00 00 00 00 00
90: 17 00 00 00 0a 00 00 00 00 00 00 00 42 00 01 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:09.0 PCI bridge: Intel Corporation 5400 Chipset PCI Express Port 9 (rev 20)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=10, subordinate=40, sec-latency=0
Memory behind bridge: f0300000-f03fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Message Signalled Interrupts: Mask- 64bit- Queue=0/1 Enable+
Address: fee0f00c Data: 4181
Capabilities: [6c] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #12, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <1us, L1 <4us
ClockPM- Suprise+ LLActRep+ BwNot+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
Slot # 0, PowerLimit 25.000000; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd Off, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet- LinkState+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
Capabilities: [b0] Subsystem: Hewlett-Packard Company Device 1307
Capabilities: [100] Advanced Error Reporting <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
00: 86 80 29 40 07 05 10 00 20 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 00 10 40 00 f0 00 00 00
20: 30 f0 30 f0 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 01 06 00
40: 00 00 00 00 00 00 00 00 84 14 00 00 02 03 00 00
50: 01 58 03 c8 08 00 00 00 05 6c 03 00 0c f0 e0 fe
60: 81 41 00 00 00 00 00 00 00 00 00 00 10 b0 42 01
70: 01 80 00 00 0f 00 00 00 42 4c 39 0c 40 00 41 30
80: 80 0c 00 00 c0 03 40 01 00 00 00 00 00 00 00 00
90: 17 00 00 00 0a 00 00 00 00 00 00 00 01 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:10.0 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Kernel modules: i5k_amb
00: 86 80 30 40 00 00 00 00 20 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 ff 00 00 08 00 08 00 00 00 fe 00 00 00 00
50: 00 00 02 00 00 00 04 08 00 10 11 11 11 11 11 33
60: 00 12 88 01 01 e0 00 00 ff ff ff ff 00 00 00 00
70: 09 c0 e2 3f 00 00 00 00 09 c0 e2 3f 00 00 00 00
80: 01 00 80 00 02 02 80 00 00 00 00 00 00 00 00 00
90: 04 01 80 00 08 03 80 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 ff fb eb bf 10 01 00 00
e0: 76 06 00 00 00 00 00 00 00 00 00 00 20 00 00 00
f0: 58 7f 38 00 00 00 58 00 08 09 28 64 80 a2 fc 12

00:10.1 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Kernel modules: i5k_amb
00: 86 80 30 40 00 00 00 00 20 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: e0 01 70 46 03 00 00 00 cd 0a 55 35 32 59 02 00
50: 55 a1 ae c6 c8 00 09 00 16 1a b2 fd 07 00 00 00
60: 00 00 eb 01 08 00 00 00 00 00 00 00 00 d0 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 03 01 00 00 82 02 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00
a0: 00 00 00 20 00 00 00 00 60 06 26 08 60 f6 bf 1f
b0: fb ff ff 1f fb ff ff 1f fb ff ff 1f 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:10.2 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Kernel modules: i5k_amb
00: 86 80 30 40 00 00 00 00 20 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 af ff 03 b0 07 2c 03 ff af ff 03 ff af ff 03
e0: ff af ff 03 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 1f 00 1f 00 1f 00 1f 00 1f 00

00:10.3 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
Subsystem: Intel Corporation Device 8086
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Kernel modules: i5k_amb
00: 86 80 30 40 00 00 00 00 20 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 86 80
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 0d 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:10.4 Host bridge: Intel Corporation 5400 Chipset FSB Registers (rev 20)
Subsystem: Intel Corporation Device 8086
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Kernel modules: i5k_amb
00: 86 80 30 40 00 00 00 00 20 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 86 80
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 2d 00 a8 dc 00 1c 07 07 b4 00 c8 00
f0: 00 00 00 7f 00 00 00 00 48 00 00 00 00 00 c8 00

00:11.0 Host bridge: Intel Corporation 5400 Chipset CE/SF Registers (rev 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
00: 86 80 31 40 00 00 00 00 20 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 03 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 01 00 01 00 10 00 10 00 01 00 01 00 10 00 10 00
d0: 00 c0 ff 03 0a 40 c0 03 90 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:15.0 Host bridge: Intel Corporation 5400 Chipset FBD Registers (rev 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
00: 86 80 35 40 00 00 00 00 20 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 80 14 14 b0 b0 ff 01 00 30 ff 01 00 30
50: 00 00 00 07 00 00 00 00 ff 3f ff 3f 00 40 00 40
60: 00 40 00 40 01 80 01 80 00 00 00 00 00 00 00 00
70: 00 00 00 00 6b e0 6b e0 00 00 0b a9 00 00 0b a9
80: 24 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 01 01 08 00 01 01 08 00 01 01 08 00 01 01 08 00
a0: 01 01 08 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 10 3b 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 06 06 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:15.1 Host bridge: Intel Corporation 5400 Chipset FBD Registers (rev 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
00: 86 80 35 40 00 00 00 00 20 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:16.0 Host bridge: Intel Corporation 5400 Chipset FBD Registers (rev 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
00: 86 80 36 40 00 00 00 00 20 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 80 14 14 b0 b0 ff 01 00 30 ff 01 00 30
50: 00 00 00 07 00 00 00 00 ff 3f ff 3f 00 40 00 40
60: 00 40 00 40 01 80 01 80 00 00 00 00 00 00 00 00
70: 00 00 00 00 69 e0 69 e0 00 00 0b a9 00 00 0b a9
80: 65 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 01 01 20 00 01 01 20 00 01 01 20 00 01 01 20 00
a0: 01 01 20 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 c4 0e 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 06 06 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:16.1 Host bridge: Intel Corporation 5400 Chipset FBD Registers (rev 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
00: 86 80 36 40 00 00 00 00 20 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:1b.0 Audio device: Intel Corporation 631xESB/632xESB High Definition Audio Controller (rev 09)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 21
Region 0: Memory at f0200000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v1) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed unknown, Width x0, ASPM unknown, Latency L0 <64ns, L1 <1us
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Virtual Channel <?>
Capabilities: [130] Root Complex Link <?>
Kernel driver in use: HDA Intel
Kernel modules: snd-hda-intel
00: 86 80 9a 26 06 00 10 00 09 00 03 04 10 00 00 00
10: 04 00 20 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 50 00 00 00 00 00 00 00 03 01 00 00
40: 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 60 42 c8 00 00 00 00 00 00 00 00 00 00 00 00
60: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 91 00 00 00 00 00 00 08 10 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00

00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0
Memory behind bridge: f0100000-f01fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v1) Root Port (Slot-), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <256ns, L1 <4us
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
Address: fee0f00c Data: 4189
Capabilities: [90] Subsystem: Hewlett-Packard Company Device 1307
Capabilities: [a0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100] Virtual Channel <?>
Capabilities: [180] Root Complex Link <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
00: 86 80 90 26 07 05 10 00 09 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 00 0e 0e 00 f0 00 00 00
20: 10 f0 10 f0 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 06 00
40: 10 80 41 00 e0 0f 00 00 00 00 10 00 11 2c 01 01
50: 40 00 11 10 60 00 00 00 00 00 48 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 05 90 01 00 0c f0 e0 fe 89 41 00 00 00 00 00 00
90: 0d a0 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
a0: 01 00 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 11 80 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00

00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 4: I/O ports at 2000 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd
00: 86 80 88 26 05 00 80 02 09 00 03 0c 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 20 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 2f 00 00 03 00 00 00 00 00 01 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00

00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 19
Region 4: I/O ports at 2020 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd
00: 86 80 89 26 05 00 80 02 09 00 03 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 21 20 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 05 02 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 2f 00 00 03 00 00 00 00 00 01 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00

00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin C routed to IRQ 18
Region 4: I/O ports at 2040 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd
00: 86 80 8a 26 05 00 80 02 09 00 03 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 41 20 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 07 03 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 2f 00 00 03 00 00 00 00 00 01 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00

00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin D routed to IRQ 22
Region 4: I/O ports at 2060 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd
00: 86 80 8b 26 05 00 80 02 09 00 03 0c 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 61 20 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 04 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 2f 00 00 03 00 00 00 00 00 01 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00

00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09) (prog-if 20)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f0204000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Debug port: BAR=1 offset=00a0
Kernel driver in use: ehci_hcd
Kernel modules: ehci-hcd
00: 86 80 8c 26 06 01 90 02 09 20 03 0c 00 00 00 00
10: 00 40 20 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 50 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 58 c2 c9 00 00 00 00 0a 00 a0 20 00 00 00 00
60: 20 20 ff 01 00 00 00 00 01 00 00 00 00 00 08 80
70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 aa ff 00 ff ff ff 00 20 00 00 08
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 80 00 09 88 8c 40 00 80 0f 01 00 86 17 00 20

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9) (prog-if 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Subsystem: Hewlett-Packard Company Device 1307
00: 86 80 4e 24 07 01 10 00 d9 01 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 20 f0 00 80 22
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 00 06 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00

00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Kernel modules: intel-rng, iTCO_wdt
00: 86 80 70 26 07 01 00 02 09 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 01 f8 00 00 80 00 00 00 01 fa 00 00 10 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 83 85 87 85 d0 00 00 00 80 83 8a 80 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 10 00 09 14 01 04 00 00 81 04 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 10 06 00 00 01 00 00 00 00 00 00 00 00 13 00 00
b0: 00 00 00 00 00 00 00 00 00 10 00 04 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 33 22 11 00 67 45 00 00 c0 c0 00 00 02 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 01 c0 d1 fe 00 00 00 00 80 0f 01 00 00 00 00 00

00:1f.2 RAID bus controller: Intel Corporation 631xESB/632xESB SATA RAID Controller (rev 09)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 19
Region 0: I/O ports at 20c0 [size=8]
Region 1: I/O ports at 20d0 [size=4]
Region 2: I/O ports at 20c8 [size=8]
Region 3: I/O ports at 20d4 [size=4]
Region 4: I/O ports at 2080 [size=32]
Region 5: Memory at f0204400 (32-bit, non-prefetchable) [size=1K]
Capabilities: [70] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a8] SATA HBA <?>
Kernel driver in use: ahci
Kernel modules: ahci
00: 86 80 82 26 07 00 b0 02 09 00 04 01 00 00 00 00
10: c1 20 00 00 d1 20 00 00 c9 20 00 00 d5 20 00 00
20: 81 20 00 00 00 44 20 f0 00 00 00 00 3c 10 07 13
30: 00 00 00 00 70 00 00 00 00 00 00 00 05 02 00 00
40: 22 c0 22 c0 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 01 a8 02 40 00 00 00 00 00 00 00 00 00 00 00 00
80: 05 70 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 80 03 ff 00 80 03 40 00 00 00 00 00 00 00 00 00
a0: a0 00 00 00 aa 2a 49 2d 12 00 10 00 48 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00

0e:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express (rev 02)
Subsystem: Hewlett-Packard Company Device 1307
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 54
Region 0: Memory at f0100000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at <ignored> [disabled]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Vendor Specific Information <?>
Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Address: 00000000fee0f00c Data: 41c1
Capabilities: [d0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 <4us, L1 <64us
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100] Advanced Error Reporting <?>
Capabilities: [13c] Virtual Channel <?>
Capabilities: [160] Device Serial Number 67-ee-00-fe-ff-29-1f-00
Capabilities: [16c] Power Budgeting <?>
Kernel driver in use: tg3
Kernel modules: tg3
00: e4 14 7b 16 06 04 10 00 02 00 00 02 10 00 00 00
10: 04 00 10 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 04 20 48 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 20 00 64
50: 03 58 fc 00 00 00 00 78 09 e8 78 00 95 ef 08 88
60: 00 00 00 00 00 00 00 00 98 02 02 a0 00 00 18 76
70: f2 10 00 00 c0 00 00 00 20 70 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 34 00 13 04 82 70 08 fc
90: 19 be 00 01 00 00 00 00 00 00 00 00 94 01 00 00
a0: 00 00 00 00 cc 00 00 00 00 00 00 00 29 01 00 00
b0: 00 00 00 00 00 00 00 8e 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 0e 00 00 00 00 00 00 00
d0: 10 00 01 00 a0 8f 00 00 00 50 10 00 11 64 03 00
e0: 40 00 11 10 00 00 00 00 05 d0 81 00 0c f0 e0 fe
f0: 00 00 00 00 c1 41 00 00 00 00 00 00 00 00 00 00
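
The decoded "Status: D0" line for the tg3 device can be cross-checked against the raw bytes by walking the capability list by hand. Below is a minimal Python sketch of that walk, assuming the standard PCI layout (Status register at 0x06, capability pointer at 0x34, PMCSR at offset 4 within the Power Management capability). The helper names (parse_dump, walk_capabilities, pm_state) are made up for illustration, and only the rows of the dump that the walk touches are copied in; rows left out are treated as zero.

```python
# Rows copied from the 0e:00.0 (BCM5755) hex dump above; only the rows the
# capability walk actually reads are included.
DUMP = """\
00: e4 14 7b 16 06 04 10 00 02 00 00 02 10 00 00 00
30: 00 00 04 20 48 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 20 00 64
50: 03 58 fc 00 00 00 00 78 09 e8 78 00 95 ef 08 88
d0: 10 00 01 00 a0 8f 00 00 00 50 10 00 11 64 03 00
e0: 40 00 11 10 00 00 00 00 05 d0 81 00 0c f0 e0 fe
"""

PCI_STATUS = 0x06          # low byte of the Status register
PCI_STATUS_CAP_LIST = 0x10 # "Cap+" in lspci output
PCI_CAP_PTR = 0x34         # offset of the first capability
PCI_CAP_ID_PM = 0x01       # Power Management capability ID


def parse_dump(text):
    """Turn 'offset: b0 b1 ...' rows into a 256-byte config-space image."""
    cfg = bytearray(256)
    for line in text.strip().splitlines():
        offset, _, data = line.partition(":")
        base = int(offset, 16)
        for i, byte in enumerate(data.split()):
            cfg[base + i] = int(byte, 16)
    return bytes(cfg)


def walk_capabilities(cfg):
    """Return [(offset, capability_id), ...] in linked-list order."""
    caps = []
    if not cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST:
        return caps
    ptr = cfg[PCI_CAP_PTR] & 0xFC   # low two bits are reserved
    seen = set()                    # guard against a looping list
    while ptr and ptr not in seen:
        seen.add(ptr)
        caps.append((ptr, cfg[ptr]))
        ptr = cfg[ptr + 1] & 0xFC
    return caps


def pm_state(cfg):
    """Return the D-state (0..3) from PMCSR, or None if there is no PM cap."""
    for offset, cap_id in walk_capabilities(cfg):
        if cap_id == PCI_CAP_ID_PM:
            pmcsr = cfg[offset + 4] | (cfg[offset + 5] << 8)  # little-endian
            return pmcsr & 0x3
    return None


cfg = parse_dump(DUMP)
for offset, cap_id in walk_capabilities(cfg):
    print("cap at 0x%02x: id 0x%02x" % (offset, cap_id))
print("power state: D%d" % pm_state(cfg))
```

The offsets the walk visits (48, 50, 58, e8, d0) are the same ones lspci prints in brackets for this device. Running the same check on a dump taken after resume would show whether the hardware's PMCSR agrees with whatever state the kernel has cached for the device.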

10:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=10, secondary=1e, subordinate=40, sec-latency=0
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [44] Express (v1) Upstream Port, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-SlotPowerLimit 0.000000W
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [70] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Subsystem: Hewlett-Packard Company Device 1307
Capabilities: [100] Advanced Error Reporting <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
00: 86 80 00 35 07 01 10 00 01 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 10 1e 40 00 f0 00 00 20
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 ff 01 06 00
40: 00 28 02 10 10 70 51 00 01 00 00 00 0f 50 0a 00
50: 81 f4 03 00 00 00 41 10 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 01 80 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
80: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00

10:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=10, secondary=11, subordinate=1d, sec-latency=32
Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [44] Express (v1) PCI/PCI-X Bridge, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- BrConfRtry-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Subsystem: Hewlett-Packard Company Device 1307
Capabilities: [d8] PCI-X bridge device
Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=133MHz
Status: Dev=00:00.3 64bit- 133MHz- SCD- USC- SCO- SRD-
Upstream: Capacity=65535 CommitmentLimit=65535
Downstream: Capacity=65535 CommitmentLimit=65535
Capabilities: [100] Advanced Error Reporting <?>
Kernel modules: shpchp
00: 86 80 0c 35 07 01 10 00 01 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 10 11 1d 20 f0 00 a0 22
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 06 00
40: 80 6e 00 ff 10 6c 71 00 01 00 00 00 00 20 0a 00
50: 81 f4 03 00 00 00 41 00 00 00 00 00 05 6c 80 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 01 80 02 c8
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 0d d8 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 07 00 c3 00 03 00 00 00
e0: ff ff ff ff ff ff ff ff 01 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00

1e:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=1e, secondary=20, subordinate=20, sec-latency=0
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [44] Express (v1) Downstream Port (Slot-), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- RBE- FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain+ CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Address: 00000000fee0f00c Data: 4191
Capabilities: [70] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Subsystem: Hewlett-Packard Company Device 1307
Capabilities: [100] Advanced Error Reporting <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
00: 86 80 10 35 07 05 10 00 01 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 1e 20 20 00 f0 00 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 06 00
40: 00 00 c0 00 10 60 61 00 01 00 00 00 0f 50 00 00
50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 48 00
60: 05 70 81 00 0c f0 e0 fe 00 00 00 00 91 41 00 00
70: 01 80 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
80: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 15 00 10 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

1e:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E2 (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=1e, secondary=40, subordinate=40, sec-latency=0
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [44] Express (v1) Downstream Port (Slot-), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- RBE- FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain+ CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Address: 00000000fee0f00c Data: 4199
Capabilities: [70] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Subsystem: Hewlett-Packard Company Device 1307
Capabilities: [100] Advanced Error Reporting <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
00: 86 80 14 35 07 05 10 00 01 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 1e 40 40 00 f0 00 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 06 00
40: 00 00 c0 00 10 60 61 00 01 00 00 00 0f 50 00 00
50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 48 00
60: 05 70 81 00 0c f0 e0 fe 00 00 00 00 99 41 00 00
70: 01 80 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
80: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 15 00 10 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

60:00.0 VGA compatible controller: ATI Technologies Inc RV535 [Radeon X1650 Series] (rev 9e)
Subsystem: Diamond Multimedia Systems Device 0672
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 28
Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at f0000000 (64-bit, non-prefetchable) [size=64K]
Region 4: I/O ports at 1000 [size=256]
[virtual] Expansion ROM at f0020000 [disabled] [size=128K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #8, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [80] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
00: 02 10 c7 71 07 00 10 00 9e 00 00 03 10 00 80 00
10: 0c 00 00 d0 00 00 00 00 04 00 00 f0 00 00 00 00
20: 01 10 00 00 00 00 00 00 00 00 00 00 92 10 72 06
30: 00 00 00 00 50 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 92 10 72 06
50: 01 58 02 06 00 00 00 00 10 80 01 00 a0 0f 58 02
60: 14 08 00 00 01 0d 00 08 40 00 01 11 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 05 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

60:00.1 Display controller: ATI Technologies Inc RV535 [Radeon X1650 Series] (rev 9e)
Subsystem: Diamond Multimedia Systems Device 0673
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Region 0: Memory at f0010000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #8, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
00: 02 10 e7 71 07 00 10 00 9e 00 80 03 10 00 00 00
10: 04 00 01 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 92 10 73 06
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 58 02 06 00 00 00 00 10 00 01 00 80 0f 00 00
60: 04 00 00 00 01 0d 00 08 00 00 01 11 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

----------------
diff -u lspci-pre-suspend lspci-post-suspend

--- lspci-pre-suspend 2009-01-30 18:19:50.752275695 -0500
+++ lspci-post-suspend 2009-01-30 18:20:52.629779008 -0500
@@ -228,7 +228,7 @@
90: 04 01 80 00 08 03 80 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+c0: 01 00 00 00 01 00 00 00 a5 a5 a5 a5 a5 a5 a5 a5
d0: 00 00 00 00 00 00 00 00 ff fb eb bf 10 01 00 00
e0: 76 06 00 00 00 00 00 00 00 00 00 00 20 00 00 00
f0: 58 7f 38 00 00 00 58 00 08 09 28 64 80 a2 fc 12
@@ -286,9 +286,9 @@
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 86 80
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-40: 00 00 00 00 00 00 0d 00 00 00 00 00 00 00 00 00
+40: 00 00 00 00 00 00 0f 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+60: 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -353,7 +353,7 @@
40: 00 00 00 80 14 14 b0 b0 ff 01 00 30 ff 01 00 30
50: 00 00 00 07 00 00 00 00 ff 3f ff 3f 00 40 00 40
60: 00 40 00 40 01 80 01 80 00 00 00 00 00 00 00 00
-70: 00 00 00 00 6b e0 6b e0 00 00 0b a9 00 00 0b a9
+70: 00 00 00 00 00 c0 00 c0 01 1b 4c b8 01 1b 4c b8
80: 24 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 01 01 08 00 01 01 08 00 01 01 08 00 01 01 08 00
a0: 01 01 08 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -395,7 +395,7 @@
40: 00 00 00 80 14 14 b0 b0 ff 01 00 30 ff 01 00 30
50: 00 00 00 07 00 00 00 00 ff 3f ff 3f 00 40 00 40
60: 00 40 00 40 01 80 01 80 00 00 00 00 00 00 00 00
-70: 00 00 00 00 69 e0 69 e0 00 00 0b a9 00 00 0b a9
+70: 00 00 00 00 00 40 00 40 01 1b 4c b8 01 1b 4c b8
80: 65 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 01 01 20 00 01 01 20 00 01 01 20 00 01 01 20 00
a0: 01 01 20 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -472,7 +472,7 @@
f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00

00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
- Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
+ Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0
@@ -505,7 +505,7 @@
Capabilities: [180] Root Complex Link <?>
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
-00: 86 80 90 26 07 05 10 00 09 00 04 06 10 00 81 00
+00: 86 80 90 26 04 05 10 00 09 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 00 0e 0e 00 f0 00 00 00
20: 10 f0 10 f0 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 03 01 06 00
@@ -645,8 +645,8 @@
30: 00 00 00 00 50 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 58 c2 c9 00 00 00 00 0a 00 a0 20 00 00 00 00
-60: 20 20 ff 01 00 00 00 00 01 00 00 00 00 00 08 80
-70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00
+60: 20 20 ff 01 00 00 00 00 01 00 00 00 00 00 08 c0
+70: 00 00 dd 3f 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -698,7 +698,7 @@
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 10 00 09 14 01 04 00 00 81 04 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-a0: 10 06 00 00 01 00 00 00 00 00 00 00 00 13 00 00
+a0: 10 06 00 00 00 00 00 00 00 00 00 00 00 13 00 00
b0: 00 00 00 00 00 00 00 00 00 10 00 04 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 33 22 11 00 67 45 00 00 c0 c0 00 00 02 00 00 00
@@ -754,7 +754,7 @@
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Vendor Specific Information <?>
Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
- Address: 00000000fee0f00c Data: 41c1
+ Address: 00000000fee0f00c Data: 41c9
Capabilities: [d0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
@@ -778,24 +778,24 @@
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 04 20 48 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 20 00 64
-50: 03 58 fc 00 00 00 00 78 09 e8 78 00 95 ef 08 88
-60: 00 00 00 00 00 00 00 00 98 02 02 a0 00 00 18 76
-70: f2 10 00 00 c0 00 00 00 20 70 00 00 00 00 00 00
-80: 00 00 00 00 00 00 00 00 34 00 13 04 82 70 08 fc
-90: 19 be 00 01 00 00 00 00 00 00 00 00 94 01 00 00
-a0: 00 00 00 00 cc 00 00 00 00 00 00 00 29 01 00 00
-b0: 00 00 00 00 00 00 00 8e 00 00 00 00 00 00 00 00
+50: 03 58 fc 00 00 00 00 78 09 e8 78 00 96 f1 08 b8
+60: 00 00 00 00 00 00 00 00 9a 02 02 a0 00 00 00 10
+70: 72 10 00 00 c0 00 00 00 20 70 00 00 00 00 00 00
+80: 00 00 00 00 00 00 00 00 00 00 00 00 fe 70 08 fc
+90: 11 be 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 0e 00 00 00 00 00 00 00
d0: 10 00 01 00 a0 8f 00 00 00 50 10 00 11 64 03 00
e0: 40 00 11 10 00 00 00 00 05 d0 81 00 0c f0 e0 fe
-f0: 00 00 00 00 c1 41 00 00 00 00 00 00 00 00 00 00
+f0: 00 00 00 00 c9 41 00 00 00 00 00 00 00 00 00 00

10:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=10, secondary=1e, subordinate=40, sec-latency=0
- Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
+ Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [44] Express (v1) Upstream Port, MSI 00
@@ -818,7 +818,7 @@
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
00: 86 80 00 35 07 01 10 00 01 00 04 06 10 00 81 00
-10: 00 00 00 00 00 00 00 00 10 1e 40 00 f0 00 00 20
+10: 00 00 00 00 00 00 00 00 10 1e 40 00 f0 00 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 ff 01 06 00
40: 00 28 02 10 10 70 51 00 01 00 00 00 0f 50 0a 00
@@ -839,7 +839,7 @@
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=10, secondary=11, subordinate=1d, sec-latency=32
- Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
+ Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [44] Express (v1) PCI/PCI-X Bridge, MSI 00
@@ -848,7 +848,7 @@
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- BrConfRtry-
MaxPayload 128 bytes, MaxReadReq 512 bytes
- DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
+ DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
@@ -857,7 +857,7 @@
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
- Capabilities: [80] Subsystem: Hewlett-Packard Company Device 1307
+ Capabilities: [80] Subsystem: Gammagraphx, Inc. Device 0000
Capabilities: [d8] PCI-X bridge device
Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=133MHz
Status: Dev=00:00.3 64bit- 133MHz- SCD- USC- SCO- SRD-
@@ -866,14 +866,14 @@
Capabilities: [100] Advanced Error Reporting <?>
Kernel modules: shpchp
00: 86 80 0c 35 07 01 10 00 01 00 04 06 10 00 81 00
-10: 00 00 00 00 00 00 00 00 10 11 1d 20 f0 00 a0 22
+10: 00 00 00 00 00 00 00 00 10 11 1d 20 f0 00 a0 02
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 06 00
-40: 80 6e 00 ff 10 6c 71 00 01 00 00 00 00 20 0a 00
+40: 80 6e 00 ff 10 6c 71 00 01 00 00 00 00 20 00 00
50: 81 f4 03 00 00 00 41 00 00 00 00 00 05 6c 80 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 01 80 02 c8
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-80: 0d d8 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
+80: 0d d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -916,7 +916,7 @@
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 06 00
40: 00 00 c0 00 10 60 61 00 01 00 00 00 0f 50 00 00
-50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 48 00
+50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 58 00
60: 05 70 81 00 0c f0 e0 fe 00 00 00 00 91 41 00 00
70: 01 80 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
80: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
@@ -962,7 +962,7 @@
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 06 00
40: 00 00 c0 00 10 60 61 00 01 00 00 00 0f 50 00 00
-50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 48 00
+50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 58 00
60: 05 70 81 00 0c f0 e0 fe 00 00 00 00 99 41 00 00
70: 01 80 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
80: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00

2009-01-30 23:52:09

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
>
> I wonder if this change makes any difference:
>
> --- linux-2.6.orig/drivers/pci/pci-driver.c
> +++ linux-2.6/drivers/pci/pci-driver.c
> @@ -501,6 +501,9 @@ static int pci_pm_suspend(struct device
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_suspend(dev, PMSG_SUSPEND);
>
> +	if (!drv || !drv->pm)
> +		return 0;
> +
>  	if (drv && drv->pm && drv->pm->suspend) {
>  		error = drv->pm->suspend(dev);
>  		suspend_report_result(drv->pm->suspend, error);

I don't think that's right. Now you don't end up calling
pci_pm_default_suspend_generic() at all, and thus no pci_save_state().

But I think it could easily be the call to pci_disable_enabled_device().
It does that

	if (atomic_read(&dev->enable_cnt))
		do_pci_disable_device(dev);

and that ends up disabling PCI_COMMAND_MASTER and then calling
pcibios_disable_device().

Any device we have ever done pci_enable_device() on would trigger this,
which includes PCIE bridges, for example. And while the pcie driver does
that

	pcie_portdrv_restore_config ->
		pci_enable_device(dev);

thing to re-enable it, that's a no-op since the enable_count is already
non-zero.

And we do try to restore it (pci_restore_standard_config() will call
pci_restore_state()), but since we've done the
pci_disable_enabled_device() _before_ we did the pci_save_state(), we now
restore a non-working setup.

I think. The rules are too damn subtle there. Rafael, can you look around
a bit?

Linus
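
The ordering hazard described in the message above can be shown with a
small userspace model. This is only a sketch under stated assumptions, not
kernel code: struct model_dev and the model_* helpers are hypothetical
stand-ins for struct pci_dev, pci_enable_device(),
pci_disable_enabled_device(), pci_save_state() and pci_restore_state(), and
the command word is the only register modelled. Following the description
above, the "disable" step clears the bus-master bit, and a repeated enable
is a no-op while the enable count is non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI command register bits (config offset 0x04). */
#define CMD_IO     0x0001
#define CMD_MEMORY 0x0002
#define CMD_MASTER 0x0004

/* Hypothetical model of one device; not the kernel's struct pci_dev. */
struct model_dev {
	uint16_t command;       /* live command register */
	uint16_t saved_command; /* snapshot taken by model_save_state() */
	int enable_cnt;         /* how many times the device was enabled */
};

/* Only the 0 -> 1 transition touches the (modelled) hardware. */
void model_enable(struct model_dev *d)
{
	if (++d->enable_cnt > 1)
		return;
	d->command |= CMD_IO | CMD_MEMORY;
}

/* Clears bus mastering but leaves enable_cnt untouched. */
void model_disable_enabled(struct model_dev *d)
{
	if (d->enable_cnt)
		d->command &= (uint16_t)~CMD_MASTER;
}

void model_save_state(struct model_dev *d)
{
	d->saved_command = d->command;
}

void model_restore_state(struct model_dev *d)
{
	d->command = d->saved_command;
}

/* Run one suspend/resume cycle and return the command word afterwards. */
uint16_t command_after_resume(bool save_before_disable)
{
	struct model_dev d = { 0 };

	model_enable(&d);          /* driver probe */
	d.command |= CMD_MASTER;   /* driver turns on bus mastering */

	if (save_before_disable) {
		model_save_state(&d);
		model_disable_enabled(&d);
	} else {
		model_disable_enabled(&d);
		model_save_state(&d);  /* snapshot of an already disabled device */
	}

	d.command = 0;             /* register contents lost across suspend */
	model_restore_state(&d);
	model_enable(&d);          /* no-op: enable_cnt is already non-zero */
	return d.command;
}
```

In this model command_after_resume(false) comes back without CMD_MASTER,
because the snapshot was taken after the disable and the later enable call
does nothing, while command_after_resume(true) comes back with all three
bits set.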

2009-01-30 23:58:14

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Parag Warudkar wrote:
>
> Totally worth having this problem from a "getting an opportunity to
> understand" standpoint. This confirms my long-standing suspicion that bugs
> in the Linux kernel are merely the handiwork of a few clever people to get
> more people to understand and contribute :)

Heh. I wish. But if that's the end result, we've done something good.

> Anyhow, here is the pre-suspend lspci -vvxxx output followed by diff -u -

Bingo.

> diff -u lspci-pre-suspend lspci-post-suspend
>
> --- lspci-pre-suspend 2009-01-30 18:19:50.752275695 -0500
> +++ lspci-post-suspend 2009-01-30 18:20:52.629779008 -0500
>
> 00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
> - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> + Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0

We've disabled IO and MEM behind this bridge - the one that bridges to
secondary bus 0x0e.

And your tg3 device? It's at 0000:0e:00.0. Yeah. Exactly the bus that
we've disabled IO and MEM for.

In other words, it was never your tg3 suspend/resume that was buggy. It
was the suspend/resume for the PCI-E port driver. In fact, I think it's
_exactly_ the issue I just emailed out about.

I bet Rafael can whip up a patch in a minute. I have too much of a
headache right now to look at my screen for a while, so I'll take a break.

Linus
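
The Control line change quoted above lines up with the raw dump bytes for
00:1c.0 in the diff: offsets 0x04/0x05 read "07 05" before suspend and
"04 05" afterwards. A minimal C sketch of that decoding follows. The bit
values are the standard PCI command register bits; the helper names
(command_from_dump, bits_lost, format_command) are made up for this
illustration and only the five flags that are set in the pre-suspend value
are printed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* PCI command register bits (config offset 0x04). */
#define CMD_IO           0x0001 /* respond to I/O space accesses */
#define CMD_MEMORY       0x0002 /* respond to memory space accesses */
#define CMD_MASTER       0x0004 /* bus mastering */
#define CMD_SERR         0x0100 /* SERR# enable */
#define CMD_INTX_DISABLE 0x0400 /* legacy INTx disabled */

/* The dump is little-endian: byte 4 is the low half, byte 5 the high half. */
uint16_t command_from_dump(uint8_t byte4, uint8_t byte5)
{
	return (uint16_t)(byte4 | (byte5 << 8));
}

/* Bits that were set before suspend and are clear after resume. */
uint16_t bits_lost(uint16_t before, uint16_t after)
{
	return (uint16_t)(before & ~after);
}

/* Render a subset of the flags the way lspci's Control: line does. */
void format_command(uint16_t cmd, char *buf, size_t len)
{
	snprintf(buf, len, "I/O%c Mem%c BusMaster%c SERR%c DisINTx%c",
		 (cmd & CMD_IO) ? '+' : '-',
		 (cmd & CMD_MEMORY) ? '+' : '-',
		 (cmd & CMD_MASTER) ? '+' : '-',
		 (cmd & CMD_SERR) ? '+' : '-',
		 (cmd & CMD_INTX_DISABLE) ? '+' : '-');
}
```

With the bytes from the diff, bits_lost(0x0507, 0x0504) leaves exactly
CMD_IO | CMD_MEMORY, which is the I/O- Mem- change shown by lspci.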

2009-01-31 00:00:23

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Parag Warudkar wrote:
>
> On Fri, 30 Jan 2009, Linus Torvalds wrote:
>
> >
> > Because we obviously have two people who say that their tg3 suspend/resume
> > works fine, so the tg3 driver is obviously not _totally_ broken. So I'm
> > wondering if there is something funny in between the CPU and the tg3, like
> > a hotplug bridge that needs magic to wake up properly.
> >
> > Because clearly the PCI config space addresses are working fine, but the
> > thing is, while PCI config space accesses are routed by the device number
> > (and the bridge's notion of secondary bridging), the PCI memory space
> > routing is based on address. So a PCI bridge can easily get one right (in
> > fact, it's really hard to get config space accesses wrong without the
> > bridges being _totally_ screwed up), while not routing the other at all.
> >
> > So just do that "lspci -vvxxx" for the whole box, before and after, and
> > send us the "before" and the "diff -u before after" thing, and maybe that
> > shows something interesting. Because some bridge chip being confused would
> > also explain why a total re-init of the whole tg3 chip by a driver unload
> > and reload doesn't seem to help.
>
> Totally worth having this problem from a "getting an opportunity to
> understand" standpoint. This confirms my long-standing suspicion that bugs
> in the Linux kernel are merely the handiwork of a few clever people to get
> more people to understand and contribute :)
>
> Anyhow, here is the pre-suspend lspci -vvxxx output followed by diff -u -
>

[--snip--]

I think this is what we're looking for:

> @@ -472,7 +472,7 @@
> f0: 00 00 00 00 00 00 00 00 80 0f 01 00 00 00 00 00
>
> 00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
> - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> + Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0

and the PCIe port driver may be at fault.

Can you try to remove the pci_save_state(dev) from pcie_port_suspend_late()
and see if that helps?

Rafael
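
The quoted explanation above distinguishes config-space routing (by bus
number) from memory routing (by address). A simplified C model of a
PCI-to-PCI bridge shows why lspci can still see a device behind 00:1c.0
while its memory is unreachable. This is a sketch under assumptions: struct
bridge and the helper names are hypothetical, the bus numbers come from the
"Bus:" line, and the memory window is decoded from the "10 f0 10 f0" bytes
at offset 0x20 of the 00:1c.0 dump. The address in the usage note is just a
value inside that window, not a BAR taken from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_MEMORY 0x0002 /* memory space enable in the command register */

/* Hypothetical, simplified view of a bridge's type 1 config header. */
struct bridge {
	uint16_t command;
	uint8_t secondary;   /* first bus number behind the bridge */
	uint8_t subordinate; /* last bus number behind the bridge */
	uint32_t mem_base;   /* start of the forwarded memory window */
	uint32_t mem_limit;  /* last byte of the window (inclusive) */
};

/* 00:1c.0 as shown in the dump, with the command word passed in. */
struct bridge root_port_1(uint16_t command)
{
	struct bridge b = {
		.command = command,
		.secondary = 0x0e,
		.subordinate = 0x0e,
		.mem_base = 0xf0100000u,  /* base register 0xf010 */
		.mem_limit = 0xf01fffffu, /* limit register 0xf010 */
	};
	return b;
}

/* Config cycles are routed by bus number; the enable bits play no part. */
bool forwards_config(const struct bridge *b, uint8_t bus)
{
	return bus >= b->secondary && bus <= b->subordinate;
}

/* Memory cycles are routed by address, and only if decoding is enabled. */
bool forwards_memory(const struct bridge *b, uint32_t addr)
{
	if (!(b->command & CMD_MEMORY))
		return false;
	return addr >= b->mem_base && addr <= b->mem_limit;
}
```

With the post-resume command word 0x0504, forwards_config() still accepts
bus 0x0e, but forwards_memory() rejects 0xf0100000. With the pre-suspend
value 0x0507 both succeed.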

2009-01-31 00:08:23

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Linus Torvalds wrote:
>
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> >
> > I wonder if this change makes any difference:
> >
> > --- linux-2.6.orig/drivers/pci/pci-driver.c
> > +++ linux-2.6/drivers/pci/pci-driver.c
> > @@ -501,6 +501,9 @@ static int pci_pm_suspend(struct device
> >  	if (pci_has_legacy_pm_support(pci_dev))
> >  		return pci_legacy_suspend(dev, PMSG_SUSPEND);
> >
> > +	if (!drv || !drv->pm)
> > +		return 0;
> > +
> >  	if (drv && drv->pm && drv->pm->suspend) {
> >  		error = drv->pm->suspend(dev);
> >  		suspend_report_result(drv->pm->suspend, error);
>
> I don't think that's right. Now you don't end up calling
> pci_pm_default_suspend_generic() at all, and thus no pci_save_state().
>
> But I think it could easily be the call to pci_disable_enabled_device().
> It does that
>
> 	if (atomic_read(&dev->enable_cnt))
> 		do_pci_disable_device(dev);
>
> and that ends up disabling PCI_COMMAND_MASTER and then calling
> pcibios_disable_device().
>
> Any device we have ever done pci_enable_device() on would trigger this,
> which includes PCIE bridges, for example. And while the pcie driver does
> that
>
> 	pcie_portdrv_restore_config ->
> 		pci_enable_device(dev);
>
> thing to re-enable it, that's a no-op since the enable_count is already
> non-zero.
>
> And we do try to restore it (pci_restore_standard_config() will call
> pci_restore_state()), but since we've done the
> pci_disable_enabled_device() _before_ we did the pci_save_state(), we now
> restore a non-working setup.
>
> I think. The rules are too damn subtle there. Rafael, can you look around
> a bit?

Sure, I'm looking at it right now.

Rafael

2009-01-31 00:28:48

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:

> > 00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
> > - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> > + Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0, Cache Line Size: 64 bytes
> > Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0
>
> and the PCIe port driver may be at fault.
>
> Can you try to remove the pci_save_state(dev) from pcie_port_suspend_late()
> and see if that helps?
>

I assume you meant pcie_portdrv_suspend_late in
drivers/pci/pcie/portdrv_pci.c - that one did not go well.

With that change the machine refuses to suspend (it comes back after
attempting to) and my keyboard goes dead.

Parag

2009-01-31 00:35:45

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Linus Torvalds wrote:
>
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> >
> > I wonder if this change makes any difference:
> >
> > --- linux-2.6.orig/drivers/pci/pci-driver.c
> > +++ linux-2.6/drivers/pci/pci-driver.c
> > @@ -501,6 +501,9 @@ static int pci_pm_suspend(struct device
> >  	if (pci_has_legacy_pm_support(pci_dev))
> >  		return pci_legacy_suspend(dev, PMSG_SUSPEND);
> >
> > +	if (!drv || !drv->pm)
> > +		return 0;
> > +
> >  	if (drv && drv->pm && drv->pm->suspend) {
> >  		error = drv->pm->suspend(dev);
> >  		suspend_report_result(drv->pm->suspend, error);
>
> I don't think that's right. Now you don't end up calling
> pci_pm_default_suspend_generic() at all, and thus no pci_save_state().
>
> But I think it could easily be the call to pci_disable_enabled_device().
> It does that
>
> 	if (atomic_read(&dev->enable_cnt))
> 		do_pci_disable_device(dev);
>
> and that ends up disabling PCI_COMMAND_MASTER and then calling
> pcibios_disable_device().

pci_disable_enabled_device() is not called for the PCIe port driver, because
it has legacy PM support.

What happens is

	pci_pm_suspend(port) ->
		pci_legacy_suspend(port) ->
			pcie_portdrv_suspend(port) [this doesn't save the state]
			pci_save_state(port)

and then, with interrupts off

	pci_pm_suspend_noirq(port) ->
		pci_legacy_suspend_late(port) ->
			pcie_portdrv_suspend_late(port) ->
				pci_save_state(port)

and I suspect this last pci_save_state() breaks things. I'm not sure why,
though.

Thanks,
Rafael
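
The call sequence above saves the port's state twice. The C sketch below
models only that one property: a later save overwrites the earlier
snapshot, so resume restores whatever the registers held at the second
save. It does not claim that this is what goes wrong here, since the
message above leaves the reason open. struct port_model and the port_*
helpers are hypothetical names, and the assumption that the command word
changes between the two saves is purely illustrative.

```c
#include <stdint.h>

/* Hypothetical model of a port with a single saved-state slot. */
struct port_model {
	uint16_t command; /* live command register */
	uint16_t saved;   /* most recent snapshot */
	int saves;        /* how many snapshots were taken */
};

/* Each save replaces the previous snapshot. */
void port_save_state(struct port_model *p)
{
	p->saved = p->command;
	p->saves++;
}

void port_restore_state(struct port_model *p)
{
	p->command = p->saved;
}

/*
 * Save once in the suspend phase and once in the late (noirq) phase, then
 * restore. The two arguments are the command word at each save.
 */
uint16_t restored_after_two_saves(uint16_t at_first_save,
				  uint16_t at_second_save)
{
	struct port_model p = { .command = at_first_save };

	port_save_state(&p);        /* pci_legacy_suspend() path */
	p.command = at_second_save; /* assumed change between the phases */
	port_save_state(&p);        /* suspend_late path */

	p.command = 0;              /* register contents lost across suspend */
	port_restore_state(&p);
	return p.command;
}
```

If the register is unchanged between the two phases, the second save is
harmless in this model. If it has changed, only the later value survives.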

2009-01-31 00:39:25

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Parag Warudkar wrote:
>
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
>
> > > 00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
> > > - Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> > > + Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > Latency: 0, Cache Line Size: 64 bytes
> > > Bus: primary=00, secondary=0e, subordinate=0e, sec-latency=0
> >
> > and the PCIe port driver may be at fault.
> >
> > Can you try to remove the pci_save_state(dev) from pcie_port_suspend_late()
> > and see if that helps?
> >
>
> I assume you meant pcie_portdrv_suspend_late in
> drivers/pci/pcie/portdrv_pci.c - that one did not go well.

Yes.

> With that change machine refuses to suspend (comes back after attempting)
> and my keyboard goes dead.

This gets more and more interesting.

Can you test the patch below, please?

Rafael

---
Subject: PCI PCIe portdrv: Implement pm object
From: Rafael J. Wysocki <[email protected]>

Implement pm object for the PCI Express port driver in order to use
the new power management framework.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/hotplug/pciehp_core.c | 4 +--
drivers/pci/pcie/aer/aerdrv.c | 6 -----
drivers/pci/pcie/portdrv.h | 4 +--
drivers/pci/pcie/portdrv_core.c | 14 +++++-------
drivers/pci/pcie/portdrv_pci.c | 43 ++++++++++++--------------------------
include/linux/pcieport_if.h | 2 -
6 files changed, 25 insertions(+), 48 deletions(-)

Index: linux-2.6/drivers/pci/pcie/portdrv_pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/portdrv_pci.c
+++ linux-2.6/drivers/pci/pcie/portdrv_pci.c
@@ -49,33 +49,21 @@ static int pcie_portdrv_restore_config(s
}

#ifdef CONFIG_PM
-static int pcie_portdrv_suspend(struct pci_dev *dev, pm_message_t state)
-{
- return pcie_port_device_suspend(dev, state);
-
-}
+static struct dev_pm_ops pcie_portdrv_pm_ops = {
+ .suspend = pcie_port_device_suspend,
+ .resume = pcie_port_device_resume,
+ .freeze = pcie_port_device_suspend,
+ .thaw = pcie_port_device_resume,
+ .poweroff = pcie_port_device_suspend,
+ .restore = pcie_port_device_resume,
+};

-static int pcie_portdrv_suspend_late(struct pci_dev *dev, pm_message_t state)
-{
- return pci_save_state(dev);
-}
+#define PCIE_PORTDRV_PM_OPS (&pcie_portdrv_pm_ops)

-static int pcie_portdrv_resume_early(struct pci_dev *dev)
-{
- return pci_restore_state(dev);
-}
+#else /* !PM */

-static int pcie_portdrv_resume(struct pci_dev *dev)
-{
- pcie_portdrv_restore_config(dev);
- return pcie_port_device_resume(dev);
-}
-#else
-#define pcie_portdrv_suspend NULL
-#define pcie_portdrv_suspend_late NULL
-#define pcie_portdrv_resume_early NULL
-#define pcie_portdrv_resume NULL
-#endif
+#define PCIE_PORTDRV_PM_OPS NULL
+#endif /* !PM */

/*
* pcie_portdrv_probe - Probe PCI-Express port devices
@@ -291,12 +279,9 @@ static struct pci_driver pcie_portdriver
.probe = pcie_portdrv_probe,
.remove = pcie_portdrv_remove,

- .suspend = pcie_portdrv_suspend,
- .suspend_late = pcie_portdrv_suspend_late,
- .resume_early = pcie_portdrv_resume_early,
- .resume = pcie_portdrv_resume,
-
.err_handler = &pcie_portdrv_err_handler,
+
+ .driver.pm = PCIE_PORTDRV_PM_OPS,
};

static int __init pcie_portdrv_init(void)
Index: linux-2.6/drivers/pci/pcie/portdrv.h
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/portdrv.h
+++ linux-2.6/drivers/pci/pcie/portdrv.h
@@ -36,8 +36,8 @@ extern struct bus_type pcie_port_bus_typ
extern int pcie_port_device_probe(struct pci_dev *dev);
extern int pcie_port_device_register(struct pci_dev *dev);
#ifdef CONFIG_PM
-extern int pcie_port_device_suspend(struct pci_dev *dev, pm_message_t state);
-extern int pcie_port_device_resume(struct pci_dev *dev);
+extern int pcie_port_device_suspend(struct device *dev);
+extern int pcie_port_device_resume(struct device *dev);
#endif
extern void pcie_port_device_remove(struct pci_dev *dev);
extern int __must_check pcie_port_bus_register(void);
Index: linux-2.6/drivers/pci/pcie/portdrv_core.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/portdrv_core.c
+++ linux-2.6/drivers/pci/pcie/portdrv_core.c
@@ -280,13 +280,12 @@ int pcie_port_device_register(struct pci
static int suspend_iter(struct device *dev, void *data)
{
struct pcie_port_service_driver *service_driver;
- pm_message_t state = * (pm_message_t *) data;

if ((dev->bus == &pcie_port_bus_type) &&
(dev->driver)) {
service_driver = to_service_driver(dev->driver);
if (service_driver->suspend)
- service_driver->suspend(to_pcie_device(dev), state);
+ service_driver->suspend(to_pcie_device(dev));
}
return 0;
}
@@ -294,11 +293,10 @@ static int suspend_iter(struct device *d
/**
* pcie_port_device_suspend - suspend port services associated with a PCIe port
* @dev: PCI Express port to handle
- * @state: Representation of system power management transition in progress
*/
-int pcie_port_device_suspend(struct pci_dev *dev, pm_message_t state)
+int pcie_port_device_suspend(struct device *dev)
{
- return device_for_each_child(&dev->dev, &state, suspend_iter);
+ return device_for_each_child(dev, NULL, suspend_iter);
}

static int resume_iter(struct device *dev, void *data)
@@ -318,11 +316,11 @@ static int resume_iter(struct device *de
* pcie_port_device_suspend - resume port services associated with a PCIe port
* @dev: PCI Express port to handle
*/
-int pcie_port_device_resume(struct pci_dev *dev)
+int pcie_port_device_resume(struct device *dev)
{
- return device_for_each_child(&dev->dev, NULL, resume_iter);
+ return device_for_each_child(dev, NULL, resume_iter);
}
-#endif
+#endif /* PM */

static int remove_iter(struct device *dev, void *data)
{
Index: linux-2.6/include/linux/pcieport_if.h
===================================================================
--- linux-2.6.orig/include/linux/pcieport_if.h
+++ linux-2.6/include/linux/pcieport_if.h
@@ -59,7 +59,7 @@ struct pcie_port_service_driver {
int (*probe) (struct pcie_device *dev,
const struct pcie_port_service_id *id);
void (*remove) (struct pcie_device *dev);
- int (*suspend) (struct pcie_device *dev, pm_message_t state);
+ int (*suspend) (struct pcie_device *dev);
int (*resume) (struct pcie_device *dev);

/* Service Error Recovery Handler */
Index: linux-2.6/drivers/pci/pcie/aer/aerdrv.c
===================================================================
--- linux-2.6.orig/drivers/pci/pcie/aer/aerdrv.c
+++ linux-2.6/drivers/pci/pcie/aer/aerdrv.c
@@ -41,9 +41,6 @@ MODULE_LICENSE("GPL");
static int __devinit aer_probe (struct pcie_device *dev,
const struct pcie_port_service_id *id );
static void aer_remove(struct pcie_device *dev);
-static int aer_suspend(struct pcie_device *dev, pm_message_t state)
-{return 0;}
-static int aer_resume(struct pcie_device *dev) {return 0;}
static pci_ers_result_t aer_error_detected(struct pci_dev *dev,
enum pci_channel_state error);
static void aer_error_resume(struct pci_dev *dev);
@@ -74,9 +71,6 @@ static struct pcie_port_service_driver a
.probe = aer_probe,
.remove = aer_remove,

- .suspend = aer_suspend,
- .resume = aer_resume,
-
.err_handler = &aer_error_handlers,

.reset_link = aer_root_reset,
Index: linux-2.6/drivers/pci/hotplug/pciehp_core.c
===================================================================
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_core.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_core.c
@@ -468,7 +468,7 @@ static void pciehp_remove (struct pcie_d
}

#ifdef CONFIG_PM
-static int pciehp_suspend (struct pcie_device *dev, pm_message_t state)
+static int pciehp_suspend (struct pcie_device *dev)
{
dev_info(&dev->device, "%s ENTRY\n", __func__);
return 0;
@@ -496,7 +496,7 @@ static int pciehp_resume (struct pcie_de
}
return 0;
}
-#endif
+#endif /* PM */

static struct pcie_port_service_id port_pci_ids[] = { {
.vendor = PCI_ANY_ID,

2009-01-31 00:44:46

by Ingo Molnar

Subject: Re: 2.6.29-rc3: tg3 dead after resume


* Rafael J. Wysocki <[email protected]> wrote:

> +static struct dev_pm_ops pcie_portdrv_pm_ops = {
> + .suspend = pcie_port_device_suspend,
> + .resume = pcie_port_device_resume,
> + .freeze = pcie_port_device_suspend,
> + .thaw = pcie_port_device_resume,
> + .poweroff = pcie_port_device_suspend,
> + .restore = pcie_port_device_resume,
> +};

pet peeve: could we please use vertical spaces wherever they improve the
code?

Something like:

static struct dev_pm_ops pcie_portdrv_pm_ops = {
        .suspend        = pcie_port_device_suspend,
        .resume         = pcie_port_device_resume,
        .freeze         = pcie_port_device_suspend,
        .thaw           = pcie_port_device_resume,
        .poweroff       = pcie_port_device_suspend,
        .restore        = pcie_port_device_resume,
};

... and it all becomes clear at a glance. Please?

Ingo

2009-01-31 00:47:59

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > +static struct dev_pm_ops pcie_portdrv_pm_ops = {
> > + .suspend = pcie_port_device_suspend,
> > + .resume = pcie_port_device_resume,
> > + .freeze = pcie_port_device_suspend,
> > + .thaw = pcie_port_device_resume,
> > + .poweroff = pcie_port_device_suspend,
> > + .restore = pcie_port_device_resume,
> > +};
>
> pet peeve: could we please use vertical spaces wherever they improve the
> code?
>
> Something like:
>
> static struct dev_pm_ops pcie_portdrv_pm_ops = {
>         .suspend        = pcie_port_device_suspend,
>         .resume         = pcie_port_device_resume,
>         .freeze         = pcie_port_device_suspend,
>         .thaw           = pcie_port_device_resume,
>         .poweroff       = pcie_port_device_suspend,
>         .restore        = pcie_port_device_resume,
> };
>
> ... and it all becomes clear at a glance. Please?

OK

Rafael

2009-01-31 01:21:52

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:

> This gets more and more interesting.
>
> Can you test the patch below, please?
>
> Rafael
>
> ---
> Subject: PCI PCIe portdrv: Implement pm object
> From: Rafael J. Wysocki <[email protected]>
>
> Implement pm object for the PCI Express port driver in order to use
> the new power management framework.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---

Excellent! This patch works - tg3 comes back and gets link after resume.

Thank you!

Are the below differences worth worrying about - especially since post
suspend some DevID/VendorID and some capabilities seem to be changed?

parag@parag-desktop:~$ diff -u lspci-pre-suspend lspci-post-fix
--- lspci-pre-suspend 2009-01-30 18:19:50.752275695 -0500
+++ lspci-post-fix 2009-01-30 20:14:22.605607870 -0500
@@ -286,9 +286,9 @@
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 86 80
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-40: 00 00 00 00 00 00 0d 00 00 00 00 00 00 00 00 00
+40: 00 00 00 00 00 00 0f 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+60: 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -353,7 +353,7 @@
40: 00 00 00 80 14 14 b0 b0 ff 01 00 30 ff 01 00 30
50: 00 00 00 07 00 00 00 00 ff 3f ff 3f 00 40 00 40
60: 00 40 00 40 01 80 01 80 00 00 00 00 00 00 00 00
-70: 00 00 00 00 6b e0 6b e0 00 00 0b a9 00 00 0b a9
+70: 00 00 00 00 00 c0 00 c0 01 1b 4c b8 01 1b 4c b8
80: 24 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 01 01 08 00 01 01 08 00 01 01 08 00 01 01 08 00
a0: 01 01 08 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -395,7 +395,7 @@
40: 00 00 00 80 14 14 b0 b0 ff 01 00 30 ff 01 00 30
50: 00 00 00 07 00 00 00 00 ff 3f ff 3f 00 40 00 40
60: 00 40 00 40 01 80 01 80 00 00 00 00 00 00 00 00
-70: 00 00 00 00 69 e0 69 e0 00 00 0b a9 00 00 0b a9
+70: 00 00 00 00 00 40 00 40 01 1b 4c b8 01 1b 4c b8
80: 65 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 01 01 20 00 01 01 20 00 01 01 20 00 01 01 20 00
a0: 01 01 20 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -645,8 +645,8 @@
30: 00 00 00 00 50 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 58 c2 c9 00 00 00 00 0a 00 a0 20 00 00 00 00
-60: 20 20 ff 01 00 00 00 00 01 00 00 00 00 00 08 80
-70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00
+60: 20 20 ff 01 00 00 00 00 01 00 00 00 00 00 08 c0
+70: 00 00 dd 3f 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -698,7 +698,7 @@
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 10 00 09 14 01 04 00 00 81 04 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-a0: 10 06 00 00 01 00 00 00 00 00 00 00 00 13 00 00
+a0: 10 06 00 00 00 00 00 00 00 00 00 00 00 13 00 00
b0: 00 00 00 00 00 00 00 00 00 10 00 04 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 33 22 11 00 67 45 00 00 c0 c0 00 00 02 00 00 00
@@ -754,7 +754,7 @@
Capabilities: [50] Vital Product Data <?>
Capabilities: [58] Vendor Specific Information <?>
Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+
Queue=0/0 Enable+
- Address: 00000000fee0f00c Data: 41c1
+ Address: 00000000fee0f00c Data: 41c9
Capabilities: [d0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
@@ -778,24 +778,24 @@
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
30: 00 00 04 20 48 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 20 00 64
-50: 03 58 fc 00 00 00 00 78 09 e8 78 00 95 ef 08 88
+50: 03 58 fc 00 00 00 00 78 09 e8 78 00 9a f7 08 58
60: 00 00 00 00 00 00 00 00 98 02 02 a0 00 00 18 76
70: f2 10 00 00 c0 00 00 00 20 70 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 34 00 13 04 82 70 08 fc
-90: 19 be 00 01 00 00 00 00 00 00 00 00 94 01 00 00
-a0: 00 00 00 00 cc 00 00 00 00 00 00 00 29 01 00 00
-b0: 00 00 00 00 00 00 00 8e 00 00 00 00 00 00 00 00
+90: 19 be 00 01 00 00 00 44 00 00 00 00 e7 00 00 00
+a0: 00 00 00 00 1f 00 00 00 00 00 00 00 24 00 00 00
+b0: 00 00 00 00 00 00 00 44 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 0e 00 00 00 00 00 00 00
d0: 10 00 01 00 a0 8f 00 00 00 50 10 00 11 64 03 00
e0: 40 00 11 10 00 00 00 00 05 d0 81 00 0c f0 e0 fe
-f0: 00 00 00 00 c1 41 00 00 00 00 00 00 00 00 00 00
+f0: 00 00 00 00 c9 41 00 00 00 00 00 00 00 00 00 00

10:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express
Upstream Port (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=10, secondary=1e, subordinate=40, sec-latency=0
- Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
+ Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [44] Express (v1) Upstream Port, MSI 00
@@ -818,7 +818,7 @@
Kernel driver in use: pcieport-driver
Kernel modules: shpchp
00: 86 80 00 35 07 01 10 00 01 00 04 06 10 00 81 00
-10: 00 00 00 00 00 00 00 00 10 1e 40 00 f0 00 00 20
+10: 00 00 00 00 00 00 00 00 10 1e 40 00 f0 00 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 ff 01 06 00
40: 00 28 02 10 10 70 51 00 01 00 00 00 0f 50 0a 00
@@ -839,7 +839,7 @@
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=10, secondary=11, subordinate=1d, sec-latency=32
- Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
+ Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [44] Express (v1) PCI/PCI-X Bridge, MSI 00
@@ -848,7 +848,7 @@
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
BrConfRtry-
MaxPayload 128 bytes, MaxReadReq 512 bytes
- DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr-
TransPend-
+ DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr-
TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s,
Latency L0 unlimited, L1 unlimited
ClockPM- Suprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
@@ -857,7 +857,7 @@
Capabilities: [6c] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
- Capabilities: [80] Subsystem: Hewlett-Packard Company Device 1307
+ Capabilities: [80] Subsystem: Gammagraphx, Inc. Device 0000
Capabilities: [d8] PCI-X bridge device
Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD-
Freq=133MHz
Status: Dev=00:00.3 64bit- 133MHz- SCD- USC- SCO- SRD-
@@ -866,14 +866,14 @@
Capabilities: [100] Advanced Error Reporting <?>
Kernel modules: shpchp
00: 86 80 0c 35 07 01 10 00 01 00 04 06 10 00 81 00
-10: 00 00 00 00 00 00 00 00 10 11 1d 20 f0 00 a0 22
+10: 00 00 00 00 00 00 00 00 10 11 1d 20 f0 00 a0 02
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 06 00
-40: 80 6e 00 ff 10 6c 71 00 01 00 00 00 00 20 0a 00
+40: 80 6e 00 ff 10 6c 71 00 01 00 00 00 00 20 00 00
50: 81 f4 03 00 00 00 41 00 00 00 00 00 05 6c 80 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 01 80 02 c8
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-80: 0d d8 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
+80: 0d d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
@@ -916,7 +916,7 @@
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 06 00
40: 00 00 c0 00 10 60 61 00 01 00 00 00 0f 50 00 00
-50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 48 00
+50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 58 00
60: 05 70 81 00 0c f0 e0 fe 00 00 00 00 91 41 00 00
70: 01 80 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
80: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
@@ -962,7 +962,7 @@
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 06 00
40: 00 00 c0 00 10 60 61 00 01 00 00 00 0f 50 00 00
-50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 48 00
+50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 58 00
60: 05 70 81 00 0c f0 e0 fe 00 00 00 00 99 41 00 00
70: 01 80 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
80: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
parag@parag-desktop:~$

2009-01-31 01:38:10

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Parag Warudkar wrote:
>
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
>
> > This gets more and more interesting.
> >
> > Can you test the patch below, please?
> >
> > Rafael
> >
> > ---
> > Subject: PCI PCIe portdrv: Implement pm object
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Implement pm object for the PCI Express port driver in order to use
> > the new power management framework.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
>
> Excellent! This patch works - tg3 comes back and gets link after resume.
>
> Thank you!

Great, thanks for testing.

It's in Jesse's linux-next branch, so it should be easy to push
upstream, perhaps with a better changelog.

> Are the below differences worth worrying about - especially since post
> suspend some DevID/VendorID and some capabilities seem to be changed?

I can't tell you right now, I'm too tired. :-)

Anyway, they seem to be worth investigating.

Can you attach one of the files? That will make it easier to look at the
differences.

Thanks,
Rafael


> parag@parag-desktop:~$ diff -u lspci-pre-suspend lspci-post-fix
> --- lspci-pre-suspend 2009-01-30 18:19:50.752275695 -0500
> +++ lspci-post-fix 2009-01-30 20:14:22.605607870 -0500
> @@ -286,9 +286,9 @@
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 86 80
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> -40: 00 00 00 00 00 00 0d 00 00 00 00 00 00 00 00 00
> +40: 00 00 00 00 00 00 0f 00 00 00 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> -60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> +60: 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> @@ -353,7 +353,7 @@
> 40: 00 00 00 80 14 14 b0 b0 ff 01 00 30 ff 01 00 30
> 50: 00 00 00 07 00 00 00 00 ff 3f ff 3f 00 40 00 40
> 60: 00 40 00 40 01 80 01 80 00 00 00 00 00 00 00 00
> -70: 00 00 00 00 6b e0 6b e0 00 00 0b a9 00 00 0b a9
> +70: 00 00 00 00 00 c0 00 c0 01 1b 4c b8 01 1b 4c b8
> 80: 24 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 01 01 08 00 01 01 08 00 01 01 08 00 01 01 08 00
> a0: 01 01 08 00 00 00 00 00 00 00 00 00 00 00 00 00
> @@ -395,7 +395,7 @@
> 40: 00 00 00 80 14 14 b0 b0 ff 01 00 30 ff 01 00 30
> 50: 00 00 00 07 00 00 00 00 ff 3f ff 3f 00 40 00 40
> 60: 00 40 00 40 01 80 01 80 00 00 00 00 00 00 00 00
> -70: 00 00 00 00 69 e0 69 e0 00 00 0b a9 00 00 0b a9
> +70: 00 00 00 00 00 40 00 40 01 1b 4c b8 01 1b 4c b8
> 80: 65 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 01 01 20 00 01 01 20 00 01 01 20 00 01 01 20 00
> a0: 01 01 20 00 00 00 00 00 00 00 00 00 00 00 00 00
> @@ -645,8 +645,8 @@
> 30: 00 00 00 00 50 00 00 00 00 00 00 00 03 01 00 00
> 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 50: 01 58 c2 c9 00 00 00 00 0a 00 a0 20 00 00 00 00
> -60: 20 20 ff 01 00 00 00 00 01 00 00 00 00 00 08 80
> -70: 00 00 df 3f 00 00 00 00 00 00 00 00 00 00 00 00
> +60: 20 20 ff 01 00 00 00 00 01 00 00 00 00 00 08 c0
> +70: 00 00 dd 3f 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> @@ -698,7 +698,7 @@
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 10 00 09 14 01 04 00 00 81 04 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> -a0: 10 06 00 00 01 00 00 00 00 00 00 00 00 13 00 00
> +a0: 10 06 00 00 00 00 00 00 00 00 00 00 00 13 00 00
> b0: 00 00 00 00 00 00 00 00 00 10 00 04 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 33 22 11 00 67 45 00 00 c0 c0 00 00 02 00 00 00
> @@ -754,7 +754,7 @@
> Capabilities: [50] Vital Product Data <?>
> Capabilities: [58] Vendor Specific Information <?>
> Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable+
> - Address: 00000000fee0f00c Data: 41c1
> + Address: 00000000fee0f00c Data: 41c9
> Capabilities: [d0] Express (v1) Endpoint, MSI 00
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> <4us, L1 unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> @@ -778,24 +778,24 @@
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 07 13
> 30: 00 00 04 20 48 00 00 00 00 00 00 00 03 01 00 00
> 40: 00 00 00 00 00 00 00 00 01 50 03 c0 08 20 00 64
> -50: 03 58 fc 00 00 00 00 78 09 e8 78 00 95 ef 08 88
> +50: 03 58 fc 00 00 00 00 78 09 e8 78 00 9a f7 08 58
> 60: 00 00 00 00 00 00 00 00 98 02 02 a0 00 00 18 76
> 70: f2 10 00 00 c0 00 00 00 20 70 00 00 00 00 00 00
> 80: 00 00 00 00 00 00 00 00 34 00 13 04 82 70 08 fc
> -90: 19 be 00 01 00 00 00 00 00 00 00 00 94 01 00 00
> -a0: 00 00 00 00 cc 00 00 00 00 00 00 00 29 01 00 00
> -b0: 00 00 00 00 00 00 00 8e 00 00 00 00 00 00 00 00
> +90: 19 be 00 01 00 00 00 44 00 00 00 00 e7 00 00 00
> +a0: 00 00 00 00 1f 00 00 00 00 00 00 00 24 00 00 00
> +b0: 00 00 00 00 00 00 00 44 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 0e 00 00 00 00 00 00 00
> d0: 10 00 01 00 a0 8f 00 00 00 50 10 00 11 64 03 00
> e0: 40 00 11 10 00 00 00 00 05 d0 81 00 0c f0 e0 fe
> -f0: 00 00 00 00 c1 41 00 00 00 00 00 00 00 00 00 00
> +f0: 00 00 00 00 c9 41 00 00 00 00 00 00 00 00 00 00
>
> 10:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express
> Upstream Port (rev 01)
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR+ FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Bus: primary=10, secondary=1e, subordinate=40, sec-latency=0
> - Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort+ <SERR- <PERR-
> + Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- <SERR- <PERR-
> BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> Capabilities: [44] Express (v1) Upstream Port, MSI 00
> @@ -818,7 +818,7 @@
> Kernel driver in use: pcieport-driver
> Kernel modules: shpchp
> 00: 86 80 00 35 07 01 10 00 01 00 04 06 10 00 81 00
> -10: 00 00 00 00 00 00 00 00 10 1e 40 00 f0 00 00 20
> +10: 00 00 00 00 00 00 00 00 10 1e 40 00 f0 00 00 00
> 20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 44 00 00 00 00 00 00 00 ff 01 06 00
> 40: 00 28 02 10 10 70 51 00 01 00 00 00 0f 50 0a 00
> @@ -839,7 +839,7 @@
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Bus: primary=10, secondary=11, subordinate=1d, sec-latency=32
> - Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort+ <SERR- <PERR-
> + Secondary status: 66MHz+ FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- <SERR- <PERR-
> BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> Capabilities: [44] Express (v1) PCI/PCI-X Bridge, MSI 00
> @@ -848,7 +848,7 @@
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
> Unsupported-
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> BrConfRtry-
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> - DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr-
> TransPend-
> + DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr-
> TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s,
> Latency L0 unlimited, L1 unlimited
> ClockPM- Suprise- LLActRep- BwNot-
> LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
> @@ -857,7 +857,7 @@
> Capabilities: [6c] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> - Capabilities: [80] Subsystem: Hewlett-Packard Company Device 1307
> + Capabilities: [80] Subsystem: Gammagraphx, Inc. Device 0000
> Capabilities: [d8] PCI-X bridge device
> Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD-
> Freq=133MHz
> Status: Dev=00:00.3 64bit- 133MHz- SCD- USC- SCO- SRD-
> @@ -866,14 +866,14 @@
> Capabilities: [100] Advanced Error Reporting <?>
> Kernel modules: shpchp
> 00: 86 80 0c 35 07 01 10 00 01 00 04 06 10 00 81 00
> -10: 00 00 00 00 00 00 00 00 10 11 1d 20 f0 00 a0 22
> +10: 00 00 00 00 00 00 00 00 10 11 1d 20 f0 00 a0 02
> 20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 06 00
> -40: 80 6e 00 ff 10 6c 71 00 01 00 00 00 00 20 0a 00
> +40: 80 6e 00 ff 10 6c 71 00 01 00 00 00 00 20 00 00
> 50: 81 f4 03 00 00 00 41 00 00 00 00 00 05 6c 80 00
> 60: 00 00 00 00 00 00 00 00 00 00 00 00 01 80 02 c8
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> -80: 0d d8 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
> +80: 0d d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> @@ -916,7 +916,7 @@
> 20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 06 00
> 40: 00 00 c0 00 10 60 61 00 01 00 00 00 0f 50 00 00
> -50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 48 00
> +50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 58 00
> 60: 05 70 81 00 0c f0 e0 fe 00 00 00 00 91 41 00 00
> 70: 01 80 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
> @@ -962,7 +962,7 @@
> 20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 44 00 00 00 00 00 00 00 00 01 06 00
> 40: 00 00 c0 00 10 60 61 00 01 00 00 00 0f 50 00 00
> -50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 48 00
> +50: 41 f4 03 00 20 00 01 10 80 0c 00 00 c0 03 58 00
> 60: 05 70 81 00 0c f0 e0 fe 00 00 00 00 99 41 00 00
> 70: 01 80 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 0d 00 00 00 3c 10 07 13 00 00 00 00 00 00 00 00
> parag@parag-desktop:~$

2009-01-31 01:41:42

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
>
> Can you test the patch below, please?

Rafael, you're making this _way_ too difficult.

Don't make it use the new PM infrastructure, because that one is certainly
broken: pci_pm_default_suspend_generic() is total crap.

It's saving the disabled state. No WAY is that correct.

That "pci_disable_enabled_device()" should be removed, but even then
that's wrong, because if the driver suspend disabled it, you're now
(again) saving the disabled state.

But all of that is only called if you use the new PM infrastructure. So
the thing is, when you're trying to move the PCI-E driver to the new pm
infrastructure, you're making things _worse_.

The legacy PM infrastructure at least does the whole

pci_dev->state_saved = false;
i = drv->suspend(pci_dev, state);
..
if (pci_dev->state_saved)
goto Fixup;

thing, which will avoid overwriting the state if it was already saved.
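
A minimal sketch of that guard, as a toy C model rather than the real pci_legacy_suspend(): the core clears a flag, calls the driver, and only takes its own snapshot if the driver did not. All toy_* names are hypothetical, and the "driver saves, then powers the device down" callback is an assumed worst case, chosen to show what an unguarded second save would capture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a device; this is NOT the kernel's struct pci_dev. */
struct toy_dev {
    uint16_t command;       /* live command register */
    uint16_t saved_command; /* last snapshot */
    bool state_saved;
};

static void toy_save_state(struct toy_dev *d)
{
    d->saved_command = d->command;
    d->state_saved = true;
}

/* Hypothetical driver: saves state itself, then puts the device to sleep. */
static void toy_drv_saves_then_sleeps(struct toy_dev *d)
{
    toy_save_state(d);
    d->command = 0;
}

/* Hypothetical driver: leaves everything to the core. */
static void toy_drv_does_nothing(struct toy_dev *d)
{
    (void)d;
}

/* Returns the snapshot that resume would later restore. */
static uint16_t toy_core_suspend(uint16_t initial,
                                 void (*drv_suspend)(struct toy_dev *),
                                 bool guard_on_state_saved)
{
    struct toy_dev d = { .command = initial };

    d.state_saved = false;
    drv_suspend(&d);

    /* Guarded: skip the core save if the driver already did one. */
    if (!guard_on_state_saved || !d.state_saved)
        toy_save_state(&d);

    return d.saved_command;
}
```

With the guard, toy_core_suspend(0x0107, toy_drv_saves_then_sleeps, true) keeps the driver's snapshot of 0x0107. Without it, the same call with false returns 0, because the core re-saves after the device has already been powered down.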

HOWEVER. The problem here (I think) is that PCI-E actually does the state
save late, so it won't ever see the "state_saved" in the early ->suspend.
I think a patch like the one below at least simplifies this all, and lets
the PCI layer itself do all the core restore stuff.

The new PM infrastructure gets this totally wrong, and
(a) disables the device before saving state
and
(b) overwrites the (now corrupted) saved state that the driver perhaps
already saved, after the driver may even have put it to sleep.

So let's not use the new PM infrastructure - I don't think it's ready yet.

Let's start simplifying first. Start off by getting rid of the
suspend_late/resume_early, since the PCI layer now does it for us.

I don't see why we don't resume with IO/MEM on, though. The legacy suspend
sequence shouldn't disable them, afaik.
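
For reference, the "I/O- Mem-" flags in the lspci Control line quoted earlier in the thread map onto the low bits of the PCI command register. The small C helper below decodes them; toy_format_control() is a hypothetical name, and the values 0x0507 and 0x0504 are reconstructed from the flags lspci printed for the root port (I/O, Mem, BusMaster, SERR, DisINTx), not read from a raw dump. With the memory enable bit clear, the port would plausibly stop forwarding memory accesses to the devices behind it, which fits tg3 looking dead, but that link is an inference here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Low bits of the PCI command register (config space offset 0x04). */
#define CMD_IO     0x0001u  /* respond to I/O space accesses */
#define CMD_MEMORY 0x0002u  /* respond to memory space accesses */
#define CMD_MASTER 0x0004u  /* may act as bus master */

/*
 * Render the first three Control flags the way lspci prints them.
 * Returns a pointer to a static buffer, so the result is only valid
 * until the next call.
 */
static const char *toy_format_control(uint16_t cmd)
{
    static char buf[64];

    snprintf(buf, sizeof(buf), "I/O%c Mem%c BusMaster%c",
             (cmd & CMD_IO) ? '+' : '-',
             (cmd & CMD_MEMORY) ? '+' : '-',
             (cmd & CMD_MASTER) ? '+' : '-');
    return buf;
}
```

Before suspend the port's flags correspond to 0x0507, which formats as "I/O+ Mem+ BusMaster+"; after the broken resume they correspond to 0x0504, i.e. "I/O- Mem- BusMaster+".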

Linus

---
drivers/pci/pcie/portdrv_pci.c | 14 --------------
1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 99a914a..08a8e3c 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -55,16 +55,6 @@ static int pcie_portdrv_suspend(struct pci_dev *dev, pm_message_t state)

}

-static int pcie_portdrv_suspend_late(struct pci_dev *dev, pm_message_t state)
-{
- return pci_save_state(dev);
-}
-
-static int pcie_portdrv_resume_early(struct pci_dev *dev)
-{
- return pci_restore_state(dev);
-}
-
static int pcie_portdrv_resume(struct pci_dev *dev)
{
pcie_portdrv_restore_config(dev);
@@ -72,8 +62,6 @@ static int pcie_portdrv_resume(struct pci_dev *dev)
}
#else
#define pcie_portdrv_suspend NULL
-#define pcie_portdrv_suspend_late NULL
-#define pcie_portdrv_resume_early NULL
#define pcie_portdrv_resume NULL
#endif

@@ -292,8 +280,6 @@ static struct pci_driver pcie_portdriver = {
.remove = pcie_portdrv_remove,

.suspend = pcie_portdrv_suspend,
- .suspend_late = pcie_portdrv_suspend_late,
- .resume_early = pcie_portdrv_resume_early,
.resume = pcie_portdrv_resume,

.err_handler = &pcie_portdrv_err_handler,

2009-01-31 01:42:30

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:

>
> Anyway, they seem to be worth investigating.
>
> Can you attach one of the files? That will make it easier to look at the
> differences.
>

Sure - here is the lspci-pre-suspend file.

Parag


Attachments:
lspci-pre-suspend (58.07 kB)

2009-01-31 01:46:22

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Parag Warudkar wrote:
>
> Excellent! This patch works - tg3 comes back and gets link after resume.

I still think the patch isn't very good. See my previous email.

The fact that your machine works again is good, though. But before we let
this lie, I'd _really_ like to know what was broken in the legacy PM path,
rather than "let's leave it behind". Because a broken legacy path will end
up biting us for other drivers, and I think the new PM path will need more
work before it's ready for prime-time.

> Are the below differences worth worrying about - especially since post
> suspend some DevID/VendorID and some capabilities seem to be changed?

That's not a devid/vendorid, it's an extended "subsystem" capability that
we didn't save. We could try to save/restore all capabilities, but right
now we only do the ones we care about (pcie/pcix/msi, iirc).

I do suspect we might be better off saving everything we can, rather than
deciding piece-meal to save specific capabilities we know about.

Linus

2009-01-31 01:54:55

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Linus Torvalds wrote:

>
>
> On Fri, 30 Jan 2009, Parag Warudkar wrote:
> >
> > Excellent! This patch works - tg3 comes back and gets link after resume.
>
> I still think the patch isn't very good. See my previous email.

Ok - I will run with it temporarily and will do what I can to track down
what was wrong with the legacy PM.

Thanks!
Parag

2009-01-31 02:19:28

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Linus Torvalds wrote:
>
> I still think the patch isn't very good. See my previous email.
>
> The fact that your machine works again is good, though. But before we let
> this lie, I'd _really_ like to know what was broken in the legacy PM path,
> rather than "let's leave it behind". Because a broken legacy path will end
> up biting us for other drivers, and I think the new PM path will need more
> work before it's ready for prime-time.

Ho humm. I have a feeling..

The legacy resume basically ends up doing just

pci_restore_standard_config(dev)

in the resume_early path (in pci_pm_default_resume_noirq, which is shared
with both the legacy and the new PM model).

The _new_ PM layer does that too (it's shared), but then in the regular
resume sequence it _also_ does the pci_pm_reenable_device(), and I think
this is key.

Why?

Look at pci_restore_standard_config(): it restores the PCI config space,
but it does so _before_ actually turning the device into PCI_D0. And
that's not actually guaranteed to work at all - if a device is in D3, you
can still read from config space, but writing to it may or may not
actually work.

This may explain why your PCIE bridge works when moving over to the new
PM: because of the whole pci_pm_reenable_device() thing, we end up
doing the pci_enable_resources() thing later, and now it's in PCI_D0, so
now the device actually reacts to it.

This also explains why we don't care if we save the wrong state or not:
even if we save state with IO/MEM disabled, we'll re-enable it at
->resume() time.

HOWEVER, that's still buggy, since it's potentially too late, since any
interrupts that come in before that will see the device without the IO/MEM
set, leading to the whole "hung interrupt" issue again even if the device
is now on.

So we really do want to restore the state to whatever saves state in the
_early_ resume phase, both for legacy and new PM rules, I think. And we
want to make sure that we restore state while the device is in D0, because
otherwise I really think that it possibly could lose our writes.

(Somebody should check me on that - maybe I remember wrong, and config
space writes are guaranteed to take effect even when something is in
D3cold).
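
To illustrate the ordering argument above, here is a minimal sketch in plain
C. It is not kernel code: struct pm_toy_dev, pm_toy_config_write() and the two
resume helpers are hypothetical names. The rule that a device in D3 silently
drops config writes is an assumption standing in for the "may or may not
actually work" case, not documented behaviour of any particular device.

```c
#include <stdint.h>

/* Hypothetical power states; only the two that matter for the sketch. */
enum pm_toy_power { PM_TOY_D0, PM_TOY_D3 };

struct pm_toy_dev {
	enum pm_toy_power power;
	uint16_t command;	/* live config register */
	uint16_t saved_command;	/* copy taken at suspend time */
};

/* Assumption: config writes are silently dropped unless the device is in D0. */
static void pm_toy_config_write(struct pm_toy_dev *d, uint16_t val)
{
	if (d->power == PM_TOY_D0)
		d->command = val;
}

/* Ordering described as fragile: restore first, then power up. */
static void pm_toy_resume_restore_first(struct pm_toy_dev *d)
{
	pm_toy_config_write(d, d->saved_command);	/* lost while in D3 */
	d->power = PM_TOY_D0;
}

/* Ordering argued for: power up first, then restore. */
static void pm_toy_resume_power_first(struct pm_toy_dev *d)
{
	d->power = PM_TOY_D0;
	pm_toy_config_write(d, d->saved_command);
}
```

Under that assumption, the first helper leaves the command register at its
power-on value even though the device ends up in D0, while the second one
actually gets the saved value back into the register.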

So I think we should:

- make sure that the generic PCI layer saves the right state, for the new
PM model too. Don't save it if the driver already did (because the
driver may have turned off the device after saving the state)

- make sure that the legacy PM layer turns the device on before it
restores the state, to make sure that it "takes".

I dunno. I haven't really walked through all the states, but _something_
like this might do it. Note the change to pci_pm_default_suspend_generic:
we save the state only if the driver didn't do it already (which is also
why I ended up having to move all the "dev->state_saved = false" things
around), and we do not disable the device (because for all we know, the
low-level drivers will still need access to the IO/MEM space even at the
suspend_late stage!).
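
The "save only if the driver didn't" rule can be sketched outside the kernel
as follows. All names here (struct sv_dev, sv_core_suspend() and the two
example driver callbacks) are hypothetical. The sketch only models the
state_saved flag handling described above, not the real PCI core.

```c
#include <stdbool.h>
#include <stdint.h>

struct sv_dev {
	bool state_saved;	/* set by whoever saved the state first */
	uint16_t command;	/* live register */
	uint16_t saved_command;	/* saved copy */
};

static void sv_save_state(struct sv_dev *d)
{
	d->saved_command = d->command;
	d->state_saved = true;
}

typedef void (*sv_suspend_fn)(struct sv_dev *);

/* A driver that saves state itself and then turns the device off. */
static void sv_driver_suspend_saves(struct sv_dev *d)
{
	sv_save_state(d);
	d->command = 0;		/* device disabled after the save */
}

/* A driver that leaves saving to the core. */
static void sv_driver_suspend_plain(struct sv_dev *d)
{
	(void)d;
}

/* Core path: clear the flag, call the driver, save only if still unsaved. */
static void sv_core_suspend(struct sv_dev *d, sv_suspend_fn drv_suspend)
{
	d->state_saved = false;
	if (drv_suspend)
		drv_suspend(d);
	if (!d->state_saved)
		sv_save_state(d);
}
```

With the first callback the saved copy keeps the value from before the driver
disabled the device. Without the flag check the core would overwrite it with
the disabled value.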

Rafael?

Linus
---
drivers/pci/pci-driver.c | 20 +++++++++++++-------
drivers/pci/pci.c | 3 +++
drivers/pci/pcie/portdrv_pci.c | 14 --------------
3 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 9de07b7..5611f22 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -355,8 +355,6 @@ static int pci_legacy_suspend(struct device *dev, pm_message_t state)
int i = 0;

if (drv && drv->suspend) {
- pci_dev->state_saved = false;
-
i = drv->suspend(pci_dev, state);
suspend_report_result(drv->suspend, i);
if (i)
@@ -434,13 +432,18 @@ static int pci_pm_default_resume(struct pci_dev *pci_dev)

static void pci_pm_default_suspend_generic(struct pci_dev *pci_dev)
{
- /* If device is enabled at this point, disable it */
- pci_disable_enabled_device(pci_dev);
/*
- * Save state with interrupts enabled, because in principle the bus the
- * device is on may be put into a low power state after this code runs.
+ * If the driver didn't save state, do it here with interrupts enabled,
+ * because in principle the bus the device is on may be put into a
+ * low power state after this code runs.
*/
- pci_save_state(pci_dev);
+ if (!pci_dev->state_saved)
+ pci_save_state(pci_dev);
+
+#if 0 /* Why? */
+ /* If device is enabled at this point, disable it */
+ pci_disable_enabled_device(pci_dev);
+#endif
}

static void pci_pm_default_suspend(struct pci_dev *pci_dev)
@@ -498,6 +501,7 @@ static int pci_pm_suspend(struct device *dev)
struct device_driver *drv = dev->driver;
int error = 0;

+ pci_dev->state_saved = false;
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend(dev, PMSG_SUSPEND);

@@ -583,6 +587,7 @@ static int pci_pm_freeze(struct device *dev)
struct device_driver *drv = dev->driver;
int error = 0;

+ pci_dev->state_saved = false;
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend(dev, PMSG_FREEZE);

@@ -657,6 +662,7 @@ static int pci_pm_poweroff(struct device *dev)
struct device_driver *drv = dev->driver;
int error = 0;

+ pci_dev->state_saved = false; /* Do we care? */
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend(dev, PMSG_HIBERNATE);

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 17bd932..d16af49 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1421,6 +1421,9 @@ int pci_restore_standard_config(struct pci_dev *dev)

dev->current_state = PCI_D0;

+ /* Restore state _again_, now that the device is actually on */
+ pci_restore_state(dev);
+
return 0;
}

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 99a914a..08a8e3c 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -55,16 +55,6 @@ static int pcie_portdrv_suspend(struct pci_dev *dev, pm_message_t state)

}

-static int pcie_portdrv_suspend_late(struct pci_dev *dev, pm_message_t state)
-{
- return pci_save_state(dev);
-}
-
-static int pcie_portdrv_resume_early(struct pci_dev *dev)
-{
- return pci_restore_state(dev);
-}
-
static int pcie_portdrv_resume(struct pci_dev *dev)
{
pcie_portdrv_restore_config(dev);
@@ -72,8 +62,6 @@ static int pcie_portdrv_resume(struct pci_dev *dev)
}
#else
#define pcie_portdrv_suspend NULL
-#define pcie_portdrv_suspend_late NULL
-#define pcie_portdrv_resume_early NULL
#define pcie_portdrv_resume NULL
#endif

@@ -292,8 +280,6 @@ static struct pci_driver pcie_portdriver = {
.remove = pcie_portdrv_remove,

.suspend = pcie_portdrv_suspend,
- .suspend_late = pcie_portdrv_suspend_late,
- .resume_early = pcie_portdrv_resume_early,
.resume = pcie_portdrv_resume,

.err_handler = &pcie_portdrv_err_handler,

2009-01-31 02:26:19

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Parag Warudkar wrote:
>
> Ok - I will run with it temporarily and will do what I can to track down
> what was wrong with the legacy PM.

If you can try the patch I just sent out, and use that as a base for
trying to track down why the heck the legacy code doesn't work, that would
be great. It might fix it (assuming my guess about "restore_state while in
PCI_D3 doesn't work" was correct), but quite frankly, it's equally
possible that it just makes things worse. But it would be really
interesting to hear..

Your machine does seem to be interesting, in that not only does it have a
PCI-E bridge in it (the eeepc I was playing around with at LCA does not),
but judging by the lost config state I also suspect that it actually loses
power during STR.

Which is not at all necessarily a given - I suspect it depends on just how
the power rails are set up on the motherboard. The fact that PCI-E bridges
have apparently worked for others implies that your problems don't happen
for everybody, and may relate to that issue.

Linus

2009-01-31 02:40:34

by Parag Warudkar

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Fri, 30 Jan 2009, Linus Torvalds wrote:

> If you can try the patch I just sent out, and use that as a base for
> trying to track down why the heck the legacy code doesn't work, that would
> be great. It might fix it (assuming my guess about "restore_state while in
> PCI_D3 doesn't work" was correct), but quite frankly, it's equally
> possible that it just makes things worse. But it would be really
> interesting to hear..

Sure - will do.

>
> Your machine does seem to be interesting, in that not only does it have a
> PCI-E bridge in it (the eeepc I was playing around with at LCA does not),
> but judging by the lost config state I also suspect that it actually loses
> power during STR.

Not sure what the significance of eeepc is in this case - mine being a
standard Intel 5400 chipset I would have thought that's the last
place to look for interesting things!

> Which is not at all necessarily a given - I suspect it depends on just how
> the power rails are set up on the motherboard. The fact that PCI-E bridges
> have apparently worked for others implies that your problems don't happen
> for everybody, and may relate to that issue.

Do we know this for sure that PCI-E bridges + Suspend have worked for
others - In this thread at least I think people reported tg3 worked but not
necessarily with a PCI-E bridge.

Parag

2009-01-31 18:52:25

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Parag Warudkar wrote:
>
> On Fri, 30 Jan 2009, Linus Torvalds wrote:
>
> > If you can try the patch I just sent out, and use that as a base for
> > trying to track down why the heck the legacy code doesn't work, that would
> > be great. It might fix it (assuming my guess about "restore_state while in
> > PCI_D3 doesn't work" was correct), but quite frankly, it's equally
> > possible that it just makes things worse. But it would be really
> > interesting to hear..
>
> Sure - will do.
>
> >
> > Your machine does seem to be interesting, in that not only does it have a
> > PCI-E bridge in it (the eeepc I was playing around with at LCA does not),
> > but judging by the lost config state I also suspect that it actually loses
> > power during STR.
>
> Not sure what the significance of eeepc is in this case - mine being a
> standard Intel 5400 chipset I would have thought that's the last
> place to look for interesting things!
>
> > Which is not at all necessarily a given - I suspect it depends on just how
> > the power rails are set up on the motherboard. The fact that PCI-E bridges
> > have apparently worked for others implies that your problems don't happen
> > for everybody, and may relate to that issue.
>
> Do we know this for sure that PCI-E bridges + Suspend have worked for
> others - In this thread at least I think people reported tg3 worked but not
> necessarily with a PCI-E bridge.

Technically, they are PCIe root ports and many systems have network adapters
connected through them (I have two such systems here, but neither of them has
tg3-compatible hardware - my tg3 is a PCI device).

Thanks,
Rafael

2009-01-31 20:46:19

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Linus Torvalds wrote:
>
> On Fri, 30 Jan 2009, Linus Torvalds wrote:
> >
> > I still think the patch isn't very good. See my previous email.
> >
> > The fact that your machine works again is good, though. But before we let
> > this lie, I'd _really_ like to know what was broken in the legacy PM path,
> > rather than "let's leave it behind". Because a broken legacy path will end
> > up biting us for other drivers, and I think the new PM path will need more
> > work before it's ready for prime-time.
>
> Ho humm. I have a feeling..
>
> The legacy resume basically ends up doing just
>
> pci_restore_standard_config(dev)
>
> in the resume_early path (in pci_pm_default_resume_noirq, which is shared
> with both the legacy and the new PM model).

Well, please remember that the PCIe port driver has ->resume() too, which is
pcie_portdrv_resume(), which calls pcie_portdrv_restore_config() and that
calls pci_enable_device() (not reenable, just plain enable).

However, that sees that the device has already been enabled and doesn't enable
it, although it is supposed to do that.

> The _new_ PM layer does that too (it's shared), but then in the regular
> resume sequence it _also_ does the pci_pm_reenable_device(), and I think
> this is key.

It well may be, but I'm not too sure.

> Why?
>
> Look at pci_restore_standard_config(): it restores the PCI config space,
> but it does so _before_ actually turning the device into PCI_D0. And
> that's not actually guaranteed to work at all - if a device is in D3, you
> can still read from config space, but writing to it may or may not
> actually work.

Yes, I sent a patch for it to Jesse, but he hasn't pushed it yet:
http://git.kernel.org/?p=linux/kernel/git/jbarnes/pci-2.6.git;a=commit;h=48f67f54a53bb68619a63c3f38cf7f502ed74b1d

Parag, would it be possible to test this patch on top of 2.6.29-rc3?

> This may explain why your PCIE bridge works when moving over to the new
> PM: because of the whole pci_pm_reenable_device() thing, we end up
> doing the pci_enable_resources() thing later, and now it's in PCI_D0, so
> now the device actually reacts to it.
>
> This also explains why we don't care if we save the wrong state or not:
> even if we save state with IO/MEM disabled, we'll re-enable it at
> ->resume() time.

Nice theory, but please also look at the original 2.6.28 code in
drivers/pci/pcie/portdrv_pci.c . It does:
(suspend):
- suspend port services
- pci_save_state

(resume):
- pci_restore_state()
- pci_enable_device() -> this doesn't enable the device, because it sees the
non-zero reference count and doesn't do anything.

So, according to your theory, the 2.6.28 code shouldn't work for Parag, but it
does.

Moreover, why oh why would the state we save contain IO/MEM disabled?

We _never_ _ever_ disable them, so WHY?

> HOWEVER, that's still buggy, since it's potentially too late, since any
> interrupts that come in before that will see the device without the IO/MEM
> set, leading to the whole "hung interrupt" issue again even if the device
> is now on.
>
> So we really do want to restore the state to whatever saves state in the
> _early_ resume phase, both for legacy and new PM rules, I think.

We do that, no?

> And we want to make sure that we restore state while the device is in D0,
> because otherwise I really think that it possibly could lose our writes.

That's what the above-mentioned patch fixes.

> (Somebody should check me on that - maybe I remember wrong, and config
> space writes are guaranteed to take effect even when something is in
> D3cold).

Only in D3hot, but MSI-X tables reside in memory space and we need to restore
them too. So, we need to put the device into D0 before restoring the state, if
possible, anyway.

Still, we can't bring any device from D3cold into D0 without ACPI and we can't
use ACPI for that with interrupts off.

> So I think we should:
>
> - make sure that the generic PCI layer saves the right state, for the new
> PM model too.

I thought we did that.

> Don't save it if the driver already did (because the
> driver may have turned off the device after saving the state)

In the new PM model the idea is that drivers won't do that. The core is
supposed to both save the state and turn the device off.

> - make sure that the legacy PM layer turns the device on before it
> restores the state, to make sure that it "takes".

With the above-mentioned patch from Jesse's tree, it should work this way.

Still, as I said before, I don't think this is relevant in this particular
case, because the original 2.6.28 code evidently works for Parag.

> I dunno. I haven't really walked through all the states, but _something_
> like this might do it. Note the change to pci_pm_default_suspend_generic:
> we save the state only if the driver didn't do it already (which is also
> why I ended up having to move all the "dev->state_saved = false" things
> around), and we do not disable the device (because for all we know, the
> low-level drivers will still need access to the IO/MEM space even at the
> suspend_late stage!).
>
> Rafael?

I think something different from what you described is happening here. I'm not
sure what it is, I need more information.

I'm very much interested in whether your patch below makes any difference for
Parag.

Thanks,
Rafael


> ---
> drivers/pci/pci-driver.c | 20 +++++++++++++-------
> drivers/pci/pci.c | 3 +++
> drivers/pci/pcie/portdrv_pci.c | 14 --------------
> 3 files changed, 16 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 9de07b7..5611f22 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -355,8 +355,6 @@ static int pci_legacy_suspend(struct device *dev, pm_message_t state)
> int i = 0;
>
> if (drv && drv->suspend) {
> - pci_dev->state_saved = false;
> -
> i = drv->suspend(pci_dev, state);
> suspend_report_result(drv->suspend, i);
> if (i)
> @@ -434,13 +432,18 @@ static int pci_pm_default_resume(struct pci_dev *pci_dev)
>
> static void pci_pm_default_suspend_generic(struct pci_dev *pci_dev)
> {
> - /* If device is enabled at this point, disable it */
> - pci_disable_enabled_device(pci_dev);
> /*
> - * Save state with interrupts enabled, because in principle the bus the
> - * device is on may be put into a low power state after this code runs.
> + * If the driver didn't save state, do it here with interrupts enabled,
> + * because in principle the bus the device is on may be put into a
> + * low power state after this code runs.
> */
> - pci_save_state(pci_dev);
> + if (!pci_dev->state_saved)
> + pci_save_state(pci_dev);
> +
> +#if 0 /* Why? */
> + /* If device is enabled at this point, disable it */
> + pci_disable_enabled_device(pci_dev);
> +#endif
> }
>
> static void pci_pm_default_suspend(struct pci_dev *pci_dev)
> @@ -498,6 +501,7 @@ static int pci_pm_suspend(struct device *dev)
> struct device_driver *drv = dev->driver;
> int error = 0;
>
> + pci_dev->state_saved = false;
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_suspend(dev, PMSG_SUSPEND);
>
> @@ -583,6 +587,7 @@ static int pci_pm_freeze(struct device *dev)
> struct device_driver *drv = dev->driver;
> int error = 0;
>
> + pci_dev->state_saved = false;
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_suspend(dev, PMSG_FREEZE);
>
> @@ -657,6 +662,7 @@ static int pci_pm_poweroff(struct device *dev)
> struct device_driver *drv = dev->driver;
> int error = 0;
>
> + pci_dev->state_saved = false; /* Do we care? */
> if (pci_has_legacy_pm_support(pci_dev))
> return pci_legacy_suspend(dev, PMSG_HIBERNATE);
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 17bd932..d16af49 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1421,6 +1421,9 @@ int pci_restore_standard_config(struct pci_dev *dev)
>
> dev->current_state = PCI_D0;
>
> + /* Restore state _again_, now that the device is actually on */
> + pci_restore_state(dev);
> +
> return 0;
> }
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 99a914a..08a8e3c 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -55,16 +55,6 @@ static int pcie_portdrv_suspend(struct pci_dev *dev, pm_message_t state)
>
> }
>
> -static int pcie_portdrv_suspend_late(struct pci_dev *dev, pm_message_t state)
> -{
> - return pci_save_state(dev);
> -}
> -
> -static int pcie_portdrv_resume_early(struct pci_dev *dev)
> -{
> - return pci_restore_state(dev);
> -}
> -
> static int pcie_portdrv_resume(struct pci_dev *dev)
> {
> pcie_portdrv_restore_config(dev);
> @@ -72,8 +62,6 @@ static int pcie_portdrv_resume(struct pci_dev *dev)
> }
> #else
> #define pcie_portdrv_suspend NULL
> -#define pcie_portdrv_suspend_late NULL
> -#define pcie_portdrv_resume_early NULL
> #define pcie_portdrv_resume NULL
> #endif
>
> @@ -292,8 +280,6 @@ static struct pci_driver pcie_portdriver = {
> .remove = pcie_portdrv_remove,
>
> .suspend = pcie_portdrv_suspend,
> - .suspend_late = pcie_portdrv_suspend_late,
> - .resume_early = pcie_portdrv_resume_early,
> .resume = pcie_portdrv_resume,
>
> .err_handler = &pcie_portdrv_err_handler,

2009-01-31 21:09:22

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Linus Torvalds wrote:
>
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> >
> > Can you test the patch below, please?
>
> Rafael, you're making this _way_ too difficult.
>
> Don't make it use the new PM infrastructure, because that one is certainly
> broken: pci_pm_default_suspend_generic() is total crap.

Oh my. I beg to differ.

> It's saving the disabled state. No WAY is that correct.

So why does it work, actually?

> That "pci_disable_enabled_device()" should be removed, but even then
> that's wrong, because if the driver suspend disabled it, you're now
> (again) saving the disabled state.
>
> But all of that is only called if you use the new PM infrastructure. So
> the thing is, when you're trying to move the PCI-E driver to the new pm
> infrastructure, you're making things _worse_.

I don't really think so (see below).

> The legacy PM infrastructure at least does the whole
>
> pci_dev->state_saved = false;
> i = drv->suspend(pci_dev, state);
> ..
> if (pci_dev->state_saved)
> goto Fixup;
>
> thing, which will avoid overwriting the state if it was already saved.
>
> HOWEVER. The problem here (I think) is that PCI-E actually does the state
> save late, so it won't ever see the "state_saved" in the early ->suspend.
> I think a patch like the one below at least simplifies this all, and lets
> the PCI layer itself do all the core restore stuff.
>
> The new PM infrastructure gets this totally wrong, and
> (a) disables the device before saving state

pci_disable_device() does really only one thing: it clears the bus master bit.
Yes, it also calls pcibios_disable_device(), but on x86 this is a NOP.

I don't think it is SO bad, is it?

> and
> (b) overwrites the (now corrupted) saved state that the driver perhaps
> already saved, after the driver may even have put it to sleep.

The driver using the new model is not supposed to save the state and power
off the device. Still, it's probably a good idea not to trust the drivers. :-)

> So let's not use the new PM infrastructure - I don't think it's ready yet.
>
> Let's start simplifying first. Start off by getting rid of the
> suspend_late/resume_early, since the PCI layer now does it for us.
>
> I don't see why we don't resume with IO/MEM on, though. The legacy suspend
> sequence shouldn't disable them, afaik.

No, it shouldn't.

However, the patch below actually may help and it really is not too different
from my "new infrastructure" patch. It leaves the pcie_portdrv_restore_config()
in the PCIe port driver's ->resume(), but that shouldn't change things,
pci_enable_device() in there shouldn't do anything and the bus master bit
should already be set.

The "new infrastructure" patch makes pci_disable_enabled_device() be called in
the suspend code path, but that only disables the bus master bit, and
pci_reenable_device() be called in the resume code path, but that only sets the
bus master bit back again. So, they are almost the same and I'd be surprised
if your patch didn't help.

I had that "new infrastructure" patch ready yesterday and I thought it might
help, so I sent it. I was too tired to prepare a new patch and that would
probably look like the one below (I'd remove the pcie_portdrv_restore_config()
from ->resume too, but that's only a detail).

Thanks,
Rafael

> ---
> drivers/pci/pcie/portdrv_pci.c | 14 --------------
> 1 files changed, 0 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 99a914a..08a8e3c 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -55,16 +55,6 @@ static int pcie_portdrv_suspend(struct pci_dev *dev, pm_message_t state)
>
> }
>
> -static int pcie_portdrv_suspend_late(struct pci_dev *dev, pm_message_t state)
> -{
> - return pci_save_state(dev);
> -}
> -
> -static int pcie_portdrv_resume_early(struct pci_dev *dev)
> -{
> - return pci_restore_state(dev);
> -}
> -
> static int pcie_portdrv_resume(struct pci_dev *dev)
> {
> pcie_portdrv_restore_config(dev);
> @@ -72,8 +62,6 @@ static int pcie_portdrv_resume(struct pci_dev *dev)
> }
> #else
> #define pcie_portdrv_suspend NULL
> -#define pcie_portdrv_suspend_late NULL
> -#define pcie_portdrv_resume_early NULL
> #define pcie_portdrv_resume NULL
> #endif
>
> @@ -292,8 +280,6 @@ static struct pci_driver pcie_portdriver = {
> .remove = pcie_portdrv_remove,
>
> .suspend = pcie_portdrv_suspend,
> - .suspend_late = pcie_portdrv_suspend_late,
> - .resume_early = pcie_portdrv_resume_early,
> .resume = pcie_portdrv_resume,
>
> .err_handler = &pcie_portdrv_err_handler,

2009-01-31 21:43:24

by Rafael J. Wysocki

Subject: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)

On Saturday 31 January 2009, Rafael J. Wysocki wrote:
> On Saturday 31 January 2009, Linus Torvalds wrote:
> >
> > On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> > >
> > > Can you test the patch below, please?
> >
> > Rafael, you're making this _way_ too difficult.
> >
> > Don't make it use the new PM infrastructure, because that one is certainly
> > broken: pci_pm_default_suspend_generic() is total crap.
>
> Oh my. I beg to differ.
>
> > It's saving the disabled state. No WAY is that correct.
>
> So why does it work, actually?
>
> > That "pci_disable_enabled_device()" should be removed, but even then
> > that's wrong, because if the driver suspend disabled it, you're now
> > (again) saving the disabled state.
> >
> > But all of that is only called if you use the new PM infrastructure. So
> > the thing is, when you're trying to move the PCI-E driver to the new pm
> > infrastructure, you're making things _worse_.
>
> I don't really think so (see below).
>
> > The legacy PM infrastructure at least does the whole
> >
> > pci_dev->state_saved = false;
> > i = drv->suspend(pci_dev, state);
> > ..
> > if (pci_dev->state_saved)
> > goto Fixup;
> >
> > thing, which will avoid overwriting the state if it was already saved.
> >
> > HOWEVER. The problem here (I think) is that PCI-E actually does the state
> > save late, so it won't ever see the "state_saved" in the early ->suspend.
> > I think a patch like the one below at least simplifies this all, and lets
> > the PCI layer itself do all the core restore stuff.
> >
> > The new PM infrastructure gets this totally wrong, and
> > (a) disables the device before saving state
>
> pci_disable_device() does really only one thing: it clears the bus master bit.
> Yes, it also calls pcibios_disable_device(), but on x86 this is a NOP.
>
> I don't think it is SO bad, is it?
>
> > and
> > (b) overwrites the (now corrupted) saved state that the driver perhaps
> > already saved, after the driver may even have put it to sleep.
>
> The driver using the new model is not supposed to save the state and power
> off the device. Still, it's probably a good idea not to trust the drivers. :-)
>
> > So let's not use the new PM infrastructure - I don't think it's ready yet.
> >
> > Let's start simplifying first. Start off by getting rid of the
> > suspend_late/resume_early, since the PCI layer now does it for us.
> >
> > I don't see why we don't resume with IO/MEM on, though. The legacy suspend
> > sequence shouldn't disable them, afaik.
>
> No, it shouldn't.
>
> However, the patch below actually may help and it really is not too different
> from my "new infrastructure" patch. It leaves the pcie_portdrv_restore_config()
> in the PCIe port driver's ->resume(), but that shouldn't change things,
> pci_enable_device() in there shouldn't do anything and the bus master bit
> should already be set.
>
> The "new infrastructure" patch makes pci_disable_enabled_device() be called in
> the suspend code path, but that only disables the bus master bit, and
> pci_reenable_device() be called in the resume code path, but that only sets the
> bus master bit back again.

I should have said "in this particular case", because it actually makes a
difference for devices using interrupt pins. Namely, pcibios_enable_device()
additionally enables PCI resources (that may make a difference, but everything
should have been restored already) and sets up an interrupt link for the
device. If the link has been set up already, which often is the case, it
increases a reference count and exits.

Now, this reference count is not used for anything, but it was supposed to be
used for disabling the interrupt links that are no longer needed. This
mechanism is currently disabled, but if we enable it at one point (which may
be necessary to fix suspend-resume on some boxes), it won't work if
the "enable" calls are not balanced with "disable" ones. IOW, for every
pcibios_enable_device() call there should be a complementary
pcibios_disable_device() call. That's why I decided to put the
disable/enable things into the PCI core's suspend/resume code paths (also,
because pci_reenable_device() was already there for driverless devices).
This might not be the right choice, but I don't really think it does break
things.
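
The balancing requirement can be pictured with a small stand-alone C model.
The names (struct rc_dev, rc_enable(), rc_disable()) are made up for the
example, and the counter is a plain int rather than whatever the kernel really
uses. It only shows why an extra "enable" without a matching "disable" keeps
the resource pinned.

```c
struct rc_dev {
	int enable_cnt;	/* outstanding enable calls */
	int hw_enabled;	/* 1 while the modelled hardware is enabled */
};

/* Only the first enable touches the hardware; later ones just count. */
static void rc_enable(struct rc_dev *d)
{
	if (d->enable_cnt++ > 0)
		return;
	d->hw_enabled = 1;
}

/* Only the disable that drops the count to zero touches the hardware. */
static void rc_disable(struct rc_dev *d)
{
	if (d->enable_cnt == 0)
		return;		/* unbalanced disable: ignore */
	if (--d->enable_cnt > 0)
		return;
	d->hw_enabled = 0;
}
```

If resume calls rc_enable() again without suspend having called rc_disable(),
the count ends up at 2 and a single later rc_disable() no longer releases the
resource. That is the imbalance the paragraph above is worried about.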

Anyway, from what I can tell reading your messages in this thread so far,
you seem to want the PCI core to:
(1) save the state of devices during suspend (avoid doing that if the driver has
already saved the state),
(2) put devices into D0 during resume (early),
(3) restore the state of devices during resume (early).
Still, you don't want the core to disable devices during suspend and to enable
(or reenable) them during resume.

What about putting devices into low power states? [Note that ACPI may be
necessary for this purpose.]

What about devices with no drivers and/or without suspend/resume support?
Do you want them to be disabled during suspend and enabled during resume by
the core? [I guess you do, they have no interrupt handlers that may break
after all.]

Thanks,
Rafael

2009-01-31 21:47:51

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> >
> > Don't make it use the new PM infrastructure, because that one is certainly
> > broken: pci_pm_default_suspend_generic() is total crap.
>
> Oh my. I beg to differ.

Imagine that you're a bridge. Imagine what this does to any device
_behind_ you. Any device that plans to still do something at suspend_late
time, to be specific.

> > It's saving the disabled state. No WAY is that correct.
>
> So why does it work, actually?

And I suspect it simply doesn't. And since almost nobody uses the new PM
state, you just don't see it.

But as to why it fixes Parag's case - I think that's because the new PM
resume does more than the legacy resume does, so it ends up re-enabling
things anyway. It does it too late, but it doesn't matter in this case (no
shared irq issues with the only device behind the pci-e bridge).

> > But all of that is only called if you use the new PM infrastructure. So
> > the thing is, when you're trying to move the PCI-E driver to the new pm
> > infrastructure, you're making things _worse_.
>
> I don't really think so (see below).

See above. I think you really haven't thought the new PM code through.

> pci_disable_device() does really only one thing: it clears the bus master bit.
> Yes, it also calls pcibios_disable_device(), but on x86 this is a NOP.
>
> I don't think it is SO bad, is it?

It's bad. It means that DMA won't work across such a bridge. Yes, it is
probably bridge-dependent, and I know for a fact that at least some Intel
bridges just hard-code the busmaster bit to 1 (at a minimum the host
bridges do, I'm not sure about others), but I also know for a fact that
some other bridges _will_ stop DMA to devices behind them if the BM bit is
clear.

But more importantly: Why do you do it? What's the upside? I don't see it.
There's a known downside - you're saving state that is something else than
what the driver really expects.

So I think clearing bus-master is a huge bug on a bridge, but I think that
on normal devices it's just pointless.

> The driver using the new model is not supposed to save the state and power
> off the device. Still, it's probably a good idea not to trust the drivers. :-)

How about devices that have magic power-down sequences? For example, a
quick grep shows that USB on a PPC PMAC has a special "disable ASIC clocks
for USB" thing after it puts the USB controller to sleep.

That was literally the _first_ driver I looked at. Admittedly because I
knew that USB host controllers tend to be more aware of all the issues
than most random drivers, but still...

I agree that the new model should turn off devices by default, but the
thing is, it should also allow drivers that really know better to do magic
things.

Of course, we can say that such devices should just continue to use the
legacy model, but I thought that the long-term plan was to just replace
it. And if you do, you need to allow for drivers that do special things
due to known motherboard- or chip-specific "issues" (aka "PCI extensions"
aka "hardware bugs").

And yes, I suspect that the magic PPC USB clock thing could maybe be
rewritten as a system device, so there are other alternatives here, but I
do suspect that it will be very painful if the new PM layer _forces_ a
very specific model on drivers that they can't modify at all.

> > I don't see why we don't resume with IO/MEM on, though. The legacy suspend
> > sequence shouldn't disable them, afaik.
>
> No, it shouldn't.
>
> However, the patch below actually may help and it really is not too different
> from my "new infrastructure" patch. It leaves the pcie_portdrv_restore_config()
> in the PCIe port driver's ->resume(), but that shouldn't change things,
> pci_enable_device() in there shouldn't do anything and the bus master bit
> should already be set.

I absolutely agree with this patch. I'd just not expect it to make a
difference (except for the "cleanup factor"). I think it's worth applying,
and _if_ it makes a difference for Parag it's very interesting indeed.

> The "new infrastructure" patch makes pci_disable_enabled_device() be called in
> the suspend code path, but that only disables the bus master bit, and
> pci_reenable_device() be called in the resume code path, but that only sets the
> bus master bit back again. So, they are almost the same and I'd be surprised
> if your patch didn't help.

Hmm. We'll see. I'm a bit doubtful. But we'll see..

Linus

2009-01-31 22:00:27

by Linus Torvalds

Subject: Re: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)



On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
>
> Anyway, from what I can tell reading your messages in this thread so far,
> you seem to want the PCI core to:
> (1) save the state of devices during suspend (avoid doing that if the driver has
> already saved the state),

Yes, but I'd at least want the device drivers to have the _option_ to
do it themselves.

I would expect that by default any normal PCI driver would just rely on
the PCI layer doing it all for them. But I suspect we may have cases where
the chip driver will simply want to override things, for one reason or
another.

We do know that some devices seem to be very picky and get unhappy about
being put to sleep (we don't put devices into D3 by default in the legacy
PM case for a reason!), and we do know that some existing drivers do extra
things _after_ they've put the device to D3.

So there very much are arguments for drivers wanting to do their own "save
state and power off" if they have special needs.

(Side note: it's entirely possible that one of the reasons we don't put
devices into D3 in the legacy code-path is purely historical: maybe not
because the devices were unhappy, but simply because it triggered the
whole "interrupt at an unlucky place" thing. So I'm hoping that we'll
actually not have this as a real issue, but..)

> (2) put devices into D0 during resume (early),
> (3) restore the state of devices during resume (early).

Yes.

> Still, you don't want the core to disable devices during suspend and to enable
> (or reenable) them during resume.

At an absolute _minimum_, bridges are special.

We already know bridges are special: we do things like

	if (!pci_is_bridge(pci_dev))
		pci_prepare_to_sleep(pci_dev);


ie we don't actually put the bridges into D3 sleep, because the devices
behind the bridge still need to be available until at LEAST the
"suspend_late()" stage.

But then pci_pm_default_suspend_generic() does that
pci_disable_enabled_device() unconditionally - even for bridges. That's
just wrong, wrong, wrong.
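
One way to read the point above is that the same bridge check should also guard the "disable" step. Below is a minimal sketch of that idea in plain C. The device model and function name are hypothetical; it is not the kernel's struct pci_dev or its actual suspend code, and whether non-bridge devices should be disabled at all is a separate question in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified device model; not the kernel's struct pci_dev. */
struct model_dev {
	bool is_bridge;		/* true for bridges */
	bool bus_master;	/* models the bus master bit */
	bool in_d3;		/* models "device has been put into D3" */
};

/* Model of a default suspend step that leaves bridges untouched. */
static void model_default_suspend(struct model_dev *dev)
{
	/* Devices behind a bridge may still need it until their late suspend. */
	if (dev->is_bridge)
		return;

	dev->bus_master = false;	/* models the "disable device" step */
	dev->in_d3 = true;		/* models the "prepare to sleep" step */
}
```

In this model a bridge keeps forwarding DMA for the devices behind it, while an ordinary device is quiesced by the default path.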

> What about putting devices into low power states? [Note that ACPI may be
> necessary for this purpose.]

I do think we should do it, although I'd at least personally prefer
delaying it to the suspend_late (noirq) phase.

Why? Think about a shared interrupt again - but now coming in at just the
wrong time during _suspend_. The PCI layer has turned off the device.
Oops. Lockup. The same lock-up we worked so hard to avoid during resume.

> What about devices with no drivers and/or without suspend/resume support?

Oh, suspending those early and aggressively may be the right thing. But
again, at least bridges are special.

Some bridges have drivers (pci-e and cardbus bridges at least), others
don't (regular pci bridges). So bridges hit both the "has drivers" and
"don't have drivers" case, and are special in both cases.

Linus

2009-01-31 22:47:46

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Saturday 31 January 2009, Linus Torvalds wrote:
>
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> > >
> > > Don't make it use the new PM infrastructure, because that one is certainly
> > > broken: pci_pm_default_suspend_generic() is total crap.
> >
> > Oh my. I beg to differ.
>
> Imagine that you're a bridge. Imagine what this does to any device
> _behind_ you. Any device that plans to still do something at suspend_late
> time, to be specific.
>
> > > It's saving the disabled state. No WAY is that correct.
> >
> > So why does it work, actually?
>
> And I suspect it simply doesn't. And since almost nobody uses the new PM
> state, you just don't see it.

But many drivers have an analogous code sequence in their PM callbacks and
I've tested it with several drivers on my test boxes. It's never failed for me.

> But as to why it fixes Parag's case - I think that's because the new PM
> resume does more than the legacy resume does, so it ends up re-enabling
> things anyway. It does it too late, but it doesn't matter in this case (no
> shared irq issues with the only device behind the pci-e bridge).

Still, the 2.6.28 resume didn't do the "reenable device" thing and it worked.

I think in Parag's case the problem is the "double restore".

> > > But all of that is only called if you use the new PM infrastructure. So
> > > the thing is, when you're trying to move the PCI-E driver to the new pm
> > > infrastructure, you're making things _worse_.
> >
> > I don't really think so (see below).
>
> See above. I think you really haven't thought the new PM code through.

Yes, I have, but my experience apparently doesn't match yours.

> > pci_disable_device() does really only one thing: it clears the bus master bit.
> > Yes, it also calls pcibios_disable_device(), but on x86 this is a NOP.
> >
> > I don't think it is SO bad, is it?
>
> It's bad. It means that DMA won't work across such a bridge. Yes, it is
> probably bridge-dependent, and I know for a fact that at least some Intel
> bridges just hard-code the busmaster bit to 1 (at a minimum the host
> bridges do, I'm not sure about others), but I also know for a fact that
> some other bridges _will_ stop DMA to devices behind them if the BM bit is
> clear.

DMA will only not work until the ->resume sets the bus master bit, which
happens before the ->resume of any device behind the bridge runs. There is
only a small window where something (theoretically) may go wrong and I really
don't expect any driver to start DMA from its ->resume_early or an interrupt
handler.

> But more importantly: Why do you do it? What's the upside? I don't see it.

OK, point taken.

> There's a known downside - you're saving state that is something else than
> what the driver really expects.
>
> So I think clearing bus-master is a huge bug on a bridge, but I think that
> on normal devices it's just pointless.
>
> > The driver using the new model is not supposed to save the state and power
> > off the device. Still, it's probably a good idea not to trust the drivers. :-)
>
> How about devices that have magic power-down sequences? For example, a
> quick grep shows that USB on a PPC PMAC has a special "disable ASIC clocks
> for USB" thing after it puts the USB controller to sleep.

This is exceptional, from what I can tell.

> That was literally the _first_ driver I looked at. Admittedly because I
> knew that USB host controllers tend to be more aware of all the issues
> than most random drivers, but still...
>
> I agree that the new model should turn off devices by default, but the
> thing is, it should also allow drivers that really know better to do magic
> things.
>
> Of course, we can say that such devices should just continue to use the
> legacy model, but I thought that the long-term plan was to just replace
> it. And if you do, you need to allow for drivers that do special things
> due to known motherboard- or chip-specific "issues" (aka "PCI extensions"
> aka "hardware bugs").

We may need an "override default resume" flag for such drivers.

> And yes, I suspect that the magic PPC USB clock thing could maybe be
> rewritten as a system device, so there are other alternatives here, but I
> do suspect that it will be very painful if the new PM layer _forces_ a
> very specific model on drivers that they can't modify at all.
>
> > > I don't see why we don't resume with IO/MEM on, though. The legacy suspend
> > > sequence shouldn't disable them, afaik.
> >
> > No, it shouldn't.
> >
> > However, the patch below actually may help and it really is not too different
> > from my "new infrastructure" patch. It leaves the pcie_portdrv_restore_config()
> > in the PCIe port driver's ->resume(), but that shouldn't change things,
> > pci_enable_device() in there shouldn't do anything and the bus master bit
> > should already be set.
>
> I absolutely agree with this patch.

Well, it's your patch after all, isn't it? ;-)

Rafael

2009-01-31 23:02:18

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
>
> But many drivers have an analogous code sequence in their PM callbacks and
> I've tested it with several drivers on my test boxes. It's never failed for me.

Rafael!

Read what I write. Twice.

Here it is again: "Imagine that you're a bridge."

Stop the idiocy of just ignoring what I write, and talking about something
else.

Bridges are special.

> > But as to why it fixes Parag's case - I think that's because the new PM
> > resume does more than the legacy resume does, so it ends up re-enabling
> > things anyway. It does it too late, but it doesn't matter in this case (no
> > shared irq issues with the only device behind the pci-e bridge).
>
> Still, the 2.6.28 resume didn't do the "reenable device" thing and it worked.
>
> I think in Parag's case the problem is the "double restore".

It's possible. Still, it really shouldn't matter.

Or at least, it shouldn't matter, as long as what you restore is sane.
You're just going to rewrite the same data, after all.

That's why I was trying to see if the IO/MEM bits got cleared in the save
image for some reason.

> > See above. I think you really haven't thought the new PM code through.
>
> Yes, I have, but my experience apparently doesn't match yours.

The problem is that you're looking at some individual devices, and saying
"it works for them, so it must work for everybody". Add to that the fact
that you apparently _still_ haven't figured out the difference between
bridges and regular devices, and that most motherboards probably keep
the PCI bridges powered anyway, and...

And yes, I realize that this is how PM under Linux has worked for a long
time. But it's what I think we should get away from. It's why I pushed so
hard to get the whole interrupt handling sane and stable.

The argument that "it works for a lot of machines" IS NOT AN ARGUMENT,
Rafael! Stop using it as such.

We know that 2.6.28 suspend/resume works for a lot of laptops. Even
possibly _most_ laptops. But it was still broken. We want to get _away_
from that.

> DMA will only not work until the ->resume sets the bus master bit, which
> happens before the ->resume of any device behind the bridge runs.

Read my emails. THIS ISN'T EVEN A RESUME-TIME PROBLEM!

The problems happen on purely the suspend path. How the f*ck do you know
that the drivers behind the bridge don't do everything at 'suspend_late'
time, and expect to be working up until that time?

Here's a big hint: YOU DO NOT KNOW. YOU MUST NOT TURN OFF THE BRIDGE AT
SUSPEND TIME!

I'm getting really fed up with you here. You're not even listening. And
you are _definitely_ not doing any "deep thinking" here.

> > How about devices that have magic power-down sequences? For example, a
> > quick grep shows that USB on a PPC PMAC has a special "disable ASIC clocks
> > for USB" thing after it puts the USB controller to sleep.
>
> This is exceptional, from what I can tell.

So?

Irrelevant. We want to handle the exceptional case too. And we generally
want to handle them _automatically_, rather than by:

> We may need an "override default resume" flag for such drivers.

.. why? Wouldn't it be a hell of a lot nicer if the PCI layer just did
things right automatically?

Which the legacy layer already does. It sees "ok, the driver did its own
pci_save_state(), I'm not going to do it for it".

THAT is robust. And simple. Wouldn't you agree?

So why not do the same in the new one? Why do you want to make the new
interfaces _inferior_ to the old ones?
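
The legacy behaviour described above ("the driver saved its own state, so the core does not") can be sketched as follows. This is a simplified model in plain C: all names are hypothetical, and a single integer stands in for the saved config space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; one register stands in for config space. */
struct model_dev {
	unsigned int cmd;	/* models the live command register */
	unsigned int saved_cmd;	/* models the saved config image */
	bool state_saved;	/* set whenever the state has been saved */
};

typedef void (*model_suspend_fn)(struct model_dev *dev);

static void model_save_state(struct model_dev *dev)
{
	dev->saved_cmd = dev->cmd;
	dev->state_saved = true;
}

/* Core suspend: save the state only if the driver has not done so. */
static void model_core_suspend(struct model_dev *dev, model_suspend_fn drv_suspend)
{
	dev->state_saved = false;
	if (drv_suspend != NULL)
		drv_suspend(dev);
	if (!dev->state_saved)
		model_save_state(dev);
}

/* Example driver callback: saves state itself, then shuts the device down. */
static void model_picky_driver_suspend(struct model_dev *dev)
{
	model_save_state(dev);
	dev->cmd = 0;	/* device-specific shutdown after the save */
}
```

Because the core checks the flag rather than saving unconditionally, the image taken by the driver before its own shutdown sequence is not overwritten with the already-disabled state.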

Linus

2009-01-31 23:09:25

by Rafael J. Wysocki

Subject: Re: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)

On Saturday 31 January 2009, Linus Torvalds wrote:
>
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> >
> > Anyway, from what I can tell reading your messages in this thread so far,
> > you seem to want the PCI core to:
> > (1) save the state of devices during suspend (avoid doing that if the driver has
> > already saved the state),
>
> Yes, but I'd at least want the device drivers to have the _option_ to
> do it themselves.

(As I wrote in the other message just sent) I wonder if it's a good idea to
introduce an "override default PCI suspend/resume" flag for this purpose.

The drivers that want to handle the PCI stuff themselves would be able to use
this flag to tell the core not to try to power manage the device etc.

Alternatively, the core may check the state_saved bit and look at current_state
to see if the device need not be put into a low power state. Still, putting
the device into a low power state need not be desirable anyway.

> I would expect that by default any normal PCI driver would just rely on
> the PCI layer doing it all for them. But I suspect we may have cases where
> the chip driver will simply want to override things, for one reason or
> another.
>
> We do know that some devices seem to be very picky and get unhappy about
> being put to sleep (we don't put devices into D3 by default in the legacy
> PM case for a reason!), and we do know that some existing drivers do extra
> things _after_ they've put the device to D3.
>
> So there very much are arguments for drivers wanting to do their own "save
> state and power off" if they have special needs.
>
> (Side note: it's entirely possible that one of the reasons we don't put
> devices into D3 in the legacy code-path is purely historical: maybe not
> because the devices were unhappy, but simply because it triggered the
> whole "interrupt at an unlucky place" thing. So I'm hoping that we'll
> actually not have this as a real issue, but..)

It actually has been confirmed recently that this is a real issue. Sadly.

> > (2) put devices into D0 during resume (early),
> > (3) restore the state of devices during resume (early).
>
> Yes.
>
> > Still, you don't want the core to disable devices during suspend and to enable
> > (or reenable) them during resume.
>
> At an absolute _minimum_, bridges are special.
>
> We already know bridges are special: we do things like
>
>	if (!pci_is_bridge(pci_dev))
>		pci_prepare_to_sleep(pci_dev);
>
>
> ie we don't actually put the bridges into D3 sleep, because the devices
> behind the bridge still need to be available until at LEAST the
> "suspend_late()" stage.
>
> But then pci_pm_default_suspend_generic() does that
> pci_disable_enabled_device() unconditionally - even for bridges. That's
> just wrong, wrong, wrong.

OK, I see your point.

> > What about putting devices into low power states? [Note that ACPI may be
> > necessary for this purpose.]
>
> I do think we should do it, although I'd at least personally prefer
> delaying it to the suspend_late (noirq) phase.
>
> Why? Think about a shared interrupt again - but now coming in at just the
> wrong time during _suspend_. The PCI layer has turned off the device.
> Oops. Lockup. The same lock-up we worked so hard to avoid during resume.

I know. Still, all of the drivers that implement suspend-resume put the
devices into low power states in ->suspend and it's never been observed to
be a source of problems.

Unfortunately, on many systems we are supposed to use ACPI for putting PCI
devices into low power states and right now we can't do that with interrupts
off due to the limitations of our AML interpreter.

The same applies to resume, BTW, but resume is easier, because devices tend
to already be in D0 from the start. Admittedly, though, we may have a problem
with devices that are not in D0 at that point and _require_ ACPI to power them
up (ie. don't support the native PCI PM). Not that I know of any, but still.

ISTR that some devices will not wake up the system if they are not put into
the low power state by ACPI during suspend.

So, I wonder. Do we need to make the AML interpreter allow us to run code
with interrupts off?

> > What about devices with no drivers and/or without suspend/resume support?
>
> Oh, suspending those early and aggressively may be the right thing. But
> again, at least bridges are special.
>
> Some bridges have drivers (pci-e and cardbus bridges at least), others
> don't (regular pci bridges). So bridges hit both the "has drivers" and
> "don't have drivers" case, and are special in both cases.

Yes, the bridges with drivers are somewhat special.

Thanks,
Rafael

2009-01-31 23:27:58

by Linus Torvalds

Subject: Re: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)



On Sun, 1 Feb 2009, Rafael J. Wysocki wrote:
>
> Alternatively, the core may check the state_saved bit and look at current_state
> to see if the device need not be put into a low power state. Still, putting
> the device into a low power state need not be desirable anyway.

I'd be surprised if we didn't have devices that cannot act as wakeup
devices in D3, for example. Yes, I'm sure it _should_ work, and I realize
that the whole "platform_pci_choose_state()" is supposed to do this all
right, but let me just take a wild stab at guessing that it's not always
going to work.

So I would not be surprised if some devices will want to not be put in D3.

There's also the issue of debugging: we may well want to simply skip
suspending the console device (and all bridges leading up to it). We don't
do that right now, and we basically depend on "no_suspend_console" just
working _despite_ that (because our legacy PCI power management never put
things into sleep states), but that's another example of something where a
driver might simply decide that it doesn't want the default PCI layer
decisions: not because the device cannot do it, but because _we_ don't
want it to do it.

> > Why? Think about a shared interrupt again - but now coming in at just the
> > wrong time during _suspend_. The PCI layer has turned off the device.
> > Oops. Lockup. The same lock-up we worked so hard to avoid during resume.
>
> I know. Still, all of the drivers that implement suspend-resume put the
> devices into low power states in ->suspend and it's never been observed to
> be a source of problems.

I do agree that problems at suspend time are probably somewhat less likely
than at resume time. The devices are mostly in known states (rather than
whatever random state they come up in and the BIOS early programming may
do), and hopefully quiescent (because we told user space to freeze).

But I wouldn't bet on it in general. Think network cards on a shared
interrupt, where the network card gets suspended _after_ the device that
it shares interrupts with. Is the network quiescent? On a lot of desktop
machines it probably is.

But how many people test STR while doing a "ping -f" from another machine?

It _should_ work. Do you guarantee that it does?

I think we should aim for "yes, we do guarantee that it does".

Linus

2009-01-31 23:40:08

by Linus Torvalds

Subject: Re: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)



On Sat, 31 Jan 2009, Linus Torvalds wrote:
>
> But how many people test STR while doing a "ping -f" from another machine?
>
> It _should_ work. Do you guarantee that it does?

Btw, this really only is interesting if there's a shared interrupt.

I'm sure that there are network drivers that will crash even on their own
with _just_ the right timing (imagine having a delayed interrupt pending,
then doing the "pci_set_power_state(PCI_D3hot)" thing, and then get the
interrupt handler invoked on another CPU _just_ afterwards), but it's
probably really hard to trigger, and a bug in that specific driver anyway.

But what's much more interesting (and not necessarily a driver bug, but a
general PM infrastructure problem) is if we have that shared interrupt
case, and the network driver gets lots of interrupts just as "driver X" is
shutting down with that interrupt shared. Then, "driver X" will get
interrupts after the PM layer has put its device to sleep, and now "driver
X" is quite understandably confused - it didn't even do the "put to sleep"
itself, but now its device is no longer responding.

And now it's not a really unlikely race condition any more.
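
A toy model of that shared-interrupt scenario, in plain C, may make the failure mode concrete. Everything here is hypothetical: the device model, the handler names, and the assumption that a read from a device that no longer responds returns all ones (typical on PCI, but platform-dependent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of "driver X"'s device on a shared interrupt line. */
struct model_dev {
	bool powered;		/* false once the device has been put to sleep */
	uint32_t irq_status;	/* models a pending-interrupt register */
};

/* Assumption: a device that is asleep does not answer, so reads give all ones. */
static uint32_t model_read_irq_status(const struct model_dev *dev)
{
	return dev->powered ? dev->irq_status : UINT32_MAX;
}

/* Naive handler: claims the interrupt if any status bit appears set. */
static bool model_driver_x_irq(const struct model_dev *dev)
{
	return model_read_irq_status(dev) != 0;
}

/* Defensive variant: treat all ones as "device is gone, not my interrupt". */
static bool model_driver_x_irq_checked(const struct model_dev *dev)
{
	uint32_t status = model_read_irq_status(dev);

	if (status == UINT32_MAX)
		return false;
	return status != 0;
}
```

When the other device on the line raises an interrupt after "driver X"'s device has been put to sleep, the naive handler sees every status bit set and tries to service a device that is not there; the checked variant only papers over the problem in this model, which is why keeping the device awake until interrupts are off is the more general fix argued for above.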

Linus

2009-02-01 00:12:21

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Sunday 01 February 2009, Linus Torvalds wrote:
>
> On Sat, 31 Jan 2009, Rafael J. Wysocki wrote:
> >
> > But many drivers have an analogous code sequence in their PM callbacks and
> > I've tested it with several drivers on my test boxes. It's never failed for me.
>
> Rafael!
>
> Read what I write. Twice.

You didn't give me a chance by removing it. ;-)

> Here it is again: "Imagine that you're a bridge."
>
> Stop the idiocy of just ignoring what I write, and talking about something
> else.
>
> Bridges are special.

[Confused] Yes, they are.

And yes, I agreed that the bus master bit shouldn't be cleared for them. Which
does not necessarily imply that clearing that bit will always lead to problems
in practice.

> > > But as to why it fixes Parag's case - I think that's because the new PM
> > > resume does more than the legacy resume does, so it ends up re-enabling
> > > things anyway. It does it too late, but it doesn't matter in this case (no
> > > shared irq issues with the only device behind the pci-e bridge).
> >
> > Still, the 2.6.28 resume didn't do the "reenable device" thing and it worked.
> >
> > I think in Parag's case the problem is the "double restore".
>
> It's possible. Still, it really shouldn't matter.
>
> Or at least, it shouldn't matter, as long as what you restore is sane.
> You're just going to rewrite the same data, after all.

Yes, but I suspect the device is misbehaving due to some timing issues.

With devices that behave totally correctly the current code doesn't appear to
cause problems.

> That's why I was trying to see if the IO/MEM bits got cleared in the save
> image for some reason.

Well, we never clear them.

> > > See above. I think you really haven't thought the new PM code through.
> >
> > Yes, I have, but my experience apparently doesn't match yours.
>
> The problem is that you're looking at some individual devices, and saying
> "it works for them, so it must work for everybody".

Look. I have only a limited set of data from devices that have been tested.
On the other hand I have some pieces of documentation that are sometimes not
very clear. I try to use the data and the docs I have to figure out how things
actually work.

That's why cases like Parag's are so important. They allow me to
verify assumptions.

So no, I don't say "it must work for everybody", but I have to assume
_something_ if I don't know that _for_ _sure_. I sometimes make wrong
assumptions, which is a consequence of the fact that I can only use a limited
set of data to verify my observations.

And please note, we still don't know for sure why Parag's box actually
fails. We have some theories that may or may not be correct, but that's it.

> Add to that the fact that you apparently _still_ haven't figured out the
> difference between bridges and regular devices, and that most
> motherboards probably keep the PCI bridges powered anyway, and...

[Confused again] Why are you talking about keeping bridges powered?

> And yes, I realize that this is how PM under Linux has worked for a long
> time. But it's what I think we should get away from. It's why I pushed so
> hard to get the whole interrupt handling sane and stable.
>
> The argument that "it works for a lot of machines" IS NOT AN ARGUMENT,
> Rafael! Stop using it as such.
>
> We know that 2.6.28 suspend/resume works for a lot of laptops. Even
> possibly _most_ laptops. But it was still broken. We want to get _away_
> from that.

Working on many (different) systems is a good indication of what is and what
is not going to work in general. It obviously doesn't imply correctness,
though.

> > DMA will only not work until the ->resume sets the bus master bit, which
> > happens before the ->resume of any device behind the bridge runs.
>
> Read my emails. THIS ISN'T EVEN A RESUME-TIME PROBLEM!
>
> The problems happen on purely the suspend path. How the f*ck do you know
> that the drivers behind the bridge don't do everything at 'suspend_late'
> time, and expect to be working up until that time?

DMA from suspend_late? Come on.

> Here's a big hint: YOU DO NOT KNOW. YOU MUST NOT TURN OFF THE BRIDGE AT
> SUSPEND TIME!
>
> I'm getting really fed up with you here. You're not even listening. And
> you are _definitely_ not doing any "deep thinking" here.

Why are you shouting at me? Did you read my reply to your other message?
How many times do I have to tell you I AGREE THAT I SHOULD NOT TURN OFF THE
BUS MASTER BIT FOR BRIDGES so that you actually get it?

> > > How about devices that have magic power-down sequences? For example, a
> > > quick grep shows that USB on a PPC PMAC has a special "disable ASIC clocks
> > > for USB" thing after it puts the USB controller to sleep.
> >
> > This is exceptional, from what I can tell.
>
> So?
>
> Irrelevant. We want to handle the exceptional case too. And we generally
> want to handle them _automatically_, rather than by:
>
> > We may need an "override default resume" flag for such drivers.
>
> .. why? Wouldn't it be a hell of a lot nicer if the PCI layer just did
> things right automatically?
>
> Which the legacy layer already does. It sees "ok, the driver did its own
> pci_save_state(), I'm not going to do it for it".
>
> THAT is robust. And simple. Wouldn't you agree?
>
> So why not do the same in the new one? Why do you want to make the new
> interfaces _inferior_ to the old ones?

On suspend we have two things to do:
- save the state
- put the device into a low power state

The state_saved flag can be used to see if the driver has saved the state.

What about powering off? Are we going to assume that if the driver has saved
the state of the device, the core should not put the device into a low power
state?

Rafael

2009-02-01 00:32:49

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Sun, 1 Feb 2009, Rafael J. Wysocki wrote:
> > The problems happen on purely the suspend path. How the f*ck do you know
> > that the drivers behind the bridge don't do everything at 'suspend_late'
> > time, and expect to be working up until that time?
>
> DMA from suspend_late? Come on.

Rafael. Stop being a total idiot.

Read what I wrote.

I'm saying that the driver may not do anything at all at suspend() time,
and leaves everything until suspend_late. Then, at suspend_late(), it
finally really shuts down.

That's actually a very reasonable thing to do in some circumstances. It
simplifies everything, in particular all interrupt handling, since the
device is now fully live all the way while interrupts can happen.

For a USB host controller, for example, it really could make sense to do
that - just leave all the core host controller stuff running, and the only
thing the "suspend()" callback does is to send the commands to the actual
devices, it doesn't necessarily touch the host controller itself at all.

Then, at suspend_late time, you just clear the "running" bit in the
controller (and perhaps not even that - because you want to still push
things out for debugging). End result: you never actually had to shut
anything down at all, and you could (for example) still run a USB serial
port console all the way to shutdown.

And yes, I wanted to do basically exactly that when I was debugging some
issues a year or two ago.

See? The device and driver may be totally alive over a ->suspend() call.
And that means that the bridge CANNOT KNOW that it's ok to shut down DMA.
Because DMA may be the only way the device communicates (again: USB
actually works that way).

So dammit, just admit that you were wrong, instead of continually sending
these _idiotic_ replies. That "turn off bus mastering" was bogus and
idiotic, and had no real cause. Ok to just admit it already?

Linus
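
The ordering problem described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: every `model_*` name is hypothetical, and the "bridge" is reduced to a single flag standing in for bus mastering. It assumes a host controller driver whose suspend() does nothing and whose suspend_late() stops the controller, and it counts DMA transfers that are lost if the bridge stops forwarding DMA between the two phases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an upstream bridge: one flag for bus mastering. */
struct model_bridge {
    bool bus_master;
};

/* Hypothetical model of a host controller sitting behind that bridge. */
struct model_hc {
    struct model_bridge *parent;
    bool running;
    int dma_ok;    /* transfers that reached memory */
    int dma_lost;  /* transfers dropped because the bridge blocked them */
};

/* A DMA transfer only gets through if the bridge still forwards it. */
static void model_hc_dma(struct model_hc *hc)
{
    if (!hc->running)
        return;
    if (hc->parent->bus_master)
        hc->dma_ok++;
    else
        hc->dma_lost++;
}

/* The driver deliberately leaves the controller alive in suspend(). */
static void model_hc_suspend(struct model_hc *hc)
{
    (void)hc;
}

/* The real shutdown happens only in suspend_late(). */
static void model_hc_suspend_late(struct model_hc *hc)
{
    hc->running = false;
}

/* Run one suspend sequence and return the number of lost transfers. */
static int model_suspend_sequence(bool bridge_disables_dma_early)
{
    struct model_bridge br = { .bus_master = true };
    struct model_hc hc = { .parent = &br, .running = true };

    model_hc_suspend(&hc);
    if (bridge_disables_dma_early)
        br.bus_master = false;   /* the behaviour being criticised */

    model_hc_dma(&hc);           /* e.g. console output between phases */

    model_hc_suspend_late(&hc);
    model_hc_dma(&hc);           /* no-op: the controller is stopped */

    br.bus_master = false;       /* only now is it safe to stop DMA */
    assert(!hc.running);
    return hc.dma_lost;
}
```

In this model, `model_suspend_sequence(false)` loses no transfers, while `model_suspend_sequence(true)` loses the one issued between suspend() and suspend_late().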

2009-02-01 00:37:43

by Rafael J. Wysocki

Subject: Re: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)

On Sunday 01 February 2009, Linus Torvalds wrote:
>
> On Sat, 31 Jan 2009, Linus Torvalds wrote:
> >
> > But how many people test STR while doing a "ping -f" from another machine?
> >
> > It _should_ work. Do you guarantee that it does?
>
> Btw, this really only is interesting if there's a shared interrupt.
>
> I'm sure that there are network drivers that will crash even on their own
> with _just_ the right timing (imagine having a delayed interrupt pending,
> then doing the "pci_set_power_state(PCI_D3hot)" thing, and then get the
> interrupt handler invoked on another CPU _just_ afterwards), but it's
> probably really hard to trigger, and a bug in that specific driver anyway.
>
> But what's much more interesting (and not necessarily a driver bug, but a
> general PM infrastructure problem) is if we have that shared interrupt
> case, and the network driver gets lots of interrupts just as "driver X" is
> shutting down with that interrupt shared. Then, "driver X" will get
> interrupts after the PM layer has put its device to sleep, and now "driver
> X" is quite understandably confused - it didn't even do the "put to sleep"
> itself, but now its device is no longer responding.
>
> And now it's not a really unlikely race condition any more.

All this leads to the conclusion that we should put devices into low power
states with interrupts off and this seems to imply that we'll need to make the
AML interpreter allow us to run AML with interrupts off.

Still, what about the following rule:
- If the device is supposed to wake up the system, the driver should prepare it
  and put it into a low power state using the existing PCI callbacks, in
  ->suspend(). In that case, the driver is also required to save the state of
  the device before putting it into the low power state. It is also required
  to make sure that its interrupt handler will not get confused in case of
  shared interrupts.
- If the state of the device hasn't been saved by the driver, the core is
  required to save its state (with interrupts off, I suppose?).
- If the state of the device hasn't been saved by the driver, the core will
  attempt to put the device into a low power state, using the native PCI PM and
  with interrupts off, unless PCI_DEV_FLAGS_NO_D3 is set in dev->flags.

Thanks,
Rafael
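
One possible reading of the rule proposed above, written as a user-space model rather than kernel code. All `model_*` names are hypothetical: `state_saved` stands in for the flag a driver would set by calling pci_save_state(), `no_d3` stands in for PCI_DEV_FLAGS_NO_D3, and the power state is reduced to D0 versus D3hot. It is a sketch of the decision logic only, under those assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel types; not the real struct pci_dev. */
enum model_power_state { MODEL_D0, MODEL_D3HOT };

struct model_pci_dev {
    bool state_saved;              /* driver saved config space itself */
    bool no_d3;                    /* models PCI_DEV_FLAGS_NO_D3 */
    enum model_power_state power;
    int core_saves;                /* times the core had to save state */
};

/* First rule: a wake-up capable device is handled by its driver. */
static void model_driver_suspend(struct model_pci_dev *dev)
{
    assert(dev->power == MODEL_D0); /* driver expects a live device */
    dev->state_saved = true;        /* save state first ... */
    dev->power = MODEL_D3HOT;       /* ... then power the device down */
}

/* Core step that would run late, with interrupts off. */
static void model_core_suspend_late(struct model_pci_dev *dev)
{
    if (dev->state_saved)
        return;                     /* driver did it; core stays out */

    dev->core_saves++;              /* second rule: core saves state */
    dev->state_saved = true;

    if (!dev->no_d3)                /* third rule: native PM unless NO_D3 */
        dev->power = MODEL_D3HOT;
}
```

With this logic, a device whose driver called `model_driver_suspend()` is never touched by the core, a plain device is saved and put into D3hot by the core, and a device with `no_d3` set is saved but left in D0.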

2009-02-01 00:42:18

by Rafael J. Wysocki

Subject: Re: 2.6.29-rc3: tg3 dead after resume

On Sunday 01 February 2009, Linus Torvalds wrote:
>
> On Sun, 1 Feb 2009, Rafael J. Wysocki wrote:
> > > The problems happen on purely the suspend path. How the f*ck do you know
> > > that the drivers behind the bridge don't do everything at 'suspend_late'
> > > time, and expect to be working up until that time?
> >
> > DMA from suspend_late? Come on.
>
> Rafael. Stop being a total idiot.
>
> Read what I wrote.
>
> I'm saying that the driver may not do anything at all at suspend() time,
> and leaves everything until suspend_late. Then, at suspend_late(), it
> finally really shuts down.
>
> That's actually a very reasonable thing to do in some circumstances. It
> simplifies everything, in particular all interrupt handling, since the
> device is now fully live all the way while interrupts can happen.
>
> For a USB host controller, for example, it really could make sense to do
> that - just leave all the core host controller stuff running, and the only
> thing the "suspend()" callback does is to send the commands to the actual
> devices, it doesn't necessarily touch the host controller itself at all.
>
> Then, at suspend_late time, you just clear the "running" bit in the
> controller (and perhaps not even that - because you want to still push
> things out for debugging). End result: you never actually had to shut
> anything down at all, and you could (for example) still run a USB serial
> port console all the way to shutdown.
>
> And yes, I wanted to do basically exactly that when I was debugging some
> issues a year or two ago.
>
> See? The device and driver may be totally alive over a ->suspend() call.
> And that means that the bridge CANNOT KNOW that it's ok to shut down DMA.
> Because DMA may be the only way the device communicates (again: USB
> actually works that way).
>
> So dammit, just admit that you were wrong,

I said I was.

Thanks,
Rafael

2009-02-01 00:52:09

by Linus Torvalds

Subject: Re: 2.6.29-rc3: tg3 dead after resume



On Sat, 31 Jan 2009, Linus Torvalds wrote:
>
> For a USB host controller, for example, it really could make sense to do
> that - just leave all the core host controller stuff running, and the only
> thing the "suspend()" callback does is to send the commands to the actual
> devices, it doesn't necessarily touch the host controller itself at all.

Same is quite likely true of things like video graphics adapters. Again,
for all the same reasons. Think about all those fbcon drivers. They will
use DMA for things. And again, there are very compelling debugging reasons
to not suspend them for real until suspend_late (if even then).

Linus

2009-02-01 01:07:09

by Linus Torvalds

Subject: Re: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)



On Sun, 1 Feb 2009, Rafael J. Wysocki wrote:
>
> All this leads to the conclusion that we should put devices into low power
> states with interrupts off and this seems to imply that we'll need to make the
> AML interpreter allow us to run AML with interrupts off.

How many devices actually have the _PS3 method (or whatever it is that we
end up executing)? We might be able to simply flag it, and say "ok, if we
have a _PS3 method, we'll have to suspend early, otherwise we can leave it
for a late suspend".

Definitely not perfect, but perhaps a way to get the safe thing on 99% of
all cases, and have to live with the horrid ACPI rules on some things.

I thought the _DSW thing is common for setting up wakeup, but _PSx is not.
But I have not looked at many ACPI tables in my life. I try to actively
avoid it if I at all humanly can.

Linus

2009-02-01 01:14:12

by Linus Torvalds

Subject: Re: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)



On Sat, 31 Jan 2009, Linus Torvalds wrote:
>
> I thought the _DSW thing is common for setting up wakeup, but _PSx is not.
> But I have not looked at many ACPI tables in my life. I try to actively
> avoid it if I at all humanly can.

Doing a quick grep on my laptop seems to confirm that. No actual _PSx
things found in any acpi tables at all that I can see.

And on a mac mini, there are _PS3 entries that _look_ like they are
connected to the IDE controller and the realtime clock, but it looks like
they don't happen for the PCI devices.

But again - I may have screwed that up. ACPI tables are not my favourite
data structure.

Linus
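
A check of this kind could be approximated as follows. This is a sketch under assumptions: it supposes the ACPI tables have already been disassembled into text, and the helper name `model_count_psx` is hypothetical. It simply counts occurrences of the names _PS0 through _PS3 and ignores other names that share the prefix, such as _PSW.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count "_PS0".."_PS3" names in disassembled ACPI table text. */
static size_t model_count_psx(const char *asl)
{
    size_t count = 0;
    const char *p = asl;

    assert(asl != NULL);
    while ((p = strstr(p, "_PS")) != NULL) {
        /* p[3] is at worst the terminating NUL, so this read is safe. */
        if (p[3] >= '0' && p[3] <= '3')
            count++;
        p += 3;  /* continue the search after this match */
    }
    return count;
}
```

A count of zero for a whole table would match the laptop result described above. A nonzero count says nothing about which device the methods belong to, so that still has to be read from the surrounding scope.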

2009-02-01 01:20:27

by Arjan van de Ven

Subject: Re: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)

On Sat, 31 Jan 2009 17:06:47 -0800 (PST)
Linus Torvalds wrote:

>
>
> On Sun, 1 Feb 2009, Rafael J. Wysocki wrote:
> >
> > All this leads to the conclusion that we should put devices into
> > low power states with interrupts off and this seems to imply that
> > we'll need to make the AML interpreter allow us to run AML with
> > interrupts off.
>
> How many devices actually have the _PS3 method (or whatever it is
> that we end up executing)? We might be able to simply flag it, and
> say "ok, if we have a _PS3 method, we'll have to suspend early,
> otherwise we can leave it for a late suspend".
>

In this area there's a pet peeve of mine, or rather something that
shows up on kerneloops.org quite a bit:
There are several PCI quirks that get run with irqs off, but they are
just the "normal" boot time quirks, and when they get run at boot, they
can sleep. Some of them do things like call ioremap() and the like...
From memory, some of the asus and via quirks come to mind.



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-02-01 01:25:13

by Rafael J. Wysocki

Subject: Re: What should PCI core do during suspend-resume? (was: Re: 2.6.29-rc3: tg3 dead after resume)

On Sunday 01 February 2009, Linus Torvalds wrote:
>
> On Sun, 1 Feb 2009, Rafael J. Wysocki wrote:
> >
> > All this leads to the conclusion that we should put devices into low power
> > states with interrupts off and this seems to imply that we'll need to make the
> > AML interpreter allow us to run AML with interrupts off.
>
> How many devices actually have the _PS3 method (or whatever it is that we
> end up executing)? We might be able to simply flag it, and say "ok, if we
> have a _PS3 method, we'll have to suspend early, otherwise we can leave it
> for a late suspend".

That seems doable at first sight, although I think we should take D1 and D2
into account too (the ACPI rules may be that for S3, i.e. suspend to RAM, a
given device should be put into D2, for example). We have a function for
checking if a device is power-manageable by ACPI.

Still, in that case, should the rule be that if the device is power-manageable
by ACPI, the PCI core is supposed to put it into a low power state (using ACPI)
with interrupts on and if the device is not power-manageable by ACPI, the
PCI core is supposed to put it into a low power state using the native PM?

> Definitely not perfect, but perhaps a way to get the safe thing on 99% of
> all cases, and have to live with the horrid ACPI rules on some things.

They are not that uncommon AFAICS. On all of my boxes there are devices
power-manageable by ACPI. Usually they are USB controllers and network
adapters, but sometimes it happens to sound cards too.

> I thought the _DSW thing is common for setting up wakeup, but _PSx is not.
> But I have not looked at many ACPI tables in my life. I try to actively
> avoid it if I at all humanly can.

Usually, we need to use ACPI to set up wake-up and then use ACPI to put the
device into a low power state. Otherwise, the wake-up may not work.

Thanks,
Rafael
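
The split being discussed in this message can be sketched as a small decision function. This is a user-space model with hypothetical `model_*` names, not the kernel's actual check. It assumes a device counts as power-manageable by ACPI if it has any of the _PS1, _PS2 or _PS3 methods (taking D1 and D2 into account as suggested above), and that such devices are handled early with interrupts on, while all others use native PCI PM late with interrupts off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_method { MODEL_PM_ACPI, MODEL_PM_NATIVE };
enum model_phase { MODEL_EARLY_IRQS_ON, MODEL_LATE_IRQS_OFF };

/* Hypothetical summary of which ACPI power methods a device has. */
struct model_dev {
    bool has_ps1;
    bool has_ps2;
    bool has_ps3;
};

struct model_plan {
    enum model_method method;
    enum model_phase phase;
};

/* Assumption: any _PSx method for a low power state counts. */
static bool model_acpi_power_manageable(const struct model_dev *dev)
{
    return dev->has_ps1 || dev->has_ps2 || dev->has_ps3;
}

/* Pick how and when the core would power the device down. */
static struct model_plan model_pick_plan(const struct model_dev *dev)
{
    struct model_plan plan;

    assert(dev != NULL);
    if (model_acpi_power_manageable(dev)) {
        /* AML has to run, so this cannot be done with interrupts off. */
        plan.method = MODEL_PM_ACPI;
        plan.phase = MODEL_EARLY_IRQS_ON;
    } else {
        /* A plain config space write is fine late, with interrupts off. */
        plan.method = MODEL_PM_NATIVE;
        plan.phase = MODEL_LATE_IRQS_OFF;
    }
    return plan;
}
```

The open question in the thread is how common the first branch is. The model only shows that the choice can be made per device, not which branch a given machine would take.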