2004-11-18 22:28:09

by Matthias Hentges

Subject: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

Hello all,

I'm in the process of debugging S3 on my notebook and found out that I
can resume from S3 with every kernel up to (and including) 2.6.7-rc1
( patch-2.6.6-bk8-bk9.bz2 ).

After 2.6.7-rc1, my notebook freezes upon a resume from S3. Tested with
2.6.7-rc2, -rc3, 2.6.8.1, 2.6.9 and some 2.6.10-rcX-bkX kernels.

Please note that these tests were run in single-user mode with a
bare-bones kernel .config (attached), just enough to boot (i.e. no
modules, no USB, etc.).

I have found a hint on the web that the pci-resume code, which was
included in 2.6.7-rc2, might cause this problem. I removed the call to
pci_default_resume in drivers/pci/pci-driver.c and my laptop resumed
into a working state again (tested with 2.6.7 and 2.6.9).

I've written an email to Arjan van de Ven, the author of the resume
patch, and he suggested posting here, along with a full lspci output
(attached).
He thinks that some device is misbehaving and causing trouble when
resumed.

I have a bad feeling that the device in question is the built-in video
card,
"ATI Technologies Inc RV250 5c63 [Radeon Mobility 9200 M9+] (rev 01)",
since every attempt to re-enable it after a resume has failed. The screen
just stays dark, even with acpi_sleep=s3_mode. acpi_sleep=s3_bios freezes
the machine on resume, as does resuming with radeonfb or accelerated X
(DRI or fglrx). Not even the boot-radeon tool helps: it either freezes
the machine or just doesn't work.

In the meantime, I'm using the attached patch which disables the new
pci-resume code.

Thanks
--
Matthias Hentges
Cologne / Germany

[http://www.hentges.net] -> PGP welcome, HTML tolerated
ICQ: 97 26 97 4 -> No files, no URL's

My OS: Debian SID. Geek by Nature, Linux by Choice


Attachments:
lspci-vvvv.txt (10.63 kB)
dmesg.txt (15.08 kB)
old_pci_resume-2.6.9.patch (1.61 kB)
config.gz (3.96 kB)

2004-11-19 11:57:32

by Pavel Machek

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

Hi!

> I'm in the process of debugging S3 on my notebook and found out that I
> can resume from S3 with every kernel up to (and including) 2.6.7-rc1
> ( patch-2.6.6-bk8-bk9.bz2 ).

You can resume and your video works after resume in 2.6.7? Great!

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-19 11:58:19

by Pavel Machek

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

Hi!

> I'm in the process of debugging S3 on my notebook and found out that I
> can resume from S3 with every kernel up to (and including) 2.6.7-rc1
> ( patch-2.6.6-bk8-bk9.bz2 ).
>
> After 2.6.7-rc1, my notebook freezes upon a resume from S3. Tested with
> 2.6.7-rc2, -rc3, 2.6.8.1, 2.6.9 and some 2.6.10-rcX-bkX kernels.
>
> Please note that these tests were run in single user mode with a
> barebone kernel
> .config (attached) just enough to boot (ie no modules, no usb etc)
>
> I have found a hint on the web that the pci-resume code, which was
> included in 2.6.7-rc2, might cause this problem. I removed the call to
> pci_default_resume in drivers/pci/pci-driver.c and my laptop resumed
> into a working state again ( tested
> with 2.6.7 and 2.6.9 ).
>
> I've written an email to Arjan van de Ven, the author of the resume
> patch and he suggested positing here, along with a full lspci output
> (attached)
> He thinks that some device is misbehaving and causing trouble if
> resumed.

Okay, patch is way too ugly. You probably should provide resume method
for your radeon that just does nothing. That should confirm your
theory, fix the crash, and you'll avoid touching common code with it.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
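
The suggestion above relies on the PCI core calling a driver's own
resume method in preference to the generic restore. A minimal user-space
sketch of that dispatch, in Python rather than kernel C. The names
PciDevice, default_resume, noop_resume and resume_device are
hypothetical and only model the idea; they are not the kernel's API.

```python
from typing import Callable, Optional


class PciDevice:
    """Toy model of a PCI device: live config space plus the copy saved at suspend."""

    def __init__(self, name: str, config: bytes,
                 resume_hook: Optional[Callable[["PciDevice"], None]] = None):
        self.name = name
        self.saved = bytes(config)        # snapshot taken before suspend
        self.config = bytearray(config)   # what the hardware holds right now
        self.resume_hook = resume_hook    # driver-specific resume, if any


def default_resume(dev: PciDevice) -> None:
    """Generic path: write the whole saved config space back."""
    dev.config[:] = dev.saved


def noop_resume(dev: PciDevice) -> None:
    """Driver hook that deliberately leaves the device untouched."""


def resume_device(dev: PciDevice) -> str:
    """Use the driver's hook when present, else fall back to the generic restore."""
    if dev.resume_hook is not None:
        dev.resume_hook(dev)
        return "driver"
    default_resume(dev)
    return "default"
```

With noop_resume installed, a device whose registers changed during S3
is left exactly as the firmware handed it back, while a device without a
hook gets its saved config space written back. If the freeze goes away
with the no-op hook on the radeon only, that would point at the radeon.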

2004-11-19 13:57:03

by Matthias Hentges

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

On Friday, 19.11.2004 at 12:55 +0100, Pavel Machek wrote:
> Hi!
>
> > I'm in the process of debugging S3 on my notebook and found out that I
> > can resume from S3 with every kernel up to (and including) 2.6.7-rc1
> > ( patch-2.6.6-bk8-bk9.bz2 ).
>
> You can resume and your video works after resume in 2.6.7? Great!

Heh, well no. Video is as dead as it can get :\ No known trick revives
it after a resume. But at least the machine doesn't freeze after S3.


> Okay, patch is way too ugly.

Of course it is :) It's more a proof-of-concept that pci-resume is
indeed causing the problem. I have no idea how to debug this any
further. In the meantime this patch works for me.

> You probably should provide resume method
> for your radeon that just does nothing. That should confirm your
> theory, fix the crash, and you'll avoid touching common code with it.

Sorry, that's beyond my abilities. That's why I'm posting here. I'm not
even sure that it's the radeon which is acting up here.
--
Matthias Hentges
Cologne / Germany

[http://www.hentges.net] -> PGP welcome, HTML tolerated
ICQ: 97 26 97 4 -> No files, no URL's

My OS: Debian SID. Geek by Nature, Linux by Choice

2004-11-19 23:09:49

by Benjamin Herrenschmidt

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines


> Of course it is :) It's more a proof-of-concept that pci-resume is
> indeed causing the problem. I have no idea how to debug this any
> further. In the meantime this patch works for me.
>
> > You probably should provide resume method
> > for your radeon that just does nothing. That should confirm your
> > theory, fix the crash, and you'll avoid touching common code with it.
>
> Sorry, that's beyond my abilities. That's why I'm posting here. I'm not
> even sure that it's the radeon which is acting up here.

Have you tried with radeonfb in your kernel config ?

Ben.


2004-11-20 02:46:03

by Matthew Garrett

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

Benjamin Herrenschmidt wrote:

>> Sorry, that's beyond my abilities. That's why I'm posting here. I'm not
>> even sure that it's the radeon which is acting up here.
>
> Have you tried with radeonfb in your kernel config ?

In the general case, it's harder to resume systems using framebuffers
than systems that don't. The contortions that are necessary for non-fb
systems tend to break fb systems (you end up with userspace and the
kernel both trying to get the graphics hardware back into a sane state),
so in an ideal world resume would work without any framebuffer support.

--
Matthew Garrett

2004-11-20 03:39:55

by Matthias Hentges

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

On Saturday, 20.11.2004 at 02:43 +0000, Matthew Garrett wrote:
> Benjamin Herrenschmidt wrote:
>
> >> Sorry, that's beyond my abilities. That's why I'm posting here. I'm not
> >> even sure that it's the radeon which is acting up here.
> >
> > Have you tried with radeonfb in your kernel config ?
>
> In the general case, it's harder to resume systems using framebuffers
> than systems that don't. The contortions that are necessary for non-fb
> systems tend to break fb systems (you end up with userspace and the
> kernel both trying to get the graphics hardware back into a sane state),
> so in an ideal world resume would work without any framebuffer support.

Trying to resume with radeonfb or X (DRI or fglrx) causes the machine
to freeze upon a resume.
--
Matthias Hentges
Cologne / Germany

[http://www.hentges.net] -> PGP welcome, HTML tolerated
ICQ: 97 26 97 4 -> No files, no URL's

My OS: Debian SID. Geek by Nature, Linux by Choice

2004-11-20 07:34:28

by Benjamin Herrenschmidt

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

On Sat, 2004-11-20 at 02:43 +0000, Matthew Garrett wrote:
> Benjamin Herrenschmidt wrote:
>
> >> Sorry, that's beyond my abilities. That's why I'm posting here. I'm not
> >> even sure that it's the radeon which is acting up here.
> >
> > Have you tried with radeonfb in your kernel config ?
>
> In the general case, it's harder to resume systems using framebuffers
> than systems that don't. The contortions that are necessary for non-fb
> systems tend to break fb systems (you end up with userspace and the
> kernel both trying to get the graphics hardware back into a sane state),
> so in an ideal world resume would work without any framebuffer support.

Bullshit...

Well... In an ideal world, the video chip would come back up all by
itself and nobody would have to care... unfortunately we aren't in an
ideal world.

With the way video cards are evolving, we'll soon have no choice but
to have a kernel driver bring the chip back. Userspace has nothing to do
with that, and userspace & kernel aren't fighting over it.

Ben.


2004-11-20 07:35:57

by Benjamin Herrenschmidt

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

On Sat, 2004-11-20 at 04:36 +0100, Matthias Hentges wrote:
> On Saturday, 20.11.2004 at 02:43 +0000, Matthew Garrett wrote:
> > Benjamin Herrenschmidt wrote:
> >
> > >> Sorry, that's beyond my abilities. That's why I'm posting here. I'm not
> > >> even sure that it's the radeon which is acting up here.
> > >
> > > Have you tried with radeonfb in your kernel config ?
> >
> > In the general case, it's harder to resume systems using framebuffers
> > than systems that don't. The contortions that are necessary for non-fb
> > systems tend to break fb systems (you end up with userspace and the
> > kernel both trying to get the graphics hardware back into a sane state),
> > so in an ideal world resume would work without any framebuffer support.
>
> Trying to resume with radeonfb or X (DRI or fglrx) causes the machine
> to freeze upon a resume.

At what point does it freeze ? Is the display back before the freeze ?

Ben.


2004-11-20 08:03:16

by Matthias Hentges

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

On Saturday, 20.11.2004 at 18:34 +1100, Benjamin Herrenschmidt wrote:
> On Sat, 2004-11-20 at 04:36 +0100, Matthias Hentges wrote:
> > On Saturday, 20.11.2004 at 02:43 +0000, Matthew Garrett wrote:
> > > Benjamin Herrenschmidt wrote:
> > >

[...]

> > Trying to resume with radeonfb or X (DRI or fglrx) causes the machine
> > to freeze upon a resume.
>
> At what point does it freeze ? Is the display back before the freeze ?

Sadly the video *never* comes back and stays dark no matter what I try:
- boot-radeon (int10 POST call) doesn't work. Either it segfaults or
  it hangs the machine.
- Any combination of radeontool light on|off doesn't help (no freeze,
  but sometimes it can't read the card's memory address??)
- The int10 radeon patch for X11 doesn't help (freeze).
- radeonfb and/or X (either patched w/ int10 or not) freeze the
  machine.

I'm running out of ideas with this darn thing.
Since the serial port doesn't come back from S3 either, even a serial
console is of no help.

I have attached the output of lspci -vvv before and after resuming from
S3. The latter shows lots of "[disabled]" entries. Is that of any use?

Thanks
--
Matthias Hentges
Cologne / Germany

[http://www.hentges.net] -> PGP welcome, HTML tolerated
ICQ: 97 26 97 4 -> No files, no URL's

My OS: Debian SID. Geek by Nature, Linux by Choice


Attachments:
lspci-vvv_after_s3.txt (0.98 kB)
lspci-vvv_before_s3.txt (0.99 kB)

2004-11-20 22:27:47

by Benjamin Herrenschmidt

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

On Sat, 2004-11-20 at 09:01 +0100, Matthias Hentges wrote:
> On Saturday, 20.11.2004 at 18:34 +1100, Benjamin Herrenschmidt wrote:
> > On Sat, 2004-11-20 at 04:36 +0100, Matthias Hentges wrote:
> > > On Saturday, 20.11.2004 at 02:43 +0000, Matthew Garrett wrote:
> > > > Benjamin Herrenschmidt wrote:
> > > >
>
> [...]
>
> > > Trying to resume with radeonfb or X (DRI or fglrx) causes the machine
> > > to freeze upon a resume.
> >
> > At what point does it freeze ? Is the display back before the freeze ?
>
> Sadly the video *never* comes back and stays dark no matter what I try:
> - boot-radeon (int10 POST call) doesn't work. Either it segfaults or
>   it hangs the machine.
> - Any combination of radeontool light on|off doesn't help (no freeze,
>   but sometimes it can't read the card's memory address??)
> - The int10 radeon patch for X11 doesn't help (freeze).
> - radeonfb and/or X (either patched w/ int10 or not) freeze the
>   machine.
>
> I'm running out of ideas with this darn thing.
> Since the serial port doesn't come back from S3 either, even a serial
> console is of no help.
>
> I have attached the output of lspci -vvv before and after resuming from
> S3. The latter shows lots of "[disabled]" entries. Is that of any use?

Difficult to say at this point. The [disabled] entries are easily fixed
with a pci_enable_device(). Unfortunately, on some machines, the firmware
sort-of expects the kernel driver to reboot the card from scratch...

Ben.
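
For context, the "[disabled]" markers lspci prints next to memory and
I/O regions reflect the decode-enable bits in the device's 16-bit
command register at config offset 0x04. Setting them again is, roughly,
what pci_enable_device() takes care of. A small sketch that reads those
bits from a raw config-space image; decode_command is a hypothetical
helper name, and any bytes fed to it here are made up rather than taken
from the attached dumps.

```python
import struct

PCI_COMMAND = 0x04          # offset of the 16-bit command register
PCI_COMMAND_IO = 0x1        # bit 0: respond to I/O space accesses
PCI_COMMAND_MEMORY = 0x2    # bit 1: respond to memory space accesses
PCI_COMMAND_MASTER = 0x4    # bit 2: device may act as bus master


def decode_command(config: bytes) -> dict:
    """Report which decode/master bits are set in a raw config-space image."""
    (cmd,) = struct.unpack_from("<H", config, PCI_COMMAND)
    return {
        "io": bool(cmd & PCI_COMMAND_IO),
        "memory": bool(cmd & PCI_COMMAND_MEMORY),
        "bus_master": bool(cmd & PCI_COMMAND_MASTER),
    }
```

A device that comes back from S3 with "io" and "memory" both False would
show its regions as disabled even though the BAR addresses may still be
intact.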


2004-11-20 22:38:01

by Pavel Machek

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

Hi!

> >> Sorry, that's beyond my abilities. That's why I'm posting here. I'm not
> >> even sure that it's the radeon which is acting up here.
> >
> > Have you tried with radeonfb in your kernel config ?
>
> In the general case, it's harder to resume systems using

This is not the general case. Read the whole thread, generic PCI
resume was causing problems.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2004-11-21 08:52:31

by Matthias Hentges

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

On Sunday, 21.11.2004 at 09:27 +1100, Benjamin Herrenschmidt wrote:
> On Sat, 2004-11-20 at 09:01 +0100, Matthias Hentges wrote:
> > On Saturday, 20.11.2004 at 18:34 +1100, Benjamin Herrenschmidt wrote:
> > > On Sat, 2004-11-20 at 04:36 +0100, Matthias Hentges wrote:
> > > > On Saturday, 20.11.2004 at 02:43 +0000, Matthew Garrett wrote:
> > > > > Benjamin Herrenschmidt wrote:
> > > > >
> >
> > [...]
> >
> > > > Trying to resume with radeonfb or X (DRI or fglrx) causes the machine
> > > > to freeze upon a resume.

> > > At what point does it freeze ? Is the display back before the freeze ?
> >
> > Sadly the video *never* comes back and stays dark no matter what I try:

[...]

> > The latter shows lots of "[disabled]" entries. Is that of any use?
>
[...]

> Difficult to say at this point. The [disabled] entries are easily fixed
> with a pci_enable_device(). Unfortunately, on some machines, the firmware
> sort-of expects the kernel driver to reboot the card from scratch...

I did some more tests today and found out that
"0000:00:01.0 PCI bridge: Intel Corp. 82855PM Processor to AGP
Controller (rev 21) (prog-if 00 [Normal decode])"

wasn't correctly resumed either.

I wrote a script to dump the pci data (from lspci -x $device). Importing
the data after a resume freezes the machine *if one is touching data
that hasn't been changed during S3*. If I only change the values which
were modified after resume, the machine does *not* freeze.

Maybe that's the problem with pci_default_resume. It looks like it is
just writing back the data it stored before suspending. Maybe one
should only write the values which have actually changed?

Anyway, using my little script, I managed to restore the PCI data of
the "Processor to AGP Controller" and the Radeon card after a resume.

If X is running on VT7 and one suspends from VT1 and after resuming
switches back to VT7 ( after restoring the PCI data ), the backlight
goes on but the display is still empty.

Looks like I'm still missing something. Too bad boot-radeon always
segfaults :\
An int10 call after restoring the PCI data might just do the trick.
--
Matthias Hentges
Cologne / Germany

[http://www.hentges.net] -> PGP welcome, HTML tolerated
ICQ: 97 26 97 4 -> No files, no URL's

My OS: Debian SID. Geek by Nature, Linux by Choice
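
The script described in this message was not posted to the list. As a
rough sketch of the "write back only what changed" idea, assuming the
usual lspci -x layout of an offset followed by up to 16 hex bytes per
row: the function names below are hypothetical, and the setpci command
lines are only built as strings, never executed.

```python
import re

# A hex row of `lspci -x` looks like "10: 00 00 00 f0 ..." (offset, then bytes).
HEX_ROW = re.compile(r"^([0-9a-f]+): ((?:[0-9a-f]{2} ?)+)$")


def parse_lspci_hex(dump: str) -> dict:
    """Map config-space offset -> byte value for one device's `lspci -x` dump."""
    regs = {}
    for line in dump.splitlines():
        match = HEX_ROW.match(line.strip())
        if not match:
            continue  # device header or decoded text, not a hex row
        base = int(match.group(1), 16)
        for i, byte in enumerate(match.group(2).split()):
            regs[base + i] = int(byte, 16)
    return regs


def changed_registers(before: dict, after: dict) -> dict:
    """Offsets whose value differs after resume, mapped to the pre-suspend value."""
    return {off: val for off, val in sorted(before.items())
            if after.get(off) != val}


def setpci_commands(slot: str, changes: dict) -> list:
    """Render byte-wide setpci invocations; they are returned, not executed."""
    return [f"setpci -s {slot} {off:02x}.b={val:02x}"
            for off, val in changes.items()]
```

Feeding it a dump taken before suspend and one taken after resume yields
only the registers that S3 actually altered, which matches the
observation that touching the unchanged ones is what froze the machine.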

2004-11-21 21:40:46

by Benjamin Herrenschmidt

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines


> I did some more tests today and found out that
> "0000:00:01.0 PCI bridge: Intel Corp. 82855PM Processor to AGP
> Controller (rev 21) (prog-if 00 [Normal decode])"
>
> wasn't correctly resumed either.
>
> I wrote a script to dump the pci data (from lspci -x $device). Importing
> the data after a resume freezes the machine *if one is touching data
> that hasn't been changed during S3*. If I only change the values which
> were modified after resume, the machine does *not* freeze.
>
> Maybe that's the problem with pci_default_resume. It looks like it is
> just writing back the data it has stored before resuming. Maybe one
> should only write the values which have actually changed?
>
> Anyways, using my little script, i managed to restore the PCI data of
> the "Processor to AGP Controller" and the Radeon card after a resume.

That "update only what changed" makes little sense ... can you send me
the lspci state of the Intel bridge before you try to resume it ? I
suspect our pci_restore_state() should be smarter, that is check if
something changed (a BAR), if yes, switch mem/io off, restore the BARs,
then switch mem/io back on...

Ben.
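
The ordering proposed above (check whether a BAR changed, switch mem/io
decoding off, restore the BARs, switch decoding back on) can be modelled
on a plain byte buffer. This is only a sketch of the sequence, not the
kernel's pci_restore_state(); restore_bars is a hypothetical name, and
the BAR range used is that of a type-0 header, whereas a bridge such as
the 82855PM uses a type-1 header with just two BARs.

```python
import struct

PCI_COMMAND = 0x04
PCI_COMMAND_IO = 0x1
PCI_COMMAND_MEMORY = 0x2
BAR_START, BAR_END = 0x10, 0x28   # six 32-bit BARs of a type-0 header


def restore_bars(live: bytearray, saved: bytes) -> list:
    """Restore BARs with decoding switched off; return the steps taken."""
    steps = []
    if live[BAR_START:BAR_END] == saved[BAR_START:BAR_END]:
        return steps  # no BAR changed, so leave the device alone

    # Stop the device from decoding I/O and memory while its BARs move.
    (cmd,) = struct.unpack_from("<H", live, PCI_COMMAND)
    decode_bits = PCI_COMMAND_IO | PCI_COMMAND_MEMORY
    struct.pack_into("<H", live, PCI_COMMAND, cmd & ~decode_bits & 0xFFFF)
    steps.append("decode off")

    live[BAR_START:BAR_END] = saved[BAR_START:BAR_END]
    steps.append("BARs restored")

    # Put the pre-suspend command value back, re-enabling whatever was on.
    (saved_cmd,) = struct.unpack_from("<H", saved, PCI_COMMAND)
    struct.pack_into("<H", live, PCI_COMMAND, saved_cmd)
    steps.append("decode restored")
    return steps
```

The point of the ordering is that the device never responds at a
half-written address: decoding is only re-enabled once every BAR holds
its saved value.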


2004-11-22 04:34:52

by Matthias Hentges

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

On Monday, 22.11.2004 at 08:39 +1100, Benjamin Herrenschmidt wrote:

> That "update only what changed" makes little sense

Sorry, I was merely stating my observations.

> ... can you send me
> the lspci state of the Intel bridge before you try to resume it ? I
> suspect our pci_restore_state() should be smarter, that is check if
> something changed (a BAR), if yes, switch mem/io off, restore the BARs,
> then switch mem/io back on...

Attached.

Thanks!
--
Matthias Hentges
Cologne / Germany

[http://www.hentges.net] -> PGP welcome, HTML tolerated
ICQ: 97 26 97 4 -> No files, no URL's

My OS: Debian SID. Geek by Nature, Linux by Choice


Attachments:
dmesg.txt.gz (4.48 kB)
kernel.config.gz (6.17 kB)
lspci-vvvxxx_after_restoring_PCI.txt.gz (3.34 kB)
lspci-vvvxxx_after_resuming.txt.gz (3.30 kB)
lspci-vvvxxx_before_suspending.txt.gz (3.39 kB)
proc_interrupts.txt (406.00 B)

2004-11-22 04:53:32

by Benjamin Herrenschmidt

Subject: Re: pci-resume patch from 2.6.7-rc2 breakes S3 resume on some machines

On Mon, 2004-11-22 at 05:34 +0100, Matthias Hentges wrote:
> On Monday, 22.11.2004 at 08:39 +1100, Benjamin Herrenschmidt wrote:
>
> > That "update only what changed" makes little sense
>
> Sorry, I was merely stating my observations.
>
> > ... can you send me
> > the lspci state of the Intel bridge before you try to resume it ? I
> > suspect our pci_restore_state() should be smarter, that is check if
> > something changed (a BAR), if yes, switch mem/io off, restore the BARs,
> > then switch mem/io back on...
>
> Attached.

Ok, it's clearly visible that your CPU->AGP bridge isn't properly
restored. I can't tell if the "default" resume code is enough tho, but
it's fairly probable that this isn't the only problem, and that the
video chip itself isn't restored either...

I don't think the default resume code is to blame here, though the CPU
to AGP bridge may need some special restore code restoring more than
just its config space (very probable even). I suspect there is some
ACPI trickery here that should be happening and isn't, but my knowledge
of ACPI isn't that great.

Once the config space is resumed, I suppose doing a soft-boot of the
card with the BIOS would work, but then, that means preventing anything
from actually touching the video card until that happens...

Ben.