2009-04-06 23:53:18

by Maxim Levitsky

Subject: [BISECTED] [REGRESSION] can't anymore even do a s2ram-s2disk-s2ram cycle on acer aspire 5720G

Hi,

This is the first time I am actually happy about a regression...

I have a notebook, an Aspire 5720G, that fails to do two suspends to RAM in
a row.

For example, I can do s2ram->s2disk->s2ram->...
but I can't do s2ram->s2ram.

Well, at least that was the situation until now.
Also, there is no way to debug this: the BIOS doesn't pass control back to
Linux on a failed resume. I tried probably every guess I could come up with.

Now, after commit a0d4922da2e4ccb0973095d8d29f36f6b1b5f703, which I
finally bisected, I can't do two suspends to RAM at all anymore,
regardless of a suspend to disk in between. Again, the BIOS doesn't pass
control back when the resume fails.

I actually did 3 bisects, as I had to find the fixes for 2 other s2ram
bugs along the way.
The fixes are:

a0e280e0f33f6c859a235fb69a875ed8f3420388
1e70c7f7a9d4a3d2cc78b40e1d7768d99cd79899


Now, why I am happy about this:
It seems that a suspend cycle changes something that explodes on the next
resume. An s2disk cycle used to clear this, but not anymore, so the poison
must somehow be connected to
a0d4922da2e4ccb0973095d8d29f36f6b1b5f703.

PCI state? I tried restoring it from a saved file (created on suspend),
but this didn't help.
(Only a single register changed, and it looks like a write-to-clear
register.)
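Comparing the saved config space against the current one, as described above, can be sketched in C. This is a minimal sketch, assuming two raw dumps of the device's conventional 256-byte configuration space (for example, copies of /sys/bus/pci/devices/<address>/config read as root, since unprivileged reads only return the first 64 bytes); the helper name pci_cfg_diff is hypothetical, not a kernel or pciutils API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of conventional PCI configuration space. */
#define PCI_CFG_SIZE 256

/*
 * Compare two config-space snapshots one 32-bit register at a time.
 * The offset of every differing register is stored in diffs[] (at most
 * max entries); the return value is the total number of differing
 * registers, which may exceed max.
 */
static size_t pci_cfg_diff(const uint8_t *before, const uint8_t *after,
                           size_t len, size_t *diffs, size_t max)
{
    size_t count = 0;

    assert(before != NULL && after != NULL);
    for (size_t off = 0; off + 4 <= len; off += 4) {
        if (memcmp(before + off, after + off, 4) != 0) {
            if (count < max)
                diffs[count] = off;
            count++;
        }
    }
    return count;
}
```

Differences are reported per dword, so a change to, say, the Status register at offset 0x06 (whose error bits are write-1-to-clear) shows up as the register at offset 0x04.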


Also, this commit narrows down the search: now it is clear that this is
USB-related. No wonder the BIOS pokes at USB on resume and stalls...


It could even be connected to the BIOS handoff; maybe we don't do that on
resume?

(Note: this regression is present all the way up to the latest -git.)


What do you think?

Best regards,
Maxim Levitsky



2009-04-07 15:27:28

by Rafael J. Wysocki

Subject: Re: [BISECTED] [REGRESSION] can't anymore even do a s2ram-s2disk-s2ram cycle on acer aspire 5720G

On Tuesday 07 April 2009, Maxim Levitsky wrote:
> Hi,

Hi,

> This is the first time I am actually happy about a regression...
>
> I have a notebook, an Aspire 5720G, that fails to do two suspends to RAM in
> a row.
>
> For example, I can do s2ram->s2disk->s2ram->...
> but I can't do s2ram->s2ram.
>
> Well, at least that was the situation until now.
> Also, there is no way to debug this: the BIOS doesn't pass control back to
> Linux on a failed resume. I tried probably every guess I could come up with.
>
> Now, after commit a0d4922da2e4ccb0973095d8d29f36f6b1b5f703, which I
> finally bisected, I can't do two suspends to RAM at all anymore,
> regardless of a suspend to disk in between. Again, the BIOS doesn't pass
> control back when the resume fails.

I wonder what happens if the in-between s2disk is in the "shutdown" mode.
Theoretically, it should work as a cold boot, so the second s2ram should work
in this case.
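The experiment suggested above can be driven through the kernel's /sys/power interface. A minimal sketch, assuming the running kernel lists "shutdown" in /sys/power/disk and "mem" and "disk" in /sys/power/state (check those files first); the helper names are hypothetical, and the directory is a parameter only so the logic can be exercised against an ordinary directory.

```c
#include <assert.h>
#include <stdio.h>

/* Write one keyword to a sysfs-style control file; 0 on success. */
static int write_keyword(const char *path, const char *word)
{
    FILE *f;
    int rc = 0;

    assert(path != NULL && word != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(word, f) == EOF)
        rc = -1;
    /* The buffered write is flushed here; for /sys/power/state on a real
     * system, fclose() returns only after the machine has resumed. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* s2ram: write "mem" to <dir>/state (dir is normally "/sys/power"). */
static int suspend_to_ram(const char *dir)
{
    char path[256];

    snprintf(path, sizeof(path), "%s/state", dir);
    return write_keyword(path, "mem");
}

/* s2disk in "shutdown" mode: select the mode, then start hibernation. */
static int hibernate_shutdown_mode(const char *dir)
{
    char path[256];

    snprintf(path, sizeof(path), "%s/disk", dir);
    if (write_keyword(path, "shutdown") != 0)
        return -1;
    snprintf(path, sizeof(path), "%s/state", dir);
    return write_keyword(path, "disk");
}
```

Run as root, calling suspend_to_ram("/sys/power"), then hibernate_shutdown_mode("/sys/power"), then suspend_to_ram("/sys/power") reproduces the s2ram, s2disk (shutdown), s2ram sequence.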

> I actually did 3 bisects, as I had to find the fixes for 2 other s2ram
> bugs along the way.
> The fixes are:
>
> a0e280e0f33f6c859a235fb69a875ed8f3420388
> 1e70c7f7a9d4a3d2cc78b40e1d7768d99cd79899

Commit a0d4922da2e4ccb0973095d8d29f36f6b1b5f703 has been modified quite a bit
by some later commits due to regressions it introduced.

> Now, why I am happy about this:
> It seems that a suspend cycle changes something that explodes on the next
> resume. An s2disk cycle used to clear this, but not anymore, so the poison
> must somehow be connected to
> a0d4922da2e4ccb0973095d8d29f36f6b1b5f703.

Hint: it would be a lot easier to read the message if you included the subjects
of the commits too. :-)

> PCI state? I tried restoring it from a saved file (created on suspend),
> but this didn't help.
> (Only a single register changed, and it looks like a write-to-clear
> register.)
>
>
> Also, this commit narrows down the search: now it is clear that this is
> USB-related. No wonder the BIOS pokes at USB on resume and stalls...
>
>
> It could even be connected to the BIOS handoff; maybe we don't do that on
> resume?
>
> (Note: this regression is present all the way up to the latest -git.)

I'm not sure if we really should consider this as a regression. s2ram clearly
didn't work correctly on your machine before and the s2disk in between
is not really relevant here IMO.

> What do you think?

Well, not much apart from the observation that the problem with s2ram on your
machine is probably related to USB. In fact, on Intel chipsets there seem to
be some strange links between the USB controllers (most notably EHCI) and
the core chipset that don't appear to be well documented and this may be the
case in which they show up. Dunno.

Thanks,
Rafael

2009-04-09 12:06:58

by Maxim Levitsky

Subject: Re: [BISECTED] [REGRESSION] can't anymore even do a s2ram-s2disk-s2ram cycle on acer aspire 5720G

On Tue, 2009-04-07 at 17:24 +0200, Rafael J. Wysocki wrote:
> On Tuesday 07 April 2009, Maxim Levitsky wrote:
> > Hi,
>
> Hi,
>
> > This is the first time I am actually happy about a regression...
> >
> > I have a notebook, an Aspire 5720G, that fails to do two suspends to RAM in
> > a row.
> >
> > For example, I can do s2ram->s2disk->s2ram->...
> > but I can't do s2ram->s2ram.
> >
> > Well, at least that was the situation until now.
> > Also, there is no way to debug this: the BIOS doesn't pass control back to
> > Linux on a failed resume. I tried probably every guess I could come up with.
> >
> > Now, after commit a0d4922da2e4ccb0973095d8d29f36f6b1b5f703, which I
> > finally bisected, I can't do two suspends to RAM at all anymore,
> > regardless of a suspend to disk in between. Again, the BIOS doesn't pass
> > control back when the resume fails.
>
> I wonder what happens if the in-between s2disk is in the "shutdown" mode.
> Theoretically, it should work as a cold boot, so the second s2ram should work
> in this case.
>
> > I actually did 3 bisects, as I had to find the fixes for 2 other s2ram
> > bugs along the way.
> > The fixes are:
> >
> > a0e280e0f33f6c859a235fb69a875ed8f3420388
> > 1e70c7f7a9d4a3d2cc78b40e1d7768d99cd79899
>
> Commit a0d4922da2e4ccb0973095d8d29f36f6b1b5f703 has been modified quite a bit
> by some later commits due to regressions it introduced.
>
> > Now, why I am happy about this:
> > It seems that a suspend cycle changes something that explodes on the next
> > resume. An s2disk cycle used to clear this, but not anymore, so the poison
> > must somehow be connected to
> > a0d4922da2e4ccb0973095d8d29f36f6b1b5f703.
>
> Hint: it would be a lot easier to read the message if you included the subjects
> of the commits too. :-)
>
Sure, sorry ;-)

> > PCI state? I tried restoring it from a saved file (created on suspend),
> > but this didn't help.
> > (Only a single register changed, and it looks like a write-to-clear
> > register.)
> >
> >
> > Also, this commit narrows down the search: now it is clear that this is
> > USB-related. No wonder the BIOS pokes at USB on resume and stalls...
> >
> >
> > It could even be connected to the BIOS handoff; maybe we don't do that on
> > resume?
> >
> > (Note: this regression is present all the way up to the latest -git.)
>
> I'm not sure if we really should consider this as a regression. s2ram clearly
> didn't work correctly on your machine before and the s2disk in between
> is not really relevant here IMO.
Still, this let me use s2ram at least half of the time, and s2disk eats
battery if I only suspend the system for a short time.


>
> > What do you think?
>
> Well, not much apart from the observation that the problem with s2ram on your
> machine is probably related to USB. In fact, on Intel chipsets there seem to
> be some strange links between the USB controllers (most notably EHCI) and
> the core chipset that don't appear to be well documented and this may be the
> case in which they show up. Dunno.

Sorry for the late reply.

I have now semi-bisected the changes inside this patch.

First, UHCI doesn't affect anything.

Then, if I move the 'late' suspend functions into the normal ones,
everything works like it used to (I need to retest this statement; I made
other changes too).


Also, I have an idea; think about this:
my notebook has a webcam, a USB UVC webcam.

That it has firmware can't be argued; it has to be true.
Suppose its firmware is stored in the BIOS?
In that case, the BIOS would load it on resume from RAM, so it would need
a fully functional EHCI controller.
We do something wrong, or, more precisely, the BIOS has some wrong
assumptions (read: it assumes the Windows driver) about the EHCI state.


Best regards,
Maxim Levitsky

2009-04-09 14:23:46

by Maxim Levitsky

Subject: Re: [BISECTED] [REGRESSION] can't anymore even do a s2ram-s2disk-s2ram cycle on acer aspire 5720G

On Thu, 2009-04-09 at 15:06 +0300, Maxim Levitsky wrote:
> Sorry for the late reply.
>
> I have now semi-bisected the changes inside this patch.
>
> First, UHCI doesn't affect anything.
>
> Then, if I move the 'late' suspend functions into the normal ones,
> everything works like it used to (I need to retest this statement; I made
> other changes too).
Yep, just moving the contents of the late suspend and early resume
functions into the normal suspend/resume functions fixes this issue.
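What this change does to the ordering can be illustrated with a toy model. It is a sketch, assuming the PM core runs every device's normal suspend callback before any late one, and mirrors that on resume (early callbacks first, devices in reverse order); the struct, the hook names, and the split between "stop" and "PCI" work are hypothetical and do not reproduce the EHCI driver or commit a0d4922da2e4ccb0973095d8d29f36f6b1b5f703.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device hooks, shaped like the legacy PCI driver
 * callbacks (suspend, suspend_late, resume_early, resume). Any may be
 * NULL. Each hook appends a tag to a shared log. */
struct pm_hooks {
    void (*suspend)(char *log);
    void (*suspend_late)(char *log);
    void (*resume_early)(char *log);
    void (*resume)(char *log);
};

static void log_add(char *log, const char *tag)
{
    assert(log != NULL && tag != NULL);
    strcat(log, tag);
    strcat(log, " ");
}

/* All normal suspend hooks, then all late ones (run with interrupts off
 * in the real kernel); resume undoes this in reverse device order. */
static void run_cycle(const struct pm_hooks *devs, size_t n, char *log)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].suspend)
            devs[i].suspend(log);
    for (size_t i = 0; i < n; i++)
        if (devs[i].suspend_late)
            devs[i].suspend_late(log);
    /* ...the platform sleeps and wakes up here... */
    for (size_t i = n; i-- > 0;)
        if (devs[i].resume_early)
            devs[i].resume_early(log);
    for (size_t i = n; i-- > 0;)
        if (devs[i].resume)
            devs[i].resume(log);
}

/* Hypothetical controller steps: stop the schedule vs. PCI-level work. */
static void hc_stop(char *l)        { log_add(l, "hc:stop"); }
static void hc_pci_off(char *l)     { log_add(l, "hc:pci-off"); }
static void hc_pci_on(char *l)      { log_add(l, "hc:pci-on"); }
static void hc_restart(char *l)     { log_add(l, "hc:restart"); }
static void hc_suspend_all(char *l) { hc_stop(l); hc_pci_off(l); }
static void hc_resume_all(char *l)  { hc_pci_on(l); hc_restart(l); }
static void dev_suspend(char *l)    { log_add(l, "dev:suspend"); }
static void dev_resume(char *l)     { log_add(l, "dev:resume"); }

/* Split: PCI work deferred to the late/early phases. */
static const struct pm_hooks hc_split = { hc_stop, hc_pci_off, hc_pci_on, hc_restart };
/* Merged: the same work moved into the normal suspend/resume hooks. */
static const struct pm_hooks hc_merged = { hc_suspend_all, NULL, NULL, hc_resume_all };
/* Some other device that only has normal hooks. */
static const struct pm_hooks other_dev = { dev_suspend, NULL, NULL, dev_resume };
```

With the split hooks, the controller's PCI-level power-down happens only after every other device has run its normal suspend, and it is powered back up before any of them resume; merging moves that work back among the normal callbacks. That ordering is the only difference this model captures.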


2009-04-09 15:19:46

by Rafael J. Wysocki

Subject: Re: [BISECTED] [REGRESSION] can't anymore even do a s2ram-s2disk-s2ram cycle on acer aspire 5720G

On Thursday 09 April 2009, Maxim Levitsky wrote:
> On Thu, 2009-04-09 at 15:06 +0300, Maxim Levitsky wrote:
> > Then, if I move the 'late' suspend functions into the normal ones,
> > everything works like it used to (I need to retest this statement; I made
> > other changes too).
> Yep, just moving the contents of the late suspend and early resume
> functions into the normal suspend/resume functions fixes this issue.

If I understand correctly, you're referring to the USB controller suspend and
resume callbacks, is that correct?

Rafael

2009-04-09 15:33:38

by Maxim Levitsky

Subject: Re: [BISECTED] [REGRESSION] can't anymore even do a s2ram-s2disk-s2ram cycle on acer aspire 5720G

On Thu, 2009-04-09 at 17:19 +0200, Rafael J. Wysocki wrote:
> On Thursday 09 April 2009, Maxim Levitsky wrote:
> > Yep, just moving the contents of the late suspend and early resume
> > functions into the normal suspend/resume functions fixes this issue.
>
> If I understand correctly, you're referring to the USB controller suspend and
> resume callbacks, is that correct?


Yes, exactly; the EHCI controller, to be precise.
I disabled UHCI completely in the kernel config.


Best regards,
Maxim Levitsky

2009-04-20 11:45:33

by Pavel Machek

Subject: Re: [BISECTED] [REGRESSION] can't anymore even do a s2ram-s2disk-s2ram cycle on acer aspire 5720G

Hi!

> Also, this commit narrows down the search: now it is clear that this is
> USB-related. No wonder the BIOS pokes at USB on resume and stalls...

So the obvious next step is to compile out USB support and see what
happens?

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-04-20 19:25:24

by Rafael J. Wysocki

Subject: Re: [BISECTED] [REGRESSION] can't anymore even do a s2ram-s2disk-s2ram cycle on acer aspire 5720G

On Tuesday 07 April 2009, Pavel Machek wrote:
> Hi!
>
> > Also, this commit narrows down the search: now it is clear that this is
> > USB-related. No wonder the BIOS pokes at USB on resume and stalls...
>
> So the obvious next step is to compile out USB support and see what
> happens?

Well, Maxim has already done more than that. Please look at the entire thread.

Best,
Rafael

2009-04-20 20:17:38

by Maxim Levitsky

Subject: Re: [BISECTED] [REGRESSION] can't anymore even do a s2ram-s2disk-s2ram cycle on acer aspire 5720G

On Mon, 2009-04-20 at 21:24 +0200, Rafael J. Wysocki wrote:
> On Tuesday 07 April 2009, Pavel Machek wrote:
> > Hi!
> >
> > > Also, this commit narrows down the search: now it is clear that this is
> > > USB-related. No wonder the BIOS pokes at USB on resume and stalls...
> >
> > So the obvious next step is to compile out USB support and see what
> > happens?
>
> Well, Maxim has already done more than that. Please look at the entire thread.

That is true. I even compiled kernels back to 2.6.19 and used an absolutely
minimal driver configuration (AHCI + VGA console + keyboard).

(Yes, the screen here doesn't come back to life without the NVIDIA
proprietary drivers, but the system otherwise works fine, and this huge bug
was confirmed by many people, of whom I am probably the only one with an
NVIDIA device.)

But notice that in 2.6.30-rc the same regression happens, yet it is no
longer due to EHCI (I even copied the suspend/resume code from the 'good'
version), so there is some hope that an additional bisect might reveal
something. (I also finally managed to set up distcc properly over WLAN and
get 100% CPU utilization on all systems, well, 2 systems; the trick is to
use many threads, something like 15 or so. I can now do a full bisect cycle
in about 10 seconds.)


Best regards,
Maxim Levitsky