2007-06-09 19:54:43

by Thomas Gleixner

Subject: Jinxed VAIO wreckage - current state of affairs

Andrew's jinxed VAIO breaks with the high resolution timer updates in a
very strange way. Andrew identified the following patch as the culprit:

http://www.tglx.de/projects/hrtimers/2.6.22-rc4/broken-out/clockevents-fix-resume-logic.patch

This makes no sense at all. The patch just moves the timer restart to a
different (later) place in the code and does exactly the same thing as
the current code does.

On resume the VAIO is stuck in the following place:

<Andrew's debug session>

We finish swsusp_save() and a few other functions then we go

hibernate
->platform_finish
->acpi_hibernation_finish
->acpi_leave_sleep_state
->acpi_evaluate_object

and there it dies, in this call:

status = acpi_evaluate_object(NULL, METHOD_NAME__WAK, &arg_list, NULL);

I wonder how your patch caused that?

<debugs further>

OK, it gets to the last statement in acpi_evaluate_object():

return_ACPI_STATUS(status);

but doesn't hit the printk on return to the caller,
acpi_leave_sleep_state().

</Andrew's debug session>
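
The bracketing approach in the debug session above can be sketched in
user space as follows. This is only an illustration, not the ACPI code:
mark(), evaluate_object_stub() and leave_sleep_state_stub() are
hypothetical stand-ins, and fprintf(stderr, ...) takes the place of
printk(). One marker goes on the callee's last statement and one on each
side of the call in the caller, so the last marker printed brackets the
place where execution stopped.

```c
#include <stdio.h>
#include <string.h>

#define MAX_MARKS 16

/* Markers reached so far, in order. */
static const char *marks[MAX_MARKS];
static int nr_marks;

/* Record a marker and print it to unbuffered stderr, so the last line
 * visible before a hang is the last marker that was reached. */
void mark(const char *where)
{
	if (nr_marks < MAX_MARKS)
		marks[nr_marks++] = where;
	fprintf(stderr, "reached: %s\n", where);
}

/* Hypothetical stand-in for the callee under suspicion. */
int evaluate_object_stub(void)
{
	int status = 0;

	mark("callee: last statement");
	return status;
}

/* Hypothetical stand-in for the caller. */
int leave_sleep_state_stub(void)
{
	int status;

	mark("caller: before call");
	status = evaluate_object_stub();
	mark("caller: after return");
	return status;
}
```

In a healthy run all three markers appear in order. In the session
above, the equivalent of the second marker showed up and the third
never did.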

Some data points:

This happens only when the local APIC timer is used. With the PIT,
resume works fine.

I backported the full high-res stuff to 2.6.20. On 2.6.20 the VAIO
survives that patch.

Can the suspend/resume and ACPI wizards please give some hints on how to
track this 100% reproducible wreckage down?

Thanks,

tglx



2007-06-09 20:54:22

by Rafael J. Wysocki

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Saturday, 9 June 2007 21:54, Thomas Gleixner wrote:
> Andrew's jinxed VAIO breaks with the high resolution timer updates in a
> very strange way. Andrew identified the following patch as the culprit:
>
> http://www.tglx.de/projects/hrtimers/2.6.22-rc4/broken-out/clockevents-fix-resume-logic.patch
>
> This makes no sense at all. The patch just moves the timer restart to a
> different (later) place in the code and does exactly the same thing as
> the current code does.
>
> On resume the VAIO is stuck in the following place:
>
> <Andrew's debug session>
>
> We finish swsusp_save() and a few other functions then we go
>
> hibernate
> ->platform_finish
> ->acpi_hibernation_finish
> ->acpi_leave_sleep_state
> ->acpi_evaluate_object
>
> and there it dies, in this call:
>
> status = acpi_evaluate_object(NULL, METHOD_NAME__WAK, &arg_list, NULL);
>
> I wonder how your patch caused that?
>
> <debugs further>
>
> OK, it gets to the last statement in acpi_evaluate_object():
>
> return_ACPI_STATUS(status);
>
> but doesn't hit the printk on return to the caller,
> acpi_leave_sleep_state().
>
> </Andrew's debug session>
>
> Some data points:
>
> This happens only when the local APIC timer is used. With the PIT,
> resume works fine.
>
> I backported the full high-res stuff to 2.6.20. On 2.6.20 the VAIO
> survives that patch.
>
> Can the suspend/resume and ACPI wizards please give some hints on how to
> track this 100% reproducible wreckage down?

Hmm, how's 2.6.22-rc4-mm2 doing on the Vaio?

There are a couple of patches in there that might help in theory, this series
in particular:

swsusp-remove-incorrect-code-from-userc.patch
swsusp-remove-code-duplication-between-diskc-and-userc.patch
swsusp-introduce-restore-platform-operations.patch
swsusp-fix-hibernation-code-ordering.patch
swsusp-remove-code-duplication-between-diskc-and-userc-fix.patch

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth

2007-06-09 21:05:31

by Thomas Gleixner

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Sat, 2007-06-09 at 22:59 +0200, Rafael J. Wysocki wrote:
> > Can the suspend/resume and ACPI wizards please give some hints on how to
> > track this 100% reproducible wreckage down?
>
> Hmm, how's 2.6.22-rc4-mm2 doing on the Vaio?
>
> There are a couple of patches in there that might help in theory, this series
> in particular:
>
> swsusp-remove-incorrect-code-from-userc.patch
> swsusp-remove-code-duplication-between-diskc-and-userc.patch
> swsusp-introduce-restore-platform-operations.patch
> swsusp-fix-hibernation-code-ordering.patch
> swsusp-remove-code-duplication-between-diskc-and-userc-fix.patch

I'm working on a hrt version against -mm, so we can figure this out.

tglx


2007-06-10 03:09:37

by Andrew Morton

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Sat, 9 Jun 2007 22:59:49 +0200 "Rafael J. Wysocki" wrote:

> Hmm, how's 2.6.22-rc4-mm2 doing on the Vaio?

People would have heard if it was busted ;)

Have seen occasional hangs in e100 resume-from-RAM, and occasional
all-black-and-dead symptoms after resume-from-RAM, but it seems to work at
least 90% of the time.

2007-06-11 02:21:29

by Thomas Davis

Subject: Re: Jinxed VAIO wreckage - current state of affairs

Andrew Morton wrote:
> On Sat, 9 Jun 2007 22:59:49 +0200 "Rafael J. Wysocki" wrote:
>
>> Hmm, how's 2.6.22-rc4-mm2 doing on the Vaio?
>
> People would have heard if it was busted ;)
>
> Have seen occasional hangs in e100 resume-from-RAM, and occasional
> all-black-and-dead symptoms after resume-from-RAM, but it seems to work at
> least 90% of the time.

You're doing better than I am on my S580p.

If AHCI is loaded, the damn thing will not turn off. Goofy part is, another
Sony S80p at work does NOT have this problem - same BIOS, same drive
firmware.

Suspend to disk mostly works - sometimes when you return, the screen is
kinda wonky.

Suspend to ram - going in appears to work, coming out it's dead.

thomas

2007-06-15 12:06:15

by Thomas Gleixner

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Sat, 2007-06-09 at 20:08 -0700, Andrew Morton wrote:
> On Sat, 9 Jun 2007 22:59:49 +0200 "Rafael J. Wysocki" wrote:
>
> > Hmm, how's 2.6.22-rc4-mm2 doing on the Vaio?
>
> People would have heard if it was busted ;)

Just found a brown paper bag bug in the resume patch logic. Sigh, I was
staring at that code for months without noticing.

Andrew,

can you please test

http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt9.patch

Thanks,

tglx



2007-06-15 15:14:17

by Johannes Weiner

Subject: Re: Jinxed VAIO wreckage - current state of affairs

Hi Thomas,

On Fri, Jun 15, 2007 at 02:05:57PM +0200, Thomas Gleixner wrote:
> Just found a brown paper bag bug in the resume patch logic. Sigh, I was
> staring at that code for months without noticing.
>
> Andrew,
>
> can you please test
>
> http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt9.patch

404

2007-06-15 15:25:13

by Randy Dunlap

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Fri, 15 Jun 2007 14:05:57 +0200 Thomas Gleixner wrote:

> On Sat, 2007-06-09 at 20:08 -0700, Andrew Morton wrote:
> > On Sat, 9 Jun 2007 22:59:49 +0200 "Rafael J. Wysocki" wrote:
> >
> > > Hmm, how's 2.6.22-rc4-mm2 doing on the Vaio?
> >
> > People would have heard if it was busted ;)
>
> Just found a brown paper bag bug in the resume patch logic. Sigh, I was
> staring at that code for months without noticing.
>
> Andrew,
>
> can you please test
>
> http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt9.patch

bad URL, not found.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2007-06-15 15:31:21

by Thomas Gleixner

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Fri, 2007-06-15 at 08:23 -0700, Randy Dunlap wrote:
> On Fri, 15 Jun 2007 14:05:57 +0200 Thomas Gleixner wrote:
>
> > On Sat, 2007-06-09 at 20:08 -0700, Andrew Morton wrote:
> > > On Sat, 9 Jun 2007 22:59:49 +0200 "Rafael J. Wysocki" wrote:
> > >
> > > > Hmm, how's 2.6.22-rc4-mm2 doing on the Vaio?
> > >
> > > People would have heard if it was busted ;)
> >
> > Just found a brown paper bag bug in the resume patch logic. Sigh, I was
> > staring at that code for months without noticing.
> >
> > Andrew,
> >
> > can you please test
> >
> > http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt9.patch
>
> bad URL, not found.

http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt10.patch

tglx


2007-06-16 06:32:51

by Andrew Morton

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Fri, 15 Jun 2007 17:31:06 +0200 Thomas Gleixner wrote:

> On Fri, 2007-06-15 at 08:23 -0700, Randy Dunlap wrote:
> > On Fri, 15 Jun 2007 14:05:57 +0200 Thomas Gleixner wrote:
> >
> > > On Sat, 2007-06-09 at 20:08 -0700, Andrew Morton wrote:
> > > > On Sat, 9 Jun 2007 22:59:49 +0200 "Rafael J. Wysocki" wrote:
> > > >
> > > > > Hmm, how's 2.6.22-rc4-mm2 doing on the Vaio?
> > > >
> > > > People would have heard if it was busted ;)
> > >
> > > Just found a brown paper bag bug in the resume patch logic. Sigh, I was
> > > staring at that code for months without noticing.
> > >
> > > Andrew,
> > >
> > > can you please test
> > >
> > > http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt9.patch
> >
> > bad URL, not found.
>
> http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt10.patch
>

That suspends and resumes OK.

What was the bug?

2007-06-16 06:47:44

by Thomas Gleixner

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Fri, 2007-06-15 at 23:31 -0700, Andrew Morton wrote:
> >
> > http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt10.patch
> >
>
> That suspends and resumes OK.
>
> What was the bug?

A stupid state check, which prevented the PIT from being set up again.
So the box got stuck waiting for a timer to expire.
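
For readers without the patch at hand, here is a toy model of how a
state check can have that effect. It is an assumption-laden sketch and
not the actual hrtimer/clockevents code: the thread does not show the
check itself, and every name below (toy_timer, toy_set_mode,
toy_suspend, toy_resume_buggy, toy_resume_fixed) is hypothetical. The
idea is that the cached software mode survives suspend while the
hardware state does not, so a "mode unchanged, nothing to do" shortcut
skips the reprogramming on resume.

```c
#include <stdbool.h>

enum toy_mode { TOY_MODE_UNUSED, TOY_MODE_SHUTDOWN, TOY_MODE_PERIODIC };

struct toy_timer {
	enum toy_mode sw_mode;	/* what the software believes is set */
	bool hw_programmed;	/* whether the "hardware" is really ticking */
};

/* Program the "hardware" only if the cached mode differs. */
void toy_set_mode(struct toy_timer *t, enum toy_mode mode)
{
	if (t->sw_mode == mode)
		return;		/* state check: assumes hardware matches cache */
	t->sw_mode = mode;
	t->hw_programmed = (mode == TOY_MODE_PERIODIC);
}

/* Suspend wipes the hardware state but leaves the cached mode alone. */
void toy_suspend(struct toy_timer *t)
{
	t->hw_programmed = false;
}

/* Buggy resume: trusts the stale cache, so nothing gets reprogrammed. */
void toy_resume_buggy(struct toy_timer *t)
{
	toy_set_mode(t, TOY_MODE_PERIODIC);
}

/* One possible fix (an assumption): invalidate the cache first. */
void toy_resume_fixed(struct toy_timer *t)
{
	t->sw_mode = TOY_MODE_UNUSED;
	toy_set_mode(t, TOY_MODE_PERIODIC);
}
```

After toy_suspend() followed by toy_resume_buggy(), hw_programmed stays
false, which is the toy equivalent of a box waiting for a timer that
never fires. toy_resume_fixed() forces the reprogramming.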

Still, I cannot explain why this resulted in this strange "disappear in
the return instruction" behavior.

tglx


2007-06-16 06:53:42

by Thomas Gleixner

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Sat, 2007-06-16 at 08:47 +0200, Thomas Gleixner wrote:
> On Fri, 2007-06-15 at 23:31 -0700, Andrew Morton wrote:
> > >
> > > http://www.tglx.de/projects/hrtimers/2.6.22-rc4/patch-2.6.22-rc4-hrt10.patch
> > >
> >
> > That suspends and resumes OK.
> >
> > What was the bug?
>
> A stupid state check, which prevented the PIT from being set up again.
> So the box got stuck waiting for a timer to expire.
>
> Still, I cannot explain why this resulted in this strange "disappear in
> the return instruction" behavior.

I put up a fixed patch series against rc4-mm2 at:

http://www.tglx.de/projects/hrtimers/2.6.22-rc4-mm2/patch-2.6.22-rc4-mm2-hrt3.patches.tar.bz2

tglx


2007-06-16 07:20:48

by Andrew Morton

Subject: Re: Jinxed VAIO wreckage - current state of affairs

On Sat, 16 Jun 2007 08:53:33 +0200 Thomas Gleixner wrote:

> > Still, I cannot explain why this resulted in this strange "disappear in
> > the return instruction" behavior.
>
> I put up a fixed patch series against rc4-mm2 at:
>
> http://www.tglx.de/projects/hrtimers/2.6.22-rc4-mm2/patch-2.6.22-rc4-mm2-hrt3.patches.tar.bz2

I expect that everyone's forgotten what they do. It wouldn't hurt to send
them all out in the usual fashion.

btw, I have a huge backlog here (almost two days' worth!) and I'll be only
intermittently up for a couple of weeks. There will be some delays,
sorry.