2015-04-02 15:28:09

by Pavel Machek

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Wed 2015-04-01 21:47:43, rhn wrote:
> Hello,
>
> Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
>
> The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
>
> I have tracked the problem to first appear in the commit
> e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
>
> The problem itself manifests in dmesg as follows (system was first
> restarted, then hibernated - this log is from the subsequent
> resume):

Ok, can you try to disable cpufreq and cpuidle, and then check whether it
still reproduces?

At that point, this is the candidate:

commit e67ee10190e69332f929bdd6594a312363321a66
Merge: 21c806d 84c91b7 39c8bba 372ba8c
Author: Rafael J. Wysocki <[email protected]>
Date: Mon Aug 11 23:19:48 2014 +0200

Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'

* pm-sleep:
PM / hibernate: avoid unsafe pages in e820 reserved regions

...
Alternatively, you can just try to revert

commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
Author: Lee, Chun-Yi <[email protected]>
Date: Mon Aug 4 23:23:21 2014 +0800

PM / hibernate: avoid unsafe pages in e820 reserved regions

When the machine doesn't well handle the e820 persistent when hibernate
resuming, then it may cause page fault when writing image to snapshot
buffer:


...

Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2015-04-02 16:51:08

by joeyli

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

Hi,

On Thu, Apr 02, 2015 at 05:28:05PM +0200, Pavel Machek wrote:
> On Wed 2015-04-01 21:47:43, rhn wrote:
> > Hello,
> >
> > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> >
> > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> >
> > I have tracked the problem to first appear in the commit
> > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> >
> > The problem itself manifests in dmesg as follows (system was first
> > restarted, then hibernated - this log is from the subsequent
> > resume):
>
> Ok, can you try to disable cpufreq and cpuidle, and then try if it
> reproduces?
>
> At that point, this is the candidate:
>
> commit e67ee10190e69332f929bdd6594a312363321a66
> Merge: 21c806d 84c91b7 39c8bba 372ba8c
> Author: Rafael J. Wysocki <[email protected]>
> Date: Mon Aug 11 23:19:48 2014 +0200
>
> Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
>
> * pm-sleep:
> PM / hibernate: avoid unsafe pages in e820 reserved regions
>
> ...
> Alternatively, you can just try to revert
>
> commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> Author: Lee, Chun-Yi <[email protected]>
> Date: Mon Aug 4 23:23:21 2014 +0800
>
> PM / hibernate: avoid unsafe pages in e820 reserved regions
>
> When the machine doesn't well handle the e820 persistent when
> hibernate
> resuming, then it may cause page fault when writing image to
> snapshot
> buffer:
>
>
> ...
>
> Thanks,
> Pavel

Before reverting the 84c91b7ae patch, please check whether dmesg contains a log
line similar to the following when the hibernate resume fails:

[ 24.349777] PM: 0xab9bc000 in e820 nosave region: [mem 0xab9bc000-0xab9c2fff]

The address may differ, but you should see the "e820 nosave region" log.
Otherwise we have a different problem.
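
A minimal sketch of how such lines could be pulled out of a saved dmesg, assuming the message layout shown in the log line quoted above. The function name `nosave_hits` is hypothetical and not part of any kernel tooling.

```shell
# Print "address start end" for every "e820 nosave region" line on stdin.
# Assumed layout (taken from the log line quoted in this thread):
#   PM: 0xADDR in e820 nosave region: [mem 0xSTART-0xEND]
# nosave_hits is a hypothetical helper name.
nosave_hits() {
    sed -n 's/.*PM: \(0x[0-9a-f]*\) in e820 nosave region: \[mem \(0x[0-9a-f]*\)-\(0x[0-9a-f]*\)].*/\1 \2 \3/p'
}

# Typical use after a failed resume (reading dmesg may need root):
#   dmesg | nosave_hits
```

No output would suggest that the failure is not the one this message reports.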


Thanks a lot!
Joey Lee

2015-04-02 17:22:56

by joeyli

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Fri, Apr 03, 2015 at 12:50:54AM +0800, joeyli wrote:
> Hi,
>
> On Thu, Apr 02, 2015 at 05:28:05PM +0200, Pavel Machek wrote:
> > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > Hello,
> > >
> > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > >
> > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > >
> > > I have tracked the problem to first appear in the commit
> > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > >
> > > The problem itself manifests in dmesg as follows (system was first
> > > restarted, then hibernated - this log is from the subsequent
> > > resume):
> >
> > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > reproduces?
> >
> > At that point, this is the candidate:
> >
> > commit e67ee10190e69332f929bdd6594a312363321a66
> > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > Author: Rafael J. Wysocki <[email protected]>
> > Date: Mon Aug 11 23:19:48 2014 +0200
> >
> > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> >
> > * pm-sleep:
> > PM / hibernate: avoid unsafe pages in e820 reserved regions
> >
> > ...
> > Alternatively, you can just try to revert
> >
> > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > Author: Lee, Chun-Yi <[email protected]>
> > Date: Mon Aug 4 23:23:21 2014 +0800
> >
> > PM / hibernate: avoid unsafe pages in e820 reserved regions
> >
> > When the machine doesn't well handle the e820 persistent when
> > hibernate
> > resuming, then it may cause page fault when writing image to
> > snapshot
> > buffer:
> >
> >
> > ...
> >
> > Thanks,
> > Pavel
>
> Before revert 84c91b7ae patch, please check does there have log similar as
> following in dmesg when hibernate resume fail?
>
> [ 24.349777] PM: 0xab9bc000 in e820 nosave region: [mem 0xab9bc000-0xab9c2fff]
>
> The address may different, by you should see "e820 nosave region" log. Otherwise
> we got another problem.
>

I forgot to mention: please add "debug no_console_suspend=1 loglevel=9" to the
kernel parameters, then try to reproduce the issue and look at dmesg.
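
One way to confirm that the parameters actually reached the running kernel is to compare them against /proc/cmdline. The sketch below assumes the three parameters are spelled exactly as written above; `missing_params` is a hypothetical helper name.

```shell
# Print each requested debug parameter that is absent from the kernel
# command line given as $1 (normally the contents of /proc/cmdline).
# The parameter spellings are copied from this thread.
missing_params() {
    cmdline=" $1 "
    for p in debug no_console_suspend=1 loglevel=9; do
        case "$cmdline" in
            *" $p "*) ;;                 # parameter present, nothing to report
            *) printf '%s\n' "$p" ;;     # parameter missing
        esac
    done
}

# Usage on the booted system:
#   missing_params "$(cat /proc/cmdline)"
```

Empty output means all three parameters are present.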


Thanks a lot!
Joey Lee

2015-04-02 18:19:09

by rhn

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Fri, 3 Apr 2015 01:22:21 +0800
joeyli <[email protected]> wrote:

> On Fri, Apr 03, 2015 at 12:50:54AM +0800, joeyli wrote:
> > Hi,
> >
> > On Thu, Apr 02, 2015 at 05:28:05PM +0200, Pavel Machek wrote:
> > > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > > Hello,
> > > >
> > > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > > >
> > > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > > >
> > > > I have tracked the problem to first appear in the commit
> > > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > >
> > > > The problem itself manifests in dmesg as follows (system was first
> > > > restarted, then hibernated - this log is from the subsequent
> > > > resume):
> > >
> > > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > > reproduces?
> > >
> > > At that point, this is the candidate:
> > >
> > > commit e67ee10190e69332f929bdd6594a312363321a66
> > > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > > Author: Rafael J. Wysocki <[email protected]>
> > > Date: Mon Aug 11 23:19:48 2014 +0200
> > >
> > > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > >
> > > * pm-sleep:
> > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > >
> > > ...
> > > Alternatively, you can just try to revert
> > >
> > > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > > Author: Lee, Chun-Yi <[email protected]>
> > > Date: Mon Aug 4 23:23:21 2014 +0800
> > >
> > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > >
> > > When the machine doesn't well handle the e820 persistent when
> > > hibernate
> > > resuming, then it may cause page fault when writing image to
> > > snapshot
> > > buffer:
> > >
> > >
> > > ...
> > >
> > > Thanks,
> > > Pavel
> >
> > Before revert 84c91b7ae patch, please check does there have log similar as
> > following in dmesg when hibernate resume fail?
> >
> > [ 24.349777] PM: 0xab9bc000 in e820 nosave region: [mem 0xab9bc000-0xab9c2fff]
> >
> > The address may different, by you should see "e820 nosave region" log. Otherwise
> > we got another problem.
> >
>
> Forgot to mention, please add "debug no_console_suspend=1 loglevel=9" to kernel
> parameter then try to reproduce issue and look at dmesg.
>
>
> Thanks a lot!
> Joey Lee

Yes, it's present in dmesg when hibernate fails (default kernel params):
[ 3.138824] PM: 0x9d3d3000 in e820 nosave region: [mem 0x9d3d3000-0x9d3d3fff]

I probably didn't make it clear: the top dmesg in my original message was from the failed resume.

Cheers,
rhn

2015-04-03 01:23:54

by joeyli

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Thu, Apr 02, 2015 at 08:12:00PM +0200, rhn wrote:
> On Fri, 3 Apr 2015 01:22:21 +0800
> joeyli <[email protected]> wrote:
>
> > On Fri, Apr 03, 2015 at 12:50:54AM +0800, joeyli wrote:
> > > Hi,
> > >
> > > On Thu, Apr 02, 2015 at 05:28:05PM +0200, Pavel Machek wrote:
> > > > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > > > Hello,
> > > > >
> > > > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > > > >
> > > > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > > > >
> > > > > I have tracked the problem to first appear in the commit
> > > > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > > >
> > > > > The problem itself manifests in dmesg as follows (system was first
> > > > > restarted, then hibernated - this log is from the subsequent
> > > > > resume):
> > > >
> > > > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > > > reproduces?
> > > >
> > > > At that point, this is the candidate:
> > > >
> > > > commit e67ee10190e69332f929bdd6594a312363321a66
> > > > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > > > Author: Rafael J. Wysocki <[email protected]>
> > > > Date: Mon Aug 11 23:19:48 2014 +0200
> > > >
> > > > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > >
> > > > * pm-sleep:
> > > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > >
> > > > ...
> > > > Alternatively, you can just try to revert
> > > >
> > > > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > > > Author: Lee, Chun-Yi <[email protected]>
> > > > Date: Mon Aug 4 23:23:21 2014 +0800
> > > >
> > > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > >
> > > > When the machine doesn't well handle the e820 persistent when
> > > > hibernate
> > > > resuming, then it may cause page fault when writing image to
> > > > snapshot
> > > > buffer:
> > > >
> > > >
> > > > ...
> > > >
> > > > Thanks,
> > > > Pavel
> > >
> > > Before revert 84c91b7ae patch, please check does there have log similar as
> > > following in dmesg when hibernate resume fail?
> > >
> > > [ 24.349777] PM: 0xab9bc000 in e820 nosave region: [mem 0xab9bc000-0xab9c2fff]
> > >
> > > The address may different, by you should see "e820 nosave region" log. Otherwise
> > > we got another problem.
> > >
> >
> > Forgot to mention, please add "debug no_console_suspend=1 loglevel=9" to kernel
> > parameter then try to reproduce issue and look at dmesg.
> >
> >
> > Thanks a lot!
> > Joey Lee
>
> Yes, it's present in dmesg when hibernate fails (default kernel params):
> [ 3.138824] PM: 0x9d3d3000 in e820 nosave region: [mem 0x9d3d3000-0x9d3d3fff]
>

OK, then the message means that address 0x9d3d3000 is used by the image kernel
but lies in an e820 nosave region of the current boot. We need to check whether
this e820 region is used by setup_data and is therefore reserved as
E820_RESERVED_KERN.

I need your complete dmesg to verify the e820 table. If the above assumption is
true, then Yinghai Lu's patchset could fix this problem:

x86: Kill E820_RESERVED_KERN
https://lkml.org/lkml/2015/3/4/434

His patches are targeted for merging in v4.1.
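
As a rough aid for checking an address against the e820 table in a saved dmesg, the sketch below prints the firmware-provided entries that contain a given address. It assumes the usual `BIOS-e820: [mem 0xSTART-0xEND] type` boot-log layout and only looks at those lines, so a region the kernel itself marked later would not show up here. `e820_lookup` is a hypothetical helper name.

```shell
# Print every BIOS-e820 entry on stdin whose range contains address $1.
# Assumed boot-log layout:
#   BIOS-e820: [mem 0xSTART-0xEND] type
# Only firmware-provided entries are matched; e820_lookup is hypothetical.
e820_lookup() {
    addr=$(($1))
    sed -n 's/.*BIOS-e820: \[mem \(0x[0-9a-f]*\)-\(0x[0-9a-f]*\)] \(.*\)/\1 \2 \3/p' |
    while read -r start end type; do
        # Compare numerically; $((...)) converts the hex strings.
        if [ "$addr" -ge "$(($start))" ] && [ "$addr" -le "$(($end))" ]; then
            printf '%s-%s %s\n' "$start" "$end" "$type"
        fi
    done
}

# Usage with the address reported in this thread:
#   dmesg | e820_lookup 0x9d3d3000
```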

> I probably didn't make it clear - the top dmesg in my original message was from failed resume.
>
> Cheers,
> rhn

On another note, could you please check whether you are using platform mode to
turn off the machine when hibernating?

$ cat /sys/power/disk
[platform] shutdown reboot suspend
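
In that output the bracketed word is the currently selected mode. A small sketch for extracting it, assuming the bracket convention shown above; `selected_disk_mode` is a hypothetical helper name.

```shell
# Print the selected hibernation mode, i.e. the bracketed word in the
# contents of /sys/power/disk, which are passed in as $1.
# selected_disk_mode is a hypothetical helper name.
selected_disk_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z_]*\)].*/\1/p'
}

# Usage on the running system:
#   selected_disk_mode "$(cat /sys/power/disk)"
```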

And, if possible, please file a bug on bugzilla.kernel.org and give me the bug
number. I prefer to collect logs and the debugging history in bugzilla for
further tracking.


Thanks a lot!
Joey Lee

2015-04-03 15:58:58

by rhn

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Thu, 2 Apr 2015 17:28:05 +0200
Pavel Machek <[email protected]> wrote:

> On Wed 2015-04-01 21:47:43, rhn wrote:
> > Hello,
> >
> > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> >
> > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> >
> > I have tracked the problem to first appear in the commit
> > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> >
> > The problem itself manifests in dmesg as follows (system was first
> > restarted, then hibernated - this log is from the subsequent
> > resume):
>
> Ok, can you try to disable cpufreq and cpuidle, and then try if it
> reproduces?
>
> At that point, this is the candidate:
>
> commit e67ee10190e69332f929bdd6594a312363321a66
> Merge: 21c806d 84c91b7 39c8bba 372ba8c
> Author: Rafael J. Wysocki <[email protected]>
> Date: Mon Aug 11 23:19:48 2014 +0200
>
> Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
>
> * pm-sleep:
> PM / hibernate: avoid unsafe pages in e820 reserved regions
>
> ...
> Alternatively, you can just try to revert
>
> commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> Author: Lee, Chun-Yi <[email protected]>
> Date: Mon Aug 4 23:23:21 2014 +0800
>
> PM / hibernate: avoid unsafe pages in e820 reserved regions
>
> When the machine doesn't well handle the e820 persistent when
> hibernate
> resuming, then it may cause page fault when writing image to
> snapshot
> buffer:
>
>
> ...
>
> Thanks,
> Pavel

I tried to disable CONFIG_CPU_IDLE and CONFIG_CPU_FREQ; however, for some reason I could only disable CONFIG_CPU_FREQ.

The bug persisted.

Reverting the commit 84c91b7 on top of e67ee10 fixes the problem.
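
For anyone repeating this test, the revert can be sketched as below. This is an illustration rather than the exact commands used for the result above: the helper name `revert_on_top` and the scratch branch name are hypothetical, and the kernel tree is assumed to be a git checkout that contains both commits.

```shell
# Check out BASE on a scratch branch in the repository at REPO and
# revert COMMIT on top of it. Helper and branch names are hypothetical.
revert_on_top() {
    repo=$1 base=$2 commit=$3
    git -C "$repo" checkout -q -b "test-revert-$commit" "$base" &&
    git -C "$repo" revert --no-edit "$commit"
}

# For the combination reported above, with a kernel tree in ./linux:
#   revert_on_top ./linux e67ee10 84c91b7
```

The kernel then has to be rebuilt and booted before repeating the hibernate test.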

I created a copy of the bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=96111

Cheers,
rhn

2015-04-03 16:00:34

by rhn

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Fri, 3 Apr 2015 09:23:35 +0800
joeyli <[email protected]> wrote:

> On Thu, Apr 02, 2015 at 08:12:00PM +0200, rhn wrote:
> > On Fri, 3 Apr 2015 01:22:21 +0800
> > joeyli <[email protected]> wrote:
> >
> > > On Fri, Apr 03, 2015 at 12:50:54AM +0800, joeyli wrote:
> > > > Hi,
> > > >
> > > > On Thu, Apr 02, 2015 at 05:28:05PM +0200, Pavel Machek wrote:
> > > > > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > > > > Hello,
> > > > > >
> > > > > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > > > > >
> > > > > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > > > > >
> > > > > > I have tracked the problem to first appear in the commit
> > > > > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > > > >
> > > > > > The problem itself manifests in dmesg as follows (system was first
> > > > > > restarted, then hibernated - this log is from the subsequent
> > > > > > resume):
> > > > >
> > > > > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > > > > reproduces?
> > > > >
> > > > > At that point, this is the candidate:
> > > > >
> > > > > commit e67ee10190e69332f929bdd6594a312363321a66
> > > > > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > > > > Author: Rafael J. Wysocki <[email protected]>
> > > > > Date: Mon Aug 11 23:19:48 2014 +0200
> > > > >
> > > > > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > > >
> > > > > * pm-sleep:
> > > > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > > >
> > > > > ...
> > > > > Alternatively, you can just try to revert
> > > > >
> > > > > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > > > > Author: Lee, Chun-Yi <[email protected]>
> > > > > Date: Mon Aug 4 23:23:21 2014 +0800
> > > > >
> > > > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > > >
> > > > > When the machine doesn't well handle the e820 persistent when
> > > > > hibernate
> > > > > resuming, then it may cause page fault when writing image to
> > > > > snapshot
> > > > > buffer:
> > > > >
> > > > >
> > > > > ...
> > > > >
> > > > > Thanks,
> > > > > Pavel
> > > >
> > > > Before revert 84c91b7ae patch, please check does there have log similar as
> > > > following in dmesg when hibernate resume fail?
> > > >
> > > > [ 24.349777] PM: 0xab9bc000 in e820 nosave region: [mem 0xab9bc000-0xab9c2fff]
> > > >
> > > > The address may different, by you should see "e820 nosave region" log. Otherwise
> > > > we got another problem.
> > > >
> > >
> > > Forgot to mention, please add "debug no_console_suspend=1 loglevel=9" to kernel
> > > parameter then try to reproduce issue and look at dmesg.
> > >
> > >
> > > Thanks a lot!
> > > Joey Lee
> >
> > Yes, it's present in dmesg when hibernate fails (default kernel params):
> > [ 3.138824] PM: 0x9d3d3000 in e820 nosave region: [mem 0x9d3d3000-0x9d3d3fff]
> >
>
> OK, then the message means 0x9d3d3000 address used by image kernel but in e820
> region of current boot. Need check does this e820 region used by setup_data so
> reserved as E820_RESERVED_KERN.
>
> Need your complete dmesg to verify the e820 table. If the above assumption is
> true, then Yinghai Lu's patchset could fix this problem:
>
> x86: Kill E820_RESERVED_KERN
> https://lkml.org/lkml/2015/3/4/434
>
> The target kernel version to merge his patches is v4.1
>
> > I probably didn't make it clear - the top dmesg in my original message was from failed resume.
> >
> > Cheers,
> > rhn
>
> On the other hand,
> Could you please check you are using platform mode to turn off machine for
> hibernating?
>
> $ cat /sys/power/disk
> [platform] shutdown reboot suspend
>
> And, if possible, please file bug on bugzilla.kernel.org and give me the bug
> number. I prefer collect log and debugging history in bugzilla for further
> tracking.
>
>
> Thanks a lot!
> Joey Lee

Yes, platform mode was used in all instances - both working and broken kernels.

I included full dmesg in the bug report on bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=96111

Cheers,
rhn

2015-04-03 16:40:39

by Pavel Machek

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

Hi!

> > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > >
> > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > >
> > > I have tracked the problem to first appear in the commit
> > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > >
> > > The problem itself manifests in dmesg as follows (system was first
> > > restarted, then hibernated - this log is from the subsequent
> > > resume):
> >
> > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > reproduces?
> >
> > At that point, this is the candidate:
> >
> > commit e67ee10190e69332f929bdd6594a312363321a66
> > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > Author: Rafael J. Wysocki <[email protected]>
> > Date: Mon Aug 11 23:19:48 2014 +0200
> >
> > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> >
> > * pm-sleep:
> > PM / hibernate: avoid unsafe pages in e820 reserved regions
> >
> > ...
> > Alternatively, you can just try to revert
> >
> > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > Author: Lee, Chun-Yi <[email protected]>
> > Date: Mon Aug 4 23:23:21 2014 +0800
> >
> > PM / hibernate: avoid unsafe pages in e820 reserved regions
> >
> > When the machine doesn't well handle the e820 persistent when
> > hibernate
> > resuming, then it may cause page fault when writing image to
> > snapshot
> > buffer:
> >
> >
> > ...
> >
> > Thanks,
> > Pavel
>
> I tried to disable CONFIG_CPU_IDLE and CONFIG_CPU_FREQ, however for some reason I could only disable CONFIG_CPU_FREQ.
>
> The bug persisted.
>
> Reverting the commit 84c91b7 on top of e67ee10 fixes the problem.

Ok, I guess the next steps would be to verify whether 4.0 has the problem, and
whether reverting 84c91b7 there fixes it, too... maybe we should revert it for
4.0?


Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-04-03 21:19:15

by Rafael J. Wysocki

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Friday, April 03, 2015 05:58:25 PM rhn wrote:
> On Thu, 2 Apr 2015 17:28:05 +0200
> Pavel Machek <[email protected]> wrote:
>
> > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > Hello,
> > >
> > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > >
> > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > >
> > > I have tracked the problem to first appear in the commit
> > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > >
> > > The problem itself manifests in dmesg as follows (system was first
> > > restarted, then hibernated - this log is from the subsequent
> > > resume):
> >
> > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > reproduces?
> >
> > At that point, this is the candidate:
> >
> > commit e67ee10190e69332f929bdd6594a312363321a66
> > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > Author: Rafael J. Wysocki <[email protected]>
> > Date: Mon Aug 11 23:19:48 2014 +0200
> >
> > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> >
> > * pm-sleep:
> > PM / hibernate: avoid unsafe pages in e820 reserved regions
> >
> > ...
> > Alternatively, you can just try to revert
> >
> > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > Author: Lee, Chun-Yi <[email protected]>
> > Date: Mon Aug 4 23:23:21 2014 +0800
> >
> > PM / hibernate: avoid unsafe pages in e820 reserved regions
> >
> > When the machine doesn't well handle the e820 persistent when
> > hibernate
> > resuming, then it may cause page fault when writing image to
> > snapshot
> > buffer:
> >
> >
> > ...
> >
> > Thanks,
> > Pavel
>
> I tried to disable CONFIG_CPU_IDLE and CONFIG_CPU_FREQ, however for some reason I could only disable CONFIG_CPU_FREQ.
>
> The bug persisted.
>
> Reverting the commit 84c91b7 on top of e67ee10 fixes the problem.
>
> I created a copy of the bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=96111

Please check if 4.0-rc6 still has the problem and if reverting the commit in
question on top of it fixes the problem too.


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2015-04-04 08:13:04

by rhn

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Fri, 03 Apr 2015 23:43:30 +0200
"Rafael J. Wysocki" <[email protected]> wrote:

> On Friday, April 03, 2015 05:58:25 PM rhn wrote:
> > On Thu, 2 Apr 2015 17:28:05 +0200
> > Pavel Machek <[email protected]> wrote:
> >
> > > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > > Hello,
> > > >
> > > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > > >
> > > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > > >
> > > > I have tracked the problem to first appear in the commit
> > > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > >
> > > > The problem itself manifests in dmesg as follows (system was first
> > > > restarted, then hibernated - this log is from the subsequent
> > > > resume):
> > >
> > > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > > reproduces?
> > >
> > > At that point, this is the candidate:
> > >
> > > commit e67ee10190e69332f929bdd6594a312363321a66
> > > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > > Author: Rafael J. Wysocki <[email protected]>
> > > Date: Mon Aug 11 23:19:48 2014 +0200
> > >
> > > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > >
> > > * pm-sleep:
> > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > >
> > > ...
> > > Alternatively, you can just try to revert
> > >
> > > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > > Author: Lee, Chun-Yi <[email protected]>
> > > Date: Mon Aug 4 23:23:21 2014 +0800
> > >
> > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > >
> > > When the machine doesn't well handle the e820 persistent when
> > > hibernate
> > > resuming, then it may cause page fault when writing image to
> > > snapshot
> > > buffer:
> > >
> > >
> > > ...
> > >
> > > Thanks,
> > > Pavel
> >
> > I tried to disable CONFIG_CPU_IDLE and CONFIG_CPU_FREQ, however for some reason I could only disable CONFIG_CPU_FREQ.
> >
> > The bug persisted.
> >
> > Reverting the commit 84c91b7 on top of e67ee10 fixes the problem.
> >
> > I created a copy of the bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=96111
>
> Please check if 4.0-rc6 still has the problem and if reverting the commit in
> question on top of it fixes the problem too.
>
>

I took the commit 8f778bbc542ddf8f6243b21d6aca087e709cabdc as the base for further checking (I started building before I read your message). It's a descendant of 4.0-rc6, so I hope it's not going to make a difference.

Results:
8f778bb : bad
8f778bb + reverted 84c91b7 : good
8f778bb + patch [1] : good

Thanks!

[1]:
x86: Kill E820_RESERVED_KERN https://lkml.org/lkml/2015/3/4/434 as suggested in joeyli's other email.

2015-04-05 07:25:02

by joeyli

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

Hi Rafael,

On Sat, Apr 04, 2015 at 10:12:43AM +0200, rhn wrote:
> On Fri, 03 Apr 2015 23:43:30 +0200
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > On Friday, April 03, 2015 05:58:25 PM rhn wrote:
> > > On Thu, 2 Apr 2015 17:28:05 +0200
> > > Pavel Machek <[email protected]> wrote:
> > >
> > > > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > > > Hello,
> > > > >
> > > > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > > > >
> > > > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > > > >
> > > > > I have tracked the problem to first appear in the commit
> > > > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > > >
> > > > > The problem itself manifests in dmesg as follows (system was first
> > > > > restarted, then hibernated - this log is from the subsequent
> > > > > resume):
> > > >
> > > > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > > > reproduces?
> > > >
> > > > At that point, this is the candidate:
> > > >
> > > > commit e67ee10190e69332f929bdd6594a312363321a66
> > > > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > > > Author: Rafael J. Wysocki <[email protected]>
> > > > Date: Mon Aug 11 23:19:48 2014 +0200
> > > >
> > > > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > >
> > > > * pm-sleep:
> > > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > >
> > > > ...
> > > > Alternatively, you can just try to revert
> > > >
> > > > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > > > Author: Lee, Chun-Yi <[email protected]>
> > > > Date: Mon Aug 4 23:23:21 2014 +0800
> > > >
> > > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > >
> > > > When the machine doesn't well handle the e820 persistent when
> > > > hibernate
> > > > resuming, then it may cause page fault when writing image to
> > > > snapshot
> > > > buffer:
> > > >
> > > >
> > > > ...
> > > >
> > > > Thanks,
> > > > Pavel
> > >
> > > I tried to disable CONFIG_CPU_IDLE and CONFIG_CPU_FREQ, however for some reason I could only disable CONFIG_CPU_FREQ.
> > >
> > > The bug persisted.
> > >
> > > Reverting the commit 84c91b7 on top of e67ee10 fixes the problem.
> > >
> > > I created a copy of the bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=96111
> >
> > Please check if 4.0-rc6 still has the problem and if reverting the commit in
> > question on top of it fixes the problem too.
> >
> >
>
> I took the commit 8f778bbc542ddf8f6243b21d6aca087e709cabdc as the base for further checking (I started building before I read your message). It's a descendant of 4.0-rc6, so I hope it's not going to make a difference.
>
> Results:
> 8f778bb : bad
> 8f778bb + reverted 84c91b7 : good
> 8f778bb + patch [1] : good

Thanks for your dmesg on bko#96111.
I checked it and can confirm that setup_data is reserved as E820_RESERVED_KERN in this case.
I will add a comment on bugzilla.

>
> Thanks!
>
> [1]:
> x86: Kill E820_RESERVED_KERN https://lkml.org/lkml/2015/3/4/434 as suggested in joeyli's other email.

I think we should just revert 84c91b7ae until Yinghai Lu's patches are merged in v4.1.
I will resend the 84c91b7ae patch once Yinghai Lu's patches are merged.


Regards
Joey Lee

2015-04-05 07:26:35

by joeyli

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Fri, Apr 03, 2015 at 11:43:30PM +0200, Rafael J. Wysocki wrote:
> On Friday, April 03, 2015 05:58:25 PM rhn wrote:
> > On Thu, 2 Apr 2015 17:28:05 +0200
> > Pavel Machek <[email protected]> wrote:
> >
> > > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > > Hello,
> > > >
> > > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > > >
> > > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > > >
> > > > I have tracked the problem to first appear in the commit
> > > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > >
> > > > The problem itself manifests in dmesg as follows (system was first
> > > > restarted, then hibernated - this log is from the subsequent resume):
> > >
> > > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > > reproduces?
> > >
> > > At that point, this is the candidate:
> > >
> > > commit e67ee10190e69332f929bdd6594a312363321a66
> > > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > > Author: Rafael J. Wysocki
> > > Date: Mon Aug 11 23:19:48 2014 +0200
> > >
> > > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > >
> > > * pm-sleep:
> > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > >
> > > ...
> > > Alternatively, you can just try to revert
> > >
> > > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > > Author: Lee, Chun-Yi
> > > Date: Mon Aug 4 23:23:21 2014 +0800
> > >
> > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > >
> > > When the machine doesn't well handle the e820 persistent when hibernate
> > > resuming, then it may cause page fault when writing image to snapshot
> > > buffer:
> > >
> > >
> > > ...
> > >
> > > Thanks,
> > > Pavel
> >
> > I tried to disable CONFIG_CPU_IDLE and CONFIG_CPU_FREQ, however for some reason I could only disable CONFIG_CPU_FREQ.
> >
> > The bug persisted.
> >
> > Reverting the commit 84c91b7 on top of e67ee10 fixes the problem.
> >
> > I created a copy of the bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=96111
>
> Please check if 4.0-rc6 still has the problem and if reverting the commit in
> question on top of it fixes the problem too.
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

I think we should just revert 84c91b7ae until Yinghai Lu's patches are merged in v4.1.
I will resend the 84c91b7ae patch once Yinghai Lu's patches are merged.


Regards
Joey Lee

2015-04-05 07:51:23

by Yinghai Lu

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Sun, Apr 5, 2015 at 12:24 AM, joeyli wrote:
>> >
>>
>> I took the commit 8f778bbc542ddf8f6243b21d6aca087e709cabdc as the base for further checking (I started building before I read your message). It's a descendant of 4.0-rc6, so I hope it's not going to make a difference.
>>
>> Results:
>> 8f778bb : bad
>> 8f778bb + reverted 84c91b7 : good
>> 8f778bb + patch [1] : good
>
> Thanks for your dmesg on bko#96111.
> I checked and can confirm that setup_data is reserved as E820_RESERVED_KERN there.
> I will add a comment on bugzilla.
>
>>
>> Thanks!
>>
>> [1]:
>> x86: Kill E820_RESERVED_KERN https://lkml.org/lkml/2015/3/4/434 as suggested in joeyli's other email.
>
> I think we should just revert 84c91b7ae until Yinghai Lu's patches are merged in v4.1.
> I will resend the 84c91b7ae patch once Yinghai Lu's patches are merged.

Can you please put https://lkml.org/lkml/2015/3/4/434
into tip?

Thanks

Yinghai

2015-04-06 07:12:44

by Ingo Molnar

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)


* Yinghai Lu wrote:

> On Sun, Apr 5, 2015 at 12:24 AM, joeyli wrote:
> >> >
> >>
> >> I took the commit 8f778bbc542ddf8f6243b21d6aca087e709cabdc as the base for further checking (I started building before I read your message). It's a descendant of 4.0-rc6, so I hope it's not going to make a difference.
> >>
> >> Results:
> >> 8f778bb : bad
> >> 8f778bb + reverted 84c91b7 : good
> >> 8f778bb + patch [1] : good
> >
> > Thanks for your dmesg on bko#96111.
> > I checked and can confirm that setup_data is reserved as E820_RESERVED_KERN there.
> > I will add a comment on bugzilla.
> >
> >>
> >> Thanks!
> >>
> >> [1]:
> >> x86: Kill E820_RESERVED_KERN https://lkml.org/lkml/2015/3/4/434 as suggested in joeyli's other email.
> >
> > I think we should just revert 84c91b7ae until Yinghai Lu's patches are merged in v4.1.
> > I will resend the 84c91b7ae patch once Yinghai Lu's patches are merged.
>
> Can you please put https://lkml.org/lkml/2015/3/4/434
> into tip?

I cannot apply this patch without a readable changelog, see:

http://lkml.iu.edu/hypermail/linux/kernel/1503.1/05342.html

Your changelog (again) violates about half of the principles I tried
to outline in that post.

Thanks,

Ingo

2015-04-06 23:03:45

by Rafael J. Wysocki

Subject: Re: Unreliable hibernation on Lenovo x230 (regression)

On Sunday, April 05, 2015 03:26:13 PM joeyli wrote:
> On Fri, Apr 03, 2015 at 11:43:30PM +0200, Rafael J. Wysocki wrote:
> > On Friday, April 03, 2015 05:58:25 PM rhn wrote:
> > > On Thu, 2 Apr 2015 17:28:05 +0200
> > > Pavel Machek wrote:
> > >
> > > > On Wed 2015-04-01 21:47:43, rhn wrote:
> > > > > Hello,
> > > > >
> > > > > Between kernel 3.16 and 3.17, a regression has been introduced where the first hibernation after regular shutdown always fails to resume. Subsequent hibernations succeed.
> > > > >
> > > > > The system is a Lenovo x230 with Intel i5, booting with EFI, with the hibernate partition located on a secondary SSD drive. Installed system is Fedora 20, hibernation and reboots were issued using the KDE shutdown dialog.
> > > > >
> > > > > I have tracked the problem to first appear in the commit
> > > > > e67ee10190e69332f929bdd6594a312363321a66 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > > >
> > > > > The problem itself manifests in dmesg as follows (system was first
> > > > > restarted, then hibernated - this log is from the subsequent resume):
> > > >
> > > > Ok, can you try to disable cpufreq and cpuidle, and then try if it
> > > > reproduces?
> > > >
> > > > At that point, this is the candidate:
> > > >
> > > > commit e67ee10190e69332f929bdd6594a312363321a66
> > > > Merge: 21c806d 84c91b7 39c8bba 372ba8c
> > > > Author: Rafael J. Wysocki
> > > > Date: Mon Aug 11 23:19:48 2014 +0200
> > > >
> > > > Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'
> > > >
> > > > * pm-sleep:
> > > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > >
> > > > ...
> > > > Alternatively, you can just try to revert
> > > >
> > > > commit 84c91b7ae07c62cf6dee7fde3277f4be21331f85
> > > > Author: Lee, Chun-Yi
> > > > Date: Mon Aug 4 23:23:21 2014 +0800
> > > >
> > > > PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > >
> > > > When the machine doesn't well handle the e820 persistent when hibernate
> > > > resuming, then it may cause page fault when writing image to snapshot
> > > > buffer:
> > > >
> > > >
> > > > ...
> > > >
> > > > Thanks,
> > > > Pavel
> > >
> > > I tried to disable CONFIG_CPU_IDLE and CONFIG_CPU_FREQ, however for some reason I could only disable CONFIG_CPU_FREQ.
> > >
> > > The bug persisted.
> > >
> > > Reverting the commit 84c91b7 on top of e67ee10 fixes the problem.
> > >
> > > I created a copy of the bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=96111
> >
> > Please check if 4.0-rc6 still has the problem and if reverting the commit in
> > question on top of it fixes the problem too.
> >
> >
>
> I think we should just revert 84c91b7ae until Yinghai Lu's patches are merged in v4.1.
> I will resend the 84c91b7ae patch once Yinghai Lu's patches are merged.

OK, I'll queue up a revert of 84c91b7ae as a fix for 4.0.


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.