Hi Chen Yu,
On Sun, Sep 25, 2016 at 12:17:57PM +0800, Chen Yu wrote:
> On some platforms, a panic is occasionally triggered when trying to
> resume from hibernation. A typical panic looks like:
>
> "BUG: unable to handle kernel paging request at ffff880085894000
> IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
>
> Investigation carried out by Lee Chun-Yi shows that this is because
> the e820 map was changed by the BIOS across hibernation, and one
> of the page frames from the suspend kernel lies in the restore
> kernel's unmapped region, so the panic is triggered when that unmapped
> kernel address is accessed.
>
Sorry, I could not get the machine that showed this issue back. So for
testing I added a patch that fools the kernel into seeing a changed e820
map on S4 resume.
> In order to expose this issue earlier, the md5 hash of the e820 map
> is passed from the suspend kernel to the restore kernel, and the
> restore kernel will terminate the resume process once it finds that
> the md5 hashes are not the same.
>
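To make the idea above concrete, here is a minimal userspace sketch of
"save a digest of the map at suspend, recompute and compare at restore".
It is not the code from the patch: struct mock_e820_entry,
mock_e820_digest() and mock_e820_mismatch() are hypothetical names, and
a 64-bit FNV-1a hash stands in for md5 only so that the example needs no
crypto library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one e820 entry. */
struct mock_e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

#define MOCK_DIGEST_SIZE 8

/* Fold a byte range into a 64-bit FNV-1a hash (stand-in for md5). */
static void mock_feed(uint64_t *h, const void *p, size_t len)
{
	const uint8_t *b = p;
	size_t i;

	for (i = 0; i < len; i++) {
		*h ^= b[i];
		*h *= 0x100000001b3ULL;
	}
}

/* Digest the map field by field so struct padding is never hashed. */
void mock_e820_digest(const struct mock_e820_entry *map, size_t n,
		      uint8_t out[MOCK_DIGEST_SIZE])
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	assert(map != NULL || n == 0);
	for (i = 0; i < n; i++) {
		mock_feed(&h, &map[i].addr, sizeof(map[i].addr));
		mock_feed(&h, &map[i].size, sizeof(map[i].size));
		mock_feed(&h, &map[i].type, sizeof(map[i].type));
	}
	memcpy(out, &h, MOCK_DIGEST_SIZE);
}

/* Return 1 if the current map no longer matches the saved digest. */
int mock_e820_mismatch(const uint8_t saved[MOCK_DIGEST_SIZE],
		       const struct mock_e820_entry *map, size_t n)
{
	uint8_t now[MOCK_DIGEST_SIZE];

	mock_e820_digest(map, n, now);
	return memcmp(saved, now, MOCK_DIGEST_SIZE) != 0;
}
```

In this sketch the suspend side would call mock_e820_digest() and store
the result in the image header, and the restore side would call
mock_e820_mismatch() against its own map before touching any image page.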
[...snip]
> ---
> arch/x86/power/hibernate_64.c | 92 ++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 90 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index 9634557..d81b1af 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -11,6 +11,10 @@
> #include <linux/gfp.h>
> #include <linux/smp.h>
> #include <linux/suspend.h>
> +#include <linux/scatterlist.h>
> +#include <linux/kdebug.h>
[...snip]
> @@ -216,5 +297,12 @@ int arch_hibernation_header_restore(void *addr)
> restore_jump_address = rdr->jump_address;
> jump_address_phys = rdr->jump_address_phys;
> restore_cr3 = rdr->cr3;
> - return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
> +
> + if (rdr->magic != RESTORE_MAGIC)
> + return -EINVAL;
> +
> + if (hibernation_e820_mismatch(rdr->e820_digest))
> + return -ENODEV;
> +
> + return 0;
> }
> --
Because the check_image_kernel() function does not look at which error
was returned, the kernel only shows "PM: Image mismatch: architecture
specific data". That one message covers two different failure reasons.
I suggest printing a log message, like the restore function in the ARM64
architecture does. Something like this, please feel free to modify the
wording:
Index: linux/arch/x86/power/hibernate_64.c
===================================================================
--- linux.orig/arch/x86/power/hibernate_64.c
+++ linux/arch/x86/power/hibernate_64.c
@@ -298,11 +298,16 @@ int arch_hibernation_header_restore(void
jump_address_phys = rdr->jump_address_phys;
restore_cr3 = rdr->cr3;
- if (rdr->magic != RESTORE_MAGIC)
+
+ if (rdr->magic != RESTORE_MAGIC) {
+ pr_crit("Hibernate image not generated by this kernel!\n");
return -EINVAL;
+ }
- if (hibernation_e820_mismatch(rdr->e820_digest))
+ if (hibernation_e820_mismatch(rdr->e820_digest)) {
+ pr_crit("The e820 saved regions changed!\n");
return -ENODEV;
+ }
return 0;
}
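To show why the extra messages help, here is a small userspace sketch of
the situation. Everything prefixed with mock_ is hypothetical and only
models the behaviour described above: fprintf() stands in for pr_crit(),
MOCK_RESTORE_MAGIC is an arbitrary value, and mock_check_image() stands
in for a caller that only tests the return value for non-zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Arbitrary value for this sketch, not the kernel's RESTORE_MAGIC. */
#define MOCK_RESTORE_MAGIC 0x1122334455667788ULL

/* Hypothetical, cut-down image header. */
struct mock_restore_data {
	unsigned long long magic;
	int e820_changed;	/* stands in for the digest comparison */
};

/* Each failure gets its own message and its own error code. */
int mock_header_restore(const struct mock_restore_data *rdr)
{
	assert(rdr != NULL);

	if (rdr->magic != MOCK_RESTORE_MAGIC) {
		fprintf(stderr, "Hibernate image not generated by this kernel!\n");
		return -EINVAL;
	}

	if (rdr->e820_changed) {
		fprintf(stderr, "The e820 saved regions changed!\n");
		return -ENODEV;
	}

	return 0;
}

/* A caller that only checks for non-zero loses the distinction. */
const char *mock_check_image(const struct mock_restore_data *rdr)
{
	if (mock_header_restore(rdr))
		return "architecture specific data";
	return NULL;
}
```

With such a caller both failures end up as the same generic reason
string, so the message printed inside the restore function is the only
place where the user can see which check actually failed.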
The other parts of your patch look good to me.
Thanks a lot!
Joey Lee
Hi Joey,
On Sat, Oct 08, 2016 at 12:31:08AM +0800, joeyli wrote:
> Hi Chen Yu,
>
> On Sun, Sep 25, 2016 at 12:17:57PM +0800, Chen Yu wrote:
> > On some platforms, a panic is occasionally triggered when trying to
> > resume from hibernation. A typical panic looks like:
> >
> > "BUG: unable to handle kernel paging request at ffff880085894000
> > IP: [<ffffffff810c5dc2>] load_image_lzo+0x8c2/0xe70"
> >
> > Investigation carried out by Lee Chun-Yi shows that this is because
> > the e820 map was changed by the BIOS across hibernation, and one
> > of the page frames from the suspend kernel lies in the restore
> > kernel's unmapped region, so the panic is triggered when that unmapped
> > kernel address is accessed.
> >
>
> Sorry, I could not get the machine that showed this issue back. So for
> testing I added a patch that fools the kernel into seeing a changed e820
> map on S4 resume.
>
> > In order to expose this issue earlier, the md5 hash of the e820 map
> > is passed from the suspend kernel to the restore kernel, and the
> > restore kernel will terminate the resume process once it finds that
> > the md5 hashes are not the same.
> >
> [...snip]
> > ---
> > arch/x86/power/hibernate_64.c | 92 ++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 90 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> > index 9634557..d81b1af 100644
> > --- a/arch/x86/power/hibernate_64.c
> > +++ b/arch/x86/power/hibernate_64.c
> > @@ -11,6 +11,10 @@
> > #include <linux/gfp.h>
> > #include <linux/smp.h>
> > #include <linux/suspend.h>
> > +#include <linux/scatterlist.h>
> > +#include <linux/kdebug.h>
>
> [...snip]
>
> > @@ -216,5 +297,12 @@ int arch_hibernation_header_restore(void *addr)
> > restore_jump_address = rdr->jump_address;
> > jump_address_phys = rdr->jump_address_phys;
> > restore_cr3 = rdr->cr3;
> > - return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
> > +
> > + if (rdr->magic != RESTORE_MAGIC)
> > + return -EINVAL;
> > +
> > + if (hibernation_e820_mismatch(rdr->e820_digest))
> > + return -ENODEV;
> > +
> > + return 0;
> > }
> > --
>
> Because the check_image_kernel() function does not look at which error
> was returned, the kernel only shows "PM: Image mismatch: architecture
> specific data". That one message covers two different failure reasons.
>
> I suggest printing a log message, like the restore function in the ARM64
> architecture does. Something like this, please feel free to modify the
> wording:
>
> Index: linux/arch/x86/power/hibernate_64.c
> ===================================================================
> --- linux.orig/arch/x86/power/hibernate_64.c
> +++ linux/arch/x86/power/hibernate_64.c
> @@ -298,11 +298,16 @@ int arch_hibernation_header_restore(void
> jump_address_phys = rdr->jump_address_phys;
> restore_cr3 = rdr->cr3;
>
> - if (rdr->magic != RESTORE_MAGIC)
> +
> + if (rdr->magic != RESTORE_MAGIC) {
> + pr_crit("Hibernate image not generated by this kernel!\n");
> return -EINVAL;
> + }
>
> - if (hibernation_e820_mismatch(rdr->e820_digest))
> + if (hibernation_e820_mismatch(rdr->e820_digest)) {
> + pr_crit("The e820 saved regions changed!\n");
> return -ENODEV;
> + }
>
> return 0;
> }
>
OK, I will refresh the patch after 4.9-rc1 is released, because of a
recent e820 modification.
Thanks,
Yu