2018-02-27 12:56:04

by Woody Suwalski

Subject: 4.16 regression: s2ram broken on non-PAE i686

There is a problem with s2ram on 4.16, and it has now been propagated
to 4.15 and 4.14 stable updates.

It originates from

commit 62c00e6122a6b5aa7b1350023967a2d7a12b54c9
Author: William Grant <[email protected]>
Date: Tue Jan 30 22:22:55 2018 +1100

x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP

s2ram works OK on PAE kernels but breaks badly on non-PAE ones. I do not
think that the problem can be duplicated in VMPlayer, but it is 100%
reproducible on "real" hardware.
The system goes to sleep OK, but when woken it reboots the PC.

The issue is tracked in Bugzilla bug 198763
[https://bugzilla.kernel.org/show_bug.cgi?id=198763]

Thanks, Woody
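
For context, "s2ram" here means suspend-to-RAM. The snippet below is an
illustrative reproduction helper, not part of the original report: it simply
writes "mem" to /sys/power/state (the same interface that tools such as the
s2ram utility, pm-suspend or "systemctl suspend" end up using). On an
affected non-PAE i686 kernel the machine suspends normally and then reboots
on wake-up.

/* Illustrative only: request suspend-to-RAM via sysfs (run as root). */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/sys/power/state", O_WRONLY);

	if (fd < 0) {
		perror("open /sys/power/state");
		return 1;
	}
	/* The write blocks until the system resumes. */
	if (write(fd, "mem", 3) < 0)
		perror("write mem");
	close(fd);
	return 0;
}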


2018-02-28 09:00:35

by Thomas Gleixner

Subject: Re: 4.16 regression: s2ram broken on non-PAE i686

Woody,

On Tue, 27 Feb 2018, Woody Suwalski wrote:

> There is a problem with s2ram on 4.16, and it has now been propagated
> to 4.15 and 4.14 stable updates.
>
> It originates from
>
> commit 62c00e6122a6b5aa7b1350023967a2d7a12b54c9
> Author: William Grant <[email protected]>
> Date: Tue Jan 30 22:22:55 2018 +1100
>
> x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP
>
> s2ram works OK on PAE kernels but breaks badly on non-PAE ones. I do not
> think that the problem can be duplicated in VMPlayer, but it is 100%
> reproducible on "real" hardware.
> The system goes to sleep OK, but when woken it reboots the PC.
>
> The issue is tracked in Bugzilla bug 198763
> [https://bugzilla.kernel.org/show_bug.cgi?id=198763]

Thanks for digging into this so far. Can you please provide dmesg output
from a PAE=y and PAE=n kernel after boot?

Thanks,

tglx

2018-02-28 10:47:53

by Woody Suwalski

Subject: Re: 4.16 regression: s2ram broken on non-PAE i686

Thomas Gleixner wrote:
> Woody,
>
> On Tue, 27 Feb 2018, Woody Suwalski wrote:
>
>> There is a problem with s2ram on 4.16, and it has now been propagated
>> to 4.15 and 4.14 stable updates.
>>
>> It originates from
>>
>> commit 62c00e6122a6b5aa7b1350023967a2d7a12b54c9
>> Author: William Grant <[email protected]>
>> Date: Tue Jan 30 22:22:55 2018 +1100
>>
>> x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP
>>
>> s2ram works OK on PAE kernels but breaks badly on non-PAE ones. I do not
>> think that the problem can be duplicated in VMPlayer, but it is 100%
>> reproducible on "real" hardware.
>> The system goes to sleep OK, but when woken it reboots the PC.
>>
>> The issue is tracked in Bugzilla bug 198763
>> [https://bugzilla.kernel.org/show_bug.cgi?id=198763]
> Thanks for digging into this so far. Can you please provide dmesg output
> from a PAE=y and PAE=n kernel after boot?
>
> Thanks,
>
> tglx
Certainly. I understand you want dmesg output for kernels built with and
without PAE, not just PAE=n on the cmdline :-)
Here it is...

Thanks, Woody


Attachments:
dmesg.nopae.txt (53.60 kB)
dmesg.pae.txt (53.75 kB)

2018-02-28 21:21:44

by Thomas Gleixner

Subject: Re: 4.16 regression: s2ram broken on non-PAE i686

Woody,

On Wed, 28 Feb 2018, Woody Suwalski wrote:
> Certainly. I understand you want dmesg output for kernels built with and
> without PAE, not just PAE=n on the cmdline :-)
> Here it is...

Thanks for providing the data. It did not pinpoint the issue, but it at
least gave me a hint to look in the right direction.

Does the patch below fix the issue for you? It's untested as I'm at home
and have no access to a 32bit machine right now.

Thanks,

tglx

8<------------------
Subject: x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
From: Thomas Gleixner <[email protected]>
Date: Wed, 28 Feb 2018 21:14:26 +0100

The separation of the cpu_entry_area from the fixmap missed the fact that
on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
initial_page_table by the previous synchronizations.

This results in suspend/resume failures because 32bit utilizes initial page
table for resume. The absence of the cpu_entry_area mapping results in a
triple fault, aka. insta reboot.

Synchronize the initial page table after setting up the cpu entry
area. Instead of adding yet another copy of the same code, move it to a
function and invoke it from the various places.

It needs to be investigated if the existing calls in setup_arch() and
setup_per_cpu_areas() can be replaced by the later invocation from
setup_cpu_entry_areas(), but that's beyond the scope of this fix.

Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: Woody Suwalski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
---
arch/x86/include/asm/pgtable_32.h | 1 +
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/kernel/setup.c | 17 +++++------------
arch/x86/kernel/setup_percpu.c | 17 ++++-------------
arch/x86/mm/cpu_entry_area.c | 6 ++++++
arch/x86/mm/init_32.c | 15 +++++++++++++++
6 files changed, 32 insertions(+), 25 deletions(-)

--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -32,6 +32,7 @@ extern pmd_t initial_pg_pmd[];
static inline void pgtable_cache_init(void) { }
static inline void check_pgt_cache(void) { }
void paging_init(void);
+void sync_initial_page_table(void);

/*
* Define this if things work differently on an i386 and an i486:
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -28,6 +28,7 @@ extern pgd_t init_top_pgt[];
#define swapper_pg_dir init_top_pgt

extern void paging_init(void);
+static inline void sync_initial_page_table(void) { }

#define pte_ERROR(e) \
pr_err("%s:%d: bad pte %p(%016lx)\n", \
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1204,20 +1204,13 @@ void __init setup_arch(char **cmdline_p)

kasan_init();

-#ifdef CONFIG_X86_32
- /* sync back kernel address range */
- clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
- swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- KERNEL_PGD_PTRS);
-
/*
- * sync back low identity map too. It is used for example
- * in the 32-bit EFI stub.
+ * Sync back kernel address range.
+ *
+ * FIXME: Can the later sync in setup_cpu_entry_areas() replace
+ * this call?
*/
- clone_pgd_range(initial_page_table,
- swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-#endif
+ sync_initial_page_table();

tboot_probe();

--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -287,24 +287,15 @@ void __init setup_per_cpu_areas(void)
/* Setup cpu initialized, callin, callout masks */
setup_cpu_local_masks();

-#ifdef CONFIG_X86_32
/*
* Sync back kernel address range again. We already did this in
* setup_arch(), but percpu data also needs to be available in
* the smpboot asm. We can't reliably pick up percpu mappings
* using vmalloc_fault(), because exception dispatch needs
* percpu data.
+ *
+ * FIXME: Can the later sync in setup_cpu_entry_areas() replace
+ * this call?
*/
- clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
- swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- KERNEL_PGD_PTRS);
-
- /*
- * sync back low identity map too. It is used for example
- * in the 32-bit EFI stub.
- */
- clone_pgd_range(initial_page_table,
- swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-#endif
+ sync_initial_page_table();
}
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -163,4 +163,10 @@ void __init setup_cpu_entry_areas(void)

for_each_possible_cpu(cpu)
setup_cpu_entry_area(cpu);
+
+ /*
+ * This is the last essential update to swapper_pgdir which needs
+ * to be synchronized to initial_page_table on 32bit.
+ */
+ sync_initial_page_table();
}
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -453,6 +453,21 @@ static inline void permanent_kmaps_init(
}
#endif /* CONFIG_HIGHMEM */

+void __init sync_initial_page_table(void)
+{
+ clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
+ swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+ KERNEL_PGD_PTRS);
+
+ /*
+ * sync back low identity map too. It is used for example
+ * in the 32-bit EFI stub.
+ */
+ clone_pgd_range(initial_page_table,
+ swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+ min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
+}
+
void __init native_pagetable_init(void)
{
unsigned long pfn, va;
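
For readers unfamiliar with the helper used above: clone_pgd_range() in
mainline is essentially a memcpy of pgd_t entries, so the new
sync_initial_page_table() just copies the relevant slices of swapper_pg_dir
into initial_page_table. The standalone sketch below illustrates that idea
in plain C; the pgd_t typedef, the constant values and main() are mock-ups
for illustration, not kernel code.

/* Standalone illustration (userspace, not kernel code). */
#include <stdio.h>
#include <string.h>

typedef unsigned long pgd_t;			/* mock PGD entry */

#define PTRS_PER_PGD		1024	/* 32-bit non-PAE page directory size */
#define KERNEL_PGD_BOUNDARY	768	/* first kernel-space entry (3G/1G split) */
#define KERNEL_PGD_PTRS		(PTRS_PER_PGD - KERNEL_PGD_BOUNDARY)

static pgd_t swapper_pg_dir[PTRS_PER_PGD];	/* live kernel page table (mock) */
static pgd_t initial_page_table[PTRS_PER_PGD];	/* table used on resume (mock) */

/* In the kernel, clone_pgd_range() boils down to this memcpy. */
static void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
	memcpy(dst, src, count * sizeof(pgd_t));
}

/* Mirrors the structure of the patch's sync_initial_page_table(). */
static void sync_initial_page_table(void)
{
	/* Kernel address range, which now includes the cpu_entry_area mapping. */
	clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
			KERNEL_PGD_PTRS);

	/* Low identity map, used for example by the 32-bit EFI stub. */
	clone_pgd_range(initial_page_table,
			swapper_pg_dir + KERNEL_PGD_BOUNDARY,
			KERNEL_PGD_PTRS < KERNEL_PGD_BOUNDARY ?
				KERNEL_PGD_PTRS : KERNEL_PGD_BOUNDARY);
}

int main(void)
{
	/* Pretend a new kernel mapping (e.g. cpu_entry_area) was added. */
	swapper_pg_dir[KERNEL_PGD_BOUNDARY + 1] = 0x1234;

	sync_initial_page_table();

	printf("initial_page_table[%d] = %#lx\n",
	       KERNEL_PGD_BOUNDARY + 1, initial_page_table[KERNEL_PGD_BOUNDARY + 1]);
	return 0;
}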

2018-03-01 02:47:07

by Woody Suwalski

Subject: Re: 4.16 regression: s2ram broken on non-PAE i686

Thomas Gleixner wrote:
> Woody,
>
> On Wed, 28 Feb 2018, Woody Suwalski wrote:
>> Certainly. I understand you want dmesg output for kernels built with and
>> without PAE, not just PAE=n on the cmdline :-)
>> Here it is...
> Thanks for providing the data. It did not pinpoint the issue, but it at
> least gave me a hint to look in the right direction.
>
> Does the patch below fix the issue for you? It's untested as I'm at home
> and have no access to a 32bit machine right now.
>
> Thanks,
>
> tglx
>
> 8<------------------
> Subject: x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
> From: Thomas Gleixner <[email protected]>
> Date: Wed, 28 Feb 2018 21:14:26 +0100
>
> The separation of the cpu_entry_area from the fixmap missed the fact that
> on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
> initial_page_table by the previous synchronizations.
>
> This results in suspend/resume failures because 32bit utilizes initial page
> table for resume. The absence of the cpu_entry_area mapping results in a
> triple fault, aka. insta reboot.
>
> Synchronize the initial page table after setting up the cpu entry
> area. Instead of adding yet another copy of the same code, move it to a
> function and invoke it from the various places.
>
> It needs to be investigated if the existing calls in setup_arch() and
> setup_per_cpu_areas() can be replaced by the later invocation from
> setup_cpu_entry_areas(), but that's beyond the scope of this fix.
>
> Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
> Reported-by: Woody Suwalski <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> ---
> arch/x86/include/asm/pgtable_32.h | 1 +
> arch/x86/include/asm/pgtable_64.h | 1 +
> arch/x86/kernel/setup.c | 17 +++++------------
> arch/x86/kernel/setup_percpu.c | 17 ++++-------------
> arch/x86/mm/cpu_entry_area.c | 6 ++++++
> arch/x86/mm/init_32.c | 15 +++++++++++++++
> 6 files changed, 32 insertions(+), 25 deletions(-)
>
> --- a/arch/x86/include/asm/pgtable_32.h
> +++ b/arch/x86/include/asm/pgtable_32.h
> @@ -32,6 +32,7 @@ extern pmd_t initial_pg_pmd[];
> static inline void pgtable_cache_init(void) { }
> static inline void check_pgt_cache(void) { }
> void paging_init(void);
> +void sync_initial_page_table(void);
>
> /*
> * Define this if things work differently on an i386 and an i486:
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -28,6 +28,7 @@ extern pgd_t init_top_pgt[];
> #define swapper_pg_dir init_top_pgt
>
> extern void paging_init(void);
> +static inline void sync_initial_page_table(void) { }
>
> #define pte_ERROR(e) \
> pr_err("%s:%d: bad pte %p(%016lx)\n", \
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1204,20 +1204,13 @@ void __init setup_arch(char **cmdline_p)
>
> kasan_init();
>
> -#ifdef CONFIG_X86_32
> - /* sync back kernel address range */
> - clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
> - swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> - KERNEL_PGD_PTRS);
> -
> /*
> - * sync back low identity map too. It is used for example
> - * in the 32-bit EFI stub.
> + * Sync back kernel address range.
> + *
> + * FIXME: Can the later sync in setup_cpu_entry_areas() replace
> + * this call?
> */
> - clone_pgd_range(initial_page_table,
> - swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> - min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
> -#endif
> + sync_initial_page_table();
>
> tboot_probe();
>
> --- a/arch/x86/kernel/setup_percpu.c
> +++ b/arch/x86/kernel/setup_percpu.c
> @@ -287,24 +287,15 @@ void __init setup_per_cpu_areas(void)
> /* Setup cpu initialized, callin, callout masks */
> setup_cpu_local_masks();
>
> -#ifdef CONFIG_X86_32
> /*
> * Sync back kernel address range again. We already did this in
> * setup_arch(), but percpu data also needs to be available in
> * the smpboot asm. We can't reliably pick up percpu mappings
> * using vmalloc_fault(), because exception dispatch needs
> * percpu data.
> + *
> + * FIXME: Can the later sync in setup_cpu_entry_areas() replace
> + * this call?
> */
> - clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
> - swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> - KERNEL_PGD_PTRS);
> -
> - /*
> - * sync back low identity map too. It is used for example
> - * in the 32-bit EFI stub.
> - */
> - clone_pgd_range(initial_page_table,
> - swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> - min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
> -#endif
> + sync_initial_page_table();
> }
> --- a/arch/x86/mm/cpu_entry_area.c
> +++ b/arch/x86/mm/cpu_entry_area.c
> @@ -163,4 +163,10 @@ void __init setup_cpu_entry_areas(void)
>
> for_each_possible_cpu(cpu)
> setup_cpu_entry_area(cpu);
> +
> + /*
> + * This is the last essential update to swapper_pgdir which needs
> + * to be synchronized to initial_page_table on 32bit.
> + */
> + sync_initial_page_table();
> }
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -453,6 +453,21 @@ static inline void permanent_kmaps_init(
> }
> #endif /* CONFIG_HIGHMEM */
>
> +void __init sync_initial_page_table(void)
> +{
> + clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
> + swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> + KERNEL_PGD_PTRS);
> +
> + /*
> + * sync back low identity map too. It is used for example
> + * in the 32-bit EFI stub.
> + */
> + clone_pgd_range(initial_page_table,
> + swapper_pg_dir + KERNEL_PGD_BOUNDARY,
> + min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
> +}
> +
> void __init native_pagetable_init(void)
> {
> unsigned long pfn, va;
Thanks for the patch; good news, it did fix the problem.
I did 2 builds and both worked OK over the s2ram cycle.

It will be necessary to add the patch to 4.15-stable and 4.14-stable; I
believe that both now have broken s2ram.

I will build 4.15 and 4.14 with your patch tomorrow and try it out; the
patch seems to apply OK to 4.15.7 and 4.14.23...

Thanks, Woody


2018-03-01 07:37:18

by Thomas Gleixner

Subject: Re: 4.16 regression: s2ram broken on non-PAE i686

On Wed, 28 Feb 2018, Woody Suwalski wrote:
> Thanks for the patch; good news, it did fix the problem.
> I did 2 builds and both worked OK over the s2ram cycle.

Good.

> It will be necessary to add the patch to 4.15-stable and 4.14-stable; I
> believe that both now have broken s2ram.

Right, it's already tagged for stable.

Thanks,

tglx

Subject: [tip:x86/pti] x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table

Commit-ID: 945fd17ab6bab8a4d05da6c3170519fbcfe62ddb
Gitweb: https://git.kernel.org/tip/945fd17ab6bab8a4d05da6c3170519fbcfe62ddb
Author: Thomas Gleixner <[email protected]>
AuthorDate: Wed, 28 Feb 2018 21:14:26 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Thu, 1 Mar 2018 09:48:27 +0100

x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table

The separation of the cpu_entry_area from the fixmap missed the fact that
on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
initial_page_table by the previous synchronizations.

This results in suspend/resume failures because 32bit utilizes initial page
table for resume. The absence of the cpu_entry_area mapping results in a
triple fault, aka. insta reboot.

With PAE enabled this works by chance because the PGD entry which covers
the fixmap and other parts incindentally provides the cpu_entry_area
mapping as well.

Synchronize the initial page table after setting up the cpu entry
area. Instead of adding yet another copy of the same code, move it to a
function and invoke it from the various places.

It needs to be investigated if the existing calls in setup_arch() and
setup_per_cpu_areas() can be replaced by the later invocation from
setup_cpu_entry_areas(), but that's beyond the scope of this fix.

Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: Woody Suwalski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Woody Suwalski <[email protected]>
Cc: William Grant <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

---
arch/x86/include/asm/pgtable_32.h | 1 +
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/kernel/setup.c | 17 +++++------------
arch/x86/kernel/setup_percpu.c | 17 ++++-------------
arch/x86/mm/cpu_entry_area.c | 6 ++++++
arch/x86/mm/init_32.c | 15 +++++++++++++++
6 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index e55466760ff8..b3ec519e3982 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -32,6 +32,7 @@ extern pmd_t initial_pg_pmd[];
static inline void pgtable_cache_init(void) { }
static inline void check_pgt_cache(void) { }
void paging_init(void);
+void sync_initial_page_table(void);

/*
* Define this if things work differently on an i386 and an i486:
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 81462e9a34f6..1149d2112b2e 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -28,6 +28,7 @@ extern pgd_t init_top_pgt[];
#define swapper_pg_dir init_top_pgt

extern void paging_init(void);
+static inline void sync_initial_page_table(void) { }

#define pte_ERROR(e) \
pr_err("%s:%d: bad pte %p(%016lx)\n", \
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1ae67e982af7..4c616be28506 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1204,20 +1204,13 @@ void __init setup_arch(char **cmdline_p)

kasan_init();

-#ifdef CONFIG_X86_32
- /* sync back kernel address range */
- clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
- swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- KERNEL_PGD_PTRS);
-
/*
- * sync back low identity map too. It is used for example
- * in the 32-bit EFI stub.
+ * Sync back kernel address range.
+ *
+ * FIXME: Can the later sync in setup_cpu_entry_areas() replace
+ * this call?
*/
- clone_pgd_range(initial_page_table,
- swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-#endif
+ sync_initial_page_table();

tboot_probe();

diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 497aa766fab3..ea554f812ee1 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -287,24 +287,15 @@ void __init setup_per_cpu_areas(void)
/* Setup cpu initialized, callin, callout masks */
setup_cpu_local_masks();

-#ifdef CONFIG_X86_32
/*
* Sync back kernel address range again. We already did this in
* setup_arch(), but percpu data also needs to be available in
* the smpboot asm. We can't reliably pick up percpu mappings
* using vmalloc_fault(), because exception dispatch needs
* percpu data.
+ *
+ * FIXME: Can the later sync in setup_cpu_entry_areas() replace
+ * this call?
*/
- clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
- swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- KERNEL_PGD_PTRS);
-
- /*
- * sync back low identity map too. It is used for example
- * in the 32-bit EFI stub.
- */
- clone_pgd_range(initial_page_table,
- swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
-#endif
+ sync_initial_page_table();
}
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index b9283cc27622..476d810639a8 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -163,4 +163,10 @@ void __init setup_cpu_entry_areas(void)

for_each_possible_cpu(cpu)
setup_cpu_entry_area(cpu);
+
+ /*
+ * This is the last essential update to swapper_pgdir which needs
+ * to be synchronized to initial_page_table on 32bit.
+ */
+ sync_initial_page_table();
}
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 79cb066f40c0..396e1f0151ac 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -453,6 +453,21 @@ static inline void permanent_kmaps_init(pgd_t *pgd_base)
}
#endif /* CONFIG_HIGHMEM */

+void __init sync_initial_page_table(void)
+{
+ clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
+ swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+ KERNEL_PGD_PTRS);
+
+ /*
+ * sync back low identity map too. It is used for example
+ * in the 32-bit EFI stub.
+ */
+ clone_pgd_range(initial_page_table,
+ swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+ min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
+}
+
void __init native_pagetable_init(void)
{
unsigned long pfn, va;

2018-03-01 09:32:44

by Ingo Molnar

Subject: Re: [tip:x86/pti] x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table


Just a few small spelling nits:

* tip-bot for Thomas Gleixner <[email protected]> wrote:

> Commit-ID: 945fd17ab6bab8a4d05da6c3170519fbcfe62ddb
> Gitweb: https://git.kernel.org/tip/945fd17ab6bab8a4d05da6c3170519fbcfe62ddb
> Author: Thomas Gleixner <[email protected]>
> AuthorDate: Wed, 28 Feb 2018 21:14:26 +0100
> Committer: Thomas Gleixner <[email protected]>
> CommitDate: Thu, 1 Mar 2018 09:48:27 +0100
>
> x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
>
> The separation of the cpu_entry_area from the fixmap missed the fact that
> on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
> initial_page_table by the previous synchronizations.
>
> This results in suspend/resume failures because 32bit utilizes initial page
> table for resume. The absence of the cpu_entry_area mapping results in a
> triple fault, aka. insta reboot.
>
> With PAE enabled this works by chance because the PGD entry which covers
> the fixmap and other parts incindentally provides the cpu_entry_area
> mapping as well.

s/incindentally/incidentally

s/32bit/32-bit

s/utilizes initial page table
/utilizes the initial page table

> Synchronize the initial page table after setting up the cpu entry
> area. Instead of adding yet another copy of the same code, move it to a
> function and invoke it from the various places.

s/cpu/CPU

> It needs to be investigated if the existing calls in setup_arch() and
> setup_per_cpu_areas() can be replaced by the later invocation from
> setup_cpu_entry_areas(), but that's beyond the scope of this fix.

> + /*
> + * This is the last essential update to swapper_pgdir which needs
> + * to be synchronized to initial_page_table on 32bit.
> + */
> + sync_initial_page_table();


s/swapper_pgdir
/swapper_pg_dir

s/on 32bit/on 32-bit kernels

> + /*
> + * sync back low identity map too. It is used for example
> + * in the 32-bit EFI stub.
> + */

s/sync/Sync

Very nice fix!

Thanks,

Ingo