2015-06-11 23:48:09

by srinivas pandruvada

[permalink] [raw]
Subject: [PATCH] x86: General protection fault after STR (32 bit systems only)

Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
after resume, this cause general protection fault. This is very difficult
to reproduce, but this does happen on 32 bit systems. This crash is not
reproduced after save/restore DS during calls to _save_processor_state/
__restore_processor_state respectively.
Similar issue was reported in the past
https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
was not identified.

Signed-off-by: Srinivas Pandruvada <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/suspend_32.h | 2 +-
arch/x86/power/cpu.c | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index 552d6c9..3503d0b 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -11,7 +11,7 @@

/* image of the saved processor state */
struct saved_context {
- u16 es, fs, gs, ss;
+ u16 ds, es, fs, gs, ss;
unsigned long cr0, cr2, cr3, cr4;
u64 misc_enable;
bool misc_enable_saved;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 757678f..e0dfb01 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
* segment registers
*/
#ifdef CONFIG_X86_32
+ savesegment(ds, ctxt->ds);
savesegment(es, ctxt->es);
savesegment(fs, ctxt->fs);
savesegment(gs, ctxt->gs);
@@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
* segment registers
*/
#ifdef CONFIG_X86_32
+ loadsegment(ds, ctxt->ds);
loadsegment(es, ctxt->es);
loadsegment(fs, ctxt->fs);
loadsegment(gs, ctxt->gs);
--
1.9.3


2015-06-12 06:07:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)


* Srinivas Pandruvada <[email protected]> wrote:

> Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
> after resume, this cause general protection fault. [...]

But s2ram has no influence on 'returning to user-space'. So unless I'm missing
something this changelog makes no sense.

Unfortunately the patch seems to be completely bogus as well:

> [...] This is very difficult to reproduce, but this does happen on 32 bit
> systems. This crash is not reproduced after save/restore DS during calls to
> _save_processor_state/ __restore_processor_state respectively.
>
> Similar issue was reported in the past
> https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
> was not identified.
>
> Signed-off-by: Srinivas Pandruvada <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> ---
> arch/x86/include/asm/suspend_32.h | 2 +-
> arch/x86/power/cpu.c | 2 ++
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
> index 552d6c9..3503d0b 100644
> --- a/arch/x86/include/asm/suspend_32.h
> +++ b/arch/x86/include/asm/suspend_32.h
> @@ -11,7 +11,7 @@
>
> /* image of the saved processor state */
> struct saved_context {
> - u16 es, fs, gs, ss;
> + u16 ds, es, fs, gs, ss;
> unsigned long cr0, cr2, cr3, cr4;
> u64 misc_enable;
> bool misc_enable_saved;
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 757678f..e0dfb01 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
> * segment registers
> */
> #ifdef CONFIG_X86_32
> + savesegment(ds, ctxt->ds);
> savesegment(es, ctxt->es);
> savesegment(fs, ctxt->fs);
> savesegment(gs, ctxt->gs);
> @@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> * segment registers
> */
> #ifdef CONFIG_X86_32
> + loadsegment(ds, ctxt->ds);
> loadsegment(es, ctxt->es);
> loadsegment(fs, ctxt->fs);
> loadsegment(gs, ctxt->gs);

So save_processor_state() is used by 32-bit s2ram in the following place:

arch/x86/kernel/acpi/wakeup_32.S: call save_processor_state

Other uses are:

arch/x86/kernel/acpi/wakeup_64.S: call save_processor_state
arch/x86/kernel/apm_32.c: save_processor_state();
arch/x86/kernel/machine_kexec_32.c: save_processor_state();
arch/x86/kernel/machine_kexec_64.c: save_processor_state();
arch/x86/platform/olpc/xo1-wakeup.S: call save_processor_state
kernel/power/hibernate.c: save_processor_state();
kernel/power/hibernate.c: save_processor_state();

but neither of these call sites seems to matter to the bug: the 32-bit system does
not use APM, kexec, is not an OLPC and does not use hibernation.

And if we look at arch/x86/kernel/acpi/wakeup_32.S it's a straightforward full
state save/restore before ACPI (BIOS) downcalls.

Furthermore, the bug report in:

https://bugzilla.kernel.org/show_bug.cgi?id=61781

talks about SIGSEGVs, and claims that this regression triggers in v3.11 but does
not trigger in v3.10.

1)

So the first critical question is: if the ACPI/BIOS suspend code corrupts the
kernel's DS, how can we get so far as to resume fully, return to user-space, and
segfault there so that it can all be reported?

So neither the explanation nor the code makes any sense in the context of the
reported bugs. Can anyone else offer any plausible theory about why this patch
would fix 32-bit user-space segfaults?

2)

Btw., I don't mind the idea of the patch itself per se: saving/restoring DS is
prudent to avoid the BIOS stomping on our DS.

But the restoration done in this patch is bogus, it's done way too late in a C
function, there's a number of places where we might use the kernel DS before it's
restored, such as restore_registers():

restore_registers:
movl saved_context_ebp, %ebp
movl saved_context_ebx, %ebx
movl saved_context_esi, %esi
movl saved_context_edi, %edi
pushl saved_context_eflags
popfl
ret

this is called before restore_processor_state().

those 'saved_context_*' references are already using the kernel DS! So if it's
corrupted, we'll crash there already. So your patch seems totally nonsensical.

3)

So the patch below (totally untested) does the DS restoration early on, as the
first thing after we emerge from the sleep. This should be equivalent to your
patch, but is more robust and much simpler: there's no need to save DS, because we
know that it has to be __KERNEL_DS.

Could you please test whether this solves the problem as well? Also, could you
please describe how the failure triggers in your system: how many times do you
have to suspend/resume to trigger the segfaults, and is there anything that makes
the failures less or more likely?

Thanks,

Ingo

=================>
arch/x86/kernel/acpi/wakeup_32.S | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..9a3ce66e0dd8 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
jmp ret_point
.p2align 4,,7
ret_point:
+ /* In case the BIOS corrupted DS, make the kernel context minimally functional: */
+ movl $__KERNEL_DS, %eax
+ movl %eax, %ds
+
call restore_registers
call restore_processor_state
ret

2015-06-12 06:48:38

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Thu, Jun 11, 2015 at 11:07 PM, Ingo Molnar <[email protected]> wrote:
>
> * Srinivas Pandruvada <[email protected]> wrote:
>
>> Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
>> after resume, this cause general protection fault. [...]
>
> But s2ram has no influence on 'returning to user-space'. So unless I'm missing
> something this changelog makes no sense.
>
> Unfortunately the patch seems to be completely bogus as well:
>
>> [...] This is very difficult to reproduce, but this does happen on 32 bit
>> systems. This crash is not reproduced after save/restore DS during calls to
>> _save_processor_state/ __restore_processor_state respectively.
>>
>> Similar issue was reported in the past
>> https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
>> was not identified.
>>
>> Signed-off-by: Srinivas Pandruvada <[email protected]>
>> Reviewed-by: Andi Kleen <[email protected]>
>> ---
>> arch/x86/include/asm/suspend_32.h | 2 +-
>> arch/x86/power/cpu.c | 2 ++
>> 2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
>> index 552d6c9..3503d0b 100644
>> --- a/arch/x86/include/asm/suspend_32.h
>> +++ b/arch/x86/include/asm/suspend_32.h
>> @@ -11,7 +11,7 @@
>>
>> /* image of the saved processor state */
>> struct saved_context {
>> - u16 es, fs, gs, ss;
>> + u16 ds, es, fs, gs, ss;
>> unsigned long cr0, cr2, cr3, cr4;
>> u64 misc_enable;
>> bool misc_enable_saved;
>> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
>> index 757678f..e0dfb01 100644
>> --- a/arch/x86/power/cpu.c
>> +++ b/arch/x86/power/cpu.c
>> @@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
>> * segment registers
>> */
>> #ifdef CONFIG_X86_32
>> + savesegment(ds, ctxt->ds);
>> savesegment(es, ctxt->es);
>> savesegment(fs, ctxt->fs);
>> savesegment(gs, ctxt->gs);
>> @@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>> * segment registers
>> */
>> #ifdef CONFIG_X86_32
>> + loadsegment(ds, ctxt->ds);
>> loadsegment(es, ctxt->es);
>> loadsegment(fs, ctxt->fs);
>> loadsegment(gs, ctxt->gs);
>
> So save_processor_state() is used by 32-bit s2ram in the following place:
>
> arch/x86/kernel/acpi/wakeup_32.S: call save_processor_state
>
> Other uses are:
>
> arch/x86/kernel/acpi/wakeup_64.S: call save_processor_state
> arch/x86/kernel/apm_32.c: save_processor_state();
> arch/x86/kernel/machine_kexec_32.c: save_processor_state();
> arch/x86/kernel/machine_kexec_64.c: save_processor_state();
> arch/x86/platform/olpc/xo1-wakeup.S: call save_processor_state
> kernel/power/hibernate.c: save_processor_state();
> kernel/power/hibernate.c: save_processor_state();
>
> but neither of these call sites seems to matter to the bug: the 32-bit system does
> not use APM, kexec, is not an OLPC and does not use hibernation.
>
> And if we look at arch/x86/kernel/acpi/wakeup_32.S it's a straightforward full
> state save/restore before ACPI (BIOS) downcalls.
>
> Furthermore, the bug report in:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=61781
>
> talks about SIGSEGVs, and claims that this regression triggers in v3.11 but does
> not trigger in v3.10.
>
> 1)
>
> So the first critical question is: if the ACPI/BIOS suspend code corrupts the
> kernel's DS, how can we get so far as to resume fully, return to user-space, and
> segfault there so that it can all be reported?
>
> So neither the explanation nor the code makes any sense in the context of the
> reported bugs. Can anyone else offer any plausible theory about why this patch
> would fix 32-bit user-space segfaults?

I'm too tired to look at this intelligently right now, but this
reminds me of the sysret_ss_attrs thing. What if we have a situation
where, after suspend/resume, we end up with a perfectly valid ss
*selector* (or, on 64-bit kernels, a ds selector that does not matter
one whit) but a somehow-screwed-up ds *cached hidden descriptor*. (On
32-bit kernels, this could be something exotic like grows-down limit
2^31.)

Now we do the very first return. If we're on AMD hardware and that
return is SYSRET, then we end up with some complete random garbage
loaded in the hidden DS descriptor if SYSRET on 32-bit mode is indeed
screwed up on AMD.

Of course, this is apparently DS and not SS, but maybe something
similar is happening. How easily reproducible is this, and what cpu
is it on? Would something like 'push %ds; pop %ds' after SYSENTER in
vdso32/sysenter.S fix it?

Also, damnit systemd, stop catching SEGV. I want to know all the SEGV
details, and you helpfully threw them away. Good job.

>
> 2)
>
> Btw., I don't mind the idea of the patch itself per se: saving/restoring DS is
> prudent to avoid the BIOS stomping on our DS.
>
> But the restoration done in this patch is bogus, it's done way too late in a C
> function, there's a number of places where we might use the kernel DS before it's
> restored, such as restore_registers():
>
> restore_registers:
> movl saved_context_ebp, %ebp
> movl saved_context_ebx, %ebx
> movl saved_context_esi, %esi
> movl saved_context_edi, %edi
> pushl saved_context_eflags
> popfl
> ret
>
> this is called before restore_processor_state().
>
> those 'saved_context_*' references are already using the kernel DS! So if it's
> corrupted, we'll crash there already. So your patch seems totally nonsensical.

Don't even bother saving it. Just load the known value on resume.

This is a bit odd, in that we don't do the X86_BUG_SYSRET_SS_ATTRS
fixup on 32-bit kernels, and yet no one reported the
sysret_ss_attrs_32 test failing on 32-bit kernels. I assume that
what's going on is that 32-bit kernels don't have any ways to enter
the kernel with SS=0 (because ss0 is in the TSS and it isn't zero),
but we never thought of the other way to have a bogus SS descriptor in
kernel mode: change the GDT without reloading SS.

I'm presently mystified about DS. The 32-bit sysenter path loads
__KERNEL_DS into ds (in a macro helpfully called SAVE_ALL), but what,
if anything, restores DS on return? Could we actually be executing in
32-bit userspace with DS & 3 == 0 after every SYSEXIT on 32-bit
kernels? If so, yikes!

Should we load __USER_DS into DS and ES right before SYSEXIT?

Here's my full-fledged half-asleep theory:

We suspend to RAM. We resume. DS and/or ES contains something
unusual but not unusual enough to crash us. Our first entry to
userspace is via SYSEXIT. Because we're daft, we don't reload DS or
ES at any point along the way. Now we're in userspace with an even
more screwed up DS or ES than usual. We get SIGSEGV (presumably #GP)
and try to deliver the signal. We end up with impossible pt_regs
(bogus RPL) but who cares? We get to __setup_frame, which fixes the
garbage in pt_regs and we re-enter user mode through an IRET patch, so
we finally reload DS and ES. As a result, we successfully deliver the
signal. The saved regs would reveal the damage, but systemd throws
them away, and we remain confused for a full ten kernel versions.

--Andy

2015-06-12 07:15:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)


* Andy Lutomirski <[email protected]> wrote:

> > 1)
> >
> > So the first critical question is: if the ACPI/BIOS suspend code corrupts the
> > kernel's DS, how can we get so far as to resume fully, return to user-space,
> > and segfault there so that it can all be reported?
> >
> > So neither the explanation nor the code makes any sense in the context of the
> > reported bugs. Can anyone else offer any plausible theory about why this patch
> > would fix 32-bit user-space segfaults?
>
> I'm too tired to look at this intelligently right now, but this reminds me of
> the sysret_ss_attrs thing. What if we have a situation where, after
> suspend/resume, we end up with a perfectly valid ss *selector* (or, on 64-bit
> kernels, a ds selector that does not matter one whit) but a somehow-screwed-up
> ds *cached hidden descriptor*. (On 32-bit kernels, this could be something
> exotic like grows-down limit 2^31.)

Yes, that theory is what my patch tests, by reloading DS with __KERNEL_DS.

This should be safe as the first thing to execute after re-entry, as we don't
save/restore the GDT. (If the BIOS mucks with the GDT without restoring it to our
value we are probably screwed in any case.)

> Now we do the very first return. If we're on AMD hardware and that return is
> SYSRET, then we end up with some complete random garbage loaded in the hidden DS
> descriptor if SYSRET on 32-bit mode is indeed screwed up on AMD.

But why would this change from v3.10 to v3.11? I cannot see any low level x86
change that should make a difference there.

> Don't even bother saving it. Just load the known value on resume.

Yeah, so that's what my simple patch does.

> Here's my full-fledged half-asleep theory:
>
> We suspend to RAM. We resume. DS and/or ES contains something unusual but not
> unusual enough to crash us. Our first entry to userspace is via SYSEXIT.
> Because we're daft, we don't reload DS or ES at any point along the way. Now
> we're in userspace with an even more screwed up DS or ES than usual. We get
> SIGSEGV (presumably #GP) and try to deliver the signal. We end up with
> impossible pt_regs (bogus RPL) but who cares? We get to __setup_frame, which
> fixes the garbage in pt_regs and we re-enter user mode through an IRET patch, so
> we finally reload DS and ES. As a result, we successfully deliver the signal.
> The saved regs would reveal the damage, but systemd throws them away, and we
> remain confused for a full ten kernel versions.

That's indeed plausible.

If so then the DS reloading patch I sent should help.

So we should also do a full review of all the DS/ES save/restore paths,
everywhere, as they don't seem to be very consistently done.

Thanks,

Ingo

2015-06-12 07:41:44

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Thu, Jun 11, 2015 at 11:07 PM, Ingo Molnar <[email protected]> wrote:
>
> * Srinivas Pandruvada <[email protected]> wrote:
>
>> Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
>> after resume, this cause general protection fault. [...]
>
> But s2ram has no influence on 'returning to user-space'. So unless I'm missing
> something this changelog makes no sense.
>
> Unfortunately the patch seems to be completely bogus as well:
>
>> [...] This is very difficult to reproduce, but this does happen on 32 bit
>> systems. This crash is not reproduced after save/restore DS during calls to
>> _save_processor_state/ __restore_processor_state respectively.
>>
>> Similar issue was reported in the past
>> https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
>> was not identified.
>>
>> Signed-off-by: Srinivas Pandruvada <[email protected]>
>> Reviewed-by: Andi Kleen <[email protected]>
>> ---
>> arch/x86/include/asm/suspend_32.h | 2 +-
>> arch/x86/power/cpu.c | 2 ++
>> 2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
>> index 552d6c9..3503d0b 100644
>> --- a/arch/x86/include/asm/suspend_32.h
>> +++ b/arch/x86/include/asm/suspend_32.h
>> @@ -11,7 +11,7 @@
>>
>> /* image of the saved processor state */
>> struct saved_context {
>> - u16 es, fs, gs, ss;
>> + u16 ds, es, fs, gs, ss;
>> unsigned long cr0, cr2, cr3, cr4;
>> u64 misc_enable;
>> bool misc_enable_saved;
>> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
>> index 757678f..e0dfb01 100644
>> --- a/arch/x86/power/cpu.c
>> +++ b/arch/x86/power/cpu.c
>> @@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
>> * segment registers
>> */
>> #ifdef CONFIG_X86_32
>> + savesegment(ds, ctxt->ds);
>> savesegment(es, ctxt->es);
>> savesegment(fs, ctxt->fs);
>> savesegment(gs, ctxt->gs);
>> @@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>> * segment registers
>> */
>> #ifdef CONFIG_X86_32
>> + loadsegment(ds, ctxt->ds);
>> loadsegment(es, ctxt->es);
>> loadsegment(fs, ctxt->fs);
>> loadsegment(gs, ctxt->gs);
>
> So save_processor_state() is used by 32-bit s2ram in the following place:
>
> arch/x86/kernel/acpi/wakeup_32.S: call save_processor_state
>
> Other uses are:
>
> arch/x86/kernel/acpi/wakeup_64.S: call save_processor_state
> arch/x86/kernel/apm_32.c: save_processor_state();
> arch/x86/kernel/machine_kexec_32.c: save_processor_state();
> arch/x86/kernel/machine_kexec_64.c: save_processor_state();
> arch/x86/platform/olpc/xo1-wakeup.S: call save_processor_state
> kernel/power/hibernate.c: save_processor_state();
> kernel/power/hibernate.c: save_processor_state();
>
> but neither of these call sites seems to matter to the bug: the 32-bit system does
> not use APM, kexec, is not an OLPC and does not use hibernation.
>
> And if we look at arch/x86/kernel/acpi/wakeup_32.S it's a straightforward full
> state save/restore before ACPI (BIOS) downcalls.
>
> Furthermore, the bug report in:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=61781
>
> talks about SIGSEGVs, and claims that this regression triggers in v3.11 but does
> not trigger in v3.10.
>
> 1)
>
> So the first critical question is: if the ACPI/BIOS suspend code corrupts the
> kernel's DS, how can we get so far as to resume fully, return to user-space, and
> segfault there so that it can all be reported?
>
> So neither the explanation nor the code makes any sense in the context of the
> reported bugs. Can anyone else offer any plausible theory about why this patch
> would fix 32-bit user-space segfaults?
>
> 2)
>
> Btw., I don't mind the idea of the patch itself per se: saving/restoring DS is
> prudent to avoid the BIOS stomping on our DS.
>
> But the restoration done in this patch is bogus, it's done way too late in a C
> function, there's a number of places where we might use the kernel DS before it's
> restored, such as restore_registers():
>
> restore_registers:
> movl saved_context_ebp, %ebp
> movl saved_context_ebx, %ebx
> movl saved_context_esi, %esi
> movl saved_context_edi, %edi
> pushl saved_context_eflags
> popfl
> ret
>
> this is called before restore_processor_state().
>
> those 'saved_context_*' references are already using the kernel DS! So if it's
> corrupted, we'll crash there already. So your patch seems totally nonsensical.
>
> 3)
>
> So the patch below (totally untested) does the DS restoration early on, as the
> first thing after we emerge from the sleep. This should be equivalent to your
> patch, but is more robust and much simpler: there's no need to save DS, because we
> know that it has to be __KERNEL_DS.
>
> Could you please test whether this solves the problem as well? Also, could you
> please describe how the failure triggers in your system: how many times do you
> have to suspend/resume to trigger the segfaults, and is there anything that makes
> the failures less or more likely?
>
> Thanks,
>
> Ingo
>
> =================>
> arch/x86/kernel/acpi/wakeup_32.S | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..9a3ce66e0dd8 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
> jmp ret_point
> .p2align 4,,7
> ret_point:
> + /* In case the BIOS corrupted DS, make the kernel context minimally functional: */
> + movl $__KERNEL_DS, %eax
> + movl %eax, %ds
> +

On further thought, I think you want movl $__USER_DS, %eax. The
32-bit kernel is a strange beast. Also, you should probably fix up
%es as well.

--Andy

> call restore_registers
> call restore_processor_state
> ret



--
Andy Lutomirski
AMA Capital Management, LLC

2015-06-12 07:50:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)


* Andy Lutomirski <[email protected]> wrote:

> > --- a/arch/x86/kernel/acpi/wakeup_32.S
> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
> > @@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
> > jmp ret_point
> > .p2align 4,,7
> > ret_point:
> > + /* In case the BIOS corrupted DS, make the kernel context minimally functional: */
> > + movl $__KERNEL_DS, %eax
> > + movl %eax, %ds
> > +
>
> On further thought, I think you want movl $__USER_DS, %eax. The
> 32-bit kernel is a strange beast. Also, you should probably fix up
> %es as well.

So restore_processor_state() already restores ES. The idea here was to reload DS
early on, because the kernel implicitly uses it for data access so we need it to
be good to be able to continue executing any generic kernel code.

We don't use %es: prefixed assembly AFAICS, what are the implicit users of ES?

Also, to further confuse things, we also have:

ENTRY(wakeup_pmode_return)
wakeup_pmode_return:
movw $__KERNEL_DS, %ax
movw %ax, %ss
movw %ax, %ds
movw %ax, %es
movw %ax, %fs
movw %ax, %gs

# reload the gdt, as we need the full 32 bit address
lidt saved_idt
lldt saved_ldt
ljmp $(__KERNEL_CS), $1f
1:
movl %cr3, %eax
movl %eax, %cr3
wbinvd

which seems to be another layer of restoration - but it possibly does not trigger
in the S2RAM case here.

Oh, funny the 'reload the gdt' comment: do you see an LGDT there? It reloads all
segment selectors, the IDT, LDT and CR3, but does not seem to reload the GDT - the
only thing the comment describes.

Thanks,

Ingo

2015-06-12 08:17:48

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

%es is used implicitly by string instructions.

On June 12, 2015 12:50:13 AM PDT, Ingo Molnar <[email protected]> wrote:
>
>* Andy Lutomirski <[email protected]> wrote:
>
>> > --- a/arch/x86/kernel/acpi/wakeup_32.S
>> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
>> > @@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
>> > jmp ret_point
>> > .p2align 4,,7
>> > ret_point:
>> > + /* In case the BIOS corrupted DS, make the kernel context
>minimally functional: */
>> > + movl $__KERNEL_DS, %eax
>> > + movl %eax, %ds
>> > +
>>
>> On further thought, I think you want movl $__USER_DS, %eax. The
>> 32-bit kernel is a strange beast. Also, you should probably fix up
>> %es as well.
>
>So restore_processor_state() already restores ES. The idea here was to
>reload DS
>early on, because the kernel implicitly uses it for data access so we
>need it to
>be good to be able to continue executing any generic kernel code.
>
>We don't use %es: prefixed assembly AFAICS, what are the implicit users
>of ES?
>
>Also, to further confuse things, we also have:
>
>ENTRY(wakeup_pmode_return)
>wakeup_pmode_return:
> movw $__KERNEL_DS, %ax
> movw %ax, %ss
> movw %ax, %ds
> movw %ax, %es
> movw %ax, %fs
> movw %ax, %gs
>
> # reload the gdt, as we need the full 32 bit address
> lidt saved_idt
> lldt saved_ldt
> ljmp $(__KERNEL_CS), $1f
>1:
> movl %cr3, %eax
> movl %eax, %cr3
> wbinvd
>
>which seems to be another layer of restoration - but it possibly does
>not trigger
>in the S2RAM case here.
>
>Oh, funny the 'reload the gdt' comment: do you see an LGDT there? It
>reloads all
>segment selectors, the IDT, LDT and CR3, but does not seem to reload
>the GDT - the
>only thing the comment describes.
>
>Thanks,
>
> Ingo

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2015-06-12 08:36:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)


* H. Peter Anvin <[email protected]> wrote:

> %es is used implicitly by string instructions.

Ok, so we are probably better off reloading ES as well early, right
when we return from the firmware, just in case something does
a copy before we hit the ES restore in restore_processor_state(),
which is a generic C function?

Something like the patch below?

I also added FS/GS/SS reloading to make it complete. If this (or a variant
thereof, it's still totally untested) works then we can remove the segment
save/restore layer in __save/restore_processor_state().

Thanks,

Ingo

===========>
arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..1376a7fc21b7 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)


restore_registers:
+ /*
+ * In case the BIOS corrupted our segment descriptors,
+ * reload them to clear out any shadow descriptor
+ * state:
+ */
+ movl $__USER_DS, %eax
+ movl %eax, %ds
+ movl %eax, %es
+ movl %eax, %fs
+ movl %eax, %gs
+ movl $__KERNEL_DS, %eax
+ movl %eax, %ss
+
movl saved_context_ebp, %ebp
movl saved_context_ebx, %ebx
movl saved_context_esi, %esi

2015-06-12 15:48:25

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <[email protected]> wrote:
>
> * H. Peter Anvin <[email protected]> wrote:
>
>> %es is used implicitly by string instructions.
>
> Ok, so we are probably better off reloading ES as well early, right
> when we return from the firmware, just in case something does
> a copy before we hit the ES restore in restore_processor_state(),
> which is a generic C function?
>
> Something like the patch below?
>
> I also added FS/GS/SS reloading to make it complete. If this (or a variant
> thereof, it's still totally untested) works then we can remove the segment
> save/restore layer in __save/restore_processor_state().
>
> Thanks,
>
> Ingo
>
> ===========>
> arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..1376a7fc21b7 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
>
>
> restore_registers:
> + /*
> + * In case the BIOS corrupted our segment descriptors,
> + * reload them to clear out any shadow descriptor
> + * state:
> + */
> + movl $__USER_DS, %eax
> + movl %eax, %ds
> + movl %eax, %es
> + movl %eax, %fs
> + movl %eax, %gs
> + movl $__KERNEL_DS, %eax
> + movl %eax, %ss
> +
> movl saved_context_ebp, %ebp
> movl saved_context_ebx, %ebx
> movl saved_context_esi, %esi

If you follow the convoluted flow of the calls in this file,
wakeup_pmode_return is the first thing called from the trampoline on
resume, and that loads the data segments with __KERNEL_DS. The better
fix would be to set ds/es to __USER_DS there. Also since we are in
the kernel, fs is fixed at __KERNEL_PERCPU, and gs is either
__KERNEL_STACK_CANARY or user's gs.

--
Brian Gerst

2015-06-12 16:17:29

by srinivas pandruvada

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Fri, 2015-06-12 at 08:07 +0200, Ingo Molnar wrote:
> * Srinivas Pandruvada <[email protected]> wrote:
>
> > Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
> > after resume, this cause general protection fault. [...]
>
> But s2ram has no influence on 'returning to user-space'. So unless I'm missing
> something this changelog makes no sense.
>
> Unfortunately the patch seems to be completely bogus as well:
>
> > [...] This is very difficult to reproduce, but this does happen on 32 bit
> > systems. This crash is not reproduced after save/restore DS during calls to
> > _save_processor_state/ __restore_processor_state respectively.
> >
> > Similar issue was reported in the past
> > https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
> > was not identified.
> >
> > Signed-off-by: Srinivas Pandruvada <[email protected]>
> > Reviewed-by: Andi Kleen <[email protected]>
> > ---
> > arch/x86/include/asm/suspend_32.h | 2 +-
> > arch/x86/power/cpu.c | 2 ++
> > 2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
> > index 552d6c9..3503d0b 100644
> > --- a/arch/x86/include/asm/suspend_32.h
> > +++ b/arch/x86/include/asm/suspend_32.h
> > @@ -11,7 +11,7 @@
> >
> > /* image of the saved processor state */
> > struct saved_context {
> > - u16 es, fs, gs, ss;
> > + u16 ds, es, fs, gs, ss;
> > unsigned long cr0, cr2, cr3, cr4;
> > u64 misc_enable;
> > bool misc_enable_saved;
> > diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> > index 757678f..e0dfb01 100644
> > --- a/arch/x86/power/cpu.c
> > +++ b/arch/x86/power/cpu.c
> > @@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
> > * segment registers
> > */
> > #ifdef CONFIG_X86_32
> > + savesegment(ds, ctxt->ds);
> > savesegment(es, ctxt->es);
> > savesegment(fs, ctxt->fs);
> > savesegment(gs, ctxt->gs);
> > @@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> > * segment registers
> > */
> > #ifdef CONFIG_X86_32
> > + loadsegment(ds, ctxt->ds);
> > loadsegment(es, ctxt->es);
> > loadsegment(fs, ctxt->fs);
> > loadsegment(gs, ctxt->gs);
>
> So save_processor_state() is used by 32-bit s2ram in the following place:
>
> arch/x86/kernel/acpi/wakeup_32.S: call save_processor_state
>
> Other uses are:
>
> arch/x86/kernel/acpi/wakeup_64.S: call save_processor_state
> arch/x86/kernel/apm_32.c: save_processor_state();
> arch/x86/kernel/machine_kexec_32.c: save_processor_state();
> arch/x86/kernel/machine_kexec_64.c: save_processor_state();
> arch/x86/platform/olpc/xo1-wakeup.S: call save_processor_state
> kernel/power/hibernate.c: save_processor_state();
> kernel/power/hibernate.c: save_processor_state();
>
> but neither of these call sites seems to matter to the bug: the 32-bit system does
> not use APM, kexec, is not an OLPC and does not use hibernation.
>
> And if we look at arch/x86/kernel/acpi/wakeup_32.S it's a straightforward full
> state save/restore before ACPI (BIOS) downcalls.
>
> Furthermore, the bug report in:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=61781
>
> talks about SIGSEGVs, and claims that this regression triggers in v3.11 but does
> not trigger in v3.10.
>
> 1)
>
> So the first critical question is: if the ACPI/BIOS suspend code corrupts the
> kernel's DS, how can we get so far as to resume fully, return to user-space, and
> segfault there so that it can all be reported?
>
> So neither the explanation nor the code makes any sense in the context of the
> reported bugs. Can anyone else offer any plausible theory about why this patch
> would fix 32-bit user-space segfaults?
>
> 2)
>
> Btw., I don't mind the idea of the patch itself per se: saving/restoring DS is
> prudent to avoid the BIOS stomping on our DS.
>
> But the restoration done in this patch is bogus, it's done way too late in a C
> function, there's a number of places where we might use the kernel DS before it's
> restored, such as restore_registers():
>
> restore_registers:
> movl saved_context_ebp, %ebp
> movl saved_context_ebx, %ebx
> movl saved_context_esi, %esi
> movl saved_context_edi, %edi
> pushl saved_context_eflags
> popfl
> ret
>
> this is called before restore_processor_state().
>
> those 'saved_context_*' references are already using the kernel DS! So if it's
> corrupted, we'll crash there already. So your patch seems totally nonsensical.
>
> 3)
>
> So the patch below (totally untested) does the DS restoration early on, as the
> first thing after we emerge from the sleep. This should be equivalent to your
> patch, but is more robust and much simpler: there's no need to save DS, because we
> know that it has to be __KERNEL_DS.
>
> Could you please test whether this solves the problem as well? Also, could you
> please describe how the failure triggers in your system: how many times do you
> have to suspend/resume to trigger the segfaults, and is there anything that makes
> the failures less or more likely?
It is very random. Sometimes only few hundred trys reproduce this issue.
Some other times it requires thousands of trys (sometimes not
reproducible at all for days)
It is very time sensitive. A small delay or some debug code in resume
path prevents this to crash.
The BIOS folks created special version to check if they are corrupting
any DS, but they were not able to catch any corruption.
Since these are special deployed systems running critical application,
need to request the tests again with your changes. May take long time.

Thanks,
Srinivas
>
> Thanks,
>
> Ingo
>
> =================>
> arch/x86/kernel/acpi/wakeup_32.S | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..9a3ce66e0dd8 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
> jmp ret_point
> .p2align 4,,7
> ret_point:
> + /* In case the BIOS corrupted DS, make the kernel context minimally functional: */
> + movl $__KERNEL_DS, %eax
> + movl %eax, %ds
> +
> call restore_registers
> call restore_processor_state
> ret

2015-06-12 18:12:03

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Fri, Jun 12, 2015 at 8:48 AM, Brian Gerst <[email protected]> wrote:
> On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <[email protected]> wrote:
>>
>> * H. Peter Anvin <[email protected]> wrote:
>>
>>> %es is used implicitly by string instructions.
>>
>> Ok, so we are probably better off reloading ES as well early, right
>> when we return from the firmware, just in case something does
>> a copy before we hit the ES restore in restore_processor_state(),
>> which is a generic C function?
>>
>> Something like the patch below?
>>
>> I also added FS/GS/SS reloading to make it complete. If this (or a variant
>> thereof, it's still totally untested) works then we can remove the segment
>> save/restore layer in __save/restore_processor_state().
>>
>> Thanks,
>>
>> Ingo
>>
>> ===========>
>> arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
>> index 665c6b7d2ea9..1376a7fc21b7 100644
>> --- a/arch/x86/kernel/acpi/wakeup_32.S
>> +++ b/arch/x86/kernel/acpi/wakeup_32.S
>> @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
>>
>>
>> restore_registers:
>> + /*
>> + * In case the BIOS corrupted our segment descriptors,
>> + * reload them to clear out any shadow descriptor
>> + * state:
>> + */
>> + movl $__USER_DS, %eax
>> + movl %eax, %ds
>> + movl %eax, %es
>> + movl %eax, %fs
>> + movl %eax, %gs
>> + movl $__KERNEL_DS, %eax
>> + movl %eax, %ss
>> +
>> movl saved_context_ebp, %ebp
>> movl saved_context_ebx, %ebx
>> movl saved_context_esi, %esi
>
> If you follow the convoluted flow of the calls in this file,
> wakeup_pmode_return is the first thing called from the trampoline on
> resume, and that loads the data segments with __KERNEL_DS. The better
> fix would be to set ds/es to __USER_DS there. Also since we are in
> the kernel, fs is fixed at __KERNEL_PERCPU, and gs is either
> __KERNEL_STACK_CANARY or user's gs.

So why is this issue rare? Is it just that the first entry to
userspace after resume is only very rarely via SYSEXIT? Or is there
some other code path that usually, but not quite always, fixes up the
segment registers?

--Andy

2015-06-12 18:33:27

by srinivas pandruvada

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Fri, 2015-06-12 at 11:11 -0700, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 8:48 AM, Brian Gerst <[email protected]> wrote:
> > On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <[email protected]> wrote:
> >>
> >> * H. Peter Anvin <[email protected]> wrote:
> >>
> >>> %es is used implicitly by string instructions.
> >>
> >> Ok, so we are probably better off reloading ES as well early, right
> >> when we return from the firmware, just in case something does
> >> a copy before we hit the ES restore in restore_processor_state(),
> >> which is a generic C function?
> >>
> >> Something like the patch below?
> >>
> >> I also added FS/GS/SS reloading to make it complete. If this (or a variant
> >> thereof, it's still totally untested) works then we can remove the segment
> >> save/restore layer in __save/restore_processor_state().
> >>
> >> Thanks,
> >>
> >> Ingo
> >>
> >> ===========>
> >> arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
> >> 1 file changed, 13 insertions(+)
> >>
> >> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> >> index 665c6b7d2ea9..1376a7fc21b7 100644
> >> --- a/arch/x86/kernel/acpi/wakeup_32.S
> >> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> >> @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
> >>
> >>
> >> restore_registers:
> >> + /*
> >> + * In case the BIOS corrupted our segment descriptors,
> >> + * reload them to clear out any shadow descriptor
> >> + * state:
> >> + */
> >> + movl $__USER_DS, %eax
> >> + movl %eax, %ds
> >> + movl %eax, %es
> >> + movl %eax, %fs
> >> + movl %eax, %gs
> >> + movl $__KERNEL_DS, %eax
> >> + movl %eax, %ss
> >> +
> >> movl saved_context_ebp, %ebp
> >> movl saved_context_ebx, %ebx
> >> movl saved_context_esi, %esi
> >
> > If you follow the convoluted flow of the calls in this file,
> > wakeup_pmode_return is the first thing called from the trampoline on
> > resume, and that loads the data segments with __KERNEL_DS. The better
> > fix would be to set ds/es to __USER_DS there. Also since we are in
> > the kernel, fs is fixed at __KERNEL_PERCPU, and gs is either
> > __KERNEL_STACK_CANARY or user's gs.
>
> So why is this issue rare? Is it just that the first entry to
> userspace after resume is only very rarely via SYSEXIT? Or is there
> some other code path that usually, but not quite always, fixes up the
> segment registers?
We spent a quite a bit of time debugging this. Any small debug code or
printk causes problem to disappear. We noticed that we can reproduce
this more frequently when the we run just one user space application to
just do suspend/resume test, which supports yours theory.


>
> --Andy

2015-06-12 22:45:53

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Fri, Jun 12, 2015 at 5:48 PM, Brian Gerst <[email protected]> wrote:
> If you follow the convoluted flow of the calls in this file,
> ...

Speaking of which. It is indeed quite bad.

For one, saved_eip is only ever set to point to ret_point:

ENTRY(saved_eip) .long 0
...

movl $ret_point, saved_eip

and it has just a single user, where an indirect jump
through it is performed:

# jump to place where we left off
movl saved_eip, %eax
jmp *%eax

No comments why it is so.

All this seems to be equivalent to trivial

# jump to place where we left off
jmp ret_point

Am I missing something?

2015-06-13 07:01:03

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)


* Srinivas Pandruvada <[email protected]> wrote:

> On Fri, 2015-06-12 at 11:11 -0700, Andy Lutomirski wrote:
> > On Fri, Jun 12, 2015 at 8:48 AM, Brian Gerst <[email protected]> wrote:
> > > On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <[email protected]> wrote:
> > >>
> > >> * H. Peter Anvin <[email protected]> wrote:
> > >>
> > >>> %es is used implicitly by string instructions.
> > >>
> > >> Ok, so we are probably better off reloading ES as well early, right
> > >> when we return from the firmware, just in case something does
> > >> a copy before we hit the ES restore in restore_processor_state(),
> > >> which is a generic C function?
> > >>
> > >> Something like the patch below?
> > >>
> > >> I also added FS/GS/SS reloading to make it complete. If this (or a variant
> > >> thereof, it's still totally untested) works then we can remove the segment
> > >> save/restore layer in __save/restore_processor_state().
> > >>
> > >> Thanks,
> > >>
> > >> Ingo
> > >>
> > >> ===========>
> > >> arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
> > >> 1 file changed, 13 insertions(+)
> > >>
> > >> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> > >> index 665c6b7d2ea9..1376a7fc21b7 100644
> > >> --- a/arch/x86/kernel/acpi/wakeup_32.S
> > >> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> > >> @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
> > >>
> > >>
> > >> restore_registers:
> > >> + /*
> > >> + * In case the BIOS corrupted our segment descriptors,
> > >> + * reload them to clear out any shadow descriptor
> > >> + * state:
> > >> + */
> > >> + movl $__USER_DS, %eax
> > >> + movl %eax, %ds
> > >> + movl %eax, %es
> > >> + movl %eax, %fs
> > >> + movl %eax, %gs
> > >> + movl $__KERNEL_DS, %eax
> > >> + movl %eax, %ss
> > >> +
> > >> movl saved_context_ebp, %ebp
> > >> movl saved_context_ebx, %ebx
> > >> movl saved_context_esi, %esi
> > >
> > > If you follow the convoluted flow of the calls in this file,
> > > wakeup_pmode_return is the first thing called from the trampoline on
> > > resume, and that loads the data segments with __KERNEL_DS. The better
> > > fix would be to set ds/es to __USER_DS there. Also since we are in
> > > the kernel, fs is fixed at __KERNEL_PERCPU, and gs is either
> > > __KERNEL_STACK_CANARY or user's gs.
> >
> > So why is this issue rare? Is it just that the first entry to
> > userspace after resume is only very rarely via SYSEXIT? Or is there
> > some other code path that usually, but not quite always, fixes up the
> > segment registers?
>
> We spent a quite a bit of time debugging this. Any small debug code or printk
> causes problem to disappear. We noticed that we can reproduce this more
> frequently when the we run just one user space application to just do
> suspend/resume test, which supports yours theory.

So what happens if you add a couple of CPUID (or NOP) calls? If it's timing
related, not segment register related, then that might 'fix' the bug too.

Which would suggest that it might be some hardware problem, or a race with
something (which could not really be on the Linux side, as we are the only thread
of execution at this point)?

Thanks,

Ingo

2015-06-13 07:04:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)


* Brian Gerst <[email protected]> wrote:

> On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * H. Peter Anvin <[email protected]> wrote:
> >
> >> %es is used implicitly by string instructions.
> >
> > Ok, so we are probably better off reloading ES as well early, right
> > when we return from the firmware, just in case something does
> > a copy before we hit the ES restore in restore_processor_state(),
> > which is a generic C function?
> >
> > Something like the patch below?
> >
> > I also added FS/GS/SS reloading to make it complete. If this (or a variant
> > thereof, it's still totally untested) works then we can remove the segment
> > save/restore layer in __save/restore_processor_state().
> >
> > Thanks,
> >
> > Ingo
> >
> > ===========>
> > arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> > index 665c6b7d2ea9..1376a7fc21b7 100644
> > --- a/arch/x86/kernel/acpi/wakeup_32.S
> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
> > @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
> >
> >
> > restore_registers:
> > + /*
> > + * In case the BIOS corrupted our segment descriptors,
> > + * reload them to clear out any shadow descriptor
> > + * state:
> > + */
> > + movl $__USER_DS, %eax
> > + movl %eax, %ds
> > + movl %eax, %es
> > + movl %eax, %fs
> > + movl %eax, %gs
> > + movl $__KERNEL_DS, %eax
> > + movl %eax, %ss
> > +
> > movl saved_context_ebp, %ebp
> > movl saved_context_ebx, %ebx
> > movl saved_context_esi, %esi
>
> If you follow the convoluted flow of the calls in this file, wakeup_pmode_return
> is the first thing called from the trampoline on resume, and that loads the data
> segments with __KERNEL_DS. [...]

So if wakeup_pmode_return is really the first thing called then the whole premise
of shadow descriptor corruption goes out the window: we reload all relevant
segment registers.

Which leaves us with only two small channels through which the patch might make a
bug go away:

- timing, as it introduces a small delay

- code/cache layout, as it slightly rearranges the code

... but both of these are in the 'grasping at straws' category of hypotheses
really.

Thanks,

Ingo

2015-06-13 07:16:07

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH, DEBUG] x86/32: Add small delay after resume


* Srinivas Pandruvada <[email protected]> wrote:

> > Also, could you please describe how the failure triggers in your system: how
> > many times do you have to suspend/resume to trigger the segfaults, and is
> > there anything that makes the failures less or more likely?
>
> It is very random. Sometimes only few hundred trys reproduce this issue. Some
> other times it requires thousands of trys (sometimes not reproducible at all for
> days) It is very time sensitive.

So the very same kernel image will produce different crash patterns depending on
the time of day? That suggests heat/hardware problems.

> [...] A small delay or some debug code in resume path prevents this to crash.

Fun...

> The BIOS folks created special version to check if they are corrupting any DS,
> but they were not able to catch any corruption. [...]

So is it true that we always execute wakeup_pmode_return first after we return
from the BIOS?

If so then the BIOS touching DS cannot be an issue, as we re-initialize all
segment selectors, which reloads the descriptors:

ENTRY(wakeup_pmode_return)
wakeup_pmode_return:
movw $__KERNEL_DS, %ax
movw %ax, %ss
movw %ax, %ds
movw %ax, %es
movw %ax, %fs
movw %ax, %gs

# reload the gdt, as we need the full 32 bit address
lidt saved_idt
lldt saved_ldt
ljmp $(__KERNEL_CS), $1f

> [...] Since these are special deployed systems running critical application,
> need to request the tests again with your changes. May take long time.

So my second patch is clearly broken as per Brian Gerst's comments.

What I would suggest is to try a patch that adds just 100 NOPs or so - attached
below. This patch will add a small delay without any side effects (other than
changing the kernel image layout).

If that makes the crash go away, then I'd say the likelihood that it's hardware
related increases substantially: maybe a PLL has not stabilized yet sufficiently
after resume, or there's some latent heat sensitivity and the fan has not started
up yet, etc.

( You can then use this simple delay generating patch in production systems as
well, to work around the problem. Maybe convince the BIOS folks to add a delay
like this to their resume path before they call Linux. )

Thanks,

Ingo

=================>

arch/x86/kernel/acpi/wakeup_32.S | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..ef26999da80a 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -10,6 +10,12 @@

ENTRY(wakeup_pmode_return)
wakeup_pmode_return:
+
+ /* Timing delay of a few dozen cycles: give the hardware some time to recover */
+ .rept 100
+ nop
+ .endr
+
movw $__KERNEL_DS, %ax
movw %ax, %ss
movw %ax, %ds

2015-06-13 14:20:17

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Sat 2015-06-13 00:45:29, Denys Vlasenko wrote:
> On Fri, Jun 12, 2015 at 5:48 PM, Brian Gerst <[email protected]> wrote:
> > If you follow the convoluted flow of the calls in this file,
> > ...
>
> Speaking of which. It is indeed quite bad.
>
> For one, saved_eip is only ever set to point to ret_point:
>
> ENTRY(saved_eip) .long 0
> ...
>
> movl $ret_point, saved_eip
>
> and it has just a single user, where an indirect jump
> through it is performed:
>
> # jump to place where we left off
> movl saved_eip, %eax
> jmp *%eax
>
> No comments why it is so.
>
> All this seems to be equivalent to trivial
>
> # jump to place where we left off
> jmp ret_point
>
> Am I missing something?

I don't think so. Its just that slowdown was not bad enough tofix it.

...patch would be welcome, and even better if you could check the issue
on 64-bit kernel, too...
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-06-13 18:23:59

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Sat, Jun 13, 2015 at 12:03 AM, Ingo Molnar <[email protected]> wrote:
>
> * Brian Gerst <[email protected]> wrote:
>
>> On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <[email protected]> wrote:
>> >
>> > * H. Peter Anvin <[email protected]> wrote:
>> >
>> >> %es is used implicitly by string instructions.
>> >
>> > Ok, so we are probably better off reloading ES as well early, right
>> > when we return from the firmware, just in case something does
>> > a copy before we hit the ES restore in restore_processor_state(),
>> > which is a generic C function?
>> >
>> > Something like the patch below?
>> >
>> > I also added FS/GS/SS reloading to make it complete. If this (or a variant
>> > thereof, it's still totally untested) works then we can remove the segment
>> > save/restore layer in __save/restore_processor_state().
>> >
>> > Thanks,
>> >
>> > Ingo
>> >
>> > ===========>
>> > arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
>> > 1 file changed, 13 insertions(+)
>> >
>> > diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
>> > index 665c6b7d2ea9..1376a7fc21b7 100644
>> > --- a/arch/x86/kernel/acpi/wakeup_32.S
>> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
>> > @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
>> >
>> >
>> > restore_registers:
>> > + /*
>> > + * In case the BIOS corrupted our segment descriptors,
>> > + * reload them to clear out any shadow descriptor
>> > + * state:
>> > + */
>> > + movl $__USER_DS, %eax
>> > + movl %eax, %ds
>> > + movl %eax, %es
>> > + movl %eax, %fs
>> > + movl %eax, %gs
>> > + movl $__KERNEL_DS, %eax
>> > + movl %eax, %ss
>> > +
>> > movl saved_context_ebp, %ebp
>> > movl saved_context_ebx, %ebx
>> > movl saved_context_esi, %esi
>>
>> If you follow the convoluted flow of the calls in this file, wakeup_pmode_return
>> is the first thing called from the trampoline on resume, and that loads the data
>> segments with __KERNEL_DS. [...]
>
> So if wakeup_pmode_return is really the first thing called then the whole premise
> of shadow descriptor corruption goes out the window: we reload all relevant
> segment registers.

True, but it still leaves the fact that we're loading __KERNEL_DS
instead of __USER_DS, right? So we end up in the kernel in some
context (I have no clue what context) with __KERNEL_DS loaded. It's
very easy for us to inadvertently fix it: we could return to userspace
by any means whatsoever except SYSEXIT, or we could even return back
to some preempted kernel context.

I still think we should replace __KERNEL_DS with __USER_DS in
wakeup_pmode_return and see if the problem goes away.

--Andy

2015-06-13 21:30:25

by Brian Gerst

[permalink] [raw]
Subject: Re: [PATCH] x86: General protection fault after STR (32 bit systems only)

On Sat, Jun 13, 2015 at 2:23 PM, Andy Lutomirski <[email protected]> wrote:
> On Sat, Jun 13, 2015 at 12:03 AM, Ingo Molnar <[email protected]> wrote:
>>
>> * Brian Gerst <[email protected]> wrote:
>>
>>> On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <[email protected]> wrote:
>>> >
>>> > * H. Peter Anvin <[email protected]> wrote:
>>> >
>>> >> %es is used implicitly by string instructions.
>>> >
>>> > Ok, so we are probably better off reloading ES as well early, right
>>> > when we return from the firmware, just in case something does
>>> > a copy before we hit the ES restore in restore_processor_state(),
>>> > which is a generic C function?
>>> >
>>> > Something like the patch below?
>>> >
>>> > I also added FS/GS/SS reloading to make it complete. If this (or a variant
>>> > thereof, it's still totally untested) works then we can remove the segment
>>> > save/restore layer in __save/restore_processor_state().
>>> >
>>> > Thanks,
>>> >
>>> > Ingo
>>> >
>>> > ===========>
>>> > arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
>>> > 1 file changed, 13 insertions(+)
>>> >
>>> > diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
>>> > index 665c6b7d2ea9..1376a7fc21b7 100644
>>> > --- a/arch/x86/kernel/acpi/wakeup_32.S
>>> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
>>> > @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
>>> >
>>> >
>>> > restore_registers:
>>> > + /*
>>> > + * In case the BIOS corrupted our segment descriptors,
>>> > + * reload them to clear out any shadow descriptor
>>> > + * state:
>>> > + */
>>> > + movl $__USER_DS, %eax
>>> > + movl %eax, %ds
>>> > + movl %eax, %es
>>> > + movl %eax, %fs
>>> > + movl %eax, %gs
>>> > + movl $__KERNEL_DS, %eax
>>> > + movl %eax, %ss
>>> > +
>>> > movl saved_context_ebp, %ebp
>>> > movl saved_context_ebx, %ebx
>>> > movl saved_context_esi, %esi
>>>
>>> If you follow the convoluted flow of the calls in this file, wakeup_pmode_return
>>> is the first thing called from the trampoline on resume, and that loads the data
>>> segments with __KERNEL_DS. [...]
>>
>> So if wakeup_pmode_return is really the first thing called then the whole premise
>> of shadow descriptor corruption goes out the window: we reload all relevant
>> segment registers.
>
> True, but it still leaves the fact that we're loading __KERNEL_DS
> instead of __USER_DS, right? So we end up in the kernel in some
> context (I have no clue what context) with __KERNEL_DS loaded. It's
> very easy for us to inadvertently fix it: we could return to userspace
> by any means whatsoever except SYSEXIT, or we could even return back
> to some preempted kernel context.
>
> I still think we should replace __KERNEL_DS with __USER_DS in
> wakeup_pmode_return and see if the problem goes away.

I'm pretty sure that's what the problem is. If you look at the
sysexit path, it never reloads ds/es. It assumes they are still
__USER_DS set at sysenter. The iret path does restore all the user
segments.

--
Brian Gerst

2015-06-14 06:56:50

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] x86: Load __USER_DS into DS/ES after resume


* Brian Gerst <[email protected]> wrote:

> >> So if wakeup_pmode_return is really the first thing called then the whole
> >> premise of shadow descriptor corruption goes out the window: we reload all
> >> relevant segment registers.
> >
> > True, but it still leaves the fact that we're loading __KERNEL_DS instead of
> > __USER_DS, right? So we end up in the kernel in some context (I have no clue
> > what context) with __KERNEL_DS loaded. It's very easy for us to inadvertently
> > fix it: we could return to userspace by any means whatsoever except SYSEXIT,
> > or we could even return back to some preempted kernel context.
> >
> > I still think we should replace __KERNEL_DS with __USER_DS in
> > wakeup_pmode_return and see if the problem goes away.
>
> I'm pretty sure that's what the problem is. If you look at the sysexit path, it
> never reloads ds/es. It assumes they are still __USER_DS set at sysenter. The
> iret path does restore all the user segments.

Ok, so something like the patch below, right?

Thanks,

Ingo

=====================>
arch/x86/kernel/acpi/wakeup_32.S | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..7302bbaea184 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -12,11 +12,13 @@ ENTRY(wakeup_pmode_return)
wakeup_pmode_return:
movw $__KERNEL_DS, %ax
movw %ax, %ss
- movw %ax, %ds
- movw %ax, %es
movw %ax, %fs
movw %ax, %gs

+ movw $__KERNEL_DS, %ax
+ movw %ax, %ds
+ movw %ax, %es
+
# reload the gdt, as we need the full 32 bit address
lidt saved_idt
lldt saved_ldt

2015-06-14 07:03:33

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] x86: Load __USER_DS into DS/ES after resume

On Sun 2015-06-14 08:56:35, Ingo Molnar wrote:
>
> * Brian Gerst <[email protected]> wrote:
>
> > >> So if wakeup_pmode_return is really the first thing called then the whole
> > >> premise of shadow descriptor corruption goes out the window: we reload all
> > >> relevant segment registers.
> > >
> > > True, but it still leaves the fact that we're loading __KERNEL_DS instead of
> > > __USER_DS, right? So we end up in the kernel in some context (I have no clue
> > > what context) with __KERNEL_DS loaded. It's very easy for us to inadvertently
> > > fix it: we could return to userspace by any means whatsoever except SYSEXIT,
> > > or we could even return back to some preempted kernel context.
> > >
> > > I still think we should replace __KERNEL_DS with __USER_DS in
> > > wakeup_pmode_return and see if the problem goes away.
> >
> > I'm pretty sure that's what the problem is. If you look at the sysexit path, it
> > never reloads ds/es. It assumes they are still __USER_DS set at sysenter. The
> > iret path does restore all the user segments.
>
> Ok, so something like the patch below, right?
>
> Thanks,
>
> Ingo
>
> =====================>
> arch/x86/kernel/acpi/wakeup_32.S | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..7302bbaea184 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -12,11 +12,13 @@ ENTRY(wakeup_pmode_return)
> wakeup_pmode_return:
> movw $__KERNEL_DS, %ax
> movw %ax, %ss
> - movw %ax, %ds
> - movw %ax, %es
> movw %ax, %fs
> movw %ax, %gs
>
> + movw $__KERNEL_DS, %ax
> + movw %ax, %ds
> + movw %ax, %es

Umm. Are you sure? :-).
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-06-15 16:12:38

by srinivas pandruvada

[permalink] [raw]
Subject: Re: [PATCH, DEBUG] x86/32: Add small delay after resume

On Sat, 2015-06-13 at 09:15 +0200, Ingo Molnar wrote:
> * Srinivas Pandruvada <[email protected]> wrote:
>
> > > Also, could you please describe how the failure triggers in your system: how
> > > many times do you have to suspend/resume to trigger the segfaults, and is
> > > there anything that makes the failures less or more likely?
> >
> > It is very random. Sometimes only few hundred trys reproduce this issue. Some
> > other times it requires thousands of trys (sometimes not reproducible at all for
> > days) It is very time sensitive.
>
> So the very same kernel image will produce different crash patterns depending on
> the time of day? That suggests heat/hardware problems.
>
> > [...] A small delay or some debug code in resume path prevents this to crash.
>
> Fun...
>
> > The BIOS folks created special version to check if they are corrupting any DS,
> > but they were not able to catch any corruption. [...]
>
> So is it true that we always execute wakeup_pmode_return first after we return
> from the BIOS?
>
> If so then the BIOS touching DS cannot be an issue, as we re-initialize all
> segment selectors, which reloads the descriptors:
>
> ENTRY(wakeup_pmode_return)
> wakeup_pmode_return:
> movw $__KERNEL_DS, %ax
> movw %ax, %ss
> movw %ax, %ds
> movw %ax, %es
> movw %ax, %fs
> movw %ax, %gs
>
> # reload the gdt, as we need the full 32 bit address
> lidt saved_idt
> lldt saved_ldt
> ljmp $(__KERNEL_CS), $1f
>
> > [...] Since these are special deployed systems running critical application,
> > need to request the tests again with your changes. May take long time.
>
> So my second patch is clearly broken as per Brian Gerst's comments.
>
> What I would suggest is to try a patch that adds just 100 NOPs or so - attached
> below. This patch will add a small delay without any side effects (other than
> changing the kernel image layout).
>
> If that makes the crash go away, then I'd say the likelihood that it's hardware
> related increases substantially: maybe a PLL has not stabilized yet sufficiently
> after resume, or there's some latent heat sensitivity and the fan has not started
> up yet, etc.

> ( You can then use this simple delay generating patch in production systems as
> well, to work around the problem. Maybe convince the BIOS folks to add a delay
> like this to their resume path before they call Linux. )
This was already experimented. They added delay in BIOS before handing
over to OS, the crash still occurred.
We were thinking that BIOS SMI handler responsible for suspend/wake up
corrupted the DS even after control passed over to OS. But couldn't
prove it.
Thanks for your valuable debugging suggestions.

Thanks,
Srinivas

>
> Thanks,
>
> Ingo
>
> =================>
>
> arch/x86/kernel/acpi/wakeup_32.S | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..ef26999da80a 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -10,6 +10,12 @@
>
> ENTRY(wakeup_pmode_return)
> wakeup_pmode_return:
> +
> + /* Timing delay of a few dozen cycles: give the hardware some time to recover */
> + .rept 100
> + nop
> + .endr
> +
> movw $__KERNEL_DS, %ax
> movw %ax, %ss
> movw %ax, %ds

2015-06-16 21:34:20

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH, DEBUG] x86/32: Add small delay after resume

On 06/15/2015 09:10 AM, Srinivas Pandruvada wrote:
>>
>> So is it true that we always execute wakeup_pmode_return first after we return
>> from the BIOS?
>>
>> If so then the BIOS touching DS cannot be an issue, as we re-initialize all
>> segment selectors, which reloads the descriptors:
>>
>> ENTRY(wakeup_pmode_return)
>> wakeup_pmode_return:
>> movw $__KERNEL_DS, %ax
>> movw %ax, %ss
>> movw %ax, %ds
>> movw %ax, %es
>> movw %ax, %fs
>> movw %ax, %gs
>>
>> # reload the gdt, as we need the full 32 bit address
>> lidt saved_idt
>> lldt saved_ldt
>> ljmp $(__KERNEL_CS), $1f
>>

Where does the GDT get initialized?

-hpa

2015-06-16 22:27:51

by srinivas pandruvada

[permalink] [raw]
Subject: Re: [PATCH, DEBUG] x86/32: Add small delay after resume

On Tue, 2015-06-16 at 14:33 -0700, H. Peter Anvin wrote:
> On 06/15/2015 09:10 AM, Srinivas Pandruvada wrote:
> >>
> >> So is it true that we always execute wakeup_pmode_return first after we return
> >> from the BIOS?
> >>
> >> If so then the BIOS touching DS cannot be an issue, as we re-initialize all
> >> segment selectors, which reloads the descriptors:
> >>
> >> ENTRY(wakeup_pmode_return)
> >> wakeup_pmode_return:
> >> movw $__KERNEL_DS, %ax
> >> movw %ax, %ss
> >> movw %ax, %ds
> >> movw %ax, %es
> >> movw %ax, %fs
> >> movw %ax, %gs
> >>
> >> # reload the gdt, as we need the full 32 bit address
> >> lidt saved_idt
> >> lldt saved_ldt
> >> ljmp $(__KERNEL_CS), $1f
> >>
>
> Where does the GDT get initialized?
In wakeup_start
During acpi_sleep_prepare we set a 32 bit FW waking vector which points
to wakeup_start.

Thanks,
Srinivas
>
> -hpa
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-06-17 16:35:14

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH, DEBUG] x86/32: Add small delay after resume

On Tue, Jun 16, 2015 at 02:33:10PM -0700, H. Peter Anvin wrote:
> On 06/15/2015 09:10 AM, Srinivas Pandruvada wrote:
> >>
> >> So is it true that we always execute wakeup_pmode_return first after we return
> >> from the BIOS?
> >>
> >> If so then the BIOS touching DS cannot be an issue, as we re-initialize all
> >> segment selectors, which reloads the descriptors:
> >>
> >> ENTRY(wakeup_pmode_return)
> >> wakeup_pmode_return:
> >> movw $__KERNEL_DS, %ax
> >> movw %ax, %ss
> >> movw %ax, %ds
> >> movw %ax, %es
> >> movw %ax, %fs
> >> movw %ax, %gs
> >>
> >> # reload the gdt, as we need the full 32 bit address
> >> lidt saved_idt
> >> lldt saved_ldt
> >> ljmp $(__KERNEL_CS), $1f
> >>
>
> Where does the GDT get initialized?
>
> -hpa

mit 84e70971e67d97bc2db18a4e76d42846272a54bd
Author: Konrad Rzeszutek Wilk <[email protected]>
Date: Fri Apr 5 16:42:22 2013 -0400

x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed

>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-06-17 17:23:42

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH, DEBUG] x86/32: Add small delay after resume

On 06/17/2015 09:33 AM, Konrad Rzeszutek Wilk wrote:
>>>>
>>
>> Where does the GDT get initialized?
>>
>> -hpa
>
> mit 84e70971e67d97bc2db18a4e76d42846272a54bd
> Author: Konrad Rzeszutek Wilk <[email protected]>
> Date: Fri Apr 5 16:42:22 2013 -0400
>
> x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed
>

Store, no. LOAD?

-hpa

2015-06-17 18:30:53

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH, DEBUG] x86/32: Add small delay after resume

On Wed, Jun 17, 2015 at 10:22:49AM -0700, H. Peter Anvin wrote:
> On 06/17/2015 09:33 AM, Konrad Rzeszutek Wilk wrote:
> >>>>
> >>
> >> Where does the GDT get initialized?
> >>
> >> -hpa
> >
> > mit 84e70971e67d97bc2db18a4e76d42846272a54bd
> > Author: Konrad Rzeszutek Wilk <[email protected]>
> > Date: Fri Apr 5 16:42:22 2013 -0400
> >
> > x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed
> >
>
> Store, no. LOAD?

__save_processor_state:
..
/*
* We save it here, but restore it only in the hibernate case.
* For ACPI S3 resume, this is loaded via 'early_gdt_desc' in
* 64-bit
* mode in "secondary_startup_64". In 32-bit mode it is done via
* 'pmode_gdt' in wakeup_start.
*/

>
> -hpa
>
>