2003-05-28 06:30:19

by Milton Miller

[permalink] [raw]
Subject: [PATCH] fix oops on resume from apm bios initiated suspend


Hi Pavel.

Didn't know if you caught this one, but it fixes it for me and others
who responded on the list.

mm is NULL for kernel threads without their own context. active_mm is
maintained the one we lazly switch from.

Without this patch, apm bios initiated suspend events (eg panel close)
cause an oops on resume in the LDT restore, killing kapmd, which causes
further events to not be polled.

milton

===== arch/i386/kernel/suspend.c 1.16 vs edited =====
--- 1.16/arch/i386/kernel/suspend.c Sat May 17 16:09:37 2003
+++ edited/arch/i386/kernel/suspend.c Sat May 24 05:00:02 2003
@@ -114,7 +114,7 @@
cpu_gdt_table[cpu][GDT_ENTRY_TSS].b &= 0xfffffdff;

load_TR_desc(); /* This does ltr */
- load_LDT(&current->mm->context); /* This does lldt */
+ load_LDT(&current->active_mm->context); /* This does lldt */

/*
* Now maybe reload the debug registers


2003-05-28 11:01:16

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] fix oops on resume from apm bios initiated suspend

Hi!

> Didn't know if you caught this one, but it fixes it for me and others
> who responded on the list.
>
> mm is NULL for kernel threads without their own context. active_mm is
> maintained the one we lazly switch from.
>
> Without this patch, apm bios initiated suspend events (eg panel close)
> cause an oops on resume in the LDT restore, killing kapmd, which causes
> further events to not be polled.

Ouch, okay, this looks good. Andrew please apply.

[I guess this is trivial enough for trivial patch monkey if andrew
does not want to take it...]
Pavel

> ===== arch/i386/kernel/suspend.c 1.16 vs edited =====
> --- 1.16/arch/i386/kernel/suspend.c Sat May 17 16:09:37 2003
> +++ edited/arch/i386/kernel/suspend.c Sat May 24 05:00:02 2003
> @@ -114,7 +114,7 @@
> cpu_gdt_table[cpu][GDT_ENTRY_TSS].b &= 0xfffffdff;
>
> load_TR_desc(); /* This does ltr */
> - load_LDT(&current->mm->context); /* This does lldt */
> + load_LDT(&current->active_mm->context); /* This does lldt */
>
> /*
> * Now maybe reload the debug registers

--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2003-05-28 13:02:54

by Mikael Pettersson

[permalink] [raw]
Subject: Re: [PATCH] fix oops on resume from apm bios initiated suspend

Pavel Machek writes:
> Hi!
>
> > Didn't know if you caught this one, but it fixes it for me and others
> > who responded on the list.
> >
> > mm is NULL for kernel threads without their own context. active_mm is
> > maintained the one we lazly switch from.
> >
> > Without this patch, apm bios initiated suspend events (eg panel close)
> > cause an oops on resume in the LDT restore, killing kapmd, which causes
> > further events to not be polled.
>
> Ouch, okay, this looks good. Andrew please apply.
>
> [I guess this is trivial enough for trivial patch monkey if andrew
> does not want to take it...]
> Pavel
>
> > ===== arch/i386/kernel/suspend.c 1.16 vs edited =====
> > --- 1.16/arch/i386/kernel/suspend.c Sat May 17 16:09:37 2003
> > +++ edited/arch/i386/kernel/suspend.c Sat May 24 05:00:02 2003
> > @@ -114,7 +114,7 @@
> > cpu_gdt_table[cpu][GDT_ENTRY_TSS].b &= 0xfffffdff;
> >
> > load_TR_desc(); /* This does ltr */
> > - load_LDT(&current->mm->context); /* This does lldt */
> > + load_LDT(&current->active_mm->context); /* This does lldt */

No one has still explained WHY kapmd's current->mm is NULL for some people,
while it obviously is non-NULL for many others. That worries me a lot more.

2003-05-28 14:05:38

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] fix oops on resume from apm bios initiated suspend

Hi!

> > > Didn't know if you caught this one, but it fixes it for me and others
> > > who responded on the list.
> > >
> > > mm is NULL for kernel threads without their own context. active_mm is
> > > maintained the one we lazly switch from.
> > >
> > > Without this patch, apm bios initiated suspend events (eg panel close)
> > > cause an oops on resume in the LDT restore, killing kapmd, which causes
> > > further events to not be polled.
> >
> > Ouch, okay, this looks good. Andrew please apply.
> >
> > [I guess this is trivial enough for trivial patch monkey if andrew
> > does not want to take it...]
> > Pavel
> >
> > > ===== arch/i386/kernel/suspend.c 1.16 vs edited =====
> > > --- 1.16/arch/i386/kernel/suspend.c Sat May 17 16:09:37 2003
> > > +++ edited/arch/i386/kernel/suspend.c Sat May 24 05:00:02 2003
> > > @@ -114,7 +114,7 @@
> > > cpu_gdt_table[cpu][GDT_ENTRY_TSS].b &= 0xfffffdff;
> > >
> > > load_TR_desc(); /* This does ltr */
> > > - load_LDT(&current->mm->context); /* This does lldt */
> > > + load_LDT(&current->active_mm->context); /* This does lldt */
>
> No one has still explained WHY kapmd's current->mm is NULL for some people,
> while it obviously is non-NULL for many others. That worries me a
> > > lot more.

That's seems like random variable to me. It probably depends on timing
during suspend. If code is buggy fix it.

Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2003-05-28 22:17:37

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] fix oops on resume from apm bios initiated suspend

[email protected] wrote:
>
> > > - load_LDT(&current->mm->context); /* This does lldt */
> > > + load_LDT(&current->active_mm->context); /* This does lldt */
>
> No one has still explained WHY kapmd's current->mm is NULL for some people,
> while it obviously is non-NULL for many others.

All kernel threads have current->mm = NULL, via daemonize()->exit_mm(). So
the question becomes "why does this code get called by kernel threads for
some people, and not for others"? Pavel?

Also, is there any point in doing load_LDT(&current->active_mm->context)
for a kernel thread?

> That worries me a lot more.

I'm more worried by the 113-column line in fix_processor_context() :)

2003-05-28 22:52:42

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] fix oops on resume from apm bios initiated suspend

Hi!

> > > > - load_LDT(&current->mm->context); /* This does lldt */
> > > > + load_LDT(&current->active_mm->context); /* This does lldt */
> >
> > No one has still explained WHY kapmd's current->mm is NULL for some people,
> > while it obviously is non-NULL for many others.
>
> All kernel threads have current->mm = NULL, via daemonize()->exit_mm(). So
> the question becomes "why does this code get called by kernel threads for
> some people, and not for others"? Pavel?

I believe it depends on what process happens to be current at time of
suspend. That can be randomly kernel thread or user process.

Some people use APM, some people use ACPI, and sometimes APM suspend
is triggered because of BIOS, sometimes because user said apm -s...

> Also, is there any point in doing load_LDT(&current->active_mm->context)
> for a kernel thread?

Yes, we want system to be similar state it was when we suspended, to
prevent heisenbugs.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2003-05-28 22:57:51

by Manfred Spraul

[permalink] [raw]
Subject: Re: [PATCH] fix oops on resume from apm bios initiated suspend

Andrew Morton wrote:

>Also, is there any point in doing load_LDT(&current->active_mm->context)
>for a kernel thread?
>
>
Yes.
If the next thread switch is to a user space process that uses
kernel_thread->active_mm as user_thread->mm, then switch_to does not
load the ldt. (switch_to, in <asm-i386/mmu_context.h>)
--
Manfred

2003-05-28 23:01:42

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [PATCH] fix oops on resume from apm bios initiated suspend

On Thu, 2003-05-29 at 11:05, Pavel Machek wrote:
> Yes, we want system to be similar state it was when we suspended, to
> prevent heisenbugs.

Heisenbugs? What's that in English? Sounds like it might be house bugs!

Regards,

Nigel

2003-05-28 23:23:54

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] fix oops on resume from apm bios initiated suspend

Hi!

> > Yes, we want system to be similar state it was when we suspended, to
> > prevent heisenbugs.
>
> Heisenbugs? What's that in English? Sounds like it might be house
> bugs!

No, its after Heisenberg from quantum theory (and used incorrectly,
because it would be probably only difficult to debug, not depending on
debugger). See google, first link.
Pavel

--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2003-05-29 08:31:01

by Alex Riesen

[permalink] [raw]
Subject: Re: [PATCH] fix oops on resume from apm bios initiated suspend

Milton Miller, Wed, May 28, 2003 08:43:27 +0200:
>
> Didn't know if you caught this one, but it fixes it for me and others
> who responded on the list.
>

it fixes the oops after restoring from suspend,
but the system still became unstable: random segfaults, random oopses.