2000-12-07 16:38:24

by Richard J Moore

Subject: Why is double_fault serviced by a trap gate?



Why is double_fault serviced by a trap gate? The problem with this is that
any double-fault caused by a stack-fault, which is the usual reason,
becomes a triple-fault. And a triple-fault results in a processor reset or
shutdown making the fault damn near impossible to get any information on.

Oughtn't the double-fault exception handler be serviced by a task gate? And
similarly the NMI handler in case the NMI is on the current stack page
frame?




Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK



2000-12-07 16:44:34

by Andi Kleen

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, Dec 07, 2000 at 04:04:21PM +0000, [email protected] wrote:
>
>
> Why is double_fault serviced by a trap gate? The problem with this is that
> any double-fault caused by a stack-fault, which is the usual reason,
> becomes a triple-fault. And a triple-fault results in a processor reset or
> shutdown making the fault damn near impossible to get any information on.
>
> Oughtn't the double-fault exception handler be serviced by a task gate? And
> similarly the NMI handler in case the NMI is on the current stack page
> frame?

Sounds like a good idea, when you can afford a few K for a special
NMI/double fault stack. On x86-64 it is planned to do that.


-Andi

2000-12-07 17:03:05

by Richard B. Johnson

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000, Andi Kleen wrote:

> On Thu, Dec 07, 2000 at 04:04:21PM +0000, [email protected] wrote:
> >
> >
> > Why is double_fault serviced by a trap gate? The problem with this is that
> > any double-fault caused by a stack-fault, which is the usual reason,
> > becomes a triple-fault. And a triple-fault results in a processor reset or
> > shutdown making the fault damn near impossible to get any information on.
> >
> > Oughtn't the double-fault exception handler be serviced by a task gate? And
> > similarly the NMI handler in case the NMI is on the current stack page
> > frame?
>
> Sounds like a good idea, when you can afford a few K for a special
> NMI/double fault stack. On x86-64 it is planned to do that.
>
>

Well, at least on current ix86 processors it can't. Attempting to
use a task gate appears to be a trick to cause the exception to
be handled on the current stack. The hardware protection hierarchy
won't let this happen. You need to have a stack that is not accessible
from the mode that will be trapped. Otherwise, a user could crash
the machine by setting ESP to 0 and waiting for the next context-
switch or timer-tick.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-12-07 17:46:30

by Petr Vandrovec

Subject: Re: Why is double_fault serviced by a trap gate?

On 7 Dec 00 at 11:31, Richard B. Johnson wrote:
> On Thu, 7 Dec 2000, Andi Kleen wrote:
> > On Thu, Dec 07, 2000 at 04:04:21PM +0000, [email protected] wrote:
> > >
> > >
> > > Why is double_fault serviced by a trap gate? The problem with this is that
> > > any double-fault caused by a stack-fault, which is the usual reason,
> > > becomes a triple-fault. And a triple-fault results in a processor reset or
> > > shutdown making the fault damn near impossible to get any information on.
> > >
> > > Oughtn't the double-fault exception handler be serviced by a task gate? And
> > > similarly the NMI handler in case the NMI is on the current stack page
> > > frame?
> >
> > Sounds like a good idea, when you can afford a few K for a special
> > NMI/double fault stack. On x86-64 it is planned to do that.
> >
> >
>
> Well, at least on current ix86 processors it can't. Attempting to
> use a task gate appears to be a trick to cause the exception to
> be handled on the current stack. The hardware protection hierarchy
> won't let this happen. You need to have a stack that is not accessible

No. If interrupt uses task gate, task switch happens. Nothing is stored
in context of old process except registers into TSS. There is only one
(bad) problem. If you want to get it 100% proof (it is not needed for double
fault, but it is definitely needed for NMI, as NMI is very often on SMP
ia32), each CPU's IRQ vector must point to different task, otherwise you
can get TSS in use during doublefault, leading to triplefault again...

> from the mode that will be trapped. Otherwise, a user could crash
> the machine by setting ESP to 0 and waiting for the next context-
> switch or timer-tick.

Yes. Currently if any ESP related problem happens in kernel, machine silently
reboots without any message. With task gate (as Jeff Merkey proposed
some months ago, btw), you can even suspend offending task and recover
from it... I think that also bluesmoke should use task gate, but I
did not read documentation on this yet.
Best regards,
Petr Vandrovec
[email protected]

2000-12-07 17:52:59

by Maciej W. Rozycki

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000, Andi Kleen wrote:

> > Why is double_fault serviced by a trap gate? The problem with this is that
> > any double-fault caused by a stack-fault, which is the usual reason,
> > becomes a triple-fault. And a triple-fault results in a processor reset or
> > shutdown making the fault damn near impossible to get any information on.
> >
> > Oughtn't the double-fault exception handler be serviced by a task gate? And
> > similarly the NMI handler in case the NMI is on the current stack page
> > frame?
>
> Sounds like a good idea, when you can afford a few K for a special
> NMI/double fault stack. On x86-64 it is planned to do that.

A task gate is an absolute must for the double fault if we want to have a
working handler. Intel warns the CPU state can be inconsistent when a
double fault happens and for example I've seen cases where the saved CS
and EIP were not matching each other (tests were not conducted under
Linux). Also SS:ESP might be unusable leading to a triple fault.
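
For reference, the difference between the gate types shows up directly in the
IDT descriptor. The following is only a sketch of the ia32 descriptor encoding,
not kernel code: the helper names make_code_gate() and make_task_gate() are
made up for illustration, and any selector values passed in are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Gate types from the ia32 descriptor format (type field, bits 40..43). */
enum gate_type {
	GATE_TASK      = 0x5,	/* task gate: hardware task switch */
	GATE_INTERRUPT = 0xE,	/* 32-bit interrupt gate */
	GATE_TRAP      = 0xF	/* 32-bit trap gate */
};

/*
 * Hypothetical helper: a trap or interrupt gate names a code segment and
 * an entry point. The handler runs in the current task, so in ring 0 the
 * exception frame is pushed on whatever stack is current.
 */
uint64_t make_code_gate(enum gate_type type, uint16_t cs_sel,
			uint32_t offset, unsigned dpl)
{
	assert(type == GATE_INTERRUPT || type == GATE_TRAP);
	assert(dpl <= 3);

	uint64_t d = 0;
	d |= (uint64_t)(offset & 0xFFFF);	/* offset 15..0 */
	d |= (uint64_t)cs_sel << 16;		/* code segment selector */
	d |= (uint64_t)type << 40;		/* gate type */
	d |= (uint64_t)dpl << 45;		/* descriptor privilege level */
	d |= (uint64_t)1 << 47;			/* present */
	d |= (uint64_t)(offset >> 16) << 48;	/* offset 31..16 */
	return d;
}

/*
 * Hypothetical helper: a task gate carries only a TSS selector. The CPU
 * loads all registers, including SS:ESP, from that TSS.
 */
uint64_t make_task_gate(uint16_t tss_sel, unsigned dpl)
{
	assert(dpl <= 3);

	uint64_t d = 0;
	d |= (uint64_t)tss_sel << 16;		/* TSS selector */
	d |= (uint64_t)GATE_TASK << 40;		/* gate type */
	d |= (uint64_t)dpl << 45;		/* descriptor privilege level */
	d |= (uint64_t)1 << 47;			/* present */
	return d;
}
```

The task gate has no offset field at all, which is why the handler's entry
point and stack have to come from a TSS set up in advance.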

The NMI should be left alone, though, I think as we want it to be fast
for the NMI watchdog. Task gates are not necessarily fast (depending on
how you define "fast").

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2000-12-07 18:36:37

by Andi Kleen

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, Dec 07, 2000 at 05:55:07PM +0100, Maciej W. Rozycki wrote:
> The NMI should be left alone, though, I think as we want it to be fast
> for the NMI watchdog. Task gates are not necessarily fast (depending on
> how you define "fast").

How often does the NMI watchdog handler run ?


-Andi

2000-12-07 18:43:08

by Maciej W. Rozycki

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000, Petr Vandrovec wrote:

> No. If interrupt uses task gate, task switch happens. Nothing is stored
> in context of old process except registers into TSS. There is only one
> (bad) problem. If you want to get it 100% proof (it is not needed for double
> fault, but it is definitely needed for NMI, as NMI is very often on SMP
> ia32), each CPU's IRQ vector must point to different task, otherwise you
> can get TSS in use during doublefault, leading to triplefault again...

Well, I expect wasting a descriptor and a page of memory for the purpose
of a TSS is not a big problem.

> Yes. Currently if any ESP related problem happens in kernel, machine silently
> reboots without any message. With task gate (as Jeff Merkey proposed

You might handle the stack fault with a task gate, actually, but I'm not
sure it's worth the hassle. Handling just the double fault should be
sufficient.

> some months ago, btw), you can even suspend offending task and recover
> from it... I think that also bluesmoke should use task gate, but I
> did not read documentation on this yet.

Yep. An MCE is an abort like a double-fault, so the CPU state might be
corrupted (by definition -- I have no idea whether it really happens).

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2000-12-07 18:55:27

by Maciej W. Rozycki

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000, Andi Kleen wrote:

> How often does the NMI watchdog handler run ?

HZ times per second.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2000-12-07 19:01:17

by Andi Kleen

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, Dec 07, 2000 at 07:11:57PM +0100, Maciej W. Rozycki wrote:
> On Thu, 7 Dec 2000, Andi Kleen wrote:
>
> > How often does the NMI watchdog handler run ?
>
> HZ times per second.

Interesting. One of my ports references for PCs lists

0044 r/w PIT counter 3 (PS/2, EISA)
used as fail-safe timer. generates an NMI on time out.
for user generated NMI see at 0462.



I don't know if modern PCs still provide this counter, but if yes it could
be used for a slow NMI watchdog that only runs every 30s or so.

-Andi

2000-12-07 19:21:51

by Maciej W. Rozycki

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000, Andi Kleen wrote:

> Interesting. One of my ports references for PCs lists
>
> 0044 r/w PIT counter 3 (PS/2, EISA)
> used as fail-safe timer. generates an NMI on time out.
> for user generated NMI see at 0462.

Oh no, we don't use that. Even though we could, it's rare -- it exists
for EISA systems only and then mostly older ones (i.e. non-PCI ones).

In fact the only chipset that provides these additional NMI sources I
have docs for is the i82350 one.

> I don't know if modern PCs still provide this counter, but if yes it could
> be used for a slow NMI watchdog that only runs every 30s or so.

The ability to choose an NMI frequency, especially such a low one, would be
desirable, but it is simply nonexistent on most IA32 systems.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2000-12-07 19:52:02

by Petr Vandrovec

Subject: Re: Why is double_fault serviced by a trap gate?

On 7 Dec 00 at 19:04, Maciej W. Rozycki wrote:
> On Thu, 7 Dec 2000, Petr Vandrovec wrote:
>
> > No. If interrupt uses task gate, task switch happens. Nothing is stored
> > in context of old process except registers into TSS. There is only one
> > (bad) problem. If you want to get it 100% proof (it is not needed for double
> > fault, but it is definitely needed for NMI, as NMI is very often on SMP
> > ia32), each CPU's IRQ vector must point to different task, otherwise you
> > can get TSS in use during doublefault, leading to triplefault again...
>
> Well, I expect wasting a descriptor and a page of memory for the purpose
> of a TSS is not a big problem.

It is architectural problem. Each CPU must have its own IDT or GDT table.
If (for real example) you'll use task gate for NMI, both NMIs are currently
(AFAIK) delivered to both CPUs at same time. Both CPUs find in IDT that
they should switch to task 0x1230. So one of them finds TSS 0x1230 (in GDT
entry 0x1230 / 8) as not busy (busy is field in TSS GDT descriptor), marks
it busy and starts executing in new context. But other one finds 0x1230 as
busy. And fault during doublefault is triplefault. Which is hardwired to
reset and we are where we were before...
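
The busy-TSS problem described above can be sketched as a toy model. This is
not how the hardware is programmed, only an illustration of the check: the
struct and function names are invented, and only the descriptor type values
(0x9 for an available 32-bit TSS, 0xB for a busy one) come from the
architecture.

```c
#include <assert.h>
#include <stdbool.h>

#define TSS_AVAILABLE 0x9	/* 32-bit TSS descriptor type, not busy */
#define TSS_BUSY      0xB	/* same descriptor with the busy bit set */

/* Toy stand-in for a TSS descriptor in the GDT; only the type is modelled. */
struct tss_desc {
	unsigned type;
};

/*
 * Model of the busy check made when an interrupt is dispatched through a
 * task gate. Returns true if the task switch can proceed; false stands
 * for the fault raised because the target TSS is already busy.
 */
bool task_gate_dispatch(struct tss_desc *tss)
{
	assert(tss->type == TSS_AVAILABLE || tss->type == TSS_BUSY);

	if (tss->type == TSS_BUSY)
		return false;
	tss->type = TSS_BUSY;	/* the CPU marks the incoming task busy */
	return true;
}

/*
 * Deliver the same vector on every CPU. tss_for_cpu[i] is the descriptor
 * that CPU i reaches through its IDT and GDT. Returns the number of CPUs
 * that could not enter the handler.
 */
int deliver_to_all(struct tss_desc *tss_for_cpu[], int ncpus)
{
	int failed = 0;

	for (int cpu = 0; cpu < ncpus; cpu++)
		if (!task_gate_dispatch(tss_for_cpu[cpu]))
			failed++;
	return failed;
}
```

With one shared descriptor the second CPU fails; with one descriptor per CPU
(reached through per-CPU IDTs or per-CPU GDTs) both succeed.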

<fiddling through manuals>

Well, Intel recommends 'Invalid TSS' exception to be handled through TSS
too, for obvious reason that CPU state may be half-old and half-new...
But I'm not sure that all vendors handle TSS fault during doublefault
correctly and I do not want to rely on that.

So either each CPU must have its own IDT, pointing to different slots
in GDT, or each CPU must have its own GDT... I prefer IDT, as having
per-CPU GDT could create some really nasty problems (f.e. synchronizing
LDT entries between CPUs) (*) (**).

> > Yes. Currently if any ESP related problem happens in kernel, machine silently
> > reboots without any message. With task gate (as Jeff Merkey proposed
>
> You might handle the stack fault with a task gate, actually, but I'm not
> sure it's worth the hassle. Handling just the double fault should be
> sufficient.

Yes, it is. Directing stackfault to task gate is wrong, as userspace
faults are handled by stackfault. Most of kernelspace stackfaults are
handled by doublefault ;-)

Petr Vandrovec
[email protected]

(*) I have even per-process IDT patch at
ftp://platan.vc.cvut.cz/pub/linux/idt/idts-0.00.tar.gz, so per-cpu
IDTs should be doable too... Patch is for 2.3.11-pre3, so it will
need some tweaking if someone wants to try it...

(**) On other hand, it could allow leaking information. Currently
you can find on which CPU you run with:

#include <stdio.h>

int main(void) {
	unsigned short tr;

	while (1) {
		/* STR stores the 16-bit task register selector */
		asm volatile("str %0" : "=r"(tr));
		printf("CPU %u\n", (unsigned)(tr - 0x60) / 0x20);
	}
}

With per-CPU GDT we could have same value of TR across all CPUs...
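
The arithmetic in the program above is plain selector decoding, which can be
sketched as follows. The selector fields are architectural; the constants
FIRST_TSS_SEL and TSS_SEL_STRIDE are just names given here to the 0x60 and
0x20 used in the program, and they describe an assumed GDT layout, not a
stable interface.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a segment selector as defined by the ia32 architecture. */
unsigned sel_index(uint16_t sel) { return sel >> 3; }		/* descriptor index */
unsigned sel_ti(uint16_t sel)    { return (sel >> 2) & 1; }	/* 0 = GDT, 1 = LDT */
unsigned sel_rpl(uint16_t sel)   { return sel & 3; }		/* requested privilege */

/*
 * Assumed layout, taken from the program above: the first CPU's TSS
 * selector is 0x60 and each further CPU is 0x20 (four descriptors) later.
 */
#define FIRST_TSS_SEL  0x60
#define TSS_SEL_STRIDE 0x20

/* Map a task register value to a CPU number under that assumption. */
int cpu_from_tr(uint16_t tr)
{
	assert(tr >= FIRST_TSS_SEL);
	return (tr - FIRST_TSS_SEL) / TSS_SEL_STRIDE;
}
```

The same decoding gives the "GDT entry 0x1230 / 8" mentioned earlier:
sel_index(0x1230) is 0x246.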

2000-12-07 20:12:03

by Maciej W. Rozycki

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000, Petr Vandrovec wrote:

> It is architectural problem. Each CPU must have its own IDT or GDT table.
> If (for real example) you'll use task gate for NMI, both NMIs are currently
> (AFAIK) delivered to both CPUs at same time. Both CPUs find in IDT that
> they should switch to task 0x1230. So one of them finds TSS 0x1230 (in GDT
> entry 0x1230 / 8) as not busy (busy is field in TSS GDT descriptor), marks
> it busy and starts executing in new context. But other one finds 0x1230 as
> busy. And fault during doublefault is triplefault. Which is hardwired to
> reset and we are where we were before...

This is not a problem itself -- each CPU may have a separate GDT and/or
IDT. We should not use task gates for NMIs but this still applies for
double faults.

> Well, Intel recommends 'Invalid TSS' exception to be handled through TSS
> too, for obvious reason that CPU state may be half-old and half-new...

We could set up a handler similar to the one for the double fault.

> But I'm not sure that all vendors handle TSS fault during doublefault
> correctly and I do not want to rely on that.

That would probably lead to a triple fault, but is it a real problem for
us? It is possible to set up a reliable double fault handler so the
'Invalid TSS' handler would likely get never ever invoked.

> So either each CPU must have its own IDT, pointing to different slots
> in GDT, or each CPU must have its own GDT... I preffer IDT, as having
> per-CPU GDT could create some really nasty problems (f.e. synchronizing
> LDT entries between CPUs) (*) (**).

Just as I wrote earlier...

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2000-12-07 21:43:22

by Richard J Moore

Subject: Re: Why is double_fault serviced by a trap gate?



Which surely we can on today's x86 systems. Even back in the days of OS/2
2.0 running on a 386 with 4Mb RAM we used a taskgate for both NMI and
Double Fault. You need only a minimal stack - 1K, sufficient to save state
and restore ESP to a known point before switching back to the main TSS to
allow normal exception handling to occur.
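
As a rough illustration of what such a dedicated task needs, here is a sketch
of the 32-bit TSS layout with a small private stack. The field layout is the
architectural one (104 bytes); init_fault_tss(), DF_STACK_SIZE and all the
values passed in are hypothetical and only show which fields would have to be
filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS as laid out by the ia32 architecture (104 bytes). */
struct tss32 {
	uint16_t back_link, rsvd0;
	uint32_t esp0;  uint16_t ss0, rsvd1;
	uint32_t esp1;  uint16_t ss1, rsvd2;
	uint32_t esp2;  uint16_t ss2, rsvd3;
	uint32_t cr3, eip, eflags;
	uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
	uint16_t es, rsvd4, cs, rsvd5, ss, rsvd6, ds, rsvd7;
	uint16_t fs, rsvd8, gs, rsvd9;
	uint16_t ldt, rsvd10;
	uint16_t trap, iomap_base;
};

static_assert(sizeof(struct tss32) == 104, "TSS must be 104 bytes");
static_assert(offsetof(struct tss32, eip) == 32, "EIP lives at offset 32");
static_assert(offsetof(struct tss32, esp) == 56, "ESP lives at offset 56");

#define DF_STACK_SIZE 1024	/* the 1K minimal stack mentioned above */

/*
 * Hypothetical setup of a TSS for a double-fault or NMI task: the handler
 * starts at 'handler' with ESP at the top of its own private stack.
 */
void init_fault_tss(struct tss32 *t, uint32_t handler, uint32_t stack_base,
		    uint32_t cr3, uint16_t code_sel, uint16_t data_sel)
{
	assert(t != NULL);

	*t = (struct tss32){0};
	t->cr3 = cr3;				/* page tables for the handler */
	t->eip = handler;			/* entry point */
	t->eflags = 0x2;			/* reserved bit 1 set, IF clear */
	t->esp = stack_base + DF_STACK_SIZE;	/* stack grows down */
	t->esp0 = t->esp;
	t->cs = code_sel;
	t->ss = t->ds = t->es = data_sel;
	t->ss0 = data_sel;
	t->iomap_base = sizeof(*t);		/* no I/O permission bitmap */
}
```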

There is no architectural restriction of the kind some folks have hinted at - as long
as the DPL for the task gates is 3.

There's no problem under MP since the double fault exception will be only
presented on the processor that instigated the problem.

As for NMIs I didn't think they were presented to all processors
simultaneously. If they are then the way to handle that is to map a page of
the GDT, to a unique physical address per-processor - i.e. processor
local storage. The virtual address will be the same on each. This is what
we did under OS/2 SMP.
We also aliased these pages to unique virtual addresses so that they could
be seen by the kernel from any processor context.

The only time you want the NMI handler to be fast is when it's being used
for hand-shaking, which some disk devices do. And perhaps for APIC NMI
class interprocessor interrupts. But I honestly don't think that's really a
good enough reason not to have a task gate for NMI.

The unpredictability of the abort (NMI or Double-fault) refers to the fact
that in general it is indeterminate whether it is a fault or a trap,
and that is a matter of whether the EIP points at or after the instruction
related to the exception. The abort nature of these exceptions is not
really a problem for the exception handler.

In summary I'd say the lack of a task gate is at the very least an
oversight, if not a bug.

If no one else wants to do it I'll see if I can code up the task gates for
the double-fault and NMI.

Richard


Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK


2000-12-07 22:15:19

by Richard B. Johnson

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000 [email protected] wrote:

>
>
> Which surely we can on today's x86 systems. Even back in the days of OS/2
> 2.0 running on a 386 with 4Mb RAM we used a taskgate for both NMI and
> Double Fault. You need only a minimal stack - 1K, sufficient to save state
> and restore ESP to a known point before switching back to the main TSS to
> allow normal exception handling to occur.
>
> There no architectural restriction that some folks have hinted at - as long
> as the DPL for the task gates is 3.
>
[SNIPPED...]

Please refer to page 6-16, Intel486 Microprocessor Family Programmer's
Reference Manual.

The specific text is: "The TSS does not have a stack pointer for a
privilege level 3 stack, because the procedure cannot be called by a less
privileged procedure. The stack for privilege level 3 is preserved by the
contents of SS and EIP registers which have been saved on the stack
of the privilege level called from level 3".

What this means is that a stack-fault in level 3 will kill you no
matter how cute you try to be. And, putting a task gate as call
procedure entry from a trap or fault is just trying to be cute.
It's extra code that will result in the same processor reset.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-12-07 22:32:44

by Petr Vandrovec

Subject: Re: Why is double_fault serviced by a trap gate?

On 7 Dec 00 at 16:44, Richard B. Johnson wrote:
> On Thu, 7 Dec 2000 [email protected] wrote:
>
> > Which surely we can on today's x86 systems. Even back in the days of OS/2
> > 2.0 running on a 386 with 4Mb RAM we used a taskgate for both NMI and
> > Double Fault. You need only a minimal stack - 1K, sufficient to save state
> > and restore ESP to a known point before switching back to the main TSS to
> > allow normal exception handling to occur.
> >
> > There no architectural restriction that some folks have hinted at - as long
> > as the DPL for the task gates is 3.
> >
> [SNIPPED...]
>
> Please refer to page 6-16, Inter486 Microprocessor Family Programmer's
> Reference Manual.
>
> The specifc text is: "The TSS does not have a stack pointer for a
> privilege level 3 stack, because the procedure cannot be called by a less
> privileged procedure. The stack for privilege level 3 is preserved by the
> contents of SS and EIP registers which have been saved on the stack
> of the privilege level called from level 3".
>
> What this means is that a stack-fault in level 3 will kill you no
> matter how cute you try to be. And, putting a task gate as call
> procedure entry from a trap or fault is just trying to be cute.
> It's extra code that will result in the same processor reset.

You misunderstand. There is no SS/ESP for level 3, because you cannot
switch to CPL 3 using CALL/JMP, you can switch to it only through IRET/RETF.
And both of them fetch new SS/ESP from stack...

If stack-fault happens on CPL3, CPU switches to CPL0 (as defined by
stack fault trap gate), executes appropriate code, and then returns
back to CPL3 through IRET.

Maybe you forgot when reading this, that CPL3 is the non-privileged level,
and CPL0 has the most privileges.

Problem with doublefault is that if you overflowed CPL0 stack, you just
cannot service this error on same stack, you must switch to another one.
And only way to switch out from CPL0 stack during fault service is
hardware switch to another TSS.

In either case, nothing is ever pushed into old stack, so doing

movl $0,%esp

does not matter: never in userspace, and not in the kernel either if you have
a task gate for doublefault... In userspace it will not even crash until you send
some signal to that process, or until you execute some call/push/pop yourself.
Petr Vandrovec
[email protected]

2000-12-07 23:09:03

by Brian Gerst

Subject: Re: Why is double_fault serviced by a trap gate?

"Richard B. Johnson" wrote:
>
> On Thu, 7 Dec 2000 [email protected] wrote:
>
> >
> >
> > Which surely we can on today's x86 systems. Even back in the days of OS/2
> > 2.0 running on a 386 with 4Mb RAM we used a taskgate for both NMI and
> > Double Fault. You need only a minimal stack - 1K, sufficient to save state
> > and restore ESP to a known point before switching back to the main TSS to
> > allow normal exception handling to occur.
> >
> > There no architectural restriction that some folks have hinted at - as long
> > as the DPL for the task gates is 3.
> >
> [SNIPPED...]
>
> Please refer to page 6-16, Inter486 Microprocessor Family Programmer's
> Reference Manual.
>
> The specifc text is: "The TSS does not have a stack pointer for a
> privilege level 3 stack, because the procedure cannot be called by a less
> privileged procedure. The stack for privilege level 3 is preserved by the
> contents of SS and EIP registers which have been saved on the stack
> of the privilege level called from level 3".
>
> What this means is that a stack-fault in level 3 will kill you no
> matter how cute you try to be. And, putting a task gate as call
> procedure entry from a trap or fault is just trying to be cute.
> It's extra code that will result in the same processor reset.

No, because the CPL of the task gate would be 0, which means the stack
will be set to tss->esp0. The DPL of 3 means that the descriptor can be
accessed from CPL3. The text you mention generally means that the only
way to get back to CPL3 is with iret (via the saved %cs:%eip and
%ss:%esp pushed on the CPL0/1/2 stack).

--

Brian Gerst

2000-12-07 23:18:34

by Keith Owens

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000 21:09:47 +0000,
[email protected] wrote:
>In summary I'd say the lack of a task gate is at the very least an
>oversight, if not a bug.
>
>If no one else wants to do it I'll see if I can code up the task gates for
>the double-fault and NMI.

If you overflow the kernel stack then you have already scribbled on the
process state at the low end of the kernel stack pages. The process is
definitely not recoverable but you might not even be able to recover
the machine. Corrupt p_opptr and friends, thread_group or pidhash and
other processes can be affected when they follow the chains. However
being able to report the error is a good start, even if you cannot
recover.

If you add task gates, assign enough stack space for debuggers. kdb
does a lot of work when NMI detects a hung cpu and needs stack space to
do that work. A good option is to dedicate a set of process entries
for per cpu task gates, say processes 2-NR_CPUS+1 are dedicated to task
gates.

2000-12-07 23:35:35

by Richard J Moore

Subject: Re: Why is double_fault serviced by a trap gate?



You seem to be misunderstanding the point of the argument: R3 stack fault -
no problem - handled by trap gate for idt vector 12 - recovery is possible
if one wants to handle it. R0 stack fault - big problem, exception 12 is
converted to a double-fault, which is converted to a triple-fault because
vector 8 is a trap gate and not a task gate.
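
The escalation can be written down as a small model. This is a deliberately
simplified sketch, assuming the only thing that can go wrong is an unusable
ring 0 stack; the enum and function names are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum gate { TRAP_GATE, TASK_GATE };
enum outcome { HANDLED_STACK_FAULT, HANDLED_DOUBLE_FAULT, TRIPLE_FAULT };

/*
 * Can the CPU enter a handler through gate g?
 *  - task gate: SS:ESP come from the handler's own TSS, so always yes
 *  - trap gate from ring 3: the CPU switches to SS0:ESP0 from the TSS,
 *    assumed good in this model
 *  - trap gate from ring 0: the frame is pushed on the current stack
 */
bool delivery_ok(enum gate g, int cpl, bool r0_stack_usable)
{
	if (g == TASK_GATE)
		return true;
	if (cpl == 3)
		return true;
	return r0_stack_usable;
}

/*
 * Outcome of a stack fault (vector 12) raised at privilege level cpl.
 * r0_stack_usable only matters when cpl is 0.
 */
enum outcome stack_fault(int cpl, bool r0_stack_usable,
			 enum gate vec12, enum gate vec8)
{
	assert(cpl == 0 || cpl == 3);

	if (delivery_ok(vec12, cpl, r0_stack_usable))
		return HANDLED_STACK_FAULT;
	/* a fault while delivering exception 12 becomes a double fault */
	if (delivery_ok(vec8, cpl, r0_stack_usable))
		return HANDLED_DOUBLE_FAULT;
	/* a fault while delivering the double fault shuts the CPU down */
	return TRIPLE_FAULT;
}
```

In this model the only combination that ends in a triple fault is a ring 0
fault on a bad stack with trap gates on both vectors; changing vector 8 alone
to a task gate turns it into a handled double fault.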


Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK


"Richard B. Johnson" <[email protected]> on 07/12/2000 21:44:23

Please respond to [email protected]

To: Richard J Moore/UK/IBM@IBMGB
cc: Andi Kleen <[email protected]>, "Maciej W. Rozycki" <[email protected]>,
[email protected]
Subject: Re: Why is double_fault serviced by a trap gate?




On Thu, 7 Dec 2000 [email protected] wrote:

>
>
> Which surely we can on today's x86 systems. Even back in the days of OS/2
> 2.0 running on a 386 with 4Mb RAM we used a taskgate for both NMI and
> Double Fault. You need only a minimal stack - 1K, sufficient to save state
> and restore ESP to a known point before switching back to the main TSS to
> allow normal exception handling to occur.
>
> There no architectural restriction that some folks have hinted at - as long
> as the DPL for the task gates is 3.
>
[SNIPPED...]

Please refer to page 6-16, Inter486 Microprocessor Family Programmer's
Reference Manual.

The specifc text is: "The TSS does not have a stack pointer for a
privilege level 3 stack, because the procedure cannot be called by a less
privileged procedure. The stack for privilege level 3 is preserved by the
contents of SS and EIP registers which have been saved on the stack
of the privilege level called from level 3".

What this means is that a stack-fault in level 3 will kill you no
matter how cute you try to be. And, putting a task gate as call
procedure entry from a trap or fault is just trying to be cute.
It's extra code that will result in the same processor reset.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-12-07 23:40:45

by Richard J Moore

Subject: Re: Why is double_fault serviced by a trap gate?



Yes, indeed this is the point - we should at least be able to report the
problem even if we can't recover - and we should do that in the standard
kernel. It doesn't seem right to convert a bad problem into an unfathomable
disaster, which is what a trap gate for double-fault does. If you're going
to do that then why bother to set up a trap gate, just leave IDT vector 8
as an invalid descriptor. As it stands, the do_double_fault routine is
otiose.


Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK


Keith Owens <[email protected]> on 07/12/2000 22:47:42

Please respond to Keith Owens <[email protected]>

To: Richard J Moore/UK/IBM@IBMGB
cc: Andi Kleen <[email protected]>, [email protected], "Maciej W. Rozycki"
<[email protected]>, [email protected]
Subject: Re: Why is double_fault serviced by a trap gate?




On Thu, 7 Dec 2000 21:09:47 +0000,
[email protected] wrote:
>In summary I'd say the lack of a task gate is at the very least an
>oversight, if not a bug.
>
>If no one else wants to do it I'll see if I can code up the task gates for
>the double-fault and NMI.

If you overflow the kernel stack then you have already scribbled on the
process state at the low end of the kernel stack pages. The process is
definitely not recoverable but you might not even be able to recover
the machine. Corrupt p_opptr and friends, thread_group or pidhash and
other processes can be affected when they follow the chains. However
being able to report the error is a good start, even if you cannot
recover.

If you add task gates, assign enough stack space for debuggers. kdb
does a lot of work when NMI detects a hung cpu and needs stack space to
do that work. A good option is to dedicate a set of process entries
for per cpu task gates, say processes 2-NR_CPUS+1 are dedicated to task
gates.




2000-12-08 02:07:57

by Richard B. Johnson

Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000, Brian Gerst wrote:

> "Richard B. Johnson" wrote:
> >
> > On Thu, 7 Dec 2000 [email protected] wrote:
> >
> > >
> > >
> > > Which surely we can on today's x86 systems. Even back in the days of OS/2
> > > 2.0 running on a 386 with 4Mb RAM we used a taskgate for both NMI and
> > > Double Fault. You need only a minimal stack - 1K, sufficient to save state
> > > and restore ESP to a known point before switching back to the main TSS to
> > > allow normal exception handling to occur.
> > >
> > > There no architectural restriction that some folks have hinted at - as long
> > > as the DPL for the task gates is 3.
> > >
> > [SNIPPED...]
> >
> > Please refer to page 6-16, Inter486 Microprocessor Family Programmer's
> > Reference Manual.
> >
> > The specifc text is: "The TSS does not have a stack pointer for a
> > privilege level 3 stack, because the procedure cannot be called by a less
> > privileged procedure. The stack for privilege level 3 is preserved by the
> > contents of SS and EIP registers which have been saved on the stack
> > of the privilege level called from level 3".
> >
> > What this means is that a stack-fault in level 3 will kill you no
> > matter how cute you try to be. And, putting a task gate as call
> > procedure entry from a trap or fault is just trying to be cute.
> > It's extra code that will result in the same processor reset.
>
> No, because the CPL of the task gate would be 0, which means the stack
> will be set to tss->esp0. The DPL of 3 means that the descriptor can be
> accessed from CPL3. The text you mention generally means that the only
> way to get back to CPL3 is with iret (via the saved %cs:%eip and
> %ss:%esp pushed on the CPL0/1/2 stack).
>
> --
>
It is yes, not no.

(1) User traps, CPL3, stack for trap is in CPL0.
(2) CPL0 has stack-fault (bad ring zero code, bad memory).
(3) CPL0 traps, using faulted stack, double fault.
(4) There is no stack-trick, including a call-gate to another
"environment" (complete with its previously-reserved stack),
that will ever get you back to (2), much less to (1).

I am not denying the possibility of "warm-booting", i.e.,
relocate some code to where there is a 1:1 physical to virtual
translation, jump to the relocated code, disable paging, restart kernel
code, and possibly examine what happened. You just have to get
back to "flat-mode" with no paging to handle anything beyond a
double fault. You are just not going to be able to restart
from the stack-faulted code.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-12-08 09:09:25

by Richard J Moore

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?



No no. That's the whole point of a gate. You make a controlled
transition to ring 0 including stack switching. There are complex
protection checking rules, however as long as the DPL of the gate
descriptor is 3 then ring 3 is allowed to make the transition to ring 0. A
stack fault in user mode cannot kill the system. If it ever did it would be
a blatant bug of the most crass kind.

You seem to be implying that a stack fault in R3 will or could cause a
stack fault in R0 - why? Each thread has its own R0 stack. The values for
R0 SS:ESP are taken from the current (H/W) TSS and are initialised to
the top of that stack.
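
The stack-selection rule at issue here can be sketched in C. This is only a simplified model, not kernel code: the struct names, field layout and the function stack_on_entry() are hypothetical, and segment limit and type checks are left out. It shows why a fault taken from ring 3 lands on a fresh ring 0 stack from the TSS, while a fault taken in ring 0 through a trap or interrupt gate keeps using the current (possibly broken) stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the ring-stack fields of a 32-bit TSS.
 * There is deliberately no ring 3 entry: the TSS does not hold one. */
struct tss_stacks {
    uint32_t esp0; uint16_t ss0;
    uint32_t esp1; uint16_t ss1;
    uint32_t esp2; uint16_t ss2;
};

struct stack_ptr {
    uint16_t ss;
    uint32_t esp;
};

/* Stack used when a trap/interrupt gate moves execution from privilege
 * level `cpl` to `target_pl`. A switch happens only when privilege changes. */
struct stack_ptr stack_on_entry(const struct tss_stacks *tss,
                                struct stack_ptr current,
                                unsigned cpl, unsigned target_pl)
{
    assert(tss && cpl <= 3 && target_pl <= cpl);

    if (target_pl == cpl)
        return current;              /* same ring: keep the current stack */

    switch (target_pl) {
    case 0:
        return (struct stack_ptr){ tss->ss0, tss->esp0 };
    case 1:
        return (struct stack_ptr){ tss->ss1, tss->esp1 };
    default:
        return (struct stack_ptr){ tss->ss2, tss->esp2 };
    }
}
```

With this model, an exception raised at CPL 3 is handled on ss0:esp0, whereas an exception raised at CPL 0 returns the caller's own stack pointer, which is the case where a bad ring 0 stack escalates to a double fault.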


Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

2000-12-08 12:14:23

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000 [email protected] wrote:

> Which surely we can on today's x86 systems. Even back in the days of OS/2
> 2.0 running on a 386 with 4Mb RAM we used a taskgate for both NMI and
> Double Fault. You need only a minimal stack - 1K, sufficient to save state
> and restore ESP to a known point before switching back to the main TSS to
> allow normal exception handling to occur.

The memory hit is surely not a problem.

> There's no problem under MP since the double fault exception will be only
> presented on the processor that instigated the problem.

But what if another double fault happens on another CPU at roughly the
same time (unlikely, but still...)?

> As for NMIs I didn't think they were presented to all processors
> simultaneously. If they are then the way to handle that is to map a page of
> the GDT, to a unique physical address per-processor - i.e. processor
> local storage. The virtual address will be the same on each. This is what
> we did under OS/2 SMP.

Good idea.

> The only time you want the NMI handler to be fast is when it's being used
> for hand-shaking, which some disk devices do. And perhaps for APIC NMI
> class interprocessor interrupts. But I honestly don't think that's really a
> good enough reason not to have a task gate for NMI.

Do we really want to waste 60000+ CPU cycles every second just to handle
a TSS switch?

> The unpredictability of the abort (NMI or Double-fault) refers to the fact
> that in general it is indeterminate as to whether it is a fault or trap.

NMI is a normal interrupt (fault-like) and not an abort. It's fully
predictable.

> And that's a matter of whether the EIP points at or after the instruction
> related to the exception. The abort nature of these exceptions is not
> really a problem for the exception handler.

If you get a double fault while retrieving a CPU state from a TSS, you
may end up with an inconsistent state -- you may be unable to iretd or use
the stack. For NMIs it doesn't happen -- an NMI event, if it happens during
a TSS switch, will not be handled until the switch completes.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2000-12-08 12:28:31

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000, Richard B. Johnson wrote:

> I am not denying the possibility of "warm-booting", i.e.,
> reloate some code to where there is a 1:1 physical to virtual
> translation, jump to the relocated code, disable paging, restart kernel
> code, and possibly examine what happened. You just have to get
> back to "flat-mode" with no paging to handle anything beyond a
> double fault. You are just not going to be able to restart
> from the stack-faulted code.

If you want to handle triple faults (well, there should be none of these
given a proper double fault handler) you may use the NMI as well. You are
guaranteed to receive an NMI after a while when the watchdog is active (it
is for SMP systems by default now and it will be for P6+ UP systems for
Linux 2.5 as well). At least current Intel chipsets do not assert RESET
to the CPU as a response to the shutdown special cycle in their default
configuration (we may even explicitly force that behaviour).

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2000-12-08 13:29:19

by Richard B. Johnson

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?

On Fri, 8 Dec 2000 [email protected] wrote:

>
>
> No no. That's the whole point of a gate. You make a controlled
> transition to ring 0 including stack switching. There are complex
> protection checking rules, however as long as the DPL of the gate
> descriptor is 3 then ring 3 is allowed to make the transition to ring 0. A
> stack fault in user mode cannot kill the system. If it ever did it would be
> a blatant bug of the most crass kind.
>
> You seem to be implying that a stack fault in R3 will or could cause a
> stack fault in R0 - why? Each thread has its own R0 stack. The values for
> R0 SS:ESP are taken from the current (H/W) TSS and are initialised to
> the top of that stack.
>

Read my lips. I implied no such thing. The user trap to kernel was
just a way to get to the kernel, i.e., "system call". Otherwise
you don't have anything to "get back to".

Too many people just want to argue without even reading what they
are arguing against. Again, I implied nothing. I said;

(1) User traps, CPL3, stack for trap is in CPL0.
(2) CPL0 has stack-fault (bad ring zero code, bad memory).
(3) CPL0 traps, using faulted stack, double fault.
(4) There is no stack-trick, including a call-gate to another
"environment" (complete with its previously-reserved stack),
that will ever get you back to (2), much less to (1).

Now, if you can't read this, don't argue.




Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-12-08 13:50:08

by Richard J Moore

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?







I'm sorry I still don't see your point. You have a double-fault in R0
running on the normal R0 stack I presume. If you don't handle exception 8
with a task gate then this automatically becomes a triple-fault, the
processor resets and we get no information about what's happened.

My point is that the double-fault code is a waste of time unless you use a
task gate. If you're not going to do that then just leave IDT 8 as an
invalid descriptor.
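
To make the distinction concrete, here is a minimal C sketch of how the three kinds of 32-bit IDT gate descriptor differ in their encoding. The helper make_gate() and the selector values used with it are hypothetical, chosen only for illustration; the type codes (0x5 task gate, 0xE interrupt gate, 0xF trap gate) and the field positions follow the IA-32 descriptor format. A task gate carries a TSS selector and no handler offset, which is what lets the processor switch to a known-good stack held in that TSS.

```c
#include <assert.h>
#include <stdint.h>

/* Gate type codes for 32-bit protected mode (low 4 bits of the type byte). */
enum gate_type {
    GATE_TASK        = 0x5,
    GATE_INTERRUPT32 = 0xE,
    GATE_TRAP32      = 0xF
};

/* Pack one 8-byte IDT entry (hypothetical helper).
 *   bits  0..15  handler offset, low half (unused for a task gate)
 *   bits 16..31  selector: code segment, or TSS selector for a task gate
 *   bits 40..47  P (present) | DPL << 5 | type
 *   bits 48..63  handler offset, high half (unused for a task gate) */
uint64_t make_gate(enum gate_type type, unsigned dpl,
                   uint16_t selector, uint32_t offset)
{
    assert(dpl <= 3);

    uint64_t attr = 0x80u | (dpl << 5) | (unsigned)type;

    if (type == GATE_TASK)
        offset = 0;   /* entry point comes from the TSS, not from the gate */

    return (uint64_t)(offset & 0xFFFFu)
         | ((uint64_t)selector << 16)
         | (attr << 40)
         | ((uint64_t)(offset >> 16) << 48);
}
```

For example, make_gate(GATE_TASK, 0, 0x28, 0) describes a DPL 0 task gate that refers to a TSS at the (assumed) selector 0x28, while make_gate(GATE_TRAP32, 0, 0x10, handler) describes the kind of trap gate the thread is complaining about.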

As far as arguing without reading what you've written, that's not the case.
You're using very abbreviated language, it's not obvious to me what you're
driving at - I have to fill in the gaps and guess.

What do you mean by "stack-trick"?
Why can't recovery be sufficient at least to give meaningful diagnostic
information?

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

2000-12-08 17:07:43

by Richard J Moore

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?



Actually what you are pointing out here is the differing needs for
differing uses. Real-time, embedded systems etc. have different requirements
or at least different priorities to enterprise usage. I'm coming from the
enterprise server angle - the Linux/390 type of use and high end IA32
Server.

I'll certainly add the double-fault handler to my list of RAS stuff. I'm not
so convinced about NMI being a task gate.

Richard


Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK


"Richard B. Johnson" <[email protected]> on 08/12/2000 15:04:19

Please respond to [email protected]

To: Richard J Moore/UK/IBM@IBMGB
cc:
Subject: Re: Why is double_fault serviced by a trap gate?




On Fri, 8 Dec 2000 [email protected] wrote:

>
>
> I really think you're taking a very negative position - I have seen this
> technique deployed on other Intel-based operating systems. I don't see why
> Linux shouldn't step up to that. If one is careful the double-fault can be
> handled to the extent that other kernel services (or a subset of them) are
> callable and we may even be able to take a crash dump. I agree that the
> current thread will die and possibly the system may have to be closed down.
>
>
> Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
>

If you have a "survival patch" for some recent kernel, or if you
develop one, I will certainly try to help getting it to work. However,
I have been in the "been there, done that.." position trying to
keep a critical system (CAT Scanner) up long enough to complete
a scan after a HV Arc caused bad things to happen (a few single-bit
errors in memory). And I didn't have to worry about all the tasks
that exist in a desktop OS. My OS for the scanner had tasks that were
known at compile-time!

The solution found was checkpointed task code (for restarting where
it left off), and restarting the kernel by:

o Get paging OFF
o Fix up a temporary flat-mode environment.
o Get new kernel code from NVRAM.
o Reload/restart kernel
o Start tasks.

What I learned might be helpful.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.





2000-12-08 17:23:14

by Richard B. Johnson

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?

On Fri, 8 Dec 2000 [email protected] wrote:

>
>
> Actually what you are pointing out here is the differing needs for
> differing uses. Real-time, embedded systems etc. have different requirements
> or at least different priorities to enterprise usage. I'm coming from the
> enterprise server angle - the Linux/390 type of use and high end IA32
> Server.
>
> I'll certainly add the double-fault handler to my list of RAS stuff. I'm not
> so convinced about NMI being a task gate.
>
> Richard
>
>
[Snipped...]

As I said, if you need some help I'll gladly try.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2000-12-08 21:03:41

by Mikulas Patocka

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?

> No no. That's the whole point of a gate. You make a controlled
> transition to ring 0 including stack switching. There are complex
> protection checking rules, however as long as the DPL of the gate
> descriptor is 3 then ring 3 is allowed to make the transition to ring 0. A
> stack fault in user mode cannot kill the system. If it ever did it would be
> a blatant bug of the most crass kind.

Setting DPL == 3 on any interrupt/trap/fault gate is a bad idea because it
allows the user to kill the machine with INT 8 or something like that. DPL
is checked only if the interrupt is generated with INT, INT3 or INTO (IA
manual, vol 3, section 5.10.1.1).
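
The rule described above fits in a few lines of C. This is an illustrative model only; the function name gate_entry_permitted() and its parameters are hypothetical. The gate's DPL is compared against CPL only for software-generated interrupts, and hardware-raised exceptions and interrupts skip the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the gate DPL check made when a vector is dispatched through the IDT.
 *   cpl          current privilege level of the interrupted code (0..3)
 *   gate_dpl     DPL field of the IDT gate descriptor (0..3)
 *   software_int true for INT n, INT3 and INTO; false for exceptions and
 *                external interrupts raised by the hardware */
bool gate_entry_permitted(unsigned cpl, unsigned gate_dpl, bool software_int)
{
    assert(cpl <= 3 && gate_dpl <= 3);

    if (!software_int)
        return true;            /* hardware events ignore the gate DPL */

    return cpl <= gate_dpl;     /* otherwise the CPU raises #GP instead */
}
```

So with DPL 0 on vector 8, a genuine double fault is still delivered, while an INT 8 issued from ring 3 is refused; with DPL 3, user code could invoke the double-fault handler at will.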

Mikulas

2000-12-08 21:19:53

by Richard J Moore

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?



Exactly, and you wouldn't set DPL=3 for interrupt 8 since a double-fault
can only occur from ring 0.


Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK



2000-12-08 23:05:40

by Keith Owens

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?

On Fri, 8 Dec 2000 07:58:06 -0500 (EST),
"Richard B. Johnson" <[email protected]> wrote:
>Too many people just want to argue without even reading what they
>are arguing against. Again, I implied nothing. I said;
>
> (1) User traps, CPL3, stack for trap is in CPL0.
> (2) CPL0 has stack-fault (bad ring zero code, bad memory).
> (3) CPL0 traps, using faulted stack, double fault.
> (4) There is no stack-trick, including a call-gate to another
> "environment" (complete with its previously-reserved stack),
> that will ever get you back to (2), much less to (1).

Nobody thinks that a stack overflow is recoverable - for that process.
By the time you overflow, the struct task at the bottom of the kernel
stack has been overwritten so the process is dead, gone to meet its
maker, it is pushing up daisies. The rest of the system may or may not
recover, depending on the resources that the dead process is still
holding and the links between processes.

Changing the stack overflow handler to a task gate will give us diagnostics
on the failing task instead of an immediate triple fault and reboot.
Diagnostics are useful. If the system can recover afterwards then that
is a bonus but it is not guaranteed. The process is always unrecoverable.
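
As a rough sketch of where those diagnostics would come from: when an exception is dispatched through a task gate, the processor saves the interrupted context in the outgoing TSS and stores that TSS's selector in the back-link field of the handler's TSS. The C model below is an assumption-laden illustration, not kernel code: struct tss_model is not the hardware TSS layout, the table indexed by descriptor number stands in for a real GDT lookup, and doublefault_report() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, heavily simplified TSS: only the fields the report needs. */
struct tss_model {
    uint16_t back_link;   /* selector of the task that was interrupted */
    uint32_t eip;         /* saved instruction pointer of this task */
    uint32_t esp;         /* saved stack pointer of this task */
};

/* Follow the handler TSS's back link to the faulting task's saved state and
 * format it into `out`. Returns the formatted length, or -1 if the back
 * link does not name an entry in the (model) table. */
int doublefault_report(const struct tss_model *tss_by_index, size_t count,
                       const struct tss_model *handler_tss,
                       char *out, size_t outlen)
{
    assert(tss_by_index && handler_tss && out);

    size_t index = handler_tss->back_link >> 3;   /* selector to index */
    if (index >= count)
        return -1;

    const struct tss_model *faulted = &tss_by_index[index];
    return snprintf(out, outlen, "double fault: eip=%08lx esp=%08lx",
                    (unsigned long)faulted->eip,
                    (unsigned long)faulted->esp);
}
```

The handler runs on its own stack from its own TSS, so it can produce this report even when the faulting task's stack pointer is unusable.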

I am not convinced that using a task gate for NMI is a good idea, the
NMI watchdog kicks in too often for my liking. Using a task gate for a
debugger would be worthwhile, I have always been worried about the
amount of stack that kdb uses.

2000-12-10 00:21:55

by Richard J Moore

[permalink] [raw]
Subject: Re: Why is double_fault serviced by a trap gate?



I agree, I've changed my mind about the use of a task gate for NMI - Intel
recommend an interrupt gate for a very good reason - NMIs are held pending
until the IRET so using an interrupt gate for NMI (and keeping interrupts
disabled) will guarantee that NMIs are handled serially.

I think our use of a task gate for NMI in OS/2 was probably not the best
idea.
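
The serialisation argument can be illustrated with a toy model in C. Everything here is hypothetical (struct nmi_model, nmi_arrives(), nmi_iret()); it assumes the processor blocks further NMIs from handler entry until IRET and latches at most one NMI that arrives in the meantime.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of NMI delivery; not real CPU or kernel state. */
struct nmi_model {
    bool in_handler;    /* NMIs are blocked from handler entry until IRET */
    bool pending;       /* single latch for an NMI that arrived while blocked */
    unsigned entries;   /* number of times the handler has been entered */
};

/* An NMI is signalled to the processor. */
void nmi_arrives(struct nmi_model *cpu)
{
    assert(cpu);

    if (cpu->in_handler) {
        cpu->pending = true;    /* held back: the handler is never nested */
        return;
    }
    cpu->in_handler = true;
    cpu->entries++;
}

/* The NMI handler finishes with IRET. */
void nmi_iret(struct nmi_model *cpu)
{
    assert(cpu && cpu->in_handler);

    cpu->in_handler = false;    /* IRET unblocks NMIs */
    if (cpu->pending) {
        cpu->pending = false;
        nmi_arrives(cpu);       /* the latched NMI is delivered now */
    }
}
```

In this model, handler entries never overlap: a second NMI starts only after the first handler's IRET, which is the serial behaviour described above.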


Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

