LinuxLists.cc - Race between the NMI handler and the RTC clock in practically all kernels

Subject: Re: Race between the NMI handler and the RTC clock in practically all kernels

Corey Minyard <[email protected]> writes:

> I had a customer on x86 notice that sometimes offset 0xf in the CMOS
> RAM was getting set to invalid values. Their BIOS used this for
> information about how to boot, and this caused the BIOS to lock up.
>
> They traced it down to the following code in arch/kernel/traps.c (now
> in include/asm-i386/mach-default/mach_traps.c):
>
> outb(0x8f, 0x70);
> inb(0x71); /* dummy */
> outb(0x0f, 0x70);
> inb(0x71); /* dummy */

Just use a different dummy register, like 0x80, which is normally used
for delaying I/O (I think that is what the dummy access does).

But I'm pretty sure this NMI handling is incorrect anyway; its use of
the bits doesn't match what the datasheets of modern x86 chipsets say.
Perhaps it would be best to get rid of that legacy register twiddling
completely.

I will also remove it from x86-64.

-Andi

2004-10-25 20:00:00

According to the comments in 2.4, this code causes the NMI to be
re-asserted if another NMI occurred while the NMI handler was running.
I have no idea how twiddling with these CMOS registers causes this to
happen, but that is supposed to be the intent. I don't think it has
anything to do with delays.

I would like to know what this code really does before removing it.

-Corey

2004-10-25 20:10:34

On Mon, 25 Oct 2004, Andi Kleen wrote:

> > They traced it down to the following code in arch/kernel/traps.c (now
> > in include/asm-i386/mach-default/mach_traps.c):
> >
> > outb(0x8f, 0x70);
> > inb(0x71); /* dummy */
> > outb(0x0f, 0x70);
> > inb(0x71); /* dummy */
>
> Just use a different dummy register, like 0x80 which is normally used
> for delaying IO (I think that is what the dummy access does)

It's not the dummy read that causes the problem. It's the index write
that does. It can be solved pretty easily by not changing the index, which
is possible if an auxiliary variable is used and the other users of the
index cooperate. The dummy read isn't really necessary, but of course
someone broke their RTC access logic, so it was added as a workaround.

But then the firmware may still screw it up if it changes the index from
within SMM.

> But I'm pretty sure this NMI handling is incorrect anyways, its
> use of bits doesn't match what the datasheets say of modern x86
> chipsets say. Perhaps it would be best to just get rid of
> that legacy register twiddling completely.

The use is correct. Bit #7 at I/O port 0x70 controls the NMI line
pass-through flip-flop. "0" means "pass-through" and "1" means "force
inactive." As the NMI line is level-driven and the NMI input is
edge-triggered, the sequence is needed to regenerate an edge if another
NMI arrives via the line (not via the APIC) while the handler is running.

Maciej
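
The edge-regeneration argument above can be sketched against a mock chipset. This is only an illustration under stated assumptions: `struct mock_chipset`, `mock_outb()`, `set_nmi_line()` and `reassert_nmi()` are hypothetical names invented here, not kernel interfaces, and the model only captures what the message describes (bit 7 of port 0x70 gates a level-driven line in front of an edge-triggered input).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: port 0x70 latches a 7-bit CMOS index and,
 * in bit 7, the NMI gate (1 = force inactive, 0 = pass-through). */
struct mock_chipset {
    uint8_t index;     /* CMOS index latched by the last write to 0x70 */
    int nmi_gated;     /* state of the pass-through flip-flop */
    int nmi_line;      /* level on the external NMI line */
    int cpu_input;     /* level currently seen at the CPU's NMI input */
    int edges;         /* rising edges seen at the CPU's NMI input */
};

/* Recompute the CPU-side level and count a rising edge if one occurs. */
static void update_input(struct mock_chipset *c)
{
    int level = c->nmi_line && !c->nmi_gated;

    if (level && !c->cpu_input)
        c->edges++;
    c->cpu_input = level;
}

static void mock_outb(struct mock_chipset *c, uint8_t val, uint16_t port)
{
    assert(port == 0x70);              /* only the index port is modelled */
    c->index = val & 0x7f;
    c->nmi_gated = (val & 0x80) != 0;
    update_input(c);
}

static void set_nmi_line(struct mock_chipset *c, int level)
{
    c->nmi_line = level;
    update_input(c);
}

/* The sequence from the quoted handler, without the dummy reads. */
static void reassert_nmi(struct mock_chipset *c)
{
    mock_outb(c, 0x8f, 0x70);          /* gate the line: input goes low */
    mock_outb(c, 0x0f, 0x70);          /* ungate: a still-high line makes a new edge */
}
```

In this model, a line that is still high when the handler finishes produces a second edge only because of the gate/ungate pair. The same two writes also leave the index latch at 0x0f regardless of what it held before, which is the side effect the thread is about.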

2004-10-25 20:24:21

by Richard B. Johnson

On Mon, 25 Oct 2004, Andi Kleen wrote:

> Corey Minyard <[email protected]> writes:
>
>> I had a customer on x86 notice that sometimes offset 0xf in the CMOS
>> RAM was getting set to invalid values. Their BIOS used this for
>> information about how to boot, and this caused the BIOS to lock up.
>>
>> They traced it down to the following code in arch/kernel/traps.c (now
>> in include/asm-i386/mach-default/mach_traps.c):
>>
>> outb(0x8f, 0x70);
>> inb(0x71); /* dummy */
>> outb(0x0f, 0x70);
>> inb(0x71); /* dummy */
>
> Just use a different dummy register, like 0x80 which is normally used
> for delaying IO (I think that is what the dummy access does)
>
> But I'm pretty sure this NMI handling is incorrect anyways, its
> use of bits doesn't match what the datasheets say of modern x86
> chipsets say. Perhaps it would be best to just get rid of
> that legacy register twiddling completely.
>
> I will also remove it from x86-64.
>
> -Andi

Normally the offset of the CMOS RAM is left at an unused
offset so that the bus-crash that occurs at power-down
doesn't corrupt the CMOS register contents. After CMOS
access, the offset should be left at either 0 (the seconds-
tick) or at 0x7f. In the case of 0x00, the seconds can
get corrupted at shutdown (it still ticks, but it could
glitch to a maximum of 59 seconds off). In the case
of 0x7f, there shouldn't be anything there. Higher offsets
could alias with some board decodes.

Offset 0x0f is really bad because it stores the reason
for a shutdown.

One should never use the write-offset-address/read-value
sequence of the CMOS as some kind of timer. You don't
even know the bus that accesses it! It could be inside
a super-I/O chip (fast) or external on the "local" bus
running at 18 MHz (slow); you just don't know. You
would also need to take the "rtc_lock" spin-lock so that
nobody accesses the chip between the time you set the
offset and the time you read or write the result.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 GrumpyMips).
98.36% of all statistics are fiction.

2004-10-25 20:29:07

by Richard B. Johnson

On Mon, 25 Oct 2004, Corey Minyard wrote:

> According to the comments in 2.4, this code causes the NMI to be re-asserted
> if another NMI occurred while the NMI handler was running. I have no idea
> how twiddling with these CMOS registers causes this to happen, but that is
> supposed to be the intent. I don't think it has anything to do with delays.
>
> I would like to know what this code really does before removing it.
>
> -Corey

NMI is supposed to be turned OFF if the high-bit in the index
register is set. It is turned back ON if it is reset.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 GrumpyMips).
98.36% of all statistics are fiction.

2004-10-25 20:48:40

On Mon, 25 Oct 2004, Andi Kleen wrote:

> > It's not the dummy read that causes the problem. It's the index write
> > that does. It can be solved pretty easily by not changing the index. It
>
> True. It has to be cached once.

You mean once per write, don't you? The index is write-only, unfortunately.

> > The use is correct. Bit #7 at I/O port 0x70 controls the NMI line
> > pass-through flip-flop. "0" means "pass-through" and "1" means "force
> > inactive." As the NMI line is level-driven and the NMI input is
> > edge-triggered, the sequence is needed to regenerate an edge if another
> > NMI arrives via the line (not via the APIC) while the handler is running.
>
> At least in the datasheet I'm reading (AMD 8111) it is just a global
> enable/disable bit.

The flip-flop is expected to be connected to the NMI input of the
processor, which for systems using local APICs means their LINT1 inputs (I
think it's broadcast to all CPUs on all existing systems, although in
principle only the BSP needs to be connected). But from the local APIC's
point of view LINT1 is just another local interrupt line, which may or may
not be programmed for the NMI delivery mode. Moreover, NMIs may also arrive
via the LINT0 input, from the performance counter interrupt if programmed
so (this is the case with the NMI watchdog), or from another APIC as an IPI
or an ordinary interrupt. These alternative sources are of course
unaffected by the flip-flop unless you have a strange implementation of the
local APIC.

Maciej

2004-10-25 20:49:13

Subject: Re: Race between the NMI handler and the RTC clock in practically all kernels II

> > It's not the dummy read that causes the problem. It's the index write
> > that does. It can be solved pretty easily by not changing the index. It
>
> True. It has to be cached once.

I checked the Intel datasheets now. The problem is that they define this
register as write-only, and the only way to read it back is very
chipset-specific (an alternative LPC interface).

So it's impossible to check the old value. The original code is the only
way to do this (if it's even needed; Intel also doesn't say anything
about this bit being a flip-flop). The only possible change would be to
write an alternative index.

-Andi

2004-10-25 20:30:31

> One should never use the write-offset-address/read-value
> sequence of the CMOS as some kind of timer. You don't

Linux doesn't use it as timer. It just wants to change the
NMI enable bit, which for some unknown reason is in the same register.

-Andi

2004-10-25 21:03:40

On Mon, Oct 25, 2004 at 02:50:27PM -0500, Corey Minyard wrote:
> According to the comments in 2.4, this code causes the NMI to be
> re-asserted if another NMI occurred while the NMI handler was running.
> I have no idea how twiddling with these CMOS registers causes this to
> happen, but that is supposed to be the intent. I don't think it has

I doubt it does anything useful on anything modern.

> anything to do with delays.

Old chipsets didn't like it when you did two accesses to related
registers in a row. Doing a dummy I/O in between causes a delay
that is long enough to fix that. I think it was just an old-fashioned
way to write outb_p(). You need some kind of dummy register for this,
and the code used the wrong one.

>
> I would like to know what this code really does before removing it.

It clears and sets the NMI enable bit of the chipset. IMHO it's useless
because no chipset should clear it. If anything you can just unconditionally
reenable it.

-Andi

2004-10-25 20:29:07

On Mon, Oct 25, 2004 at 09:07:42PM +0100, Maciej W. Rozycki wrote:
> On Mon, 25 Oct 2004, Andi Kleen wrote:
>
> > > They traced it down to the following code in arch/kernel/traps.c (now
> > > in include/asm-i386/mach-default/mach_traps.c):
> > >
> > > outb(0x8f, 0x70);
> > > inb(0x71); /* dummy */
> > > outb(0x0f, 0x70);
> > > inb(0x71); /* dummy */
> >
> > Just use a different dummy register, like 0x80 which is normally used
> > for delaying IO (I think that is what the dummy access does)
>
> It's not the dummy read that causes the problem. It's the index write
> that does. It can be solved pretty easily by not changing the index. It

True. It has to be cached once.

> > But I'm pretty sure this NMI handling is incorrect anyways, its
> > use of bits doesn't match what the datasheets say of modern x86
> > chipsets say. Perhaps it would be best to just get rid of
> > that legacy register twiddling completely.
>
> The use is correct. Bit #7 at I/O port 0x70 controls the NMI line
> pass-through flip-flop. "0" means "pass-through" and "1" means "force
> inactive." As the NMI line is level-driven and the NMI input is
> edge-triggered, the sequence is needed to regenerate an edge if another
> NMI arrives via the line (not via the APIC) while the handler is running.

At least in the datasheet I'm reading (AMD 8111) it is just a global
enable/disable bit.

-Andi

2004-10-25 21:17:14

On Mon, 25 Oct 2004, Andi Kleen wrote:

> So it's impossible to check the old value. The original code is the only
> way to do this (if it's even needed, Intel also doesn't say anything
> about this bit being a flip-flop). Only possible change would be to
> write an alternative index.

You can't read the old value, but you can have a shadow variable written
every time the real index is written. Since NMIs are not preemptible and
this is a simple producer-consumer access, no mutex around accesses to the
variable is needed.

Maciej
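
A minimal sketch of the shadow-variable idea, for a single CPU only. The names `cmos_select()`, `nmi_reassert()`, `cmos_index_shadow` and the mock port are assumptions made up for this illustration; none of them come from the patch discussed in the thread.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mock_index_latch;   /* what the (mock) chip has latched */
static uint8_t cmos_index_shadow;  /* software copy of the last index written */

static void mock_outb(uint8_t val, uint16_t port)
{
    assert(port == 0x70);              /* only the index port is modelled */
    mock_index_latch = val & 0x7f;     /* bit 7 is the NMI gate, not index */
}

/* Every ordinary user selects an index through this wrapper. The shadow
 * is written BEFORE the port, so an NMI landing between the two stores
 * restores the index this caller was about to select anyway. */
static void cmos_select(uint8_t index)
{
    cmos_index_shadow = index;
    mock_outb(index, 0x70);
}

/* NMI side: toggle bit 7 while keeping whatever index was in use. */
static void nmi_reassert(void)
{
    uint8_t idx = cmos_index_shadow & 0x7f;

    mock_outb(idx | 0x80, 0x70);       /* force the NMI line inactive */
    mock_outb(idx, 0x70);              /* pass-through again, index unchanged */
}
```

The ordering inside `cmos_select()` is the design choice worth noting: with the stores swapped, an NMI arriving between them would write back the previous index. This sketch does not address a second CPU touching the device at the same time.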

2004-10-25 22:26:36

Maciej W. Rozycki wrote:

>On Mon, 25 Oct 2004, Andi Kleen wrote:
>
>
>
>>So it's impossible to check the old value. The original code is the only
>>way to do this (if it's even needed, Intel also doesn't say anything
>>about this bit being a flip-flop). Only possible change would be to
>>write an alternative index.
>>
>>
>
> You can't read the old value, but you can have a shadow variable written
>every time the real index is written. Since NMIs are not preemptible and
>this is a simple producer-consumer access, no mutex around accesses to the
>variable is needed.
>
> Maciej
>
>
If you look at my patch, it does create a shadow index.

And you need a mutex for SMP systems. If one processor is handling an
NMI, another processor may still be accessing the device.

The complexity comes because claiming the lock, recording the CPU that
owns it, and recording the index have to happen atomically: the NMI
handler has to know all of these things once the lock is claimed.

-Corey
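
One way to make the lock, its owner and the index visible in a single atomic step is to pack them into one word. The sketch below is a guess at the general shape, not the patch referred to above (which is not shown in this thread); `cmos_trylock()`, `cmos_unlock()`, `nmi_inspect()` and the word layout are all hypothetical, and C11 atomics stand in for whatever primitives a kernel would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CMOS_UNLOCKED 0u

/* Bits 8 and up: owner CPU number plus one; bits 0-7: selected index.
 * Zero means unlocked (static storage starts out zeroed). */
static atomic_uint cmos_lock_word;

static unsigned pack(unsigned cpu, uint8_t index)
{
    return ((cpu + 1u) << 8) | index;
}

/* Claim the lock and publish owner and index in one compare-exchange.
 * Returns nonzero on success. */
static int cmos_trylock(unsigned cpu, uint8_t index)
{
    unsigned expected = CMOS_UNLOCKED;

    return atomic_compare_exchange_strong(&cmos_lock_word, &expected,
                                          pack(cpu, index));
}

static void cmos_unlock(void)
{
    atomic_store(&cmos_lock_word, CMOS_UNLOCKED);
}

enum nmi_view { NMI_FREE, NMI_OWN_CPU_HOLDS, NMI_OTHER_CPU_HOLDS };

/* What an NMI handler running on `cpu` can learn from a single load. */
static enum nmi_view nmi_inspect(unsigned cpu, uint8_t *index_out)
{
    unsigned w = atomic_load(&cmos_lock_word);

    if (w == CMOS_UNLOCKED)
        return NMI_FREE;
    *index_out = (uint8_t)(w & 0xffu);
    return ((w >> 8) - 1u == cpu) ? NMI_OWN_CPU_HOLDS : NMI_OTHER_CPU_HOLDS;
}
```

Because owner and index travel in the same word, the handler never sees a lock that is held but whose index has not been recorded yet. What the handler then does in each of the three cases is left open here.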

2004-10-26 02:57:04

Maciej W. Rozycki wrote:

>>And you need a mutex for SMP systems. If one processor is handling an
>>NMI, another processor may still be accessing the device.
>>
>>
> Actually this path is meant to be ever accessed by one CPU only (one that
>has its LINT1 line enabled), but it may be reached by other ones due to
>the NMI watchdog as code does not check if its run by the right processor.
>This probably qualifies as a bug. Only the watchdog code of the NMI
>handler is expected to run everywhere.
>
>
Yes, only one processor will run through the NMI code, but another
processor may be accessing the RTC or something else in CMOS. The mutex
will prevent the NMI and the RTC access from conflicting.

-Corey

2004-10-26 02:42:19

On Mon, 25 Oct 2004, Corey Minyard wrote:

> If you look at my patch, it does create a shadow index.

I've noticed, yes. Actually yours is the right approach as we can't use
an arbitrary index in the NMI handler -- reads of register C of the RTC
have the side effect of clearing pending interrupts.

> And you need a mutex for SMP systems. If one processor is handling an
> NMI, another processor may still be accessing the device.

Actually this path is meant to be taken by only one CPU (the one that
has its LINT1 line enabled), but it may be reached by other ones due to
the NMI watchdog, as the code does not check whether it's run by the right
processor. This probably qualifies as a bug. Only the watchdog code of the
NMI handler is expected to run everywhere.

> The complexity comes because the claiming of the lock, the CPU that owns
> the lock, and the index has to be atomic because the NMI handler has to
> know all these things when the lock is claimed.

If not for the bug mentioned above, all this hassle wouldn't be needed.

Maciej

2004-10-26 12:02:03

by Richard B. Johnson

On Mon, 25 Oct 2004, Corey Minyard wrote:

> Maciej W. Rozycki wrote:
>
>> On Mon, 25 Oct 2004, Andi Kleen wrote:
>>
>>
>>> So it's impossible to check the old value. The original code is the only
>>> way to do this (if it's even needed, Intel also doesn't say anything
>>> about this bit being a flip-flop). Only possible change would be to write
>>> an alternative index.
>>>
>>
>> You can't read the old value, but you can have a shadow variable written
>> every time the real index is written. Since NMIs are not preemptible and
>> this is a simple producer-consumer access, no mutex around accesses to the
>> variable is needed.
>>

Yes it is!

Task 1                   NMI
------                   ---
Sets index register
                         Sets index register to something else
Reads wrong value

The NMI, by definition, can't be masked, so there is nothing that
can be done with interrupts to prevent task 1 from getting
the wrong value except spin-locks.

Anybody who accesses that shared device must use that device's
spin-lock and the lock must be obtained prior to caching the
shadow value.
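
The interleaving in the diagram can be replayed against a mock CMOS. Everything here is an assumption made for illustration: `cmos_ram`, `mock_outb()`, `mock_inb()`, `task_read()` and `task_write()` are invented names, and the NMI is injected by a flag rather than by real hardware. Only the four-access sequence in `nmi_handler()` is taken from the code quoted at the top of the thread.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t cmos_ram[128];  /* mock CMOS contents */
static uint8_t cmos_index;     /* mock index latch behind port 0x70 */

static void mock_outb(uint8_t val, uint16_t port)
{
    if (port == 0x70) {
        cmos_index = val & 0x7f;       /* bit 7 is the NMI gate, not index */
    } else {
        assert(port == 0x71);          /* data port */
        cmos_ram[cmos_index] = val;
    }
}

static uint8_t mock_inb(uint16_t port)
{
    assert(port == 0x71);
    return cmos_ram[cmos_index];
}

/* The sequence quoted at the top of the thread. */
static void nmi_handler(void)
{
    mock_outb(0x8f, 0x70);
    (void)mock_inb(0x71);              /* dummy */
    mock_outb(0x0f, 0x70);
    (void)mock_inb(0x71);              /* dummy */
}

/* Task side: select an index, then read. `nmi_in_window` injects the
 * NMI between the two port accesses, as in the diagram. */
static uint8_t task_read(uint8_t index, int nmi_in_window)
{
    mock_outb(index, 0x70);
    if (nmi_in_window)
        nmi_handler();
    return mock_inb(0x71);
}

/* Same window, but the task writes instead of reading. */
static void task_write(uint8_t index, uint8_t val, int nmi_in_window)
{
    mock_outb(index, 0x70);
    if (nmi_in_window)
        nmi_handler();
    mock_outb(val, 0x71);
}
```

With the NMI injected, a read of offset 0x00 returns the contents of offset 0x0f, and a write intended for offset 0x00 lands in offset 0x0f instead, which matches the corruption originally reported.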

>> Maciej
>>
> If you look at my patch, it does create a shadow index.
>
> And you need a mutex for SMP systems. If one processor is handling an NMI,
> another processor may still be accessing the device.
>
> The complexity comes because the claiming of the lock, the CPU that owns the
> lock, and the index has to be atomic because the NMI handler has to know all
> these things when the lock is claimed.
>
> -Corey

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 GrumpyMips).
98.36% of all statistics are fiction.

2004-10-26 13:56:57