2007-08-15 23:32:11

by David P. Reed

Subject: RFC: do get_rtc_time() correctly

In trying to track down a bug related to hwclock hanging my x86_64
machine, I found myself reading include/asm-generic/rtc.h carefully with
a chipset spec for several RTC chips (in particular, the granddaddy, the
MC146818A) in my hand.

I found that the code in get_rtc_time() is very, very odd, and IMO very
wrong.

The idea in the comment at the top seems to suggest that the author
thought that the UIP flag indicates an update is in progress at that
very instant, so one needs to synchronize with the "falling edge" of
that flag to ensure that one can read the RTC state without instability
in its buffered value. That is not the way the UIP flag is defined to
work.

The UIP flag is =1 during a period PRIOR to the actual update, starting
224 usec before the update, and ending when the update is complete. It
is done that way (which might seem odd) so that if you read UIP=0, you
have a 224 usec window, EVEN IF the UIP were to become =1 just after you
read it.

So the proper way to read the RTC contents is to read the UIP flag, and
if zero, read all the RTC registers with interrupts masked completely,
so all reads happen in the 224 usec window. (NMI can still be a
problem, but you can have NMI's set a flag that forces a retry).

If the UIP flag is one, you need to try again. Pseudo-code is as follows:

retry:
	spin_lock_irq(&rtc_lock);
	if (UIP_flag != 0) {
		spin_unlock_irq(&rtc_lock);
		cpu_relax();
		goto retry;
	}
	... read RTC registers ...
	spin_unlock_irq(&rtc_lock);

This should work for all RTCs compatible with the MC146818A, and is also somewhat faster and spends less time with interrupts masked than the code in the current Linux. (Not that reading the RTC is crucial for performance, but the current code occasionally *loops with all interrupts masked for 10 msec!* Why anyone thought this necessary, I have no clue.)

I'm happy to code and test a patch. Rather than just submit a patch, I
thought I'd request others' comments on this, since it affects so many
architectures. cc me, if you will, as I don't subscribe to LKML, just
check it periodically.

As noted in the comment, it *is* true that setting the RTC clock needs additional synchronization, which can be done in the drivers, as it seems to be. (Though I would use an API designed so that any delay during the set period actually adds to the value being stored, so that a delay during the set_rtc operation would not cause the stored value to be "old".)

- David

PS: I wrote code for 8088 versions of various PC DOS apps back in the '80s, such as VisiCalc and Lotus 1-2-3, and hacked many timing-related drivers back in the day - so I know this chip spec cold, and figured out this odd stuff back then. That's why the weirdness jumped out at me. I'm still an assembly language coder at heart.


2007-08-15 23:46:05

by Alan

Subject: Re: RFC: do get_rtc_time() correctly

> So the proper way to read the RTC contents is to read the UIP flag, and
> if zero, read all the RTC registers with interrupts masked completely,
> so all reads happen in the 224 usec window. (NMI can still be a
> problem, but you can have NMI's set a flag that forces a retry).

SMM/SMI is more likely to be what bumps you 224usec or more.

> I'm happy to code and test a patch. Rather than just submit a patch, I
> thought I'd request others' comments on this, since it affects so many
> architectures. cc me, if you will, as I don't subscribe to LKML, just
> check it periodically.

Go for it. The other architectures generally inherit it by inheriting
similar bridge chips or in more modern times the RTC macrocell. It should
also be possible to debug by putting in an optional sanity check
initially which checks the read made sense compared to a few more

2007-08-16 00:15:31

by H. Peter Anvin

Subject: Re: RFC: do get_rtc_time() correctly

David P. Reed wrote:
>
> The idea in the comment at the top seems to suggest that the author
> thought that the UIP flag indicates an update is in progress at that
> very instant, so one needs to synchronize with the "falling edge" of
> that flag to ensure that one can read the RTC state without instability
> in its buffered value. That is not the way the UIP flag is defined to
> work.
> The UIP flag is =1 during a period PRIOR to the actual update, starting
> 224 usec before the update, and ending when the update is complete. It
> is done that way (which might seem odd) so that if you read UIP=0, you
> have a 224 usec window, EVEN IF the UIP were to become =1 just after you
> read it.
>

That's not why it synchronizes with the UIP flag.

The purpose is to figure out where the RTC thinks the beginning of the
second is.

One can argue about the utility of that (since it's uncertain whether
setting the RTC will cause the beginning of the second to be updated),
but that was supposedly the reason for it 15 years ago.

-hpa

2007-08-16 00:27:28

by David P. Reed

Subject: Re: RFC: do get_rtc_time() correctly

Alan:

Thanks for the comment. I will code a patch, and include a sanity check
as you suggested, and send it for review. Just to clarify one concern
your note raised:

I understand that SMM/SMI servicing can take a long time, but SMM/SMI
shouldn't happen while interrupts are masked using local_irq_disable()
[included in spin_lock_irq()], at least on x86-architectures. If
SMM/SMI can happen even then, the NMI fix below could be generalized.

My mention of NMI (which by definition can't be masked) is because NMI
can happen even while interrupts are masked. This is a timing problem
that can't be dealt with by masking interrupts, and NMIs are used for
watchdogs, etc. these days. It seems like a generally good thing to be
able to ask whether an NMI has happened. A per-cpu NMI eventcount,
incremented on every NMI, would let one detect NMIs that happen during
an otherwise masked code sequence by reading it at the beginning and
end of the sequence. I don't know if NMIs are common on other
architectures, or if this is an architecture-dependent concern.

Perhaps I'm really talking about two patches here: one for a mechanism
to detect NMIs that happen during a critical piece of code (so it can
be retried), and one that depends on it to read the RTC properly and
reliably.

Alan Cox wrote:
>> So the proper way to read the RTC contents is to read the UIP flag, and
>> if zero, read all the RTC registers with interrupts masked completely,
>> so all reads happen in the 224 usec window. (NMI can still be a
>> problem, but you can have NMI's set a flag that forces a retry).
>>
>
> SMM/SMI is more likely to be what bumps you 224usec or more.
>
>
>> I'm happy to code and test a patch. Rather than just submit a patch, I
>> thought I'd request others' comments on this, since it affects so many
>> architectures. cc me, if you will, as I don't subscribe to LKML, just
>> check it periodically.
>>
>
> Go for it. The other architectures generally inherit it by inheriting
> similar bridge chips or in more modern times the RTC macrocell. It should
> also be possible to debug by putting in an optional sanity check
> initially which checks the read made sense compared to a few more
>
>

2007-08-16 00:58:47

by Rene Herman

Subject: Re: RFC: do get_rtc_time() correctly

On 08/16/2007 02:26 AM, David P. Reed wrote:

> My mention of NMI (which by definition can't be masked) is because NMI

Well, not by the CPU it can't, but on a PC, masking NMIs is a simple matter
of setting bit 7 of I/O port 0x70 to 1 (it seems the kernel does not provide
an interface for it though?).

Rene.

2007-08-16 09:58:19

by Alan

Subject: Re: RFC: do get_rtc_time() correctly

> I understand that SMM/SMI servicing can take a long time, but SMM/SMI
> shouldn't happen while interrupts are masked using local_irq_disable()

Don't bet on it.

> [included in spin_lock_irq()], at least on x86-architectures. If
> SMM/SMI can happen even then, the NMI fix below could be generalized.
>
> My mention of NMI (which by definition can't be masked) is because NMI

On x86 you can mask NMI briefly if you are willing to do a bit of CPU
abuse. Force an NMI, longjmp out of the handler, and NMI is off until the
next iret instruction or similar.

We don't do this and I think Linus would object to anyone who did 8)