2008-10-01 14:34:41

by David Howells

Subject: Re: [PATCH 1/2] MN10300: Move asm-arm/cnt32_to_63.h to include/linux/

Nicolas Pitre <[email protected]> wrote:

> Disabling preemption is unneeded.

I think you may be wrong on that. MEI came up with the following point:

I think either disabling preemption or disabling interrupts is really
necessary for the cnt32_to_63 macro, because there seems to be an
assumption that the sequence of code (1)-(4) must be executed within a
half period of the 32-bit counter.

-------------------------------------------------
#define cnt32_to_63(cnt_lo) \
({ \
	static volatile u32 __m_cnt_hi = 0; \
	cnt32_to_63_t __x; \
(1)	__x.hi = __m_cnt_hi; \
(2)	__x.lo = (cnt_lo); \
(3)	if (unlikely((s32)(__x.hi ^ __x.lo) < 0)) \
(4)		__m_cnt_hi = __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31); \
	__x.val; \
})
-------------------------------------------------

If a task is preempted while executing this sequence and is scheduled
again only after half a period of the 32-bit counter has elapsed, the
task may corrupt __m_cnt_hi.

Their suggested remedy is:

So I think it's better to disable interrupts around cnt32_to_63 and to
ensure that the sequence of code is executed within a short period.

I think this is excessive... If we're sat there with interrupts disabled for
more than a half period (65s) then we've got other troubles. I think
disabling preemption for the duration ought to be enough. What do you think?

Now, I'm happy to put these in sched_clock() rather than cnt32_to_63() for my
purposes (see attached patch).

David
---
MN10300: Prevent cnt32_to_63() from being preempted in sched_clock()

From: David Howells <[email protected]>

Prevent cnt32_to_63() from being preempted in sched_clock() because it may
read its internal counter, get preempted, get delayed for more than the half
period of the 'TSC' and then write the internal counter, thus corrupting it.

Whilst some callers of sched_clock() have interrupts disabled or hold
spinlocks, not all do, and so preemption must be disabled here.

Note that sched_clock() is called from lockdep, but that shouldn't be a problem
because although preempt_disable() calls into lockdep, lockdep has a recursion
counter to deal with this.

Signed-off-by: David Howells <[email protected]>
---

arch/mn10300/kernel/time.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)


diff --git a/arch/mn10300/kernel/time.c b/arch/mn10300/kernel/time.c
index e460658..38f88bb 100644
--- a/arch/mn10300/kernel/time.c
+++ b/arch/mn10300/kernel/time.c
@@ -55,6 +55,9 @@ unsigned long long sched_clock(void)
unsigned long tsc, tmp;
unsigned product[3]; /* 96-bit intermediate value */

+ /* cnt32_to_63() is not safe with preemption */
+ preempt_disable();
+
/* read the TSC value
*/
tsc = 0 - get_cycles(); /* get_cycles() counts down */
@@ -65,6 +68,8 @@ unsigned long long sched_clock(void)
*/
tsc64.ll = cnt32_to_63(tsc) & 0x7fffffffffffffffULL;

+ preempt_enable();
+
/* scale the 64-bit TSC value to a nanosecond value via a 96-bit
* intermediate
*/


2008-10-01 15:36:49

by Nicolas Pitre

Subject: Re: [PATCH 1/2] MN10300: Move asm-arm/cnt32_to_63.h to include/linux/

On Wed, 1 Oct 2008, David Howells wrote:

> Nicolas Pitre <[email protected]> wrote:
>
> > Disabling preemption is unneeded.
>
> I think you may be wrong on that. MEI came up with the following point:
>
> I think either disabling preemption or disabling interrupts is really
> necessary for the cnt32_to_63 macro, because there seems to be an
> assumption that the sequence of code (1)-(4) must be executed within a
> half period of the 32-bit counter.

This is still wrong. There is no need for any kind of locking whatsoever,
period. That's the _point_ of the whole algorithm. The only
constraint is that the code has to be executed at least once per half
period of the 32-bit counter, which is typically once every couple of
minutes or so.

> -------------------------------------------------
> #define cnt32_to_63(cnt_lo) \
> ({ \
> 	static volatile u32 __m_cnt_hi = 0; \
> 	cnt32_to_63_t __x; \
> (1)	__x.hi = __m_cnt_hi; \
> (2)	__x.lo = (cnt_lo); \
> (3)	if (unlikely((s32)(__x.hi ^ __x.lo) < 0)) \
> (4)		__m_cnt_hi = __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31); \
> 	__x.val; \
> })
> -------------------------------------------------
>
> If a task is preempted while executing this sequence and is scheduled
> again only after half a period of the 32-bit counter has elapsed, the
> task may corrupt __m_cnt_hi.

Oh, that's a possibility. In that case __m_cnt_hi will be reverted to a
previous state, just as if cnt32_to_63() had not been called yet since
the new half period began. And a subsequent call will fix that again.

> Their suggested remedy is:
>
> So I think it's better to disable interrupts around cnt32_to_63 and to
> ensure that the sequence of code is executed within a short period.
>
> I think this is excessive... If we're sat there with interrupts disabled for
> more than a half period (65s) then we've got other troubles. I think
> disabling preemption for the duration ought to be enough. What do you think?

I think this is excessive too. If you're preempted away for that long
you still have a problem. And if that's a real concern because you run
RT and have tasks frequently blocked for that long, then don't use this
interface for critical counters for which absolute correctness is
essential, which is not the case for sched_clock() anyway. A
sched_clock() implementation is meant to be as lightweight as possible
even if it lacks absolute precision. The worst that could happen if you
actually manage to cause a glitch in the returned value from
sched_clock() is possibly the scheduling of the wrong task in a non-RT
scheduler, and we've already determined that an RT scheduler is needed
to cause this glitch in the first place.

> Now, I'm happy to put these in sched_clock() rather than cnt32_to_63() for my
> purposes (see attached patch).

If you still feel paranoid about this then I can't stop you from adding
extra locking in your own code. On machines I've created cnt32_to_63()
for, the critical half-period delay can be as long as 9 minutes, and
worrying about races lasting that long really means ignoring a much
worse problem.


Nicolas

2008-10-02 22:24:01

by David Howells

Subject: Re: [PATCH 1/2] MN10300: Move asm-arm/cnt32_to_63.h to include/linux/

Nicolas Pitre <[email protected]> wrote:

> Oh, that's a possibility. In that case __m_cnt_hi will be reverted to a
> previous state, just as if cnt32_to_63() had not been called yet since
> the new half period began. And a subsequent call will fix that again.

Surely that's not good enough. There's a 50% chance that reversion to a
previous state will result in an extra increment of __m_cnt_hi on the
next call.

David

2008-10-03 19:15:45

by Nicolas Pitre

Subject: Re: [PATCH 1/2] MN10300: Move asm-arm/cnt32_to_63.h to include/linux/

On Thu, 2 Oct 2008, David Howells wrote:

> Nicolas Pitre <[email protected]> wrote:
>
> > Oh, that's a possibility. In that case __m_cnt_hi will be reverted to a
> > previous state, just as if cnt32_to_63() had not been called yet since
> > the new half period began. And a subsequent call will fix that again.
>
> Surely that's not good enough. There's a 50% chance that reversion to a
> previous state will result in an extra increment of __m_cnt_hi on the
> next call.

If so then you're using this interface in an inappropriate way.

The _absolute_ minimum frequency with which this should be fully
executed is once per half period of the base counter. I hope that in
practice it happens far more often than that.


Nicolas

2008-10-06 10:44:53

by David Howells

Subject: Re: [PATCH 1/2] MN10300: Move asm-arm/cnt32_to_63.h to include/linux/

Nicolas Pitre <[email protected]> wrote:

> If so then you're using this interface in an inappropriate way.
>
> The _absolute_ minimum frequency with which this should be fully
> executed is once per half period of the base counter. I hope that in
> practice it happens far more often than that.

I think you're misunderstanding my contention.

If preemption is enabled, cnt32_to_63() can be called with greater than
minimum frequency and yet reversions can still happen.

The problem is that a process that's halfway through executing cnt32_to_63()
can be preempted for a period of time sufficient that, when it is rescheduled
and writes __m_cnt_hi, it corrupts it with an out-of-date value.

David

2008-10-06 15:07:13

by Nicolas Pitre

Subject: Re: [PATCH 1/2] MN10300: Move asm-arm/cnt32_to_63.h to include/linux/

On Mon, 6 Oct 2008, David Howells wrote:

> Nicolas Pitre <[email protected]> wrote:
>
> > If so then you're using this interface in an inappropriate way.
> >
> > The _absolute_ minimum frequency with which this should be fully
> > executed is once per half period of the base counter. I hope that in
> > practice it happens far more often than that.
>
> I think you're misunderstanding my contention.
>
> If preemption is enabled, cnt32_to_63() can be called with greater than
> minimum frequency and yet reversions can still happen.
>
> The problem is that a process that's halfway through executing cnt32_to_63()
> can be preempted for a period of time sufficient that, when it is rescheduled
> and writes __m_cnt_hi, it corrupts it with an out-of-date value.

I think you're misunderstanding my objection.

If a process that's halfway through executing cnt32_to_63() is preempted
for a period of time in the same ballpark as the half period of the base
counter, then my point is that you have a much bigger problem.

For __m_cnt_hi to be written with an out-of-date value, you'd need
this sequence of events:

(1) __x.hi = __m_cnt_hi;
(2) __x.lo = (cnt_lo);
(3) if (unlikely((s32)(__x.hi ^ __x.lo) < 0))
(4) __m_cnt_hi = __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31);

 cnt_lo      cnt_hi      instance A             instance B
 ---------------------------------------------------------------------
 0x7fffffff  0x00000000  ...
 0x80000000  0x00000000  ...
    [the code isn't called for a _long_ time during which __m_cnt_hi
     is not in sync with the low part]
 0xff000000  0x00000000  (1) __x.hi = 0
 0xff000001  0x00000000  (2) __x.lo = 0xff000001
 0xff000002  0x00000000  (3) condition is true
 0xff000003  0x00000000  *** preemption ***
 0xff000004  0x00000000  ...
    [a little while later]
 0xffff0000  0x00000000                         (1) __x.hi = 0
 0xffff0001  0x00000000                         (2) __x.lo = 0xffff0001
 0xffff0002  0x00000000                         (3) condition is true
 0xffff0003  0x00000000                         (4) update __m_cnt_hi
 0xffff0004  0x80000000  ...
    [a little while later]
 0xffffffff  0x80000000  ...
 0x00000000  0x80000000  ...
 0x00000001  0x80000000                         (1) __x.hi = 0x80000000
 0x00000002  0x80000000                         (2) __x.lo = 0x00000002
 0x00000003  0x80000000                         (3) condition is true
 0x00000004  0x80000000                         (4) update __m_cnt_hi
 0x00000005  0x00000001  ...
 0x00000006  0x00000001                         *** preemption ***
 0x00000007  0x00000001  (4) update __m_cnt_hi
 0x00000008  0x80000000  ...

At this point, it is true that __m_cnt_hi is not up to date while it was
a moment before. But that has no consequence, because it was reverted
to the same state it had before the update at 0x00000004, meaning that
the update will simply be redone as soon as the code is executed again.

Sure, this could be defeated if 1) the execution in instance B didn't
come soon enough to notice the initial lack of synchronization before
the second half period passed, or 2) the preemption of instance A lasted
so long that there wasn't enough time for __m_cnt_hi to be corrected
back within the current half period. But those are extreme cases this
algorithm is _not_ made for, because in practice they should _not_
happen.

If the complete code sequence is always executed _at_ _least_ once per
half-period then you're OK. That's the theoretical requirement. In
practice you should expect it to be called many many times per
half-period to keep __m_cnt_hi warm. In my case this is a couple
hundred times per half period so I'm not worried.

So, if you have a system where 1) this code might not get called for a
long time _and_ 2) when called it can be preempted away for a long time,
then you certainly risk having __m_cnt_hi miss a transition. But in
that case fixing those issues is way more important than worrying about
a glitch in the __m_cnt_hi value.


Nicolas