2008-10-08 12:59:55

by Dave Kleikamp

Subject: Re: Definition of sched_clock broken

On Tue, 2008-09-23 at 14:04 -0700, Jeremy Fitzhardinge wrote:
> kernel/sched_clock.c has the comment:
>
> * The clock: sched_clock_cpu() is monotonic per cpu, and should be somewhat
> * consistent between cpus (never more than 2 jiffies difference).
>
> The two-jiffy restriction is way too restrictive.

I won't argue whether this is a good thing or a bad thing, but I did
find a problem in the current implementation.

> Historically sched_clock() is intended to measure the amount of
> schedulable time occurring on a CPU. On a virtual cpu, that is affected
> by the amount of physical cpu time the hypervisor schedules for a vcpu,
> and can therefore advance in a very non-continuous way, depending on the
> overall load on the host system. It is, however, the only timebase that
> gives the kernel a reasonable hope of determining how much cpu a process
> actually got scheduled.
>
> The net result is that the sched_clock timebase is 1) monotonic, 2)
> loses arbitrary amounts of time against a system monotonic clock, 3)
> per-cpu, with 4) arbitrary drift between different cpu's sched_clocks.
>
> Tying the sched_clocks of different cpus together in any way loses these
> properties, and just turns it into another system wide monotonic clock,
> which seems redundant given that we already have one (I understand that
> the relatively loose synchronization allows it to be implemented more
> efficiently than a normal monotonic clock).

I've found that in the current implementation it is quite easy to observe
scd->clock, and therefore rq->clock, running backwards, violating the
monotonic property.

This patch demonstrates the problem. I've found it triggers often on my
laptop (Lenovo T60p) just doing everyday work. I'll follow up with a
patch that fixes the problem. I'll let Peter and Ingo judge if it is
the correct fix.

diff --git a/kernel/sched.c b/kernel/sched.c
index 7ea767a..b850937 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -638,7 +638,13 @@ static inline int cpu_of(struct rq *rq)
 
 static inline void update_rq_clock(struct rq *rq)
 {
-	rq->clock = sched_clock_cpu(cpu_of(rq));
+	u64 clock = sched_clock_cpu(cpu_of(rq));
+	if (clock < rq->clock) {
+		printk(KERN_ERR "TIME TURNED BACK!\n");
+		printk(KERN_ERR "new time = %llx rq->clock = %llx\n",
+		       clock, rq->clock);
+	} else
+		rq->clock = clock;
 }
 
 /*


> At the moment the x86 sched_clock is hooked through paravirt_ops so that
> the underlying hypervisor can provide precise scheduled time
> information, with the hope that the scheduler will use it to make better
> decisions. However if the scheduler needs to be lied to then I can do
> that too, but it's a pity to throw away information that's available to it.

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center


2008-10-08 13:01:34

by Dave Kleikamp

Subject: [PATCH] sched_clock: prevent scd->clock from moving backwards

sched_clock: prevent scd->clock from moving backwards

When sched_clock_cpu() couples the clocks between two cpus, it may
increment scd->clock beyond the GTOD tick window that __update_sched_clock()
uses to clamp the clock. A later call to __update_sched_clock() may move
the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
monotonic property.

This patch ensures that scd->clock will not be set backward.

Signed-off-by: Dave Kleikamp <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e8ab096..a989d64 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -118,13 +118,13 @@ static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now)
 
 	/*
 	 * scd->clock = clamp(scd->tick_gtod + delta,
-	 *		      max(scd->tick_gtod, scd->clock),
-	 *		      scd->tick_gtod + TICK_NSEC);
+	 *		      max(scd->tick_gtod, scd->clock),
+	 *		      min(scd->clock, scd->tick_gtod + TICK_NSEC));
 	 */
 
 	clock = scd->tick_gtod + delta;
 	min_clock = wrap_max(scd->tick_gtod, scd->clock);
-	max_clock = scd->tick_gtod + TICK_NSEC;
+	max_clock = wrap_min(scd->clock, scd->tick_gtod + TICK_NSEC);
 
 	clock = wrap_max(clock, min_clock);
 	clock = wrap_min(clock, max_clock);
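
For reference, wrap_max() and wrap_min() compare u64 timestamps through a
signed difference, so the comparison stays correct even if the counters
wrap. A minimal sketch of such helpers (the in-tree kernel/sched_clock.c
definitions may differ in detail):

	static inline u64 wrap_min(u64 x, u64 y)
	{
		/* pick the smaller timestamp, treating the delta as signed */
		return (s64)(x - y) < 0 ? x : y;
	}

	static inline u64 wrap_max(u64 x, u64 y)
	{
		/* pick the larger timestamp, treating the delta as signed */
		return (s64)(x - y) > 0 ? x : y;
	}

The signed cast is what keeps the comparison sane when x and y straddle a
64-bit wrap.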

2008-10-08 23:13:23

by Peter Zijlstra

Subject: Re: [PATCH] sched_clock: prevent scd->clock from moving backwards

On Wed, 2008-10-08 at 08:00 -0500, Dave Kleikamp wrote:
> sched_clock: prevent scd->clock from moving backwards
>
> When sched_clock_cpu() couples the clocks between two cpus, it may
> increment scd->clock beyond the GTOD tick window that __update_sched_clock()
> uses to clamp the clock. A later call to __update_sched_clock() may move
> the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
> monotonic property.
>
> This patch ensures that scd->clock will not be set backward.

Ah, yes indeed, this comes from the tick not happening at the same time
on different cpus, so if we use a local timestamp to move a remote clock
forward, this scenario could indeed happen.

The fix looks good to me, good catch, thanks shaggy!

A related 'fix' which I'm still not quite sure about is making the
window 'tick_gtod + 2*TICK_NSEC'. That increases the max observed
difference to 4 jiffies, but allows ticks to be 'late' a bit without
first holding back time and then jumping ahead again.
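
A minimal sketch of that variant, relative to the max_clock line above
(illustrative only; it was not part of the applied patch):

	/* allow the clock to run up to two ticks past tick_gtod */
	max_clock = scd->tick_gtod + 2 * TICK_NSEC;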

> Signed-off-by: Dave Kleikamp <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>

Acked-by: Peter Zijlstra <[email protected]>

>
> diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
> index e8ab096..a989d64 100644
> --- a/kernel/sched_clock.c
> +++ b/kernel/sched_clock.c
> @@ -118,13 +118,13 @@ static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now)
> 
>  	/*
>  	 * scd->clock = clamp(scd->tick_gtod + delta,
> -	 *		      max(scd->tick_gtod, scd->clock),
> -	 *		      scd->tick_gtod + TICK_NSEC);
> +	 *		      max(scd->tick_gtod, scd->clock),
> +	 *		      min(scd->clock, scd->tick_gtod + TICK_NSEC));
>  	 */
> 
>  	clock = scd->tick_gtod + delta;
>  	min_clock = wrap_max(scd->tick_gtod, scd->clock);
> -	max_clock = scd->tick_gtod + TICK_NSEC;
> +	max_clock = wrap_min(scd->clock, scd->tick_gtod + TICK_NSEC);
> 
>  	clock = wrap_max(clock, min_clock);
>  	clock = wrap_min(clock, max_clock);
>
>

2008-10-09 09:07:28

by Ingo Molnar

Subject: Re: [PATCH] sched_clock: prevent scd->clock from moving backwards


* Peter Zijlstra <[email protected]> wrote:

> On Wed, 2008-10-08 at 08:00 -0500, Dave Kleikamp wrote:
> > sched_clock: prevent scd->clock from moving backwards
> >
> > When sched_clock_cpu() couples the clocks between two cpus, it may
> > increment scd->clock beyond the GTOD tick window that __update_sched_clock()
> > uses to clamp the clock. A later call to __update_sched_clock() may move
> > the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
> > monotonic property.
> >
> > This patch ensures that scd->clock will not be set backward.
>
> Ah, yes indeed, this comes from the tick not happening at the same time
> on different cpus, so if we use a local timestamp to move a remote clock
> forward, this scenario could indeed happen.
>
> The fix looks good to me, good catch, thanks shaggy!
>
> A related 'fix' which I'm still not quite sure about is making the
> window 'tick_gtod + 2*TICK_NSEC'. That increases the max observed
> difference to 4 jiffies, but allows ticks to be 'late' a bit without
> first holding back time and then jumping ahead again.
>
> > Signed-off-by: Dave Kleikamp <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
>
> Acked-by: Peter Zijlstra <[email protected]>

applied to tip/sched/core, thanks!

Ingo

2008-10-09 15:18:06

by Ingo Molnar

Subject: Re: [PATCH] sched_clock: prevent scd->clock from moving backwards


* Ingo Molnar <[email protected]> wrote:

>
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Wed, 2008-10-08 at 08:00 -0500, Dave Kleikamp wrote:
> > > sched_clock: prevent scd->clock from moving backwards
> > >
> > > When sched_clock_cpu() couples the clocks between two cpus, it may
> > > increment scd->clock beyond the GTOD tick window that __update_sched_clock()
> > > uses to clamp the clock. A later call to __update_sched_clock() may move
> > > the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
> > > monotonic property.
> > >
> > > This patch ensures that scd->clock will not be set backward.
> >
> > Ah, yes indeed, this comes from the tick not happening at the same time
> > on different cpus, so if we use a local timestamp to move a remote clock
> > forward, this scenario could indeed happen.
> >
> > The fix looks good to me, good catch, thanks shaggy!
> >
> > A related 'fix' which I'm still not quite sure about is making the
> > window 'tick_gtod + 2*TICK_NSEC'. That increases the max observed
> > difference to 4 jiffies, but allows ticks to be 'late' a bit without
> > first holding back time and then jumping ahead again.
> >
> > > Signed-off-by: Dave Kleikamp <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: Peter Zijlstra <[email protected]>
> >
> > Acked-by: Peter Zijlstra <[email protected]>
>
> applied to tip/sched/core, thanks!

hm, -tip testing found sporadic hard lockups during bootup, and i've
bisected them back to this patch. They happened on 64-bit test-systems.
I've attached the .config that produced the problem.

i reverted the patch and the lockups went away. But i cannot see what's
wrong with it ...

The lockups were at random places during bootup, such as:

[ 0.000000] initcall init_mbcache+0x0/0x20 returned 0 after 0 msecs
[ 0.000000] calling dquot_init+0x0/0x100 @ 1
[ 0.000000] VFS: Disk quotas dquot_6.5.1
[ 0.000000] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.000000] initcall dquot_init+0x0/0x100 returned 0 after 10 msecs
[ 0.000000] calling init_v2_quota_format+0x0/0x20 @ 1
[ 0.000000] initcall init_v2_quota_format+0x0/0x20 returned 0 after 0 msecs
[ 0.000000] calling dnotify_init+0x0/0x30 @ 1
[... hard lockup ...]

weird ...

Ingo



2008-10-09 17:54:35

by Dave Kleikamp

Subject: Re: [PATCH] sched_clock: prevent scd->clock from moving backwards

On Thu, 2008-10-09 at 17:17 +0200, Ingo Molnar wrote:

> hm, -tip testing found sporadic hard lockups during bootup, and i've
> bisected them back to this patch. They happened on 64-bit test-systems.
> I've attached the .config that produced the problem.
>
> i reverted the patch and the lockups went away. But i cannot see what's
> wrong with it ...

I could have sworn I ran with the patch, but maybe I got my patch queue
messed up and never tested it right.

I think I see the problem.

--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -118,13 +118,13 @@ static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now)
 
 	/*
 	 * scd->clock = clamp(scd->tick_gtod + delta,
-	 *		      max(scd->tick_gtod, scd->clock),
-	 *		      scd->tick_gtod + TICK_NSEC);
+	 *		      max(scd->tick_gtod, scd->clock),
+	 *		      min(scd->clock, scd->tick_gtod + TICK_NSEC));
 	 */
 
 	clock = scd->tick_gtod + delta;
 	min_clock = wrap_max(scd->tick_gtod, scd->clock);
-	max_clock = scd->tick_gtod + TICK_NSEC;
+	max_clock = wrap_min(scd->clock, scd->tick_gtod + TICK_NSEC);
 
 	clock = wrap_max(clock, min_clock);
 	clock = wrap_min(clock, max_clock);

We want wrap_max(scd->clock, scd->tick_gtod + TICK_NSEC), not
wrap_min(). The problem I am trying to fix is that scd->tick_gtod +
TICK_NSEC may be too low. The upper bound needs to be at LEAST
scd->clock. Limiting it to scd->clock all the time is disastrous. :-)
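
To make the failure concrete, here is a small userspace demo with made-up
numbers (illustrative only, not kernel code; it assumes HZ=1000 so
TICK_NSEC is 1000000, and re-implements the wrap helpers):

	#include <stdio.h>
	#include <stdint.h>

	#define TICK_NSEC 1000000ULL	/* assumed 1ms tick, i.e. HZ=1000 */

	/* same signed-difference trick as the kernel's wrap helpers */
	static uint64_t wrap_max(uint64_t x, uint64_t y)
	{
		return (int64_t)(x - y) > 0 ? x : y;
	}

	static uint64_t wrap_min(uint64_t x, uint64_t y)
	{
		return (int64_t)(x - y) < 0 ? x : y;
	}

	int main(void)
	{
		uint64_t tick_gtod = 10 * TICK_NSEC; /* last local tick */
		uint64_t scd_clock = 13 * TICK_NSEC; /* pushed ahead by a
							remote cpu */
		uint64_t delta = 5 * TICK_NSEC;	     /* raw time since tick */

		uint64_t clock = tick_gtod + delta;		  /* 15 ticks */
		uint64_t min_clock = wrap_max(tick_gtod, scd_clock); /* 13 */
		/* buggy bound: caps at min(scd_clock, window) = 11 ticks */
		uint64_t max_clock = wrap_min(scd_clock,
					      tick_gtod + TICK_NSEC);

		clock = wrap_max(clock, min_clock); /* still 15 ticks */
		clock = wrap_min(clock, max_clock); /* clamped to 11: the
						       clock went backwards */

		printf("old clock %llu, new clock %llu\n",
		       (unsigned long long)scd_clock,
		       (unsigned long long)clock);
		return 0;
	}

With the wrap_min() bound, the result can never exceed scd->clock (and
here it even lands behind it), so the clock freezes or steps back and
never advances, which would explain the hard lockups. With wrap_max() the
upper bound is at least scd->clock and time can move forward again.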

I'll fix the patch and retest it before sending it again.

Sorry about my sloppiness.

Shaggy
--
David Kleikamp
IBM Linux Technology Center

2008-10-09 18:21:44

by Dave Kleikamp

Subject: [PATCH] sched_clock: prevent scd->clock from moving backwards

On Thu, 2008-10-09 at 12:54 -0500, Dave Kleikamp wrote:

> I'll fix the patch and retest it before sending it again.

I'm definitely running with the new patch now. Here goes:
-------------------------------------------------
sched_clock: prevent scd->clock from moving backwards

When sched_clock_cpu() couples the clocks between two cpus, it may
increment scd->clock beyond the GTOD tick window that __update_sched_clock()
uses to clamp the clock. A later call to __update_sched_clock() may move
the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
monotonic property.

This patch ensures that scd->clock will not be set backward.

Signed-off-by: Dave Kleikamp <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e8ab096..8178724 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -118,13 +118,13 @@ static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now)
 
 	/*
 	 * scd->clock = clamp(scd->tick_gtod + delta,
-	 *		      max(scd->tick_gtod, scd->clock),
-	 *		      scd->tick_gtod + TICK_NSEC);
+	 *		      max(scd->tick_gtod, scd->clock),
+	 *		      max(scd->clock, scd->tick_gtod + TICK_NSEC));
 	 */
 
 	clock = scd->tick_gtod + delta;
 	min_clock = wrap_max(scd->tick_gtod, scd->clock);
-	max_clock = scd->tick_gtod + TICK_NSEC;
+	max_clock = wrap_max(scd->clock, scd->tick_gtod + TICK_NSEC);
 
 	clock = wrap_max(clock, min_clock);
 	clock = wrap_min(clock, max_clock);
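
Rechecking the corrected bound with the same made-up numbers as the
earlier demo (tick_gtod = 10 ticks, scd->clock = 13, delta = 5):

	/* min_clock = wrap_max(10, 13)     = 13 ticks
	 * max_clock = wrap_max(13, 10 + 1) = 13 ticks
	 * clock     = clamp(15, 13, 13)    = 13 ticks
	 *
	 * scd->clock holds at its old value instead of jumping back,
	 * and advances again once gtod catches up.
	 */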

--
David Kleikamp
IBM Linux Technology Center

2008-10-09 21:22:47

by Ingo Molnar

Subject: Re: [PATCH] sched_clock: prevent scd->clock from moving backwards


* Dave Kleikamp <[email protected]> wrote:

> On Thu, 2008-10-09 at 17:17 +0200, Ingo Molnar wrote:
>
> > hm, -tip testing found sporadic hard lockups during bootup, and i've
> > bisected them back to this patch. They happened on 64-bit test-systems.
> > I've attached the .config that produced the problem.
> >
> > i reverted the patch and the lockups went away. But i cannot see what's
> > wrong with it ...
>
> I could have sworn I ran with the patch, but maybe I got my patch queue
> messed up and never tested it right.
>
> I think I see the problem.
>
> --- a/kernel/sched_clock.c
> +++ b/kernel/sched_clock.c
> @@ -118,13 +118,13 @@ static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now)
> 
>  	/*
>  	 * scd->clock = clamp(scd->tick_gtod + delta,
> -	 *		      max(scd->tick_gtod, scd->clock),
> -	 *		      scd->tick_gtod + TICK_NSEC);
> +	 *		      max(scd->tick_gtod, scd->clock),
> +	 *		      min(scd->clock, scd->tick_gtod + TICK_NSEC));
>  	 */
> 
>  	clock = scd->tick_gtod + delta;
>  	min_clock = wrap_max(scd->tick_gtod, scd->clock);
> -	max_clock = scd->tick_gtod + TICK_NSEC;
> +	max_clock = wrap_min(scd->clock, scd->tick_gtod + TICK_NSEC);
> 
>  	clock = wrap_max(clock, min_clock);
>  	clock = wrap_min(clock, max_clock);
>
> We want wrap_max(scd->clock, scd->tick_gtod + TICK_NSEC), not
> wrap_min(). [...]

ah, so the lockup bug was probably that sched_clock() was never going
forwards properly, so some task was scheduled forever and livelocked the
system?

> [...] The problem I am trying to fix is that scd->tick_gtod +
> TICK_NSEC may be too low. The upper bound needs to be at LEAST
> scd->clock. Limiting it to scd->clock all the time is disastrous.
> :-)
>
> I'll fix the patch and retest it before sending it again.
>
> Sorry about my sloppiness.

no problem - and it's good that our bad-patch filters worked properly
and efficiently :-)

Ingo

2008-10-10 09:17:43

by Ingo Molnar

Subject: Re: [PATCH] sched_clock: prevent scd->clock from moving backwards


* Dave Kleikamp <[email protected]> wrote:

> On Thu, 2008-10-09 at 12:54 -0500, Dave Kleikamp wrote:
>
> > I'll fix the patch and retest it before sending it again.
>
> I'm definitely running with the new patch now. Here goes:
> -------------------------------------------------
> sched_clock: prevent scd->clock from moving backwards
>
> When sched_clock_cpu() couples the clocks between two cpus, it may
> increment scd->clock beyond the GTOD tick window that __update_sched_clock()
> uses to clamp the clock. A later call to __update_sched_clock() may move
> the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
> monotonic property.
>
> This patch ensures that scd->clock will not be set backward.
>
> Signed-off-by: Dave Kleikamp <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>

applied to tip/sched/clock, thanks Dave!

Ingo