LinuxLists.cc - [PATCH 1/3] dynticks - implement no idle hz for x86

[permalink] [raw]

Subject: [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c

Con
---

Attachments:

(No filename) (13.00 B)
dyntick-Fix_lost_tick_calculation_in_timer_pm_c.patch (4.17 kB)
Download all attachments

2005-09-02 15:46:28

[permalink] [raw]

Subject: [PATCH 3/3] dyntick - Recover walltime upon wakeup

Con
---

Attachments:

(No filename) (10.00 B)
dyntick-Recover_walltime_upon_wakeup.patch (7.69 kB)
Download all attachments

2005-09-02 16:56:33

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sat, Sep 03, 2005 at 01:43:57AM +1000, Con Kolivas wrote:
> Ok I've resynced all the patches with 2.6.13-mm1, made some cleanups and minor
> modifications. As pm timer is the only supported timer for dynticks I've also
> made it depend on it.
>
> A rollup patch against 2.6.13-mm1 is here:
>
> http://ck.kolivas.org/patches/dyn-ticks/2.6.13-mm1-dtck1.patch
>
> also available in the dyn-ticks directory are the older patches and these
> split out patches posted here.

Are you guys going to sync your interfaces with what ARM has, or are
we going to have two differing dyntick interfaces in the kernel, one
for ARM and one for x86?

I mentioned this before. I seem to be ignored.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-02 17:13:12

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Fri, Sep 02, 2005 at 05:56:23PM +0100, Russell King wrote:
> Are you guys going to sync your interfaces with what ARM has, or are
> we going to have two differing dyntick interfaces in the kernel, one
> for ARM and one for x86?

Three actually, including s390 :) I know that it would be really nice to sync
up with what is there in ARM/s390. I havent looked closely at both
implementations. Will have a look and post an update which should keep the
interfaces alike on all platforms.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-02 17:25:41

[permalink] [raw]

Subject: Re: [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c

Con,
Pls use this updated "Lost tick" calculation patch, which rectifies the
two problems Thomas pointed out. I have done some basic test with it.

Would it be possible to incorporate this updated patch in
http://ck.kolivas.org/patches/dyn-ticks/2.6.13-mm1-dtck1.patch?

Sorry for the inconvenience!

----

Currently, lost tick calculation in timer_pm.c is based on number
of microseconds that has elapsed since the last tick. Calculating
the number of microseconds is approximated by cyc2us, which
basically does :

microsec = (cycles * 286) / 1024

Consider 10 ticks lost. This amounts to 14319*10 = 143190 cycles
(14319 = PMTMR_EXPECTED_RATE/(CALIBRATE_LATCH/LATCH)).
This amount to 39992 microseconds as per the above equation
or 39992 / 4000 = 9 lost ticks, which is incorrect.

I feel lost ticks can be based on cycles difference directly
rather than being based on microseconds that has elapsed.

Following patch is in that direction.

With this patch, time had kept up really well on one particular
machine (Intel 4way Pentium 3 box) overnight, while
on another newer machine (Intel 4way Xeon with HT) it didnt do so
well (time sped up after 3 or 4 hours). Hence I consider this
particular patch will need more review/work.

Signed-off-by: Srivatsa Vaddagiri <[email protected]>
---

linux-2.6.13-mm1-root/arch/i386/kernel/timers/timer_pm.c | 48 ++++++---------
1 files changed, 22 insertions(+), 26 deletions(-)

diff -puN arch/i386/kernel/timers/timer_pm.c~pm_timer_fix arch/i386/kernel/timers/timer_pm.c
--- linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pm.c~pm_timer_fix 2005-09-02 22:13:44.000000000 +0530
+++ linux-2.6.13-mm1-root/arch/i386/kernel/timers/timer_pm.c 2005-09-02 22:18:09.000000000 +0530
@@ -30,6 +30,8 @@
((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))

+static int pm_ticks_per_jiffy = PMTMR_EXPECTED_RATE / (CALIBRATE_LATCH/LATCH);
+
/* The I/O port the PMTMR resides at.
* The location is detected during setup_arch(),
* in arch/i386/acpi/boot.c */
@@ -37,8 +39,7 @@ u32 pmtmr_ioport = 0;

/* value of the Power timer at last timer interrupt */
-static u32 offset_tick;
-static u32 offset_delay;
+static u32 offset_last;

static unsigned long long monotonic_base;
static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
@@ -127,6 +128,11 @@ pm_good:
if (verify_pmtmr_rate() != 0)
return -ENODEV;

+ printk ("Using %u PM timer ticks per jiffy \n", pm_ticks_per_jiffy);
+
+ offset_last = read_pmtmr();
+ setup_pit_timer();
+
init_cpu_khz();
return 0;
}
@@ -150,47 +156,37 @@ static inline u32 cyc2us(u32 cycles)
*/
static void mark_offset_pmtmr(void)
{
- u32 lost, delta, last_offset;
- static int first_run = 1;
- last_offset = offset_tick;
+ u32 lost, delta, deltaus, offset_now;

write_seqlock(&monotonic_lock);

- offset_tick = read_pmtmr();
+ offset_now = read_pmtmr();

/* calculate tick interval */
- delta = (offset_tick - last_offset) & ACPI_PM_MASK;
+ delta = (offset_now - offset_last) & ACPI_PM_MASK;
+
+ /* convert to ticks */
+ lost = delta / pm_ticks_per_jiffy;
+ offset_last += lost * pm_ticks_per_jiffy;
+ offset_last &= ACPI_PM_MASK;

/* convert to usecs */
- delta = cyc2us(delta);
+ deltaus = cyc2us(lost*pm_ticks_per_jiffy);

/* update the monotonic base value */
- monotonic_base += delta * NSEC_PER_USEC;
+ monotonic_base += deltaus * NSEC_PER_USEC;
write_sequnlock(&monotonic_lock);

- /* convert to ticks */
- delta += offset_delay;
- lost = delta / (USEC_PER_SEC / HZ);
- offset_delay = delta % (USEC_PER_SEC / HZ);
-
-
/* compensate for lost ticks */
if (lost >= 2)
jiffies_64 += lost - 1;
-
- /* don't calculate delay for first run,
- or if we've got less then a tick */
- if (first_run || (lost < 1)) {
- first_run = 0;
- offset_delay = 0;
- }
}

static int pmtmr_resume(void)
{
write_seqlock(&monotonic_lock);
/* Assume this is the last mark offset time */
- offset_tick = read_pmtmr();
+ offset_last = read_pmtmr();
write_sequnlock(&monotonic_lock);
return 0;
}
@@ -205,7 +201,7 @@ static unsigned long long monotonic_cloc
/* atomically read monotonic base & last_offset */
do {
seq = read_seqbegin(&monotonic_lock);
- last_offset = offset_tick;
+ last_offset = offset_last;
base = monotonic_base;
} while (read_seqretry(&monotonic_lock, seq));

@@ -239,11 +235,11 @@ static unsigned long get_offset_pmtmr(vo
{
u32 now, offset, delta = 0;

- offset = offset_tick;
+ offset = offset_last;
now = read_pmtmr();
delta = (now - offset)&ACPI_PM_MASK;

- return (unsigned long) offset_delay + cyc2us(delta);
+ return (unsigned long) cyc2us(delta);
}

_

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-02 20:18:14

by Thomas Schlichter

[permalink] [raw]

Subject: Re: [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c

Hi Srivatsa,

thank you for improving your patch by fixing the two problems. Now I do have
just two minor nits which you may consider:

1. You don't need to hold the monotonic_lock that long, it is only necessary
when updating offset_last and monotonic_base. So I would propose something
like this:

offset_now = read_pmtmr();

/* calculate tick interval */
delta = (offset_now - offset_last) & ACPI_PM_MASK;

/* convert to ticks */
lost = delta / pm_ticks_per_jiffy;

/* convert ticks to usecs */
deltaus = cyc2us(lost * pm_ticks_per_jiffy);
// can we use this instead: ?
// deltaus = jiffies_to_usecs(lost);

write_seqlock(&monotonic_lock);
offset_last += lost * pm_ticks_per_jiffy;
offset_last &= ACPI_PM_MASK;

/* update the monotonic base value */
monotonic_base += deltaus * NSEC_PER_USEC;
write_sequnlock(&monotonic_lock);

2. Can we really assure that the monotonic clock is still monotonic?
I think with your new code we estimate the monotonic clock value and the
offset_last at the last tick.
But if we underestimate monotonic_base or overestimate offset_last (even
simply by rounding errors), the time will make a small step backwards with
the value-update.
And as far as I understand the monotonic clock its not that bad if it drifts a
bit, but it is really bad if time makes steps backward...

But maybe you can show me that I am wrong with my second point.
I hope I don't bother you too much with this kind of stuff...

Thomas

P.S.: I CC'd John because he knows the monotonic clock better than I do... :-)

Am Freitag, 2. September 2005 19:25 schrieb Srivatsa Vaddagiri:
> Con,
> Pls use this updated "Lost tick" calculation patch, which rectifies the
> two problems Thomas pointed out. I have done some basic test with it.
>
> Would it be possible to incorporate this updated patch in
> http://ck.kolivas.org/patches/dyn-ticks/2.6.13-mm1-dtck1.patch?
>
> Sorry for the inconvenience!

Attachments:

(No filename) (1.89 kB)
signed data (No filename) (189.00 B)
signature Download all attachments

2005-09-02 21:22:17

by john stultz

[permalink] [raw]

Subject: Re: [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c

On Fri, 2005-09-02 at 22:18 +0200, Thomas Schlichter wrote:
> 2. Can we really assure that the monotonic clock is still monotonic?
> I think with your new code we estimate the monotonic clock value and the
> offset_last at the last tick.
> But if we underestimate monotonic_base or overestimate offset_last (even
> simply by rounding errors), the time will make a small step backwards with
> the value-update.
> And as far as I understand the monotonic clock its not that bad if it drifts a
> bit, but it is really bad if time makes steps backward...
>
> But maybe you can show me that I am wrong with my second point.
> I hope I don't bother you too much with this kind of stuff...
>
> Thomas
>
> P.S.: I CC'd John because he knows the monotonic clock better than I do... :-)

Thanks Thomas, that's a good catch. Since monotonic_clock has no real
notion of interrupt edges (it was designed to be constant regardless if
we miss ticks), I would keep accumulating the full inter-tick intervals
converted to usecs into the monotonic_base.

thanks
-john

2005-09-03 06:13:54

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sat, 3 Sep 2005 02:56, Russell King wrote:
> On Sat, Sep 03, 2005 at 01:43:57AM +1000, Con Kolivas wrote:
> > Ok I've resynced all the patches with 2.6.13-mm1, made some cleanups and
> > minor modifications. As pm timer is the only supported timer for dynticks
> > I've also made it depend on it.
> >
> > A rollup patch against 2.6.13-mm1 is here:
> >
> > http://ck.kolivas.org/patches/dyn-ticks/2.6.13-mm1-dtck1.patch
> >
> > also available in the dyn-ticks directory are the older patches and these
> > split out patches posted here.
>
> Are you guys going to sync your interfaces with what ARM has, or are
> we going to have two differing dyntick interfaces in the kernel, one
> for ARM and one for x86?
>
> I mentioned this before. I seem to be ignored.

RMK

Noone's ignoring you.

What we need to do is ensure that dynamic ticks is working properly on x86 and
worth including before anything else. If and when we confirm this it makes
sense only then to try and merge code from the other 2 architectures to as
much common code as possible as no doubt we'll be modifying other
architectures we're less familiar with. At that stage we will definitely want
to tread even more cautiously at that stage.

Cheers,
Con

2005-09-03 07:58:13

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> Noone's ignoring you.
>
> What we need to do is ensure that dynamic ticks is working properly on x86 and
> worth including before anything else. If and when we confirm this it makes
> sense only then to try and merge code from the other 2 architectures to as
> much common code as possible as no doubt we'll be modifying other
> architectures we're less familiar with. At that stage we will definitely want
> to tread even more cautiously at that stage.

dyntick has all the hallmarks of ending up another mess just like the
"generic" (hahaha) irq stuff in kernel/irq - it's being developed in
precisely the same way - by ignore non-x86 stuff.

I can well see that someone will say "ok, this is ready, merge it"
at which point we then end up with multiple differing userspace
methods of controlling it depending on the architecture, but
multiple differing kernel interfaces as well.

Indeed, you seem to be at the point where you'd like akpm to merge
it. That sets alarm bells ringing if you haven't considered these
issues.

I want to avoid that. Just because a couple of people say "we'll
deal with that later" it's no guarantee that it _will_ happen. I
want to ensure that ARM doesn't get fscked over again like it did
with the generic IRQ crap.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-03 08:01:36

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sat, 3 Sep 2005 17:58, Russell King wrote:
> On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> > Noone's ignoring you.
> >
> > What we need to do is ensure that dynamic ticks is working properly on
> > x86 and worth including before anything else. If and when we confirm this
> > it makes sense only then to try and merge code from the other 2
> > architectures to as much common code as possible as no doubt we'll be
> > modifying other architectures we're less familiar with. At that stage we
> > will definitely want to tread even more cautiously at that stage.
>
> dyntick has all the hallmarks of ending up another mess just like the
> "generic" (hahaha) irq stuff in kernel/irq - it's being developed in
> precisely the same way - by ignore non-x86 stuff.
>
> I can well see that someone will say "ok, this is ready, merge it"
> at which point we then end up with multiple differing userspace
> methods of controlling it depending on the architecture, but
> multiple differing kernel interfaces as well.
>
> Indeed, you seem to be at the point where you'd like akpm to merge
> it. That sets alarm bells ringing if you haven't considered these
> issues.
>
> I want to avoid that. Just because a couple of people say "we'll
> deal with that later" it's no guarantee that it _will_ happen. I
> want to ensure that ARM doesn't get fscked over again like it did
> with the generic IRQ crap.

Ok I'll make it clearer. We don't merge x86 dynticks to mainline till all are
consolidated in -mm.

Con

2005-09-03 08:07:00

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sat, Sep 03, 2005 at 06:01:08PM +1000, Con Kolivas wrote:
> On Sat, 3 Sep 2005 17:58, Russell King wrote:
> > On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> > > Noone's ignoring you.
> > >
> > > What we need to do is ensure that dynamic ticks is working properly on
> > > x86 and worth including before anything else. If and when we confirm this
> > > it makes sense only then to try and merge code from the other 2
> > > architectures to as much common code as possible as no doubt we'll be
> > > modifying other architectures we're less familiar with. At that stage we
> > > will definitely want to tread even more cautiously at that stage.
> >
> > dyntick has all the hallmarks of ending up another mess just like the
> > "generic" (hahaha) irq stuff in kernel/irq - it's being developed in
> > precisely the same way - by ignore non-x86 stuff.
> >
> > I can well see that someone will say "ok, this is ready, merge it"
> > at which point we then end up with multiple differing userspace
> > methods of controlling it depending on the architecture, but
> > multiple differing kernel interfaces as well.
> >
> > Indeed, you seem to be at the point where you'd like akpm to merge
> > it. That sets alarm bells ringing if you haven't considered these
> > issues.
> >
> > I want to avoid that. Just because a couple of people say "we'll
> > deal with that later" it's no guarantee that it _will_ happen. I
> > want to ensure that ARM doesn't get fscked over again like it did
> > with the generic IRQ crap.
>
> Ok I'll make it clearer. We don't merge x86 dynticks to mainline till all are
> consolidated in -mm.

Does this mean you're seriously going to rewrite bits of it after
you've spent what seems like months sorting out all the problems
currently being found?

Excuse me for being stupid, but I somehow don't see that happening.
Those months would be effectively wasted effort, both on the side
of the people working on the patches and those testing them.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-03 08:15:19

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sat, 3 Sep 2005 18:06, Russell King wrote:
> On Sat, Sep 03, 2005 at 06:01:08PM +1000, Con Kolivas wrote:
> > On Sat, 3 Sep 2005 17:58, Russell King wrote:
> > > On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> > > > Noone's ignoring you.
> > > >
> > > > What we need to do is ensure that dynamic ticks is working properly
> > > > on x86 and worth including before anything else. If and when we
> > > > confirm this it makes sense only then to try and merge code from the
> > > > other 2 architectures to as much common code as possible as no doubt
> > > > we'll be modifying other architectures we're less familiar with. At
> > > > that stage we will definitely want to tread even more cautiously at
> > > > that stage.
> > >
> > > dyntick has all the hallmarks of ending up another mess just like the
> > > "generic" (hahaha) irq stuff in kernel/irq - it's being developed in
> > > precisely the same way - by ignore non-x86 stuff.
> > >
> > > I can well see that someone will say "ok, this is ready, merge it"
> > > at which point we then end up with multiple differing userspace
> > > methods of controlling it depending on the architecture, but
> > > multiple differing kernel interfaces as well.
> > >
> > > Indeed, you seem to be at the point where you'd like akpm to merge
> > > it. That sets alarm bells ringing if you haven't considered these
> > > issues.
> > >
> > > I want to avoid that. Just because a couple of people say "we'll
> > > deal with that later" it's no guarantee that it _will_ happen. I
> > > want to ensure that ARM doesn't get fscked over again like it did
> > > with the generic IRQ crap.
> >
> > Ok I'll make it clearer. We don't merge x86 dynticks to mainline till all
> > are consolidated in -mm.
>
> Does this mean you're seriously going to rewrite bits of it after
> you've spent what seems like months sorting out all the problems
> currently being found?
>
> Excuse me for being stupid, but I somehow don't see that happening.
> Those months would be effectively wasted effort, both on the side
> of the people working on the patches and those testing them.

I've personally been on this code for 3 separate days in total and have no
deadline or requirement for this to go in ever so I should stop speaking on
behalf of the others.

Cheers,
Con

2005-09-04 20:11:04

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 03.09.2005 [18:14:48 +1000], Con Kolivas wrote:
> On Sat, 3 Sep 2005 18:06, Russell King wrote:
> > On Sat, Sep 03, 2005 at 06:01:08PM +1000, Con Kolivas wrote:
> > > On Sat, 3 Sep 2005 17:58, Russell King wrote:
> > > > On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> > > > > Noone's ignoring you.
> > > > >
> > > > > What we need to do is ensure that dynamic ticks is working properly
> > > > > on x86 and worth including before anything else. If and when we
> > > > > confirm this it makes sense only then to try and merge code from the
> > > > > other 2 architectures to as much common code as possible as no doubt
> > > > > we'll be modifying other architectures we're less familiar with. At
> > > > > that stage we will definitely want to tread even more cautiously at
> > > > > that stage.
> > > >
> > > > dyntick has all the hallmarks of ending up another mess just like the
> > > > "generic" (hahaha) irq stuff in kernel/irq - it's being developed in
> > > > precisely the same way - by ignore non-x86 stuff.
> > > >
> > > > I can well see that someone will say "ok, this is ready, merge it"
> > > > at which point we then end up with multiple differing userspace
> > > > methods of controlling it depending on the architecture, but
> > > > multiple differing kernel interfaces as well.
> > > >
> > > > Indeed, you seem to be at the point where you'd like akpm to merge
> > > > it. That sets alarm bells ringing if you haven't considered these
> > > > issues.
> > > >
> > > > I want to avoid that. Just because a couple of people say "we'll
> > > > deal with that later" it's no guarantee that it _will_ happen. I
> > > > want to ensure that ARM doesn't get fscked over again like it did
> > > > with the generic IRQ crap.
> > >
> > > Ok I'll make it clearer. We don't merge x86 dynticks to mainline till all
> > > are consolidated in -mm.
> >
> > Does this mean you're seriously going to rewrite bits of it after
> > you've spent what seems like months sorting out all the problems
> > currently being found?
> >
> > Excuse me for being stupid, but I somehow don't see that happening.
> > Those months would be effectively wasted effort, both on the side
> > of the people working on the patches and those testing them.
>
> I've personally been on this code for 3 separate days in total and have no
> deadline or requirement for this to go in ever so I should stop speaking on
> behalf of the others.

To join in this conversation late:

I've got a few ideas that I think might help push Con's patch coalescing
efforts in an arch-independent fashion.

First of all, and maybe this is just me, I think it would be good to
make the dyn_tick_timer per-interrupt source, as opposed to each arch?
Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
APIC, ACPI PM-timer and the HPET. These structures could be put in
arch-specific timer.c files (there currently is not one for x86, I
believe). Then, at compilation time, the appropriate structure would be
linked to the arch-generic code. That should make the arch-independent
code simple to implement (I do have some patches in the works, but it's
slow going, right now, sorry). I think ARM and s390 could perhaps use
this infrastructure as well?

Also, I am a bit confused by the use of "dynamic-tick" to describe these
changes. To me, these are all NO_IDLE_HZ implementations, as they are
only invoked from cpu_idle() (or their equivalent) routines. I know this
is true of s390 and the x86 code, and I believe it is true of the ARM
code? If it were dynamic-tick, I would think we would be adjusting the
timer interrupt frequency continuously (e.g., at the end of
__run_timers() and at every call to {add,mod,del}_timer()). I was
working on a patch which did some renaming to no_idle_hz_timer, etc.,
but it's mostly code churn :)

Con, wrt to the x86 implementation, I think the max_skip field should be
a member of the interrupt source (dyn_tick_timer) structure, as opposed
to the state. This would require dyn_tick_reprogram_timer() to change
slightly: either push the max_skip check into arch-specific code (and
then have arch_reprogram() return the actual number of jiffies
programmed to skip) or simply change the check conditional.

Also, what exactly the purpose of conditional_run_local_timers()? It
seems identical to run_local_timers(), except you check for inequality
before potentially raising the softirq. It seems like the conditional
check in run_timer_softirq() [the TIMER_SOFTIRQ callback] will achieve
the same thing? And, in fact, I think that conditional is always true?
At the end of __run_timers, base->timer_jiffies should be greater than
jiffies by 1.

In any case, sorry for all the words and no code... I will try and
rectify that soon. I think it *is* possible to do some architecting now,
so that other architectures can also easily implement no_idle_hz.

Thanks,
Nish

2005-09-04 20:26:31

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> I've got a few ideas that I think might help push Con's patch coalescing
> efforts in an arch-independent fashion.

Note that ARM contains cleanups on top of Tony's original work, on
which the x86 version is based.

Basically, Tony submitted his ARM version, we discussed it, fixed up
some locking problems and simplified it (it contained multiple
structures which weren't necessary, even in multiple timer-based systems).

I'd be really surprised if any architecture couldn't use what ARM has
today - in other words, this is the only kernel-side interface:

#ifdef CONFIG_NO_IDLE_HZ

#define DYN_TICK_SKIPPING (1 << 2)
#define DYN_TICK_ENABLED (1 << 1)
#define DYN_TICK_SUITABLE (1 << 0)

struct dyn_tick_timer {
unsigned int state; /* Current state */
int (*enable)(void); /* Enables dynamic tick */
int (*disable)(void); /* Disables dynamic tick */
void (*reprogram)(unsigned long); /* Reprograms the timer */
int (*handler)(int, void *, struct pt_regs *);
};

void timer_dyn_reprogram(void);
#else
#define timer_dyn_reprogram() do { } while (0)
#endif

> First of all, and maybe this is just me, I think it would be good to
> make the dyn_tick_timer per-interrupt source, as opposed to each arch?
> Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> APIC, ACPI PM-timer and the HPET. These structures could be put in
> arch-specific timer.c files (there currently is not one for x86, I
> believe).

Each timer source should have its own struct dyn_tick_timer. On x86,
maybe it makes sense having a pointer in the init_timer_opts or timer_opts
structures?

> I think ARM and s390 could perhaps use this infrastructure as well?

ARM already has a well thought-out encapsulation which is 100% suited to
its needs - which are essentially the same as x86 - the ability to select
one of several timer sources at boot time.

I would suggest having a good look at the ARM implementation. See:
include/asm-arm/mach/time.h (bit quoted above)
arch/arm/kernel/irq.c (to update system time before calling any irq handler)
arch/arm/kernel/time.c (initialisation and sysfs interface, etc)
arch/arm/mach-sa1100/time.c, arch/arm/mach-pxa/time.c, and
arch/arm/mach-omap1/time.c (dyntick machine class implementations).

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-04 20:38:04

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > I've got a few ideas that I think might help push Con's patch coalescing
> > efforts in an arch-independent fashion.
>
> Note that ARM contains cleanups on top of Tony's original work, on
> which the x86 version is based.
>
> Basically, Tony submitted his ARM version, we discussed it, fixed up
> some locking problems and simplified it (it contained multiple
> structures which weren't necessary, even in multiple timer-based systems).

Make sense. Thanks for the quick feedback!

> I'd be really surprised if any architecture couldn't use what ARM has
> today - in other words, this is the only kernel-side interface:
>
> #ifdef CONFIG_NO_IDLE_HZ
>
> #define DYN_TICK_SKIPPING (1 << 2)
> #define DYN_TICK_ENABLED (1 << 1)
> #define DYN_TICK_SUITABLE (1 << 0)
>
> struct dyn_tick_timer {
> unsigned int state; /* Current state */
> int (*enable)(void); /* Enables dynamic tick */
> int (*disable)(void); /* Disables dynamic tick */
> void (*reprogram)(unsigned long); /* Reprograms the timer */
> int (*handler)(int, void *, struct pt_regs *);
> };
>
> void timer_dyn_reprogram(void);
> #else
> #define timer_dyn_reprogram() do { } while (0)
> #endif

That looks great! So I guess I'm just suggesting moving this from
include/asm-arch/mach/time.h to arch-independent headers? Perhaps
timer.h is the best place for now, as it already contains the
next_timer_interrupt() prototype (which probably should be in the #ifdef
with timer_dyn_reprogram()).

> > First of all, and maybe this is just me, I think it would be good to
> > make the dyn_tick_timer per-interrupt source, as opposed to each arch?
> > Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> > APIC, ACPI PM-timer and the HPET. These structures could be put in
> > arch-specific timer.c files (there currently is not one for x86, I
> > believe).
>
> Each timer source should have its own struct dyn_tick_timer. On x86,
> maybe it makes sense having a pointer in the init_timer_opts or timer_opts
> structures?

Well, I know John Stultz is not a big fan of timer_opts, and is trying
to get rid of it :) timer_opts is supposed to be for timesources, I
believe, which are distinct from interrupt sources (e.g., TSC, Cyclone,
etc.), whereas I think dyn-tick is dealing with interrupt sources. I
guess if hardware (like the acpi_pm) can do both, there could be some
sort of inter-hooking.

> > I think ARM and s390 could perhaps use this infrastructure as well?
>
> ARM already has a well thought-out encapsulation which is 100% suited to
> its needs - which are essentially the same as x86 - the ability to select
> one of several timer sources at boot time.
>
> I would suggest having a good look at the ARM implementation. See:
> include/asm-arm/mach/time.h (bit quoted above)
> arch/arm/kernel/irq.c (to update system time before calling any irq handler)
> arch/arm/kernel/time.c (initialisation and sysfs interface, etc)
> arch/arm/mach-sa1100/time.c, arch/arm/mach-pxa/time.c, and
> arch/arm/mach-omap1/time.c (dyntick machine class implementations).

Yeah, I took a quick look before sending out my mail, but obviously need
to study it more. Thanks for the pointers! I guess that the time.h,
irq.c and time.c bits could all (or mostly) be done in arch-independent
code? I agree that your encapsulation seems to be suited to most arch's
use of NO_IDLE_HZ.

Overall, though, do you agree it would be best to have the common code
in a common file? If so, I'll work harder on getting some patches out.

Thanks,
Nish

2005-09-04 20:42:21

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

2005-09-04 21:17:56

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sun, Sep 04, 2005 at 01:37:55PM -0700, Nishanth Aravamudan wrote:
> That looks great! So I guess I'm just suggesting moving this from
> include/asm-arch/mach/time.h to arch-independent headers? Perhaps
> timer.h is the best place for now, as it already contains the
> next_timer_interrupt() prototype (which probably should be in the #ifdef
> with timer_dyn_reprogram()).

Sounds great!

> Overall, though, do you agree it would be best to have the common code
> in a common file? If so, I'll work harder on getting some patches out.

Absolutely, with the proviso that ARM doesn't (yet) use the generic IRQ
code. I say "(yet)" there because there are some folk working in this
area, and I've recently merged a couple of bits which reduce the impact
of their patches.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-05 03:06:32

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, 5 Sep 2005 06:37 am, Nishanth Aravamudan wrote:
> On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > I've got a few ideas that I think might help push Con's patch
> > > coalescing efforts in an arch-independent fashion.

Thanks very much Nish!

I've updated the patches here http://ck.kolivas.org/patches/dyn-ticks/ with
the latest change to timer_pm.c that Srivatsa sent me and have a new rollup
there as well as the split out patches. The ball is in Nish's court now so
we'll avoid touching the code till you get back to us (this project needs
some form of locking ;) ).

Cheers,
Con

2005-09-05 05:33:08

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sun, Sep 04, 2005 at 09:26:16PM +0100, Russell King wrote:
> I'd be really surprised if any architecture couldn't use what ARM has
> today - in other words, this is the only kernel-side interface:

Russel,
I went thr' the ARM implementation and have some remarks (mostly
from a SMP perspective):

1. On a SMP platform, we want to let individual CPUs "sleep" independent of
each other. What this mean is there has to be some way of tracking which
CPU's are sleeping currently, so that code like RCU ignores sleeping CPUs.
This was the reason nohz_cpu_mask bitmap was added. I don't see that
bitmap being updated at all in ARM implementation.

2. On architectures like x86 there is a separate jiffy interrupt source
(PIT) which is used to update time-of-day. This is different from the
HZ timer interrupts used on each CPU (local apic timer). When all
CPUs are idle and sleeping, we want to shut off this PIT timer as well.
That's why I added 'arch_all_cpus_idle' interface. One could argue that
this can be done as part of the dyn_tick->reprogram interface as well,
but I felt that having a separate arch_all_cpus_idle is cleaner and
makes it clear what its purpose is.

3. The fact that we want to manipulate the bitmap (set a bit when CPU is going
idle and unset it when it is waking up) _and_ the fact that want to take
some action when all CPUs are idle or when the first CPU is waking up,
requires the use of a spinlock, which is again not present in the ARM
implementation.

4. Again the fact that CPUs could be sleeping independent of each other
requires do_IRQ to check out whether the current CPU was sleeping as
its first step. If the CPU was sleeping, it needs to unset itself
from the bitmap _and_ if we are coming out of "all-cpu-asleep" state,
the PIT timer needs to be restarted as well as time recovered. Note
that these two steps need not be undertaken if we were not in
"all-cpus-asleep" state.

I don't see provisions for all these in the current ARM implementation.
In fact the x86 patch that Tony/Con posted didnt take into account most of these
as well, which is the reason I jumped in to fix the above issues.

5. Don't see how DYN_TICK_SKIPPING is being used. In SMP scenario,
it doesnt make sense since it will have to be per-cpu. The bitmap
that I talked of exactly tells that (whether a CPU is skipping
ticks or not).

6. S390 makes use of notifier mechanism to notify when CPUs are coming
in and out of idle state. Don't know how it will be used in other
arches. But obviously, if we are talking of unifying, we have to
provide one.

I hope this makes clear why some of the rework happened, which
in a way is extending the interface that ARM already has. Having
said all these, I do agree that having a consistent interface
is good (for example: x86 has dyn_tick_state structure whereas
ARM uses dyn_tick_timer strucuture itself to store the state etc).

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-05 05:48:21

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [11:02:25 +0530], Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 09:26:16PM +0100, Russell King wrote:
> > I'd be really surprised if any architecture couldn't use what ARM has
> > today - in other words, this is the only kernel-side interface:
>
> Russel,
> I went thr' the ARM implementation and have some remarks (mostly
> from a SMP perspective):
>
> 1. On a SMP platform, we want to let individual CPUs "sleep" independent of
> each other. What this mean is there has to be some way of tracking which
> CPU's are sleeping currently, so that code like RCU ignores sleeping CPUs.
> This was the reason nohz_cpu_mask bitmap was added. I don't see that
> bitmap being updated at all in ARM implementation.

Admittedly, I don't think SMP ARM has been around all that long? Maybe
the existing code just has not been extended.

> 2. On architectures like x86 there is a separate jiffy interrupt source
> (PIT) which is used to update time-of-day. This is different from the
> HZ timer interrupts used on each CPU (local apic timer). When all
> CPUs are idle and sleeping, we want to shut off this PIT timer as well.
> That's why I added 'arch_all_cpus_idle' interface. One could argue that
> this can be done as part of the dyn_tick->reprogram interface as well,
> but I felt that having a separate arch_all_cpus_idle is cleaner and
> makes it clear what its purpose is.

I'm not sure on this. It's going to be NULL for other architectures, or
end up being called by the reprogram() call for the last CPU to go idle,
right (presuming there isn't a separate TOD source, like in x86). I
think it is better to be in the reprogram() interface.

> 3. The fact that we want to manipulate the bitmap (set a bit when CPU is going
> idle and unset it when it is waking up) _and_ the fact that want to take
> some action when all CPUs are idle or when the first CPU is waking up,
> requires the use of a spinlock, which is again not present in the ARM
> implementation.

This might just be tied to the same UP capabilities, I'm not sure.

> 4. Again the fact that CPUs could be sleeping independent of each other
> requires do_IRQ to check out whether the current CPU was sleeping as
> its first step. If the CPU was sleeping, it needs to unset itself
> from the bitmap _and_ if we are coming out of "all-cpu-asleep" state,
> the PIT timer needs to be restarted as well as time recovered. Note
> that these two steps need not be undertaken if we were not in
> "all-cpus-asleep" state.

I agree; the latter can be in the arch-specific reprogram() code, just
like arch_all_cpus_idle() (which might be better named to
arch_set_all_cpus_idle()).

> I don't see provisions for all these in the current ARM
> implementation. In fact the x86 patch that Tony/Con posted didnt take
> into account most of these as well, which is the reason I jumped in to
> fix the above issues.

I definitely appreciate you doing so; dyn-tick for x86 has clearly come
a long way in a short time.

> 5. Don't see how DYN_TICK_SKIPPING is being used. In SMP scenario,
> it doesnt make sense since it will have to be per-cpu. The bitmap
> that I talked of exactly tells that (whether a CPU is skipping
> ticks or not).

I'll take a look at this.

> 6. S390 makes use of notifier mechanism to notify when CPUs are coming
> in and out of idle state. Don't know how it will be used in other
> arches. But obviously, if we are talking of unifying, we have to
> provide one.

Couldn't that be part of the s390-specific init() code? That member is
non-existent in the ARM implementation either, but it should not be hard
to add? The only issue I see, though, is that the function which the
init() member points to should not be marked __init, as we could have an
empty pointer later on?

> I hope this makes clear why some of the rework happened, which
> in a way is extending the interface that ARM already has. Having
> said all these, I do agree that having a consistent interface
> is good (for example: x86 has dyn_tick_state structure whereas
> ARM uses dyn_tick_timer strucuture itself to store the state etc).

I'm not sure on this last one, though, what part of the state can't
simply be represented by an integer with appropriate &-ing?

Thanks,
Nish

2005-09-05 06:33:05

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
> Admittedly, I don't think SMP ARM has been around all that long? Maybe
> the existing code just has not been extended.

Yeah, maybe ARM never cared for SMP. But we do care :)

> I'm not sure on this. It's going to be NULL for other architectures, or
> end up being called by the reprogram() call for the last CPU to go idle,
> right (presuming there isn't a separate TOD source, like in x86). I
> think it is better to be in the reprogram() interface.

Non-x86 could have it set to NULL, in which case it doesn't get called.
(I know the current code does not take care of this situation).
But having an explicit 'all_cpus_idle' interface may be good, since
Tony talked of idling some devices when all CPUs are idle. So it
probably has non-x86/PIT uses too.

> > 6. S390 makes use of notifier mechanism to notify when CPUs are coming
> > in and out of idle state. Don't know how it will be used in other
> > arches. But obviously, if we are talking of unifying, we have to
> > provide one.
>
> Couldn't that be part of the s390-specific init() code? That member is
> non-existent in the ARM implementation either, but it should not be hard
> to add? The only issue I see, though, is that the function which the
> init() member points to should not be marked __init, as we could have an
> empty pointer later on?

If we consider that only s390 needs it and other arch's dont, then it need
not be even part of the init code. Basically the notifier list can be maintained
by s390 in its arch-code entirely and have 'stop_hz_timer' call into
dyn_tick_reprogram_timer (or something like that)? But I feel there will be
other uses for the notifier list (I know the slab reap timer fires every two
seconds and that may be unnecessary on idle CPUs if it is not reaping
anything - perhaps it could use such a notifier to fire at longer intervals on
idle CPUs? That may be true of other short-timers that kernel/drivers may be
using. This is just a thought and may need more consideration before we
put a notifier mechanism in arch-independent code).

> I'm not sure on this last one, though, what part of the state can't
> simply be represented by an integer with appropriate &-ing?

Everything can be represented in bits! I was just comparing composition
of structures in ARM and x86. The state bitfield is part of
'struct dyn_tick_timer' itself in ARM while it is part of a separate structure
(dyn_tick_state) in x86. Similar minor points need to be sorted out while
unifying.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-05 06:44:22

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [12:02:29 +0530], Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
> > Admittedly, I don't think SMP ARM has been around all that long?
> > Maybe the existing code just has not been extended.
>
> Yeah, maybe ARM never cared for SMP. But we do care :)

I just took a look at arm/Kconfig and SMP is marked as EXPERIMENTAL &&
BROKEN. So I'm guessing that is the only reason for some of the
differences you mentioned (the differences are of course, valid and the
x86 SMP implementation makes sense to me to extend arch-independently).

> > I'm not sure on this. It's going to be NULL for other architectures,
> > or end up being called by the reprogram() call for the last CPU to
> > go idle, right (presuming there isn't a separate TOD source, like in
> > x86). I think it is better to be in the reprogram() interface.
>
> Non-x86 could have it set to NULL, in which case it doesn't get
> called. (I know the current code does not take care of this
> situation). But having an explicit 'all_cpus_idle' interface may be
> good, since Tony talked of idling some devices when all CPUs are idle.
> So it probably has non-x86/PIT uses too.

OK, not a problem. I'll try and write up a general intsource.h file
(interrupt source header) tonight and tomorrow and send it to this list
to see if everybody agrees on what's in the structure and where the
arch-independent/dependent line lies.

> > > 6. S390 makes use of notifier mechanism to notify when CPUs are
> > > coming in and out of idle state. Don't know how it will be used
> > > in other arches. But obviously, if we are talking of unifying,
> > > we have to provide one.
> >
> > Couldn't that be part of the s390-specific init() code? That member
> > is non-existent in the ARM implementation either, but it should not
> > be hard to add? The only issue I see, though, is that the function
> > which the init() member points to should not be marked __init, as we
> > could have an empty pointer later on?
>
> If we consider that only s390 needs it and other arch's dont, then it
> need not be even part of the init code. Basically the notifier list
> can be maintained by s390 in its arch-code entirely and have
> 'stop_hz_timer' call into dyn_tick_reprogram_timer (or something like
> that)? But I feel there will be other uses for the notifier list (I
> know the slab reap timer fires every two seconds and that may be
> unnecessary on idle CPUs if it is not reaping anything - perhaps it
> could use such a notifier to fire at longer intervals on idle CPUs?
> That may be true of other short-timers that kernel/drivers may be
> using. This is just a thought and may need more consideration before
> we put a notifier mechanism in arch-independent code).

Yeah, maybe we would be ok with keeping the notifier setup s390-specific
for now, and then extending the faculty to arch-independent code if we
find good (clean) reasons to do so. I'm not saying the slab reaping code
is insufficient, but I want to keep the structure and code as simple as
possible at first (in the design phase, at least).

> > I'm not sure on this last one, though, what part of the state can't
> > simply be represented by an integer with appropriate &-ing?
>
> Everything can be represented in bits! I was just comparing
> composition of structures in ARM and x86. The state bitfield is part
> of 'struct dyn_tick_timer' itself in ARM while it is part of a
> separate structure (dyn_tick_state) in x86. Similar minor points need
> to be sorted out while unifying.

Heh, I agree :) I just wanted to make sure that I hadn't missed
something and there was a *specific* reason the x86 code was using a
separate structure. I actuall prefer keeping it tied to the interrupt
source; it's simpler to me.

Thanks,
Nish

2005-09-05 06:59:23

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

* Nishanth Aravamudan <[email protected]> [050904 23:38]:
> On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > I've got a few ideas that I think might help push Con's patch coalescing
> > > efforts in an arch-independent fashion.
> >
> > Note that ARM contains cleanups on top of Tony's original work, on
> > which the x86 version is based.
> >
> > Basically, Tony submitted his ARM version, we discussed it, fixed up
> > some locking problems and simplified it (it contained multiple
> > structures which weren't necessary, even in multiple timer-based systems).
>
> Make sense. Thanks for the quick feedback!
>
> > I'd be really surprised if any architecture couldn't use what ARM has
> > today - in other words, this is the only kernel-side interface:
> >
> > #ifdef CONFIG_NO_IDLE_HZ
> >
> > #define DYN_TICK_SKIPPING (1 << 2)
> > #define DYN_TICK_ENABLED (1 << 1)
> > #define DYN_TICK_SUITABLE (1 << 0)
> >
> > struct dyn_tick_timer {
> > unsigned int state; /* Current state */
> > int (*enable)(void); /* Enables dynamic tick */
> > int (*disable)(void); /* Disables dynamic tick */
> > void (*reprogram)(unsigned long); /* Reprograms the timer */
> > int (*handler)(int, void *, struct pt_regs *);
> > };
> >
> > void timer_dyn_reprogram(void);
> > #else
> > #define timer_dyn_reprogram() do { } while (0)
> > #endif
>
> That looks great! So I guess I'm just suggesting moving this from
> include/asm-arch/mach/time.h to arch-independent headers? Perhaps
> timer.h is the best place for now, as it already contains the
> next_timer_interrupt() prototype (which probably should be in the #ifdef
> with timer_dyn_reprogram()).

Yes, the above should be enough on all platforms. I believe x86 still uses
two structs, and should be updated to use the interface above. There are
some extra state flags on x86, but even some of those might be
unnecessary now.

It may not be obvious from the mailing list discussions, but really the
remaining problems are to fix the x86 legacy issues with all the timers,
not with the interface.

Tony

2005-09-05 07:01:27

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> First of all, and maybe this is just me, I think it would be good to
> make the dyn_tick_timer per-interrupt source, as opposed to each arch?

Nish, may be a good idea as it may make the code more cleaner (it will
remove the 'if (cpu_has_local_apic())' kind of code that is there
currently in x86). However note that ARM currently has 'handler' member also
part of it, which is used to recover time and that has nothing to do with
interrupt source. Unless there is something like John's TOD, we still
need to recover time in a arch-dependent fashion ..Where do you
propose to have that 'handler' member?

> Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> APIC, ACPI PM-timer and the HPET. These structures could be put in

Does the ACPI PM-timer support generating interrupts also? Same question
I have for HPET.

> arch-specific timer.c files (there currently is not one for x86, I
> believe). Then, at compilation time, the appropriate structure would be
> linked to the arch-generic code. That should make the arch-independent

I think this binding has to be done at run-time, instead of compile-time?
(since we may build in support for local APIC but not find one at run-time, in
which case we have to fall back on PIT as interrupt source).

> code simple to implement (I do have some patches in the works, but it's
> slow going, right now, sorry). I think ARM and s390 could perhaps use
> this infrastructure as well?
>
> Also, I am a bit confused by the use of "dynamic-tick" to describe these
> changes. To me, these are all NO_IDLE_HZ implementations, as they are
> only invoked from cpu_idle() (or their equivalent) routines. I know this
> is true of s390 and the x86 code, and I believe it is true of the ARM
> code? If it were dynamic-tick, I would think we would be adjusting the
> timer interrupt frequency continuously (e.g., at the end of
> __run_timers() and at every call to {add,mod,del}_timer()). I was
> working on a patch which did some renaming to no_idle_hz_timer, etc.,
> but it's mostly code churn :)

Yes, the name 'dynamic-tick' is misleading!

> Con, wrt to the x86 implementation, I think the max_skip field should be
> a member of the interrupt source (dyn_tick_timer) structure, as opposed
> to the state. This would require dyn_tick_reprogram_timer() to change

max_skip is dictated by two things - interrupt and the backing time source.
In case of Local APIC, it may allow for ticks to be skipped upto few tens of
seconds, but if we are using ACPI PM timer to recover time, then we can
really skip not more than what the 24-bit PM timer allows for recovering time.
(few seconds if I remember correctly). Do you agree?

> Also, what exactly the purpose of conditional_run_local_timers()? It
> seems identical to run_local_timers(), except you check for inequality
> before potentially raising the softirq. It seems like the conditional
> check in run_timer_softirq() [the TIMER_SOFTIRQ callback] will achieve
> the same thing? And, in fact, I think that conditional is always true?

Nish, that was just an optimization for not raising the softirq at all
if the CPU was woken up w/o having skipped any ticks (becasue
of some external interrupt).

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-05 07:27:42

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

* Srivatsa Vaddagiri <[email protected]> [050905 10:03]:
> On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> >
> > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > is true of s390 and the x86 code, and I believe it is true of the ARM
> > code? If it were dynamic-tick, I would think we would be adjusting the
> > timer interrupt frequency continuously (e.g., at the end of
> > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > but it's mostly code churn :)
>
> Yes, the name 'dynamic-tick' is misleading!

Huh? For most people dynamic-tick is much more descriptive name than
NO_IDLE_HZ or VST!

If you wanted, you could reprogram the next timer to happen from
{add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.

And you would want to do that if you wanted sub-jiffie timer interrupts.

So I'd rather not limit the name to the currently implemented functionality
only :)

Tony

2005-09-05 07:37:42

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 11:02:25AM +0530, Srivatsa Vaddagiri wrote:
> I don't see provisions for all these in the current ARM implementation.

That's because, like x86, we've been ignoring each other. ARM
doesn't handle dyntick SMP yet - ARM is fairly young as far as
SMP issues goes, and as yet doesn't include a full SMP
implementation in mainline.

Despite that, the timers as implemented on the hardware are not
suitable for dyntick use - attempting to use them, you lose long
term precision of the timer interrupts.

> 5. Don't see how DYN_TICK_SKIPPING is being used. In SMP scenario,
> it doesnt make sense since it will have to be per-cpu. The bitmap
> that I talked of exactly tells that (whether a CPU is skipping
> ticks or not).

What's DYN_TICK_SKIPPING and what's it used for? It looks like
a redundant definition left over from Tony's original implementation.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-05 07:44:32

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 12:30:53PM +0530, Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > First of all, and maybe this is just me, I think it would be good to
> > make the dyn_tick_timer per-interrupt source, as opposed to each arch?
>
> Nish, may be a good idea as it may make the code more cleaner (it will
> remove the 'if (cpu_has_local_apic())' kind of code that is there
> currently in x86). However note that ARM currently has 'handler' member also
> part of it, which is used to recover time and that has nothing to do with
> interrupt source. Unless there is something like John's TOD, we still
> need to recover time in a arch-dependent fashion ..Where do you
> propose to have that 'handler' member?

Exactly where it is. It's there because of the problem you allude to
above - it's there to catch up system time. Any generic code can't
answer the question "how much time has passed since we disabled the
timer" without additional information.

However, we could change "handler" to be a function pointer which
returns the number of missed ticks instead, and then updates the
kernels time and tick keeping. That would probably be more efficient.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-05 07:50:06

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 08:37:28AM +0100, Russell King wrote:
> That's because, like x86, we've been ignoring each other. ARM
> doesn't handle dyntick SMP yet - ARM is fairly young as far as
> SMP issues goes, and as yet doesn't include a full SMP
> implementation in mainline.

>
> Despite that, the timers as implemented on the hardware are not
> suitable for dyntick use - attempting to use them, you lose long
> term precision of the timer interrupts.

Thats one of the problems I am seeing on x86 as well. Recovering
wall-time precisely after sleep is tough esepcially if the interrupt
source (PIT) and backing-time source (TSC/PM Timer/HPET) can
drift wrt each other. PPC64 should be much better I hope (which is what I
intend to take up next).

> > 5. Don't see how DYN_TICK_SKIPPING is being used. In SMP scenario,
> > it doesnt make sense since it will have to be per-cpu. The bitmap
> > that I talked of exactly tells that (whether a CPU is skipping
> > ticks or not).
>
> What's DYN_TICK_SKIPPING and what's it used for? It looks like
> a redundant definition left over from Tony's original implementation.

Tony was using it to signal that all CPUs are idle and timer are
being skipped. With the SMP changes I made, I felt it can be
substituted with the nohz_cpu_mask bitmap and hence I removed
it.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-05 08:00:38

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 01:19:28PM +0530, Srivatsa Vaddagiri wrote:
> > Despite that, the timers as implemented on the hardware are not
> > suitable for dyntick use - attempting to use them, you lose long
> > term precision of the timer interrupts.
>
> Thats one of the problems I am seeing on x86 as well. Recovering
> wall-time precisely after sleep is tough esepcially if the interrupt
> source (PIT) and backing-time source (TSC/PM Timer/HPET) can
> drift wrt each other. PPC64 should be much better I hope (which is what I
> intend to take up next).

This is why the config option to enable it on ARM has a warning in
there about it. Some hardware timer implementations just aren't
suitable for this, so users should be warned about it (and are on
ARM.)

> Tony was using it to signal that all CPUs are idle and timer are
> being skipped. With the SMP changes I made, I felt it can be
> substituted with the nohz_cpu_mask bitmap and hence I removed
> it.

Well, consider that definition removed from ARM. Forget it was even
saw it in there. 8)

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-05 08:20:00

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 08:44:25AM +0100, Russell King wrote:
> Exactly where it is. It's there because of the problem you allude to
> above - it's there to catch up system time. Any generic code can't
> answer the question "how much time has passed since we disabled the
> timer" without additional information.
>
> However, we could change "handler" to be a function pointer which
> returns the number of missed ticks instead, and then updates the
> kernels time and tick keeping. That would probably be more efficient.

This is precisely what I have done. I have made cur_timer->mark-offset() to
return the lost ticks and update wall-time from the callee, which
can be either timer_interrupt handler or in dyn-tick case the dyn-tick
code (I have called it dyn_tick_interrupt) which is called before processing
_any_ interrupt. If ARM had a timer_opts equivalent we could have followed
the same approach i.e remove 'handler' member and call dyn_tick_interrupt
as first step in __do_irq/do_IRQ to process whatever it wants (recover wall
time, start PIT timer in case of x86 etc). This is the definition of
dyn_tick_interrupt that I have in my patch:

~~~~~~~~~~~~~

asm-i386/dyn-tick.h:

#ifdef CONFIG_NO_IDLE_HZ

extern void dyn_tick_interrupt(int irq, struct pt_regs *regs);

#else

static inline void dyn_tick_interrupt(int irq, struct pt_regs *regs)
{
}

#endif

And dyn_tick_interrupt is coded as:

arch/i386/kernel/dyn-tick.c:

void dyn_tick_interrupt(int irq, struct pt_regs *regs)
{
int all_were_sleeping = 0;
int cpu = smp_processor_id();

if (!cpu_isset(cpu, nohz_cpu_mask))
return;

spin_lock(&dyn_tick_lock);

if (cpus_equal(nohz_cpu_mask, cpu_online_map))
all_were_sleeping = 1;
cpu_clear(cpu, nohz_cpu_mask);

if (all_were_sleeping) {
/* Recover jiffies */
if (irq) {
int lost;

lost = cur_timer->mark_offset();
if (lost)
do_timer(regs);
}
if (cpu_has_local_apic())
enable_pit_timer();
}

spin_unlock(&dyn_tick_lock);

if (cpu_has_local_apic())
/* Fixme: Needs to be more accurate */
reprogram_apic_timer(1);
else
reprogram_pit_timer(1);

conditional_run_local_timers();

/* Fixme: Enable NMI watchdog */
}

~~~~~~~~~~~

Considering that ARM does not have any of that timer_opts structure,
could you call into INT_OS_TIMER handler from dyn_tick_interrupt? AFAICS,
INT_OS_TIMER handler and dyn_tick->handler is same ..

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-05 08:32:31

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 01:49:35PM +0530, Srivatsa Vaddagiri wrote:
> This is precisely what I have done. I have made cur_timer->mark-offset() to
> return the lost ticks and update wall-time from the callee, which
> can be either timer_interrupt handler or in dyn-tick case the dyn-tick
> code (I have called it dyn_tick_interrupt) which is called before processing
> _any_ interrupt.

When you have a timer which constantly increments from 0 to MAX and
wraps, and you can set the value to match to cause an interrupt,
it makes more sense to handle it the way we're doing it (which
incidentally leads to no loss of precision.)

Calculating the number of ticks missed, updating the kernel time,
and updating the timer match will cause problems with these - if
the timer has already past the number of ticks you originally
calculated, you may not get another interrupt for a long time.

So I don't actually think that your proposal will work for these
(SA11x0 and PXA).

> If ARM had a timer_opts equivalent we could have followed

I think your timer_opts is effectively our struct sys_timer.

> int lost;
>
> lost = cur_timer->mark_offset();
> if (lost)
> do_timer(regs);

This seems to only recover one tick. What if multiple ticks were lost?

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2005-09-05 09:25:22

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 09:32:21AM +0100, Russell King wrote:
> When you have a timer which constantly increments from 0 to MAX and
> wraps, and you can set the value to match to cause an interrupt,
> it makes more sense to handle it the way we're doing it (which
> incidentally leads to no loss of precision.)

> Calculating the number of ticks missed, updating the kernel time,
> and updating the timer match will cause problems with these - if
> the timer has already past the number of ticks you originally
> calculated, you may not get another interrupt for a long time.
>
> So I don't actually think that your proposal will work for these
> (SA11x0 and PXA).

I presume you are referring to code as in omap_32k_timer_interrupt
which calculates lost ticks as well as updates wall-time and
sets up the next interrupt (BTW doesnt 'now' need to be
refreshed everytime in the loop otherwise will cause the problem
you cite - may not get interrupt for a long time?). Tony,
that may have cause slow bootups for you :)

I am not saying that all the above be done from the callee. In fact
in case of ARM, the same handler can be called from dyn_tick_interrupt.
Having some form of 'dyn_tick_interrupt' makes sense because
it encapsulates functionalities like:

- If CPU is not sleeping currently, return (which can happen in SMP)
- Reset the CPU from the bitmap, under the cover of a spinlock
- Recover wall-time if we are coming out of 'all-cpus-were-asleep'
state. In case of ARM, dyn_tick_timer->handler could be called
for this purpose.

> This seems to only recover one tick. What if multiple ticks were lost?

cur_timer->mark_offset() recovers the rest.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-05 13:21:35

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 12:30:53PM +0530, Srivatsa Vaddagiri wrote:
> > Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> > APIC, ACPI PM-timer and the HPET. These structures could be put in
>
> Does the ACPI PM-timer support generating interrupts also? Same question
> I have for HPET.

I think HPET does support generation of interrupts but not PM timer.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-05 16:28:42

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [13:08:20 +1000], Con Kolivas wrote:
> On Mon, 5 Sep 2005 06:37 am, Nishanth Aravamudan wrote:
> > On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > I've got a few ideas that I think might help push Con's patch
> > > > coalescing efforts in an arch-independent fashion.
>
> Thanks very much Nish!
>
> I've updated the patches here http://ck.kolivas.org/patches/dyn-ticks/ with
> the latest change to timer_pm.c that Srivatsa sent me and have a new rollup
> there as well as the split out patches. The ball is in Nish's court now so
> we'll avoid touching the code till you get back to us (this project needs
> some form of locking ;) ).

Albeit, don't take that to mean that other people shouldn't keep doing
what they are doing (Srivatsa with his pm_timer work, scalability work,
e.g.) :) Hopefully, any changes I make, will not take too long.

Thanks,
Nish

2005-09-05 16:31:00

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [09:58:59 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <[email protected]> [050904 23:38]:
> > On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > I've got a few ideas that I think might help push Con's patch coalescing
> > > > efforts in an arch-independent fashion.
> > >
> > > Note that ARM contains cleanups on top of Tony's original work, on
> > > which the x86 version is based.
> > >
> > > Basically, Tony submitted his ARM version, we discussed it, fixed up
> > > some locking problems and simplified it (it contained multiple
> > > structures which weren't necessary, even in multiple timer-based systems).
> >
> > Make sense. Thanks for the quick feedback!
> >
> > > I'd be really surprised if any architecture couldn't use what ARM has
> > > today - in other words, this is the only kernel-side interface:
> > >
> > > #ifdef CONFIG_NO_IDLE_HZ
> > >
> > > #define DYN_TICK_SKIPPING (1 << 2)
> > > #define DYN_TICK_ENABLED (1 << 1)
> > > #define DYN_TICK_SUITABLE (1 << 0)
> > >
> > > struct dyn_tick_timer {
> > > unsigned int state; /* Current state */
> > > int (*enable)(void); /* Enables dynamic tick */
> > > int (*disable)(void); /* Disables dynamic tick */
> > > void (*reprogram)(unsigned long); /* Reprograms the timer */
> > > int (*handler)(int, void *, struct pt_regs *);
> > > };
> > >
> > > void timer_dyn_reprogram(void);
> > > #else
> > > #define timer_dyn_reprogram() do { } while (0)
> > > #endif
> >
> > That looks great! So I guess I'm just suggesting moving this from
> > include/asm-arch/mach/time.h to arch-independent headers? Perhaps
> > timer.h is the best place for now, as it already contains the
> > next_timer_interrupt() prototype (which probably should be in the #ifdef
> > with timer_dyn_reprogram()).
>
> Yes, the above should be enough on all platforms. I believe x86 still uses
> two structs, and should be updated to use the interface above. There are
> some extra state flags on x86, but even some of those might be
> unnecessary now.

Yes, I agree.

> It may not be obvious from the mailing list discussions, but really the
> remaining problems are to fix the x86 legacy issues with all the timers,
> not with the interface.

The interface in x86 is fine, I agree. But the problem I see, is that we
would now have *3* different implementations of dyn-tick. At the point
where Con or anyone else is ready to propose merging of the code, I
think the dyn-tick work comes across much better if it simultaneously
unifies the existing NO_IDLE_HZ implementations in common files where
appropriate.

Thanks,
Nish

2005-09-05 16:33:42

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [09:00:28 +0100], Russell King wrote:
> On Mon, Sep 05, 2005 at 01:19:28PM +0530, Srivatsa Vaddagiri wrote:
> > > Despite that, the timers as implemented on the hardware are not
> > > suitable for dyntick use - attempting to use them, you lose long
> > > term precision of the timer interrupts.
> >
> > Thats one of the problems I am seeing on x86 as well. Recovering
> > wall-time precisely after sleep is tough esepcially if the interrupt
> > source (PIT) and backing-time source (TSC/PM Timer/HPET) can drift
> > wrt each other. PPC64 should be much better I hope (which is what I
> > intend to take up next).
>
> This is why the config option to enable it on ARM has a warning in
> there about it. Some hardware timer implementations just aren't
> suitable for this, so users should be warned about it (and are on
> ARM.)

And this is where almost all of the bugs are going to come from in the
x86 implementation. John Stultz's rework helps remove some of the
interrupt dependency of the timeofday code, but he's reworking it now.

> > Tony was using it to signal that all CPUs are idle and timer are
> > being skipped. With the SMP changes I made, I felt it can be
> > substituted with the nohz_cpu_mask bitmap and hence I removed
> > it.
>
> Well, consider that definition removed from ARM. Forget it was even
> saw it in there. 8)

Yes, the cpu_mask covers the same concept, I think it's a good choice.

Thanks,
Nish

2005-09-05 17:02:07

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> * Srivatsa Vaddagiri <[email protected]> [050905 10:03]:
> > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > >
> > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > timer interrupt frequency continuously (e.g., at the end of
> > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > but it's mostly code churn :)
> >
> > Yes, the name 'dynamic-tick' is misleading!
>
> Huh? For most people dynamic-tick is much more descriptive name than
> NO_IDLE_HZ or VST!

I understand this. My point is that the structures are *not*
dynamic-tick specific. They are interrupt source specific, generally
(also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
users of the interrupt source reprogramming functions, but not the
reprogrammers themselves, in my mind. Also, it still would be confusing
to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)

> If you wanted, you could reprogram the next timer to happen from
> {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.

I messed with this with my soft-timer rework (which has since has fallen
by the wayside). It is a bit of overhead, especially del_timer(), but
it's possible. This is what I would consider "dynamic-tick." And I would
setup a *different* .config option to enable it. Perhaps depending on
CONFIG_NO_IDLE_HZ.

> And you would want to do that if you wanted sub-jiffie timer
> interrupts.

Yes, true, it does enable that. Well, to be honest, it completely
redefines (in some sense) the jiffy, as it is potentially continuously
changing, not just at idle times.

> So I'd rather not limit the name to the currently implemented
> functionality only :)

I'm not trying to limit the name, but make sure we are tying the
strcutures and functions to the right abstraction (interrupt source, in
my opinion).

Thanks,
Nish

2005-09-05 16:59:05

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [12:30:53 +0530], Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > First of all, and maybe this is just me, I think it would be good to
> > make the dyn_tick_timer per-interrupt source, as opposed to each arch?
>
> Nish, may be a good idea as it may make the code more cleaner (it will
> remove the 'if (cpu_has_local_apic())' kind of code that is there
> currently in x86).

Yes, exactly. I think those kind of interrupt-source specific code can
be handled by the interrupt-source :)

> However note that ARM currently has 'handler' member also part of it,
> which is used to recover time and that has nothing to do with
> interrupt source. Unless there is something like John's TOD, we still
> need to recover time in a arch-dependent fashion ..Where do you
> propose to have that 'handler' member?

I think it's ok where it is. Currently, with x86, at least, you can have
an independent interrupt source and time source (not true for all archs,
of course, ppc64 being a good example, I think?) Perhaps "handler"
should be called arch_recover_time() and may point to a timesource
function (currently PIT/TSC/ACPI_PM/HPET on x86, right?) which does the
appropriate catch-up for the time-related variables. In any case, since
most of the timesource code is lost-tick aware, I think it is possible.

> > Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> > APIC, ACPI PM-timer and the HPET. These structures could be put in
>
> Does the ACPI PM-timer support generating interrupts also? Same question
> I have for HPET.

I think, as you figured out, the HPET can, but the ACPI_PM can not. John
might know for sure (I always end up asking him), have added him to the
Cc.

>
> > arch-specific timer.c files (there currently is not one for x86, I
> > believe). Then, at compilation time, the appropriate structure would be
> > linked to the arch-generic code. That should make the arch-independent
>
> I think this binding has to be done at run-time, instead of
> compile-time? (since we may build in support for local APIC but not
> find one at run-time, in which case we have to fall back on PIT as
> interrupt source).

What may be useful is something similar to what John Stultz does in his
rework, attaching priorities to the various interrupt sources. For
example, on x86, if we have an HPET, then we should use it, if not, then
use APIC and PIT, but if the APIC doesn't exist in h/w, or is buggy
(perhaps determined via a calibration loop), then only use the PIT.

> > code simple to implement (I do have some patches in the works, but it's
> > slow going, right now, sorry). I think ARM and s390 could perhaps use
> > this infrastructure as well?
> >
> > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > is true of s390 and the x86 code, and I believe it is true of the ARM
> > code? If it were dynamic-tick, I would think we would be adjusting the
> > timer interrupt frequency continuously (e.g., at the end of
> > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > but it's mostly code churn :)
>
> Yes, the name 'dynamic-tick' is misleading!

Especially, to me, as the .config option is NO_IDLE_HZ. I prefer
referring to everything as interrupt_source or something similar, I
think (after looking more at the code), then it doesn't matter whether
it is being used for (what is technically) NO_IDLE_HZ or dynamic-tick.

> > Con, wrt to the x86 implementation, I think the max_skip field should be
> > a member of the interrupt source (dyn_tick_timer) structure, as opposed
> > to the state. This would require dyn_tick_reprogram_timer() to change
>
> max_skip is dictated by two things - interrupt and the backing time source.
> In case of Local APIC, it may allow for ticks to be skipped upto few tens of
> seconds, but if we are using ACPI PM timer to recover time, then we can
> really skip not more than what the 24-bit PM timer allows for recovering time.
> (few seconds if I remember correctly). Do you agree?

I agree. I guess max_skip, to me, is what the kernel thinks the
interrupt source should maximally skip by, not what the interrupt source
thinks it can do. So, I think it fits in fine with what you are saying
and with the code you have in the current patch.

> > Also, what exactly the purpose of conditional_run_local_timers()? It
> > seems identical to run_local_timers(), except you check for
> > inequality before potentially raising the softirq. It seems like the
> > conditional check in run_timer_softirq() [the TIMER_SOFTIRQ
> > callback] will achieve the same thing? And, in fact, I think that
> > conditional is always true?
>
> Nish, that was just an optimization for not raising the softirq at all
> if the CPU was woken up w/o having skipped any ticks (becasue of some
> external interrupt).

I was just wondering; I guess it makes sense, but did you check to see
if it ever *doesn't* get called? Like I said, __run_timers() [from how I
understand it], will always increment base->timer_jiffies to one more
than jiffies. So if we disable interrupts and come right back, that
conditional is still true, but time_after_eq(jiffies,
base->timer_jiffies) [the condition in run_timer_softirq()] is not. How
much does it cost to raise the softirq, if it is going to return
immediately from the callback?

Thanks,
Nish

2005-09-05 17:04:28

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [08:44:25 +0100], Russell King wrote:
> On Mon, Sep 05, 2005 at 12:30:53PM +0530, Srivatsa Vaddagiri wrote:
> > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > First of all, and maybe this is just me, I think it would be good to
> > > make the dyn_tick_timer per-interrupt source, as opposed to each arch?
> >
> > Nish, may be a good idea as it may make the code more cleaner (it will
> > remove the 'if (cpu_has_local_apic())' kind of code that is there
> > currently in x86). However note that ARM currently has 'handler' member also
> > part of it, which is used to recover time and that has nothing to do with
> > interrupt source. Unless there is something like John's TOD, we still
> > need to recover time in a arch-dependent fashion ..Where do you
> > propose to have that 'handler' member?
>
> Exactly where it is. It's there because of the problem you allude to
> above - it's there to catch up system time. Any generic code can't
> answer the question "how much time has passed since we disabled the
> timer" without additional information.

I agree.

> However, we could change "handler" to be a function pointer which
> returns the number of missed ticks instead, and then updates the
> kernels time and tick keeping. That would probably be more efficient.

Yes, I think

unsigned long (*recover_time)(int, void *, struct pt_regs *);

or something similar (not sure about the params), might be more
appropriate.

Thanks,
Nish

2005-09-05 17:07:08

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [09:32:21 +0100], Russell King wrote:
> On Mon, Sep 05, 2005 at 01:49:35PM +0530, Srivatsa Vaddagiri wrote:
> > This is precisely what I have done. I have made cur_timer->mark-offset() to
> > return the lost ticks and update wall-time from the callee, which
> > can be either timer_interrupt handler or in dyn-tick case the dyn-tick
> > code (I have called it dyn_tick_interrupt) which is called before processing
> > _any_ interrupt.
>
> When you have a timer which constantly increments from 0 to MAX and
> wraps, and you can set the value to match to cause an interrupt,
> it makes more sense to handle it the way we're doing it (which
> incidentally leads to no loss of precision.)

This is the way ppc works, I believe (match register).

> Calculating the number of ticks missed, updating the kernel time,
> and updating the timer match will cause problems with these - if
> the timer has already past the number of ticks you originally
> calculated, you may not get another interrupt for a long time.

Yes, this is the source of much bugginess, especially with bad hardware
:)

> > If ARM had a timer_opts equivalent we could have followed
>
> I think your timer_opts is effectively our struct sys_timer.

I agree, in looking over the two. Perhaps those structures could be
served to be unified as well? John Stultz would be the one to talk to,
though.

Thanks,
Nish

2005-09-05 17:26:51

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 09:57:30AM -0700, Nishanth Aravamudan wrote:
> I think it's ok where it is. Currently, with x86, at least, you can have
> an independent interrupt source and time source (not true for all archs,
> of course, ppc64 being a good example, I think?) Perhaps "handler"

By "independent" do you mean driven by separate clocks? PPC64 does
use decrementer as its interrupt source and Time-base-register as
its timesource AFAIK. Both are driven by the same clock I think.

> What may be useful is something similar to what John Stultz does in his
> rework, attaching priorities to the various interrupt sources. For
> example, on x86, if we have an HPET, then we should use it, if not, then
> use APIC and PIT, but if the APIC doesn't exist in h/w, or is buggy
> (perhaps determined via a calibration loop), then only use the PIT.

This logic is what the arch-code should follow in picking its interrupt
source and is independent of dyn-tick. dyn-tick just works with whatever
arch-code has chosen as its interrupt source.

> I agree. I guess max_skip, to me, is what the kernel thinks the
> interrupt source should maximally skip by, not what the interrupt source
> thinks it can do. So, I think it fits in fine with what you are saying
> and with the code you have in the current patch.

Great!

> I was just wondering; I guess it makes sense, but did you check to see
> if it ever *doesn't* get called? Like I said, __run_timers() [from how I

Haven't tested that, but I feel can happen in practice, since we dont
control device interrupts.

> base->timer_jiffies) [the condition in run_timer_softirq()] is not. How
> much does it cost to raise the softirq, if it is going to return
> immediately from the callback?

Don't know. It just felt nice to avoid any unnecessary invocations.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-05 17:27:44

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Mon, Sep 05, 2005 at 10:04:24AM -0700, Nishanth Aravamudan wrote:
> > However, we could change "handler" to be a function pointer which
> > returns the number of missed ticks instead, and then updates the
> > kernels time and tick keeping. That would probably be more efficient.
>
> Yes, I think
>
> unsigned long (*recover_time)(int, void *, struct pt_regs *);
>
> or something similar (not sure about the params), might be more
> appropriate.

What would this be for x86? This could be cur_timer->mark_offset()
itself for now i think, until John's TOD comes along.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-05 18:06:51

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [22:57:14 +0530], Srivatsa Vaddagiri wrote:
> On Mon, Sep 05, 2005 at 10:04:24AM -0700, Nishanth Aravamudan wrote:
> > > However, we could change "handler" to be a function pointer which
> > > returns the number of missed ticks instead, and then updates the
> > > kernels time and tick keeping. That would probably be more efficient.
> >
> > Yes, I think
> >
> > unsigned long (*recover_time)(int, void *, struct pt_regs *);
> >
> > or something similar (not sure about the params), might be more
> > appropriate.
>
> What would this be for x86? This could be cur_timer->mark_offset()
> itself for now i think, until John's TOD comes along.

Yes, exactly, I was planning on hooking into the timer_opts for x86,
until John's timesource rework occured, which will keep the code pretty
similar across the change, but helps keep it clear *why* we are calling
mark_offset(), at least to me.

Thanks,
Nish

2005-09-05 18:11:19

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 05.09.2005 [22:55:01 +0530], Srivatsa Vaddagiri wrote:
> On Mon, Sep 05, 2005 at 09:57:30AM -0700, Nishanth Aravamudan wrote:
> > I think it's ok where it is. Currently, with x86, at least, you can have
> > an independent interrupt source and time source (not true for all archs,
> > of course, ppc64 being a good example, I think?) Perhaps "handler"
>
> By "independent" do you mean driven by separate clocks? PPC64 does
> use decrementer as its interrupt source and Time-base-register as
> its timesource AFAIK. Both are driven by the same clock I think.

Well, independent as in not the same, I meant. Let me think about it and
look at the code a bit, before making myself or anyone else more
confused. John, do you have any input on what I'm getting at? I know we
have discussed this before...

> > What may be useful is something similar to what John Stultz does in his
> > rework, attaching priorities to the various interrupt sources. For
> > example, on x86, if we have an HPET, then we should use it, if not, then
> > use APIC and PIT, but if the APIC doesn't exist in h/w, or is buggy
> > (perhaps determined via a calibration loop), then only use the PIT.
>
> This logic is what the arch-code should follow in picking its interrupt
> source and is independent of dyn-tick. dyn-tick just works with whatever
> arch-code has chosen as its interrupt source.

Yes, true. I didn't mean for the h/w interrupt source selection to be
part of the arch-independent code, but that we might need to include a
priority field in the interrupt_source structure to allow the
arch-dependent code to do so.

> > I agree. I guess max_skip, to me, is what the kernel thinks the
> > interrupt source should maximally skip by, not what the interrupt source
> > thinks it can do. So, I think it fits in fine with what you are saying
> > and with the code you have in the current patch.
>
> Great!
>
> > I was just wondering; I guess it makes sense, but did you check to see
> > if it ever *doesn't* get called? Like I said, __run_timers() [from how I
>
> Haven't tested that, but I feel can happen in practice, since we dont
> control device interrupts.

Well, it would be interesting to see if there's any difference without
that function, or if it's even getting called.

> > base->timer_jiffies) [the condition in run_timer_softirq()] is not. How
> > much does it cost to raise the softirq, if it is going to return
> > immediately from the callback?
>
> Don't know. It just felt nice to avoid any unnecessary invocations.

Yes, but it also might add a function which doesn't need to be. I'll
take a closer look at this too.

Thanks,
Nish

2005-09-06 20:51:21

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 04.09.2005 [23:44:16 -0700], Nishanth Aravamudan wrote:
> On 05.09.2005 [12:02:29 +0530], Srivatsa Vaddagiri wrote:
> > On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
> > > Admittedly, I don't think SMP ARM has been around all that long?
> > > Maybe the existing code just has not been extended.
> >
> > Yeah, maybe ARM never cared for SMP. But we do care :)
>
> I just took a look at arm/Kconfig and SMP is marked as EXPERIMENTAL &&
> BROKEN. So I'm guessing that is the only reason for some of the
> differences you mentioned (the differences are of course, valid and the
> x86 SMP implementation makes sense to me to extend arch-independently).
>
> > > I'm not sure on this. It's going to be NULL for other architectures,
> > > or end up being called by the reprogram() call for the last CPU to
> > > go idle, right (presuming there isn't a separate TOD source, like in
> > > x86). I think it is better to be in the reprogram() interface.
> >
> > Non-x86 could have it set to NULL, in which case it doesn't get
> > called. (I know the current code does not take care of this
> > situation). But having an explicit 'all_cpus_idle' interface may be
> > good, since Tony talked of idling some devices when all CPUs are idle.
> > So it probably has non-x86/PIT uses too.
>
> OK, not a problem. I'll try and write up a general intsource.h file
> (interrupt source header) tonight and tomorrow and send it to this list
> to see if everybody agrees on what's in the structure and where the
> arch-independent/dependent line lies.

Sigh, later than I had hoped, but here is what I have hashed out so far.
Does it seem like a step in the right direction? Rather hand-wavy, but I
think it's mostly correct ;)

Thanks,
Nish

- include/linux/intsource.h
with definitions in kernel/intsource.c

#define DYN_TICK_ENABLED (1 << 1)
#define DYN_TICK_SUITABLE (1 << 0)

#define DYN_TICK_MIN_SKIP 2

/* Abstraction of an interrupt source
* @state: current state
* @max_skip: current maximum number of ticks to skip
* @arch_init: initialization routine
* @arch_enable_dyn_tick: called via sysfs to enable interrupt skipping
* @arch_disable_dyn_tick: called via sysfs to disable interrupt
* skipping
* @arch_set_all_cpus_idle: last cpu to go idle calls this, which should
* disable any timesource (e.g. PIT on x86)
* @arch_recover_time: handler for returning from skipped ticks and keeping
* time consistent
*/
struct interrupt_source {
unsigned int state;
unsigned long max_skip;
int (*arch_init) (void);
void (*arch_enable_dyn_tick) (void);
void (*arch_disable_dyn_tick) (void);
unsigned long (*arch_reprogram) (unsigned long); /* return number of ticks skipped */
unsigned long (*arch_recover_time) (int, void *, struct pt_regs *); /* handler in arm */
/* following empty in UP */
void (*arch_set_all_cpus_idle) (int);
spinlock_t lock;
};

extern void interrupt_source_register(struct interrupt_source *new_interrupt_source);
extern struct interrupt_source *current_intsource;

#ifdef CONFIG_NO_IDLE_HZ
extern void set_interrupt_max_skip(unsigned long max_skip);
/* idle_reprogram_interrupt calls reprogram_interrupt calls current_intsource->arch_reprogram()
* do we really need the first step? */
extern void idle_reprogram_interrupt(void);
/* return number of ticks skipped, potentially for accounting purposes? */
extern unsigned long reprogram_interrupt(void);

extern struct interrupt_source * __init arch_select_interrupt_source(void);
extern void __init dyn_tick_init(void); /* calls select_interrupt_source(), verifies source is usable, then calls interrupt_source_register() */

static inline int dyn_tick_enabled(void)
{
return (current_intsource->state & DYN_TICK_ENABLED);
}

#else /* CONFIG_NO_IDLE_HZ */
static inline void set_interrupt_max_skip(unsigned long max_skip)
{
}

static inline void idle_reprogram_interrupt(void)
{
}

static inline unsigned long reprogram_interrupt(void)
{
return 0;
}

static inline void dyn_tick_init(void)
{
}

static inline int dyn_tick_enabled(void)
{
return 0;
}
#endif /* CONFIG_NO_IDLE_HZ */

/* Pick up arch specific header */
#include <asm/intsource.h>

#endif /* _DYN_TICK_TIMER_H */

- sched.c / sched.h
/* do we want these elsewhere? */
cpumask_t no_idle_hz_cpumask;

* each arch-specific file pair needs to provide:
arch_select_interrupt_source();
appropriate struct interrupt_source definitions, functions, etc.

- include/asm-i386/intsource.h
with defines in arch/i386/intsource.c

- include/asm-arm/arch-omap/intsource.h
with definitions in arch/arm/mach-omap/intsource.c

- include/asm-s390/intsource.h
with definitions in arch/s390/intsource.c

- include/asm-generic/intsource.h
do I need something here?

2005-09-07 07:38:14

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

* Nishanth Aravamudan <[email protected]> [050905 20:02]:
> On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > * Srivatsa Vaddagiri <[email protected]> [050905 10:03]:
> > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > >
> > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > timer interrupt frequency continuously (e.g., at the end of
> > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > but it's mostly code churn :)
> > >
> > > Yes, the name 'dynamic-tick' is misleading!
> >
> > Huh? For most people dynamic-tick is much more descriptive name than
> > NO_IDLE_HZ or VST!
>
> I understand this. My point is that the structures are *not*
> dynamic-tick specific. They are interrupt source specific, generally
> (also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
> users of the interrupt source reprogramming functions, but not the
> reprogrammers themselves, in my mind. Also, it still would be confusing
> to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)

I see what you mean, it's a confusing naming issue currently :) Would
the following solution work for you:

- Dynamic tick is the structure you register with, and then you use it
for any kind of non-continuous timer tinkering

- This structure has at least two possible users, NO_IDLE_HZ and
sub-jiffie timers

So we could have following config options:

CONFIG_DYNTICK
CONFIG_NO_IDLE_HZ depends on dyntick
CONFIG_SUBJIFFIE_TIMER depends on dyntick

> > If you wanted, you could reprogram the next timer to happen from
> > {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.
>
> I messed with this with my soft-timer rework (which has since has fallen
> by the wayside). It is a bit of overhead, especially del_timer(), but
> it's possible. This is what I would consider "dynamic-tick." And I would
> setup a *different* .config option to enable it. Perhaps depending on
> CONFIG_NO_IDLE_HZ.

Yes, I agree it should be a different .config option. Maybe the example
above would work for that?

> > And you would want to do that if you wanted sub-jiffie timer
> > interrupts.
>
> Yes, true, it does enable that. Well, to be honest, it completely
> redefines (in some sense) the jiffy, as it is potentially continuously
> changing, not just at idle times.

Yeah. But should still work as we already accept interrupts at any point
inbetween jiffies to update time, and update the system time from a
second continuously running timer :)

> > So I'd rather not limit the name to the currently implemented
> > functionality only :)
>
> I'm not trying to limit the name, but make sure we are tying the
> strcutures and functions to the right abstraction (interrupt source, in
> my opinion).

But other devices are interrupt sources too... And really the only use
for this stuct is non-continuous timer stuff, right?

Tony

2005-09-07 08:13:14

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

* Nishanth Aravamudan <[email protected]> [050906 23:55]:

...

> Sigh, later than I had hoped, but here is what I have hashed out so far.
> Does it seem like a step in the right direction? Rather hand-wavy, but I
> think it's mostly correct ;)

Some comments below.

> - include/linux/intsource.h
> with definitions in kernel/intsource.c
>
> #define DYN_TICK_ENABLED (1 << 1)
> #define DYN_TICK_SUITABLE (1 << 0)
>
> #define DYN_TICK_MIN_SKIP 2
>
> /* Abstraction of an interrupt source
> * @state: current state
> * @max_skip: current maximum number of ticks to skip
> * @arch_init: initialization routine
> * @arch_enable_dyn_tick: called via sysfs to enable interrupt skipping
> * @arch_disable_dyn_tick: called via sysfs to disable interrupt
> * skipping
> * @arch_set_all_cpus_idle: last cpu to go idle calls this, which should
> * disable any timesource (e.g. PIT on x86)
> * @arch_recover_time: handler for returning from skipped ticks and keeping
> * time consistent
> */
> struct interrupt_source {
> unsigned int state;
> unsigned long max_skip;
> int (*arch_init) (void);
> void (*arch_enable_dyn_tick) (void);
> void (*arch_disable_dyn_tick) (void);
> unsigned long (*arch_reprogram) (unsigned long); /* return number of ticks skipped */
> unsigned long (*arch_recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> /* following empty in UP */
> void (*arch_set_all_cpus_idle) (int);
> spinlock_t lock;
> };

I would still call the struct dyntick, have CONFIG_DYNTICK, and then have
CONFIG_NO_IDLE_HZ and possibly CONFIG_SUBJIFFIE_TIMER register to use it
like I said in my earlier mail. Would that solve the issues you have
with the naming?

> /* return number of ticks skipped, potentially for accounting purposes? */
> extern unsigned long reprogram_interrupt(void);

The number of ticks skipped can be potentially used in idle loops to
select which ACPI C state to go to depending on the estimated length of
sleep.

Regards,

Tony

2005-09-07 15:00:47

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 07.09.2005 [11:13:04 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <[email protected]> [050906 23:55]:
>
> ...
>
> > Sigh, later than I had hoped, but here is what I have hashed out so far.
> > Does it seem like a step in the right direction? Rather hand-wavy, but I
> > think it's mostly correct ;)
>
> Some comments below.

Thanks, Tony!

> > - include/linux/intsource.h
> > with definitions in kernel/intsource.c
> >
> > #define DYN_TICK_ENABLED (1 << 1)
> > #define DYN_TICK_SUITABLE (1 << 0)
> >
> > #define DYN_TICK_MIN_SKIP 2
> >
> > /* Abstraction of an interrupt source
> > * @state: current state
> > * @max_skip: current maximum number of ticks to skip
> > * @arch_init: initialization routine
> > * @arch_enable_dyn_tick: called via sysfs to enable interrupt skipping
> > * @arch_disable_dyn_tick: called via sysfs to disable interrupt
> > * skipping
> > * @arch_set_all_cpus_idle: last cpu to go idle calls this, which should
> > * disable any timesource (e.g. PIT on x86)
> > * @arch_recover_time: handler for returning from skipped ticks and keeping
> > * time consistent
> > */
> > struct interrupt_source {
> > unsigned int state;
> > unsigned long max_skip;
> > int (*arch_init) (void);
> > void (*arch_enable_dyn_tick) (void);
> > void (*arch_disable_dyn_tick) (void);
> > unsigned long (*arch_reprogram) (unsigned long); /* return number of ticks skipped */
> > unsigned long (*arch_recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > /* following empty in UP */
> > void (*arch_set_all_cpus_idle) (int);
> > spinlock_t lock;
> > };
>
> I would still call the struct dyntick, have CONFIG_DYNTICK, and then have
> CONFIG_NO_IDLE_HZ and possibly CONFIG_SUBJIFFIE_TIMER register to use it
> like I said in my earlier mail. Would that solve the issues you have
> with the naming?

I'll respond more fully there, but I think it might. If that's the case,
though, I think I'll just push all of the code down into timer.c and
timer.h, no need for a separate file, really. I'll mull it over, see
what the others think as well...

> > /* return number of ticks skipped, potentially for accounting purposes? */
> > extern unsigned long reprogram_interrupt(void);
>
> The number of ticks skipped can be potentially used in idle loops to
> select which ACPI C state to go to depending on the estimated length of
> sleep.

Ah true!

Thanks,
Nish

2005-09-07 15:07:24

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <[email protected]> [050905 20:02]:
> > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > * Srivatsa Vaddagiri <[email protected]> [050905 10:03]:
> > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > >
> > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > but it's mostly code churn :)
> > > >
> > > > Yes, the name 'dynamic-tick' is misleading!
> > >
> > > Huh? For most people dynamic-tick is much more descriptive name than
> > > NO_IDLE_HZ or VST!
> >
> > I understand this. My point is that the structures are *not*
> > dynamic-tick specific. They are interrupt source specific, generally
> > (also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
> > users of the interrupt source reprogramming functions, but not the
> > reprogrammers themselves, in my mind. Also, it still would be confusing
> > to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)
>
> I see what you mean, it's a confusing naming issue currently :) Would
> the following solution work for you:
>
> - Dynamic tick is the structure you register with, and then you use it
> for any kind of non-continuous timer tinkering
>
> - This structure has at least two possible users, NO_IDLE_HZ and
> sub-jiffie timers
>
> So we could have following config options:
>
> CONFIG_DYNTICK
> CONFIG_NO_IDLE_HZ depends on dyntick
> CONFIG_SUBJIFFIE_TIMER depends on dyntick

Hrm, yes, first you are right with the dependency ordering. I take it
CONFIG_DYNTICK is simply there as NO_IDLE_HZ and SUBJIFFIE_TIMER are
independent users of the same underlying infrastructure.

> > > If you wanted, you could reprogram the next timer to happen from
> > > {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.
> >
> > I messed with this with my soft-timer rework (which has since has fallen
> > by the wayside). It is a bit of overhead, especially del_timer(), but
> > it's possible. This is what I would consider "dynamic-tick." And I would
> > setup a *different* .config option to enable it. Perhaps depending on
> > CONFIG_NO_IDLE_HZ.
>
> Yes, I agree it should be a different .config option. Maybe the example
> above would work for that?

Yes, I'm thinking it might.

> > > And you would want to do that if you wanted sub-jiffie timer
> > > interrupts.
> >
> > Yes, true, it does enable that. Well, to be honest, it completely
> > redefines (in some sense) the jiffy, as it is potentially continuously
> > changing, not just at idle times.
>
> Yeah. But should still work as we already accept interrupts at any point
> inbetween jiffies to update time, and update the system time from a
> second continuously running timer :)

The problem with subjiffie timers is that the precision of soft-timers
is jiffies currently. It requires some serious effort to modify the
soft-timer subsystem to be aware of the extra bits it needs,
efficiently -- take a look at what HRT has had to do.

> > > So I'd rather not limit the name to the currently implemented
> > > functionality only :)
> >
> > I'm not trying to limit the name, but make sure we are tying the
> > strcutures and functions to the right abstraction (interrupt source, in
> > my opinion).
>
> But other devices are interrupt sources too... And really the only use
> for this stuct is non-continuous timer stuff, right?

Would "tick_source" be better? I guess you are right, that there is only
this one consumer... Although if that is the case, then maybe a separate
.h/.c file is the right way to go, to isolate the code, reduce
#ifdeffery in timer.h/.c.

Thanks,
Nish

2005-09-07 15:54:07

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 07.09.2005 [11:13:04 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <[email protected]> [050906 23:55]:
>
> ...
>
> > Sigh, later than I had hoped, but here is what I have hashed out so far.
> > Does it seem like a step in the right direction? Rather hand-wavy, but I
> > think it's mostly correct ;)
>
> Some comments below.

Updated document below.

Thanks,
Nish

- include/linux/timer.h
with definitions in kernel/timer.c

OR better in
- include/linux/ticksource.h
with definitions in kernel/ticksource.c?

#define DYN_TICK_ENABLED (1 << 1)
#define DYN_TICK_SUITABLE (1 << 0)

#define DYN_TICK_MIN_SKIP 2

/* Abstraction of a tick source
* @state: current state
* @max_skip: current maximum number of ticks to skip
* @init: initialization routine
* @enable_dyn_tick: called via sysfs to enable interrupt skipping
* @disable_dyn_tick: called via sysfs to disable interrupt
* skipping
* @set_all_cpus_idle: last cpu to go idle calls this, which should
* disable any timesource (e.g. PIT on x86)
* @recover_time: handler for returning from skipped ticks and keeping
* time consistent
*/
struct tick_source {
unsigned int state;
unsigned long max_skip;
int (*init) (void);
void (*enable_dyn_tick) (void);
void (*disable_dyn_tick) (void);
unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
/* following empty in UP */
void (*set_all_cpus_idle) (int);
spinlock_t lock;
};

extern void tick_source_register(struct tick_source *new_tick_source);
extern struct tick_source *current_ticksource;

#ifdef CONFIG_NO_IDLE_HZ /* which means CONFIG_DYNTICK is also on */
extern void set_tick_max_skip(unsigned long max_skip);
/* idle_reprogram_tick calls reprogram_tick calls current_ticksource->reprogram()
* do we really need the first step? */
extern void idle_reprogram_tick(void);
/* return number of ticks skipped, potentially for accounting purposes? */
extern unsigned long reprogram_tick(void);

extern struct tick_source * __init arch_select_tick_source(void);
extern void __init dyn_tick_init(void); /* calls select_tick_source(), verifies source is usable, then calls tick_source_register() */

static inline int dyn_tick_enabled(void)
{
return (current_ticksource->state & DYN_TICK_ENABLED);
}

#else /* CONFIG_NO_IDLE_HZ */
static inline void set_tick_max_skip(unsigned long max_skip)
{
}

static inline void idle_reprogram_tick(void)
{
}

static inline unsigned long reprogram_tick(void)
{
return 0;
}

static inline void dyn_tick_init(void)
{
}

static inline int dyn_tick_enabled(void)
{
return 0;
}
#endif /* CONFIG_NO_IDLE_HZ */

/* Pick up arch specific header */
#include <asm/timer.h> /* or <asm/ticksource.h>, depending */

- sched.c / sched.h
/* do we want these elsewhere? */
cpumask_t no_idle_hz_cpumask;

- each arch-specific file pair needs to provide:
arch_select_tick_source();
appropriate struct tick_source definitions, functions, etc.

- include/asm-i386/timer.h /* or ticksource.h */
with defines in arch/i386/timer.c /* or ticksource.c */

- include/asm-arm/arch-omap/timer.h /* or ticksource.h */
with definitions in arch/arm/mach-omap/timer.c /* or ticksource.c */

- include/asm-s390/timer.h /* or ticksource.h */
with definitions in arch/s390/timer.c /* or ticksource.c */

2005-09-07 16:08:47

by Bill Davidsen

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
>
>>Admittedly, I don't think SMP ARM has been around all that long? Maybe
>>the existing code just has not been extended.
>
>
> Yeah, maybe ARM never cared for SMP. But we do care :)
>
>
>>I'm not sure on this. It's going to be NULL for other architectures, or
>>end up being called by the reprogram() call for the last CPU to go idle,
>>right (presuming there isn't a separate TOD source, like in x86). I
>>think it is better to be in the reprogram() interface.
>
>
> Non-x86 could have it set to NULL, in which case it doesn't get called.
> (I know the current code does not take care of this situation).
> But having an explicit 'all_cpus_idle' interface may be good, since
> Tony talked of idling some devices when all CPUs are idle. So it
> probably has non-x86/PIT uses too.

If this is intended to reduce power, and it originally came from that
root, then this is the time to put in a hook for transitions to<=>from
the all-idle state. Various arch may have things other than the PIT
which should (or at least can) be stopped, and which need to be restarted.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2005-09-07 16:42:32

by Nish Aravamudan

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 9/7/05, Bill Davidsen <[email protected]> wrote:
> Srivatsa Vaddagiri wrote:
> > On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
> >
> >>Admittedly, I don't think SMP ARM has been around all that long? Maybe
> >>the existing code just has not been extended.
> >
> >
> > Yeah, maybe ARM never cared for SMP. But we do care :)
> >
> >
> >>I'm not sure on this. It's going to be NULL for other architectures, or
> >>end up being called by the reprogram() call for the last CPU to go idle,
> >>right (presuming there isn't a separate TOD source, like in x86). I
> >>think it is better to be in the reprogram() interface.
> >
> >
> > Non-x86 could have it set to NULL, in which case it doesn't get called.
> > (I know the current code does not take care of this situation).
> > But having an explicit 'all_cpus_idle' interface may be good, since
> > Tony talked of idling some devices when all CPUs are idle. So it
> > probably has non-x86/PIT uses too.
>
> If this is intended to reduce power, and it originally came from that
> root, then this is the time to put in a hook for transitions to<=>from
> the all-idle state. Various arch may have things other than the PIT
> which should (or at least can) be stopped, and which need to be restarted.

Hrm, got dropped from the Cc... :) Yes, the dynamic-tick generic
infrastructure being proposed, with the idle CPU mask and the
set_all_cpus_idle() tick_source hook, would allow exactly this in
arch-specific code.

Is there a generic location where the all-idle state is entered?
Currently, I think we can do it via the generic reprogram() routine
checking the mask and then calling set_all_cpus_idle(), if
appropriate, after reprogramming the last idle CPU.

Thanks,
Nish

2005-09-07 17:07:33

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Wed, Sep 07, 2005 at 08:53:52AM -0700, Nishanth Aravamudan wrote:
>
> - include/linux/timer.h
> with definitions in kernel/timer.c
>
> OR better in
> - include/linux/ticksource.h
> with definitions in kernel/ticksource.c?

>
> #define DYN_TICK_ENABLED (1 << 1)
> #define DYN_TICK_SUITABLE (1 << 0)
>
> #define DYN_TICK_MIN_SKIP 2
>
> /* Abstraction of a tick source

I think "tick source" probably doesn't bring out the fact that we are
also dealing with more than tick source here (@recover_time and
@set_all_cpus_idle). From this perspective, I feel that the original
'dyn_tick_timer' name itself was better (atleast it captures the
'dynamic' nature of ticks).

> * @state: current state
> * @max_skip: current maximum number of ticks to skip
> * @init: initialization routine
> * @enable_dyn_tick: called via sysfs to enable interrupt skipping
> * @disable_dyn_tick: called via sysfs to disable interrupt
> * skipping
> * @set_all_cpus_idle: last cpu to go idle calls this, which should
> * disable any timesource (e.g. PIT on x86)
> * @recover_time: handler for returning from skipped ticks and keeping
> * time consistent
> */
> struct tick_source {
> unsigned int state;
> unsigned long max_skip;
> int (*init) (void);
> void (*enable_dyn_tick) (void);
> void (*disable_dyn_tick) (void);
> unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */

How will it be able to return the number of ticks skipped? Or are you referring
to max_skip here?

> unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> /* following empty in UP */
> void (*set_all_cpus_idle) (int);

Does 'set' in 'set_all_cpus_idle' signify anything?

> spinlock_t lock;

I think the 'lock' fits in nicely here.

> };
>
> extern void tick_source_register(struct tick_source *new_tick_source);

I tend to prefer the original interface - dyn_tick_register - itself (since
as I said it captures the dynamic nature of the timer).

> extern struct tick_source *current_ticksource;

In x86-like architectures, there can be multiple ticksources that can
be simultaneously active - ex: APIC and PIT. So one "current_ticksource"
doesnt capture that fact?

>
> #ifdef CONFIG_NO_IDLE_HZ /* which means CONFIG_DYNTICK is also on */
> extern void set_tick_max_skip(unsigned long max_skip);
> /* idle_reprogram_tick calls reprogram_tick calls current_ticksource->reprogram()
> * do we really need the first step? */

If 'idle_reprogram_tick' is the equivalent of 'idle_reprogram_timer' that is
in the latest patch published by Con, then I think we can avoid the first step
yes.

> extern void idle_reprogram_tick(void);
> /* return number of ticks skipped, potentially for accounting purposes? */
> extern unsigned long reprogram_tick(void);
>
> extern struct tick_source * __init arch_select_tick_source(void);
> extern void __init dyn_tick_init(void); /* calls select_tick_source(), verifies source is usable, then calls tick_source_register() */

I presume dyn_tick_init will be in arch-independent code. In that case, how
does it "verify" that the source is usable? Seems like we need arch-hooks
for this as well?

> static inline int dyn_tick_enabled(void)
> {
> return (current_ticksource->state & DYN_TICK_ENABLED);
> }
>
> #else /* CONFIG_NO_IDLE_HZ */
> static inline void set_tick_max_skip(unsigned long max_skip)
> {
> }
>
> static inline void idle_reprogram_tick(void)
> {
> }
>
> static inline unsigned long reprogram_tick(void)
> {
> return 0;
> }
>
> static inline void dyn_tick_init(void)
> {
> }
>
> static inline int dyn_tick_enabled(void)
> {
> return 0;
> }
> #endif /* CONFIG_NO_IDLE_HZ */
>
> /* Pick up arch specific header */
> #include <asm/timer.h> /* or <asm/ticksource.h>, depending */
>
> - sched.c / sched.h
> /* do we want these elsewhere? */
> cpumask_t no_idle_hz_cpumask;

Could be moved to timer.h/c?

> - each arch-specific file pair needs to provide:
> arch_select_tick_source();
> appropriate struct tick_source definitions, functions, etc.
>
> - include/asm-i386/timer.h /* or ticksource.h */
> with defines in arch/i386/timer.c /* or ticksource.c */
>
> - include/asm-arm/arch-omap/timer.h /* or ticksource.h */
> with definitions in arch/arm/mach-omap/timer.c /* or ticksource.c */
>
> - include/asm-s390/timer.h /* or ticksource.h */
> with definitions in arch/s390/timer.c /* or ticksource.c */

I somehow consider that we can retain what currently exists -
include/asm-i386/dyn-tick.h and arch/i386/kernel/dyn-tick.c ..
IMO current abstraction of 'dyn_tick_timer' is good enough to unify all the
ports of no-idle-hz. We probably need to just iron out the differences between
how ARM and x86 defines this.

As far as the problem of multiple interrupt sources (like APIC, PIT, HPET)
is concerned, it can be completely handled by the architecture code itself and
it appropriately sets the 'reprogram_timer' member to point to APIC, PIT or
HPET reprogramming routines. That would also avoid the
'if (cpu_has_local_apic())' kind of code that exists now.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-07 17:19:35

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Wed, Sep 07, 2005 at 09:42:24AM -0700, Nish Aravamudan wrote:
> Hrm, got dropped from the Cc... :) Yes, the dynamic-tick generic
> infrastructure being proposed, with the idle CPU mask and the
> set_all_cpus_idle() tick_source hook, would allow exactly this in
> arch-specific code.

I think Bill is referring to the "resume" interface i.e an
unset_all_cpus_idle() interface, which is missing (set/unset
probably are not good prefixes maybe?). I feel we can
add one.

> Is there a generic location where the all-idle state is entered?

Should be from the place where the last cpu is set in the bitmap
and bitmap is found equal to cpu_online_map.

> Currently, I think we can do it via the generic reprogram() routine
> checking the mask and then calling set_all_cpus_idle(), if
> appropriate, after reprogramming the last idle CPU.

So are you saying that setting of the CPU in the bitmap will be done
inside reprogram_timer routine? If we consider that reprogram_timer can
directly point to a routine in a interrupt source file (like apic.c/timer_pit.c)
I dont think that it is the right place to set bits in the nohz_cpu_mask.
It can be done by the callee of reprogram_timer itself.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-07 17:23:19

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 07.09.2005 [22:37:03 +0530], Srivatsa Vaddagiri wrote:
> On Wed, Sep 07, 2005 at 08:53:52AM -0700, Nishanth Aravamudan wrote:
> >
> > - include/linux/timer.h
> > with definitions in kernel/timer.c
> >
> > OR better in
> > - include/linux/ticksource.h
> > with definitions in kernel/ticksource.c?
>
> >
> > #define DYN_TICK_ENABLED (1 << 1)
> > #define DYN_TICK_SUITABLE (1 << 0)
> >
> > #define DYN_TICK_MIN_SKIP 2

Another point. Why is this 2? I guess if you're going to make it 2, why
bother defining/checking it at all?

> > /* Abstraction of a tick source
>
> I think "tick source" probably doesn't bring out the fact that we are
> also dealing with more than tick source here (@recover_time and
> @set_all_cpus_idle). From this perspective, I feel that the original
> 'dyn_tick_timer' name itself was better (atleast it captures the
> 'dynamic' nature of ticks).

Yes, maybe you are right. dyn_tick_source, maybe?

> > * @state: current state
> > * @max_skip: current maximum number of ticks to skip
> > * @init: initialization routine
> > * @enable_dyn_tick: called via sysfs to enable interrupt skipping
> > * @disable_dyn_tick: called via sysfs to disable interrupt
> > * skipping
> > * @set_all_cpus_idle: last cpu to go idle calls this, which should
> > * disable any timesource (e.g. PIT on x86)
> > * @recover_time: handler for returning from skipped ticks and keeping
> > * time consistent
> > */
> > struct tick_source {
> > unsigned int state;
> > unsigned long max_skip;
> > int (*init) (void);
> > void (*enable_dyn_tick) (void);
> > void (*disable_dyn_tick) (void);
> > unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
>
> How will it be able to return the number of ticks skipped? Or are you
> referring to max_skip here?

Yes, maybe this can be a void function... I was thinking more along the
lines of you can send whatever request you want to reprogram(), it does
what it can with the request (cuts it short if too long, ignores it if
too short) and then returns what it actually did.

> > unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > /* following empty in UP */
> > void (*set_all_cpus_idle) (int);
>
> Does 'set' in 'set_all_cpus_idle' signify anything?

It is the callback for the architecture when the last CPU goes locally
idle, so then system-wide idle code can be placed here (disable PIT on
x86). The name maybe wrong, though, you are right.

> > spinlock_t lock;
>
> I think the 'lock' fits in nicely here.
>
> > };
> >
> > extern void tick_source_register(struct tick_source *new_tick_source);
>
> I tend to prefer the original interface - dyn_tick_register - itself (since
> as I said it captures the dynamic nature of the timer).
>
> > extern struct tick_source *current_ticksource;
>
> In x86-like architectures, there can be multiple ticksources that can
> be simultaneously active - ex: APIC and PIT. So one
> "current_ticksource" doesnt capture that fact?

Not really, though, right? Only one is registered to do the timer
callbacks? So, for x86, if you use the PIT ticksource, you only need to
be PIT aware, but if you use the APIC ticksource, then it needs to be
aware of the APIC and PIT (I believe you mentioned they are tied to each
other), but that's ticksource-specific. CMIIW, though, please.

> > #ifdef CONFIG_NO_IDLE_HZ /* which means CONFIG_DYNTICK is also on */
> > extern void set_tick_max_skip(unsigned long max_skip);
> > /* idle_reprogram_tick calls reprogram_tick calls current_ticksource->reprogram()
> > * do we really need the first step? */
>
> If 'idle_reprogram_tick' is the equivalent of 'idle_reprogram_timer' that is
> in the latest patch published by Con, then I think we can avoid the first step
> yes.

Yes, exactly.

> > extern void idle_reprogram_tick(void);
> > /* return number of ticks skipped, potentially for accounting purposes? */
> > extern unsigned long reprogram_tick(void);
> >
> > extern struct tick_source * __init arch_select_tick_source(void);
> > extern void __init dyn_tick_init(void); /* calls select_tick_source(), verifies source is usable, then calls tick_source_register() */
>
> I presume dyn_tick_init will be in arch-independent code. In that
> case, how does it "verify" that the source is usable? Seems like we
> need arch-hooks for this as well?

hrm, you are right. So then maybe the "usable" checks should be pushed
into arch_select_tick_source()?

<snip>

> > - sched.c / sched.h
> > /* do we want these elsewhere? */
> > cpumask_t no_idle_hz_cpumask;
>
> Could be moved to timer.h/c?

Yes, that's what I was thinking. I just wasn't sure if there was a
specific reason for keeping them in sched.h/.c.

> > - each arch-specific file pair needs to provide:
> > arch_select_tick_source();
> > appropriate struct tick_source definitions, functions, etc.
> >
> > - include/asm-i386/timer.h /* or ticksource.h */
> > with defines in arch/i386/timer.c /* or ticksource.c */
> >
> > - include/asm-arm/arch-omap/timer.h /* or ticksource.h */
> > with definitions in arch/arm/mach-omap/timer.c /* or ticksource.c */
> >
> > - include/asm-s390/timer.h /* or ticksource.h */
> > with definitions in arch/s390/timer.c /* or ticksource.c */
>
> I somehow consider that we can retain what currently exists -
> include/asm-i386/dyn-tick.h and arch/i386/kernel/dyn-tick.c .. IMO
> current abstraction of 'dyn_tick_timer' is good enough to unify all
> the ports of no-idle-hz. We probably need to just iron out the
> differences between how ARM and x86 defines this.

Maybe you are right. I don't like having a separate struct for the
state, though, and the dyn_tick_timer struct doesn't have a
recover_time() style member. If you look closely, my structure is
basically exactly what the x86 work has, just some different names
(don't need arch_ prefix, for instance, because it's clearly
dyn_tick_timer specific, etc.) I also would like to hear from the s390
folks about their issues/opinions.

> As far as the problem of multiple interrupt sources (like APIC, PIT,
> HPET) is concerned, it can be completely handled by the architecture
> code itself and it appropriately sets the 'reprogram_timer' member to
> point to APIC, PIT or HPET reprogramming routines. That would also
> avoid the 'if (cpu_has_local_apic())' kind of code that exists now.

Yes, true. I'm wondering, do we need to make the
current_ticksource/current_dyn_tick_timer per-CPU? I am just wondering
how to gracefully handle the SMP case. Or is that not a problem?

Thanks,
Nish

2005-09-07 17:27:45

by Nish Aravamudan

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 9/7/05, Srivatsa Vaddagiri <[email protected]> wrote:
> On Wed, Sep 07, 2005 at 09:42:24AM -0700, Nish Aravamudan wrote:
> > Hrm, got dropped from the Cc... :) Yes, the dynamic-tick generic
> > infrastructure being proposed, with the idle CPU mask and the
> > set_all_cpus_idle() tick_source hook, would allow exactly this in
> > arch-specific code.
>
> I think Bill is referring to the "resume" interface i.e an
> unset_all_cpus_idle() interface, which is missing (set/unset
> probably are not good prefixes maybe?). I feel we can
> add one.

Yes, can be added.

enter_all_cpus_idle() and exit_all_cpus_idle() would be better?

> > Is there a generic location where the all-idle state is entered?
>
> Should be from the place where the last cpu is set in the bitmap
> and bitmap is found equal to cpu_online_map.

Yes, this is what I said.

> > Currently, I think we can do it via the generic reprogram() routine
> > checking the mask and then calling set_all_cpus_idle(), if
> > appropriate, after reprogramming the last idle CPU.
>
> So are you saying that setting of the CPU in the bitmap will be done
> inside reprogram_timer routine? If we consider that reprogram_timer can
> directly point to a routine in a interrupt source file (like apic.c/timer_pit.c)
> I dont think that it is the right place to set bits in the nohz_cpu_mask.
> It can be done by the callee of reprogram_timer itself.

No, I was saying what you were, if a little unclearly, so the caller
does something like:

current_dyn_tick_timer->reprogram();
check_cpu_mask(nohz_cpu_mask);
if (we_are_last_idle)
enter_all_cpus_idle();

Thanks,
Nish

2005-09-07 18:15:21

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Wed, Sep 07, 2005 at 10:23:15AM -0700, Nishanth Aravamudan wrote:
> > > #define DYN_TICK_MIN_SKIP 2
>
> Another point. Why is this 2? I guess if you're going to make it 2, why
> bother defining/checking it at all?

I think that should be arch-specific.

> > > void (*disable_dyn_tick) (void);
> > > unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> >
> > How will it be able to return the number of ticks skipped? Or are you
> > referring to max_skip here?
>
> Yes, maybe this can be a void function... I was thinking more along the
> lines of you can send whatever request you want to reprogram(), it does
> what it can with the request (cuts it short if too long, ignores it if
> too short) and then returns what it actually did.

Looks fine in that case to have a non-void return.

> > In x86-like architectures, there can be multiple ticksources that can
> > be simultaneously active - ex: APIC and PIT. So one
> > "current_ticksource" doesnt capture that fact?
>
> Not really, though, right? Only one is registered to do the timer
> callbacks?

True.

> So, for x86, if you use the PIT ticksource, you only need to
> be PIT aware, but if you use the APIC ticksource, then it needs to be
> aware of the APIC and PIT (I believe you mentioned they are tied to each
> other), but that's ticksource-specific. CMIIW, though, please.

I was going more by what meaning 'current_ticksource' may give - from
a pure "ticksource" perspective, both (PIT/APIC) are tick sources!
Thats why current_ticksource may not be a good term.

> Maybe you are right. I don't like having a separate struct for the
> state, though, and the dyn_tick_timer struct doesn't have a
> recover_time() style member. If you look closely, my structure is

I agree we can remove the separate struct for state and have
recover_time member. Although in x86, it may have to be a wrapper
around mark_offset() since mark_offset does not recover time
completely (it expect the callee to recover one remaining tick).

> basically exactly what the x86 work has, just some different names
> (don't need arch_ prefix, for instance, because it's clearly
> dyn_tick_timer specific, etc.) I also would like to hear from the s390
> folks about their issues/opinions.

Martin Schwidefsky (whom I have CC'ed) may be the person who can comment on
behalf of s390.

> Yes, true. I'm wondering, do we need to make the
> current_ticksource/current_dyn_tick_timer per-CPU? I am just wondering
> how to gracefully handle the SMP case. Or is that not a problem?

I don't see that current_ticksource/current_dyn_tick_timer to be write-heavy.
In fact I see them to be initialied during bootup and after that mostly
read-only. That may not warrant a per-CPU structure.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-07 18:18:53

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Wed, Sep 07, 2005 at 10:27:43AM -0700, Nish Aravamudan wrote:
> enter_all_cpus_idle() and exit_all_cpus_idle() would be better?

Looks perfect.

> No, I was saying what you were, if a little unclearly, so the caller
> does something like:
>
> current_dyn_tick_timer->reprogram();
> check_cpu_mask(nohz_cpu_mask);
> if (we_are_last_idle)
> enter_all_cpus_idle();

Looks fine!

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-07 18:22:53

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 07.09.2005 [23:44:45 +0530], Srivatsa Vaddagiri wrote:
> On Wed, Sep 07, 2005 at 10:23:15AM -0700, Nishanth Aravamudan wrote:
> > > > #define DYN_TICK_MIN_SKIP 2
> >
> > Another point. Why is this 2? I guess if you're going to make it 2, why
> > bother defining/checking it at all?
>
> I think that should be arch-specific.

Yes, I agree, will perhaps add a min_skip member to the structure.

> > > > void (*disable_dyn_tick) (void);
> > > > unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> > >
> > > How will it be able to return the number of ticks skipped? Or are you
> > > referring to max_skip here?
> >
> > Yes, maybe this can be a void function... I was thinking more along the
> > lines of you can send whatever request you want to reprogram(), it does
> > what it can with the request (cuts it short if too long, ignores it if
> > too short) and then returns what it actually did.
>
> Looks fine in that case to have a non-void return.

Ok, I agree.

> > > In x86-like architectures, there can be multiple ticksources that can
> > > be simultaneously active - ex: APIC and PIT. So one
> > > "current_ticksource" doesnt capture that fact?
> >
> > Not really, though, right? Only one is registered to do the timer
> > callbacks?
>
> True.

Thanks for confirming, I wasn't sure if that was the case or not.

> > So, for x86, if you use the PIT ticksource, you only need to
> > be PIT aware, but if you use the APIC ticksource, then it needs to be
> > aware of the APIC and PIT (I believe you mentioned they are tied to each
> > other), but that's ticksource-specific. CMIIW, though, please.
>
> I was going more by what meaning 'current_ticksource' may give - from
> a pure "ticksource" perspective, both (PIT/APIC) are tick sources!
> Thats why current_ticksource may not be a good term.

Ah, true. maybe current_dyn_tick_source or something to that effect?
Because there should only be one dyn_tick_timer, yes?

> > Maybe you are right. I don't like having a separate struct for the
> > state, though, and the dyn_tick_timer struct doesn't have a
> > recover_time() style member. If you look closely, my structure is
>
> I agree we can remove the separate struct for state and have
> recover_time member. Although in x86, it may have to be a wrapper
> around mark_offset() since mark_offset does not recover time
> completely (it expect the callee to recover one remaining tick).

Yes, certainly, in fact, I think some of these functions may provide
impetus to eventually clean up other code.

> > basically exactly what the x86 work has, just some different names
> > (don't need arch_ prefix, for instance, because it's clearly
> > dyn_tick_timer specific, etc.) I also would like to hear from the s390
> > folks about their issues/opinions.
>
> Martin Schwidefsky (whom I have CC'ed) may be the person who can comment on
> behalf of s390.

Thanks, I forwarded him the first plan I submitted a few days ago, but
didn't add him to the Cc.

> > Yes, true. I'm wondering, do we need to make the
> > current_ticksource/current_dyn_tick_timer per-CPU? I am just wondering
> > how to gracefully handle the SMP case. Or is that not a problem?
>
> I don't see that current_ticksource/current_dyn_tick_timer to be write-heavy.
> In fact I see them to be initialied during bootup and after that mostly
> read-only. That may not warrant a per-CPU structure.

I meant more for making sure we can manage that one CPU may have access
to the PIT, but others may not (CPU0)? More along the lines of diverse
h/w setups where perhaps the HPET is tied to one chip, but not the
other. So, actually different hardware per-cpu. If that can't be the
case (at least not currently), then the nohz_mask and spin_lock is
enough to guarantee we don't muck with cpus accidentally.

Thanks,
Nish

2005-09-07 18:33:58

by Nish Aravamudan

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 9/7/05, Srivatsa Vaddagiri <[email protected]> wrote:
> On Wed, Sep 07, 2005 at 10:27:43AM -0700, Nish Aravamudan wrote:
> > enter_all_cpus_idle() and exit_all_cpus_idle() would be better?
>
> Looks perfect.
>
> > No, I was saying what you were, if a little unclearly, so the caller
> > does something like:
> >
> > current_dyn_tick_timer->reprogram();
> > check_cpu_mask(nohz_cpu_mask);
> > if (we_are_last_idle)
> > enter_all_cpus_idle();
>
> Looks fine!

Great!

Thanks,
Nish

2005-09-08 10:01:04

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

* Nishanth Aravamudan <[email protected]> [050907 18:07]:
> On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> > * Nishanth Aravamudan <[email protected]> [050905 20:02]:
> > > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > > * Srivatsa Vaddagiri <[email protected]> [050905 10:03]:
> > > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > > >
> > > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > > but it's mostly code churn :)
> > > > >
> > > > > Yes, the name 'dynamic-tick' is misleading!
> > > >
> > > > Huh? For most people dynamic-tick is much more descriptive name than
> > > > NO_IDLE_HZ or VST!
> > >
> > > I understand this. My point is that the structures are *not*
> > > dynamic-tick specific. They are interrupt source specific, generally
> > > (also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
> > > users of the interrupt source reprogramming functions, but not the
> > > reprogrammers themselves, in my mind. Also, it still would be confusing
> > > to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)
> >
> > I see what you mean, it's a confusing naming issue currently :) Would
> > the following solution work for you:
> >
> > - Dynamic tick is the structure you register with, and then you use it
> > for any kind of non-continuous timer tinkering
> >
> > - This structure has at least two possible users, NO_IDLE_HZ and
> > sub-jiffie timers
> >
> > So we could have following config options:
> >
> > CONFIG_DYNTICK
> > CONFIG_NO_IDLE_HZ depends on dyntick
> > CONFIG_SUBJIFFIE_TIMER depends on dyntick
>
> Hrm, yes, first you are right with the dependency ordering. I take it
> CONFIG_DYNTICK is simply there as NO_IDLE_HZ and SUBJIFFIE_TIMER are
> independent users of the same underlying infrastructure.

Cool, I'm glad we got the dependencies figured out now rather than later :)

> > > > If you wanted, you could reprogram the next timer to happen from
> > > > {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.
> > >
> > > I messed with this with my soft-timer rework (which has since has fallen
> > > by the wayside). It is a bit of overhead, especially del_timer(), but
> > > it's possible. This is what I would consider "dynamic-tick." And I would
> > > setup a *different* .config option to enable it. Perhaps depending on
> > > CONFIG_NO_IDLE_HZ.
> >
> > Yes, I agree it should be a different .config option. Maybe the example
> > above would work for that?
>
> Yes, I'm thinking it might.
>
> > > > And you would want to do that if you wanted sub-jiffie timer
> > > > interrupts.
> > >
> > > Yes, true, it does enable that. Well, to be honest, it completely
> > > redefines (in some sense) the jiffy, as it is potentially continuously
> > > changing, not just at idle times.
> >
> > Yeah. But should still work as we already accept interrupts at any point
> > inbetween jiffies to update time, and update the system time from a
> > second continuously running timer :)
>
> The problem with subjiffie timers is that the precision of soft-timers
> is jiffies currently. It requires some serious effort to modify the
> soft-timer subsystem to be aware of the extra bits it needs,
> efficiently -- take a look at what HRT has had to do.

Yes, we should coordinate that with HRT. BTW, we can reduce the overhead
of del_timer() by _not_ calling next_timer_interrupt(), and programming
the next timer interrupt to happen where next jiffie would be. Then once
we get to the idle, we call next_timer_interrupt()...

> > > > So I'd rather not limit the name to the currently implemented
> > > > functionality only :)
> > >
> > > I'm not trying to limit the name, but make sure we are tying the
> > > strcutures and functions to the right abstraction (interrupt source, in
> > > my opinion).
> >
> > But other devices are interrupt sources too... And really the only use
> > for this stuct is non-continuous timer stuff, right?
>
> Would "tick_source" be better? I guess you are right, that there is only
> this one consumer... Although if that is the case, then maybe a separate
> .h/.c file is the right way to go, to isolate the code, reduce
> #ifdeffery in timer.h/.c.

Hmmm, seems like dyntick.[ch] is still the best name for it...

Tony

2005-09-08 21:22:44

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 08.09.2005 [13:00:36 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <[email protected]> [050907 18:07]:
> > On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> > > * Nishanth Aravamudan <[email protected]> [050905 20:02]:
> > > > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > > > * Srivatsa Vaddagiri <[email protected]> [050905 10:03]:
> > > > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > > > >
> > > > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > > > but it's mostly code churn :)
> > > > > >
> > > > > > Yes, the name 'dynamic-tick' is misleading!
> > > > >
> > > > > Huh? For most people dynamic-tick is much more descriptive name than
> > > > > NO_IDLE_HZ or VST!
> > > >
> > > > I understand this. My point is that the structures are *not*
> > > > dynamic-tick specific. They are interrupt source specific, generally
> > > > (also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
> > > > users of the interrupt source reprogramming functions, but not the
> > > > reprogrammers themselves, in my mind. Also, it still would be confusing
> > > > to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)
> > >
> > > I see what you mean, it's a confusing naming issue currently :) Would
> > > the following solution work for you:
> > >
> > > - Dynamic tick is the structure you register with, and then you use it
> > > for any kind of non-continuous timer tinkering
> > >
> > > - This structure has at least two possible users, NO_IDLE_HZ and
> > > sub-jiffie timers
> > >
> > > So we could have following config options:
> > >
> > > CONFIG_DYNTICK
> > > CONFIG_NO_IDLE_HZ depends on dyntick
> > > CONFIG_SUBJIFFIE_TIMER depends on dyntick
> >
> > Hrm, yes, first you are right with the dependency ordering. I take it
> > CONFIG_DYNTICK is simply there as NO_IDLE_HZ and SUBJIFFIE_TIMER are
> > independent users of the same underlying infrastructure.
>
> Cool, I'm glad we got the dependencies figured out now rather than later :)

Yup, I think that makes the most sense. I appreciate your help with it!

> > > > > If you wanted, you could reprogram the next timer to happen from
> > > > > {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.
> > > >
> > > > I messed with this with my soft-timer rework (which has since has fallen
> > > > by the wayside). It is a bit of overhead, especially del_timer(), but
> > > > it's possible. This is what I would consider "dynamic-tick." And I would
> > > > setup a *different* .config option to enable it. Perhaps depending on
> > > > CONFIG_NO_IDLE_HZ.
> > >
> > > Yes, I agree it should be a different .config option. Maybe the example
> > > above would work for that?
> >
> > Yes, I'm thinking it might.
> >
> > > > > And you would want to do that if you wanted sub-jiffie timer
> > > > > interrupts.
> > > >
> > > > Yes, true, it does enable that. Well, to be honest, it completely
> > > > redefines (in some sense) the jiffy, as it is potentially continuously
> > > > changing, not just at idle times.
> > >
> > > Yeah. But should still work as we already accept interrupts at any point
> > > inbetween jiffies to update time, and update the system time from a
> > > second continuously running timer :)
> >
> > The problem with subjiffie timers is that the precision of soft-timers
> > is jiffies currently. It requires some serious effort to modify the
> > soft-timer subsystem to be aware of the extra bits it needs,
> > efficiently -- take a look at what HRT has had to do.
>
> Yes, we should coordinate that with HRT. BTW, we can reduce the overhead
> of del_timer() by _not_ calling next_timer_interrupt(), and programming
> the next timer interrupt to happen where next jiffie would be. Then once
> we get to the idle, we call next_timer_interrupt()...

Yes, I agree with the del_timer() changes.

> > > > > So I'd rather not limit the name to the currently implemented
> > > > > functionality only :)
> > > >
> > > > I'm not trying to limit the name, but make sure we are tying the
> > > > strcutures and functions to the right abstraction (interrupt source, in
> > > > my opinion).
> > >
> > > But other devices are interrupt sources too... And really the only use
> > > for this stuct is non-continuous timer stuff, right?
> >
> > Would "tick_source" be better? I guess you are right, that there is only
> > this one consumer... Although if that is the case, then maybe a separate
> > .h/.c file is the right way to go, to isolate the code, reduce
> > #ifdeffery in timer.h/.c.
>
> Hmmm, seems like dyntick.[ch] is still the best name for it...

I guess you are right, I'll change my little summary and send it out so
it's archived.

Thanks,
Nish

2005-09-08 22:08:58

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 08.09.2005 [14:22:13 -0700], Nishanth Aravamudan wrote:
> On 08.09.2005 [13:00:36 +0300], Tony Lindgren wrote:
> > * Nishanth Aravamudan <[email protected]> [050907 18:07]:
> > > On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> > > > * Nishanth Aravamudan <[email protected]> [050905 20:02]:
> > > > > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > > > > * Srivatsa Vaddagiri <[email protected]> [050905 10:03]:
> > > > > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > > > > >
> > > > > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > > > > but it's mostly code churn :)
> > > > > > >
> > > > > > > Yes, the name 'dynamic-tick' is misleading!

<snipping much useful feedback and many constructive conversations>

So, after *all* that, I'm going back to dyntick (notice no hyphen though
:-P). Everyone ok with this doc?

Thanks,
Nish

- include/linux/dyntick.h
with definitions in kernel/dyntick.c

#define DYN_TICK_ENABLED (1 << 1)
#define DYN_TICK_SUITABLE (1 << 0)

#define DYN_TICK_MIN_SKIP 2

/* Abstraction of a dynamic tick source
* @state: current state
* @max_skip: current maximum number of jiffies to program h/w to skip
* @min_skip: current minimum number of jiffies to program h/w to skip
* @init: initialization routine
* @enable_dyn_tick: called via sysfs to enable interrupt skipping
* @disable_dyn_tick: called via sysfs to disable interrupt
* skipping
* @reprogram: actually interact with h/w, return number of ticks the
* h/w will skip
* @recover_time: handler for returning from skipped ticks and keeping
* time consistent
* @enter_all_cpus_idle: last cpu to go idle calls this, which should
* disable any timer source (e.g. PIT on x86)
* @exit_all_cpus_idle: first cpu to wake after @enter_all_cpus_idle has
* been called should use this to revert the
* effects of that function
*/
struct dyntick_timer {
unsigned int state;
unsigned long max_skip;
unsigned long min_skip;
int (*init) (void);
void (*enable_dyn_tick) (void);
void (*disable_dyn_tick) (void);
unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
/* following empty in UP */
void (*enter_all_cpus_idle) (int);
void (*exit_all_cpus_idle) (int);
spinlock_t lock;
};

extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
/* so do we need this?
Maybe it can just be static to dyntick.c and all the callable
functions will call-down to the structure members? */
extern struct dyntick_timer *current_dyntick_timer;

#ifdef CONFIG_NO_IDLE_HZ /* which means CONFIG_DYNTICK is also on */
extern void set_dyntick_max_skip(unsigned long max_skip);
extern void set_dyntick_min_skip(unsigned long min_skip);
/* return number of ticks skipped, as we can request any number
called from cpu_idle() in dyntick-enabled arch's */
extern unsigned long reprogram_dyntick(void);

extern struct tick_source * __init arch_select_tick_source(void);
/* calls select_tick_source(), then calls tick_source_register() */
extern void __init dyn_tick_init(void);

static inline int dyn_tick_enabled(void)
{
return (current_ticksource->state & DYN_TICK_ENABLED);
}

#else /* CONFIG_NO_IDLE_HZ */
static inline void set_tick_max_skip(unsigned long max_skip)
{
}

static inline void idle_reprogram_tick(void)
{
}

static inline unsigned long reprogram_tick(void)
{
return 0;
}

static inline void dyn_tick_init(void)
{
}

static inline int dyn_tick_enabled(void)
{
return 0;
}
#endif /* CONFIG_NO_IDLE_HZ */

/* Pick up arch specific header */
#include <asm/dyntick.h>

- timer.c / timer.h
/* moved from sched.c/.h */
cpumask_t no_idle_hz_cpumask;

- each arch-specific file pair needs to provide:
arch_select_tick_source();
an appropriate struct tick_source definitions, functions, etc. per
usable h/w

- include/asm-i386/dyntick.h
with defines in arch/i386/dyntick.c
/* basically already done */

- include/asm-arm/arch-omap/dyntick.h
with definitions in arch/arm/mach-omap/dyntick.c

- include/asm-s390/dyntick.h
with definitions in arch/s390/dyntick.c

2005-09-09 16:12:36

by Bill Davidsen

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

Srivatsa Vaddagiri wrote:

>On Wed, Sep 07, 2005 at 09:42:24AM -0700, Nish Aravamudan wrote:
>
>
>>Hrm, got dropped from the Cc... :) Yes, the dynamic-tick generic
>>infrastructure being proposed, with the idle CPU mask and the
>>set_all_cpus_idle() tick_source hook, would allow exactly this in
>>arch-specific code.
>>
>>
>
>I think Bill is referring to the "resume" interface i.e an
>unset_all_cpus_idle() interface, which is missing (set/unset
>probably are not good prefixes maybe?). I feel we can
>add one.
>
Exactly what I had in mind. If there are hooks for all_idle transitions
then architectures can hang whatever makes sense there. That hopefully
would result in readable code for both power reduction (laptop) and for
the strange things that embedded systems sometimes do.

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

2005-09-09 22:30:55

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 08.09.2005 [15:08:54 -0700], Nishanth Aravamudan wrote:
> On 08.09.2005 [14:22:13 -0700], Nishanth Aravamudan wrote:
> > On 08.09.2005 [13:00:36 +0300], Tony Lindgren wrote:
> > > * Nishanth Aravamudan <[email protected]> [050907 18:07]:
> > > > On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> > > > > * Nishanth Aravamudan <[email protected]> [050905 20:02]:
> > > > > > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > > > > > * Srivatsa Vaddagiri <[email protected]> [050905 10:03]:
> > > > > > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > > > > > >
> > > > > > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > > > > > but it's mostly code churn :)
> > > > > > > >
> > > > > > > > Yes, the name 'dynamic-tick' is misleading!
>
> <snipping much useful feedback and many constructive conversations>
>
> So, after *all* that, I'm going back to dyntick (notice no hyphen though
> :-P). Everyone ok with this doc?
>
> Thanks,
> Nish
>
>
> - include/linux/dyntick.h
> with definitions in kernel/dyntick.c
>
> #define DYN_TICK_ENABLED (1 << 1)
> #define DYN_TICK_SUITABLE (1 << 0)
>
> #define DYN_TICK_MIN_SKIP 2
>
> /* Abstraction of a dynamic tick source
> * @state: current state
> * @max_skip: current maximum number of jiffies to program h/w to skip
> * @min_skip: current minimum number of jiffies to program h/w to skip
> * @init: initialization routine
> * @enable_dyn_tick: called via sysfs to enable interrupt skipping
> * @disable_dyn_tick: called via sysfs to disable interrupt
> * skipping
> * @reprogram: actually interact with h/w, return number of ticks the
> * h/w will skip
> * @recover_time: handler for returning from skipped ticks and keeping
> * time consistent
> * @enter_all_cpus_idle: last cpu to go idle calls this, which should
> * disable any timer source (e.g. PIT on x86)
> * @exit_all_cpus_idle: first cpu to wake after @enter_all_cpus_idle has
> * been called should use this to revert the
> * effects of that function
> */
> struct dyntick_timer {

Ah ha! I think I figured out what my problem was with the naming (I
*just* can't get my head around it). As we added the cpus_idle() hooks,
recover_time() and other non-per-interrupt-source related functionality,
I think it might be best to name this structure:

struct dyntick_control;

indicating it's dynamic tick basis, but that it's used to control the
subsystem.

I think that makes it clear, and keeps it clear why we have
non-"timer" stuff in there. It's also much clearer to me now why we
*don't* need it per-CPU, as we're relying on one set of callbacks for a
subsystem, and not for each CPU's hardware (works out to be the same,
but makes more sense to me that way).

Does that make more sense?

Thanks,
Nish

2005-09-20 11:07:29

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

Nish,
I did some study of how s390 and ARM are architected and have
some comments as a result of that.

On Thu, Sep 08, 2005 at 03:08:54PM -0700, Nishanth Aravamudan wrote:
> struct dyntick_timer {
> unsigned int state;
> unsigned long max_skip;
> unsigned long min_skip;
> int (*init) (void);
> void (*enable_dyn_tick) (void);
> void (*disable_dyn_tick) (void);
> unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> /* following empty in UP */
> void (*enter_all_cpus_idle) (int);
> void (*exit_all_cpus_idle) (int);
> spinlock_t lock;
> };

The usage of 'lock' probably needs to be made clear. I intended it to be used
for mainly serializing enter/exit_all_cpus_idle routines.

Considering that not all architectures have such routines, then the use
of spinlock can be entirely within the arch code. Maybe the 'enter' routine
is invoked as part of 'reprogram' routine (when the last CPU goes down)
and 'exit' routine is invoked as part of dyn_tick_interrupt() (when
coming out of all_idle_state), both being serialized using the spinlock?

Another interesting point is that I expected recover_time to be
invoked only as part of 'exit_all_cpus_idle', but s390 seems to
have unconditional call to account_ticks (for recovering time) on
any CPU that wakes up. I guess it will be a no-op if other CPUs
were active.

We probably also need to document how 'reprogram' will be invoked
- with xtime_lock held or not. Again s390 does not seem to require it
while ARM is using one. I think we should let the arch code take
xtime_lock if they deem it necessary.

> extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
> /* so do we need this?
> Maybe it can just be static to dyntick.c and all the callable
> functions will call-down to the structure members? */
> extern struct dyntick_timer *current_dyntick_timer;

I don't think this can be static - since the low-level arch-code
will need access to, for example, 'recover_time'/'handler'
and 'enter/exit_all_cpus_idle' routines?

> extern struct tick_source * __init arch_select_tick_source(void);
> /* calls select_tick_source(), then calls tick_source_register() */
> extern void __init dyn_tick_init(void);

Hmm ..I think just tick_source_register is sufficient ..we can do
let the arch-code select what tick source it wants and call
register with the selected source ..

>From a point of getting this reviewed by arch-maintainers, I think it will
help if a new version of this interface is posted and point out how the
existing s390/ARM interfaces will be affected. I could help out if you are busy.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-20 14:58:58

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 20.09.2005 [16:36:54 +0530], Srivatsa Vaddagiri wrote:
> Nish,
> I did some study of how s390 and ARM are architected and have
> some comments as a result of that.
>
> On Thu, Sep 08, 2005 at 03:08:54PM -0700, Nishanth Aravamudan wrote:
> > struct dyntick_timer {
> > unsigned int state;
> > unsigned long max_skip;
> > unsigned long min_skip;
> > int (*init) (void);
> > void (*enable_dyn_tick) (void);
> > void (*disable_dyn_tick) (void);
> > unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> > unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > /* following empty in UP */
> > void (*enter_all_cpus_idle) (int);
> > void (*exit_all_cpus_idle) (int);
> > spinlock_t lock;
> > };
>
> The usage of 'lock' probably needs to be made clear. I intended it to
> be used for mainly serializing enter/exit_all_cpus_idle routines.

Yes, now that we are somewhat settled on the structure, I will add
comments to the structure in dyn-tick.h and send out patches. Sorry,
I've been swamped with other tasks lately...

> Considering that not all architectures have such routines, then the
> use of spinlock can be entirely within the arch code. Maybe the
> 'enter' routine is invoked as part of 'reprogram' routine (when the
> last CPU goes down) and 'exit' routine is invoked as part of
> dyn_tick_interrupt() (when coming out of all_idle_state), both being
> serialized using the spinlock?

I think I suggested this at one point or another :)

The usage I envisioned is

dyntick_timer_reprogram():

current_dyntick_timer->reprogram();
if (cpus_full(noidlehz_mask))
current_dyntick_timer->enter_all_cpus_idle(); // which will lock
// with
// current_dyntick_timer->lock,
// if necessary

dyn_tick_interrupt():

if (cpus_full(noidlehz_mask)) {
cpu_unset(cpu, noidlehz_mask);
current_dyntick_timer->exit_all_cpus_idle(); // which will lock
// with
// current_dyntick_timer->lock,
// if necessary

> Another interesting point is that I expected recover_time to be
> invoked only as part of 'exit_all_cpus_idle', but s390 seems to
> have unconditional call to account_ticks (for recovering time) on
> any CPU that wakes up. I guess it will be a no-op if other CPUs
> were active.

Maybe that is just paranoia? In theory, if nothing has happened, then
accounting should not need to do anything; but I'm not sure, I'll add
this to my "code to look at" list ;)

> We probably also need to document how 'reprogram' will be invoked
> - with xtime_lock held or not. Again s390 does not seem to require it
> while ARM is using one. I think we should let the arch code take
> xtime_lock if they deem it necessary.

That seems buggy. I'm guessing they need the xtime_lock there just as
much as ARM and x86 will. In fact, I'm pretty sure all archs will need
it. But I'm fine with leaving it to the arch for now, and then unifying
the locking later, if we find that all archs call seq_lock(xtime_lock).

> > extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
> > /* so do we need this?
> > Maybe it can just be static to dyntick.c and all the callable
> > functions will call-down to the structure members? */
> > extern struct dyntick_timer *current_dyntick_timer;
>
> I don't think this can be static - since the low-level arch-code
> will need access to, for example, 'recover_time'/'handler'
> and 'enter/exit_all_cpus_idle' routines?

Ah yes, you are probably right...

> > extern struct tick_source * __init arch_select_tick_source(void);
> > /* calls select_tick_source(), then calls tick_source_register() */
> > extern void __init dyn_tick_init(void);
>
> Hmm ..I think just tick_source_register is sufficient ..we can do let
> the arch-code select what tick source it wants and call register with
> the selected source ..

Ok, that is fine by me.

> From a point of getting this reviewed by arch-maintainers, I think it
> will help if a new version of this interface is posted and point out
> how the existing s390/ARM interfaces will be affected. I could help
> out if you are busy.

That would be great. I will try to get your changes merged into what I
already have pending for dyn-tick.h to make sure everyone is still in
agreement.

I think for x86, it's mostly assigning the members of the structure and
perhaps renaming some functions for the PIT/APIC, etc. Same for s390 as
there is only the one timer source. Ideally, the same will hold for ARM,
but it may require some validation.

Thanks again,
Nish

2005-09-22 13:38:08

by Martin Schwidefsky

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Tue, 2005-09-20 at 07:58 -0700, Nishanth Aravamudan wrote:

Hi folks,
finally found some time to catch up on the dynticks discussion. Quite
lengthy already..

> > On Thu, Sep 08, 2005 at 03:08:54PM -0700, Nishanth Aravamudan wrote:
> > > struct dyntick_timer {
> > > unsigned int state;
> > > unsigned long max_skip;
> > > unsigned long min_skip;
> > > int (*init) (void);
> > > void (*enable_dyn_tick) (void);
> > > void (*disable_dyn_tick) (void);
> > > unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> > > unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > > /* following empty in UP */
> > > void (*enter_all_cpus_idle) (int);
> > > void (*exit_all_cpus_idle) (int);
> > > spinlock_t lock;
> > > };
> >
> > The usage of 'lock' probably needs to be made clear. I intended it to
> > be used for mainly serializing enter/exit_all_cpus_idle routines.
>
> Yes, now that we are somewhat settled on the structure, I will add
> comments to the structure in dyn-tick.h and send out patches. Sorry,
> I've been swamped with other tasks lately...

As I first saw the dyntick_timer structure I thought: "why does it have
to be that complicated?". Must be because of the requirements for
high-res-timers, since it's overkill for no-idle-hz.

> > Considering that not all architectures have such routines, then the
> > use of spinlock can be entirely within the arch code. Maybe the
> > 'enter' routine is invoked as part of 'reprogram' routine (when the
> > last CPU goes down) and 'exit' routine is invoked as part of
> > dyn_tick_interrupt() (when coming out of all_idle_state), both being
> > serialized using the spinlock?
>
> I think I suggested this at one point or another :)
>
> The usage I envisioned is
>
> dyntick_timer_reprogram():
>
> current_dyntick_timer->reprogram();
> if (cpus_full(noidlehz_mask))
> current_dyntick_timer->enter_all_cpus_idle(); // which will lock
> // with
> // current_dyntick_timer->lock,
> // if necessary
>
> dyn_tick_interrupt():
>
> if (cpus_full(noidlehz_mask)) {
> cpu_unset(cpu, noidlehz_mask);
> current_dyntick_timer->exit_all_cpus_idle(); // which will lock
> // with
> // current_dyntick_timer->lock,
> // if necessary

I would really like to see how all the fields from the dyntick_timer
structure are supposed to be used. Especially the who-calls-what graph,
if I got it right then the low-level arch code calls common code
functions which in turn call functions from the dyntick_timer structure.
The question is what should be the connecting points between the arch
code and the common timer code? With the current code its
* do_timer()
* update_process_times()
* next_timer_event()
and the non-trivial interactions with rcu via
* test/set/clear bit on nohz_cpu_mask
* rcu_pending() and friends.

> > Another interesting point is that I expected recover_time to be
> > invoked only as part of 'exit_all_cpus_idle', but s390 seems to
> > have unconditional call to account_ticks (for recovering time) on
> > any CPU that wakes up. I guess it will be a no-op if other CPUs
> > were active.
>
> Maybe that is just paranoia? In theory, if nothing has happened, then
> accounting should not need to do anything; but I'm not sure, I'll add
> this to my "code to look at" list ;)

After a cpu woke up some code need to check if a tick has passed or not.
For s390 this is done in the start_hz_timer function called from the
idle notifier. Even if "nothing" has happened it's not just paranoia,
account_ticks sets up the clock comparator for the next tick, updates
xtime if it is necessary and account_user_vtime()/update_process_times()
for cpu time accounting. It is the exact same function that is used for
the regular tick interrupts, its just called from a different context.

The reason that xtime and process ticks are done in a single function is
that s390 doesn't have the distinction between wall-clock timer and
local APIC timer. Each s390 cpu has something called the clock
comparator that each cpu can set up individually. Whenever the value of
the time-of-day (TOD) clock is bigger then the clock comparator the cpu
gets an external interrupt if the cpu is enabled for the interrupt. This
timer interrupt source is used for both the xtime updates and the
process ticks. So there won't be any exit_all_cpus_idle special handling
for s390.

> > We probably also need to document how 'reprogram' will be invoked
> > - with xtime_lock held or not. Again s390 does not seem to require it
> > while ARM is using one. I think we should let the arch code take
> > xtime_lock if they deem it necessary.
>
> That seems buggy. I'm guessing they need the xtime_lock there just as
> much as ARM and x86 will. In fact, I'm pretty sure all archs will need
> it. But I'm fine with leaving it to the arch for now, and then unifying
> the locking later, if we find that all archs call seq_lock(xtime_lock).

Why do you need the xtime_lock to reprogram the clock comparator (=local
APIC timer) for the next timer interrupt? Neither xtime nor jiffies are
needed to reprogram the timer.

> > > extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
> > > /* so do we need this?
> > > Maybe it can just be static to dyntick.c and all the callable
> > > functions will call-down to the structure members? */
> > > extern struct dyntick_timer *current_dyntick_timer;
> >
> > I don't think this can be static - since the low-level arch-code
> > will need access to, for example, 'recover_time'/'handler'
> > and 'enter/exit_all_cpus_idle' routines?
>
> Ah yes, you are probably right...

Again the question who-calls-what.

> > > extern struct tick_source * __init arch_select_tick_source(void);
> > > /* calls select_tick_source(), then calls tick_source_register() */
> > > extern void __init dyn_tick_init(void);
> >
> > Hmm ..I think just tick_source_register is sufficient ..we can do let
> > the arch-code select what tick source it wants and call register with
> > the selected source ..
>
> Ok, that is fine by me.
>
> > From a point of getting this reviewed by arch-maintainers, I think it
> > will help if a new version of this interface is posted and point out
> > how the existing s390/ARM interfaces will be affected. I could help
> > out if you are busy.
>
> That would be great. I will try to get your changes merged into what I
> already have pending for dyn-tick.h to make sure everyone is still in
> agreement.

Yes, that would be good. And I will happily review and test the changes
for s390.

--
blue skies,
Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

2005-09-22 14:52:25

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On 22.09.2005 [15:38:10 +0200], Martin Schwidefsky wrote:
> On Tue, 2005-09-20 at 07:58 -0700, Nishanth Aravamudan wrote:
>
> Hi folks,
> finally found some time to catch up on the dynticks discussion. Quite
> lengthy already..
>
> > > On Thu, Sep 08, 2005 at 03:08:54PM -0700, Nishanth Aravamudan wrote:
> > > > struct dyntick_timer {
> > > > unsigned int state;
> > > > unsigned long max_skip;
> > > > unsigned long min_skip;
> > > > int (*init) (void);
> > > > void (*enable_dyn_tick) (void);
> > > > void (*disable_dyn_tick) (void);
> > > > unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> > > > unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > > > /* following empty in UP */
> > > > void (*enter_all_cpus_idle) (int);
> > > > void (*exit_all_cpus_idle) (int);
> > > > spinlock_t lock;
> > > > };
> > >
> > > The usage of 'lock' probably needs to be made clear. I intended it to
> > > be used for mainly serializing enter/exit_all_cpus_idle routines.
> >
> > Yes, now that we are somewhat settled on the structure, I will add
> > comments to the structure in dyn-tick.h and send out patches. Sorry,
> > I've been swamped with other tasks lately...
>
> As I first saw the dyntick_timer structure I thought: "why does it have
> to be that complicated?". Must be because of the requirements for
> high-res-timers, since it's overkill for no-idle-hz.

It has become complicated :) HRT does not (yet) use any of this
functionality for their work; Thomas is interested in perhaps doing so
with ktimers, but that is an eventuality, not a guarantee.

Let me try and give an overview of what is in here. Keep in mind that,
for now, we are only aiming for NO_IDLE_HZ in an arch-generic, clean
fashion (not VST, therefore). And, it is certainly reasonable for
certain members to be NULL depending on the arch (like the idle entry
and exit points).

- state: contains the current state of the dynamic-tick subsystem
(enabled, etc.)
- max_skip, min_skip: the h/w may only be able to skip so far into the
future (32-bit issues, or other random bit-values), so we need to
constrain the dynamic-tick subsystem. Initialization code should set
this value appropriately (on x86 it would depend on what interrupt
source is eventually selected, e.g.). min_skip exists because it might
make sense not to go through the hassle of reprogramming when the time
necessary to reprogram the interrupt is about the same as how long we
would be disabling the interrupt for.
- init(): arch-dependent initialization routine
- {enable,disable}_dyn_tick(): sysfs hooks so that userspace can disable
the dynamic-tick subsystem (useful for debugging, mostly?)
- reprogram(): actually interact with whatever arch-dependent underlying
h/w controls the timer interrupt frequency.
- recover_time(): since we skip ticks, we *must* recover time. Now, one
option is simply to hook into the normal interrupt path, and I am
talking with John about the best approach for that (e.g., just have an
interrupt happen, and *in theory* the time/timer subsystem should be
capable of recovering from "missed" or "lost" ticks), but for now we
have a manual callback to catch up.
- {enter,exit}_all_cpus_idle(): some architectures (ahem, x86) may need
to do additional work, if *all* CPUs go idle (disable the PIT, e.g.).
This may also be a place for eventual PM code to catch the all going
idle event and put the processor(s) into a sleep/low-power state.
- lock: guarantee atomicity of the enter/exit idle routines. we only
want one CPU to be the "last" CPU going idle :)

> > > Considering that not all architectures have such routines, then the
> > > use of spinlock can be entirely within the arch code. Maybe the
> > > 'enter' routine is invoked as part of 'reprogram' routine (when the
> > > last CPU goes down) and 'exit' routine is invoked as part of
> > > dyn_tick_interrupt() (when coming out of all_idle_state), both being
> > > serialized using the spinlock?
> >
> > I think I suggested this at one point or another :)
> >
> > The usage I envisioned is
> >
> > dyntick_timer_reprogram():
> >
> > current_dyntick_timer->reprogram();
> > if (cpus_full(noidlehz_mask))
> > current_dyntick_timer->enter_all_cpus_idle(); // which will lock
> > // with
> > // current_dyntick_timer->lock,
> > // if necessary
> >
> > dyn_tick_interrupt():
> >
> > if (cpus_full(noidlehz_mask)) {
> > cpu_unset(cpu, noidlehz_mask);
> > current_dyntick_timer->exit_all_cpus_idle(); // which will lock
> > // with
> > // current_dyntick_timer->lock,
> > // if necessary
>
> I would really like to see how all the fields from the dyntick_timer
> structure are supposed to be used. Especially the who-calls-what graph,
> if I got it right then the low-level arch code calls common code
> functions which in turn call functions from the dyntick_timer structure.

Yes, I will try and write this up for you and everyone else. In theory,
since we are only dealing with NO_IDLE_HZ right now, what should happen
is the following (very roughly):

during boot, we of course will initialize the subsystem, which will call
current_dyntick_timer->init().

in cpu_idle() [only?], the arch-dependent idle routine, we call
reprogram_dyntick(), which checks if tick skipping is enabled, finds out
when next_timer_interrupt() thinks the next timer is going to expire is,
and then calls into current_dyntick_timer->reprogram() with the delta
value. We also would check here to see if we are the last CPU to go idle
(see above) and hook into current_dyntick_timer->enter_all_cpus_idle()
if so.

Then, on the interrupt, we would see if we are the first CPU coming back
from idle, and hook into current_dyntick_timer->exit_all_cpus_idle() if
so. We then call current_dyntick_timer->recover_time() to make sure our
time subsystem catches up after missing X ticks. Then we execute the
common code-path for the timer interrupt.

> The question is what should be the connecting points between the arch
> code and the common timer code? With the current code its
> * do_timer()
> * update_process_times()
> * next_timer_event()

So I guess our interaction points would include those three and
cpu_idle(). By next_timer_event() did you mean next_timer_interrupt()?

> and the non-trivial interactions with rcu via
> * test/set/clear bit on nohz_cpu_mask
> * rcu_pending() and friends.

Yes, these are definitely the more complicated interactions.

> > > Another interesting point is that I expected recover_time to be
> > > invoked only as part of 'exit_all_cpus_idle', but s390 seems to
> > > have unconditional call to account_ticks (for recovering time) on
> > > any CPU that wakes up. I guess it will be a no-op if other CPUs
> > > were active.
> >
> > Maybe that is just paranoia? In theory, if nothing has happened, then
> > accounting should not need to do anything; but I'm not sure, I'll add
> > this to my "code to look at" list ;)
>
> After a cpu woke up some code need to check if a tick has passed or not.
> For s390 this is done in the start_hz_timer function called from the
> idle notifier. Even if "nothing" has happened it's not just paranoia,
> account_ticks sets up the clock comparator for the next tick, updates
> xtime if it is necessary and account_user_vtime()/update_process_times()
> for cpu time accounting. It is the exact same function that is used for
> the regular tick interrupts, its just called from a different context.

You are right, and in fact, I mis-wrote when I replied to Vatsa. I think
this is how all archs should treat recover_time(). But doesn't the timer
interrupt already do much of this? I mean, if we were to allow the next
interrupt to occur as usual (maybe forced to be called from our common
dyntick_interrupt()), should time not get caught up that way?

> The reason that xtime and process ticks are done in a single function is
> that s390 doesn't have the distinction between wall-clock timer and
> local APIC timer. Each s390 cpu has something called the clock
> comparator that each cpu can set up individually. Whenever the value of
> the time-of-day (TOD) clock is bigger then the clock comparator the cpu
> gets an external interrupt if the cpu is enabled for the interrupt. This
> timer interrupt source is used for both the xtime updates and the
> process ticks. So there won't be any exit_all_cpus_idle special handling
> for s390.

Ah yes, that's why we are trying to figure out what needs to go into
arch code and what doesn't ;)

> > > We probably also need to document how 'reprogram' will be invoked
> > > - with xtime_lock held or not. Again s390 does not seem to require it
> > > while ARM is using one. I think we should let the arch code take
> > > xtime_lock if they deem it necessary.
> >
> > That seems buggy. I'm guessing they need the xtime_lock there just as
> > much as ARM and x86 will. In fact, I'm pretty sure all archs will need
> > it. But I'm fine with leaving it to the arch for now, and then unifying
> > the locking later, if we find that all archs call seq_lock(xtime_lock).
>
> Why do you need the xtime_lock to reprogram the clock comparator (=local
> APIC timer) for the next timer interrupt? Neither xtime nor jiffies are
> needed to reprogram the timer.

Hrm, you might be right. I might have mis-typed again. At the point
where we are ready to hook into the underlying arch-dependent
current_dyntick_timer->reprogram() routine, we should know the delta
according to next_timer_interrupt(), so we should not need to query
jiffies or xtime anymore...

> > > > extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
> > > > /* so do we need this?
> > > > Maybe it can just be static to dyntick.c and all the callable
> > > > functions will call-down to the structure members? */
> > > > extern struct dyntick_timer *current_dyntick_timer;
> > >
> > > I don't think this can be static - since the low-level arch-code
> > > will need access to, for example, 'recover_time'/'handler'
> > > and 'enter/exit_all_cpus_idle' routines?
> >
> > Ah yes, you are probably right...
>
> Again the question who-calls-what.

This is what I get for replying on little sleep. Vatsa, what arch-code
should need access to current_dyntick_timer->recover_time() or the
current_dyntick_timer->{enter,exit}_all_cpus_idle() routines? Ah...maybe
the problem is I removed the generic dyntick_interrupt()? If we have
that function again, we cann call current_dyntick_timer->recover_time()
from there as well as current_dyntick_timer->exit_all_cpus_idle().
current_dyntick_timer->enter_all_cpus_idle() should only need to be
called from reprogram_dyntick().

> > > > extern struct tick_source * __init arch_select_tick_source(void);
> > > > /* calls select_tick_source(), then calls tick_source_register() */
> > > > extern void __init dyn_tick_init(void);
> > >
> > > Hmm ..I think just tick_source_register is sufficient ..we can do let
> > > the arch-code select what tick source it wants and call register with
> > > the selected source ..
> >
> > Ok, that is fine by me.
> >
> > > From a point of getting this reviewed by arch-maintainers, I think it
> > > will help if a new version of this interface is posted and point out
> > > how the existing s390/ARM interfaces will be affected. I could help
> > > out if you are busy.
> >
> > That would be great. I will try to get your changes merged into what I
> > already have pending for dyn-tick.h to make sure everyone is still in
> > agreement.
>
> Yes, that would be good. And I will happily review and test the changes
> for s390.

Great! That's good news. Thank you for reviewing what we've gotten
cobbled together so far.

Thanks,
Nish

2005-09-22 18:33:05

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Thu, Sep 22, 2005 at 07:52:22AM -0700, Nishanth Aravamudan wrote:

> - recover_time(): since we skip ticks, we *must* recover time. Now, one
> option is simply to hook into the normal interrupt path, and I am
> talking with John about the best approach for that (e.g., just have an
> interrupt happen, and *in theory* the time/timer subsystem should be
> capable of recovering from "missed" or "lost" ticks), but for now we
> have a manual callback to catch up.

It's not just xtime but jiffies also needs to be recovered. My understanding
of John's TOD is that it keeps up time and not jiffies. In which case
we still need to recover jiffies?

Also, I think "recover_time" on s390 may do more than recovering time (as
per Martin's note). So we should probably change the name - "handler" as in
ARM :)?

> > > Considering that not all architectures have such routines, then the
> > > use of spinlock can be entirely within the arch code. Maybe the

I eat my words. I tried embedding the spinlock entirely inside the arch-code
and find it non-trivial and very racy, as illustrated below:

[CPU0 going down]

> > current_dyntick_timer->reprogram();
> > if (cpus_full(noidlehz_mask))

Lets say that this check was true (becasue CPU0 and CPU1 both
were idle). But before we grab the spinlock, CPU1 wakes up
(in the code below), also notices that all_were_idle,
gets spinlock and calls exit_all_cpus_idle. After CPU1 releases
the spinlock, CPU0 now gets the spinlock and calls
enter_all_cpus_idle, which is _wrong_.

> > current_dyntick_timer->enter_all_cpus_idle(); // which will lock
> > // with
> > // current_dyntick_timer->lock,
> > // if necessary

[CPU1 coming up]

> >
> > dyn_tick_interrupt():
> >
> > if (cpus_full(noidlehz_mask)) {
> > cpu_unset(cpu, noidlehz_mask);
> > current_dyntick_timer->exit_all_cpus_idle(); // which will lock
> > // with
> > // current_dyntick_timer->lock,
> > // if necessary
>

This is lot simpler if we use the lock to set/clear the bitmap also.
Question is : what overhead will that introduce on other arches
(like s390/arm) where it not really required (since they will not
have enter/exit_all_idle routines). We could define some macro,
dyn_tick_lock()?, which evaluates to no-op on those arches, but
becomes a spinlock on x86-kind arch'es, something like:

static inline void dyn_tick_lock(void)
{
spin_lock(&dyn_tick->lock);
}

for x86, and

static inline void dyn_tick_lock(void)
{
}

for s390/ARM.

> > I would really like to see how all the fields from the dyntick_timer
> > structure are supposed to be used. Especially the who-calls-what graph,
> > if I got it right then the low-level arch code calls common code
> > functions which in turn call functions from the dyntick_timer structure.

> > After a cpu woke up some code need to check if a tick has passed or not.
> > For s390 this is done in the start_hz_timer function called from the
> > idle notifier. Even if "nothing" has happened it's not just paranoia,
> > account_ticks sets up the clock comparator for the next tick, updates
> > xtime if it is necessary and account_user_vtime()/update_process_times()
> > for cpu time accounting. It is the exact same function that is used for
> > the regular tick interrupts, its just called from a different context.
>
> You are right, and in fact, I mis-wrote when I replied to Vatsa. I think
> this is how all archs should treat recover_time(). But doesn't the timer
> interrupt already do much of this? I mean, if we were to allow the next
> interrupt to occur as usual (maybe forced to be called from our common
> dyntick_interrupt()), should time not get caught up that way?

If you are referring to calling the actual timer interrupt handler "as is" from
dyntick_interrupt(), then I feel uncomfortable about it, primary because
of some hardware interactions timer interrupt handler may have assuming that
it is invoked due to a timer interrupt.

> > > > We probably also need to document how 'reprogram' will be invoked
> > > > - with xtime_lock held or not. Again s390 does not seem to require it
> > > > while ARM is using one. I think we should let the arch code take
> > > > xtime_lock if they deem it necessary.
> > >
> > > That seems buggy. I'm guessing they need the xtime_lock there just as
> > > much as ARM and x86 will. In fact, I'm pretty sure all archs will need
> > > it. But I'm fine with leaving it to the arch for now, and then unifying
> > > the locking later, if we find that all archs call seq_lock(xtime_lock).
> >
> > Why do you need the xtime_lock to reprogram the clock comparator (=local
> > APIC timer) for the next timer interrupt? Neither xtime nor jiffies are
> > needed to reprogram the timer.
>
> Hrm, you might be right. I might have mis-typed again. At the point
> where we are ready to hook into the underlying arch-dependent
> current_dyntick_timer->reprogram() routine, we should know the delta
> according to next_timer_interrupt(), so we should not need to query
> jiffies or xtime anymore...

I feel this is a bit tricky on non-comparator based interrupt sources like
a decrementer on PPC64 or the local APIC timer.

cpu_idle()
|
| (IRQs disabled)
V
unsigned int dyn_tick_reprogram_timer(void)
{
int cpu = smp_processor_id();
unsigned int delta;

cpu_set(cpu, nohz_cpu_mask);

smp_wmb();

if (rcu_pending(cpu) || local_softirq_pending()) {
cpu_clear(cpu, nohz_cpu_mask);
return 0;
}

a. delta = next_timer_interrupt() - jiffies;

if (delta < dyn_tick->min_skip) {
cpu_clear(cpu, nohz_cpu_mask);
return 0;
}

if (delta > dyn_tick->max_skip)
delta = dyn_tick->max_skip;

b. dyn_tick->reprogram(delta);

return delta;
}

If we pass just the number of ticks to skip to 'reprogram' then clearly it
is racy with respect to a changing jiffy. For example, lets say that:

At point a) jiffies = 100, next_timer_interrupt = 105. So we pass count 5 to
reprogram.

However by the time we reach point b) jiffies changes to 101. However since
only relative number was passed, reprogram code will cause the CPU to wake up
at 106.

We could consider passing absolute value to 'reprogram' (say 105), like below:

unsigned int dyn_tick_reprogram_timer(void)
{
int cpu = smp_processor_id();
unsigned long next, delta, seq;

cpu_set(cpu, nohz_cpu_mask);

smp_wmb();

if (rcu_pending(cpu) || local_softirq_pending()) {
cpu_clear(cpu, nohz_cpu_mask);
return 0;
}

do {
read_seqbegin(&xtime_lock);

next = next_timer_interrupt();
delta = next - jiffies;

if (delta < dyn_tick->min_skip) {
cpu_clear(cpu, nohz_cpu_mask);
return 0;
}

if (delta > dyn_tick->max_skip)
next = jiffies + dyn_tick->max_skip;

} while (read_seqretry(&xtime_lock, seq));

dyn_tick->reprogram(next);

return delta;
}

Since reprogram has to convert it back to some relative number, it will need
to reference jiffy, which makes it racy and require the read_seqbegin/retry
based conversion to relative number. I feel it is lot cleaner in such
a case to just take a write_lock(&xtime_lock) for the whole of
dyn_tick_reprogram_timer.

> > Again the question who-calls-what.
>
> This is what I get for replying on little sleep. Vatsa, what arch-code
> should need access to current_dyntick_timer->recover_time() or the
> current_dyntick_timer->{enter,exit}_all_cpus_idle() routines? Ah...maybe
> the problem is I removed the generic dyntick_interrupt()? If we have
> that function again, we cann call current_dyntick_timer->recover_time()
> from there as well as current_dyntick_timer->exit_all_cpus_idle().
> current_dyntick_timer->enter_all_cpus_idle() should only need to be
> called from reprogram_dyntick().

Here's what I had in mind for the entire call-flow. This is based on the
interface attached in the mail.

At bootup:

a. dyn_tick_timer structure is initalized and registered
(dyn_tick_register). dyn_tick_register will set the
global dyn_tick pointer to the passed structure.
(See kernel/dyn-tick.c attached)

b. Somewhere down the line, arch code calls dyn_tick_enable()
to enable skipping ticks. This should be called only after
initializing various h/w resources that will be reprogrammed
when we call dyn_tick->reprogram().

Typically both a. and b. are done during device_initcall time.

c. init_dyn_tick_sysfs() creates "/sys/devices/system/dyn_tick/dyn_tick0/enable" file if dyn_tick_register has been called by know (i.e dyn_tick pointer
is non-NULL).

Entering tickless state:

a. cpu_idle calls dyn_tick_reprogram_timer() with IRQs disabled.

b. dyn_tick_reprogram_timer finds out when the next timer is and
whether is atleast min_skip away. If so, it calls
dyn_tick->reprogram(). In the interface that I have attached,
this is called with write_lock held on xtime_lock and is
passed the relative value of number of ticks to be skipped.
Also on x86-arch, it is called with dyn_tick->lock held.

c. dyn_tick->reprogram will arrange for that many ticks to be
skipped. In addition, on x86 like platforms, it can do
step d.

d. if (cpus_equal(nohz_cpu_mask, cpu_online_map))
dyn_tick->enter_all_cpus_idle();

Exiting tickless state:

a. H/w interrupt comes in. dyn_tick_interrupt() is called
as one of the first steps. dyn_tick_interrupt() is completely
defined in arch-code and does what it wants. As an example,
it can do this:

b. dyn_tick_lock(); /* spin_lock(&dyn_tick->lock) on x86 */

if (cpus_equal(nohz_cpu_mask, cpu_online_map))
all_were_idle = 1;
else
all_were_idle = 0;

cpu_clear(cpu, nohz_cpu_mask);

if (all_were_idle)
dyn_tick->exit_all_cpus_idle();

dyn_tick_unlock(); /* spin_unlock(&dyn_tick->lock) on x86 */

c. dyn_tick->recover_time();
This can recover jiffies/xtime and also setup the
next timer. On s390 this can be account_ticks() for
example. Maybe we should call this dyn_tick->handler?

d. dyn_tick_interrupt() returns so that rest of interrupt processing
can occur.

Note that a-d are completely inside arch-code.

I have attached include/linux/dyn-tick.h & kernel/dyn-tick.c as
detailed reference of the interface.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

Attachments:

(No filename) (10.32 kB)
dyn-tick.patch (7.17 kB)
Download all attachments

2005-09-23 06:56:29

[permalink] [raw]

Subject: Re: [PATCH 1/3] dynticks - implement no idle hz for x86

On Thu, Sep 22, 2005 at 03:38:10PM +0200, Martin Schwidefsky wrote:
> As I first saw the dyntick_timer structure I thought: "why does it have
> to be that complicated?". Must be because of the requirements for
> high-res-timers, since it's overkill for no-idle-hz.

I think most of the complication is because of the different needs of
various architectures. But if you think the structure can be cut down
further, that would be good to consider.

> I would really like to see how all the fields from the dyntick_timer
> structure are supposed to be used. Especially the who-calls-what graph,
> if I got it right then the low-level arch code calls common code
> functions which in turn call functions from the dyntick_timer structure.
> The question is what should be the connecting points between the arch
> code and the common timer code? With the current code its

> * do_timer()
> * update_process_times()
> * next_timer_event()
> and the non-trivial interactions with rcu via
> * test/set/clear bit on nohz_cpu_mask
> * rcu_pending() and friends.

I think with dyn-tick, next_timer_event is replaced by
dyn_tick_reprogram_timer(). We should also add add_timer_on() to the
list.

--

Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2005-09-26 15:09:28