2006-10-10 16:04:47

by Arjan van de Ven

[permalink] [raw]
Subject: [patch 0/2] Introduce round_jiffies() to save spurious wakeups

Hi,

the following 2 patches will introduce the round_jiffies() api and users
thereof.

The general idea is that by rounding the jiffies for certain timers to
the next whole second will make those timers all happen at the same
time; and thus reduce the number of times the cpu has to wake up to
service timers (this assumes a tickless kernel)

Obviously only timers where the exact time of firing isn't so important
can do this; several of the recurring "always live" timers of the kernel
are of this kind, they want "about once a second" or "about once every 4
seconds" and such, and don't really care about the exact jiffy in which
they fire.

An alternative would have been to introduce mod_timer_rounded() or
somesuch APIs (but there's many variants that take jiffies); I feel that
an explicit caller based rounding actually is quite reasonable.

Greetings,
Arjan van de Ven


2006-10-10 16:04:49

by Arjan van de Ven

[permalink] [raw]
Subject: [patch 1/2] round_jiffies infrastructure

From: Arjan van de Ven <[email protected]>
Subject: round_jiffies infrastructure

This patch introduces a round_jiffies() function as well as a
round_jiffies_relative() function. These functions round a jiffies value
to the next whole second. The primary purpose of this rounding is to
cause all "we don't care exactly when" timers to happen at the same jiffy.
This avoids multiple timers to fire within the second for no real reason;
with dynamic ticks these extra timers cause wakeups from deep sleep
CPU sleep states and thus waste power.

The exact wakeup moment is skewed by the cpu number, to avoid all
cpus from waking up at the exact same time (and hitting the same
lock/cachelines there)

Signed-off-by: Arjan van de Ven <[email protected]>



Index: linux-2.6.19-rc1-git6/include/linux/timer.h
===================================================================
--- linux-2.6.19-rc1-git6.orig/include/linux/timer.h
+++ linux-2.6.19-rc1-git6/include/linux/timer.h
@@ -98,4 +98,10 @@ extern void run_local_timers(void);
struct hrtimer;
extern int it_real_fn(struct hrtimer *);

+unsigned long __round_jiffies(unsigned long T, int CPU);
+unsigned long __round_jiffies_relative(unsigned long T, int CPU);
+unsigned long round_jiffies(unsigned long T);
+unsigned long round_jiffies_relative(unsigned long T);
+
+
#endif
Index: linux-2.6.19-rc1-git6/kernel/timer.c
===================================================================
--- linux-2.6.19-rc1-git6.orig/kernel/timer.c
+++ linux-2.6.19-rc1-git6/kernel/timer.c
@@ -80,6 +80,56 @@ tvec_base_t boot_tvec_bases;
EXPORT_SYMBOL(boot_tvec_bases);
static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = &boot_tvec_bases;

+unsigned long __round_jiffies(unsigned long T, int CPU)
+{
+ int rem;
+ int original = T;
+ rem = T % HZ;
+ if (rem < HZ/4)
+ T = T - rem;
+ else
+ T = T - rem + HZ;
+ /* we don't want all cpus firing at once hitting the same lock/memory */
+ T += CPU * 3;
+ if (T <= jiffies) /* rounding ate our timeout entirely */
+ return original;
+ return T;
+}
+EXPORT_SYMBOL_GPL(__round_jiffies);
+
+unsigned long __round_jiffies_relative(unsigned long T, int CPU)
+{
+ int rem;
+ int original = T;
+ T=T+jiffies;
+ rem = T % HZ;
+ if (rem < HZ/4)
+ T = T - rem;
+ else
+ T = T - rem + HZ;
+ /* we don't want all cpus firing at once hitting the same lock/memory */
+ T += CPU * 3;
+ T = T-jiffies;
+ if (T<=0) /* rounding ate our delay entirely, don't round */
+ return original;
+ return T;
+}
+EXPORT_SYMBOL_GPL(__round_jiffies_relative);
+
+unsigned long round_jiffies(unsigned long T)
+{
+ return __round_jiffies(T, raw_smp_processor_id());
+}
+EXPORT_SYMBOL_GPL(round_jiffies);
+
+unsigned long round_jiffies_relative(unsigned long T)
+{
+ return __round_jiffies_relative(T, raw_smp_processor_id());
+}
+EXPORT_SYMBOL_GPL(round_jiffies_relative);
+
+
+
static inline void set_running_timer(tvec_base_t *base,
struct timer_list *timer)
{

2006-10-10 16:06:44

by Arjan van de Ven

[permalink] [raw]
Subject: [patch 2/2] round_jiffies users

From: Arjan van de Ven <[email protected]>
Subject: round_jiffies users
CC: [email protected]
CC: [email protected]

This patch introduces users of the round_jiffies() function.
These timers all were of the "about once a second" or "about once every X seconds"
variety and several showed up in the "what wakes the cpu up" profiles that
the tickless patches provide.

Signed-off-by: Arjan van de Ven <[email protected]>

Index: linux-2.6.19-rc1-git6/mm/slab.c
===================================================================
--- linux-2.6.19-rc1-git6.orig/mm/slab.c
+++ linux-2.6.19-rc1-git6/mm/slab.c
@@ -926,7 +926,7 @@ static void __devinit start_cpu_timer(in
if (keventd_up() && reap_work->func == NULL) {
init_reap_node(cpu);
INIT_WORK(reap_work, cache_reap, NULL);
- schedule_delayed_work_on(cpu, reap_work, HZ + 3 * cpu);
+ schedule_delayed_work_on(cpu, reap_work, __round_jiffies_relative(HZ, cpu));
}
}

@@ -3821,7 +3821,7 @@ static void cache_reap(void *unused)
if (!mutex_trylock(&cache_chain_mutex)) {
/* Give up. Setup the next iteration. */
schedule_delayed_work(&__get_cpu_var(reap_work),
- REAPTIMEOUT_CPUC);
+ round_jiffies_relative(REAPTIMEOUT_CPUC));
return;
}

@@ -3867,7 +3867,8 @@ next:
next_reap_node();
refresh_cpu_vm_stats(smp_processor_id());
/* Set up the next iteration */
- schedule_delayed_work(&__get_cpu_var(reap_work), REAPTIMEOUT_CPUC);
+ schedule_delayed_work(&__get_cpu_var(reap_work),
+ round_jiffies_relative(REAPTIMEOUT_CPUC));
}

#ifdef CONFIG_PROC_FS
Index: linux-2.6.19-rc1-git6/fs/jbd/transaction.c
===================================================================
--- linux-2.6.19-rc1-git6.orig/fs/jbd/transaction.c
+++ linux-2.6.19-rc1-git6/fs/jbd/transaction.c
@@ -53,7 +53,7 @@ get_transaction(journal_t *journal, tran
spin_lock_init(&transaction->t_handle_lock);

/* Set up the commit timer for the new transaction. */
- journal->j_commit_timer.expires = transaction->t_expires;
+ journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
add_timer(&journal->j_commit_timer);

J_ASSERT(journal->j_running_transaction == NULL);
Index: linux-2.6.19-rc1-git6/drivers/ata/libata-scsi.c
===================================================================
--- linux-2.6.19-rc1-git6.orig/drivers/ata/libata-scsi.c
+++ linux-2.6.19-rc1-git6/drivers/ata/libata-scsi.c
@@ -3094,7 +3094,8 @@ void ata_scsi_hotplug(void *data)
for (i = 0; i < ATA_MAX_DEVICES; i++) {
struct ata_device *dev = &ap->device[i];
if (ata_dev_enabled(dev) && !dev->sdev) {
- queue_delayed_work(ata_aux_wq, &ap->hotplug_task, HZ);
+ queue_delayed_work(ata_aux_wq, &ap->hotplug_task,
+ round_jiffies_relative(HZ));
break;
}
}
Index: linux-2.6.19-rc1-git6/net/core/dst.c
===================================================================
--- linux-2.6.19-rc1-git6.orig/net/core/dst.c
+++ linux-2.6.19-rc1-git6/net/core/dst.c
@@ -99,7 +99,14 @@ static void dst_run_gc(unsigned long dum
printk("dst_total: %d/%d %ld\n",
atomic_read(&dst_total), delayed, dst_gc_timer_expires);
#endif
- mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
+ /* if the next desired timer is more than 4 seconds in the future
+ * then round the timer to whole seconds
+ */
+ if (dst_gc_timer_expires > 4*HZ)
+ mod_timer(&dst_gc_timer,
+ round_jiffies(jiffies + dst_gc_timer_expires));
+ else
+ mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);

out:
spin_unlock(&dst_lock);
Index: linux-2.6.19-rc1-git6/net/core/neighbour.c
===================================================================
--- linux-2.6.19-rc1-git6.orig/net/core/neighbour.c
+++ linux-2.6.19-rc1-git6/net/core/neighbour.c
@@ -695,7 +695,10 @@ next_elt:
if (!expire)
expire = 1;

- mod_timer(&tbl->gc_timer, now + expire);
+ if (expire>HZ)
+ mod_timer(&tbl->gc_timer, round_jiffies(now + expire));
+ else
+ mod_timer(&tbl->gc_timer, now + expire);

write_unlock(&tbl->lock);
}
Index: linux-2.6.19-rc1-git6/net/sched/sch_generic.c
===================================================================
--- linux-2.6.19-rc1-git6.orig/net/sched/sch_generic.c
+++ linux-2.6.19-rc1-git6/net/sched/sch_generic.c
@@ -209,7 +209,7 @@ static void dev_watchdog(unsigned long a
dev->name);
dev->tx_timeout(dev);
}
- if (!mod_timer(&dev->watchdog_timer, jiffies + dev->watchdog_timeo))
+ if (!mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + dev->watchdog_timeo)))
dev_hold(dev);
}
}
Index: linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.19-rc1-git6.orig/drivers/net/e1000/e1000_main.c
+++ linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c
@@ -483,7 +483,7 @@ e1000_up(struct e1000_adapter *adapter)

clear_bit(__E1000_DOWN, &adapter->flags);

- mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ);
+ mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ));
return 0;
}

@@ -2493,7 +2493,7 @@ e1000_watchdog(unsigned long data)

netif_carrier_on(netdev);
netif_wake_queue(netdev);
- mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
+ mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));
adapter->smartspeed = 0;
}
} else {
@@ -2503,7 +2503,7 @@ e1000_watchdog(unsigned long data)
DPRINTK(LINK, INFO, "NIC Link is Down\n");
netif_carrier_off(netdev);
netif_stop_queue(netdev);
- mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
+ mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));

/* 80003ES2LAN workaround--
* For packet buffer work-around on link down event;
@@ -2568,7 +2568,7 @@ e1000_watchdog(unsigned long data)
e1000_rar_set(&adapter->hw, adapter->hw.mac_addr, 0);

/* Reset the timer */
- mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ);
+ mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ));
}

#define E1000_TX_FLAGS_CSUM 0x00000001

2006-10-10 16:48:06

by Ingo Oeser

[permalink] [raw]
Subject: Re: [patch 2/2] round_jiffies users

Hi Arjan,

Arjan van de Ven wrote:
> Index: linux-2.6.19-rc1-git6/mm/slab.c
> ===================================================================
> --- linux-2.6.19-rc1-git6.orig/mm/slab.c
> +++ linux-2.6.19-rc1-git6/mm/slab.c
> @@ -926,7 +926,7 @@ static void __devinit start_cpu_timer(in
> if (keventd_up() && reap_work->func == NULL) {
> init_reap_node(cpu);
> INIT_WORK(reap_work, cache_reap, NULL);
> - schedule_delayed_work_on(cpu, reap_work, HZ + 3 * cpu);
> + schedule_delayed_work_on(cpu, reap_work, __round_jiffies_relative(HZ, cpu));
> }
> }
>

Did you changed the behavior by intention?
You seem to miss the factor "3" here. This hunk should read:

--- linux-2.6.19-rc1-git6.orig/mm/slab.c
+++ linux-2.6.19-rc1-git6/mm/slab.c
@@ -926,7 +926,7 @@ static void __devinit start_cpu_timer(in
if (keventd_up() && reap_work->func == NULL) {
init_reap_node(cpu);
INIT_WORK(reap_work, cache_reap, NULL);
- schedule_delayed_work_on(cpu, reap_work, HZ + 3 * cpu);
+ schedule_delayed_work_on(cpu, reap_work, __round_jiffies_relative(HZ, 3 * cpu));
}
}


In case you apply it:

Signed-off-by: Ingo Oese <[email protected]>


Regards

Ingo Oeser

2006-10-10 17:01:04

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [patch 2/2] round_jiffies users

On Tue, 2006-10-10 at 18:47 +0200, Ingo Oeser wrote:
> Hi Arjan,
>
> Arjan van de Ven wrote:
> > Index: linux-2.6.19-rc1-git6/mm/slab.c
> > ===================================================================
> > --- linux-2.6.19-rc1-git6.orig/mm/slab.c
> > +++ linux-2.6.19-rc1-git6/mm/slab.c
> > @@ -926,7 +926,7 @@ static void __devinit start_cpu_timer(in
> > if (keventd_up() && reap_work->func == NULL) {
> > init_reap_node(cpu);
> > INIT_WORK(reap_work, cache_reap, NULL);
> > - schedule_delayed_work_on(cpu, reap_work, HZ + 3 * cpu);
> > + schedule_delayed_work_on(cpu, reap_work, __round_jiffies_relative(HZ, cpu));
> > }
> > }
> >
>
> Did you changed the behavior by intention?
> You seem to miss the factor "3" here. This hunk should read:

Hi,

actually.. not really; the __round_jiffies_relative function just takes
a CPU number, and internally takes care of spreading things around based
on CPU number (eg it does the *3 internally); it's cleaner that way, the
callers don't need to bother by how much to spread for each cpu etc
etc... So the patch is correct as is.


Greetings,
Arjan van de Ven


2006-10-10 18:57:08

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch 1/2] round_jiffies infrastructure

On Tue, 10 Oct 2006 18:03:30 +0200
Arjan van de Ven <[email protected]> wrote:

> @@ -80,6 +80,56 @@ tvec_base_t boot_tvec_bases;
> EXPORT_SYMBOL(boot_tvec_bases);
> static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = &boot_tvec_bases;
>
> +unsigned long __round_jiffies(unsigned long T, int CPU)
> +{
> + int rem;
> + int original = T;
> + rem = T % HZ;
> + if (rem < HZ/4)
> + T = T - rem;
> + else
> + T = T - rem + HZ;
> + /* we don't want all cpus firing at once hitting the same lock/memory */
> + T += CPU * 3;
> + if (T <= jiffies) /* rounding ate our timeout entirely */
> + return original;
> + return T;
> +}
> +EXPORT_SYMBOL_GPL(__round_jiffies);
> +

c'mon Arjan. If we're going to create new, kernel-wide,
exported-to-modules infrastructure then it deserves slightly more than zero
documentation.

Some commentary explaining/justifying the magic numbers in there would be
useful too. The HZ/4, the cpu*3.

And coding style too, please: consistent spacing around arithmetic
operators, variables are lower case, constants are upper case.

2006-10-10 20:48:36

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [patch 1/2] round_jiffies infrastructure


> c'mon Arjan. If we're going to create new, kernel-wide,
> exported-to-modules infrastructure then it deserves slightly more than zero
> documentation.

ok I had a weak moment; updated patch below
(lightly tested because current 2.6.19-rc1-git6 dies on x86_64 with an
interrupt issue within a few minutes even unpatched on my machines)


From: Arjan van de Ven <[email protected]>
Subject: round_jiffies infrastructure

This patch introduces a round_jiffies() function as well as a
round_jiffies_relative() function. These functions round a jiffies value
to the next whole second. The primary purpose of this rounding is to
cause all "we don't care exactly when" timers to happen at the same
jiffy.
This avoids multiple timers to fire within the second for no real
reason; with dynamic ticks these extra timers cause wakeups from deep
sleep CPU sleep states and thus waste power.

The exact wakeup moment is skewed by the cpu number, to avoid all
cpus from waking up at the exact same time (and hitting the same
lock/cachelines there)

Signed-off-by: Arjan van de Ven <[email protected]>




Index: linux-2.6.19-rc1-git6/include/linux/timer.h
===================================================================
--- linux-2.6.19-rc1-git6.orig/include/linux/timer.h
+++ linux-2.6.19-rc1-git6/include/linux/timer.h
@@ -98,4 +98,10 @@ extern void run_local_timers(void);
struct hrtimer;
extern int it_real_fn(struct hrtimer *);

+unsigned long __round_jiffies(unsigned long j, int cpu);
+unsigned long __round_jiffies_relative(unsigned long j, int cpu);
+unsigned long round_jiffies(unsigned long j);
+unsigned long round_jiffies_relative(unsigned long j);
+
+
#endif
Index: linux-2.6.19-rc1-git6/kernel/timer.c
===================================================================
--- linux-2.6.19-rc1-git6.orig/kernel/timer.c
+++ linux-2.6.19-rc1-git6/kernel/timer.c
@@ -80,6 +80,139 @@ tvec_base_t boot_tvec_bases;
EXPORT_SYMBOL(boot_tvec_bases);
static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = &boot_tvec_bases;

+/**
+ * __round_jiffies - function to round jiffies to a full second
+ * @j: the time in (absolute) jiffies that should be rounded
+ * @cpu: the processor number on which the timeout will happen
+ *
+ * __round_jiffies rounds an absolute time in the future (in jiffies)
+ * up or down to (approximately) full seconds. This is useful for timers
+ * for which the exact time they fire does not matter too much, as long as
+ * they fire approximately every X seconds.
+ *
+ * By rounding these timers to whole seconds, all such timers will fire
+ * at the same time, rather than at various times spread out. The goal
+ * of this is to have the CPU wake up less, which saves power.
+ *
+ * The exact rounding is skewed for each processor to avoid all
+ * processors firing at the exact same time, which could lead
+ * to lock contention or spurious cache line bouncing.
+ *
+ * The return value is the rounded version of the "j" parameter.
+ */
+unsigned long __round_jiffies(unsigned long j, int cpu)
+{
+ int rem;
+ int original = j;
+
+ /*
+ * We don't want all cpus firing their timers at once hitting the
+ * same lock or cachelines, so we skew each extra cpu with an extra
+ * 3 jiffies. This 3 jiffies came originally from the mm/ code which
+ * already did this.
+ * The skew is done by adding 3*cpunr, then round, then subtract this
+ * extra offset again.
+ */
+ j += cpu * 3;
+
+ rem = j % HZ;
+
+ /*
+ * If the target jiffie is just after a whole second (which can happen
+ * due to delays of the timer irq, long irq off times etc etc) then
+ * we should round down to the whole second, not up. Use 1/4th second
+ * as cutoff for this rounding as an extreme upper bound for this.
+ */
+ if (rem < HZ/4) /* round down */
+ j = j - rem;
+ else /* round up */
+ j = j - rem + HZ;
+
+ /* now that we have rounded, subtract the extra skew again */
+ j -= cpu * 3;
+
+ if (j <= jiffies) /* rounding ate our timeout entirely; */
+ return original;
+ return j;
+}
+EXPORT_SYMBOL_GPL(__round_jiffies);
+
+/**
+ * __round_jiffies_relative - function to round jiffies to a full second
+ * @j: the time in (relative) jiffies that should be rounded
+ * @cpu: the processor number on which the timeout will happen
+ *
+ * __round_jiffies_relative rounds a time delta in the future (in jiffies)
+ * up or down to (approximately) full seconds. This is useful for timers
+ * for which the exact time they fire does not matter too much, as long as
+ * they fire approximately every X seconds.
+ *
+ * By rounding these timers to whole seconds, all such timers will fire
+ * at the same time, rather than at various times spread out. The goal
+ * of this is to have the CPU wake up less, which saves power.
+ *
+ * The exact rounding is skewed for each processor to avoid all
+ * processors firing at the exact same time, which could lead
+ * to lock contention or spurious cache line bouncing.
+ *
+ * The return value is the rounded version of the "j" parameter.
+ */
+unsigned long __round_jiffies_relative(unsigned long j, int cpu)
+{
+ /*
+ * In theory the following code can skip a jiffy in case jiffies
+ * increments right between the addition and the later subtraction.
+ * However since the entire point of this function is to use approximate
+ * timeouts, it's entirely ok to not handle that.
+ */
+ return __round_jiffies(j + jiffies, cpu) - jiffies;
+}
+EXPORT_SYMBOL_GPL(__round_jiffies_relative);
+
+/**
+ * round_jiffies - function to round jiffies to a full second
+ * @j: the time in (absolute) jiffies that should be rounded
+ *
+ * round_jiffies rounds an absolute time in the future (in jiffies)
+ * up or down to (approximately) full seconds. This is useful for timers
+ * for which the exact time they fire does not matter too much, as long as
+ * they fire approximately every X seconds.
+ *
+ * By rounding these timers to whole seconds, all such timers will fire
+ * at the same time, rather than at various times spread out. The goal
+ * of this is to have the CPU wake up less, which saves power.
+ *
+ * The return value is the rounded version of the "j" parameter.
+ */
+unsigned long round_jiffies(unsigned long j)
+{
+ return __round_jiffies(j, raw_smp_processor_id());
+}
+EXPORT_SYMBOL_GPL(round_jiffies);
+
+/**
+ * round_jiffies_relative - function to round jiffies to a full second
+ * @j: the time in (relative) jiffies that should be rounded
+ *
+ * round_jiffies_relative rounds a time delta in the future (in jiffies)
+ * up or down to (approximately) full seconds. This is useful for timers
+ * for which the exact time they fire does not matter too much, as long as
+ * they fire approximately every X seconds.
+ *
+ * By rounding these timers to whole seconds, all such timers will fire
+ * at the same time, rather than at various times spread out. The goal
+ * of this is to have the CPU wake up less, which saves power.
+ *
+ * The return value is the rounded version of the "j" parameter.
+ */
+unsigned long round_jiffies_relative(unsigned long j)
+{
+ return __round_jiffies_relative(j, raw_smp_processor_id());
+}
+EXPORT_SYMBOL_GPL(round_jiffies_relative);
+
+
+
static inline void set_running_timer(tvec_base_t *base,
struct timer_list *timer)
{

2006-10-10 22:47:32

by Paul Dickson

[permalink] [raw]
Subject: Re: [patch 2/2] round_jiffies users

On Tue, 10 Oct 2006 18:04:23 +0200, Arjan van de Ven wrote:

> + mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));

Shouldn't round_jiffies_relative be used for some of these, a la:

+ mod_timer(&adapter->phy_info_timer, round_jiffies_relative(2 * HZ));

-Paul

2006-10-10 23:52:45

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [patch 2/2] round_jiffies users

Paul Dickson wrote:
> On Tue, 10 Oct 2006 18:04:23 +0200, Arjan van de Ven wrote:
>
>> + mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));
>
> Shouldn't round_jiffies_relative be used for some of these, a la:
>
> + mod_timer(&adapter->phy_info_timer, round_jiffies_relative(2 * HZ));
>

mod_timer() takes an absolute jiffies value as argument, so... no :)

2006-10-11 17:23:36

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 0/2] Introduce round_jiffies() to save spurious wakeups

On Tue, Oct 10, 2006 at 06:02:45PM +0200, Arjan van de Ven wrote:
> Hi,
>
> the following 2 patches will introduce the round_jiffies() api and users
> thereof.
>
> The general idea is that by rounding the jiffies for certain timers to
> the next whole second will make those timers all happen at the same
> time; and thus reduce the number of times the cpu has to wake up to
> service timers (this assumes a tickless kernel)
>
> Obviously only timers where the exact time of firing isn't so important
> can do this; several of the recurring "always live" timers of the kernel
> are of this kind, they want "about once a second" or "about once every 4
> seconds" and such, and don't really care about the exact jiffy in which
> they fire.
>
> An alternative would have been to introduce mod_timer_rounded() or
> somesuch APIs (but there's many variants that take jiffies); I feel that
> an explicit caller based rounding actually is quite reasonable.

I think the API you proposed is horrible. Having jiffies exposed in
ani API is a mistake, and adding more makes this problem worse. I'd suggest
to start with Alan's patches that add a timer variant that takes a miliseconds
argument instead of jiffies and add a _rounded varaint to it that has
a new parameter that specifies the precision.

2006-10-11 17:55:23

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [patch 0/2] Introduce round_jiffies() to save spurious wakeups

On Wed, 2006-10-11 at 18:23 +0100, Christoph Hellwig wrote:
> On Tue, Oct 10, 2006 at 06:02:45PM +0200, Arjan van de Ven wrote:
> > Hi,
> >
> > the following 2 patches will introduce the round_jiffies() api and users
> > thereof.
> >
> > The general idea is that by rounding the jiffies for certain timers to
> > the next whole second will make those timers all happen at the same
> > time; and thus reduce the number of times the cpu has to wake up to
> > service timers (this assumes a tickless kernel)
> >
> > Obviously only timers where the exact time of firing isn't so important
> > can do this; several of the recurring "always live" timers of the kernel
> > are of this kind, they want "about once a second" or "about once every 4
> > seconds" and such, and don't really care about the exact jiffy in which
> > they fire.
> >
> > An alternative would have been to introduce mod_timer_rounded() or
> > somesuch APIs (but there's many variants that take jiffies); I feel that
> > an explicit caller based rounding actually is quite reasonable.
>
> I think the API you proposed is horrible. Having jiffies exposed in
> ani API is a mistake, and adding more makes this problem worse.

and other people like Linus disagree with you.

> I'd suggest
> to start with Alan's patches that add a timer variant that takes a miliseconds
> argument instead of jiffies and add a _rounded varaint to it that has
> a new parameter that specifies the precision.

it's half a solution; there's many apis that currently take either
absolute or relative jiffies, and want rounding. Duplicating that lot
doesn't look like the best idea either...

2006-10-12 19:02:49

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 0/2] Introduce round_jiffies() to save spurious wakeups

On Wed, Oct 11, 2006 at 07:54:57PM +0200, Arjan van de Ven wrote:
> > > An alternative would have been to introduce mod_timer_rounded() or
> > > somesuch APIs (but there's many variants that take jiffies); I feel that
> > > an explicit caller based rounding actually is quite reasonable.
> >
> > I think the API you proposed is horrible. Having jiffies exposed in
> > ani API is a mistake, and adding more makes this problem worse.
>
> and other people like Linus disagree with you.

The only argument from Linus was about getting rid of jiffies completly.
He certainly didn't complain when we added interfaces to reduce jiffies
use in the past (e.g. the msleep APIs)

2006-10-16 13:42:55

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 1/2] round_jiffies infrastructure

Arjan van de Ven <[email protected]> writes:
> +}
> +EXPORT_SYMBOL_GPL(__round_jiffies);

This means non GPL modules will disturb your timers again. probably not
a good strategy.

> +
> +unsigned long __round_jiffies_relative(unsigned long T, int CPU)
> +{
> + int rem;
> + int original = T;
> + T=T+jiffies;
> + rem = T % HZ;
> + if (rem < HZ/4)
> + T = T - rem;
> + else
> + T = T - rem + HZ;
> + /* we don't want all cpus firing at once hitting the same lock/memory */
> + T += CPU * 3;

Consider a dual core Yonah/Merom: it has shared caches and the two cores
can only go to sleep together. With this the wakeups will be always
twice. Not good. I guess you need to add some topology awareness here
and e.g. only spread it for sockets.

BTW we normally put spaces around operators inside expressions.

-Andi