2006-11-22 00:52:58

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: [RFC][PATCH] Add do_not_call_when_idle option to timer and workqueue


Add a new flag to timers to allow "Do not call when Idle" mode.
Export this sort of soft timer over workqueues and use it in ondemand governor.

This patch avoids the periodic ondemand timer invocations in
Dynamic Tick kernels.

This can be used in various other kernel timers that end up setting up
unnecessary timers during idle.

Signed-off-by: Venkatesh Pallipadi <[email protected]>

Index: linux-2.6.19-rc-mm/include/linux/timer.h
===================================================================
--- linux-2.6.19-rc-mm.orig/include/linux/timer.h 2006-11-13 15:06:26.000000000 -0800
+++ linux-2.6.19-rc-mm/include/linux/timer.h 2006-11-13 16:01:03.000000000 -0800
@@ -8,6 +8,8 @@

struct tvec_t_base_s;

+#define TIMER_FLAG_NOT_IN_IDLE (0x1)
+
struct timer_list {
struct list_head entry;
unsigned long expires;
@@ -16,6 +18,7 @@
unsigned long data;

struct tvec_t_base_s *base;
+ int flags;
#ifdef CONFIG_TIMER_STATS
void *start_site;
char start_comm[16];
@@ -30,6 +33,7 @@
.expires = (_expires), \
.data = (_data), \
.base = &boot_tvec_bases, \
+ .flags = 0, \
}

#define DEFINE_TIMER(_name, _function, _expires, _data) \
Index: linux-2.6.19-rc-mm/include/linux/workqueue.h
===================================================================
--- linux-2.6.19-rc-mm.orig/include/linux/workqueue.h 2006-11-13 15:06:26.000000000 -0800
+++ linux-2.6.19-rc-mm/include/linux/workqueue.h 2006-11-13 16:01:03.000000000 -0800
@@ -65,6 +65,8 @@
extern int FASTCALL(queue_delayed_work(struct workqueue_struct *wq, struct work_struct *work, unsigned long delay));
extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
struct work_struct *work, unsigned long delay);
+extern int queue_soft_delayed_work_on(int cpu, struct workqueue_struct *wq,
+ struct work_struct *work, unsigned long delay);
extern void FASTCALL(flush_workqueue(struct workqueue_struct *wq));

extern int FASTCALL(schedule_work(struct work_struct *work));
Index: linux-2.6.19-rc-mm/kernel/timer.c
===================================================================
--- linux-2.6.19-rc-mm.orig/kernel/timer.c 2006-11-13 15:06:26.000000000 -0800
+++ linux-2.6.19-rc-mm/kernel/timer.c 2006-11-13 16:01:03.000000000 -0800
@@ -639,6 +639,8 @@
j = base->timer_jiffies & TVR_MASK;
do {
list_for_each_entry(nte, base->tv1.vec + j, entry) {
+ if (nte->flags & TIMER_FLAG_NOT_IN_IDLE)
+ continue;
expires = nte->expires;
found = nte;
if (j < (base->timer_jiffies & TVR_MASK))
Index: linux-2.6.19-rc-mm/kernel/workqueue.c
===================================================================
--- linux-2.6.19-rc-mm.orig/kernel/workqueue.c 2006-11-13 15:06:26.000000000 -0800
+++ linux-2.6.19-rc-mm/kernel/workqueue.c 2006-11-13 16:01:03.000000000 -0800
@@ -196,6 +196,20 @@
}
EXPORT_SYMBOL_GPL(queue_delayed_work_on);

+int queue_soft_delayed_work_on(int cpu, struct workqueue_struct *wq,
+ struct work_struct *work, unsigned long delay)
+{
+ int ret;
+ struct timer_list *timer = &work->timer;
+ ret = queue_delayed_work_on(cpu, wq, work, delay);
+ if (ret) {
+ timer->flags |= TIMER_FLAG_NOT_IN_IDLE;
+ }
+ return ret;
+}
+
+EXPORT_SYMBOL_GPL(queue_soft_delayed_work_on);
+
static void run_workqueue(struct cpu_workqueue_struct *cwq)
{
unsigned long flags;
Index: linux-2.6.19-rc-mm/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.19-rc-mm.orig/drivers/cpufreq/cpufreq_ondemand.c 2006-11-13 15:58:04.000000000 -0800
+++ linux-2.6.19-rc-mm/drivers/cpufreq/cpufreq_ondemand.c 2006-11-21 12:23:15.000000000 -0800
@@ -446,7 +448,7 @@
dbs_info->freq_lo,
CPUFREQ_RELATION_H);
}
- queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
+ queue_soft_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
}

static inline void dbs_timer_init(unsigned int cpu)
@@ -456,9 +458,9 @@
int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
delay -= jiffies % delay;

ondemand_powersave_bias_init();
INIT_WORK(&dbs_info->work, do_dbs_timer, NULL);
- queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
+ queue_soft_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
}

static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)


2006-11-22 02:11:34

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add do_not_call_when_idle option to timer and workqueue

On Tue, 21 Nov 2006 16:28:45 -0800
Venkatesh Pallipadi <[email protected]> wrote:

>
> Add a new flag to timers to allow "Do not call when Idle" mode.
> Export this sort of soft timer over workqueues and use it in ondemand governor.
>
> This patch avoids the periodic ondemand timer invocations in
> Dynamic Tick kernels.
>
> This can be used in various other kernel timers that end up setting up
> unnecessary timers during idle.
>
> Signed-off-by: Venkatesh Pallipadi <[email protected]>
>
> Index: linux-2.6.19-rc-mm/include/linux/timer.h
> ===================================================================
> --- linux-2.6.19-rc-mm.orig/include/linux/timer.h 2006-11-13 15:06:26.000000000 -0800
> +++ linux-2.6.19-rc-mm/include/linux/timer.h 2006-11-13 16:01:03.000000000 -0800
> @@ -8,6 +8,8 @@
>
> struct tvec_t_base_s;
>
> +#define TIMER_FLAG_NOT_IN_IDLE (0x1)
> +
> struct timer_list {
> struct list_head entry;
> unsigned long expires;
> @@ -16,6 +18,7 @@
> unsigned long data;
>
> struct tvec_t_base_s *base;
> + int flags;
> #ifdef CONFIG_TIMER_STATS

Adding a new field to the timer_list is somewhat of a hit - this is going
to make an awful lot of data structures a bit larger. Some of which we
allocate a large number of.

I think we could justfy getting nasty and using the LSB of
timer_list.function for this..

> void *start_site;
> char start_comm[16];
> @@ -30,6 +33,7 @@
> .expires = (_expires), \
> .data = (_data), \
> .base = &boot_tvec_bases, \
> + .flags = 0, \
> }
>
> #define DEFINE_TIMER(_name, _function, _expires, _data) \
> Index: linux-2.6.19-rc-mm/include/linux/workqueue.h
> ===================================================================
> --- linux-2.6.19-rc-mm.orig/include/linux/workqueue.h 2006-11-13 15:06:26.000000000 -0800
> +++ linux-2.6.19-rc-mm/include/linux/workqueue.h 2006-11-13 16:01:03.000000000 -0800
> @@ -65,6 +65,8 @@
> extern int FASTCALL(queue_delayed_work(struct workqueue_struct *wq, struct work_struct *work, unsigned long delay));
> extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
> struct work_struct *work, unsigned long delay);
> +extern int queue_soft_delayed_work_on(int cpu, struct workqueue_struct *wq,
> + struct work_struct *work, unsigned long delay);

I don't think that's a well-chosen name. What does the "soft" mean?

Also, this is a new timer API capability, but it is only exposed via the
workqueue API, and then only via a part of it.

A complete implementation would expose the new capability via extensions to
the timer API, and would then (in a separate patch) convert the workqueue
API to use those extensions. (And in fact the third patch would convert
cpufreq to use the new workqueue capabilities...)

> extern void FASTCALL(flush_workqueue(struct workqueue_struct *wq));
>
> extern int FASTCALL(schedule_work(struct work_struct *work));
> Index: linux-2.6.19-rc-mm/kernel/timer.c
> ===================================================================
> --- linux-2.6.19-rc-mm.orig/kernel/timer.c 2006-11-13 15:06:26.000000000 -0800
> +++ linux-2.6.19-rc-mm/kernel/timer.c 2006-11-13 16:01:03.000000000 -0800
> @@ -639,6 +639,8 @@
> j = base->timer_jiffies & TVR_MASK;
> do {
> list_for_each_entry(nte, base->tv1.vec + j, entry) {
> + if (nte->flags & TIMER_FLAG_NOT_IN_IDLE)
> + continue;
> expires = nte->expires;
> found = nte;
> if (j < (base->timer_jiffies & TVR_MASK))
> Index: linux-2.6.19-rc-mm/kernel/workqueue.c
> ===================================================================
> --- linux-2.6.19-rc-mm.orig/kernel/workqueue.c 2006-11-13 15:06:26.000000000 -0800
> +++ linux-2.6.19-rc-mm/kernel/workqueue.c 2006-11-13 16:01:03.000000000 -0800
> @@ -196,6 +196,20 @@
> }
> EXPORT_SYMBOL_GPL(queue_delayed_work_on);
>
> +int queue_soft_delayed_work_on(int cpu, struct workqueue_struct *wq,
> + struct work_struct *work, unsigned long delay)
> +{
> + int ret;
> + struct timer_list *timer = &work->timer;
> + ret = queue_delayed_work_on(cpu, wq, work, delay);
> + if (ret) {
> + timer->flags |= TIMER_FLAG_NOT_IN_IDLE;
> + }
> + return ret;
> +}

See, here we have workqueue code poking around in timer_list internals.

Also, it's weird and conceivably racy to go altering the timer _after_
scheduling the work. What happens if the timer goes off before we set the
bit? And suppose the workqueue callback function frees the storage at
*timer?

All these problems will go away if the APIs are done fully, and correctly.


2006-11-22 06:38:24

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add do_not_call_when_idle option to timer and workqueue

Andrew Morton wrote:
>> Index: linux-2.6.19-rc-mm/include/linux/timer.h
>> ===================================================================
>> --- linux-2.6.19-rc-mm.orig/include/linux/timer.h 2006-11-13 15:06:26.000000000 -0800
>> +++ linux-2.6.19-rc-mm/include/linux/timer.h 2006-11-13 16:01:03.000000000 -0800
>> @@ -8,6 +8,8 @@
>>
>> struct tvec_t_base_s;
>>
>> +#define TIMER_FLAG_NOT_IN_IDLE (0x1)
>> +
>> struct timer_list {
>> struct list_head entry;
>> unsigned long expires;
>> @@ -16,6 +18,7 @@
>> unsigned long data;
>>
>> struct tvec_t_base_s *base;
>> + int flags;
>> #ifdef CONFIG_TIMER_STATS
>>
>
> Adding a new field to the timer_list is somewhat of a hit - this is going
> to make an awful lot of data structures a bit larger. Some of which we
> allocate a large number of.
>
> I think we could justfy getting nasty and using the LSB of
> timer_list.function for this..
>
>

The lsb of a function pointer is used in variable length instruction
processors (such as x86 when optimizing for size). The msb is constant
though.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2006-11-22 08:50:41

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add do_not_call_when_idle option to timer and workqueue

Avi Kivity wrote:
> The lsb of a function pointer is used in variable length instruction
> processors (such as x86 when optimizing for size). The msb is constant
> though.
>

x86 aligns the start of functions to 4 bytes even when optimizing for
size though...

2006-11-22 17:37:41

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add do_not_call_when_idle option to timer and workqueue

On Tue, Nov 21, 2006 at 06:11:14PM -0800, Andrew Morton wrote:
> On Tue, 21 Nov 2006 16:28:45 -0800
> Venkatesh Pallipadi <[email protected]> wrote:
>
> >
> > struct timer_list {
> > struct list_head entry;
> > unsigned long expires;
> > @@ -16,6 +18,7 @@
> > unsigned long data;
> >
> > struct tvec_t_base_s *base;
> > + int flags;
> > #ifdef CONFIG_TIMER_STATS
>
> Adding a new field to the timer_list is somewhat of a hit - this is going
> to make an awful lot of data structures a bit larger. Some of which we
> allocate a large number of.
>
> I think we could justfy getting nasty and using the LSB of
> timer_list.function for this..

That is a clever idea... Is that going to work in all architectures with all
compiler flags?


> > #define DEFINE_TIMER(_name, _function, _expires, _data) \
> > Index: linux-2.6.19-rc-mm/include/linux/workqueue.h
> > ===================================================================
> > --- linux-2.6.19-rc-mm.orig/include/linux/workqueue.h 2006-11-13 15:06:26.000000000 -0800
> > +++ linux-2.6.19-rc-mm/include/linux/workqueue.h 2006-11-13 16:01:03.000000000 -0800
> > @@ -65,6 +65,8 @@
> > extern int FASTCALL(queue_delayed_work(struct workqueue_struct *wq, struct work_struct *work, unsigned long delay));
> > extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
> > struct work_struct *work, unsigned long delay);
> > +extern int queue_soft_delayed_work_on(int cpu, struct workqueue_struct *wq,
> > + struct work_struct *work, unsigned long delay);
>
> I don't think that's a well-chosen name. What does the "soft" mean?
>
> Also, this is a new timer API capability, but it is only exposed via the
> workqueue API, and then only via a part of it.
>
> A complete implementation would expose the new capability via extensions to
> the timer API, and would then (in a separate patch) convert the workqueue
> API to use those extensions. (And in fact the third patch would convert
> cpufreq to use the new workqueue capabilities...)
>

I agree with all API related comments. I will clean it up in the next refresh of
the patch. I wanted to get the comments on this mechanism in general and
wanted to be doubly sure that I am not breaking anything in cascading timer
by doing something like this. Looks like the mechanism is OK.
So, I will work on a cleaner patch and repost.

Thanks,
Venki

2006-11-22 20:03:40

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC][PATCH] Add do_not_call_when_idle option to timer and workqueue

On Wed, 22 Nov 2006 09:13:24 -0800
Venkatesh Pallipadi <[email protected]> wrote:

> On Tue, Nov 21, 2006 at 06:11:14PM -0800, Andrew Morton wrote:
> > On Tue, 21 Nov 2006 16:28:45 -0800
> > Venkatesh Pallipadi <[email protected]> wrote:
> >
> > >
> > > struct timer_list {
> > > struct list_head entry;
> > > unsigned long expires;
> > > @@ -16,6 +18,7 @@
> > > unsigned long data;
> > >
> > > struct tvec_t_base_s *base;
> > > + int flags;
> > > #ifdef CONFIG_TIMER_STATS
> >
> > Adding a new field to the timer_list is somewhat of a hit - this is going
> > to make an awful lot of data structures a bit larger. Some of which we
> > allocate a large number of.
> >
> > I think we could justfy getting nasty and using the LSB of
> > timer_list.function for this..
>
> That is a clever idea... Is that going to work in all architectures with all
> compiler flags?

Don't know. Possibly not. Other options are list.next, list.prev and
base. None of them are pleasant.