Hi all,
This is a resubmit of the patch proposed by Ingo and me
a few months ago:
http://lkml.org/lkml/2007/11/20/343
It was in -mm for a while and was then dropped because it was expected
to move into mainline, but it never actually appeared there.
As recently a new discussion started on the subject I decided
to resubmit it again.
It is now against 2.6.25.4, but since there are no changes
in the modified files it should apply to 2.6.26-rc4 too.
Comments/criticism are welcome.
Marin Mitov
Signed-off-by: Marin Mitov <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
=========================================
--- a/arch/x86/lib/delay_32.c 2007-11-18 08:14:05.000000000 +0200
+++ b/arch/x86/lib/delay_32.c 2007-11-20 19:03:52.000000000 +0200
@@ -40,18 +40,42 @@
:"0" (loops));
}
-/* TSC based delay: */
+/* TSC based delay:
+ *
+ * We are careful about preemption as TSC's are per-CPU.
+ */
static void delay_tsc(unsigned long loops)
{
- unsigned long bclock, now;
+ unsigned prev, prev_1, now;
+ unsigned left = loops;
+ unsigned prev_cpu, cpu;
+
+ preempt_disable();
+ rdtscl(prev);
+ prev_cpu = smp_processor_id();
+ preempt_enable();
+ now = prev;
- preempt_disable(); /* TSC's are per-cpu */
- rdtscl(bclock);
do {
rep_nop();
+
+ left -= now - prev;
+ prev = now;
+
+ preempt_disable();
+ rdtscl(prev_1);
+ cpu = smp_processor_id();
rdtscl(now);
- } while ((now-bclock) < loops);
- preempt_enable();
+ preempt_enable();
+
+ if (prev_cpu != cpu) {
+ /*
+ * We have migrated, forget prev_cpu's tsc reading
+ */
+ prev = prev_1;
+ prev_cpu = cpu;
+ }
+ } while ((now-prev) < left);
}
/*
--- a/arch/x86/lib/delay_64.c 2007-11-18 08:14:40.000000000 +0200
+++ b/arch/x86/lib/delay_64.c 2007-11-20 19:47:29.000000000 +0200
@@ -28,18 +28,42 @@
return 0;
}
+/* TSC based delay:
+ *
+ * We are careful about preemption as TSC's are per-CPU.
+ */
void __delay(unsigned long loops)
{
- unsigned bclock, now;
+ unsigned prev, prev_1, now;
+ unsigned left = loops;
+ unsigned prev_cpu, cpu;
+
+ preempt_disable();
+ rdtscl(prev);
+ prev_cpu = smp_processor_id();
+ preempt_enable();
+ now = prev;
- preempt_disable(); /* TSC's are pre-cpu */
- rdtscl(bclock);
do {
- rep_nop();
+ rep_nop();
+
+ left -= now - prev;
+ prev = now;
+
+ preempt_disable();
+ rdtscl(prev_1);
+ cpu = smp_processor_id();
rdtscl(now);
- }
- while ((now-bclock) < loops);
- preempt_enable();
+ preempt_enable();
+
+ if (prev_cpu != cpu) {
+ /*
+ * We have migrated, forget prev_cpu's tsc reading
+ */
+ prev = prev_1;
+ prev_cpu = cpu;
+ }
+ } while ((now-prev) < left);
}
EXPORT_SYMBOL(__delay);
-
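For reference, here is delay_tsc() in delay_32.c as it reads with the hunk
above applied (reconstructed from the diff, so take it as a sketch of the
result rather than the literal file):

static void delay_tsc(unsigned long loops)
{
        unsigned prev, prev_1, now;
        unsigned left = loops;
        unsigned prev_cpu, cpu;

        /* First reading: TSC and CPU id must come from the same CPU. */
        preempt_disable();
        rdtscl(prev);
        prev_cpu = smp_processor_id();
        preempt_enable();
        now = prev;

        do {
                rep_nop();

                /* Account the time already spent on the previous CPU/TSC. */
                left -= now - prev;
                prev = now;

                /* Short preempt-off window: read TSC and CPU id together. */
                preempt_disable();
                rdtscl(prev_1);
                cpu = smp_processor_id();
                rdtscl(now);
                preempt_enable();

                if (prev_cpu != cpu) {
                        /* We have migrated, forget prev_cpu's TSC reading. */
                        prev = prev_1;
                        prev_cpu = cpu;
                }
        } while ((now - prev) < left);
}

The 64-bit __delay() in delay_64.c becomes the same loop.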
> static void delay_tsc(unsigned long loops)
> {
> - unsigned long bclock, now;
> + unsigned prev, prev_1, now;
> + unsigned left = loops;
> + unsigned prev_cpu, cpu;
> +
> + preempt_disable();
> + rdtscl(prev);
The unsigneds should be probably u64 and the rdtsc rdtscll.
Otherwise this will all overflow for longer waits on a very
fast systems (e.g. a 5Ghz system wraps 32bit in ~1.1 seconds)
Normally such delays shouldn't be that long, but why risk
overflow even in extreme cases?
-Andi
On Sunday 01 June 2008 07:25:17 pm Andi Kleen wrote:
> > static void delay_tsc(unsigned long loops)
> > {
> > - unsigned long bclock, now;
> > + unsigned prev, prev_1, now;
> > + unsigned left = loops;
> > + unsigned prev_cpu, cpu;
> > +
> > + preempt_disable();
> > + rdtscl(prev);
>
>
> The unsigneds should be probably u64 and the rdtsc rdtscll.
> Otherwise this will all overflow for longer waits on a very
> fast systems (e.g. a 5Ghz system wraps 32bit in ~1.1 seconds)
> Normally such delays shouldn't be that long, but why risk
> overflow even in extreme cases?
Yes, in principle, but an overflow (one that could happen between
rdtscl(prev) and rdtscl(now)) is taken into account the same way
as in the time_after()/time_before() macros (differences only), see:
+ left -= now - prev;
.........
+ } while ((now-prev) < left);
If more than one overflow happens between rdtscl(prev) and
rdtscl(now) (the task is suspended for a long time between the two
readings), all overflows after the first one will be lost. But the
patch was originally submitted to guarantee a minimum udelay().
Sure, I could change to u64 if we reach a consensus.
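To illustrate the single-wrap case with plain 32-bit arithmetic (a
standalone userspace sketch, not the kernel code itself):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* A TSC low-word read taken just before the 32-bit counter wraps... */
        uint32_t prev = 0xfffffff0u;
        /* ...and a second read taken shortly after the wrap. */
        uint32_t now = 0x00000010u;
        /* Unsigned modular subtraction still gives the true elapsed count
         * (0x20 == 32 cycles), just as in time_after()/time_before().
         * Only a second wrap between the two reads would be lost. */
        uint32_t elapsed = now - prev;

        printf("elapsed = %u cycles\n", (unsigned)elapsed);
        return 0;
}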
Best regards.
Marin Mitov
>
> -Andi
>
* Marin Mitov <[email protected]> wrote:
> Hi all,
>
> This is a resubmit of the patch proposed by Ingo and me few month ago:
>
> http://lkml.org/lkml/2007/11/20/343
>
> It was in -mm for a while and then removed due to a move into the
> mainline, but it never appeared in it.
hm, we've got this overlapping commit in -tip for the same general
problem:
| # x86/urgent: e71e716: x86: enable preemption in delay
|
| From: Steven Rostedt <[email protected]>
| Date: Sun, 25 May 2008 11:13:32 -0400
| Subject: [PATCH] x86: enable preemption in delay
i think Thomas had a concern with the original fix - forgot the details.
Ingo
------------------>
commit e71e716c531557308895598002bee24c431d3be1
Author: Steven Rostedt <[email protected]>
Date: Sun May 25 11:13:32 2008 -0400
x86: enable preemption in delay
The RT team has been searching for a nasty latency. This latency shows
up out of the blue and has been seen to be as big as 5ms!
Using ftrace I found the cause of the latency.
pcscd-2995 3dNh1 52360300us : irq_exit (smp_apic_timer_interrupt)
pcscd-2995 3dN.2 52360301us : idle_cpu (irq_exit)
pcscd-2995 3dN.2 52360301us : rcu_irq_exit (irq_exit)
pcscd-2995 3dN.1 52360771us : smp_apic_timer_interrupt (apic_timer_interrupt)
pcscd-2995 3dN.1 52360771us : exit_idle (smp_apic_timer_interrupt)
Here's an example of a 400 us latency. pcscd took a timer interrupt and
returned with "need resched" enabled, but did not reschedule until after
the next interrupt came in at 52360771us 400us later!
At first I thought we somehow missed a preemption check in entry.S. But
I also noticed that this always seemed to happen during a __delay call.
pcscd-2995 3dN.2 52360836us : rcu_irq_exit (irq_exit)
pcscd-2995 3.N.. 52361265us : preempt_schedule (__delay)
Looking at the x86 delay, I found my problem.
In git commit 35d5d08a085c56f153458c3f5d8ce24123617faf, Andrew Morton
placed preempt_disable around the entire delay due to TSC's not working
nicely on SMP. Unfortunately for those that care about latencies this
is devastating! Especially when we have callers to mdelay(8).
Here I enable preemption during the loop and account for anytime the task
migrates to a new CPU. The delay asked for may be extended a bit by
the migration, but delay only guarantees that it will delay for that minimum
time. Delaying longer should not be an issue.
[
Thanks to Thomas Gleixner for spotting that cpu wasn't updated,
and to place the rep_nop between preempt_enabled/disable.
]
Signed-off-by: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: Clark Williams <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Luis Claudio R. Goncalves" <[email protected]>
Cc: Gregory Haskins <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
diff --git a/arch/x86/lib/delay_32.c b/arch/x86/lib/delay_32.c
index 4535e6d..d710f2d 100644
--- a/arch/x86/lib/delay_32.c
+++ b/arch/x86/lib/delay_32.c
@@ -44,13 +44,36 @@ static void delay_loop(unsigned long loops)
static void delay_tsc(unsigned long loops)
{
unsigned long bclock, now;
+ int cpu;
- preempt_disable(); /* TSC's are per-cpu */
+ preempt_disable();
+ cpu = smp_processor_id();
rdtscl(bclock);
- do {
- rep_nop();
+ for (;;) {
rdtscl(now);
- } while ((now-bclock) < loops);
+ if ((now - bclock) >= loops)
+ break;
+
+ /* Allow RT tasks to run */
+ preempt_enable();
+ rep_nop();
+ preempt_disable();
+
+ /*
+ * It is possible that we moved to another CPU, and
+ * since TSC's are per-cpu we need to calculate
+ * that. The delay must guarantee that we wait "at
+ * least" the amount of time. Being moved to another
+ * CPU could make the wait longer but we just need to
+ * make sure we waited long enough. Rebalance the
+ * counter for this CPU.
+ */
+ if (unlikely(cpu != smp_processor_id())) {
+ loops -= (now - bclock);
+ cpu = smp_processor_id();
+ rdtscl(bclock);
+ }
+ }
preempt_enable();
}
diff --git a/arch/x86/lib/delay_64.c b/arch/x86/lib/delay_64.c
index bbc6105..4c441be 100644
--- a/arch/x86/lib/delay_64.c
+++ b/arch/x86/lib/delay_64.c
@@ -31,14 +31,36 @@ int __devinit read_current_timer(unsigned long *timer_value)
void __delay(unsigned long loops)
{
unsigned bclock, now;
+ int cpu;
- preempt_disable(); /* TSC's are pre-cpu */
+ preempt_disable();
+ cpu = smp_processor_id();
rdtscl(bclock);
- do {
- rep_nop();
+ for (;;) {
rdtscl(now);
+ if ((now - bclock) >= loops)
+ break;
+
+ /* Allow RT tasks to run */
+ preempt_enable();
+ rep_nop();
+ preempt_disable();
+
+ /*
+ * It is possible that we moved to another CPU, and
+ * since TSC's are per-cpu we need to calculate
+ * that. The delay must guarantee that we wait "at
+ * least" the amount of time. Being moved to another
+ * CPU could make the wait longer but we just need to
+ * make sure we waited long enough. Rebalance the
+ * counter for this CPU.
+ */
+ if (unlikely(cpu != smp_processor_id())) {
+ loops -= (now - bclock);
+ cpu = smp_processor_id();
+ rdtscl(bclock);
+ }
}
- while ((now-bclock) < loops);
preempt_enable();
}
EXPORT_SYMBOL(__delay);
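For reference, delay_tsc() in delay_32.c as it reads after this commit
(reconstructed from the hunk above, so a sketch of the result rather than
the literal tree; the 64-bit __delay() is the same loop):

static void delay_tsc(unsigned long loops)
{
        unsigned long bclock, now;
        int cpu;

        preempt_disable();
        cpu = smp_processor_id();
        rdtscl(bclock);
        for (;;) {
                rdtscl(now);
                if ((now - bclock) >= loops)
                        break;

                /* Allow RT tasks to run */
                preempt_enable();
                rep_nop();
                preempt_disable();

                /*
                 * We may have migrated; TSCs are per-CPU, so rebase on
                 * the new CPU's counter while keeping credit for the
                 * time already waited (the delay only has to be "at
                 * least" as long as asked for).
                 */
                if (unlikely(cpu != smp_processor_id())) {
                        loops -= (now - bclock);
                        cpu = smp_processor_id();
                        rdtscl(bclock);
                }
        }
        preempt_enable();
}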
On Monday 09 June 2008 03:13:09 pm Ingo Molnar wrote:
>
> * Marin Mitov <[email protected]> wrote:
>
> > Hi all,
> >
> > This is a resubmit of the patch proposed by Ingo and me few month ago:
> >
> > http://lkml.org/lkml/2007/11/20/343
> >
> > It was in -mm for a while and then removed due to a move into the
> > mainline, but it never appeared in it.
>
> hm, we've got this overlapping commit in -tip for the same general
> problem:
>
> | # x86/urgent: e71e716: x86: enable preemption in delay
> |
> | From: Steven Rostedt <[email protected]>
> | Date: Sun, 25 May 2008 11:13:32 -0400
> | Subject: [PATCH] x86: enable preemption in delay
>
> i think Thomas had a concern with the original fix - forgot the details.
Here they are:
http://lkml.org/lkml/2008/5/25/251
but see my comment to it.
>
> Ingo
There is no fundamental difference between the two patches. I have seen that
Steven's one is merged in linux-2.6.26-rc5. The only difference (if it matters
at all) is that in my patch the preempt_disable()/preempt_enable() sections are
shorter and protect only the code that must be protected:
preempt_disable()
rdtscl()
smp_processor_id()
preempt_enable()
Since Steven's patch is already merged - let it be :-)
Regards
Marin Mitov
>
> ------------------>
> commit e71e716c531557308895598002bee24c431d3be1
> Author: Steven Rostedt <[email protected]>
> Date: Sun May 25 11:13:32 2008 -0400
>
> x86: enable preemption in delay
>
> The RT team has been searching for a nasty latency. This latency shows
> up out of the blue and has been seen to be as big as 5ms!
>
> Using ftrace I found the cause of the latency.
>
> pcscd-2995 3dNh1 52360300us : irq_exit (smp_apic_timer_interrupt)
> pcscd-2995 3dN.2 52360301us : idle_cpu (irq_exit)
> pcscd-2995 3dN.2 52360301us : rcu_irq_exit (irq_exit)
> pcscd-2995 3dN.1 52360771us : smp_apic_timer_interrupt (apic_timer_interrupt)
> pcscd-2995 3dN.1 52360771us : exit_idle (smp_apic_timer_interrupt)
>
> Here's an example of a 400 us latency. pcscd took a timer interrupt and
> returned with "need resched" enabled, but did not reschedule until after
> the next interrupt came in at 52360771us 400us later!
>
> At first I thought we somehow missed a preemption check in entry.S. But
> I also noticed that this always seemed to happen during a __delay call.
>
> pcscd-2995 3dN.2 52360836us : rcu_irq_exit (irq_exit)
> pcscd-2995 3.N.. 52361265us : preempt_schedule (__delay)
>
> Looking at the x86 delay, I found my problem.
>
> In git commit 35d5d08a085c56f153458c3f5d8ce24123617faf, Andrew Morton
> placed preempt_disable around the entire delay due to TSC's not working
> nicely on SMP. Unfortunately for those that care about latencies this
> is devastating! Especially when we have callers to mdelay(8).
>
> Here I enable preemption during the loop and account for anytime the task
> migrates to a new CPU. The delay asked for may be extended a bit by
> the migration, but delay only guarantees that it will delay for that minimum
> time. Delaying longer should not be an issue.
>
> [
> Thanks to Thomas Gleixner for spotting that cpu wasn't updated,
> and to place the rep_nop between preempt_enabled/disable.
> ]
>
> Signed-off-by: Steven Rostedt <[email protected]>
> Cc: [email protected]
> Cc: Clark Williams <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: "Luis Claudio R. Goncalves" <[email protected]>
> Cc: Gregory Haskins <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
>
> diff --git a/arch/x86/lib/delay_32.c b/arch/x86/lib/delay_32.c
> index 4535e6d..d710f2d 100644
> --- a/arch/x86/lib/delay_32.c
> +++ b/arch/x86/lib/delay_32.c
> @@ -44,13 +44,36 @@ static void delay_loop(unsigned long loops)
> static void delay_tsc(unsigned long loops)
> {
> unsigned long bclock, now;
> + int cpu;
>
> - preempt_disable(); /* TSC's are per-cpu */
> + preempt_disable();
> + cpu = smp_processor_id();
> rdtscl(bclock);
> - do {
> - rep_nop();
> + for (;;) {
> rdtscl(now);
> - } while ((now-bclock) < loops);
> + if ((now - bclock) >= loops)
> + break;
> +
> + /* Allow RT tasks to run */
> + preempt_enable();
> + rep_nop();
> + preempt_disable();
> +
> + /*
> + * It is possible that we moved to another CPU, and
> + * since TSC's are per-cpu we need to calculate
> + * that. The delay must guarantee that we wait "at
> + * least" the amount of time. Being moved to another
> + * CPU could make the wait longer but we just need to
> + * make sure we waited long enough. Rebalance the
> + * counter for this CPU.
> + */
> + if (unlikely(cpu != smp_processor_id())) {
> + loops -= (now - bclock);
> + cpu = smp_processor_id();
> + rdtscl(bclock);
> + }
> + }
> preempt_enable();
> }
>
> diff --git a/arch/x86/lib/delay_64.c b/arch/x86/lib/delay_64.c
> index bbc6105..4c441be 100644
> --- a/arch/x86/lib/delay_64.c
> +++ b/arch/x86/lib/delay_64.c
> @@ -31,14 +31,36 @@ int __devinit read_current_timer(unsigned long *timer_value)
> void __delay(unsigned long loops)
> {
> unsigned bclock, now;
> + int cpu;
>
> - preempt_disable(); /* TSC's are pre-cpu */
> + preempt_disable();
> + cpu = smp_processor_id();
> rdtscl(bclock);
> - do {
> - rep_nop();
> + for (;;) {
> rdtscl(now);
> + if ((now - bclock) >= loops)
> + break;
> +
> + /* Allow RT tasks to run */
> + preempt_enable();
> + rep_nop();
> + preempt_disable();
> +
> + /*
> + * It is possible that we moved to another CPU, and
> + * since TSC's are per-cpu we need to calculate
> + * that. The delay must guarantee that we wait "at
> + * least" the amount of time. Being moved to another
> + * CPU could make the wait longer but we just need to
> + * make sure we waited long enough. Rebalance the
> + * counter for this CPU.
> + */
> + if (unlikely(cpu != smp_processor_id())) {
> + loops -= (now - bclock);
> + cpu = smp_processor_id();
> + rdtscl(bclock);
> + }
> }
> - while ((now-bclock) < loops);
> preempt_enable();
> }
> EXPORT_SYMBOL(__delay);
>
* Marin Mitov <[email protected]> wrote:
> > i think Thomas had a concern with the original fix - forgot the
> > details.
>
> Here they are:
>
> http://lkml.org/lkml/2008/5/25/251
>
> but see my comment to it.
>
> There is no principal difference between both patches. I have seen
> Steven's one as merged in linux-2.6.26-rc5. The only difference (if it
> matters of all) is that in mine patch
> preempt_disable()/preempt_enable() sections are shorter and protect
> only the code that must be protected:
>
> preempt_disable()
> rdtscl()
> smp_processor_id()
> preempt_enable()
>
> As far as Steven's patch is already merged - let it be :-)
we could still merge your enhancements as well ontop of Steven's, if you
would like to pursue it. If so then please send a patch against -rc5 or
against -tip. Reducing the length of preempt-off sections is a fair
goal.
Ingo
On Monday 09 June 2008 07:16:06 pm Ingo Molnar wrote:
> > There is no principal difference between both patches. I have seen
> > Steven's one as merged in linux-2.6.26-rc5. The only difference (if it
> > matters of all) is that in mine patch
> > preempt_disable()/preempt_enable() sections are shorter and protect
> > only the code that must be protected:
> >
> > preempt_disable()
> > rdtscl()
> > smp_processor_id()
> > preempt_enable()
> >
> > As far as Steven's patch is already merged - let it be :-)
>
> we could still merge your enhancements as well ontop of Steven's, if you
> would like to pursue it. If so then please send a patch against -rc5 or
> against -tip. Reducing the length of preempt-off sections is a fair
> goal.
>
> Ingo
>
Well, the patch is below (against 2.6.26-rc6), but I would really
appreciate your comments.
The difference is that in Steven's version the loop is running
mainly with preemption disabled, except for:
preempt_enable();
rep_nop();
preempt_disable();
while in the proposed version the loop is running with
preemption enabled, except for the part:
preempt_disable();
rdtscl(prev_1);
cpu = smp_processor_id();
rdtscl(now);
preempt_enable();
I do realize that this is not a big difference, since the loop
execution time is quite short.
In fact, in the case of TSCs the problem is not the preemption
itself, but the migration to another CPU after the preemption.
Why not something like that (do keep in mind I am not
an expert :-):
static void delay_tsc(unsigned long loops)
{
get and store the mask of allowed cpus;
/* prevent the migration */
set the mask of allowed cpus to the current cpu only;
/* is it possible? could it be guaranteed? */
loop for the delay;
restore the old mask of allowed cpus;
}
You have got the idea. Could it be realized? Is it more expensive
than the current realization? So, comments, please.
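In real code I imagine roughly the following (just a sketch against the
2.6.26-era cpumask_t interface, completely untested):

static void delay_tsc(unsigned long loops)
{
        /* Save the current affinity so we can restore it afterwards. */
        cpumask_t saved_mask = current->cpus_allowed;
        unsigned long bclock, now;

        /*
         * Pin ourselves to whatever CPU we happen to run on; note we can
         * in principle still be migrated between reading the CPU id and
         * the set_cpus_allowed() call, so this is only the rough idea.
         * Error handling is omitted.
         */
        set_cpus_allowed(current, cpumask_of_cpu(raw_smp_processor_id()));

        rdtscl(bclock);
        do {
                rep_nop();
                rdtscl(now);
        } while ((now - bclock) < loops);

        /* Restore the original affinity mask. */
        set_cpus_allowed(current, saved_mask);
}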
Regards,
Marin Mitov
Signed-off-by: Marin Mitov <[email protected]>
=========================================
--- a/arch/x86/lib/delay_32.c 2008-06-15 11:04:05.000000000 +0300
+++ b/arch/x86/lib/delay_32.c 2008-06-15 11:09:45.000000000 +0300
@@ -40,41 +40,42 @@ static void delay_loop(unsigned long loo
:"0" (loops));
}
-/* TSC based delay: */
+/* TSC based delay:
+ *
+ * We are careful about preemption as TSC's are per-CPU.
+ */
static void delay_tsc(unsigned long loops)
{
- unsigned long bclock, now;
- int cpu;
+ unsigned prev, prev_1, now;
+ unsigned left = loops;
+ unsigned prev_cpu, cpu;
preempt_disable();
- cpu = smp_processor_id();
- rdtscl(bclock);
- for (;;) {
- rdtscl(now);
- if ((now - bclock) >= loops)
- break;
+ rdtscl(prev);
+ prev_cpu = smp_processor_id();
+ preempt_enable();
+ now = prev;
- /* Allow RT tasks to run */
- preempt_enable();
+ do {
rep_nop();
+
+ left -= now - prev;
+ prev = now;
+
preempt_disable();
+ rdtscl(prev_1);
+ cpu = smp_processor_id();
+ rdtscl(now);
+ preempt_enable();
- /*
- * It is possible that we moved to another CPU, and
- * since TSC's are per-cpu we need to calculate
- * that. The delay must guarantee that we wait "at
- * least" the amount of time. Being moved to another
- * CPU could make the wait longer but we just need to
- * make sure we waited long enough. Rebalance the
- * counter for this CPU.
- */
- if (unlikely(cpu != smp_processor_id())) {
- loops -= (now - bclock);
- cpu = smp_processor_id();
- rdtscl(bclock);
+ if (prev_cpu != cpu) {
+ /*
+ * We have migrated, forget prev_cpu's tsc reading
+ */
+ prev = prev_1;
+ prev_cpu = cpu;
}
- }
- preempt_enable();
+ } while ((now-prev) < left);
}
/*
--- a/arch/x86/lib/delay_64.c 2008-06-15 11:04:17.000000000 +0300
+++ b/arch/x86/lib/delay_64.c 2008-06-15 11:09:52.000000000 +0300
@@ -28,40 +28,42 @@ int __devinit read_current_timer(unsigne
return 0;
}
+/* TSC based delay:
+ *
+ * We are careful about preemption as TSC's are per-CPU.
+ */
void __delay(unsigned long loops)
{
- unsigned bclock, now;
- int cpu;
+ unsigned prev, prev_1, now;
+ unsigned left = loops;
+ unsigned prev_cpu, cpu;
preempt_disable();
- cpu = smp_processor_id();
- rdtscl(bclock);
- for (;;) {
- rdtscl(now);
- if ((now - bclock) >= loops)
- break;
+ rdtscl(prev);
+ prev_cpu = smp_processor_id();
+ preempt_enable();
+ now = prev;
- /* Allow RT tasks to run */
- preempt_enable();
+ do {
rep_nop();
+
+ left -= now - prev;
+ prev = now;
+
preempt_disable();
+ rdtscl(prev_1);
+ cpu = smp_processor_id();
+ rdtscl(now);
+ preempt_enable();
- /*
- * It is possible that we moved to another CPU, and
- * since TSC's are per-cpu we need to calculate
- * that. The delay must guarantee that we wait "at
- * least" the amount of time. Being moved to another
- * CPU could make the wait longer but we just need to
- * make sure we waited long enough. Rebalance the
- * counter for this CPU.
- */
- if (unlikely(cpu != smp_processor_id())) {
- loops -= (now - bclock);
- cpu = smp_processor_id();
- rdtscl(bclock);
+ if (prev_cpu != cpu) {
+ /*
+ * We have migrated, forget prev_cpu's tsc reading
+ */
+ prev = prev_1;
+ prev_cpu = cpu;
}
- }
- preempt_enable();
+ } while ((now-prev) < left);
}
EXPORT_SYMBOL(__delay);
* Marin Mitov <[email protected]> wrote:
> Why not something like that (do keep in mind I am not an expert :-):
>
> static void delay_tsc(unsigned long loops)
> {
> get and store the mask of allowed cpus;
> /* prevent the migration */
> set the mask of allowed cpus to the current cpu only;
> /* is it possible? could it be guaranteed? */
> loop for the delay;
> restore the old mask of allowed cpus;
> }
>
> You have got the idea. Could it be realized? Is it more expensive than
> the current realization? So, comments, please.
hm, changing/saving/restoring cpus_allowed is really considered a 'heavy'
operation compared to preempt_disable(). On a 4096 CPUs box cpus_allowed
is 4096 bits which is half a kilobyte ...
preempt_disable()/enable() on the other hand only touches a single
variable, (thread_info->preempt_count which is an u32)
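roughly speaking, with CONFIG_PREEMPT and ignoring the debug/tracing
variants, the two boil down to no more than this (a sketch, not the
literal include/linux/preempt.h):

#define preempt_disable_sketch()                                \
do {                                                            \
        current_thread_info()->preempt_count++;                 \
        barrier();                                              \
} while (0)

#define preempt_enable_sketch()                                 \
do {                                                            \
        barrier();                                              \
        if (!--current_thread_info()->preempt_count &&          \
            unlikely(test_thread_flag(TIF_NEED_RESCHED)))       \
                preempt_schedule();                             \
} while (0)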
Ingo
>>> On Wed, Jun 18, 2008 at 3:55 AM, in message <[email protected]>,
Ingo Molnar <[email protected]> wrote:
> * Marin Mitov <[email protected]> wrote:
>
>> Why not something like that (do keep in mind I am not an expert :-):
>>
>> static void delay_tsc(unsigned long loops)
>> {
>> get and store the mask of allowed cpus;
>> /* prevent the migration */
>> set the mask of allowed cpus to the current cpu only;
>> /* is it possible? could it be guaranteed? */
>> loop for the delay;
>> restore the old mask of allowed cpus;
>> }
>>
>> You have got the idea. Could it be realized? Is it more expensive than
>> the current realization? So, comments, please.
>
> hm, changing/saving/restorig cpus_allowed is really considered a 'heavy'
> operation compared to preempt_disable(). On a 4096 CPUs box cpus_allowed
> is 4096 bits which is half a kilobyte ...
>
> preempt_disable()/enable() on the other hand only touches a single
> variable, (thread_info->preempt_count which is an u32)
>
> Ingo
FWIW: I had submitted some "migration disable" patches a while back that
would solve this without the cpus_allowed manipulations described here.
It's more expensive than a preempt-disable (but it's preemptible), yet it's
way cheaper (and more correct / less racy) than changing cpus_allowed. I
could resubmit if there was any interest, though I think Ingo said he didn't
like the concept on the first pass. Anyway, FYI.
-Greg
>>> On Wed, Jun 18, 2008 at 8:08 AM, in message <[email protected]>,
"Gregory Haskins" <[email protected]> wrote:
>>>> On Wed, Jun 18, 2008 at 3:55 AM, in message <[email protected]>,
> Ingo Molnar <[email protected]> wrote:
>
>> * Marin Mitov <[email protected]> wrote:
>>
>>> Why not something like that (do keep in mind I am not an expert :-):
>>>
>>> static void delay_tsc(unsigned long loops)
>>> {
>>> get and store the mask of allowed cpus;
>>> /* prevent the migration */
>>> set the mask of allowed cpus to the current cpu only;
>>> /* is it possible? could it be guaranteed? */
>>> loop for the delay;
>>> restore the old mask of allowed cpus;
>>> }
>>>
>>> You have got the idea. Could it be realized? Is it more expensive than
>>> the current realization? So, comments, please.
>>
>> hm, changing/saving/restorig cpus_allowed is really considered a 'heavy'
>> operation compared to preempt_disable(). On a 4096 CPUs box cpus_allowed
>> is 4096 bits which is half a kilobyte ...
>>
>> preempt_disable()/enable() on the other hand only touches a single
>> variable, (thread_info->preempt_count which is an u32)
>>
>> Ingo
>
> FWIW: I had submitted some "migration disable" patches a while back that
> would solve this without the cpus_allowed manipulations described here. Its
> more expensive than a preempt-disable (but its preemptible), yet its way
> cheaper (and more correct / less racy) than chaning cpus_allowed. I could
> resubmit if there was any interest, though I think Ingo said he didnt like
> the concept on the first pass. Anyway, FYI.
Sorry, should have provided a reference:
http://lkml.org/lkml/2008/2/12/344
>
> -Greg
>
On Wed, 2008-06-18 at 06:08 -0600, Gregory Haskins wrote:
> >>> On Wed, Jun 18, 2008 at 3:55 AM, in message <[email protected]>,
> Ingo Molnar <[email protected]> wrote:
>
> > * Marin Mitov <[email protected]> wrote:
> >
> >> Why not something like that (do keep in mind I am not an expert :-):
> >>
> >> static void delay_tsc(unsigned long loops)
> >> {
> >> get and store the mask of allowed cpus;
> >> /* prevent the migration */
> >> set the mask of allowed cpus to the current cpu only;
> >> /* is it possible? could it be guaranteed? */
> >> loop for the delay;
> >> restore the old mask of allowed cpus;
> >> }
> >>
> >> You have got the idea. Could it be realized? Is it more expensive than
> >> the current realization? So, comments, please.
> >
> > hm, changing/saving/restorig cpus_allowed is really considered a 'heavy'
> > operation compared to preempt_disable(). On a 4096 CPUs box cpus_allowed
> > is 4096 bits which is half a kilobyte ...
> >
> > preempt_disable()/enable() on the other hand only touches a single
> > variable, (thread_info->preempt_count which is an u32)
> >
> > Ingo
>
> FWIW: I had submitted some "migration disable" patches a while back
> that would solve this without the cpus_allowed manipulations described
> here. Its more expensive than a preempt-disable (but its
> preemptible), yet its way cheaper (and more correct / less racy) than
> chaning cpus_allowed. I could resubmit if there was any interest,
> though I think Ingo said he didnt like the concept on the first pass.
> Anyway, FYI.
(please teach your mailer to wrap text)
Yeah - migrate_disable() has been proposed several times. The reason I
don't like it is that it creates scheduling artefacts like latencies by
not being able to load-balance (and thereby complicates all that, and
you know we don't need more complication there).
Things like preempt latency and irq off latency are rather easy to
notice and debug in general. migrate_disable() would be fully
preemptable/schedulable which makes it much much harder to instrument or
even detect we have an issue. Which in turn makes it much harder to find
abuse.
>>> On Wed, Jun 18, 2008 at 8:16 AM, in message
<1213791416.16944.222.camel@twins>, Peter Zijlstra <[email protected]>
wrote:
> On Wed, 2008-06-18 at 06:08 -0600, Gregory Haskins wrote:
>> >>> On Wed, Jun 18, 2008 at 3:55 AM, in message <[email protected]>,
>> Ingo Molnar <[email protected]> wrote:
>>
>> > * Marin Mitov <[email protected]> wrote:
>> >
>> >> Why not something like that (do keep in mind I am not an expert :-):
>> >>
>> >> static void delay_tsc(unsigned long loops)
>> >> {
>> >> get and store the mask of allowed cpus;
>> >> /* prevent the migration */
>> >> set the mask of allowed cpus to the current cpu only;
>> >> /* is it possible? could it be guaranteed? */
>> >> loop for the delay;
>> >> restore the old mask of allowed cpus;
>> >> }
>> >>
>> >> You have got the idea. Could it be realized? Is it more expensive than
>> >> the current realization? So, comments, please.
>> >
>> > hm, changing/saving/restorig cpus_allowed is really considered a 'heavy'
>> > operation compared to preempt_disable(). On a 4096 CPUs box cpus_allowed
>> > is 4096 bits which is half a kilobyte ...
>> >
>> > preempt_disable()/enable() on the other hand only touches a single
>> > variable, (thread_info->preempt_count which is an u32)
>> >
>> > Ingo
>>
>> FWIW: I had submitted some "migration disable" patches a while back
>> that would solve this without the cpus_allowed manipulations described
>> here. Its more expensive than a preempt-disable (but its
>> preemptible), yet its way cheaper (and more correct / less racy) than
>> chaning cpus_allowed. I could resubmit if there was any interest,
>> though I think Ingo said he didnt like the concept on the first pass.
>> Anyway, FYI.
>
> (please teach your mailer to wrap text)
Sorry... I know it's really annoying, but I have no control over it in GroupWise :(
It's a server side / MTA setting. Go figure. I will try to wrap manually.
>
> Yeah - migrate_disable() has been proposed several times. The reason I
> don't like it is that is creates scheduling artefacts like latencies by
> not being able to load-balance (and thereby complicates all that, and
> you know we don't need more complication there).
True, and good point. But this concept would certainly be useful to avoid
the heavyweight (w.r.t. latency) preempt-disable() in quite a few different
areas, so if we can make it work with reasonable visibility, it might be nice
to have.
>
> Things like preempt latency and irq off latency are rather easy to
> notice and debug in general. migrate_disable() would be fully
> preemptable/schedulable which makes it much much harder to instrument or
> even detect we have an issue. Which in turn makes it much harder to find
> abuse.
I wonder if we can come up with any creative instrumentation to get coverage
in this case. I will think about it and add it to the migration-disable queue I
have, to be submitted together (unless Ingo et al. feel strongly that it will
never be accepted even with good instrumentation... then I will not waste
any effort on it).
Regards,
-Greg
On Wednesday 18 June 2008 22:25, Gregory Haskins wrote:
> >>> On Wed, Jun 18, 2008 at 8:16 AM, in message
> > Yeah - migrate_disable() has been proposed several times. The reason I
> > don't like it is that is creates scheduling artefacts like latencies by
> > not being able to load-balance (and thereby complicates all that, and
> > you know we don't need more complication there).
>
> True, and good point. But this concept would certainly be useful to avoid
> the heavyweight (w.r.t. latency) preempt-disable() in quite a few different
> areas, so if we can make it work with reasonable visibility, it might be
> nice to have.
It just seems like pretty worthless bloat to me.
There are _some_ cases where it can be used, but nobody has been
able to come up with compelling uses really. I don't think this
case is helped very much either because the logic in there using
preempt-disable is fine, isn't it?
Except that it should also have a cond_resched in it. Seems like
an ideal place to put cond_resched because it is not a fastpath.
>>> On Wed, Jun 18, 2008 at 8:42 AM, in message
<[email protected]>, Nick Piggin
<[email protected]> wrote:
> On Wednesday 18 June 2008 22:25, Gregory Haskins wrote:
>> >>> On Wed, Jun 18, 2008 at 8:16 AM, in message
>
>> > Yeah - migrate_disable() has been proposed several times. The reason I
>> > don't like it is that is creates scheduling artefacts like latencies by
>> > not being able to load-balance (and thereby complicates all that, and
>> > you know we don't need more complication there).
>>
>> True, and good point. But this concept would certainly be useful to avoid
>> the heavyweight (w.r.t. latency) preempt-disable() in quite a few different
>> areas, so if we can make it work with reasonable visibility, it might be
>> nice to have.
>
> It just seems like pretty worthless bloat to me.
>
> There are _some_ cases where it can be used, but nobody has been
> able to come up with compelling uses really.
Well, I have some ongoing R&D which (I believe) *would* make compelling use of a
migration-disable feature. But to date it is not ready to see the light of day. As
far as existing uses go, I think I mostly agree with you. The one argument to the
contrary would be to clean up the handful of places that implement an ad-hoc
migration-disable by messing with cpus_allowed in a similar manner. But perhaps
those could be solved with a preempt-disable() as well.
>I don't think this
> case is helped very much either because the logic in there using
> preempt-disable is fine, isn't it?
You are probably right. In my own defense, I was just responding to the
question about manipulating the cpus_allowed. If you are going to do that
you are better off with my patch (IMO). Changing cpus_allowed to prevent
migration is racy, among other things. Whether this tsc code is optimal with
migration disabled or preemption-disabled is a separate matter which
I did not address.
>
> Except that it should also have a cond_resched in it. Seems like
> an ideal place to put cond_resched because it is not a fastpath.
Seems reasonable to me. Thanks Nick.
-Greg
On Wed, 18 Jun 2008, Nick Piggin wrote:
> There are _some_ cases where it can be used, but nobody has been
> able to come up with compelling uses really. I don't think this
> case is helped very much either because the logic in there using
> preempt-disable is fine, isn't it?
>
> Except that it should also have a cond_resched in it. Seems like
> an ideal place to put cond_resched because it is not a fastpath.
>
Does it really need a cond_resched? preempt_enable when it goes to zero
will already check to see if it should schedule.
-- Steve
On Thursday 19 June 2008 02:16, Steven Rostedt wrote:
> On Wed, 18 Jun 2008, Nick Piggin wrote:
> > There are _some_ cases where it can be used, but nobody has been
> > able to come up with compelling uses really. I don't think this
> > case is helped very much either because the logic in there using
> > preempt-disable is fine, isn't it?
> >
> > Except that it should also have a cond_resched in it. Seems like
> > an ideal place to put cond_resched because it is not a fastpath.
>
> Does it really need a cond_resched? preempt_enable when it goes to zero
> will already check to see if it should schedule.
It doesn't really need one, but of course having one would help
non-preempt kernels.
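Something like this in the window where preemption gets re-enabled, say
(just a sketch, untested, and assuming the caller is actually allowed to
sleep at that point):

                /* Allow RT tasks to run */
                preempt_enable();
                /* Give !CONFIG_PREEMPT kernels a scheduling point too;
                 * cond_resched() is nearly free unless need_resched()
                 * is set. */
                cond_resched();
                rep_nop();
                preempt_disable();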
On Wed 2008-06-18 09:55:18, Ingo Molnar wrote:
>
> * Marin Mitov <[email protected]> wrote:
>
> > Why not something like that (do keep in mind I am not an expert :-):
> >
> > static void delay_tsc(unsigned long loops)
> > {
> > get and store the mask of allowed cpus;
> > /* prevent the migration */
> > set the mask of allowed cpus to the current cpu only;
> > /* is it possible? could it be guaranteed? */
> > loop for the delay;
> > restore the old mask of allowed cpus;
> > }
> >
> > You have got the idea. Could it be realized? Is it more expensive than
> > the current realization? So, comments, please.
>
> hm, changing/saving/restorig cpus_allowed is really considered a 'heavy'
> operation compared to preempt_disable(). On a 4096 CPUs box cpus_allowed
> is 4096 bits which is half a kilobyte ...
Time to RLE the bitmap?
<runs> Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html