Use rep_nop instead of barrier for cpu_relax, following what $(SUBARCH) does
(i.e. i386 and x86_64).
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[email protected]>
---
linux-2.6.11-paolo/include/asm-um/processor-generic.h | 2 --
linux-2.6.11-paolo/include/asm-um/processor-i386.h | 8 ++++++++
linux-2.6.11-paolo/include/asm-um/processor-x86_64.h | 8 ++++++++
3 files changed, 16 insertions(+), 2 deletions(-)
diff -puN include/asm-um/processor-generic.h~uml-cpu_relax include/asm-um/processor-generic.h
--- linux-2.6.11/include/asm-um/processor-generic.h~uml-cpu_relax 2005-03-22 16:52:25.000000000 +0100
+++ linux-2.6.11-paolo/include/asm-um/processor-generic.h 2005-03-22 16:54:41.000000000 +0100
@@ -16,8 +16,6 @@ struct task_struct;
struct mm_struct;
-#define cpu_relax() barrier()
-
struct thread_struct {
int forking;
int nsyscalls;
diff -puN include/asm-um/processor-i386.h~uml-cpu_relax include/asm-um/processor-i386.h
--- linux-2.6.11/include/asm-um/processor-i386.h~uml-cpu_relax 2005-03-22 16:53:43.000000000 +0100
+++ linux-2.6.11-paolo/include/asm-um/processor-i386.h 2005-03-22 16:54:39.000000000 +0100
@@ -19,6 +19,14 @@ struct arch_thread {
#include "asm/arch/user.h"
+/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
+static inline void rep_nop(void)
+{
+ __asm__ __volatile__("rep;nop": : :"memory");
+}
+
+#define cpu_relax() rep_nop()
+
/*
* Default implementation of macro that returns current
* instruction pointer ("program counter"). Stolen
diff -puN include/asm-um/processor-x86_64.h~uml-cpu_relax include/asm-um/processor-x86_64.h
--- linux-2.6.11/include/asm-um/processor-x86_64.h~uml-cpu_relax 2005-03-22 16:56:30.000000000 +0100
+++ linux-2.6.11-paolo/include/asm-um/processor-x86_64.h 2005-03-22 16:56:32.000000000 +0100
@@ -12,6 +12,14 @@
struct arch_thread {
};
+/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
+static inline void rep_nop(void)
+{
+ __asm__ __volatile__("rep;nop": : :"memory");
+}
+
+#define cpu_relax() rep_nop()
+
#define INIT_ARCH_THREAD { }
#define current_text_addr() \
_
[email protected] wrote:
> Use rep_nop instead of barrier for cpu_relax, following what $(SUBARCH) does
> (i.e. i386 and x86_64).
IIRC, Jeff had the idea to use sched_yield() for this (from a discussion on #uml).
S390 does something similar using a special DIAG opcode that gives zVM
permission to run another guest.
On a host running many UMLs, this might improve performance.
So, I would like to propose the small patch below (it's not tested, just an idea).
Bodo
> diff -puN include/asm-um/processor-generic.h~uml-cpu_relax include/asm-um/processor-generic.h
> --- linux-2.6.11/include/asm-um/processor-generic.h~uml-cpu_relax 2005-03-22 16:52:25.000000000 +0100
> +++ linux-2.6.11-paolo/include/asm-um/processor-generic.h 2005-03-22 16:54:41.000000000 +0100
> @@ -16,7 +16,8 @@ struct task_struct;
>
> struct mm_struct;
>
> -#define cpu_relax() barrier()
> +#include "kern.h"
> +#define cpu_relax() sched_yield()
>
> struct thread_struct {
> int forking;
On Wednesday 23 March 2005 18:09, Bodo Stroesser wrote:
> [email protected] wrote:
> > Use rep_nop instead of barrier for cpu_relax, following what $(SUBARCH)
> > does (i.e. i386 and x86_64).
>
> IIRC, Jeff had the idea to use sched_yield() for this (from a discussion
> on #uml).
Hmm, makes sense, but this would need careful benchmarking... I remember from
early discussions of the 2.6 scheduler that using sched_yield might decrease
performance (IIRC it can starve the calling application).
Also, that call should go in the idle loop, not in cpu_relax, which is very
different: cpu_relax is used (for instance) in kernel/spinlock.c for spinlock
loops and similar things. The PAUSE opcode is explicitly recommended (by the
Intel manuals, I don't recall exactly why) for things like spinlock loops, and
using yield there would be bad.
> S390 does something similar using a special DIAG opcode that
> gives zVM permission to run another guest.
> On a host running many UMLs, this might improve performance.
>
> So, I would like to propose the small patch below (it's not tested, just an
> idea).
--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
http://www.user-mode-linux.org/~blaisorblade
Blaisorblade <[email protected]> wrote:
>
> On Wednesday 23 March 2005 18:09, Bodo Stroesser wrote:
> > [email protected] wrote:
> > > Use rep_nop instead of barrier for cpu_relax, following what $(SUBARCH)
> > > does (i.e. i386 and x86_64).
> >
> > IIRC, Jeff had the idea to use sched_yield() for this (from a discussion
> > on #uml).
> Hmm, makes sense, but this would need careful benchmarking... I remember from
> early discussions of the 2.6 scheduler that using sched_yield might decrease
> performance (IIRC it can starve the calling application).
yup, sched_yield() is pretty uniformly bad, and can result in heaps of
starvation if the machine is busy. Best to avoid it unless you really want
it, and have tested it thoroughly under many-tasks-busy workloads.
Blaisorblade wrote:
> On Wednesday 23 March 2005 18:09, Bodo Stroesser wrote:
>
>>[email protected] wrote:
>>
>>>Use rep_nop instead of barrier for cpu_relax, following what $(SUBARCH)
>>>does (i.e. i386 and x86_64).
>>
>>IIRC, Jeff had the idea to use sched_yield() for this (from a discussion
>>on #uml).
>
> Hmm, makes sense, but this would need careful benchmarking... I remember from
> early discussions of the 2.6 scheduler that using sched_yield might decrease
> performance (IIRC it can starve the calling application).
>
Typically, for the places where cpu_relax is used, sched_yield would be
a poor fit. So yes, it could easily reduce performance.
> Also, that call should go in the idle loop, not in cpu_relax, which is very
> different: cpu_relax is used (for instance) in kernel/spinlock.c for spinlock
> loops and similar things. The PAUSE opcode is explicitly recommended (by the
> Intel manuals, I don't recall exactly why) for things like spinlock loops, and
> using yield there would be bad.
>
The other thing is that sched_yield won't relax at all if you are the
only thing running; it will be a busy wait. So again, maybe not a great
fit for the idle loop either.