2009-04-09 15:48:45

by Heiko Carstens

[permalink] [raw]
Subject: [PATCH] mutex: have non-spinning mutexes on s390 by default

From: Heiko Carstens <[email protected]>

The adaptive spinning mutexes will not always do what one would expect on
virtualized architectures like s390. Especially the cpu_relax() loop in
mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
away by the hypervisor.
We would end up in a cpu_relax() loop when there is no chance that the
state of the mutex changes until the target cpu has been scheduled again by
the hypervisor.
For that reason we should change the default behaviour to no-spin on s390.

We do have an instruction which allows to yield the current cpu in favour of
a different target cpu. Also we have an instruction which allows us to figure
out if the target cpu is physically backed.

However we need to do some performance tests until we can come up with
a solution that will do the right thing on s390.
Until then make the old behaviour default for us.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
---
arch/Kconfig | 3 +++
arch/s390/Kconfig | 1 +
kernel/sched_features.h | 4 ++++
3 files changed, 8 insertions(+)

Index: linux-2.6/arch/Kconfig
===================================================================
--- linux-2.6.orig/arch/Kconfig
+++ linux-2.6/arch/Kconfig
@@ -109,3 +109,6 @@ config HAVE_CLK

config HAVE_DMA_API_DEBUG
bool
+
+config HAVE_DEFAULT_NO_SPIN_MUTEXES
+ bool
Index: linux-2.6/arch/s390/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/Kconfig
+++ linux-2.6/arch/s390/Kconfig
@@ -82,6 +82,7 @@ config S390
select USE_GENERIC_SMP_HELPERS if SMP
select HAVE_SYSCALL_WRAPPERS
select HAVE_FUNCTION_TRACER
+ select HAVE_DEFAULT_NO_SPIN_MUTEXES
select HAVE_OPROFILE
select HAVE_KPROBES
select HAVE_KRETPROBES
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -14,4 +14,8 @@ SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
SCHED_FEAT(ASYM_EFF_LOAD, 1)
SCHED_FEAT(WAKEUP_OVERLAP, 0)
SCHED_FEAT(LAST_BUDDY, 1)
+#ifdef CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES
+SCHED_FEAT(OWNER_SPIN, 0)
+#else
SCHED_FEAT(OWNER_SPIN, 1)
+#endif


2009-04-09 15:53:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

On Thu, 2009-04-09 at 17:47 +0200, Heiko Carstens wrote:
> From: Heiko Carstens <[email protected]>
>
> The adaptive spinning mutexes will not always do what one would expect on
> virtualized architectures like s390. Especially the cpu_relax() loop in
> mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
> away by the hypervisor.
> We would end up in a cpu_relax() loop when there is no chance that the
> state of the mutex changes until the target cpu has been scheduled again by
> the hypervisor.
> For that reason we should change the default behaviour to no-spin on s390.
>
> We do have an instruction which allows to yield the current cpu in favour of
> a different target cpu. Also we have an instruction which allows us to figure
> out if the target cpu is physically backed.
>
> However we need to do some performance tests until we can come up with
> a solution that will do the right thing on s390.
> Until then make the old behaviour default for us.
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Signed-off-by: Heiko Carstens <[email protected]>
> ---
> arch/Kconfig | 3 +++
> arch/s390/Kconfig | 1 +
> kernel/sched_features.h | 4 ++++
> 3 files changed, 8 insertions(+)
>
> Index: linux-2.6/arch/Kconfig
> ===================================================================
> --- linux-2.6.orig/arch/Kconfig
> +++ linux-2.6/arch/Kconfig
> @@ -109,3 +109,6 @@ config HAVE_CLK
>
> config HAVE_DMA_API_DEBUG
> bool
> +
> +config HAVE_DEFAULT_NO_SPIN_MUTEXES
> + bool
> Index: linux-2.6/arch/s390/Kconfig
> ===================================================================
> --- linux-2.6.orig/arch/s390/Kconfig
> +++ linux-2.6/arch/s390/Kconfig
> @@ -82,6 +82,7 @@ config S390
> select USE_GENERIC_SMP_HELPERS if SMP
> select HAVE_SYSCALL_WRAPPERS
> select HAVE_FUNCTION_TRACER
> + select HAVE_DEFAULT_NO_SPIN_MUTEXES
> select HAVE_OPROFILE
> select HAVE_KPROBES
> select HAVE_KRETPROBES
> Index: linux-2.6/kernel/sched_features.h
> ===================================================================
> --- linux-2.6.orig/kernel/sched_features.h
> +++ linux-2.6/kernel/sched_features.h
> @@ -14,4 +14,8 @@ SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
> SCHED_FEAT(ASYM_EFF_LOAD, 1)
> SCHED_FEAT(WAKEUP_OVERLAP, 0)
> SCHED_FEAT(LAST_BUDDY, 1)
> +#ifdef CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES
> +SCHED_FEAT(OWNER_SPIN, 0)
> +#else
> SCHED_FEAT(OWNER_SPIN, 1)
> +#endif

Hmm, I'd rather have you'd make the whole block in __mutex_lock_common
go away on that CONFIG thingy.

Would be nice though to get something working on s390, does it have a
monitor wait like ins where it can properly sleep so that another
virtual host can run?

If so, we could possibly do a monitor wait on the lock owner field
instead of spinning.

2009-04-09 16:14:20

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

On Thu, 09 Apr 2009 17:54:56 +0200
Peter Zijlstra <[email protected]> wrote:

> > Index: linux-2.6/kernel/sched_features.h
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched_features.h
> > +++ linux-2.6/kernel/sched_features.h
> > @@ -14,4 +14,8 @@ SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
> > SCHED_FEAT(ASYM_EFF_LOAD, 1)
> > SCHED_FEAT(WAKEUP_OVERLAP, 0)
> > SCHED_FEAT(LAST_BUDDY, 1)
> > +#ifdef CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES
> > +SCHED_FEAT(OWNER_SPIN, 0)
> > +#else
> > SCHED_FEAT(OWNER_SPIN, 1)
> > +#endif
>
> Hmm, I'd rather have you'd make the whole block in __mutex_lock_common
> go away on that CONFIG thingy.

Sure, I can do that.

> Would be nice though to get something working on s390, does it have a
> monitor wait like ins where it can properly sleep so that another
> virtual host can run?
>
> If so, we could possibly do a monitor wait on the lock owner field
> instead of spinning.

What we now have: the cpu_relax() implementation on s390 will yield the
current (virtual) cpu and give it back to the hypervisor. If the
hypervisor has nothing else to schedule or for fairness reasons decides
that this cpu has to run again it will be scheduled again.
One roundtrip (exit/reenter) is quite expensive (~1200 cycles).

We also have a directed yield where you can tell the hypervisor "don't
schedule me again until cpu x was has been scheduled".

And we have an IPI (signal processer with order code sense running)
which examines if the target cpu is actually physically backed. That's
in the order of ~80 cycles. At least that's what I just measured with
two hypervisors running below my Linux kernel.

So the idea for the spinning loop would be to additionaly test if the
targer cpu is physically backed. If it is we could happily spin and
sense again, more or less like the current code does.
Now the question is what do we do if it isn't backed? Yield the current
cpu in favour of the target cpu or just make the current process sleep
and schedule a different one?
For that I'd like to have performance numbers before we go in any
direction.. It might take some time to get the numbers since it's not
easy to get a slot for performance measurements.

2009-04-09 16:49:10

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

Updated Patch below:

Subject: [PATCH] mutex: have non-spinning mutexes on s390 by default

From: Heiko Carstens <[email protected]>

The adaptive spinning mutexes will not always do what one would expect on
virtualized architectures like s390. Especially the cpu_relax() loop in
mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
away by the hypervisor.
We would end up in a cpu_relax() loop when there is no chance that the
state of the mutex changes until the target cpu has been scheduled again by
the hypervisor.
For that reason we should change the default behaviour to no-spin on s390.

We do have an instruction which allows to yield the current cpu in favour of
a different target cpu. Also we have an instruction which allows us to figure
out if the target cpu is physically backed.

However we need to do some performance tests until we can come up with
a solution that will do the right thing on s390.

Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
---
arch/Kconfig | 3 +++
arch/s390/Kconfig | 1 +
kernel/mutex.c | 3 ++-
3 files changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/Kconfig
===================================================================
--- linux-2.6.orig/arch/Kconfig
+++ linux-2.6/arch/Kconfig
@@ -109,3 +109,6 @@ config HAVE_CLK

config HAVE_DMA_API_DEBUG
bool
+
+config HAVE_DEFAULT_NO_SPIN_MUTEXES
+ bool
Index: linux-2.6/arch/s390/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/Kconfig
+++ linux-2.6/arch/s390/Kconfig
@@ -82,6 +82,7 @@ config S390
select USE_GENERIC_SMP_HELPERS if SMP
select HAVE_SYSCALL_WRAPPERS
select HAVE_FUNCTION_TRACER
+ select HAVE_DEFAULT_NO_SPIN_MUTEXES
select HAVE_OPROFILE
select HAVE_KPROBES
select HAVE_KRETPROBES
Index: linux-2.6/kernel/mutex.c
===================================================================
--- linux-2.6.orig/kernel/mutex.c
+++ linux-2.6/kernel/mutex.c
@@ -148,7 +148,8 @@ __mutex_lock_common(struct mutex *lock,

preempt_disable();
mutex_acquire(&lock->dep_map, subclass, 0, ip);
-#if defined(CONFIG_SMP) && !defined(CONFIG_DEBUG_MUTEXES)
+#if defined(CONFIG_SMP) && !defined(CONFIG_DEBUG_MUTEXES) && \
+ !defined(CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES)
/*
* Optimistic spinning.
*

2009-04-09 16:52:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

On Thu, 2009-04-09 at 18:48 +0200, Heiko Carstens wrote:
> Updated Patch below:
>
> Subject: [PATCH] mutex: have non-spinning mutexes on s390 by default
>
> From: Heiko Carstens <[email protected]>
>
> The adaptive spinning mutexes will not always do what one would expect on
> virtualized architectures like s390. Especially the cpu_relax() loop in
> mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
> away by the hypervisor.
> We would end up in a cpu_relax() loop when there is no chance that the
> state of the mutex changes until the target cpu has been scheduled again by
> the hypervisor.
> For that reason we should change the default behaviour to no-spin on s390.
>
> We do have an instruction which allows to yield the current cpu in favour of
> a different target cpu. Also we have an instruction which allows us to figure
> out if the target cpu is physically backed.
>
> However we need to do some performance tests until we can come up with
> a solution that will do the right thing on s390.
>
> Cc: Peter Zijlstra <[email protected]>

Acked-by: Peter Zijlstra <[email protected]>

I was looking at how an monitor-wait could be used here, but that
appears non-trivial, there's two variables we're watching, lock->owner
and rq->curr, either could change.

Reducing that to 1 seems an interesting problem :-)

2009-04-09 17:38:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

On Thu, 2009-04-09 at 18:53 +0200, Peter Zijlstra wrote:

> I was looking at how an monitor-wait could be used here, but that
> appears non-trivial, there's two variables we're watching, lock->owner
> and rq->curr, either could change.
>
> Reducing that to 1 seems an interesting problem :-)

How about something like this?

Does it make sense to implement an monitor-wait spinlock for the virt
case as well? Avi, Jeremy?

---
arch/Kconfig | 3 +++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/processor.h | 21 +++++++++++++++++++++
include/linux/sched.h | 2 ++
kernel/mutex.h | 1 +
kernel/sched.c | 27 ++++++++++++++++++++++++++-
6 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index dc81b34..67aa9f9 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -109,3 +109,6 @@ config HAVE_CLK

config HAVE_DMA_API_DEBUG
bool
+
+config HAVE_MUTEX_MWAIT
+ bool
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2560fff..62d378b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -47,6 +47,7 @@ config X86
select HAVE_KERNEL_LZMA
select HAVE_ARCH_KMEMCHECK
select HAVE_DMA_API_DEBUG
+ select HAVE_MUTEX_MWAIT

config ARCH_DEFCONFIG
string
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7c39de7..c2617e4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -727,6 +727,27 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
:: "a" (eax), "c" (ecx));
}

+#ifdef CONFIG_SMP
+static inline void mutex_spin_monitor(void *addr)
+{
+ if (!boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+ return;
+
+ __monitor(addr, 0, 0);
+ smp_mb();
+}
+
+static inline void mutex_spin_monitor_wait(void)
+{
+ if (!boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) {
+ cpu_relax();
+ return;
+ }
+
+ __mwait(0, 0);
+}
+#endif
+
extern void mwait_idle_with_hints(unsigned long eax, unsigned long ecx);

extern void select_idle_routine(const struct cpuinfo_x86 *c);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 25bdac7..87f945e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -342,7 +342,9 @@ extern signed long schedule_timeout_killable(signed long timeout);
extern signed long schedule_timeout_uninterruptible(signed long timeout);
asmlinkage void __schedule(void);
asmlinkage void schedule(void);
+
extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern void mutex_spin_unlock(void);

struct nsproxy;
struct user_namespace;
diff --git a/kernel/mutex.h b/kernel/mutex.h
index 67578ca..c4d2d7a 100644
--- a/kernel/mutex.h
+++ b/kernel/mutex.h
@@ -25,6 +25,7 @@ static inline void mutex_set_owner(struct mutex *lock)
static inline void mutex_clear_owner(struct mutex *lock)
{
lock->owner = NULL;
+ mutex_spin_unlock();
}
#else
static inline void mutex_set_owner(struct mutex *lock)
diff --git a/kernel/sched.c b/kernel/sched.c
index f2cf383..f15af61 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5184,6 +5184,28 @@ need_resched_nonpreemptible:
EXPORT_SYMBOL(schedule);

#ifdef CONFIG_SMP
+
+#ifndef CONFIG_HAVE_MUTEX_MWAIT
+static inline void mutex_spin_monitor(void *addr)
+{
+}
+
+static inline void mutex_spin_monitor_wait(void)
+{
+ cpu_relax();
+}
+#endif
+
+void mutex_spin_unlock(void)
+{
+ /*
+ * XXX fix the below to now require as many ins
+ */
+ preempt_disable();
+ this_rq()->curr = current;
+ preempt_enable();
+}
+
/*
* Look out! "owner" is an entirely speculative pointer
* access and not reliable.
@@ -5225,6 +5247,9 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
rq = cpu_rq(cpu);

for (;;) {
+
+ mutex_spin_monitor(&rq->curr);
+
/*
* Owner changed, break to re-assess state.
*/
@@ -5237,7 +5262,7 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
if (task_thread_info(rq->curr) != owner || need_resched())
return 0;

- cpu_relax();
+ mutex_spin_monitor_wait();
}
out:
return 1;

2009-04-09 17:50:38

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

Peter Zijlstra wrote:
> On Thu, 2009-04-09 at 18:53 +0200, Peter Zijlstra wrote:
>
>
>> I was looking at how an monitor-wait could be used here, but that
>> appears non-trivial, there's two variables we're watching, lock->owner
>> and rq->curr, either could change.
>>
>> Reducing that to 1 seems an interesting problem :-)
>>
>
> How about something like this?
>
> Does it make sense to implement an monitor-wait spinlock for the virt
> case as well? Avi, Jeremy?
>

Last time I tried to put mwait in a spinlock, Arjan (or HPA?) said that
mwait takes approx a week and a half to wake up, and that it was really
only useful for idle power savings.

Has that changed?

Aside from that, using mwait directly doesn't really help PV Xen much
(it never an available CPU feature); we'd need a higher-level hook to
implement something else to block the vcpu.

J

> ---
> arch/Kconfig | 3 +++
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/processor.h | 21 +++++++++++++++++++++
> include/linux/sched.h | 2 ++
> kernel/mutex.h | 1 +
> kernel/sched.c | 27 ++++++++++++++++++++++++++-
> 6 files changed, 54 insertions(+), 1 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index dc81b34..67aa9f9 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -109,3 +109,6 @@ config HAVE_CLK
>
> config HAVE_DMA_API_DEBUG
> bool
> +
> +config HAVE_MUTEX_MWAIT
> + bool
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 2560fff..62d378b 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -47,6 +47,7 @@ config X86
> select HAVE_KERNEL_LZMA
> select HAVE_ARCH_KMEMCHECK
> select HAVE_DMA_API_DEBUG
> + select HAVE_MUTEX_MWAIT
>
> config ARCH_DEFCONFIG
> string
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 7c39de7..c2617e4 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -727,6 +727,27 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> :: "a" (eax), "c" (ecx));
> }
>
> +#ifdef CONFIG_SMP
> +static inline void mutex_spin_monitor(void *addr)
> +{
> + if (!boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> + return;
> +
> + __monitor(addr, 0, 0);
> + smp_mb();
> +}
> +
> +static inline void mutex_spin_monitor_wait(void)
> +{
> + if (!boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) {
> + cpu_relax();
> + return;
> + }
> +
> + __mwait(0, 0);
> +}
> +#endif
> +
> extern void mwait_idle_with_hints(unsigned long eax, unsigned long ecx);
>
> extern void select_idle_routine(const struct cpuinfo_x86 *c);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 25bdac7..87f945e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -342,7 +342,9 @@ extern signed long schedule_timeout_killable(signed long timeout);
> extern signed long schedule_timeout_uninterruptible(signed long timeout);
> asmlinkage void __schedule(void);
> asmlinkage void schedule(void);
> +
> extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
> +extern void mutex_spin_unlock(void);
>
> struct nsproxy;
> struct user_namespace;
> diff --git a/kernel/mutex.h b/kernel/mutex.h
> index 67578ca..c4d2d7a 100644
> --- a/kernel/mutex.h
> +++ b/kernel/mutex.h
> @@ -25,6 +25,7 @@ static inline void mutex_set_owner(struct mutex *lock)
> static inline void mutex_clear_owner(struct mutex *lock)
> {
> lock->owner = NULL;
> + mutex_spin_unlock();
> }
> #else
> static inline void mutex_set_owner(struct mutex *lock)
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f2cf383..f15af61 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -5184,6 +5184,28 @@ need_resched_nonpreemptible:
> EXPORT_SYMBOL(schedule);
>
> #ifdef CONFIG_SMP
> +
> +#ifndef CONFIG_HAVE_MUTEX_MWAIT
> +static inline void mutex_spin_monitor(void *addr)
> +{
> +}
> +
> +static inline void mutex_spin_monitor_wait(void)
> +{
> + cpu_relax();
> +}
> +#endif
> +
> +void mutex_spin_unlock(void)
> +{
> + /*
> + * XXX fix the below to now require as many ins
> + */
> + preempt_disable();
> + this_rq()->curr = current;
> + preempt_enable();
> +}
> +
> /*
> * Look out! "owner" is an entirely speculative pointer
> * access and not reliable.
> @@ -5225,6 +5247,9 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
> rq = cpu_rq(cpu);
>
> for (;;) {
> +
> + mutex_spin_monitor(&rq->curr);
> +
> /*
> * Owner changed, break to re-assess state.
> */
> @@ -5237,7 +5262,7 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
> if (task_thread_info(rq->curr) != owner || need_resched())
> return 0;
>
> - cpu_relax();
> + mutex_spin_monitor_wait();
> }
> out:
> return 1;
>
>

2009-04-09 18:15:39

by Heiko Carstens

[permalink] [raw]
Subject: [tip:core/urgent] mutex: have non-spinning mutexes on s390 by default

Commit-ID: 36cd3c9f925b9307236505ae7ad1ad7ac4d4357c
Gitweb: http://git.kernel.org/tip/36cd3c9f925b9307236505ae7ad1ad7ac4d4357c
Author: Heiko Carstens <[email protected]>
AuthorDate: Thu, 9 Apr 2009 18:48:34 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 9 Apr 2009 19:28:24 +0200

mutex: have non-spinning mutexes on s390 by default

Impact: performance regression fix for s390

The adaptive spinning mutexes will not always do what one would expect on
virtualized architectures like s390. Especially the cpu_relax() loop in
mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
away by the hypervisor.

We would end up in a cpu_relax() loop when there is no chance that the
state of the mutex changes until the target cpu has been scheduled again by
the hypervisor.

For that reason we should change the default behaviour to no-spin on s390.

We do have an instruction which allows to yield the current cpu in favour of
a different target cpu. Also we have an instruction which allows us to figure
out if the target cpu is physically backed.

However we need to do some performance tests until we can come up with
a solution that will do the right thing on s390.

Signed-off-by: Heiko Carstens <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Christian Borntraeger <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/Kconfig | 3 +++
arch/s390/Kconfig | 1 +
kernel/mutex.c | 3 ++-
3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index dc81b34..78a35e9 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -109,3 +109,6 @@ config HAVE_CLK

config HAVE_DMA_API_DEBUG
bool
+
+config HAVE_DEFAULT_NO_SPIN_MUTEXES
+ bool
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index dcb667c..2eca5fe 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -82,6 +82,7 @@ config S390
select USE_GENERIC_SMP_HELPERS if SMP
select HAVE_SYSCALL_WRAPPERS
select HAVE_FUNCTION_TRACER
+ select HAVE_DEFAULT_NO_SPIN_MUTEXES
select HAVE_OPROFILE
select HAVE_KPROBES
select HAVE_KRETPROBES
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 5d79781..507cf2b 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -148,7 +148,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,

preempt_disable();
mutex_acquire(&lock->dep_map, subclass, 0, ip);
-#if defined(CONFIG_SMP) && !defined(CONFIG_DEBUG_MUTEXES)
+#if defined(CONFIG_SMP) && !defined(CONFIG_DEBUG_MUTEXES) && \
+ !defined(CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES)
/*
* Optimistic spinning.
*

2009-04-09 18:36:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

On Thu, 2009-04-09 at 10:50 -0700, Jeremy Fitzhardinge wrote:
> Peter Zijlstra wrote:
> > On Thu, 2009-04-09 at 18:53 +0200, Peter Zijlstra wrote:
> >
> >
> >> I was looking at how an monitor-wait could be used here, but that
> >> appears non-trivial, there's two variables we're watching, lock->owner
> >> and rq->curr, either could change.
> >>
> >> Reducing that to 1 seems an interesting problem :-)
> >>
> >
> > How about something like this?
> >
> > Does it make sense to implement an monitor-wait spinlock for the virt
> > case as well? Avi, Jeremy?
> >
>
> Last time I tried to put mwait in a spinlock, Arjan (or HPA?) said that
> mwait takes approx a week and a half to wake up, and that it was really
> only useful for idle power savings.

Yeah, sad that.

> Has that changed?

Nothing much, I was thinking perhaps it would make sense for the virt
case, but if its not properly virtualized then its pretty useless
indeed.

monitor-wait is basically a hardware/hv futex like thing, so I thought
it might help -- spinning in a guest is pretty painful.

2009-04-17 21:52:19

by folkert

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

> The adaptive spinning mutexes will not always do what one would expect on
> virtualized architectures like s390. Especially the cpu_relax() loop in
> mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
> away by the hypervisor.
> We would end up in a cpu_relax() loop when there is no chance that the
> state of the mutex changes until the target cpu has been scheduled again by
> the hypervisor.
> For that reason we should change the default behaviour to no-spin on s390.

Hmmm. Wouldn't this be a change that applies to ibm system p as wel?
E.g. with linux in an lpar.


Folkert van Heusden

--
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, http://www.vanheusden.com

2009-04-20 12:02:14

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

On Fri, 17 Apr 2009 23:42:12 +0200
Folkert van Heusden <[email protected]> wrote:

> > The adaptive spinning mutexes will not always do what one would expect on
> > virtualized architectures like s390. Especially the cpu_relax() loop in
> > mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
> > away by the hypervisor.
> > We would end up in a cpu_relax() loop when there is no chance that the
> > state of the mutex changes until the target cpu has been scheduled again by
> > the hypervisor.
> > For that reason we should change the default behaviour to no-spin on s390.
>
> Hmmm. Wouldn't this be a change that applies to ibm system p as wel?
> E.g. with linux in an lpar.

This applies to every kernel that runs under a hypervisor. However the
difference is that s390 kernels _always_ have a hypervisor, whereas
other architectures may or may not have one.
In case the mutex spinning code ist causing problem it still can be
switched at run time. That way other architectures can come up with a
solution that fits their needs best.

2009-04-20 12:04:21

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] mutex: have non-spinning mutexes on s390 by default

From: Heiko Carstens <[email protected]>
Date: Mon, 20 Apr 2009 14:01:03 +0200

> On Fri, 17 Apr 2009 23:42:12 +0200
> Folkert van Heusden <[email protected]> wrote:
>
>> > The adaptive spinning mutexes will not always do what one would expect on
>> > virtualized architectures like s390. Especially the cpu_relax() loop in
>> > mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
>> > away by the hypervisor.
>> > We would end up in a cpu_relax() loop when there is no chance that the
>> > state of the mutex changes until the target cpu has been scheduled again by
>> > the hypervisor.
>> > For that reason we should change the default behaviour to no-spin on s390.
>>
>> Hmmm. Wouldn't this be a change that applies to ibm system p as wel?
>> E.g. with linux in an lpar.
>
> This applies to every kernel that runs under a hypervisor.

I wouldn't say "every" because this change is definitely not
appropriate on sparc64 hypervisor based systems.