2010-01-31 21:12:22

by Mathieu Desnoyers

Subject: [patch 1/3] Create spin lock/spin unlock with distinct memory barrier

Create the primitive family:

spin_lock__no_acquire
spin_unlock__no_release
spin_lock_irq__no_acquire
spin_unlock_irq__no_release

raw_spin_lock__no_acquire
raw_spin_unlock__no_release
raw_spin_lock_irq__no_acquire
raw_spin_unlock_irq__no_release

smp_acquire__after_spin_lock()
smp_release__before_spin_unlock()
smp_mb__after_spin_lock()
smp_mb__before_spin_unlock()

This family of primitives makes it possible to combine the acquire/release
ordering implied by spin lock/unlock with full memory barriers placed just
after the spin lock and just before the spin unlock. Combining these memory
barriers reduces the overhead.

The first user of these primitives is the scheduler code, where a full memory
barrier is wanted before and after runqueue data structure modifications so
these can be read safely by sys_membarrier without holding the rq lock.
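
For illustration only (not part of this patch), the intended usage pattern in
the scheduler would look roughly like the sketch below; "rq" and the data it
protects are stand-ins:

	/*
	 * Sketch: the __no_acquire/__no_release variants are allowed to drop
	 * the implicit acquire/release ordering, and the explicit barriers
	 * supply full smp_mb() ordering at both ends of the critical section.
	 */
	raw_spin_lock__no_acquire(&rq->lock);
	smp_mb__after_spin_lock();		/* full barrier; free on x86 */

	/* ... update runqueue data later read by sys_membarrier() ... */

	smp_mb__before_spin_unlock();		/* full barrier before releasing */
	raw_spin_unlock__no_release(&rq->lock);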

Replacing the spin lock acquire/release barriers with these memory barriers
implies either no overhead (the x86 spinlock atomic instruction already implies
a full mb) or a hopefully small overhead caused by upgrading the spinlock
acquire/release barriers to the more heavyweight smp_mb().

The "generic" version of spinlock-mb.h declares both a mapping to standard
spinlocks and full memory barriers. Each architecture can specialize this header
following their own need and declare CONFIG_HAVE_SPINLOCK_MB to use their own
spinlock-mb.h. This initial implementation only specializes the x86
architecture.

Signed-off-by: Mathieu Desnoyers <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: "Paul E. McKenney" <[email protected]>
CC: Nicholas Miell <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/spinlock-mb.h | 28 ++++++++++++++++++++++++++++
include/asm-generic/spinlock-mb.h | 27 +++++++++++++++++++++++++++
include/linux/spinlock.h | 6 ++++++
init/Kconfig | 3 +++
5 files changed, 65 insertions(+)

Index: linux-2.6-lttng/include/asm-generic/spinlock-mb.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-generic/spinlock-mb.h 2010-01-31 15:12:16.000000000 -0500
@@ -0,0 +1,27 @@
+#ifndef ASM_GENERIC_SPINLOCK_MB_H
+#define ASM_GENERIC_SPINLOCK_MB_H
+
+/*
+ * Generic spinlock-mb mappings. Use standard spinlocks with acquire/release
+ * semantics, and define the associated memory barriers as full memory barriers.
+ */
+
+#define spin_lock__no_acquire spin_lock
+#define spin_unlock__no_release spin_unlock
+
+#define spin_lock_irq__no_acquire spin_lock_irq
+#define spin_unlock_irq__no_release spin_unlock_irq
+
+#define raw_spin_lock__no_acquire raw_spin_lock
+#define raw_spin_unlock__no_release raw_spin_unlock
+
+#define raw_spin_lock_irq__no_acquire raw_spin_lock_irq
+#define raw_spin_unlock_irq__no_release raw_spin_unlock_irq
+
+#define smp_acquire__after_spin_lock() do { } while (0)
+#define smp_release__before_spin_unlock() do { } while (0)
+
+#define smp_mb__after_spin_lock() smp_mb()
+#define smp_mb__before_spin_unlock() smp_mb()
+
+#endif /* ASM_GENERIC_SPINLOCK_MB_H */
Index: linux-2.6-lttng/include/linux/spinlock.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/spinlock.h 2010-01-31 14:59:42.000000000 -0500
+++ linux-2.6-lttng/include/linux/spinlock.h 2010-01-31 15:00:40.000000000 -0500
@@ -393,4 +393,10 @@ extern int _atomic_dec_and_lock(atomic_t
#define atomic_dec_and_lock(atomic, lock) \
__cond_lock(lock, _atomic_dec_and_lock(atomic, lock))

+#ifdef CONFIG_HAVE_SPINLOCK_MB
+# include <asm/spinlock-mb.h>
+#else
+# include <asm-generic/spinlock-mb.h>
+#endif
+
#endif /* __LINUX_SPINLOCK_H */
Index: linux-2.6-lttng/arch/x86/include/asm/spinlock-mb.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/x86/include/asm/spinlock-mb.h 2010-01-31 15:11:28.000000000 -0500
@@ -0,0 +1,28 @@
+#ifndef ASM_X86_SPINLOCK_MB_H
+#define ASM_X86_SPINLOCK_MB_H
+
+/*
+ * X86 spinlock-mb mappings. Use standard spinlocks with acquire/release
+ * semantics. Associated memory barriers are defined as no-ops, because the
+ * spinlock LOCK-prefixed atomic operations imply a full memory barrier.
+ */
+
+#define spin_lock__no_acquire spin_lock
+#define spin_unlock__no_release spin_unlock
+
+#define spin_lock_irq__no_acquire spin_lock_irq
+#define spin_unlock_irq__no_release spin_unlock_irq
+
+#define raw_spin_lock__no_acquire raw_spin_lock
+#define raw_spin_unlock__no_release raw_spin_unlock
+
+#define raw_spin_lock_irq__no_acquire raw_spin_lock_irq
+#define raw_spin_unlock_irq__no_release raw_spin_unlock_irq
+
+#define smp_acquire__after_spin_lock() do { } while (0)
+#define smp_release__before_spin_unlock() do { } while (0)
+
+#define smp_mb__after_spin_lock() do { } while (0)
+#define smp_mb__before_spin_unlock() do { } while (0)
+
+#endif /* ASM_X86_SPINLOCK_MB_H */
Index: linux-2.6-lttng/arch/x86/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86/Kconfig 2010-01-31 14:59:32.000000000 -0500
+++ linux-2.6-lttng/arch/x86/Kconfig 2010-01-31 15:01:09.000000000 -0500
@@ -55,6 +55,7 @@ config X86
select ANON_INODES
select HAVE_ARCH_KMEMCHECK
select HAVE_USER_RETURN_NOTIFIER
+ select HAVE_SPINLOCK_MB

config OUTPUT_FORMAT
string
Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig 2010-01-31 14:59:42.000000000 -0500
+++ linux-2.6-lttng/init/Kconfig 2010-01-31 15:00:40.000000000 -0500
@@ -320,6 +320,9 @@ config AUDIT_TREE
depends on AUDITSYSCALL
select INOTIFY

+config HAVE_SPINLOCK_MB
+ def_bool n
+
menu "RCU Subsystem"

choice

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68


2010-02-01 08:30:32

by Nick Piggin

Subject: Re: [patch 1/3] Create spin lock/spin unlock with distinct memory barrier

On Sun, Jan 31, 2010 at 03:52:55PM -0500, Mathieu Desnoyers wrote:
> +/*
> + * X86 spinlock-mb mappings. Use standard spinlocks with acquire/release
> + * semantics. Associated memory barriers are defined as no-ops, because the
> + * spinlock LOCK-prefixed atomic operations imply a full memory barrier.
> + */
> +
> +#define spin_lock__no_acquire spin_lock
> +#define spin_unlock__no_release spin_unlock
> +
> +#define spin_lock_irq__no_acquire spin_lock_irq
> +#define spin_unlock_irq__no_release spin_unlock_irq
> +
> +#define raw_spin_lock__no_acquire raw_spin_lock
> +#define raw_spin_unlock__no_release raw_spin_unlock
> +
> +#define raw_spin_lock_irq__no_acquire raw_spin_lock_irq
> +#define raw_spin_unlock_irq__no_release raw_spin_unlock_irq
> +
> +#define smp_acquire__after_spin_lock() do { } while (0)
> +#define smp_release__before_spin_unlock() do { } while (0)
> +
> +#define smp_mb__after_spin_lock() do { } while (0)
> +#define smp_mb__before_spin_unlock() do { } while (0)

Oh, and that one's wrong. Loads can pass spin_unlock on x86, so it
needs to be smp_mb().
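
A rough sketch of the reordering I mean (illustrative, made-up variables):

	ACCESS_ONCE(rq_data) = 1;		/* store inside the critical section */
	smp_mb__before_spin_unlock();		/* no-op in this patch on x86 */
	spin_unlock__no_release(&lock);		/* plain store, release ordering only */
	r = ACCESS_ONCE(remote_flag);		/* this load can complete before the
						   stores above become visible */

The unlock is a plain store (no LOCK prefix) on x86, and x86 does allow a
later load to pass an earlier store, so a full barrier here needs a real
smp_mb().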

2010-02-01 08:30:40

by Nick Piggin

Subject: Re: [patch 1/3] Create spin lock/spin unlock with distinct memory barrier

On Sun, Jan 31, 2010 at 03:52:55PM -0500, Mathieu Desnoyers wrote:
> Create the primitive family:
>
> spin_lock__no_acquire
> spin_unlock__no_release
> spin_lock_irq__no_acquire
> spin_unlock_irq__no_release
>
> raw_spin_lock__no_acquire
> raw_spin_unlock__no_release
> raw_spin_lock_irq__no_acquire
> raw_spin_unlock_irq__no_release
>
> smp_acquire__after_spin_lock()
> smp_release__before_spin_unlock()
> smp_mb__after_spin_lock()
> smp_mb__before_spin_unlock()

Wow, someone who likes micro optimising things as much as I do.
However, these have the wrong names.

smp_mb__after_x() means that calling that function after calling x()
will give a smp_mb(), right?

With your functions, this is giving a smp_mb() after calling
x__no_acquire().
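
That is, roughly (made-up lock variable):

	/* expected convention */
	spin_lock(&lock);
	smp_mb__after_spin_lock();	/* upgrades the acquire to a full mb */

	/* what this patch actually pairs it with */
	spin_lock__no_acquire(&lock);
	smp_mb__after_spin_lock();	/* barrier after the __no_acquire variant */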

I would suggest maybe just not bothering with the __no_acquire
__no_release variants of spin locks, and sticking with the expected
semantics for the new smp_mb__xxx functions. x86 still gets the
full benefit.

But, I don't know if this is even worthwhile, given where you are
using it.

2010-02-01 14:08:21

by Mathieu Desnoyers

Subject: Re: [patch 1/3] Create spin lock/spin unlock with distinct memory barrier

* Nick Piggin ([email protected]) wrote:
> On Sun, Jan 31, 2010 at 03:52:55PM -0500, Mathieu Desnoyers wrote:
> > Create the primitive family:
> >
> > spin_lock__no_acquire
> > spin_unlock__no_release
> > spin_lock_irq__no_acquire
> > spin_unlock_irq__no_release
> >
> > raw_spin_lock__no_acquire
> > raw_spin_unlock__no_release
> > raw_spin_lock_irq__no_acquire
> > raw_spin_unlock_irq__no_release
> >
> > smp_acquire__after_spin_lock()
> > smp_release__before_spin_unlock()
> > smp_mb__after_spin_lock()
> > smp_mb__before_spin_unlock()
>
> Wow, someone who likes micro optimising things as much as I do.

:-)

> However, these have the wrong names.
>
> smp_mb__after_x() means that calling that function after calling x()
> will give a smp_mb(), right?

Supposed to.

>
> With your functions, this is giving a smp_mb() after calling
> x__no_acquire().

Good point.

>
> I would suggest maybe just not bothering with the __no_acquire
> __no_release variants of spin locks, and sticking with the expected
> semantics for the new smp_mb__xxx functions. x86 still gets the
> full benefit.

Well, most other architectures will gain from modifying the spin
lock/unlock itself and adding the full memory barriers separately; x86
is a special case here.

>
> But, I don't know if this is even worthwhile, given where you are
> using it.

Given that it's used to modify the scheduler fast path, I try to keep it
as fast as possible. But as you point out later in the thread, we might
just consider taking all the rq locks one after another when issuing a
sys_membarrier() instead. I did these modifications mostly to please
Peter, who is concerned about the impact of taking these rq locks very
frequently in a DoS scenario (which, as you appropriately point out,
would not be the first such case in the kernel, and would actually
generate a RoS rather than a DoS).

Thanks,

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2010-02-01 14:15:14

by Mathieu Desnoyers

Subject: Re: [patch 1/3] Create spin lock/spin unlock with distinct memory barrier

* Nick Piggin ([email protected]) wrote:
> On Sun, Jan 31, 2010 at 03:52:55PM -0500, Mathieu Desnoyers wrote:
> > +/*
> > + * X86 spinlock-mb mappings. Use standard spinlocks with acquire/release
> > + * semantics. Associated memory barriers are defined as no-ops, because the
> > + * spinlock LOCK-prefixed atomic operations imply a full memory barrier.
> > + */
> > +
> > +#define spin_lock__no_acquire spin_lock
> > +#define spin_unlock__no_release spin_unlock
> > +
> > +#define spin_lock_irq__no_acquire spin_lock_irq
> > +#define spin_unlock_irq__no_release spin_unlock_irq
> > +
> > +#define raw_spin_lock__no_acquire raw_spin_lock
> > +#define raw_spin_unlock__no_release raw_spin_unlock
> > +
> > +#define raw_spin_lock_irq__no_acquire raw_spin_lock_irq
> > +#define raw_spin_unlock_irq__no_release raw_spin_unlock_irq
> > +
> > +#define smp_acquire__after_spin_lock() do { } while (0)
> > +#define smp_release__before_spin_unlock() do { } while (0)
> > +
> > +#define smp_mb__after_spin_lock() do { } while (0)
> > +#define smp_mb__before_spin_unlock() do { } while (0)
>
> Oh, and that one's wrong. Loads can pass spin_unlock on x86, so it
> needs to be smp_mb().
>

Good catch! On x86 the unlock normally has no LOCK prefix (it is a plain
store), so smp_mb__before_spin_unlock() indeed needs to stay a real smp_mb():

#if defined(CONFIG_X86_32) && \
(defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE))
/*
* On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
* (PPro errata 66, 92)
*/
# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
#else
# define UNLOCK_LOCK_PREFIX
#endif

Thanks,

Mathieu


--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2010-02-01 15:23:45

by Linus Torvalds

Subject: Re: [patch 1/3] Create spin lock/spin unlock with distinct memory barrier



On Sun, 31 Jan 2010, Mathieu Desnoyers wrote:
>
> Create the primitive family:
>
> spin_lock__no_acquire
> spin_unlock__no_release
> spin_lock_irq__no_acquire
> spin_unlock_irq__no_release

I really really hate this patch.

Locking is really hard to get right anyway, you already had one bug in
this, and on the major architectures, none of this will matter _anyway_,
since you cannot split the acquire from the lock and the release from the
unlock.

So the only operation that actually makes any sense at all would be
"smp_mb__after_spin_lock" (and no new spinlock primitives at all), since
it can be optimized away on x86 (and then add "smp_mb__before_spin_unlock"
just to make them symmetric). But even that one is very dubious:

"The first user of these primitives is the scheduler code, where a full
memory barrier is wanted before and after runqueue data structure
modifications so these can be read safely by sys_membarrier without
holding the rq lock."

what kind of crazy model is this? That sounds insane. Locking combined
with some new random optimistic usage whose performance-criticality we
don't even know? Mixing locking and non-locking is simply wrong. Why
would it be any safer to read whatever the lock protects by adding smp
barriers at the lock?

If you need other smp barriers at the lock, then what about the non-locked
accesses _while_ the lock is held? You get no ordering guarantees there.
The whole thing sounds highly dubious.

And all of this for something that is a new system call that nobody
actually uses? To optimize the new and experimental path with some insane
lockfree model, while making the core kernel more complex? A _very_
strong NAK from me.

Linus

2010-02-01 15:41:43

by Steven Rostedt

Subject: Re: [patch 1/3] Create spin lock/spin unlock with distinct memory barrier

On Mon, 2010-02-01 at 07:22 -0800, Linus Torvalds wrote:
>

> If you need other smp barriers at the lock, then what about the non-locked
> accesses _while_ the lock is held? You get no ordering guarantees there.
> The whole thing sounds highly dubious.

The issue is not about protecting data; it is about ordering an update
of a variable (mm_cpumask) with respect to scheduling. The lock was just
a convenient place to add this ordering. The memory barriers here would
allow the syscall to use memory barriers instead of locks.

>
> And all of this for something that is a new system call that nobody
> actually uses? To optimize the new and experimental path with some insane
> lockfree model, while making the core kernel more complex? A _very_
> strong NAK from me.

I totally agree with this. The updates here came from the fear that
grabbing all rq spinlocks (one at a time) from a syscall would open up a
DoS (or, as Nick said, a RoS - Reduction of Service). If someone called
this syscall within a while(1) loop on some box with a large number of
CPUs, it could cause cache thrashing.

But this is all being paranoid, and not worth the complexity in the core
scheduler. We don't even know if this fear is founded.

-- Steve