2008-10-22 08:33:48

by Petr Tesařík

Subject: [PATCH 0/2] enable irqs when waiting for rwlocks

Hello,

SGI has observed that on large systems, interrupts are not serviced for
a long period of time when waiting for a rwlock. The following patch
series re-enables irqs while waiting for the lock, resembling the code
which is already there for spinlocks.
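
To illustrate the idea, the slow path conceptually does something like
this (a simplified sketch only, not the actual ia64 code; the
irqs_were_enabled() helper stands in for the arch-specific test of the
saved flags word):

/*
 * Simplified sketch, not the actual implementation: spin with
 * interrupts re-enabled, but only if the caller's saved flags say
 * they were enabled on entry, and disable them again before each
 * retry of the atomic acquisition.
 */
static inline void sketch_read_lock_flags(raw_rwlock_t *lock,
                                          unsigned long flags)
{
        while (!__raw_read_trylock(lock)) {
                if (irqs_were_enabled(flags)) {  /* arch-specific test */
                        local_irq_enable();
                        while (!__raw_read_can_lock(lock))
                                cpu_relax();
                        local_irq_disable();
                } else {
                        while (!__raw_read_can_lock(lock))
                                cpu_relax();
                }
        }
}

The write side is analogous, spinning on __raw_write_can_lock() before
retrying.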

I only implemented the ia64 version, because the patch adds some
overhead to the fast path. In particular, I wasn't able to come up
with a reasonably lightweight implementation for x86, so I assume
there are no such large systems based on x86(_64). Of course, the
possibility of implementing raw_{read|write}_lock_flags for any
architecture is still there.

Petr Tesarik


2008-10-22 08:34:48

by Petr Tesařík

Subject: [PATCH 1/2] Allow rwlocks to re-enable interrupts

Pass the original flags to rwlock arch-code, so that it can re-enable
interrupts if implemented for that architecture.

Initially, make __raw_read_lock_flags and __raw_write_lock_flags
stubs which just do the same thing as their non-flags variants.

Signed-off-by: Petr Tesarik <[email protected]>
---
 arch/alpha/include/asm/spinlock.h    |    3 +++
 arch/arm/include/asm/spinlock.h      |    3 +++
 arch/ia64/include/asm/spinlock.h     |    3 +++
 arch/mips/include/asm/spinlock.h     |    2 ++
 arch/powerpc/include/asm/spinlock.h  |    3 +++
 arch/s390/include/asm/spinlock.h     |    3 +++
 arch/sh/include/asm/spinlock.h       |    3 +++
 arch/sparc/include/asm/spinlock_32.h |    2 ++
 arch/sparc/include/asm/spinlock_64.h |    2 ++
 include/asm-cris/arch-v32/spinlock.h |    2 ++
 include/asm-m32r/spinlock.h          |    3 +++
 include/asm-parisc/spinlock.h        |    3 +++
 include/asm-x86/spinlock.h           |    3 +++
 include/linux/spinlock.h             |    6 ++++++
 kernel/spinlock.c                    |    8 ++++++++
 15 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/arch/alpha/include/asm/spinlock.h b/arch/alpha/include/asm/spinlock.h
index aeeb125..e38fb95 100644
--- a/arch/alpha/include/asm/spinlock.h
+++ b/arch/alpha/include/asm/spinlock.h
@@ -166,6 +166,9 @@ static inline void __raw_write_unlock(raw_rwlock_t * lock)
lock->lock = 0;
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)
+
#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
#define _raw_write_relax(lock) cpu_relax()
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 2b41ebb..c13681a 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -217,6 +217,9 @@ static inline int __raw_read_trylock(raw_rwlock_t *rw)
/* read_can_lock - would read_trylock() succeed? */
#define __raw_read_can_lock(x) ((x)->lock < 0x80000000)

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)
+
#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
#define _raw_write_relax(lock) cpu_relax()
diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index 0229fb9..0a61961 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -213,6 +213,9 @@ static inline int __raw_read_trylock(raw_rwlock_t *x)
return (u32)ia64_cmpxchg4_acq((__u32 *)(x), new.word, old.word) == old.word;
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)
+
#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
#define _raw_write_relax(lock) cpu_relax()
diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index 1a1f320..de03e47 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -479,6 +479,8 @@ static inline int __raw_write_trylock(raw_rwlock_t *rw)
return ret;
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)

#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index f56a843..f6491ae 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -287,6 +287,9 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
rw->lock = 0;
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)
+
#define _raw_spin_relax(lock) __spin_yield(lock)
#define _raw_read_relax(lock) __rw_yield(lock)
#define _raw_write_relax(lock) __rw_yield(lock)
diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h
index df84ae9..f3861b0 100644
--- a/arch/s390/include/asm/spinlock.h
+++ b/arch/s390/include/asm/spinlock.h
@@ -172,6 +172,9 @@ static inline int __raw_write_trylock(raw_rwlock_t *rw)
return _raw_write_trylock_retry(rw);
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)
+
#define _raw_read_relax(lock) cpu_relax()
#define _raw_write_relax(lock) cpu_relax()

diff --git a/arch/sh/include/asm/spinlock.h b/arch/sh/include/asm/spinlock.h
index e793181..6028356 100644
--- a/arch/sh/include/asm/spinlock.h
+++ b/arch/sh/include/asm/spinlock.h
@@ -216,6 +216,9 @@ static inline int __raw_write_trylock(raw_rwlock_t *rw)
return (oldval > (RW_LOCK_BIAS - 1));
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)
+
#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
#define _raw_write_relax(lock) cpu_relax()
diff --git a/arch/sparc/include/asm/spinlock_32.h b/arch/sparc/include/asm/spinlock_32.h
index bf2d532..46f91ab 100644
--- a/arch/sparc/include/asm/spinlock_32.h
+++ b/arch/sparc/include/asm/spinlock_32.h
@@ -177,6 +177,8 @@ static inline int __read_trylock(raw_rwlock_t *rw)
#define __raw_write_unlock(rw) do { (rw)->lock = 0; } while(0)

#define __raw_spin_lock_flags(lock, flags) __raw_spin_lock(lock)
+#define __raw_read_lock_flags(rw, flags) __raw_read_lock(rw)
+#define __raw_write_lock_flags(rw, flags) __raw_write_lock(rw)

#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
diff --git a/arch/sparc/include/asm/spinlock_64.h b/arch/sparc/include/asm/spinlock_64.h
index 120cfe4..e856cd0 100644
--- a/arch/sparc/include/asm/spinlock_64.h
+++ b/arch/sparc/include/asm/spinlock_64.h
@@ -230,9 +230,11 @@ static int inline __write_trylock(raw_rwlock_t *lock)
}

#define __raw_read_lock(p) __read_lock(p)
+#define __raw_read_lock_flags(p, f) __read_lock(p)
#define __raw_read_trylock(p) __read_trylock(p)
#define __raw_read_unlock(p) __read_unlock(p)
#define __raw_write_lock(p) __write_lock(p)
+#define __raw_write_lock_flags(p, f) __write_lock(p)
#define __raw_write_unlock(p) __write_unlock(p)
#define __raw_write_trylock(p) __write_trylock(p)

diff --git a/include/asm-cris/arch-v32/spinlock.h b/include/asm-cris/arch-v32/spinlock.h
index 0d5709b..129756b 100644
--- a/include/asm-cris/arch-v32/spinlock.h
+++ b/include/asm-cris/arch-v32/spinlock.h
@@ -121,6 +121,8 @@ static inline int __raw_write_trylock(raw_rwlock_t *rw)
return 1;
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)

#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
diff --git a/include/asm-m32r/spinlock.h b/include/asm-m32r/spinlock.h
index f5cfba8..dded923 100644
--- a/include/asm-m32r/spinlock.h
+++ b/include/asm-m32r/spinlock.h
@@ -316,6 +316,9 @@ static inline int __raw_write_trylock(raw_rwlock_t *lock)
return 0;
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)
+
#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
#define _raw_write_relax(lock) cpu_relax()
diff --git a/include/asm-parisc/spinlock.h b/include/asm-parisc/spinlock.h
index f3d2090..fae03e1 100644
--- a/include/asm-parisc/spinlock.h
+++ b/include/asm-parisc/spinlock.h
@@ -187,6 +187,9 @@ static __inline__ int __raw_write_can_lock(raw_rwlock_t *rw)
return !rw->counter;
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)
+
#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
#define _raw_write_relax(lock) cpu_relax()
diff --git a/include/asm-x86/spinlock.h b/include/asm-x86/spinlock.h
index 157ff7f..132f5d3 100644
--- a/include/asm-x86/spinlock.h
+++ b/include/asm-x86/spinlock.h
@@ -357,6 +357,9 @@ static inline void __raw_write_unlock(raw_rwlock_t *rw)
: "+m" (rw->lock) : "i" (RW_LOCK_BIAS) : "memory");
}

+#define __raw_read_lock_flags(lock, flags) __raw_read_lock(lock)
+#define __raw_write_lock_flags(lock, flags) __raw_write_lock(lock)
+
#define _raw_spin_relax(lock) cpu_relax()
#define _raw_read_relax(lock) cpu_relax()
#define _raw_write_relax(lock) cpu_relax()
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index e0c0fcc..9e3fe36 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -148,9 +148,11 @@ do { \
extern int _raw_spin_trylock(spinlock_t *lock);
extern void _raw_spin_unlock(spinlock_t *lock);
extern void _raw_read_lock(rwlock_t *lock);
+#define _raw_read_lock_flags(lock, flags) _raw_read_lock(lock)
extern int _raw_read_trylock(rwlock_t *lock);
extern void _raw_read_unlock(rwlock_t *lock);
extern void _raw_write_lock(rwlock_t *lock);
+#define _raw_write_lock_flags(lock, flags) _raw_write_lock(lock)
extern int _raw_write_trylock(rwlock_t *lock);
extern void _raw_write_unlock(rwlock_t *lock);
#else
@@ -160,9 +162,13 @@ do { \
# define _raw_spin_trylock(lock) __raw_spin_trylock(&(lock)->raw_lock)
# define _raw_spin_unlock(lock) __raw_spin_unlock(&(lock)->raw_lock)
# define _raw_read_lock(rwlock) __raw_read_lock(&(rwlock)->raw_lock)
+# define _raw_read_lock_flags(lock, flags) \
+ __raw_read_lock_flags(&(lock)->raw_lock, *(flags))
# define _raw_read_trylock(rwlock) __raw_read_trylock(&(rwlock)->raw_lock)
# define _raw_read_unlock(rwlock) __raw_read_unlock(&(rwlock)->raw_lock)
# define _raw_write_lock(rwlock) __raw_write_lock(&(rwlock)->raw_lock)
+# define _raw_write_lock_flags(lock, flags) \
+ __raw_write_lock_flags(&(lock)->raw_lock, *(flags))
# define _raw_write_trylock(rwlock) __raw_write_trylock(&(rwlock)->raw_lock)
# define _raw_write_unlock(rwlock) __raw_write_unlock(&(rwlock)->raw_lock)
#endif
diff --git a/kernel/spinlock.c b/kernel/spinlock.c
index 29ab207..f769d8a 100644
--- a/kernel/spinlock.c
+++ b/kernel/spinlock.c
@@ -121,7 +121,11 @@ unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock)
local_irq_save(flags);
preempt_disable();
rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+#ifdef CONFIG_LOCKDEP
LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+#else
+ _raw_read_lock_flags(lock, &flags);
+#endif
return flags;
}
EXPORT_SYMBOL(_read_lock_irqsave);
@@ -151,7 +155,11 @@ unsigned long __lockfunc _write_lock_irqsave(rwlock_t *lock)
local_irq_save(flags);
preempt_disable();
rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+#ifdef CONFIG_LOCKDEP
LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+#else
+ _raw_write_lock_flags(lock, &flags);
+#endif
return flags;
}
EXPORT_SYMBOL(_write_lock_irqsave);

2008-10-22 08:45:38

by Peter Zijlstra

Subject: Re: [PATCH 1/2] Allow rwlocks to re-enable interrupts

On Wed, 2008-10-22 at 10:34 +0200, Petr Tesarik wrote:
> diff --git a/kernel/spinlock.c b/kernel/spinlock.c
> index 29ab207..f769d8a 100644
> --- a/kernel/spinlock.c
> +++ b/kernel/spinlock.c
> @@ -121,7 +121,11 @@ unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock)
> local_irq_save(flags);
> preempt_disable();
> rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
> +#ifdef CONFIG_LOCKDEP
> LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
> +#else
> + _raw_read_lock_flags(lock, &flags);
> +#endif
> return flags;
> }

That should be CONFIG_LOCK_STAT.

But aside from that, I really don't like this change; I'd rather you
create a LOCK_CONTENDED_FLAGS() that can deal with this.
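
Something along these lines, perhaps (untested sketch, just to show
the shape I have in mind):

/*
 * Untested sketch of a possible LOCK_CONTENDED_FLAGS(): keep the
 * existing LOCK_CONTENDED() path when the contention events are
 * needed, otherwise call straight into the irq-re-enabling _flags
 * variant.
 */
#ifdef CONFIG_LOCK_STAT
#define LOCK_CONTENDED_FLAGS(_lock, try, lock, lockfl, flags) \
        LOCK_CONTENDED(_lock, try, lock)
#else
#define LOCK_CONTENDED_FLAGS(_lock, try, lock, lockfl, flags) \
        lockfl(_lock, flags)
#endif

Then _read_lock_irqsave() and friends could use it unconditionally
instead of open-coding the #ifdef.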

2008-10-22 08:58:30

by Petr Tesařík

Subject: Re: [PATCH 1/2] Allow rwlocks to re-enable interrupts

On Wed, 2008-10-22 at 10:45 +0200, Peter Zijlstra wrote:
> On Wed, 2008-10-22 at 10:34 +0200, Petr Tesarik wrote:
> > diff --git a/kernel/spinlock.c b/kernel/spinlock.c
> > index 29ab207..f769d8a 100644
> > --- a/kernel/spinlock.c
> > +++ b/kernel/spinlock.c
> > @@ -121,7 +121,11 @@ unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock)
> > local_irq_save(flags);
> > preempt_disable();
> > rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
> > +#ifdef CONFIG_LOCKDEP
> > LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
> > +#else
> > + _raw_read_lock_flags(lock, &flags);
> > +#endif
> > return flags;
> > }
>
> That should be CONFIG_LOCK_STAT.

Ugh. Fine with me, but is CONFIG_LOCKDEP correct in _spin_lock_irqsave,
or should it also read CONFIG_LOCK_STAT?

> But aside from that, I really don't like this change; I'd rather you
> create a LOCK_CONTENDED_FLAGS() that can deal with this.

No problem. I could then also use it for _spin_lock_irqsave, if the
answer to the above question is to use CONFIG_LOCK_STAT there as well.

Petr Tesarik

2008-10-22 17:24:45

by Peter Zijlstra

Subject: Re: [PATCH 1/2] Allow rwlocks to re-enable interrupts

On Wed, 2008-10-22 at 10:58 +0200, Petr Tesarik wrote:
> On Wed, 2008-10-22 at 10:45 +0200, Peter Zijlstra wrote:
> > On Wed, 2008-10-22 at 10:34 +0200, Petr Tesarik wrote:
> > > diff --git a/kernel/spinlock.c b/kernel/spinlock.c
> > > index 29ab207..f769d8a 100644
> > > --- a/kernel/spinlock.c
> > > +++ b/kernel/spinlock.c
> > > @@ -121,7 +121,11 @@ unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock)
> > > local_irq_save(flags);
> > > preempt_disable();
> > > rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
> > > +#ifdef CONFIG_LOCKDEP
> > > LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
> > > +#else
> > > + _raw_read_lock_flags(lock, &flags);
> > > +#endif
> > > return flags;
> > > }
> >
> > That should be CONFIG_LOCK_STAT.
>
> Ugh. Fine with me, but is CONFIG_LOCKDEP correct in _spin_lock_irqsave,
> or should it also read CONFIG_LOCK_STAT?

Yep.

> > But aside from that, I really don't like this change; I'd rather you
> > create a LOCK_CONTENDED_FLAGS() that can deal with this.
>
> No problem. I could then also use it for _spin_lock_irqsave, if the
> answer to the above question is to use CONFIG_LOCK_STAT there as well.

If you create LOCK_CONTENDED_FLAGS() the whole issue goes away nicely.

2008-10-22 17:32:19

by Rick Jones

Subject: Re: [PATCH 0/2] enable irqs when waiting for rwlocks

Petr Tesarik wrote:
> Hello,
>
> SGI has observed that on large systems, interrupts are not serviced for
> a long period of time when waiting for a rwlock. The following patch
> series re-enables irqs while waiting for the lock, resembling the code
> which is already there for spinlocks.

Perhaps I'm just out in left field, but that (and the similar behaviour
for obtaining a spinlock?) feels like treating a symptom rather than the
root cause, which would appear to be long lock hold times/contention?

rick jones

2008-10-22 19:05:38

by Matthew Wilcox

Subject: Re: [PATCH 1/2] Allow rwlocks to re-enable interrupts

On Wed, Oct 22, 2008 at 07:24:31PM +0200, Peter Zijlstra wrote:
> > No problem. I could then also use it for _spin_lock_irqsave, if the
> > answer to the above question is to use CONFIG_LOCK_STAT there as well.
>
> If you create LOCK_CONTENDED_FLAGS() the whole issue goes away nicely.

Should it also be used for _spin_lock_irq()? I'm puzzled why it's only
used for _irqsave().

(should _spin_lock_bh() re-enable BHs while waiting? Is it just not big
enough of a deal?)

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-10-22 19:19:55

by Peter Zijlstra

Subject: Re: [PATCH 1/2] Allow rwlocks to re-enable interrupts

On Wed, 2008-10-22 at 13:05 -0600, Matthew Wilcox wrote:
> On Wed, Oct 22, 2008 at 07:24:31PM +0200, Peter Zijlstra wrote:
> > > No problem. I could then also use it for _spin_lock_irqsave, if the
> > > answer to the above question is to use CONFIG_LOCK_STAT there as well.
> >
> > If you create LOCK_CONTENDED_FLAGS() the whole issue goes away nicely.

Gah, I looked at it again, and that #ifdef isn't only there to select
between LOCK_CONTENDED and not; we can't actually have the re-enable
with anything lockdep-related.

So I was wrong.

> Should it also be used for _spin_lock_irq()? I'm puzzled why it's only
> used for _irqsave().

Right, not sure how this maze is done.

The thing is, with spin_lock_irq() you know the irq state and can do the
enable unconditionally - then again, with ticket locks we cannot do it
at all (once we hold a ticket, an interrupt taken on this CPU whose
handler contends for the same lock would queue behind us and spin
forever).

The _flags() version needs the flags to see if irqs were enabled before
we entered the op; if they weren't, we cannot go around enabling them.
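
Roughly, the difference (illustrative sketches only, not real kernel
code; irqs_disabled_flags() tests the saved flags word):

/* spin_lock_irq()-style wait: we disabled irqs ourselves, so it is
 * always safe to re-enable them while spinning. */
static inline void wait_irq_variant(void)
{
        local_irq_enable();
        cpu_relax();
        local_irq_disable();
}

/* _flags()-style wait: consult the caller's saved state first; if
 * irqs were already off on entry, they must stay off. */
static inline void wait_flags_variant(unsigned long flags)
{
        if (!irqs_disabled_flags(flags)) {      /* on before entry? */
                local_irq_enable();
                cpu_relax();
                local_irq_disable();
        } else {
                cpu_relax();
        }
}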

> (should _spin_lock_bh() re-enable BHs while waiting? Is it just not big
> enough of a deal?)

Doubt it.. dunno.. personally I'd rather see softirqs die sooner rather
than later.

2008-10-23 14:12:44

by Robin Holt

Subject: Re: [PATCH 0/2] enable irqs when waiting for rwlocks

On Wed, Oct 22, 2008 at 10:31:53AM -0700, Rick Jones wrote:
> Petr Tesarik wrote:
>> Hello,
>>
>> SGI has observed that on large systems, interrupts are not serviced for
>> a long period of time when waiting for a rwlock. The following patch
>> series re-enables irqs while waiting for the lock, resembling the code
>> which is already there for spinlocks.
>
> Perhaps I'm just out in left field, but that (and the similar behaviour
> for obtaining a spinlock?) feels like treating a symptom rather than the
> root cause, which would appear to be long lock hold times/contention?

Sometimes lock contention on large systems takes a few seconds to pass.
This is normal behavior which simply cannot be eliminated. For those
cases, we need to rely on being able to re-enable interrupts and allow
other operations to continue normally. Simply put, in some cases,
nothing more can be done.

Thanks,
Robin