2006-08-27 19:22:17

by Dong Feng

[permalink] [raw]
Subject: Why Semaphore Hardware-Dependent?

Why can't we have a hardware-independent semaphore definition while we
already have hardware-dependent spinlock, rwlock, and RCU locks? It
seems the semaphore definitions fall into two major categories.
The main difference is whether there is a member variable _sleepers_.
Does this (optional) member indicate some hardware-family trait?

Best Regards.
Feng,Dong


2006-08-27 20:54:55

by Andi Kleen

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Sunday 27 August 2006 21:22, Dong Feng wrote:
> Why can't we have a hardware-independent semaphore definition while we
> already have hardware-dependent spinlock, rwlock, and RCU locks? It

We probably could, yes, if up/down were out-of-lined. The only
reason it is assembly code is that it still uses funky assembly
to get a fast uncontended path. Since out-of-lining
worked for spinlocks it will likely work for semaphores too.
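
For illustration, here is a rough sketch of what such an out-of-lined,
architecture-independent fastpath could look like in plain C. The struct
layout and the __down_slowpath()/__up_slowpath() helpers are hypothetical,
not existing kernel interfaces:

struct semaphore {
	atomic_t count;			/* > 0: free, <= 0: held/contended */
	spinlock_t wait_lock;
	struct list_head wait_list;
};

void down(struct semaphore *sem)
{
	/* uncontended fast path: a single atomic op, no arch assembly */
	if (unlikely(atomic_dec_return(&sem->count) < 0))
		__down_slowpath(sem);	/* out of line: queue and sleep */
}

void up(struct semaphore *sem)
{
	if (unlikely(atomic_inc_return(&sem->count) <= 0))
		__up_slowpath(sem);	/* out of line: wake a waiter */
}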

> seems the semaphore definitions fall into two major categories.
> The main difference is whether there is a member variable _sleepers_.
> Does this (optional) member indicate some hardware-family trait?

AFAIK the normal semaphores all work basically the same across the
architectures, just the calling conventions are different. If it were
pure out-of-line C that wouldn't be a problem anymore.

rwsems don't -- there are two flavours: a generic spinlock'ed one and a
complicated atomic based one that only works on some architectures.
As far as I know nobody has demonstrated a clear performance increase
from the first so it might be possible to switch all to the generic
implementation.

If you're interested in this you should probably do patches and benchmarks.

-Andi

2006-08-27 21:06:22

by Christoph Lameter

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Sun, 27 Aug 2006, Andi Kleen wrote:

> rwsems don't -- there are two flavours: a generic spinlock'ed one and a
> complicated atomic based one that only works on some architectures.
> As far as I know nobody has demonstrated a clear performance increase
> from the first so it might be possible to switch all to the generic
> implementation.

Yup, that would be the major issue. I'd be interested to see some tests in
that area.

2006-08-27 21:39:14

by Andi Kleen

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Sunday 27 August 2006 23:05, Christoph Lameter wrote:
> On Sun, 27 Aug 2006, Andi Kleen wrote:
>
> > rwsems don't -- there are two flavours: a generic spinlock'ed one and a
> > complicated atomic based one that only works on some architectures.
> > As far as I know nobody has demonstrated a clear performance increase
> > from the first so it might be possible to switch all to the generic
> > implementation.
>
> Yup, that would be the major issue. I'd be interested to see some tests in
> that area.

x86-64 always uses the spinlocked version (vs i386 using the atomic one)
and I haven't heard of anybody complaining.

-Andi

2006-08-27 22:14:56

by Christoph Lameter

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Sun, 27 Aug 2006, Andi Kleen wrote:

> x86-64 always uses the spinlocked version (vs i386 using the atomic one)
> and I haven't heard of anybody complaining.

Ia64 has special counters for rwsemaphores. I'd like to see if there is
any performance loss. mmap_sem is a rwsemaphore. This is performance
critical.

2006-08-28 00:18:44

by Paul Mackerras

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Dong Feng writes:

> Why can't we have a hardware-independent semaphore definition while we
> already have hardware-dependent spinlock, rwlock, and RCU locks? It
> seems the semaphore definitions fall into two major categories.
> The main difference is whether there is a member variable _sleepers_.
> Does this (optional) member indicate some hardware-family trait?

It indicates the presence of instructions that let you implement a
flexible atomic update, that is, either load-locked and
store-conditional instructions, or a compare-and-exchange instruction.

The original x86 implementation had the `count' and `sleepers' fields
in the semaphore structure. For PowerPC, I redesigned the semaphores
to have only the `count' field, which I was able to do because PowerPC
has `load with reservation' and `store conditional' instructions,
which one can use to construct code to do atomic updates where the
resulting value can be just about any function of the initial value.

For semaphores, I made a __sem_update_count function which atomically
updates the count field to (MAX(count, 0) + inc). That implementation
was subsequently picked up by other architectures with equivalent
instructions (alpha, mips, s390, sparc64, etc.). Have a look at
arch/powerpc/kernel/semaphore.c for the details.
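
For reference, the update rule described above can be written as a
compare-and-exchange loop. This is only a sketch of the semantics (the
real PowerPC code uses lwarx/stwcx. in assembly) and the helper name is
illustrative:

/* Atomically set count = max(count, 0) + inc and return the new value. */
static int sem_update_count_sketch(atomic_t *count, int inc)
{
	int old, new;

	do {
		old = atomic_read(count);
		new = (old > 0 ? old : 0) + inc;
	} while (atomic_cmpxchg(count, old, new) != old);

	return new;
}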

I believe the reason for not doing something like this on x86 was the
fact that we still support i386 processors, which don't have the
cmpxchg instruction. That's fair enough, but I would be opposed to
making semaphores bigger and slower on PowerPC because of that. Hence
the two different styles of implementation.

Paul.

2006-08-28 05:14:15

by Chris Wedgwood

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Mon, Aug 28, 2006 at 10:18:35AM +1000, Paul Mackerras wrote:

> I believe the reason for not doing something like this on x86 was
> the fact that we still support i386 processors, which don't have the
> cmpxchg instruction. That's fair enough, but I would be opposed to
> making semaphores bigger and slower on PowerPC because of that.
> Hence the two different styles of implementation.

The i386 is older than some of the kernel hackers, and given that a
modern kernel is pretty painful with less than say 16MB of RAM in
practice, I don't see that it would be all that terrible to drop
support for ancient CPUs at some point (yes, I know some newer
embedded (and similar) CPUs might be affected here too, but surely not
that many that people really use --- and they could just use 2.4.x).

2006-08-28 05:21:28

by Christoph Lameter

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Sun, 27 Aug 2006, Chris Wedgwood wrote:

> On Mon, Aug 28, 2006 at 10:18:35AM +1000, Paul Mackerras wrote:
>
> > I believe the reason for not doing something like this on x86 was
> > the fact that we still support i386 processors, which don't have the
> > cmpxchg instruction. That's fair enough, but I would be opposed to
> > making semaphores bigger and slower on PowerPC because of that.
> > Hence the two different styles of implementation.
>
> The i386 is older than some of the kernel hackers, and given that a
> modern kernel is pretty painful with less than say 16MB of RAM in
> practice, I don't see that it would be all that terrible to drop
> support for ancient CPUs at some point (yes, I know some newer
> embedded (and similar) CPUs might be affected here too, but surely not
> that many that people really use --- and they could just use 2.4.x).

Also note that i386 has a cmpxchg emulation for those machines that do not
support cmpxchg.

2006-08-28 06:00:24

by Jan Engelhardt

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

>
>The i386 is older than some of the kernel hackers, and given that a
>modern kernel is pretty painful with less than say 16MB of RAM in
>practice

I have to concur. (Sure, you can't get a reasonable system to work on 16MB,
but the kernel is fine with 5 megs of RAM. In fact, ancient i386 boxes
usually do not have "big" things like SCSI, USB or Audio.)

>, I don't see that it would be all that terrible to drop
>support for ancient CPUs at some point (yes, I know some newer
>embedded (and similar) CPUs might be affected here too, but surely not
>that many that people really use --- and they could just use 2.4.x).



Jan Engelhardt
--

2006-08-28 07:23:32

by Andi Kleen

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?


>
> I believe the reason for not doing something like this on x86 was the
> fact that we still support i386 processors, which don't have the
> cmpxchg instruction.

i386 emulates cmpxchg these days (besides, most likely 99.9+% of all
386s are already long past their MTBF, so they shouldn't be a major concern).

> That's fair enough, but I would be opposed to
> making semaphores bigger

If the code were out-of-lined, bigger wouldn't make much difference.
And if it worked for spinlocks I don't see why it shouldn't for semaphores.

> and slower on PowerPC because of that.

The question is whether it really makes much difference. When semaphores
are contended, in my experience the major overhead is in the scheduler
anyway.

That would leave the fast path, but does it help that much there
to have a more complicated implementation?
-Andi

2006-08-28 07:31:11

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Mon, 2006-08-28 at 03:22 +0800, Dong Feng wrote:
> Why can't we have a hardware-independent semaphore definition while we
> already have hardware-dependent spinlock, rwlock, and RCU locks? It
> seems the semaphore definitions fall into two major categories.
> The main difference is whether there is a member variable _sleepers_.

btw semaphores are a deprecated construct mostly; mutexes are the way to
go for new code if they fit the usage model of mutexes. And mutexes are
indeed generic (with an architecture hook to allow a specific operation
to be optimized using assembly).
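
A rough sketch of that pattern, with the contended path in generic C and
only the uncontended atomic step left to an (overridable) architecture
hook; the names here are illustrative rather than the exact kernel
interfaces:

void mutex_lock_sketch(struct mutex *lock)
{
	/*
	 * Arch hook: atomically take lock->count and report contention.
	 * A plain atomic_dec_return() works everywhere; an architecture
	 * may override it with hand-tuned assembly.
	 */
	if (unlikely(arch_mutex_fastpath_lock(&lock->count)))
		mutex_lock_slowpath(lock);	/* generic C: queue and sleep */
}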


2006-08-28 12:37:00

by linux-os (Dick Johnson)

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?


On Mon, 28 Aug 2006, Chris Wedgwood wrote:

> On Mon, Aug 28, 2006 at 10:18:35AM +1000, Paul Mackerras wrote:
>
>> I believe the reason for not doing something like this on x86 was
>> the fact that we still support i386 processors, which don't have the
>> cmpxchg instruction. That's fair enough, but I would be opposed to
>> making semaphores bigger and slower on PowerPC because of that.
>> Hence the two different styles of implementation.
>
> The i386 is older than some of the kernel hackers, and given that a
> modern kernel is pretty painful with less than say 16MB of RAM in
> practice, I don't see that it would be all that terrible to drop
> support for ancient CPUs at some point (yes, I know some newer
> embedded (and similar) CPUs might be affected here too, but surely not
> that many that people really use --- and they could just use 2.4.x).

I don't think it's a matter of favoring any old PCs. Instead, it's
a matter of CPU designers creating special optimized operations for
use in semaphores. We should continue to use those if at all possible
because that's what they were designed for, even if semaphores could
be implemented with "standard" operations. Quoting myself from some
discussion years ago; "The fact that one can use an axe as a hammer
does not make an axe a hammer." We should continue to use the right
things even though CPUs are so fast nowadays that performance
advantages may seem to be lost in the noise. It's these little
performance detractors that tend to creep in and reduce the
performance of large systems.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.62 BogoMips).
New book: http://www.AbominableFirebug.com/
_



2006-08-29 01:00:25

by Paul Mackerras

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Andi Kleen writes:

> > That's fair enough, but I would be opposed to
> > making semaphores bigger
>
> If the code was out of lined bigger wouldn't make much difference

I was referring to the data size not the code size.

> That would leave the fast path, but does it help that much there
> to have a more complicated implementation?

The implementation of the fast path is basically atomic_inc/dec.
Nothing more complicated.

Paul.

2006-08-29 01:18:51

by Nick Piggin

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Arjan van de Ven wrote:
> On Mon, 2006-08-28 at 03:22 +0800, Dong Feng wrote:
>
>>Why can't we have a hardware-independent semaphore definition while we
>>already have hardware-dependent spinlock, rwlock, and RCU locks? It
>>seems the semaphore definitions fall into two major categories.
>>The main difference is whether there is a member variable _sleepers_.
>
>
> btw semaphores are a deprecated construct mostly; mutexes are the way to
> go for new code if they fit the usage model of mutexes. And mutexes are
> indeed generic (with a architecture hook to allow a specific operation
> to be optimized using assembly)

That's true, although rwsems are still very important for mmap_sem (if
nothing else). There is not yet an rw/se-mutex.

rwsem is another place that has a huge proliferation of assembly code.
I wonder if we can just start with the nice powerpc code that uses
atomic_add_return and cmpxchg (should use atomic_cmpxchg), and chuck
out the "crappy" rwsem fallback implementation, as well as all the arch
specific code?

That would seem to be able to rid us of vast swaths of tricky asm, and
also double the test coverage of the out of line lib/ code. Should still
generate close to optimal code on i386. The only architectures that might
slightly care are those whose atomics hash to locks (in that case, you'd
really rather take a lock in the rwsem's cacheline than a random atomic
lock... but I don't think incredible parisc/sparc SMP performance is
worth the current situation!)

--
SUSE Labs, Novell Inc.


2006-08-29 06:08:00

by Andi Kleen

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

> and chuck out the "crappy" rwsem fallback implementation

What is crappy with it?

I went with it because there were some serious concerns about
the complexity of the i386 rwsem code and so far nobody has complained
about them being too slow.

But yes rwsems could need some big cleanup.

-Andi


2006-08-29 10:05:47

by David Howells

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Nick Piggin <[email protected]> wrote:

> I wonder if we can just start with the nice powerpc code that uses
> atomic_add_return and cmpxchg (should use atomic_cmpxchg)

Because i386 (and x86_64) can do better by using XADDL/XADDQ.

The problem with CMPXCHG is that it might fail and you might have to attempt
it again. This may be unlikely - it depends on the circumstances. The same
applies to LL/ST equivalents.

On i386, CMPXCHG also ties you to particular registers to some extent.
XADD does so less, although the slowpath function's calling convention does
instead; with the XADD version we can at least make sure that the semaphore
address is in EAX, something we can't do with CMPXCHG.
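
To make the difference concrete, here is a sketch of the two styles for
the reader fastpath, in C only and assuming the atomic_t count layout
used in the rwsem patch later in this thread; the real i386 fastpath is
in assembly:

/* XADD style: one unconditional atomic add, never retried. */
static void down_read_xadd_style(struct rw_semaphore *sem)
{
	if (unlikely(atomic_add_return(RWSEM_ACTIVE_READ_BIAS, &sem->count) <= 0))
		rwsem_down_read_failed(sem);
}

/*
 * CMPXCHG style: may have to loop if another CPU changes the count
 * between the read and the compare-and-exchange.
 */
static void down_read_cmpxchg_style(struct rw_semaphore *sem)
{
	int old, new;

	do {
		old = atomic_read(&sem->count);
		new = old + RWSEM_ACTIVE_READ_BIAS;
	} while (atomic_cmpxchg(&sem->count, old, new) != old);

	if (unlikely(new <= 0))
		rwsem_down_read_failed(sem);
}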

For those archs where CMPXCHG is the best available, a better algorithm than
the XADD based one is available, though I haven't submitted it. I may still
have the patch somewhere.

However! If what you have is LL/ST equivalents, then emulating CMPXCHG to
emulate the XADD algorithm probably isn't the optimal way either. Don't
get stuck on using LL/ST to emulate what other CPUs have available.

> and chuck out the "crappy" rwsem fallback implementation,

CMPXCHG is not available on all archs, and may not be implemented on all archs
through other atomic instructions.

> as well as all the arch specific code?

Using CMPXCHG is only optimal where that's the best available.

David

2006-08-29 10:57:12

by Andi Kleen

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tuesday 29 August 2006 12:05, David Howells wrote:

>Because i386 (and x86_64) can do better by using XADDL/XADDQ.

x86-64 has always used the spinlock based version.

> On i386, CMPXCHG also ties you to what registers you may use for what to some
> extent.

We've completely given up these kinds of micro optimization for spinlocks,
which are 1000x as critical as rwsems. And nobody was able to benchmark
a difference.

It is very very likely nobody could benchmark a difference on rwsems either.

While I'm sure it's an interesting intellectual exercise to do these
advanced rwsems it would be better for everybody else to go for a single
maintainable C implementation.

-Andi

2006-08-29 15:57:12

by Christoph Lameter

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tue, 29 Aug 2006, David Howells wrote:

> Because i386 (and x86_64) can do better by using XADDL/XADDQ.

And Ia64 would like to use fetchadd....

> CMPXCHG is not available on all archs, and may not be implemented on all archs
> through other atomic instructions.

Which arches do not support cmpxchg?

2006-08-29 16:20:34

by Ralf Baechle

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tue, Aug 29, 2006 at 08:56:36AM -0700, Christoph Lameter wrote:

> > Because i386 (and x86_64) can do better by using XADDL/XADDQ.
>
> And Ia64 would like to use fetchadd....
>
> > CMPXCHG is not available on all archs, and may not be implemented on all archs
> > through other atomic instructions.
>
> Which arches do not support cmpxchg?

MIPS, Alpha - probably any pure RISC load/store architecture.

Ralf

2006-08-29 16:25:30

by David Howells

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Ralf Baechle <[email protected]> wrote:

> > Which arches do not support cmpxchg?
>
> MIPS, Alpha - probably any pure RISC load/store architecture.

Some of these have LL/SC or equivalent instead, but ARM5 and before, FRV, and M68K
before 68020 do not, to name but a few.

And anything that implements CMPXCHG with spinlocks is a really bad candidate
for CMPXCHG-based rwsems.

David

2006-08-29 16:28:26

by Christoph Lameter

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tue, 29 Aug 2006, David Howells wrote:

> Ralf Baechle <[email protected]> wrote:
>
> > > Which arches do not support cmpxchg?
> >
> > MIPS, Alpha - probably any pure RISC load/store architecture.
>
> > Some of these have LL/SC or equivalent instead, but ARM5 and before, FRV, and M68K
> > before 68020 do not, to name but a few.

This is all pretty ancient hardware, right? And they are mostly single
processor so no need to worry about concurrency. Just disable interrupts.

> And anything that implements CMPXCHG with spinlocks is a really bad candidate
> for CMPXCHG-based rwsems.

Those will optimize out if it is a single processor configuration.

2006-08-29 16:53:44

by Ralf Baechle

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tue, Aug 29, 2006 at 05:25:02PM +0100, David Howells wrote:

> Some of these have LL/SC or equivalent instead, but ARM5 and before, FRV, and M68K
> before 68020 do not, to name but a few.

68k before 68020 isn't supported by Linux anyway.

Ralf

2006-08-29 16:57:50

by David Howells

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Christoph Lameter <[email protected]> wrote:

> > Some of these have LL/SC or equivalent instead, but ARM5 and before, FRV,
> > and M68K before 68020 do not, to name but a few.
>
> This is all pretty ancient hardware, right? And they are mostly single
> processor so no need to worry about concurrency. Just disable interrupts.

No, they're not all ancient h/w, and "just disabling interrupts" can be really
expensive.

> > And anything that implements CMPXCHG with spinlocks is a really bad
> > candidate for CMPXCHG-based rwsems.
>
> Those will optimize out if it is a single processor configuration.

Not necessarily. Consider preemption.

David

2006-08-29 16:59:51

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tue, 29 Aug 2006, Ralf Baechle wrote:
> On Tue, Aug 29, 2006 at 05:25:02PM +0100, David Howells wrote:
>
> > Some of these have LL/SC or equivalent instead, but ARM5 and before, FRV, and M68K
> > before 68020 do not, to name but a few.
>
> 68k before 68020 isn't supported by Linux anyway.

uClinux anyone?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2006-08-29 17:25:07

by Andi Kleen

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tuesday 29 August 2006 17:56, Christoph Lameter wrote:
> On Tue, 29 Aug 2006, David Howells wrote:
>
> > Because i386 (and x86_64) can do better by using XADDL/XADDQ.
>
> And Ia64 would like to use fetchadd....

This might be a dumb question, but I would expect even on Altix
with lots of parallel faulting threads rwsem performance to be basically
limited by acquiring the cache line and releasing it later to another CPU.

Do you really think it will make much difference what particular atomic
operation is used? The basic cost of sending the cache line over the
interconnect should be all the same, no? And once the cache line is local
it should be reasonably fast either way.

> > CMPXCHG is not available on all archs, and may not be implemented on all archs
> > through other atomic instructions.
>
> Which arches do not support cmpxchg?

parisc at least iirc (it practically doesn't support very much atomically)
and likely sparcv8.

-Andi

2006-08-29 17:36:28

by Christoph Lameter

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tue, 29 Aug 2006, Andi Kleen wrote:

> On Tuesday 29 August 2006 17:56, Christoph Lameter wrote:
> > On Tue, 29 Aug 2006, David Howells wrote:
> >
> > > Because i386 (and x86_64) can do better by using XADDL/XADDQ.
> >
> > And Ia64 would like to use fetchadd....
>
> This might be a dumb question, but I would expect even on Altix
> with lots of parallel faulting threads rwsem performance to be basically
> limited by acquiring the cache line and releasing it later to another CPU.

Correct. However, a cmpxchg may have to acquire that cacheline multiple
times in a highly contended situation. A fetchadd acquires the cacheline
only once.

> Do you really think it will make much difference what particular atomic
> operation is used? The basic cost of sending the cache line over the
> interconnect should be all the same, no? And once the cache line is local
> it should be reasonably fast either way.

We have long tuned that portion of the code and therefore we are
skeptical of changes. But if we cannot measure a difference against a
generic implementation then it would be okay.

2006-08-29 17:40:52

by Andrew Morton

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tue, 29 Aug 2006 12:56:54 +0200
Andi Kleen <[email protected]> wrote:

> While I'm sure it's an interesting intellectual exercise to do these
> advanced rwsems it would be better for everybody else to go for a single
> maintainable C implementation.

metoo. It's irritating having multiple implementations around, never being
sure which version people are running with.

2006-08-29 18:18:06

by Andi Kleen

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tuesday 29 August 2006 19:36, Christoph Lameter wrote:
> On Tue, 29 Aug 2006, Andi Kleen wrote:
>
> > On Tuesday 29 August 2006 17:56, Christoph Lameter wrote:
> > > On Tue, 29 Aug 2006, David Howells wrote:
> > >
> > > > Because i386 (and x86_64) can do better by using XADDL/XADDQ.
> > >
> > > And Ia64 would like to use fetchadd....
> >
> > This might be a dumb question, but I would expect even on Altix
> > with lots of parallel faulting threads rwsem performance to be basically
> > limited by acquiring the cache line and releasing it later to another CPU.
>
> Correct. However, a cmpxchg may have to acquire that cacheline multiple
> times in a highly contended situation.

Hmm, thinking about it that sounds unlikely because only the first and the last
user should touch the rwsem head. But ok it might be possible.

BTW maybe it would be a good idea to switch the wait list to a hlist,
then the last user in the queue wouldn't need to
touch the cache line of the head. Or maybe even a single linked
list then some more cache bounces might be avoidable.

That is really why we need a single C implementation. If we had that
such optimizations would be pretty easy. Without it it's a big mess.

> We have long tuned that portion of the code and therefore we are
> skeptical of changes. But if we cannot measure a difference against a
> generic implementation then it would be okay.

Would you be willing to run numbers comparing them? (or provide a benchmark?)

-Andi

2006-08-29 18:31:24

by David Howells

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Andi Kleen <[email protected]> wrote:

> BTW maybe it would be a good idea to switch the wait list to a hlist,
> then the last user in the queue wouldn't need to
> touch the cache line of the head. Or maybe even a single linked
> list then some more cache bounces might be avoidable.

You need a list_head to get O(1) push at one end and O(1) pop at the other.
In addition a singly-linked list makes interruptible ops non-O(1) also.

David

2006-08-29 18:33:37

by Andi Kleen

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tuesday 29 August 2006 20:30, David Howells wrote:
> Andi Kleen <[email protected]> wrote:
>
> > BTW maybe it would be a good idea to switch the wait list to a hlist,
> > then the last user in the queue wouldn't need to
> > touch the cache line of the head. Or maybe even a single linked
> > list then some more cache bounces might be avoidable.
>
> You need a list_head to get O(1) push at one end and O(1) pop at the other.

The popper should know its node address already because it's on its own stack.

> In addition a singly-linked list makes interruptible ops non-O(1) also.

When they are interrupted I guess? Hardly a problem to make that slower.

-Andi

2006-08-29 18:41:57

by Christoph Lameter

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tue, 29 Aug 2006, Andi Kleen wrote:

> > Correct. However, a cmpxchg may have to acquire that cacheline multiple
> > times in a highly contended situation.
>
> Hmm, thinking about it that sounds unlikely because only the first and the last
> user should touch the rwsem head. But ok it might be possible.

Well that sounds encouraging.

> BTW maybe it would be a good idea to switch the wait list to a hlist,
> then the last user in the queue wouldn't need to
> touch the cache line of the head. Or maybe even a single linked
> list then some more cache bounces might be avoidable.
>
> That is really why we need a single C implementation. If we had that
> such optimizations would be pretty easy. Without it it's a big mess.

I am all for optimizations....
>
> > We have long tuned that portion of the code and therefore we are
> > skeptical of changes. But if we cannot measure a difference against a
> > generic implementation then it would be okay.
>
> Would you be willing to run numbers comparing them? (or provide a benchmark?)

Yes. The infamous page fault test (pft.c) really hits this one hard, and on
a large SMP system you should be able to see problems immediately. In fact,
since the page table lock scaling fixes, the page fault rate
on our large systems is limited by rwsem performance due to the mmap_sem
cacheline becoming contended. We'd be glad about any improvements in our
numbers.


Benchmark code follows (you may have to do some tweaking for 32
bit):

#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#include <sys/mman.h>
#include <time.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <strings.h>

extern int optind, opterr;
extern char *optarg;

long bytes=16384;
long sleepsec=0;
long verbose=0;
long forkcnt=1;
long repeatcount=1;
long cachelines=1;
long do_bzero=0;
long mypid;
int title=0;

volatile int go, state[128];

struct timespec wall,wall_end,wall_start;
struct rusage ruse;
long faults;
long pages;
long gbyte;
double faults_per_sec;
double faults_per_sec_per_cpu;

#define perrorx(s) (perror(s), exit(1))
#define NBPP 16384

void* test(void*);
void launch(void);


int main(int argc, char *argv[])
{
int i, j, c, stat, er=0;
static char optstr[] = "b:c:f:g:r:s:vzHt";


opterr=1;
while ((c = getopt(argc, argv, optstr)) != EOF)
switch (c) {
case 'g':
bytes = atol(optarg)*1024*1024*1024;
break;
case 'b':
bytes = atol(optarg);
break;
case 'c':
cachelines = atol(optarg);
break;
case 'f':
forkcnt = atol(optarg);
break;
case 'r':
repeatcount = atol(optarg);
break;
case 's':
sleepsec = atol(optarg);
break;
case 'v':
verbose++;
break;
case 'z':
do_bzero++;
break;
case 'H':
er++;
break;
case 't' :
title++;
break;
case '?':
er = 1;
break;
}

if (er) {
printf("usage: %s %s\n", argv[0], optstr);
exit(1);
}

pages = bytes*repeatcount/getpagesize();
gbyte = bytes/(1024*1024*1024);
bytes = bytes/forkcnt;

clock_gettime(CLOCK_REALTIME,&wall_start);

if (verbose) printf("Calculated pages=%ld pagesize=%ld.\n",pages,getpagesize());

mypid = getpid();
setpgid(0, mypid);

for (i=0; i<repeatcount; i++) {
if (fork() == 0)
launch();
while (wait(&stat) > 0);
}

getrusage(RUSAGE_CHILDREN,&ruse);
clock_gettime(CLOCK_REALTIME,&wall_end);
wall.tv_sec = wall_end.tv_sec - wall_start.tv_sec;
wall.tv_nsec = wall_end.tv_nsec - wall_start.tv_nsec;
if (wall.tv_nsec <0 ) { wall.tv_sec--; wall.tv_nsec += 1000000000; }
if (wall.tv_nsec >1000000000) { wall.tv_sec++; wall.tv_nsec -= 1000000000; }
if (verbose) printf("Calculated faults=%ld. Real minor faults=%ld, major faults=%ld\n",pages,ruse.ru_minflt+ruse.ru_majflt);
faults_per_sec=(double) pages / ((double) wall.tv_sec + (double) wall.tv_nsec / 1000000000.0);
faults_per_sec_per_cpu=(double) pages / (
(double) (ruse.ru_utime.tv_sec + ruse.ru_stime.tv_sec) + ((double) (ruse.ru_utime.tv_usec + ruse.ru_stime.tv_usec) / 1000000.0));
if (title) printf(" Gb Rep Thr CLine User System Wall flt/cpu/s fault/wsec\n");
printf("%3ld %2ld %4ld %3ld %4ld.%02lds%7ld.%02lds%4ld.%02lds%10.3f %10.3f\n",
gbyte,repeatcount,forkcnt,cachelines,
ruse.ru_utime.tv_sec,ruse.ru_utime.tv_usec/10000,
ruse.ru_stime.tv_sec,ruse.ru_stime.tv_usec/10000,
wall.tv_sec,wall.tv_nsec/10000000,
faults_per_sec_per_cpu,faults_per_sec);
exit(0);
}

char *
do_shm(long shmlen) {
char *p;
int shmid;

printf ("Try to allocate TOTAL shm segment of %ld bytes\n", shmlen);

if ((shmid = shmget(IPC_PRIVATE, shmlen, SHM_R|SHM_W)) == -1)
perrorx("shmget faiiled");

p=(char*)shmat(shmid, (void*)0, SHM_R|SHM_W);
printf(" created, adr: 0x%lx\n", (long)p);
printf(" attached\n");
bzero(p, shmlen);
printf(" zeroed\n");

// if (shmctl(shmid,IPC_RMID,0) == -1)
// perrorx("shmctl failed");
// printf(" deleted\n");

return p;


}

void
launch()
{
pthread_t ptid[128];
int i, j;

for (j=0; j<forkcnt; j++)
if (pthread_create(&ptid[j], NULL, test, (void*) (long)j) < 0)
perrorx("pthread create");

if(0) for (j=0; j<forkcnt; j++)
while(state[j] == 0);
go = 1;
if(0) for (j=0; j<forkcnt; j++)
while(state[j] == 1);
for (j=0; j<forkcnt; j++)
pthread_join(ptid[j], NULL);
exit(0);
}

void*
test(void *arg)
{
char *p, *pe;
long id;
int cl;

id = (long) arg;
state[id] = 1;
while(!go);
p = malloc(bytes);
// p = do_shm(bytes);
if (p == 0) {
printf("malloc of %ld bytes failed.\n",bytes);
exit(1);
} else
if (verbose) printf("malloc of %ld bytes succeeded\n",bytes);
if (do_bzero)
bzero(p, bytes);
else {
for(pe=p+bytes; p<pe; p+=16384)
for(cl=0; cl<cachelines;cl++)
p[cl*128] = 'r';
}
sleep(sleepsec);
state[id] = 2;
pthread_exit(0);
}

2006-08-29 18:57:10

by David Howells

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Andi Kleen <[email protected]> wrote:

> > > BTW maybe it would be a good idea to switch the wait list to a hlist,
> > > then the last user in the queue wouldn't need to
> > > touch the cache line of the head. Or maybe even a single linked
> > > list then some more cache bounces might be avoidable.
> >
> > You need a list_head to get O(1) push at one end and O(1) pop at the other.
>
> The popper should know its node address already because it's on its own stack.

No. The popper (__rwsem_do_wake) runs in the context of up_xxxx(), not
down_xxxx(). Remember: up() may need to wake up several processes if there's
a batch of readers at the front of the queue.

Remember also: rwsems, unlike mutexes, are completely fair.

> > In addition a singly-linked list makes interruptible ops non-O(1) also.
>
> When they are interrupted I guess? Hardly a problem to make that slower.

Currently interruptible rwsems are not available, but that may change, and
whilst I agree making it slower probably isn't a problem, it's still a point
that has to be considered.

David

2006-08-29 19:10:30

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Tue, Aug 29, 2006 at 12:56:54PM +0200, Andi Kleen wrote:
> We've completely given up these kinds of micro optimization for spinlocks,
> which are 1000x as critical as rwsems. And nobody was able to benchmark
> a difference.

That is false. It shows up on things like netperf on the P4, or the AF_UNIX
component of lmbench.

-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[email protected]>.

2006-09-13 17:54:46

by Nick Piggin

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

[sorry, I missed some emails when I was out of action a while back. Catching up...]

Andi Kleen wrote:
>>and chuck out the "crappy" rwsem fallback implementation
>
>
> What is crappy with it?

I just mean that it is supposedly the second-class implementation. Just
seems wrong when powerpc has implemented the "better" version in a completely
generic way (patch to follow).

--
SUSE Labs, Novell Inc.

2006-09-13 18:05:07

by Nick Piggin

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

include/asm-alpha/rwsem.h | 259 -------------------------
include/asm-i386/rwsem.h | 296 -----------------------------
include/asm-ia64/rwsem.h | 178 -----------------
include/asm-powerpc/rwsem.h | 152 --------------
include/asm-s390/rwsem.h | 387 --------------------------------------
include/asm-sh/rwsem.h | 159 ---------------
include/asm-sparc64/rwsem-const.h | 12 -
include/asm-sparc64/rwsem.h | 66 ------
include/asm-um/rwsem.h | 6
include/asm-xtensa/rwsem.h | 164 ----------------
lib/rwsem-spinlock.c | 316 -------------------------------
lib/rwsem.c | 257 -------------------------
linux-2.6/arch/alpha/Kconfig | 7
linux-2.6/arch/arm/Kconfig | 7
linux-2.6/arch/arm26/Kconfig | 7
linux-2.6/arch/cris/Kconfig | 7
linux-2.6/arch/frv/Kconfig | 7
linux-2.6/arch/h8300/Kconfig | 8
linux-2.6/arch/i386/Kconfig.cpu | 10
linux-2.6/arch/ia64/Kconfig | 4
linux-2.6/arch/m32r/Kconfig | 9
linux-2.6/arch/m68k/Kconfig | 7
linux-2.6/arch/m68knommu/Kconfig | 8
linux-2.6/arch/mips/Kconfig | 11 -
linux-2.6/arch/parisc/Kconfig | 6
linux-2.6/arch/powerpc/Kconfig | 7
linux-2.6/arch/ppc/Kconfig | 7
linux-2.6/arch/s390/Kconfig | 7
linux-2.6/arch/sh/Kconfig | 7
linux-2.6/arch/sh64/Kconfig | 7
linux-2.6/arch/sparc/Kconfig | 7
linux-2.6/arch/sparc64/Kconfig | 7
linux-2.6/arch/um/Kconfig.x86_64 | 4
linux-2.6/arch/v850/Kconfig | 6
linux-2.6/arch/x86_64/Kconfig | 7
linux-2.6/arch/xtensa/Kconfig | 4
linux-2.6/include/linux/rwsem.h | 76 ++++++-
linux-2.6/kernel/rwsem.c | 292 +++++++++++++++++++++++++++-
linux-2.6/lib/Makefile | 2
39 files changed, 351 insertions(+), 2439 deletions(-)

Index: linux-2.6/include/asm-powerpc/rwsem.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,152 +0,0 @@
-#ifndef _ASM_POWERPC_RWSEM_H
-#define _ASM_POWERPC_RWSEM_H
-
-#ifdef __KERNEL__
-
-/*
- * include/asm-powerpc/rwsem.h: R/W semaphores for PPC using the stuff
- * in lib/rwsem.c. Adapted largely from include/asm-i386/rwsem.h
- * by Paul Mackerras <[email protected]>.
- */
-
-#include <linux/list.h>
-#include <linux/spinlock.h>
-#include <asm/atomic.h>
-#include <asm/system.h>
-
-/*
- * the semaphore definition
- */
-struct rw_semaphore {
- /* XXX this should be able to be an atomic_t -- paulus */
- signed int count;
-#define RWSEM_UNLOCKED_VALUE 0x00000000
-#define RWSEM_ACTIVE_BIAS 0x00000001
-#define RWSEM_ACTIVE_MASK 0x0000ffff
-#define RWSEM_WAITING_BIAS (-0x00010000)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
- spinlock_t wait_lock;
- struct list_head wait_list;
-};
-
-#define __RWSEM_INITIALIZER(name) \
- { RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, \
- LIST_HEAD_INIT((name).wait_list) }
-
-#define DECLARE_RWSEM(name) \
- struct rw_semaphore name = __RWSEM_INITIALIZER(name)
-
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
-
-static inline void init_rwsem(struct rw_semaphore *sem)
-{
- sem->count = RWSEM_UNLOCKED_VALUE;
- spin_lock_init(&sem->wait_lock);
- INIT_LIST_HEAD(&sem->wait_list);
-}
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
- if (unlikely(atomic_inc_return((atomic_t *)(&sem->count)) <= 0))
- rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
- int tmp;
-
- while ((tmp = sem->count) >= 0) {
- if (tmp == cmpxchg(&sem->count, tmp,
- tmp + RWSEM_ACTIVE_READ_BIAS)) {
- return 1;
- }
- }
- return 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write(struct rw_semaphore *sem)
-{
- int tmp;
-
- tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
- (atomic_t *)(&sem->count));
- if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
- rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
- int tmp;
-
- tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
- RWSEM_ACTIVE_WRITE_BIAS);
- return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
- int tmp;
-
- tmp = atomic_dec_return((atomic_t *)(&sem->count));
- if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
- rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
- if (unlikely(atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
- (atomic_t *)(&sem->count)) < 0))
- rwsem_wake(sem);
-}
-
-/*
- * implement atomic add functionality
- */
-static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
-{
- atomic_add(delta, (atomic_t *)(&sem->count));
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
- int tmp;
-
- tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
- if (tmp < 0)
- rwsem_downgrade_wake(sem);
-}
-
-/*
- * implement exchange and add functionality
- */
-static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
-{
- return atomic_add_return(delta, (atomic_t *)(&sem->count));
-}
-
-static inline int rwsem_is_locked(struct rw_semaphore *sem)
-{
- return (sem->count != 0);
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ASM_POWERPC_RWSEM_H */
Index: linux-2.6/include/linux/rwsem.h
===================================================================
--- linux-2.6.orig/include/linux/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ linux-2.6/include/linux/rwsem.h 2006-09-14 03:46:18.000000000 +1000
@@ -1,7 +1,34 @@
-/* rwsem.h: R/W semaphores, public interface
+/*
+ * rwsem.h: R/W semaphores, public interface
*
* Written by David Howells ([email protected]).
* Derived from asm-i386/semaphore.h
+ *
+ * Also by Paul Mackerras <[email protected]>
+ * and Nick Piggin <[email protected]>
+ *
+ * The MSW of the count is the negated number of active writers and waiting
+ * lockers, and the LSW is the total number of active locks
+ *
+ * The lock count is initialized to 0 (no active and no waiting lockers).
+ *
+ * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
+ * uncontended lock. This can be determined because XADD returns the old value.
+ * Readers increment by 1 and see a positive value when uncontended, negative
+ * if there are writers (and maybe) readers waiting (in which case it goes to
+ * sleep).
+ *
+ * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
+ * be extended to 65534 by manually checking the whole MSW rather than relying
+ * on the S flag.
+ *
+ * The value of ACTIVE_BIAS supports up to 65535 active processes.
+ *
+ * This should be totally fair - if anything is waiting, a process that wants a
+ * lock will go to the back of the queue. When the currently active lock is
+ * released, if there's a writer at the front of the queue, then that and only
+ * that will be woken up; if there's a bunch of consequtive readers at the
+ * front, then they'll all be woken up, but no other readers will be.
*/

#ifndef _LINUX_RWSEM_H
@@ -16,14 +43,53 @@
#include <asm/system.h>
#include <asm/atomic.h>

-struct rw_semaphore;
+/*
+ * the semaphore definition
+ */
+struct rw_semaphore {
+ atomic_t count;
+#define RWSEM_UNLOCKED_VALUE 0x00000000
+#define RWSEM_ACTIVE_BIAS 0x00000001
+#define RWSEM_ACTIVE_MASK 0x0000ffff
+#define RWSEM_WAITING_BIAS 0xffff0000
+#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
+#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+ spinlock_t wait_lock;
+ struct list_head wait_list;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+};

-#ifdef CONFIG_RWSEM_GENERIC_SPINLOCK
-#include <linux/rwsem-spinlock.h> /* use a generic implementation */
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define __RWSEM_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname }
#else
-#include <asm/rwsem.h> /* use an arch-specific implementation */
+# define __RWSEM_DEP_MAP_INIT(lockname)
#endif

+
+#define __RWSEM_INITIALIZER(name) \
+{ ATOMIC_INIT(RWSEM_UNLOCKED_VALUE), SPIN_LOCK_UNLOCKED, \
+ LIST_HEAD_INIT((name).wait_list), __RWSEM_DEP_MAP_INIT(name) }
+
+#define DECLARE_RWSEM(name) \
+ struct rw_semaphore name = __RWSEM_INITIALIZER(name)
+
+extern void __init_rwsem(struct rw_semaphore *sem, const char *name,
+ struct lock_class_key *key);
+
+#define init_rwsem(sem) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ __init_rwsem((sem), #sem, &__key); \
+} while (0)
+
+static inline int rwsem_is_locked(struct rw_semaphore *sem)
+{
+ return (atomic_read(&sem->count) != 0);
+}
+
/*
* lock for reading
*/
Index: linux-2.6/lib/rwsem-spinlock.c
===================================================================
--- linux-2.6.orig/lib/rwsem-spinlock.c 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,316 +0,0 @@
-/* rwsem-spinlock.c: R/W semaphores: contention handling functions for
- * generic spinlock implementation
- *
- * Copyright (c) 2001 David Howells ([email protected]).
- * - Derived partially from idea by Andrea Arcangeli <[email protected]>
- * - Derived also from comments by Linus
- */
-#include <linux/rwsem.h>
-#include <linux/sched.h>
-#include <linux/module.h>
-
-struct rwsem_waiter {
- struct list_head list;
- struct task_struct *task;
- unsigned int flags;
-#define RWSEM_WAITING_FOR_READ 0x00000001
-#define RWSEM_WAITING_FOR_WRITE 0x00000002
-};
-
-/*
- * initialise the semaphore
- */
-void __init_rwsem(struct rw_semaphore *sem, const char *name,
- struct lock_class_key *key)
-{
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
- /*
- * Make sure we are not reinitializing a held semaphore:
- */
- debug_check_no_locks_freed((void *)sem, sizeof(*sem));
- lockdep_init_map(&sem->dep_map, name, key);
-#endif
- sem->activity = 0;
- spin_lock_init(&sem->wait_lock);
- INIT_LIST_HEAD(&sem->wait_list);
-}
-
-/*
- * handle the lock release when processes blocked on it that can now run
- * - if we come here, then:
- * - the 'active count' _reached_ zero
- * - the 'waiting count' is non-zero
- * - the spinlock must be held by the caller
- * - woken process blocks are discarded from the list after having task zeroed
- * - writers are only woken if wakewrite is non-zero
- */
-static inline struct rw_semaphore *
-__rwsem_do_wake(struct rw_semaphore *sem, int wakewrite)
-{
- struct rwsem_waiter *waiter;
- struct task_struct *tsk;
- int woken;
-
- waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
-
- if (!wakewrite) {
- if (waiter->flags & RWSEM_WAITING_FOR_WRITE)
- goto out;
- goto dont_wake_writers;
- }
-
- /* if we are allowed to wake writers try to grant a single write lock
- * if there's a writer at the front of the queue
- * - we leave the 'waiting count' incremented to signify potential
- * contention
- */
- if (waiter->flags & RWSEM_WAITING_FOR_WRITE) {
- sem->activity = -1;
- list_del(&waiter->list);
- tsk = waiter->task;
- /* Don't touch waiter after ->task has been NULLed */
- smp_mb();
- waiter->task = NULL;
- wake_up_process(tsk);
- put_task_struct(tsk);
- goto out;
- }
-
- /* grant an infinite number of read locks to the front of the queue */
- dont_wake_writers:
- woken = 0;
- while (waiter->flags & RWSEM_WAITING_FOR_READ) {
- struct list_head *next = waiter->list.next;
-
- list_del(&waiter->list);
- tsk = waiter->task;
- smp_mb();
- waiter->task = NULL;
- wake_up_process(tsk);
- put_task_struct(tsk);
- woken++;
- if (list_empty(&sem->wait_list))
- break;
- waiter = list_entry(next, struct rwsem_waiter, list);
- }
-
- sem->activity += woken;
-
- out:
- return sem;
-}
-
-/*
- * wake a single writer
- */
-static inline struct rw_semaphore *
-__rwsem_wake_one_writer(struct rw_semaphore *sem)
-{
- struct rwsem_waiter *waiter;
- struct task_struct *tsk;
-
- sem->activity = -1;
-
- waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
- list_del(&waiter->list);
-
- tsk = waiter->task;
- smp_mb();
- waiter->task = NULL;
- wake_up_process(tsk);
- put_task_struct(tsk);
- return sem;
-}
-
-/*
- * get a read lock on the semaphore
- */
-void fastcall __sched __down_read(struct rw_semaphore *sem)
-{
- struct rwsem_waiter waiter;
- struct task_struct *tsk;
-
- spin_lock_irq(&sem->wait_lock);
-
- if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
- /* granted */
- sem->activity++;
- spin_unlock_irq(&sem->wait_lock);
- goto out;
- }
-
- tsk = current;
- set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-
- /* set up my own style of waitqueue */
- waiter.task = tsk;
- waiter.flags = RWSEM_WAITING_FOR_READ;
- get_task_struct(tsk);
-
- list_add_tail(&waiter.list, &sem->wait_list);
-
- /* we don't need to touch the semaphore struct anymore */
- spin_unlock_irq(&sem->wait_lock);
-
- /* wait to be given the lock */
- for (;;) {
- if (!waiter.task)
- break;
- schedule();
- set_task_state(tsk, TASK_UNINTERRUPTIBLE);
- }
-
- tsk->state = TASK_RUNNING;
- out:
- ;
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-int fastcall __down_read_trylock(struct rw_semaphore *sem)
-{
- unsigned long flags;
- int ret = 0;
-
-
- spin_lock_irqsave(&sem->wait_lock, flags);
-
- if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
- /* granted */
- sem->activity++;
- ret = 1;
- }
-
- spin_unlock_irqrestore(&sem->wait_lock, flags);
-
- return ret;
-}
-
-/*
- * get a write lock on the semaphore
- * - we increment the waiting count anyway to indicate an exclusive lock
- */
-void fastcall __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
-{
- struct rwsem_waiter waiter;
- struct task_struct *tsk;
-
- spin_lock_irq(&sem->wait_lock);
-
- if (sem->activity == 0 && list_empty(&sem->wait_list)) {
- /* granted */
- sem->activity = -1;
- spin_unlock_irq(&sem->wait_lock);
- goto out;
- }
-
- tsk = current;
- set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-
- /* set up my own style of waitqueue */
- waiter.task = tsk;
- waiter.flags = RWSEM_WAITING_FOR_WRITE;
- get_task_struct(tsk);
-
- list_add_tail(&waiter.list, &sem->wait_list);
-
- /* we don't need to touch the semaphore struct anymore */
- spin_unlock_irq(&sem->wait_lock);
-
- /* wait to be given the lock */
- for (;;) {
- if (!waiter.task)
- break;
- schedule();
- set_task_state(tsk, TASK_UNINTERRUPTIBLE);
- }
-
- tsk->state = TASK_RUNNING;
- out:
- ;
-}
-
-void fastcall __sched __down_write(struct rw_semaphore *sem)
-{
- __down_write_nested(sem, 0);
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-int fastcall __down_write_trylock(struct rw_semaphore *sem)
-{
- unsigned long flags;
- int ret = 0;
-
- spin_lock_irqsave(&sem->wait_lock, flags);
-
- if (sem->activity == 0 && list_empty(&sem->wait_list)) {
- /* granted */
- sem->activity = -1;
- ret = 1;
- }
-
- spin_unlock_irqrestore(&sem->wait_lock, flags);
-
- return ret;
-}
-
-/*
- * release a read lock on the semaphore
- */
-void fastcall __up_read(struct rw_semaphore *sem)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&sem->wait_lock, flags);
-
- if (--sem->activity == 0 && !list_empty(&sem->wait_list))
- sem = __rwsem_wake_one_writer(sem);
-
- spin_unlock_irqrestore(&sem->wait_lock, flags);
-}
-
-/*
- * release a write lock on the semaphore
- */
-void fastcall __up_write(struct rw_semaphore *sem)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&sem->wait_lock, flags);
-
- sem->activity = 0;
- if (!list_empty(&sem->wait_list))
- sem = __rwsem_do_wake(sem, 1);
-
- spin_unlock_irqrestore(&sem->wait_lock, flags);
-}
-
-/*
- * downgrade a write lock into a read lock
- * - just wake up any readers at the front of the queue
- */
-void fastcall __downgrade_write(struct rw_semaphore *sem)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&sem->wait_lock, flags);
-
- sem->activity = 1;
- if (!list_empty(&sem->wait_list))
- sem = __rwsem_do_wake(sem, 0);
-
- spin_unlock_irqrestore(&sem->wait_lock, flags);
-}
-
-EXPORT_SYMBOL(__init_rwsem);
-EXPORT_SYMBOL(__down_read);
-EXPORT_SYMBOL(__down_read_trylock);
-EXPORT_SYMBOL(__down_write_nested);
-EXPORT_SYMBOL(__down_write);
-EXPORT_SYMBOL(__down_write_trylock);
-EXPORT_SYMBOL(__up_read);
-EXPORT_SYMBOL(__up_write);
-EXPORT_SYMBOL(__downgrade_write);
Index: linux-2.6/include/asm-alpha/rwsem.h
===================================================================
--- linux-2.6.orig/include/asm-alpha/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,259 +0,0 @@
-#ifndef _ALPHA_RWSEM_H
-#define _ALPHA_RWSEM_H
-
-/*
- * Written by Ivan Kokshaysky <[email protected]>, 2001.
- * Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include <linux/compiler.h>
-#include <linux/list.h>
-#include <linux/spinlock.h>
-
-struct rwsem_waiter;
-
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
-extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
-
-/*
- * the semaphore definition
- */
-struct rw_semaphore {
- long count;
-#define RWSEM_UNLOCKED_VALUE 0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS 0x0000000000000001L
-#define RWSEM_ACTIVE_MASK 0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS (-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
- spinlock_t wait_lock;
- struct list_head wait_list;
-};
-
-#define __RWSEM_INITIALIZER(name) \
- { RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, \
- LIST_HEAD_INIT((name).wait_list) }
-
-#define DECLARE_RWSEM(name) \
- struct rw_semaphore name = __RWSEM_INITIALIZER(name)
-
-static inline void init_rwsem(struct rw_semaphore *sem)
-{
- sem->count = RWSEM_UNLOCKED_VALUE;
- spin_lock_init(&sem->wait_lock);
- INIT_LIST_HEAD(&sem->wait_list);
-}
-
-static inline void __down_read(struct rw_semaphore *sem)
-{
- long oldcount;
-#ifndef CONFIG_SMP
- oldcount = sem->count;
- sem->count += RWSEM_ACTIVE_READ_BIAS;
-#else
- long temp;
- __asm__ __volatile__(
- "1: ldq_l %0,%1\n"
- " addq %0,%3,%2\n"
- " stq_c %2,%1\n"
- " beq %2,2f\n"
- " mb\n"
- ".subsection 2\n"
- "2: br 1b\n"
- ".previous"
- :"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
- :"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory");
-#endif
- if (unlikely(oldcount < 0))
- rwsem_down_read_failed(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
- long old, new, res;
-
- res = sem->count;
- do {
- new = res + RWSEM_ACTIVE_READ_BIAS;
- if (new <= 0)
- break;
- old = res;
- res = cmpxchg(&sem->count, old, new);
- } while (res != old);
- return res >= 0 ? 1 : 0;
-}
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
- long oldcount;
-#ifndef CONFIG_SMP
- oldcount = sem->count;
- sem->count += RWSEM_ACTIVE_WRITE_BIAS;
-#else
- long temp;
- __asm__ __volatile__(
- "1: ldq_l %0,%1\n"
- " addq %0,%3,%2\n"
- " stq_c %2,%1\n"
- " beq %2,2f\n"
- " mb\n"
- ".subsection 2\n"
- "2: br 1b\n"
- ".previous"
- :"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
- :"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
-#endif
- if (unlikely(oldcount))
- rwsem_down_write_failed(sem);
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
- long ret = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
- RWSEM_ACTIVE_WRITE_BIAS);
- if (ret == RWSEM_UNLOCKED_VALUE)
- return 1;
- return 0;
-}
-
-static inline void __up_read(struct rw_semaphore *sem)
-{
- long oldcount;
-#ifndef CONFIG_SMP
- oldcount = sem->count;
- sem->count -= RWSEM_ACTIVE_READ_BIAS;
-#else
- long temp;
- __asm__ __volatile__(
- " mb\n"
- "1: ldq_l %0,%1\n"
- " subq %0,%3,%2\n"
- " stq_c %2,%1\n"
- " beq %2,2f\n"
- ".subsection 2\n"
- "2: br 1b\n"
- ".previous"
- :"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
- :"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory");
-#endif
- if (unlikely(oldcount < 0))
- if ((int)oldcount - RWSEM_ACTIVE_READ_BIAS == 0)
- rwsem_wake(sem);
-}
-
-static inline void __up_write(struct rw_semaphore *sem)
-{
- long count;
-#ifndef CONFIG_SMP
- sem->count -= RWSEM_ACTIVE_WRITE_BIAS;
- count = sem->count;
-#else
- long temp;
- __asm__ __volatile__(
- " mb\n"
- "1: ldq_l %0,%1\n"
- " subq %0,%3,%2\n"
- " stq_c %2,%1\n"
- " beq %2,2f\n"
- " subq %0,%3,%0\n"
- ".subsection 2\n"
- "2: br 1b\n"
- ".previous"
- :"=&r" (count), "=m" (sem->count), "=&r" (temp)
- :"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
-#endif
- if (unlikely(count))
- if ((int)count == 0)
- rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
- long oldcount;
-#ifndef CONFIG_SMP
- oldcount = sem->count;
- sem->count -= RWSEM_WAITING_BIAS;
-#else
- long temp;
- __asm__ __volatile__(
- "1: ldq_l %0,%1\n"
- " addq %0,%3,%2\n"
- " stq_c %2,%1\n"
- " beq %2,2f\n"
- " mb\n"
- ".subsection 2\n"
- "2: br 1b\n"
- ".previous"
- :"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
- :"Ir" (-RWSEM_WAITING_BIAS), "m" (sem->count) : "memory");
-#endif
- if (unlikely(oldcount < 0))
- rwsem_downgrade_wake(sem);
-}
-
-static inline void rwsem_atomic_add(long val, struct rw_semaphore *sem)
-{
-#ifndef CONFIG_SMP
- sem->count += val;
-#else
- long temp;
- __asm__ __volatile__(
- "1: ldq_l %0,%1\n"
- " addq %0,%2,%0\n"
- " stq_c %0,%1\n"
- " beq %0,2f\n"
- ".subsection 2\n"
- "2: br 1b\n"
- ".previous"
- :"=&r" (temp), "=m" (sem->count)
- :"Ir" (val), "m" (sem->count));
-#endif
-}
-
-static inline long rwsem_atomic_update(long val, struct rw_semaphore *sem)
-{
-#ifndef CONFIG_SMP
- sem->count += val;
- return sem->count;
-#else
- long ret, temp;
- __asm__ __volatile__(
- "1: ldq_l %0,%1\n"
- " addq %0,%3,%2\n"
- " addq %0,%3,%0\n"
- " stq_c %2,%1\n"
- " beq %2,2f\n"
- ".subsection 2\n"
- "2: br 1b\n"
- ".previous"
- :"=&r" (ret), "=m" (sem->count), "=&r" (temp)
- :"Ir" (val), "m" (sem->count));
-
- return ret;
-#endif
-}
-
-static inline int rwsem_is_locked(struct rw_semaphore *sem)
-{
- return (sem->count != 0);
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ALPHA_RWSEM_H */
Index: linux-2.6/include/asm-i386/rwsem.h
===================================================================
--- linux-2.6.orig/include/asm-i386/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,296 +0,0 @@
-/* rwsem.h: R/W semaphores implemented using XADD/CMPXCHG for i486+
- *
- * Written by David Howells ([email protected]).
- *
- * Derived from asm-i386/semaphore.h
- *
- *
- * The MSW of the count is the negated number of active writers and waiting
- * lockers, and the LSW is the total number of active locks
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
- * uncontended lock. This can be determined because XADD returns the old value.
- * Readers increment by 1 and see a positive value when uncontended, negative
- * if there are writers (and maybe) readers waiting (in which case it goes to
- * sleep).
- *
- * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
- * be extended to 65534 by manually checking the whole MSW rather than relying
- * on the S flag.
- *
- * The value of ACTIVE_BIAS supports up to 65535 active processes.
- *
- * This should be totally fair - if anything is waiting, a process that wants a
- * lock will go to the back of the queue. When the currently active lock is
- * released, if there's a writer at the front of the queue, then that and only
- * that will be woken up; if there's a bunch of consequtive readers at the
- * front, then they'll all be woken up, but no other readers will be.
- */
-
-#ifndef _I386_RWSEM_H
-#define _I386_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include <linux/list.h>
-#include <linux/spinlock.h>
-#include <linux/lockdep.h>
-
-struct rwsem_waiter;
-
-extern struct rw_semaphore *FASTCALL(rwsem_down_read_failed(struct rw_semaphore *sem));
-extern struct rw_semaphore *FASTCALL(rwsem_down_write_failed(struct rw_semaphore *sem));
-extern struct rw_semaphore *FASTCALL(rwsem_wake(struct rw_semaphore *));
-extern struct rw_semaphore *FASTCALL(rwsem_downgrade_wake(struct rw_semaphore *sem));
-
-/*
- * the semaphore definition
- */
-struct rw_semaphore {
- signed long count;
-#define RWSEM_UNLOCKED_VALUE 0x00000000
-#define RWSEM_ACTIVE_BIAS 0x00000001
-#define RWSEM_ACTIVE_MASK 0x0000ffff
-#define RWSEM_WAITING_BIAS (-0x00010000)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
- spinlock_t wait_lock;
- struct list_head wait_list;
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
- struct lockdep_map dep_map;
-#endif
-};
-
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
-#else
-# define __RWSEM_DEP_MAP_INIT(lockname)
-#endif
-
-
-#define __RWSEM_INITIALIZER(name) \
-{ RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, LIST_HEAD_INIT((name).wait_list) \
- __RWSEM_DEP_MAP_INIT(name) }
-
-#define DECLARE_RWSEM(name) \
- struct rw_semaphore name = __RWSEM_INITIALIZER(name)
-
-extern void __init_rwsem(struct rw_semaphore *sem, const char *name,
- struct lock_class_key *key);
-
-#define init_rwsem(sem) \
-do { \
- static struct lock_class_key __key; \
- \
- __init_rwsem((sem), #sem, &__key); \
-} while (0)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
- __asm__ __volatile__(
- "# beginning down_read\n\t"
-LOCK_PREFIX " incl (%%eax)\n\t" /* adds 0x00000001, returns the old value */
- " js 2f\n\t" /* jump if we weren't granted the lock */
- "1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " pushl %%ecx\n\t"
- " pushl %%edx\n\t"
- " call rwsem_down_read_failed\n\t"
- " popl %%edx\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
- "# ending down_read\n\t"
- : "+m" (sem->count)
- : "a" (sem)
- : "memory", "cc");
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
- __s32 result, tmp;
- __asm__ __volatile__(
- "# beginning __down_read_trylock\n\t"
- " movl %0,%1\n\t"
- "1:\n\t"
- " movl %1,%2\n\t"
- " addl %3,%2\n\t"
- " jle 2f\n\t"
-LOCK_PREFIX " cmpxchgl %2,%0\n\t"
- " jnz 1b\n\t"
- "2:\n\t"
- "# ending __down_read_trylock\n\t"
- : "+m" (sem->count), "=&a" (result), "=&r" (tmp)
- : "i" (RWSEM_ACTIVE_READ_BIAS)
- : "memory", "cc");
- return result>=0 ? 1 : 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
-{
- int tmp;
-
- tmp = RWSEM_ACTIVE_WRITE_BIAS;
- __asm__ __volatile__(
- "# beginning down_write\n\t"
-LOCK_PREFIX " xadd %%edx,(%%eax)\n\t" /* subtract 0x0000ffff, returns the old value */
- " testl %%edx,%%edx\n\t" /* was the count 0 before? */
- " jnz 2f\n\t" /* jump if we weren't granted the lock */
- "1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " pushl %%ecx\n\t"
- " call rwsem_down_write_failed\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
- "# ending down_write"
- : "+m" (sem->count), "=d" (tmp)
- : "a" (sem), "1" (tmp)
- : "memory", "cc");
-}
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
- __down_write_nested(sem, 0);
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
- signed long ret = cmpxchg(&sem->count,
- RWSEM_UNLOCKED_VALUE,
- RWSEM_ACTIVE_WRITE_BIAS);
- if (ret == RWSEM_UNLOCKED_VALUE)
- return 1;
- return 0;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
- __s32 tmp = -RWSEM_ACTIVE_READ_BIAS;
- __asm__ __volatile__(
- "# beginning __up_read\n\t"
-LOCK_PREFIX " xadd %%edx,(%%eax)\n\t" /* subtracts 1, returns the old value */
- " js 2f\n\t" /* jump if the lock is being waited upon */
- "1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " decw %%dx\n\t" /* do nothing if still outstanding active readers */
- " jnz 1b\n\t"
- " pushl %%ecx\n\t"
- " call rwsem_wake\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
- "# ending __up_read\n"
- : "+m" (sem->count), "=d" (tmp)
- : "a" (sem), "1" (tmp)
- : "memory", "cc");
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
- __asm__ __volatile__(
- "# beginning __up_write\n\t"
- " movl %2,%%edx\n\t"
-LOCK_PREFIX " xaddl %%edx,(%%eax)\n\t" /* tries to transition 0xffff0001 -> 0x00000000 */
- " jnz 2f\n\t" /* jump if the lock is being waited upon */
- "1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " decw %%dx\n\t" /* did the active count reduce to 0? */
- " jnz 1b\n\t" /* jump back if not */
- " pushl %%ecx\n\t"
- " call rwsem_wake\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
- "# ending __up_write\n"
- : "+m" (sem->count)
- : "a" (sem), "i" (-RWSEM_ACTIVE_WRITE_BIAS)
- : "memory", "cc", "edx");
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
- __asm__ __volatile__(
- "# beginning __downgrade_write\n\t"
-LOCK_PREFIX " addl %2,(%%eax)\n\t" /* transitions 0xZZZZ0001 -> 0xYYYY0001 */
- " js 2f\n\t" /* jump if the lock is being waited upon */
- "1:\n\t"
- LOCK_SECTION_START("")
- "2:\n\t"
- " pushl %%ecx\n\t"
- " pushl %%edx\n\t"
- " call rwsem_downgrade_wake\n\t"
- " popl %%edx\n\t"
- " popl %%ecx\n\t"
- " jmp 1b\n"
- LOCK_SECTION_END
- "# ending __downgrade_write\n"
- : "+m" (sem->count)
- : "a" (sem), "i" (-RWSEM_WAITING_BIAS)
- : "memory", "cc");
-}
-
-/*
- * implement atomic add functionality
- */
-static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
-{
- __asm__ __volatile__(
-LOCK_PREFIX "addl %1,%0"
- : "+m" (sem->count)
- : "ir" (delta));
-}
-
-/*
- * implement exchange and add functionality
- */
-static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
-{
- int tmp = delta;
-
- __asm__ __volatile__(
-LOCK_PREFIX "xadd %0,%1"
- : "+r" (tmp), "+m" (sem->count)
- : : "memory");
-
- return tmp+delta;
-}
-
-static inline int rwsem_is_locked(struct rw_semaphore *sem)
-{
- return (sem->count != 0);
-}
-
-#endif /* __KERNEL__ */
-#endif /* _I386_RWSEM_H */
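For anyone reading along, the counting scheme the removed i386 header describes is
easiest to see with concrete numbers. Below is a minimal stand-alone user-space
sketch of the 32-bit bias arithmetic -- it just reuses the constants from the header
above and is not part of the patch:

#include <stdio.h>

#define RWSEM_UNLOCKED_VALUE     0x00000000
#define RWSEM_ACTIVE_BIAS        0x00000001
#define RWSEM_ACTIVE_MASK        0x0000ffff
#define RWSEM_WAITING_BIAS       (-0x00010000)
#define RWSEM_ACTIVE_READ_BIAS   RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS  (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

int main(void)
{
	signed long count = RWSEM_UNLOCKED_VALUE;

	count += RWSEM_ACTIVE_READ_BIAS;   /* uncontended down_read */
	printf("after down_read:  %08lx (positive -> granted)\n",
	       (unsigned long)count & 0xffffffffUL);
	count -= RWSEM_ACTIVE_READ_BIAS;   /* up_read, back to unlocked */

	count += RWSEM_ACTIVE_WRITE_BIAS;  /* uncontended down_write */
	printf("after down_write: %08lx (the 0xffff0001 the comment mentions)\n",
	       (unsigned long)count & 0xffffffffUL);

	count += RWSEM_ACTIVE_READ_BIAS;   /* a reader arriving now */
	printf("reader contends:  %08lx (negative -> must sleep, active part %lx)\n",
	       (unsigned long)count & 0xffffffffUL, count & RWSEM_ACTIVE_MASK);
	return 0;
}
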
Index: linux-2.6/include/asm-ia64/rwsem.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,178 +0,0 @@
-/*
- * asm-ia64/rwsem.h: R/W semaphores for ia64
- *
- * Copyright (C) 2003 Ken Chen <[email protected]>
- * Copyright (C) 2003 Asit Mallick <[email protected]>
- * Copyright (C) 2005 Christoph Lameter <[email protected]>
- *
- * Based on asm-i386/rwsem.h and other architecture implementation.
- *
- * The MSW of the count is the negated number of active writers and
- * waiting lockers, and the LSW is the total number of active locks.
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffffffff00000001 for
- * the case of an uncontended lock. Readers increment by 1 and see a positive
- * value when uncontended, negative if there are writers (and maybe) readers
- * waiting (in which case it goes to sleep).
- */
-
-#ifndef _ASM_IA64_RWSEM_H
-#define _ASM_IA64_RWSEM_H
-
-#include <linux/list.h>
-#include <linux/spinlock.h>
-
-#include <asm/intrinsics.h>
-
-/*
- * the semaphore definition
- */
-struct rw_semaphore {
- signed long count;
- spinlock_t wait_lock;
- struct list_head wait_list;
-};
-
-#define RWSEM_UNLOCKED_VALUE __IA64_UL_CONST(0x0000000000000000)
-#define RWSEM_ACTIVE_BIAS __IA64_UL_CONST(0x0000000000000001)
-#define RWSEM_ACTIVE_MASK __IA64_UL_CONST(0x00000000ffffffff)
-#define RWSEM_WAITING_BIAS -__IA64_UL_CONST(0x0000000100000000)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-#define __RWSEM_INITIALIZER(name) \
- { RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, \
- LIST_HEAD_INIT((name).wait_list) }
-
-#define DECLARE_RWSEM(name) \
- struct rw_semaphore name = __RWSEM_INITIALIZER(name)
-
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
-
-static inline void
-init_rwsem (struct rw_semaphore *sem)
-{
- sem->count = RWSEM_UNLOCKED_VALUE;
- spin_lock_init(&sem->wait_lock);
- INIT_LIST_HEAD(&sem->wait_list);
-}
-
-/*
- * lock for reading
- */
-static inline void
-__down_read (struct rw_semaphore *sem)
-{
- long result = ia64_fetchadd8_acq((unsigned long *)&sem->count, 1);
-
- if (result < 0)
- rwsem_down_read_failed(sem);
-}
-
-/*
- * lock for writing
- */
-static inline void
-__down_write (struct rw_semaphore *sem)
-{
- long old, new;
-
- do {
- old = sem->count;
- new = old + RWSEM_ACTIVE_WRITE_BIAS;
- } while (cmpxchg_acq(&sem->count, old, new) != old);
-
- if (old != 0)
- rwsem_down_write_failed(sem);
-}
-
-/*
- * unlock after reading
- */
-static inline void
-__up_read (struct rw_semaphore *sem)
-{
- long result = ia64_fetchadd8_rel((unsigned long *)&sem->count, -1);
-
- if (result < 0 && (--result & RWSEM_ACTIVE_MASK) == 0)
- rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void
-__up_write (struct rw_semaphore *sem)
-{
- long old, new;
-
- do {
- old = sem->count;
- new = old - RWSEM_ACTIVE_WRITE_BIAS;
- } while (cmpxchg_rel(&sem->count, old, new) != old);
-
- if (new < 0 && (new & RWSEM_ACTIVE_MASK) == 0)
- rwsem_wake(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int
-__down_read_trylock (struct rw_semaphore *sem)
-{
- long tmp;
- while ((tmp = sem->count) >= 0) {
- if (tmp == cmpxchg_acq(&sem->count, tmp, tmp+1)) {
- return 1;
- }
- }
- return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int
-__down_write_trylock (struct rw_semaphore *sem)
-{
- long tmp = cmpxchg_acq(&sem->count, RWSEM_UNLOCKED_VALUE,
- RWSEM_ACTIVE_WRITE_BIAS);
- return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void
-__downgrade_write (struct rw_semaphore *sem)
-{
- long old, new;
-
- do {
- old = sem->count;
- new = old - RWSEM_WAITING_BIAS;
- } while (cmpxchg_rel(&sem->count, old, new) != old);
-
- if (old < 0)
- rwsem_downgrade_wake(sem);
-}
-
-/*
- * Implement atomic add functionality. These used to be "inline" functions, but GCC v3.1
- * doesn't quite optimize this stuff right and ends up with bad calls to fetchandadd.
- */
-#define rwsem_atomic_add(delta, sem) atomic64_add(delta, (atomic64_t *)(&(sem)->count))
-#define rwsem_atomic_update(delta, sem) atomic64_add_return(delta, (atomic64_t *)(&(sem)->count))
-
-static inline int rwsem_is_locked(struct rw_semaphore *sem)
-{
- return (sem->count != 0);
-}
-
-#endif /* _ASM_IA64_RWSEM_H */
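The ia64 variant above takes the write lock with a compare-and-swap retry loop rather
than a fetch-and-add. A rough user-space equivalent of that loop, using C11 atomics
in place of cmpxchg_acq (the toy_* names are hypothetical and there is no wait queue
here -- this only shows the shape of the algorithm, not kernel code):

#include <stdatomic.h>

#define RWSEM_WAITING_BIAS       (-0x00010000L)
#define RWSEM_ACTIVE_WRITE_BIAS  (RWSEM_WAITING_BIAS + 1L)

struct toy_rwsem {
	_Atomic long count;	/* just the lock word */
};

/* stand-in for rwsem_down_write_failed(): a real rwsem would queue and sleep */
static void toy_write_contended(struct toy_rwsem *sem) { (void)sem; }

static void toy_down_write(struct toy_rwsem *sem)
{
	long old, new;

	/* retry until the CAS installs our bias on top of whatever was there */
	do {
		old = atomic_load_explicit(&sem->count, memory_order_relaxed);
		new = old + RWSEM_ACTIVE_WRITE_BIAS;
	} while (!atomic_compare_exchange_weak_explicit(&sem->count, &old, new,
							memory_order_acquire,
							memory_order_relaxed));

	/* only an old value of 0 means we got the lock uncontended */
	if (old != 0)
		toy_write_contended(sem);
}
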
Index: linux-2.6/include/asm-s390/rwsem.h
===================================================================
--- linux-2.6.orig/include/asm-s390/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,387 +0,0 @@
-#ifndef _S390_RWSEM_H
-#define _S390_RWSEM_H
-
-/*
- * include/asm-s390/rwsem.h
- *
- * S390 version
- * Copyright (C) 2002 IBM Deutschland Entwicklung GmbH, IBM Corporation
- * Author(s): Martin Schwidefsky ([email protected])
- *
- * Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-/*
- *
- * The MSW of the count is the negated number of active writers and waiting
- * lockers, and the LSW is the total number of active locks
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
- * uncontended lock. This can be determined because XADD returns the old value.
- * Readers increment by 1 and see a positive value when uncontended, negative
- * if there are writers (and maybe) readers waiting (in which case it goes to
- * sleep).
- *
- * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
- * be extended to 65534 by manually checking the whole MSW rather than relying
- * on the S flag.
- *
- * The value of ACTIVE_BIAS supports up to 65535 active processes.
- *
- * This should be totally fair - if anything is waiting, a process that wants a
- * lock will go to the back of the queue. When the currently active lock is
- * released, if there's a writer at the front of the queue, then that and only
- * that will be woken up; if there's a bunch of consequtive readers at the
- * front, then they'll all be woken up, but no other readers will be.
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include <linux/list.h>
-#include <linux/spinlock.h>
-
-struct rwsem_waiter;
-
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *);
-extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
-extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *);
-extern struct rw_semaphore *rwsem_downgrade_write(struct rw_semaphore *);
-
-/*
- * the semaphore definition
- */
-struct rw_semaphore {
- signed long count;
- spinlock_t wait_lock;
- struct list_head wait_list;
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
- struct lockdep_map dep_map;
-#endif
-};
-
-#ifndef __s390x__
-#define RWSEM_UNLOCKED_VALUE 0x00000000
-#define RWSEM_ACTIVE_BIAS 0x00000001
-#define RWSEM_ACTIVE_MASK 0x0000ffff
-#define RWSEM_WAITING_BIAS (-0x00010000)
-#else /* __s390x__ */
-#define RWSEM_UNLOCKED_VALUE 0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS 0x0000000000000001L
-#define RWSEM_ACTIVE_MASK 0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS (-0x0000000100000000L)
-#endif /* __s390x__ */
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * initialisation
- */
-
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-# define __RWSEM_DEP_MAP_INIT(lockname) , .dep_map = { .name = #lockname }
-#else
-# define __RWSEM_DEP_MAP_INIT(lockname)
-#endif
-
-#define __RWSEM_INITIALIZER(name) \
-{ RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, LIST_HEAD_INIT((name).wait_list) \
- __RWSEM_DEP_MAP_INIT(name) }
-
-#define DECLARE_RWSEM(name) \
- struct rw_semaphore name = __RWSEM_INITIALIZER(name)
-
-static inline void init_rwsem(struct rw_semaphore *sem)
-{
- sem->count = RWSEM_UNLOCKED_VALUE;
- spin_lock_init(&sem->wait_lock);
- INIT_LIST_HEAD(&sem->wait_list);
-}
-
-extern void __init_rwsem(struct rw_semaphore *sem, const char *name,
- struct lock_class_key *key);
-
-#define init_rwsem(sem) \
-do { \
- static struct lock_class_key __key; \
- \
- __init_rwsem((sem), #sem, &__key); \
-} while (0)
-
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
- signed long old, new;
-
- __asm__ __volatile__(
-#ifndef __s390x__
- " l %0,0(%3)\n"
- "0: lr %1,%0\n"
- " ahi %1,%5\n"
- " cs %0,%1,0(%3)\n"
- " jl 0b"
-#else /* __s390x__ */
- " lg %0,0(%3)\n"
- "0: lgr %1,%0\n"
- " aghi %1,%5\n"
- " csg %0,%1,0(%3)\n"
- " jl 0b"
-#endif /* __s390x__ */
- : "=&d" (old), "=&d" (new), "=m" (sem->count)
- : "a" (&sem->count), "m" (sem->count),
- "i" (RWSEM_ACTIVE_READ_BIAS) : "cc", "memory" );
- if (old < 0)
- rwsem_down_read_failed(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
- signed long old, new;
-
- __asm__ __volatile__(
-#ifndef __s390x__
- " l %0,0(%3)\n"
- "0: ltr %1,%0\n"
- " jm 1f\n"
- " ahi %1,%5\n"
- " cs %0,%1,0(%3)\n"
- " jl 0b\n"
- "1:"
-#else /* __s390x__ */
- " lg %0,0(%3)\n"
- "0: ltgr %1,%0\n"
- " jm 1f\n"
- " aghi %1,%5\n"
- " csg %0,%1,0(%3)\n"
- " jl 0b\n"
- "1:"
-#endif /* __s390x__ */
- : "=&d" (old), "=&d" (new), "=m" (sem->count)
- : "a" (&sem->count), "m" (sem->count),
- "i" (RWSEM_ACTIVE_READ_BIAS) : "cc", "memory" );
- return old >= 0 ? 1 : 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write_nested(struct rw_semaphore *sem, int subclass)
-{
- signed long old, new, tmp;
-
- tmp = RWSEM_ACTIVE_WRITE_BIAS;
- __asm__ __volatile__(
-#ifndef __s390x__
- " l %0,0(%3)\n"
- "0: lr %1,%0\n"
- " a %1,%5\n"
- " cs %0,%1,0(%3)\n"
- " jl 0b"
-#else /* __s390x__ */
- " lg %0,0(%3)\n"
- "0: lgr %1,%0\n"
- " ag %1,%5\n"
- " csg %0,%1,0(%3)\n"
- " jl 0b"
-#endif /* __s390x__ */
- : "=&d" (old), "=&d" (new), "=m" (sem->count)
- : "a" (&sem->count), "m" (sem->count), "m" (tmp)
- : "cc", "memory" );
- if (old != 0)
- rwsem_down_write_failed(sem);
-}
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
- __down_write_nested(sem, 0);
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
- signed long old;
-
- __asm__ __volatile__(
-#ifndef __s390x__
- " l %0,0(%2)\n"
- "0: ltr %0,%0\n"
- " jnz 1f\n"
- " cs %0,%4,0(%2)\n"
- " jl 0b\n"
-#else /* __s390x__ */
- " lg %0,0(%2)\n"
- "0: ltgr %0,%0\n"
- " jnz 1f\n"
- " csg %0,%4,0(%2)\n"
- " jl 0b\n"
-#endif /* __s390x__ */
- "1:"
- : "=&d" (old), "=m" (sem->count)
- : "a" (&sem->count), "m" (sem->count),
- "d" (RWSEM_ACTIVE_WRITE_BIAS) : "cc", "memory" );
- return (old == RWSEM_UNLOCKED_VALUE) ? 1 : 0;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
- signed long old, new;
-
- __asm__ __volatile__(
-#ifndef __s390x__
- " l %0,0(%3)\n"
- "0: lr %1,%0\n"
- " ahi %1,%5\n"
- " cs %0,%1,0(%3)\n"
- " jl 0b"
-#else /* __s390x__ */
- " lg %0,0(%3)\n"
- "0: lgr %1,%0\n"
- " aghi %1,%5\n"
- " csg %0,%1,0(%3)\n"
- " jl 0b"
-#endif /* __s390x__ */
- : "=&d" (old), "=&d" (new), "=m" (sem->count)
- : "a" (&sem->count), "m" (sem->count),
- "i" (-RWSEM_ACTIVE_READ_BIAS)
- : "cc", "memory" );
- if (new < 0)
- if ((new & RWSEM_ACTIVE_MASK) == 0)
- rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
- signed long old, new, tmp;
-
- tmp = -RWSEM_ACTIVE_WRITE_BIAS;
- __asm__ __volatile__(
-#ifndef __s390x__
- " l %0,0(%3)\n"
- "0: lr %1,%0\n"
- " a %1,%5\n"
- " cs %0,%1,0(%3)\n"
- " jl 0b"
-#else /* __s390x__ */
- " lg %0,0(%3)\n"
- "0: lgr %1,%0\n"
- " ag %1,%5\n"
- " csg %0,%1,0(%3)\n"
- " jl 0b"
-#endif /* __s390x__ */
- : "=&d" (old), "=&d" (new), "=m" (sem->count)
- : "a" (&sem->count), "m" (sem->count), "m" (tmp)
- : "cc", "memory" );
- if (new < 0)
- if ((new & RWSEM_ACTIVE_MASK) == 0)
- rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
- signed long old, new, tmp;
-
- tmp = -RWSEM_WAITING_BIAS;
- __asm__ __volatile__(
-#ifndef __s390x__
- " l %0,0(%3)\n"
- "0: lr %1,%0\n"
- " a %1,%5\n"
- " cs %0,%1,0(%3)\n"
- " jl 0b"
-#else /* __s390x__ */
- " lg %0,0(%3)\n"
- "0: lgr %1,%0\n"
- " ag %1,%5\n"
- " csg %0,%1,0(%3)\n"
- " jl 0b"
-#endif /* __s390x__ */
- : "=&d" (old), "=&d" (new), "=m" (sem->count)
- : "a" (&sem->count), "m" (sem->count), "m" (tmp)
- : "cc", "memory" );
- if (new > 1)
- rwsem_downgrade_wake(sem);
-}
-
-/*
- * implement atomic add functionality
- */
-static inline void rwsem_atomic_add(long delta, struct rw_semaphore *sem)
-{
- signed long old, new;
-
- __asm__ __volatile__(
-#ifndef __s390x__
- " l %0,0(%3)\n"
- "0: lr %1,%0\n"
- " ar %1,%5\n"
- " cs %0,%1,0(%3)\n"
- " jl 0b"
-#else /* __s390x__ */
- " lg %0,0(%3)\n"
- "0: lgr %1,%0\n"
- " agr %1,%5\n"
- " csg %0,%1,0(%3)\n"
- " jl 0b"
-#endif /* __s390x__ */
- : "=&d" (old), "=&d" (new), "=m" (sem->count)
- : "a" (&sem->count), "m" (sem->count), "d" (delta)
- : "cc", "memory" );
-}
-
-/*
- * implement exchange and add functionality
- */
-static inline long rwsem_atomic_update(long delta, struct rw_semaphore *sem)
-{
- signed long old, new;
-
- __asm__ __volatile__(
-#ifndef __s390x__
- " l %0,0(%3)\n"
- "0: lr %1,%0\n"
- " ar %1,%5\n"
- " cs %0,%1,0(%3)\n"
- " jl 0b"
-#else /* __s390x__ */
- " lg %0,0(%3)\n"
- "0: lgr %1,%0\n"
- " agr %1,%5\n"
- " csg %0,%1,0(%3)\n"
- " jl 0b"
-#endif /* __s390x__ */
- : "=&d" (old), "=&d" (new), "=m" (sem->count)
- : "a" (&sem->count), "m" (sem->count), "d" (delta)
- : "cc", "memory" );
- return new;
-}
-
-static inline int rwsem_is_locked(struct rw_semaphore *sem)
-{
- return (sem->count != 0);
-}
-
-#endif /* __KERNEL__ */
-#endif /* _S390_RWSEM_H */
Index: linux-2.6/include/asm-sh/rwsem.h
===================================================================
--- linux-2.6.orig/include/asm-sh/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,159 +0,0 @@
-/*
- * include/asm-ppc/rwsem.h: R/W semaphores for SH using the stuff
- * in lib/rwsem.c.
- */
-
-#ifndef _ASM_SH_RWSEM_H
-#define _ASM_SH_RWSEM_H
-
-#ifdef __KERNEL__
-#include <linux/list.h>
-#include <linux/spinlock.h>
-#include <asm/atomic.h>
-#include <asm/system.h>
-
-/*
- * the semaphore definition
- */
-struct rw_semaphore {
- long count;
-#define RWSEM_UNLOCKED_VALUE 0x00000000
-#define RWSEM_ACTIVE_BIAS 0x00000001
-#define RWSEM_ACTIVE_MASK 0x0000ffff
-#define RWSEM_WAITING_BIAS (-0x00010000)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
- spinlock_t wait_lock;
- struct list_head wait_list;
-};
-
-#define __RWSEM_INITIALIZER(name) \
- { RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, \
- LIST_HEAD_INIT((name).wait_list) }
-
-#define DECLARE_RWSEM(name) \
- struct rw_semaphore name = __RWSEM_INITIALIZER(name)
-
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
-
-static inline void init_rwsem(struct rw_semaphore *sem)
-{
- sem->count = RWSEM_UNLOCKED_VALUE;
- spin_lock_init(&sem->wait_lock);
- INIT_LIST_HEAD(&sem->wait_list);
-}
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
- if (atomic_inc_return((atomic_t *)(&sem->count)) > 0)
- smp_wmb();
- else
- rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
- int tmp;
-
- while ((tmp = sem->count) >= 0) {
- if (tmp == cmpxchg(&sem->count, tmp,
- tmp + RWSEM_ACTIVE_READ_BIAS)) {
- smp_wmb();
- return 1;
- }
- }
- return 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write(struct rw_semaphore *sem)
-{
- int tmp;
-
- tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
- (atomic_t *)(&sem->count));
- if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
- smp_wmb();
- else
- rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
- int tmp;
-
- tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
- RWSEM_ACTIVE_WRITE_BIAS);
- smp_wmb();
- return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
- int tmp;
-
- smp_wmb();
- tmp = atomic_dec_return((atomic_t *)(&sem->count));
- if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
- rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
- smp_wmb();
- if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
- (atomic_t *)(&sem->count)) < 0)
- rwsem_wake(sem);
-}
-
-/*
- * implement atomic add functionality
- */
-static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
-{
- atomic_add(delta, (atomic_t *)(&sem->count));
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
- int tmp;
-
- smp_wmb();
- tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
- if (tmp < 0)
- rwsem_downgrade_wake(sem);
-}
-
-/*
- * implement exchange and add functionality
- */
-static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
-{
- smp_mb();
- return atomic_add_return(delta, (atomic_t *)(&sem->count));
-}
-
-static inline int rwsem_is_locked(struct rw_semaphore *sem)
-{
- return (sem->count != 0);
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ASM_SH_RWSEM_H */
Index: linux-2.6/include/asm-sparc64/rwsem-const.h
===================================================================
--- linux-2.6.orig/include/asm-sparc64/rwsem-const.h 2005-06-22 12:56:58.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,12 +0,0 @@
-/* rwsem-const.h: RW semaphore counter constants. */
-#ifndef _SPARC64_RWSEM_CONST_H
-#define _SPARC64_RWSEM_CONST_H
-
-#define RWSEM_UNLOCKED_VALUE 0x00000000
-#define RWSEM_ACTIVE_BIAS 0x00000001
-#define RWSEM_ACTIVE_MASK 0x0000ffff
-#define RWSEM_WAITING_BIAS 0xffff0000
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-#endif /* _SPARC64_RWSEM_CONST_H */
Index: linux-2.6/include/asm-sparc64/rwsem.h
===================================================================
--- linux-2.6.orig/include/asm-sparc64/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,66 +0,0 @@
-/* $Id: rwsem.h,v 1.5 2001/11/18 00:12:56 davem Exp $
- * rwsem.h: R/W semaphores implemented using CAS
- *
- * Written by David S. Miller ([email protected]), 2001.
- * Derived from asm-i386/rwsem.h
- */
-#ifndef _SPARC64_RWSEM_H
-#define _SPARC64_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include <linux/list.h>
-#include <linux/spinlock.h>
-#include <asm/rwsem-const.h>
-
-struct rwsem_waiter;
-
-struct rw_semaphore {
- signed int count;
- spinlock_t wait_lock;
- struct list_head wait_list;
-};
-
-#define __RWSEM_INITIALIZER(name) \
-{ RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, LIST_HEAD_INIT((name).wait_list) }
-
-#define DECLARE_RWSEM(name) \
- struct rw_semaphore name = __RWSEM_INITIALIZER(name)
-
-static __inline__ void init_rwsem(struct rw_semaphore *sem)
-{
- sem->count = RWSEM_UNLOCKED_VALUE;
- spin_lock_init(&sem->wait_lock);
- INIT_LIST_HEAD(&sem->wait_list);
-}
-
-extern void __down_read(struct rw_semaphore *sem);
-extern int __down_read_trylock(struct rw_semaphore *sem);
-extern void __down_write(struct rw_semaphore *sem);
-extern int __down_write_trylock(struct rw_semaphore *sem);
-extern void __up_read(struct rw_semaphore *sem);
-extern void __up_write(struct rw_semaphore *sem);
-extern void __downgrade_write(struct rw_semaphore *sem);
-
-static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
-{
- return atomic_add_return(delta, (atomic_t *)(&sem->count));
-}
-
-static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
-{
- atomic_add(delta, (atomic_t *)(&sem->count));
-}
-
-static inline int rwsem_is_locked(struct rw_semaphore *sem)
-{
- return (sem->count != 0);
-}
-
-#endif /* __KERNEL__ */
-
-#endif /* _SPARC64_RWSEM_H */
Index: linux-2.6/include/asm-um/rwsem.h
===================================================================
--- linux-2.6.orig/include/asm-um/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,6 +0,0 @@
-#ifndef __UM_RWSEM_H__
-#define __UM_RWSEM_H__
-
-#include "asm/arch/rwsem.h"
-
-#endif
Index: linux-2.6/include/asm-xtensa/rwsem.h
===================================================================
--- linux-2.6.orig/include/asm-xtensa/rwsem.h 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,164 +0,0 @@
-/*
- * include/asm-xtensa/rwsem.h
- *
- * This file is subject to the terms and conditions of the GNU General Public
- * License. See the file "COPYING" in the main directory of this archive
- * for more details.
- *
- * Largely copied from include/asm-ppc/rwsem.h
- *
- * Copyright (C) 2001 - 2005 Tensilica Inc.
- */
-
-#ifndef _XTENSA_RWSEM_H
-#define _XTENSA_RWSEM_H
-
-#include <linux/list.h>
-#include <linux/spinlock.h>
-#include <asm/atomic.h>
-#include <asm/system.h>
-
-/*
- * the semaphore definition
- */
-struct rw_semaphore {
- signed long count;
-#define RWSEM_UNLOCKED_VALUE 0x00000000
-#define RWSEM_ACTIVE_BIAS 0x00000001
-#define RWSEM_ACTIVE_MASK 0x0000ffff
-#define RWSEM_WAITING_BIAS (-0x00010000)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
- spinlock_t wait_lock;
- struct list_head wait_list;
-};
-
-#define __RWSEM_INITIALIZER(name) \
- { RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, \
- LIST_HEAD_INIT((name).wait_list) }
-
-#define DECLARE_RWSEM(name) \
- struct rw_semaphore name = __RWSEM_INITIALIZER(name)
-
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
-
-static inline void init_rwsem(struct rw_semaphore *sem)
-{
- sem->count = RWSEM_UNLOCKED_VALUE;
- spin_lock_init(&sem->wait_lock);
- INIT_LIST_HEAD(&sem->wait_list);
-}
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
- if (atomic_add_return(1,(atomic_t *)(&sem->count)) > 0)
- smp_wmb();
- else
- rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
- int tmp;
-
- while ((tmp = sem->count) >= 0) {
- if (tmp == cmpxchg(&sem->count, tmp,
- tmp + RWSEM_ACTIVE_READ_BIAS)) {
- smp_wmb();
- return 1;
- }
- }
- return 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write(struct rw_semaphore *sem)
-{
- int tmp;
-
- tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS,
- (atomic_t *)(&sem->count));
- if (tmp == RWSEM_ACTIVE_WRITE_BIAS)
- smp_wmb();
- else
- rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
- int tmp;
-
- tmp = cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
- RWSEM_ACTIVE_WRITE_BIAS);
- smp_wmb();
- return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
- int tmp;
-
- smp_wmb();
- tmp = atomic_sub_return(1,(atomic_t *)(&sem->count));
- if (tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0)
- rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
- smp_wmb();
- if (atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
- (atomic_t *)(&sem->count)) < 0)
- rwsem_wake(sem);
-}
-
-/*
- * implement atomic add functionality
- */
-static inline void rwsem_atomic_add(int delta, struct rw_semaphore *sem)
-{
- atomic_add(delta, (atomic_t *)(&sem->count));
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
- int tmp;
-
- smp_wmb();
- tmp = atomic_add_return(-RWSEM_WAITING_BIAS, (atomic_t *)(&sem->count));
- if (tmp < 0)
- rwsem_downgrade_wake(sem);
-}
-
-/*
- * implement exchange and add functionality
- */
-static inline int rwsem_atomic_update(int delta, struct rw_semaphore *sem)
-{
- smp_mb();
- return atomic_add_return(delta, (atomic_t *)(&sem->count));
-}
-
-static inline int rwsem_is_locked(struct rw_semaphore *sem)
-{
- return (sem->count != 0);
-}
-
-#endif /* _XTENSA_RWSEM_H */
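Worth noting: the sh and xtensa headers above are already little more than
atomic_add_return() wrappers, which is essentially what the patch hoists into
kernel/rwsem.c for every architecture. The read-side fast path boils down to the
following user-space sketch (C11 atomics, hypothetical toy_* names, not the kernel
API):

#include <stdatomic.h>

struct toy_rwsem {
	_Atomic int count;	/* low half: active locks, high half: -(writers + waiters) */
};

static void toy_read_contended(struct toy_rwsem *sem) { (void)sem; /* queue and sleep */ }
static void toy_read_wake(struct toy_rwsem *sem)      { (void)sem; /* wake waiters   */ }

static void toy_down_read(struct toy_rwsem *sem)
{
	/* fetch_add returns the old value, so add 1 and test the new one */
	if (atomic_fetch_add(&sem->count, 1) + 1 <= 0)
		toy_read_contended(sem);	/* a writer holds or waits for the lock */
}

static void toy_up_read(struct toy_rwsem *sem)
{
	int newval = atomic_fetch_sub(&sem->count, 1) - 1;

	/* last active reader leaving while something waits: run the wake path */
	if (newval < -1 && (newval & 0x0000ffff) == 0)
		toy_read_wake(sem);
}
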
Index: linux-2.6/kernel/rwsem.c
===================================================================
--- linux-2.6.orig/kernel/rwsem.c 2006-08-05 18:38:48.000000000 +1000
+++ linux-2.6/kernel/rwsem.c 2006-09-14 03:47:06.000000000 +1000
@@ -13,6 +13,249 @@
#include <asm/atomic.h>

/*
+ * Initialize an rwsem:
+ */
+void __init_rwsem(struct rw_semaphore *sem, const char *name,
+ struct lock_class_key *key)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ /*
+ * Make sure we are not reinitializing a held semaphore:
+ */
+ debug_check_no_locks_freed((void *)sem, sizeof(*sem));
+ lockdep_init_map(&sem->dep_map, name, key);
+#endif
+ atomic_set(&sem->count, RWSEM_UNLOCKED_VALUE);
+ spin_lock_init(&sem->wait_lock);
+ INIT_LIST_HEAD(&sem->wait_list);
+}
+
+EXPORT_SYMBOL(__init_rwsem);
+
+struct rwsem_waiter {
+ struct list_head list;
+ struct task_struct *task;
+ unsigned int flags;
+#define RWSEM_WAITING_FOR_READ 0x00000001
+#define RWSEM_WAITING_FOR_WRITE 0x00000002
+};
+
+/*
+ * handle the lock release when processes blocked on it can now run
+ * - if we come here from up_xxxx(), then:
+ * - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
+ * - the 'waiting part' of count (&0xffff0000) is -ve (and will still be so)
+ * - there must be someone on the queue
+ * - the spinlock must be held by the caller
+ * - woken process blocks are discarded from the list after having task zeroed
+ * - writers are only woken if downgrading is false
+ */
+static struct rw_semaphore *__rwsem_do_wake(struct rw_semaphore *sem,
+ int downgrading)
+{
+ struct rwsem_waiter *waiter;
+ struct task_struct *tsk;
+ struct list_head *next;
+ signed long oldcount, woken, loop;
+
+ if (downgrading)
+ goto dont_wake_writers;
+
+ /* if we came through an up_xxxx() call, we only wake someone up
+ * if we can transition the active part of the count from 0 -> 1
+ */
+ try_again:
+ oldcount = atomic_add_return(RWSEM_ACTIVE_BIAS, &sem->count)
+ - RWSEM_ACTIVE_BIAS;
+ if (oldcount & RWSEM_ACTIVE_MASK)
+ goto undo;
+
+ waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
+
+ /* try to grant a single write lock if there's a writer at the front
+ * of the queue - note we leave the 'active part' of the count
+ * incremented by 1 and the waiting part incremented by 0x00010000
+ */
+ if (!(waiter->flags & RWSEM_WAITING_FOR_WRITE))
+ goto readers_only;
+
+ /* We must be careful not to touch 'waiter' after we set ->task = NULL.
+ * It is allocated on the waiter's stack and may become invalid at
+ * any time after that point (due to a wakeup from another source).
+ */
+ list_del(&waiter->list);
+ tsk = waiter->task;
+ smp_mb();
+ waiter->task = NULL;
+ wake_up_process(tsk);
+ put_task_struct(tsk);
+ goto out;
+
+ /* don't want to wake any writers */
+ dont_wake_writers:
+ waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
+ if (waiter->flags & RWSEM_WAITING_FOR_WRITE)
+ goto out;
+
+ /* grant an infinite number of read locks to the readers at the front
+ * of the queue
+ * - note we increment the 'active part' of the count by the number of
+ * readers before waking any processes up
+ */
+ readers_only:
+ woken = 0;
+ do {
+ woken++;
+
+ if (waiter->list.next == &sem->wait_list)
+ break;
+
+ waiter = list_entry(waiter->list.next,
+ struct rwsem_waiter, list);
+
+ } while (waiter->flags & RWSEM_WAITING_FOR_READ);
+
+ loop = woken;
+ woken *= RWSEM_ACTIVE_BIAS - RWSEM_WAITING_BIAS;
+ if (!downgrading)
+ /* we'd already done one increment earlier */
+ woken -= RWSEM_ACTIVE_BIAS;
+
+ atomic_add(woken, &sem->count);
+
+ next = sem->wait_list.next;
+ for (; loop > 0; loop--) {
+ waiter = list_entry(next, struct rwsem_waiter, list);
+ next = waiter->list.next;
+ tsk = waiter->task;
+ smp_mb();
+ waiter->task = NULL;
+ wake_up_process(tsk);
+ put_task_struct(tsk);
+ }
+
+ sem->wait_list.next = next;
+ next->prev = &sem->wait_list;
+
+ out:
+ return sem;
+
+ /* undo the change to count, but check for a transition 1->0 */
+ undo:
+ if (atomic_add_return(-RWSEM_ACTIVE_BIAS, &sem->count) != 0)
+ goto out;
+ goto try_again;
+}
+
+/*
+ * wait for a lock to be granted
+ */
+static struct rw_semaphore *rwsem_down_failed_common(struct rw_semaphore *sem,
+ struct rwsem_waiter *waiter, signed long adjustment)
+{
+ struct task_struct *tsk = current;
+ signed long count;
+
+ set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+
+ /* set up my own style of waitqueue */
+ spin_lock_irq(&sem->wait_lock);
+ waiter->task = tsk;
+ get_task_struct(tsk);
+
+ list_add_tail(&waiter->list, &sem->wait_list);
+
+ /* we're now waiting on the lock, but no longer actively read-locking */
+ count = atomic_add_return(adjustment, &sem->count);
+
+ /* if there are no active locks, wake the front queued process(es) up */
+ if (!(count & RWSEM_ACTIVE_MASK))
+ sem = __rwsem_do_wake(sem, 0);
+
+ spin_unlock_irq(&sem->wait_lock);
+
+ /* wait to be given the lock */
+ for (;;) {
+ if (!waiter->task)
+ break;
+ schedule();
+ set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+ }
+
+ tsk->state = TASK_RUNNING;
+
+ return sem;
+}
+
+/*
+ * wait for the read lock to be granted
+ */
+static struct rw_semaphore __sched *rwsem_down_read_failed(
+ struct rw_semaphore *sem)
+{
+ struct rwsem_waiter waiter;
+
+ waiter.flags = RWSEM_WAITING_FOR_READ;
+ rwsem_down_failed_common(sem, &waiter,
+ RWSEM_WAITING_BIAS - RWSEM_ACTIVE_BIAS);
+ return sem;
+}
+
+/*
+ * wait for the write lock to be granted
+ */
+static struct rw_semaphore __sched *rwsem_down_write_failed(
+ struct rw_semaphore *sem)
+{
+ struct rwsem_waiter waiter;
+
+ waiter.flags = RWSEM_WAITING_FOR_WRITE;
+ rwsem_down_failed_common(sem, &waiter, -RWSEM_ACTIVE_BIAS);
+
+ return sem;
+}
+
+/*
+ * handle waking up a waiter on the semaphore
+ * - up_read/up_write has decremented the active part of count if we come here
+ */
+static struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&sem->wait_lock, flags);
+
+ /* do nothing if list empty */
+ if (!list_empty(&sem->wait_list))
+ sem = __rwsem_do_wake(sem, 0);
+
+ spin_unlock_irqrestore(&sem->wait_lock, flags);
+
+ return sem;
+}
+
+/*
+ * downgrade a write lock into a read lock
+ * - caller incremented waiting part of count and discovered it still negative
+ * - just wake up any readers at the front of the queue
+ */
+static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&sem->wait_lock, flags);
+
+ /* do nothing if list empty */
+ if (!list_empty(&sem->wait_list))
+ sem = __rwsem_do_wake(sem, 1);
+
+ spin_unlock_irqrestore(&sem->wait_lock, flags);
+
+ return sem;
+}
+
+
+/*
* lock for reading
*/
void down_read(struct rw_semaphore *sem)
@@ -20,7 +263,8 @@ void down_read(struct rw_semaphore *sem)
might_sleep();
rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);

- __down_read(sem);
+ if (unlikely(atomic_inc_return(&sem->count) <= 0))
+ rwsem_down_read_failed(sem);
}

EXPORT_SYMBOL(down_read);
@@ -30,11 +274,16 @@ EXPORT_SYMBOL(down_read);
*/
int down_read_trylock(struct rw_semaphore *sem)
{
- int ret = __down_read_trylock(sem);
+ int tmp;

- if (ret == 1)
- rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
- return ret;
+ while ((tmp = atomic_read(&sem->count)) >= 0) {
+ if (tmp == atomic_cmpxchg(&sem->count, tmp,
+ tmp + RWSEM_ACTIVE_READ_BIAS)) {
+ rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
+ return 1;
+ }
+ }
+ return 0;
}

EXPORT_SYMBOL(down_read_trylock);
@@ -44,10 +293,14 @@ EXPORT_SYMBOL(down_read_trylock);
*/
void down_write(struct rw_semaphore *sem)
{
+ int tmp;
+
might_sleep();
rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);

- __down_write(sem);
+ tmp = atomic_add_return(RWSEM_ACTIVE_WRITE_BIAS, &sem->count);
+ if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+ rwsem_down_write_failed(sem);
}

EXPORT_SYMBOL(down_write);
@@ -57,11 +310,15 @@ EXPORT_SYMBOL(down_write);
*/
int down_write_trylock(struct rw_semaphore *sem)
{
- int ret = __down_write_trylock(sem);
+ int tmp;

- if (ret == 1)
+ tmp = atomic_cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
+ RWSEM_ACTIVE_WRITE_BIAS);
+ if (tmp == RWSEM_UNLOCKED_VALUE) {
rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
- return ret;
+ return 1;
+ }
+ return 0;
}

EXPORT_SYMBOL(down_write_trylock);
@@ -71,9 +328,13 @@ EXPORT_SYMBOL(down_write_trylock);
*/
void up_read(struct rw_semaphore *sem)
{
+ int tmp;
+
rwsem_release(&sem->dep_map, 1, _RET_IP_);

- __up_read(sem);
+ tmp = atomic_dec_return(&sem->count);
+ if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
+ rwsem_wake(sem);
}

EXPORT_SYMBOL(up_read);
@@ -85,7 +346,9 @@ void up_write(struct rw_semaphore *sem)
{
rwsem_release(&sem->dep_map, 1, _RET_IP_);

- __up_write(sem);
+ if (unlikely(atomic_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
+ &sem->count) < 0))
+ rwsem_wake(sem);
}

EXPORT_SYMBOL(up_write);
@@ -95,11 +358,16 @@ EXPORT_SYMBOL(up_write);
*/
void downgrade_write(struct rw_semaphore *sem)
{
+ int tmp;
+
/*
* lockdep: a downgraded write will live on as a write
* dependency.
*/
- __downgrade_write(sem);
+
+ tmp = atomic_add_return(-RWSEM_WAITING_BIAS, &sem->count);
+ if (tmp < 0)
+ rwsem_downgrade_wake(sem);
}

EXPORT_SYMBOL(downgrade_write);
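One line in the moved __rwsem_do_wake() deserves a concrete example: each queued
reader contributed RWSEM_WAITING_BIAS while waiting, so waking N readers has to add
N active biases and cancel N waiting biases in a single atomic_add(). A quick
stand-alone check of that arithmetic with the 32-bit constants (illustration only,
not kernel code):

#include <stdio.h>

#define RWSEM_ACTIVE_BIAS   0x00000001L
#define RWSEM_WAITING_BIAS  (-0x00010000L)

int main(void)
{
	long woken = 3;		/* say three readers sit at the front of the queue */
	long adjust = woken * (RWSEM_ACTIVE_BIAS - RWSEM_WAITING_BIAS);

	/* the up_xxxx() path already added one ACTIVE_BIAS in the try_again loop */
	adjust -= RWSEM_ACTIVE_BIAS;

	printf("count adjustment for %ld woken readers: %#lx\n", woken, adjust);
	/* 3 * 0x10001 - 1 = 0x30002: with the earlier +1 this grants three
	 * active read biases and removes three waiting biases in one go */
	return 0;
}
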
Index: linux-2.6/lib/rwsem.c
===================================================================
--- linux-2.6.orig/lib/rwsem.c 2006-09-14 03:18:49.000000000 +1000
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,257 +0,0 @@
-/* rwsem.c: R/W semaphores: contention handling functions
- *
- * Written by David Howells ([email protected]).
- * Derived from arch/i386/kernel/semaphore.c
- */
-#include <linux/rwsem.h>
-#include <linux/sched.h>
-#include <linux/init.h>
-#include <linux/module.h>
-
-/*
- * Initialize an rwsem:
- */
-void __init_rwsem(struct rw_semaphore *sem, const char *name,
- struct lock_class_key *key)
-{
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
- /*
- * Make sure we are not reinitializing a held semaphore:
- */
- debug_check_no_locks_freed((void *)sem, sizeof(*sem));
- lockdep_init_map(&sem->dep_map, name, key);
-#endif
- sem->count = RWSEM_UNLOCKED_VALUE;
- spin_lock_init(&sem->wait_lock);
- INIT_LIST_HEAD(&sem->wait_list);
-}
-
-EXPORT_SYMBOL(__init_rwsem);
-
-struct rwsem_waiter {
- struct list_head list;
- struct task_struct *task;
- unsigned int flags;
-#define RWSEM_WAITING_FOR_READ 0x00000001
-#define RWSEM_WAITING_FOR_WRITE 0x00000002
-};
-
-/*
- * handle the lock release when processes blocked on it that can now run
- * - if we come here from up_xxxx(), then:
- * - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
- * - the 'waiting part' of count (&0xffff0000) is -ve (and will still be so)
- * - there must be someone on the queue
- * - the spinlock must be held by the caller
- * - woken process blocks are discarded from the list after having task zeroed
- * - writers are only woken if downgrading is false
- */
-static inline struct rw_semaphore *
-__rwsem_do_wake(struct rw_semaphore *sem, int downgrading)
-{
- struct rwsem_waiter *waiter;
- struct task_struct *tsk;
- struct list_head *next;
- signed long oldcount, woken, loop;
-
- if (downgrading)
- goto dont_wake_writers;
-
- /* if we came through an up_xxxx() call, we only only wake someone up
- * if we can transition the active part of the count from 0 -> 1
- */
- try_again:
- oldcount = rwsem_atomic_update(RWSEM_ACTIVE_BIAS, sem)
- - RWSEM_ACTIVE_BIAS;
- if (oldcount & RWSEM_ACTIVE_MASK)
- goto undo;
-
- waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
-
- /* try to grant a single write lock if there's a writer at the front
- * of the queue - note we leave the 'active part' of the count
- * incremented by 1 and the waiting part incremented by 0x00010000
- */
- if (!(waiter->flags & RWSEM_WAITING_FOR_WRITE))
- goto readers_only;
-
- /* We must be careful not to touch 'waiter' after we set ->task = NULL.
- * It is an allocated on the waiter's stack and may become invalid at
- * any time after that point (due to a wakeup from another source).
- */
- list_del(&waiter->list);
- tsk = waiter->task;
- smp_mb();
- waiter->task = NULL;
- wake_up_process(tsk);
- put_task_struct(tsk);
- goto out;
-
- /* don't want to wake any writers */
- dont_wake_writers:
- waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
- if (waiter->flags & RWSEM_WAITING_FOR_WRITE)
- goto out;
-
- /* grant an infinite number of read locks to the readers at the front
- * of the queue
- * - note we increment the 'active part' of the count by the number of
- * readers before waking any processes up
- */
- readers_only:
- woken = 0;
- do {
- woken++;
-
- if (waiter->list.next == &sem->wait_list)
- break;
-
- waiter = list_entry(waiter->list.next,
- struct rwsem_waiter, list);
-
- } while (waiter->flags & RWSEM_WAITING_FOR_READ);
-
- loop = woken;
- woken *= RWSEM_ACTIVE_BIAS - RWSEM_WAITING_BIAS;
- if (!downgrading)
- /* we'd already done one increment earlier */
- woken -= RWSEM_ACTIVE_BIAS;
-
- rwsem_atomic_add(woken, sem);
-
- next = sem->wait_list.next;
- for (; loop > 0; loop--) {
- waiter = list_entry(next, struct rwsem_waiter, list);
- next = waiter->list.next;
- tsk = waiter->task;
- smp_mb();
- waiter->task = NULL;
- wake_up_process(tsk);
- put_task_struct(tsk);
- }
-
- sem->wait_list.next = next;
- next->prev = &sem->wait_list;
-
- out:
- return sem;
-
- /* undo the change to count, but check for a transition 1->0 */
- undo:
- if (rwsem_atomic_update(-RWSEM_ACTIVE_BIAS, sem) != 0)
- goto out;
- goto try_again;
-}
-
-/*
- * wait for a lock to be granted
- */
-static inline struct rw_semaphore *
-rwsem_down_failed_common(struct rw_semaphore *sem,
- struct rwsem_waiter *waiter, signed long adjustment)
-{
- struct task_struct *tsk = current;
- signed long count;
-
- set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-
- /* set up my own style of waitqueue */
- spin_lock_irq(&sem->wait_lock);
- waiter->task = tsk;
- get_task_struct(tsk);
-
- list_add_tail(&waiter->list, &sem->wait_list);
-
- /* we're now waiting on the lock, but no longer actively read-locking */
- count = rwsem_atomic_update(adjustment, sem);
-
- /* if there are no active locks, wake the front queued process(es) up */
- if (!(count & RWSEM_ACTIVE_MASK))
- sem = __rwsem_do_wake(sem, 0);
-
- spin_unlock_irq(&sem->wait_lock);
-
- /* wait to be given the lock */
- for (;;) {
- if (!waiter->task)
- break;
- schedule();
- set_task_state(tsk, TASK_UNINTERRUPTIBLE);
- }
-
- tsk->state = TASK_RUNNING;
-
- return sem;
-}
-
-/*
- * wait for the read lock to be granted
- */
-struct rw_semaphore fastcall __sched *
-rwsem_down_read_failed(struct rw_semaphore *sem)
-{
- struct rwsem_waiter waiter;
-
- waiter.flags = RWSEM_WAITING_FOR_READ;
- rwsem_down_failed_common(sem, &waiter,
- RWSEM_WAITING_BIAS - RWSEM_ACTIVE_BIAS);
- return sem;
-}
-
-/*
- * wait for the write lock to be granted
- */
-struct rw_semaphore fastcall __sched *
-rwsem_down_write_failed(struct rw_semaphore *sem)
-{
- struct rwsem_waiter waiter;
-
- waiter.flags = RWSEM_WAITING_FOR_WRITE;
- rwsem_down_failed_common(sem, &waiter, -RWSEM_ACTIVE_BIAS);
-
- return sem;
-}
-
-/*
- * handle waking up a waiter on the semaphore
- * - up_read/up_write has decremented the active part of count if we come here
- */
-struct rw_semaphore fastcall *rwsem_wake(struct rw_semaphore *sem)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&sem->wait_lock, flags);
-
- /* do nothing if list empty */
- if (!list_empty(&sem->wait_list))
- sem = __rwsem_do_wake(sem, 0);
-
- spin_unlock_irqrestore(&sem->wait_lock, flags);
-
- return sem;
-}
-
-/*
- * downgrade a write lock into a read lock
- * - caller incremented waiting part of count and discovered it still negative
- * - just wake up any readers at the front of the queue
- */
-struct rw_semaphore fastcall *rwsem_downgrade_wake(struct rw_semaphore *sem)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&sem->wait_lock, flags);
-
- /* do nothing if list empty */
- if (!list_empty(&sem->wait_list))
- sem = __rwsem_do_wake(sem, 1);
-
- spin_unlock_irqrestore(&sem->wait_lock, flags);
-
- return sem;
-}
-
-EXPORT_SYMBOL(rwsem_down_read_failed);
-EXPORT_SYMBOL(rwsem_down_write_failed);
-EXPORT_SYMBOL(rwsem_wake);
-EXPORT_SYMBOL(rwsem_downgrade_wake);
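Nothing about the calling convention changes with this consolidation; callers keep
using the same entry points, only the fast path behind them becomes generic. For
completeness, a typical user of the API looks like the following (the example_*
names are made up for illustration and are not part of the patch):

#include <linux/rwsem.h>

static DECLARE_RWSEM(example_sem);	/* hypothetical semaphore */
static int example_value;

int example_read(void)
{
	int v;

	down_read(&example_sem);	/* many readers may hold this concurrently */
	v = example_value;
	up_read(&example_sem);
	return v;
}

void example_write(int v)
{
	down_write(&example_sem);	/* exclusive; sleeps while readers or a writer hold it */
	example_value = v;
	up_write(&example_sem);
}
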
Index: linux-2.6/arch/alpha/Kconfig
===================================================================
--- linux-2.6.orig/arch/alpha/Kconfig 2006-06-28 02:35:47.000000000 +1000
+++ linux-2.6/arch/alpha/Kconfig 2006-09-14 03:39:45.000000000 +1000
@@ -18,13 +18,6 @@ config MMU
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
- default y
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/arm/Kconfig
===================================================================
--- linux-2.6.orig/arch/arm/Kconfig 2006-08-05 18:36:24.000000000 +1000
+++ linux-2.6/arch/arm/Kconfig 2006-09-14 03:39:49.000000000 +1000
@@ -59,13 +59,6 @@ config GENERIC_IRQ_PROBE
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_HWEIGHT
bool
default y
Index: linux-2.6/arch/arm26/Kconfig
===================================================================
--- linux-2.6.orig/arch/arm26/Kconfig 2006-08-05 18:36:30.000000000 +1000
+++ linux-2.6/arch/arm26/Kconfig 2006-09-14 03:39:52.000000000 +1000
@@ -34,13 +34,6 @@ config FORCE_MAX_ZONEORDER
int
default 9

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_HWEIGHT
bool
default y
Index: linux-2.6/arch/cris/Kconfig
===================================================================
--- linux-2.6.orig/arch/cris/Kconfig 2006-08-05 18:36:30.000000000 +1000
+++ linux-2.6/arch/cris/Kconfig 2006-09-14 03:39:54.000000000 +1000
@@ -9,13 +9,6 @@ config MMU
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/frv/Kconfig
===================================================================
--- linux-2.6.orig/arch/frv/Kconfig 2006-04-15 20:19:19.000000000 +1000
+++ linux-2.6/arch/frv/Kconfig 2006-09-14 03:39:57.000000000 +1000
@@ -6,13 +6,6 @@ config FRV
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/h8300/Kconfig
===================================================================
--- linux-2.6.orig/arch/h8300/Kconfig 2006-04-15 20:19:19.000000000 +1000
+++ linux-2.6/arch/h8300/Kconfig 2006-09-14 03:40:03.000000000 +1000
@@ -21,14 +21,6 @@ config FPU
bool
default n

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
- default n
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/ia64/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/Kconfig 2006-09-13 13:16:22.000000000 +1000
+++ linux-2.6/arch/ia64/Kconfig 2006-09-14 03:40:13.000000000 +1000
@@ -30,10 +30,6 @@ config SWIOTLB
bool
default y

-config RWSEM_XCHGADD_ALGORITHM
- bool
- default y
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/m32r/Kconfig
===================================================================
--- linux-2.6.orig/arch/m32r/Kconfig 2006-04-20 18:54:51.000000000 +1000
+++ linux-2.6/arch/m32r/Kconfig 2006-09-14 03:40:19.000000000 +1000
@@ -205,15 +205,6 @@ config IRAM_SIZE
# Define implied options from the CPU selection here
#

-config RWSEM_GENERIC_SPINLOCK
- bool
- depends on M32R
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
- default n
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/m68k/Kconfig
===================================================================
--- linux-2.6.orig/arch/m68k/Kconfig 2006-04-15 20:19:20.000000000 +1000
+++ linux-2.6/arch/m68k/Kconfig 2006-09-14 03:40:22.000000000 +1000
@@ -10,13 +10,6 @@ config MMU
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_HWEIGHT
bool
default y
Index: linux-2.6/arch/m68knommu/Kconfig
===================================================================
--- linux-2.6.orig/arch/m68knommu/Kconfig 2006-08-05 18:36:40.000000000 +1000
+++ linux-2.6/arch/m68knommu/Kconfig 2006-09-14 03:40:26.000000000 +1000
@@ -17,14 +17,6 @@ config FPU
bool
default n

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
- default n
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/mips/Kconfig
===================================================================
--- linux-2.6.orig/arch/mips/Kconfig 2006-08-05 18:36:40.000000000 +1000
+++ linux-2.6/arch/mips/Kconfig 2006-09-14 03:42:45.000000000 +1000
@@ -837,13 +837,6 @@ source "arch/mips/cobalt/Kconfig"

endmenu

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_FIND_NEXT_BIT
bool
default y
@@ -1845,10 +1838,6 @@ config MIPS_INSANE_LARGE

endmenu

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
source "init/Kconfig"

menu "Bus options (PCI, PCMCIA, EISA, ISA, TC)"
Index: linux-2.6/arch/parisc/Kconfig
===================================================================
--- linux-2.6.orig/arch/parisc/Kconfig 2006-08-05 18:36:47.000000000 +1000
+++ linux-2.6/arch/parisc/Kconfig 2006-09-14 03:40:33.000000000 +1000
@@ -19,12 +19,6 @@ config MMU
config STACK_GROWSUP
def_bool y

-config RWSEM_GENERIC_SPINLOCK
- def_bool y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.orig/arch/powerpc/Kconfig 2006-09-13 13:16:22.000000000 +1000
+++ linux-2.6/arch/powerpc/Kconfig 2006-09-14 03:40:36.000000000 +1000
@@ -34,13 +34,6 @@ config IRQ_PER_CPU
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
- default y
-
config GENERIC_HWEIGHT
bool
default y
Index: linux-2.6/arch/ppc/Kconfig
===================================================================
--- linux-2.6.orig/arch/ppc/Kconfig 2006-08-05 18:36:52.000000000 +1000
+++ linux-2.6/arch/ppc/Kconfig 2006-09-14 03:40:38.000000000 +1000
@@ -12,13 +12,6 @@ config GENERIC_HARDIRQS
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
- default y
-
config GENERIC_HWEIGHT
bool
default y
Index: linux-2.6/arch/s390/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/Kconfig 2006-08-05 18:36:54.000000000 +1000
+++ linux-2.6/arch/s390/Kconfig 2006-09-14 03:40:41.000000000 +1000
@@ -15,13 +15,6 @@ config STACKTRACE_SUPPORT
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
- default y
-
config GENERIC_HWEIGHT
bool
default y
Index: linux-2.6/arch/sh/Kconfig
===================================================================
--- linux-2.6.orig/arch/sh/Kconfig 2006-08-05 18:36:55.000000000 +1000
+++ linux-2.6/arch/sh/Kconfig 2006-09-14 03:40:44.000000000 +1000
@@ -14,13 +14,6 @@ config SUPERH
gaming console. The SuperH port has a home page at
<http://www.linux-sh.org/>.

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/sh64/Kconfig
===================================================================
--- linux-2.6.orig/arch/sh64/Kconfig 2006-04-15 20:19:22.000000000 +1000
+++ linux-2.6/arch/sh64/Kconfig 2006-09-14 03:43:10.000000000 +1000
@@ -17,10 +17,6 @@ config MMU
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
config GENERIC_FIND_NEXT_BIT
bool
default y
@@ -33,9 +29,6 @@ config GENERIC_CALIBRATE_DELAY
bool
default y

-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_ISA_DMA
bool

Index: linux-2.6/arch/sparc/Kconfig
===================================================================
--- linux-2.6.orig/arch/sparc/Kconfig 2006-04-15 20:19:22.000000000 +1000
+++ linux-2.6/arch/sparc/Kconfig 2006-09-14 03:40:51.000000000 +1000
@@ -143,13 +143,6 @@ config SUN_IO
bool
default y

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/sparc64/Kconfig
===================================================================
--- linux-2.6.orig/arch/sparc64/Kconfig 2006-08-05 18:36:59.000000000 +1000
+++ linux-2.6/arch/sparc64/Kconfig 2006-09-14 03:40:56.000000000 +1000
@@ -159,13 +159,6 @@ config US2E_FREQ
If in doubt, say N.

# Global things across all Sun machines.
-config RWSEM_GENERIC_SPINLOCK
- bool
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
- default y
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/v850/Kconfig
===================================================================
--- linux-2.6.orig/arch/v850/Kconfig 2006-04-15 20:19:23.000000000 +1000
+++ linux-2.6/arch/v850/Kconfig 2006-09-14 03:41:02.000000000 +1000
@@ -10,12 +10,6 @@ mainmenu "uClinux/v850 (w/o MMU) Kernel
config MMU
bool
default n
-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-config RWSEM_XCHGADD_ALGORITHM
- bool
- default n
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.orig/arch/x86_64/Kconfig 2006-08-05 18:37:01.000000000 +1000
+++ linux-2.6/arch/x86_64/Kconfig 2006-09-14 03:41:05.000000000 +1000
@@ -46,13 +46,6 @@ config ISA
config SBUS
bool

-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
-
config GENERIC_HWEIGHT
bool
default y
Index: linux-2.6/arch/xtensa/Kconfig
===================================================================
--- linux-2.6.orig/arch/xtensa/Kconfig 2006-08-05 18:37:03.000000000 +1000
+++ linux-2.6/arch/xtensa/Kconfig 2006-09-14 03:41:08.000000000 +1000
@@ -18,10 +18,6 @@ config XTENSA
with reasonable minimum requirements. The Xtensa Linux project has
a home page at <http://xtensa.sourceforge.net/>.

-config RWSEM_XCHGADD_ALGORITHM
- bool
- default y
-
config GENERIC_FIND_NEXT_BIT
bool
default y
Index: linux-2.6/arch/i386/Kconfig.cpu
===================================================================
--- linux-2.6.orig/arch/i386/Kconfig.cpu 2006-08-05 18:36:33.000000000 +1000
+++ linux-2.6/arch/i386/Kconfig.cpu 2006-09-14 03:42:35.000000000 +1000
@@ -230,16 +230,6 @@ config X86_L1_CACHE_SHIFT
default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
default "6" if MK7 || MK8 || MPENTIUMM

-config RWSEM_GENERIC_SPINLOCK
- bool
- depends on M386
- default y
-
-config RWSEM_XCHGADD_ALGORITHM
- bool
- depends on !M386
- default y
-
config GENERIC_CALIBRATE_DELAY
bool
default y
Index: linux-2.6/arch/um/Kconfig.x86_64
===================================================================
--- linux-2.6.orig/arch/um/Kconfig.x86_64 2006-04-15 20:19:22.000000000 +1000
+++ linux-2.6/arch/um/Kconfig.x86_64 2006-09-14 03:43:07.000000000 +1000
@@ -7,10 +7,6 @@ config 64BIT
default y

#XXX: this is so in the underlying arch, but it's wrong!!!
-config RWSEM_GENERIC_SPINLOCK
- bool
- default y
-
config SEMAPHORE_SLEEPERS
bool
default y
Index: linux-2.6/lib/Makefile
===================================================================
--- linux-2.6.orig/lib/Makefile 2006-08-05 18:38:48.000000000 +1000
+++ linux-2.6/lib/Makefile 2006-09-14 03:47:18.000000000 +1000
@@ -20,8 +20,6 @@ endif

obj-$(CONFIG_DEBUG_LOCKING_API_SELFTESTS) += locking-selftest.o
obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
-lib-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
-lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
lib-$(CONFIG_SEMAPHORE_SLEEPERS) += semaphore-sleepers.o
lib-$(CONFIG_GENERIC_FIND_NEXT_BIT) += find_next_bit.o
lib-$(CONFIG_GENERIC_HWEIGHT) += hweight.o


Attachments:
rwsem-generic.patch (84.70 kB)

2006-09-13 18:07:48

by Christoph Lameter

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Thu, 14 Sep 2006, Nick Piggin wrote:

> Comments?

You only support 64k waiters. We have had cases of software failing
because more than 64k readers were waiting.

Please send patches inline so that we can comment.

2006-09-13 18:13:16

by Nick Piggin

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Christoph Lameter wrote:
> On Thu, 14 Sep 2006, Nick Piggin wrote:
>
>
>>Comments?
>
>
> You only support 64k waiters. We have had cases of software failing
> because more than 64k readers were waiting.

Oh really? OK I figured if ppc64 was OK then that would be enough,
but your large Altix systems did slip my mind.

That is a fair criticism... atomic_long it will have to be, then.
That will require a bit of atomic work to get atomic64_cmpxchg
available on all 64-bit architectures.

> Please send patches inline so that we can comment.

OK I will for next patchset.

--
SUSE Labs, Novell Inc.

2006-09-13 18:16:58

by Christoph Lameter

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

On Thu, 14 Sep 2006, Nick Piggin wrote:

> Oh really? OK I figured if ppc64 was OK then that would be enough,
> but your large Altix systems did slip my mind.

Look at the ia64 rwsem implementation in include/asm-ia64/rwsem.h.

> That is a fair criticism... atomic_long it will have to be, then.
> That will require a bit of atomic work to get atomic64_cmpxchg
> available on all 64-bit architectures.

I would greatly appreciate having that.

2006-09-13 18:51:17

by David Howells

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Nick Piggin <[email protected]> wrote:

> + while ((tmp = atomic_read(&sem->count)) >= 0) {
> + if (tmp == atomic_cmpxchg(&sem->count, tmp,
> + tmp + RWSEM_ACTIVE_READ_BIAS)) {

NAK for FRV. Do not use atomic_cmpxchg() there as it isn't strictly atomic
(FRV only has one strictly atomic operation: SWAP). Please leave FRV as using
the spinlock version which is more efficient on UP.

Please also provide benchmarks showing the performance difference between your
version and the old version before Ingo messed it up and made everything
unconditionally out of line without cleaning the inline asm up.

If you are going to generalise, you should get rid of everything barring the
spinlock-based version and stick with that. It will cost you performance
under some circumstances, but it's better under others than attempting to use
atomic_cmpxchg() which may not really exist on all archs.

You've also caused another problem: the spinlock based version permits up to
2^31 - 1 readers at one time, whereas the inline optimised version, on a 32-bit
arch, will only permit up to 2^16 - 1. By doing this to x86_64, you've
reduced the number of processes it can support.
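
For reference, the 2^16 - 1 figure comes from the classic i386-style count
layout, roughly as below (a sketch of the usual constants, shown purely for
illustration):

/*
 * 32-bit XADD rwsem count layout: the low 16 bits count active lockers,
 * the upper 16 bits track waiters, so at most 2^16 - 1 active readers.
 */
#define RWSEM_UNLOCKED_VALUE		0x00000000
#define RWSEM_ACTIVE_BIAS		0x00000001
#define RWSEM_ACTIVE_MASK		0x0000ffff
#define RWSEM_WAITING_BIAS		(-0x00010000)
#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)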

David

2006-09-13 19:25:55

by Nick Piggin

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

David Howells wrote:
> Nick Piggin <[email protected]> wrote:
>
>
>>+ while ((tmp = atomic_read(&sem->count)) >= 0) {
>>+ if (tmp == atomic_cmpxchg(&sem->count, tmp,
>>+ tmp + RWSEM_ACTIVE_READ_BIAS)) {
>
>
> NAK for FRV. Do not use atomic_cmpxchg() there as it isn't strictly atomic
> (FRV only has one strictly atomic operation: SWAP). Please leave FRV as using
> the spinlock version which is more efficient on UP.

From what I can read of the asm and the documentation, it is atomic. If
it were not atomic then it is badly broken.

BTW. if atomic_cmpxchg is slower than a local_irq_disable+local_irq_enable
on your architecture then you have probably not implemented cmpxchg well
because you may as well just implement it with local_irq_disable.
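
The fallback being alluded to is essentially the standard UP-only pattern,
along these lines (a sketch, not FRV's actual code):

/* UP-only atomic_cmpxchg() built on interrupt disabling. */
static inline int atomic_cmpxchg(atomic_t *v, int old, int new)
{
	unsigned long flags;
	int prev;

	local_irq_save(flags);		/* excluding interrupts is enough on UP */
	prev = v->counter;
	if (prev == old)
		v->counter = new;
	local_irq_restore(flags);
	return prev;
}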

> Please also provide benchmarks showing the performance difference between your
> version and the old version before Ingo messed it up and made everything
> unconditionally out of line without cleaning the inline asm up.

I'm not so interested in counting cycles so much as consolidating duplicated
code and reducing complexity and icache footprint. I'll leave you to benchmark
Ingo's changes if you're concerned about them.

Of course I will benchmark the end results when I finish the patch, though.

> If you are going to generalise, you should get rid of everything barring the
> spinlock-based version and stick with that. It will cost you performance
> under some circumstances, but it's better under others than attempting to use
> atomic_cmpxchg() which may not really exist on all archs.

atomic_cmpxchg exists on all architectures.

I'm happy to go with the spinlock version (it is even simpler), but I will
have to benchmark that. I have seen small slowdowns there in high contention
situations but it was improved with my rwsem scalability patch. If
performance is no different between the two, then the spinlock version is a
no brainer.

> You've also caused another problem: the spinlock based version permits up to
> 2^31 - 1 readers at one time, whereas the inline optimised version, on a 32-bit
> arch, will only permit up to 2^16 - 1. By doing this to x86_64, you've
> reduced the number of processes it can support.

Yep, Christoph pointed this out. I'll fix it.

--
SUSE Labs, Novell Inc.

2006-09-14 11:41:50

by David Howells

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Nick Piggin <[email protected]> wrote:

> > NAK for FRV. Do not use atomic_cmpxchg() there as it isn't strictly
> > atomic (FRV only has one strictly atomic operation: SWAP). Please leave
> > FRV as using the spinlock version which is more efficient on UP.
>
> From what I can read of the asm and the documentation, it is atomic. If
> it were not atomic then it is badly broken.

It is not strictly atomic because there is no way to do it strictly atomically
with the instruction set available on the FRV CPUs. What I have is okay for
UP, though not the most efficient, but will not work on SMP.

On SMP, atomic_cmpxchg() will have to be implemented with spinlocks unless
Fujitsu add something more powerful than SWAP to the available instructions.

In which case, the spinlock-based rwsems are instantly better because you only
need to take the spinlocks once, not several times, in the slow-path case.

And on UP, the spinlocks devolve to anti-preemption controls and interrupt
disablement[*], all of which is very quick and can be packed.

[*] Using JIT disablement; *actually* disabling the interrupts is very slow.

> BTW. if atomic_cmpxchg is slower than a local_irq_disable+local_irq_enable
> on your architecture then you have probably not implemented cmpxchg well
> because you may as well just implement it with local_irq_disable.

Using local_irq_disable() is not sufficient to implement cmpxchg() on anything
but a UP system, but on a UP system, you don't actually need to do cmpxchg() at
all.

If I have JIT disablement, and I don't have preemption enabled, then the
spinlock wrapping on the fastpath is this:

andcc gr0,gr0,gr0,icc2 // 1 cycle
...
oricc gr0,#1,gr0,icc2 // 1 cycle
tihi icc2,gr0,#2 // 1 cycle if no pending interrupt

which, as long as there weren't any interrupts, is about as fast as I can get
it.

It occurs to me that I could possibly improve the performance of the
spinlock-based rwsems by making use of bit 30 of the count to indicate stuff
waiting on the queue.

> I'm not so interested in counting cycles so much as consolidating duplicated
> code

There's a reason that the common parts of the code are in lib/.

> and reducing complexity

Not so. The spinlock-based rwsems are less complex.

> and icache footprint.

And smaller.

And the optimised rwsems don't necessarily have a larger footprint. After all,
now that Ingo has moved them completely out of line, the compiler actually has
to encode a call to them. That means some of the values it currently has in
registers are going to get clobbered, and it has to arrange for the arguments
to be in the right registers. Now I grant you that this is very much arch
dependent, and mostly applies to i386, where only three registers get clobbered
by a call (which we can reduce to one).

> atomic_cmpxchg exists on all architectures.

But that doesn't mean you should blindly use it without consideration of the
consequences, or use it when there's a better way available.

> I'm happy to go with the spinlock version (it is even simpler), but I will
> have to benchmark that. I have seen small slowdowns there in high contention
> situations but it was improved with my rwsem scalability patch. If
> performance is no different between the two, then the spinlock version is a
> no brainer.

Using the spinlock-based rwsems in preference to XADD, for instance, is
potentially a source of greater contention, because (hopefully) XADD, in the
ideal case, will touch the cacheline containing the rwsem once, and that'll be
it, whereas with the spinlock version you have to touch the cacheline at least
four times (spinlock grab, examine data, modify data, spinlock drop), and that
means the cacheline can go elsewhere in between.
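
Concretely, the two down_read fast paths being compared look roughly like this
in plain C (a sketch: the real XADD version is inline asm, and the slow-path
helpers here are stand-ins):

/* XADD-style fast path: a single atomic RMW on sem->count. */
static inline void __down_read_xadd(struct rw_semaphore *sem)
{
	if (unlikely(atomic_add_return(RWSEM_ACTIVE_READ_BIAS, &sem->count) < 0))
		rwsem_down_read_failed(sem);		/* contended: queue and sleep */
}

/* Spinlock-based fast path: grab lock, examine, modify, drop lock. */
static inline void __down_read_spinlock(struct rw_semaphore *sem)
{
	spin_lock_irq(&sem->wait_lock);
	if (likely(sem->activity >= 0 && list_empty(&sem->wait_list)))
		sem->activity++;			/* granted */
	else
		rwsem_down_read_failed_locked(sem);	/* contended */
	spin_unlock_irq(&sem->wait_lock);
}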

Using an emulated cmpxchg in preference to the spinlock-based rwsems, on the
other hand, may be as bad in the fast path because you may have to get an
implicit spinlock in the cmpxchg implementation, and in the slowpath it's worse
because not only do you have to get the rwsem-spinlock overall, but you also
have to get the cmpxchg-lock at regular intervals. So you've got two
cachelines potentially bouncing around.
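
The "implicit spinlock" in question is the sparc32-style hashed-lock emulation,
roughly as follows (a simplified sketch):

/* Spinlock-emulated atomics: a small hash of locks guards plain
 * loads/stores of the counter -- the second cacheline referred to above. */
#define ATOMIC_HASH_SIZE	4
#define ATOMIC_HASH(a) \
	(&__atomic_hash[(((unsigned long)(a)) >> 8) & (ATOMIC_HASH_SIZE - 1)])

static spinlock_t __atomic_hash[ATOMIC_HASH_SIZE];

int atomic_cmpxchg(atomic_t *v, int old, int new)
{
	unsigned long flags;
	int ret;

	spin_lock_irqsave(ATOMIC_HASH(v), flags);
	ret = v->counter;
	if (ret == old)
		v->counter = new;
	spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
	return ret;
}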

Note that the same arguments apply to the use of just CMPXCHG (eg: sparc) or to
ST/LL or equivalent constructs (eg: mips, alpha, powerpc, arm6) because there's
a gap between the store and the load, and that gives an opportunity for the
cacheline to get stolen, and if that happens, there's a chance that the
instruction will have to be repeated.
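
An LL/SC-based atomic_add_return makes that window explicit; a powerpc-flavoured
sketch (memory barriers omitted for brevity):

static inline int atomic_add_return(int a, atomic_t *v)
{
	int t;

	__asm__ __volatile__(
"1:	lwarx	%0,0,%2\n"	/* load and reserve v->counter */
"	add	%0,%1,%0\n"
"	stwcx.	%0,0,%2\n"	/* store-conditional: fails if the line was lost */
"	bne-	1b"		/* ...in which case retry from the lwarx */
	: "=&r" (t)
	: "r" (a), "r" (&v->counter)
	: "cc", "memory");

	return t;
}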

As far as I can tell, XADD or direct equivalent is the only one where this
isn't true, but even that may be implemented by ST/LL-equivalents internally in
the CPU.

As I said before, if all you've got is CMPXCHG, you can actually do better than
either algorithm we currently have.

> my rwsem scalability patch

Ummm... I don't remember what that changed; can you send it to me again?


What is it you're trying to achieve anyway? Are you trying to do
generalisation for generalisation's sake? If so, consider that that might not
be the best thing, and that you might end up with something that rocks for one
or two archs and sucks for the rest.

Look at the genirq thing for an example of that... :-)

David

2006-09-14 15:28:08

by Nick Piggin

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

David Howells wrote:
> Nick Piggin <[email protected]> wrote:
>>From what I can read of the asm and the documentation, it is atomic. If
>>it were not atomic then it is badly broken.
>
> It is not strictly atomic because there is no way to do it strictly atomically
> with the instruction set available on the FRV CPUs. What I have is okay for

It is atomic. It is an indivisible operation with respect to other atomic_
instructions. I fail to see what the problem is here.

> UP, though not the most efficient, but will not work on SMP.

Maybe, but that will be a problem for your atomic operations, not code that
uses them. Your "spinlocks" don't currently work on SMP either ;)

> On SMP, atomic_cmpxchg() will have to be implemented with spinlocks unless
> Fujitsu add something more powerful than SWAP to the available instructions.
>
> In which case, the spinlock-based rwsems are instantly better because you only
> need to take the spinlocks once, not several times, in the slow-path case.

The moment you're doing atomics with spinlocks, of course one would love to
rewrite many data structures to use spinlocks rather than atomics for best
performance. Luckily we don't see an asm-sparc/struct-page.h. You have to
draw the line somewhere.

> And on UP, the spinlocks devolve to anti-preemption controls and interrupt
> disablement[*], all of which is very quick and can be packed.
>
> [*] Using JIT disablement; *actually* disabling the interrupts is very slow.

So you can use your virtual interrupt disabling for atomic_cmpxchg too. And
if you implement atomics with spinlocks in future then UP devolves exactly
the same way.

>>BTW. if atomic_cmpxchg is slower than a local_irq_disable+local_irq_enable
>>on your architecture then you have probably not implemented cmpxchg well
>>because you may as well just implement it with local_irq_disable.
>
>
> Using local_irq_disable() is not sufficient to implement cmpxchg() on anything
> but a UP system, but on a UP system, you don't actually need to do cmpxchg() at
> all.
>
> If I have JIT disablement, and I don't have preemption enabled, then the
> spinlock wrapping on the fastpath is this:
>
> andcc gr0,gr0,gr0,icc2 // 1 cycle
> ...
> oricc gr0,#1,gr0,icc2 // 1 cycle
> tihi icc2,gr0,#2 // 1 cycle if no pending interrupt
>
> which, as long as there weren't any interrupts, is about as fast as I can get
> it.

And you could use exactly that to implement atomics too. Hence my point about
your atomic_cmpxchg not being optimal if it is worse than this.

> It occurs to me that I could possibly improve the performance of the
> spinlock-based rwsems by making use of bit 30 of the count to indicate stuff
> waiting on the queue.
>
>
>>I'm not so interested in counting cycles so much as consolidating duplicated
>>code
>
>
> There's a reason that the common parts of the code are in lib/.

Duplicated in that we have two complete implementations to do the same job.
Increasing test and review coverage is a big plus. Before x86-64, there
were really no SMP architectures that mattered using the spinlock version.
There have been subtle memory ordering bugs in there in the past, for
example.

>>and reducing complexity
>
>
> Not so. The spinlock-based rwsems are less complex.

I'm talking about overall complexity. At the moment you have 2 different designs,
one of which is implemented in tricky ways in assembly by 8 architectures. Going
to a single design, all implemented in common code (no asm), is a huge reduction
in complexity.

>>and icache footprint.
>
>
> And smaller.

It is actually larger here.

text data bss dec hex filename
970 0 0 970 3ca lib/rwsem-spinlock.o
576 0 0 576 240 kernel/spinlock.o
=1546

text data bss dec hex filename
1310 0 0 1310 51e lib/rwsem.o
193 0 0 193 c1 kernel/rwsem.o
=1503

> And the optimised rwsems don't necessarily have a larger footprint. After all,
> now that Ingo has moved them completely out of line, the compiler actually has
> to encode a call to them. That means some of the values it currently has in
> registers are going to get clobbered, and it has to arrange for the arguments
> to be in the right registers. Now I grant you that this is very much arch
> dependent, and mostly applies to i386, where only three registers get clobbered
> by a call (which we can reduce to one).

My patch does nothing to move them back inline. It might be worthwhile, but
I see spinlocks and mutexes are out of line too now... I don't see the
memory trend reversing any time soon ;)

>>atomic_cmpxchg exists on all architectures.
>
>
> But that doesn't mean you should blindly use it without consideration of the
> consequences, or use it when there's a better way available.

Of course. Luckily, I haven't.

>>I'm happy to go with the spinlock version (it is even simpler), but I will
>>have to benchmark that. I have seen small slowdowns there in high contention
>>situations but it was improved with my rwsem scalability patch. If
>>performance is no different between the two, then the spinlock version is a
>>no brainer.
>
>
> Using the spinlock-based rwsems in preference to XADD, for instance, is
> potentially a source of greater contention, because (hopefully) XADD, in the
> ideal case, will touch the cacheline containing the rwsem once, and that'll be
> it, whereas with the spinlock version you have to touch the cacheline at least
> four times (spinlock grab, examine data, modify data, spinlock drop), and that
> means the cacheline can go elsewhere in between.

Yes, that's why I first chose the atomic_add_return variant for the generic
implementation.

> Using an emulated cmpxchg in preference to the spinlock-based rwsems, on the
> other hand, may be as bad in the fast path because you may have to get an
> implicit spinlock in the cmpxchg implementation, and in the slowpath it's worse
> because not only do you have to get the rwsem-spinlock overall, but you also
> have to get the cmpxchg-lock at regular intervals. So you've got two
> cachelines potentially bouncing around.

Yep, using spinlocks for atomics sucks in general. Making a tiny speedup
to rwsems isn't going to help sparc or parisc too much.

> Note that the same arguments apply to the use of just CMPXCHG (eg: sparc) or to
> ST/LL or equivalent constructs (eg: mips, alpha, powerpc, arm6) because there's
> a gap between the store and the load, and that gives an opportunity for the
> cacheline to get stolen, and if that happens, there's a chance that the
> instruction will have to be repeated.
>
> As far as I can tell, XADD or direct equivalent is the only one where this
> isn't true, but even that may be implemented by ST/LL-equivalents internally in
> the CPU.

Why would xadd be different from ll/sc based xadd? Or cmpxchg for that matter?

The CPU controls the cache at this point anyway, so I'm sure if it gave any
benefit it could elect to hold onto the cacheline for a few more cycles or
until the operation completes. Leave that to the hardware designers.

> As I said before, if all you've got is CMPXCHG, you can actually do better than
> either algorithm we currently have.
>
>
>>my rwsem scalability patch
>
>
> Ummm... I don't remember what that changed; can you send it to me again?

I'll port it and do some testing and get back to you.

> What is it you're trying to achieve anyway? Are you trying to do
> generalisation for generalisation's sake? If so, consider that that might not
> be the best thing, and that you might end up with something that rocks for one
> or two archs and sucks for the rest.

What we're talking about is likely to be an unmeasurable performance
difference for real workloads. And even then only on SMP archs that
implement atomic ops with spinlocks. (In fact, it is probably faster on real
workloads where the icache matters).

Don't get me wrong, I most definitely will be interested in benchmarks, even
microbenchmarks, and even on obscure architectures. If it does suck on 22
architectures of course I won't ask to merge it. My feeling is that it won't
suck. We'll see what the numbers say.

--
SUSE Labs, Novell Inc.

2006-09-14 15:38:28

by Nick Piggin

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Ahh, my mistake :)

Nick Piggin wrote:

> It is actually larger here.
>
> text data bss dec hex filename
> 970 0 0 970 3ca lib/rwsem-spinlock.o
> 576 0 0 576 240 kernel/spinlock.o
^^ should be:
35 0 0 35 23 kernel/rwsem.o

> =1546
>
> text data bss dec hex filename
> 1310 0 0 1310 51e lib/rwsem.o
> 193 0 0 193 c1 kernel/rwsem.o
> =1503

Well, it is still larger than the atomic_add_return variant
after my patch.

--
SUSE Labs, Novell Inc.

2006-09-15 09:00:08

by David Howells

[permalink] [raw]
Subject: Re: Why Semaphore Hardware-Dependent?

Nick Piggin <[email protected]> wrote:

> It is actually larger here.
>
> text data bss dec hex filename
> 970 0 0 970 3ca lib/rwsem-spinlock.o
> 576 0 0 576 240 kernel/spinlock.o
> =1546
>
> text data bss dec hex filename
> 1310 0 0 1310 51e lib/rwsem.o
> 193 0 0 193 c1 kernel/rwsem.o
> =1503

What arch? FRV?

David