2010-01-19 22:22:22

by Christoph Lameter

Subject: [x86] Unify semaphore_32.S and rwlock_64.S


From: Christoph Lameter <[email protected]>
Subject: [x86] Unify semaphore_32.S and rwlock_64.S

Both implement semaphore handling, and the code for __read_lock_failed()
and __write_lock_failed() looks eerily similar aside from the use of
different registers.

Create a new arch/x86/lib/semaphore.S out of the two files.

This is also a good preparatory patch for getting the rwsem XADD stuff
to work on x86_64.

x86_64 gains the FRAME/ENDFRAME handling that i386 has (not sure what the
point is of having that there).
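For readers unfamiliar with the rwlock slow paths being unified here, their logic can be sketched in plain C. This is a single-threaded illustration only, not kernel code: the real routines use locked arithmetic on the lock word and spin with rep; nop, and the RW_LOCK_BIAS value below is taken from the x86 headers of that era (assumed, not part of this patch):

```c
#include <assert.h>

#define RW_LOCK_BIAS 0x01000000 /* lock word value when the rwlock is free */

/* Entered after the inline fast path did "lock subl $RW_LOCK_BIAS" and
 * got a nonzero result, i.e. someone else holds the lock. */
void write_lock_failed(int *lock)
{
	do {
		*lock += RW_LOCK_BIAS;        /* undo the failed attempt   */
		while (*lock != RW_LOCK_BIAS) /* spin until the lock frees */
			;                     /* (cpu_relax() in reality)  */
		*lock -= RW_LOCK_BIAS;        /* retry the acquisition     */
	} while (*lock != 0);                 /* zero means we own it      */
}

/* Entered after the inline fast path did "lock decl" and the count
 * went negative, i.e. a writer holds the lock. */
void read_lock_failed(int *lock)
{
	do {
		(*lock)++;                    /* undo the failed decrement */
		while (*lock < 1)             /* wait for the writer       */
			;
		(*lock)--;                    /* retry                     */
	} while (*lock < 0);
}
```

In this single-threaded sketch the spin loops only terminate when the entry state already lets the retry succeed; on real SMP hardware another CPU changes the lock word under the spinner.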

Signed-off-by: Christoph Lameter <[email protected]>

---
arch/x86/lib/Makefile | 5 -
arch/x86/lib/rwlock_64.S | 38 -----------
arch/x86/lib/semaphore.S | 146 ++++++++++++++++++++++++++++++++++++++++++++
arch/x86/lib/semaphore_32.S | 136 ----------------------------------------
4 files changed, 149 insertions(+), 176 deletions(-)

Index: linux-2.6/arch/x86/lib/Makefile
===================================================================
--- linux-2.6.orig/arch/x86/lib/Makefile 2010-01-19 14:35:40.000000000 -0600
+++ linux-2.6/arch/x86/lib/Makefile 2010-01-19 14:36:29.000000000 -0600
@@ -21,6 +21,7 @@ lib-y += thunk_$(BITS).o
lib-y += usercopy_$(BITS).o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
lib-$(CONFIG_KPROBES) += insn.o inat.o
+lib-y += semaphore.o

obj-y += msr.o msr-reg.o msr-reg-export.o

@@ -28,7 +29,7 @@ ifeq ($(CONFIG_X86_32),y)
obj-y += atomic64_32.o
lib-y += checksum_32.o
lib-y += strstr_32.o
- lib-y += semaphore_32.o string_32.o
+ lib-y += string_32.o
ifneq ($(CONFIG_X86_CMPXCHG64),y)
lib-y += cmpxchg8b_emu.o
endif
@@ -38,5 +39,5 @@ else
lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
lib-y += thunk_64.o clear_page_64.o copy_page_64.o
lib-y += memmove_64.o memset_64.o
- lib-y += copy_user_64.o rwlock_64.o copy_user_nocache_64.o
+ lib-y += copy_user_64.o copy_user_nocache_64.o
endif
Index: linux-2.6/arch/x86/lib/rwlock_64.S
===================================================================
--- linux-2.6.orig/arch/x86/lib/rwlock_64.S 2010-01-19 14:35:40.000000000 -0600
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,38 +0,0 @@
-/* Slow paths of read/write spinlocks. */
-
-#include <linux/linkage.h>
-#include <asm/rwlock.h>
-#include <asm/alternative-asm.h>
-#include <asm/dwarf2.h>
-
-/* rdi: pointer to rwlock_t */
-ENTRY(__write_lock_failed)
- CFI_STARTPROC
- LOCK_PREFIX
- addl $RW_LOCK_BIAS,(%rdi)
-1: rep
- nop
- cmpl $RW_LOCK_BIAS,(%rdi)
- jne 1b
- LOCK_PREFIX
- subl $RW_LOCK_BIAS,(%rdi)
- jnz __write_lock_failed
- ret
- CFI_ENDPROC
-END(__write_lock_failed)
-
-/* rdi: pointer to rwlock_t */
-ENTRY(__read_lock_failed)
- CFI_STARTPROC
- LOCK_PREFIX
- incl (%rdi)
-1: rep
- nop
- cmpl $1,(%rdi)
- js 1b
- LOCK_PREFIX
- decl (%rdi)
- js __read_lock_failed
- ret
- CFI_ENDPROC
-END(__read_lock_failed)
Index: linux-2.6/arch/x86/lib/semaphore.S
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/arch/x86/lib/semaphore.S 2010-01-19 14:43:40.000000000 -0600
@@ -0,0 +1,146 @@
+/*
+ * i386/x86_64 semaphore implementation.
+ *
+ * (C) Copyright 1999 Linus Torvalds
+ *
+ * Portions Copyright 1999 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * rw semaphores implemented November 1999 by Benjamin LaHaise <[email protected]>
+ * 32 and 64 bit versions merged Jan 2010 by Christoph Lameter
+ */
+
+#include <linux/linkage.h>
+#include <asm/rwlock.h>
+#include <asm/alternative-asm.h>
+#include <asm/frame.h>
+#include <asm/dwarf2.h>
+
+#ifdef CONFIG_64BIT
+#define SEMPOINTER %rdi
+#else
+#define SEMPOINTER %eax
+#endif
+
+/*
+ * The semaphore operations have a special calling sequence that
+ * allows us to do a simpler in-line version of them. These routines
+ * need to convert that sequence back into the C sequence when
+ * there is contention on the semaphore.
+ *
+ * The register SEMPOINTER contains the semaphore pointer on entry. Save the
+ * C-clobbered registers except SEMPOINTER, which is either a return value
+ * or just clobbered.
+ */
+ .section .sched.text, "ax"
+
+/*
+ * rw spinlock fallbacks
+ */
+#if defined(CONFIG_SMP) || defined(CONFIG_64BIT)
+ENTRY(__write_lock_failed)
+#ifdef CONFIG_64BIT
+ CFI_STARTPROC
+#else
+ CFI_STARTPROC simple
+#endif
+ FRAME
+2: LOCK_PREFIX
+ addl $ RW_LOCK_BIAS,(SEMPOINTER)
+1: rep; nop
+ cmpl $ RW_LOCK_BIAS,(SEMPOINTER)
+ jne 1b
+ LOCK_PREFIX
+ subl $ RW_LOCK_BIAS,(SEMPOINTER)
+ jnz 2b
+ ENDFRAME
+ ret
+ CFI_ENDPROC
+ENDPROC(__write_lock_failed)
+
+ENTRY(__read_lock_failed)
+ CFI_STARTPROC
+ FRAME
+2: LOCK_PREFIX
+ incl (SEMPOINTER)
+1: rep; nop
+ cmpl $1,(SEMPOINTER)
+ js 1b
+ LOCK_PREFIX
+ decl (SEMPOINTER)
+ js 2b
+ ENDFRAME
+ ret
+ CFI_ENDPROC
+ENDPROC(__read_lock_failed)
+#endif
+
+#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
+
+/* Fix up special calling conventions */
+ENTRY(call_rwsem_down_read_failed)
+ CFI_STARTPROC
+ push %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ push %edx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET edx,0
+ call rwsem_down_read_failed
+ pop %edx
+ CFI_ADJUST_CFA_OFFSET -4
+ pop %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ ret
+ CFI_ENDPROC
+ ENDPROC(call_rwsem_down_read_failed)
+
+ENTRY(call_rwsem_down_write_failed)
+ CFI_STARTPROC
+ push %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ calll rwsem_down_write_failed
+ pop %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ ret
+ CFI_ENDPROC
+ ENDPROC(call_rwsem_down_write_failed)
+
+ENTRY(call_rwsem_wake)
+ CFI_STARTPROC
+ decw %dx /* do nothing if still outstanding active readers */
+ jnz 1f
+ push %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ call rwsem_wake
+ pop %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+1: ret
+ CFI_ENDPROC
+ ENDPROC(call_rwsem_wake)
+
+/* Fix up special calling conventions */
+ENTRY(call_rwsem_downgrade_wake)
+ CFI_STARTPROC
+ push %ecx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET ecx,0
+ push %edx
+ CFI_ADJUST_CFA_OFFSET 4
+ CFI_REL_OFFSET edx,0
+ call rwsem_downgrade_wake
+ pop %edx
+ CFI_ADJUST_CFA_OFFSET -4
+ pop %ecx
+ CFI_ADJUST_CFA_OFFSET -4
+ ret
+ CFI_ENDPROC
+ ENDPROC(call_rwsem_downgrade_wake)
+
+#endif
Index: linux-2.6/arch/x86/lib/semaphore_32.S
===================================================================
--- linux-2.6.orig/arch/x86/lib/semaphore_32.S 2010-01-19 14:35:40.000000000 -0600
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,136 +0,0 @@
-/*
- * i386 semaphore implementation.
- *
- * (C) Copyright 1999 Linus Torvalds
- *
- * Portions Copyright 1999 Red Hat, Inc.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- * rw semaphores implemented November 1999 by Benjamin LaHaise <[email protected]>
- */
-
-#include <linux/linkage.h>
-#include <asm/rwlock.h>
-#include <asm/alternative-asm.h>
-#include <asm/frame.h>
-#include <asm/dwarf2.h>
-
-/*
- * The semaphore operations have a special calling sequence that
- * allow us to do a simpler in-line version of them. These routines
- * need to convert that sequence back into the C sequence when
- * there is contention on the semaphore.
- *
- * %eax contains the semaphore pointer on entry. Save the C-clobbered
- * registers (%eax, %edx and %ecx) except %eax whish is either a return
- * value or just clobbered..
- */
- .section .sched.text, "ax"
-
-/*
- * rw spinlock fallbacks
- */
-#ifdef CONFIG_SMP
-ENTRY(__write_lock_failed)
- CFI_STARTPROC simple
- FRAME
-2: LOCK_PREFIX
- addl $ RW_LOCK_BIAS,(%eax)
-1: rep; nop
- cmpl $ RW_LOCK_BIAS,(%eax)
- jne 1b
- LOCK_PREFIX
- subl $ RW_LOCK_BIAS,(%eax)
- jnz 2b
- ENDFRAME
- ret
- CFI_ENDPROC
- ENDPROC(__write_lock_failed)
-
-ENTRY(__read_lock_failed)
- CFI_STARTPROC
- FRAME
-2: LOCK_PREFIX
- incl (%eax)
-1: rep; nop
- cmpl $1,(%eax)
- js 1b
- LOCK_PREFIX
- decl (%eax)
- js 2b
- ENDFRAME
- ret
- CFI_ENDPROC
- ENDPROC(__read_lock_failed)
-
-#endif
-
-#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
-
-/* Fix up special calling conventions */
-ENTRY(call_rwsem_down_read_failed)
- CFI_STARTPROC
- push %ecx
- CFI_ADJUST_CFA_OFFSET 4
- CFI_REL_OFFSET ecx,0
- push %edx
- CFI_ADJUST_CFA_OFFSET 4
- CFI_REL_OFFSET edx,0
- call rwsem_down_read_failed
- pop %edx
- CFI_ADJUST_CFA_OFFSET -4
- pop %ecx
- CFI_ADJUST_CFA_OFFSET -4
- ret
- CFI_ENDPROC
- ENDPROC(call_rwsem_down_read_failed)
-
-ENTRY(call_rwsem_down_write_failed)
- CFI_STARTPROC
- push %ecx
- CFI_ADJUST_CFA_OFFSET 4
- CFI_REL_OFFSET ecx,0
- calll rwsem_down_write_failed
- pop %ecx
- CFI_ADJUST_CFA_OFFSET -4
- ret
- CFI_ENDPROC
- ENDPROC(call_rwsem_down_write_failed)
-
-ENTRY(call_rwsem_wake)
- CFI_STARTPROC
- decw %dx /* do nothing if still outstanding active readers */
- jnz 1f
- push %ecx
- CFI_ADJUST_CFA_OFFSET 4
- CFI_REL_OFFSET ecx,0
- call rwsem_wake
- pop %ecx
- CFI_ADJUST_CFA_OFFSET -4
-1: ret
- CFI_ENDPROC
- ENDPROC(call_rwsem_wake)
-
-/* Fix up special calling conventions */
-ENTRY(call_rwsem_downgrade_wake)
- CFI_STARTPROC
- push %ecx
- CFI_ADJUST_CFA_OFFSET 4
- CFI_REL_OFFSET ecx,0
- push %edx
- CFI_ADJUST_CFA_OFFSET 4
- CFI_REL_OFFSET edx,0
- call rwsem_downgrade_wake
- pop %edx
- CFI_ADJUST_CFA_OFFSET -4
- pop %ecx
- CFI_ADJUST_CFA_OFFSET -4
- ret
- CFI_ENDPROC
- ENDPROC(call_rwsem_downgrade_wake)
-
-#endif


2010-01-19 22:30:58

by H. Peter Anvin

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S

On 01/19/2010 02:21 PM, Christoph Lameter wrote:
>
> From: Christoph Lameter <[email protected]>
> Subject: [x86] Unify semaphore_32.S and rwlock_64.S
>
> Both implement semaphore handling, and the code for __read_lock_failed()
> and __write_lock_failed() looks eerily similar aside from the use of
> different registers.
>
> Create a new arch/x86/lib/semaphore.S out of the two files.
>

Hi Christoph,

Could you do this in the standard sequencing for unification patches:
first patch the two pieces of code so they are identical, and then
mechanically unify them? Otherwise it's almost impossible to see
what has changed.

> This is also a good preparatory patch for getting the rwsem XADD stuff
> to work on x86_64.

Have you tried the tip:x86/rwsem branch (Linus' work with a few
additions of mine) and had it not work for you?

> x86_64 gains the FRAME/ENDFRAME handling that i386 has (not sure what the
> point is of having that there).

Presumably it's so you can have frame pointers everywhere.

-hpa

2010-01-20 19:49:40

by Christoph Lameter

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S

On Tue, 19 Jan 2010, H. Peter Anvin wrote:

> Could you do this in the standard sequencing for unification patches:
> first patch the two pieces of code so they are identical, and then
> mechanically unify them? Otherwise it's almost impossible to see
> what has changed.

Hmmm... Okay, I'd better do that on top of your patches then.

> > This is also a good preparatory patch for getting the rwsem XADD stuff
> > to work on x86_64.
>
> Have you tried the tip:x86/rwsem branch (Linus' work with a few
> additions of mine) and had it not work for you?

No, I just saw it. Linus' first patch increases the 64/32 bit separation by
creating yet another 64-bit-specific file. Can we avoid that and have
code that is shared as much as possible between 32 and 64 bit?

Then there is another that does the %z0 trick while we already have the
proper definitions for that in include/asm/asm.h. Seems that you have
switched to using those. Was that done consistently?

Why have a rwsem_count_t when a simple long would do in both cases? Just
make sure that long is consistently used.

__downgrade_write: Why use the inc trick instead of the add
like on 32-bit? There is not much difference, and it results in much
more stable code.

> > x86_64 gains the FRAME/ENDFRAME handling that i386 has (not sure what the
> > point is of having that there).
>
> Presumably it's so you can have frame pointers everywhere.

For a small code segment that does not do any subroutine calls?

2010-01-20 20:15:18

by H. Peter Anvin

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S

On 01/20/2010 11:49 AM, Christoph Lameter wrote:
> On Tue, 19 Jan 2010, H. Peter Anvin wrote:
>
>> Could you do this in the standard sequencing for unification patches:
>> first patch the two pieces of code so they are identical, and then
>> mechanically unify them? Otherwise it's almost impossible to see
>> what has changed.
>
> Hmmm... Okay I better do that on top of your patches then.
>
>>> This is also a good preparatory patch for getting the rwsem XADD stuff
>>> to work on x86_64.
>>
>> Have you tried the tip:x86/rwsem branch (Linus' work with a few
>> additions of mine) and had it not work for you?
>
> No, I just saw it. Linus' first patch increases the 64/32 bit separation by
> creating yet another 64-bit-specific file. Can we avoid that and have
> code that is shared as much as possible between 32 and 64 bit?

The ABI is completely different between 32 and 64 bits. The stubs avoid
keeping track of *those* differences in each and every inline. It might
be possible with macros, but there is something that really is very
different: for x86-32, there are only three function-clobbered
registers, which we pretty much need to use anyway. For x86-64, there
are a lot more -- which means that each callsite would end up having gcc
generate save/restore code that would be in the fast path. Linus' patch
pushes that into the slow path, which seems significantly better to me.

The new file seems like a very good way to deal with the ABI/register
set differences here.

> Then there is another that does the %z0 trick while we already have the
> proper definitions for that in include/asm/asm.h. Seems that you have
> switched to using those. Was that done consistently?

The %z0 trick would have been type-safe. Unfortunately some versions of
gcc simply generate incorrect code with it, which is why I switched back
to the <asm/asm.h> macros (and yes, I got rid of all the %z's by sheer
necessity.)

> Why have a rwsem_count_t when a simple long would do in both cases? Just
> make sure that long is consistently used.

The motivation for rwsem_count_t seemed to be making it easier to switch
over. I leave it up to Linus to motivate the typedef... I have to say,
though, that using a typedef also tells you what the number is for.

> __downgrade_write: Why use the inc trick instead of the add
> like in 32 bit? There is not much difference and it results in much
> stabler code.

Because you can't do an add with a 64-bit immediate! Yes, we could have
loaded it into a register, but that would have required an additional
10-byte(!) instruction for no good reason.

>>> x86_64 gains the FRAME/ENDFRAME handling that i386 has (not sure what the
>>> point is of having that there).
>>
>> Presumably it's so you can have frame pointers everywhere.
>
> For a small code segment that does not do any subroutine calls?

It's kind of redundant, yes, but that was presumably the logic.

-hpa

2010-01-20 20:52:08

by Christoph Lameter

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S

On Wed, 20 Jan 2010, H. Peter Anvin wrote:

> > No, I just saw it. Linus' first patch increases the 64/32 bit separation by
> > creating yet another 64-bit-specific file. Can we avoid that and have
> > code that is shared as much as possible between 32 and 64 bit?
>
> The ABI is completely different between 32 and 64 bits. The stubs avoid
> keeping track of *those* differences in each and every inline. It might
> be possible with macros, but there is something that really is very
> different: for x86-32, there are only three function-clobbered
> registers, which we pretty much need to use anyway. For x86-64, there
> are a lot more -- which means that each callsite would end up having gcc
> generate save/restore code that would be in the fast path. Linus' patch
> pushes that into the slow path, which seems significantly better to me.

That does not seem to be such a problematic thing to solve.

> > Why have a rwsem_count_t when a simple long would do in both cases? Just
> > make sure that long is consistently used.
>
> The motivation for rwsem_count_t seemed to be making it easier to switch
> over. I leave it up to Linus to motivate the typedef... I have to say,
> though, that using a typedef also tells you what the number is for.

I thought we discourage such typedefs?

> > __downgrade_write: Why use the inc trick instead of the add
> > like in 32 bit? There is not much difference and it results in much
> > stabler code.
>
> Because you can't do an add with a 64-bit immediate! Yes, we could have
> loaded it into a register, but that would have required an additional
> 10-byte(!) instruction for no good reason.

Well, 2^32 readers is a bit large anyway. If we are satisfied with 2^30
(only a billion) then it works with the same code.
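The immediate-width constraint behind this exchange can be checked with ordinary arithmetic. Below is a standalone sketch (the macro names are illustrative, not from the kernel): x86-64 ALU instructions encode at most a 32-bit immediate, sign-extended at execution time, so any bias constant used with addq/subq must fit in the int32 range.

```c
#include <stdint.h>

/* x86-64 add/sub/cmp encode at most a 32-bit immediate, which the CPU
 * sign-extends to 64 bits; larger constants need a separate 10-byte
 * mov into a register first. */
int fits_in_imm32(long v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}

/* Hypothetical layout scaled up naively from 32-bit (a full 32-bit
 * active count): the waiting bias becomes -2^32 and cannot be encoded. */
#define WIDE_WAITING_BIAS (-(1L << 32))

/* 30-bit active mask ("only a billion" readers): every bias fits. */
#define ACTIVE_MASK_30  0x3fffffffL
#define WAITING_BIAS_30 (~ACTIVE_MASK_30)     /* == -0x40000000 */
#define WRITE_BIAS_30   (WAITING_BIAS_30 + 1) /* waiting + one active */
```

This is exactly why the 2^30 limit lets the 64-bit code keep the same add-immediate sequence as the 32-bit code.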

2010-01-20 23:46:33

by Linus Torvalds

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S



On Wed, 20 Jan 2010, H. Peter Anvin wrote:
>
> The motivation for rwsem_count_t seemed to be making it easier to switch
> over. I leave it up to Linus to motivate the typedef...

Indeed. I wanted to separate the issue of cleaning up the mask values from
cleaning up the code. That required that we'd have a separate type that
can change in _one_ place with the bias values.

It was also an unholy type mess before, so the rwsem_count_t also was
about making that unholy mess clearer. That's especially true since the
type will also have to percolate down to the slow cases before you can
actually fix the BIAS values.

As to keeping the 64-bit and 32-bit slow case wrappers separate: this is
very much one case where we _have_ to. Trying to pretend that 32-bit code
and 64-bit code is similar in this case is just lying. The calling
conventions are totally different, there is nothing similar there at all
(different argument registers, different clobbered registers, different
_everything_ that matters for the wrapper).

Linus

2010-01-20 23:57:27

by Linus Torvalds

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S



On Wed, 20 Jan 2010, Christoph Lameter wrote:
>
> Well, 2^32 readers is a bit large anyway. If we are satisfied with 2^30
> (only a billion) then it works with the same code.

Yes, that's what I would suggest. Make the constants be (for the 64-bit
case)

#define RWSEM_UNLOCKED_VALUE 0x00000000
#define RWSEM_ACTIVE_BIAS 0x00000001
#define RWSEM_ACTIVE_MASK 0x3fffffff
#define RWSEM_WAITING_BIAS (~RWSEM_ACTIVE_MASK)
#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

and now all the constants should be expressible as 32-bit (signed) values.
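Worked through with those definitions (a standalone C sketch, not kernel code: plain arithmetic stands in for the real lock xadd, and the assertions mirror the sign tests the fast paths rely on):

```c
/* The constants proposed above, for the 64-bit case, as a long count. */
#define RWSEM_UNLOCKED_VALUE    0x00000000L
#define RWSEM_ACTIVE_BIAS       0x00000001L
#define RWSEM_ACTIVE_MASK       0x3fffffffL
#define RWSEM_WAITING_BIAS      (~RWSEM_ACTIVE_MASK)
#define RWSEM_ACTIVE_READ_BIAS  RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

/* down_read adds the read bias; a non-negative result means success. */
long down_read_count(long count)  { return count + RWSEM_ACTIVE_READ_BIAS; }

/* down_write adds waiting+active; the result equals the write bias
 * (and is negative) when the writer got the lock uncontended. */
long down_write_count(long count) { return count + RWSEM_ACTIVE_WRITE_BIAS; }

/* Low 30 bits hold the number of active lockers. */
long active_count(long count)     { return count & RWSEM_ACTIVE_MASK; }
```

A reader on an unlocked rwsem leaves the count at 1 (positive, one active); a writer leaves it at RWSEM_ACTIVE_WRITE_BIAS (negative), so subsequent readers fail the sign test and take the slow path.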

Side note: it might be interesting to keep the rwsem_count_t be a config
option on x86-64 too, so this would _not_ necessarily always be a "x86-32"
vs "x86-64" issue. A raw spinlock is 32-bit, which together with a 32-bit
rwsem_count would make the rwsems smaller. Does it matter? Maybe not. But
we might at some point decide that it's worth limiting number of threads
to 32k in certain configurations, so I'd keep my options open.

So make the size of the counter be a CONFIG_RWSEM_LARGE thing, rather than
a 32-bit vs 64-bit thing. And just start out with making x86-64 select it,
but leaving the option open to use the 32-bit version on x86-64 too?

Linus

2010-01-21 00:02:33

by H. Peter Anvin

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S

On 01/20/2010 03:57 PM, Linus Torvalds wrote:
>
>
> On Wed, 20 Jan 2010, Christoph Lameter wrote:
>>
>> Well, 2^32 readers is a bit large anyway. If we are satisfied with 2^30
>> (only a billion) then it works with the same code.
>
> Yes, that's what I would suggest. Make the constants be (for the 64-bit
> case)
>
> #define RWSEM_UNLOCKED_VALUE 0x00000000
> #define RWSEM_ACTIVE_BIAS 0x00000001
> #define RWSEM_ACTIVE_MASK 0x3fffffff
> #define RWSEM_WAITING_BIAS (~RWSEM_ACTIVE_MASK)
> #define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
> #define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
>
> and now all the constants should be expressible as 32-bit (signed) values.
>
> Side note: it might be interesting to keep the rwsem_count_t be a config
> option on x86-64 too, so this would _not_ necessarily always be a "x86-32"
> vs "x86-64" issue. A raw spinlock is 32-bit, which together with a 32-bit
> rwsem_count would make the rwsems smaller. Does it matter? Maybe not. But
> we might at some point decide that it's worth limiting number of threads
> to 32k in certain configurations, so I'd keep my options open.
>
> So make the size of the counter be a CONFIG_RWSEM_LARGE thing, rather than
> a 32-bit vs 64-bit thing. And just start out with making x86-64 select it,
> but leaving the option open to use the 32-bit version on x86-64 too?
>

I'm somewhat unhappy about that notion, mostly because it means Yet
Another Thing To Verify[TM]. I would like to look at the relative code
sizes of 2^31 and 2^30, however, if all it means is that *one*
instruction in *one* asm has to be different, I'd rather leave it at 2^31.

-hpa

2010-01-21 00:47:19

by Linus Torvalds

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S



On Wed, 20 Jan 2010, Linus Torvalds wrote:
>
> #define RWSEM_UNLOCKED_VALUE 0x00000000
> #define RWSEM_ACTIVE_BIAS 0x00000001
> #define RWSEM_ACTIVE_MASK 0x3fffffff
> #define RWSEM_WAITING_BIAS (~RWSEM_ACTIVE_MASK)
> #define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
> #define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

Btw, doing that RWSEM_WAITING_BIAS cleanup (we currently have it as an
independent constant) means that now all constants are shared except for
that RWSEM_ACTIVE_MASK. So it ends up being something like this:

#ifdef CONFIG_RWSEM_64bit
typedef __s64 rwsem_count_t;
#define RWSEM_ACTIVE_MASK 0x3fffffff
#else
typedef __s32 rwsem_count_t;
#define RWSEM_ACTIVE_MASK 0xffff
#endif

#define RWSEM_UNLOCKED_VALUE 0x00000000
#define RWSEM_ACTIVE_BIAS 0x00000001
#define RWSEM_WAITING_BIAS (~RWSEM_ACTIVE_MASK)
#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

with just that two-line difference for the 32-bit/64-bit case.

At least I _think_ so.

And it's worth noting (again) that I didn't actually push the
rwsem_count_t changes down into the slow-path code in lib/rwsem.c. There's
a few variables there that might need looking at too. I _think_ they are
all ok as-is (unlike the header file, lib/rwsem.c seems to consistently
use 'signed long' rather than mix 32-bit and 64-bit types), but it might be
cleaner to make them rwsem_count_t's too.

Linus

2010-01-21 00:56:33

by H. Peter Anvin

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S

On 01/20/2010 04:46 PM, Linus Torvalds wrote:
>
>
> On Wed, 20 Jan 2010, Linus Torvalds wrote:
>>
>> #define RWSEM_UNLOCKED_VALUE 0x00000000
>> #define RWSEM_ACTIVE_BIAS 0x00000001
>> #define RWSEM_ACTIVE_MASK 0x3fffffff
>> #define RWSEM_WAITING_BIAS (~RWSEM_ACTIVE_MASK)
>> #define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
>> #define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
>
> Btw, doing that RWSEM_WAITING_BIAS cleanup (we currently have it as an
> independent constant) means that now all constants are shared except for
> that RWSEM_ACTIVE_MASK. So it ends up being something like this:
>
> #ifdef CONFIG_RWSEM_64bit
> typedef __s64 rwsem_count_t;
> #define RWSEM_ACTIVE_MASK 0x3fffffff
> #else
> typedef __s32 rwsem_count_t;
> #define RWSEM_ACTIVE_MASK 0xffff
> #endif
>
> #define RWSEM_UNLOCKED_VALUE 0x00000000
> #define RWSEM_ACTIVE_BIAS 0x00000001
> #define RWSEM_WAITING_BIAS (~RWSEM_ACTIVE_MASK)
> #define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
> #define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
>
> with just that two-line difference for the 32-bit/64-bit case.
>
> At least I _think_ so.
>

Yes, I already had that change in my tree (or rather, I wrote it as
(-RWSEM_ACTIVE_MASK-1) to be consistent with what was previously there,
but (~RWSEM_ACTIVE_MASK) makes more sense.)

> And it's worth noting (again) that I didn't actually push the
> rwsem_count_t changes down into the slow-path code in lib/rwsem.c. There's
> a few variables there that might need looking at too. I _think_ they are
> all ok as-is (unlike the header file, lib/rwsem.c seems to consistently
> use 'signed long' rather than mix 32-bit and 64-bit types), but it might be
> cleaner to make them rwsem_count_t's too.

Yes, if we have it we should use it consistently.

-hpa

2010-01-21 06:34:42

by H. Peter Anvin

Subject: Re: [x86] Unify semaphore_32.S and rwlock_64.S

On 01/20/2010 04:02 PM, H. Peter Anvin wrote:
> On 01/20/2010 03:57 PM, Linus Torvalds wrote:
>>
>
> I'm somewhat unhappy about that notion, mostly because it means Yet
> Another Thing To Verify[TM]. I would like to look at the relative code
> sizes of 2^31 and 2^30, however, if all it means is that *one*
> instruction in *one* asm has to be different, I'd rather leave it at 2^31.
>

Well, there is no size difference within measurable limits (an
x86-64-allyesconfig build is 60(!) bytes larger with 2^31 and the incl,
but that's well within the good luck/bad luck with alignments threshold...)

As such, I'd personally prefer to leave it with 1:31:31:1 split, if
nothing else because it reads a bit tidier to me, despite the minor wart
for the incl trick.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.