The following patchset implements the ability to specify a
synchronization mode for bit operations.
I.e. instead of set_bit(x,y) we can do set_bit(x,y, mode).
The following modes are supported:
MODE_NON_ATOMIC
Use non atomic version.
F.e. set_bit(x,y, MODE_NON_ATOMIC) == __set_bit(x,y)
MODE_ATOMIC
The operation is atomic but there is no guarantee how this
operation is ordered respective to other memory operations.
MODE_ACQUIRE
An atomic operation that is guaranteed to occur before
all subsequent memory accesses
MODE_RELEASE
An atomic operation that is guaranteed to occur after
all previos memory acceses.
MODE_BARRIER
An atomic operation that is guaranteed to occur between
previous and later memory operations.
For architectures that have no support for bitops with modes we
fall back to some combination of memory barriers and atomic ops.
This patchset defines architecture support for only IA64.
Others could be done in a similar fashion.
Note that the current semantics for bitops IA64 are broken. Both
smp_mb__after/before_clear_bit are now set to full memory barriers
to compensate which may affect performance.
The kernel core code would need to be fixed to add the proper
synchronization modes to restore prior performance (with then
correct locking semantics). If kernel code wants to use synchronization
modes then an
#include <asm/bitops_mode.h>
needs to be added.
Signed-off-by: Christoph Lameter <[email protected]>
Index: linux-2.6.16-mm2/include/asm-ia64/bitops.h
===================================================================
--- linux-2.6.16-mm2.orig/include/asm-ia64/bitops.h 2006-03-19 21:53:29.000000000 -0800
+++ linux-2.6.16-mm2/include/asm-ia64/bitops.h 2006-03-30 12:50:15.000000000 -0800
@@ -11,6 +11,7 @@
#include <linux/compiler.h>
#include <linux/types.h>
+#include <asm/bitops_mode.h>
#include <asm/bitops.h>
#include <asm/intrinsics.h>
@@ -19,8 +20,6 @@
* @nr: the bit to set
* @addr: the address to start counting from
*
- * This function is atomic and may not be reordered. See __set_bit()
- * if you do not require the atomic guarantees.
* Note that @nr may be almost arbitrarily large; this function is not
* restricted to acting on a single-word quantity.
*
@@ -34,244 +33,106 @@
static __inline__ void
set_bit (int nr, volatile void *addr)
{
- __u32 bit, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- bit = 1 << (nr & 31);
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old | bit;
- } while (cmpxchg_acq(m, old, new) != old);
+ set_bit_mode(nr, addr, MODE_ATOMIC);
}
/**
* __set_bit - Set a bit in memory
* @nr: the bit to set
* @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
*/
static __inline__ void
__set_bit (int nr, volatile void *addr)
{
- *((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
+ set_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
-/*
- * clear_bit() has "acquire" semantics.
- */
#define smp_mb__before_clear_bit() smp_mb()
-#define smp_mb__after_clear_bit() do { /* skip */; } while (0)
+#define smp_mb__after_clear_bit() smp_mb()
/**
* clear_bit - Clears a bit in memory
* @nr: Bit to clear
* @addr: Address to start counting from
- *
- * clear_bit() is atomic and may not be reordered. However, it does
- * not contain a memory barrier, so if it is used for locking purposes,
- * you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit()
- * in order to ensure changes are visible on other processors.
*/
static __inline__ void
clear_bit (int nr, volatile void *addr)
{
- __u32 mask, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- mask = ~(1 << (nr & 31));
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old & mask;
- } while (cmpxchg_acq(m, old, new) != old);
+ clear_bit_mode(nr, addr, MODE_ATOMIC);
}
-/**
- * __clear_bit - Clears a bit in memory (non-atomic version)
- */
static __inline__ void
__clear_bit (int nr, volatile void *addr)
{
- volatile __u32 *p = (__u32 *) addr + (nr >> 5);
- __u32 m = 1 << (nr & 31);
- *p &= ~m;
+ clear_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
/**
* change_bit - Toggle a bit in memory
* @nr: Bit to clear
* @addr: Address to start counting from
- *
- * change_bit() is atomic and may not be reordered.
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
*/
static __inline__ void
change_bit (int nr, volatile void *addr)
{
- __u32 bit, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- bit = (1 << (nr & 31));
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old ^ bit;
- } while (cmpxchg_acq(m, old, new) != old);
+ change_bit_mode(nr, addr, MODE_ATOMIC);
}
-/**
- * __change_bit - Toggle a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
static __inline__ void
__change_bit (int nr, volatile void *addr)
{
- *((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
+ change_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
/**
* test_and_set_bit - Set a bit and return its old value
* @nr: Bit to set
* @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
*/
static __inline__ int
test_and_set_bit (int nr, volatile void *addr)
{
- __u32 bit, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- bit = 1 << (nr & 31);
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old | bit;
- } while (cmpxchg_acq(m, old, new) != old);
- return (old & bit) != 0;
+ return test_and_set_bit_mode(nr, addr, MODE_ATOMIC);
}
-/**
- * __test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail. You must protect multiple accesses with a lock.
- */
static __inline__ int
__test_and_set_bit (int nr, volatile void *addr)
{
- __u32 *p = (__u32 *) addr + (nr >> 5);
- __u32 m = 1 << (nr & 31);
- int oldbitset = (*p & m) != 0;
-
- *p |= m;
- return oldbitset;
+ return test_and_set_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
/**
* test_and_clear_bit - Clear a bit and return its old value
* @nr: Bit to set
* @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
*/
static __inline__ int
test_and_clear_bit (int nr, volatile void *addr)
{
- __u32 mask, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- mask = ~(1 << (nr & 31));
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old & mask;
- } while (cmpxchg_acq(m, old, new) != old);
- return (old & ~mask) != 0;
+ return test_and_clear_bit_mode(nr, addr, MODE_ATOMIC);
}
-/**
- * __test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail. You must protect multiple accesses with a lock.
- */
static __inline__ int
__test_and_clear_bit(int nr, volatile void * addr)
{
- __u32 *p = (__u32 *) addr + (nr >> 5);
- __u32 m = 1 << (nr & 31);
- int oldbitset = *p & m;
-
- *p &= ~m;
- return oldbitset;
+ return test_and_clear_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
/**
* test_and_change_bit - Change a bit and return its old value
* @nr: Bit to set
* @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
*/
static __inline__ int
test_and_change_bit (int nr, volatile void *addr)
{
- __u32 bit, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- bit = (1 << (nr & 31));
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old ^ bit;
- } while (cmpxchg_acq(m, old, new) != old);
- return (old & bit) != 0;
+ return test_and_change_bit_mode(nr, addr, MODE_ATOMIC);
}
-/*
- * WARNING: non atomic version.
- */
static __inline__ int
__test_and_change_bit (int nr, void *addr)
{
- __u32 old, bit = (1 << (nr & 31));
- __u32 *m = (__u32 *) addr + (nr >> 5);
-
- old = *m;
- *m = old ^ bit;
- return (old & bit) != 0;
+ return test_and_change_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
static __inline__ int
Index: linux-2.6.16-mm2/include/asm-ia64/bitops_mode.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.16-mm2/include/asm-ia64/bitops_mode.h 2006-03-30 12:50:15.000000000 -0800
@@ -0,0 +1,201 @@
+#ifndef _ASM_IA64_BITOPS_MODE_H
+#define _ASM_IA64_BITOPS_MODE_H
+
+/*
+ * Copyright (C) 2006 Silicon Graphics, Incorporated
+ * Christoph Lameter <[email protected]>
+ *
+ * Bit operations with the ability to specify the synchronization mode
+ */
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <asm/intrinsics.h>
+
+#define MODE_NON_ATOMIC 0
+#define MODE_ATOMIC 1
+#define MODE_ACQUIRE 2
+#define MODE_RELEASE 3
+#define MODE_BARRIER 4
+
+static __inline__ __u32 cmpxchg_mode(volatile __u32 *m, __u32 old, __u32 new, int mode)
+{
+ switch (mode) {
+ case MODE_ATOMIC :
+ case MODE_ACQUIRE :
+ return cmpxchg_acq(m, old, new);
+
+ case MODE_RELEASE :
+ return cmpxchg_rel(m, old, new);
+
+ case MODE_BARRIER :
+ cmpxchg_rel(m, old, new);
+ return *m; /* Volatile load = acquire */
+ }
+}
+
+
+/**
+ * set_bit_mode - set a bit in memory
+ *
+ * The address must be (at least) "long" aligned.
+ * Note that there are driver (e.g., eepro100) which use these operations to
+ * operate on hw-defined data-structures, so we can't easily change these
+ * operations to force a bigger alignment.
+ *
+ * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
+ */
+static __inline__ void
+set_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 bit, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ bit = 1 << (nr & 31);
+
+ if (mode == MODE_NON_ATOMIC) {
+ *m |= bit;
+ return;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old | bit;
+ } while (cmpxchg_mode(m, old, new, mode) != old);
+}
+
+/**
+ * clear_bit_mode - Clears a bit in memory
+ */
+static __inline__ void
+clear_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 mask, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ mask = ~(1 << (nr & 31));
+
+ if (mode == MODE_NON_ATOMIC) {
+ *m &= ~mask;
+ return;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old & mask;
+ } while (cmpxchg_mode(m, old, new, mode) != old);
+}
+
+/**
+ * change_bit_mode - Toggle a bit in memory
+ */
+static __inline__ void
+change_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 bit, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ bit = (1 << (nr & 31));
+
+ if (mode == MODE_NON_ATOMIC) {
+ *m ^= bit;
+ return;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old ^ bit;
+ } while (cmpxchg_acq(m, old, new) != old);
+}
+
+/**
+ * test_and_set_bit_mode - Set a bit and return its old value
+ */
+static __inline__ int
+test_and_set_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 bit, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ bit = 1 << (nr & 31);
+
+ if (mode == MODE_NON_ATOMIC) {
+ int oldbitset = *m & bit;
+ *m |= bit;
+ return oldbitset;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old | bit;
+ } while (cmpxchg_acq(m, old, new) != old);
+ return (old & bit) != 0;
+}
+
+/**
+ * test_and_clear_bit_mode - Clear a bit and return its old value
+ */
+static __inline__ int
+test_and_clear_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 mask, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ mask = ~(1 << (nr & 31));
+
+ if (mode == MODE_NON_ATOMIC) {
+ int oldbitset = *m & mask;
+ *m &= ~mask;
+ return oldbitset;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old & mask;
+ } while (cmpxchg_acq(m, old, new) != old);
+ return (old & ~mask) != 0;
+}
+
+/**
+ * test_and_change_bit_mode - Change a bit and return its old value
+ */
+static __inline__ int
+test_and_change_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 bit, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ bit = (1 << (nr & 31));
+
+ if (mode == MODE_NON_ATOMIC) {
+ old = *m;
+ *m = old ^ bit;
+ return (old & bit) != 0;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old ^ bit;
+ } while (cmpxchg_acq(m, old, new) != old);
+ return (old & bit) != 0;
+}
+
+#endif /* _ASM_IA64_BITOPS_MODE_H */
Index: linux-2.6.16-mm2/include/asm-generic/bitops_mode.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.16-mm2/include/asm-generic/bitops_mode.h 2006-03-30 12:50:15.000000000 -0800
@@ -0,0 +1,220 @@
+#ifndef _ASM_IA64_BITOPS_MODE_H
+#define _ASM_IA64_BITOPS_MODE_H
+
+/*
+ * Copyright (C) 2006 Silicon Graphics, Incorporated
+ * Christoph Lameter <[email protected]>
+ *
+ * Fallback logic for bit operations with synchronization mode
+ */
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <asm/intrinsics.h>
+
+#define MODE_NON_ATOMIC 0
+#define MODE_ATOMIC 1
+#define MODE_ACQUIRE 2
+#define MODE_RELEASE 3
+#define MODE_BARRIER 4
+
+/**
+ * set_bit_mode - Set a bit in memory
+ *
+ * The address must be (at least) "long" aligned.
+ * Note that there are driver (e.g., eepro100) which use these operations to
+ * operate on hw-defined data-structures, so we can't easily change these
+ * operations to force a bigger alignment.
+ *
+ * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
+ */
+static __inline__ void
+set_bit_mode (int nr, volatile void *addr, int mode)
+{
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ __set_bit(nr,addr);
+ return;
+
+ case MODE_ATOMIC:
+ set_bit(nr,addr);
+ return;
+
+ case MODE_ACQUIRE:
+ set_bit(nr,addr);
+ smp_mb();
+ return;
+
+ case MODE_RELEASE:
+ smb_mb();
+ set_bit(nr,addr);
+ return;
+
+ case MODE_BARRIER:
+ smb_mb();
+ set_bit(nr,addr);
+ smb_mb();
+ return;
+ }
+}
+
+/**
+ * clear_bit_mode - Clears a bit in memory
+ */
+static __inline__ void
+clear_bit_mode (int nr, volatile void *addr, int mode)
+{
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ __clear_bit(nr,addr);
+ return;
+
+ case MODE_ATOMIC:
+ clear_bit(nr,addr);
+ return;
+
+ case MODE_ACQUIRE:
+ clear_bit(nr,addr);
+ smp_mb();
+ return;
+
+ case MODE_RELEASE:
+ smb_mb();
+ clear_bit(nr,addr);
+ return;
+
+ case MODE_BARRIER:
+ smb_mb();
+ clear_bit(nr,addr);
+ smb_mb();
+ return;
+ }
+}
+
+/**
+ * change_bit_mode - Toggle a bit in memory
+ */
+static __inline__ void
+change_bit_mode (int nr, volatile void *addr, int mode)
+{
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ __change_bit(nr,addr);
+ return;
+
+ case MODE_ATOMIC:
+ change_bit(nr,addr);
+ return;
+
+ case MODE_ACQUIRE:
+ change_bit(nr,addr);
+ smp_mb();
+ return;
+
+ case MODE_RELEASE:
+ smb_mb();
+ change_bit(nr,addr);
+ return;
+
+ case MODE_BARRIER:
+ smb_mb();
+ change_bit(nr,addr);
+ smb_mb();
+ return;
+ }
+}
+
+/**
+ * test_and_set_bit_mode - Set a bit and return its old value
+ */
+static __inline__ int
+test_and_set_bit_mode (int nr, volatile void *addr, int mode)
+{
+ int x;
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ return __test_and_set_bit(nr,addr);
+
+ case MODE_ATOMIC:
+ return test_and_set_bit(nr,addr);
+
+ case MODE_ACQUIRE:
+ x = test_and_set_bit(nr,addr);
+ smp_mb();
+ return x;
+
+ case MODE_RELEASE:
+ smb_mb();
+ return test_and_set_bit(nr,addr);
+
+ case MODE_BARRIER:
+ smb_mb();
+ x = test_and_set_bit(nr,addr);
+ smb_mb();
+ return x;
+ }
+}
+
+/**
+ * test_and_clear_bit - Clear a bit and return its old value
+ */
+static __inline__ int
+test_and_clear_bit_mode (int nr, volatile void *addr, int mode)
+{
+ int x;
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ return __test_and_clear_bit(nr,addr);
+
+ case MODE_ATOMIC:
+ return test_and_clear_bit(nr,addr);
+
+ case MODE_ACQUIRE:
+ x = test_and_clear_bit(nr,addr);
+ smp_mb();
+ return x;
+
+ case MODE_RELEASE:
+ smb_mb();
+ return test_and_clear_bit(nr,addr);
+
+ case MODE_BARRIER:
+ smb_mb();
+ x = test_and_set_bit(nr,addr);
+ smb_mb();
+ return x;
+ }
+}
+
+/**
+ * test_and_change_bit - Change a bit and return its old value
+ */
+static __inline__ int
+test_and_change_bit_mode (int nr, volatile void *addr, int mode)
+{
+ int x;
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ return __test_and_change_bit(nr,addr);
+
+ case MODE_ATOMIC:
+ return test_and_change_bit(nr,addr);
+
+ case MODE_ACQUIRE:
+ x = test_and_change_bit(nr,addr);
+ smp_mb();
+ return x;
+
+ case MODE_RELEASE:
+ smb_mb();
+ return test_and_change_bit(nr,addr);
+
+ case MODE_BARRIER:
+ smb_mb();
+ x = test_and_change_bit(nr,addr);
+ smb_mb();
+ return x;
+ }
+}
+
+#endif /* _ASM_IA64_BITOPS_MODE_H */
Changelog:
V2
- Fix various oversights
- Follow Hans Boehm's scheme for the barrier logic
The following patchset implements the ability to specify a
synchronization mode for bit operations.
I.e. instead of set_bit(x,y) we can do set_bit(x,y, mode).
The following modes are supported:
MODE_NON_ATOMIC
Use non atomic version.
F.e. set_bit(x,y, MODE_NON_ATOMIC) == __set_bit(x,y)
MODE_ATOMIC
The operation is atomic but there is no guarantee how this
operation is ordered respective to other memory operations.
MODE_ACQUIRE
An atomic operation that is guaranteed to occur before
all subsequent memory accesses
MODE_RELEASE
An atomic operation that is guaranteed to occur after
all previos memory acceses.
MODE_BARRIER
An atomic operation that is guaranteed to occur between
previous and later memory operations.
For architectures that have no support for bitops with modes we
fall back to some combination of memory barriers and atomic ops.
This patchset defines architecture support for only IA64.
Others could be done in a similar fashion.
Note that the current semantics for bitops IA64 are broken. Both
smp_mb__after/before_clear_bit are now set to full memory barriers
to compensate which may affect performance.
The kernel core code would need to be fixed to add the proper
synchronization modes to restore prior performance (with then
correct locking semantics). If kernel code wants to use synchronization
modes then an
#include <asm/bitops_mode.h>
needs to be added.
Signed-off-by: Christoph Lameter <[email protected]>
Index: linux-2.6.16-mm2/include/asm-ia64/bitops.h
===================================================================
--- linux-2.6.16-mm2.orig/include/asm-ia64/bitops.h 2006-03-30 15:01:21.000000000 -0800
+++ linux-2.6.16-mm2/include/asm-ia64/bitops.h 2006-03-30 15:44:36.000000000 -0800
@@ -11,6 +11,7 @@
#include <linux/compiler.h>
#include <linux/types.h>
+#include <asm/bitops_mode.h>
#include <asm/bitops.h>
#include <asm/intrinsics.h>
@@ -19,8 +20,6 @@
* @nr: the bit to set
* @addr: the address to start counting from
*
- * This function is atomic and may not be reordered. See __set_bit()
- * if you do not require the atomic guarantees.
* Note that @nr may be almost arbitrarily large; this function is not
* restricted to acting on a single-word quantity.
*
@@ -34,244 +33,106 @@
static __inline__ void
set_bit (int nr, volatile void *addr)
{
- __u32 bit, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- bit = 1 << (nr & 31);
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old | bit;
- } while (cmpxchg_acq(m, old, new) != old);
+ set_bit_mode(nr, addr, MODE_ATOMIC);
}
/**
* __set_bit - Set a bit in memory
* @nr: the bit to set
* @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
*/
static __inline__ void
__set_bit (int nr, volatile void *addr)
{
- *((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
+ set_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
-/*
- * clear_bit() has "acquire" semantics.
- */
#define smp_mb__before_clear_bit() smp_mb()
-#define smp_mb__after_clear_bit() do { /* skip */; } while (0)
+#define smp_mb__after_clear_bit() smp_mb()
/**
* clear_bit - Clears a bit in memory
* @nr: Bit to clear
* @addr: Address to start counting from
- *
- * clear_bit() is atomic and may not be reordered. However, it does
- * not contain a memory barrier, so if it is used for locking purposes,
- * you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit()
- * in order to ensure changes are visible on other processors.
*/
static __inline__ void
clear_bit (int nr, volatile void *addr)
{
- __u32 mask, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- mask = ~(1 << (nr & 31));
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old & mask;
- } while (cmpxchg_acq(m, old, new) != old);
+ clear_bit_mode(nr, addr, MODE_ATOMIC);
}
-/**
- * __clear_bit - Clears a bit in memory (non-atomic version)
- */
static __inline__ void
__clear_bit (int nr, volatile void *addr)
{
- volatile __u32 *p = (__u32 *) addr + (nr >> 5);
- __u32 m = 1 << (nr & 31);
- *p &= ~m;
+ clear_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
/**
* change_bit - Toggle a bit in memory
* @nr: Bit to clear
* @addr: Address to start counting from
- *
- * change_bit() is atomic and may not be reordered.
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
*/
static __inline__ void
change_bit (int nr, volatile void *addr)
{
- __u32 bit, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- bit = (1 << (nr & 31));
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old ^ bit;
- } while (cmpxchg_acq(m, old, new) != old);
+ change_bit_mode(nr, addr, MODE_ATOMIC);
}
-/**
- * __change_bit - Toggle a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
static __inline__ void
__change_bit (int nr, volatile void *addr)
{
- *((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
+ change_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
/**
* test_and_set_bit - Set a bit and return its old value
* @nr: Bit to set
* @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
*/
static __inline__ int
test_and_set_bit (int nr, volatile void *addr)
{
- __u32 bit, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- bit = 1 << (nr & 31);
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old | bit;
- } while (cmpxchg_acq(m, old, new) != old);
- return (old & bit) != 0;
+ return test_and_set_bit_mode(nr, addr, MODE_ATOMIC);
}
-/**
- * __test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail. You must protect multiple accesses with a lock.
- */
static __inline__ int
__test_and_set_bit (int nr, volatile void *addr)
{
- __u32 *p = (__u32 *) addr + (nr >> 5);
- __u32 m = 1 << (nr & 31);
- int oldbitset = (*p & m) != 0;
-
- *p |= m;
- return oldbitset;
+ return test_and_set_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
/**
* test_and_clear_bit - Clear a bit and return its old value
* @nr: Bit to set
* @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
*/
static __inline__ int
test_and_clear_bit (int nr, volatile void *addr)
{
- __u32 mask, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- mask = ~(1 << (nr & 31));
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old & mask;
- } while (cmpxchg_acq(m, old, new) != old);
- return (old & ~mask) != 0;
+ return test_and_clear_bit_mode(nr, addr, MODE_ATOMIC);
}
-/**
- * __test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail. You must protect multiple accesses with a lock.
- */
static __inline__ int
__test_and_clear_bit(int nr, volatile void * addr)
{
- __u32 *p = (__u32 *) addr + (nr >> 5);
- __u32 m = 1 << (nr & 31);
- int oldbitset = *p & m;
-
- *p &= ~m;
- return oldbitset;
+ return test_and_clear_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
/**
* test_and_change_bit - Change a bit and return its old value
* @nr: Bit to set
* @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
*/
static __inline__ int
test_and_change_bit (int nr, volatile void *addr)
{
- __u32 bit, old, new;
- volatile __u32 *m;
- CMPXCHG_BUGCHECK_DECL
-
- m = (volatile __u32 *) addr + (nr >> 5);
- bit = (1 << (nr & 31));
- do {
- CMPXCHG_BUGCHECK(m);
- old = *m;
- new = old ^ bit;
- } while (cmpxchg_acq(m, old, new) != old);
- return (old & bit) != 0;
+ return test_and_change_bit_mode(nr, addr, MODE_ATOMIC);
}
-/*
- * WARNING: non atomic version.
- */
static __inline__ int
__test_and_change_bit (int nr, void *addr)
{
- __u32 old, bit = (1 << (nr & 31));
- __u32 *m = (__u32 *) addr + (nr >> 5);
-
- old = *m;
- *m = old ^ bit;
- return (old & bit) != 0;
+ return test_and_change_bit_mode(nr, addr, MODE_NON_ATOMIC);
}
static __inline__ int
Index: linux-2.6.16-mm2/include/asm-ia64/bitops_mode.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.16-mm2/include/asm-ia64/bitops_mode.h 2006-03-30 16:07:22.000000000 -0800
@@ -0,0 +1,204 @@
+#ifndef _ASM_IA64_BITOPS_MODE_H
+#define _ASM_IA64_BITOPS_MODE_H
+
+/*
+ * Copyright (C) 2006 Silicon Graphics, Incorporated
+ * Christoph Lameter <[email protected]>
+ *
+ * Bit operations with the ability to specify the synchronization mode
+ */
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <asm/intrinsics.h>
+
+#define MODE_NON_ATOMIC 0
+#define MODE_ATOMIC 1
+#define MODE_ACQUIRE 2
+#define MODE_RELEASE 3
+#define MODE_BARRIER 4
+
+static __inline__ __u32 cmpxchg_mode(volatile __u32 *m, __u32 old, __u32 new, int mode)
+{
+ __u32 x;
+
+ switch (mode) {
+ case MODE_ATOMIC :
+ case MODE_ACQUIRE :
+ return cmpxchg_acq(m, old, new);
+
+ case MODE_RELEASE :
+ return cmpxchg_rel(m, old, new);
+
+ case MODE_BARRIER :
+ x = cmpxchg_rel(m, old, new);
+ ia64_mf();
+ return x;
+ }
+}
+
+
+/**
+ * set_bit_mode - set a bit in memory
+ *
+ * The address must be (at least) "long" aligned.
+ * Note that there are driver (e.g., eepro100) which use these operations to
+ * operate on hw-defined data-structures, so we can't easily change these
+ * operations to force a bigger alignment.
+ *
+ * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
+ */
+static __inline__ void
+set_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 bit, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ bit = 1 << (nr & 31);
+
+ if (mode == MODE_NON_ATOMIC) {
+ *m |= bit;
+ return;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old | bit;
+ } while (cmpxchg_mode(m, old, new, mode) != old);
+}
+
+/**
+ * clear_bit_mode - Clears a bit in memory
+ */
+static __inline__ void
+clear_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 mask, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ mask = ~(1 << (nr & 31));
+
+ if (mode == MODE_NON_ATOMIC) {
+ *m &= mask;
+ return;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old & mask;
+ } while (cmpxchg_mode(m, old, new, mode) != old);
+}
+
+/**
+ * change_bit_mode - Toggle a bit in memory
+ */
+static __inline__ void
+change_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 bit, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ bit = (1 << (nr & 31));
+
+ if (mode == MODE_NON_ATOMIC) {
+ *m ^= bit;
+ return;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old ^ bit;
+ } while (cmpxchg_mode(m, old, new, mode) != old);
+}
+
+/**
+ * test_and_set_bit_mode - Set a bit and return its old value
+ */
+static __inline__ int
+test_and_set_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 bit, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ bit = 1 << (nr & 31);
+
+ if (mode == MODE_NON_ATOMIC) {
+ int oldbitset = *m & bit;
+ *m |= bit;
+ return oldbitset;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old | bit;
+ } while (cmpxchg_mode(m, old, new, mode) != old);
+ return (old & bit) != 0;
+}
+
+/**
+ * test_and_clear_bit_mode - Clear a bit and return its old value
+ */
+static __inline__ int
+test_and_clear_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 mask, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ mask = ~(1 << (nr & 31));
+
+ if (mode == MODE_NON_ATOMIC) {
+ int oldbitset = *m & mask;
+ *m &= mask;
+ return oldbitset;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old & mask;
+ } while (cmpxchg_mode(m, old, new, mode) != old);
+ return (old & ~mask) != 0;
+}
+
+/**
+ * test_and_change_bit_mode - Change a bit and return its old value
+ */
+static __inline__ int
+test_and_change_bit_mode (int nr, volatile void *addr, int mode)
+{
+ __u32 bit, old, new;
+ volatile __u32 *m;
+ CMPXCHG_BUGCHECK_DECL
+
+ m = (volatile __u32 *) addr + (nr >> 5);
+ bit = (1 << (nr & 31));
+
+ if (mode == MODE_NON_ATOMIC) {
+ old = *m;
+ *m = old ^ bit;
+ return (old & bit) != 0;
+ }
+
+ do {
+ CMPXCHG_BUGCHECK(m);
+ old = *m;
+ new = old ^ bit;
+ } while (cmpxchg_mode(m, old, new, mode) != old);
+ return (old & bit) != 0;
+}
+
+#endif /* _ASM_IA64_BITOPS_MODE_H */
Index: linux-2.6.16-mm2/include/asm-generic/bitops_mode.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.16-mm2/include/asm-generic/bitops_mode.h 2006-03-30 15:44:36.000000000 -0800
@@ -0,0 +1,220 @@
+#ifndef _ASM_GENERIC_BITOPS_MODE_H
+#define _ASM_GENERIC_BITOPS_MODE_H
+
+/*
+ * Copyright (C) 2006 Silicon Graphics, Incorporated
+ * Christoph Lameter <[email protected]>
+ *
+ * Fallback logic for bit operations with synchronization mode
+ */
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <asm/intrinsics.h>
+
+#define MODE_NON_ATOMIC 0
+#define MODE_ATOMIC 1
+#define MODE_ACQUIRE 2
+#define MODE_RELEASE 3
+#define MODE_BARRIER 4
+
+/**
+ * set_bit_mode - Set a bit in memory
+ *
+ * The address must be (at least) "long" aligned.
+ * Note that there are driver (e.g., eepro100) which use these operations to
+ * operate on hw-defined data-structures, so we can't easily change these
+ * operations to force a bigger alignment.
+ *
+ * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
+ */
+static __inline__ void
+set_bit_mode (int nr, volatile void *addr, int mode)
+{
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ __set_bit(nr,addr);
+ return;
+
+ case MODE_ATOMIC:
+ set_bit(nr,addr);
+ return;
+
+ case MODE_ACQUIRE:
+ set_bit(nr,addr);
+ smp_mb();
+ return;
+
+ case MODE_RELEASE:
+ smb_mb();
+ set_bit(nr,addr);
+ return;
+
+ case MODE_BARRIER:
+ smb_mb();
+ set_bit(nr,addr);
+ smb_mb();
+ return;
+ }
+}
+
+/**
+ * clear_bit_mode - Clears a bit in memory
+ */
+static __inline__ void
+clear_bit_mode (int nr, volatile void *addr, int mode)
+{
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ __clear_bit(nr,addr);
+ return;
+
+ case MODE_ATOMIC:
+ clear_bit(nr,addr);
+ return;
+
+ case MODE_ACQUIRE:
+ clear_bit(nr,addr);
+ smp_mb();
+ return;
+
+ case MODE_RELEASE:
+ smb_mb();
+ clear_bit(nr,addr);
+ return;
+
+ case MODE_BARRIER:
+ smb_mb();
+ clear_bit(nr,addr);
+ smb_mb();
+ return;
+ }
+}
+
+/**
+ * change_bit_mode - Toggle a bit in memory
+ */
+static __inline__ void
+change_bit_mode (int nr, volatile void *addr, int mode)
+{
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ __change_bit(nr,addr);
+ return;
+
+ case MODE_ATOMIC:
+ change_bit(nr,addr);
+ return;
+
+ case MODE_ACQUIRE:
+ change_bit(nr,addr);
+ smp_mb();
+ return;
+
+ case MODE_RELEASE:
+ smb_mb();
+ change_bit(nr,addr);
+ return;
+
+ case MODE_BARRIER:
+ smb_mb();
+ change_bit(nr,addr);
+ smb_mb();
+ return;
+ }
+}
+
+/**
+ * test_and_set_bit_mode - Set a bit and return its old value
+ */
+static __inline__ int
+test_and_set_bit_mode (int nr, volatile void *addr, int mode)
+{
+ int x;
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ return __test_and_set_bit(nr,addr);
+
+ case MODE_ATOMIC:
+ return test_and_set_bit(nr,addr);
+
+ case MODE_ACQUIRE:
+ x = test_and_set_bit(nr,addr);
+ smp_mb();
+ return x;
+
+ case MODE_RELEASE:
+ smb_mb();
+ return test_and_set_bit(nr,addr);
+
+ case MODE_BARRIER:
+ smb_mb();
+ x = test_and_set_bit(nr,addr);
+ smb_mb();
+ return x;
+ }
+}
+
+/**
+ * test_and_clear_bit - Clear a bit and return its old value
+ */
+static __inline__ int
+test_and_clear_bit_mode (int nr, volatile void *addr, int mode)
+{
+ int x;
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ return __test_and_clear_bit(nr,addr);
+
+ case MODE_ATOMIC:
+ return test_and_clear_bit(nr,addr);
+
+ case MODE_ACQUIRE:
+ x = test_and_clear_bit(nr,addr);
+ smp_mb();
+ return x;
+
+ case MODE_RELEASE:
+ smb_mb();
+ return test_and_clear_bit(nr,addr);
+
+ case MODE_BARRIER:
+ smb_mb();
+ x = test_and_set_bit(nr,addr);
+ smb_mb();
+ return x;
+ }
+}
+
+/**
+ * test_and_change_bit - Change a bit and return its old value
+ */
+static __inline__ int
+test_and_change_bit_mode (int nr, volatile void *addr, int mode)
+{
+ int x;
+ switch (mode) {
+ case MODE_NON_ATOMIC:
+ return __test_and_change_bit(nr,addr);
+
+ case MODE_ATOMIC:
+ return test_and_change_bit(nr,addr);
+
+ case MODE_ACQUIRE:
+ x = test_and_change_bit(nr,addr);
+ smp_mb();
+ return x;
+
+ case MODE_RELEASE:
+ smb_mb();
+ return test_and_change_bit(nr,addr);
+
+ case MODE_BARRIER:
+ smb_mb();
+ x = test_and_change_bit(nr,addr);
+ smb_mb();
+ return x;
+ }
+}
+
+#endif /* _ASM_GENERIC_BITOPS_MODE_H */
Christoph Lameter wrote on Thursday, March 30, 2006 4:18 PM
> Note that the current semantics for bitops IA64 are broken. Both
> smp_mb__after/before_clear_bit are now set to full memory barriers
> to compensate
Why you say that? clear_bit has built-in acq or rel semantic depends
on how you define it. I think only one of smp_mb__after/before need to
be smp_mb?
> static __inline__ void
> clear_bit (int nr, volatile void *addr)
> {
> - __u32 mask, old, new;
> - volatile __u32 *m;
> - CMPXCHG_BUGCHECK_DECL
> -
> - m = (volatile __u32 *) addr + (nr >> 5);
> - mask = ~(1 << (nr & 31));
> - do {
> - CMPXCHG_BUGCHECK(m);
> - old = *m;
> - new = old & mask;
> - } while (cmpxchg_acq(m, old, new) != old);
> + clear_bit_mode(nr, addr, MODE_ATOMIC);
> }
I would make that MODE_RELEASE for clear_bit, simply to match the
observation that clear_bit is usually used in unlock path and have
potential less surprises.
> +static __inline__ void
> +set_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + __u32 bit, old, new;
> + volatile __u32 *m;
> + CMPXCHG_BUGCHECK_DECL
> +
> + m = (volatile __u32 *) addr + (nr >> 5);
> + bit = 1 << (nr & 31);
> +
> + if (mode == MODE_NON_ATOMIC) {
> + *m |= bit;
> + return;
> + }
Please kill all volatile declaration, because for non-atomic version,
you don't need to do any memory ordering, but compiler automatically
adds memory order because of volatile. It's safe to kill them because
cmpxchg later has explicit mode in there.
Same thing goes to all other bit ops.
- Ken
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> Christoph Lameter wrote on Thursday, March 30, 2006 4:18 PM
> > Note that the current semantics for bitops IA64 are broken. Both
> > smp_mb__after/before_clear_bit are now set to full memory barriers
> > to compensate
>
> Why you say that? clear_bit has built-in acq or rel semantic depends
> on how you define it. I think only one of smp_mb__after/before need to
> be smp_mb?
clear_bit has no barrier semantics just acquire. Therefore both smp_mb_*
need to be barriers or they need to add some form of "release".
> > +static __inline__ void
> > +set_bit_mode (int nr, volatile void *addr, int mode)
> > +{
> > + __u32 bit, old, new;
> > + volatile __u32 *m;
> > + CMPXCHG_BUGCHECK_DECL
> > +
> > + m = (volatile __u32 *) addr + (nr >> 5);
> > + bit = 1 << (nr & 31);
> > +
> > + if (mode == MODE_NON_ATOMIC) {
> > + *m |= bit;
> > + return;
> > + }
>
> Please kill all volatile declaration, because for non-atomic version,
> you don't need to do any memory ordering, but compiler automatically
> adds memory order because of volatile. It's safe to kill them because
> cmpxchg later has explicit mode in there.
Ok. V3 will have that.
Christoph,
I have to agree with Hans and I'd much prefer making the mode part of
the operation's
name and not a parameter. Besides being The Right Thing, it saves a
lot of typing.
For example:
+ set_bit_mode(nr, addr, MODE_ATOMIC);
would simply become
+ set_bit_atomic(nr, addr);
Seems much better to me.
--david
On 3/30/06, Christoph Lameter <[email protected]> wrote:
> Changelog:
>
> V2
> - Fix various oversights
> - Follow Hans Boehm's scheme for the barrier logic
>
> The following patchset implements the ability to specify a
> synchronization mode for bit operations.
>
> I.e. instead of set_bit(x,y) we can do set_bit(x,y, mode).
>
> The following modes are supported:
>
> MODE_NON_ATOMIC
> Use non atomic version.
> F.e. set_bit(x,y, MODE_NON_ATOMIC) == __set_bit(x,y)
>
> MODE_ATOMIC
> The operation is atomic but there is no guarantee how this
> operation is ordered respective to other memory operations.
>
> MODE_ACQUIRE
> An atomic operation that is guaranteed to occur before
> all subsequent memory accesses
>
> MODE_RELEASE
> An atomic operation that is guaranteed to occur after
> all previos memory acceses.
>
> MODE_BARRIER
> An atomic operation that is guaranteed to occur between
> previous and later memory operations.
>
> For architectures that have no support for bitops with modes we
> fall back to some combination of memory barriers and atomic ops.
>
> This patchset defines architecture support for only IA64.
> Others could be done in a similar fashion.
>
> Note that the current semantics for bitops IA64 are broken. Both
> smp_mb__after/before_clear_bit are now set to full memory barriers
> to compensate which may affect performance.
>
> The kernel core code would need to be fixed to add the proper
> synchronization modes to restore prior performance (with then
> correct locking semantics). If kernel code wants to use synchronization
> modes then an
>
> #include <asm/bitops_mode.h>
>
> needs to be added.
>
> Signed-off-by: Christoph Lameter <[email protected]>
>
> Index: linux-2.6.16-mm2/include/asm-ia64/bitops.h
> ===================================================================
> --- linux-2.6.16-mm2.orig/include/asm-ia64/bitops.h 2006-03-30 15:01:21.000000000 -0800
> +++ linux-2.6.16-mm2/include/asm-ia64/bitops.h 2006-03-30 15:44:36.000000000 -0800
> @@ -11,6 +11,7 @@
>
> #include <linux/compiler.h>
> #include <linux/types.h>
> +#include <asm/bitops_mode.h>
> #include <asm/bitops.h>
> #include <asm/intrinsics.h>
>
> @@ -19,8 +20,6 @@
> * @nr: the bit to set
> * @addr: the address to start counting from
> *
> - * This function is atomic and may not be reordered. See __set_bit()
> - * if you do not require the atomic guarantees.
> * Note that @nr may be almost arbitrarily large; this function is not
> * restricted to acting on a single-word quantity.
> *
> @@ -34,244 +33,106 @@
> static __inline__ void
> set_bit (int nr, volatile void *addr)
> {
> - __u32 bit, old, new;
> - volatile __u32 *m;
> - CMPXCHG_BUGCHECK_DECL
> -
> - m = (volatile __u32 *) addr + (nr >> 5);
> - bit = 1 << (nr & 31);
> - do {
> - CMPXCHG_BUGCHECK(m);
> - old = *m;
> - new = old | bit;
> - } while (cmpxchg_acq(m, old, new) != old);
> + set_bit_mode(nr, addr, MODE_ATOMIC);
> }
>
> /**
> * __set_bit - Set a bit in memory
> * @nr: the bit to set
> * @addr: the address to start counting from
> - *
> - * Unlike set_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> */
> static __inline__ void
> __set_bit (int nr, volatile void *addr)
> {
> - *((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
> + set_bit_mode(nr, addr, MODE_NON_ATOMIC);
> }
>
> -/*
> - * clear_bit() has "acquire" semantics.
> - */
> #define smp_mb__before_clear_bit() smp_mb()
> -#define smp_mb__after_clear_bit() do { /* skip */; } while (0)
> +#define smp_mb__after_clear_bit() smp_mb()
>
> /**
> * clear_bit - Clears a bit in memory
> * @nr: Bit to clear
> * @addr: Address to start counting from
> - *
> - * clear_bit() is atomic and may not be reordered. However, it does
> - * not contain a memory barrier, so if it is used for locking purposes,
> - * you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit()
> - * in order to ensure changes are visible on other processors.
> */
> static __inline__ void
> clear_bit (int nr, volatile void *addr)
> {
> - __u32 mask, old, new;
> - volatile __u32 *m;
> - CMPXCHG_BUGCHECK_DECL
> -
> - m = (volatile __u32 *) addr + (nr >> 5);
> - mask = ~(1 << (nr & 31));
> - do {
> - CMPXCHG_BUGCHECK(m);
> - old = *m;
> - new = old & mask;
> - } while (cmpxchg_acq(m, old, new) != old);
> + clear_bit_mode(nr, addr, MODE_ATOMIC);
> }
>
> -/**
> - * __clear_bit - Clears a bit in memory (non-atomic version)
> - */
> static __inline__ void
> __clear_bit (int nr, volatile void *addr)
> {
> - volatile __u32 *p = (__u32 *) addr + (nr >> 5);
> - __u32 m = 1 << (nr & 31);
> - *p &= ~m;
> + clear_bit_mode(nr, addr, MODE_NON_ATOMIC);
> }
>
> /**
> * change_bit - Toggle a bit in memory
> * @nr: Bit to clear
> * @addr: Address to start counting from
> - *
> - * change_bit() is atomic and may not be reordered.
> - * Note that @nr may be almost arbitrarily large; this function is not
> - * restricted to acting on a single-word quantity.
> */
> static __inline__ void
> change_bit (int nr, volatile void *addr)
> {
> - __u32 bit, old, new;
> - volatile __u32 *m;
> - CMPXCHG_BUGCHECK_DECL
> -
> - m = (volatile __u32 *) addr + (nr >> 5);
> - bit = (1 << (nr & 31));
> - do {
> - CMPXCHG_BUGCHECK(m);
> - old = *m;
> - new = old ^ bit;
> - } while (cmpxchg_acq(m, old, new) != old);
> + change_bit_mode(nr, addr, MODE_ATOMIC);
> }
>
> -/**
> - * __change_bit - Toggle a bit in memory
> - * @nr: the bit to set
> - * @addr: the address to start counting from
> - *
> - * Unlike change_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> static __inline__ void
> __change_bit (int nr, volatile void *addr)
> {
> - *((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
> + change_bit_mode(nr, addr, MODE_NON_ATOMIC);
> }
>
> /**
> * test_and_set_bit - Set a bit and return its old value
> * @nr: Bit to set
> * @addr: Address to count from
> - *
> - * This operation is atomic and cannot be reordered.
> - * It also implies a memory barrier.
> */
> static __inline__ int
> test_and_set_bit (int nr, volatile void *addr)
> {
> - __u32 bit, old, new;
> - volatile __u32 *m;
> - CMPXCHG_BUGCHECK_DECL
> -
> - m = (volatile __u32 *) addr + (nr >> 5);
> - bit = 1 << (nr & 31);
> - do {
> - CMPXCHG_BUGCHECK(m);
> - old = *m;
> - new = old | bit;
> - } while (cmpxchg_acq(m, old, new) != old);
> - return (old & bit) != 0;
> + return test_and_set_bit_mode(nr, addr, MODE_ATOMIC);
> }
>
> -/**
> - * __test_and_set_bit - Set a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail. You must protect multiple accesses with a lock.
> - */
> static __inline__ int
> __test_and_set_bit (int nr, volatile void *addr)
> {
> - __u32 *p = (__u32 *) addr + (nr >> 5);
> - __u32 m = 1 << (nr & 31);
> - int oldbitset = (*p & m) != 0;
> -
> - *p |= m;
> - return oldbitset;
> + return test_and_set_bit_mode(nr, addr, MODE_NON_ATOMIC);
> }
>
> /**
> * test_and_clear_bit - Clear a bit and return its old value
> * @nr: Bit to set
> * @addr: Address to count from
> - *
> - * This operation is atomic and cannot be reordered.
> - * It also implies a memory barrier.
> */
> static __inline__ int
> test_and_clear_bit (int nr, volatile void *addr)
> {
> - __u32 mask, old, new;
> - volatile __u32 *m;
> - CMPXCHG_BUGCHECK_DECL
> -
> - m = (volatile __u32 *) addr + (nr >> 5);
> - mask = ~(1 << (nr & 31));
> - do {
> - CMPXCHG_BUGCHECK(m);
> - old = *m;
> - new = old & mask;
> - } while (cmpxchg_acq(m, old, new) != old);
> - return (old & ~mask) != 0;
> + return test_and_clear_bit_mode(nr, addr, MODE_ATOMIC);
> }
>
> -/**
> - * __test_and_clear_bit - Clear a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail. You must protect multiple accesses with a lock.
> - */
> static __inline__ int
> __test_and_clear_bit(int nr, volatile void * addr)
> {
> - __u32 *p = (__u32 *) addr + (nr >> 5);
> - __u32 m = 1 << (nr & 31);
> - int oldbitset = *p & m;
> -
> - *p &= ~m;
> - return oldbitset;
> + return test_and_clear_bit_mode(nr, addr, MODE_NON_ATOMIC);
> }
>
> /**
> * test_and_change_bit - Change a bit and return its old value
> * @nr: Bit to set
> * @addr: Address to count from
> - *
> - * This operation is atomic and cannot be reordered.
> - * It also implies a memory barrier.
> */
> static __inline__ int
> test_and_change_bit (int nr, volatile void *addr)
> {
> - __u32 bit, old, new;
> - volatile __u32 *m;
> - CMPXCHG_BUGCHECK_DECL
> -
> - m = (volatile __u32 *) addr + (nr >> 5);
> - bit = (1 << (nr & 31));
> - do {
> - CMPXCHG_BUGCHECK(m);
> - old = *m;
> - new = old ^ bit;
> - } while (cmpxchg_acq(m, old, new) != old);
> - return (old & bit) != 0;
> + return test_and_change_bit_mode(nr, addr, MODE_ATOMIC);
> }
>
> -/*
> - * WARNING: non atomic version.
> - */
> static __inline__ int
> __test_and_change_bit (int nr, void *addr)
> {
> - __u32 old, bit = (1 << (nr & 31));
> - __u32 *m = (__u32 *) addr + (nr >> 5);
> -
> - old = *m;
> - *m = old ^ bit;
> - return (old & bit) != 0;
> + return test_and_change_bit_mode(nr, addr, MODE_NON_ATOMIC);
> }
>
> static __inline__ int
> Index: linux-2.6.16-mm2/include/asm-ia64/bitops_mode.h
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.16-mm2/include/asm-ia64/bitops_mode.h 2006-03-30 16:07:22.000000000 -0800
> @@ -0,0 +1,204 @@
> +#ifndef _ASM_IA64_BITOPS_MODE_H
> +#define _ASM_IA64_BITOPS_MODE_H
> +
> +/*
> + * Copyright (C) 2006 Silicon Graphics, Incorporated
> + * Christoph Lameter <[email protected]>
> + *
> + * Bit operations with the ability to specify the synchronization mode
> + */
> +
> +#include <linux/compiler.h>
> +#include <linux/types.h>
> +#include <asm/intrinsics.h>
> +
> +#define MODE_NON_ATOMIC 0
> +#define MODE_ATOMIC 1
> +#define MODE_ACQUIRE 2
> +#define MODE_RELEASE 3
> +#define MODE_BARRIER 4
> +
> +static __inline__ __u32 cmpxchg_mode(volatile __u32 *m, __u32 old, __u32 new, int mode)
> +{
> + __u32 x;
> +
> + switch (mode) {
> + case MODE_ATOMIC :
> + case MODE_ACQUIRE :
> + return cmpxchg_acq(m, old, new);
> +
> + case MODE_RELEASE :
> + return cmpxchg_rel(m, old, new);
> +
> + case MODE_BARRIER :
> + x = cmpxchg_rel(m, old, new);
> + ia64_mf();
> + return x;
> + }
> +}
> +
> +
> +/**
> + * set_bit_mode - set a bit in memory
> + *
> + * The address must be (at least) "long" aligned.
> + * Note that there are driver (e.g., eepro100) which use these operations to
> + * operate on hw-defined data-structures, so we can't easily change these
> + * operations to force a bigger alignment.
> + *
> + * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
> + */
> +static __inline__ void
> +set_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + __u32 bit, old, new;
> + volatile __u32 *m;
> + CMPXCHG_BUGCHECK_DECL
> +
> + m = (volatile __u32 *) addr + (nr >> 5);
> + bit = 1 << (nr & 31);
> +
> + if (mode == MODE_NON_ATOMIC) {
> + *m |= bit;
> + return;
> + }
> +
> + do {
> + CMPXCHG_BUGCHECK(m);
> + old = *m;
> + new = old | bit;
> + } while (cmpxchg_mode(m, old, new, mode) != old);
> +}
> +
> +/**
> + * clear_bit_mode - Clears a bit in memory
> + */
> +static __inline__ void
> +clear_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + __u32 mask, old, new;
> + volatile __u32 *m;
> + CMPXCHG_BUGCHECK_DECL
> +
> + m = (volatile __u32 *) addr + (nr >> 5);
> + mask = ~(1 << (nr & 31));
> +
> + if (mode == MODE_NON_ATOMIC) {
> + *m &= mask;
> + return;
> + }
> +
> + do {
> + CMPXCHG_BUGCHECK(m);
> + old = *m;
> + new = old & mask;
> + } while (cmpxchg_mode(m, old, new, mode) != old);
> +}
> +
> +/**
> + * change_bit_mode - Toggle a bit in memory
> + */
> +static __inline__ void
> +change_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + __u32 bit, old, new;
> + volatile __u32 *m;
> + CMPXCHG_BUGCHECK_DECL
> +
> + m = (volatile __u32 *) addr + (nr >> 5);
> + bit = (1 << (nr & 31));
> +
> + if (mode == MODE_NON_ATOMIC) {
> + *m ^= bit;
> + return;
> + }
> +
> + do {
> + CMPXCHG_BUGCHECK(m);
> + old = *m;
> + new = old ^ bit;
> + } while (cmpxchg_mode(m, old, new, mode) != old);
> +}
> +
> +/**
> + * test_and_set_bit_mode - Set a bit and return its old value
> + */
> +static __inline__ int
> +test_and_set_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + __u32 bit, old, new;
> + volatile __u32 *m;
> + CMPXCHG_BUGCHECK_DECL
> +
> + m = (volatile __u32 *) addr + (nr >> 5);
> + bit = 1 << (nr & 31);
> +
> + if (mode == MODE_NON_ATOMIC) {
> + int oldbitset = *m & bit;
> + *m |= bit;
> + return oldbitset;
> + }
> +
> + do {
> + CMPXCHG_BUGCHECK(m);
> + old = *m;
> + new = old | bit;
> + } while (cmpxchg_mode(m, old, new, mode) != old);
> + return (old & bit) != 0;
> +}
> +
> +/**
> + * test_and_clear_bit_mode - Clear a bit and return its old value
> + */
> +static __inline__ int
> +test_and_clear_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + __u32 mask, old, new;
> + volatile __u32 *m;
> + CMPXCHG_BUGCHECK_DECL
> +
> + m = (volatile __u32 *) addr + (nr >> 5);
> + mask = ~(1 << (nr & 31));
> +
> + if (mode == MODE_NON_ATOMIC) {
> + int oldbitset = *m & mask;
> + *m &= mask;
> + return oldbitset;
> + }
> +
> + do {
> + CMPXCHG_BUGCHECK(m);
> + old = *m;
> + new = old & mask;
> + } while (cmpxchg_mode(m, old, new, mode) != old);
> + return (old & ~mask) != 0;
> +}
> +
> +/**
> + * test_and_change_bit_mode - Change a bit and return its old value
> + */
> +static __inline__ int
> +test_and_change_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + __u32 bit, old, new;
> + volatile __u32 *m;
> + CMPXCHG_BUGCHECK_DECL
> +
> + m = (volatile __u32 *) addr + (nr >> 5);
> + bit = (1 << (nr & 31));
> +
> + if (mode == MODE_NON_ATOMIC) {
> + old = *m;
> + *m = old ^ bit;
> + return (old & bit) != 0;
> + }
> +
> + do {
> + CMPXCHG_BUGCHECK(m);
> + old = *m;
> + new = old ^ bit;
> + } while (cmpxchg_mode(m, old, new, mode) != old);
> + return (old & bit) != 0;
> +}
> +
> +#endif /* _ASM_IA64_BITOPS_MODE_H */
> Index: linux-2.6.16-mm2/include/asm-generic/bitops_mode.h
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.16-mm2/include/asm-generic/bitops_mode.h 2006-03-30 15:44:36.000000000 -0800
> @@ -0,0 +1,220 @@
> +#ifndef _ASM_GENERIC_BITOPS_MODE_H
> +#define _ASM_GENERIC_BITOPS_MODE_H
> +
> +/*
> + * Copyright (C) 2006 Silicon Graphics, Incorporated
> + * Christoph Lameter <[email protected]>
> + *
> + * Fallback logic for bit operations with synchronization mode
> + */
> +
> +#include <linux/compiler.h>
> +#include <linux/types.h>
> +#include <asm/intrinsics.h>
> +
> +#define MODE_NON_ATOMIC 0
> +#define MODE_ATOMIC 1
> +#define MODE_ACQUIRE 2
> +#define MODE_RELEASE 3
> +#define MODE_BARRIER 4
> +
> +/**
> + * set_bit_mode - Set a bit in memory
> + *
> + * The address must be (at least) "long" aligned.
> + * Note that there are driver (e.g., eepro100) which use these operations to
> + * operate on hw-defined data-structures, so we can't easily change these
> + * operations to force a bigger alignment.
> + *
> + * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1).
> + */
> +static __inline__ void
> +set_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + switch (mode) {
> + case MODE_NON_ATOMIC:
> + __set_bit(nr,addr);
> + return;
> +
> + case MODE_ATOMIC:
> + set_bit(nr,addr);
> + return;
> +
> + case MODE_ACQUIRE:
> + set_bit(nr,addr);
> + smp_mb();
> + return;
> +
> + case MODE_RELEASE:
> + smb_mb();
> + set_bit(nr,addr);
> + return;
> +
> + case MODE_BARRIER:
> + smb_mb();
> + set_bit(nr,addr);
> + smb_mb();
> + return;
> + }
> +}
> +
> +/**
> + * clear_bit_mode - Clears a bit in memory
> + */
> +static __inline__ void
> +clear_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + switch (mode) {
> + case MODE_NON_ATOMIC:
> + __clear_bit(nr,addr);
> + return;
> +
> + case MODE_ATOMIC:
> + clear_bit(nr,addr);
> + return;
> +
> + case MODE_ACQUIRE:
> + clear_bit(nr,addr);
> + smp_mb();
> + return;
> +
> + case MODE_RELEASE:
> + smb_mb();
> + clear_bit(nr,addr);
> + return;
> +
> + case MODE_BARRIER:
> + smb_mb();
> + clear_bit(nr,addr);
> + smb_mb();
> + return;
> + }
> +}
> +
> +/**
> + * change_bit_mode - Toggle a bit in memory
> + */
> +static __inline__ void
> +change_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + switch (mode) {
> + case MODE_NON_ATOMIC:
> + __change_bit(nr,addr);
> + return;
> +
> + case MODE_ATOMIC:
> + change_bit(nr,addr);
> + return;
> +
> + case MODE_ACQUIRE:
> + change_bit(nr,addr);
> + smp_mb();
> + return;
> +
> + case MODE_RELEASE:
> + smb_mb();
> + change_bit(nr,addr);
> + return;
> +
> + case MODE_BARRIER:
> + smb_mb();
> + change_bit(nr,addr);
> + smb_mb();
> + return;
> + }
> +}
> +
> +/**
> + * test_and_set_bit_mode - Set a bit and return its old value
> + */
> +static __inline__ int
> +test_and_set_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + int x;
> + switch (mode) {
> + case MODE_NON_ATOMIC:
> + return __test_and_set_bit(nr,addr);
> +
> + case MODE_ATOMIC:
> + return test_and_set_bit(nr,addr);
> +
> + case MODE_ACQUIRE:
> + x = test_and_set_bit(nr,addr);
> + smp_mb();
> + return x;
> +
> + case MODE_RELEASE:
> + smb_mb();
> + return test_and_set_bit(nr,addr);
> +
> + case MODE_BARRIER:
> + smb_mb();
> + x = test_and_set_bit(nr,addr);
> + smb_mb();
> + return x;
> + }
> +}
> +
> +/**
> + * test_and_clear_bit - Clear a bit and return its old value
> + */
> +static __inline__ int
> +test_and_clear_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + int x;
> + switch (mode) {
> + case MODE_NON_ATOMIC:
> + return __test_and_clear_bit(nr,addr);
> +
> + case MODE_ATOMIC:
> + return test_and_clear_bit(nr,addr);
> +
> + case MODE_ACQUIRE:
> + x = test_and_clear_bit(nr,addr);
> + smp_mb();
> + return x;
> +
> + case MODE_RELEASE:
> + smb_mb();
> + return test_and_clear_bit(nr,addr);
> +
> + case MODE_BARRIER:
> + smb_mb();
> + x = test_and_set_bit(nr,addr);
> + smb_mb();
> + return x;
> + }
> +}
> +
> +/**
> + * test_and_change_bit - Change a bit and return its old value
> + */
> +static __inline__ int
> +test_and_change_bit_mode (int nr, volatile void *addr, int mode)
> +{
> + int x;
> + switch (mode) {
> + case MODE_NON_ATOMIC:
> + return __test_and_change_bit(nr,addr);
> +
> + case MODE_ATOMIC:
> + return test_and_change_bit(nr,addr);
> +
> + case MODE_ACQUIRE:
> + x = test_and_change_bit(nr,addr);
> + smp_mb();
> + return x;
> +
> + case MODE_RELEASE:
> + smb_mb();
> + return test_and_change_bit(nr,addr);
> +
> + case MODE_BARRIER:
> + smb_mb();
> + x = test_and_change_bit(nr,addr);
> + smb_mb();
> + return x;
> + }
> +}
> +
> +#endif /* _ASM_GENERIC_BITOPS_MODE_H */
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Mosberger Consulting LLC, http://www.mosberger-consulting.com/
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> > static __inline__ void
> > clear_bit (int nr, volatile void *addr)
> > {
> > - __u32 mask, old, new;
> > - volatile __u32 *m;
> > - CMPXCHG_BUGCHECK_DECL
> > -
> > - m = (volatile __u32 *) addr + (nr >> 5);
> > - mask = ~(1 << (nr & 31));
> > - do {
> > - CMPXCHG_BUGCHECK(m);
> > - old = *m;
> > - new = old & mask;
> > - } while (cmpxchg_acq(m, old, new) != old);
> > + clear_bit_mode(nr, addr, MODE_ATOMIC);
> > }
>
> I would make that MODE_RELEASE for clear_bit, simply to match the
> observation that clear_bit is usually used in unlock path and have
> potential less surprises.
clear_bit per se is defined as an atomic operation with no implications
for release or acquire. If it is used for release then either add the
appropriate barrier or use MODE_RELEASE explicitly.
It precise the uncleanness in ia64 that such semantics are attached to
these bit operations which may lead people to depend on those. We need to
either make these explicit or not depend on them.
Christoph Lameter wrote on Thursday, March 30, 2006 4:45 PM
> > I would make that MODE_RELEASE for clear_bit, simply to match the
> > observation that clear_bit is usually used in unlock path and have
> > potential less surprises.
>
> clear_bit per se is defined as an atomic operation with no implications
> for release or acquire. If it is used for release then either add the
> appropriate barrier or use MODE_RELEASE explicitly.
>
> It precise the uncleanness in ia64 that such semantics are attached to
> these bit operations which may lead people to depend on those. We need to
> either make these explicit or not depend on them.
I know, I'm saying since it doesn't make any difference from API point of
view whether it is acq, rel, or no ordering, then just make them rel as a
"preferred" Operation on ia64.
- Ken
On Thu, 30 Mar 2006, David Mosberger-Tang wrote:
> I have to agree with Hans and I'd much prefer making the mode part of
> the operation's
> name and not a parameter. Besides being The Right Thing, it saves a
> lot of typing.
IMHO It reduces the flexibility of the scheme and makes it not extendable.
Leads to a large quantity of macros that are difficult to manage.
Also some higher level functions may want to have the mode passed to them
as parameters. See f.e. include/linux/buffer_head.h. Without the
parameters you will have to maintain farms of definitions for all cases.
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> > It precise the uncleanness in ia64 that such semantics are attached to
> > these bit operations which may lead people to depend on those. We need to
> > either make these explicit or not depend on them.
>
> I know, I'm saying since it doesn't make any difference from API point of
> view whether it is acq, rel, or no ordering, then just make them rel as a
> "preferred" Operation on ia64.
That would make the behavior of clear_bit different from other bitops and
references to volatile pointers. I'd like to have this as consistent as
possible.
Christoph Lameter wrote on Thursday, March 30, 2006 4:43 PM
> > > Note that the current semantics for bitops IA64 are broken. Both
> > > smp_mb__after/before_clear_bit are now set to full memory barriers
> > > to compensate
> >
> > Why you say that? clear_bit has built-in acq or rel semantic depends
> > on how you define it. I think only one of smp_mb__after/before need to
> > be smp_mb?
>
> clear_bit has no barrier semantics just acquire. Therefore both smp_mb_*
> need to be barriers or they need to add some form of "release".
We are talking about arch specific implementation of clear_bit and smp_mb_*.
Yes, for generic code, clear_bit has no implication of memory ordering, but
for arch specific code, one should optimize those three functions with the
architecture knowledge of exactly what's happening under the hood.
- Ken
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> Christoph Lameter wrote on Thursday, March 30, 2006 4:43 PM
> > > > Note that the current semantics for bitops IA64 are broken. Both
> > > > smp_mb__after/before_clear_bit are now set to full memory barriers
> > > > to compensate
> > >
> > > Why you say that? clear_bit has built-in acq or rel semantic depends
> > > on how you define it. I think only one of smp_mb__after/before need to
> > > be smp_mb?
> >
> > clear_bit has no barrier semantics just acquire. Therefore both smp_mb_*
> > need to be barriers or they need to add some form of "release".
>
> We are talking about arch specific implementation of clear_bit and smp_mb_*.
> Yes, for generic code, clear_bit has no implication of memory ordering, but
> for arch specific code, one should optimize those three functions with the
> architecture knowledge of exactly what's happening under the hood.
Arch specific code should make this explicit too and not rely on implied
semantics. Otherwise one has to memorize that functions have to work with
different semantics in arch code and core code which makes the source
code difficult to maintain.
> Also some higher level functions may want to have the mode passed to them
> as parameters. See f.e. include/linux/buffer_head.h. Without the
> parameters you will have to maintain farms of definitions for all cases.
But if any part of the call chain from those higher level functions
down to these low level functions is not inline, then the compiler
won't be able to collapse out the "switch (mode)" ... so we'd end up
with a ton of extra object code.
-Tony
On Thu, 30 Mar 2006, Luck, Tony wrote:
> > Also some higher level functions may want to have the mode passed to them
> > as parameters. See f.e. include/linux/buffer_head.h. Without the
> > parameters you will have to maintain farms of definitions for all cases.
>
> But if any part of the call chain from those higher level functions
> down to these low level functions is not inline, then the compiler
> won't be able to collapse out the "switch (mode)" ... so we'd end up
> with a ton of extra object code.
Correct. But such bitops are typically defined to be inline.
Christoph Lameter wrote on Thursday, March 30, 2006 4:51 PM
> > > It precise the uncleanness in ia64 that such semantics are attached to
> > > these bit operations which may lead people to depend on those. We need to
> > > either make these explicit or not depend on them.
> >
> > I know, I'm saying since it doesn't make any difference from API point of
> > view whether it is acq, rel, or no ordering, then just make them rel as a
> > "preferred" Operation on ia64.
>
> That would make the behavior of clear_bit different from other bitops and
> references to volatile pointers. I'd like to have this as consistent as
> possible.
Yeah, but we just agreed that caller shouldn't be thinking clear_bit has
memory ordering at all.
- Ken
Christoph Lameter wrote on Thursday, March 30, 2006 4:55 PM
> > We are talking about arch specific implementation of clear_bit and smp_mb_*.
> > Yes, for generic code, clear_bit has no implication of memory ordering, but
> > for arch specific code, one should optimize those three functions with the
> > architecture knowledge of exactly what's happening under the hood.
>
> Arch specific code should make this explicit too and not rely on implied
> semantics. Otherwise one has to memorize that functions have to work with
> different semantics in arch code and core code which makes the source
> code difficult to maintain.
I don't know whether we are talking about the same thing: I propose for ia64:
clear_bit to have release semantic, smp_mb__before_clear_bit will be a noop, smp_mb_after_clear_bit will be a smp_mb().
Caller are still required to use smp_mb__before_clear_bit if it requires, on
ia64, that function will simply be a noop.
- Ken
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> > > I know, I'm saying since it doesn't make any difference from API point of
> > > view whether it is acq, rel, or no ordering, then just make them rel as a
> > > "preferred" Operation on ia64.
> >
> > That would make the behavior of clear_bit different from other bitops and
> > references to volatile pointers. I'd like to have this as consistent as
> > possible.
>
> Yeah, but we just agreed that caller shouldn't be thinking clear_bit has
> memory ordering at all.
In general yes the caller should not be thinking about clear_bit having
any memory ordering at all. However for IA64 arch specific code the bit
operations must have a certain ordering semantic and it would be best that
these are also consistent. clear_bit is not a lock operation and may
f.e. be used for locking something.
Christoph Lameter wrote on Thursday, March 30, 2006 5:09 PM
> In general yes the caller should not be thinking about clear_bit having
> any memory ordering at all. However for IA64 arch specific code the bit
> operations must have a certain ordering semantic and it would be best that
> these are also consistent. clear_bit is not a lock operation and may
> f.e. be used for locking something.
OK, fine. Then please don't change smp_mb__after_clear_bit() for ia64.
i.e., leave it alone as noop.
- Ken
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> > Arch specific code should make this explicit too and not rely on implied
> > semantics. Otherwise one has to memorize that functions have to work with
> > different semantics in arch code and core code which makes the source
> > code difficult to maintain.
>
> I don't know whether we are talking about the same thing: I propose for
> ia64:
>clear_bit to have release semantic,
Inconsistent with other bit operations.
>smp_mb__before_clear_bit will be a noop,
Then there will no barrier since clear_bit only has acquire semantics. This is a
bug in bit operations since smb_mb__before_clear_bit does not work as
documentted.
>smp_mb_after_clear_bit will be a smp_mb().
Ok.
> Caller are still required to use smp_mb__before_clear_bit if it requires, on
> ia64, that function will simply be a noop.
Well ultimately I wish we could move away from these
smb_mb__before/after_xx macros and use explicit synchronization ordering
instead.
If there is agreement on this patch then we can use explicit ordering in
core kernel code and slowly get rid of smb_mb__xxx.
Christoph Lameter wrote on Thursday, March 30, 2006 5:13 PM
> Then there will no barrier since clear_bit only has acquire semantics.
> This is a bug in bit operations since smb_mb__before_clear_bit does
> not work as documentted.
Well, please make up your mind with:
Option (1):
#define clear_bit clear_bit_mode(..., RELEASE)
#define Smp_mb__before_clear_bit do { } while (0)
#define Smp_mb__after_clear_bit smp_mb()
Or option (2):
#define clear_bit clear_bit_mode(..., ACQUIRE)
#define Smp_mb__before_clear_bit smp_mb()
#define Smp_mb__after_clear_bit do { } while (0)
I'm fine with either one.
- Ken
> The following patchset implements the ability to specify a
> synchronization mode for bit operations.
>
> I.e. instead of set_bit(x,y) we can do set_bit(x,y, mode).
>
> The following modes are supported:
Yuck.
The only conceivable reason for passing the mode as a separate parameter is
- To change the mode dynamically at run time.
- To share common code when the sequence is long and mostly shared
between the various modes (as in open(2) or ll_rw_block()).
I sincerely hope neither of those apply in this case.
On the downside, it's more typing and uglier than a series of
frob_bit_nonatomic()
(probably temporarily or permanently aliased to frob_bit())
frob_bit_atomic()
frob_bit_acquire()
frob_bit_release()
frob_bit_barrier()
functions, and those also prevent you from doing something silly like
frob_bit(x, y, O_DIRECT). Also, the MODE_ prefix might be wanted by
something else.
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> Christoph Lameter wrote on Thursday, March 30, 2006 5:13 PM
> > Then there will no barrier since clear_bit only has acquire semantics.
> > This is a bug in bit operations since smb_mb__before_clear_bit does
> > not work as documentted.
>
> Well, please make up your mind with:
>
> Option (1):
>
> #define clear_bit clear_bit_mode(..., RELEASE)
> #define Smp_mb__before_clear_bit do { } while (0)
> #define Smp_mb__after_clear_bit smp_mb()
>
> Or option (2):
>
> #define clear_bit clear_bit_mode(..., ACQUIRE)
> #define Smp_mb__before_clear_bit smp_mb()
> #define Smp_mb__after_clear_bit do { } while (0)
>
> I'm fine with either one.
Neither one is correct because there will always be one combination of
clear_bit with these macros that does not generate the required memory
barrier.
On Thu, 30 Mar 2006, [email protected] wrote:
> The only conceivable reason for passing the mode as a separate parameter is
> - To change the mode dynamically at run time.
> - To share common code when the sequence is long and mostly shared
> between the various modes (as in open(2) or ll_rw_block()).
There is usually quite complex code involved although the code generated
is minimal.
> On the downside, it's more typing and uglier than a series of
>
> frob_bit_nonatomic()
> (probably temporarily or permanently aliased to frob_bit())
> frob_bit_atomic()
> frob_bit_acquire()
> frob_bit_release()
> frob_bit_barrier()
>
> functions, and those also prevent you from doing something silly like
> frob_bit(x, y, O_DIRECT). Also, the MODE_ prefix might be wanted by
> something else.
Ok. We could change the MODE_ prefix but the problem with not passing this
as a parameter that there are numerous functions derived from bit ops that
are then also needed in lots of different flavors. Passing a parameter
cuts down the number of variations dramatically.
Christoph Lameter wrote on Thursday, March 30, 2006 5:38 PM
> > Christoph Lameter wrote on Thursday, March 30, 2006 5:13 PM
> > > Then there will no barrier since clear_bit only has acquire semantics.
> > > This is a bug in bit operations since smb_mb__before_clear_bit does
> > > not work as documentted.
> >
> > Well, please make up your mind with:
> >
> > Option (1):
> >
> > #define clear_bit clear_bit_mode(..., RELEASE)
> > #define Smp_mb__before_clear_bit do { } while (0)
> > #define Smp_mb__after_clear_bit smp_mb()
> >
> > Or option (2):
> >
> > #define clear_bit clear_bit_mode(..., ACQUIRE)
> > #define Smp_mb__before_clear_bit smp_mb()
> > #define Smp_mb__after_clear_bit do { } while (0)
> >
> > I'm fine with either one.
>
> Neither one is correct because there will always be one combination of
> clear_bit with these macros that does not generate the required memory
> barrier.
Can you give an example? Which combination?
- Ken
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> > > Option (1):
> > >
> > > #define clear_bit clear_bit_mode(..., RELEASE)
> > > #define Smp_mb__before_clear_bit do { } while (0)
> > > #define Smp_mb__after_clear_bit smp_mb()
> > >
> > > Or option (2):
> > >
> > > #define clear_bit clear_bit_mode(..., ACQUIRE)
> > > #define Smp_mb__before_clear_bit smp_mb()
> > > #define Smp_mb__after_clear_bit do { } while (0)
> > >
> > > I'm fine with either one.
> >
> > Neither one is correct because there will always be one combination of
> > clear_bit with these macros that does not generate the required memory
> > barrier.
>
> Can you give an example? Which combination?
For Option(1)
smp_mb__before_clear_bit()
clear_bit(...)(
For Option(2)
clear_bit()
smb_mp_after_clear_bit();
Both have either acquire or release semantics but do not have the effect
of a barrier as required by the macros.
Note that both before and after are used in the core kernel code. Both
must work correctly.
Christoph Lameter wrote on Thursday, March 30, 2006 6:38 PM
> > > Neither one is correct because there will always be one combination of
> > > clear_bit with these macros that does not generate the required memory
> > > barrier.
> >
> > Can you give an example? Which combination?
>
> For Option(1)
>
> smp_mb__before_clear_bit()
> clear_bit(...)(
Sorry, you totally lost me. It could me I'm extremely slow today. For
option (1), on ia64, clear_bit has release semantic already. The comb
of __before_clear_bit + clear_bit provides the required ordering. Did
I miss something? By the way, we are talking about detail implementation
on one specific architecture. Not some generic concept that clear_bit
has no ordering stuff in there.
- Ken
Chen, Kenneth W wrote on Thursday, March 30, 2006 6:45 PM
> Christoph Lameter wrote on Thursday, March 30, 2006 6:38 PM
> > > > Neither one is correct because there will always be one combination of
> > > > clear_bit with these macros that does not generate the required memory
> > > > barrier.
> > >
> > > Can you give an example? Which combination?
> >
> > For Option(1)
> >
> > smp_mb__before_clear_bit()
> > clear_bit(...)(
>
> Sorry, you totally lost me. It could me I'm extremely slow today. For
> option (1), on ia64, clear_bit has release semantic already. The comb
> of __before_clear_bit + clear_bit provides the required ordering. Did
> I miss something? By the way, we are talking about detail implementation
> on one specific architecture. Not some generic concept that clear_bit
> has no ordering stuff in there.
By the way, this is the same thing on x86: look at include/asm-i386/bitops.h:
#define smp_mb__before_clear_bit() barrier()
#define smp_mb__after_clear_bit() barrier()
A simple compiler barrier, nothing but
#define barrier() __asm__ __volatile__("": : :"memory")
See, no memory ordering there, because clear_bit already has a LOCK prefix.
- Ken
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> By the way, this is the same thing on x86: look at include/asm-i386/bitops.h:
>
> #define smp_mb__before_clear_bit() barrier()
> #define smp_mb__after_clear_bit() barrier()
>
> A simple compiler barrier, nothing but
> #define barrier() __asm__ __volatile__("": : :"memory")
>
> See, no memory ordering there, because clear_bit already has a LOCK prefix.
And that implies barrier behavior right?
Christoph Lameter wrote on Thursday, March 30, 2006 6:56 PM
> > By the way, this is the same thing on x86: look at include/asm-i386/bitops.h:
> >
> > #define smp_mb__before_clear_bit() barrier()
> > #define smp_mb__after_clear_bit() barrier()
> >
> > A simple compiler barrier, nothing but
> > #define barrier() __asm__ __volatile__("": : :"memory")
> >
> > See, no memory ordering there, because clear_bit already has a LOCK prefix.
>
> And that implies barrier behavior right?
No, not the memory ordering semantics you are thinking about. It just tell
compiler not to be over smart and schedule a load operation above that point
Intel compiler is good at schedule memory load way ahead of its use to hide
memory latency. gcc probably does that too, I'm not 100% sure. This prevents
the compiler to schedule load before that line.
- Ken
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> Christoph Lameter wrote on Thursday, March 30, 2006 6:38 PM
> > > > Neither one is correct because there will always be one combination of
> > > > clear_bit with these macros that does not generate the required memory
> > > > barrier.
> > >
> > > Can you give an example? Which combination?
> >
> > For Option(1)
> >
> > smp_mb__before_clear_bit()
> > clear_bit(...)(
>
> Sorry, you totally lost me. It could me I'm extremely slow today. For
> option (1), on ia64, clear_bit has release semantic already. The comb
> of __before_clear_bit + clear_bit provides the required ordering. Did
> I miss something? By the way, we are talking about detail implementation
> on one specific architecture. Not some generic concept that clear_bit
> has no ordering stuff in there.
We are talking about IA64 and IA64 only generates an single instruction
with either release or acquire semantics for the case in which either
smb_mb__before/after_clear_bit does nothing.
Neither acquire nor release is a memory barrier on IA64.
The combination of both does the equivalent but then we do not have both
acquire and release if either smb_mb__before/after_clear bit does
nothing.
For clear_bit you have both uses in the kernel
smb_mb_before_clear_bit()
clear_bit()
and
clear_bit()
smb_mb_after_clear_bit()
clear_bit() in itself does not have barrier semantics on IA64. Therefore
smb_mb_after and before must both provide a memory barrier.
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> > > See, no memory ordering there, because clear_bit already has a LOCK prefix.
> No, not the memory ordering semantics you are thinking about. It just tell
> compiler not to be over smart and schedule a load operation above that point
> Intel compiler is good at schedule memory load way ahead of its use to hide
> memory latency. gcc probably does that too, I'm not 100% sure. This prevents
> the compiler to schedule load before that line.
The compiler? I thought we were talking about the processor.
I was referring to the LOCK prefix. Doesnt that insure the processor to
go into a special state and make the bus go into a special state that
implies a barrier?
Christoph Lameter wrote on Thursday, March 30, 2006 7:02 PM
> We are talking about IA64 and IA64 only generates an single instruction
> with either release or acquire semantics for the case in which either
> smb_mb__before/after_clear_bit does nothing.
>
> Neither acquire nor release is a memory barrier on IA64.
The use of
smp_mb__before_clear_bit();
clear_bit( ... );
is: all memory operations before this call will be visible before
the clear_bit(). To me, that's release semantics.
On ia64, we map the following:
#define Smp_mb__before_clear_bit do { } while (0)
#define clear_bit clear_bit_mode(..., RELEASE)
Which looked perfect fine to me. I don't understand why you say it does
not provide memory ordering.
- Ken
Christoph Lameter wrote on Thursday, March 30, 2006 7:09 PM
> I was referring to the LOCK prefix. Doesnt that insure the processor to
> go into a special state and make the bus go into a special state that
> implies a barrier?
Yes, that's why on i386 both smp_mb__before_clear_bit() and
smp_mb__after_clear_bit() are noop.
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> Christoph Lameter wrote on Thursday, March 30, 2006 7:02 PM
> > We are talking about IA64 and IA64 only generates an single instruction
> > with either release or acquire semantics for the case in which either
> > smb_mb__before/after_clear_bit does nothing.
> >
> > Neither acquire nor release is a memory barrier on IA64.
>
>
> The use of
> smp_mb__before_clear_bit();
> clear_bit( ... );
>
> is: all memory operations before this call will be visible before
> the clear_bit(). To me, that's release semantics.
What of it? Release semantics are not a full fence or memory barrier.
> On ia64, we map the following:
> #define Smp_mb__before_clear_bit do { } while (0)
> #define clear_bit clear_bit_mode(..., RELEASE)
>
> Which looked perfect fine to me. I don't understand why you say it does
> not provide memory ordering.
It does not provide a memory barrier / fence. Later memory references can
still be moved by the processor above the instruction with release semantics.
Christoph Lameter wrote on Thursday, March 30, 2006 7:12 PM
> > The use of
> > smp_mb__before_clear_bit();
> > clear_bit( ... );
> >
> > is: all memory operations before this call will be visible before
> > the clear_bit(). To me, that's release semantics.
>
> What of it? Release semantics are not a full fence or memory barrier.
The API did not require a full fence. It is defined as a one way fence.
Christoph Lameter wrote on Thursday, March 30, 2006 7:12 PM
> > On ia64, we map the following:
> > #define Smp_mb__before_clear_bit do { } while (0)
> > #define clear_bit clear_bit_mode(..., RELEASE)
> >
> > Which looked perfect fine to me. I don't understand why you say it does
> > not provide memory ordering.
>
> It does not provide a memory barrier / fence. Later memory references can
> still be moved by the processor above the instruction with release semantics.
That is perfect legitimate, and was precisely the reason for the invention of
smp_mb__after_clear_bit - prevent later load to leak before clear_bit.
Christoph Lameter wrote:
> Changelog:
>
> V2
> - Fix various oversights
> - Follow Hans Boehm's scheme for the barrier logic
>
> The following patchset implements the ability to specify a
> synchronization mode for bit operations.
>
> I.e. instead of set_bit(x,y) we can do set_bit(x,y, mode).
>
> The following modes are supported:
>
This has acquire and release, instead of the generic kernel
memory barriers rmb and wmb. As such, I don't think it would
get merged.
> Note that the current semantics for bitops IA64 are broken. Both
> smp_mb__after/before_clear_bit are now set to full memory barriers
> to compensate which may affect performance.
I think you should fight the fights you can win and get a 90%
solution ;) at any rate you do need to fix the existing routines
unless you plan to audit all callers...
First, fix up ia64 in 2.6-head, this means fixing test_and_set_bit
and friends, smp_mb__*_clear_bit, and all the atomic operations that
both modify and return a value.
Then add test_and_set_bit_lock / clear_bit_unlock, and apply them
to a couple of critical places like page lock and buffer lock.
Is this being planned?
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
On Thu, 30 Mar 2006, Chen, Kenneth W wrote:
> > What of it? Release semantics are not a full fence or memory barrier.
>
> The API did not require a full fence. It is defined as a one way fence.
Well that explains our misunderstanding.
The issue with all these hacky macros is that they all have their own
semantics and do not work in a consistent way. More reason to make that
explicit.
Where may I find that definition?
Documentation/atomic_ops.txt implies a complete barrier and gives
an example of the use of these macros in order to obtain release
semantics. AFAIK that does not mean that this is the intended complete
behavior of a "memory barrier":
If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces are
defined which accomplish this:
void smp_mb__before_atomic_dec(void);
void smp_mb__after_atomic_dec(void);
void smp_mb__before_atomic_inc(void);
void smp_mb__after_atomic_dec(void);
For example, smp_mb__before_atomic_dec() can be used like so:
obj->dead = 1;
smp_mb__before_atomic_dec();
atomic_dec(&obj->ref_count);
It makes sure that all memory operations preceeding the atomic_dec()
call are strongly ordered with respect to the atomic counter
operation. In the above example, it guarentees that the assignment of
"1" to obj->dead will be globally visible to other cpus before the
atomic counter decrement.
Christoph Lameter wrote on Thursday, March 30, 2006 7:12 PM
> > > We are talking about IA64 and IA64 only generates an single instruction
> > > with either release or acquire semantics for the case in which either
> > > smb_mb__before/after_clear_bit does nothing.
> > >
> > > Neither acquire nor release is a memory barrier on IA64.
> >
> >
> > The use of
> > smp_mb__before_clear_bit();
> > clear_bit( ... );
> >
> > is: all memory operations before this call will be visible before
> > the clear_bit(). To me, that's release semantics.
>
> What of it? Release semantics are not a full fence or memory barrier.
>
> > On ia64, we map the following:
> > #define Smp_mb__before_clear_bit do { } while (0)
> > #define clear_bit clear_bit_mode(..., RELEASE)
> >
> > Which looked perfect fine to me. I don't understand why you say it does
> > not provide memory ordering.
>
> It does not provide a memory barrier / fence. Later memory references can
> still be moved by the processor above the instruction with release semantics.
This is probably a classic example of a sucky name leads to confusion.
There are smp_mb_ in the name, however, the semantics is really defined
as a one-way memory barrier and probably is the main reason of contention
in this discussion :-(
Another good reason to get rid of this silly smp_mb_before/after_clear_bit.
- Ken
wrong confusing implementation
On Fri, 31 Mar 2006, Nick Piggin wrote:
> This has acquire and release, instead of the generic kernel
> memory barriers rmb and wmb. As such, I don't think it would
> get merged.
Right. From the earlier conversation I had the impression that this is
what you wanted.
> > Note that the current semantics for bitops IA64 are broken. Both
> > smp_mb__after/before_clear_bit are now set to full memory barriers
> > to compensate which may affect performance.
>
> I think you should fight the fights you can win and get a 90%
> solution ;) at any rate you do need to fix the existing routines
> unless you plan to audit all callers...
>
> First, fix up ia64 in 2.6-head, this means fixing test_and_set_bit
> and friends, smp_mb__*_clear_bit, and all the atomic operations that
> both modify and return a value.
>
> Then add test_and_set_bit_lock / clear_bit_unlock, and apply them
> to a couple of critical places like page lock and buffer lock.
>
> Is this being planned?
That sounds like a long and tedious route to draw out the pain for a
couple of years and add loads of additional macro definitions all over the
header files. I'd really like a solution that allows a gradual
simplification of the macros and that has clear semantics.
So far it seems that I have not even been able to find the definitions for
the proper behavior of memory barriers.
Christoph Lameter wrote on Thursday, March 30, 2006 7:21 PM
> > > What of it? Release semantics are not a full fence or memory barrier.
> >
> > The API did not require a full fence. It is defined as a one way fence.
>
> Well that explains our misunderstanding.
>
> The issue with all these hacky macros is that they all have their own
> semantics and do not work in a consistent way. More reason to make that
> explicit.
>
> Where may I find that definition?
>
> Documentation/atomic_ops.txt implies a complete barrier and gives
> an example of the use of these macros in order to obtain release
> semantics. AFAIK that does not mean that this is the intended complete
> behavior of a "memory barrier":
>
>
>
>
> If a caller requires memory barrier semantics around an atomic_t
> operation which does not return a value, a set of interfaces are
> defined which accomplish this:
>
> void smp_mb__before_atomic_dec(void);
> void smp_mb__after_atomic_dec(void);
> void smp_mb__before_atomic_inc(void);
> void smp_mb__after_atomic_dec(void);
>
> For example, smp_mb__before_atomic_dec() can be used like so:
>
> obj->dead = 1;
> smp_mb__before_atomic_dec();
> atomic_dec(&obj->ref_count);
>
> It makes sure that all memory operations preceeding the atomic_dec()
> call are strongly ordered with respect to the atomic counter
> operation. In the above example, it guarentees that the assignment of
> "1" to obj->dead will be globally visible to other cpus before the
> atomic counter decrement.
It means we need an complete overhaul of smp_mb__before/after_*. The name
and its implied memory order semantics is not consistent and it leads to all
kinds of confusion and probably improper usage.
I'm all for making atomic bit op to have explicit ordering mode in them.
- Ken
Chen, Kenneth W wrote:
> Christoph Lameter wrote on Thursday, March 30, 2006 6:38 PM
>
>>>>Neither one is correct because there will always be one combination of
>>>>clear_bit with these macros that does not generate the required memory
>>>>barrier.
>>>
>>>Can you give an example? Which combination?
>>
>>For Option(1)
>>
>>smp_mb__before_clear_bit()
>>clear_bit(...)(
>
>
> Sorry, you totally lost me. It could me I'm extremely slow today. For
> option (1), on ia64, clear_bit has release semantic already. The comb
> of __before_clear_bit + clear_bit provides the required ordering. Did
> I miss something? By the way, we are talking about detail implementation
> on one specific architecture. Not some generic concept that clear_bit
> has no ordering stuff in there.
>
The memory ordering that above combination should produce is a
Linux style smp_mb before the clear_bit. Not a release.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
* David Mosberger-Tang ([email protected]) wrote:
> I have to agree with Hans and I'd much prefer making the mode part of
> the operation's
> name and not a parameter. Besides being The Right Thing, it saves a
> lot of typing.
> For example:
>
> + set_bit_mode(nr, addr, MODE_ATOMIC);
>
> would simply become
>
> + set_bit_atomic(nr, addr);
That's the approach Xen took as well.
thanks,
-chris
Nick Piggin wrote on Thursday, March 30, 2006 6:53 PM
> >>smp_mb__before_clear_bit()
> >>clear_bit(...)(
> >
> > Sorry, you totally lost me. It could me I'm extremely slow today. For
> > option (1), on ia64, clear_bit has release semantic already. The comb
> > of __before_clear_bit + clear_bit provides the required ordering. Did
> > I miss something? By the way, we are talking about detail implementation
> > on one specific architecture. Not some generic concept that clear_bit
> > has no ordering stuff in there.
> >
>
> The memory ordering that above combination should produce is a
> Linux style smp_mb before the clear_bit. Not a release.
Whoever designed the smp_mb_before/after_* clearly understand the
difference between a bidirectional smp_mb() and a one-way memory
ordering. If smp_mb_before/after are equivalent to smp_mb, what's
the point of introducing another interface?
- Ken
Christoph Lameter wrote:
> On Fri, 31 Mar 2006, Nick Piggin wrote:
>
>
>>This has acquire and release, instead of the generic kernel
>>memory barriers rmb and wmb. As such, I don't think it would
>>get merged.
>
>
> Right. From the earlier conversation I had the impression that this is
> what you wanted.
>
Perhaps it is the best way to go, but you need to fix ia64's
current problems first. Again -- you don't plan to audit and
convert _all_ current users in one go do you?
>
>>>Note that the current semantics for bitops IA64 are broken. Both
>>>smp_mb__after/before_clear_bit are now set to full memory barriers
>>>to compensate which may affect performance.
>>
>>I think you should fight the fights you can win and get a 90%
>>solution ;) at any rate you do need to fix the existing routines
>>unless you plan to audit all callers...
>>
>>First, fix up ia64 in 2.6-head, this means fixing test_and_set_bit
>>and friends, smp_mb__*_clear_bit, and all the atomic operations that
>>both modify and return a value.
>>
>>Then add test_and_set_bit_lock / clear_bit_unlock, and apply them
>>to a couple of critical places like page lock and buffer lock.
>>
>>Is this being planned?
>
>
> That sounds like a long and tedious route to draw out the pain for a
> couple of years and add loads of additional macro definitions all over the
> header files. I'd really like a solution that allows a gradual
> simplification of the macros and that has clear semantics.
>
Why? Yours is the long drawn out plan.
You acknowledge that you have to fix ia64 to match current semantics
first, right?
Now people seem to be worried about the performance impact that will
have, so I simply suggest that adding two or three new macros for the
important cases to give you a 90% solution.
You would then be free to discuss and design a 95% solution ;)
> So far it seems that I have not even been able to find the definitions for
> the proper behavior of memory barriers.
I think Documentation/atomic_ops.txt isn't bad. smp_mb__* really
is a smp_mb, which can be optimised sometimes.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
Christoph Lameter <[email protected]> writes:
> MODE_BARRIER
> An atomic operation that is guaranteed to occur between
> previous and later memory operations.
I think it's a bad idea to create such an complicated interface.
The chances that an average kernel coder will get these right are
quite small. And it will be 100% untested outside IA64 I guess
and thus likely be always slightly buggy as kernel code continues
to change.
-Andi
Chen, Kenneth W wrote:
> Nick Piggin wrote on Thursday, March 30, 2006 6:53 PM
>>The memory ordering that above combination should produce is a
>>Linux style smp_mb before the clear_bit. Not a release.
>
>
> Whoever designed the smp_mb_before/after_* clearly understand the
> difference between a bidirectional smp_mb() and a one-way memory
> ordering. If smp_mb_before/after are equivalent to smp_mb, what's
> the point of introducing another interface?
>
They are not. They provide equivalent barrier when performed
before/after a clear_bit, there is a big difference.
You guys (ia64) are the ones who want to introduce a new
interface, because you think conforming to the kernel's current
interfaces will be too costly. I simply suggested a way you
could do this that would have a chance of being merged.
If you want to change the semantics of smp_mb__*, then good
luck auditing all that well documented code that uses it.
I just happen to think your best bet is to stick with the
obvious full barrier semantics (which is what other
architectures, eg powerpc do), and introduce something new
if you want more performance.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
I would argue that there are two semi-viable approaches to this:
1) Guarantee that all atomic operations always include a full
memory fence/barrier. That way the programmer can largely ignore
associated memory ordering issues.
2) Make the ordering semantics as explicit and clean as possible, which
is being proposed here.
It seems to me that anything in the middle doesn't work well. If
programmers have to think about ordering at all, you want to make it as
difficult as possible to overlook.
My impression is that approach (1) tends not to stick, since it involves
a substantial performance hit on architectures on which the fence is
not implicitly included in atomic operations. Those include Itanium and
PowerPC.
Hans
On Fri, 31 Mar 2006, Andi Kleen wrote:
> Christoph Lameter <[email protected]> writes:
> > MODE_BARRIER
> > An atomic operation that is guaranteed to occur between
> > previous and later memory operations.
>
>
> I think it's a bad idea to create such an complicated interface.
> The chances that an average kernel coder will get these right are
> quite small. And it will be 100% untested outside IA64 I guess
> and thus likely be always slightly buggy as kernel code continues
> to change.
>
> -Andi
>
On Friday 31 March 2006 18:22, Hans Boehm wrote:
> My impression is that approach (1) tends not to stick, since it involves
> a substantial performance hit on architectures on which the fence is
> not implicitly included in atomic operations. Those include Itanium and
> PowerPC.
At least the PPC people are eating the overhead because back when they
didn't they had a long string of subtle powerpc only bugs caused by that
It's a stability/maintainability vs performance issue. I doubt the
performance advantage would be worth the additional work. I guess
with the engineering time you would need to spend getting all this right
you could do much more fruitful optimizations.
-Andi
On Fri, 31 Mar 2006, Nick Piggin wrote:
> >
> > Right. From the earlier conversation I had the impression that this is what
> > you wanted.
> Perhaps it is the best way to go, but you need to fix ia64's
> current problems first. Again -- you don't plan to audit and
> convert _all_ current users in one go do you?
No. If you look at the patch you will see that this fixes the ia64
problems for the existing scheme and then allows the use of new _mode
bitops if you include bitops_mode.h from core code. This is designed for a
gradual change.
> You acknowledge that you have to fix ia64 to match current semantics
> first, right?
Right. I believe I have done so by making both smb_mb_* full barriers.
> Now people seem to be worried about the performance impact that will
> have, so I simply suggest that adding two or three new macros for the
> important cases to give you a 90% solution.
We could transition some key locations of core code to use _mode bitops
if there are performance problems.
> I think Documentation/atomic_ops.txt isn't bad. smp_mb__* really
> is a smp_mb, which can be optimised sometimes.
Ok. Then we are on the same page and the solution I presented may be
acceptable. I have a new rev here that changes the naming a bit but I
think we are okay right?
On Fri, 31 Mar 2006, Andi Kleen wrote:
> Christoph Lameter <[email protected]> writes:
> > MODE_BARRIER
> > An atomic operation that is guaranteed to occur between
> > previous and later memory operations.
> I think it's a bad idea to create such an complicated interface.
> The chances that an average kernel coder will get these right are
> quite small. And it will be 100% untested outside IA64 I guess
> and thus likely be always slightly buggy as kernel code continues
> to change.
Powerpc can do similar things AFAIK. Not sure what other arches have
finer grained control over barriers but it could cover a lot of special
cases for other processors as well.
On Fri, 31 Mar 2006, Andi Kleen wrote:
> On Friday 31 March 2006 18:22, Hans Boehm wrote:
>
> > My impression is that approach (1) tends not to stick, since it involves
> > a substantial performance hit on architectures on which the fence is
> > not implicitly included in atomic operations. Those include Itanium and
> > PowerPC.
>
> At least the PPC people are eating the overhead because back when they
> didn't they had a long string of subtle powerpc only bugs caused by that
PPC has barriers for both smb_mb_before/after cases. IMHO we should do the
same for ia64 and not fuzz around.
> It's a stability/maintainability vs performance issue. I doubt the
> performance advantage would be worth the additional work. I guess
> with the engineering time you would need to spend getting all this right
> you could do much more fruitful optimizations.
Agreed.
On Friday 31 March 2006 19:45, Christoph Lameter wrote:
> On Fri, 31 Mar 2006, Andi Kleen wrote:
>
> > Christoph Lameter <[email protected]> writes:
> > > MODE_BARRIER
> > > An atomic operation that is guaranteed to occur between
> > > previous and later memory operations.
> > I think it's a bad idea to create such an complicated interface.
> > The chances that an average kernel coder will get these right are
> > quite small. And it will be 100% untested outside IA64 I guess
> > and thus likely be always slightly buggy as kernel code continues
> > to change.
>
> Powerpc can do similar things AFAIK. Not sure what other arches have
> finer grained control over barriers but it could cover a lot of special
> cases for other processors as well.
Yes, but I don't think the goal of a portable atomic operations API
in Linux is it to cover everybody's special case in every possible
combination. The goal is to have an abstraction that will lead to
portable code. I don't think your proposal will do this.
-Andi
On Fri, 31 Mar 2006, Andi Kleen wrote:
> > Powerpc can do similar things AFAIK. Not sure what other arches have
> > finer grained control over barriers but it could cover a lot of special
> > cases for other processors as well.
>
> Yes, but I don't think the goal of a portable atomic operations API
> in Linux is it to cover everybody's special case in every possible
> combination. The goal is to have an abstraction that will lead to
> portable code. I don't think your proposal will do this.
AFAIK The goal of a bitmap operations API (we are not talking about atomic
operations here) is to make bitmap operations as efficient as possible and
as simple as possible. We already have multiple special cases for each
bitmap operation
I.e.
clear_bit()
__clear_bit()
and some people talk abouit
clear_bit_lock()
clear_bit_unlock()
What I am prosing is to do one clear_bit_mode function that can take a
parameter customizing its synchronization behavior.
clear_bit_mode(bit,addr,mode)
Nick Piggin wrote on Thursday, March 30, 2006 11:35 PM
> > >The memory ordering that above combination should produce is a
> > >Linux style smp_mb before the clear_bit. Not a release.
> >
> > Whoever designed the smp_mb_before/after_* clearly understand the
> > difference between a bidirectional smp_mb() and a one-way memory
> > ordering. If smp_mb_before/after are equivalent to smp_mb, what's
> > the point of introducing another interface?
> >
> They are not. They provide equivalent barrier when performed
> before/after a clear_bit, there is a big difference.
The usage so far that I can see for
smp_mb__before_clear_bit()
clear_bit
is to close a critical section with clear_bit. I will be hard impressed
to see a usage that allows stuff follows clear_bit to pass clear_bit, but
not to pass the smp_mb_before_xxx.
<end of critical section>
smp_mb_before_clear_bit
clear_bit
<begin other code>
But if you stand on the ground of smp_mb_before_xxx protects clear_bit
from occurring before the "end of critical section", then smp_mb_before
is such a brain dead interface and it is another good reason for having
an explicit ordering mode built into the clear_bit.
Nick Piggin wrote on Thursday, March 30, 2006 11:35 PM
> > Whoever designed the smp_mb_before/after_* clearly understand the
> > difference between a bidirectional smp_mb() and a one-way memory
> > ordering. If smp_mb_before/after are equivalent to smp_mb, what's
> > the point of introducing another interface?
> >
>
> They are not. They provide equivalent barrier when performed
> before/after a clear_bit, there is a big difference.
Just to give another blunt brutal example, what is said here is equivalent
to say kernel requires:
<end of critical section>
smp_mb_before_spin_unlock
spin_unlock
Because it is undesirable to have spin_unlock to leak into the critical
Section and allow critical section to leak after spin_unlock. This is
just plain brain dead.
On Fri, 31 Mar 2006, Chen, Kenneth W wrote:
> > They are not. They provide equivalent barrier when performed
> > before/after a clear_bit, there is a big difference.
>
> Just to give another blunt brutal example, what is said here is equivalent
> to say kernel requires:
>
> <end of critical section>
> smp_mb_before_spin_unlock
> spin_unlock
>
> Because it is undesirable to have spin_unlock to leak into the critical
> Section and allow critical section to leak after spin_unlock. This is
> just plain brain dead.
I think we could say that lock semantics are different from barriers. They
are more like acquire and release on IA64. The problem with smb_mb_*** is
that the coder *explicitly* requested a barrier operation and we do not
give it to him.
Christoph Lameter wrote on Friday, March 31, 2006 1:15 PM
> > > They are not. They provide equivalent barrier when performed
> > > before/after a clear_bit, there is a big difference.
> >
> > Just to give another blunt brutal example, what is said here is equivalent
> > to say kernel requires:
> >
> > <end of critical section>
> > smp_mb_before_spin_unlock
> > spin_unlock
> >
> > Because it is undesirable to have spin_unlock to leak into the critical
> > Section and allow critical section to leak after spin_unlock. This is
> > just plain brain dead.
>
> I think we could say that lock semantics are different from barriers. They
> are more like acquire and release on IA64. The problem with smb_mb_*** is
> that the coder *explicitly* requested a barrier operation and we do not
> give it to him.
I was browsing sparc64 code and it defines:
include/asm-sparc64/bitops.h:
#define smp_mb__after_clear_bit() membar_storeload_storestore()
With my very na?ve knowledge of sparc64, it doesn't look like a full barrier.
Maybe sparc64 is broken too ...
On Fri, 31 Mar 2006, Chen, Kenneth W wrote:
> > I think we could say that lock semantics are different from barriers. They
> > are more like acquire and release on IA64. The problem with smb_mb_*** is
> > that the coder *explicitly* requested a barrier operation and we do not
> > give it to him.
>
> I was browsing sparc64 code and it defines:
>
> include/asm-sparc64/bitops.h:
> #define smp_mb__after_clear_bit() membar_storeload_storestore()
>
> With my very na?ve knowledge of sparc64, it doesn't look like a full barrier.
> Maybe sparc64 is broken too ...
Dave, how does sparc64 handle this situation?
Christoph Lameter wrote:
> On Fri, 31 Mar 2006, Chen, Kenneth W wrote:
>
>
>>>I think we could say that lock semantics are different from barriers. They
>>>are more like acquire and release on IA64. The problem with smb_mb_*** is
>>>that the coder *explicitly* requested a barrier operation and we do not
>>>give it to him.
>>
>>I was browsing sparc64 code and it defines:
>>
>>include/asm-sparc64/bitops.h:
>>#define smp_mb__after_clear_bit() membar_storeload_storestore()
>>
>>With my very na?ve knowledge of sparc64, it doesn't look like a full barrier.
>>Maybe sparc64 is broken too ...
>
>
> Dave, how does sparc64 handle this situation?
It looks like sparc64 always expects paired smp_mb__* operations,
before and after the clear_bit.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
Christoph Lameter wrote:
> On Fri, 31 Mar 2006, Nick Piggin wrote:
>>You acknowledge that you have to fix ia64 to match current semantics
>>first, right?
>
>
> Right. I believe I have done so by making both smb_mb_* full barriers.
All bitop and atomic test_and_set, inc_return, etc etc (ie. everything
that modifies the operand and returns something) needs to be a full
barrier before and after too.
>>Now people seem to be worried about the performance impact that will
>>have, so I simply suggest that adding two or three new macros for the
>>important cases to give you a 90% solution.
>
>
> We could transition some key locations of core code to use _mode bitops
> if there are performance problems.
>
>
>>I think Documentation/atomic_ops.txt isn't bad. smp_mb__* really
>>is a smp_mb, which can be optimised sometimes.
>
>
> Ok. Then we are on the same page and the solution I presented may be
> acceptable. I have a new rev here that changes the naming a bit but I
> think we are okay right?
Not sure, to be honest. I think it is probably something which needs
input from all the other arch people, and Linus, if you intend to use
it to introduce new types of barriers.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com