2001-10-09 01:55:23

by Paul E. McKenney

Subject: RFC: patch to allow lock-free traversal of lists with insertion

Request for comments...

This is a proposal to provide a wmb()-like primitive that enables
lock-free traversal of lists while elements are concurrently being
inserted into these lists.

This is already possible on many popular architectures, but not on
some CPUs with very weak memory consistency models. See:

http://lse.sourceforge.net/locking/wmbdd.html

for more details.
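To make the intended usage concrete, here is a rough sketch of an updater
and a reader built on the proposed primitive. It is an illustration only:
the struct elem, head, and list_lock names are not part of the patch, and
updaters are assumed to serialize among themselves with an ordinary
spinlock.

    struct elem {
            struct elem *next;
            int key;
    };
    static struct elem *head;
    static spinlock_t list_lock = SPIN_LOCK_UNLOCKED;

    /* Updater: initialize the element fully, then link it in.
     * wmbdd() is the primitive proposed below. */
    void insert(struct elem *new, int key)
    {
            new->key = key;
            spin_lock(&list_lock);
            new->next = head;
            wmbdd();        /* make the stores to *new visible first */
            head = new;
            spin_unlock(&list_lock);
    }

    /* Reader: no lock and no read-side barrier; the loads of head,
     * p->key and p->next are ordered by their data dependencies. */
    int search(int key)
    {
            struct elem *p;

            for (p = head; p != NULL; p = p->next)
                    if (p->key == key)
                            return 1;
            return 0;
    }

The point of wmbdd() is that search() needs neither a lock nor an explicit
read-side barrier, relying only on the data dependency from the pointer
load to the loads through it.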

I am particularly interested in comments from people who understand
the detailed operation of the SPARC membar instruction and the PARISC
SYNC instruction. My belief is that the membar("#SYNC") and SYNC
instructions are sufficient, but the PA-RISC 2.0 Architecture book by
Kane is not completely clear, and I have not yet received my copy of
the SPARC architecture manual.

Thoughts?

Thanx, Paul

PS. This patch applies cleanly to 2.4.9 and 2.4.10. I have tested
the Alpha algorithm in an emulated environment, but do not have
access to most of the machines with interesting changes.


diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/arch/alpha/kernel/smp.c linux-v2.4.9.wmbdd/arch/alpha/kernel/smp.c
--- linux-v2.4.9/arch/alpha/kernel/smp.c Sun Aug 12 10:51:41 2001
+++ linux-v2.4.9.wmbdd/arch/alpha/kernel/smp.c Fri Sep 21 13:13:52 2001
@@ -55,8 +55,20 @@
IPI_RESCHEDULE,
IPI_CALL_FUNC,
IPI_CPU_STOP,
+ IPI_MB,
};

+/* Global and per-CPU state for global MB shootdown. */
+static struct {
+ spinlock_t mutex;
+ unsigned long need_mb; /* bitmask of CPUs that need to do "mb". */
+ long curgen; /* Each "generation" is a group of requests */
+ long maxgen; /* that is handled by one set of "mb"s. */
+} mb_global_data __cacheline_aligned = { SPIN_LOCK_UNLOCKED, 0, 1, 0 };
+static struct {
+ unsigned long mygen ____cacheline_aligned;
+} mb_data[NR_CPUS] __cacheline_aligned;
+
spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;

/* Set to a secondary's cpuid when it comes online. */
@@ -764,6 +776,41 @@
goto again;
}

+/*
+ * Execute an "mb" instruction in response to an IPI_MB. Also directly
+ * called by smp_global_mb(). If this is the last CPU to respond to
+ * an smp_global_mb(), then check to see if an additional generation of
+ * requests needs to be satisfied.
+ */
+
+void
+handle_mb_ipi(void)
+{
+ int this_cpu = smp_processor_id();
+ unsigned long this_cpu_mask = 1UL << this_cpu;
+ unsigned long flags;
+ unsigned long to_whom = cpu_present_mask ^ this_cpu_mask;
+
+ /* Avoid lock contention when extra IPIs arrive (due to race) and
+ when waiting for global mb shootdown. */
+ if ((mb_global_data.need_mb & this_cpu_mask) == 0) {
+ return;
+ }
+ spin_lock_irqsave(&mb_global_data.mutex, flags); /* implied mb */
+ if ((mb_global_data.need_mb & this_cpu_mask) == 0) {
+ spin_unlock_irqrestore(&mb_global_data.mutex, flags);
+ return;
+ }
+ mb_global_data.need_mb &= this_cpu_mask;
+ if (mb_global_data.need_mb == 0) {
+ if (++mb_global_data.curgen - mb_global_data.maxgen <= 0) {
+ mb_global_data.need_mb = to_whom;
+ send_ipi_message(to_whom, IPI_MB);
+ }
+ }
+ spin_unlock_irqrestore(&mb_global_data.mutex, flags); /* implied mb */
+}
+
void
handle_ipi(struct pt_regs *regs)
{
@@ -817,6 +864,9 @@
else if (which == IPI_CPU_STOP) {
halt();
}
+ else if (which == IPI_MB) {
+ handle_mb_ipi();
+ }
else {
printk(KERN_CRIT "Unknown IPI on CPU %d: %lu\n",
this_cpu, which);
@@ -852,6 +902,58 @@
printk(KERN_WARNING "smp_send_stop: Not on boot cpu.\n");
#endif
send_ipi_message(to_whom, IPI_CPU_STOP);
+}
+
+/*
+ * Execute an "mb" instruction, then force all other CPUs to execute "mb"
+ * instructions. Does not block. Once this function returns, the caller
+ * is guaranteed that all of its memory writes preceding the call to
+ * smp_global_mb() will be seen by all CPUs as preceding all memory
+ * writes following the call to smp_global_mb().
+ *
+ * For example, if CPU 0 does:
+ * a.data = 1;
+ * smp_global_mb();
+ * p = &a;
+ * and CPU 1 does:
+ * d = p->data;
+ * where a.data is initially garbage and p initially points to another
+ * structure with the "data" field being zero, then CPU 1 will be
+ * guaranteed to have "d" set to either 0 or 1, never garbage.
+ *
+ * Note that the Alpha "wmb" instruction is -not- sufficient!!! If CPU 0
+ * were to replace the smp_global_mb() with a wmb(), then CPU 1 could end
+ * up with garbage in "d"!
+ *
+ * This function sends IPIs to all other CPUs, then spins waiting for
+ * them to receive the IPI and execute an "mb" instruction. While
+ * spinning, this function -must- respond to other CPUs executing
+ * smp_global_mb() concurrently, otherwise, deadlock would result.
+ */
+
+void
+smp_global_mb(void)
+{
+ int this_cpu = smp_processor_id();
+ unsigned long this_cpu_mask = 1UL << this_cpu;
+ unsigned long flags;
+ unsigned long to_whom = cpu_present_mask ^ this_cpu_mask;
+
+ spin_lock_irqsave(&mb_global_data.mutex, flags); /* implied mb */
+ if (mb_global_data.curgen - mb_global_data.maxgen <= 0) {
+ mb_global_data.maxgen = mb_global_data.curgen + 1;
+ } else {
+ mb_global_data.maxgen = mb_global_data.curgen;
+ mb_global_data.need_mb = to_whom;
+ send_ipi_message(to_whom, IPI_MB);
+ }
+ mb_data[this_cpu].mygen = mb_global_data.maxgen;
+ spin_unlock_irqrestore(&mb_global_data.mutex, flags);
+ while (mb_data[this_cpu].mygen - mb_global_data.curgen >= 0) {
+ handle_mb_ipi();
+ barrier();
+ }
+
}

/*
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-alpha/system.h linux-v2.4.9.wmbdd/include/asm-alpha/system.h
--- linux-v2.4.9/include/asm-alpha/system.h Sun Aug 12 10:38:47 2001
+++ linux-v2.4.9.wmbdd/include/asm-alpha/system.h Fri Sep 21 13:14:37 2001
@@ -151,14 +151,21 @@
#define wmb() \
__asm__ __volatile__("wmb": : :"memory")

+#define mbdd() smp_mbdd()
+#define wmbdd() smp_wmbdd()
+
#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() smp_global_mb()
+#define smp_wmbdd() smp_mbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) \
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-arm/system.h linux-v2.4.9.wmbdd/include/asm-arm/system.h
--- linux-v2.4.9/include/asm-arm/system.h Mon Nov 27 17:07:59 2000
+++ linux-v2.4.9.wmbdd/include/asm-arm/system.h Wed Sep 19 14:35:07 2001
@@ -39,6 +39,8 @@
#define mb() __asm__ __volatile__ ("" : : : "memory")
#define rmb() mb()
#define wmb() mb()
+#define mbdd() mb()
+#define wmbdd() wmb()
#define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t");

#define prepare_to_switch() do { } while(0)
@@ -68,12 +70,16 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() rmbdd()
+#define smp_wmbdd() wmbdd()

#else

#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()

#define cli() __cli()
#define sti() __sti()
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-cris/system.h linux-v2.4.9.wmbdd/include/asm-cris/system.h
--- linux-v2.4.9/include/asm-cris/system.h Tue May 1 16:05:00 2001
+++ linux-v2.4.9.wmbdd/include/asm-cris/system.h Wed Sep 19 14:36:26 2001
@@ -144,15 +144,21 @@
#define mb() __asm__ __volatile__ ("" : : : "memory")
#define rmb() mb()
#define wmb() mb()
+#define mbdd() mb()
+#define wmbdd() wmb()

#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define iret()
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-i386/system.h linux-v2.4.9.wmbdd/include/asm-i386/system.h
--- linux-v2.4.9/include/asm-i386/system.h Wed Aug 15 14:21:11 2001
+++ linux-v2.4.9.wmbdd/include/asm-i386/system.h Thu Sep 20 10:09:09 2001
@@ -272,15 +272,21 @@
#define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
#define rmb() mb()
#define wmb() __asm__ __volatile__ ("": : :"memory")
+#define mbdd() mb()
+#define wmbdd() wmb()

#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) do { xchg(&var, value); } while (0)
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-ia64/system.h linux-v2.4.9.wmbdd/include/asm-ia64/system.h
--- linux-v2.4.9/include/asm-ia64/system.h Tue Jul 31 10:30:09 2001
+++ linux-v2.4.9.wmbdd/include/asm-ia64/system.h Fri Sep 21 13:17:01 2001
@@ -84,11 +84,36 @@
* like regions are visible before any subsequent
* stores and that all following stores will be
* visible only after all previous stores.
- * rmb(): Like wmb(), but for reads.
+ * In common code, any reads that depend on this
+ * ordering must be separated by an mb() or rmb().
+ * rmb(): Guarantees that all preceding loads to memory-
+ * like regions are executed before any subsequent
+ * loads.
* mb(): wmb()/rmb() combo, i.e., all previous memory
* accesses are visible before all subsequent
* accesses and vice versa. This is also known as
- * a "fence."
+ * a "fence." Again, in common code, any reads that
+ * depend on the order of writes must themselves be
+ * separated by an mb() or rmb().
+ * wmbdd(): Guarantees that all preceding stores to memory-
+ * like regions are visible before any subsequent
+ * stores and that all following stores will be
+ * visible only after all previous stores.
+ * In common code, any reads that depend on this
+ * ordering either must be separated by an mb()
+ * or rmb(), or the later reads must depend on
+ * data loaded by the earlier reads. For an example
+ * of the latter, consider "p->next". The read of
+ * the "next" field depends on the read of the
+ * pointer "p".
+ * mbdd(): wmb()/rmb() combo, i.e., all previous memory
+ * accesses are visible before all subsequent
+ * accesses and vice versa. This is also known as
+ * a "fence." Again, in common code, any reads that
+ * depend on the order of writes must themselves be
+ * separated by an mb() or rmb(), or there must be
+ * a data dependency that forces the second to
+ * wait until the first completes.
*
* Note: "mb()" and its variants cannot be used as a fence to order
* accesses to memory mapped I/O registers. For that, mf.a needs to
@@ -99,15 +124,21 @@
#define mb() __asm__ __volatile__ ("mf" ::: "memory")
#define rmb() mb()
#define wmb() mb()
+#define rmbdd() mb()
+#define wmbdd() mb()

#ifdef CONFIG_SMP
# define smp_mb() mb()
# define smp_rmb() rmb()
# define smp_wmb() wmb()
+# define smp_mbdd() mbdd()
+# define smp_wmbdd() wmbdd()
#else
# define smp_mb() barrier()
# define smp_rmb() barrier()
# define smp_wmb() barrier()
+# define smp_mbdd() barrier()
+# define smp_wmbdd() barrier()
#endif

/*
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-m68k/system.h linux-v2.4.9.wmbdd/include/asm-m68k/system.h
--- linux-v2.4.9/include/asm-m68k/system.h Mon Jun 11 19:15:27 2001
+++ linux-v2.4.9.wmbdd/include/asm-m68k/system.h Wed Sep 19 14:39:18 2001
@@ -81,12 +81,16 @@
#define mb() barrier()
#define rmb() barrier()
#define wmb() barrier()
+#define rmbdd() barrier()
+#define wmbdd() barrier()
#define set_mb(var, value) do { xchg(&var, value); } while (0)
#define set_wmb(var, value) do { var = value; wmb(); } while (0)

#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()


#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-mips/system.h linux-v2.4.9.wmbdd/include/asm-mips/system.h
--- linux-v2.4.9/include/asm-mips/system.h Mon Jul 2 13:56:40 2001
+++ linux-v2.4.9.wmbdd/include/asm-mips/system.h Wed Sep 19 14:41:26 2001
@@ -152,6 +152,8 @@
#define rmb() do { } while(0)
#define wmb() wbflush()
#define mb() wbflush()
+#define wmbdd() wbflush()
+#define mbdd() wbflush()

#else /* CONFIG_CPU_HAS_WB */

@@ -167,6 +169,8 @@
: "memory")
#define rmb() mb()
#define wmb() mb()
+#define wmbdd() mb()
+#define mbdd() mb()

#endif /* CONFIG_CPU_HAS_WB */

@@ -174,10 +178,14 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) \
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-mips64/system.h linux-v2.4.9.wmbdd/include/asm-mips64/system.h
--- linux-v2.4.9/include/asm-mips64/system.h Wed Jul 4 11:50:39 2001
+++ linux-v2.4.9.wmbdd/include/asm-mips64/system.h Wed Sep 19 14:42:41 2001
@@ -148,15 +148,21 @@
: "memory")
#define rmb() mb()
#define wmb() mb()
+#define rmbdd() mb()
+#define wmbdd() mb()

#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) \
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-parisc/system.h linux-v2.4.9.wmbdd/include/asm-parisc/system.h
--- linux-v2.4.9/include/asm-parisc/system.h Wed Dec 6 11:46:39 2000
+++ linux-v2.4.9.wmbdd/include/asm-parisc/system.h Wed Sep 19 14:06:31 2001
@@ -51,6 +51,8 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() rmb()
+#define smp_wmbdd() wmb()
#else
/* This is simply the barrier() macro from linux/kernel.h but when serial.c
* uses tqueue.h uses smp_mb() defined using barrier(), linux/kernel.h
@@ -59,6 +61,8 @@
#define smp_mb() __asm__ __volatile__("":::"memory");
#define smp_rmb() __asm__ __volatile__("":::"memory");
#define smp_wmb() __asm__ __volatile__("":::"memory");
+#define smp_mbdd() __asm__ __volatile__("":::"memory");
+#define smp_wmbdd() __asm__ __volatile__("":::"memory");
#endif

/* interrupt control */
@@ -122,6 +126,8 @@

#define mb() __asm__ __volatile__ ("sync" : : :"memory")
#define wmb() mb()
+#define mbdd() mb()
+#define wmbdd() mb()

extern unsigned long __xchg(unsigned long, unsigned long *, int);

diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-ppc/system.h linux-v2.4.9.wmbdd/include/asm-ppc/system.h
--- linux-v2.4.9/include/asm-ppc/system.h Mon May 21 15:02:06 2001
+++ linux-v2.4.9.wmbdd/include/asm-ppc/system.h Wed Sep 19 14:07:51 2001
@@ -33,6 +33,8 @@
#define mb() __asm__ __volatile__ ("sync" : : : "memory")
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("eieio" : : : "memory")
+#define mbdd() mb()
+#define wmbdd() wmb()

#define set_mb(var, value) do { var = value; mb(); } while (0)
#define set_wmb(var, value) do { var = value; wmb(); } while (0)
@@ -41,10 +43,14 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mb()
+#define smp_wmbdd() wmb()
#else
#define smp_mb() __asm__ __volatile__("": : :"memory")
#define smp_rmb() __asm__ __volatile__("": : :"memory")
#define smp_wmb() __asm__ __volatile__("": : :"memory")
+#define smp_mbdd() __asm__ __volatile__("": : :"memory")
+#define smp_wmbdd() __asm__ __volatile__("": : :"memory")
#endif /* CONFIG_SMP */

#ifdef __KERNEL__
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-s390/system.h linux-v2.4.9.wmbdd/include/asm-s390/system.h
--- linux-v2.4.9/include/asm-s390/system.h Wed Jul 25 14:12:02 2001
+++ linux-v2.4.9.wmbdd/include/asm-s390/system.h Wed Sep 19 14:09:47 2001
@@ -118,9 +118,13 @@
#define mb() eieio()
#define rmb() eieio()
#define wmb() eieio()
+#define mbdd() mb()
+#define wmbdd() wmb()
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mb()
+#define smp_wmbdd() wmb()
#define smp_mb__before_clear_bit() smp_mb()
#define smp_mb__after_clear_bit() smp_mb()

diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-s390x/system.h linux-v2.4.9.wmbdd/include/asm-s390x/system.h
--- linux-v2.4.9/include/asm-s390x/system.h Wed Jul 25 14:12:03 2001
+++ linux-v2.4.9.wmbdd/include/asm-s390x/system.h Wed Sep 19 14:10:11 2001
@@ -131,9 +131,13 @@
#define mb() eieio()
#define rmb() eieio()
#define wmb() eieio()
+#define mbdd() mb()
+#define wmbdd() wmb()
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mb()
+#define smp_wmbdd() wmb()
#define smp_mb__before_clear_bit() smp_mb()
#define smp_mb__after_clear_bit() smp_mb()

diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-sh/system.h linux-v2.4.9.wmbdd/include/asm-sh/system.h
--- linux-v2.4.9/include/asm-sh/system.h Sun Jan 28 18:56:00 2001
+++ linux-v2.4.9.wmbdd/include/asm-sh/system.h Wed Sep 19 14:11:44 2001
@@ -89,15 +89,21 @@
#define mb() __asm__ __volatile__ ("": : :"memory")
#define rmb() mb()
#define wmb() __asm__ __volatile__ ("": : :"memory")
+#define mbdd() mb()
+#define wmbdd() wmb()

#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mb()
+#define smp_wmbdd() wmb()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) do { xchg(&var, value); } while (0)
diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-sparc/system.h linux-v2.4.9.wmbdd/include/asm-sparc/system.h
--- linux-v2.4.9/include/asm-sparc/system.h Tue Oct 3 09:24:41 2000
+++ linux-v2.4.9.wmbdd/include/asm-sparc/system.h Wed Sep 19 14:26:40 2001
@@ -278,11 +278,15 @@
#define mb() __asm__ __volatile__ ("" : : : "memory")
#define rmb() mb()
#define wmb() mb()
+#define mbdd() mb()
+#define wmbdd() wmb()
#define set_mb(__var, __value) do { __var = __value; mb(); } while(0)
#define set_wmb(__var, __value) set_mb(__var, __value)
#define smp_mb() __asm__ __volatile__("":::"memory");
#define smp_rmb() __asm__ __volatile__("":::"memory");
#define smp_wmb() __asm__ __volatile__("":::"memory");
+#define smp_mbdd() __asm__ __volatile__("":::"memory");
+#define smp_wmbdd() __asm__ __volatile__("":::"memory");

#define nop() __asm__ __volatile__ ("nop");

diff -urN -X /home/mckenney/dontdiff linux-v2.4.9/include/asm-sparc64/system.h linux-v2.4.9.wmbdd/include/asm-sparc64/system.h
--- linux-v2.4.9/include/asm-sparc64/system.h Thu Apr 26 22:17:26 2001
+++ linux-v2.4.9.wmbdd/include/asm-sparc64/system.h Wed Sep 19 14:34:10 2001
@@ -102,6 +102,8 @@
membar("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad");
#define rmb() membar("#LoadLoad")
#define wmb() membar("#StoreStore")
+#define mbdd() membar("#Sync")
+#define wmbdd() membar("#Sync")
#define set_mb(__var, __value) \
do { __var = __value; membar("#StoreLoad | #StoreStore"); } while(0)
#define set_wmb(__var, __value) \
@@ -111,10 +113,14 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() __asm__ __volatile__("":::"memory");
#define smp_rmb() __asm__ __volatile__("":::"memory");
#define smp_wmb() __asm__ __volatile__("":::"memory");
+#define smp_mbdd() __asm__ __volatile__("":::"memory");
+#define smp_wmbdd() __asm__ __volatile__("":::"memory");
#endif

#define flushi(addr) __asm__ __volatile__ ("flush %0" : : "r" (addr) : "memory")


2001-10-09 02:18:45

by David Miller

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

From: "Paul E. McKenney" <[email protected]>
Date: Mon, 8 Oct 2001 18:55:24 -0700 (PDT)

> I am particularly interested in comments from people who understand
> the detailed operation of the SPARC membar instruction and the PARISC
> SYNC instruction. My belief is that the membar("#SYNC") and SYNC
> instructions are sufficient,

SYNC is sufficient but way too strict. You don't explicitly say what
you need to happen. If you need all previous stores to finish
before all subsequent memory operations then:

membar #StoreStore | #StoreLoad

is sufficient. If you need all previous memory operations to finish
before all subsequent stores then:

membar #StoreStore | #LoadStore

is what you want.
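Expressed with the membar() macro already used in include/asm-sparc64/system.h,
the two orderings described above would look something like this (the macro
names wrapping them are purely illustrative):

    /* all previous stores complete before all subsequent memory ops: */
    #define order_stores_before_all()    membar("#StoreStore | #StoreLoad")

    /* all previous memory ops complete before all subsequent stores: */
    #define order_all_before_stores()    membar("#StoreStore | #LoadStore")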

> Thoughts?

I think if you need to perform IPIs and junk like that to make the
memory barrier happen correctly, just throw your code away and use a
spinlock instead.

Franks a lot,
David S. Miller
[email protected]

2001-10-09 05:45:16

by Paul McKenney

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion


> I am particularly interested in comments from people who understand
> the detailed operation of the SPARC membar instruction and the PARISC
> SYNC instruction. My belief is that the membar("#SYNC") and SYNC
> instructions are sufficient,
>
> SYNC is sufficient but way too strict. You don't explicitly say what
> you need to happen. If you need all previous stores to finish
> before all subsequent memory operations then:
>
> membar #StoreStore | #StoreLoad
>
> is sufficient. If you need all previous memory operations to finish
> before all subsequent stores then:
>
> membar #StoreStore | #LoadStore
>
> is what you want.

I need to segregate the stores executed by the CPU doing the membar.
All other CPUs must observe the preceding stores before the following
stores. Of course, this means that the loads on the observing CPUs
must be ordered somehow. I need data dependencies between the loads
to be sufficient to order the loads.

For example, if a CPU executes the following:

a = new_value;
wmbdd();
p = &a;

then I need any other CPU executing:

d = *p;

to see either the value that "p" pointed to before the "p = &a" assignment,
or "new_value", -never- the old value of "a".

Does this do the trick?

membar #StoreStore

> Thoughts?
>
> I think if you need to perform IPIs and junk like that to make the
> memory barrier happen correctly, just throw your code away and use a
> spinlock instead.

The IPIs and related junk are I believe needed only on Alpha, which has
no single memory-barrier instruction that can do wmbdd()'s job. Given
that Alpha seems to be on its way out, this did not seem to me to be
too horrible.

Thanx, Paul

2001-10-09 05:56:00

by David Miller

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

From: "Paul McKenney" <[email protected]>
Date: Mon, 8 Oct 2001 22:27:44 -0700

> All other CPUs must observe the preceding stores before the following
> stores.
> ...
> Does this do the trick?
>
> membar #StoreStore

Yes.

> The IPIs and related junk are I believe needed only on Alpha, which has
> no single memory-barrier instruction that can do wmbdd()'s job. Given
> that Alpha seems to be on its way out, this did not seem to me to be
> too horrible.

I somehow doubt that you need an IPI to implement the equivalent of
"membar #StoreStore" on Alpha. Richard?

Franks a lot,
David S. Miller
[email protected]

2001-10-09 06:43:45

by Richard Henderson

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Mon, Oct 08, 2001 at 10:56:10PM -0700, David S. Miller wrote:
> I somehow doubt that you need an IPI to implement the equivalent of
> "membar #StoreStore" on Alpha. Richard?

Lol. Of course not. Is someone under the impression that AXP
designers were smoking crack?

"wmb" == "membar #StoreStore".
"mb" == "membar #Sync".

See the nice mb/rmb/wmb macros in <asm/system.h>.


r~

2001-10-09 06:52:06

by Richard Henderson

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Mon, Oct 08, 2001 at 06:55:24PM -0700, Paul E. McKenney wrote:
> This is a proposal to provide a wmb()-like primitive that enables
> lock-free traversal of lists while elements are concurrently being
> inserted into these lists.

I've discussed this with you before and you continue to have
completely missed the point.

Alpha requires that you issue read-after-read memory barriers on
the reader side if you require ordering between reads. That is
the extent of the weakness of the memory ordering.

Sparc64 is the same way.

This crap will never be applied. Your algorithms are simply broken
if you do not ensure proper read ordering via the rmb() macro.



r~

2001-10-09 07:13:33

by BALBIR SINGH

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion


Paul E. McKenney wrote:

>Request for comments...
>
>This is a proposal to provide a wmb()-like primitive that enables
>lock-free traversal of lists while elements are concurrently being
>inserted into these lists.
>
>This is already possible on many popular architectures, but not on
>
>some CPUs with very weak memory consistency models. See:
>
> http://lse.sourceforge.net/locking/wmbdd.html
>
>for more details.
>
>I am particularly interested in comments from people who understand
>the detailed operation of the SPARC membar instruction and the PARISC
>SYNC instruction. My belief is that the membar("#SYNC") and SYNC
>instructions are sufficient, but the PA-RISC 2.0 Architecture book by
>Kane is not completely clear, and I have not yet received my copy of
>the SPARC architecture manual.
>
>Thoughts?
>

1) On Alpha this code does not improve performance, since we end up using spinlocks
for my_global_data anyway; I think you already know this. I am a little confused
by the code snippet below:


+ spin_lock_irqsave(&mb_global_data.mutex, flags); /* implied mb */
+ if ((mb_global_data.need_mb & this_cpu_mask) == 0) {
+ spin_unlock_irqrestore(&mb_global_data.mutex, flags);
+ return;
+ }
+ mb_global_data.need_mb &= this_cpu_mask;
+ if (mb_global_data.need_mb == 0) {
+ if (++mb_global_data.curgen - mb_global_data.maxgen <= 0) {
+ mb_global_data.need_mb = to_whom;
+ send_ipi_message(to_whom, IPI_MB);
+ }
+ }
+ spin_unlock_irqrestore(&mb_global_data.mutex, flags); /* implied mb */

Are we not checking for the same thing in

	(mb_global_data.need_mb & this_cpu_mask) == 0

and in

	mb_global_data.need_mb &= this_cpu_mask;
	if (mb_global_data.need_mb == 0) {

In case 1 we return, but in case 2 we increment curgen, and if curgen is less than
or equal to maxgen then we send IPIs again.

2) What are the rules for incrementing or changing the generation numbers curgen, mygen and
maxgen in your code?

The approach is good, but what are the practical uses of the approach? As you mentioned, a newly
added element may not show up in the search, so searches using this method may have to search again,
and there is no guarantee that an element that we are looking for will be found (especially
if it is just being added to the list).

The idea is tremendous for approaches where we do not care about elements being newly added.
It should definitely be in the Linux kernel :-)


Balbir

>
>
> Thanx, Paul
>



2001-10-09 07:41:17

by Dipankar Sarma

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Tue, Oct 09, 2001 at 12:43:55PM +0530, BALBIR SINGH wrote:
> 1) On Alpha this code does not improve performance since we end up using spinlocks
> for my_global_data anyway, I think you already know this.

It may if you don't update very often. It depends on your
read-to-write ratio.

>
> The approach is good, but what are the pratical uses of the approach. Like u mentioned a newly
> added element may not show up in the search, searches using this method may have to search again
> and there is no way of guaranty that an element that we are looking for will be found (especially
> if it is just being added to the list).
>
> The idea is tremendous for approaches where we do not care about elements being newly added.
> It should definitely be in the Linux kernel :-)

Either you see the element or you don't. If you want to avoid duplication,
you could do a locked search before inserting it.
Like I said before, lock-less lookups are useful for read-mostly
data. Yes, updates are costly, but if they happen rarely, you still benefit.
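A sketch of the locked-search-before-insert pattern described here, reusing
the illustrative struct elem, head, list_lock, and wmbdd() names from the
sketch near the top of the thread (none of these are in the patch itself):

    /* Returns 1 if inserted, 0 if an element with this key already
     * existed.  Lookups may still run lock-free concurrently. */
    int insert_unique(struct elem *new)
    {
            struct elem *p;

            spin_lock(&list_lock);
            for (p = head; p != NULL; p = p->next) {
                    if (p->key == new->key) {
                            spin_unlock(&list_lock);
                            return 0;
                    }
            }
            new->next = head;
            wmbdd();        /* publish only fully initialized elements */
            head = new;
            spin_unlock(&list_lock);
            return 1;
    }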

Thanks
Dipankar
--
Dipankar Sarma <[email protected]> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

2001-10-09 08:21:16

by BALBIR SINGH

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

Dipankar Sarma wrote:

>On Tue, Oct 09, 2001 at 12:43:55PM +0530, BALBIR SINGH wrote:
>
>>1) On Alpha this code does not improve performance since we end up using spinlocks
>>for my_global_data anyway, I think you already know this.
>>
>
>It may if you don't update very often. It depends on your
>read-to-write ratio.
>
>>The approach is good, but what are the pratical uses of the approach. Like u mentioned a newly
>>added element may not show up in the search, searches using this method may have to search again
>>and there is no way of guaranty that an element that we are looking for will be found (especially
>>if it is just being added to the list).
>>
>>The idea is tremendous for approaches where we do not care about elements being newly added.
>>It should definitely be in the Linux kernel :-)
>>
>
>Either you see the element or you don't. If you want to avoid duplication,
>you could do a locked search before inserting it.
>Like I said before, lock-less lookups are useful for read-mostly
>data. Yes, updates are costly, but if they happen rarely, you still benefit.
>
How does this compare to the Read-Copy-Update mechanism? Is this just another way of implementing
it, given different usage rules?

Balbir


>
>Thanks
>Dipankar
>





2001-10-09 08:45:36

by Dipankar Sarma

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Tue, Oct 09, 2001 at 01:51:45PM +0530, BALBIR SINGH wrote:
> Dipankar Sarma wrote:
>
> >Either you see the element or you don't. If you want to avoid duplication,
> >you could do a locked search before inserting it.
> >Like I said before, lock-less lookups are useful for read-mostly
> >data. Yes, updates are costly, but if they happen rarely, you still benefit.
> >
> How does this compare to the Read-Copy-Update mechanism? Is this just another way of implementing
> it, given different usage rules.

Fundamentally, yes, RCU is a method for lock-less lookup. It is just
that RCU is elaborate enough to allow deletion and freeing of
data as well.

Thanks
Dipankar
--
Dipankar Sarma <[email protected]> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

2001-10-09 09:09:13

by Rusty Russell

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Mon, 8 Oct 2001 23:52:08 -0700
Richard Henderson <[email protected]> wrote:

> On Mon, Oct 08, 2001 at 06:55:24PM -0700, Paul E. McKenney wrote:
> > This is a proposal to provide a wmb()-like primitive that enables
> > lock-free traversal of lists while elements are concurrently being
> > inserted into these lists.
>
> I've discussed this with you before and you continue to have
> completely missed the point.
>
> Alpha requires that you issue read-after-read memory barriers on
> the reader side if you require ordering between reads. That is
> the extent of the weakness of the memory ordering.

Actually, I think you are missing the point. On most architectures, a
writer can do:

int *global_ptr = NULL;

x = 1;
wmb();
global_ptr = &x;

And a reader can rely on the dereference as an implicit read barrier:

	if (global_ptr) {
		if (*global_ptr != 1)
			BUG();
	}

This is *not guaranteed* on the Alpha. To quote an earlier offline mail
from Paul McKenney (cc'd to you):

> The case that they said can cause trouble is if the CPU 1's caches are
> partitioned. In such cases, the partitions of the cache run pretty much
> independently. If "p" is in a cacheline handled by one partition, and
> "*p" is in a cacheline handled by the other partition, it is possible
> for the invalidation of "p" (due to the "p = &a") to get processed before
> the invalidation of "*p" (which is just "a", due to the "a = 1").

Now, expecting every piece of code to insert an rmb() before dereferencing
a pointer in these cases, just so Alphas don't fail occasionally is NOT a
good solution. Inventing a "rmb_me_harder()" macro for Alpha only is pretty
horrible too. I don't *like* making Alpha's wmb() stronger, but it is the
only solution which doesn't touch common code.

Of course, we can continue to ignore it, which is my preferred solution.

Hope that helps,
Rusty.
PS. This was used by Alan Cox for lock-free insertion into the firewall linked
list in 2.0 - 2.2, and will become more common.

2001-10-09 16:11:01

by Richard Henderson

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Tue, Oct 09, 2001 at 07:03:37PM +1000, Rusty Russell wrote:
> I don't *like* making Alpha's wmb() stronger, but it is the
> only solution which doesn't touch common code.

It's not a "solution" at all. It's so heavy weight you'd be
much better off with locks. Just use the damned rmb_me_harder.


r~

2001-10-09 16:50:45

by Paul McKenney

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion


> On Mon, Oct 08, 2001 at 10:56:10PM -0700, David S. Miller wrote:
> > I somehow doubt that you need an IPI to implement the equivalent of
> > "membar #StoreStore" on Alpha. Richard?
>
> Lol. Of course not. Is someone under the impression that AXP
> designers were smoking crack?

The ones I have talked to showed no signs of having done so. However,
their architecture -does- make it quite challenging for anyone trying to
write lock-free common code, hence all the IPIs.

> "wmb" == "membar #StoreStore".
> "mb" == "membar #Sync".
>
> See the nice mb/rmb/wmb macros in <asm/system.h>.

OK, if "membar #StoreStore" really is equivalent to "wmb", then
"membar #StoreStore" definitely will -not- do the job required here.
Will "membar #SYNC" allow read-side "membar #ReadRead"s to be omitted,
or does "membar #SYNC" fail to detect when outstanding cache
invalidations complete?

Thanx, Paul

2001-10-09 16:50:54

by Paul McKenney

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion


> On Mon, Oct 08, 2001 at 06:55:24PM -0700, Paul E. McKenney wrote:
> > This is a proposal to provide a wmb()-like primitive that enables
> > lock-free traversal of lists while elements are concurrently being
> > inserted into these lists.
>
> I've discussed this with you before and you continue to have
> completely missed the point.

It would not be the first point that I have completely missed, but
please read on. I have discussed this algorithm with Alpha architects,
who tell me that it is sound.

> Alpha requires that you issue read-after-read memory barriers on
> the reader side if you require ordering between reads. That is
> the extent of the weakness of the memory ordering.

I agree that Alpha requires "mb" instructions to be executed on the
reading CPUs if the reading CPUs are to observe some other CPU's writes
occurring in order. And I agree that the usual way that this is done
is to insert "mb" instructions between the reads on the read side.

However, if the reading CPU executes an "mb" instruction between the
time that the writing CPU executes the "wmb" and the time that the writing
CPU executes the second store, then the reading CPU is guaranteed to
see the writes in order. Here is how this happens:


Initial values: a = 0, p = &a, b = 1.


Writing CPU                             Reading CPU

1) b = 2;

2) Execute "wmb" instruction

3) Send a bunch of IPIs
                                        4) Receive IPI

                                        5) Execute "mb" instruction

                                        6) Indicate completion
7) Detect completion

8) p = &b

The combination of steps 2 and 5 guarantee that the reading CPU will
invalidate the cacheline containing the old value of "b" before it
can possibly reference the new value of "p". The CPU must read "p"
before "b", since it can't know where "p" points before reading it.

> Sparc64 is the same way.

I can believe that "membar #StoreStore" and friends operate in the same
way that the Alpha memory-ordering instructions do. However, some of
the code in Linux seems to rely on "membar #SYNC" waiting for outstanding
invalidations to complete. If this is the case, then "membar #SYNC"
could be used to segregate writes when the corresponding reads are
implicitly ordered by data dependencies, as they are during pointer
dereferences.

> This crap will never be applied. Your algorithms are simply broken
> if you do not ensure proper read ordering via the rmb() macro.

Please see the example above. I do believe that my algorithms are
reliably forcing proper read ordering using IPIs, just in a different
way. Please note that I have discussed this algorithm with Alpha
architects, who believe that it is sound.

But they (and I) may well be confused. If so, could you please
show me what I am missing?

Thanx, Paul

2001-10-09 16:50:45

by Paul McKenney

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion


> From: "Paul McKenney" <[email protected]>
> Date: Mon, 8 Oct 2001 22:27:44 -0700
>
> All other CPUs must observe the preceding stores before the following
> stores.
> ...
> Does this do the trick?
>
> membar #StoreStore
>
> Yes.

Cool! Thank you!!!

> The IPIs and related junk are I believe needed only on Alpha, which has
> no single memory-barrier instruction that can do wmbdd()'s job. Given
> that Alpha seems to be on its way out, this did not seem to me to be
> too horrible.
>
> I somehow doubt that you need an IPI to implement the equivalent of
> "membar #StoreStore" on Alpha. Richard?

If "membar #StoreStore" is sufficient, then there is no equivalent of
it on Alpha. Neither the "mb" nor the "wmb" instructions wait for
outstanding invalidations to complete, and therefore do -not- guarantee
that reading CPUs will see writes occurring in the order that the writes
occurred on the writing CPU, even if data dependencies force the order
of the reads (as the pointer-dereference example I gave does).

On Alpha, there -must- be an "mb" on the reading CPU if the reading CPU
is to observe the stores in order. The IPIs are just a way of causing
those "mb"s to happen without having code like this:

d = p->a->b;

from having to be written as follows:

q = p->a;
rmb();
d = q->b;

More thoughts?

Thanx, Paul

2001-10-09 16:50:45

by Manfred Spraul

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

>
> On Tue, Oct 09, 2001 at 07:03:37PM +1000, Rusty Russell wrote:
> > I don't *like* making Alpha's wmb() stronger, but it is the
> > only solution which doesn't touch common code.
>
> It's not a "solution" at all. It's so heavy weight you'd be
> much better off with locks. Just use the damned rmb_me_harder.

rmb_me_harder? smp_mb__{before,after}_{atomic_dec,clear_bit} are already ugly enough.

What about hiding all these details in list access macros (list_insert, list_get_next, etc.), with a default implementation based
on a spinlock? The capable SMP architectures could then define an optimized version.

Then Alpha could do whatever flushing is required. But please do not scatter memory barrier instructions all around the kernel.
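A sketch of the kind of wrapper being suggested (list_insert and
list_get_next are the names mentioned above; everything else, including the
lookup helper shown here, is illustrative). The default version simply takes
the list's lock; an architecture capable of lock-free traversal could
provide its own definitions:

    #include <linux/list.h>
    #include <linux/spinlock.h>

    struct lf_list {
            struct list_head head;
            spinlock_t lock;
    };

    /* Default implementation: always lock. */
    static inline void list_insert(struct lf_list *l, struct list_head *node)
    {
            spin_lock(&l->lock);
            list_add(node, &l->head);
            spin_unlock(&l->lock);
    }

    static inline struct list_head *
    list_find(struct lf_list *l,
              int (*match)(struct list_head *, void *), void *key)
    {
            struct list_head *pos;

            spin_lock(&l->lock);
            for (pos = l->head.next; pos != &l->head; pos = pos->next) {
                    if (match(pos, key)) {
                            spin_unlock(&l->lock);
                            return pos;
                    }
            }
            spin_unlock(&l->lock);
            return NULL;
    }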

--
Manfred

2001-10-09 17:00:34

by Richard Henderson

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> Please see the example above. I do believe that my algorithms are
> reliably forcing proper read ordering using IPIs, just in an different
> way.

I wasn't suggesting that the IPI wouldn't work -- it will.
But it will be _extremely_ slow.

I am suggesting that the lock-free algorithms should add the
read barriers, and that failure to do so indicates that they
are incomplete. If nothing else, it documents where the real
dependencies are.


r~

2001-10-09 18:15:14

by Paul McKenney

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion


> On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> > Please see the example above. I do believe that my algorithms are
> > reliably forcing proper read ordering using IPIs, just in an different
> > way.
>
> I wasn't suggesting that the IPI wouldn't work -- it will.
> But it will be _extremely_ slow.

Ah! Please accept my apologies for belaboring the obvious
in my previous emails.

> I am suggesting that the lock-free algorithms should add the
> read barriers, and that failure to do so indicates that they
> are incomplete. If nothing else, it documents where the real
> dependancies are.

Such read barriers are not needed on any architecture where
data dependencies imply an rmb(). Examples include i386, PPC,
and IA64. On these architectures, read-side rmb()s add both
overhead and complexity.

On the completeness question, it seems to me that in cases where updates
are rare, the IPIs fill in the gap, and with good performance
benefits. What am I missing here?

Thanx, Paul

2001-10-09 18:15:14

by Paul McKenney

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion


> On Tue, Oct 09, 2001 at 07:03:37PM +1000, Rusty Russell wrote:
> > I don't *like* making Alpha's wmb() stronger, but it is the
> > only solution which doesn't touch common code.
>
> It's not a "solution" at all. It's so heavy weight you'd be
> much better off with locks. Just use the damned rmb_me_harder.

There are a number of cases where updates are extremely rare.
FD management and module unloading are but two examples. In
such cases, the overhead of the IPIs in the extremely rare updates
is overwhelmed by the reduction in overhead in the very common
accesses.

And getting rid of rmb() or rmb_me_harder() makes the read-side
code less complex.
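To illustrate the read-side difference using the FD example mentioned above
(a sketch that ignores the bounds checks and locking the real fd code needs;
lookup_fd() is a hypothetical helper, not a kernel function):

    /* If every dependent load needs an explicit read-side barrier: */
    struct file *lookup_fd(int fd)
    {
            struct files_struct *files;
            struct file *file;

            files = current->files;
            rmb();                  /* order the dependent loads */
            file = files->fd[fd];
            rmb();
            return file;
    }

    /* If the (rare) update side uses wmbdd()/IPIs instead, the data
     * dependencies alone order the loads: */
    struct file *lookup_fd(int fd)
    {
            return current->files->fd[fd];
    }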

Thanx, Paul

2001-10-10 01:21:47

by Paul McKenney

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion


> The IPIs and related junk are I believe needed only on Alpha, which has
> no single memory-barrier instruction that can do wmbdd()'s job. Given
> that Alpha seems to be on its way out, this did not seem to me to be
> too horrible.
>
> I somehow doubt that you need an IPI to implement the equivalent of
> "membar #StoreStore" on Alpha. Richard?

I received my copy of the SPARC Architecture Manual (Weaver and Germond)
today.

It turns out that there is -no- equivalent of "membar #StoreStore"
on Alpha, if I am correctly interpreting this manual.

From section D.4.4, on page 260:

A memory order is legal in RMO if and only if:

(1) X <d Y & L(X) -> X <m Y

[... two other irrelevant cases omitted ...]

Rule (1) states that the RMO model will maintain dependence
when the preceding transaction is a load. Preceding stores
may be delayed in the implementation, so their order may
not be preserved globally.

In the example dereferencing a pointer, we first load the
pointer, then load the value it points to. The second load is
dependent on the first, and the first is a load. Thus, rule (1)
holds, and there is no need for a read-side memory barrier
between the two loads.

This is consistent with the book's definition of
"completion" and the description of the membar
instruction.

In contrast, on Alpha, unless there is an explicit rmb(), data
dependence between a pair of loads in no way forces the two loads
to be ordered. http://lse.sourceforge.net/locking/wmbdd.html
shows how Alpha can get the new value of the pointer, but the
old value of the data it points to. Alpha thus needs the rmb()
between the two loads, even though there is a data dependency.

Am I misinterpreting the SPARC manual?

Thanx, Paul

2001-10-10 01:43:52

by Andrea Arcangeli

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Tue, Oct 09, 2001 at 06:19:49PM -0700, Paul McKenney wrote:
>
> > The IPIs and related junk are I believe needed only on Alpha, which
> has
> > no single memory-barrier instruction that can do wmbdd()'s job. Given
> > that Alpha seems to be on its way out, this did not seem to me to be
> > too horrible.
> >
> > I somehow doubt that you need an IPI to implement the equivalent of
> > "membar #StoreStore" on Alpha. Richard?
>
> I received my copy of the SPARC Architecture Manual (Weaver and Germond)
> today.
>
> It turns out that there is -no- equivalent of "membar #StoreStore"
> on Alpha, if I am correctly interpreting this manual.

The equivalent of "membar #StoreStore" on alpha is the "wmb" asm
instruction, called wmb() in linux common code.

> From section D.4.4, on page 260:
>
> A memory order is legal in RMO if and only if:
>
> (1) X <d Y & L(X) -> X <m Y
>
> [... two other irrelevant cases omitted ...]
>
> Rule (1) states that the RMO model will maintain dependence
> when the preceding transaction is a load. Preceding stores
> may be delayed in the implementation, so their order may
> not be preserved globally.
>
> In the example dereferencing a pointer, we first load the
> pointer, then load the value it points to. The second load is
> dependent on the first, and the first is a load. Thus, rule (1)
> holds, and there is no need for a read-side memory barrier
> between the two loads.
>
> This is consistent with the book's definition of
> "completion" and the description of the membar
> instruction.
>
> In contrast, on Alpha, unless there is an explicit rmb(), data
> dependence between a pair of loads in no way forces the two loads
> to be ordered. http://lse.sourceforge.net/locking/wmbdd.html
> shows how Alpha can get the new value of the pointer, but the
> old value of the data it points to. Alpha thus needs the rmb()
> between the two loads, even though there is a data dependency.

You remember I was surprised when you told me alpha needs the rmb despite
the data dependency :). I thought it wasn't needed (and in turn I
thought we didn't need the wmbdd). In fact I cannot see this requirement
in any alpha specification. Are you sure the issue isn't
specific to old cpus or old cache coherency protocols that we can safely
ignore today? I think in SMP systems we care only about ev6, ev67 and
future chips. Also, if this can really be reproduced, it shouldn't be too
difficult to demonstrate it with a malicious application that stresses the
race in a loop; maybe somebody (Ivan?) would be interested in writing such an
application to test it.
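Such a test might look something like the sketch below: a user-space
program, with wmb() defined as the real Alpha instruction when built on
alpha and as a plain compiler barrier elsewhere. Each pool element is
published exactly once, so a reader that sees the pointer but stale data has
genuinely observed the reordering (a real stress test would loop over many
passes and resynchronize between them).

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #ifdef __alpha__
    #define wmb() __asm__ __volatile__("wmb" : : : "memory")
    #else
    #define wmb() __asm__ __volatile__("" : : : "memory")
    #endif

    #define N (1 << 22)

    struct rec { int data; };

    static struct rec *pool;            /* calloc'd, so data starts at 0 */
    static struct rec * volatile ptr;   /* published pointer, starts NULL */
    static volatile int done;

    static void *writer(void *arg)
    {
            long i;

            for (i = 0; i < N; i++) {
                    pool[i].data = 1;   /* initialize the element ... */
                    wmb();              /* ... then publish it */
                    ptr = &pool[i];
            }
            done = 1;
            return NULL;
    }

    static void *reader(void *arg)
    {
            while (!done) {
                    struct rec *r = ptr;  /* load pointer, then dependent load */
                    if (r != NULL && r->data != 1)
                            printf("dependent loads observed out of order\n");
            }
            return NULL;
    }

    int main(void)
    {
            pthread_t w, r;

            pool = calloc(N, sizeof(*pool));
            if (pool == NULL)
                    return 1;
            pthread_create(&w, NULL, writer, NULL);
            pthread_create(&r, NULL, reader, NULL);
            pthread_join(w, NULL);
            pthread_join(r, NULL);
            return 0;
    }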

The IPI just for the rmb between two reads that depend on each other is
just too ugly... But yes, adding rmb() on the reader side looks even
uglier, and nobody should really need it.

> Am I misinterpreting the SPARC manual?
>
> Thanx, Paul
>


Andrea

2001-10-10 01:44:52

by Rusty Russell

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

In message <[email protected]> you write:
> On Tue, Oct 09, 2001 at 07:03:37PM +1000, Rusty Russell wrote:
> > I don't *like* making Alpha's wmb() stronger, but it is the
> > only solution which doesn't touch common code.
>
> It's not a "solution" at all. It's so heavy weight you'd be
> much better off with locks. Just use the damned rmb_me_harder.

Wow! I'm glad you're volunteering to audit all the kernel code to fix
this Alpha-specific bug by inserting rmb_me_harder() in all the
critical locations!

Don't miss any!

I look forward to seeing your patch,
Rusty.
--
Premature optmztion is rt of all evl. --DK

2001-10-10 02:05:17

by Andrea Arcangeli

Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> Please see the example above. I do believe that my algorithms are
> reliably forcing proper read ordering using IPIs, just in an different
> way. Please note that I have discussed this algorithm with Alpha
> architects, who believe that it is sound.

The IPI way is certainly safe.

The point here is that it is surprising that alpha needs this IPI,
unlike all other architectures. So while the IPI is certainly safe, we
wouldn't expect it to be necessary on alpha either.

Now my only worry is that when you worked on this years ago with the
alpha architects there were old chips, old caches and old machines (ev5
maybe?).

So before changing any code, I would prefer to double-check with the
current alpha architects that the read dependency really isn't enough to
enforce read ordering without the need of rmb also on the bleeding-edge
ev6/ev67/etc. cores. So at worst we'd potentially need to redefine
wmb() as wmbdd() (and friends) only for EV5+SMP compiles of the kernel,
and we wouldn't be affected when compiling for any recent EV6/EV67
hardware. Jay, Peter, comments?
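Concretely, the suggestion would amount to something like the following in
include/asm-alpha/system.h (a sketch only; the EV6 configuration symbol here
is a placeholder rather than a confirmed 2.4 config name):

    #if defined(CONFIG_SMP) && !defined(CONFIG_ALPHA_EV6)  /* placeholder */
    /* Older cores: fall back to the heavyweight IPI-based barrier. */
    #define wmb()   smp_global_mb()
    #else
    #define wmb()   __asm__ __volatile__("wmb": : :"memory")
    #endif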

Andrea

2001-10-10 03:34:14

by Paul Mackerras

Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

Richard Henderson writes:

> I am suggesting that the lock-free algorithms should add the
> read barriers, and that failure to do so indicates that they
> are incomplete. If nothing else, it documents where the real
> dependancies are.

Please, let's not go adding rmb's in places where there is already an
ordering forced by a data dependency - that will hurt performance
unnecessarily on x86, ppc, sparc, ia64, etc.

It seems to me that there are two viable alternatives:

1. Define an rmbdd() which is a no-op on all architectures except for
alpha, where it is an rmb. Richard can then have the job of
finding all the places where an rmbdd is needed, which sounds like
one of the smellier labors of Hercules to me. :)

2. Use Paul McKenney's scheme.

I personally don't really mind which gets chosen. Scheme 1 will
result in intermittent hard-to-find bugs on alpha (since the vast
majority of kernel hackers will not understand where or why rmbdd's
are required), but if Richard prefers that to scheme 2, it's his call
IMHO.
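For reference, option 1 amounts to something like the following sketch (an
illustration of the proposal, not code from the patch):

    /* include/asm-alpha/system.h: dependent loads still need a real
     * barrier here. */
    #define rmbdd()         rmb()

    /* every other architecture's system.h: the data dependency is
     * already sufficient. */
    #define rmbdd()         do { } while (0)

    /* Read-side usage: */
    q = p->a;
    rmbdd();                /* no-op everywhere except alpha */
    d = q->b;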

Regards,
Paul.

2001-10-10 05:06:22

by Linus Torvalds

Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

In article <[email protected]>,
Andrea Arcangeli <[email protected]> wrote:
>On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
>> Please see the example above. I do believe that my algorithms are
>> reliably forcing proper read ordering using IPIs, just in an different
>> way. Please note that I have discussed this algorithm with Alpha
>> architects, who believe that it is sound.
>
>The IPI way is certainly safe.

Now, before people get all excited, what is this particular code
actually _good_ for?

Creating a lock-free list that allows insertion concurrently with lookup
is _easy_.

But what's the point? If you insert stuff, you eventually have to remove
it. What goes up must come down. Insert-inane-quote-here.

And THAT is the hard part. Doing lookup without locks ends up being
pretty much worthless, because you need the locks for the removal
anyway, at which point the whole thing looks pretty moot.

Did I miss something?

Linus

2001-10-10 05:17:12

by BALBIR SINGH

Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

Linus Torvalds wrote:

>In article <[email protected]>,
>Andrea Arcangeli <[email protected]> wrote:
>
>>On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
>>
>>>Please see the example above. I do believe that my algorithms are
>>>reliably forcing proper read ordering using IPIs, just in an different
>>>way. Please note that I have discussed this algorithm with Alpha
>>>architects, who believe that it is sound.
>>>
>>The IPI way is certainly safe.
>>
>
>Now, before people get all excited, what is this particular code
>actually _good_ for?
>
>Creating a lock-free list that allows insertion concurrently with lookup
>is _easy_.
>
>But what's the point? If you insert stuff, you eventually have to remove
>it. What goes up must come down. Insert-inane-quote-here.
>
>And THAT is the hard part. Doing lookup without locks ends up being
>pretty much worthless, because you need the locks for the removal
>anyway, at which point the whole thing looks pretty moot.
>
>Did I miss something?
>
> Linus
>

What about cases like the pci device list or any other such list? Sometimes
you do not care if somebody added something while you were looking through
the list, as long as you do not get illegal addresses or data.
Wouldn't this be very useful there? Most of these lists come up
at system startup and change rarely, but we look through them often.

Me too, Did I miss something?

Balbir






2001-10-10 05:24:22

by Davide Libenzi

Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, 10 Oct 2001, BALBIR SINGH wrote:

> What about cases like the pci device list or any other such list. Sometimes
> you do not care if somebody added something, while you were looking through
> the list as long as you do not get illegal addresses or data.
> Wouldn't this be very useful there? Most of these lists come up
> at system startup and change rearly, but we look through them often.
>
> Me too, Did I miss something?

What does "changes rarely" mean, that you're going to use rarely_lock()?
If you're going to remove even only 1 element in 1000000 jiffies then
you have to lock.
If your list can only grow, that's different.




- Davide


2001-10-10 05:47:18

by Linus Torvalds

Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion


On Wed, 10 Oct 2001, BALBIR SINGH wrote:
> >
> >And THAT is the hard part. Doing lookup without locks ends up being
> >pretty much worthless, because you need the locks for the removal
> >anyway, at which point the whole thing looks pretty moot.
>
> What about cases like the pci device list or any other such list. Sometimes
> you do not care if somebody added something, while you were looking through
> the list as long as you do not get illegal addresses or data.
> Wouldn't this be very useful there? Most of these lists come up
> at system startup and change rearly, but we look through them often.

It's not about "change rarely". You cannot use a non-locking lookup if
they _ever_ remove anything.

I can't think of many lists like that. The PCI lists certainly are both
add/remove: cardbus bridges and hotplug-PCI means that they are not just
purely "enumerate at bootup".

Sure, maybe there are _some_ things that don't need to ever be removed,
but I can't think of any interesting data structure off-hand. Truly static
stuff tends to be allocated in an array that is sized once - array lookups
are much faster than traversing a linked list anyway.

So the linked list approach tends to make sense for things that _aren't_
just static, but I don't know of anything that only grows and grows. In
fact, that would sound like a horrible memory leak to me if we had
something like that. Even slow growth is bad if you want up-times measured
in years.

Now, in all fairness I can imagine hacky lock-less removals too. To get
them to work, you have to (a) change the "next" pointer to point to the
next->next (and have some serialization between removals, but removals
only, to make sure you don't have next->next also going away from you) and
(b) leave the old "next" pointer (and thus the data structure) around
until you can _prove_ that nobody is looking anything up any more, and
that the now-defunct data structure can truly be removed.

However, apart from locking, there aren't all that many ways to "prove"
that non-use. You could probably do it with interesting atomic sequence
numbers etc, although by the time you generate atomic sequence numbers
your lookup is already getting heavier - to the point where locking
probably isn't so bad an idea any more.
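A sketch of the removal scheme outlined above, with (a) the unlink done
under a removal-only lock and (b) the free deferred; wait_for_readers() is
just a placeholder for the "prove that nobody is looking" step identified as
the hard part, and the element and lock names are illustrative:

    void remove(struct elem *prev, struct elem *victim)
    {
            spin_lock(&del_lock);           /* serializes removals only */
            prev->next = victim->next;      /* (a) new traversals skip victim */
            spin_unlock(&del_lock);

            /* victim->next is left intact, so a reader currently standing
             * on victim can still walk off of it safely. */

            wait_for_readers();             /* (b) placeholder for the hard part */
            kfree(victim);
    }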

Linus

2001-10-10 06:00:48

by BALBIR SINGH

Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

Linus Torvalds wrote:

>On Wed, 10 Oct 2001, BALBIR SINGH wrote:
>
>>>And THAT is the hard part. Doing lookup without locks ends up being
>>>pretty much worthless, because you need the locks for the removal
>>>anyway, at which point the whole thing looks pretty moot.
>>>
>>What about cases like the pci device list or any other such list. Sometimes
>>you do not care if somebody added something, while you were looking through
>>the list as long as you do not get illegal addresses or data.
>>Wouldn't this be very useful there? Most of these lists come up
>>at system startup and change rarely, but we look through them often.
>>
>
>It's not about "change rarely". You cannot use a non-locking lookup if
>they _ever_ remove anything.
>
>I can't think of many lists like that. The PCI lists certainly are both
>add/remove: cardbus bridges and hotplug-PCI means that they are not just
>purely "enumerate at bootup".
>

I agree, I just thought of one case quickly. Assume that somebody did a cat /proc/pci.
Meanwhile somebody is adding a new pci device (hotplug PCI) simultaneously (which is very rare).
I would not care if the new device showed up in the output of /proc/pci this time. It would
definitely show up next time. Meanwhile locking the list (just in case it changes) is an
overhead in the case above. I was referring to these cases in my earlier mail.


>
>Sure, maybe there are _some_ things that don't need to ever be removed,
>but I can't think of any interesting data structure off-hand. Truly static
>stuff tends to be allocated in an array that is sized once - array lookups
>are much faster than traversing a linked list anyway.
>
>So the linked list approach tends to make sense for things that _aren't_
>just static, but I don't know of anything that only grows and grows. In
>fact, that would sound like a horrible memory leak to me if we had
>something like that. Even slow growth is bad if you want up-times measured
>in years.
>
>Now, in all fairness I can imagine hacky lock-less removals too. To get
>them to work, you have to (a) change the "next" pointer to point to the
>next->next (and have some serialization between removals, but removals
>only, to make sure you don't have next->next also going away from you) and
>(b) leave the old "next" pointer (and thus the data structure) around
>until you can _prove_ that nobody is looking anything up any more, and
>that the now-defunct data structure can truly be removed.
>
The problem with removals is that somebody could be adding an element simultaneously
to the "next" element, which might cause problems. I hope I correctly understood
the scenario here.

>
>However, apart from locking, there aren't all that many ways to "prove"
>that non-use. You could probably do it with interesting atomic sequence
>numbers etc, although by the time you generate atomic sequence numbers
>your lookup is already getting heavier - to the point where locking
>probably isn't so bad an idea any more.
>
> Linus
>
>
Balbir




2001-10-10 06:22:15

by Paul Mackerras

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

Linus Torvalds writes:

> And THAT is the hard part. Doing lookup without locks ends up being
> pretty much worthless, because you need the locks for the removal
> anyway, at which point the whole thing looks pretty moot.
>
> Did I miss something?

I believe this all becomes (much more) useful when you are doing
read-copy-update.

There is an assumption that anyone modifying the list (inserting or
deleting) would take a lock first, so the deletion is just a pointer
assignment. Any reader traversing the list (without a lock) sees
either the old pointer or the new, which is fine.

The difficulty is in making sure that no reader is still inspecting
the list element you just removed before you free it, or modify any
field that the reader would be looking at (particularly the `next'
field :). One way of doing that is to defer the free or modification
to a quiescent point. If you have a separate `next_free' field, you
could safely put the element on a list of elements to be freed at the
next quiescent point.
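A rough sketch of the `next_free' idea, assuming a single update_lock that
serializes writers and a flush routine called only at a quiescent point, i.e.
when no reader can still hold a reference (how that point is detected is the
open question in this thread); all names here are invented for illustration:

    #include <linux/spinlock.h>
    #include <linux/slab.h>

    struct elem {
            struct elem *next;       /* traversed lock-free by readers */
            struct elem *next_free;  /* used only by the deferred-free queue */
            int data;
    };

    static struct elem *pending_free;  /* waiting for a quiescent point */
    static spinlock_t update_lock = SPIN_LOCK_UNLOCKED;

    void delete_elem(struct elem *prev, struct elem *e)
    {
            spin_lock(&update_lock);
            prev->next = e->next;           /* readers see old or new pointer */
            e->next_free = pending_free;    /* queue it; leave e->next alone */
            pending_free = e;
            spin_unlock(&update_lock);
    }

    /* Called only once no reader can still be inspecting any queued element. */
    void flush_pending_free(void)
    {
            struct elem *e, *next;

            spin_lock(&update_lock);
            e = pending_free;
            pending_free = NULL;
            spin_unlock(&update_lock);

            while (e != NULL) {
                    next = e->next_free;
                    kfree(e);
                    e = next;
            }
    }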

Paul.

2001-10-10 06:31:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion


On Wed, 10 Oct 2001, Paul Mackerras wrote:
>
> There is an assumption that anyone modifying the list (inserting or
> deleting) would take a lock first, so the deletion is just a pointer
> assignment. Any reader traversing the list (without a lock) sees
> either the old pointer or the new, which is fine.

Right, that's not the problem.

> The difficulty is in making sure that no reader is still inspecting
> the list element you just removed before you free it, or modify any
> field that the reader would be looking at (particularly the `next'
> field :).

..which implies _some_ sort of locking, even if it may be deferred.

The locking can be per-CPU, whatever - have a per-CPU count for "this many
traversals in progress", and have every lookup increment it before
starting, and decrement it after stopping.

Then the deleter can actually free the thing once he has seen every CPU go
down to zero, with the proper memory barriers.
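A sketch of that per-CPU counting, with invented names; in a non-preemptive
2.4 kernel a reader that does not sleep stays on one CPU, so a plain per-CPU
counter plus memory barriers is enough:

    #include <linux/kernel.h>
    #include <linux/threads.h>
    #include <linux/cache.h>
    #include <linux/smp.h>
    #include <asm/system.h>

    static struct {
            long count ____cacheline_aligned;  /* traversals in progress */
    } traversals[NR_CPUS];

    /* Reader side: no lock, just bracket the lookup. */
    void traversal_start(void)
    {
            traversals[smp_processor_id()].count++;
            smp_mb();       /* count visible before we touch the list */
    }

    void traversal_end(void)
    {
            smp_mb();       /* all list accesses done before the count drops */
            traversals[smp_processor_id()].count--;
    }

    /* Deleter side: call after unlinking the element.  Once every CPU has
     * been observed with no traversal in progress, no reader can still hold
     * a pointer to the unlinked element, so it may be freed. */
    void wait_for_traversals(void)
    {
            int i;

            smp_mb();       /* order the unlink before scanning the counters */
            for (i = 0; i < NR_CPUS; i++)
                    while (traversals[i].count != 0)
                            barrier();
    }

Note that a reader starting after the scan of its CPU cannot find the element
any more, since it was already unlinked; that is what makes the scan safe.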

Yes, I see that it can work. But it sounds like a _lot_ of complexity for
not very much gain.

Right now, you already have to have eight CPU's to see locking as a large
problem in normal life. People have normal patches to bring that easily up
to 16. Then how much hard-to-debug-with-subtle-memory-ordering-issues code
do you want to handle the few machines that aren't in that range?

I'm just trying to inject a bit of realism here.

Linus

2001-10-10 07:14:01

by Kirill Ratkin

[permalink] [raw]
Subject: kdb requires kallsyms

Hi.

I'm having trouble with kdb. I've patched my kernel (stable
2.4.10) and tried to build the kernel with kdb (from SGI),
and I see:
/bin/sh: /sbin/kallsyms: No such file or directory
make[1]: *** [kallsyms] error 127

Where can I find this kallsyms?

Regards,






2001-10-10 07:38:17

by BALBIR SINGH

[permalink] [raw]
Subject: Re: kdb requires kallsyms

Kirill Ratkin wrote:

>Hi.
>
>I've trouble with kdb. I've patched my kernel (stable
>2.4.10) and tried to make kernel with kdb (from SGI)
>and I see :
>/bin/sh: /sbin/kallsyms: No such file or directory
>make[1]: *** [kallsyms] error 127
>
>Where can I find this kallsyms?
>
modutils provides this executable. Please use
rpm -q --whatprovides `type <name of file>' to see which package provides
the executable.

Regards,
Balbir

>
>Regards,
>






2001-10-10 11:54:38

by Keith Owens

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, 10 Oct 2001 05:05:10 +0000 (UTC),
[email protected] (Linus Torvalds) wrote:
>Now, before people get all excited, what is this particular code
>actually _good_ for?
>
>Creating a lock-free list that allows insertion concurrently with lookup
>is _easy_.
>
>But what's the point? If you insert stuff, you eventually have to remove
>it. What goes up must come down. Insert-inane-quote-here.
>
>And THAT is the hard part. Doing lookup without locks ends up being
>pretty much worthless, because you need the locks for the removal
>anyway, at which point the whole thing looks pretty moot.

<pedantic>

It is possible to do completely lock free queue management, I have done
it (once). There are several major requirements :-

* Storage must never change its type (type stable storage). Once a
block has been assigned as struct foo, it must always contain struct
foo. This can be relaxed slightly as long as the data types have a
common header format.

The biggest problem this causes is that storage can never be released
and unmapped. You never know if somebody is going to follow an old
pointer to access storage that you freed. You must put the storage
on a free list for struct foo instead of unmapping it, to maintain
the type stable storage.

* Every piece of code that traverses the data tree for structure foo
must take a copy of each struct foo into local storage. After taking
a copy it must verify that its local copy is self consistent, via
validity indicators in each struct. The validity indicators are
typically generation numbers; whatever you use, the indicators must
be atomically updated and atomically read, with suitable wmb and rmb
barriers for machines with weak memory ordering (a sketch follows
after this list).

* After following a pointer in your local copy of struct foo to access
another data area, you must verify that the master version of your
local copy has not been changed. If the master copy has changed then
the pointer you just followed is no longer reliable. In almost every
case you have to start the traversal from the beginning. It makes
for very convoluted reader and updater code.

</pedantic>
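A rough sketch of the copy-and-validate discipline in the list above, with all
names invented here: the updater bumps a generation number to an odd value
around each modification, and the reader copies the struct into local storage
and retries until it sees a stable, even generation:

    #include <linux/string.h>
    #include <asm/system.h>

    struct foo {
            long gen;        /* odd while an update is in progress */
            struct foo *next;
            int payload;
    };

    /* Updater side (updaters serialize among themselves externally). */
    void foo_update(struct foo *f, int new_payload)
    {
            f->gen++;        /* now odd: copies taken from here on are suspect */
            smp_wmb();
            f->payload = new_payload;
            smp_wmb();
            f->gen++;        /* even again: the struct is consistent */
    }

    /* Reader side: copy into local storage, then validate the copy. */
    void foo_read(struct foo *f, struct foo *copy)
    {
            long g;

            do {
                    g = f->gen;
                    smp_rmb();
                    if (g & 1)
                            continue;        /* update in progress: retry */
                    memcpy(copy, f, sizeof(*copy));
                    smp_rmb();
            } while (f->gen != g);           /* changed under us: retry */
    }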

The only reason I used lock-free read and update was to hook into an
existing system that mandated sub-second responses. Some of the new
operations that were performed during list traversal and update were
subject to unbounded delays that were completely outside my control. I
could not afford to lock the data structures during traversal because
any delay in the new operations would completely stall the existing
system; also, the existing system had to be able to retrieve and delete
data for the new operations at any time.

I don't recommend going lock-free unless you have absolutely no
alternative. It requires convoluted code, it is difficult to debug, it
is expensive on memory bandwidth (forget zero copy), and it tends to be
expensive on storage as well.

If you are interested in lock-free work, take a look at
http://www.cs.pitt.edu/~moir/papers.html, particularly Practical
Implementations of Non-Blocking Synchronization Primitives. But
beware, that paper has an error which caused intermittent bugs that
took me 5 months to track down. Section 3.3, proc Copy, line 6 should
be

6: y := addr->data[i];

2001-10-10 13:25:28

by Ivan Kokshaysky

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, Oct 10, 2001 at 04:05:02AM +0200, Andrea Arcangeli wrote:
> So before changing any code, I would prefer to double check with the
> current alpha architects that the read dependency really isn't enough to
> enforce read ordering without the need of rmb also on the bleeding edge
> ev6/ev67/etc.. cores. So potentially as worse we'd need to redefine
> wmb() as wmbdd() (and friends) only for EV5+SMP compiles of the kernel,
> but we wouldn't be affected with any recent hardware compiling for
> EV6/EV67. Jay, Peter, comments?

21264 Compiler Writer's Guide [appendix C] explicitly says that the
second load cannot issue if its address depends on the result of a previous
load until that result is available. I refuse to believe that it isn't
true for older alphas, especially because they are strictly in-order
machines, unlike ev6.

I suspect some confusion here - probably that architect meant loads
to independent addresses. Of course, in this case mb() is required
to assure ordering.
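For illustration, the two cases side by side (a sketch only; global_p,
global_a and global_b are invented here):

    #include <asm/system.h>

    struct thing {
            int data;
    };

    extern struct thing *global_p;
    extern int global_a, global_b;

    void two_cases(int *d, int *a, int *b)
    {
            struct thing *p;

            /* Dependent loads: the address of the second load comes from
             * the result of the first, so per the 21264 Compiler Writer's
             * Guide the second load cannot issue until the first's result
             * is available. */
            p = global_p;
            *d = p->data;

            /* Independent loads: nothing ties the two addresses together,
             * so an explicit barrier (mb() as above, or rmb()) is needed
             * if their order matters to the reader. */
            *a = global_a;
            mb();
            *b = global_b;
    }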

Ivan.

2001-10-10 13:41:38

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, Oct 10, 2001 at 05:24:31PM +0400, Ivan Kokshaysky wrote:
> On Wed, Oct 10, 2001 at 04:05:02AM +0200, Andrea Arcangeli wrote:
> > So before changing any code, I would prefer to double check with the
> > current alpha architects that the read dependency really isn't enough to
> > enforce read ordering without the need of rmb also on the bleeding edge
> > ev6/ev67/etc.. cores. So potentially as worse we'd need to redefine
> > wmb() as wmbdd() (and friends) only for EV5+SMP compiles of the kernel,
> > but we wouldn't be affected with any recent hardware compiling for
> > EV6/EV67. Jay, Peter, comments?
>
> 21264 Compiler Writer's Guide [appendix C] explicitly says that the
> second load cannot issue if its address depends on a result of previous
> load until that result is available. I refuse to believe that it isn't

Fine. Btw, I also recall having read something along those lines, and not
even in the 21264 manual but in the Alpha reference manual, which would
apply to all the chips, but I didn't find it with a quick lookup. Thanks
for checking!

> true for older alphas, especially because they are strictly in-order
> machines, unlike ev6.

Yes, it sounds strange. However, according to Paul this would not be the
CPU but a cache coherency issue: rmb() would enforce the cache coherency,
etc., so maybe the issue is related to old SMP motherboards, not
even to the CPUs... dunno. But as said, it sounded very strange that
new chips and new boards would also have such a weird reordering problem.

> I suspect some confusion here - probably that architect meant loads
> to independent addresses. Of course, in this case mb() is required
> to assure ordering.
>
> Ivan.


Andrea

2001-10-10 13:42:18

by Justin Piszcz

[permalink] [raw]
Subject: AIC7XXX

I'll try the new one out, but when I tried it when it was first
introduced into the kernel, it had a lot of errors accessing my SCSI CD-ROM
drives.

2001-10-10 15:28:18

by Victor Yodaiken

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, Oct 10, 2001 at 11:31:08AM +0530, BALBIR SINGH wrote:
> Linus Torvalds wrote:
> >I can't think of many lists like that. The PCI lists certainly are both
> >add/remove: cardbus bridges and hotplug-PCI means that they are not just
> >purely "enumerate at bootup".
> >
>
> I agree, I just thought of one case quickly. Assume that somebody did a cat /proc/pci.
> Meanwhile somebody is adding a new pci device (hotplug PCI) simultaneously (which is very rare).
> I would not care if the new device showed up in the output of /proc/pci this time. It would
> definitely show up next time. Meanwhile locking the list (just in case it changes) is an
> overhead in the case above. I was referring to these cases in my earlier mail.

So you make the data structure and algorithms more complex and hard to maintain in
order to get an undetectable improvement in the speed of something that almost
never happens and that is not even in the same neighborhood as being a
bottleneck?



2001-10-10 15:59:48

by Victor Yodaiken

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, Oct 10, 2001 at 05:36:18PM +1000, Paul Mackerras wrote:
> Linus Torvalds writes:
>
> > And THAT is the hard part. Doing lookup without locks ends up being
> > pretty much worthless, because you need the locks for the removal
> > anyway, at which point the whole thing looks pretty moot.
> >
> > Did I miss something?
>
> I believe this all becomes (much more) useful when you are doing
> read-copy-update.
>
> There is an assumption that anyone modifying the list (inserting or
> deleting) would take a lock first, so the deletion is just a pointer
> assignment. Any reader traversing the list (without a lock) sees
> either the old pointer or the new, which is fine.
>
> The difficulty is in making sure that no reader is still inspecting
> the list element you just removed before you free it, or modify any
> field that the reader would be looking at (particularly the `next'
> field :). One way of doing that is to defer the free or modification
> to a quiescent point. If you have a separate `next_free' field, you
> could safely put the element on a list of elements to be freed at the
> next quiescent point.


And the "next quiescent point" must be a synchronization point and that must
have spinlocks around it!
Although I kind of like the idea of
normal operation create mess by avoiding synchronization
when system seems idle, get BKL, and clean up.


>
> Paul.

2001-10-10 17:02:04

by Richard Henderson

[permalink] [raw]
Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, Oct 10, 2001 at 01:33:58PM +1000, Paul Mackerras wrote:
> 1. Define an rmbdd() which is a no-op on all architectures except for
> alpha, where it is an rmb. Richard can then have the job of
> finding all the places where an rmbdd is needed, which sounds like
> one of the smellier labors of Hercules to me. :)

I don't think it's actually all that bad. There won't be all
that many places that require the rmbdd, and they'll pretty
much exactly correspond to the places in which you have to put
wmb for all architectures anyway.
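A sketch of that pairing, using the rmbdd() name proposed in this thread
(rmb() on Alpha, a no-op elsewhere); insert_after() is assumed to run under
whatever lock already serializes list updates:

    #include <asm/system.h>

    struct elem {
            struct elem *next;
            int key;
    };

    /* Writer: fully initialize the new element, then make it reachable.
     * The wmb() orders the initialization before the publishing store. */
    void insert_after(struct elem *prev, struct elem *new)
    {
            new->key = 42;
            new->next = prev->next;
            wmb();
            prev->next = new;
    }

    /* Lock-free reader: for each wmb() on the writer side there is a
     * matching rmbdd() here, between loading the pointer and loading
     * what it points to. */
    int list_find(struct elem *head, int key)
    {
            struct elem *p;

            for (p = head->next; p != NULL; p = p->next) {
                    rmbdd();        /* proposed primitive, not in the stock tree */
                    if (p->key == key)
                            return 1;
            }
            return 0;
    }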


r~

2001-10-10 21:40:43

by Luigi Genoni

[permalink] [raw]
Subject: Re: AIC7XXX

I had similar problems at the beginning, but now this new driver is working
well, at least for me.

Luigi

On Wed, 10 Oct 2001, war wrote:

> I'll try the new one out, but when I tried it out when it was first
> introduced into the kernel, it had a lot of errors accessing my SCSI cdrom
> drives.

2001-10-10 21:51:33

by Paul McKenney

[permalink] [raw]
Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion


> On Wed, Oct 10, 2001 at 01:33:58PM +1000, Paul Mackerras wrote:
> > 1. Define an rmbdd() which is a no-op on all architectures except for
> > alpha, where it is an rmb. Richard can then have the job of
> > finding all the places where an rmbdd is needed, which sounds like
> > one of the smellier labors of Hercules to me. :)
>
> I don't think it's actually all that bad. There won't be all
> that many places that require the rmbdd, and they'll pretty
> much exactly correspond to the places in which you have to put
> wmb for all architectures anyway.

Just to make sure I understand... This rmbdd() would use IPIs to
get all the CPUs' caches synchronized, right? Or do you have some
other trick up your sleeve? ;-)

Thanx, Paul

2001-10-10 21:56:54

by Keith Owens

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, 10 Oct 2001 09:54:36 -0600,
Victor Yodaiken <[email protected]> wrote:
>Although I kind of like the idea of
> normal operation create mess by avoiding synchronization
> when system seems idle, get BKL, and clean up.

That does not work. A process can read an entry from a list then
perform an operation that puts the process to sleep. When it wakes up
again, how can it tell if the list has been changed? How can the
cleanup code tell if any process has slept while it was traversing a
list? An idle system does not prevent processes from sleeping at the
wrong point.

Don't even think of requiring "processes must not sleep while
traversing a lock free list". Programmers cannot get "processes must
not sleep while holding a lock" correct and that error is much easier
to detect.

The inability of update code to know when a lock-free list is being
traversed is the reason that most lock-free work requires type-stable
storage. You can mark a list entry as free and put it on a free chain,
but you can never unmap the storage nor use the free entry for data of
a different type. That is the only way to guarantee that a process can
safely determine if the list has changed under it during traversal
while still allowing the process to be interrupted or to sleep.

2001-10-10 22:22:47

by Richard Henderson

[permalink] [raw]
Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, Oct 10, 2001 at 02:47:05PM -0700, Paul McKenney wrote:
> Just to make sure I understand... This rmbdd() would use IPIs to
> get all the CPUs' caches synchronized, right?

No, it would expand to rmb on Alpha, and to nothing elsewhere.


r~

2001-10-10 22:27:08

by Richard Henderson

[permalink] [raw]
Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, Oct 10, 2001 at 02:47:05PM -0700, Paul McKenney wrote:
> > I don't think it's actually all that bad. There won't be all
> > that many places that require the rmbdd, and they'll pretty
> > much exactly correspond to the places in which you have to put
> > wmb for all architectures anyway.
>
> Just to make sure I understand... This rmbdd() would use IPIs to
> get all the CPUs' caches synchronized, right?

Err, I see your confusion now.

"Correspond" meaning "for every wmb needed on the writer side,
there is likely an rmb needed on the reader side in a similar
place".


r~

2001-10-10 22:29:28

by Victor Yodaiken

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Thu, Oct 11, 2001 at 07:56:43AM +1000, Keith Owens wrote:
> On Wed, 10 Oct 2001 09:54:36 -0600,
> Victor Yodaiken <[email protected]> wrote:
> >Although I kind of like the idea of
> > normal operation create mess by avoiding synchronization
> > when system seems idle, get BKL, and clean up.
>
> That does not work. A process can read an entry from a list then
> perform an operation that puts the process to sleep. When it wakes up
> again, how can it tell if the list has been changed? How can the

In general you're right, and it's always better to
reduce contention than to come up with silly algorithms for
reducing the cost of contention, but if you want to live
dangerously:

reader:
	atomic increment read_count
	do stuff; skip queue elements marked as zombie
	atomic decrement read_count

writer:
	spin lock queue
	to delete element, mark element as zombie
	unspin

cleanup:
	spin lock queue
	if (read_count == 0) {
		get big kernel lock
		if (read_count is still 0)
			// now nobody will be able to get to queue
			clean up queue
		unlock kernel
	}
	unlock spin


So you accomplished making the code harder to read and maintain, slowing
down the worst case, and maybe reducing the average-case read spinlock delay.
For some very carefully selected cases this may be a win, but ...



2001-10-10 23:47:23

by David Miller

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

From: Victor Yodaiken <[email protected]>
Date: Wed, 10 Oct 2001 16:24:19 -0600

In general you're right, and it's always better to
reduce contention than to come up with silly algorithms for
reducing the cost of contention,

I want to second this and remind people that the "cost" of spinlocks
is mostly not "spinning idly waiting for lock", rather the big cost
is shuffling the dirty cacheline ownership between the processors.

Any scheme involving shared data which is written (the read counts
in the various "lockless" schemes are examples) has the same "cost"
associated with it.

In short, I see no performance gain from the lockless algorithms
even in the places where they can be applied.

I spent some time oogling over lockless algorithms a few years ago,
but I stopped once I realized where the true costs were. In my view,
the lockless algorithms perhaps are a win in parallel processing
environments (in fact, the supercomputing field is where a lot of the
lockless algorithm research comes from) but not in the kinds of places
and with the kinds of data structure usage the Linux kernel has.

Franks a lot,
David S. Miller
[email protected]

2001-10-11 00:18:36

by Davide Libenzi

[permalink] [raw]
Subject: Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, 10 Oct 2001, David S. Miller wrote:

> From: Victor Yodaiken <[email protected]>
> Date: Wed, 10 Oct 2001 16:24:19 -0600
>
> In general you're right, and it's always better to
> reduce contention than to come up with silly algorithms for
> reducing the cost of contention,
>
> I want to second this and remind people that the "cost" of spinlocks
> is mostly not "spinning idly waiting for lock", rather the big cost
> is shuffling the dirty cacheline ownership between the processors.
>
> Any scheme involving shared data which is written (the read counts
> in the various "lockless" schemes are examples) has the same "cost"
> associated with it.
>
> In short, I see no performance gain from the lockless algorithms
> even in the places where they can be applied.
>
> I spent some time oogling over lockless algorithms a few years ago,
> but I stopped once I realized where the true costs were. In my view,
> the lockless algorithms perhaps are a win in parallel processing
> environments (in fact, the supercomputing field is where a lot of the
> lockless algorithm research comes from) but not in the kinds of places
> and with the kinds of data structure usage the Linux kernel has.

Agree.
To have complete list handling you have to end up either locking or
write-stressing counter variables to simulate locks.
Just having specially handled lists that remove write ops from irq
handlers can save you the big cost of ..._irqsave()/restore().
And these can be realized by having an extra list of insert/remove ops
that is filled without locks by irq handlers and is truncated/flushed by an
__xchg() done by readers.
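A sketch of that idea under stated assumptions: cmpxchg() is available (true
on i386), the structure and variable names are invented here, and the ops
taken off the queue are applied later under whatever already serializes
updates. Irq handlers push onto a singly linked pending list with no
spinlock; a reader flushes the whole list with one xchg():

    #include <asm/system.h>

    struct pending_op {
            struct pending_op *next;
            int opcode;          /* e.g. insert or remove */
            void *arg;
    };

    static struct pending_op *pending_head;   /* per-list pending queue */

    /* Irq-handler side: lock-free push (safe for multiple producers). */
    void queue_op(struct pending_op *op)
    {
            struct pending_op *old;

            do {
                    old = pending_head;
                    op->next = old;
            } while (cmpxchg(&pending_head, old, op) != old);
    }

    /* Reader side: atomically take the whole pending list (newest first)
     * and return it for processing under the normal update serialization. */
    struct pending_op *flush_ops(void)
    {
            return xchg(&pending_head, NULL);
    }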




- Davide


2001-10-11 01:57:44

by Paul E. McKenney

[permalink] [raw]
Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

diff -urN -X /home/mckenney/dontdiff linux-2.4.10/arch/alpha/kernel/smp.c linux-2.4.10.wmbdd/arch/alpha/kernel/smp.c
--- linux-2.4.10/arch/alpha/kernel/smp.c Thu Sep 13 15:21:32 2001
+++ linux-2.4.10.wmbdd/arch/alpha/kernel/smp.c Mon Oct 8 18:31:18 2001
@@ -63,8 +63,20 @@
IPI_RESCHEDULE,
IPI_CALL_FUNC,
IPI_CPU_STOP,
+ IPI_MB,
};

+/* Global and per-CPU state for global MB shootdown. */
+static struct {
+ spinlock_t mutex;
+ unsigned long need_mb; /* bitmask of CPUs that need to do "mb". */
+ long curgen; /* Each "generation" is a group of requests */
+ long maxgen; /* that is handled by one set of "mb"s. */
+} mb_global_data __cacheline_aligned = { SPIN_LOCK_UNLOCKED, 0, 1, 0 };
+static struct {
+ long mygen ____cacheline_aligned;
+} mb_data[NR_CPUS] __cacheline_aligned;
+
spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED;

/* Set to a secondary's cpuid when it comes online. */
@@ -772,6 +784,41 @@
goto again;
}

+/*
+ * Execute an "mb" instruction in response to an IPI_MB. Also directly
+ * called by smp_global_mb(). If this is the last CPU to respond to
+ * an smp_global_mb(), then check to see if an additional generation of
+ * requests needs to be satisfied.
+ */
+
+void
+handle_mb_ipi(void)
+{
+ int this_cpu = smp_processor_id();
+ unsigned long this_cpu_mask = 1UL << this_cpu;
+ unsigned long flags;
+ unsigned long to_whom = cpu_present_mask ^ this_cpu_mask;
+
+ /* Avoid lock contention when extra IPIs arrive (due to race) and
+ when waiting for global mb shootdown. */
+ if ((mb_global_data.need_mb & this_cpu_mask) == 0) {
+ return;
+ }
+ spin_lock_irqsave(&mb_global_data.mutex, flags); /* implied mb */
+ if ((mb_global_data.need_mb & this_cpu_mask) == 0) {
+ spin_unlock_irqrestore(&mb_global_data.mutex, flags);
+ return;
+ }
+ mb_global_data.need_mb &= ~this_cpu_mask;
+ if (mb_global_data.need_mb == 0) {
+ if (++mb_global_data.curgen - mb_global_data.maxgen <= 0) {
+ mb_global_data.need_mb = to_whom;
+ send_ipi_message(to_whom, IPI_MB);
+ }
+ }
+ spin_unlock_irqrestore(&mb_global_data.mutex, flags); /* implied mb */
+}
+
void
handle_ipi(struct pt_regs *regs)
{
@@ -825,6 +872,9 @@
else if (which == IPI_CPU_STOP) {
halt();
}
+ else if (which == IPI_MB) {
+ handle_mb_ipi();
+ }
else {
printk(KERN_CRIT "Unknown IPI on CPU %d: %lu\n",
this_cpu, which);
@@ -860,6 +910,58 @@
printk(KERN_WARNING "smp_send_stop: Not on boot cpu.\n");
#endif
send_ipi_message(to_whom, IPI_CPU_STOP);
+}
+
+/*
+ * Execute an "mb" instruction, then force all other CPUs to execute "mb"
+ * instructions. Does not block. Once this function returns, the caller
+ * is guaranteed that all of its memory writes preceding the call to
+ * smp_global_mb() will be seen by all CPUs as preceding all memory
+ * writes following the call to smp_global_mb().
+ *
+ * For example, if CPU 0 does:
+ * a.data = 1;
+ * smp_global_mb();
+ * p = &a;
+ * and CPU 1 does:
+ * d = p->data;
+ * where a.data is initially garbage and p initially points to another
+ * structure with the "data" field being zero, then CPU 1 will be
+ * guaranteed to have "d" set to either 0 or 1, never garbage.
+ *
+ * Note that the Alpha "wmb" instruction is -not- sufficient!!! If CPU 0
+ * were to replace the smp_global_mb() with a wmb(), then CPU 1 could end
+ * up with garbage in "d"!
+ *
+ * This function sends IPIs to all other CPUs, then spins waiting for
+ * them to receive the IPI and execute an "mb" instruction. While
+ * spinning, this function -must- respond to other CPUs executing
+ * smp_global_mb() concurrently, otherwise, deadlock would result.
+ */
+
+void
+smp_global_mb(void)
+{
+ int this_cpu = smp_processor_id();
+ unsigned long this_cpu_mask = 1UL << this_cpu;
+ unsigned long flags;
+ unsigned long to_whom = cpu_present_mask ^ this_cpu_mask;
+
+ spin_lock_irqsave(&mb_global_data.mutex, flags); /* implied mb */
+ if (mb_global_data.curgen - mb_global_data.maxgen <= 0) {
+ mb_global_data.maxgen = mb_global_data.curgen + 1;
+ } else {
+ mb_global_data.maxgen = mb_global_data.curgen;
+ mb_global_data.need_mb = to_whom;
+ send_ipi_message(to_whom, IPI_MB);
+ }
+ mb_data[this_cpu].mygen = mb_global_data.maxgen;
+ spin_unlock_irqrestore(&mb_global_data.mutex, flags);
+ while (mb_data[this_cpu].mygen - mb_global_data.curgen >= 0) {
+ handle_mb_ipi();
+ barrier();
+ }
+
}

/*
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-alpha/system.h linux-2.4.10.wmbdd/include/asm-alpha/system.h
--- linux-2.4.10/include/asm-alpha/system.h Sun Aug 12 10:38:47 2001
+++ linux-2.4.10.wmbdd/include/asm-alpha/system.h Mon Oct 8 18:31:18 2001
@@ -151,14 +151,21 @@
#define wmb() \
__asm__ __volatile__("wmb": : :"memory")

+#define mbdd() smp_mbdd()
+#define wmbdd() smp_wmbdd()
+
#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() smp_global_mb()
+#define smp_wmbdd() smp_mbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) \
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-arm/system.h linux-2.4.10.wmbdd/include/asm-arm/system.h
--- linux-2.4.10/include/asm-arm/system.h Mon Nov 27 17:07:59 2000
+++ linux-2.4.10.wmbdd/include/asm-arm/system.h Mon Oct 8 18:31:18 2001
@@ -39,6 +39,8 @@
#define mb() __asm__ __volatile__ ("" : : : "memory")
#define rmb() mb()
#define wmb() mb()
+#define mbdd() mb()
+#define wmbdd() wmb()
#define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t");

#define prepare_to_switch() do { } while(0)
@@ -68,12 +70,16 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() rmbdd()
+#define smp_wmbdd() wmbdd()

#else

#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()

#define cli() __cli()
#define sti() __sti()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-cris/system.h linux-2.4.10.wmbdd/include/asm-cris/system.h
--- linux-2.4.10/include/asm-cris/system.h Tue May 1 16:05:00 2001
+++ linux-2.4.10.wmbdd/include/asm-cris/system.h Mon Oct 8 18:31:18 2001
@@ -144,15 +144,21 @@
#define mb() __asm__ __volatile__ ("" : : : "memory")
#define rmb() mb()
#define wmb() mb()
+#define mbdd() mb()
+#define wmbdd() wmb()

#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define iret()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-i386/system.h linux-2.4.10.wmbdd/include/asm-i386/system.h
--- linux-2.4.10/include/asm-i386/system.h Sun Sep 23 10:31:01 2001
+++ linux-2.4.10.wmbdd/include/asm-i386/system.h Mon Oct 8 18:31:18 2001
@@ -285,15 +285,21 @@
#define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
#define rmb() mb()
#define wmb() __asm__ __volatile__ ("": : :"memory")
+#define mbdd() mb()
+#define wmbdd() wmb()

#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) do { xchg(&var, value); } while (0)
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ia64/system.h linux-2.4.10.wmbdd/include/asm-ia64/system.h
--- linux-2.4.10/include/asm-ia64/system.h Tue Jul 31 10:30:09 2001
+++ linux-2.4.10.wmbdd/include/asm-ia64/system.h Mon Oct 8 18:31:18 2001
@@ -84,11 +84,36 @@
* like regions are visible before any subsequent
* stores and that all following stores will be
* visible only after all previous stores.
- * rmb(): Like wmb(), but for reads.
+ * In common code, any reads that depend on this
+ * ordering must be separated by an mb() or rmb().
+ * rmb(): Guarantees that all preceding loads to memory-
+ * like regions are executed before any subsequent
+ * loads.
* mb(): wmb()/rmb() combo, i.e., all previous memory
* accesses are visible before all subsequent
* accesses and vice versa. This is also known as
- * a "fence."
+ * a "fence." Again, in common code, any reads that
+ * depend on the order of writes must themselves be
+ * separated by an mb() or rmb().
+ * wmbdd(): Guarantees that all preceding stores to memory-
+ * like regions are visible before any subsequent
+ * stores and that all following stores will be
+ * visible only after all previous stores.
+ * In common code, any reads that depend on this
+ * ordering either must be separated by an mb()
+ * or rmb(), or the later reads must depend on
+ * data loaded by the earlier reads. For an example
+ * of the latter, consider "p->next". The read of
+ * the "next" field depends on the read of the
+ * pointer "p".
+ * mbdd(): wmb()/rmb() combo, i.e., all previous memory
+ * accesses are visible before all subsequent
+ * accesses and vice versa. This is also known as
+ * a "fence." Again, in common code, any reads that
+ * depend on the order of writes must themselves be
+ * separated by an mb() or rmb(), or there must be
+ * a data dependency that forces the second to
+ * wait until the first completes.
*
* Note: "mb()" and its variants cannot be used as a fence to order
* accesses to memory mapped I/O registers. For that, mf.a needs to
@@ -99,15 +124,21 @@
#define mb() __asm__ __volatile__ ("mf" ::: "memory")
#define rmb() mb()
#define wmb() mb()
+#define rmbdd() mb()
+#define wmbdd() mb()

#ifdef CONFIG_SMP
# define smp_mb() mb()
# define smp_rmb() rmb()
# define smp_wmb() wmb()
+# define smp_mbdd() mbdd()
+# define smp_wmbdd() wmbdd()
#else
# define smp_mb() barrier()
# define smp_rmb() barrier()
# define smp_wmb() barrier()
+# define smp_mbdd() barrier()
+# define smp_wmbdd() barrier()
#endif

/*
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-m68k/system.h linux-2.4.10.wmbdd/include/asm-m68k/system.h
--- linux-2.4.10/include/asm-m68k/system.h Mon Jun 11 19:15:27 2001
+++ linux-2.4.10.wmbdd/include/asm-m68k/system.h Mon Oct 8 18:31:18 2001
@@ -81,12 +81,16 @@
#define mb() barrier()
#define rmb() barrier()
#define wmb() barrier()
+#define rmbdd() barrier()
+#define wmbdd() barrier()
#define set_mb(var, value) do { xchg(&var, value); } while (0)
#define set_wmb(var, value) do { var = value; wmb(); } while (0)

#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()


#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips/system.h linux-2.4.10.wmbdd/include/asm-mips/system.h
--- linux-2.4.10/include/asm-mips/system.h Sun Sep 9 10:43:01 2001
+++ linux-2.4.10.wmbdd/include/asm-mips/system.h Mon Oct 8 18:31:18 2001
@@ -152,6 +152,8 @@
#define rmb() do { } while(0)
#define wmb() wbflush()
#define mb() wbflush()
+#define wmbdd() wbflush()
+#define mbdd() wbflush()

#else /* CONFIG_CPU_HAS_WB */

@@ -167,6 +169,8 @@
: "memory")
#define rmb() mb()
#define wmb() mb()
+#define wmbdd() mb()
+#define mbdd() mb()

#endif /* CONFIG_CPU_HAS_WB */

@@ -174,10 +178,14 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) \
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips64/system.h linux-2.4.10.wmbdd/include/asm-mips64/system.h
--- linux-2.4.10/include/asm-mips64/system.h Wed Jul 4 11:50:39 2001
+++ linux-2.4.10.wmbdd/include/asm-mips64/system.h Mon Oct 8 18:31:18 2001
@@ -148,15 +148,21 @@
: "memory")
#define rmb() mb()
#define wmb() mb()
+#define rmbdd() mb()
+#define wmbdd() mb()

#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) \
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-parisc/system.h linux-2.4.10.wmbdd/include/asm-parisc/system.h
--- linux-2.4.10/include/asm-parisc/system.h Wed Dec 6 11:46:39 2000
+++ linux-2.4.10.wmbdd/include/asm-parisc/system.h Mon Oct 8 18:31:18 2001
@@ -51,6 +51,8 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() rmb()
+#define smp_wmbdd() wmb()
#else
/* This is simply the barrier() macro from linux/kernel.h but when serial.c
* uses tqueue.h uses smp_mb() defined using barrier(), linux/kernel.h
@@ -59,6 +61,8 @@
#define smp_mb() __asm__ __volatile__("":::"memory");
#define smp_rmb() __asm__ __volatile__("":::"memory");
#define smp_wmb() __asm__ __volatile__("":::"memory");
+#define smp_mbdd() __asm__ __volatile__("":::"memory");
+#define smp_wmbdd() __asm__ __volatile__("":::"memory");
#endif

/* interrupt control */
@@ -122,6 +126,8 @@

#define mb() __asm__ __volatile__ ("sync" : : :"memory")
#define wmb() mb()
+#define mbdd() mb()
+#define wmbdd() mb()

extern unsigned long __xchg(unsigned long, unsigned long *, int);

diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ppc/system.h linux-2.4.10.wmbdd/include/asm-ppc/system.h
--- linux-2.4.10/include/asm-ppc/system.h Tue Aug 28 06:58:33 2001
+++ linux-2.4.10.wmbdd/include/asm-ppc/system.h Mon Oct 8 18:31:18 2001
@@ -33,6 +33,8 @@
#define mb() __asm__ __volatile__ ("sync" : : : "memory")
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("eieio" : : : "memory")
+#define mbdd() mb()
+#define wmbdd() wmb()

#define set_mb(var, value) do { var = value; mb(); } while (0)
#define set_wmb(var, value) do { var = value; wmb(); } while (0)
@@ -41,10 +43,14 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mb()
+#define smp_wmbdd() wmb()
#else
#define smp_mb() __asm__ __volatile__("": : :"memory")
#define smp_rmb() __asm__ __volatile__("": : :"memory")
#define smp_wmb() __asm__ __volatile__("": : :"memory")
+#define smp_mbdd() __asm__ __volatile__("": : :"memory")
+#define smp_wmbdd() __asm__ __volatile__("": : :"memory")
#endif /* CONFIG_SMP */

#ifdef __KERNEL__
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390/system.h linux-2.4.10.wmbdd/include/asm-s390/system.h
--- linux-2.4.10/include/asm-s390/system.h Wed Jul 25 14:12:02 2001
+++ linux-2.4.10.wmbdd/include/asm-s390/system.h Mon Oct 8 18:31:18 2001
@@ -118,9 +118,13 @@
#define mb() eieio()
#define rmb() eieio()
#define wmb() eieio()
+#define mbdd() mb()
+#define wmbdd() wmb()
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mb()
+#define smp_wmbdd() wmb()
#define smp_mb__before_clear_bit() smp_mb()
#define smp_mb__after_clear_bit() smp_mb()

diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390x/system.h linux-2.4.10.wmbdd/include/asm-s390x/system.h
--- linux-2.4.10/include/asm-s390x/system.h Wed Jul 25 14:12:03 2001
+++ linux-2.4.10.wmbdd/include/asm-s390x/system.h Mon Oct 8 18:31:19 2001
@@ -131,9 +131,13 @@
#define mb() eieio()
#define rmb() eieio()
#define wmb() eieio()
+#define mbdd() mb()
+#define wmbdd() wmb()
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mb()
+#define smp_wmbdd() wmb()
#define smp_mb__before_clear_bit() smp_mb()
#define smp_mb__after_clear_bit() smp_mb()

diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sh/system.h linux-2.4.10.wmbdd/include/asm-sh/system.h
--- linux-2.4.10/include/asm-sh/system.h Sat Sep 8 12:29:09 2001
+++ linux-2.4.10.wmbdd/include/asm-sh/system.h Mon Oct 8 18:31:19 2001
@@ -89,15 +89,21 @@
#define mb() __asm__ __volatile__ ("": : :"memory")
#define rmb() mb()
#define wmb() __asm__ __volatile__ ("": : :"memory")
+#define mbdd() mb()
+#define wmbdd() wmb()

#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mb()
+#define smp_wmbdd() wmb()
#else
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+#define smp_mbdd() barrier()
+#define smp_wmbdd() barrier()
#endif

#define set_mb(var, value) do { xchg(&var, value); } while (0)
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc/system.h linux-2.4.10.wmbdd/include/asm-sparc/system.h
--- linux-2.4.10/include/asm-sparc/system.h Tue Oct 3 09:24:41 2000
+++ linux-2.4.10.wmbdd/include/asm-sparc/system.h Mon Oct 8 18:31:19 2001
@@ -278,11 +278,15 @@
#define mb() __asm__ __volatile__ ("" : : : "memory")
#define rmb() mb()
#define wmb() mb()
+#define mbdd() mb()
+#define wmbdd() wmb()
#define set_mb(__var, __value) do { __var = __value; mb(); } while(0)
#define set_wmb(__var, __value) set_mb(__var, __value)
#define smp_mb() __asm__ __volatile__("":::"memory");
#define smp_rmb() __asm__ __volatile__("":::"memory");
#define smp_wmb() __asm__ __volatile__("":::"memory");
+#define smp_mbdd() __asm__ __volatile__("":::"memory");
+#define smp_wmbdd() __asm__ __volatile__("":::"memory");

#define nop() __asm__ __volatile__ ("nop");

diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc64/system.h linux-2.4.10.wmbdd/include/asm-sparc64/system.h
--- linux-2.4.10/include/asm-sparc64/system.h Fri Sep 7 11:01:20 2001
+++ linux-2.4.10.wmbdd/include/asm-sparc64/system.h Wed Oct 10 16:43:21 2001
@@ -100,6 +100,8 @@
membar("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad");
#define rmb() membar("#LoadLoad")
#define wmb() membar("#StoreStore")
+#define mbdd() mb()
+#define wmbdd() wmb()
#define set_mb(__var, __value) \
do { __var = __value; membar("#StoreLoad | #StoreStore"); } while(0)
#define set_wmb(__var, __value) \
@@ -109,10 +111,14 @@
#define smp_mb() mb()
#define smp_rmb() rmb()
#define smp_wmb() wmb()
+#define smp_mbdd() mbdd()
+#define smp_wmbdd() wmbdd()
#else
#define smp_mb() __asm__ __volatile__("":::"memory");
#define smp_rmb() __asm__ __volatile__("":::"memory");
#define smp_wmb() __asm__ __volatile__("":::"memory");
+#define smp_mbdd() __asm__ __volatile__("":::"memory");
+#define smp_wmbdd() __asm__ __volatile__("":::"memory");
#endif

#define flushi(addr) __asm__ __volatile__ ("flush %0" : : "r" (addr) : "memory")

2001-10-12 04:18:51

by Rusty Russell

[permalink] [raw]
Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

On Wed, 10 Oct 2001 18:56:26 -0700 (PDT)
"Paul E. McKenney" <[email protected]> wrote:

> Here are two patches. The wmbdd patch has been modified to use
> the lighter-weight SPARC instruction, as suggested by Dave Miller.
> The rmbdd patch defines an rmbdd() primitive that is defined to be
> rmb() on Alpha and a nop on other architectures. I believe this
> rmbdd() primitive is what Richard is looking for.

Surely we don't need both? If rmbdd exists, any code needing wmbdd
is terminally broken?

Rusty.

2001-10-13 14:48:29

by Paul E. McKenney

[permalink] [raw]
Subject: Re: RFC: patch to allow lock-free traversal of lists with insertion

>> Here are two patches. The wmbdd patch has been modified to use
>> the lighter-weight SPARC instruction, as suggested by Dave Miller.
>> The rmbdd patch defines an rmbdd() primitive that is defined to be
>> rmb() on Alpha and a nop on other architectures. I believe this
>> rmbdd() primitive is what Richard is looking for.
>
>Surely we don't need both? If rmbdd exists, any code needing wmbdd
>is terminally broken?

One or the other. And at this point, it looks like rmbdd() (or
read_cache_depends()) is the mechanism of choice, given wmbdd()'s
performance on Alpha.

Thanx, Paul