2003-03-09 23:36:37

by Zwane Mwaikambo

[permalink] [raw]
Subject: [PATCH] small fixes in brlock.h

Index: linux-2.5.64-unwashed/include/linux/brlock.h
===================================================================
RCS file: /build/cvsroot/linux-2.5.64/include/linux/brlock.h,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 brlock.h
--- linux-2.5.64-unwashed/include/linux/brlock.h 5 Mar 2003 05:07:54 -0000 1.1.1.1
+++ linux-2.5.64-unwashed/include/linux/brlock.h 9 Mar 2003 23:42:26 -0000
@@ -85,8 +85,7 @@
if (idx >= __BR_END)
__br_lock_usage_bug();

- preempt_disable();
- _raw_read_lock(&__brlock_array[smp_processor_id()][idx]);
+ read_lock(&__brlock_array[smp_processor_id()][idx]);
}

static inline void br_read_unlock (enum brlock_indices idx)
@@ -202,7 +201,7 @@
do { local_bh_disable(); br_write_lock(idx); } while (0)

#define br_read_unlock_irqrestore(idx, flags) \
- do { br_read_unlock(irx); local_irq_restore(flags); } while (0)
+ do { br_read_unlock(idx); local_irq_restore(flags); } while (0)

#define br_read_unlock_irq(idx) \
do { br_read_unlock(idx); local_irq_enable(); } while (0)
@@ -211,7 +210,7 @@
do { br_read_unlock(idx); local_bh_enable(); } while (0)

#define br_write_unlock_irqrestore(idx, flags) \
- do { br_write_unlock(irx); local_irq_restore(flags); } while (0)
+ do { br_write_unlock(idx); local_irq_restore(flags); } while (0)

#define br_write_unlock_irq(idx) \
do { br_write_unlock(idx); local_irq_enable(); } while (0)

--
function.linuxpower.ca


2003-03-09 23:49:20

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] small fixes in brlock.h

On Sun, 2003-03-09 at 18:44, Zwane Mwaikambo wrote:

> --- linux-2.5.64-unwashed/include/linux/brlock.h 5 Mar 2003 05:07:54 -0000 1.1.1.1
> +++ linux-2.5.64-unwashed/include/linux/brlock.h 9 Mar 2003 23:42:26 -0000
> @@ -85,8 +85,7 @@
> if (idx >= __BR_END)
> __br_lock_usage_bug();
>
> - preempt_disable();
> - _raw_read_lock(&__brlock_array[smp_processor_id()][idx]);
> + read_lock(&__brlock_array[smp_processor_id()][idx]);
> }

This is wrong.

We have to disable preemption prior to reading smp_processor_id().
Otherwise we may lock/unlock the wrong processor's brlock. The above as
you changed it is equivalent to:

cpu = smp_processor_id();
/* do not want to preempt here, but we can! */
preempt_disable();
_raw_read_lock(&__brlock_array[cpu][idx]);

And what we had, and what we need, is:

preempt_disable();
cpu = smp_processor_id();
_raw_read_lock(&__brlock_array[cpu][idx]);

In other words, we need to ensure preemption is disabled prior to
calling smp_processor_id().

We also do something similar with the write patch in lib/brlock.c, but
for different reason: to disable preemption once and not NR_CPUS times.

The rest of the patch looks fine. :)

Robert Love

2003-03-09 23:54:54

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] small fixes in brlock.h

On Sun, 9 Mar 2003, Robert Love wrote:

> On Sun, 2003-03-09 at 18:44, Zwane Mwaikambo wrote:
>
> > --- linux-2.5.64-unwashed/include/linux/brlock.h 5 Mar 2003 05:07:54 -0000 1.1.1.1
> > +++ linux-2.5.64-unwashed/include/linux/brlock.h 9 Mar 2003 23:42:26 -0000
> > @@ -85,8 +85,7 @@
> > if (idx >= __BR_END)
> > __br_lock_usage_bug();
> >
> > - preempt_disable();
> > - _raw_read_lock(&__brlock_array[smp_processor_id()][idx]);
> > + read_lock(&__brlock_array[smp_processor_id()][idx]);
> > }
>
> This is wrong.

I was waiting for this ;)

> We have to disable preemption prior to reading smp_processor_id().
> Otherwise we may lock/unlock the wrong processor's brlock. The above as
> you changed it is equivalent to:
>
> cpu = smp_processor_id();
> /* do not want to preempt here, but we can! */
> preempt_disable();
> _raw_read_lock(&__brlock_array[cpu][idx]);

How are we able to preempt there? Timer tick?

Thanks,
Zwane
--
function.linuxpower.ca

2003-03-09 23:59:39

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] small fixes in brlock.h

On Sun, 2003-03-09 at 19:03, Zwane Mwaikambo wrote:

> > cpu = smp_processor_id();
> > /* do not want to preempt here, but we can! */
> > preempt_disable();
> > _raw_read_lock(&__brlock_array[cpu][idx]);
>
> How are we able to preempt there? Timer tick?

Yep. Any interrupt, actually.

Or the reschedule IPI on SMP systems.

Kernel preemption off an interrupt is actually the most common (and the
ideal) place to preempt since an interrupt is usually what wakes up a
task off I/O and sets need_resched. So kernel preemption lets us
reschedule the higher priority task the moment the interrupt wakes it
up. Of course, if a lock is held, we have to wait till we drop it.

Robert Love

2003-03-10 00:04:40

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH] small fixes in brlock.h

On Sun, 2003-03-09 at 18:44, Zwane Mwaikambo wrote:

> #define br_read_unlock_irqrestore(idx, flags) \
> - do { br_read_unlock(irx); local_irq_restore(flags); } while (0)
> + do { br_read_unlock(idx); local_irq_restore(flags); } while (0)

BTW, I am amazed all these s/idx/irx/ bugs exist and no one noticed
them.

I guess nothing uses these irq variants. In fact, grepping the
source... wow, not much uses brlocks at all. Only registered lock is
BR_NETPROTO_LOCK. A read lock on it is called only 7 times and a write
lock is used 31 times.

Everything must of moved over to using RCU or something. It makes me
question the future of these things.

Robert Love

2003-03-10 01:03:04

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] small fixes in brlock.h

On Sun, 9 Mar 2003, Robert Love wrote:

> Yep. Any interrupt, actually.
>
> Or the reschedule IPI on SMP systems.
>
> Kernel preemption off an interrupt is actually the most common (and the
> ideal) place to preempt since an interrupt is usually what wakes up a
> task off I/O and sets need_resched. So kernel preemption lets us
> reschedule the higher priority task the moment the interrupt wakes it
> up. Of course, if a lock is held, we have to wait till we drop it.

Thanks for clearing that up, i'll probably need to relook at some of my
code to be sure.

Zwane
--
function.linuxpower.ca

2003-03-10 02:25:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] small fixes in brlock.h


On 9 Mar 2003, Robert Love wrote:
>
> I guess nothing uses these irq variants. In fact, grepping the
> source... wow, not much uses brlocks at all. Only registered lock is
> BR_NETPROTO_LOCK. A read lock on it is called only 7 times and a write
> lock is used 31 times.
>
> Everything must of moved over to using RCU or something. It makes me
> question the future of these things.

No, I don't think there are even "moved to RCU" users. It's just never
been used very much, since the writes are _so_ expensive (in fact, there
have been various live-locks on the writer side, the whole brlock thing is
really questionable).

It's entirely possible that the current user could be replaced by RCU
and/or seqlocks, and we could get rid of brlocks entirely.

Linus

2003-03-10 21:44:55

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] small fixes in brlock.h

Hi,

On Sun, 9 Mar 2003, Linus Torvalds wrote:

> No, I don't think there are even "moved to RCU" users. It's just never
> been used very much, since the writes are _so_ expensive (in fact, there
> have been various live-locks on the writer side, the whole brlock thing is
> really questionable).
>
> It's entirely possible that the current user could be replaced by RCU
> and/or seqlocks, and we could get rid of brlocks entirely.

I think now it's a good time to throw the patch below into the ring. :)
This new lock is an extension of the brlock and has an asynchronous and a
synchronous mode. The synchronous mode can be used to replaced the brlock
and does something very similiar than the nonatomic version of the brlock,
but has a better worst case behaviour.
The asynchronous mode can be used to replaced RCU. The read lock is very
cheap and never sleeps, but it's enough to make the quiescent stage a lot
cheaper and more flexible to use. This means instead of calling call_rcu()
it's enough to call brs_read_sync() and immediately after this all reader
references are gone.
I'm meditating over this patch already some time, but I can't find a flaw.
OTOH it looks very promising to make locking in several areas a lot
cheaper. I have another patch laying around, which completely replaces the
dcache_lock with this lock. Additionally I only need an asynchronous read
lock during d_lookup and dput and is compatible with the fast walk code,
so it should be even faster than dcache rcu, but requires of course a few
more changes. This patch is far away from ready, as some bits are still
missing, but I think I figured out the basic locking rules. If anyone
wants to look at it, I'll send him a copy.

bye, Roman

diff -Nur linux-2.5.org/include/linux/brslock.h linux-2.5/include/linux/brslock.h
--- linux-2.5.org/include/linux/brslock.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.5/include/linux/brslock.h 2003-03-02 02:59:24.000000000 +0100
@@ -0,0 +1,154 @@
+#ifndef __LINUX_BRSLOCK_H
+#define __LINUX_BRSLOCK_H
+
+/*
+ * 'Big Reader / Sync' read-write spinlocks.
+ *
+ * super-fast read/write locks, with write-side penalty and
+ * a synchronous and asynchronous version.
+ *
+ * (c) 2003, Roman Zippel <[email protected]>
+ * Released under the terms of the GNU GPL v2.0.
+ *
+ * This lock is based on the brlock by Ingo Molnar and David S. Miller
+ * and rcu techniques.
+ *
+ * The asynchronous read lock only marks the critical regions, but never waits
+ * for any writer, so the writer can only detect active readers, this is
+ * sufficient for asynchronous list updates as used by the rcu mechanism, but
+ * has the advantage that after a call to brs_read_sync the update is finished
+ * and no further callback is required.
+ * The synchronous read lock extends the asynchronous lock by adding a write
+ * test, but since it's in the same cache line as the count, it's very cheap.
+ * The writer can now prevent new readers from entering the critical region
+ * and only needs to wait for other readers to leave it.
+ */
+
+#include <linux/config.h>
+
+#ifdef CONFIG_SMP
+
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+
+enum brslock_indices {
+ __BRS_END
+};
+
+/*
+ * This causes a link-time bug message if an
+ * invalid index is used:
+ */
+extern void __brs_lock_usage_bug(void);
+#define __brs_lock_idx_check(idx) \
+ if (idx >= __BRS_END) \
+ __brs_lock_usage_bug()
+
+struct brs_lock {
+ unsigned int count;
+ unsigned int wlock;
+};
+
+/*
+ * align last allocated index to the next cacheline:
+ */
+#define __BRS_IDX_MAX \
+ (((sizeof(struct brs_lock)*__BRS_END + SMP_CACHE_BYTES-1) & ~(SMP_CACHE_BYTES-1)) / sizeof(struct brs_lock))
+
+extern struct brs_lock __brslock_array[NR_CPUS][__BRS_IDX_MAX];
+extern spinlock_t __brs_write_locks[__BRS_END];
+
+extern void __brs_read_lock_sync(enum brslock_indices idx);
+extern void __brs_read_sync(enum brslock_indices idx);
+extern void __brs_write_lock_sync(enum brslock_indices idx);
+extern void __brs_write_unlock_sync(enum brslock_indices idx);
+
+static inline void __brs_read_lock_async(enum brslock_indices idx, int id)
+{
+ __brslock_array[id][idx].count++;
+ mb();
+}
+
+static inline void brs_read_lock_async(enum brslock_indices idx)
+{
+ __brs_lock_idx_check(idx);
+ __brs_read_lock_async(idx, get_cpu());
+}
+
+static inline void __brs_read_unlock_async(enum brslock_indices idx, int id)
+{
+ mb();
+ __brslock_array[id][idx].count--;
+}
+
+static inline void brs_read_unlock_async(enum brslock_indices idx)
+{
+ __brs_lock_idx_check(idx);
+ __brs_read_unlock_async(idx, smp_processor_id());
+ put_cpu();
+}
+
+static inline void brs_write_lock_async(enum brslock_indices idx)
+{
+ __brs_lock_idx_check(idx);
+ spin_lock(&__brs_write_locks[idx]);
+}
+
+static inline void brs_write_unlock_async(enum brslock_indices idx)
+{
+ __brs_lock_idx_check(idx);
+ spin_unlock(&__brs_write_locks[idx]);
+}
+
+static inline void brs_read_sync(enum brslock_indices idx)
+{
+ __brs_lock_idx_check(idx);
+ __brs_read_sync(idx);
+}
+
+static inline void brs_read_lock_sync(enum brslock_indices idx)
+{
+ int id;
+restart:
+ id = get_cpu();
+ __brs_lock_idx_check(idx);
+ __brs_read_lock_async(idx, id);
+ if (unlikely(__brslock_array[id][idx].wlock)) {
+ __brs_read_lock_sync(idx);
+ goto restart;
+ }
+}
+
+static inline void brs_read_unlock_sync(enum brslock_indices idx)
+{
+ brs_read_unlock_async(idx);
+}
+
+static inline void brs_write_lock_sync(enum brslock_indices idx)
+{
+ __brs_lock_idx_check(idx);
+ __brs_write_lock_sync(idx);
+}
+
+static inline void brs_write_unlock_sync(enum brslock_indices idx)
+{
+ __brs_lock_idx_check(idx);
+ __brs_write_unlock_sync(idx);
+}
+
+#else
+
+#define brs_read_lock_async(idx) ({ (void)(idx); preempt_disable(); })
+#define brs_read_unlock_async(idx) ({ (void)(idx); preempt_enable(); })
+#define brs_write_lock_async(idx) ({ (void)(idx); preempt_disable(); })
+#define brs_write_unlock_async(idx) ({ (void)(idx); preempt_enable(); })
+#define brs_read_sync(idx) ({ (void)(idx); })
+#define brs_read_lock_sync(idx) ({ (void)(idx); preempt_disable(); })
+#define brs_read_unlock_sync(idx) ({ (void)(idx); preempt_enable(); })
+#define brs_write_lock_sync(idx) ({ (void)(idx); preempt_disable(); })
+#define brs_write_unlock_sync(idx) ({ (void)(idx); preempt_enable(); })
+
+#endif /* CONFIG_SMP */
+
+#endif /* __LINUX_BRSLOCK_H */
diff -Nur linux-2.5.org/lib/Makefile linux-2.5/lib/Makefile
--- linux-2.5.org/lib/Makefile 2003-03-10 20:38:48.000000000 +0100
+++ linux-2.5/lib/Makefile 2003-03-10 20:39:39.000000000 +0100
@@ -8,7 +8,7 @@

L_TARGET := lib.a

-obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o \
+obj-y := errno.o ctype.o string.o vsprintf.o brlock.o brslock.o cmdline.o \
bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
kobject.o idr.o

diff -Nur linux-2.5.org/lib/brslock.c linux-2.5/lib/brslock.c
--- linux-2.5.org/lib/brslock.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.5/lib/brslock.c 2003-03-10 19:59:42.000000000 +0100
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2003 Roman Zippel <[email protected]>
+ * Released under the terms of the GNU GPL v2.0.
+ */
+
+#include <linux/config.h>
+
+#ifdef CONFIG_SMP
+
+#include <linux/brslock.h>
+
+struct brs_lock __brslock_array[NR_CPUS][__BRS_IDX_MAX];
+spinlock_t __brs_write_locks[__BRS_END];
+
+void __brs_read_lock_sync(enum brslock_indices idx)
+{
+ brs_read_unlock(idx);
+ while (__brslock_array[smp_processor_id()][idx].wlock)
+ cpu_relax();
+}
+
+void __brs_read_sync(enum brslock_indices idx)
+{
+ int mask, i, j;
+
+ mask = cpu_online_map & ~(1 << smp_processor_id());
+ while (mask) {
+ for (i = 0, j = 1; i < NR_CPUS; j <<= 1, i++) {
+ if ((mask & j) && !__brslock_array[i][idx].count)
+ mask &= ~j;
+ }
+ cpu_relax();
+ }
+}
+
+void __brs_write_lock_sync(enum brslock_indices idx)
+{
+ int i;
+
+ spin_lock(&__brs_write_locks[idx]);
+ for (i = 0; i < NR_CPUS; i++)
+ __brslock_array[i][idx].wlock = 1;
+ mb();
+ __brs_read_sync(idx);
+}
+
+void __brs_write_unlock_sync(enum brslock_indices idx)
+{
+ int i;
+
+ for (i = 0; i < NR_CPUS; i++)
+ __brslock_array[i][idx].wlock = 0;
+ spin_unlock(&__brs_write_locks[idx]);
+}
+
+#endif /* CONFIG_SMP */


2003-03-12 00:07:36

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] (2/8) Eliminate brlock for packet_type

Replace linked list for packet_type with brlock with list macros and RCU.

diff -urN -X dontdiff linux-2.5.64/net/8021q/vlan.c linux-2.5-nobrlock/net/8021q/vlan.c
--- linux-2.5.64/net/8021q/vlan.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/8021q/vlan.c 2003-03-11 14:31:28.000000000 -0800
@@ -29,7 +29,6 @@
#include <net/p8022.h>
#include <net/arp.h>
#include <linux/rtnetlink.h>
-#include <linux/brlock.h>
#include <linux/notifier.h>

#include <linux/if_vlan.h>
@@ -68,7 +67,6 @@
.dev =NULL,
.func = vlan_skb_recv, /* VLAN receive method */
.data = (void *)(-1), /* Set here '(void *)1' when this code can SHARE SKBs */
- .next = NULL
};

/* End of global variables definitions. */
@@ -231,9 +229,10 @@
real_dev->vlan_rx_kill_vid(real_dev, vlan_id);
}

- br_write_lock(BR_NETPROTO_LOCK);
grp->vlan_devices[vlan_id] = NULL;
- br_write_unlock(BR_NETPROTO_LOCK);
+
+ /* wait for RCU in network receive */
+ synchronize_kernel();


/* Caller unregisters (and if necessary, puts)
diff -urN -X dontdiff linux-2.5.64/net/8021q/vlan_dev.c linux-2.5-nobrlock/net/8021q/vlan_dev.c
--- linux-2.5.64/net/8021q/vlan_dev.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/8021q/vlan_dev.c 2003-03-10 15:49:52.000000000 -0800
@@ -31,7 +31,6 @@
#include <net/datalink.h>
#include <net/p8022.h>
#include <net/arp.h>
-#include <linux/brlock.h>

#include "vlan.h"
#include "vlanproc.h"



2003-03-12 00:03:41

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] (0/8) replace brlock with RCU

> estionable).
>
> It's entirely possible that the current user could be replaced by RCU
> and/or seqlocks, and we could get rid of brlocks entirely.
>
> Linus

The following sequence of patches replaces the remaining use of brlock
with RCU. Most of this is fairly straightforward. The unregister functions
use synchronize_kernel(), perhaps there should be a special version to
indicate sychronizing with network BH.

Tested the normal network path and IPV4.
For the others all the pieces compile. But don't have the test setup to
give VLAN, SNAP, IPV6, and all the netfilter code a real test.

Comments?


2003-03-12 00:08:05

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] (3/8) Eliminate brlock from vlan

Eliminate brlock from vlan

diff -urN -X dontdiff linux-2.5.64/net/8021q/vlan.c linux-2.5-nobrlock/net/8021q/vlan.c
--- linux-2.5.64/net/8021q/vlan.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/8021q/vlan.c 2003-03-11 14:31:28.000000000 -0800
@@ -29,7 +29,6 @@
#include <net/p8022.h>
#include <net/arp.h>
#include <linux/rtnetlink.h>
-#include <linux/brlock.h>
#include <linux/notifier.h>

#include <linux/if_vlan.h>
@@ -68,7 +67,6 @@
.dev =NULL,
.func = vlan_skb_recv, /* VLAN receive method */
.data = (void *)(-1), /* Set here '(void *)1' when this code can SHARE SKBs */
- .next = NULL
};

/* End of global variables definitions. */
@@ -231,9 +229,10 @@
real_dev->vlan_rx_kill_vid(real_dev, vlan_id);
}

- br_write_lock(BR_NETPROTO_LOCK);
grp->vlan_devices[vlan_id] = NULL;
- br_write_unlock(BR_NETPROTO_LOCK);
+
+ /* wait for RCU in network receive */
+ synchronize_kernel();


/* Caller unregisters (and if necessary, puts)
diff -urN -X dontdiff linux-2.5.64/net/8021q/vlan_dev.c linux-2.5-nobrlock/net/8021q/vlan_dev.c
--- linux-2.5.64/net/8021q/vlan_dev.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/8021q/vlan_dev.c 2003-03-10 15:49:52.000000000 -0800
@@ -31,7 +31,6 @@
#include <net/datalink.h>
#include <net/p8022.h>
#include <net/arp.h>
-#include <linux/brlock.h>

#include "vlan.h"
#include "vlanproc.h"



2003-03-12 00:04:57

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] (1/8) Eliminate brlock in psnap

The following replaces brlock with RCU in the SNAP module.

diff -urN -X dontdiff linux-2.5.64/net/802/psnap.c linux-2.5-nobrlock/net/802/psnap.c
--- linux-2.5.64/net/802/psnap.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/802/psnap.c 2003-03-10 15:48:53.000000000 -0800
@@ -21,10 +21,10 @@
#include <linux/mm.h>
#include <linux/in.h>
#include <linux/init.h>
-#include <linux/brlock.h>

LIST_HEAD(snap_list);
static struct llc_sap *snap_sap;
+static spinlock_t snap_lock = SPIN_LOCK_UNLOCKED;

/*
* Find a snap client by matching the 5 bytes.
@@ -34,17 +34,15 @@
struct list_head *entry;
struct datalink_proto *proto = NULL, *p;

- if (list_empty(&snap_list))
- goto out;
-
- list_for_each(entry, &snap_list) {
+ rcu_read_lock();
+ list_for_each_rcu(entry, &snap_list) {
p = list_entry(entry, struct datalink_proto, node);
if (!memcmp(p->type, desc, 5)) {
proto = p;
break;
}
}
-out:
+ rcu_read_unlock();
return proto;
}

@@ -124,8 +122,7 @@
{
struct datalink_proto *proto = NULL;

- br_write_lock_bh(BR_NETPROTO_LOCK);
-
+ spin_lock_bh(&snap_lock);
if (find_snap_client(desc))
goto out;

@@ -135,10 +132,10 @@
proto->rcvfunc = rcvfunc;
proto->header_length = 5 + 3; /* snap + 802.2 */
proto->request = snap_request;
- list_add(&proto->node, &snap_list);
+ list_add_rcu(&proto->node, &snap_list);
}
out:
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&snap_lock);
return proto;
}

@@ -147,12 +144,13 @@
*/
void unregister_snap_client(struct datalink_proto *proto)
{
- br_write_lock_bh(BR_NETPROTO_LOCK);
+ static RCU_HEAD(snap_rcu);

- list_del(&proto->node);
- kfree(proto);
+ spin_lock_bh(&snap_lock);
+ list_del_rcu(&proto->node);
+ spin_unlock_bh(&snap_lock);

- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ call_rcu(&snap_rcu, (void (*)(void *)) kfree, proto);
}

MODULE_LICENSE("GPL");



2003-03-12 00:06:22

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] (7/8) Eliminate brlock from IPV6

Replace brlock in IPV6 with RCU.

diff -urN -X dontdiff linux-2.5.64/net/ipv6/af_inet6.c linux-2.5-nobrlock/net/ipv6/af_inet6.c
--- linux-2.5.64/net/ipv6/af_inet6.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv6/af_inet6.c 2003-03-10 16:05:21.000000000 -0800
@@ -45,7 +45,6 @@
#include <linux/inet.h>
#include <linux/netdevice.h>
#include <linux/icmpv6.h>
-#include <linux/brlock.h>
#include <linux/smp_lock.h>

#include <net/ip.h>
@@ -101,6 +100,7 @@
* build a new socket.
*/
struct list_head inetsw6[SOCK_MAX];
+static spinlock_t inetsw6_lock = SPIN_LOCK_UNLOCKED;

static void inet6_sock_destruct(struct sock *sk)
{
@@ -161,8 +161,8 @@

/* Look for the requested type/protocol pair. */
answer = NULL;
- br_read_lock_bh(BR_NETPROTO_LOCK);
- list_for_each(p, &inetsw6[sock->type]) {
+ rcu_read_lock();
+ list_for_each_rcu(p, &inetsw6[sock->type]) {
answer = list_entry(p, struct inet_protosw, list);

/* Check the non-wild match. */
@@ -180,7 +180,7 @@
}
answer = NULL;
}
- br_read_unlock_bh(BR_NETPROTO_LOCK);
+ rcu_read_unlock();

if (!answer)
goto free_and_badtype;
@@ -571,8 +571,7 @@
int protocol = p->protocol;
struct list_head *last_perm;

- br_write_lock_bh(BR_NETPROTO_LOCK);
-
+ spin_lock_bh(&inetsw6_lock);
if (p->type > SOCK_MAX)
goto out_illegal;

@@ -602,7 +601,7 @@
*/
list_add(&p->list, last_perm);
out:
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&inetsw6_lock);
return;

out_permanent:
diff -urN -X dontdiff linux-2.5.64/net/ipv6/ipv6_sockglue.c linux-2.5-nobrlock/net/ipv6/ipv6_sockglue.c
--- linux-2.5.64/net/ipv6/ipv6_sockglue.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv6/ipv6_sockglue.c 2003-03-10 13:33:07.000000000 -0800
@@ -54,11 +54,10 @@

static struct packet_type ipv6_packet_type =
{
- __constant_htons(ETH_P_IPV6),
- NULL, /* All devices */
- ipv6_rcv,
- (void*)1,
- NULL
+ .type = __constant_htons(ETH_P_IPV6),
+ .dev = NULL, /* All devices */
+ .func = ipv6_rcv,
+ .data = (void*)1,
};

/*
diff -urN -X dontdiff linux-2.5.64/net/ipv6/netfilter/ip6_queue.c linux-2.5-nobrlock/net/ipv6/netfilter/ip6_queue.c
--- linux-2.5.64/net/ipv6/netfilter/ip6_queue.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv6/netfilter/ip6_queue.c 2003-03-11 15:05:18.000000000 -0800
@@ -26,7 +26,6 @@
#include <linux/netfilter.h>
#include <linux/netlink.h>
#include <linux/spinlock.h>
-#include <linux/brlock.h>
#include <linux/sysctl.h>
#include <linux/proc_fs.h>
#include <net/sock.h>
@@ -688,8 +687,6 @@

cleanup:
nf_unregister_queue_handler(PF_INET6);
- br_write_lock_bh(BR_NETPROTO_LOCK);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
ipq_flush(NF_DROP);

cleanup_sysctl:
diff -urN -X dontdiff linux-2.5.64/net/ipv6/protocol.c linux-2.5-nobrlock/net/ipv6/protocol.c
--- linux-2.5.64/net/ipv6/protocol.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv6/protocol.c 2003-03-11 14:37:06.000000000 -0800
@@ -32,7 +32,6 @@
#include <linux/in6.h>
#include <linux/netdevice.h>
#include <linux/if_arp.h>
-#include <linux/brlock.h>

#include <net/sock.h>
#include <net/snmp.h>
@@ -41,21 +40,21 @@
#include <net/protocol.h>

struct inet6_protocol *inet6_protos[MAX_INET_PROTOS];
+static spinlock_t inet6_proto_lock = SPIN_LOCK_UNLOCKED;

int inet6_add_protocol(struct inet6_protocol *prot, unsigned char protocol)
{
int ret, hash = protocol & (MAX_INET_PROTOS - 1);

- br_write_lock_bh(BR_NETPROTO_LOCK);

+ spin_lock_bh(&inet6_proto_lock);
if (inet6_protos[hash]) {
ret = -1;
} else {
inet6_protos[hash] = prot;
ret = 0;
}
-
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&inet6_proto_lock);

return ret;
}
@@ -68,7 +67,7 @@
{
int ret, hash = protocol & (MAX_INET_PROTOS - 1);

- br_write_lock_bh(BR_NETPROTO_LOCK);
+ spin_lock_bh(&inet6_proto_lock);

if (inet6_protos[hash] != prot) {
ret = -1;
@@ -76,8 +75,9 @@
inet6_protos[hash] = NULL;
ret = 0;
}
+ spin_unlock_bh(&inet6_proto_lock);

- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ synchronize_kernel();

return ret;
}



2003-03-12 00:20:00

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] (0/8) replace brlock with RCU

From: Linus Torvalds <[email protected]>
Date: Tue, 11 Mar 2003 16:23:24 -0800 (PST)

On 11 Mar 2003, Stephen Hemminger wrote:
>
> The following sequence of patches replaces the remaining use of brlock
> with RCU. Most of this is fairly straightforward. The unregister functions
> use synchronize_kernel(), perhaps there should be a special version to
> indicate sychronizing with network BH.
>
> Comments?

I'm not going to take this directly, but if it passes muster with David,
I'm happy. The fewer locking primitives we need, the better, and brlocks
have had enough problems that I wouldn't mind getting rid of them.

I'm fine with it, as long as I get shown how to get the equivalent
atomic sequence using the new primitives. Ie. is there still a way
to go:

stop_all_incoming_packets();
do_something();
resume_all_incoming_packets();

with the new stuff?

2003-03-12 00:06:50

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] (6/8) Eliminate brlock from ipv4

Replace brlock in IPV4 with RCU

diff -urN -X dontdiff linux-2.5.64/net/ipv4/af_inet.c linux-2.5-nobrlock/net/ipv4/af_inet.c
--- linux-2.5.64/net/ipv4/af_inet.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/af_inet.c 2003-03-10 16:03:13.000000000 -0800
@@ -94,7 +94,6 @@
#include <linux/inet.h>
#include <linux/igmp.h>
#include <linux/netdevice.h>
-#include <linux/brlock.h>
#include <net/ip.h>
#include <net/protocol.h>
#include <net/arp.h>
@@ -130,6 +129,7 @@
* build a new socket.
*/
struct list_head inetsw[SOCK_MAX];
+static spinlock_t inetsw_lock = SPIN_LOCK_UNLOCKED;

/* New destruction routine */

@@ -337,8 +337,8 @@

/* Look for the requested type/protocol pair. */
answer = NULL;
- br_read_lock_bh(BR_NETPROTO_LOCK);
- list_for_each(p, &inetsw[sock->type]) {
+ rcu_read_lock();
+ list_for_each_rcu(p, &inetsw[sock->type]) {
answer = list_entry(p, struct inet_protosw, list);

/* Check the non-wild match. */
@@ -356,7 +356,7 @@
}
answer = NULL;
}
- br_read_unlock_bh(BR_NETPROTO_LOCK);
+ rcu_read_unlock();

err = -ESOCKTNOSUPPORT;
if (!answer)
@@ -978,7 +978,7 @@
int protocol = p->protocol;
struct list_head *last_perm;

- br_write_lock_bh(BR_NETPROTO_LOCK);
+ spin_lock_bh(&inetsw_lock);

if (p->type > SOCK_MAX)
goto out_illegal;
@@ -1007,9 +1007,9 @@
* non-permanent entry. This means that when we remove this entry, the
* system automatically returns to the old behavior.
*/
- list_add(&p->list, last_perm);
+ list_add_rcu(&p->list, last_perm);
out:
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&inetsw_lock);
return;

out_permanent:
@@ -1031,9 +1031,11 @@
"Attempt to unregister permanent protocol %d.\n",
p->protocol);
} else {
- br_write_lock_bh(BR_NETPROTO_LOCK);
+ spin_lock_bh(&inetsw_lock);
list_del(&p->list);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&inetsw_lock);
+
+ synchronize_kernel();
}
}

diff -urN -X dontdiff linux-2.5.64/net/ipv4/ip_output.c linux-2.5-nobrlock/net/ipv4/ip_output.c
--- linux-2.5.64/net/ipv4/ip_output.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/ip_output.c 2003-03-10 13:33:07.000000000 -0800
@@ -1261,11 +1261,10 @@

static struct packet_type ip_packet_type =
{
- __constant_htons(ETH_P_IP),
- NULL, /* All devices */
- ip_rcv,
- (void*)1,
- NULL,
+ .type = __constant_htons(ETH_P_IP),
+ .dev = NULL, /* All devices */
+ .func = ip_rcv,
+ .data = (void*)1,
};

/*
diff -urN -X dontdiff linux-2.5.64/net/ipv4/netfilter/ip_conntrack_core.c linux-2.5-nobrlock/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.5.64/net/ipv4/netfilter/ip_conntrack_core.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/netfilter/ip_conntrack_core.c 2003-03-11 14:36:31.000000000 -0800
@@ -24,7 +24,6 @@
#include <linux/skbuff.h>
#include <linux/proc_fs.h>
#include <linux/vmalloc.h>
-#include <linux/brlock.h>
#include <net/checksum.h>
#include <linux/stddef.h>
#include <linux/sysctl.h>
@@ -1160,8 +1159,7 @@
WRITE_UNLOCK(&ip_conntrack_lock);

/* Someone could be still looking at the helper in a bh. */
- br_write_lock_bh(BR_NETPROTO_LOCK);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ synchronize_kernel();
}

/* Refresh conntrack for this many jiffies. */
@@ -1402,8 +1400,7 @@
/* This makes sure all current packets have passed through
netfilter framework. Roll on, two-stage module
delete... */
- br_write_lock_bh(BR_NETPROTO_LOCK);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ synchronize_kernel();

i_see_dead_people:
ip_ct_selective_cleanup(kill_all, NULL);
diff -urN -X dontdiff linux-2.5.64/net/ipv4/netfilter/ip_conntrack_standalone.c linux-2.5-nobrlock/net/ipv4/netfilter/ip_conntrack_standalone.c
--- linux-2.5.64/net/ipv4/netfilter/ip_conntrack_standalone.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/netfilter/ip_conntrack_standalone.c 2003-03-11 14:41:17.000000000 -0800
@@ -15,7 +15,6 @@
#include <linux/skbuff.h>
#include <linux/proc_fs.h>
#include <linux/version.h>
-#include <linux/brlock.h>
#include <net/checksum.h>

#define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_conntrack_lock)
@@ -325,8 +324,7 @@
WRITE_UNLOCK(&ip_conntrack_lock);

/* Somebody could be still looking at the proto in bh. */
- br_write_lock_bh(BR_NETPROTO_LOCK);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ synchronize_kernel();

/* Remove all contrack entries for this protocol */
ip_ct_selective_cleanup(kill_proto, &proto->proto);
diff -urN -X dontdiff linux-2.5.64/net/ipv4/netfilter/ip_nat_core.c linux-2.5-nobrlock/net/ipv4/netfilter/ip_nat_core.c
--- linux-2.5.64/net/ipv4/netfilter/ip_nat_core.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/netfilter/ip_nat_core.c 2003-03-11 14:37:51.000000000 -0800
@@ -8,7 +8,6 @@
#include <linux/timer.h>
#include <linux/skbuff.h>
#include <linux/netfilter_ipv4.h>
-#include <linux/brlock.h>
#include <linux/vmalloc.h>
#include <net/checksum.h>
#include <net/icmp.h>
diff -urN -X dontdiff linux-2.5.64/net/ipv4/netfilter/ip_nat_helper.c linux-2.5-nobrlock/net/ipv4/netfilter/ip_nat_helper.c
--- linux-2.5.64/net/ipv4/netfilter/ip_nat_helper.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/netfilter/ip_nat_helper.c 2003-03-11 14:38:23.000000000 -0800
@@ -20,7 +20,6 @@
#include <linux/timer.h>
#include <linux/skbuff.h>
#include <linux/netfilter_ipv4.h>
-#include <linux/brlock.h>
#include <net/checksum.h>
#include <net/icmp.h>
#include <net/ip.h>
@@ -545,8 +544,7 @@
WRITE_UNLOCK(&ip_nat_lock);

/* Someone could be still looking at the helper in a bh. */
- br_write_lock_bh(BR_NETPROTO_LOCK);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ synchronize_kernel();

/* Find anything using it, and umm, kill them. We can't turn
them into normal connections: if we've adjusted SYNs, then
diff -urN -X dontdiff linux-2.5.64/net/ipv4/netfilter/ip_nat_snmp_basic.c linux-2.5-nobrlock/net/ipv4/netfilter/ip_nat_snmp_basic.c
--- linux-2.5.64/net/ipv4/netfilter/ip_nat_snmp_basic.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/netfilter/ip_nat_snmp_basic.c 2003-03-11 15:03:15.000000000 -0800
@@ -50,7 +50,6 @@
#include <linux/netfilter_ipv4.h>
#include <linux/netfilter_ipv4/ip_nat.h>
#include <linux/netfilter_ipv4/ip_nat_helper.h>
-#include <linux/brlock.h>
#include <linux/types.h>
#include <linux/ip.h>
#include <net/udp.h>
@@ -1351,8 +1350,6 @@
{
ip_nat_helper_unregister(&snmp);
ip_nat_helper_unregister(&snmp_trap);
- br_write_lock_bh(BR_NETPROTO_LOCK);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
}

module_init(init);
diff -urN -X dontdiff linux-2.5.64/net/ipv4/netfilter/ip_nat_standalone.c linux-2.5-nobrlock/net/ipv4/netfilter/ip_nat_standalone.c
--- linux-2.5.64/net/ipv4/netfilter/ip_nat_standalone.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/netfilter/ip_nat_standalone.c 2003-03-11 15:05:05.000000000 -0800
@@ -24,7 +24,6 @@
#include <net/checksum.h>
#include <linux/spinlock.h>
#include <linux/version.h>
-#include <linux/brlock.h>

#define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_nat_lock)
#define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip_nat_lock)
@@ -268,8 +267,7 @@
WRITE_UNLOCK(&ip_nat_lock);

/* Someone could be still looking at the proto in a bh. */
- br_write_lock_bh(BR_NETPROTO_LOCK);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ synchronize_kernel();
}

static int init_or_cleanup(int init)
diff -urN -X dontdiff linux-2.5.64/net/ipv4/netfilter/ip_queue.c linux-2.5-nobrlock/net/ipv4/netfilter/ip_queue.c
--- linux-2.5.64/net/ipv4/netfilter/ip_queue.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/netfilter/ip_queue.c 2003-03-11 14:40:45.000000000 -0800
@@ -23,7 +23,6 @@
#include <linux/netfilter_ipv4/ip_tables.h>
#include <linux/netlink.h>
#include <linux/spinlock.h>
-#include <linux/brlock.h>
#include <linux/sysctl.h>
#include <linux/proc_fs.h>
#include <linux/security.h>
@@ -685,8 +684,7 @@

cleanup:
nf_unregister_queue_handler(PF_INET);
- br_write_lock_bh(BR_NETPROTO_LOCK);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+
ipq_flush(NF_DROP);

cleanup_sysctl:
diff -urN -X dontdiff linux-2.5.64/net/ipv4/protocol.c linux-2.5-nobrlock/net/ipv4/protocol.c
--- linux-2.5.64/net/ipv4/protocol.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/ipv4/protocol.c 2003-03-11 14:42:32.000000000 -0800
@@ -37,7 +37,6 @@
#include <linux/inet.h>
#include <linux/netdevice.h>
#include <linux/timer.h>
-#include <linux/brlock.h>
#include <net/ip.h>
#include <net/protocol.h>
#include <net/tcp.h>
@@ -49,6 +48,7 @@
#include <linux/igmp.h>

struct inet_protocol *inet_protos[MAX_INET_PROTOS];
+static spinlock_t inet_proto_lock = SPIN_LOCK_UNLOCKED;

/*
* Add a protocol handler to the hash tables
@@ -60,16 +60,14 @@

hash = protocol & (MAX_INET_PROTOS - 1);

- br_write_lock_bh(BR_NETPROTO_LOCK);
-
+ spin_lock_bh(&inet_proto_lock);
if (inet_protos[hash]) {
ret = -1;
} else {
inet_protos[hash] = prot;
ret = 0;
}
-
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&inet_proto_lock);

return ret;
}
@@ -84,16 +82,15 @@

hash = protocol & (MAX_INET_PROTOS - 1);

- br_write_lock_bh(BR_NETPROTO_LOCK);
-
+ spin_lock_bh(&inet_proto_lock);
if (inet_protos[hash] == prot) {
inet_protos[hash] = NULL;
ret = 0;
} else {
ret = -1;
}
-
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&inet_proto_lock);
+ synchronize_kernel();

return ret;
}



2003-03-12 00:23:10

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] (8/8) Kill brlock

On Tue, 2003-03-11 at 16:23, David S. Miller wrote:
> From: Stephen Hemminger <[email protected]>
> Date: 11 Mar 2003 16:15:23 -0800
>
> Previous patches killed all remaining uses of brlock so bye.
>
> I'm all for this once patch 2/8 gets fixed up :-)
>
> So what is the new way to say "stop all incoming packet
> processing while I update data structure X"?

The caller's didn't need to stop packet processing, just remove their type
from a list or some other call back hook. A simple pointer update takes
care of removing a simple call back. The list delete rcu code takes care
of the memory barriers and doing the updates in the right order.
This ensures no future packet processing will grab that token

The next problem is how to ensure that packets in flight are not using
the hook. This is handled by call_rcu() for the more general case where
processing can be deferred (only used a couple places). Other places
use synchronize_kernel() which just waits. What call_rcu() does is make sure
that every processor has seen the change.



2003-03-12 00:18:07

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] (2/8) Eliminate brlock for packet_type

On Tue, 2003-03-11 at 16:20, David S. Miller wrote:
> From: Stephen Hemminger <[email protected]>
> Date: 11 Mar 2003 16:14:40 -0800
>
> Replace linked list for packet_type with brlock with list macros and RCU.
>
> Then why is a VLAN patch attached?

Little pieces breed confusion...this is the right piece.

diff -urN -X dontdiff linux-2.5.64/include/linux/netdevice.h linux-2.5-nobrlock/include/linux/netdevice.h
--- linux-2.5.64/include/linux/netdevice.h 2003-03-11 09:08:00.000000000 -0800
+++ linux-2.5-nobrlock/include/linux/netdevice.h 2003-03-11 14:15:17.000000000 -0800
@@ -452,7 +452,7 @@
int (*func) (struct sk_buff *, struct net_device *,
struct packet_type *);
void *data; /* Private to the packet type */
- struct packet_type *next;
+ struct list_head packet_list;
};


diff -urN -X dontdiff linux-2.5.64/net/core/dev.c linux-2.5-nobrlock/net/core/dev.c
--- linux-2.5.64/net/core/dev.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/core/dev.c 2003-03-11 14:37:36.000000000 -0800
@@ -90,7 +90,6 @@
#include <linux/etherdevice.h>
#include <linux/notifier.h>
#include <linux/skbuff.h>
-#include <linux/brlock.h>
#include <net/sock.h>
#include <linux/rtnetlink.h>
#include <linux/proc_fs.h>
@@ -170,8 +169,11 @@
* 86DD IPv6
*/

-static struct packet_type *ptype_base[16]; /* 16 way hashed list */
-static struct packet_type *ptype_all; /* Taps */
+static spinlock_t ptype_lock = SPIN_LOCK_UNLOCKED;
+static struct list_head ptype_base[16]; /* 16 way hashed list */
+static struct list_head ptype_all; /* Taps */
+
+static spinlock_t master_lock = SPIN_LOCK_UNLOCKED;

#ifdef OFFLINE_SAMPLE
static void sample_queue(unsigned long dummy);
@@ -203,7 +205,6 @@

static struct subsystem net_subsys;

-
/*******************************************************************************

Protocol management and registration routines
@@ -245,8 +246,7 @@
{
int hash;

- br_write_lock_bh(BR_NETPROTO_LOCK);
-
+ spin_lock_bh(&ptype_lock);
#ifdef CONFIG_NET_FASTROUTE
/* Hack to detect packet socket */
if (pt->data && (long)(pt->data) != 1) {
@@ -256,14 +256,12 @@
#endif
if (pt->type == htons(ETH_P_ALL)) {
netdev_nit++;
- pt->next = ptype_all;
- ptype_all = pt;
+ list_add_rcu(&pt->packet_list, &ptype_all);
} else {
hash = ntohs(pt->type) & 15;
- pt->next = ptype_base[hash];
- ptype_base[hash] = pt;
+ list_add_rcu(&pt->packet_list, &ptype_base[hash]);
}
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&ptype_lock);
}

extern void linkwatch_run_queue(void);
@@ -279,29 +277,30 @@
*/
void dev_remove_pack(struct packet_type *pt)
{
- struct packet_type **pt1;
-
- br_write_lock_bh(BR_NETPROTO_LOCK);
+ struct list_head *head, *pelem;

- if (pt->type == htons(ETH_P_ALL)) {
- netdev_nit--;
- pt1 = &ptype_all;
- } else
- pt1 = &ptype_base[ntohs(pt->type) & 15];
-
- for (; *pt1; pt1 = &((*pt1)->next)) {
- if (pt == *pt1) {
- *pt1 = pt->next;
+ if (pt->type == htons(ETH_P_ALL))
+ head = &ptype_all;
+ else
+ head = &ptype_base[ntohs(pt->type) & 15];
+
+ spin_lock_bh(&ptype_lock);
+ list_for_each(pelem, head) {
+ if (list_entry(pelem, struct packet_type, packet_list) == pt) {
+ list_del(pelem);
#ifdef CONFIG_NET_FASTROUTE
if (pt->data)
netdev_fastroute_obstacles--;
#endif
+ if (pt->type == htons(ETH_P_ALL))
+ netdev_nit--;
goto out;
}
}
printk(KERN_WARNING "dev_remove_pack: %p not found.\n", pt);
-out:
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+
+ out:
+ spin_unlock_bh(&ptype_lock);
}

/******************************************************************************
@@ -896,11 +895,13 @@

void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
{
- struct packet_type *ptype;
+ struct list_head *plist;
do_gettimeofday(&skb->stamp);

- br_read_lock(BR_NETPROTO_LOCK);
- for (ptype = ptype_all; ptype; ptype = ptype->next) {
+ rcu_read_lock();
+ list_for_each_rcu(plist, &ptype_all) {
+ struct packet_type *ptype
+ = list_entry(plist, struct packet_type, packet_list);
/* Never send packets back to the socket
* they originated from - MvS ([email protected])
*/
@@ -930,7 +931,7 @@
ptype->func(skb2, skb->dev, ptype);
}
}
- br_read_unlock(BR_NETPROTO_LOCK);
+ rcu_read_unlock();
}

/* Calculate csum in the case, when packet is misrouted.
@@ -1423,6 +1424,7 @@

int netif_receive_skb(struct sk_buff *skb)
{
+ struct list_head *pcur;
struct packet_type *ptype, *pt_prev;
int ret = NET_RX_DROP;
unsigned short type = skb->protocol;
@@ -1443,8 +1445,10 @@

skb->h.raw = skb->nh.raw = skb->data;

+ rcu_read_lock();
pt_prev = NULL;
- for (ptype = ptype_all; ptype; ptype = ptype->next) {
+ list_for_each_rcu(pcur, &ptype_all) {
+ ptype = list_entry(pcur, struct packet_type, packet_list);
if (!ptype->dev || ptype->dev == skb->dev) {
if (pt_prev) {
if (!pt_prev->data) {
@@ -1476,7 +1480,9 @@
}
#endif

- for (ptype = ptype_base[ntohs(type) & 15]; ptype; ptype = ptype->next) {
+ list_for_each_rcu(pcur, &ptype_base[ntohs(type) & 15]) {
+ ptype = list_entry(pcur, struct packet_type, packet_list);
+
if (ptype->type == type &&
(!ptype->dev || ptype->dev == skb->dev)) {
if (pt_prev) {
@@ -1506,6 +1512,7 @@
*/
ret = NET_RX_DROP;
}
+ rcu_read_unlock();

return ret;
}
@@ -1580,7 +1587,7 @@
unsigned long start_time = jiffies;
int budget = netdev_max_backlog;

- br_read_lock(BR_NETPROTO_LOCK);
+ preempt_disable();
local_irq_disable();

while (!list_empty(&queue->poll_list)) {
@@ -1609,7 +1616,7 @@
}
out:
local_irq_enable();
- br_read_unlock(BR_NETPROTO_LOCK);
+ preempt_enable();
return;

softnet_break:
@@ -1925,6 +1932,10 @@
#define dev_proc_init() 0
#endif /* CONFIG_PROC_FS */

+static RCU_HEAD(netdev_master_rcu);
+static void unset_old_master(void *arg) {
+ dev_put((struct net_device *) arg);
+}

/**
* netdev_set_master - set up master/slave pair
@@ -1943,18 +1954,20 @@

ASSERT_RTNL();

+ spin_lock_bh(&master_lock);
if (master) {
- if (old)
+ if (old) {
+ spin_unlock_bh(&master_lock);
return -EBUSY;
+ }
dev_hold(master);
}

- br_write_lock_bh(BR_NETPROTO_LOCK);
slave->master = master;
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&master_lock);

- if (old)
- dev_put(old);
+ if (old)
+ call_rcu(&netdev_master_rcu, unset_old_master, old);

if (master)
slave->flags |= IFF_SLAVE;
@@ -1962,6 +1975,7 @@
slave->flags &= ~IFF_SLAVE;

rtmsg_ifinfo(RTM_NEWLINK, slave, IFF_SLAVE);
+
return 0;
}

@@ -2646,10 +2660,7 @@
return -ENODEV;
}

- /* Synchronize to net_rx_action. */
- br_write_lock_bh(BR_NETPROTO_LOCK);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
-
+ synchronize_kernel();

#ifdef CONFIG_NET_FASTROUTE
dev_clear_fastroute(dev);
@@ -2755,7 +2766,6 @@
return 0;
}

-
/*
* Initialize the DEV module. At boot time this walks the device list and
* unhooks any devices that fail to initialise (normally hardware not
@@ -2786,6 +2796,11 @@
if (dev_proc_init())
goto out;

+ /* Initialize packet type chains */
+ INIT_LIST_HEAD(&ptype_all);
+ for (i = 0; i < ARRAY_SIZE(ptype_base); i++)
+ INIT_LIST_HEAD(&ptype_base[i]);
+
subsystem_register(&net_subsys);

#ifdef CONFIG_NET_DIVERT



2003-03-12 00:07:14

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] (5/8) Eliminate brlock from netfilter

Replace brlock in netfilter code with RCU.

diff -urN -X dontdiff linux-2.5.64/net/core/netfilter.c linux-2.5-nobrlock/net/core/netfilter.c
--- linux-2.5.64/net/core/netfilter.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/core/netfilter.c 2003-03-11 14:39:38.000000000 -0800
@@ -19,7 +19,6 @@
#include <linux/interrupt.h>
#include <linux/if.h>
#include <linux/netdevice.h>
-#include <linux/brlock.h>
#include <linux/inetdevice.h>
#include <net/sock.h>
#include <net/route.h>
@@ -46,6 +45,7 @@

struct list_head nf_hooks[NPROTO][NF_MAX_HOOKS];
static LIST_HEAD(nf_sockopts);
+static spinlock_t nf_lock = SPIN_LOCK_UNLOCKED;

/*
* A queue handler may be registered for each protocol. Each is protected by
@@ -56,28 +56,31 @@
nf_queue_outfn_t outfn;
void *data;
} queue_handler[NPROTO];
+static spinlock_t queue_lock = SPIN_LOCK_UNLOCKED;

int nf_register_hook(struct nf_hook_ops *reg)
{
struct list_head *i;

- br_write_lock_bh(BR_NETPROTO_LOCK);
+ spin_lock_bh(&nf_lock);
for (i = nf_hooks[reg->pf][reg->hooknum].next;
i != &nf_hooks[reg->pf][reg->hooknum];
i = i->next) {
if (reg->priority < ((struct nf_hook_ops *)i)->priority)
break;
}
- list_add(&reg->list, i->prev);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ list_add_rcu(&reg->list, i->prev);
+ spin_unlock_bh(&nf_lock);
return 0;
}

void nf_unregister_hook(struct nf_hook_ops *reg)
{
- br_write_lock_bh(BR_NETPROTO_LOCK);
+ spin_lock_bh(&nf_lock);
list_del(&reg->list);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&nf_lock);
+
+ synchronize_kernel();
}

/* Do exclusive ranges overlap? */
@@ -344,7 +347,9 @@
int (*okfn)(struct sk_buff *),
int hook_thresh)
{
- for (*i = (*i)->next; *i != head; *i = (*i)->next) {
+
+ for (*i = (*i)->next; *i != head; *i = (*i)->next,
+ ({ read_barrier_depends(); 0;})) {
struct nf_hook_ops *elem = (struct nf_hook_ops *)*i;

if (hook_thresh > elem->priority)
@@ -381,15 +386,16 @@
{
int ret;

- br_write_lock_bh(BR_NETPROTO_LOCK);
+ spin_lock_bh(&queue_lock);
if (queue_handler[pf].outfn)
ret = -EBUSY;
else {
- queue_handler[pf].outfn = outfn;
queue_handler[pf].data = data;
+ wmb();
+ queue_handler[pf].outfn = outfn;
ret = 0;
}
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&queue_lock);

return ret;
}
@@ -397,10 +403,14 @@
/* The caller must flush their queue before this */
int nf_unregister_queue_handler(int pf)
{
- br_write_lock_bh(BR_NETPROTO_LOCK);
+ spin_lock_bh(&queue_lock);
queue_handler[pf].outfn = NULL;
+ wmb();
queue_handler[pf].data = NULL;
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ spin_unlock_bh(&queue_lock);
+
+ synchronize_kernel();
+
return 0;
}

@@ -422,9 +432,10 @@
struct net_device *physoutdev = NULL;
#endif

+ rcu_read_lock();
if (!queue_handler[pf].outfn) {
kfree_skb(skb);
- return;
+ goto out;
}

info = kmalloc(sizeof(*info), GFP_ATOMIC);
@@ -433,7 +444,7 @@
printk(KERN_ERR "OOM queueing packet %p\n",
skb);
kfree_skb(skb);
- return;
+ goto out;
}

*info = (struct nf_info) {
@@ -463,8 +474,9 @@
#endif
kfree(info);
kfree_skb(skb);
- return;
}
+ out:
+ rcu_read_unlock();
}

int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
@@ -491,7 +503,7 @@
}

/* We may already have this, but read-locks nest anyway */
- br_read_lock_bh(BR_NETPROTO_LOCK);
+ rcu_read_lock();

#ifdef CONFIG_NETFILTER_DEBUG
if (skb->nf_debug & (1 << hook)) {
@@ -520,7 +532,7 @@
break;
}

- br_read_unlock_bh(BR_NETPROTO_LOCK);
+ rcu_read_unlock();
return ret;
}

@@ -530,8 +542,7 @@
struct list_head *elem = &info->elem->list;
struct list_head *i;

- /* We don't have BR_NETPROTO_LOCK here */
- br_read_lock_bh(BR_NETPROTO_LOCK);
+ rcu_read_lock();
for (i = nf_hooks[info->pf][info->hook].next; i != elem; i = i->next) {
if (i == &nf_hooks[info->pf][info->hook]) {
/* The module which sent it to userspace is gone. */
@@ -569,7 +580,7 @@
kfree_skb(skb);
break;
}
- br_read_unlock_bh(BR_NETPROTO_LOCK);
+ rcu_read_unlock();

/* Release those devices we held, or Alexey will kill me. */
if (info->indev) dev_put(info->indev);



2003-03-12 00:12:42

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] (2/8) Eliminate brlock for packet_type

From: Stephen Hemminger <[email protected]>
Date: 11 Mar 2003 16:14:40 -0800

Replace linked list for packet_type with brlock with list macros and RCU.

Then why is a VLAN patch attached?

2003-03-12 00:16:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] (0/8) replace brlock with RCU


On 11 Mar 2003, Stephen Hemminger wrote:
>
> The following sequence of patches replaces the remaining use of brlock
> with RCU. Most of this is fairly straightforward. The unregister functions
> use synchronize_kernel(), perhaps there should be a special version to
> indicate sychronizing with network BH.
>
> Comments?

I'm not going to take this directly, but if it passes muster with David,
I'm happy. The fewer locking primitives we need, the better, and brlocks
have had enough problems that I wouldn't mind getting rid of them.

Linus

2003-03-12 00:09:43

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] (4/8) Eliminate brlock in net/bridge

Eliminate brlock in bridge code.

diff -urN -X dontdiff linux-2.5.64/net/bridge/br.c linux-2.5-nobrlock/net/bridge/br.c
--- linux-2.5.64/net/bridge/br.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/bridge/br.c 2003-03-10 15:49:15.000000000 -0800
@@ -21,7 +21,6 @@
#include <linux/etherdevice.h>
#include <linux/init.h>
#include <linux/if_bridge.h>
-#include <linux/brlock.h>
#include <asm/uaccess.h>
#include "br_private.h"

@@ -73,14 +72,15 @@
unregister_netdevice_notifier(&br_device_notifier);
br_call_ioctl_atomic(__br_clear_ioctl_hook);

- br_write_lock_bh(BR_NETPROTO_LOCK);
br_handle_frame_hook = NULL;
- br_write_unlock_bh(BR_NETPROTO_LOCK);

#if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
br_fdb_get_hook = NULL;
br_fdb_put_hook = NULL;
#endif
+
+ /* netif_receive_skb uses RCU */
+ synchronize_kernel();
}

EXPORT_SYMBOL(br_should_route_hook);
diff -urN -X dontdiff linux-2.5.64/net/bridge/br_if.c linux-2.5-nobrlock/net/bridge/br_if.c
--- linux-2.5.64/net/bridge/br_if.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/bridge/br_if.c 2003-03-10 15:50:54.000000000 -0800
@@ -18,7 +18,6 @@
#include <linux/if_bridge.h>
#include <linux/inetdevice.h>
#include <linux/rtnetlink.h>
-#include <linux/brlock.h>
#include <asm/uaccess.h>
#include "br_private.h"

@@ -87,12 +86,11 @@

static void del_ifs(struct net_bridge *br)
{
- br_write_lock_bh(BR_NETPROTO_LOCK);
write_lock(&br->lock);
while (br->port_list != NULL)
__br_del_if(br, br->port_list->dev);
write_unlock(&br->lock);
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ synchronize_kernel();
}

static struct net_bridge *new_nb(char *name)
@@ -257,13 +255,12 @@
{
int retval;

- br_write_lock_bh(BR_NETPROTO_LOCK);
write_lock(&br->lock);
retval = __br_del_if(br, dev);
br_stp_recalculate_bridge_id(br);
write_unlock(&br->lock);
- br_write_unlock_bh(BR_NETPROTO_LOCK);

+ synchronize_kernel();
return retval;
}

diff -urN -X dontdiff linux-2.5.64/net/bridge/netfilter/ebtable_broute.c linux-2.5-nobrlock/net/bridge/netfilter/ebtable_broute.c
--- linux-2.5.64/net/bridge/netfilter/ebtable_broute.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/net/bridge/netfilter/ebtable_broute.c 2003-03-11 15:04:04.000000000 -0800
@@ -14,7 +14,6 @@
#include <linux/netfilter_bridge/ebtables.h>
#include <linux/module.h>
#include <linux/if_bridge.h>
-#include <linux/brlock.h>

// EBT_ACCEPT means the frame will be bridged
// EBT_DROP means the frame will be routed
@@ -68,18 +67,17 @@
ret = ebt_register_table(&broute_table);
if (ret < 0)
return ret;
- br_write_lock_bh(BR_NETPROTO_LOCK);
// see br_input.c
br_should_route_hook = ebt_broute;
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ synchronize_kernel();
+
return ret;
}

static void __exit fini(void)
{
- br_write_lock_bh(BR_NETPROTO_LOCK);
br_should_route_hook = NULL;
- br_write_unlock_bh(BR_NETPROTO_LOCK);
+ synchronize_kernel();
ebt_unregister_table(&broute_table);
}




2003-03-12 00:27:08

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] (8/8) Kill brlock

From: Stephen Hemminger <[email protected]>
Date: 11 Mar 2003 16:31:46 -0800

The caller's didn't need to stop packet processing, just remove their type
from a list or some other call back hook. A simple pointer update takes
care of removing a simple call back. The list delete rcu code takes care
of the memory barriers and doing the updates in the right order.
This ensures no future packet processing will grab that token

Ok, I'm fine with this then. Linus you can apply all of his patches.

2003-03-12 00:38:09

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] (8/8) Kill brlock

On Tue, 2003-03-11 at 16:44, Linus Torvalds wrote:
> On Tue, 11 Mar 2003, David S. Miller wrote:
> >
> > Ok, I'm fine with this then. Linus you can apply all of his patches.
>
> I'm a lazy bum, and I would _really_ want this tested more before it hits
> my tree. I think it makes sense, but still..


I agree, given the number of sub-pieces it deserves some more time...

2003-03-12 00:05:49

by Stephen Hemminger

[permalink] [raw]
Subject: [PATCH] (8/8) Kill brlock

Previous patches killed all remaining uses of brlock so bye.

diff -urN -X dontdiff linux-2.5.64/include/linux/brlock.h linux-2.5-nobrlock/include/linux/brlock.h
--- linux-2.5.64/include/linux/brlock.h 2003-03-11 09:08:00.000000000 -0800
+++ linux-2.5-nobrlock/include/linux/brlock.h 1969-12-31 16:00:00.000000000 -0800
@@ -1,222 +0,0 @@
-#ifndef __LINUX_BRLOCK_H
-#define __LINUX_BRLOCK_H
-
-/*
- * 'Big Reader' read-write spinlocks.
- *
- * super-fast read/write locks, with write-side penalty. The point
- * is to have a per-CPU read/write lock. Readers lock their CPU-local
- * readlock, writers must lock all locks to get write access. These
- * CPU-read-write locks are semantically identical to normal rwlocks.
- * Memory usage is higher as well. (NR_CPUS*L1_CACHE_BYTES bytes)
- *
- * The most important feature is that these spinlocks do not cause
- * cacheline ping-pong in the 'most readonly data' case.
- *
- * Copyright 2000, Ingo Molnar <[email protected]>
- *
- * Registry idea and naming [ crutial! :-) ] by:
- *
- * David S. Miller <[email protected]>
- *
- * David has an implementation that doesn't use atomic operations in
- * the read branch via memory ordering tricks - i guess we need to
- * split this up into a per-arch thing? The atomicity issue is a
- * secondary item in profiles, at least on x86 platforms.
- *
- * The atomic op version overhead is indeed a big deal on
- * load-locked/store-conditional cpus (ALPHA/MIPS/PPC) and
- * compare-and-swap cpus (Sparc64). So we control which
- * implementation to use with a __BRLOCK_USE_ATOMICS define. -DaveM
- *
- */
-
-/* Register bigreader lock indices here. */
-enum brlock_indices {
- BR_NETPROTO_LOCK,
- __BR_END
-};
-
-#include <linux/config.h>
-
-#ifdef CONFIG_SMP
-
-#include <linux/cache.h>
-#include <linux/spinlock.h>
-
-#if defined(__i386__) || defined(__ia64__) || defined(__x86_64__)
-#define __BRLOCK_USE_ATOMICS
-#else
-#undef __BRLOCK_USE_ATOMICS
-#endif
-
-#ifdef __BRLOCK_USE_ATOMICS
-typedef rwlock_t brlock_read_lock_t;
-#else
-typedef unsigned int brlock_read_lock_t;
-#endif
-
-/*
- * align last allocated index to the next cacheline:
- */
-#define __BR_IDX_MAX \
- (((sizeof(brlock_read_lock_t)*__BR_END + SMP_CACHE_BYTES-1) & ~(SMP_CACHE_BYTES-1)) / sizeof(brlock_read_lock_t))
-
-extern brlock_read_lock_t __brlock_array[NR_CPUS][__BR_IDX_MAX];
-
-#ifndef __BRLOCK_USE_ATOMICS
-struct br_wrlock {
- spinlock_t lock;
-} __attribute__ ((__aligned__(SMP_CACHE_BYTES)));
-
-extern struct br_wrlock __br_write_locks[__BR_IDX_MAX];
-#endif
-
-extern void __br_lock_usage_bug (void);
-
-#ifdef __BRLOCK_USE_ATOMICS
-
-static inline void br_read_lock (enum brlock_indices idx)
-{
- /*
- * This causes a link-time bug message if an
- * invalid index is used:
- */
- if (idx >= __BR_END)
- __br_lock_usage_bug();
-
- preempt_disable();
- _raw_read_lock(&__brlock_array[smp_processor_id()][idx]);
-}
-
-static inline void br_read_unlock (enum brlock_indices idx)
-{
- if (idx >= __BR_END)
- __br_lock_usage_bug();
-
- read_unlock(&__brlock_array[smp_processor_id()][idx]);
-}
-
-#else /* ! __BRLOCK_USE_ATOMICS */
-static inline void br_read_lock (enum brlock_indices idx)
-{
- unsigned int *ctr;
- spinlock_t *lock;
-
- /*
- * This causes a link-time bug message if an
- * invalid index is used:
- */
- if (idx >= __BR_END)
- __br_lock_usage_bug();
-
- preempt_disable();
- ctr = &__brlock_array[smp_processor_id()][idx];
- lock = &__br_write_locks[idx].lock;
-again:
- (*ctr)++;
- mb();
- if (spin_is_locked(lock)) {
- (*ctr)--;
- wmb(); /*
- * The release of the ctr must become visible
- * to the other cpus eventually thus wmb(),
- * we don't care if spin_is_locked is reordered
- * before the releasing of the ctr.
- * However IMHO this wmb() is superflous even in theory.
- * It would not be superflous only if on the
- * other CPUs doing a ldl_l instead of an ldl
- * would make a difference and I don't think this is
- * the case.
- * I'd like to clarify this issue further
- * but for now this is a slow path so adding the
- * wmb() will keep us on the safe side.
- */
- while (spin_is_locked(lock))
- barrier();
- goto again;
- }
-}
-
-static inline void br_read_unlock (enum brlock_indices idx)
-{
- unsigned int *ctr;
-
- if (idx >= __BR_END)
- __br_lock_usage_bug();
-
- ctr = &__brlock_array[smp_processor_id()][idx];
-
- wmb();
- (*ctr)--;
- preempt_enable();
-}
-#endif /* __BRLOCK_USE_ATOMICS */
-
-/* write path not inlined - it's rare and larger */
-
-extern void FASTCALL(__br_write_lock (enum brlock_indices idx));
-extern void FASTCALL(__br_write_unlock (enum brlock_indices idx));
-
-static inline void br_write_lock (enum brlock_indices idx)
-{
- if (idx >= __BR_END)
- __br_lock_usage_bug();
- __br_write_lock(idx);
-}
-
-static inline void br_write_unlock (enum brlock_indices idx)
-{
- if (idx >= __BR_END)
- __br_lock_usage_bug();
- __br_write_unlock(idx);
-}
-
-#else
-# define br_read_lock(idx) ({ (void)(idx); preempt_disable(); })
-# define br_read_unlock(idx) ({ (void)(idx); preempt_enable(); })
-# define br_write_lock(idx) ({ (void)(idx); preempt_disable(); })
-# define br_write_unlock(idx) ({ (void)(idx); preempt_enable(); })
-#endif /* CONFIG_SMP */
-
-/*
- * Now enumerate all of the possible sw/hw IRQ protected
- * versions of the interfaces.
- */
-#define br_read_lock_irqsave(idx, flags) \
- do { local_irq_save(flags); br_read_lock(idx); } while (0)
-
-#define br_read_lock_irq(idx) \
- do { local_irq_disable(); br_read_lock(idx); } while (0)
-
-#define br_read_lock_bh(idx) \
- do { local_bh_disable(); br_read_lock(idx); } while (0)
-
-#define br_write_lock_irqsave(idx, flags) \
- do { local_irq_save(flags); br_write_lock(idx); } while (0)
-
-#define br_write_lock_irq(idx) \
- do { local_irq_disable(); br_write_lock(idx); } while (0)
-
-#define br_write_lock_bh(idx) \
- do { local_bh_disable(); br_write_lock(idx); } while (0)
-
-#define br_read_unlock_irqrestore(idx, flags) \
- do { br_read_unlock(irx); local_irq_restore(flags); } while (0)
-
-#define br_read_unlock_irq(idx) \
- do { br_read_unlock(idx); local_irq_enable(); } while (0)
-
-#define br_read_unlock_bh(idx) \
- do { br_read_unlock(idx); local_bh_enable(); } while (0)
-
-#define br_write_unlock_irqrestore(idx, flags) \
- do { br_write_unlock(irx); local_irq_restore(flags); } while (0)
-
-#define br_write_unlock_irq(idx) \
- do { br_write_unlock(idx); local_irq_enable(); } while (0)
-
-#define br_write_unlock_bh(idx) \
- do { br_write_unlock(idx); local_bh_enable(); } while (0)
-
-#endif /* __LINUX_BRLOCK_H */
diff -urN -X dontdiff linux-2.5.64/kernel/ksyms.c linux-2.5-nobrlock/kernel/ksyms.c
--- linux-2.5.64/kernel/ksyms.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/kernel/ksyms.c 2003-03-10 15:50:15.000000000 -0800
@@ -40,7 +40,6 @@
#include <linux/mm.h>
#include <linux/capability.h>
#include <linux/highuid.h>
-#include <linux/brlock.h>
#include <linux/fs.h>
#include <linux/uio.h>
#include <linux/tty.h>
@@ -433,17 +432,6 @@
#endif
EXPORT_SYMBOL(mod_timer);

-#ifdef CONFIG_SMP
-
-/* Big-Reader lock implementation */
-EXPORT_SYMBOL(__brlock_array);
-#ifndef __BRLOCK_USE_ATOMICS
-EXPORT_SYMBOL(__br_write_locks);
-#endif
-EXPORT_SYMBOL(__br_write_lock);
-EXPORT_SYMBOL(__br_write_unlock);
-#endif
-
#ifdef HAVE_DISABLE_HLT
EXPORT_SYMBOL(disable_hlt);
EXPORT_SYMBOL(enable_hlt);
diff -urN -X dontdiff linux-2.5.64/lib/brlock.c linux-2.5-nobrlock/lib/brlock.c
--- linux-2.5.64/lib/brlock.c 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/lib/brlock.c 1969-12-31 16:00:00.000000000 -0800
@@ -1,72 +0,0 @@
-/*
- *
- * linux/lib/brlock.c
- *
- * 'Big Reader' read-write spinlocks. See linux/brlock.h for details.
- *
- * Copyright 2000, Ingo Molnar <[email protected]>
- * Copyright 2000, David S. Miller <[email protected]>
- */
-
-#include <linux/config.h>
-
-#ifdef CONFIG_SMP
-
-#include <linux/sched.h>
-#include <linux/brlock.h>
-
-#ifdef __BRLOCK_USE_ATOMICS
-
-brlock_read_lock_t __brlock_array[NR_CPUS][__BR_IDX_MAX] =
- { [0 ... NR_CPUS-1] = { [0 ... __BR_IDX_MAX-1] = RW_LOCK_UNLOCKED } };
-
-void __br_write_lock (enum brlock_indices idx)
-{
- int i;
-
- preempt_disable();
- for (i = 0; i < NR_CPUS; i++)
- _raw_write_lock(&__brlock_array[i][idx]);
-}
-
-void __br_write_unlock (enum brlock_indices idx)
-{
- int i;
-
- for (i = 0; i < NR_CPUS; i++)
- _raw_write_unlock(&__brlock_array[i][idx]);
- preempt_enable();
-}
-
-#else /* ! __BRLOCK_USE_ATOMICS */
-
-brlock_read_lock_t __brlock_array[NR_CPUS][__BR_IDX_MAX] =
- { [0 ... NR_CPUS-1] = { [0 ... __BR_IDX_MAX-1] = 0 } };
-
-struct br_wrlock __br_write_locks[__BR_IDX_MAX] =
- { [0 ... __BR_IDX_MAX-1] = { SPIN_LOCK_UNLOCKED } };
-
-void __br_write_lock (enum brlock_indices idx)
-{
- int i;
-
- preempt_disable();
-again:
- _raw_spin_lock(&__br_write_locks[idx].lock);
- for (i = 0; i < NR_CPUS; i++)
- if (__brlock_array[i][idx] != 0) {
- _raw_spin_unlock(&__br_write_locks[idx].lock);
- barrier();
- cpu_relax();
- goto again;
- }
-}
-
-void __br_write_unlock (enum brlock_indices idx)
-{
- spin_unlock(&__br_write_locks[idx].lock);
-}
-
-#endif /* __BRLOCK_USE_ATOMICS */
-
-#endif /* CONFIG_SMP */
diff -urN -X dontdiff linux-2.5.64/lib/Makefile linux-2.5-nobrlock/lib/Makefile
--- linux-2.5.64/lib/Makefile 2003-03-11 09:08:01.000000000 -0800
+++ linux-2.5-nobrlock/lib/Makefile 2003-03-10 15:38:10.000000000 -0800
@@ -8,7 +8,7 @@

L_TARGET := lib.a

-obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o \
+obj-y := errno.o ctype.o string.o vsprintf.o cmdline.o \
bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
kobject.o idr.o




2003-03-12 00:36:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] (8/8) Kill brlock


On Tue, 11 Mar 2003, David S. Miller wrote:
>
> Ok, I'm fine with this then. Linus you can apply all of his patches.

I'm a lazy bum, and I would _really_ want this tested more before it hits
my tree. I think it makes sense, but still..

Linus

2003-03-12 00:14:57

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] (8/8) Kill brlock

From: Stephen Hemminger <[email protected]>
Date: 11 Mar 2003 16:15:23 -0800

Previous patches killed all remaining uses of brlock so bye.

I'm all for this once patch 2/8 gets fixed up :-)

So what is the new way to say "stop all incoming packet
processing while I update data structure X"?

2003-03-12 00:52:43

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] (1/8) Eliminate brlock in psnap

Hi,

On 11 Mar 2003, Stephen Hemminger wrote:

> void unregister_snap_client(struct datalink_proto *proto)
> {
> - br_write_lock_bh(BR_NETPROTO_LOCK);
> + static RCU_HEAD(snap_rcu);
>
> - list_del(&proto->node);
> - kfree(proto);
> + spin_lock_bh(&snap_lock);
> + list_del_rcu(&proto->node);
> + spin_unlock_bh(&snap_lock);
>
> - br_write_unlock_bh(BR_NETPROTO_LOCK);
> + call_rcu(&snap_rcu, (void (*)(void *)) kfree, proto);
> }

Is this really correct? What happens with snap_rcu, if
unregister_snap_client is called again, before the call_rcu callback
finished?

bye, Roman

2003-03-12 01:05:29

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] (1/8) Eliminate brlock in psnap

On Tue, 11 Mar 2003, Stephen Hemminger wrote:

> void unregister_snap_client(struct datalink_proto *proto)
> {
> - br_write_lock_bh(BR_NETPROTO_LOCK);
> + static RCU_HEAD(snap_rcu);
>
> - list_del(&proto->node);
> - kfree(proto);
> + spin_lock_bh(&snap_lock);
> + list_del_rcu(&proto->node);
> + spin_unlock_bh(&snap_lock);
>
> - br_write_unlock_bh(BR_NETPROTO_LOCK);
> + call_rcu(&snap_rcu, (void (*)(void *)) kfree, proto);
> }

Do we need the spin_lock_bh around the list_del_rcu? But also How
about. This way we don't change the previous characteristic of block till
done unregistering

struct datalink_proto {
...
struct completion registration;
};

void __unregister_snap_client(void *__proto)
{
struct datalink_proto *proto = __proto;
complete(&proto->registration);
}

unregister_snap_client(struct datalink_proto *proto)
{
list_del_rcu(&proto->node);
call_rcu(&snap_rcu, __unregister_snap_client, proto);
wait_for_completion(&proto->registration);
kfree(proto);
}

2003-03-12 01:12:51

by Alan

[permalink] [raw]
Subject: Re: [PATCH] (8/8) Kill brlock

On Wed, 2003-03-12 at 00:44, Linus Torvalds wrote:
> On Tue, 11 Mar 2003, David S. Miller wrote:
> >
> > Ok, I'm fine with this then. Linus you can apply all of his patches.
>
> I'm a lazy bum, and I would _really_ want this tested more before it hits
> my tree. I think it makes sense, but still..

If Linus is scared ;) then throw them at me for -ac by all means. Anyone
running -ac IDE test sets is brave enough to run rcu network code 8)

2003-03-12 01:18:05

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] (5/8) Eliminate brlock from netfilter

On Tue, 11 Mar 2003, Stephen Hemminger wrote:

> void nf_unregister_hook(struct nf_hook_ops *reg)
> {
> - br_write_lock_bh(BR_NETPROTO_LOCK);
> + spin_lock_bh(&nf_lock);
> list_del(&reg->list);
> - br_write_unlock_bh(BR_NETPROTO_LOCK);
> + spin_unlock_bh(&nf_lock);
> +
> + synchronize_kernel();
> }

Won't that have to be list_del_rcu there?

Zwane
--
function.linuxpower.ca

2003-03-12 01:26:13

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH] (0/8) replace brlock with RCU

Hi,

On Tue, 11 Mar 2003, David S. Miller wrote:

> I'm fine with it, as long as I get shown how to get the equivalent
> atomic sequence using the new primitives. Ie. is there still a way
> to go:
>
> stop_all_incoming_packets();
> do_something();
> resume_all_incoming_packets();
>
> with the new stuff?

BTW if anyone is interested in a brlock implementation, which can offer
this property, but can also beat rcu, you might want to look at this
patch http://marc.theaimsgroup.com/?l=linux-kernel&m=104733360501112&w=2

bye, Roman

2003-03-12 05:45:26

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] (8/8) Kill brlock

Alan Cox <[email protected]> writes:

> If Linus is scared ;) then throw them at me for -ac by all means. Anyone
> running -ac IDE test sets is brave enough to run rcu network code 8)

It's unlikely to cause problems, unless you start/stop tcpdump all day.

Protocol addition/deletion is really rare.

-Andi

2003-03-12 06:25:23

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] (8/8) Kill brlock

From: Andi Kleen <[email protected]>
Date: 12 Mar 2003 06:56:09 +0100

Alan Cox <[email protected]> writes:

> If Linus is scared ;) then throw them at me for -ac by all means. Anyone
> running -ac IDE test sets is brave enough to run rcu network code 8)

It's unlikely to cause problems, unless you start/stop tcpdump all day.

Protocol addition/deletion is really rare.

True, and this is why people should wrap their brains around
the patch or stress these specific cases really hard.

Rusty's module stress testers might be useful here.

2003-03-12 21:04:06

by Paul McKenney

[permalink] [raw]
Subject: Re: [PATCH] (1/8) Eliminate brlock in psnap





> On Tue, 11 Mar 2003, Stephen Hemminger wrote:
>
> > void unregister_snap_client(struct datalink_proto *proto)
> > {
> > - br_write_lock_bh(BR_NETPROTO_LOCK);
> > + static RCU_HEAD(snap_rcu);
> >
> > - list_del(&proto->node);
> > - kfree(proto);
> > + spin_lock_bh(&snap_lock);
> > + list_del_rcu(&proto->node);
> > + spin_unlock_bh(&snap_lock);
> >
> > - br_write_unlock_bh(BR_NETPROTO_LOCK);
> > + call_rcu(&snap_rcu, (void (*)(void *)) kfree, proto);
> > }
>
> Do we need the spin_lock_bh around the list_del_rcu? But also How
> about. This way we don't change the previous characteristic of block till
> done unregistering
>
> struct datalink_proto {
> ...
> struct completion registration;
> };
>
> void __unregister_snap_client(void *__proto)
> {
> struct datalink_proto *proto = __proto;
> complete(&proto->registration);
> }
>
> unregister_snap_client(struct datalink_proto *proto)
> {
> list_del_rcu(&proto->node);
> call_rcu(&snap_rcu, __unregister_snap_client, proto);
> wait_for_completion(&proto->registration);
> kfree(proto);
> }

You are saying that we can omit locking because this is
called only from a module-exit function, and thus protected
by the module_mutex semaphore? (I must defer to
others who have a better handle on modules...)

If in fact only one module-exit function can be
executing at a given time, then we should be able to
use the following approach:

void unregister_snap_client(struct datalink_proto *proto)
{
list_del_rcu(&proto->node); /* protected by module_mutex. */
synchronize_kernel();
kfree(proto);
}

If multiple module-exit functions can somehow execute
concurrently, then something like the following would
be needed:

void unregister_snap_client(struct datalink_proto *proto)
{
spin_lock_bh(&snap_lock);
list_del_rcu(&proto->node);
spin_unlock_bh(&snap_lock);
synchronize_kernel();
kfree(proto);
}

Module unloading should be rare enough to tolerate
the grace period under the module_mutex, right?

Thoughts?

Thanx, Paul

2003-03-13 00:46:54

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH] (1/8) Eliminate brlock in psnap

On Wed, 12 Mar 2003, Paul McKenney wrote:

> You are saying that we can omit locking because this is
> called only from a module-exit function, and thus protected
> by the module_mutex semaphore? (I must defer to
> others who have a better handle on modules...)
>
> If in fact only one module-exit function can be
> executing at a given time, then we should be able to
> use the following approach:

Yes the ->exit call is protected by module_mutex globally.

> Module unloading should be rare enough to tolerate
> the grace period under the module_mutex, right?
>
> Thoughts?

I would agree. However can't unregister_snap_client be used in other paths
apart from module_unload? I wouldn't worry too much since if
register_snap_client and unregister_snap_client for the same protocol
races it's a bug in the caller's code. The safe RCU list removal and
synchronize_kernel should protect us from sane usage.

Zwane

2003-03-13 22:57:38

by Paul McKenney

[permalink] [raw]
Subject: Re: [PATCH] (1/8) Eliminate brlock in psnap





> > Module unloading should be rare enough to tolerate
> > the grace period under the module_mutex, right?
> >
> > Thoughts?
>
> I would agree. However can't unregister_snap_client be used in other
paths
> apart from module_unload? I wouldn't worry too much since if
> register_snap_client and unregister_snap_client for the same protocol
> races it's a bug in the caller's code. The safe RCU list removal and
> synchronize_kernel should protect us from sane usage.
>
> Zwane

It is not called from elsewhere right now, but if there
is a possibility that it might be, Steve of course should
keep the locking in these functions.

Thanx, Paul