2002-04-10 01:51:12

by Mike Fedyk

[permalink] [raw]
Subject: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6

Hi,

*read below for a problem report against the tulip network driver[1]

It looks like the bridging feature of the kernel has changed between 2.4.17
and 18.

I have a machine running 2.4.16 running as a bride without problem. While
testing the network cards for another bridge I had to restart.
Unfortunately, it didn't complete. Nothing would respond except for sysrq.
I was able to sync, unmount, and reboot successfully.

Debian does this for you automatically, but if you do it manually it does
the same thing.

o Setup a bridge like normal
o start it up
o ifconfig br0 down
o brctl delbr br0
*boom*

2.4.16,17 work just fine, but 2.4.18 does not. Looking through the
changelog, 2.4.18-pre4, or 5 look suspect. Also, it is not fixed in
2.4.19-pre6. I can test the 2.4.18-pre kernels if you'd like, just let me
know.

[1] Also several times I was able to get my tulip network card to stop
forwarding packets after a soft-reboot from the lockup mentioned above. I
was able to reproduce this in 2.4.17, 2.4.18, 2.4.19-pre3-ac4 (possibly
2.4.19-pre5-ac3 though didn't check to make sure but I used it several times
during my testing), and 2.4.19-pre6. None of the other network cards in
this machine did this (3c509-isa, pcnet32-pci, eepro100-pci)

I don't have a serial cable ATM but I could buy one in a few days if a
ksymoopsed sysrq-t would help.

00:00.0 Host bridge: Intel Corp. 430HX - 82439HX TXC [Triton II] (rev 03)
00:01.0 ISA bridge: Intel Corp. 82371SB PIIX3 ISA [Natoma/Triton II] (rev 01)
00:01.1 IDE interface: Intel Corp. 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB Controller: Intel Corp. 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:06.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
00:07.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 02)
00:08.0 VGA compatible controller: S3 Inc. 86c764/765 [Trio32/64/64V+] (rev 54)
00:0b.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet LANCE] (rev 16)

Mike


2002-04-10 03:58:38

by Andreas Dilger

[permalink] [raw]
Subject: Re: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6

On Apr 09, 2002 18:53 -0700, Mike Fedyk wrote:
> I have a machine running 2.4.16 running as a bride without problem.

Well, I know some people get attached to their machines, but you will
find that your new bride will age a lot faster than you ;-).

You may also want to contact Lennert Buytenhek <[email protected]>, the
bridge code maintainer.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

2002-04-10 15:01:07

by Lennert Buytenhek

[permalink] [raw]
Subject: Re: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6


On Tue, Apr 09, 2002 at 06:53:11PM -0700, Mike Fedyk wrote:

> 2.4.16,17 work just fine, but 2.4.18 does not. Looking through the
> changelog, 2.4.18-pre4, or 5 look suspect. Also, it is not fixed in
> 2.4.19-pre6. I can test the 2.4.18-pre kernels if you'd like, just let me
> know.

There's a detailed list of patches that went into 2.4 at:

http://bridge.sourceforge.net/patchtracker.html

Can you try and find out which of the three patches that went
into 2.4.18 is causing your problems?


thanks,
Lennert

2002-04-10 18:13:59

by Mike Fedyk

[permalink] [raw]
Subject: Re: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6

On Tue, Apr 09, 2002 at 06:53:11PM -0700, Mike Fedyk wrote:
> Hi,
>
> *read below for a problem report against the tulip network driver[1]
>
> It looks like the bridging feature of the kernel has changed between 2.4.17
> and 18.
>
> I have a machine running 2.4.16 running as a bride without problem. While

Oh, yes s/bride/bridge/ ... Nice one Andreas. Women age so much better
than computers, but are just as much harder to change also. ;)

> testing the network cards for another bridge I had to restart.
> Unfortunately, it didn't complete. Nothing would respond except for sysrq.
> I was able to sync, unmount, and reboot successfully.
>
> Debian does this for you automatically, but if you do it manually it does
> the same thing.
>
> o Setup a bridge like normal
> o start it up
> o ifconfig br0 down
> o brctl delbr br0
> *boom*
>
> 2.4.16,17 work just fine, but 2.4.18 does not. Looking through the
> changelog, 2.4.18-pre4, or 5 look suspect. Also, it is not fixed in
> 2.4.19-pre6. I can test the 2.4.18-pre kernels if you'd like, just let me
> know.
>
> [1] Also several times I was able to get my tulip network card to stop
> forwarding packets after a soft-reboot from the lockup mentioned above. I

I forgot to mention that on each occourance, if I turn the power off, wait a
few seconds, and turn it back on, all network cards work normally including
tulip. It just looks like the chip isn't setup correctly[2] if it wasn't
already in a known state.

[2] I only say this because a power-down "fixes" the problem whereas a
sysctl-s-u-b soft-reset does not. It could be the bios on this old machine
too.

As requested by Jeff Garzik, I'm going to copy the latest tulip driver from
2.4.19-pre6 to 2.4.17 and test it there. Lennert, I'll test the three
bridgeing patches on top of that and let you know which one(s) cause the
problem with bridge removal.

Mike

2002-04-11 00:46:58

by Mike Fedyk

[permalink] [raw]
Subject: Re: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6

On Wed, Apr 10, 2002 at 11:16:06AM -0700, Mike Fedyk wrote:
> As requested by Jeff Garzik, I'm going to copy the latest tulip driver from
> 2.4.19-pre6 to 2.4.17 and test it there. Lennert, I'll test the three
> bridgeing patches on top of that and let you know which one(s) cause the
> problem with bridge removal.
>

2.4.18_fix_port_state_handling.diff

Is causing the problem on 2.4.17-19p6tulip-br0fpsh. I haven't tested the
other patches though.

I'm going to reverse this patch on 2.4.19-pre6 to see if it fixes it there
too. Stay tuned.

Mike

2002-04-11 01:03:02

by Mike Fedyk

[permalink] [raw]
Subject: Re: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6

On Wed, Apr 10, 2002 at 05:49:11PM -0700, Mike Fedyk wrote:
> 2.4.18_fix_port_state_handling.diff
>
> Is causing the problem on 2.4.17-19p6tulip-br0fpsh. I haven't tested the
> other patches though.
>
> I'm going to reverse this patch on 2.4.19-pre6 to see if it fixes it there
> too. Stay tuned.

2.4.18_enslave_bridge_dev_to_bridge_dev.diff

Is fine I didn't reproduce the trouble in 2.4.17-19p6tulip-br0ebdtbd (it was
already compiling when I tested the port_state kernel...).

2.4.19-pre6-nobr0fpsh building now...

Mike

2002-04-11 01:38:23

by Mike Fedyk

[permalink] [raw]
Subject: Re: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6

On Wed, Apr 10, 2002 at 06:05:15PM -0700, Mike Fedyk wrote:
> On Wed, Apr 10, 2002 at 05:49:11PM -0700, Mike Fedyk wrote:
> > 2.4.18_fix_port_state_handling.diff
> >
> > Is causing the problem on 2.4.17-19p6tulip-br0fpsh. I haven't tested the
> > other patches though.
> >
> > I'm going to reverse this patch on 2.4.19-pre6 to see if it fixes it there
> > too. Stay tuned.
>
> 2.4.18_enslave_bridge_dev_to_bridge_dev.diff
>
> Is fine I didn't reproduce the trouble in 2.4.17-19p6tulip-br0ebdtbd (it was
> already compiling when I tested the port_state kernel...).
>
> 2.4.19-pre6-nobr0fpsh building now...

Yep, reversing 2.4.18_fix_port_state_handling.diff fixed it.

Here's a patch to reverse it for you:
--- 2.4.19-pre6/net/bridge/br_input.c Mon Feb 25 11:38:14 2002
+++ 2.4.19-pre6-nobr0fpsh/net/bridge/br_input.c Wed Apr 10 18:04:17 2002
@@ -5,7 +5,7 @@
* Authors:
* Lennert Buytenhek <[email protected]>
*
- * $Id: br_input.c,v 1.9.2.1 2001/12/24 04:50:05 davem Exp $
+ * $Id: br_input.c,v 1.9 2001/08/14 22:05:57 davem Exp $
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
@@ -46,7 +46,7 @@
br_pass_frame_up_finish);
}

-static int br_handle_frame_finish(struct sk_buff *skb)
+static void __br_handle_frame(struct sk_buff *skb)
{
struct net_bridge *br;
unsigned char *dest;
@@ -57,112 +57,103 @@
dest = skb->mac.ethernet->h_dest;

p = skb->dev->br_port;
- if (p == NULL)
- goto err_nolock;
-
br = p->br;
- read_lock(&br->lock);
- if (skb->dev->br_port == NULL)
- goto err;
-
passedup = 0;
+
+ if (!(br->dev.flags & IFF_UP) ||
+ p->state == BR_STATE_DISABLED)
+ goto freeandout;
+
if (br->dev.flags & IFF_PROMISC) {
struct sk_buff *skb2;

skb2 = skb_clone(skb, GFP_ATOMIC);
- if (skb2 != NULL) {
+ if (skb2) {
passedup = 1;
br_pass_frame_up(br, skb2);
}
}

+ if (skb->mac.ethernet->h_source[0] & 1)
+ goto freeandout;
+
+ if (!passedup &&
+ (dest[0] & 1) &&
+ (br->dev.flags & IFF_ALLMULTI || br->dev.mc_list != NULL)) {
+ struct sk_buff *skb2;
+
+ skb2 = skb_clone(skb, GFP_ATOMIC);
+ if (skb2) {
+ passedup = 1;
+ br_pass_frame_up(br, skb2);
+ }
+ }
+
+ if (br->stp_enabled &&
+ !memcmp(dest, bridge_ula, 5) &&
+ !(dest[5] & 0xF0))
+ goto handle_special_frame;
+
+ if (p->state == BR_STATE_LEARNING ||
+ p->state == BR_STATE_FORWARDING)
+ br_fdb_insert(br, p, skb->mac.ethernet->h_source, 0);
+
+ if (p->state != BR_STATE_FORWARDING)
+ goto freeandout;
+
if (dest[0] & 1) {
- br_flood_forward(br, skb, !passedup);
+ br_flood_forward(br, skb, 1);
if (!passedup)
br_pass_frame_up(br, skb);
- goto out;
+ else
+ kfree_skb(skb);
+ return;
}

dst = br_fdb_get(br, dest);
+
if (dst != NULL && dst->is_local) {
if (!passedup)
br_pass_frame_up(br, skb);
else
kfree_skb(skb);
br_fdb_put(dst);
- goto out;
+ return;
}

if (dst != NULL) {
br_forward(dst->dst, skb);
br_fdb_put(dst);
- goto out;
+ return;
}

br_flood_forward(br, skb, 0);
+ return;

-out:
- read_unlock(&br->lock);
- return 0;
+ handle_special_frame:
+ if (!dest[5]) {
+ br_stp_handle_bpdu(skb);
+ return;
+ }

-err:
- read_unlock(&br->lock);
-err_nolock:
+ freeandout:
kfree_skb(skb);
- return 0;
}

-void br_handle_frame(struct sk_buff *skb)
+static int br_handle_frame_finish(struct sk_buff *skb)
{
struct net_bridge *br;
- unsigned char *dest;
- struct net_bridge_port *p;

- dest = skb->mac.ethernet->h_dest;
-
- p = skb->dev->br_port;
- if (p == NULL)
- goto err_nolock;
-
- br = p->br;
+ br = skb->dev->br_port->br;
read_lock(&br->lock);
- if (skb->dev->br_port == NULL)
- goto err;
-
- if (!(br->dev.flags & IFF_UP) ||
- p->state == BR_STATE_DISABLED)
- goto err;
-
- if (skb->mac.ethernet->h_source[0] & 1)
- goto err;
-
- if (p->state == BR_STATE_LEARNING ||
- p->state == BR_STATE_FORWARDING)
- br_fdb_insert(br, p, skb->mac.ethernet->h_source, 0);
-
- if (br->stp_enabled &&
- !memcmp(dest, bridge_ula, 5) &&
- !(dest[5] & 0xF0))
- goto handle_special_frame;
-
- if (p->state == BR_STATE_FORWARDING) {
- NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
- br_handle_frame_finish);
- read_unlock(&br->lock);
- return;
- }
-
-err:
+ __br_handle_frame(skb);
read_unlock(&br->lock);
-err_nolock:
- kfree_skb(skb);
- return;

-handle_special_frame:
- if (!dest[5]) {
- br_stp_handle_bpdu(skb);
- return;
- }
+ return 0;
+}

- kfree_skb(skb);
+void br_handle_frame(struct sk_buff *skb)
+{
+ NF_HOOK(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
+ br_handle_frame_finish);
}

2002-04-11 02:51:23

by David Miller

[permalink] [raw]
Subject: Re: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6

From: Mike Fedyk <[email protected]>
Date: Wed, 10 Apr 2002 18:40:35 -0700

On Wed, Apr 10, 2002 at 06:05:15PM -0700, Mike Fedyk wrote:
> 2.4.19-pre6-nobr0fpsh building now...

Yep, reversing 2.4.18_fix_port_state_handling.diff fixed it.

Thanks for tracking this down so precisely. If every bug reporter
actually bothered to do this work, it'd take a lot less work to
fix most bugs :-)

Lennert, just let me know when you have a fix :-)

2002-04-11 20:22:10

by Mike Fedyk

[permalink] [raw]
Subject: Re: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6

On Wed, Apr 10, 2002 at 07:44:11PM -0700, David S. Miller wrote:
> From: Mike Fedyk <[email protected]>
> Date: Wed, 10 Apr 2002 18:40:35 -0700
>
> On Wed, Apr 10, 2002 at 06:05:15PM -0700, Mike Fedyk wrote:
> > 2.4.19-pre6-nobr0fpsh building now...
>
> Yep, reversing 2.4.18_fix_port_state_handling.diff fixed it.
>
> Thanks for tracking this down so precisely. If every bug reporter
> actually bothered to do this work, it'd take a lot less work to
> fix most bugs :-)
>

Thanks. :)

> Lennert, just let me know when you have a fix :-)
>

Couldn't have done it if Lennert didn't keep those applied patches
available, and separated. :)

Mike

2002-04-23 22:58:01

by Mike Fedyk

[permalink] [raw]
Subject: [PATCH] Re: [BUG] DEADLOCK when removing a bridge on 2.4.19-pre6

On Wed, Apr 10, 2002 at 06:40:35PM -0700, Mike Fedyk wrote:
> On Wed, Apr 10, 2002 at 06:05:15PM -0700, Mike Fedyk wrote:
> > On Wed, Apr 10, 2002 at 05:49:11PM -0700, Mike Fedyk wrote:
> > > 2.4.18_fix_port_state_handling.diff
> > >
> > > Is causing the problem on 2.4.17-19p6tulip-br0fpsh. I haven't tested the
> > > other patches though.
> > >
> > > I'm going to reverse this patch on 2.4.19-pre6 to see if it fixes it there
> > > too. Stay tuned.
> >
> > 2.4.18_enslave_bridge_dev_to_bridge_dev.diff
> >
> > Is fine I didn't reproduce the trouble in 2.4.17-19p6tulip-br0ebdtbd (it was
> > already compiling when I tested the port_state kernel...).
> >
> > 2.4.19-pre6-nobr0fpsh building now...
>
> Yep, reversing 2.4.18_fix_port_state_handling.diff fixed it.
>

Instead of reversing the patch, use this patch from Lennert Buytenhek instead:

Note: this problem should only show up on smp kernels(unvarified), it showed
up with a smp kernel on a UP system for me.

--- linux-2.4.18/net/bridge/br_input.c.orig Thu Apr 18 11:50:16 2002
+++ linux-2.4.18/net/bridge/br_input.c Thu Apr 18 11:50:26 2002
@@ -161,8 +161,10 @@
handle_special_frame:
if (!dest[5]) {
br_stp_handle_bpdu(skb);
+ read_unlock(&br->lock);
return;
}

+ read_unlock(&br->lock);
kfree_skb(skb);
}