2005-01-31 16:22:11

by Mirko Parthey

[permalink] [raw]
Subject: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

My Debian machine hangs during shutdown, with messages like this:
unregister_netdevice: waiting for br0 to become free. Usage count = 1

I narrowed it down to the command
# brctl delbr br0
which does not return in the circumstances shown below.

The problem is reproducible with both 2.6.11-rc2 from kernel.org and the
Debian kernel-image-2.6.10-1-686.

My .config is taken from the above mentioned Debian kernel, adapted to
2.6.11-rc2 and the processor type set to Pentium Classic. (I can email
the .config on request).

How to reproduce the problem (I tried this on a Pentium 4 machine):

boot: linux init=/bin/bash
[...booting...]
# mount proc -t proc /proc
# ifconfig lo 127.0.0.1
# brctl addbr br0
# modprobe e100 # also reproducible with 3c5x9
# brctl addif br0 eth0
# ifconfig br0 192.168.1.1
# ifconfig eth0 up
# ifconfig lo down
# lsmod | grep bridge
bridge 48888 0
# brctl delif br0 eth0
# ifconfig br0 down
# brctl delbr br0
unregister_netdevice: waiting for br0 to become free. Usage count = 1
unregister_netdevice: waiting for br0 to become free. Usage count = 1
[...this message again and again, but no progress...]

I'm also surprised that the reference count for the bridge module is
zero, even when it is in use.

Please let me know if you need any further information,
and please Cc me on replies (I am not subscribed to linux-kernel).

Thanks,
Mirko


2005-01-31 16:41:25

by Matthias-Christian Ott

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

Mirko Parthey wrote:

>My Debian machine hangs during shutdown, with messages like this:
>unregister_netdevice: waiting for br0 to become free. Usage count = 1
>
>I narrowed it down to the command
> # brctl delbr br0
>which does not return in the circumstances shown below.
>
>The problem is reproducible with both 2.6.11-rc2 from kernel.org and the
>Debian kernel-image-2.6.10-1-686.
>
>My .config is taken from the above mentioned Debian kernel, adapted to
>2.6.11-rc2 and the processor type set to Pentium Classic. (I can email
>the .config on request).
>
>How to reproduce the problem (I tried this on a Pentium 4 machine):
>
>boot: linux init=/bin/bash
>[...booting...]
># mount proc -t proc /proc
># ifconfig lo 127.0.0.1
># brctl addbr br0
># modprobe e100 # also reproducible with 3c5x9
># brctl addif br0 eth0
># ifconfig br0 192.168.1.1
># ifconfig eth0 up
># ifconfig lo down
># lsmod | grep bridge
>bridge 48888 0
># brctl delif br0 eth0
># ifconfig br0 down
># brctl delbr br0
>unregister_netdevice: waiting for br0 to become free. Usage count = 1
>unregister_netdevice: waiting for br0 to become free. Usage count = 1
>[...this message again and again, but no progress...]
>
>I'm also surprised that the reference count for the bridge module is
>zero, even when it is in use.
>
>Please let me know if you need any further information,
>and please Cc me on replies (I am not subscribed to linux-kernel).
>
>Thanks,
>Mirko
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>
Hi!
I have a similar problem with my wlan modul, this patch should fix it (I
disables the check function [untested]):

diff -Nru linux-2.6.11-rc2/net/core/dev.c
linux-2.6.11-rc2-ott/net/core/dev.c
--- linux-2.6.11-rc2/net/core/dev.c 2005-01-26 22:27:31.000000000 +0100
+++ linux-2.6.11-rc2-ott/net/core/dev.c 2005-01-31 00:47:34.000000000
+0100
@@ -2974,7 +2974,7 @@
netdev_unregister_sysfs(dev);
dev->reg_state = NETREG_UNREGISTERED;

- netdev_wait_allrefs(dev);
+ //netdev_wait_allrefs(dev);

/* paranoia */
BUG_ON(atomic_read(&dev->refcnt));

Matthias-Christian Ott

2005-02-01 17:36:32

by Mirko Parthey

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Mon, Jan 31, 2005 at 05:22:02PM +0100, wrote:
> My Debian machine hangs during shutdown, with messages like this:
> unregister_netdevice: waiting for br0 to become free. Usage count = 1
>
> I narrowed it down to the command
> # brctl delbr br0
> which does not return in the circumstances shown below.
>
> The problem is reproducible with both 2.6.11-rc2 from kernel.org and the
> Debian kernel-image-2.6.10-1-686.
> [...]

The problem was introduced between 2.6.8.1 and 2.6.9,
and it is still present in 2.6.11-rc2-bk9.

Mirko

2005-02-02 15:09:04

by Matthias-Christian Ott

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

Mirko Parthey wrote:

>On Mon, Jan 31, 2005 at 05:22:02PM +0100, wrote:
>
>
>>My Debian machine hangs during shutdown, with messages like this:
>>unregister_netdevice: waiting for br0 to become free. Usage count = 1
>>
>>I narrowed it down to the command
>> # brctl delbr br0
>>which does not return in the circumstances shown below.
>>
>>The problem is reproducible with both 2.6.11-rc2 from kernel.org and the
>>Debian kernel-image-2.6.10-1-686.
>>[...]
>>
>>
>
>The problem was introduced between 2.6.8.1 and 2.6.9,
>and it is still present in 2.6.11-rc2-bk9.
>
>Mirko
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>
Jep it appears in 2.6.8 and above. I don't know the process who's using it:

Before shutdown:
PID TTY TIME CMD
1 ? 00:00:00 init
2 ? 00:00:00 ksoftirqd/0
3 ? 00:00:00 events/0
4 ? 00:00:00 khelper
9 ? 00:00:00 kthread
19 ? 00:00:00 kacpid
17 ? 00:00:00 vesafb
100 ? 00:00:00 kblockd/0
164 ? 00:00:00 pdflush
165 ? 00:00:00 pdflush
166 ? 00:00:00 kswapd0
167 ? 00:00:00 aio/0
168 ? 00:00:00 jfsIO
169 ? 00:00:00 jfsCommit
170 ? 00:00:00 jfsSync
171 ? 00:00:00 xfslogd/0
172 ? 00:00:00 xfsdatad/0
173 ? 00:00:00 xfsbufd
774 ? 00:00:00 kseriod
862 ? 00:00:00 ata/0
882 ? 00:00:00 kcryptd/0
883 ? 00:00:00 kmirrord/0
887 ? 00:00:00 xfssyncd
939 ? 00:00:00 udevd
1061 ? 00:00:00 devfsd
1242 ? 00:00:00 khubd
1616 ? 00:00:00 syslog-ng
2562 ? 00:00:00 khpsbpkt
2646 ? 00:00:00 knodemgrd_0
3987 ? 00:00:00 crond
4027 tty2 00:00:00 agetty
4028 tty3 00:00:00 agetty
4029 tty4 00:00:00 agetty
4030 tty5 00:00:00 agetty
4031 tty6 00:00:00 agetty
4473 tty1 00:00:00 bash
4553 tty1 00:00:00 ps

And my System can't shutdown because of this bug. Currently disabling
this checking function seems to be the only solution (see my diff).

Matthias-Christian Ott

2005-02-05 05:25:31

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Mon, Jan 31, 2005 at 04:22:02PM +0000, Mirko Parthey wrote:
>
> How to reproduce the problem (I tried this on a Pentium 4 machine):
>
> boot: linux init=/bin/bash
> [...booting...]
> # mount proc -t proc /proc
> # ifconfig lo 127.0.0.1
> # brctl addbr br0
> # modprobe e100 # also reproducible with 3c5x9
> # brctl addif br0 eth0
> # ifconfig br0 192.168.1.1
> # ifconfig eth0 up
> # ifconfig lo down

This is the key to the problem.

It took me a while to find the cause of this. Along the way I
found a few other ref counting bugs in this area as well.

All of these bugs stem from the idev reference held in rtable/rt6_info.
Looking back in my mailbox, it's amazing how many problems this piece
of infrastructure has caused since it was installed last June. AFAIK
to this day there is still no code in the kernel that actually uses
this infrastructure.

Anyway, this particular problem is due to IPv6 adding local addresses
with split devices. That is, routes to local addresses are added with
rt6i_dev set to &loopback_dev and rt6i_idev set to the idev of the
device where the address is added.

This is just not going to work unless IPv6 creates its own dst garbage
collection routine since the generic GC in net/core/dst.c has no way
of finding all the rt6_info's referring to a specific net_device through
rt6i_idev.

It is also different from the IPv4 behaviour where we set both dev
and idev to loopback_dev.

Now this stuff is apparently going to be used for IP statistics. As
it is packets sent to/received from local addresses are counted against
the loopback device. Linux has behaved this way for a long time. So
when these statistics actually get implemented it will be very strange
if they were accounted to the device owning the local addresses instead
of loopback.

It also goes against the Linux philosophy where the addresses are owned
by the host, not the interface.

Therefore I propose the simple solution of not doing the split device
accounting in rt6_info.

Signed-off-by: Herbert Xu <[email protected]>

I will send the patches for the other bugs separately.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Attachments:
(No filename) (2.28 kB)
q (661.00 B)
Download all attachments

2005-02-05 05:46:05

by David Miller

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, 5 Feb 2005 16:24:07 +1100
Herbert Xu <[email protected]> wrote:

> This is the key to the problem.
...
> All of these bugs stem from the idev reference held in rtable/rt6_info.
...
> Anyway, this particular problem is due to IPv6 adding local addresses
> with split devices. That is, routes to local addresses are added with
> rt6i_dev set to &loopback_dev and rt6i_idev set to the idev of the
> device where the address is added.
...
> It also goes against the Linux philosophy where the addresses are owned
> by the host, not the interface.
>
> Therefore I propose the simple solution of not doing the split device
> accounting in rt6_info.

I agree with your analysis, however... this change is not sufficient.
You have to then walk over all the uses of rt6i_dev and sanitize the
cases that still expect the split semantics. For example, things like
this piece of coe in rt6_device_match():

if (dev->flags & IFF_LOOPBACK) {
if (sprt->rt6i_idev == NULL ||
sprt->rt6i_idev->dev->ifindex != oif) {
if (strict && oif)
continue;
if (local && (!oif ||
local->rt6i_idev->dev->ifindex == oif))
continue;
}
local = sprt;
}

It is just the first such thing I found, scanning rt6i_idev uses
will easily find several others.

2005-02-05 06:14:11

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Fri, Feb 04, 2005 at 09:38:13PM -0800, David S. Miller wrote:
>
> It is just the first such thing I found, scanning rt6i_idev uses
> will easily find several others.

You're right of course. I thought they were all harmless but I was
obviously wrong about this one.

So here is a patch that essentially reverts the split devices
semantics introduced by these two changesets:

[IPV6] addrconf_dst_alloc() to allocate new route for local address.
[IPV6] take rt6i_idev into account when looking up routes.

Signed-off-by: Herbert Xu ~{PmV>HI~} <[email protected]>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Attachments:
(No filename) (800.00 B)
p (2.91 kB)
Download all attachments

2005-02-05 06:24:00

by David Miller

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, 5 Feb 2005 17:11:10 +1100
Herbert Xu <[email protected]> wrote:

> You're right of course. I thought they were all harmless but I was
> obviously wrong about this one.
>
> So here is a patch that essentially reverts the split devices
> semantics introduced by these two changesets:
>
> [IPV6] addrconf_dst_alloc() to allocate new route for local address.
> [IPV6] take rt6i_idev into account when looking up routes.
>
> Signed-off-by: Herbert Xu ~{PmV>HI~} <[email protected]>

Ok.

But Herbert, let's take a step back real quick because I want
to point something out. IPv6 does try to handle the dangling
mismatched idev's, in route.c:ip6_dst_ifdown(), this is called
via net/core/dst.c:dst_ifdown(), and this releases the ipv6
idev correctly in the split device case.

Did your analysis of this bridging release bug take this into
account? That's why we added this dst->ops method, specifically
to handle this problem.

This was added by Yoshifuji-san in ChangeSet 1.1722.137.17 which
has the checking comment:

[NET]: Add dst->ifdown callback.

Use it to release protocol specific objects that may be
tied to a dst cache object, at ifdown time. Currently
this is used to release ipv4/ipv6 specific device state.

2005-02-05 06:47:28

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Fri, Feb 04, 2005 at 10:13:44PM -0800, David S. Miller wrote:
>
> But Herbert, let's take a step back real quick because I want
> to point something out. IPv6 does try to handle the dangling
> mismatched idev's, in route.c:ip6_dst_ifdown(), this is called
> via net/core/dst.c:dst_ifdown(), and this releases the ipv6
> idev correctly in the split device case.
>
> Did your analysis of this bridging release bug take this into
> account? That's why we added this dst->ops method, specifically
> to handle this problem.

This doesn't work because net/core/dst.c can only search based
on dst->dev. For the split device case, dst->dev is set to
loopback_dev while rt6i_idev is set to the real device.

Therefore net/core/dst.c stands no chance of finding the correct
local address routes that it needs to process.

If we wanted to preserve the split device semantics, then we
can create a local GC list in IPv6 so that it can search based
on rt6i_idev as well as the other keys. Alternatively we can
remove the dst->dev == dev check in dst_dev_event and dst_ifdown
and move that test down to the individual ifdown functions.

However, IMHO the split device semantics is inconsistent with
the philosphy that addresses belong to the host and not the
interface. So it doesn't really make sense in the current
networking stack.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-05 10:51:05

by YOSHIFUJI Hideaki

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

In article <[email protected]> (at Sat, 5 Feb 2005 17:46:43 +1100), Herbert Xu <[email protected]> says:

> If we wanted to preserve the split device semantics, then we
> can create a local GC list in IPv6 so that it can search based
> on rt6i_idev as well as the other keys. Alternatively we can
> remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> and move that test down to the individual ifdown functions.

Yes, IPv6 needs "split device" semantics
(for per-device statistics such as Ip6InDelivers etc),
and I like later solution.

Thanks.

--yoshfuji

2005-02-05 10:57:21

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, Feb 05, 2005 at 05:46:43PM +1100, herbert wrote:
>
> This doesn't work because net/core/dst.c can only search based
> on dst->dev. For the split device case, dst->dev is set to
> loopback_dev while rt6i_idev is set to the real device.

Although I still think this is a bug, I'm now starting to suspect
that there is another bug around as well.

There is probably an ifp leak which in turn leads to a split dst
leak that allows the first bug to make its mark.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-05 11:24:39

by Andre Tomt

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

Herbert Xu wrote:
> On Fri, Feb 04, 2005 at 09:38:13PM -0800, David S. Miller wrote:
>
>>It is just the first such thing I found, scanning rt6i_idev uses
>>will easily find several others.
>
>
> You're right of course. I thought they were all harmless but I was
> obviously wrong about this one.
>
> So here is a patch that essentially reverts the split devices
> semantics introduced by these two changesets:
<..>

This patch fixes my problems with hangs when dot1q VLAN interfaces gets
removed when loopback is down, as reported in the thread "2.6.10
ipv6/8021q lockup on vconfig on interface removal".

Yay :)

2005-02-05 11:40:10

by YOSHIFUJI Hideaki

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

In article <[email protected]> (at Sat, 05 Feb 2005 12:14:04 +0100), Andre Tomt <[email protected]> says:

> This patch fixes my problems with hangs when dot1q VLAN interfaces gets
> removed when loopback is down, as reported in the thread "2.6.10
> ipv6/8021q lockup on vconfig on interface removal".

Please tell me, why your lo is down...

Anyway, if we really want to "fix" this,
we should do in other way.

I think "Make loopback idev stick around" patches
(for IPv4 and IPv6) could be start of that.

--yoshfuji

2005-02-05 11:54:16

by YOSHIFUJI Hideaki

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

In article <[email protected]> (at Sat, 05 Feb 2005 12:48:15 +0100), Andre Tomt <[email protected]> says:

> > Please tell me, why your lo is down...
:
> "ifdown -a" gets run on shutdown and reboot here, and ifdown -a in
> Debian brings down loopback before any other interfaces.

Okay, thanks. (I now remember someone told me this before.) hmm...

--yoshfuji

2005-02-05 12:49:24

by Matthias-Christian Ott

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

Matthias-Christian Ott wrote:

> Mirko Parthey wrote:
>
>> On Mon, Jan 31, 2005 at 05:22:02PM +0100, wrote:
>>
>>
>>> My Debian machine hangs during shutdown, with messages like this:
>>> unregister_netdevice: waiting for br0 to become free. Usage count = 1
>>>
>>> I narrowed it down to the command
>>> # brctl delbr br0
>>> which does not return in the circumstances shown below.
>>>
>>> The problem is reproducible with both 2.6.11-rc2 from kernel.org and
>>> the
>>> Debian kernel-image-2.6.10-1-686.
>>> [...]
>>>
>>
>>
>> The problem was introduced between 2.6.8.1 and 2.6.9,
>> and it is still present in 2.6.11-rc2-bk9.
>>
>> Mirko
>> -
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>>
>>
> Jep it appears in 2.6.8 and above. I don't know the process who's
> using it:
>
> Before shutdown:
> PID TTY TIME CMD
> 1 ? 00:00:00 init
> 2 ? 00:00:00 ksoftirqd/0
> 3 ? 00:00:00 events/0
> 4 ? 00:00:00 khelper
> 9 ? 00:00:00 kthread
> 19 ? 00:00:00 kacpid
> 17 ? 00:00:00 vesafb
> 100 ? 00:00:00 kblockd/0
> 164 ? 00:00:00 pdflush
> 165 ? 00:00:00 pdflush
> 166 ? 00:00:00 kswapd0
> 167 ? 00:00:00 aio/0
> 168 ? 00:00:00 jfsIO
> 169 ? 00:00:00 jfsCommit
> 170 ? 00:00:00 jfsSync
> 171 ? 00:00:00 xfslogd/0
> 172 ? 00:00:00 xfsdatad/0
> 173 ? 00:00:00 xfsbufd
> 774 ? 00:00:00 kseriod
> 862 ? 00:00:00 ata/0
> 882 ? 00:00:00 kcryptd/0
> 883 ? 00:00:00 kmirrord/0
> 887 ? 00:00:00 xfssyncd
> 939 ? 00:00:00 udevd
> 1061 ? 00:00:00 devfsd
> 1242 ? 00:00:00 khubd
> 1616 ? 00:00:00 syslog-ng
> 2562 ? 00:00:00 khpsbpkt
> 2646 ? 00:00:00 knodemgrd_0
> 3987 ? 00:00:00 crond
> 4027 tty2 00:00:00 agetty
> 4028 tty3 00:00:00 agetty
> 4029 tty4 00:00:00 agetty
> 4030 tty5 00:00:00 agetty
> 4031 tty6 00:00:00 agetty
> 4473 tty1 00:00:00 bash
> 4553 tty1 00:00:00 ps
>
> And my System can't shutdown because of this bug. Currently disabling
> this checking function seems to be the only solution (see my diff).
>
> Matthias-Christian Ott
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Well I think my problem was not ipv6 problem Herbert discribed, it was
syslog-ng. I replaced it with metalog and it seems to work well (I'll
test it with 2.6.11-rc3 today).

Matthias-Christian Ott

2005-02-05 18:34:06

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, Feb 05, 2005 at 07:50:39PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> Yes, IPv6 needs "split device" semantics
> (for per-device statistics such as Ip6InDelivers etc),
> and I like later solution.

OK. Is there any reason why IPv4 should be different from IPv6 in
this respect though?

If the split device dst's are to be kept, we'll need a way to clean
them up. There are two choices:

1) Put the dst's on IPv6's own GC so that we can search by rt6i_idev.
2) Change the generic GC so that dst->ops->ifdown is always called even
if dst->dev does not match with the device going down. We also need to
change the individual ifdown functions to check for ->dev. The IPv6
ifdown function can then check for ->rt6i_idev as well.

What's your preference?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-05 18:38:24

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, Feb 05, 2005 at 08:39:00PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> I think "Make loopback idev stick around" patches
> (for IPv4 and IPv6) could be start of that.

Unfortunately that patch can't fix this particular problem. This
problem will show up whenever there is a dst on the GC list that
has split devices and a non-zero refcnt.

So if you had a process holding that dst you can still get a dead-lock.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-06 04:11:11

by David Miller

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, 05 Feb 2005 19:50:39 +0900 (JST)
YOSHIFUJI Hideaki / $B5HF#1QL@(B <[email protected]> wrote:

> In article <[email protected]> (at Sat, 5 Feb 2005 17:46:43 +1100), Herbert Xu <[email protected]> says:
>
> > If we wanted to preserve the split device semantics, then we
> > can create a local GC list in IPv6 so that it can search based
> > on rt6i_idev as well as the other keys. Alternatively we can
> > remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> > and move that test down to the individual ifdown functions.
>
> Yes, IPv6 needs "split device" semantics
> (for per-device statistics such as Ip6InDelivers etc),
> and I like later solution.

Ok. I never read whether ipv6, like ipv4, is specified to support
a model of host based ownership of addresses. Does anyone know?

2005-02-06 04:18:34

by David Miller

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, 5 Feb 2005 17:46:43 +1100
Herbert Xu <[email protected]> wrote:

> This doesn't work because net/core/dst.c can only search based
> on dst->dev. For the split device case, dst->dev is set to
> loopback_dev while rt6i_idev is set to the real device.

Indeed. I didn't catch that.

> If we wanted to preserve the split device semantics, then we
> can create a local GC list in IPv6 so that it can search based
> on rt6i_idev as well as the other keys.

Ok, so this would entail changing each ipv6 dst_free() call
into one to ip6_dst_free(), which would:

ip6_garbage_add(dst);
dst_free(dst);

It would mean that dst_run_gc() would need to have some callback
like dst->ops->gc_destroy() or similar, which would allow ipv6
to delete the dst from it's local garbage list.

> Alternatively we can
> remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> and move that test down to the individual ifdown functions.

I think there is a hole in this idea.... maybe.

If the idea is to scan dst_garbage_list down in ipv6 specific code,
you can't do that since 'dst' objects from every pool in the kernel
get put onto the dst_garbage_list. It is generic.

They have no identity, so it's illegal to treat any member of that
list as an rt_entry, rt6_entry or any specific higher level dst
type.

2005-02-06 04:36:36

by YOSHIFUJI Hideaki

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

In article <[email protected]> (at Sat, 5 Feb 2005 20:10:44 -0800), "David S. Miller" <[email protected]> says:

> > Alternatively we can
> > remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> > and move that test down to the individual ifdown functions.
>
> I think there is a hole in this idea.... maybe.
>
> If the idea is to scan dst_garbage_list down in ipv6 specific code,
> you can't do that since 'dst' objects from every pool in the kernel
> get put onto the dst_garbage_list. It is generic.

How about making dst->ops->dev_check() like this:

static int inline dst_dev_check(struct dst_entry *dst, struct net_device *dev)
{
if (dst->ops->dev_check)
return dst->ops->dev_check(dst, dev)
else
return dst->dev == dev;
}

--yoshfuji

2005-02-06 05:00:50

by YOSHIFUJI Hideaki

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

In article <[email protected]> (at Sat, 5 Feb 2005 20:02:42 -0800), "David S. Miller" <[email protected]> says:

> > Yes, IPv6 needs "split device" semantics
> > (for per-device statistics such as Ip6InDelivers etc),
> > and I like later solution.
>
> Ok. I never read whether ipv6, like ipv4, is specified to support
> a model of host based ownership of addresses. Does anyone know?

I'm not sure it is explicitly specified, but there're some hints:

1. we need to allow multiple addresses on multiple interfaces.
e.g. link-local address

2. if a packet has come from A to link-local address on the other side B,
we should not receive it.
+-------+
---->|A B|
+-------+

Currently, it does not happen in usual because ndisc does NOT handle
addresses on other device.

3. mib document states that we should take statistics on interface which
the address belongs to; not the interface where the packet has come
from:

cited from RFC2011bis:
Local (*) packets on the input side are counted on the interface
associated with their destination address, which may not be the
interface on which they were received. This requirement is caused by
the possibility of losing the original interface during processing,
especially re-assembly.

(*): here it means incoming, but not forwarding.


BTW, BSD has similar reference to interface structure in routeing entry.

--
Hideaki YOSHIFUJI @ USAGI Project <[email protected]>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

2005-02-06 05:12:19

by David Miller

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sun, 06 Feb 2005 13:37:23 +0900 (JST)
YOSHIFUJI Hideaki / $B5HF#1QL@(B <[email protected]> wrote:

> How about making dst->ops->dev_check() like this:
>
> static int inline dst_dev_check(struct dst_entry *dst, struct net_device *dev)
> {
> if (dst->ops->dev_check)
> return dst->ops->dev_check(dst, dev)
> else
> return dst->dev == dev;
> }

Oh I see. That would work, and it seems the simplest, and
lowest risk fix for this problem.

Herbert, what do you think?

2005-02-06 05:30:24

by YOSHIFUJI Hideaki

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

In article <[email protected]> (at Sat, 5 Feb 2005 21:04:11 -0800), "David S. Miller" <[email protected]> says:

> On Sun, 06 Feb 2005 13:37:23 +0900 (JST)
> YOSHIFUJI Hideaki / $B5HF#1QL@(B <[email protected]> wrote:
>
> > How about making dst->ops->dev_check() like this:
> >
> > static int inline dst_dev_check(struct dst_entry *dst, struct net_device *dev)
> > {
> > if (dst->ops->dev_check)
> > return dst->ops->dev_check(dst, dev)
> > else
> > return dst->dev == dev;
> > }
>
> Oh I see. That would work, and it seems the simplest, and
> lowest risk fix for this problem.

Well...

Here, lo is going down.
rt->rt6i_dev = lo and rt->rt6i_idev = ethX.
I think we already see dst->dev == dev (==lo) now.
So, I doubt that fix the problem.

The source of problem is entry (*) which still on routing entry,
not on gc list. And, the owner of entry is not routing table but
unicast/anycast address structure(s).
We need to "kill" active address on the other interfaces.

*: rt->rt6i_dev = lo and rt->rt6i_idev = ethX


BTW, I wish we could shut down eth0 during lo is pending...

--yoshfuji


2005-02-06 05:49:22

by YOSHIFUJI Hideaki

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

In article <[email protected]> (at Sun, 06 Feb 2005 14:31:07 +0900 (JST)), YOSHIFUJI Hideaki / $B5HF#1QL@(B <[email protected]> says:

> The source of problem is entry (*) which still on routing entry,
> not on gc list. And, the owner of entry is not routing table but
> unicast/anycast address structure(s).
> We need to "kill" active address on the other interfaces.

Which means in addrconf_notiry(), if the dev == &loopback_dev,
call addrconf_ifdown for every device like this:

Signed-off-by: Hideaki YOSHIFUJI <[email protected]>

===== net/ipv6/addrconf.c 1.129 vs edited =====
--- 1.129/net/ipv6/addrconf.c 2005-01-18 06:13:31 +09:00
+++ edited/net/ipv6/addrconf.c 2005-02-06 14:48:25 +09:00
@@ -1961,6 +1961,20 @@
case NETDEV_DOWN:
case NETDEV_UNREGISTER:
/*
+ * If lo is doing down we need to kill
+ * all addresses on the host because it owns
+ * route on lo. --yoshfuji
+ */
+ if (dev == &loopback_dev) {
+ struct net_device *dev1;
+ for (dev1 = dev_base; dev1; dev1 = dev->next) {
+ if (dev1 == &loopback_dev ||
+ __in6_dev_get(dev1) == NULL)
+ continue;
+ addrconf_ifdown(dev1, event != NETDEV_DOWN);
+ }
+ }
+ /*
* Remove all addresses from this interface.
*/
addrconf_ifdown(dev, event != NETDEV_DOWN);


--
Hideaki YOSHIFUJI @ USAGI Project <[email protected]>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

2005-02-06 06:53:29

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, Feb 05, 2005 at 09:04:11PM -0800, David S. Miller wrote:
>
> Oh I see. That would work, and it seems the simplest, and
> lowest risk fix for this problem.
>
> Herbert, what do you think?

Yes I agree.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-06 06:53:20

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, Feb 05, 2005 at 08:10:44PM -0800, David S. Miller wrote:
>
> > Alternatively we can
> > remove the dst->dev == dev check in dst_dev_event and dst_ifdown
> > and move that test down to the individual ifdown functions.
>
> I think there is a hole in this idea.... maybe.
>
> If the idea is to scan dst_garbage_list down in ipv6 specific code,
> you can't do that since 'dst' objects from every pool in the kernel
> get put onto the dst_garbage_list. It is generic.

The idea is to move the check into dst->ops->ifdown. By definition
ipv6_dst_ifdown will only see rt6_info entries. So dst_dev_event
will become

for (dst = dst_garbage_list; dst; dst = dst->next) {
dst_ifdown(dst, event != NETDEV_DOWN);
}

dst_ifdown will become

...

do {
if (dst->dev == dev && unregister) {
...
}

dst->ops->ifdown(dst, dev, unregister);
} while ((dst = dst->child) && dst->flags & DST_NOHASH);

...

Note the extra dev argument to ifdown. ipv6_dst_ifdown will be

static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
int how)
{
struct rt6_info *rt = (struct rt6_info *)dst;
struct inet6_dev *idev = rt->rt6i_idev;

if (idev != NULL && idev->dev != &loopback_dev && idev->dev == dev) {
struct inet6_dev *loopback_idev = in6_dev_get(&loopback_dev);
if (loopback_idev != NULL) {
rt->rt6i_idev = loopback_idev;
in6_dev_put(idev);
}
}
}

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-06 06:59:34

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sun, Feb 06, 2005 at 02:31:07PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> Here, lo is going down.
> rt->rt6i_dev = lo and rt->rt6i_idev = ethX.
> I think we already see dst->dev == dev (==lo) now.
> So, I doubt that fix the problem.
>
> The source of problem is entry (*) which still on routing entry,
> not on gc list. And, the owner of entry is not routing table but
> unicast/anycast address structure(s).
> We need to "kill" active address on the other interfaces.
>
> *: rt->rt6i_dev = lo and rt->rt6i_idev = ethX

Sorry I don't think this is right. Although lo going down is
required to cause those symptoms, it is not the trigger.

The problem only occurs when eth0 itself is unregistered.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-06 07:04:58

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sun, Feb 06, 2005 at 02:50:07PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> Which means in addrconf_notiry(), if the dev == &loopback_dev,
> call addrconf_ifdown for every device like this:

This should fix the reported issue. However, I'm not sure if it's
a good idea to stop all IP traffic when lo goes down. We don't do
that for IPv4.

Besides, we'll still need to fix the rt6i_idev GC issue since the
same bug can occur when eth0 goes down and some appliation is holding
a dst to a local address route. It can become a dead-lock if the
said application then invokes a syscall that takes the RTNL.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-06 10:55:39

by Andre Tomt

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

Herbert Xu wrote:
> On Fri, Feb 04, 2005 at 09:38:13PM -0800, David S. Miller wrote:
>
>>It is just the first such thing I found, scanning rt6i_idev uses
>>will easily find several others.
>
>
> You're right of course. I thought they were all harmless but I was
> obviously wrong about this one.
>
> So here is a patch that essentially reverts the split devices
> semantics introduced by these two changesets:
>
> [IPV6] addrconf_dst_alloc() to allocate new route for local address.
> [IPV6] take rt6i_idev into account when looking up routes.
>
> Signed-off-by: Herbert Xu ~{PmV>HI~} <[email protected]>

Now that this fix have been written off as probably wrong; how much does
it break? As far as I've understood it "just" reverts to old semantics,
probably not correct semantics, but still not unexpected semantics
(like, say, hang on device unregistration ;) )

I'm contemplating just using it as a quick-fix until 2.6.11 to get this
problem under control.

2005-02-06 11:34:07

by YOSHIFUJI Hideaki

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

In article <[email protected]> (at Sun, 06 Feb 2005 11:55:19 +0100), Andre Tomt <[email protected]> says:

> I'm contemplating just using it as a quick-fix until 2.6.11 to get this
> problem under control.

Would you find if my patch works? Thanks.

--yoshfuji

2005-02-06 11:47:27

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sat, Feb 05, 2005 at 09:45:59PM +1100, herbert wrote:
>
> Although I still think this is a bug, I'm now starting to suspect
> that there is another bug around as well.
>
> There is probably an ifp leak which in turn leads to a split dst
> leak that allows the first bug to make its mark.

Found it. This is what happens:

lo goes down =>
rt6_ifdown =>
eth0's local address route gets deleted

eth0 goes down =>
__ipv6_ifa_notify =>
ip6_del_rt fails so we fall through to the
dst_free path. At this point the refcount
taken by __ipv6_ifa_notify is leaked.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-06 12:29:28

by YOSHIFUJI Hideaki

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

In article <[email protected]> (at Sun, 6 Feb 2005 22:41:45 +1100), Herbert Xu <[email protected]> says:

> On Sat, Feb 05, 2005 at 09:45:59PM +1100, herbert wrote:
> >
> > Although I still think this is a bug, I'm now starting to suspect
> > that there is another bug around as well.
> >
> > There is probably an ifp leak which in turn leads to a split dst
> > leak that allows the first bug to make its mark.
>
> Found it. This is what happens:
>
> lo goes down =>
> rt6_ifdown =>
> eth0's local address route gets deleted
>
> eth0 goes down =>
> __ipv6_ifa_notify =>
> ip6_del_rt fails so we fall through to the
> dst_free path. At this point the refcount
> taken by __ipv6_ifa_notify is leaked.

Oh, you're right! Thanks.

How about this; Ignore entries addrconf_dst_alloc'ed entries in rt6_ifdown()?

Signed-off-by: Hideaki YOSHIFUJI <[email protected]>

===== include/linux/ipv6_route.h 1.6 vs edited =====
--- 1.6/include/linux/ipv6_route.h 2004-10-26 12:54:23 +09:00
+++ edited/include/linux/ipv6_route.h 2005-02-06 21:27:02 +09:00
@@ -26,6 +26,7 @@
#define RTF_FLOW 0x02000000 /* flow significant route */
#define RTF_POLICY 0x04000000 /* policy route */

+#define RTF_ANYCAST 0x40000000
#define RTF_LOCAL 0x80000000

struct in6_rtmsg {
===== net/ipv6/route.c 1.105 vs edited =====
--- 1.105/net/ipv6/route.c 2005-01-15 17:44:48 +09:00
+++ edited/net/ipv6/route.c 2005-02-06 21:26:35 +09:00
@@ -1408,7 +1408,9 @@
rt->u.dst.obsolete = -1;

rt->rt6i_flags = RTF_UP | RTF_NONEXTHOP;
- if (!anycast)
+ if (anycast)
+ rt->rt6i_flags |= RTF_ANYCAST;
+ else
rt->rt6i_flags |= RTF_LOCAL;
rt->rt6i_nexthop = ndisc_get_neigh(rt->rt6i_dev, &rt->rt6i_gateway);
if (rt->rt6i_nexthop == NULL) {
@@ -1427,7 +1429,8 @@
static int fib6_ifdown(struct rt6_info *rt, void *arg)
{
if (((void*)rt->rt6i_dev == arg || arg == NULL) &&
- rt != &ip6_null_entry) {
+ rt != &ip6_null_entry &&
+ !(rt->rt6i_flags & (RTF_LOCAL|RTF_ANYCAST))) {
RT6_TRACE("deleted by ifdown %p\n", rt);
return -1;
}

--
Hideaki YOSHIFUJI @ USAGI Project <[email protected]>
GPG FP: 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

2005-02-06 21:03:04

by Herbert Xu

[permalink] [raw]
Subject: Re: PROBLEM: 2.6.11-rc2 hangs on bridge shutdown (br0)

On Sun, Feb 06, 2005 at 09:30:18PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
>
> How about this; Ignore entries addrconf_dst_alloc'ed entries in rt6_ifdown()?

Great, that definitely fixes the local address problem.

I'm not sure about anycast routes though. Who's going to delete them
when the real device goes down?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-02-08 01:31:06

by Herbert Xu

[permalink] [raw]
Subject: [IPSEC] Move dst->child loop from dst_ifdown to xfrm_dst_ifdown

On Sun, Feb 06, 2005 at 05:51:17PM +1100, herbert wrote:
>
> The idea is to move the check into dst->ops->ifdown. By definition
> ipv6_dst_ifdown will only see rt6_info entries. So dst_dev_event
> will become

Here are the patches to do this. Do they look sane?

This one moves the dst->child processing from dst_ifdown into
xfrm_dst_ifdown.

Signed-off-by: Herbert Xu <[email protected]>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Attachments:
(No filename) (622.00 B)
dst-patch-1 (2.46 kB)
Download all attachments

2005-02-08 01:34:07

by Herbert Xu

[permalink] [raw]
Subject: Re: [IPSEC] Move dst->child loop from dst_ifdown to xfrm_dst_ifdown

On Tue, Feb 08, 2005 at 12:29:29PM +1100, herbert wrote:
>
> This one moves the dst->child processing from dst_ifdown into
> xfrm_dst_ifdown.

This patch adds a net_device argument to ifdown. After all,
it's a bit silly to notify someone of an ifdown event without
telling them what which device it was for :)

Signed-off-by: Herbert Xu <[email protected]>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Attachments:
(No filename) (588.00 B)
dst-patch-2 (4.16 kB)
Download all attachments

2005-02-15 22:58:51

by David Miller

[permalink] [raw]
Subject: Re: [IPSEC] Move dst->child loop from dst_ifdown to xfrm_dst_ifdown

On Tue, 8 Feb 2005 12:31:40 +1100
Herbert Xu <[email protected]> wrote:

> On Tue, Feb 08, 2005 at 12:29:29PM +1100, herbert wrote:
> >
> > This one moves the dst->child processing from dst_ifdown into
> > xfrm_dst_ifdown.
>
> This patch adds a net_device argument to ifdown. After all,
> it's a bit silly to notify someone of an ifdown event without
> telling them what which device it was for :)
>
> Signed-off-by: Herbert Xu <[email protected]>

Ok, I'm going to try and get these two patches into 2.6.11