LinuxLists.cc - 2.6.7-rc3: waiting for eth0 to become free

2004-06-07 22:52:25

Subject: 2.6.7-rc3: waiting for eth0 to become free

Hi!

On my laptop, when using a CardBus 3c59x-based NIC, I need to run
"cardctl eject" so the system won't freeze when resuming. "cardctl
eject" worked fine in 2.6.7-rc2-mm2, even when there were programs with
network sockets opened (for example, Evolution maintaining a connection
against an IMAP server): the card is ejected (well, not physically),
even when there are ESTABLISHED connections.

However, starting with 2.6.7-rc3, "cardctl eject" hangs if a program
holds any socket open. After a while the "unregister_netdevice: waiting
for eth0 to become free" message starts appearing on the kernel message
ring. The only apparent solution is killing that program, ejecting the
card from its slot and waiting until the 3c59x.o usage count reaches zero.

Can someone tell me what's going on here?
Thank you very much.

2004-06-07 23:10:18

by Erik Tews

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

On Tue, 08.06.2004 at 0:52, Felipe Alfaro Solana wrote:
> On my laptop, when using a CardBus 3c59x-based NIC, I need to run
> "cardctl eject" so the system won't freeze when resuming. "cardctl
> eject" worked fine in 2.6.7-rc2-mm2, even when there were programs with
> network sockets opened (for example, Evolution maintaining a connection
> against an IMAP server): the card is ejected (well, not physically),
> even when there are ESTABLISHED connections.
>
> However, starting with 2.6.7-rc3, "cardctl eject" hangs if a program
> holds any socket open. After a while the "unregister_netdevice: waiting
> for eth0 to become free" message starts appearing on the kernel message
> ring. The only apparent solution is killing that program, ejecting the
> card from its slot and wait until 3c59x.o usage count reaches zero.

I have seen similar problems with my prism2 mini-PCI card. I often unload
the driver to reset the card, and sometimes it hangs during unloading with
the same message. Would it be possible to add some code to backtrace
this lock?

This happens with a lot of recent 2.6 kernels, but it is not always
reproducible.

2004-06-08 06:11:08

by Felipe Alfaro Solana

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

On Tue, 2004-06-08 at 01:04 +0200, Erik Tews wrote:
> On Tue, 08.06.2004 at 0:52, Felipe Alfaro Solana wrote:
> > On my laptop, when using a CardBus 3c59x-based NIC, I need to run
> > "cardctl eject" so the system won't freeze when resuming. "cardctl
> > eject" worked fine in 2.6.7-rc2-mm2, even when there were programs with
> > network sockets opened (for example, Evolution maintaining a connection
> > against an IMAP server): the card is ejected (well, not physically),
> > even when there are ESTABLISHED connections.
> >
> > However, starting with 2.6.7-rc3, "cardctl eject" hangs if a program
> > holds any socket open. After a while the "unregister_netdevice: waiting
> > for eth0 to become free" message starts appearing on the kernel message
> > ring. The only apparent solution is killing that program, ejecting the
> > card from its slot and wait until 3c59x.o usage count reaches zero.
>
> I have seen similar problems with my prism2 minipci-card. I often unload
> the driver to reset the card, sometimes it hangs during unloading with
> the same message. Would it be possible to add some code to backtrace
> this lock?
>
> This happens with a lot of recent 2.6 kernels, not always reproducible.

I can reproduce this consistently with 2.6.7-rc3: launch an ftp session
against a remote server and then try running "cardctl eject". In my
case, it just refuses to unload with a usage count of 1.

2004-06-08 08:24:07

by Russell King

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

On Tue, Jun 08, 2004 at 12:52:21AM +0200, Felipe Alfaro Solana wrote:
> Can someone tell me what's going on here?

I think you want to address your mail to the linux-net or netdev mailing
lists. There are far more networking people on those lists.

--
Russell King
Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                2.6 Serial core

2004-06-09 13:08:10

by Felipe Alfaro Solana

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

On Tue, 2004-06-08 at 14:02 -0700, Stephen Hemminger wrote:
> On Tue, 08 Jun 2004 22:09:29 +0200
> Felipe Alfaro Solana <[email protected]> wrote:
>
> > On Tue, 2004-06-08 at 12:42 -0700, Stephen Hemminger wrote:
> > > On Tue, 08 Jun 2004 21:18:30 +0200
> > > Felipe Alfaro Solana <[email protected]> wrote:
> > >
> > > > Hi!
> > > >
> > > > On my laptop, when using a CardBus 3c59x-based NIC, I need to run
> > > > "cardctl eject" so the system won't freeze when resuming. "cardctl
> > > > eject" worked fine in 2.6.7-rc2-mm2, even when there were programs with
> > > > > network sockets opened (for example, Evolution maintaining a connection
> > > > against an IMAP server): the card is ejected (well, not physically),
> > > > even when there are ESTABLISHED connections.
> > > >
> > > > However, starting with 2.6.7-rc3, "cardctl eject" hangs if a program
> > > > holds any socket open. After a while the "unregister_netdevice: waiting
> > > > for eth0 to become free" message starts appearing on the kernel message
> > > > ring. The only apparent solution is killing that program, ejecting the
> > > > card from its slot and wait until 3c59x.o usage count reaches zero.
> > > >
> > > > Can someone tell me what's going on here?
> > > > Thank you very much.
> > >
> > > What protocols are you running? Is IPV6 loaded?
> >
> > I'm using IPv4, IPv6 and IPSec ESP with AES/CBC.
> > Do you want .config?
>
> Not really, could you see if it is an IPv6 vs IPSec problem by not running/loading
> one or the other.
>
> What is happening is that some subsystem is holding a reference to the device (calling dev_hold())
> but not cleaning up (calling dev_put). It can be a hard to track which of the many
> things routing, etc are not being cleared properly. Look for routes that still
> get stuck (ip route) and neighbor cache entries. Most of these end up being
> protocol bugs.

I think you were pointing me in the right direction. I've found the
following changes to net/ipv4/route.c and net/ipv6/route.c between
2.6.7-rc2-mm2 and 2.6.7-rc3-mm1:

diff -uNr linux-2.6.7-rc2-mm2/net/ipv4/route.c linux-2.6.7-rc3-mm1/net/ipv4/route.c
--- linux-2.6.7-rc2-mm2/net/ipv4/route.c 2004-06-09 13:15:46.000000000 +0200
+++ linux-2.6.7-rc3-mm1/net/ipv4/route.c 2004-06-09 12:55:19.000000000 +0200
@@ -1040,6 +1040,8 @@
rt->u.dst.child = NULL;
if (rt->u.dst.dev)
dev_hold(rt->u.dst.dev);
+ if (rt->idev)
+ in_dev_hold(rt->idev);
rt->u.dst.obsolete = 0;
rt->u.dst.lastuse = jiffies;
rt->u.dst.path = &rt->u.dst;
@@ -1321,11 +1323,17 @@
{
struct rtable *rt = (struct rtable *) dst;
struct inet_peer *peer = rt->peer;
+ struct in_device *idev = rt->idev;

if (peer) {
rt->peer = NULL;
inet_putpeer(peer);
}
+
+ if (idev) {
+ rt->idev = NULL;
+ in_dev_put(idev);
+ }
}

static void ipv4_link_failure(struct sk_buff *skb)
@@ -1339,8 +1347,10 @@
dst_set_expires(&rt->u.dst, 0);
}

-static int ip_rt_bug(struct sk_buff *skb)
+static int ip_rt_bug(struct sk_buff **pskb)
{
+ struct sk_buff *skb = *pskb;
+
printk(KERN_DEBUG "ip_rt_bug: %u.%u.%u.%u -> %u.%u.%u.%u, %s\n",
NIPQUAD(skb->nh.iph->saddr), NIPQUAD(skb->nh.iph->daddr),
skb->dev ? skb->dev->name : "?");
@@ -1486,6 +1496,7 @@
rth->fl.iif = dev->ifindex;
rth->u.dst.dev = &loopback_dev;
dev_hold(rth->u.dst.dev);
+ rth->idev = in_dev_get(rth->u.dst.dev);
rth->fl.oif = 0;
rth->rt_gateway = daddr;
rth->rt_spec_dst= spec_dst;
@@ -1695,6 +1706,7 @@
rth->fl.iif = dev->ifindex;
rth->u.dst.dev = out_dev->dev;
dev_hold(rth->u.dst.dev);
+ rth->idev = in_dev_get(rth->u.dst.dev);
rth->fl.oif = 0;
rth->rt_spec_dst= spec_dst;

@@ -1774,6 +1786,7 @@
rth->fl.iif = dev->ifindex;
rth->u.dst.dev = &loopback_dev;
dev_hold(rth->u.dst.dev);
+ rth->idev = in_dev_get(rth->u.dst.dev);
rth->rt_gateway = daddr;
rth->rt_spec_dst= spec_dst;
rth->u.dst.input= ip_local_deliver;
@@ -2157,6 +2170,7 @@
rth->rt_iif = oldflp->oif ? : dev_out->ifindex;
rth->u.dst.dev = dev_out;
dev_hold(dev_out);
+ rth->idev = in_dev_get(dev_out);
rth->rt_gateway = fl.fl4_dst;
rth->rt_spec_dst= fl.fl4_src;

diff -uNr linux-2.6.7-rc2-mm2/net/ipv6/route.c linux-2.6.7-rc3-mm1/net/ipv6/route.c
--- linux-2.6.7-rc2-mm2/net/ipv6/route.c 2004-06-09 13:15:46.000000000 +0200
+++ linux-2.6.7-rc3-mm1/net/ipv6/route.c 2004-06-09 12:55:19.000000000 +0200
@@ -83,9 +83,11 @@
static struct rt6_info * ip6_rt_copy(struct rt6_info *ort);
static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie);
static struct dst_entry *ip6_negative_advice(struct dst_entry *);
+static void ip6_dst_destroy(struct dst_entry *);
static int ip6_dst_gc(void);

static int ip6_pkt_discard(struct sk_buff *skb);
+static int ip6_pkt_discard_out(struct sk_buff **pskb);
static void ip6_link_failure(struct sk_buff *skb);
static void ip6_rt_update_pmtu(struct dst_entry *dst, u32 mtu);

@@ -95,6 +97,7 @@
.gc = ip6_dst_gc,
.gc_thresh = 1024,
.check = ip6_dst_check,
+ .destroy = ip6_dst_destroy,
.negative_advice = ip6_negative_advice,
.link_failure = ip6_link_failure,
.update_pmtu = ip6_rt_update_pmtu,
@@ -111,7 +114,7 @@
.error = -ENETUNREACH,
.metrics = { [RTAX_HOPLIMIT - 1] = 255, },
.input = ip6_pkt_discard,
- .output = ip6_pkt_discard,
+ .output = ip6_pkt_discard_out,
.ops = &ip6_dst_ops,
.path = (struct dst_entry*)&ip6_null_entry,
}
@@ -134,7 +137,15 @@
/* allocate dst with ip6_dst_ops */
static __inline__ struct rt6_info *ip6_dst_alloc(void)
{
- return dst_alloc(&ip6_dst_ops);
+ return (struct rt6_info *)dst_alloc(&ip6_dst_ops);
+}
+
+static void ip6_dst_destroy(struct dst_entry *dst)
+{
+ struct rt6_info *rt = (struct rt6_info *)dst;
+ if (rt->rt6i_idev != NULL)
+ in6_dev_put(rt->rt6i_idev);
+
}

/*
@@ -566,21 +577,21 @@
struct dst_entry *ndisc_dst_alloc(struct net_device *dev,
struct neighbour *neigh,
struct in6_addr *addr,
- int (*output)(struct sk_buff *))
+ int (*output)(struct sk_buff **))
{
struct rt6_info *rt = ip6_dst_alloc();

if (unlikely(rt == NULL))
goto out;

- if (dev)
- dev_hold(dev);
+ dev_hold(dev);
if (neigh)
neigh_hold(neigh);
else
neigh = ndisc_get_neigh(dev, addr);

rt->rt6i_dev = dev;
+ rt->rt6i_idev = in6_dev_get(dev);
rt->rt6i_nexthop = neigh;
rt->rt6i_expires = 0;
rt->rt6i_flags = RTF_LOCAL;
@@ -714,6 +725,12 @@
if (rtmsg->rtmsg_src_len)
return -EINVAL;
#endif
+ if (rtmsg->rtmsg_ifindex) {
+ dev = dev_get_by_index(rtmsg->rtmsg_ifindex);
+ if (!dev)
+ return -ENODEV;
+ }
+
if (rtmsg->rtmsg_metric == 0)
rtmsg->rtmsg_metric = IP6_RT_PRIO_USER;

@@ -739,13 +756,6 @@

rt->u.dst.output = ip6_output;

- if (rtmsg->rtmsg_ifindex) {
- dev = dev_get_by_index(rtmsg->rtmsg_ifindex);
- err = -ENODEV;
- if (dev == NULL)
- goto out;
- }
-
ipv6_addr_prefix(&rt->rt6i_dst.addr,
&rtmsg->rtmsg_dst, rtmsg->rtmsg_dst_len);
rt->rt6i_dst.plen = rtmsg->rtmsg_dst_len;
@@ -769,7 +779,7 @@
dev_put(dev);
dev = &loopback_dev;
dev_hold(dev);
- rt->u.dst.output = ip6_pkt_discard;
+ rt->u.dst.output = ip6_pkt_discard_out;
rt->u.dst.input = ip6_pkt_discard;
rt->u.dst.error = -ENETUNREACH;
rt->rt6i_flags = RTF_REJECT|RTF_NONEXTHOP;
@@ -872,6 +882,7 @@
if (!rt->u.dst.metrics[RTAX_ADVMSS-1])
rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_pmtu(&rt->u.dst));
rt->u.dst.dev = dev;
+ rt->rt6i_idev = in6_dev_get(dev);
return rt6_ins(rt, nlh, _rtattr);

out:
@@ -1138,6 +1149,9 @@
rt->u.dst.dev = ort->u.dst.dev;
if (rt->u.dst.dev)
dev_hold(rt->u.dst.dev);
+ rt->rt6i_idev = ort->rt6i_idev;
+ if (rt->rt6i_idev)
+ in6_dev_hold(rt->rt6i_idev);
rt->u.dst.lastuse = jiffies;
rt->rt6i_expires = 0;

@@ -1259,12 +1273,17 @@

int ip6_pkt_discard(struct sk_buff *skb)
{
- IP6_INC_STATS(Ip6OutNoRoutes);
+ IP6_INC_STATS(OutNoRoutes);
icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_NOROUTE, 0, skb->dev);
kfree_skb(skb);
return 0;
}

+int ip6_pkt_discard_out(struct sk_buff **pskb)
+{
+ return ip6_pkt_discard(*pskb);
+}
+
/*
* Add address
*/
@@ -1282,6 +1301,7 @@
rt->u.dst.input = ip6_input;
rt->u.dst.output = ip6_output;
rt->rt6i_dev = &loopback_dev;
+ rt->rt6i_idev = in6_dev_get(&loopback_dev);
rt->u.dst.metrics[RTAX_MTU-1] = ipv6_get_mtu(rt->rt6i_dev);
rt->u.dst.metrics[RTAX_ADVMSS-1] = ipv6_advmss(dst_pmtu(&rt->u.dst));
rt->u.dst.metrics[RTAX_HOPLIMIT-1] = ipv6_get_hoplimit(rt->rt6i_dev);

It seems there is some kind of misreferencing, which was introduced by
the previous changes. I'm still trying to figure out what's going on.
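
To make the reference chain in the diff above easier to follow, here is
a minimal user-space sketch in C. It is not kernel code: all toy_* names
are hypothetical stand-ins, and it assumes (this is not stated in the
thread) that the per-device IP state pins the device until its own last
reference is dropped. Under that assumption, a cached route that takes
an idev reference, as the added in_dev_get()/in_dev_hold() calls do,
keeps the device busy until the route itself is destroyed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's net_device/in_device/rtable. */
struct toy_dev   { const char *name; int refcnt; };
struct toy_idev  { struct toy_dev *dev; int refcnt; };
struct toy_route { struct toy_idev *idev; };

/* Assumption: the per-device IP state holds the device while it lives. */
void toy_idev_init(struct toy_idev *idev, struct toy_dev *dev)
{
    idev->dev = dev;
    idev->refcnt = 1;   /* the device's own reference to its IP state */
    dev->refcnt++;      /* models dev_hold() */
}

/* Drop one idev reference; the last one also releases the device. */
void toy_idev_put(struct toy_idev *idev)
{
    assert(idev->refcnt > 0);
    if (--idev->refcnt == 0)
        idev->dev->refcnt--;   /* models dev_put() */
}

/* Mirrors the added "rth->idev = in_dev_get(...)" lines in the diff. */
void toy_route_create(struct toy_route *rt, struct toy_idev *idev)
{
    idev->refcnt++;
    rt->idev = idev;
}

/* Mirrors the in_dev_put() added to the destroy hook in the diff. */
void toy_route_destroy(struct toy_route *rt)
{
    if (rt->idev != NULL) {
        struct toy_idev *idev = rt->idev;
        rt->idev = NULL;
        toy_idev_put(idev);
    }
}

/* Unregistering can only finish once nobody holds the device. */
bool toy_dev_is_free(const struct toy_dev *dev)
{
    return dev->refcnt == 0;
}
```

In this toy model, if one route is still alive when the device drops
its own idev reference, the device count stays at 1 and
toy_dev_is_free() returns false; it only becomes true after
toy_route_destroy() runs. Whether this matches what actually happens in
2.6.7-rc3 is exactly the open question here.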

Reverting these changes allows "cardctl eject" to proceed even when a
userspace process has an active open socket. However, reverting these
changes breaks IPv6 a little bit for me.

I don't have access to BK, but it could be interesting to look at the
changesets that caused these changes.

Any other ideas?

2004-06-09 15:18:13

by Felipe Alfaro Solana

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

Attachments:

IPV4.patch (1.26 kB)
IPV6.patch (1.05 kB)

2004-06-09 22:49:27

by Christian Kujau

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Felipe Alfaro Solana <[email protected]> wrote:
|>What is happening is that some subsystem is holding a reference to the device (calling dev_hold())
|>but not cleaning up (calling dev_put). It can be a hard to track which of the many
|>things routing, etc are not being cleared properly. Look for routes that still
|>get stuck (ip route) and neighbor cache entries. Most of these end up being
|>protocol bugs.
|
|
| The two attached patches, one for net/ipv4/route.c, the other for net/
| ipv6/route.c fix all my problems when running "cardctl eject" while a
| program maintains an open network socket (ESTABLISHED).
|
| Both patches apply cleanly against 2.6.7-rc3 and 2.6.7-rc3-mm1.
| I'm not completely sure what has changed in 2.6.7-rc3 that is breaking
| cardctl for me, as it Just Worked(TM) fine in 2.6.7-rc2.

Do you know, by any chance, whether this error is specific to eth0, or
could the patches help with my error message too:

unregister_netdevice: waiting for ppp0 to become free. Usage count = 1

happened just a few hours ago (2.6.7-rc3), i had to reboot the box
anyway, but pppd was not able to die (even with kill -9)

Christian.
- --
BOFH excuse #258:

That's easy to fix, but I can't be bothered.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAx5PN+A7rjkF8z0wRAuR+AJ41024qDMPVWYlVeofUZ6N50E3oRwCfeqhs
/GxxIqmDbClJXw/i2WNhJt4=
=lHgP
-----END PGP SIGNATURE-----

2004-06-10 06:08:54

by Felipe Alfaro Solana

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

On Thu, 2004-06-10 at 00:48 +0200, Christian Kujau wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Felipe Alfaro Solana <[email protected]> wrote:
> |>What is happening is that some subsystem is holding a reference to the device (calling dev_hold())
> |>but not cleaning up (calling dev_put). It can be a hard to track which of the many
> |>things routing, etc are not being cleared properly. Look for routes that still
> |>get stuck (ip route) and neighbor cache entries. Most of these end up being
> |>protocol bugs.
> |
> |
> | The two attached patches, one for net/ipv4/route.c, the other for net/
> | ipv6/route.c fix all my problems when running "cardctl eject" while a
> | program maintains an open network socket (ESTABLISHED).
> |
> | Both patches apply cleanly against 2.6.7-rc3 and 2.6.7-rc3-mm1.
> | I'm not completely sure what has changed in 2.6.7-rc3 that is breaking
> | cardctl for me, as it Just Worked(TM) fine in 2.6.7-rc2.
>
> Do you know, by any chance, whether this error is specific to eth0, or
> could the patches help with my error message too:
>
> unregister_netdevice: waiting for ppp0 to become free. Usage count = 1

I think the mentioned error is not dependent on any specific interface
(be it eth0 or ppp0), but affects any interface in general which has a
routing entry and is the target/source of IP traffic. This is based on
the fact that my fixes play with the refcounting on any interface, not
just eth0 specifically, and pertain to both the IPv4 and IPv6 core.

However, I detected this behavior on my eth0, since this is the only
interface I have on my laptop. You can just try both patches against
2.6.7-rc3 or 2.6.7-rc3-mm1 to see if they cure your problems.

> happened just a few hours ago (2.6.7-rc3), i had to reboot the box
> anyway, but pppd was not able to die (even with kill -9)

In my case, I was able to trigger the problem by running "cardctl eject"
which was then stuck at D state. Killing any program using a network
socket, and waiting for opened connections to transition from
ESTABLISHED to TIME_WAIT and then being closed, allowed "cardctl" to
exit the D state.

2004-06-10 11:07:04

by Christian Kujau

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Felipe Alfaro Solana wrote:
| I think the mentioned error is not dependent on any specific interface
| (let it be eth0, or ppp0), but any interface in general which has a
| routing entry and is the target/source of IP traffic. This is based on
| the fact that my fixes play with the refcounting on any interface, not
| just eth0 specifically, and pertain to both IPv4 and IPv6 core.

ok.

| In my case, I was able to trigger the problem by running "cardctl eject"
| which was then stuck at D state. Killing any program using a network
| socket, and waiting for opened connections to transition from
| ESTABLISHED to TIME_WAIT and then being closed, allowed "cardctl" to
| exit the D state.

I don't have PCMCIA here, but I'll see if I can reproduce it and see what
the patches do.

Thanks for the explanation,
Christian.
- --
BOFH excuse #439:

Hot Java has gone cold
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAyEDH+A7rjkF8z0wRAusDAKCp2WW4LO01hP9ZDXa3N6eH7cuvIgCg2dTz
IzprZryuJ/VuiRY/DGvMH24=
=f7qL
-----END PGP SIGNATURE-----

2004-06-10 15:37:35

by Stephen Hemminger

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

On Wed, 09 Jun 2004 17:18:02 +0200
Felipe Alfaro Solana <[email protected]> wrote:

> On Tue, 2004-06-08 at 14:02 -0700, Stephen Hemminger wrote:
> > On Tue, 08 Jun 2004 22:09:29 +0200
> > Felipe Alfaro Solana <[email protected]> wrote:
> >
> > > On Tue, 2004-06-08 at 12:42 -0700, Stephen Hemminger wrote:
> > > > On Tue, 08 Jun 2004 21:18:30 +0200
> > > > Felipe Alfaro Solana <[email protected]> wrote:
> > > >
> > > > > Hi!
> > > > >
> > > > > On my laptop, when using a CardBus 3c59x-based NIC, I need to run
> > > > > "cardctl eject" so the system won't freeze when resuming. "cardctl
> > > > > eject" worked fine in 2.6.7-rc2-mm2, even when there were programs with
> > > > > > network sockets opened (for example, Evolution maintaining a connection
> > > > > against an IMAP server): the card is ejected (well, not physically),
> > > > > even when there are ESTABLISHED connections.
> > > > >
> > > > > However, starting with 2.6.7-rc3, "cardctl eject" hangs if a program
> > > > > holds any socket open. After a while the "unregister_netdevice: waiting
> > > > > for eth0 to become free" message starts appearing on the kernel message
> > > > > ring. The only apparent solution is killing that program, ejecting the
> > > > > card from its slot and wait until 3c59x.o usage count reaches zero.
> > > > >
> > > > > Can someone tell me what's going on here?
> > > > > Thank you very much.
> > > >
> > > > What protocols are you running? Is IPV6 loaded?
> > >
> > > I'm using IPv4, IPv6 and IPSec ESP with AES/CBC.
> > > Do you want .config?
> >
> > Not really, could you see if it is an IPv6 vs IPSec problem by not running/loading
> > one or the other.
> >
> > What is happening is that some subsystem is holding a reference to the device (calling dev_hold())
> > but not cleaning up (calling dev_put). It can be a hard to track which of the many
> > things routing, etc are not being cleared properly. Look for routes that still
> > get stuck (ip route) and neighbor cache entries. Most of these end up being
> > protocol bugs.
>
> The two attached patches, one for net/ipv4/route.c, the other for net/
> ipv6/route.c fix all my problems when running "cardctl eject" while a
> program maintains an open network socket (ESTABLISHED).
>
> Both patches apply cleanly against 2.6.7-rc3 and 2.6.7-rc3-mm1.
> I'm not completely sure what has changed in 2.6.7-rc3 that is breaking
> cardctl for me, as it Just Worked(TM) fine in 2.6.7-rc2.
>
> Hope this can throw some light at this issue.

Since you effectively remove rth->idev, why not remove it from the structure
to make sure no code is still expecting it to be set?

2004-06-10 19:33:56

by Diego Calleja

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

On Wed, 09 Jun 2004 17:18:02 +0200, Felipe Alfaro Solana <[email protected]> wrote:

I'm seeing the same with a ppp link: pppd can't be killed even with SIGKILL (it
eats 100% of CPU, all in sys time; readprofile attached) and the kernel spits
out these messages:

unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
etc...

It doesn't always happen; I've seen it a couple of times. Right now
I'm trying to reproduce it with the patch Felipe provided (this is plain
-rc3 too).

This is the trace of the looping pppd:

pppd S C140E820 0 721 1 18063 17268 (NOTLB)
004d8032 7a6e2f0b 00000f8c c140e820 00000001 ddce8170 dd749e18 dd749e18
00000000 00000001 c032bc08 0000000a dd749e64 c011f454 00000046 00000001
c03500a4 00000046 c03500a8 c0110f66 ddce8170 ddce8170 010bdf60 dd749eec
Call Trace:
[<c011f454>] __do_softirq+0xb4/0xc0
[<c0110f66>] smp_apic_timer_interrupt+0xe6/0x150
[<c0104b1a>] apic_timer_interrupt+0x1a/0x20
[<c02a42d1>] schedule+0x71/0x640
[<c0104b1a>] apic_timer_interrupt+0x1a/0x20
[<c0123ad0>] process_timeout+0x0/0x10
[<c0123218>] del_singleshot_timer_sync+0x8/0x30
[<c02a4f79>] schedule_timeout+0x69/0xc0
[<c0104a98>] common_interrupt+0x18/0x20
[<c0123ad0>] process_timeout+0x0/0x10
[<c0259d15>] netdev_wait_allrefs+0x55/0x110
[<c0259efc>] netdev_run_todo+0x12c/0x240
[<e08aa1a0>] ppp_asynctty_close+0x0/0x100 [ppp_async]
[<e08b4c5b>] ppp_shutdown_interface+0x8b/0xf0 [ppp_generic]
[<e08b2082>] ppp_release+0x52/0x60 [ppp_generic]
[<c0152a56>] __fput+0x106/0x120
[<c015129f>] filp_close+0x4f/0x80
[<c01040d9>] sysenter_past_esp+0x52/0x71

Profile (it's a dual-CPU box and the bug keeps only one CPU at 100%; I reset
the profile and let it run for 5-10 seconds before capturing this):

23165 total 0,0134
11517 default_idle 179,9531
4099 schedule 2,5619
3572 __mod_timer 6,3786
1867 del_timer 11,6687
1202 sched_clock 8,3472
508 schedule_timeout 2,6458
194 netdev_wait_allrefs 0,7132
157 del_singleshot_timer_sync 3,2708
13 __copy_user_intel 0,0739
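
The trace and profile above suggest that pppd is sitting in a wait loop
inside netdev_wait_allrefs(). The following user-space C sketch only
illustrates that kind of loop; it is not the kernel's implementation.
toy_wait_allrefs() and its parameters (max_polls, warn_every,
release_at) are made up for this example; the message text is copied
from the log lines quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a network device; NOT the kernel's struct net_device. */
struct toy_dev {
    const char *name;
    int refcnt;
};

/*
 * Poll the reference count up to max_polls times. Every warn_every
 * polls that still find the device busy, print the warning seen in the
 * thread. release_at is the (hypothetical) poll on which the last
 * holder lets go; pass a negative value to model a leaked reference.
 *
 * Returns the number of warnings printed before the device became
 * free, or -1 if it never did.
 */
int toy_wait_allrefs(struct toy_dev *dev, int max_polls,
                     int warn_every, int release_at)
{
    int warnings = 0;

    for (int poll = 0; poll < max_polls; poll++) {
        /* Simulate some other subsystem dropping its reference. */
        if (poll == release_at && dev->refcnt > 0)
            dev->refcnt--;

        if (dev->refcnt == 0)
            return warnings;

        if ((poll + 1) % warn_every == 0) {
            printf("unregister_netdevice: waiting for %s to become free. "
                   "Usage count = %d\n", dev->name, dev->refcnt);
            warnings++;
        }
    }
    return -1;   /* still referenced: the caller would keep waiting */
}
```

With a reference that is never released (release_at < 0) the function
runs out of polls and returns -1, which corresponds to the endless
stream of "waiting for ppp0" lines: the loop itself is fine, it is
simply waiting for a put that never comes.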

Diego Calleja

2004-06-10 20:08:50

by Felipe Alfaro Solana

Subject: Re: 2.6.7-rc3: waiting for eth0 to become free

On Thu, 2004-06-10 at 08:36 -0700, Stephen Hemminger wrote:
> On Wed, 09 Jun 2004 17:18:02 +0200
> Felipe Alfaro Solana <[email protected]> wrote:
>
> > On Tue, 2004-06-08 at 14:02 -0700, Stephen Hemminger wrote:
> > > On Tue, 08 Jun 2004 22:09:29 +0200
> > > Felipe Alfaro Solana <[email protected]> wrote:
> > >
> > > > On Tue, 2004-06-08 at 12:42 -0700, Stephen Hemminger wrote:
> > > > > On Tue, 08 Jun 2004 21:18:30 +0200
> > > > > Felipe Alfaro Solana <[email protected]> wrote:
> > > > >
> > > > > > Hi!
> > > > > >
> > > > > > On my laptop, when using a CardBus 3c59x-based NIC, I need to run
> > > > > > "cardctl eject" so the system won't freeze when resuming. "cardctl
> > > > > > eject" worked fine in 2.6.7-rc2-mm2, even when there were programs with
> > > > > > > network sockets opened (for example, Evolution maintaining a connection
> > > > > > against an IMAP server): the card is ejected (well, not physically),
> > > > > > even when there are ESTABLISHED connections.
> > > > > >
> > > > > > However, starting with 2.6.7-rc3, "cardctl eject" hangs if a program
> > > > > > holds any socket open. After a while the "unregister_netdevice: waiting
> > > > > > for eth0 to become free" message starts appearing on the kernel message
> > > > > > ring. The only apparent solution is killing that program, ejecting the
> > > > > > card from its slot and wait until 3c59x.o usage count reaches zero.
> > > > > >
> > > > > > Can someone tell me what's going on here?
> > > > > > Thank you very much.
> > > > >
> > > > > What protocols are you running? Is IPV6 loaded?
> > > >
> > > > I'm using IPv4, IPv6 and IPSec ESP with AES/CBC.
> > > > Do you want .config?
> > >
> > > Not really, could you see if it is an IPv6 vs IPSec problem by not running/loading
> > > one or the other.
> > >
> > > What is happening is that some subsystem is holding a reference to the device (calling dev_hold())
> > > but not cleaning up (calling dev_put). It can be a hard to track which of the many
> > > things routing, etc are not being cleared properly. Look for routes that still
> > > get stuck (ip route) and neighbor cache entries. Most of these end up being
> > > protocol bugs.
> >
> > The two attached patches, one for net/ipv4/route.c, the other for net/
> > ipv6/route.c fix all my problems when running "cardctl eject" while a
> > program maintains an open network socket (ESTABLISHED).
> >
> > Both patches apply cleanly against 2.6.7-rc3 and 2.6.7-rc3-mm1.
> > I'm not completely sure what has changed in 2.6.7-rc3 that is breaking
> > cardctl for me, as it Just Worked(TM) fine in 2.6.7-rc2.
> >
> > Hope this can throw some light at this issue.
>
> Since you effectively remove rth->idev, why not remove it from the structure
> to make sure no code is still expecting it to be set?

What about the following one? Tested on 2.6.7-rc3-mm1.

Attachments:

ip-route-fix-refcount.patch (3.24 kB)