Summary:
On 2.5.70 and later kernels, shutting down a pppoe connection causes
pppd to hang and results in a usage count stuck at 1.
Details:
I have a pppoe dsl connection and I use the roaring penguin stuff that
comes default with Mandrake 9. My connection is brought up at init
time. With kernels past 2.5.69, if I try and shut down the connection I
get logs as follows:
Jun 29 17:18:29 doug adsl-stop: Killing pppd
Jun 29 17:18:29 doug pppd[779]: Terminating on signal 15.
Jun 29 17:18:29 doug adsl-stop: Killing adsl-connect
Jun 29 17:18:29 doug pppd[779]: Connection terminated.
Jun 29 17:18:29 doug pppd[779]: Connect time 1.3 minutes.
Jun 29 17:18:29 doug pppd[779]: Sent 902 bytes, received 588 bytes.
Jun 29 17:18:32 doug pppoe[781]: Session 2991 terminated -- received PADT from peer
Jun 29 17:18:32 doug pppoe[781]: Sent PADT
Jun 29 17:18:39 doug kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
Jun 29 17:18:45 doug ntpd[1094]: sendto(132.246.168.148): Invalid argument
Jun 29 17:18:46 doug smbd[1510]: [2003/06/29 17:18:46, 0] smbd/server.c:open_sockets(238)
Jun 29 17:18:46 doug smbd[1510]: Got SIGHUP
Jun 29 17:18:46 doug smb: smbd -HUP succeeded
Jun 29 17:18:49 doug kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
Jun 29 17:19:29 doug last message repeated 4 times
Jun 29 17:20:39 doug last message repeated 7 times
Also, pppd is stuck in a busy-loop, and isn't killable even with -9.
Interestingly, top shows it in the R state. I thought that wasn't
supposed to happen?
With 2.5.69, the shutdown messages look like:
Jun 29 21:56:17 doug adsl-stop: Killing pppd
Jun 29 21:56:17 doug pppd[778]: Terminating on signal 15.
Jun 29 21:56:17 doug adsl-stop: Killing adsl-connect
Jun 29 21:56:17 doug pppd[778]: Connection terminated.
Jun 29 21:56:17 doug pppd[778]: Connect time 9.7 minutes.
Jun 29 21:56:17 doug pppd[778]: Sent 1510 bytes, received 588 bytes.
Jun 29 21:56:17 doug pppoe[781]: read (asyncReadFromPPP): Session 14: Input/output error
Jun 29 21:56:17 doug pppoe[781]: Sent PADT
Jun 29 21:56:17 doug pppd[778]: Exit.
The CPU is an Athlon XP; no modules are loaded.
One interesting tidbit is that this doesn't seem to happen if I remove
the dsl connection from init and do it manually later.
I did a quick scan of the ppp*.c files in drivers/net and these are the
ones with updates that went into 2.5.70.
Affected files are:
ppp_deflate.c 1.10
ppp_generic.c 1.25-1.30
ppp_synctty.c 1.9
Affected userids:
akpm
davem
paulus
torvalds
If anyone wants to propose a patch, I'm willing to try it out.
Thanks,
Chris
--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]
Hello Chris,
> Summary:
> On 2.5.70 and later kernels, shutting down a pppoe connection causes
> pppd to hang and results in a usage count stuck at 1.
>
> Details:
>
> I have a pppoe dsl connection and I use the roaring penguin stuff that
> comes default with Mandrake 9. My connection is brought up at init
> time. With kernels past 2.5.69, if I try and shut down the connection I
> get logs as follows:
>
> Jun 29 17:18:39 doug kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
Interestingly, I've got the same with device tun0 on my box, and
it appeared at the same time.
2.5.70 was really blocking as it even prevented a normal shutdown
of the box :-(
The problem is that there is now a counter of how many "instances" are
using a netdevice... and somehow the counter can be inc'ed, but for
ppp and tun it never seems to be dec'ed...
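To make the counter described above concrete, here is a minimal userspace sketch. It is only a model: struct fake_netdev, fake_hold(), fake_put(), use_device() and report_if_busy() are hypothetical names (the kernel's corresponding helpers are dev_hold() and dev_put()), and it shows how a single reference taken without a matching release leaves the usage count stuck at 1, the value seen in the logs above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model of a net device's usage counter.
 * In the kernel the equivalent operations are dev_hold()/dev_put(). */
struct fake_netdev {
    const char *name;
    int refcnt;
};

static void fake_hold(struct fake_netdev *dev)
{
    dev->refcnt++;
}

static void fake_put(struct fake_netdev *dev)
{
    assert(dev->refcnt > 0);    /* catch a put without a matching hold */
    dev->refcnt--;
}

/* One user of the device: takes a reference, uses the device, then
 * drops the reference, unless `buggy` models a forgotten put. */
static void use_device(struct fake_netdev *dev, int buggy)
{
    fake_hold(dev);
    /* ... transmit, read statistics, etc. ... */
    if (!buggy)
        fake_put(dev);
}

/* Prints a warning in the style of the kernel message while the
 * device is still referenced; returns 1 if busy, 0 if free. */
static int report_if_busy(const struct fake_netdev *dev)
{
    if (dev->refcnt == 0)
        return 0;
    printf("unregister_netdevice: waiting for %s to become free. "
           "Usage count = %d\n", dev->name, dev->refcnt);
    return 1;
}
```

Calling use_device(&dev, 1) once on a fresh device leaves refcnt at 1, and report_if_busy() then prints the familiar warning. Decrementing the counter each time the message is printed is roughly equivalent to calling fake_put() inside report_if_busy(): it silences the warning, but if the outstanding reference were a legitimate one, the device could be freed while still in use.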
I haven't found where it is, but I made a quick fix in my own kernel
tree to decrement the reference counter each time the message is
printed... This allows my box to stop bugging me with these messages,
and now it can shut down cleanly.
Sorry, I have no other clue as to where it is broken...
Regards,
Paul
On Mon, Jun 30, 2003 at 08:07:25AM +0200, Paul Rolland wrote:
> > Jun 29 17:18:39 doug kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
>
> Interestingly, I've got the same with device tun0 on my box, and
> it appeared at the same time.
> 2.5.70 was really blocking as it even prevented a normal shutdown
> of the box :-(
People with PCMCIA cards have been reporting the same thing. It sounds
like something's up with the netdev layer, and it has persisted until
2.5.73 thus far.
Note that it helps to post such messages to the linux-net lists; some
of the net people don't read lkml.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
From: Russell King <[email protected]>
Date: Mon, 30 Jun 2003 09:05:07 +0100
> People with PCMCIA cards have been reporting the same thing. It sounds
> like something's up with the netdev layer, and it has persisted until
> 2.5.73 thus far.
If there are bugs in PCMCIA drivers, they are _really_ going to show
now. The change is that 'rmmod' is allowed even if the device is
"up". We don't grab/drop module reference counts when the device is
brought up/down. We simply bring "up" net devices "down" at
unregister_netdevice() time.
So if a device is racy, it's going to be "really" racy now.
If people mention which devices give the problems (with current
kernels; we've fixed a lot of bugs lately), the drivers can
be audited for register/unregister bugs.
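A toy model of the change described above, under the assumption that a driver's only job at shutdown is to release what it took when the device was opened. The names (struct model_dev, model_open(), model_unregister(), good_stop(), buggy_stop()) are hypothetical, and the real unregister path does far more; the point is only the ordering: unregistration itself brings an "up" device down, so a driver whose stop path fails to release its references leaves a nonzero usage count behind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: no module reference is pinned while the
 * device is "up", so unregistration must bring the device down itself
 * before it can expect all users to have gone away. */
struct model_dev {
    bool up;
    int refcnt;
    void (*stop)(struct model_dev *dev);   /* driver's shutdown hook */
};

/* In this model, opening the device takes one reference. */
static void model_open(struct model_dev *dev)
{
    dev->refcnt++;
    dev->up = true;
}

static void good_stop(struct model_dev *dev)
{
    dev->refcnt--;          /* releases the reference taken at open */
}

static void buggy_stop(struct model_dev *dev)
{
    (void)dev;              /* forgets to release its reference */
}

/* Returns the usage count left after unregistration; anything other
 * than 0 corresponds to the "waiting for ... to become free" state. */
static int model_unregister(struct model_dev *dev)
{
    if (dev->up) {          /* bring an "up" device "down" first */
        dev->stop(dev);
        dev->up = false;
    }
    assert(dev->refcnt >= 0);
    return dev->refcnt;
}
```

With good_stop the count returns to 0; with buggy_stop it stays at 1, which in this model is all it takes to reproduce the stuck unregistration.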
On Mon, Jun 30, 2003 at 01:03:37AM -0700, David S. Miller wrote:
> From: Russell King <[email protected]>
> Date: Mon, 30 Jun 2003 09:05:07 +0100
>
> > People with PCMCIA cards have been reporting the same thing. It sounds
> > like something's up with the netdev layer, and it has persisted until
> > 2.5.73 thus far.
>
> If there are bugs in pcmcia drivers, they are _really_ going to show
> now. The change is that 'rmmod' is allowed even if the device is
> "up". We don't grab/drop module reference counts when the device is
> brought up/down. We simply bring "up" net devices "down" at
> unregister_netdevice() time.
>
> So if a device is racy, it's going to be "really" racy now.
>
> If people mention which devices give the problems (with current
> kernels; we've fixed a lot of bugs lately), the drivers can
> be audited for register/unregister bugs.
The thread I replied to is about pppoe devices, so it isn't limited to
PCMCIA, although that seems to be the most popular subset which causes
the problem.
Chris Friesen <[email protected]> wrote:
> Summary:
> On 2.5.70 and later kernels, shutting down a pppoe connection causes
> pppd to hang and results in a usage count stuck at 1.
John M Flinchbaugh <[email protected]> wrote:
> i still see it with both my 3c574_cs and my orinoco_cs in 2.5.73.
[email protected] wrote:
> I'm having some problems with 2.5.71 (latest bk yesterday I believe).
> All works well (pcmcia works as advertised, with one tiny blip on
> the horizon), except when I want to reboot, when I get the following
> message:
>
> unregister_netdevice: waiting for eth1 to become free. Usage count = 1
>
> The net device is an Orinoco mini-pci card (eg, cardbus minipci interface
> with built-in orinoco card), and it is down.
Hello,
> The thread I replied to is about pppoe devices, so it isn't
> limited to PCMCIA, although that seems to be the most popular
> subset which causes the problem.
>
I _do_ confirm this is definitely not related to PCMCIA, as far as
I'm concerned. I'm using a desktop machine with no PCMCIA support in
the kernel I built, not even as a module.
Regards,
Paul
Chris Friesen writes:
> I have a pppoe dsl connection and I use the roaring penguin stuff that
> comes default with Mandrake 9. My connection is brought up at init
> time. With kernels past 2.5.69, if I try and shut down the connection I
> get logs as follows:
Is this the user-mode pppoe or the in-kernel pppoe? IOW, are you
using the pppoe channel type, or do you have the usermode program that
runs pppd behind a pty?
And, do you have any TCP connections open over the link when you take
it down? What version of pppd is it?
Has anyone been able to replicate this without using pppoe? The type
of channel shouldn't make any difference, but I just tried ppp over a
pty and it worked fine (except that Deflate is broken, but that's
another problem).
I have DSL and I could connect it up to a system running 2.5. Maybe
I'll go try that now...
Paul.
I wrote:
> I have DSL and I could connect it up to a system running 2.5. Maybe
> I'll go try that now...
Just tried that... no problems at all. I connected twice, using the
rp-pppoe plugin for pppd that is in the PPP cvs tree together with the
in-kernel pppoe and pppox modules. Both times it shut down cleanly
without putting anything in the kernel logs.
Paul.
Paul Mackerras wrote:
> Is this the user-mode pppoe or the in-kernel pppoe? IOW, are you
> using the pppoe channel type, or do you have the usermode program that
> runs pppd behind a pty?
I believe it's the Roaring Penguin usermode one. I'm fairly sure PPPoE
isn't enabled in the kernel. I'm at work now, so it'll have to wait
till this evening to make sure.
> And, do you have any TCP connections open over the link when you take
> it down?
On at least some of the occasions there should have been no connections
open as the machine had just booted and the first thing I did after X
came up was to shutdown adsl.
> What version of pppd is it?
Not sure, will check later. Pretty sure it's the Mandrake 9 default.
> Has anyone been able to replicate this without using pppoe? The type
> of channel shouldn't make any difference, but I just tried ppp over a
> pty and it worked fine (except that Deflate is broken, but that's
> another problem).
Note that I can only reliably reproduce it if the dsl connection is
brought up at init time. If I don't bring it up automatically at init
but manually bring it up later, the problem doesn't seem to occur.
Chris
On Mon, 30 Jun 2003 10:02:42 -0400
Chris Friesen <[email protected]> wrote:
> Paul Mackerras wrote:
>
> > Is this the user-mode pppoe or the in-kernel pppoe? IOW, are you
> > using the pppoe channel type, or do you have the usermode program that
> > runs pppd behind a pty?
>
> I believe it's the Roaring Penguin usermode one. I'm fairly sure PPPoE
> isn't enabled in the kernel. I'm at work now, so it'll have to wait
> till this evening to make sure.
>
> > And, do you have any TCP connections open over the link when you take
> > it down?
>
> On at least some of the occasions there should have been no connections
> open as the machine had just booted and the first thing I did after X
> came up was to shutdown adsl.
>
> > What version of pppd is it?
>
> Not sure, will check later. Pretty sure it's the Mandrake 9 default.
>
> > Has anyone been able to replicate this without using pppoe? The type
> > of channel shouldn't make any difference, but I just tried ppp over a
> > pty and it worked fine (except that Deflate is broken, but that's
> > another problem).
>
> Note that I can only reliably reproduce it if the dsl connection is
> brought up at init time. If I don't bring it up automatically at init
> but manually bring it up later, the problem doesn't seem to occur.
>
> Chris
PPP did have problems keeping track of the tty until the latest round
of fixes (2.5.73+). The ppp_async module wasn't using owner fields as
required.
Also, see if bringing down the ppp connection with ifconfig
before attempting the rmmod helps. i.e.
ifconfig ppp0 down
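For reference, ifconfig brings an interface down by reading its flags with the SIOCGIFFLAGS ioctl, clearing IFF_UP, and writing them back with SIOCSIFFLAGS. A minimal C sketch of the same operation follows, assuming Linux and glibc headers; iface_down() is a hypothetical helper name, and the call needs root (CAP_NET_ADMIN) privileges to succeed.

```c
#define _DEFAULT_SOURCE         /* expose struct ifreq under strict modes */
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Clears IFF_UP on the named interface, roughly what
 * "ifconfig <name> down" does. Returns 0 on success, -1 on failure
 * (bad name, no such interface, or insufficient privileges). */
static int iface_down(const char *name)
{
    struct ifreq ifr;
    int fd, rc = -1;

    assert(name != NULL);
    if (strlen(name) >= IFNAMSIZ)
        return -1;

    /* An ordinary datagram socket is enough to issue interface ioctls. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) {   /* read current flags */
        ifr.ifr_flags &= ~IFF_UP;               /* clear the "up" bit */
        if (ioctl(fd, SIOCSIFFLAGS, &ifr) == 0) /* write them back */
            rc = 0;
    }
    close(fd);
    return rc;
}
```

Run as root, iface_down("ppp0") returns 0 once the flag is cleared; for an unknown interface or an unprivileged caller it returns -1.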
Stephen Hemminger wrote:
> PPP did have problems keeping track of the tty until the latest round
> of fixes (2.5.73+). The ppp_async module wasn't using owner fields as
> required.
bk-current as of last night still showed the same issues.
> Also, see if bringing down the ppp connection with ifconfig
> before attempting the rmmod helps. i.e.
> ifconfig ppp0 down
Will try.
Chris
Paul Mackerras wrote:
> Is this the user-mode pppoe or the in-kernel pppoe? IOW, are you
> using the pppoe channel type, or do you have the usermode program that
> runs pppd behind a pty?
Usermode, roaring penguin pppoe, version 3.5.
> And, do you have any TCP connections open over the link when you take
> it down? What version of pppd is it?
On at least some occasions, no connections open. pppd version 2.4.1
Chris
Well, I've upgraded to the latest 2.5.74 kernel and pppd version 2.4.2b3
(still using the rp-pppoe userspace software though).
Per Stephen's suggestion I also tried removing the ip address and
bringing down the ppp link before shutting down the adsl connection.
Makes no difference.
If I start a dsl connection at system init and then as soon as I get a
login prompt I shut the connection down, I get the following log:
Jul 3 00:59:26 doug adsl-stop: Killing pppd
Jul 3 00:59:26 doug pppd[779]: Terminating on signal 15.
Jul 3 00:59:26 doug adsl-stop: Killing adsl-connect
Jul 3 00:59:26 doug pppd[779]: Connection terminated.
Jul 3 00:59:26 doug pppd[779]: Connect time 1.5 minutes.
Jul 3 00:59:26 doug pppd[779]: Sent 978 bytes, received 588 bytes.
Jul 3 00:59:29 doug pppoe[781]: Session 511 terminated -- received PADT from peer
Jul 3 00:59:29 doug pppoe[781]: Sent PADT
Jul 3 00:59:36 doug kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
Jul 3 01:00:16 doug last message repeated 4 times
If I start the connection up manually after I'm booted, I get the following:
Jul 3 00:03:06 doug adsl-stop: Killing pppd
Jul 3 00:03:06 doug pppd[1763]: Terminating on signal 15.
Jul 3 00:03:06 doug adsl-stop: Killing adsl-connect
Jul 3 00:03:06 doug pppd[1763]: Connection terminated.
Jul 3 00:03:06 doug pppd[1763]: Connect time 0.4 minutes.
Jul 3 00:03:06 doug pppd[1763]: Sent 64 bytes, received 70 bytes.
Jul 3 00:03:06 doug pppoe[1769]: read (asyncReadFromPPP): Session 6990: Input/output error
Jul 3 00:03:06 doug pppoe[1769]: Sent PADT
Jul 3 00:03:06 doug pppd[1763]: Exit.
The main difference I see is that in the success case we don't seem to
be receiving a PADT message, but rather we get an error in asyncReadFromPPP.
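The comparison above can be turned into a small triage helper for sorting further reports. This is a hypothetical sketch: classify_shutdown() and the shutdown_kind values are invented names, and the only markers it looks for are message fragments that appear in the two logs above, so it may not recognise logs from other pppoe setups.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a shutdown log excerpt. */
enum shutdown_kind {
    SHUTDOWN_UNKNOWN,
    SHUTDOWN_CLEAN,   /* EIO from asyncReadFromPPP, then pppd "Exit." */
    SHUTDOWN_STUCK    /* kernel warns that the device never became free */
};

/* Returns 1 if any of the n lines contains `needle`. */
static int any_line_contains(const char *const *lines, size_t n,
                             const char *needle)
{
    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strstr(lines[i], needle) != NULL)
            return 1;
    return 0;
}

static enum shutdown_kind classify_shutdown(const char *const *lines,
                                            size_t n)
{
    /* The refcount warning wins: it means the device is still held. */
    if (any_line_contains(lines, n, "unregister_netdevice: waiting for"))
        return SHUTDOWN_STUCK;
    if (any_line_contains(lines, n, "asyncReadFromPPP") &&
        any_line_contains(lines, n, "Exit."))
        return SHUTDOWN_CLEAN;
    return SHUTDOWN_UNKNOWN;
}
```

Fed the init-time log above, it returns SHUTDOWN_STUCK; fed the manual-start log, SHUTDOWN_CLEAN.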
Any ideas where to look? For those on the netdev list who have just
tuned in, this started happening with 2.5.70.
Chris
Hello,
> Well, I've upgraded to the latest 2.5.74 kernel and pppd version 2.4.2b3
> (still using the rp-pppoe userspace software though).
>
> Per Stephen's suggestion I also tried removing the ip address and
> bringing down the ppp link before shutting down the adsl connection.
>
> Makes no difference.
>
>
To add to this topic: I've had the problem since 2.5.70, when
netdev_wait_allrefs was introduced in net/core/dev.c.
I have the same behavior using vtund, configured to create a tap0
interface.
At shutdown time, the interface refuses to be freed and I'm stuck.
Having vtund started at boot time (within the /etc/rc.d/... stuff)
or later doesn't make any difference.
Shutting down the interface before stopping the application or halting
the machine doesn't make any difference either.
The other problem is that, with the current implementation of
netdev_wait_allrefs, if you kill an application that is using a device
whose references are not correctly counted, the console you are
working on locks up.
For example, killing vtund triggers the printk(... unregister_netdevice ...),
and the console cannot be used any more until the counter
reaches 0 and the device is freed...
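The waiting behaviour described above can be sketched as a polling loop. This is a userspace model with a simulated millisecond clock, not the code in net/core/dev.c: wait_allrefs(), the refcnt_fn callback and the 250 ms poll interval are assumptions, while the 10 second warning period is taken from the spacing of the messages in Chris's logs (17:18:39, 17:18:49, then "repeated 4 times" over the next 40 seconds). The model adds a limit_ms bound so that it terminates; the reports above suggest the real wait has no such bound, which is why whatever triggered the unregistration never gets control back while the count is stuck.

```c
#include <assert.h>
#include <stdio.h>

#define POLL_MS      250    /* assumed poll interval */
#define WARN_MS    10000    /* warning period, matching the logs */

/* Hypothetical source of "the device's usage count at time now_ms". */
typedef int (*refcnt_fn)(long now_ms);

/* Waits until the count reaches 0 or limit_ms of simulated time has
 * elapsed. Returns the number of warnings printed. */
static int wait_allrefs(const char *name, refcnt_fn refcnt, long limit_ms)
{
    long now = 0, last_warn = 0;
    int warnings = 0;

    assert(limit_ms >= 0);
    while (refcnt(now) != 0 && now < limit_ms) {
        now += POLL_MS;                     /* "sleep" one poll period */
        if (now - last_warn >= WARN_MS) {
            printf("unregister_netdevice: waiting for %s to become free. "
                   "Usage count = %d\n", name, refcnt(now));
            last_warn = now;
            warnings++;
        }
    }
    return warnings;
}

/* Example count sources: one leaked reference, and one that is
 * released after 3 seconds of simulated time. */
static int stuck_at_one(long now_ms) { (void)now_ms; return 1; }
static int freed_at_3s(long now_ms)  { return now_ms < 3000 ? 1 : 0; }
```

With stuck_at_one and a 60 second limit the model prints six warnings, one every 10 seconds, and only stops because of the artificial limit; with freed_at_3s it returns after 3 seconds without printing anything.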
Paul