2009-04-22 05:58:00

by Alexander V. Lukyanov

Subject: 2.6.29.1: unregister_netdevice problem

From time to time I get an increased load average without apparent reason.
When I reboot the server in such a case, I get infinitely repeating
messages on the console:

unregister_netdevice: waiting for eth0.2 to become free. Usage count = 4

eth0.2 is a vlan interface, eth0 is 02:00.0 Ethernet controller: Realtek
Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet
controller (rev 01)

--
Alexander.


2009-04-23 23:31:23

by Eric W. Biederman

Subject: Re: 2.6.29.1: unregister_netdevice problem

"Alexander V. Lukyanov" <[email protected]> writes:

> From time to time I get an increased load average without apparent reason.
> When I reboot the server in such a case, I get infinitely repeating
> messages on the console:
>
> unregister_netdevice: waiting for eth0.2 to become free. Usage count = 4
>
> eth0.2 is a vlan interface, eth0 is 02:00.0 Ethernet controller: Realtek
> Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet
> controller (rev 01)

CC: netdev, where someone might have a better clue.

Infinitely repeating unregister_netdevice messages means something
isn't releasing its reference count to your network device.
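
A simplified userspace model of what Eric describes: unregistration cannot finish until the device's usage count drops to zero, so the unregistering task keeps polling and printing the warning quoted above. All names here (struct fake_netdev, dev_hold_sim(), dev_put_sim(), wait_allrefs_sim()) are hypothetical. The real loop is netdev_wait_allrefs() in net/core/dev.c, which in kernels of this era sleeps between polls, warns roughly every 10 seconds and rebroadcasts the unregister notification instead of ever giving up; this sketch gives up after max_polls only so that it terminates.

```c
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a network device. */
struct fake_netdev {
    const char *name;
    int refcnt;              /* usage count; unregister must wait for 0 */
};

/* Take and drop a reference (simplified: no atomics, no locking). */
static void dev_hold_sim(struct fake_netdev *dev) { dev->refcnt++; }
static void dev_put_sim(struct fake_netdev *dev) { dev->refcnt--; }

/*
 * Poll until the usage count reaches zero, printing a warning on each
 * poll that still finds references. on_poll (may be NULL) models the
 * holders getting a chance to drop their references between polls.
 * Returns the number of warnings printed, or -1 if still referenced
 * after max_polls polls (the real kernel loop would wait forever).
 */
static int wait_allrefs_sim(struct fake_netdev *dev, int max_polls,
                            void (*on_poll)(struct fake_netdev *))
{
    int warnings = 0;

    for (int i = 0; i < max_polls; i++) {
        if (dev->refcnt == 0)
            return warnings;
        printf("unregister_netdevice: waiting for %s to become free. "
               "Usage count = %d\n", dev->name, dev->refcnt);
        warnings++;
        if (on_poll)
            on_poll(dev);
    }
    return dev->refcnt == 0 ? warnings : -1;
}
```

If every holder eventually calls dev_put_sim(), the loop ends after a few warnings; if one reference is leaked (as with "Usage count = 4" above), the count never reaches zero and the message repeats indefinitely.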

There really isn't enough information in your email to figure out
what you were doing, or what piece of code triggered this.

Eric

2009-04-24 21:23:11

by Bruno Prémont

Subject: Re: 2.6.29.1: unregister_netdevice problem

On Thu, 23 April 2009 [email protected] (Eric W. Biederman) wrote:
> "Alexander V. Lukyanov" <[email protected]> writes:
>
> > From time to time I get an increased load average without apparent reason.
> > When I reboot the server in such a case, I get infinitely repeating
> > messages on the console:
> >
> > unregister_netdevice: waiting for eth0.2 to become free. Usage
> > count = 4
> >
> > eth0.2 is a vlan interface, eth0 is 02:00.0 Ethernet controller:
> > Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit
> > Ethernet controller (rev 01)
>
> CC: netdev, where someone might have a better clue.
>
> Infinitely repeating unregister_netdevice messages means something
> isn't releasing its reference count to your network device.
>
> There really isn't enough information in your email to figure out
> what you were doing, or what piece of code triggered this.


A few similar cases I have encountered were related to
vlan and netconsole.

If you attempt to rmmod the driver of a network interface on top of
which you have a vlan or netconsole set up, you end up with this
kind of lockup.

At least these two do not react to removal notifications and thus
keep the network device referenced.
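
A minimal sketch of the pattern Bruno describes, under a simplified model: stacked users (standing in for a vlan or netconsole) each hold a reference on the lower device, and an unregister event is broadcast to all of them the way a netdevice notifier chain would deliver NETDEV_UNREGISTER. A user that handles the event drops its reference; one that ignores it keeps the device pinned. Every name here (sim_dev, sim_user, sim_notify_all(), SIM_NETDEV_UNREGISTER) is hypothetical and only illustrates the idea, not the actual vlan or netconsole code.

```c
#include <stddef.h>

/* Stands in for the NETDEV_UNREGISTER notifier event. */
enum sim_event { SIM_NETDEV_UNREGISTER };

struct sim_dev {
    const char *name;
    int refcnt;                 /* usage count of the lower device */
};

/* Something stacked on top of a lower device (e.g. a vlan). */
struct sim_user {
    const char *name;
    struct sim_dev *lower;      /* non-NULL while we hold a reference */
    int handles_unregister;     /* 1 = drops its reference on unregister */
};

static void sim_user_attach(struct sim_user *u, struct sim_dev *dev)
{
    u->lower = dev;
    dev->refcnt++;              /* like dev_hold() */
}

/* Per-user notifier callback. */
static void sim_user_event(struct sim_user *u, enum sim_event ev,
                           struct sim_dev *dev)
{
    if (ev != SIM_NETDEV_UNREGISTER || u->lower != dev)
        return;
    if (!u->handles_unregister)
        return;                 /* buggy user: reference is never dropped */
    u->lower = NULL;
    dev->refcnt--;              /* like dev_put() */
}

/* Deliver an event to every user, like walking a notifier chain. */
static void sim_notify_all(struct sim_user *users, size_t n,
                           enum sim_event ev, struct sim_dev *dev)
{
    for (size_t i = 0; i < n; i++)
        sim_user_event(&users[i], ev, dev);
}
```

After the broadcast, the well-behaved user has let go, but the one that ignores the event still holds a reference, so an unregister waiting for the count to reach zero would wait forever.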

Bruno

2009-04-27 05:42:22

by Alexander V. Lukyanov

Subject: Re: 2.6.29.1: unregister_netdevice problem

On Wed, Apr 22, 2009 at 09:57:35AM +0400, Alexander V. Lukyanov wrote:
> unregister_netdevice: waiting for eth0.2 to become free. Usage count = 4
>
> eth0.2 is a vlan interface, eth0 is 02:00.0 Ethernet controller: Realtek
> Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet
> controller (rev 01)

OK, now I manually ran 'vconfig rem eth0.2' and I get these repeated messages.
How do I find out what exactly is holding the interface?

I have killed most non-kernel processes, and eth0.2 is still in use. LA=1,
but the CPU is idle.

top - 09:34:26 up 4 days, 23:55, 2 users, load average: 1.00, 1.12, 3.55
Tasks: 56 total, 1 running, 55 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3356308k total, 1165044k used, 2191264k free, 443536k buffers
Swap: 3212920k total, 712k used, 3212208k free, 493568k cached

--
Alexander.

2009-04-28 12:58:35

by Alexander V. Lukyanov

Subject: Re: 2.6.29.1: unregister_netdevice problem

On Mon, Apr 27, 2009 at 09:41:03AM +0400, Alexander V. Lukyanov wrote:
> On Wed, Apr 22, 2009 at 09:57:35AM +0400, Alexander V. Lukyanov wrote:
> > unregister_netdevice: waiting for eth0.2 to become free. Usage count = 4
> >
> > eth0.2 is a vlan interface, eth0 is 02:00.0 Ethernet controller: Realtek
> > Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet
> > controller (rev 01)
>
> OK, now I manually ran 'vconfig rem eth0.2' and I get these repeated messages.
> How do I find out what exactly is holding the interface?

BTW, it looks like all vlan interfaces (I have many of them) have a similar
problem when it happens (every few days):

unregister_netdevice: waiting for eth0.907 to become free. Usage count = 20

and when I run a `vconfig rem', I cannot run another one. It s(t)ucks.

--
Alexander.

2009-04-28 20:51:36

by Jarek Poplawski

Subject: Re: 2.6.29.1: unregister_netdevice problem

Alexander V. Lukyanov wrote, On 04/28/2009 02:57 PM:

> On Mon, Apr 27, 2009 at 09:41:03AM +0400, Alexander V. Lukyanov wrote:
>> On Wed, Apr 22, 2009 at 09:57:35AM +0400, Alexander V. Lukyanov wrote:
>>> unregister_netdevice: waiting for eth0.2 to become free. Usage count = 4
>>>
>>> eth0.2 is a vlan interface, eth0 is 02:00.0 Ethernet controller: Realtek
>>> Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet
>>> controller (rev 01)
>> OK, now I manually ran 'vconfig rem eth0.2' and I get these repeated messages.
>> How do I find out what exactly is holding the interface?
>
> BTW, it looks like all vlan interfaces (I have many of them) have a similar
> problem when it happens (every few days):
>
> unregister_netdevice: waiting for eth0.907 to become free. Usage count = 20
>
> and when I run a `vconfig rem', I cannot run another one. It s(t)ucks.
>


Do you mean that if you wait a bit longer (until the first one is really removed)
before running another one, it doesn't s(t)uck? Is there a difference compared
to, e.g., 2.6.28?

Jarek P.

2009-04-29 05:45:43

by Alexander V. Lukyanov

Subject: Re: 2.6.29.1: unregister_netdevice problem

On Tue, Apr 28, 2009 at 10:49:58PM +0200, Jarek Poplawski wrote:
> Do you mean that if you wait a bit longer (until the first one is really removed)

The problem is that it is never removed. I waited for at least 30 minutes.

> before running another one, it doesn't s(t)uck? Is there a difference compared
> to, e.g., 2.6.28?

I mean that 'vconfig rem' on another interface gets stuck if the previous
vconfig has not finished, and it never finishes. So I cannot check whether
other vlan interfaces have the same 'refcnt' problem. So far I have tried
'vconfig rem' on two of my dozen vlan interfaces.

Again, the problem only shows up after I notice a surprisingly high LA (30
instead of 2-6) and interactive slowness. When that happens, I check network,
disk, CPU and memory usage; they are normal or even lower than usual.

The server is running transparent squid and named. The previous kernel
version was 2.6.27.21 and it did not have this problem.

--
Alexander.

2009-04-29 09:11:37

by Jarek Poplawski

Subject: Re: 2.6.29.1: unregister_netdevice problem

On Wed, Apr 29, 2009 at 09:45:10AM +0400, Alexander V. Lukyanov wrote:
> On Tue, Apr 28, 2009 at 10:49:58PM +0200, Jarek Poplawski wrote:
> > Do you mean that if you wait a bit longer (until the first one is really removed)
>
> The problem is that it is never removed. I waited for at least 30 minutes.
>
> > before running another one, it doesn't s(t)uck? Is there a difference compared
> > to, e.g., 2.6.28?
>
> I mean that 'vconfig rem' on another interface gets stuck if the previous
> vconfig has not finished, and it never finishes. So I cannot check whether
> other vlan interfaces have the same 'refcnt' problem. So far I have tried
> 'vconfig rem' on two of my dozen vlan interfaces.
>
> Again, the problem only shows up after I notice a surprisingly high LA (30
> instead of 2-6) and interactive slowness. When that happens, I check network,
> disk, CPU and memory usage; they are normal or even lower than usual.
>
> The server is running transparent squid and named. The previous kernel
> version was 2.6.27.21 and it did not have this problem.

So it looks like a regression. Alas, this could be hard to debug, and more
data is needed. To begin with: .config, dmesg, and a few SysRq logs taken
while this happens, e.g. Alt-PrtScr with t, d, w, q (gzipped or as
attachments to a bugzilla report). (If it's not a big problem, trying
2.6.28.9 could be helpful too.)
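
On a remote server where Alt-PrtScr is not reachable, the same SysRq commands can be issued by root through /proc/sysrq-trigger, the programmatic equivalent of `echo t > /proc/sysrq-trigger`; the output lands in the kernel log, where dmesg can read it. The helper below is a hypothetical sketch: sysrq_send() is not an existing tool, the path is a parameter only so it can be tried on an ordinary file, and it writes one key per open/write because, as far as I know, the trigger in kernels of this era acts only on the first character of each write.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write each SysRq command key in `keys` to `path`,
 * one open/write/close per key. On a real system `path` would be
 * "/proc/sysrq-trigger" (root only). Returns the number of keys written,
 * or -1 on the first error.
 */
static int sysrq_send(const char *path, const char *keys)
{
    int sent = 0;

    for (size_t i = 0; keys[i] != '\0'; i++) {
        FILE *f = fopen(path, "w");     /* same as shell '>' redirection */
        if (!f)
            return -1;
        if (fputc(keys[i], f) == EOF) {
            fclose(f);
            return -1;
        }
        if (fclose(f) != 0)             /* flushes the single byte */
            return -1;
        sent++;
    }
    return sent;
}
```

For the logs requested here, something like sysrq_send("/proc/sysrq-trigger", "tdwq") run as root while the problem is happening, followed by saving dmesg, would collect the t, d, w and q dumps in one go.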

Thanks,
Jarek P.