Hi,
When both ATM and Ethernet are compiled in, each creates its own
NEIGH/ARP table, and both tables are assigned to family AF_INET.
    int neigh_add(....) {
        ...
        for (tbl=neigh_tables; tbl; tbl = tbl->next) {
            if (tbl->family != ndm->ndm_family)
                continue;
            ...
        }
        ...
    }
As the ATM table is created before the Ethernet (main?) table, the
net/core/neighbour.c::neigh_add() function adds all permanent IP ARP
Ethernet NUD entries to the IP ATM table, which is wrong.
Therefore, when net/core/neighbour.c::neigh_ifdown() is called, the ARP
entries are not cleared, leaving dev->refcnt at a value that can never
reach 0.
So, when net/core/dev.c::unregister_netdevice() is called, it stalls
without being able to destroy the interface, leaving the system with no
working network tools.
This is really easy to reproduce:
    openvpn --mktun --dev tap10
    ip addr add 10.20.30.20/24 dev tap10
    ip link set up dev tap10
    ip neighbour add 10.20.30.40 lladdr 01:02:03:04:05:06 nud permanent dev tap10
    ip link set down dev tap10
    openvpn --rmtun --dev tap10
and then the kernel log starts filling up with:
    unregister_netdevice: waiting for tap10 to become free. Usage count = 2
    unregister_netdevice: waiting for tap10 to become free. Usage count = 2
    unregister_netdevice: waiting for tap10 to become free. Usage count = 2
    unregister_netdevice: waiting for tap10 to become free. Usage count = 2
I changed the family of the ATM table to AF_ATMPVC. Of course it fixes
the issue, but I guess this is the wrong way to fix it.
Best regards,
Sylvain
Hi,
On Wed, Apr 07, 2010 at 10:23:39PM +0200, Sylvain Rochet wrote:
> Hi,
>
> (...)
>
> I changed the family of the ATM table to AF_ATMPVC, of course it fixes
> the issue but I guess this is the wrong way to fix that.
I finally made a patch that follows what Linux 2.6 does, which consists
of having "netlink" and "no-netlink" tables.
Sylvain
Hi Sylvain,
indeed, you've hit a real bug. It reminds me of the sad days I
was forced to use IPoA over a USB modem to access the net. The
tiniest config error required a reboot to fix it :-/
Your fix looks right at first glance, but I'll review it deeper
before merging it, though it should be OK since 2.6 is similar.
BTW, is there any reason why you're stuck on 2.4? Are you using
some vendor-specific drivers which are not in 2.6, did you not
have the time to upgrade yet, or did you not find long enough
support for 2.6 releases? Or anything else?
I'm asking because whatever keeps users in 2.4 should be addressed
one way or another (probably via some doc to add in 2.4 BTW).
Regards,
Willy
Hi Willy,
On Tue, Apr 20, 2010 at 07:11:25AM +0200, Willy Tarreau wrote:
> Hi Sylvain,
>
> indeed, you've hit a real bug. It reminds me of the sad days I
> was forced to use IPoA over a USB modem to access the net. The
> tiniest config error required a reboot to fix it :-/
>
> Your fix looks right at first glance, but I'll review it deeper
> before merging it, though it should be OK since 2.6 is similar.
>
> BTW, is there any reason why you're stuck on 2.4? Are you using
> some vendor-specific drivers which are not in 2.6, did you not
> have the time to upgrade yet, or did you not find long enough
> support for 2.6 releases? Or anything else?
>
> I'm asking because whatever keeps users in 2.4 should be addressed
> one way or another (probably via some doc to add in 2.4 BTW).
Well, if it were up to me alone, this would be a 2.6 kernel. Actually,
one of our new xDSL collect providers uses Linux routers on the operator
customer edge, and they are still using 2.4 kernels. This is going to
change soon, but having discovered this bug, I could not leave it
uncorrected, even on the 2.4 kernel ;-)
I am not sure whether collect routers are also used outside France.
This is the server where PPP/L2TP tunnels or ATM VP/VC are terminated,
so that all operators use the same national networks and simply run PPP
tunnels from the xDSL customer to the Internet operator's router, using
RADIUS, PPPoE, PPPoA, L2TP, and the PPP protocol to authenticate, find,
and reach the endpoint.
By the way, the patch also fixes another issue: when an interface with
dynamic NUD entries is set to link down state, you have to wait for
those entries to expire before setting the interface back to link up
state. This follows from the same cause: dynamic NUD entries are not
cleared when neigh_ifdown() is called.
Regards,
Sylvain