2015-11-27 15:32:51

by Philipp Hahn

[permalink] [raw]
Subject: Strange PF_NETLINK NETLINK_ROUTE stall: netlink: Fix autobind race condition that leads to zero port ID

Hello,

I observed a strange stall in a multi-threaded process (originally
OpenLDAP slapd), which stalls during a call to getaddrinfo() after
upgrading the kernel from 3.16.x to 4.1.12:

> #0 0x00007f8e6d97b87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #1 0x00007f8e6d99bfbc in make_request (fd=12, pid=2871, seen_ipv4=<optimized out>, seen_ipv6=<optimized out>, in6ai=<optimized out>, in6ailen=<optimized out>) at ../sysdeps/unix/sysv/linux/check_pf.c:119
> #2 0x00007f8e6d99c42a in __check_pf (seen_ipv4=0x7f8e6a0969ff, seen_ipv6=0x7f8e6a0969fe, in6ai=0x7f8e6a0969e0, in6ailen=0x7f8e6a0969d8) at ../sysdeps/unix/sysv/linux/check_pf.c:272
> #3 0x00007f8e6d959007 in *__GI_getaddrinfo (name=0x7f8e6a096a80 "0.0.0.0", service=0x400d04 "7636", hints=0x7f8e6a096a50, pai=0x7f8e6a096a48) at ../sysdeps/posix/getaddrinfo.c:2389

(The glibc tries to get the routing table to see via netlink, if it
should support IPv4 and/or IPv6. Neither
<https://sourceware.org/bugzilla/show_bug.cgi?id=12926> nor
<https://sourceware.org/git/?p=glibc.git;a=commit;h=636064eb4c03397c86aa26e489e68f952bd5e53f>
fixed the problem.)

I've been able to reproduce it by writing the attached program, which
calls getaddrinfo() from 10 thread 10^6 times. On my 8-core Intel I see
several threads hanging in the above call after some seconds - you don't
have to wait for the process to terminate):

> gdb --batch --ex 'set pagination off' --ex 'thread apply all bt' -p `pidof a.out` | grep rec
...
> #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0 0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82

Using "git bisect start v4.1.11 v.4.10" I've been able to identify
4e27762417669cb459971635be550eb7b5598286 (v4.1.9-26-g4e27762), which is
a back-port of 1f770c0a09da855a2b51af6d19de97fb955eca85
(v4.2-11228-g1f770c0). I've attached the log.

> # git log -1 4e27762417669cb459971635be550eb7b5598286
> commit 4e27762417669cb459971635be550eb7b5598286
> Author: Herbert Xu <[email protected]>
> Date: Fri Sep 18 19:16:50 2015 +0800
>
> netlink: Fix autobind race condition that leads to zero port ID
>
> [ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ]
>
> The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
> Reset portid after netlink_insert failure") introduced a race
> condition where if two threads try to autobind the same socket
> one of them may end up with a zero port ID. This led to kernel
> deadlocks that were observed by multiple people.
>
> This patch reverts that commit and instead fixes it by introducing
> a separte rhash_portid variable so that the real portid is only set
> after the socket has been successfully hashed.
>
> Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
> Reported-by: Tejun Heo <[email protected]>
> Reported-by: Linus Torvalds <[email protected]>
> Signed-off-by: Herbert Xu <[email protected]>
> Signed-off-by: David S. Miller <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>

I verified 4e27762417669cb459971635be550eb7b5598286~1 does not show the
problem.

v4.1.13 still has the problem.
> # git log --oneline v4.1..v4.1.13 -- net/netlink/af_netlink.c
> e962218 netlink: Trim skb to alloc size to avoid MSG_TRUNC
> d486236 netlink: Replace rhash_portid with bound
> 4e27762 netlink: Fix autobind race condition that leads to zero port ID
> 65d48c6 netlink, mmap: transform mmap skb into full skb on taps
> d397617 netlink: make sure -EBUSY won't escape from netlink_insert
> b265c30 netlink: don't hold mutex in rcu callback when releasing mmapd ring

v4.3 does not show the problem.
> # git log --oneline v4.2..v4.3 -- net/netlink/af_netlink.c
> 47191d6 netlink: fix locking around NETLINK_LIST_MEMBERSHIPS
> db65a3a netlink: Trim skb to alloc size to avoid MSG_TRUNC
> da314c9 netlink: Replace rhash_portid with bound
> 1f770c0 netlink: Fix autobind race condition that leads to zero port ID
> 1853c94 netlink, mmap: transform mmap skb into full skb on taps
> 6bb0fef netlink, mmap: fix edge-case leakages in nf queue zero-copy
> a66e365 netlink, mmap: don't walk rx ring on poll if receive queue non-empty
> 0ef7077 netlink: rx mmap: fix POLLIN condition
> 7084a31 netlink: mmap: fix lookup frame position
> 0a6a3a2 netlink: add NETLINK_CAP_ACK socket option

v4.2 does not show the problem (as expected, as the patch isn't in 4.2.0
> # git log --oneline v4.2..v4.2.6 -- net/netlink/af_netlink.c
> 326ce2c netlink: Trim skb to alloc size to avoid MSG_TRUNC
> e2a3131 netlink: Replace rhash_portid with bound
> 6e32e73 netlink: Fix autobind race condition that leads to zero port ID
> 62f43b5 netlink, mmap: transform mmap skb into full skb on taps ring

I no longer have an idea where to look next, as something outside of
af_netlink.c seems to have fixed the issue for at least 4.3. Any idea?

Kernel: 4.1.12 (Debian)
GLIBC: 2.13-38+deb7u8
CPU: Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz

Thank you and a (hopefully) happy weekend.
Philipp
--
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
[email protected]

http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876


Attachments:
40059_glic-NL.c (2.04 kB)
40059_linux-NL.bisect (1.79 kB)
Download all attachments

2015-12-05 01:10:17

by Herbert Xu

[permalink] [raw]
Subject: Re: Strange PF_NETLINK NETLINK_ROUTE stall: netlink: Fix autobind race condition that leads to zero port ID

On Fri, Nov 27, 2015 at 04:23:54PM +0100, Philipp Hahn wrote:
>
> I observed a strange stall in a multi-threaded process (originally
> OpenLDAP slapd), which stalls during a call to getaddrinfo() after
> upgrading the kernel from 3.16.x to 4.1.12:

Can you try to see if the latest kernel still has this problem.
Unfortunately a number of fixes seem to not have made it into
4.1 so anything could happen there.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2015-12-05 14:16:45

by Philipp Hahn

[permalink] [raw]
Subject: Re: Strange PF_NETLINK NETLINK_ROUTE stall: netlink: Fix autobind race condition that leads to zero port ID

Hello,

On Sat, Dec 05, 2015 at 09:10:08AM +0800, Herbert Xu wrote:
> On Fri, Nov 27, 2015 at 04:23:54PM +0100, Philipp Hahn wrote:
> >
> > I observed a strange stall in a multi-threaded process (originally
> > OpenLDAP slapd), which stalls during a call to getaddrinfo() after
> > upgrading the kernel from 3.16.x to 4.1.12:
>
> Can you try to see if the latest kernel still has this problem.
> Unfortunately a number of fixes seem to not have made it into
> 4.1 so anything could happen there.

I also tried 4.1.13, if that is what you mean with latest; it also
hangs.

> Thanks,

Thanks go to you.

Philipp
--
/ / (_)__ __ ____ __ Philipp Hahn
/ /__/ / _ \/ // /\ \/ /
/____/_/_//_/\_,_/ /_/\_\ [email protected]