Subject: Re: Asterisk deadlocks since Kernel 4.1
From: Stefan Priebe
To: Florian Weimer
Cc: Thomas Gleixner, netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Wed, 18 Nov 2015 22:36:29 +0100
Message-ID: <564CEF5D.3080005@profihost.ag>
In-Reply-To: <564CEB0C.40006@redhat.com>

On 18.11.2015 at 22:18, Florian Weimer wrote:
> On 11/18/2015 09:23 PM, Stefan Priebe wrote:
>>
>> On 17.11.2015 at 20:43, Thomas Gleixner wrote:
>>> On Tue, 17 Nov 2015, Stefan Priebe wrote:
>>>> I now also have two gdb backtraces from two crashes:
>>>> http://pastebin.com/raw.php?i=yih5jNt8
>>>>
>>>> http://pastebin.com/raw.php?i=kGEcvH4T
>>>
>>> They don't tell me anything as I have no idea of the inner workings of
>>> asterisk. You might be better off talking to the asterisk folks to help
>>> you track down what that thing is waiting for, so we can actually look
>>> at a well defined area.
>>
>> The asterisk guys told me it's a livelock: asterisk is waiting for
>> getaddrinfo / recvmsg.
>>
>> Thread 2 (Thread 0x7fbe989c6700 (LWP 12890)):
>> #0  0x00007fbeb9eb487d in recvmsg () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x00007fbeb9ed4fcc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #2  0x00007fbeb9ed544a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #3  0x00007fbeb9e92007 in getaddrinfo () from /lib/x86_64-linux-gnu/libc.so.6
>
> Stefan,
>
> please try to get a backtrace with debugging information. It is likely
> that this is the make_request/__check_pf functionality in glibc, but it
> would be nice to get some certainty.

Sorry, here it is. What I'm wondering is why there is IPv6 stuff at all.
I don't have IPv6 except for link-local. Could it be this one?
https://bugzilla.redhat.com/show_bug.cgi?id=505105#c79

Thread 31 (Thread 0x7f295c011700 (LWP 26654)):
#0  0x00007f295de3287d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007f295de52fcc in make_request (fd=35, pid=26631, seen_ipv4=, seen_ipv6=, in6ai=, in6ailen=) at ../sysdeps/unix/sysv/linux/check_pf.c:119
#2  0x00007f295de5344a in __check_pf (seen_ipv4=0x7f295c00e85f, seen_ipv6=0x7f295c00e85e, in6ai=0x7f295c00e840, in6ailen=0x7f295c00e838) at ../sysdeps/unix/sysv/linux/check_pf.c:271
#3  0x00007f295de10007 in *__GI_getaddrinfo (name=0x7f295c00e8b0 "10.12.12.55", service=0x7f295c00e8bc "2135", hints=0x7f295c00e910, pai=0x7f295c00e908) at ../sysdeps/posix/getaddrinfo.c:2389
#4  0x000000000050287e in ast_sockaddr_resolve (addrs=0x7f295c00e9d0, str=0x7f295c00ea30 "10.12.12.55:2135", flags=0, family=2) at netsock2.c:268
#5  0x00007f2958963ba2 in ast_sockaddr_resolve_first_af (addr=0x7f29300591d8, name=0x7f295c00ea30 "10.12.12.55:2135", flag=0, family=2) at chan_sip.c:30689
#6  0x00007f2958963cb5 in ast_sockaddr_resolve_first_transport (addr=0x7f29300591d8, name=0x7f295c00ea30 "10.12.12.55:2135", flag=0, transport=1) at chan_sip.c:30720
#7  0x00007f29588fd3cc in set_destination (p=0x7f2930058cc8, uri=0x7f29300576e8 "sip:9052@10.12.12.55:2135;line=to7a729l") at chan_sip.c:10455
#8  0x00007f29588fe6e0 in reqprep (req=0x7f295c00fee0, p=0x7f2930058cc8, sipmethod=4, seqno=287, newbranch=1) at chan_sip.c:10778
#9  0x00007f295890a201 in transmit_state_notify (p=0x7f2930058cc8, state=1, full=1, timeout=0) at chan_sip.c:13259
#10 0x00007f29589141bb in cb_extensionstate (context=0x7f295c010cd0 "hints", exten=0x7f295c010c80 "9052QS", state=1, data=0x7f2930058cc8) at chan_sip.c:15117
#11 0x000000000050ebf6 in handle_statechange (datap=0x7f293acef830) at pbx.c:4972
#12 0x0000000000555f8e in tps_processing_function (data=0x1f24f28) at taskprocessor.c:327
#13 0x0000000000569280 in dummy_start (data=0x1ed76f0) at utils.c:1173
#14 0x00007f295d5dcb50 in start_thread (arg=) at pthread_create.c:304
#15 0x00007f295de3195d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#16 0x0000000000000000 in ?? ()

>
> Which glibc version do you use? Has it got a fix for CVE-2013-7423?
>
> So far, the only known cause for a hang in this place (that is, lack of
> return from recvmsg) is incorrect file descriptor use. (CVE-2013-7423
> is such an issue in glibc itself.) The kernel upgrade could change
> scheduling behavior, and the actual bug might have been latent before.
>
> Theoretically, recvmsg could also hang if the Netlink query was dropped
> by the kernel, or the final packet in the response was dropped. We
> never saw that happen, even under extreme load, but I didn't test with
> recent kernels.
>
> The glibc change Hannes mentioned won't detect the hang, but if there is
> incorrect file descriptor reuse going on, it is possible that the new
> assert catches it.
>
> Florian
>
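
To check the dropped-query theory outside of asterisk, a small standalone probe might help. The following is only a rough sketch, not the glibc code: it assumes that __check_pf issues an RTM_GETADDR dump over a NETLINK_ROUTE socket and reads replies until NLMSG_DONE, which is my understanding of check_pf.c. The function names, the 2 second timeout and the 64 KiB buffer are made up for illustration. Unlike the blocking recvmsg in the backtrace, it sets a receive timeout, so a lost reply shows up as an exception instead of a hang.

```python
import socket
import struct

# Constants from <linux/netlink.h> and <linux/rtnetlink.h>.
NLMSG_ERROR = 2
NLMSG_DONE = 3
RTM_GETADDR = 22
NLM_F_REQUEST = 0x001
NLM_F_DUMP = 0x300  # NLM_F_ROOT | NLM_F_MATCH

NLMSG_HDR = "=IHHII"  # struct nlmsghdr: len, type, flags, seq, pid
NLMSG_HDRLEN = struct.calcsize(NLMSG_HDR)


def build_getaddr_request(seq):
    """Build an RTM_GETADDR dump request (nlmsghdr + padded rtgenmsg)."""
    payload = struct.pack("Bxxx", socket.AF_UNSPEC)  # rtgen_family + padding
    header = struct.pack(NLMSG_HDR, NLMSG_HDRLEN + len(payload),
                         RTM_GETADDR, NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + payload


def parse_messages(data):
    """Yield (type, seq, payload) for each netlink message in a datagram."""
    offset = 0
    while offset + NLMSG_HDRLEN <= len(data):
        length, mtype, _flags, seq, _pid = struct.unpack_from(
            NLMSG_HDR, data, offset)
        if length < NLMSG_HDRLEN or offset + length > len(data):
            break  # truncated or malformed message
        yield mtype, seq, data[offset + NLMSG_HDRLEN:offset + length]
        offset += (length + 3) & ~3  # messages are 4-byte aligned


def dump_address_families(timeout=2.0, seq=1):
    """Return the address family of every address the kernel reports.

    Raises socket.timeout if the dump is not terminated by NLMSG_DONE
    within `timeout` seconds (hypothetical value, Linux only).
    """
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as sock:
        sock.bind((0, 0))
        sock.settimeout(timeout)
        sock.sendto(build_getaddr_request(seq), (0, 0))
        families = []
        while True:
            data = sock.recv(65536)
            for mtype, mseq, payload in parse_messages(data):
                if mseq != seq:
                    continue  # not a reply to our request
                if mtype == NLMSG_DONE:
                    return families
                if mtype == NLMSG_ERROR:
                    errno_value = -struct.unpack_from("=i", payload)[0]
                    raise OSError(errno_value, "netlink error reply")
                if payload:
                    families.append(payload[0])  # ifaddrmsg.ifa_family
```

Calling dump_address_families() in a loop on an affected host should either return a list of families each time (2 for AF_INET, 10 for AF_INET6, so link-local IPv6 addresses would appear here too) or raise socket.timeout if a reply really goes missing. It cannot reproduce incorrect file descriptor use inside asterisk, so a clean run would only argue against the dropped-reply explanation.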