From: Stefan Priebe
To: Florian Weimer
Cc: Thomas Gleixner , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Asterisk deadlocks since Kernel 4.1
Date: Wed, 18 Nov 2015 22:23:49 +0100
Message-ID: <564CEC65.4080901@profihost.ag>
In-Reply-To: <564CEB0C.40006@redhat.com>

On 18.11.2015 at 22:18, Florian Weimer wrote:
> On 11/18/2015 09:23 PM, Stefan Priebe wrote:
>>
>> On 17.11.2015 at 20:43, Thomas Gleixner wrote:
>>> On Tue, 17 Nov 2015, Stefan Priebe wrote:
>>>> I now also have two gdb backtraces from two crashes:
>>>> http://pastebin.com/raw.php?i=yih5jNt8
>>>>
>>>> http://pastebin.com/raw.php?i=kGEcvH4T
>>>
>>> They don't tell me anything as I have no idea of the inner workings of
>>> asterisk. You might be better off talking to the asterisk folks to help
>>> you track down what that thing is waiting for, so we can actually look
>>> at a well defined area.
>>
>> The asterisk guys told me it's a livelock: asterisk is waiting in
>> getaddrinfo / recvmsg.
>>
>> Thread 2 (Thread 0x7fbe989c6700 (LWP 12890)):
>> #0 0x00007fbeb9eb487d in recvmsg () from /lib/x86_64-linux-gnu/libc.so.6
>> #1 0x00007fbeb9ed4fcc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #2 0x00007fbeb9ed544a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>> #3 0x00007fbeb9e92007 in getaddrinfo () from /lib/x86_64-linux-gnu/libc.so.6
>
> Stefan,
>
> please try to get a backtrace with debugging information. It is likely
> that this is the make_request/__check_pf functionality in glibc, but it
> would be nice to get some certainty.
>
> Which glibc version do you use? Has it got a fix for CVE-2013-7423?

It's Debian's 2.13-38+deb7u8.

Debian's issue tracker says it is fixed:
https://security-tracker.debian.org/tracker/CVE-2013-7423

> So far, the only known cause for a hang in this place (that is, lack of
> return from recvmsg) is incorrect file descriptor use. (CVE-2013-7423
> is such an issue in glibc itself.) The kernel upgrade could change
> scheduling behavior, and the actual bug might have been latent before.
>
> Theoretically, recvmsg could also hang if the Netlink query was dropped
> by the kernel, or the final packet in the response was dropped. We
> never saw that happen, even under extreme load, but I didn't test with
> recent kernels.

The load is very low on this system. Just 30 phones and only 1-6 calling.

> The glibc change Hannes mentioned won't detect the hang, but if there is
> incorrect file descriptor reuse going on, it is possible that the new
> assert catches it.
>
> Florian
>

Stefan
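
To make the "final packet in the response was dropped" case above concrete, here is a minimal Python sketch. It is not glibc's code. It assumes a reader that, in the style of a Netlink dump, keeps calling recv until it sees NLMSG_DONE. An AF_UNIX datagram socketpair stands in for the Netlink socket so the sketch runs without touching the kernel's Netlink layer. The helper names (nlmsg, read_dump, simulate) are hypothetical, and the receive timeout is added only so the demonstration terminates; the backtrace above shows a recvmsg that never returned.

```python
import socket
import struct

# Constants from <linux/netlink.h> and <linux/rtnetlink.h>.
NLMSG_DONE = 3
RTM_NEWADDR = 20

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSG_HDR = struct.Struct("=IHHII")


def nlmsg(msg_type, seq, payload=b""):
    """Build one Netlink-style message, padded to a 4-byte boundary."""
    length = NLMSG_HDR.size + len(payload)
    padding = b"\0" * ((-length) % 4)
    return NLMSG_HDR.pack(length, msg_type, 0, seq, 0) + payload + padding


def read_dump(sock, seq, timeout=0.2):
    """Read datagrams until NLMSG_DONE arrives for `seq`.

    Returns ("done", n) or ("timeout", n), where n is the number of
    data messages seen. Without the timeout (an assumption added for
    this demo), a missing NLMSG_DONE would block the caller forever.
    """
    sock.settimeout(timeout)
    count = 0
    while True:
        try:
            data = sock.recv(65536)
        except socket.timeout:
            return ("timeout", count)
        offset = 0
        while offset + NLMSG_HDR.size <= len(data):
            length, msg_type, _flags, msg_seq, _pid = NLMSG_HDR.unpack_from(data, offset)
            if length < NLMSG_HDR.size:
                break  # malformed header, stop parsing this datagram
            if msg_seq == seq:
                if msg_type == NLMSG_DONE:
                    return ("done", count)
                count += 1
            offset += (length + 3) & ~3  # advance to next aligned message


def simulate(drop_final):
    """Send one address message, optionally omitting the final NLMSG_DONE."""
    sender, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sender.send(nlmsg(RTM_NEWADDR, seq=1, payload=b"addr"))
        if not drop_final:
            sender.send(nlmsg(NLMSG_DONE, seq=1))
        return read_dump(reader, seq=1)
    finally:
        sender.close()
        reader.close()


if __name__ == "__main__":
    print(simulate(drop_final=False))
    print(simulate(drop_final=True))
```

With the terminating message present the reader returns normally; with it dropped, only the added timeout gets the reader out of recv. The same loop would also wait indefinitely if another thread closed and reused the descriptor, which is the file descriptor misuse case described above, although this sketch does not model that.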