Date: Mon, 16 Apr 2018 18:06:31 +0200
From: Guillaume Morin
To: Jan Kara
Cc: Pavlos Parissis, stable@vger.kernel.org, decui@microsoft.com,
    jack@suse.com, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, mszeredi@redhat.com
Subject: Re: kernel panics with 4.14.X versions
Message-ID: <20180416160631.2jepytqz5phrg3g3@bender.morinfr.org>
In-Reply-To: <20180416144041.t2mt7ugzwqr56ka3@quack2.suse.cz>
References: <20180416132550.d25jtdntdvpy55l3@bender.morinfr.org>
    <20180416144041.t2mt7ugzwqr56ka3@quack2.suse.cz>

On 16 Apr 16:40, Jan Kara wrote:
> Can you please run RIP through ./scripts/faddr2line to see where exactly
> are we looping? I expect the loop iterating over marks to notify but better
> be sure.
>
> How easily can you hit this? Are you able to run debug kernels / inspect
> crash dumps when the issue occurs? Also testing with the latest mainline
> kernel (4.16) would be welcome whether this isn't just an issue with the
> backport of fsnotify fixes from Miklos.

I do have one proper kernel crash dump for one of the lockups we saw:

PID: 30407  TASK: ffff9584913b2180  CPU: 8  COMMAND: "python"
 #0 [ffff959cb7883d80] machine_kexec at ffffffff890561ff
 #1 [ffff959cb7883dd8] __crash_kexec at ffffffff890f6dde
 #2 [ffff959cb7883e90] panic at ffffffff89074f03
 #3 [ffff959cb7883f10] watchdog_timer_fn at ffffffff89117388
 #4 [ffff959cb7883f40] __hrtimer_run_queues at ffffffff890dc65c
 #5 [ffff959cb7883f88] hrtimer_interrupt at ffffffff890dcb76
 #6 [ffff959cb7883fd8] smp_apic_timer_interrupt at ffffffff89802f6a
 #7 [ffff959cb7883ff0] apic_timer_interrupt at ffffffff8980227d
--- ---
 #8 [ffffafa5c894f880] apic_timer_interrupt at ffffffff8980227d
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffffffff8a696820  RFLAGS: 00000002
    RAX: ffff95908f520c20  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: ffff959c83c4d000  RSI: 0000000000000000  RDI: ffffafa5c894f9f8
    RBP: 0000000053411000   R8: 0000000000000000   R9: ffff95908f520c48
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000001000
    R13: 0000000000001000  R14: 0000000000001000  R15: 0000000053410000
    ORIG_RAX: 0000000000000000  CS: 0000  SS: ffffffffffffff10
bt: WARNING: possibly bogus exception frame
 #9 [ffffafa5c894f928] fsnotify at ffffffff892293e7
#10 [ffffafa5c894f9e8] __fsnotify_parent at ffffffff89229686
#11 [ffffafa5c894fa48] __kernel_write at ffffffff891e9962
#12 [ffffafa5c894fa70] dump_emit at ffffffff892445af
#13 [ffffafa5c894faa8] elf_core_dump at ffffffff8923f546
#14 [ffffafa5c894fc60] do_coredump at ffffffff89244c3f
#15 [ffffafa5c894fda0] get_signal at ffffffff89083ed0
#16 [ffffafa5c894fe18] do_signal at ffffffff89028323
#17 [ffffafa5c894ff10] exit_to_usermode_loop at ffffffff8900308c
#18 [ffffafa5c894ff38] prepare_exit_to_usermode at ffffffff89003753
    RIP: 00007f69706935c3  RSP: 00007ffeb8c1b4a8  RFLAGS: 00010206
    RAX: 00007f686d200034  RBX: 00005591f24f0170  RCX: 00007f68cb800000
    RDX: 00007f696d200000  RSI: 0000000000000061  RDI: 00007f686d200034
    RBP: 00007f686d200010   R8: ffffffffffffffff   R9: 00000000000000ff
    R10: 00000000e0a9a400  R11: 0000000000000246  R12: 0000000100000000
    R13: 0000000100000000  R14: 0000000000000000  R15: 0000000000000083
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

faddr2line gives "fsnotify at fs/notify/fsnotify.c:368" (it's a 4.14.22).
So it does seem that you were right about the location.

This happens with systemd handling coredumps. It's using fsnotify to learn
about new dumps.

Note that on this machine, the dumps are on a loop mount:
/dev/loop0 /usr/cores ext4 rw,relatime,data=ordered 0 0

-- 
Guillaume Morin
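
For anyone comparing several dumps of this kind, here is a minimal sketch that pulls the frame list out of a crash "bt" listing like the one in this message. It assumes the frame format seen here ("#N [stack-address] symbol at address"). The names FRAME_RE and parse_bt_frames are hypothetical helpers made up for illustration and are not part of crash or of the kernel scripts. It only lists symbols and addresses; it does not resolve source lines the way faddr2line does.

```python
import re

# Matches crash "bt" frame lines such as:
#   #9 [ffffafa5c894f928] fsnotify at ffffffff892293e7
# (assumed format, taken from the dump in this message)
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+"
    r"(?P<symbol>\S+)\s+at\s+(?P<addr>[0-9a-f]+)"
)


def parse_bt_frames(text):
    """Return one (index, symbol, address) tuple per frame found in text."""
    return [
        (int(m.group("index")), m.group("symbol"), int(m.group("addr"), 16))
        for m in FRAME_RE.finditer(text)
    ]


# Three frames copied from the dump, used as sample input.
sample = """
 #9 [ffffafa5c894f928] fsnotify at ffffffff892293e7
#10 [ffffafa5c894f9e8] __fsnotify_parent at ffffffff89229686
#11 [ffffafa5c894fa48] __kernel_write at ffffffff891e9962
"""

for index, symbol, addr in parse_bt_frames(sample):
    print(f"#{index:<2} {symbol} {addr:#x}")
```

Because the pattern is applied with finditer over the whole text, it does not depend on each frame sitting on its own line, so it also copes with dumps whose line breaks were lost in transit.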