Date: Thu, 24 Dec 2015 11:51:49 +0100
From: Dominique Martinet
To: Andy Lutomirski
Cc: Al Viro, "linux-kernel@vger.kernel.org", V9FS Developers,
	Linux FS Devel
Subject: Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
Message-ID: <20151224105149.GA24863@nautica>
References: <20151207224643.GA10531@nautica>
 <20151208023331.GJ20997@ZenIV.linux.org.uk>
 <20151209062316.GA29917@nautica>
 <20151209064542.GW20997@ZenIV.linux.org.uk>

Andy Lutomirski wrote on Thu, Dec 17, 2015:
> This could be QEMU's analysis script screwing up. Is there a good way
> for me to gather more info?

I finally took some time to reproduce it (sorry for the delay).
Using your config, the virtme commit (17363c2) and kernel tag v4.4-rc3,
I was able to reproduce it just fine with my qemu (2.4.90).

Now for the fun bit... I ran it with a gdb server; attaching gdb and
running "cont" always 'unblocks' it.

Using the kernel gdb scripts (lx-ps; see the rough sketch appended
below) I see about 250 kworker threads running, and the backtraces all
look the same:

[   20.273945]  [] schedule+0x30/0x80
[   20.274644]  [] schedule_preempt_disabled+0x9/0x10
[   20.275539]  [] __mutex_lock_slowpath+0x107/0x2f0
[   20.276421]  [] ? lookup_fast+0xbe/0x320
[   20.277195]  [] mutex_lock+0x15/0x30
[   20.277916]  [] walk_component+0x1a7/0x270

So, given that it unblocks after attaching gdb + cont, I'm actually
thinking this might be a pure scheduling issue (e.g. the thread is
never re-scheduled, or something like that)? I can't see any task that
is not in schedule() in your sysrq task dump transcript either.

Not sure how to go about debugging that, to be honest. I've tried both
the default single virtual cpu and -smp 3 or 4, and both reproduce it;
cpu usage on the host is always low, so it doesn't look like there's
any busy-polling involved...

This is a pretty subtle bug we have there...

--
Dominique Martinet
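
For reference, the in-tree helpers behind lx-ps are plain gdb Python
(scripts/gdb/linux/tasks.py), so this kind of count is easy to script
from the same gdb session attached to qemu's gdb stub. Below is a
rough, untested sketch of what I mean; the "lx-kworkers" command and
the helper names are made up for illustration and are not part of the
kernel scripts.

import gdb

def offset_of(typ, member):
    # offsetof() via a null pointer of the given struct type
    return int(gdb.Value(0).cast(typ.pointer())[member].address)

def for_each_task():
    # Walk init_task.tasks (thread-group leaders only, which is enough
    # for kworkers) and yield each struct task_struct.
    task_type = gdb.lookup_type("struct task_struct")
    off = offset_of(task_type, "tasks")
    init_task = gdb.parse_and_eval("init_task")
    head = init_task["tasks"].address
    node = init_task["tasks"]["next"]
    while node != head:
        # container_of(node, struct task_struct, tasks)
        yield gdb.Value(int(node) - off).cast(task_type.pointer()).dereference()
        node = node["next"]

class LxKworkers(gdb.Command):
    """Count tasks whose comm starts with 'kworker' (rough lx-ps analogue)."""

    def __init__(self):
        super(LxKworkers, self).__init__("lx-kworkers", gdb.COMMAND_DATA)

    def invoke(self, arg, from_tty):
        nr = 0
        for task in for_each_task():
            if task["comm"].string().startswith("kworker"):
                nr += 1
        gdb.write("%d kworker threads\n" % nr)

LxKworkers()

Sourcing that next to vmlinux-gdb.py and running "lx-kworkers" should
print roughly the same count I eyeballed from the lx-ps output.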