Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754531Ab1DKR3S (ORCPT ); Mon, 11 Apr 2011 13:29:18 -0400
Received: from 0122700014.0.fullrate.dk ([95.166.99.235]:53047 "EHLO kernel.dk"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754131Ab1DKR3Q (ORCPT ); Mon, 11 Apr 2011 13:29:16 -0400
Message-ID: <4DA33A68.7040707@fusionio.com>
Date: Mon, 11 Apr 2011 19:29:12 +0200
From: Jens Axboe
MIME-Version: 1.0
To: Tejun Heo
CC: Steven Whitehouse, "linux-kernel@vger.kernel.org", James Bottomley
Subject: Re: Strange block/scsi/workqueue issue
References: <1302533763.2596.23.camel@dolmen> <20110411171803.GG9673@mtj.dyndns.org>
In-Reply-To: <20110411171803.GG9673@mtj.dyndns.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2429
Lines: 64

On 2011-04-11 19:18, Tejun Heo wrote:
> Hello,
>
> (cc'ing James. The original message is
> http://lkml.org/lkml/2011/4/11/175 )
>
> Please read from the bottom up.
>
> On Mon, Apr 11, 2011 at 03:56:03PM +0100, Steven Whitehouse wrote:
>> [] schedule_timeout+0x295/0x310
>> [] wait_for_common+0x120/0x170
>> [] wait_for_completion+0x18/0x20
>> [] wait_on_cpu_work+0xec/0x100
>> [] wait_on_work+0xdb/0x150
>> [] __cancel_work_timer+0x83/0x130
>> [] cancel_delayed_work_sync+0xd/0x10
>
> 4. which in turn tries to sync cancel q->delay_work. Oops, deadlock.
>
>> [] blk_sync_queue+0x24/0x50
>
> 3. and calls into blk_sync_queue()
>
>> [] blk_cleanup_queue+0xf/0x60
>> [] scsi_free_queue+0x9/0x10
>> [] scsi_device_dev_release_usercontext+0xeb/0x140
>> [] execute_in_process_context+0x86/0xa0
>
> 2. It triggers SCSI device release
>
>> [] scsi_device_dev_release+0x17/0x20
>> [] device_release+0x22/0x90
>> [] kobject_release+0x45/0x90
>> [] kref_put+0x37/0x70
>> [] kobject_put+0x27/0x60
>> [] put_device+0x12/0x20
>> [] scsi_request_fn+0xb9/0x4a0
>> [] __blk_run_queue+0x6a/0x110
>> [] blk_delay_work+0x26/0x40
>
> 1. Workqueue starting execution of q->delay_work and scsi_request_fn()
>    is run from there.
>
>> [] process_one_work+0x197/0x520
>> [] worker_thread+0x15c/0x330
>> [] kthread+0xa6/0xb0
>> [] kernel_thread_helper+0x4/0x10
>
> So, q->delay_work ends up waiting for itself. I'd like to blame SCSI
> (as it also fits my agenda to kill execute_in_process_context ;-) for
> diving all the way into blk_cleanup_queue() directly from request_fn.
>
> Does the following patch fix the problem?

Thanks, that looks a lot saner. This is/was a time bomb waiting to blow
up.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
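
For anyone following the trace, here is a rough userspace sketch of the
self-wait that steps 1-4 describe. It is only a model: every model_* name
below is made up for illustration and none of it is kernel API. It assumes
a single worker and reports -EDEADLK at the point where the real
cancel_delayed_work_sync() would simply block forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are NOT kernel APIs. */
struct model_work;
typedef void (*model_work_fn)(struct model_work *w);

struct model_work {
	model_work_fn fn;
	bool pending;		/* queued but not yet started */
	int sync_result;	/* what the handler's self-cancel returned */
};

/* The work item the single model worker is executing right now. */
static struct model_work *model_current;

/*
 * Model of a synchronous cancel: the caller must wait until w has
 * finished running. If the caller is w's own handler, that wait can
 * never end. The model returns -EDEADLK there instead of blocking.
 */
int model_cancel_work_sync(struct model_work *w)
{
	assert(w != NULL);
	if (model_current == w)
		return -EDEADLK;	/* we would be waiting for ourselves */
	w->pending = false;		/* not running: just dequeue it */
	return 0;
}

/* Run one work item, in the role process_one_work() has in the trace. */
void model_process_one_work(struct model_work *w)
{
	assert(w != NULL && w->fn != NULL);
	w->pending = false;
	model_current = w;
	w->fn(w);
	model_current = NULL;
}

/*
 * Stand-in for the chain blk_delay_work() -> ... -> blk_sync_queue():
 * the handler ends up sync-cancelling the work item it is running as.
 */
void model_self_cancelling_fn(struct model_work *w)
{
	w->sync_result = model_cancel_work_sync(w);
}
```

Running model_self_cancelling_fn through model_process_one_work() hits the
-EDEADLK branch; the same cancel issued from outside the handler returns 0.
That is the whole point of the trace: the sync cancel is fine from any
context except the work item's own.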