Subject: Re: Strange block/scsi/workqueue issue
From: Steven Whitehouse
To: Tejun Heo
Cc: linux-kernel@vger.kernel.org, Jens Axboe, James Bottomley
In-Reply-To: <20110411171803.GG9673@mtj.dyndns.org>
References: <1302533763.2596.23.camel@dolmen>
	 <20110411171803.GG9673@mtj.dyndns.org>
Organization: Red Hat UK Ltd
Date: Mon, 11 Apr 2011 18:52:10 +0100
Message-ID: <1302544330.2596.31.camel@dolmen>

Hi,

On Tue, 2011-04-12 at 02:18 +0900, Tejun Heo wrote:
> Hello,
>
> (cc'ing James. The original message is
> http://lkml.org/lkml/2011/4/11/175 )
>
> Please read from the bottom up.
>
> On Mon, Apr 11, 2011 at 03:56:03PM +0100, Steven Whitehouse wrote:
> > [] schedule_timeout+0x295/0x310
> > [] wait_for_common+0x120/0x170
> > [] wait_for_completion+0x18/0x20
> > [] wait_on_cpu_work+0xec/0x100
> > [] wait_on_work+0xdb/0x150
> > [] __cancel_work_timer+0x83/0x130
> > [] cancel_delayed_work_sync+0xd/0x10
>
> 4. which in turn tries to sync cancel q->delay_work.  Oops, deadlock.
>
> > [] blk_sync_queue+0x24/0x50
>
> 3. and calls into blk_sync_queue()
>
> > [] blk_cleanup_queue+0xf/0x60
> > [] scsi_free_queue+0x9/0x10
> > [] scsi_device_dev_release_usercontext+0xeb/0x140
> > [] execute_in_process_context+0x86/0xa0
>
> 2. It triggers SCSI device release
>
> > [] scsi_device_dev_release+0x17/0x20
> > [] device_release+0x22/0x90
> > [] kobject_release+0x45/0x90
> > [] kref_put+0x37/0x70
> > [] kobject_put+0x27/0x60
> > [] put_device+0x12/0x20
> > [] scsi_request_fn+0xb9/0x4a0
> > [] __blk_run_queue+0x6a/0x110
> > [] blk_delay_work+0x26/0x40
>
> 1. Workqueue starting execution of q->delay_work and scsi_request_fn()
>    is run from there.
>
> > [] process_one_work+0x197/0x520
> > [] worker_thread+0x15c/0x330
> > [] kthread+0xa6/0xb0
> > [] kernel_thread_helper+0x4/0x10
>
> So, q->delay_work ends up waiting for itself.  I'd like to blame SCSI
> (as it also fits my agenda to kill execute_in_process_context ;-) for
> diving all the way into blk_cleanup_queue() directly from request_fn.
>
> Does the following patch fix the problem?
>
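Just to restate the cycle you describe in code terms, the shape of the
deadlock is roughly the sketch below (a minimal illustration of the
pattern only, not the actual block/SCSI code; "q" stands for the
request_queue being torn down):

	#include <linux/blkdev.h>
	#include <linux/workqueue.h>

	/* Sketch only: why q->delay_work ends up waiting on itself. */
	static void delay_work_fn(struct work_struct *work)
	{
		struct request_queue *q =
			container_of(to_delayed_work(work),
				     struct request_queue, delay_work);

		/* 1. The work item runs the request_fn ...          */
		/* 2. ... which drops the last device reference, so  */
		/*    the release path runs blk_cleanup_queue() ...  */
		/* 3. ... which calls blk_sync_queue() ...           */
		/* 4. ... which does:                                */
		cancel_delayed_work_sync(&q->delay_work);
		/*
		 * cancel_delayed_work_sync() waits for the running
		 * instance of this work item to finish -- but the
		 * running instance is this very function, so it
		 * blocks forever.
		 */
	}

In other words, any request_fn that can drop the last reference and
synchronously tear down its own queue will hit this. As to whether the
patch helps: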
Unfortunately not:

scsi 0:0:32:0: Enclosure         DP       BACKPLANE        1.07 PQ: 0 ANSI: 5
scsi 0:2:0:0: Direct-Access     DELL     PERC 6/i         1.22 PQ: 0 ANSI: 5
scsi 0:2:1:0: Direct-Access     DELL     PERC 6/i         1.22 PQ: 0 ANSI: 5
------------[ cut here ]------------
WARNING: at lib/kref.c:34 kref_get+0x2d/0x30()
Hardware name: PowerEdge R710
Modules linked in:
Pid: 12, comm: kworker/2:0 Not tainted 2.6.39-rc2+ #188
Call Trace:
 [] warn_slowpath_common+0x7a/0xb0
 [] warn_slowpath_null+0x15/0x20
 [] kref_get+0x2d/0x30
 [] kobject_get+0x1a/0x30
 [] get_device+0x14/0x20
 [] scsi_request_fn+0x37/0x4a0
 [] __blk_run_queue+0x6a/0x110
 [] blk_delay_work+0x26/0x40
 [] process_one_work+0x197/0x520
 [] ? process_one_work+0x131/0x520
 [] ? blk_make_request+0x90/0x90
 [] worker_thread+0x15c/0x330
 [] ? manage_workers.clone.20+0x240/0x240
 [] ? manage_workers.clone.20+0x240/0x240
 [] kthread+0xa6/0xb0
 [] kernel_thread_helper+0x4/0x10
 [] ? finish_task_switch+0x6f/0x110
 [] ? _raw_spin_unlock_irq+0x46/0x70
 [] ? retint_restore_args+0x13/0x13
 [] ? __init_kthread_worker+0x70/0x70
 [] ? gs_change+0x13/0x13
---[ end trace 3681e9da2630a94b ]---

Steve.
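PS: To decode the new warning: unless I'm misreading the 2.6.39-era
source, lib/kref.c:34 is the sanity check at the top of kref_get(),
which fires when a reference is taken on an object whose count has
already hit zero:

	/* From lib/kref.c around 2.6.39; the WARN_ON below is the
	 * line reported in the trace above. */
	void kref_get(struct kref *kref)
	{
		WARN_ON(!atomic_read(&kref->refcount));
		atomic_inc(&kref->refcount);
		smp_mb__after_atomic_inc();
	}

So with the patch applied, the get_device() in scsi_request_fn() seems
to be taking a reference on a device whose last reference has already
been dropped -- the teardown race has moved rather than gone away.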