Date: Tue, 29 May 2012 10:41:28 +0900
From: Tejun Heo
To: Asias He
Cc: Jens Axboe, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3] block: Mitigate lock unbalance caused by lock switching
Message-ID: <20120529014128.GF20954@dhcp-172-17-108-109.mtv.corp.google.com>
References: <20120528102214.GB15202@dhcp-172-17-108-109.mtv.corp.google.com> <1338255542-22247-1-git-send-email-asias@redhat.com>
In-Reply-To: <1338255542-22247-1-git-send-email-asias@redhat.com>

On Tue, May 29, 2012 at 09:39:01AM +0800, Asias He wrote:
> Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects externally
> supplied queue_lock before blk_drain_queue(). Switching the lock would
> introduce lock unbalance because threads which have taken the external
> lock might unlock the internal lock during the queue drain. This
> patch mitigates this by disconnecting the lock after the queue draining
> since queue draining makes a lot of request_queue users go away.
>
> However, please note, this patch only makes the problem less likely to
> happen. Anyone who still holds a ref might try to issue a new request on
> a dead queue after blk_cleanup_queue() finishes draining, so the lock
> unbalance might still happen in this case.
>
> =====================================
> [ BUG: bad unlock balance detected! ]
> 3.4.0+ #288 Not tainted
> -------------------------------------
> fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at:
> [] blk_queue_bio+0x2a2/0x380
> but there are no more locks to release!
>
> other info that might help us debug this:
> 1 lock held by fio/17706:
>  #0:  (&(&vblk->lock)->rlock){......}, at: [] get_request_wait+0x19a/0x250
>
> stack backtrace:
> Pid: 17706, comm: fio Not tainted 3.4.0+ #288
> Call Trace:
>  [] ? blk_queue_bio+0x2a2/0x380
>  [] print_unlock_inbalance_bug+0xf9/0x100
>  [] lock_release_non_nested+0x1df/0x330
>  [] ? dio_bio_end_aio+0x34/0xc0
>  [] ? bio_check_pages_dirty+0x85/0xe0
>  [] ? dio_bio_end_aio+0xb1/0xc0
>  [] ? blk_queue_bio+0x2a2/0x380
>  [] ? blk_queue_bio+0x2a2/0x380
>  [] lock_release+0xd9/0x250
>  [] _raw_spin_unlock_irq+0x23/0x40
>  [] blk_queue_bio+0x2a2/0x380
>  [] generic_make_request+0xca/0x100
>  [] submit_bio+0x76/0xf0
>  [] ? set_page_dirty_lock+0x3c/0x60
>  [] ? bio_set_pages_dirty+0x51/0x70
>  [] do_blockdev_direct_IO+0xbf8/0xee0
>  [] ? blkdev_get_block+0x80/0x80
>  [] __blockdev_direct_IO+0x55/0x60
>  [] ? blkdev_get_block+0x80/0x80
>  [] blkdev_direct_IO+0x57/0x60
>  [] ? blkdev_get_block+0x80/0x80
>  [] generic_file_aio_read+0x70e/0x760
>  [] ? __lock_acquire+0x215/0x5a0
>  [] ? aio_run_iocb+0x54/0x1a0
>  [] ? grab_cache_page_nowait+0xc0/0xc0
>  [] aio_rw_vect_retry+0x7c/0x1e0
>  [] ? aio_fsync+0x30/0x30
>  [] aio_run_iocb+0x66/0x1a0
>  [] do_io_submit+0x6f0/0xb80
>  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [] sys_io_submit+0x10/0x20
>  [] system_call_fastpath+0x16/0x1b
>
> Changes since v2: Update commit log to explain how the code is still
>                   broken even if we delay the lock switching after the drain.
> Changes since v1: Update commit log as Tejun suggested.
>
> Signed-off-by: Asias He

Acked-by: Tejun Heo

Thanks.

--
tejun
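
For readers following the thread, the unbalance described in the quoted
commit log can be reproduced in miniature. The sketch below is a userspace
toy, not kernel code and not taken from the patch: toy_lock, toy_queue and
simulate_submit() are hypothetical names standing in for the spinlocks,
struct request_queue and a submitter that races with queue cleanup. It
assumes only what the commit log and the lockdep report state, namely that
q->queue_lock is a pointer which cleanup repoints from the driver-supplied
lock to the internal q->__queue_lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy lock that only counts acquisitions; it is NOT a real spinlock. */
struct toy_lock {
    const char *name;
    int held;
};

/* Hypothetical stand-in for struct request_queue. */
struct toy_queue {
    struct toy_lock internal_lock;  /* plays the role of q->__queue_lock */
    struct toy_lock *queue_lock;    /* plays the role of q->queue_lock */
};

static void toy_lock_acquire(struct toy_lock *lock)
{
    assert(lock != NULL);
    lock->held++;
}

/* Returns 0 for a balanced release, -1 if the lock was never taken. */
static int toy_lock_release(struct toy_lock *lock)
{
    assert(lock != NULL);
    if (lock->held == 0) {
        printf("bad unlock balance: %s is not held\n", lock->name);
        return -1;
    }
    lock->held--;
    return 0;
}

/*
 * One submitter locks and unlocks through the queue_lock pointer.
 * If switch_while_held is nonzero, "cleanup" repoints queue_lock to the
 * internal lock in between, so the release hits a lock nobody holds.
 */
int simulate_submit(int switch_while_held)
{
    struct toy_lock driver_lock = { "driver lock", 0 };
    struct toy_queue q = { { "internal lock", 0 }, &driver_lock };

    toy_lock_acquire(q.queue_lock);        /* takes the driver lock */
    if (switch_while_held)
        q.queue_lock = &q.internal_lock;   /* lock switch during cleanup */
    return toy_lock_release(q.queue_lock); /* releases whatever it points to */
}
```

simulate_submit(0) returns 0, while simulate_submit(1) prints the complaint
and returns -1. That mirrors the report above: the task holds &vblk->lock
but tries to release &q->__queue_lock. Delaying the switch until after the
drain shrinks the window, but any holder of a queue reference can still hit
it, which is why the commit log calls the patch a mitigation.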