Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755639AbYANRGd (ORCPT ); Mon, 14 Jan 2008 12:06:33 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1754586AbYANRGN (ORCPT ); Mon, 14 Jan 2008 12:06:13 -0500
Received: from styx.suse.cz ([82.119.242.94]:54563 "EHLO duck.suse.cz"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1753339AbYANRGL (ORCPT ); Mon, 14 Jan 2008 12:06:11 -0500
Date: Mon, 14 Jan 2008 18:06:09 +0100
From: Jan Kara
To: Zach Brown
Cc: Erez Zadok, linux-kernel@vger.kernel.org, ext3-users@redhat.com,
	Chris Mason, Peter Zijlstra, linux-fsdevel@vger.kernel.org
Subject: Re: lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66)
Message-ID: <20080114170609.GH4214@duck.suse.cz>
References: <200712242302.lBON2O8s011190@agora.fsl.cs.sunysb.edu>
	<477BF72B.4000608@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <477BF72B.4000608@oracle.com>
User-Agent: Mutt/1.5.16 (2007-06-09)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3345
Lines: 75

On Wed 02-01-08 12:42:19, Zach Brown wrote:
> Erez Zadok wrote:
> > Setting: ltp-full-20071031, dio01 test on ext3 with Linus's latest tree.
> > Kernel w/ SMP, preemption, and lockdep configured.
>
> This is a real lock ordering problem.  Thanks for reporting it.
>
> The updating of atime inside sys_mmap() orders the mmap_sem in the vfs
> outside of the journal handle in ext3's inode dirtying:
>
> > -> #1 (jbd_handle){--..}:
> >        [] __lock_acquire+0x9cc/0xb95
> >        [] lock_acquire+0x5f/0x78
> >        [] journal_start+0xee/0xf8
> >        [] ext3_journal_start_sb+0x48/0x4a
> >        [] ext3_dirty_inode+0x27/0x6c
> >        [] __mark_inode_dirty+0x29/0x144
> >        [] touch_atime+0xb7/0xbc
> >        [] generic_file_mmap+0x2d/0x42
> >        [] mmap_region+0x1e6/0x3b4
> >        [] do_mmap_pgoff+0x1fb/0x253
> >        [] sys_mmap2+0x9b/0xb5
> >        [] syscall_call+0x7/0xb
> >        [] 0xffffffff
>
> ext3_direct_IO() orders the journal handle outside of the mmap_sem that
> dio_get_page() acquires to pin pages with get_user_pages():
>
> > -> #0 (&mm->mmap_sem){----}:
> >        [] __lock_acquire+0x8bc/0xb95
> >        [] lock_acquire+0x5f/0x78
> >        [] down_read+0x3a/0x4c
> >        [] dio_get_page+0x4e/0x15d
> >        [] __blockdev_direct_IO+0x431/0xa81
> >        [] ext3_direct_IO+0x10c/0x1a1
> >        [] generic_file_direct_IO+0x124/0x139
> >        [] generic_file_direct_write+0x56/0x11c
> >        [] __generic_file_aio_write_nolock+0x33d/0x489
> >        [] generic_file_aio_write+0x58/0xb6
> >        [] ext3_file_write+0x27/0x99
> >        [] do_sync_write+0xc5/0x102
> >        [] vfs_write+0x90/0x119
> >        [] sys_write+0x3d/0x61
> >        [] sysenter_past_esp+0x5f/0xa5
> >        [] 0xffffffff
>
> Two fixes come to mind:
>
> 1) use something like Peter's ->mmap_prepare() to update atime before
> acquiring the mmap_sem.  ( http://lkml.org/lkml/2007/11/11/97 ).  I
> don't know if this would leave more paths which do a journal_start()
> while holding the mmap_sem.
>
> 2) rework ext3's dio to only hold the jbd handle in ext3_get_block().
> Chris has a patch for this kicking around somewhere but I'm told it has
> problems exposing old blocks in ordered data mode.
>
> Does anyone have preferences?  I could go either way.  I certainly don't
> like the idea of journal handles being held across the entirety of
> fs/direct-io.c.
> It's yet another case of O_DIRECT differing wildly from the buffered
> path :(.
  I've looked more into it and I think that 2) is the only way to go, since
transaction start ranks below page lock (standard buffered write path) and
page lock ranks below mmap_sem. So there is at least one more dependency
that requires mmap_sem to be taken before transaction start...

								Honza
-- 
Jan Kara
SUSE Labs, CR
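
The ordering argument above can be checked with a small user-space model. This is only an illustrative sketch, not kernel code and not how lockdep is implemented. The lock names are taken from the thread; the chain constants and the helpers build_graph() and find_cycle() are hypothetical names made up for this example. Each chain lists locks outermost first, and "ranks below" is read as "is acquired while the other is held".

```python
# Toy lock-order model for the dependencies discussed in this thread.
# Illustrative only: the helper names are invented for this sketch.
from collections import defaultdict

# Acquisition chains, outermost lock first (taken from the thread).
MMAP_ATIME = ["mmap_sem", "jbd_handle"]       # sys_mmap -> touch_atime -> journal_start
DIRECT_IO = ["jbd_handle", "mmap_sem"]        # ext3_direct_IO -> dio_get_page -> down_read
BUFFERED_WRITE = ["page_lock", "jbd_handle"]  # transaction start ranks below page lock
MMAP_THEN_PAGE = ["mmap_sem", "page_lock"]    # page lock ranks below mmap_sem


def build_graph(chains):
    """Turn acquisition chains into 'held lock -> lock taken next' edges."""
    graph = defaultdict(set)
    for chain in chains:
        for outer, inner in zip(chain, chain[1:]):
            graph[outer].add(inner)
    return graph


def find_cycle(graph):
    """Return one lock-order cycle as a list of lock names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = defaultdict(int)  # every lock starts out WHITE (unvisited)
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if state[nxt] == GREY:
                # Back edge to a lock already on the current path: inversion.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in sorted(graph):
        if state[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    today = [MMAP_ATIME, DIRECT_IO, BUFFERED_WRITE, MMAP_THEN_PAGE]
    # Fix 1: atime is updated before mmap_sem is taken, so MMAP_ATIME goes away.
    fix1 = [DIRECT_IO, BUFFERED_WRITE, MMAP_THEN_PAGE]
    # Fix 2: dio no longer holds the handle across dio_get_page(), so DIRECT_IO goes away.
    fix2 = [MMAP_ATIME, BUFFERED_WRITE, MMAP_THEN_PAGE]
    for name, chains in (("today", today), ("fix 1", fix1), ("fix 2", fix2)):
        print(name, "->", find_cycle(build_graph(chains)))
```

In this model, removing only the atime chain (fix 1) still leaves a cycle through the page lock, while removing the direct-IO chain (fix 2) leaves the graph acyclic, which matches the conclusion that 2) is the way to go.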