From: Peter Zijlstra Subject: Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode Date: Sun, 15 Jul 2007 21:21:03 +0200 Message-ID: <1184527263.5284.110.camel@lappy> References: <1183275482.4010.133.camel@localhost.localdomain> <20070710163247.5c8bfa3f.akpm@linux-foundation.org> <20070713020529.1486491f.akpm@linux-foundation.org> <1184333621.20032.85.camel@twins> <20070713121259.20066d5b.akpm@linux-foundation.org> <20070713214746.GH23255@mami.zabbo.net> <1184504543.5284.96.camel@lappy> <20070715111109.a2775ae4.akpm@linux-foundation.org> Mime-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit Cc: Zach Brown , cmm@us.ibm.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org, Andy Whitcroft , Ingo Molnar To: Andrew Morton Return-path: In-Reply-To: <20070715111109.a2775ae4.akpm@linux-foundation.org> Sender: linux-fsdevel-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org On Sun, 2007-07-15 at 11:11 -0700, Andrew Morton wrote: > On Sun, 15 Jul 2007 15:02:23 +0200 Peter Zijlstra wrote: > > > On Fri, 2007-07-13 at 14:47 -0700, Zach Brown wrote: > > > > > Peter, do you have any interest in seeing how far we can get > > > at tracking lock_page()? I'm not holding my breath, but any little bit > > > would probably help. > > > > Would this be a valid report? > > > > ( /me goes hunt a x86_64 unwinder patch that will apply to this tree. > > These stacktraces are pain ) > > They are. lockdep reports are a pain too. It's still a struggle to > understand wtf they're trying to tell you. Mabe it's just me. It got you confused alright,.. > > ======================================================= > > [ INFO: possible circular locking dependency detected ] > > [ 2.6.22-rt3-dirty #34 > > ------------------------------------------------------- > > mount/1296 is trying to acquire lock: > > (&ei->truncate_mutex){--..}, at: [] ext3_get_blocks_handle+0x1a4/0x8f7 > > > > but task is already holding lock: > > (lock_page_0){--..}, at: [] generic_file_buffered_write+0x1ee/0x646 > > > > which lock already depends on the new lock. > > So, the offence is trying to acquire &ei->truncate_mutex while already holding lock_page_0. These traces show how the previous (reverse) dependancy came into being > > > > the existing dependency chain (in reverse order) is: > > > > -> #1 (lock_page_0){--..}: > > [] __lock_acquire+0xa72/0xc35 > > [] lock_acquire+0x48/0x61 > > [] add_to_page_cache_lru+0xe/0x23 > > [] add_to_page_cache+0x1de/0x2c1 > > [] add_to_page_cache_lru+0xe/0x23 > > [] find_or_create_page+0x4c/0x73 > > [] __getblk+0x118/0x23c > > [] __bread+0x6/0x9c > > [] read_block_bitmap+0x34/0x65 > > [] ext3_free_blocks_sb+0xec/0x3d4 > > [] ext3_free_blocks+0x2e/0x61 > > [] ext3_free_data+0xaa/0xda > > [] ext3_truncate+0x4d2/0x84e > > [] pagevec_lookup+0x17/0x1e > > [] truncate_inode_pages_range+0x1f4/0x323 > > [] add_preempt_count+0x14/0xe4 > > [] journal_stop+0x1fe/0x21d > > [] vmtruncate+0xa2/0xc0 > > [] inode_setattr+0x22/0x10a > > [] ext3_setattr+0x136/0x18f > > [] notify_change+0x10a/0x241 > > [] notify_change+0x128/0x241 > > [] do_truncate+0x56/0x7f > > [] do_truncate+0x61/0x7f > > [] get_write_access+0x3f/0x45 > > [] may_open+0x193/0x1af > > [] open_namei+0x2cb/0x63e > > [] rt_up_read+0x53/0x5c > > [] do_page_fault+0x479/0x7cc > > [] do_filp_open+0x1c/0x38 > > [] rt_spin_unlock+0x17/0x47 > > [] get_unused_fd+0xf9/0x107 > > [] do_sys_open+0x48/0xd5 > > [] system_call+0x7e/0x83 > > [] 0xffffffffffffffff > > I guess we're doing lock_page() against a blockdev pagecache page here > while holding truncate_mutex against some S_ISREG file. So this trace ( -> #1 ) shows how lock_page_0 became to depend on &ei->truncate_mutex ( -> #0 ). > > -> #0 (&ei->truncate_mutex){--..}: > > [] print_circular_bug_header+0xcc/0xd3 > > [] __lock_acquire+0x96e/0xc35 > > [] lock_acquire+0x48/0x61 > > [] ext3_get_blocks_handle+0x1a4/0x8f7 > > [] _mutex_lock+0x26/0x52 > > [] ext3_get_blocks_handle+0x1a4/0x8f7 > > [] find_usage_backwards+0xb0/0xd9 > > [] find_usage_backwards+0xb0/0xd9 > > [] debug_check_no_locks_freed+0x11d/0x129 > > [] trace_hardirqs_on_caller+0x115/0x138 > > [] lockdep_init_map+0xac/0x41f > > [] add_preempt_count+0x14/0xe4 > > [] ext3_get_block+0xc2/0xe4 > > [] __block_prepare_write+0x195/0x442 > > [] ext3_get_block+0x0/0xe4 > > [] block_prepare_write+0x1a/0x25 > > [] ext3_prepare_write+0xb2/0x17b > > [] generic_file_buffered_write+0x298/0x646 > > [] current_fs_time+0x3b/0x40 > > [] add_preempt_count+0x14/0xe4 > > [] __generic_file_aio_write_nolock+0x34f/0x3b9 > > [] put_lock_stats+0xe/0x2a > > [] generic_file_aio_write+0x4c/0xc4 > > [] generic_file_aio_write+0x61/0xc4 > > [] ext3_orphan_del+0x53/0x19f > > [] ext3_file_write+0x1c/0x9d > > [] do_sync_write+0xcc/0x10f > > [] autoremove_wake_function+0x0/0x2e > > [] get_lock_stats+0xe/0x3f > > [] lock_release_holdtime+0x41/0x4f > > [] put_lock_stats+0xe/0x2a > > [] sys_fchmod+0xa3/0xbd > > [] _mutex_unlock+0x17/0x20 > > [] vfs_write+0xb6/0x148 > > [] sys_write+0x48/0x74 > > [] system_call+0x7e/0x83 > > [] 0xffffffffffffffff > > Here we're taking a file's truncate_mutex while holding lock_page() against > one of its pages. This is the correct lock ranking, I suppose. > > This is one of those fairly common cross-inode notabugs, I suspect. So #0 and #1 are compatible, and build the dependancy: &ei->truncate_mutex lock_page_0 Then the part you cut out: > > stack backtrace: > > > > Call Trace: > > [] print_circular_bug_tail+0x69/0x72 > > [] print_circular_bug_header+0xcc/0xd3 > > [] __lock_acquire+0x96e/0xc35 > > [] lock_acquire+0x48/0x61 > > [] ext3_get_blocks_handle+0x1a4/0x8f7 > > [] _mutex_lock+0x26/0x52 Here we take: &ei->truncate_mutex > > [] ext3_get_blocks_handle+0x1a4/0x8f7 > > [] find_usage_backwards+0xb0/0xd9 > > [] find_usage_backwards+0xb0/0xd9 > > [] debug_check_no_locks_freed+0x11d/0x129 > > [] trace_hardirqs_on_caller+0x115/0x138 > > [] lockdep_init_map+0xac/0x41f > > [] add_preempt_count+0x14/0xe4 > > [] ext3_get_block+0xc2/0xe4 > > [] __block_prepare_write+0x195/0x442 > > [] ext3_get_block+0x0/0xe4 > > [] block_prepare_write+0x1a/0x25 > > [] ext3_prepare_write+0xb2/0x17b > > [] generic_file_buffered_write+0x298/0x646 while some path from generic_file_buffered_write() took: lock_page_0 probably __grab_cache_page() > > > > [] current_fs_time+0x3b/0x40 > > [] add_preempt_count+0x14/0xe4 > > [] __generic_file_aio_write_nolock+0x34f/0x3b9 > > [] put_lock_stats+0xe/0x2a > > [] generic_file_aio_write+0x4c/0xc4 > > [] generic_file_aio_write+0x61/0xc4 > > [] ext3_orphan_del+0x53/0x19f > > [] ext3_file_write+0x1c/0x9d > > [] do_sync_write+0xcc/0x10f > > [] autoremove_wake_function+0x0/0x2e > > [] get_lock_stats+0xe/0x3f > > [] lock_release_holdtime+0x41/0x4f > > [] put_lock_stats+0xe/0x2a > > [] sys_fchmod+0xa3/0xbd > > [] _mutex_unlock+0x17/0x20 > > [] vfs_write+0xb6/0x148 > > [] sys_write+0x48/0x74 > > [] system_call+0x7e/0x83 > > Shows the current stacktrace where we violate the previously established locking order.