Date: Mon, 23 Jun 2008 01:24:15 +0100
From: Mel Gorman
To: Daniel J Blueman, Christoph Lameter, Linus Torvalds,
	Alexander Beregalov, Linux Kernel, xfs@oss.sgi.com
Subject: Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
Message-ID: <20080623002415.GB21597@csn.ul.ie>
References: <6278d2220806220256g674304ectb945c14e7e09fede@mail.gmail.com>
	<6278d2220806220258p28de00c1x615ad7b2f708e3f8@mail.gmail.com>
	<20080622221930.GA11558@disturbed>
In-Reply-To: <20080622221930.GA11558@disturbed>

On (23/06/08 08:19), Dave Chinner didst pronounce:
> [added xfs@oss.sgi.com to cc]
>
> On Sun, Jun 22, 2008 at 10:58:56AM +0100, Daniel J Blueman wrote:
> > I'm seeing a similar issue [2] to what was recently reported [1] by
> > Alexander, but with another workload involving XFS and memory
> > pressure.
> >
> > SLUB allocator is in use and config is at http://quora.org/config-client-debug .
> >
> > Let me know if you'd like more details/vmlinux objdump etc.
> >
> > Thanks,
> >  Daniel
> >
> > --- [1]
> >
> > http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/e673c9173d45a735/db9213ef39e4e11c
> >
> > --- [2]
> >
> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.26-rc7-210c #2
> > -------------------------------------------------------
> > AutopanoPro/4470 is trying to acquire lock:
> >  (iprune_mutex){--..}, at: [] shrink_icache_memory+0x7d/0x290
> >
> > but task is already holding lock:
> >  (&mm->mmap_sem){----}, at: [] do_page_fault+0x255/0x890
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 (&mm->mmap_sem){----}:
> >        [] __lock_acquire+0xbdd/0x1020
> >        [] lock_acquire+0x65/0x90
> >        [] down_read+0x3b/0x70
> >        [] do_page_fault+0x27c/0x890
> >        [] error_exit+0x0/0xa9
> >        [] 0xffffffffffffffff
> >
> > -> #1 (&(&ip->i_iolock)->mr_lock){----}:
> >        [] __lock_acquire+0xbdd/0x1020
> >        [] lock_acquire+0x65/0x90
> >        [] down_write_nested+0x46/0x80
> >        [] xfs_ilock+0x99/0xa0
> >        [] xfs_ireclaim+0x3f/0x90
> >        [] xfs_finish_reclaim+0x59/0x1a0
> >        [] xfs_reclaim+0x109/0x110
> >        [] xfs_fs_clear_inode+0xe1/0x110
> >        [] clear_inode+0x7d/0x110
> >        [] dispose_list+0x2a/0x100
> >        [] shrink_icache_memory+0x22f/0x290
> >        [] shrink_slab+0x168/0x1d0
> >        [] kswapd+0x3b6/0x560
> >        [] kthread+0x4d/0x80
> >        [] child_rip+0xa/0x12
> >        [] 0xffffffffffffffff
>
> You may as well ignore anything involving this path in XFS until
> lockdep gets fixed. The kswapd reclaim path is inverted over the
> synchronous reclaim path that is xfs_ilock -> run out of memory ->
> prune_icache and then potentially another -> xfs_ilock.
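
If I've followed the inversion correctly, the cycle lockdep is reporting
only exists at the lock-class level. The userspace sketch below is my own
restatement of it - every name in it (ilock, iprune, sync_reclaim,
kswapd_reclaim) is made up for illustration and is not the kernel's code -
but it shows two orderings that form a cycle by class while never being
able to deadlock on any single inode instance:

/*
 * Illustration only, not kernel code.  "sync reclaim" holds one inode's
 * ilock, enters prune-style reclaim (iprune) and then locks a *different*,
 * idle inode.  "kswapd" takes iprune first and then an inode lock.  By
 * class the order is ilock -> iprune and iprune -> ilock, which looks
 * circular, but no single inode is ever contended in both directions.
 */
#include <pthread.h>
#include <stdio.h>

struct inode { pthread_mutex_t ilock; };

static struct inode inode_a = { PTHREAD_MUTEX_INITIALIZER };
static struct inode inode_b = { PTHREAD_MUTEX_INITIALIZER };
static pthread_mutex_t iprune = PTHREAD_MUTEX_INITIALIZER;

static void *sync_reclaim(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&inode_a.ilock);	/* "xfs_ilock(A)"          */
	pthread_mutex_lock(&iprune);		/* ran out of memory,      */
	pthread_mutex_lock(&inode_b.ilock);	/* prune a different inode */
	pthread_mutex_unlock(&inode_b.ilock);
	pthread_mutex_unlock(&iprune);
	pthread_mutex_unlock(&inode_a.ilock);
	return NULL;
}

static void *kswapd_reclaim(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&iprune);		/* "shrink_icache_memory"  */
	pthread_mutex_lock(&inode_b.ilock);	/* "xfs_ilock(B)"          */
	pthread_mutex_unlock(&inode_b.ilock);
	pthread_mutex_unlock(&iprune);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, sync_reclaim, NULL);
	pthread_create(&t2, NULL, kswapd_reclaim, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("no deadlock by instance, but a cycle by lock class\n");
	return 0;
}

Run those two threads as often as you like and they never deadlock, yet a
class-based checker like lockdep has no choice but to report the cycle.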

In that case, have you any theory as to why this circular dependency is
being reported now but wasn't before 2.6.26-rc1? I'm beginning to wonder
if the bisection fingering the zonelist modification is just a
coincidence. Also, do you think the stalls were happening before but just
not being noticed?

> In this case, XFS can *never* deadlock because the second xfs_ilock
> is on a different, unreferenced, unlocked inode, but without turning
> off lockdep there is nothing in XFS that can be done to prevent
> this warning.
>
> There is a similar bug in the VM w.r.t. the mmap_sem in that the
> mmap_sem is held across a call to put_filp() which can result in
> inversions between the xfs_ilock and mmap_sem.
>
> Both of these cases cannot be solved by changing XFS - lockdep
> needs to be made aware of paths that can invert normal locking
> order (like prune_icache) so it doesn't give false positives
> like this.
>
> > -> #0 (iprune_mutex){--..}:
> >        [] __lock_acquire+0xa47/0x1020
> >        [] lock_acquire+0x65/0x90
> >        [] mutex_lock_nested+0xb5/0x300
> >        [] shrink_icache_memory+0x7d/0x290
> >        [] shrink_slab+0x168/0x1d0
> >        [] try_to_free_pages+0x268/0x3a0
> >        [] __alloc_pages_internal+0x206/0x4b0
> >        [] __alloc_pages_nodemask+0x9/0x10
> >        [] alloc_page_vma+0x72/0x1b0
> >        [] handle_mm_fault+0x462/0x7b0
> >        [] do_page_fault+0x30c/0x890
> >        [] error_exit+0x0/0xa9
> >        [] 0xffffffffffffffff
>
> This case is different in that it's complaining about mmap_sem vs
> iprune_mutex, so I think that we can pretty much ignore the XFS side
> of things here - the problem is higher level code....
>
> >  [] try_to_free_pages+0x268/0x3a0
> >  [] ? isolate_pages_global+0x0/0x40
> >  [] __alloc_pages_internal+0x206/0x4b0
> >  [] __alloc_pages_nodemask+0x9/0x10
> >  [] alloc_page_vma+0x72/0x1b0
> >  [] handle_mm_fault+0x462/0x7b0
>
> FWIW, should page allocation in a page fault be allowed to recurse
> into the filesystem? If I follow the spaghetti of inline and
> compiler inlined functions correctly, this is a GFP_HIGHUSER_MOVABLE
> allocation, right? Should we be allowing shrink_icache_memory()
> to be called at all in the page fault path?

Well, the page fault path is able to go to sleep and can enter direct
reclaim under low memory situations. Right now, I'm failing to see why a
page fault should not be allowed to reclaim pages in use by a filesystem
(a rough sketch of the gfp flags that make this legal is after my sig).
It was allowed before, so the question still is why the circular lock
warning appears now but didn't before.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
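
P.S. To answer my own gfp question a little more concretely, this is
roughly what 2.6.26 has (paraphrased from memory, so check the tree before
relying on it). The fault-path allocation carries __GFP_FS, and the icache
shrinker already bails out for allocations that are not allowed to
re-enter the filesystem:

/* include/linux/gfp.h, approximately */
#define GFP_HIGHUSER_MOVABLE	(__GFP_WAIT | __GFP_IO | __GFP_FS | \
				 __GFP_HARDWALL | __GFP_HIGHMEM | \
				 __GFP_MOVABLE)

/* fs/inode.c, paraphrased: refuse to prune when the caller's allocation
 * cannot recurse into the filesystem */
static int shrink_icache_memory(int nr, gfp_t gfp_mask)
{
	if (nr) {
		if (!(gfp_mask & __GFP_FS))
			return -1;
		prune_icache(nr);
	}
	return (inodes_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
}

So a GFP_HIGHUSER_MOVABLE fault is expected to be able to reach
prune_icache(); the only way to stop it would be to mask __GFP_FS in the
fault path, which would make it far less able to reclaim anything.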