Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752163AbXJZL1U (ORCPT ); Fri, 26 Oct 2007 07:27:20 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751920AbXJZL1F (ORCPT ); Fri, 26 Oct 2007 07:27:05 -0400 Received: from extu-mxob-2.symantec.com ([216.10.194.135]:47263 "EHLO extu-mxob-2.symantec.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751101AbXJZL1D (ORCPT ); Fri, 26 Oct 2007 07:27:03 -0400 Date: Fri, 26 Oct 2007 12:26:08 +0100 (BST) From: Hugh Dickins X-X-Sender: hugh@blonde.wat.veritas.com To: Neil Brown cc: Pekka Enberg , Erez Zadok , Ryan Finnie , Andrew Morton , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, cjwatson@ubuntu.com, linux-mm@kvack.org Subject: Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland In-Reply-To: <18209.19021.383347.160126@notabene.brown> Message-ID: References: <200710142232.l9EMW8kK029572@agora.fsl.cs.sunysb.edu> <84144f020710150447o94b1babo8b6e6a647828465f@mail.gmail.com> <84144f020710221348x297795c0qda61046ec69a7178@mail.gmail.com> <18209.19021.383347.160126@notabene.brown> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4602 Lines: 97 On Fri, 26 Oct 2007, Neil Brown wrote: > On Thursday October 25, hugh@veritas.com wrote: > > The patch looks like it makes perfect sense to me. Great, thanks a lot for looking at it, Neil and Pekka. > Before the change, ->writepage could return AOP_WRITEPAGE_ACTIVATE > without unlocking the page, and this has precisely the effect of: > ClearPageReclaim(); (if the call path was through pageout) > SetPageActive(); (if the call was through shrink_page_list) > unlock_page(); > > With the patch, the ->writepage method does the SetPageActive and the > unlock_page, which on the whole seems cleaner. > > We seem to have lost a call to ClearPageReclaim - I don't know if that > is significant. It doesn't show up in the diff at all, but pageout() already has if (!PageWriteback(page)) { /* synchronous write or broken a_ops? */ ClearPageReclaim(page); } which will clear it since we've never set PageWriteback. I think no harm would come from leaving it set there, since it only takes effect in end_page_writeback (its effect being to let the just written page be moved to the end of the LRU, so that it will then be soon reclaimed), and clear_page_dirty_for_io clears it before coming down this way. But I'd never argue for that: I hate having leftover flags hanging around outside the scope of their relevance. > > Special, hidden, undocumented, secret hack! Then in 2.6.7 Andrew > > stole his own secret and used it when concocting ramdisk_writepage. > > Oh, and NFS made some kind of use of it in 2.6.6 only. Then Neil > > revealed the secret to the uninitiated in 2.6.17: now, what's the > > appropriate punishment for that? > > Surely the punishment should be for writing hidden undocumented hacks > in the first place! I vote we just make him maintainer for the whole > kernel - that will keep him so busy that he will never have a chance > to do it again :-) That is a splendid retort, which has won you absolution. But it makes me a little sad: that smiley should be a weepy. > > --- 2.6.24-rc1/Documentation/filesystems/vfs.txt 2007-10-24 07:15:11.000000000 +0100 > > +++ linux/Documentation/filesystems/vfs.txt 2007-10-24 08:42:07.000000000 +0100 > > @@ -567,9 +567,7 @@ struct address_space_operations { > > If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to > > try too hard if there are problems, and may choose to write out > > other pages from the mapping if that is easier (e.g. due to > > - internal dependencies). If it chooses not to start writeout, it > > - should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep > > - calling ->writepage on that page. > > + internal dependencies). > > > > It seems that the new requirement is that if the address_space > chooses not to write out the page, it should now call SetPageActive(). > If that is the case, I think it should be explicit in the > documentation - please? No, it's not the case; but you're right that I should add something there, to put an end to the idea. It'll be something along the lines of "You may notice shmem setting PageActive there, but please don't do that; or if you insist, be sure never to do so in the !wbc->for_reclaim case". The PageActive thing is for when a filesystem regrets that it even had a ->writepage (it replicates the behaviour of the writepage == NULL case or the VM_LOCKED SWAP_FAIL case or the !add_to_swap case, delaying the return of this page to writepage for as long as it can). It's done in shmem_writepage because shm_lock (equivalent to VM_LOCKED) is only discovered within that writepage, and no-swap is discovered there too. ramdisk does it too: I've not tried to understand ramdisk as Nick and Eric have, but it used to have no writepage, and would prefer to have no writepage, but appears to need one for some PageUptodate reasons. It's fairly normal for a filesystem to find that for some reason it cannot carry out a writepage on this page right now (in the reclaim case: the sync case demands action, IIRC); so it then simply does set_page_dirty and unlock_page and returns "success". I'll try to condense this down for the Doc when finalizing the patch; which I've still not yet tested properly - thanks for the eyes, but I can't submit it until I've checked in detail that it really gets to do what we think it does. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/