Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935166AbXJPTHu (ORCPT ); Tue, 16 Oct 2007 15:07:50 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S964949AbXJPTHV (ORCPT ); Tue, 16 Oct 2007 15:07:21 -0400 Received: from ebiederm.dsl.xmission.com ([166.70.28.69]:55782 "EHLO ebiederm.dsl.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934252AbXJPTHU (ORCPT ); Tue, 16 Oct 2007 15:07:20 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Nick Piggin Cc: Andrew Morton , Christian Borntraeger , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Martin Schwidefsky , "Theodore Ts'o" Subject: Re: [PATCH] rd: Mark ramdisk buffers heads dirty References: <200710151028.34407.borntraeger@de.ibm.com> <200710161819.11231.nickpiggin@yahoo.com.au> Date: Tue, 16 Oct 2007 13:06:46 -0600 In-Reply-To: <200710161819.11231.nickpiggin@yahoo.com.au> (Nick Piggin's message of "Tue, 16 Oct 2007 18:19:10 +1000") Message-ID: User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3896 Lines: 87 Nick Piggin writes: > On Tuesday 16 October 2007 08:42, Eric W. Biederman wrote: >> I have not observed this case but it is possible to get a dirty page >> cache with clean buffer heads if we get a clean ramdisk page with >> buffer heads generated by a filesystem calling __getblk and then write >> to that page from user space through the block device. Then we just >> need to hit the proper window and try_to_free_buffers() will mark that >> page clean and eventually drop it. Ouch! >> >> To fix this use the generic __set_page_dirty_buffers in the ramdisk >> code so that when we mark a page dirty we also mark it's buffer heads >> dirty. > > Hmm, so we can also have some filesystems writing their own buffers > out by hand (clear_buffer_dirty, submit buffer for IO). Other places > will do similarly dodgy things with filesystem metadata > (fsync_buffers_list, for example). > > So your buffers get cleaned again, then your pages get cleaned. So I just took a little bit of time to look at and think about the path you are referring to, and I don't see a problem. The rule with the buffer dirty bit is that you first clear it and then you submit the write. When the write request finally makes it's way to rd.c we copy the data if necessary and call set_page_dirty. Which will then mark the page and the buffer dirty again. In essence the ramdisk code just attempts to lock buffers in ram by setting their dirty bit, just like we do for pages in ramfs. The only case where I see the dirty bit getting cleared without submitting I/O is when dirty state doesn't matter, in which case if we get a page full buffers all of whose data we don't care about it is legitimate to drop the page. As for ramdisk_writepage it probably makes sense to remove it, as the generic code paths seem to work as well or better the writepage method is NULL. However if we do keep it we should call set_page_dirty there on general principles. > While I said it was a good fix when I saw the patch earlier, I think > it's not closing the entire hole, and as such, Christian's patch is > probably the way to go for stable. I thought through the logic in try_to_free_buffers and it actually makes sense to me now. mark_buffer_dirty sets the page dirty bit so dirty buffers reside on dirty pages. When we submit I/O we aren't guaranteed to submit all of the dirty buffers for a page at once, so we don't clear the page dirty bit. With the result that we can end up with pages with the dirty bit set but all of their buffers are clean. Since we rather deliberately allow truly clean pages to be dropped from the ramdisk overriding try_to_free_buffer_pages looks wrong because then except for invalidate we can not remove buffers from truly clean pages. > For mainline, *if* we want to keep the old rd.c around at all, I > don't see any harm in this patch so long as Christian's is merged > as well. Sharing common code is always good. I do agree that with the amount of code duplication in the buffer cache that locking pages into the buffer cache seems very error prone, and difficult to maintain. So rewriting rd.c to store it's pages elsewhere sounds very promising. While I can see Christian's patch as fixing the symptom. I have a very hard time see it as correct. If we did something more elaborate to replace try_to_free_buffer_pages such that we could drop buffers from clean buffer cache pages when they became such and so were only suppressing the drop the dirty bit logic for pages that contain data we want to keep I would be ok with it. Just totally skipping buffer head freeing just feels wrong to me. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/