Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758338AbXJQVdO (ORCPT ); Wed, 17 Oct 2007 17:33:14 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753289AbXJQVc7 (ORCPT ); Wed, 17 Oct 2007 17:32:59 -0400 Received: from ebiederm.dsl.xmission.com ([166.70.28.69]:50418 "EHLO ebiederm.dsl.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753699AbXJQVc6 (ORCPT ); Wed, 17 Oct 2007 17:32:58 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Chris Mason Cc: Christian Borntraeger , Andrew Morton , Nick Piggin , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Martin Schwidefsky , "Theodore Ts'o" , stable@kernel.org Subject: Re: [PATCH] rd: Mark ramdisk buffers heads dirty References: <200710151028.34407.borntraeger@de.ibm.com> <200710160956.58061.borntraeger@de.ibm.com> <200710171814.01717.borntraeger@de.ibm.com> <1192648456.15717.7.camel@think.oraclecorp.com> <1192654481.15717.16.camel@think.oraclecorp.com> Date: Wed, 17 Oct 2007 15:30:19 -0600 In-Reply-To: <1192654481.15717.16.camel@think.oraclecorp.com> (Chris Mason's message of "Wed, 17 Oct 2007 16:54:41 -0400") Message-ID: User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3031 Lines: 71 Chris Mason writes: >> Thinking about it. I don't believe anyone has ever intentionally built >> a filesystem tool that depends on being able to modify a file systems >> metadata buffer heads while the filesystem is running, and doing that >> would seem to be fragile as it would require a lot of cooperation >> between the tool and the filesystem about how the filesystem uses and >> implement things. >> > > That's right. For example, ext2 is doing directories in the page cache > of the directory inode, so there's a cache alias between the block > device page cache and the directory inode page cache. > >> Now I guess I need to see how difficult a patch would be to give >> filesystems magic inodes to keep their metadata buffer heads in. > > Not hard, the block device inode is already a magic inode for metadata > buffer heads. You could just make another one attached to the bdev. > > But, I don't think I fully understand the problem you're trying to > solve? So the start: When we write buffers from the buffer cache we clear buffer_dirty but not PageDirty So try_to_free_buffers() will mark any page with clean buffer_heads that is not clean itself clean. The ramdisk set pages dirty to keep them from being removed from the page cache, just like ramfs. Unfortunately when those dirty ramdisk pages get buffers on them and those buffers all go clean and we are trying to reclaim buffer_heads we drop those pages from the page cache. Ouch! We can fix the ramdisk by setting making certain that buffer_heads on ramdisk pages stay dirty as well. The problem is this breaks filesystems like reiserfs and ext3 that expect to be able to make buffer_heads clean sometimes. There are other ways to solve this for ramdisks, such as changing where ramdisks are stored. However fixing the ramdisks this way still leaves the general problem that there are other paths to the filesystem metadata buffers, and those other paths cause the code to be complicated and buggy. So I'm trying to see if we can untangle this Gordian knot, so the code because more easily maintainable. To make the buffer cache a helper library instead of require infrastructure, it looks like two things need to happen. - Move metadata buffer heads off block devices page cache entries, - Communicate the mappings of data pages to block device sectors in a generic way without buffer heads. How we ultimately fix the ramdisk tends to depend on how we untangle the buffer head problem. Right now the only simple solution is to suppress try_to_free_buffers, which is a bit ugly. We can also come up with a completely separate store for the pages in the buffer cache but if we wind up moving the metadata buffer heads anyway then that should not be necessary. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/