Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Sun, 20 Jan 2002 21:28:30 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Sun, 20 Jan 2002 21:28:22 -0500 Received: from CPEdeadbeef0000.cpe.net.cable.rogers.com ([24.100.234.67]:3332 "HELO coredump.sh0n.net") by vger.kernel.org with SMTP id ; Sun, 20 Jan 2002 21:28:09 -0500 Date: Sun, 20 Jan 2002 21:29:02 -0500 (EST) From: Shawn Starr To: Anton Altaparmakov cc: Hans Reiser , Matt , Rik van Riel , , Josh MacDonald Subject: Re: Possible Idea with filesystem buffering. In-Reply-To: <5.1.0.14.2.20020121010950.02672870@pop.cus.cam.ac.uk> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 21 Jan 2002, Anton Altaparmakov wrote: > [snip] > At 00:57 21/01/02, Hans Reiser wrote: > [snip] > > Would be best if VM told us if we really must write that page. > > In theory the VM should never call writepage unless the page must be writen > out... > > But I agree with you that it would be good to be able to distinguish the > two cases. I have been thinking about this a bit in the context of NTFS TNG > but I think that it would be better to have a generic solution rather than > every fs does their own copy of the same thing. I envisage that there is a > flush daemon which just walks around writing pages to disk in the > background (there could be one per fs, or a generic one which fs register > with, at their option they could have their own of course) in order to keep > the number of dirty pages low and in order to minimize data loss on the > event of system/power failure. > > This demon requires several interfaces though, with regards to journalling > fs. The daemon should have an interface where the fs can say "commit pages > in this list NOW and do not return before done", also a barrier operation > would be required in journalling context. A transactions interface would be > ideal, where the fs can submit whole transactions consisting of writing out > a list of pages and optional write barriers; e.g. write journal pages x, y, > z, barrier, write metadata, perhaps barrier, finally write data pages a, b, > c. Simple file systems could just not bother at all and rely on the flush > daemon calling the fs to write the pages. > > Obviously when this daemon writes pages the pages will continue being > there. OTOH, if the VM calls write page because it needs to free memory > then writepage must write and clean the page. > if they are dirty and written immediately to the disk they can be cleaned from the queue. It would be nice if there was some way to have a checksum verify the data was written back then wipe it from the queue. As an example: 5 operations requested, 2 already in queue. In queue) DIRTY write to disk (this task has been in the queue for a while) In queue) not 'old' memory but must be written to disk pending queue: 1) read operation 2) read operation 3) Write operation 4) write operation The daemon should resort the priority write dirty pages to disk then write nay other pages that are left on queue, then get to read pages. Notes: If there is only one operation in the queue (say write) and nothing else comes along, then the daemon should force-write the data back to disk after a period of timeout (the memory in the slot becomes dirty) If there's too many tasks in the queue and another one requires more memory then whats left in the buffer/cache the daemon could request to store the request in swap memory and put it in the queue, if the request is a write request it would have more priority then any read requests still and get completed quickly allowing for remaining queue events to complete. Example: ReiserFS: Operation A. Write (10K) Operation B. Read (200K) Operation C. Write (160K) XFS: Operation A. Read (63K) Operation B. Read (3k) Operation C. Write (10K) EXT3: Operation A. Write (290K) Operation B. Write (90K) Operation C. Read (3k) the kpagebuf (or whatever name). Would get all these requests and sort out what needs to be done first as long as there's buffer/cache memory free the write operations would be done as fast as possible, verified by some checksum and purged from the queue, If there's no cache/buffer memory free then all write queues reguardless of being in swap or cache/buffer need to be written to disk. So: kpagebuf queue (total available buffer/cache memory is say 512K) EXT3 Write (290K) ReiserFS Write (160K) ReiserFS Write (10K) XFS Write (10K) EXT3 Write (90K) - Goes in swap because total > 512K (Dirty x2 state) ReiserFS Read (200K) - Swap (dirty x2) XFS Read (63K) - Swap (dirty x2) XFS Read (3K) - Swap (dirty x2) EXT3 Read (3K) - Swap (dirty x2) * The daemon would check in order of filesystem registeration for whos should be in the read queue first. * The daemon should maximize amount of memory stored in bufeer/cache to try to prevent write requests having to go into swap. In the above queue, we have a lot of read operations and one write operation in swap. Clean out the write operations since they are now dirty (because there's no room for more operations in the buffer/cache). Move the swapped write operation to the top of the queue and get rid of it. Move the read operations from swap to queue since there is room again. ** NOTE ** because those read requests are now dirty they MUST be delt with or they'll get stuck in the queue with more write requests overtaking them. Maybe I've lost it but that's how I see it ;) Shawn. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/