Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Mon, 10 Feb 2003 05:42:44 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Mon, 10 Feb 2003 05:42:44 -0500 Received: from [213.86.99.237] ([213.86.99.237]:17878 "EHLO warthog.cambridge.redhat.com") by vger.kernel.org with ESMTP id ; Mon, 10 Feb 2003 05:42:41 -0500 From: David Howells To: torvalds@transmeta.com cc: Andrew Morton , dhowells@redhat.com, jgarzik@redhat.com, linux-kernel@vger.kernel.org Subject: extra PG_* bits for page->flags User-Agent: EMH/1.14.1 SEMI/1.14.4 (Hosorogi) FLIM/1.14.4 (=?ISO-8859-4?Q?Kashiharajing=FE-mae?=) APEL/10.4 Emacs/21.2 (i386-redhat-linux-gnu) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.4 - "Hosorogi") Content-Type: text/plain; charset=US-ASCII Date: Mon, 10 Feb 2003 10:51:07 +0000 Message-ID: <20459.1044874267@warthog.cambridge.redhat.com> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7260 Lines: 208 Hi Linus, I'm writing a cache filesystem for primarily for caching AFS pages, but that also can be used for caching other network FS pages (such as NFSv4, which Jeff Garzik and Trond Myklebust are interested in, I think). This will be journalled because if you've got an enourmous cache with all of /usr and all your data in it, you don't want to have to re-initialise it if power goes away (besides, it may have uncommitted data in it). So what I'd like is for page->flags to acquire two new flags: (*) PG_journal This indicates that a journalled operation in progress on the page. It is only cleared when the operation has it's operation marked as completed in the journal on disc. (*) PG_journalmark This indicates that the above journalling operation is still having its initial mark made in the journal on disc. It is cleared when the write for this completes. Anyone wishing to do a journalled operation to a block in the cachefs would: (1) Wait for the PG_journal to become clear (2) Ask for the journal to be marked (which would set both bits) (3) Wait for PG_journalmark to be cleared (4) Modify the page appropriately and write it to disc (5) Ask for the journal to be ACK'd Actually, step (1) would be done by step (2) in a manner similar to lock_page(). Would you be willing to accept such an addition? David diff -ur linux-2.5.59-orig/include/linux/page-flags.h linux-2.5.59/include/linux/page-flags.h --- linux-2.5.59-orig/include/linux/page-flags.h 2003-01-17 02:22:24.000000000 +0000 +++ linux-2.5.59/include/linux/page-flags.h 2003-02-09 18:40:03.000000000 +0000 @@ -74,6 +74,9 @@ #define PG_mappedtodisk 17 /* Has blocks allocated on-disk */ #define PG_reclaim 18 /* To be recalimed asap */ +#define PG_journalmark 20 /* journalling mark being written for page */ +#define PG_journal 21 /* journalled operation in progress on page */ + /* * Global page accounting. One instance per CPU. Only unsigned longs are * allowed. @@ -185,6 +188,18 @@ #define PageChecked(page) test_bit(PG_checked, &(page)->flags) #define SetPageChecked(page) set_bit(PG_checked, &(page)->flags) +#define PageJournal(page) test_bit(PG_journal, &(page)->flags) +#define SetPageJournal(page) set_bit(PG_journal, &(page)->flags) +#define ClearPageJournal(page) clear_bit(PG_journal, &(page)->flags) +#define TestClearPageJournal(page) test_and_clear_bit(PG_journal, &(page)->flags) +#define TestSetPageJournal(page) test_and_set_bit(PG_journal, &(page)->flags) + +#define PageJournalMark(page) test_bit(PG_journalmark, &(page)->flags) +#define SetPageJournalMark(page) set_bit(PG_journalmark, &(page)->flags) +#define ClearPageJournalMark(page) clear_bit(PG_journalmark, &(page)->flags) +#define TestClearPageJournalMark(page) test_and_clear_bit(PG_journalmark, &(page)->flags) +#define TestSetPageJournalMark(page) test_and_set_bit(PG_journalmark, &(page)->flags) + #define PageReserved(page) test_bit(PG_reserved, &(page)->flags) #define SetPageReserved(page) set_bit(PG_reserved, &(page)->flags) #define ClearPageReserved(page) clear_bit(PG_reserved, &(page)->flags) diff -ur linux-2.5.59-orig/include/linux/pagemap.h linux-2.5.59/include/linux/pagemap.h --- linux-2.5.59-orig/include/linux/pagemap.h 2003-01-17 02:21:34.000000000 +0000 +++ linux-2.5.59/include/linux/pagemap.h 2003-02-09 19:34:48.000000000 +0000 @@ -122,4 +122,33 @@ } extern void end_page_writeback(struct page *page); + +/* + * Wait for a journalling on a page + * - first bit for initiation mark being recorded in journal + * - second bit for ACK mark being recorded in journal + */ +extern void FASTCALL(__journal_page(struct page *page)); + +static inline void journal_page(struct page *page) +{ + if (TestSetPageJournal(page)) + __journal_page(page); +} + +static inline void wait_on_page_journal_mark(struct page *page) +{ + if (PageJournalMark(page)) + wait_on_page_bit(page, PG_journalmark); +} + +static inline void wait_on_page_journal_ack(struct page *page) +{ + if (PageJournal(page)) + wait_on_page_bit(page, PG_journal); +} + +extern void FASTCALL(page_marked_journal(struct page *page)); +extern void FASTCALL(page_acked_journal(struct page *page)); + #endif /* _LINUX_PAGEMAP_H */ diff -ur linux-2.5.59-orig/mm/filemap.c linux-2.5.59/mm/filemap.c --- linux-2.5.59-orig/mm/filemap.c 2003-01-17 02:22:00.000000000 +0000 +++ linux-2.5.59/mm/filemap.c 2003-02-09 19:35:36.000000000 +0000 @@ -226,6 +226,8 @@ return error; } +EXPORT_SYMBOL(add_to_page_cache); + int add_to_page_cache_lru(struct page *page, struct address_space *mapping, pgoff_t offset, int gfp_mask) { @@ -312,6 +314,36 @@ EXPORT_SYMBOL(end_page_writeback); /* + * Note a journalling op on a page has been ACK'd in the journal + */ +void page_acked_journal(struct page *page) +{ + wait_queue_head_t *waitqueue = page_waitqueue(page); + smp_mb__before_clear_bit(); + if (!TestClearPageJournal(page)) + BUG(); + smp_mb__after_clear_bit(); + if (waitqueue_active(waitqueue)) + wake_up_all(waitqueue); +} +EXPORT_SYMBOL(page_acked_journal); + +/* + * Note a journalling op on a page has been made initiated in the journal + */ +void page_marked_journal(struct page *page) +{ + wait_queue_head_t *waitqueue = page_waitqueue(page); + smp_mb__before_clear_bit(); + if (!TestClearPageJournalMark(page)) + BUG(); + smp_mb__after_clear_bit(); + if (waitqueue_active(waitqueue)) + wake_up_all(waitqueue); +} +EXPORT_SYMBOL(page_marked_journal); + +/* * Get a lock on the page, assuming we need to sleep to get it. * * Ugly: running sync_page() in state TASK_UNINTERRUPTIBLE is scary. If some @@ -335,6 +367,29 @@ EXPORT_SYMBOL(__lock_page); /* + * Get a journalling lock on the page, assuming we need to sleep to get it. + * + * Ugly: running sync_page() in state TASK_UNINTERRUPTIBLE is scary. If some + * random driver's requestfn sets TASK_RUNNING, we could busywait. However + * chances are that on the second loop, the block layer's plug list is empty, + * so sync_page() will then return in state TASK_UNINTERRUPTIBLE. + */ +void __journal_page(struct page *page) +{ + wait_queue_head_t *wqh = page_waitqueue(page); + DEFINE_WAIT(wait); + + while (TestSetPageJournal(page)) { + prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE); + sync_page(page); + if (PageJournal(page)) + io_schedule(); + } + finish_wait(wqh, &wait); +} +EXPORT_SYMBOL(__journal_page); + +/* * a rather lightweight function, finding and getting a reference to a * hashed page atomically. */ diff -ur linux-2.5.59-orig/mm/page_alloc.c linux-2.5.59/mm/page_alloc.c --- linux-2.5.59-orig/mm/page_alloc.c 2003-01-17 02:21:38.000000000 +0000 +++ linux-2.5.59/mm/page_alloc.c 2003-02-09 17:10:05.000000000 +0000 @@ -262,7 +262,9 @@ 1 << PG_active | 1 << PG_dirty | 1 << PG_reclaim | - 1 << PG_writeback ))) + 1 << PG_writeback | + 1 << PG_journal | + 1 << PG_journalmark))) bad_page(__FUNCTION__, page); page->flags &= ~(1 << PG_uptodate | 1 << PG_error | - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/