2002-10-31 16:05:48

by Nikita Danilov

Subject: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

Hello, Linus,

The following patch exports remove_from_page_cache(). reiser4 stores all
meta-data in the page cache. When a piece of meta-data is removed, the
corresponding page has to be removed from the page cache (this is
similar to truncate, but for meta-data), so an explicit call to
remove_from_page_cache() is required at this point.

Please apply.
Nikita.
diff -rup -X dontdiff linus-bk-2.5/mm/filemap.c linux-2.5-reiser4/mm/filemap.c
--- linus-bk-2.5/mm/filemap.c Fri Oct 18 03:00:41 2002
+++ linux-2.5-reiser4/mm/filemap.c Tue Oct 29 17:16:22 2002
@@ -97,6 +97,7 @@ void remove_from_page_cache(struct page
 	__remove_from_page_cache(page);
 	write_unlock(&mapping->page_lock);
 }
+EXPORT_SYMBOL(remove_from_page_cache);
 
 static inline int sync_page(struct page *page)
 {


2002-10-31 16:12:04

by Christoph Hellwig

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

On Thu, Oct 31, 2002 at 07:03:03PM +0300, Nikita Danilov wrote:
> Hello, Linus,
>
> The following patch exports remove_from_page_cache(). reiser4 stores all
> meta-data in the page cache. When a piece of meta-data is removed, the
> corresponding page has to be removed from the page cache (this is
> similar to truncate, but for meta-data), so an explicit call to
> remove_from_page_cache() is required at this point.

Could you please explain the code that needs it? No one should
call this in individual filesystem drivers.

2002-10-31 16:18:42

by Nikita Danilov

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

Christoph Hellwig writes:
> On Thu, Oct 31, 2002 at 07:03:03PM +0300, Nikita Danilov wrote:
> > Hello, Linus,
> >
> > The following patch exports remove_from_page_cache(). reiser4 stores all
> > meta-data in the page cache. When a piece of meta-data is removed, the
> > corresponding page has to be removed from the page cache (this is
> > similar to truncate, but for meta-data), so an explicit call to
> > remove_from_page_cache() is required at this point.
>
> Could you please explain the code that needs it? No one should
> call this in individual filesystem drivers.

Reiser4 stores meta-data in a huge balanced tree. This tree is kept
(partially) in the page cache. All pages in this tree are attached to a
"fake" inode. Sometimes you need to remove a node from the tree. At that
moment the page has to be removed from the fake inode's mapping.

Other file systems don't need remove_from_page_cache() because the only
things they store in the page cache are data (for which
remove_from_page_cache() is called by truncate()) and meta-data that is
never explicitly deleted (like directory content in ext2).

Nikita.
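
The distinction drawn above, between truncate-style removal and dropping
one page from the middle of a mapping, can be illustrated with a minimal
userspace sketch in C. This is not kernel code: struct toy_mapping,
toy_add_page(), toy_remove_page() and toy_truncate() are hypothetical
names made up for illustration, and a fixed array merely stands in for
the mapping of the "fake" inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8            /* hypothetical mapping size, in pages */

struct toy_page {
    bool cached;                /* is this index present in the mapping? */
};

/* Hypothetical stand-in for the mapping of the "fake" inode. */
struct toy_mapping {
    struct toy_page pages[TOY_NPAGES];
    size_t nrpages;             /* number of pages currently cached */
};

/* Add the page at `index` to the mapping if it is not already there. */
static void toy_add_page(struct toy_mapping *m, size_t index)
{
    assert(index < TOY_NPAGES);
    if (!m->pages[index].cached) {
        m->pages[index].cached = true;
        m->nrpages++;
    }
}

/* Drop exactly one page; neighbours on both sides stay in place. */
static void toy_remove_page(struct toy_mapping *m, size_t index)
{
    assert(index < TOY_NPAGES);
    if (m->pages[index].cached) {
        m->pages[index].cached = false;
        m->nrpages--;
    }
}

/* Truncate-style removal: drop every page at or past `start`. */
static void toy_truncate(struct toy_mapping *m, size_t start)
{
    for (size_t i = start; i < TOY_NPAGES; i++)
        toy_remove_page(m, i);
}
```

With pages 0..4 cached, toy_remove_page(&m, 2) leaves pages 1 and 3 in
place, whereas toy_truncate(&m, 2) would also have dropped pages 3 and 4.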

2002-10-31 16:24:42

by Christoph Hellwig

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

On Thu, Oct 31, 2002 at 07:24:40PM +0300, Nikita Danilov wrote:
> Reiser4 stores meta-data in a huge balanced tree. This tree is kept
> (partially) in the page cache. All pages in this tree are attached to a
> "fake" inode. Sometimes you need to remove a node from the tree. At that
> moment the page has to be removed from the fake inode's mapping.

What about changing truncate_inode_pages to take an additional len
argument so you don't have to remove all pages past an offset?

>
> Other file systems don't need remove_from_page_cache() because the only
> things they store in the page cache are data (for which
> remove_from_page_cache() is called by truncate()) and meta-data that is
> never explicitly deleted (like directory content in ext2).

Sorry, but that's wrong. XFS uses the pagecache for all metadata, and JFS
for all but the superblock (which is never changed during use).
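
The idea of giving truncate an additional len argument can be sketched
in userspace C as follows. This is only a toy model under stated
assumptions: the cache is a 64-bit mask of page indices rather than a
real mapping, offsets are page indices rather than byte offsets, and
toy_cache_t, toy_range_mask(), toy_truncate() and toy_truncate_range()
are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set means page index i is cached; a 64-page toy mapping. */
typedef uint64_t toy_cache_t;

/* Mask with bits [start, start + len) set, clamped to 64 pages. */
static toy_cache_t toy_range_mask(unsigned start, unsigned len)
{
    if (start >= 64 || len == 0)
        return 0;
    if (len >= 64 - start)              /* range runs to the end */
        return ~(toy_cache_t)0 << start;
    return (((toy_cache_t)1 << len) - 1) << start;
}

/* Existing behaviour: drop every page at or past `start`. */
static toy_cache_t toy_truncate(toy_cache_t cache, unsigned start)
{
    return cache & ~toy_range_mask(start, 64);
}

/* Variant with a length: drop only `len` pages beginning at `start`. */
static toy_cache_t toy_truncate_range(toy_cache_t cache,
                                      unsigned start, unsigned len)
{
    return cache & ~toy_range_mask(start, len);
}
```

In this model a single-page removal is just toy_truncate_range(cache,
index, 1), and the existing behaviour is the special case where the
range extends to the end of the mapping.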

2002-10-31 16:40:39

by Nikita Danilov

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

Christoph Hellwig writes:
> On Thu, Oct 31, 2002 at 07:24:40PM +0300, Nikita Danilov wrote:
> > Reiser4 stores meta-data in a huge balanced tree. This tree is kept
> > (partially) in the page cache. All pages in this tree are attached to a
> > "fake" inode. Sometimes you need to remove a node from the tree. At that
> > moment the page has to be removed from the fake inode's mapping.
>
> What about changing truncate_inode_pages to take an additional len
> argument so you don't have to remove all pages past an offset?

It is possible, I think. But this will look more like a hack. Truncate is
for truncating a mapping rather than cutting one page out of the middle.

Besides, the current truncate_inode_pages(), with all its
radix_tree_gang_lookup() calls and two passes, doesn't look easily
adaptable.

>
> >
> > Other file systems don't need remove_from_page_cache() because the only
> > things they store in the page cache are data (for which
> > remove_from_page_cache() is called by truncate()) and meta-data that is
> > never explicitly deleted (like directory content in ext2).
>
> Sorry, but that's wrong. XFS uses the pagecache for all metadata, and JFS
> for all but the superblock (which is never changed during use).
>

Interesting. Then XFS and JFS meta-data in the page cache is probably
linearly ordered, and there it is never necessary to remove a meta-data
page from the middle of the mapping, right?

Nikita.

2002-10-31 16:51:57

by Christoph Hellwig

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

On Thu, Oct 31, 2002 at 07:45:39PM +0300, Nikita Danilov wrote:
> Interesting. Then XFS and JFS meta-data in the page cache is probably
> linearly ordered, and there it is never necessary to remove a meta-data
> page from the middle of the mapping, right?

The issue is rather different for XFS and JFS. In JFS most metadata
(actually all metadata but the small superblock) is stored in inodes,
and it's accessed through the pagecache mapping for those inodes.

Access to those pages doesn't go directly through the pagecache
interface but through a small metapage wrapper. When the page is removed
it's synced to disk and removed from the metapage hash, so that you can't
access it anymore. It might still be on the VM lists for a while.

XFS, on the other hand, only uses the blockdevice mapping to access its
metadata, so it never has to remove a page explicitly from the
cache.
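
The JFS behaviour described above (removal means write back and unhash,
while the page itself may linger) can be modelled loosely in userspace
C. This is a sketch under assumptions, not JFS code: struct toy_page,
toy_meta_lookup(), toy_meta_get_dirty() and toy_meta_remove() are
hypothetical names, and three boolean flags stand in for the real page
state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4                 /* hypothetical number of metadata pages */

struct toy_page {
    bool on_vm_lists;                /* the VM still knows about this page */
    bool dirty;                      /* has changes not yet written out */
    bool hashed;                     /* reachable through the wrapper */
};

static struct toy_page toy_pages[TOY_NPAGES];

/* Wrapper lookup: the only way callers are supposed to reach a page. */
static struct toy_page *toy_meta_lookup(size_t index)
{
    assert(index < TOY_NPAGES);
    return toy_pages[index].hashed ? &toy_pages[index] : NULL;
}

/* Bring a page in through the wrapper and mark it modified. */
static struct toy_page *toy_meta_get_dirty(size_t index)
{
    assert(index < TOY_NPAGES);
    struct toy_page *p = &toy_pages[index];
    p->on_vm_lists = true;
    p->hashed = true;
    p->dirty = true;
    return p;
}

/* Removal as described: write back, then unhash. The page itself
 * is left behind for the VM to reclaim later. */
static void toy_meta_remove(size_t index)
{
    assert(index < TOY_NPAGES);
    struct toy_page *p = &toy_pages[index];
    p->dirty = false;                /* stands in for syncing to disk */
    p->hashed = false;
}
```

After toy_meta_remove(), a lookup through the wrapper fails even though
the page is still flagged as known to the VM, and the page is clean.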

2002-10-31 16:59:34

by Nikita Danilov

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

Christoph Hellwig writes:
> On Thu, Oct 31, 2002 at 07:45:39PM +0300, Nikita Danilov wrote:
> > Interesting. Then XFS and JFS meta-data in the page cache is probably
> > linearly ordered, and there it is never necessary to remove a meta-data
> > page from the middle of the mapping, right?
>
> The issue is rather different for XFS and JFS. In JFS most metadata
> (actually all metadata but the small superblock) is stored in inodes,
> and it's accessed through the pagecache mapping for those inodes.
>
> Access to those pages doesn't go directly through the pagecache
> interface but through a small metapage wrapper. When the page is removed
> it's synced to disk and removed from the metapage hash, so that you can't
> access it anymore. It might still be on the VM lists for a while.

Interesting. But things like ->vm_writeback() and friends will go
directly to the page, bypassing the metapage wrapper, right? Does JFS check
that the page is still "live" on each low-level VM call?

>
> XFS, on the other hand, only uses the blockdevice mapping to access its
> metadata, so it never has to remove a page explicitly from the
> cache.
>

Nikita.

2002-10-31 17:06:23

by Christoph Hellwig

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

On Thu, Oct 31, 2002 at 08:04:40PM +0300, Nikita Danilov wrote:
> Interesting. But things like ->vm_writeback() and friends will go
> directly to the page, bypassing the metapage wrapper, right?

Yes.

> Does JFS check
> that the page is still "live" on each low-level VM call?

Well, if the extent in question doesn't exist anymore, get_block()
will return a failure. In practice that won't happen, as the
pages still kept around are never dirty.

2002-10-31 17:29:55

by Andreas Dilger

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

On Oct 31, 2002 16:31 +0000, Christoph Hellwig wrote:
> On Thu, Oct 31, 2002 at 07:24:40PM +0300, Nikita Danilov wrote:
> > Reiser4 stores meta-data in a huge balanced tree. This tree is kept
> > (partially) in the page cache. All pages in this tree are attached to a
> > "fake" inode. Sometimes you need to remove a node from the tree. At that
> > moment the page has to be removed from the fake inode's mapping.
>
> What about changing truncate_inode_pages to take an additional len
> argument so you don't have to remove all pages past an offset?

That would be what we have been calling "punch", and is quite useful
for putting holes in files (i.e. making them sparse again). This
can be used for InterMezzo (among other things) so that the KML log
file can be growing at the end, but being punched out at the start
so it doesn't use up a lot of disk space.

Not that I'm holding my breath on getting this in the kernel, but
it is definitely useful.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

2002-10-31 18:19:10

by Nikita Danilov

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

Andreas Dilger writes:
> On Oct 31, 2002 16:31 +0000, Christoph Hellwig wrote:
> > On Thu, Oct 31, 2002 at 07:24:40PM +0300, Nikita Danilov wrote:
> > > Reiser4 stores meta-data in a huge balanced tree. This tree is kept
> > > (partially) in the page cache. All pages in this tree are attached to a
> > > "fake" inode. Sometimes you need to remove a node from the tree. At that
> > > moment the page has to be removed from the fake inode's mapping.
> >
> > What about changing truncate_inode_pages to take an additional len
> > argument so you don't have to remove all pages past an offset?
>
> That would be what we have been calling "punch", and is quite useful
> for putting holes in files (i.e. making them sparse again). This
> can be used for InterMezzo (among other things) so that the KML log
> file can be growing at the end, but being punched out at the start
> so it doesn't use up a lot of disk space.

Abusing truncate for such things will remain exactly that: abuse. A
separate interface is required.

>
> Not that I'm holding my breath on getting this in the kernel, but
> it is definitely useful.
>
> Cheers, Andreas

Nikita.

2002-10-31 18:51:58

by Linus Torvalds

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()


On Thu, 31 Oct 2002, Christoph Hellwig wrote:

> On Thu, Oct 31, 2002 at 07:24:40PM +0300, Nikita Danilov wrote:
> > Reiser4 stores meta-data in a huge balanced tree. This tree is kept
> > (partially) in the page cache. All pages in this tree are attached to a
> > "fake" inode. Sometimes you need to remove a node from the tree. At that
> > moment the page has to be removed from the fake inode's mapping.
>
> What about changing truncate_inode_pages to take an additional len
> argument so you don't have to remove all pages past an offset?

Actually, we may want that for other reasons anyway. In particular, I can
well imagine why a networked filesystem in particular might want to
invalidate a range of a file cache, but not necessarily all of it.

(Yeah, I don't know of any network filesystem that does invalidation on
anything but a file granularity, but I assume such filesystems have to
exist. Especially in cluster environments it sounds like a sane thing to
do invalidates on a finer granularity)

Linus

2002-10-31 20:11:45

by Ragnar Kjørstad

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

On Thu, Oct 31, 2002 at 10:33:11AM -0700, Andreas Dilger wrote:
> On Oct 31, 2002 16:31 +0000, Christoph Hellwig wrote:
> > What about changing truncate_inode_pages to take an additional len
> > argument so you don't have to remove all pages past an offset?
>
> That would be what we have been calling "punch", and is quite useful
> for putting holes in files (i.e. making them sparse again). This
> can be used for InterMezzo (among other things) so that the KML log
> file can be growing at the end, but being punched out at the start
> so it doesn't use up a lot of disk space.

It's also very useful for HSM software (Hierarchical Storage Management).


--
Ragnar Kjørstad
Big Storage

2002-11-01 21:53:38

by Andreas Dilger

Subject: Re: [PATCH]: reiser4 [5/8] export remove_from_page_cache()

On Oct 31, 2002 10:57 -0800, Linus Torvalds wrote:
> On Thu, 31 Oct 2002, Christoph Hellwig wrote:
> > What about changing truncate_inode_pages to take an additional len
> > argument so you don't have to remove all pages past an offset?
>
> Actually, we may want that for other reasons anyway. In particular, I can
> well imagine why a networked filesystem in particular might want to
> invalidate a range of a file cache, but not necessarily all of it.
>
> (Yeah, I don't know of any network filesystem that does invalidation on
> anything but a file granularity, but I assume such filesystems have to
> exist. Especially in cluster environments it sounds like a sane thing to
> do invalidates on a finer granularity)

Yes, we definitely need such a beast for Lustre. Currently (because we
haven't gotten around to fixing it) we invalidate the whole file if
there is a lock conflict when we really only want to invalidate a
page or range of pages.

Our "performance" release isn't until next year - we're still working
on "performant" right now, but in the case of multiple clients writing
to non-overlapping areas in a file, or to different files, we're still
pretty good - about 1.5GB/s aggregate write speed with 20 storage targets.

We have 62 storage targets in our target environment, but haven't done
a full test because we're working on some nasty distributed metadata
bugs right now. Since the client->target I/O is pretty much independent,
there should be no problems hitting 4.5 GB/s aggregate write speed.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/