2007-11-07 01:17:19

by Christoph Lameter

[permalink] [raw]
Subject: [patch 14/23] inodes: Support generic defragmentation

This implements the ability to remove inodes in a particular slab
from inode cache. In order to remove an inode we may have to write out
the pages of an inode, the inode itself and remove the dentries referring
to the node.

Provide generic functionality that can be used by filesystems that have
their own inode caches to also tie into the defragmentation functions
that are made available here.

Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Christoph Lameter <[email protected]>
---
fs/inode.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 6 +++
2 files changed, 102 insertions(+)

Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c 2007-10-17 13:35:52.000000000 -0700
+++ linux-2.6/fs/inode.c 2007-11-06 12:56:15.000000000 -0800
@@ -1369,6 +1369,101 @@ static int __init set_ihash_entries(char
}
__setup("ihash_entries=", set_ihash_entries);

+void *get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+ int i;
+
+ spin_lock(&inode_lock);
+ for (i = 0; i < nr; i++) {
+ struct inode *inode = v[i];
+
+ if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+ v[i] = NULL;
+ else
+ __iget(inode);
+ }
+ spin_unlock(&inode_lock);
+ return NULL;
+}
+EXPORT_SYMBOL(get_inodes);
+
+/*
+ * Function for filesystems that embedd struct inode into their own
+ * structures. The offset is the offset of the struct inode in the fs inode.
+ */
+void *fs_get_inodes(struct kmem_cache *s, int nr, void **v,
+ unsigned long offset)
+{
+ int i;
+
+ for (i = 0; i < nr; i++)
+ v[i] += offset;
+
+ return get_inodes(s, nr, v);
+}
+EXPORT_SYMBOL(fs_get_inodes);
+
+void kick_inodes(struct kmem_cache *s, int nr, void **v, void *private)
+{
+ struct inode *inode;
+ int i;
+ int abort = 0;
+ LIST_HEAD(freeable);
+ struct super_block *sb;
+
+ for (i = 0; i < nr; i++) {
+ inode = v[i];
+ if (!inode)
+ continue;
+
+ if (inode_has_buffers(inode) || inode->i_data.nrpages) {
+ if (remove_inode_buffers(inode))
+ invalidate_mapping_pages(&inode->i_data,
+ 0, -1);
+ }
+
+ /* Invalidate children and dentry */
+ if (S_ISDIR(inode->i_mode)) {
+ struct dentry *d = d_find_alias(inode);
+
+ if (d) {
+ d_invalidate(d);
+ dput(d);
+ }
+ }
+
+ if (inode->i_state & I_DIRTY)
+ write_inode_now(inode, 1);
+
+ d_prune_aliases(inode);
+ }
+
+ mutex_lock(&iprune_mutex);
+ for (i = 0; i < nr; i++) {
+ inode = v[i];
+ if (!inode)
+ continue;
+
+ sb = inode->i_sb;
+ iput(inode);
+ if (abort || !(sb->s_flags & MS_ACTIVE))
+ continue;
+
+ spin_lock(&inode_lock);
+ abort = !can_unuse(inode);
+
+ if (!abort) {
+ list_move(&inode->i_list, &freeable);
+ inode->i_state |= I_FREEING;
+ inodes_stat.nr_unused--;
+ }
+ spin_unlock(&inode_lock);
+ }
+ dispose_list(&freeable);
+ mutex_unlock(&iprune_mutex);
+}
+EXPORT_SYMBOL(kick_inodes);
+
/*
* Initialize the waitqueues and inode hash table.
*/
@@ -1408,6 +1503,7 @@ void __init inode_init(void)
SLAB_MEM_SPREAD),
init_once);
register_shrinker(&icache_shrinker);
+ kmem_cache_setup_defrag(inode_cachep, get_inodes, kick_inodes);

/* Hash may have been set up in inode_init_early */
if (!hashdist)
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h 2007-10-25 18:28:41.000000000 -0700
+++ linux-2.6/include/linux/fs.h 2007-11-06 12:56:15.000000000 -0800
@@ -1776,6 +1776,12 @@ static inline void insert_inode_hash(str
__insert_inode_hash(inode, inode->i_ino);
}

+/* Helper functions for inode defragmentation support in filesystems */
+extern void kick_inodes(struct kmem_cache *, int, void **, void *);
+extern void *get_inodes(struct kmem_cache *, int nr, void **);
+extern void *fs_get_inodes(struct kmem_cache *, int nr, void **,
+ unsigned long offset);
+
extern struct file * get_empty_filp(void);
extern void file_move(struct file *f, struct list_head *list);
extern void file_kill(struct file *f);

--


2007-11-07 10:22:44

by Jörn Engel

[permalink] [raw]
Subject: Re: [patch 14/23] inodes: Support generic defragmentation

On Tue, 6 November 2007 17:11:44 -0800, Christoph Lameter wrote:
>
> +void *get_inodes(struct kmem_cache *s, int nr, void **v)
> +{
> + int i;
> +
> + spin_lock(&inode_lock);
> + for (i = 0; i < nr; i++) {
> + struct inode *inode = v[i];
> +
> + if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
> + v[i] = NULL;
> + else
> + __iget(inode);
> + }
> + spin_unlock(&inode_lock);
> + return NULL;
> +}
> +EXPORT_SYMBOL(get_inodes);

What purpose does the return type have?

> +/*
> + * Function for filesystems that embedd struct inode into their own
> + * structures. The offset is the offset of the struct inode in the fs inode.
> + */
> +void *fs_get_inodes(struct kmem_cache *s, int nr, void **v,
> + unsigned long offset)
> +{
> + int i;
> +
> + for (i = 0; i < nr; i++)
> + v[i] += offset;
> +
> + return get_inodes(s, nr, v);
> +}
> +EXPORT_SYMBOL(fs_get_inodes);

The fact that all pointers get changed makes me a bit uneasy:
struct foo_inode v[20];
...
fs_get_inodes(..., v, ...);
...
v[0].foo_field = bar;

No warning, but spectacular fireworks.

> +void kick_inodes(struct kmem_cache *s, int nr, void **v, void *private)
> +{
> + struct inode *inode;
> + int i;
> + int abort = 0;
> + LIST_HEAD(freeable);
> + struct super_block *sb;
> +
> + for (i = 0; i < nr; i++) {
> + inode = v[i];
> + if (!inode)
> + continue;

NULL is legal here? Then fs_get_inodes should check for NULL as well
and not add the offset to NULL pointers, I guess.

> + if (inode_has_buffers(inode) || inode->i_data.nrpages) {
> + if (remove_inode_buffers(inode))
> + invalidate_mapping_pages(&inode->i_data,
> + 0, -1);

This linebreak can be removed.

> + }
> +
> + /* Invalidate children and dentry */
> + if (S_ISDIR(inode->i_mode)) {
> + struct dentry *d = d_find_alias(inode);
> +
> + if (d) {
> + d_invalidate(d);
> + dput(d);
> + }
> + }
> +
> + if (inode->i_state & I_DIRTY)
> + write_inode_now(inode, 1);

Once more the three-bit I_DIRTY is used like a boolean value. I don't
hold it against you, specifically. A general review/cleanup is
necessary for that.

Jörn

--
"[One] doesn't need to know [...] how to cause a headache in order
to take an aspirin."
-- Scott Culp, Manager of the Microsoft Security Response Center, 2001

2007-11-07 10:35:37

by Andreas Schwab

[permalink] [raw]
Subject: Re: [patch 14/23] inodes: Support generic defragmentation

J?rn Engel <[email protected]> writes:

> On Tue, 6 November 2007 17:11:44 -0800, Christoph Lameter wrote:
>>
>> +/*
>> + * Function for filesystems that embedd struct inode into their own
>> + * structures. The offset is the offset of the struct inode in the fs inode.
>> + */
>> +void *fs_get_inodes(struct kmem_cache *s, int nr, void **v,
>> + unsigned long offset)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < nr; i++)
>> + v[i] += offset;
>> +
>> + return get_inodes(s, nr, v);
>> +}
>> +EXPORT_SYMBOL(fs_get_inodes);
>
> The fact that all pointers get changed makes me a bit uneasy:
> struct foo_inode v[20];
> ...
> fs_get_inodes(..., v, ...);
> ...
> v[0].foo_field = bar;
>
> No warning, but spectacular fireworks.

You'l get a warning that struct foo_inode * is incompatible with void **.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstra?e 5, 90409 N?rnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2007-11-07 10:36:22

by Jörn Engel

[permalink] [raw]
Subject: Re: [patch 14/23] inodes: Support generic defragmentation

On Wed, 7 November 2007 11:17:48 +0100, Jörn Engel wrote:
> > +/*
> > + * Function for filesystems that embedd struct inode into their own
> > + * structures. The offset is the offset of the struct inode in the fs inode.
> > + */
> > +void *fs_get_inodes(struct kmem_cache *s, int nr, void **v,
> > + unsigned long offset)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < nr; i++)
> > + v[i] += offset;
> > +
> > + return get_inodes(s, nr, v);
> > +}
> > +EXPORT_SYMBOL(fs_get_inodes);
>
> The fact that all pointers get changed makes me a bit uneasy:
> struct foo_inode v[20];
> ...
> fs_get_inodes(..., v, ...);
> ...
> v[0].foo_field = bar;
>
> No warning, but spectacular fireworks.
>
> > +void kick_inodes(struct kmem_cache *s, int nr, void **v, void *private)
> > +{
> > + struct inode *inode;
> > + int i;
> > + int abort = 0;
> > + LIST_HEAD(freeable);
> > + struct super_block *sb;
> > +
> > + for (i = 0; i < nr; i++) {
> > + inode = v[i];
> > + if (!inode)
> > + continue;
>
> NULL is legal here? Then fs_get_inodes should check for NULL as well
> and not add the offset to NULL pointers, I guess.

Ignore these two comments. Reading further before making them would
have helped. ;)

Jörn

--
Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is
frequently going to be big, don't get fancy.
-- Rob Pike

2007-11-07 10:40:53

by Jörn Engel

[permalink] [raw]
Subject: Re: [patch 14/23] inodes: Support generic defragmentation

On Wed, 7 November 2007 11:35:13 +0100, Andreas Schwab wrote:
> >
> > The fact that all pointers get changed makes me a bit uneasy:
> > struct foo_inode v[20];
> > ...
> > fs_get_inodes(..., v, ...);
> > ...
> > v[0].foo_field = bar;
> >
> > No warning, but spectacular fireworks.
>
> You'l get a warning that struct foo_inode * is incompatible with void **.
- struct foo_inode v[20];
+ struct foo_inode *v[20];

Looks like my example needs a patch as well. Anyway, the function is
used in a way that makes this a non-issue.

Jörn

--
You cannot suppose that Moliere ever troubled himself to be original in the
matter of ideas. You cannot suppose that the stories he tells in his plays
have never been told before. They were culled, as you very well know.
-- Andre-Louis Moreau in Scarabouche

2007-11-07 18:41:11

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 14/23] inodes: Support generic defragmentation

On Wed, 7 Nov 2007, J?rn Engel wrote:

> On Tue, 6 November 2007 17:11:44 -0800, Christoph Lameter wrote:
> >
> > +void *get_inodes(struct kmem_cache *s, int nr, void **v)
> > +{
> > + int i;
> > +
> > + spin_lock(&inode_lock);
> > + for (i = 0; i < nr; i++) {
> > + struct inode *inode = v[i];
> > +
> > + if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
> > + v[i] = NULL;
> > + else
> > + __iget(inode);
> > + }
> > + spin_unlock(&inode_lock);
> > + return NULL;
> > +}
> > +EXPORT_SYMBOL(get_inodes);
>
> What purpose does the return type have?

The pointer is for communication between the get and kick methods. get()
can modify kick() behavior by returning a pointer to a data structure or
using the pointer to set a flag. F.e. get() may discover that there is an
unreclaimable object and set a flag that causes kick to simply undo the
refcount increment. get() may build a map for the objects and indicate in
the map special treatment.

> > +void *fs_get_inodes(struct kmem_cache *s, int nr, void **v,
> > + unsigned long offset)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < nr; i++)
> > + v[i] += offset;
> > +
> > + return get_inodes(s, nr, v);
> > +}
> > +EXPORT_SYMBOL(fs_get_inodes);
>
> The fact that all pointers get changed makes me a bit uneasy:
> struct foo_inode v[20];
> ...
> fs_get_inodes(..., v, ...);
> ...
> v[0].foo_field = bar;
>
> No warning, but spectacular fireworks.

As far as I can remember: The core code always passes pointers to struct
inode to the filesystems. The filesystems will then recalculate the
pointers to point to the fs ide of an inode.


> > +void kick_inodes(struct kmem_cache *s, int nr, void **v, void *private)
> > +{
> > + struct inode *inode;
> > + int i;
> > + int abort = 0;
> > + LIST_HEAD(freeable);
> > + struct super_block *sb;
> > +
> > + for (i = 0; i < nr; i++) {
> > + inode = v[i];
> > + if (!inode)
> > + continue;
>
> NULL is legal here? Then fs_get_inodes should check for NULL as well
> and not add the offset to NULL pointers, I guess.

The get() method may have set a pointer to NULL. The fs_get_inodes() is
run at a time when all pointers are valid.

> > + }
> > +
> > + /* Invalidate children and dentry */
> > + if (S_ISDIR(inode->i_mode)) {
> > + struct dentry *d = d_find_alias(inode);
> > +
> > + if (d) {
> > + d_invalidate(d);
> > + dput(d);
> > + }
> > + }
> > +
> > + if (inode->i_state & I_DIRTY)
> > + write_inode_now(inode, 1);
>
> Once more the three-bit I_DIRTY is used like a boolean value. I don't
> hold it against you, specifically. A general review/cleanup is
> necessary for that.

Yeah. I'd be glad if someone could take this piece off my hands.

2007-11-07 18:56:17

by Jörn Engel

[permalink] [raw]
Subject: Re: [patch 14/23] inodes: Support generic defragmentation

On Wed, 7 November 2007 10:40:55 -0800, Christoph Lameter wrote:
> On Wed, 7 Nov 2007, Jörn Engel wrote:
> > On Tue, 6 November 2007 17:11:44 -0800, Christoph Lameter wrote:
> > >
> > > +void *get_inodes(struct kmem_cache *s, int nr, void **v)
> > > +{
> > > + int i;
> > > +
> > > + spin_lock(&inode_lock);
> > > + for (i = 0; i < nr; i++) {
> > > + struct inode *inode = v[i];
> > > +
> > > + if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
> > > + v[i] = NULL;
> > > + else
> > > + __iget(inode);
> > > + }
> > > + spin_unlock(&inode_lock);
> > > + return NULL;
> > > +}
> > > +EXPORT_SYMBOL(get_inodes);
> >
> > What purpose does the return type have?
>
> The pointer is for communication between the get and kick methods. get()
> can modify kick() behavior by returning a pointer to a data structure or
> using the pointer to set a flag. F.e. get() may discover that there is an
> unreclaimable object and set a flag that causes kick to simply undo the
> refcount increment. get() may build a map for the objects and indicate in
> the map special treatment.

Is there a get/kick pair that actually does this? So far I haven't
found anything like it.

Also, something vaguely matching that paragraph might make sense in a
kerneldoc header to the function. ;)

Jörn

--
There is no worse hell than that provided by the regrets
for wasted opportunities.
-- Andre-Louis Moreau in Scarabouche

2007-11-07 19:00:29

by Christoph Lameter

[permalink] [raw]
Subject: Re: [patch 14/23] inodes: Support generic defragmentation

On Wed, 7 Nov 2007, J?rn Engel wrote:

> > The pointer is for communication between the get and kick methods. get()
> > can modify kick() behavior by returning a pointer to a data structure or
> > using the pointer to set a flag. F.e. get() may discover that there is an
> > unreclaimable object and set a flag that causes kick to simply undo the
> > refcount increment. get() may build a map for the objects and indicate in
> > the map special treatment.
>
> Is there a get/kick pair that actually does this? So far I haven't
> found anything like it.

Hmmm.. Nothing uses it at this point. I went through a series of get/kicks
during development. Some needed it. I suspect that we will need it when we
implement reallocation instead of simply reclaiming. It is also necessary
if we get into the situation where we want to optimize the reclaim. At
that point the kick method needs to know how far get() got before the
action was aborted in order to fix up only certain refcounts.

> Also, something vaguely matching that paragraph might make sense in a
> kerneldoc header to the function. ;)

Its described in slab.h