2010-06-24 03:24:37

by Nick Piggin

Subject: [patch 52/52] fs: icache less I_FREEING time

The problem with inode reclaim is that it puts inodes into I_FREEING
state and then continues to gather more, during which it may iput,
call invalidate_mapping_pages, be preempted, etc. Holding these inodes
in I_FREEING for that long can cause pauses.

After the inode scalability work there is no longer a big reason to
batch up inodes for reclaim, so dispose of them as they are found on
the LRU.

Signed-off-by: Nick Piggin <[email protected]>

Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -390,6 +390,19 @@ EXPORT_SYMBOL(clear_inode);

static void inode_sb_list_del(struct inode *inode);

+static void dispose_one_inode(struct inode *inode)
+{
+ clear_inode(inode);
+
+ spin_lock(&inode->i_lock);
+ __remove_inode_hash(inode);
+ inode_sb_list_del(inode);
+ spin_unlock(&inode->i_lock);
+
+ wake_up_inode(inode);
+ destroy_inode(inode);
+}
+
/*
* dispose_list - dispose of the contents of a local list
* @head: the head of the list to free
@@ -409,15 +422,8 @@ static void dispose_list(struct list_hea

if (inode->i_data.nrpages)
truncate_inode_pages(&inode->i_data, 0);
- clear_inode(inode);
-
- spin_lock(&inode->i_lock);
- __remove_inode_hash(inode);
- inode_sb_list_del(inode);
- spin_unlock(&inode->i_lock);
-
- wake_up_inode(inode);
- destroy_inode(inode);
+ dispose_one_inode(inode);
+ cond_resched();
nr_disposed++;
}
}
@@ -526,7 +532,6 @@ EXPORT_SYMBOL(invalidate_inodes);
*/
static void prune_icache(struct zone *zone, unsigned long nr_to_scan)
{
- LIST_HEAD(freeable);
unsigned long reap = 0;

down_read(&iprune_sem);
@@ -563,8 +568,6 @@ again:
__iget(inode);
spin_unlock(&inode->i_lock);

- dispose_list(&freeable);
-
if (remove_inode_buffers(inode))
reap += invalidate_mapping_pages(&inode->i_data,
0, -1);
@@ -572,11 +575,15 @@ again:
spin_lock(&zone->inode_lru_lock);
continue;
}
- list_move(&inode->i_lru, &freeable);
+ list_del_init(&inode->i_lru);
WARN_ON(inode->i_state & I_NEW);
inode->i_state |= I_FREEING;
spin_unlock(&inode->i_lock);
zone->inode_nr_lru--;
+ spin_unlock(&zone->inode_lru_lock);
+ dispose_one_inode(inode);
+ cond_resched();
+ spin_lock(&zone->inode_lru_lock);
}
if (current_is_kswapd())
__count_vm_events(KSWAPD_INODESTEAL, reap);
@@ -584,7 +591,6 @@ again:
__count_vm_events(PGINODESTEAL, reap);
spin_unlock(&zone->inode_lru_lock);

- dispose_list(&freeable);
up_read(&iprune_sem);
}
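
In other words, the reclaim loop ends up shaped roughly like this
(a simplified sketch of the result of the hunks above; the handling
of busy and referenced inodes and the vmstat accounting are elided,
so details may differ from the actual tree):

static void prune_icache(struct zone *zone, unsigned long nr_to_scan)
{
        down_read(&iprune_sem);
        spin_lock(&zone->inode_lru_lock);
        while (nr_to_scan--) {
                struct inode *inode;

                if (list_empty(&zone->inode_lru))
                        break;
                inode = list_entry(zone->inode_lru.prev, struct inode, i_lru);
                spin_lock(&inode->i_lock);

                /* ... busy or referenced inodes are skipped/requeued here ... */

                list_del_init(&inode->i_lru);
                WARN_ON(inode->i_state & I_NEW);
                inode->i_state |= I_FREEING;
                spin_unlock(&inode->i_lock);
                zone->inode_nr_lru--;

                /* drop the LRU lock and tear this one inode down right away */
                spin_unlock(&zone->inode_lru_lock);
                dispose_one_inode(inode);
                cond_resched();
                spin_lock(&zone->inode_lru_lock);
        }
        spin_unlock(&zone->inode_lru_lock);
        up_read(&iprune_sem);
}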



2010-06-30 10:14:12

by Dave Chinner

Subject: Re: [patch 52/52] fs: icache less I_FREEING time

On Thu, Jun 24, 2010 at 01:03:04PM +1000, [email protected] wrote:
> Problem with inode reclaim is that it puts inodes into I_FREEING state
> and then continues to gather more, during which it may iput,
> invalidate_mapping_pages, be preempted, etc. Holding these inodes in
> I_FREEING can cause pauses.

What sort of pauses? I can't see how holding a few inodes in
I_FREEING state would cause any serious sort of holdoff...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-06-30 14:33:29

by Nick Piggin

Subject: Re: [patch 52/52] fs: icache less I_FREEING time

On Wed, Jun 30, 2010 at 08:13:54PM +1000, Dave Chinner wrote:
> On Thu, Jun 24, 2010 at 01:03:04PM +1000, [email protected] wrote:
> > Problem with inode reclaim is that it puts inodes into I_FREEING state
> > and then continues to gather more, during which it may iput,
> > invalidate_mapping_pages, be preempted, etc. Holding these inodes in
> > I_FREEING can cause pauses.
>
> What sort of pauses? I can't see how holding a few inodes in
> I_FREEING state would cause any serious sort of holdoff...

Well, if the inode is accessed again, the lookup has to wait while
potentially hundreds of inodes are pulled off the LRU, have their
pagecache invalidated, and are destroyed.
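
To make the holdoff concrete: a lookup that races with reclaim sleeps
on the freeing inode until wake_up_inode() is called for it, roughly
like this (simplified from the mainline find_inode() of this era;
locking details and some state bits are omitted):

static struct inode *find_inode(struct super_block *sb,
                                struct hlist_head *head,
                                int (*test)(struct inode *, void *),
                                void *data)
{
        struct hlist_node *node;
        struct inode *inode = NULL;

repeat:
        hlist_for_each_entry(inode, node, head, i_hash) {
                if (inode->i_sb != sb)
                        continue;
                if (!test(inode, data))
                        continue;
                if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
                        /*
                         * Being torn down: sleep until the freeing side
                         * calls wake_up_inode(), then rescan the chain.
                         */
                        __wait_on_freeing_inode(inode);
                        goto repeat;
                }
                break;
        }
        return node ? inode : NULL;
}

With batching, an inode marked I_FREEING early in the scan only gets
its wake_up_inode() once the whole batch has been gathered and
dispose_list() works its way to it.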

2010-07-01 03:34:03

by Dave Chinner

Subject: Re: [patch 52/52] fs: icache less I_FREEING time

On Wed, Jun 30, 2010 at 10:14:52PM +1000, Nick Piggin wrote:
> On Wed, Jun 30, 2010 at 08:13:54PM +1000, Dave Chinner wrote:
> > On Thu, Jun 24, 2010 at 01:03:04PM +1000, [email protected] wrote:
> > > Problem with inode reclaim is that it puts inodes into I_FREEING state
> > > and then continues to gather more, during which it may iput,
> > > invalidate_mapping_pages, be preempted, etc. Holding these inodes in
> > > I_FREEING can cause pauses.
> >
> > What sort of pauses? I can't see how holding a few inodes in
> > I_FREEING state would cause any serious sort of holdoff...
>
> Well if the inode is accessed again, it has to wait for potentially
> hundreds of inodes to be found from the LRU, pagecache invalidated,
> and destroyed.

So it's a theoretical concern you have, not something that's
actually been demonstrated as a problem?

As it is, if the inode is accessed immediately after teardown has
started, then we failed to hold on to the inode at a higher level
for long enough. Changing the I_FREEING behaviour is trying to
address the issue at the wrong level...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-07-01 08:06:35

by Nick Piggin

Subject: Re: [patch 52/52] fs: icache less I_FREEING time

On Thu, Jul 01, 2010 at 01:33:42PM +1000, Dave Chinner wrote:
> On Wed, Jun 30, 2010 at 10:14:52PM +1000, Nick Piggin wrote:
> > On Wed, Jun 30, 2010 at 08:13:54PM +1000, Dave Chinner wrote:
> > > On Thu, Jun 24, 2010 at 01:03:04PM +1000, [email protected] wrote:
> > > > Problem with inode reclaim is that it puts inodes into I_FREEING state
> > > > and then continues to gather more, during which it may iput,
> > > > invalidate_mapping_pages, be preempted, etc. Holding these inodes in
> > > > I_FREEING can cause pauses.
> > >
> > > What sort of pauses? I can't see how holding a few inodes in
> > > I_FREEING state would cause any serious sort of holdoff...
> >
> > Well if the inode is accessed again, it has to wait for potentially
> > hundreds of inodes to be found from the LRU, pagecache invalidated,
> > and destroyed.
>
> So it's a theoretical concern you have, not something that's
> actually been demonstrated as a problem?

Yes, it causes holdoffs. I actually started seeing the problem quite
visibly in my parallel git diff workload once I began experimenting
with larger batch sizes.

But even smaller batches could cause a problem if the wrong inode is
hit. I didn't add instrumentation there, but I just didn't see much
benefit from batching after the lock breakups, so I prefer to go the
other way: drop the batching now, and add it back with its downsides
only IF that proves to be an improvement.

Is this of any consequence for the filesystem?


> As it is, if the inode is accessed immediately after teardown has
> started, then we failed to hold on to the inode at a higher level
> for long enough.

Or we simply didn't anticipate the userspace access pattern.


> Changing the I_FREEING behaviour is trying to
> address the issue at the wrong level...

Reclaim should of course be as good as possible, but it is inherently
going to have problems. You need to design everything around reclaim
with the possibility in mind that it will go wrong. Because it will,
in unreproducible ways, on a customer's machine.