2010-08-23 14:48:44

by Chris Mason

[permalink] [raw]
Subject: aio: bump i_count instead of using igrab

The aio batching code is using igrab to get an extra reference on the
inode so it can safely batch. igrab will go ahead and take the global
inode spinlock, which can be a bottleneck on large machines doing lots
of AIO.

In this case, igrab isn't required because we already have a reference
on the file handle. It is safe to just bump the i_count directly
on the inode.

Benchmarking shows this patch brings IOP/s on tons of flash up by about
2.5X.

Signed-off-by: Chris Mason <[email protected]>

diff --git a/fs/aio.c b/fs/aio.c
index cf0bef4..0be69fa 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1558,7 +1558,20 @@ static void aio_batch_add(struct address_space *mapping,
}

abe = mempool_alloc(abe_pool, GFP_KERNEL);
- BUG_ON(!igrab(mapping->host));
+
+ /*
+ * we should be using igrab here, but
+ * we don't want to hammer on the global
+ * inode spinlock just to take an extra
+ * reference on a file that we must already
+ * have a reference to.
+ *
+ * When we're called, we always have a reference
+ * on the file, so we must always have a reference
+ * on the inode, so igrab must always just
+ * bump the count and move on.
+ */
+ atomic_inc(&mapping->host->i_count);
abe->mapping = mapping;
hlist_add_head(&abe->list, &batch_hash[bucket]);
return;


2010-08-23 14:50:35

by Christoph Hellwig

[permalink] [raw]
Subject: Re: aio: bump i_count instead of using igrab

On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
> The aio batching code is using igrab to get an extra reference on the
> inode so it can safely batch. igrab will go ahead and take the global
> inode spinlock, which can be a bottleneck on large machines doing lots
> of AIO.
>
> In this case, igrab isn't required because we already have a reference
> on the file handle. It is safe to just bump the i_count directly
> on the inode.
>
> Benchmarking shows this patch brings IOP/s on tons of flash up by about
> 2.5X.

There's some places in XFS where we do the same, and it showed up as a
bottle neck before. Instead of open coding the increment we have
a wrapper that includes and assert that the numbers is always positive.

I think we really want a proper helper for general use instead of
completly opencoding it.

2010-08-23 15:01:28

by Chris Mason

[permalink] [raw]
Subject: Re: aio: bump i_count instead of using igrab

On Mon, Aug 23, 2010 at 10:50:31AM -0400, Christoph Hellwig wrote:
> On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
> > The aio batching code is using igrab to get an extra reference on the
> > inode so it can safely batch. igrab will go ahead and take the global
> > inode spinlock, which can be a bottleneck on large machines doing lots
> > of AIO.
> >
> > In this case, igrab isn't required because we already have a reference
> > on the file handle. It is safe to just bump the i_count directly
> > on the inode.
> >
> > Benchmarking shows this patch brings IOP/s on tons of flash up by about
> > 2.5X.
>
> There's some places in XFS where we do the same, and it showed up as a
> bottle neck before. Instead of open coding the increment we have
> a wrapper that includes and assert that the numbers is always positive.
>
> I think we really want a proper helper for general use instead of
> completly opencoding it.
>

Nick, this is about a 1 liner to fs/aio.c replacing igrab with
atomic_inc directly on the inode reference count.

I know your scalability tree gets rid of the global, but in this case I
think it still makes sense to avoid the locking completely when the
caller knows it is safe. Do you already have something similar hiding
in the scalability tree?

-chris

2010-08-23 17:26:31

by Jeff Moyer

[permalink] [raw]
Subject: Re: aio: bump i_count instead of using igrab

Chris Mason <[email protected]> writes:

> On Mon, Aug 23, 2010 at 10:50:31AM -0400, Christoph Hellwig wrote:
>> On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
>> > The aio batching code is using igrab to get an extra reference on the
>> > inode so it can safely batch. igrab will go ahead and take the global
>> > inode spinlock, which can be a bottleneck on large machines doing lots
>> > of AIO.
>> >
>> > In this case, igrab isn't required because we already have a reference
>> > on the file handle. It is safe to just bump the i_count directly
>> > on the inode.
>> >
>> > Benchmarking shows this patch brings IOP/s on tons of flash up by about
>> > 2.5X.
>>
>> There's some places in XFS where we do the same, and it showed up as a
>> bottle neck before. Instead of open coding the increment we have
>> a wrapper that includes and assert that the numbers is always positive.
>>
>> I think we really want a proper helper for general use instead of
>> completly opencoding it.
>>
>
> Nick, this is about a 1 liner to fs/aio.c replacing igrab with
> atomic_inc directly on the inode reference count.
>
> I know your scalability tree gets rid of the global, but in this case I
> think it still makes sense to avoid the locking completely when the
> caller knows it is safe. Do you already have something similar hiding
> in the scalability tree?

I opted for the safe route, initially, as I was not too familiar with
the locking. If it's deemed safe to just do the increment, that works
for me.

Thanks for tracking this down, Chris!

Cheers,
Jeff

2010-08-23 21:18:50

by Jeff Moyer

[permalink] [raw]
Subject: Re: aio: bump i_count instead of using igrab

Christoph Hellwig <[email protected]> writes:

> On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
>> The aio batching code is using igrab to get an extra reference on the
>> inode so it can safely batch. igrab will go ahead and take the global
>> inode spinlock, which can be a bottleneck on large machines doing lots
>> of AIO.
>>
>> In this case, igrab isn't required because we already have a reference
>> on the file handle. It is safe to just bump the i_count directly
>> on the inode.
>>
>> Benchmarking shows this patch brings IOP/s on tons of flash up by about
>> 2.5X.
>
> There's some places in XFS where we do the same, and it showed up as a
> bottle neck before. Instead of open coding the increment we have
> a wrapper that includes and assert that the numbers is always positive.
>
> I think we really want a proper helper for general use instead of
> completly opencoding it.

Well, it would make detecting races or invalid assumptions a little
easier. If Chris wants to code that up, that's fine with me. Honestly,
though, I don't think it's necessary.

I've gone through the alloc/free paths for the inode and I'm convinced
this is safe. I'm happy with this version of the patch.

Reviewed-by: Jeff Moyer <[email protected]>

2010-08-24 07:34:08

by Nick Piggin

[permalink] [raw]
Subject: Re: aio: bump i_count instead of using igrab

On Mon, Aug 23, 2010 at 11:00:23AM -0400, Chris Mason wrote:
> On Mon, Aug 23, 2010 at 10:50:31AM -0400, Christoph Hellwig wrote:
> > On Mon, Aug 23, 2010 at 10:47:55AM -0400, Chris Mason wrote:
> > > The aio batching code is using igrab to get an extra reference on the
> > > inode so it can safely batch. igrab will go ahead and take the global
> > > inode spinlock, which can be a bottleneck on large machines doing lots
> > > of AIO.
> > >
> > > In this case, igrab isn't required because we already have a reference
> > > on the file handle. It is safe to just bump the i_count directly
> > > on the inode.
> > >
> > > Benchmarking shows this patch brings IOP/s on tons of flash up by about
> > > 2.5X.
> >
> > There's some places in XFS where we do the same, and it showed up as a
> > bottle neck before. Instead of open coding the increment we have
> > a wrapper that includes and assert that the numbers is always positive.
> >
> > I think we really want a proper helper for general use instead of
> > completly opencoding it.

Yeah igrab isn't nice. Several places are calling it without checking
return code too, which means they are either buggy or should be using
something else.


> Nick, this is about a 1 liner to fs/aio.c replacing igrab with
> atomic_inc directly on the inode reference count.
>
> I know your scalability tree gets rid of the global, but in this case I
> think it still makes sense to avoid the locking completely when the
> caller knows it is safe. Do you already have something similar hiding
> in the scalability tree?

Yes of course :) It has just simple refcounting increment helper that
can do some of the assertions.

I am hoping to be able to post the inode stuff soonish and hopefully
get it into Al's tree. It will pop up on fsdevel.