2010-06-24 03:26:19

by Nick Piggin

Subject: [patch 42/52] fs: icache per-cpu last_ino allocator

From: Eric Dumazet <[email protected]>

new_inode() dirties a contended cache line to get increasing inode numbers.

Solve this problem by providing each cpu with a per_cpu variable, fed by the
shared last_ino, but only once every 1024 allocations.

This reduces contention on the shared last_ino and gives the same spread of
inode numbers as before (same wraparound after 2^32 allocations).

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Nick Piggin <[email protected]>
---
fs/inode.c | 42 +++++++++++++++++++++++++++++++++++-------
1 file changed, 35 insertions(+), 7 deletions(-)

Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -664,6 +664,40 @@ __inode_add_to_lists(struct super_block
}
}

+#ifdef CONFIG_SMP
+/*
+ * Each cpu owns a range of 1024 numbers.
+ * 'shared_last_ino' is dirtied only once out of 1024 allocations,
+ * to renew the exhausted range.
+ *
+ * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
+ * error if st_ino won't fit in target struct field. Use 32bit counter
+ * here to attempt to avoid that.
+ */
+static DEFINE_PER_CPU(int, last_ino);
+static atomic_t shared_last_ino;
+
+static int last_ino_get(void)
+{
+ int *p = &get_cpu_var(last_ino);
+ int res = *p;
+
+ if (unlikely((res & 1023) == 0))
+ res = atomic_add_return(1024, &shared_last_ino) - 1024;
+
+ *p = ++res;
+ put_cpu_var(last_ino);
+ return res;
+}
+#else
+static int last_ino_get(void)
+{
+ static int last_ino;
+
+ return ++last_ino;
+}
+#endif
+
/**
* inode_add_to_lists - add a new inode to relevant lists
* @sb: superblock inode belongs to
@@ -700,19 +734,13 @@ EXPORT_SYMBOL_GPL(inode_add_to_lists);
*/
struct inode *new_inode(struct super_block *sb)
{
- /*
- * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
- * error if st_ino won't fit in target struct field. Use 32bit counter
- * here to attempt to avoid that.
- */
- static atomic_t last_ino = ATOMIC_INIT(0);
struct inode *inode;

inode = alloc_inode(sb);
if (inode) {
/* XXX: init as locked for speedup */
spin_lock(&inode->i_lock);
- inode->i_ino = atomic_inc_return(&last_ino);
+ inode->i_ino = last_ino_get();
inode->i_state = 0;
__inode_add_to_lists(sb, NULL, inode);
spin_unlock(&inode->i_lock);
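
The batching scheme in last_ino_get() can be modelled in plain userspace C11,
with threads standing in for CPUs (a sketch under that assumption, not the
kernel code): each thread consumes numbers from a thread-local counter and
touches the shared atomic only when its 1024-number range is exhausted.

```c
#include <assert.h>
#include <stdatomic.h>

#define INO_BATCH 1024  /* each "cpu" (thread) reserves this many at a time */

static atomic_int shared_last_ino;   /* shared; dirtied once per batch */
static _Thread_local int last_ino;   /* private hot-path counter       */

/* Userspace model of the patch's last_ino_get(): refill the private
 * range from the shared counter only when the low 10 bits hit zero. */
static int last_ino_get_model(void)
{
	int res = last_ino;

	if ((res & (INO_BATCH - 1)) == 0)
		res = atomic_fetch_add(&shared_last_ino, INO_BATCH);

	last_ino = res + 1;
	return last_ino;
}
```

A single thread sees the same dense sequence 1, 2, 3, ... as the original
atomic counter; with several threads, each works in its own disjoint
1024-wide window, so the shared cache line is written once per 1024
allocations instead of once per allocation.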


2010-06-24 09:48:21

by Andi Kleen

Subject: Re: [patch 42/52] fs: icache per-cpu last_ino allocator

[email protected] writes:

> From: Eric Dumazet <[email protected]>
>
> new_inode() dirties a contended cache line to get increasing inode numbers.
>
> Solve this problem by providing each cpu with a per_cpu variable, fed by
> the shared last_ino, but only once every 1024 allocations.

Most file systems don't even need this because they
allocate their own inode numbers, right? So perhaps it could be turned
off for all of those, e.g. with a superblock flag.

I guess the main customer is sockets only.

> +#ifdef CONFIG_SMP
> +/*
> + * Each cpu owns a range of 1024 numbers.
> + * 'shared_last_ino' is dirtied only once out of 1024 allocations,
> + * to renew the exhausted range.
> + *
> + * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
> + * error if st_ino won't fit in target struct field. Use 32bit counter
> + * here to attempt to avoid that.

I don't understand how the 32bit counter should prevent that.

> + */
> +static DEFINE_PER_CPU(int, last_ino);
> +static atomic_t shared_last_ino;

With the 1024 skip, isn't overflow much more likely, just scaling
with the number of CPUs on large systems, even if there
aren't that many new inodes?

> +static int last_ino_get(void)
> +{
> + int *p = &get_cpu_var(last_ino);
> + int res = *p;
> +
> + if (unlikely((res & 1023) == 0))
> + res = atomic_add_return(1024, &shared_last_ino) - 1024;

The magic numbers really want to be defines?

-Andi
--
[email protected] -- Speaking for myself only.

2010-06-24 15:52:51

by Nick Piggin

Subject: Re: [patch 42/52] fs: icache per-cpu last_ino allocator

On Thu, Jun 24, 2010 at 11:48:13AM +0200, Andi Kleen wrote:
> [email protected] writes:
>
> > From: Eric Dumazet <[email protected]>
> >
> > new_inode() dirties a contended cache line to get increasing inode numbers.
> >
> > Solve this problem by providing each cpu with a per_cpu variable, fed by
> > the shared last_ino, but only once every 1024 allocations.
>
> Most file systems don't even need this because they
> allocate their own inode numbers, right?. So perhaps it could be turned
> off for all of those, e.g. with a superblock flag.

That's right. More or less it just requires alloc_inode to be exported;
adding more branches in new_inode would not be a good way to go.

But I didn't want to start microoptimisations in filesystems just yet.


> I guess the main customer is sockets only.

I guess. Sockets and ram based filesystems. Interestingly I don't really
know what it's for (in socket code it's mostly for reporting and
hashing, it seems). It sure isn't guaranteed to be unique.

Anyway it's outside the scope of this patchset to change functionality
at all.


> > +#ifdef CONFIG_SMP
> > +/*
> > + * Each cpu owns a range of 1024 numbers.
> > + * 'shared_last_ino' is dirtied only once out of 1024 allocations,
> > + * to renew the exhausted range.
> > + *
> > + * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
> > + * error if st_ino won't fit in target struct field. Use 32bit counter
> > + * here to attempt to avoid that.
>
> I don't understand how the 32bit counter should prevent that.

Well I think glibc will convert 64 bit stat struct to 32bit for
old apps. It detects if the ino can't fit in 32 bits.
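
The glibc behaviour described here can be sketched as the check a compat
stat wrapper performs (a simplified illustration with an invented function
name, not glibc's actual source): the 64-bit inode is accepted only if
truncating it to 32 bits loses nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified model of glibc's 64-bit -> 32-bit stat conversion for old
 * binaries: accept the inode only if truncation to 32 bits is lossless. */
static int stat64_ino_to_ino32(uint64_t ino64, uint32_t *ino32)
{
	if (ino64 != (uint64_t)(uint32_t)ino64)
		return EOVERFLOW;	/* st_ino won't fit: stat() fails */
	*ino32 = (uint32_t)ino64;
	return 0;
}
```

Keeping the kernel's counter 32-bit means a check like the one above never
trips for these inode numbers.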


> > +static DEFINE_PER_CPU(int, last_ino);
> > +static atomic_t shared_last_ino;
>
> With the 1024 skip, isn't overflow much more likely, just scaling
> with the number of CPUs on large systems, even if there
> aren't that many new inodes?

Well EOVERFLOW should never happen with only the low 32 significant
bits set in the inode. If you are worried about wrapping the counter,
then no I don't think it is much more likely.

Because each CPU will only reserve another 1024 inode interval after
it has already allocated 1024 numbers. So the most wastage you will
get is (1024-1)*NR_CPUS -- somewhere around 1/1000th of the available
range.
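
That worst-case bound is easy to check numerically; here is a small sketch
using a hypothetical 4096-cpu machine (the cpu count is an assumption for
illustration only):

```c
#include <assert.h>

#define INO_BATCH 1024UL

/* Worst case: every cpu has just refilled its range and used one number,
 * leaving (INO_BATCH - 1) reserved-but-unused numbers per cpu. */
static unsigned long worst_case_waste(unsigned long ncpus)
{
	return (INO_BATCH - 1UL) * ncpus;
}
```

For 4096 cpus that is 1023 * 4096 = 4,190,208 skipped numbers, slightly
under 0.1% of the 2^32 space, matching the 1/1000th figure above.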

I guess overflow will be more common now because it will be possible
to allocate inodes much faster on such a huge machine :)


> > +static int last_ino_get(void)
> > +{
> > + int *p = &get_cpu_var(last_ino);
> > + int res = *p;
> > +
> > + if (unlikely((res & 1023) == 0))
> > + res = atomic_add_return(1024, &shared_last_ino) - 1024;
>
> The magic numbers really want to be defines?

Sure OK.

2010-06-24 16:19:52

by Andi Kleen

Subject: Re: [patch 42/52] fs: icache per-cpu last_ino allocator

On Fri, Jun 25, 2010 at 01:52:43AM +1000, Nick Piggin wrote:
>
> That's right. More or less it just requires alloc_inode to be exported;
> adding more branches in new_inode would not be a good way to go.

One test/branch shouldn't hurt much.

> > I guess the main customer is sockets only.
>
> I guess. Sockets and ram based filesystems. Interestingly I don't really
> know what it's for (in socket code it's mostly for reporting and
> hashing, it seems). It sure isn't guaranteed to be unique.

Maybe it could be generated lazily on access for those?
I suppose stat on a socket is relatively rare.
The only problem is it would need an accessor.

But ok out of scope.

> Well I think glibc will convert 64 bit stat struct to 32bit for
> old apps. It detects if the ino can't fit in 32 bits.

... and will fail the stat.

-Andi

--
[email protected] -- Speaking for myself only.

2010-06-24 16:39:10

by Nick Piggin

Subject: Re: [patch 42/52] fs: icache per-cpu last_ino allocator

On Thu, Jun 24, 2010 at 06:19:49PM +0200, Andi Kleen wrote:
> On Fri, Jun 25, 2010 at 01:52:43AM +1000, Nick Piggin wrote:
> >
> > That's right. More or less it just requires alloc_inode to be exported;
> > adding more branches in new_inode would not be a good way to go.
>
> One test/branch shouldn't hurt much.

If we go through filesystems anyway, we may as well just use alloc_inode.


> > > I guess the main customer is sockets only.
> >
> > I guess. Sockets and ram based filesystems. Interestingly I don't really
> > know what it's for (in socket code it's mostly for reporting and
> > hashing, it seems). It sure isn't guaranteed to be unique.
>
> Maybe it could be generated lazily on access for those?
> I suppose stat on a socket is relatively rare.
> The only problem is it would need an accessor.
>
> But ok out of scope.

Yeah, that might work. sock_i_ino() and ->dname cover a lot.


> > Well I think glibc will convert 64 bit stat struct to 32bit for
> > old apps. It detects if the ino can't fit in 32 bits.
>
> ... and will fail the stat.

Which is what we're trying to avoid, I guess.