2022-04-01 10:36:10

by Jeff Layton

Subject: [PATCH] fs: change test in inode_insert5 for adding to the sb list

The inode_insert5 currently looks at I_CREATING to decide whether to
insert the inode into the sb list. This test is a bit ambiguous though
as I_CREATING state is not directly related to that list.

This test is also problematic for some upcoming ceph changes to add
fscrypt support. We need to be able to allocate an inode using new_inode
and insert it into the hash later if we end up using it, and doing that
now means that we double add it and corrupt the list.

What we really want to know in this test is whether the inode is already
in its superblock list, and then add it if it isn't. Have it test for
list_empty instead and ensure that we always initialize the list by
doing it in inode_init_once. It's only ever removed from the list with
list_del_init, so that should be sufficient.

Suggested-by: Al Viro <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/inode.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

This is the alternate approach that Al suggested to me on IRC. I think
this is likely to be more robust in the long run, and we can avoid
exporting another symbol.

Al, if you're ok with this, would you mind taking this in via your tree?
I'd like to see this sit in linux-next for a bit so we can see if any
benchmarks get dinged.

diff --git a/fs/inode.c b/fs/inode.c
index 63324df6fa27..e10cff5102d4 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -422,6 +422,7 @@ void inode_init_once(struct inode *inode)
INIT_LIST_HEAD(&inode->i_io_list);
INIT_LIST_HEAD(&inode->i_wb_list);
INIT_LIST_HEAD(&inode->i_lru);
+ INIT_LIST_HEAD(&inode->i_sb_list);
__address_space_init_once(&inode->i_data);
i_size_ordered_init(inode);
}
@@ -1021,7 +1022,6 @@ struct inode *new_inode_pseudo(struct super_block *sb)
spin_lock(&inode->i_lock);
inode->i_state = 0;
spin_unlock(&inode->i_lock);
- INIT_LIST_HEAD(&inode->i_sb_list);
}
return inode;
}
@@ -1165,7 +1165,6 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
{
struct hlist_head *head = inode_hashtable + hash(inode->i_sb, hashval);
struct inode *old;
- bool creating = inode->i_state & I_CREATING;

again:
spin_lock(&inode_hash_lock);
@@ -1199,7 +1198,13 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
inode->i_state |= I_NEW;
hlist_add_head_rcu(&inode->i_hash, head);
spin_unlock(&inode->i_lock);
- if (!creating)
+
+ /*
+ * Add it to the list if it wasn't already in,
+ * e.g. new_inode. We hold I_NEW at this point, so
+ * we should be safe to test i_sb_list locklessly.
+ */
+ if (list_empty(&inode->i_sb_list))
inode_sb_list_add(inode);
unlock:
spin_unlock(&inode_hash_lock);
--
2.35.1


2022-04-02 14:26:58

by Dave Chinner

Subject: Re: [PATCH] fs: change test in inode_insert5 for adding to the sb list

On Thu, Mar 31, 2022 at 06:56:32PM -0400, Jeff Layton wrote:
> The inode_insert5 currently looks at I_CREATING to decide whether to
> insert the inode into the sb list. This test is a bit ambiguous though
> as I_CREATING state is not directly related to that list.
>
> This test is also problematic for some upcoming ceph changes to add
> fscrypt support. We need to be able to allocate an inode using new_inode
> and insert it into the hash later if we end up using it, and doing that
> now means that we double add it and corrupt the list.
>
> What we really want to know in this test is whether the inode is already
> in its superblock list, and then add it if it isn't. Have it test for
> list_empty instead and ensure that we always initialize the list by
> doing it in inode_init_once. It's only ever removed from the list with
> list_del_init, so that should be sufficient.
>
> Suggested-by: Al Viro <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/inode.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> This is the alternate approach that Al suggested to me on IRC. I think
> this is likely to be more robust in the long run, and we can avoid
> exporting another symbol.

Looks good to me.

Reviewed-by: Dave Chinner <[email protected]>

FWIW, I'm getting ready to resend patches originally written by
Waiman Long years ago that convert the inode sb list to a different
structure (per-cpu lists) for scalability reasons, but it still
allows using list_empty() to check whether the inode is on the list
or not, so I don't see a problem with this change at all.

> Al, if you're ok with this, would you mind taking this in via your tree?
> I'd like to see this sit in linux-next for a bit so we can see if any
> benchmarks get dinged.

I think that is unlikely - the sb inode list just doesn't show up in
profiles until you are pushing several hundred thousand inodes a
second through the inode cache and there really aren't a lot of
workloads out there that do that. At that point, sb list
contention becomes the issue, not the requirement to add in-use
inodes to the sb list...

e.g. concurrent 'find <...> -ctime' operations on XFS hit sb list
lock contention limits at about 600,000 inodes/s being
instantiated, stat()d and reclaimed from memory. With
Waiman's dlist code I mention above, it'll do 1.5 million inodes/s
for the same CPU usage. And a concurrent bulkstat workload goes
from 600,000 inodes/s to over 6 million inodes/s for the same
CPU usage. That bulkstat workload is hitting memory reclaim
scalability limits as I'm turning over ~12GB/s of cached memory on a
machine with only 16GB RAM...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2022-04-02 16:35:58

by Jeff Layton

Subject: Re: [PATCH] fs: change test in inode_insert5 for adding to the sb list

On Fri, 2022-04-01 at 14:53 +1100, Dave Chinner wrote:
> On Thu, Mar 31, 2022 at 06:56:32PM -0400, Jeff Layton wrote:
> > The inode_insert5 currently looks at I_CREATING to decide whether to
> > insert the inode into the sb list. This test is a bit ambiguous though
> > as I_CREATING state is not directly related to that list.
> >
> > This test is also problematic for some upcoming ceph changes to add
> > fscrypt support. We need to be able to allocate an inode using new_inode
> > and insert it into the hash later if we end up using it, and doing that
> > now means that we double add it and corrupt the list.
> >
> > What we really want to know in this test is whether the inode is already
> > in its superblock list, and then add it if it isn't. Have it test for
> > list_empty instead and ensure that we always initialize the list by
> > doing it in inode_init_once. It's only ever removed from the list with
> > list_del_init, so that should be sufficient.
> >
> > Suggested-by: Al Viro <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/inode.c | 11 ++++++++---
> > 1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > This is the alternate approach that Al suggested to me on IRC. I think
> > this is likely to be more robust in the long run, and we can avoid
> > exporting another symbol.
>
> Looks good to me.
>
> Reviewed-by: Dave Chinner <[email protected]>
>
> FWIW, I'm getting ready to resend patches originally written by
> Waiman Long years ago that convert the inode sb list to a different
> structure (per-cpu lists) for scalability reasons, but it still
> allows using list_empty() to check whether the inode is on the list
> or not, so I don't see a problem with this change at all.
>

Thanks, Dave.

> > Al, if you're ok with this, would you mind taking this in via your tree?
> > I'd like to see this sit in linux-next for a bit so we can see if any
> > benchmarks get dinged.
>
> I think that is unlikely - the sb inode list just doesn't show up in
> profiles until you are pushing several hundred thousand inodes a
> second through the inode cache and there really aren't a lot of
> workloads out there that do that. At that point, sb list
> contention becomes the issue, not the requirement to add in-use
> inodes to the sb list...
>

My (minor) concern was that since we're now initializing this list for
all allocations, not just in new_inode, it could potentially slow down
some callers. I agree that it seems pretty unlikely to be an issue
though.

> e.g. concurrent 'find <...> -ctime' operations on XFS hit sb list
> lock contention limits at about 600,000 inodes/s being
> instantiated, stat()d and reclaimed from memory. With
> Waiman's dlist code I mention above, it'll do 1.5 million inodes/s
> for the same CPU usage. And a concurrent bulkstat workload goes
> from 600,000 inodes/s to over 6 million inodes/s for the same
> CPU usage. That bulkstat workload is hitting memory reclaim
> scalability limits as I'm turning over ~12GB/s of cached memory on a
> machine with only 16GB RAM...
>
> Cheers,
>
> Dave.

--
Jeff Layton <[email protected]>