2024-04-04 15:59:34

by David Howells

Subject: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

During mount, cifs_mount_get_tcon() gets a tcon resource connection record
and then attaches an fscache volume cookie to it. However, it does this
irrespective of whether or not the tcon returned from cifs_get_tcon() is a
new record or one that's already in use. This leads to a warning about a
volume cookie collision and a leaked volume cookie because tcon->fscache
gets reset.

Fix this by adding a mutex and a "we've already tried this" flag and only
doing it once for the lifetime of the tcon.

[!] Note: Looking at cifs_mount_get_tcon(), a more general solution may
actually be required. Reacquiring the volume cookie isn't the only thing
that function does: it also partially reinitialises the tcon record without
any locking - which may cause live filesystem ops already using the tcon
through a previous mount to malfunction.

This can be reproduced simply by something like:

mount //example.com/test /xfstest.test -o user=shares,pass=xxx,fsc
mount //example.com/test /mnt -o user=shares,pass=xxx,fsc

Fixes: 70431bfd825d ("cifs: Support fscache indexing rewrite")
Signed-off-by: David Howells <[email protected]>
cc: Steve French <[email protected]>
cc: Paulo Alcantara <[email protected]>
cc: Shyam Prasad N <[email protected]>
cc: [email protected]
cc: [email protected]
---
 fs/smb/client/cifsglob.h |  2 ++
 fs/smb/client/fscache.c  | 13 +++++++++++++
 fs/smb/client/misc.c     |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 7ed9d05f6890..43319288b4e3 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1275,7 +1275,9 @@ struct cifs_tcon {
 	__u32 max_cached_dirs;
 #ifdef CONFIG_CIFS_FSCACHE
 	u64 resource_id; /* server resource id */
+	bool fscache_acquired; /* T if we've tried acquiring a cookie */
 	struct fscache_volume *fscache; /* cookie for share */
+	struct mutex fscache_lock; /* Prevent regetting a cookie */
 #endif
 	struct list_head pending_opens; /* list of incomplete opens */
 	struct cached_fids *cfids;
diff --git a/fs/smb/client/fscache.c b/fs/smb/client/fscache.c
index 340efce8f052..113bde8f1e61 100644
--- a/fs/smb/client/fscache.c
+++ b/fs/smb/client/fscache.c
@@ -43,12 +43,23 @@ int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
 	char *key;
 	int ret = -ENOMEM;
 
+	if (tcon->fscache_acquired)
+		return 0;
+
+	mutex_lock(&tcon->fscache_lock);
+	if (tcon->fscache_acquired) {
+		mutex_unlock(&tcon->fscache_lock);
+		return 0;
+	}
+	tcon->fscache_acquired = true;
+
 	tcon->fscache = NULL;
 	switch (sa->sa_family) {
 	case AF_INET:
 	case AF_INET6:
 		break;
 	default:
+		mutex_unlock(&tcon->fscache_lock);
 		cifs_dbg(VFS, "Unknown network family '%d'\n", sa->sa_family);
 		return -EINVAL;
 	}
@@ -57,6 +68,7 @@ int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
 
 	sharename = extract_sharename(tcon->tree_name);
 	if (IS_ERR(sharename)) {
+		mutex_unlock(&tcon->fscache_lock);
 		cifs_dbg(FYI, "%s: couldn't extract sharename\n", __func__);
 		return PTR_ERR(sharename);
 	}
@@ -90,6 +102,7 @@ int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
 	kfree(key);
 out:
 	kfree(sharename);
+	mutex_unlock(&tcon->fscache_lock);
 	return ret;
 }

diff --git a/fs/smb/client/misc.c b/fs/smb/client/misc.c
index c3771fc81328..b27fbb840539 100644
--- a/fs/smb/client/misc.c
+++ b/fs/smb/client/misc.c
@@ -141,6 +141,9 @@ tcon_info_alloc(bool dir_leases_enabled)
 #ifdef CONFIG_CIFS_DFS_UPCALL
 	INIT_LIST_HEAD(&ret_buf->dfs_ses_list);
 #endif
+#ifdef CONFIG_CIFS_FSCACHE
+	mutex_init(&ret_buf->fscache_lock);
+#endif
 
 	return ret_buf;
 }



2024-04-04 16:08:54

by Paulo Alcantara

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

Hi Dave,

David Howells <[email protected]> writes:

> During mount, cifs_mount_get_tcon() gets a tcon resource connection record
> and then attaches an fscache volume cookie to it. However, it does this
> irrespective of whether or not the tcon returned from cifs_get_tcon() is a
> new record or one that's already in use. This leads to a warning about a
> volume cookie collision and a leaked volume cookie because tcon->fscache
> gets reset.
>
> Fix this by adding a mutex and a "we've already tried this" flag and only
> doing it once for the lifetime of the tcon.

Can't we just move the cookie acquisition to cifs_get_tcon() before it
gets added to list @ses->tcon_list? This way we'll guarantee that the
cookie is set only once for the new tcon.

Besides, do we want to share a tcon with two different superblocks that
have 'fsc' and 'nofsc', respectively? If not, it would be better to fix
match_tcon() as well to handle such a case.
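
Something like this in match_tcon(), perhaps - completely untested; it
assumes @ctx->fsc reflects the 'fsc'/'nofsc' option and reuses the
@fscache_acquired flag your patch adds:

#ifdef CONFIG_CIFS_FSCACHE
	/* Don't let an 'fsc' superblock share a tcon set up by a 'nofsc'
	 * mount, or vice versa.
	 */
	if (tcon->fscache_acquired != ctx->fsc)
		return 0;
#endif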

> [!] Note: Looking at cifs_mount_get_tcon(), a more general solution may
> actually be required. Reacquiring the volume cookie isn't the only thing
> that function does: it also partially reinitialises the tcon record without
> any locking - which may cause live filesystem ops already using the tcon
> through a previous mount to malfunction.

Agreed.

2024-04-16 17:00:18

by David Howells

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

Paulo Alcantara <[email protected]> wrote:

> Can't we just move the cookie acquisition to cifs_get_tcon() before it
> gets added to list @ses->tcon_list? This way we'll guarantee that the
> cookie is set only once for the new tcon.

cifs_get_tcon() is used from more than one place and I'm not sure the second
place (__cifs_construct_tcon()) actually wants a cookie. I'm not sure what
that path is for. Could all the (re)setting up being done in
cifs_mount_get_tcon() be pushed back into cifs_get_tcon()?

> Besides, do we want to share a tcon with two different superblocks that
> have 'fsc' and 'nofsc', respectively? If not, it would be better to fix
> match_tcon() as well to handle such a case.

Maybe? What does a tcon *actually* represent? I note that in
cifs_match_super(), it's not the only criterion matched upon - so you can, at
least in apparent theory, get different superblocks for the same tcon anyway.

This suggests that the tcon might not be the best place for the fscache volume
cookie as you can have multiple inodes wishing to use the same file cookie if
there are multiple mounts mounting the same tcon but, say, with different
mount parameters.

I'm not sure what the right way around this is. The root of the problem is
coherency management. If we make a change to an inode on one mounted
superblock and this bounces a change notification over to the server that then
pokes an inode in another mounted superblock on the same machine and causes it
to be invalidated, you lose your local cache if both inodes refer to the same
fscache cookie.

Remember: fscache does not do this for you! It's just a facility by which
data can be stored and retrieved. The netfs is responsible for telling
it when to invalidate and handling coherency.
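
Roughly, from memory (the struct and variable names here are made up, not
actual cifs code), the netfs attaches some coherency data when it binds the
cookie and must explicitly tell fscache when the object changes under it:

	struct my_coherency_data {
		__le64 change_time;	/* e.g. server change/mtime */
		__le64 file_size;
	} cd;

	/* Bind a data-file cookie; fscache discards a cached object whose
	 * stored coherency data doesn't match what we pass in here.
	 */
	cookie = fscache_acquire_cookie(volume, 0 /* advice */,
					key, key_len,
					&cd, sizeof(cd),
					object_size);

	/* If the netfs later detects a third-party change, it has to say so. */
	fscache_invalidate(cookie, &cd, new_size, 0);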

That said, it might be possible to time-share a cookie on cifs with leases,
but the local superblocks would have to know about each other - in which case,
why are they separate superblocks?

David


2024-04-17 13:42:14

by David Howells

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

Paulo Alcantara <[email protected]> wrote:

> > [!] Note: Looking at cifs_mount_get_tcon(), a more general solution may
> > actually be required. Reacquiring the volume cookie isn't the only thing
> > that function does: it also partially reinitialises the tcon record without
> > any locking - which may cause live filesystem ops already using the tcon
> > through a previous mount to malfunction.
>
> Agreed.

Looking over the code again, I'm not sure whether it is actually necessary - or
whether it is necessary and will be a bit nasty to implement, as it will
also require read locking.

Firstly, reset_cifs_unix_caps() seems to re-set tcon->fsUnixInfo.Capability
and tcon->unix_ext, which it would presumably set to the same things - which
is probably fine.

However, cifs_qfs_tcon() performs RPC operations that reload tcon->fsDevInfo and
tcon->fsAttrInfo - both of which may be being accessed without locks.

smb2_qfs_tcon() and smb3_qfs_tcon() alter everything cifs_qfs_tcon() does,
plus a bunch of extra tcon members. Can this locally cached information
change over time on the server whilst we have a connection to it?

David


2024-04-17 14:09:56

by Paulo Alcantara

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

David Howells <[email protected]> writes:

> Paulo Alcantara <[email protected]> wrote:
>
>> Can't we just move the cookie acquisition to cifs_get_tcon() before it
>> gets added to list @ses->tcon_list? This way we'll guarantee that the
>> cookie is set only once for the new tcon.
>
> cifs_get_tcon() is used from more than one place and I'm not sure the second
> place (__cifs_construct_tcon()) actually wants a cookie. I'm not sure what
> that path is for.

__cifs_construct_tcon() is used for creating sessions and tcons under
multiuser mounts. Whenever a user accesses a multiuser mount and the
client can't find a credential for it, a new session and tcon will be
created for the user accessing the mount -- new accesses from the same user
will end up reusing the created session and tcon.

And yes, I don't think we'll need a cookie for those tcons as the client
seems to get the fscache cookie from the master tcon (the one created from
the mount credentials).
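
From memory (so take the exact helpers with a pinch of salt), the per-inode
cookies are acquired against the master tcon's volume cookie, along the lines
of:

	struct cifs_tcon *tcon = cifs_sb_master_tcon(CIFS_SB(inode->i_sb));
	/* tcon->fscache is then passed as the volume when acquiring the
	 * inode's data-file cookie */

which is why the per-user tcons never need a volume cookie of their own.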

> Could all the (re)setting up being done in cifs_mount_get_tcon() be
> pushed back into cifs_get_tcon()?

AFAICT, yes. I'd need to look into it to make sure that's safe.

>> Besides, do we want to share a tcon with two different superblocks that
>> have 'fsc' and 'nofsc', respectively? If not, it would be better to fix
>> match_tcon() as well to handle such a case.
>
> Maybe? What does a tcon *actually* represent? I note that in
> cifs_match_super(), it's not the only criterion matched upon - so you can, at
> least in apparent theory, get different superblocks for the same tcon anyway.

A tcon simply represents a tree-connected SMB share. It can be either an
IPC share (\\srv\IPC$) or the actual share (\\srv\share) we're accessing
the files from.

Consider the following example where a tcon is reused from different
CIFS superblocks:

mount.cifs //srv/share /mnt/1 -o ${opts} # new super, new tcon
mount.cifs //srv/share/dir /mnt/2 -o ${opts} # new super, reused tcon

So, /mnt/1/dir/foo and /mnt/2/foo will lead to different inodes.

The two mounts are accessing the same tcon (\\srv\share) but the new
superblock was created because the prefix path "\dir" didn't match in
cifs_match_super(). Trust me, that's a very common scenario.

> This suggests that the tcon might not be the best place for the fscache volume
> cookie as you can have multiple inodes wishing to use the same file cookie if
> there are multiple mounts mounting the same tcon but, say, with different
> mount parameters.

We're not supposed to allow mounts with different parameters to reuse
servers, sessions or tcons, so that shouldn't be an issue.

> I'm not sure what the right way around this is. The root of the problem is
> coherency management. If we make a change to an inode on one mounted
> superblock and this bounces a change notification over to the server that then
> pokes an inode in another mounted superblock on the same machine and causes it
> to be invalidated, you lose your local cache if both inodes refer to the same
> fscache cookie.

Yes, that could be a problem. Perhaps placing the fscache cookie in the
superblock would be a way to go, so we can handle the different fscache
cookies for the superblocks that contain different prefix paths and
access the same tcon.
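
Something along these lines, purely illustrative (the keying is a guess):

#ifdef CONFIG_CIFS_FSCACHE
	/* In struct cifs_sb_info rather than struct cifs_tcon: one volume
	 * cookie per superblock, keyed by share name + prefix path, so that
	 * //srv/share and //srv/share/dir get separate cookies even though
	 * they share a tcon.
	 */
	struct fscache_volume *fscache;
#endif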

> Remember: fscache does not do this for you! It's just a facility by which
> data can be stored and retrieved. The netfs is responsible for telling
> it when to invalidate and handling coherency.

ACK.

> That said, it might be possible to time-share a cookie on cifs with leases,
> but the local superblocks would have to know about each other - in which case,
> why are they separate superblocks?

See above for why they could be separate superblocks.

2024-04-17 14:38:47

by David Howells

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

Paulo Alcantara <[email protected]> wrote:

> Consider the following example where a tcon is reused from different
> CIFS superblocks:
>
> mount.cifs //srv/share /mnt/1 -o ${opts} # new super, new tcon
> mount.cifs //srv/share/dir /mnt/2 -o ${opts} # new super, reused tcon
>
> So, /mnt/1/dir/foo and /mnt/2/foo will lead to different inodes.
>
> The two mounts are accessing the same tcon (\\srv\share) but the new
> superblock was created because the prefix path "\dir" didn't match in
> cifs_match_super(). Trust me, that's a very common scenario.

Why does it need to lead to a different superblock, assuming ${opts} is the
same in both cases? Can we not do as NFS does and share the superblock,
walking during the mount process through the directory prefix to the root
object?

In other words, why does:

mount.cifs //srv/share /mnt/1 -o ${opts}
mount.cifs //srv/share/dir /mnt/2 -o ${opts}

give you a different result to:

mount.cifs //srv/share /mnt/1 -o ${opts}
mount --bind /mnt/1/dir /mnt/2

David


2024-04-17 19:00:21

by Tom Talpey

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

On 4/17/2024 10:38 AM, David Howells wrote:
> Paulo Alcantara <[email protected]> wrote:
>
>> Consider the following example where a tcon is reused from different
>> CIFS superblocks:
>>
>> mount.cifs //srv/share /mnt/1 -o ${opts} # new super, new tcon
>> mount.cifs //srv/share/dir /mnt/2 -o ${opts} # new super, reused tcon
>>
>> So, /mnt/1/dir/foo and /mnt/2/foo will lead to different inodes.
>>
>> The two mounts are accessing the same tcon (\\srv\share) but the new
>> superblock was created because the prefix path "\dir" didn't match in
>> cifs_match_super(). Trust me, that's a very common scenario.
>
> Why does it need to lead to a different superblock, assuming ${opts} is the

The tcon is a property of the SMB3 session; it's not shared, nor is
it necessarily created at mount time.

Tom.

> same in both cases? Can we not do as NFS does and share the superblock,
> walking during the mount process through the directory prefix to the root
> object?
>
> In other words, why does:
>
> mount.cifs //srv/share /mnt/1 -o ${opts}
> mount.cifs //srv/share/dir /mnt/2 -o ${opts}
>
> give you a different result to:
>
> mount.cifs //srv/share /mnt/1 -o ${opts}
> mount --bind /mnt/1/dir /mnt/2
>
> David
>
>
>

2024-04-17 21:26:05

by Paulo Alcantara

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

David Howells <[email protected]> writes:

> Paulo Alcantara <[email protected]> wrote:
>
>> Consider the following example where a tcon is reused from different
>> CIFS superblocks:
>>
>> mount.cifs //srv/share /mnt/1 -o ${opts} # new super, new tcon
>> mount.cifs //srv/share/dir /mnt/2 -o ${opts} # new super, reused tcon
>>
>> So, /mnt/1/dir/foo and /mnt/2/foo will lead to different inodes.
>>
>> The two mounts are accessing the same tcon (\\srv\share) but the new
>> superblock was created because the prefix path "\dir" didn't match in
>> cifs_match_super(). Trust me, that's a very common scenario.
>
> Why does it need to lead to a different superblock, assuming ${opts} is the
> same in both cases? Can we not do as NFS does and share the superblock,
> walking during the mount process through the directory prefix to the root
> object?

I don't know why it was designed that way, but the reason we have two
different superblocks with ${opts} being the same is that cifs.ko
relies on the value of cifs_sb_info::prepath to build paths out of
dentries. See build_path_from_dentry(). So, when you access
/mnt/2/foo, cifs.ko will build a path like '[optional tree name prefix]
+ cifs_sb_info::prepath + \foo' and then reuse connections
(server+session+tcon) from the first superblock to perform I/O on that file.
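
For example, with the two mounts above:

	/mnt/2/foo     -> prepath '\dir' + '\foo'          = '\dir\foo'
	/mnt/1/dir/foo -> prepath ''     + '\dir' + '\foo' = '\dir\foo'

Both end up putting '\dir\foo' on the wire; only the superblock that
fabricates the path differs.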

> In other words, why does:
>
> mount.cifs //srv/share /mnt/1 -o ${opts}
> mount.cifs //srv/share/dir /mnt/2 -o ${opts}
>
> give you a different result to:
>
> mount.cifs //srv/share /mnt/1 -o ${opts}
> mount --bind /mnt/1/dir /mnt/2

Honestly, I don't know how bind mounts work at the VFS level. I see that a
new superblock isn't created and, when I access /mnt/2/foo,
build_path_from_dentry() correctly returns '\dir\foo'.

2024-04-18 13:34:01

by David Howells

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

Tom Talpey <[email protected]> wrote:

> The tcon is a property of the SMB3 session; it's not shared, nor is
> it necessarily created at mount time.

Trust me, it can be shared between superblocks.

David


2024-04-18 13:44:16

by David Howells

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

Paulo Alcantara <[email protected]> wrote:

> I don't know why it was designed that way, but the reason we have two
> different superblocks with ${opts} being the same is that cifs.ko
> relies on the value of cifs_sb_info::prepath to build paths out of
> dentries. See build_path_from_dentry(). So, when you access
> /mnt/2/foo, cifs.ko will build a path like '[optional tree name prefix]
> + cifs_sb_info::prepath + \foo' and then reuse connections
> (server+session+tcon) from the first superblock to perform I/O on that file.

Yep. You don't *need* prepath. You could always build from the sb->s_root
without a prepath and have mnt->mnt_root offset the place the VFS thinks you
are:

[rootdir]/ <--- s_root points here
|
v
foo/
|
v
bar/ <--- mnt_root points here
|
v
a

Without prepath, you build back up the tree { a, bar/, foo/, [rootdir] };
with prepath, you insert the prepath at the end.

Bind mounts just make the VFS think it's starting midway down, but you build
up back to s_root.

Think of a mount as just referring to a subtree of the tree inside the
superblock. The prepath is just an optimisation - but possibly one that makes
sense for cifs if you're having to do pathname fabrication a lot.

David


2024-04-19 20:04:39

by Paulo Alcantara

Subject: Re: [PATCH] cifs: Fix reacquisition of volume cookie on still-live connection

David Howells <[email protected]> writes:

> Paulo Alcantara <[email protected]> wrote:
>
>> I don't know why it was designed that way, but the reason we have two
>> different superblocks with ${opts} being the same is that cifs.ko
>> relies on the value of cifs_sb_info::prepath to build paths out of
>> dentries. See build_path_from_dentry(). So, when you access
>> /mnt/2/foo, cifs.ko will build a path like '[optional tree name prefix]
>> + cifs_sb_info::prepath + \foo' and then reuse connections
>> (server+session+tcon) from the first superblock to perform I/O on that file.
>
> Yep. You don't *need* prepath. You could always build from the sb->s_root
> without a prepath and have mnt->mnt_root offset the place the VFS thinks you
> are:
>
> [rootdir]/ <--- s_root points here
> |
> v
> foo/
> |
> v
> bar/ <--- mnt_root points here
> |
> v
> a
>
> Without prepath, you build back up the tree { a, bar/, foo/, [rootdir] };
> with prepath, you insert the prepath at the end.
>
> Bind mounts just make the VFS think it's starting midway down, but you build
> up back to s_root.
>
> Think of a mount as just referring to a subtree of the tree inside the
> superblock. The prepath is just an optimisation - but possibly one that makes
> sense for cifs if you're having to do pathname fabrication a lot.

Thanks a lot, Dave! Great explanation. We also need to remember that
those prefix paths can also change over a reconnect. That is, if
you're currently mounted to a DFS link target '\srv1\share' and the client
fails over to the next target '\srv2\share\foo\bar', cifs_sb_info::prepath
will be set to '\foo\bar'. And if you mounted the DFS link as
`mount.cifs //dfs/link/some/dir`, cifs_sb_info::prepath would be set to
'\some\dir\foo\bar'. Yeah, a lot of corner cases to handle...

Anyway, don't worry too much about all of this as we can handle it in
follow-up patches.

FWIW, patch looks good:

Acked-by: Paulo Alcantara (Red Hat) <[email protected]>