2023-11-01 02:22:53

by Al Viro

[permalink] [raw]
Subject: Re: [RFC] simplifying fast_dput(), dentry_kill() et.al.

[NFS folks Cc'd]

On Tue, Oct 31, 2023 at 12:18:48AM +0000, Al Viro wrote:
> On Mon, Oct 30, 2023 at 12:18:28PM -1000, Linus Torvalds wrote:
> > On Mon, 30 Oct 2023 at 11:53, Al Viro <[email protected]> wrote:
> > >
> > > After fixing a couple of brainos, it seems to work.
> >
> > This all makes me unnaturally nervous, probably because it;s overly
> > subtle, and I have lost the context for some of the rules.
>
> A bit of context: I started to look at the possibility of refcount overflows.
> Writing the current rules for dentry refcounting and lifetime down was the
> obvious first step, and that immediately turned into an awful mess.
>
> It is overly subtle. Even more so when you throw the shrink lists into
> the mix - shrink_lock_dentry() got too smart for its own good, and that
> leads to really awful correctness proofs.

... and for another example of subtle shit, consider DCACHE_NORCU. Recall
c0eb027e5aef "vfs: don't do RCU lookup of empty pathnames" and note that
it relies upon never getting results of alloc_file_pseudo() with directory
inode anywhere near descriptor tables.

Back then I basically went "fine, nobody would ever use alloc_file_pseudo()
for that anyway", but... there's a call in __nfs42_ssc_open() that doesn't
have any obvious protection against ending up with directory inode.
That does not end up anywhere near descriptor tables, as far as I can tell,
fortunately.

Unfortunately, it is quite capable of fucking the things up in different
ways, even if it's promptly closed. d_instantiate() on directory inode
is a really bad thing; a plenty of places expect to have only one alias
for those, and would be very unhappy with that kind of crap without any
RCU considerations.

I'm pretty sure that this NFS code really does not want to use that for
directories; the simplest solution would be to refuse alloc_file_pseudo()
for directory inodes. NFS folks - do you have a problem with the
following patch?

======
Make sure we never feed a directory to alloc_file_pseudo()

That would be broken in a lot of ways, from UAF in pathwalk if
that thing ever gets into descriptor tables, to royally screwing
every place that relies upon the lack of aliases for directory
inodes (i.e. quite a bit of VFS).

Signed-off-by: Al Viro <[email protected]>
---
diff --git a/fs/file_table.c b/fs/file_table.c
index ee21b3da9d08..5331a696896e 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -326,6 +326,9 @@ struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
struct path path;
struct file *file;

+ if (WARN_ON_ONCE(S_ISDIR(inode->i_mode)))
+ return ERR_PTR(-EISDIR);
+
path.dentry = d_alloc_pseudo(mnt->mnt_sb, &this);
if (!path.dentry)
return ERR_PTR(-ENOMEM);


2023-11-01 14:31:09

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [RFC] simplifying fast_dput(), dentry_kill() et.al.

On 31 Oct 2023, at 22:22, Al Viro wrote:

> [NFS folks Cc'd]
>
> On Tue, Oct 31, 2023 at 12:18:48AM +0000, Al Viro wrote:
>> On Mon, Oct 30, 2023 at 12:18:28PM -1000, Linus Torvalds wrote:
>>> On Mon, 30 Oct 2023 at 11:53, Al Viro <[email protected]> wrote:
>>>>
>>>> After fixing a couple of brainos, it seems to work.
>>>
>>> This all makes me unnaturally nervous, probably because it;s overly
>>> subtle, and I have lost the context for some of the rules.
>>
>> A bit of context: I started to look at the possibility of refcount overflows.
>> Writing the current rules for dentry refcounting and lifetime down was the
>> obvious first step, and that immediately turned into an awful mess.
>>
>> It is overly subtle. Even more so when you throw the shrink lists into
>> the mix - shrink_lock_dentry() got too smart for its own good, and that
>> leads to really awful correctness proofs.
>
> ... and for another example of subtle shit, consider DCACHE_NORCU. Recall
> c0eb027e5aef "vfs: don't do RCU lookup of empty pathnames" and note that
> it relies upon never getting results of alloc_file_pseudo() with directory
> inode anywhere near descriptor tables.
>
> Back then I basically went "fine, nobody would ever use alloc_file_pseudo()
> for that anyway", but... there's a call in __nfs42_ssc_open() that doesn't
> have any obvious protection against ending up with directory inode.
> That does not end up anywhere near descriptor tables, as far as I can tell,
> fortunately.
>
> Unfortunately, it is quite capable of fucking the things up in different
> ways, even if it's promptly closed. d_instantiate() on directory inode
> is a really bad thing; a plenty of places expect to have only one alias
> for those, and would be very unhappy with that kind of crap without any
> RCU considerations.
>
> I'm pretty sure that this NFS code really does not want to use that for
> directories; the simplest solution would be to refuse alloc_file_pseudo()
> for directory inodes. NFS folks - do you have a problem with the
> following patch?

It would be a protocol violation to use COPY on a directory:

https://www.rfc-editor.org/rfc/rfc7862.html#section-15.2.3

Both SAVED_FH and CURRENT_FH must be regular files. If either
SAVED_FH or CURRENT_FH is not a regular file, the operation MUST fail
and return NFS4ERR_WRONG_TYPE.

so nfsd4_verify_copy() does S_ISREG() checks before alloc_file_pseudo().

Ben