> On Mar 24, 2022, at 6:51 PM, Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2022-03-24 at 18:08 -0400, Chuck Lever wrote:
>> There have been reports of races that cause NFSv4 OPEN(CREATE) to
>> return an error even though the requested file was created. NFSv4
>> does not seem to provide a status code for this case.
>>
>> There are plenty of things that can go wrong between the
>> vfs_create() call in do_nfsd_create() and nfs4_vfs_get_file(), but
>> one of them is another client looking up and modifying the file's
>> mode bits in that window. If that happens, the creator might lose
>> access to the file before it can be opened.
>>
>> Instead of opening the newly created file in nfsd4_process_open2(),
>> open it as soon as the file is created, and leave it sitting in the
>> file cache.
>>
>> This patch is not optimal, of course - we should replace the
>> vfs_create() call in do_nfsd_create() with a call that creates the
>> file and returns a "struct file *" that can be planted immediately
>> in nf->nf_file.
>>
>> But first, I would like to hear opinions about the approach
>> suggested above.
>>
>> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> fs/nfsd/vfs.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 02a544645b52..80b568fa12f1 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1410,6 +1410,7 @@ do_nfsd_create(struct svc_rqst *rqstp, struct
>> svc_fh *fhp,
>> __be32 err;
>> int host_err;
>> __u32 v_mtime=0, v_atime=0;
>> + struct nfsd_file *nf;
>>
>> /* XXX: Shouldn't this be nfserr_inval? */
>> err = nfserr_perm;
>> @@ -1535,6 +1536,14 @@ do_nfsd_create(struct svc_rqst *rqstp, struct
>> svc_fh *fhp,
>> iap->ia_atime.tv_nsec = 0;
>> }
>>
>> + /* Attempt to instantiate a filecache entry while we still
>> hold
>> + * the parent's inode mutex. */
>> + err = nfsd_file_acquire(rqstp, resfhp, NFSD_MAY_WRITE, &nf);
>> + if (err)
>> + /* This would be bad */
>> + goto out;
>> + nfsd_file_put(nf);
>
> Why rely on the file cache to carry the nfsd_file? Isn't it much easier
> just to pass it back directly to the caller?
My thought was that the "struct file *" has to end up in the
filecache anyway, even in the NFSv3 case. If I do this right,
I can avoid the subsequent call to nfsd_open_verified(), which
seems to be the source of the racy chmod behavior I mentioned
above. That might help NFSv3 too, which would need to recreate
the "struct file *" in nfsd_read() or nfsd_write() anyway.
> There are only 2 callers of do_nfsd_create(), so you'd have
> nfsd3_proc_create() that will just call nfsd_file_put() as per above,
> whereas the NFSv4 specific do_open_lookup() could just stash it in the
> struct nfsd4_open so that it can get passed into do_open_permission()
> and eventually nfsd4_process_open2().
Neither nfsd4_process_open2() or do_open_permission() seem
directly interested in the nfsd_file --- it's all under the
covers, in nfs4_get_vfs_file(). But yes, a "struct nfsd4_open"
is passed to and visible in nfs4_get_vfs_file(), as is the
open->op_created boolean.
>> +
>> set_attr:
>> err = nfsd_create_setattr(rqstp, resfhp, iap);
>>
>>
>>
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
--
Chuck Lever
On Fri, 2022-03-25 at 00:35 +0000, Chuck Lever III wrote:
>
>
> > On Mar 24, 2022, at 6:51 PM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > On Thu, 2022-03-24 at 18:08 -0400, Chuck Lever wrote:
> > > There have been reports of races that cause NFSv4 OPEN(CREATE) to
> > > return an error even though the requested file was created. NFSv4
> > > does not seem to provide a status code for this case.
> > >
> > > There are plenty of things that can go wrong between the
> > > vfs_create() call in do_nfsd_create() and nfs4_vfs_get_file(),
> > > but
> > > one of them is another client looking up and modifying the file's
> > > mode bits in that window. If that happens, the creator might lose
> > > access to the file before it can be opened.
> > >
> > > Instead of opening the newly created file in
> > > nfsd4_process_open2(),
> > > open it as soon as the file is created, and leave it sitting in
> > > the
> > > file cache.
> > >
> > > This patch is not optimal, of course - we should replace the
> > > vfs_create() call in do_nfsd_create() with a call that creates
> > > the
> > > file and returns a "struct file *" that can be planted
> > > immediately
> > > in nf->nf_file.
> > >
> > > But first, I would like to hear opinions about the approach
> > > suggested above.
> > >
> > > BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382
> > > Signed-off-by: Chuck Lever <[email protected]>
> > > ---
> > > fs/nfsd/vfs.c | 9 +++++++++
> > > 1 file changed, 9 insertions(+)
> > >
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index 02a544645b52..80b568fa12f1 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -1410,6 +1410,7 @@ do_nfsd_create(struct svc_rqst *rqstp,
> > > struct
> > > svc_fh *fhp,
> > > __be32 err;
> > > int host_err;
> > > __u32 v_mtime=0, v_atime=0;
> > > + struct nfsd_file *nf;
> > >
> > > /* XXX: Shouldn't this be nfserr_inval? */
> > > err = nfserr_perm;
> > > @@ -1535,6 +1536,14 @@ do_nfsd_create(struct svc_rqst *rqstp,
> > > struct
> > > svc_fh *fhp,
> > > iap->ia_atime.tv_nsec = 0;
> > > }
> > >
> > > + /* Attempt to instantiate a filecache entry while we
> > > still
> > > hold
> > > + * the parent's inode mutex. */
> > > + err = nfsd_file_acquire(rqstp, resfhp, NFSD_MAY_WRITE,
> > > &nf);
> > > + if (err)
> > > + /* This would be bad */
> > > + goto out;
> > > + nfsd_file_put(nf);
> >
> > Why rely on the file cache to carry the nfsd_file? Isn't it much
> > easier
> > just to pass it back directly to the caller?
>
> My thought was that the "struct file *" has to end up in the
> filecache anyway, even in the NFSv3 case. If I do this right,
> I can avoid the subsequent call to nfsd_open_verified(), which
> seems to be the source of the racy chmod behavior I mentioned
> above. That might help NFSv3 too, which would need to recreate
> the "struct file *" in nfsd_read() or nfsd_write() anyway.
>
> > There are only 2 callers of do_nfsd_create(), so you'd have
> > nfsd3_proc_create() that will just call nfsd_file_put() as per
> > above,
> > whereas the NFSv4 specific do_open_lookup() could just stash it in
> > the
> > struct nfsd4_open so that it can get passed into
> > do_open_permission()
> > and eventually nfsd4_process_open2().
>
> Neither nfsd4_process_open2() or do_open_permission() seem
> directly interested in the nfsd_file --- it's all under the
> covers, in nfs4_get_vfs_file(). But yes, a "struct nfsd4_open"
> is passed to and visible in nfs4_get_vfs_file(), as is the
> open->op_created boolean.
Having a nfsd_file in the file cache doesn't prevent anyone from
changing the mode, owner and share access locks on the file before you
can do the next set of permissions checks after dropping locks, nor
does it even guarantee that you will still have an open descriptor when
the call to nfsd4_process_open2() occurs.
Making the file descriptor open() and share lock settings atomic with
the file creation, and ensuring that the resulting descriptor is passed
along all the way through the OPEN processing is the only way to avoid
these TOCTOU problems. Yes, they are still a problem for NFSv3, but
that's a protocol issue that NFSv4 was supposed to fix.
>
>
> > > +
> > > set_attr:
> > > err = nfsd_create_setattr(rqstp, resfhp, iap);
> > >
> > >
> > >
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > [email protected]
>
> --
> Chuck Lever
>
>
>
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
> On Mar 24, 2022, at 11:52 PM, Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2022-03-25 at 00:35 +0000, Chuck Lever III wrote:
>>
>>
>>> On Mar 24, 2022, at 6:51 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>>
>>> On Thu, 2022-03-24 at 18:08 -0400, Chuck Lever wrote:
>>>> There have been reports of races that cause NFSv4 OPEN(CREATE) to
>>>> return an error even though the requested file was created. NFSv4
>>>> does not seem to provide a status code for this case.
>>>>
>>>> There are plenty of things that can go wrong between the
>>>> vfs_create() call in do_nfsd_create() and nfs4_vfs_get_file(),
>>>> but
>>>> one of them is another client looking up and modifying the file's
>>>> mode bits in that window. If that happens, the creator might lose
>>>> access to the file before it can be opened.
>>>>
>>>> Instead of opening the newly created file in
>>>> nfsd4_process_open2(),
>>>> open it as soon as the file is created, and leave it sitting in
>>>> the
>>>> file cache.
>>>>
>>>> This patch is not optimal, of course - we should replace the
>>>> vfs_create() call in do_nfsd_create() with a call that creates
>>>> the
>>>> file and returns a "struct file *" that can be planted
>>>> immediately
>>>> in nf->nf_file.
>>>>
>>>> But first, I would like to hear opinions about the approach
>>>> suggested above.
>>>>
>>>> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382
>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>> ---
>>>> fs/nfsd/vfs.c | 9 +++++++++
>>>> 1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>> index 02a544645b52..80b568fa12f1 100644
>>>> --- a/fs/nfsd/vfs.c
>>>> +++ b/fs/nfsd/vfs.c
>>>> @@ -1410,6 +1410,7 @@ do_nfsd_create(struct svc_rqst *rqstp,
>>>> struct
>>>> svc_fh *fhp,
>>>> __be32 err;
>>>> int host_err;
>>>> __u32 v_mtime=0, v_atime=0;
>>>> + struct nfsd_file *nf;
>>>>
>>>> /* XXX: Shouldn't this be nfserr_inval? */
>>>> err = nfserr_perm;
>>>> @@ -1535,6 +1536,14 @@ do_nfsd_create(struct svc_rqst *rqstp,
>>>> struct
>>>> svc_fh *fhp,
>>>> iap->ia_atime.tv_nsec = 0;
>>>> }
>>>>
>>>> + /* Attempt to instantiate a filecache entry while we
>>>> still
>>>> hold
>>>> + * the parent's inode mutex. */
>>>> + err = nfsd_file_acquire(rqstp, resfhp, NFSD_MAY_WRITE,
>>>> &nf);
>>>> + if (err)
>>>> + /* This would be bad */
>>>> + goto out;
>>>> + nfsd_file_put(nf);
>>>
>>> Why rely on the file cache to carry the nfsd_file? Isn't it much
>>> easier
>>> just to pass it back directly to the caller?
>>
>> My thought was that the "struct file *" has to end up in the
>> filecache anyway, even in the NFSv3 case. If I do this right,
>> I can avoid the subsequent call to nfsd_open_verified(), which
>> seems to be the source of the racy chmod behavior I mentioned
>> above. That might help NFSv3 too, which would need to recreate
>> the "struct file *" in nfsd_read() or nfsd_write() anyway.
>>
>>> There are only 2 callers of do_nfsd_create(), so you'd have
>>> nfsd3_proc_create() that will just call nfsd_file_put() as per
>>> above,
>>> whereas the NFSv4 specific do_open_lookup() could just stash it in
>>> the
>>> struct nfsd4_open so that it can get passed into
>>> do_open_permission()
>>> and eventually nfsd4_process_open2().
>>
>> Neither nfsd4_process_open2() or do_open_permission() seem
>> directly interested in the nfsd_file --- it's all under the
>> covers, in nfs4_get_vfs_file(). But yes, a "struct nfsd4_open"
>> is passed to and visible in nfs4_get_vfs_file(), as is the
>> open->op_created boolean.
>
> Having a nfsd_file in the file cache doesn't prevent anyone from
> changing the mode, owner and share access locks on the file before you
> can do the next set of permissions checks after dropping locks,
It's not meant to prevent those changes, but merely to ensure
that the open "struct file *" that was initially created is
the one used eventually in nfsd4_process_open2(). We don't
want to have to open one there -- it could be too late.
> nor
> does it even guarantee that you will still have an open descriptor when
> the call to nfsd4_process_open2() occurs.
Agreed, that's the rub. If there's no way to get the filecache
to always preserve that "struct file *", then explicitly
returning it from do_nfsd_create() seems like the only choice.
So now the problem is which VFS API should do_nfsd_create() use
instead of vfs_create() ? I haven't yet found a "create a file"
API that is EXPORTed and takes a dentry and returns a "struct
file *". At this point I'm not sure one currently exists.
--
Chuck Lever