2016-07-08 01:48:02

by Oleg Drokin

[permalink] [raw]
Subject: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

It looks like we are bit overzealous about failing mkdir/create/mknod
with permission denied if the parent dir is not writeable.
Need to make sure the name does not exist first, because we need to
return EEXIST in that case.

Signed-off-by: Oleg Drokin <[email protected]>
---
A very similar problem exists with symlinks, but the patch is more
involved, so assuming this one is ok, I'll send a symlink one separately.
fs/nfsd/nfs4proc.c | 6 +++++-
fs/nfsd/vfs.c | 11 ++++++++++-
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index de1ff1d..0067520 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,

fh_init(&resfh, NFS4_FHSIZE);

+ /*
+ * We just check thta parent is accessible here, nfsd_* do their
+ * own access permission checks
+ */
status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
- NFSD_MAY_CREATE);
+ NFSD_MAY_EXEC);
if (status)
return status;

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6fbd81e..6a45ec6 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (isdotent(fname, flen))
goto out;

- err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
+ /*
+ * Even though it is a create, first we see if we are even allowed
+ * to peek inside the parent
+ */
+ err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
if (err)
goto out;

@@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
goto out;
}

+ /* Now let's see if we actually have permissions to create */
+ err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
+ if (err)
+ goto out;
+
if (!(iap->ia_valid & ATTR_MODE))
iap->ia_mode = 0;
iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
--
2.7.4



2016-07-08 11:02:30

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> It looks like we are bit overzealous about failing mkdir/create/mknod
> with permission denied if the parent dir is not writeable.
> Need to make sure the name does not exist first, because we need to
> return EEXIST in that case.
>
> Signed-off-by: Oleg Drokin <[email protected]>
> ---
> A very similar problem exists with symlinks, but the patch is more
> involved, so assuming this one is ok, I'll send a symlink one separately.
>  fs/nfsd/nfs4proc.c |  6 +++++-
>  fs/nfsd/vfs.c      | 11 ++++++++++-
>  2 files changed, 15 insertions(+), 2 deletions(-)
>


nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
always used is that EPERM is "permanent". IOW, changing permissions
won't ever allow the user to do something. For instance, unprivileged
users can never chown a file, so they should get back EPERM there. When
a directory isn't writeable on a create they should get EACCES since
they could do the create if the directory were writeable.


> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index de1ff1d..0067520 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  
>   fh_init(&resfh, NFS4_FHSIZE);
>  
> + /*
> +  * We just check thta parent is accessible here, nfsd_* do their
> +  * own access permission checks
> +  */
>   status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> -    NFSD_MAY_CREATE);
> +    NFSD_MAY_EXEC);
>   if (status)
>   return status;
>  
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6fbd81e..6a45ec6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>   if (isdotent(fname, flen))
>   goto out;
>  
> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> + /*
> +  * Even though it is a create, first we see if we are even allowed
> +  * to peek inside the parent
> +  */
> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>   if (err)
>   goto out;
>  
> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>   goto out; 
>   }
>  
> + /* Now let's see if we actually have permissions to create */
> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> + if (err)
> + goto out;
> +
>   if (!(iap->ia_valid & ATTR_MODE))
>   iap->ia_mode = 0;
>   iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;


Ouch. This means two nfsd_permission calls per create operation. If
it's necessary for correctness then so be it, but is it actually
documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
EACCES in this situation?

--

Jeff Layton <[email protected]>

2016-07-08 15:14:54

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:

> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>> It looks like we are bit overzealous about failing mkdir/create/mknod
>> with permission denied if the parent dir is not writeable.
>> Need to make sure the name does not exist first, because we need to
>> return EEXIST in that case.
>>
>> Signed-off-by: Oleg Drokin <[email protected]>
>> ---
>> A very similar problem exists with symlinks, but the patch is more
>> involved, so assuming this one is ok, I'll send a symlink one separately.
>> fs/nfsd/nfs4proc.c | 6 +++++-
>> fs/nfsd/vfs.c | 11 ++++++++++-
>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>
>
>
> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> always used is that EPERM is "permanent". IOW, changing permissions
> won't ever allow the user to do something. For instance, unprivileged
> users can never chown a file, so they should get back EPERM there. When
> a directory isn't writeable on a create they should get EACCES since
> they could do the create if the directory were writeable.

Hm, I see, thanks.
Confusing that you get "Permission denied" from perror ;)

>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index de1ff1d..0067520 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>
>> fh_init(&resfh, NFS4_FHSIZE);
>>
>> + /*
>> + * We just check thta parent is accessible here, nfsd_* do their
>> + * own access permission checks
>> + */
>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>> - NFSD_MAY_CREATE);
>> + NFSD_MAY_EXEC);
>> if (status)
>> return status;
>>
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 6fbd81e..6a45ec6 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> if (isdotent(fname, flen))
>> goto out;
>>
>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>> + /*
>> + * Even though it is a create, first we see if we are even allowed
>> + * to peek inside the parent
>> + */
>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>> if (err)
>> goto out;
>>
>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> goto out;
>> }
>>
>> + /* Now let's see if we actually have permissions to create */
>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>> + if (err)
>> + goto out;
>> +
>> if (!(iap->ia_valid & ATTR_MODE))
>> iap->ia_mode = 0;
>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>
>
> Ouch. This means two nfsd_permission calls per create operation. If
> it's necessary for correctness then so be it, but is it actually
> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> EACCES in this situation?

Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
newer version is here:
http://pubs.opengroup.org/onlinepubs/9699919799/

They tell us that we absolutely must fail with EEXIST if the name is a symlink
(so we need to lookup it anyway), and also that EEXIST is the failure code
if the path exists.

Are double permission checks really as bad for nfs? it looked like it would
call mostly into VFS so even if first call would be expensive, second call should
be really cheap?

In theory we can do the fh_verify(.., NFSD_MAY_NOP) to skip the first
nfsd_permission() call, but I suspect then we'd violate the spec in the other way:
i.e. mkdir in a directory where you cannot read and execute will give you
EEXIST where as that's when EACCES is desired I imagine.


>
> --
>
> Jeff Layton <[email protected]>


2016-07-08 15:54:11

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
>
> > On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> > > It looks like we are bit overzealous about failing mkdir/create/mknod
> > > with permission denied if the parent dir is not writeable.
> > > Need to make sure the name does not exist first, because we need to
> > > return EEXIST in that case.
> > >
> > > Signed-off-by: Oleg Drokin <[email protected]>
> > > ---
> > > A very similar problem exists with symlinks, but the patch is more
> > > involved, so assuming this one is ok, I'll send a symlink one separately.
> > >  fs/nfsd/nfs4proc.c |  6 +++++-
> > >  fs/nfsd/vfs.c      | 11 ++++++++++-
> > >  2 files changed, 15 insertions(+), 2 deletions(-)
> > >
> >
> >
> > nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> > always used is that EPERM is "permanent". IOW, changing permissions
> > won't ever allow the user to do something. For instance, unprivileged
> > users can never chown a file, so they should get back EPERM there. When
> > a directory isn't writeable on a create they should get EACCES since
> > they could do the create if the directory were writeable.
>
> Hm, I see, thanks.
> Confusing that you get "Permission denied" from perror ;)
>

Yes indeed. It's a subtle and confusing distinction.

> > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > index de1ff1d..0067520 100644
> > > --- a/fs/nfsd/nfs4proc.c
> > > +++ b/fs/nfsd/nfs4proc.c
> > > @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > >  
> > >   fh_init(&resfh, NFS4_FHSIZE);
> > >  
> > > + /*
> > > +  * We just check thta parent is accessible here, nfsd_* do their
> > > +  * own access permission checks
> > > +  */
> > >   status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> > > -    NFSD_MAY_CREATE);
> > > +    NFSD_MAY_EXEC);
> > >   if (status)
> > >   return status;
> > >  
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index 6fbd81e..6a45ec6 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >   if (isdotent(fname, flen))
> > >   goto out;
> > >  
> > > - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> > > + /*
> > > +  * Even though it is a create, first we see if we are even allowed
> > > +  * to peek inside the parent
> > > +  */
> > > + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> > >   if (err)
> > >   goto out;
> > >  
> > > @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > >   goto out; 
> > >   }
> > >  
> > > + /* Now let's see if we actually have permissions to create */
> > > + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> > > + if (err)
> > > + goto out;
> > > +
> > >   if (!(iap->ia_valid & ATTR_MODE))
> > >   iap->ia_mode = 0;
> > >   iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> >
> >
> > Ouch. This means two nfsd_permission calls per create operation. If
> > it's necessary for correctness then so be it, but is it actually
> > documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> > EACCES in this situation?
>
> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
> newer version is here:
> http://pubs.opengroup.org/onlinepubs/9699919799/
>
> They tell us that we absolutely must fail with EEXIST if the name is a symlink
> (so we need to lookup it anyway), and also that EEXIST is the failure code
> if the path exists.
>

I'm not sure that that verbiage supersedes the fact that you don't have
write permissions on the directory. Does it?

ISTM that it's perfectly valid to shortcut looking up the dentry if the
user doesn't have write permissions on the directory, even when the
target is a symlink.

IOW, I'm not sure I see a bug here.

> Are double permission checks really as bad for nfs? it looked like it would
> call mostly into VFS so even if first call would be expensive, second call should
> be really cheap?
>

It depends on the underlying fs. In most cases, you're right, but you
can export things that overload the ->permission op, and those can be
as expensive as they like (within reason of course).


> In theory we can do the fh_verify(.., NFSD_MAY_NOP) to skip the first
> nfsd_permission() call, but I suspect then we'd violate the spec in the other way:
> i.e. mkdir in a directory where you cannot read and execute will give you
> EEXIST where as that's when EACCES is desired I imagine.
>

--

Jeff Layton <[email protected]>

2016-07-08 15:59:59

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 8, 2016, at 11:53 AM, Jeff Layton wrote:

> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
>>
>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>> with permission denied if the parent dir is not writeable.
>>>> Need to make sure the name does not exist first, because we need to
>>>> return EEXIST in that case.
>>>>
>>>> Signed-off-by: Oleg Drokin <[email protected]>
>>>> ---
>>>> A very similar problem exists with symlinks, but the patch is more
>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>> fs/nfsd/nfs4proc.c | 6 +++++-
>>>> fs/nfsd/vfs.c | 11 ++++++++++-
>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>
>>>
>>>
>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
>>> always used is that EPERM is "permanent". IOW, changing permissions
>>> won't ever allow the user to do something. For instance, unprivileged
>>> users can never chown a file, so they should get back EPERM there. When
>>> a directory isn't writeable on a create they should get EACCES since
>>> they could do the create if the directory were writeable.
>>
>> Hm, I see, thanks.
>> Confusing that you get "Permission denied" from perror ;)
>>
>
> Yes indeed. It's a subtle and confusing distinction.
>
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index de1ff1d..0067520 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>
>>>> fh_init(&resfh, NFS4_FHSIZE);
>>>>
>>>> + /*
>>>> + * We just check thta parent is accessible here, nfsd_* do their
>>>> + * own access permission checks
>>>> + */
>>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>> - NFSD_MAY_CREATE);
>>>> + NFSD_MAY_EXEC);
>>>> if (status)
>>>> return status;
>>>>
>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>> index 6fbd81e..6a45ec6 100644
>>>> --- a/fs/nfsd/vfs.c
>>>> +++ b/fs/nfsd/vfs.c
>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>> if (isdotent(fname, flen))
>>>> goto out;
>>>>
>>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>> + /*
>>>> + * Even though it is a create, first we see if we are even allowed
>>>> + * to peek inside the parent
>>>> + */
>>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>> if (err)
>>>> goto out;
>>>>
>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>> goto out;
>>>> }
>>>>
>>>> + /* Now let's see if we actually have permissions to create */
>>>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>> + if (err)
>>>> + goto out;
>>>> +
>>>> if (!(iap->ia_valid & ATTR_MODE))
>>>> iap->ia_mode = 0;
>>>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>
>>>
>>> Ouch. This means two nfsd_permission calls per create operation. If
>>> it's necessary for correctness then so be it, but is it actually
>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
>>> EACCES in this situation?
>>
>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
>> newer version is here:
>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>
>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
>> (so we need to lookup it anyway), and also that EEXIST is the failure code
>> if the path exists.
>>
>
> I'm not sure that that verbiage supersedes the fact that you don't have
> write permissions on the directory. Does it?

"If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."

This sounds pretty straightforward to me, no?
Since it does not matter that we do not have write permissions here, because
the name already exists.

(also there are tons of applications that make this assumption when
badly reimplementing their mkdir -p thing, I imagine they also have this same
reading of the man page - this is why I even care about it).

> ISTM that it's perfectly valid to shortcut looking up the dentry if the
> user doesn't have write permissions on the directory, even when the
> target is a symlink.
>
> IOW, I'm not sure I see a bug here.
>
>> Are double permission checks really as bad for nfs? it looked like it would
>> call mostly into VFS so even if first call would be expensive, second call should
>> be really cheap?
>>
>
> It depends on the underlying fs. In most cases, you're right, but you
> can export things that overload the ->permission op, and those can be
> as expensive as they like (within reason of course).
>
>
>> In theory we can do the fh_verify(.., NFSD_MAY_NOP) to skip the first
>> nfsd_permission() call, but I suspect then we'd violate the spec in the other way:
>> i.e. mkdir in a directory where you cannot read and execute will give you
>> EEXIST where as that's when EACCES is desired I imagine.
>>
>
> --
>
> Jeff Layton <[email protected]>


2016-07-08 16:04:27

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, Jul 08, 2016 at 11:53:28AM -0400, Jeff Layton wrote:
> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
> > On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
> >
> > > On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> > > > It looks like we are bit overzealous about failing mkdir/create/mknod
> > > > with permission denied if the parent dir is not writeable.
> > > > Need to make sure the name does not exist first, because we need to
> > > > return EEXIST in that case.
> > > >
> > > > Signed-off-by: Oleg Drokin <[email protected]>
> > > > ---
> > > > A very similar problem exists with symlinks, but the patch is more
> > > > involved, so assuming this one is ok, I'll send a symlink one separately.
> > > >  fs/nfsd/nfs4proc.c |  6 +++++-
> > > >  fs/nfsd/vfs.c      | 11 ++++++++++-
> > > >  2 files changed, 15 insertions(+), 2 deletions(-)
> > > >
> > >
> > >
> > > nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> > > always used is that EPERM is "permanent". IOW, changing permissions
> > > won't ever allow the user to do something. For instance, unprivileged
> > > users can never chown a file, so they should get back EPERM there. When
> > > a directory isn't writeable on a create they should get EACCES since
> > > they could do the create if the directory were writeable.
> >
> > Hm, I see, thanks.
> > Confusing that you get "Permission denied" from perror ;)
> >
>
> Yes indeed. It's a subtle and confusing distinction.
>
> > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > index de1ff1d..0067520 100644
> > > > --- a/fs/nfsd/nfs4proc.c
> > > > +++ b/fs/nfsd/nfs4proc.c
> > > > @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > >  
> > > >   fh_init(&resfh, NFS4_FHSIZE);
> > > >  
> > > > + /*
> > > > +  * We just check thta parent is accessible here, nfsd_* do their
> > > > +  * own access permission checks
> > > > +  */
> > > >   status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> > > > -    NFSD_MAY_CREATE);
> > > > +    NFSD_MAY_EXEC);
> > > >   if (status)
> > > >   return status;
> > > >  
> > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > > index 6fbd81e..6a45ec6 100644
> > > > --- a/fs/nfsd/vfs.c
> > > > +++ b/fs/nfsd/vfs.c
> > > > @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > >   if (isdotent(fname, flen))
> > > >   goto out;
> > > >  
> > > > - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> > > > + /*
> > > > +  * Even though it is a create, first we see if we are even allowed
> > > > +  * to peek inside the parent
> > > > +  */
> > > > + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> > > >   if (err)
> > > >   goto out;
> > > >  
> > > > @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > >   goto out; 
> > > >   }
> > > >  
> > > > + /* Now let's see if we actually have permissions to create */
> > > > + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> > > > + if (err)
> > > > + goto out;
> > > > +
> > > >   if (!(iap->ia_valid & ATTR_MODE))
> > > >   iap->ia_mode = 0;
> > > >   iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> > >
> > >
> > > Ouch. This means two nfsd_permission calls per create operation. If
> > > it's necessary for correctness then so be it, but is it actually
> > > documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> > > EACCES in this situation?
> >
> > Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
> > newer version is here:
> > http://pubs.opengroup.org/onlinepubs/9699919799/
> >
> > They tell us that we absolutely must fail with EEXIST if the name is a symlink
> > (so we need to lookup it anyway), and also that EEXIST is the failure code
> > if the path exists.
> >
>
> I'm not sure that that verbiage supersedes the fact that you don't have
> write permissions on the directory. Does it?
>
> ISTM that it's perfectly valid to shortcut looking up the dentry if the
> user doesn't have write permissions on the directory, even when the
> target is a symlink.
>
> IOW, I'm not sure I see a bug here.

If this is causing real programs to behave incorrectly, then that may
matter more than the letter of the spec. But I'm a little curious why
we'd be hearing about that just now--did the client or server's behavior
change recently?

> > Are double permission checks really as bad for nfs? it looked like it would
> > call mostly into VFS so even if first call would be expensive, second call should
> > be really cheap?
> >
>
> It depends on the underlying fs. In most cases, you're right, but you
> can export things that overload the ->permission op, and those can be
> as expensive as they like (within reason of course).

Weird if the expense of a second permission call is significant compared
to following the mkdir and sync. But, what do I know.

--b.

2016-07-08 16:16:24

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 8, 2016, at 12:04 PM, J. Bruce Fields wrote:

> On Fri, Jul 08, 2016 at 11:53:28AM -0400, Jeff Layton wrote:
>> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
>>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
>>>
>>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>> with permission denied if the parent dir is not writeable.
>>>>> Need to make sure the name does not exist first, because we need to
>>>>> return EEXIST in that case.
>>>>>
>>>>> Signed-off-by: Oleg Drokin <[email protected]>
>>>>> ---
>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>> fs/nfsd/nfs4proc.c | 6 +++++-
>>>>> fs/nfsd/vfs.c | 11 ++++++++++-
>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>
>>>>
>>>>
>>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
>>>> always used is that EPERM is "permanent". IOW, changing permissions
>>>> won't ever allow the user to do something. For instance, unprivileged
>>>> users can never chown a file, so they should get back EPERM there. When
>>>> a directory isn't writeable on a create they should get EACCES since
>>>> they could do the create if the directory were writeable.
>>>
>>> Hm, I see, thanks.
>>> Confusing that you get "Permission denied" from perror ;)
>>>
>>
>> Yes indeed. It's a subtle and confusing distinction.
>>
>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>> index de1ff1d..0067520 100644
>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>
>>>>> fh_init(&resfh, NFS4_FHSIZE);
>>>>>
>>>>> + /*
>>>>> + * We just check thta parent is accessible here, nfsd_* do their
>>>>> + * own access permission checks
>>>>> + */
>>>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>> - NFSD_MAY_CREATE);
>>>>> + NFSD_MAY_EXEC);
>>>>> if (status)
>>>>> return status;
>>>>>
>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>> index 6fbd81e..6a45ec6 100644
>>>>> --- a/fs/nfsd/vfs.c
>>>>> +++ b/fs/nfsd/vfs.c
>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>> if (isdotent(fname, flen))
>>>>> goto out;
>>>>>
>>>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>> + /*
>>>>> + * Even though it is a create, first we see if we are even allowed
>>>>> + * to peek inside the parent
>>>>> + */
>>>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>> if (err)
>>>>> goto out;
>>>>>
>>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>> goto out;
>>>>> }
>>>>>
>>>>> + /* Now let's see if we actually have permissions to create */
>>>>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>>> + if (err)
>>>>> + goto out;
>>>>> +
>>>>> if (!(iap->ia_valid & ATTR_MODE))
>>>>> iap->ia_mode = 0;
>>>>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>>
>>>>
>>>> Ouch. This means two nfsd_permission calls per create operation. If
>>>> it's necessary for correctness then so be it, but is it actually
>>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
>>>> EACCES in this situation?
>>>
>>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
>>> newer version is here:
>>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>>
>>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
>>> (so we need to lookup it anyway), and also that EEXIST is the failure code
>>> if the path exists.
>>>
>>
>> I'm not sure that that verbiage supersedes the fact that you don't have
>> write permissions on the directory. Does it?
>>
>> ISTM that it's perfectly valid to shortcut looking up the dentry if the
>> user doesn't have write permissions on the directory, even when the
>> target is a symlink.
>>
>> IOW, I'm not sure I see a bug here.
>
> If this is causing real programs to behave incorrectly, then that may
> matter more than the letter of the spec. But I'm a little curious why
> we'd be hearing about that just now--did the client or server's behavior
> change recently?

We, on the Lustre side, have been hearing about this since 2010, (this optimization
was enabled in 2009).

I suspect some people just complain in places that not everybody monitors.
I tried 3.10 and it has the same problem here.
I just tried on RHEL6 (2.6.32) and the problem is also apparent there.

Also it's confusing how you get different errors depending on if the cache is hot or not:
[green@centos6-16 racer]$ mkdir test
mkdir: cannot create directory `test': Permission denied
[green@centos6-16 racer]$ ls -ld test
drwxr-xr-x 2 root root 4096 Jul 8 12:12 test
[green@centos6-16 racer]$ mkdir test
mkdir: cannot create directory `test': File exists


>>> Are double permission checks really as bad for nfs? it looked like it would
>>> call mostly into VFS so even if first call would be expensive, second call should
>>> be really cheap?
>>>
>>
>> It depends on the underlying fs. In most cases, you're right, but you
>> can export things that overload the ->permission op, and those can be
>> as expensive as they like (within reason of course).
>
> Weird if the expense of a second permission call is significant compared
> to following the mkdir and sync. But, what do I know.
>
> --b.


2016-07-08 16:17:49

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, 2016-07-08 at 11:59 -0400, Oleg Drokin wrote:
> On Jul 8, 2016, at 11:53 AM, Jeff Layton wrote:
>
> > On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
> > > On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
> > >
> > > > On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> > > > > It looks like we are bit overzealous about failing mkdir/create/mknod
> > > > > with permission denied if the parent dir is not writeable.
> > > > > Need to make sure the name does not exist first, because we need to
> > > > > return EEXIST in that case.
> > > > >
> > > > > Signed-off-by: Oleg Drokin <[email protected]>
> > > > > ---
> > > > > A very similar problem exists with symlinks, but the patch is more
> > > > > involved, so assuming this one is ok, I'll send a symlink one separately.
> > > > >  fs/nfsd/nfs4proc.c |  6 +++++-
> > > > >  fs/nfsd/vfs.c      | 11 ++++++++++-
> > > > >  2 files changed, 15 insertions(+), 2 deletions(-)
> > > > >
> > > >
> > > >
> > > > nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> > > > always used is that EPERM is "permanent". IOW, changing permissions
> > > > won't ever allow the user to do something. For instance, unprivileged
> > > > users can never chown a file, so they should get back EPERM there. When
> > > > a directory isn't writeable on a create they should get EACCES since
> > > > they could do the create if the directory were writeable.
> > >
> > > Hm, I see, thanks.
> > > Confusing that you get "Permission denied" from perror ;)
> > >
> >
> > Yes indeed. It's a subtle and confusing distinction.
> >
> > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > index de1ff1d..0067520 100644
> > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > >  
> > > > >   fh_init(&resfh, NFS4_FHSIZE);
> > > > >  
> > > > > + /*
> > > > > +  * We just check thta parent is accessible here, nfsd_* do their
> > > > > +  * own access permission checks
> > > > > +  */
> > > > >   status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> > > > > -    NFSD_MAY_CREATE);
> > > > > +    NFSD_MAY_EXEC);
> > > > >   if (status)
> > > > >   return status;
> > > > >  
> > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > > > index 6fbd81e..6a45ec6 100644
> > > > > --- a/fs/nfsd/vfs.c
> > > > > +++ b/fs/nfsd/vfs.c
> > > > > @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > > >   if (isdotent(fname, flen))
> > > > >   goto out;
> > > > >  
> > > > > - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> > > > > + /*
> > > > > +  * Even though it is a create, first we see if we are even allowed
> > > > > +  * to peek inside the parent
> > > > > +  */
> > > > > + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> > > > >   if (err)
> > > > >   goto out;
> > > > >  
> > > > > @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > > >   goto out; 
> > > > >   }
> > > > >  
> > > > > + /* Now let's see if we actually have permissions to create */
> > > > > + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> > > > > + if (err)
> > > > > + goto out;
> > > > > +
> > > > >   if (!(iap->ia_valid & ATTR_MODE))
> > > > >   iap->ia_mode = 0;
> > > > >   iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> > > >
> > > >
> > > > Ouch. This means two nfsd_permission calls per create operation. If
> > > > it's necessary for correctness then so be it, but is it actually
> > > > documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> > > > EACCES in this situation?
> > >
> > > Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
> > > newer version is here:
> > > http://pubs.opengroup.org/onlinepubs/9699919799/
> > >
> > > They tell us that we absolutely must fail with EEXIST if the name is a symlink
> > > (so we need to lookup it anyway), and also that EEXIST is the failure code
> > > if the path exists.
> > >
> >
> > I'm not sure that that verbiage supersedes the fact that you don't have
> > write permissions on the directory. Does it?
>
> "If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."
>
> This sounds pretty straightforward to me, no?
> Since it does not matter that we do not have write permissions here, because
> the name already exists.
>
> (also there are tons of applications that make this assumption when
> badly reimplementing their mkdir -p thing, I imagine they also have this same
> reading of the man page - this is why I even care about it).
>

I always have trouble with this sort of thing. Just because it's in
DESCRIPTION, does that make it supersede the part in ERRORS? IOW, I
think that's just telling you how to handle a symlink as the last
component, not that you have to do that before the permissions check.

Now that said, as a practical matter I do agree that EEXIST _is_
probably the more helpful error message. If there are applications that
rely on this then we probably should just take your patch.

Reviewed-by: Jeff Layton <[email protected]>

2016-07-08 16:29:01

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 8, 2016, at 12:17 PM, Jeff Layton wrote:

> On Fri, 2016-07-08 at 11:59 -0400, Oleg Drokin wrote:
>> On Jul 8, 2016, at 11:53 AM, Jeff Layton wrote:
>>
>>> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
>>>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
>>>>
>>>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>>> with permission denied if the parent dir is not writeable.
>>>>>> Need to make sure the name does not exist first, because we need to
>>>>>> return EEXIST in that case.
>>>>>>
>>>>>> Signed-off-by: Oleg Drokin <[email protected]>
>>>>>> ---
>>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>>> fs/nfsd/nfs4proc.c | 6 +++++-
>>>>>> fs/nfsd/vfs.c | 11 ++++++++++-
>>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>
>>>>>
>>>>>
>>>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
>>>>> always used is that EPERM is "permanent". IOW, changing permissions
>>>>> won't ever allow the user to do something. For instance, unprivileged
>>>>> users can never chown a file, so they should get back EPERM there. When
>>>>> a directory isn't writeable on a create they should get EACCES since
>>>>> they could do the create if the directory were writeable.
>>>>
>>>> Hm, I see, thanks.
>>>> Confusing that you get "Permission denied" from perror ;)
>>>>
>>>
>>> Yes indeed. It's a subtle and confusing distinction.
>>>
>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>> index de1ff1d..0067520 100644
>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>
>>>>>> fh_init(&resfh, NFS4_FHSIZE);
>>>>>>
>>>>>> + /*
>>>>>> + * We just check thta parent is accessible here, nfsd_* do their
>>>>>> + * own access permission checks
>>>>>> + */
>>>>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>>> - NFSD_MAY_CREATE);
>>>>>> + NFSD_MAY_EXEC);
>>>>>> if (status)
>>>>>> return status;
>>>>>>
>>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>>> index 6fbd81e..6a45ec6 100644
>>>>>> --- a/fs/nfsd/vfs.c
>>>>>> +++ b/fs/nfsd/vfs.c
>>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>> if (isdotent(fname, flen))
>>>>>> goto out;
>>>>>>
>>>>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>>> + /*
>>>>>> + * Even though it is a create, first we see if we are even allowed
>>>>>> + * to peek inside the parent
>>>>>> + */
>>>>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>>> if (err)
>>>>>> goto out;
>>>>>>
>>>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>> goto out;
>>>>>> }
>>>>>>
>>>>>> + /* Now let's see if we actually have permissions to create */
>>>>>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>>>> + if (err)
>>>>>> + goto out;
>>>>>> +
>>>>>> if (!(iap->ia_valid & ATTR_MODE))
>>>>>> iap->ia_mode = 0;
>>>>>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>>>
>>>>>
>>>>> Ouch. This means two nfsd_permission calls per create operation. If
>>>>> it's necessary for correctness then so be it, but is it actually
>>>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
>>>>> EACCES in this situation?
>>>>
>>>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
>>>> newer version is here:
>>>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>>>
>>>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
>>>> (so we need to lookup it anyway), and also that EEXIST is the failure code
>>>> if the path exists.
>>>>
>>>
>>> I'm not sure that that verbiage supersedes the fact that you don't have
>>> write permissions on the directory. Does it?
>>
>> "If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."
>>
>> This sounds pretty straightforward to me, no?
>> Since it does not matter that we do not have write permissions here, because
>> the name already exists.
>>
>> (also there are tons of applications that make this assumption when
>> badly reimplementing their mkdir -p thing, I imagine they also have this same
>> reading of the man page - this is why I even care about it).
>>
>
> I always have trouble with this sort of thing. Just because it's in
> DESCRIPTION, does that make it supersede the part in ERRORS? IOW, I
> think that's just telling you how to handle a symlink as the last
> component, not that you have to do that before the permissions check.

Personally I think of it like open(O_CREAT).
If the file exists, you just open it even if you have no permissions to
add stuff in the parent.

Same with mkdir/mknod - if the name already exists - you don't need any
write rights to create it - it's already there.

symlink was likely just special cased to highlight that it should not
just be followed (unlike what happens with open).

Also VFS even goes to some pains to prioritize returning EEXIST over
other errors like EROFS and such (something nfs (both server and client?)/lustre
does not, but I suspect it's a lot more fringe case?).

> Now that said, as a practical matter I do agree that EEXIST _is_
> probably the more helpful error message. If there are applications that
> rely on this then we probably should just take your patch.
>
> Reviewed-by: Jeff Layton <[email protected]>


2016-07-08 20:49:28

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, Jul 08, 2016 at 12:16:14PM -0400, Oleg Drokin wrote:
>
> On Jul 8, 2016, at 12:04 PM, J. Bruce Fields wrote:
>
> > On Fri, Jul 08, 2016 at 11:53:28AM -0400, Jeff Layton wrote:
> >> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
> >>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
> >>>
> >>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
> >>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
> >>>>> with permission denied if the parent dir is not writeable.
> >>>>> Need to make sure the name does not exist first, because we need to
> >>>>> return EEXIST in that case.
> >>>>>
> >>>>> Signed-off-by: Oleg Drokin <[email protected]>
> >>>>> ---
> >>>>> A very similar problem exists with symlinks, but the patch is more
> >>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
> >>>>> fs/nfsd/nfs4proc.c | 6 +++++-
> >>>>> fs/nfsd/vfs.c | 11 ++++++++++-
> >>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
> >>>>>
> >>>>
> >>>>
> >>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
> >>>> always used is that EPERM is "permanent". IOW, changing permissions
> >>>> won't ever allow the user to do something. For instance, unprivileged
> >>>> users can never chown a file, so they should get back EPERM there. When
> >>>> a directory isn't writeable on a create they should get EACCES since
> >>>> they could do the create if the directory were writeable.
> >>>
> >>> Hm, I see, thanks.
> >>> Confusing that you get "Permission denied" from perror ;)
> >>>
> >>
> >> Yes indeed. It's a subtle and confusing distinction.
> >>
> >>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>>>> index de1ff1d..0067520 100644
> >>>>> --- a/fs/nfsd/nfs4proc.c
> >>>>> +++ b/fs/nfsd/nfs4proc.c
> >>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>>>>
> >>>>> fh_init(&resfh, NFS4_FHSIZE);
> >>>>>
> >>>>> + /*
> >>>>> + * We just check thta parent is accessible here, nfsd_* do their
> >>>>> + * own access permission checks
> >>>>> + */
> >>>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> >>>>> - NFSD_MAY_CREATE);
> >>>>> + NFSD_MAY_EXEC);
> >>>>> if (status)
> >>>>> return status;
> >>>>>
> >>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>>>> index 6fbd81e..6a45ec6 100644
> >>>>> --- a/fs/nfsd/vfs.c
> >>>>> +++ b/fs/nfsd/vfs.c
> >>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>>> if (isdotent(fname, flen))
> >>>>> goto out;
> >>>>>
> >>>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> >>>>> + /*
> >>>>> + * Even though it is a create, first we see if we are even allowed
> >>>>> + * to peek inside the parent
> >>>>> + */
> >>>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> >>>>> if (err)
> >>>>> goto out;
> >>>>>
> >>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>>> goto out;
> >>>>> }
> >>>>>
> >>>>> + /* Now let's see if we actually have permissions to create */
> >>>>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> >>>>> + if (err)
> >>>>> + goto out;
> >>>>> +
> >>>>> if (!(iap->ia_valid & ATTR_MODE))
> >>>>> iap->ia_mode = 0;
> >>>>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> >>>>
> >>>>
> >>>> Ouch. This means two nfsd_permission calls per create operation. If
> >>>> it's necessary for correctness then so be it, but is it actually
> >>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
> >>>> EACCES in this situation?
> >>>
> >>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
> >>> newer version is here:
> >>> http://pubs.opengroup.org/onlinepubs/9699919799/
> >>>
> >>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
> >>> (so we need to lookup it anyway), and also that EEXIST is the failure code
> >>> if the path exists.
> >>>
> >>
> >> I'm not sure that that verbiage supersedes the fact that you don't have
> >> write permissions on the directory. Does it?
> >>
> >> ISTM that it's perfectly valid to shortcut looking up the dentry if the
> >> user doesn't have write permissions on the directory, even when the
> >> target is a symlink.
> >>
> >> IOW, I'm not sure I see a bug here.
> >
> > If this is causing real programs to behave incorrectly, then that may
> > matter more than the letter of the spec. But I'm a little curious why
> > we'd be hearing about that just now--did the client or server's behavior
> > change recently?
>
> We, on the Lustre side, have been hearing about this since 2010, (this optimization
> was enabled in 2009).
>
> I suspect some people just complain in places that not everybody monitors.

Sure, but you said "tons of programs" do this, and off hand I can't
recall a single report. That's weird.

Anyway, I agree that the behavior your want seems more consistent at
least.

--b.

> I tried 3.10 and it has the same problem here.
> I just tried on RHEL6 (2.6.32) and the problem is also apparent there.
>
> Also it's confusing how you get different errors depending on if the cache is hot or not:
> [green@centos6-16 racer]$ mkdir test
> mkdir: cannot create directory `test': Permission denied
> [green@centos6-16 racer]$ ls -ld test
> drwxr-xr-x 2 root root 4096 Jul 8 12:12 test
> [green@centos6-16 racer]$ mkdir test
> mkdir: cannot create directory `test': File exists
>
>
> >>> Are double permission checks really as bad for nfs? it looked like it would
> >>> call mostly into VFS so even if first call would be expensive, second call should
> >>> be really cheap?
> >>>
> >>
> >> It depends on the underlying fs. In most cases, you're right, but you
> >> can export things that overload the ->permission op, and those can be
> >> as expensive as they like (within reason of course).
> >
> > Weird if the expense of a second permission call is significant compared
> > to following the mkdir and sync. But, what do I know.
> >
> > --b.

2016-07-08 20:54:30

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
> It looks like we are bit overzealous about failing mkdir/create/mknod
> with permission denied if the parent dir is not writeable.
> Need to make sure the name does not exist first, because we need to
> return EEXIST in that case.
>
> Signed-off-by: Oleg Drokin <[email protected]>
> ---
> A very similar problem exists with symlinks, but the patch is more
> involved, so assuming this one is ok, I'll send a symlink one separately.
> fs/nfsd/nfs4proc.c | 6 +++++-
> fs/nfsd/vfs.c | 11 ++++++++++-
> 2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index de1ff1d..0067520 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>
> fh_init(&resfh, NFS4_FHSIZE);
>
> + /*
> + * We just check thta parent is accessible here, nfsd_* do their
> + * own access permission checks
> + */
> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> - NFSD_MAY_CREATE);
> + NFSD_MAY_EXEC);
> if (status)
> return status;
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6fbd81e..6a45ec6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (isdotent(fname, flen))
> goto out;
>
> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> + /*
> + * Even though it is a create, first we see if we are even allowed
> + * to peek inside the parent
> + */
> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);

Looks like in the v3 case we haven't actually locked the directory yet
at this point so this check is a little race-prone.

I wonder why the code's structured that way--it's confusing.

--b.

> if (err)
> goto out;
>
> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> goto out;
> }
>
> + /* Now let's see if we actually have permissions to create */
> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> + if (err)
> + goto out;
> +
> if (!(iap->ia_valid & ATTR_MODE))
> iap->ia_mode = 0;
> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> --
> 2.7.4

2016-07-08 21:47:32

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 8, 2016, at 4:49 PM, J. Bruce Fields wrote:

> On Fri, Jul 08, 2016 at 12:16:14PM -0400, Oleg Drokin wrote:
>>
>> On Jul 8, 2016, at 12:04 PM, J. Bruce Fields wrote:
>>
>>> On Fri, Jul 08, 2016 at 11:53:28AM -0400, Jeff Layton wrote:
>>>> On Fri, 2016-07-08 at 11:14 -0400, Oleg Drokin wrote:
>>>>> On Jul 8, 2016, at 7:02 AM, Jeff Layton wrote:
>>>>>
>>>>>> On Thu, 2016-07-07 at 21:47 -0400, Oleg Drokin wrote:
>>>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>>>> with permission denied if the parent dir is not writeable.
>>>>>>> Need to make sure the name does not exist first, because we need to
>>>>>>> return EEXIST in that case.
>>>>>>>
>>>>>>> Signed-off-by: Oleg Drokin <[email protected]>
>>>>>>> ---
>>>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>>>> fs/nfsd/nfs4proc.c | 6 +++++-
>>>>>>> fs/nfsd/vfs.c | 11 ++++++++++-
>>>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>
>>>>>>
>>>>>> nit: subject says EPERM, but I think you mean EACCES. The mnemonic I've
>>>>>> always used is that EPERM is "permanent". IOW, changing permissions
>>>>>> won't ever allow the user to do something. For instance, unprivileged
>>>>>> users can never chown a file, so they should get back EPERM there. When
>>>>>> a directory isn't writeable on a create they should get EACCES since
>>>>>> they could do the create if the directory were writeable.
>>>>>
>>>>> Hm, I see, thanks.
>>>>> Confusing that you get "Permission denied" from perror ;)
>>>>>
>>>>
>>>> Yes indeed. It's a subtle and confusing distinction.
>>>>
>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>> index de1ff1d..0067520 100644
>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>>
>>>>>>> fh_init(&resfh, NFS4_FHSIZE);
>>>>>>>
>>>>>>> + /*
>>>>>>> + * We just check thta parent is accessible here, nfsd_* do their
>>>>>>> + * own access permission checks
>>>>>>> + */
>>>>>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>>>> - NFSD_MAY_CREATE);
>>>>>>> + NFSD_MAY_EXEC);
>>>>>>> if (status)
>>>>>>> return status;
>>>>>>>
>>>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>>>> index 6fbd81e..6a45ec6 100644
>>>>>>> --- a/fs/nfsd/vfs.c
>>>>>>> +++ b/fs/nfsd/vfs.c
>>>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>>> if (isdotent(fname, flen))
>>>>>>> goto out;
>>>>>>>
>>>>>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>>>> + /*
>>>>>>> + * Even though it is a create, first we see if we are even allowed
>>>>>>> + * to peek inside the parent
>>>>>>> + */
>>>>>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>>>> if (err)
>>>>>>> goto out;
>>>>>>>
>>>>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>>> goto out;
>>>>>>> }
>>>>>>>
>>>>>>> + /* Now let's see if we actually have permissions to create */
>>>>>>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>>>>> + if (err)
>>>>>>> + goto out;
>>>>>>> +
>>>>>>> if (!(iap->ia_valid & ATTR_MODE))
>>>>>>> iap->ia_mode = 0;
>>>>>>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>>>>
>>>>>>
>>>>>> Ouch. This means two nfsd_permission calls per create operation. If
>>>>>> it's necessary for correctness then so be it, but is it actually
>>>>>> documented anywhere (POSIX perhaps?) that we must prefer EEXIST over
>>>>>> EACCES in this situation?
>>>>>
>>>>> Opengroup manpage: http://pubs.opengroup.org/onlinepubs/009695399/functions/mkdir.html
>>>>> newer version is here:
>>>>> http://pubs.opengroup.org/onlinepubs/9699919799/
>>>>>
>>>>> They tell us that we absolutely must fail with EEXIST if the name is a symlink
>>>>> (so we need to lookup it anyway), and also that EEXIST is the failure code
>>>>> if the path exists.
>>>>>
>>>>
>>>> I'm not sure that that verbiage supersedes the fact that you don't have
>>>> write permissions on the directory. Does it?
>>>>
>>>> ISTM that it's perfectly valid to shortcut looking up the dentry if the
>>>> user doesn't have write permissions on the directory, even when the
>>>> target is a symlink.
>>>>
>>>> IOW, I'm not sure I see a bug here.
>>>
>>> If this is causing real programs to behave incorrectly, then that may
>>> matter more than the letter of the spec. But I'm a little curious why
>>> we'd be hearing about that just now--did the client or server's behavior
>>> change recently?
>>
>> We, on the Lustre side, have been hearing about this since 2010, (this optimization
>> was enabled in 2009).
>>
>> I suspect some people just complain in places that not everybody monitors.
>
> Sure, but you said "tons of programs" do this, and off hand I can't
> recall a single report. That's weird.

I wonder if people just accept that "NFS is just weird" and code in workarounds,
where as with Lustre we promise (almost) full POSIX compliance, and also came much later
so people are just seeing that "this does not work" and complain more loudly?

2016-07-08 21:53:26

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:

> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
>> It looks like we are bit overzealous about failing mkdir/create/mknod
>> with permission denied if the parent dir is not writeable.
>> Need to make sure the name does not exist first, because we need to
>> return EEXIST in that case.
>>
>> Signed-off-by: Oleg Drokin <[email protected]>
>> ---
>> A very similar problem exists with symlinks, but the patch is more
>> involved, so assuming this one is ok, I'll send a symlink one separately.
>> fs/nfsd/nfs4proc.c | 6 +++++-
>> fs/nfsd/vfs.c | 11 ++++++++++-
>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index de1ff1d..0067520 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>
>> fh_init(&resfh, NFS4_FHSIZE);
>>
>> + /*
>> + * We just check thta parent is accessible here, nfsd_* do their
>> + * own access permission checks
>> + */
>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>> - NFSD_MAY_CREATE);
>> + NFSD_MAY_EXEC);
>> if (status)
>> return status;
>>
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 6fbd81e..6a45ec6 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> if (isdotent(fname, flen))
>> goto out;
>>
>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>> + /*
>> + * Even though it is a create, first we see if we are even allowed
>> + * to peek inside the parent
>> + */
>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>
> Looks like in the v3 case we haven't actually locked the directory yet
> at this point so this check is a little race-prone.

In reality this check is not really needed, I suspect.
When we call vfs_create/mknod/mkdir later on, it has it's own permission check
anyway so if there was a race and somebody changed dir access in the middle,
there's going to be another check anyway and it would be caught.
Unless there's some weird server-side permission wiggling as well that makes it
ineffective, but I imagine that one cannot really change in a racy way?

> I wonder why the code's structured that way--it's confusing.

Probably years of accumulated "damage" ;)

> --b.
>
>> if (err)
>> goto out;
>>
>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> goto out;
>> }
>>
>> + /* Now let's see if we actually have permissions to create */
>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>> + if (err)
>> + goto out;
>> +
>> if (!(iap->ia_valid & ATTR_MODE))
>> iap->ia_mode = 0;
>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>> --
>> 2.7.4


2016-07-09 02:52:37

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, Jul 08, 2016 at 11:59:50AM -0400, Oleg Drokin wrote:

> "If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."
>
> This sounds pretty straightforward to me, no?
> Since it does not matter that we do not have write permissions here, because
> the name already exists.

When more than one condition applies, we have every right to return any of
them. POSIX does *NOT* specify the order of checks. Never had.

2016-07-09 02:58:47

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 8, 2016, at 10:52 PM, Al Viro wrote:

> On Fri, Jul 08, 2016 at 11:59:50AM -0400, Oleg Drokin wrote:
>
>> "If path names a symbolic link, mkdir() shall fail and set errno to [EEXIST]."
>>
>> This sounds pretty straightforward to me, no?
>> Since it does not matter that we do not have write permissions here, because
>> the name already exists.
>
> When more than one condition applies, we have every right to return any of
> them. POSIX does *NOT* specify the order of checks. Never had.

Out of curiosity, why does filename_create() delay EROFS then?


2016-07-09 03:10:04

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, Jul 08, 2016 at 05:47:22PM -0400, Oleg Drokin wrote:

> I wonder if people just accept that "NFS is just weird" and code in workarounds,
> where as with Lustre we promise (almost) full POSIX compliance, and also came much later
> so people are just seeing that "this does not work" and complain more loudly?

To quote POSIX: "If more than one error occurs in processing a function call,
any one of the possible errors may be returned, as the order of detection is
undefined." (from System Interfaces: General Information: 2.3 Error Numbers)

And regarding mkdir(2) it has
[EACCES]
Search permission is denied on a component of the path prefix, or write
permission is denied on the parent directory of the directory to be created.
[EEXIST]
The named file exists.
among the error conditions. In situations when both apply, the implementation
is bloody well allowed to return either. It might be nicer to return EEXIST
in such cases, for consistency sake (if another thread does stat() on the
pathname in question just as you are about to call mkdir(2), you will get
EEXIST without ever reaching permission(9), let alone ->mkdir() method), but
please do not bring POSIX compliance as an argument. It's a QoI argument and
nothing beyond that.

2016-07-09 03:13:22

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, Jul 08, 2016 at 10:58:38PM -0400, Oleg Drokin wrote:

> > When more than one condition applies, we have every right to return any of
> > them. POSIX does *NOT* specify the order of checks. Never had.
>
> Out of curiosity, why does filename_create() delay EROFS then?

QoI and historical behaviour...

2016-07-09 03:41:48

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 8, 2016, at 11:10 PM, Al Viro wrote:

> On Fri, Jul 08, 2016 at 05:47:22PM -0400, Oleg Drokin wrote:
>
>> I wonder if people just accept that "NFS is just weird" and code in workarounds,
>> where as with Lustre we promise (almost) full POSIX compliance, and also came much later
>> so people are just seeing that "this does not work" and complain more loudly?
>
> To quote POSIX: "If more than one error occurs in processing a function call,
> any one of the possible errors may be returned, as the order of detection is
> undefined." (from System Interfaces: General Information: 2.3 Error Numbers)
>
> And regarding mkdir(2) it has
> [EACCES]
> Search permission is denied on a component of the path prefix, or write
> permission is denied on the parent directory of the directory to be created.
> [EEXIST]
> The named file exists.
> among the error conditions. In situations when both apply, the implementation
> is bloody well allowed to return either. It might be nicer to return EEXIST
> in such cases, for consistency sake (if another thread does stat() on the
> pathname in question just as you are about to call mkdir(2), you will get
> EEXIST without ever reaching permission(9), let alone ->mkdir() method), but
> please do not bring POSIX compliance as an argument. It's a QoI argument and
> nothing beyond that.

Ok, I see.
Thanks.

Bruce, do you want the patch resubmitted with a rewritten commit message,
or do you think it's best to just drop it them?


2016-07-13 19:01:17

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, Jul 08, 2016 at 11:41:41PM -0400, Oleg Drokin wrote:
>
> On Jul 8, 2016, at 11:10 PM, Al Viro wrote:
>
> > On Fri, Jul 08, 2016 at 05:47:22PM -0400, Oleg Drokin wrote:
> >
> >> I wonder if people just accept that "NFS is just weird" and code in workarounds,
> >> where as with Lustre we promise (almost) full POSIX compliance, and also came much later
> >> so people are just seeing that "this does not work" and complain more loudly?
> >
> > To quote POSIX: "If more than one error occurs in processing a function call,
> > any one of the possible errors may be returned, as the order of detection is
> > undefined." (from System Interfaces: General Information: 2.3 Error Numbers)
> >
> > And regarding mkdir(2) it has
> > [EACCES]
> > Search permission is denied on a component of the path prefix, or write
> > permission is denied on the parent directory of the directory to be created.
> > [EEXIST]
> > The named file exists.
> > among the error conditions. In situations when both apply, the implementation
> > is bloody well allowed to return either. It might be nicer to return EEXIST
> > in such cases, for consistency sake (if another thread does stat() on the
> > pathname in question just as you are about to call mkdir(2), you will get
> > EEXIST without ever reaching permission(9), let alone ->mkdir() method), but
> > please do not bring POSIX compliance as an argument. It's a QoI argument and
> > nothing beyond that.
>
> Ok, I see.
> Thanks.
>
> Bruce, do you want the patch resubmitted with a rewritten commit message,
> or do you think it's best to just drop it them?

Other things being equal I still agree with you that there'd be
advantages to being more consistent, so a changelog update would be
fine.

--b.

2016-07-21 20:34:17

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
>
> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
>
> > On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
> >> It looks like we are bit overzealous about failing mkdir/create/mknod
> >> with permission denied if the parent dir is not writeable.
> >> Need to make sure the name does not exist first, because we need to
> >> return EEXIST in that case.
> >>
> >> Signed-off-by: Oleg Drokin <[email protected]>
> >> ---
> >> A very similar problem exists with symlinks, but the patch is more
> >> involved, so assuming this one is ok, I'll send a symlink one separately.
> >> fs/nfsd/nfs4proc.c | 6 +++++-
> >> fs/nfsd/vfs.c | 11 ++++++++++-
> >> 2 files changed, 15 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >> index de1ff1d..0067520 100644
> >> --- a/fs/nfsd/nfs4proc.c
> >> +++ b/fs/nfsd/nfs4proc.c
> >> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>
> >> fh_init(&resfh, NFS4_FHSIZE);
> >>
> >> + /*
> >> + * We just check thta parent is accessible here, nfsd_* do their
> >> + * own access permission checks
> >> + */
> >> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> >> - NFSD_MAY_CREATE);
> >> + NFSD_MAY_EXEC);
> >> if (status)
> >> return status;
> >>
> >> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >> index 6fbd81e..6a45ec6 100644
> >> --- a/fs/nfsd/vfs.c
> >> +++ b/fs/nfsd/vfs.c
> >> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >> if (isdotent(fname, flen))
> >> goto out;
> >>
> >> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> >> + /*
> >> + * Even though it is a create, first we see if we are even allowed
> >> + * to peek inside the parent
> >> + */
> >> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> >
> > Looks like in the v3 case we haven't actually locked the directory yet
> > at this point so this check is a little race-prone.
>
> In reality this check is not really needed, I suspect.
> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
> anyway so if there was a race and somebody changed dir access in the middle,
> there's going to be another check anyway and it would be caught.
> Unless there's some weird server-side permission wiggling as well that makes it
> ineffective, but I imagine that one cannot really change in a racy way?

Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
We still need the fh_verify there since it's also what does the
filehandle->dentry translation, but we don't need permission checking
here yet.

Applying with that one change. (And I'll followup with some additional
minor cleanup of the create code.)

--b.

>
> > I wonder why the code's structured that way--it's confusing.
>
> Probably years of accumulated "damage" ;)
>
> > --b.
> >
> >> if (err)
> >> goto out;
> >>
> >> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >> goto out;
> >> }
> >>
> >> + /* Now let's see if we actually have permissions to create */
> >> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> >> + if (err)
> >> + goto out;
> >> +
> >> if (!(iap->ia_valid & ATTR_MODE))
> >> iap->ia_mode = 0;
> >> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> >> --
> >> 2.7.4

2016-07-21 20:37:48

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:

> On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
>>
>> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
>>
>>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>> with permission denied if the parent dir is not writeable.
>>>> Need to make sure the name does not exist first, because we need to
>>>> return EEXIST in that case.
>>>>
>>>> Signed-off-by: Oleg Drokin <[email protected]>
>>>> ---
>>>> A very similar problem exists with symlinks, but the patch is more
>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>> fs/nfsd/nfs4proc.c | 6 +++++-
>>>> fs/nfsd/vfs.c | 11 ++++++++++-
>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>> index de1ff1d..0067520 100644
>>>> --- a/fs/nfsd/nfs4proc.c
>>>> +++ b/fs/nfsd/nfs4proc.c
>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>
>>>> fh_init(&resfh, NFS4_FHSIZE);
>>>>
>>>> + /*
>>>> + * We just check thta parent is accessible here, nfsd_* do their
>>>> + * own access permission checks
>>>> + */
>>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>> - NFSD_MAY_CREATE);
>>>> + NFSD_MAY_EXEC);
>>>> if (status)
>>>> return status;
>>>>
>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>> index 6fbd81e..6a45ec6 100644
>>>> --- a/fs/nfsd/vfs.c
>>>> +++ b/fs/nfsd/vfs.c
>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>> if (isdotent(fname, flen))
>>>> goto out;
>>>>
>>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>> + /*
>>>> + * Even though it is a create, first we see if we are even allowed
>>>> + * to peek inside the parent
>>>> + */
>>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>
>>> Looks like in the v3 case we haven't actually locked the directory yet
>>> at this point so this check is a little race-prone.
>>
>> In reality this check is not really needed, I suspect.
>> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
>> anyway so if there was a race and somebody changed dir access in the middle,
>> there's going to be another check anyway and it would be caught.
>> Unless there's some weird server-side permission wiggling as well that makes it
>> ineffective, but I imagine that one cannot really change in a racy way?
>
> Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
> We still need the fh_verify there since it's also what does the
> filehandle->dentry translation, but we don't need permission checking
> here yet.

This will likely need an extra test to ensure that when you
do mkdir where you do not have exec permissions, you would get EACCES instead
of EEXIST, otherwise that would be information leakage, no?
Or do you think the second time we do nfsd_permission, that would be covered?

> Applying with that one change. (And I'll followup with some additional
> minor cleanup of the create code.)
>
> --b.
>
>>
>>> I wonder why the code's structured that way--it's confusing.
>>
>> Probably years of accumulated "damage" ;)
>>
>>> --b.
>>>
>>>> if (err)
>>>> goto out;
>>>>
>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>> goto out;
>>>> }
>>>>
>>>> + /* Now let's see if we actually have permissions to create */
>>>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>> + if (err)
>>>> + goto out;
>>>> +
>>>> if (!(iap->ia_valid & ATTR_MODE))
>>>> iap->ia_mode = 0;
>>>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>> --
>>>> 2.7.4


2016-07-22 01:57:24

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Thu, Jul 21, 2016 at 04:37:40PM -0400, Oleg Drokin wrote:
>
> On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:
>
> > On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
> >>
> >> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
> >>
> >>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
> >>>> It looks like we are bit overzealous about failing mkdir/create/mknod
> >>>> with permission denied if the parent dir is not writeable.
> >>>> Need to make sure the name does not exist first, because we need to
> >>>> return EEXIST in that case.
> >>>>
> >>>> Signed-off-by: Oleg Drokin <[email protected]>
> >>>> ---
> >>>> A very similar problem exists with symlinks, but the patch is more
> >>>> involved, so assuming this one is ok, I'll send a symlink one separately.
> >>>> fs/nfsd/nfs4proc.c | 6 +++++-
> >>>> fs/nfsd/vfs.c | 11 ++++++++++-
> >>>> 2 files changed, 15 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>>> index de1ff1d..0067520 100644
> >>>> --- a/fs/nfsd/nfs4proc.c
> >>>> +++ b/fs/nfsd/nfs4proc.c
> >>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>>>
> >>>> fh_init(&resfh, NFS4_FHSIZE);
> >>>>
> >>>> + /*
> >>>> + * We just check thta parent is accessible here, nfsd_* do their
> >>>> + * own access permission checks
> >>>> + */
> >>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> >>>> - NFSD_MAY_CREATE);
> >>>> + NFSD_MAY_EXEC);
> >>>> if (status)
> >>>> return status;
> >>>>
> >>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>>> index 6fbd81e..6a45ec6 100644
> >>>> --- a/fs/nfsd/vfs.c
> >>>> +++ b/fs/nfsd/vfs.c
> >>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>> if (isdotent(fname, flen))
> >>>> goto out;
> >>>>
> >>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> >>>> + /*
> >>>> + * Even though it is a create, first we see if we are even allowed
> >>>> + * to peek inside the parent
> >>>> + */
> >>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> >>>
> >>> Looks like in the v3 case we haven't actually locked the directory yet
> >>> at this point so this check is a little race-prone.
> >>
> >> In reality this check is not really needed, I suspect.
> >> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
> >> anyway so if there was a race and somebody changed dir access in the middle,
> >> there's going to be another check anyway and it would be caught.
> >> Unless there's some weird server-side permission wiggling as well that makes it
> >> ineffective, but I imagine that one cannot really change in a racy way?
> >
> > Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
> > We still need the fh_verify there since it's also what does the
> > filehandle->dentry translation, but we don't need permission checking
> > here yet.
>
> This will likely need an extra test to ensure that when you
> do mkdir where you do not have exec permissions, you would get EACCES instead
> of EEXIST, otherwise that would be information leakage, no?
> Or do you think the second time we do nfsd_permission, that would be covered?

No, you're right, for some reason I thought that the check for a
positive inode didn't happen till later. But actually the logic is
basically:

lock inode
lookup_one_len
return nfserr_exist if looked up dentry is positive.
check for create permission
vfs_create

So, yes, the initial MAY_EXEC test's needed to prevent that information
leak.

That said... I wonder why it's done that way? Seems to me we could just
tremove that nfserr_exist check and the vfs would handle it for us....
I'll try that.

--b.

>
> > Applying with that one change. (And I'll followup with some additional
> > minor cleanup of the create code.)
> >
> > --b.
> >
> >>
> >>> I wonder why the code's structured that way--it's confusing.
> >>
> >> Probably years of accumulated "damage" ;)
> >>
> >>> --b.
> >>>
> >>>> if (err)
> >>>> goto out;
> >>>>
> >>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>> goto out;
> >>>> }
> >>>>
> >>>> + /* Now let's see if we actually have permissions to create */
> >>>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
> >>>> + if (err)
> >>>> + goto out;
> >>>> +
> >>>> if (!(iap->ia_valid & ATTR_MODE))
> >>>> iap->ia_mode = 0;
> >>>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
> >>>> --
> >>>> 2.7.4

2016-07-22 06:35:36

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 21, 2016, at 9:57 PM, J. Bruce Fields wrote:

> On Thu, Jul 21, 2016 at 04:37:40PM -0400, Oleg Drokin wrote:
>>
>> On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:
>>
>>> On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
>>>>
>>>> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
>>>>
>>>>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
>>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>>> with permission denied if the parent dir is not writeable.
>>>>>> Need to make sure the name does not exist first, because we need to
>>>>>> return EEXIST in that case.
>>>>>>
>>>>>> Signed-off-by: Oleg Drokin <[email protected]>
>>>>>> ---
>>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>>> fs/nfsd/nfs4proc.c | 6 +++++-
>>>>>> fs/nfsd/vfs.c | 11 ++++++++++-
>>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>> index de1ff1d..0067520 100644
>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>
>>>>>> fh_init(&resfh, NFS4_FHSIZE);
>>>>>>
>>>>>> + /*
>>>>>> + * We just check thta parent is accessible here, nfsd_* do their
>>>>>> + * own access permission checks
>>>>>> + */
>>>>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>>> - NFSD_MAY_CREATE);
>>>>>> + NFSD_MAY_EXEC);
>>>>>> if (status)
>>>>>> return status;
>>>>>>
>>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>>> index 6fbd81e..6a45ec6 100644
>>>>>> --- a/fs/nfsd/vfs.c
>>>>>> +++ b/fs/nfsd/vfs.c
>>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>> if (isdotent(fname, flen))
>>>>>> goto out;
>>>>>>
>>>>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>>> + /*
>>>>>> + * Even though it is a create, first we see if we are even allowed
>>>>>> + * to peek inside the parent
>>>>>> + */
>>>>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>>
>>>>> Looks like in the v3 case we haven't actually locked the directory yet
>>>>> at this point so this check is a little race-prone.
>>>>
>>>> In reality this check is not really needed, I suspect.
>>>> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
>>>> anyway so if there was a race and somebody changed dir access in the middle,
>>>> there's going to be another check anyway and it would be caught.
>>>> Unless there's some weird server-side permission wiggling as well that makes it
>>>> ineffective, but I imagine that one cannot really change in a racy way?
>>>
>>> Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
>>> We still need the fh_verify there since it's also what does the
>>> filehandle->dentry translation, but we don't need permission checking
>>> here yet.
>>
>> This will likely need an extra test to ensure that when you
>> do mkdir where you do not have exec permissions, you would get EACCES instead
>> of EEXIST, otherwise that would be information leakage, no?
>> Or do you think the second time we do nfsd_permission, that would be covered?
>
> No, you're right, for some reason I thought that the check for a
> positive inode didn't happen till later. But actually the logic is
> basically:
>
> lock inode
> lookup_one_len
> return nfserr_exist if looked up dentry is positive.
> check for create permission
> vfs_create
>
> So, yes, the initial MAY_EXEC test's needed to prevent that information
> leak.
>
> That said... I wonder why it's done that way? Seems to me we could just
> tremove that nfserr_exist check and the vfs would handle it for us....
> I'll try that.

It won't work because the very first thing vfs_create does is may_create(),
and so you get EACCES right there instead of the EEXIST.

>
> --b.
>
>>
>>> Applying with that one change. (And I'll followup with some additional
>>> minor cleanup of the create code.)
>>>
>>> --b.
>>>
>>>>
>>>>> I wonder why the code's structured that way--it's confusing.
>>>>
>>>> Probably years of accumulated "damage" ;)
>>>>
>>>>> --b.
>>>>>
>>>>>> if (err)
>>>>>> goto out;
>>>>>>
>>>>>> @@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>> goto out;
>>>>>> }
>>>>>>
>>>>>> + /* Now let's see if we actually have permissions to create */
>>>>>> + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
>>>>>> + if (err)
>>>>>> + goto out;
>>>>>> +
>>>>>> if (!(iap->ia_valid & ATTR_MODE))
>>>>>> iap->ia_mode = 0;
>>>>>> iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
>>>>>> --
>>>>>> 2.7.4


2016-07-22 10:55:29

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

On Fri, Jul 22, 2016 at 02:35:26AM -0400, Oleg Drokin wrote:
>
> On Jul 21, 2016, at 9:57 PM, J. Bruce Fields wrote:
>
> > On Thu, Jul 21, 2016 at 04:37:40PM -0400, Oleg Drokin wrote:
> >>
> >> On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:
> >>
> >>> On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
> >>>>
> >>>> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
> >>>>
> >>>>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
> >>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
> >>>>>> with permission denied if the parent dir is not writeable.
> >>>>>> Need to make sure the name does not exist first, because we need to
> >>>>>> return EEXIST in that case.
> >>>>>>
> >>>>>> Signed-off-by: Oleg Drokin <[email protected]>
> >>>>>> ---
> >>>>>> A very similar problem exists with symlinks, but the patch is more
> >>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
> >>>>>> fs/nfsd/nfs4proc.c | 6 +++++-
> >>>>>> fs/nfsd/vfs.c | 11 ++++++++++-
> >>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> >>>>>> index de1ff1d..0067520 100644
> >>>>>> --- a/fs/nfsd/nfs4proc.c
> >>>>>> +++ b/fs/nfsd/nfs4proc.c
> >>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >>>>>>
> >>>>>> fh_init(&resfh, NFS4_FHSIZE);
> >>>>>>
> >>>>>> + /*
> >>>>>> + * We just check thta parent is accessible here, nfsd_* do their
> >>>>>> + * own access permission checks
> >>>>>> + */
> >>>>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
> >>>>>> - NFSD_MAY_CREATE);
> >>>>>> + NFSD_MAY_EXEC);
> >>>>>> if (status)
> >>>>>> return status;
> >>>>>>
> >>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>>>>> index 6fbd81e..6a45ec6 100644
> >>>>>> --- a/fs/nfsd/vfs.c
> >>>>>> +++ b/fs/nfsd/vfs.c
> >>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>>>>> if (isdotent(fname, flen))
> >>>>>> goto out;
> >>>>>>
> >>>>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
> >>>>>> + /*
> >>>>>> + * Even though it is a create, first we see if we are even allowed
> >>>>>> + * to peek inside the parent
> >>>>>> + */
> >>>>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
> >>>>>
> >>>>> Looks like in the v3 case we haven't actually locked the directory yet
> >>>>> at this point so this check is a little race-prone.
> >>>>
> >>>> In reality this check is not really needed, I suspect.
> >>>> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
> >>>> anyway so if there was a race and somebody changed dir access in the middle,
> >>>> there's going to be another check anyway and it would be caught.
> >>>> Unless there's some weird server-side permission wiggling as well that makes it
> >>>> ineffective, but I imagine that one cannot really change in a racy way?
> >>>
> >>> Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
> >>> We still need the fh_verify there since it's also what does the
> >>> filehandle->dentry translation, but we don't need permission checking
> >>> here yet.
> >>
> >> This will likely need an extra test to ensure that when you
> >> do mkdir where you do not have exec permissions, you would get EACCES instead
> >> of EEXIST, otherwise that would be information leakage, no?
> >> Or do you think the second time we do nfsd_permission, that would be covered?
> >
> > No, you're right, for some reason I thought that the check for a
> > positive inode didn't happen till later. But actually the logic is
> > basically:
> >
> > lock inode
> > lookup_one_len
> > return nfserr_exist if looked up dentry is positive.
> > check for create permission
> > vfs_create
> >
> > So, yes, the initial MAY_EXEC test's needed to prevent that information
> > leak.
> >
> > That said... I wonder why it's done that way? Seems to me we could just
> > tremove that nfserr_exist check and the vfs would handle it for us....
> > I'll try that.
>
> It won't work because the very first thing vfs_create does is may_create(),
> and so you get EACCES right there instead of the EEXIST.

static inline int may_create(struct inode *dir, struct dentry *child)
{
audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE);
if (child->d_inode)
return -EEXIST;
...

So it looks OK to me.

--b.

2016-07-22 15:13:29

by Oleg Drokin

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM


On Jul 22, 2016, at 6:55 AM, J. Bruce Fields wrote:

> On Fri, Jul 22, 2016 at 02:35:26AM -0400, Oleg Drokin wrote:
>>
>> On Jul 21, 2016, at 9:57 PM, J. Bruce Fields wrote:
>>
>>> On Thu, Jul 21, 2016 at 04:37:40PM -0400, Oleg Drokin wrote:
>>>>
>>>> On Jul 21, 2016, at 4:34 PM, J. Bruce Fields wrote:
>>>>
>>>>> On Fri, Jul 08, 2016 at 05:53:19PM -0400, Oleg Drokin wrote:
>>>>>>
>>>>>> On Jul 8, 2016, at 4:54 PM, J. Bruce Fields wrote:
>>>>>>
>>>>>>> On Thu, Jul 07, 2016 at 09:47:46PM -0400, Oleg Drokin wrote:
>>>>>>>> It looks like we are bit overzealous about failing mkdir/create/mknod
>>>>>>>> with permission denied if the parent dir is not writeable.
>>>>>>>> Need to make sure the name does not exist first, because we need to
>>>>>>>> return EEXIST in that case.
>>>>>>>>
>>>>>>>> Signed-off-by: Oleg Drokin <[email protected]>
>>>>>>>> ---
>>>>>>>> A very similar problem exists with symlinks, but the patch is more
>>>>>>>> involved, so assuming this one is ok, I'll send a symlink one separately.
>>>>>>>> fs/nfsd/nfs4proc.c | 6 +++++-
>>>>>>>> fs/nfsd/vfs.c | 11 ++++++++++-
>>>>>>>> 2 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>>>>>>>> index de1ff1d..0067520 100644
>>>>>>>> --- a/fs/nfsd/nfs4proc.c
>>>>>>>> +++ b/fs/nfsd/nfs4proc.c
>>>>>>>> @@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>>>>>>
>>>>>>>> fh_init(&resfh, NFS4_FHSIZE);
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * We just check thta parent is accessible here, nfsd_* do their
>>>>>>>> + * own access permission checks
>>>>>>>> + */
>>>>>>>> status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
>>>>>>>> - NFSD_MAY_CREATE);
>>>>>>>> + NFSD_MAY_EXEC);
>>>>>>>> if (status)
>>>>>>>> return status;
>>>>>>>>
>>>>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>>>>> index 6fbd81e..6a45ec6 100644
>>>>>>>> --- a/fs/nfsd/vfs.c
>>>>>>>> +++ b/fs/nfsd/vfs.c
>>>>>>>> @@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>>>>> if (isdotent(fname, flen))
>>>>>>>> goto out;
>>>>>>>>
>>>>>>>> - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
>>>>>>>> + /*
>>>>>>>> + * Even though it is a create, first we see if we are even allowed
>>>>>>>> + * to peek inside the parent
>>>>>>>> + */
>>>>>>>> + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
>>>>>>>
>>>>>>> Looks like in the v3 case we haven't actually locked the directory yet
>>>>>>> at this point so this check is a little race-prone.
>>>>>>
>>>>>> In reality this check is not really needed, I suspect.
>>>>>> When we call vfs_create/mknod/mkdir later on, it has it's own permission check
>>>>>> anyway so if there was a race and somebody changed dir access in the middle,
>>>>>> there's going to be another check anyway and it would be caught.
>>>>>> Unless there's some weird server-side permission wiggling as well that makes it
>>>>>> ineffective, but I imagine that one cannot really change in a racy way?
>>>>>
>>>>> Yeah, I think I'll just change those NFSD_MAY_EXEC's to NFSD_MAY_NOP's.
>>>>> We still need the fh_verify there since it's also what does the
>>>>> filehandle->dentry translation, but we don't need permission checking
>>>>> here yet.
>>>>
>>>> This will likely need an extra test to ensure that when you
>>>> do mkdir where you do not have exec permissions, you would get EACCES instead
>>>> of EEXIST, otherwise that would be information leakage, no?
>>>> Or do you think the second time we do nfsd_permission, that would be covered?
>>>
>>> No, you're right, for some reason I thought that the check for a
>>> positive inode didn't happen till later. But actually the logic is
>>> basically:
>>>
>>> lock inode
>>> lookup_one_len
>>> return nfserr_exist if looked up dentry is positive.
>>> check for create permission
>>> vfs_create
>>>
>>> So, yes, the initial MAY_EXEC test's needed to prevent that information
>>> leak.
>>>
>>> That said... I wonder why it's done that way? Seems to me we could just
>>> tremove that nfserr_exist check and the vfs would handle it for us....
>>> I'll try that.
>>
>> It won't work because the very first thing vfs_create does is may_create(),
>> and so you get EACCES right there instead of the EEXIST.
>
> static inline int may_create(struct inode *dir, struct dentry *child)
> {
> audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE);
> if (child->d_inode)
> return -EEXIST;
> ...
>
> So it looks OK to me.

Hm, in fact indeed. I was just too worked up about the client side, but on the
server side there was a real lookup already, so it does look workable.

>
> --b.


2016-07-22 17:49:05

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 1/7] nfsd: Make creates return EEXIST instead of EACCES

From: Oleg Drokin <[email protected]>

When doing a create (mkdir/mknod) on a name, it's worth
checking the name exists first before returning EACCES in case
the directory is not writeable by the user.
This makes return values on the client more consistent
regardless of whenever the entry there is cached in the local
cache or not.
Another positive side effect is certain programs only expect
EEXIST in that case even despite POSIX allowing any valid
error to be returned.

Signed-off-by: Oleg Drokin <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 6 +++++-
fs/nfsd/vfs.c | 11 ++++++++++-
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index e0c15f879d89..9d7e1edf0cca 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -605,8 +605,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,

fh_init(&resfh, NFS4_FHSIZE);

+ /*
+ * We just check that parent is accessible here, nfsd_* do their
+ * own access permission checks
+ */
status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
- NFSD_MAY_CREATE);
+ NFSD_MAY_EXEC);
if (status)
return status;

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6fbd81ecb410..fda4f86161f8 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1161,7 +1161,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (isdotent(fname, flen))
goto out;

- err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE);
+ /*
+ * Even though it is a create, first let's see if we are even allowed
+ * to peek inside the parent
+ */
+ err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
if (err)
goto out;

@@ -1211,6 +1215,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
goto out;
}

+ /* Now let's see if we actually have permissions to create */
+ err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
+ if (err)
+ goto out;
+
if (!(iap->ia_valid & ATTR_MODE))
iap->ia_mode = 0;
iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
--
2.7.4


2016-07-22 17:49:05

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 4/7] nfsd: reorganize nfsd_create

From: "J. Bruce Fields" <[email protected]>

There's some odd logic in nfsd_create() that allows it to be called with
the parent directory either locked or unlocked. The only already-locked
caller is NFSv2's nfsd_proc_create(). It's less confusing to split out
the unlocked case into a separate function which the NFSv2 code can call
directly.

Also fix some comments while we're here.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfsproc.c | 4 +-
fs/nfsd/vfs.c | 109 ++++++++++++++++++++++++++++--------------------------
fs/nfsd/vfs.h | 3 ++
3 files changed, 61 insertions(+), 55 deletions(-)

diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index c30b12388bf6..8914206c867c 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -358,8 +358,8 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,
nfserr = 0;
if (!inode) {
/* File doesn't exist. Create it and set attrs */
- nfserr = nfsd_create(rqstp, dirfhp, argp->name, argp->len,
- attr, type, rdev, newfhp);
+ nfserr = nfsd_create_locked(rqstp, dirfhp, argp->name,
+ argp->len, attr, type, rdev, newfhp);
} else if (type == S_IFREG) {
dprintk("nfsd: existing %s, valid=%x, size=%ld\n",
argp->name, attr->ia_valid, (long) attr->ia_size);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 7ae3b5a72a4d..9c7830cdaeda 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1135,16 +1135,9 @@ nfsd_check_ignore_resizing(struct iattr *iap)
iap->ia_valid &= ~ATTR_SIZE;
}

-/*
- * Create a file (regular, directory, device, fifo); UNIX sockets
- * not yet implemented.
- * If the response fh has been verified, the parent directory should
- * already be locked. Note that the parent directory is left locked.
- *
- * N.B. Every call to nfsd_create needs an fh_put for _both_ fhp and resfhp
- */
+/* The parent directory should already be locked: */
__be32
-nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
+nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
char *fname, int flen, struct iattr *iap,
int type, dev_t rdev, struct svc_fh *resfhp)
{
@@ -1154,50 +1147,15 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
__be32 err2;
int host_err;

- err = nfserr_exist;
- if (isdotent(fname, flen))
- goto out;
-
- /*
- * Even though it is a create, first let's see if we are even allowed
- * to peek inside the parent
- */
- err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
- if (err)
- goto out;
-
dentry = fhp->fh_dentry;
dirp = d_inode(dentry);

- /*
- * Check whether the response file handle has been verified yet.
- * If it has, the parent directory should already be locked.
- */
- if (!resfhp->fh_dentry) {
- host_err = fh_want_write(fhp);
- if (host_err)
- goto out_nfserr;
-
- /* called from nfsd_proc_mkdir, or possibly nfsd3_proc_create */
- fh_lock_nested(fhp, I_MUTEX_PARENT);
- dchild = lookup_one_len(fname, dentry, flen);
- host_err = PTR_ERR(dchild);
- if (IS_ERR(dchild))
- goto out_nfserr;
- err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
- if (err)
- goto out;
- } else {
- /* called from nfsd_proc_create */
- dchild = dget(resfhp->fh_dentry);
- if (!fhp->fh_locked) {
- /* not actually possible */
- printk(KERN_ERR
- "nfsd_create: parent %pd2 not locked!\n",
+ dchild = dget(resfhp->fh_dentry);
+ if (!fhp->fh_locked) {
+ WARN_ONCE(1, "nfsd_create: parent %pd2 not locked!\n",
dentry);
- err = nfserr_io;
- goto out;
- }
+ err = nfserr_io;
+ goto out;
}
/*
* Make sure the child dentry is still negative ...
@@ -1225,9 +1183,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
goto out;
}

- /*
- * Get the dir op function pointer.
- */
err = 0;
host_err = 0;
switch (type) {
@@ -1254,7 +1209,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
/*
* nfsd_create_setattr already committed the child. Transactional
* filesystems had a chance to commit changes for both parent and
- * child * simultaneously making the following commit_metadata a
+ * child simultaneously making the following commit_metadata a
* noop.
*/
err2 = nfserrno(commit_metadata(fhp));
@@ -1275,6 +1230,54 @@ out_nfserr:
goto out;
}

+/*
+ * Create a filesystem object (regular, directory, special).
+ * Note that the parent directory is left locked.
+ *
+ * N.B. Every call to nfsd_create needs an fh_put for _both_ fhp and resfhp
+ */
+__be32
+nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ char *fname, int flen, struct iattr *iap,
+ int type, dev_t rdev, struct svc_fh *resfhp)
+{
+ struct dentry *dentry, *dchild = NULL;
+ struct inode *dirp;
+ __be32 err;
+ int host_err;
+
+ if (isdotent(fname, flen))
+ return nfserr_exist;
+
+ /*
+ * Even though it is a create, first let's see if we are even allowed
+ * to peek inside the parent
+ */
+ err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
+ if (err)
+ return err;
+
+ dentry = fhp->fh_dentry;
+ dirp = d_inode(dentry);
+
+ host_err = fh_want_write(fhp);
+ if (host_err)
+ return nfserrno(host_err);
+
+ fh_lock_nested(fhp, I_MUTEX_PARENT);
+ dchild = lookup_one_len(fname, dentry, flen);
+ host_err = PTR_ERR(dchild);
+ if (IS_ERR(dchild))
+ return nfserrno(host_err);
+ err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
+ if (err) {
+ dput(dchild);
+ return err;
+ }
+ return nfsd_create_locked(rqstp, fhp, fname, flen, iap, type,
+ rdev, resfhp);
+}
+
#ifdef CONFIG_NFSD_V3

/*
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 2d573ec057f8..3cbb1b33777b 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -59,6 +59,9 @@ __be32 nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *,
__be32 nfsd4_clone_file_range(struct file *, u64, struct file *,
u64, u64);
#endif /* CONFIG_NFSD_V4 */
+__be32 nfsd_create_locked(struct svc_rqst *, struct svc_fh *,
+ char *name, int len, struct iattr *attrs,
+ int type, dev_t rdev, struct svc_fh *res);
__be32 nfsd_create(struct svc_rqst *, struct svc_fh *,
char *name, int len, struct iattr *attrs,
int type, dev_t rdev, struct svc_fh *res);
--
2.7.4


2016-07-22 17:49:05

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 3/7] nfsd: remove redundant i_lookup check

From: "J. Bruce Fields" <[email protected]>

I'm not sure why this was added. It doesn't seem necessary, and no
other caller does this.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/vfs.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index fba8e7e521e0..7ae3b5a72a4d 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1169,9 +1169,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
dentry = fhp->fh_dentry;
dirp = d_inode(dentry);

- err = nfserr_notdir;
- if (!dirp->i_op->lookup)
- goto out;
/*
* Check whether the response file handle has been verified yet.
* If it has, the parent directory should already be locked.
--
2.7.4


2016-07-22 17:49:05

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 2/7] nfsd: remove redundant zero-length check from create

From: "J. Bruce Fields" <[email protected]>

lookup_one_len already has this check.

The only effect of this patch is to return access instead of perm in the
0-length-filename case. I actually prefer nfserr_perm (or _inval?), but
I doubt anyone cares.

The isdotent check seems redundant too, but I worry that some client
might actually care about that strange nfserr_exist error.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfsproc.c | 3 ---
fs/nfsd/vfs.c | 3 ---
2 files changed, 6 deletions(-)

diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 409fdde8e95d..c30b12388bf6 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -250,9 +250,6 @@ nfsd_proc_create(struct svc_rqst *rqstp, struct nfsd_createargs *argp,

/* Check for NFSD_MAY_WRITE in nfsd_create if necessary */

- nfserr = nfserr_acces;
- if (!argp->len)
- goto done;
nfserr = nfserr_exist;
if (isdotent(argp->name, argp->len))
goto done;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index fda4f86161f8..fba8e7e521e0 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1154,9 +1154,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
__be32 err2;
int host_err;

- err = nfserr_perm;
- if (!flen)
- goto out;
err = nfserr_exist;
if (isdotent(fname, flen))
goto out;
--
2.7.4


2016-07-22 17:49:05

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd: Make creates return EEXIST correctly instead of EPERM

From: "J. Bruce Fields" <[email protected]>

On Fri, Jul 22, 2016 at 11:13:20AM -0400, Oleg Drokin wrote:
> Hm, in fact indeed. I was just too worked up about the client side,
> but on the server side there was a real lookup already, so it does
> look workable.

So I end up with the following. This is all (after your patch) pretty
trivial cleanup, but I think it's overdue for that code.

--b.


J. Bruce Fields (6):
nfsd: remove redundant zero-length check from create
nfsd: remove redundant i_lookup check
nfsd: reorganize nfsd_create
nfsd: remove unnecessary positive-dentry check
nfsd: clean up bad-type check in nfsd_create_locked
nfsd: drop unnecessary MAY_EXEC check from create

Oleg Drokin (1):
nfsd: Make creates return EEXIST instead of EACCES

fs/nfsd/nfs4proc.c | 3 +-
fs/nfsd/nfsproc.c | 7 +--
fs/nfsd/vfs.c | 131 ++++++++++++++++++++++++-----------------------------
fs/nfsd/vfs.h | 3 ++
4 files changed, 66 insertions(+), 78 deletions(-)

--
2.7.4


2016-07-22 17:49:06

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 6/7] nfsd: clean up bad-type check in nfsd_create_locked

From: "J. Bruce Fields" <[email protected]>

Minor cleanup, no change in behavior.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/vfs.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index d45b39b408a1..cd06c6511cfc 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1166,13 +1166,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
iap->ia_mode = 0;
iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;

- err = nfserr_inval;
- if (!S_ISREG(type) && !S_ISDIR(type) && !special_file(type)) {
- printk(KERN_WARNING "nfsd: bad file type %o in nfsd_create\n",
- type);
- goto out;
- }
-
err = 0;
host_err = 0;
switch (type) {
@@ -1190,6 +1183,10 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
case S_IFSOCK:
host_err = vfs_mknod(dirp, dchild, iap->ia_mode, rdev);
break;
+ default:
+ printk(KERN_WARNING "nfsd: bad file type %o in nfsd_create\n",
+ type);
+ host_err = -EINVAL;
}
if (host_err < 0)
goto out_nfserr;
--
2.7.4


2016-07-22 17:49:06

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 5/7] nfsd: remove unnecessary positive-dentry check

From: "J. Bruce Fields" <[email protected]>

vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
returns EEXIST if the object already exists.

This check is therefore unnecessary.

(In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to
RFC 1094, our code seems to believe that a CREATE of an existing file
should succeed. I'm leaving that behavior alone.)

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/vfs.c | 10 ----------
1 file changed, 10 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 9c7830cdaeda..d45b39b408a1 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1157,17 +1157,7 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
err = nfserr_io;
goto out;
}
- /*
- * Make sure the child dentry is still negative ...
- */
- err = nfserr_exist;
- if (d_really_is_positive(dchild)) {
- dprintk("nfsd_create: dentry %pd/%pd not negative!\n",
- dentry, dchild);
- goto out;
- }

- /* Now let's see if we actually have permissions to create */
err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE);
if (err)
goto out;
--
2.7.4


2016-07-22 17:49:06

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 7/7] nfsd: drop unnecessary MAY_EXEC check from create

From: "J. Bruce Fields" <[email protected]>

We need an fh_verify to make sure we at least have a dentry, but actual
permission checks happen later.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 7 +------
fs/nfsd/vfs.c | 6 +-----
2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 9d7e1edf0cca..1fb222752b2b 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -605,12 +605,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,

fh_init(&resfh, NFS4_FHSIZE);

- /*
- * We just check that parent is accessible here, nfsd_* do their
- * own access permission checks
- */
- status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR,
- NFSD_MAY_EXEC);
+ status = fh_verify(rqstp, &cstate->current_fh, S_IFDIR, NFSD_MAY_NOP);
if (status)
return status;

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index cd06c6511cfc..c844fd601381 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1236,11 +1236,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (isdotent(fname, flen))
return nfserr_exist;

- /*
- * Even though it is a create, first let's see if we are even allowed
- * to peek inside the parent
- */
- err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC);
+ err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_NOP);
if (err)
return err;

--
2.7.4


2016-07-24 00:22:44

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 3/7] nfsd: remove redundant i_lookup check

On Fri, Jul 22, 2016 at 01:48:52PM -0400, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> I'm not sure why this was added. It doesn't seem necessary, and no
> other caller does this.

lookup_one_len() will explode if you call it for non-directory (==
!d_can_lookup(), i.e. something without ->lookup()). So unless the callers
do guarantee that check being true, it *is* needed.

2016-07-24 12:10:16

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/7] nfsd: remove redundant i_lookup check

On Sun, Jul 24, 2016 at 01:22:06AM +0100, Al Viro wrote:
> On Fri, Jul 22, 2016 at 01:48:52PM -0400, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > I'm not sure why this was added. It doesn't seem necessary, and no
> > other caller does this.
>
> lookup_one_len() will explode if you call it for non-directory (==
> !d_can_lookup(), i.e. something without ->lookup()). So unless the callers
> do guarantee that check being true, it *is* needed.

Both callers call fh_verify(.,.,S_IFDIR,.), so at this point we know
that i_mode & S_IFMT == S_IFDIR. Is there some odd case where that's
insufficient? If so, I think there may be bugs elsewhere in nfsd. If
not, I'll add a note to the changelog.

Thanks for reminding me to check this, I hadn't thought of that as an
"is this a directory" check, it makes more sense now.

--b.

2016-07-24 14:23:16

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH 3/7] nfsd: remove redundant i_lookup check

On Sun, Jul 24, 2016 at 08:10:14AM -0400, J. Bruce Fields wrote:
> On Sun, Jul 24, 2016 at 01:22:06AM +0100, Al Viro wrote:
> > On Fri, Jul 22, 2016 at 01:48:52PM -0400, J. Bruce Fields wrote:
> > > From: "J. Bruce Fields" <[email protected]>
> > >
> > > I'm not sure why this was added. It doesn't seem necessary, and no
> > > other caller does this.
> >
> > lookup_one_len() will explode if you call it for non-directory (==
> > !d_can_lookup(), i.e. something without ->lookup()). So unless the callers
> > do guarantee that check being true, it *is* needed.
>
> Both callers call fh_verify(.,.,S_IFDIR,.), so at this point we know
> that i_mode & S_IFMT == S_IFDIR. Is there some odd case where that's
> insufficient? If so, I think there may be bugs elsewhere in nfsd. If
> not, I'll add a note to the changelog.

First of all, such objects do exist; they probably won't be encountered by
nfsd and all instances I can think of are not writable, but...

> Thanks for reminding me to check this, I hadn't thought of that as an
> "is this a directory" check, it makes more sense now.

I'd have turned that into d_can_lookup(fhp->fh_dentry), actually.

2016-07-24 20:21:53

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/7] nfsd: remove redundant i_lookup check

On Sun, Jul 24, 2016 at 03:23:07PM +0100, Al Viro wrote:
> On Sun, Jul 24, 2016 at 08:10:14AM -0400, J. Bruce Fields wrote:
> > On Sun, Jul 24, 2016 at 01:22:06AM +0100, Al Viro wrote:
> > > On Fri, Jul 22, 2016 at 01:48:52PM -0400, J. Bruce Fields wrote:
> > > > From: "J. Bruce Fields" <[email protected]>
> > > >
> > > > I'm not sure why this was added. It doesn't seem necessary, and no
> > > > other caller does this.
> > >
> > > lookup_one_len() will explode if you call it for non-directory (==
> > > !d_can_lookup(), i.e. something without ->lookup()). So unless the callers
> > > do guarantee that check being true, it *is* needed.
> >
> > Both callers call fh_verify(.,.,S_IFDIR,.), so at this point we know
> > that i_mode & S_IFMT == S_IFDIR. Is there some odd case where that's
> > insufficient? If so, I think there may be bugs elsewhere in nfsd. If
> > not, I'll add a note to the changelog.
>
> First of all, such objects do exist; they probably won't be encountered by
> nfsd and all instances I can think of are not writable, but...
>
> > Thanks for reminding me to check this, I hadn't thought of that as an
> > "is this a directory" check, it makes more sense now.
>
> I'd have turned that into d_can_lookup(fhp->fh_dentry), actually.

So would such a check mainly just protect developers from themselves if
they try to make a weird filesystems exportable?

If we need to catch this I'd rather do it in fh_verify, which would
cover some other operations, too. Maybe like the below. We could be
nicer and WARN()/error out instead of BUG. But it's unclear to me
whether this case is worth checking for at all.

--b.

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 27250e279c37..372747a00214 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -59,14 +59,17 @@ static int nfsd_acceptable(void *expv, struct dentry *dentry)
* the write call).
*/
static inline __be32
-nfsd_mode_check(struct svc_rqst *rqstp, umode_t mode, umode_t requested)
+nfsd_mode_check(struct svc_rqst *rqstp, struct dentry *dentry,
+ umode_t requested)
{
- mode &= S_IFMT;
+ umode_t mode = d_inode(dentry)->i_mode & S_IFMT;

if (requested == 0) /* the caller doesn't care */
return nfs_ok;
- if (mode == requested)
+ if (mode == requested) {
+ BUG_ON(mode == S_IFDIR && !d_can_lookup(dentry));
return nfs_ok;
+ }
/*
* v4 has an error more specific than err_notdir which we should
* return in preference to err_notdir:
@@ -340,7 +343,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
if (error)
goto out;

- error = nfsd_mode_check(rqstp, d_inode(dentry)->i_mode, type);
+ error = nfsd_mode_check(rqstp, dentry, type);
if (error)
goto out;