2015-12-25 12:41:31

by Donald Buczek

[permalink] [raw]
Subject: [PATCH] nfs: do not deny execute access based on outdated mode in inode

This patch fixes a problem, that a nfs4 client incorrectly denies
execute access based on outdated file mode (missing 'x' bit).
After the mode on the server is 'fixed' (chmod +x) further execution
attempts continue to fail, because the nfs ACCESS call updates
the access parameter but not the mode parameter or the mode in
the inode.

The access check based on the file mode is not required, because
the server already verified the clients rights.

Remove the test.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109771
Signed-off-by: Donald Buczek <[email protected]>
---
fs/nfs/dir.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ce5a218..ffc25b0 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2481,9 +2481,6 @@ force_lookup:
res = PTR_ERR(cred);
}
out:
- if (!res && (mask & MAY_EXEC) && !execute_ok(inode))
- res = -EACCES;
-
dfprintk(VFS, "NFS: permission(%s/%lu), mask=0x%x, res=%d\n",
inode->i_sb->s_id, inode->i_ino, mask, res);
return res;
--
2.4.1



2015-12-26 18:36:21

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] nfs: do not deny execute access based on outdated mode in inode

On Fri, Dec 25, 2015 at 7:30 AM, Donald Buczek <[email protected]> wrote:
> This patch fixes a problem, that a nfs4 client incorrectly denies
> execute access based on outdated file mode (missing 'x' bit).
> After the mode on the server is 'fixed' (chmod +x) further execution
> attempts continue to fail, because the nfs ACCESS call updates
> the access parameter but not the mode parameter or the mode in
> the inode.
>
> The access check based on the file mode is not required, because
> the server already verified the clients rights.
>
> Remove the test.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109771
> Signed-off-by: Donald Buczek <[email protected]>
> ---
> fs/nfs/dir.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index ce5a218..ffc25b0 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -2481,9 +2481,6 @@ force_lookup:
> res = PTR_ERR(cred);
> }
> out:
> - if (!res && (mask & MAY_EXEC) && !execute_ok(inode))
> - res = -EACCES;
> -
> dfprintk(VFS, "NFS: permission(%s/%lu), mask=0x%x, res=%d\n",
> inode->i_sb->s_id, inode->i_ino, mask, res);
> return res;
>

My main question here is why the client isn't picking up the changed
mode bits here? All open() calls should be asking for the full set of
attributes as part of the OPEN COMPOUND rpc call.

Cheers
Trond

2015-12-26 23:58:34

by Donald Buczek

[permalink] [raw]
Subject: Re: [PATCH] nfs: do not deny execute access based on outdated mode in inode

On 26.12.2015 19:36, Trond Myklebust wrote:
> On Fri, Dec 25, 2015 at 7:30 AM, Donald Buczek <[email protected]> wrote:
>> This patch fixes a problem, that a nfs4 client incorrectly denies
>> execute access based on outdated file mode (missing 'x' bit).
>> After the mode on the server is 'fixed' (chmod +x) further execution
>> attempts continue to fail, because the nfs ACCESS call updates
>> the access parameter but not the mode parameter or the mode in
>> the inode.
>>
>> The access check based on the file mode is not required, because
>> the server already verified the clients rights.
>>
>> Remove the test.
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109771
>> Signed-off-by: Donald Buczek <[email protected]>
>> ---
>> fs/nfs/dir.c | 3 ---
>> 1 file changed, 3 deletions(-)
>>
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index ce5a218..ffc25b0 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -2481,9 +2481,6 @@ force_lookup:
>> res = PTR_ERR(cred);
>> }
>> out:
>> - if (!res && (mask & MAY_EXEC) && !execute_ok(inode))
>> - res = -EACCES;
>> -
>> dfprintk(VFS, "NFS: permission(%s/%lu), mask=0x%x, res=%d\n",
>> inode->i_sb->s_id, inode->i_ino, mask, res);
>> return res;
>>
> My main question here is why the client isn't picking up the changed
> mode bits here? All open() calls should be asking for the full set of
> attributes as part of the OPEN COMPOUND rpc call.
>
> Cheers
> Trond

Its from fs/namei.c do_last() :

> finish_open_created:
> error = may_open(&nd->path, acc_mode, open_flag);
> if (error)
> goto out;
>
> BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */
> error = vfs_open(&nd->path, file, current_cred());


may_open() -> inode_permission() -> __inode_permission() ->
do_inode_permission() -> inode->i_op->permission() -> nfs_permission()
first

vfs_open() -> do_dentry_open() -> inode->i_fop->open() ->
nfs4_file_open() later


Merry Christmas

Donald


--
Donald Buczek
[email protected]
Tel: +49 30 8413 1433


2015-12-27 00:11:17

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] nfs: do not deny execute access based on outdated mode in inode

On Sat, Dec 26, 2015 at 6:58 PM, Donald Buczek <[email protected]> wrote:
> On 26.12.2015 19:36, Trond Myklebust wrote:
>>
>> On Fri, Dec 25, 2015 at 7:30 AM, Donald Buczek <[email protected]>
>> wrote:
>>>
>>> This patch fixes a problem, that a nfs4 client incorrectly denies
>>> execute access based on outdated file mode (missing 'x' bit).
>>> After the mode on the server is 'fixed' (chmod +x) further execution
>>> attempts continue to fail, because the nfs ACCESS call updates
>>> the access parameter but not the mode parameter or the mode in
>>> the inode.
>>>
>>> The access check based on the file mode is not required, because
>>> the server already verified the clients rights.
>>>
>>> Remove the test.
>>>
>>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109771
>>> Signed-off-by: Donald Buczek <[email protected]>
>>> ---
>>> fs/nfs/dir.c | 3 ---
>>> 1 file changed, 3 deletions(-)
>>>
>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>>> index ce5a218..ffc25b0 100644
>>> --- a/fs/nfs/dir.c
>>> +++ b/fs/nfs/dir.c
>>> @@ -2481,9 +2481,6 @@ force_lookup:
>>> res = PTR_ERR(cred);
>>> }
>>> out:
>>> - if (!res && (mask & MAY_EXEC) && !execute_ok(inode))
>>> - res = -EACCES;
>>> -
>>> dfprintk(VFS, "NFS: permission(%s/%lu), mask=0x%x, res=%d\n",
>>> inode->i_sb->s_id, inode->i_ino, mask, res);
>>> return res;
>>>
>> My main question here is why the client isn't picking up the changed
>> mode bits here? All open() calls should be asking for the full set of
>> attributes as part of the OPEN COMPOUND rpc call.
>>
>> Cheers
>> Trond
>
>
> Its from fs/namei.c do_last() :
>
>> finish_open_created:
>> error = may_open(&nd->path, acc_mode, open_flag);
>> if (error)
>> goto out;
>>
>> BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */
>> error = vfs_open(&nd->path, file, current_cred());
>
>
>
> may_open() -> inode_permission() -> __inode_permission() ->
> do_inode_permission() -> inode->i_op->permission() -> nfs_permission()
> first
>
> vfs_open() -> do_dentry_open() -> inode->i_fop->open() -> nfs4_file_open()
> later

Ah... IOW: That so called "fast path" crap in do_last() is screwing us
over by forcing us to to 2 ACCESS calls. The first being done for no
good reason by the VFS, and the second being done in the OPEN call to
the server in the NFS client.

> Merry Christmas

Suggestions Al?

Trond

2015-12-27 00:38:42

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] nfs: do not deny execute access based on outdated mode in inode

On Sat, Dec 26, 2015 at 07:11:16PM -0500, Trond Myklebust wrote:

> Ah... IOW: That so called "fast path" crap in do_last() is screwing us
> over by forcing us to to 2 ACCESS calls. The first being done for no
> good reason by the VFS, and the second being done in the OPEN call to
> the server in the NFS client.

Excuse me, but this is complete BS. First of all, are you suggesting
that every filesystem out there must push the permission checks into
its ->open()? This may_open() is where they normally happen...

Moreover, the things like "this mount is r/o, don't let anything to
be opened on it" are inherently client-side. It's a property of
vfsmount; the same fs instance might be accessible r/w in another
place. And things like opening a device node, etc. are not reaching
the server at all.

> > Merry Christmas
>
> Suggestions Al?

Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
that things will be caught by server anyway?

2015-12-27 01:26:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] nfs: do not deny execute access based on outdated mode in inode

On Sat, Dec 26, 2015 at 7:38 PM, Al Viro <[email protected]> wrote:
> On Sat, Dec 26, 2015 at 07:11:16PM -0500, Trond Myklebust wrote:
>
>> Ah... IOW: That so called "fast path" crap in do_last() is screwing us
>> over by forcing us to to 2 ACCESS calls. The first being done for no
>> good reason by the VFS, and the second being done in the OPEN call to
>> the server in the NFS client.
>
> Excuse me, but this is complete BS. First of all, are you suggesting
> that every filesystem out there must push the permission checks into
> its ->open()? This may_open() is where they normally happen...
>
> Moreover, the things like "this mount is r/o, don't let anything to
> be opened on it" are inherently client-side. It's a property of
> vfsmount; the same fs instance might be accessible r/w in another
> place. And things like opening a device node, etc. are not reaching
> the server at all.

The may_open() is now happening before NFS gets a chance to issue the
OPEN rpc call, which is a change w.r.t. the lookup intent code. The
ordering is significant because it means the OPEN can no longer prime
the access cache.

>> > Merry Christmas
>>
>> Suggestions Al?
>
> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
> that things will be caught by server anyway?

That can work as long as we're guaranteed that everything that calls
inode_permission() with MAY_OPEN on a regular file will also follow up
with a vfs_open() or dentry_open() on success. Is this always the
case?

2015-12-27 02:28:35

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] nfs: do not deny execute access based on outdated mode in inode

On Sat, Dec 26, 2015 at 08:26:13PM -0500, Trond Myklebust wrote:

> The may_open() is now happening before NFS gets a chance to issue the
> OPEN rpc call, which is a change w.r.t. the lookup intent code. The
> ordering is significant because it means the OPEN can no longer prime
> the access cache.

Always had... Consider e.g. device nodes; there ->open() might very well
have side effects, and ones you do not want to allow for any random caller.
Permissions checks always had been done prior to ->open(), it's not something
new.

> >> > Merry Christmas
> >>
> >> Suggestions Al?
> >
> > Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
> > that things will be caught by server anyway?
>
> That can work as long as we're guaranteed that everything that calls
> inode_permission() with MAY_OPEN on a regular file will also follow up
> with a vfs_open() or dentry_open() on success. Is this always the
> case?

1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
it doesn't have ->tmpfile() instance anyway)

2) in atomic_open(), after the call of ->atomic_open() has succeeded.

3) in do_last(), followed on success by vfs_open()

That's all. All calls of inode_permission() that get MAY_OPEN come from
may_open(), and there's no other callers of that puppy.


PS: I'm not sure we want to carry that MAY_OPEN in op->acc_mode, actually.
may_open() gets acc_mode without MAY_OPEN only when called from do_last()
in case of O_PATH open. The check in the very beginning of may_open()
triggers only in that case and might as well be replaced with
if (likely(!(open_flag & O_PATH))) {
error = may_open(&nd->path, acc_mode, open_flag);
if (error)
goto out;
}
in that call site (one right after finish_open_created:). Then we could
bloody well have may_open() do
error = inode_permission(inode, acc_mode | MAY_OPEN);
and forget about MAY_OPEN in op->acc_mode.

Something like the patch below should be an equivalent transformation and with
that it's really clear what's going on with MAY_OPEN. Warning: completely
untested.

diff --git a/fs/exec.c b/fs/exec.c
index b06623a..828ec5f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -119,7 +119,7 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
int error = PTR_ERR(tmp);
static const struct open_flags uselib_flags = {
.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
- .acc_mode = MAY_READ | MAY_EXEC | MAY_OPEN,
+ .acc_mode = MAY_READ | MAY_EXEC,
.intent = LOOKUP_OPEN,
.lookup_flags = LOOKUP_FOLLOW,
};
@@ -763,7 +763,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
int err;
struct open_flags open_exec_flags = {
.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
- .acc_mode = MAY_EXEC | MAY_OPEN,
+ .acc_mode = MAY_EXEC,
.intent = LOOKUP_OPEN,
.lookup_flags = LOOKUP_FOLLOW,
};
diff --git a/fs/namei.c b/fs/namei.c
index 9e102ac..45c702e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2663,10 +2663,6 @@ static int may_open(struct path *path, int acc_mode, int flag)
struct inode *inode = dentry->d_inode;
int error;

- /* O_PATH? */
- if (!acc_mode)
- return 0;
-
if (!inode)
return -ENOENT;

@@ -2688,7 +2684,7 @@ static int may_open(struct path *path, int acc_mode, int flag)
break;
}

- error = inode_permission(inode, acc_mode);
+ error = inode_permission(inode, MAY_OPEN | acc_mode);
if (error)
return error;

@@ -2880,7 +2876,7 @@ static int atomic_open(struct nameidata *nd, struct dentry *dentry,
if (*opened & FILE_CREATED) {
WARN_ON(!(open_flag & O_CREAT));
fsnotify_create(dir, dentry);
- acc_mode = MAY_OPEN;
+ acc_mode = 0;
}
error = may_open(&file->f_path, acc_mode, open_flag);
if (error)
@@ -3093,7 +3089,7 @@ retry_lookup:
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
will_truncate = false;
- acc_mode = MAY_OPEN;
+ acc_mode = 0;
path_to_nameidata(&path, nd);
goto finish_open_created;
}
@@ -3177,10 +3173,11 @@ finish_open:
got_write = true;
}
finish_open_created:
- error = may_open(&nd->path, acc_mode, open_flag);
- if (error)
- goto out;
-
+ if (likely(!(open_flag & O_PATH))) {
+ error = may_open(&nd->path, acc_mode, open_flag);
+ if (error)
+ goto out;
+ }
BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */
error = vfs_open(&nd->path, file, current_cred());
if (!error) {
@@ -3267,7 +3264,7 @@ static int do_tmpfile(struct nameidata *nd, unsigned flags,
goto out2;
audit_inode(nd->name, child, 0);
/* Don't check for other permissions, the inode was just created */
- error = may_open(&path, MAY_OPEN, op->open_flag);
+ error = may_open(&path, 0, op->open_flag);
if (error)
goto out2;
file->f_path.mnt = path.mnt;
diff --git a/fs/open.c b/fs/open.c
index b6f1e96..b25b154 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -887,7 +887,7 @@ EXPORT_SYMBOL(dentry_open);
static inline int build_open_flags(int flags, umode_t mode, struct open_flags *op)
{
int lookup_flags = 0;
- int acc_mode;
+ int acc_mode = ACC_MODE(flags);

if (flags & (O_CREAT | __O_TMPFILE))
op->mode = (mode & S_IALLUGO) | S_IFREG;
@@ -909,7 +909,6 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
if (flags & __O_TMPFILE) {
if ((flags & O_TMPFILE_MASK) != O_TMPFILE)
return -EINVAL;
- acc_mode = MAY_OPEN | ACC_MODE(flags);
if (!(acc_mode & MAY_WRITE))
return -EINVAL;
} else if (flags & O_PATH) {
@@ -919,8 +918,6 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
*/
flags &= O_DIRECTORY | O_NOFOLLOW | O_PATH;
acc_mode = 0;
- } else {
- acc_mode = MAY_OPEN | ACC_MODE(flags);
}

op->open_flag = flags;

2015-12-27 02:54:44

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] nfs: do not deny execute access based on outdated mode in inode

On Sat, Dec 26, 2015 at 9:28 PM, Al Viro <[email protected]> wrote:
> On Sat, Dec 26, 2015 at 08:26:13PM -0500, Trond Myklebust wrote:
>
>> The may_open() is now happening before NFS gets a chance to issue the
>> OPEN rpc call, which is a change w.r.t. the lookup intent code. The
>> ordering is significant because it means the OPEN can no longer prime
>> the access cache.
>
> Always had... Consider e.g. device nodes; there ->open() might very well
> have side effects, and ones you do not want to allow for any random caller.
> Permissions checks always had been done prior to ->open(), it's not something
> new.

When we did lookup intents, the d_revalidate() would take a lookup
intent, which would trigger the OPEN call for NFS before we got
anywhere near permissions checking. As I said, the removal of lookup
intents makes that impossible, and hence now we have the inefficiency.

>
>> >> > Merry Christmas
>> >>
>> >> Suggestions Al?
>> >
>> > Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
>> > that things will be caught by server anyway?
>>
>> That can work as long as we're guaranteed that everything that calls
>> inode_permission() with MAY_OPEN on a regular file will also follow up
>> with a vfs_open() or dentry_open() on success. Is this always the
>> case?
>
> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
> it doesn't have ->tmpfile() instance anyway)
>
> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.

Right, so we don't care there either.

> 3) in do_last(), followed on success by vfs_open()
>
> That's all. All calls of inode_permission() that get MAY_OPEN come from
> may_open(), and there's no other callers of that puppy.

OK.

> PS: I'm not sure we want to carry that MAY_OPEN in op->acc_mode, actually.
> may_open() gets acc_mode without MAY_OPEN only when called from do_last()
> in case of O_PATH open. The check in the very beginning of may_open()
> triggers only in that case and might as well be replaced with
> if (likely(!(open_flag & O_PATH))) {
> error = may_open(&nd->path, acc_mode, open_flag);
> if (error)
> goto out;
> }
> in that call site (one right after finish_open_created:). Then we could
> bloody well have may_open() do
> error = inode_permission(inode, acc_mode | MAY_OPEN);
> and forget about MAY_OPEN in op->acc_mode.
>
> Something like the patch below should be an equivalent transformation and with
> that it's really clear what's going on with MAY_OPEN. Warning: completely
> untested.

That looks right and would indeed make it easier to trace. I'll push
the changes to nfs_permission()

Thanks!
Trond

2015-12-27 03:07:13

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH] NFSv4: Don't perform cached access checks before we've OPENed the file

Donald Buczek reports that a nfs4 client incorrectly denies
execute access based on outdated file mode (missing 'x' bit).
After the mode on the server is 'fixed' (chmod +x) further execution
attempts continue to fail, because the nfs ACCESS call updates
the access parameter but not the mode parameter or the mode in
the inode.

The root cause is ultimately that the VFS is calling may_open()
before the NFS client has a chance to OPEN the file and hence revalidate
the access and attribute caches.

Al Viro suggests:
>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
>>> that things will be caught by server anyway?
>>
>> That can work as long as we're guaranteed that everything that calls
>> inode_permission() with MAY_OPEN on a regular file will also follow up
>> with a vfs_open() or dentry_open() on success. Is this always the
>> case?
>
> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
> it doesn't have ->tmpfile() instance anyway)
>
> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>
> 3) in do_last(), followed on success by vfs_open()
>
> That's all. All calls of inode_permission() that get MAY_OPEN come from
> may_open(), and there's no other callers of that puppy.

Reported-by: Donald Buczek <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
Link: http://lkml.kernel.org/r/[email protected]
Cc: Al Viro <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
Hi Donald,
Can you check if this fixes the issue for you?

fs/nfs/dir.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ce5a21861074..44e519c21e18 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2449,6 +2449,9 @@ int nfs_permission(struct inode *inode, int mask)
case S_IFLNK:
goto out;
case S_IFREG:
+ if ((mask & MAY_OPEN) &&
+ nfs_server_capable(inode, NFS_CAP_ATOMIC_OPEN))
+ return 0;
break;
case S_IFDIR:
/*
--
2.5.0


2015-12-27 12:18:53

by Donald Buczek

[permalink] [raw]
Subject: Re: [PATCH] NFSv4: Don't perform cached access checks before we've OPENed the file



On 27.12.2015 04:06, Trond Myklebust wrote:
> Donald Buczek reports that a nfs4 client incorrectly denies
> execute access based on outdated file mode (missing 'x' bit).
> After the mode on the server is 'fixed' (chmod +x) further execution
> attempts continue to fail, because the nfs ACCESS call updates
> the access parameter but not the mode parameter or the mode in
> the inode.
>
> The root cause is ultimately that the VFS is calling may_open()
> before the NFS client has a chance to OPEN the file and hence revalidate
> the access and attribute caches.
>
> Al Viro suggests:
>>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
>>>> that things will be caught by server anyway?
>>> That can work as long as we're guaranteed that everything that calls
>>> inode_permission() with MAY_OPEN on a regular file will also follow up
>>> with a vfs_open() or dentry_open() on success. Is this always the
>>> case?
>> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
>> it doesn't have ->tmpfile() instance anyway)
>>
>> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>>
>> 3) in do_last(), followed on success by vfs_open()
>>
>> That's all. All calls of inode_permission() that get MAY_OPEN come from
>> may_open(), and there's no other callers of that puppy.
> Reported-by: Donald Buczek <[email protected]>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
> Link: http://lkml.kernel.org/r/[email protected]
> Cc: Al Viro <[email protected]>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> Hi Donald,
> Can you check if this fixes the issue for you?
>
> fs/nfs/dir.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index ce5a21861074..44e519c21e18 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -2449,6 +2449,9 @@ int nfs_permission(struct inode *inode, int mask)
> case S_IFLNK:
> goto out;
> case S_IFREG:
> + if ((mask & MAY_OPEN) &&
> + nfs_server_capable(inode, NFS_CAP_ATOMIC_OPEN))
> + return 0;
> break;
> case S_IFDIR:
> /*


I can confirm that this fixes the original issue. However, even with
this patch, calls to the access syscall would continue to deliver
failure based on obsolete modes forever. This can be seen as a bug, too.

PS:

I don't yet understand the point of execute_ok. It doesn't even consider
the uid.

Apart from that two suggestions to consider:

* If we go over the server for ACCESS anyway, we could combine it
with a GETATTR compound operation. Then we would be ready for additional
client-side checks against the inode.
* If we look at the mode, we should validate it first (
nfs_revalidate_inode ? )

Regards
Donald


2015-12-27 16:23:56

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] NFSv4: Don't perform cached access checks before we've OPENed the file

On Sun, Dec 27, 2015 at 7:18 AM, Donald Buczek <[email protected]> wrote:
>
>
> On 27.12.2015 04:06, Trond Myklebust wrote:
>>
>> Donald Buczek reports that a nfs4 client incorrectly denies
>> execute access based on outdated file mode (missing 'x' bit).
>> After the mode on the server is 'fixed' (chmod +x) further execution
>> attempts continue to fail, because the nfs ACCESS call updates
>> the access parameter but not the mode parameter or the mode in
>> the inode.
>>
>> The root cause is ultimately that the VFS is calling may_open()
>> before the NFS client has a chance to OPEN the file and hence revalidate
>> the access and attribute caches.
>>
>> Al Viro suggests:
>>>>>
>>>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you
>>>>> know
>>>>> that things will be caught by server anyway?
>>>>
>>>> That can work as long as we're guaranteed that everything that calls
>>>> inode_permission() with MAY_OPEN on a regular file will also follow up
>>>> with a vfs_open() or dentry_open() on success. Is this always the
>>>> case?
>>>
>>> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS
>>> since
>>> it doesn't have ->tmpfile() instance anyway)
>>>
>>> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>>>
>>> 3) in do_last(), followed on success by vfs_open()
>>>
>>> That's all. All calls of inode_permission() that get MAY_OPEN come from
>>> may_open(), and there's no other callers of that puppy.
>>
>> Reported-by: Donald Buczek <[email protected]>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
>> Link:
>> http://lkml.kernel.org/r/[email protected]
>> Cc: Al Viro <[email protected]>
>> Signed-off-by: Trond Myklebust <[email protected]>
>> ---
>> Hi Donald,
>> Can you check if this fixes the issue for you?
>>
>> fs/nfs/dir.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index ce5a21861074..44e519c21e18 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -2449,6 +2449,9 @@ int nfs_permission(struct inode *inode, int mask)
>> case S_IFLNK:
>> goto out;
>> case S_IFREG:
>> + if ((mask & MAY_OPEN) &&
>> + nfs_server_capable(inode, NFS_CAP_ATOMIC_OPEN))
>> + return 0;
>> break;
>> case S_IFDIR:
>> /*
>
>
>
> I can confirm that this fixes the original issue. However, even with this
> patch, calls to the access syscall would continue to deliver failure based
> on obsolete modes forever. This can be seen as a bug, too.

No. What happens now is that the OPEN compound executes before any
ACCESS calls, and so it refreshes the inode attributes and the access
cache.

> PS:
>
> I don't yet understand the point of execute_ok. It doesn't even consider the
> uid.

...or the group ownership or anything other than whether or not at
least one execute bit is set. That's a convention that was set in the
VFS a long time ago, and that Miklos' patches later pushed down into
the filesystems.
I'm OK with removing it, if someone can explain to me what it was
intended to enforce in the first place, so that we can have a
discussion about why it may be obsolete.

> Apart from that two suggestions to consider:
>
> * If we go over the server for ACCESS anyway, we could combine it with a
> GETATTR compound operation. Then we would be ready for additional
> client-side checks against the inode.
>
> * If we look at the mode, we should validate it first (
> nfs_revalidate_inode ? )

For regular files, we now only go to the server for ACCESS if we're
holding a delegation, in which case we already know the values for the
attributes; the client is authoritative in that case.

Cheers
Trond

2015-12-27 17:57:20

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] NFSv4: Don't perform cached access checks before we've OPENed the file

On Sun, Dec 27, 2015 at 11:23:55AM -0500, Trond Myklebust wrote:
> > PS:
> >
> > I don't yet understand the point of execute_ok. It doesn't even consider the
> > uid.
>
> ...or the group ownership or anything other than whether or not at
> least one execute bit is set. That's a convention that was set in the
> VFS a long time ago,

... by yourself, if you recall the patch that moved that check from
open_exec() to permission(), to get consistency between access() and
execve().

> and that Miklos' patches later pushed down into
> the filesystems.
> I'm OK with removing it, if someone can explain to me what it was
> intended to enforce in the first place, so that we can have a
> discussion about why it may be obsolete.

"Not even root gets to execute a binary that doesn't have a single exec bit
on it" goes _way_ back. And not just in terms of Linux -
v5 /usr/sys/ken/fio.c:access() has
if(u.u_uid == 0) {
if(m == IEXEC && (ip->i_mode &
(IEXEC | (IEXEC>>3) | (IEXEC>>6))) == 0)
return(1);
return(0);
}
so this had been introduced somewhere between v3 and v5 (AFAIK, v4 source
is gone and I hadn't crawled through the v4 manpages to see if that has
got a mention). At the very least it's been there since Nov 26 1974...

2015-12-28 19:39:13

by Donald Buczek

[permalink] [raw]
Subject: [PATCH] nfs: revalidate inode before access checks

On 27.12.2015 17:23, Trond Myklebust wrote:
> On Sun, Dec 27, 2015 at 7:18 AM, Donald Buczek <[email protected]> wrote:
>>
>> On 27.12.2015 04:06, Trond Myklebust wrote:
>>> Donald Buczek reports that a nfs4 client incorrectly denies
>>> execute access based on outdated file mode (missing 'x' bit).
>>> After the mode on the server is 'fixed' (chmod +x) further execution
>>> attempts continue to fail, because the nfs ACCESS call updates
>>> the access parameter but not the mode parameter or the mode in
>>> the inode.
>>>
>>> The root cause is ultimately that the VFS is calling may_open()
>>> before the NFS client has a chance to OPEN the file and hence revalidate
>>> the access and attribute caches.
>>>
>>> Al Viro suggests:
>>>>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you
>>>>>> know
>>>>>> that things will be caught by server anyway?
>>>>> That can work as long as we're guaranteed that everything that calls
>>>>> inode_permission() with MAY_OPEN on a regular file will also follow up
>>>>> with a vfs_open() or dentry_open() on success. Is this always the
>>>>> case?
>>>> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS
>>>> since
>>>> it doesn't have ->tmpfile() instance anyway)
>>>>
>>>> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>>>>
>>>> 3) in do_last(), followed on success by vfs_open()
>>>>
>>>> That's all. All calls of inode_permission() that get MAY_OPEN come from
>>>> may_open(), and there's no other callers of that puppy.
>>> Reported-by: Donald Buczek <[email protected]>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
>>> Link:
>>> http://lkml.kernel.org/r/[email protected]
>>> Cc: Al Viro <[email protected]>
>>> Signed-off-by: Trond Myklebust <[email protected]>
>>> ---
>>> Hi Donald,
>>> Can you check if this fixes the issue for you?
>>>
>>> fs/nfs/dir.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>>> index ce5a21861074..44e519c21e18 100644
>>> --- a/fs/nfs/dir.c
>>> +++ b/fs/nfs/dir.c
>>> @@ -2449,6 +2449,9 @@ int nfs_permission(struct inode *inode, int mask)
>>> case S_IFLNK:
>>> goto out;
>>> case S_IFREG:
>>> + if ((mask & MAY_OPEN) &&
>>> + nfs_server_capable(inode, NFS_CAP_ATOMIC_OPEN))
>>> + return 0;
>>> break;
>>> case S_IFDIR:
>>> /*
>>
>>
>> I can confirm that this fixes the original issue. However, even with this
>> patch, calls to the access syscall would continue to deliver failure based
>> on obsolete modes forever. This can be seen as a bug, too.
> No. What happens now is that the OPEN compound executes before any
> ACCESS calls, and so it refreshes the inode attributes and the access
> cache.

No? If we come here by access() instead of execve(), we don't have MAY_OPEN but MAY_ACCESS. So we are going the old route, and permission is denied. An open will never be attempted. Of course, I've tested that before making the claim, that the problem continues to exist with access() instead of execve(). But perhaps you are referring to something else?

I think, it has to be decided, whether the ancient rule "Not even root gets to execute a binary that doesn't have a single exec bit
on it" should be enforced nowadays or not. How to decide that?

If no, the check can just be removed (from other filesystems, too)
If yes, then we want to deny execute access based on the client visible mode. In that case we'd have to ensure that the cache is properly aged out. I append the trivial patch here which worked for me.

Cheers
Donald

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109771
Signed-off-by: Donald Buczek <[email protected]>


---
fs/nfs/dir.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ce5a218..60d42b4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2439,6 +2439,8 @@ int nfs_permission(struct inode *inode, int mask)

nfs_inc_stats(inode, NFSIOS_VFSACCESS);

+ nfs_revalidate_inode(NFS_SERVER(inode),inode);
+
if ((mask & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
goto out;
/* Is this sys_access() ? */
--
2.4.1


2015-12-28 21:10:13

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] nfs: revalidate inode before access checks

On Mon, Dec 28, 2015 at 2:38 PM, Donald Buczek <[email protected]> wrote:
> On 27.12.2015 17:23, Trond Myklebust wrote:
>> On Sun, Dec 27, 2015 at 7:18 AM, Donald Buczek <[email protected]> wrote:
>>>
>>> On 27.12.2015 04:06, Trond Myklebust wrote:
>>>> Donald Buczek reports that a nfs4 client incorrectly denies
>>>> execute access based on outdated file mode (missing 'x' bit).
>>>> After the mode on the server is 'fixed' (chmod +x) further execution
>>>> attempts continue to fail, because the nfs ACCESS call updates
>>>> the access parameter but not the mode parameter or the mode in
>>>> the inode.
>>>>
>>>> The root cause is ultimately that the VFS is calling may_open()
>>>> before the NFS client has a chance to OPEN the file and hence revalidate
>>>> the access and attribute caches.
>>>>
>>>> Al Viro suggests:
>>>>>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you
>>>>>>> know
>>>>>>> that things will be caught by server anyway?
>>>>>> That can work as long as we're guaranteed that everything that calls
>>>>>> inode_permission() with MAY_OPEN on a regular file will also follow up
>>>>>> with a vfs_open() or dentry_open() on success. Is this always the
>>>>>> case?
>>>>> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS
>>>>> since
>>>>> it doesn't have ->tmpfile() instance anyway)
>>>>>
>>>>> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>>>>>
>>>>> 3) in do_last(), followed on success by vfs_open()
>>>>>
>>>>> That's all. All calls of inode_permission() that get MAY_OPEN come from
>>>>> may_open(), and there's no other callers of that puppy.
>>>> Reported-by: Donald Buczek <[email protected]>
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
>>>> Link:
>>>> http://lkml.kernel.org/r/[email protected]
>>>> Cc: Al Viro <[email protected]>
>>>> Signed-off-by: Trond Myklebust <[email protected]>
>>>> ---
>>>> Hi Donald,
>>>> Can you check if this fixes the issue for you?
>>>>
>>>> fs/nfs/dir.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>>>> index ce5a21861074..44e519c21e18 100644
>>>> --- a/fs/nfs/dir.c
>>>> +++ b/fs/nfs/dir.c
>>>> @@ -2449,6 +2449,9 @@ int nfs_permission(struct inode *inode, int mask)
>>>> case S_IFLNK:
>>>> goto out;
>>>> case S_IFREG:
>>>> + if ((mask & MAY_OPEN) &&
>>>> + nfs_server_capable(inode, NFS_CAP_ATOMIC_OPEN))
>>>> + return 0;
>>>> break;
>>>> case S_IFDIR:
>>>> /*
>>>
>>>
>>> I can confirm that this fixes the original issue. However, even with this
>>> patch, calls to the access syscall would continue to deliver failure based
>>> on obsolete modes forever. This can be seen as a bug, too.
>> No. What happens now is that the OPEN compound executes before any
>> ACCESS calls, and so it refreshes the inode attributes and the access
>> cache.
>
> No? If we come here by access() instead of execve(), we don't have MAY_OPEN but MAY_ACCESS. So we are going the old route, and permission is denied. An open will never be attempted. Of course, I've tested that before making the claim, that the problem continues to exist with access() instead of execve(). But perhaps you are referring to something else?
>
> I think, it has to be decided, whether the ancient rule "Not even root gets to execute a binary that doesn't have a single exec bit
> on it" should be enforced nowadays or not. How to decide that?
>
> If no, the check can just be removed (from other filesystems, too)
> If yes, then we want to deny execute access based on the client visible mode. In that case we'd have to ensure that the cache is properly aged out. I append the trivial patch here which worked for me.
>
> Cheers
> Donald
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109771
> Signed-off-by: Donald Buczek <[email protected]>
>
>
> ---
> fs/nfs/dir.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index ce5a218..60d42b4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -2439,6 +2439,8 @@ int nfs_permission(struct inode *inode, int mask)
>
> nfs_inc_stats(inode, NFSIOS_VFSACCESS);
>
> + nfs_revalidate_inode(NFS_SERVER(inode),inode);
> +
> if ((mask & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
> goto out;
> /* Is this sys_access() ? */
> --
> 2.4.1
>

Sorry, but that patch is not at all acceptable. It does not respect
the MAY_NOT_BLOCK flag, and so can end up deadlocking your system. It
also adds further unwanted slowness to open().

To fix access(), chdir() and their ilk, I suggest adding a helper
nfs_execute_ok() that does the right thing depending on the value of
MAY_NOT_BLOCK. Something like:

static int nfs_execute_ok(struct inode *inode, int mask)
{
struct nfs_server *server = NFS_SERVER(inode);
int ret;

if (mask & MAY_NOT_BLOCK)
ret = nfs_revalidate_inode_rcu(server, inode);
else
ret = nfs_revalidate_inode(server, inode);
if (ret == 0 && !execute_ok(inode))
ret = -EACCES;
return ret;
}

However for the open() fast path, optimising out the entire shebang is
the right thing to do if we can.

2015-12-29 00:41:21

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH] NFS: Ensure we revalidate attributes before using execute_ok()

Donald Buczek reports that NFS clients can also report incorrect
results for access() due to lack of revalidation of attributes
before calling execute_ok().
Looking closely, it seems chdir() is afflicted with the same problem.

Fix is to ensure we call nfs_revalidate_inode_rcu() or
nfs_revalidate_inode() as appropriate before deciding to trust
execute_ok().

Reported-by: Donald Buczek <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/dir.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 44e519c21e18..5bd2f5bfaf57 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2432,6 +2432,20 @@ int nfs_may_open(struct inode *inode, struct rpc_cred *cred, int openflags)
}
EXPORT_SYMBOL_GPL(nfs_may_open);

+static int nfs_execute_ok(struct inode *inode, int mask)
+{
+ struct nfs_server *server = NFS_SERVER(inode);
+ int ret;
+
+ if (mask & MAY_NOT_BLOCK)
+ ret = nfs_revalidate_inode_rcu(server, inode);
+ else
+ ret = nfs_revalidate_inode(server, inode);
+ if (ret == 0 && !execute_ok(inode))
+ ret = -EACCES;
+ return ret;
+}
+
int nfs_permission(struct inode *inode, int mask)
{
struct rpc_cred *cred;
@@ -2484,8 +2498,8 @@ force_lookup:
res = PTR_ERR(cred);
}
out:
- if (!res && (mask & MAY_EXEC) && !execute_ok(inode))
- res = -EACCES;
+ if (!res && (mask & MAY_EXEC))
+ res = nfs_execute_ok(inode, mask);

dfprintk(VFS, "NFS: permission(%s/%lu), mask=0x%x, res=%d\n",
inode->i_sb->s_id, inode->i_ino, mask, res);
--
2.5.0


2015-12-29 19:51:03

by Donald Buczek

[permalink] [raw]
Subject: Re: [PATCH] NFS: Ensure we revalidate attributes before using execute_ok()

On 29.12.2015 01:40, Trond Myklebust wrote:
> Donald Buczek reports that NFS clients can also report incorrect
> results for access() due to lack of revalidation of attributes
> before calling execute_ok().
> Looking closely, it seems chdir() is afflicted with the same problem.
>
> Fix is to ensure we call nfs_revalidate_inode_rcu() or
> nfs_revalidate_inode() as appropriate before deciding to trust
> execute_ok().
>
> Reported-by: Donald Buczek <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/nfs/dir.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 44e519c21e18..5bd2f5bfaf57 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -2432,6 +2432,20 @@ int nfs_may_open(struct inode *inode, struct rpc_cred *cred, int openflags)
> }
> EXPORT_SYMBOL_GPL(nfs_may_open);
>
> +static int nfs_execute_ok(struct inode *inode, int mask)
> +{
> + struct nfs_server *server = NFS_SERVER(inode);
> + int ret;
> +
> + if (mask & MAY_NOT_BLOCK)
> + ret = nfs_revalidate_inode_rcu(server, inode);
> + else
> + ret = nfs_revalidate_inode(server, inode);
> + if (ret == 0 && !execute_ok(inode))
> + ret = -EACCES;
> + return ret;
> +}
> +
> int nfs_permission(struct inode *inode, int mask)
> {
> struct rpc_cred *cred;
> @@ -2484,8 +2498,8 @@ force_lookup:
> res = PTR_ERR(cred);
> }
> out:
> - if (!res && (mask & MAY_EXEC) && !execute_ok(inode))
> - res = -EACCES;
> + if (!res && (mask & MAY_EXEC))
> + res = nfs_execute_ok(inode, mask);
>
> dfprintk(VFS, "NFS: permission(%s/%lu), mask=0x%x, res=%d\n",
> inode->i_sb->s_id, inode->i_ino, mask, res);

This patch doesn't resolve the problem. The reason is, that there is a
nfs_do_access() before this nfs_execute_ok() in the execution path of
nfs_permission. While nfs4_proc_acccess doesn't update the mode, it
does update read_cache_jiffies. So the later nfs_revalidate_inode will
be a noop, the cache was just made to look fresh.

If nfs_revalidate_inode would be called before the nfs_do_access it
might work. I fact it would make some sense to move it before the switch
based on inode->i_mode, because the mode might change on the server, too.

Regards
Donald

PS: Sorry for my faulty patch! What a shame :(


2015-12-29 20:18:46

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] NFS: Ensure we revalidate attributes before using execute_ok()

On Tue, Dec 29, 2015 at 2:51 PM, Donald Buczek <[email protected]> wrote:
>
> On 29.12.2015 01:40, Trond Myklebust wrote:
>>
>> Donald Buczek reports that NFS clients can also report incorrect
>> results for access() due to lack of revalidation of attributes
>> before calling execute_ok().
>> Looking closely, it seems chdir() is afflicted with the same problem.
>>
>> Fix is to ensure we call nfs_revalidate_inode_rcu() or
>> nfs_revalidate_inode() as appropriate before deciding to trust
>> execute_ok().
>>
>> Reported-by: Donald Buczek <[email protected]>
>> Link: http://lkml.kernel.org/r/[email protected]
>> Signed-off-by: Trond Myklebust <[email protected]>
>> ---
>> fs/nfs/dir.c | 18 ++++++++++++++++--
>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index 44e519c21e18..5bd2f5bfaf57 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -2432,6 +2432,20 @@ int nfs_may_open(struct inode *inode, struct rpc_cred *cred, int openflags)
>> }
>> EXPORT_SYMBOL_GPL(nfs_may_open);
>> +static int nfs_execute_ok(struct inode *inode, int mask)
>> +{
>> + struct nfs_server *server = NFS_SERVER(inode);
>> + int ret;
>> +
>> + if (mask & MAY_NOT_BLOCK)
>> + ret = nfs_revalidate_inode_rcu(server, inode);
>> + else
>> + ret = nfs_revalidate_inode(server, inode);
>> + if (ret == 0 && !execute_ok(inode))
>> + ret = -EACCES;
>> + return ret;
>> +}
>> +
>> int nfs_permission(struct inode *inode, int mask)
>> {
>> struct rpc_cred *cred;
>> @@ -2484,8 +2498,8 @@ force_lookup:
>> res = PTR_ERR(cred);
>> }
>> out:
>> - if (!res && (mask & MAY_EXEC) && !execute_ok(inode))
>> - res = -EACCES;
>> + if (!res && (mask & MAY_EXEC))
>> + res = nfs_execute_ok(inode, mask);
>> dfprintk(VFS, "NFS: permission(%s/%lu), mask=0x%x, res=%d\n",
>> inode->i_sb->s_id, inode->i_ino, mask, res);
>
>
> This patch doesn't resolve the problem. The reason is, that there is a nfs_do_access() before this nfs_execute_ok() in the execution path of nfs_permission. While nfs4_proc_acccess doesn't update the mode, it does update read_cache_jiffies. So the later nfs_revalidate_inode will be a noop, the cache was just made to look fresh.
>
> If nfs_revalidate_inode would be called before the nfs_do_access it might work. I fact it would make some sense to move it before the switch based on inode->i_mode, because the mode might change on the server, too.

We shouldn't be marking the inode attributes as valid if the change
attribute was modified, but we didn't get a full set of attributes.
Let me take a look at that.

> PS: Sorry for my faulty patch! What a shame :(

Not a problem. :-)

2015-12-30 00:03:21

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH] NFS: Fix attribute cache revalidation

If a NFSv4 client uses the cache_consistency_bitmask in order to
request only information about the change attribute, timestamps and
size, then it has not revalidated all attributes, and hence the
attribute timeout timestamp should not be updated.

Reported-by: Donald Buczek <[email protected]>
Cc: [email protected]
Signed-off-by: Trond Myklebust <[email protected]>
---
Hi Donald,

Can you please apply this on top of the other 2 patches and see if it
fixes your access() testcase?

Thanks!


fs/nfs/inode.c | 51 +++++++++++++++++++++++++++++++++++++--------------
1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index c7e8b87da5b2..7431feeb16f1 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1641,6 +1641,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
unsigned long invalid = 0;
unsigned long now = jiffies;
unsigned long save_cache_validity;
+ bool cache_revalidated = true;

dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
__func__, inode->i_sb->s_id, inode->i_ino,
@@ -1702,22 +1703,28 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
nfs_force_lookup_revalidate(inode);
inode->i_version = fattr->change_attr;
}
- } else
+ } else {
nfsi->cache_validity |= save_cache_validity;
+ cache_revalidated = false;
+ }

if (fattr->valid & NFS_ATTR_FATTR_MTIME) {
memcpy(&inode->i_mtime, &fattr->mtime, sizeof(inode->i_mtime));
- } else if (server->caps & NFS_CAP_MTIME)
+ } else if (server->caps & NFS_CAP_MTIME) {
nfsi->cache_validity |= save_cache_validity &
(NFS_INO_INVALID_ATTR
| NFS_INO_REVAL_FORCED);
+ cache_revalidated = false;
+ }

if (fattr->valid & NFS_ATTR_FATTR_CTIME) {
memcpy(&inode->i_ctime, &fattr->ctime, sizeof(inode->i_ctime));
- } else if (server->caps & NFS_CAP_CTIME)
+ } else if (server->caps & NFS_CAP_CTIME) {
nfsi->cache_validity |= save_cache_validity &
(NFS_INO_INVALID_ATTR
| NFS_INO_REVAL_FORCED);
+ cache_revalidated = false;
+ }

/* Check if our cached file size is stale */
if (fattr->valid & NFS_ATTR_FATTR_SIZE) {
@@ -1737,19 +1744,23 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
(long long)cur_isize,
(long long)new_isize);
}
- } else
+ } else {
nfsi->cache_validity |= save_cache_validity &
(NFS_INO_INVALID_ATTR
| NFS_INO_REVAL_PAGECACHE
| NFS_INO_REVAL_FORCED);
+ cache_revalidated = false;
+ }


if (fattr->valid & NFS_ATTR_FATTR_ATIME)
memcpy(&inode->i_atime, &fattr->atime, sizeof(inode->i_atime));
- else if (server->caps & NFS_CAP_ATIME)
+ else if (server->caps & NFS_CAP_ATIME) {
nfsi->cache_validity |= save_cache_validity &
(NFS_INO_INVALID_ATIME
| NFS_INO_REVAL_FORCED);
+ cache_revalidated = false;
+ }

if (fattr->valid & NFS_ATTR_FATTR_MODE) {
if ((inode->i_mode & S_IALLUGO) != (fattr->mode & S_IALLUGO)) {
@@ -1758,36 +1769,42 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
inode->i_mode = newmode;
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
}
- } else if (server->caps & NFS_CAP_MODE)
+ } else if (server->caps & NFS_CAP_MODE) {
nfsi->cache_validity |= save_cache_validity &
(NFS_INO_INVALID_ATTR
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
| NFS_INO_REVAL_FORCED);
+ cache_revalidated = false;
+ }

if (fattr->valid & NFS_ATTR_FATTR_OWNER) {
if (!uid_eq(inode->i_uid, fattr->uid)) {
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
inode->i_uid = fattr->uid;
}
- } else if (server->caps & NFS_CAP_OWNER)
+ } else if (server->caps & NFS_CAP_OWNER) {
nfsi->cache_validity |= save_cache_validity &
(NFS_INO_INVALID_ATTR
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
| NFS_INO_REVAL_FORCED);
+ cache_revalidated = false;
+ }

if (fattr->valid & NFS_ATTR_FATTR_GROUP) {
if (!gid_eq(inode->i_gid, fattr->gid)) {
invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
inode->i_gid = fattr->gid;
}
- } else if (server->caps & NFS_CAP_OWNER_GROUP)
+ } else if (server->caps & NFS_CAP_OWNER_GROUP) {
nfsi->cache_validity |= save_cache_validity &
(NFS_INO_INVALID_ATTR
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
| NFS_INO_REVAL_FORCED);
+ cache_revalidated = false;
+ }

if (fattr->valid & NFS_ATTR_FATTR_NLINK) {
if (inode->i_nlink != fattr->nlink) {
@@ -1796,19 +1813,22 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
invalid |= NFS_INO_INVALID_DATA;
set_nlink(inode, fattr->nlink);
}
- } else if (server->caps & NFS_CAP_NLINK)
+ } else if (server->caps & NFS_CAP_NLINK) {
nfsi->cache_validity |= save_cache_validity &
(NFS_INO_INVALID_ATTR
| NFS_INO_REVAL_FORCED);
+ cache_revalidated = false;
+ }

if (fattr->valid & NFS_ATTR_FATTR_SPACE_USED) {
/*
* report the blocks in 512byte units
*/
inode->i_blocks = nfs_calc_block_size(fattr->du.nfs3.used);
- }
- if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)
+ } else if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)
inode->i_blocks = fattr->du.nfs2.blocks;
+ else
+ cache_revalidated = false;

/* Update attrtimeo value if we're out of the unstable period */
if (invalid & NFS_INO_INVALID_ATTR) {
@@ -1818,8 +1838,11 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
/* Set barrier to be more recent than all outstanding updates */
nfsi->attr_gencount = nfs_inc_attr_generation_counter();
} else {
- if (!time_in_range_open(now, nfsi->attrtimeo_timestamp, nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
- if ((nfsi->attrtimeo <<= 1) > NFS_MAXATTRTIMEO(inode))
+ if (cache_revalidated &&
+ !time_in_range_open(now, nfsi->attrtimeo_timestamp,
+ nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
+ nfsi->attrtimeo <<= 1;
+ if (nfsi->attrtimeo > NFS_MAXATTRTIMEO(inode))
nfsi->attrtimeo = NFS_MAXATTRTIMEO(inode);
nfsi->attrtimeo_timestamp = now;
}
@@ -1829,7 +1852,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
}

/* Don't declare attrcache up to date if there were no attrs! */
- if (fattr->valid != 0)
+ if (cache_revalidated)
invalid &= ~NFS_INO_INVALID_ATTR;

/* Don't invalidate the data if we were to blame */
--
2.5.0


2015-12-30 11:23:10

by Donald Buczek

[permalink] [raw]
Subject: Re: [PATCH] NFS: Fix attribute cache revalidation

On 30.12.2015 01:02, Trond Myklebust wrote:
> If a NFSv4 client uses the cache_consistency_bitmask in order to
> request only information about the change attribute, timestamps and
> size, then it has not revalidated all attributes, and hence the
> attribute timeout timestamp should not be updated.
>
> Reported-by: Donald Buczek <[email protected]>
> Cc: [email protected]
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> Hi Donald,
>
> Can you please apply this on top of the other 2 patches and see if it
> fixes your access() testcase?
>
> Thanks!
>
>
> fs/nfs/inode.c | 51 +++++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 37 insertions(+), 14 deletions(-)
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index c7e8b87da5b2..7431feeb16f1 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1641,6 +1641,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> unsigned long invalid = 0;
> unsigned long now = jiffies;
> unsigned long save_cache_validity;
> + bool cache_revalidated = true;
>
> dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
> __func__, inode->i_sb->s_id, inode->i_ino,
> @@ -1702,22 +1703,28 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> nfs_force_lookup_revalidate(inode);
> inode->i_version = fattr->change_attr;
> }
> - } else
> + } else {
> nfsi->cache_validity |= save_cache_validity;
> + cache_revalidated = false;
> + }
>
> if (fattr->valid & NFS_ATTR_FATTR_MTIME) {
> memcpy(&inode->i_mtime, &fattr->mtime, sizeof(inode->i_mtime));
> - } else if (server->caps & NFS_CAP_MTIME)
> + } else if (server->caps & NFS_CAP_MTIME) {
> nfsi->cache_validity |= save_cache_validity &
> (NFS_INO_INVALID_ATTR
> | NFS_INO_REVAL_FORCED);
> + cache_revalidated = false;
> + }
>
> if (fattr->valid & NFS_ATTR_FATTR_CTIME) {
> memcpy(&inode->i_ctime, &fattr->ctime, sizeof(inode->i_ctime));
> - } else if (server->caps & NFS_CAP_CTIME)
> + } else if (server->caps & NFS_CAP_CTIME) {
> nfsi->cache_validity |= save_cache_validity &
> (NFS_INO_INVALID_ATTR
> | NFS_INO_REVAL_FORCED);
> + cache_revalidated = false;
> + }
>
> /* Check if our cached file size is stale */
> if (fattr->valid & NFS_ATTR_FATTR_SIZE) {
> @@ -1737,19 +1744,23 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> (long long)cur_isize,
> (long long)new_isize);
> }
> - } else
> + } else {
> nfsi->cache_validity |= save_cache_validity &
> (NFS_INO_INVALID_ATTR
> | NFS_INO_REVAL_PAGECACHE
> | NFS_INO_REVAL_FORCED);
> + cache_revalidated = false;
> + }
>
>
> if (fattr->valid & NFS_ATTR_FATTR_ATIME)
> memcpy(&inode->i_atime, &fattr->atime, sizeof(inode->i_atime));
> - else if (server->caps & NFS_CAP_ATIME)
> + else if (server->caps & NFS_CAP_ATIME) {
> nfsi->cache_validity |= save_cache_validity &
> (NFS_INO_INVALID_ATIME
> | NFS_INO_REVAL_FORCED);
> + cache_revalidated = false;
> + }
>
> if (fattr->valid & NFS_ATTR_FATTR_MODE) {
> if ((inode->i_mode & S_IALLUGO) != (fattr->mode & S_IALLUGO)) {
> @@ -1758,36 +1769,42 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> inode->i_mode = newmode;
> invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
> }
> - } else if (server->caps & NFS_CAP_MODE)
> + } else if (server->caps & NFS_CAP_MODE) {
> nfsi->cache_validity |= save_cache_validity &
> (NFS_INO_INVALID_ATTR
> | NFS_INO_INVALID_ACCESS
> | NFS_INO_INVALID_ACL
> | NFS_INO_REVAL_FORCED);
> + cache_revalidated = false;
> + }
>
> if (fattr->valid & NFS_ATTR_FATTR_OWNER) {
> if (!uid_eq(inode->i_uid, fattr->uid)) {
> invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
> inode->i_uid = fattr->uid;
> }
> - } else if (server->caps & NFS_CAP_OWNER)
> + } else if (server->caps & NFS_CAP_OWNER) {
> nfsi->cache_validity |= save_cache_validity &
> (NFS_INO_INVALID_ATTR
> | NFS_INO_INVALID_ACCESS
> | NFS_INO_INVALID_ACL
> | NFS_INO_REVAL_FORCED);
> + cache_revalidated = false;
> + }
>
> if (fattr->valid & NFS_ATTR_FATTR_GROUP) {
> if (!gid_eq(inode->i_gid, fattr->gid)) {
> invalid |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL;
> inode->i_gid = fattr->gid;
> }
> - } else if (server->caps & NFS_CAP_OWNER_GROUP)
> + } else if (server->caps & NFS_CAP_OWNER_GROUP) {
> nfsi->cache_validity |= save_cache_validity &
> (NFS_INO_INVALID_ATTR
> | NFS_INO_INVALID_ACCESS
> | NFS_INO_INVALID_ACL
> | NFS_INO_REVAL_FORCED);
> + cache_revalidated = false;
> + }
>
> if (fattr->valid & NFS_ATTR_FATTR_NLINK) {
> if (inode->i_nlink != fattr->nlink) {
> @@ -1796,19 +1813,22 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> invalid |= NFS_INO_INVALID_DATA;
> set_nlink(inode, fattr->nlink);
> }
> - } else if (server->caps & NFS_CAP_NLINK)
> + } else if (server->caps & NFS_CAP_NLINK) {
> nfsi->cache_validity |= save_cache_validity &
> (NFS_INO_INVALID_ATTR
> | NFS_INO_REVAL_FORCED);
> + cache_revalidated = false;
> + }
>
> if (fattr->valid & NFS_ATTR_FATTR_SPACE_USED) {
> /*
> * report the blocks in 512byte units
> */
> inode->i_blocks = nfs_calc_block_size(fattr->du.nfs3.used);
> - }
> - if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)
> + } else if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)
> inode->i_blocks = fattr->du.nfs2.blocks;
> + else
> + cache_revalidated = false;
>
> /* Update attrtimeo value if we're out of the unstable period */
> if (invalid & NFS_INO_INVALID_ATTR) {
> @@ -1818,8 +1838,11 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> /* Set barrier to be more recent than all outstanding updates */
> nfsi->attr_gencount = nfs_inc_attr_generation_counter();
> } else {
> - if (!time_in_range_open(now, nfsi->attrtimeo_timestamp, nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
> - if ((nfsi->attrtimeo <<= 1) > NFS_MAXATTRTIMEO(inode))
> + if (cache_revalidated &&
> + !time_in_range_open(now, nfsi->attrtimeo_timestamp,
> + nfsi->attrtimeo_timestamp + nfsi->attrtimeo)) {
> + nfsi->attrtimeo <<= 1;
> + if (nfsi->attrtimeo > NFS_MAXATTRTIMEO(inode))
> nfsi->attrtimeo = NFS_MAXATTRTIMEO(inode);
> nfsi->attrtimeo_timestamp = now;
> }
> @@ -1829,7 +1852,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> }
>
> /* Don't declare attrcache up to date if there were no attrs! */
> - if (fattr->valid != 0)
> + if (cache_revalidated)
> invalid &= ~NFS_INO_INVALID_ATTR;
>
> /* Don't invalidate the data if we were to blame */
Hi, Trond,

your three patches

[PATCH] NFSv4: Don't perform cached access checks before we've OPENed
the file
[PATCH] NFS: Ensure we revalidate attributes before using execute_ok()
[PATCH] NFS: Fix attribute cache revalidation

applied to your github master fix the user-visible problems (exec and
access case). I currently don't have time to analyze the code or do more
testing than running my test script, though. I hope I can apply these on
our cluster nodes during the holiday and we'd have it on production
systems in January.

Btw: There is a little whitespace error in the last patch (space before
tab at the "} else if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)" line).

Thank you very much & Happy New Year

Donald

--
Donald Buczek
[email protected]
Tel: +49 30 8413 1433


2015-12-30 14:04:51

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] NFS: Fix attribute cache revalidation

On Wed, Dec 30, 2015 at 6:23 AM, Donald Buczek <[email protected]> wrote:
> Hi, Trond,
>
> your three patches
>
> [PATCH] NFSv4: Don't perform cached access checks before we've OPENed the
> file
> [PATCH] NFS: Ensure we revalidate attributes before using execute_ok()
> [PATCH] NFS: Fix attribute cache revalidation
>
> applied to your github master fix the user-visible problems (exec and access
> case). I currently don't have time to analyze the code or do more testing
> than running my test script, though. I hope I can apply these on our cluster
> nodes during the holiday and we'd have it on production systems in January.
>
> Btw: There is a little whitespace error in the last patch (space before tab
> at the "} else if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)" line).
>
> Thank you very much & Happy New Year

Thanks very much for the testing and analysis! Happy New Year to you too.

Cheers
Trond