After adding commit b0c6108ecf64 ("nfs_instantiate(): prevent multiple
aliases for directory inode") my NFS client crashes while doing lustre race
tests simultaneously on a local filesystem and the same filesystem exported
via knfsd:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Call Trace:
? iput+0x76/0x200
? d_splice_alias+0x307/0x3c0
? dput.part.31+0x96/0x110
? nfs_instantiate+0x45/0x160 [nfs]
nfs3_proc_setacls+0xa/0x20 [nfsv3]
nfs3_proc_create+0x1cc/0x230 [nfsv3]
nfs_create+0x83/0x160 [nfs]
path_openat+0x11aa/0x14d0
do_filp_open+0x93/0x100
? __check_object_size+0xa3/0x181
do_sys_open+0x184/0x220
do_syscall_64+0x5b/0x1b0
entry_SYSCALL_64_after_hwframe+0x65/0xca
158 static int __nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
159 struct posix_acl *dfacl)
160 {
>> 161 struct nfs_server *server = NFS_SERVER(inode);
The 0x28 offset is i_sb in struct inode, we passed a NULL inode to
nfs3_proc_setacls().
After taking this apart, I find the dentry in R12 has a NULL inode after
nfs_instantiate(), which makes sense if we move it to the alias just after
nfs_fhget() (See the referenced commit above). Indeed, on the list of
children is the identical positive dentry that is left behind after
d_splice_alias(). Moving it would usualy be fine for callers, except for
NFSv3 because we want the inode pointer to ride the dentry back up the
stack so we can set ACLs on it and/or set attributes in the case of EXCLUSIVE.
A similar problem existed in nfsd_create_locked(), and was fixed by commit
3819bb0d79f5 ("nfsd: vfs_mkdir() might succeed leaving dentry negative
unhashed"). This patch takes the same approach to fixing the problem: in
the rare case that we lost the race to the dentry, look it up and get the
inode from there.
Signed-off-by: Benjamin Coddington <[email protected]>
Fixes: b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases for directory inode")
Cc: Al Viro <[email protected]>
---
fs/nfs/nfs3proc.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index a3ad2d46fd42..292c53c082f7 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -20,6 +20,7 @@
#include <linux/nfs_mount.h>
#include <linux/freezer.h>
#include <linux/xattr.h>
+#include <linux/namei.h>
#include "iostat.h"
#include "internal.h"
@@ -305,6 +306,7 @@ nfs3_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr,
struct posix_acl *default_acl, *acl;
struct nfs3_createdata *data;
int status = -ENOMEM;
+ struct dentry *d = NULL;
dprintk("NFS call create %pd\n", dentry);
@@ -355,6 +357,22 @@ nfs3_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr,
if (status != 0)
goto out_release_acls;
+ /* Possible that nfs_instantiate() lost a race to open-by-fhandle,
+ * in which case we don't have a reference to the dentry */
+ if (unlikely(d_unhashed(dentry))) {
+ d = lookup_one_len(dentry->d_name.name, dentry->d_parent,
+ dentry->d_name.len);
+ if (IS_ERR(d)) {
+ status = PTR_ERR(d);
+ goto out_release_acls;
+ }
+ if (unlikely(d_is_negative(d))) {
+ status = -ENOENT;
+ goto out_put_d;
+ }
+ dentry = d;
+ }
+
/* When we created the file with exclusive semantics, make
* sure we set the attributes afterwards. */
if (data->arg.create.createmode == NFS3_CREATE_EXCLUSIVE) {
@@ -372,11 +390,13 @@ nfs3_proc_create(struct inode *dir, struct dentry *dentry, struct iattr *sattr,
nfs_post_op_update_inode(d_inode(dentry), data->res.fattr);
dprintk("NFS reply setattr (post-create): %d\n", status);
if (status != 0)
- goto out_release_acls;
+ goto out_put_d;
}
status = nfs3_proc_setacls(d_inode(dentry), acl, default_acl);
+out_put_d:
+ dput(d);
out_release_acls:
posix_acl_release(acl);
posix_acl_release(default_acl);
--
2.20.1
On Mon, 2019-08-26 at 16:24 -0400, Benjamin Coddington wrote:
> After adding commit b0c6108ecf64 ("nfs_instantiate(): prevent
> multiple
> aliases for directory inode") my NFS client crashes while doing
> lustre race
> tests simultaneously on a local filesystem and the same filesystem
> exported
> via knfsd:
>
> BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000028
> Call Trace:
> ? iput+0x76/0x200
> ? d_splice_alias+0x307/0x3c0
> ? dput.part.31+0x96/0x110
> ? nfs_instantiate+0x45/0x160 [nfs]
> nfs3_proc_setacls+0xa/0x20 [nfsv3]
> nfs3_proc_create+0x1cc/0x230 [nfsv3]
> nfs_create+0x83/0x160 [nfs]
> path_openat+0x11aa/0x14d0
> do_filp_open+0x93/0x100
> ? __check_object_size+0xa3/0x181
> do_sys_open+0x184/0x220
> do_syscall_64+0x5b/0x1b0
> entry_SYSCALL_64_after_hwframe+0x65/0xca
>
> 158 static int __nfs3_proc_setacls(struct inode *inode, struct
> posix_acl *acl,
> 159 struct posix_acl *dfacl)
> 160 {
> > > 161 struct nfs_server *server = NFS_SERVER(inode);
>
> The 0x28 offset is i_sb in struct inode, we passed a NULL inode to
> nfs3_proc_setacls().
>
> After taking this apart, I find the dentry in R12 has a NULL inode
> after
> nfs_instantiate(), which makes sense if we move it to the alias just
> after
> nfs_fhget() (See the referenced commit above). Indeed, on the list
> of
> children is the identical positive dentry that is left behind after
> d_splice_alias(). Moving it would usualy be fine for callers, except
> for
> NFSv3 because we want the inode pointer to ride the dentry back up
> the
> stack so we can set ACLs on it and/or set attributes in the case of
> EXCLUSIVE.
>
> A similar problem existed in nfsd_create_locked(), and was fixed by
> commit
> 3819bb0d79f5 ("nfsd: vfs_mkdir() might succeed leaving dentry
> negative
> unhashed"). This patch takes the same approach to fixing the
> problem: in
> the rare case that we lost the race to the dentry, look it up and get
> the
> inode from there.
>
> Signed-off-by: Benjamin Coddington <[email protected]>
> Fixes: b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases for
> directory inode")
> Cc: Al Viro <[email protected]>
> ---
> fs/nfs/nfs3proc.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> index a3ad2d46fd42..292c53c082f7 100644
> --- a/fs/nfs/nfs3proc.c
> +++ b/fs/nfs/nfs3proc.c
> @@ -20,6 +20,7 @@
> #include <linux/nfs_mount.h>
> #include <linux/freezer.h>
> #include <linux/xattr.h>
> +#include <linux/namei.h>
>
> #include "iostat.h"
> #include "internal.h"
> @@ -305,6 +306,7 @@ nfs3_proc_create(struct inode *dir, struct dentry
> *dentry, struct iattr *sattr,
> struct posix_acl *default_acl, *acl;
> struct nfs3_createdata *data;
> int status = -ENOMEM;
> + struct dentry *d = NULL;
>
> dprintk("NFS call create %pd\n", dentry);
>
> @@ -355,6 +357,22 @@ nfs3_proc_create(struct inode *dir, struct
> dentry *dentry, struct iattr *sattr,
> if (status != 0)
> goto out_release_acls;
>
> + /* Possible that nfs_instantiate() lost a race to open-by-
> fhandle,
> + * in which case we don't have a reference to the dentry */
> + if (unlikely(d_unhashed(dentry))) {
> + d = lookup_one_len(dentry->d_name.name, dentry-
> >d_parent,
> + dentry-
> >d_name.len);
> + if (IS_ERR(d)) {
> + status = PTR_ERR(d);
> + goto out_release_acls;
> + }
> + if (unlikely(d_is_negative(d))) {
> + status = -ENOENT;
> + goto out_put_d;
> + }
> + dentry = d;
> + }
> +
If this is a consequence of a race in nfs_instantiate, then why are we
not fix it there? Won't we otherwise end up having to duplicate the
above code in all the other callers?
IOW: why not simply modify nfs_instantiate() to return the dentry from
d_splice_alias(), much like we already do for nfs_lookup()?
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 26 Aug 2019, at 16:39, Trond Myklebust wrote:
> On Mon, 2019-08-26 at 16:24 -0400, Benjamin Coddington wrote:
>> After adding commit b0c6108ecf64 ("nfs_instantiate(): prevent
>> multiple
>> aliases for directory inode") my NFS client crashes while doing
>> lustre race
>> tests simultaneously on a local filesystem and the same filesystem
>> exported
>> via knfsd:
>>
>> BUG: unable to handle kernel NULL pointer dereference at
>> 0000000000000028
>> Call Trace:
>> ? iput+0x76/0x200
>> ? d_splice_alias+0x307/0x3c0
>> ? dput.part.31+0x96/0x110
>> ? nfs_instantiate+0x45/0x160 [nfs]
>> nfs3_proc_setacls+0xa/0x20 [nfsv3]
>> nfs3_proc_create+0x1cc/0x230 [nfsv3]
>> nfs_create+0x83/0x160 [nfs]
>> path_openat+0x11aa/0x14d0
>> do_filp_open+0x93/0x100
>> ? __check_object_size+0xa3/0x181
>> do_sys_open+0x184/0x220
>> do_syscall_64+0x5b/0x1b0
>> entry_SYSCALL_64_after_hwframe+0x65/0xca
>>
>> 158 static int __nfs3_proc_setacls(struct inode *inode, struct
>> posix_acl *acl,
>> 159 struct posix_acl *dfacl)
>> 160 {
>>>> 161 struct nfs_server *server = NFS_SERVER(inode);
>>
>> The 0x28 offset is i_sb in struct inode, we passed a NULL inode to
>> nfs3_proc_setacls().
>>
>> After taking this apart, I find the dentry in R12 has a NULL inode
>> after
>> nfs_instantiate(), which makes sense if we move it to the alias just
>> after
>> nfs_fhget() (See the referenced commit above). Indeed, on the list
>> of
>> children is the identical positive dentry that is left behind after
>> d_splice_alias(). Moving it would usualy be fine for callers, except
>> for
>> NFSv3 because we want the inode pointer to ride the dentry back up
>> the
>> stack so we can set ACLs on it and/or set attributes in the case of
>> EXCLUSIVE.
>>
>> A similar problem existed in nfsd_create_locked(), and was fixed by
>> commit
>> 3819bb0d79f5 ("nfsd: vfs_mkdir() might succeed leaving dentry
>> negative
>> unhashed"). This patch takes the same approach to fixing the
>> problem: in
>> the rare case that we lost the race to the dentry, look it up and get
>> the
>> inode from there.
>>
>> Signed-off-by: Benjamin Coddington <[email protected]>
>> Fixes: b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases for
>> directory inode")
>> Cc: Al Viro <[email protected]>
>> ---
>> fs/nfs/nfs3proc.c | 22 +++++++++++++++++++++-
>> 1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
>> index a3ad2d46fd42..292c53c082f7 100644
>> --- a/fs/nfs/nfs3proc.c
>> +++ b/fs/nfs/nfs3proc.c
>> @@ -20,6 +20,7 @@
>> #include <linux/nfs_mount.h>
>> #include <linux/freezer.h>
>> #include <linux/xattr.h>
>> +#include <linux/namei.h>
>>
>> #include "iostat.h"
>> #include "internal.h"
>> @@ -305,6 +306,7 @@ nfs3_proc_create(struct inode *dir, struct dentry
>> *dentry, struct iattr *sattr,
>> struct posix_acl *default_acl, *acl;
>> struct nfs3_createdata *data;
>> int status = -ENOMEM;
>> + struct dentry *d = NULL;
>>
>> dprintk("NFS call create %pd\n", dentry);
>>
>> @@ -355,6 +357,22 @@ nfs3_proc_create(struct inode *dir, struct
>> dentry *dentry, struct iattr *sattr,
>> if (status != 0)
>> goto out_release_acls;
>>
>> + /* Possible that nfs_instantiate() lost a race to open-by-
>> fhandle,
>> + * in which case we don't have a reference to the dentry */
>> + if (unlikely(d_unhashed(dentry))) {
>> + d = lookup_one_len(dentry->d_name.name, dentry-
>>> d_parent,
>> + dentry-
>>> d_name.len);
>> + if (IS_ERR(d)) {
>> + status = PTR_ERR(d);
>> + goto out_release_acls;
>> + }
>> + if (unlikely(d_is_negative(d))) {
>> + status = -ENOENT;
>> + goto out_put_d;
>> + }
>> + dentry = d;
>> + }
>> +
>
>
> If this is a consequence of a race in nfs_instantiate, then why are we
> not fix it there? Won't we otherwise end up having to duplicate the
> above code in all the other callers?
>
> IOW: why not simply modify nfs_instantiate() to return the dentry from
> d_splice_alias(), much like we already do for nfs_lookup()?
None of the other callers care about the dentry and it seemed more invasive.
It is also an accepted pattern for VFS - that's why Al justified it in
b0c6108ecf64.
If you'd rather change all the callers, let me know and I can send that.
Ben
On Tue, 2019-08-27 at 06:33 -0400, Benjamin Coddington wrote:
> On 26 Aug 2019, at 16:39, Trond Myklebust wrote:
>
> > On Mon, 2019-08-26 at 16:24 -0400, Benjamin Coddington wrote:
> > > After adding commit b0c6108ecf64 ("nfs_instantiate(): prevent
> > > multiple
> > > aliases for directory inode") my NFS client crashes while doing
> > > lustre race
> > > tests simultaneously on a local filesystem and the same
> > > filesystem
> > > exported
> > > via knfsd:
> > >
> > > BUG: unable to handle kernel NULL pointer dereference at
> > > 0000000000000028
> > > Call Trace:
> > > ? iput+0x76/0x200
> > > ? d_splice_alias+0x307/0x3c0
> > > ? dput.part.31+0x96/0x110
> > > ? nfs_instantiate+0x45/0x160 [nfs]
> > > nfs3_proc_setacls+0xa/0x20 [nfsv3]
> > > nfs3_proc_create+0x1cc/0x230 [nfsv3]
> > > nfs_create+0x83/0x160 [nfs]
> > > path_openat+0x11aa/0x14d0
> > > do_filp_open+0x93/0x100
> > > ? __check_object_size+0xa3/0x181
> > > do_sys_open+0x184/0x220
> > > do_syscall_64+0x5b/0x1b0
> > > entry_SYSCALL_64_after_hwframe+0x65/0xca
> > >
> > > 158 static int __nfs3_proc_setacls(struct inode *inode, struct
> > > posix_acl *acl,
> > > 159 struct posix_acl *dfacl)
> > > 160 {
> > > > > 161 struct nfs_server *server = NFS_SERVER(inode);
> > >
> > > The 0x28 offset is i_sb in struct inode, we passed a NULL inode
> > > to
> > > nfs3_proc_setacls().
> > >
> > > After taking this apart, I find the dentry in R12 has a NULL
> > > inode
> > > after
> > > nfs_instantiate(), which makes sense if we move it to the alias
> > > just
> > > after
> > > nfs_fhget() (See the referenced commit above). Indeed, on the
> > > list
> > > of
> > > children is the identical positive dentry that is left behind
> > > after
> > > d_splice_alias(). Moving it would usualy be fine for callers,
> > > except
> > > for
> > > NFSv3 because we want the inode pointer to ride the dentry back
> > > up
> > > the
> > > stack so we can set ACLs on it and/or set attributes in the case
> > > of
> > > EXCLUSIVE.
> > >
> > > A similar problem existed in nfsd_create_locked(), and was fixed
> > > by
> > > commit
> > > 3819bb0d79f5 ("nfsd: vfs_mkdir() might succeed leaving dentry
> > > negative
> > > unhashed"). This patch takes the same approach to fixing the
> > > problem: in
> > > the rare case that we lost the race to the dentry, look it up and
> > > get
> > > the
> > > inode from there.
> > >
> > > Signed-off-by: Benjamin Coddington <[email protected]>
> > > Fixes: b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases
> > > for
> > > directory inode")
> > > Cc: Al Viro <[email protected]>
> > > ---
> > > fs/nfs/nfs3proc.c | 22 +++++++++++++++++++++-
> > > 1 file changed, 21 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> > > index a3ad2d46fd42..292c53c082f7 100644
> > > --- a/fs/nfs/nfs3proc.c
> > > +++ b/fs/nfs/nfs3proc.c
> > > @@ -20,6 +20,7 @@
> > > #include <linux/nfs_mount.h>
> > > #include <linux/freezer.h>
> > > #include <linux/xattr.h>
> > > +#include <linux/namei.h>
> > >
> > > #include "iostat.h"
> > > #include "internal.h"
> > > @@ -305,6 +306,7 @@ nfs3_proc_create(struct inode *dir, struct
> > > dentry
> > > *dentry, struct iattr *sattr,
> > > struct posix_acl *default_acl, *acl;
> > > struct nfs3_createdata *data;
> > > int status = -ENOMEM;
> > > + struct dentry *d = NULL;
> > >
> > > dprintk("NFS call create %pd\n", dentry);
> > >
> > > @@ -355,6 +357,22 @@ nfs3_proc_create(struct inode *dir, struct
> > > dentry *dentry, struct iattr *sattr,
> > > if (status != 0)
> > > goto out_release_acls;
> > >
> > > + /* Possible that nfs_instantiate() lost a race to open-by-
> > > fhandle,
> > > + * in which case we don't have a reference to the dentry */
> > > + if (unlikely(d_unhashed(dentry))) {
> > > + d = lookup_one_len(dentry->d_name.name, dentry-
> > > > d_parent,
> > > + dentry-
> > > > d_name.len);
> > > + if (IS_ERR(d)) {
> > > + status = PTR_ERR(d);
> > > + goto out_release_acls;
> > > + }
> > > + if (unlikely(d_is_negative(d))) {
> > > + status = -ENOENT;
> > > + goto out_put_d;
> > > + }
> > > + dentry = d;
> > > + }
> > > +
> >
> > If this is a consequence of a race in nfs_instantiate, then why are
> > we
> > not fix it there? Won't we otherwise end up having to duplicate the
> > above code in all the other callers?
> >
> > IOW: why not simply modify nfs_instantiate() to return the dentry
> > from
> > d_splice_alias(), much like we already do for nfs_lookup()?
>
> None of the other callers care about the dentry and it seemed more
> invasive.
> It is also an accepted pattern for VFS - that's why Al justified it
> in
> b0c6108ecf64.
It is racy, though. Nothing guarantees that a dentry for that file is
still hashed under the same name when you look it up again, so it is
better to pass it back directly from the d_splice_alias() call.
> If you'd rather change all the callers, let me know and I can send
> that.
If you'd prefer not to have to change all the callers, then perhaps
split the function into two parts:
- The inner part that returns the dentry from d_splice_alias() on
success, and which can be called directly from nfs3_do_create().
- Then a wrapper that works like nfs_instantiate() by dput()ing the
valid dentry (and returning 0) or otherwise converting the ERR_PTR()
and returning that.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 27 Aug 2019, at 7:46, Trond Myklebust wrote:
> On Tue, 2019-08-27 at 06:33 -0400, Benjamin Coddington wrote:
>> On 26 Aug 2019, at 16:39, Trond Myklebust wrote:
>>> If this is a consequence of a race in nfs_instantiate, then why are
>>> we
>>> not fix it there? Won't we otherwise end up having to duplicate the
>>> above code in all the other callers?
>>>
>>> IOW: why not simply modify nfs_instantiate() to return the dentry
>>> from
>>> d_splice_alias(), much like we already do for nfs_lookup()?
>>
>> None of the other callers care about the dentry and it seemed more
>> invasive.
>> It is also an accepted pattern for VFS - that's why Al justified it
>> in
>> b0c6108ecf64.
>
> It is racy, though. Nothing guarantees that a dentry for that file is
> still hashed under the same name when you look it up again, so it is
> better to pass it back directly from the d_splice_alias() call.
>
>> If you'd rather change all the callers, let me know and I can send
>> that.
>
> If you'd prefer not to have to change all the callers, then perhaps
> split the function into two parts:
> - The inner part that returns the dentry from d_splice_alias() on
> success, and which can be called directly from nfs3_do_create().
> - Then a wrapper that works like nfs_instantiate() by dput()ing the
> valid dentry (and returning 0) or otherwise converting the ERR_PTR()
> and returning that.
Ok, sounds fine.
One thing that strikes me as odd is the d_drop() at the top of
nfs_instantiate(). That seems wrong if the next check for positive bypasses
the work of hashing it again.
Can you give me a hint as to why the paths are that way? Otherwise, I think
it should change as:
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 8d501093660f..7720a19b38d3 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1682,11 +1682,12 @@ int nfs_instantiate(struct dentry *dentry, struct
nfs_fh *fhandle,
struct dentry *d;
int error = -EACCES;
- d_drop(dentry);
-
/* We may have been initialized further down */
if (d_really_is_positive(dentry))
goto out;
+
+ d_drop(dentry);
if (fhandle->size == 0) {
error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name,
fhandle, fattr, NULL);
if (error)
Ben