2020-09-14 03:40:35

by Jianyong Wu

[permalink] [raw]
Subject: [PATCH RFC 0/4] 9p: fix open-unlink-f*syscall bug

open-unlink-f*syscall bug is a well-known bug in 9p, we try to fix the bug
in this patch set.
I take Eric's and Greg's patches which constiute the 1/4 - 3/4 of this patch
set as the main frame of the solution. In patch 4/4, I fix the fid race issue
exists in Greg's patch.

Eric Van Hensbergen (1):
fs/9p: fix create-unlink-getattr idiom

Greg Kurz (1):
fs/9p: search open fids first

Jianyong Wu (2):
fs/9p: track open fids
9p: fix race issue in fid contention.

fs/9p/fid.c | 72 +++++++++++++++++++++++++++++++++++------
fs/9p/fid.h | 25 +++++++++++---
fs/9p/vfs_dentry.c | 2 +-
fs/9p/vfs_dir.c | 20 ++++++++++--
fs/9p/vfs_file.c | 3 +-
fs/9p/vfs_inode.c | 52 +++++++++++++++++++++--------
fs/9p/vfs_inode_dotl.c | 44 ++++++++++++++++---------
fs/9p/vfs_super.c | 7 ++--
fs/9p/xattr.c | 18 ++++++++---
include/net/9p/client.h | 8 +++++
net/9p/client.c | 7 +++-
11 files changed, 206 insertions(+), 52 deletions(-)

--
2.17.1


2020-09-14 03:42:59

by Jianyong Wu

[permalink] [raw]
Subject: [PATCH RFC 4/4] 9p: fix race issue in fid contention.

Eric's and Greg's patch offer a mechanism to fix open-unlink-f*syscall
bug in 9p. But there is race issue in fid comtention.
As Greg's patch stores all of fids from opened files into according inode,
so all the lookup fid ops can retrieve fid from inode preferentially. But
there is no mechanism to handle the fid comtention issue. For example,
there are two threads get the same fid in the same time and one of them
clunk the fid before the other thread ready to discard the fid. In this
scenario, it will lead to some fatal problems, even kernel core dump.

I introduce a mechanism to fix this race issue. A counter field introduced
into p9_fid struct to store the reference counter to the fid. When a fid
is allocated from the inode, the counter will increase, and will decrease
at the end of its occupation. It is guaranteed that the fid won't be clunked
before the reference counter go down to 0, then we can avoid the clunked
fid to be used.
As there is no need to retrieve fid from inode in all conditions, a enum value
denotes the source of the fid is introduced to 9p_fid either. So we can only
handle the reference counter as to the fid obtained from inode.

tests:
race issue test from the old test case:
for file in {01..50}; do touch f.${file}; done
seq 1 1000 | xargs -n 1 -P 50 -I{} cat f.* > /dev/null

open-unlink-f*syscall test:
I have tested for f*syscall include: ftruncate fstat fchown fchmod faccessat.

Signed-off-by: Jianyong Wu <[email protected]>
---
fs/9p/fid.c | 27 +++++++++++++++++---------
fs/9p/fid.h | 24 +++++++++++++++++++----
fs/9p/vfs_dentry.c | 2 +-
fs/9p/vfs_dir.c | 23 +++++++++++++++++-----
fs/9p/vfs_file.c | 2 +-
fs/9p/vfs_inode.c | 42 ++++++++++++++++++++++++++++------------
fs/9p/vfs_inode_dotl.c | 43 +++++++++++++++++++++++++++--------------
fs/9p/vfs_super.c | 7 +++++--
fs/9p/xattr.c | 18 +++++++++++++----
include/net/9p/client.h | 7 +++++++
net/9p/client.c | 2 ++
11 files changed, 144 insertions(+), 53 deletions(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 0b23b0fe6c51..fd8cfa4776c9 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -60,6 +60,10 @@ static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
break;
}
}
+ if (ret && !IS_ERR(ret)) {
+ atomic_inc(&ret->count);
+ ret->s = FID_FROM_INODE;
+ }
spin_unlock(&inode->i_lock);
return ret;
}
@@ -84,10 +88,13 @@ void v9fs_open_fid_add(struct inode *inode, struct p9_fid *fid)
* @dentry: dentry to look for fid in
* @uid: return fid that belongs to the specified user
* @any: if non-zero, return any fid associated with the dentry
+ * @source: from which of inode or dentry caller want to retrieve fid, 0
+ * denotes dentry and other denotes inode. Only if the f* syscall related
+ * funcs are recommended to set to non-zero.
*
*/

-static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
+static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any, int source)
{
struct p9_fid *fid, *ret;

@@ -96,7 +103,7 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
any);
ret = NULL;

- if (d_inode(dentry))
+ if (d_inode(dentry) && source)
ret = v9fs_fid_find_inode(d_inode(dentry), uid);

/* we'll recheck under lock if there's anything to look in */
@@ -109,6 +116,8 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
break;
}
}
+ if (ret && !IS_ERR(ret))
+ ret->s = FID_FROM_DENTRY;
spin_unlock(&dentry->d_lock);
}

@@ -144,7 +153,7 @@ static int build_path_from_dentry(struct v9fs_session_info *v9ses,
}

static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
- kuid_t uid, int any)
+ kuid_t uid, int any, int source)
{
struct dentry *ds;
const unsigned char **wnames, *uname;
@@ -154,7 +163,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,

v9ses = v9fs_dentry2v9ses(dentry);
access = v9ses->flags & V9FS_ACCESS_MASK;
- fid = v9fs_fid_find(dentry, uid, any);
+ fid = v9fs_fid_find(dentry, uid, any, source);
if (fid)
return fid;
/*
@@ -164,7 +173,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
*/
down_read(&v9ses->rename_sem);
ds = dentry->d_parent;
- fid = v9fs_fid_find(ds, uid, any);
+ fid = v9fs_fid_find(ds, uid, any, 0);
if (fid) {
/* Found the parent fid do a lookup with that */
fid = p9_client_walk(fid, 1, &dentry->d_name.name, 1);
@@ -173,7 +182,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
up_read(&v9ses->rename_sem);

/* start from the root and try to do a lookup */
- fid = v9fs_fid_find(dentry->d_sb->s_root, uid, any);
+ fid = v9fs_fid_find(dentry->d_sb->s_root, uid, any, 0);
if (!fid) {
/* the user is not attached to the fs yet */
if (access == V9FS_ACCESS_SINGLE)
@@ -258,7 +267,7 @@ static struct p9_fid *v9fs_fid_lookup_with_uid(struct dentry *dentry,
* the fs yet, attach now and walk from the root.
*/

-struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
+struct p9_fid *v9fs_fid_lookup(struct dentry *dentry, int source)
{
kuid_t uid;
int any, access;
@@ -284,7 +293,7 @@ struct p9_fid *v9fs_fid_lookup(struct dentry *dentry)
any = 0;
break;
}
- return v9fs_fid_lookup_with_uid(dentry, uid, any);
+ return v9fs_fid_lookup_with_uid(dentry, uid, any, source);
}

struct p9_fid *v9fs_writeback_fid(struct dentry *dentry)
@@ -292,7 +301,7 @@ struct p9_fid *v9fs_writeback_fid(struct dentry *dentry)
int err;
struct p9_fid *fid;

- fid = clone_fid(v9fs_fid_lookup_with_uid(dentry, GLOBAL_ROOT_UID, 0));
+ fid = clone_fid(v9fs_fid_lookup_with_uid(dentry, GLOBAL_ROOT_UID, 0, 0));
if (IS_ERR(fid))
goto error_out;
/*
diff --git a/fs/9p/fid.h b/fs/9p/fid.h
index dfa11df02818..bea25fd2b983 100644
--- a/fs/9p/fid.h
+++ b/fs/9p/fid.h
@@ -8,10 +8,26 @@
#define FS_9P_FID_H
#include <linux/list.h>

-struct p9_fid *v9fs_fid_lookup(struct dentry *dentry);
+struct p9_fid *v9fs_fid_lookup(struct dentry *dentry, int source);
+static inline int fid_atomic_dec(struct p9_fid *fid)
+{
+ if (fid && !IS_ERR(fid) && (fid->s & FID_FROM_INODE)) {
+ if (atomic_sub_return(1, &fid->count) < 0) {
+ pr_err("fid counter should be positive\n");
+ while (atomic_inc_and_test(&fid->count));
+ return -1;
+ }
+ }
+ return 0;
+}
+
static inline struct p9_fid *v9fs_parent_fid(struct dentry *dentry)
{
- return v9fs_fid_lookup(dentry->d_parent);
+ /*
+ * The "*at syscall" often need parent fd, so let's search
+ * fid from inode first.
+ */
+ return v9fs_fid_lookup(dentry->d_parent, 1);
}
void v9fs_fid_add(struct dentry *dentry, struct p9_fid *fid);
struct p9_fid *v9fs_writeback_fid(struct dentry *dentry);
@@ -20,8 +36,8 @@ static inline struct p9_fid *clone_fid(struct p9_fid *fid)
{
return IS_ERR(fid) ? fid : p9_client_walk(fid, 0, NULL, 1);
}
-static inline struct p9_fid *v9fs_fid_clone(struct dentry *dentry)
+static inline struct p9_fid *v9fs_fid_clone(struct dentry *dentry, int source)
{
- return clone_fid(v9fs_fid_lookup(dentry));
+ return clone_fid(v9fs_fid_lookup(dentry, source));
}
#endif
diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
index 7d6f69aefd45..17bb872d7203 100644
--- a/fs/9p/vfs_dentry.c
+++ b/fs/9p/vfs_dentry.c
@@ -76,7 +76,7 @@ static int v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
if (v9inode->cache_validity & V9FS_INO_INVALID_ATTR) {
int retval;
struct v9fs_session_info *v9ses;
- fid = v9fs_fid_lookup(dentry);
+ fid = v9fs_fid_lookup(dentry, 0);
if (IS_ERR(fid))
return PTR_ERR(fid);

diff --git a/fs/9p/vfs_dir.c b/fs/9p/vfs_dir.c
index d82d8a346f86..f817c6a5fb42 100644
--- a/fs/9p/vfs_dir.c
+++ b/fs/9p/vfs_dir.c
@@ -19,6 +19,7 @@
#include <linux/idr.h>
#include <linux/slab.h>
#include <linux/uio.h>
+#include <linux/delay.h>
#include <net/9p/9p.h>
#include <net/9p/client.h>

@@ -206,15 +207,27 @@ static int v9fs_dir_readdir_dotl(struct file *file, struct dir_context *ctx)
int v9fs_dir_release(struct inode *inode, struct file *filp)
{
struct p9_fid *fid;
+ int count;

fid = filp->private_data;
p9_debug(P9_DEBUG_VFS, "inode: %p filp: %p fid: %d\n",
inode, filp, fid ? fid->fid : -1);
- spin_lock(&inode->i_lock);
- hlist_del(&fid->ilist);
- spin_unlock(&inode->i_lock);
- if (fid)
- p9_client_clunk(fid);
+ if (fid) {
+ spin_lock(&inode->i_lock);
+ hlist_del(&fid->ilist);
+ spin_unlock(&inode->i_lock);
+ /*
+ * Here we wait up to 10ms, if the counter won't down to 0
+ * the fid will be left behind.
+ */
+ count = 100;
+ while (count > 0 && fid->count.counter > 0) {
+ count--;
+ udelay(100);
+ }
+ if (fid->count.counter <= 0)
+ p9_client_clunk(fid);
+ }
return 0;
}

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index b42cc1752cd1..efc3f5cc1c14 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -59,7 +59,7 @@ int v9fs_file_open(struct inode *inode, struct file *file)
v9fs_proto_dotu(v9ses));
fid = file->private_data;
if (!fid) {
- fid = v9fs_fid_clone(file_dentry(file));
+ fid = v9fs_fid_clone(file_dentry(file), 0);
if (IS_ERR(fid))
return PTR_ERR(fid);

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 6b243ffcbcf0..f8718b9b1915 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -551,9 +551,10 @@ static int v9fs_remove(struct inode *dir, struct dentry *dentry, int flags)
if (v9fs_proto_dotl(v9ses))
retval = p9_client_unlinkat(dfid, dentry->d_name.name,
v9fs_at_to_dotl_flags(flags));
+ fid_atomic_dec(dfid);
if (retval == -EOPNOTSUPP) {
/* Try the one based on path */
- v9fid = v9fs_fid_clone(dentry);
+ v9fid = v9fs_fid_clone(dentry, 0);
if (IS_ERR(v9fid))
return PTR_ERR(v9fid);
retval = p9_client_remove(v9fid);
@@ -616,12 +617,14 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
if (IS_ERR(ofid)) {
err = PTR_ERR(ofid);
p9_debug(P9_DEBUG_VFS, "p9_client_walk failed %d\n", err);
+ fid_atomic_dec(dfid);
return ERR_PTR(err);
}

err = p9_client_fcreate(ofid, name, perm, mode, extension);
if (err < 0) {
p9_debug(P9_DEBUG_VFS, "p9_client_fcreate failed %d\n", err);
+ fid_atomic_dec(dfid);
goto error;
}

@@ -633,6 +636,7 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
p9_debug(P9_DEBUG_VFS,
"p9_client_walk failed %d\n", err);
fid = NULL;
+ fid_atomic_dec(dfid);
goto error;
}
/*
@@ -643,11 +647,13 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
err = PTR_ERR(inode);
p9_debug(P9_DEBUG_VFS,
"inode creation failed %d\n", err);
+ fid_atomic_dec(dfid);
goto error;
}
v9fs_fid_add(dentry, fid);
d_instantiate(dentry, inode);
}
+ fid_atomic_dec(dfid);
return ofid;
error:
if (ofid)
@@ -760,6 +766,7 @@ struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
*/
name = dentry->d_name.name;
fid = p9_client_walk(dfid, 1, &name, 1);
+ fid_atomic_dec(dfid);
if (fid == ERR_PTR(-ENOENT))
inode = NULL;
else if (IS_ERR(fid))
@@ -910,7 +917,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *old_inode;
struct inode *new_inode;
struct v9fs_session_info *v9ses;
- struct p9_fid *oldfid;
+ struct p9_fid *oldfid, *dfid;
struct p9_fid *olddirfid;
struct p9_fid *newdirfid;
struct p9_wstat wstat;
@@ -923,17 +930,23 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
old_inode = d_inode(old_dentry);
new_inode = d_inode(new_dentry);
v9ses = v9fs_inode2v9ses(old_inode);
- oldfid = v9fs_fid_lookup(old_dentry);
+ oldfid = v9fs_fid_lookup(old_dentry, 1);
if (IS_ERR(oldfid))
return PTR_ERR(oldfid);

- olddirfid = clone_fid(v9fs_parent_fid(old_dentry));
+ dfid = v9fs_parent_fid(old_dentry);
+ olddirfid = clone_fid(dfid);
+ if (dfid && !IS_ERR(dfid))
+ fid_atomic_dec(dfid);
if (IS_ERR(olddirfid)) {
retval = PTR_ERR(olddirfid);
goto done;
}

- newdirfid = clone_fid(v9fs_parent_fid(new_dentry));
+ dfid = v9fs_parent_fid(new_dentry);
+ newdirfid = clone_fid(dfid);
+ if (dfid && !IS_ERR(dfid))
+ fid_atomic_dec(dfid);
if (IS_ERR(newdirfid)) {
retval = PTR_ERR(newdirfid);
goto clunk_olddir;
@@ -990,6 +1003,7 @@ v9fs_vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
p9_client_clunk(olddirfid);

done:
+ fid_atomic_dec(oldfid);
return retval;
}

@@ -1017,11 +1031,12 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,
generic_fillattr(d_inode(dentry), stat);
return 0;
}
- fid = v9fs_fid_lookup(dentry);
+ fid = v9fs_fid_lookup(dentry, 1);
if (IS_ERR(fid))
return PTR_ERR(fid);

st = p9_client_stat(fid);
+ fid_atomic_dec(fid);
if (IS_ERR(st))
return PTR_ERR(st);

@@ -1042,7 +1057,7 @@ v9fs_vfs_getattr(const struct path *path, struct kstat *stat,

static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
{
- int retval;
+ int retval, use_dentry = 0;
struct v9fs_session_info *v9ses;
struct p9_fid *fid = NULL;
struct p9_wstat wstat;
@@ -1058,11 +1073,12 @@ static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
fid = iattr->ia_file->private_data;
WARN_ON(!fid);
}
- if (!fid)
- fid = v9fs_fid_lookup(dentry);
+ if (!fid) {
+ fid = v9fs_fid_lookup(dentry, 1);
+ use_dentry = 1;
+ }
if(IS_ERR(fid))
return PTR_ERR(fid);
-
v9fs_blank_wstat(&wstat);
if (iattr->ia_valid & ATTR_MODE)
wstat.mode = unixmode2p9mode(v9ses, iattr->ia_mode);
@@ -1089,6 +1105,8 @@ static int v9fs_vfs_setattr(struct dentry *dentry, struct iattr *iattr)
filemap_write_and_wait(d_inode(dentry)->i_mapping);

retval = p9_client_wstat(fid, &wstat);
+ if (use_dentry)
+ fid_atomic_dec(fid);
if (retval < 0)
return retval;

@@ -1203,7 +1221,7 @@ static const char *v9fs_vfs_get_link(struct dentry *dentry,
return ERR_PTR(-ECHILD);

v9ses = v9fs_dentry2v9ses(dentry);
- fid = v9fs_fid_lookup(dentry);
+ fid = v9fs_fid_lookup(dentry, 0);
p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);

if (IS_ERR(fid))
@@ -1303,7 +1321,7 @@ v9fs_vfs_link(struct dentry *old_dentry, struct inode *dir,
p9_debug(P9_DEBUG_VFS, " %lu,%pd,%pd\n",
dir->i_ino, dentry, old_dentry);

- oldfid = v9fs_fid_clone(old_dentry);
+ oldfid = v9fs_fid_clone(old_dentry, 0);
if (IS_ERR(oldfid))
return PTR_ERR(oldfid);

diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 08f2e089fb0e..be68db87ef61 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -296,6 +296,7 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,

/* instantiate inode and assign the unopened fid to the dentry */
fid = p9_client_walk(dfid, 1, &name, 1);
+ fid_atomic_dec(dfid);
if (IS_ERR(fid)) {
err = PTR_ERR(fid);
p9_debug(P9_DEBUG_VFS, "p9_client_walk failed %d\n", err);
@@ -394,7 +395,6 @@ static int v9fs_vfs_mkdir_dotl(struct inode *dir,
dfid = NULL;
goto error;
}
-
gid = v9fs_get_fsgid_for_create(dir);
mode = omode;
/* Update mode based on ACL value */
@@ -452,6 +452,7 @@ static int v9fs_vfs_mkdir_dotl(struct inode *dir,
if (fid)
p9_client_clunk(fid);
v9fs_put_acl(dacl, pacl);
+ fid_atomic_dec(dfid);
return err;
}

@@ -470,7 +471,7 @@ v9fs_vfs_getattr_dotl(const struct path *path, struct kstat *stat,
generic_fillattr(d_inode(dentry), stat);
return 0;
}
- fid = v9fs_fid_lookup(dentry);
+ fid = v9fs_fid_lookup(dentry, 1);
if (IS_ERR(fid))
return PTR_ERR(fid);

@@ -479,6 +480,7 @@ v9fs_vfs_getattr_dotl(const struct path *path, struct kstat *stat,
*/

st = p9_client_getattr_dotl(fid, P9_STATS_ALL);
+ fid_atomic_dec(fid);
if (IS_ERR(st))
return PTR_ERR(st);

@@ -540,7 +542,7 @@ static int v9fs_mapped_iattr_valid(int iattr_valid)

int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
{
- int retval;
+ int retval, use_dentry = 0;
struct p9_fid *fid = NULL;
struct p9_iattr_dotl p9attr;
struct inode *inode = d_inode(dentry);
@@ -565,8 +567,10 @@ int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
fid = iattr->ia_file->private_data;
WARN_ON(!fid);
}
- if (!fid)
- fid = v9fs_fid_lookup(dentry);
+ if (!fid) {
+ fid = v9fs_fid_lookup(dentry, 1);
+ use_dentry = 1;
+ }
if (IS_ERR(fid))
return PTR_ERR(fid);

@@ -575,8 +579,10 @@ int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
filemap_write_and_wait(inode->i_mapping);

retval = p9_client_setattr(fid, &p9attr);
- if (retval < 0)
+ if (retval < 0) {
+ fid_atomic_dec(fid);
return retval;
+ }

if ((iattr->ia_valid & ATTR_SIZE) &&
iattr->ia_size != i_size_read(inode))
@@ -588,9 +594,13 @@ int v9fs_vfs_setattr_dotl(struct dentry *dentry, struct iattr *iattr)
if (iattr->ia_valid & ATTR_MODE) {
/* We also want to update ACL when we update mode bits */
retval = v9fs_acl_chmod(inode, fid);
- if (retval < 0)
+ if (retval < 0) {
+ fid_atomic_dec(fid);
return retval;
+ }
}
+ if (use_dentry)
+ fid_atomic_dec(fid);
return 0;
}

@@ -741,7 +751,7 @@ v9fs_vfs_symlink_dotl(struct inode *dir, struct dentry *dentry,
error:
if (fid)
p9_client_clunk(fid);
-
+ fid_atomic_dec(dfid);
return err;
}

@@ -769,12 +779,15 @@ v9fs_vfs_link_dotl(struct dentry *old_dentry, struct inode *dir,
if (IS_ERR(dfid))
return PTR_ERR(dfid);

- oldfid = v9fs_fid_lookup(old_dentry);
- if (IS_ERR(oldfid))
+ oldfid = v9fs_fid_lookup(old_dentry, 0);
+ if (IS_ERR(oldfid)) {
+ fid_atomic_dec(dfid);
return PTR_ERR(oldfid);
+ }

err = p9_client_link(dfid, oldfid, dentry->d_name.name);
-
+ fid_atomic_dec(dfid);
+ fid_atomic_dec(oldfid);
if (err < 0) {
p9_debug(P9_DEBUG_VFS, "p9_client_link failed %d\n", err);
return err;
@@ -784,10 +797,9 @@ v9fs_vfs_link_dotl(struct dentry *old_dentry, struct inode *dir,
if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
/* Get the latest stat info from server. */
struct p9_fid *fid;
- fid = v9fs_fid_lookup(old_dentry);
+ fid = v9fs_fid_lookup(old_dentry, 0);
if (IS_ERR(fid))
return PTR_ERR(fid);
-
v9fs_refresh_inode_dotl(fid, d_inode(old_dentry));
}
ihold(d_inode(old_dentry));
@@ -830,7 +842,6 @@ v9fs_vfs_mknod_dotl(struct inode *dir, struct dentry *dentry, umode_t omode,
dfid = NULL;
goto error;
}
-
gid = v9fs_get_fsgid_for_create(dir);
mode = omode;
/* Update mode based on ACL value */
@@ -887,6 +898,7 @@ v9fs_vfs_mknod_dotl(struct inode *dir, struct dentry *dentry, umode_t omode,
if (fid)
p9_client_clunk(fid);
v9fs_put_acl(dacl, pacl);
+ fid_atomic_dec(dfid);
return err;
}

@@ -911,10 +923,11 @@ v9fs_vfs_get_link_dotl(struct dentry *dentry,

p9_debug(P9_DEBUG_VFS, "%pd\n", dentry);

- fid = v9fs_fid_lookup(dentry);
+ fid = v9fs_fid_lookup(dentry, 0);
if (IS_ERR(fid))
return ERR_CAST(fid);
retval = p9_client_readlink(fid, &target);
+ fid_atomic_dec(fid);
if (retval)
return ERR_PTR(retval);
set_delayed_call(done, kfree_link, target);
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 74df32be4c6a..ef35d4f07c40 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -241,7 +241,7 @@ static int v9fs_statfs(struct dentry *dentry, struct kstatfs *buf)
struct p9_rstatfs rs;
int res;

- fid = v9fs_fid_lookup(dentry);
+ fid = v9fs_fid_lookup(dentry, 1);
if (IS_ERR(fid)) {
res = PTR_ERR(fid);
goto done;
@@ -262,10 +262,13 @@ static int v9fs_statfs(struct dentry *dentry, struct kstatfs *buf)
buf->f_fsid.val[1] = (rs.fsid >> 32) & 0xFFFFFFFFUL;
buf->f_namelen = rs.namelen;
}
- if (res != -ENOSYS)
+ if (res != -ENOSYS) {
+ fid_atomic_dec(fid);
goto done;
+ }
}
res = simple_statfs(dentry, buf);
+ fid_atomic_dec(fid);
done:
return res;
}
diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c
index ac8ff8ca4c11..f4e90a2dc975 100644
--- a/fs/9p/xattr.c
+++ b/fs/9p/xattr.c
@@ -71,14 +71,17 @@ ssize_t v9fs_xattr_get(struct dentry *dentry, const char *name,
void *buffer, size_t buffer_size)
{
struct p9_fid *fid;
+ int ret;

p9_debug(P9_DEBUG_VFS, "name = %s value_len = %zu\n",
name, buffer_size);
- fid = v9fs_fid_lookup(dentry);
+ fid = v9fs_fid_lookup(dentry, 1);
if (IS_ERR(fid))
return PTR_ERR(fid);
+ ret = v9fs_fid_xattr_get(fid, name, buffer, buffer_size);
+ fid_atomic_dec(fid);

- return v9fs_fid_xattr_get(fid, name, buffer, buffer_size);
+ return ret;
}

/*
@@ -96,8 +99,15 @@ ssize_t v9fs_xattr_get(struct dentry *dentry, const char *name,
int v9fs_xattr_set(struct dentry *dentry, const char *name,
const void *value, size_t value_len, int flags)
{
- struct p9_fid *fid = v9fs_fid_lookup(dentry);
- return v9fs_fid_xattr_set(fid, name, value, value_len, flags);
+ int ret;
+ struct p9_fid *fid;
+
+ fid = v9fs_fid_lookup(dentry, 1);
+ if (IS_ERR(fid))
+ return PTR_ERR(fid);
+ ret = v9fs_fid_xattr_set(fid, name, value, value_len, flags);
+ fid_atomic_dec(fid);
+ return ret;
}

int v9fs_fid_xattr_set(struct p9_fid *fid, const char *name,
diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index ce7882da8e86..831cb1a903b1 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -140,14 +140,21 @@ struct p9_client {
*
* TODO: This needs lots of explanation.
*/
+enum fid_source {
+ FID_FROM_OTHER,
+ FID_FROM_INODE,
+ FID_FROM_DENTRY,
+};

struct p9_fid {
struct p9_client *clnt;
u32 fid;
+ atomic_t count;
int mode;
struct p9_qid qid;
u32 iounit;
kuid_t uid;
+ enum fid_source s;

void *rdir;

diff --git a/net/9p/client.c b/net/9p/client.c
index 1a3f72bf45fc..51f1277ba58d 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -901,6 +901,8 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
fid->clnt = clnt;
fid->rdir = NULL;
fid->fid = 0;
+ fid->s = 0;
+ atomic_set(&fid->count, 0);

idr_preload(GFP_KERNEL);
spin_lock_irq(&clnt->lock);
--
2.17.1

2020-09-14 03:43:01

by Jianyong Wu

[permalink] [raw]
Subject: [PATCH RFC 1/4] fs/9p: fix create-unlink-getattr idiom

From: Eric Van Hensbergen <[email protected]>

Fixes several outstanding bug reports of not being able to getattr from an
open file after an unlink. This patch cleans up transient fids on an unlink
and will search open fids on a client if it detects a dentry that appears to
have been unlinked. This search is necessary because fstat does not pass fd
information through the VFS API to the filesystem, only the dentry which for
9p has an imperfect match to fids.

Inherent in this patch is also a fix for the qid handling on create/open
which apparently wasn't being set correctly and was necessary for the search
to succeed.

A possible optimization over this fix is to include accounting of open fids
with the inode in the private data (in a similar fashion to the way we track
transient fids with dentries). This would allow a much quicker search for
a matching open fid.

Signed-off-by: Eric Van Hensbergen <[email protected]>
(changed v9fs_fid_find_global to v9fs_fid_find_inode in comment)
Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: Jianyong Wu <[email protected]>

Change-Id: Ifd5c8cdca8b40216e3e7d021eb6d0afd750096e7
---
fs/9p/fid.c | 30 ++++++++++++++++++++++++++++++
fs/9p/vfs_inode.c | 4 ++++
net/9p/client.c | 5 ++++-
3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/9p/fid.c b/fs/9p/fid.c
index 3d681a2c2731..3304984c0fad 100644
--- a/fs/9p/fid.c
+++ b/fs/9p/fid.c
@@ -38,6 +38,33 @@ void v9fs_fid_add(struct dentry *dentry, struct p9_fid *fid)
spin_unlock(&dentry->d_lock);
}

+/**
+ * v9fs_fid_find_inode - search for a fid off of the client list
+ * @inode: return a fid pointing to a specific inode
+ * @uid: return a fid belonging to the specified user
+ *
+ */
+
+static struct p9_fid *v9fs_fid_find_inode(struct inode *inode, kuid_t uid)
+{
+ struct p9_client *clnt = v9fs_inode2v9ses(inode)->clnt;
+ struct p9_fid *fid, *fidptr, *ret = NULL;
+ unsigned long flags;
+
+ p9_debug(P9_DEBUG_VFS, " inode: %p\n", inode);
+
+ spin_lock_irqsave(&clnt->lock, flags);
+ list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) {
+ if (uid_eq(fid->uid, uid) &&
+ (inode->i_ino == v9fs_qid2ino(&fid->qid))) {
+ ret = fid;
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&clnt->lock, flags);
+ return ret;
+}
+
/**
* v9fs_fid_find - retrieve a fid that belongs to the specified uid
* @dentry: dentry to look for fid in
@@ -65,6 +92,9 @@ static struct p9_fid *v9fs_fid_find(struct dentry *dentry, kuid_t uid, int any)
}
}
spin_unlock(&dentry->d_lock);
+ } else {
+ if (dentry->d_inode)
+ ret = v9fs_fid_find_inode(dentry->d_inode, uid);
}

return ret;
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index ae0c38ad1fcb..31c2fddabb82 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -570,6 +570,10 @@ static int v9fs_remove(struct inode *dir, struct dentry *dentry, int flags)

v9fs_invalidate_inode_attr(inode);
v9fs_invalidate_inode_attr(dir);
+
+ /* invalidate all fids associated with dentry */
+ /* NOTE: This will not include open fids */
+ dentry->d_op->d_release(dentry);
}
return retval;
}
diff --git a/net/9p/client.c b/net/9p/client.c
index 09f1ec589b80..1a3f72bf45fc 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1219,7 +1219,7 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
if (nwname)
memmove(&fid->qid, &wqids[nwqids - 1], sizeof(struct p9_qid));
else
- fid->qid = oldfid->qid;
+ memmove(&fid->qid, &oldfid->qid, sizeof(struct p9_qid));

kfree(wqids);
return fid;
@@ -1272,6 +1272,7 @@ int p9_client_open(struct p9_fid *fid, int mode)
p9_is_proto_dotl(clnt) ? "RLOPEN" : "ROPEN", qid.type,
(unsigned long long)qid.path, qid.version, iounit);

+ memmove(&fid->qid, &qid, sizeof(struct p9_qid));
fid->mode = mode;
fid->iounit = iounit;

@@ -1317,6 +1318,7 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
(unsigned long long)qid->path,
qid->version, iounit);

+ memmove(&ofid->qid, qid, sizeof(struct p9_qid));
ofid->mode = mode;
ofid->iounit = iounit;

@@ -1362,6 +1364,7 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
(unsigned long long)qid.path,
qid.version, iounit);

+ memmove(&fid->qid, &qid, sizeof(struct p9_qid));
fid->mode = mode;
fid->iounit = iounit;

--
2.17.1

2020-09-14 05:58:01

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH RFC 4/4] 9p: fix race issue in fid contention.


Thanks for having a look a this!

Jianyong Wu wrote on Mon, Sep 14, 2020:
> Eric's and Greg's patch offer a mechanism to fix open-unlink-f*syscall
> bug in 9p. But there is race issue in fid comtention.
> As Greg's patch stores all of fids from opened files into according inode,
> so all the lookup fid ops can retrieve fid from inode preferentially. But
> there is no mechanism to handle the fid comtention issue. For example,
> there are two threads get the same fid in the same time and one of them
> clunk the fid before the other thread ready to discard the fid. In this
> scenario, it will lead to some fatal problems, even kernel core dump.

Ah, so that's what the problem was. Good job finding the problem!


> I introduce a mechanism to fix this race issue. A counter field introduced
> into p9_fid struct to store the reference counter to the fid. When a fid
> is allocated from the inode, the counter will increase, and will decrease
> at the end of its occupation. It is guaranteed that the fid won't be clunked
> before the reference counter go down to 0, then we can avoid the clunked
> fid to be used.
> As there is no need to retrieve fid from inode in all conditions, a enum value
> denotes the source of the fid is introduced to 9p_fid either. So we can only
> handle the reference counter as to the fid obtained from inode.

If there is no contention then an always-one refcount and an enum are
the same thing.
I'd rather not make a difference but make it a full-fledged refcount
thing; the enum in the code introduces quite a bit of code churn that
doesn't strike me as useful (and I don't like int arguments like this,
but if we can just do away with it there's no need to argue about that)

Not having exceptions for that will also make the code around
fid_atomic_dec much simpler: just have clunk do an atomic dec and only
do the actual clunk if that hit zero, and we should be able to get rid
of that helper?


Timing wise it's a bit awkward but I just dug out the async clunk
mechanism I wrote two years ago, that will conflict with this patch but
might also help a bit I guess?
I should probably have reposted them...


So to recap:
- Let's try some more straight-forward refcounting: set to 1 on alloc,
increment when it's found in fid.c, decrement in clunk and only send the
actual clunk if counter hit 0

- Ideally base yourself of my 9p-test branch to have async clunk:
https://github.com/martinetd/linux/commits/9p-test
I've been promising to push it to next this week™ for a couple of weeks
but if something is based on it I won't be able to delay this much
longer, it'll get pushed to 5.10 cycle anyway.
(I'll resend the patches to be clean)

- (please, no polling 10ms then leaking something!)


Thanks,
--
Dominique

2020-09-14 06:04:45

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH RFC 1/4] fs/9p: fix create-unlink-getattr idiom

Jianyong Wu wrote on Mon, Sep 14, 2020:
> Signed-off-by: Greg Kurz <[email protected]>

Just on a note on that mail: [email protected] has no longer been
working for a while, probably want to update to [email protected] in both
first two patches.
Greg, sorry I had forgotten to fix Cc earlier, can you confirm?


I'll (re)do a proper review of the first three patches again in a bit
but iirc they looked good on paper, will resend a mail if required.

--
Dominique

2020-09-14 06:34:05

by Dominique Martinet

[permalink] [raw]
Subject: Re: [V9fs-developer] [PATCH RFC 4/4] 9p: fix race issue in fid contention.

Dominique Martinet wrote on Mon, Sep 14, 2020:
> Jianyong Wu wrote on Mon, Sep 14, 2020:
> - Ideally base yourself of my 9p-test branch to have async clunk:
> https://github.com/martinetd/linux/commits/9p-test
> I've been promising to push it to next this week™ for a couple of weeks
> but if something is based on it I won't be able to delay this much
> longer, it'll get pushed to 5.10 cycle anyway.
> (I'll resend the patches to be clean)
>
>> tests:
>> race issue test from the old test case:
>> for file in {01..50}; do touch f.${file}; done
>> seq 1 1000 | xargs -n 1 -P 50 -I{} cat f.* > /dev/null

hmpf, so that made me insist a bit on this test on my patch and I see
a problem with that as well. The me from a few years ago was good!

With that said I'll want to work a bit more on this, so feel free to
base off master and I'll deal with rebase if required.

Part of me thinks it's the same bug that will be fixed with refcounting
and I just made it easier to hit, but I'm honestly unsure at this point
and testing would basically mean I just code what I asked you to...

Well, let me know if you want me to do the refcounting, but I'd rather
let you finish what you started. If possible put the patch first in the
series so commits can be tested independently.


Thanks,
--
Dominique

2020-09-14 07:36:16

by Jianyong Wu

[permalink] [raw]
Subject: RE: [PATCH RFC 4/4] 9p: fix race issue in fid contention.

Hi Dominique,

> -----Original Message-----
> From: Dominique Martinet <[email protected]>
> Sent: Monday, September 14, 2020 1:56 PM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; [email protected]; v9fs-
> [email protected]; [email protected]; Justin He
> <[email protected]>; Greg Kurz <[email protected]>
> Subject: Re: [PATCH RFC 4/4] 9p: fix race issue in fid contention.
>
>
> Thanks for having a look a this!
>
> Jianyong Wu wrote on Mon, Sep 14, 2020:
> > Eric's and Greg's patch offer a mechanism to fix open-unlink-f*syscall
> > bug in 9p. But there is race issue in fid comtention.
> > As Greg's patch stores all of fids from opened files into according
> > inode, so all the lookup fid ops can retrieve fid from inode
> > preferentially. But there is no mechanism to handle the fid comtention
> > issue. For example, there are two threads get the same fid in the same
> > time and one of them clunk the fid before the other thread ready to
> > discard the fid. In this scenario, it will lead to some fatal problems, even
> kernel core dump.
>
> Ah, so that's what the problem was. Good job finding the problem!
>
Thanks! Very pleasure.
>
> > I introduce a mechanism to fix this race issue. A counter field
> > introduced into p9_fid struct to store the reference counter to the
> > fid. When a fid is allocated from the inode, the counter will
> > increase, and will decrease at the end of its occupation. It is
> > guaranteed that the fid won't be clunked before the reference counter
> > go down to 0, then we can avoid the clunked fid to be used.
> > As there is no need to retrieve fid from inode in all conditions, a
> > enum value denotes the source of the fid is introduced to 9p_fid
> > either. So we can only handle the reference counter as to the fid obtained
> from inode.
>
> If there is no contention then an always-one refcount and an enum are the
> same thing.
> I'd rather not make a difference but make it a full-fledged refcount thing; the
> enum in the code introduces quite a bit of code churn that doesn't strike me
> as useful (and I don't like int arguments like this, but if we can just do away
> with it there's no need to argue about that)
>
> Not having exceptions for that will also make the code around
> fid_atomic_dec much simpler: just have clunk do an atomic dec and only do
> the actual clunk if that hit zero, and we should be able to get rid of that
> helper?
>
Sorry, I think always-one refcount won't work at this point, as the fid will be clunked only by
File context itself not the every consumer of every fid. We can't decrease the refcounter at just one
static point. Am I wrong?
This enum value is not functionally necessary, but I think it can reduce the contention of fid, as there are
really lots of scenarios that fid from inode is not necessary.

>
> Timing wise it's a bit awkward but I just dug out the async clunk mechanism I
> wrote two years ago, that will conflict with this patch but might also help a bit
> I guess?
> I should probably have reposted them...
>
Interesting!

>
> So to recap:
> - Let's try some more straight-forward refcounting: set to 1 on alloc,
> increment when it's found in fid.c, decrement in clunk and only send the
> actual clunk if counter hit 0
>
it may not work, I think.

> - Ideally base yourself of my 9p-test branch to have async clunk:
> https://github.com/martinetd/linux/commits/9p-test
> I've been promising to push it to next this week™ for a couple of weeks but if
> something is based on it I won't be able to delay this much longer, it'll get
> pushed to 5.10 cycle anyway.
> (I'll resend the patches to be clean)
>
> - (please, no polling 10ms then leaking something!)
>
Yeah, it will lead fid to leak sometimes, unfortunately, I'm afraid that the CPU may be stuck here. we
must wait here (v9fs_dir_release) for the counter down to 0, as this is the only place to release the fid.
That's the problem.

Thanks
Jianyong
> Thanks,
> --
> Dominique
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

2020-09-14 07:54:02

by Jianyong Wu

[permalink] [raw]
Subject: RE: [V9fs-developer] [PATCH RFC 4/4] 9p: fix race issue in fid contention.

Hi Dominique,

> -----Original Message-----
> From: Dominique Martinet <[email protected]>
> Sent: Monday, September 14, 2020 2:32 PM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; Justin He <[email protected]>; [email protected];
> [email protected]; [email protected]; Greg
> Kurz <[email protected]>
> Subject: Re: [V9fs-developer] [PATCH RFC 4/4] 9p: fix race issue in fid
> contention.
>
> Dominique Martinet wrote on Mon, Sep 14, 2020:
> > Jianyong Wu wrote on Mon, Sep 14, 2020:
> > - Ideally base yourself of my 9p-test branch to have async clunk:
> > https://github.com/martinetd/linux/commits/9p-test
> > I've been promising to push it to next this week™ for a couple of
> > weeks but if something is based on it I won't be able to delay this
> > much longer, it'll get pushed to 5.10 cycle anyway.
> > (I'll resend the patches to be clean)
> >
> >> tests:
> >> race issue test from the old test case:
> >> for file in {01..50}; do touch f.${file}; done seq 1 1000 | xargs -n
> >> 1 -P 50 -I{} cat f.* > /dev/null
>
> hmpf, so that made me insist a bit on this test on my patch and I see a
> problem with that as well. The me from a few years ago was good!
>
> With that said I'll want to work a bit more on this, so feel free to base off
> master and I'll deal with rebase if required.
>
> Part of me thinks it's the same bug that will be fixed with refcounting and I
> just made it easier to hit, but I'm honestly unsure at this point and testing
> would basically mean I just code what I asked you to...
>
> Well, let me know if you want me to do the refcounting, but I'd rather let you
> finish what you started.

Thanks, I'm happy to work this.
>If possible put the patch first in the series so commits
> can be tested independently.

Ah, this patch depends on the previous patches, how can I put it as the first of the series?

Thanks
Jianyong

> Thanks,
> --
> Dominique
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

2020-09-14 08:14:05

by Greg Kurz

[permalink] [raw]
Subject: Re: [PATCH RFC 1/4] fs/9p: fix create-unlink-getattr idiom

On Mon, 14 Sep 2020 08:00:36 +0200
Dominique Martinet <[email protected]> wrote:

> Jianyong Wu wrote on Mon, Sep 14, 2020:
> > Signed-off-by: Greg Kurz <[email protected]>
>
> Just on a note on that mail: [email protected] has no longer been
> working for a while, probably want to update to [email protected] in both
> first two patches.
> Greg, sorry I had forgotten to fix Cc earlier, can you confirm?
>

Np :) and yes I confirm that [email protected] is to be used now.

14a36a435343 ("mailmap: add entry for Greg Kurz")

>
> I'll (re)do a proper review of the first three patches again in a bit
> but iirc they looked good on paper, will resend a mail if required.
>

2020-09-14 08:33:30

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH RFC 4/4] 9p: fix race issue in fid contention.

Jianyong Wu wrote on Mon, Sep 14, 2020:
> > Not having exceptions for that will also make the code around
> > fid_atomic_dec much simpler: just have clunk do an atomic dec and only do
> > the actual clunk if that hit zero, and we should be able to get rid of that
> > helper?
>
> Sorry, I think always-one refcount won't work at this point, as the
> fid will be clunked only by file context itself not the every consumer
> of every fid. We can't decrease the refcounter at just one static
> point.
> Am I wrong?

I don't understand the "We can't decrease the refcounter at just one
static point".
Basically everywhere you added a fid_atomic_dec() will just need to be
changed to clunk -- the basic rule of refcounting is that anywhere you
take a ref you need to put it back.

All these places take a fid to do some RPC already so it's not a problem
to add a clunk and do one more; especially since the "clunk" will just
be just a deref.
For consistency I'd advocate for the kref API as we use that for
requests already; it might be better to rename the clunk calls to
p9_fid_put or something but I think that's more churn than it's
worth....


Is there anywhere you think cannot do that?

> This enum value is not functionally necessary, but I think it can
> reduce the contention of fid, as there are really lots of scenarios
> that fid from inode is not necessary.

I really don't think it makes things slower if done correctly (e.g. no
waiting as currently done but the last deref does the actual clunk), and
would rather keep code simpler unless the difference is big (so would
need to do an actual benchmark of both if you're convinced it helps) --
sorry.

>> If possible put the patch first in the series so commits can be
>> tested independently.
>
> Ah, this patch depends on the previous patches, how can I put it as
> the first of the series?

Basically build the logic in the first patch, then either apply the
other three as close as they currently are as possible and add the
missing refcalls in a new patch or incorporate the refcounting in them
as well.
It's fine if you keep it it last, that was just a greedy request on my
part to be able to test async clunk more easily ; forget about it.

--
Dominique

2020-09-14 08:44:44

by Greg Kurz

[permalink] [raw]
Subject: Re: [V9fs-developer] [PATCH RFC 0/4] 9p: fix open-unlink-f*syscall bug

On Mon, 14 Sep 2020 11:37:50 +0800
Jianyong Wu <[email protected]> wrote:

> open-unlink-f*syscall bug is a well-known bug in 9p, we try to fix the bug
> in this patch set.
> I take Eric's and Greg's patches which constiute the 1/4 - 3/4 of this patch
> set as the main frame of the solution. In patch 4/4, I fix the fid race issue
> exists in Greg's patch.
>

IIRC some patches were needed on the QEMU side as well... I'm spending
less time on 9pfs in QEMU, so Cc'ing the new maintainer:

Christian Schoenebeck <[email protected]>

> Eric Van Hensbergen (1):
> fs/9p: fix create-unlink-getattr idiom
>
> Greg Kurz (1):
> fs/9p: search open fids first
>
> Jianyong Wu (2):
> fs/9p: track open fids
> 9p: fix race issue in fid contention.
>
> fs/9p/fid.c | 72 +++++++++++++++++++++++++++++++++++------
> fs/9p/fid.h | 25 +++++++++++---
> fs/9p/vfs_dentry.c | 2 +-
> fs/9p/vfs_dir.c | 20 ++++++++++--
> fs/9p/vfs_file.c | 3 +-
> fs/9p/vfs_inode.c | 52 +++++++++++++++++++++--------
> fs/9p/vfs_inode_dotl.c | 44 ++++++++++++++++---------
> fs/9p/vfs_super.c | 7 ++--
> fs/9p/xattr.c | 18 ++++++++---
> include/net/9p/client.h | 8 +++++
> net/9p/client.c | 7 +++-
> 11 files changed, 206 insertions(+), 52 deletions(-)
>

2020-09-14 12:40:36

by Jianyong Wu

[permalink] [raw]
Subject: RE: [PATCH RFC 4/4] 9p: fix race issue in fid contention.

Hi Dominique,

> -----Original Message-----
> From: Dominique Martinet <[email protected]>
> Sent: Monday, September 14, 2020 4:32 PM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; [email protected]; v9fs-
> [email protected]; [email protected]; Justin He
> <[email protected]>; Greg Kurz <[email protected]>
> Subject: Re: [PATCH RFC 4/4] 9p: fix race issue in fid contention.
>
> Jianyong Wu wrote on Mon, Sep 14, 2020:
> > > Not having exceptions for that will also make the code around
> > > fid_atomic_dec much simpler: just have clunk do an atomic dec and
> > > only do the actual clunk if that hit zero, and we should be able to
> > > get rid of that helper?
> >
> > Sorry, I think always-one refcount won't work at this point, as the
> > fid will be clunked only by file context itself not the every consumer
> > of every fid. We can't decrease the refcounter at just one static
> > point.
> > Am I wrong?
>
> I don't understand the "We can't decrease the refcounter at just one static
> point".
> Basically everywhere you added a fid_atomic_dec() will just need to be
> changed to clunk -- the basic rule of refcounting is that anywhere you take a
> ref you need to put it back.
>
Oh, maybe I just miss your point. It is ok to put the fid_atomic_dec() into p9_client_clunk() and
Let the clunk replace the refcount dec.

> All these places take a fid to do some RPC already so it's not a problem to add
> a clunk and do one more; especially since the "clunk" will just be just a deref.
> For consistency I'd advocate for the kref API as we use that for requests
> already; it might be better to rename the clunk calls to p9_fid_put or
> something but I think that's more churn than it's worth....
>
Ok, I see.
>
> Is there anywhere you think cannot do that?
>
No.
> > This enum value is not functionally necessary, but I think it can
> > reduce the contention of fid, as there are really lots of scenarios
> > that fid from inode is not necessary.
>
> I really don't think it makes things slower if done correctly (e.g. no waiting as
> currently done but the last deref does the actual clunk), and would rather
> keep code simpler unless the difference is big (so would need to do an actual
> benchmark of both if you're convinced it helps) -- sorry.
>
Ok, fair enough.

> >> If possible put the patch first in the series so commits can be
> >> tested independently.
> >
> > Ah, this patch depends on the previous patches, how can I put it as
> > the first of the series?
>
> Basically build the logic in the first patch, then either apply the other three as
> close as they currently are as possible and add the missing refcalls in a new
> patch or incorporate the refcounting in them as well.
> It's fine if you keep it it last, that was just a greedy request on my part to be
> able to test async clunk more easily ; forget about it.

Ok, keep this in original state is easy for me.

Thanks
Jianyong
>
> --
> Dominique
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

2020-09-14 12:43:40

by Jianyong Wu

[permalink] [raw]
Subject: RE: [V9fs-developer] [PATCH RFC 0/4] 9p: fix open-unlink-f*syscall bug



> -----Original Message-----
> From: Greg Kurz <[email protected]>
> Sent: Monday, September 14, 2020 4:36 PM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; v9fs-
> [email protected]; Justin He <[email protected]>; linux-
> [email protected]; Christian Schoenebeck
> <[email protected]>
> Subject: Re: [V9fs-developer] [PATCH RFC 0/4] 9p: fix open-unlink-f*syscall
> bug
>
> On Mon, 14 Sep 2020 11:37:50 +0800
> Jianyong Wu <[email protected]> wrote:
>
> > open-unlink-f*syscall bug is a well-known bug in 9p, we try to fix the
> > bug in this patch set.
> > I take Eric's and Greg's patches which constiute the 1/4 - 3/4 of this
> > patch set as the main frame of the solution. In patch 4/4, I fix the
> > fid race issue exists in Greg's patch.
> >
>
> IIRC some patches were needed on the QEMU side as well... I'm spending
> less time on 9pfs in QEMU, so Cc'ing the new maintainer:
>
> Christian Schoenebeck <[email protected]>
>
Ok, very kind of you.

Thanks
Jianyong
> > Eric Van Hensbergen (1):
> > fs/9p: fix create-unlink-getattr idiom
> >
> > Greg Kurz (1):
> > fs/9p: search open fids first
> >
> > Jianyong Wu (2):
> > fs/9p: track open fids
> > 9p: fix race issue in fid contention.
> >
> > fs/9p/fid.c | 72 +++++++++++++++++++++++++++++++++++------
> > fs/9p/fid.h | 25 +++++++++++---
> > fs/9p/vfs_dentry.c | 2 +-
> > fs/9p/vfs_dir.c | 20 ++++++++++--
> > fs/9p/vfs_file.c | 3 +-
> > fs/9p/vfs_inode.c | 52 +++++++++++++++++++++--------
> > fs/9p/vfs_inode_dotl.c | 44 ++++++++++++++++---------
> > fs/9p/vfs_super.c | 7 ++--
> > fs/9p/xattr.c | 18 ++++++++---
> > include/net/9p/client.h | 8 +++++
> > net/9p/client.c | 7 +++-
> > 11 files changed, 206 insertions(+), 52 deletions(-)
> >

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

2020-09-14 17:54:04

by Christian Schoenebeck

[permalink] [raw]
Subject: Re: [V9fs-developer] [PATCH RFC 0/4] 9p: fix open-unlink-f*syscall bug

On Montag, 14. September 2020 10:35:46 CEST Greg Kurz wrote:
> On Mon, 14 Sep 2020 11:37:50 +0800
>
> Jianyong Wu <[email protected]> wrote:
> > open-unlink-f*syscall bug is a well-known bug in 9p, we try to fix the bug
> > in this patch set.
> > I take Eric's and Greg's patches which constiute the 1/4 - 3/4 of this
> > patch set as the main frame of the solution. In patch 4/4, I fix the fid
> > race issue exists in Greg's patch.
>
> IIRC some patches were needed on the QEMU side as well... I'm spending
> less time on 9pfs in QEMU, so Cc'ing the new maintainer:
>
> Christian Schoenebeck <[email protected]>

AFAICS this is about this old bug report:
https://bugs.launchpad.net/qemu/+bug/1336794

So yes, looks like this also requires changes to the 9pfs 'local' fs driver on
QEMU side:
https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07586.html

Eric, Greg, would there be an easy way to establish QEMU test cases running
the 9pfs 'local' fs driver? Right now we only have 9pfs qtest cases for QEMU
which can only use the 'synth' driver, which is not helpful for such kind of
issues.

Best regards,
Christian Schoenebeck


2020-09-18 09:01:22

by Jianyong Wu

[permalink] [raw]
Subject: RE: [PATCH RFC 4/4] 9p: fix race issue in fid contention.

Hi Dominique,

> -----Original Message-----
> From: Dominique Martinet <[email protected]>
> Sent: Monday, September 14, 2020 4:32 PM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; [email protected]; v9fs-
> [email protected]; [email protected]; Justin He
> <[email protected]>; Greg Kurz <[email protected]>
> Subject: Re: [PATCH RFC 4/4] 9p: fix race issue in fid contention.
>
> Jianyong Wu wrote on Mon, Sep 14, 2020:
> > > Not having exceptions for that will also make the code around
> > > fid_atomic_dec much simpler: just have clunk do an atomic dec and
> > > only do the actual clunk if that hit zero, and we should be able to
> > > get rid of that helper?
> >
> > Sorry, I think always-one refcount won't work at this point, as the
> > fid will be clunked only by file context itself not the every consumer
> > of every fid. We can't decrease the refcounter at just one static
> > point.
> > Am I wrong?
>
> I don't understand the "We can't decrease the refcounter at just one static
> point".
> Basically everywhere you added a fid_atomic_dec() will just need to be
> changed to clunk -- the basic rule of refcounting is that anywhere you take a
> ref you need to put it back.
>
> All these places take a fid to do some RPC already so it's not a problem to add
> a clunk and do one more; especially since the "clunk" will just be just a deref.
> For consistency I'd advocate for the kref API as we use that for requests
> already; it might be better to rename the clunk calls to p9_fid_put or
> something but I think that's more churn than it's worth....
>
If we move the counter decrease code into p9_client_clunk and put it instead of fid_atomic_dec, we need delete fid off the inode where it stores in p9_client_clunk.
But there is no way can we acquire the inode in p9_client_clunk. Do you have any idea? I think introduce another parameter for p9_client_clunk
Is not graceful.

Thanks
Jianyong Wu
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

2020-09-18 09:38:55

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH RFC 4/4] 9p: fix race issue in fid contention.

Jianyong Wu wrote on Fri, Sep 18, 2020:
> If we move the counter decrease code into p9_client_clunk and put it
> instead of fid_atomic_dec, we need delete fid off the inode where it
> stores in p9_client_clunk.
> But there is no way can we acquire the inode in p9_client_clunk. Do
> you have any idea? I think introduce another parameter for
> p9_client_clunk
> Is not graceful.

You cannot write code about the inode in p9_client_clunk, the way the
code is split fs/9p can refer to net/9p but not the other way around
(module-wise, 9p can refer to 9pnet but 9pnet cannot refer to 9p or we
would have cyclic dependencies)

However I don't see what bothers you.
v9fs_dir_release can remove the fid from the inode as it does currently
and just clunk immediately afterwards.


If another user of the fid had gotten the fid from the inode previously,
it has a ref, so the fid will not be actually clunked then but it will
be clunked later when it is done being used -- that is perfectly fine ?

p9_client_clunk should not have to worry about anything in the vfs.

--
Dominique

2020-09-18 10:09:29

by Jianyong Wu

[permalink] [raw]
Subject: RE: [PATCH RFC 4/4] 9p: fix race issue in fid contention.



> -----Original Message-----
> From: Dominique Martinet <[email protected]>
> Sent: Friday, September 18, 2020 5:35 PM
> To: Jianyong Wu <[email protected]>
> Cc: [email protected]; [email protected]; v9fs-
> [email protected]; [email protected]; Justin He
> <[email protected]>; Greg Kurz <[email protected]>
> Subject: Re: [PATCH RFC 4/4] 9p: fix race issue in fid contention.
>
> Jianyong Wu wrote on Fri, Sep 18, 2020:
> > If we move the counter decrease code into p9_client_clunk and put it
> > instead of fid_atomic_dec, we need delete fid off the inode where it
> > stores in p9_client_clunk.
> > But there is no way can we acquire the inode in p9_client_clunk. Do
> > you have any idea? I think introduce another parameter for
> > p9_client_clunk Is not graceful.
>
> You cannot write code about the inode in p9_client_clunk, the way the code
> is split fs/9p can refer to net/9p but not the other way around (module-wise,
> 9p can refer to 9pnet but 9pnet cannot refer to 9p or we would have cyclic
> dependencies)
>
> However I don't see what bothers you.
> v9fs_dir_release can remove the fid from the inode as it does currently and
> just clunk immediately afterwards.
>
>
> If another user of the fid had gotten the fid from the inode previously, it has
> a ref, so the fid will not be actually clunked then but it will be clunked later
> when it is done being used -- that is perfectly fine ?
>
> p9_client_clunk should not have to worry about anything in the vfs.
>
Yeah, I see. That's it.

Thanks
Jianyong
> --
> Dominique
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.