2022-03-22 18:33:50

by Dharmendra Singh

Subject: [PATCH v2 0/2] FUSE: Implement atomic lookup + open

In FUSE, uncached lookups are currently expensive over the wire: they
add latency and put load on (metadata) servers when thousands of
clients issue them. In some cases these lookup calls can be avoided.
The following two patches address this.

The first patch handles the case where we open a file/dir for the first
time, or create a file (O_CREAT), and do a lookup on it first. After the
lookup, we make a second call into libfuse to open the file. These two
separate calls can be combined into a single call into libfuse.

The second patch handles the case where we open an already existing file
(positive dentry). Before the open call we revalidate the inode, and this
revalidation does a lookup on the file to verify the inode. This separate
lookup can also be avoided (for non-directories) and combined with the
open call into libfuse.

Here is the link to the libfuse pull request that implements atomic open:
https://github.com/libfuse/libfuse/pull/644

I am going to post performance results shortly.


Dharmendra Singh (2):
FUSE: Implement atomic lookup + open
FUSE: Avoid lookup in d_revalidate()

fs/fuse/dir.c | 179 +++++++++++++++++++++++++++++++++-----
fs/fuse/file.c | 30 ++++++-
fs/fuse/fuse_i.h | 13 ++-
fs/fuse/inode.c | 4 +-
fs/fuse/ioctl.c | 2 +-
include/uapi/linux/fuse.h | 2 +
6 files changed, 204 insertions(+), 26 deletions(-)

--
2.17.1


2022-03-23 07:16:31

by Dharmendra Singh

Subject: [PATCH v2 1/2] FUSE: Implement atomic lookup + open

From: Dharmendra Singh <[email protected]>

There are a couple of places in FUSE where we do aggressive
lookups.
1) When we create a file (O_CREAT), we do a lookup for a
file that does not exist. It is very likely that the file
does not exist yet, since O_CREAT is passed to open(). This
lookup can be avoided and performed as part of the open
call into libfuse.

2) When there is a normal open for a file/dir (dentry is
new/negative). Since we are going to open the file/dir in
user space anyway, avoid the separate lookup call into
libfuse and combine it with the open.

This lookup + open in a single call to libfuse, and finally to
user space, is named atomic open. User space is expected to
open the file and fill in the attributes, which are then used
to instantiate/revalidate the inode in the kernel cache.

Signed-off-by: Dharmendra Singh <[email protected]>
---
v2 patch includes:
- Disabled O_CREAT atomicity when the user space file system
does not implement atomic_open. In principle, lookups for
O_CREAT could also be optimized out, but that risks breaking
existing FUSE file systems. Those file systems might not
expect open O_CREAT calls for existing files, as such calls
have so far been avoided because the lookup was done first.

fs/fuse/dir.c | 113 +++++++++++++++++++++++++++++++-------
fs/fuse/fuse_i.h | 3 +
fs/fuse/inode.c | 4 +-
include/uapi/linux/fuse.h | 2 +
4 files changed, 101 insertions(+), 21 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 656e921f3506..b2613eb87a4e 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -516,16 +516,14 @@ static int get_security_context(struct dentry *entry, umode_t mode,
}

/*
- * Atomic create+open operation
- *
- * If the filesystem doesn't support this, then fall back to separate
- * 'mknod' + 'open' requests.
+ * Perform create + open or lookup + open in single call to libfuse
*/
-static int fuse_create_open(struct inode *dir, struct dentry *entry,
- struct file *file, unsigned int flags,
- umode_t mode)
+static int fuse_atomic_open_common(struct inode *dir, struct dentry *entry,
+ struct dentry **alias, struct file *file,
+ unsigned int flags, umode_t mode,
+ uint32_t opcode)
{
- int err;
+ bool create = (opcode == FUSE_CREATE ? true : false);
struct inode *inode;
struct fuse_mount *fm = get_fuse_mount(dir);
FUSE_ARGS(args);
@@ -535,11 +533,16 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
struct fuse_entry_out outentry;
struct fuse_inode *fi;
struct fuse_file *ff;
+ struct dentry *res = NULL;
void *security_ctx = NULL;
u32 security_ctxlen;
+ int err;
+
+ if (alias)
+ *alias = NULL;

/* Userspace expects S_IFREG in create mode */
- BUG_ON((mode & S_IFMT) != S_IFREG);
+ BUG_ON(create && (mode & S_IFMT) != S_IFREG);

forget = fuse_alloc_forget();
err = -ENOMEM;
@@ -554,7 +557,13 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
if (!fm->fc->dont_mask)
mode &= ~current_umask();

- flags &= ~O_NOCTTY;
+ if (!create) {
+ flags = flags & ~(O_CREAT | O_EXCL | O_NOCTTY);
+ if (!fm->fc->atomic_o_trunc)
+ flags &= ~O_TRUNC;
+ } else {
+ flags &= ~O_NOCTTY;
+ }
memset(&inarg, 0, sizeof(inarg));
memset(&outentry, 0, sizeof(outentry));
inarg.flags = flags;
@@ -566,7 +575,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
}

- args.opcode = FUSE_CREATE;
+ args.opcode = opcode;
args.nodeid = get_node_id(dir);
args.in_numargs = 2;
args.in_args[0].size = sizeof(inarg);
@@ -595,8 +604,12 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
if (err)
goto out_free_ff;

+ err = -ENOENT;
+ if (!S_ISDIR(outentry.attr.mode) && !outentry.nodeid)
+ goto out_free_ff;
+
err = -EIO;
- if (!S_ISREG(outentry.attr.mode) || invalid_nodeid(outentry.nodeid) ||
+ if (invalid_nodeid(outentry.nodeid) ||
fuse_invalid_attr(&outentry.attr))
goto out_free_ff;

@@ -612,10 +625,32 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
err = -ENOMEM;
goto out_err;
}
+ if (!fm->fc->do_atomic_open)
+ d_instantiate(entry, inode);
+ else {
+ res = d_splice_alias(inode, entry);
+ if (res) {
+ /* Close the file in user space, but do not unlink it,
+ * if it was created - with network file systems other
+ * clients might have already accessed it.
+ */
+ if (IS_ERR(res)) {
+ fi = get_fuse_inode(inode);
+ fuse_sync_release(fi, ff, flags);
+ fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
+ err = PTR_ERR(res);
+ goto out_err;
+ } else {
+ entry = res;
+ if (alias)
+ *alias = res;
+ }
+ }
+ }
kfree(forget);
- d_instantiate(entry, inode);
fuse_change_entry_timeout(entry, &outentry);
- fuse_dir_changed(dir);
+ if (create)
+ fuse_dir_changed(dir);
err = finish_open(file, entry, generic_file_open);
if (err) {
fi = get_fuse_inode(inode);
@@ -634,20 +669,54 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
return err;
}

+/*
+ * Atomic lookup + open
+ */
+
+static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
+ struct dentry **alias, struct file *file,
+ unsigned int flags, umode_t mode)
+{
+ int err;
+ struct fuse_conn *fc = get_fuse_conn(dir);
+
+ if (!fc->do_atomic_open)
+ return -ENOSYS;
+ err = fuse_atomic_open_common(dir, entry, alias, file,
+ flags, mode, FUSE_ATOMIC_OPEN);
+ return err;
+}
+
static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
umode_t, dev_t);
static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
struct file *file, unsigned flags,
umode_t mode)
{
- int err;
+ bool create = (flags & O_CREAT) ? true : false;
struct fuse_conn *fc = get_fuse_conn(dir);
- struct dentry *res = NULL;
+ struct dentry *res = NULL, *alias = NULL;
+ int err;

if (fuse_is_bad(dir))
return -EIO;

- if (d_in_lookup(entry)) {
+ /* Atomic lookup + open - dentry might be File or Directory */
+ if (!create) {
+ err = fuse_do_atomic_open(dir, entry, &alias, file, flags, mode);
+ res = alias;
+ if (!err)
+ goto out_dput;
+ else if (err != -ENOSYS)
+ goto no_open;
+ }
+ /* ENOSYS fall back - user space does not have full atomic open.*/
+
+ /* O_CREAT could be optimized already, but we fear to break some
+ * userspace implementations therefore optimize in case of atomic
+ * open only.
+ */
+ if (!fc->do_atomic_open && d_in_lookup(entry)) {
res = fuse_lookup(dir, entry, 0);
if (IS_ERR(res))
return PTR_ERR(res);
@@ -656,7 +725,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
entry = res;
}

- if (!(flags & O_CREAT) || d_really_is_positive(entry))
+ if (!create || d_really_is_positive(entry))
goto no_open;

/* Only creates */
@@ -664,8 +733,12 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,

if (fc->no_create)
goto mknod;
-
- err = fuse_create_open(dir, entry, file, flags, mode);
+ /*
+ * If the filesystem doesn't support atomic create + open, then fall
+ * back to separate 'mknod' + 'open' requests.
+ */
+ err = fuse_atomic_open_common(dir, entry, NULL, file, flags, mode,
+ FUSE_CREATE);
if (err == -ENOSYS) {
fc->no_create = 1;
goto mknod;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e8e59fbdefeb..e4dc68a90b28 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -669,6 +669,9 @@ struct fuse_conn {
/** Is open/release not implemented by fs? */
unsigned no_open:1;

+ /** Does the filesystem support atomic open? */
+ unsigned do_atomic_open:1;
+
/** Is opendir/releasedir not implemented by fs? */
unsigned no_opendir:1;

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ee846ce371d8..5f667de69115 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1190,6 +1190,8 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
fc->setxattr_ext = 1;
if (flags & FUSE_SECURITY_CTX)
fc->init_security = 1;
+ if (flags & FUSE_DO_ATOMIC_OPEN)
+ fc->do_atomic_open = 1;
} else {
ra_pages = fc->max_read / PAGE_SIZE;
fc->no_lock = 1;
@@ -1235,7 +1237,7 @@ void fuse_send_init(struct fuse_mount *fm)
FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
- FUSE_SECURITY_CTX;
+ FUSE_SECURITY_CTX | FUSE_DO_ATOMIC_OPEN;
#ifdef CONFIG_FUSE_DAX
if (fm->fc->dax)
flags |= FUSE_MAP_ALIGNMENT;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d6ccee961891..a28dd60078ff 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -389,6 +389,7 @@ struct fuse_file_lock {
/* bits 32..63 get shifted down 32 bits into the flags2 field */
#define FUSE_SECURITY_CTX (1ULL << 32)
#define FUSE_HAS_INODE_DAX (1ULL << 33)
+#define FUSE_DO_ATOMIC_OPEN (1ULL << 34)

/**
* CUSE INIT request/reply flags
@@ -537,6 +538,7 @@ enum fuse_opcode {
FUSE_SETUPMAPPING = 48,
FUSE_REMOVEMAPPING = 49,
FUSE_SYNCFS = 50,
+ FUSE_ATOMIC_OPEN = 51,

/* CUSE specific operations */
CUSE_INIT = 4096,
--
2.17.1

2022-03-29 12:46:12

by Dharmendra Singh

Subject: Re: [PATCH v2 0/2] FUSE: Implement atomic lookup + open

On Tue, Mar 22, 2022 at 5:22 PM Dharmendra Singh <[email protected]> wrote:
>
> In FUSE, as of now, uncached lookups are expensive over the wire.
> E.g additional latencies and stressing (meta data) servers from
> thousands of clients. These lookup calls possibly can be avoided
> in some cases. Incoming two patches addresses this issue.
>
> First patch handles the case where we open first time a file/dir or create
> a file (O_CREAT) but do a lookup first on it. After lookup is performed
> we make another call into libfuse to open the file. Now these two separate
> calls into libfuse can be combined and performed as a single call into
> libfuse.
>
> Second patch handles the case when we are opening an already existing file
> (positive dentry). Before this open call, we re-validate the inode and
> this re-validation does a lookup on the file and verify the inode.
> This separate lookup also can be avoided (for non-dir) and combined
> with open call into libfuse.
>
> Here is the link to the libfuse pull request which implements atomic open
> https://github.com/libfuse/libfuse/pull/644
>
> I am going to post performance results shortly.
>
>
> Dharmendra Singh (2):
> FUSE: Implement atomic lookup + open
> FUSE: Avoid lookup in d_revalidate()

A gentle reminder to look into the above patch set.

2022-04-07 20:00:09

by Dharmendra Singh

Subject: Re: [PATCH v2 0/2] FUSE: Implement atomic lookup + open

On Tue, Mar 29, 2022 at 4:37 PM Dharmendra Hans <[email protected]> wrote:
>
> On Tue, Mar 22, 2022 at 5:22 PM Dharmendra Singh <[email protected]> wrote:
> >
> > In FUSE, as of now, uncached lookups are expensive over the wire.
> > E.g additional latencies and stressing (meta data) servers from
> > thousands of clients. These lookup calls possibly can be avoided
> > in some cases. Incoming two patches addresses this issue.
> >
> > First patch handles the case where we open first time a file/dir or create
> > a file (O_CREAT) but do a lookup first on it. After lookup is performed
> > we make another call into libfuse to open the file. Now these two separate
> > calls into libfuse can be combined and performed as a single call into
> > libfuse.
> >
> > Second patch handles the case when we are opening an already existing file
> > (positive dentry). Before this open call, we re-validate the inode and
> > this re-validation does a lookup on the file and verify the inode.
> > This separate lookup also can be avoided (for non-dir) and combined
> > with open call into libfuse.
> >
> > Here is the link to the libfuse pull request which implements atomic open
> > https://github.com/libfuse/libfuse/pull/644
> >
> > I am going to post performance results shortly.
> >
> >
> > Dharmendra Singh (2):
> > FUSE: Implement atomic lookup + open
> > FUSE: Avoid lookup in d_revalidate()
>
> A gentle reminder to look into the above patch set.
Sending a gentle reminder again to look into the requested patches.

2022-04-22 19:32:34

by Miklos Szeredi

Subject: Re: [PATCH v2 1/2] FUSE: Implement atomic lookup + open

On Tue, 22 Mar 2022 at 12:52, Dharmendra Singh <[email protected]> wrote:
>
> From: Dharmendra Singh <[email protected]>
>
> There are couple of places in FUSE where we do agressive
> lookup.
> 1) When we go for creating a file (O_CREAT), we do lookup
> for non-existent file. It is very much likely that file
> does not exists yet as O_CREAT is passed to open(). This
> lookup can be avoided and can be performed as part of
> open call into libfuse.
>
> 2) When there is normal open for file/dir (dentry is
> new/negative). In this case since we are anyway going to open
> the file/dir with USER space, avoid this separate lookup call
> into libfuse and combine it with open.
>
> This lookup + open in single call to libfuse and finally to
> USER space has been named as atomic open. It is expected
> that USER space open the file and fills in the attributes
> which are then used to make inode stand/revalidate in the
> kernel cache.
>
> Signed-off-by: Dharmendra Singh <[email protected]>
> ---
> v2 patch includes:
> - disabled o-create atomicity when the user space file system
> does not have an atomic_open implemented. In principle lookups
> for O_CREATE also could be optimized out, but there is a risk
> to break existing fuse file systems. Those file system might
> not expect open O_CREATE calls for exiting files, as these calls
> had been so far avoided as lookup was done first.

So we are enabling atomic lookup+create only if FUSE_DO_ATOMIC_OPEN is
set. This logic is a bit confusing as CREATE is unrelated to
ATOMIC_OPEN. It would be cleaner to have a separate flag for atomic
lookup+create. And in fact FUSE_DO_ATOMIC_OPEN could be dropped and
the usual logic of setting fc->no_atomic_open if ENOSYS is returned
could be used instead.
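
The "usual logic" mentioned above can be illustrated with a minimal user-space sketch. It is not kernel code: `struct conn_state`, `try_atomic_open()` and the stub server are hypothetical names used only for illustration, and `no_atomic_open` mirrors how `fc->no_create` is handled in the patch. The request is attempted once, and the first -ENOSYS is remembered so later opens skip it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-connection state (struct fuse_conn). */
struct conn_state {
	bool no_atomic_open;	/* set once the server answers -ENOSYS */
	int requests_sent;	/* counts atomic-open requests, for illustration */
};

/* Hypothetical request function: returns 0 or a negative errno. */
typedef int (*send_atomic_open_fn)(struct conn_state *c);

/* Stub server that does not implement the operation. */
static int server_without_atomic_open(struct conn_state *c)
{
	c->requests_sent++;
	return -ENOSYS;
}

/*
 * Send the request unless an earlier attempt already returned -ENOSYS.
 * On the first -ENOSYS, record it so that later opens go straight to
 * the caller's fallback path without another round trip.
 */
static int try_atomic_open(struct conn_state *c, send_atomic_open_fn send)
{
	int err;

	if (c->no_atomic_open)
		return -ENOSYS;

	err = send(c);
	if (err == -ENOSYS)
		c->no_atomic_open = true;
	return err;
}
```

With this pattern no INIT flag is needed: the cost is a single wasted request per connection against a server that lacks the operation.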

>
> fs/fuse/dir.c | 113 +++++++++++++++++++++++++++++++-------
> fs/fuse/fuse_i.h | 3 +
> fs/fuse/inode.c | 4 +-
> include/uapi/linux/fuse.h | 2 +
> 4 files changed, 101 insertions(+), 21 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 656e921f3506..b2613eb87a4e 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -516,16 +516,14 @@ static int get_security_context(struct dentry *entry, umode_t mode,
> }
>
> /*
> - * Atomic create+open operation
> - *
> - * If the filesystem doesn't support this, then fall back to separate
> - * 'mknod' + 'open' requests.
> + * Perform create + open or lookup + open in single call to libfuse
> */
> -static int fuse_create_open(struct inode *dir, struct dentry *entry,
> - struct file *file, unsigned int flags,
> - umode_t mode)
> +static int fuse_atomic_open_common(struct inode *dir, struct dentry *entry,
> + struct dentry **alias, struct file *file,
> + unsigned int flags, umode_t mode,
> + uint32_t opcode)
> {
> - int err;
> + bool create = (opcode == FUSE_CREATE ? true : false);
> struct inode *inode;
> struct fuse_mount *fm = get_fuse_mount(dir);
> FUSE_ARGS(args);
> @@ -535,11 +533,16 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> struct fuse_entry_out outentry;
> struct fuse_inode *fi;
> struct fuse_file *ff;
> + struct dentry *res = NULL;
> void *security_ctx = NULL;
> u32 security_ctxlen;
> + int err;
> +
> + if (alias)
> + *alias = NULL;
>
> /* Userspace expects S_IFREG in create mode */
> - BUG_ON((mode & S_IFMT) != S_IFREG);
> + BUG_ON(create && (mode & S_IFMT) != S_IFREG);
>
> forget = fuse_alloc_forget();
> err = -ENOMEM;
> @@ -554,7 +557,13 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> if (!fm->fc->dont_mask)
> mode &= ~current_umask();
>
> - flags &= ~O_NOCTTY;
> + if (!create) {
> + flags = flags & ~(O_CREAT | O_EXCL | O_NOCTTY);

We know O_CREAT and O_EXCL are not set in this case.

> + if (!fm->fc->atomic_o_trunc)
> + flags &= ~O_TRUNC;

I think atomic_open should imply atomic_o_trunc. Not worth
complicating this further with a separate case.
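
One reading of the two comments above, as a user-space sketch: if O_CREAT and O_EXCL are known to be clear and atomic open implies atomic truncate, the non-create branch only needs to drop O_NOCTTY, the same as the create branch. The helper name `atomic_open_request_flags()` is hypothetical.

```c
#include <fcntl.h>

/*
 * Flags forwarded to the server for a lookup + open request.
 * O_TRUNC is passed through unchanged, on the assumption that a server
 * implementing atomic open also truncates as part of that request.
 */
static unsigned int atomic_open_request_flags(unsigned int flags)
{
	return flags & ~(unsigned int)O_NOCTTY;
}
```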

> + } else {
> + flags &= ~O_NOCTTY;
> + }
> memset(&inarg, 0, sizeof(inarg));
> memset(&outentry, 0, sizeof(outentry));
> inarg.flags = flags;
> @@ -566,7 +575,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
> }
>
> - args.opcode = FUSE_CREATE;
> + args.opcode = opcode;
> args.nodeid = get_node_id(dir);
> args.in_numargs = 2;
> args.in_args[0].size = sizeof(inarg);
> @@ -595,8 +604,12 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> if (err)
> goto out_free_ff;
>
> + err = -ENOENT;
> + if (!S_ISDIR(outentry.attr.mode) && !outentry.nodeid)
> + goto out_free_ff;
> +
> err = -EIO;
> - if (!S_ISREG(outentry.attr.mode) || invalid_nodeid(outentry.nodeid) ||
> + if (invalid_nodeid(outentry.nodeid) ||
> fuse_invalid_attr(&outentry.attr))
> goto out_free_ff;
>
> @@ -612,10 +625,32 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> err = -ENOMEM;
> goto out_err;
> }
> + if (!fm->fc->do_atomic_open)
> + d_instantiate(entry, inode);
> + else {
> + res = d_splice_alias(inode, entry);
> + if (res) {
> + /* Close the file in user space, but do not unlink it,
> + * if it was created - with network file systems other
> + * clients might have already accessed it.
> + */
> + if (IS_ERR(res)) {
> + fi = get_fuse_inode(inode);
> + fuse_sync_release(fi, ff, flags);
> + fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
> + err = PTR_ERR(res);
> + goto out_err;
> + } else {
> + entry = res;
> + if (alias)
> + *alias = res;
> + }
> + }
> + }
> kfree(forget);
> - d_instantiate(entry, inode);
> fuse_change_entry_timeout(entry, &outentry);
> - fuse_dir_changed(dir);
> + if (create)
> + fuse_dir_changed(dir);

This will invalidate the parent even if the file was not created.
Userspace will have to indicate whether the file was created or not as
the kernel won't be able to determine this otherwise. This affects
permission checking as well.
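
A minimal sketch of what such an indication could look like, in user-space C. Everything here is an assumption for illustration: `EXAMPLE_REPLY_FILE_CREATED` is a hypothetical reply bit that does not exist in the patch, and `struct example_dir` stands in for the parent directory state that `fuse_dir_changed()` would invalidate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reply bit; no such flag exists in the patch under review. */
#define EXAMPLE_REPLY_FILE_CREATED (1u << 0)

/* Stand-in for the parent directory's cached state. */
struct example_dir {
	unsigned int change_count;	/* bumped when cached contents go stale */
};

static bool reply_created(uint32_t reply_flags)
{
	return (reply_flags & EXAMPLE_REPLY_FILE_CREATED) != 0;
}

/* Invalidate the parent only when the server reports that it created the file. */
static void finish_open_reply(struct example_dir *parent, uint32_t reply_flags)
{
	if (reply_created(reply_flags))
		parent->change_count++;
}
```

The same bit would also let the caller choose between create-time and open-time permission checks.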

> err = finish_open(file, entry, generic_file_open);
> if (err) {
> fi = get_fuse_inode(inode);
> @@ -634,20 +669,54 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> return err;
> }
>
> +/*
> + * Atomic lookup + open
> + */
> +
> +static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> + struct dentry **alias, struct file *file,
> + unsigned int flags, umode_t mode)
> +{
> + int err;
> + struct fuse_conn *fc = get_fuse_conn(dir);
> +
> + if (!fc->do_atomic_open)
> + return -ENOSYS;
> + err = fuse_atomic_open_common(dir, entry, alias, file,
> + flags, mode, FUSE_ATOMIC_OPEN);
> + return err;
> +}
> +
> static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
> umode_t, dev_t);
> static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> struct file *file, unsigned flags,
> umode_t mode)
> {
> - int err;
> + bool create = (flags & O_CREAT) ? true : false;
> struct fuse_conn *fc = get_fuse_conn(dir);
> - struct dentry *res = NULL;
> + struct dentry *res = NULL, *alias = NULL;
> + int err;
>
> if (fuse_is_bad(dir))
> return -EIO;
>
> - if (d_in_lookup(entry)) {
> + /* Atomic lookup + open - dentry might be File or Directory */
> + if (!create) {
> + err = fuse_do_atomic_open(dir, entry, &alias, file, flags, mode);
> + res = alias;
> + if (!err)
> + goto out_dput;
> + else if (err != -ENOSYS)
> + goto no_open;

The above looks bogus. On error we just want to return that error,
not finish the open.
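
The intended control flow can be sketched as a three-way decision on the return value. This is a user-space illustration with hypothetical names (`enum open_next_step`, `after_atomic_open()`), not the kernel code itself.

```c
#include <errno.h>

enum open_next_step {
	OPEN_DONE,	/* atomic open succeeded, nothing more to do */
	OPEN_FALLBACK,	/* server lacks atomic open: do lookup, then open */
	OPEN_FAIL	/* any other error: return it to the caller */
};

/* Decide what to do after an atomic open attempt returned err. */
static enum open_next_step after_atomic_open(int err)
{
	if (err == 0)
		return OPEN_DONE;
	if (err == -ENOSYS)
		return OPEN_FALLBACK;
	return OPEN_FAIL;
}
```

Only -ENOSYS leads to the separate lookup + open path; an error such as -ENOENT or -EACCES goes back to the caller unchanged.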

> + }
> + /* ENOSYS fall back - user space does not have full atomic open.*/
> +
> + /* O_CREAT could be optimized already, but we fear to break some
> + * userspace implementations therefore optimize in case of atomic
> + * open only.
> + */
> + if (!fc->do_atomic_open && d_in_lookup(entry)) {
> res = fuse_lookup(dir, entry, 0);
> if (IS_ERR(res))
> return PTR_ERR(res);
> @@ -656,7 +725,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> entry = res;
> }
>
> - if (!(flags & O_CREAT) || d_really_is_positive(entry))
> + if (!create || d_really_is_positive(entry))
> goto no_open;
>
> /* Only creates */
> @@ -664,8 +733,12 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>
> if (fc->no_create)
> goto mknod;
> -
> - err = fuse_create_open(dir, entry, file, flags, mode);
> + /*
> + * If the filesystem doesn't support atomic create + open, then fall
> + * back to separate 'mknod' + 'open' requests.
> + */
> + err = fuse_atomic_open_common(dir, entry, NULL, file, flags, mode,
> + FUSE_CREATE);
> if (err == -ENOSYS) {
> fc->no_create = 1;
> goto mknod;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e8e59fbdefeb..e4dc68a90b28 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -669,6 +669,9 @@ struct fuse_conn {
> /** Is open/release not implemented by fs? */
> unsigned no_open:1;
>
> + /** Does the filesystem support atomic open? */
> + unsigned do_atomic_open:1;
> +
> /** Is opendir/releasedir not implemented by fs? */
> unsigned no_opendir:1;
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index ee846ce371d8..5f667de69115 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1190,6 +1190,8 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> fc->setxattr_ext = 1;
> if (flags & FUSE_SECURITY_CTX)
> fc->init_security = 1;
> + if (flags & FUSE_DO_ATOMIC_OPEN)
> + fc->do_atomic_open = 1;
> } else {
> ra_pages = fc->max_read / PAGE_SIZE;
> fc->no_lock = 1;
> @@ -1235,7 +1237,7 @@ void fuse_send_init(struct fuse_mount *fm)
> FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
> FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
> FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
> - FUSE_SECURITY_CTX;
> + FUSE_SECURITY_CTX | FUSE_DO_ATOMIC_OPEN;
> #ifdef CONFIG_FUSE_DAX
> if (fm->fc->dax)
> flags |= FUSE_MAP_ALIGNMENT;
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index d6ccee961891..a28dd60078ff 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -389,6 +389,7 @@ struct fuse_file_lock {
> /* bits 32..63 get shifted down 32 bits into the flags2 field */
> #define FUSE_SECURITY_CTX (1ULL << 32)
> #define FUSE_HAS_INODE_DAX (1ULL << 33)
> +#define FUSE_DO_ATOMIC_OPEN (1ULL << 34)
>
> /**
> * CUSE INIT request/reply flags
> @@ -537,6 +538,7 @@ enum fuse_opcode {
> FUSE_SETUPMAPPING = 48,
> FUSE_REMOVEMAPPING = 49,
> FUSE_SYNCFS = 50,
> + FUSE_ATOMIC_OPEN = 51,
>
> /* CUSE specific operations */
> CUSE_INIT = 4096,
> --
> 2.17.1
>

2022-04-25 09:18:16

by Dharmendra Singh

Subject: Re: [PATCH v2 1/2] FUSE: Implement atomic lookup + open

On Fri, Apr 22, 2022 at 8:59 PM Miklos Szeredi <[email protected]> wrote:
>
> On Tue, 22 Mar 2022 at 12:52, Dharmendra Singh <[email protected]> wrote:
> >
> > From: Dharmendra Singh <[email protected]>
> >
> > There are couple of places in FUSE where we do agressive
> > lookup.
> > 1) When we go for creating a file (O_CREAT), we do lookup
> > for non-existent file. It is very much likely that file
> > does not exists yet as O_CREAT is passed to open(). This
> > lookup can be avoided and can be performed as part of
> > open call into libfuse.
> >
> > 2) When there is normal open for file/dir (dentry is
> > new/negative). In this case since we are anyway going to open
> > the file/dir with USER space, avoid this separate lookup call
> > into libfuse and combine it with open.
> >
> > This lookup + open in single call to libfuse and finally to
> > USER space has been named as atomic open. It is expected
> > that USER space open the file and fills in the attributes
> > which are then used to make inode stand/revalidate in the
> > kernel cache.
> >
> > Signed-off-by: Dharmendra Singh <[email protected]>
> > ---
> > v2 patch includes:
> > - disabled o-create atomicity when the user space file system
> > does not have an atomic_open implemented. In principle lookups
> > for O_CREATE also could be optimized out, but there is a risk
> > to break existing fuse file systems. Those file system might
> > not expect open O_CREATE calls for exiting files, as these calls
> > had been so far avoided as lookup was done first.
>
> So we enabling atomic lookup+create only if FUSE_DO_ATOMIC_OPEN is
> set. This logic is a bit confusing as CREATE is unrelated to
> ATOMIC_OPEN. It would be cleaner to have a separate flag for atomic
> lookup+create. And in fact FUSE_DO_ATOMIC_OPEN could be dropped and
> the usual logic of setting fc->no_atomic_open if ENOSYS is returned
> could be used instead.

I am aware that ATOMIC_OPEN is not directly related to CREATE, but the
flag is used here more as a feature switch. Without
FUSE_DO_ATOMIC_OPEN, CREATE calls would not know that they should
optimize away the lookup, since we only learn from an open call that
atomic open is implemented. Workloads or benchmarks such as bonnie++
would then show no improvement for CREATE, because they would not be
making 'open' calls, and it is only in open calls that we set
fc->do_atomic_open.

>
> >
> > fs/fuse/dir.c | 113 +++++++++++++++++++++++++++++++-------
> > fs/fuse/fuse_i.h | 3 +
> > fs/fuse/inode.c | 4 +-
> > include/uapi/linux/fuse.h | 2 +
> > 4 files changed, 101 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index 656e921f3506..b2613eb87a4e 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -516,16 +516,14 @@ static int get_security_context(struct dentry *entry, umode_t mode,
> > }
> >
> > /*
> > - * Atomic create+open operation
> > - *
> > - * If the filesystem doesn't support this, then fall back to separate
> > - * 'mknod' + 'open' requests.
> > + * Perform create + open or lookup + open in single call to libfuse
> > */
> > -static int fuse_create_open(struct inode *dir, struct dentry *entry,
> > - struct file *file, unsigned int flags,
> > - umode_t mode)
> > +static int fuse_atomic_open_common(struct inode *dir, struct dentry *entry,
> > + struct dentry **alias, struct file *file,
> > + unsigned int flags, umode_t mode,
> > + uint32_t opcode)
> > {
> > - int err;
> > + bool create = (opcode == FUSE_CREATE ? true : false);
> > struct inode *inode;
> > struct fuse_mount *fm = get_fuse_mount(dir);
> > FUSE_ARGS(args);
> > @@ -535,11 +533,16 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> > struct fuse_entry_out outentry;
> > struct fuse_inode *fi;
> > struct fuse_file *ff;
> > + struct dentry *res = NULL;
> > void *security_ctx = NULL;
> > u32 security_ctxlen;
> > + int err;
> > +
> > + if (alias)
> > + *alias = NULL;
> >
> > /* Userspace expects S_IFREG in create mode */
> > - BUG_ON((mode & S_IFMT) != S_IFREG);
> > + BUG_ON(create && (mode & S_IFMT) != S_IFREG);
> >
> > forget = fuse_alloc_forget();
> > err = -ENOMEM;
> > @@ -554,7 +557,13 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> > if (!fm->fc->dont_mask)
> > mode &= ~current_umask();
> >
> > - flags &= ~O_NOCTTY;
> > + if (!create) {
> > + flags = flags & ~(O_CREAT | O_EXCL | O_NOCTTY);
>
> We know O_CREAT and O_EXCL are not set in this case.

Will remove it.

>
> > + if (!fm->fc->atomic_o_trunc)
> > + flags &= ~O_TRUNC;
>
> I think atomic_open should imply atomic_o_trunc. Not worth
> complicating this further with a separate case.

I see. So if atomic open is enabled, we should truncate the file as
part of the atomic open call itself, regardless of whether
fc->atomic_o_trunc is set. I will make the change here.

>
> > + } else {
> > + flags &= ~O_NOCTTY;
> > + }
> > memset(&inarg, 0, sizeof(inarg));
> > memset(&outentry, 0, sizeof(outentry));
> > inarg.flags = flags;
> > @@ -566,7 +575,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> > inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
> > }
> >
> > - args.opcode = FUSE_CREATE;
> > + args.opcode = opcode;
> > args.nodeid = get_node_id(dir);
> > args.in_numargs = 2;
> > args.in_args[0].size = sizeof(inarg);
> > @@ -595,8 +604,12 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> > if (err)
> > goto out_free_ff;
> >
> > + err = -ENOENT;
> > + if (!S_ISDIR(outentry.attr.mode) && !outentry.nodeid)
> > + goto out_free_ff;
> > +
> > err = -EIO;
> > - if (!S_ISREG(outentry.attr.mode) || invalid_nodeid(outentry.nodeid) ||
> > + if (invalid_nodeid(outentry.nodeid) ||
> > fuse_invalid_attr(&outentry.attr))
> > goto out_free_ff;
> >
> > @@ -612,10 +625,32 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> > err = -ENOMEM;
> > goto out_err;
> > }
> > + if (!fm->fc->do_atomic_open)
> > + d_instantiate(entry, inode);
> > + else {
> > + res = d_splice_alias(inode, entry);
> > + if (res) {
> > + /* Close the file in user space, but do not unlink it,
> > + * if it was created - with network file systems other
> > + * clients might have already accessed it.
> > + */
> > + if (IS_ERR(res)) {
> > + fi = get_fuse_inode(inode);
> > + fuse_sync_release(fi, ff, flags);
> > + fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
> > + err = PTR_ERR(res);
> > + goto out_err;
> > + } else {
> > + entry = res;
> > + if (alias)
> > + *alias = res;
> > + }
> > + }
> > + }
> > kfree(forget);
> > - d_instantiate(entry, inode);
> > fuse_change_entry_timeout(entry, &outentry);
> > - fuse_dir_changed(dir);
> > + if (create)
> > + fuse_dir_changed(dir);
>
> This will invalidate the parent even if the file was not created.
> Userspace will have to indicate whether the file was created or not as
> the kernel won't be able to determine this otherwise. This affects
> permission checking as well.

Thanks, I see. I will check whether we can pass a flag from libfuse to
the FUSE kernel code that indicates whether the file was actually
created, and check for it here.

> > err = finish_open(file, entry, generic_file_open);
> > if (err) {
> > fi = get_fuse_inode(inode);
> > @@ -634,20 +669,54 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> > return err;
> > }
> >
> > +/*
> > + * Atomic lookup + open
> > + */
> > +
> > +static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> > + struct dentry **alias, struct file *file,
> > + unsigned int flags, umode_t mode)
> > +{
> > + int err;
> > + struct fuse_conn *fc = get_fuse_conn(dir);
> > +
> > + if (!fc->do_atomic_open)
> > + return -ENOSYS;
> > + err = fuse_atomic_open_common(dir, entry, alias, file,
> > + flags, mode, FUSE_ATOMIC_OPEN);
> > + return err;
> > +}
> > +
> > static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
> > umode_t, dev_t);
> > static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> > struct file *file, unsigned flags,
> > umode_t mode)
> > {
> > - int err;
> > + bool create = (flags & O_CREAT) ? true : false;
> > struct fuse_conn *fc = get_fuse_conn(dir);
> > - struct dentry *res = NULL;
> > + struct dentry *res = NULL, *alias = NULL;
> > + int err;
> >
> > if (fuse_is_bad(dir))
> > return -EIO;
> >
> > - if (d_in_lookup(entry)) {
> > + /* Atomic lookup + open - dentry might be File or Directory */
> > + if (!create) {
> > + err = fuse_do_atomic_open(dir, entry, &alias, file, flags, mode);
> > + res = alias;
> > + if (!err)
> > + goto out_dput;
> > + else if (err != -ENOSYS)
> > + goto no_open;
>
> The above looks bogus. On error we just want to return that error,
> not finish the open.

Thanks for pointing it out. I will handle the error return.
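
One way to read the review comment is as a three-way split on the result of the atomic open attempt: success finishes the open, -ENOSYS falls back to the separate lookup + open path, and any other error is returned unchanged. The sketch below models only that decision in plain userspace C. The example_* names are hypothetical, and legacy_err is an assumed stand-in for whatever the fallback path would return.

```c
#include <assert.h>
#include <errno.h>

enum example_next_step {
	EXAMPLE_DONE,		/* atomic open succeeded; nothing more to do */
	EXAMPLE_FALLBACK,	/* server lacks the op; use lookup + open */
	EXAMPLE_FAIL,		/* real error; hand it back to the caller */
};

/* Classify the (zero or negative) result of an atomic open attempt. */
static enum example_next_step example_classify(int err)
{
	assert(err <= 0);
	if (err == 0)
		return EXAMPLE_DONE;
	if (err == -ENOSYS)
		return EXAMPLE_FALLBACK;
	return EXAMPLE_FAIL;
}

/*
 * Returns the value the open path should hand back to its caller.
 * legacy_err is the result the separate lookup + open path would give.
 */
static int example_open_result(int atomic_err, int legacy_err)
{
	switch (example_classify(atomic_err)) {
	case EXAMPLE_DONE:
		return 0;
	case EXAMPLE_FALLBACK:
		return legacy_err;
	case EXAMPLE_FAIL:
	default:
		return atomic_err;
	}
}
```

The point of the split is that an error such as -EACCES never reaches the code that finishes the open.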

2022-04-25 15:53:50

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] FUSE: Implement atomic lookup + open

On Mon, 25 Apr 2022 at 07:26, Dharmendra Hans <[email protected]> wrote:
>
> On Fri, Apr 22, 2022 at 8:59 PM Miklos Szeredi <[email protected]> wrote:
> >
> > On Tue, 22 Mar 2022 at 12:52, Dharmendra Singh <[email protected]> wrote:
> > >
> > > From: Dharmendra Singh <[email protected]>
> > >
> > > There are couple of places in FUSE where we do agressive
> > > lookup.
> > > 1) When we go for creating a file (O_CREAT), we do lookup
> > > for non-existent file. It is very much likely that file
> > > does not exists yet as O_CREAT is passed to open(). This
> > > lookup can be avoided and can be performed as part of
> > > open call into libfuse.
> > >
> > > 2) When there is normal open for file/dir (dentry is
> > > new/negative). In this case since we are anyway going to open
> > > the file/dir with USER space, avoid this separate lookup call
> > > into libfuse and combine it with open.
> > >
> > > This lookup + open in single call to libfuse and finally to
> > > USER space has been named as atomic open. It is expected
> > > that USER space open the file and fills in the attributes
> > > which are then used to make inode stand/revalidate in the
> > > kernel cache.
> > >
> > > Signed-off-by: Dharmendra Singh <[email protected]>
> > > ---
> > > v2 patch includes:
> > > - disabled o-create atomicity when the user space file system
> > > does not have an atomic_open implemented. In principle lookups
> > > for O_CREATE also could be optimized out, but there is a risk
> > > to break existing fuse file systems. Those file system might
> > > not expect open O_CREATE calls for exiting files, as these calls
> > > had been so far avoided as lookup was done first.
> >
> > So we enabling atomic lookup+create only if FUSE_DO_ATOMIC_OPEN is
> > set. This logic is a bit confusing as CREATE is unrelated to
> > ATOMIC_OPEN. It would be cleaner to have a separate flag for atomic
> > lookup+create. And in fact FUSE_DO_ATOMIC_OPEN could be dropped and
> > the usual logic of setting fc->no_atomic_open if ENOSYS is returned
> > could be used instead.
>
> I am aware that ATOMIC_OPEN is not directly related to CREATE. But
> This is more of feature enabling by using the flag. If we do not
> FUSE_DO_ATOMIC_OPEN, CREATE calls would not know that it need to
> optimize lookup calls otherwise as we know only from open call that
> atomic open is implemented.

Right. So because the atomic lookup+create would need a new flag to
return whether the file was created or not, this is probably better
implemented as a completely new request type (FUSE_ATOMIC_CREATE?).

No new INIT flags are needed at all, since we can use the ENOSYS mechanism
to determine whether the filesystem has atomic open/create ops or not.

Thanks,
Miklos

2022-04-25 17:20:46

by Dharmendra Singh

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] FUSE: Implement atomic lookup + open

On Mon, Apr 25, 2022 at 1:08 PM Miklos Szeredi <[email protected]> wrote:
>
> On Mon, 25 Apr 2022 at 07:26, Dharmendra Hans <[email protected]> wrote:
> >
> > On Fri, Apr 22, 2022 at 8:59 PM Miklos Szeredi <[email protected]> wrote:
> > >
> > > On Tue, 22 Mar 2022 at 12:52, Dharmendra Singh <[email protected]> wrote:
> > > >
> > > > From: Dharmendra Singh <[email protected]>
> > > >
> > > > There are couple of places in FUSE where we do agressive
> > > > lookup.
> > > > 1) When we go for creating a file (O_CREAT), we do lookup
> > > > for non-existent file. It is very much likely that file
> > > > does not exists yet as O_CREAT is passed to open(). This
> > > > lookup can be avoided and can be performed as part of
> > > > open call into libfuse.
> > > >
> > > > 2) When there is normal open for file/dir (dentry is
> > > > new/negative). In this case since we are anyway going to open
> > > > the file/dir with USER space, avoid this separate lookup call
> > > > into libfuse and combine it with open.
> > > >
> > > > This lookup + open in single call to libfuse and finally to
> > > > USER space has been named as atomic open. It is expected
> > > > that USER space open the file and fills in the attributes
> > > > which are then used to make inode stand/revalidate in the
> > > > kernel cache.
> > > >
> > > > Signed-off-by: Dharmendra Singh <[email protected]>
> > > > ---
> > > > v2 patch includes:
> > > > - disabled o-create atomicity when the user space file system
> > > > does not have an atomic_open implemented. In principle lookups
> > > > for O_CREATE also could be optimized out, but there is a risk
> > > > to break existing fuse file systems. Those file system might
> > > > not expect open O_CREATE calls for exiting files, as these calls
> > > > had been so far avoided as lookup was done first.
> > >
> > > So we enabling atomic lookup+create only if FUSE_DO_ATOMIC_OPEN is
> > > set. This logic is a bit confusing as CREATE is unrelated to
> > > ATOMIC_OPEN. It would be cleaner to have a separate flag for atomic
> > > lookup+create. And in fact FUSE_DO_ATOMIC_OPEN could be dropped and
> > > the usual logic of setting fc->no_atomic_open if ENOSYS is returned
> > > could be used instead.
> >
> > I am aware that ATOMIC_OPEN is not directly related to CREATE. But
> > This is more of feature enabling by using the flag. If we do not
> > FUSE_DO_ATOMIC_OPEN, CREATE calls would not know that it need to
> > optimize lookup calls otherwise as we know only from open call that
> > atomic open is implemented.
>
> Right. So because the atomic lookup+create would need a new flag to
> return whether the file was created or not, this is probably better
> implemented as a completely new request type (FUSE_ATOMIC_CREATE?)
>
> No new INIT flags needed at all, since we can use the ENOSYS mechanism
> to determine whether the filesystem has atomic open/create ops or not.

Yes, it sounds good to have a separate request type for CREATE. I
will split the patch into two, one for create and one for open, and
omit the INIT flags. I will also change the libfuse code accordingly.

2022-04-29 15:53:56

by Dharmendra Singh

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] FUSE: Implement atomic lookup + open

On Mon, Apr 25, 2022 at 4:13 PM Dharmendra Hans <[email protected]> wrote:
>
> On Mon, Apr 25, 2022 at 1:08 PM Miklos Szeredi <[email protected]> wrote:
> >
> > On Mon, 25 Apr 2022 at 07:26, Dharmendra Hans <[email protected]> wrote:
> > >
> > > On Fri, Apr 22, 2022 at 8:59 PM Miklos Szeredi <[email protected]> wrote:
> > > >
> > > > On Tue, 22 Mar 2022 at 12:52, Dharmendra Singh <[email protected]> wrote:
> > > > >
> > > > > From: Dharmendra Singh <[email protected]>
> > > > >
> > > > > There are couple of places in FUSE where we do agressive
> > > > > lookup.
> > > > > 1) When we go for creating a file (O_CREAT), we do lookup
> > > > > for non-existent file. It is very much likely that file
> > > > > does not exists yet as O_CREAT is passed to open(). This
> > > > > lookup can be avoided and can be performed as part of
> > > > > open call into libfuse.
> > > > >
> > > > > 2) When there is normal open for file/dir (dentry is
> > > > > new/negative). In this case since we are anyway going to open
> > > > > the file/dir with USER space, avoid this separate lookup call
> > > > > into libfuse and combine it with open.
> > > > >
> > > > > This lookup + open in single call to libfuse and finally to
> > > > > USER space has been named as atomic open. It is expected
> > > > > that USER space open the file and fills in the attributes
> > > > > which are then used to make inode stand/revalidate in the
> > > > > kernel cache.
> > > > >
> > > > > Signed-off-by: Dharmendra Singh <[email protected]>
> > > > > ---
> > > > > v2 patch includes:
> > > > > - disabled o-create atomicity when the user space file system
> > > > > does not have an atomic_open implemented. In principle lookups
> > > > > for O_CREATE also could be optimized out, but there is a risk
> > > > > to break existing fuse file systems. Those file system might
> > > > > not expect open O_CREATE calls for exiting files, as these calls
> > > > > had been so far avoided as lookup was done first.
> > > >
> > > > So we enabling atomic lookup+create only if FUSE_DO_ATOMIC_OPEN is
> > > > set. This logic is a bit confusing as CREATE is unrelated to
> > > > ATOMIC_OPEN. It would be cleaner to have a separate flag for atomic
> > > > lookup+create. And in fact FUSE_DO_ATOMIC_OPEN could be dropped and
> > > > the usual logic of setting fc->no_atomic_open if ENOSYS is returned
> > > > could be used instead.
> > >
> > > I am aware that ATOMIC_OPEN is not directly related to CREATE. But
> > > This is more of feature enabling by using the flag. If we do not
> > > FUSE_DO_ATOMIC_OPEN, CREATE calls would not know that it need to
> > > optimize lookup calls otherwise as we know only from open call that
> > > atomic open is implemented.
> >
> > Right. So because the atomic lookup+create would need a new flag to
> > return whether the file was created or not, this is probably better
> > implemented as a completely new request type (FUSE_ATOMIC_CREATE?)
> >
> > No new INIT flags needed at all, since we can use the ENOSYS mechanism
> > to determine whether the filesystem has atomic open/create ops or not.
>
> Yes, it sounds good to have a separate request type for CREATE. I
> would separate out the patch into two for create and open. Will omit
> INIT flags. Also, I would change libfuse code accordingly.

Actually, while writing the code, I observed that omitting the INIT flags
works fine for atomic create but does not work well for atomic open,
especially considering the 3rd patch, which optimises d_revalidate()
lookups
(https://lore.kernel.org/linux-fsdevel/[email protected]/;
we have not received any comments on it so far).
So it looks like we need INIT flags at least for the atomic open case,
assuming the 3rd patch goes in as well.