2014-06-06 13:27:12

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 0/5] fuse: close file synchronously (v2)

Hi,

There is a long-standing demand for synchronous behaviour of fuse_release:

http://sourceforge.net/mailarchive/message.php?msg_id=19343889
http://sourceforge.net/mailarchive/message.php?msg_id=29814693

A year ago Avati and me explained why such a feature would be useful:

http://sourceforge.net/mailarchive/message.php?msg_id=29889055
http://sourceforge.net/mailarchive/message.php?msg_id=29867423

In short, the problem is that fuse_release (that's called on last user
close(2)) sends FUSE_RELEASE to userspace and returns without waiting for
ACK from userspace. Consequently, there is a gap when user regards the
file released while userspace fuse is still working on it. An attempt to
access the file from another node leads to complicated synchronization
problems because the first node still "holds" the file.

The patch-set resolves the problem by making fuse_release synchronous:
wait for ACK from userspace for FUSE_RELEASE if the feature is ON.

To keep single-threaded userspace implementations happy the patch-set
ensures that by the time fuse_release_common calls fuse_file_put, no
more in-flight I/O exists. Asynchronous fuse callbacks (like
fuse_readpages_end) cannot trigger FUSE_RELEASE anymore. Hence, we'll
never block in contexts other than close().

Changed in v2:
- improved comments, commented spin_unlock_wait out according to Brian'
suggestions.
- rebased on v3.15-rc8 tag of Linus' tree.

Thanks,
Maxim

---

Maxim Patlasov (5):
fuse: add close_wait flag to fuse_conn
fuse: cosmetic rework of fuse_send_readpages
fuse: wait for end of IO on release
fuse: enable close_wait feature
fuse: fix synchronous case of fuse_file_put()


fs/fuse/file.c | 100 ++++++++++++++++++++++++++++++++++++++-------
fs/fuse/fuse_i.h | 3 +
fs/fuse/inode.c | 4 +-
include/uapi/linux/fuse.h | 3 +
4 files changed, 93 insertions(+), 17 deletions(-)

--
Signature


2014-06-06 13:27:43

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 1/5] fuse: add close_wait flag to fuse_conn

The feature will be governed by fc->close_wait. Userspace can enable it in
the same way as auto_inval_data or any other kernel fuse capability.

Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/fuse_i.h | 3 +++
fs/fuse/inode.c | 4 +++-
include/uapi/linux/fuse.h | 3 +++
3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7aa5c75..434ff08 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -557,6 +557,9 @@ struct fuse_conn {
/** Does the filesystem support asynchronous direct-IO submission? */
unsigned async_dio:1;

+ /** Wait for response from daemon on close */
+ unsigned close_wait:1;
+
/** The number of requests waiting for completion */
atomic_t num_waiting;

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 754dcf2..580d1b3 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -898,6 +898,8 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req)
else
fc->sb->s_time_gran = 1000000000;

+ if (arg->flags & FUSE_CLOSE_WAIT)
+ fc->close_wait = 1;
} else {
ra_pages = fc->max_read / PAGE_CACHE_SIZE;
fc->no_lock = 1;
@@ -926,7 +928,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req)
FUSE_SPLICE_WRITE | FUSE_SPLICE_MOVE | FUSE_SPLICE_READ |
FUSE_FLOCK_LOCKS | FUSE_IOCTL_DIR | FUSE_AUTO_INVAL_DATA |
FUSE_DO_READDIRPLUS | FUSE_READDIRPLUS_AUTO | FUSE_ASYNC_DIO |
- FUSE_WRITEBACK_CACHE;
+ FUSE_WRITEBACK_CACHE | FUSE_CLOSE_WAIT;
req->in.h.opcode = FUSE_INIT;
req->in.numargs = 1;
req->in.args[0].size = sizeof(*arg);
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 40b5ca8..1e1b6fa 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -101,6 +101,7 @@
* - add FATTR_CTIME
* - add ctime and ctimensec to fuse_setattr_in
* - add FUSE_RENAME2 request
+ * - add FUSE_CLOSE_WAIT
*/

#ifndef _LINUX_FUSE_H
@@ -229,6 +230,7 @@ struct fuse_file_lock {
* FUSE_READDIRPLUS_AUTO: adaptive readdirplus
* FUSE_ASYNC_DIO: asynchronous direct I/O submission
* FUSE_WRITEBACK_CACHE: use writeback cache for buffered writes
+ * FUSE_CLOSE_WAIT: wait for response from daemon on close
*/
#define FUSE_ASYNC_READ (1 << 0)
#define FUSE_POSIX_LOCKS (1 << 1)
@@ -247,6 +249,7 @@ struct fuse_file_lock {
#define FUSE_READDIRPLUS_AUTO (1 << 14)
#define FUSE_ASYNC_DIO (1 << 15)
#define FUSE_WRITEBACK_CACHE (1 << 16)
+#define FUSE_CLOSE_WAIT (1 << 17)

/**
* CUSE INIT request/reply flags

2014-06-06 13:28:28

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 2/5] fuse: cosmetic rework of fuse_send_readpages

The patch change arguments of fuse_send_readpages to give it access to inode
(will be used in the next patch of patch-set). The change is cosmetic,
no logic changed.

Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/file.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 96d513e..b81a945 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -827,8 +827,17 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
fuse_file_put(req->ff, false);
}

-static void fuse_send_readpages(struct fuse_req *req, struct file *file)
+struct fuse_fill_data {
+ struct fuse_req *req;
+ struct file *file;
+ struct inode *inode;
+ unsigned nr_pages;
+};
+
+static void fuse_send_readpages(struct fuse_fill_data *data)
{
+ struct fuse_req *req = data->req;
+ struct file *file = data->file;
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = ff->fc;
loff_t pos = page_offset(req->pages[0]);
@@ -850,13 +859,6 @@ static void fuse_send_readpages(struct fuse_req *req, struct file *file)
}
}

-struct fuse_fill_data {
- struct fuse_req *req;
- struct file *file;
- struct inode *inode;
- unsigned nr_pages;
-};
-
static int fuse_readpages_fill(void *_data, struct page *page)
{
struct fuse_fill_data *data = _data;
@@ -872,7 +874,7 @@ static int fuse_readpages_fill(void *_data, struct page *page)
req->pages[req->num_pages - 1]->index + 1 != page->index)) {
int nr_alloc = min_t(unsigned, data->nr_pages,
FUSE_MAX_PAGES_PER_REQ);
- fuse_send_readpages(req, data->file);
+ fuse_send_readpages(data);
if (fc->async_read)
req = fuse_get_req_for_background(fc, nr_alloc);
else
@@ -925,7 +927,7 @@ static int fuse_readpages(struct file *file, struct address_space *mapping,
err = read_cache_pages(mapping, pages, fuse_readpages_fill, &data);
if (!err) {
if (data.req->num_pages)
- fuse_send_readpages(data.req, file);
+ fuse_send_readpages(&data);
else
fuse_put_request(fc, data.req);
}

2014-06-06 13:29:21

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 3/5] fuse: wait for end of IO on release

There are two types of I/O activity that can be "in progress" at the time
of fuse_release() execution: asynchronous read-ahead and write-back. The
patch ensures that they are completed before fuse_release_common sends
FUSE_RELEASE to userspace.

So far as fuse_release() waits for end of async I/O, its callbacks
(fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
be the last holders of fuse file anymore. To emphasize the fact, the patch
replaces fuse_file_put with __fuse_file_put there.

Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/file.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 66 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b81a945..d50af99 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -149,6 +149,17 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
}
}

+/*
+ * Asynchronous callbacks may use it instead of fuse_file_put() because
+ * we guarantee that they are never last holders of ff. Hitting BUG() below
+ * will make clear any violation of the guarantee.
+ */
+static void __fuse_file_put(struct fuse_file *ff)
+{
+ if (atomic_dec_and_test(&ff->count))
+ BUG();
+}
+
int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
bool isdir)
{
@@ -302,6 +313,13 @@ void fuse_release_common(struct file *file, int opcode)
req->misc.release.path = file->f_path;

/*
+ * No more in-flight asynchronous READ or WRITE requests if
+ * fuse file release is synchronous
+ */
+ if (ff->fc->close_wait)
+ BUG_ON(atomic_read(&ff->count) != 1);
+
+ /*
* Normally this will send the RELEASE request, however if
* some asynchronous READ or WRITE requests are outstanding,
* the sending will be delayed.
@@ -321,11 +339,34 @@ static int fuse_open(struct inode *inode, struct file *file)
static int fuse_release(struct inode *inode, struct file *file)
{
struct fuse_conn *fc = get_fuse_conn(inode);
+ struct fuse_file *ff = file->private_data;

/* see fuse_vma_close() for !writeback_cache case */
if (fc->writeback_cache)
write_inode_now(inode, 1);

+ if (ff->fc->close_wait) {
+ struct fuse_inode *fi = get_fuse_inode(inode);
+
+ /*
+ * Must remove file from write list. Otherwise it is possible
+ * this file will get more writeback from another files
+ * rerouted via write_files.
+ */
+ spin_lock(&ff->fc->lock);
+ list_del_init(&ff->write_entry);
+ spin_unlock(&ff->fc->lock);
+
+ wait_event(fi->page_waitq, atomic_read(&ff->count) == 1);
+
+ /*
+ * spin_unlock_wait(&ff->fc->lock) would be natural here to
+ * wait for threads just released ff to leave their critical
+ * sections. But taking spinlock is the first thing
+ * fuse_release_common does, so that this is unnecessary.
+ */
+ }
+
fuse_release_common(file, FUSE_RELEASE);

/* return value is ignored by VFS */
@@ -823,8 +864,17 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
unlock_page(page);
page_cache_release(page);
}
- if (req->ff)
- fuse_file_put(req->ff, false);
+ if (req->ff) {
+ if (fc->close_wait) {
+ struct fuse_inode *fi = get_fuse_inode(req->inode);
+
+ spin_lock(&fc->lock);
+ __fuse_file_put(req->ff);
+ wake_up(&fi->page_waitq);
+ spin_unlock(&fc->lock);
+ } else
+ fuse_file_put(req->ff, false);
+ }
}

struct fuse_fill_data {
@@ -851,6 +901,7 @@ static void fuse_send_readpages(struct fuse_fill_data *data)
if (fc->async_read) {
req->ff = fuse_file_get(ff);
req->end = fuse_readpages_end;
+ req->inode = data->inode;
fuse_request_send_background(fc, req);
} else {
fuse_request_send(fc, req);
@@ -1537,7 +1588,7 @@ static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
for (i = 0; i < req->num_pages; i++)
__free_page(req->pages[i]);

- if (req->ff)
+ if (req->ff && !fc->close_wait)
fuse_file_put(req->ff, false);
}

@@ -1554,6 +1605,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
dec_zone_page_state(req->pages[i], NR_WRITEBACK_TEMP);
bdi_writeout_inc(bdi);
}
+ if (fc->close_wait)
+ __fuse_file_put(req->ff);
wake_up(&fi->page_waitq);
}

@@ -1694,8 +1747,16 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)

ff = __fuse_write_file_get(fc, fi);
err = fuse_flush_times(inode, ff);
- if (ff)
- fuse_file_put(ff, 0);
+ if (ff) {
+ if (fc->close_wait) {
+ spin_lock(&fc->lock);
+ __fuse_file_put(ff);
+ wake_up(&fi->page_waitq);
+ spin_unlock(&fc->lock);
+
+ } else
+ fuse_file_put(ff, false);
+ }

return err;
}

2014-06-06 13:30:04

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 4/5] fuse: enable close_wait feature

The patch enables feature by passing 'true' to fuse_file_put in
fuse_release_common.

Previously, this was safe only in special cases when we sure that
multi-threaded userspace won't deadlock if we'll synchronously send
FUSE_RELEASE in the context of read-ahead or write-back callback. Now, it's
always safe because callbacks don't send requests to userspace anymore.

Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/file.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d50af99..783cb52 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -328,7 +328,8 @@ void fuse_release_common(struct file *file, int opcode)
* synchronous RELEASE is allowed (and desirable) in this case
* because the server can be trusted not to screw up.
*/
- fuse_file_put(ff, ff->fc->destroy_req != NULL);
+ fuse_file_put(ff, ff->fc->destroy_req != NULL ||
+ ff->fc->close_wait);
}

static int fuse_open(struct inode *inode, struct file *file)

2014-06-06 13:31:34

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 5/5] fuse: fix synchronous case of fuse_file_put()

If fuse_file_put() is called with sync==true, the user may be blocked for
a while, until userspace ACKs our FUSE_RELEASE request. This blocking must be
uninterruptible. Otherwise request could be interrupted, but file association
in user space remains.

Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/file.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 783cb52..9f38568 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -136,6 +136,10 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
path_put(&req->misc.release.path);
fuse_put_request(ff->fc, req);
} else if (sync) {
+ /* Must force. Otherwise request could be interrupted,
+ * but file association in user space remains.
+ */
+ req->force = 1;
req->background = 0;
fuse_request_send(ff->fc, req);
path_put(&req->misc.release.path);

2014-06-06 13:51:05

by John Muir

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH 0/5] fuse: close file synchronously (v2)

On 2014.06.06, at 15:27 , Maxim Patlasov <[email protected]> wrote:

> The patch-set resolves the problem by making fuse_release synchronous:
> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.

Why not make this feature per-file with a new flag bit in struct fuse_file_info rather than as a file-system global?

John.

--
John Muir - [email protected]
+32 491 64 22 76

2014-06-09 07:50:16

by Maxim Patlasov

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH 0/5] fuse: close file synchronously (v2)

On 06/06/2014 05:51 PM, John Muir wrote:
> On 2014.06.06, at 15:27 , Maxim Patlasov <[email protected]> wrote:
>
>> The patch-set resolves the problem by making fuse_release synchronous:
>> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.
> Why not make this feature per-file with a new flag bit in struct fuse_file_info rather than as a file-system global?

I don't expect a great demand for such a granularity. File-system global
"close_wait" conveys a general user expectation about filesystem
behaviour in distributed environment: if you stopped using a file on
given node, whether it means that the file is immediately accessible
from another node.

Maxim

>
> John.
>
> --
> John Muir - [email protected]
> +32 491 64 22 76
>
>
>

2014-06-09 09:36:08

by John Muir

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH 0/5] fuse: close file synchronously (v2)

On 2014.06.09, at 9:50 , Maxim Patlasov <[email protected]> wrote:

> On 06/06/2014 05:51 PM, John Muir wrote:
>> On 2014.06.06, at 15:27 , Maxim Patlasov <[email protected]> wrote:
>>
>>> The patch-set resolves the problem by making fuse_release synchronous:
>>> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.
>> Why not make this feature per-file with a new flag bit in struct fuse_file_info rather than as a file-system global?
>
> I don't expect a great demand for such a granularity. File-system global "close_wait" conveys a general user expectation about filesystem behaviour in distributed environment: if you stopped using a file on given node, whether it means that the file is immediately accessible from another node.
>

By user do you mean the end-user, or the implementor of the file-system? It seems to me that the end-user doesn't care, and just wants the file-system to work as expected. I don't think we're really talking about the end-user.

The implementor of a file-system, on the other hand, might want the semantics for close_wait on some files, but not on others. Won't there be a performance impact? Some distributed file-systems might want this on specific files only. Implementing it as a flag on the struct fuse_file_info gives the flexibility to the file-system implementor.

Regards,

John.

--
John Muir - [email protected]
+32 491 64 22 76-

2014-06-09 10:46:12

by Maxim Patlasov

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH 0/5] fuse: close file synchronously (v2)

On 06/09/2014 01:26 PM, John Muir wrote:
> On 2014.06.09, at 9:50 , Maxim Patlasov <[email protected]> wrote:
>
>> On 06/06/2014 05:51 PM, John Muir wrote:
>>> On 2014.06.06, at 15:27 , Maxim Patlasov <[email protected]> wrote:
>>>
>>>> The patch-set resolves the problem by making fuse_release synchronous:
>>>> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.
>>> Why not make this feature per-file with a new flag bit in struct fuse_file_info rather than as a file-system global?
>> I don't expect a great demand for such a granularity. File-system global "close_wait" conveys a general user expectation about filesystem behaviour in distributed environment: if you stopped using a file on given node, whether it means that the file is immediately accessible from another node.
>>
> By user do you mean the end-user, or the implementor of the file-system? It seems to me that the end-user doesn't care, and just wants the file-system to work as expected. I don't think we're really talking about the end-user.

No, this is exactly about end-user expectations. Imagine a complicated
heavy-loaded shared storage where handling FUSE_RELEASE in userspace may
take a few minutes. In close_wait=0 case, an end-user who has just
called close(2) has no idea when it's safe to access the file from
another node or even when it's OK to umount filesystem.

>
> The implementor of a file-system, on the other hand, might want the semantics for close_wait on some files, but not on others. Won't there be a performance impact? Some distributed file-systems might want this on specific files only. Implementing it as a flag on the struct fuse_file_info gives the flexibility to the file-system implementor.

fuse_file_info is an userspace structure, in-kernel fuse knows nothing
about it. In close_wait=1 case, nothing prevents a file-system
implementation from ACK-ing FUSE_RELEASE request immediately (for
specific files) and schedule actual handling for future processing.

Thanks,
Maxim

2014-06-09 11:11:11

by John Muir

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH 0/5] fuse: close file synchronously (v2)

On 2014.06.09, at 12:46 , Maxim Patlasov <[email protected]> wrote:

> On 06/09/2014 01:26 PM, John Muir wrote:
>> On 2014.06.09, at 9:50 , Maxim Patlasov <[email protected]> wrote:
>>
>>> On 06/06/2014 05:51 PM, John Muir wrote:
>>>> On 2014.06.06, at 15:27 , Maxim Patlasov <[email protected]> wrote:
>>>>
>>>>> The patch-set resolves the problem by making fuse_release synchronous:
>>>>> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.
>>>> Why not make this feature per-file with a new flag bit in struct fuse_file_info rather than as a file-system global?
>>> I don't expect a great demand for such a granularity. File-system global "close_wait" conveys a general user expectation about filesystem behaviour in distributed environment: if you stopped using a file on given node, whether it means that the file is immediately accessible from another node.
>>>
>> By user do you mean the end-user, or the implementor of the file-system? It seems to me that the end-user doesn't care, and just wants the file-system to work as expected. I don't think we're really talking about the end-user.
>
> No, this is exactly about end-user expectations. Imagine a complicated heavy-loaded shared storage where handling FUSE_RELEASE in userspace may take a few minutes. In close_wait=0 case, an end-user who has just called close(2) has no idea when it's safe to access the file from another node or even when it's OK to umount filesystem.

I think we're saying the same thing here from different perspectives. The end-user wants the file-system to operate with the semantics you describe, but I don't think it makes sense to give the end-user control over those semantics. The file-system itself should be implemented that way, or not, or per-file

If it's a read-only file, then does this not add the overhead of having the kernel wait for the user-space file-system process to respond before closing it. In my experience, there is actually significant cost to the kernel to user-space messaging in FUSE when manipulating thousands of files.

>
>>
>> The implementor of a file-system, on the other hand, might want the semantics for close_wait on some files, but not on others. Won't there be a performance impact? Some distributed file-systems might want this on specific files only. Implementing it as a flag on the struct fuse_file_info gives the flexibility to the file-system implementor.
>
> fuse_file_info is an userspace structure, in-kernel fuse knows nothing about it. In close_wait=1 case, nothing prevents a file-system implementation from ACK-ing FUSE_RELEASE request immediately (for specific files) and schedule actual handling for future processing.

Of course you know I meant that you'd add another flag to both fuse_file_info, and in the kernel equivalent for those flags which is struct fuse_open_out -> open_flags. This is where other such per file options are specified such as whether or not to keep the in-kernal cache for a file, whether or not to allow direct-io, and whether or not to allow seek.

Anyway, I guess you're the one doing all the work on this and if you have a particular implementation that doesn't require such fine-grained control, and no one else does then it's up to you. I'm just trying to show an alternative implementation that gives the file-system implementor more control while keeping the ability to meet user expectations.

Regards,

John.

--
John Muir - [email protected]
+32 491 64 22 76

2014-06-09 12:00:17

by Maxim Patlasov

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH 0/5] fuse: close file synchronously (v2)

On 06/09/2014 03:11 PM, John Muir wrote:
> On 2014.06.09, at 12:46 , Maxim Patlasov <[email protected]> wrote:
>
>> On 06/09/2014 01:26 PM, John Muir wrote:
>>> On 2014.06.09, at 9:50 , Maxim Patlasov <[email protected]> wrote:
>>>
>>>> On 06/06/2014 05:51 PM, John Muir wrote:
>>>>> On 2014.06.06, at 15:27 , Maxim Patlasov <[email protected]> wrote:
>>>>>
>>>>>> The patch-set resolves the problem by making fuse_release synchronous:
>>>>>> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.
>>>>> Why not make this feature per-file with a new flag bit in struct fuse_file_info rather than as a file-system global?
>>>> I don't expect a great demand for such a granularity. File-system global "close_wait" conveys a general user expectation about filesystem behaviour in distributed environment: if you stopped using a file on given node, whether it means that the file is immediately accessible from another node.
>>>>
>>> By user do you mean the end-user, or the implementor of the file-system? It seems to me that the end-user doesn't care, and just wants the file-system to work as expected. I don't think we're really talking about the end-user.
>> No, this is exactly about end-user expectations. Imagine a complicated heavy-loaded shared storage where handling FUSE_RELEASE in userspace may take a few minutes. In close_wait=0 case, an end-user who has just called close(2) has no idea when it's safe to access the file from another node or even when it's OK to umount filesystem.
> I think we're saying the same thing here from different perspectives. The end-user wants the file-system to operate with the semantics you describe, but I don't think it makes sense to give the end-user control over those semantics. The file-system itself should be implemented that way, or not, or per-file
>
> If it's a read-only file, then does this not add the overhead of having the kernel wait for the user-space file-system process to respond before closing it. In my experience, there is actually significant cost to the kernel to user-space messaging in FUSE when manipulating thousands of files.
>
>>> The implementor of a file-system, on the other hand, might want the semantics for close_wait on some files, but not on others. Won't there be a performance impact? Some distributed file-systems might want this on specific files only. Implementing it as a flag on the struct fuse_file_info gives the flexibility to the file-system implementor.
>> fuse_file_info is an userspace structure, in-kernel fuse knows nothing about it. In close_wait=1 case, nothing prevents a file-system implementation from ACK-ing FUSE_RELEASE request immediately (for specific files) and schedule actual handling for future processing.
> Of course you know I meant that you'd add another flag to both fuse_file_info, and in the kernel equivalent for those flags which is struct fuse_open_out -> open_flags. This is where other such per file options are specified such as whether or not to keep the in-kernal cache for a file, whether or not to allow direct-io, and whether or not to allow seek.
>
> Anyway, I guess you're the one doing all the work on this and if you have a particular implementation that doesn't require such fine-grained control, and no one else does then it's up to you. I'm just trying to show an alternative implementation that gives the file-system implementor more control while keeping the ability to meet user expectations.

Thank you, John. That's really depends on whether someone else wants
fine-grained control or not. I'm generally OK to re-work the patch-set
if more requesters emerge.

Thanks,
Maxim