2013-03-21 14:03:41

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 0/4] fuse: fix accounting background requests (v2)

Hi,

The feature was added long time ago (commit 08a53cdc...) with the comment:

> A task may have at most one synchronous request allocated. So these requests
> need not be otherwise limited.
>
> However the number of background requests (release, forget, asynchronous
> reads, interrupted requests) can grow indefinitely. This can be used by a
> malicous user to cause FUSE to allocate arbitrary amounts of unswappable
> kernel memory, denying service.
>
> For this reason add a limit for the number of background requests, and block
> allocations of new requests until the number goes bellow the limit.

However, the implementation suffers from the following problems:

1. Latency of synchronous requests. As soon as fc->num_background hits the
limit, all allocations are blocked: both for synchronous and background
requests. This is unnecessary - as the comment cited above states, synchronous
requests need not be limited (by fuse). Moreover, sometimes it's very
inconvenient. For example, a dozen of tasks aggressively writing to mmap()-ed
area may block 'ls' for long while (>1min in my experiments).

2. Thundering herd problem. When fc->num_background falls below the limit,
request_end() calls wake_up_all(&fc->blocked_waitq). This wakes up all waiters
while it's not impossible that the first waiter getting new request will
immediately put it to background increasing fc->num_background again.
(experimenting with mmap()-ed writes I observed 2x slowdown as compared with
fuse after applying this patch-set)

The patch-set re-works fuse_get_req (and its callers) to throttle only requests
intended for background processing. Having this done, it becomes possible to
use exclusive wakeups in chained manner: request_end() wakes up a waiter,
the waiter allocates new request and submits it for background processing,
the processing ends in request_end() where another wakeup happens an so on.

Changed in v2:
- rebased on for-next branch of the fuse tree
- fixed race when processing request begins before init-reply came

Thanks,
Maxim

---

Maxim V. Patlasov (4):
fuse: make request allocations for background processing explicit
fuse: add flag fc->uninitialized
fuse: skip blocking on allocations of synchronous requests
fuse: implement exclusive wakeup for blocked_waitq


fs/fuse/cuse.c | 3 ++
fs/fuse/dev.c | 69 +++++++++++++++++++++++++++++++++++++++++++-----------
fs/fuse/file.c | 6 +++--
fs/fuse/fuse_i.h | 8 ++++++
fs/fuse/inode.c | 4 +++
5 files changed, 73 insertions(+), 17 deletions(-)

--
Signature


2013-03-21 14:03:53

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 1/4] fuse: make request allocations for background processing explicit

There are two types of processing requests in FUSE: synchronous (via
fuse_request_send()) and asynchronous (via adding to fc->bg_queue).

Fortunately, the type of processing is always known in advance, at the time
of request allocation. This preparatory patch utilizes this fact making
fuse_get_req() aware about the type. Next patches will use it.

Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/cuse.c | 2 +-
fs/fuse/dev.c | 24 +++++++++++++++++++++---
fs/fuse/file.c | 6 ++++--
fs/fuse/fuse_i.h | 4 ++++
fs/fuse/inode.c | 1 +
5 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 6f96a8d..b7c7f30 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -422,7 +422,7 @@ static int cuse_send_init(struct cuse_conn *cc)

BUILD_BUG_ON(CUSE_INIT_INFO_MAX > PAGE_SIZE);

- req = fuse_get_req(fc, 1);
+ req = fuse_get_req_for_background(fc, 1);
if (IS_ERR(req)) {
rc = PTR_ERR(req);
goto err;
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index e9bdec0..512626f 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -86,7 +86,10 @@ EXPORT_SYMBOL_GPL(fuse_request_alloc);

struct fuse_req *fuse_request_alloc_nofs(unsigned npages)
{
- return __fuse_request_alloc(npages, GFP_NOFS);
+ struct fuse_req *req = __fuse_request_alloc(npages, GFP_NOFS);
+ if (req)
+ req->background = 1; /* writeback always goes to bg_queue */
+ return req;
}

void fuse_request_free(struct fuse_req *req)
@@ -130,7 +133,8 @@ static void fuse_req_init_context(struct fuse_req *req)
req->in.h.pid = current->pid;
}

-struct fuse_req *fuse_get_req(struct fuse_conn *fc, unsigned npages)
+struct fuse_req *fuse_get_req_internal(struct fuse_conn *fc, unsigned npages,
+ bool for_background)
{
struct fuse_req *req;
sigset_t oldset;
@@ -156,14 +160,27 @@ struct fuse_req *fuse_get_req(struct fuse_conn *fc, unsigned npages)

fuse_req_init_context(req);
req->waiting = 1;
+ req->background = for_background;
return req;

out:
atomic_dec(&fc->num_waiting);
return ERR_PTR(err);
}
+
+struct fuse_req *fuse_get_req(struct fuse_conn *fc, unsigned npages)
+{
+ return fuse_get_req_internal(fc, npages, 0);
+}
EXPORT_SYMBOL_GPL(fuse_get_req);

+struct fuse_req *fuse_get_req_for_background(struct fuse_conn *fc,
+ unsigned npages)
+{
+ return fuse_get_req_internal(fc, npages, 1);
+}
+EXPORT_SYMBOL_GPL(fuse_get_req_for_background);
+
/*
* Return request in fuse_file->reserved_req. However that may
* currently be in use. If that is the case, wait for it to become
@@ -442,6 +459,7 @@ __acquires(fc->lock)

static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req)
{
+ BUG_ON(req->background);
spin_lock(&fc->lock);
if (!fc->connected)
req->out.h.error = -ENOTCONN;
@@ -469,7 +487,7 @@ EXPORT_SYMBOL_GPL(fuse_request_send);
static void fuse_request_send_nowait_locked(struct fuse_conn *fc,
struct fuse_req *req)
{
- req->background = 1;
+ BUG_ON(!req->background);
fc->num_background++;
if (fc->num_background == fc->max_background)
fc->blocked = 1;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index c807176..097d48f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -131,6 +131,7 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
fuse_put_request(ff->fc, req);
} else {
req->end = fuse_release_end;
+ req->background = 1;
fuse_request_send_background(ff->fc, req);
}
kfree(ff);
@@ -661,7 +662,8 @@ static int fuse_readpages_fill(void *_data, struct page *page)
int nr_alloc = min_t(unsigned, data->nr_pages,
FUSE_MAX_PAGES_PER_REQ);
fuse_send_readpages(req, data->file);
- data->req = req = fuse_get_req(fc, nr_alloc);
+ data->req = req = fuse_get_req_internal(fc, nr_alloc,
+ fc->async_read);
if (IS_ERR(req)) {
unlock_page(page);
return PTR_ERR(req);
@@ -696,7 +698,7 @@ static int fuse_readpages(struct file *file, struct address_space *mapping,

data.file = file;
data.inode = inode;
- data.req = fuse_get_req(fc, nr_alloc);
+ data.req = fuse_get_req_internal(fc, nr_alloc, fc->async_read);
data.nr_pages = nr_pages;
err = PTR_ERR(data.req);
if (IS_ERR(data.req))
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 6aeba86..457f62e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -708,6 +708,10 @@ void fuse_request_free(struct fuse_req *req);
* caller should specify # elements in req->pages[] explicitly
*/
struct fuse_req *fuse_get_req(struct fuse_conn *fc, unsigned npages);
+struct fuse_req *fuse_get_req_for_background(struct fuse_conn *fc,
+ unsigned npages);
+struct fuse_req *fuse_get_req_internal(struct fuse_conn *fc, unsigned npages,
+ bool for_background);

/**
* Get a request, may fail with -ENOMEM,
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 01353ed..de0bee0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1043,6 +1043,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
init_req = fuse_request_alloc(0);
if (!init_req)
goto err_put_root;
+ init_req->background = 1;

if (is_bdev) {
fc->destroy_req = fuse_request_alloc(0);

2013-03-21 14:04:03

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 2/4] fuse: add flag fc->uninitialized

Existing flag fc->blocked is used to suspend request allocation both in case
of many background request submitted and period of time before init_reply
arrives from userspace. Next patch will skip blocking allocations of
synchronous request (disregarding fc->blocked). This is mostly OK, but
we still need to suspend allocations if init_reply is not arrived yet. The
patch introduces flag fc->uninitialized which will serve this purpose.

Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/cuse.c | 1 +
fs/fuse/dev.c | 1 +
fs/fuse/fuse_i.h | 4 ++++
fs/fuse/inode.c | 3 +++
4 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index b7c7f30..8fe0998 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -505,6 +505,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)

cc->fc.connected = 1;
cc->fc.blocked = 0;
+ cc->fc.uninitialized = 0;
rc = cuse_send_init(cc);
if (rc) {
fuse_conn_put(&cc->fc);
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 512626f..6137650 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2089,6 +2089,7 @@ void fuse_abort_conn(struct fuse_conn *fc)
if (fc->connected) {
fc->connected = 0;
fc->blocked = 0;
+ fc->uninitialized = 0;
end_io_requests(fc);
end_queued_requests(fc);
end_polls(fc);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 457f62e..e893126 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -417,6 +417,10 @@ struct fuse_conn {
/** Batching of FORGET requests (positive indicates FORGET batch) */
int forget_batch;

+ /** Flag indicating that INIT reply is not received yet. Allocating
+ * any fuse request will be suspended until the flag is cleared */
+ int uninitialized;
+
/** Flag indicating if connection is blocked. This will be
the case before the INIT reply is received, and if there
are too many outstading backgrounds requests */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index de0bee0..0d14f03 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -362,6 +362,7 @@ void fuse_conn_kill(struct fuse_conn *fc)
spin_lock(&fc->lock);
fc->connected = 0;
fc->blocked = 0;
+ fc->uninitialized = 0;
spin_unlock(&fc->lock);
/* Flush all readers on this fs */
kill_fasync(&fc->fasync, SIGIO, POLL_IN);
@@ -582,6 +583,7 @@ void fuse_conn_init(struct fuse_conn *fc)
fc->polled_files = RB_ROOT;
fc->reqctr = 0;
fc->blocked = 1;
+ fc->uninitialized = 1;
fc->attr_version = 1;
get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
}
@@ -881,6 +883,7 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req)
fc->conn_init = 1;
}
fc->blocked = 0;
+ fc->uninitialized = 0;
wake_up_all(&fc->blocked_waitq);
}

2013-03-21 14:04:16

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 3/4] fuse: skip blocking on allocations of synchronous requests

Miklos wrote:

> A task may have at most one synchronous request allocated. So these
> requests need not be otherwise limited.

The patch re-works fuse_get_req() to follow this idea.

Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/dev.c | 26 ++++++++++++++++++--------
1 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6137650..1f7ce89 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -137,17 +137,27 @@ struct fuse_req *fuse_get_req_internal(struct fuse_conn *fc, unsigned npages,
bool for_background)
{
struct fuse_req *req;
- sigset_t oldset;
- int intr;
int err;
+ int *flag_p = NULL;

atomic_inc(&fc->num_waiting);
- block_sigs(&oldset);
- intr = wait_event_interruptible(fc->blocked_waitq, !fc->blocked);
- restore_sigs(&oldset);
- err = -EINTR;
- if (intr)
- goto out;
+
+ if (for_background)
+ flag_p = &fc->blocked;
+ else if (fc->uninitialized)
+ flag_p = &fc->uninitialized;
+
+ if (flag_p) {
+ sigset_t oldset;
+ int intr;
+
+ block_sigs(&oldset);
+ intr = wait_event_interruptible(fc->blocked_waitq, !*flag_p);
+ restore_sigs(&oldset);
+ err = -EINTR;
+ if (intr)
+ goto out;
+ }

err = -ENOTCONN;
if (!fc->connected)

2013-03-21 14:04:26

by Maxim Patlasov

[permalink] [raw]
Subject: [PATCH 4/4] fuse: implement exclusive wakeup for blocked_waitq

The patch solves thundering herd problem. So far as previous patches ensured
that only allocations for background may block, it's safe to wake up one
waiter. Whoever it is, it will wake up another one in request_end() afterwards.

Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/dev.c | 20 ++++++++++++++++----
1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1f7ce89..ea99e2a 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -152,7 +152,8 @@ struct fuse_req *fuse_get_req_internal(struct fuse_conn *fc, unsigned npages,
int intr;

block_sigs(&oldset);
- intr = wait_event_interruptible(fc->blocked_waitq, !*flag_p);
+ intr = wait_event_interruptible_exclusive(fc->blocked_waitq,
+ !*flag_p);
restore_sigs(&oldset);
err = -EINTR;
if (intr)
@@ -265,6 +266,13 @@ struct fuse_req *fuse_get_req_nofail_nopages(struct fuse_conn *fc,
void fuse_put_request(struct fuse_conn *fc, struct fuse_req *req)
{
if (atomic_dec_and_test(&req->count)) {
+ if (unlikely(req->background)) {
+ spin_lock(&fc->lock);
+ if (!fc->blocked)
+ wake_up(&fc->blocked_waitq);
+ spin_unlock(&fc->lock);
+ }
+
if (req->waiting)
atomic_dec(&fc->num_waiting);

@@ -362,10 +370,14 @@ __releases(fc->lock)
list_del(&req->intr_entry);
req->state = FUSE_REQ_FINISHED;
if (req->background) {
- if (fc->num_background == fc->max_background) {
+ req->background = 0;
+
+ if (fc->num_background == fc->max_background)
fc->blocked = 0;
- wake_up_all(&fc->blocked_waitq);
- }
+
+ if (!fc->blocked)
+ wake_up(&fc->blocked_waitq);
+
if (fc->num_background == fc->congestion_threshold &&
fc->connected && fc->bdi_initialized) {
clear_bdi_congested(&fc->bdi, BLK_RW_SYNC);

2013-04-11 11:18:58

by Maxim Patlasov

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH 0/4] fuse: fix accounting background requests (v2)

Hi Miklos,

Any feedback would be highly appreciated.

Thanks,
Maxim

03/21/2013 06:01 PM, Maxim V. Patlasov пишет:
> Hi,
>
> The feature was added long time ago (commit 08a53cdc...) with the comment:
>
>> A task may have at most one synchronous request allocated. So these requests
>> need not be otherwise limited.
>>
>> However the number of background requests (release, forget, asynchronous
>> reads, interrupted requests) can grow indefinitely. This can be used by a
>> malicous user to cause FUSE to allocate arbitrary amounts of unswappable
>> kernel memory, denying service.
>>
>> For this reason add a limit for the number of background requests, and block
>> allocations of new requests until the number goes bellow the limit.
> However, the implementation suffers from the following problems:
>
> 1. Latency of synchronous requests. As soon as fc->num_background hits the
> limit, all allocations are blocked: both for synchronous and background
> requests. This is unnecessary - as the comment cited above states, synchronous
> requests need not be limited (by fuse). Moreover, sometimes it's very
> inconvenient. For example, a dozen of tasks aggressively writing to mmap()-ed
> area may block 'ls' for long while (>1min in my experiments).
>
> 2. Thundering herd problem. When fc->num_background falls below the limit,
> request_end() calls wake_up_all(&fc->blocked_waitq). This wakes up all waiters
> while it's not impossible that the first waiter getting new request will
> immediately put it to background increasing fc->num_background again.
> (experimenting with mmap()-ed writes I observed 2x slowdown as compared with
> fuse after applying this patch-set)
>
> The patch-set re-works fuse_get_req (and its callers) to throttle only requests
> intended for background processing. Having this done, it becomes possible to
> use exclusive wakeups in chained manner: request_end() wakes up a waiter,
> the waiter allocates new request and submits it for background processing,
> the processing ends in request_end() where another wakeup happens an so on.
>
> Changed in v2:
> - rebased on for-next branch of the fuse tree
> - fixed race when processing request begins before init-reply came
>
> Thanks,
> Maxim
>
> ---
>
> Maxim V. Patlasov (4):
> fuse: make request allocations for background processing explicit
> fuse: add flag fc->uninitialized
> fuse: skip blocking on allocations of synchronous requests
> fuse: implement exclusive wakeup for blocked_waitq
>
>
> fs/fuse/cuse.c | 3 ++
> fs/fuse/dev.c | 69 +++++++++++++++++++++++++++++++++++++++++++-----------
> fs/fuse/file.c | 6 +++--
> fs/fuse/fuse_i.h | 8 ++++++
> fs/fuse/inode.c | 4 +++
> 5 files changed, 73 insertions(+), 17 deletions(-)
>