2017-12-20 11:39:21

by Paolo Valente

Subject: [PATCH IMPROVEMENT/BUGFIX 0/4] remove start-up time outlier caused by wrong detection of cooperating processes

Hi,
the main patch in this series ("block, bfq: let a queue be merged only
shortly after starting I/O") eliminates an outlier in the application
start-up time guaranteed by BFQ. This outlier occurs more or less
frequently, depending on the characteristics of the system, and is
caused by a wrong detection of so-called cooperating processes
(details in the commit message). The main patch is preceded by two
patches that fix two bugs found while working on this problem, and is
followed by a further optimization patch that removes an operation
made superfluous by the main patch.

Jens, I've not forgotten our decision to make a patch that allows
blkio stats (related to the proportional-share policy) not to be
enabled at boot, or when the CFQ or BFQ modules are loaded. It's just
that we have already prepared the present patches, plus a few other
patches for improving BFQ and fixing bugs, and I'd like to clear this
backlog first.

In this respect, after a patch for boosting throughput more reliably
with cooperating processes, I'll send out a patch to solve an
important read starvation problem. If I'm not making a blunder, this
problem affects every I/O scheduler in blk-mq. As a first step, I'll
propose a fix for BFQ. If the fix is ok, I'm willing to port it to the
other schedulers.

Thanks,
Paolo

Angelo Ruocco (2):
block, bfq: check low_latency flag in bfq_bfqq_save_state()
block, bfq: remove superfluous check in queue-merging setup

Paolo Valente (2):
block, bfq: add missing rq_pos_tree update on rq removal
block, bfq: let a queue be merged only shortly after starting I/O

block/bfq-iosched.c | 98 ++++++++++++++++++++++++++++++-----------------------
block/bfq-iosched.h | 2 ++
block/bfq-wf2q.c | 4 +++
3 files changed, 61 insertions(+), 43 deletions(-)

--
2.15.1


2017-12-20 11:39:30

by Paolo Valente

Subject: [PATCH IMPROVEMENT/BUGFIX 1/4] block, bfq: add missing rq_pos_tree update on rq removal

If two processes do I/O close to each other, then BFQ merges the
bfq_queues associated with these processes, to obtain more sequential
I/O and thus higher throughput. In this respect, to detect whether
two processes are doing I/O close to each other, BFQ keeps a list of
the head-of-line I/O requests of all active bfq_queues. The list is
ordered by initial sector, and is implemented through a red-black tree
(rq_pos_tree).
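
For illustration, here is a minimal sketch of such a sector-ordered
position tree, using the kernel rb-tree API. The structure and
function names below are hypothetical, not the actual BFQ code:

#include <linux/rbtree.h>
#include <linux/types.h>

struct pos_entry {
	struct rb_node node;
	sector_t head_sector;	/* sector of the queue's head-of-line request */
};

static void pos_tree_insert(struct rb_root *root, struct pos_entry *pe)
{
	struct rb_node **p = &root->rb_node, *parent = NULL;

	while (*p) {
		struct pos_entry *cur = rb_entry(*p, struct pos_entry, node);

		parent = *p;
		if (pe->head_sector < cur->head_sector)
			p = &(*p)->rb_left;
		else
			p = &(*p)->rb_right;
	}

	/* link the new node and rebalance the tree */
	rb_link_node(&pe->node, parent, p);
	rb_insert_color(&pe->node, root);
}

Whenever the head-of-line request of a queue changes, the queue's
entry has to be re-inserted with the new key, or removed if the queue
becomes empty.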

Unfortunately, the update of the rq_pos_tree was incomplete: the tree
was not updated on the removal of the head-of-line I/O request of a
bfq_queue, in case the queue did not become empty after the removal.
This commit adds the missing update.

Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Angelo Ruocco <[email protected]>
---
block/bfq-iosched.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a9e06217e64d..c6f0b93a769c 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1627,6 +1627,8 @@ static void bfq_remove_request(struct request_queue *q,
rb_erase(&bfqq->pos_node, bfqq->pos_root);
bfqq->pos_root = NULL;
}
+ } else {
+ bfq_pos_tree_add_move(bfqd, bfqq);
}

if (rq->cmd_flags & REQ_META)
--
2.15.1

2017-12-20 11:39:35

by Paolo Valente

Subject: [PATCH IMPROVEMENT/BUGFIX 2/4] block, bfq: check low_latency flag in bfq_bfqq_save_state()

From: Angelo Ruocco <[email protected]>

A just-created bfq_queue will certainly be deemed interactive on
the arrival of its first I/O request, if the low_latency flag is
set. Yet, if the queue is merged with another queue on the arrival of
its first I/O request, it will not have the chance to be flagged as
interactive. Nevertheless, if the queue is then split soon enough, it
has to be flagged as interactive after the split.

To handle this early-merge scenario correctly, BFQ saves the state of
the queue, on the merge, as if the queue had already been deemed
interactive. So, if the queue is split soon afterwards, it will get
weight-raised, because the previous state of the queue is resumed on
the split.

Unfortunately, in the act of saving the state of the newly-created
queue, BFQ doesn't check whether the low_latency flag is set, and this
causes early-merged queues to be weight-raised on queue splits even if
low_latency is off. This commit addresses this problem by adding the
missing check.
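
For clarity, a rough sketch of the intended save-on-merge /
resume-on-split logic, including the low_latency guard added by this
commit, could look as follows (hypothetical names and an illustrative
weight-raising value, not the actual BFQ code):

#include <linux/types.h>

struct saved_state {
	bool interactive;	/* queue deserves interactive weight raising */
};

/* called when a just-created queue is merged into another queue */
static void save_state_on_merge(struct saved_state *st, bool just_created,
				bool in_large_burst, bool low_latency)
{
	/* flag the queue as interactive only if low_latency is enabled */
	st->interactive = just_created && !in_large_burst && low_latency;
}

/* called when the queue is split again from the merged queue */
static void resume_state_on_split(const struct saved_state *st,
				  unsigned int *wr_coeff)
{
	if (st->interactive)
		*wr_coeff = 30;	/* illustrative weight-raising coefficient */
}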

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c6f0b93a769c..f58367ef98c1 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2064,7 +2064,8 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
if (unlikely(bfq_bfqq_just_created(bfqq) &&
- !bfq_bfqq_in_large_burst(bfqq))) {
+ !bfq_bfqq_in_large_burst(bfqq) &&
+ bfqq->bfqd->low_latency)) {
/*
* bfqq being merged right after being created: bfqq
* would have deserved interactive weight raising, but
--
2.15.1

2017-12-20 11:39:38

by Paolo Valente

Subject: [PATCH IMPROVEMENT/BUGFIX 3/4] block, bfq: let a queue be merged only shortly after starting I/O

In BFQ and CFQ, two processes are said to be cooperating if they do
I/O in such a way that the union of their I/O requests yields a
sequential I/O pattern. To get such a sequential I/O pattern out of
the non-sequential pattern of each cooperating process, BFQ and CFQ
merge the queues associated with these processes. In more detail,
cooperating processes, and thus their associated queues, usually
start, or restart, to do I/O shortly after each other. This is the
case, e.g., for the I/O threads of KVM/QEMU and of the dump
utility. Based on this assumption, this commit allows a bfq_queue to
be merged only during a short time interval (100ms) after it starts,
or restarts, to do I/O. This filtering provides two important
benefits.

First, it greatly reduces the probability that two non-cooperating
processes have their queues merged by mistake, if they just happen to
do I/O close to each other for a short time interval. These spurious
merges cause loss of service guarantees. A low-weight bfq_queue may
unjustly get more than its expected share of the throughput: if such a
low-weight queue is merged with a high-weight queue, then the I/O for
the low-weight queue is served as if the queue had a high weight. This
may damage other high-weight queues unexpectedly. For instance,
because of this issue, lxterminal occasionally took 7.5 seconds to
start, instead of 6.5 seconds, when some sequential readers and
writers did I/O in the background on a FUJITSU MHX2300BT HDD. The
reason is that the bfq_queues associated with some of the readers or
the writers were merged with the high-weight queues of some processes
that had to do some urgent but little I/O. The readers then exploited
the inherited high weight for all or most of their I/O during the
start-up of the terminal. The filtering introduced by this commit
eliminated any outlier caused by spurious queue merges in our start-up
time tests.

This filtering also provides a small boost to the throughput
sustainable by BFQ: 3-4%, depending on the CPU. The reason is that,
once a bfq_queue can no longer be merged, this commit makes BFQ stop
updating the data needed to handle merging for that queue.
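
For readers less familiar with jiffies-based time checks, here is a
minimal sketch of the 100ms merge window described above (assumed
names; the real check is bfq_too_late_for_merging() in the diff
below):

#include <linux/jiffies.h>
#include <linux/types.h>

/* merge window: HZ/10 jiffies, i.e. roughly 100ms */
static const unsigned long merge_time_limit = HZ / 10;

/*
 * Return true if the queue has already received service and its first
 * I/O happened more than merge_time_limit jiffies ago, i.e., the merge
 * window has expired.
 */
static bool queue_too_late_for_merging(unsigned long first_io_time,
				       unsigned long service_received)
{
	return service_received > 0 &&
	       time_is_before_jiffies(first_io_time + merge_time_limit);
}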

Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Angelo Ruocco <[email protected]>
---
block/bfq-iosched.c | 57 ++++++++++++++++++++++++++++++++++++++++++-----------
block/bfq-iosched.h | 2 ++
block/bfq-wf2q.c | 4 ++++
3 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index f58367ef98c1..320022135dc8 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -166,6 +166,20 @@ static const int bfq_async_charge_factor = 10;
/* Default timeout values, in jiffies, approximating CFQ defaults. */
const int bfq_timeout = HZ / 8;

+/*
+ * Time limit for merging (see comments in bfq_setup_cooperator). Set
+ * to the slowest value that, in our tests, proved to be effective in
+ * removing false positives, while not causing true positives to miss
+ * queue merging.
+ *
+ * As can be deduced from the low time limit below, queue merging, if
+ * successful, happens at the very beginning of the I/O of the involved
+ * cooperating processes, as a consequence of the arrival of the very
+ * first requests from each cooperator. After that, there is very
+ * little chance to find cooperators.
+ */
+static const unsigned long bfq_merge_time_limit = HZ/10;
+
static struct kmem_cache *bfq_pool;

/* Below this threshold (in ns), we consider thinktime immediate. */
@@ -444,6 +458,13 @@ bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
return bfqq;
}

+static bool bfq_too_late_for_merging(struct bfq_queue *bfqq)
+{
+ return bfqq->service_from_backlogged > 0 &&
+ time_is_before_jiffies(bfqq->first_IO_time +
+ bfq_merge_time_limit);
+}
+
void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
{
struct rb_node **p, *parent;
@@ -454,6 +475,14 @@ void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
bfqq->pos_root = NULL;
}

+ /*
+ * bfqq cannot be merged any longer (see comments in
+ * bfq_setup_cooperator): no point in adding bfqq into the
+ * position tree.
+ */
+ if (bfq_too_late_for_merging(bfqq))
+ return;
+
if (bfq_class_idle(bfqq))
return;
if (!bfqq->next_rq)
@@ -1935,6 +1964,9 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
static bool bfq_may_be_close_cooperator(struct bfq_queue *bfqq,
struct bfq_queue *new_bfqq)
{
+ if (bfq_too_late_for_merging(new_bfqq))
+ return false;
+
if (bfq_class_idle(bfqq) || bfq_class_idle(new_bfqq) ||
(bfqq->ioprio_class != new_bfqq->ioprio_class))
return false;
@@ -2003,6 +2035,20 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
{
struct bfq_queue *in_service_bfqq, *new_bfqq;

+ /*
+ * Prevent bfqq from being merged if it has been created too
+ * long ago. The idea is that true cooperating processes, and
+ * thus their associated bfq_queues, are supposed to be
+ * created shortly after each other. This is the case, e.g.,
+ * for KVM/QEMU and dump I/O threads. Based on this
+ * assumption, the following filtering greatly reduces the
+ * probability that two non-cooperating processes, which just
+ * happen to do close I/O for some short time interval, have
+ * their queues merged by mistake.
+ */
+ if (bfq_too_late_for_merging(bfqq))
+ return NULL;
+
if (bfqq->new_bfqq)
return bfqq->new_bfqq;

@@ -3044,17 +3090,6 @@ void bfq_bfqq_expire(struct bfq_data *bfqd,
*/
slow = bfq_bfqq_is_slow(bfqd, bfqq, compensate, reason, &delta);

- /*
- * Increase service_from_backlogged before next statement,
- * because the possible next invocation of
- * bfq_bfqq_charge_time would likely inflate
- * entity->service. In contrast, service_from_backlogged must
- * contain real service, to enable the soft real-time
- * heuristic to correctly compute the bandwidth consumed by
- * bfqq.
- */
- bfqq->service_from_backlogged += entity->service;
-
/*
* As above explained, charge slow (typically seeky) and
* timed-out queues with the time and not the service
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 91c4390903a1..5d47b58d5fc8 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -344,6 +344,8 @@ struct bfq_queue {
unsigned long wr_start_at_switch_to_srt;

unsigned long split_time; /* time of last split */
+
+ unsigned long first_IO_time; /* time of first I/O for this queue */
};

/**
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index e495d3f9b4b0..4456eda34e48 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -835,6 +835,10 @@ void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
struct bfq_entity *entity = &bfqq->entity;
struct bfq_service_tree *st;

+ if (!bfqq->service_from_backlogged)
+ bfqq->first_IO_time = jiffies;
+
+ bfqq->service_from_backlogged += served;
for_each_entity(entity) {
st = bfq_entity_service_tree(entity);

--
2.15.1

2017-12-20 11:40:09

by Paolo Valente

Subject: [PATCH IMPROVEMENT/BUGFIX 4/4] block, bfq: remove superfluous check in queue-merging setup

From: Angelo Ruocco <[email protected]>

When two or more processes do I/O in such a way that their requests are
sequential with respect to one another, BFQ merges the bfq_queues
associated with the processes. This way the overall I/O pattern becomes
sequential, and thus there is a boost in throughput.
These cooperating processes usually start or restart to do I/O shortly
after each other. So, in order to avoid merging non-cooperating processes,
BFQ ensures that none of these queues has been in weight raising for too
long.

In this respect, since the preceding commit "block, bfq: let a queue be
merged only shortly after starting I/O", BFQ checks whether any queue
(and not only weight-raised ones) has been doing I/O continuously for
too long to be merged.

This new additional check makes the first one useless: a queue that has
been doing I/O for long enough, if weight-raised, is also a queue that
has been in weight raising for too long to be merged. Accordingly, this
commit removes the first check.

Signed-off-by: Angelo Ruocco <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 36 +++++-------------------------------
1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 320022135dc8..c66578592c9e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1990,20 +1990,6 @@ static bool bfq_may_be_close_cooperator(struct bfq_queue *bfqq,
return true;
}

-/*
- * If this function returns true, then bfqq cannot be merged. The idea
- * is that true cooperation happens very early after processes start
- * to do I/O. Usually, late cooperations are just accidental false
- * positives. In case bfqq is weight-raised, such false positives
- * would evidently degrade latency guarantees for bfqq.
- */
-static bool wr_from_too_long(struct bfq_queue *bfqq)
-{
- return bfqq->wr_coeff > 1 &&
- time_is_before_jiffies(bfqq->last_wr_start_finish +
- msecs_to_jiffies(100));
-}
-
/*
* Attempt to schedule a merge of bfqq with the currently in-service
* queue or with a close queue among the scheduled queues. Return
@@ -2017,11 +2003,6 @@ static bool wr_from_too_long(struct bfq_queue *bfqq)
* to maintain. Besides, in such a critical condition as an out of memory,
* the benefits of queue merging may be little relevant, or even negligible.
*
- * Weight-raised queues can be merged only if their weight-raising
- * period has just started. In fact cooperating processes are usually
- * started together. Thus, with this filter we avoid false positives
- * that would jeopardize low-latency guarantees.
- *
* WARNING: queue merging may impair fairness among non-weight raised
* queues, for at least two reasons: 1) the original weight of a
* merged queue may change during the merged state, 2) even being the
@@ -2052,9 +2033,7 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
if (bfqq->new_bfqq)
return bfqq->new_bfqq;

- if (!io_struct ||
- wr_from_too_long(bfqq) ||
- unlikely(bfqq == &bfqd->oom_bfqq))
+ if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
return NULL;

/* If there is only one backlogged queue, don't search. */
@@ -2063,12 +2042,9 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,

in_service_bfqq = bfqd->in_service_queue;

- if (!in_service_bfqq || in_service_bfqq == bfqq
- || wr_from_too_long(in_service_bfqq) ||
- unlikely(in_service_bfqq == &bfqd->oom_bfqq))
- goto check_scheduled;
-
- if (bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
+ if (in_service_bfqq && in_service_bfqq != bfqq &&
+ likely(in_service_bfqq != &bfqd->oom_bfqq) &&
+ bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
bfqq->entity.parent == in_service_bfqq->entity.parent &&
bfq_may_be_close_cooperator(bfqq, in_service_bfqq)) {
new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
@@ -2080,12 +2056,10 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
* queues. The only thing we need is that the bio/request is not
* NULL, as we need it to establish whether a cooperator exists.
*/
-check_scheduled:
new_bfqq = bfq_find_close_cooperator(bfqd, bfqq,
bfq_io_struct_pos(io_struct, request));

- if (new_bfqq && !wr_from_too_long(new_bfqq) &&
- likely(new_bfqq != &bfqd->oom_bfqq) &&
+ if (new_bfqq && likely(new_bfqq != &bfqd->oom_bfqq) &&
bfq_may_be_close_cooperator(bfqq, new_bfqq))
return bfq_setup_merge(bfqq, new_bfqq);

--
2.15.1

2018-01-05 16:28:31

by Jens Axboe

Subject: Re: [PATCH IMPROVEMENT/BUGFIX 0/4] remove start-up time outlier caused by wrong detection of cooperating processes

On 12/20/17 4:38 AM, Paolo Valente wrote:
> Hi,
> the main patch in this series ("block, bfq: let a queue be merged only
> shortly after starting I/O") eliminates an outlier in the application
> start-up time guaranteed by BFQ. This outlier occurs more or less
> frequently, depending on the characteristics of the system, and is
> caused by a wrong detection of so-called cooperating processes
> (details in the commit message). The main patch is preceded by two
> patches that fix two bugs found while working on this problem, and is
> followed by a further optimization patch that removes an operation
> made superfluous by the main patch.
>
> Jens, I've not forgotten our decision to make a patch that allows
> blkio stats (related to the proportional-share policy) not to be
> enabled at boot, or when the CFQ or BFQ modules are loaded. It's just
> that we have already prepared the present patches, plus a few other
> patches for improving BFQ and fixing bugs, and I'd like to clear this
> backlog first.
>
> In this respect, after a patch for boosting throughput more reliably
> with cooperating processes, I'll send out a patch to solve an
> important read starvation problem. If I'm not making a blunder, this
> problem affects every I/O scheduler in blk-mq. As a first step, I'll
> propose a fix for BFQ. If the fix is ok, I'm willing to port it to the
> other schedulers.

Added for 4.16, thanks.

--
Jens Axboe