From: Paolo Valente <paolo.valente@linaro.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
        ulf.hansson@linaro.org, broonie@kernel.org, linus.walleij@linaro.org,
        angeloruocco90@gmail.com, bfq-iosched@googlegroups.com,
        Paolo Valente <paolo.valente@linaro.org>
Subject: [PATCH IMPROVEMENT] block, bfq: increase threshold to deem I/O as random
Date: Wed, 20 Dec 2017 17:27:36 +0100
Message-Id: <20171220162736.5067-1-paolo.valente@linaro.org>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2955
Lines: 61

If two processes do I/O close to each other, i.e., are cooperating
processes in BFQ (and CFQ'S) nomenclature, then BFQ merges their
associated bfq_queues, so as to get sequential I/O from the union of
the I/O requests of the processes, and thus reach a higher
throughput. A merged queue is then split if its I/O stops being
sequential. In this respect, BFQ deems the I/O of a bfq_queue as
(mostly) sequential only if less than 4 I/O requests are random, out
of the last 32 requests inserted into the queue.

Unfortunately, extensive testing (with the interleaved_io benchmark of
the S suite [1], and with real applications spawning cooperating
processes) has clearly shown that, with such a low threshold, only a
rather low I/O throughput may be reached when several cooperating
processes do I/O. In particular, the outcome of each test run was
bimodal: if queue merging occurred and was stable during the test,
then the throughput was close to the peak rate of the storage device,
otherwise the throughput was arbitrarily low (usually around 1/10 of
the peak rate with a rotational device). The probability to get the
unlucky outcomes grew with the number of cooperating processes: it was
already significant with 5 processes, and close to one with 7 or more
processes.

The cause of the low throughput in the unlucky runs was that the
merged queues containing the I/O of these cooperating processes were
soon split, because they contained more random I/O requests than those
tolerated by the 4/32 threshold, but
- that I/O would have however allowed the storage device to reach
  peak throughput or almost peak throughput;
- in contrast, the I/O of these processes, if served individually
  (from separate queues) yielded a rather low throughput.

So we repeated our tests with increasing values of the threshold,
until we found the minimum value (19) for which we obtained maximum
throughput, reliably, with at least up to 9 cooperating
processes. Then we checked that the use of that higher threshold value
did not cause any regression for any other benchmark in the suite [1].
This commit raises the threshold to such a higher value.

[1] https://github.com/Algodev-github/S

Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c66578592c9e..e33c5c4c9856 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -192,7 +192,7 @@ static struct kmem_cache *bfq_pool;
 #define BFQQ_SEEK_THR		(sector_t)(8 * 100)
 #define BFQQ_SECT_THR_NONROT	(sector_t)(2 * 32)
 #define BFQQ_CLOSE_THR		(sector_t)(8 * 1024)
-#define BFQQ_SEEKY(bfqq)	(hweight32(bfqq->seek_history) > 32/8)
+#define BFQQ_SEEKY(bfqq)	(hweight32(bfqq->seek_history) > 19)
 
 /* Min number of samples required to perform peak-rate update */
 #define BFQ_RATE_MIN_SAMPLES	32
-- 
2.15.1