DMARC-Filter: OpenDMARC Filter v1.3.2 vulcan.natalenko.name C5A4C2B6BC1
From: Oleksandr Natalenko <oleksandr@natalenko.name>
To: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org,
        linux-kernel@vger.kernel.org, ulf.hansson@linaro.org,
        broonie@kernel.org, linus.walleij@linaro.org,
        bfq-iosched@googlegroups.com
Subject: Re: [PATCH IMPROVEMENT] block, bfq: limit sectors served with interactive weight raising
Date: Fri, 29 Dec 2017 15:10:37 +0100
Message-ID: <2155485.5RQgH7pnPy@natalenko.name>
In-Reply-To: <20171228111917.2767-1-paolo.valente@linaro.org>
References: <20171228111917.2767-1-paolo.valente@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Content-Transfer-Encoding: 8bit
Content-Length: 9621
Lines: 203

Hi.

On čtvrtek 28. prosince 2017 12:19:17 CET Paolo Valente wrote:
> To maximise responsiveness, BFQ raises the weight, and performs device
> idling, for bfq_queues associated with processes deemed as
> interactive. In particular, weight raising has a maximum duration,
> equal to the time needed to start a large application. If a
> weight-raised process goes on doing I/O beyond this maximum duration,
> it loses weight-raising.
> 
> This mechanism is evidently vulnerable to the following false
> positives: I/O-bound applications that will go on doing I/O for much
> longer than the duration of weight-raising. These applications have
> basically no benefit from being weight-raised at the beginning of
> their I/O. On the opposite end, while being weight-raised, these
> applications
> a) unjustly steal throughput to applications that may truly need
> low latency;
> b) make BFQ uselessly perform device idling; device idling results
> in loss of device throughput with most flash-based storage, and may
> increase latencies when used purposelessly.
> 
> This commit adds a countermeasure to reduce both the above
> problems. To introduce this countermeasure, we provide the following
> extra piece of information (full details in the comments added by this
> commit). During the start-up of the large application used as a
> reference to set the duration of weight-raising, involved processes
> transfer at most ~110K sectors each. Accordingly, a process initially
> deemed as interactive has no right to be weight-raised any longer,
> once transferred 110K sectors or more.
> 
> Basing on this consideration, this commit early-ends weight-raising
> for a bfq_queue if the latter happens to have received an amount of
> service at least equal to 110K sectors (actually, a little bit more,
> to keep a safety margin). I/O-bound applications that reach a high
> throughput, such as file copy, get to this threshold much before the
> allowed weight-raising period finishes. Thus this early ending of
> weight-raising reduces the amount of time during which these
> applications cause the problems described above.
> 
> Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
> ---
>  block/bfq-iosched.c | 81
> +++++++++++++++++++++++++++++++++++++++++++++++------ block/bfq-iosched.h |
>  5 ++++
>  block/bfq-wf2q.c    |  3 ++
>  3 files changed, 80 insertions(+), 9 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 6f75015d18c0..ea48b5c8f088 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -209,15 +209,17 @@ static struct kmem_cache *bfq_pool;
>   * interactive applications automatically, using the following formula:
>   * duration = (R / r) * T, where r is the peak rate of the device, and
>   * R and T are two reference parameters.
> - * In particular, R is the peak rate of the reference device (see below),
> - * and T is a reference time: given the systems that are likely to be
> - * installed on the reference device according to its speed class, T is
> - * about the maximum time needed, under BFQ and while reading two files in
> - * parallel, to load typical large applications on these systems.
> - * In practice, the slower/faster the device at hand is, the more/less it
> - * takes to load applications with respect to the reference device.
> - * Accordingly, the longer/shorter BFQ grants weight raising to interactive
> - * applications.
> + * In particular, R is the peak rate of the reference device (see
> + * below), and T is a reference time: given the systems that are
> + * likely to be installed on the reference device according to its
> + * speed class, T is about the maximum time needed, under BFQ and
> + * while reading two files in parallel, to load typical large
> + * applications on these systems (see the comments on
> + * max_service_from_wr below, for more details on how T is obtained).
> + * In practice, the slower/faster the device at hand is, the more/less
> + * it takes to load applications with respect to the reference device.
> + * Accordingly, the longer/shorter BFQ grants weight raising to
> + * interactive applications.
>   *
>   * BFQ uses four different reference pairs (R, T), depending on:
>   * . whether the device is rotational or non-rotational;
> @@ -254,6 +256,60 @@ static int T_slow[2];
>  static int T_fast[2];
>  static int device_speed_thresh[2];
> 
> +/*
> + * BFQ uses the above-detailed, time-based weight-raising mechanism to
> + * privilege interactive tasks. This mechanism is vulnerable to the
> + * following false positives: I/O-bound applications that will go on
> + * doing I/O for much longer than the duration of weight
> + * raising. These applications have basically no benefit from being
> + * weight-raised at the beginning of their I/O. On the opposite end,
> + * while being weight-raised, these applications
> + * a) unjustly steal throughput to applications that may actually need
> + * low latency;
> + * b) make BFQ uselessly perform device idling; device idling results
> + * in loss of device throughput with most flash-based storage, and may
> + * increase latencies when used purposelessly.
> + *
> + * BFQ tries to reduce these problems, by adopting the following
> + * countermeasure. To introduce this countermeasure, we need first to
> + * finish explaining how the duration of weight-raising for
> + * interactive tasks is computed.
> + *
> + * For a bfq_queue deemed as interactive, the duration of weight
> + * raising is dynamically adjusted, as a function of the estimated
> + * peak rate of the device, so as to be equal to the time needed to
> + * execute the 'largest' interactive task we benchmarked so far. By
> + * largest task, we mean the task for which each involved process has
> + * to do more I/O than for any of the other tasks we benchmarked. This
> + * reference interactive task is the start-up of LibreOffice Writer,
> + * and in this task each process/bfq_queue needs to have at most ~110K
> + * sectors transferred.
> + *
> + * This last piece of information enables BFQ to reduce the actual
> + * duration of weight-raising for at least one class of I/O-bound
> + * applications: those doing sequential or quasi-sequential I/O. An
> + * example is file copy. In fact, once started, the main I/O-bound
> + * processes of these applications usually consume the above 110K
> + * sectors in much less time than the processes of an application that
> + * is starting, because these I/O-bound processes will greedily devote
> + * almost all their CPU cycles only to their target,
> + * throughput-friendly I/O operations. This is even more true if BFQ
> + * happens to be underestimating the device peak rate, and thus
> + * overestimating the duration of weight raising. But, according to
> + * our measurements, once transferred 110K sectors, these processes
> + * have no right to be weight-raised any longer.
> + *
> + * Basing on the last consideration, BFQ ends weight-raising for a
> + * bfq_queue if the latter happens to have received an amount of
> + * service at least equal to the following constant. The constant is
> + * set to slightly more than 110K, to have a minimum safety margin.
> + *
> + * This early ending of weight-raising reduces the amount of time
> + * during which interactive false positives cause the two problems
> + * described at the beginning of these comments.
> + */
> +static const unsigned long max_service_from_wr = 120000;
> +
>  #define RQ_BIC(rq)		icq_to_bic((rq)->elv.priv[0])
>  #define RQ_BFQQ(rq)		((rq)->elv.priv[1])
> 
> @@ -1352,6 +1408,7 @@ static void bfq_update_bfqq_wr_on_rq_arrival(struct
> bfq_data *bfqd, if (old_wr_coeff == 1 && wr_or_deserves_wr) {
>  		/* start a weight-raising period */
>  		if (interactive) {
> +			bfqq->service_from_wr = 0;
>  			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
>  			bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
>  		} else {
> @@ -3665,6 +3722,12 @@ static void bfq_update_wr_data(struct bfq_data *bfqd,
> struct bfq_queue *bfqq) bfqq->entity.prio_changed = 1;
>  			}
>  		}
> +		if (bfqq->wr_coeff > 1 &&
> +		    bfqq->wr_cur_max_time != bfqd->bfq_wr_rt_max_time &&
> +		    bfqq->service_from_wr > max_service_from_wr) {
> +			/* see comments on max_service_from_wr */
> +			bfq_bfqq_end_wr(bfqq);
> +		}
>  	}
>  	/*
>  	 * To improve latency (for this or other queues), immediately
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index fcd941008127..350c39ae2896 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -337,6 +337,11 @@ struct bfq_queue {
>  	 * last transition from idle to backlogged.
>  	 */
>  	unsigned long service_from_backlogged;
> +	/*
> +	 * Cumulative service received from the @bfq_queue since its
> +	 * last transition to weight-raised state.
> +	 */
> +	unsigned long service_from_wr;
> 
>  	/*
>  	 * Value of wr start time when switching to soft rt
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index 4456eda34e48..4498c43245e2 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -838,6 +838,9 @@ void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
> if (!bfqq->service_from_backlogged)
>  		bfqq->first_IO_time = jiffies;
> 
> +	if (bfqq->wr_coeff > 1)
> +		bfqq->service_from_wr += served;
> +
>  	bfqq->service_from_backlogged += served;
>  	for_each_entity(entity) {
>  		st = bfq_entity_service_tree(entity);

Building and running it routinely on my laptop didn't reveal any smoke (at 
least, yet), so

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>

(actually, currently my Tested-by applies to all pending BFQ patches too)

Regards,
  Oleksandr