Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753880AbcJDU17 (ORCPT ); Tue, 4 Oct 2016 16:27:59 -0400
Received: from mail-yw0-f171.google.com ([209.85.161.171]:35981 "EHLO
        mail-yw0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751479AbcJDU15 (ORCPT ); Tue, 4 Oct 2016 16:27:57 -0400
Date: Tue, 4 Oct 2016 16:27:54 -0400
From: Tejun Heo
To: Paolo Valente
Cc: Shaohua Li, Vivek Goyal, linux-block@vger.kernel.org,
        linux-kernel@vger.kernel.org, Jens Axboe, Kernel-team@fb.com,
        jmoyer@redhat.com, Mark Brown, Linus Walleij, Ulf Hansson
Subject: Re: [PATCH V3 00/11] block-throttle: add .high limit
Message-ID: <20161004202754.GJ4205@htj.duckdns.org>
References: <20161004155616.GB4205@htj.duckdns.org>
        <20161004162759.GD4205@htj.duckdns.org>
        <278BCC7B-ED58-4FDF-9243-FAFC3F862E4D@unimore.it>
        <20161004172852.GB73678@anikkar-mbp.local.dhcp.thefacebook.com>
        <20161004185413.GF4205@htj.duckdns.org>
        <20161004191427.GG4205@htj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.7.0 (2016-08-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2693
Lines: 57

Hello, Paolo.

On Tue, Oct 04, 2016 at 09:29:48PM +0200, Paolo Valente wrote:
> > Hmm... I think we already discussed this but here's a really simple
> > case. There are three unknown workloads A, B and C and we want to
> > give A certain best-effort guarantees (let's say around 80% of the
> > underlying device) whether A is sharing the device with B or C.
>
> That's the same example that you proposed me in our previous
> discussion. For this example I showed you, with many boring numbers,
> that with BFQ you get the most accurate distribution of the resource.

Yes, it is about the same example and what I understood was that
"accurate distribution of the resources" holds as long as the
randomness is incidental (i.e. due to layout on the filesystem and so
on) with the slice expiration mechanism offsetting the actually random
workloads.

> If you have enough stamina, I can repeat them again. To save your

I'll go back to the thread and re-read them.

> patience, here is a very brief summary. In a concrete use case, the
> unknown workloads turn into something like this: there will be a first
> time interval during which A happens to be, say, sequential, B happens
> to be, say, random and C happens to be, say, quasi-sequential. Then
> there will be a next time interval during which their characteristics
> change, and so on. It is easy (but boring, I acknowledge it) to show
> that, for each of these time intervals BFQ provides the best possible
> service in terms of fairness, bandwidth distribution, stability and so
> on. Why? Because of the elastic bandwidth-time scheduling of BFQ
> that we already discussed, and because BFQ is naturally accurate in
> redistributing aggregate throughput proportionally, when needed.

Yeah, that's what I remember and for a workload above a certain level
of randomness, its time consumption is mapped to bw, right?

> > I get that bfq can be a good compromise on most desktop workloads and
> > behave reasonably well for some server workloads with the slice
> > expiration mechanism but it really isn't an IO resource partitioning
> > mechanism.
>
> Right. My argument is that BFQ enables you to give to each client the
> bandwidth and low-latency guarantees you want. And this IMO is way
> better than partitioning a resource and then getting unavoidable
> unfairness and high latency.

But that statement only holds while bw is the main thing to guarantee,
no?
The level of isolation that we're looking for here is fairly strict
adherence to sub/few-milliseconds in terms of high percentile
scheduling latency while within the configured bw/iops limits, not
"overall this device is being used pretty well".

Thanks.

-- 
tejun
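
One way to make the isolation criterion above concrete is as a check over
measured data. The sketch below is illustrative only and is not taken from
the patch set or from BFQ: the function names, the per-interval sample
layout, and the 99th-percentile default are all assumptions. It judges
latency only over intervals in which the group stayed within its configured
bandwidth limit, which is the condition the message attaches to the latency
target.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def meets_isolation_target(windows, bw_limit, latency_target_ms, pct=99.0):
    """Check a high-percentile latency target for a throttled group.

    windows: iterable of (bytes_per_sec, [latency_ms, ...]) pairs, one per
    measurement interval (hypothetical layout, chosen for this sketch).
    Only intervals where the group stayed within bw_limit are counted.
    """
    in_limit = [lat for bw, lats in windows if bw <= bw_limit for lat in lats]
    if not in_limit:
        return True  # no in-limit samples, so the target holds vacuously
    return percentile(in_limit, pct) <= latency_target_ms
```

A call such as `meets_isolation_target(windows, bw_limit=100, latency_target_ms=2.0)`
ignores latency seen while the group exceeded its limit, so a device that is
well utilized overall can still fail the check if in-limit requests see a
long tail.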