Subject: Re: [PATCH V3 00/11] block-throttle: add .high limit
From: Paolo Valente
Date: Tue, 4 Oct 2016 19:43:48 +0200
To: Shaohua Li
Cc: Tejun Heo, Vivek Goyal, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Jens Axboe, Kernel-team@fb.com, jmoyer@redhat.com, Mark Brown, Linus Walleij, Ulf Hansson
In-Reply-To: <20161004172852.GB73678@anikkar-mbp.local.dhcp.thefacebook.com>
References: <20161004132805.GB28808@redhat.com> <20161004155616.GB4205@htj.duckdns.org> <20161004162759.GD4205@htj.duckdns.org> <278BCC7B-ED58-4FDF-9243-FAFC3F862E4D@unimore.it> <20161004172852.GB73678@anikkar-mbp.local.dhcp.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On 4 Oct 2016, at 19:28, Shaohua Li wrote:
>
> On Tue, Oct 04, 2016 at 07:01:39PM +0200, Paolo Valente wrote:
>>
>>> On 4 Oct 2016, at 18:27, Tejun Heo wrote:
>>>
>>> Hello,
>>>
>>> On Tue, Oct 04, 2016 at 06:22:28PM +0200, Paolo Valente wrote:
>>>> Could you please elaborate more on this point?
>>>> BFQ uses sectors served to measure service, and, on all the fast
>>>> devices on which we have tested it, it accurately distributes
>>>> bandwidth as desired, redistributes excess bandwidth without any issue,
>>>> and guarantees high responsiveness and low latency at application and
>>>> system level (e.g., ~0 drop rate in video playback, with any background
>>>> workload tested).
>>>
>>> The same argument as before. Bandwidth is a very bad measure of IO
>>> resources spent. For specific use cases (like desktop or whatever),
>>> this can work but not generally.
>>>
>>
>> Actually, we have already discussed this point, and IMHO the arguments
>> that (apparently) convinced you that bandwidth is the most relevant
>> service guarantee for I/O in desktops and the like prove that
>> bandwidth is the most important service guarantee in servers too.
>>
>> Again, all the examples I can think of seem to confirm it:
>> . file hosting: a good service must guarantee reasonable read/write,
>>   i.e., download/upload, speeds to users
>> . file streaming: a good service must guarantee low drop rates, and
>>   this can be guaranteed only by guaranteeing bandwidth and latency
>> . web hosting: high bandwidth and low latency needed here too
>> . clouds: high bw and low latency needed to let, e.g., users of VMs
>>   enjoy high responsiveness and, for example, reasonable file-copy
>>   time
>> ...
>>
>> To put it yet another way, with packet I/O in, e.g., clouds, there are
>> basically the same issues, and the main goal is again guaranteeing
>> bandwidth and low latency among nodes.
>>
>> Could you please provide a concrete server example (assuming we still
>> agree about desktops), where I/O bandwidth does not matter while time
>> does?
>
> I'm not saying that IO bandwidth doesn't matter. The problem is that
> bandwidth can't measure IO cost. For example, you can't say an 8k IO costs
> 2x the IO resources of a 4k IO.
>

For what goal do you need to be able to say this, once you have succeeded
in guaranteeing bandwidth and low latency to each
process/client/group/node/user?

>>>> Could you please suggest a test that shows how sector-based
>>>> guarantees fail?
>>>
>>> Well, mix 4k random and sequential workloads and try to distribute the
>>> actual IO resources.
>>>
>>
>>
>> If I'm not mistaken, we have already gone through this example too,
>> and I thought we agreed on what service scheme worked best, again
>> focusing only on desktops. To make a long story short(er), here is a
>> snippet from one of our last exchanges.
>>
>> ----------
>>
>> On Sat, Apr 16, 2016 at 12:08:44AM +0200, Paolo Valente wrote:
>>> Maybe the source of confusion is the fact that a simple sector-based,
>>> proportional share scheduler always distributes total bandwidth
>>> according to weights. The catch is the additional BFQ rule: random
>>> workloads get only time isolation, and are charged for full budgets,
>>> so as not to affect the schedule of quasi-sequential workloads. So,
>>> the correct claim for BFQ is that it distributes total bandwidth
>>> according to weights (only) when all competing workloads are
>>> quasi-sequential. If some workloads are random, then these workloads
>>> are just time scheduled. This does break proportional-share bandwidth
>>> distribution with mixed workloads, but, much more importantly, saves
>>> both total throughput and individual bandwidths of quasi-sequential
>>> workloads.
>>>
>>> We could then check whether I did succeed in tuning timeouts and
>>> budgets so as to achieve the best tradeoffs. But this is probably a
>>> second-order problem as of now.
>
> I don't see why random/sequential matters for SSD. What really matters is
> request size and IO depth. Time scheduling is questionable too, as workloads
> can dispatch all IO within almost 0 time on high-queue-depth disks.
>

That's an orthogonal issue.
If what matters is, e.g., size, then it is enough to replace
"sequential I/O" with "large-request I/O". In case I have been too vague,
here is an example: in an I/O scheduler, you replace the function that
decides whether a queue is seeky based on request distance with a
function based on request size. And this is exactly what has already
been done, for example, in CFQ:

	if (blk_queue_nonrot(cfqd->queue))
		cfqq->seek_history |= (n_sec < CFQQ_SECT_THR_NONROT);
	else
		cfqq->seek_history |= (sdist > CFQQ_SEEK_THR);

Thanks,
Paolo

> Thanks,
> Shaohua

--
Paolo Valente
Algogroup
Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Via Campi 213/B
41125 Modena - Italy
http://algogroup.unimore.it/people/paolo/
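For readers who want to experiment with the CFQ check quoted in the message, the following is a minimal user-space sketch of the same idea: keep one history bit per request, set it from request size on non-rotational devices and from seek distance on rotational ones, and call a queue "seeky" when enough recent bits are set. It is not the kernel's code. The struct, the function names, and all three thresholds are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative thresholds: assumptions for this sketch, not kernel values. */
#define SEEK_DIST_THR  800u /* sectors: a larger jump counts as a seek */
#define SMALL_REQ_THR  64u  /* sectors: a smaller request counts as "small" */
#define SEEKY_BITS_THR 4    /* seeky if more than this many history bits are set */

/* Hypothetical per-queue state. */
struct io_queue {
	uint32_t seek_history; /* one bit per recent request, newest in bit 0 */
	uint64_t last_end;     /* sector just past the previous request; 0 if none */
};

/* Record one request in the history, using size or distance per device type. */
static void io_queue_account(struct io_queue *q, uint64_t pos,
			     uint32_t n_sec, bool nonrot)
{
	uint64_t sdist = 0;

	if (q->last_end)
		sdist = pos > q->last_end ? pos - q->last_end : q->last_end - pos;

	q->seek_history <<= 1;
	if (nonrot)
		q->seek_history |= (n_sec < SMALL_REQ_THR);  /* size-based */
	else
		q->seek_history |= (sdist > SEEK_DIST_THR);  /* distance-based */

	q->last_end = pos + n_sec;
}

/* Count set bits in a 32-bit word. */
static int popcount32(uint32_t v)
{
	int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* A queue is treated as seeky when enough recent requests were flagged. */
static bool io_queue_is_seeky(const struct io_queue *q)
{
	return popcount32(q->seek_history) > SEEKY_BITS_THR;
}
```

With these assumed thresholds, a stream of scattered 8-sector requests is flagged on a non-rotational device, while scattered 256-sector requests are not; on a rotational device the same scattered 256-sector stream is flagged because only distance is considered.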