Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S938754AbcKLFZM (ORCPT ); Sat, 12 Nov 2016 00:25:12 -0500 Received: from mail-pf0-f181.google.com ([209.85.192.181]:34898 "EHLO mail-pf0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932569AbcKLFZL (ORCPT ); Sat, 12 Nov 2016 00:25:11 -0500 Subject: Re: [PATCH 6/8] block: add scalable completion tracking of requests To: Jan Kara References: <1478034531-28559-1-git-send-email-axboe@fb.com> <1478034531-28559-7-git-send-email-axboe@fb.com> <20161108133007.GP32353@quack2.suse.cz> <20161109090157.GZ32353@quack2.suse.cz> <3c7e3183-a7e3-4219-54ca-65c9f45b6d5b@fb.com> <20161110193808.GF31098@quack2.suse.cz> Cc: Jens Axboe , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, hch@lst.de From: Jens Axboe Message-ID: <9a412943-1194-b12f-e728-7c3504041a4c@kernel.dk> Date: Fri, 11 Nov 2016 22:19:37 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161110193808.GF31098@quack2.suse.cz> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3711 Lines: 82 On 11/10/2016 12:38 PM, Jan Kara wrote: > On Wed 09-11-16 12:52:25, Jens Axboe wrote: >> On 11/09/2016 09:09 AM, Jens Axboe wrote: >>> On 11/09/2016 02:01 AM, Jan Kara wrote: >>>> On Tue 08-11-16 08:25:52, Jens Axboe wrote: >>>>> On 11/08/2016 06:30 AM, Jan Kara wrote: >>>>>> On Tue 01-11-16 15:08:49, Jens Axboe wrote: >>>>>>> For legacy block, we simply track them in the request queue. For >>>>>>> blk-mq, we track them on a per-sw queue basis, which we can then >>>>>>> sum up through the hardware queues and finally to a per device >>>>>>> state. >>>>>>> >>>>>>> The stats are tracked in, roughly, 0.1s interval windows. >>>>>>> >>>>>>> Add sysfs files to display the stats. >>>>>>> >>>>>>> Signed-off-by: Jens Axboe >>>>>> >>>>>> This patch looks mostly good to me but I have one concern: You track >>>>>> statistics in a fixed 134ms window, stats get cleared at the >>>>>> beginning of >>>>>> each window. Now this can interact with the writeback window and >>>>>> latency >>>>>> settings which are dynamic and settable from userspace - so if the >>>>>> writeback code observation window gets set larger than the stats >>>>>> window, >>>>>> things become strange since you'll likely miss quite some observations >>>>>> about read latencies. So I think you need to make sure stats window is >>>>>> always larger than writeback window. Or actually, why do you have >>>>>> something >>>>>> like stats window and don't leave clearing of statistics completely >>>>>> to the >>>>>> writeback tracking code? >>>>> >>>>> That's a good point, and there actually used to be a comment to that >>>>> effect in the code. I think the best solution here would be to make the >>>>> stats code mask available somewhere, and allow a consumer of the stats >>>>> to request a larger window. >>>>> >>>>> Similarly, we could make the stat window be driven by the consumer, as >>>>> you suggest. >>>>> >>>>> Currently there are two pending submissions that depend on the stats >>>>> code. One is this writeback series, and the other one is the hybrid >>>>> polling code. The latter does not really care about the window size as >>>>> such, since it has no monitoring window of its own, and it wants the >>>>> auto-clearing as well. >>>>> >>>>> I don't mind working on additions for this, but I'd prefer if we could >>>>> layer them on top of the existing series instead of respinning it. >>>>> There's considerable test time on the existing patchset. Would that work >>>>> for you? Especially collapsing the stats and wbt windows would require >>>>> some re-architecting. >>>> >>>> OK, that works for me. Actually, when thinking about this, I have one >>>> more >>>> suggestion: Do we really want to expose the wbt window as a sysfs >>>> tunable? >>>> I guess it is good for initial experiments but longer term having the wbt >>>> window length be a function of target read latency might be better. >>>> Generally you want the window length to be considerably larger than the >>>> target latency but OTOH not too large so that the algorithm can react >>>> reasonably quickly so that suggests it could really be autotuned (and we >>>> scale the window anyway to adapt it to current situation). >>> >>> That's not a bad idea, I have thought about that as well before. We >>> don't need the window tunable, and you are right, it can be a function >>> of the desired latency. >>> >>> I'll hardwire the 100msec latency window for now and get rid of the >>> exposed tunable. It's harder to remove sysfs files once they have made >>> it into the kernel... >> >> Killed the sysfs variable, so for now it'll be a 100msec window by >> default. > > OK, I guess good enough to get this merged. Thanks Jan! -- Jens Axboe