Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755535AbcK1WV4 (ORCPT ); Mon, 28 Nov 2016 17:21:56 -0500
Received: from mail-yw0-f193.google.com ([209.85.161.193]:34068 "EHLO mail-yw0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751515AbcK1WVu (ORCPT ); Mon, 28 Nov 2016 17:21:50 -0500
Date: Mon, 28 Nov 2016 17:21:48 -0500
From: Tejun Heo
To: Shaohua Li
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Kernel-team@fb.com, axboe@fb.com, vgoyal@redhat.com
Subject: Re: [PATCH V4 10/15] blk-throttle: add a simple idle detection
Message-ID: <20161128222148.GB12948@htj.duckdns.org>
References: <20161123214619.GE11306@mtj.duckdns.org> <20161124011517.GC4724@ksenks-mbp.dhcp.thefacebook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161124011517.GC4724@ksenks-mbp.dhcp.thefacebook.com>
User-Agent: Mutt/1.7.1 (2016-10-04)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2339
Lines: 47

Hello, Shaohua.

On Wed, Nov 23, 2016 at 05:15:18PM -0800, Shaohua Li wrote:
> > Hmm... I'm not sure thinktime is the best measure here.  Think time is
> > used by cfq mainly to tell the likely future behavior of a workload so
> > that cfq can take speculative actions on the prediction.  However,
> > given that the implemented high limit behavior tries to provide a
> > certain level of latency target, using the predictive thinktime to
> > regulate behavior might lead to too unpredictable behaviors.
>
> Latency just reflects one side of the IO. Latency and think time haven't any
> relationship. For example, a cgroup dispatching 1 IO per second can still have
> high latency. If we only take latency account, we will think the cgroup is
> busy, which is not justified.

Yes, the two are independent metrics; however, whether a cgroup is
considered idle or not affects whether blk-throttle will adhere to the
latency target or not.  Thinktime is a magic number which can be good
but whose behavior can be very difficult to predict from outside the
black box.  What I was trying to say was that putting in thinktime
here can greatly weaken the configured latency target in unobvious
ways.

> > Moreover, I don't see why we need to bother with predictions anyway.
> > cfq needed it but I don't think that's the case for blk-throtl.  It
> > can just provide idle threshold where a cgroup which hasn't issued an
> > IO over that threshold is considered idle.  That'd be a lot easier to
> > understand and configure from userland while providing a good enough
> > mechanism to prevent idle cgroups from clamping down utilization for
> > too long.
>
> We could do this, but it will only work for very idle workload, eg, the
> workload is completely idle. If workload dispatches IO sporadically, this will
> likely not work. The average think time is more precise for predication.

But we can increase sharing by upping the target latency.  That should
be the main knob - if low, the user wants stricter service guarantee
at the cost of lower overall utilization; if high, the workload can
deal with higher latency and the system can achieve higher overall
utilization.  I think the idle detection should be an extra mechanism
which can be used to ignore cgroup-disk combinations which are staying
idle for a long time.

Thanks.

-- 
tejun
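
For illustration only, the two idle-detection approaches debated in this
thread can be contrasted with a small userspace sketch.  This is not
blk-throttle code: the class, function names, decay weight and limits
below are all hypothetical and chosen just to show the difference
between "no IO for longer than a fixed threshold" and "average think
time above a limit".

```python
from dataclasses import dataclass


@dataclass
class CgroupIoState:
    """Hypothetical per-cgroup, per-disk bookkeeping (illustrative names)."""
    last_io_ns: int = 0         # timestamp of the most recent IO
    avg_think_ns: float = 0.0   # decayed average gap between consecutive IOs
    seen_io: bool = False       # whether any IO has been recorded yet

    def record_io(self, now_ns: int, weight: float = 0.125) -> None:
        # Fold the gap since the previous IO into a decayed average.
        # The weight is an arbitrary choice made for this sketch.
        if self.seen_io:
            gap = now_ns - self.last_io_ns
            self.avg_think_ns += weight * (gap - self.avg_think_ns)
        self.last_io_ns = now_ns
        self.seen_io = True


def idle_by_threshold(state: CgroupIoState, now_ns: int,
                      idle_threshold_ns: int) -> bool:
    # Simple rule: idle only if nothing was issued for longer than the
    # threshold.  A cgroup that never issued IO has last_io_ns == 0 and is
    # therefore treated as idle once now_ns exceeds the threshold.
    return now_ns - state.last_io_ns > idle_threshold_ns


def idle_by_avg_thinktime(state: CgroupIoState, think_limit_ns: int) -> bool:
    # Predictive rule: idle if the average gap between IOs is above a limit,
    # even when the most recent IO was issued a moment ago.
    return state.avg_think_ns > think_limit_ns


if __name__ == "__main__":
    SEC = 1_000_000_000
    state = CgroupIoState()
    for i in range(50):          # one IO per second, as in the quoted example
        state.record_io(i * SEC)

    now = 49 * SEC + SEC // 2    # half a second after the last IO
    print("threshold rule (5s):", idle_by_threshold(state, now, 5 * SEC))
    print("avg thinktime rule (1ms):", idle_by_avg_thinktime(state, 1_000_000))
```

With a cgroup issuing one IO per second, the fixed 5s threshold keeps
treating it as active while the 1ms average-thinktime limit classifies
it as idle.  That is the sporadic-IO case raised in the quoted text, and
it also shows why the thinktime result depends on tuning values that are
hard to reason about from outside.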