From: Divyesh Shah
Date: Thu, 10 Jun 2010 22:10:09 -0700
Subject: Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
To: Vivek Goyal
Cc: Gui Jianfeng, Jens Axboe, linux kernel mailing list
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 25, 2010 at 6:25 AM, Vivek Goyal wrote:
> On Tue, May 25, 2010 at 11:00:54AM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>> > On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
>> >> Vivek Goyal wrote:
>> >>> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
>> >>>> Vivek Goyal wrote:
>> >>>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> This series implements three new interfaces to keep track of transferred
>> >>>>>> bytes, elapsed time and io rate since the group
>> >>>>>> became backlogged. If the group is dequeued from the service tree, these
>> >>>>>> three interfaces are reset and show zero.
>> >>>>> Hi Gui,
>> >>>>>
>> >>>>> Can you give some details regarding how this functionality is useful? Why
>> >>>>> would somebody be interested only in stats from the time the group became
>> >>>>> backlogged, and not in total stats?
>> >>>>>
>> >>>>> Groups can come and go so fast, and these stats will reset so many times,
>> >>>>> that I am not able to visualize how these stats will be useful.
>> >>>> Hi Vivek,
>> >>>>
>> >>>> Currently, we assign a weight to a group, but the user still doesn't know how
>> >>>> fast the group runs. With the io rate interface, users can check the rate of a
>> >>>> group at any moment, or determine whether the weight assigned to a group is
>> >>>> enough. The bytes and time interfaces are just for debugging.
>> >>> Gui,
>> >>>
>> >>> I still don't understand why the blkio.sectors, blkio.io_service_bytes
>> >>> or blkio.io_serviced interfaces are not good enough to determine at what
>> >>> rate a group is doing IO.
>> >>>
>> >>> I think we can very well write something in userspace like "iostat" to
>> >>> display the per-group rate. The utility can read any of the above files,
>> >>> say at an interval of 1s, calculate the diff between the values and
>> >>> display that as the group's effective rate.
>> >> Hi Vivek,
>> >>
>> >> blkio.io_active_rate reflects the rate since the group got backlogged, so the
>> >> rate is a smooth value. This value represents the actual rate at which a group
>> >> runs. IMO, the io rate calculated from user space is not accurate in the
>> >> following two scenarios:
>> >>
>> >> 1. The userspace app chooses an interval of 1s. If the group is backlogged for
>> >>    0.5s and not for the other 0.5s, the rate calculated over this interval
>> >>    doesn't make sense.
>> >>
>> >
>> > If you are not servicing groups for a long time, that is anyway very bad for
>> > latency.
>> > That's why the 300ms soft limit of CFQ makes sense, and
>> > practically I am not sure you will be blocking groups for 0.5s.
>> >
>> > Even if you do, the user just needs to choose a bigger interval and will
>> > see smoother rates. Reduce the interval and you might see a slightly
>> > burstier rate.
>>
>> Vivek,
>>
>> IIUC, the biggest problem for the user app is that it doesn't know how long
>> the group has been dequeued during the interval. For example, the user chooses
>> a 10s interval, for 8s of which the group is not backlogged, but when the user
>> app calculates the io rate, those 8s are still included. So this rate isn't
>> what we want. Am I missing something?
>
> Gui,
>
> If the user application is not doing enough IO and the group is getting deleted
> fast, io_active_rate is not going to give you any meaningful data, as it
> will be lost the moment the group gets deleted.
>
> Hence one needs to monitor the IO rate while a workload is running and is
> keeping the disk busy more or less all the time.
>
> Even in your example, if you monitored the IO rate over a 10 second interval
> and the group is not doing any IO, you just can't do anything about it. It is
> just that your measurement method is wrong. Even io_active_rate will not help
> you here, as by the time you read the file, the group is gone and there is no
> data.
>
> The very reason you want to monitor the rate is to make sure the
> group is getting enough BW. If the group is not doing IO, one can look at the
> blkio.dequeue file and see if the group is getting deleted too frequently. If
> yes, that means the group is not doing enough IO to keep the disk busy. One
> can also try increasing the weight of the group, but that will not help
> much if the group does not remain backlogged for a significant amount of time.
>
>> "io_active_rate" will never take un-backlogged time into account when
>> calculating io rate.
>>
>
> Theoretically, blkio.sectors/blkio.time gives the rate excluding the time
> when the group was not backlogged?

I agree with Vivek here.
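The two rates discussed in this thread can be sketched in Python. This is a minimal example assuming the cgroup-v1 CFQ stat files blkio.sectors (512-byte sectors) and blkio.time (milliseconds), each holding one "major:minor value" line per device; the helper names are hypothetical and not part of any kernel or library API. rates() returns the wall-clock rate an iostat-like tool would get by diffing blkio.sectors over a sampling interval and, following the blkio.sectors/blkio.time suggestion, bytes per second of disk time allocated to the group, which should largely leave out periods when the group was not backlogged.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional, Tuple

SECTOR_BYTES = 512  # blkio.sectors counts 512-byte sectors


def parse_per_device(text: str) -> Dict[str, int]:
    """Parse 'major:minor value' lines, the layout of blkio.sectors and blkio.time."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not exactly "major:minor value" (e.g. a Total line).
        if len(fields) == 2 and ":" in fields[0]:
            stats[fields[0]] = int(fields[1])
    return stats


@dataclass
class Sample:
    wall_s: float            # time.monotonic() when the files were read
    sectors: Dict[str, int]  # per-device sectors transferred (blkio.sectors)
    disk_ms: Dict[str, int]  # per-device disk time in ms (blkio.time)


def take_sample(cgroup_dir: Path) -> Sample:
    """Read both stat files from one (hypothetical) cgroup directory."""
    return Sample(
        wall_s=time.monotonic(),
        sectors=parse_per_device((cgroup_dir / "blkio.sectors").read_text()),
        disk_ms=parse_per_device((cgroup_dir / "blkio.time").read_text()),
    )


def rates(prev: Sample, cur: Sample, dev: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (wall-clock bytes/s, bytes per second of disk time) for one device.

    The first value is the iostat-like diff over the whole sampling interval,
    so idle (non-backlogged) time in the interval dilutes it. The second
    divides by the disk time allocated to the group, and is only as accurate
    as blkio.time itself, which is approximate. None means no time elapsed.
    """
    d_bytes = (cur.sectors.get(dev, 0) - prev.sectors.get(dev, 0)) * SECTOR_BYTES
    d_wall_s = cur.wall_s - prev.wall_s
    d_disk_s = (cur.disk_ms.get(dev, 0) - prev.disk_ms.get(dev, 0)) / 1000.0
    wall_rate = d_bytes / d_wall_s if d_wall_s > 0 else None
    disk_rate = d_bytes / d_disk_s if d_disk_s > 0 else None
    return wall_rate, disk_rate
```

To use it, call take_sample() on a group's directory (a hypothetical path such as /cgroup/blkio/test1), wait for the chosen interval, sample again and pass both samples to rates() with a device key like "8:16". In Gui's example of a 10s interval with 8s not backlogged, the first number is diluted by the idle 8s while the second should not be, subject to the blkio.time approximation that both Vivek and Divyesh mention.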
We use blkio.time as the source for the io rate calculation for each cgroup,
knowing that it is not entirely accurate but a good enough approximation.

Gui, if you want to find out whether the cgroup has enough weight or not,
I'd recommend looking at the wait_time stat in addition to blkio.time. It has
been very useful in identifying jobs that are not getting enough IO done
because of the lower weight assigned to them.

>
> But I will not recommend using blkio.time as it is very approximate.
>
> I really am not able to see what this interface is buying you.
>
> Thanks
> Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/