LinuxLists.cc - Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status

2010-06-11 05:10:39

Subject: Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status

On Tue, May 25, 2010 at 6:25 AM, Vivek Goyal <[email protected]> wrote:
> On Tue, May 25, 2010 at 11:00:54AM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>> > On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
>> >> Vivek Goyal wrote:
>> >>> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
>> >>>> Vivek Goyal wrote:
>> >>>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> This series implements three new interfaces to keep track of transferred bytes,
>> >>>>>> elapsed time and io rate since the group got backlogged. If the group is dequeued
>> >>>>>> from the service tree, these three interfaces reset and show zero.
>> >>>>> Hi Gui,
>> >>>>>
>> >>>>> Can you give some details regarding how this functionality is useful? Why
>> >>>>> would somebody be interested only in stats since the group got
>> >>>>> backlogged and not in total stats?
>> >>>>>
>> >>>>> Groups can come and go so fast and these stats will reset so many times
>> >>>>> that I am not able to visualize how these stats will be useful.
>> >>>> Hi Vivek,
>> >>>>
>> >>>> Currently, we assign a weight to a group, but the user still doesn't know how fast the
>> >>>> group runs. With the io rate interface, users can check the rate of a group at any
>> >>>> moment, or determine whether the weight assigned to a group is enough.
>> >>>> The bytes and time interfaces are just for debugging purposes.
>> >>> Gui,
>> >>>
>> >>> I still don't understand why the blkio.sectors, blkio.io_service_bytes
>> >>> or blkio.io_serviced interfaces are not good enough to determine at what
>> >>> rate a group is doing IO.
>> >>>
>> >>> I think we can very well write something in userspace like "iostat" to
>> >>> display the per group rate. The utility can read any of the above files,
>> >>> say at an interval of 1s, calculate the diff between the values and
>> >>> display that as the group's effective rate.
>> >> Hi Vivek,
>> >>
>> >> blkio.io_active_rate reflects the rate since the group got backlogged, so the rate is a smooth
>> >> value. This value represents the actual rate at which a group runs. IMO, the io rate calculated from
>> >> user space is not accurate in the following two scenarios:
>> >>
>> >> 1 Userspace app chooses an interval of 1s; if 0.5s is backlogged and 0.5s is not, the
>> >>   rate calculated in this interval doesn't make sense.
>> >>
>> >
>> > If you are not servicing groups for a long time, it is very bad for
>> > latency anyway. That's why the soft limit of 300ms in CFQ makes sense, and
>> > practically I am not sure you will be blocking groups for .5s.
>> >
>> > Even if you do, the user just needs to choose a bigger interval and will
>> > see smoother rates. Reduce the interval and you might see a slightly
>> > bursty rate.
>>
>> Vivek,
>>
>> IIUC, the biggest problem for a user app is that it doesn't know how long
>> the group has been dequeued during the interval. For example, the user chooses
>> a 10s interval, 8s of which is not backlogged, but when the user app calculates
>> the io rate, this 8s is still included. So this rate isn't what we want. Am I missing
>> something?
>
> Gui,
>
> If user application is not doing enough IO and group is getting deleted
> fast, io_active_rate is not going to give you any meaningful data as it
> will be lost the moment group gets deleted.
>
> Hence one needs to monitor the IO rate when a workload is running and is
> keeping disk busy more or less all the time.
>
> Even in your example, if you monitored the IO rate over a 10 second interval and
> the group is not doing any IO, you just can't do anything about it. It is just that
> your measurement method is wrong. Even io_active_rate will not help you
> here as by the time you read the file, the group is gone and there is no data.
>
> The very reason you want to monitor rate is that you want to make sure
> group is getting enough BW. If group is not doing IO then one can look at
> blkio.dequeue file and see if group is getting deleted too frequently. If
> yes, that means group is not doing enough IO to keep the disk busy. One
> can also try increasing the weight of the group but that will not help
> much if group does not remain backlogged for significant amount of time.
>
>> "io_active_rate" will never take un-backlogged time into account when calculating
>> io rate.
>>
>
> Theoretically blkio.sectors/blkio.time gives the rate excluding the time
> when group was not backlogged?

I agree with Vivek here. We use blkio.time as the source for computing
the io rate of each cgroup, knowing that it is not entirely accurate but
a good enough approximation.
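
A minimal sketch of the userspace approach discussed above: sample
blkio.sectors and blkio.time twice and take the difference. It assumes
both files use the per-device "major:minor value" line format, that
blkio.time is in milliseconds, and that a sector is 512 bytes. The
function names and the cgroup directory argument are hypothetical, not
part of any existing tool.

```python
from pathlib import Path

SECTOR_BYTES = 512  # assumption: blkio.sectors counts 512-byte sectors


def parse_per_device(text):
    """Parse 'major:minor value' lines into {device: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and ":" in parts[0]:
            stats[parts[0]] = int(parts[1])
    return stats


def sample(cgroup_dir):
    """Read (sectors, time_ms) per-device dicts from a blkio cgroup directory."""
    d = Path(cgroup_dir)
    return (parse_per_device((d / "blkio.sectors").read_text()),
            parse_per_device((d / "blkio.time").read_text()))


def group_rates(before, after, interval_s):
    """Return {device: (wall_clock_Bps, active_Bps or None)} between two samples.

    wall_clock_Bps divides by the sampling interval (the "iostat" style rate);
    active_Bps divides by the disk time charged to the group in that interval,
    i.e. the blkio.sectors/blkio.time ratio, and is None if no time was charged.
    """
    rates = {}
    for dev, sectors_now in after[0].items():
        d_bytes = (sectors_now - before[0].get(dev, 0)) * SECTOR_BYTES
        d_ms = after[1].get(dev, 0) - before[1].get(dev, 0)
        wall = d_bytes / interval_s
        active = d_bytes / (d_ms / 1000.0) if d_ms > 0 else None
        rates[dev] = (wall, active)
    return rates
```

A caller would take sample() on the group's directory, sleep for the
chosen interval, take sample() again, and pass both results to
group_rates(). The second value is only as good as blkio.time itself.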

Gui, if you want to find out whether the cgroup has enough weight or
not, I'd recommend looking at the wait_time stat in addition to
blkio.time. It has been very useful in identifying jobs that are not
getting enough IO done because too little weight is assigned to them.
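
As a rough sketch of how that check could look: divide the accumulated
wait time by the number of IOs serviced to get an average wait per IO.
This assumes the stat is exposed as blkio.io_wait_time in nanoseconds
next to blkio.io_serviced, both in the "major:minor Operation value"
line format; the helper names are hypothetical.

```python
def parse_per_op(text):
    """Parse 'major:minor Operation value' lines into {(device, op): int}.

    Lines with a different field count (such as a trailing grand total)
    are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            stats[(parts[0], parts[1])] = int(parts[2])
    return stats


def avg_wait_ms(wait_ns, serviced, op="Total"):
    """Average scheduler wait per IO in milliseconds, per device, for one op type."""
    result = {}
    for (dev, kind), ns in wait_ns.items():
        if kind != op:
            continue
        ios = serviced.get((dev, kind), 0)
        if ios > 0:  # skip devices with no completed IO to avoid dividing by zero
            result[dev] = ns / ios / 1e6
    return result
```

A group whose average wait keeps growing while its blkio.time stays
small relative to its siblings is a candidate for a higher weight.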

>
> But I will not recommend using blkio.time as it is very approximate.
>
> I really am not able to see what this interface is really buying you.
>
> Thanks
> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2010-06-11 06:33:31

by Gui, Jianfeng/归剑峰

Subject: Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status

Divyesh Shah wrote:
> On Tue, May 25, 2010 at 6:25 AM, Vivek Goyal <[email protected]> wrote:
>> On Tue, May 25, 2010 at 11:00:54AM +0800, Gui Jianfeng wrote:
>>> Vivek Goyal wrote:
>>>> On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
>>>>> Vivek Goyal wrote:
>>>>>> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
>>>>>>> Vivek Goyal wrote:
>>>>>>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This series implements three new interfaces to keep track of transferred bytes,
>>>>>>>>> elapsed time and io rate since the group got backlogged. If the group is dequeued
>>>>>>>>> from the service tree, these three interfaces reset and show zero.
>>>>>>>> Hi Gui,
>>>>>>>>
>>>>>>>> Can you give some details regarding how this functionality is useful? Why
>>>>>>>> would somebody be interested only in stats since the group got
>>>>>>>> backlogged and not in total stats?
>>>>>>>>
>>>>>>>> Groups can come and go so fast and these stats will reset so many times
>>>>>>>> that I am not able to visualize how these stats will be useful.
>>>>>>> Hi Vivek,
>>>>>>>
>>>>>>> Currently, we assign a weight to a group, but the user still doesn't know how fast the
>>>>>>> group runs. With the io rate interface, users can check the rate of a group at any
>>>>>>> moment, or determine whether the weight assigned to a group is enough.
>>>>>>> The bytes and time interfaces are just for debugging purposes.
>>>>>> Gui,
>>>>>>
>>>>>> I still don't understand why the blkio.sectors, blkio.io_service_bytes
>>>>>> or blkio.io_serviced interfaces are not good enough to determine at what
>>>>>> rate a group is doing IO.
>>>>>>
>>>>>> I think we can very well write something in userspace like "iostat" to
>>>>>> display the per group rate. The utility can read any of the above files,
>>>>>> say at an interval of 1s, calculate the diff between the values and
>>>>>> display that as the group's effective rate.
>>>>> Hi Vivek,
>>>>>
>>>>> blkio.io_active_rate reflects the rate since the group got backlogged, so the rate is a smooth
>>>>> value. This value represents the actual rate at which a group runs. IMO, the io rate calculated from
>>>>> user space is not accurate in the following two scenarios:
>>>>>
>>>>> 1 Userspace app chooses an interval of 1s; if 0.5s is backlogged and 0.5s is not, the
>>>>>   rate calculated in this interval doesn't make sense.
>>>>>
>>>> If you are not servicing groups for a long time, it is very bad for
>>>> latency anyway. That's why the soft limit of 300ms in CFQ makes sense, and
>>>> practically I am not sure you will be blocking groups for .5s.
>>>>
>>>> Even if you do, the user just needs to choose a bigger interval and will
>>>> see smoother rates. Reduce the interval and you might see a slightly
>>>> bursty rate.
>>> Vivek,
>>>
>>> IIUC, the biggest problem for a user app is that it doesn't know how long
>>> the group has been dequeued during the interval. For example, the user chooses
>>> a 10s interval, 8s of which is not backlogged, but when the user app calculates
>>> the io rate, this 8s is still included. So this rate isn't what we want. Am I missing
>>> something?
>> Gui,
>>
>> If user application is not doing enough IO and group is getting deleted
>> fast, io_active_rate is not going to give you any meaningful data as it
>> will be lost the moment group gets deleted.
>>
>> Hence one needs to monitor the IO rate when a workload is running and is
>> keeping disk busy more or less all the time.
>>
>> Even in your example, if you monitored the IO rate over a 10 second interval and
>> the group is not doing any IO, you just can't do anything about it. It is just that
>> your measurement method is wrong. Even io_active_rate will not help you
>> here as by the time you read the file, the group is gone and there is no data.
>>
>> The very reason you want to monitor rate is that you want to make sure
>> group is getting enough BW. If group is not doing IO then one can look at
>> blkio.dequeue file and see if group is getting deleted too frequently. If
>> yes, that means group is not doing enough IO to keep the disk busy. One
>> can also try increasing the weight of the group but that will not help
>> much if group does not remain backlogged for significant amount of time.
>>
>>> "io_active_rate" will never take un-backlogged time into account when calculating
>>> io rate.
>>>
>> Theoretically blkio.sectors/blkio.time gives the rate excluding the time
>> when group was not backlogged?
>
> I agree with Vivek here. We use blkio.time as the source for computing
> the io rate of each cgroup, knowing that it is not entirely accurate but
> a good enough approximation.
>
> Gui, if you want to find out whether the cgroup has enough weight or
> not, I'd recommend looking at the wait_time stat in addition to
> blkio.time. It has been very useful in identifying jobs that are not
> getting enough IO done because too little weight is assigned to them.

Ok, I see. :)

Thanks,
Gui

>
>> But I will not recommend using blkio.time as it is very approximate.
>>
>> I really am not able to see what this interface is really buying you.
>>
>> Thanks
>> Vivek