DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns;
	h=message-id:date:from:to:subject:cc:in-reply-to:
	mime-version:content-type:content-transfer-encoding:
	content-disposition:references;
	b=ox3G2v4KUVc9KgKLHGD+nhTZLYPL+poCqPifC0/GzzRl6bUwzHROtjt8MIZSpFvft
	wKXQkI4NX4/PRw8lwa/9g==
Message-ID: <2846be6b0810281817x70ea8d4ev378f4893aaf1f61e@mail.gmail.com>
Date: Tue, 28 Oct 2008 18:17:05 -0700
From: "Naveen Gupta" <ngupta@google.com>
To: "Aaron Carroll" <aaronc@gelato.unsw.edu.au>
Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
       akpm@linux-foundation.org, s-uchida@ap.jp.nec.com, david@fromorbit.com
In-Reply-To: <4907AEE7.5030508@gelato.unsw.edu.au>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20081027190131.070061000@elf.corp.google.com>
	 <20081027190139.838646000@elf.corp.google.com>
	 <20081028002024.GM4985@disturbed>
	 <2846be6b0810281014q495cef22mae344423ed59c71a@mail.gmail.com>
	 <20081028214443.GX4985@disturbed>
	 <2846be6b0810281548oc81fbe4td2e1a5e2fba18745@mail.gmail.com>
	 <20081028233101.GD17077@disturbed>
	 <2846be6b0810281704r5092c415n3fea9c849c6086ca@mail.gmail.com>
	 <4907AEE7.5030508@gelato.unsw.edu.au>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 6418
Lines: 137

2008/10/28 Aaron Carroll <aaronc@gelato.unsw.edu.au>:
> Naveen Gupta wrote:
>> 2008/10/28 Dave Chinner <david@fromorbit.com>:
>>> On Tue, Oct 28, 2008 at 03:48:44PM -0700, Naveen Gupta wrote:
>>>> I/O from RT class in CFQ can still see a bubble with this new latency
>>>> class. An easy way to check this would be to submit ios at multiple
>>>> levels both in CFQ and AS and check max latency of the highest levels.
>>>> I will let Jens or Satoshi comment on exact algorithm for RT class.
>>> You're missing my point entirely.
>>>
>>> You're defining a new class that has the exact same meaning as
>>> the current RT class definition, then mapping the BE class over
>>> the top of that, hence changing what that means for everyone.
>>>
>>> The fact that the *implementation* of AS and CFQ is different is
>>> irrelevant; if you use the RT class then on CFQ you get the current
>>> RT behaviour, if you use the RT class on AS you should get your new
>>> priority dispatch mechanism. We don't need a new API just because
>>> the implementations are different.
>>>
>>
>> There is nothing "real-time" about the current RT class anyway. It is
>
> Yes, this is stupid.  IMO the real time class should be strict priorities
> within the class, and within the same priority level, round robin.  As it
> stands, RT seems to be just like a second BE class.
>
>> basically these small *implementation* differences that define these
>> classes in the current scheme of things, precise definitions of which
>> would be very hard to find if one started looking around.
>>
>> The current implementation of AS is basically a flat structure with
>> multiple priority levels. Initially I planned them to be different
>> levels of the best-effort class, which is exactly what we are doing:
>> "best-effort" from the scheduler/software point of view. So, the
>> question is what you do with the other classes for which you don't
>> have significantly different behavior: to keep things simple you map
>> them to the existing flat structure. So I mapped RT (all levels) to
>> BE 0 and idle (all levels) to BE 7.
>
> Even compared to CFQ's broken RT handling, this is wrong, because now
> any old BE0 process is equal in priority to any RT process.
>
>> This leaves the RT and IDLE classes open for future implementations,
>> where one could use hardware priorities (maybe in NCQ) to implement
>> the RT class, or other improvisations in software other than
>> schedulers to map to the RT class.
>>
>> Now the initial feedback was since this *implementation* is different
>> from anything we have in CFQ which is our current *standard* way of
>> thinking and comparing (that is the only thing that exists) why not
>> make them into a new class :). And somehow map others so that they
>> make some sense till we get something for those classes as well.
>>
>>>>>> So, in some sense it kind of implements absolute priority and
>>>>>> is best used for jobs which are latency sensitive.  Since the
>>>>>> priorities can be and are mapped internally in anticipatory
>>>>>> scheduler, BEST_EFFORT class is mapped one-one with the LATENCY
>>>>>> class.
>>>>> So you map the BE class to something with the same semantics as
>>>>> the RT class? What mapping do you do when an application uses
>>>>> the RT class?
>>>>>
>>>> Yes, I could have used the RT class, but it was used in CFQ to
>>>> implement its time-slice based highest priority class.  If an
>>>> application uses the RT class, AS maps all levels of the RT class to
>>>> BE class level 0 (i.e. to the highest priority available).
>>> Which means you are throwing away all the RT priority levels and
>>> so an application using the RT class would be subtly broken on AS....
>>>
>>
>> As I said earlier the organization of the AS levels is flat, so we
>> could use any class (RT, BE, LATENCY) and fold the remaining ones. The
>> other way which you would probably like is to increase number of
>> levels and map different classes so that they are not folded.
>
> As I said in my reply to the initial posting of this, I think there are
> only two sensible ways of handling this:
>
>  1) Maintain the full number of I/O priorities (1 IDLE, 8 BE, 8 RT);

But then we are assuming that we provide a different quality of
service for each class.

>  2) Collapse the levels and only deal with the classes;

I am not sure if this is meaningful. When all we have is different
levels of BE, it wouldn't make sense to call them different classes.

>
> Any other mapping seems arbitrary and likely to confuse.
>
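To keep the options straight, here is a rough sketch in C of the three
mappings on the table: the flat folding from the patch as described
above (RT -> BE 0, BE one-to-one, IDLE -> BE 7), and the two
alternatives Aaron lists. The function names and NR_LEVELS are made up
for illustration and are not taken from the patch. The class numbering
is meant to mirror the kernel's ioprio classes, and treating CLASS_NONE
like BE is an assumption.

```c
#include <assert.h>

/* Class numbering intended to mirror the kernel's ioprio classes.
 * Everything else in this file is hypothetical illustration. */
enum io_class { CLASS_NONE = 0, CLASS_RT = 1, CLASS_BE = 2, CLASS_IDLE = 3 };

#define NR_LEVELS 8	/* levels 0 (highest) .. 7 (lowest) within a class */

/* Flat folding as described in this thread: 8 scheduler levels total.
 * RT (all levels) -> 0, BE n -> n, IDLE (all levels) -> 7. */
int as_flat_level(enum io_class cls, int level)
{
	assert(level >= 0 && level < NR_LEVELS);
	switch (cls) {
	case CLASS_RT:
		return 0;
	case CLASS_IDLE:
		return NR_LEVELS - 1;
	case CLASS_BE:
	default:		/* assumption: CLASS_NONE handled like BE */
		return level;
	}
}

/* Alternative 1: keep all 17 priorities (8 RT, 8 BE, 1 IDLE). */
int full_level(enum io_class cls, int level)
{
	assert(level >= 0 && level < NR_LEVELS);
	switch (cls) {
	case CLASS_RT:
		return level;			/* 0..7 */
	case CLASS_IDLE:
		return 2 * NR_LEVELS;		/* 16 */
	case CLASS_BE:
	default:		/* assumption: CLASS_NONE handled like BE */
		return NR_LEVELS + level;	/* 8..15 */
	}
}

/* Alternative 2: collapse the levels and keep only the classes. */
int class_only_level(enum io_class cls)
{
	switch (cls) {
	case CLASS_RT:
		return 0;
	case CLASS_IDLE:
		return 2;
	case CLASS_BE:
	default:		/* assumption: CLASS_NONE handled like BE */
		return 1;
	}
}
```

With the flat folding, as_flat_level(CLASS_RT, n) and
as_flat_level(CLASS_BE, 0) both come out as 0, which is exactly the
"BE0 equals RT" objection raised above. In full_level() every RT
priority stays ahead of every BE priority.
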
>>>>>> A filesystem can use best-effort class using similar interface
>>>>>> as for cfq.
>>>>> The folk using the RT priority classes greatly objected to using
>>>>> the RT class for journal I/O precisely because it would then
>>>>> preempt their application's RT I/O and introduce unpredictable
>>>>> latencies.
>>>>>
>>>>> Journal I/O will typically use the highest priority BE class so
>>>>> that it is promoted above BE I/O but does not preempt RT I/O.
>>>>> With your mapping of BE classes to this new "absolute priority
>>>>> latency" class, this configuration will give journal I/O the
>>>>> highest priority in the scheduler. This will cause preemption of
>>>>> your latency sensitive I/O and so those latencies you are trying
>>>>> to avoid won't go away....
>>>>>
>>>> I see your problem, we could make the LATENCY class different from
>>>> and above BE class (instead of one-one mapping).
>>> Like the RT class is currently defined to be? ;)
>>>
>>
>> I agree with you and we could use RT (though you and I know that
>> basically it is best effort). LATENCY was invented due to a previous
>> suggestion.
>
> Maybe what you want to do is make RT really real-time, and then use this
> latency class to differentiate latency-sensitive BE traffic from regular
> BE traffic.  Not necessarily ``higher'' priority, just a different kind of
> best-effort.  One way of implementing this in CFQ might be to have smaller
> but more frequent dispatches.
>
>
> Also from the original posting, I think the weights are still broken
> (especially in the context of RT) but I won't repeat that here.

Sorry, I don't have the context on the weights. I can look at them later.

>
>
>    -- Aaron
>
>