2008-10-27 19:02:15

by Naveen Gupta

Subject: [PATCH] Priorities in Anticipatory I/O scheduler


Modifications to the Anticipatory I/O scheduler to add multiple priority
levels. It uses the anticipation and batching machinery of the current
anticipatory scheduler to implement priorities.

- Minimizes the latency of the highest priority level.
- Low priority requests wait for high priority requests.
- A higher priority request breaks anticipation of any lower priority request.
- If a single priority level is used, the scheduler behaves as the stock
anticipatory scheduler, so there is no change for existing users.

With this change, it is possible for a latency sensitive job to coexist
with a background job.

Another possible use of this patch is in the context of an I/O subsystem
controller, where it can add another dimension to the parameters controlling a
particular cgroup. While we can easily divide bandwidth among existing cgroups,
setting a bound on latency is not feasible. Hence, in the context of storage
devices, bandwidth and priority can be the two parameters controlling I/O. That
said, this can also stand alone to separate out latency sensitive jobs, and
need not be tied to an I/O controller.

In this patch I have added a new class, IOPRIO_CLASS_LATENCY, to distinguish
this notion of absolute priority from the existing time-slice based priority
classes in cfq. Internally, the anticipatory scheduler maps all classes to
best-effort levels, so the various best-effort priority levels can also be
used directly.

Resending this patch with changes addressing previous review comments.

Signed-off-by: Naveen Gupta <[email protected]>
---
--- a/block/Kconfig.iosched 2008-10-24 19:34:51.000000000 -0700
+++ b/block/Kconfig.iosched 2008-10-25 14:37:41.000000000 -0700
@@ -21,6 +21,14 @@ config IOSCHED_AS
deadline I/O scheduler, it can also be slower in some cases
especially some database loads.

+config IOPRIO_AS_MAX
+ int "Number of valid i/o priority levels"
+ depends on IOSCHED_AS
+ default "8"
+ help
+ This option controls the number of priority levels in the
+ anticipatory I/O scheduler.
+
config IOSCHED_DEADLINE
tristate "Deadline I/O scheduler"
default y
--- a/block/as-iosched.c 2008-10-24 19:34:51.000000000 -0700
+++ b/block/as-iosched.c 2008-10-25 14:37:44.000000000 -0700
@@ -16,6 +16,8 @@
#include <linux/compiler.h>
#include <linux/rbtree.h>
#include <linux/interrupt.h>
+#include <linux/ioprio.h>
+#include <linux/ctype.h>

#define REQ_SYNC 1
#define REQ_ASYNC 0
@@ -89,10 +91,14 @@ struct as_data {
/*
* requests (as_rq s) are present on both sort_list and fifo_list
*/
- struct rb_root sort_list[2];
- struct list_head fifo_list[2];
+ struct {
+ struct rb_root sort_list[2];
+ struct list_head fifo_list[2];
+ struct request *next_rq[2];
+ unsigned long ioprio_wt;
+ unsigned long serviced;
+ } prio_q[IOPRIO_AS_MAX];

- struct request *next_rq[2]; /* next in sort order */
sector_t last_sector[2]; /* last REQ_SYNC & REQ_ASYNC sectors */

unsigned long exit_prob; /* probability a task will exit while
@@ -113,6 +119,7 @@ struct as_data {
int write_batch_count; /* max # of reqs in a write batch */
int current_write_count; /* how many requests left this batch */
int write_batch_idled; /* has the write batch gone idle? */
+ unsigned short batch_ioprio;

enum anticipation_status antic_status;
unsigned long antic_start; /* jiffies: when it started */
@@ -156,6 +163,8 @@ static DEFINE_SPINLOCK(ioc_gone_lock);
static void as_move_to_dispatch(struct as_data *ad, struct request *rq);
static void as_antic_stop(struct as_data *ad);

+static unsigned short as_mapped_priority(unsigned short ioprio);
+
/*
* IO Context helper functions
*/
@@ -258,16 +267,25 @@ static void as_put_io_context(struct req
put_io_context(RQ_IOC(rq));
}

+static inline unsigned short rq_prio_level(struct request *rq)
+{
+ return IOPRIO_PRIO_DATA(as_mapped_priority(rq->ioprio));
+}
+
/*
* rb tree support functions
*/
-#define RQ_RB_ROOT(ad, rq) (&(ad)->sort_list[rq_is_sync((rq))])
+static inline struct rb_root *rq_rb_root(struct as_data *ad,
+ struct request *rq)
+{
+ return (&(ad)->prio_q[rq_prio_level(rq)].sort_list[rq_is_sync(rq)]);
+}

static void as_add_rq_rb(struct as_data *ad, struct request *rq)
{
struct request *alias;

- while ((unlikely(alias = elv_rb_add(RQ_RB_ROOT(ad, rq), rq)))) {
+ while ((unlikely(alias = elv_rb_add(rq_rb_root(ad, rq), rq)))) {
as_move_to_dispatch(ad, alias);
as_antic_stop(ad);
}
@@ -275,7 +293,14 @@ static void as_add_rq_rb(struct as_data

static inline void as_del_rq_rb(struct as_data *ad, struct request *rq)
{
- elv_rb_del(RQ_RB_ROOT(ad, rq), rq);
+ elv_rb_del(rq_rb_root(ad, rq), rq);
+}
+
+static inline struct request *ad_fifo_next(struct as_data *ad,
+ unsigned short ioprio,
+ int dir)
+{
+ return rq_entry_fifo(ad->prio_q[ioprio].fifo_list[dir].next);
}

/*
@@ -383,9 +408,7 @@ as_find_next_rq(struct as_data *ad, stru
if (rbnext)
next = rb_entry_rq(rbnext);
else {
- const int data_dir = rq_is_sync(last);
-
- rbnext = rb_first(&ad->sort_list[data_dir]);
+ rbnext = rb_first(rq_rb_root(ad, last));
if (rbnext && rbnext != &last->rb_node)
next = rb_entry_rq(rbnext);
}
@@ -638,6 +661,9 @@ static int as_close_req(struct as_data *
* as_can_break_anticipation returns true if we have been anticipating this
* request.
*
+ * It also returns true if this request is a higher priority request than
+ * what we have been anticipating.
+ *
* It also returns true if the process against which we are anticipating
* submits a write - that's presumably an fsync, O_SYNC write, etc. We want to
* dispatch it ASAP, because we know that application will not be submitting
@@ -651,6 +677,7 @@ static int as_can_break_anticipation(str
{
struct io_context *ioc;
struct as_io_context *aic;
+ unsigned short cioprio, rioprio = 0;

ioc = ad->io_context;
BUG_ON(!ioc);
@@ -689,6 +716,42 @@ static int as_can_break_anticipation(str
return 1;
}

+ cioprio = as_mapped_priority(ioc->ioprio);
+ if (rq)
+ rioprio = as_mapped_priority(rq->ioprio);
+
+ if (rq && ioprio_best(cioprio, rioprio) != cioprio) {
+ /*
+ * High priority request, if it has tokens break
+ * anticipation.
+ */
+ if ((ad->prio_q[rq_prio_level(rq)].serviced <
+ ad->prio_q[rq_prio_level(rq)].ioprio_wt)) {
+ spin_unlock(&ioc->lock);
+ return 1;
+ } else {
+ spin_unlock(&ioc->lock);
+ return 0;
+ }
+ }
+
+ if (rq && cioprio != rioprio &&
+ ioprio_best(cioprio, rioprio) == cioprio) {
+ /*
+ * low priority request. do not anticipate unless
+ * current has no tokens.
+ */
+ unsigned short clevel = IOPRIO_PRIO_DATA(cioprio);
+ if ((ad->prio_q[clevel].serviced <
+ ad->prio_q[clevel].ioprio_wt)) {
+ spin_unlock(&ioc->lock);
+ return 0;
+ } else {
+ spin_unlock(&ioc->lock);
+ return 1;
+ }
+ }
+
if (rq && rq_is_sync(rq) && as_close_req(ad, aic, rq)) {
/*
* Found a close request that is not one of ours.
@@ -792,7 +855,9 @@ static void as_update_rq(struct as_data
const int data_dir = rq_is_sync(rq);

/* keep the next_rq cache up to date */
- ad->next_rq[data_dir] = as_choose_req(ad, rq, ad->next_rq[data_dir]);
+ ad->prio_q[rq_prio_level(rq)].next_rq[data_dir] =
+ as_choose_req(ad, rq,
+ ad->prio_q[rq_prio_level(rq)].next_rq[data_dir]);

/*
* have we been anticipating this request?
@@ -915,8 +980,9 @@ static void as_remove_queued_request(str
* Update the "next_rq" cache if we are about to remove its
* entry
*/
- if (ad->next_rq[data_dir] == rq)
- ad->next_rq[data_dir] = as_find_next_rq(ad, rq);
+ if (ad->prio_q[rq_prio_level(rq)].next_rq[data_dir] == rq)
+ ad->prio_q[rq_prio_level(rq)].next_rq[data_dir] =
+ as_find_next_rq(ad, rq);

rq_fifo_clear(rq);
as_del_rq_rb(ad, rq);
@@ -930,7 +996,7 @@ static void as_remove_queued_request(str
*
* See as_antic_expired comment.
*/
-static int as_fifo_expired(struct as_data *ad, int adir)
+static int as_fifo_expired(struct as_data *ad, int adir, unsigned short ioprio)
{
struct request *rq;
long delta_jif;
@@ -943,10 +1009,10 @@ static int as_fifo_expired(struct as_dat

ad->last_check_fifo[adir] = jiffies;

- if (list_empty(&ad->fifo_list[adir]))
+ if (list_empty(&ad->prio_q[ioprio].fifo_list[adir]))
return 0;

- rq = rq_entry_fifo(ad->fifo_list[adir].next);
+ rq = rq_entry_fifo(ad->prio_q[ioprio].fifo_list[adir].next);

return time_after(jiffies, rq_fifo_time(rq));
}
@@ -968,6 +1034,15 @@ static inline int as_batch_expired(struc
|| ad->current_write_count == 0;
}

+static int as_has_request_at_priority(struct as_data *ad,
+ unsigned int priority)
+{
+ if (list_empty(&ad->prio_q[priority].fifo_list[REQ_SYNC]) &&
+ list_empty(&ad->prio_q[priority].fifo_list[REQ_ASYNC]))
+ return 0;
+ return 1;
+}
+
/*
* move an entry to dispatch queue
*/
@@ -1001,7 +1076,8 @@ static void as_move_to_dispatch(struct a
}
ad->ioc_finished = 0;

- ad->next_rq[data_dir] = as_find_next_rq(ad, rq);
+ ad->prio_q[rq_prio_level(rq)].next_rq[data_dir] =
+ as_find_next_rq(ad, rq);

/*
* take it off the sort and fifo list, add to dispatch queue
@@ -1009,6 +1085,8 @@ static void as_move_to_dispatch(struct a
as_remove_queued_request(ad->q, rq);
WARN_ON(RQ_STATE(rq) != AS_RQ_QUEUED);

+ ad->prio_q[rq_prio_level(rq)].serviced++;
+
elv_dispatch_sort(ad->q, rq);

RQ_SET_STATE(rq, AS_RQ_DISPATCHED);
@@ -1017,6 +1095,30 @@ static void as_move_to_dispatch(struct a
ad->nr_dispatched++;
}

+static unsigned int select_priority_level(struct as_data *ad)
+{
+ unsigned int i, best_ioprio = 0, ioprio, found_alt = 0;
+
+ for (ioprio = 0; ioprio < IOPRIO_AS_MAX; ioprio++) {
+ if (!as_has_request_at_priority(ad, ioprio))
+ continue;
+ if (ad->prio_q[ioprio].serviced < ad->prio_q[ioprio].ioprio_wt)
+ return ioprio;
+ if (!found_alt) {
+ best_ioprio = ioprio;
+ found_alt = 1;
+ }
+ }
+
+ if (found_alt) {
+ ioprio = best_ioprio;
+ for (i = 0; i < IOPRIO_AS_MAX; i++)
+ ad->prio_q[i].serviced = 0;
+ }
+
+ return ioprio;
+}
+
/*
* as_dispatch_request selects the best request according to
* read/write expire, batch expire, etc, and moves it to the dispatch
@@ -1025,9 +1127,10 @@ static void as_move_to_dispatch(struct a
static int as_dispatch_request(struct request_queue *q, int force)
{
struct as_data *ad = q->elevator->elevator_data;
- const int reads = !list_empty(&ad->fifo_list[REQ_SYNC]);
- const int writes = !list_empty(&ad->fifo_list[REQ_ASYNC]);
struct request *rq;
+ unsigned short ioprio;
+ int reads, writes;
+ int changed_ioprio;

if (unlikely(force)) {
/*
@@ -1043,21 +1146,32 @@ static int as_dispatch_request(struct re
ad->changed_batch = 0;
ad->new_batch = 0;

- while (ad->next_rq[REQ_SYNC]) {
- as_move_to_dispatch(ad, ad->next_rq[REQ_SYNC]);
- dispatched++;
- }
- ad->last_check_fifo[REQ_SYNC] = jiffies;
-
- while (ad->next_rq[REQ_ASYNC]) {
- as_move_to_dispatch(ad, ad->next_rq[REQ_ASYNC]);
- dispatched++;
+ for (ioprio = 0; ioprio < IOPRIO_AS_MAX; ioprio++) {
+ while (ad->prio_q[ioprio].next_rq[REQ_SYNC]) {
+ as_move_to_dispatch(ad,
+ ad->prio_q[ioprio].next_rq[REQ_SYNC]);
+ dispatched++;
+ }
+ ad->last_check_fifo[REQ_SYNC] = jiffies;
+
+ while (ad->prio_q[ioprio].next_rq[REQ_ASYNC]) {
+ as_move_to_dispatch(ad,
+ ad->prio_q[ioprio].next_rq[REQ_ASYNC]);
+ dispatched++;
+ }
+ ad->last_check_fifo[REQ_ASYNC] = jiffies;
}
- ad->last_check_fifo[REQ_ASYNC] = jiffies;

return dispatched;
}

+ ioprio = select_priority_level(ad);
+ if (ioprio >= IOPRIO_AS_MAX)
+ return 0;
+
+ reads = !list_empty(&ad->prio_q[ioprio].fifo_list[REQ_SYNC]);
+ writes = !list_empty(&ad->prio_q[ioprio].fifo_list[REQ_ASYNC]);
+
/* Signal that the write batch was uncontended, so we can't time it */
if (ad->batch_data_dir == REQ_ASYNC && !reads) {
if (ad->current_write_count == 0 || !writes)
@@ -1070,14 +1184,16 @@ static int as_dispatch_request(struct re
|| ad->changed_batch)
return 0;

+ changed_ioprio = ad->batch_ioprio != ioprio;
+
if (!(reads && writes && as_batch_expired(ad))) {
/*
* batch is still running or no reads or no writes
*/
- rq = ad->next_rq[ad->batch_data_dir];
+ rq = ad->prio_q[ioprio].next_rq[ad->batch_data_dir];

if (ad->batch_data_dir == REQ_SYNC && ad->antic_expire) {
- if (as_fifo_expired(ad, REQ_SYNC))
+ if (as_fifo_expired(ad, REQ_SYNC, ioprio))
goto fifo_expired;

if (as_can_anticipate(ad, rq)) {
@@ -1086,7 +1202,7 @@ static int as_dispatch_request(struct re
}
}

- if (rq) {
+ if (!changed_ioprio && rq) {
/* we have a "next request" */
if (reads && !writes)
ad->current_batch_expires =
@@ -1101,9 +1217,10 @@ static int as_dispatch_request(struct re
*/

if (reads) {
- BUG_ON(RB_EMPTY_ROOT(&ad->sort_list[REQ_SYNC]));
+ BUG_ON(RB_EMPTY_ROOT(&ad->prio_q[ioprio].sort_list[REQ_SYNC]));

- if (writes && ad->batch_data_dir == REQ_SYNC)
+ if (!changed_ioprio && writes &&
+ ad->batch_data_dir == REQ_SYNC)
/*
* Last batch was a read, switch to writes
*/
@@ -1112,9 +1229,12 @@ static int as_dispatch_request(struct re
if (ad->batch_data_dir == REQ_ASYNC) {
WARN_ON(ad->new_batch);
ad->changed_batch = 1;
- }
+ } else if (changed_ioprio)
+ ad->current_batch_expires =
+ jiffies + ad->batch_expire[REQ_SYNC];
+ ad->batch_ioprio = ioprio;
ad->batch_data_dir = REQ_SYNC;
- rq = rq_entry_fifo(ad->fifo_list[REQ_SYNC].next);
+ rq = ad_fifo_next(ad, ioprio, REQ_SYNC);
ad->last_check_fifo[ad->batch_data_dir] = jiffies;
goto dispatch_request;
}
@@ -1125,7 +1245,7 @@ static int as_dispatch_request(struct re

if (writes) {
dispatch_writes:
- BUG_ON(RB_EMPTY_ROOT(&ad->sort_list[REQ_ASYNC]));
+ BUG_ON(RB_EMPTY_ROOT(&ad->prio_q[ioprio].sort_list[REQ_ASYNC]));

if (ad->batch_data_dir == REQ_SYNC) {
ad->changed_batch = 1;
@@ -1136,11 +1256,14 @@ dispatch_writes:
* cause a change of batch before the read is finished.
*/
ad->new_batch = 0;
- }
+ } else if (changed_ioprio)
+ ad->current_batch_expires = jiffies +
+ ad->batch_expire[REQ_ASYNC];
+ ad->batch_ioprio = ioprio;
ad->batch_data_dir = REQ_ASYNC;
ad->current_write_count = ad->write_batch_count;
ad->write_batch_idled = 0;
- rq = rq_entry_fifo(ad->fifo_list[REQ_ASYNC].next);
+ rq = ad_fifo_next(ad, ioprio, REQ_ASYNC);
ad->last_check_fifo[REQ_ASYNC] = jiffies;
goto dispatch_request;
}
@@ -1153,9 +1276,9 @@ dispatch_request:
* If a request has expired, service it.
*/

- if (as_fifo_expired(ad, ad->batch_data_dir)) {
+ if (as_fifo_expired(ad, ad->batch_data_dir, ioprio)) {
fifo_expired:
- rq = rq_entry_fifo(ad->fifo_list[ad->batch_data_dir].next);
+ rq = ad_fifo_next(ad, ioprio, ad->batch_data_dir);
}

if (ad->changed_batch) {
@@ -1206,7 +1329,8 @@ static void as_add_request(struct reques
* set expire time and add to fifo list
*/
rq_set_fifo_time(rq, jiffies + ad->fifo_expire[data_dir]);
- list_add_tail(&rq->queuelist, &ad->fifo_list[data_dir]);
+ list_add_tail(&rq->queuelist,
+ &ad->prio_q[rq_prio_level(rq)].fifo_list[data_dir]);

as_update_rq(ad, rq); /* keep state machine up to date */
RQ_SET_STATE(rq, AS_RQ_QUEUED);
@@ -1237,9 +1361,39 @@ static void as_deactivate_request(struct
static int as_queue_empty(struct request_queue *q)
{
struct as_data *ad = q->elevator->elevator_data;
+ unsigned short ioprio;

- return list_empty(&ad->fifo_list[REQ_ASYNC])
- && list_empty(&ad->fifo_list[REQ_SYNC]);
+ for (ioprio = 0; ioprio < IOPRIO_AS_MAX; ioprio++) {
+ if (as_has_request_at_priority(ad, ioprio))
+ return 0;
+ }
+ return 1;
+}
+
+static unsigned short as_mapped_priority(unsigned short ioprio)
+{
+ unsigned short class = IOPRIO_PRIO_CLASS(ioprio);
+ unsigned short data = IOPRIO_PRIO_DATA(ioprio);
+
+ if (class == IOPRIO_CLASS_BE)
+ return ((data < IOPRIO_AS_MAX)? ioprio:
+ IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE,
+ (IOPRIO_AS_MAX - 1)));
+ else if (class == IOPRIO_CLASS_LATENCY)
+ return ((data < IOPRIO_AS_MAX)?
+ IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, data):
+ IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE,
+ (IOPRIO_AS_MAX - 1)));
+ else if (class == IOPRIO_CLASS_RT)
+ return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 0);
+ else if (class == IOPRIO_CLASS_IDLE)
+ return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, (IOPRIO_AS_MAX - 1));
+ else if (class == IOPRIO_CLASS_NONE) {
+ return IOPRIO_AS_DEFAULT;
+ } else {
+ WARN_ON(1);
+ return IOPRIO_AS_DEFAULT;
+ }
}

static int
@@ -1248,11 +1402,15 @@ as_merge(struct request_queue *q, struct
struct as_data *ad = q->elevator->elevator_data;
sector_t rb_key = bio->bi_sector + bio_sectors(bio);
struct request *__rq;
+ unsigned short ioprio;
+
+ ioprio = IOPRIO_PRIO_DATA(as_mapped_priority(bio_prio(bio)));

/*
* check for front merge
*/
- __rq = elv_rb_find(&ad->sort_list[bio_data_dir(bio)], rb_key);
+ __rq = elv_rb_find(&ad->prio_q[ioprio].sort_list[bio_data_dir(bio)],
+ rb_key);
if (__rq && elv_rq_merge_ok(__rq, bio)) {
*req = __rq;
return ELEVATOR_FRONT_MERGE;
@@ -1342,12 +1500,13 @@ static int as_may_queue(struct request_q
static void as_exit_queue(elevator_t *e)
{
struct as_data *ad = e->elevator_data;
+ int ioprio;

del_timer_sync(&ad->antic_timer);
kblockd_flush_work(&ad->antic_work);

- BUG_ON(!list_empty(&ad->fifo_list[REQ_SYNC]));
- BUG_ON(!list_empty(&ad->fifo_list[REQ_ASYNC]));
+ for (ioprio = 0; ioprio < IOPRIO_AS_MAX; ioprio++)
+ BUG_ON(as_has_request_at_priority(ad, ioprio));

put_io_context(ad->io_context);
kfree(ad);
@@ -1359,6 +1518,7 @@ static void as_exit_queue(elevator_t *e)
static void *as_init_queue(struct request_queue *q)
{
struct as_data *ad;
+ int i;

ad = kmalloc_node(sizeof(*ad), GFP_KERNEL | __GFP_ZERO, q->node);
if (!ad)
@@ -1372,10 +1532,20 @@ static void *as_init_queue(struct reques
init_timer(&ad->antic_timer);
INIT_WORK(&ad->antic_work, as_work_handler);

- INIT_LIST_HEAD(&ad->fifo_list[REQ_SYNC]);
- INIT_LIST_HEAD(&ad->fifo_list[REQ_ASYNC]);
- ad->sort_list[REQ_SYNC] = RB_ROOT;
- ad->sort_list[REQ_ASYNC] = RB_ROOT;
+ for (i = IOPRIO_AS_MAX - 1; i >= 0; i--) {
+ INIT_LIST_HEAD(&ad->prio_q[i].fifo_list[REQ_SYNC]);
+ INIT_LIST_HEAD(&ad->prio_q[i].fifo_list[REQ_ASYNC]);
+ ad->prio_q[i].sort_list[REQ_SYNC] = RB_ROOT;
+ ad->prio_q[i].sort_list[REQ_ASYNC] = RB_ROOT;
+ ad->prio_q[i].serviced = 0;
+ if (i == 0)
+ ad->prio_q[i].ioprio_wt = 100;
+ else if (i == 1)
+ ad->prio_q[i].ioprio_wt = 5;
+ else
+ ad->prio_q[i].ioprio_wt = 1;
+ }
+
ad->fifo_expire[REQ_SYNC] = default_read_expire;
ad->fifo_expire[REQ_ASYNC] = default_write_expire;
ad->antic_expire = default_antic_expire;
@@ -1426,6 +1596,56 @@ static ssize_t est_time_show(elevator_t
return pos;
}

+static ssize_t as_priority_weights_show(elevator_t *e, char *page)
+{
+ struct as_data *ad = e->elevator_data;
+ int i, pos = 0;
+
+ for (i = 0; i < IOPRIO_AS_MAX; i++)
+ pos += sprintf(page + pos, "%lu ", ad->prio_q[i].ioprio_wt);
+
+ pos += sprintf(page + pos, "\n");
+
+ return pos;
+}
+
+static ssize_t as_priority_weights_store(elevator_t *e, const char *page,
+ size_t count)
+{
+ struct as_data *ad = e->elevator_data;
+ char *prev_p, *p = (char *)page;
+ unsigned long val;
+ int i = 0, j, tcount = count;
+ unsigned long ioprio_wt[IOPRIO_AS_MAX];
+
+ while(tcount && i < IOPRIO_AS_MAX) {
+ prev_p = p;
+ /* Initial whitespace ignored by the next while loop. */
+ val = simple_strtoul(p, &p, 10);
+ tcount -= (p - prev_p);
+ /* Don't terminate on seeing whitespace. */
+ if ((p - prev_p) && (val == 0))
+ goto err;
+ while (tcount && isspace(*p)) {
+ p++;
+ tcount--;
+ }
+ /* If not whitespace and value > 0, it is valid input. */
+ if (val > 0)
+ ioprio_wt[i++] = val;
+ if (tcount && !isdigit(*p))
+ goto err;
+ }
+
+ if (i == IOPRIO_AS_MAX && !tcount)
+ for (j = 0; j < IOPRIO_AS_MAX; j++)
+ ad->prio_q[j].ioprio_wt = ioprio_wt[j];
+
+ return count;
+err:
+ return -EINVAL;
+}
+
#define SHOW_FUNCTION(__FUNC, __VAR) \
static ssize_t __FUNC(elevator_t *e, char *page) \
{ \
@@ -1470,11 +1690,22 @@ static struct elv_fs_entry as_attrs[] =
AS_ATTR(antic_expire),
AS_ATTR(read_batch_expire),
AS_ATTR(write_batch_expire),
+ AS_ATTR(priority_weights),
__ATTR_NULL
};

+static int as_allow_merge(struct request_queue *q, struct request *rq,
+ struct bio *bio)
+{
+ if (as_mapped_priority(rq->ioprio) !=
+ as_mapped_priority(bio_prio(bio)))
+ return 0;
+ return 1;
+}
+
static struct elevator_type iosched_as = {
.ops = {
+ .elevator_allow_merge_fn = as_allow_merge,
.elevator_merge_fn = as_merge,
.elevator_merged_fn = as_merged_request,
.elevator_merge_req_fn = as_merged_requests,
--- a/include/linux/bio.h 2008-10-24 19:35:31.000000000 -0700
+++ b/include/linux/bio.h 2008-10-25 14:22:11.000000000 -0700
@@ -169,7 +169,6 @@ struct bio {
#define bio_prio_valid(bio) ioprio_valid(bio_prio(bio))

#define bio_set_prio(bio, prio) do { \
- WARN_ON(prio >= (1 << IOPRIO_BITS)); \
(bio)->bi_rw &= ((1UL << BIO_PRIO_SHIFT) - 1); \
(bio)->bi_rw |= ((unsigned long) (prio) << BIO_PRIO_SHIFT); \
} while (0)
--- a/include/linux/ioprio.h 2008-10-24 19:35:32.000000000 -0700
+++ b/include/linux/ioprio.h 2008-10-25 14:22:11.000000000 -0700
@@ -28,6 +28,7 @@ enum {
IOPRIO_CLASS_RT,
IOPRIO_CLASS_BE,
IOPRIO_CLASS_IDLE,
+ IOPRIO_CLASS_LATENCY,
};

/*
@@ -35,6 +36,10 @@ enum {
*/
#define IOPRIO_BE_NR (8)

+#define IOPRIO_AS_MAX CONFIG_IOPRIO_AS_MAX
+
+#define IOPRIO_AS_DEFAULT IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 2)
+
enum {
IOPRIO_WHO_PROCESS = 1,
IOPRIO_WHO_PGRP,
--- a/block/blk-core.c 2008-10-24 19:34:51.000000000 -0700
+++ b/block/blk-core.c 2008-10-25 14:22:11.000000000 -0700
@@ -1529,6 +1529,10 @@ void submit_bio(int rw, struct bio *bio)

bio->bi_rw |= rw;

+ if (current->io_context)
+ bio_set_prio(bio, current->io_context->ioprio);
+
+
/*
* If it's a regular read/write or a barrier with data attached,
* go through the normal accounting stuff before submission.
--- a/fs/ioprio.c 2008-10-24 19:35:25.000000000 -0700
+++ b/fs/ioprio.c 2008-10-25 14:22:11.000000000 -0700
@@ -180,6 +180,7 @@ int ioprio_best(unsigned short aprio, un
else
return aprio;
}
+EXPORT_SYMBOL(ioprio_best);

asmlinkage long sys_ioprio_get(int which, int who)
{

--


2008-10-28 00:20:44

by Dave Chinner

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

On Mon, Oct 27, 2008 at 12:01:32PM -0700, [email protected] wrote:
>
> Modifications to the Anticipatory I/O scheduler to add multiple priority
> levels. It makes use of anticipation and batching in current
> anticipatory scheduler to implement priorities.
>
> - Minimizes the latency of highest priority level.
> - Low priority requests wait for high priority requests.
> - Higher priority request break any anticipating low priority request.
> - If single priority level is used the scheduler behaves as an
> anticipatory scheduler. So no change for existing users.
>
> With this change, it is possible for a latency sensitive job to coexist
> with background job.
>
> Other possible use of this patch is in context of I/O subsystem controller.
> It can add another dimension to the parameters controlling a particular cgroup.
> While we can easily divide b/w among existing croups, setting a bound on
> latency is not a feasible solution. Hence in context of storage devices
> bandwidth and priority can be two parameters controlling I/O. Though
> it can be a standalone patch to separate latency sensitive jobs and need
> not be tied to I/O controller.
>
> In this patch I have added a new class IOPRIO_CLASS_LATENCY to differentiate
> notion of absolute priority over existing uses of various time-slice based
> priority classes in cfq. Though internally within anticipatory scheduler all
> of them map to best-effort levels. Hence, one can also use various best-effort
> priority levels.

Please don't introduce yet another incompatible behaviour between
I/O schedulers. It's bad enough from an optimisation point of view
that BIO_RW_SYNC and BIO_RW_META mean different things to different
schedulers, let alone that only CFQ currently understands
priorities. If you are going to introduce priorities into AS, then
please, please, please make it use the same interface as CFQ.

Why? Both the extN and XFS devs have been considering bumping the
priority of journal writes using the existing CFQ-based I/O priority
mechanism - the last thing I want to see is a different scheduler
requiring a different priority configuration to achieve the same
optimisation. There is no way we can support this sort of
optimisation in the filesystem code if the interface changes when
the I/O scheduler changes. So please use the existing IOPRIO classes
to map the priorities for the AS scheduler.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2008-10-28 17:32:47

by Naveen Gupta

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

2008/10/27 Dave Chinner <[email protected]>:
> On Mon, Oct 27, 2008 at 12:01:32PM -0700, [email protected] wrote:
>>
>> Modifications to the Anticipatory I/O scheduler to add multiple priority
>> levels. It makes use of anticipation and batching in current
>> anticipatory scheduler to implement priorities.
>>
>> - Minimizes the latency of highest priority level.
>> - Low priority requests wait for high priority requests.
>> - Higher priority request break any anticipating low priority request.
>> - If single priority level is used the scheduler behaves as an
>> anticipatory scheduler. So no change for existing users.
>>
>> With this change, it is possible for a latency sensitive job to coexist
>> with background job.
>>
>> Other possible use of this patch is in context of I/O subsystem controller.
>> It can add another dimension to the parameters controlling a particular cgroup.
>> While we can easily divide b/w among existing croups, setting a bound on
>> latency is not a feasible solution. Hence in context of storage devices
>> bandwidth and priority can be two parameters controlling I/O. Though
>> it can be a standalone patch to separate latency sensitive jobs and need
>> not be tied to I/O controller.
>>
>> In this patch I have added a new class IOPRIO_CLASS_LATENCY to differentiate
>> notion of absolute priority over existing uses of various time-slice based
>> priority classes in cfq. Though internally within anticipatory scheduler all
>> of them map to best-effort levels. Hence, one can also use various best-effort
>> priority levels.
>
> Please don't introduce yet another incompatible behaviour between
> I/O schedulers. It's bad enough from an optimisation point of view
> that BIO_RW_SYNC and BIO_RW_META mean different things to different
> schedulers, let alone that only CFQ currently understands
> priorities. If you are going to introduce priorities into AS, then
> please, please, please make it use the same interface as CFQ.
>
> Why? Both the extN and XFS devs have been considering bumping the
> priority of journal writes using the existing CFQ-based I/O priority
> mechanism - the last thing I want to see is a different scheduler
> requiring a different priority configuration to acheive the same
> optimisation. There is no way we can support this sort of
> optimisation in the filesystem code if the interface changes when
> the I/O scheduler changes. So please use the existing IOPRIO classes
> to map the priorities for the AS scheduler.
>

The anticipatory scheduler chooses its next I/O from the highest
available priority level. So, in some sense, it implements absolute
priority and is best used for jobs which are latency sensitive. Since
the priorities are mapped internally in the anticipatory scheduler, the
BEST_EFFORT class is mapped one-to-one with the LATENCY class. A
filesystem can use the best-effort class through the same interface as
for cfq.

-Naveen


> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
>

2008-10-28 21:44:59

by Dave Chinner

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

On Tue, Oct 28, 2008 at 10:14:20AM -0700, Naveen Gupta wrote:
> 2008/10/27 Dave Chinner <[email protected]>:
> > On Mon, Oct 27, 2008 at 12:01:32PM -0700, [email protected] wrote:
> >>
> >> Modifications to the Anticipatory I/O scheduler to add multiple priority
> >> levels. It makes use of anticipation and batching in current
> >> anticipatory scheduler to implement priorities.
.....
> >> In this patch I have added a new class IOPRIO_CLASS_LATENCY to differentiate
> >> notion of absolute priority over existing uses of various time-slice based
> >> priority classes in cfq. Though internally within anticipatory scheduler all
> >> of them map to best-effort levels. Hence, one can also use various best-effort
> >> priority levels.
> >
> > Please don't introduce yet another incompatible behaviour between
> > I/O schedulers. It's bad enough from an optimisation point of view
> > that BIO_RW_SYNC and BIO_RW_META mean different things to different
> > schedulers, let alone that only CFQ currently understands
> > priorities. If you are going to introduce priorities into AS, then
> > please, please, please make it use the same interface as CFQ.
> >
> > Why? Both the extN and XFS devs have been considering bumping the
> > priority of journal writes using the existing CFQ-based I/O priority
> > mechanism - the last thing I want to see is a different scheduler
> > requiring a different priority configuration to acheive the same
> > optimisation. There is no way we can support this sort of
> > optimisation in the filesystem code if the interface changes when
> > the I/O scheduler changes. So please use the existing IOPRIO classes
> > to map the priorities for the AS scheduler.
> >
>
> The anticipatory scheduler chooses it's next i/o to be of highest
> available priority level.

That sounds exactly like what the current RT class is supposed to
be used for - defining the absolute priority of dispatch. How
is this latency class different to the current RT class semantics
that are defined for CFQ?

> So, in some sense it kind of implements
> absolute priority and is best used for jobs which are latency
> sensitive. Since the priorities can be and are mapped internally in
> anticipatory scheduler, BEST_EFFORT class is mapped one-one with the
> LATENCY class.

So you map the BE class to something with the same semantics as the
RT class? What mapping do you do when an application uses the RT
class?

> A filesystem can use best-effort class using similar
> interface as for cfq.

The folk using the RT priority classes greatly objected to using
the RT class for journal I/O precisely because it would then
preempt their application's RT I/O and introduce unpredictable
latencies.

Journal I/O will typically use the highest priority BE class so that
it is promoted above BE I/O but does not preempt RT I/O. With your
mapping of BE classes to this new "absolute priority latency" class,
this configuration will give journal I/O the highest priority in the
scheduler. This will cause preemption of your latency sensitive I/O
and so those latencies you are trying to avoid won't go away....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2008-10-28 22:49:00

by Naveen Gupta

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

2008/10/28 Dave Chinner <[email protected]>:
> On Tue, Oct 28, 2008 at 10:14:20AM -0700, Naveen Gupta wrote:
>> 2008/10/27 Dave Chinner <[email protected]>:
>> > On Mon, Oct 27, 2008 at 12:01:32PM -0700, [email protected] wrote:
>> >>
>> >> Modifications to the Anticipatory I/O scheduler to add multiple priority
>> >> levels. It makes use of anticipation and batching in current
>> >> anticipatory scheduler to implement priorities.
> .....
>> >> In this patch I have added a new class IOPRIO_CLASS_LATENCY to differentiate
>> >> notion of absolute priority over existing uses of various time-slice based
>> >> priority classes in cfq. Though internally within anticipatory scheduler all
>> >> of them map to best-effort levels. Hence, one can also use various best-effort
>> >> priority levels.
>> >
>> > Please don't introduce yet another incompatible behaviour between
>> > I/O schedulers. It's bad enough from an optimisation point of view
>> > that BIO_RW_SYNC and BIO_RW_META mean different things to different
>> > schedulers, let alone that only CFQ currently understands
>> > priorities. If you are going to introduce priorities into AS, then
>> > please, please, please make it use the same interface as CFQ.
>> >
>> > Why? Both the extN and XFS devs have been considering bumping the
>> > priority of journal writes using the existing CFQ-based I/O priority
>> > mechanism - the last thing I want to see is a different scheduler
>> > requiring a different priority configuration to acheive the same
>> > optimisation. There is no way we can support this sort of
>> > optimisation in the filesystem code if the interface changes when
>> > the I/O scheduler changes. So please use the existing IOPRIO classes
>> > to map the priorities for the AS scheduler.
>> >
>>
>> The anticipatory scheduler chooses it's next i/o to be of highest
>> available priority level.
>
> That sounds exactly like what the current RT class is supposed to
> be used for - defining the absolute priority of dispatch. How
> is this latency class different to the current RT class semantics
> that are defined for CFQ?
>

I/O from the RT class in CFQ can still see a bubble with this new latency
class. An easy way to check this would be to submit I/Os at multiple
levels under both CFQ and AS and compare the maximum latency of the
highest levels. I will let Jens or Satoshi comment on the exact
algorithm for the RT class.


>> So, in some sense it kind of implements
>> absolute priority and is best used for jobs which are latency
>> sensitive. Since the priorities can be and are mapped internally in
>> anticipatory scheduler, BEST_EFFORT class is mapped one-one with the
>> LATENCY class.
>
> So you map the BE class to something with the same semantics as the
> RT class? What mapping do you do when an application uses the RT
> class?
>

Yes, I could have used the RT class, but CFQ already uses it to
implement its time-slice-based highest-priority class. If an
application uses the RT class, AS maps all levels of the RT class to
BE level 0 (i.e. to the highest priority available).

>> A filesystem can use best-effort class using similar
>> interface as for cfq.
>
> The folk using the RT priority classes greatly objected to using
> the RT class for journal I/O precisely because it would then
> preempt their application's RT I/O and introduce unpredictable
> latencies.
>
> Journal I/O will typically use the highest priority BE class so that
> it is promoted above BE I/O but does not preempt RT I/O. With your
> mapping of BE classes to this new "absolute priority latency" class,
> this configuration will give journal I/O the highest priority in the
> scheduler. This will cause preemption of your latency sensitive I/O
> and so those latencies you are trying to avoid won't go away....
>

I see your problem. We could make the LATENCY class distinct from, and
above, the BE class (instead of the one-to-one mapping). Then you could
use BE level 0 all you want. Or you could select BE level 1 and we can
keep the classes as they are.


> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
>

2008-10-28 23:31:30

by Dave Chinner

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

On Tue, Oct 28, 2008 at 03:48:44PM -0700, Naveen Gupta wrote:
> 2008/10/28 Dave Chinner <[email protected]>:
> > On Tue, Oct 28, 2008 at 10:14:20AM -0700, Naveen Gupta wrote:
> >> 2008/10/27 Dave Chinner <[email protected]>:
> >> > On Mon, Oct 27, 2008 at 12:01:32PM -0700, [email protected] wrote:
> >> >>
> >> >> Modifications to the Anticipatory I/O scheduler to add multiple priority
> >> >> levels. It makes use of anticipation and batching in current
> >> >> anticipatory scheduler to implement priorities.
> > .....
> >> >> In this patch I have added a new class IOPRIO_CLASS_LATENCY to differentiate
> >> >> notion of absolute priority over existing uses of various time-slice based
> >> >> priority classes in cfq. Though internally within anticipatory scheduler all
> >> >> of them map to best-effort levels. Hence, one can also use various best-effort
> >> >> priority levels.
> >> >
> >> > Please don't introduce yet another incompatible behaviour between
> >> > I/O schedulers. It's bad enough from an optimisation point of view
> >> > that BIO_RW_SYNC and BIO_RW_META mean different things to different
> >> > schedulers, let alone that only CFQ currently understands
> >> > priorities. If you are going to introduce priorities into AS, then
> >> > please, please, please make it use the same interface as CFQ.
> >> >
> >> > Why? Both the extN and XFS devs have been considering bumping the
> >> > priority of journal writes using the existing CFQ-based I/O priority
> >> > mechanism - the last thing I want to see is a different scheduler
> >> > requiring a different priority configuration to acheive the same
> >> > optimisation. There is no way we can support this sort of
> >> > optimisation in the filesystem code if the interface changes when
> >> > the I/O scheduler changes. So please use the existing IOPRIO classes
> >> > to map the priorities for the AS scheduler.
> >> >
> >>
> >> The anticipatory scheduler chooses it's next i/o to be of highest
> >> available priority level.
> >
> > That sounds exactly like what the current RT class is supposed to
> > be used for - defining the absolute priority of dispatch. How
> > is this latency class different to the current RT class semantics
> > that are defined for CFQ?
> >
>
> I/O from RT class in CFQ can still see a bubble with this new latency
> class. An easy way to check this would be to submit ios at multiple
> levels both in CFQ and AS and check max latency of the highest levels.
> I will let Jens or Satoshi comment on exact algorithm for RT class.

You're missing my point entirely.

You're defining a new class that has the exact same meaning as
the current RT class definition, then mapping the BE class over
the top of that, hence changing what that means for everyone.

The fact that the *implementation* of AS and CFQ is different is
irrelevant; if you use the RT class then on CFQ you get the current
RT behaviour, if you use the RT class on AS you should get your new
priority dispatch mechanism. We don't need a new API just because
the implementations are different.

> >> So, in some sense it kind of implements absolute priority and
> >> is best used for jobs which are latency sensitive. Since the
> >> priorities can be and are mapped internally in anticipatory
> >> scheduler, BEST_EFFORT class is mapped one-one with the LATENCY
> >> class.
> >
> > So you map the BE class to something with the same semantics as
> > the RT class? What mapping do you do when an application uses
> > the RT class?
> >
>
> Yes I could have used RT class but it was used in CFQ to implement
> it's time-sliced based highest priority class. If an application
> uses RT class, AS maps all levels of RT class to BE class level 0
> (i.e. to the highest priority available)

Which means you are throwing away all the RT priority levels and
so an application using the RT class would be subtly broken on AS....

> >> A filesystem can use best-effort class using similar interface
> >> as for cfq.
> >
> > The folk using the RT priority classes greatly objected to using
> > the RT class for journal I/O precisely because it would then
> > preempt their application's RT I/O and introduce unpredictable
> > latencies.
> >
> > Journal I/O will typically use the highest priority BE class so
> > that it is promoted above BE I/O but does not preempt RT I/O.
> > With your mapping of BE classes to this new "absolute priority
> > latency" class, this configuration will give journal I/O the
> > highest priority in the scheduler. This will cause preemption of
> > your latency sensitive I/O and so those latencies you are trying
> > to avoid won't go away....
> >
>
> I see your problem, we could make the LATENCY class different from
> and above BE class (instead of one-one mapping).

Like the RT class is currently defined to be? ;)

Cheers,

Dave.
--
Dave Chinner
[email protected]

2008-10-29 00:05:18

by Naveen Gupta

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

2008/10/28 Dave Chinner <[email protected]>:
> On Tue, Oct 28, 2008 at 03:48:44PM -0700, Naveen Gupta wrote:
>> 2008/10/28 Dave Chinner <[email protected]>:
>> > On Tue, Oct 28, 2008 at 10:14:20AM -0700, Naveen Gupta wrote:
>> >> 2008/10/27 Dave Chinner <[email protected]>:
>> >> > On Mon, Oct 27, 2008 at 12:01:32PM -0700, [email protected] wrote:
>> >> >>
>> >> >> Modifications to the Anticipatory I/O scheduler to add multiple priority
>> >> >> levels. It makes use of anticipation and batching in current
>> >> >> anticipatory scheduler to implement priorities.
>> > .....
>> >> >> In this patch I have added a new class IOPRIO_CLASS_LATENCY to differentiate
>> >> >> notion of absolute priority over existing uses of various time-slice based
>> >> >> priority classes in cfq. Though internally within anticipatory scheduler all
>> >> >> of them map to best-effort levels. Hence, one can also use various best-effort
>> >> >> priority levels.
>> >> >
>> >> > Please don't introduce yet another incompatible behaviour between
>> >> > I/O schedulers. It's bad enough from an optimisation point of view
>> >> > that BIO_RW_SYNC and BIO_RW_META mean different things to different
>> >> > schedulers, let alone that only CFQ currently understands
>> >> > priorities. If you are going to introduce priorities into AS, then
>> >> > please, please, please make it use the same interface as CFQ.
>> >> >
>> >> > Why? Both the extN and XFS devs have been considering bumping the
>> >> > priority of journal writes using the existing CFQ-based I/O priority
>> >> > mechanism - the last thing I want to see is a different scheduler
>> >> > requiring a different priority configuration to acheive the same
>> >> > optimisation. There is no way we can support this sort of
>> >> > optimisation in the filesystem code if the interface changes when
>> >> > the I/O scheduler changes. So please use the existing IOPRIO classes
>> >> > to map the priorities for the AS scheduler.
>> >> >
>> >>
>> >> The anticipatory scheduler chooses it's next i/o to be of highest
>> >> available priority level.
>> >
>> > That sounds exactly like what the current RT class is supposed to
>> > be used for - defining the absolute priority of dispatch. How
>> > is this latency class different to the current RT class semantics
>> > that are defined for CFQ?
>> >
>>
>> I/O from RT class in CFQ can still see a bubble with this new latency
>> class. An easy way to check this would be to submit ios at multiple
>> levels both in CFQ and AS and check max latency of the highest levels.
>> I will let Jens or Satoshi comment on exact algorithm for RT class.
>
> You're missing my point entirely.
>
> You're defining a new class that has the exact same meaning as
> the current RT class definition, then mapping the BE class over
> the top of that, hence changing what that means for everyone.
>
> The fact that the *implementation* of AS and CFQ is different is
> irrelevant; if you use the RT class then on CFQ you get the current
> RT behaviour, if you use the RT class on AS you should get your new
> priority dispatch mechanism. We don't need a new API just because
> the implementations are different.
>

There is nothing "real-time" about the current RT class anyway. It is
basically these small *implementation* differences that define these
classes in the current scheme of things; precise definitions of them
would be very hard to find if one started looking around.

The current implementation of AS is basically a flat structure with
multiple priority levels. Initially I planned them as different levels
of the best-effort class, which is exactly what we are doing:
"best effort" from the scheduler/software point of view. So the
question is what to do with the other classes, for which we have no
significantly different behaviour: to keep things simple, you map them
onto the existing flat structure. And I mapped RT (all levels) to BE 0
and IDLE (all levels) to BE 7.

This leaves the RT and IDLE classes open for future implementations,
where one could use hardware priorities (maybe via NCQ) to implement
the RT class, or map other software mechanisms outside the scheduler
to the RT class.

Now, the initial feedback was: since this *implementation* is different
from anything we have in CFQ, which is our current *standard* way of
thinking and comparing (it being the only thing that exists), why not
make it a new class :). And then somehow map the others so that they
make some sense until we have something for those classes as well.

>> >> So, in some sense it kind of implements absolute priority and
>> >> is best used for jobs which are latency sensitive. Since the
>> >> priorities can be and are mapped internally in anticipatory
>> >> scheduler, BEST_EFFORT class is mapped one-one with the LATENCY
>> >> class.
>> >
>> > So you map the BE class to something with the same semantics as
>> > the RT class? What mapping do you do when an application uses
>> > the RT class?
>> >
>>
>> Yes I could have used RT class but it was used in CFQ to implement
>> it's time-sliced based highest priority class. If an application
>> uses RT class, AS maps all levels of RT class to BE class level 0
>> (i.e. to the highest priority available)
>
> Which means you are throwing away all the RT priority levels and
> so an application using the RT class would be subtly broken on AS....
>

As I said earlier, the organization of the AS levels is flat, so we
could use any class (RT, BE, LATENCY) and fold the remaining ones into
it. The other way, which you would probably prefer, is to increase the
number of levels and map the different classes so that none are folded.

>> >> A filesystem can use best-effort class using similar interface
>> >> as for cfq.
>> >
>> > The folk using the RT priority classes greatly objected to using
>> > the RT class for journal I/O precisely because it would then
>> > preempt their application's RT I/O and introduce unpredictable
>> > latencies.
>> >
>> > Journal I/O will typically use the highest priority BE class so
>> > that it is promoted above BE I/O but does not preempt RT I/O.
>> > With your mapping of BE classes to this new "absolute priority
>> > latency" class, this configuration will give journal I/O the
>> > highest priority in the scheduler. This will cause preemption of
>> > your latency sensitive I/O and so those latencies you are trying
>> > to avoid won't go away....
>> >
>>
>> I see your problem, we could make the LATENCY class different from
>> and above BE class (instead of one-one mapping).
>
> Like the RT class is currently defined to be? ;)
>

I agree with you, and we could use RT (though you and I know that it is
basically best effort). LATENCY was invented due to a previous
suggestion.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
>

2008-10-29 00:32:22

by Aaron Carroll

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

Naveen Gupta wrote:
> 2008/10/28 Dave Chinner <[email protected]>:
>> On Tue, Oct 28, 2008 at 03:48:44PM -0700, Naveen Gupta wrote:
>>> I/O from RT class in CFQ can still see a bubble with this new latency
>>> class. An easy way to check this would be to submit ios at multiple
>>> levels both in CFQ and AS and check max latency of the highest levels.
>>> I will let Jens or Satoshi comment on exact algorithm for RT class.
>> You're missing my point entirely.
>>
>> You're defining a new class that has the exact same meaning as
>> the current RT class definition, then mapping the BE class over
>> the top of that, hence changing what that means for everyone.
>>
>> The fact that the *implementation* of AS and CFQ is different is
>> irrelevant; if you use the RT class then on CFQ you get the current
>> RT behaviour, if you use the RT class on AS you should get your new
>> priority dispatch mechanism. We don't need a new API just because
>> the implementations are different.
>>
>
> There is nothing "real-time" about the current RT class anyways. It is

Yes, this is stupid. IMO the real-time class should be strict priorities
within the class, and round robin within the same priority level. As it
stands, RT seems to be just a second BE class.

> basically these small *implementation* differences that defines these
> classes in current scheme of things, precise definitions of which
> would be very hard to find if one started looking around.
>
> The current implementation of AS is basically a flat structure with
> multiple priority levels. Initially I planned them to be different
> levels of best-effort class, which is exactly what we are doing
> "best-effort" from the scheduler/software point of view. So, the
> question is what you do with other classes for which you don't have a
> significantly different behavior: to keep things simple you map them
> to existing flat structure. And, I mapped RT (all levels to BE 0),
> idle (all levels to BE 7).

Even compared to CFQ's broken RT handling, this is wrong, because now
any old BE 0 process is equal in priority to any RT process.

> This leaves these RT and IDLE classes open for future implementations,
> where one could use hardware priorities (may be in NCQ) to implement
> RT class or other improvisations in software other than schedulers to
> map to RT class.
>
> Now the initial feedback was since this *implementation* is different
> from anything we have in CFQ which is our current *standard* way of
> thinking and comparing (that is the only thing that exists) why not
> make them into a new class :). And somehow map others so that they
> make some sense till we get something for those classes as well.
>
>>>>> So, in some sense it kind of implements absolute priority and
>>>>> is best used for jobs which are latency sensitive. Since the
>>>>> priorities can be and are mapped internally in anticipatory
>>>>> scheduler, BEST_EFFORT class is mapped one-one with the LATENCY
>>>>> class.
>>>> So you map the BE class to something with the same semantics as
>>>> the RT class? What mapping do you do when an application uses
>>>> the RT class?
>>>>
>>> Yes I could have used RT class but it was used in CFQ to implement
>>> it's time-sliced based highest priority class. If an application
>>> uses RT class, AS maps all levels of RT class to BE class level 0
>>> (i.e. to the highest priority available)
>> Which means you are throwing away all the RT priority levels and
>> so an application using the RT class would be subtly broken on AS....
>>
>
> As I said earlier the organization of the AS levels is flat, so we
> could use any class (RT, BE, LATENCY) and fold the remaining ones. The
> other way which you would probably like is to increase number of
> levels and map different classes so that they are not folded.

As I said in my reply to the initial posting of this, I think there are
only two sensible ways of handling this:

1) Maintain the full number of I/O priorities (1 IDLE, 8 BE, 8 RT);
2) Collapse the levels and only deal with the classes;

Any other mapping seems arbitrary and likely to confuse.

>>>>> A filesystem can use best-effort class using similar interface
>>>>> as for cfq.
>>>> The folk using the RT priority classes greatly objected to using
>>>> the RT class for journal I/O precisely because it would then
>>>> preempt their application's RT I/O and introduce unpredictable
>>>> latencies.
>>>>
>>>> Journal I/O will typically use the highest priority BE class so
>>>> that it is promoted above BE I/O but does not preempt RT I/O.
>>>> With your mapping of BE classes to this new "absolute priority
>>>> latency" class, this configuration will give journal I/O the
>>>> highest priority in the scheduler. This will cause preemption of
>>>> your latency sensitive I/O and so those latencies you are trying
>>>> to avoid won't go away....
>>>>
>>> I see your problem, we could make the LATENCY class different from
>>> and above BE class (instead of one-one mapping).
>> Like the RT class is currently defined to be? ;)
>>
>
> I agree with you and we could use RT (though you and I know that
> basically it is best effort). LATENCY was invented due to a previous
> suggestion.

Maybe what you want to do is make RT really real-time, and then use this
latency class to differentiate latency-sensitive BE traffic from regular
BE traffic. Not necessarily "higher" priority, just a different kind of
best-effort. One way of implementing this in CFQ might be to issue
smaller but more frequent dispatches.


Also from the original posting, I think the weights are still broken
(especially in the context of RT) but I won't repeat that here.


-- Aaron

2008-10-29 01:17:22

by Naveen Gupta

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

2008/10/28 Aaron Carroll <[email protected]>:
> Naveen Gupta wrote:
>> 2008/10/28 Dave Chinner <[email protected]>:
>>> On Tue, Oct 28, 2008 at 03:48:44PM -0700, Naveen Gupta wrote:
>>>> I/O from RT class in CFQ can still see a bubble with this new latency
>>>> class. An easy way to check this would be to submit ios at multiple
>>>> levels both in CFQ and AS and check max latency of the highest levels.
>>>> I will let Jens or Satoshi comment on exact algorithm for RT class.
>>> You're missing my point entirely.
>>>
>>> You're defining a new class that has the exact same meaning as
>>> the current RT class definition, then mapping the BE class over
>>> the top of that, hence changing what that means for everyone.
>>>
>>> The fact that the *implementation* of AS and CFQ is different is
>>> irrelevant; if you use the RT class then on CFQ you get the current
>>> RT behaviour, if you use the RT class on AS you should get your new
>>> priority dispatch mechanism. We don't need a new API just because
>>> the implementations are different.
>>>
>>
>> There is nothing "real-time" about the current RT class anyways. It is
>
> Yes, this is stupid. IMO the real time class should be strict priorities
> within the class, and within the same priority level, round robin. As it
> stands, RT seems to be just like a second BE class.
>
>> basically these small *implementation* differences that defines these
>> classes in current scheme of things, precise definitions of which
>> would be very hard to find if one started looking around.
>>
>> The current implementation of AS is basically a flat structure with
>> multiple priority levels. Initially I planned them to be different
>> levels of best-effort class, which is exactly what we are doing
>> "best-effort" from the scheduler/software point of view. So, the
>> question is what you do with other classes for which you don't have a
>> significantly different behavior: to keep things simple you map them
>> to existing flat structure. And, I mapped RT (all levels to BE 0),
>> idle (all levels to BE 7).
>
> Even compared to CFQs broken RT handling, this is wrong, because now
> any old BE0 process is equal in priority to any RT process.
>
>> This leaves these RT and IDLE classes open for future implementations,
>> where one could use hardware priorities (may be in NCQ) to implement
>> RT class or other improvisations in software other than schedulers to
>> map to RT class.
>>
>> Now the initial feedback was since this *implementation* is different
>> from anything we have in CFQ which is our current *standard* way of
>> thinking and comparing (that is the only thing that exists) why not
>> make them into a new class :). And somehow map others so that they
>> make some sense till we get something for those classes as well.
>>
>>>>>> So, in some sense it kind of implements absolute priority and
>>>>>> is best used for jobs which are latency sensitive. Since the
>>>>>> priorities can be and are mapped internally in anticipatory
>>>>>> scheduler, BEST_EFFORT class is mapped one-one with the LATENCY
>>>>>> class.
>>>>> So you map the BE class to something with the same semantics as
>>>>> the RT class? What mapping do you do when an application uses
>>>>> the RT class?
>>>>>
>>>> Yes I could have used RT class but it was used in CFQ to implement
>>>> it's time-sliced based highest priority class. If an application
>>>> uses RT class, AS maps all levels of RT class to BE class level 0
>>>> (i.e. to the highest priority available)
>>> Which means you are throwing away all the RT priority levels and
>>> so an application using the RT class would be subtly broken on AS....
>>>
>>
>> As I said earlier the organization of the AS levels is flat, so we
>> could use any class (RT, BE, LATENCY) and fold the remaining ones. The
>> other way which you would probably like is to increase number of
>> levels and map different classes so that they are not folded.
>
> As I said in my reply to the initial posting of this, I think there are
> only two sensible ways of handling this:
>
> 1) Maintain the full number of I/O priorities (1 IDLE, 8 BE, 8 RT);

But then we are assuming that we provide a different quality of
service according to class.

> 2) Collapse the levels and only deal with the classes;

I am not sure that is meaningful. When all we have is different
levels of BE, it doesn't make sense to call them different classes.
>
> Any other mapping seems arbitrary and likely to confuse.
>
>>>>>> A filesystem can use best-effort class using similar interface
>>>>>> as for cfq.
>>>>> The folk using the RT priority classes greatly objected to using
>>>>> the RT class for journal I/O precisely because it would then
>>>>> preempt their application's RT I/O and introduce unpredictable
>>>>> latencies.
>>>>>
>>>>> Journal I/O will typically use the highest priority BE class so
>>>>> that it is promoted above BE I/O but does not preempt RT I/O.
>>>>> With your mapping of BE classes to this new "absolute priority
>>>>> latency" class, this configuration will give journal I/O the
>>>>> highest priority in the scheduler. This will cause preemption of
>>>>> your latency sensitive I/O and so those latencies you are trying
>>>>> to avoid won't go away....
>>>>>
>>>> I see your problem, we could make the LATENCY class different from
>>>> and above BE class (instead of one-one mapping).
>>> Like the RT class is currently defined to be? ;)
>>>
>>
>> I agree with you and we could use RT (though you and I know that
>> basically it is best effort). LATENCY was invented due to a previous
>> suggestion.
>
> Maybe what you want to do is make RT really real-time, and then use this
> latency class to differentiate latency-sensitive BE traffic from regular
> BE traffic. Not necessarily ``higher'' priority, just a different kind of
> best-effort. One way of implementing this in CFQ might be to have smaller
> but more frequent dispatches.
>
>
> Also from the original posting, I think the weights are still broken
> (especially in the context of RT) but I won't repeat that here.

Sorry, I don't have the context for that. I can look at them later.

>
>
> -- Aaron
>
>

2008-10-29 02:05:58

by Aaron Carroll

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

Naveen Gupta wrote:
> 2008/10/28 Aaron Carroll <[email protected]>:
>> Naveen Gupta wrote:
>>> As I said earlier the organization of the AS levels is flat, so we
>>> could use any class (RT, BE, LATENCY) and fold the remaining ones. The
>>> other way which you would probably like is to increase number of
>>> levels and map different classes so that they are not folded.
>> As I said in my reply to the initial posting of this, I think there are
>> only two sensible ways of handling this:
>>
>> 1) Maintain the full number of I/O priorities (1 IDLE, 8 BE, 8 RT);
>
> But then we are assuming that we are providing different quality of
> service according to classes.

Right. The ideal solution is a scheduler-independent definition of
RT (Jens?) which you can apply here. However, it seems to me that you
want to basically ignore RT and IDLE. If you're going to do that, at
least implement sane alternate behaviour.

This solution applies the principle of least surprise: RT requests
always have higher priority than BE requests, and within a class,
higher level means higher priority. In your implementation, BE 0 == RT x
and IDLE == BE 7. This is surprising behaviour.

>> 2) Collapse the levels and only deal with the classes;
>
> I am not sure if this is meaningful. When all we have is different
> levels of BE, it wouldn't make sense to call them different classes.

It's not meaningful as it stands. The difference here is that you
at least maintain the ordering of the classes with respect to priority.


-- Aaron

2008-10-29 04:06:26

by Dave Chinner

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

On Tue, Oct 28, 2008 at 05:04:53PM -0700, Naveen Gupta wrote:
> 2008/10/28 Dave Chinner <[email protected]>:
> > On Tue, Oct 28, 2008 at 03:48:44PM -0700, Naveen Gupta wrote:
> >> 2008/10/28 Dave Chinner <[email protected]>:
> >> > On Tue, Oct 28, 2008 at 10:14:20AM -0700, Naveen Gupta wrote:
> >> >> The anticipatory scheduler chooses it's next i/o to be of highest
> >> >> available priority level.
> >> >
> >> > That sounds exactly like what the current RT class is supposed to
> >> > be used for - defining the absolute priority of dispatch. How
> >> > is this latency class different to the current RT class semantics
> >> > that are defined for CFQ?
> >> >
> >>
> >> I/O from RT class in CFQ can still see a bubble with this new latency
> >> class. An easy way to check this would be to submit ios at multiple
> >> levels both in CFQ and AS and check max latency of the highest levels.
> >> I will let Jens or Satoshi comment on exact algorithm for RT class.
> >
> > You're missing my point entirely.
> >
> > You're defining a new class that has the exact same meaning as
> > the current RT class definition, then mapping the BE class over
> > the top of that, hence changing what that means for everyone.
> >
> > The fact that the *implementation* of AS and CFQ is different is
> > irrelevant; if you use the RT class then on CFQ you get the current
> > RT behaviour, if you use the RT class on AS you should get your new
> > priority dispatch mechanism. We don't need a new API just because
> > the implementations are different.
> >
>
> There is nothing "real-time" about the current RT class anyways.

That's an implementation problem, not an API definition problem.

> It is
> basically these small *implementation* differences that defines these
> classes in current scheme of things, precise definitions of which
> would be very hard to find if one started looking around.

Please disconnect what you think about the implementation and ask
yourself what makes sense as an API if you were actually trying to use
this stuff.

I want to be able to use this stuff to optimise filesystem I/O,
but if the priority class I need to use is dependent on the elevator the
*user selects* and can change dynamically, then I simply cannot
make that optimisation.

> Now the initial feedback was since this *implementation* is different
> from anything we have in CFQ which is our current *standard* way of
> thinking and comparing (that is the only thing that exists) why not
> make them into a new class :).

Because it makes it impossible to optimise application code, as the
class that needs to be used is entirely dependent on the
configuration of the machine that it is running on. Application
writers are not going to probe the I/O scheduler the block device
is using to determine if they should be using RT or LATENCY class
prioritisation. From a user POV they do *exactly the same thing*,
so they should use the same behavioural classes defined by the API.

> >> I see your problem, we could make the LATENCY class different from
> >> and above BE class (instead of one-one mapping).
> >
> > Like the RT class is currently defined to be? ;)
>
> I agree with you and we could use RT (though you and I know that
> basically it is best effort). LATENCY was invented due to a previous
> suggestion.

As someone who is actually trying to use this stuff, I'm saying that
the LATENCY suggestion was a *bad idea* because of the complexity it
introduces when trying to optimise performance by applying I/O
priorities to different I/O types. I want *one* API that is
implemented by all schedulers, not an API per scheduler.....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2008-10-29 08:50:08

by Naveen Gupta

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

2008/10/28 Dave Chinner <[email protected]>:
> On Tue, Oct 28, 2008 at 05:04:53PM -0700, Naveen Gupta wrote:
>> 2008/10/28 Dave Chinner <[email protected]>:
>> > On Tue, Oct 28, 2008 at 03:48:44PM -0700, Naveen Gupta wrote:
>> >> 2008/10/28 Dave Chinner <[email protected]>:
>> >> > On Tue, Oct 28, 2008 at 10:14:20AM -0700, Naveen Gupta wrote:
>> >> >> The anticipatory scheduler chooses its next I/O to be of the highest
>> >> >> available priority level.
>> >> >
>> >> > That sounds exactly like what the current RT class is supposed to
>> >> > be used for - defining the absolute priority of dispatch. How
>> >> > is this latency class different to the current RT class semantics
>> >> > that are defined for CFQ?
>> >> >
>> >>
>> >> I/O from RT class in CFQ can still see a bubble with this new latency
>> >> class. An easy way to check this would be to submit ios at multiple
>> >> levels both in CFQ and AS and check max latency of the highest levels.
>> >> I will let Jens or Satoshi comment on exact algorithm for RT class.
>> >
>> > You're missing my point entirely.
>> >
>> > You're defining a new class that has the exact same meaning as
>> > the current RT class definition, then mapping the BE class over
>> > the top of that, hence changing what that means for everyone.
>> >
>> > The fact that the *implementation* of AS and CFQ is different is
>> > irrelevant; if you use the RT class then on CFQ you get the current
>> > RT behaviour, if you use the RT class on AS you should get your new
>> > priority dispatch mechanism. We don't need a new API just because
>> > the implementations are different.
>> >
>>
>> There is nothing "real-time" about the current RT class anyway.
>
> That's an implementation problem, not an API definition problem.
>
>> It is
>> basically these small *implementation* differences that define these
>> classes in the current scheme of things, precise definitions of which
>> would be very hard to find if one started looking around.
>
> Please, disconnect what you think about implementation and ask
> yourself what makes sense from an API if you were trying to use this
> stuff.
>
> I want to be able to use this stuff to optimise filesystem I/O,
> but if the priority class I need to use is dependent on the elevator the
> *user selects* and can change dynamically, then I simply cannot
> make that optimisation.
>
>> Now, the initial feedback was: since this *implementation* is different
>> from anything we have in CFQ, which is our current *standard* way of
>> thinking and comparing (that being the only thing that exists), why not
>> make them into a new class :).
>
> Because it makes it impossible to optimise application code, as the
> class that needs to be used is entirely dependent on the
> configuration of the machine that it is running on. Application
> writers are not going to probe the I/O scheduler the block device
> is using to determine if they should be using RT or LATENCY class
> prioritisation. From a user POV they do *exactly the same thing*,
> so they should use the same behavioural classes defined by the API.

I agree with you that we need an API which is valid across schedulers.
But one has to agree that this sort of thing has its own limitations.
We are assuming that every scheduler which implements any kind of
priority has a valid implementation of the RT, BE and Idle classes,
which in this case we don't have. What happens tomorrow once we have a
scheduler which decides that it needs to divide bandwidth? Which class
would one map it to?

As I understand it, what you are asking for is: filesystem I/O can use
BE 0 across all schedulers for journal updates. And you still have the
RT levels to take care of any higher-priority I/O which need not wait
for journal updates.

Here is what we can do:
1. Add 17 levels: the top 8 are RT, the next 8 are BE, and the last one
is idle. Though we know they are all similar in implementation, it's
just that RT > BE > idle in importance. And if the LATENCY camp is
still active, add another class LATENCY which in the context of AS is
the same as RT. So you get to keep RT > BE, and they get LATENCY.

2. Add 10 levels instead of the current 8: the top level maps all 8 RT
levels, the next 8 are BE, and the last one maps to idle. This also
gives you access to BE 0, while all RT levels are higher priority than
BE. It discourages people from using different RT levels unless we find
a new meaning for them in the context of AS.


>
>> >> I see your problem, we could make the LATENCY class different from
>> >> and above BE class (instead of one-one mapping).
>> >
>> > Like the RT class is currently defined to be? ;)
>>
>> I agree with you and we could use RT (though you and I know that
>> basically it is best effort). LATENCY was invented due to a previous
>> suggestion.
>
> As someone who is actually trying to use this stuff, I'm saying that
> the LATENCY suggestion was a *bad idea* because of the complexity it
> introduces when trying to optimise performance by applying I/O
> priorities to different I/O types. I want *one* API that is
> implemented by all schedulers, not an API per scheduler.....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
>

2008-10-29 08:53:37

by Naveen Gupta

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

2008/10/28 Aaron Carroll <[email protected]>:
> Naveen Gupta wrote:
>> 2008/10/28 Aaron Carroll <[email protected]>:
>>> Naveen Gupta wrote:
>>>> As I said earlier the organization of the AS levels is flat, so we
>>>> could use any class (RT, BE, LATENCY) and fold the remaining ones. The
>>>> other way which you would probably like is to increase number of
>>>> levels and map different classes so that they are not folded.
>>> As I said in my reply to the initial posting of this, I think there are
>>> only two sensible ways of handling this:
>>>
>>> 1) Maintain the full number of I/O priorities (1 IDLE, 8 BE, 8 RT);
>>
>> But then we are assuming that we are providing different quality of
>> service according to classes.
>
> Right. The ideal solution is a scheduler-independent definition of
> RT (Jens?) which you can apply here. However, it seems to me that you
> want to basically ignore RT and IDLE. If you're going to do that, at
> least implement sane alternate behaviour.
>
> This solution applies the principle of least surprise: RT requests
> always have higher priority than BE requests, and within a class, a
> higher level means higher priority. In your implementation, BE 0 == RT x
> and IDLE == BE 7. This is surprising behaviour.

Aaron, I took care of these in reply to Dave's email.

>
>>> 2) Collapse the levels and only deal with the classes;
>>
>> I am not sure if this is meaningful. When all we have is different
>> levels of BE, it wouldn't make sense to call them different classes.
>
> It's not meaningful as it stands. The difference here is that you
> at least maintain the ordering of the classes with respect to priority.

I am not sure that giving one level to each class would be an
acceptable solution.

>
>
> -- Aaron
>
>
>

2008-10-29 21:33:44

by Dave Chinner

Subject: Re: [PATCH] Priorities in Anticipatory I/O scheduler

On Wed, Oct 29, 2008 at 01:49:49AM -0700, Naveen Gupta wrote:
> 2008/10/28 Dave Chinner <[email protected]>:
> >> Now, the initial feedback was: since this *implementation* is different
> >> from anything we have in CFQ, which is our current *standard* way of
> >> thinking and comparing (that being the only thing that exists), why not
> >> make them into a new class :).
> >
> > Because it makes it impossible to optimise application code, as the
> > class that needs to be used is entirely dependent on the
> > configuration of the machine that it is running on. Application
> > writers are not going to probe the I/O scheduler the block device
> > is using to determine if they should be using RT or LATENCY class
> > prioritisation. From a user POV they do *exactly the same thing*,
> > so they should use the same behavioural classes defined by the API.
>
> I agree with you that we need an API which is valid across schedulers.
> But one has to agree that this sort of thing has its own limitations.
> We are assuming that every scheduler which implements any kind of
> priority has a valid implementation of the RT, BE and Idle classes,
> which in this case we don't have. What happens tomorrow once we have a
> scheduler which decides that it needs to divide bandwidth? Which class
> would one map it to?

Throttling does not belong in the elevator. It can be successfully
done generically above the elevator in DM. See the dm-ioband
patches, for example. The elevator is for optimising scheduling of
issued I/O, not controlling every aspect of the I/O path.

> As I understand it, what you are asking for is: filesystem I/O can use
> BE 0 across all schedulers for journal updates. And you still have the
> RT levels to take care of any higher-priority I/O which need not wait
> for journal updates.

No, I wanted to use the very highest priority available for the
journal updates. The folks using the real-time priority class didn't
like that, and suggested that the highest BE priority would be
better so journal I/O didn't preempt their RT data I/O. So what I'm
saying is based on feedback from people actually using the RT class for
their RT applications...

This is what I've been trying to tell you, and I have so far been
unsuccessful at getting through to you: there are people using
this API because it's exposed to userspace, so we can't just change it
whenever someone feels like it.

> Here is what we can do:
> 1. Add 17 levels: the top 8 are RT, the next 8 are BE, and the last one
> is idle. Though we know they are all similar in implementation, it's
> just that RT > BE > idle in importance.

Yes, just like CPU scheduling. We had a RT class there long before
we could really do RT scheduling. Also, nobody suggested introducing
a new "latency class" to the CPU scheduler to fix problems with the
RT scheduling - they fixed the scheduler instead and the API did not
change. We should be following the exact same model for I/O
scheduling priorities.

> And if the LATENCY camp is still active, add another
> class LATENCY which in the context of AS is the same as RT. So you get
> to keep RT > BE, and they get LATENCY.

Just drop the whole "latency" idea altogether - it's just
another way of saying "use an rt-like priority mechanism", which
we already have a class for.

> 2. Add 10 levels instead of the current 8: the top level maps all 8 RT
> levels, the next 8 are BE, and the last one maps to idle. This also
> gives you access to BE 0, while all RT levels are higher priority than
> BE. It discourages people from using different RT levels unless we find
> a new meaning for them in the context of AS.

That doesn't seem like a very good idea to me - RT is there, people are
using it, so not supporting it means that the people who really care
about I/O latency will continue to avoid using the AS scheduler...

Cheers,

Dave.
--
Dave Chinner
[email protected]