2009-06-06 15:43:23

by Corrado Zoccolo

Subject: Re: ionice priority "none: prio 0" v. "none: prio 1" v. best-effort v. idle?

Hi Linda,

On Fri, Jun 5, 2009 at 11:52 PM, Linda Walsh<[email protected]> wrote:
> <-- vim: se sts=4 sw=4 ts=8 nosi sc ai: /-->
> Thanks for the pointer to the exact C-file for the cfq scheduler, but
> if you intended the 'line' tag to mean anything, line#1579
> would be 38 lines beyond the end of the 1541 line file.

Hmm, which version are you looking at?
cfq-iosched.c is 2676 lines long in my 2.6.30-rc8, and I'm pointing to
function cfq_init_prio_data.
It may be that in your older version, this code is not yet present, or
has a different form.

What I see here:

static void cfq_init_prio_data(struct cfq_queue *cfqq, struct io_context *ioc)
{
    struct task_struct *tsk = current;
    int ioprio_class;

    if (!cfq_cfqq_prio_changed(cfqq))
        return;

    ioprio_class = IOPRIO_PRIO_CLASS(ioc->ioprio);
    switch (ioprio_class) {
    default:
        printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
    case IOPRIO_CLASS_NONE:
        /*
         * no prio set, inherit CPU scheduling settings
         */
        cfqq->ioprio = task_nice_ioprio(tsk);
        cfqq->ioprio_class = task_nice_ioclass(tsk);
        break;
    case IOPRIO_CLASS_RT:
        cfqq->ioprio = task_ioprio(ioc);
        cfqq->ioprio_class = IOPRIO_CLASS_RT;
        break;
    case IOPRIO_CLASS_BE:
        cfqq->ioprio = task_ioprio(ioc);
        cfqq->ioprio_class = IOPRIO_CLASS_BE;
        break;
    case IOPRIO_CLASS_IDLE:
        cfqq->ioprio_class = IOPRIO_CLASS_IDLE;
        cfqq->ioprio = 7;
        cfq_clear_cfqq_idle_window(cfqq);
        break;
    }
...

The case IOPRIO_CLASS_NONE is the interesting one, since it uses
task_nice_* functions (defined in include/linux/ioprio.h) to inherit
the CPU scheduler priorities.
The rest of the code can assume that IOPRIO_CLASS_NONE will not
appear, since it was already translated to meaningful values.
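
To make that translation concrete, here is a small userspace sketch of
what I recall the task_nice_* helpers doing in this kernel version. Take
the formula and the policy mapping as my assumption, and check
include/linux/ioprio.h in your own tree. The names nice_to_ioprio,
policy_to_ioclass and enum cpu_policy are made up for illustration and do
not exist in the kernel.

```c
#include <assert.h>

/* I/O priority classes, in the same order as include/linux/ioprio.h. */
enum {
    IOPRIO_CLASS_NONE,
    IOPRIO_CLASS_RT,
    IOPRIO_CLASS_BE,
    IOPRIO_CLASS_IDLE,
};

/* Hypothetical stand-ins for the CPU scheduling policies of a task. */
enum cpu_policy { POLICY_NORMAL, POLICY_FIFO, POLICY_RR, POLICY_IDLE };

/*
 * Assumed mapping: spread the 40 nice levels (-20..19) evenly over the
 * 8 I/O priority levels (0..7), lower number meaning higher priority.
 */
static int nice_to_ioprio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return (nice + 20) / 5;
}

/*
 * Assumed mapping: the I/O class follows the CPU scheduling policy, and
 * anything that is neither realtime nor idle ends up best-effort.
 */
static int policy_to_ioclass(enum cpu_policy policy)
{
    switch (policy) {
    case POLICY_IDLE:
        return IOPRIO_CLASS_IDLE;
    case POLICY_FIFO:
    case POLICY_RR:
        return IOPRIO_CLASS_RT;
    default:
        return IOPRIO_CLASS_BE;
    }
}
```

Under these assumptions, an ordinary task at nice 0 that never set an I/O
priority is treated by cfq as best-effort with priority 4, and renicing
it moves the I/O priority along with the CPU priority.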

<snip>
> This would seem to indicate a fundamental error in cfq's io
> scheduling.
>

As I read the code in 2.6.30, cfq implements this correctly, but
differently from what the man page states (the man page, in turn,
matches what you are seeing in the earlier version of the code).

> If the above is not the case -- this is the BEST example of why
> I would like "ionice" to return the actual dynamic "io-priority"
> of a process -- IF, it is set by CPU priority, AND would like it
> to be clear where the "cpu-governed priority" class
> (currently labeled 'none', but ideally would be renamed something
> like 'follow-cpu'?) is in relation to the other named classes
> (idle,be,rt).
>
It would be nice, but almost impossible to do. The fact is that the
ioprio acquires a meaning only in combination with an io-scheduler.
Different io-schedulers could, in theory, interpret the none class in
different ways (it is not even mentioned in the man page). And a process
that is doing I/O on 2 disks could be talking with 2 different
io-schedulers at the same time. 'ionice' cannot therefore give a
single dynamic priority, and it is much easier to have it just display
the values of the fields, instead of their interpretation.
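
Displaying the fields really is just bit unpacking, with no scheduler
involved. A minimal sketch, assuming the raw value has already been
fetched with ioprio_get(2) (not shown here) and that the class sits above
bit 13 as in include/linux/ioprio.h. The helpers class_name and
describe_ioprio are my own names, not part of ionice or the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Same layout as the kernel header: class in the top bits, data below. */
#define IOPRIO_CLASS_SHIFT      13
#define IOPRIO_PRIO_CLASS(mask) ((mask) >> IOPRIO_CLASS_SHIFT)
#define IOPRIO_PRIO_DATA(mask)  ((mask) & ((1 << IOPRIO_CLASS_SHIFT) - 1))

/* Map a class number to a label; anything out of range is "unknown". */
static const char *class_name(int ioprio_class)
{
    static const char *names[] = { "none", "realtime", "best-effort", "idle" };

    if (ioprio_class < 0 || ioprio_class > 3)
        return "unknown";
    return names[ioprio_class];
}

/*
 * Write "<class>: prio <n>" into buf. This only reports the stored
 * fields; it makes no attempt to say what a scheduler will do with them.
 */
static void describe_ioprio(int raw, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s: prio %d",
             class_name(IOPRIO_PRIO_CLASS(raw)), IOPRIO_PRIO_DATA(raw));
}
```

A task that never set an I/O priority has a raw value of 0, which this
decodes as "none: prio 0". What that means in practice is decided later,
per device, by whichever io-scheduler serves the request.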

Corrado

>
> So just how confused am I, or,
>
> Is there a problem with the code (as it appears in this module)?
>
> Tnx,
> Linda
>

--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:[email protected]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda