2004-06-10 05:36:13

by Peter Williams

Subject: [PATCH][2.6.7-rc3] Single Priority Array CPU Scheduler

Peter Williams wrote:
> The single priority array scheduler (SPA) is a patch that simplifies
> the O(1) scheduler while maintaining its good scalability and
> interactive response characteristics. The patch comes as four sub
> patches to simplify perusal of the changes:

An updated version of this scheduler is now available for 2.6.7-rc3 at:

<http://users.bigpond.net.au/Peter-Williams/patch-2_6_7_rc3-SPA-v0.1>
<http://users.bigpond.net.au/Peter-Williams/patch-2_6_7_rc3-SPA_IAB-v0.1>
<http://users.bigpond.net.au/Peter-Williams/patch-2_6_7_rc3-SPA_TPB-v0.1>
<http://users.bigpond.net.au/Peter-Williams/patch-2_6_7_rc3-SPA_TSTATS-v0.1>

These patches should be applied in the order that they are listed.

Also, as promised, the first of these patches has been unified with the
staircase scheduler and a patch that implements the staircase scheduler
on top of the first of the above patches is available at:

<http://users.bigpond.net.au/Peter-Williams/patch-2_6_7_rc3-SPA_STAIRCASE-v0.1>

This scheduler is functionally equivalent to Con Kolivas's v6.4
scheduler except that a promotion feature (one that will trigger very
infrequently, probably never :-)) has been added. This was added
because, although it is extremely unlikely, starvation is possible with
the staircase scheduler and this feature removes that possibility.

Peter
--
Peter Williams [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce


2004-06-11 13:40:17

by Shane Shrybman

Subject: Re: [PATCH][2.6.7-rc3] Single Priority Array CPU Scheduler

Hi Peter,

I just started to try out your SPA scheduler patch and found that it is
noticeably sluggish when resizing a mozilla window on the desktop. I
have profiles of 2.6.7-rc3-spa and 2.6.7-rc2-mm2 and have put them up at:
http://zeke.yi.org/linux/spa/ . There is also vmstat output there but it
doesn't look too helpful to me.

The test was basic and went like this:

x86, K7, UP, gnome desktop with mozilla (with a bunch of tabs) and a few
rxvts. cmdline= elevator=cfq profile=1

readprofile -r

grab a corner of my mozilla window and continually move it around for
several seconds

readprofile -v -m /boot/System.map-2.6.7-rc3|sort -rn +2|head -n30

do the same while dumping vmstat 1 to a file.

The kernel with your patch had a much harder time keeping up with the
window resizing. Moving the entire window did not seem too bad; any lag
there was barely noticeable. I tried a similar test while running a
kernel compile (make -j3) and it made the window resizing _really_ slow
to respond.

Regards,

Shane





2004-06-12 00:10:16

by Peter Williams

Subject: Re: [PATCH][2.6.7-rc3] Single Priority Array CPU Scheduler

Shane Shrybman wrote:
> Hi Peter,
>
> I just started to try out your SPA scheduler patch and found that it is
> noticeably sluggish when resizing a mozilla window on the desktop. I
> have a profile of 2.6.7-rc3-spa and 2.6.7-rc2-mm2 and put them up at:
> http://zeke.yi.org/linux/spa/ . There is also vmstat output there but it
> doesn't look too helpful to me.
>
> The test was basic and went like this:
>
> x86, K7, UP, gnome desktop with mozilla (with a bunch of tabs) and a few
> rxvts. cmdline= elevator=cfq profile=1
>
> readprofile -r
>
> grab a corner of my mozilla window and continually move it around for
> several seconds
>
> readprofile -v -m /boot/System.map-2.6.7-rc3|sort -rn +2|head -n30
>
> do the same while dumping vmstat 1 to a file.
>
> The kernel with your patch had a much harder time keeping up with the
> window resizing. Moving the entire window did not seem too bad or not
> too noticeable. I tried a similar test while running a kernel compile
> (make -j3) and it made the window resizing _really_ slow to respond.

Thanks for the feedback, I'll add your test to those I perform myself.

Some of the control variables for this scheduler have rather arbitrary
values at the moment and are likely to be suboptimal. I'm in the
process of making some of these variables modifiable at run time via
/proc/sys to enable experimentation with different settings. Hopefully,
this will make it possible to establish settings that improve
interactive performance.

Once again, thanks for the feedback
Peter
--
Peter Williams [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

2004-06-12 14:50:56

by Shane Shrybman

Subject: Re: [PATCH][2.6.7-rc3] Single Priority Array CPU Scheduler

On Fri, 2004-06-11 at 20:10, Peter Williams wrote:
> Shane Shrybman wrote:
> > Hi Peter,
> >
> > I just started to try out your SPA scheduler patch and found that it is
> > noticeably sluggish when resizing a mozilla window on the desktop. I
> > have a profile of 2.6.7-rc3-spa and 2.6.7-rc2-mm2 and put them up at:
> > http://zeke.yi.org/linux/spa/ . There is also vmstat output there but it
> > doesn't look too helpful to me.
> >
[..snip..]
>
> Thanks for the feedback, I'll add your test to those I perform myself.
>

Sure, no problem. Hope it helps.

> Some of the control variables for this scheduler have rather arbitrary
> values at the moment and are likely to be non optimal. I'm in the
> process of making some of these variables modifiable at run time via
> /proc/sys to enable experimentation with different settings. Hopefully,
> this will enable settings that improve interactive performance to be
> established.
>
> Once again, thanks for the feedback
> Peter

Regards,

Shane

2004-06-13 01:15:42

by Peter Williams

Subject: Re: [PATCH][2.6.7-rc3] Single Priority Array CPU Scheduler

Shane Shrybman wrote:
> On Fri, 2004-06-11 at 20:10, Peter Williams wrote:
>
>>Shane Shrybman wrote:
>>
>>>Hi Peter,
>>>
>>>I just started to try out your SPA scheduler patch and found that it is
>>>noticeably sluggish when resizing a mozilla window on the desktop. I
>>>have a profile of 2.6.7-rc3-spa and 2.6.7-rc2-mm2 and put them up at:
>>>http://zeke.yi.org/linux/spa/ . There is also vmstat output there but it
>>>doesn't look too helpful to me.
>>>
>
> [..snip..]
>
>>Thanks for the feedback, I'll add your test to those I perform myself.
>>
>
>
> Sure, no problem. Hope it helps.
>
>
>>Some of the control variables for this scheduler have rather arbitrary
>>values at the moment and are likely to be non optimal. I'm in the
>>process of making some of these variables modifiable at run time via
>>/proc/sys to enable experimentation with different settings. Hopefully,
>>this will enable settings that improve interactive performance to be
>>established.

A patch which adds control of the SPA scheduler via /proc/sys/kernel is
available at:

<http://users.bigpond.net.au/Peter-Williams/patch-2_6_7_rc3-SPA_CNTL-v0.1>

This adds the directory /proc/sys/kernel/cpusched if SYSCTL is
configured into the kernel. This directory contains 6 files that allow
read/write access (write is root only) to 6 unsigned integer parameters
that control the scheduler as follows:

time_slice -- (in msecs) controls the amount of CPU time that CPU-bound
tasks get before they are booted off the CPU, have their bonuses
reassessed, are put back on the run queue and rescheduled. This has a
default value of 100 (which is the average time slice dealt out by the
original O(1) scheduler), a maximum allowable value of 1000 (i.e. 1
second) and a minimum of 1. Making this smaller MAY help interactive
response but COULD increase the context switching rate. Making this
larger MAY improve overall system throughput and reduce the context
switching rate but MAY make interactive responsiveness worse.

base_promotion_interval -- (in msecs) controls the rate at which the
promotion mechanism promotes runnable tasks. This has a default value
of 55, a maximum of UINT_MAX and a minimum of 1. The primary effect
of changing this parameter is to alter the "severity" of "nice" values
(i.e. how much more CPU access a lower "nice" will give a task). The
default value is such that if there are two CPU-bound tasks (hard
spinners) running (and time_slice has its default value of 100), and one
of them has a "nice" value one less than the other, then that task will
get one and a half times as much CPU access as the other.

max_ai_bonus -- determines the maximum interactive bonus that a task
can receive. This has a default value of 10 and a maximum value of 10.
Setting this value to zero effectively turns off the interactive bonus.

max_tpt_bonus -- determines the maximum throughput bonus that a task
can receive. This has a default value of 5 and a maximum value of 9 (to
keep the maximum number of priority slots <= 160 (or 5 * 32)). Setting
this value to zero effectively turns off the throughput bonus.

ia_threshold -- is the sleepiness (i.e. A_s / (A_s + A_c) where A_s is
the average sleep duration per scheduling cycle and A_c is the average
CPU consumption per scheduling cycle) threshold (in thousandths) above
which the task will be considered interactive and have its interactive
bonus increased asymptotically towards the maximum. This is assessed at
the end of each scheduling cycle and is only used to determine whether
an increase in the bonus is warranted. This has a default value of 900
and a minimum and maximum of 0 and 1000 respectively.

cpu_hog_threshold -- is the idleness (i.e. (A_s + A_d) / (A_s + A_d +
A_c), where A_c and A_s are as before and A_d is the average time spent
on the run queue per cycle waiting for CPU access) threshold (in
thousandths) below which a task will be considered a CPU hog and start
to lose interactive bonus points (if it has any). This is assessed at
the end of each scheduling cycle and at the end of each time slice (if
the task uses an entire slice) and is only used to decrease the task's
interactive bonus. This has a default value of 500 and a minimum and
maximum of 0 and 1000 respectively.
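
As a rough illustration (a stand-alone sketch, not code from the patch),
the two ratios, and the usage rate mentioned in the PS below, can be
computed in thousandths from hypothetical A_s, A_c and A_d averages like
this:

#include <stdio.h>

/* Illustration only: sleepiness, idleness and usage rate as defined in
 * the parameter descriptions above, expressed in thousandths.  A_s, A_c
 * and A_d are the per scheduling cycle averages of sleep time, CPU time
 * consumed and time spent waiting on the run queue. */
static unsigned int sleepiness(unsigned long as, unsigned long ac)
{
	return (unsigned int)((as * 1000UL) / (as + ac));
}

static unsigned int idleness(unsigned long as, unsigned long ac, unsigned long ad)
{
	return (unsigned int)(((as + ad) * 1000UL) / (as + ad + ac));
}

static unsigned int usage_rate(unsigned long as, unsigned long ac, unsigned long ad)
{
	return (unsigned int)((ac * 1000UL) / (as + ac + ad));
}

int main(void)
{
	unsigned long as = 40, ac = 55, ad = 5;	/* made-up averages */
	unsigned int ia_threshold = 900, cpu_hog_threshold = 500;

	printf("sleepiness %u idleness %u usage %u\n",
	       sleepiness(as, ac), idleness(as, ac, ad), usage_rate(as, ac, ad));
	if (sleepiness(as, ac) > ia_threshold)
		printf("interactive: bonus may be increased\n");
	if (idleness(as, ac, ad) < cpu_hog_threshold)
		printf("CPU hog: interactive bonus may be decreased\n");
	return 0;
}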

I suspect that this last parameter is the cause of the phenomenon that
you reported, as the activity you described probably caused the X
server's idleness to drop below 500, causing it to lose some of its
interactive bonus. Could you apply this patch and try decreasing the
value of this parameter to see if your problem goes away? A clue as to
how small to make it can be obtained by observing the CPU usage rate
reported by top for the X server while you are stirring it up. If the
reported usage is y% then a threshold slightly smaller than
(1000 - y * 10) should work.
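
For example (using the rule of thumb above, and assuming the sysctl files
behave like ordinary /proc/sys entries): if top shows the X server at
about 50% while you stir it up, 1000 - 50 * 10 = 500, so a value a little
below that (say 450) would be worth trying first. A trivial way to set it
from C (equivalent to writing the value into the file as root by any
other means):

#include <stdio.h>

int main(void)
{
	/* 1000 - 50 * 10 = 500, so start a little lower. */
	unsigned int threshold = 450;
	FILE *f = fopen("/proc/sys/kernel/cpusched/cpu_hog_threshold", "w");

	if (!f) {
		perror("cpu_hog_threshold");
		return 1;
	}
	fprintf(f, "%u\n", threshold);
	fclose(f);
	return 0;
}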

Thanks
Peter
PS I will probably change the semantics of cpu_hog_threshold in the next
version to use the CPU usage rate (the opposite of idleness) as the
measure and reverse the test (i.e. if the usage rate is greater than the
threshold the task loses some of its interactive bonus). Its value
(A_c / (A_c + A_s + A_d); NB this is different to the complement of
sleepiness) is slightly cheaper to calculate and I also think it's a more
natural way of thinking about the issue.
--
Peter Williams [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

2004-06-13 17:46:48

by Shane Shrybman

Subject: Re: [PATCH][2.6.7-rc3] Single Priority Array CPU Scheduler

On Sat, 2004-06-12 at 21:15, Peter Williams wrote:
> Shane Shrybman wrote:
> > On Fri, 2004-06-11 at 20:10, Peter Williams wrote:
> >
> >>Shane Shrybman wrote:
> >>
> >>>Hi Peter,
> >>>
> >>>I just started to try out your SPA scheduler patch and found that it is
> >>>noticeably sluggish when resizing a mozilla window on the desktop. I
> >>>have a profile of 2.6.7-rc3-spa and 2.6.7-rc2-mm2 and put them up at:
> >>>http://zeke.yi.org/linux/spa/ . There is also vmstat output there but it
> >>>doesn't look too helpful to me.
> >>>
> >
> > [..snip..]
> >
> >>Thanks for the feedback, I'll add your test to those I perform myself.
> >>
> >
> >
> > Sure, no problem. Hope it helps.
> >
> >
> >>Some of the control variables for this scheduler have rather arbitrary
> >>values at the moment and are likely to be non optimal. I'm in the
> >>process of making some of these variables modifiable at run time via
> >>/proc/sys to enable experimentation with different settings. Hopefully,
> >>this will enable settings that improve interactive performance to be
> >>established.
>
> A patch which adds control of the SPA scheduler via /proc/sys/kernel is
> available at:
>
> <http://users.bigpond.net.au/Peter-Williams/patch-2_6_7_rc3-SPA_CNTL-v0.1>
>
> This adds the directory /proc/sys/kernel/cpusched to the /proc directory
> if SYSCTL is configured into the kernel. This directory contains 6
> files that allow read/write (root only for write) to 6 unsigned integer
> parameters that control the scheduler as follows:
>
> time_slice -- (in msecs) controls the amount of CPU time that CPU bound
> tasks get before they are booted off the CPU, have their bonuses
> reassessed, are put back on the run queue and rescheduled. This has a
> default value of 100 (which is the average time slice dealt out by the
> original O(1) scheduler) and a maximum allowable value of 1000 (i.e. 1
> second) and a minimum of 1. Making this smaller MAY help interactive
> response but COULD increase the context switching rate. Making this
> larger MAY improve overall system throughput and reduce context
> switching rate but MAY make interactive responsiveness worse.
>
> base_promotion_interval -- (in msecs) controls the rate at which the
> promotion mechanism promotes runnable tasks. This has a default value
> of 55 and a maximum of UINT_MAX and a minimum of 1. The primary effect
> of changing this parameter is to alter the "severity" (i.e. how much
> more CPU access a lower "nice" will give a task) of "nice" values. The
> default value is such that if there are two CPU bound tasks (hard
> spinners) running (and time_slice has its default value of 100) and one
> of them has a "nice" value one less than the other then that task will
> get one and a half times as much CPU access as the other.
>
> max_ai_bonus -- determines the maximum interactive bonus that a task
> can receive. This has a default value of 10 and a maximum value of 10.
> Setting this value to zero effectively turns off the interactive bonus.
>
> max_tpt_bonus -- determines the maximum throughput bonus that a task
> can receive. This has a default value of 5 and a maximum value of 9 (to
> keep the maximum number of priority slots <= 160 (or 5 * 32)). Setting
> this value to zero effectively turns off the throughput bonus.
>
> ia_threshold -- is the sleepiness (i.e. A_s / (A_s + A_c) where A_s is
> the average sleep duration per scheduling cycle and A_c is the average
> CPU consumption per scheduling cycle) threshold (in thousandths) above
> which the task will be considered interactive and have its interactive
> bonus increased asymptotically towards the maximum. This is assessed at
> the end of each scheduling cycle and is only used to determine whether
> an increase in the bonus is warranted. This has a default value of 900
> and a minimum and maximum of 0 and 1000 respectively.
>
> cpu_hog_threshold -- is the idleness (i.e. (A_s + A_d) / (A_s + A_d +
> A_c) where A_c and A_s are as before and A_d is average time spent on
> the run queue per cycle waiting for CPU access) threshold (in
> thousandths) below which a task will be considered a CPU hog and start
> to lose interactive bonus points (if it has any). This is assessed at
> the end of each scheduling cycle and at the end of each time slice (if
> the task uses an entire slice) and is only used to decrease the task's
> interactive bonus. This has a default value of 500 and a minimum and
> maximum of 0 and 1000 respectively.
>
> I suspect that it is this last parameter that is the cause of the
> phenomena that you reported as the activity you described probably
> caused the X servers idleness to become smaller than 500 causing it to
> lose some of its interactive bonus. Could you apply this patch and try
> decreasing the value of this parameter to see if your problem goes away?
> A clue as to how small to make it can be obtained by observing the CPU
> usage rate reported by top for the X server when you are stirring it up.
> If the reported usage is y% then a threshold of slightly smaller than
> (1000 - y * 10) should work.
>

I am doing the same basic test as before but I should mention a few
things. In mozilla I have ten tabs open, 7 of them with loaded web
pages. Depending on which tab is focused, the interactive responsiveness
varies greatly. Performance decreases rapidly with the complexity of the
web page to be rendered. I am using the Debian 4.3.0-7 20040318043201
version of XFree86 with the nv driver. This driver is incredibly hungry
for CPU resources (an absolute pig really).

Here is what I see:

At the default setting of 500 I see mozilla and XFree86 each at about
49% cpu (fighting for cpu), each with a priority of ~30-33, and no idle
time. Metacity, the window manager, also looks to be trying to get some
CPU, because it is usually third on the list at ~3% cpu.

With cpu_hog_threshold decreased to 400: X=50% cpu pri=30, mozilla=40%
pri=34-37, metacity=4.5% pri=29. Feels a tad less jerky, perhaps because
metacity gets a bit more cpu?

Decreasing cpu_hog_threshold to 300 feels worse than 500.

Decreasing the time_slice to 10-20 makes things noticeably smoother. Are
time slices dynamic in the SPA scheduler? And in the vanilla O(1)
scheduler?

My humble assessment is that to provide a good interactive feel for this
workload, XFree86, mozilla and metacity all need to get CPU in a timely
manner. This is a tricky balancing act because XFree86 and mozilla can
easily fall into the cpu hog category, or, if they are not cpu hogs, they
can act to starve metacity of cpu resources.

Right now I am using cpu_hog_threshold=400 and time_slice=20 and it seems
to be an improvement. But this is fairly subjective of course.

> Thanks
> Peter
> PS I will probably change the semantics of cpu_hog_threshold in the next
> version to use CPU usage rate (the opposite of idleness) as the measure
> and reverse the test (i.e. if usage rate is greater than the threshold
> the task loses some of its interactive bonus). Its value (A_c / (A_c +
> A_s + A_d) NB this is different to the complement of sleepiness) is
> slightly cheaper to calculate and I also think it's a more natural way
> of thinking about the issue.

Regards,

Shane

2004-06-13 23:45:20

by Peter Williams

[permalink] [raw]
Subject: Re: [PATCH][2.6.7-rc3] Single Priority Array CPU Scheduler

Shane Shrybman wrote:
[bits deleted]
>
> I am doing the same basic test as before but I should mention a few
> things. In mozilla I have ten tabs open, 7 of them with loaded web
> pages. Depending on which tab is focused the interactive responsiveness
> varies greatly. Performance decreases rapidly with the complexity of the
> web page to be rendered. I am using the Debian 4.3.0-7 20040318043201
> version of XFree86 with the nv driver. This driver is incredibly hungry
> for CPU resources (absolute pig really).
>
> Here is what I see:
>
> At the default setting of 500 I see mozilla and XFree86 each at about
> 49% cpu (fighting for cpu), each with a priority ~30-33, with no idle
> time. Metacity, the window manager, also looks to be trying to get some
> CPU, cause it is usually third on the list at ~3% cpu.
>
> As cpu_hog_threshold is decreased to 400. CPU X=50% pri=30, mozilla=40%
> pri=34-37, metacity=4.5% pri=29. Feels a tad less jerky, cause metacity
> gets a bit more cpu?
>
> Decreasing cpu_hog_threshold to 300 feels worse than 500.

This probably means that a CPU hog managed to get some interactive bonus
points and the lower threshold stops them from being taken away.

>
> Decreasing the time_slice to 10-20 makes things noticeably smoother. Are
> time slices dynamic in the SPA scheduler?

No. Each task gets a fresh, fixed-size time slice when it wakes up and
again at the end of each expired time slice.
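
A toy model of that behaviour (my own sketch, not the patch's code): the
slice is simply refilled to the same configured time_slice value at
wake-up and at expiry, rather than being recomputed from the task's
priority:

/* Toy model of the fixed time slice behaviour described above. */
struct task {
	unsigned int time_left;		/* remaining slice, in ms */
};

static unsigned int time_slice = 100;	/* /proc/sys/kernel/cpusched/time_slice */

static void on_wakeup(struct task *t)
{
	t->time_left = time_slice;	/* fresh slice on wake-up */
}

static void on_tick(struct task *t, unsigned int ms)
{
	if (t->time_left > ms) {
		t->time_left -= ms;
		return;
	}
	/* Slice expired: reassess bonuses, requeue, give a fresh slice. */
	t->time_left = time_slice;
}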

> and the vanilla O(1)
> scheduler?

The vanilla scheduler uses time slices to control "nice" and the
allocated size varies widely. The default value in SPA is the average
of what vanilla would have awarded. This is not necessarily the best
time slice for interactive response with SPA.

>
> My humble assessment is that to provide a good interactive feel for this
> workload; Xfree86, mozilla and metacity all need to get CPU in a timely
> manner. This is a tricky balancing act because XFree86 and mozilla can
> easily fall into the cpu hog category, or if they are not cpu hogs they
> can act to starve metacity of cpu resources.

It is indeed a tricky balancing act and it would be easier if more were
known about the behavioural characteristics of the protagonists. I have
added a logging switch which, when on, will log the scheduling statistics
for exiting tasks to the system log, and I plan to use this to get an
idea of the characteristics of the tasks involved in a kernel build.

I'm also writing a tool that can be pointed at a particular task (such
as the X server) and gather time series data on the behaviour of that
task using the statistics in /proc/<tgid>/task/<tid>/cpustats.
Hopefully, that will help in working out the best settings or perhaps
suggest other criteria that may be better for deciding interactiveness.
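
A minimal sketch of such a tool (my own, and it assumes nothing about the
cpustats format beyond it being a small readable text file, which may not
match what the TSTATS patch actually provides): it just re-reads the file
once a second and timestamps the raw contents for later analysis:

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char path[256], buf[512];

	if (argc != 3) {
		fprintf(stderr, "usage: %s <tgid> <tid>\n", argv[0]);
		return 1;
	}
	snprintf(path, sizeof(path), "/proc/%s/task/%s/cpustats",
		 argv[1], argv[2]);

	for (;;) {
		FILE *f = fopen(path, "r");
		size_t n;

		if (!f) {
			perror(path);
			return 1;
		}
		n = fread(buf, 1, sizeof(buf) - 1, f);
		buf[n] = '\0';
		fclose(f);
		printf("%ld %s", (long)time(NULL), buf);	/* timestamped snapshot */
		fflush(stdout);
		sleep(1);
	}
}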

I had hoped that the throughput bonus would have helped in situations
such as the one you describe, where several interactive tasks are
fighting it out, by minimizing queuing. I suspect that the reason it
isn't helping is that a linear mapping from CPU delay to the bonus is
inadequate as it means quite large discrepancies (20%) are required to
get one bonus point. I will look at modifying the mapping function to
give more emphasis to small discrepancies.
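
As a purely illustrative sketch of "more emphasis to small discrepancies"
(not the mapping the scheduler currently uses, nor necessarily the one
that will replace it): a concave function such as a square root rewards
small delays much sooner than the linear map does:

#include <math.h>
#include <stdio.h>

/* Compare a linear map from normalized CPU delay to a throughput bonus
 * with a concave (square root) map that weights small delays more. */
static unsigned int linear_bonus(double delay_frac, unsigned int max_bonus)
{
	return (unsigned int)(delay_frac * max_bonus);
}

static unsigned int sqrt_bonus(double delay_frac, unsigned int max_bonus)
{
	return (unsigned int)(sqrt(delay_frac) * max_bonus);
}

int main(void)
{
	unsigned int max_tpt_bonus = 5;	/* default from the SPA_CNTL patch */
	int i;

	for (i = 0; i <= 10; i++) {
		double d = i / 10.0;
		printf("delay %.1f linear %u sqrt %u\n",
		       d, linear_bonus(d, max_tpt_bonus),
		       sqrt_bonus(d, max_tpt_bonus));
	}
	return 0;
}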

>
> Right now I am using cpu_hog_threshold=400 and timeslice=20 and it seems
> to be an improvement. But this is fairly subjective of course.

You might also try increasing the promotion interval a little as this
will accentuate the differences between the allocated priorities.

Thanks again for the feedback
Peter
--
Peter Williams [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce