Hi,
I have been adapting Rick's schedstats package to extract
more information from the sched-domains infrastructure.
Before I release a patch, I'd like some input as to what
statistics people want covered, and in what form they would
like them presented (I'm talking only about balancing).
I have covered the basic stuff I could think of, and
written a small C program to parse it (I'm no good at perl)
and output this (sorry it is still pretty ugly):
npiggin@didi:~/usr/src/linux-2.4$ stats7 pre post
For domain0
31.005358 load balance calls / s move 1.162474 tasks / s
  Of which, 66.198008% calls and 67.187500% task moves from idle balancing
    93.539823% found no imbalance
    2.654867% found an imbalance but failed
    30.232558% of tasks were moved with cache nice
  Of which, 25.834798% calls and 28.125000% task moves from busy balancing
    95.918367% found no imbalance
    0.000000% found an imbalance but failed
    100.000000% of tasks were moved with cache nice
  Of which, 7.967194% calls and 4.687500% task moves from newidle balancing
    94.117647% found no imbalance
    0.000000% found an imbalance but failed
    100.000000% of tasks were moved with cache nice
0.000000 active balances / s move 0.000000 tasks / s
0.036327 passive load balances / s
2.070657 affine wakeups / s
0.000000 exec balances / s
This was the behaviour during a make -j4 bzImage on a 2xSMP. For
a NUMA system, it would also give you domain1 for example.
A few interesting things this tells us: load_balance is being
called 31 times per second, ~95% of the time there is no imbalance,
and it moves 1.16 tasks per second.
Idle balancing is going over the cache_nice_tries limit 70% of
the time, which might warrant increasing cache_nice_tries.
etc.
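
(For reference, the parser just takes two snapshots of the raw counters
and divides the deltas by the elapsed time; a minimal sketch of that
pre/post diffing is below. The snapshot format, timestamp handling and
field names are placeholders here, not what the patch actually exports.)

        /* Minimal sketch only: assumes each snapshot file begins with a
         * timestamp in seconds followed by "domain0 <lb_calls> <lb_moved>".
         * The real format is whatever the kernel patch ends up exporting. */
        #include <stdio.h>

        struct snap { double t; unsigned long lb_calls, lb_moved; };

        static int read_snap(const char *name, struct snap *s)
        {
                FILE *f = fopen(name, "r");
                if (!f)
                        return -1;
                if (fscanf(f, "%lf domain0 %lu %lu",
                           &s->t, &s->lb_calls, &s->lb_moved) != 3) {
                        fclose(f);
                        return -1;
                }
                fclose(f);
                return 0;
        }

        int main(int argc, char **argv)
        {
                struct snap pre, post;
                double dt;

                if (argc < 3 || read_snap(argv[1], &pre) ||
                    read_snap(argv[2], &post))
                        return 1;
                dt = post.t - pre.t;
                printf("%f load balance calls / s move %f tasks / s\n",
                       (post.lb_calls - pre.lb_calls) / dt,
                       (post.lb_moved - pre.lb_moved) / dt);
                return 0;
        }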
Comments?
The important thing, as always, is that collecting the stats not impact
the action being taken. If you stick with incrementing counters and
not taking additional locks, then you've probably done what you can to
minimize any impact.
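
As a rough picture of what "cheap" means here, plain per-domain
increments done by the owning cpu would be the goal, along the lines of
this sketch (the struct and field names are invented for illustration,
not taken from any patch):

        /* Illustrative only: counters are bumped from load_balance() on
         * the cpu that owns this view of the domain, under the runqueue
         * lock it already holds, so plain increments suffice; no atomics,
         * no extra locking. */
        struct sched_domain_stats {
                unsigned long lb_calls[3];      /* indexed by idle type */
                unsigned long lb_no_imbalance[3];
                unsigned long lb_failed[3];
                unsigned long lb_moved[3];
        };

        static inline void count_balance(struct sched_domain_stats *sds,
                                         int idle, int moved, int imbalance)
        {
                sds->lb_calls[idle]++;
                if (!imbalance)
                        sds->lb_no_imbalance[idle]++;
                else if (!moved)
                        sds->lb_failed[idle]++;
                else
                        sds->lb_moved[idle] += moved;
        }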
From an analysis standpoint it would be nice to know which of the major
features are being activated for a particular load. So imbalance-driven
moves, power-driven moves, and the number of times each domain tried
to balance and failed would all be useful. I think your output covered
those.
Another useful stat might be how many times the self-adjusting fields
(min, max) adjusted themselves. That might yield some insights on
whether that's working well (or correctly).
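
One obvious spot to count is where load_balance() backs off by growing
the balance interval, e.g. (balance_interval and max_interval are the
real sched_domain fields; the counter is hypothetical):

        /* Sketch: count how often the self-tuning actually fires, so we
         * can see whether it is doing anything useful.  lb_interval_grew
         * is a made-up counter. */
        if (sd->balance_interval < sd->max_interval) {
                sd->balance_interval *= 2;
                sd->lb_interval_grew++;
        }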
When I started thinking about these stats, I started thinking about how to
identify the domains. "domain0" and "domain1" do uniquely identify some
data structures, but especially as they get hierarchical, can we easily
tie them to the cpus they manage? Perhaps the stats should include a
bitmap of what cpus are covered by the domain too.
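
For example, each domain's header line could carry its span, something
like this sketch (sd->span is the real cpumask field; the level variable
and the seq_file plumbing are assumed):

        /* Sketch: tag each domain's stats with the cpus it spans so that
         * "domain0"/"domain1" can be tied back to actual hardware. */
        int cpu;

        seq_printf(m, "domain%d cpus:", level);
        for (cpu = 0; cpu < NR_CPUS; cpu++)
                if (cpu_isset(cpu, sd->span))
                        seq_printf(m, " %d", cpu);
        seq_printf(m, "\n");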
Looks very useful for those times when some workload causes the scheduler
to burp -- between scheduler stats and domain stats we may find it much
easier to track down issues.
Would you say these would be in addition to the schedstats or would
these replace them?
Rick
Rick Lindsley wrote:
> The important thing, as always, is that collecting the stats not impact
> the action being taken. If you stick with incrementing counters and
> not taking additional locks, then you've probably done what you can to
> minimize any impact.
>
Yes, they're all simple increments without the need for any
locking.
> From an analysis standpoint it would be nice to know which of the major
> features are being activated for a particular load. So imbalance-driven
> moves, power-driven moves, and the number of times each domain tried
> to balance and failed would all be useful. I think your output covered
> those.
>
It doesn't get into the finer points of how the imbalance
is derived, but maybe it should...
> Another useful stat might be how many times the self-adjusting fields
> (min, max) adjusted themselves. That might yield some insights on
> whether that's working well (or correctly).
>
Might be a good idea.
> When I started thinking about these stats, I started thinking about how to
> identify the domains. "domain0" and "domain1" do uniquely identify some
> data structures, but especially as they get hierarchical, can we easily
> tie them to the cpus they manage? Perhaps the stats should include a
> bitmap of what cpus are covered by the domain too.
>
Well, every domain that is reported here will cover the entire
system because it simply takes the sum of statistics from all
domains.
It is a good overview, but it probably would be a good idea to
be able to break down the views and zoom in a bit.
> Looks very useful for those times when some workload causes the scheduler
> to burp -- between scheduler stats and domain stats we may find it much
> easier to track down issues.
>
> Would you say these would be in addition to the schedstats or would
> these replace them?
It will replace some of them, I think.
For example, all load_balance operations are done within the
context of a sched domain, so you would use the sched domain's
statistics there. However, other statistics (those specific to
the runqueue, for example) would stay where they are.
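
Roughly, the split would look something like this (the field names are
illustrative only, not the actual patch):

        /* Balancing counters hang off the sched_domain they were
         * collected in; events that are inherently per-runqueue stay
         * on the runqueue. */
        struct sched_domain {
                /* ... existing fields ... */
                unsigned long lb_calls[3], lb_moved[3]; /* per idle type */
                unsigned long alb_calls, alb_moved;     /* active balancing */
        };

        struct runqueue {
                /* ... existing fields ... */
                unsigned long sched_count;              /* schedule() calls */
                unsigned long ttwu_count, ttwu_local;   /* try_to_wake_up() */
        };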
Thanks
Nick
    From an analysis standpoint it would be nice to know which of
    the major features are being activated for a particular load.
    So imbalance-driven moves, power-driven moves, and the number of
    times each domain tried to balance and failed would all be useful.
    I think your output covered those.

    It doesn't get into the finer points of how the imbalance is derived,
    but maybe it should...
It's ok to wait and see if those are useful before implementing them. I
suspect they would be relatively easily added if they were needed.
One reason there are 6 versions of scheduler statistics is that the
information needed kept changing, both due to a better understanding of
bottlenecks and due to changing code.
    Well, every domain that is reported here will cover the entire system
    because it simply takes the sum of statistics from all domains.
I would suggest creating an output format that gives you all this
information (since we have it anyway) but I think it is quite reasonable
for the program which *interprets* this information to summarize it.
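
In other words the kernel could dump one raw counter line per cpu per
domain level, and the interpreting program would sum them before
computing rates and percentages; a sketch of that userspace side,
assuming a hypothetical "cpuN domainM ..." line format:

        #include <stdio.h>

        #define MAX_LEVELS 4    /* assumption: at most 4 domain levels */

        /* Sketch: sum raw per-cpu, per-level counters from the stats
         * file; rates and percentages are computed from the totals. */
        static void sum_domains(FILE *f, unsigned long *calls,
                                unsigned long *moved)
        {
                unsigned long c, m;
                int cpu, level;

                while (fscanf(f, " cpu%d domain%d %lu %lu",
                              &cpu, &level, &c, &m) == 4) {
                        if (level < MAX_LEVELS) {
                                calls[level] += c;
                                moved[level] += m;
                        }
                }
        }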
    Would you say these would be in addition to the schedstats or
    would these replace them?

    It will replace some of them, I think.
That's my thought too. I would suggest that we merge them into one patch.
Much as I'd like to see my schedstats hit the mainline, I think it
is prudent to separate the major architectural changes sched-domains
introduces from statistics both related and unrelated to them --
and having two statistics patches for the scheduler, even if they are
complementary, makes it harder on Andrew and more confusing for users.
Rick
Rick Lindsley wrote:
>     From an analysis standpoint it would be nice to know which of
>     the major features are being activated for a particular load.
>     So imbalance-driven moves, power-driven moves, and the number of
>     times each domain tried to balance and failed would all be useful.
>     I think your output covered those.
>
>     It doesn't get into the finer points of how the imbalance is derived,
>     but maybe it should...
>
> It's ok to wait and see if those are useful before implementing them. I
> suspect they would be relatively easily added if they were needed.
> One reason there are 6 versions of scheduler statistics is that the
> information needed kept changing, both due to a better understanding of
> bottlenecks and due to changing code.
>
Yep.
>     Well, every domain that is reported here will cover the entire system
>     because it simply takes the sum of statistics from all domains.
>
> I would suggest creating an output format that gives you all this
> information (since we have it anyway) but I think it is quite reasonable
> for the program which *interprets* this information to summarize it.
>
OK, yeah that is a fine idea.
>     Would you say these would be in addition to the schedstats or
>     would these replace them?
>
>     It will replace some of them, I think.
>
> That's my thought too. I would suggest that we merge them into one patch.
> Much as I'd like to see my schedstats hit the mainline, I think it
> is prudent to separate the major architectural changes sched-domains
> introduces from statistics both related and unrelated to them --
> and having two statistics patches for the scheduler, even if they are
> complementary, makes it harder on Andrew and more confusing for users.
>
No, I started with your sources, and the plan has always
been to merge my changes back to you where possible.