This patch series contains fixes for v2.6.39. They are related to the
AMD family 15h PMU but also modify x86 generic code. See the diffstat
below.
If possible I would like to see patch #4 in .39 as well, though it is
more than just a one-liner and makes many changes to the x86 event
scheduling code. Without this patch, scheduling multiple events on AMD
family 15h may fail under certain conditions even though enough
counters are available.
-Robert
The following changes since commit e566b76ed30768140df8f0023904aed5a41244f7:
perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec() (2011-04-11 11:07:55 +0200)
Andre Przywara (1):
perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus
Robert Richter (3):
perf, x86: Fix AMD family 15h FPU event constraints
perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE
perf, x86: Fix event scheduler to solve complex scheduling problems
arch/x86/kernel/cpu/perf_event.c | 63 +++++++++++++++++++++++++++------
arch/x86/kernel/cpu/perf_event_amd.c | 23 ++++++++++--
2 files changed, 70 insertions(+), 16 deletions(-)
Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
an extra pointer chase and data cache hit.
Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index eed3673a..224a84f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -31,6 +31,7 @@
#include <asm/nmi.h>
#include <asm/compat.h>
#include <asm/smp.h>
+#include <asm/alternative.h>
#if 0
#undef wrmsrl
@@ -363,12 +364,18 @@ again:
return new_raw_count;
}
-/* using X86_FEATURE_PERFCTR_CORE to later implement ALTERNATIVE() here */
static inline int x86_pmu_addr_offset(int index)
{
- if (boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
- return index << 1;
- return index;
+ int offset;
+
+ /* offset = X86_FEATURE_PERFCTR_CORE ? index << 1 : index */
+ alternative_io(ASM_NOP2,
+ "shll $1, %%eax",
+ X86_FEATURE_PERFCTR_CORE,
+ "=a" (offset),
+ "a" (index));
+
+ return offset;
}
static inline unsigned int x86_pmu_config_addr(int index)
--
1.7.3.4
The current x86 event scheduler fails to resolve scheduling problems
for certain combinations of events and constraints. This happens
especially for events with complex constraints such as those of the
AMD family 15h PMU. The scheduler then fails to find an existing
solution.
Examples are:
        event code      counter         failure         possible solution

1)      0x043           PMC[2:0]        0               1
        0x02E           PMC[3,0]        3               0
        0x003           PMC3            FAIL            3

2)      0x02E           PMC[3,0]        0               3
        0x043           PMC[2:0]        1               0
        0x045           PMC[2:0]        2               1
        0x046           PMC[2:0]        FAIL            2
Scheduling events on counters is a Hamiltonian path problem. To find a
possible solution we must traverse all existing paths. This patch
implements this.
We need to save the states of all paths already walked. If we fail to
schedule an event, we now roll back to the previous state and try to
use another free counter, until all paths have been analysed.
We might consider removing the constraint weight implementation
completely later, but I left this out as it is a much bigger and
riskier change than this fix.
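For illustration only, the backtracking idea can be sketched in user space
roughly as below (a recursive toy version with made-up names; the patch
itself unrolls this iteratively with an explicit state log):

#include <stdio.h>

#define NCOUNTERS 4

/*
 * Try to place event i and all following events; masks[] holds the
 * allowed-counter bitmask of each event, used is the bitmask of
 * counters already taken.
 */
static int try_assign(const unsigned masks[], int assign[], int n,
                      int i, unsigned used)
{
        int c;

        if (i == n)
                return 1;                       /* all events placed */

        for (c = 0; c < NCOUNTERS; c++) {
                if (!(masks[i] & (1u << c)) || (used & (1u << c)))
                        continue;               /* not allowed or taken */
                assign[i] = c;
                if (try_assign(masks, assign, n, i + 1, used | (1u << c)))
                        return 1;
                /* dead end: fall through and try the next free counter */
        }
        return 0;
}

int main(void)
{
        /* example 2 above: PMC[3,0] followed by three PMC[2:0] events */
        unsigned masks[] = { 0x9, 0x7, 0x7, 0x7 };
        int assign[4], i;

        if (try_assign(masks, assign, 4, 0, 0)) {
                for (i = 0; i < 4; i++)
                        printf("event %d -> counter %d\n", i, assign[i]);
        } else {
                printf("not schedulable\n");
        }
        return 0;
}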
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 48 +++++++++++++++++++++++++++++++------
1 files changed, 40 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 224a84f..887a500 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -770,11 +770,19 @@ static inline int is_x86_event(struct perf_event *event)
return event->pmu == &pmu;
}
+struct sched_log
+{
+ int i;
+ int w;
+ int idx;
+};
+
static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
{
struct event_constraint *c, *constraints[X86_PMC_IDX_MAX];
unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
- int i, j, w, wmax, num = 0;
+ struct sched_log sched_log[X86_PMC_IDX_MAX];
+ int i, idx, w, wmax, num = 0, scheduled = 0;
struct hw_perf_event *hwc;
bitmap_zero(used_mask, X86_PMC_IDX_MAX);
@@ -815,6 +823,7 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
*/
bitmap_zero(used_mask, X86_PMC_IDX_MAX);
+ memset(&sched_log, 0, sizeof(sched_log));
/*
* weight = number of possible counters
@@ -838,25 +847,48 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
for (w = 1, num = n; num && w <= wmax; w++) {
/* for each event */
for (i = 0; num && i < n; i++) {
+ redo:
c = constraints[i];
hwc = &cpuc->event_list[i]->hw;
if (c->weight != w)
continue;
- for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
- if (!test_bit(j, used_mask))
+ idx = sched_log[scheduled].idx;
+ /* for each bit in idxmsk starting from idx */
+ while (idx < X86_PMC_IDX_MAX) {
+ idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
+ idx);
+ if (idx == X86_PMC_IDX_MAX)
+ break;
+ if (!__test_and_set_bit(idx, used_mask))
break;
+ idx++;
}
- if (j == X86_PMC_IDX_MAX)
- break;
-
- __set_bit(j, used_mask);
+ if (idx >= X86_PMC_IDX_MAX) {
+ /* roll back and try next free counter */
+ if (!scheduled)
+ /* no free counters anymore */
+ break;
+ sched_log[scheduled].idx = 0;
+ scheduled--;
+ num++;
+ clear_bit(sched_log[scheduled].idx, used_mask);
+ i = sched_log[scheduled].i;
+ w = sched_log[scheduled].w;
+ sched_log[scheduled].idx++;
+ goto redo;
+ }
if (assign)
- assign[i] = j;
+ assign[i] = idx;
+
num--;
+ sched_log[scheduled].i = i;
+ sched_log[scheduled].w = w;
+ sched_log[scheduled].idx = idx;
+ scheduled++;
}
}
done:
--
1.7.3.4
Depending on the unit mask settings, some FPU events may be scheduled
only on CPU counter #3. This patch fixes this.
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event_amd.c | 21 ++++++++++++++++++---
1 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 4e16138..758cccc 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -427,7 +427,9 @@ static __initconst const struct x86_pmu amd_pmu = {
*
* Exceptions:
*
+ * 0x000 FP PERF_CTL[3], PERF_CTL[5:3] (*)
* 0x003 FP PERF_CTL[3]
+ * 0x004 FP PERF_CTL[3], PERF_CTL[5:3] (*)
* 0x00B FP PERF_CTL[3]
* 0x00D FP PERF_CTL[3]
* 0x023 DE PERF_CTL[2:0]
@@ -448,6 +450,8 @@ static __initconst const struct x86_pmu amd_pmu = {
* 0x0DF LS PERF_CTL[5:0]
* 0x1D6 EX PERF_CTL[5:0]
* 0x1D8 EX PERF_CTL[5:0]
+ *
+ * (*) depending on the umask all FPU counters may be used
*/
static struct event_constraint amd_f15_PMC0 = EVENT_CONSTRAINT(0, 0x01, 0);
@@ -460,18 +464,29 @@ static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
static struct event_constraint *
amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
{
- unsigned int event_code = amd_get_event_code(&event->hw);
+ struct hw_perf_event *hwc = &event->hw;
+ unsigned int event_code = amd_get_event_code(hwc);
switch (event_code & AMD_EVENT_TYPE_MASK) {
case AMD_EVENT_FP:
switch (event_code) {
+ case 0x000:
+ if (!(hwc->config & 0x0000F000ULL))
+ break;
+ if (!(hwc->config & 0x00000F00ULL))
+ break;
+ return &amd_f15_PMC3;
+ case 0x004:
+ if (hweight_long(hwc->config &
+ ARCH_PERFMON_EVENTSEL_UMASK) <= 1)
+ break;
+ return &amd_f15_PMC3;
case 0x003:
case 0x00B:
case 0x00D:
return &amd_f15_PMC3;
- default:
- return &amd_f15_PMC53;
}
+ return &amd_f15_PMC53;
case AMD_EVENT_LS:
case AMD_EVENT_DC:
case AMD_EVENT_EX_LS:
--
1.7.3.4
From: Andre Przywara <[email protected]>
With AMD CPU family 15h a unit mask was introduced for the Data Cache
Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0
(first data cache miss or streaming store to a 64 B cache line) of
this mask to properly count data cache misses.
Now we set this bit for all families and models. In case a PMU does
not implement a unit mask for event 0x041, the bit is ignored.
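As a side note, the raw value 0x0141 used in the hunk below simply combines
the event select with that unit-mask bit. A minimal sketch, assuming the
usual PERF_CTL layout (event select in bits [7:0], unit mask in bits [15:8]);
the macro names are illustrative, not the kernel's:

#include <stdio.h>

#define DC_MISS_EVENT   0x41u   /* Data Cache Misses event select */
#define DC_MISS_UMASK   0x01u   /* bit 0: first miss or streaming store */

int main(void)
{
        /* prints 0x0141, the value used in the patch below */
        printf("0x%04x\n", (DC_MISS_UMASK << 8) | DC_MISS_EVENT);
        return 0;
}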
Signed-off-by: Andre Przywara <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event_amd.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 461f62b..4e16138 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -8,7 +8,7 @@ static __initconst const u64 amd_hw_cache_event_ids
[ C(L1D) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses */
- [ C(RESULT_MISS) ] = 0x0041, /* Data Cache Misses */
+ [ C(RESULT_MISS) ] = 0x0141, /* Data Cache Misses */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0x0142, /* Data Cache Refills :system */
--
1.7.3.4
On Sat, 2011-04-16 at 02:27 +0200, Robert Richter wrote:
> The current x86 event scheduler fails to resolve scheduling problems
> of certain combinations of events and constraints. This happens esp.
> for events with complex constraints such as those of the AMD family
> 15h pmu. The scheduler does not find then an existing solution.
> Examples are:
>
> event code counter failure possible
> solution
>
> 1) 0x043 PMC[2:0] 0 1
> 0x02E PMC[3,0] 3 0
> 0x003 PMC3 FAIL 3
>
> 2) 0x02E PMC[3,0] 0 3
> 0x043 PMC[2:0] 1 0
> 0x045 PMC[2:0] 2 1
> 0x046 PMC[2:0] FAIL 2
>
> Scheduling events on counters is a Hamiltonian path problem. To find a
> possible solution we must traverse all existing paths. This patch
> implements this.
>
> We need to save all states of already walked paths. If we fail to
> schedule an event we now rollback the previous state and try to use
> another free counter until we have analysed all paths.
>
> We might consider to later remove the constraint weight implementation
> completely, but I left this out as this is a much bigger and more
> risky change than this fix.
Argh, crap. That's because AMD is now the first with overlapping
constraints. Be sure to let your hardware guys know that they went from
top to bottom on my appreciation list. AMD used to have no constraints
and now they have the absolute worst.
I'd really prefer not to do this for .39, and I'll have to sit down and
actually read this code. It looks like we went from O(n^2) to O(n!) or
somesuch, also not much of an improvement. I'll have to analyze the
solver to see what it does for 'simple' constraints set to see if it
will indeed be more expensive than the O(n^2) solver we had.
Also, I think this code could do with a tiny bit of comments ;-)
* Peter Zijlstra <[email protected]> wrote:
> On Sat, 2011-04-16 at 02:27 +0200, Robert Richter wrote:
> > The current x86 event scheduler fails to resolve scheduling problems
> > of certain combinations of events and constraints. This happens esp.
> > for events with complex constraints such as those of the AMD family
> > 15h pmu. The scheduler does not find then an existing solution.
> > Examples are:
> >
> > event code counter failure possible
> > solution
> >
> > 1) 0x043 PMC[2:0] 0 1
> > 0x02E PMC[3,0] 3 0
> > 0x003 PMC3 FAIL 3
> >
> > 2) 0x02E PMC[3,0] 0 3
> > 0x043 PMC[2:0] 1 0
> > 0x045 PMC[2:0] 2 1
> > 0x046 PMC[2:0] FAIL 2
> >
> > Scheduling events on counters is a Hamiltonian path problem. To find a
> > possible solution we must traverse all existing paths. This patch
> > implements this.
> >
> > We need to save all states of already walked paths. If we fail to
> > schedule an event we now rollback the previous state and try to use
> > another free counter until we have analysed all paths.
> >
> > We might consider to later remove the constraint weight implementation
> > completely, but I left this out as this is a much bigger and more
> > risky change than this fix.
>
> Argh, crap. That's because AMD is now the first with overlapping
> constraints. Be sure to let your hardware guys know that they went from
> top to bottom om my appreciation list. AMD used to have no constraints
> and now they have the absolute worst.
>
> I'd really prefer not to do this for .39, and I'll have to sit down and
> actually read this code. It looks like we went from O(n^2) to O(n!) or
> somesuch, also not much of an improvement. I'll have to analyze the
> solver to see what it does for 'simple' constraints set to see if it
> will indeed be more expensive than the O(n^2) solver we had.
>
> Also, I think this code could do with a tiny bit of comments ;-)
I'd also prefer if we first had actual testcases in 'perf test' for all these
failures - it took an *awfully* long time to find these regressions (the event
scheduler code has been committed for months), while with proper testcases it
would only take a second to run 'perf test'.
Thanks,
Ingo
On Sat, 2011-04-16 at 11:43 +0200, Ingo Molnar wrote:
> I'd also prefer if we first had actual testcases in 'perf test' for all these
> failures - it took an *awfully* long time to find these regressions (the event
> scheduler code has been committed for months), while with proper testcases it
> would only take a second to run 'perf test'.
These cases only exist on AMD F15, I don't think there's many people
with such systems around.
* Peter Zijlstra <[email protected]> wrote:
> On Sat, 2011-04-16 at 11:43 +0200, Ingo Molnar wrote:
> > I'd also prefer if we first had actual testcases in 'perf test' for all these
> > failures - it took an *awfully* long time to find these regressions (the event
> > scheduler code has been committed for months), while with proper testcases it
> > would only take a second to run 'perf test'.
>
> These cases only exist on AMD F15, I don't think there's many people
> with such systems around.
Well, if the trend continues we'll have more twisted constraints and more bugs
of this sort, so having a testsuite sure cannot hurt, right?
Thanks,
Ingo
On Sat, 2011-04-16 at 12:14 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Sat, 2011-04-16 at 11:43 +0200, Ingo Molnar wrote:
> > > I'd also prefer if we first had actual testcases in 'perf test' for all these
> > > failures - it took an *awfully* long time to find these regressions (the event
> > > scheduler code has been committed for months), while with proper testcases it
> > > would only take a second to run 'perf test'.
> >
> > These cases only exist on AMD F15, I don't think there's many people
> > with such systems around.
>
> Well, if the trend continues we'll have more twisted constraints and more bugs
> of this sort, so having a testsuite sure cannot hurt, right?
For sure. But the problem with writing test cases at the perf userspace
level is that they're very hardware specific. It would be much easier to
write unit tests for the solver itself.
On Sat, 16 Apr 2011 12:08:40 +0200, Peter Zijlstra said:
> These cases only exist on AMD F15, I don't think there's many people
> with such systems around.
Didn't they say similar things about the first Itaniums? :)
On Fri, Apr 15, 2011 at 5:27 PM, Robert Richter <[email protected]> wrote:
> The current x86 event scheduler fails to resolve scheduling problems
> of certain combinations of events and constraints. This happens esp.
> for events with complex constraints such as those of the AMD family
> 15h pmu. The scheduler does not find then an existing solution.
> Examples are:
>
> event code counter failure possible solution
>
> 1) 0x043 PMC[2:0] 0 1
> 0x02E PMC[3,0] 3 0
> 0x003 PMC3 FAIL 3
>
I am not sure I understand this failure case. If I recall correctly,
the scheduler looks at the weight of each event first:
                                weight
1)      0x043           PMC[2:0]  3
        0x02E           PMC[3,0]  2
        0x003           PMC3      1
Then, it schedules in increasing weight order. So it will
schedule weights 1, 2, 3. For weight 1, it will find counter 3,
for weight 2, it will take counter 0, and for weight 3, it will
take counter 1, given that counter 0 is already used.
Or am I reading your example the wrong way?
The fact that counters have overlapping constraints
should not matter. In fact this is what happens with
counters without constraints.
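To make that walkthrough concrete, here is a small stand-alone sketch
(made-up names, not the kernel code) of such a weight-ordered pass applied
to example 1; it assigns counter 3, then counter 0, then counter 1 as
described above:

#include <stdio.h>
#include <strings.h>    /* ffs() */

#define NCOUNTERS 4

int main(void)
{
        /* example 1 by weight: PMC3 (1), PMC[3,0] (2), PMC[2:0] (3) */
        unsigned masks[]  = { 0x8, 0x9, 0x7 };
        int      weight[] = { 1,   2,   3   };
        int assign[3] = { -1, -1, -1 };
        unsigned used = 0;
        int w, i, c;

        /* schedule in increasing weight order, greedy first-free counter */
        for (w = 1; w <= NCOUNTERS; w++) {
                for (i = 0; i < 3; i++) {
                        if (weight[i] != w)
                                continue;
                        c = ffs(masks[i] & ~used);
                        if (c) {
                                assign[i] = c - 1;
                                used |= 1u << (c - 1);
                        }
                }
        }

        for (i = 0; i < 3; i++)
                printf("mask 0x%x -> counter %d\n", masks[i], assign[i]);

        return 0;
}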
> 2) 0x02E PMC[3,0] 0 3
> 0x043 PMC[2:0] 1 0
> 0x045 PMC[2:0] 2 1
> 0x046 PMC[2:0] FAIL 2
>
> Scheduling events on counters is a Hamiltonian path problem. To find a
> possible solution we must traverse all existing paths. This patch
> implements this.
>
> We need to save all states of already walked paths. If we fail to
> schedule an event we now rollback the previous state and try to use
> another free counter until we have analysed all paths.
>
> We might consider to later remove the constraint weight implementation
> completely, but I left this out as this is a much bigger and more
> risky change than this fix.
>
> Cc: Stephane Eranian <[email protected]>
> Signed-off-by: Robert Richter <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event.c | 48 +++++++++++++++++++++++++++++++------
> 1 files changed, 40 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 224a84f..887a500 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -770,11 +770,19 @@ static inline int is_x86_event(struct perf_event *event)
> return event->pmu == &pmu;
> }
>
> +struct sched_log
> +{
> + int i;
> + int w;
> + int idx;
> +};
> +
> static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
> {
> struct event_constraint *c, *constraints[X86_PMC_IDX_MAX];
> unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> - int i, j, w, wmax, num = 0;
> + struct sched_log sched_log[X86_PMC_IDX_MAX];
> + int i, idx, w, wmax, num = 0, scheduled = 0;
> struct hw_perf_event *hwc;
>
> bitmap_zero(used_mask, X86_PMC_IDX_MAX);
> @@ -815,6 +823,7 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
> */
>
> bitmap_zero(used_mask, X86_PMC_IDX_MAX);
> + memset(&sched_log, 0, sizeof(sched_log));
>
> /*
> * weight = number of possible counters
> @@ -838,25 +847,48 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
> for (w = 1, num = n; num && w <= wmax; w++) {
> /* for each event */
> for (i = 0; num && i < n; i++) {
> + redo:
> c = constraints[i];
> hwc = &cpuc->event_list[i]->hw;
>
> if (c->weight != w)
> continue;
>
> - for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> - if (!test_bit(j, used_mask))
> + idx = sched_log[scheduled].idx;
> + /* for each bit in idxmsk starting from idx */
> + while (idx < X86_PMC_IDX_MAX) {
> + idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> + idx);
> + if (idx == X86_PMC_IDX_MAX)
> + break;
> + if (!__test_and_set_bit(idx, used_mask))
> break;
> + idx++;
> }
>
> - if (j == X86_PMC_IDX_MAX)
> - break;
> -
> - __set_bit(j, used_mask);
> + if (idx >= X86_PMC_IDX_MAX) {
> + /* roll back and try next free counter */
> + if (!scheduled)
> + /* no free counters anymore */
> + break;
> + sched_log[scheduled].idx = 0;
> + scheduled--;
> + num++;
> + clear_bit(sched_log[scheduled].idx, used_mask);
> + i = sched_log[scheduled].i;
> + w = sched_log[scheduled].w;
> + sched_log[scheduled].idx++;
> + goto redo;
> + }
>
> if (assign)
> - assign[i] = j;
> + assign[i] = idx;
> +
> num--;
> + sched_log[scheduled].i = i;
> + sched_log[scheduled].w = w;
> + sched_log[scheduled].idx = idx;
> + scheduled++;
> }
> }
> done:
> --
> 1.7.3.4
>
>
>
* Robert Richter <[email protected]> wrote:
> > I'd really prefer not to do this for .39, and I'll have to sit down and
> > actually read this code. It looks like we went from O(n^2) to O(n!) or
> > somesuch, also not much of an improvement. I'll have to analyze the solver
> > to see what it does for 'simple' constraints set to see if it will indeed
> > be more expensive than the O(n^2) solver we had.
>
> It wont be more expensive, if there is a solution. But if there is no one we
> walk all possible ways now which is something like O(n!).
So with 6 counters it would be a loop of 720, with 8 counters a loop of 40320,
with 10 counters a loop of 3628800 ... O(n!) is not fun.
Thanks,
Ingo
On 16.04.11 04:51:17, Peter Zijlstra wrote:
> Argh, crap. That's because AMD is now the first with overlapping
> constraints. Be sure to let your hardware guys know that they went from
> top to bottom om my appreciation list. AMD used to have no constraints
> and now they have the absolute worst.
Yes, the overlapping constraints are the problem.
> I'd really prefer not to do this for .39, and I'll have to sit down and
> actually read this code. It looks like we went from O(n^2) to O(n!) or
> somesuch, also not much of an improvement. I'll have to analyze the
> solver to see what it does for 'simple' constraints set to see if it
> will indeed be more expensive than the O(n^2) solver we had.
It won't be more expensive if there is a solution. But if there is
none, we now walk all possible paths, which is something like O(n!).
Yes, we can shift this general solution after .39. Will try to find a
solution that handles the special case for family 15h as a fix for
.39.
> Also, I think this code could do with a tiny bit of comments ;-)
Will comment on the code inline in the patch.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
On 16.04.11 11:52:54, Stephane Eranian wrote:
> On Fri, Apr 15, 2011 at 5:27 PM, Robert Richter <[email protected]> wrote:
> >        event code      counter         failure         possible solution
> >
> > 1)     0x043           PMC[2:0]        0               1
> >        0x02E           PMC[3,0]        3               0
> >        0x003           PMC3            FAIL            3
> >
> I am not sure I understand this failure case. If I recall
> the scheduler looks at the weight of each event first:
>
>                                 weight
> 1)     0x043           PMC[2:0]   3
>        0x02E           PMC[3,0]   2
>        0x003           PMC3       1
>
> Then, it schedules in increasing weight order. So it will
> schedule weight 1, 2, 3. For weight 1, it will find counter3,
> for weight 2, it will take counter0, for weight 3, it will
> take counter 1 given 0 is already used.
>
> Or am I reading your example the wrong way?
No, I have to admit I made this one up and picked a wrong example
for the particular problem I was thinking about. The above works as
you described because of the constraints' weights.
I don't have an example with existing constraints, but consider the
following (theoretical) one:
counter: 3210 Failure: Solution:
e1 xx xo ox
e2 xx xo ox
e3 x x o x o x
e4 x x x F x o
The special thing about the above is that two events (e1 and e2) must
be rescheduled to schedule e4. This means that swapping counters of
only one already scheduled event is not sufficient. A counter of a
third event must be freed; this counter is then taken by the second
event.
> The fact that counter have overlapping constraints
> should not matter. In fact this is what happens with
> counters without constraints.
An event set containing constraints that meet the following condition
is problematic:

(c1->weight <= c2->weight && ((c1->cmask & c2->cmask) != c1->cmask))

Basically this means the two constraints do not overlap completely.
You then cannot select the correct counter without knowing which
events with higher weights will be scheduled later.
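A toy version of that check could look like this (the struct is a stand-in,
not the kernel's event_constraint); for the family 15h masks PMC30 (0x09,
weight 2) and PMC20 (0x07, weight 3) it reports the problematic overlap:

#include <stdbool.h>
#include <stdio.h>

/* stand-in for the kernel's event_constraint, illustrative only */
struct constraint {
        unsigned long cmask;    /* bitmask of usable counters */
        int weight;             /* number of usable counters  */
};

/* true if the lighter constraint's mask is not a subset of the heavier one's */
static bool pair_overlaps(const struct constraint *c1,
                          const struct constraint *c2)
{
        return c1->weight <= c2->weight &&
               (c1->cmask & c2->cmask) != c1->cmask;
}

int main(void)
{
        struct constraint pmc30 = { 0x09, 2 };  /* like amd_f15_PMC30 */
        struct constraint pmc20 = { 0x07, 3 };  /* like amd_f15_PMC20 */

        printf("%d\n", pair_overlaps(&pmc30, &pmc20));  /* prints 1 */
        return 0;
}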
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
On Sun, 2011-04-17 at 10:18 +0200, Ingo Molnar wrote:
> * Robert Richter <[email protected]> wrote:
>
> > > I'd really prefer not to do this for .39, and I'll have to sit down and
> > > actually read this code. It looks like we went from O(n^2) to O(n!) or
> > > somesuch, also not much of an improvement. I'll have to analyze the solver
> > > to see what it does for 'simple' constraints set to see if it will indeed
> > > be more expensive than the O(n^2) solver we had.
> >
> > It wont be more expensive, if there is a solution. But if there is no one we
> > walk all possible ways now which is something like O(n!).
>
> So with 6 counters it would be a loop of 720, with 8 counters a loop of 40320,
> with 10 counters a loop of 3628800 ... O(n!) is not fun.
Right, and we'll hit this case at least once when scheduling an
over-committed system. Intel Sandy Bridge can have 8 counters per core +
3 fixed counters, giving an n=11 situation. You do _NOT_ want to have
one 39916800 cycle loop before we determine the PMU isn't schedulable,
that's simply unacceptable.
There's a fine point between maximum PMU utilization and acceptable
performance here, and an O(n!) algorithm is really not acceptable. If
you can find a polynomial algorithm that improves the AMD-F15 situation
we can talk.
As it stands I'm tempted to have AMD suffer its terrible PMU design
decisions; if you want this fixed, fix the silicon.
On Sun, Apr 17, 2011 at 1:44 AM, Robert Richter <[email protected]> wrote:
> On 16.04.11 11:52:54, Stephane Eranian wrote:
>> On Fri, Apr 15, 2011 at 5:27 PM, Robert Richter <[email protected]> wrote:
>
>> > event code counter failure possible solution
>> >
>> > 1) 0x043 PMC[2:0] 0 1
>> > 0x02E PMC[3,0] 3 0
>> > 0x003 PMC3 FAIL 3
>> >
>> I am not sure I understand this failure case. If I recall
>> the scheduler looks at the weight of each event first:
>>
>> weight
>> 1) 0x043 PMC[2:0] 3
>> 0x02E PMC[3,0] 2
>> 0x003 PMC3 1
>>
>> Then, it schedules in increasing weight order. So it will
>> schedule weight 1, 2, 3. For weight 1, it will find counter3,
>> for weight 2, it will take counter0, for weight 3, it will
>> take counter 1 given 0 is already used.
>>
>> Or am I reading your example the wrong way?
>
> No, I have to admit this one was taken out of my mind and I picked
> a wrong example for the special problem I was thinking about. The
> above works as you described because of the constraint's weights.
>
> I don't have an example with existing constraints, but consider the
> following (theoretical) one:
>
> counter: 3210 Failure: Solution:
>
> e1 xx xo ox
> e2 xx xo ox
> e3 x x o x o x
> e4 x x x F x o
>
> The special with the above is that two events (e1 and e2) must be
> rescheduled to schedule e4. This means that swapping counters of only
> one already scheduled event is not sufficient. A counter of a third
> event must be freed, this counter is then taken by the second event.
Ok, I understand this one better now. But you admit yourself, you made it
up. Is it even possible with Fam15h?
The thing is, I think, this is not a catastrophic problem. What will
happen is that e4 cannot be scheduled with e1, e2, e3. If the events
are not in the same group, e4 will eventually get a chance to be
scheduled due to round-robin on the event list. If the events are all
in one group, then you won't be able to add e4 to the group because of
the group scheduling sanity check on creation.
So either you have to shuffle your event groups or you will incur
multiplexing. It's not like e4 will be created but never be able to
count anything.
>> The fact that counter have overlapping constraints
>> should not matter. In fact this is what happens with
>> counters without constraints.
>
> An event set containing constraints with following conditions is
> problematic:
>
> (c1->weight <= c2->weight && (c1->cmask & c2->cmask != c1->cmask))
>
> Basically this means both constraints do not overlap completly. You
> can't select then the correct counter without knowing the events with
> higher weights that will be scheduled later.
>
> -Robert
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>
>
On 17.04.11 04:53:32, Peter Zijlstra wrote:
> On Sun, 2011-04-17 at 10:18 +0200, Ingo Molnar wrote:
> > So with 6 counters it would be a loop of 720, with 8 counters a loop of 40320,
> > with 10 counters a loop of 3628800 ... O(n!) is not fun.
>
> Right, and we'll hit this case at least once when scheduling a
> over-committed system. Intel Sandy Bridge can have 8 counters per core +
> 3 fixed counters, giving an n=11 situation. You do _NOT_ want to have
> one 39916800 cycle loop before we determine the PMU isn't schedulable,
> that's simply unacceptable.
Of course it is not that bad, as the algorithm is already optimized
and we only walk through possible paths. Also, the more constraints we
have, the less we have to walk. So let's assume a worst case of 8
unconstrained counters: I reimplemented the algorithm in the attached
perl script and counted 251 loops. Here are the numbers I got
depending on the number of counters:
$ perl counter-scheduling.pl | grep Num
Number of counters: 2, loops: 10, redos: 4, ratio: 2.5
Number of counters: 3, loops: 26, redos: 7, ratio: 3.7
Number of counters: 4, loops: 53, redos: 11, ratio: 4.8
Number of counters: 5, loops: 89, redos: 15, ratio: 5.9
Number of counters: 6, loops: 134, redos: 19, ratio: 7.1
Number of counters: 7, loops: 188, redos: 23, ratio: 8.2
Number of counters: 8, loops: 251, redos: 27, ratio: 9.3
Number of counters: 9, loops: 323, redos: 31, ratio: 10.4
Number of counters: 10, loops: 404, redos: 35, ratio: 11.5
Number of counters: 11, loops: 494, redos: 39, ratio: 12.7
Number of counters: 12, loops: 593, redos: 43, ratio: 13.8
It seems the algorithm is about number-of-counters times slower than
the current one. I think this is worth some further consideration.
There is also some room for improvement in my algorithm.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
On 17.04.11 13:23:25, Robert Richter wrote:
> $ perl counter-scheduling.pl | grep Num
> Number of counters: 2, loops: 10, redos: 4, ratio: 2.5
> Number of counters: 3, loops: 26, redos: 7, ratio: 3.7
> Number of counters: 4, loops: 53, redos: 11, ratio: 4.8
> Number of counters: 5, loops: 89, redos: 15, ratio: 5.9
> Number of counters: 6, loops: 134, redos: 19, ratio: 7.1
> Number of counters: 7, loops: 188, redos: 23, ratio: 8.2
> Number of counters: 8, loops: 251, redos: 27, ratio: 9.3
> Number of counters: 9, loops: 323, redos: 31, ratio: 10.4
> Number of counters: 10, loops: 404, redos: 35, ratio: 11.5
> Number of counters: 11, loops: 494, redos: 39, ratio: 12.7
> Number of counters: 12, loops: 593, redos: 43, ratio: 13.8
Wrong, wrong, wrong!
Before you waste your time with this: the script has a bug in storing
the correct state:
@@ -35,7 +35,7 @@ while ($scheduled < $num_events) {
}
$used_mask |= (1 << $idx);
- push @sched_log, $idx;
+ $sched_log[$scheduled] = $idx;
printf "Scheduling event #%d on counter #%d\n", $scheduled, $idx;
$scheduled++;
}
It's not possible to do the calculation for 11 counters in reasonable
time:
$ perl counter-scheduling.pl | grep Num
Number of counters: 2, loops: 10, redos: 4, ratio: 2.5
Number of counters: 3, loops: 48, redos: 15, ratio: 3.2
Number of counters: 4, loops: 260, redos: 64, ratio: 4.1
Number of counters: 5, loops: 1630, redos: 325, ratio: 5.0
Number of counters: 6, loops: 11742, redos: 1956, ratio: 6.0
Number of counters: 7, loops: 95900, redos: 13699, ratio: 7.0
Number of counters: 8, loops: 876808, redos: 109600, ratio: 8.0
Number of counters: 9, loops: 8877690, redos: 986409, ratio: 9.0
Number of counters: 10, loops: 98641010, redos: 9864100, ratio: 10.0
Updated script attached.
Sorry,
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
Robert Richter <[email protected]> writes:
> Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
> an extra pointer chase and data cache hit.
Is that really a performance critical path?
Seems more like unnecessary obfuscation to me.
-Andi
--
[email protected] -- Speaking for myself only
This version 2 implements an algorithm which does not increase the
loop count for systems with no overlapping-counter constraints. With
overlapping-counter constraints the loop count is reduced to the
minimum and also limited by a stack depth of 2 states.
See below.
-Robert
From cd70a579995fa7da885ec25db8ebe04ba4a0c30e Mon Sep 17 00:00:00 2001
From: Robert Richter <[email protected]>
Date: Fri, 15 Apr 2011 11:04:06 +0200
Subject: [PATCH] perf, x86: Fix event scheduler for constraints with overlapping counters
The current x86 event scheduler fails to resolve scheduling problems
for certain combinations of events and constraints. This happens if
the counter mask of such an event is not a subset of any other counter
mask of a constraint with an equal or higher weight, e.g. these
constraints of the AMD family 15h PMU:
                        counter mask    weight

  amd_f15_PMC30         0x09            2       <--- overlapping counters
  amd_f15_PMC20         0x07            3
  amd_f15_PMC53         0x38            3
The scheduler then fails to find an existing solution. Here is an
example:
        event code      counter         failure         possible solution

        0x02E           PMC[3,0]        0               3
        0x043           PMC[2:0]        1               0
        0x045           PMC[2:0]        2               1
        0x046           PMC[2:0]        FAIL            2
The event scheduler may not select the correct counter in the first
cycle because it needs to know which subsequent events will be
scheduled. It may then fail to schedule the events.
To solve this, we now save the scheduler state for events with
overlapping counter constraints. If we fail to schedule the events,
we roll back to those states and try to use another free counter.

Constraints with overlapping counters are marked with a newly
introduced redo flag. We set the redo flag for such constraints to
give the scheduler a hint about which events to select for counter
rescheduling. The EVENT_CONSTRAINT_REDO() macro can be used for this.
Care must be taken, as the rescheduling algorithm is O(n!), which will
dramatically increase scheduling cycles for an over-committed system.
The number of such EVENT_CONSTRAINT_REDO() macros and their counter
masks must be kept to a minimum. Thus, the current stack is limited to
2 states to bound the number of loops the algorithm takes in the worst
case.
On systems with no overlapping-counter constraints, this
implementation does not increase the loop count compared to the
previous algorithm.
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 97 ++++++++++++++++++++++++++++++----
arch/x86/kernel/cpu/perf_event_amd.c | 2 +-
2 files changed, 87 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 224a84f..2d4cfae 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -86,6 +86,7 @@ struct event_constraint {
u64 code;
u64 cmask;
int weight;
+ int redo;
};
struct amd_nb {
@@ -144,15 +145,40 @@ struct cpu_hw_events {
struct amd_nb *amd_nb;
};
-#define __EVENT_CONSTRAINT(c, n, m, w) {\
+#define __EVENT_CONSTRAINT(c, n, m, w, r) {\
{ .idxmsk64 = (n) }, \
.code = (c), \
.cmask = (m), \
.weight = (w), \
+ .redo = (r), \
}
#define EVENT_CONSTRAINT(c, n, m) \
- __EVENT_CONSTRAINT(c, n, m, HWEIGHT(n))
+ __EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0)
+
+/*
+ * The redo flag marks event constraints with overlapping counter
+ * masks. This is the case if the counter mask of such an event is not
+ * a subset of any other counter mask of a constraint with an equal or
+ * higher weight, e.g.:
+ *
+ * c_overlaps = EVENT_CONSTRAINT_REDO(0, 0x09, 0);
+ * c_another1 = EVENT_CONSTRAINT(0, 0x07, 0);
+ * c_another2 = EVENT_CONSTRAINT(0, 0x38, 0);
+ *
+ * The event scheduler may not select the correct counter in the first
+ * cycle because it needs to know which subsequent events will be
+ * scheduled. It may fail to schedule the events then. So we set the
+ * redo flag for such constraints to give the scheduler a hint which
+ * events to select for counter rescheduling.
+ *
+ * Care must be taken as the rescheduling algorithm is O(n!) which
+ * will increase scheduling cycles for an over-commited system
+ * dramatically. The number of such EVENT_CONSTRAINT_REDO() macros
+ * and its counter masks must be kept at a minimum.
+ */
+#define EVENT_CONSTRAINT_REDO(c, n, m) \
+ __EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 1)
/*
* Constraint on the Event code.
@@ -770,11 +796,24 @@ static inline int is_x86_event(struct perf_event *event)
return event->pmu == &pmu;
}
+struct sched_state
+{
+ int i;
+ int w;
+ int idx;
+ int num;
+ unsigned long used[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+};
+
+/* Total max is X86_PMC_IDX_MAX, but we are O(n!) limited */
+#define SCHED_STATES_MAX 2
+
static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
{
struct event_constraint *c, *constraints[X86_PMC_IDX_MAX];
unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
- int i, j, w, wmax, num = 0;
+ struct sched_state sched_state[SCHED_STATES_MAX];
+ int i, idx, w, wmax, num = 0, state = 0;
struct hw_perf_event *hwc;
bitmap_zero(used_mask, X86_PMC_IDX_MAX);
@@ -838,25 +877,61 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
for (w = 1, num = n; num && w <= wmax; w++) {
/* for each event */
for (i = 0; num && i < n; i++) {
+ idx = 0;
+ redo:
c = constraints[i];
hwc = &cpuc->event_list[i]->hw;
if (c->weight != w)
continue;
- for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
- if (!test_bit(j, used_mask))
+ /* for each bit in idxmsk starting from idx */
+ while (idx < X86_PMC_IDX_MAX) {
+ idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
+ idx);
+ if (idx == X86_PMC_IDX_MAX)
break;
+ if (!__test_and_set_bit(idx, used_mask))
+ break;
+ idx++;
}
- if (j == X86_PMC_IDX_MAX)
- break;
-
- __set_bit(j, used_mask);
+ if (idx >= X86_PMC_IDX_MAX) {
+ /*
+ * roll back and try to reschedule
+ * events on other counters
+ */
+ if (!state)
+ /* no events to reschedule */
+ goto done;
+ state--;
+ /* restore previous state: */
+ i = sched_state[state].i;
+ w = sched_state[state].w;
+ idx = sched_state[state].idx;
+ num = sched_state[state].num;
+ bitmap_copy(used_mask, sched_state[state].used, n);
+ /* try next counter: */
+ clear_bit(idx, used_mask);
+ num++;
+ idx++;
+ goto redo;
+ }
if (assign)
- assign[i] = j;
+ assign[i] = idx;
+
num--;
+
+ if (c->redo && state < SCHED_STATES_MAX) {
+ /* store scheduler state: */
+ sched_state[state].i = i;
+ sched_state[state].w = w;
+ sched_state[state].idx = idx;
+ sched_state[state].num = num;
+ bitmap_copy(sched_state[state].used, used_mask, n);
+ state++;
+ }
}
}
done:
@@ -1535,7 +1610,7 @@ static int __init init_hw_perf_events(void)
unconstrained = (struct event_constraint)
__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
- 0, x86_pmu.num_counters);
+ 0, x86_pmu.num_counters, 0);
if (x86_pmu.event_constraints) {
for_each_event_constraint(c, x86_pmu.event_constraints) {
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 758cccc..21821eb 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -457,7 +457,7 @@ static __initconst const struct x86_pmu amd_pmu = {
static struct event_constraint amd_f15_PMC0 = EVENT_CONSTRAINT(0, 0x01, 0);
static struct event_constraint amd_f15_PMC20 = EVENT_CONSTRAINT(0, 0x07, 0);
static struct event_constraint amd_f15_PMC3 = EVENT_CONSTRAINT(0, 0x08, 0);
-static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT(0, 0x09, 0);
+static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_REDO(0, 0x09, 0);
static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
--
1.7.3.4
--
Advanced Micro Devices, Inc.
Operating System Research Center
On 18.04.11 16:00:57, Andi Kleen wrote:
> Robert Richter <[email protected]> writes:
>
> > Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
> > an extra pointer chase and data cache hit.
>
> Is that really a performance critical path?
>
> Seems more like unnecessary obfuscation to me.
The hottest path is in perf_pmu_disable(), which happens at least on
every task switch when calling the event scheduler.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
* Robert Richter <[email protected]> wrote:
> + if (!state)
> + /* no events to reschedule */
> + goto done;
Hm, that's 5 levels of indentation. Would it be possible to factor out the gist
of the logic into a helper and such? My eyes would be much happier!
Thanks,
Ingo
Commit-ID: 83112e688f5f05dea1e63787db9a6c16b2887a1d
Gitweb: http://git.kernel.org/tip/83112e688f5f05dea1e63787db9a6c16b2887a1d
Author: Andre Przywara <[email protected]>
AuthorDate: Sat, 16 Apr 2011 02:27:53 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 19 Apr 2011 10:07:54 +0200
perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus
With AMD CPU family 15h a unit mask was introduced for the Data Cache
Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0
(first data cache miss or streaming store to a 64 B cache line) of
this mask to properly count data cache misses.
Now we set this bit for all families and models. In case a PMU does
not implement a unit mask for event 0x041, the bit is ignored.
Signed-off-by: Andre Przywara <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/perf_event_amd.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 461f62b..4e16138 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -8,7 +8,7 @@ static __initconst const u64 amd_hw_cache_event_ids
[ C(L1D) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses */
- [ C(RESULT_MISS) ] = 0x0041, /* Data Cache Misses */
+ [ C(RESULT_MISS) ] = 0x0141, /* Data Cache Misses */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0x0142, /* Data Cache Refills :system */
Commit-ID: 855357a21744e488cbee23a47d2b124035160a87
Gitweb: http://git.kernel.org/tip/855357a21744e488cbee23a47d2b124035160a87
Author: Robert Richter <[email protected]>
AuthorDate: Sat, 16 Apr 2011 02:27:54 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 19 Apr 2011 10:07:55 +0200
perf, x86: Fix AMD family 15h FPU event constraints
Depending on the unit mask settings, some FPU events may be scheduled
only on CPU counter #3. This patch fixes this.
Signed-off-by: Robert Richter <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/perf_event_amd.c | 20 +++++++++++++++++---
1 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 4e16138..cf4e369 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -427,7 +427,9 @@ static __initconst const struct x86_pmu amd_pmu = {
*
* Exceptions:
*
+ * 0x000 FP PERF_CTL[3], PERF_CTL[5:3] (*)
* 0x003 FP PERF_CTL[3]
+ * 0x004 FP PERF_CTL[3], PERF_CTL[5:3] (*)
* 0x00B FP PERF_CTL[3]
* 0x00D FP PERF_CTL[3]
* 0x023 DE PERF_CTL[2:0]
@@ -448,6 +450,8 @@ static __initconst const struct x86_pmu amd_pmu = {
* 0x0DF LS PERF_CTL[5:0]
* 0x1D6 EX PERF_CTL[5:0]
* 0x1D8 EX PERF_CTL[5:0]
+ *
+ * (*) depending on the umask all FPU counters may be used
*/
static struct event_constraint amd_f15_PMC0 = EVENT_CONSTRAINT(0, 0x01, 0);
@@ -460,18 +464,28 @@ static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
static struct event_constraint *
amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
{
- unsigned int event_code = amd_get_event_code(&event->hw);
+ struct hw_perf_event *hwc = &event->hw;
+ unsigned int event_code = amd_get_event_code(hwc);
switch (event_code & AMD_EVENT_TYPE_MASK) {
case AMD_EVENT_FP:
switch (event_code) {
+ case 0x000:
+ if (!(hwc->config & 0x0000F000ULL))
+ break;
+ if (!(hwc->config & 0x00000F00ULL))
+ break;
+ return &amd_f15_PMC3;
+ case 0x004:
+ if (hweight_long(hwc->config & ARCH_PERFMON_EVENTSEL_UMASK) <= 1)
+ break;
+ return &amd_f15_PMC3;
case 0x003:
case 0x00B:
case 0x00D:
return &amd_f15_PMC3;
- default:
- return &amd_f15_PMC53;
}
+ return &amd_f15_PMC53;
case AMD_EVENT_LS:
case AMD_EVENT_DC:
case AMD_EVENT_EX_LS:
Commit-ID: c8e5910edf8bbe2e5c6c35a4ef2a578cc7893b25
Gitweb: http://git.kernel.org/tip/c8e5910edf8bbe2e5c6c35a4ef2a578cc7893b25
Author: Robert Richter <[email protected]>
AuthorDate: Sat, 16 Apr 2011 02:27:55 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 19 Apr 2011 10:08:12 +0200
perf, x86: Use ALTERNATIVE() to check for X86_FEATURE_PERFCTR_CORE
Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
an extra pointer chase and data cache hit.
Signed-off-by: Robert Richter <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index eed3673a..224a84f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -31,6 +31,7 @@
#include <asm/nmi.h>
#include <asm/compat.h>
#include <asm/smp.h>
+#include <asm/alternative.h>
#if 0
#undef wrmsrl
@@ -363,12 +364,18 @@ again:
return new_raw_count;
}
-/* using X86_FEATURE_PERFCTR_CORE to later implement ALTERNATIVE() here */
static inline int x86_pmu_addr_offset(int index)
{
- if (boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
- return index << 1;
- return index;
+ int offset;
+
+ /* offset = X86_FEATURE_PERFCTR_CORE ? index << 1 : index */
+ alternative_io(ASM_NOP2,
+ "shll $1, %%eax",
+ X86_FEATURE_PERFCTR_CORE,
+ "=a" (offset),
+ "a" (index));
+
+ return offset;
}
static inline unsigned int x86_pmu_config_addr(int index)
On 19.04.11 07:29:33, Ingo Molnar wrote:
>
> * Robert Richter <[email protected]> wrote:
>
> > + if (!state)
> > + /* no events to reschedule */
> > + goto done;
>
> Hm, that's 5 levels of indentation. Would it be possible to factor out the gist
> of the logic into a helper and such? My eyes would be much happier!
Hmm, I was thinking about this too, but it is not easy since we also
control the loops with gotos and updates of the loop state.
Refactoring out the inner block would mean updating state variables
via pointers and introducing return codes which carry control flow
information. I don't think this will become easier.
We could probably merge both for-loops into one while-loop, but at the
cost of code readability.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
On Tue, Apr 19, 2011 at 12:39:27PM +0200, Robert Richter wrote:
> On 18.04.11 16:00:57, Andi Kleen wrote:
> > Robert Richter <[email protected]> writes:
> >
> > > Using ALTERNATIVE() when checking for X86_FEATURE_PERFCTR_CORE avoids
> > > an extra pointer chase and data cache hit.
> >
> > Is that really a performance critical path?
> >
> > Seems more like unnecessary obfuscation to me.
>
> We hotest path is in perf_pmu_disable(), which happens at least with
> every task switch when calling the event scheduler.
Yes, but that's already a slow path, isn't it? It had better be,
because the MSR accesses alone are incredibly expensive. I guess your
test and jump isn't even on the radar after that ...
-Andi
On 19.04.11 15:55:18, Robert Richter wrote:
> On 19.04.11 07:29:33, Ingo Molnar wrote:
> >
> > * Robert Richter <[email protected]> wrote:
> >
> > > + if (!state)
> > > + /* no events to reschedule */
> > > + goto done;
> >
> > Hm, that's 5 levels of indentation. Would it be possible to factor out the gist
> > of the logic into a helper and such? My eyes would be much happier!
>
> Hmm, I was thinking about this too, but it is not easy since we
> control also the loops with goto's and updates of the loop
> states. Refactoring out the inner block would mean to update state
> varables via pointers and to introduce return codes which contains
> control flow information. Don't think this will become easier.
>
> We could probably merge both for-loops in one while-loop but also on
> costs of code readability.
Ingo, do you still want me to rework it?
Any other comments on this patch?
Thanks,
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
So I'm mostly ok with this, just a few nits..
On Tue, 2011-04-19 at 12:26 +0200, Robert Richter wrote:
> @@ -838,25 +877,61 @@ static int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
> for (w = 1, num = n; num && w <= wmax; w++) {
> /* for each event */
> for (i = 0; num && i < n; i++) {
> + idx = 0;
> + redo:
labels go on column 0.
> c = constraints[i];
> hwc = &cpuc->event_list[i]->hw;
>
> if (c->weight != w)
> continue;
>
> - for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> - if (!test_bit(j, used_mask))
> + /* for each bit in idxmsk starting from idx */
> + while (idx < X86_PMC_IDX_MAX) {
> + idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> + idx);
I'd be mighty tempted to ignore that 80 column rule here ;-)
> + if (idx == X86_PMC_IDX_MAX)
> break;
> + if (!__test_and_set_bit(idx, used_mask))
> + break;
> + idx++;
> }
>
> - if (j == X86_PMC_IDX_MAX)
> - break;
> -
> - __set_bit(j, used_mask);
> + if (idx >= X86_PMC_IDX_MAX) {
> + /*
> + * roll back and try to reschedule
> + * events on other counters
> + */
> + if (!state)
> + /* no events to reschedule */
> + goto done;
Multi-line indents get { } even if not strictly needed; alternatively
we can place the comment on the same line as the goto.
> + state--;
> + /* restore previous state: */
> + i = sched_state[state].i;
> + w = sched_state[state].w;
> + idx = sched_state[state].idx;
> + num = sched_state[state].num;
> + bitmap_copy(used_mask, sched_state[state].used, n);
> + /* try next counter: */
> + clear_bit(idx, used_mask);
> + num++;
Suppose we put the num--; bit below the if (c->redo) block, then we can
remove this num++;, right?
> + idx++;
> + goto redo;
> + }
>
> if (assign)
> - assign[i] = j;
> + assign[i] = idx;
> +
> num--;
> +
> + if (c->redo && state < SCHED_STATES_MAX) {
> + /* store scheduler state: */
> + sched_state[state].i = i;
> + sched_state[state].w = w;
> + sched_state[state].idx = idx;
> + sched_state[state].num = num;
> + bitmap_copy(sched_state[state].used, used_mask, n);
> + state++;
> + }
> }
> }
* Peter Zijlstra <[email protected]> wrote:
> > if (c->weight != w)
> > continue;
> >
> > - for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > - if (!test_bit(j, used_mask))
> > + /* for each bit in idxmsk starting from idx */
> > + while (idx < X86_PMC_IDX_MAX) {
> > + idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > + idx);
>
> I'd be mighty tempted to ignore that 80 column rule here ;-)
Please put the body of the loop into a helper function, the function is large
and there are countless col80 uglinesses in it!
Thanks,
Ingo
On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > > if (c->weight != w)
> > > continue;
> > >
> > > - for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > - if (!test_bit(j, used_mask))
> > > + /* for each bit in idxmsk starting from idx */
> > > + while (idx < X86_PMC_IDX_MAX) {
> > > + idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > + idx);
> >
> > I'd be mighty tempted to ignore that 80 column rule here ;-)
>
> Please put the body of the loop into a helper function, the function is large
> and there are countless col80 uglinesses in it!
I just tried that, it's really ugly due to the amount of state you
need to pass around.
On 18.05.11 17:36:53, Peter Zijlstra wrote:
> On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > > if (c->weight != w)
> > > > continue;
> > > >
> > > > - for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > > - if (!test_bit(j, used_mask))
> > > > + /* for each bit in idxmsk starting from idx */
> > > > + while (idx < X86_PMC_IDX_MAX) {
> > > > + idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > > + idx);
> > >
> > > I'd be mighty tempted to ignore that 80 column rule here ;-)
> >
> > Please put the body of the loop into a helper function, the function is large
> > and there are countless col80 uglinesses in it!
>
> I just tried that, its real ugly due to the amount of state you need to
> pass around.
I will try to improve the patch and send a new version.
Thanks for the review.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
* Peter Zijlstra <[email protected]> wrote:
> On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > > if (c->weight != w)
> > > > continue;
> > > >
> > > > - for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > > - if (!test_bit(j, used_mask))
> > > > + /* for each bit in idxmsk starting from idx */
> > > > + while (idx < X86_PMC_IDX_MAX) {
> > > > + idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > > + idx);
> > >
> > > I'd be mighty tempted to ignore that 80 column rule here ;-)
> >
> > Please put the body of the loop into a helper function, the function is large
> > and there are countless col80 uglinesses in it!
>
> I just tried that, its real ugly due to the amount of state you need to
> pass around.
Does it help if you put that state into a helper structure?
Thanks,
Ingo
On 19.05.11 14:06:50, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Wed, 2011-05-18 at 23:20 +0200, Ingo Molnar wrote:
> > > * Peter Zijlstra <[email protected]> wrote:
> > >
> > > > > if (c->weight != w)
> > > > > continue;
> > > > >
> > > > > - for_each_set_bit(j, c->idxmsk, X86_PMC_IDX_MAX) {
> > > > > - if (!test_bit(j, used_mask))
> > > > > + /* for each bit in idxmsk starting from idx */
> > > > > + while (idx < X86_PMC_IDX_MAX) {
> > > > > + idx = find_next_bit(c->idxmsk, X86_PMC_IDX_MAX,
> > > > > + idx);
> > > >
> > > > I'd be mighty tempted to ignore that 80 column rule here ;-)
> > >
> > > Please put the body of the loop into a helper function, the function is large
> > > and there are countless col80 uglinesses in it!
> >
> > I just tried that, its real ugly due to the amount of state you need to
> > pass around.
>
> Does it help if you put that state into a helper structure?
Yes, this is what I have in mind too. We could iterate on such a state
structure instead of a couple of individual variables. Storing and
restoring the state would then just be copying the structure.
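A minimal sketch of what such a helper structure could look like (names are
illustrative, and the state is assumed to boil down to the loop variables
plus the used-counter bitmap):

#include <stdbool.h>

#define SCHED_STATES_MAX 2      /* as in the patch: stack of two states */

struct sched_state {
        int i, w, idx, num;     /* loop variables of the scheduler  */
        unsigned long used;     /* bitmap of counters already taken */
};

struct sched_stack {
        struct sched_state entries[SCHED_STATES_MAX];
        int nr;
};

/* save: a single structure copy */
static bool push_state(struct sched_stack *st, const struct sched_state *s)
{
        if (st->nr >= SCHED_STATES_MAX)
                return false;
        st->entries[st->nr++] = *s;
        return true;
}

/* restore: another structure copy, or nothing left to roll back to */
static bool pop_state(struct sched_stack *st, struct sched_state *s)
{
        if (!st->nr)
                return false;
        *s = st->entries[--st->nr];
        return true;
}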
-Robert
>
> Thanks,
>
> Ingo
>
--
Advanced Micro Devices, Inc.
Operating System Research Center