2003-03-06 05:06:53

by Andrew Theurer

Subject: HT and idle = poll

The test: kernbench (average of 5 kernel compiles) with -j2 on a 2 physical/4
logical P4 system. This is on 2.5.64-HTschedB3:

idle != poll: Elapsed: 136.692s User: 249.846s System: 30.596s CPU: 204.8%
idle = poll: Elapsed: 161.868s User: 295.738s System: 32.966s CPU: 202.6%

An 18.4% increase in compile times with idle=poll (put the other way, 15.5%
less elapsed time without it).
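
As a quick check of the figures above, the gap between the two elapsed times
can be expressed relative to either run. The sketch below is illustrative
only; the helper names are made up here and are not part of kernbench.

```c
/* Percent by which the slow run exceeds the fast run, relative to the fast run. */
double percent_longer(double fast, double slow)
{
    return (slow - fast) / fast * 100.0;
}

/* Percent of the slow run's time that is saved by the fast run,
 * relative to the slow run. */
double percent_saved(double fast, double slow)
{
    return (slow - fast) / slow * 100.0;
}
```

With the elapsed times above, percent_longer(136.692, 161.868) comes to
roughly 18.4 and percent_saved(136.692, 161.868) to roughly 15.5, so the
percentage depends on which run is taken as the baseline.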

So, don't use idle=poll with HT when you know your workload has idle time! I
have not tried oprofile, but it stands to reason that this would be a
problem. There's no point in using idle=poll with oprofile and HT anyway, as
the cpu utilization is totally wrong with HT to begin with (more on that
later).

Presumably a logical cpu polling while idle uses too many cpu resources
unnecessarily and significantly affects the performance of its sibling.

-Andrew Theurer


2003-03-06 19:20:22

by Linus Torvalds

Subject: Re: HT and idle = poll

In article <[email protected]>,
Andrew Theurer <[email protected]> wrote:
>The test: kernbench (average of 5 kernel compiles) with -j2 on a 2 physical/4
>logical P4 system. This is on 2.5.64-HTschedB3:
>
>idle != poll: Elapsed: 136.692s User: 249.846s System: 30.596s CPU: 204.8%
>idle = poll: Elapsed: 161.868s User: 295.738s System: 32.966s CPU: 202.6%
>
>A 15.5% increase in compile times.
>
>So, don't use idle=poll with HT when you know your workload has idle time! I
>have not tried oprofile, but it stands to reason that this would be a
>problem. There's no point in using idle=poll with oprofile and HT anyway, as
>the cpu utilization is totally wrong with HT to begin with (more on that
>later).
>
>Presumably a logical cpu polling while idle uses too many cpu resources
>unnecessarily and significantly affects the performance of its sibling.

Btw, I think this is exactly what the new HT prescott instructions are
for: instead of having busy loops polling for a change in memory (be it
a spinlock or a "need_resched" flag), new HT CPU's will support a
"mwait" instruction.

But yes, at least for now, I really don't think you should really _ever_
use "idle=poll" on HT-enabled hardware. The idle CPU's will just suck
cycles from the real work.

Linus

2003-03-06 19:33:38

by Davide Libenzi

Subject: Re: HT and idle = poll

On Thu, 6 Mar 2003, Linus Torvalds wrote:

> But yes, at least for now, I really don't think you should really _ever_
> use "idle=poll" on HT-enabled hardware. The idle CPU's will just suck
> cycles from the real work.

Not only. The polling CPU will also shoot a storm of memory requests,
clobbering the CPU's memory I/O stages.



- Davide

2003-03-06 19:53:34

by Alan

Subject: Re: HT and idle = poll

On Thu, 2003-03-06 at 19:30, Linus Torvalds wrote:
> >So, don't use idle=poll with HT when you know your workload has idle time! I
> >have not tried oprofile, but it stands to reason that this would be a

idle=poll probably needs to be doing "rep nop" in a tight loop. That
ironically also saves more power than "hlt" on PIV last time someone
investigated.


2003-03-06 19:57:26

by Linus Torvalds

Subject: Re: HT and idle = poll


On Thu, 6 Mar 2003, Davide Libenzi wrote:
>
> Not only. The polling CPU will also shoot a storm of memory requests,
> clobbering the CPU's memory I/O stages.

Well, that would only be true with a really crappy CPU with no caches.

Polling the same location (as long as it's a pure poll, not trying to do
some locked read-modify-write cycle) should be fine. At least for
something like idle-polling, where the one location it _is_ polling should
not actually be touched by anybody else until the wakeup actually happens.
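
The distinction drawn above, between a pure poll and a locked
read-modify-write loop, can be sketched in userspace C11. This is only an
illustration under assumptions: the function names and the spin limit are
hypothetical, and it is not the kernel's idle or spinlock code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Pure poll: only loads are issued, so the cache line holding `flag`
 * can stay in the local cache until another CPU writes to it. */
bool poll_flag(atomic_int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire))
            return true;
    }
    return false;
}

/* Locked read-modify-write on every iteration: each exchange is a write,
 * so the CPU has to take the cache line exclusively each time around. */
bool spin_exchange(atomic_int *lock, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0)
            return true;    /* the lock was free and is now ours */
    }
    return false;
}
```

The first loop is the pattern an idle poll would follow; the second is the
kind of loop the parenthetical above warns about.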

Linus

2003-03-06 20:00:51

by Linus Torvalds

Subject: Re: HT and idle = poll


On 6 Mar 2003, Alan Cox wrote:
> On Thu, 2003-03-06 at 19:30, Linus Torvalds wrote:
> > >So, don't use idle=poll with HT when you know your workload has idle time! I
> > >have not tried oprofile, but it stands to reason that this would be a
>
> idle=poll probably needs to be doing "rep nop" in a tight loop.

We already do that. It's not enough. The HT thing will still steal cycles
continually, since the "rep nop" is really only equivalent to a
"sched_yield()".

Think of "rep nop" as yielding, and "mwait" as a true wait.

(I don't actually have any real information on "mwait", so I may be wrong
about the details on the new instructions. They looked obvious enough,
though).
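
For reference, a polling loop with a "rep nop" hint looks roughly like the
sketch below. It assumes GCC or Clang inline assembly; relax_hint() is a
hypothetical stand-in for the kernel's cpu_relax(), and the spin limit
exists only so the example terminates. The point is that the loop keeps
executing instructions the whole time, which is why the hint alone does not
stop the sibling from losing cycles.

```c
#include <stdbool.h>

/* Hypothetical stand-in for cpu_relax(): "rep; nop" (PAUSE) on x86,
 * a plain compiler barrier on other architectures. */
static inline void relax_hint(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Poll `flag` up to max_spins times, hinting the CPU between reads.
 * Returns true as soon as the flag is seen set. */
bool poll_with_relax(volatile int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (*flag)
            return true;
        relax_hint();
    }
    return false;
}
```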

Linus

2003-03-06 20:33:56

by Davide Libenzi

Subject: Re: HT and idle = poll

On Thu, 6 Mar 2003, Linus Torvalds wrote:

>
> On Thu, 6 Mar 2003, Davide Libenzi wrote:
> >
> > Not only. The polling CPU will also shoot a storm of memory requests,
> > clobbering the CPU's memory I/O stages.
>
> Well, that would only be true with a really crappy CPU with no caches.
>
> Polling the same location (as long as it's a pure poll, not trying to do
> some locked read-modify-write cycle) should be fine. At least for
> something like idle-polling, where the one location it _is_ polling should
> not actually be touched by anybody else until the wakeup actually happens.

We are talking about HT, aren't we? The logical CPUs share execution units,
and memory requests are issued to the CPU's memory I/O units before the
cache circuitry gets involved. Something like "while (!run);" will generate
an enormous number of memory I/O requests on the CPU's memory units, which
are shared by the logical CPUs. Even on a non-HT CPU, the above loop creates
problems with the latency of exiting the loop once the condition becomes
true. This is because of the huge number of alloc requests issued, which
must, on exiting the loop, be 1) discarded and 2) checked against reordering.
But I don't think the exit latency matters much here.



- Davide

2003-03-06 21:05:14

by Nakajima, Jun

Subject: RE: HT and idle = poll

Linus,

That's correct. Basically mwait is similar to hlt, but you can avoid IPI to wake up the processor waiting. A write to the address specified by monitor wakes up the processor, unlike hlt.

So our plan is to use monitor/mwait in the idle loop, for example, in the kernel to lower the latency.

Jun

> -----Original Message-----
> From: Linus Torvalds [mailto:[email protected]]
> Sent: Thursday, March 06, 2003 12:09 PM
> To: Alan Cox
> Cc: Linux Kernel Mailing List
> Subject: Re: HT and idle = poll
>
>
> On 6 Mar 2003, Alan Cox wrote:
> > On Thu, 2003-03-06 at 19:30, Linus Torvalds wrote:
> > > >So, don't use idle=poll with HT when you know your workload has idle
> time! I
> > > >have not tried oprofile, but it stands to reason that this would be a
> >
> > idle=poll probably needs to be doing "rep nop" in a tight loop.
>
> We already do that. It's not enough. The HT thing will still steal cycles
> continually, since the "rep nop" is really only equivalent to a
> "sched_yield()".
>
> Think of "rep nop" as yielding, and "mwait" as a true wait.
>
> (I don't actually have any real information on "mwait", so I may be wrong
> about the details on the new instructions. They looked obvious enough,
> though).
>
> Linus

2003-03-06 21:26:36

by Alan

Subject: RE: HT and idle = poll

On Thu, 2003-03-06 at 21:15, Nakajima, Jun wrote:
> Linus,
>
> That's correct. Basically mwait is similar to hlt, but you can avoid IPI to wake up the processor waiting. A write to the address specified by monitor wakes up the processor, unlike hlt.
>
> So our plan is to use monitor/mwait in the idle loop, for example, in the kernel to lower the latency.

That's nice. It means you've got the same basic instructions (although not quite the
exact same functionality) that Brian Grayson proposed four years ago with Armadillo.

2003-03-06 22:22:16

by Martin J. Bligh

Subject: Re: HT and idle = poll

> Andrew Theurer <[email protected]> wrote:
>> The test: kernbench (average of 5 kernel compiles) with -j2 on a 2 physical/4
>> logical P4 system. This is on 2.5.64-HTschedB3:
>>
>> idle != poll: Elapsed: 136.692s User: 249.846s System: 30.596s CPU: 204.8%
>> idle = poll: Elapsed: 161.868s User: 295.738s System: 32.966s CPU: 202.6%
>>
>> A 15.5% increase in compile times.
>>
>> So, don't use idle=poll with HT when you know your workload has idle time! I
>> have not tried oprofile, but it stands to reason that this would be a
>> problem. There's no point in using idle=poll with oprofile and HT anyway, as
>> the cpu utilization is totally wrong with HT to begin with (more on that
>> later).
>>
>> Presumably a logical cpu polling while idle uses too many cpu resources
>> unnecessarily and significantly affects the performance of its sibling.
>
> Btw, I think this is exactly what the new HT prescott instructions are
> for: instead of having busy loops polling for a change in memory (be it
> a spinlock or a "need_resched" flag), new HT CPU's will support a
> "mwait" instruction.
>
> But yes, at least for now, I really don't think you should really _ever_
> use "idle=poll" on HT-enabled hardware. The idle CPU's will just suck
> cycles from the real work.

BTW, could someone give a brief summary of why idle=poll is needed for
oprofile? I'd love to add it to the "documentation for dummies" file I
was writing.

M.

2003-03-06 22:25:51

by Eric Northup

Subject: Re: HT and idle = poll

On Thursday 06 March 2003 03:08 pm, Linus Torvalds wrote:
> On 6 Mar 2003, Alan Cox wrote:
> > idle=poll probably needs to be doing "rep nop" in a tight loop.
>
> We already do that. It's not enough. The HT thing will still steal cycles
> continually, since the "rep nop" is really only equivalent to a
> "sched_yield()".

(Perhaps a naive idea) Right now, there is a single "rep nop" per poll. What
happens if you unroll the loop a few times:

while (!condition) {
        cpu_relax();
        cpu_relax();
        cpu_relax();
}

? I have no HT hardware so can't test this.

-Eric

2003-03-06 23:48:32

by John Levon

Subject: Re: HT and idle = poll

On Thu, Mar 06, 2003 at 02:22:48PM -0800, Martin J. Bligh wrote:

> BTW, could someone give a brief summary of why idle=poll is needed for
> oprofile, I'd love to add it do the "documentation for dummies" file I
> was writing.

Because events like CPU_CLK_UNHALTED don't tick when the cpu is halted,
so the idle time doesn't show up properly in the kernel profile.
idle=poll doesn't hlt so the profile for poll_idle() reflects the actual
idle percentage.
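
A rough way to picture this is the idealized model below. It is only a
sketch under stated assumptions: the function name is hypothetical, it is
not oprofile code, and it treats a halted idle loop as contributing no
CPU_CLK_UNHALTED samples at all.

```c
/* Percent of unhalted-cycle samples that land in the idle loop.
 * busy_cycles: cycles spent doing real work.
 * idle_cycles: cycles spent idle.
 * idle_halts:  nonzero if the idle loop halts (hlt), zero if it polls. */
double idle_share_percent(double busy_cycles, double idle_cycles, int idle_halts)
{
    /* Assumption: halted cycles never tick the counter, so they are never sampled. */
    double sampled_idle = idle_halts ? 0.0 : idle_cycles;
    double total = busy_cycles + sampled_idle;

    return total > 0.0 ? sampled_idle / total * 100.0 : 0.0;
}
```

In this model, a machine that is busy for 600 units and idle for 400 shows
idle at 0% of the profile when the idle loop halts and at 40% when it polls,
which is the effect described above.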

Something like that anyway.

john