I would caution against having hyperthreading on by default in the 2.4.19
release. I am seeing a significant degrade in network workloads on P4 with
hyperthreading on. On 2.4.19-pre10, I get 788 Mbps on NetBench, but on
2.4.19-rc1 (and probably rc3, should know in an hour), I get 690 Mbps. It is
clearly a hyperthreading/interrupt routing issue. On this system (4 x P4),
with no hyperthreading, there is enough CPU to handle all interrupts on CPU0
(this is where all ints go by default). With hyperthreading on, I get "1/2"
of a CPU for interrupt processing. What ends up happening is that CPU0 is
at 100%, while CPU1-CPU7 are at 75%. Now I know the "noht" option is available, but
since hyperthreading is not proven to give a performance boost to more than
1/2 of the common workloads for linux users (or is it? who has done tests?),
I'd like to see this default behavior reversed, and still use acpismp=force
to enable hyperthreading.
Also, if anyone has performance results for their workloads showing a boost
with hyperthreading, I would really like to know.
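For anyone who wants to experiment with steering interrupts away from CPU0 by hand, here is a minimal sketch. It assumes the kernel exposes /proc/irq/<n>/smp_affinity and that the file takes a hexadecimal CPU bitmask. The helper names cpu_mask() and format_affinity() are made up for illustration, and nothing here was run on the 4-way system described above.

```c
#include <stddef.h>
#include <stdio.h>

/* Build a bitmask with one bit set per CPU number in cpus[0..n-1].
 * CPU numbers that do not fit in an unsigned long are ignored. */
unsigned long cpu_mask(const int *cpus, size_t n)
{
    unsigned long mask = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (cpus[i] >= 0 && (size_t)cpus[i] < sizeof(mask) * 8)
            mask |= 1UL << cpus[i];
    }
    return mask;
}

/* Format the mask as the hex string one would write to smp_affinity.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_affinity(unsigned long mask, char *buf, size_t len)
{
    int written = snprintf(buf, len, "%lx", mask);

    return (written < 0 || (size_t)written >= len) ? -1 : written;
}
```

For example, cpu_mask() over CPUs 2 and 3 gives 0xc, so the string to write would be "c". Which IRQ number belongs to the network card depends on the machine and has to be looked up in /proc/interrupts.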
-Andrew Theurer
<<So here goes rc3. Another -rc is going to come only in the case of a really
critical problem.
I'm attaching the rc2->rc3 changelog only because the full changelog got
too big (I guess that's why my -rc2 announce mail didn't go to lk).>>
On Mon, 2002-07-29 at 20:54, Andrew Theurer wrote:
> I would caution against having hyperthreading on by default in the 2.4.19
> release. I am seeing a significant degrade in network workloads on P4 with
> hyperthreading on. On 2.4.19-pre10, I get 788 Mbps on NetBench, but on
> 2.4.19-rc1 (and probably rc3, should know in an hour), I get 690 Mbps. It is
> clearly a hyperthreading/interrupt routing issue. On this system (4 x P4),
Quite possibly. I've just merged the O(1) scheduler load balancing fixes
for the hyperthreading stuff; rc3 uses the old scheduler, so that isn't
your problem. For most workloads I see a speedup. The more cache
optimised the workload, the less the speedup.
It's quite possible the irq routing ought to be smarter; at the moment
I'm not sure of the best approaches.
On Mon, Jul 29, 2002 at 10:28:42PM +0100, Alan Cox wrote:
> On Mon, 2002-07-29 at 20:54, Andrew Theurer wrote:
> > I would caution against having hyperthreading on by default in the 2.4.19
> > release. I am seeing a significant degrade in network workloads on P4 with
> > hyperthreading on. On 2.4.19-pre10, I get 788 Mbps on NetBench, but on
> > 2.4.19-rc1 (and probably rc3, should know in an hour), I get 690 Mbps. It is
> > clearly a hyperthreading/interrupt routing issue. On this system (4 x P4),
>
> Quite possibly. I've just merged the O(1) scheduler load balancing fixes
> for the hyperthreading stuff, rc3 uses the old scheduler so that isnt
btw, please make sure to merge my patch, the original one had several
severe bugs.
> Its quite possible the irq routing ought to be smarter, at the moment
> I'm not sure of the best approaches.
Fixing irq routing is even simpler; it should be a three-liner. I heard
somebody used kernel threads for it, but that's certainly not needed.
(btw, for the irq routing I also recommend you merge my modified
version. The original one looked more like an attempt to make P4 SMP look
like a PIII in /proc/interrupts than a real performance improvement.
Performance improves only if the irq stops thrashing around and if it stops
overwriting the ioapic registers even when no routing change is
required.)
Andrea
On Monday 29 July 2002 4:28 pm, Alan Cox wrote:
> On Mon, 2002-07-29 at 20:54, Andrew Theurer wrote:
> > I would caution against having hyperthreading on by default in the 2.4.19
> > release. I am seeing a significant degrade in network workloads on P4
> > with hyperthreading on. On 2.4.19-pre10, I get 788 Mbps on NetBench, but
> > on 2.4.19-rc1 (and probably rc3, should know in an hour), I get 690 Mbps.
> > It is clearly a hyperthreading/interrupt routing issue. On this system
> > (4 x P4),
>
> Quite possibly. I've just merged the O(1) scheduler load balancing fixes
> for the hyperthreading stuff, rc3 uses the old scheduler so that isnt
> your problem. For most workloads I see a speed up. The more cache
> optimised the workload the less the speedup.
>
> Its quite possible the irq routing ought to be smarter, at the moment
> I'm not sure of the best approaches.
Agreed, we need some sort of irqbalance, and I intend to test with Ingo's and
Andrea's approaches. With that addition, I may even see an improvement with
hyperthreading. But for an rc release, I think it would be prudent to revert
the "new code" for default hyperthreading behavior, and attack the whole
problem in 2.4.20 or a later release.
-Andrew Theurer
On Mon, 2002-07-29 at 21:58, Andrew Theurer wrote:
> Agreed, we need some sort of irqbalance, and I intend to test with Ingo's and
> Andrea's approaches. With that addition, I may even see an improvement with
> hyperthreading. But for an rc release, I think it would be prudent to revert
> the "new code" for default hyperthreading behavior, and attack the whole
> problem in 2.4.20 or later release.
Because your personal workload is slower?
That's overkill, to say the least. Learn to use the kernel boot options.
On Mon, Jul 29, 2002 at 10:38:40PM +0200, Andrea Arcangeli wrote:
> On Mon, Jul 29, 2002 at 10:28:42PM +0100, Alan Cox wrote:
> > On Mon, 2002-07-29 at 20:54, Andrew Theurer wrote:
> > > I would caution against having hyperthreading on by default in the 2.4.19
> > > release. I am seeing a significant degrade in network workloads on P4 with
> > > hyperthreading on. On 2.4.19-pre10, I get 788 Mbps on NetBench, but on
> > > 2.4.19-rc1 (and probably rc3, should know in an hour), I get 690 Mbps. It is
> > > clearly a hyperthreading/interrupt routing issue. On this system (4 x P4),
> >
> > Quite possibly. I've just merged the O(1) scheduler load balancing fixes
> > for the hyperthreading stuff, rc3 uses the old scheduler so that isnt
>
> btw, please make sure to merge my patch, the original one had several
> severe bugs.
and the new one had a bug too :). Please merge the fix I posted to l-k
too, thanks.
Andrea
On Monday 29 July 2002 7:37 pm, Alan Cox wrote:
> On Mon, 2002-07-29 at 21:58, Andrew Theurer wrote:
> > Agreed, we need some sort of irqbalance, and I intend to test with Ingo's
> > and Andrea's approaches. With that addition, I may even see an
> > improvement with hyperthreading. But for an rc release, I think it would
> > be prudent to revert the "new code" for default hyperthreading behavior,
> > and attack the whole problem in 2.4.20 or later release.
>
> Because your personal workload is slower ?
Well, I would think some here would be interested in samba performance.
However, if 4-way P4 systems are considered rare at this point I guess it's
not important enough to revert. FYI, I tested 2P with and without
hyperthreading, and it's much faster with it: 481 Mbps with no hyperthreading
and 605 Mbps with. If I can get even close to that improvement with 4
processors, I'll be very happy.
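To put the NetBench figures in this thread side by side, here is a tiny sketch of the arithmetic. percent_change() is a helper name invented here; the inputs are only the throughput numbers already quoted above.

```c
/* Relative change from `before` to `after`, in percent.
 * Assumes before is non-zero. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the 4-way numbers, percent_change(788, 690) comes out at roughly -12.4%; with the 2P numbers, percent_change(481, 605) is roughly +25.8%.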
-Andrew Theurer
On Tue, 30 Jul 2002, Andrea Arcangeli wrote:
> and the new one had a bug too :). Please merge the fix I posted to l-k
> too thanks.
Judging from the patch the code seems incredibly subtle and
I'd be amazed if it doesn't break again every few weeks...
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
On Mon, Jul 29, 2002 at 08:51:51PM -0300, Rik van Riel wrote:
> On Tue, 30 Jul 2002, Andrea Arcangeli wrote:
>
> > and the new one had a bug too :). Please merge the fix I posted to l-k
> > too thanks.
>
> Judging from the patch the code seems incredibly subtle and
> I'd be amazed if it doesn't break again every few weeks...
what's subtle exactly? I found the SD_MAJOR >>4, << 8 >> 8 16 devnum <<
4 in sd.c subtle, this doesn't look subtle to me. The code simply avoids
rebalancing the current idle cpu if the sibling isn't idle too, and it
tries to idle-reschedule another idle package (with both siblings idle)
instead. The coding style is consistent with the rest of the O(1)
scheduler as far as I can tell.
Andrea
On 20020730 Rik van Riel wrote:
> On Tue, 30 Jul 2002, Andrea Arcangeli wrote:
>
> > and the new one had a bug too :). Please merge the fix I posted to l-k
> > too thanks.
>
> Judging from the patch the code seems incredibly subtle and
> I'd be amazed if it doesn't break again every few weeks...
>
How about this version (gcc-3.2 generates the same amount of assembler):
int find(int this_cpu)
{
        int i;

        for ( i = (this_cpu+1)%smp_num_cpus;
              i != this_cpu;
              i = (i+1)%smp_num_cpus )
        {
                int physical = cpu_logical_map(i);
                int sibling = cpu_sibling_map[physical];

                if (idle_cpu(physical) && idle_cpu(sibling))
                        return physical;
        }
        return -1;
}
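To try the loop above outside the kernel, a user-space sketch along these lines may help. Everything except find() itself is a mock and an assumption: an 8-CPU machine, an identity cpu_logical_map(), siblings paired as (0,1), (2,3) and so on, and a cpu_is_idle[] array standing in for the real idle_cpu() check.

```c
#include <stdbool.h>

#define NR_CPUS 8

/* Mock state standing in for the kernel's view of the machine. */
int smp_num_cpus = NR_CPUS;
bool cpu_is_idle[NR_CPUS];                 /* all CPUs busy at start */
const int cpu_sibling_map[NR_CPUS] = { 1, 0, 3, 2, 5, 4, 7, 6 };

/* Identity mapping: logical CPU i is physical CPU i in this mock. */
int cpu_logical_map(int cpu) { return cpu; }
bool idle_cpu(int cpu) { return cpu_is_idle[cpu]; }

/* Return a CPU other than this_cpu that is idle and whose sibling
 * is idle too, or -1 if there is none. */
int find(int this_cpu)
{
    int i;

    for (i = (this_cpu + 1) % smp_num_cpus;
         i != this_cpu;
         i = (i + 1) % smp_num_cpus) {
        int physical = cpu_logical_map(i);
        int sibling = cpu_sibling_map[physical];

        if (idle_cpu(physical) && idle_cpu(sibling))
            return physical;
    }
    return -1;
}
```

In this mock, marking only CPU 2 idle still makes find(0) return -1, because its sibling 3 is busy; marking both 2 and 3 idle makes find(0) return 2. Note that find(2) then returns 3, since the loop skips only this_cpu itself and not its sibling.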
On Tue, Jul 30, 2002 at 02:09:12AM +0200, J.A. Magallon wrote:
> How about this version (gcc-3.2 generates the same amount of assembler):
>
> int find(int this_cpu)
> {
>         int i;
>
>         for ( i = (this_cpu+1)%smp_num_cpus;
>               i != this_cpu;
>               i = (i+1)%smp_num_cpus )
>         {
>                 int physical = cpu_logical_map(i);
>                 int sibling = cpu_sibling_map[physical];
>
>                 if (idle_cpu(physical) && idle_cpu(sibling))
>                         return physical;
>         }
>         return -1;
> }
I also find the above a bit more readable, I'll rediff one more time
then.
Andrea
On Mon, 29 Jul 2002, Andrew Theurer wrote:
> On Monday 29 July 2002 4:28 pm, Alan Cox wrote:
> > Its quite possible the irq routing ought to be smarter, at the moment
> > I'm not sure of the best approaches.
>
> Agreed, we need some sort of irqbalance, and I intend to test with Ingo's and
> Andrea's approaches. With that addition, I may even see an improvement with
> hyperthreading. But for an rc release, I think it would be prudent to revert
> the "new code" for default hyperthreading behavior, and attack the whole
> problem in 2.4.20 or later release.
Ingo Molnar's patches for .17 and .18 worked
well for us, and balanced the ints load across all the CPUs very well.
You can find the patches I used against 2.4.18 at
http://www.hardrock.org/kernel/
BTW, this was on a production box for approximately one month,
then the box mysteriously crashed. Because our load wasn't
utilizing the hyperthreading that much, I removed acpismp=force from the
boot string.
The ints are now balanced across the 2 real CPUs.
Regards
James Bourne
--
James Bourne, Supervisor Data Centre Operations
Mount Royal College, Calgary, AB, CA
http://www.mtroyal.ab.ca
******************************************************************************
This communication is intended for the use of the recipient to which it is
addressed, and may contain confidential, personal, and or privileged
information. Please contact the sender immediately if you are not the
intended recipient of this communication, and do not copy, distribute, or
take action relying on it. Any communication received in error, or
subsequent reply, should be deleted or destroyed.
******************************************************************************
"There are only 10 types of people in this world: those who
understand binary and those who don't."
On Mon, 29 Jul 2002, Andrew Theurer wrote:
> On Monday 29 July 2002 7:37 pm, Alan Cox wrote:
> > On Mon, 2002-07-29 at 21:58, Andrew Theurer wrote:
> > > Agreed, we need some sort of irqbalance, and I intend to test with Ingo's
> > > and Andrea's approaches. With that addition, I may even see an
> > > improvement with hyperthreading. But for an rc release, I think it would
> > > be prudent to revert the "new code" for default hyperthreading behavior,
> > > and attack the whole problem in 2.4.20 or later release.
> >
> > Because your personal workload is slower ?
>
> Well, I would think some here would be interested in samba performance.
> However, If 4-way P4 systems are considered rare at this point I guess it's
> not important enough to revert. FYI, after testing 2P with and without
> hyperthreading, it's much faster. 481 Mbps for no hyperthreading and 605
> Mbps with. If I can get even close to that improvement with 4 processors,
> I'll be very happy.
Hyperthreading only helps if you are running a workload which uses a lot
of concurrent processes tying up CPU time (i.e., more runnable processes
waiting than there are CPUs). I/O won't be much affected, and I found
that turning it on actually gave a performance hit (about 5%, maybe) if
you are not fully utilizing the existing processors.
Of course, if you are already talking about a single, dual, or quad P4 at
1.8GHz or something, 5% sounds like a lot, but if you're not using it
that heavily you won't actually notice the 5% loss in performance (well,
unless you are watching some kind of image rendering software do its
thing? =).
Some tests I performed in April are at http://www.hardrock.org/HT-results/
You can see in the kernel compile output the difference.
Regards
James Bourne
--
James Bourne, Supervisor Data Centre Operations
Mount Royal College, Calgary, AB, CA
http://www.mtroyal.ab.ca
On Tue, 30 Jul 2002, J.A. Magallon wrote:
> How about this version (gcc-3.2 generates the same amount of assembler):
Now *that* is readable code!
> int find(int this_cpu)
> {
>         int i;
>
>         for ( i = (this_cpu+1)%smp_num_cpus;
>               i != this_cpu;
>               i = (i+1)%smp_num_cpus )
>         {
>                 int physical = cpu_logical_map(i);
>                 int sibling = cpu_sibling_map[physical];
>
>                 if (idle_cpu(physical) && idle_cpu(sibling))
>                         return physical;
>         }
>         return -1;
> }
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
On Tue, 30 Jul 2002, Bill Davidsen wrote:
> On Tue, 30 Jul 2002, J.A. Magallon wrote:
>
> > How about this version (gcc-3.2 generates the same amount of assembler):
>
> Now *that* is readable code!
Having code this readable is pretty much essential for
maintenance, too.
I wouldn't mind if every time I code or patch something
that isn't up to the reading standard of Mr. Magallon's
code somebody would raise his hand and/or LART me, until
the code is easily readable.
While developing the rmap VM we went through this process
for a number of iterations and the end result has been that
various people I've never heard of before managed to create
patches against the rmap code or ports of the rmap code to
2.5 that Just Worked.
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Rik van Riel <[email protected]> wrote:
> Having code this readable is pretty much essential for
> maintenance, too.
> I wouldn't mind if every time I code or patch something
> that isn't up to the reading standard of Mr. Magallon's
> code somebody would raise his hand and/or LART me, until
> the code is easily readable.
The GNU coding standards make some very sensible comments on this
subject. A very good read:
http://www.gnu.org/prep/standards_24.html
I find it interesting that a large quantity of the kernel and C
library source code I have come across recently has no comments (with
the exception of the O(1) scheduler, very nice). At the very least, I
think every function should have a comment listing all of its input
variables and what they mean, along with a rough idea of what the
function does, and what it returns, along with any assumptions. It
would make the code a *lot* easier for programmers with less than guru
levels of knowledge to understand and hack on.
--
Sam Vilain, [email protected] WWW: http://sam.vilain.net/
7D74 2A09 B2D3 C30F F78E GPG: http://sam.vilain.net/sam.asc
278A A425 30A9 05B5 2F13
Its not the size of the ship, its the size of the waves.
LITTLE RICHARD
On Wed, Jul 31, 2002 at 03:16:05PM +0100, Sam Vilain wrote:
> the exception of the O(1) scheduler, very nice). At the very least, I
> think every function should have a comment listing all of its input
> variables and what they mean, along with a rough idea of what the
> function does, and what it returns, along with any assumptions.
I would like to see at least the identifiers named sanely
(which is already the case in the Linux kernel) and ALL the assumptions
documented with BUG_ON() or something like it.
The rest can be reconstructed by reading the source. But non-local
assumptions are a nasty source of bugs :-(
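As a rough illustration of both suggestions (a header comment listing inputs and return value, plus assumptions checked with BUG_ON()), here is a user-space sketch. The BUG_ON() macro below is a stand-in written for this example, not the kernel's definition, and next_cpu() is a hypothetical function invented to have something to document.

```c
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's BUG_ON(): report and abort. */
#define BUG_ON(cond)                                          \
    do {                                                      \
        if (cond) {                                           \
            fprintf(stderr, "BUG at %s:%d: %s\n",             \
                    __FILE__, __LINE__, #cond);               \
            abort();                                          \
        }                                                     \
    } while (0)

/*
 * next_cpu - pick the CPU that follows `cpu` in round-robin order.
 * cpu:      index of the current CPU, 0 <= cpu < num_cpus
 * num_cpus: number of CPUs in the system, must be positive
 * Returns the index of the next CPU, wrapping around to 0.
 */
int next_cpu(int cpu, int num_cpus)
{
    /* The caller's contract, checked instead of silently assumed. */
    BUG_ON(num_cpus <= 0);
    BUG_ON(cpu < 0 || cpu >= num_cpus);

    return (cpu + 1) % num_cpus;
}
```

A caller that passes an out-of-range cpu gets an immediate, located failure rather than a wrong answer that surfaces somewhere else later, which is the point about non-local assumptions.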
> It would make the code a *lot* easier for programmers with less
> than guru levels of knowledge to understand and hack on.
But it shouldn't be so easy that Aunt Tillie starts submitting
feature patches without understanding the whole picture ;-)
PS: Trimmed CC a little, since these people are busy doing other
things.
Regards
Ingo Oeser
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth