The -jam patchset is interesting because it starts out
with the entire -aa patchset and adds a few things.
Sometimes small differences in LMbench between -jam and -aa are
just CPU bounces on SMP. The big difference in pipe and AF/Unix
latency only shows up on SMP, but there it is very consistent. (My
K6-2 shows only small differences between -aa and -jam for pipe
and AF/Unix latency.)
You will know better what could make the difference:
These are the averages:
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel Pipe AF/Unix
----------------- ------- -------
2.4.19-pre10-aa4 33.941 70.216
2.4.19-pre10-jam2 7.877 16.699
These are the individual runs:
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
OS Pipe AF/Unix
----------------------------- ------- ------
Linux 2.4.19-pre10-aa4 33.999 73.024
Linux 2.4.19-pre10-aa4 35.829 73.261
Linux 2.4.19-pre10-aa4 16.710 74.830
Linux 2.4.19-pre10-aa4 37.221 66.354
Linux 2.4.19-pre10-aa4 36.259 68.433
Linux 2.4.19-pre10-aa4 36.429 68.215
Linux 2.4.19-pre10-aa4 35.379 77.147
Linux 2.4.19-pre10-aa4 29.300 73.641
Linux 2.4.19-pre10-aa4 35.798 64.875
Linux 2.4.19-pre10-aa4 35.691 75.433
Linux 2.4.19-pre10-aa4 35.372 73.398
Linux 2.4.19-pre10-aa4 33.516 69.183
Linux 2.4.19-pre10-aa4 34.986 69.254
Linux 2.4.19-pre10-aa4 33.743 69.893
Linux 2.4.19-pre10-aa4 32.679 71.900
Linux 2.4.19-pre10-aa4 34.131 71.812
Linux 2.4.19-pre10-aa4 33.444 72.454
Linux 2.4.19-pre10-aa4 36.531 71.956
Linux 2.4.19-pre10-aa4 37.838 69.731
Linux 2.4.19-pre10-aa4 34.359 71.522
Linux 2.4.19-pre10-aa4 33.286 71.609
Linux 2.4.19-pre10-aa4 32.361 43.533
Linux 2.4.19-pre10-aa4 31.716 74.131
Linux 2.4.19-pre10-aa4 35.218 72.001
Linux 2.4.19-pre10-aa4 36.709 67.795
Linux 2.4.19-pre10-jam2 7.9977 14.495
Linux 2.4.19-pre10-jam2 7.8406 14.044
Linux 2.4.19-pre10-jam2 7.7899 14.006
Linux 2.4.19-pre10-jam2 7.8584 13.819
Linux 2.4.19-pre10-jam2 7.8379 14.453
Linux 2.4.19-pre10-jam2 7.8781 14.156
Linux 2.4.19-pre10-jam2 7.8881 14.238
Linux 2.4.19-pre10-jam2 7.9833 14.168
Linux 2.4.19-pre10-jam2 7.7772 78.765
Linux 2.4.19-pre10-jam2 8.0816 13.703
Linux 2.4.19-pre10-jam2 7.8605 14.042
Linux 2.4.19-pre10-jam2 7.7982 13.883
Linux 2.4.19-pre10-jam2 7.6362 14.286
Linux 2.4.19-pre10-jam2 7.7480 13.989
Linux 2.4.19-pre10-jam2 7.9262 13.947
Linux 2.4.19-pre10-jam2 8.0904 14.014
Linux 2.4.19-pre10-jam2 7.8480 14.310
Linux 2.4.19-pre10-jam2 7.7982 14.171
Linux 2.4.19-pre10-jam2 7.9776 14.234
Linux 2.4.19-pre10-jam2 7.7931 14.125
Linux 2.4.19-pre10-jam2 7.8553 14.110
Linux 2.4.19-pre10-jam2 7.7294 14.285
Linux 2.4.19-pre10-jam2 8.3361 14.131
Linux 2.4.19-pre10-jam2 7.7797 14.039
Linux 2.4.19-pre10-jam2 7.8265 14.043
For pipe and af/unix bandwidth, the difference appears to just be a
CPU bounce here and there.
jam patchsets are at:
http://giga.cps.unizar.es/~magallon/linux/
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html
On 2002.07.09 [email protected] wrote:
>You will know better what could make the difference:
>
>These are the averages:
>
>*Local* Communication latencies in microseconds - smaller is better
>-------------------------------------------------------------------
>kernel Pipe AF/Unix
>----------------- ------- -------
>2.4.19-pre10-aa4 33.941 70.216
>2.4.19-pre10-jam2 7.877 16.699
>
Candidates in pre10-jam2 could be:
11-irqbalance-B1.bz2
12-smptimers-A0.bz2
13-irqrate-A1.bz2
excluding anything that has nothing to do with pipes or latency.
Could you try the latest -rc1-aa2? It also includes irqbalance, so that
would be one variable less in the equation.
I dropped smptimers and irqrate because they did not mix very well with
bproc and the O(1) scheduler, but I can try to add them again.
I have an rc1-jam2 ready, but the only important change wrt SMP could be
the P3/P4-specific memory-barrier implementation, and your box is an AMD.
??
--
J.A. Magallon \ Software is like sex: It's better when it's free
mailto:[email protected] \ -- Linus Torvalds, FSF T-shirt
Linux werewolf 2.4.19-rc1-jam2, Mandrake Linux 8.3 (Cooker) for i586
gcc (GCC) 3.1.1 (Mandrake Linux 8.3 3.1.1-0.7mdk)
On 2002.07.09 J.A. Magallon wrote:
>
>
>I have an rc1-jam2 ready, but the only important change wrt SMP could be
>the P3/P4-specific memory-barrier implementation, and your box is an AMD.
>
Oops, I just remembered: your tests are done on a Quad Xeon?
On 2002.07.09 [email protected] wrote:
>*Local* Communication latencies in microseconds - smaller is better
>-------------------------------------------------------------------
>kernel Pipe AF/Unix
>----------------- ------- -------
>2.4.19-pre10-aa4 33.941 70.216
>2.4.19-pre10-jam2 7.877 16.699
>
I took a look at your numbers:
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel Pipe AF/Unix UDP RPC/UDP TCP RPC/TCP TCPconn
----------------------------- ------- ------- ------- ------- ------- ------- -------
2.4.19-pre7-jam6 29.513 42.369 58.6165 60.7792 50.2572 82.4976 87.321
2.4.19-pre8-jam2 7.697 15.274 59.6730 60.8190 55.276 82.1297 89.416
2.4.19-pre8-jam2-nowuos 7.739 14.929 57.9326 60.5497 55.9745 81.8908 90.370
(The last line says that wake-up-sync is not responsible...)
The main changes between the first two were irqbalance and ide6->ide10.
On Tue, 9 Jul 2002, J.A. Magallon wrote:
> Oops, I just remembered: your tests are done on a Quad Xeon?
Out of interest, is that a P4/Xeon?
Cheers,
Zwane Mwaikambo
--
function.linuxpower.ca
> *Local* Communication latencies in microseconds - smaller is better
> kernel Pipe AF/Unix
> ----------------------------- ------- -------
> 2.4.19-pre7-jam6 29.513 42.369
> 2.4.19-pre8-jam2 7.697 15.274
> 2.4.19-pre8-jam2-nowuos 7.739 14.929
> (last line says that wake-up-sync is not responsible...)
> Main changes between first two were irqbalance and ide6->ide10.
The system is SCSI-only. The pre7-jam6 and pre8-jam2 .configs were
identical.
> Could you try latest -rc1-aa2 ? It includes also irqbalance,
Based on Andrea's diff logs, irqbalance appeared in 2.4.19pre10aa3.
There are small differences between the pre10-jam2 and aa irqbalance
patches. One new datapoint with pre10-jam3:
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel Pipe AF/Unix
----------------------------- ------- -------
2.4.19-pre10-jam2 7.877 16.699
2.4.19-pre10-jam3 33.133 66.825
2.4.19-pre10-aa2 34.208 62.732
2.4.19-pre10-aa4 33.941 70.216
2.4.19-rc1-aa1-1g-nio 34.989 52.704
One config difference between pre10-jam2 and pre10-jam3:
CONFIG_X86_SFENCE=y # set in pre10-jam2 only
Also, pre10-jam2 was compiled with -Os and pre10-jam3 with -O2.
> Out of interest, is that a P4/Xeon?
Quad P3/Xeon 700 MHz with 1MB cache.
On Tue, Jul 09, 2002 at 10:05:58AM -0400, [email protected] wrote:
> > *Local* Communication latencies in microseconds - smaller is better
>
> > kernel Pipe AF/Unix
> > ----------------------------- ------- -------
> > 2.4.19-pre7-jam6 29.513 42.369
> > 2.4.19-pre8-jam2 7.697 15.274
> > 2.4.19-pre8-jam2-nowuos 7.739 14.929
>
> > (last line says that wake-up-sync is not responsible...)
>
> > Main changes between first two were irqbalance and ide6->ide10.
>
> The system is scsi only. pre7-jam6 and pre8-jam2 .config's were
> identical.
>
> > Could you try latest -rc1-aa2 ? It includes also irqbalance,
>
> Based on Andrea's diff logs, irqbalance appeared in 2.4.19pre10aa3.
> There are small differences between the pre10-jam2 and aa irqbalance
> patches. One new datapoint with pre10-jam3:
>
> *Local* Communication latencies in microseconds - smaller is better
> -------------------------------------------------------------------
> kernel Pipe AF/Unix
> ----------------------------- ------- -------
> 2.4.19-pre10-jam2 7.877 16.699
> 2.4.19-pre10-jam3 33.133 66.825
> 2.4.19-pre10-aa2 34.208 62.732
> 2.4.19-pre10-aa4 33.941 70.216
> 2.4.19-rc1-aa1-1g-nio 34.989 52.704
Now, if this were AF_INET via ethernet I could imagine irqbalance
making a difference (or even irqrate, although irqrate should make no
difference until your hardware hits the limit of irqs it can handle).
But pipe and AF_UNIX should not generate any irq load (other than the
IPIs from the reschedule_task wakeups at least, but those depend only
on the scheduler; IPI delivery isn't influenced by the
irqrate/irqbalance patches). It's all transmission in software internal
to the kernel, with no hardware events and so no irqs, so I would be
very surprised if irqbalance or irqrate could make any difference. I
would look elsewhere first, at least. No idea why you're looking at
those irq-related patches for this workload.
At first glance I would say either it's a compiler issue that generates
some very inefficient code one way or the other (that seems very
unlikely, but cache effects can be quite huge in tight loops where a
very small part of the kernel is exercised), or it has something to do
with the scheduler or similar core non-irq-related areas.
>
> A config difference between pre10-jam2 and pre10-jam3 is:
> CONFIG_X86_SFENCE=y # pre10-jam2
> pre10-jam2 was compiled with -Os and pre10-jam3 with -O2.
>
> > Out of interest, is that a P4/Xeon?
>
> Quad P3/Xeon 700 mhz with 1MB cache.
>
Andrea
> both pipe and afunix should not generate any irq load (other than
> the IPI with the reschedule_task wakeups at least, but they're only
> dependent on the scheduler
there are some scheduler bits in irqbalance for cpu affinity.
irqbalance is in the two jam patchsets with low latency, and not
in the patchsets with higher latency.
On Thu, Jul 11, 2002 at 05:02:14AM -0400, [email protected] wrote:
> > both pipe and afunix should not generate any irq load (other than
> > the IPI with the reschedule_task wakeups at least, but they're only
> > dependent on the scheduler
>
> there are some scheduler bits in irqbalance for cpu affinity.
> irqbalance is in the two jam patchsets with low latency, and not
> in the patchsets with higher latency.
I don't see those scheduler bits; it only exports the idle-task info so
we know from irq context whether a cpu is idle.
Anyway, 2.4.19-pre10-jam2 consists of plain 2.4.19pre10aa2 plus a number
of patches (including irqbalance, irqrate and smptimers; btw, smptimers
reintroduces a deadlock/crash bug exploitable from userspace whose fix I
pushed into 2.4 mainline recently). So the difference has to be in the
patches in pre10jam2, because pre10aa2 is slow and jam2 is fast.
Only looking at the patches it's not clear what can make the difference.
BTW, in your new set of benchmarks rc1aa1 still seems to be compiled in
the unfair way that explains the slower I/O results, right? I don't
mind, of course, I just want to be sure.
I don't have time to do benchmarks on this myself right now, but if
somebody could apply the patches in jam2 with a binary search (I'd first
suggest backing out irqrate, smptimers and irqbalance and seeing if it's
still fast, as I expect), that would be really interesting.
Thanks,
Andrea
On Mon, 8 Jul 2002 [email protected] wrote:
> Sometimes small differences in LMbench between -jam and -aa are
> just CPU bounces on SMP. The difference for pipe and af/unix latency
> only appears on SMP too, but it is very consistent. (My k6/2
> has small differences between -aa and -jam for pipe and af/unix
> latency).
>
> You will know better what could make the difference:
>
> These are the averages:
>
> *Local* Communication latencies in microseconds - smaller is better
> -------------------------------------------------------------------
> kernel Pipe AF/Unix
> ----------------- ------- -------
> 2.4.19-pre10-aa4 33.941 70.216
> 2.4.19-pre10-jam2 7.877 16.699
Small differences? The only thing I would call small is the latency of
the jam kernel!
If (a) this is real and results in a ~5x latency reduction in
non-benchmark applications, and (b) it doesn't have some resulting
penalty (there are some free lunches in Linux), then it would be
desirable.
I have an IPC test which measures the time for a datum to move from
process A to process B and back, using various methods; I'll try to
build these kernels and test them on my next free day. I'd love to test
the latency of SysV message queues as well, since these turn out to be
good solutions to some types of N:M client-server problems.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
> 2.4.19-pre10-jam2 is composed by plain 2.4.19pre10aa2 + a number
> of patches
Yes, that makes narrowing it down to a single patch straightforward.
> BTW, in your new set of benchmarks rc1aa1 still seems to be compiled in
> the unfair why that explains the slower I/O results, right?
Yes. 2.4.19rc1aa1 did not have CONFIG_2GB or CONFIG_HIGHIO set, so
that was unfair. 2.4.19-pre10-jam[23] had 2GB and HIGHIO.
2.4.19rc1aa2 is benching with 2GB and HIGHIO now.
> I don't have time to do benchmarks on this myself right now, but if
> somebody could try to apply the patches in jam2 with a binary search
> (I'd first suggest to backout irqrate, smptimers and irqbalance and see
> if it's still fast as I expect), that would be really interesting.
Thanks for picking out the most suspect patches. Going through the
patchlogs of the 4 different jam samples, I see:
irqrate and smptimers are in pre7jam6 (high latency) and pre8jam2
(low latency), so they may not be the key patches for pipe/unix
latency.
irqbalance is in pre8jam2 and pre10jam2, which both had low latency;
irqbalance is not in pre7jam6 and pre10jam3, which had higher latency.
After 2.4.19rc1aa2 completes, I'll run the latency tests on pre10-jam2
and back out patches until the difference appears. It shouldn't take
more than a few pleasant hours, and the weekend is coming. :)