Observations -
The UP fix for the setup_per_cpu_areas compile
issue apparently didn't make it into 2.5.8-final,
so we had to apply the patch from 2.5.8-pre3
to get it to compile.
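(For anyone hitting the same thing: the fix essentially just makes the
UP build compile the function away. A rough sketch of the shape of it -
reconstructed from memory, not the actual pre3 patch, with the SMP body
elided:)

    #ifdef CONFIG_SMP
    /* SMP: carve out one copy of the .data.percpu section per CPU,
     * using the __per_cpu_start/__per_cpu_end linker symbols */
    static void __init setup_per_cpu_areas(void)
    {
            /* ... real per-cpu area setup ... */
    }
    #else
    /* UP: only one copy of the per-cpu data exists, nothing to set up */
    static inline void setup_per_cpu_areas(void)
    {
    }
    #endif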
That said, everything works: all services
are running, all devices are working, XFree86 is happy.
P4-B/1600, genuine Intel mobo running RH 7.2 + Rawhide
It also passes the q3a test with snappy results
:-)
Joe
J Sloan wrote:
> Observations -
>
> The up-fix for the setup_per_cpu_areas compile
> issue apparently didn't make it into 2.5.8-final,
> so we had to apply the patch from 2.5.8-pre3
> to get it to compile.
>
> That said, however, everything works, all services
> are running, all devices working, Xfree is happy.
Stop me if you've heard this one before, but
there is one additional observation:
dbench performance has regressed significantly
since 2.5.8-pre1; the performance is equivalent
up to 8 instances, but at 16 and above, 2.5.8-final
takes a nosedive. Performance at 128 instances
is approximately 20% of the throughput of
2.5.8-pre1 - which is in turn not up to 2.4.xx
performance levels. I realize that the BIO layer has
been through heavy surgery and is nowhere near
optimized, but this is just a data point...
hdparm -t shows normal performance levels,
for what it's worth
2.5.8-pre1
--------------
Throughput 151.152 MB/sec (NB=188.94 MB/sec 1511.52 MBit/sec) 1 procs
Throughput 152.177 MB/sec (NB=190.221 MB/sec 1521.77 MBit/sec) 2 procs
Throughput 151.965 MB/sec (NB=189.957 MB/sec 1519.65 MBit/sec) 4 procs
Throughput 151.068 MB/sec (NB=188.835 MB/sec 1510.68 MBit/sec) 8 procs
Throughput 43.0191 MB/sec (NB=53.7738 MB/sec 430.191 MBit/sec) 16 procs
Throughput 9.65171 MB/sec (NB=12.0646 MB/sec 96.5171 MBit/sec) 32 procs
Throughput 37.8267 MB/sec (NB=47.2833 MB/sec 378.267 MBit/sec) 64 procs
Throughput 14.0459 MB/sec (NB=17.5573 MB/sec 140.459 MBit/sec) 80 procs
Throughput 16.2971 MB/sec (NB=20.3714 MB/sec 162.971 MBit/sec) 128 procs
2.5.8-final
---------------
Throughput 152.948 MB/sec (NB=191.185 MB/sec 1529.48 MBit/sec) 1 procs
Throughput 151.597 MB/sec (NB=189.497 MB/sec 1515.97 MBit/sec) 2 procs
Throughput 150.377 MB/sec (NB=187.972 MB/sec 1503.77 MBit/sec) 4 procs
Throughput 150.159 MB/sec (NB=187.698 MB/sec 1501.59 MBit/sec) 8 procs
Throughput 7.25691 MB/sec (NB=9.07113 MB/sec 72.5691 MBit/sec) 16 procs
Throughput 6.36332 MB/sec (NB=7.95415 MB/sec 63.6332 MBit/sec) 32 procs
Throughput 5.55008 MB/sec (NB=6.9376 MB/sec 55.5008 MBit/sec) 64 procs
Throughput 5.82333 MB/sec (NB=7.27916 MB/sec 58.2333 MBit/sec) 80 procs
Throughput 3.40741 MB/sec (NB=4.25926 MB/sec 34.0741 MBit/sec) 128 procs
FWIW -
One other observation was the numerous
syslog entries generated during the test,
which were as follows:
Apr 14 20:40:35 neo kernel: invalidate: busy buffer
Apr 14 20:41:15 neo last message repeated 72 times
Apr 14 20:44:41 neo last message repeated 36 times
Apr 14 20:45:24 neo last message repeated 47 times
J Sloan wrote:
> dbench performance has regressed significantly
> since 2.5.8-pre1;
>
> 2.5.8-pre1
> --------------
> Throughput 37.8267 MB/sec (NB=47.2833 MB/sec 378.267 MBit/sec) 64 procs
> Throughput 14.0459 MB/sec (NB=17.5573 MB/sec 140.459 MBit/sec) 80 procs
> Throughput 16.2971 MB/sec (NB=20.3714 MB/sec 162.971 MBit/sec) 128 procs
>
> 2.5.8-final
> ---------------
> Throughput 5.55008 MB/sec (NB=6.9376 MB/sec 55.5008 MBit/sec) 64 procs
> Throughput 5.82333 MB/sec (NB=7.27916 MB/sec 58.2333 MBit/sec) 80 procs
> Throughput 3.40741 MB/sec (NB=4.25926 MB/sec 34.0741 MBit/sec) 128 procs
>
J Sloan wrote:
>
> ...
> dbench performance has regressed significantly
> since 2.5.8-pre1; the performance is equivalent
> up to 8 instances, but at 16 and above, 2.5.8 final
> takes a nosedive. Performance at 128 instances
> is approximately 20% of the throughput of
> 2.5.8-pre1 - which is in turn not up to 2.4.xx
> performance levels. I realize that the BIO has
> been through heavy surgery, and nowhere near
> optimized, but this is just a data point...
It's not related to BIO. dbench is all about higher-level
memory management, high-level IO scheduling and butterfly
wings.
> ...
> Throughput 151.068 MB/sec (NB=188.835 MB/sec 1510.68 MBit/sec) 8 procs
> Throughput 43.0191 MB/sec (NB=53.7738 MB/sec 430.191 MBit/sec) 16 procs
> Throughput 9.65171 MB/sec (NB=12.0646 MB/sec 96.5171 MBit/sec) 32 procs
> Throughput 37.8267 MB/sec (NB=47.2833 MB/sec 378.267 MBit/sec) 64 procs
Consider that 32 proc line for a while.
>....
> 2.5.8-final
> ---------------
> Throughput 152.948 MB/sec (NB=191.185 MB/sec 1529.48 MBit/sec) 1 procs
> Throughput 151.597 MB/sec (NB=189.497 MB/sec 1515.97 MBit/sec) 2 procs
> Throughput 150.377 MB/sec (NB=187.972 MB/sec 1503.77 MBit/sec) 4 procs
> Throughput 150.159 MB/sec (NB=187.698 MB/sec 1501.59 MBit/sec) 8 procs
> Throughput 7.25691 MB/sec (NB=9.07113 MB/sec 72.5691 MBit/sec) 16 procs
> Throughput 6.36332 MB/sec (NB=7.95415 MB/sec 63.6332 MBit/sec) 32 procs
It's obviously fallen over some cliff. Conceivably the larger readahead
window causes this. How much memory does the machine have? `dbench 64'
on a 512 meg setup certainly causes readahead thrashing. You can
stick a `printk("ouch");' into handle_ra_thrashing() and watch it...
But really, all this stuff is in churn at present. I have patches here
which take `dbench 64' on 512 megs from this:
2.5.8:
Throughput 12.7343 MB/sec (NB=15.9179 MB/sec 127.343 MBit/sec)
to this:
2.5.8-akpm:
Throughput 49.2223 MB/sec (NB=61.5278 MB/sec 492.223 MBit/sec)
This is partly by just throwing more memory at it. The gap
widens on highmem...
And that code isn't tuned yet - I do know that threads are getting
blocked by each other at the inode level. And that ext2 is serialising
itself at the lock_super() level, and that if you fix that,
threads serialise on slab's cache_chain_sem (which is pretty
amazing...).
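For the curious, the lock_super() bottleneck has this shape - every
allocating thread funnels through one per-filesystem semaphore
(paraphrased from ext2's block allocator, with arguments and error
handling trimmed; not the literal 2.5.8 code):

    int ext2_new_block(struct inode *inode, unsigned long goal, int *err)
    {
            struct super_block *sb = inode->i_sb;
            int block = 0;

            lock_super(sb);         /* one allocator at a time, per fs */
            /* ... scan the group bitmaps, claim a free block ... */
            unlock_super(sb);
            return block;
    }

With 64 dbench clients all extending files on the same filesystem, that
semaphore is where they pile up.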
Patience. 2.5.later-on will perform well. :)
-
J Sloan wrote:
>
> FWIW -
>
> One other observation was the numerous
> syslog entries generated during the test,
> which were as follows:
>
> Apr 14 20:40:35 neo kernel: invalidate: busy buffer
> Apr 14 20:41:15 neo last message repeated 72 times
> Apr 14 20:44:41 neo last message repeated 36 times
> Apr 14 20:45:24 neo last message repeated 47 times
>
If that is happening during the dbench run, then something
is wrong.
What filesystem and I/O drivers are you using? LVM?
RAID?
Please replace that line in fs/buffer.c:invalidate_bdev()
with a BUG() or show_stack(0), and send the ksymoops output.
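i.e. something along these lines (the surrounding code is paraphrased
here, not quoted - only the added lines matter):

    if (atomic_read(&bh->b_count)) {        /* buffer still in use */
            printk("invalidate: busy buffer\n");
            show_stack(0);          /* dump a backtrace for ksymoops */
            /* or, more drastic: BUG(); */
    }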
Thanks.
-
Andrew Morton wrote:
>J Sloan wrote:
>
>>
>>Apr 14 20:40:35 neo kernel: invalidate: busy buffer
>>
>
>If that is happening during the dbench run, then something
>is wrong.
>
I am reasonably sure that's when it was happening.
>
>
>What filesystem and I/O drivers are you using? LVM?
>RAID?
>
Actually just plain old ext2 on IDE drives -
>
>Please replace that line in fs:buffer.c:invalidate_bdev()
>with a BUG(), or show_stack(0), send the ksymoops output.
>
OK, will do -
Joe
Andrew Morton wrote:
>It's not related to BIO. dbench is all about higher-level
>memory management, high-level IO scheduling and butterfly
>wings.
>
Yes, no doubt, and a lot of other deep magic
which is only dimly perceived by the likes
of yours truly...
>>
>>Throughput 150.159 MB/sec (NB=187.698 MB/sec 1501.59 MBit/sec) 8 procs
>>Throughput 7.25691 MB/sec (NB=9.07113 MB/sec 72.5691 MBit/sec) 16 procs
>>Throughput 6.36332 MB/sec (NB=7.95415 MB/sec 63.6332 MBit/sec) 32 procs
>>
>
>It's obviously fallen over some cliff. Conceivably the larger readahead
>window causes this. How much memory does the machine have?
>
The box has 512 MB RAM -
>`dbench 64'
>on a 512 meg setup certainly causes readahead thrashing. You can
>stick a `printk("ouch");' into handle_ra_thrashing() and watch it...
>
hmm - OK, will try that -
Just for giggles, same machine with 2.4.19-pre4-ac4 -
Throughput 150.979 MB/sec (NB=188.723 MB/sec 1509.79 MBit/sec) 1 procs
Throughput 150.796 MB/sec (NB=188.496 MB/sec 1507.96 MBit/sec) 2 procs
Throughput 151.185 MB/sec (NB=188.982 MB/sec 1511.85 MBit/sec) 4 procs
Throughput 141.255 MB/sec (NB=176.568 MB/sec 1412.55 MBit/sec) 8 procs
Throughput 105.066 MB/sec (NB=131.332 MB/sec 1050.66 MBit/sec) 16 procs
Throughput 69.3542 MB/sec (NB=86.6928 MB/sec 693.542 MBit/sec) 32 procs
Throughput 32.4904 MB/sec (NB=40.613 MB/sec 324.904 MBit/sec) 64 procs
Throughput 30.4824 MB/sec (NB=38.103 MB/sec 304.824 MBit/sec) 80 procs
Throughput 19.0265 MB/sec (NB=23.7832 MB/sec 190.265 MBit/sec) 128 procs
>
>
>Patience. 2.5.later-on will perform well. :)
>
Oh, yes -
It's already quite usable for some workloads, and the
latency for workstation use is quite good - I am looking
forward to the maturation of this diamond in the rough
:-)
Joe
Oh well, on sparc64 setup_per_cpu_areas() simply is
not declared, since sparc64 does not use GENERIC_PER_CPU.
Then asm/cacheflush.h, required by linux/highmem.h,
does not exist.
And then PREEMPT_ACTIVE is not defined...
It seems I could not test this release under sparc64 either, sigh!
On Sun, 14 Apr 2002, J Sloan wrote:
> Observations -
>
> The up-fix for the setup_per_cpu_areas compile
> issue apparently didn't make it into 2.5.8-final,
> so we had to apply the patch from 2.5.8-pre3
> to get it to compile.
>
> That said, however, everything works, all services
> are running, all devices working, Xfree is happy.
>
> P4-B/1600, genuine intel mobo running RH 7.2+rawhide
>
> It also passes the q3a test with snappy results
>
> :-)
>
> Joe
>
From: Luigi Genoni <[email protected]>
Date: Mon, 15 Apr 2002 16:15:04 +0200 (CEST)
Oh well, on sparc64 setup_per_cpu_areas() simply is
not declared, since sparc64 does not use GENERIC_PER_CPU.
Then asm/cacheflush.h, required by linux/highmem.h,
does not exist.
And then PREEMPT_ACTIVE is not defined...
It seems I could not test this release under sparc64 either, sigh!
I just haven't pushed my tree yet; it will be fixed soon.
I've been busy with other things this weekend...