2002-09-24 13:15:28

by William Lee Irwin III

Subject: 2.5.38-mm2 dbench $N times

Taken on 32x/32G NUMA-Q:

Throughput 67.3949 MB/sec (NB=84.2436 MB/sec 673.949 MBit/sec) 16 procs
dbench 16 11.72s user 122.21s system 422% cpu 31.733 total

Throughput 95.1519 MB/sec (NB=118.94 MB/sec 951.519 MBit/sec) 32 procs
dbench 32 24.71s user 357.97s system 847% cpu 45.175 total

Throughput 93.8379 MB/sec (NB=117.297 MB/sec 938.379 MBit/sec) 64 procs
dbench 64 56.03s user 773.39s system 903% cpu 1:31.75 total

Throughput 87.2713 MB/sec (NB=109.089 MB/sec 872.713 MBit/sec) 128 procs
dbench 128 116.31s user 1524.85s system 840% cpu 3:15.16 total

Throughput 84.454 MB/sec (NB=105.567 MB/sec 844.54 MBit/sec) 192 procs
dbench 192 180.64s user 2293.04s system 821% cpu 5:01.13 total

Throughput 82.9662 MB/sec (NB=103.708 MB/sec 829.662 MBit/sec) 224 procs
dbench 224 212.30s user 2716.77s system 820% cpu 5:57.15 total

Throughput 37.9382 MB/sec (NB=47.4227 MB/sec 379.382 MBit/sec) 256 procs
dbench 256 237.38s user 3115.41s system 376% cpu 14:51.40 total

Throughput 25.7546 MB/sec (NB=32.1932 MB/sec 257.546 MBit/sec) 512 procs
dbench 512 465.96s user 5980.49s system 245% cpu 43:45.79 total
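
The "user/system/cpu/total" lines are zsh's time builtin output; a sweep
like the one above can be driven by a one-liner of this shape (assuming
dbench sits in the working directory, on the filesystem under test):

  for n in 16 32 64 128 192 224 256 512; do time ./dbench $n; done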


Cheers,
Bill


2002-09-24 17:44:30

by Andrew Morton

Subject: Re: 2.5.38-mm2 dbench $N times

William Lee Irwin III wrote:
>
> Taken on 32x/32G NUMA-Q:
>
> Throughput 67.3949 MB/sec (NB=84.2436 MB/sec 673.949 MBit/sec) 16 procs
> dbench 16 11.72s user 122.21s system 422% cpu 31.733 total
>

Taken on 2x/0.8G el-scruffo PC:

Throughput 135.02 MB/sec (NB=168.775 MB/sec 1350.2 MBit/sec)
./dbench 16 12.11s user 16.29s system 181% cpu 15.646 total

What's up with that?

2002-09-25 01:13:09

by William Lee Irwin III

Subject: Re: 2.5.38-mm2 dbench $N times

At some point in the past, Andrew Morton wrote:
>> dbench 16 on that sort of machine is a memory bandwidth test.
>> And a dcache lock exerciser. It basically doesn't touch the
>> disk. Something very bad is happening.
>> Anton can get 3000 MByte/sec ;)


On Tue, Sep 24, 2002 at 06:08:59PM -0700, Dave Hansen wrote:
> Bill's machine cost around $50, plus the cost to repair the walls that I
> crushed when hauling the pieces around. Anton's cost $2 million. Bill
> wins :)
> Are you trying to bind the processes anywhere? I wonder what would happen
> if you make it always run quad 0...

It's probably more an artifact of not having substantial I/O subsystems.
This is basically a single-JBOD test.



Cheers,
Bill

2002-09-25 01:04:13

by Dave Hansen

Subject: Re: 2.5.38-mm2 dbench $N times

On Tue, 24 Sep 2002, Andrew Morton wrote:

> William Lee Irwin III wrote:
> >
> > William Lee Irwin III wrote:
> > >> Taken on 32x/32G NUMA-Q:
> > >> Throughput 67.3949 MB/sec (NB=84.2436 MB/sec 673.949 MBit/sec) 16 procs
> > >> dbench 16 11.72s user 122.21s system 422% cpu 31.733 total
>
> dbench 16 on that sort of machine is a memory bandwidth test.
> And a dcache lock exerciser. It basically doesn't touch the
> disk. Something very bad is happening.
>
> Anton can get 3000 MByte/sec ;)

Bill's machine cost around $50, plus the cost to repair the walls that I
crushed when hauling the pieces around. Anton's cost $2 million. Bill
wins :)

Are you trying to bind the processes anywhere? I wonder what would happen
if you make it always run quad 0...
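
For reference, a minimal pin-then-exec wrapper for that experiment could
look like the sketch below. Assumptions: the glibc cpu_set_t interface
to sched_setaffinity(2) (the raw 2.5-era syscall takes a plain unsigned
long bitmask instead), and 4 CPUs per quad with quad 0 owning CPUs 0-3.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* bind0: pin ourselves (and whatever we exec) to CPUs 0-3, i.e.
 * "quad 0" under the assumed 4-CPUs-per-quad layout.
 * Usage: ./bind0 ./dbench 16 */
int main(int argc, char **argv)
{
        cpu_set_t mask;
        int cpu;

        if (argc < 2) {
                fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
                return 1;
        }

        CPU_ZERO(&mask);
        for (cpu = 0; cpu < 4; cpu++)
                CPU_SET(cpu, &mask);

        /* pid 0 means the calling process; the affinity mask is
         * inherited across the exec below. */
        if (sched_setaffinity(0, sizeof(mask), &mask) < 0) {
                perror("sched_setaffinity");
                return 1;
        }

        execvp(argv[1], argv + 1);
        perror("execvp");
        return 1;
}
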
--
Dave Hansen
[email protected]

2002-09-25 00:21:15

by William Lee Irwin III

Subject: Re: 2.5.38-mm2 dbench $N times

On Tue, Sep 24, 2002 at 10:47:30AM -0700, Andrew Morton wrote:
>> Taken on 2x/0.8G el-scruffo PC:
>> Throughput 135.02 MB/sec (NB=168.775 MB/sec 1350.2 MBit/sec)
>> ./dbench 16 12.11s user 16.29s system 181% cpu 15.646 total
>> What's up with that?

On Tue, Sep 24, 2002 at 05:18:26PM -0700, William Lee Irwin III wrote:
> Not sure. This is boot bay SCSI crud, but single-disk FC looks
> *worse* for no obvious reason. Multiple disk tests do much better
> (about matching the el-scruffo PC numbers above).

Exact numbers:

Total throughput: 136.09139999999999 MB/s
dbench.log.j:Throughput 17.4581 MB/sec (NB=21.8226 MB/sec 174.581 MBit/sec) 64 procs
dbench.log.k:Throughput 17.2604 MB/sec (NB=21.5755 MB/sec 172.604 MBit/sec) 64 procs
dbench.log.l:Throughput 19.0192 MB/sec (NB=23.774 MB/sec 190.192 MBit/sec) 64 procs
dbench.log.m:Throughput 15.7826 MB/sec (NB=19.7283 MB/sec 157.826 MBit/sec) 64 procs
dbench.log.n:Throughput 15.8795 MB/sec (NB=19.8494 MB/sec 158.795 MBit/sec) 64 procs
dbench.log.o:Throughput 17.621 MB/sec (NB=22.0263 MB/sec 176.21 MBit/sec) 64 procs
dbench.log.p:Throughput 15.489 MB/sec (NB=19.3613 MB/sec 154.89 MBit/sec) 64 procs
dbench.log.q:Throughput 17.5816 MB/sec (NB=21.977 MB/sec 175.816 MBit/sec) 64 procs
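
The total is just the eight per-log figures summed (the long decimal
tail gives away a floating-point sum); something along these lines
reproduces it:

  grep Throughput dbench.log.* | awk '{ s += $2 } END { print s " MB/s" }'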

2002-09-25 00:35:40

by William Lee Irwin III

Subject: Re: 2.5.38-mm2 dbench $N times

William Lee Irwin III wrote:
>> Not sure. This is boot bay SCSI crud, but single-disk FC looks
>> *worse* for no obvious reason. Multiple disk tests do much better
>> (about matching the el-scruffo PC numbers above).
>>

On Tue, Sep 24, 2002 at 05:31:39PM -0700, Andrew Morton wrote:
> dbench 16 on that sort of machine is a memory bandwidth test.
> And a dcache lock exerciser. It basically doesn't touch the
> disk. Something very bad is happening.
> Anton can get 3000 MByte/sec ;)

Hmm, this is odd (remember, dcache_rcu is in here). Profile columns are
address, sample count, percent of total samples, and symbol:

c01053dc 9194801 60.668 poll_idle
c01175db 1752528 11.5633 .text.lock.sched
c0114c08 1281763 8.45717 load_balance
c0106408 388517 2.56346 .text.lock.semaphore
c0147a4e 272571 1.79844 .text.lock.file_table
c0115080 265006 1.74853 scheduler_tick
c0132374 227403 1.50042 generic_file_write_nolock
c0115434 187759 1.23885 do_schedule
c01233c8 167905 1.10785 run_timer_tasklet
c0114778 103077 0.68011 try_to_wake_up
c010603c 100121 0.660606 __down
c0111788 96062 0.633824 smp_apic_timer_interrupt
c011fa20 70840 0.467408 tasklet_hi_action
c011f700 63125 0.416504 do_softirq
c0131880 60640 0.400107 file_read_actor
c0145dd0 53872 0.355452 generic_file_llseek
c01473e0 45819 0.302317 get_empty_filp
c0175190 35814 0.236304 ext2_new_block
c01476ac 31657 0.208875 __fput
c0146354 26755 0.176532 vfs_write
c010d718 26627 0.175687 timer_interrupt
c01a2cd0 26426 0.174361 atomic_dec_and_lock
c0123294 23022 0.151901 update_one_process

2002-09-25 00:26:32

by Andrew Morton

Subject: Re: 2.5.38-mm2 dbench $N times

William Lee Irwin III wrote:
>
> William Lee Irwin III wrote:
> >> Taken on 32x/32G NUMA-Q:
> >> Throughput 67.3949 MB/sec (NB=84.2436 MB/sec 673.949 MBit/sec) 16 procs
> >> dbench 16 11.72s user 122.21s system 422% cpu 31.733 total
>
> On Tue, Sep 24, 2002 at 10:47:30AM -0700, Andrew Morton wrote:
> > Taken on 2x/0.8G el-scruffo PC:
> > Throughput 135.02 MB/sec (NB=168.775 MB/sec 1350.2 MBit/sec)
> > ./dbench 16 12.11s user 16.29s system 181% cpu 15.646 total
> > What's up with that?
>
> Not sure. This is boot bay SCSI crud, but single-disk FC looks
> *worse* for no obvious reason. Multiple disk tests do much better
> (about matching the el-scruffo PC numbers above).
>

dbench 16 on that sort of machine is a memory bandwidth test.
And a dcache lock exerciser. It basically doesn't touch the
disk. Something very bad is happening.

Anton can get 3000 MByte/sec ;)

2002-09-25 00:41:06

by Andrew Morton

Subject: Re: 2.5.38-mm2 dbench $N times

William Lee Irwin III wrote:
>
>...
> c01053dc 9194801 60.668 poll_idle
> c01175db 1752528 11.5633 .text.lock.sched
> c0114c08 1281763 8.45717 load_balance
> c0106408 388517 2.56346 .text.lock.semaphore

lock_super in the ext2 block allocator, I bet.

Al was making noises about nailing that. Per-blockgroup
rwlocks would be neat.
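
The shape of that, roughly: hash each allocation to its block group and
take only that group's lock, so allocators working in different groups
stop serializing against each other. A self-contained userspace sketch
of the pattern, with pthread rwlocks standing in for whatever the kernel
would use (the names and group count are invented; none of this is
actual ext2 code):

#include <pthread.h>

#define NGROUPS 64              /* invented block-group count */

static pthread_rwlock_t group_lock[NGROUPS];

static void group_locks_init(void)
{
        int g;

        for (g = 0; g < NGROUPS; g++)
                pthread_rwlock_init(&group_lock[g], NULL);
}

/* Writers serialize per group: an allocation in group 5 no longer
 * contends with one in group 12, only with other group-5 work. */
static void alloc_in_group(int group)
{
        pthread_rwlock_wrlock(&group_lock[group % NGROUPS]);
        /* ... search and update this group's block bitmap ... */
        pthread_rwlock_unlock(&group_lock[group % NGROUPS]);
}

/* Readers (e.g. totting up free blocks for statfs) take the locks
 * shared, so they can all proceed in parallel. */
static void count_free_blocks(void)
{
        int g;

        for (g = 0; g < NGROUPS; g++) {
                pthread_rwlock_rdlock(&group_lock[g]);
                /* ... accumulate this group's free count ... */
                pthread_rwlock_unlock(&group_lock[g]);
        }
}

int main(void)
{
        group_locks_init();
        alloc_in_group(5);
        count_free_blocks();
        return 0;
}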

2002-09-25 00:14:18

by William Lee Irwin III

Subject: Re: 2.5.38-mm2 dbench $N times

William Lee Irwin III wrote:
>> Taken on 32x/32G NUMA-Q:
>> Throughput 67.3949 MB/sec (NB=84.2436 MB/sec 673.949 MBit/sec) 16 procs
>> dbench 16 11.72s user 122.21s system 422% cpu 31.733 total

On Tue, Sep 24, 2002 at 10:47:30AM -0700, Andrew Morton wrote:
> Taken on 2x/0.8G el-scruffo PC:
> Throughput 135.02 MB/sec (NB=168.775 MB/sec 1350.2 MBit/sec)
> ./dbench 16 12.11s user 16.29s system 181% cpu 15.646 total
> What's up with that?

Not sure. This is boot bay SCSI crud, but single-disk FC looks
*worse* for no obvious reason. Multiple disk tests do much better
(about matching the el-scruffo PC numbers above).


Cheers,
Bill

2002-09-25 06:00:34

by Martin J. Bligh

Subject: Re: 2.5.38-mm2 dbench $N times

>> Not sure. This is boot bay SCSI crud, but single-disk FC looks
>> *worse* for no obvious reason. Multiple disk tests do much better
>> (about matching the el-scruffo PC numbers above).
>
> dbench 16 on that sort of machine is a memory bandwidth test.
> And a dcache lock exerciser. It basically doesn't touch the
> disk. Something very bad is happening.

Does dbench have any sort of CPU locality between who read it
into pagecache, and who read it out again? If not, you stand a
7/8 chance of being on the wrong node, and getting 1/20 of the
mem bandwidth ....
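
Back-of-envelope with those numbers: 8 quads and random placement gives
an expected fraction of local bandwidth of

  1/8 x 1 + 7/8 x 1/20 = 0.125 + 0.04375 ~= 0.17

i.e. roughly a 6x haircut on a test that is mostly memory bandwidth.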

M.

2002-09-25 18:55:20

by Andrew Theurer

Subject: Re: 2.5.38-mm2 dbench $N times

On Wednesday 25 September 2002 1:03 am, Martin J. Bligh wrote:
> >> Not sure. This is boot bay SCSI crud, but single-disk FC looks
> >> *worse* for no obvious reason. Multiple disk tests do much better
> >> (about matching the el-scruffo PC numbers above).
> >
> > dbench 16 on that sort of machine is a memory bandwidth test.
> > And a dcache lock exerciser. It basically doesn't touch the
> > disk. Something very bad is happening.
>
> Does dbench have any sort of CPU locality between who read it
> into pagecache, and who read it out again? If not, you stand a
> 7/8 chance of being on the wrong node, and getting 1/20 of the
> mem bandwidth ....

Pretty sure each dbench child does its own write/read to only its own data.
There is no sharing that I am aware of between the processes. How about
running in tmpfs to avoid any disk IO at all?
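
Something like the following would do it, assuming enough RAM for the
working set (the dbench path is invented):

  mkdir -p /mnt/scratch && mount -t tmpfs none /mnt/scratch &&
      cd /mnt/scratch && time /path/to/dbench 512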

Also, what's the policy for home node assignment on fork? Are all of these
children getting the same home node assignment?

-Andrew

2002-09-25 20:55:55

by Martin J. Bligh

Subject: Re: 2.5.38-mm2 dbench $N times

> Pretty sure each dbench child does its own write/read to only its own data.
> There is no sharing that I am aware of between the processes.

Right, but if the processes migrate easily, there's still no CPU locality.
Bill, do you want to try binding 1/32 of the processes to each CPU, and see if
that makes your throughput increase?

> How about running in tmpfs to avoid any disk IO at all?

As far as I understand it, that won't help - we're operating out of pagecache anyway
at this level, I think.

> Also, what's the policy for home node assignment on fork? Are all of these
> children getting the same home node assignment?

Policy is random - the scheduler lacks any NUMA comprehension at the moment.
The numa sched mods would probably help a lot too.

M.

2002-09-25 21:10:51

by Andrew Theurer

Subject: Re: 2.5.38-mm2 dbench $N times

On Wednesday 25 September 2002 3:57 pm, Martin J. Bligh wrote:
> > Pretty sure each dbench child does its own write/read to only its own
> > data. There is no sharing that I am aware of between the processes.
>
> Right, but if the processes migrate easily, there's still no CPU locality.
> Bill, do you want to try binding 1/32 of the processes to each CPU, and see
> if that makes your throughput increase?

Any stats to track migration over <x> time? Would be interesting to see this
in some sort of /proc/<pid> entry, maybe?

> > How about running in tmpfs to avoid any disk IO at all?
>
> As far as I understand it, that won't help - we're operating out of
> pagecache anyway at this level, I think.

At dbench 512, that's about 512*20MB = 10GB (not 100% sure on that 20MB
figure). Have we hit a threshold at which this stuff gets written to disk?

> > Also, what's the policy for home node assignment on fork? Are all of
> > these children getting the same home node assignment?
>
> Policy is random - the scheduler lacks any NUMA comprehension at the
> moment. The numa sched mods would probably help a lot too.

Oops, I was confusing this run with the numa sched stuff earlier.

Andrew Theurer

2002-09-25 23:53:41

by William Lee Irwin III

Subject: Re: 2.5.38-mm2 dbench $N times

On Wednesday 25 September 2002 1:03 am, Martin J. Bligh wrote:
>> Does dbench have any sort of CPU locality between who read it
>> into pagecache, and who read it out again? If not, you stand a
>> 7/8 chance of being on the wrong node, and getting 1/20 of the
>> mem bandwidth ....

On Wed, Sep 25, 2002 at 01:51:58PM -0500, Andrew Theurer wrote:
> Pretty sure each dbench child does its own write/read to only its
> own data. There is no sharing that I am aware of between the processes.
> How about running in tmpfs to avoid any disk IO at all?

tmpfs needs some fixes before it can be used for that. Hugh's working
on it.


Cheers,
Bill