2002-06-24 16:33:41

by Gross, Mark

Subject: RE: [Lse-tech] RE: ext3 performance bottleneck as the number of spindles gets large

We are running the tests with the following motherboard:
http://www.intel.com/design/servers/scb2/index.htm?iid=ipp_browse+motherbd_scb2&

It's a very nice box with 2 independent 64/66 PCI buses,
capable of 2x503MB/sec using your logic ;)

Regardless, the 640MB/s number was computed without considering the PCI bus
limitations, or the fact that the Adaptec SCSI-39160 is a dual-port adapter
with a base rate of 160MB/sec per channel.
http://www.adaptec.com/worldwide/product/proddetail.html?sess=no&prodkey=ASC-39160&cat=Products

Realistically, we are looking for a max throughput of about 320MB/sec with 4
adapters and enough drives attached.
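
As a rough budget from the numbers above (assuming all four adapters run
both channels and are split evenly across the two PCI buses):

    4 adapters x 2 channels x 160MB/sec = 1280MB/sec of raw SCSI bandwidth
    2 buses x 503MB/sec                 = 1006MB/sec of PCI bandwidth

so at ~320MB/sec the expected limit is the drives themselves, not the buses.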

--mgross

(W) 503-712-8218
MS: JF1-05
2111 N.E. 25th Ave.
Hillsboro, OR 97124


> -----Original Message-----
> From: Christopher E. Brown [mailto:[email protected]]
> Sent: Saturday, June 22, 2002 9:03 PM
> To: Griffiths, Richard A
> Cc: 'Andrew Morton'; [email protected]; 'Jens Axboe'; Linux
> Kernel Mailing List; [email protected]
> Subject: [Lse-tech] RE: ext3 performance bottleneck as the number of
> spindles gets large
>
>
> On Thu, 20 Jun 2002, Griffiths, Richard A wrote:
>
> > I should have mentioned the throughput we saw on 4 adapters,
> > 6 drives, was 126MB/s. The max theoretical bus bandwidth is 640MB/s.
>
>
> This is *NOT* correct. Assuming a 64bit 66Mhz PCI bus your MAX is
> 503MB/sec minus PCI overhead...
>
> This of course assumes nothing else is using the PCI bus.
>
>
> 120 something MB/sec sounds a hell of a lot like topping out a 32bit
> 33Mhz PCI bus, but IIRC the earlier posting listed 39160 cards, PCI
> 64bit w/ backward compat to 32bit.
>
> You do have *ALL* of these cards plugged into a full PCI 64bit/66Mhz
> slot right? Not plugging them into a 32bit/33Mhz slot?
>
>
> 32bit/33Mhz (32 * 33,000,000) / (1024 * 1024 * 8) = 125.89 MByte/sec
> 64bit/33Mhz (64 * 33,000,000) / (1024 * 1024 * 8) = 251.77 MByte/sec
> 64bit/66Mhz (64 * 66,000,000) / (1024 * 1024 * 8) = 503.54 MByte/sec
>
>
> NOTE: PCI transfer rates are often listed as
>
> 32bit/33Mhz, 132 MByte/sec
> 64bit/33Mhz, 264 MByte/sec
> 64bit/66Mhz, 528 MByte/sec
>
> This is somewhat true, but only if we start from decimal Mbit rates as
> used for transmission rates (1,000,000 bits/sec), rather than from 2^20
> (1,048,576). I will not argue about PCI 32bit/33Mhz being 1056Mbit when
> talking about line rate, but when we are talking about storage media and
> transfers to/from it, as measured in file sizes, remember to convert.
>
> --
> I route, therefore you are.
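
For reference, the two MB/sec conventions in Christopher's table can be
reproduced with a few lines of C (an illustrative sketch only; the helper
name pci_rate is made up for this example):

    #include <stdio.h>

    /* Theoretical PCI throughput for a given bus width (bits) and clock
     * (Hz), reported both in decimal MB (10^6 bytes) and in binary
     * MByte (2^20 bytes) -- the two conventions discussed above. */
    static void pci_rate(int bits, long hz)
    {
        double bytes_per_sec = (double)bits * hz / 8.0;
        printf("%2dbit/%2ldMhz: %6.2f MB/s (10^6)  %6.2f MByte/s (2^20)\n",
               bits, hz / 1000000, bytes_per_sec / 1e6,
               bytes_per_sec / (1024.0 * 1024.0));
    }

    int main(void)
    {
        pci_rate(32, 33000000);  /* 132.00 vs 125.89 */
        pci_rate(64, 33000000);  /* 264.00 vs 251.77 */
        pci_rate(64, 66000000);  /* 528.00 vs 503.54 */
        return 0;
    }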


2002-06-24 17:24:39

by Eric W. Biederman

Subject: Re: [Lse-tech] RE: ext3 performance bottleneck as the number of spindles gets large

"Gross, Mark" <[email protected]> writes:

> We are running the tests with the following motherboard:
> http://www.intel.com/design/servers/scb2/index.htm?iid=ipp_browse+motherbd_scb2&
>
> It's a very nice box with 2 independent 64/66 PCI buses,
> capable of 2x503MB/sec using your logic ;)
>
> Regardless, the 640MB/s number was computed without considering the PCI bus
> limitations, or the fact that the Adaptec SCSI-39160 is a dual-port adapter
> with a base rate of 160MB/sec per channel.
> http://www.adaptec.com/worldwide/product/proddetail.html?sess=no&prodkey=ASC-39160&cat=Products
>
> Realistically, we are looking for a max throughput of about 320MB/sec with 4
> adapters and enough drives attached.

Careful: even at 320MB/sec this requires 50% of your system's theoretical
memory bandwidth in DMA transfers.

Application-level benchmarks like STREAM can only achieve memory-copy numbers
of about 320MB/sec on a PIII platform. A highly tuned MMX or SSE memory copy
can do better, but it is a challenge.
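
For a sense of where such numbers come from, a minimal memory-copy timing
loop of roughly the following shape will do (illustrative only; a real
benchmark like STREAM controls alignment, cache effects, and timing far
more carefully):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>

    #define BUF_SIZE (64 * 1024 * 1024)  /* 64MB, well beyond a PIII L2 */
    #define PASSES   8

    int main(void)
    {
        char *src = malloc(BUF_SIZE);
        char *dst = malloc(BUF_SIZE);
        struct timeval t0, t1;
        double secs, mb;
        int i;

        if (!src || !dst)
            return 1;
        memset(src, 0xaa, BUF_SIZE);  /* fault the pages in first */
        memset(dst, 0x55, BUF_SIZE);

        gettimeofday(&t0, NULL);
        for (i = 0; i < PASSES; i++)
            memcpy(dst, src, BUF_SIZE);
        gettimeofday(&t1, NULL);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        mb = (double)BUF_SIZE * PASSES / (1024.0 * 1024.0);
        printf("memcpy: %.1f MB/sec\n", mb / secs);
        free(src);
        free(dst);
        return 0;
    }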

That close to the hardware limits, finding the actual bottleneck can get
very tricky. At the very least I would run the system with just one
processor and attempt to get the numbers that way. I can trivially see
spinlock hold times staying high simply because the memory bus is busy
with a DMA transfer, and so cannot be used to transfer the new contents
of the lock.

Eric