2001-10-22 19:07:59

by Shailabh Nagar

Subject: Re: [Lse-tech] Re: Preliminary results of using multiblock raw I/O



Unlike the SGI patch, the multiple block size patch continues to use buffer
heads. So the biggest atomic transfer request that can be seen by a device
driver with the multiblocksize patch is still 1 page.

Getting bigger transfers would require either a single buffer head that can
point to a multipage buffer, or not using buffer heads at all.
The former would obviously be a major change and suitable only for 2.5
(perhaps as part of the much-awaited rewrite of the block I/O
subsystem). The use of multipage transfers using a single buffer head would
also help non-raw I/O transfers. I don't know if anyone is working along
those lines.

Incidentally, the multiple block size patch doesn't check whether the
device driver can handle large requests - that's on the to-do list of
changes.
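
For reference, here is a minimal userspace sketch (not part of the patch;
the file path, alignment and request size are only illustrative) of the
kind of single large, aligned request being discussed: one read() on an
O_DIRECT file descriptor, which the kernel then carves into page-sized
buffer head submissions before anything reaches the driver.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        const size_t len = 1024 * 1024;   /* 1MB request ("abs" in the tables below) */
        void *buf;

        /* O_DIRECT needs buffer, offset and length aligned to the device
         * block size (512 bytes for /dev/raw, 1K for the fs case). */
        if (posix_memalign(&buf, 4096, len) != 0) {
                fprintf(stderr, "posix_memalign failed\n");
                return 1;
        }

        int fd = open("/mnt/test/bigfile", O_RDONLY | O_DIRECT); /* illustrative path */
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* One syscall, one logically contiguous 1MB transfer; with the
         * multiblocksize patch it is still split into per-page buffer
         * heads internally. */
        ssize_t n = read(fd, buf, len);
        if (n < 0)
                perror("read");
        else
                printf("read %zd bytes in one request\n", n);

        close(fd);
        free(buf);
        return 0;
}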


Shailabh Nagar
Enterprise Linux Group, IBM TJ Watson Research Center
(914) 945 2851, T/L 862 2851


Reto Baettig <[email protected]>@lists.sourceforge.net on 10/22/2001 03:50:16 AM:

Hi!

We had 200MB/s on 2.2.18 with the SGI raw patch and a CPU load of
approximately 10%.
On 2.4.3-12, we get 100MB/s with 100% CPU load. Is there a way of
getting even bigger transfers than one page for the aligned part? With
the SGI patch, there was much less waiting for I/O completion because
we could transfer 1MB in one chunk. I'm sorry, but I don't have time at
the moment to test the patch; I will send you our numbers as soon as
we have some time.

Good to see somebody working on it! Thanks!

Reto

Shailabh Nagar wrote:
>
> Following up on the previous mail with patches for doing multiblock raw I/O:
>
> Experiments on a 2-way, 850MHz PIII, 256K cache, 256M memory
> Running bonnie (modified to allow specification of O_DIRECT option,
> target file etc.)
> Only the block tests (rewrite,read,write) have been run. All tests
> are single threaded.
>
> BW = bandwidth in kB/s
> cpu = %CPU use
> abs = size of each I/O request
> (NOT blocksize used by underlying raw I/O mechanism !)
>
> pre2 = using kernel 2.4.13-pre2aa1
> multi = 2.4.13-pre2aa1 kernel with multiblock raw I/O patches applied
> (both /dev/raw and O_DIRECT)
>
> /dev/raw (uses 512 byte blocks)
> ===============================
>
>            rewrite              write               read
>       ------------------------------------------------------------
>         pre2     multi      pre2     multi      pre2     multi
>       ------------------------------------------------------------
> abs     BW  cpu   BW  cpu   BW  cpu   BW  cpu   BW  cpu   BW  cpu
>       ------------------------------------------------------------
> 4k     884  0.5  882  0.1 1609  0.3 1609  0.2 9841  1.5 9841  0.9
> 6k     884  0.5  882  0.2 1609  0.5 1609  0.1 9841  1.8 9841  1.2
> 16k    884  0.6  882  0.2 1609  0.3 1609  0.0 9841  2.7 9841  1.4
> 18k    884  0.4  882  0.2 1609  0.4 1607  0.1 9841  2.4 9829  1.2
> 64k    883  0.5  882  0.1 1609  0.4 1609  0.3 9841  2.0 9841  0.6
> 66k    883  0.5  882  0.2 1609  0.5 1609  0.2 9829  3.4 9829  1.0
>
> O_DIRECT : on filesystem with 1K blocksize
> ===========================================
>
>            rewrite              write               read
>       ------------------------------------------------------------
>         pre2     multi      pre2     multi      pre2     multi
>       ------------------------------------------------------------
> abs     BW  cpu   BW  cpu   BW  cpu   BW  cpu   BW  cpu   BW  cpu
>       ------------------------------------------------------------
> 4k     854  0.8  880  0.4 1527  0.5 1607  0.1 9731  2.5 9780  1.3
> 6k     856  0.4  882  0.3 1527  0.4 1607  0.1 9732  1.6 9780  0.7
> 16k    857  0.4  881  0.1 1527  0.3 1608  0.0 9732  2.2 9780  1.2
> 18k    857  0.3  882  0.2 1527  0.4 1607  0.1 9731  1.9 9780  1.0
> 64k    857  0.3  881  0.1 1526  0.4 1607  0.2 9732  1.6 9780  1.6
> 66k    856  0.4  882  0.2 1527  0.4 1607  0.2 9731  2.7 9780  1.2
>



2001-10-23 06:42:37

by Jens Axboe

Subject: Re: [Lse-tech] Re: Preliminary results of using multiblock raw I/O

On Mon, Oct 22 2001, Shailabh Nagar wrote:
>
>
> Unlike the SGI patch, the multiple block size patch continues to use buffer
> heads. So the biggest atomic transfer request that can be seen by a device
> driver with the multiblocksize patch is still 1 page.

Not so. Given a 1MB contiguous request broken into 256 pages, even if
submitted in these chunks it will be merged into the biggest possible
request the lower level driver can handle. This is typically 127kB, for
SCSI it can be as much as 512kB currently and depending on the SCSI
driver possibly even more.
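
As a toy model of that merging (not kernel code; 4K pages and a
255-sector cap are assumed), contiguous page-sized submissions collapse
into a handful of large requests:

#include <stdio.h>

int main(void)
{
        const unsigned long page_sectors = 8;    /* 4K page = 8 x 512-byte sectors */
        const unsigned long max_sectors  = 255;  /* typical per-request cap (~127kB) */
        const unsigned long total_pages  = 256;  /* a 1MB transfer */

        unsigned long requests = 0, cur = 0, submissions = 0;

        for (unsigned long p = 0; p < total_pages; p++) {
                submissions++;                   /* one buffer head per page */
                if (cur + page_sectors > max_sectors) {
                        requests++;              /* cap hit: close the request */
                        cur = 0;
                }
                cur += page_sectors;             /* contiguous, so it merges */
        }
        if (cur)
                requests++;

        printf("%lu per-page submissions merged into %lu requests\n",
               submissions, requests);
        return 0;
}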

I haven't seen the SGI rawio patch, but I'm assuming it used kiobufs to
pass a single unit of 1 meg down at a time. Yes, currently we do incur
significant overhead compared to that approach.

> Getting bigger transfers would require either a single buffer head that can
> point to a multipage buffer, or not using buffer heads at all.
> The former would obviously be a major change and suitable only for 2.5
> (perhaps as part of the much-awaited rewrite of the block I/O

Ongoing effort.

> subsystem). The use of multipage transfers using a single buffer head would
> also help non-raw I/O transfers. I don't know if anyone is working along
> those lines.

It is being worked on.

--
Jens Axboe

2001-10-23 09:59:25

by Martin Frey

Subject: RE: [Lse-tech] Re: Preliminary results of using multiblock raw I/O

>I haven't seen the SGI rawio patch, but I'm assuming it used kiobufs to
>pass a single unit of 1 meg down at a time. Yes, currently we do incur
>significant overhead compared to that approach.
>
Yes, it used kiobufs to get a gather list, set up a gather DMA out
of that list and submitted it to the SCSI layer. Depending on
the controller, 1 MB could be transferred with 0 memcopies, 1 DMA,
and 1 interrupt. 200 MB/s with 10% CPU load was really impressive.
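
Roughly, the idea (just a sketch, not the SGI code; the struct and page
size are made up for illustration) is that the pinned 1MB user buffer is
described by one scatter-gather list, which a capable controller then
consumes as a single DMA with a single completion interrupt:

#include <stdio.h>

#define PAGE_SIZE 4096UL

struct sg_entry {
        unsigned long addr;    /* stand-in for a physical page address */
        unsigned int  offset;
        unsigned int  length;
};

int main(void)
{
        const unsigned long total  = 1024 * 1024;        /* 1MB transfer */
        const unsigned long npages = total / PAGE_SIZE;  /* 256 pages */
        struct sg_entry sg[256];

        for (unsigned long i = 0; i < npages; i++) {
                sg[i].addr   = 0x100000UL + i * PAGE_SIZE;  /* fake addresses */
                sg[i].offset = 0;
                sg[i].length = PAGE_SIZE;
        }

        printf("%lu bytes described by %lu sg entries: one DMA, one interrupt\n",
               total, npages);
        return 0;
}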

Regards, Martin

2001-10-23 10:02:35

by Jens Axboe

Subject: Re: [Lse-tech] Re: Preliminary results of using multiblock raw I/O

On Tue, Oct 23 2001, Martin Frey wrote:
> >I haven't seen the SGI rawio patch, but I'm assuming it used kiobufs to
> >pass a single unit of 1 meg down at a time. Yes, currently we do incur
> >significant overhead compared to that approach.
> >
> Yes, it used kiobufs to get a gather list, set up a gather DMA out
> of that list and submitted it to the SCSI layer. Depending on
> the controller, 1 MB could be transferred with 0 memcopies, 1 DMA,
> and 1 interrupt. 200 MB/s with 10% CPU load was really impressive.

Let me repeat that the only difference between the kiobuf and the
current approach is the overhead incurred on multiple __make_request
calls. Given the current short queues, this isn't as bad as it used to
be. Of course it isn't free, though.

It's still 0 memory copies, and it can be completed with 1 interrupt and
1 DMA operation.

--
Jens Axboe

2001-10-23 16:17:25

by Alan

Subject: Re: [Lse-tech] Re: Preliminary results of using multiblock raw I/O

> request the lower level driver can handle. This is typically 127kB, for
> SCSI it can be as much as 512kB currently and depending on the SCSI

We really btw should make SCSI default to 128K - otherwise all the raid
stuff tends to go 127K, 1K, 127K, 1K and has to handle partial stripe
reads/writes.
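
The arithmetic behind that (a sketch with an assumed 64K stripe unit and
an assumed 1MB write; not md/raid code): splitting at a 255-sector cap
produces requests that straddle stripe boundaries, while a 256-sector
cap keeps every request stripe-aligned.

#include <stdio.h>

static void split(const char *name, unsigned long max_bytes)
{
        const unsigned long total = 1024 * 1024;  /* 1MB write */
        const unsigned long chunk = 64 * 1024;    /* assumed stripe unit */
        unsigned long off = 0, partial = 0;

        while (off < total) {
                unsigned long len = total - off < max_bytes ? total - off : max_bytes;
                if (off % chunk || len % chunk)
                        partial++;                /* request not stripe-aligned */
                off += len;
        }
        printf("%s: %lu requests touch partial stripes\n", name, partial);
}

int main(void)
{
        split("255 sectors (127.5K)", 255UL * 512);
        split("256 sectors (128K)  ", 256UL * 512);
        return 0;
}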

2001-10-23 17:49:33

by Jens Axboe

Subject: Re: [Lse-tech] Re: Preliminary results of using multiblock raw I/O

On Tue, Oct 23 2001, Alan Cox wrote:
> > request the lower level driver can handle. This is typically 127kB, for
> > SCSI it can be as much as 512kB currently and depending on the SCSI
>
> We really btw should make SCSI default to 128K - otherwise all the raid
> stuff tends to go 127K, 1K, 127K, 1K and has to handle partial stripe
> reads/writes.

Fine with me, the major reason for doing 255 sectors and not 256 was IDE
of course... So feel free to change the default host max_sectors to 256.

--
Jens Axboe

2001-10-23 17:58:33

by Alan

Subject: Re: [Lse-tech] Re: Preliminary results of using multiblock raw I/O

> Fine with me, the major reason for doing 255 sectors and not 256 was IDE
> of course... So feel free to change the default host max_sectors to 256.

The -ac tree currently uses 128 for IDE, I believe. I will double-check
before I tweak anything.