2002-09-27 17:16:24

by James Bottomley

Subject: Re: Warning - running *really* short on DMA buffers while doing file transfers

> Which part of the OS are you talking about?

I'm not; I'm talking about the pure physical characteristics of the transport
bus.

> I also do not believe that the command overhead is as significant as
> you suggest. I've personally seen a non-packetized SCSI bus perform
> over 15K transactions per-second.

Well, let's assume the simplest setup possible: select + tag msg + 10 byte
command + disconnect + reselect + status; that's 17 bytes async. The maximum
bus speed async narrow is about 4MB/s, so those 17 bytes take around 4us to
transmit. On a wide Ultra2 bus, the data rate is about 80MB/s so it takes
50us to transmit 4k or 800us to transmit 64k. However, the major killer in
this model is going to be disconnection delay at around 200us (dwarfing
arbitration delay, bus settle time etc). For 4k packets you spend about 3
times longer arbitrating for the bus than you do transmitting data. For 64k
packets it's 25% of your data transmit time in arbitration. Your theoretical
throughput for 4k packets is thus 20MB/s. In my book that's a significant
loss on an 80MB/s bus.
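
As a rough check of the arithmetic above, here is a minimal sketch that plugs
the quoted figures into a simple per-command model. The constant and function
names are illustrative, and the model assumes one command in flight with no
overlap. Taken literally, a 200us delay gives an overhead of roughly 4x the
data time and about 16MB/s for 4k transfers; the quoted 3x and 20MB/s would
correspond to a delay nearer 150us. The conclusion is the same either way.

```python
# Rough estimates quoted in the message; not measured values.
ASYNC_NARROW_BYTES_PER_S = 4e6    # ~4MB/s asynchronous narrow transfers
ULTRA2_WIDE_BYTES_PER_S = 80e6    # ~80MB/s wide Ultra2 data phase
OVERHEAD_BYTES = 17               # select + tag msg + command + ... + status
DISCONNECT_DELAY_S = 200e-6       # ~200us disconnection delay


def transfer_model(payload_bytes):
    """Return (data_time, overhead_time, throughput) for one command."""
    data_time = payload_bytes / ULTRA2_WIDE_BYTES_PER_S
    overhead_time = OVERHEAD_BYTES / ASYNC_NARROW_BYTES_PER_S + DISCONNECT_DELAY_S
    throughput = payload_bytes / (data_time + overhead_time)
    return data_time, overhead_time, throughput


if __name__ == "__main__":
    for size in (4096, 65536):
        data_time, overhead_time, throughput = transfer_model(size)
        print(
            f"{size:6d} bytes: data {data_time * 1e6:6.1f}us, "
            f"overhead {overhead_time * 1e6:6.1f}us "
            f"({overhead_time / data_time:.2f}x data time), "
            f"throughput {throughput / 1e6:5.1f}MB/s"
        )
```

This only models a single command in flight; overlapping commands hide some
of this overhead.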

On Fabric busses, you move to the network model and collision probabilities
which increase as the packet size goes down.

[email protected] said:
> Because of read-ahead, the OS should never send 16 4k contiguous reads
> to the I/O layer for the same application.

Read-ahead is basically a very simplistic form of I/O scheduling.

> Hooks for sending ordered tags have been in the aic7xxx driver, at
> least in FreeBSD's version, since '97. As soon as the Linux cmd
> blocks have such information it will be trivial to have the aic7xxx
> driver issue the appropriate tag types.

They already do in 2.5; see scsi_populate_tag_msg() in scsi.h. This assumes
you're using the generic tag queueing, which the aic7xxx doesn't, but you
could easily key the tag type off REQ_BARRIER.
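
To illustrate what keying the tag type off REQ_BARRIER means, here is a small
model in Python. It is a sketch of the idea only, not the kernel's code: the
flag value and the function name build_tag_msg are hypothetical, while the
message codes are the SCSI-2 queue tag message values.

```python
# SCSI-2 queue tag message codes.
MSG_SIMPLE_TAG = 0x20
MSG_HEAD_TAG = 0x21
MSG_ORDERED_TAG = 0x22

# Hypothetical bit standing in for the block layer's REQ_BARRIER flag.
REQ_BARRIER = 1 << 0


def build_tag_msg(request_flags, tag):
    """Return the two-byte tag message for a queued command.

    A barrier request gets an ordered tag so the device may not reorder
    other commands around it; everything else gets a simple tag.
    """
    if request_flags & REQ_BARRIER:
        msg_type = MSG_ORDERED_TAG
    else:
        msg_type = MSG_SIMPLE_TAG
    return bytes([msg_type, tag & 0xFF])


if __name__ == "__main__":
    print(build_tag_msg(0, 5).hex())
    print(build_tag_msg(REQ_BARRIER, 7).hex())
```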

> But this misses the point. Andrew's original speculation was that
> writes were "passing reads" once the read was submitted to the drive.

The speculation is based on the observation that for transactions consisting
of multiple writes and small reads, the reads take a long time to complete.
That translates to starving a read in favour of a bunch of contiguous writes.
I'm sure we've all seen SCSI drives indulge in this type of unfair behaviour
(it does make sense to keep servicing writes if they're direct follow-ons
from the previously serviced ones).

> I would like to understand the evidence behind that assertion since
> all drive's I've worked with automatically give a higher priority to
> read traffic than writes since writes can be buffered but reads
> cannot.

The evidence is here:

http://marc.theaimsgroup.com/?l=linux-kernel&m=103302456113997&w=1

James



2002-09-27 18:52:18

by Justin T. Gibbs

Subject: Re: Warning - running *really* short on DMA buffers while doing file transfers

> Well, let's assume the simplest setup possible: select + tag msg + 10 byte
> command + disconnect + reselect + status; that's 17 bytes async. The
> maximum bus speed async narrow is about 4MB/s, so those 17 bytes take
> around 4us to transmit. On a wide Ultra2 bus, the data rate is about
> 80MB/s so it takes 50us to transmit 4k or 800us to transmit 64k.
> However, the major killer in this model is going to be disconnection
> delay at around 200us (dwarfing arbitration delay, bus settle time etc).
> For 4k packets you spend about 3 times longer arbitrating for the bus
> than you do transmitting data. For 64k packets it's 25% of your data
> transmit time in arbitration. Your theoretical throughput for 4k
> packets is thus 20MB/s. In my book that's a significant loss on an
> 80MB/s bus.

This only matters if your benchmark is dependent on round-trip latency
(no read-ahead or write behind and no command overlap) or if you have
saturated the bus. None of these are the case with the single drive
I/O benchmarks that have been talked about in this thread. I suppose
I should have been more specific in saying, "the command overhead is
not a factor in the issues raised by this thread". Now if you want
to use command overhead as a reason to use tagged queuing to mitigate
that overhead, by all means, go right ahead.

>> Hooks for sending ordered tags have been in the aic7xxx driver, at
>> least in FreeBSD's version, since '97. As soon as the Linux cmd
>> blocks have such information it will be trivial to have the aic7xxx
>> driver issue the appropriate tag types.
>
> They already do in 2.5, see scsi_populate_tag_msg() in scsi.h. This
> assumes you're using the generic tag queueing, which the aic7xxx
> doesn't, but you could easily key the tag type off REQ_BARRIER.

Okay.

>> But this misses the point. Andrew's original speculation was that
>> writes were "passing reads" once the read was submitted to the drive.
>
> The speculation is based on the observation that for transactions
> consisting of multiple writes and small reads, the reads take a long
> time to complete.

I've seen evidence that a series of reads takes a long time to complete,
but nothing that indicates that every read is starved beyond what you
would expect to see if a huge number of writes were issued between each
read.

> That translates to starving a read in favour of a
> bunch of contiguous writes. I'm sure we've all seen SCSI drives indulge
> in this type of unfair behaviour (it does make sense to keep servicing
> writes if they're direct follow on's from the previously serviced ones).

Actually I haven't. The closest I can come to this is a single read way
off on the far side of the disk starved by a continuous stream of reads
on the other side of the platter. This behavior was fixed by all major
drive manufacturers that I know of back in '97 or '98.

>> I would like to understand the evidence behind that assertion since
>> all drive's I've worked with automatically give a higher priority to
>> read traffic than writes since writes can be buffered but reads
>> cannot.
>
> The evidence is here:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=103302456113997&w=1

Which unfortunately characterizes only a single symptom without breaking
it down on a transaction-by-transaction basis. We need to understand
how many writes were queued by the OS to the drive between each read to
know if the drive is actually allowing writes to pass reads or not.
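
A minimal sketch of that breakdown, assuming a command trace is available as
a sequence of "R" and "W" markers in the order the OS queued them to the
drive. The trace format and the function name are hypothetical; no such
trace was posted in this thread.

```python
def writes_between_reads(ops):
    """Count the writes queued ahead of each read since the previous read.

    ops is an iterable of "R" (read) and "W" (write) markers in submission
    order. The result has one entry per read.
    """
    counts = []
    pending_writes = 0
    for op in ops:
        if op == "W":
            pending_writes += 1
        elif op == "R":
            counts.append(pending_writes)
            pending_writes = 0
    return counts


if __name__ == "__main__":
    trace = ["W", "W", "W", "R", "W", "R", "R"]
    print(writes_between_reads(trace))
```

If each read's completion time scales with the number of writes queued ahead
of it, the delay is explained by queue depth alone; a read that waits much
longer than that would point at the drive letting writes pass it.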

--
Justin

2002-09-27 20:53:20

by Justin T. Gibbs

Subject: Re: Warning - running *really* short on DMA buffers while doing file transfers

>> Hooks for sending ordered tags have been in the aic7xxx driver, at
>> least in FreeBSD's version, since '97. As soon as the Linux cmd
>> blocks have such information it will be trivial to have the aic7xxx
>> driver issue the appropriate tag types.
>
> They already do in 2.5, see scsi_populate_tag_msg() in scsi.h. This
> assumes you're using the generic tag queueing, which the aic7xxx
> doesn't, but you could easily key the tag type off REQ_BARRIER.

If anyone wants to play with the updated aic7xxx and aic79xx drivers
(new port to 2.5, plus it honors the otag stuff), you can pick it up
from here:



--On Friday, September 27, 2002 13:21:29 -0400 James Bottomley
<[email protected]> wrote:

>> Which part of the OS are you talking about?
>
> I'm not, I'm talking about the pure physical characteristics of the
> transport bus.
>
>> I also do not believe that the command overhead is as significant as
>> you suggest. I've personally seen a non-packetized SCSI bus perform
>> over 15K transactions per-second.
>
> Well, let's assume the simplest setup possible: select + tag msg + 10 byte
> command + disconnect + reselect + status; that's 17 bytes async. The
> maximum bus speed async narrow is about 4MB/s, so those 17 bytes take
> around 4us to transmit. On a wide Ultra2 bus, the data rate is about
> 80MB/s so it takes 50us to transmit 4k or 800us to transmit 64k.
> However, the major killer in this model is going to be disconnection
> delay at around 200us (dwarfing arbitration delay, bus settle time etc).
> For 4k packets you spend about 3 times longer arbitrating for the bus
> than you do transmitting data. For 64k packets it's 25% of your data
> transmit time in arbitration. Your theoretical throughput for 4k
> packets is thus 20MB/s. In my book that's a significant loss on an
> 80MB/s bus.
>
> On Fabric busses, you move to the network model and collision
> probabilities which increase as the packet size goes down.
>
> [email protected] said:
>> Because of read-ahead, the OS should never send 16 4k contiguous reads
>> to the I/O layer for the same application.
>
> read ahead is basically a very simplistic form of I/O scheduling.
>



http://people.FreeBSD.org/~gibbs/linux/linux-2.5-aic79xxx.tar.gz

--
Justin

2002-09-27 21:33:55

by Patrick Mansfield

Subject: Re: Warning - running *really* short on DMA buffers while doing file transfers

On Fri, Sep 27, 2002 at 02:58:15PM -0600, Justin T. Gibbs wrote:
> >> Hooks for sending ordered tags have been in the aic7xxx driver, at
> >> least in FreeBSD's version, since '97. As soon as the Linux cmd
> >> blocks have such information it will be trivial to have the aic7xxx
> >> driver issue the appropriate tag types.
> >
> > They already do in 2.5, see scsi_populate_tag_msg() in scsi.h. This
> > assumes you're using the generic tag queueing, which the aic7xxx
> > doesn't, but you could easily key the tag type off REQ_BARRIER.
>
> If anyone wants to play with the updated aic7xxx and aic79xx drivers
> (new port to 2.5, plus it honors the otag stuff), you can pick it up
> from here:
>
>
> http://people.FreeBSD.org/~gibbs/linux/linux-2.5-aic79xxx.tar.gz
>
> --
> Justin

Any 2.5 patch for the above? Or aic7xxx/Config.in and
aic7xxx/Makefile for 2.5?

Thanks.

-- Patrick Mansfield

2002-09-27 22:07:08

by Justin T. Gibbs

Subject: Re: Warning - running *really* short on DMA buffers while doing file transfers

>> http://people.FreeBSD.org/~gibbs/linux/linux-2.5-aic79xxx.tar.gz
>>
>> --
>> Justin
>
> Any 2.5 patch for the above? Or aic7xxx/Config.in and
> aic7xxx/Makefile for 2.5?

Try it now.

--
Justin

2002-09-27 22:23:56

by Patrick Mansfield

Subject: Re: Warning - running *really* short on DMA buffers while doing file transfers

On Fri, Sep 27, 2002 at 04:08:22PM -0600, Justin T. Gibbs wrote:
> >> http://people.FreeBSD.org/~gibbs/linux/linux-2.5-aic79xxx.tar.gz
> >>
> >> --
> >> Justin
> >
> > Any 2.5 patch for the above? Or aic7xxx/Config.in and
> > aic7xxx/Makefile for 2.5?
>
> Try it now.
>

Great! It boots up fine on my IBM Netfinity system with 2.5.37.

I see:

[ boot up stuff deleted ]

scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.10
<Adaptec aic7896/97 Ultra2 SCSI adapter>
aic7896/97: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.10
<Adaptec aic7896/97 Ultra2 SCSI adapter>
aic7896/97: Ultra2 Wide Channel B, SCSI Id=7, 32/253 SCBs

I turned on the debug flags; there were a bunch of odd messages
in there, but otherwise it seems to be working fine. My .config
has the following AIC config options:

CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=253
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
CONFIG_AIC7XXX_ALLOW_MEMIO=y
# CONFIG_AIC7XXX_PROBE_EISA_VL is not set
# CONFIG_AIC7XXX_BUILD_FIRMWARE is not set
CONFIG_AIC7XXX_DEBUG_ENABLE=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC7XXX_REG_PRETTY_PRINT=y
# CONFIG_SCSI_AIC79XX is not set

Weird boot time messages:

INITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUT<5> Vendor: IBM-PSG Model: ST318203LC !# Rev: B222
Type: Direct-Access ANSI SCSI revision: 02
INITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUT(scsi0:A:0): 80.000MB/s transfers (40.000MHz, offset 15, 16bit)
INITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUT<5> Vendor: IBM-PSG Model: ST318203LC !# Rev: B222
Type: Direct-Access ANSI SCSI revision: 02
INITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUT(scsi0:A:1): 80.000MB/s transfers (40.000MHz, offset 15, 16bit)
Vendor: IBM Model: LN V1.2Rack Rev: B004
Type: Processor ANSI SCSI revision: 02
scsi0:A:0:0: Tagged Queuing enabled. Depth 253
scsi0:A:1:0: Tagged Queuing enabled. Depth 253
st: Version 20020822, fixed bufsize 32768, wrt 30720, s/g segs 256
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
INITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUT<5>SCSI device sda: 35548320 512-byte hdwr sectors (18201 MB)
sda: sda1 sda2
INITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUTINITIATOR_MSG_OUT<5>SCSI device sdb: 35548320 512-byte hdwr sectors (18201 MB)
sdb: sdb1 sdb2
Attached scsi generic sg2 at scsi0, channel 0, id 15, lun 0, type 3
mice: PS/2 mouse device common for all mice
input: PS/2 Generic Mouse on isa0060/serio1
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 32Kbytes
TCP: Hash tables configured (established 16384 bind 21845)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 96k freed
INIT: version 2.78 booting

[ more boot up stuff ]

-- Patrick Mansfield

2002-09-27 22:43:58

by Justin T. Gibbs

Subject: Re: Warning - running *really* short on DMA buffers while doing file transfers

> I turned on the debug flags, there were a bunch of odd messages
> in there, but otherwise it seems to be working fine. My .config
> has the following AIC config options:

<sigh>
I always run with debugging turned on with the message flags enabled,
so I missed this in my testing. I just updated the tarfile. The
following patch is all you need to shut the driver up.

--
Justin


Attachments:
(No filename) (382.00 B)
diff (1.34 kB)