Hi,
I've implemented tagged command queueing for ATA disk drives, and it's
now ready for people to give it a test spin. As it has had only limited
testing so far, please be very careful with it. It has been tested on
two drives so far, a GXP75-30gb and a GXP120-40gb, and with a PIIX4
controller:
Intel Corp. 82371AB PIIX4 IDE: IDE controller on PCI slot 00:04.1
Intel Corp. 82371AB PIIX4 IDE: chipset revision 1
Intel Corp. 82371AB PIIX4 IDE: not 100% native mode: will probe irqs later
PIIX: Intel Corp. 82371AB PIIX4 IDE UDMA33 controller on pci00:04.1
ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:pio, hdd:pio
hda: IC35L040AVVA07-0, ATA DISK drive
hdc: IBM-DTLA-307030, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: tagged command queueing enabled, command queue depth 32
hda: 80418240 sectors (41174 MB) w/1863KiB Cache, CHS=79780/16/63, UDMA(33)
hdc: tagged command queueing enabled, command queue depth 32
hdc: 60036480 sectors (30739 MB) w/1916KiB Cache, CHS=59560/16/63, UDMA(33)
I haven't patched hdparm to support enabling TCQ yet, so please use the
'TCQ on by default' option. If TCQ has been selected an enabled, there
should be a 'tcq' file in the ide proc directory which will give you a
bit of info about the current state of the drive:
bart:~ # cat /proc/ide/hdc/tcq
Max queue depth: 32
Max achieved depth: 23
Max depth since last: 16
Current depth: 10
ar dfd9e120 is tag 2
ar dfd9e360 is tag 3
ar dfd9e660 is tag 4
ar dfd9e5a0 is tag 5
ar dfd9e7e0 is tag 6
ar dfd9e600 is tag 7
ar dfd9e480 is tag 8
ar dfd9e2a0 is tag 9
ar dfd9e300 is tag 14
ar dfd9e720 is tag 15
hwgroup busy
dma busy
oldest command 5
immed rel 1153, immed comp 12370
As this is the first release, some of this info is a bit
user-unfriendly. Let me explain the fields:
The first four are about the queue depth: maximum supported, maximum
achieved during operations, maximum achieved since last cat of file, and
the current depth at the time of the file cat. Then follows a listing of
the requests in progress. Then some debug info about hwgroup and dma
busy states, and oldest command (in jiffies) in the queue. The last line
indicates the status of sent commands -- either they released the bus
immediately (immed rel: data not ready, we can queue another), or the
drive chose to fetch data immediately (immed comp, no bus release).
I could talk a bit about the IDE cores changes needed to make this
happen (ata_request_t stuff), but I'm too lazy and will wait until
people ask about it. Note that I'm no particular fan of hungarian
notation either, but (again) due to laziness I've used that because it's
easier on the typing fingers. Things are bound to change from this first
alpha release anyways, so I'll just wait.
If you have a recent IBM Deskstar drive (these are the only ones that
support TCQ current, to my knowledge), I would appreciate if you could
give it a spin and report success and failure to me.
Random notes:
- For this release, only a single drive on a channel has been tested. So
don't bother filing bug reports if you don't have a similar setup in
that regard.
- Interrupt handler timer needs to be changed to be per-ata_request, not
per hwgroup.
- General ata_request stuff isn't design complete yet, this could be
taken much further.
- Due to memory consumption friendliness, block queues are configured
for a maximum of 32 segments at this time. This is still enough to
achieve the full 128kb for a dma request that we support anyways
(modulo 48-bit lba, doesn't matter).
- TCQ can't be supported on Promise controllers at all, it intercepts
interrupts so we don't see them.
- Backing off TCQ in case of failure is untested.
- ide.h TCQ stuff needs some cleaning
That said, please give it a go if you can! Patch is against 2.5.8-pre2.
--
Jens Axboe
Jens Axboe wrote:
> Hi,
>
> I've implemented tagged command queueing for ATA disk drives, and it's
> now ready for people to give it a test spin. As it has had only limited
> testing so far, please be very careful with it. It has been tested on
> two drives so far, a GXP75-30gb and a GXP120-40gb, and with a PIIX4
> controller:
OK after a cursory look I see that the patch contains quite
a lot of ideas for the generic code itself. Do you think that it would
be worth wile to extract them first or should the patch be just included
in mainline. (I don't intent to interferre too much with your efforts to
do something similar in 2.4.xx.)....
On Mon, Apr 08 2002, Martin Dalecki wrote:
> Jens Axboe wrote:
> >Hi,
> >
> >I've implemented tagged command queueing for ATA disk drives, and it's
> >now ready for people to give it a test spin. As it has had only limited
> >testing so far, please be very careful with it. It has been tested on
> >two drives so far, a GXP75-30gb and a GXP120-40gb, and with a PIIX4
> >controller:
>
> OK after a cursory look I see that the patch contains quite
> a lot of ideas for the generic code itself. Do you think that it would
> be worth wile to extract them first or should the patch be just included
> in mainline. (I don't intent to interferre too much with your efforts to
> do something similar in 2.4.xx.)....
Good question, I've asked myself that too... Yeah I see some of my ideas
as being nice to have in mainline even without TCQ. The big one being
ata_request_t of course, there are some parts to this:
- Separate scatterlist and dma table out from hwgroup. This is not
really needed for TCQ, but saves doing a blk_rq_map_sg on a request
more than once. If future ATA hardware would support more than one
pending DMA operation per hwgroup, this would be useful even without
TCQ.
- Use ata_request_t as the main request command. This is where I really
want to go. I'm not saying that we need a complete IDE mid layer, but
a private request type is a nice way to unify the passing of a general
command around. So the taskfile stuff would remain very low level,
ata_request would add the higher level parts. I could expand lots more
on this, but I'm quite sure you know where I'm going :-)
Note that the ata_request_t usage is a bit messy in the current patch,
that's merely because I was more focused on getting TCQ stable than
designing this out right now. So I think we should let it mature in the
TCQ patch for just a while before making any final commitments. Agreed?
Of course this will leave me with the pain of merging with your IDE
stuff every time a new -pre comes out (updating this patch from
2.5.1-pre where I last used it was _not_ funny! :-), but I can handle
that.
In addition, there are small buglet fixes in the patch that should go to
general. I will extract these, I already send you one of these earlier
today.
BTW, I just found an SMP race in the current patch. I'll send out a new
version later, for now it's here:
--- ../../linux-2.5.8-pre2/drivers/ide/ide.c Mon Apr 8 14:53:06 2002
+++ drivers/ide/ide.c Mon Apr 8 14:40:33 2002
@@ -1373,8 +1373,17 @@
break;
}
- if (blk_queue_plugged(&drive->queue))
- BUG();
+ /*
+ * there's a small window between where the queue could be
+ * replugged while we are in here when using tcq (in which
+ * case the queue is probably empty anyways...), so check
+ * and leave if appropriate. When not using tqc, this is
+ * still a severe BUG!
+ */
+ if (blk_queue_plugged(&drive->queue)) {
+ BUG_ON(!drive->using_tcq);
+ break;
+ }
/*
* just continuing an interrupted request maybe
--
Jens Axboe
Jens Axboe wrote:
> On Mon, Apr 08 2002, Martin Dalecki wrote:
>
>>Jens Axboe wrote:
>>
>>>Hi,
>>>
>>>I've implemented tagged command queueing for ATA disk drives, and it's
>>>now ready for people to give it a test spin. As it has had only limited
>>>testing so far, please be very careful with it. It has been tested on
>>>two drives so far, a GXP75-30gb and a GXP120-40gb, and with a PIIX4
>>>controller:
>>
>>OK after a cursory look I see that the patch contains quite
>>a lot of ideas for the generic code itself. Do you think that it would
>>be worth wile to extract them first or should the patch be just included
>>in mainline. (I don't intent to interferre too much with your efforts to
>>do something similar in 2.4.xx.)....
>
>
> Good question, I've asked myself that too... Yeah I see some of my ideas
> as being nice to have in mainline even without TCQ. The big one being
> ata_request_t of course, there are some parts to this:
Sure ;-).
> - Separate scatterlist and dma table out from hwgroup. This is not
> really needed for TCQ, but saves doing a blk_rq_map_sg on a request
> more than once. If future ATA hardware would support more than one
> pending DMA operation per hwgroup, this would be useful even without
> TCQ.
Agreed.
> - Use ata_request_t as the main request command. This is where I really
> want to go. I'm not saying that we need a complete IDE mid layer, but
> a private request type is a nice way to unify the passing of a general
> command around. So the taskfile stuff would remain very low level,
> ata_request would add the higher level parts. I could expand lots more
> on this, but I'm quite sure you know where I'm going :-)
Well I can assure you that we are not dragging the towell in two different
directions - please see for example my notes about the ata_taskfile
function having too much parameters ;-).
> Note that the ata_request_t usage is a bit messy in the current patch,
I noted it already ;-)
> that's merely because I was more focused on getting TCQ stable than
> designing this out right now. So I think we should let it mature in the
> TCQ patch for just a while before making any final commitments. Agreed?
No problem with me. I will just pull out the generally good stuff
out of it OK? I hope this will not make the tracking of the
alpha patches too difficult for you...
> Of course this will leave me with the pain of merging with your IDE
> stuff every time a new -pre comes out (updating this patch from
> 2.5.1-pre where I last used it was _not_ funny! :-), but I can handle
> that.
Well, there is *no question* you are capable to do this...
> In addition, there are small buglet fixes in the patch that should go to
> general. I will extract these, I already send you one of these earlier
> today.
Yes I have noticed this as well. However let's wait and see
whatever maybe I'm able to save you the trobule and pull them
out myself. Your alpha patch is "interresting" enough to have me
a walk over it line by line anyway :-).
I have to catch up with 2.5.8-pre2 anyway, since apparently this
weekend was more about alcohol consumption for me then hacking...
On Mon, Apr 08 2002, Martin Dalecki wrote:
> >- Separate scatterlist and dma table out from hwgroup. This is not
> > really needed for TCQ, but saves doing a blk_rq_map_sg on a request
> > more than once. If future ATA hardware would support more than one
> > pending DMA operation per hwgroup, this would be useful even without
> > TCQ.
>
> Agreed.
If we do this, we need to make a decision about how many segments to
enable per command. As I stated, current is 32 which gives us (at least)
128kb per request. This is all we need right now, and I'm not too
convinced that doing much larger requests with 48-bit lba will buys us
_anything_ but bigger latency problems :-). This is just my speculation,
I have no numbers to back this up so far. Now, with a 1kb fs we are
limited to 32kb requests if we don't get good clustering. This might be
a small performance hit, but if you are writing big blocks in a 1kb fs
chances are good thaat you _will_ get good clustering (writing out the 4
consecutive buffer_heads stringed to the page) so I'm not convinced that
this will be a problem either.
So I'd just stick with PRD_SEGMENTS at 32 so far. The over head of going
to, eg, 64 would be 8 * 64 == 512 bytes per ata_request instead of the
current 256 right now. Ok, that's not a lot, but still :-)
> >- Use ata_request_t as the main request command. This is where I really
> > want to go. I'm not saying that we need a complete IDE mid layer, but
> > a private request type is a nice way to unify the passing of a general
> > command around. So the taskfile stuff would remain very low level,
> > ata_request would add the higher level parts. I could expand lots more
> > on this, but I'm quite sure you know where I'm going :-)
>
> Well I can assure you that we are not dragging the towell in two different
> directions - please see for example my notes about the ata_taskfile
> function having too much parameters ;-).
ata_taskfile(drive, ar);
or something to that effect should be very possible, it just requires
taking my generalization a bit further.
> >Note that the ata_request_t usage is a bit messy in the current patch,
>
> I noted it already ;-)
I didn't want ata_request_t changes to pollute the patch too much :-)
> >that's merely because I was more focused on getting TCQ stable than
> >designing this out right now. So I think we should let it mature in the
> >TCQ patch for just a while before making any final commitments. Agreed?
>
> No problem with me. I will just pull out the generally good stuff
> out of it OK? I hope this will not make the tracking of the
> alpha patches too difficult for you...
Yes that's fine with me, and feel free to extend the ata_request stuff
(and anything else). I'll adapt the tcq stuff and submit when ready.
> >In addition, there are small buglet fixes in the patch that should go to
> >general. I will extract these, I already send you one of these earlier
> >today.
>
> Yes I have noticed this as well. However let's wait and see
> whatever maybe I'm able to save you the trobule and pull them
> out myself. Your alpha patch is "interresting" enough to have me
> a walk over it line by line anyway :-).
Alright, I'll let you ponder the stuff for a while and pull what you
want.
> I have to catch up with 2.5.8-pre2 anyway, since apparently this
> weekend was more about alcohol consumption for me then hacking...
Ahem, yes that part I know too :)
--
Jens Axboe
It fails for the most part because there are (3) three holes in the state
diagram. Intel works because the hardware is smart. Highpoint 366 works
because of the host polling timer (will add soon in 2.4). The majority
will fail because of the invalid state the drive leaves the host. Since
the host does not receive proper notice of the command block execution as
a direct result of the device failing to default to a fixed state, it
becomes a mess. In short, until the intelligent host hardware hits the
market, nobody supports Tagged Command Queue. The reason it is randomly
supported by FreeBSD is because it uses the host hardware polling
interrupt. Otherwise the overhead for spinning and chortling cpu clock
cycles is a lark.
Well have fun but expect little success on most hardware.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Mon, 8 Apr 2002, Jens Axboe wrote:
> Hi,
>
> I've implemented tagged command queueing for ATA disk drives, and it's
> now ready for people to give it a test spin. As it has had only limited
> testing so far, please be very careful with it. It has been tested on
> two drives so far, a GXP75-30gb and a GXP120-40gb, and with a PIIX4
> controller:
>
> Intel Corp. 82371AB PIIX4 IDE: IDE controller on PCI slot 00:04.1
> Intel Corp. 82371AB PIIX4 IDE: chipset revision 1
> Intel Corp. 82371AB PIIX4 IDE: not 100% native mode: will probe irqs later
> PIIX: Intel Corp. 82371AB PIIX4 IDE UDMA33 controller on pci00:04.1
> ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:pio
> ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:pio, hdd:pio
> hda: IC35L040AVVA07-0, ATA DISK drive
> hdc: IBM-DTLA-307030, ATA DISK drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> ide1 at 0x170-0x177,0x376 on irq 15
> hda: tagged command queueing enabled, command queue depth 32
> hda: 80418240 sectors (41174 MB) w/1863KiB Cache, CHS=79780/16/63, UDMA(33)
> hdc: tagged command queueing enabled, command queue depth 32
> hdc: 60036480 sectors (30739 MB) w/1916KiB Cache, CHS=59560/16/63, UDMA(33)
>
> I haven't patched hdparm to support enabling TCQ yet, so please use the
> 'TCQ on by default' option. If TCQ has been selected an enabled, there
> should be a 'tcq' file in the ide proc directory which will give you a
> bit of info about the current state of the drive:
>
> bart:~ # cat /proc/ide/hdc/tcq
> Max queue depth: 32
> Max achieved depth: 23
> Max depth since last: 16
> Current depth: 10
> ar dfd9e120 is tag 2
> ar dfd9e360 is tag 3
> ar dfd9e660 is tag 4
> ar dfd9e5a0 is tag 5
> ar dfd9e7e0 is tag 6
> ar dfd9e600 is tag 7
> ar dfd9e480 is tag 8
> ar dfd9e2a0 is tag 9
> ar dfd9e300 is tag 14
> ar dfd9e720 is tag 15
> hwgroup busy
> dma busy
> oldest command 5
> immed rel 1153, immed comp 12370
>
> As this is the first release, some of this info is a bit
> user-unfriendly. Let me explain the fields:
>
> The first four are about the queue depth: maximum supported, maximum
> achieved during operations, maximum achieved since last cat of file, and
> the current depth at the time of the file cat. Then follows a listing of
> the requests in progress. Then some debug info about hwgroup and dma
> busy states, and oldest command (in jiffies) in the queue. The last line
> indicates the status of sent commands -- either they released the bus
> immediately (immed rel: data not ready, we can queue another), or the
> drive chose to fetch data immediately (immed comp, no bus release).
>
> I could talk a bit about the IDE cores changes needed to make this
> happen (ata_request_t stuff), but I'm too lazy and will wait until
> people ask about it. Note that I'm no particular fan of hungarian
> notation either, but (again) due to laziness I've used that because it's
> easier on the typing fingers. Things are bound to change from this first
> alpha release anyways, so I'll just wait.
>
> If you have a recent IBM Deskstar drive (these are the only ones that
> support TCQ current, to my knowledge), I would appreciate if you could
> give it a spin and report success and failure to me.
>
> Random notes:
>
> - For this release, only a single drive on a channel has been tested. So
> don't bother filing bug reports if you don't have a similar setup in
> that regard.
> - Interrupt handler timer needs to be changed to be per-ata_request, not
> per hwgroup.
> - General ata_request stuff isn't design complete yet, this could be
> taken much further.
> - Due to memory consumption friendliness, block queues are configured
> for a maximum of 32 segments at this time. This is still enough to
> achieve the full 128kb for a dma request that we support anyways
> (modulo 48-bit lba, doesn't matter).
> - TCQ can't be supported on Promise controllers at all, it intercepts
> interrupts so we don't see them.
> - Backing off TCQ in case of failure is untested.
> - ide.h TCQ stuff needs some cleaning
>
> That said, please give it a go if you can! Patch is against 2.5.8-pre2.
>
> --
> Jens Axboe
>
>