2004-09-14 15:53:06

by Mark Lord

Subject: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

My first attempt at posting this seems to have gone AWOL,
so here it comes again. Also being posted to linux-scsi.

Here is the first public release of the 2.6.xx driver
source code for the Pacific Digital Corporation QStor SATA/RAID chip.

This 4-channel chip has hardware-assisted RAID0/RAID1/RAID10,
host-queuing, per-request TCQ/NCQ support, hot insertion
and removal of drives, etc. The 64-bit/66MHz chip shows throughput
in excess of 200MByte/sec on my ancient P3-1GHz test system,
and can do much better when installed in a PCI-X slot.

The driver (attached) supports most of the chip features,
including host, native and legacy tagged queuing,
but does not yet include boot-from-raid support (coming soon).

Both hdparm and smartmontools are fully supported by this driver.

This patch is against linux-2.6.9-rc2.

Please email me any errors or corrections you may deem necessary
for kernel inclusion.

Signed-off-by: Mark Lord <[email protected]>
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")


Attachments:
qstor.patch-2.6.9-rc2.gz (35.02 kB)

2004-09-14 17:14:35

by Alan

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

On Tue, 2004-09-14 at 16:42, Mark Lord wrote:
> Please email me any errors or corrections you may deem necessary
> for kernel inclusion.

Nothing says it's GPL.

qstor_printk uses a fixed-length buffer, yet we have vprintk. It also
appears to be full of magic formatting stuff to optionally insert
things like DRIVER_NAME. Is there a reason for not just using printk?
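
If the wrapper stays, something like this (a sketch, not your code)
at least kills the fixed buffer:

static void qstor_printk(const char *fmt, ...)
{
        va_list args;

        printk(DRIVER_NAME ": ");       /* prefix without a scratch buffer */
        va_start(args, fmt);
        vprintk(fmt, args);             /* format straight into the log */
        va_end(args);
}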

qstor_test_logbuf returns the flags value across functions. Not all
platforms can support this (see Rusty's unreliable guide).

What is qstor_alloc about?

What happens if qstor_scsi_proc_write is passed a length of
0xFFFFFFFF? You seem to have no upper bound, nor any trap on the
alloc overflow. Seems a horrible way to expose the RAID commands, too.
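
At minimum you want a sanity cap before the alloc, something like this
(a sketch only; the bound and the signature are invented, not from the
patch):

#define QSTOR_PROC_MAX_WRITE    4096    /* arbitrary sane upper bound */

static int qstor_scsi_proc_write(struct Scsi_Host *host, char *buffer,
                                 int length)
{
        char *kbuf;

        if (length <= 0 || length > QSTOR_PROC_MAX_WRITE)
                return -EINVAL;         /* rejects 0xFFFFFFFF-style lengths */

        kbuf = kmalloc(length + 1, GFP_KERNEL); /* +1 can no longer wrap */
        if (kbuf == NULL)
                return -ENOMEM;
        memcpy(kbuf, buffer, length);
        kbuf[length] = '\0';

        /* ... parse the RAID command from kbuf ... */

        kfree(kbuf);
        return length;
}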

qstor_exec_special_cmd doesn't consider datalen overflow. I'm not sure
what stops a lot of parallel callers here, but maybe that's fixed by
the command queueing?

The correct ioctl return for an unknown command is -ENOTTY (that's a
mistake still in many existing drivers, so no surprise it's still
being copied).

The ioremap_nocache should just be ioremap, I believe.

Announcements for the driver should be KERN_INFO, IMHO.

The reset code seems to spend a lot of time with IRQs off?

qstor_dealloc_device checks dev->id != NULL; kfree(NULL) is an allowed
no-op, btw.
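
i.e. (guessing at the struct) it can just be:

static void qstor_dealloc_device(struct qstor_device *dev)
{
        kfree(dev->id);         /* fine even when dev->id is NULL */
        kfree(dev);
}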


2004-09-14 17:23:18

by Jeff Garzik

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

Mark Lord wrote:
> My first attempt at posting this seems to have gone AWOL,
> so here it comes again. Also being posted to linux-scsi.

There is that CC feature in your mailer, you know... :)

Repeating what I posted to linux-scsi:


> Here is the first public release of the 2.6.xx driver
> source code for the Pacific Digital Corporation QStor SATA/RAID chip.
>
> This 4-channel chip has hardware-assisted RAID0/RAID1/RAID10,
> host-queuing, per-request TCQ/NCQ support, hot insertion
> and removal of drives, etc. The 64-bit/66MHz chip shows throughput
> in excess of 200MByte/sec on my ancient P3-1GHz test system,
> and can do much better when installed in a PCI-X slot.

How much of the RAID is actually hardware-assisted?

This looks pretty much like an ATA driver to me.

Linus vetoed future SCSI->ATA translators. He only allowed libata
because I promised to remove the translation and make it a native block
driver in the future, which I have been working towards (see struct
ata_queued_cmd, etc.)


> The driver (attached) supports most of the chip features,
> including host, native and legacy tagged queuing,
> but does not yet include boot-from-raid support (coming soon).
>
> Both hdparm and smartmontools are fully supported by this driver.

Actual comments on the code:
0) so far, this driver looks like fake RAID to me... if so that's a big
veto. if it's real RAID, only the following grumbles apply :)

1) not endian safe at all

2) I am dubious about including Yet Another set of event logging functions

3) in qstor_read_events you unlock the spinlock without first locking
it, in one path (wait_event_interruptible rc==0)
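
The usual balanced shape is something like this (sketch, names guessed):

        spin_lock_irq(&uhba->lock);
        while (!qstor_have_events(uhba)) {
                spin_unlock_irq(&uhba->lock);
                rc = wait_event_interruptible(uhba->event_wq,
                                              qstor_have_events(uhba));
                if (rc)
                        return rc;      /* interrupted: lock NOT held here */
                spin_lock_irq(&uhba->lock);     /* retake before touching state */
        }
        /* ... copy events out ... */
        spin_unlock_irq(&uhba->lock);   /* unlocks exactly what was locked */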

4) qstor_extract_id_string appears to be generic across IDE/libata/qstor

5) new procfs stuff discouraged

6) style: ditch braces on single statements, e.g.,

+ if (drive != NULL) {
+ qstor_destroy_device(drive);
+ }


7) double spin_unlock_irq possible in qstor_create_device

8) use msleep() rather than schedule_timeout()
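
i.e.

        /* instead of: */
        set_current_state(TASK_UNINTERRUPTIBLE);
        schedule_timeout(msecs_to_jiffies(20));

        /* just: */
        msleep(20);             /* the 20ms is an arbitrary example value */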

9) use of do_sleep paradigm is dubious: you should instead try to keep
your locked code regions as small as possible. in general, this code
has far too many unlock-doit-lock sections.

Experience has shown that too much unlock-doit-lock leads to bugs and
increases the pain when analyzing your locking.

In particular, releasing the lock and sleeping would be very wrong in
the ->queuecommand and error handling paths (alas... I would love to
sleep in the fine-grained eh hooks)

10) in qstor_scsi_done, when is cmd->scsi_done ever NULL?

11) do you properly keep track of the 'done' function passed to you in
->queuecommand? or do you mistakenly assume that cmd->scsi_done is the
same as the ->queuecommand argument?

12) fix the sd.c code, don't add silly driver-specific workarounds:

+ buf[0] = TYPE_DISK; // Cannot use TYPE_RAID -- sd.c rejects it

13) doh! check for pci_map_xxx failure

14) use sg_dma_len() macro, not sg->length
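
i.e. when walking the mapped list, something like (a sketch; cmd_sg and
n_mapped are invented names):

        struct scatterlist *sg;
        int i;

        for (i = 0, sg = cmd_sg; i < n_mapped; i++, sg++) {
                dma_addr_t addr = sg_dma_address(sg);
                u32 len = sg_dma_len(sg);       /* post-mapping, not sg->length */

                /* ... build the PRD entry from addr/len ... */
        }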

15) Bart and I are slowly moving over to using linux/ata.h for
ATA-generic constants and enums. Please use ATA_CMD_xxx (and add
constants to that header as required).

16) There are WAY too many magic numbers in this driver :(

17) Are you 100% certain of your queued error handling? The reason why
libata doesn't do NCQ is purely because error handling is so
complicated. Potential problems I don't see you handling (but I could
be missing something!):

a) on a bus error (not device error), one has _no idea_ which
commands complete etc.
b) if SERVICE interrupt is enabled, you may not get back the
correct D2H Register FIS showing the errored device in question
c) even if you receive the correct D2H Register FIS, you may
need to manually abort the queue with a NOP

Queueing is easy. Picking up the pieces when it fails isn't.

libata's error handling is dumb, but also easy to review and verify.

18) return -EFOO values from your PCI probe function

19) propagate return value from pci_enable_device

20) check (and return) pci_set_dma_mask retval

21) use the proper ULL suffix for pci_set_dma_mask argument

22) use pci_set_consistent_dma_mask also

23) use pci_request_regions to reserve resources

24) when qstor_probe fails, don't just return! undo the stuff that
occurred before the error (such as calling pci_disable_device or
pci_release_regions or iounmap)

25) propagate return value from request_irq

26) check scsi_add_host return value
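
Rolling 18-26 together, the probe wants the usual unwind shape -- a
sketch only, not a drop-in:

static int qstor_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
        void __iomem *mmio;
        int rc;

        rc = pci_enable_device(pdev);
        if (rc)
                return rc;                      /* propagate, don't eat it */

        rc = pci_request_regions(pdev, DRIVER_NAME);
        if (rc)
                goto err_disable;

        rc = pci_set_dma_mask(pdev, 0xffffffffULL);     /* note the ULL */
        if (rc)
                goto err_regions;
        rc = pci_set_consistent_dma_mask(pdev, 0xffffffffULL);
        if (rc)
                goto err_regions;

        mmio = ioremap(pci_resource_start(pdev, 0), pci_resource_len(pdev, 0));
        if (mmio == NULL) {
                rc = -ENOMEM;
                goto err_regions;
        }

        /* ... chip init, request_irq, scsi_add_host, all return-checked,
         *     each with its own unwind label below ... */

        return 0;

err_regions:
        pci_release_regions(pdev);
err_disable:
        pci_disable_device(pdev);
        return rc;
}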

27) the following is highly silly: you are taking a lock around a function,
just so that function can unlock again in order to sleep.

+ spin_lock_irqsave(uhba->lock, flags);
+ qstor_reset(uhba);
+ spin_unlock_irqrestore(uhba->lock, flags);

28) "SECTOR_BYTES" -- how many more definitions of the 512-byte sector
do we need? :)

29) You use QSTOR_PACKED_STRUCTURE when not needed, which causes gcc to
generate horribly sub-optimal code

30) style: use u32/u64 as kernel standard.

+ unsigned pLEN :32; // Byte count
+ unsigned spare32 :32; // 0

31) none of your bitfield structures are endian-safe
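
FWIW the endian-safe way is explicit fixed-width fields plus the
cpu_to_leXX() helpers at the accessors. A sketch, with the field names
invented:

struct qstor_prd {
        u64     addr;           /* bus address, little-endian on the wire */
        u32     len;            /* byte count, little-endian */
        u32     spare;          /* always zero */
};

static void qstor_fill_prd(struct qstor_prd *prd, u64 dma_addr, u32 len)
{
        prd->addr  = cpu_to_le64(dma_addr);     /* CPU-endian -> LE */
        prd->len   = cpu_to_le32(len);
        prd->spare = 0;
}

Note the fields are naturally aligned (8/4/4), so no packing is needed,
which also addresses 29.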

2004-09-14 17:23:14

by Mark Lord

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

Alan Cox wrote:
> The correct ioctl return for an unknown command is -ENOTTY (that's a
> mistake still in many existing drivers, so no surprise it's still
> being copied).

Quoting linux/Documentation/scsi_mid_low_api.txt:
*
* Unfortunately some applications expect -EINVAL and react badly
* when -ENOTTY is returned; stick with -EINVAL.

Looks like a documentation fix is needed then.

This is a hardware RAID device, with various gradations of sharing
possible between hardware and software. It was originally written
pre-libata, and the RAID functionality in particular does not map
well to libata. Nor do the host-queuing implementation and other features.

Thanks Alan. I'll address the other points you brought up later.

Cheers
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

2004-09-14 17:37:58

by Jeff Garzik

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

On Tue, Sep 14, 2004 at 01:27:54PM -0400, Mark Lord wrote:
> Here's a question for you: like all of the other RAID drivers,
> this one needs an interface to a userland RAID management GUI.
>
> The usual method for this is to create a fake character device driver,
> and use that as the interface to userland. This is commonly done,
> but is it the best way to handle this? A /proc/ or /sys/ interface
> could achieve similar goals, but without the need of a fake device.
>
> We can go either way with this one, so let's hear some opinions on it.

Well,

* if the userland interface is 100% sending cdbs or taskfiles, then I
would prefer that Jens Axboe's "bsg" be used. It's a chardev interface
for sending/receiving commands to a request queue.

* otherwise, I would pick either chrdev or sysfs. if you gotta support
2.4, I guess that means chrdev.


> For the rest, this driver has been around (as a vendor driver) since before
> libata became usable, and certainly before libata existed in 2.4.xx.
> The driver will eventually need to compile and run in 2.4.20,
> for customers using old Red Hat kernels. It's not there yet,
> but if it were to lean more heavily on 2.6.xx stuff,
> then that will be more difficult to achieve.

libata and all its drivers work on RHEL2.1 (2.4.9), and someone is
even crazy enough to be porting libata to 2.2.x ;-)

Jeff



2004-09-14 17:37:53

by Mark Lord

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

Thanks Jeff,

I'll look into most of your points and give responses and changes
where required. But first, a few overall notes on the approach.

This is a hardware RAID device, but it requires driver knowledge
of the RAID features. It does not map to libata at all, unfortunately,
because all of the queuing features are completely non-standard for SATA,
and the RAID stuff is (as usual) peculiar to the chip.

Here's a question for you: like all of the other RAID drivers,
this one needs an interface to a userland RAID management GUI.

The usual method for this is to create a fake character device driver,
and use that as the interface to userland. This is commonly done,
but is it the best way to handle this? A /proc/ or /sys/ interface
could achieve similar goals, but without the need of a fake device.

We can go either way with this one, so let's hear some opinions on it.

For the rest, this driver has been around (as a vendor driver) since before
libata became usable, and certainly before libata existed in 2.4.xx.
The driver will eventually need to compile and run in 2.4.20,
for customers using old Red Hat kernels. It's not there yet,
but if it were to lean more heavily on 2.6.xx stuff,
then that will be more difficult to achieve.

Cheers
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

2004-09-14 18:06:09

by Mark Lord

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

>In particular, releasing the lock and sleeping would be very wrong
>in the ->queuecommand and error handling paths
>(alas... I would love to sleep in the fine-grained eh hooks)

Mmm.. definitely no sleeps in queuecommand(), but sleeping seems
necessary in host_reset_handler() -- the alternative is to just
busy-wait inline, which would really not be good.

Isn't the protocol for the eh host_reset_handler() basically
just "do the reset, and return whether it worked or not?".
If so, the driver really has to hang around until the reset
completes so that correct status can be returned. This generally
takes a couple of milliseconds in practice (I measured it).

Is there a better way to do that?

I really would prefer never to have to reset the drives,
but when they have a queuing error, many of them simply
won't talk to us again without a reset. The driver avoids
the reset as much as it can for other situations, though.

Cheers
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

2004-09-14 18:15:46

by Jeff Garzik

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

On Tue, Sep 14, 2004 at 02:03:28PM -0400, Mark Lord wrote:
> Whatever this driver does, it has to be reasonably portable
> back to early 2.4.xx kernels, so it cannot depend too much
> upon newly (or to-be) implemented semantics in 2.6.xx.

Feel free to examine libata's use of ->eh_strategy_handler
in 2.4.x ;-)

Jeff



2004-09-14 18:18:28

by Jeff Garzik

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

On Tue, Sep 14, 2004 at 02:03:28PM -0400, Mark Lord wrote:
> It looks to me as if the eh code prevents further queuecommand()
> calls while the LLD *_reset_handler() code is running.
> I wonder if it also does so for the eh_strategy_handler() ?

Yes, it definitely does, by definition:

SCSI's fine-grained eh hooks are called inside scsi_unjam_host(),
which is the default ->eh_strategy_handler.

Jeff



2004-09-14 18:12:54

by Jeff Garzik

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

Mark Lord wrote:
> >In particular, releasing the lock and sleeping would be very wrong
> >in the ->queuecommand and error handling paths
> >(alas... I would love to sleep in the fine-grained eh hooks)
>
> Mmm.. definitely no sleeps in queuecommand(), but sleeping seems
> necessary in host_reset_handler() -- the alternative is to just
> busy-wait inline, which would really not be good.
>
> Isn't the protocol for the eh host_reset_handler() basically
> just "do the reset, and return whether it worked or not?".
> If so, the driver really has to hang around until the reset
> completes so that correct status can be returned. This generally
> takes a couple of milliseconds in practice (I measured it).
>
> Is there a better way to do that?
>
> I really would prefer never to have to reset the drives,
> but when they have a queuing error, many of them simply
> won't talk to us again without a reset. The driver avoids
> the reset as much as it can for other situations, though.


James and I occasionally talk about this. This is a _big_ reason why I
use the ->eh_strategy_handler() rather than the more fine-grained ->eh
hooks: you get unlocked, unfettered sleep privileges inside the SCSI EH
thread.

The SCSI LLD API really needs to -not- spinlock on the EH hooks, and
instead simply guarantee that ->queuecommand and other hooks will not be
called while the driver is in EH.

ISTR James didn't disagree, so maybe a patch can be worked out...

Of course, you could always just use ->eh_strategy_handler and do 100%
of the error handling yourself. That's the route libata chose.

Jeff


2004-09-14 18:38:06

by James Bottomley

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

On Tue, 2004-09-14 at 13:00, Jeff Garzik wrote:
> 9) use of do_sleep paradigm is dubious: you should instead try to keep
> your locked code regions as small as possible. in general, this code
> has far too many unlock-doit-lock sections.
>
> Experience has shown that too much unlock-doit-lock leads to bugs and
> increases the pain when analyzing your locking.
>
> In particular, releasing the lock and sleeping would be very wrong in
> the ->queuecommand and error handling paths (alas... I would love to
> sleep in the fine-grained eh hooks)

Actually, it's only wrong in queuecommand because that can be called in
softirq context.

Sleeping in the eh paths is fine (as long as you drop the locks that the
EH thread has uselessly taken for you). Indeed it's often required
since the return is supposed to tell the eh thread whether the action
was successful or not.
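
In driver terms that usually ends up looking like this (a sketch; the
helper is made up):

static int qstor_eh_host_reset(struct scsi_cmnd *cmd)
{
        struct Scsi_Host *host = cmd->device->host;
        int rc;

        spin_unlock_irq(host->host_lock);       /* the EH thread took this for us */
        rc = qstor_reset_and_wait(host);        /* free to msleep() in here */
        spin_lock_irq(host->host_lock);         /* retake before returning */

        return rc ? FAILED : SUCCESS;
}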

James


2004-09-14 18:55:10

by James Bottomley

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

On Tue, 2004-09-14 at 14:35, Jeff Garzik wrote:
> The lock is taken in the SCSI layer with spin_lock_irqsave(), but the
> low-level driver cannot perform the exact opposite,
> spin_unlock_irqrestore(). The best they can do is spin_unlock_irq(),
> which isn't 100% the same.

That's what they do if you look.

The eh_ stubs are only called from the eh_ thread, so it's safe to
enable interrupts as well.

The business of the mid-layer taking the locks is an annoying holdover
from the "drivers don't need to do locking" mentality. Unfortunately
most drivers now simply drop the locks immediately they begin an eh_
entry point and reacquire them just prior to returning ... which makes
all the eh code look messy.

James


2004-09-14 18:43:09

by Jeff Garzik

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

On Tue, Sep 14, 2004 at 02:25:35PM -0400, James Bottomley wrote:
> Sleeping in the eh paths is fine (as long as you drop the locks that the
> EH thread has uselessly taken for you). Indeed it's often required
> since the return is supposed to tell the eh thread whether the action
> was successful or not.

I'm not sure this is true for all arches?

The lock is taken in the SCSI layer with spin_lock_irqsave(), but the
low-level driver cannot perform the exact opposite,
spin_unlock_irqrestore(). The best they can do is spin_unlock_irq(),
which isn't 100% the same.

Jeff



2004-09-14 18:12:45

by Mark Lord

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

>The SCSI LLD API really needs to -not- spinlock on the EH hooks,
>and instead simply guarantee that ->queuecommand and other hooks
>will not be called while the driver is in EH.
>
>ISTR James didn't disagree, so maybe a patch can be worked out...

It looks to me as if the eh code prevents further queuecommand()
calls while the LLD *_reset_handler() code is running.
I wonder if it also does so for the eh_strategy_handler() ?

Have to look at it, I guess.

>Of course, you could always just use ->eh_strategy_handler
>and do 100% of the error handling yourself.

Mmmm.. yes, that may well be better.

Whatever this driver does, it has to be reasonably portable
back to early 2.4.xx kernels, so it cannot depend too much
upon newly (or to-be) implemented semantics in 2.6.xx.

Cheers
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

2004-09-15 02:41:54

by Mark Lord

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

>Actually, it's only wrong in queuecommand because that can be called in
>softirq context.
>
>Sleeping in the eh paths is fine (as long as you drop the locks that the
>EH thread has uselessly taken for you).

Good, that's how I understood it as well.

But the locking is certainly a mess as-is in the QStor driver.
Sure, it is actually all technically correct, but hard to follow.

I believe I can remove nearly all of it and really tidy things up
as a result.

Thanks guys, this has been really helpful so far.
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

2004-09-15 02:47:36

by Jeff Garzik

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2


Groovy. FWIW (if it wasn't obvious from context) my objection in
general to the driver is withdrawn, since you explained it is RAID and
not an ATA driver.

I would really like to work on consolidating the ATA code in libata,
though. As the name implies, it's a library -- don't feel that your
driver must conform to the libata driver API in order to make use of all
its functions. And feel free to add to it.

Jeff



2004-09-15 12:37:09

by Mark Lord

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

> I would really like to work on consolidating the ATA code in libata,
> though. As the name implies, it's a library -- don't feel that your
> driver must conform to the libata driver API in order to make use of all
> its functions. And feel free to add to it.

Yes, there are definite code sharing possibilities there to be explored.
Right now, my first priority is to get support for this hardware
into the kernel. This same driver source will also be backported
to the mid-2.4.xx series, both Red Hat and generic.

After that, we can modify some interfaces to reduce the small overlaps
that may be present.

Next revision is due out later today. It may still have a few warts
to work out, but I think it is looking much better than before.

Better to have a decent hardware driver within the tree
than an unknown vendor-only binary driver outside the tree.

Cheers
--
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

2004-09-15 17:01:59

by Jeff Garzik

Subject: Re: [PATCH] New QStor SATA/RAID Driver for 2.6.9-rc2

Mark Lord wrote:
> Currently on Linux, that interface is called "SCSI".
> I think it might not be unreasonable to gradually evolve
> the SCSI host interface to include, say, a non-translating
> queuecommand() method, and associated pals.
[...]
> We practically have that already today.
> The SCSI mid-layer is a nice generic block device glue system.
> We just need perhaps to make it less SCSI-specific.


You seem to have independently reached the same conclusion I did :)

To be specific, SCSI provides LLD infrastructure that block does not:
1) infrastructure for queueing, retrying, and timing out requests
2) an error handling thread
3) a standard method of addressing attached devices
4) a standard method of submitting raw commands from userspace

It is my goal to shift this infrastructure from SCSI to block over time.
There is a fair amount of queueing infrastructure now in 2.6 (part of
#1), and Jens already has test code for #4.

I had hoped to start working on this in 2.7, but with the new dev model
2.7 is postponed indefinitely... so I guess I'll start working on it
sooner rather than later :)

Jeff