2005-09-27 01:01:26

by Lukasz Kosewski

[permalink] [raw]
Subject: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

Hey Jeff, all,

Patch 2/3 for libata hotswapping, the 'general purpose glitch-free
incredibly awesome' hotswapping API.

Lots of comments available in patch and header, please review, apply,
and convert other drivers to it if you like it.

Refer to patch 3 for a reference implementation.

Luke Kosewski


Attachments:
(No filename) (304.00 B)
02-libata_hotswap_infrastructure-2.6.14-rc2-git5.diff (11.33 kB)
Download all attachments

2005-09-27 19:42:16

by Bernhard C. Schrenk

[permalink] [raw]
Subject: Re: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

Hi Luke,

I tested your previous version of this patch on an AMD64. I found one 64 bit
bug, crashing the kernel. After fixing it, it worked quite well.

Please change in the file libata-core.c the line

ap->hotplug_timer.data = (unsigned int)ap;

to

ap->hotplug_timer.data = (unsigned long)ap;

Thanks for this great patch,
Bernhard

2005-09-27 21:47:24

by Lukasz Kosewski

[permalink] [raw]
Subject: Re: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

On 9/27/05, Bernhard C. Schrenk <[email protected]> wrote:
> I tested your previous version of this patch on an AMD64. I found one 64 bit
> bug, crashing the kernel. After fixing it, it worked quite well.
>
> Please change in the file libata-core.c the line
>
> ap->hotplug_timer.data = (unsigned int)ap;
>
> to
>
> ap->hotplug_timer.data = (unsigned long)ap;

Hey Bernhard, Thomas, Jeff,

This bug was the product of my flagrant attempts to defy the C
standard of sizeof(long) == sizeof(void *). Apparently, I have failed
to slip it into the kernel once again. Super props go to me since I
even have an Athlon 64 machine and didn't commit it to testing this.

Jeff, please accept this revised patch in place of the original patch
2, the only difference being the change suggested above.

Thanks to Thomas Lustig and Bernhard Schrenk for both pointing this out!

Luke Kosewski


Attachments:
(No filename) (875.00 B)
02-libata_hotswap_infrastructure-2.6.14-rc2-git5-2.diff (11.45 kB)
Download all attachments

2005-09-28 19:10:44

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

Lukasz Kosewski wrote:
> @@ -1147,6 +1148,11 @@ static void ata_dev_identify(struct ata_
> return;
> }
>
> + /* Necessary if we had an LBA48 drive in, we pulled it out, and put a
> + * non-LBA48 drive on the same ap.
> + */
> + dev->flags &= ~ATA_DFLAG_LBA48;
> +

probably should just clear out all the flags...


> @@ -3692,6 +3698,102 @@ idle_irq:
> return 0; /* irq not handled */
> }
>
> +void ata_check_kill_qc(struct ata_port *ap)
> +{
> + struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->active_tag);
> +
> + if (unlikely(qc)) {
> + /* This is SO bad. But we can't just run
> + * ata_qc_complete without doing this, because
> + * ata_scsi_qc_complete wants to talk to the device,
> + * and we can't let it do that since it doesn't exist
> + * anymore.
> + */

is ata_chk_err() call in ata_to_sense_error() the only place that needs
to talk to the hardware? If so, maybe we could work around that by
making sure it is passed register values, avoiding the need to poke the h/w.



> + ata_scsi_prepare_qc_abort(qc);
> + ata_qc_complete(qc, ATA_ERR);
> + }
> +}
> +
> +static void ata_hotplug_task(void *_data)
> +{
> + struct ata_port *ap = (struct ata_port *)_data;
> + enum hotplug_states hotplug_todo;
> +
> + spin_lock(&ap->host_set->lock);
> + hotplug_todo = ap->plug; // Make sure we don't modify while reading!
> + spin_unlock(&ap->host_set->lock);

this lock should always be acquired using spin_lock_irqsave()


> + if (hotplug_todo == ATA_HOTPLUG_PLUG) {
> + DPRINTK("Got a plug request on port %d\n", ap->id);
> +
> + /* This might be necessary if we unplug and plug in a drive
> + * within ATA_TMOUT_HOTSWAP / HZ seconds... due to the debounce
> + * timer, one event is generated. Since the last event was a
> + * plug, the unplug routine never gets called, so we need to
> + * clean up the mess first. If there was never a drive here in
> + * the first place, this will just do nothing. Otherwise, it
> + * basically prevents 'plug' from being called twice with no
> + * unplug in between.
> + */
> + ata_scsi_handle_unplug(ap);
> +
> + // The following flag is necessary on some Seagate drives.
> + ap->flags |= ATA_FLAG_SATA_RESET;

we can't unconditionally set this for all controllers

For PATA hotplug, which this code will eventually handle, the SATA
SControl register doesn't even exist :)


> + ap->udma_mask = ap->orig_udma_mask;

dumb question -- what is this for?

for hotplug we'll want to go through the entire probe sequence,
configuring pio/dma masks all over again.


> + if (!ata_bus_probe(ap)) //Drive found! Tell the SCSI layer!
> + ata_scsi_handle_plug(ap);
> + } else if (hotplug_todo == ATA_HOTPLUG_UNPLUG) {
> + DPRINTK("Got an unplug request on port %d\n", ap->id);
> +
> + ata_scsi_handle_unplug(ap);
> + } else /* FIXME: Some kind of error condition? */
> + BUG();
> +}
> +
> +static void ata_hotplug_timer_func(unsigned long _data)
> +{
> + struct ata_port *ap = (struct ata_port *)_data;
> +
> + queue_work(ata_hotplug_wq, &ap->hotplug_task);
> +}

if this is all the timer function needs to do, then you can just use
queue_work_delayed()


> +void ata_hotplug_plug(struct ata_port *ap)
> +{
> + del_timer(&ap->hotplug_timer);
> +
> + if (ap->ops->hotplug_plug_janitor)
> + ap->ops->hotplug_plug_janitor(ap);
> +
> + /* This line should be protected by a host_set->lock. If we follow the
> + * call chain up from this, it should be called from ata_hotplug_unplug
> + * or ata_hotplug_plug, which should be called from an interrupt
> + * handler. Make sure the call to either of those functions is
> + * protected.
> + */
> + ap->plug = ATA_HOTPLUG_PLUG;
> +
> + ap->hotplug_timer.expires = jiffies + ATA_TMOUT_HOTSWAP;
> + add_timer(&ap->hotplug_timer);
> +}
> +
> +void ata_hotplug_unplug(struct ata_port *ap)

I prefer the names ata_hot_plug() and ata_hot_unplug()


> +{
> + ata_port_disable(ap); //Gone gone gone!
> +
> + del_timer(&ap->hotplug_timer);
> +
> + if (ap->ops->hotplug_unplug_janitor)
> + ap->ops->hotplug_unplug_janitor(ap);
> +
> + /* See comment near similar line in ata_hotplug_plug function.
> + */
> + ap->plug = ATA_HOTPLUG_UNPLUG;
> +
> + ap->hotplug_timer.expires = jiffies + ATA_TMOUT_HOTSWAP;
> + add_timer(&ap->hotplug_timer);
> +}
> +
> /**
> * ata_interrupt - Default ATA host interrupt handler
> * @irq: irq line (unused)
> @@ -3925,7 +4027,12 @@ static void ata_host_init(struct ata_por
> ap->cbl = ATA_CBL_NONE;
> ap->active_tag = ATA_TAG_POISON;
> ap->last_ctl = 0xFF;
> + ap->orig_udma_mask = ent->udma_mask;
>
> + ap->hotplug_timer.function = ata_hotplug_timer_func;
> + ap->hotplug_timer.data = (unsigned int)ap;
> + init_timer(&ap->hotplug_timer);
> + INIT_WORK(&ap->hotplug_task, ata_hotplug_task, ap);
> INIT_WORK(&ap->packet_task, atapi_packet_task, ap);
> INIT_WORK(&ap->pio_task, ata_pio_task, ap);
>
> @@ -4541,6 +4648,11 @@ static int __init ata_init(void)
> ata_wq = create_workqueue("ata");
> if (!ata_wq)
> return -ENOMEM;
> + ata_hotplug_wq = create_workqueue("ata_hotplug");
> + if (!ata_hotplug_wq) {
> + destroy_workqueue(ata_wq);
> + return -ENOMEM;
> + }
>
> printk(KERN_DEBUG "libata version " DRV_VERSION " loaded.\n");
> return 0;
> @@ -4549,6 +4661,7 @@ static int __init ata_init(void)
> static void __exit ata_exit(void)
> {
> destroy_workqueue(ata_wq);
> + destroy_workqueue(ata_hotplug_wq);
> }
>
> module_init(ata_init);
> @@ -4604,6 +4717,8 @@ EXPORT_SYMBOL_GPL(ata_dev_classify);
> EXPORT_SYMBOL_GPL(ata_dev_id_string);
> EXPORT_SYMBOL_GPL(ata_dev_config);
> EXPORT_SYMBOL_GPL(ata_scsi_simulate);
> +EXPORT_SYMBOL_GPL(ata_hotplug_plug);
> +EXPORT_SYMBOL_GPL(ata_hotplug_unplug);
>
> #ifdef CONFIG_PCI
> EXPORT_SYMBOL_GPL(pci_test_config_bits);
> diff -rpuN linux-2.6.14-rc1/drivers/scsi/libata.h linux-2.6.14-rc1-new/drivers/scsi/libata.h
> --- linux-2.6.14-rc1/drivers/scsi/libata.h 2005-09-12 23:12:09.000000000 -0400
> +++ linux-2.6.14-rc1-new/drivers/scsi/libata.h 2005-09-14 19:57:53.000000000 -0400
> @@ -44,6 +44,7 @@ extern struct ata_queued_cmd *ata_qc_new
> extern void ata_qc_free(struct ata_queued_cmd *qc);
> extern int ata_qc_issue(struct ata_queued_cmd *qc);
> extern int ata_check_atapi_dma(struct ata_queued_cmd *qc);
> +extern void ata_check_kill_qc(struct ata_port *ap);
> extern void ata_dev_select(struct ata_port *ap, unsigned int device,
> unsigned int wait, unsigned int can_sleep);
> extern void ata_tf_to_host_nolock(struct ata_port *ap, struct ata_taskfile *tf);
> @@ -79,6 +80,9 @@ extern void ata_scsi_badcmd(struct scsi_
> extern void ata_scsi_rbuf_fill(struct ata_scsi_args *args,
> unsigned int (*actor) (struct ata_scsi_args *args,
> u8 *rbuf, unsigned int buflen));
> +extern void ata_scsi_prepare_qc_abort(struct ata_queued_cmd *qc);
> +extern void ata_scsi_handle_plug(struct ata_port *ap);
> +extern void ata_scsi_handle_unplug(struct ata_port *ap);
>
> static inline void ata_bad_scsiop(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *))
> {
> diff -rpuN linux-2.6.14-rc1/drivers/scsi/libata-scsi.c linux-2.6.14-rc1-new/drivers/scsi/libata-scsi.c
> --- linux-2.6.14-rc1/drivers/scsi/libata-scsi.c 2005-09-12 23:12:09.000000000 -0400
> +++ linux-2.6.14-rc1-new/drivers/scsi/libata-scsi.c 2005-09-14 19:57:53.000000000 -0400
> @@ -716,6 +716,53 @@ static int ata_scsi_qc_complete(struct a
> return 0;
> }
>
> +static int ata_scsi_qc_abort(struct ata_queued_cmd *qc, u8 drv_stat)
> +{
> + struct scsi_cmnd *cmd = qc->scsicmd;
> +
> + cmd->result = SAM_STAT_TASK_ABORTED; //FIXME: Is this what we want?

Just trace through the error handling code, the answer should fall out
from there...


> + qc->scsidone(cmd);
> +
> + return 0;
> +}
> +
> +void ata_scsi_prepare_qc_abort(struct ata_queued_cmd *qc)
> +{
> + /* For some reason or another, we can't allow a normal scsi_qc_complete.
> + * Note that at this point, we must be SURE the command is going to time
> + * out, or else we might be changing this as it's being called. Never a
> + * good thing.
> + */
> + if (qc->complete_fn == ata_scsi_qc_complete);
> + qc->complete_fn = ata_scsi_qc_abort;
> +}
> +
> +void ata_scsi_handle_plug(struct ata_port *ap)
> +{
> + //Currently SATA only

Please change comments to follow the standard style in the rest of the
drivers: /* foo */


> + scsi_add_device(ap->host, 0, 0, 0);
> +}
> +
> +void ata_scsi_handle_unplug(struct ata_port *ap)
> +{
> + //SATA only, no PATA

why? This function looks generic enough to work for PATA

2005-09-28 22:53:55

by Alan

[permalink] [raw]
Subject: Re: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

For PATA the requirements I'm aware of are

- Interface for user to say "am about to swap"
- Interface for user to say "have swapped"
- Must quiesce both master and slave before swap (or one per cable)
- Must reset to PIO_SLOW and then recompute modes for both devices
becuse it is possible that changing one changes the other timings
- The above is true for *unplug* too. A straight unplug may speed up the
other drive!
- Post hotswap need to reconfigure both drives as if from scratch

2005-10-03 14:30:22

by Lukasz Kosewski

[permalink] [raw]
Subject: Re: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

On 9/28/05, Jeff Garzik <[email protected]> wrote:
> > + dev->flags &= ~ATA_DFLAG_LBA48;
> > +
>
> probably should just clear out all the flags...

I'll look into making this cleaner :) During testing this was the
only one that was necessary, but I'll do some clean-up foo on this.

> is ata_chk_err() call in ata_to_sense_error() the only place that needs
> to talk to the hardware? If so, maybe we could work around that by
> making sure it is passed register values, avoiding the need to poke the h/w.

Yep, it is the only place as far as I can see. As for passing
register values, the reason I decided not to do that is because, say,
a user unplugged his drive because there WAS some kind of error which
set ATA_ERR. This would attempt to perform some kind of error check
which would compound the problem. However, I agree this is dirty :)
Should I just check the register values, & ~ATA_ERR, and then pass that in?

> > + spin_lock(&ap->host_set->lock);
> > + hotplug_todo = ap->plug; // Make sure we don't modify while reading!
> > + spin_unlock(&ap->host_set->lock);
>
> this lock should always be acquired using spin_lock_irqsave()

Oops.

> > + ata_scsi_handle_unplug(ap);
> > +
> > + // The following flag is necessary on some Seagate drives.
> > + ap->flags |= ATA_FLAG_SATA_RESET;
>
> we can't unconditionally set this for all controllers
>
> For PATA hotplug, which this code will eventually handle, the SATA
> SControl register doesn't even exist :)

if (ap & ATA_FLAG_SATA) ap->flags |= ATA_FLAG_SATA_RESET
? :)

> > + ap->udma_mask = ap->orig_udma_mask;
>
> dumb question -- what is this for?
>
> for hotplug we'll want to go through the entire probe sequence,
> configuring pio/dma masks all over again.

OK I'll admit I may not have looked through the code enough on that
one, but here's a situation I remember about this:
Had a drive which supported UDMA 5 max. Unplugged it, plugged in a
drive which supported UDMA 6, but libata reported UDMA 5 max. Since
the flags aren't reset, and ata_mode_string() works from slowest to
fastest, this will always happen unless we reset the flags to some
default value. So... voila. Do you have a suggestion for making this
cleaner?

> > +static void ata_hotplug_timer_func(unsigned long _data)
> > +{
> > + struct ata_port *ap = (struct ata_port *)_data;
> > +
> > + queue_work(ata_hotplug_wq, &ap->hotplug_task);
> > +}
>
> if this is all the timer function needs to do, then you can just use
> queue_work_delayed()

I agree.

> > +void ata_hotplug_plug(struct ata_port *ap)

<snip>

> > +void ata_hotplug_unplug(struct ata_port *ap)
>
> I prefer the names ata_hot_plug() and ata_hot_unplug()

Sure.

> > +static int ata_scsi_qc_abort(struct ata_queued_cmd *qc, u8 drv_stat)
> > +{
> > + struct scsi_cmnd *cmd = qc->scsicmd;
> > +
> > + cmd->result = SAM_STAT_TASK_ABORTED; //FIXME: Is this what we want?
>
> Just trace through the error handling code, the answer should fall out
> from there...

Yeah... I put that fixme in there in August to remind myself to do
that... apparently the reminder didn't help. I'll look into it.

> > +void ata_scsi_handle_plug(struct ata_port *ap)
> > +{
> > + //Currently SATA only
>
> Please change comments to follow the standard style in the rest of the
> drivers: /* foo */

Righto.

> > +void ata_scsi_handle_unplug(struct ata_port *ap)
> > +{
> > + //SATA only, no PATA
>
> why? This function looks generic enough to work for PATA

This just refers to the fact that I'm currently just looking for a
drive with channel, id, and lun all equal to 0. I'll remove the
comment.

OK, looks like I have my work cut out for me. Thanks for the feedback
Jeff, I'll resubmit this as soon as I've made changes and done some
more testing to ensure it's still stable.

2005-10-03 15:19:58

by Lukasz Kosewski

[permalink] [raw]
Subject: Re: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

Hey Alan,

On 9/28/05, Alan Cox <[email protected]> wrote:
> For PATA the requirements I'm aware of are
>
> - Interface for user to say "am about to swap"

You mean something like "echo scsi-remove-single-device a b c d" >
/proc/scsi/scsi? I guess the sysfs equivalent?

> - Interface for user to say "have swapped"

I suppose ditto.

> - Must quiesce both master and slave before swap (or one per cable)

The way I've written my infrastructure, this seems as if it'll just
require another carefully placed hook function.

> - Must reset to PIO_SLOW and then recompute modes for both devices
> becuse it is possible that changing one changes the other timings

This shouldn't be hard since I already do a similar reset by resetting
udma_flags to a pre-init state. Probably in an if (!(ap->flags &
ATA_FLAG_SATA)).

> - The above is true for *unplug* too. A straight unplug may speed up the
> other drive!
> - Post hotswap need to reconfigure both drives as if from scratch

Hmm, this seems far more complicated... basically during a swap
operation, we have to shut down all I/O to the other drive on the
cable (if there is one), if I read you correctly, and then reconfigure
both drives once one is plugged in.

>From what you're saying, it seems to me that the infrastructure I put
forth will work as is, plus some if statements and extraneous
PATA-only functions (and functionality like shutting down the other
disk on the cable until the user calls the 'warm-swap complete'
function').

How about this; I want this SATA hotswapping stuff to be tested, so
I'll commit my patches for 'SATA only' for the time being. I'll stare
at them for a while and then see what kind of PATA-specific if
statements and hooks are necessary in the code?

Luke

2005-10-03 15:43:50

by Alan

[permalink] [raw]
Subject: Re: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

On Llu, 2005-10-03 at 11:19 -0400, Lukasz Kosewski wrote:
> You mean something like "echo scsi-remove-single-device a b c d" >
> /proc/scsi/scsi? I guess the sysfs equivalent?
>
> > - Interface for user to say "have swapped"
>
> I suppose ditto.

Yes. A lot of PATA does swapping only if you tell it first so it can
kill IORDY or tristate the bus.

> > - Post hotswap need to reconfigure both drives as if from scratch
>
> Hmm, this seems far more complicated... basically during a swap
> operation, we have to shut down all I/O to the other drive on the
> cable (if there is one), if I read you correctly, and then reconfigure
> both drives once one is plugged in.

Yes.

> How about this; I want this SATA hotswapping stuff to be tested, so
> I'll commit my patches for 'SATA only' for the time being. I'll stare
> at them for a while and then see what kind of PATA-specific if
> statements and hooks are necessary in the code?

Makes sense to me - the PATA stuff is slowly developing and its getting
closer to submittable as bits of the core code get merged.

2005-10-03 15:47:14

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

Lukasz Kosewski wrote:
> On 9/28/05, Jeff Garzik <[email protected]> wrote:
>
>>>+ dev->flags &= ~ATA_DFLAG_LBA48;
>>>+
>>
>>probably should just clear out all the flags...
>
>
> I'll look into making this cleaner :) During testing this was the
> only one that was necessary, but I'll do some clean-up foo on this.
>
>
>>is ata_chk_err() call in ata_to_sense_error() the only place that needs
>>to talk to the hardware? If so, maybe we could work around that by
>>making sure it is passed register values, avoiding the need to poke the h/w.
>
>
> Yep, it is the only place as far as I can see. As for passing
> register values, the reason I decided not to do that is because, say,
> a user unplugged his drive because there WAS some kind of error which
> set ATA_ERR. This would attempt to perform some kind of error check
> which would compound the problem. However, I agree this is dirty :)
> Should I just check the register values, & ~ATA_ERR, and then pass that in?
>
>
>>>+ spin_lock(&ap->host_set->lock);
>>>+ hotplug_todo = ap->plug; // Make sure we don't modify while reading!
>>>+ spin_unlock(&ap->host_set->lock);
>>
>>this lock should always be acquired using spin_lock_irqsave()
>
>
> Oops.
>
>
>>>+ ata_scsi_handle_unplug(ap);
>>>+
>>>+ // The following flag is necessary on some Seagate drives.
>>>+ ap->flags |= ATA_FLAG_SATA_RESET;
>>
>>we can't unconditionally set this for all controllers
>>
>>For PATA hotplug, which this code will eventually handle, the SATA
>>SControl register doesn't even exist :)
>
>
> if (ap & ATA_FLAG_SATA) ap->flags |= ATA_FLAG_SATA_RESET
> ? :)
>
>
>>>+ ap->udma_mask = ap->orig_udma_mask;
>>
>>dumb question -- what is this for?
>>
>>for hotplug we'll want to go through the entire probe sequence,
>>configuring pio/dma masks all over again.
>
>
> OK I'll admit I may not have looked through the code enough on that
> one, but here's a situation I remember about this:
> Had a drive which supported UDMA 5 max. Unplugged it, plugged in a
> drive which supported UDMA 6, but libata reported UDMA 5 max. Since
> the flags aren't reset, and ata_mode_string() works from slowest to
> fastest, this will always happen unless we reset the flags to some
> default value. So... voila. Do you have a suggestion for making this
> cleaner?

Well, overall what needs to happen for newly-plugged-in devices is the a
re-run of the probe sequence. The flags should certainly be reset to a
default value, the same value that the device would see had it been
plugged in when the driver loaded: ata_bus_probe() call sequence, after
initialization from ata_host_init(). This may require saving a few
pieces of data, as you have done, agreed.

Jeff


2005-10-03 15:53:33

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 2/3] Add disk hotswap support to libata RESEND #5

Lukasz Kosewski wrote:
> How about this; I want this SATA hotswapping stuff to be tested, so
> I'll commit my patches for 'SATA only' for the time being. I'll stare
> at them for a while and then see what kind of PATA-specific if
> statements and hooks are necessary in the code?

Ideally we should just create hooks for any SATA-specific behavior, and
ensure that nothing SATA-specific is written into any of the core paths.

One of the SATA controllers, Intel ICH5 & ICH6, does not have a hotplug
interrupt, but yet supports "coldplug":

* user indicates to kernel, to disable the SATA port
* kernel says "OK, it's disabled"
* user disconnects hard drive
and
* SATA port is disabled
* user connects hard drive
* user indicates to kernel, to enable SATA port
* kernel says "OK, I've turned it on" and probes it

This is a real-world, high-volume SATA case, yet it functionally behaves
like PATA.

So that causes us to consider various entry points:

* {something}, be it a hot-unplug interrupt or user write(2) to sysfs,
tells us a device is gone
* {something}, be it a hot-plug interrupt or user write(2) to sysfs,
tells us a new device appeared

So for either SATA or PATA, it should look similar in the core: we just
need a "kick", a function call that triggers one of these two actions.
The handling of those actions [your code] should hopefully be pretty
generic. ;-)

Thanks for working on this!

Jeff