LinuxLists.cc - [PATCH v4 0/2] [SCSI] Asynchronous event notification infrastructure

2007-10-29 14:42:23

Subject: [PATCH v4 0/2] [SCSI] Asynchronous event notification infrastructure

This is the next revision of the SCSI event notification infrastructure
patchset, enabling SATA Asynchronous Notification ("AN") for CD/DVD
devices that support it.

For devices that support SATA AN (only very recent ones do), this means
that HAL and other userspace utilities no longer need to repeatedly poll
the CD/DVD device to determine if the user has changed the media.

This revision takes into account James' comments from earlier today,
modulo the following notes:

* I think the various event attributes should always be present,
for all devices at all times. If various events are not supported,
the attribute will of course return zero (false, not supported).

* I do not think this work should be blocked behind a revamp
of the attribute group interface.

* I was slack and did not bother to implement the 'set' operation
for the attributes. This can easily be done at a later time in a
separate patch. It is not a merge stopper to have the driver
exclusively control the event mask, rather than driver+sysfs.

2007-10-29 14:42:35

by Jeff Garzik

[permalink] [raw]

Subject: [PATCH v4 1/2] SCSI: Asynchronous event notification infrastructure

Originally based on a patch by Kristen Carlson Accardi @ Intel.
Copious input from James Bottomley.

Signed-off-by: Jeff Garzik <[email protected]>
---
drivers/scsi/scsi_lib.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
drivers/scsi/scsi_scan.c | 2 +
drivers/scsi/scsi_sysfs.c | 20 +++++++++++++
include/scsi/scsi_device.h | 12 ++++++++
4 files changed, 100 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fdaf0..f55ec80 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -18,6 +18,7 @@
#include <linux/delay.h>
#include <linux/hardirq.h>
#include <linux/scatterlist.h>
+#include <linux/bitmap.h>

#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -2115,6 +2116,71 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
EXPORT_SYMBOL(scsi_device_set_state);

/**
+ * sdev_evt_thread - send a uevent for each scsi event
+ * @work: work struct for scsi_device
+ *
+ * Emit all queued media events as environment variables
+ * sent during a uevent.
+ */
+void scsi_evt_thread(struct work_struct *work)
+{
+ struct scsi_device *sdev;
+ char *envp[SDEV_EVT_LAST + 2];
+ DECLARE_BITMAP(mask, SDEV_EVT_MAXBITS);
+ unsigned long flags;
+ int evt, idx;
+
+ sdev = container_of(work, struct scsi_device, event_work);
+
+ spin_lock_irqsave(&sdev->list_lock, flags);
+ bitmap_copy(mask, sdev->event_mask, SDEV_EVT_MAXBITS);
+ bitmap_zero(sdev->event_mask, SDEV_EVT_MAXBITS);
+ spin_unlock_irqrestore(&sdev->list_lock, flags);
+
+ idx = 0;
+ for (evt = 0; evt < SDEV_EVT_LAST; evt++) {
+ if (!test_bit(evt, mask))
+ continue;
+
+ switch (evt) {
+ case SDEV_EVT_MEDIA_CHANGE:
+ envp[idx++] = "SDEV_MEDIA_CHANGE=1";
+ break;
+ }
+ }
+ envp[idx++] = NULL;
+
+ kobject_uevent_env(&sdev->sdev_gendev.kobj, KOBJ_CHANGE, envp);
+}
+
+/**
+ * sdev_evt_notify - send asserted events to uevent thread
+ * @sdev: scsi_device event occurred on
+ * @map: the bitmapped list of asserted events (SDEV_EVT_xxx)
+ *
+ *
+ */
+void sdev_evt_notify(struct scsi_device *sdev, const unsigned long *map)
+{
+ DECLARE_BITMAP(tmp_map, SDEV_EVT_MAXBITS);
+ unsigned long flags;
+
+ spin_lock_irqsave(&sdev->list_lock, flags);
+
+ bitmap_and(tmp_map, sdev->supported_events, map, SDEV_EVT_MAXBITS);
+
+ if (!bitmap_empty(tmp_map, SDEV_EVT_MAXBITS)) {
+ bitmap_or(sdev->event_mask, sdev->event_mask, tmp_map,
+ SDEV_EVT_MAXBITS);
+
+ schedule_work(&sdev->event_work);
+ }
+
+ spin_unlock_irqrestore(&sdev->list_lock, flags);
+}
+EXPORT_SYMBOL_GPL(sdev_evt_notify);
+
+/**
* scsi_device_quiesce - Block user issued commands.
* @sdev: scsi device to quiesce.
*
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index b53c5f6..3632b62 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -236,6 +236,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
struct scsi_device *sdev;
int display_failure_msg = 1, ret;
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+ extern void scsi_evt_thread(struct work_struct *work);

sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
GFP_ATOMIC);
@@ -255,6 +256,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
INIT_LIST_HEAD(&sdev->cmd_list);
INIT_LIST_HEAD(&sdev->starved_entry);
spin_lock_init(&sdev->list_lock);
+ INIT_WORK(&sdev->event_work, scsi_evt_thread);

sdev->sdev_gendev.parent = get_device(&starget->dev);
sdev->sdev_target = starget;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index d531cee..9a1b0c5 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -282,6 +282,8 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
list_del(&sdev->starved_entry);
spin_unlock_irqrestore(sdev->host->host_lock, flags);

+ cancel_work_sync(&sdev->event_work);
+
if (sdev->request_queue) {
sdev->request_queue->queuedata = NULL;
/* user context needed to free queue */
@@ -614,6 +616,23 @@ sdev_show_modalias(struct device *dev, struct device_attribute *attr, char *buf)
}
static DEVICE_ATTR(modalias, S_IRUGO, sdev_show_modalias, NULL);

+#define DECLARE_EVT_SHOW(name, Cap_name) \
+static ssize_t \
+sdev_show_##name(struct device *dev, struct device_attribute *attr, \
+ char *buf) \
+{ \
+ struct scsi_device *sdev = to_scsi_device(dev); \
+ int val = test_bit(SDEV_EVT_##Cap_name, sdev->supported_events);\
+ return snprintf(buf, 20, "%d\n", val); \
+}
+
+#define DECLARE_EVT(name, Cap_name) \
+ DECLARE_EVT_SHOW(name, Cap_name) \
+ static DEVICE_ATTR(evt_##name, S_IRUGO, sdev_show_##name, NULL);
+#define REF_EVT(name) &dev_attr_evt_##name.attr
+
+DECLARE_EVT(media_change, MEDIA_CHANGE)
+
/* Default template for device attributes. May NOT be modified */
static struct attribute *scsi_sdev_attrs[] = {
&dev_attr_device_blocked.attr,
@@ -631,6 +650,7 @@ static struct attribute *scsi_sdev_attrs[] = {
&dev_attr_iodone_cnt.attr,
&dev_attr_ioerr_cnt.attr,
&dev_attr_modalias.attr,
+ REF_EVT(media_change),
NULL
};

diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index d5057bc..063827a 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -46,6 +46,13 @@ enum scsi_device_state {
* to the scsi lld. */
};

+enum scsi_device_event {
+ SDEV_EVT_MEDIA_CHANGE = 1, /* media has changed */
+
+ SDEV_EVT_LAST = SDEV_EVT_MEDIA_CHANGE,
+ SDEV_EVT_MAXBITS = SDEV_EVT_LAST + 1
+};
+
struct scsi_device {
struct Scsi_Host *host;
struct request_queue *request_queue;
@@ -127,6 +134,10 @@ struct scsi_device {
unsigned guess_capacity:1; /* READ_CAPACITY might be too high by 1 */
unsigned retry_hwerror:1; /* Retry HARDWARE_ERROR */

+ DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
+ DECLARE_BITMAP(event_mask, SDEV_EVT_MAXBITS); /* asserted events */
+ struct work_struct event_work;
+
unsigned int device_blocked; /* Device returned QUEUE_FULL. */

unsigned int max_device_blocked; /* what device_blocked counts down from */
@@ -275,6 +286,7 @@ extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout,
int retries);
extern int scsi_device_set_state(struct scsi_device *sdev,
enum scsi_device_state state);
+extern void sdev_evt_notify(struct scsi_device *sdev, const unsigned long *map);
extern int scsi_device_quiesce(struct scsi_device *sdev);
extern void scsi_device_resume(struct scsi_device *sdev);
extern void scsi_target_quiesce(struct scsi_target *);
--
1.5.2.4

2007-10-29 14:42:50

by Jeff Garzik

[permalink] [raw]

Subject: [PATCH v4 2/2] libata: Utilize new SCSI event infrastructure

An end to CD-ROM polling (if you have a device that supports AN)...
hooray!

Signed-off-by: Jeff Garzik <[email protected]>
---
drivers/ata/libata-scsi.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index f752edd..e6d5627 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -773,6 +773,9 @@ static void ata_scsi_dev_config(struct scsi_device *sdev,
blk_queue_max_hw_segments(q, q->max_hw_segments - 1);
}

+ if (dev->flags & ATA_DFLAG_AN)
+ set_bit(SDEV_EVT_MEDIA_CHANGE, sdev->supported_events);
+
if (dev->flags & ATA_DFLAG_NCQ) {
int depth;

@@ -3225,10 +3228,11 @@ static void ata_scsi_handle_link_detach(struct ata_link *link)
*/
void ata_scsi_media_change_notify(struct ata_device *dev)
{
-#ifdef OTHER_AN_PATCHES_HAVE_BEEN_APPLIED
- if (dev->sdev)
- scsi_device_event_notify(dev->sdev, SDEV_MEDIA_CHANGE);
-#endif
+ if (dev->sdev) {
+ DECLARE_BITMAP(map, SDEV_EVT_MAXBITS);
+ __set_bit(SDEV_EVT_MEDIA_CHANGE, map);
+ sdev_evt_notify(dev->sdev, map);
+ }
}

/**
--
1.5.2.4

2007-10-29 15:43:56

by James Bottomley

[permalink] [raw]

Subject: Re: [PATCH v4 0/2] [SCSI] Asynchronous event notification infrastructure

On Mon, 2007-10-29 at 10:42 -0400, Jeff Garzik wrote:
> This is the next revision of the SCSI event notification infrastructure
> patchset, enabling SATA Asynchronous Notification ("AN") for CD/DVD
> devices that support it.
>
> For devices that support SATA AN (only very recent ones do), this means
> that HAL and other userspace utilities no longer need to repeatedly poll
> the CD/DVD device to determine if the user has changed the media.
>
> This revision takes into account James' comments from earlier today,
> modulo the following notes:
>
> * I think the various event attributes should always be present,
> for all devices at all times. If various events are not supported,
> the attribute will of course return zero (false, not supported).

Actually, I don't think so. We have precedent for this in the transport
classes: if a device doesn't support a feature, we don't export the flag
for that feature through sysfs. This allows not only feature control,
but an immediate view of the device capabilities simply by viewing the
sysfs directory.

I think this functionality is very easy to layer in, so there's no
reason not to do it.

> * I do not think this work should be blocked behind a revamp
> of the attribute group interface.

Assuming gregkh or kay ack, it won't be.

> * I was slack and did not bother to implement the 'set' operation
> for the attributes. This can easily be done at a later time in a
> separate patch. It is not a merge stopper to have the driver
> exclusively control the event mask, rather than driver+sysfs.

James

2007-10-29 15:51:42

by James Bottomley

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] SCSI: Asynchronous event notification infrastructure

On Mon, 2007-10-29 at 10:42 -0400, Jeff Garzik wrote:
> Originally based on a patch by Kristen Carlson Accardi @ Intel.
> Copious input from James Bottomley.
>
> Signed-off-by: Jeff Garzik <[email protected]>
> ---
> drivers/scsi/scsi_lib.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
> drivers/scsi/scsi_scan.c | 2 +
> drivers/scsi/scsi_sysfs.c | 20 +++++++++++++
> include/scsi/scsi_device.h | 12 ++++++++
> 4 files changed, 100 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 61fdaf0..f55ec80 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -18,6 +18,7 @@
> #include <linux/delay.h>
> #include <linux/hardirq.h>
> #include <linux/scatterlist.h>
> +#include <linux/bitmap.h>
>
> #include <scsi/scsi.h>
> #include <scsi/scsi_cmnd.h>
> @@ -2115,6 +2116,71 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
> EXPORT_SYMBOL(scsi_device_set_state);
>
> /**
> + * sdev_evt_thread - send a uevent for each scsi event
> + * @work: work struct for scsi_device
> + *
> + * Emit all queued media events as environment variables
> + * sent during a uevent.
> + */
> +void scsi_evt_thread(struct work_struct *work)
> +{
> + struct scsi_device *sdev;
> + char *envp[SDEV_EVT_LAST + 2];
> + DECLARE_BITMAP(mask, SDEV_EVT_MAXBITS);
> + unsigned long flags;
> + int evt, idx;
> +
> + sdev = container_of(work, struct scsi_device, event_work);
> +
> + spin_lock_irqsave(&sdev->list_lock, flags);
> + bitmap_copy(mask, sdev->event_mask, SDEV_EVT_MAXBITS);
> + bitmap_zero(sdev->event_mask, SDEV_EVT_MAXBITS);
> + spin_unlock_irqrestore(&sdev->list_lock, flags);
> +
> + idx = 0;
> + for (evt = 0; evt < SDEV_EVT_LAST; evt++) {
> + if (!test_bit(evt, mask))
> + continue;
> +
> + switch (evt) {
> + case SDEV_EVT_MEDIA_CHANGE:
> + envp[idx++] = "SDEV_MEDIA_CHANGE=1";
> + break;
> + }
> + }
> + envp[idx++] = NULL;
> +
> + kobject_uevent_env(&sdev->sdev_gendev.kobj, KOBJ_CHANGE, envp);
> +}
> +
> +/**
> + * sdev_evt_notify - send asserted events to uevent thread
> + * @sdev: scsi_device event occurred on
> + * @map: the bitmapped list of asserted events (SDEV_EVT_xxx)
> + *
> + *
> + */
> +void sdev_evt_notify(struct scsi_device *sdev, const unsigned long *map)
> +{
> + DECLARE_BITMAP(tmp_map, SDEV_EVT_MAXBITS);
> + unsigned long flags;
> +
> + spin_lock_irqsave(&sdev->list_lock, flags);
> +
> + bitmap_and(tmp_map, sdev->supported_events, map, SDEV_EVT_MAXBITS);
> +
> + if (!bitmap_empty(tmp_map, SDEV_EVT_MAXBITS)) {
> + bitmap_or(sdev->event_mask, sdev->event_mask, tmp_map,
> + SDEV_EVT_MAXBITS);
> +
> + schedule_work(&sdev->event_work);
> + }
> +
> + spin_unlock_irqrestore(&sdev->list_lock, flags);

This still doesn't solve the fundamental corruption problem:
sdev->event_work has to contain the work entry until the workqueue has
finished executing it (which is some unspecified time in the future).
As soon as you drop the sdev->list_lock, the system thinks
sdev->event_work is available for reuse. If we fire another event
before the work queue finished processing the prior event, the queue
will be corrupted.

Although I hate GFP_ATOMIC allocations, I think that's the only viable
way to get out of this corruption problem (using a mechanism similar to
what I proposed yesterday).

Also, I think Kristin's initial use of execute_in_user_context() was a
good call .. if we already have a user context, there's no need to
bother the workqueue ... some of these events will likely trigger from
thread backed kernel daemons.

James

2007-10-29 15:58:45

by Jeff Garzik

[permalink] [raw]

Subject: Re: [PATCH v4 0/2] [SCSI] Asynchronous event notification infrastructure

James Bottomley wrote:
> On Mon, 2007-10-29 at 10:42 -0400, Jeff Garzik wrote:
>> This is the next revision of the SCSI event notification infrastructure
>> patchset, enabling SATA Asynchronous Notification ("AN") for CD/DVD
>> devices that support it.
>>
>> For devices that support SATA AN (only very recent ones do), this means
>> that HAL and other userspace utilities no longer need to repeatedly poll
>> the CD/DVD device to determine if the user has changed the media.
>>
>> This revision takes into account James' comments from earlier today,
>> modulo the following notes:
>>
>> * I think the various event attributes should always be present,
>> for all devices at all times. If various events are not supported,
>> the attribute will of course return zero (false, not supported).
>
> Actually, I don't think so. We have precedent for this in the transport
> classes: if a device doesn't support a feature, we don't export the flag
> for that feature through sysfs. This allows not only feature control,
> but an immediate view of the device capabilities simply by viewing the
> sysfs directory.

Think about about the values being exported by these sysfs attributes:
they indicate whether or not that feature is supported.

Thus, using the presence/absence of an attribute to communicate the same
thing would be redundant.

This suggestion adds a whole lot of complexity -- mirroring every change
to sdev->supported_events by dynamically adding or removing attributes.

The current nice, simple, elegant bitops-based interface is suddenly a
lot more cumbersome if forced to deal with attribute creation and disposal.

Finally, this additional complexity of dynamic attribute management also
eliminates some key information: userland can test the existence of the
attribute to determine if that support is present in the kernel.

Jeff

2007-10-29 16:07:46

by Jeff Garzik

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] SCSI: Asynchronous event notification infrastructure

James Bottomley wrote:
> This still doesn't solve the fundamental corruption problem:
> sdev->event_work has to contain the work entry until the workqueue has
> finished executing it (which is some unspecified time in the future).
> As soon as you drop the sdev->list_lock, the system thinks
> sdev->event_work is available for reuse. If we fire another event
> before the work queue finished processing the prior event, the queue
> will be corrupted.

I think you're misunderstanding the workqueue code? You can call
schedule_work(&sdev->event_work) from anywhere, any time you like, as
many times as you like.

> Also, I think Kristin's initial use of execute_in_user_context() was a
> good call .. if we already have a user context, there's no need to
> bother the workqueue ... some of these events will likely trigger from
> thread backed kernel daemons.

Quite agreed that sdev_evt_notify() might be called from kernel daemons,
but in general this is a fire-and-forget API that is -likely- to be
called from interrupt or completion context of many drivers, just like
scsi_done or other completion APIs. It is a fundamentally parallel
interface.

If thread-backed kernel daemons want to use this, it is trivial for them
to schedule work, then sync.

Jeff

2007-10-29 16:11:13

by James Bottomley

[permalink] [raw]

Subject: Re: [PATCH v4 0/2] [SCSI] Asynchronous event notification infrastructure

On Mon, 2007-10-29 at 11:58 -0400, Jeff Garzik wrote:
> James Bottomley wrote:
> > On Mon, 2007-10-29 at 10:42 -0400, Jeff Garzik wrote:
> >> This is the next revision of the SCSI event notification infrastructure
> >> patchset, enabling SATA Asynchronous Notification ("AN") for CD/DVD
> >> devices that support it.
> >>
> >> For devices that support SATA AN (only very recent ones do), this means
> >> that HAL and other userspace utilities no longer need to repeatedly poll
> >> the CD/DVD device to determine if the user has changed the media.
> >>
> >> This revision takes into account James' comments from earlier today,
> >> modulo the following notes:
> >>
> >> * I think the various event attributes should always be present,
> >> for all devices at all times. If various events are not supported,
> >> the attribute will of course return zero (false, not supported).
> >
> > Actually, I don't think so. We have precedent for this in the transport
> > classes: if a device doesn't support a feature, we don't export the flag
> > for that feature through sysfs. This allows not only feature control,
> > but an immediate view of the device capabilities simply by viewing the
> > sysfs directory.
>
> Think about about the values being exported by these sysfs attributes:
> they indicate whether or not that feature is supported.

Ah, OK; I haven't communicated what we need very clearly. We need a way
to see if the event is supported by the device, as well as a way to turn
it off. For some of the events (possibly not the SATA AN one, since I
know all SATA devices will be well behaved) there's going to be a need
to deal with berserk or broken devices that become trigger happy, so
turning off the event will be a useful (and possibly essential) way of
coping.

> Thus, using the presence/absence of an attribute to communicate the same
> thing would be redundant.
>
> This suggestion adds a whole lot of complexity -- mirroring every change
> to sdev->supported_events by dynamically adding or removing attributes.
>
> The current nice, simple, elegant bitops-based interface is suddenly a
> lot more cumbersome if forced to deal with attribute creation and disposal.
>
> Finally, this additional complexity of dynamic attribute management also
> eliminates some key information: userland can test the existence of the
> attribute to determine if that support is present in the kernel.

James

2007-10-29 16:17:22

by James Bottomley

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] SCSI: Asynchronous event notification infrastructure

On Mon, 2007-10-29 at 12:07 -0400, Jeff Garzik wrote:
> James Bottomley wrote:
> > This still doesn't solve the fundamental corruption problem:
> > sdev->event_work has to contain the work entry until the workqueue has
> > finished executing it (which is some unspecified time in the future).
> > As soon as you drop the sdev->list_lock, the system thinks
> > sdev->event_work is available for reuse. If we fire another event
> > before the work queue finished processing the prior event, the queue
> > will be corrupted.
>
> I think you're misunderstanding the workqueue code? You can call
> schedule_work(&sdev->event_work) from anywhere, any time you like, as
> many times as you like.

OK, take me through it slowly then ... I think schedule_work(work)
inserts work->entry onto the workqueue list (in
workqueue.c:insert_work()). If the event hasn't fired, it will already
be on the list, so adding the same entry to a list twice causes a list
corruption problem.

Plus, unfortunately, the CC/UA events are going to have to carry extra
sense data; they're not simply going to be triggers saying something
happened.

James

2007-10-29 16:25:20

by Jeff Garzik

[permalink] [raw]

Subject: Re: [PATCH v4 0/2] [SCSI] Asynchronous event notification infrastructure

James Bottomley wrote:
> Ah, OK; I haven't communicated what we need very clearly. We need a way
> to see if the event is supported by the device, as well as a way to turn
> it off. For some of the events (possibly not the SATA AN one, since I
> know all SATA devices will be well behaved) there's going to be a need
> to deal with berserk or broken devices that become trigger happy, so
> turning off the event will be a useful (and possibly essential) way of
> coping.

That's possible with the presented interface[1]:

# see if event is supported
cat $path/evt_media_change

# turn off event to deal with broken/beserk devices
echo 0 > $path/evt_media_change

Some sillyhead can always do

echo 1 > $path/evt_some_event_my_device_does_not_support

but that will be obviously be a no-op because their device simply will
not send such events.

Granted ls(1) is no longer a method for viewing supported-at-boot-time
list of events -- ls(1) in the presented interface lists what events the
_kernel_ supports, and cat(1) is used to discover which events are
actually enabled.

I think that is the only difference between our two positions: [if I
understand you correctly] you want ls(1) to be able to list the device's
supported events. However, I feel that is inconsistent: for your
proposal, userspace must perform two checks in order to determine a
feature's availability: 1) does the file exist? 2) is the file context
non-zero?

Regards,

Jeff

[1] modulo my comment from the original email in this thread:
> * I was slack and did not bother to implement the 'set' operation
> for the attributes. This can easily be done at a later time in a
> separate patch.

2007-10-29 16:29:38

by Jeff Garzik

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] SCSI: Asynchronous event notification infrastructure

James Bottomley wrote:
> On Mon, 2007-10-29 at 12:07 -0400, Jeff Garzik wrote:
>> James Bottomley wrote:
>>> This still doesn't solve the fundamental corruption problem:
>>> sdev->event_work has to contain the work entry until the workqueue has
>>> finished executing it (which is some unspecified time in the future).
>>> As soon as you drop the sdev->list_lock, the system thinks
>>> sdev->event_work is available for reuse. If we fire another event
>>> before the work queue finished processing the prior event, the queue
>>> will be corrupted.
>> I think you're misunderstanding the workqueue code? You can call
>> schedule_work(&sdev->event_work) from anywhere, any time you like, as
>> many times as you like.
>
> OK, take me through it slowly then ... I think schedule_work(work)
> inserts work->entry onto the workqueue list (in
> workqueue.c:insert_work()). If the event hasn't fired, it will already
> be on the list, so adding the same entry to a list twice causes a list
> corruption problem.

It does a test_and_set_bit() first thing in queue_work(). Similar
exclusivity logic is found in net device land. Ah, the fun of locking
without locks that benh grumbles about :)

> Plus, unfortunately, the CC/UA events are going to have to carry extra
> sense data; they're not simply going to be triggers saying something
> happened.

OK this is a fair criticism.

If additional data must be carried, then I must ditch the beloved bitmap
implementation and go back to a list (with associated GFP_ATOMIC alloc).

I will fix this, unless I receive email to the contrary...

Jeff

2007-10-29 16:34:23

by James Bottomley

[permalink] [raw]

Subject: Re: [PATCH v4 0/2] [SCSI] Asynchronous event notification infrastructure

On Mon, 2007-10-29 at 12:24 -0400, Jeff Garzik wrote:
> James Bottomley wrote:
> > Ah, OK; I haven't communicated what we need very clearly. We need a way
> > to see if the event is supported by the device, as well as a way to turn
> > it off. For some of the events (possibly not the SATA AN one, since I
> > know all SATA devices will be well behaved) there's going to be a need
> > to deal with berserk or broken devices that become trigger happy, so
> > turning off the event will be a useful (and possibly essential) way of
> > coping.
>
>
> That's possible with the presented interface[1]:
>
> # see if event is supported
> cat $path/evt_media_change
>
> # turn off event to deal with broken/beserk devices
> echo 0 > $path/evt_media_change
>
> Some sillyhead can always do
>
> echo 1 > $path/evt_some_event_my_device_does_not_support
>
> but that will be obviously be a no-op because their device simply will
> not send such events.
>
> Granted ls(1) is no longer a method for viewing supported-at-boot-time
> list of events -- ls(1) in the presented interface lists what events the
> _kernel_ supports, and cat(1) is used to discover which events are
> actually enabled.
>
> I think that is the only difference between our two positions: [if I
> understand you correctly] you want ls(1) to be able to list the device's
> supported events. However, I feel that is inconsistent: for your
> proposal, userspace must perform two checks in order to determine a
> feature's availability: 1) does the file exist? 2) is the file context
> non-zero?

Yes, I agree ... however, open file is one op for the user -ENXIO means
device doesn't support the event; value indicates whether the event is
currently triggering.

I just would rather we use the file exists if device supports event,
because it's consistent with all the rest of our SCSI interfaces.

> Regards,
>
> Jeff
>
>
> [1] modulo my comment from the original email in this thread:
> > * I was slack and did not bother to implement the 'set' operation
> > for the attributes. This can easily be done at a later time in a
> > separate patch.

James

2007-10-29 16:48:54

by Jeff Garzik

[permalink] [raw]

Subject: Re: [PATCH v4 0/2] [SCSI] Asynchronous event notification infrastructure

James Bottomley wrote:
> On Mon, 2007-10-29 at 12:24 -0400, Jeff Garzik wrote:
>> James Bottomley wrote:
>>> Ah, OK; I haven't communicated what we need very clearly. We need a way
>>> to see if the event is supported by the device, as well as a way to turn
>>> it off. For some of the events (possibly not the SATA AN one, since I
>>> know all SATA devices will be well behaved) there's going to be a need
>>> to deal with berserk or broken devices that become trigger happy, so
>>> turning off the event will be a useful (and possibly essential) way of
>>> coping.
>>
>> That's possible with the presented interface[1]:
>>
>> # see if event is supported
>> cat $path/evt_media_change
>>
>> # turn off event to deal with broken/beserk devices
>> echo 0 > $path/evt_media_change
>>
>> Some sillyhead can always do
>>
>> echo 1 > $path/evt_some_event_my_device_does_not_support
>>
>> but that will be obviously be a no-op because their device simply will
>> not send such events.
>>
>> Granted ls(1) is no longer a method for viewing supported-at-boot-time
>> list of events -- ls(1) in the presented interface lists what events the
>> _kernel_ supports, and cat(1) is used to discover which events are
>> actually enabled.
>>
>> I think that is the only difference between our two positions: [if I
>> understand you correctly] you want ls(1) to be able to list the device's
>> supported events. However, I feel that is inconsistent: for your
>> proposal, userspace must perform two checks in order to determine a
>> feature's availability: 1) does the file exist? 2) is the file context
>> non-zero?
>
> Yes, I agree ... however, open file is one op for the user -ENXIO means
> device doesn't support the event; value indicates whether the event is
> currently triggering.
>
> I just would rather we use the file exists if device supports event,
> because it's consistent with all the rest of our SCSI interfaces.

Two problems with what you just described:

1) "value indicates current event state" is a new concept in this thread
(maybe you were thinking this all along, but I didn't get that from your
writing).

Watching the sysfs node for event activity is definitely outside the
scope of this work, and IMO not very useful. The time from when LLDD
calls sdev_evt_notify() until uevent completion is very short, so the
time window for actually receiving a useful value in your scenario is
also short.

My patch presented the attributes purely as control nodes, only affected
sdev->supported_events and nothing more. You seem to be suggesting
exporting the true-for-only-a-few-milliseconds activity state, rather
then enable/disable state.

2) Event support itself is dynamic, which causes me to revisit the
"complexity" argument. In libata, for example, we only note that the
media-change event is supported after some time passes -- not in the
initial slave_config. Or error handler may disable it at runtime
because that event is problematic.

As such, that implies that the LLDD (with help from scsi_lib) is
dynamically adding and removing these attributes at runtime -- a lot
more complexity than is really needed AFAICS.

It is easy and straightforward for the driver to set a bit.

We cannot assume the state of event support bits are constant from
modprobe/slave_config time.

Jeff

2007-10-29 17:02:20

by James Bottomley

[permalink] [raw]

Subject: Re: [PATCH v4 1/2] SCSI: Asynchronous event notification infrastructure

On Mon, 2007-10-29 at 12:29 -0400, Jeff Garzik wrote:
> James Bottomley wrote:
> > On Mon, 2007-10-29 at 12:07 -0400, Jeff Garzik wrote:
> >> James Bottomley wrote:
> >>> This still doesn't solve the fundamental corruption problem:
> >>> sdev->event_work has to contain the work entry until the workqueue has
> >>> finished executing it (which is some unspecified time in the future).
> >>> As soon as you drop the sdev->list_lock, the system thinks
> >>> sdev->event_work is available for reuse. If we fire another event
> >>> before the work queue finished processing the prior event, the queue
> >>> will be corrupted.
> >> I think you're misunderstanding the workqueue code? You can call
> >> schedule_work(&sdev->event_work) from anywhere, any time you like, as
> >> many times as you like.
> >
> > OK, take me through it slowly then ... I think schedule_work(work)
> > inserts work->entry onto the workqueue list (in
> > workqueue.c:insert_work()). If the event hasn't fired, it will already
> > be on the list, so adding the same entry to a list twice causes a list
> > corruption problem.
>
> It does a test_and_set_bit() first thing in queue_work(). Similar
> exclusivity logic is found in net device land. Ah, the fun of locking
> without locks that benh grumbles about :)

Ah, OK, sorry ... I was actually looking at __queue_work().

> > Plus, unfortunately, the CC/UA events are going to have to carry extra
> > sense data; they're not simply going to be triggers saying something
> > happened.
>
> OK this is a fair criticism.
>
> If additional data must be carried, then I must ditch the beloved bitmap
> implementation and go back to a list (with associated GFP_ATOMIC alloc).
>
> I will fix this, unless I receive email to the contrary...

Yes, unfortunately, thanks. If all events were a simple number, it's
easy, but the CC/UA events carry data as well.

James