2010-11-01 18:46:24

by Tejun Heo

[permalink] [raw]
Subject: [PATCHSET RFC] block/SCSI: implement in-kernel disk event handling

This patchset implements in-kernel disk event handling framework and
add support for it to sr and sd. This is largely to move media
presence polling into kernel as userspace implementation turned out to
be quite problematic over the years.

>From the patch description of the third patch,

Currently, media presence polling for removeable block devices is done
from userland. There are several issues with this.

* Polling is done by periodically opening the device. For SCSI
devices, the command sequence generated by such action involves a
few different commands including TEST_UNIT_READY. This behavior,
while perfectly legal, is different from Windows which only issues
single command, GET_EVENT_STATUS_NOTIFICATION. Unfortunately, some
ATAPI devices lock up after being periodically queried such command
sequences.

* There is no reliable and unintrusive way for a userland program to
tell whether the target device is safe for media presence polling.
For example, polling for media presence during an on-going burning
session can make it fail. The polling program can avoid this by
opening the device with O_EXCL but then it risks making a valid
exclusive user of the device fail w/ -EBUSY.

* Userland polling is unnecessarily heavy and in-kernel implementation
is lighter and better coordinated (workqueue, timer slack).

This patchset contains the following seven patches.

0001-block-kill-genhd_media_change_notify.patch
0002-block-move-register_disk-and-del_gendisk-to-block-ge.patch
0003-implement-in-kernel-gendisk-events-handling.patch
0004-cdrom-add-check_events-support.patch
0005-scsi-replace-sr_test_unit_ready-with-scsi_test_unit_.patch
0006-sr-implement-sr_check_events.patch
0007-sd-implement-sd_check_events.patch

0001-0002 are prepreations. 0003 implements the block layer framework
for disk event handling. 0004-0007 add support for it to cdrom and
then sr and sd.

Block drivers just need to implement a new bdev method
->check_events() which supercedes ->media_change() and set
bdev->events[_async] masks according to what the device can do.
Everything else is handled by block layer including event generation,
polling and its configuration.

Tested with dvd drives and an iomega click drive (a removable direct
access ATAPI device Alan gave to me, still operational!).

Kay, as you're the media presence polling expert :-) how does this
look to you?

This patchset is on top of v2.6.37-rc1 + clean-up-bdev-claim-release
patchset and available in the following git tree,

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git disk-events

and contains the following changes.

block/genhd.c | 544 +++++++++++++++++++++++++++++++++++++++++++++---
drivers/cdrom/cdrom.c | 55 ++++
drivers/scsi/scsi_lib.c | 13 -
drivers/scsi/sd.c | 97 ++++----
drivers/scsi/sd.h | 1
drivers/scsi/sr.c | 168 ++++++++------
drivers/scsi/sr.h | 3
drivers/scsi/sr_ioctl.c | 2
fs/block_dev.c | 41 +++
fs/partitions/check.c | 89 -------
include/linux/blkdev.h | 4
include/linux/cdrom.h | 6
include/linux/fs.h | 1
include/linux/genhd.h | 20 +
include/scsi/scsi.h | 1
15 files changed, 774 insertions(+), 271 deletions(-)

Thanks.

--
tejun


2010-11-01 18:45:04

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 5/7] scsi: replace sr_test_unit_ready() with scsi_test_unit_ready()

The usage of TUR has been confusing involving several different
commits updating different parts over time. Currently, the only
differences between scsi_test_unit_ready() and sr_test_unit_ready()
are,

* scsi_test_unit_ready() also sets sdev->changed on NOT_READY.

* scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or
NOT_READY.

Due to the above two differences, sr is using its own
sr_test_unit_ready(), but sd - the sole user of the above extra
handling - doesn't even need them.

Where scsi_test_unit_ready() is used in sd_media_changed(), the code
is looking for device ready w/ media present state which is true iff
TUR succeeds w/o sense data or UA, and when the device is not ready
for whatever reason sd_media_changed() explicitly marks media as
missing so there's no reason to set sdev->changed automatically from
scsi_test_unit_ready() on NOT_READY.

Drop both special handlings from scsi_test_unit_ready(), which makes
it equivalant to sr_test_unit_ready(), and replace
sr_test_unit_ready() with scsi_test_unit_ready(). Also, drop
unnecessary explicit NOT_READY check from sd_media_changed().
Checking return value is enough for testing device readiness.

Signed-off-by: Tejun Heo <[email protected]>
---
drivers/scsi/scsi_lib.c | 13 +------------
drivers/scsi/sd.c | 10 +---------
drivers/scsi/sr.c | 37 ++++++-------------------------------
drivers/scsi/sr.h | 1 -
drivers/scsi/sr_ioctl.c | 2 +-
5 files changed, 9 insertions(+), 54 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index eafeeda..13bf891 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1984,8 +1984,7 @@ EXPORT_SYMBOL(scsi_mode_sense);
* in.
*
* Returns zero if unsuccessful or an error if TUR failed. For
- * removable media, a return of NOT_READY or UNIT_ATTENTION is
- * translated to success, with the ->changed flag updated.
+ * removable media, UNIT_ATTENTION sets ->changed flag.
**/
int
scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
@@ -2012,16 +2011,6 @@ scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries,
} while (scsi_sense_valid(sshdr) &&
sshdr->sense_key == UNIT_ATTENTION && --retries);

- if (!sshdr)
- /* could not allocate sense buffer, so can't process it */
- return result;
-
- if (sdev->removable && scsi_sense_valid(sshdr) &&
- (sshdr->sense_key == UNIT_ATTENTION ||
- sshdr->sense_key == NOT_READY)) {
- sdev->changed = 1;
- result = 0;
- }
if (!sshdr_external)
kfree(sshdr);
return result;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b9ab3a5..8d488a9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1045,15 +1045,7 @@ static int sd_media_changed(struct gendisk *disk)
sshdr);
}

- /*
- * Unable to test, unit probably not ready. This usually
- * means there is no disc in the drive. Mark as changed,
- * and we will figure it out later once the drive is
- * available again.
- */
- if (retval || (scsi_sense_valid(sshdr) &&
- /* 0x3a is medium not present */
- sshdr->asc == 0x3a)) {
+ if (retval) {
set_media_not_present(sdkp);
retval = 1;
goto out;
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index d7b383c..63cdbc6 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -165,32 +165,6 @@ static void scsi_cd_put(struct scsi_cd *cd)
mutex_unlock(&sr_ref_mutex);
}

-/* identical to scsi_test_unit_ready except that it doesn't
- * eat the NOT_READY returns for removable media */
-int sr_test_unit_ready(struct scsi_device *sdev, struct scsi_sense_hdr *sshdr)
-{
- int retries = MAX_RETRIES;
- int the_result;
- u8 cmd[] = {TEST_UNIT_READY, 0, 0, 0, 0, 0 };
-
- /* issue TEST_UNIT_READY until the initial startup UNIT_ATTENTION
- * conditions are gone, or a timeout happens
- */
- do {
- the_result = scsi_execute_req(sdev, cmd, DMA_NONE, NULL,
- 0, sshdr, SR_TIMEOUT,
- retries--, NULL);
- if (scsi_sense_valid(sshdr) &&
- sshdr->sense_key == UNIT_ATTENTION)
- sdev->changed = 1;
-
- } while (retries > 0 &&
- (!scsi_status_is_good(the_result) ||
- (scsi_sense_valid(sshdr) &&
- sshdr->sense_key == UNIT_ATTENTION)));
- return the_result;
-}
-
/*
* This function checks to see if the media has been changed in the
* CDROM drive. It is possible that we have already sensed a change,
@@ -213,10 +187,11 @@ static int sr_media_change(struct cdrom_device_info *cdi, int slot)
}

sshdr = kzalloc(sizeof(*sshdr), GFP_KERNEL);
- retval = sr_test_unit_ready(cd->device, sshdr);
- if (retval || (scsi_sense_valid(sshdr) &&
- /* 0x3a is medium not present */
- sshdr->asc == 0x3a)) {
+ retval = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES,
+ sshdr);
+ if (host_byte(retval) || (scsi_sense_valid(sshdr) &&
+ /* ASC 0x3a is medium not present */
+ sshdr->asc == 0x3a)) {
/* Media not present or unable to test, unit probably not
* ready. This usually means there is no disc in the drive.
* Mark as changed, and we will figure it out later once
@@ -780,7 +755,7 @@ static void get_capabilities(struct scsi_cd *cd)
}

/* eat unit attentions */
- sr_test_unit_ready(cd->device, &sshdr);
+ scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);

/* ask for mode page 0x2a */
rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, 128,
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index 1e144df..81fbc0b 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -61,7 +61,6 @@ int sr_select_speed(struct cdrom_device_info *cdi, int speed);
int sr_audio_ioctl(struct cdrom_device_info *, unsigned int, void *);

int sr_is_xa(Scsi_CD *);
-int sr_test_unit_ready(struct scsi_device *sdev, struct scsi_sense_hdr *sshdr);

/* sr_vendor.c */
void sr_vendor_init(Scsi_CD *);
diff --git a/drivers/scsi/sr_ioctl.c b/drivers/scsi/sr_ioctl.c
index 3cd8ffb..8be3055 100644
--- a/drivers/scsi/sr_ioctl.c
+++ b/drivers/scsi/sr_ioctl.c
@@ -307,7 +307,7 @@ int sr_drive_status(struct cdrom_device_info *cdi, int slot)
/* we have no changer support */
return -EINVAL;
}
- if (0 == sr_test_unit_ready(cd->device, &sshdr))
+ if (!scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr))
return CDS_DISC_OK;

/* SK/ASC/ASCQ of 2/4/1 means "unit is becoming ready" */
--
1.7.1

2010-11-01 18:45:40

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 3/7] implement in-kernel gendisk events handling

Currently, media presence polling for removeable block devices is done
from userland. There are several issues with this.

* Polling is done by periodically opening the device. For SCSI
devices, the command sequence generated by such action involves a
few different commands including TEST_UNIT_READY. This behavior,
while perfectly legal, is different from Windows which only issues
single command, GET_EVENT_STATUS_NOTIFICATION. Unfortunately, some
ATAPI devices lock up after being periodically queried such command
sequences.

* There is no reliable and unintrusive way for a userland program to
tell whether the target device is safe for media presence polling.
For example, polling for media presence during an on-going burning
session can make it fail. The polling program can avoid this by
opening the device with O_EXCL but then it risks making a valid
exclusive user of the device fail w/ -EBUSY.

* Userland polling is unnecessarily heavy and in-kernel implementation
is lighter and better coordinated (workqueue, timer slack).

This patch implements framework for in-kernel disk event handling,
which includes media presence polling.

* bdops->check_events() is added, which supercedes ->media_changed().
It should check whether there's any pending event and return if so.
Currently, two events are defined - DISK_EVENT_MEDIA_CHANGE and
DISK_EVENT_EJECT_REQUEST. ->check_events() is guaranteed not to be
called parallelly.

* gendisk->events and ->async_events are added. These should be
initialized by block driver before passing the device to add_disk().
The former contains the mask of all supported events and the latter
the mask of all events which the device can report without polling.
/sys/block/*/events[_async] export these to userland.

* Kernel parameter block.events_dfl_poll_msecs controls the system
polling interval (default is 0 which means disable) and
/sys/block/*/events_poll_msecs control polling intervals for
individual devices (default is -1 meaning use system setting). Note
that if a device can report all supported events asynchronously and
its polling interval isn't explicitly set, the device won't be
polled regardless of the system polling interval.

* If a device is opened exclusively with write access, event checking
is automatically disabled until all write exclusive accesses are
released.

* There are event 'clearing' events. For example, both of currently
defined events are cleared after the device has been successfully
opened. This information is passed to ->check_events() callback
using @clearing argument as a hint.

* Event checking is always performed from system_nrt_wq and timer
slack is set to 25% for polling.

* Nothing changes for drivers which implement ->media_changed() but
not ->check_events(). Going forward, all drivers will be converted
to ->check_events() and ->media_change() will be dropped.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Kay Sievers <[email protected]>
Cc: Jan Kara <[email protected]>
---
block/genhd.c | 429 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/block_dev.c | 41 ++++-
include/linux/blkdev.h | 3 +
include/linux/fs.h | 1 +
include/linux/genhd.h | 18 ++-
5 files changed, 484 insertions(+), 8 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 2e5e4c0..5465a82 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -18,6 +18,7 @@
#include <linux/buffer_head.h>
#include <linux/mutex.h>
#include <linux/idr.h>
+#include <linux/log2.h>

#include "blk.h"

@@ -35,6 +36,10 @@ static DEFINE_IDR(ext_devt_idr);

static struct device_type disk_type;

+static void disk_add_events(struct gendisk *disk);
+static void disk_del_events(struct gendisk *disk);
+static void disk_release_events(struct gendisk *disk);
+
/**
* disk_get_part - get partition
* @disk: disk to look partition from
@@ -609,6 +614,8 @@ void add_disk(struct gendisk *disk)
retval = sysfs_create_link(&disk_to_dev(disk)->kobj, &bdi->dev->kobj,
"bdi");
WARN_ON(retval);
+
+ disk_add_events(disk);
}
EXPORT_SYMBOL(add_disk);

@@ -617,6 +624,8 @@ void del_gendisk(struct gendisk *disk)
struct disk_part_iter piter;
struct hd_struct *part;

+ disk_del_events(disk);
+
/* invalidate stuff */
disk_part_iter_init(&piter, disk,
DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
@@ -1089,6 +1098,7 @@ static void disk_release(struct device *dev)
{
struct gendisk *disk = dev_to_disk(dev);

+ disk_release_events(disk);
kfree(disk->random);
disk_replace_part_tbl(disk, NULL);
free_part_stats(&disk->part0);
@@ -1350,3 +1360,422 @@ int invalidate_partition(struct gendisk *disk, int partno)
}

EXPORT_SYMBOL(invalidate_partition);
+
+/*
+ * Disk events - monitor disk events like media change and eject request.
+ */
+struct disk_events {
+ struct list_head node; /* all disk_event's */
+ struct gendisk *disk; /* the associated disk */
+ spinlock_t lock;
+
+ int block; /* event blocking depth */
+ unsigned int pending; /* events already sent out */
+ unsigned int clearing; /* events being cleared */
+
+ long poll_msecs; /* interval, -1 for default */
+ struct delayed_work dwork;
+};
+
+static const char *disk_events_strs[] = {
+ [ilog2(DISK_EVENT_MEDIA_CHANGE)] = "media_change",
+ [ilog2(DISK_EVENT_EJECT_REQUEST)] = "eject_request",
+};
+
+static char *disk_uevents[] = {
+ [ilog2(DISK_EVENT_MEDIA_CHANGE)] = "DISK_MEDIA_CHANGE=1",
+ [ilog2(DISK_EVENT_EJECT_REQUEST)] = "DISK_EJECT_REQUEST=1",
+};
+
+/* list of all disk_events */
+static DEFINE_MUTEX(disk_events_mutex);
+static LIST_HEAD(disk_events);
+
+/* disable in-kernel polling by default */
+static unsigned long disk_events_dfl_poll_msecs = 0;
+
+static unsigned long disk_events_poll_jiffies(struct gendisk *disk)
+{
+ struct disk_events *ev = disk->ev;
+ long intv_msecs = 0;
+
+ /*
+ * If device-specific poll interval is set, always use it. If
+ * the default is being used, poll iff there are events which
+ * can't be monitored asynchronously.
+ */
+ if (ev->poll_msecs >= 0)
+ intv_msecs = ev->poll_msecs;
+ else if (disk->events & ~disk->async_events)
+ intv_msecs = disk_events_dfl_poll_msecs;
+
+ return msecs_to_jiffies(intv_msecs);
+}
+
+static void __disk_block_events(struct gendisk *disk, bool sync)
+{
+ struct disk_events *ev = disk->ev;
+ unsigned long flags;
+ bool cancel;
+
+ spin_lock_irqsave(&ev->lock, flags);
+ cancel = !ev->block++;
+ spin_unlock_irqrestore(&ev->lock, flags);
+
+ if (cancel) {
+ if (sync)
+ cancel_delayed_work_sync(&disk->ev->dwork);
+ else
+ cancel_delayed_work(&disk->ev->dwork);
+ }
+}
+
+static void __disk_unblock_events(struct gendisk *disk, bool check_now)
+{
+ struct disk_events *ev = disk->ev;
+ unsigned long intv;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ev->lock, flags);
+
+ if (WARN_ON_ONCE(ev->block <= 0))
+ goto out_unlock;
+
+ if (--ev->block)
+ goto out_unlock;
+
+ /*
+ * Not exactly a latency critical operation, set poll timer
+ * slack to 25% and kick event check.
+ */
+ intv = disk_events_poll_jiffies(disk);
+ set_timer_slack(&ev->dwork.timer, intv / 4);
+ if (check_now)
+ queue_delayed_work(system_nrt_wq, &ev->dwork, 0);
+ else if (intv)
+ queue_delayed_work(system_nrt_wq, &ev->dwork, intv);
+out_unlock:
+ spin_unlock_irqrestore(&ev->lock, flags);
+}
+
+/**
+ * disk_block_events - block and flush disk event checking
+ * @disk: disk to block events for
+ *
+ * On return from this function, it is guaranteed that event checking
+ * isn't in progress and won't happen until unblocked by
+ * disk_unblock_events(). Events blocking is counted and the actual
+ * unblocking happens after the matching number of unblocks are done.
+ *
+ * Note that this intentionally does not block event checking from
+ * disk_clear_events().
+ *
+ * CONTEXT:
+ * Might sleep.
+ */
+void disk_block_events(struct gendisk *disk)
+{
+ if (disk->ev)
+ __disk_block_events(disk, true);
+}
+
+/**
+ * disk_unblock_events - unblock disk event checking
+ * @disk: disk to unblock events for
+ *
+ * Undo disk_block_events(). When the block count reaches zero, it
+ * starts events polling if configured.
+ *
+ * CONTEXT:
+ * Don't care. Safe to call from irq context.
+ */
+void disk_unblock_events(struct gendisk *disk)
+{
+ if (disk->ev)
+ __disk_unblock_events(disk, true);
+}
+
+/**
+ * disk_check_events - schedule immediate event checking
+ * @disk: disk to check events for
+ *
+ * Schedule immediate event checking on @disk if not blocked.
+ *
+ * CONTEXT:
+ * Don't care. Safe to call from irq context.
+ */
+void disk_check_events(struct gendisk *disk)
+{
+ if (disk->ev) {
+ __disk_block_events(disk, false);
+ __disk_unblock_events(disk, true);
+ }
+}
+EXPORT_SYMBOL_GPL(disk_check_events);
+
+/**
+ * disk_clear_events - synchronously check, clear and return pending events
+ * @disk: disk to fetch and clear events from
+ * @mask: mask of events to be fetched and clearted
+ *
+ * Disk events are synchronously checked and pending events in @mask
+ * are cleared and returned. This ignores the block count.
+ *
+ * CONTEXT:
+ * Might sleep.
+ */
+unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask)
+{
+ const struct block_device_operations *bdops = disk->fops;
+ struct disk_events *ev = disk->ev;
+ unsigned int pending;
+
+ if (!ev) {
+ /* for drivers still using the old ->media_changed method */
+ if ((mask & DISK_EVENT_MEDIA_CHANGE) &&
+ bdops->media_changed && bdops->media_changed(disk))
+ return DISK_EVENT_MEDIA_CHANGE;
+ return 0;
+ }
+
+ /* tell the workfn about the events being cleared */
+ spin_lock_irq(&ev->lock);
+ ev->clearing |= mask;
+ spin_unlock_irq(&ev->lock);
+
+ /* uncondtionally schedule event check and wait for it to finish */
+ __disk_block_events(disk, true);
+ queue_delayed_work(system_nrt_wq, &ev->dwork, 0);
+ flush_delayed_work(&ev->dwork);
+ __disk_unblock_events(disk, false);
+
+ /* then, fetch and clear pending events */
+ spin_lock_irq(&ev->lock);
+ WARN_ON_ONCE(ev->clearing & mask); /* cleared by workfn */
+ pending = ev->pending & mask;
+ ev->pending &= ~mask;
+ spin_unlock_irq(&ev->lock);
+
+ return pending;
+}
+
+static void disk_events_workfn(struct work_struct *work)
+{
+ struct delayed_work *dwork = to_delayed_work(work);
+ struct disk_events *ev = container_of(dwork, struct disk_events, dwork);
+ struct gendisk *disk = ev->disk;
+ char *envp[ARRAY_SIZE(disk_uevents) + 1] = { };
+ unsigned int clearing = ev->clearing;
+ unsigned int events;
+ unsigned long intv;
+ int nr_events = 0, i;
+
+ /* check events */
+ events = disk->fops->check_events(disk, clearing);
+
+ /* accumulate pending events and schedule next poll if necessary */
+ spin_lock_irq(&ev->lock);
+
+ events &= ~ev->pending;
+ ev->pending |= events;
+ ev->clearing &= ~clearing;
+
+ intv = disk_events_poll_jiffies(disk);
+ if (!ev->block && intv)
+ queue_delayed_work(system_nrt_wq, &ev->dwork, intv);
+
+ spin_unlock_irq(&ev->lock);
+
+ /* tell userland about new events */
+ for (i = 0; i < ARRAY_SIZE(disk_uevents); i++)
+ if (events & (1 << i))
+ envp[nr_events++] = disk_uevents[i];
+
+ if (nr_events)
+ kobject_uevent_env(&disk_to_dev(disk)->kobj, KOBJ_CHANGE, envp);
+}
+
+/*
+ * A disk events enabled device has the following sysfs nodes under
+ * its /sys/block/X/ directory.
+ *
+ * events : list of all supported events
+ * events_async : list of events which can be detected w/o polling
+ * events_poll_msecs : polling interval, 0: disable, -1: system default
+ */
+static ssize_t __disk_events_show(unsigned int events, char *buf)
+{
+ const char *delim = "";
+ ssize_t pos = 0;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(disk_events_strs); i++)
+ if (events & (1 << i)) {
+ pos += sprintf(buf + pos, "%s%s",
+ delim, disk_events_strs[i]);
+ delim = " ";
+ }
+ if (pos)
+ pos += sprintf(buf + pos, "\n");
+ return pos;
+}
+
+static ssize_t disk_events_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+
+ return __disk_events_show(disk->events, buf);
+}
+
+static ssize_t disk_events_async_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+
+ return __disk_events_show(disk->async_events, buf);
+}
+
+static ssize_t disk_events_poll_msecs_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+
+ return sprintf(buf, "%ld\n", disk->ev->poll_msecs);
+}
+
+static ssize_t disk_events_poll_msecs_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+ long intv;
+
+ if (!count || !sscanf(buf, "%ld", &intv))
+ return -EINVAL;
+
+ if (intv < 0 && intv != -1)
+ return -EINVAL;
+
+ __disk_block_events(disk, true);
+ disk->ev->poll_msecs = intv;
+ __disk_unblock_events(disk, true);
+
+ return count;
+}
+
+static const DEVICE_ATTR(events, S_IRUGO, disk_events_show, NULL);
+static const DEVICE_ATTR(events_async, S_IRUGO, disk_events_async_show, NULL);
+static const DEVICE_ATTR(events_poll_msecs, S_IRUGO|S_IWUSR,
+ disk_events_poll_msecs_show,
+ disk_events_poll_msecs_store);
+
+static const struct attribute *disk_events_attrs[] = {
+ &dev_attr_events.attr,
+ &dev_attr_events_async.attr,
+ &dev_attr_events_poll_msecs.attr,
+ NULL,
+};
+
+/*
+ * The default polling interval can be specified by the kernel
+ * parameter block.events_dfl_poll_msecs which defaults to 0
+ * (disable). This can also be modified runtime by writing to
+ * /sys/module/block/events_dfl_poll_msecs.
+ */
+static int disk_events_set_dfl_poll_msecs(const char *val,
+ const struct kernel_param *kp)
+{
+ struct disk_events *ev;
+ int ret;
+
+ ret = param_set_ulong(val, kp);
+ if (ret < 0)
+ return ret;
+
+ mutex_lock(&disk_events_mutex);
+
+ list_for_each_entry(ev, &disk_events, node)
+ disk_check_events(ev->disk);
+
+ mutex_unlock(&disk_events_mutex);
+
+ return 0;
+}
+
+static const struct kernel_param_ops disk_events_dfl_poll_msecs_param_ops = {
+ .set = disk_events_set_dfl_poll_msecs,
+ .get = param_get_ulong,
+};
+
+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "block."
+
+module_param_cb(events_dfl_poll_msecs, &disk_events_dfl_poll_msecs_param_ops,
+ &disk_events_dfl_poll_msecs, 0644);
+
+/*
+ * disk_{add|del|release}_events - initialize and destroy disk_events.
+ */
+static void disk_add_events(struct gendisk *disk)
+{
+ struct disk_events *ev;
+
+ if (!disk->fops->check_events || !(disk->events | disk->async_events))
+ return;
+
+ ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+ if (!ev) {
+ pr_warn("%s: failed to initialize events\n", disk->disk_name);
+ return;
+ }
+
+ if (sysfs_create_files(&disk_to_dev(disk)->kobj,
+ disk_events_attrs) < 0) {
+ pr_warn("%s: failed to create sysfs files for events\n",
+ disk->disk_name);
+ kfree(ev);
+ return;
+ }
+
+ disk->ev = ev;
+
+ INIT_LIST_HEAD(&ev->node);
+ ev->disk = disk;
+ spin_lock_init(&ev->lock);
+ ev->block = 1;
+ ev->poll_msecs = -1;
+ INIT_DELAYED_WORK(&ev->dwork, disk_events_workfn);
+
+ mutex_lock(&disk_events_mutex);
+ list_add_tail(&ev->node, &disk_events);
+ mutex_unlock(&disk_events_mutex);
+
+ /*
+ * Block count is initialized to 1 and the following initial
+ * unblock kicks it into action.
+ */
+ __disk_unblock_events(disk, true);
+}
+
+static void disk_del_events(struct gendisk *disk)
+{
+ if (!disk->ev)
+ return;
+
+ __disk_block_events(disk, true);
+
+ mutex_lock(&disk_events_mutex);
+ list_del_init(&disk->ev->node);
+ mutex_unlock(&disk_events_mutex);
+
+ sysfs_remove_files(&disk_to_dev(disk)->kobj, disk_events_attrs);
+}
+
+static void disk_release_events(struct gendisk *disk)
+{
+ /* the block count should be 1 from disk_del_events() */
+ WARN_ON_ONCE(disk->ev && disk->ev->block != 1);
+ kfree(disk->ev);
+}
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 269bfbb..764cdb7 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -966,10 +966,11 @@ int check_disk_change(struct block_device *bdev)
{
struct gendisk *disk = bdev->bd_disk;
const struct block_device_operations *bdops = disk->fops;
+ unsigned int events;

- if (!bdops->media_changed)
- return 0;
- if (!bdops->media_changed(bdev->bd_disk))
+ events = disk_clear_events(disk, DISK_EVENT_MEDIA_CHANGE |
+ DISK_EVENT_EJECT_REQUEST);
+ if (!(events & DISK_EVENT_MEDIA_CHANGE))
return 0;

flush_disk(bdev);
@@ -1151,9 +1152,10 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)

if (whole) {
/* finish claiming */
+ mutex_lock(&bdev->bd_mutex);
spin_lock(&bdev_lock);

- if (res == 0) {
+ if (!res) {
BUG_ON(!bd_may_claim(bdev, whole, holder));
/*
* Note that for a whole device bd_holders
@@ -1173,6 +1175,20 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
wake_up_bit(&whole->bd_claiming, 0);

spin_unlock(&bdev_lock);
+
+ /*
+ * Block event polling for write claims. Any write
+ * holder makes the write_holder state stick until all
+ * are released. This is good enough and tracking
+ * individual writeable reference is too fragile given
+ * the way @mode is used in blkdev_get/put().
+ */
+ if (!res && (mode & FMODE_WRITE) && !bdev->bd_write_holder) {
+ bdev->bd_write_holder = true;
+ disk_block_events(bdev->bd_disk);
+ }
+
+ mutex_unlock(&bdev->bd_mutex);
bdput(whole);
}

@@ -1272,12 +1288,23 @@ int blkdev_put(struct block_device *bdev, fmode_t mode)

spin_unlock(&bdev_lock);

- /* if this was the last claim, holder link should go too */
- if (bdev_free)
+ /*
+ * If this was the last claim, remove holder link and
+ * unblock evpoll if it was a write holder.
+ */
+ if (bdev_free) {
bd_unlink_disk_holder(bdev);
+ if (bdev->bd_write_holder) {
+ disk_unblock_events(bdev->bd_disk);
+ bdev->bd_write_holder = false;
+ } else
+ disk_check_events(bdev->bd_disk);
+ }

mutex_unlock(&bdev->bd_mutex);
- }
+ } else
+ disk_check_events(bdev->bd_disk);
+
return __blkdev_put(bdev, mode, 0);
}
EXPORT_SYMBOL(blkdev_put);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c6f9e3c..7c63870 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1252,6 +1252,9 @@ struct block_device_operations {
int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
int (*direct_access) (struct block_device *, sector_t,
void **, unsigned long *);
+ unsigned int (*check_events) (struct gendisk *disk,
+ unsigned int clearing);
+ /* ->media_changed() is DEPRECATED, use ->check_events() instead */
int (*media_changed) (struct gendisk *);
void (*unlock_native_capacity) (struct gendisk *);
int (*revalidate_disk) (struct gendisk *);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1a033e8..b72c390 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -662,6 +662,7 @@ struct block_device {
void * bd_claiming;
void * bd_holder;
int bd_holders;
+ bool bd_write_holder;
#ifdef CONFIG_SYSFS
struct gendisk * bd_holder_disk; /* for sysfs slave linkng */
#endif
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 56e17ed..13893aa 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -127,6 +127,11 @@ struct hd_struct {
#define GENHD_FL_EXT_DEVT 64 /* allow extended devt */
#define GENHD_FL_NATIVE_CAPACITY 128

+enum {
+ DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */
+ DISK_EVENT_EJECT_REQUEST = 1 << 1, /* eject requested */
+};
+
#define BLK_SCSI_MAX_CMDS (256)
#define BLK_SCSI_CMD_PER_LONG (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8))

@@ -143,6 +148,8 @@ struct disk_part_tbl {
struct hd_struct __rcu *part[];
};

+struct disk_events;
+
struct gendisk {
/* major, first_minor and minors are input parameters only,
* don't use directly. Use disk_devt() and disk_max_parts().
@@ -154,6 +161,10 @@ struct gendisk {

char disk_name[DISK_NAME_LEN]; /* name of major driver */
char *(*devnode)(struct gendisk *gd, mode_t *mode);
+
+ unsigned int events; /* supported events */
+ unsigned int async_events; /* async events, subset of all */
+
/* Array of pointers to partitions indexed by partno.
* Protected with matching bdev lock but stat and other
* non-critical accesses use RCU. Always access through
@@ -171,8 +182,8 @@ struct gendisk {
struct kobject *slave_dir;

struct timer_rand_state *random;
-
atomic_t sync_io; /* RAID */
+ struct disk_events *ev;
#ifdef CONFIG_BLK_DEV_INTEGRITY
struct blk_integrity *integrity;
#endif
@@ -405,6 +416,11 @@ static inline int get_disk_ro(struct gendisk *disk)
return disk->part0.policy;
}

+extern void disk_block_events(struct gendisk *disk);
+extern void disk_unblock_events(struct gendisk *disk);
+extern void disk_check_events(struct gendisk *disk);
+extern unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask);
+
/* drivers/char/random.c */
extern void add_disk_randomness(struct gendisk *disk);
extern void rand_initialize_disk(struct gendisk *disk);
--
1.7.1

2010-11-01 18:45:09

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 6/7] sr: implement sr_check_events()

Replace sr_media_change() with sr_check_events(). It normally only
uses GET_EVENT_STATUS_NOTIFICATION to check both media change and
eject request. If @clearing includes DISK_EVENT_MEDIA_CHANGE, it
issues TUR and compares whether media presence has changed. The SCSI
specific media change uevent is kept for compatibility.

sr_media_change() was doing both media change check and revalidation.
The revalidation part is split into sr_block_revalidate_disk().

Signed-off-by: Tejun Heo <[email protected]>
Cc: Kay Sievers <[email protected]>
---
drivers/scsi/sr.c | 149 +++++++++++++++++++++++++++++++++------------------
drivers/scsi/sr.h | 2 +-
include/scsi/scsi.h | 1 +
3 files changed, 98 insertions(+), 54 deletions(-)

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 63cdbc6..edda30d 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -104,14 +104,15 @@ static void sr_release(struct cdrom_device_info *);
static void get_sectorsize(struct scsi_cd *);
static void get_capabilities(struct scsi_cd *);

-static int sr_media_change(struct cdrom_device_info *, int);
+static unsigned int sr_check_events(struct cdrom_device_info *cdi,
+ unsigned int clearing, int slot);
static int sr_packet(struct cdrom_device_info *, struct packet_command *);

static struct cdrom_device_ops sr_dops = {
.open = sr_open,
.release = sr_release,
.drive_status = sr_drive_status,
- .media_changed = sr_media_change,
+ .check_events = sr_check_events,
.tray_move = sr_tray_move,
.lock_door = sr_lock_door,
.select_speed = sr_select_speed,
@@ -165,65 +166,90 @@ static void scsi_cd_put(struct scsi_cd *cd)
mutex_unlock(&sr_ref_mutex);
}

+static unsigned int sr_get_events(struct scsi_device *sdev)
+{
+ u8 buf[8];
+ u8 cmd[] = { GET_EVENT_STATUS_NOTIFICATION,
+ 1, /* polled */
+ 0, 0, /* reserved */
+ 1 << 4, /* notification class: media */
+ 0, 0, /* reserved */
+ 0, sizeof(buf), /* allocation length */
+ 0, /* control */
+ };
+ struct event_header *eh = (void *)buf;
+ struct media_event_desc *med = (void *)(buf + 4);
+ struct scsi_sense_hdr sshdr;
+ int result;
+
+ result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, sizeof(buf),
+ &sshdr, SR_TIMEOUT, MAX_RETRIES, NULL);
+ if (scsi_sense_valid(&sshdr) && sshdr.sense_key == UNIT_ATTENTION)
+ return DISK_EVENT_MEDIA_CHANGE;
+
+ if (result || be16_to_cpu(eh->data_len) < sizeof(*med))
+ return 0;
+
+ if (eh->nea || eh->notification_class != 0x4)
+ return 0;
+
+ if (med->media_event_code == 1)
+ return DISK_EVENT_EJECT_REQUEST;
+ else if (med->media_event_code == 2)
+ return DISK_EVENT_MEDIA_CHANGE;
+ return 0;
+}
+
/*
- * This function checks to see if the media has been changed in the
- * CDROM drive. It is possible that we have already sensed a change,
- * or the drive may have sensed one and not yet reported it. We must
- * be ready for either case. This function always reports the current
- * value of the changed bit. If flag is 0, then the changed bit is reset.
- * This function could be done as an ioctl, but we would need to have
- * an inode for that to work, and we do not always have one.
+ * This function checks to see if the media has been changed or eject
+ * button has been pressed. It is possible that we have already
+ * sensed a change, or the drive may have sensed one and not yet
+ * reported it. The past events are accumulated in sdev->changed and
+ * returned together with the current state.
*/
-
-static int sr_media_change(struct cdrom_device_info *cdi, int slot)
+static unsigned int sr_check_events(struct cdrom_device_info *cdi,
+ unsigned int clearing, int slot)
{
struct scsi_cd *cd = cdi->handle;
- int retval;
- struct scsi_sense_hdr *sshdr;
+ bool last_present;
+ struct scsi_sense_hdr sshdr;
+ unsigned int events;
+ int ret;

- if (CDSL_CURRENT != slot) {
- /* no changer support */
- return -EINVAL;
- }
+ /* no changer support */
+ if (CDSL_CURRENT != slot)
+ return 0;

- sshdr = kzalloc(sizeof(*sshdr), GFP_KERNEL);
- retval = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES,
- sshdr);
- if (host_byte(retval) || (scsi_sense_valid(sshdr) &&
- /* ASC 0x3a is medium not present */
- sshdr->asc == 0x3a)) {
- /* Media not present or unable to test, unit probably not
- * ready. This usually means there is no disc in the drive.
- * Mark as changed, and we will figure it out later once
- * the drive is available again.
- */
- cd->device->changed = 1;
- /* This will force a flush, if called from check_disk_change */
- retval = 1;
- goto out;
- };
+ events = sr_get_events(cd->device);

- retval = cd->device->changed;
- cd->device->changed = 0;
- /* If the disk changed, the capacity will now be different,
- * so we force a re-read of this information */
- if (retval) {
- /* check multisession offset etc */
- sr_cd_check(cdi);
- get_sectorsize(cd);
+ /*
+ * GET_EVENT_STATUS_NOTIFICATION is enough unless MEDIA_CHANGE
+ * is being cleared. Note that there are devices which hang
+ * if asked to execute TUR repeatedly.
+ */
+ if (!(clearing & DISK_EVENT_MEDIA_CHANGE))
+ goto skip_tur;
+
+ /* let's see whether the media is there with TUR */
+ last_present = cd->media_present;
+ ret = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
+ cd->media_present = !(host_byte(ret) || (scsi_sense_valid(&sshdr) &&
+ sshdr.asc == 0x3a));
+ if (last_present != cd->media_present)
+ events |= DISK_EVENT_MEDIA_CHANGE;
+skip_tur:
+ if (cd->device->changed) {
+ events |= DISK_EVENT_MEDIA_CHANGE;
+ cd->device->changed = 0;
}

-out:
- /* Notify userspace, that media has changed. */
- if (retval != cd->previous_state)
+ /* for backward compatibility */
+ if (events & DISK_EVENT_MEDIA_CHANGE)
sdev_evt_send_simple(cd->device, SDEV_EVT_MEDIA_CHANGE,
GFP_KERNEL);
- cd->previous_state = retval;
- kfree(sshdr);
-
- return retval;
+ return events;
}
-
+
/*
* sr_done is the interrupt routine for the device driver.
*
@@ -508,10 +534,25 @@ out:
return ret;
}

-static int sr_block_media_changed(struct gendisk *disk)
+static unsigned int sr_block_check_events(struct gendisk *disk,
+ unsigned int clearing)
{
struct scsi_cd *cd = scsi_cd(disk);
- return cdrom_media_changed(&cd->cdi);
+ return cdrom_check_events(&cd->cdi, clearing);
+}
+
+static int sr_block_revalidate_disk(struct gendisk *disk)
+{
+ struct scsi_cd *cd = scsi_cd(disk);
+ struct scsi_sense_hdr sshdr;
+
+ /* if the unit is not ready, nothing more to do */
+ if (scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr))
+ return 0;
+
+ sr_cd_check(&cd->cdi);
+ get_sectorsize(cd);
+ return 0;
}

static const struct block_device_operations sr_bdops =
@@ -520,7 +561,8 @@ static const struct block_device_operations sr_bdops =
.open = sr_block_open,
.release = sr_block_release,
.ioctl = sr_block_ioctl,
- .media_changed = sr_block_media_changed,
+ .check_events = sr_block_check_events,
+ .revalidate_disk = sr_block_revalidate_disk,
/*
* No compat_ioctl for now because sr_block_ioctl never
* seems to pass arbitary ioctls down to host drivers.
@@ -593,6 +635,7 @@ static int sr_probe(struct device *dev)
sprintf(disk->disk_name, "sr%d", minor);
disk->fops = &sr_bdops;
disk->flags = GENHD_FL_CD;
+ disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST;

blk_queue_rq_timeout(sdev->request_queue, SR_TIMEOUT);

@@ -602,7 +645,7 @@ static int sr_probe(struct device *dev)
cd->disk = disk;
cd->capacity = 0x1fffff;
cd->device->changed = 1; /* force recheck CD type */
- cd->previous_state = 1;
+ cd->media_present = 0;
cd->use = 1;
cd->readcd_known = 0;
cd->readcd_cdda = 0;
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index 81fbc0b..e036f1d 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -40,7 +40,7 @@ typedef struct scsi_cd {
unsigned xa_flag:1; /* CD has XA sectors ? */
unsigned readcd_known:1; /* drive supports READ_CD (0xbe) */
unsigned readcd_cdda:1; /* reading audio data using READ_CD */
- unsigned previous_state:1; /* media has changed */
+ unsigned media_present:1; /* media is present */
struct cdrom_device_info cdi;
/* We hold gendisk and scsi_device references on probe and use
* the refs on this kref to decide when to release them */
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 216af85..8675861 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -104,6 +104,7 @@ struct scsi_cmnd;
#define UNMAP 0x42
#define READ_TOC 0x43
#define READ_HEADER 0x44
+#define GET_EVENT_STATUS_NOTIFICATION 0x4a
#define LOG_SELECT 0x4c
#define LOG_SENSE 0x4d
#define XDWRITEREAD_10 0x53
--
1.7.1

2010-11-01 18:45:24

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 7/7] sd: implement sd_check_events()

Replace sd_media_change() with sd_check_events(). sd used to set the
changed state whenever the device is not ready, which can cause event
loop while the device is not ready. Media presence handling code is
changed such that the changed state is set iff the media presence
actually changes. UA still always sets the changed state and
NOT_READY always (at least where it used to set ->changed) clears
media presence, so no event is lost.

Signed-off-by: Tejun Heo <[email protected]>
---
drivers/scsi/sd.c | 89 ++++++++++++++++++++++++++++------------------------
drivers/scsi/sd.h | 1 -
2 files changed, 48 insertions(+), 42 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8d488a9..34760bb 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -991,30 +991,50 @@ out:

static void set_media_not_present(struct scsi_disk *sdkp)
{
+ if (sdkp->media_present)
+ sdkp->device->changed = 1;
sdkp->media_present = 0;
sdkp->capacity = 0;
- sdkp->device->changed = 1;
+}
+
+static int media_not_present(struct scsi_disk *sdkp,
+ struct scsi_sense_hdr *sshdr)
+{
+ if (!scsi_sense_valid(sshdr))
+ return 0;
+
+ /* not invoked for commands that could return deferred errors */
+ switch (sshdr->sense_key) {
+ case UNIT_ATTENTION:
+ sdkp->device->changed = 1;
+ /* fall through */
+ case NOT_READY:
+ /* medium not present */
+ if (sshdr->asc == 0x3A) {
+ set_media_not_present(sdkp);
+ return 1;
+ }
+ }
+ return 0;
}

/**
- * sd_media_changed - check if our medium changed
- * @disk: kernel device descriptor
+ * sd_check_events - check media events
+ * @disk: kernel device descriptor
+ * @clearing: disk events currently being cleared
*
- * Returns 0 if not applicable or no change; 1 if change
+ * Returns mask of DISK_EVENT_*.
*
* Note: this function is invoked from the block subsystem.
**/
-static int sd_media_changed(struct gendisk *disk)
+static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
{
struct scsi_disk *sdkp = scsi_disk(disk);
struct scsi_device *sdp = sdkp->device;
struct scsi_sense_hdr *sshdr = NULL;
int retval;

- SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_media_changed\n"));
-
- if (!sdp->removable)
- return 0;
+ SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_check_events\n"));

/*
* If the device is offline, don't send any commands - just pretend as
@@ -1024,7 +1044,6 @@ static int sd_media_changed(struct gendisk *disk)
*/
if (!scsi_device_online(sdp)) {
set_media_not_present(sdkp);
- retval = 1;
goto out;
}

@@ -1045,26 +1064,30 @@ static int sd_media_changed(struct gendisk *disk)
sshdr);
}

- if (retval) {
+ /* failed to execute TUR, assume media not present */
+ if (host_byte(retval)) {
set_media_not_present(sdkp);
- retval = 1;
goto out;
}

+ if (media_not_present(sdkp, sshdr))
+ goto out;
+
/*
* For removable scsi disk we have to recognise the presence
- * of a disk in the drive. This is kept in the struct scsi_disk
- * struct and tested at open ! Daniel Roche ([email protected])
+ * of a disk in the drive.
*/
+ if (!sdkp->media_present)
+ sdp->changed = 1;
sdkp->media_present = 1;
-
- retval = sdp->changed;
- sdp->changed = 0;
out:
- if (retval != sdkp->previous_state)
+ /* for backward compatibility */
+ if (sdp->changed)
sdev_evt_send_simple(sdp, SDEV_EVT_MEDIA_CHANGE, GFP_KERNEL);
- sdkp->previous_state = retval;
kfree(sshdr);
+
+ retval = sdp->changed ? DISK_EVENT_MEDIA_CHANGE : 0;
+ sdp->changed = 0;
return retval;
}

@@ -1157,7 +1180,7 @@ static const struct block_device_operations sd_fops = {
#ifdef CONFIG_COMPAT
.compat_ioctl = sd_compat_ioctl,
#endif
- .media_changed = sd_media_changed,
+ .check_events = sd_check_events,
.revalidate_disk = sd_revalidate_disk,
.unlock_native_capacity = sd_unlock_native_capacity,
};
@@ -1293,23 +1316,6 @@ static int sd_done(struct scsi_cmnd *SCpnt)
return good_bytes;
}

-static int media_not_present(struct scsi_disk *sdkp,
- struct scsi_sense_hdr *sshdr)
-{
-
- if (!scsi_sense_valid(sshdr))
- return 0;
- /* not invoked for commands that could return deferred errors */
- if (sshdr->sense_key != NOT_READY &&
- sshdr->sense_key != UNIT_ATTENTION)
- return 0;
- if (sshdr->asc != 0x3A) /* medium not present */
- return 0;
-
- set_media_not_present(sdkp);
- return 1;
-}
-
/*
* spinup disk - called only in sd_revalidate_disk()
*/
@@ -1484,7 +1490,7 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
*/
if (sdp->removable &&
sense_valid && sshdr->sense_key == NOT_READY)
- sdp->changed = 1;
+ set_media_not_present(sdkp);

/*
* We used to set media_present to 0 here to indicate no media
@@ -2325,7 +2331,7 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
/* defaults, until the device tells us otherwise */
sdp->sector_size = 512;
sdkp->capacity = 0;
- sdkp->media_present = 1;
+ sdkp->media_present = 0;
sdkp->write_prot = 0;
sdkp->WCE = 0;
sdkp->RCD = 0;
@@ -2339,8 +2345,10 @@ static void sd_probe_async(void *data, async_cookie_t cookie)

gd->driverfs_dev = &sdp->sdev_gendev;
gd->flags = GENHD_FL_EXT_DEVT;
- if (sdp->removable)
+ if (sdp->removable) {
gd->flags |= GENHD_FL_REMOVABLE;
+ gd->events |= DISK_EVENT_MEDIA_CHANGE;
+ }

add_disk(gd);
sd_dif_config_host(sdkp);
@@ -2422,7 +2430,6 @@ static int sd_probe(struct device *dev)
sdkp->disk = gd;
sdkp->index = index;
atomic_set(&sdkp->openers, 0);
- sdkp->previous_state = 1;

if (!sdp->request_queue->rq_timeout) {
if (sdp->type != TYPE_MOD)
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 55488fa..c9d8f6c 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -55,7 +55,6 @@ struct scsi_disk {
u8 media_present;
u8 write_prot;
u8 protection_type;/* Data Integrity Field */
- unsigned previous_state : 1;
unsigned ATO : 1; /* state of disk ATO bit */
unsigned WCE : 1; /* state of disk WCE bit */
unsigned RCD : 1; /* state of disk RCD bit, unused */
--
1.7.1

2010-11-01 18:45:13

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 4/7] cdrom: add ->check_events() support

In principle, cdrom just needs to pass through ->check_events() but
CDROM_MEDIA_CHANGED ioctl makes things a bit more complex. Just as
with ->media_changed() support, cdrom code needs to buffer the events
and serve them to ioctl and vfs as requested. It may be a good idea
to deprecate CDROM_MEDIA_CHANGED ioctl in the long run.

Signed-off-by: Tejun Heo <[email protected]>
---
drivers/cdrom/cdrom.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++--
include/linux/cdrom.h | 6 +++++
2 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index af13c62..1ea8f8d 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -1348,7 +1348,10 @@ static int cdrom_select_disc(struct cdrom_device_info *cdi, int slot)
if (!CDROM_CAN(CDC_SELECT_DISC))
return -EDRIVE_CANT_DO_THIS;

- (void) cdi->ops->media_changed(cdi, slot);
+ if (cdi->ops->check_events)
+ cdi->ops->check_events(cdi, 0, slot);
+ else
+ cdi->ops->media_changed(cdi, slot);

if (slot == CDSL_NONE) {
/* set media changed bits, on both queues */
@@ -1392,6 +1395,41 @@ static int cdrom_select_disc(struct cdrom_device_info *cdi, int slot)
return slot;
}

+/*
+ * As cdrom implements an extra ioctl consumer for media changed
+ * event, it needs to buffer ->check_events() output, such that event
+ * is not lost for both the usual VFS and ioctl paths.
+ * cdi->{vfs|ioctl}_events are used to buffer pending events for each
+ * path.
+ *
+ * XXX: Locking is non-existent. cdi->ops->check_events() can be
+ * called in parallel and buffering fields are accessed without any
+ * exclusion. The original media_changed code had the same problem.
+ * It might be better to simply deprecate CDROM_MEDIA_CHANGED ioctl
+ * and remove this cruft altogether. It doesn't have much usefulness
+ * at this point.
+ */
+static void cdrom_update_events(struct cdrom_device_info *cdi,
+ unsigned int clearing)
+{
+ unsigned int events;
+
+ events = cdi->ops->check_events(cdi, clearing, CDSL_CURRENT);
+ cdi->vfs_events |= events;
+ cdi->ioctl_events |= events;
+}
+
+unsigned int cdrom_check_events(struct cdrom_device_info *cdi,
+ unsigned int clearing)
+{
+ unsigned int events;
+
+ cdrom_update_events(cdi, clearing);
+ events = cdi->vfs_events;
+ cdi->vfs_events = 0;
+ return events;
+}
+
/* We want to make media_changed accessible to the user through an
* ioctl. The main problem now is that we must double-buffer the
* low-level implementation, to assure that the VFS and the user both
@@ -1403,15 +1441,26 @@ int media_changed(struct cdrom_device_info *cdi, int queue)
{
unsigned int mask = (1 << (queue & 1));
int ret = !!(cdi->mc_flags & mask);
+ bool changed;

if (!CDROM_CAN(CDC_MEDIA_CHANGED))
- return ret;
+ return ret;
+
/* changed since last call? */
- if (cdi->ops->media_changed(cdi, CDSL_CURRENT)) {
+ if (cdi->ops->check_events) {
+ BUG_ON(!queue); /* shouldn't be called from VFS path */
+ cdrom_update_events(cdi, DISK_EVENT_MEDIA_CHANGE);
+ changed = cdi->ioctl_events & DISK_EVENT_MEDIA_CHANGE;
+ cdi->ioctl_events = 0;
+ } else
+ changed = cdi->ops->media_changed(cdi, CDSL_CURRENT);
+
+ if (changed) {
cdi->mc_flags = 0x3; /* set bit on both queues */
ret |= 1;
cdi->media_written = 0;
}
+
cdi->mc_flags &= ~mask; /* clear bit */
return ret;
}
diff --git a/include/linux/cdrom.h b/include/linux/cdrom.h
index 78e9047..35eae4b 100644
--- a/include/linux/cdrom.h
+++ b/include/linux/cdrom.h
@@ -946,6 +946,8 @@ struct cdrom_device_info {
/* device-related storage */
unsigned int options : 30; /* options flags */
unsigned mc_flags : 2; /* media change buffer flags */
+ unsigned int vfs_events; /* cached events for vfs path */
+ unsigned int ioctl_events; /* cached events for ioctl path */
int use_count; /* number of times device opened */
char name[20]; /* name of the device type */
/* per-device flags */
@@ -965,6 +967,8 @@ struct cdrom_device_ops {
int (*open) (struct cdrom_device_info *, int);
void (*release) (struct cdrom_device_info *);
int (*drive_status) (struct cdrom_device_info *, int);
+ unsigned int (*check_events) (struct cdrom_device_info *cdi,
+ unsigned int clearing, int slot);
int (*media_changed) (struct cdrom_device_info *, int);
int (*tray_move) (struct cdrom_device_info *, int);
int (*lock_door) (struct cdrom_device_info *, int);
@@ -993,6 +997,8 @@ extern int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev,
extern void cdrom_release(struct cdrom_device_info *cdi, fmode_t mode);
extern int cdrom_ioctl(struct cdrom_device_info *cdi, struct block_device *bdev,
fmode_t mode, unsigned int cmd, unsigned long arg);
+extern unsigned int cdrom_check_events(struct cdrom_device_info *cdi,
+ unsigned int clearing);
extern int cdrom_media_changed(struct cdrom_device_info *);

extern int register_cdrom(struct cdrom_device_info *cdi);
--
1.7.1

2010-11-01 18:46:59

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 2/7] block: move register_disk() and del_gendisk() to block/genhd.c

There's no reason register_disk() and del_gendisk() should be in
fs/partitions/check.c and it forces unnecessary global functions.
Move both to genhd.c. While at it, collapse unlink_gendisk(), which
was artificially in a separate function due to genhd.c/check.c split,
into del_gendisk().

Signed-off-by: Tejun Heo <[email protected]>
---
block/genhd.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++--
fs/partitions/check.c | 89 -----------------------------------------------
include/linux/blkdev.h | 1 -
include/linux/genhd.h | 1 -
4 files changed, 87 insertions(+), 94 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 0905ab2..2e5e4c0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -502,6 +502,64 @@ static int exact_lock(dev_t devt, void *data)
return 0;
}

+void register_disk(struct gendisk *disk)
+{
+ struct device *ddev = disk_to_dev(disk);
+ struct block_device *bdev;
+ struct disk_part_iter piter;
+ struct hd_struct *part;
+ int err;
+
+ ddev->parent = disk->driverfs_dev;
+
+ dev_set_name(ddev, disk->disk_name);
+
+ /* delay uevents, until we scanned partition table */
+ dev_set_uevent_suppress(ddev, 1);
+
+ if (device_add(ddev))
+ return;
+ if (!sysfs_deprecated) {
+ err = sysfs_create_link(block_depr, &ddev->kobj,
+ kobject_name(&ddev->kobj));
+ if (err) {
+ device_del(ddev);
+ return;
+ }
+ }
+ disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
+ disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);
+
+ /* No minors to use for partitions */
+ if (!disk_partitionable(disk))
+ goto exit;
+
+ /* No such device (e.g., media were just removed) */
+ if (!get_capacity(disk))
+ goto exit;
+
+ bdev = bdget_disk(disk, 0);
+ if (!bdev)
+ goto exit;
+
+ bdev->bd_invalidated = 1;
+ err = blkdev_get(bdev, FMODE_READ, NULL);
+ if (err < 0)
+ goto exit;
+ blkdev_put(bdev, FMODE_READ);
+
+exit:
+ /* announce disk after possible partitions are created */
+ dev_set_uevent_suppress(ddev, 0);
+ kobject_uevent(&ddev->kobj, KOBJ_ADD);
+
+ /* announce possible partitions */
+ disk_part_iter_init(&piter, disk, 0);
+ while ((part = disk_part_iter_next(&piter)))
+ kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
+ disk_part_iter_exit(&piter);
+}
+
/**
* add_disk - add partitioning information to kernel list
* @disk: per-device partitioning information
@@ -552,17 +610,43 @@ void add_disk(struct gendisk *disk)
"bdi");
WARN_ON(retval);
}
-
EXPORT_SYMBOL(add_disk);
-EXPORT_SYMBOL(del_gendisk); /* in partitions/check.c */

-void unlink_gendisk(struct gendisk *disk)
+void del_gendisk(struct gendisk *disk)
{
+ struct disk_part_iter piter;
+ struct hd_struct *part;
+
+ /* invalidate stuff */
+ disk_part_iter_init(&piter, disk,
+ DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
+ while ((part = disk_part_iter_next(&piter))) {
+ invalidate_partition(disk, part->partno);
+ delete_partition(disk, part->partno);
+ }
+ disk_part_iter_exit(&piter);
+
+ invalidate_partition(disk, 0);
+ blk_free_devt(disk_to_dev(disk)->devt);
+ set_capacity(disk, 0);
+ disk->flags &= ~GENHD_FL_UP;
+
sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
bdi_unregister(&disk->queue->backing_dev_info);
blk_unregister_queue(disk);
blk_unregister_region(disk_devt(disk), disk->minors);
+
+ part_stat_set_all(&disk->part0, 0);
+ disk->part0.stamp = 0;
+
+ kobject_put(disk->part0.holder_dir);
+ kobject_put(disk->slave_dir);
+ disk->driverfs_dev = NULL;
+ if (!sysfs_deprecated)
+ sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
+ device_del(disk_to_dev(disk));
}
+EXPORT_SYMBOL(del_gendisk);

/**
* get_gendisk - get partitioning information for a given device
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 2e6501d..e4afbac 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -507,65 +507,6 @@ out_put:
return ERR_PTR(err);
}

-/* Not exported, helper to add_disk(). */
-void register_disk(struct gendisk *disk)
-{
- struct device *ddev = disk_to_dev(disk);
- struct block_device *bdev;
- struct disk_part_iter piter;
- struct hd_struct *part;
- int err;
-
- ddev->parent = disk->driverfs_dev;
-
- dev_set_name(ddev, disk->disk_name);
-
- /* delay uevents, until we scanned partition table */
- dev_set_uevent_suppress(ddev, 1);
-
- if (device_add(ddev))
- return;
- if (!sysfs_deprecated) {
- err = sysfs_create_link(block_depr, &ddev->kobj,
- kobject_name(&ddev->kobj));
- if (err) {
- device_del(ddev);
- return;
- }
- }
- disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
- disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);
-
- /* No minors to use for partitions */
- if (!disk_partitionable(disk))
- goto exit;
-
- /* No such device (e.g., media were just removed) */
- if (!get_capacity(disk))
- goto exit;
-
- bdev = bdget_disk(disk, 0);
- if (!bdev)
- goto exit;
-
- bdev->bd_invalidated = 1;
- err = blkdev_get(bdev, FMODE_READ, NULL);
- if (err < 0)
- goto exit;
- blkdev_put(bdev, FMODE_READ);
-
-exit:
- /* announce disk after possible partitions are created */
- dev_set_uevent_suppress(ddev, 0);
- kobject_uevent(&ddev->kobj, KOBJ_ADD);
-
- /* announce possible partitions */
- disk_part_iter_init(&piter, disk, 0);
- while ((part = disk_part_iter_next(&piter)))
- kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
- disk_part_iter_exit(&piter);
-}
-
static bool disk_unlock_native_capacity(struct gendisk *disk)
{
const struct block_device_operations *bdops = disk->fops;
@@ -728,33 +669,3 @@ fail:
}

EXPORT_SYMBOL(read_dev_sector);
-
-void del_gendisk(struct gendisk *disk)
-{
- struct disk_part_iter piter;
- struct hd_struct *part;
-
- /* invalidate stuff */
- disk_part_iter_init(&piter, disk,
- DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
- while ((part = disk_part_iter_next(&piter))) {
- invalidate_partition(disk, part->partno);
- delete_partition(disk, part->partno);
- }
- disk_part_iter_exit(&piter);
-
- invalidate_partition(disk, 0);
- blk_free_devt(disk_to_dev(disk)->devt);
- set_capacity(disk, 0);
- disk->flags &= ~GENHD_FL_UP;
- unlink_gendisk(disk);
- part_stat_set_all(&disk->part0, 0);
- disk->part0.stamp = 0;
-
- kobject_put(disk->part0.holder_dir);
- kobject_put(disk->slave_dir);
- disk->driverfs_dev = NULL;
- if (!sysfs_deprecated)
- sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
- device_del(disk_to_dev(disk));
-}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5027a59..c6f9e3c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -644,7 +644,6 @@ static inline void rq_flush_dcache_pages(struct request *rq)

extern int blk_register_queue(struct gendisk *disk);
extern void blk_unregister_queue(struct gendisk *disk);
-extern void register_disk(struct gendisk *dev);
extern void generic_make_request(struct bio *bio);
extern void blk_rq_init(struct request_queue *q, struct request *rq);
extern void blk_put_request(struct request *);
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 5e4e692..56e17ed 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -394,7 +394,6 @@ extern void part_round_stats(int cpu, struct hd_struct *part);
/* block/genhd.c */
extern void add_disk(struct gendisk *disk);
extern void del_gendisk(struct gendisk *gp);
-extern void unlink_gendisk(struct gendisk *gp);
extern struct gendisk *get_gendisk(dev_t dev, int *partno);
extern struct block_device *bdget_disk(struct gendisk *disk, int partno);

--
1.7.1

2010-11-01 18:46:29

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 1/7] block: kill genhd_media_change_notify()

There's no user of the facility. Kill it.

Signed-off-by: Tejun Heo <[email protected]>
---
block/genhd.c | 25 -------------------------
include/linux/genhd.h | 1 -
2 files changed, 0 insertions(+), 26 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 5fa2b44..0905ab2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1110,29 +1110,6 @@ static int __init proc_genhd_init(void)
module_init(proc_genhd_init);
#endif /* CONFIG_PROC_FS */

-static void media_change_notify_thread(struct work_struct *work)
-{
- struct gendisk *gd = container_of(work, struct gendisk, async_notify);
- char event[] = "MEDIA_CHANGE=1";
- char *envp[] = { event, NULL };
-
- /*
- * set enviroment vars to indicate which event this is for
- * so that user space will know to go check the media status.
- */
- kobject_uevent_env(&disk_to_dev(gd)->kobj, KOBJ_CHANGE, envp);
- put_device(gd->driverfs_dev);
-}
-
-#if 0
-void genhd_media_change_notify(struct gendisk *disk)
-{
- get_device(disk->driverfs_dev);
- schedule_work(&disk->async_notify);
-}
-EXPORT_SYMBOL_GPL(genhd_media_change_notify);
-#endif /* 0 */
-
dev_t blk_lookup_devt(const char *name, int partno)
{
dev_t devt = MKDEV(0, 0);
@@ -1198,8 +1175,6 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
disk_to_dev(disk)->class = &block_class;
disk_to_dev(disk)->type = &disk_type;
device_initialize(disk_to_dev(disk));
- INIT_WORK(&disk->async_notify,
- media_change_notify_thread);
}
return disk;
}
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 7a7b9c1..5e4e692 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -173,7 +173,6 @@ struct gendisk {
struct timer_rand_state *random;

atomic_t sync_io; /* RAID */
- struct work_struct async_notify;
#ifdef CONFIG_BLK_DEV_INTEGRITY
struct blk_integrity *integrity;
#endif
--
1.7.1

2010-11-04 17:07:18

by Douglas Gilbert

[permalink] [raw]
Subject: Re: [PATCHSET RFC] block/SCSI: implement in-kernel disk event handling

On 10-11-01 02:44 PM, Tejun Heo wrote:
> This patchset implements in-kernel disk event handling framework and
> add support for it to sr and sd. This is largely to move media
> presence polling into kernel as userspace implementation turned out to
> be quite problematic over the years.
>
>> From the patch description of the third patch,
>
> Currently, media presence polling for removeable block devices is done
> from userland. There are several issues with this.
>
> * Polling is done by periodically opening the device. For SCSI
> devices, the command sequence generated by such action involves a
> few different commands including TEST_UNIT_READY. This behavior,
> while perfectly legal, is different from Windows which only issues
> single command, GET_EVENT_STATUS_NOTIFICATION. Unfortunately, some
> ATAPI devices lock up after being periodically queried such command
> sequences.
> * There is no reliable and unintrusive way for a userland program to
> tell whether the target device is safe for media presence polling.
> For example, polling for media presence during an on-going burning
> session can make it fail. The polling program can avoid this by
> opening the device with O_EXCL but then it risks making a valid
> exclusive user of the device fail w/ -EBUSY.

Tejun,
Yes. You could add the SCSI FORMAT UNIT and the ATA
(and soon SCSI) SANITIZE commands to that example.
Compliant SCSI units should respond to such an untimely
TEST UNIT READY with the additional sense of "logical
unit not ready, format in progress" but many devices
don't leading to error recovery and device resets.


Ideally I would like to see a way from the user space
to uncouple a unit from the sd (or sr) driver, in a
similar way to how an LLD can use
scsi_device::no_uld_attach . That would restrict user
space access to the bsg or sg driver. Another approach
is to flag the sd (or sr) driver to not send any !@#$%^&
commands to a device until further notice.

Doug Gilbert

> * Userland polling is unnecessarily heavy and in-kernel implementation
> is lighter and better coordinated (workqueue, timer slack).

<snip>

> --
> tejun

2010-11-04 18:38:20

by Mike Anderson

[permalink] [raw]
Subject: Re: [PATCHSET RFC] block/SCSI: implement in-kernel disk event handling

Douglas Gilbert <[email protected]> wrote:
<snip>
> Ideally I would like to see a way from the user space
> to uncouple a unit from the sd (or sr) driver, in a
> similar way to how an LLD can use
> scsi_device::no_uld_attach . That would restrict user
> space access to the bsg or sg driver.

I am not sure if this is exactly what you want but you can unbind a
device from the sd driver today.

E.x.

# cd /sys/bus/scsi/drivers/sd
# ls
0:0:0:0 4:0:0:0 4:0:1:0 5:0:0:0 5:0:1:0 bind unbind
# sg_map
/dev/sg0 /dev/sda
/dev/sg1 /dev/sdb
/dev/sg2 /dev/sdc
/dev/sg3 /dev/sdd
/dev/sg4 /dev/sde
# echo "5:0:1:0" > unbind
# ls
0:0:0:0 4:0:0:0 4:0:1:0 5:0:0:0 bind unbind
# sg_map
/dev/sg0 /dev/sda
/dev/sg1 /dev/sdb
/dev/sg2 /dev/sdc
/dev/sg3 /dev/sdd
/dev/sg4

-andmike
--
Michael Anderson
[email protected]