2007-06-15 12:13:32

by Geert Uytterhoeven

[permalink] [raw]
Subject: [patch 4/6] ps3: Disk Storage Driver

From: Geert Uytterhoeven <[email protected]>

Add a Disk Storage Driver for the PS3:
- Implemented as a block device driver with a dynamic major
- Disk names (and partitions) are of the format ps3d%c(%u)
- Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
doesn't support scatter-gather

CC: Geoff Levand <[email protected]>
Signed-off-by: Geert Uytterhoeven <[email protected]>
---
Changes since previous submission:
o Don't use `default y' in Kconfig
o Remove the thread for handling requests
o Set up sysfs links between block device and PS3 system device:
. /sys/block/ps3da/device -> ../../devices/ps3_system/sb_02
. /sys/devices/ps3_system/sb_02/block:ps3da -> ../../../block/ps3da
o Fix all FIXMEs
o Implement proper barrier support
o Call add_disk_randomness()
o Add a test to verify we are running on a PS3
o Fix error path in ps3disk_init()
o Fix cleanup order in ps3disk_exit()

arch/powerpc/platforms/ps3/Kconfig | 10
drivers/block/Makefile | 1
drivers/block/ps3disk.c | 526 +++++++++++++++++++++++++++++++++++++
3 files changed, 537 insertions(+)

--- a/arch/powerpc/platforms/ps3/Kconfig
+++ b/arch/powerpc/platforms/ps3/Kconfig
@@ -102,4 +102,14 @@ config PS3_STORAGE
depends on PPC_PS3
tristate

+config PS3_DISK
+ tristate "PS3 Disk Storage Driver"
+ depends on PPC_PS3 && BLOCK
+ select PS3_STORAGE
+ help
+ Include support for the PS3 Disk Storage.
+
+ This support is required to access the PS3 hard disk.
+ In general, all users will say Y or M.
+
endmenu
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_VIODASD) += viodasd.o
obj-$(CONFIG_BLK_DEV_SX8) += sx8.o
obj-$(CONFIG_BLK_DEV_UB) += ub.o

+obj-$(CONFIG_PS3_DISK) += ps3disk.o
--- /dev/null
+++ b/drivers/block/ps3disk.c
@@ -0,0 +1,526 @@
+/*
+ * PS3 Disk Storage Driver
+ *
+ * Copyright (C) 2007 Sony Computer Entertainment Inc.
+ * Copyright 2007 Sony Corp.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published
+ * by the Free Software Foundation; version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/blkdev.h>
+#include <linux/freezer.h>
+#include <linux/hdreg.h>
+#include <linux/interrupt.h>
+#include <linux/kthread.h>
+
+#include <asm/lv1call.h>
+#include <asm/ps3stor.h>
+#include <asm/firmware.h>
+
+
+#define DEVICE_NAME "ps3disk"
+
+#define BOUNCE_SIZE (64*1024)
+
+#define PS3DISK_MAJOR 0
+
+#define PS3DISK_MAX_DISKS 16
+#define PS3DISK_MINORS 16
+
+#define KERNEL_SECTOR_SIZE 512
+
+
+#define PS3DISK_NAME "ps3d%c"
+
+#define LV1_STORAGE_ATA_HDDOUT (0x23)
+
+
+struct ps3disk_private {
+ spinlock_t lock; /* Request queue spinlock */
+ struct request_queue *queue;
+ struct gendisk *gendisk;
+ unsigned int blocking_factor;
+ struct request *req;
+};
+#define ps3disk_priv(dev) ((dev)->sbd.core.driver_data)
+
+static int ps3disk_major = PS3DISK_MAJOR;
+
+
+static int ps3disk_open(struct inode *inode, struct file *file)
+{
+ struct ps3_storage_device *dev = inode->i_bdev->bd_disk->private_data;
+
+ file->private_data = dev;
+ return 0;
+}
+
+static struct block_device_operations ps3disk_fops = {
+ .owner = THIS_MODULE,
+ .open = ps3disk_open,
+};
+
+
+static void ps3disk_scatter_gather(struct ps3_storage_device *dev,
+ struct request *req, int gather)
+{
+ unsigned int sectors = 0, offset = 0;
+ struct bio *bio;
+ sector_t sector;
+ struct bio_vec *bvec;
+ unsigned int i = 0, j;
+ size_t size;
+ void *buf;
+
+ rq_for_each_bio(bio, req) {
+ sector = bio->bi_sector;
+ dev_dbg(&dev->sbd.core,
+ "%s:%u: bio %u: %u segs %u sectors from %lu\n",
+ __func__, __LINE__, i, bio_segments(bio),
+ bio_sectors(bio), sector);
+ bio_for_each_segment(bvec, bio, j) {
+ size = bio_cur_sectors(bio)*KERNEL_SECTOR_SIZE;
+ buf = __bio_kmap_atomic(bio, j, KM_USER0);
+ if (gather)
+ memcpy(dev->bounce_buf+offset, buf, size);
+ else
+ memcpy(buf, dev->bounce_buf+offset, size);
+ offset += size;
+ __bio_kunmap_atomic(bio, KM_USER0);
+ }
+ sectors += bio_sectors(bio);
+ i++;
+ }
+}
+
+static u64 ps3disk_read_write_sectors(struct ps3_storage_device *dev, u64 lpar,
+ u64 start_sector, u64 sectors, int write)
+{
+ unsigned int region_id = dev->regions[dev->region_idx].id;
+ const char *op = write ? "write" : "read";
+ int res;
+
+ dev_dbg(&dev->sbd.core, "%s:%u: %s %lu sectors starting at %lu\n",
+ __func__, __LINE__, op, sectors, start_sector);
+
+ res = write ? lv1_storage_write(dev->sbd.dev_id, region_id,
+ start_sector, sectors, 0, lpar,
+ &dev->tag)
+ : lv1_storage_read(dev->sbd.dev_id, region_id,
+ start_sector, sectors, 0, lpar,
+ &dev->tag);
+ if (res) {
+ dev_dbg(&dev->sbd.core, "%s:%u: %s failed %d\n", __func__,
+ __LINE__, op, res);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int ps3disk_submit_request_sg(struct ps3_storage_device *dev,
+ struct request *req)
+{
+ struct ps3disk_private *priv = ps3disk_priv(dev);
+ int write = rq_data_dir(req);
+ const char *op = write ? "write" : "read";
+ u64 res;
+
+#ifdef DEBUG
+ unsigned int n = 0;
+ struct bio *bio;
+ rq_for_each_bio(bio, req)
+ n++;
+ dev_dbg(&dev->sbd.core,
+ "%s:%u: %s req has %u bios for %lu sectors %lu hard sectors\n",
+ __func__, __LINE__, op, n, req->nr_sectors,
+ req->hard_nr_sectors);
+#endif
+
+ if (write)
+ ps3disk_scatter_gather(dev, req, 1);
+
+ res = ps3disk_read_write_sectors(dev, dev->bounce_lpar,
+ req->sector*priv->blocking_factor,
+ req->nr_sectors*priv->blocking_factor,
+ write);
+ if (res) {
+ dev_err(&dev->sbd.core, "%s:%u: %s failed 0x%lx\n", __func__,
+ __LINE__, op, res);
+ end_request(req, 0);
+ return 0;
+ }
+
+ priv->req = req;
+ return 1;
+}
+
+static int ps3disk_submit_flush_request(struct ps3_storage_device *dev,
+ struct request *req)
+{
+ struct ps3disk_private *priv = ps3disk_priv(dev);
+ u64 res;
+
+ dev_dbg(&dev->sbd.core, "%s:%u: flush request\n", __func__, __LINE__);
+
+ res = lv1_storage_send_device_command(dev->sbd.dev_id,
+ LV1_STORAGE_ATA_HDDOUT, 0, 0, 0,
+ 0, &dev->tag);
+ if (res) {
+ dev_err(&dev->sbd.core, "%s:%u: sync cache failed 0x%lx\n",
+ __func__, __LINE__, res);
+ end_request(req, 0);
+ return 0;
+ }
+
+ priv->req = req;
+ return 1;
+}
+
+static void ps3disk_do_request(struct ps3_storage_device *dev,
+ request_queue_t *q)
+{
+ struct request *req;
+
+ dev_dbg(&dev->sbd.core, "%s:%u\n", __func__, __LINE__);
+
+ while ((req = elv_next_request(q))) {
+ if (blk_fs_request(req)) {
+ if (ps3disk_submit_request_sg(dev, req))
+ break;
+ } else if (req->cmd_type == REQ_TYPE_FLUSH) {
+ if (ps3disk_submit_flush_request(dev, req))
+ break;
+ } else {
+ blk_dump_rq_flags(req, DEVICE_NAME " bad request");
+ end_request(req, 0);
+ continue;
+ }
+ }
+}
+
+static void ps3disk_request(request_queue_t *q)
+{
+ struct ps3_storage_device *dev = q->queuedata;
+ struct ps3disk_private *priv = ps3disk_priv(dev);
+
+ if (priv->req) {
+ dev_dbg(&dev->sbd.core, "%s:%u busy\n", __func__, __LINE__);
+ return;
+ }
+
+ ps3disk_do_request(dev, q);
+}
+
+static irqreturn_t ps3disk_interrupt(int irq, void *data)
+{
+ struct ps3_storage_device *dev = data;
+ struct ps3disk_private *priv;
+ struct request *req;
+ int res, read, uptodate;
+ u64 tag, status;
+ unsigned long num_sectors;
+ const char *op;
+
+ res = lv1_storage_get_async_status(dev->sbd.dev_id, &tag, &status);
+
+ if (tag != dev->tag)
+ dev_err(&dev->sbd.core,
+ "%s:%u: tag mismatch, got %lx, expected %lx\n",
+ __func__, __LINE__, tag, dev->tag);
+
+ if (res) {
+ dev_err(&dev->sbd.core, "%s:%u: res=%d status=0x%lx\n",
+ __func__, __LINE__, res, status);
+ return IRQ_HANDLED;
+ }
+
+ priv = ps3disk_priv(dev);
+ req = priv->req;
+ if (!req) {
+ dev_dbg(&dev->sbd.core,
+ "%s:%u non-block layer request completed\n", __func__,
+ __LINE__);
+ dev->lv1_status = status;
+ complete(&dev->done);
+ return IRQ_HANDLED;
+ }
+
+ if (req->cmd_type == REQ_TYPE_FLUSH) {
+ read = 0;
+ num_sectors = req->hard_cur_sectors;
+ op = "flush";
+ } else {
+ read = !rq_data_dir(req);
+ num_sectors = req->nr_sectors;
+ op = read ? "read" : "write";
+ }
+ if (status) {
+ dev_dbg(&dev->sbd.core, "%s:%u: %s failed 0x%lx\n", __func__,
+ __LINE__, op, status);
+ uptodate = 0;
+ } else {
+ dev_dbg(&dev->sbd.core, "%s:%u: %s completed\n", __func__,
+ __LINE__, op);
+ uptodate = 1;
+ if (read)
+ ps3disk_scatter_gather(dev, req, 0);
+ }
+
+ spin_lock(&priv->lock);
+ if (!end_that_request_first(req, uptodate, num_sectors)) {
+ add_disk_randomness(req->rq_disk);
+ blkdev_dequeue_request(req);
+ end_that_request_last(req, uptodate);
+ }
+ priv->req = NULL;
+ ps3disk_do_request(dev, priv->queue);
+ spin_unlock(&priv->lock);
+
+ return IRQ_HANDLED;
+}
+
+static int ps3disk_sync_cache(struct ps3_storage_device *dev)
+{
+ u64 res;
+
+ dev_dbg(&dev->sbd.core, "%s:%u: sync cache\n", __func__, __LINE__);
+
+ res = ps3stor_send_command(dev, LV1_STORAGE_ATA_HDDOUT, 0, 0, 0, 0);
+ if (res) {
+ dev_err(&dev->sbd.core, "%s:%u: sync cache failed 0x%lx\n",
+ __func__, __LINE__, res);
+ return -EIO;
+ }
+ return 0;
+}
+
+static void ps3disk_prepare_flush(request_queue_t *q, struct request *req)
+{
+ struct ps3_storage_device *dev = q->queuedata;
+
+ dev_dbg(&dev->sbd.core, "%s:%u\n", __func__, __LINE__);
+
+ memset(req->cmd, 0, sizeof(req->cmd));
+ req->cmd_type = REQ_TYPE_FLUSH;
+}
+
+static int ps3disk_issue_flush(request_queue_t *q, struct gendisk *gendisk,
+ sector_t *sector)
+{
+ struct ps3_storage_device *dev = q->queuedata;
+ struct request *req;
+ int res;
+
+ dev_dbg(&dev->sbd.core, "%s:%u\n", __func__, __LINE__);
+
+ req = blk_get_request(q, WRITE, __GFP_WAIT);
+ ps3disk_prepare_flush(q, req);
+ res = blk_execute_rq(q, gendisk, req, 0);
+ if (res)
+ dev_err(&dev->sbd.core, "%s:%u: flush request failed %d\n",
+ __func__, __LINE__, res);
+ blk_put_request(req);
+ return res;
+}
+
+
+static unsigned long ps3disk_mask;
+
+static int __devinit ps3disk_probe(struct ps3_system_bus_device *_dev)
+{
+ struct ps3_storage_device *dev = to_ps3_storage_device(&_dev->core);
+ struct ps3disk_private *priv;
+ int error;
+ unsigned int devidx;
+ struct request_queue *queue;
+ struct gendisk *gendisk;
+
+ if (dev->blk_size < KERNEL_SECTOR_SIZE) {
+ dev_err(&dev->sbd.core,
+ "%s:%u: cannot handle block size %lu\n", __func__,
+ __LINE__, dev->blk_size);
+ return -EINVAL;
+ }
+
+ BUILD_BUG_ON(PS3DISK_MAX_DISKS > BITS_PER_LONG);
+ devidx = find_first_zero_bit(&ps3disk_mask, PS3DISK_MAX_DISKS);
+ if (devidx >= PS3DISK_MAX_DISKS) {
+ dev_err(&dev->sbd.core, "%s:%u: Too many disks\n", __func__,
+ __LINE__);
+ return -ENOSPC;
+ }
+ __set_bit(devidx, &ps3disk_mask);
+
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv) {
+ error = -ENOMEM;
+ goto fail;
+ }
+
+ ps3disk_priv(dev) = priv;
+ spin_lock_init(&priv->lock);
+
+ dev->bounce_size = BOUNCE_SIZE;
+ dev->bounce_buf = kmalloc(BOUNCE_SIZE, GFP_DMA);
+ if (!dev->bounce_buf) {
+ error = -ENOMEM;
+ goto fail_free_priv;
+ }
+
+ error = ps3stor_setup(dev);
+ if (error)
+ goto fail_free_bounce;
+
+ /* override the interrupt handler */
+ free_irq(dev->irq, dev);
+ error = request_irq(dev->irq, ps3disk_interrupt, IRQF_DISABLED,
+ dev->sbd.core.driver->name, dev);
+ if (error) {
+ dev_err(&dev->sbd.core, "%s:%u: request_irq failed %d\n",
+ __func__, __LINE__, error);
+ goto fail_teardown;
+ }
+
+ queue = blk_init_queue(ps3disk_request, &priv->lock);
+ if (!queue) {
+ dev_err(&dev->sbd.core, "%s:%u: blk_init_queue failed\n",
+ __func__, __LINE__);
+ error = -ENOMEM;
+ goto fail_teardown;
+ }
+
+ priv->queue = queue;
+ queue->queuedata = dev;
+
+ blk_queue_bounce_limit(queue, BLK_BOUNCE_HIGH);
+
+ blk_queue_max_sectors(queue, dev->bounce_size/KERNEL_SECTOR_SIZE);
+ blk_queue_segment_boundary(queue, -1UL);
+ blk_queue_dma_alignment(queue, dev->blk_size-1);
+ blk_queue_hardsect_size(queue, dev->blk_size);
+
+ blk_queue_issue_flush_fn(queue, ps3disk_issue_flush);
+ blk_queue_ordered(queue, QUEUE_ORDERED_DRAIN_FLUSH,
+ ps3disk_prepare_flush);
+
+ blk_queue_max_phys_segments(queue, -1);
+ blk_queue_max_hw_segments(queue, -1);
+ blk_queue_max_segment_size(queue, dev->bounce_size);
+
+ gendisk = alloc_disk(PS3DISK_MINORS);
+ if (!gendisk) {
+ dev_err(&dev->sbd.core, "%s:%u: alloc_disk failed\n", __func__,
+ __LINE__);
+ error = -ENOMEM;
+ goto fail_cleanup_queue;
+ }
+
+ priv->gendisk = gendisk;
+ gendisk->major = ps3disk_major;
+ gendisk->first_minor = devidx * PS3DISK_MINORS;
+ gendisk->fops = &ps3disk_fops;
+ gendisk->queue = queue;
+ gendisk->private_data = dev;
+ gendisk->driverfs_dev = &dev->sbd.core;
+ snprintf(gendisk->disk_name, sizeof(gendisk->disk_name), PS3DISK_NAME,
+ devidx+'a');
+ priv->blocking_factor = dev->blk_size/KERNEL_SECTOR_SIZE;
+ set_capacity(gendisk,
+ dev->regions[dev->region_idx].size*priv->blocking_factor);
+
+ add_disk(gendisk);
+ return 0;
+
+fail_cleanup_queue:
+ blk_cleanup_queue(queue);
+fail_teardown:
+ ps3stor_teardown(dev);
+fail_free_bounce:
+ kfree(dev->bounce_buf);
+fail_free_priv:
+ kfree(priv);
+fail:
+ __clear_bit(devidx, &ps3disk_mask);
+ return error;
+}
+
+static int ps3disk_remove(struct ps3_system_bus_device *_dev)
+{
+ struct ps3_storage_device *dev = to_ps3_storage_device(&_dev->core);
+ struct ps3disk_private *priv = ps3disk_priv(dev);
+
+ __clear_bit(priv->gendisk->first_minor / PS3DISK_MINORS,
+ &ps3disk_mask);
+ del_gendisk(priv->gendisk);
+ put_disk(priv->gendisk);
+ blk_cleanup_queue(priv->queue);
+ dev_notice(&dev->sbd.core, "Synchronizing disk cache\n");
+ ps3disk_sync_cache(dev);
+ ps3stor_teardown(dev);
+ kfree(dev->bounce_buf);
+ kfree(priv);
+ return 0;
+}
+
+static struct ps3_system_bus_driver ps3disk = {
+ .match_id = PS3_MATCH_ID_STOR_DISK,
+ .core.name = DEVICE_NAME,
+ .core.owner = THIS_MODULE,
+ .probe = ps3disk_probe,
+ .remove = ps3disk_remove,
+ .shutdown = ps3disk_remove,
+};
+
+
+static int __init ps3disk_init(void)
+{
+ int error;
+
+ if (!firmware_has_feature(FW_FEATURE_PS3_LV1))
+ return -ENODEV;
+
+ error = register_blkdev(ps3disk_major, DEVICE_NAME);
+ if (error <= 0) {
+ printk(KERN_ERR "%s:%u: register_blkdev failed %d\n", __func__,
+ __LINE__, error);
+ return error;
+ }
+ if (!ps3disk_major)
+ ps3disk_major = error;
+
+ pr_info("%s:%u: registered block device major %d\n", __func__,
+ __LINE__, ps3disk_major);
+
+ error = ps3_system_bus_driver_register(&ps3disk);
+ if (error)
+ unregister_blkdev(ps3disk_major, DEVICE_NAME);
+
+ return error;
+}
+
+static void __exit ps3disk_exit(void)
+{
+ ps3_system_bus_driver_unregister(&ps3disk);
+ unregister_blkdev(ps3disk_major, DEVICE_NAME);
+}
+
+module_init(ps3disk_init);
+module_exit(ps3disk_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("PS3 Disk Storage Driver");
+MODULE_AUTHOR("Sony Corporation");
+MODULE_ALIAS(PS3_MODULE_ALIAS_STOR_DISK);

--
With kind regards,

Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe
A division of Sony Service Centre (Europe) N.V.
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium
VAT BE 0413.825.160 · RPR Brussels
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619


2007-06-15 14:36:37

by David Woodhouse

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Fri, 2007-06-15 at 13:39 +0200, Geert Uytterhoeven wrote:
> Add a Disk Storage Driver for the PS3:
> - Implemented as a block device driver with a dynamic major
> - Disk names (and partitions) are of the format ps3d%c(%u)
> - Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
> doesn't support scatter-gather

Any particular reason why this is done as a separate block device driver
rather than as SCSI?

--
dwmw2

2007-06-15 14:41:56

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Friday 15 June 2007, David Woodhouse wrote:
>
> On Fri, 2007-06-15 at 13:39 +0200, Geert Uytterhoeven wrote:
> > Add a Disk Storage Driver for the PS3:
> > ? - Implemented as a block device driver with a dynamic major
> > ? - Disk names (and partitions) are of the format ps3d%c(%u)
> > ? - Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
> > ? ? doesn't support scatter-gather
>
> Any particular reason why this is done as a separate block device driver
> rather than as SCSI?

Because the hypervisor interface is based on tranferring blocks, not based
on SCSI commands like the one for the optical storage drive.

The first version of this driver actually was based on SCSI command emulation,
and in an earlier round of reviews it was decided to rewrite it to use the
simpler block interface directly.

Arnd <><

2007-06-15 14:43:31

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Fri, 15 Jun 2007, David Woodhouse wrote:
> On Fri, 2007-06-15 at 13:39 +0200, Geert Uytterhoeven wrote:
> > Add a Disk Storage Driver for the PS3:
> > - Implemented as a block device driver with a dynamic major
> > - Disk names (and partitions) are of the format ps3d%c(%u)
> > - Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
> > doesn't support scatter-gather
>
> Any particular reason why this is done as a separate block device driver
> rather than as SCSI?

Because no new fake SCSI drivers are accepted anymore.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe
A division of Sony Service Centre (Europe) N.V.
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium
VAT BE 0413.825.160 · RPR Brussels
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619

2007-06-15 16:11:39

by Alan

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

> > Any particular reason why this is done as a separate block device driver
> > rather than as SCSI?
>
> Because no new fake SCSI drivers are accepted anymore.

Where did drivers/ata come from ;)

How about making it a fake ata driver if James is being fussy 8)

2007-06-15 18:05:43

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Fri, 15 Jun 2007, Alan Cox wrote:
> > > Any particular reason why this is done as a separate block device driver
> > > rather than as SCSI?
> >
> > Because no new fake SCSI drivers are accepted anymore.
>
> Where did drivers/ata come from ;)

I was told that one got in just before the moratorium on fake SCSI.

> How about making it a fake ata driver if James is being fussy 8)

I don't think Christoph likes fake ATA drivers neither ;-)

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe
A division of Sony Service Centre (Europe) N.V.
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium
VAT BE 0413.825.160 · RPR Brussels
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619

2007-06-15 21:17:28

by David Miller

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

From: Geert Uytterhoeven <[email protected]>
Date: Fri, 15 Jun 2007 16:43:05 +0200 (CEST)

> On Fri, 15 Jun 2007, David Woodhouse wrote:
> > On Fri, 2007-06-15 at 13:39 +0200, Geert Uytterhoeven wrote:
> > > Add a Disk Storage Driver for the PS3:
> > > - Implemented as a block device driver with a dynamic major
> > > - Disk names (and partitions) are of the format ps3d%c(%u)
> > > - Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
> > > doesn't support scatter-gather
> >
> > Any particular reason why this is done as a separate block device driver
> > rather than as SCSI?
>
> Because no new fake SCSI drivers are accepted anymore.

I'm strongly divided on this issue as I'm about to hit the same exact
thing for Sun Logical Domains on sparc64 Niagara systems.

In fact the interface I get to use allows SCSI commands to be sent
pass-through to the device, even though the basic virtual I/O API is
purely block I/O based.

It's senseless to make people build new major/minor numbers for
all these new quirky storage drivers. People have to add support
for the new major number to installers and all kinds of other
tools.

If the SCSI guys were smart, there would be a totally generic helper
layer that allows anyone to hook into the SCSI layer as a virtual SCSI
disk provider in like 10 lines of code. :-)

2007-06-15 21:19:24

by David Miller

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

From: Alan Cox <[email protected]>
Date: Fri, 15 Jun 2007 17:15:45 +0100

> > > Any particular reason why this is done as a separate block device driver
> > > rather than as SCSI?
> >
> > Because no new fake SCSI drivers are accepted anymore.
>
> Where did drivers/ata come from ;)
>
> How about making it a fake ata driver if James is being fussy 8)

That sounds like a good idea for my virtual I/O case on
Niagara too actually :-)

Another quirk I have to deal with is that under LDOMs you
can export full disks and also just slices. So I'll have
to get down into the partition machinery to support that
somehow.

2007-06-15 21:29:29

by Jeff Garzik

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

David Miller wrote:
> From: Geert Uytterhoeven <[email protected]>

>> Because no new fake SCSI drivers are accepted anymore.

If you can access the firmware/hypervisor code, have that provide SCSI.
Then the Linux driver piece is easy, obvious and flexible SCSI, and is
not "fake."


> If the SCSI guys were smart, there would be a totally generic helper
> layer that allows anyone to hook into the SCSI layer as a virtual SCSI
> disk provider in like 10 lines of code. :-)

The SCSI target code gives you the capability to do that (the SCSI
transport part, the virtual storage part is easy from there).

Jeff


2007-06-15 21:38:01

by David Miller

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

From: Jeff Garzik <[email protected]>
Date: Fri, 15 Jun 2007 17:28:31 -0400

> David Miller wrote:
> > If the SCSI guys were smart, there would be a totally generic helper
> > layer that allows anyone to hook into the SCSI layer as a virtual SCSI
> > disk provider in like 10 lines of code. :-)
>
> The SCSI target code gives you the capability to do that (the SCSI
> transport part, the virtual storage part is easy from there).

I don't get to control the server side in my case, I'm a client
talking to something with a pre-determined interface.

2007-06-15 21:51:27

by Jeff Garzik

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

David Miller wrote:
> It's senseless to make people build new major/minor numbers for
> all these new quirky storage drivers. People have to add support
> for the new major number to installers and all kinds of other
> tools.


FWIW that's why libata followed the path it did: faking SCSI required
the least amount of code for the maximal amount of existing storage
driver and platform installer support.

Well, that and the fact that ATAPI is really SCSI, but that's offtopic
for this thread. :)

Jeff


2007-06-15 21:55:52

by David Miller

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

From: Jeff Garzik <[email protected]>
Date: Fri, 15 Jun 2007 17:50:50 -0400

> FWIW that's why libata followed the path it did: faking SCSI required
> the least amount of code for the maximal amount of existing storage
> driver and platform installer support.
>
> Well, that and the fact that ATAPI is really SCSI, but that's offtopic
> for this thread. :)

Yep I absolutely understand that, and for my case as well it's
nearly always SCSI underneath that's why you can send SCSI
commands passthru to the virtual disk server if you like.

2007-06-15 22:41:16

by James Bottomley

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Fri, 2007-06-15 at 14:19 -0700, David Miller wrote:
> From: Alan Cox <[email protected]>
> Date: Fri, 15 Jun 2007 17:15:45 +0100
>
> > > > Any particular reason why this is done as a separate block device driver
> > > > rather than as SCSI?
> > >
> > > Because no new fake SCSI drivers are accepted anymore.
> >
> > Where did drivers/ata come from ;)
> >
> > How about making it a fake ata driver if James is being fussy 8)
>
> That sounds like a good idea for my virtual I/O case on
> Niagara too actually :-)

I have no objections to VIO servers speaking SCSI ... if you want to do
that, I suggest you look at an existing example, like
drivers/scsi/ibmvscsi (the actual communication piece is very ppc
hypervisor specific, though).

However, I do strongly believe that using SCSI for devices which truly
aren't SCSI (i.e. you have to code a shim in the driver between the SCSI
commands and whatever your device speaks) is wrong ... if SCSI has
features that the generic block layer doesn't, then we need to move the
function from SCSI to block to give you access ... in theory this was
the rationale for having libata as the last "fake" SCSI driver, so we
could identify this functionality and begin to move it.

> Another quirk I have to deal with is that under LDOMs you
> can export full disks and also just slices. So I'll have
> to get down into the partition machinery to support that
> somehow.

For this, it sounds like you might find nbd a more enticing
proposition ... it already is partition independent and is basically a
block to net socket exporter.

James


2007-06-15 23:06:32

by Alan

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

> That sounds like a good idea for my virtual I/O case on
> Niagara too actually :-)

It was actually meant semi-seriously in that drivers/ata sits on top of
the abstract parts of drivers/scsi which James keeps asking for someone to
get split properly off and which would sort all these out.

> Another quirk I have to deal with is that under LDOMs you
> can export full disks and also just slices. So I'll have
> to get down into the partition machinery to support that
> somehow.

If your slices don't fit the PC worldview you might want to look at
dmraid and just use device mapper to handle them. It is a lot more
flexible and we could actually bin all our partition code and use this
but for back compatibility.

Alan

2007-06-15 23:08:49

by David Miller

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

From: James Bottomley <[email protected]>
Date: Fri, 15 Jun 2007 15:40:42 -0700

> On Fri, 2007-06-15 at 14:19 -0700, David Miller wrote:
> > Another quirk I have to deal with is that under LDOMs you
> > can export full disks and also just slices. So I'll have
> > to get down into the partition machinery to support that
> > somehow.
>
> For this, it sounds like you might find nbd a more enticing
> proposition ... it already is partition independent and is basically a
> block to net socket exporter.

That's not gonna work, it's a totally different model.

I have a predefined protocol over hypervisor provided "channels" and
page flipping also done by the hypervisor for the bulk data transfer.
For the client side I cannot change the hypervisor nor the server
speaking on the other end. And when I do write a server I do want
it to be able to speak to all of the existing clients.

There's SCSI command pass through as well, as I keep mentioning as
it's an important reason I don't want to go with any of the non-SCSI
solutions (other than perhaps ATA) being suggested.

2007-06-15 23:28:25

by James Bottomley

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Fri, 2007-06-15 at 16:08 -0700, David Miller wrote:
> From: James Bottomley <[email protected]>
> Date: Fri, 15 Jun 2007 15:40:42 -0700
>
> > On Fri, 2007-06-15 at 14:19 -0700, David Miller wrote:
> > > Another quirk I have to deal with is that under LDOMs you
> > > can export full disks and also just slices. So I'll have
> > > to get down into the partition machinery to support that
> > > somehow.
> >
> > For this, it sounds like you might find nbd a more enticing
> > proposition ... it already is partition independent and is basically a
> > block to net socket exporter.
>
> That's not gonna work, it's a totally different model.
>
> I have a predefined protocol over hypervisor provided "channels" and
> page flipping also done by the hypervisor for the bulk data transfer.
> For the client side I cannot change the hypervisor nor the server
> speaking on the other end. And when I do write a server I do want
> it to be able to speak to all of the existing clients.
>
> There's SCSI command pass through as well, as I keep mentioning as
> it's an important reason I don't want to go with any of the non-SCSI
> solutions (other than perhaps ATA) being suggested.

Then sure, use SCSI ... the ibmvscsi client originally talked to some
type of hypervisor interface too before IBM extracted it and open
sourced the server. If actual SCSI commands are going in somewhere ...
be it a real device, a RAID firmware emulation or a hypervisor input,
then I'm happy with the driver being in SCSI.

James


2007-06-15 23:37:17

by David Miller

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

From: James Bottomley <[email protected]>
Date: Fri, 15 Jun 2007 16:28:03 -0700

> On Fri, 2007-06-15 at 16:08 -0700, David Miller wrote:
> > From: James Bottomley <[email protected]>
> > Date: Fri, 15 Jun 2007 15:40:42 -0700
> >
> > > On Fri, 2007-06-15 at 14:19 -0700, David Miller wrote:
> > > > Another quirk I have to deal with is that under LDOMs you
> > > > can export full disks and also just slices. So I'll have
> > > > to get down into the partition machinery to support that
> > > > somehow.
> > >
> > > For this, it sounds like you might find nbd a more enticing
> > > proposition ... it already is partition independent and is basically a
> > > block to net socket exporter.
> >
> > That's not gonna work, it's a totally different model.
> >
> > I have a predefined protocol over hypervisor provided "channels" and
> > page flipping also done by the hypervisor for the bulk data transfer.
> > For the client side I cannot change the hypervisor nor the server
> > speaking on the other end. And when I do write a server I do want
> > it to be able to speak to all of the existing clients.
> >
> > There's SCSI command pass through as well, as I keep mentioning as
> > it's an important reason I don't want to go with any of the non-SCSI
> > solutions (other than perhaps ATA) being suggested.
>
> Then sure, use SCSI ... the ibmvscsi client originally talked to some
> type of hypervisor interface too before IBM extracted it and open
> sourced the server. If actual SCSI commands are going in somewhere ...
> be it a real device, a RAID firmware emulation or a hypervisor input,
> then I'm happy with the driver being in SCSI.

For normal block I/O it's just raw copies over the provided protocol.

But the service exports a service by which raw SCSI commands can be
sent, for things like disk fault probing and stuff like that. It's
not for block I/O, it's for "all the funny stuff" scsi comands are
used for outside of actual data transfers.

2007-06-16 06:22:08

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Fri, 15 Jun 2007, David Miller wrote:
> From: Alan Cox <[email protected]>
> Date: Fri, 15 Jun 2007 17:15:45 +0100
>
> > > > Any particular reason why this is done as a separate block device driver
> > > > rather than as SCSI?
> > >
> > > Because no new fake SCSI drivers are accepted anymore.
> >
> > Where did drivers/ata come from ;)
> >
> > How about making it a fake ata driver if James is being fussy 8)
>
> That sounds like a good idea for my virtual I/O case on
> Niagara too actually :-)
>
> Another quirk I have to deal with is that under LDOMs you
> can export full disks and also just slices. So I'll have
> to get down into the partition machinery to support that
> somehow.

The PS3 hypervisor also splits each device in regions (cfr. partitions), with
different access rights depending the OS that runs on top of it. The old
virtual SCSI driver treated each region as a separate LUN.

As currently only one region can be accessed by Linux anyway, I didn't bother
supporting multiple accessible regions in the new storage drivers.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe
A division of Sony Service Centre (Europe) N.V.
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium
VAT BE 0413.825.160 · RPR Brussels
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619

2007-06-19 05:44:17

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Fri, Jun 15, 2007 at 01:39:23PM +0200, Geert Uytterhoeven wrote:
> From: Geert Uytterhoeven <[email protected]>
>
> Add a Disk Storage Driver for the PS3:
> - Implemented as a block device driver with a dynamic major
> - Disk names (and partitions) are of the format ps3d%c(%u)
> - Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
> doesn't support scatter-gather

Looks good to me. Only nitpicks are:

- ps3disk_priv should probably be an inline function instead of a macro
- ps3disk_open is not needed at all and should be removed completely.
- please remove PS3DISK_MAJOR - and just pass 0 to register_blkdev.
Passing 0 to get an unassigned dev_t range is the normal convention
and giving it a suprising symbolic names makes reading the code a
little harder.
- put_disk in ps3disk_remove should be done after you finished tearing
down the device, not before
- calling free_irq directly followed by request_irq in ps3disk_probe
looks rather dumb. I'd rather move the request_irq out of
ps3stor_setup into the low-level driver.

2007-06-19 05:54:43

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Fri, Jun 15, 2007 at 02:19:17PM -0700, David Miller wrote:
> Another quirk I have to deal with is that under LDOMs you
> can export full disks and also just slices. So I'll have
> to get down into the partition machinery to support that
> somehow.

What's the problem with that? Partition machinery is perfectly
fine with just getting a raw disk without a partition table,
and it obviously doesn't care if the raw disk it sees is part
of a partition on the storage server.

2007-06-19 05:57:18

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Fri, Jun 15, 2007 at 04:08:58PM -0700, David Miller wrote:
> That's not gonna work, it's a totally different model.
>
> I have a predefined protocol over hypervisor provided "channels" and
> page flipping also done by the hypervisor for the bulk data transfer.
> For the client side I cannot change the hypervisor nor the server
> speaking on the other end. And when I do write a server I do want
> it to be able to speak to all of the existing clients.
>
> There's SCSI command pass through as well, as I keep mentioning as
> it's an important reason I don't want to go with any of the non-SCSI
> solutions (other than perhaps ATA) being suggested.

A SCSI pass through is of course perfectly fine. If you have a separate
block passthrough that has additional magic a separate block driver is
the way to go because it actually is simpler than a scsi driver decoding
command blocks and translating them to deep magic. The ps3 storage
drivers this thread discussed are a good example for that. We now
have a very nice and simple disk, scsi and flash chardev driver each
that don't include abstractions layers and cruft. Combine that with
their initial scsi layer driver that was full of internal dispatches
because each of these device types speaks a completely different command
set.

2007-06-19 06:03:49

by David Miller

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

From: Christoph Hellwig <[email protected]>
Date: Tue, 19 Jun 2007 07:53:58 +0200

> On Fri, Jun 15, 2007 at 02:19:17PM -0700, David Miller wrote:
> > Another quirk I have to deal with is that under LDOMs you
> > can export full disks and also just slices. So I'll have
> > to get down into the partition machinery to support that
> > somehow.
>
> What's the problem with that? Partition machinery is perfectly
> fine with just getting a raw disk without a partition table,
> and it obviously doesn't care if the raw disk it sees is part
> of a partition on the storage server.

Applications care :-)

One way of managing the partition table is by asking the storage
server for it. It supports giving back something that looks
like the Sun partition table layout, and also something that's
EFI based.

But the client can ignore that and read/write the partition
table itself.

What I really care about is what will work transparently for existing
userspace. In particular, distribution installers and existing tools
like fdisk.

When a slice it being exported, it's not being exported like that so
that the client can just spam a disk label into it, it's for exporting
one slice. So we might want (using the scsi naming as an arbitrary
example case) /dev/sda1 to be created but not /dev/sda

How exactly do you envision this kind of thing working?

Supporting the disk-server-provided partition table management
means providing ioctls() or whatever interface to get/set
the table. I had one idea to intercept reads/writes to the
first block(s) and translating those into partition management
commands to the disk server, but that is just too ugly to live. :-)

2007-06-19 06:07:26

by David Miller

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

From: Christoph Hellwig <[email protected]>
Date: Tue, 19 Jun 2007 07:56:29 +0200

> A SCSI pass through is of course perfectly fine. If you have a separate
> block passthrough that has additional magic a separate block driver is
> the way to go because it actually is simpler than a scsi driver decoding
> command blocks and translating them to deep magic. The ps3 storage
> drivers this thread discussed are a good example for that. We now
> have a very nice and simple disk, scsi and flash chardev driver each
> that don't include abstractions layers and cruft. Combine that with
> their initial scsi layer driver that was full of internal dispatches
> because each of these device types speaks a completely different command
> set.

That's how I'm currently writing my virtual disk client driver
for the Sun LDOMS stuff, as a block device.

The remaining issues are the partitioning (which we're discussing in
another thread) and how to export the scsi passthru support in such a
non-scsi block driver.

The main disk I/O block read and write is done using descriptors
sent to the disk server. SCSI pass-through is provided (optionally)
so that disk analysis tools can do things like MODE_SENSE on the
disk.

2007-06-19 07:06:59

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Mon, 18 Jun 2007, David Miller wrote:
> What I really care about is what will work transparently for existing
> userspace. In particular, distribution installers and existing tools
> like fdisk.
>
> When a slice it being exported, it's not being exported like that so
> that the client can just spam a disk label into it, it's for exporting
> one slice. So we might want (using the scsi naming as an arbitrary
> example case) /dev/sda1 to be created but not /dev/sda
>
> How exactly do you envision this kind of thing working?

How do distribution installers handle USB disks and memory cards? Some tend
to be partitioned, some don't.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe
A division of Sony Service Centre (Europe) N.V.
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium
VAT BE 0413.825.160 · RPR Brussels
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619

2007-06-19 08:16:39

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Mon, Jun 18, 2007 at 11:07:31PM -0700, David Miller wrote:
> The main disk I/O block read and write is done using descriptors
> sent to the disk server. SCSI pass-through is provided (optionally)
> so that disk analysis tools can do things like MODE_SENSE on the
> disk.

SG_IO can easily be implemented for block devices. See cciss in
current mainline for an example.

2007-06-19 12:51:37

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Tue, 19 Jun 2007, Christoph Hellwig wrote:
> On Fri, Jun 15, 2007 at 01:39:23PM +0200, Geert Uytterhoeven wrote:
> > From: Geert Uytterhoeven <[email protected]>
> >
> > Add a Disk Storage Driver for the PS3:
> > - Implemented as a block device driver with a dynamic major
> > - Disk names (and partitions) are of the format ps3d%c(%u)
> > - Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
> > doesn't support scatter-gather
>
> Looks good to me. Only nitpicks are:
>
> - ps3disk_priv should probably be an inline function instead of a macro

I used a macro because you can do

ps3disk_pri(dev) = ...;

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone: +32 (0)2 700 8453
Fax: +32 (0)2 700 8622
E-mail: [email protected]
Internet: http://www.sony-europe.com/

Sony Network and Software Technology Center Europe
A division of Sony Service Centre (Europe) N.V.
Registered office: Technologielaan 7 · B-1840 Londerzeel · Belgium
VAT BE 0413.825.160 · RPR Brussels
Fortis Bank Zaventem · Swift GEBABEBB08A · IBAN BE39001382358619

2007-06-19 17:20:27

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 4/6] ps3: Disk Storage Driver

On Tue, Jun 19, 2007 at 02:51:25PM +0200, Geert Uytterhoeven wrote:
> On Tue, 19 Jun 2007, Christoph Hellwig wrote:
> > On Fri, Jun 15, 2007 at 01:39:23PM +0200, Geert Uytterhoeven wrote:
> > > From: Geert Uytterhoeven <[email protected]>
> > >
> > > Add a Disk Storage Driver for the PS3:
> > > - Implemented as a block device driver with a dynamic major
> > > - Disk names (and partitions) are of the format ps3d%c(%u)
> > > - Uses software scatter-gather with a 64 KiB bounce buffer as the hypervisor
> > > doesn't support scatter-gather
> >
> > Looks good to me. Only nitpicks are:
> >
> > - ps3disk_priv should probably be an inline function instead of a macro
>
> I used a macro because you can do
>
> ps3disk_pri(dev) = ...;

I'm not exactly a fan of macros used as lvalues, but if you really
want this it can go in.