2010-11-02 16:21:28

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 0/4] FLUSH/FUA updates for Xen blkfront

From: Jeremy Fitzhardinge <[email protected]>

Update the Xen blockfront driver to deal with the block layer's cache
flush and FUA operations.

The only primitive we have for implementing these is
BLKIF_OP_WRITE_BARRIER, which is a fully ordered write operation.
This is much stronger than a simple cache flush which Linux requires,
but it will do the job.

It can also implement FUA, since it will guarantee that the data is
written to stable storage when it completes - however it also
implements full ordering, which makes it equivalent to a FLUSH+FUA.

Unfortunately it appears that the actual Xen backend implementation
will fail an empty BLKIF_OP_WRITE_BARRIER operation. If that happens,
disable use of WRITE_BARRIER. (Daniel, I don't know if this is
deliberate or not; I couldn't see where the error was coming from).

Jeremy Fitzhardinge (4):
xen/blkfront: map REQ_FLUSH into a full barrier
xen/blkfront: change blk_shadow.request to proper pointer
xen/blkfront: Implement FUA with BLKIF_OP_WRITE_BARRIER
xen/blkfront: cope with backend that fail empty
BLKIF_OP_WRITE_BARRIER requests

drivers/block/xen-blkfront.c | 53 +++++++++++++++++++++++------------------
1 files changed, 30 insertions(+), 23 deletions(-)

--
1.7.2.3


2010-11-02 16:21:39

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 1/4] xen/blkfront: map REQ_FLUSH into a full barrier

From: Jeremy Fitzhardinge <[email protected]>

Implement a flush as a full barrier, since we have nothing weaker.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/block/xen-blkfront.c | 19 +++++--------------
1 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 06e2812..3a318d8 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -245,14 +245,11 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
}

/*
- * blkif_queue_request
+ * Generate a Xen blkfront IO request from a blk layer request. Reads
+ * and writes are handled as expected. Since we lack a loose flush
+ * request, we map flushes into a full ordered barrier.
*
- * request block io
- *
- * id: for guest use only.
- * operation: BLKIF_OP_{READ,WRITE,PROBE}
- * buffer: buffer to read/write into. this should be a
- * virtual address in the guest os.
+ * @req: a request struct
*/
static int blkif_queue_request(struct request *req)
{
@@ -289,7 +286,7 @@ static int blkif_queue_request(struct request *req)

ring_req->operation = rq_data_dir(req) ?
BLKIF_OP_WRITE : BLKIF_OP_READ;
- if (req->cmd_flags & REQ_HARDBARRIER)
+ if (req->cmd_flags & REQ_FLUSH)
ring_req->operation = BLKIF_OP_WRITE_BARRIER;

ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
@@ -1069,14 +1066,8 @@ static void blkfront_connect(struct blkfront_info *info)
*/
info->feature_flush = 0;

- /*
- * The driver doesn't properly handled empty flushes, so
- * lets disable barrier support for now.
- */
-#if 0
if (!err && barrier)
info->feature_flush = REQ_FLUSH;
-#endif

err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size);
if (err) {
--
1.7.2.3

2010-11-02 16:21:58

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 3/4] xen/blkfront: Implement FUA with BLKIF_OP_WRITE_BARRIER

From: Jeremy Fitzhardinge <[email protected]>

The BLKIF_OP_WRITE_BARRIER is a full ordered barrier, so we can use it
to implement FUA as well as a plain FLUSH.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
---
drivers/block/xen-blkfront.c | 14 ++++++++++++--
1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 31c8a64..76b874a 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -286,8 +286,18 @@ static int blkif_queue_request(struct request *req)

ring_req->operation = rq_data_dir(req) ?
BLKIF_OP_WRITE : BLKIF_OP_READ;
- if (req->cmd_flags & REQ_FLUSH)
+
+ if (req->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
+ /*
+ * Ideally we could just do an unordered
+ * flush-to-disk, but all we have is a full write
+ * barrier at the moment. However, a barrier write is
+ * a superset of FUA, so we can implement it the same
+ * way. (It's also a FLUSH+FUA, since it is
+ * guaranteed ordered WRT previous writes.)
+ */
ring_req->operation = BLKIF_OP_WRITE_BARRIER;
+ }

ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
@@ -1065,7 +1075,7 @@ static void blkfront_connect(struct blkfront_info *info)
info->feature_flush = 0;

if (!err && barrier)
- info->feature_flush = REQ_FLUSH;
+ info->feature_flush = REQ_FLUSH | REQ_FUA;

err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size);
if (err) {
--
1.7.2.3

2010-11-02 16:21:46

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 4/4] xen/blkfront: cope with backend that fail empty BLKIF_OP_WRITE_BARRIER requests

From: Jeremy Fitzhardinge <[email protected]>

Some(?) Xen block backends fail BLKIF_OP_WRITE_BARRIER requests, which
Linux uses as a cache flush operation. In that case, disable use
of FLUSH.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Cc: Daniel Stodden <[email protected]>
---
drivers/block/xen-blkfront.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 76b874a..5e5a622 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -656,6 +656,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
printk(KERN_WARNING "blkfront: %s: write barrier op failed\n",
info->gd->disk_name);
error = -EOPNOTSUPP;
+ }
+ if (unlikely(bret->status == BLKIF_RSP_ERROR &&
+ info->shadow[id].req.nr_segments == 0)) {
+ printk(KERN_WARNING "blkfront: %s: empty write barrier op failed\n",
+ info->gd->disk_name);
+ error = -EOPNOTSUPP;
+ }
+ if (unlikely(error)) {
info->feature_flush = 0;
xlvbd_flush(info);
}
--
1.7.2.3

2010-11-02 16:22:06

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 2/4] xen/blkfront: change blk_shadow.request to proper pointer

From: Jeremy Fitzhardinge <[email protected]>

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/block/xen-blkfront.c | 14 ++++++--------
1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 3a318d8..31c8a64 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -65,7 +65,7 @@ enum blkif_state {

struct blk_shadow {
struct blkif_request req;
- unsigned long request;
+ struct request *request;
unsigned long frame[BLKIF_MAX_SEGMENTS_PER_REQUEST];
};

@@ -136,7 +136,7 @@ static void add_id_to_freelist(struct blkfront_info *info,
unsigned long id)
{
info->shadow[id].req.id = info->shadow_free;
- info->shadow[id].request = 0;
+ info->shadow[id].request = NULL;
info->shadow_free = id;
}

@@ -278,7 +278,7 @@ static int blkif_queue_request(struct request *req)
/* Fill out a communications ring structure. */
ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
id = get_id_from_freelist(info);
- info->shadow[id].request = (unsigned long)req;
+ info->shadow[id].request = req;

ring_req->id = id;
ring_req->sector_number = (blkif_sector_t)blk_rq_pos(req);
@@ -633,7 +633,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)

bret = RING_GET_RESPONSE(&info->ring, i);
id = bret->id;
- req = (struct request *)info->shadow[id].request;
+ req = info->shadow[id].request;

blkif_completion(&info->shadow[id]);

@@ -898,7 +898,7 @@ static int blkif_recover(struct blkfront_info *info)
/* Stage 3: Find pending requests and requeue them. */
for (i = 0; i < BLK_RING_SIZE; i++) {
/* Not in use? */
- if (copy[i].request == 0)
+ if (!copy[i].request)
continue;

/* Grab a request slot and copy shadow state into it. */
@@ -915,9 +915,7 @@ static int blkif_recover(struct blkfront_info *info)
req->seg[j].gref,
info->xbdev->otherend_id,
pfn_to_mfn(info->shadow[req->id].frame[j]),
- rq_data_dir(
- (struct request *)
- info->shadow[req->id].request));
+ rq_data_dir(info->shadow[req->id].request));
info->shadow[req->id].req = *req;

info->ring.req_prod_pvt++;
--
1.7.2.3

2010-11-02 16:55:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 4/4] xen/blkfront: cope with backend that fail empty BLKIF_OP_WRITE_BARRIER requests

> info->gd->disk_name);
> error = -EOPNOTSUPP;
> + }
> + if (unlikely(bret->status == BLKIF_RSP_ERROR &&
> + info->shadow[id].req.nr_segments == 0)) {
> + printk(KERN_WARNING "blkfront: %s: empty write barrier op failed\n",
> + info->gd->disk_name);
> + error = -EOPNOTSUPP;
> + }

We don't use -EOPNOTSUPP anymore in the new world order, anything
barrier related is just a normal I/O error now.