2021-07-30 10:39:59

by Juergen Gross

[permalink] [raw]
Subject: [PATCH v3 0/3] xen: harden blkfront against malicious backends

Xen backends of para-virtualized devices can live in dom0 kernel, dom0
user land, or in a driver domain. This means that a backend might
reside in a less trusted environment than the Xen core components, so
a backend should not be able to do harm to a Xen guest (it can still
mess up I/O data, but it shouldn't be able to e.g. crash a guest by
other means or cause a privilege escalation in the guest).

Unfortunately blkfront in the Linux kernel is fully trusting its
backend. This series is fixing blkfront in this regard.

It was discussed to handle this as a security problem, but the topic
was discussed in public before, so it isn't a real secret.

It should be mentioned that a similar series has been posted some years
ago by Marek Marczykowski-Górecki, but this series has not been applied
due to a Xen header not having been available in the Xen git repo at
that time. Additionally my series is fixing some more DoS cases.

Changes in V3:
- patch 3: insert missing unlock in error case (kernel test robot)
- patch 3: use %#x as format for printing wrong operation value
(Roger Pau Monné)

Changes in V2:
- put blkfront patches into own series
- some minor comments addressed

Juergen Gross (3):
xen/blkfront: read response from backend only once
xen/blkfront: don't take local copy of a request from the ring page
xen/blkfront: don't trust the backend response data blindly

drivers/block/xen-blkfront.c | 126 +++++++++++++++++++++++------------
1 file changed, 84 insertions(+), 42 deletions(-)

--
2.26.2



2021-07-30 10:39:59

by Juergen Gross

[permalink] [raw]
Subject: [PATCH v3 1/3] xen/blkfront: read response from backend only once

In order to avoid problems in case the backend is modifying a response
on the ring page while the frontend has already seen it, just read the
response into a local buffer in one go and then operate on that buffer
only.

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
---
drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index d83fee21f6c5..15e840287734 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1496,7 +1496,7 @@ static bool blkif_completion(unsigned long *id,
static irqreturn_t blkif_interrupt(int irq, void *dev_id)
{
struct request *req;
- struct blkif_response *bret;
+ struct blkif_response bret;
RING_IDX i, rp;
unsigned long flags;
struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
@@ -1513,8 +1513,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
for (i = rinfo->ring.rsp_cons; i != rp; i++) {
unsigned long id;

- bret = RING_GET_RESPONSE(&rinfo->ring, i);
- id = bret->id;
+ RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
+ id = bret.id;
+
/*
* The backend has messed up and given us an id that we would
* never have given to it (we stamp it up to BLK_RING_SIZE -
@@ -1522,39 +1523,39 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
*/
if (id >= BLK_RING_SIZE(info)) {
WARN(1, "%s: response to %s has incorrect id (%ld)\n",
- info->gd->disk_name, op_name(bret->operation), id);
+ info->gd->disk_name, op_name(bret.operation), id);
/* We can't safely get the 'struct request' as
* the id is busted. */
continue;
}
req = rinfo->shadow[id].request;

- if (bret->operation != BLKIF_OP_DISCARD) {
+ if (bret.operation != BLKIF_OP_DISCARD) {
/*
* We may need to wait for an extra response if the
* I/O request is split in 2
*/
- if (!blkif_completion(&id, rinfo, bret))
+ if (!blkif_completion(&id, rinfo, &bret))
continue;
}

if (add_id_to_freelist(rinfo, id)) {
WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
- info->gd->disk_name, op_name(bret->operation), id);
+ info->gd->disk_name, op_name(bret.operation), id);
continue;
}

- if (bret->status == BLKIF_RSP_OKAY)
+ if (bret.status == BLKIF_RSP_OKAY)
blkif_req(req)->error = BLK_STS_OK;
else
blkif_req(req)->error = BLK_STS_IOERR;

- switch (bret->operation) {
+ switch (bret.operation) {
case BLKIF_OP_DISCARD:
- if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+ if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
info->feature_discard = 0;
info->feature_secdiscard = 0;
@@ -1564,15 +1565,15 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
break;
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
- if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+ if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
- if (unlikely(bret->status == BLKIF_RSP_ERROR &&
+ if (unlikely(bret.status == BLKIF_RSP_ERROR &&
rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
if (unlikely(blkif_req(req)->error)) {
@@ -1585,9 +1586,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
fallthrough;
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
- if (unlikely(bret->status != BLKIF_RSP_OKAY))
+ if (unlikely(bret.status != BLKIF_RSP_OKAY))
dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
- "request: %x\n", bret->status);
+ "request: %x\n", bret.status);

break;
default:
--
2.26.2


2021-07-30 10:41:12

by Juergen Gross

[permalink] [raw]
Subject: [PATCH v3 3/3] xen/blkfront: don't trust the backend response data blindly

Today blkfront will trust the backend to send only sane response data.
In order to avoid privilege escalations or crashes in case of malicious
backends verify the data to be within expected limits. Especially make
sure that the response always references an outstanding request.

Introduce a new state of the ring BLKIF_STATE_ERROR which will be
switched to in case an inconsistency is being detected. Recovering from
this state is possible only via removing and adding the virtual device
again (e.g. via a suspend/resume cycle).

Make all warning messages issued due to valid error responses rate
limited in order to avoid message floods being triggered by a malicious
backend.

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
---
V2:
- use READ_ONCE() for reading the producer index
- check validity of producer index only after memory barrier (Jan Beulich)
- use virt_rmb() as barrier (Jan Beulich)
V3:
- insert missing unlock in error case (kernel test robot)
- use %#x as format for printing wrong operation value (Roger Pau Monné)
---
drivers/block/xen-blkfront.c | 70 +++++++++++++++++++++++++++---------
1 file changed, 53 insertions(+), 17 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b7301006fb28..7a5e62fae171 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -80,6 +80,7 @@ enum blkif_state {
BLKIF_STATE_DISCONNECTED,
BLKIF_STATE_CONNECTED,
BLKIF_STATE_SUSPENDED,
+ BLKIF_STATE_ERROR,
};

struct grant {
@@ -89,6 +90,7 @@ struct grant {
};

enum blk_req_status {
+ REQ_PROCESSING,
REQ_WAITING,
REQ_DONE,
REQ_ERROR,
@@ -530,7 +532,7 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,

id = get_id_from_freelist(rinfo);
rinfo->shadow[id].request = req;
- rinfo->shadow[id].status = REQ_WAITING;
+ rinfo->shadow[id].status = REQ_PROCESSING;
rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;

rinfo->shadow[id].req.u.rw.id = id;
@@ -559,6 +561,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf

/* Copy the request to the ring page. */
*final_ring_req = *ring_req;
+ rinfo->shadow[id].status = REQ_WAITING;

return 0;
}
@@ -834,8 +837,11 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri

/* Copy request(s) to the ring page. */
*final_ring_req = *ring_req;
- if (unlikely(require_extra_req))
+ rinfo->shadow[id].status = REQ_WAITING;
+ if (unlikely(require_extra_req)) {
*final_extra_ring_req = *extra_ring_req;
+ rinfo->shadow[extra_id].status = REQ_WAITING;
+ }

if (new_persistent_gnts)
gnttab_free_grant_references(setup.gref_head);
@@ -1359,8 +1365,8 @@ static enum blk_req_status blkif_rsp_to_req_status(int rsp)
static int blkif_get_final_status(enum blk_req_status s1,
enum blk_req_status s2)
{
- BUG_ON(s1 == REQ_WAITING);
- BUG_ON(s2 == REQ_WAITING);
+ BUG_ON(s1 < REQ_DONE);
+ BUG_ON(s2 < REQ_DONE);

if (s1 == REQ_ERROR || s2 == REQ_ERROR)
return BLKIF_RSP_ERROR;
@@ -1393,7 +1399,7 @@ static bool blkif_completion(unsigned long *id,
s->status = blkif_rsp_to_req_status(bret->status);

/* Wait the second response if not yet here. */
- if (s2->status == REQ_WAITING)
+ if (s2->status < REQ_DONE)
return false;

bret->status = blkif_get_final_status(s->status,
@@ -1512,11 +1518,17 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)

spin_lock_irqsave(&rinfo->ring_lock, flags);
again:
- rp = rinfo->ring.sring->rsp_prod;
- rmb(); /* Ensure we see queued responses up to 'rp'. */
+ rp = READ_ONCE(rinfo->ring.sring->rsp_prod);
+ virt_rmb(); /* Ensure we see queued responses up to 'rp'. */
+ if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
+ pr_alert("%s: illegal number of responses %u\n",
+ info->gd->disk_name, rp - rinfo->ring.rsp_cons);
+ goto err;
+ }

for (i = rinfo->ring.rsp_cons; i != rp; i++) {
unsigned long id;
+ unsigned int op;

RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
id = bret.id;
@@ -1527,14 +1539,28 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
* look in get_id_from_freelist.
*/
if (id >= BLK_RING_SIZE(info)) {
- WARN(1, "%s: response to %s has incorrect id (%ld)\n",
- info->gd->disk_name, op_name(bret.operation), id);
- /* We can't safely get the 'struct request' as
- * the id is busted. */
- continue;
+ pr_alert("%s: response has incorrect id (%ld)\n",
+ info->gd->disk_name, id);
+ goto err;
+ }
+ if (rinfo->shadow[id].status != REQ_WAITING) {
+ pr_alert("%s: response references no pending request\n",
+ info->gd->disk_name);
+ goto err;
}
+
+ rinfo->shadow[id].status = REQ_PROCESSING;
req = rinfo->shadow[id].request;

+ op = rinfo->shadow[id].req.operation;
+ if (op == BLKIF_OP_INDIRECT)
+ op = rinfo->shadow[id].req.u.indirect.indirect_op;
+ if (bret.operation != op) {
+ pr_alert("%s: response has wrong operation (%u instead of %u)\n",
+ info->gd->disk_name, bret.operation, op);
+ goto err;
+ }
+
if (bret.operation != BLKIF_OP_DISCARD) {
/*
* We may need to wait for an extra response if the
@@ -1559,7 +1585,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
case BLKIF_OP_DISCARD:
if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
- printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+
+ pr_warn_ratelimited("blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
info->feature_discard = 0;
@@ -1571,13 +1598,13 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
- printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+ pr_warn_ratelimited("blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
if (unlikely(bret.status == BLKIF_RSP_ERROR &&
rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
- printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
+ pr_warn_ratelimited("blkfront: %s: empty %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
@@ -1592,8 +1619,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
if (unlikely(bret.status != BLKIF_RSP_OKAY))
- dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
- "request: %x\n", bret.status);
+ dev_dbg_ratelimited(&info->xbdev->dev,
+ "Bad return from blkdev data request: %#x\n",
+ bret.status);

break;
default:
@@ -1619,6 +1647,14 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
spin_unlock_irqrestore(&rinfo->ring_lock, flags);

return IRQ_HANDLED;
+
+ err:
+ info->connected = BLKIF_STATE_ERROR;
+
+ spin_unlock_irqrestore(&rinfo->ring_lock, flags);
+
+ pr_alert("%s disabled for further use\n", info->gd->disk_name);
+ return IRQ_HANDLED;
}


--
2.26.2


2021-07-30 10:42:33

by Juergen Gross

[permalink] [raw]
Subject: [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page

In order to avoid a malicious backend being able to influence the local
copy of a request build the request locally first and then copy it to
the ring page instead of doing it the other way round as today.

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
---
V2:
- init variable to avoid potential compiler warning (Jan Beulich)
---
drivers/block/xen-blkfront.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 15e840287734..b7301006fb28 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -533,7 +533,7 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
rinfo->shadow[id].status = REQ_WAITING;
rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;

- (*ring_req)->u.rw.id = id;
+ rinfo->shadow[id].req.u.rw.id = id;

return id;
}
@@ -541,11 +541,12 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
{
struct blkfront_info *info = rinfo->dev_info;
- struct blkif_request *ring_req;
+ struct blkif_request *ring_req, *final_ring_req;
unsigned long id;

/* Fill out a communications ring structure. */
- id = blkif_ring_get_request(rinfo, req, &ring_req);
+ id = blkif_ring_get_request(rinfo, req, &final_ring_req);
+ ring_req = &rinfo->shadow[id].req;

ring_req->operation = BLKIF_OP_DISCARD;
ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -556,8 +557,8 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
else
ring_req->u.discard.flag = 0;

- /* Keep a private copy so we can reissue requests when recovering. */
- rinfo->shadow[id].req = *ring_req;
+ /* Copy the request to the ring page. */
+ *final_ring_req = *ring_req;

return 0;
}
@@ -690,6 +691,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
{
struct blkfront_info *info = rinfo->dev_info;
struct blkif_request *ring_req, *extra_ring_req = NULL;
+ struct blkif_request *final_ring_req, *final_extra_ring_req = NULL;
unsigned long id, extra_id = NO_ASSOCIATED_ID;
bool require_extra_req = false;
int i;
@@ -734,7 +736,8 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
}

/* Fill out a communications ring structure. */
- id = blkif_ring_get_request(rinfo, req, &ring_req);
+ id = blkif_ring_get_request(rinfo, req, &final_ring_req);
+ ring_req = &rinfo->shadow[id].req;

num_sg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
num_grant = 0;
@@ -785,7 +788,9 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
ring_req->u.rw.nr_segments = num_grant;
if (unlikely(require_extra_req)) {
extra_id = blkif_ring_get_request(rinfo, req,
- &extra_ring_req);
+ &final_extra_ring_req);
+ extra_ring_req = &rinfo->shadow[extra_id].req;
+
/*
* Only the first request contains the scatter-gather
* list.
@@ -827,10 +832,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
if (setup.segments)
kunmap_atomic(setup.segments);

- /* Keep a private copy so we can reissue requests when recovering. */
- rinfo->shadow[id].req = *ring_req;
+ /* Copy request(s) to the ring page. */
+ *final_ring_req = *ring_req;
if (unlikely(require_extra_req))
- rinfo->shadow[extra_id].req = *extra_ring_req;
+ *final_extra_ring_req = *extra_ring_req;

if (new_persistent_gnts)
gnttab_free_grant_references(setup.gref_head);
--
2.26.2


2021-08-02 14:42:47

by Oleksandr Andrushchenko

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] xen/blkfront: read response from backend only once

Hi, Juergen!

On 30.07.21 13:38, Juergen Gross wrote:
> In order to avoid problems in case the backend is modifying a response
> on the ring page while the frontend has already seen it, just read the
> response into a local buffer in one go and then operate on that buffer
> only.
>
> Signed-off-by: Juergen Gross <[email protected]>
> Reviewed-by: Jan Beulich <[email protected]>
> Acked-by: Roger Pau Monné <[email protected]>
> ---
> drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
> 1 file changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index d83fee21f6c5..15e840287734 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -1496,7 +1496,7 @@ static bool blkif_completion(unsigned long *id,
> static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> {
> struct request *req;
> - struct blkif_response *bret;
> + struct blkif_response bret;
> RING_IDX i, rp;
> unsigned long flags;
> struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
> @@ -1513,8 +1513,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> for (i = rinfo->ring.rsp_cons; i != rp; i++) {
> unsigned long id;
>
> - bret = RING_GET_RESPONSE(&rinfo->ring, i);
> - id = bret->id;
> + RING_COPY_RESPONSE(&rinfo->ring, i, &bret);

As per my understanding copying is still not an atomic operation as the request/response

are multi-byte structures in general. IOW, what prevents the backend from modifying the ring while

we are copying?

Thanks,

Oleksandr

> + id = bret.id;
> +
> /*
> * The backend has messed up and given us an id that we would
> * never have given to it (we stamp it up to BLK_RING_SIZE -
> @@ -1522,39 +1523,39 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> */
> if (id >= BLK_RING_SIZE(info)) {
> WARN(1, "%s: response to %s has incorrect id (%ld)\n",
> - info->gd->disk_name, op_name(bret->operation), id);
> + info->gd->disk_name, op_name(bret.operation), id);
> /* We can't safely get the 'struct request' as
> * the id is busted. */
> continue;
> }
> req = rinfo->shadow[id].request;
>
> - if (bret->operation != BLKIF_OP_DISCARD) {
> + if (bret.operation != BLKIF_OP_DISCARD) {
> /*
> * We may need to wait for an extra response if the
> * I/O request is split in 2
> */
> - if (!blkif_completion(&id, rinfo, bret))
> + if (!blkif_completion(&id, rinfo, &bret))
> continue;
> }
>
> if (add_id_to_freelist(rinfo, id)) {
> WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
> - info->gd->disk_name, op_name(bret->operation), id);
> + info->gd->disk_name, op_name(bret.operation), id);
> continue;
> }
>
> - if (bret->status == BLKIF_RSP_OKAY)
> + if (bret.status == BLKIF_RSP_OKAY)
> blkif_req(req)->error = BLK_STS_OK;
> else
> blkif_req(req)->error = BLK_STS_IOERR;
>
> - switch (bret->operation) {
> + switch (bret.operation) {
> case BLKIF_OP_DISCARD:
> - if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
> + if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
> struct request_queue *rq = info->rq;
> printk(KERN_WARNING "blkfront: %s: %s op failed\n",
> - info->gd->disk_name, op_name(bret->operation));
> + info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> info->feature_discard = 0;
> info->feature_secdiscard = 0;
> @@ -1564,15 +1565,15 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> break;
> case BLKIF_OP_FLUSH_DISKCACHE:
> case BLKIF_OP_WRITE_BARRIER:
> - if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
> + if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
> printk(KERN_WARNING "blkfront: %s: %s op failed\n",
> - info->gd->disk_name, op_name(bret->operation));
> + info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> }
> - if (unlikely(bret->status == BLKIF_RSP_ERROR &&
> + if (unlikely(bret.status == BLKIF_RSP_ERROR &&
> rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
> printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
> - info->gd->disk_name, op_name(bret->operation));
> + info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> }
> if (unlikely(blkif_req(req)->error)) {
> @@ -1585,9 +1586,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> fallthrough;
> case BLKIF_OP_READ:
> case BLKIF_OP_WRITE:
> - if (unlikely(bret->status != BLKIF_RSP_OKAY))
> + if (unlikely(bret.status != BLKIF_RSP_OKAY))
> dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
> - "request: %x\n", bret->status);
> + "request: %x\n", bret.status);
>
> break;
> default:

2021-08-02 20:07:46

by Julien Grall

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] xen/blkfront: read response from backend only once

Hi,

On 02/08/2021 15:06, Oleksandr Andrushchenko wrote:
> On 30.07.21 13:38, Juergen Gross wrote:
>> In order to avoid problems in case the backend is modifying a response
>> on the ring page while the frontend has already seen it, just read the
>> response into a local buffer in one go and then operate on that buffer
>> only.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> Reviewed-by: Jan Beulich <[email protected]>
>> Acked-by: Roger Pau Monné <[email protected]>
>> ---
>> drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
>> 1 file changed, 18 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index d83fee21f6c5..15e840287734 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1496,7 +1496,7 @@ static bool blkif_completion(unsigned long *id,
>> static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>> {
>> struct request *req;
>> - struct blkif_response *bret;
>> + struct blkif_response bret;
>> RING_IDX i, rp;
>> unsigned long flags;
>> struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
>> @@ -1513,8 +1513,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>> for (i = rinfo->ring.rsp_cons; i != rp; i++) {
>> unsigned long id;
>>
>> - bret = RING_GET_RESPONSE(&rinfo->ring, i);
>> - id = bret->id;
>> + RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
>
> As per my understanding copying is still not an atomic operation as the request/response
>
> are multi-byte structures in general. IOW, what prevents the backend from modifying the ring while
>
> we are copying?

Nothing and, I believe, you are never going to be able to ensure
atomicity with large structure (at least between entity that doesn't
trust each other).

However, what you can do is copying the response once, check that it is
consistent and then use it. If it is not consistent, then you can report
an error.

This is better than what's currently in tree. IOW we may have multiple
read so the code is prone to TOCTOU.

Cheers,

--
Julien Grall

2021-08-03 07:09:37

by Oleksandr Andrushchenko

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] xen/blkfront: read response from backend only once


On 02.08.21 22:26, Julien Grall wrote:
> Hi,
>
> On 02/08/2021 15:06, Oleksandr Andrushchenko wrote:
>> On 30.07.21 13:38, Juergen Gross wrote:
>>> In order to avoid problems in case the backend is modifying a response
>>> on the ring page while the frontend has already seen it, just read the
>>> response into a local buffer in one go and then operate on that buffer
>>> only.
>>>
>>> Signed-off-by: Juergen Gross <[email protected]>
>>> Reviewed-by: Jan Beulich <[email protected]>
>>> Acked-by: Roger Pau Monné <[email protected]>
>>> ---
>>>    drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
>>>    1 file changed, 18 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>> index d83fee21f6c5..15e840287734 100644
>>> --- a/drivers/block/xen-blkfront.c
>>> +++ b/drivers/block/xen-blkfront.c
>>> @@ -1496,7 +1496,7 @@ static bool blkif_completion(unsigned long *id,
>>>    static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>    {
>>>        struct request *req;
>>> -    struct blkif_response *bret;
>>> +    struct blkif_response bret;
>>>        RING_IDX i, rp;
>>>        unsigned long flags;
>>>        struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
>>> @@ -1513,8 +1513,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>        for (i = rinfo->ring.rsp_cons; i != rp; i++) {
>>>            unsigned long id;
>>>    -        bret = RING_GET_RESPONSE(&rinfo->ring, i);
>>> -        id   = bret->id;
>>> +        RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
>>
>> As per my understanding copying is still not an atomic operation as the request/response
>>
>> are multi-byte structures in general. IOW, what prevents the backend from modifying the ring while
>>
>> we are copying?
>
> Nothing and, I believe, you are never going to be able to ensure atomicity with large structure (at least between entity that doesn't trust each other).
>
> However, what you can do is copying the response once, check that it is consistent and then use it. If it is not consistent, then you can report an error.
>
> This is better than what's currently in tree. IOW we may have multiple read so the code is prone to TOCTOU.

Agree,

Thanks

>
> Cheers,
>

2021-08-30 10:17:28

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] xen: harden blkfront against malicious backends

On 30.07.21 12:38, Juergen Gross wrote:
> Xen backends of para-virtualized devices can live in dom0 kernel, dom0
> user land, or in a driver domain. This means that a backend might
> reside in a less trusted environment than the Xen core components, so
> a backend should not be able to do harm to a Xen guest (it can still
> mess up I/O data, but it shouldn't be able to e.g. crash a guest by
> other means or cause a privilege escalation in the guest).
>
> Unfortunately blkfront in the Linux kernel is fully trusting its
> backend. This series is fixing blkfront in this regard.
>
> It was discussed to handle this as a security problem, but the topic
> was discussed in public before, so it isn't a real secret.
>
> It should be mentioned that a similar series has been posted some years
> ago by Marek Marczykowski-Górecki, but this series has not been applied
> due to a Xen header not having been available in the Xen git repo at
> that time. Additionally my series is fixing some more DoS cases.
>
> Changes in V3:
> - patch 3: insert missing unlock in error case (kernel test robot)
> - patch 3: use %#x as format for printing wrong operation value
> (Roger Pau Monné)
>
> Changes in V2:
> - put blkfront patches into own series
> - some minor comments addressed
>
> Juergen Gross (3):
> xen/blkfront: read response from backend only once
> xen/blkfront: don't take local copy of a request from the ring page
> xen/blkfront: don't trust the backend response data blindly
>
> drivers/block/xen-blkfront.c | 126 +++++++++++++++++++++++------------
> 1 file changed, 84 insertions(+), 42 deletions(-)
>

Series pushed to xen/tip.git for-linus-5.15


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.06 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments
Subject: Re: [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page

On Fri, Jul 30, 2021 at 12:38:53PM +0200, Juergen Gross wrote:
> In order to avoid a malicious backend being able to influence the local
> copy of a request build the request locally first and then copy it to
> the ring page instead of doing it the other way round as today.
>
> Signed-off-by: Juergen Gross <[email protected]>
> Reviewed-by: Jan Beulich <[email protected]>
> Acked-by: Roger Pau Monné <[email protected]>
> ---
> V2:
> - init variable to avoid potential compiler warning (Jan Beulich)
> ---
> drivers/block/xen-blkfront.c | 25 +++++++++++++++----------
> 1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 15e840287734..b7301006fb28 100644

(...)

> @@ -827,10 +832,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
> if (setup.segments)
> kunmap_atomic(setup.segments);
>
> - /* Keep a private copy so we can reissue requests when recovering. */
> - rinfo->shadow[id].req = *ring_req;
> + /* Copy request(s) to the ring page. */
> + *final_ring_req = *ring_req;

Is this guaranteed to not be optimized by the compiler in an unsafe way
(like, do the operation the other way around)?
My version of the patch had "wmb()" just before, maybe a good idea to
add it here too?

> if (unlikely(require_extra_req))
> - rinfo->shadow[extra_id].req = *extra_ring_req;
> + *final_extra_ring_req = *extra_ring_req;
>
> if (new_persistent_gnts)
> gnttab_free_grant_references(setup.gref_head);
> --
> 2.26.2
>
>

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


Attachments:
(No filename) (1.64 kB)
signature.asc (499.00 B)
Download all attachments

2021-09-10 10:38:46

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] xen/blkfront: don't take local copy of a request from the ring page

On 10.09.21 12:14, Marek Marczykowski-Górecki wrote:
> On Fri, Jul 30, 2021 at 12:38:53PM +0200, Juergen Gross wrote:
>> In order to avoid a malicious backend being able to influence the local
>> copy of a request build the request locally first and then copy it to
>> the ring page instead of doing it the other way round as today.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> Reviewed-by: Jan Beulich <[email protected]>
>> Acked-by: Roger Pau Monné <[email protected]>
>> ---
>> V2:
>> - init variable to avoid potential compiler warning (Jan Beulich)
>> ---
>> drivers/block/xen-blkfront.c | 25 +++++++++++++++----------
>> 1 file changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 15e840287734..b7301006fb28 100644
>
> (...)
>
>> @@ -827,10 +832,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>> if (setup.segments)
>> kunmap_atomic(setup.segments);
>>
>> - /* Keep a private copy so we can reissue requests when recovering. */
>> - rinfo->shadow[id].req = *ring_req;
>> + /* Copy request(s) to the ring page. */
>> + *final_ring_req = *ring_req;
>
> Is this guaranteed to not be optimized by the compiler in an unsafe way
> (like, do the operation the other way around)?

I don't think the C standard allows that. AFAIK reordering writes is
allowed only between sequence points. And each external function call is
a sequence point, making such an optimization in our case illegal.


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.06 kB)
OpenPGP public key
OpenPGP_signature (505.00 B)
OpenPGP digital signature
Download all attachments