2021-07-08 12:44:32

by Juergen Gross

Subject: [PATCH v2 0/3] xen: harden blkfront against malicious backends

Xen backends of para-virtualized devices can live in dom0 kernel, dom0
user land, or in a driver domain. This means that a backend might
reside in a less trusted environment than the Xen core components, so
a backend should not be able to do harm to a Xen guest (it can still
mess up I/O data, but it shouldn't be able to e.g. crash a guest by
other means or cause a privilege escalation in the guest).

Unfortunately, blkfront in the Linux kernel fully trusts its backend.
This series fixes blkfront in this regard.

Handling this as a security problem was considered, but the topic had
already been discussed in public, so it isn't really a secret.

Changes in V2:
- put blkfront patches into own series
- some minor comments addressed

Juergen Gross (3):
xen/blkfront: read response from backend only once
xen/blkfront: don't take local copy of a request from the ring page
xen/blkfront: don't trust the backend response data blindly

drivers/block/xen-blkfront.c | 122 +++++++++++++++++++++++------------
1 file changed, 80 insertions(+), 42 deletions(-)

--
2.26.2


2021-07-08 12:44:45

by Juergen Gross

Subject: [PATCH v2 1/3] xen/blkfront: read response from backend only once

To avoid problems in case the backend modifies a response on the ring
page while the frontend is already processing it, read the response
into a local buffer in one go and then operate only on that buffer.
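
For illustration, the change boils down to replacing a pointer into the
shared page by a one-shot copy of the slot (sketch only, with rinfo and
i as in blkif_interrupt() below):

struct blkif_response *bret_ptr; /* before: points into the shared ring page */
struct blkif_response bret;      /* after: private copy owned by the frontend */

/* Before: every later dereference of bret_ptr re-reads memory the
 * backend can modify between a check and the use of the value. */
bret_ptr = RING_GET_RESPONSE(&rinfo->ring, i);

/* After: copy the whole slot exactly once; all further checks operate
 * on data the backend can no longer change. */
RING_COPY_RESPONSE(&rinfo->ring, i, &bret);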

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 8d49f8fa98bb..86356014d35e 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1539,7 +1539,7 @@ static bool blkif_completion(unsigned long *id,
static irqreturn_t blkif_interrupt(int irq, void *dev_id)
{
struct request *req;
- struct blkif_response *bret;
+ struct blkif_response bret;
RING_IDX i, rp;
unsigned long flags;
struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
@@ -1556,8 +1556,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
for (i = rinfo->ring.rsp_cons; i != rp; i++) {
unsigned long id;

- bret = RING_GET_RESPONSE(&rinfo->ring, i);
- id = bret->id;
+ RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
+ id = bret.id;
+
/*
* The backend has messed up and given us an id that we would
* never have given to it (we stamp it up to BLK_RING_SIZE -
@@ -1565,39 +1566,39 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
*/
if (id >= BLK_RING_SIZE(info)) {
WARN(1, "%s: response to %s has incorrect id (%ld)\n",
- info->gd->disk_name, op_name(bret->operation), id);
+ info->gd->disk_name, op_name(bret.operation), id);
/* We can't safely get the 'struct request' as
* the id is busted. */
continue;
}
req = rinfo->shadow[id].request;

- if (bret->operation != BLKIF_OP_DISCARD) {
+ if (bret.operation != BLKIF_OP_DISCARD) {
/*
* We may need to wait for an extra response if the
* I/O request is split in 2
*/
- if (!blkif_completion(&id, rinfo, bret))
+ if (!blkif_completion(&id, rinfo, &bret))
continue;
}

if (add_id_to_freelist(rinfo, id)) {
WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
- info->gd->disk_name, op_name(bret->operation), id);
+ info->gd->disk_name, op_name(bret.operation), id);
continue;
}

- if (bret->status == BLKIF_RSP_OKAY)
+ if (bret.status == BLKIF_RSP_OKAY)
blkif_req(req)->error = BLK_STS_OK;
else
blkif_req(req)->error = BLK_STS_IOERR;

- switch (bret->operation) {
+ switch (bret.operation) {
case BLKIF_OP_DISCARD:
- if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+ if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
info->feature_discard = 0;
info->feature_secdiscard = 0;
@@ -1607,15 +1608,15 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
break;
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
- if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+ if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
- if (unlikely(bret->status == BLKIF_RSP_ERROR &&
+ if (unlikely(bret.status == BLKIF_RSP_ERROR &&
rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
if (unlikely(blkif_req(req)->error)) {
@@ -1628,9 +1629,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
fallthrough;
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
- if (unlikely(bret->status != BLKIF_RSP_OKAY))
+ if (unlikely(bret.status != BLKIF_RSP_OKAY))
dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
- "request: %x\n", bret->status);
+ "request: %x\n", bret.status);

break;
default:
--
2.26.2

2021-07-08 12:45:00

by Juergen Gross

Subject: [PATCH v2 2/3] xen/blkfront: don't take local copy of a request from the ring page

To prevent a malicious backend from being able to influence the local
copy of a request, build the request locally first and then copy it to
the ring page, instead of doing it the other way round as today.
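
Condensed, the ordering of the copy is simply reversed (sketch of the
flow in the diff below, names as in the patch):

/* Before: build the request directly in the shared ring page and take
 * the private shadow copy from there afterwards. */
id = blkif_ring_get_request(rinfo, req, &ring_req); /* ring_req -> ring page */
/* ... fill *ring_req; the backend can observe/modify it while we do ... */
rinfo->shadow[id].req = *ring_req;

/* After: build the request in the private shadow copy and publish it to
 * the ring page with a single copy at the end. */
id = blkif_ring_get_request(rinfo, req, &final_ring_req); /* -> ring page */
ring_req = &rinfo->shadow[id].req;                        /* private memory */
/* ... fill *ring_req ... */
*final_ring_req = *ring_req;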

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
---
V2:
- init variable to avoid potential compiler warning (Jan Beulich)
---
drivers/block/xen-blkfront.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 86356014d35e..80701860870a 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -546,7 +546,7 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
rinfo->shadow[id].status = REQ_WAITING;
rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;

- (*ring_req)->u.rw.id = id;
+ rinfo->shadow[id].req.u.rw.id = id;

return id;
}
@@ -554,11 +554,12 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
{
struct blkfront_info *info = rinfo->dev_info;
- struct blkif_request *ring_req;
+ struct blkif_request *ring_req, *final_ring_req;
unsigned long id;

/* Fill out a communications ring structure. */
- id = blkif_ring_get_request(rinfo, req, &ring_req);
+ id = blkif_ring_get_request(rinfo, req, &final_ring_req);
+ ring_req = &rinfo->shadow[id].req;

ring_req->operation = BLKIF_OP_DISCARD;
ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -569,8 +570,8 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
else
ring_req->u.discard.flag = 0;

- /* Keep a private copy so we can reissue requests when recovering. */
- rinfo->shadow[id].req = *ring_req;
+ /* Copy the request to the ring page. */
+ *final_ring_req = *ring_req;

return 0;
}
@@ -703,6 +704,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
{
struct blkfront_info *info = rinfo->dev_info;
struct blkif_request *ring_req, *extra_ring_req = NULL;
+ struct blkif_request *final_ring_req, *final_extra_ring_req = NULL;
unsigned long id, extra_id = NO_ASSOCIATED_ID;
bool require_extra_req = false;
int i;
@@ -747,7 +749,8 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
}

/* Fill out a communications ring structure. */
- id = blkif_ring_get_request(rinfo, req, &ring_req);
+ id = blkif_ring_get_request(rinfo, req, &final_ring_req);
+ ring_req = &rinfo->shadow[id].req;

num_sg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
num_grant = 0;
@@ -798,7 +801,9 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
ring_req->u.rw.nr_segments = num_grant;
if (unlikely(require_extra_req)) {
extra_id = blkif_ring_get_request(rinfo, req,
- &extra_ring_req);
+ &final_extra_ring_req);
+ extra_ring_req = &rinfo->shadow[extra_id].req;
+
/*
* Only the first request contains the scatter-gather
* list.
@@ -840,10 +845,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
if (setup.segments)
kunmap_atomic(setup.segments);

- /* Keep a private copy so we can reissue requests when recovering. */
- rinfo->shadow[id].req = *ring_req;
+ /* Copy request(s) to the ring page. */
+ *final_ring_req = *ring_req;
if (unlikely(require_extra_req))
- rinfo->shadow[extra_id].req = *extra_ring_req;
+ *final_extra_ring_req = *extra_ring_req;

if (new_persistent_gnts)
gnttab_free_grant_references(setup.gref_head);
--
2.26.2

2021-07-08 12:45:06

by Juergen Gross

Subject: [PATCH v2 3/3] xen/blkfront: don't trust the backend response data blindly

Today blkfront will trust the backend to send only sane response data.
In order to avoid privilege escalations or crashes in case of malicious
backends verify the data to be within expected limits. Especially make
sure that the response always references an outstanding request.

Introduce a new state of the ring BLKIF_STATE_ERROR which will be
switched to in case an inconsistency is being detected. Recovering from
this state is possible only via removing and adding the virtual device
again (e.g. via a suspend/resume cycle).
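
Condensed, the checks added in blkif_interrupt() have the following
shape (sketch of the diff below, error handling shortened):

if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) /* more responses than ring slots */
	goto err;
if (id >= BLK_RING_SIZE(info))                     /* response id out of range */
	goto err;
if (rinfo->shadow[id].status != REQ_WAITING)       /* no matching pending request */
	goto err;
if (bret.operation != op)                          /* operation differs from request */
	goto err;
...
err:
	info->connected = BLKIF_STATE_ERROR;       /* ring is never looked at again */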

Signed-off-by: Juergen Gross <[email protected]>
---
V2:
- use READ_ONCE() for reading the producer index
- check validity of producer index only after memory barrier (Jan Beulich)
- use virt_rmb() as barrier (Jan Beulich)
---
drivers/block/xen-blkfront.c | 66 ++++++++++++++++++++++++++----------
1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 80701860870a..ecdbb0381b4c 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -80,6 +80,7 @@ enum blkif_state {
BLKIF_STATE_DISCONNECTED,
BLKIF_STATE_CONNECTED,
BLKIF_STATE_SUSPENDED,
+ BLKIF_STATE_ERROR,
};

struct grant {
@@ -89,6 +90,7 @@ struct grant {
};

enum blk_req_status {
+ REQ_PROCESSING,
REQ_WAITING,
REQ_DONE,
REQ_ERROR,
@@ -543,7 +545,7 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,

id = get_id_from_freelist(rinfo);
rinfo->shadow[id].request = req;
- rinfo->shadow[id].status = REQ_WAITING;
+ rinfo->shadow[id].status = REQ_PROCESSING;
rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;

rinfo->shadow[id].req.u.rw.id = id;
@@ -572,6 +574,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf

/* Copy the request to the ring page. */
*final_ring_req = *ring_req;
+ rinfo->shadow[id].status = REQ_WAITING;

return 0;
}
@@ -847,8 +850,11 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri

/* Copy request(s) to the ring page. */
*final_ring_req = *ring_req;
- if (unlikely(require_extra_req))
+ rinfo->shadow[id].status = REQ_WAITING;
+ if (unlikely(require_extra_req)) {
*final_extra_ring_req = *extra_ring_req;
+ rinfo->shadow[extra_id].status = REQ_WAITING;
+ }

if (new_persistent_gnts)
gnttab_free_grant_references(setup.gref_head);
@@ -1402,8 +1408,8 @@ static enum blk_req_status blkif_rsp_to_req_status(int rsp)
static int blkif_get_final_status(enum blk_req_status s1,
enum blk_req_status s2)
{
- BUG_ON(s1 == REQ_WAITING);
- BUG_ON(s2 == REQ_WAITING);
+ BUG_ON(s1 < REQ_DONE);
+ BUG_ON(s2 < REQ_DONE);

if (s1 == REQ_ERROR || s2 == REQ_ERROR)
return BLKIF_RSP_ERROR;
@@ -1436,7 +1442,7 @@ static bool blkif_completion(unsigned long *id,
s->status = blkif_rsp_to_req_status(bret->status);

/* Wait the second response if not yet here. */
- if (s2->status == REQ_WAITING)
+ if (s2->status < REQ_DONE)
return false;

bret->status = blkif_get_final_status(s->status,
@@ -1555,11 +1561,17 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)

spin_lock_irqsave(&rinfo->ring_lock, flags);
again:
- rp = rinfo->ring.sring->rsp_prod;
- rmb(); /* Ensure we see queued responses up to 'rp'. */
+ rp = READ_ONCE(rinfo->ring.sring->rsp_prod);
+ virt_rmb(); /* Ensure we see queued responses up to 'rp'. */
+ if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
+ pr_alert("%s: illegal number of responses %u\n",
+ info->gd->disk_name, rp - rinfo->ring.rsp_cons);
+ goto err;
+ }

for (i = rinfo->ring.rsp_cons; i != rp; i++) {
unsigned long id;
+ unsigned int op;

RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
id = bret.id;
@@ -1570,14 +1582,28 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
* look in get_id_from_freelist.
*/
if (id >= BLK_RING_SIZE(info)) {
- WARN(1, "%s: response to %s has incorrect id (%ld)\n",
- info->gd->disk_name, op_name(bret.operation), id);
- /* We can't safely get the 'struct request' as
- * the id is busted. */
- continue;
+ pr_alert("%s: response has incorrect id (%ld)\n",
+ info->gd->disk_name, id);
+ goto err;
}
+ if (rinfo->shadow[id].status != REQ_WAITING) {
+ pr_alert("%s: response references no pending request\n",
+ info->gd->disk_name);
+ goto err;
+ }
+
+ rinfo->shadow[id].status = REQ_PROCESSING;
req = rinfo->shadow[id].request;

+ op = rinfo->shadow[id].req.operation;
+ if (op == BLKIF_OP_INDIRECT)
+ op = rinfo->shadow[id].req.u.indirect.indirect_op;
+ if (bret.operation != op) {
+ pr_alert("%s: response has wrong operation (%u instead of %u)\n",
+ info->gd->disk_name, bret.operation, op);
+ goto err;
+ }
+
if (bret.operation != BLKIF_OP_DISCARD) {
/*
* We may need to wait for an extra response if the
@@ -1602,7 +1628,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
case BLKIF_OP_DISCARD:
if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
- printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+
+ pr_warn_ratelimited("blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
info->feature_discard = 0;
@@ -1614,13 +1641,13 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
- printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+ pr_warn_ratelimited("blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
if (unlikely(bret.status == BLKIF_RSP_ERROR &&
rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
- printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
+ pr_warn_ratelimited("blkfront: %s: empty %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
@@ -1635,8 +1662,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
if (unlikely(bret.status != BLKIF_RSP_OKAY))
- dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
- "request: %x\n", bret.status);
+ dev_dbg_ratelimited(&info->xbdev->dev,
+ "Bad return from blkdev data request: %x\n", bret.status);

break;
default:
@@ -1662,6 +1689,11 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
spin_unlock_irqrestore(&rinfo->ring_lock, flags);

return IRQ_HANDLED;
+
+ err:
+ info->connected = BLKIF_STATE_ERROR;
+ pr_alert("%s disabled for further use\n", info->gd->disk_name);
+ return IRQ_HANDLED;
}


--
2.26.2

2021-07-08 13:13:18

by Jan Beulich

Subject: Re: [PATCH v2 3/3] xen/blkfront: don't trust the backend response data blindly

On 08.07.2021 14:43, Juergen Gross wrote:
> Today blkfront will trust the backend to send only sane response data.
> In order to avoid privilege escalations or crashes in case of malicious
> backends verify the data to be within expected limits. Especially make
> sure that the response always references an outstanding request.
>
> Introduce a new state of the ring BLKIF_STATE_ERROR which will be
> switched to in case an inconsistency is being detected. Recovering from
> this state is possible only via removing and adding the virtual device
> again (e.g. via a suspend/resume cycle).
>
> Signed-off-by: Juergen Gross <[email protected]>

Reviewed-by: Jan Beulich <[email protected]>
albeit ...

> @@ -1602,7 +1628,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> case BLKIF_OP_DISCARD:
> if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
> struct request_queue *rq = info->rq;
> - printk(KERN_WARNING "blkfront: %s: %s op failed\n",
> +
> + pr_warn_ratelimited("blkfront: %s: %s op failed\n",
> info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> info->feature_discard = 0;
> @@ -1614,13 +1641,13 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> case BLKIF_OP_FLUSH_DISKCACHE:
> case BLKIF_OP_WRITE_BARRIER:
> if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
> - printk(KERN_WARNING "blkfront: %s: %s op failed\n",
> + pr_warn_ratelimited("blkfront: %s: %s op failed\n",
> info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> }
> if (unlikely(bret.status == BLKIF_RSP_ERROR &&
> rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
> - printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
> + pr_warn_ratelimited("blkfront: %s: empty %s op failed\n",
> info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> }
> @@ -1635,8 +1662,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> case BLKIF_OP_READ:
> case BLKIF_OP_WRITE:
> if (unlikely(bret.status != BLKIF_RSP_OKAY))
> - dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
> - "request: %x\n", bret.status);
> + dev_dbg_ratelimited(&info->xbdev->dev,
> + "Bad return from blkdev data request: %x\n", bret.status);
>
> break;
> default:

... all of these look kind of unrelated to the topic of the patch,
and the conversion also isn't mentioned as on-purpose in the
description.

Jan

2021-07-08 13:15:56

by Juergen Gross

Subject: Re: [PATCH v2 3/3] xen/blkfront: don't trust the backend response data blindly

On 08.07.21 15:11, Jan Beulich wrote:
> On 08.07.2021 14:43, Juergen Gross wrote:
>> Today blkfront will trust the backend to send only sane response data.
>> In order to avoid privilege escalations or crashes in case of malicious
>> backends verify the data to be within expected limits. Especially make
>> sure that the response always references an outstanding request.
>>
>> Introduce a new state of the ring BLKIF_STATE_ERROR which will be
>> switched to in case an inconsistency is being detected. Recovering from
>> this state is possible only via removing and adding the virtual device
>> again (e.g. via a suspend/resume cycle).
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>
> Reviewed-by: Jan Beulich <[email protected]>
> albeit ...
>
>> @@ -1602,7 +1628,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>> case BLKIF_OP_DISCARD:
>> if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
>> struct request_queue *rq = info->rq;
>> - printk(KERN_WARNING "blkfront: %s: %s op failed\n",
>> +
>> + pr_warn_ratelimited("blkfront: %s: %s op failed\n",
>> info->gd->disk_name, op_name(bret.operation));
>> blkif_req(req)->error = BLK_STS_NOTSUPP;
>> info->feature_discard = 0;
>> @@ -1614,13 +1641,13 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>> case BLKIF_OP_FLUSH_DISKCACHE:
>> case BLKIF_OP_WRITE_BARRIER:
>> if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
>> - printk(KERN_WARNING "blkfront: %s: %s op failed\n",
>> + pr_warn_ratelimited("blkfront: %s: %s op failed\n",
>> info->gd->disk_name, op_name(bret.operation));
>> blkif_req(req)->error = BLK_STS_NOTSUPP;
>> }
>> if (unlikely(bret.status == BLKIF_RSP_ERROR &&
>> rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
>> - printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
>> + pr_warn_ratelimited("blkfront: %s: empty %s op failed\n",
>> info->gd->disk_name, op_name(bret.operation));
>> blkif_req(req)->error = BLK_STS_NOTSUPP;
>> }
>> @@ -1635,8 +1662,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>> case BLKIF_OP_READ:
>> case BLKIF_OP_WRITE:
>> if (unlikely(bret.status != BLKIF_RSP_OKAY))
>> - dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
>> - "request: %x\n", bret.status);
>> + dev_dbg_ratelimited(&info->xbdev->dev,
>> + "Bad return from blkdev data request: %x\n", bret.status);
>>
>> break;
>> default:
>
> ... all of these look kind of unrelated to the topic of the patch,
> and the conversion also isn't mentioned as on-purpose in the
> description.

Hmm, yes, I'll add a sentence to the commit message.


Juergen



2021-07-08 14:23:35

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH v2 0/3] xen: harden blkfront against malicious backends

On Thu, Jul 08, 2021 at 02:43:42PM +0200, Juergen Gross wrote:
> Xen backends of para-virtualized devices can live in dom0 kernel, dom0
> user land, or in a driver domain. This means that a backend might
> reside in a less trusted environment than the Xen core components, so
> a backend should not be able to do harm to a Xen guest (it can still
> mess up I/O data, but it shouldn't be able to e.g. crash a guest by
> other means or cause a privilege escalation in the guest).
>
> Unfortunately, blkfront in the Linux kernel fully trusts its backend.
> This series fixes blkfront in this regard.
>
> Handling this as a security problem was considered, but the topic had
> already been discussed in public, so it isn't really a secret.

Wow. This looks like what Marek did .. in 2018!

https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg02336.html

Would it be worth crediting Marek?
>
> Changes in V2:
> - put blkfront patches into own series
> - some minor comments addressed
>
> Juergen Gross (3):
> xen/blkfront: read response from backend only once
> xen/blkfront: don't take local copy of a request from the ring page
> xen/blkfront: don't trust the backend response data blindly
>
> drivers/block/xen-blkfront.c | 122 +++++++++++++++++++++++------------
> 1 file changed, 80 insertions(+), 42 deletions(-)
>
> --
> 2.26.2
>

2021-07-08 14:41:18

by Juergen Gross

Subject: Re: [PATCH v2 0/3] xen: harden blkfront against malicious backends

On 08.07.21 16:22, Konrad Rzeszutek Wilk wrote:
> On Thu, Jul 08, 2021 at 02:43:42PM +0200, Juergen Gross wrote:
>> Xen backends of para-virtualized devices can live in dom0 kernel, dom0
>> user land, or in a driver domain. This means that a backend might
>> reside in a less trusted environment than the Xen core components, so
>> a backend should not be able to do harm to a Xen guest (it can still
>> mess up I/O data, but it shouldn't be able to e.g. crash a guest by
>> other means or cause a privilege escalation in the guest).
>>
>> Unfortunately, blkfront in the Linux kernel fully trusts its backend.
>> This series fixes blkfront in this regard.
>>
>> Handling this as a security problem was considered, but the topic had
>> already been discussed in public, so it isn't really a secret.
>
> Wow. This looks like what Marek did .. in 2018!
>
> https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg02336.html

Yes, seems to have been a similar goal.

> Would it be worth crediting Marek?

I'm fine mentioning his patches, but I didn't know of his patches until
having sent out V1 of my series.

I'd be interested in learning why his patches weren't taken back then.


Juergen




2021-07-09 13:57:47

by Juergen Gross

Subject: Re: [PATCH v2 2/3] xen/blkfront: don't take local copy of a request from the ring page

On 09.07.21 10:55, Roger Pau Monné wrote:
> On Thu, Jul 08, 2021 at 02:43:44PM +0200, Juergen Gross wrote:
>> To prevent a malicious backend from being able to influence the local
>> copy of a request, build the request locally first and then copy it to
>> the ring page, instead of doing it the other way round as today.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> Reviewed-by: Jan Beulich <[email protected]>
>
> Acked-by: Roger Pau Monné <[email protected]>
>
> Thanks!
>
> One unrelated comment below.
>

...

>> @@ -798,7 +801,9 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>> ring_req->u.rw.nr_segments = num_grant;
>> if (unlikely(require_extra_req)) {
>> extra_id = blkif_ring_get_request(rinfo, req,
>> - &extra_ring_req);
>> + &final_extra_ring_req);
>> + extra_ring_req = &rinfo->shadow[extra_id].req;
>
> I'm slightly confused about this extra request stuff because I cannot
> find any check that asserts we have two empty slots on the ring before
> getting here (I only see a RING_FULL check in blkif_queue_rq).
>
> This is AFAIK only used on Arm when guest page size > 4KB.

I agree that this is a bug and should be fixed.
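
One possible shape of such a check (illustrative only, not necessarily
the fix that will go in, and assuming the need for an extra request is
already known when queueing) would be to count the free slots via
RING_FREE_REQUESTS() from ring.h:

unsigned int required = require_extra_req ? 2 : 1;

if (RING_FREE_REQUESTS(&rinfo->ring) < required) {
	/* Not enough room for all parts of the request, retry later. */
	blk_mq_stop_hw_queue(hctx);
	return BLK_STS_DEV_RESOURCE;
}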


Juergen



2021-07-09 14:00:13

by Juergen Gross

Subject: Re: [PATCH v2 3/3] xen/blkfront: don't trust the backend response data blindly

On 09.07.21 11:42, Roger Pau Monné wrote:
> On Thu, Jul 08, 2021 at 02:43:45PM +0200, Juergen Gross wrote:
>> Today blkfront will trust the backend to send only sane response data.
>> In order to avoid privilege escalations or crashes in case of malicious
>> backends verify the data to be within expected limits. Especially make
>> sure that the response always references an outstanding request.
>>
>> Introduce a new state of the ring BLKIF_STATE_ERROR which will be
>> switched to in case an inconsistency is being detected. Recovering from
>> this state is possible only via removing and adding the virtual device
>> again (e.g. via a suspend/resume cycle).
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>
> Acked-by: Roger Pau Monné <[email protected]>

>> @@ -1555,11 +1561,17 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>
>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>> again:
>> - rp = rinfo->ring.sring->rsp_prod;
>> - rmb(); /* Ensure we see queued responses up to 'rp'. */
>> + rp = READ_ONCE(rinfo->ring.sring->rsp_prod);
>> + virt_rmb(); /* Ensure we see queued responses up to 'rp'. */
>
> Is the READ_ONCE strictly needed? Doesn't the barrier prevent rp from
> not being loaded at this point?

I asked Jan the same and he didn't want to rule that out. Additionally,
the READ_ONCE() helps against (rather improbable) load tearing by the
compiler.
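
To spell out the two separate concerns (sketch only, same code as in
the hunk quoted above):

/* Plain load: the compiler is allowed to re-read the shared field later
 * on or to split the access into smaller loads ("tearing"). */
rp = rinfo->ring.sring->rsp_prod;

/* READ_ONCE(): exactly one full-width access to the shared index. */
rp = READ_ONCE(rinfo->ring.sring->rsp_prod);

/* virt_rmb(): everything the backend wrote before bumping rsp_prod is
 * visible before the responses up to 'rp' are read. */
virt_rmb();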

>> + op = rinfo->shadow[id].req.operation;
>> + if (op == BLKIF_OP_INDIRECT)
>> + op = rinfo->shadow[id].req.u.indirect.indirect_op;
>> + if (bret.operation != op) {
>> + pr_alert("%s: response has wrong operation (%u instead of %u)\n",
>> + info->gd->disk_name, bret.operation, op);
>
> You could also use op_name here, but I guess this could mask the
> operation as 'unknown' for any number out of the defined ones.

This case shouldn't happen normally, so having the numerical value is
enough, and it avoids hiding any undefined op behind 'unknown'.
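
(For reference, op_name() is essentially a table lookup along these
lines; paraphrased, not the verbatim code:)

static const char *op_name(unsigned int op)
{
	static const char *const names[] = {
		[BLKIF_OP_READ]            = "read",
		[BLKIF_OP_WRITE]           = "write",
		[BLKIF_OP_WRITE_BARRIER]   = "barrier",
		[BLKIF_OP_FLUSH_DISKCACHE] = "flush",
		[BLKIF_OP_DISCARD]         = "discard",
	};

	/* Anything outside the table would be reported as "unknown". */
	if (op >= ARRAY_SIZE(names) || !names[op])
		return "unknown";

	return names[op];
}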

>> @@ -1635,8 +1662,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>> case BLKIF_OP_READ:
>> case BLKIF_OP_WRITE:
>> if (unlikely(bret.status != BLKIF_RSP_OKAY))
>> - dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
>> - "request: %x\n", bret.status);
>> + dev_dbg_ratelimited(&info->xbdev->dev,
>> + "Bad return from blkdev data request: %x\n", bret.status);
>
> Since you are touching the line, could you use %#x here? It's IMO not
> obvious from the context that this status will be printed in hex. Also,
> the bret.status parameter could be moved onto its own line.

Fine with me.
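
(The difference is only the 0x prefix in the output; with an arbitrary
example status:)

pr_debug("Bad return from blkdev data request: %x\n", 0x8f);  /* ... request: 8f */
pr_debug("Bad return from blkdev data request: %#x\n", 0x8f); /* ... request: 0x8f */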


Juergen


by Marek Marczykowski-Górecki

Subject: Re: [PATCH v2 0/3] xen: harden blkfront against malicious backends

On Thu, Jul 08, 2021 at 04:39:58PM +0200, Juergen Gross wrote:
> On 08.07.21 16:22, Konrad Rzeszutek Wilk wrote:
> > On Thu, Jul 08, 2021 at 02:43:42PM +0200, Juergen Gross wrote:
> > > Xen backends of para-virtualized devices can live in dom0 kernel, dom0
> > > user land, or in a driver domain. This means that a backend might
> > > reside in a less trusted environment than the Xen core components, so
> > > a backend should not be able to do harm to a Xen guest (it can still
> > > mess up I/O data, but it shouldn't be able to e.g. crash a guest by
> > > other means or cause a privilege escalation in the guest).
> > >
> > > Unfortunately, blkfront in the Linux kernel fully trusts its backend.
> > > This series fixes blkfront in this regard.
> > >
> > > Handling this as a security problem was considered, but the topic had
> > > already been discussed in public, so it isn't really a secret.
> >
> > Wow. This looks like what Marek did .. in 2018!
> >
> > https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg02336.html
>
> Yes, seems to have been a similar goal.
>
> > Would it be worth crediting Marek?
>
> I'm fine mentioning his patches, but I didn't know of his patches until
> having sent out V1 of my series.

Some email issue likely? You were on explicit CC in that series.

> I'd be interested in learning why his patches weren't taken back
> then.

Mostly it was waiting in limbo on the "public: add RING_COPY_RESPONSE()"[1]
patch for the Xen tree, which would then be synchronized back into the
Linux headers. That patch was finally committed in March this year. I
should've followed up on it earlier than 3 years later...

[1] https://lore.kernel.org/xen-devel/[email protected]/T/#u

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



2021-07-30 10:10:30

by Juergen Gross

Subject: Re: [PATCH v2 3/3] xen/blkfront: don't trust the backend response data blindly

On 08.07.21 14:43, Juergen Gross wrote:
> Today blkfront will trust the backend to send only sane response data.
> In order to avoid privilege escalations or crashes in case of malicious
> backends verify the data to be within expected limits. Especially make
> sure that the response always references an outstanding request.
>
> Introduce a new state of the ring BLKIF_STATE_ERROR which will be
> switched to in case an inconsistency is being detected. Recovering from
> this state is possible only via removing and adding the virtual device
> again (e.g. via a suspend/resume cycle).
>
> Signed-off-by: Juergen Gross <[email protected]>

Any comments for this patch?


Juergen

> ---
> V2:
> - use READ_ONCE() for reading the producer index
> - check validity of producer index only after memory barrier (Jan Beulich)
> - use virt_rmb() as barrier (Jan Beulich)
> ---
> drivers/block/xen-blkfront.c | 66 ++++++++++++++++++++++++++----------
> 1 file changed, 49 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 80701860870a..ecdbb0381b4c 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -80,6 +80,7 @@ enum blkif_state {
> BLKIF_STATE_DISCONNECTED,
> BLKIF_STATE_CONNECTED,
> BLKIF_STATE_SUSPENDED,
> + BLKIF_STATE_ERROR,
> };
>
> struct grant {
> @@ -89,6 +90,7 @@ struct grant {
> };
>
> enum blk_req_status {
> + REQ_PROCESSING,
> REQ_WAITING,
> REQ_DONE,
> REQ_ERROR,
> @@ -543,7 +545,7 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
>
> id = get_id_from_freelist(rinfo);
> rinfo->shadow[id].request = req;
> - rinfo->shadow[id].status = REQ_WAITING;
> + rinfo->shadow[id].status = REQ_PROCESSING;
> rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;
>
> rinfo->shadow[id].req.u.rw.id = id;
> @@ -572,6 +574,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
>
> /* Copy the request to the ring page. */
> *final_ring_req = *ring_req;
> + rinfo->shadow[id].status = REQ_WAITING;
>
> return 0;
> }
> @@ -847,8 +850,11 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>
> /* Copy request(s) to the ring page. */
> *final_ring_req = *ring_req;
> - if (unlikely(require_extra_req))
> + rinfo->shadow[id].status = REQ_WAITING;
> + if (unlikely(require_extra_req)) {
> *final_extra_ring_req = *extra_ring_req;
> + rinfo->shadow[extra_id].status = REQ_WAITING;
> + }
>
> if (new_persistent_gnts)
> gnttab_free_grant_references(setup.gref_head);
> @@ -1402,8 +1408,8 @@ static enum blk_req_status blkif_rsp_to_req_status(int rsp)
> static int blkif_get_final_status(enum blk_req_status s1,
> enum blk_req_status s2)
> {
> - BUG_ON(s1 == REQ_WAITING);
> - BUG_ON(s2 == REQ_WAITING);
> + BUG_ON(s1 < REQ_DONE);
> + BUG_ON(s2 < REQ_DONE);
>
> if (s1 == REQ_ERROR || s2 == REQ_ERROR)
> return BLKIF_RSP_ERROR;
> @@ -1436,7 +1442,7 @@ static bool blkif_completion(unsigned long *id,
> s->status = blkif_rsp_to_req_status(bret->status);
>
> /* Wait the second response if not yet here. */
> - if (s2->status == REQ_WAITING)
> + if (s2->status < REQ_DONE)
> return false;
>
> bret->status = blkif_get_final_status(s->status,
> @@ -1555,11 +1561,17 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>
> spin_lock_irqsave(&rinfo->ring_lock, flags);
> again:
> - rp = rinfo->ring.sring->rsp_prod;
> - rmb(); /* Ensure we see queued responses up to 'rp'. */
> + rp = READ_ONCE(rinfo->ring.sring->rsp_prod);
> + virt_rmb(); /* Ensure we see queued responses up to 'rp'. */
> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
> + pr_alert("%s: illegal number of responses %u\n",
> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
> + goto err;
> + }
>
> for (i = rinfo->ring.rsp_cons; i != rp; i++) {
> unsigned long id;
> + unsigned int op;
>
> RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
> id = bret.id;
> @@ -1570,14 +1582,28 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> * look in get_id_from_freelist.
> */
> if (id >= BLK_RING_SIZE(info)) {
> - WARN(1, "%s: response to %s has incorrect id (%ld)\n",
> - info->gd->disk_name, op_name(bret.operation), id);
> - /* We can't safely get the 'struct request' as
> - * the id is busted. */
> - continue;
> + pr_alert("%s: response has incorrect id (%ld)\n",
> + info->gd->disk_name, id);
> + goto err;
> }
> + if (rinfo->shadow[id].status != REQ_WAITING) {
> + pr_alert("%s: response references no pending request\n",
> + info->gd->disk_name);
> + goto err;
> + }
> +
> + rinfo->shadow[id].status = REQ_PROCESSING;
> req = rinfo->shadow[id].request;
>
> + op = rinfo->shadow[id].req.operation;
> + if (op == BLKIF_OP_INDIRECT)
> + op = rinfo->shadow[id].req.u.indirect.indirect_op;
> + if (bret.operation != op) {
> + pr_alert("%s: response has wrong operation (%u instead of %u)\n",
> + info->gd->disk_name, bret.operation, op);
> + goto err;
> + }
> +
> if (bret.operation != BLKIF_OP_DISCARD) {
> /*
> * We may need to wait for an extra response if the
> @@ -1602,7 +1628,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> case BLKIF_OP_DISCARD:
> if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
> struct request_queue *rq = info->rq;
> - printk(KERN_WARNING "blkfront: %s: %s op failed\n",
> +
> + pr_warn_ratelimited("blkfront: %s: %s op failed\n",
> info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> info->feature_discard = 0;
> @@ -1614,13 +1641,13 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> case BLKIF_OP_FLUSH_DISKCACHE:
> case BLKIF_OP_WRITE_BARRIER:
> if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
> - printk(KERN_WARNING "blkfront: %s: %s op failed\n",
> + pr_warn_ratelimited("blkfront: %s: %s op failed\n",
> info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> }
> if (unlikely(bret.status == BLKIF_RSP_ERROR &&
> rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
> - printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
> + pr_warn_ratelimited("blkfront: %s: empty %s op failed\n",
> info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> }
> @@ -1635,8 +1662,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> case BLKIF_OP_READ:
> case BLKIF_OP_WRITE:
> if (unlikely(bret.status != BLKIF_RSP_OKAY))
> - dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
> - "request: %x\n", bret.status);
> + dev_dbg_ratelimited(&info->xbdev->dev,
> + "Bad return from blkdev data request: %x\n", bret.status);
>
> break;
> default:
> @@ -1662,6 +1689,11 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>
> return IRQ_HANDLED;
> +
> + err:
> + info->connected = BLKIF_STATE_ERROR;
> + pr_alert("%s disabled for further use\n", info->gd->disk_name);
> + return IRQ_HANDLED;
> }
>
>
>



2021-07-30 10:33:31

by Juergen Gross

Subject: Re: [PATCH v2 3/3] xen/blkfront: don't trust the backend response data blindly

On 30.07.21 12:08, Juergen Gross wrote:
> On 08.07.21 14:43, Juergen Gross wrote:
>> Today blkfront will trust the backend to send only sane response data.
>> In order to avoid privilege escalations or crashes in case of malicious
>> backends verify the data to be within expected limits. Especially make
>> sure that the response always references an outstanding request.
>>
>> Introduce a new state of the ring BLKIF_STATE_ERROR which will be
>> switched to in case an inconsistency is being detected. Recovering from
>> this state is possible only via removing and adding the virtual device
>> again (e.g. via a suspend/resume cycle).
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>
> Any comments for this patch?

Please ignore this request for comments, I already got some.


Juergen

