2021-05-13 10:18:07

by Jürgen Groß

Subject: [PATCH 0/8] xen: harden frontends against malicious backends

Xen backends of para-virtualized devices can live in dom0 kernel, dom0
user land, or in a driver domain. This means that a backend might
reside in a less trusted environment than the Xen core components, so
a backend should not be able to do harm to a Xen guest (it can still
mess up I/O data, but it shouldn't be able to e.g. crash a guest by
other means or cause a privilege escalation in the guest).

Unfortunately, many frontends in the Linux kernel fully trust their
respective backends. This series starts to fix the most important
frontends: console, disk, and network.

Handling this as a security issue was considered, but the topic had
already been discussed in public, so it isn't really a secret.

Juergen Gross (8):
xen: sync include/xen/interface/io/ring.h with Xen's newest version
xen/blkfront: read response from backend only once
xen/blkfront: don't take local copy of a request from the ring page
xen/blkfront: don't trust the backend response data blindly
xen/netfront: read response from backend only once
xen/netfront: don't read data from request on the ring page
xen/netfront: don't trust the backend response data blindly
xen/hvc: replace BUG_ON() with negative return value

drivers/block/xen-blkfront.c | 118 +++++++++-----
drivers/net/xen-netfront.c | 184 ++++++++++++++-------
drivers/tty/hvc/hvc_xen.c | 15 +-
include/xen/interface/io/ring.h | 278 ++++++++++++++++++--------------
4 files changed, 369 insertions(+), 226 deletions(-)

--
2.26.2



2021-05-13 10:21:05

by Jürgen Groß

Subject: [PATCH 3/8] xen/blkfront: don't take local copy of a request from the ring page

In order to avoid a malicious backend being able to influence the local
copy of a request, build the request locally first and then copy it to
the ring page, instead of doing it the other way round as done today.
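
The resulting pattern, as a condensed sketch (illustration only; the
real code is in the diff below):

    /* Get the slot on the shared ring page, but don't write it yet. */
    id = blkif_ring_get_request(rinfo, req, &final_ring_req);

    /* Build the request in the frontend-private shadow copy ... */
    ring_req = &rinfo->shadow[id].req;
    ring_req->operation = BLKIF_OP_DISCARD;
    /* ... fill in all remaining fields locally ... */

    /* ... and copy it to the ring page only as the very last step. */
    *final_ring_req = *ring_req;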

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/block/xen-blkfront.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a8b56c153330..c6a05de4f15f 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -546,7 +546,7 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
rinfo->shadow[id].status = REQ_WAITING;
rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;

- (*ring_req)->u.rw.id = id;
+ rinfo->shadow[id].req.u.rw.id = id;

return id;
}
@@ -554,11 +554,12 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_info *rinfo)
{
struct blkfront_info *info = rinfo->dev_info;
- struct blkif_request *ring_req;
+ struct blkif_request *ring_req, *final_ring_req;
unsigned long id;

/* Fill out a communications ring structure. */
- id = blkif_ring_get_request(rinfo, req, &ring_req);
+ id = blkif_ring_get_request(rinfo, req, &final_ring_req);
+ ring_req = &rinfo->shadow[id].req;

ring_req->operation = BLKIF_OP_DISCARD;
ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
@@ -569,8 +570,8 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf
else
ring_req->u.discard.flag = 0;

- /* Keep a private copy so we can reissue requests when recovering. */
- rinfo->shadow[id].req = *ring_req;
+ /* Copy the request to the ring page. */
+ *final_ring_req = *ring_req;

return 0;
}
@@ -703,6 +704,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
{
struct blkfront_info *info = rinfo->dev_info;
struct blkif_request *ring_req, *extra_ring_req = NULL;
+ struct blkif_request *final_ring_req, *final_extra_ring_req;
unsigned long id, extra_id = NO_ASSOCIATED_ID;
bool require_extra_req = false;
int i;
@@ -747,7 +749,8 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
}

/* Fill out a communications ring structure. */
- id = blkif_ring_get_request(rinfo, req, &ring_req);
+ id = blkif_ring_get_request(rinfo, req, &final_ring_req);
+ ring_req = &rinfo->shadow[id].req;

num_sg = blk_rq_map_sg(req->q, req, rinfo->shadow[id].sg);
num_grant = 0;
@@ -798,7 +801,9 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
ring_req->u.rw.nr_segments = num_grant;
if (unlikely(require_extra_req)) {
extra_id = blkif_ring_get_request(rinfo, req,
- &extra_ring_req);
+ &final_extra_ring_req);
+ extra_ring_req = &rinfo->shadow[extra_id].req;
+
/*
* Only the first request contains the scatter-gather
* list.
@@ -840,10 +845,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
if (setup.segments)
kunmap_atomic(setup.segments);

- /* Keep a private copy so we can reissue requests when recovering. */
- rinfo->shadow[id].req = *ring_req;
+ /* Copy request(s) to the ring page. */
+ *final_ring_req = *ring_req;
if (unlikely(require_extra_req))
- rinfo->shadow[extra_id].req = *extra_ring_req;
+ *final_extra_ring_req = *extra_ring_req;

if (new_persistent_gnts)
gnttab_free_grant_references(setup.gref_head);
--
2.26.2


2021-05-13 10:23:17

by Jürgen Groß

Subject: [PATCH 6/8] xen/netfront: don't read data from request on the ring page

In order to avoid a malicious backend being able to influence the local
processing of a request, build the request locally first and then copy
it to the ring page. Any reading from the request needs to be done on
the local instance.
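
Condensed, the new flow looks like this (sketch only; see the diff
below for the real code):

    /* Build the request in the local copy kept in the info struct. */
    info->tx_local.id     = id;
    info->tx_local.gref   = ref;
    info->tx_local.offset = offset;
    info->tx_local.size   = len;
    info->tx_local.flags  = 0;

    /* Publish it to the ring page in one go ... */
    *tx = info->tx_local;

    /* ... and do all further reads from info->tx_local, never *tx. */
    info->size += info->tx_local.size;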

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/net/xen-netfront.c | 75 ++++++++++++++++++--------------------
1 file changed, 36 insertions(+), 39 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index f91e41ece554..261c35be0147 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -435,7 +435,8 @@ struct xennet_gnttab_make_txreq {
struct netfront_queue *queue;
struct sk_buff *skb;
struct page *page;
- struct xen_netif_tx_request *tx; /* Last request */
+ struct xen_netif_tx_request *tx; /* Last request on ring page */
+ struct xen_netif_tx_request tx_local; /* Last request local copy*/
unsigned int size;
};

@@ -463,30 +464,27 @@ static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
queue->grant_tx_page[id] = page;
queue->grant_tx_ref[id] = ref;

- tx->id = id;
- tx->gref = ref;
- tx->offset = offset;
- tx->size = len;
- tx->flags = 0;
+ info->tx_local.id = id;
+ info->tx_local.gref = ref;
+ info->tx_local.offset = offset;
+ info->tx_local.size = len;
+ info->tx_local.flags = 0;
+
+ *tx = info->tx_local;

info->tx = tx;
- info->size += tx->size;
+ info->size += info->tx_local.size;
}

static struct xen_netif_tx_request *xennet_make_first_txreq(
- struct netfront_queue *queue, struct sk_buff *skb,
- struct page *page, unsigned int offset, unsigned int len)
+ struct xennet_gnttab_make_txreq *info,
+ unsigned int offset, unsigned int len)
{
- struct xennet_gnttab_make_txreq info = {
- .queue = queue,
- .skb = skb,
- .page = page,
- .size = 0,
- };
+ info->size = 0;

- gnttab_for_one_grant(page, offset, len, xennet_tx_setup_grant, &info);
+ gnttab_for_one_grant(info->page, offset, len, xennet_tx_setup_grant, info);

- return info.tx;
+ return info->tx;
}

static void xennet_make_one_txreq(unsigned long gfn, unsigned int offset,
@@ -499,35 +497,27 @@ static void xennet_make_one_txreq(unsigned long gfn, unsigned int offset,
xennet_tx_setup_grant(gfn, offset, len, data);
}

-static struct xen_netif_tx_request *xennet_make_txreqs(
- struct netfront_queue *queue, struct xen_netif_tx_request *tx,
- struct sk_buff *skb, struct page *page,
+static void xennet_make_txreqs(
+ struct xennet_gnttab_make_txreq *info,
+ struct page *page,
unsigned int offset, unsigned int len)
{
- struct xennet_gnttab_make_txreq info = {
- .queue = queue,
- .skb = skb,
- .tx = tx,
- };
-
/* Skip unused frames from start of page */
page += offset >> PAGE_SHIFT;
offset &= ~PAGE_MASK;

while (len) {
- info.page = page;
- info.size = 0;
+ info->page = page;
+ info->size = 0;

gnttab_foreach_grant_in_range(page, offset, len,
xennet_make_one_txreq,
- &info);
+ info);

page++;
offset = 0;
- len -= info.size;
+ len -= info->size;
}
-
- return info.tx;
}

/*
@@ -580,10 +570,14 @@ static int xennet_xdp_xmit_one(struct net_device *dev,
{
struct netfront_info *np = netdev_priv(dev);
struct netfront_stats *tx_stats = this_cpu_ptr(np->tx_stats);
+ struct xennet_gnttab_make_txreq info = {
+ .queue = queue,
+ .skb = NULL,
+ .page = virt_to_page(xdpf->data),
+ };
int notify;

- xennet_make_first_txreq(queue, NULL,
- virt_to_page(xdpf->data),
+ xennet_make_first_txreq(&info,
offset_in_page(xdpf->data),
xdpf->len);

@@ -647,6 +641,7 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
unsigned int len;
unsigned long flags;
struct netfront_queue *queue = NULL;
+ struct xennet_gnttab_make_txreq info = { };
unsigned int num_queues = dev->real_num_tx_queues;
u16 queue_index;
struct sk_buff *nskb;
@@ -704,14 +699,16 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
}

/* First request for the linear area. */
- first_tx = tx = xennet_make_first_txreq(queue, skb,
- page, offset, len);
+ info.queue = queue;
+ info.skb = skb;
+ info.page = page;
+ first_tx = tx = xennet_make_first_txreq(&info, offset, len);
offset += tx->size;
if (offset == PAGE_SIZE) {
page++;
offset = 0;
}
- len -= tx->size;
+ len -= info.tx_local.size;

if (skb->ip_summed == CHECKSUM_PARTIAL)
/* local packet? */
@@ -741,12 +738,12 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
}

/* Requests for the rest of the linear area. */
- tx = xennet_make_txreqs(queue, tx, skb, page, offset, len);
+ xennet_make_txreqs(&info, page, offset, len);

/* Requests for all the frags. */
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
- tx = xennet_make_txreqs(queue, tx, skb, skb_frag_page(frag),
+ xennet_make_txreqs(&info, skb_frag_page(frag),
skb_frag_off(frag),
skb_frag_size(frag));
}
--
2.26.2


2021-05-13 10:24:42

by Jürgen Groß

Subject: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s that trigger when reading
illegal data from the ring page with negative return values.
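
The shape of the change, sketched (note that the patch as posted wraps
the check in WARN_ONCE(); see the review discussion further below):

    if ((prod - cons) > sizeof(intf->out))
        return -EINVAL;  /* was: BUG_ON((prod - cons) > sizeof(intf->out)); */

    /* callers now have to check for the error: */
    int sent = __write_console(cons, data, len);

    if (sent < 0)
        return sent;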

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/tty/hvc/hvc_xen.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..30d7ffb1e04c 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
cons = intf->out_cons;
prod = intf->out_prod;
mb(); /* update queue values before going on */
+
+ if (WARN_ONCE((prod - cons) > sizeof(intf->out),
+ "Illegal ring page indices"))
+ return -EINVAL;
+
BUG_ON((prod - cons) > sizeof(intf->out));

while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
@@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const char *data, int len)
*/
while (len) {
int sent = __write_console(cons, data, len);
-
+
+ if (sent < 0)
+ return sent;
+
data += sent;
len -= sent;

@@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len)
cons = intf->in_cons;
prod = intf->in_prod;
mb(); /* get pointers before reading ring */
- BUG_ON((prod - cons) > sizeof(intf->in));
+
+ if (WARN_ONCE((prod - cons) > sizeof(intf->in),
+ "Illegal ring page indices"))
+ return -EINVAL;

while (cons != prod && recv < len)
buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];
--
2.26.2


2021-05-13 11:31:33

by Jürgen Groß

Subject: [PATCH 1/8] xen: sync include/xen/interface/io/ring.h with Xen's newest version

Sync include/xen/interface/io/ring.h with Xen's newest version in
order to get the RING_COPY_RESPONSE() and RING_RESPONSE_PROD_OVERFLOW()
macros.

Note that this also corrects the wrong license information by adding
the missing original copyright notice.
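
A minimal usage sketch of the two new macros, mirroring how the later
patches in this series use them (illustrative only):

    struct blkif_response bret;

    rp = rinfo->ring.sring->rsp_prod;
    if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp))
        goto err;    /* backend claims more responses than possible */
    rmb();           /* ensure we see queued responses up to 'rp' */

    /* snapshot the response instead of using a pointer into the ring */
    RING_COPY_RESPONSE(&rinfo->ring, i, &bret);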

Signed-off-by: Juergen Gross <[email protected]>
---
include/xen/interface/io/ring.h | 278 ++++++++++++++++++--------------
1 file changed, 156 insertions(+), 122 deletions(-)

diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
index 2af7a1cd6658..b39cdbc522ec 100644
--- a/include/xen/interface/io/ring.h
+++ b/include/xen/interface/io/ring.h
@@ -1,21 +1,53 @@
-/* SPDX-License-Identifier: GPL-2.0 */
/******************************************************************************
* ring.h
*
* Shared producer-consumer ring macros.
*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
* Tim Deegan and Andrew Warfield November 2004.
*/

#ifndef __XEN_PUBLIC_IO_RING_H__
#define __XEN_PUBLIC_IO_RING_H__

+/*
+ * When #include'ing this header, you need to provide the following
+ * declaration upfront:
+ * - standard integers types (uint8_t, uint16_t, etc)
+ * They are provided by stdint.h of the standard headers.
+ *
+ * In addition, if you intend to use the FLEX macros, you also need to
+ * provide the following, before invoking the FLEX macros:
+ * - size_t
+ * - memcpy
+ * - grant_ref_t
+ * These declarations are provided by string.h of the standard headers,
+ * and grant_table.h from the Xen public headers.
+ */
+
#include <xen/interface/grant_table.h>

typedef unsigned int RING_IDX;

/* Round a 32-bit unsigned constant down to the nearest power of two. */
-#define __RD2(_x) (((_x) & 0x00000002) ? 0x2 : ((_x) & 0x1))
+#define __RD2(_x) (((_x) & 0x00000002) ? 0x2 : ((_x) & 0x1))
#define __RD4(_x) (((_x) & 0x0000000c) ? __RD2((_x)>>2)<<2 : __RD2(_x))
#define __RD8(_x) (((_x) & 0x000000f0) ? __RD4((_x)>>4)<<4 : __RD4(_x))
#define __RD16(_x) (((_x) & 0x0000ff00) ? __RD8((_x)>>8)<<8 : __RD8(_x))
@@ -27,82 +59,79 @@ typedef unsigned int RING_IDX;
* A ring contains as many entries as will fit, rounded down to the nearest
* power of two (so we can mask with (size-1) to loop around).
*/
-#define __CONST_RING_SIZE(_s, _sz) \
- (__RD32(((_sz) - offsetof(struct _s##_sring, ring)) / \
- sizeof(((struct _s##_sring *)0)->ring[0])))
-
+#define __CONST_RING_SIZE(_s, _sz) \
+ (__RD32(((_sz) - offsetof(struct _s##_sring, ring)) / \
+ sizeof(((struct _s##_sring *)0)->ring[0])))
/*
* The same for passing in an actual pointer instead of a name tag.
*/
-#define __RING_SIZE(_s, _sz) \
- (__RD32(((_sz) - (long)&(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0])))
+#define __RING_SIZE(_s, _sz) \
+ (__RD32(((_sz) - (long)(_s)->ring + (long)(_s)) / sizeof((_s)->ring[0])))

/*
* Macros to make the correct C datatypes for a new kind of ring.
*
* To make a new ring datatype, you need to have two message structures,
- * let's say struct request, and struct response already defined.
+ * let's say request_t, and response_t already defined.
*
* In a header where you want the ring datatype declared, you then do:
*
- * DEFINE_RING_TYPES(mytag, struct request, struct response);
+ * DEFINE_RING_TYPES(mytag, request_t, response_t);
*
* These expand out to give you a set of types, as you can see below.
* The most important of these are:
*
- * struct mytag_sring - The shared ring.
- * struct mytag_front_ring - The 'front' half of the ring.
- * struct mytag_back_ring - The 'back' half of the ring.
+ * mytag_sring_t - The shared ring.
+ * mytag_front_ring_t - The 'front' half of the ring.
+ * mytag_back_ring_t - The 'back' half of the ring.
*
* To initialize a ring in your code you need to know the location and size
* of the shared memory area (PAGE_SIZE, for instance). To initialise
* the front half:
*
- * struct mytag_front_ring front_ring;
- * SHARED_RING_INIT((struct mytag_sring *)shared_page);
- * FRONT_RING_INIT(&front_ring, (struct mytag_sring *)shared_page,
- * PAGE_SIZE);
+ * mytag_front_ring_t front_ring;
+ * SHARED_RING_INIT((mytag_sring_t *)shared_page);
+ * FRONT_RING_INIT(&front_ring, (mytag_sring_t *)shared_page, PAGE_SIZE);
*
* Initializing the back follows similarly (note that only the front
* initializes the shared ring):
*
- * struct mytag_back_ring back_ring;
- * BACK_RING_INIT(&back_ring, (struct mytag_sring *)shared_page,
- * PAGE_SIZE);
+ * mytag_back_ring_t back_ring;
+ * BACK_RING_INIT(&back_ring, (mytag_sring_t *)shared_page, PAGE_SIZE);
*/

-#define DEFINE_RING_TYPES(__name, __req_t, __rsp_t) \
- \
-/* Shared ring entry */ \
-union __name##_sring_entry { \
- __req_t req; \
- __rsp_t rsp; \
-}; \
- \
-/* Shared ring page */ \
-struct __name##_sring { \
- RING_IDX req_prod, req_event; \
- RING_IDX rsp_prod, rsp_event; \
- uint8_t pad[48]; \
- union __name##_sring_entry ring[1]; /* variable-length */ \
-}; \
- \
-/* "Front" end's private variables */ \
-struct __name##_front_ring { \
- RING_IDX req_prod_pvt; \
- RING_IDX rsp_cons; \
- unsigned int nr_ents; \
- struct __name##_sring *sring; \
-}; \
- \
-/* "Back" end's private variables */ \
-struct __name##_back_ring { \
- RING_IDX rsp_prod_pvt; \
- RING_IDX req_cons; \
- unsigned int nr_ents; \
- struct __name##_sring *sring; \
-};
-
+#define DEFINE_RING_TYPES(__name, __req_t, __rsp_t) \
+ \
+/* Shared ring entry */ \
+union __name##_sring_entry { \
+ __req_t req; \
+ __rsp_t rsp; \
+}; \
+ \
+/* Shared ring page */ \
+struct __name##_sring { \
+ RING_IDX req_prod, req_event; \
+ RING_IDX rsp_prod, rsp_event; \
+ uint8_t __pad[48]; \
+ union __name##_sring_entry ring[1]; /* variable-length */ \
+}; \
+ \
+/* "Front" end's private variables */ \
+struct __name##_front_ring { \
+ RING_IDX req_prod_pvt; \
+ RING_IDX rsp_cons; \
+ unsigned int nr_ents; \
+ struct __name##_sring *sring; \
+}; \
+ \
+/* "Back" end's private variables */ \
+struct __name##_back_ring { \
+ RING_IDX rsp_prod_pvt; \
+ RING_IDX req_cons; \
+ unsigned int nr_ents; \
+ struct __name##_sring *sring; \
+}; \
+ \
/*
* Macros for manipulating rings.
*
@@ -119,94 +148,99 @@ struct __name##_back_ring { \
*/

/* Initialising empty rings */
-#define SHARED_RING_INIT(_s) do { \
- (_s)->req_prod = (_s)->rsp_prod = 0; \
- (_s)->req_event = (_s)->rsp_event = 1; \
- memset((_s)->pad, 0, sizeof((_s)->pad)); \
+#define SHARED_RING_INIT(_s) do { \
+ (_s)->req_prod = (_s)->rsp_prod = 0; \
+ (_s)->req_event = (_s)->rsp_event = 1; \
+ (void)memset((_s)->__pad, 0, sizeof((_s)->__pad)); \
} while(0)

-#define FRONT_RING_ATTACH(_r, _s, _i, __size) do { \
- (_r)->req_prod_pvt = (_i); \
- (_r)->rsp_cons = (_i); \
- (_r)->nr_ents = __RING_SIZE(_s, __size); \
- (_r)->sring = (_s); \
+#define FRONT_RING_ATTACH(_r, _s, _i, __size) do { \
+ (_r)->req_prod_pvt = (_i); \
+ (_r)->rsp_cons = (_i); \
+ (_r)->nr_ents = __RING_SIZE(_s, __size); \
+ (_r)->sring = (_s); \
} while (0)

#define FRONT_RING_INIT(_r, _s, __size) FRONT_RING_ATTACH(_r, _s, 0, __size)

-#define BACK_RING_ATTACH(_r, _s, _i, __size) do { \
- (_r)->rsp_prod_pvt = (_i); \
- (_r)->req_cons = (_i); \
- (_r)->nr_ents = __RING_SIZE(_s, __size); \
- (_r)->sring = (_s); \
+#define BACK_RING_ATTACH(_r, _s, _i, __size) do { \
+ (_r)->rsp_prod_pvt = (_i); \
+ (_r)->req_cons = (_i); \
+ (_r)->nr_ents = __RING_SIZE(_s, __size); \
+ (_r)->sring = (_s); \
} while (0)

#define BACK_RING_INIT(_r, _s, __size) BACK_RING_ATTACH(_r, _s, 0, __size)

/* How big is this ring? */
-#define RING_SIZE(_r) \
+#define RING_SIZE(_r) \
((_r)->nr_ents)

/* Number of free requests (for use on front side only). */
-#define RING_FREE_REQUESTS(_r) \
+#define RING_FREE_REQUESTS(_r) \
(RING_SIZE(_r) - ((_r)->req_prod_pvt - (_r)->rsp_cons))

/* Test if there is an empty slot available on the front ring.
* (This is only meaningful from the front. )
*/
-#define RING_FULL(_r) \
+#define RING_FULL(_r) \
(RING_FREE_REQUESTS(_r) == 0)

/* Test if there are outstanding messages to be processed on a ring. */
-#define RING_HAS_UNCONSUMED_RESPONSES(_r) \
+#define RING_HAS_UNCONSUMED_RESPONSES(_r) \
((_r)->sring->rsp_prod - (_r)->rsp_cons)

-#define RING_HAS_UNCONSUMED_REQUESTS(_r) \
- ({ \
- unsigned int req = (_r)->sring->req_prod - (_r)->req_cons; \
- unsigned int rsp = RING_SIZE(_r) - \
- ((_r)->req_cons - (_r)->rsp_prod_pvt); \
- req < rsp ? req : rsp; \
- })
+#define RING_HAS_UNCONSUMED_REQUESTS(_r) ({ \
+ unsigned int req = (_r)->sring->req_prod - (_r)->req_cons; \
+ unsigned int rsp = RING_SIZE(_r) - \
+ ((_r)->req_cons - (_r)->rsp_prod_pvt); \
+ req < rsp ? req : rsp; \
+})

/* Direct access to individual ring elements, by index. */
-#define RING_GET_REQUEST(_r, _idx) \
+#define RING_GET_REQUEST(_r, _idx) \
(&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].req))

+#define RING_GET_RESPONSE(_r, _idx) \
+ (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp))
+
/*
- * Get a local copy of a request.
+ * Get a local copy of a request/response.
*
- * Use this in preference to RING_GET_REQUEST() so all processing is
+ * Use this in preference to RING_GET_{REQUEST,RESPONSE}() so all processing is
* done on a local copy that cannot be modified by the other end.
*
* Note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 may cause this
- * to be ineffective where _req is a struct which consists of only bitfields.
+ * to be ineffective where dest is a struct which consists of only bitfields.
*/
-#define RING_COPY_REQUEST(_r, _idx, _req) do { \
- /* Use volatile to force the copy into _req. */ \
- *(_req) = *(volatile typeof(_req))RING_GET_REQUEST(_r, _idx); \
+#define RING_COPY_(type, r, idx, dest) do { \
+ /* Use volatile to force the copy into dest. */ \
+ *(dest) = *(volatile typeof(dest))RING_GET_##type(r, idx); \
} while (0)

-#define RING_GET_RESPONSE(_r, _idx) \
- (&((_r)->sring->ring[((_idx) & (RING_SIZE(_r) - 1))].rsp))
+#define RING_COPY_REQUEST(r, idx, req) RING_COPY_(REQUEST, r, idx, req)
+#define RING_COPY_RESPONSE(r, idx, rsp) RING_COPY_(RESPONSE, r, idx, rsp)

/* Loop termination condition: Would the specified index overflow the ring? */
-#define RING_REQUEST_CONS_OVERFLOW(_r, _cons) \
+#define RING_REQUEST_CONS_OVERFLOW(_r, _cons) \
(((_cons) - (_r)->rsp_prod_pvt) >= RING_SIZE(_r))

/* Ill-behaved frontend determination: Can there be this many requests? */
-#define RING_REQUEST_PROD_OVERFLOW(_r, _prod) \
+#define RING_REQUEST_PROD_OVERFLOW(_r, _prod) \
(((_prod) - (_r)->rsp_prod_pvt) > RING_SIZE(_r))

+/* Ill-behaved backend determination: Can there be this many responses? */
+#define RING_RESPONSE_PROD_OVERFLOW(_r, _prod) \
+ (((_prod) - (_r)->rsp_cons) > RING_SIZE(_r))

-#define RING_PUSH_REQUESTS(_r) do { \
- virt_wmb(); /* back sees requests /before/ updated producer index */ \
- (_r)->sring->req_prod = (_r)->req_prod_pvt; \
+#define RING_PUSH_REQUESTS(_r) do { \
+ virt_wmb(); /* back sees requests /before/ updated producer index */\
+ (_r)->sring->req_prod = (_r)->req_prod_pvt; \
} while (0)

-#define RING_PUSH_RESPONSES(_r) do { \
- virt_wmb(); /* front sees responses /before/ updated producer index */ \
- (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \
+#define RING_PUSH_RESPONSES(_r) do { \
+ virt_wmb(); /* front sees resps /before/ updated producer index */ \
+ (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \
} while (0)

/*
@@ -239,40 +273,40 @@ struct __name##_back_ring { \
* field appropriately.
*/

-#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \
- RING_IDX __old = (_r)->sring->req_prod; \
- RING_IDX __new = (_r)->req_prod_pvt; \
- virt_wmb(); /* back sees requests /before/ updated producer index */ \
- (_r)->sring->req_prod = __new; \
- virt_mb(); /* back sees new requests /before/ we check req_event */ \
- (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \
- (RING_IDX)(__new - __old)); \
+#define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \
+ RING_IDX __old = (_r)->sring->req_prod; \
+ RING_IDX __new = (_r)->req_prod_pvt; \
+ virt_wmb(); /* back sees requests /before/ updated producer index */\
+ (_r)->sring->req_prod = __new; \
+ virt_mb(); /* back sees new requests /before/ we check req_event */ \
+ (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \
+ (RING_IDX)(__new - __old)); \
} while (0)

-#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \
- RING_IDX __old = (_r)->sring->rsp_prod; \
- RING_IDX __new = (_r)->rsp_prod_pvt; \
- virt_wmb(); /* front sees responses /before/ updated producer index */ \
- (_r)->sring->rsp_prod = __new; \
- virt_mb(); /* front sees new responses /before/ we check rsp_event */ \
- (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \
- (RING_IDX)(__new - __old)); \
+#define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \
+ RING_IDX __old = (_r)->sring->rsp_prod; \
+ RING_IDX __new = (_r)->rsp_prod_pvt; \
+ virt_wmb(); /* front sees resps /before/ updated producer index */ \
+ (_r)->sring->rsp_prod = __new; \
+ virt_mb(); /* front sees new resps /before/ we check rsp_event */ \
+ (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \
+ (RING_IDX)(__new - __old)); \
} while (0)

-#define RING_FINAL_CHECK_FOR_REQUESTS(_r, _work_to_do) do { \
- (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
- if (_work_to_do) break; \
- (_r)->sring->req_event = (_r)->req_cons + 1; \
- virt_mb(); \
- (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
+#define RING_FINAL_CHECK_FOR_REQUESTS(_r, _work_to_do) do { \
+ (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
+ if (_work_to_do) break; \
+ (_r)->sring->req_event = (_r)->req_cons + 1; \
+ virt_mb(); \
+ (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \
} while (0)

-#define RING_FINAL_CHECK_FOR_RESPONSES(_r, _work_to_do) do { \
- (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
- if (_work_to_do) break; \
- (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \
- virt_mb(); \
- (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
+#define RING_FINAL_CHECK_FOR_RESPONSES(_r, _work_to_do) do { \
+ (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
+ if (_work_to_do) break; \
+ (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \
+ virt_mb(); \
+ (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
} while (0)


--
2.26.2


2021-05-13 11:31:40

by Jürgen Groß

Subject: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

Today blkfront trusts the backend to send only sane response data. In
order to avoid privilege escalations or crashes in case of malicious
backends, verify that the data is within expected limits. Especially
make sure that the response always references an outstanding request.

Introduce a new ring state, BLKIF_STATE_ERROR, which is entered in case
an inconsistency is detected.
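
Condensed, the per-response checks added by this patch look like this
(sketch only; the full logic is in the diff below):

    if (id >= BLK_RING_SIZE(info))                  /* id out of range */
        goto err;
    if (rinfo->shadow[id].status != REQ_WAITING)    /* no pending request */
        goto err;
    if (bret.operation != op)                       /* operation mismatch */
        goto err;
    ...
 err:
    info->connected = BLKIF_STATE_ERROR;            /* disable further use */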

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/block/xen-blkfront.c | 62 +++++++++++++++++++++++++++---------
1 file changed, 47 insertions(+), 15 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index c6a05de4f15f..aa0f159829b4 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -80,6 +80,7 @@ enum blkif_state {
BLKIF_STATE_DISCONNECTED,
BLKIF_STATE_CONNECTED,
BLKIF_STATE_SUSPENDED,
+ BLKIF_STATE_ERROR,
};

struct grant {
@@ -89,6 +90,7 @@ struct grant {
};

enum blk_req_status {
+ REQ_PROCESSING,
REQ_WAITING,
REQ_DONE,
REQ_ERROR,
@@ -543,7 +545,7 @@ static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,

id = get_id_from_freelist(rinfo);
rinfo->shadow[id].request = req;
- rinfo->shadow[id].status = REQ_WAITING;
+ rinfo->shadow[id].status = REQ_PROCESSING;
rinfo->shadow[id].associated_id = NO_ASSOCIATED_ID;

rinfo->shadow[id].req.u.rw.id = id;
@@ -572,6 +574,7 @@ static int blkif_queue_discard_req(struct request *req, struct blkfront_ring_inf

/* Copy the request to the ring page. */
*final_ring_req = *ring_req;
+ rinfo->shadow[id].status = REQ_WAITING;

return 0;
}
@@ -847,8 +850,11 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri

/* Copy request(s) to the ring page. */
*final_ring_req = *ring_req;
- if (unlikely(require_extra_req))
+ rinfo->shadow[id].status = REQ_WAITING;
+ if (unlikely(require_extra_req)) {
*final_extra_ring_req = *extra_ring_req;
+ rinfo->shadow[extra_id].status = REQ_WAITING;
+ }

if (new_persistent_gnts)
gnttab_free_grant_references(setup.gref_head);
@@ -1420,8 +1426,8 @@ static enum blk_req_status blkif_rsp_to_req_status(int rsp)
static int blkif_get_final_status(enum blk_req_status s1,
enum blk_req_status s2)
{
- BUG_ON(s1 == REQ_WAITING);
- BUG_ON(s2 == REQ_WAITING);
+ BUG_ON(s1 < REQ_DONE);
+ BUG_ON(s2 < REQ_DONE);

if (s1 == REQ_ERROR || s2 == REQ_ERROR)
return BLKIF_RSP_ERROR;
@@ -1454,7 +1460,7 @@ static bool blkif_completion(unsigned long *id,
s->status = blkif_rsp_to_req_status(bret->status);

/* Wait the second response if not yet here. */
- if (s2->status == REQ_WAITING)
+ if (s2->status < REQ_DONE)
return false;

bret->status = blkif_get_final_status(s->status,
@@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
spin_lock_irqsave(&rinfo->ring_lock, flags);
again:
rp = rinfo->ring.sring->rsp_prod;
+ if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
+ pr_alert("%s: illegal number of responses %u\n",
+ info->gd->disk_name, rp - rinfo->ring.rsp_cons);
+ goto err;
+ }
rmb(); /* Ensure we see queued responses up to 'rp'. */

for (i = rinfo->ring.rsp_cons; i != rp; i++) {
unsigned long id;
+ unsigned int op;

RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
id = bret.id;
@@ -1588,14 +1600,28 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
* look in get_id_from_freelist.
*/
if (id >= BLK_RING_SIZE(info)) {
- WARN(1, "%s: response to %s has incorrect id (%ld)\n",
- info->gd->disk_name, op_name(bret.operation), id);
- /* We can't safely get the 'struct request' as
- * the id is busted. */
- continue;
+ pr_alert("%s: response has incorrect id (%ld)\n",
+ info->gd->disk_name, id);
+ goto err;
}
+ if (rinfo->shadow[id].status != REQ_WAITING) {
+ pr_alert("%s: response references no pending request\n",
+ info->gd->disk_name);
+ goto err;
+ }
+
+ rinfo->shadow[id].status = REQ_PROCESSING;
req = rinfo->shadow[id].request;

+ op = rinfo->shadow[id].req.operation;
+ if (op == BLKIF_OP_INDIRECT)
+ op = rinfo->shadow[id].req.u.indirect.indirect_op;
+ if (bret.operation != op) {
+ pr_alert("%s: response has wrong operation (%u instead of %u)\n",
+ info->gd->disk_name, bret.operation, op);
+ goto err;
+ }
+
if (bret.operation != BLKIF_OP_DISCARD) {
/*
* We may need to wait for an extra response if the
@@ -1620,7 +1646,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
case BLKIF_OP_DISCARD:
if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
- printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+
+ pr_warn_ratelimited("blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
info->feature_discard = 0;
@@ -1632,13 +1659,13 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
- printk(KERN_WARNING "blkfront: %s: %s op failed\n",
+ pr_warn_ratelimited("blkfront: %s: %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
if (unlikely(bret.status == BLKIF_RSP_ERROR &&
rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
- printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
+ pr_warn_ratelimited("blkfront: %s: empty %s op failed\n",
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
@@ -1653,8 +1680,8 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
if (unlikely(bret.status != BLKIF_RSP_OKAY))
- dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
- "request: %x\n", bret.status);
+ dev_dbg_ratelimited(&info->xbdev->dev,
+ "Bad return from blkdev data request: %x\n", bret.status);

break;
default:
@@ -1680,6 +1707,11 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
spin_unlock_irqrestore(&rinfo->ring_lock, flags);

return IRQ_HANDLED;
+
+ err:
+ info->connected = BLKIF_STATE_ERROR;
+ pr_alert("%s disabled for further use\n", info->gd->disk_name);
+ return IRQ_HANDLED;
}


--
2.26.2


2021-05-13 11:31:55

by Jürgen Groß

Subject: [PATCH 2/8] xen/blkfront: read response from backend only once

In order to avoid problems in case the backend modifies a response on
the ring page while the frontend is already processing it, read the
response into a local buffer in one go and then operate on that buffer
only.
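
In short, the change swaps a pointer into the shared page for a
one-shot local snapshot (sketch):

    /* before: pointer into the ring page, racy against the backend */
    struct blkif_response *bret = RING_GET_RESPONSE(&rinfo->ring, i);

    /* after: copy once into a frontend-local buffer, then use only that */
    struct blkif_response bret;
    RING_COPY_RESPONSE(&rinfo->ring, i, &bret);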

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/block/xen-blkfront.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 10df39a8b18d..a8b56c153330 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1557,7 +1557,7 @@ static bool blkif_completion(unsigned long *id,
static irqreturn_t blkif_interrupt(int irq, void *dev_id)
{
struct request *req;
- struct blkif_response *bret;
+ struct blkif_response bret;
RING_IDX i, rp;
unsigned long flags;
struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
@@ -1574,8 +1574,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
for (i = rinfo->ring.rsp_cons; i != rp; i++) {
unsigned long id;

- bret = RING_GET_RESPONSE(&rinfo->ring, i);
- id = bret->id;
+ RING_COPY_RESPONSE(&rinfo->ring, i, &bret);
+ id = bret.id;
+
/*
* The backend has messed up and given us an id that we would
* never have given to it (we stamp it up to BLK_RING_SIZE -
@@ -1583,39 +1584,39 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
*/
if (id >= BLK_RING_SIZE(info)) {
WARN(1, "%s: response to %s has incorrect id (%ld)\n",
- info->gd->disk_name, op_name(bret->operation), id);
+ info->gd->disk_name, op_name(bret.operation), id);
/* We can't safely get the 'struct request' as
* the id is busted. */
continue;
}
req = rinfo->shadow[id].request;

- if (bret->operation != BLKIF_OP_DISCARD) {
+ if (bret.operation != BLKIF_OP_DISCARD) {
/*
* We may need to wait for an extra response if the
* I/O request is split in 2
*/
- if (!blkif_completion(&id, rinfo, bret))
+ if (!blkif_completion(&id, rinfo, &bret))
continue;
}

if (add_id_to_freelist(rinfo, id)) {
WARN(1, "%s: response to %s (id %ld) couldn't be recycled!\n",
- info->gd->disk_name, op_name(bret->operation), id);
+ info->gd->disk_name, op_name(bret.operation), id);
continue;
}

- if (bret->status == BLKIF_RSP_OKAY)
+ if (bret.status == BLKIF_RSP_OKAY)
blkif_req(req)->error = BLK_STS_OK;
else
blkif_req(req)->error = BLK_STS_IOERR;

- switch (bret->operation) {
+ switch (bret.operation) {
case BLKIF_OP_DISCARD:
- if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+ if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
info->feature_discard = 0;
info->feature_secdiscard = 0;
@@ -1625,15 +1626,15 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
break;
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
- if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+ if (unlikely(bret.status == BLKIF_RSP_EOPNOTSUPP)) {
printk(KERN_WARNING "blkfront: %s: %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
- if (unlikely(bret->status == BLKIF_RSP_ERROR &&
+ if (unlikely(bret.status == BLKIF_RSP_ERROR &&
rinfo->shadow[id].req.u.rw.nr_segments == 0)) {
printk(KERN_WARNING "blkfront: %s: empty %s op failed\n",
- info->gd->disk_name, op_name(bret->operation));
+ info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
if (unlikely(blkif_req(req)->error)) {
@@ -1646,9 +1647,9 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
fallthrough;
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
- if (unlikely(bret->status != BLKIF_RSP_OKAY))
+ if (unlikely(bret.status != BLKIF_RSP_OKAY))
dev_dbg(&info->xbdev->dev, "Bad return from blkdev data "
- "request: %x\n", bret->status);
+ "request: %x\n", bret.status);

break;
default:
--
2.26.2


2021-05-13 11:32:03

by Jürgen Groß

Subject: [PATCH 7/8] xen/netfront: don't trust the backend response data blindly

Today netfront trusts the backend to send only sane response data. In
order to avoid privilege escalations or crashes in case of malicious
backends, verify that the data is within expected limits. Especially
make sure that the response always references an outstanding request.

Note that only the tx queue needs special id handling, as for the rx
queue the id is equal to the index in the ring page.

Introduce a new indicator for whether the device is broken and let the
device stop working when it is set. Set this indicator in case the
backend sends any weird data.
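
Condensed, the tx id handling added by this patch looks like this
(sketch only; the full logic is in the diff below):

    /* when queuing: mark the id as having a request in flight */
    queue->tx_pending[id] = true;
    ...
    /* when consuming a response: only accept ids actually in flight */
    if (id >= RING_SIZE(&queue->tx) || !queue->tx_pending[id])
        goto err;
    queue->tx_pending[id] = false;
    ...
 err:
    queue->info->broken = true;    /* device disabled for further use */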

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/net/xen-netfront.c | 71 +++++++++++++++++++++++++++++++++++---
1 file changed, 67 insertions(+), 4 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 261c35be0147..ccd6d1389b0a 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -154,6 +154,8 @@ struct netfront_queue {

struct page_pool *page_pool;
struct xdp_rxq_info xdp_rxq;
+
+ bool tx_pending[NET_TX_RING_SIZE];
};

struct netfront_info {
@@ -173,6 +175,9 @@ struct netfront_info {
bool netback_has_xdp_headroom;
bool netfront_xdp_enabled;

+ /* Is device behaving sane? */
+ bool broken;
+
atomic_t rx_gso_checksum_fixup;
};

@@ -363,7 +368,7 @@ static int xennet_open(struct net_device *dev)
unsigned int i = 0;
struct netfront_queue *queue = NULL;

- if (!np->queues)
+ if (!np->queues || np->broken)
return -ENODEV;

for (i = 0; i < num_queues; ++i) {
@@ -391,11 +396,17 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
unsigned short id;
struct sk_buff *skb;
bool more_to_do;
+ const struct device *dev = &queue->info->netdev->dev;

BUG_ON(!netif_carrier_ok(queue->info->netdev));

do {
prod = queue->tx.sring->rsp_prod;
+ if (RING_RESPONSE_PROD_OVERFLOW(&queue->tx, prod)) {
+ dev_alert(dev, "Illegal number of responses %u\n",
+ prod - queue->tx.rsp_cons);
+ goto err;
+ }
rmb(); /* Ensure we see responses up to 'rp'. */

for (cons = queue->tx.rsp_cons; cons != prod; cons++) {
@@ -406,12 +417,25 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
continue;

id = txrsp.id;
+ if (id >= RING_SIZE(&queue->tx)) {
+ dev_alert(dev,
+ "Response has incorrect id (%u)\n",
+ id);
+ goto err;
+ }
+ if (!queue->tx_pending[id]) {
+ dev_alert(dev,
+ "Response for inactive request\n");
+ goto err;
+ }
+
+ queue->tx_pending[id] = false;
skb = queue->tx_skbs[id].skb;
if (unlikely(gnttab_query_foreign_access(
queue->grant_tx_ref[id]) != 0)) {
- pr_alert("%s: warning -- grant still in use by backend domain\n",
- __func__);
- BUG();
+ dev_alert(dev,
+ "Grant still in use by backend domain\n");
+ goto err;
}
gnttab_end_foreign_access_ref(
queue->grant_tx_ref[id], GNTMAP_readonly);
@@ -429,6 +453,12 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
} while (more_to_do);

xennet_maybe_wake_tx(queue);
+
+ return;
+
+ err:
+ queue->info->broken = true;
+ dev_alert(dev, "Disabled for further use\n");
}

struct xennet_gnttab_make_txreq {
@@ -472,6 +502,13 @@ static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,

*tx = info->tx_local;

+ /*
+ * The request is not in its final form, as size and flags might be
+ * modified later, but even if a malicious backend will send a response
+ * now, nothing bad regarding security could happen.
+ */
+ queue->tx_pending[id] = true;
+
info->tx = tx;
info->size += info->tx_local.size;
}
@@ -605,6 +642,8 @@ static int xennet_xdp_xmit(struct net_device *dev, int n,
int nxmit = 0;
int i;

+ if (unlikely(np->broken))
+ return -ENODEV;
if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
return -EINVAL;

@@ -649,6 +688,8 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
/* Drop the packet if no queues are set up */
if (num_queues < 1)
goto drop;
+ if (unlikely(np->broken))
+ goto drop;
/* Determine which queue to transmit this SKB on */
queue_index = skb_get_queue_mapping(skb);
queue = &np->queues[queue_index];
@@ -1153,6 +1194,13 @@ static int xennet_poll(struct napi_struct *napi, int budget)
skb_queue_head_init(&tmpq);

rp = queue->rx.sring->rsp_prod;
+ if (RING_RESPONSE_PROD_OVERFLOW(&queue->rx, rp)) {
+ dev_alert(&dev->dev, "Illegal number of responses %u\n",
+ rp - queue->rx.rsp_cons);
+ queue->info->broken = true;
+ spin_unlock(&queue->rx_lock);
+ return 0;
+ }
rmb(); /* Ensure we see queued responses up to 'rp'. */

i = queue->rx.rsp_cons;
@@ -1373,6 +1421,9 @@ static irqreturn_t xennet_tx_interrupt(int irq, void *dev_id)
struct netfront_queue *queue = dev_id;
unsigned long flags;

+ if (queue->info->broken)
+ return IRQ_HANDLED;
+
spin_lock_irqsave(&queue->tx_lock, flags);
xennet_tx_buf_gc(queue);
spin_unlock_irqrestore(&queue->tx_lock, flags);
@@ -1385,6 +1436,9 @@ static irqreturn_t xennet_rx_interrupt(int irq, void *dev_id)
struct netfront_queue *queue = dev_id;
struct net_device *dev = queue->info->netdev;

+ if (queue->info->broken)
+ return IRQ_HANDLED;
+
if (likely(netif_carrier_ok(dev) &&
RING_HAS_UNCONSUMED_RESPONSES(&queue->rx)))
napi_schedule(&queue->napi);
@@ -1406,6 +1460,10 @@ static void xennet_poll_controller(struct net_device *dev)
struct netfront_info *info = netdev_priv(dev);
unsigned int num_queues = dev->real_num_tx_queues;
unsigned int i;
+
+ if (info->broken)
+ return;
+
for (i = 0; i < num_queues; ++i)
xennet_interrupt(0, &info->queues[i]);
}
@@ -1477,6 +1535,11 @@ static int xennet_xdp_set(struct net_device *dev, struct bpf_prog *prog,

static int xennet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
{
+ struct netfront_info *np = netdev_priv(dev);
+
+ if (np->broken)
+ return -ENODEV;
+
switch (xdp->command) {
case XDP_SETUP_PROG:
return xennet_xdp_set(dev, xdp->prog, xdp->extack);
--
2.26.2


2021-05-13 11:32:09

by Jürgen Groß

Subject: [PATCH 5/8] xen/netfront: read response from backend only once

In order to avoid problems in case the backend modifies a response on
the ring page while the frontend is already processing it, read the
response into a local buffer in one go and then operate on that buffer
only.
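
The same snapshot pattern as on the blkfront side, here e.g. for the
rx extras path (one instance from the diff below):

    struct xen_netif_extra_info extra;

    RING_COPY_RESPONSE(&queue->rx, ++cons, &extra);

    /* all further tests (extra.type, extra.flags) use the local copy */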

Signed-off-by: Juergen Gross <[email protected]>
---
drivers/net/xen-netfront.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 44275908d61a..f91e41ece554 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -399,13 +399,13 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
rmb(); /* Ensure we see responses up to 'rp'. */

for (cons = queue->tx.rsp_cons; cons != prod; cons++) {
- struct xen_netif_tx_response *txrsp;
+ struct xen_netif_tx_response txrsp;

- txrsp = RING_GET_RESPONSE(&queue->tx, cons);
- if (txrsp->status == XEN_NETIF_RSP_NULL)
+ RING_COPY_RESPONSE(&queue->tx, cons, &txrsp);
+ if (txrsp.status == XEN_NETIF_RSP_NULL)
continue;

- id = txrsp->id;
+ id = txrsp.id;
skb = queue->tx_skbs[id].skb;
if (unlikely(gnttab_query_foreign_access(
queue->grant_tx_ref[id]) != 0)) {
@@ -814,7 +814,7 @@ static int xennet_get_extras(struct netfront_queue *queue,
RING_IDX rp)

{
- struct xen_netif_extra_info *extra;
+ struct xen_netif_extra_info extra;
struct device *dev = &queue->info->netdev->dev;
RING_IDX cons = queue->rx.rsp_cons;
int err = 0;
@@ -830,24 +830,22 @@ static int xennet_get_extras(struct netfront_queue *queue,
break;
}

- extra = (struct xen_netif_extra_info *)
- RING_GET_RESPONSE(&queue->rx, ++cons);
+ RING_COPY_RESPONSE(&queue->rx, ++cons, &extra);

- if (unlikely(!extra->type ||
- extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
+ if (unlikely(!extra.type ||
+ extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
if (net_ratelimit())
dev_warn(dev, "Invalid extra type: %d\n",
- extra->type);
+ extra.type);
err = -EINVAL;
} else {
- memcpy(&extras[extra->type - 1], extra,
- sizeof(*extra));
+ memcpy(&extras[extra.type - 1], &extra, sizeof(extra));
}

skb = xennet_get_rx_skb(queue, cons);
ref = xennet_get_rx_ref(queue, cons);
xennet_move_rx_slot(queue, skb, ref);
- } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE);
+ } while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);

queue->rx.rsp_cons = cons;
return err;
@@ -905,7 +903,7 @@ static int xennet_get_responses(struct netfront_queue *queue,
struct sk_buff_head *list,
bool *need_xdp_flush)
{
- struct xen_netif_rx_response *rx = &rinfo->rx;
+ struct xen_netif_rx_response *rx = &rinfo->rx, rx_local;
int max = XEN_NETIF_NR_SLOTS_MIN + (rx->status <= RX_COPY_THRESHOLD);
RING_IDX cons = queue->rx.rsp_cons;
struct sk_buff *skb = xennet_get_rx_skb(queue, cons);
@@ -989,7 +987,8 @@ static int xennet_get_responses(struct netfront_queue *queue,
break;
}

- rx = RING_GET_RESPONSE(&queue->rx, cons + slots);
+ RING_COPY_RESPONSE(&queue->rx, cons + slots, &rx_local);
+ rx = &rx_local;
skb = xennet_get_rx_skb(queue, cons + slots);
ref = xennet_get_rx_ref(queue, cons + slots);
slots++;
@@ -1044,10 +1043,11 @@ static int xennet_fill_frags(struct netfront_queue *queue,
struct sk_buff *nskb;

while ((nskb = __skb_dequeue(list))) {
- struct xen_netif_rx_response *rx =
- RING_GET_RESPONSE(&queue->rx, ++cons);
+ struct xen_netif_rx_response rx;
skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];

+ RING_COPY_RESPONSE(&queue->rx, ++cons, &rx);
+
if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) {
unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;

@@ -1062,7 +1062,7 @@ static int xennet_fill_frags(struct netfront_queue *queue,

skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
skb_frag_page(nfrag),
- rx->offset, rx->status, PAGE_SIZE);
+ rx.offset, rx.status, PAGE_SIZE);

skb_shinfo(nskb)->nr_frags = 0;
kfree_skb(nskb);
@@ -1161,7 +1161,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
i = queue->rx.rsp_cons;
work_done = 0;
while ((i != rp) && (work_done < budget)) {
- memcpy(rx, RING_GET_RESPONSE(&queue->rx, i), sizeof(*rx));
+ RING_COPY_RESPONSE(&queue->rx, i, rx);
memset(extras, 0, sizeof(rinfo.extras));

err = xennet_get_responses(queue, &rinfo, rp, &tmpq,
--
2.26.2


2021-05-13 11:38:26

by Christophe Leroy

Subject: Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value



On 13/05/2021 12:03, Juergen Gross wrote:
> Xen frontends shouldn't BUG() in case of illegal data received from
> their backends. So replace the BUG_ON()s when reading illegal data from
> the ring page with negative return values.
>
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> drivers/tty/hvc/hvc_xen.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
> index 92c9a476defc..30d7ffb1e04c 100644
> --- a/drivers/tty/hvc/hvc_xen.c
> +++ b/drivers/tty/hvc/hvc_xen.c
> @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
> cons = intf->out_cons;
> prod = intf->out_prod;
> mb(); /* update queue values before going on */
> +
> + if (WARN_ONCE((prod - cons) > sizeof(intf->out),
> + "Illegal ring page indices"))
> + return -EINVAL;
> +
> BUG_ON((prod - cons) > sizeof(intf->out));

Why keep the BUG_ON() ?


>
> while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
> @@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const char *data, int len)
> */
> while (len) {
> int sent = __write_console(cons, data, len);
> -
> +
> + if (sent < 0)
> + return sent;
> +
> data += sent;
> len -= sent;
>
> @@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len)
> cons = intf->in_cons;
> prod = intf->in_prod;
> mb(); /* get pointers before reading ring */
> - BUG_ON((prod - cons) > sizeof(intf->in));
> +
> + if (WARN_ONCE((prod - cons) > sizeof(intf->in),
> + "Illegal ring page indices"))
> + return -EINVAL;
>
> while (cons != prod && recv < len)
> buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];
>

2021-05-13 11:39:02

by Jürgen Groß

Subject: Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

On 13.05.21 12:16, Christophe Leroy wrote:
>
>
> On 13/05/2021 12:03, Juergen Gross wrote:
>> Xen frontends shouldn't BUG() in case of illegal data received from
>> their backends. So replace the BUG_ON()s when reading illegal data from
>> the ring page with negative return values.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>>   drivers/tty/hvc/hvc_xen.c | 15 +++++++++++++--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
>> index 92c9a476defc..30d7ffb1e04c 100644
>> --- a/drivers/tty/hvc/hvc_xen.c
>> +++ b/drivers/tty/hvc/hvc_xen.c
>> @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info
>> *xencons,
>>       cons = intf->out_cons;
>>       prod = intf->out_prod;
>>       mb();            /* update queue values before going on */
>> +
>> +    if (WARN_ONCE((prod - cons) > sizeof(intf->out),
>> +              "Illegal ring page indices"))
>> +        return -EINVAL;
>> +
>>       BUG_ON((prod - cons) > sizeof(intf->out));
>
> Why keep the BUG_ON() ?

Oh, failed to delete it. Thanks for noticing.


Juergen



2021-05-13 11:39:26

by Greg Kroah-Hartman

Subject: Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

On Thu, May 13, 2021 at 12:03:02PM +0200, Juergen Gross wrote:
> Xen frontends shouldn't BUG() in case of illegal data received from
> their backends. So replace the BUG_ON()s when reading illegal data from
> the ring page with negative return values.
>
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> drivers/tty/hvc/hvc_xen.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
> index 92c9a476defc..30d7ffb1e04c 100644
> --- a/drivers/tty/hvc/hvc_xen.c
> +++ b/drivers/tty/hvc/hvc_xen.c
> @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
> cons = intf->out_cons;
> prod = intf->out_prod;
> mb(); /* update queue values before going on */
> +
> + if (WARN_ONCE((prod - cons) > sizeof(intf->out),
> + "Illegal ring page indices"))
> + return -EINVAL;

How nice, you just rebooted on panic-on-warn systems :(

> +
> BUG_ON((prod - cons) > sizeof(intf->out));

Why keep this line?

Please just fix this up properly, if userspace can trigger this, then
both the WARN_ON() and BUG_ON() are not correct and need to be correctly
handled.
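
One possible shape of the handling being asked for here (a sketch, not
Greg's literal proposal; the pr_err_once() message is an assumption):

    if ((prod - cons) > sizeof(intf->out)) {
        pr_err_once("xencons: illegal ring page indices\n");
        return -EINVAL;
    }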


>
> while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
> @@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const char *data, int len)
> */
> while (len) {
> int sent = __write_console(cons, data, len);
> -
> +
> + if (sent < 0)
> + return sent;
> +
> data += sent;
> len -= sent;
>
> @@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len)
> cons = intf->in_cons;
> prod = intf->in_prod;
> mb(); /* get pointers before reading ring */
> - BUG_ON((prod - cons) > sizeof(intf->in));
> +
> + if (WARN_ONCE((prod - cons) > sizeof(intf->in),
> + "Illegal ring page indices"))
> + return -EINVAL;

Same here, you still just panicked a machine :(

thanks,

greg k-h

2021-05-13 17:45:13

by Jürgen Groß

Subject: Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

On 13.05.21 12:25, Greg Kroah-Hartman wrote:
> On Thu, May 13, 2021 at 12:03:02PM +0200, Juergen Gross wrote:
>> Xen frontends shouldn't BUG() in case of illegal data received from
>> their backends. So replace the BUG_ON()s when reading illegal data from
>> the ring page with negative return values.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> drivers/tty/hvc/hvc_xen.c | 15 +++++++++++++--
>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
>> index 92c9a476defc..30d7ffb1e04c 100644
>> --- a/drivers/tty/hvc/hvc_xen.c
>> +++ b/drivers/tty/hvc/hvc_xen.c
>> @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
>> cons = intf->out_cons;
>> prod = intf->out_prod;
>> mb(); /* update queue values before going on */
>> +
>> + if (WARN_ONCE((prod - cons) > sizeof(intf->out),
>> + "Illegal ring page indices"))
>> + return -EINVAL;
>
> How nice, you just rebooted on panic-on-warn systems :(
>
>> +
>> BUG_ON((prod - cons) > sizeof(intf->out));
>
> Why keep this line?

Failed to delete it, sorry.

>
> Please just fix this up properly, if userspace can trigger this, then
> both the WARN_ON() and BUG_ON() are not correct and need to be correctly
> handled.

It can be triggered by the console backend, but I agree a WARN isn't the
way to go here.


Juergen



2021-05-17 18:45:29

by Jan Beulich

Subject: Re: [PATCH 2/8] xen/blkfront: read response from backend only once

On 13.05.2021 12:02, Juergen Gross wrote:
> In order to avoid problems in case the backend is modifying a response
> on the ring page while the frontend has already seen it, just read the
> response into a local buffer in one go and then operate on that buffer
> only.
>
> Signed-off-by: Juergen Gross <[email protected]>

Reviewed-by: Jan Beulich <[email protected]>


2021-05-17 18:57:11

by Jan Beulich

Subject: Re: [PATCH 3/8] xen/blkfront: don't take local copy of a request from the ring page

On 13.05.2021 12:02, Juergen Gross wrote:
> In order to avoid a malicious backend being able to influence the local
> copy of a request build the request locally first and then copy it to
> the ring page instead of doing it the other way round as today.
>
> Signed-off-by: Juergen Gross <[email protected]>

Reviewed-by: Jan Beulich <[email protected]>
with one remark/question:

> @@ -703,6 +704,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
> {
> struct blkfront_info *info = rinfo->dev_info;
> struct blkif_request *ring_req, *extra_ring_req = NULL;
> + struct blkif_request *final_ring_req, *final_extra_ring_req;

Without setting final_extra_ring_req to NULL just like is done for
extra_ring_req, ...

> @@ -840,10 +845,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
> if (setup.segments)
> kunmap_atomic(setup.segments);
>
> - /* Keep a private copy so we can reissue requests when recovering. */
> - rinfo->shadow[id].req = *ring_req;
> + /* Copy request(s) to the ring page. */
> + *final_ring_req = *ring_req;
> if (unlikely(require_extra_req))
> - rinfo->shadow[extra_id].req = *extra_ring_req;
> + *final_extra_ring_req = *extra_ring_req;

... are you sure all supported compilers will recognize the
conditional use and not warn about use of a possibly uninitialized
variable?

Jan

2021-05-17 22:30:47

by Jan Beulich

Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 13.05.2021 12:02, Juergen Gross wrote:
> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> spin_lock_irqsave(&rinfo->ring_lock, flags);
> again:
> rp = rinfo->ring.sring->rsp_prod;
> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
> + pr_alert("%s: illegal number of responses %u\n",
> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
> + goto err;
> + }
> rmb(); /* Ensure we see queued responses up to 'rp'. */

I think you want to insert after the barrier.

> @@ -1680,6 +1707,11 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>
> return IRQ_HANDLED;
> +
> + err:
> + info->connected = BLKIF_STATE_ERROR;
> + pr_alert("%s disabled for further use\n", info->gd->disk_name);
> + return IRQ_HANDLED;
> }

Am I understanding that a suspend (and then resume) can be used to
recover from error state? If so - is this intentional? If so in turn,
would it make sense to spell this out in the description?

Jan

2021-05-17 22:30:55

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 3/8] xen/blkfront: don't take local copy of a request from the ring page

On 17.05.21 16:01, Jan Beulich wrote:
> On 13.05.2021 12:02, Juergen Gross wrote:
>> In order to avoid a malicious backend being able to influence the local
>> copy of a request build the request locally first and then copy it to
>> the ring page instead of doing it the other way round as today.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>
> Reviewed-by: Jan Beulich <[email protected]>
> with one remark/question:
>
>> @@ -703,6 +704,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>> {
>> struct blkfront_info *info = rinfo->dev_info;
>> struct blkif_request *ring_req, *extra_ring_req = NULL;
>> + struct blkif_request *final_ring_req, *final_extra_ring_req;
>
> Without setting final_extra_ring_req to NULL just like is done for
> extra_ring_req, ...
>
>> @@ -840,10 +845,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
>> if (setup.segments)
>> kunmap_atomic(setup.segments);
>>
>> - /* Keep a private copy so we can reissue requests when recovering. */
>> - rinfo->shadow[id].req = *ring_req;
>> + /* Copy request(s) to the ring page. */
>> + *final_ring_req = *ring_req;
>> if (unlikely(require_extra_req))
>> - rinfo->shadow[extra_id].req = *extra_ring_req;
>> + *final_extra_ring_req = *extra_ring_req;
>
> ... are you sure all supported compilers will recognize the
> conditional use and not warn about use of a possibly uninitialized
> variable?

Hmm, probably better safe than sorry. Will change it.
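
The declarations would then become (sketch only):

	struct blkif_request *ring_req, *extra_ring_req = NULL;
	/* NULL-init to avoid a "may be used uninitialized" warning. */
	struct blkif_request *final_ring_req, *final_extra_ring_req = NULL;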


Juergen



2021-05-17 23:58:16

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 5/8] xen/netfront: read response from backend only once

On 13.05.2021 12:02, Juergen Gross wrote:
> In order to avoid problems in case the backend is modifying a response
> on the ring page while the frontend has already seen it, just read the
> response into a local buffer in one go and then operate on that buffer
> only.
>
> Signed-off-by: Juergen Gross <[email protected]>

Reviewed-by: Jan Beulich <[email protected]>
with one remark:

> @@ -830,24 +830,22 @@ static int xennet_get_extras(struct netfront_queue *queue,
> break;
> }
>
> - extra = (struct xen_netif_extra_info *)
> - RING_GET_RESPONSE(&queue->rx, ++cons);
> + RING_COPY_RESPONSE(&queue->rx, ++cons, &extra);
>
> - if (unlikely(!extra->type ||
> - extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
> + if (unlikely(!extra.type ||
> + extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
> if (net_ratelimit())
> dev_warn(dev, "Invalid extra type: %d\n",
> - extra->type);
> + extra.type);
> err = -EINVAL;
> } else {
> - memcpy(&extras[extra->type - 1], extra,
> - sizeof(*extra));
> + memcpy(&extras[extra.type - 1], &extra, sizeof(extra));

Maybe take the opportunity and switch to (type safe) structure
assignment?

Jan

2021-05-18 00:43:43

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 17.05.21 16:11, Jan Beulich wrote:
> On 13.05.2021 12:02, Juergen Gross wrote:
>> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>> again:
>> rp = rinfo->ring.sring->rsp_prod;
>> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
>> + pr_alert("%s: illegal number of responses %u\n",
>> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
>> + goto err;
>> + }
>> rmb(); /* Ensure we see queued responses up to 'rp'. */
>
> I think you want to insert after the barrier.

Why? The relevant variable which is checked is "rp". The result of the
check is in no way depending on the responses themselves. And any change
of rsp_cons is protected by ring_lock, so there is no possibility of
reading an old value here.

>
>> @@ -1680,6 +1707,11 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>> spin_unlock_irqrestore(&rinfo->ring_lock, flags);
>>
>> return IRQ_HANDLED;
>> +
>> + err:
>> + info->connected = BLKIF_STATE_ERROR;
>> + pr_alert("%s disabled for further use\n", info->gd->disk_name);
>> + return IRQ_HANDLED;
>> }
>
> Am I understanding that a suspend (and then resume) can be used to
> recover from error state? If so - is this intentional? If so in turn,
> would it make sense to spell this out in the description?

I'd call it a nice side effect rather than intentional. I can add a
remark to the commit message if you want.


Juergen



2021-05-18 01:08:37

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 5/8] xen/netfront: read response from backend only once

On 17.05.21 16:20, Jan Beulich wrote:
> On 13.05.2021 12:02, Juergen Gross wrote:
>> In order to avoid problems in case the backend is modifying a response
>> on the ring page while the frontend has already seen it, just read the
>> response into a local buffer in one go and then operate on that buffer
>> only.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>
> Reviewed-by: Jan Beulich <[email protected]>
> with one remark:
>
>> @@ -830,24 +830,22 @@ static int xennet_get_extras(struct netfront_queue *queue,
>> break;
>> }
>>
>> - extra = (struct xen_netif_extra_info *)
>> - RING_GET_RESPONSE(&queue->rx, ++cons);
>> + RING_COPY_RESPONSE(&queue->rx, ++cons, &extra);
>>
>> - if (unlikely(!extra->type ||
>> - extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
>> + if (unlikely(!extra.type ||
>> + extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
>> if (net_ratelimit())
>> dev_warn(dev, "Invalid extra type: %d\n",
>> - extra->type);
>> + extra.type);
>> err = -EINVAL;
>> } else {
>> - memcpy(&extras[extra->type - 1], extra,
>> - sizeof(*extra));
>> + memcpy(&extras[extra.type - 1], &extra, sizeof(extra));
>
> Maybe take the opportunity and switch to (type safe) structure
> assignment?

Yes, good idea.
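
So the memcpy() would become a plain structure assignment, something
like (sketch only):

	} else {
		extras[extra.type - 1] = extra;
	}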


Juergen



2021-05-18 11:36:31

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 6/8] xen/netfront: don't read data from request on the ring page

On 13.05.2021 12:03, Juergen Gross wrote:
> In order to avoid a malicious backend being able to influence the local
> processing of a request build the request locally first and then copy
> it to the ring page. Any reading from the request needs to be done on
> the local instance.

"Any reading" isn't really true - you don't change xennet_make_one_txreq(),
yet that has a read-modify-write operation. Without that I would have
been inclined to ask whether ...

> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -435,7 +435,8 @@ struct xennet_gnttab_make_txreq {
> struct netfront_queue *queue;
> struct sk_buff *skb;
> struct page *page;
> - struct xen_netif_tx_request *tx; /* Last request */
> + struct xen_netif_tx_request *tx; /* Last request on ring page */
> + struct xen_netif_tx_request tx_local; /* Last request local copy */

... retaining the tx field here is a good idea.

> @@ -463,30 +464,27 @@ static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
> queue->grant_tx_page[id] = page;
> queue->grant_tx_ref[id] = ref;
>
> - tx->id = id;
> - tx->gref = ref;
> - tx->offset = offset;
> - tx->size = len;
> - tx->flags = 0;
> + info->tx_local.id = id;
> + info->tx_local.gref = ref;
> + info->tx_local.offset = offset;
> + info->tx_local.size = len;
> + info->tx_local.flags = 0;
> +
> + *tx = info->tx_local;
>
> info->tx = tx;
> - info->size += tx->size;
> + info->size += info->tx_local.size;
> }
>
> static struct xen_netif_tx_request *xennet_make_first_txreq(
> - struct netfront_queue *queue, struct sk_buff *skb,
> - struct page *page, unsigned int offset, unsigned int len)
> + struct xennet_gnttab_make_txreq *info,
> + unsigned int offset, unsigned int len)
> {
> - struct xennet_gnttab_make_txreq info = {
> - .queue = queue,
> - .skb = skb,
> - .page = page,
> - .size = 0,
> - };
> + info->size = 0;
>
> - gnttab_for_one_grant(page, offset, len, xennet_tx_setup_grant, &info);
> + gnttab_for_one_grant(info->page, offset, len, xennet_tx_setup_grant, info);
>
> - return info.tx;
> + return info->tx;
> }

Similarly this returning of a pointer into the ring looks at least
risky to me. At the very least it looks as if ...

> @@ -704,14 +699,16 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
> }
>
> /* First request for the linear area. */
> - first_tx = tx = xennet_make_first_txreq(queue, skb,
> - page, offset, len);
> + info.queue = queue;
> + info.skb = skb;
> + info.page = page;
> + first_tx = tx = xennet_make_first_txreq(&info, offset, len);

... you could avoid setting tx here; perhaps the local variable
could go away altogether, showing it's really just first_tx that
is still needed. It's odd that ...

> offset += tx->size;

... you don't change this one, when ...

> if (offset == PAGE_SIZE) {
> page++;
> offset = 0;
> }
> - len -= tx->size;
> + len -= info.tx_local.size;

... you do so here. Likely just an oversight.
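
Presumably both are meant to use the local copy, along these lines
(sketch only):

	offset += info.tx_local.size;
	if (offset == PAGE_SIZE) {
		page++;
		offset = 0;
	}
	len -= info.tx_local.size;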

Jan

2021-05-18 12:29:22

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 17.05.2021 16:23, Juergen Gross wrote:
> On 17.05.21 16:11, Jan Beulich wrote:
>> On 13.05.2021 12:02, Juergen Gross wrote:
>>> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>>> again:
>>> rp = rinfo->ring.sring->rsp_prod;
>>> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
>>> + pr_alert("%s: illegal number of responses %u\n",
>>> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
>>> + goto err;
>>> + }
>>> rmb(); /* Ensure we see queued responses up to 'rp'. */
>>
>> I think you want to insert after the barrier.
>
> Why? The relevant variable which is checked is "rp". The result of the
> check is in no way depending on the responses themselves. And any change
> of rsp_cons is protected by ring_lock, so there is no possibility of
> reading an old value here.

But this is a standard double read situation: You might check a value
and then (via a separate read) use a different one past the barrier.

Jan

2021-05-18 13:40:21

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 17.05.21 17:12, Jan Beulich wrote:
> On 17.05.2021 16:23, Juergen Gross wrote:
>> On 17.05.21 16:11, Jan Beulich wrote:
>>> On 13.05.2021 12:02, Juergen Gross wrote:
>>>> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>>>> again:
>>>> rp = rinfo->ring.sring->rsp_prod;
>>>> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
>>>> + pr_alert("%s: illegal number of responses %u\n",
>>>> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
>>>> + goto err;
>>>> + }
>>>> rmb(); /* Ensure we see queued responses up to 'rp'. */
>>>
>>> I think you want to insert after the barrier.
>>
>> Why? The relevant variable which is checked is "rp". The result of the
>> check is in no way depending on the responses themselves. And any change
>> of rsp_cons is protected by ring_lock, so there is no possibility of
>> reading an old value here.
>
> But this is a standard double read situation: You might check a value
> and then (via a separate read) use a different one past the barrier.

Yes and no.

rsp_cons should never be written by the other side, and additionally
it would be read multiple times anyway.

So if the other side is writing it, the write could always happen after
the test and before the loop is started. This is no real issue here as
the frontend would very soon stumble over an illegal response (either
no request pending, or some other inconsistency). The test is meant to
produce a more detailed error message in case it triggers.

In the end it doesn't really matter, so I can change it. I just wanted
to point out that IMO both variants are equally valid.


Juergen



2021-05-18 14:59:26

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 7/8] xen/netfront: don't trust the backend response data blindly

On 13.05.2021 12:03, Juergen Gross wrote:
> @@ -429,6 +453,12 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
> } while (more_to_do);
>
> xennet_maybe_wake_tx(queue);
> +
> + return;
> +
> + err:
> + queue->info->broken = true;
> + dev_alert(dev, "Disabled for further use\n");
> }

If in blkfront the ability to revive a device via a suspend/resume cycle
is "a nice side effect", wouldn't it be nice for all frontends to behave
similarly in this regard? I.e. wouldn't you want to also clear this flag
somewhere? And shouldn't additionally / more generally a disconnect /
connect cycle allow proper operation again?
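
Purely as illustration (this reset is not part of the posted series,
and the exact place may differ):

	/* E.g. in xennet_connect(), before re-enabling the queues: */
	np->broken = false;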

> @@ -472,6 +502,13 @@ static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
>
> *tx = info->tx_local;
>
> + /*
> + * The request is not in its final form, as size and flags might be
> + * modified later, but even if a malicious backend will send a response
> + * now, nothing bad regarding security could happen.
> + */
> + queue->tx_pending[id] = true;

I'm not sure I can agree with what the comment says. If the backend
sent a response prematurely, wouldn't the underlying slot(s) become
available for re-use, and hence potentially get filled / updated by
two parties?
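
For the frontend to be safe here I'd expect the response path to
refuse responses for slots not (yet) marked pending, e.g. (purely
illustrative; the posted patch may handle this differently):

	/* In xennet_tx_buf_gc(), before accepting a response for "id": */
	if (!queue->tx_pending[id]) {
		dev_alert(dev, "response for inactive request\n");
		goto err;
	}
	queue->tx_pending[id] = false;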

Jan

2021-05-18 15:10:47

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 17.05.2021 17:22, Juergen Gross wrote:
> On 17.05.21 17:12, Jan Beulich wrote:
>> On 17.05.2021 16:23, Juergen Gross wrote:
>>> On 17.05.21 16:11, Jan Beulich wrote:
>>>> On 13.05.2021 12:02, Juergen Gross wrote:
>>>>> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>>>>> again:
>>>>> rp = rinfo->ring.sring->rsp_prod;
>>>>> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
>>>>> + pr_alert("%s: illegal number of responses %u\n",
>>>>> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
>>>>> + goto err;
>>>>> + }
>>>>> rmb(); /* Ensure we see queued responses up to 'rp'. */
>>>>
>>>> I think you want to insert after the barrier.
>>>
>>> Why? The relevant variable which is checked is "rp". The result of the
>>> check is in no way depending on the responses themselves. And any change
>>> of rsp_cons is protected by ring_lock, so there is no possibility of
>>> reading an old value here.
>>
>> But this is a standard double read situation: You might check a value
>> and then (via a separate read) use a different one past the barrier.
>
> Yes and no.
>
> rsp_cons should never be written by the other side, and additionally
> it would be read multiple times anyway.

But I'm talking about rsp_prod, as that's what rp gets loaded from.

Jan

> So if the other side is writing it, the write could always happen after
> the test and before the loop is started. This is no real issue here as
> the frontend would very soon stumble over an illegal response (either
> no request pending, or some other inconsistency). The test is meant to
> produce a more detailed error message in case it triggers.
>
> In the end it doesn't really matter, so I can change it. I just wanted
> to point out that IMO both variants are equally valid.
>
>
> Juergen
>


by Marek Marczykowski-Górecki

[permalink] [raw]
Subject: Re: [PATCH 0/8] xen: harden frontends against malicious backends

On Thu, May 13, 2021 at 12:02:54PM +0200, Juergen Gross wrote:
> Xen backends of para-virtualized devices can live in dom0 kernel, dom0
> user land, or in a driver domain. This means that a backend might
> reside in a less trusted environment than the Xen core components, so
> a backend should not be able to do harm to a Xen guest (it can still
> mess up I/O data, but it shouldn't be able to e.g. crash a guest by
> other means or cause a privilege escalation in the guest).
>
> Unfortunately many frontends in the Linux kernel are fully trusting
> their respective backends. This series is starting to fix the most
> important frontends: console, disk and network.
>
> It was discussed to handle this as a security problem, but the topic
> was discussed in public before, so it isn't a real secret.

Is this based on the patches we ship in Qubes[1], which I also sent
here some years ago[2]? I see a lot of similarities. If not, you may
want to compare them.

[1] https://github.com/QubesOS/qubes-linux-kernel/
[2] https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg02336.html


> Juergen Gross (8):
> xen: sync include/xen/interface/io/ring.h with Xen's newest version
> xen/blkfront: read response from backend only once
> xen/blkfront: don't take local copy of a request from the ring page
> xen/blkfront: don't trust the backend response data blindly
> xen/netfront: read response from backend only once
> xen/netfront: don't read data from request on the ring page
> xen/netfront: don't trust the backend response data blindly
> xen/hvc: replace BUG_ON() with negative return value
>
> drivers/block/xen-blkfront.c | 118 +++++++++-----
> drivers/net/xen-netfront.c | 184 ++++++++++++++-------
> drivers/tty/hvc/hvc_xen.c | 15 +-
> include/xen/interface/io/ring.h | 278 ++++++++++++++++++--------------
> 4 files changed, 369 insertions(+), 226 deletions(-)
>
> --
> 2.26.2
>
>

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



2021-07-08 05:47:54

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 17.05.21 17:33, Jan Beulich wrote:
> On 17.05.2021 17:22, Juergen Gross wrote:
>> On 17.05.21 17:12, Jan Beulich wrote:
>>> On 17.05.2021 16:23, Juergen Gross wrote:
>>>> On 17.05.21 16:11, Jan Beulich wrote:
>>>>> On 13.05.2021 12:02, Juergen Gross wrote:
>>>>>> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>>>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>>>>>> again:
>>>>>> rp = rinfo->ring.sring->rsp_prod;
>>>>>> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
>>>>>> + pr_alert("%s: illegal number of responses %u\n",
>>>>>> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
>>>>>> + goto err;
>>>>>> + }
>>>>>> rmb(); /* Ensure we see queued responses up to 'rp'. */
>>>>>
>>>>> I think you want to insert after the barrier.
>>>>
>>>> Why? The relevant variable which is checked is "rp". The result of the
>>>> check is in no way depending on the responses themselves. And any change
>>>> of rsp_cons is protected by ring_lock, so there is no possibility of
>>>> reading an old value here.
>>>
>>> But this is a standard double read situation: You might check a value
>>> and then (via a separate read) use a different one past the barrier.
>>
>> Yes and no.
>>
>> rsp_cons should never be written by the other side, and additionally
>> it would be read multiple times anyway.
>
> But I'm talking about rsp_prod, as that's what rp gets loaded from.

Oh, now I get your problem.

But shouldn't that better be solved by using READ_ONCE() for reading rp
instead?


Juergen



2021-07-08 06:39:50

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 08.07.2021 07:47, Juergen Gross wrote:
> On 17.05.21 17:33, Jan Beulich wrote:
>> On 17.05.2021 17:22, Juergen Gross wrote:
>>> On 17.05.21 17:12, Jan Beulich wrote:
>>>> On 17.05.2021 16:23, Juergen Gross wrote:
>>>>> On 17.05.21 16:11, Jan Beulich wrote:
>>>>>> On 13.05.2021 12:02, Juergen Gross wrote:
>>>>>>> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>>>>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>>>>>>> again:
>>>>>>> rp = rinfo->ring.sring->rsp_prod;
>>>>>>> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
>>>>>>> + pr_alert("%s: illegal number of responses %u\n",
>>>>>>> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
>>>>>>> + goto err;
>>>>>>> + }
>>>>>>> rmb(); /* Ensure we see queued responses up to 'rp'. */
>>>>>>
>>>>>> I think you want to insert after the barrier.
>>>>>
>>>>> Why? The relevant variable which is checked is "rp". The result of the
>>>>> check is in no way depending on the responses themselves. And any change
>>>>> of rsp_cons is protected by ring_lock, so there is no possibility of
>>>>> reading an old value here.
>>>>
>>>> But this is a standard double read situation: You might check a value
>>>> and then (via a separate read) use a different one past the barrier.
>>>
>>> Yes and no.
>>>
>>> rsp_cons should never be written by the other side, and additionally
>>> it would be read multiple times anyway.
>>
>> But I'm talking about rsp_prod, as that's what rp gets loaded from.
>
> Oh, now I get your problem.
>
> But shouldn't that better be solved by using READ_ONCE() for reading rp
> instead?

Not sure - the rmb() is needed anyway aiui, and hence you could as well
move your code addition.

Jan

2021-07-08 06:42:17

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 08.07.21 08:37, Jan Beulich wrote:
> On 08.07.2021 07:47, Juergen Gross wrote:
>> On 17.05.21 17:33, Jan Beulich wrote:
>>> On 17.05.2021 17:22, Juergen Gross wrote:
>>>> On 17.05.21 17:12, Jan Beulich wrote:
>>>>> On 17.05.2021 16:23, Juergen Gross wrote:
>>>>>> On 17.05.21 16:11, Jan Beulich wrote:
>>>>>>> On 13.05.2021 12:02, Juergen Gross wrote:
>>>>>>>> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>>>>>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>>>>>>>> again:
>>>>>>>> rp = rinfo->ring.sring->rsp_prod;
>>>>>>>> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
>>>>>>>> + pr_alert("%s: illegal number of responses %u\n",
>>>>>>>> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
>>>>>>>> + goto err;
>>>>>>>> + }
>>>>>>>> rmb(); /* Ensure we see queued responses up to 'rp'. */
>>>>>>>
>>>>>>> I think you want to insert after the barrier.
>>>>>>
>>>>>> Why? The relevant variable which is checked is "rp". The result of the
>>>>>> check is in no way depending on the responses themselves. And any change
>>>>>> of rsp_cons is protected by ring_lock, so there is no possibility of
>>>>>> reading an old value here.
>>>>>
>>>>> But this is a standard double read situation: You might check a value
>>>>> and then (via a separate read) use a different one past the barrier.
>>>>
>>>> Yes and no.
>>>>
>>>> rsp_cons should never be written by the other side, and additionally
>>>> it would be read multiple times anyway.
>>>
>>> But I'm talking about rsp_prod, as that's what rp gets loaded from.
>>
>> Oh, now I get your problem.
>>
>> But shouldn't that better be solved by using READ_ONCE() for reading rp
>> instead?
>
> Not sure - the rmb() is needed anyway aiui, and hence you could as well
> move your code addition.

Sure.

My question was rather: does the rmb() really eliminate the possibility
of a double read introduced by the compiler? If yes, moving the code is
the correct solution.


Juergen



2021-07-08 06:53:33

by Jan Beulich

[permalink] [raw]
Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 08.07.2021 08:40, Juergen Gross wrote:
> On 08.07.21 08:37, Jan Beulich wrote:
>> On 08.07.2021 07:47, Juergen Gross wrote:
>>> On 17.05.21 17:33, Jan Beulich wrote:
>>>> On 17.05.2021 17:22, Juergen Gross wrote:
>>>>> On 17.05.21 17:12, Jan Beulich wrote:
>>>>>> On 17.05.2021 16:23, Juergen Gross wrote:
>>>>>>> On 17.05.21 16:11, Jan Beulich wrote:
>>>>>>>> On 13.05.2021 12:02, Juergen Gross wrote:
>>>>>>>>> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>>>>>>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>>>>>>>>> again:
>>>>>>>>> rp = rinfo->ring.sring->rsp_prod;
>>>>>>>>> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
>>>>>>>>> + pr_alert("%s: illegal number of responses %u\n",
>>>>>>>>> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
>>>>>>>>> + goto err;
>>>>>>>>> + }
>>>>>>>>> rmb(); /* Ensure we see queued responses up to 'rp'. */
>>>>>>>>
>>>>>>>> I think you want to insert after the barrier.
>>>>>>>
>>>>>>> Why? The relevant variable which is checked is "rp". The result of the
>>>>>>> check is in no way depending on the responses themselves. And any change
>>>>>>> of rsp_cons is protected by ring_lock, so there is no possibility of
>>>>>>> reading an old value here.
>>>>>>
>>>>>> But this is a standard double read situation: You might check a value
>>>>>> and then (via a separate read) use a different one past the barrier.
>>>>>
>>>>> Yes and no.
>>>>>
>>>>> rsp_cons should never be written by the other side, and additionally
>>>>> it would be read multiple times anyway.
>>>>
>>>> But I'm talking about rsp_prod, as that's what rp gets loaded from.
>>>
>>> Oh, now I get your problem.
>>>
>>> But shouldn't that better be solved by using READ_ONCE() for reading rp
>>> instead?
>>
>> Not sure - the rmb() is needed anyway aiui, and hence you could as well
>> move your code addition.
>
> Sure.
>
> My question was rather: does the rmb() really eliminate the possibility
> of a double read introduced by the compiler? If yes, moving the code is
> the correct solution.

It doesn't eliminate the possibility of a double read, but (leaving
aside split accesses) that's not what you care about here. What you
need is a single stable value to operate on. No matter how many
(non-split) reads the compiler may issue to fill "rp", the final
read's value will be used in the subsequent calculation. Or at
least that's been my understanding; thinking about it the compiler
might issue multiple reads into distinct registers ahead of the
barrier, and use different registers for different subsequent
operations. While this would look like intentionally inefficient
code generation to me, you may indeed want to play safe and use
ACCESS_ONCE() _and_ the barrier. I guess there would then be more
places wanting similar treatment, and it's not a problem that this
change introduces ...

Jan

2021-07-08 06:57:16

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH 4/8] xen/blkfront: don't trust the backend response data blindly

On 08.07.21 08:52, Jan Beulich wrote:
> On 08.07.2021 08:40, Juergen Gross wrote:
>> On 08.07.21 08:37, Jan Beulich wrote:
>>> On 08.07.2021 07:47, Juergen Gross wrote:
>>>> On 17.05.21 17:33, Jan Beulich wrote:
>>>>> On 17.05.2021 17:22, Juergen Gross wrote:
>>>>>> On 17.05.21 17:12, Jan Beulich wrote:
>>>>>>> On 17.05.2021 16:23, Juergen Gross wrote:
>>>>>>>> On 17.05.21 16:11, Jan Beulich wrote:
>>>>>>>>> On 13.05.2021 12:02, Juergen Gross wrote:
>>>>>>>>>> @@ -1574,10 +1580,16 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>>>>>>>>>> spin_lock_irqsave(&rinfo->ring_lock, flags);
>>>>>>>>>> again:
>>>>>>>>>> rp = rinfo->ring.sring->rsp_prod;
>>>>>>>>>> + if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
>>>>>>>>>> + pr_alert("%s: illegal number of responses %u\n",
>>>>>>>>>> + info->gd->disk_name, rp - rinfo->ring.rsp_cons);
>>>>>>>>>> + goto err;
>>>>>>>>>> + }
>>>>>>>>>> rmb(); /* Ensure we see queued responses up to 'rp'. */
>>>>>>>>>
>>>>>>>>> I think you want to insert after the barrier.
>>>>>>>>
>>>>>>>> Why? The relevant variable which is checked is "rp". The result of the
>>>>>>>> check is in no way depending on the responses themselves. And any change
>>>>>>>> of rsp_cons is protected by ring_lock, so there is no possibility of
>>>>>>>> reading an old value here.
>>>>>>>
>>>>>>> But this is a standard double read situation: You might check a value
>>>>>>> and then (via a separate read) use a different one past the barrier.
>>>>>>
>>>>>> Yes and no.
>>>>>>
>>>>>> rsp_cons should never be written by the other side, and additionally
>>>>>> it would be read multiple times anyway.
>>>>>
>>>>> But I'm talking about rsp_prod, as that's what rp gets loaded from.
>>>>
>>>> Oh, now I get your problem.
>>>>
>>>> But shouldn't that better be solved by using READ_ONCE() for reading rp
>>>> instead?
>>>
>>> Not sure - the rmb() is needed anyway aiui, and hence you could as well
>>> move your code addition.
>>
>> Sure.
>>
>> My question was rather: does the rmb() really eliminate the possibility
>> of a double read introduced by the compiler? If yes, moving the code is
>> the correct solution.
>
> It doesn't eliminate the possibility of a double read, but (leaving
> aside split accesses) that's not what you care about here. What you
> need is a single stable value to operate on. No matter how many
> (non-split) reads the compiler may issue to fill "rp", the final
> read's value will be used in the subsequent calculation. Or at
> least that's been my understanding; thinking about it the compiler
> might issue multiple reads into distinct registers ahead of the
> barrier, and use different registers for different subsequent
> operations. While this would look like intentionally inefficient
> code generation to me, you may indeed want to play safe and use
> ACCESS_ONCE() _and_ the barrier. I guess there would then be more
> places wanting similar treatment, and it's not a problem that this
> change introduces ...

Nevertheless I think I can change it right away. It will also help
against load tearing.
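
I.e. the result would look something like (untested sketch):

again:
	rp = READ_ONCE(rinfo->ring.sring->rsp_prod);
	rmb(); /* Ensure we see queued responses up to 'rp'. */

	if (RING_RESPONSE_PROD_OVERFLOW(&rinfo->ring, rp)) {
		pr_alert("%s: illegal number of responses %u\n",
			 info->gd->disk_name, rp - rinfo->ring.rsp_cons);
		goto err;
	}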


Juergen

