LinuxLists.cc - [PATCH 0/7] Xen transport for 9pfs frontend driver

2017-03-06 20:00:53

by Stefano Stabellini

[permalink] [raw]

Subject: [PATCH 0/7] Xen transport for 9pfs frontend driver

Hi all,

This patch series implements a new transport for 9pfs, aimed at Xen
systems.

The transport is based on a traditional Xen frontend and backend drivers
pair. This patch series implements the frontend, which typically runs in
a regular unprivileged guest.

I'll follow up with another series that implements the backend in
userspace in QEMU, which typically runs in Dom0 (but could also run in
a another guest).

The frontend complies to the Xen transport for 9pfs specification
version 1, available here:

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob_plain;f=docs/misc/9pfs.markdown;hb=HEAD

Stefano Stabellini (7):
xen: import new ring macros in ring.h
xen: introduce the header file for the Xen 9pfs transport protocol
xen/9pfs: introduce Xen 9pfs transport driver
xen/9pfs: connect to the backend
xen/9pfs: send requests to the backend
xen/9pfs: receive responses
xen/9pfs: build 9pfs Xen transport driver

include/xen/interface/io/9pfs.h | 40 ++++
include/xen/interface/io/ring.h | 131 ++++++++++++
net/9p/Kconfig | 8 +
net/9p/Makefile | 4 +
net/9p/trans_xen.c | 462 ++++++++++++++++++++++++++++++++++++++++
5 files changed, 645 insertions(+)
create mode 100644 include/xen/interface/io/9pfs.h
create mode 100644 net/9p/trans_xen.c

Cheers,

Stefano

2017-03-06 20:11:26

by Stefano Stabellini

[permalink] [raw]

Subject: [PATCH 7/7] xen/9pfs: build 9pfs Xen transport driver

This patch adds a Kconfig option and Makefile support for building the
9pfs Xen driver.

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
CC: Eric Van Hensbergen <[email protected]>
CC: Ron Minnich <[email protected]>
CC: Latchesar Ionkov <[email protected]>
CC: [email protected]
---
net/9p/Kconfig | 8 ++++++++
net/9p/Makefile | 4 ++++
2 files changed, 12 insertions(+)

diff --git a/net/9p/Kconfig b/net/9p/Kconfig
index a75174a..5c5649b 100644
--- a/net/9p/Kconfig
+++ b/net/9p/Kconfig
@@ -22,6 +22,14 @@ config NET_9P_VIRTIO
This builds support for a transports between
guest partitions and a host partition.

+config NET_9P_XEN
+ depends on XEN
+ tristate "9P Xen Transport"
+ help
+ This builds support for a transport between
+ two Xen domains.
+
+
config NET_9P_RDMA
depends on INET && INFINIBAND && INFINIBAND_ADDR_TRANS
tristate "9P RDMA Transport (Experimental)"
diff --git a/net/9p/Makefile b/net/9p/Makefile
index a0874cc..697ea7c 100644
--- a/net/9p/Makefile
+++ b/net/9p/Makefile
@@ -1,4 +1,5 @@
obj-$(CONFIG_NET_9P) := 9pnet.o
+obj-$(CONFIG_NET_9P_XEN) += 9pnet_xen.o
obj-$(CONFIG_NET_9P_VIRTIO) += 9pnet_virtio.o
obj-$(CONFIG_NET_9P_RDMA) += 9pnet_rdma.o

@@ -14,5 +15,8 @@ obj-$(CONFIG_NET_9P_RDMA) += 9pnet_rdma.o
9pnet_virtio-objs := \
trans_virtio.o \

+9pnet_xen-objs := \
+ trans_xen.o \
+
9pnet_rdma-objs := \
trans_rdma.o \
--
1.9.1

2017-03-06 20:11:33

by Stefano Stabellini

[permalink] [raw]

Subject: [PATCH 6/7] xen/9pfs: receive responses

Upon receiving a notification from the backend, schedule the
p9_xen_response work_struct. p9_xen_response checks if any responses are
available, if so, it reads them one by one, calling p9_client_cb to send
them up to the 9p layer (p9_client_cb completes the request). Handle the
ring following the Xen 9pfs specification.

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
CC: Eric Van Hensbergen <[email protected]>
CC: Ron Minnich <[email protected]>
CC: Latchesar Ionkov <[email protected]>
CC: [email protected]
---
net/9p/trans_xen.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 4e26556..1ca9246 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -149,6 +149,59 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)

static void p9_xen_response(struct work_struct *work)
{
+ struct xen_9pfs_front_priv *priv;
+ struct xen_9pfs_dataring *ring;
+ RING_IDX cons, prod, masked_cons, masked_prod;
+ struct xen_9pfs_header h;
+ struct p9_req_t *req;
+ int status = REQ_STATUS_ERROR;
+
+ ring = container_of(work, struct xen_9pfs_dataring, work);
+ priv = ring->priv;
+
+ while (1) {
+ cons = ring->intf->in_cons;
+ prod = ring->intf->in_prod;
+ rmb();
+
+ if (xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < sizeof(h)) {
+ notify_remote_via_irq(ring->irq);
+ return;
+ }
+
+ masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
+ masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
+
+ xen_9pfs_read_packet(ring->ring.in,
+ masked_prod, &masked_cons,
+ XEN_9PFS_RING_SIZE, &h, sizeof(h));
+
+ req = p9_tag_lookup(priv->client, h.tag);
+ if (!req || req->status != REQ_STATUS_SENT) {
+ dev_warn(&priv->dev->dev, "Wrong req tag=%x\n", h.tag);
+ cons += h.size;
+ mb();
+ ring->intf->in_cons = cons;
+ continue;
+ }
+
+ memcpy(req->rc, &h, sizeof(h));
+ req->rc->offset = 0;
+
+ masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
+ xen_9pfs_read_packet(ring->ring.in,
+ masked_prod, &masked_cons,
+ XEN_9PFS_RING_SIZE, req->rc->sdata, h.size);
+
+ mb();
+ cons += h.size;
+ ring->intf->in_cons = cons;
+
+ if (req->status != REQ_STATUS_ERROR)
+ status = REQ_STATUS_RCVD;
+
+ p9_client_cb(priv->client, req, status);
+ }
}

static irqreturn_t xen_9pfs_front_event_handler(int irq, void *r)
--
1.9.1

2017-03-06 20:11:43

by Stefano Stabellini

[permalink] [raw]

Subject: [PATCH 1/7] xen: import new ring macros in ring.h

Sync the ring.h file with upstream Xen, to introduce the new ring macros.
They will be used by the Xen transport for 9pfs.

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]

---
NB: The new macros have not been committed to Xen yet. Do not apply this
patch until they do.
---
---
include/xen/interface/io/ring.h | 131 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 131 insertions(+)

diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
index 21f4fbd..e16aa92 100644
--- a/include/xen/interface/io/ring.h
+++ b/include/xen/interface/io/ring.h
@@ -283,4 +283,135 @@ struct __name##_back_ring { \
(_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \
} while (0)

+
+/*
+ * DEFINE_XEN_FLEX_RING_AND_INTF defines two monodirectional rings and
+ * functions to check if there is data on the ring, and to read and
+ * write to them.
+ *
+ * DEFINE_XEN_FLEX_RING is similar to DEFINE_XEN_FLEX_RING_AND_INTF, but
+ * does not define the indexes page. As different protocols can have
+ * extensions to the basic format, this macro allow them to define their
+ * own struct.
+ *
+ * XEN_FLEX_RING_SIZE
+ * Convenience macro to calculate the size of one of the two rings
+ * from the overall order.
+ *
+ * $NAME_mask
+ * Function to apply the size mask to an index, to reduce the index
+ * within the range [0-size].
+ *
+ * $NAME_read_packet
+ * Function to read data from the ring. The amount of data to read is
+ * specified by the "size" argument.
+ *
+ * $NAME_write_packet
+ * Function to write data to the ring. The amount of data to write is
+ * specified by the "size" argument.
+ *
+ * $NAME_get_ring_ptr
+ * Convenience function that returns a pointer to read/write to the
+ * ring at the right location.
+ *
+ * $NAME_data_intf
+ * Indexes page, shared between frontend and backend. It also
+ * contains the array of grant refs.
+ *
+ * $NAME_queued
+ * Function to calculate how many bytes are currently on the ring,
+ * ready to be read. It can also be used to calculate how much free
+ * space is currently on the ring (ring_size - $NAME_queued()).
+ */
+#define XEN_FLEX_RING_SIZE(order) \
+ (1UL << (order + PAGE_SHIFT - 1))
+
+#define DEFINE_XEN_FLEX_RING_AND_INTF(name) \
+struct name##_data_intf { \
+ RING_IDX in_cons, in_prod; \
+ \
+ uint8_t pad1[56]; \
+ \
+ RING_IDX out_cons, out_prod; \
+ \
+ uint8_t pad2[56]; \
+ \
+ RING_IDX ring_order; \
+ grant_ref_t ref[]; \
+}; \
+DEFINE_XEN_FLEX_RING(name);
+
+#define DEFINE_XEN_FLEX_RING(name) \
+static inline RING_IDX name##_mask(RING_IDX idx, RING_IDX ring_size) \
+{ \
+ return (idx & (ring_size - 1)); \
+} \
+ \
+static inline RING_IDX name##_mask_order(RING_IDX idx, RING_IDX ring_order) \
+{ \
+ return (idx & (XEN_FLEX_RING_SIZE(ring_order) - 1)); \
+} \
+ \
+static inline unsigned char* name##_get_ring_ptr(unsigned char *buf, \
+ RING_IDX idx, \
+ RING_IDX ring_order) \
+{ \
+ return buf + name##_mask_order(idx, ring_order); \
+} \
+ \
+static inline void name##_read_packet(const unsigned char *buf, \
+ RING_IDX masked_prod, RING_IDX *masked_cons, \
+ RING_IDX ring_size, void *opaque, size_t size) { \
+ if (*masked_cons < masked_prod || \
+ size <= ring_size - *masked_cons) { \
+ memcpy(opaque, buf + *masked_cons, size); \
+ } else { \
+ memcpy(opaque, buf + *masked_cons, ring_size - *masked_cons); \
+ memcpy((unsigned char *)opaque + ring_size - *masked_cons, buf, \
+ size - (ring_size - *masked_cons)); \
+ } \
+ *masked_cons = name##_mask(*masked_cons + size, ring_size); \
+} \
+ \
+static inline void name##_write_packet(unsigned char *buf, \
+ RING_IDX *masked_prod, RING_IDX masked_cons, \
+ RING_IDX ring_size, const void *opaque, size_t size) { \
+ if (*masked_prod < masked_cons || \
+ size <= ring_size - *masked_prod) { \
+ memcpy(buf + *masked_prod, opaque, size); \
+ } else { \
+ memcpy(buf + *masked_prod, opaque, ring_size - *masked_prod); \
+ memcpy(buf, (unsigned char *)opaque + (ring_size - *masked_prod), \
+ size - (ring_size - *masked_prod)); \
+ } \
+ *masked_prod = name##_mask(*masked_prod + size, ring_size); \
+} \
+ \
+struct name##_data { \
+ unsigned char *in; /* half of the allocation */ \
+ unsigned char *out; /* half of the allocation */ \
+}; \
+ \
+ \
+static inline RING_IDX name##_queued(RING_IDX prod, \
+ RING_IDX cons, RING_IDX ring_size) \
+{ \
+ RING_IDX size; \
+ \
+ if (prod == cons) \
+ return 0; \
+ \
+ prod = name##_mask(prod, ring_size); \
+ cons = name##_mask(cons, ring_size); \
+ \
+ if (prod == cons) \
+ return ring_size; \
+ \
+ if (prod > cons) \
+ size = prod - cons; \
+ else \
+ size = ring_size - (cons - prod); \
+ return size; \
+};
+
#endif /* __XEN_PUBLIC_IO_RING_H__ */
--
1.9.1

2017-03-06 20:11:52

by Stefano Stabellini

[permalink] [raw]

Subject: [PATCH 2/7] xen: introduce the header file for the Xen 9pfs transport protocol

It uses the new ring.h macros to declare rings and interfaces.

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
---
include/xen/interface/io/9pfs.h | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
create mode 100644 include/xen/interface/io/9pfs.h

diff --git a/include/xen/interface/io/9pfs.h b/include/xen/interface/io/9pfs.h
new file mode 100644
index 0000000..276eda4
--- /dev/null
+++ b/include/xen/interface/io/9pfs.h
@@ -0,0 +1,40 @@
+/*
+ * 9pfs.h -- Xen 9PFS transport
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2017 Stefano Stabellini <[email protected]>
+ */
+
+#ifndef __XEN_PUBLIC_IO_9PFS_H__
+#define __XEN_PUBLIC_IO_9PFS_H__
+
+#include "xen/interface/io/ring.h"
+
+struct xen_9pfs_header {
+ uint32_t size;
+ uint8_t id;
+ uint16_t tag;
+} __attribute__((packed));
+
+#define XEN_9PFS_RING_ORDER 6
+#define XEN_9PFS_RING_SIZE XEN_FLEX_RING_SIZE(XEN_9PFS_RING_ORDER)
+DEFINE_XEN_FLEX_RING_AND_INTF(xen_9pfs);
+
+#endif
--
1.9.1

2017-03-06 20:12:52

by Stefano Stabellini

[permalink] [raw]

Subject: [PATCH 5/7] xen/9pfs: send requests to the backend

Implement struct p9_trans_module create and close functions by looking
at the available Xen 9pfs frontend-backend connections. We don't expect
many frontend-backend connections, thus walking a list is OK.

Send requests to the backend by copying each request to one of the
available rings (each frontend-backend connection comes with multiple
rings). Handle the ring and notifications following the 9pfs
specification. If there are not enough free bytes on the ring for the
request, wait on the wait_queue: the backend will send a notification
after consuming more requests.

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
CC: Eric Van Hensbergen <[email protected]>
CC: Ron Minnich <[email protected]>
CC: Latchesar Ionkov <[email protected]>
CC: [email protected]
---
net/9p/trans_xen.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 82 insertions(+), 1 deletion(-)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 9f6cf8d..4e26556 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -47,22 +47,103 @@ struct xen_9pfs_front_priv {
};
static LIST_HEAD(xen_9pfs_devs);

+/* We don't currently allow canceling of requests */
static int p9_xen_cancel(struct p9_client *client, struct p9_req_t *req)
{
- return 0;
+ return 1;
}

static int p9_xen_create(struct p9_client *client, const char *addr, char *args)
{
+ struct xen_9pfs_front_priv *priv = NULL;
+
+ list_for_each_entry(priv, &xen_9pfs_devs, list) {
+ if (!strcmp(priv->tag, addr))
+ break;
+ }
+ if (!priv || strcmp(priv->tag, addr))
+ return -EINVAL;
+
+ priv->client = client;
return 0;
}

static void p9_xen_close(struct p9_client *client)
{
+ struct xen_9pfs_front_priv *priv = NULL;
+
+ list_for_each_entry(priv, &xen_9pfs_devs, list) {
+ if (priv->client == client)
+ break;
+ }
+ if (!priv || priv->client != client)
+ return;
+
+ priv->client = NULL;
+ return;
+}
+
+static int p9_xen_write_todo(struct xen_9pfs_dataring *ring, RING_IDX size)
+{
+ RING_IDX cons, prod;
+
+ cons = ring->intf->out_cons;
+ prod = ring->intf->out_prod;
+ mb();
+
+ if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) >= size)
+ return 1;
+ else
+ return 0;
}

static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
{
+ struct xen_9pfs_front_priv *priv = NULL;
+ RING_IDX cons, prod, masked_cons, masked_prod;
+ unsigned long flags;
+ uint32_t size = p9_req->tc->size;
+ struct xen_9pfs_dataring *ring;
+ int num;
+
+ list_for_each_entry(priv, &xen_9pfs_devs, list) {
+ if (priv->client == client)
+ break;
+ }
+ if (priv == NULL || priv->client != client)
+ return -EINVAL;
+
+ num = p9_req->tc->tag % priv->num_rings;
+ ring = &priv->rings[num];
+
+again:
+ while (wait_event_interruptible(ring->wq,
+ p9_xen_write_todo(ring, size) > 0) != 0);
+
+ spin_lock_irqsave(&ring->lock, flags);
+ cons = ring->intf->out_cons;
+ prod = ring->intf->out_prod;
+ mb();
+
+ if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < size) {
+ spin_unlock_irqrestore(&ring->lock, flags);
+ goto again;
+ }
+
+ masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
+ masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
+
+ xen_9pfs_write_packet(ring->ring.out,
+ &masked_prod, masked_cons,
+ XEN_9PFS_RING_SIZE, p9_req->tc->sdata, size);
+
+ p9_req->status = REQ_STATUS_SENT;
+ wmb(); /* write ring before updating pointer */
+ prod += size;
+ ring->intf->out_prod = prod;
+ spin_unlock_irqrestore(&ring->lock, flags);
+ notify_remote_via_irq(ring->irq);
+
return 0;
}

--
1.9.1

2017-03-06 20:13:03

by Stefano Stabellini

[permalink] [raw]

Subject: [PATCH 3/7] xen/9pfs: introduce Xen 9pfs transport driver

Introduce the Xen 9pfs transport driver: add struct xenbus_driver to
register as a xenbus driver and add struct p9_trans_module to register
as v9fs driver.

All functions are empty stubs for now.

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
CC: Eric Van Hensbergen <[email protected]>
CC: Ron Minnich <[email protected]>
CC: Latchesar Ionkov <[email protected]>
CC: [email protected]
---
net/9p/trans_xen.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)
create mode 100644 net/9p/trans_xen.c

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
new file mode 100644
index 0000000..877dfd0
--- /dev/null
+++ b/net/9p/trans_xen.c
@@ -0,0 +1,101 @@
+/*
+ * linux/fs/9p/trans_xen
+ *
+ * Xen transport layer.
+ *
+ * Copyright (C) 2017 by Stefano Stabellini <[email protected]>
+ */
+
+#include <xen/events.h>
+#include <xen/grant_table.h>
+#include <xen/xen.h>
+#include <xen/xenbus.h>
+#include <xen/interface/io/9pfs.h>
+
+#include <linux/module.h>
+#include <net/9p/9p.h>
+#include <net/9p/client.h>
+#include <net/9p/transport.h>
+
+static int p9_xen_cancel(struct p9_client *client, struct p9_req_t *req)
+{
+ return 0;
+}
+
+static int p9_xen_create(struct p9_client *client, const char *addr, char *args)
+{
+ return 0;
+}
+
+static void p9_xen_close(struct p9_client *client)
+{
+}
+
+static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
+{
+ return 0;
+}
+
+static struct p9_trans_module p9_xen_trans = {
+ .name = "xen",
+ .maxsize = (1 << (XEN_9PFS_RING_ORDER + XEN_PAGE_SHIFT)),
+ .def = 1,
+ .create = p9_xen_create,
+ .close = p9_xen_close,
+ .request = p9_xen_request,
+ .cancel = p9_xen_cancel,
+ .owner = THIS_MODULE,
+};
+
+static const struct xenbus_device_id xen_9pfs_front_ids[] = {
+ { "9pfs" },
+ { "" }
+};
+
+static int xen_9pfs_front_remove(struct xenbus_device *dev)
+{
+ return 0;
+}
+
+static int xen_9pfs_front_probe(struct xenbus_device *dev,
+ const struct xenbus_device_id *id)
+{
+ return 0;
+}
+
+static int xen_9pfs_front_resume(struct xenbus_device *dev)
+{
+ return 0;
+}
+
+static void xen_9pfs_front_changed(struct xenbus_device *dev,
+ enum xenbus_state backend_state)
+{
+}
+
+static struct xenbus_driver xen_9pfs_front_driver = {
+ .ids = xen_9pfs_front_ids,
+ .probe = xen_9pfs_front_probe,
+ .remove = xen_9pfs_front_remove,
+ .resume = xen_9pfs_front_resume,
+ .otherend_changed = xen_9pfs_front_changed,
+};
+
+int p9_trans_xen_init(void)
+{
+ if (!xen_domain())
+ return -ENODEV;
+
+ pr_info("Initialising Xen transport for 9pfs\n");
+
+ v9fs_register_trans(&p9_xen_trans);
+ return xenbus_register_frontend(&xen_9pfs_front_driver);
+}
+module_init(p9_trans_xen_init);
+
+void p9_trans_xen_exit(void)
+{
+ v9fs_unregister_trans(&p9_xen_trans);
+ return xenbus_unregister_driver(&xen_9pfs_front_driver);
+}
+module_exit(p9_trans_xen_exit);
--
1.9.1

2017-03-06 20:13:26

by Stefano Stabellini

[permalink] [raw]

Subject: [PATCH 4/7] xen/9pfs: connect to the backend

Implement functions to handle the xenbus handshake. Upon connection,
allocate the rings according to the protocol specification.

Initialize a work_struct and a wait_queue. The work_struct will be used
to schedule work upon receiving an event channel notification from the
backend. The wait_queue will be used to wait when the ring is full and
we need to send a new request.

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
CC: Eric Van Hensbergen <[email protected]>
CC: Ron Minnich <[email protected]>
CC: Latchesar Ionkov <[email protected]>
CC: [email protected]
---
net/9p/trans_xen.c | 227 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 227 insertions(+)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 877dfd0..9f6cf8d 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -17,6 +17,36 @@
#include <net/9p/client.h>
#include <net/9p/transport.h>

+#define XEN_9PFS_NUM_RINGS 2
+
+/* One per ring, more than one per 9pfs share */
+struct xen_9pfs_dataring {
+ struct xen_9pfs_front_priv *priv;
+
+ struct xen_9pfs_data_intf *intf;
+ grant_ref_t ref;
+ int evtchn;
+ int irq;
+ spinlock_t lock;
+
+ void *bytes;
+ struct xen_9pfs_data ring;
+ wait_queue_head_t wq;
+ struct work_struct work;
+};
+
+/* One per 9pfs share */
+struct xen_9pfs_front_priv {
+ struct list_head list;
+ struct xenbus_device *dev;
+ char *tag;
+ struct p9_client *client;
+
+ int num_rings;
+ struct xen_9pfs_dataring *rings;
+};
+static LIST_HEAD(xen_9pfs_devs);
+
static int p9_xen_cancel(struct p9_client *client, struct p9_req_t *req)
{
return 0;
@@ -36,6 +66,21 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
return 0;
}

+static void p9_xen_response(struct work_struct *work)
+{
+}
+
+static irqreturn_t xen_9pfs_front_event_handler(int irq, void *r)
+{
+ struct xen_9pfs_dataring *ring = r;
+ BUG_ON(!ring || !ring->priv->client);
+
+ wake_up_interruptible(&ring->wq);
+ schedule_work(&ring->work);
+
+ return IRQ_HANDLED;
+}
+
static struct p9_trans_module p9_xen_trans = {
.name = "xen",
.maxsize = (1 << (XEN_9PFS_RING_ORDER + XEN_PAGE_SHIFT)),
@@ -52,25 +97,207 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
{ "" }
};

+static int xen_9pfs_front_free(struct xen_9pfs_front_priv *priv)
+{
+ int i, j;
+
+ list_del(&priv->list);
+
+ for (i = 0; i < priv->num_rings; i++) {
+ if (priv->rings[i].intf == NULL)
+ break;
+ if (priv->rings[i].irq > 0)
+ unbind_from_irqhandler(priv->rings[i].irq, priv->dev);
+ if (priv->rings[i].bytes != NULL) {
+ for (j = 0; j < (1 << XEN_9PFS_RING_ORDER); j++)
+ gnttab_end_foreign_access(priv->rings[i].intf->ref[j], 0, 0);
+ free_pages((unsigned long)priv->rings[i].bytes, XEN_9PFS_RING_ORDER);
+ }
+ gnttab_end_foreign_access(priv->rings[i].ref, 0, 0);
+ free_page((unsigned long)priv->rings[i].intf);
+ }
+ kfree(priv->rings);
+ kfree(priv);
+
+ return 0;
+}
+
static int xen_9pfs_front_remove(struct xenbus_device *dev)
{
+ int ret;
+ struct xen_9pfs_front_priv *priv = dev_get_drvdata(&dev->dev);
+
+ dev_set_drvdata(&dev->dev, NULL);
+ ret = xen_9pfs_front_free(priv);
+ return ret;
+}
+
+static int xen_9pfs_front_alloc_dataring(struct xenbus_device *dev,
+ struct xen_9pfs_dataring *ring)
+{
+ int i;
+ int ret = -ENOMEM;
+
+ init_waitqueue_head(&ring->wq);
+ spin_lock_init(&ring->lock);
+ INIT_WORK(&ring->work, p9_xen_response);
+
+ ring->intf = (struct xen_9pfs_data_intf *) __get_free_page(GFP_KERNEL | __GFP_ZERO);
+ if (!ring->intf)
+ goto error;
+ memset(ring->intf, 0, XEN_PAGE_SIZE);
+ ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO, XEN_9PFS_RING_ORDER);
+ if (ring->bytes == NULL)
+ goto error;
+ for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
+ ring->intf->ref[i] = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->bytes) + i), 0);
+ ring->ref = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->intf)), 0);
+ ring->ring.in = ring->bytes;
+ ring->ring.out = ring->bytes + XEN_9PFS_RING_SIZE;
+
+ ret = xenbus_alloc_evtchn(dev, &ring->evtchn);
+ if (ret)
+ goto error;
+ ring->irq = bind_evtchn_to_irqhandler(ring->evtchn, xen_9pfs_front_event_handler,
+ 0, "xen_9pfs-frontend", ring);
+ if (ring->irq < 0) {
+ xenbus_free_evtchn(dev, ring->evtchn);
+ ret = ring->irq;
+ goto error;
+ }
return 0;
+
+error:
+ if (ring->intf != NULL)
+ kfree(ring->intf);
+ if (ring->bytes != NULL)
+ kfree(ring->bytes);
+ return ret;
}

static int xen_9pfs_front_probe(struct xenbus_device *dev,
const struct xenbus_device_id *id)
{
+ int ret = -EFAULT, i;
+ struct xenbus_transaction xbt;
+ struct xen_9pfs_front_priv *priv = NULL;
+ char *versions;
+ unsigned int max_rings, max_ring_order, len;
+
+ versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
+ if (!len || strcmp(versions, "1"))
+ return -EINVAL;
+ kfree(versions);
+ ret = xenbus_scanf(XBT_NIL, dev->otherend, "max-rings", "%u", &max_rings);
+ if (ret < 0 || max_rings < XEN_9PFS_NUM_RINGS)
+ return -EINVAL;
+ ret = xenbus_scanf(XBT_NIL, dev->otherend, "max-ring-page-order", "%u", &max_ring_order);
+ if (ret < 0|| max_ring_order < XEN_9PFS_RING_ORDER)
+ return -EINVAL;
+
+
+ priv = kzalloc(sizeof(struct xen_9pfs_front_priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->dev = dev;
+ priv->num_rings = XEN_9PFS_NUM_RINGS;
+ priv->rings = kzalloc(sizeof(struct xen_9pfs_dataring) * priv->num_rings,
+ GFP_KERNEL);
+ if (!priv->rings) {
+ kfree(priv);
+ return -ENOMEM;
+ }
+
+ again:
+ ret = xenbus_transaction_start(&xbt);
+ if (ret) {
+ xenbus_dev_fatal(dev, ret, "starting transaction");
+ goto error;
+ }
+ ret = xenbus_printf(xbt, dev->nodename, "version", "%u", 1);
+ if (ret)
+ goto error_xenbus;
+ ret = xenbus_printf(xbt, dev->nodename, "num-rings", "%u", priv->num_rings);
+ if (ret)
+ goto error_xenbus;
+ for (i = 0; i < priv->num_rings; i++) {
+ char str[16];
+
+ priv->rings[i].priv = priv;
+ ret = xen_9pfs_front_alloc_dataring(dev, &priv->rings[i]);
+ if (ret < 0)
+ goto error_xenbus;
+
+ sprintf(str, "ring-ref%u", i);
+ ret = xenbus_printf(xbt, dev->nodename, str, "%d", priv->rings[i].ref);
+ if (ret)
+ goto error_xenbus;
+
+ sprintf(str, "event-channel-%u", i);
+ ret = xenbus_printf(xbt, dev->nodename, str, "%u", priv->rings[i].evtchn);
+ if (ret)
+ goto error_xenbus;
+ }
+ priv->tag = xenbus_read(xbt, dev->nodename, "tag", NULL);
+ if (ret)
+ goto error_xenbus;
+ ret = xenbus_transaction_end(xbt, 0);
+ if (ret) {
+ if (ret == -EAGAIN)
+ goto again;
+ xenbus_dev_fatal(dev, ret, "completing transaction");
+ goto error;
+ }
+
+
+ list_add_tail(&priv->list, &xen_9pfs_devs);
+ dev_set_drvdata(&dev->dev, priv);
+ xenbus_switch_state(dev, XenbusStateInitialised);
+
return 0;
+
+ error_xenbus:
+ xenbus_transaction_end(xbt, 1);
+ xenbus_dev_fatal(dev, ret, "writing xenstore");
+ error:
+ dev_set_drvdata(&dev->dev, NULL);
+ xen_9pfs_front_free(priv);
+ return ret;
}

static int xen_9pfs_front_resume(struct xenbus_device *dev)
{
+ dev_warn(&dev->dev, "suspsend/resume unsupported\n");
return 0;
}

static void xen_9pfs_front_changed(struct xenbus_device *dev,
enum xenbus_state backend_state)
{
+ switch (backend_state) {
+ case XenbusStateReconfiguring:
+ case XenbusStateReconfigured:
+ case XenbusStateInitialising:
+ case XenbusStateInitialised:
+ case XenbusStateUnknown:
+ break;
+
+ case XenbusStateInitWait:
+ break;
+
+ case XenbusStateConnected:
+ xenbus_switch_state(dev, XenbusStateConnected);
+ break;
+
+ case XenbusStateClosed:
+ if (dev->state == XenbusStateClosed)
+ break;
+ /* Missed the backend's CLOSING state -- fallthrough */
+ case XenbusStateClosing:
+ xenbus_frontend_closed(dev);
+ break;
+ }
}

static struct xenbus_driver xen_9pfs_front_driver = {
--
1.9.1

2017-03-06 21:36:38

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 2/7] xen: introduce the header file for the Xen 9pfs transport protocol

On Mon, 6 Mar 2017, Boris Ostrovsky wrote:
> > + uint32_t size;
> > + uint8_t id;
> > + uint16_t tag;
>
> I realize that this is in the spec now and it's probably too late to ask
> this question but wouldn't it be better if id and tag were swapped? No
> need to pack and potentially faster access to tag.

I cannot do anything about it: that struct is defined by the 9pfs
specification (not the Xen spec, the general 9pfs spec). See:

https://www.usenix.org/legacy/event/usenix05/tech/freenix/full_papers/hensbergen/hensbergen.pdf

> > +} __attribute__((packed));
> > +
> > +#define XEN_9PFS_RING_ORDER 6
> > +#define XEN_9PFS_RING_SIZE XEN_FLEX_RING_SIZE(XEN_9PFS_RING_ORDER)
> > +DEFINE_XEN_FLEX_RING_AND_INTF(xen_9pfs);
> > +
> > +#endif
>

2017-03-06 21:41:57

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 2/7] xen: introduce the header file for the Xen 9pfs transport protocol

On 03/06/2017 04:36 PM, Stefano Stabellini wrote:
> On Mon, 6 Mar 2017, Boris Ostrovsky wrote:
>>> + uint32_t size;
>>> + uint8_t id;
>>> + uint16_t tag;
>> I realize that this is in the spec now and it's probably too late to ask
>> this question but wouldn't it be better if id and tag were swapped? No
>> need to pack and potentially faster access to tag.
> I cannot do anything about it: that struct is defined by the 9pfs
> specification (not the Xen spec, the general 9pfs spec). See:

Oh, I thought it was Xen-specific (because it's described in
9pfs.markdown). Nevermind then.

-boris

>
> https://www.usenix.org/legacy/event/usenix05/tech/freenix/full_papers/hensbergen/hensbergen.pdf
>
>>> +} __attribute__((packed));
>>> +
>>> +#define XEN_9PFS_RING_ORDER 6
>>> +#define XEN_9PFS_RING_SIZE XEN_FLEX_RING_SIZE(XEN_9PFS_RING_ORDER)
>>> +DEFINE_XEN_FLEX_RING_AND_INTF(xen_9pfs);
>>> +
>>> +#endif

2017-03-06 22:32:43

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 2/7] xen: introduce the header file for the Xen 9pfs transport protocol

> +struct xen_9pfs_header {
> + uint32_t size;
> + uint8_t id;
> + uint16_t tag;

I realize that this is in the spec now and it's probably too late to ask
this question but wouldn't it be better if id and tag were swapped? No
need to pack and potentially faster access to tag.

-boris

> +} __attribute__((packed));
> +
> +#define XEN_9PFS_RING_ORDER 6
> +#define XEN_9PFS_RING_SIZE XEN_FLEX_RING_SIZE(XEN_9PFS_RING_ORDER)
> +DEFINE_XEN_FLEX_RING_AND_INTF(xen_9pfs);
> +
> +#endif

2017-03-07 15:04:11

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 4/7] xen/9pfs: connect to the backend

>
> +static int xen_9pfs_front_free(struct xen_9pfs_front_priv *priv)
> +{
> + int i, j;
> +
> + list_del(&priv->list);
> +
> + for (i = 0; i < priv->num_rings; i++) {
> + if (priv->rings[i].intf == NULL)
> + break;

Are we guaranteed that all subsequent entries are not allocated (i.e.
this shouldn't be 'continue')?

> + if (priv->rings[i].irq > 0)
> + unbind_from_irqhandler(priv->rings[i].irq, priv->dev);
> + if (priv->rings[i].bytes != NULL) {
> + for (j = 0; j < (1 << XEN_9PFS_RING_ORDER); j++)
> + gnttab_end_foreign_access(priv->rings[i].intf->ref[j], 0, 0);
> + free_pages((unsigned long)priv->rings[i].bytes, XEN_9PFS_RING_ORDER);
> + }
> + gnttab_end_foreign_access(priv->rings[i].ref, 0, 0);
> + free_page((unsigned long)priv->rings[i].intf);
> + }
> + kfree(priv->rings);
> + kfree(priv);
> +
> + return 0;
> +}
> +
> static int xen_9pfs_front_remove(struct xenbus_device *dev)
> {
> + int ret;
> + struct xen_9pfs_front_priv *priv = dev_get_drvdata(&dev->dev);
> +
> + dev_set_drvdata(&dev->dev, NULL);
> + ret = xen_9pfs_front_free(priv);
> + return ret;
> +}
> +
> +static int xen_9pfs_front_alloc_dataring(struct xenbus_device *dev,
> + struct xen_9pfs_dataring *ring)
> +{
> + int i;
> + int ret = -ENOMEM;
> +
> + init_waitqueue_head(&ring->wq);
> + spin_lock_init(&ring->lock);
> + INIT_WORK(&ring->work, p9_xen_response);
> +
> + ring->intf = (struct xen_9pfs_data_intf *) __get_free_page(GFP_KERNEL | __GFP_ZERO);
> + if (!ring->intf)
> + goto error;
> + memset(ring->intf, 0, XEN_PAGE_SIZE);

get_zeroed_page()? (especially given that __get_free_page() returns
PAGE_SIZE, not XEN_PAGE_SIZE)

> + ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO, XEN_9PFS_RING_ORDER);
> + if (ring->bytes == NULL)
> + goto error;
> + for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
> + ring->intf->ref[i] = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->bytes) + i), 0);
> + ring->ref = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->intf)), 0);
> + ring->ring.in = ring->bytes;
> + ring->ring.out = ring->bytes + XEN_9PFS_RING_SIZE;
> +
> + ret = xenbus_alloc_evtchn(dev, &ring->evtchn);
> + if (ret)
> + goto error;
> + ring->irq = bind_evtchn_to_irqhandler(ring->evtchn, xen_9pfs_front_event_handler,
> + 0, "xen_9pfs-frontend", ring);
> + if (ring->irq < 0) {
> + xenbus_free_evtchn(dev, ring->evtchn);
> + ret = ring->irq;
> + goto error;
> + }
> return 0;
> +
> +error:

You may need to gnttab_end_foreign_access().

> + if (ring->intf != NULL)
> + kfree(ring->intf);
> + if (ring->bytes != NULL)
> + kfree(ring->bytes);
> + return ret;
> }
>
> static int xen_9pfs_front_probe(struct xenbus_device *dev,
> const struct xenbus_device_id *id)
> {
> + int ret = -EFAULT, i;

Unnecessary initialization.

> + struct xenbus_transaction xbt;
> + struct xen_9pfs_front_priv *priv = NULL;
> + char *versions;
> + unsigned int max_rings, max_ring_order, len;
> +
> + versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
> + if (!len || strcmp(versions, "1"))
> + return -EINVAL;
> + kfree(versions);
> + ret = xenbus_scanf(XBT_NIL, dev->otherend, "max-rings", "%u", &max_rings);
> + if (ret < 0 || max_rings < XEN_9PFS_NUM_RINGS)
> + return -EINVAL;
> + ret = xenbus_scanf(XBT_NIL, dev->otherend, "max-ring-page-order", "%u", &max_ring_order);
> + if (ret < 0|| max_ring_order < XEN_9PFS_RING_ORDER)
> + return -EINVAL;
> +
> +
> + priv = kzalloc(sizeof(struct xen_9pfs_front_priv), GFP_KERNEL);
> + if (!priv)
> + return -ENOMEM;
> +
> + priv->dev = dev;
> + priv->num_rings = XEN_9PFS_NUM_RINGS;
> + priv->rings = kzalloc(sizeof(struct xen_9pfs_dataring) * priv->num_rings,
> + GFP_KERNEL);
> + if (!priv->rings) {
> + kfree(priv);
> + return -ENOMEM;
> + }
> +
> + again:
> + ret = xenbus_transaction_start(&xbt);
> + if (ret) {
> + xenbus_dev_fatal(dev, ret, "starting transaction");
> + goto error;
> + }
> + ret = xenbus_printf(xbt, dev->nodename, "version", "%u", 1);
> + if (ret)
> + goto error_xenbus;
> + ret = xenbus_printf(xbt, dev->nodename, "num-rings", "%u", priv->num_rings);
> + if (ret)
> + goto error_xenbus;
> + for (i = 0; i < priv->num_rings; i++) {
> + char str[16];
> +
> + priv->rings[i].priv = priv;
> + ret = xen_9pfs_front_alloc_dataring(dev, &priv->rings[i]);

Not for -EAGAIN, I think.

-boris

> + if (ret < 0)
> + goto error_xenbus;
> +
> + sprintf(str, "ring-ref%u", i);
> + ret = xenbus_printf(xbt, dev->nodename, str, "%d", priv->rings[i].ref);
> + if (ret)
> + goto error_xenbus;
> +
> + sprintf(str, "event-channel-%u", i);
> + ret = xenbus_printf(xbt, dev->nodename, str, "%u", priv->rings[i].evtchn);
> + if (ret)
> + goto error_xenbus;
> + }
> + priv->tag = xenbus_read(xbt, dev->nodename, "tag", NULL);
> + if (ret)
> + goto error_xenbus;
> + ret = xenbus_transaction_end(xbt, 0);
> + if (ret) {
> + if (ret == -EAGAIN)
> + goto again;
> + xenbus_dev_fatal(dev, ret, "completing transaction");
> + goto error;
> + }
> +
> +
> + list_add_tail(&priv->list, &xen_9pfs_devs);
> + dev_set_drvdata(&dev->dev, priv);
> + xenbus_switch_state(dev, XenbusStateInitialised);
> +
> return 0;
> +
> + error_xenbus:
> + xenbus_transaction_end(xbt, 1);
> + xenbus_dev_fatal(dev, ret, "writing xenstore");
> + error:
> + dev_set_drvdata(&dev->dev, NULL);
> + xen_9pfs_front_free(priv);
> + return ret;
> }
>

2017-03-07 15:51:23

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 6/7] xen/9pfs: receive responses

On 03/06/2017 03:01 PM, Stefano Stabellini wrote:
> Upon receiving a notification from the backend, schedule the
> p9_xen_response work_struct. p9_xen_response checks if any responses are
> available, if so, it reads them one by one, calling p9_client_cb to send
> them up to the 9p layer (p9_client_cb completes the request). Handle the
> ring following the Xen 9pfs specification.
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: Eric Van Hensbergen <[email protected]>
> CC: Ron Minnich <[email protected]>
> CC: Latchesar Ionkov <[email protected]>
> CC: [email protected]
> ---
> net/9p/trans_xen.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index 4e26556..1ca9246 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -149,6 +149,59 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
>
> static void p9_xen_response(struct work_struct *work)
> {
> + struct xen_9pfs_front_priv *priv;
> + struct xen_9pfs_dataring *ring;
> + RING_IDX cons, prod, masked_cons, masked_prod;
> + struct xen_9pfs_header h;
> + struct p9_req_t *req;
> + int status = REQ_STATUS_ERROR;

Doesn't this need to go inside the loop?

> +
> + ring = container_of(work, struct xen_9pfs_dataring, work);
> + priv = ring->priv;
> +
> + while (1) {
> + cons = ring->intf->in_cons;
> + prod = ring->intf->in_prod;
> + rmb();

Is this rmb() or mb()? (Or, in fact, virt_XXX()?) You used mb() in the
previous patch.

> +
> + if (xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < sizeof(h)) {
> + notify_remote_via_irq(ring->irq);
> + return;
> + }
> +
> + masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> +
> + xen_9pfs_read_packet(ring->ring.in,
> + masked_prod, &masked_cons,
> + XEN_9PFS_RING_SIZE, &h, sizeof(h));
> +
> + req = p9_tag_lookup(priv->client, h.tag);
> + if (!req || req->status != REQ_STATUS_SENT) {
> + dev_warn(&priv->dev->dev, "Wrong req tag=%x\n", h.tag);
> + cons += h.size;
> + mb();
> + ring->intf->in_cons = cons;
> + continue;

I don't know what xen_9pfs_read_packet() does so perhaps it's done there
but shouldn't the pointers be updated regardless of the 'if' condition?

-boris

> + }
> +
> + memcpy(req->rc, &h, sizeof(h));
> + req->rc->offset = 0;
> +
> + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> + xen_9pfs_read_packet(ring->ring.in,
> + masked_prod, &masked_cons,
> + XEN_9PFS_RING_SIZE, req->rc->sdata, h.size);
> +
> + mb();
> + cons += h.size;
> + ring->intf->in_cons = cons;
> +
> + if (req->status != REQ_STATUS_ERROR)
> + status = REQ_STATUS_RCVD;
> +
> + p9_client_cb(priv->client, req, status);
> + }
> }
>
> static irqreturn_t xen_9pfs_front_event_handler(int irq, void *r)
>

2017-03-07 16:30:57

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 5/7] xen/9pfs: send requests to the backend

On 03/06/2017 03:01 PM, Stefano Stabellini wrote:
> Implement struct p9_trans_module create and close functions by looking
> at the available Xen 9pfs frontend-backend connections. We don't expect
> many frontend-backend connections, thus walking a list is OK.
>
> Send requests to the backend by copying each request to one of the
> available rings (each frontend-backend connection comes with multiple
> rings). Handle the ring and notifications following the 9pfs
> specification. If there are not enough free bytes on the ring for the
> request, wait on the wait_queue: the backend will send a notification
> after consuming more requests.
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: Eric Van Hensbergen <[email protected]>
> CC: Ron Minnich <[email protected]>
> CC: Latchesar Ionkov <[email protected]>
> CC: [email protected]
> ---
> net/9p/trans_xen.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index 9f6cf8d..4e26556 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -47,22 +47,103 @@ struct xen_9pfs_front_priv {
> };
> static LIST_HEAD(xen_9pfs_devs);
>
> +/* We don't currently allow canceling of requests */
> static int p9_xen_cancel(struct p9_client *client, struct p9_req_t *req)
> {
> - return 0;
> + return 1;
> }
>
> static int p9_xen_create(struct p9_client *client, const char *addr, char *args)
> {
> + struct xen_9pfs_front_priv *priv = NULL;
> +
> + list_for_each_entry(priv, &xen_9pfs_devs, list) {
> + if (!strcmp(priv->tag, addr))
> + break;
> + }

You could simplify this (and p9_xen_close()) but assigning client and
returning from inside the 'if' statement.

I am also not sure you need to initialize priv.

> + if (!priv || strcmp(priv->tag, addr))
> + return -EINVAL;
> +
> + priv->client = client;
> return 0;
> }
>
> static void p9_xen_close(struct p9_client *client)
> {
> + struct xen_9pfs_front_priv *priv = NULL;
> +
> + list_for_each_entry(priv, &xen_9pfs_devs, list) {
> + if (priv->client == client)
> + break;
> + }
> + if (!priv || priv->client != client)
> + return;
> +
> + priv->client = NULL;
> + return;
> +}
> +
> +static int p9_xen_write_todo(struct xen_9pfs_dataring *ring, RING_IDX size)
> +{
> + RING_IDX cons, prod;
> +
> + cons = ring->intf->out_cons;
> + prod = ring->intf->out_prod;
> + mb();
> +
> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) >= size)
> + return 1;
> + else
> + return 0;
> }
>
> static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> {
> + struct xen_9pfs_front_priv *priv = NULL;
> + RING_IDX cons, prod, masked_cons, masked_prod;
> + unsigned long flags;
> + uint32_t size = p9_req->tc->size;
> + struct xen_9pfs_dataring *ring;
> + int num;
> +
> + list_for_each_entry(priv, &xen_9pfs_devs, list) {
> + if (priv->client == client)
> + break;
> + }
> + if (priv == NULL || priv->client != client)
> + return -EINVAL;
> +
> + num = p9_req->tc->tag % priv->num_rings;
> + ring = &priv->rings[num];
> +
> +again:
> + while (wait_event_interruptible(ring->wq,
> + p9_xen_write_todo(ring, size) > 0) != 0);
> +
> + spin_lock_irqsave(&ring->lock, flags);
> + cons = ring->intf->out_cons;
> + prod = ring->intf->out_prod;
> + mb();
> +
> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < size) {

This looks like p9_xen_write_todo(). BTW, where is xen_9pfs_queued()
defined? I couldn't find it. Same for xen_9pfs_mask() and
xen_9pfs_write_packet().

-boris

> + spin_unlock_irqrestore(&ring->lock, flags);
> + goto again;
> + }
> +
> + masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> +
> + xen_9pfs_write_packet(ring->ring.out,
> + &masked_prod, masked_cons,
> + XEN_9PFS_RING_SIZE, p9_req->tc->sdata, size);
> +
> + p9_req->status = REQ_STATUS_SENT;
> + wmb(); /* write ring before updating pointer */
> + prod += size;
> + ring->intf->out_prod = prod;
> + spin_unlock_irqrestore(&ring->lock, flags);
> + notify_remote_via_irq(ring->irq);
> +
> return 0;
> }
>
>

2017-03-07 16:40:28

by Roger Pau Monne

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 0/7] Xen transport for 9pfs frontend driver

On Mon, Mar 06, 2017 at 12:00:41PM -0800, Stefano Stabellini wrote:
> Hi all,
>
> This patch series implements a new transport for 9pfs, aimed at Xen
> systems.
>
> The transport is based on a traditional Xen frontend and backend drivers
> pair. This patch series implements the frontend, which typically runs in
> a regular unprivileged guest.
>
> I'll follow up with another series that implements the backend in
> userspace in QEMU, which typically runs in Dom0 (but could also run in
> a another guest).
>
> The frontend complies to the Xen transport for 9pfs specification
> version 1, available here:
>
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob_plain;f=docs/misc/9pfs.markdown;hb=HEAD

Kind of tangential to this series, but maybe it would make sense to implement
this transport in a fuse based 9pfs driver? I see there are already several
fuse-9pfs implementations around. Something for a GSoC/Outreach project?

Roger.

2017-03-07 18:28:06

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 0/7] Xen transport for 9pfs frontend driver

On Tue, 7 Mar 2017, Roger Pau Monné wrote:
> On Mon, Mar 06, 2017 at 12:00:41PM -0800, Stefano Stabellini wrote:
> > Hi all,
> >
> > This patch series implements a new transport for 9pfs, aimed at Xen
> > systems.
> >
> > The transport is based on a traditional Xen frontend and backend drivers
> > pair. This patch series implements the frontend, which typically runs in
> > a regular unprivileged guest.
> >
> > I'll follow up with another series that implements the backend in
> > userspace in QEMU, which typically runs in Dom0 (but could also run in
> > a another guest).
> >
> > The frontend complies to the Xen transport for 9pfs specification
> > version 1, available here:
> >
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=blob_plain;f=docs/misc/9pfs.markdown;hb=HEAD
>
> Kind of tangential to this series, but maybe it would make sense to implement
> this transport in a fuse based 9pfs driver? I see there are already several
> fuse-9pfs implementations around. Something for a GSoC/Outreach project?

Sure. Additionally, with open source frontends and backends already
available, it should be easier to code. I am happy to co-mentor the
project with you, if you feel like it.

2017-03-07 18:54:43

by Julien Grall

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 4/7] xen/9pfs: connect to the backend

Hi Stefano,

On 03/06/2017 08:01 PM, Stefano Stabellini wrote:
> +static int xen_9pfs_front_alloc_dataring(struct xenbus_device *dev,
> + struct xen_9pfs_dataring *ring)
> +{
> + int i;
> + int ret = -ENOMEM;
> +
> + init_waitqueue_head(&ring->wq);
> + spin_lock_init(&ring->lock);
> + INIT_WORK(&ring->work, p9_xen_response);
> +
> + ring->intf = (struct xen_9pfs_data_intf *) __get_free_page(GFP_KERNEL | __GFP_ZERO);
> + if (!ring->intf)
> + goto error;
> + memset(ring->intf, 0, XEN_PAGE_SIZE);
> + ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO, XEN_9PFS_RING_ORDER);

The ring order will be in term of Xen page size and not Linux. So you
are going to allocate much more memory than expected on 64KB kernel.

> + if (ring->bytes == NULL)
> + goto error;
> + for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
> + ring->intf->ref[i] = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->bytes) + i), 0);.

Please use virt_to_gfn rather than pfn_to_gfn(virt_to_pfn).

Also, this is not going to work on 64K kernel because you will grant
access to noncontiguous memory (e.g 0-4K, 64K-68K,...).

We have various helper to break-down the page for you, see
gnttab_for_one_grant, gnttab_foreach_grant, gnttab_count_grant,
xen_for_each_gfn (though this one it is internal to xlate_mmu.c so far)

Please use them to avoid any further.

> + ring->ref = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->intf)), 0);

Please use virt_to_gfn rather than pfn_to_gfn(virt_to_pfn).

> + ring->ring.in = ring->bytes;
> + ring->ring.out = ring->bytes + XEN_9PFS_RING_SIZE;
> +
> + ret = xenbus_alloc_evtchn(dev, &ring->evtchn);
> + if (ret)
> + goto error;
> + ring->irq = bind_evtchn_to_irqhandler(ring->evtchn, xen_9pfs_front_event_handler,
> + 0, "xen_9pfs-frontend", ring);
> + if (ring->irq < 0) {
> + xenbus_free_evtchn(dev, ring->evtchn);
> + ret = ring->irq;
> + goto error;
> + }
> return 0;
> +
> +error:
> + if (ring->intf != NULL)
> + kfree(ring->intf);
> + if (ring->bytes != NULL)
> + kfree(ring->bytes);
> + return ret;
> }

Cheers,

--
Julien Grall

2017-03-07 19:12:57

by Julien Grall

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 1/7] xen: import new ring macros in ring.h

Hi Stefano,

On 03/06/2017 08:01 PM, Stefano Stabellini wrote:
> Sync the ring.h file with upstream Xen, to introduce the new ring macros.
> They will be used by the Xen transport for 9pfs.
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
>
> ---
> NB: The new macros have not been committed to Xen yet. Do not apply this
> patch until they do.
> ---
> ---
> include/xen/interface/io/ring.h | 131 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 131 insertions(+)
>
> diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
> index 21f4fbd..e16aa92 100644
> --- a/include/xen/interface/io/ring.h
> +++ b/include/xen/interface/io/ring.h

[...]

> +#define XEN_FLEX_RING_SIZE(order) \
> + (1UL << (order + PAGE_SHIFT - 1))

This will need to be XEN_PAGE_SHIFT in order to works with 64K kernel.

Cheers,

--
Julien Grall

2017-03-08 00:21:26

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 4/7] xen/9pfs: connect to the backend

On Tue, 7 Mar 2017, Boris Ostrovsky wrote:
> > +static int xen_9pfs_front_free(struct xen_9pfs_front_priv *priv)
> > +{
> > + int i, j;
> > +
> > + list_del(&priv->list);
> > +
> > + for (i = 0; i < priv->num_rings; i++) {
> > + if (priv->rings[i].intf == NULL)
> > + break;
>
> Are we guaranteed that all subsequent entries are not allocated (i.e.
> this shouldn't be 'continue')?

Yes, we are guaranteed that all subsequent entries are NULL because they
are allocated in order until an error occurs.

> > + if (priv->rings[i].irq > 0)
> > + unbind_from_irqhandler(priv->rings[i].irq, priv->dev);
> > + if (priv->rings[i].bytes != NULL) {
> > + for (j = 0; j < (1 << XEN_9PFS_RING_ORDER); j++)
> > + gnttab_end_foreign_access(priv->rings[i].intf->ref[j], 0, 0);
> > + free_pages((unsigned long)priv->rings[i].bytes, XEN_9PFS_RING_ORDER);
> > + }
> > + gnttab_end_foreign_access(priv->rings[i].ref, 0, 0);
> > + free_page((unsigned long)priv->rings[i].intf);
> > + }
> > + kfree(priv->rings);
> > + kfree(priv);
> > +
> > + return 0;
> > +}
> > +
> > static int xen_9pfs_front_remove(struct xenbus_device *dev)
> > {
> > + int ret;
> > + struct xen_9pfs_front_priv *priv = dev_get_drvdata(&dev->dev);
> > +
> > + dev_set_drvdata(&dev->dev, NULL);
> > + ret = xen_9pfs_front_free(priv);
> > + return ret;
> > +}
> > +
> > +static int xen_9pfs_front_alloc_dataring(struct xenbus_device *dev,
> > + struct xen_9pfs_dataring *ring)
> > +{
> > + int i;
> > + int ret = -ENOMEM;
> > +
> > + init_waitqueue_head(&ring->wq);
> > + spin_lock_init(&ring->lock);
> > + INIT_WORK(&ring->work, p9_xen_response);
> > +
> > + ring->intf = (struct xen_9pfs_data_intf *) __get_free_page(GFP_KERNEL | __GFP_ZERO);
> > + if (!ring->intf)
> > + goto error;
> > + memset(ring->intf, 0, XEN_PAGE_SIZE);
>
> get_zeroed_page()? (especially given that __get_free_page() returns
> PAGE_SIZE, not XEN_PAGE_SIZE)

Yes, good point.

> > + ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO, XEN_9PFS_RING_ORDER);
> > + if (ring->bytes == NULL)
> > + goto error;
> > + for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
> > + ring->intf->ref[i] = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->bytes) + i), 0);
> > + ring->ref = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->intf)), 0);
> > + ring->ring.in = ring->bytes;
> > + ring->ring.out = ring->bytes + XEN_9PFS_RING_SIZE;
> > +
> > + ret = xenbus_alloc_evtchn(dev, &ring->evtchn);
> > + if (ret)
> > + goto error;
> > + ring->irq = bind_evtchn_to_irqhandler(ring->evtchn, xen_9pfs_front_event_handler,
> > + 0, "xen_9pfs-frontend", ring);
> > + if (ring->irq < 0) {
> > + xenbus_free_evtchn(dev, ring->evtchn);
> > + ret = ring->irq;
> > + goto error;
> > + }
> > return 0;
> > +
> > +error:
>
> You may need to gnttab_end_foreign_access().

Actually this error path is unnecessary because it will be handled by
xen_9pfs_front_probe, that calls xen_9pfs_front_free on errors. I'll
remove it.

> > + if (ring->intf != NULL)
> > + kfree(ring->intf);
> > + if (ring->bytes != NULL)
> > + kfree(ring->bytes);
> > + return ret;
> > }
> >
> > static int xen_9pfs_front_probe(struct xenbus_device *dev,
> > const struct xenbus_device_id *id)
> > {
> > + int ret = -EFAULT, i;
>
> Unnecessary initialization.

I'll remove it.

> > + struct xenbus_transaction xbt;
> > + struct xen_9pfs_front_priv *priv = NULL;
> > + char *versions;
> > + unsigned int max_rings, max_ring_order, len;
> > +
> > + versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
> > + if (!len || strcmp(versions, "1"))
> > + return -EINVAL;
> > + kfree(versions);
> > + ret = xenbus_scanf(XBT_NIL, dev->otherend, "max-rings", "%u", &max_rings);
> > + if (ret < 0 || max_rings < XEN_9PFS_NUM_RINGS)
> > + return -EINVAL;
> > + ret = xenbus_scanf(XBT_NIL, dev->otherend, "max-ring-page-order", "%u", &max_ring_order);
> > + if (ret < 0|| max_ring_order < XEN_9PFS_RING_ORDER)
> > + return -EINVAL;
> > +
> > +
> > + priv = kzalloc(sizeof(struct xen_9pfs_front_priv), GFP_KERNEL);
> > + if (!priv)
> > + return -ENOMEM;
> > +
> > + priv->dev = dev;
> > + priv->num_rings = XEN_9PFS_NUM_RINGS;
> > + priv->rings = kzalloc(sizeof(struct xen_9pfs_dataring) * priv->num_rings,
> > + GFP_KERNEL);
> > + if (!priv->rings) {
> > + kfree(priv);
> > + return -ENOMEM;
> > + }
> > +
> > + again:
> > + ret = xenbus_transaction_start(&xbt);
> > + if (ret) {
> > + xenbus_dev_fatal(dev, ret, "starting transaction");
> > + goto error;
> > + }
> > + ret = xenbus_printf(xbt, dev->nodename, "version", "%u", 1);
> > + if (ret)
> > + goto error_xenbus;
> > + ret = xenbus_printf(xbt, dev->nodename, "num-rings", "%u", priv->num_rings);
> > + if (ret)
> > + goto error_xenbus;
> > + for (i = 0; i < priv->num_rings; i++) {
> > + char str[16];
> > +
> > + priv->rings[i].priv = priv;
> > + ret = xen_9pfs_front_alloc_dataring(dev, &priv->rings[i]);
>
> Not for -EAGAIN, I think.

I don't think xen_9pfs_front_alloc_dataring can return EAGAIN. EAGAIN
can only come from xenbus_transaction_end, the case we handle below.

> > + if (ret < 0)
> > + goto error_xenbus;
> > +
> > + sprintf(str, "ring-ref%u", i);
> > + ret = xenbus_printf(xbt, dev->nodename, str, "%d", priv->rings[i].ref);
> > + if (ret)
> > + goto error_xenbus;
> > +
> > + sprintf(str, "event-channel-%u", i);
> > + ret = xenbus_printf(xbt, dev->nodename, str, "%u", priv->rings[i].evtchn);
> > + if (ret)
> > + goto error_xenbus;
> > + }
> > + priv->tag = xenbus_read(xbt, dev->nodename, "tag", NULL);
> > + if (ret)
> > + goto error_xenbus;
> > + ret = xenbus_transaction_end(xbt, 0);
> > + if (ret) {
> > + if (ret == -EAGAIN)
> > + goto again;
> > + xenbus_dev_fatal(dev, ret, "completing transaction");
> > + goto error;
> > + }
> > +
> > +
> > + list_add_tail(&priv->list, &xen_9pfs_devs);
> > + dev_set_drvdata(&dev->dev, priv);
> > + xenbus_switch_state(dev, XenbusStateInitialised);
> > +
> > return 0;
> > +
> > + error_xenbus:
> > + xenbus_transaction_end(xbt, 1);
> > + xenbus_dev_fatal(dev, ret, "writing xenstore");
> > + error:
> > + dev_set_drvdata(&dev->dev, NULL);
> > + xen_9pfs_front_free(priv);
> > + return ret;
> > }
> >
>

2017-03-08 00:21:24

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 1/7] xen: import new ring macros in ring.h

On Tue, 7 Mar 2017, Julien Grall wrote:
> Hi Stefano,
>
> On 03/06/2017 08:01 PM, Stefano Stabellini wrote:
> > Sync the ring.h file with upstream Xen, to introduce the new ring macros.
> > They will be used by the Xen transport for 9pfs.
> >
> > Signed-off-by: Stefano Stabellini <[email protected]>
> > CC: [email protected]
> > CC: [email protected]
> > CC: [email protected]
> >
> > ---
> > NB: The new macros have not been committed to Xen yet. Do not apply this
> > patch until they do.
> > ---
> > ---
> > include/xen/interface/io/ring.h | 131
> > ++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 131 insertions(+)
> >
> > diff --git a/include/xen/interface/io/ring.h
> > b/include/xen/interface/io/ring.h
> > index 21f4fbd..e16aa92 100644
> > --- a/include/xen/interface/io/ring.h
> > +++ b/include/xen/interface/io/ring.h
>
> [...]
>
> > +#define XEN_FLEX_RING_SIZE(order)
> > \
> > + (1UL << (order + PAGE_SHIFT - 1))
>
> This will need to be XEN_PAGE_SHIFT in order to works with 64K kernel.

Good point! I had it right at the beginning but I lost the change in one
of the updates from xen.git.

2017-03-08 00:56:15

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 5/7] xen/9pfs: send requests to the backend

On Tue, 7 Mar 2017, Boris Ostrovsky wrote:
> On 03/06/2017 03:01 PM, Stefano Stabellini wrote:
> > Implement struct p9_trans_module create and close functions by looking
> > at the available Xen 9pfs frontend-backend connections. We don't expect
> > many frontend-backend connections, thus walking a list is OK.
> >
> > Send requests to the backend by copying each request to one of the
> > available rings (each frontend-backend connection comes with multiple
> > rings). Handle the ring and notifications following the 9pfs
> > specification. If there are not enough free bytes on the ring for the
> > request, wait on the wait_queue: the backend will send a notification
> > after consuming more requests.
> >
> > Signed-off-by: Stefano Stabellini <[email protected]>
> > CC: [email protected]
> > CC: [email protected]
> > CC: Eric Van Hensbergen <[email protected]>
> > CC: Ron Minnich <[email protected]>
> > CC: Latchesar Ionkov <[email protected]>
> > CC: [email protected]
> > ---
> > net/9p/trans_xen.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 82 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> > index 9f6cf8d..4e26556 100644
> > --- a/net/9p/trans_xen.c
> > +++ b/net/9p/trans_xen.c
> > @@ -47,22 +47,103 @@ struct xen_9pfs_front_priv {
> > };
> > static LIST_HEAD(xen_9pfs_devs);
> >
> > +/* We don't currently allow canceling of requests */
> > static int p9_xen_cancel(struct p9_client *client, struct p9_req_t *req)
> > {
> > - return 0;
> > + return 1;
> > }
> >
> > static int p9_xen_create(struct p9_client *client, const char *addr, char *args)
> > {
> > + struct xen_9pfs_front_priv *priv = NULL;
> > +
> > + list_for_each_entry(priv, &xen_9pfs_devs, list) {
> > + if (!strcmp(priv->tag, addr))
> > + break;
> > + }
>
>
> You could simplify this (and p9_xen_close()) but assigning client and
> returning from inside the 'if' statement.

I'll do that.

> I am also not sure you need to initialize priv.

With the new changes, I won't need to.

> > + if (!priv || strcmp(priv->tag, addr))
> > + return -EINVAL;
> > +
> > + priv->client = client;
> > return 0;
> > }
> >
> > static void p9_xen_close(struct p9_client *client)
> > {
> > + struct xen_9pfs_front_priv *priv = NULL;
> > +
> > + list_for_each_entry(priv, &xen_9pfs_devs, list) {
> > + if (priv->client == client)
> > + break;
> > + }
> > + if (!priv || priv->client != client)
> > + return;
> > +
> > + priv->client = NULL;
> > + return;
> > +}
> > +
> > +static int p9_xen_write_todo(struct xen_9pfs_dataring *ring, RING_IDX size)
> > +{
> > + RING_IDX cons, prod;
> > +
> > + cons = ring->intf->out_cons;
> > + prod = ring->intf->out_prod;
> > + mb();
> > +
> > + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) >= size)
> > + return 1;
> > + else
> > + return 0;
> > }
> >
> > static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> > {
> > + struct xen_9pfs_front_priv *priv = NULL;
> > + RING_IDX cons, prod, masked_cons, masked_prod;
> > + unsigned long flags;
> > + uint32_t size = p9_req->tc->size;
> > + struct xen_9pfs_dataring *ring;
> > + int num;
> > +
> > + list_for_each_entry(priv, &xen_9pfs_devs, list) {
> > + if (priv->client == client)
> > + break;
> > + }
> > + if (priv == NULL || priv->client != client)
> > + return -EINVAL;
> > +
> > + num = p9_req->tc->tag % priv->num_rings;
> > + ring = &priv->rings[num];
> > +
> > +again:
> > + while (wait_event_interruptible(ring->wq,
> > + p9_xen_write_todo(ring, size) > 0) != 0);
> > +
> > + spin_lock_irqsave(&ring->lock, flags);
> > + cons = ring->intf->out_cons;
> > + prod = ring->intf->out_prod;
> > + mb();
> > +
> > + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < size) {
>
>
> This looks like p9_xen_write_todo().

p9_xen_write_todo is just a wrapper around xen_9pfs_queued to provide
a return value that works well with wait_event_interruptible.

I would prefer not to call p9_xen_write_todo here, because it's simpler
if we don't read prod and cons twice.

> BTW, where is xen_9pfs_queued()
> defined? I couldn't find it. Same for xen_9pfs_mask() and
> xen_9pfs_write_packet().

They are provided by the new ring macros, see
include/xen/interface/io/ring.h (the first patch).

> > + spin_unlock_irqrestore(&ring->lock, flags);
> > + goto again;
> > + }
> > +
> > + masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> > + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> > +
> > + xen_9pfs_write_packet(ring->ring.out,
> > + &masked_prod, masked_cons,
> > + XEN_9PFS_RING_SIZE, p9_req->tc->sdata, size);
> > +
> > + p9_req->status = REQ_STATUS_SENT;
> > + wmb(); /* write ring before updating pointer */
> > + prod += size;
> > + ring->intf->out_prod = prod;
> > + spin_unlock_irqrestore(&ring->lock, flags);
> > + notify_remote_via_irq(ring->irq);
> > +
> > return 0;
> > }
> >
> >
>

2017-03-08 01:22:12

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 6/7] xen/9pfs: receive responses

On Tue, 7 Mar 2017, Stefano Stabellini wrote:
> > > +
> > > + ring = container_of(work, struct xen_9pfs_dataring, work);
> > > + priv = ring->priv;
> > > +
> > > + while (1) {
> > > + cons = ring->intf->in_cons;
> > > + prod = ring->intf->in_prod;
> > > + rmb();
> >
> >
> > Is this rmb() or mb()? (Or, in fact, virt_XXX()?) You used mb() in the
> > previous patch.
>
> I think they should all be virt_XXX, thanks.

regarding mb() vs. rmb(), give a look at the workflow at the end of
docs/misc/9pfs.markdown, under "Ring Usage".

2017-03-08 02:17:23

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 4/7] xen/9pfs: connect to the backend

On Tue, 7 Mar 2017, Julien Grall wrote:
> Hi Stefano,
>
> On 03/06/2017 08:01 PM, Stefano Stabellini wrote:
> > +static int xen_9pfs_front_alloc_dataring(struct xenbus_device *dev,
> > + struct xen_9pfs_dataring *ring)
> > +{
> > + int i;
> > + int ret = -ENOMEM;
> > +
> > + init_waitqueue_head(&ring->wq);
> > + spin_lock_init(&ring->lock);
> > + INIT_WORK(&ring->work, p9_xen_response);
> > +
> > + ring->intf = (struct xen_9pfs_data_intf *) __get_free_page(GFP_KERNEL
> > | __GFP_ZERO);
> > + if (!ring->intf)
> > + goto error;
> > + memset(ring->intf, 0, XEN_PAGE_SIZE);
> > + ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> > XEN_9PFS_RING_ORDER);
>
> The ring order will be in term of Xen page size and not Linux. So you are
> going to allocate much more memory than expected on 64KB kernel.

I'll fix.

> > + if (ring->bytes == NULL)
> > + goto error;
> > + for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
> > + ring->intf->ref[i] =
> > gnttab_grant_foreign_access(dev->otherend_id,
> > pfn_to_gfn(virt_to_pfn((void*)ring->bytes) + i), 0);.
>
> Please use virt_to_gfn rather than pfn_to_gfn(virt_to_pfn).

OK

> Also, this is not going to work on 64K kernel because you will grant access to
> noncontiguous memory (e.g 0-4K, 64K-68K,...).

By using virt_to_gfn like you suggested, the loop will correctly iterate
on a 4K by 4K basis, even on a 64K kernel:

ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
XEN_9PFS_RING_ORDER - (PAGE_SHIFT - XEN_PAGE_SHIFT));
for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
ring->intf->ref[i] = gnttab_grant_foreign_access(dev->otherend_id, virt_to_gfn((void*)ring->bytes) + i, 0);

where XEN_9PFS_RING_ORDER specifies the order at 4K granularity. Am I
missing something?

> We have various helper to break-down the page for you, see
> gnttab_for_one_grant, gnttab_foreach_grant, gnttab_count_grant,
> xen_for_each_gfn (though this one it is internal to xlate_mmu.c so far)
>
> Please use them to avoid any further.
>
> > + ring->ref = gnttab_grant_foreign_access(dev->otherend_id,
> > pfn_to_gfn(virt_to_pfn((void*)ring->intf)), 0);
>
> Please use virt_to_gfn rather than pfn_to_gfn(virt_to_pfn).

Sure

> > + ring->ring.in = ring->bytes;
> > + ring->ring.out = ring->bytes + XEN_9PFS_RING_SIZE;
> > +
> > + ret = xenbus_alloc_evtchn(dev, &ring->evtchn);
> > + if (ret)
> > + goto error;
> > + ring->irq = bind_evtchn_to_irqhandler(ring->evtchn,
> > xen_9pfs_front_event_handler,
> > + 0, "xen_9pfs-frontend", ring);
> > + if (ring->irq < 0) {
> > + xenbus_free_evtchn(dev, ring->evtchn);
> > + ret = ring->irq;
> > + goto error;
> > + }
> > return 0;
> > +
> > +error:
> > + if (ring->intf != NULL)
> > + kfree(ring->intf);
> > + if (ring->bytes != NULL)
> > + kfree(ring->bytes);
> > + return ret;
> > }

2017-03-08 05:21:45

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 6/7] xen/9pfs: receive responses

On Tue, 7 Mar 2017, Boris Ostrovsky wrote:
> On 03/06/2017 03:01 PM, Stefano Stabellini wrote:
> > Upon receiving a notification from the backend, schedule the
> > p9_xen_response work_struct. p9_xen_response checks if any responses are
> > available, if so, it reads them one by one, calling p9_client_cb to send
> > them up to the 9p layer (p9_client_cb completes the request). Handle the
> > ring following the Xen 9pfs specification.
> >
> > Signed-off-by: Stefano Stabellini <[email protected]>
> > CC: [email protected]
> > CC: [email protected]
> > CC: Eric Van Hensbergen <[email protected]>
> > CC: Ron Minnich <[email protected]>
> > CC: Latchesar Ionkov <[email protected]>
> > CC: [email protected]
> > ---
> > net/9p/trans_xen.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 53 insertions(+)
> >
> > diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> > index 4e26556..1ca9246 100644
> > --- a/net/9p/trans_xen.c
> > +++ b/net/9p/trans_xen.c
> > @@ -149,6 +149,59 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> >
> > static void p9_xen_response(struct work_struct *work)
> > {
> > + struct xen_9pfs_front_priv *priv;
> > + struct xen_9pfs_dataring *ring;
> > + RING_IDX cons, prod, masked_cons, masked_prod;
> > + struct xen_9pfs_header h;
> > + struct p9_req_t *req;
> > + int status = REQ_STATUS_ERROR;
>
>
> Doesn't this need to go inside the loop?

Yes, thank you!

> > +
> > + ring = container_of(work, struct xen_9pfs_dataring, work);
> > + priv = ring->priv;
> > +
> > + while (1) {
> > + cons = ring->intf->in_cons;
> > + prod = ring->intf->in_prod;
> > + rmb();
>
>
> Is this rmb() or mb()? (Or, in fact, virt_XXX()?) You used mb() in the
> previous patch.

I think they should all be virt_XXX, thanks.

> > +
> > + if (xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < sizeof(h)) {
> > + notify_remote_via_irq(ring->irq);
> > + return;
> > + }
> > +
> > + masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> > + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> > +
> > + xen_9pfs_read_packet(ring->ring.in,
> > + masked_prod, &masked_cons,
> > + XEN_9PFS_RING_SIZE, &h, sizeof(h));
> > +
> > + req = p9_tag_lookup(priv->client, h.tag);
> > + if (!req || req->status != REQ_STATUS_SENT) {
> > + dev_warn(&priv->dev->dev, "Wrong req tag=%x\n", h.tag);
> > + cons += h.size;
> > + mb();
> > + ring->intf->in_cons = cons;
> > + continue;
>
>
> I don't know what xen_9pfs_read_packet() does so perhaps it's done there
> but shouldn't the pointers be updated regardless of the 'if' condition?

This is the error path - the index is increased immediately. In the
non-error case, we do that right after the next read_packet call, few
lines below.

> > + }
> > +
> > + memcpy(req->rc, &h, sizeof(h));
> > + req->rc->offset = 0;
> > +
> > + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> > + xen_9pfs_read_packet(ring->ring.in,
> > + masked_prod, &masked_cons,
> > + XEN_9PFS_RING_SIZE, req->rc->sdata, h.size);
> > +
> > + mb();
> > + cons += h.size;
> > + ring->intf->in_cons = cons;

Here ^

> > + if (req->status != REQ_STATUS_ERROR)
> > + status = REQ_STATUS_RCVD;
> > +
> > + p9_client_cb(priv->client, req, status);
> > + }
> > }
> >
> > static irqreturn_t xen_9pfs_front_event_handler(int irq, void *r)
> >
>

2017-03-08 13:50:07

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 4/7] xen/9pfs: connect to the backend

>
>>> + ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO, XEN_9PFS_RING_ORDER);
>>> + if (ring->bytes == NULL)
>>> + goto error;
>>> + for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
>>> + ring->intf->ref[i] = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->bytes) + i), 0);
>>> + ring->ref = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->intf)), 0);
>>> + ring->ring.in = ring->bytes;
>>> + ring->ring.out = ring->bytes + XEN_9PFS_RING_SIZE;
>>> +
>>> + ret = xenbus_alloc_evtchn(dev, &ring->evtchn);
>>> + if (ret)
>>> + goto error;
>>> + ring->irq = bind_evtchn_to_irqhandler(ring->evtchn, xen_9pfs_front_event_handler,
>>> + 0, "xen_9pfs-frontend", ring);
>>> + if (ring->irq < 0) {
>>> + xenbus_free_evtchn(dev, ring->evtchn);
>>> + ret = ring->irq;
>>> + goto error;
>>> + }
>>> return 0;
>>> +
>>> +error:
>> You may need to gnttab_end_foreign_access().
> Actually this error path is unnecessary because it will be handled by
> xen_9pfs_front_probe, that calls xen_9pfs_front_free on errors. I'll
> remove it.

(It's a matter of personal preference but I think a routine should clean
up its own mess whenever it can.)

>
>
> +
> + again:
> + ret = xenbus_transaction_start(&xbt);
> + if (ret) {
> + xenbus_dev_fatal(dev, ret, "starting transaction");
> + goto error;
> + }
> + ret = xenbus_printf(xbt, dev->nodename, "version", "%u", 1);
> + if (ret)
> + goto error_xenbus;
> + ret = xenbus_printf(xbt, dev->nodename, "num-rings", "%u", priv->num_rings);
> + if (ret)
> + goto error_xenbus;
> + for (i = 0; i < priv->num_rings; i++) {
> + char str[16];
> +
> + priv->rings[i].priv = priv;
> + ret = xen_9pfs_front_alloc_dataring(dev, &priv->rings[i]);
>> Not for -EAGAIN, I think.
> I don't think xen_9pfs_front_alloc_dataring can return EAGAIN. EAGAIN
> can only come from xenbus_transaction_end, the case we handle below.

I didn't mean that xen_9pfs_front_alloc_dataring() can return -EAGAIN. I
was referring to...

>
>
>>> + if (ret < 0)
>>> + goto error_xenbus;
>>> +
>>> + sprintf(str, "ring-ref%u", i);
>>> + ret = xenbus_printf(xbt, dev->nodename, str, "%d", priv->rings[i].ref);
>>> + if (ret)
>>> + goto error_xenbus;
>>> +
>>> + sprintf(str, "event-channel-%u", i);
>>> + ret = xenbus_printf(xbt, dev->nodename, str, "%u", priv->rings[i].evtchn);
>>> + if (ret)
>>> + goto error_xenbus;
>>> + }
>>> + priv->tag = xenbus_read(xbt, dev->nodename, "tag", NULL);
>>> + if (ret)
>>> + goto error_xenbus;
>>> + ret = xenbus_transaction_end(xbt, 0);
>>> + if (ret) {
>>> + if (ret == -EAGAIN)
>>> + goto again;

... this.

Sorry for not being clear.

-boris

2017-03-08 13:59:54

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 5/7] xen/9pfs: send requests to the backend

>>> +}
>>> +
>>> +static int p9_xen_write_todo(struct xen_9pfs_dataring *ring, RING_IDX size)
>>> +{
>>> + RING_IDX cons, prod;
>>> +
>>> + cons = ring->intf->out_cons;
>>> + prod = ring->intf->out_prod;
>>> + mb();
>>> +
>>> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) >= size)
>>> + return 1;
>>> + else
>>> + return 0;
>>> }
>>>
>>> static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
>>> {
>>> + struct xen_9pfs_front_priv *priv = NULL;
>>> + RING_IDX cons, prod, masked_cons, masked_prod;
>>> + unsigned long flags;
>>> + uint32_t size = p9_req->tc->size;
>>> + struct xen_9pfs_dataring *ring;
>>> + int num;
>>> +
>>> + list_for_each_entry(priv, &xen_9pfs_devs, list) {
>>> + if (priv->client == client)
>>> + break;
>>> + }
>>> + if (priv == NULL || priv->client != client)
>>> + return -EINVAL;
>>> +
>>> + num = p9_req->tc->tag % priv->num_rings;
>>> + ring = &priv->rings[num];
>>> +
>>> +again:
>>> + while (wait_event_interruptible(ring->wq,
>>> + p9_xen_write_todo(ring, size) > 0) != 0);
>>> +
>>> + spin_lock_irqsave(&ring->lock, flags);
>>> + cons = ring->intf->out_cons;
>>> + prod = ring->intf->out_prod;
>>> + mb();
>>> +
>>> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < size) {
>>
>> This looks like p9_xen_write_todo().
> p9_xen_write_todo is just a wrapper around xen_9pfs_queued to provide
> a return value that works well with wait_event_interruptible.
>
> I would prefer not to call p9_xen_write_todo here, because it's simpler
> if we don't read prod and cons twice.

I was referring to the whole code fragment after spin_lock_irqsave(),
not just the last line. Isn't it exactly !p9_xen_write_todo()?

>
>
>> BTW, where is xen_9pfs_queued()
>> defined? I couldn't find it. Same for xen_9pfs_mask() and
>> xen_9pfs_write_packet().
> They are provided by the new ring macros, see
> include/xen/interface/io/ring.h (the first patch).

Oh, right. I was searching for the string literally.

-boris

2017-03-08 14:34:17

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 6/7] xen/9pfs: receive responses

>
>>> +
>>> + if (xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < sizeof(h)) {
>>> + notify_remote_via_irq(ring->irq);
>>> + return;
>>> + }
>>> +
>>> + masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
>>> + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
>>> +
>>> + xen_9pfs_read_packet(ring->ring.in,
>>> + masked_prod, &masked_cons,
>>> + XEN_9PFS_RING_SIZE, &h, sizeof(h));
>>> +
>>> + req = p9_tag_lookup(priv->client, h.tag);
>>> + if (!req || req->status != REQ_STATUS_SENT) {
>>> + dev_warn(&priv->dev->dev, "Wrong req tag=%x\n", h.tag);
>>> + cons += h.size;
>>> + mb();
>>> + ring->intf->in_cons = cons;
>>> + continue;
>>
>> I don't know what xen_9pfs_read_packet() does so perhaps it's done there
>> but shouldn't the pointers be updated regardless of the 'if' condition?
> This is the error path - the index is increased immediately. In the
> non-error case, we do that right after the next read_packet call, few
> lines below.
>
>
>>> + }
>>> +
>>> + memcpy(req->rc, &h, sizeof(h));
>>> + req->rc->offset = 0;
>>> +
>>> + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
>>> + xen_9pfs_read_packet(ring->ring.in,
>>> + masked_prod, &masked_cons,
>>> + XEN_9PFS_RING_SIZE, req->rc->sdata, h.size);
>>> +
>>> + mb();
>>> + cons += h.size;
>>> + ring->intf->in_cons = cons;
> Here ^
>

So the second read is reading again from the same pointer in the ring,
but this time it gets the whole packet, including the header. The first
read was just poking at the header. Right?

If that's correct, can you add a comment somewhere? (unless this is
obvious to everyone else but me.)

-boris

2017-03-08 11:40:15

by Julien Grall

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 1/7] xen: import new ring macros in ring.h

Hi Stefano,

On 08/03/17 00:12, Stefano Stabellini wrote:
> On Tue, 7 Mar 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 03/06/2017 08:01 PM, Stefano Stabellini wrote:
>>> Sync the ring.h file with upstream Xen, to introduce the new ring macros.
>>> They will be used by the Xen transport for 9pfs.
>>>
>>> Signed-off-by: Stefano Stabellini <[email protected]>
>>> CC: [email protected]
>>> CC: [email protected]
>>> CC: [email protected]
>>>
>>> ---
>>> NB: The new macros have not been committed to Xen yet. Do not apply this
>>> patch until they do.
>>> ---
>>> ---
>>> include/xen/interface/io/ring.h | 131
>>> ++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 131 insertions(+)
>>>
>>> diff --git a/include/xen/interface/io/ring.h
>>> b/include/xen/interface/io/ring.h
>>> index 21f4fbd..e16aa92 100644
>>> --- a/include/xen/interface/io/ring.h
>>> +++ b/include/xen/interface/io/ring.h
>>
>> [...]
>>
>>> +#define XEN_FLEX_RING_SIZE(order)
>>> \
>>> + (1UL << (order + PAGE_SHIFT - 1))
>>
>> This will need to be XEN_PAGE_SHIFT in order to works with 64K kernel.
>
> Good point! I had it right at the beginning but I lost the change in one
> of the updates from xen.git.

It is probably worth to consider introducing XEN_PAGE_SIZE in the
hypervisor code because we are likely to miss the change when the
headers will be re-sync in the future.

Cheers,

--
Julien Grall

2017-03-08 12:34:45

by Julien Grall

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 4/7] xen/9pfs: connect to the backend

Hi Stefano,

On 08/03/17 00:49, Stefano Stabellini wrote:
> On Tue, 7 Mar 2017, Julien Grall wrote:
>> On 03/06/2017 08:01 PM, Stefano Stabellini wrote:
>>> + if (ring->bytes == NULL)
>>> + goto error;
>>> + for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
>>> + ring->intf->ref[i] =
>>> gnttab_grant_foreign_access(dev->otherend_id,
>>> pfn_to_gfn(virt_to_pfn((void*)ring->bytes) + i), 0);.
>>
>> Please use virt_to_gfn rather than pfn_to_gfn(virt_to_pfn).
>
> OK
>
>
>> Also, this is not going to work on 64K kernel because you will grant access to
>> noncontiguous memory (e.g 0-4K, 64K-68K,...).
>
> By using virt_to_gfn like you suggested, the loop will correctly iterate
> on a 4K by 4K basis, even on a 64K kernel:
>
> ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> XEN_9PFS_RING_ORDER - (PAGE_SHIFT - XEN_PAGE_SHIFT));
> for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
> ring->intf->ref[i] = gnttab_grant_foreign_access(dev->otherend_id, virt_to_gfn((void*)ring->bytes) + i, 0);

BTW, the cast (void *) is not necessary.

>
> where XEN_9PFS_RING_ORDER specifies the order at 4K granularity. Am I
> missing something?

I think it is fine. You could move virt_to_gfn(...) outside and avoid to
do the translation everytime.

Cheers,

--
Julien Grall

2017-03-08 19:09:32

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 4/7] xen/9pfs: connect to the backend

On Wed, 8 Mar 2017, Julien Grall wrote:
> Hi Stefano,
>
> On 08/03/17 00:49, Stefano Stabellini wrote:
> > On Tue, 7 Mar 2017, Julien Grall wrote:
> > > On 03/06/2017 08:01 PM, Stefano Stabellini wrote:
> > > > + if (ring->bytes == NULL)
> > > > + goto error;
> > > > + for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
> > > > + ring->intf->ref[i] =
> > > > gnttab_grant_foreign_access(dev->otherend_id,
> > > > pfn_to_gfn(virt_to_pfn((void*)ring->bytes) + i), 0);.
> > >
> > > Please use virt_to_gfn rather than pfn_to_gfn(virt_to_pfn).
> >
> > OK
> >
> >
> > > Also, this is not going to work on 64K kernel because you will grant
> > > access to
> > > noncontiguous memory (e.g 0-4K, 64K-68K,...).
> >
> > By using virt_to_gfn like you suggested, the loop will correctly iterate
> > on a 4K by 4K basis, even on a 64K kernel:
> >
> > ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> > XEN_9PFS_RING_ORDER - (PAGE_SHIFT - XEN_PAGE_SHIFT));
> > for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
> > ring->intf->ref[i] = gnttab_grant_foreign_access(dev->otherend_id,
> > virt_to_gfn((void*)ring->bytes) + i, 0);
>
> BTW, the cast (void *) is not necessary.
>
> >
> > where XEN_9PFS_RING_ORDER specifies the order at 4K granularity. Am I
> > missing something?
>
> I think it is fine. You could move virt_to_gfn(...) outside and avoid to do
> the translation everytime.

All right, thanks.

2017-03-08 19:23:41

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 4/7] xen/9pfs: connect to the backend

On Wed, 8 Mar 2017, Boris Ostrovsky wrote:
> >>> + ring->bytes = (void*)__get_free_pages(GFP_KERNEL | __GFP_ZERO, XEN_9PFS_RING_ORDER);
> >>> + if (ring->bytes == NULL)
> >>> + goto error;
> >>> + for (i = 0; i < (1 << XEN_9PFS_RING_ORDER); i++)
> >>> + ring->intf->ref[i] = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->bytes) + i), 0);
> >>> + ring->ref = gnttab_grant_foreign_access(dev->otherend_id, pfn_to_gfn(virt_to_pfn((void*)ring->intf)), 0);
> >>> + ring->ring.in = ring->bytes;
> >>> + ring->ring.out = ring->bytes + XEN_9PFS_RING_SIZE;
> >>> +
> >>> + ret = xenbus_alloc_evtchn(dev, &ring->evtchn);
> >>> + if (ret)
> >>> + goto error;
> >>> + ring->irq = bind_evtchn_to_irqhandler(ring->evtchn, xen_9pfs_front_event_handler,
> >>> + 0, "xen_9pfs-frontend", ring);
> >>> + if (ring->irq < 0) {
> >>> + xenbus_free_evtchn(dev, ring->evtchn);
> >>> + ret = ring->irq;
> >>> + goto error;
> >>> + }
> >>> return 0;
> >>> +
> >>> +error:
> >> You may need to gnttab_end_foreign_access().
> > Actually this error path is unnecessary because it will be handled by
> > xen_9pfs_front_probe, that calls xen_9pfs_front_free on errors. I'll
> > remove it.
>
>
> (It's a matter of personal preference but I think a routine should clean
> up its own mess whenever it can.)
>
> >
> >
> > +
> > + again:
> > + ret = xenbus_transaction_start(&xbt);
> > + if (ret) {
> > + xenbus_dev_fatal(dev, ret, "starting transaction");
> > + goto error;
> > + }
> > + ret = xenbus_printf(xbt, dev->nodename, "version", "%u", 1);
> > + if (ret)
> > + goto error_xenbus;
> > + ret = xenbus_printf(xbt, dev->nodename, "num-rings", "%u", priv->num_rings);
> > + if (ret)
> > + goto error_xenbus;
> > + for (i = 0; i < priv->num_rings; i++) {
> > + char str[16];
> > +
> > + priv->rings[i].priv = priv;
> > + ret = xen_9pfs_front_alloc_dataring(dev, &priv->rings[i]);
> >> Not for -EAGAIN, I think.
> > I don't think xen_9pfs_front_alloc_dataring can return EAGAIN. EAGAIN
> > can only come from xenbus_transaction_end, the case we handle below.
>
> I didn't mean that xen_9pfs_front_alloc_dataring() can return -EAGAIN. I
> was referring to...
>
> >
> >
> >>> + if (ret < 0)
> >>> + goto error_xenbus;
> >>> +
> >>> + sprintf(str, "ring-ref%u", i);
> >>> + ret = xenbus_printf(xbt, dev->nodename, str, "%d", priv->rings[i].ref);
> >>> + if (ret)
> >>> + goto error_xenbus;
> >>> +
> >>> + sprintf(str, "event-channel-%u", i);
> >>> + ret = xenbus_printf(xbt, dev->nodename, str, "%u", priv->rings[i].evtchn);
> >>> + if (ret)
> >>> + goto error_xenbus;
> >>> + }
> >>> + priv->tag = xenbus_read(xbt, dev->nodename, "tag", NULL);
> >>> + if (ret)
> >>> + goto error_xenbus;
> >>> + ret = xenbus_transaction_end(xbt, 0);
> >>> + if (ret) {
> >>> + if (ret == -EAGAIN)
> >>> + goto again;
>
> ... this.
>
> Sorry for not being clear.

You are right! I'll move the call to xen_9pfs_front_alloc_dataring out
of the xenbus loop.

2017-03-08 19:26:33

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 6/7] xen/9pfs: receive responses

On Wed, 8 Mar 2017, Boris Ostrovsky wrote:
> >>> +
> >>> + if (xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < sizeof(h)) {
> >>> + notify_remote_via_irq(ring->irq);
> >>> + return;
> >>> + }
> >>> +
> >>> + masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> >>> + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> >>> +
> >>> + xen_9pfs_read_packet(ring->ring.in,
> >>> + masked_prod, &masked_cons,
> >>> + XEN_9PFS_RING_SIZE, &h, sizeof(h));
> >>> +
> >>> + req = p9_tag_lookup(priv->client, h.tag);
> >>> + if (!req || req->status != REQ_STATUS_SENT) {
> >>> + dev_warn(&priv->dev->dev, "Wrong req tag=%x\n", h.tag);
> >>> + cons += h.size;
> >>> + mb();
> >>> + ring->intf->in_cons = cons;
> >>> + continue;
> >>
> >> I don't know what xen_9pfs_read_packet() does so perhaps it's done there
> >> but shouldn't the pointers be updated regardless of the 'if' condition?
> > This is the error path - the index is increased immediately. In the
> > non-error case, we do that right after the next read_packet call, few
> > lines below.
> >
> >
> >>> + }
> >>> +
> >>> + memcpy(req->rc, &h, sizeof(h));
> >>> + req->rc->offset = 0;
> >>> +
> >>> + masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> >>> + xen_9pfs_read_packet(ring->ring.in,
> >>> + masked_prod, &masked_cons,
> >>> + XEN_9PFS_RING_SIZE, req->rc->sdata, h.size);
> >>> +
> >>> + mb();
> >>> + cons += h.size;
> >>> + ring->intf->in_cons = cons;
> > Here ^
> >
>
>
> So the second read is reading again from the same pointer in the ring,
> but this time it gets the whole packet, including the header. The first
> read was just poking at the header. Right?

That's right. First we read the header, to know how much data to read.

> If that's correct, can you add a comment somewhere? (unless this is
> obvious to everyone else but me.)

Sure.

2017-03-08 19:41:20

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 5/7] xen/9pfs: send requests to the backend

On Wed, 8 Mar 2017, Boris Ostrovsky wrote:
> >>> +}
> >>> +
> >>> +static int p9_xen_write_todo(struct xen_9pfs_dataring *ring, RING_IDX size)
> >>> +{
> >>> + RING_IDX cons, prod;
> >>> +
> >>> + cons = ring->intf->out_cons;
> >>> + prod = ring->intf->out_prod;
> >>> + mb();
> >>> +
> >>> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) >= size)
> >>> + return 1;
> >>> + else
> >>> + return 0;
> >>> }
> >>>
> >>> static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> >>> {
> >>> + struct xen_9pfs_front_priv *priv = NULL;
> >>> + RING_IDX cons, prod, masked_cons, masked_prod;
> >>> + unsigned long flags;
> >>> + uint32_t size = p9_req->tc->size;
> >>> + struct xen_9pfs_dataring *ring;
> >>> + int num;
> >>> +
> >>> + list_for_each_entry(priv, &xen_9pfs_devs, list) {
> >>> + if (priv->client == client)
> >>> + break;
> >>> + }
> >>> + if (priv == NULL || priv->client != client)
> >>> + return -EINVAL;
> >>> +
> >>> + num = p9_req->tc->tag % priv->num_rings;
> >>> + ring = &priv->rings[num];
> >>> +
> >>> +again:
> >>> + while (wait_event_interruptible(ring->wq,
> >>> + p9_xen_write_todo(ring, size) > 0) != 0);
> >>> +
> >>> + spin_lock_irqsave(&ring->lock, flags);
> >>> + cons = ring->intf->out_cons;
> >>> + prod = ring->intf->out_prod;
> >>> + mb();
> >>> +
> >>> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < size) {
> >>
> >> This looks like p9_xen_write_todo().
> > p9_xen_write_todo is just a wrapper around xen_9pfs_queued to provide
> > a return value that works well with wait_event_interruptible.
> >
> > I would prefer not to call p9_xen_write_todo here, because it's simpler
> > if we don't read prod and cons twice.
>
> I was referring to the whole code fragment after spin_lock_irqsave(),
> not just the last line. Isn't it exactly !p9_xen_write_todo()?

Yes, it is true they are almost the same. The difference, and the reason
for p9_xen_write_todo to exist, is that p9_xen_write_todo is called in
the wait_event_interruptible loop, as such it needs to read prod and
cons every time. On the other end, here we want to read them once. Does
it make sense?

2017-03-08 20:05:34

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 5/7] xen/9pfs: send requests to the backend

On 03/08/2017 02:33 PM, Stefano Stabellini wrote:
> On Wed, 8 Mar 2017, Boris Ostrovsky wrote:
>>>>> +}
>>>>> +
>>>>> +static int p9_xen_write_todo(struct xen_9pfs_dataring *ring, RING_IDX size)
>>>>> +{
>>>>> + RING_IDX cons, prod;
>>>>> +
>>>>> + cons = ring->intf->out_cons;
>>>>> + prod = ring->intf->out_prod;
>>>>> + mb();
>>>>> +
>>>>> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) >= size)
>>>>> + return 1;
>>>>> + else
>>>>> + return 0;
>>>>> }
>>>>>
>>>>> static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
>>>>> {
>>>>> + struct xen_9pfs_front_priv *priv = NULL;
>>>>> + RING_IDX cons, prod, masked_cons, masked_prod;
>>>>> + unsigned long flags;
>>>>> + uint32_t size = p9_req->tc->size;
>>>>> + struct xen_9pfs_dataring *ring;
>>>>> + int num;
>>>>> +
>>>>> + list_for_each_entry(priv, &xen_9pfs_devs, list) {
>>>>> + if (priv->client == client)
>>>>> + break;
>>>>> + }
>>>>> + if (priv == NULL || priv->client != client)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + num = p9_req->tc->tag % priv->num_rings;
>>>>> + ring = &priv->rings[num];
>>>>> +
>>>>> +again:
>>>>> + while (wait_event_interruptible(ring->wq,
>>>>> + p9_xen_write_todo(ring, size) > 0) != 0);
>>>>> +
>>>>> + spin_lock_irqsave(&ring->lock, flags);
>>>>> + cons = ring->intf->out_cons;
>>>>> + prod = ring->intf->out_prod;
>>>>> + mb();
>>>>> +
>>>>> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < size) {
>>>> This looks like p9_xen_write_todo().
>>> p9_xen_write_todo is just a wrapper around xen_9pfs_queued to provide
>>> a return value that works well with wait_event_interruptible.
>>>
>>> I would prefer not to call p9_xen_write_todo here, because it's simpler
>>> if we don't read prod and cons twice.
>> I was referring to the whole code fragment after spin_lock_irqsave(),
>> not just the last line. Isn't it exactly !p9_xen_write_todo()?
> Yes, it is true they are almost the same. The difference, and the reason
> for p9_xen_write_todo to exist, is that p9_xen_write_todo is called in
> the wait_event_interruptible loop, as such it needs to read prod and
> cons every time. On the other end, here we want to read them once. Does
> it make sense?

I am clearly being particularly dense here but what I was thinking was:

again:
while (wait_event_interruptible(ring->wq,
p9_xen_write_todo(ring, size) > 0) != 0);

spin_lock_irqsave(&ring->lock, flags);
if (!p9_xen_write_todo(ring, size)) {
spin_unlock_irqrestore(&ring->lock, flags);
goto again;
}

There is no extra read of prod/cons.

-boris

2017-03-08 20:12:34

by Konrad Rzeszutek Wilk

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 6/7] xen/9pfs: receive responses

On Tue, Mar 07, 2017 at 05:13:59PM -0800, Stefano Stabellini wrote:
> On Tue, 7 Mar 2017, Stefano Stabellini wrote:
> > > > +
> > > > + ring = container_of(work, struct xen_9pfs_dataring, work);
> > > > + priv = ring->priv;
> > > > +
> > > > + while (1) {
> > > > + cons = ring->intf->in_cons;
> > > > + prod = ring->intf->in_prod;
> > > > + rmb();
> > >
> > >
> > > Is this rmb() or mb()? (Or, in fact, virt_XXX()?) You used mb() in the
> > > previous patch.
> >
> > I think they should all be virt_XXX, thanks.
>
> regarding mb() vs. rmb(), give a look at the workflow at the end of
> docs/misc/9pfs.markdown, under "Ring Usage".

That is not what Boris meant. He meant that you should use the
virt_ variants instead of the rmb() or wmb().

The reason that on UP kernels the rmb() and wmb() can be converted
to NOPs. While that is OK for a UP kernel it is not good in virtualization
as we need those barriers regardless of the flavor of the kernel.

>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> https://lists.xen.org/xen-devel

2017-03-08 21:02:55

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: [PATCH 5/7] xen/9pfs: send requests to the backend

>> There is no extra read of prod/cons.
> Yes, there are: just after this if statement we would have to read them
> again to calculate masked_prod and masked_cons.

Ah, of course. Thanks.

-boris

2017-03-08 21:21:22

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 5/7] xen/9pfs: send requests to the backend

On Wed, 8 Mar 2017, Boris Ostrovsky wrote:
> On 03/08/2017 02:33 PM, Stefano Stabellini wrote:
> > On Wed, 8 Mar 2017, Boris Ostrovsky wrote:
> >>>>> +}
> >>>>> +
> >>>>> +static int p9_xen_write_todo(struct xen_9pfs_dataring *ring, RING_IDX size)
> >>>>> +{
> >>>>> + RING_IDX cons, prod;
> >>>>> +
> >>>>> + cons = ring->intf->out_cons;
> >>>>> + prod = ring->intf->out_prod;
> >>>>> + mb();
> >>>>> +
> >>>>> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) >= size)
> >>>>> + return 1;
> >>>>> + else
> >>>>> + return 0;
> >>>>> }
> >>>>>
> >>>>> static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> >>>>> {
> >>>>> + struct xen_9pfs_front_priv *priv = NULL;
> >>>>> + RING_IDX cons, prod, masked_cons, masked_prod;
> >>>>> + unsigned long flags;
> >>>>> + uint32_t size = p9_req->tc->size;
> >>>>> + struct xen_9pfs_dataring *ring;
> >>>>> + int num;
> >>>>> +
> >>>>> + list_for_each_entry(priv, &xen_9pfs_devs, list) {
> >>>>> + if (priv->client == client)
> >>>>> + break;
> >>>>> + }
> >>>>> + if (priv == NULL || priv->client != client)
> >>>>> + return -EINVAL;
> >>>>> +
> >>>>> + num = p9_req->tc->tag % priv->num_rings;
> >>>>> + ring = &priv->rings[num];
> >>>>> +
> >>>>> +again:
> >>>>> + while (wait_event_interruptible(ring->wq,
> >>>>> + p9_xen_write_todo(ring, size) > 0) != 0);
> >>>>> +
> >>>>> + spin_lock_irqsave(&ring->lock, flags);
> >>>>> + cons = ring->intf->out_cons;
> >>>>> + prod = ring->intf->out_prod;
> >>>>> + mb();
> >>>>> +
> >>>>> + if (XEN_9PFS_RING_SIZE - xen_9pfs_queued(prod, cons, XEN_9PFS_RING_SIZE) < size) {
> >>>> This looks like p9_xen_write_todo().
> >>> p9_xen_write_todo is just a wrapper around xen_9pfs_queued to provide
> >>> a return value that works well with wait_event_interruptible.
> >>>
> >>> I would prefer not to call p9_xen_write_todo here, because it's simpler
> >>> if we don't read prod and cons twice.
> >> I was referring to the whole code fragment after spin_lock_irqsave(),
> >> not just the last line. Isn't it exactly !p9_xen_write_todo()?
> > Yes, it is true they are almost the same. The difference, and the reason
> > for p9_xen_write_todo to exist, is that p9_xen_write_todo is called in
> > the wait_event_interruptible loop, as such it needs to read prod and
> > cons every time. On the other end, here we want to read them once. Does
> > it make sense?
>
>
> I am clearly being particularly dense here but what I was thinking was:
>
> again:
> while (wait_event_interruptible(ring->wq,
> p9_xen_write_todo(ring, size) > 0) != 0);
>
> spin_lock_irqsave(&ring->lock, flags);
> if (!p9_xen_write_todo(ring, size)) {
> spin_unlock_irqrestore(&ring->lock, flags);
> goto again;
> }
>
> There is no extra read of prod/cons.

Yes, there are: just after this if statement we would have to read them
again to calculate masked_prod and masked_cons.

2017-03-09 03:07:50

by Roger Pau Monne

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 0/7] Xen transport for 9pfs frontend driver

On Tue, Mar 07, 2017 at 10:27:05AM -0800, Stefano Stabellini wrote:
> On Tue, 7 Mar 2017, Roger Pau Monn? wrote:
> > On Mon, Mar 06, 2017 at 12:00:41PM -0800, Stefano Stabellini wrote:
> > > Hi all,
> > >
> > > This patch series implements a new transport for 9pfs, aimed at Xen
> > > systems.
> > >
> > > The transport is based on a traditional Xen frontend and backend drivers
> > > pair. This patch series implements the frontend, which typically runs in
> > > a regular unprivileged guest.
> > >
> > > I'll follow up with another series that implements the backend in
> > > userspace in QEMU, which typically runs in Dom0 (but could also run in
> > > a another guest).
> > >
> > > The frontend complies to the Xen transport for 9pfs specification
> > > version 1, available here:
> > >
> > > http://xenbits.xen.org/gitweb/?p=xen.git;a=blob_plain;f=docs/misc/9pfs.markdown;hb=HEAD
> >
> > Kind of tangential to this series, but maybe it would make sense to implement
> > this transport in a fuse based 9pfs driver? I see there are already several
> > fuse-9pfs implementations around. Something for a GSoC/Outreach project?
>
> Sure. Additionally, with open source frontends and backends already
> available, it should be easier to code. I am happy to co-mentor the
> project with you, if you feel like it.

I don't mind co-mentoring it, so far I haven't got lucky with any of my other
GSoC projects, but I don't know anything about 9pfs or fuse :).

This also has the difficulty that neither you not me is a member of any of the
9pfs-fuse projects, so it might be hard to get the changes upstream.

Roger.

2017-03-13 22:36:15

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [Xen-devel] [PATCH 0/7] Xen transport for 9pfs frontend driver

On Thu, 9 Mar 2017, Roger Pau Monné wrote:
> On Tue, Mar 07, 2017 at 10:27:05AM -0800, Stefano Stabellini wrote:
> > On Tue, 7 Mar 2017, Roger Pau Monné wrote:
> > > On Mon, Mar 06, 2017 at 12:00:41PM -0800, Stefano Stabellini wrote:
> > > > Hi all,
> > > >
> > > > This patch series implements a new transport for 9pfs, aimed at Xen
> > > > systems.
> > > >
> > > > The transport is based on a traditional Xen frontend and backend drivers
> > > > pair. This patch series implements the frontend, which typically runs in
> > > > a regular unprivileged guest.
> > > >
> > > > I'll follow up with another series that implements the backend in
> > > > userspace in QEMU, which typically runs in Dom0 (but could also run in
> > > > a another guest).
> > > >
> > > > The frontend complies to the Xen transport for 9pfs specification
> > > > version 1, available here:
> > > >
> > > > http://xenbits.xen.org/gitweb/?p=xen.git;a=blob_plain;f=docs/misc/9pfs.markdown;hb=HEAD
> > >
> > > Kind of tangential to this series, but maybe it would make sense to implement
> > > this transport in a fuse based 9pfs driver? I see there are already several
> > > fuse-9pfs implementations around. Something for a GSoC/Outreach project?
> >
> > Sure. Additionally, with open source frontends and backends already
> > available, it should be easier to code. I am happy to co-mentor the
> > project with you, if you feel like it.
>
> I don't mind co-mentoring it, so far I haven't got lucky with any of my other
> GSoC projects, but I don't know anything about 9pfs or fuse :).
>
> This also has the difficulty that neither you not me is a member of any of the
> 9pfs-fuse projects, so it might be hard to get the changes upstream.

Good point. It would be best if one of the mentors was already engaged
with the 9pfs-fuse community.