Hi all,
ARM64 Linux supports both 4KB and 64KB page granularity. However, the Xen
hypercall interface and the PV protocol are always based on 4KB page
granularity. Any attempt to boot a Linux guest with 64KB pages enabled will
result in a guest crash.
This series is a first attempt to allow such Linux kernels to run with the
current hypercall interface and PV protocol.
This approach has been chosen because we want to run 64KB Linux on released
Xen ARM versions and/or platforms using an old version of Linux DOM0.
There is room for improvement, such as support for 64KB grants or
modifications to the PV protocol to support different page sizes. These will
be explored in a separate patch series later.
TODO list:
- Convert swiotlb to 64KB
- Convert xenfb to 64KB
- Check if backend in QEMU works with DOM0 64KB
- It may be possible to move some common defines shared between
netback/netfront and blkfront/blkback into a header
All patches have been build-tested for ARM32, ARM64 and x86, but I haven't
runtime-tested them on x86 as I don't have a box running Xen x86. I would be
happy if someone could give them a try and check for possible regressions on
x86.
A branch based on the latest linux/master can be found here:
git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v3
Comments and suggestions are welcome.
Sincerely yours,
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Julien Grall (20):
net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
arm/xen: Drop pte_mfn and mfn_pte
xen: Add Xen specific page definition
xen/grant: Introduce helpers to split a page into grant
xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one
block/xen-blkfront: Split blkif_queue_request in 2
block/xen-blkfront: Store a page rather a pfn in the grant structure
block/xen-blkfront: split get_grant in 2
xen/biomerge: Don't allow biovec to be merge when Linux is not using
4KB page
xen/xenbus: Use Xen page definition
tty/hvc: xen: Use xen page definition
xen/balloon: Don't rely on the page granularity is the same for Xen
and Linux
xen/events: fifo: Make it running on 64KB granularity
xen/grant-table: Make it running on 64KB granularity
block/xen-blkfront: Make it running on 64KB page granularity
block/xen-blkback: Make it running on 64KB page granularity
net/xen-netfront: Make it running on 64KB page granularity
net/xen-netback: Make it running on 64KB page granularity
xen/privcmd: Add support for Linux 64KB page granularity
arm/xen: Add support for 64KB page granularity
arch/arm/include/asm/xen/page.h | 18 +-
arch/arm/xen/enlighten.c | 6 +-
arch/arm/xen/p2m.c | 6 +-
arch/x86/include/asm/xen/page.h | 2 +-
drivers/block/xen-blkback/blkback.c | 5 +-
drivers/block/xen-blkback/common.h | 17 +-
drivers/block/xen-blkback/xenbus.c | 9 +-
drivers/block/xen-blkfront.c | 552 +++++++++++++++++++++++-------------
drivers/net/xen-netback/common.h | 15 +-
drivers/net/xen-netback/netback.c | 163 +++++++----
drivers/net/xen-netfront.c | 122 +++++---
drivers/tty/hvc/hvc_xen.c | 4 +-
drivers/xen/balloon.c | 47 ++-
drivers/xen/biomerge.c | 8 +
drivers/xen/events/events_base.c | 2 +-
drivers/xen/events/events_fifo.c | 2 +-
drivers/xen/grant-table.c | 32 ++-
drivers/xen/privcmd.c | 8 +-
drivers/xen/xenbus/xenbus_client.c | 6 +-
drivers/xen/xenbus/xenbus_probe.c | 3 +-
drivers/xen/xlate_mmu.c | 124 +++++---
include/xen/grant_table.h | 51 ++++
include/xen/page.h | 27 +-
23 files changed, 844 insertions(+), 385 deletions(-)
--
2.1.4
The skb doesn't change within the function. Therefore it's only
necessary to check if we need GSO once at the beginning.
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Ian Campbell <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: [email protected]
Changes in v2:
- Patch added
---
drivers/net/xen-netback/netback.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 8293374..66f1780 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -277,6 +277,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
unsigned long bytes;
int gso_type = XEN_NETIF_GSO_TYPE_NONE;
+ if (skb_is_gso(skb)) {
+ if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+ gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+ else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+ gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+ }
+
/* Data must not cross a page boundary. */
BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
@@ -336,13 +343,6 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
}
/* Leave a gap for the GSO descriptor. */
- if (skb_is_gso(skb)) {
- if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
- gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
- else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
- gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
- }
-
if (*head && ((1 << gso_type) & queue->vif->gso_mask))
queue->rx.req_cons++;
--
2.1.4
They are not used in common code except in one place in balloon.c, which is
only compiled when Linux is using the PV MMU. That is not the case on ARM.
Rather than worrying about how to handle the 64KB case, drop them.
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Stefano Stabellini <[email protected]>
Cc: Russell King <[email protected]>
Changes in v3:
- Patch added
---
arch/arm/include/asm/xen/page.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 1279563..98c9fc3 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -13,9 +13,6 @@
#define phys_to_machine_mapping_valid(pfn) (1)
-#define pte_mfn pte_pfn
-#define mfn_pte pfn_pte
-
/* Xen machine address */
typedef struct xmaddr {
phys_addr_t maddr;
--
2.1.4
The Xen hypercall interface always uses 4KB page granularity on both the ARM
and x86 architectures.
With the incoming support of 64KB page granularity for ARM64 guests, it
won't be possible to re-use the Linux page definitions in Xen drivers.
Introduce Xen page definition helpers based on the Linux page definitions.
They have exactly the same names, but carry a XEN_/xen_ prefix.
Also modify xen_page_to_gfn to use the new Xen page definitions.
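As an illustration only (not part of this patch), a driver that needs every
4KB Xen frame backing a single Linux page could use the new helpers as in
the sketch below; do_something_with_gfn() is a hypothetical placeholder:

    static void walk_xen_frames(struct page *page)
    {
            unsigned long xen_pfn = xen_page_to_pfn(page);
            unsigned int i;

            /* XEN_PFN_PER_PAGE is 16 with 64KB pages, 1 with 4KB pages */
            for (i = 0; i < XEN_PFN_PER_PAGE; i++)
                    do_something_with_gfn(pfn_to_gfn(xen_pfn + i));
    }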
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Changes in v3:
- Fix errors reported by checkpatch.pl
- Rename pfn to xen_pfn in xen_pfn_to_page
- Add a comment that we assume PAGE_SIZE to be a multiple of
XEN_PAGE_SIZE
- s/MFN/GFN/ according to new naming
- Add Stefano's reviewed-by
Changes in v2:
- Add XEN_PFN_UP
- Add a comment describing the behavior of page_to_pfn
---
include/xen/page.h | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/include/xen/page.h b/include/xen/page.h
index f202992..dac1b26 100644
--- a/include/xen/page.h
+++ b/include/xen/page.h
@@ -1,11 +1,36 @@
#ifndef _XEN_PAGE_H
#define _XEN_PAGE_H
+#include <asm/page.h>
+
+/* The hypercall interface supports only 4KB page */
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (_AC(1, UL) << XEN_PAGE_SHIFT)
+#define XEN_PAGE_MASK (~(XEN_PAGE_SIZE-1))
+#define xen_offset_in_page(p) ((unsigned long)(p) & ~XEN_PAGE_MASK)
+
+/*
+ * We assume that PAGE_SIZE is a multiple of XEN_PAGE_SIZE
+ * XXX: Add a BUILD_BUG_ON?
+ */
+
+#define xen_pfn_to_page(xen_pfn) \
+ ((pfn_to_page(((unsigned long)(xen_pfn) << XEN_PAGE_SHIFT) >> PAGE_SHIFT)))
+#define xen_page_to_pfn(page) \
+ (((page_to_pfn(page)) << PAGE_SHIFT) >> XEN_PAGE_SHIFT)
+
+#define XEN_PFN_PER_PAGE (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PFN_DOWN(x) ((x) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_UP(x) (((x) + XEN_PAGE_SIZE-1) >> XEN_PAGE_SHIFT)
+#define XEN_PFN_PHYS(x) ((phys_addr_t)(x) << XEN_PAGE_SHIFT)
+
#include <asm/xen/page.h>
+/* Return the GFN associated to the first 4KB of the page */
static inline unsigned long xen_page_to_gfn(struct page *page)
{
- return pfn_to_gfn(page_to_pfn(page));
+ return pfn_to_gfn(xen_page_to_pfn(page));
}
struct xen_memory_region {
--
2.1.4
Currently, a grant is always based on the Xen page granularity (i.e.
4KB). When Linux is using a different page granularity, a single page
will be split across multiple grants.
The new helpers will be in charge of splitting the Linux page into grants
and calling a function given by the caller on each grant.
Also provide a helper to count the number of grants within a given
contiguous region.
Note that x86/include/asm/xen/page.h now includes
xen/interface/grant_table.h rather than xen/grant_table.h. This is
necessary because xen/grant_table.h depends on asm/xen/page.h and would
break the compilation. Furthermore, only the definitions in
interface/grant_table.h are required.
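A minimal sketch of a caller (illustration only, not part of this patch);
struct my_setup and its fields are hypothetical per-request state:

    static void grant_one_chunk(unsigned long gfn, unsigned int offset,
                                unsigned int len, void *data)
    {
            struct my_setup *setup = data;  /* hypothetical */

            /* Grant the 4KB frame "gfn"; the backend will use the data
             * located at "offset"/"len" within it. */
            gnttab_grant_foreign_access_ref(setup->refs[setup->idx++],
                                            setup->otherend_id, gfn, 0);
    }

    /* gnttab_count_grant(offset, len) refs must be claimed beforehand */
    gnttab_foreach_grant_in_range(page, offset, len, grant_one_chunk, &setup);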
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Changes in v3:
- Fix error reported by checkpatch.pl
- Typoes
- s/pfn/xen_pfn/ in gnttab_foreach_grant
- Drop the possibility to use less data. The complexity is moved
in netback which is the only user
- Rename gnttab_foreach_grant into gnttab_foreach_grant_in_range
- s/offset/start/ in gnttab_count_grant and update the
description of the parameter
- s/mfn/gfn base on the new terminologies
- Add EXPORT_SYMBOL_GPL for gnttab_foreach_grant_in_range
- Use xen_offset_in_page and XEN_PFN_DOWN whenever it's possible
- Fix compilation on x86.
Changes in v2:
- Patch added
---
arch/x86/include/asm/xen/page.h | 2 +-
drivers/xen/grant-table.c | 26 +++++++++++++++++++++++++
include/xen/grant_table.h | 42 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 0b762f6..501479e 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -12,7 +12,7 @@
#include <asm/pgtable.h>
#include <xen/interface/xen.h>
-#include <xen/grant_table.h>
+#include <xen/interface/grant_table.h>
#include <xen/features.h>
/* Xen machine address */
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 62f591f..94ae0fd 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -776,6 +776,32 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
}
EXPORT_SYMBOL_GPL(gnttab_batch_copy);
+void gnttab_foreach_grant_in_range(struct page *page,
+ unsigned int offset,
+ unsigned int len,
+ xen_grant_fn_t fn,
+ void *data)
+{
+ unsigned int goffset;
+ unsigned int glen;
+ unsigned long xen_pfn;
+
+ len = min_t(unsigned int, PAGE_SIZE - offset, len);
+ goffset = xen_offset_in_page(offset);
+
+ xen_pfn = xen_page_to_pfn(page) + XEN_PFN_DOWN(offset);
+
+ while (len) {
+ glen = min_t(unsigned int, XEN_PAGE_SIZE - goffset, len);
+ fn(pfn_to_gfn(xen_pfn), goffset, glen, data);
+
+ goffset = 0;
+ xen_pfn++;
+ len -= glen;
+ }
+}
+EXPORT_SYMBOL_GPL(gnttab_foreach_grant_in_range);
+
int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count)
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 4478f4b..2a8ebe8 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -45,8 +45,10 @@
#include <asm/xen/hypervisor.h>
#include <xen/features.h>
+#include <xen/page.h>
#include <linux/mm_types.h>
#include <linux/page-flags.h>
+#include <linux/kernel.h>
#define GNTTAB_RESERVED_XENSTORE 1
@@ -224,4 +226,44 @@ static inline struct xen_page_foreign *xen_page_foreign(struct page *page)
#endif
}
+/* Split Linux page in chunk of the size of the grant and call fn
+ *
+ * Parameters of fn:
+ * gfn: guest frame number
+ * offset: offset in the grant
+ * len: length of the data in the grant.
+ * data: internal information
+ */
+typedef void (*xen_grant_fn_t)(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data);
+
+void gnttab_foreach_grant_in_range(struct page *page,
+ unsigned int offset,
+ unsigned int len,
+ xen_grant_fn_t fn,
+ void *data);
+
+/* Helper to get to call fn only on the first "grant chunk" */
+static inline void gnttab_one_grant(struct page *page, unsigned int offset,
+ unsigned len, xen_grant_fn_t fn,
+ void *data)
+{
+ /* The first request is limited to the size of one grant */
+ len = min_t(unsigned int, XEN_PAGE_SIZE - (offset & ~XEN_PAGE_MASK),
+ len);
+
+ gnttab_foreach_grant_in_range(page, offset, len, fn, data);
+}
+
+/* Get the number of grant in a specified region
+ *
+ * start: Offset from the beginning of the first page
+ * len: total length of data (can cross multiple page)
+ */
+static inline unsigned int gnttab_count_grant(unsigned int start,
+ unsigned int len)
+{
+ return XEN_PFN_UP(xen_offset_in_page(start) + len);
+}
+
#endif /* __ASM_GNTTAB_H__ */
--
2.1.4
Many PV drivers contain the idiom:
pfn = page_to_gfn(...) /* Or similar */
gnttab_grant_foreign_access_ref
Replace it with a new helper. Note that when Linux is using a different
page granularity than Xen, the helper only gives access to the first 4KB
grant.
This is useful where drivers allocate a full Linux page for each grant.
Also include xen/interface/grant_table.h rather than xen/grant_table.h in
asm/page.h for x86 to fix a compilation issue [1]. Only the former is
needed in order to get the structure definitions.
[1] Interdependency between asm/page.h and xen/grant_table.h which results
in page_mfn not being defined when necessary.
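For illustration only (not part of the patch), the idiom above then becomes
(ref, domid, page and readonly are generic placeholders):

    /* Before: */
    gnttab_grant_foreign_access_ref(ref, domid,
                                    xen_page_to_gfn(page), readonly);

    /* After: */
    gnttab_page_grant_foreign_access_ref_one(ref, domid, page, readonly);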
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Changes in v3:
- Rename gnttab_page_grant_foreign_access_ref into
gnttab_page_grant_foreign_access_ref_one
- Fix typo in the commit message
- s/mfn/gfn based on the new naming
- Add David and Stefano's reviewed-by
Changes in v2:
- Patch added
---
include/xen/grant_table.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 2a8ebe8..1c1ddd9 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -131,6 +131,15 @@ void gnttab_cancel_free_callback(struct gnttab_free_callback *callback);
void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
unsigned long frame, int readonly);
+/* Give access to the first 4K of the page */
+static inline void gnttab_page_grant_foreign_access_ref_one(
+ grant_ref_t ref, domid_t domid,
+ struct page *page, int readonly)
+{
+ gnttab_grant_foreign_access_ref(ref, domid, xen_page_to_gfn(page),
+ readonly);
+}
+
void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
unsigned long pfn);
--
2.1.4
Currently, blkif_queue_request has 2 distinct execution paths:
- Send a discard request
- Send a read/write request
The function also allocates grants to use for generating the request,
although these are only used for read/write requests.
Rather than having a single function with 2 distinct execution paths,
split it in 2. This also removes one level of indentation.
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: Roger Pau Monné <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Roger, if you really want, I can drop the else clause in
blkif_queue_request; IMHO it's clearer as it is. I've kept your
Reviewed-by nonetheless. Let me know if that's not fine.
Changes in v3:
- Fix errors reported by checkpatch.pl
- Add Roger's Reviewed-by
Changes in v2:
- Patch added
---
drivers/block/xen-blkfront.c | 281 ++++++++++++++++++++++++-------------------
1 file changed, 155 insertions(+), 126 deletions(-)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index ebb3624..b235689 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -394,13 +394,35 @@ static int blkif_ioctl(struct block_device *bdev, fmode_t mode,
return 0;
}
-/*
- * Generate a Xen blkfront IO request from a blk layer request. Reads
- * and writes are handled as expected.
- *
- * @req: a request struct
- */
-static int blkif_queue_request(struct request *req)
+static int blkif_queue_discard_req(struct request *req)
+{
+ struct blkfront_info *info = req->rq_disk->private_data;
+ struct blkif_request *ring_req;
+ unsigned long id;
+
+ /* Fill out a communications ring structure. */
+ ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+ id = get_id_from_freelist(info);
+ info->shadow[id].request = req;
+
+ ring_req->operation = BLKIF_OP_DISCARD;
+ ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
+ ring_req->u.discard.id = id;
+ ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
+ if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
+ ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
+ else
+ ring_req->u.discard.flag = 0;
+
+ info->ring.req_prod_pvt++;
+
+ /* Keep a private copy so we can reissue requests when recovering. */
+ info->shadow[id].req = *ring_req;
+
+ return 0;
+}
+
+static int blkif_queue_rw_req(struct request *req)
{
struct blkfront_info *info = req->rq_disk->private_data;
struct blkif_request *ring_req;
@@ -420,9 +442,6 @@ static int blkif_queue_request(struct request *req)
struct scatterlist *sg;
int nseg, max_grefs;
- if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
- return 1;
-
max_grefs = req->nr_phys_segments;
if (max_grefs > BLKIF_MAX_SEGMENTS_PER_REQUEST)
/*
@@ -452,139 +471,131 @@ static int blkif_queue_request(struct request *req)
id = get_id_from_freelist(info);
info->shadow[id].request = req;
- if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE))) {
- ring_req->operation = BLKIF_OP_DISCARD;
- ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
- ring_req->u.discard.id = id;
- ring_req->u.discard.sector_number = (blkif_sector_t)blk_rq_pos(req);
- if ((req->cmd_flags & REQ_SECURE) && info->feature_secdiscard)
- ring_req->u.discard.flag = BLKIF_DISCARD_SECURE;
- else
- ring_req->u.discard.flag = 0;
+ BUG_ON(info->max_indirect_segments == 0 &&
+ req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+ BUG_ON(info->max_indirect_segments &&
+ req->nr_phys_segments > info->max_indirect_segments);
+ nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+ ring_req->u.rw.id = id;
+ if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+ /*
+ * The indirect operation can only be a BLKIF_OP_READ or
+ * BLKIF_OP_WRITE
+ */
+ BUG_ON(req->cmd_flags & (REQ_FLUSH | REQ_FUA));
+ ring_req->operation = BLKIF_OP_INDIRECT;
+ ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
+ BLKIF_OP_WRITE : BLKIF_OP_READ;
+ ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
+ ring_req->u.indirect.handle = info->handle;
+ ring_req->u.indirect.nr_segments = nseg;
} else {
- BUG_ON(info->max_indirect_segments == 0 &&
- req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
- BUG_ON(info->max_indirect_segments &&
- req->nr_phys_segments > info->max_indirect_segments);
- nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
- ring_req->u.rw.id = id;
- if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+ ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
+ ring_req->u.rw.handle = info->handle;
+ ring_req->operation = rq_data_dir(req) ?
+ BLKIF_OP_WRITE : BLKIF_OP_READ;
+ if (req->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
/*
- * The indirect operation can only be a BLKIF_OP_READ or
- * BLKIF_OP_WRITE
+ * Ideally we can do an unordered flush-to-disk.
+ * In case the backend onlysupports barriers, use that.
+ * A barrier request a superset of FUA, so we can
+ * implement it the same way. (It's also a FLUSH+FUA,
+ * since it is guaranteed ordered WRT previous writes.)
*/
- BUG_ON(req->cmd_flags & (REQ_FLUSH | REQ_FUA));
- ring_req->operation = BLKIF_OP_INDIRECT;
- ring_req->u.indirect.indirect_op = rq_data_dir(req) ?
- BLKIF_OP_WRITE : BLKIF_OP_READ;
- ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
- ring_req->u.indirect.handle = info->handle;
- ring_req->u.indirect.nr_segments = nseg;
- } else {
- ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
- ring_req->u.rw.handle = info->handle;
- ring_req->operation = rq_data_dir(req) ?
- BLKIF_OP_WRITE : BLKIF_OP_READ;
- if (req->cmd_flags & (REQ_FLUSH | REQ_FUA)) {
+ switch (info->feature_flush &
+ ((REQ_FLUSH|REQ_FUA))) {
+ case REQ_FLUSH|REQ_FUA:
+ ring_req->operation =
+ BLKIF_OP_WRITE_BARRIER;
+ break;
+ case REQ_FLUSH:
+ ring_req->operation =
+ BLKIF_OP_FLUSH_DISKCACHE;
+ break;
+ default:
+ ring_req->operation = 0;
+ }
+ }
+ ring_req->u.rw.nr_segments = nseg;
+ }
+ for_each_sg(info->shadow[id].sg, sg, nseg, i) {
+ fsect = sg->offset >> 9;
+ lsect = fsect + (sg->length >> 9) - 1;
+
+ if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
+ (i % SEGS_PER_INDIRECT_FRAME == 0)) {
+ unsigned long uninitialized_var(pfn);
+
+ if (segments)
+ kunmap_atomic(segments);
+
+ n = i / SEGS_PER_INDIRECT_FRAME;
+ if (!info->feature_persistent) {
+ struct page *indirect_page;
+
/*
- * Ideally we can do an unordered flush-to-disk. In case the
- * backend onlysupports barriers, use that. A barrier request
- * a superset of FUA, so we can implement it the same
- * way. (It's also a FLUSH+FUA, since it is
- * guaranteed ordered WRT previous writes.)
+ * Fetch a pre-allocated page to use for
+ * indirect grefs
*/
- switch (info->feature_flush &
- ((REQ_FLUSH|REQ_FUA))) {
- case REQ_FLUSH|REQ_FUA:
- ring_req->operation =
- BLKIF_OP_WRITE_BARRIER;
- break;
- case REQ_FLUSH:
- ring_req->operation =
- BLKIF_OP_FLUSH_DISKCACHE;
- break;
- default:
- ring_req->operation = 0;
- }
+ BUG_ON(list_empty(&info->indirect_pages));
+ indirect_page = list_first_entry(&info->indirect_pages,
+ struct page, lru);
+ list_del(&indirect_page->lru);
+ pfn = page_to_pfn(indirect_page);
}
- ring_req->u.rw.nr_segments = nseg;
+ gnt_list_entry = get_grant(&gref_head, pfn, info);
+ info->shadow[id].indirect_grants[n] = gnt_list_entry;
+ segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+ ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
}
- for_each_sg(info->shadow[id].sg, sg, nseg, i) {
- fsect = sg->offset >> 9;
- lsect = fsect + (sg->length >> 9) - 1;
-
- if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
- (i % SEGS_PER_INDIRECT_FRAME == 0)) {
- unsigned long uninitialized_var(pfn);
-
- if (segments)
- kunmap_atomic(segments);
-
- n = i / SEGS_PER_INDIRECT_FRAME;
- if (!info->feature_persistent) {
- struct page *indirect_page;
-
- /* Fetch a pre-allocated page to use for indirect grefs */
- BUG_ON(list_empty(&info->indirect_pages));
- indirect_page = list_first_entry(&info->indirect_pages,
- struct page, lru);
- list_del(&indirect_page->lru);
- pfn = page_to_pfn(indirect_page);
- }
- gnt_list_entry = get_grant(&gref_head, pfn, info);
- info->shadow[id].indirect_grants[n] = gnt_list_entry;
- segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
- ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
- }
- gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
- ref = gnt_list_entry->gref;
+ gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
+ ref = gnt_list_entry->gref;
- info->shadow[id].grants_used[i] = gnt_list_entry;
+ info->shadow[id].grants_used[i] = gnt_list_entry;
- if (rq_data_dir(req) && info->feature_persistent) {
- char *bvec_data;
- void *shared_data;
+ if (rq_data_dir(req) && info->feature_persistent) {
+ char *bvec_data;
+ void *shared_data;
- BUG_ON(sg->offset + sg->length > PAGE_SIZE);
+ BUG_ON(sg->offset + sg->length > PAGE_SIZE);
- shared_data = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
- bvec_data = kmap_atomic(sg_page(sg));
+ shared_data = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+ bvec_data = kmap_atomic(sg_page(sg));
- /*
- * this does not wipe data stored outside the
- * range sg->offset..sg->offset+sg->length.
- * Therefore, blkback *could* see data from
- * previous requests. This is OK as long as
- * persistent grants are shared with just one
- * domain. It may need refactoring if this
- * changes
- */
- memcpy(shared_data + sg->offset,
- bvec_data + sg->offset,
- sg->length);
+ /*
+ * this does not wipe data stored outside the
+ * range sg->offset..sg->offset+sg->length.
+ * Therefore, blkback *could* see data from
+ * previous requests. This is OK as long as
+ * persistent grants are shared with just one
+ * domain. It may need refactoring if this
+ * changes
+ */
+ memcpy(shared_data + sg->offset,
+ bvec_data + sg->offset,
+ sg->length);
- kunmap_atomic(bvec_data);
- kunmap_atomic(shared_data);
- }
- if (ring_req->operation != BLKIF_OP_INDIRECT) {
- ring_req->u.rw.seg[i] =
- (struct blkif_request_segment) {
- .gref = ref,
- .first_sect = fsect,
- .last_sect = lsect };
- } else {
- n = i % SEGS_PER_INDIRECT_FRAME;
- segments[n] =
+ kunmap_atomic(bvec_data);
+ kunmap_atomic(shared_data);
+ }
+ if (ring_req->operation != BLKIF_OP_INDIRECT) {
+ ring_req->u.rw.seg[i] =
(struct blkif_request_segment) {
- .gref = ref,
- .first_sect = fsect,
- .last_sect = lsect };
- }
+ .gref = ref,
+ .first_sect = fsect,
+ .last_sect = lsect };
+ } else {
+ n = i % SEGS_PER_INDIRECT_FRAME;
+ segments[n] =
+ (struct blkif_request_segment) {
+ .gref = ref,
+ .first_sect = fsect,
+ .last_sect = lsect };
}
- if (segments)
- kunmap_atomic(segments);
}
+ if (segments)
+ kunmap_atomic(segments);
info->ring.req_prod_pvt++;
@@ -597,6 +608,24 @@ static int blkif_queue_request(struct request *req)
return 0;
}
+/*
+ * Generate a Xen blkfront IO request from a blk layer request. Reads
+ * and writes are handled as expected.
+ *
+ * @req: a request struct
+ */
+static int blkif_queue_request(struct request *req)
+{
+ struct blkfront_info *info = req->rq_disk->private_data;
+
+ if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
+ return 1;
+
+ if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE)))
+ return blkif_queue_discard_req(req);
+ else
+ return blkif_queue_rw_req(req);
+}
static inline void flush_requests(struct blkfront_info *info)
{
--
2.1.4
All usage of the field pfn follows the same idiom:
pfn_to_page(grant->pfn)
which always returns the same page. Store the page directly in the grant
to clean up the code.
Signed-off-by: Julien Grall <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Changes in v3:
- Use the correct indentation in get_grant. The current
indentation (i.e. without this patch) was wrong because it was
using spaces rather than tabs.
- Add Roger's acked and Stefano's reviewed
- s/mfn/gfn based on the new naming
Changes in v2:
- Patch added
---
drivers/block/xen-blkfront.c | 39 +++++++++++++++++++--------------------
1 file changed, 19 insertions(+), 20 deletions(-)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index b235689..e431e82 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -68,7 +68,7 @@ enum blkif_state {
struct grant {
grant_ref_t gref;
- unsigned long pfn;
+ struct page *page;
struct list_head node;
};
@@ -221,7 +221,7 @@ static int fill_grant_buffer(struct blkfront_info *info, int num)
kfree(gnt_list_entry);
goto out_of_memory;
}
- gnt_list_entry->pfn = page_to_pfn(granted_page);
+ gnt_list_entry->page = granted_page;
}
gnt_list_entry->gref = GRANT_INVALID_REF;
@@ -236,7 +236,7 @@ out_of_memory:
&info->grants, node) {
list_del(&gnt_list_entry->node);
if (info->feature_persistent)
- __free_page(pfn_to_page(gnt_list_entry->pfn));
+ __free_page(gnt_list_entry->page);
kfree(gnt_list_entry);
i--;
}
@@ -245,8 +245,8 @@ out_of_memory:
}
static struct grant *get_grant(grant_ref_t *gref_head,
- unsigned long pfn,
- struct blkfront_info *info)
+ struct page *page,
+ struct blkfront_info *info)
{
struct grant *gnt_list_entry;
unsigned long buffer_gfn;
@@ -265,10 +265,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
- BUG_ON(!pfn);
- gnt_list_entry->pfn = pfn;
+ BUG_ON(!page);
+ gnt_list_entry->page = page;
}
- buffer_gfn = pfn_to_gfn(gnt_list_entry->pfn);
+ buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
info->xbdev->otherend_id,
buffer_gfn, 0);
@@ -524,7 +524,7 @@ static int blkif_queue_rw_req(struct request *req)
if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
(i % SEGS_PER_INDIRECT_FRAME == 0)) {
- unsigned long uninitialized_var(pfn);
+ struct page *uninitialized_var(page);
if (segments)
kunmap_atomic(segments);
@@ -541,15 +541,15 @@ static int blkif_queue_rw_req(struct request *req)
indirect_page = list_first_entry(&info->indirect_pages,
struct page, lru);
list_del(&indirect_page->lru);
- pfn = page_to_pfn(indirect_page);
+ page = indirect_page;
}
- gnt_list_entry = get_grant(&gref_head, pfn, info);
+ gnt_list_entry = get_grant(&gref_head, page, info);
info->shadow[id].indirect_grants[n] = gnt_list_entry;
- segments = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+ segments = kmap_atomic(gnt_list_entry->page);
ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
}
- gnt_list_entry = get_grant(&gref_head, page_to_pfn(sg_page(sg)), info);
+ gnt_list_entry = get_grant(&gref_head, sg_page(sg), info);
ref = gnt_list_entry->gref;
info->shadow[id].grants_used[i] = gnt_list_entry;
@@ -560,7 +560,7 @@ static int blkif_queue_rw_req(struct request *req)
BUG_ON(sg->offset + sg->length > PAGE_SIZE);
- shared_data = kmap_atomic(pfn_to_page(gnt_list_entry->pfn));
+ shared_data = kmap_atomic(gnt_list_entry->page);
bvec_data = kmap_atomic(sg_page(sg));
/*
@@ -1001,7 +1001,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
info->persistent_gnts_c--;
}
if (info->feature_persistent)
- __free_page(pfn_to_page(persistent_gnt->pfn));
+ __free_page(persistent_gnt->page);
kfree(persistent_gnt);
}
}
@@ -1036,7 +1036,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
persistent_gnt = info->shadow[i].grants_used[j];
gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
if (info->feature_persistent)
- __free_page(pfn_to_page(persistent_gnt->pfn));
+ __free_page(persistent_gnt->page);
kfree(persistent_gnt);
}
@@ -1050,7 +1050,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
for (j = 0; j < INDIRECT_GREFS(segs); j++) {
persistent_gnt = info->shadow[i].indirect_grants[j];
gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
- __free_page(pfn_to_page(persistent_gnt->pfn));
+ __free_page(persistent_gnt->page);
kfree(persistent_gnt);
}
@@ -1101,8 +1101,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
for_each_sg(s->sg, sg, nseg, i) {
BUG_ON(sg->offset + sg->length > PAGE_SIZE);
- shared_data = kmap_atomic(
- pfn_to_page(s->grants_used[i]->pfn));
+ shared_data = kmap_atomic(s->grants_used[i]->page);
bvec_data = kmap_atomic(sg_page(sg));
memcpy(bvec_data + sg->offset,
shared_data + sg->offset,
@@ -1153,7 +1152,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
* Add the used indirect page back to the list of
* available pages for indirect grefs.
*/
- indirect_page = pfn_to_page(s->indirect_grants[i]->pfn);
+ indirect_page = s->indirect_grants[i]->page;
list_add(&indirect_page->lru, &info->indirect_pages);
s->indirect_grants[i]->gref = GRANT_INVALID_REF;
list_add_tail(&s->indirect_grants[i]->node, &info->grants);
--
2.1.4
Prepare the code to support 64KB page granularity. The first
implementation will use a full Linux page per indirect and persistent
grant. When non-persistent grants are used, each page of a bio request
may be split into multiple grants.
Furthermore, the page field of the grant structure is only used to copy
data from persistent grants or indirect grants. Avoid setting it for
other use cases, as it would have no meaning given that the page will be
split into multiple grants.
Provide 2 functions: one to set up indirect grants, the other for bio
pages.
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Roger Pau Monné <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Changes in v3:
- Fix errors reported by checkpatch.pl
- gnttab_page_grant_foreign_access_ref has been renamed to
gnttab_page_grant_foreign_access_ref_one
- Fix compilation by using get_indirect_grant rather than
get_grant (the changes were in a later patch...).
- Make grant_foreign_access static inline
- s/mfn/gfn/ based on the new naming
Changes in v2:
- Patch added
---
drivers/block/xen-blkfront.c | 88 +++++++++++++++++++++++++++++---------------
1 file changed, 59 insertions(+), 29 deletions(-)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index e431e82..52105e7 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -244,34 +244,77 @@ out_of_memory:
return -ENOMEM;
}
-static struct grant *get_grant(grant_ref_t *gref_head,
- struct page *page,
- struct blkfront_info *info)
+static struct grant *get_free_grant(struct blkfront_info *info)
{
struct grant *gnt_list_entry;
- unsigned long buffer_gfn;
BUG_ON(list_empty(&info->grants));
gnt_list_entry = list_first_entry(&info->grants, struct grant,
- node);
+ node);
list_del(&gnt_list_entry->node);
- if (gnt_list_entry->gref != GRANT_INVALID_REF) {
+ if (gnt_list_entry->gref != GRANT_INVALID_REF)
info->persistent_gnts_c--;
+
+ return gnt_list_entry;
+}
+
+static inline void grant_foreign_access(const struct grant *gnt_list_entry,
+ const struct blkfront_info *info)
+{
+ gnttab_page_grant_foreign_access_ref_one(gnt_list_entry->gref,
+ info->xbdev->otherend_id,
+ gnt_list_entry->page,
+ 0);
+}
+
+static struct grant *get_grant(grant_ref_t *gref_head,
+ unsigned long gfn,
+ struct blkfront_info *info)
+{
+ struct grant *gnt_list_entry = get_free_grant(info);
+
+ if (gnt_list_entry->gref != GRANT_INVALID_REF)
return gnt_list_entry;
+
+ /* Assign a gref to this page */
+ gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
+ BUG_ON(gnt_list_entry->gref == -ENOSPC);
+ if (info->feature_persistent)
+ grant_foreign_access(gnt_list_entry, info);
+ else {
+ /* Grant access to the GFN passed by the caller */
+ gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
+ info->xbdev->otherend_id,
+ gfn, 0);
}
+ return gnt_list_entry;
+}
+
+static struct grant *get_indirect_grant(grant_ref_t *gref_head,
+ struct blkfront_info *info)
+{
+ struct grant *gnt_list_entry = get_free_grant(info);
+
+ if (gnt_list_entry->gref != GRANT_INVALID_REF)
+ return gnt_list_entry;
+
/* Assign a gref to this page */
gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
- BUG_ON(!page);
- gnt_list_entry->page = page;
+ struct page *indirect_page;
+
+ /* Fetch a pre-allocated page to use for indirect grefs */
+ BUG_ON(list_empty(&info->indirect_pages));
+ indirect_page = list_first_entry(&info->indirect_pages,
+ struct page, lru);
+ list_del(&indirect_page->lru);
+ gnt_list_entry->page = indirect_page;
}
- buffer_gfn = xen_page_to_gfn(gnt_list_entry->page);
- gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
- info->xbdev->otherend_id,
- buffer_gfn, 0);
+ grant_foreign_access(gnt_list_entry, info);
+
return gnt_list_entry;
}
@@ -524,32 +567,19 @@ static int blkif_queue_rw_req(struct request *req)
if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
(i % SEGS_PER_INDIRECT_FRAME == 0)) {
- struct page *uninitialized_var(page);
-
if (segments)
kunmap_atomic(segments);
n = i / SEGS_PER_INDIRECT_FRAME;
- if (!info->feature_persistent) {
- struct page *indirect_page;
-
- /*
- * Fetch a pre-allocated page to use for
- * indirect grefs
- */
- BUG_ON(list_empty(&info->indirect_pages));
- indirect_page = list_first_entry(&info->indirect_pages,
- struct page, lru);
- list_del(&indirect_page->lru);
- page = indirect_page;
- }
- gnt_list_entry = get_grant(&gref_head, page, info);
+ gnt_list_entry = get_indirect_grant(&gref_head, info);
info->shadow[id].indirect_grants[n] = gnt_list_entry;
segments = kmap_atomic(gnt_list_entry->page);
ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
}
- gnt_list_entry = get_grant(&gref_head, sg_page(sg), info);
+ gnt_list_entry = get_grant(&gref_head,
+ xen_page_to_gfn(sg_page(sg)),
+ info);
ref = gnt_list_entry->gref;
info->shadow[id].grants_used[i] = gnt_list_entry;
--
2.1.4
On ARM, all DMA-capable devices on the same platform may not be protected
by an IOMMU. DMA requests have to use the BFN (i.e. the MFN on ARM) in
order to drive the device correctly.
While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
mappings will break this contiguous mapping.
When Linux is using 64KB page granularity, a page may be split across
multiple non-contiguous MFNs (Xen is using 4KB page granularity).
Therefore a DMA request will likely fail.
Checking that a 64KB page is using contiguous MFNs is tedious. For now,
always report that biovecs are not mergeable.
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
There are some ideas on how to check whether two biovecs could be merged
(see [1]), but it's not critical and can be considered a performance
improvement.
Changes in v3:
- Update commit message
- s/mfn/bfn/ base on the new renaming
- Update TODO
Changes in v2:
- Remove the workaround and check if the Linux page granularity
is the same as Xen or not
[1] https://lkml.org/lkml/2015/7/17/418
---
drivers/xen/biomerge.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index 8ae2fc90..4da69db 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -6,10 +6,18 @@
bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
const struct bio_vec *vec2)
{
+#if XEN_PAGE_SIZE == PAGE_SIZE
unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
((bfn1 == bfn2) || ((bfn1+1) == bfn2));
+#else
+ /*
+ * XXX: Add support for merging bio_vec when using different page
+ * size in Xen and Linux.
+ */
+ return 0;
+#endif
}
EXPORT_SYMBOL(xen_biovec_phys_mergeable);
--
2.1.4
All the rings (xenstore and PV rings) are always based on the page
granularity of Xen.
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Changes in v3:
- Fix errors reported by checkpatch.pl
- s/MFN/GFN base on the new naming
- Add David and Stefano's reviewed-by
Changes in v2:
- Also update the ring mapping function
---
drivers/xen/xenbus/xenbus_client.c | 6 +++---
drivers/xen/xenbus/xenbus_probe.c | 3 ++-
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index daa267a..a2f6276 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -388,7 +388,7 @@ int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
}
grefs[i] = err;
- vaddr = vaddr + PAGE_SIZE;
+ vaddr = vaddr + XEN_PAGE_SIZE;
}
return 0;
@@ -555,7 +555,7 @@ static int xenbus_map_ring_valloc_pv(struct xenbus_device *dev,
if (!node)
return -ENOMEM;
- area = alloc_vm_area(PAGE_SIZE * nr_grefs, ptes);
+ area = alloc_vm_area(XEN_PAGE_SIZE * nr_grefs, ptes);
if (!area) {
kfree(node);
return -ENOMEM;
@@ -750,7 +750,7 @@ static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
unsigned long addr;
memset(&unmap[i], 0, sizeof(unmap[i]));
- addr = (unsigned long)vaddr + (PAGE_SIZE * i);
+ addr = (unsigned long)vaddr + (XEN_PAGE_SIZE * i);
unmap[i].host_addr = arbitrary_virt_to_machine(
lookup_address(addr, &level)).maddr;
unmap[i].dev_bus_addr = 0;
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3cbe055..33a31cf 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -802,7 +802,8 @@ static int __init xenbus_init(void)
goto out_error;
xen_store_gfn = (unsigned long)v;
xen_store_interface =
- xen_remap(xen_store_gfn << PAGE_SHIFT, PAGE_SIZE);
+ xen_remap(xen_store_gfn << XEN_PAGE_SHIFT,
+ XEN_PAGE_SIZE);
break;
default:
pr_warn("Xenstore state unknown\n");
--
2.1.4
The console ring is always based on the page granularity of Xen.
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
---
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: [email protected]
Changes in v3:
- Some changes have been moved to the series "Use correctly the
Xen memory terminologies in Linux".
- Add Stefano's reviewed-by
---
drivers/tty/hvc/hvc_xen.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 10beb15..0599d9d 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -230,7 +230,7 @@ static int xen_hvm_console_init(void)
if (r < 0 || v == 0)
goto err;
gfn = v;
- info->intf = xen_remap(gfn << PAGE_SHIFT, PAGE_SIZE);
+ info->intf = xen_remap(gfn << XEN_PAGE_SHIFT, PAGE_SIZE);
if (info->intf == NULL)
goto err;
info->vtermno = HVC_COOKIE;
@@ -472,7 +472,7 @@ static int xencons_resume(struct xenbus_device *dev)
struct xencons_info *info = dev_get_drvdata(&dev->dev);
xencons_disconnect_backend(info);
- memset(info->intf, 0, PAGE_SIZE);
+ memset(info->intf, 0, XEN_PAGE_SIZE);
return xencons_connect_backend(dev, info);
}
--
2.1.4
For ARM64 guests, Linux is able to support either 64KB or 4KB page
granularity. However, the hypercall interface is always based on 4KB
page granularity.
With 64KB page granularity, a single page will be spread over multiple
Xen frames.
To avoid splitting the page into 4KB frames, take advantage of the
extent_order field to directly allocate/free chunks of the Linux page
size.
Note that the PV MMU is only used for PV guests (which are x86-only) and
the page granularity is always 4KB there. Some BUILD_BUG_ONs have been
added to ensure that, because the code has not been modified.
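For clarity (illustration only), the arithmetic behind the new
EXTENT_ORDER define in the hunk below:

    /* 64KB Linux pages: XEN_PFN_PER_PAGE = 64KB / 4KB = 16
     *                   EXTENT_ORDER     = fls(16) - 1 = 4
     *                   one extent       = 2^4 * 4KB   = 64KB = PAGE_SIZE
     * 4KB Linux pages:  XEN_PFN_PER_PAGE = 1, so EXTENT_ORDER = 0
     *                   (behaviour unchanged)
     */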
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Wei Liu <[email protected]>
Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- Rather than splitting the page into 4KB chunks, use the
extent_order field to directly allocate a Linux-page-sized extent.
This avoids lots of code for no benefit.
Changes in v2:
- Use xen_apply_to_page to split a page in 4K chunk
- It's not necessary to have a smaller frame list. Re-use
PAGE_SIZE
- Convert reserve_additional_memory to use XEN_... macro
---
drivers/xen/balloon.c | 47 ++++++++++++++++++++++++++++++++++++-----------
1 file changed, 36 insertions(+), 11 deletions(-)
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 9734649..c8739c8 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -70,6 +70,9 @@
#include <xen/features.h>
#include <xen/page.h>
+/* Use one extent per PAGE_SIZE to avoid break down into multiple frame */
+#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
+
/*
* balloon_process() state:
*
@@ -230,6 +233,11 @@ static enum bp_state reserve_additional_memory(long credit)
nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
#ifdef CONFIG_XEN_HAVE_PVMMU
+ /* We don't support PV MMU when Linux and Xen is using
+ * different page granularity.
+ */
+ BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
/*
* add_memory() will build page tables for the new memory so
* the p2m must contain invalid entries so the correct
@@ -326,11 +334,11 @@ static enum bp_state reserve_additional_memory(long credit)
static enum bp_state increase_reservation(unsigned long nr_pages)
{
int rc;
- unsigned long pfn, i;
+ unsigned long i;
struct page *page;
struct xen_memory_reservation reservation = {
.address_bits = 0,
- .extent_order = 0,
+ .extent_order = EXTENT_ORDER,
.domid = DOMID_SELF
};
@@ -352,7 +360,11 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
nr_pages = i;
break;
}
- frame_list[i] = page_to_pfn(page);
+
+ /* XENMEM_populate_physmap requires a PFN based on Xen
+ * granularity.
+ */
+ frame_list[i] = xen_page_to_pfn(page);
page = balloon_next_page(page);
}
@@ -366,10 +378,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
page = balloon_retrieve(false);
BUG_ON(page == NULL);
- pfn = page_to_pfn(page);
-
#ifdef CONFIG_XEN_HAVE_PVMMU
+ /* We don't support PV MMU when Linux and Xen is using
+ * different page granularity.
+ */
+ BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ unsigned long pfn = page_to_pfn(page);
+
set_phys_to_machine(pfn, frame_list[i]);
/* Link back into the page tables if not highmem. */
@@ -396,14 +413,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
{
enum bp_state state = BP_DONE;
- unsigned long pfn, i;
+ unsigned long i;
struct page *page;
int ret;
struct xen_memory_reservation reservation = {
.address_bits = 0,
- .extent_order = 0,
+ .extent_order = EXTENT_ORDER,
.domid = DOMID_SELF
};
+ static struct page *pages[ARRAY_SIZE(frame_list)];
#ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
if (balloon_stats.hotplug_pages) {
@@ -426,7 +444,9 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
}
scrub_page(page);
- frame_list[i] = page_to_pfn(page);
+ /* XENMEM_decrease_reservation requires a GFN */
+ frame_list[i] = xen_page_to_gfn(page);
+ pages[i] = page;
}
/*
@@ -440,12 +460,17 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
/* Update direct mapping, invalidate P2M, and add to balloon. */
for (i = 0; i < nr_pages; i++) {
- pfn = frame_list[i];
- frame_list[i] = pfn_to_gfn(pfn);
- page = pfn_to_page(pfn);
+ page = pages[i];
#ifdef CONFIG_XEN_HAVE_PVMMU
+ /* We don't support PV MMU when Linux and Xen is using
+ * different page granularity.
+ */
+ BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
+
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ unsigned long pfn = page_to_pfn(page);
+
if (!PageHighMem(page)) {
ret = HYPERVISOR_update_va_mapping(
(unsigned long)__va(pfn << PAGE_SHIFT),
--
2.1.4
Only use the first 4KB of the page to store the event channel info. It
means that we will waste 60KB every time we allocate a page for:
* control block: a page is allocated per CPU
* event array: a page is allocated every time we need to expand it
I think we can reduce the memory waste for the 2 areas by:
* control block: sharing between multiple vCPUs. Although it will
require some bookkeeping in order not to free the page when a CPU
goes offline while the other CPUs sharing the page are still online
* event array: always extend the event array by 64KB (i.e. 16 4KB
chunks). That would require more care when we fail to expand the
event array.
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Note I haven't updated the suggestion to reduce the memory waste
after David's email [1]. I can do it if necessary.
Changes in v3:
- Add David and Stefano's reviewed-by
[1] http://lists.xen.org/archives/html/xen-devel/2015-07/msg04596.html
---
drivers/xen/events/events_base.c | 2 +-
drivers/xen/events/events_fifo.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c49bb7a..00dd923 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -40,11 +40,11 @@
#include <asm/idle.h>
#include <asm/io_apic.h>
#include <asm/xen/pci.h>
-#include <xen/page.h>
#endif
#include <asm/sync_bitops.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/hypervisor.h>
+#include <xen/page.h>
#include <xen/xen.h>
#include <xen/hvm.h>
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 1d4baf5..e3e9e3d 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -54,7 +54,7 @@
#include "events_internal.h"
-#define EVENT_WORDS_PER_PAGE (PAGE_SIZE / sizeof(event_word_t))
+#define EVENT_WORDS_PER_PAGE (XEN_PAGE_SIZE / sizeof(event_word_t))
#define MAX_EVENT_ARRAY_PAGES (EVTCHN_FIFO_NR_CHANNELS / EVENT_WORDS_PER_PAGE)
struct evtchn_fifo_queue {
--
2.1.4
The Xen interface is using 4KB page granularity. This means that each
grant is 4KB.
The current implementation allocates a Linux page per grant. On Linux
using 64KB page granularity, only the first 4KB of the page will be
used.
We could decrease the memory wasted by sharing the page between multiple
grants. It will require some care with the {Set,Clear}ForeignPage macros.
Note that no changes have been made in the x86 code because both Linux
and Xen will only use 4KB page granularity.
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
---
Cc: Stefano Stabellini <[email protected]>
Cc: Russell King <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Changes in v3:
- Add Stefano's reviewed-by
Changes in v2
- Add David's reviewed-by
---
arch/arm/xen/p2m.c | 6 +++---
drivers/xen/grant-table.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/arm/xen/p2m.c b/arch/arm/xen/p2m.c
index 887596c..0ed01f2 100644
--- a/arch/arm/xen/p2m.c
+++ b/arch/arm/xen/p2m.c
@@ -93,8 +93,8 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
for (i = 0; i < count; i++) {
if (map_ops[i].status)
continue;
- set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
- map_ops[i].dev_bus_addr >> PAGE_SHIFT);
+ set_phys_to_machine(map_ops[i].host_addr >> XEN_PAGE_SHIFT,
+ map_ops[i].dev_bus_addr >> XEN_PAGE_SHIFT);
}
return 0;
@@ -108,7 +108,7 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
int i;
for (i = 0; i < count; i++) {
- set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
+ set_phys_to_machine(unmap_ops[i].host_addr >> XEN_PAGE_SHIFT,
INVALID_P2M_ENTRY);
}
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 94ae0fd..253d4db 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -642,7 +642,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
if (xen_auto_xlat_grant_frames.count)
return -EINVAL;
- vaddr = xen_remap(addr, PAGE_SIZE * max_nr_gframes);
+ vaddr = xen_remap(addr, XEN_PAGE_SIZE * max_nr_gframes);
if (vaddr == NULL) {
pr_warn("Failed to ioremap gnttab share frames (addr=%pa)!\n",
&addr);
@@ -654,7 +654,7 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
return -ENOMEM;
}
for (i = 0; i < max_nr_gframes; i++)
- pfn[i] = PFN_DOWN(addr) + i;
+ pfn[i] = XEN_PFN_DOWN(addr) + i;
xen_auto_xlat_grant_frames.vaddr = vaddr;
xen_auto_xlat_grant_frames.pfn = pfn;
@@ -1004,7 +1004,7 @@ static void gnttab_request_version(void)
{
/* Only version 1 is used, which will always be available. */
grant_table_version = 1;
- grefs_per_grant_frame = PAGE_SIZE / sizeof(struct grant_entry_v1);
+ grefs_per_grant_frame = XEN_PAGE_SIZE / sizeof(struct grant_entry_v1);
gnttab_interface = &gnttab_v1_ops;
pr_info("Grant tables using version %d layout\n", grant_table_version);
--
2.1.4
The PV block protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux guest using 64KB page granularity to use block
devices on a non-modified Xen.
The block API is using segments which should be at least the size of a
Linux page. Therefore, the driver will have to break the page into 4KB
chunks before giving it to the backend.
Breaking a 64KB segment into 4KB chunks can result in chunks with no
data. As the PV protocol always requires data in each chunk, we have to
count the number of Xen pages that will actually be in use and avoid
sending empty chunks.
Note that a pre-defined number of grants is reserved before preparing
the request. This pre-defined number is based on the number and the
maximum size of the segments. If each segment contains a very small
amount of data, the driver may reserve too many grants (16 grants are
reserved per segment with 64KB page granularity).
Furthermore, in the case of persistent grants we allocate one Linux page
per grant although only 4KB of the page will effectively be used. This
could be improved by sharing the page between multiple grants.
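For clarity (illustration only), the worst-case grant reservation with
64KB pages, based on the new GRANTS_PER_PSEG and max_grefs computation in
the diff below:

    /* GRANTS_PER_PSEG = PAGE_SIZE / XEN_PAGE_SIZE = 64KB / 4KB = 16
     * A request with nr_phys_segments = 8 therefore reserves up to
     * 8 * 16 = 128 grants for the data, plus INDIRECT_GREFS(128) grants
     * for the indirect frames themselves, even if each segment only
     * carries a few sectors of data.
     */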
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Roger Pau Monné <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Improvements such as 64KB grant support are not taken into consideration
in this patch because we have the requirement to run a 64KB Linux on a
non-modified Xen.
Royger, I haven't retained your acked-by because some of the changes were
not just naming modifications.
Changes in v3:
- Use DIV_ROUND_UP in INDIRECT_GREFS
- Split lines over 80 characters whenever it's possible
- s/mfn/gfn/ based on the new naming
- The grant callback doesn't allow anymore to change the len
(wasn't used here).
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- Use gnttab_count_grant to get the number of grants in a sg
- Do some renaming to use the correct variable every time
Changes in v2:
- Use gnttab_foreach_grant to split a Linux page into grant
---
drivers/block/xen-blkfront.c | 324 ++++++++++++++++++++++++++++---------------
1 file changed, 213 insertions(+), 111 deletions(-)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 52105e7..acc4208 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -78,6 +78,7 @@ struct blk_shadow {
struct grant **grants_used;
struct grant **indirect_grants;
struct scatterlist *sg;
+ unsigned int num_sg;
};
struct split_bio {
@@ -107,8 +108,12 @@ static unsigned int xen_blkif_max_ring_order;
module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
-#define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+#define BLK_RING_SIZE(info) \
+ __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages)
+
+#define BLK_MAX_RING_SIZE \
+ __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * XENBUS_MAX_RING_PAGES)
+
/*
* ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
* characters are enough. Define to 20 to keep consist with backend.
@@ -147,6 +152,7 @@ struct blkfront_info
unsigned int discard_granularity;
unsigned int discard_alignment;
unsigned int feature_persistent:1;
+ /* Number of 4KB segments handled */
unsigned int max_indirect_segments;
int is_ready;
struct blk_mq_tag_set tag_set;
@@ -175,10 +181,23 @@ static DEFINE_SPINLOCK(minor_lock);
#define DEV_NAME "xvd" /* name in /dev */
-#define SEGS_PER_INDIRECT_FRAME \
- (PAGE_SIZE/sizeof(struct blkif_request_segment))
-#define INDIRECT_GREFS(_segs) \
- ((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+/*
+ * Grants are always the same size as a Xen page (i.e 4KB).
+ * A physical segment is always the same size as a Linux page.
+ * Number of grants per physical segment
+ */
+#define GRANTS_PER_PSEG (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define GRANTS_PER_INDIRECT_FRAME \
+ (XEN_PAGE_SIZE / sizeof(struct blkif_request_segment))
+
+#define PSEGS_PER_INDIRECT_FRAME \
+ (GRANTS_PER_INDIRECT_FRAME / GRANTS_PER_PSEG)
+
+#define INDIRECT_GREFS(_grants) \
+ DIV_ROUND_UP(_grants, GRANTS_PER_INDIRECT_FRAME)
+
+#define GREFS(_psegs) ((_psegs) * GRANTS_PER_PSEG)
static int blkfront_setup_indirect(struct blkfront_info *info);
@@ -465,14 +484,100 @@ static int blkif_queue_discard_req(struct request *req)
return 0;
}
+struct setup_rw_req {
+ unsigned int grant_idx;
+ struct blkif_request_segment *segments;
+ struct blkfront_info *info;
+ struct blkif_request *ring_req;
+ grant_ref_t gref_head;
+ unsigned int id;
+ /* Only used when persistent grant is used and it's a read request */
+ bool need_copy;
+ unsigned int bvec_off;
+ char *bvec_data;
+};
+
+static void blkif_setup_rw_req_grant(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data)
+{
+ struct setup_rw_req *setup = data;
+ int n, ref;
+ struct grant *gnt_list_entry;
+ unsigned int fsect, lsect;
+ /* Convenient aliases */
+ unsigned int grant_idx = setup->grant_idx;
+ struct blkif_request *ring_req = setup->ring_req;
+ struct blkfront_info *info = setup->info;
+ struct blk_shadow *shadow = &info->shadow[setup->id];
+
+ if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
+ (grant_idx % GRANTS_PER_INDIRECT_FRAME == 0)) {
+ if (setup->segments)
+ kunmap_atomic(setup->segments);
+
+ n = grant_idx / GRANTS_PER_INDIRECT_FRAME;
+ gnt_list_entry = get_indirect_grant(&setup->gref_head, info);
+ shadow->indirect_grants[n] = gnt_list_entry;
+ setup->segments = kmap_atomic(gnt_list_entry->page);
+ ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
+ }
+
+ gnt_list_entry = get_grant(&setup->gref_head, gfn, info);
+ ref = gnt_list_entry->gref;
+ shadow->grants_used[grant_idx] = gnt_list_entry;
+
+ if (setup->need_copy) {
+ void *shared_data;
+
+ shared_data = kmap_atomic(gnt_list_entry->page);
+ /*
+ * this does not wipe data stored outside the
+ * range sg->offset..sg->offset+sg->length.
+ * Therefore, blkback *could* see data from
+ * previous requests. This is OK as long as
+ * persistent grants are shared with just one
+ * domain. It may need refactoring if this
+ * changes
+ */
+ memcpy(shared_data + offset,
+ setup->bvec_data + setup->bvec_off,
+ len);
+
+ kunmap_atomic(shared_data);
+ setup->bvec_off += len;
+ }
+
+ fsect = offset >> 9;
+ lsect = fsect + (len >> 9) - 1;
+ if (ring_req->operation != BLKIF_OP_INDIRECT) {
+ ring_req->u.rw.seg[grant_idx] =
+ (struct blkif_request_segment) {
+ .gref = ref,
+ .first_sect = fsect,
+ .last_sect = lsect };
+ } else {
+ setup->segments[grant_idx % GRANTS_PER_INDIRECT_FRAME] =
+ (struct blkif_request_segment) {
+ .gref = ref,
+ .first_sect = fsect,
+ .last_sect = lsect };
+ }
+
+ (setup->grant_idx)++;
+}
+
static int blkif_queue_rw_req(struct request *req)
{
struct blkfront_info *info = req->rq_disk->private_data;
struct blkif_request *ring_req;
unsigned long id;
- unsigned int fsect, lsect;
- int i, ref, n;
- struct blkif_request_segment *segments = NULL;
+ int i;
+ struct setup_rw_req setup = {
+ .grant_idx = 0,
+ .segments = NULL,
+ .info = info,
+ .need_copy = rq_data_dir(req) && info->feature_persistent,
+ };
/*
* Used to store if we are able to queue the request by just using
@@ -480,25 +585,23 @@ static int blkif_queue_rw_req(struct request *req)
* as there are not sufficiently many free.
*/
bool new_persistent_gnts;
- grant_ref_t gref_head;
- struct grant *gnt_list_entry = NULL;
struct scatterlist *sg;
- int nseg, max_grefs;
+ int num_sg, max_grefs, num_grant;
- max_grefs = req->nr_phys_segments;
+ max_grefs = req->nr_phys_segments * GRANTS_PER_PSEG;
if (max_grefs > BLKIF_MAX_SEGMENTS_PER_REQUEST)
/*
* If we are using indirect segments we need to account
* for the indirect grefs used in the request.
*/
- max_grefs += INDIRECT_GREFS(req->nr_phys_segments);
+ max_grefs += INDIRECT_GREFS(max_grefs);
/* Check if we have enough grants to allocate a requests */
if (info->persistent_gnts_c < max_grefs) {
new_persistent_gnts = 1;
if (gnttab_alloc_grant_references(
max_grefs - info->persistent_gnts_c,
- &gref_head) < 0) {
+ &setup.gref_head) < 0) {
gnttab_request_free_callback(
&info->callback,
blkif_restart_queue_callback,
@@ -515,12 +618,19 @@ static int blkif_queue_rw_req(struct request *req)
info->shadow[id].request = req;
BUG_ON(info->max_indirect_segments == 0 &&
- req->nr_phys_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+ GREFS(req->nr_phys_segments) > BLKIF_MAX_SEGMENTS_PER_REQUEST);
BUG_ON(info->max_indirect_segments &&
- req->nr_phys_segments > info->max_indirect_segments);
- nseg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+ GREFS(req->nr_phys_segments) > info->max_indirect_segments);
+
+ num_sg = blk_rq_map_sg(req->q, req, info->shadow[id].sg);
+ num_grant = 0;
+	/* Calculate the number of grants used */
+ for_each_sg(info->shadow[id].sg, sg, num_sg, i)
+ num_grant += gnttab_count_grant(sg->offset, sg->length);
+
ring_req->u.rw.id = id;
- if (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
+ info->shadow[id].num_sg = num_sg;
+ if (num_grant > BLKIF_MAX_SEGMENTS_PER_REQUEST) {
/*
* The indirect operation can only be a BLKIF_OP_READ or
* BLKIF_OP_WRITE
@@ -531,7 +641,7 @@ static int blkif_queue_rw_req(struct request *req)
BLKIF_OP_WRITE : BLKIF_OP_READ;
ring_req->u.indirect.sector_number = (blkif_sector_t)blk_rq_pos(req);
ring_req->u.indirect.handle = info->handle;
- ring_req->u.indirect.nr_segments = nseg;
+ ring_req->u.indirect.nr_segments = num_grant;
} else {
ring_req->u.rw.sector_number = (blkif_sector_t)blk_rq_pos(req);
ring_req->u.rw.handle = info->handle;
@@ -559,73 +669,30 @@ static int blkif_queue_rw_req(struct request *req)
ring_req->operation = 0;
}
}
- ring_req->u.rw.nr_segments = nseg;
- }
- for_each_sg(info->shadow[id].sg, sg, nseg, i) {
- fsect = sg->offset >> 9;
- lsect = fsect + (sg->length >> 9) - 1;
-
- if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
- (i % SEGS_PER_INDIRECT_FRAME == 0)) {
- if (segments)
- kunmap_atomic(segments);
-
- n = i / SEGS_PER_INDIRECT_FRAME;
- gnt_list_entry = get_indirect_grant(&gref_head, info);
- info->shadow[id].indirect_grants[n] = gnt_list_entry;
- segments = kmap_atomic(gnt_list_entry->page);
- ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
- }
-
- gnt_list_entry = get_grant(&gref_head,
- xen_page_to_gfn(sg_page(sg)),
- info);
- ref = gnt_list_entry->gref;
-
- info->shadow[id].grants_used[i] = gnt_list_entry;
-
- if (rq_data_dir(req) && info->feature_persistent) {
- char *bvec_data;
- void *shared_data;
+ ring_req->u.rw.nr_segments = num_grant;
+ }
- BUG_ON(sg->offset + sg->length > PAGE_SIZE);
+ setup.ring_req = ring_req;
+ setup.id = id;
+ for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
+ BUG_ON(sg->offset + sg->length > PAGE_SIZE);
- shared_data = kmap_atomic(gnt_list_entry->page);
- bvec_data = kmap_atomic(sg_page(sg));
+ if (setup.need_copy) {
+ setup.bvec_off = sg->offset;
+ setup.bvec_data = kmap_atomic(sg_page(sg));
+ }
- /*
- * this does not wipe data stored outside the
- * range sg->offset..sg->offset+sg->length.
- * Therefore, blkback *could* see data from
- * previous requests. This is OK as long as
- * persistent grants are shared with just one
- * domain. It may need refactoring if this
- * changes
- */
- memcpy(shared_data + sg->offset,
- bvec_data + sg->offset,
- sg->length);
+ gnttab_foreach_grant_in_range(sg_page(sg),
+ sg->offset,
+ sg->length,
+ blkif_setup_rw_req_grant,
+ &setup);
- kunmap_atomic(bvec_data);
- kunmap_atomic(shared_data);
- }
- if (ring_req->operation != BLKIF_OP_INDIRECT) {
- ring_req->u.rw.seg[i] =
- (struct blkif_request_segment) {
- .gref = ref,
- .first_sect = fsect,
- .last_sect = lsect };
- } else {
- n = i % SEGS_PER_INDIRECT_FRAME;
- segments[n] =
- (struct blkif_request_segment) {
- .gref = ref,
- .first_sect = fsect,
- .last_sect = lsect };
- }
+ if (setup.need_copy)
+ kunmap_atomic(setup.bvec_data);
}
- if (segments)
- kunmap_atomic(segments);
+ if (setup.segments)
+ kunmap_atomic(setup.segments);
info->ring.req_prod_pvt++;
@@ -633,7 +700,7 @@ static int blkif_queue_rw_req(struct request *req)
info->shadow[id].req = *ring_req;
if (new_persistent_gnts)
- gnttab_free_grant_references(gref_head);
+ gnttab_free_grant_references(setup.gref_head);
return 0;
}
@@ -750,14 +817,14 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
/* Hard sector size and max sectors impersonate the equiv. hardware. */
blk_queue_logical_block_size(rq, sector_size);
blk_queue_physical_block_size(rq, physical_sector_size);
- blk_queue_max_hw_sectors(rq, (segments * PAGE_SIZE) / 512);
+ blk_queue_max_hw_sectors(rq, (segments * XEN_PAGE_SIZE) / 512);
/* Each segment in a request is up to an aligned page in size. */
blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
blk_queue_max_segment_size(rq, PAGE_SIZE);
/* Ensure a merged request will fit in a single I/O ring slot. */
- blk_queue_max_segments(rq, segments);
+ blk_queue_max_segments(rq, segments / GRANTS_PER_PSEG);
/* Make sure buffer addresses are sector-aligned. */
blk_queue_dma_alignment(rq, 511);
@@ -1116,32 +1183,65 @@ free_shadow:
}
+struct copy_from_grant {
+ const struct blk_shadow *s;
+ unsigned int grant_idx;
+ unsigned int bvec_offset;
+ char *bvec_data;
+};
+
+static void blkif_copy_from_grant(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data)
+{
+ struct copy_from_grant *info = data;
+ char *shared_data;
+ /* Convenient aliases */
+ const struct blk_shadow *s = info->s;
+
+ shared_data = kmap_atomic(s->grants_used[info->grant_idx]->page);
+
+ memcpy(info->bvec_data + info->bvec_offset,
+ shared_data + offset, len);
+
+ info->bvec_offset += len;
+ info->grant_idx++;
+
+ kunmap_atomic(shared_data);
+}
+
static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
struct blkif_response *bret)
{
int i = 0;
struct scatterlist *sg;
- char *bvec_data;
- void *shared_data;
- int nseg;
+ int num_sg, num_grant;
+ struct copy_from_grant data = {
+ .s = s,
+ .grant_idx = 0,
+ };
- nseg = s->req.operation == BLKIF_OP_INDIRECT ?
+ num_grant = s->req.operation == BLKIF_OP_INDIRECT ?
s->req.u.indirect.nr_segments : s->req.u.rw.nr_segments;
+ num_sg = s->num_sg;
if (bret->operation == BLKIF_OP_READ && info->feature_persistent) {
- for_each_sg(s->sg, sg, nseg, i) {
+ for_each_sg(s->sg, sg, num_sg, i) {
BUG_ON(sg->offset + sg->length > PAGE_SIZE);
- shared_data = kmap_atomic(s->grants_used[i]->page);
- bvec_data = kmap_atomic(sg_page(sg));
- memcpy(bvec_data + sg->offset,
- shared_data + sg->offset,
- sg->length);
- kunmap_atomic(bvec_data);
- kunmap_atomic(shared_data);
+
+ data.bvec_offset = sg->offset;
+ data.bvec_data = kmap_atomic(sg_page(sg));
+
+ gnttab_foreach_grant_in_range(sg_page(sg),
+ sg->offset,
+ sg->length,
+ blkif_copy_from_grant,
+ &data);
+
+ kunmap_atomic(data.bvec_data);
}
}
/* Add the persistent grant into the list of free grants */
- for (i = 0; i < nseg; i++) {
+ for (i = 0; i < num_grant; i++) {
if (gnttab_query_foreign_access(s->grants_used[i]->gref)) {
/*
* If the grant is still mapped by the backend (the
@@ -1167,7 +1267,7 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
}
}
if (s->req.operation == BLKIF_OP_INDIRECT) {
- for (i = 0; i < INDIRECT_GREFS(nseg); i++) {
+ for (i = 0; i < INDIRECT_GREFS(num_grant); i++) {
if (gnttab_query_foreign_access(s->indirect_grants[i]->gref)) {
if (!info->feature_persistent)
pr_alert_ratelimited("backed has not unmapped grant: %u\n",
@@ -1309,7 +1409,7 @@ static int setup_blkring(struct xenbus_device *dev,
{
struct blkif_sring *sring;
int err, i;
- unsigned long ring_size = info->nr_ring_pages * PAGE_SIZE;
+ unsigned long ring_size = info->nr_ring_pages * XEN_PAGE_SIZE;
grant_ref_t gref[XENBUS_MAX_RING_PAGES];
for (i = 0; i < info->nr_ring_pages; i++)
@@ -1640,8 +1740,8 @@ static int blkif_recover(struct blkfront_info *info)
atomic_set(&split_bio->pending, pending);
split_bio->bio = bio;
for (i = 0; i < pending; i++) {
- offset = (i * segs * PAGE_SIZE) >> 9;
- size = min((unsigned int)(segs * PAGE_SIZE) >> 9,
+ offset = (i * segs * XEN_PAGE_SIZE) >> 9;
+ size = min((unsigned int)(segs * XEN_PAGE_SIZE) >> 9,
(unsigned int)bio_sectors(bio) - offset);
cloned_bio = bio_clone(bio, GFP_NOIO);
BUG_ON(cloned_bio == NULL);
@@ -1752,7 +1852,7 @@ static void blkfront_setup_discard(struct blkfront_info *info)
static int blkfront_setup_indirect(struct blkfront_info *info)
{
- unsigned int indirect_segments, segs;
+ unsigned int indirect_segments, psegs, grants;
int err, i;
err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
@@ -1760,14 +1860,16 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
NULL);
if (err) {
info->max_indirect_segments = 0;
- segs = BLKIF_MAX_SEGMENTS_PER_REQUEST;
+ grants = BLKIF_MAX_SEGMENTS_PER_REQUEST;
} else {
info->max_indirect_segments = min(indirect_segments,
xen_blkif_max_segments);
- segs = info->max_indirect_segments;
+ grants = info->max_indirect_segments;
}
+ psegs = grants / GRANTS_PER_PSEG;
- err = fill_grant_buffer(info, (segs + INDIRECT_GREFS(segs)) * BLK_RING_SIZE(info));
+ err = fill_grant_buffer(info,
+ (grants + INDIRECT_GREFS(grants)) * BLK_RING_SIZE(info));
if (err)
goto out_of_memory;
@@ -1777,7 +1879,7 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
* grants, we need to allocate a set of pages that can be
* used for mapping indirect grefs
*/
- int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE(info);
+ int num = INDIRECT_GREFS(grants) * BLK_RING_SIZE(info);
BUG_ON(!list_empty(&info->indirect_pages));
for (i = 0; i < num; i++) {
@@ -1790,20 +1892,20 @@ static int blkfront_setup_indirect(struct blkfront_info *info)
for (i = 0; i < BLK_RING_SIZE(info); i++) {
info->shadow[i].grants_used = kzalloc(
- sizeof(info->shadow[i].grants_used[0]) * segs,
+ sizeof(info->shadow[i].grants_used[0]) * grants,
GFP_NOIO);
- info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * segs, GFP_NOIO);
+ info->shadow[i].sg = kzalloc(sizeof(info->shadow[i].sg[0]) * psegs, GFP_NOIO);
if (info->max_indirect_segments)
info->shadow[i].indirect_grants = kzalloc(
sizeof(info->shadow[i].indirect_grants[0]) *
- INDIRECT_GREFS(segs),
+ INDIRECT_GREFS(grants),
GFP_NOIO);
if ((info->shadow[i].grants_used == NULL) ||
(info->shadow[i].sg == NULL) ||
(info->max_indirect_segments &&
(info->shadow[i].indirect_grants == NULL)))
goto out_of_memory;
- sg_init_table(info->shadow[i].sg, segs);
+ sg_init_table(info->shadow[i].sg, psegs);
}
--
2.1.4
The PV block protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity to behave as a
block backend on a non-modified Xen.
It's only necessary to adapt the ring size and the number of requests per
indirect frame. The rest of the code is relying on the grant table
code.
Note that the grant table code is allocating a Linux page per grant,
which will result in wasting 60KB for every grant when Linux is using 64KB
page granularity. This could be improved by sharing the page between
multiple grants.
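As a rough, standalone sketch (not part of the patch), this is how the
indirect-frame limits scale once the segment maths is expressed in Xen pages.
The Linux page size and the 8-byte segment size are assumptions; the macro
names mirror the ones added to common.h:
#include <stdio.h>

#define XEN_PAGE_SIZE 4096UL
#define PAGE_SIZE     65536UL   /* assumed Linux page granularity */
#define SEG_SIZE      8UL       /* assumed sizeof(struct blkif_request_segment) */

#define XEN_PAGES_PER_SEGMENT        (PAGE_SIZE / XEN_PAGE_SIZE)
#define XEN_PAGES_PER_INDIRECT_FRAME (XEN_PAGE_SIZE / SEG_SIZE)
#define SEGS_PER_INDIRECT_FRAME \
	(XEN_PAGES_PER_INDIRECT_FRAME / XEN_PAGES_PER_SEGMENT)

int main(void)
{
	printf("Xen pages per segment:            %lu\n", XEN_PAGES_PER_SEGMENT);
	printf("grant entries per indirect frame: %lu\n", XEN_PAGES_PER_INDIRECT_FRAME);
	printf("segments per indirect frame:      %lu\n", SEGS_PER_INDIRECT_FRAME);
	return 0;
}
With the assumed sizes, one 4KB indirect frame still holds 512 grant entries,
but only 32 of the backend's 64KB segments.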
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: "Roger Pau Monné" <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Improvements such as support of 64KB grants are not taken into
consideration in this patch because we have the requirement to run a
Linux using 64KB pages on a non-modified Xen.
This has been tested only with a loop device. I plan to test passing a
hard drive partition but I haven't converted the swiotlb code yet.
Changes in v3:
- Use DIV_ROUND_UP in INDIRECT_PAGES to avoid a line over 80
characters
---
drivers/block/xen-blkback/blkback.c | 5 +++--
drivers/block/xen-blkback/common.h | 17 +++++++++++++----
drivers/block/xen-blkback/xenbus.c | 9 ++++++---
3 files changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index ced9677..d5cce8c 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -961,7 +961,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
seg[n].nsec = segments[i].last_sect -
segments[i].first_sect + 1;
seg[n].offset = (segments[i].first_sect << 9);
- if ((segments[i].last_sect >= (PAGE_SIZE >> 9)) ||
+ if ((segments[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
(segments[i].last_sect < segments[i].first_sect)) {
rc = -EINVAL;
goto unmap;
@@ -1210,6 +1210,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
req_operation = req->operation == BLKIF_OP_INDIRECT ?
req->u.indirect.indirect_op : req->operation;
+
if ((req->operation == BLKIF_OP_INDIRECT) &&
(req_operation != BLKIF_OP_READ) &&
(req_operation != BLKIF_OP_WRITE)) {
@@ -1268,7 +1269,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
seg[i].nsec = req->u.rw.seg[i].last_sect -
req->u.rw.seg[i].first_sect + 1;
seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
- if ((req->u.rw.seg[i].last_sect >= (PAGE_SIZE >> 9)) ||
+ if ((req->u.rw.seg[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
(req->u.rw.seg[i].last_sect <
req->u.rw.seg[i].first_sect))
goto fail_response;
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 45a044a..68e87a0 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -39,6 +39,7 @@
#include <asm/pgalloc.h>
#include <asm/hypervisor.h>
#include <xen/grant_table.h>
+#include <xen/page.h>
#include <xen/xenbus.h>
#include <xen/interface/io/ring.h>
#include <xen/interface/io/blkif.h>
@@ -51,12 +52,20 @@ extern unsigned int xen_blkif_max_ring_order;
*/
#define MAX_INDIRECT_SEGMENTS 256
-#define SEGS_PER_INDIRECT_FRAME \
- (PAGE_SIZE/sizeof(struct blkif_request_segment))
+/*
+ * Xen uses 4K pages. The guest may use a different page size (4K or 64K).
+ * Number of Xen pages per segment
+ */
+#define XEN_PAGES_PER_SEGMENT (PAGE_SIZE / XEN_PAGE_SIZE)
+
+#define XEN_PAGES_PER_INDIRECT_FRAME \
+ (XEN_PAGE_SIZE/sizeof(struct blkif_request_segment))
+#define SEGS_PER_INDIRECT_FRAME \
+ (XEN_PAGES_PER_INDIRECT_FRAME / XEN_PAGES_PER_SEGMENT)
+
#define MAX_INDIRECT_PAGES \
((MAX_INDIRECT_SEGMENTS + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
-#define INDIRECT_PAGES(_segs) \
- ((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
+#define INDIRECT_PAGES(_segs) DIV_ROUND_UP(_segs, XEN_PAGES_PER_INDIRECT_FRAME)
/* Not a real protocol. Used to generate ring structs which contain
* the elements common to all protocols only. This way we get a
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index deb3f00..edd27e4 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -176,21 +176,24 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
{
struct blkif_sring *sring;
sring = (struct blkif_sring *)blkif->blk_ring;
- BACK_RING_INIT(&blkif->blk_rings.native, sring, PAGE_SIZE * nr_grefs);
+ BACK_RING_INIT(&blkif->blk_rings.native, sring,
+ XEN_PAGE_SIZE * nr_grefs);
break;
}
case BLKIF_PROTOCOL_X86_32:
{
struct blkif_x86_32_sring *sring_x86_32;
sring_x86_32 = (struct blkif_x86_32_sring *)blkif->blk_ring;
- BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32, PAGE_SIZE * nr_grefs);
+ BACK_RING_INIT(&blkif->blk_rings.x86_32, sring_x86_32,
+ XEN_PAGE_SIZE * nr_grefs);
break;
}
case BLKIF_PROTOCOL_X86_64:
{
struct blkif_x86_64_sring *sring_x86_64;
sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring;
- BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE * nr_grefs);
+ BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64,
+ XEN_PAGE_SIZE * nr_grefs);
break;
}
default:
--
2.1.4
The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity to use network
devices on a non-modified Xen.
It's only necessary to adapt the ring size and break skb data into small
chunks of 4KB. The rest of the code is relying on the grant table code.
Note that we allocate a Linux page for each rx skb but only the first
4KB is used. We may improve the memory usage by extending the size of
the rx skb.
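To make the 4KB chunking concrete, here is a minimal standalone sketch (not
the kernel code) of the walk performed by gnttab_foreach_grant_in_range over
a buffer living in a 64KB page. The helper name, offsets and lengths below
are illustrative assumptions:
#include <stdio.h>

#define XEN_PAGE_SIZE 4096UL

typedef void (*grant_fn_t)(unsigned long offset, unsigned long len, void *data);

/* Walk [offset, offset + len) in grant-sized (4KB) chunks, calling fn once
 * per chunk, in the spirit of gnttab_foreach_grant_in_range(). */
static void for_each_grant_in_range(unsigned long offset, unsigned long len,
				    grant_fn_t fn, void *data)
{
	while (len) {
		unsigned long chunk = XEN_PAGE_SIZE - (offset & (XEN_PAGE_SIZE - 1));

		if (chunk > len)
			chunk = len;
		fn(offset, chunk, data);
		offset += chunk;
		len -= chunk;
	}
}

static void count_grant(unsigned long offset, unsigned long len, void *data)
{
	printf("grant chunk at offset %5lu, len %4lu\n", offset, len);
	(*(unsigned int *)data)++;
}

int main(void)
{
	unsigned int slots = 0;

	/* e.g. 9000 bytes of skb linear data starting at offset 100 of a page */
	for_each_grant_in_range(100, 9000, count_grant, &slots);
	printf("tx slots needed: %u\n", slots);
	return 0;
}
The callback is invoked once per 4KB chunk, which is why the tx path now
builds one request per grant rather than one per Linux page.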
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: [email protected]
Improvements such as support of 64KB grants are not taken into
consideration in this patch because we have the requirement to run a Linux
using 64KB pages on a non-modified Xen.
Tested with workloads such as ping, ssh, wget, git... I would be happy if
someone could give details on how to test all the paths.
Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- xennet_tx_setup_grant was calling itself, resulting in a
guest stall when using iperf.
- The grant callback no longer allows changing the len
(it wasn't used here)
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- gnttab_page_grant_foreign_ref has been renamed to
gnttab_page_grant_foreign_access_ref_one
Changes in v2:
- Use gnttab_foreach_grant to split a Linux page into grants
- Fix count slots
---
drivers/net/xen-netfront.c | 122 ++++++++++++++++++++++++++++++++-------------
1 file changed, 86 insertions(+), 36 deletions(-)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 47f791e..204ffb8 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -74,8 +74,8 @@ struct netfront_cb {
#define GRANT_INVALID_REF 0
-#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
/* Minimum number of Rx slots (includes slot for GSO metadata). */
#define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
@@ -291,7 +291,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
struct sk_buff *skb;
unsigned short id;
grant_ref_t ref;
- unsigned long gfn;
+ struct page *page;
struct xen_netif_rx_request *req;
skb = xennet_alloc_one_rx_buffer(queue);
@@ -307,14 +307,13 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
BUG_ON((signed short)ref < 0);
queue->grant_rx_ref[id] = ref;
- gfn = xen_page_to_gfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+ page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
req = RING_GET_REQUEST(&queue->rx, req_prod);
- gnttab_grant_foreign_access_ref(ref,
- queue->info->xbdev->otherend_id,
- gfn,
- 0);
-
+ gnttab_page_grant_foreign_access_ref_one(ref,
+ queue->info->xbdev->otherend_id,
+ page,
+ 0);
req->id = id;
req->gref = ref;
}
@@ -415,25 +414,33 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
xennet_maybe_wake_tx(queue);
}
-static struct xen_netif_tx_request *xennet_make_one_txreq(
- struct netfront_queue *queue, struct sk_buff *skb,
- struct page *page, unsigned int offset, unsigned int len)
+struct xennet_gnttab_make_txreq {
+ struct netfront_queue *queue;
+ struct sk_buff *skb;
+ struct page *page;
+ struct xen_netif_tx_request *tx; /* Last request */
+ unsigned int size;
+};
+
+static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data)
{
+ struct xennet_gnttab_make_txreq *info = data;
unsigned int id;
struct xen_netif_tx_request *tx;
grant_ref_t ref;
-
- len = min_t(unsigned int, PAGE_SIZE - offset, len);
+ /* convenient aliases */
+ struct page *page = info->page;
+ struct netfront_queue *queue = info->queue;
+ struct sk_buff *skb = info->skb;
id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
BUG_ON((signed short)ref < 0);
- gnttab_grant_foreign_access_ref(ref,
- queue->info->xbdev->otherend_id,
- xen_page_to_gfn(page),
- GNTMAP_readonly);
+ gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
+ gfn, GNTMAP_readonly);
queue->tx_skbs[id].skb = skb;
queue->grant_tx_page[id] = page;
@@ -445,7 +452,34 @@ static struct xen_netif_tx_request *xennet_make_one_txreq(
tx->size = len;
tx->flags = 0;
- return tx;
+ info->tx = tx;
+ info->size += tx->size;
+}
+
+static struct xen_netif_tx_request *xennet_make_first_txreq(
+ struct netfront_queue *queue, struct sk_buff *skb,
+ struct page *page, unsigned int offset, unsigned int len)
+{
+ struct xennet_gnttab_make_txreq info = {
+ .queue = queue,
+ .skb = skb,
+ .page = page,
+ .size = 0,
+ };
+
+ gnttab_one_grant(page, offset, len, xennet_tx_setup_grant, &info);
+
+ return info.tx;
+}
+
+static void xennet_make_one_txreq(unsigned long gfn, unsigned int offset,
+ unsigned int len, void *data)
+{
+ struct xennet_gnttab_make_txreq *info = data;
+
+ info->tx->flags |= XEN_NETTXF_more_data;
+ skb_get(info->skb);
+ xennet_tx_setup_grant(gfn, offset, len, data);
}
static struct xen_netif_tx_request *xennet_make_txreqs(
@@ -453,20 +487,30 @@ static struct xen_netif_tx_request *xennet_make_txreqs(
struct sk_buff *skb, struct page *page,
unsigned int offset, unsigned int len)
{
+ struct xennet_gnttab_make_txreq info = {
+ .queue = queue,
+ .skb = skb,
+ .tx = tx,
+ };
+
/* Skip unused frames from start of page */
page += offset >> PAGE_SHIFT;
offset &= ~PAGE_MASK;
while (len) {
- tx->flags |= XEN_NETTXF_more_data;
- tx = xennet_make_one_txreq(queue, skb_get(skb),
- page, offset, len);
+ info.page = page;
+ info.size = 0;
+
+ gnttab_foreach_grant_in_range(page, offset, len,
+ xennet_make_one_txreq,
+ &info);
+
page++;
offset = 0;
- len -= tx->size;
+ len -= info.size;
}
- return tx;
+ return info.tx;
}
/*
@@ -476,9 +520,10 @@ static struct xen_netif_tx_request *xennet_make_txreqs(
static int xennet_count_skb_slots(struct sk_buff *skb)
{
int i, frags = skb_shinfo(skb)->nr_frags;
- int pages;
+ int slots;
- pages = PFN_UP(offset_in_page(skb->data) + skb_headlen(skb));
+ slots = gnttab_count_grant(offset_in_page(skb->data),
+ skb_headlen(skb));
for (i = 0; i < frags; i++) {
skb_frag_t *frag = skb_shinfo(skb)->frags + i;
@@ -488,10 +533,10 @@ static int xennet_count_skb_slots(struct sk_buff *skb)
/* Skip unused frames from start of page */
offset &= ~PAGE_MASK;
- pages += PFN_UP(offset + size);
+ slots += gnttab_count_grant(offset, size);
}
- return pages;
+ return slots;
}
static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb,
@@ -512,6 +557,8 @@ static u16 xennet_select_queue(struct net_device *dev, struct sk_buff *skb,
return queue_idx;
}
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct netfront_info *np = netdev_priv(dev);
@@ -546,7 +593,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
slots = xennet_count_skb_slots(skb);
- if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+ if (unlikely(slots > MAX_XEN_SKB_FRAGS + 1)) {
net_dbg_ratelimited("xennet: skb rides the rocket: %d slots, %d bytes\n",
slots, skb->len);
if (skb_linearize(skb))
@@ -567,10 +614,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
/* First request for the linear area. */
- first_tx = tx = xennet_make_one_txreq(queue, skb,
- page, offset, len);
- page++;
- offset = 0;
+ first_tx = tx = xennet_make_first_txreq(queue, skb,
+ page, offset, len);
+ offset += tx->size;
+ if (offset == PAGE_SIZE) {
+ page++;
+ offset = 0;
+ }
len -= tx->size;
if (skb->ip_summed == CHECKSUM_PARTIAL)
@@ -732,7 +782,7 @@ static int xennet_get_responses(struct netfront_queue *queue,
for (;;) {
if (unlikely(rx->status < 0 ||
- rx->offset + rx->status > PAGE_SIZE)) {
+ rx->offset + rx->status > XEN_PAGE_SIZE)) {
if (net_ratelimit())
dev_warn(dev, "rx->offset: %u, size: %d\n",
rx->offset, rx->status);
@@ -1495,7 +1545,7 @@ static int setup_netfront(struct xenbus_device *dev,
goto fail;
}
SHARED_RING_INIT(txs);
- FRONT_RING_INIT(&queue->tx, txs, PAGE_SIZE);
+ FRONT_RING_INIT(&queue->tx, txs, XEN_PAGE_SIZE);
err = xenbus_grant_ring(dev, txs, 1, &gref);
if (err < 0)
@@ -1509,7 +1559,7 @@ static int setup_netfront(struct xenbus_device *dev,
goto alloc_rx_ring_fail;
}
SHARED_RING_INIT(rxs);
- FRONT_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
+ FRONT_RING_INIT(&queue->rx, rxs, XEN_PAGE_SIZE);
err = xenbus_grant_ring(dev, rxs, 1, &gref);
if (err < 0)
--
2.1.4
The PV network protocol is using 4KB page granularity. The goal of this
patch is to allow a Linux using 64KB page granularity to work as a
network backend on a non-modified Xen.
It's only necessary to adapt the ring size and break skb data into small
chunks of 4KB. The rest of the code is relying on the grant table code.
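A minimal standalone sketch (not part of the patch) of the worst-case slot
and copy-op arithmetic once the limits are based on the 4KB grant size. The
ring size value is an assumption used only for illustration:
#include <stdio.h>

#define XEN_PAGE_SIZE          4096UL
#define MAX_XEN_SKB_FRAGS      (65536UL / XEN_PAGE_SIZE + 1)
#define XEN_NETBK_RX_SLOTS_MAX (MAX_XEN_SKB_FRAGS + 1)
#define XEN_NETIF_RX_RING_SIZE 256UL   /* assumed __CONST_RING_SIZE(...) result */

int main(void)
{
	printf("max 4KB frags per 64KB of payload: %lu\n", MAX_XEN_SKB_FRAGS);
	printf("worst-case RX slots per packet:    %lu\n", XEN_NETBK_RX_SLOTS_MAX);
	printf("worst-case grant copy ops:         %lu\n",
	       MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE);
	return 0;
}
This is why MAX_SKB_FRAGS is replaced by MAX_XEN_SKB_FRAGS in the sizing
macros: the worst case is now counted in grants, not Linux-sized frags.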
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Ian Campbell <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: [email protected]
Improvements such as support of 64KB grants are not taken into
consideration in this patch because we have the requirement to run a
Linux using 64KB pages on a non-modified Xen.
Changes in v3:
- Fix errors reported by checkpatch.pl
- s/mfn/gfn/ based on the new naming
- gnttab_foreach_grant has been renamed to gnttab_foreach_grant_in_range
- The grant callback no longer allows using less data. A
helper has been added in netback to handle this.
Changes in v2:
- Correctly set MAX_GRANT_COPY_OPS and XEN_NETBK_RX_SLOTS_MAX
- Don't use XEN_PAGE_SIZE in handle_frag_list as we coalesce
fragment into a new skb
- Use gnttab_foreach_grant to split a Linux page into grants
---
drivers/net/xen-netback/common.h | 15 ++--
drivers/net/xen-netback/netback.c | 153 ++++++++++++++++++++++++--------------
2 files changed, 107 insertions(+), 61 deletions(-)
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..bb68211 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -44,6 +44,7 @@
#include <xen/interface/grant_table.h>
#include <xen/grant_table.h>
#include <xen/xenbus.h>
+#include <xen/page.h>
#include <linux/debugfs.h>
typedef unsigned int pending_ring_idx_t;
@@ -64,8 +65,8 @@ struct pending_tx_info {
struct ubuf_info callback_struct;
};
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
struct xenvif_rx_meta {
int id;
@@ -80,16 +81,18 @@ struct xenvif_rx_meta {
/* Discriminate from any valid pending_idx value. */
#define INVALID_PENDING_IDX 0xFFFF
-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
#define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+
/* It's possible for an skb to have a maximal number of frags
* but still be less than MAX_BUFFER_OFFSET in size. Thus the
- * worst-case number of copy operations is MAX_SKB_FRAGS per
+ * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
* ring slot.
*/
-#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
+#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
#define NETBACK_INVALID_HANDLE -1
@@ -203,7 +206,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
/* Maximum number of Rx slots a to-guest packet may use, including the
* slot needed for GSO meta-data.
*/
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
+#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
enum state_bit_shift {
/* This bit marks that the vif is connected */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 66f1780..c32a9f2 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
return meta;
}
+struct gop_frag_copy {
+ struct xenvif_queue *queue;
+ struct netrx_pending_operations *npo;
+ struct xenvif_rx_meta *meta;
+ int head;
+ int gso_type;
+
+ struct page *page;
+};
+
+static void xenvif_setup_copy_gop(unsigned long gfn,
+ unsigned int offset,
+ unsigned int *len,
+ struct gop_frag_copy *info)
+{
+ struct gnttab_copy *copy_gop;
+ struct xen_page_foreign *foreign;
+ /* Convenient aliases */
+ struct xenvif_queue *queue = info->queue;
+ struct netrx_pending_operations *npo = info->npo;
+ struct page *page = info->page;
+
+ BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+ if (npo->copy_off == MAX_BUFFER_OFFSET)
+ info->meta = get_next_rx_buffer(queue, npo);
+
+ if (npo->copy_off + *len > MAX_BUFFER_OFFSET)
+ *len = MAX_BUFFER_OFFSET - npo->copy_off;
+
+ copy_gop = npo->copy + npo->copy_prod++;
+ copy_gop->flags = GNTCOPY_dest_gref;
+ copy_gop->len = *len;
+
+ foreign = xen_page_foreign(page);
+ if (foreign) {
+ copy_gop->source.domid = foreign->domid;
+ copy_gop->source.u.ref = foreign->gref;
+ copy_gop->flags |= GNTCOPY_source_gref;
+ } else {
+ copy_gop->source.domid = DOMID_SELF;
+ copy_gop->source.u.gmfn = gfn;
+ }
+ copy_gop->source.offset = offset;
+
+ copy_gop->dest.domid = queue->vif->domid;
+ copy_gop->dest.offset = npo->copy_off;
+ copy_gop->dest.u.ref = npo->copy_gref;
+
+ npo->copy_off += *len;
+ info->meta->size += *len;
+
+ /* Leave a gap for the GSO descriptor. */
+ if (info->head && ((1 << info->gso_type) & queue->vif->gso_mask))
+ queue->rx.req_cons++;
+
+ info->head = 0; /* There must be something in this buffer now */
+}
+
+static void xenvif_gop_frag_copy_grant(unsigned long gfn,
+ unsigned offset,
+ unsigned int len,
+ void *data)
+{
+ unsigned int bytes;
+
+ while (len) {
+ bytes = len;
+ xenvif_setup_copy_gop(gfn, offset, &bytes, data);
+ offset += bytes;
+ len -= bytes;
+ }
+}
+
/*
* Set up the grant operations for this fragment. If it's a flipping
* interface, we also set up the unmap request from here.
@@ -272,83 +346,52 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
struct page *page, unsigned long size,
unsigned long offset, int *head)
{
- struct gnttab_copy *copy_gop;
- struct xenvif_rx_meta *meta;
+ struct gop_frag_copy info = {
+ .queue = queue,
+ .npo = npo,
+ .head = *head,
+ .gso_type = XEN_NETIF_GSO_TYPE_NONE,
+ };
unsigned long bytes;
- int gso_type = XEN_NETIF_GSO_TYPE_NONE;
if (skb_is_gso(skb)) {
if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
- gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+ info.gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
- gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+ info.gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
}
/* Data must not cross a page boundary. */
BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
- meta = npo->meta + npo->meta_prod - 1;
+ info.meta = npo->meta + npo->meta_prod - 1;
/* Skip unused frames from start of page */
page += offset >> PAGE_SHIFT;
offset &= ~PAGE_MASK;
while (size > 0) {
- struct xen_page_foreign *foreign;
-
BUG_ON(offset >= PAGE_SIZE);
- BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
-
- if (npo->copy_off == MAX_BUFFER_OFFSET)
- meta = get_next_rx_buffer(queue, npo);
bytes = PAGE_SIZE - offset;
if (bytes > size)
bytes = size;
- if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
- bytes = MAX_BUFFER_OFFSET - npo->copy_off;
-
- copy_gop = npo->copy + npo->copy_prod++;
- copy_gop->flags = GNTCOPY_dest_gref;
- copy_gop->len = bytes;
-
- foreign = xen_page_foreign(page);
- if (foreign) {
- copy_gop->source.domid = foreign->domid;
- copy_gop->source.u.ref = foreign->gref;
- copy_gop->flags |= GNTCOPY_source_gref;
- } else {
- copy_gop->source.domid = DOMID_SELF;
- copy_gop->source.u.gmfn =
- virt_to_gfn(page_address(page));
- }
- copy_gop->source.offset = offset;
-
- copy_gop->dest.domid = queue->vif->domid;
- copy_gop->dest.offset = npo->copy_off;
- copy_gop->dest.u.ref = npo->copy_gref;
-
- npo->copy_off += bytes;
- meta->size += bytes;
-
- offset += bytes;
+ info.page = page;
+ gnttab_foreach_grant_in_range(page, offset, bytes,
+ xenvif_gop_frag_copy_grant,
+ &info);
size -= bytes;
+ offset = 0;
- /* Next frame */
- if (offset == PAGE_SIZE && size) {
+ /* Next page */
+ if (size) {
BUG_ON(!PageCompound(page));
page++;
- offset = 0;
}
-
- /* Leave a gap for the GSO descriptor. */
- if (*head && ((1 << gso_type) & queue->vif->gso_mask))
- queue->rx.req_cons++;
-
- *head = 0; /* There must be something in this buffer now. */
-
}
+
+ *head = info.head;
}
/*
@@ -747,7 +790,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
first->size -= txp->size;
slots++;
- if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
+ if (unlikely((txp->offset + txp->size) > XEN_PAGE_SIZE)) {
netdev_err(queue->vif->dev, "Cross page boundary, txp->offset: %u, size: %u\n",
txp->offset, txp->size);
xenvif_fatal_tx_err(queue->vif);
@@ -1241,11 +1284,11 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
}
/* No crossing a page as the payload mustn't fragment. */
- if (unlikely((txreq.offset + txreq.size) > PAGE_SIZE)) {
+ if (unlikely((txreq.offset + txreq.size) > XEN_PAGE_SIZE)) {
netdev_err(queue->vif->dev,
"txreq.offset: %u, size: %u, end: %lu\n",
txreq.offset, txreq.size,
- (unsigned long)(txreq.offset&~PAGE_MASK) + txreq.size);
+ (unsigned long)(txreq.offset&~XEN_PAGE_MASK) + txreq.size);
xenvif_fatal_tx_err(queue->vif);
break;
}
@@ -1287,7 +1330,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
virt_to_gfn(skb->data);
queue->tx_copy_ops[*copy_ops].dest.domid = DOMID_SELF;
queue->tx_copy_ops[*copy_ops].dest.offset =
- offset_in_page(skb->data);
+ offset_in_page(skb->data) & ~XEN_PAGE_MASK;
queue->tx_copy_ops[*copy_ops].len = data_len;
queue->tx_copy_ops[*copy_ops].flags = GNTCOPY_source_gref;
@@ -1780,7 +1823,7 @@ int xenvif_map_frontend_rings(struct xenvif_queue *queue,
goto err;
txs = (struct xen_netif_tx_sring *)addr;
- BACK_RING_INIT(&queue->tx, txs, PAGE_SIZE);
+ BACK_RING_INIT(&queue->tx, txs, XEN_PAGE_SIZE);
err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(queue->vif),
&rx_ring_ref, 1, &addr);
@@ -1788,7 +1831,7 @@ int xenvif_map_frontend_rings(struct xenvif_queue *queue,
goto err;
rxs = (struct xen_netif_rx_sring *)addr;
- BACK_RING_INIT(&queue->rx, rxs, PAGE_SIZE);
+ BACK_RING_INIT(&queue->rx, rxs, XEN_PAGE_SIZE);
return 0;
--
2.1.4
The hypercall interface (as well as the toolstack) is always using 4KB
page granularity. When the toolstack asks to map a series of
guest PFNs in a batch, it expects to have the pages mapped contiguously in
its virtual memory.
When Linux is using 64KB page granularity, the privcmd driver will have
to map multiple Xen PFNs in a single Linux page.
Note that this solution works for any page granularity which is a multiple
of 4KB.
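A minimal standalone sketch (not part of the patch) of the round-up performed
when several 4KB Xen GFNs have to share one 64KB Linux page. The page sizes
and the GFN count are assumptions for the example:
#include <stdio.h>

#define XEN_PAGE_SIZE    4096UL
#define PAGE_SIZE        65536UL                 /* assumed Linux page size */
#define XEN_PFN_PER_PAGE (PAGE_SIZE / XEN_PAGE_SIZE)
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long num_gfns = 100;  /* e.g. m.num from the ioctl */

	printf("Xen PFNs per Linux page: %lu\n", XEN_PFN_PER_PAGE);
	printf("Linux pages to back %lu GFNs: %lu\n",
	       num_gfns, DIV_ROUND_UP(num_gfns, XEN_PFN_PER_PAGE));
	return 0;
}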
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: David Vrabel <[email protected]>
I kept the hypercall arguments in remap_data to avoid allocating them on
the stack every time remap_pte_fn is called.
I will keep it like that unless someone strongly disagrees.
Changes in v3:
- The function to split a Linux page into multiple Xen pages has
been moved internally. It was the only user (not used anymore in
the balloon) and it's not quite clear what the common
interface should be. Defer the question until someone needs to use it.
- s/nr_pfn/numgfns/ to make clear that we are dealing with GFNs
- Use DIV_ROUND_UP rather than round_up and fix the usage in
xen_xlate_unmap_gfn_range
Changes in v2:
- Use xen_apply_to_page
---
drivers/xen/privcmd.c | 8 ++--
drivers/xen/xlate_mmu.c | 124 ++++++++++++++++++++++++++++++++----------------
2 files changed, 89 insertions(+), 43 deletions(-)
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index c6deb87..c8798ee 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -446,7 +446,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
return -EINVAL;
}
- nr_pages = m.num;
+ nr_pages = DIV_ROUND_UP(m.num, XEN_PFN_PER_PAGE);
if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
return -EINVAL;
@@ -494,7 +494,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
goto out_unlock;
}
if (xen_feature(XENFEAT_auto_translated_physmap)) {
- ret = alloc_empty_pages(vma, m.num);
+ ret = alloc_empty_pages(vma, nr_pages);
if (ret < 0)
goto out_unlock;
} else
@@ -518,6 +518,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
state.global_error = 0;
state.version = version;
+ BUILD_BUG_ON(((PAGE_SIZE / sizeof(xen_pfn_t)) % XEN_PFN_PER_PAGE) != 0);
/* mmap_batch_fn guarantees ret == 0 */
BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
&pagelist, mmap_batch_fn, &state));
@@ -582,12 +583,13 @@ static void privcmd_close(struct vm_area_struct *vma)
{
struct page **pages = vma->vm_private_data;
int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+ int numgfns = (vma->vm_end - vma->vm_start) >> XEN_PAGE_SHIFT;
int rc;
if (!xen_feature(XENFEAT_auto_translated_physmap) || !numpgs || !pages)
return;
- rc = xen_unmap_domain_gfn_range(vma, numpgs, pages);
+ rc = xen_unmap_domain_gfn_range(vma, numgfns, pages);
if (rc == 0)
free_xenballooned_pages(numpgs, pages);
else
diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index cff2387..a1d3904 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -38,31 +38,28 @@
#include <xen/interface/xen.h>
#include <xen/interface/memory.h>
-/* map fgfn of domid to lpfn in the current domain */
-static int map_foreign_page(unsigned long lpfn, unsigned long fgfn,
- unsigned int domid)
-{
- int rc;
- struct xen_add_to_physmap_range xatp = {
- .domid = DOMID_SELF,
- .foreign_domid = domid,
- .size = 1,
- .space = XENMAPSPACE_gmfn_foreign,
- };
- xen_ulong_t idx = fgfn;
- xen_pfn_t gpfn = lpfn;
- int err = 0;
+typedef void (*xen_gfn_fn_t)(unsigned long gfn, void *data);
- set_xen_guest_handle(xatp.idxs, &idx);
- set_xen_guest_handle(xatp.gpfns, &gpfn);
- set_xen_guest_handle(xatp.errs, &err);
+/* Break down the pages into 4KB chunks and call fn for each gfn */
+static void xen_for_each_gfn(struct page **pages, unsigned nr_gfn,
+ xen_gfn_fn_t fn, void *data)
+{
+ unsigned long xen_pfn = 0;
+ struct page *page;
+ int i;
- rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
- return rc < 0 ? rc : err;
+ for (i = 0; i < nr_gfn; i++) {
+ if ((i % XEN_PFN_PER_PAGE) == 0) {
+ page = pages[i / XEN_PFN_PER_PAGE];
+ xen_pfn = xen_page_to_pfn(page);
+ }
+ fn(pfn_to_gfn(xen_pfn++), data);
+ }
}
struct remap_data {
xen_pfn_t *fgfn; /* foreign domain's gfn */
+ int nr_fgfn; /* Number of foreign gfn left to map */
pgprot_t prot;
domid_t domid;
struct vm_area_struct *vma;
@@ -71,24 +68,71 @@ struct remap_data {
struct xen_remap_gfn_info *info;
int *err_ptr;
int mapped;
+
+ /* Hypercall parameters */
+ int h_errs[XEN_PFN_PER_PAGE];
+ xen_ulong_t h_idxs[XEN_PFN_PER_PAGE];
+ xen_pfn_t h_gpfns[XEN_PFN_PER_PAGE];
+
+ int h_iter; /* Iterator */
};
+static void setup_hparams(unsigned long gfn, void *data)
+{
+ struct remap_data *info = data;
+
+ info->h_idxs[info->h_iter] = *info->fgfn;
+ info->h_gpfns[info->h_iter] = gfn;
+ info->h_errs[info->h_iter] = 0;
+
+ info->h_iter++;
+ info->fgfn++;
+}
+
static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
void *data)
{
struct remap_data *info = data;
struct page *page = info->pages[info->index++];
- unsigned long pfn = page_to_pfn(page);
- pte_t pte = pte_mkspecial(pfn_pte(pfn, info->prot));
- int rc;
+ pte_t pte = pte_mkspecial(pfn_pte(page_to_pfn(page), info->prot));
+ int rc, nr_gfn;
+ uint32_t i;
+ struct xen_add_to_physmap_range xatp = {
+ .domid = DOMID_SELF,
+ .foreign_domid = info->domid,
+ .space = XENMAPSPACE_gmfn_foreign,
+ };
- rc = map_foreign_page(pfn, *info->fgfn, info->domid);
- *info->err_ptr++ = rc;
- if (!rc) {
- set_pte_at(info->vma->vm_mm, addr, ptep, pte);
- info->mapped++;
+ nr_gfn = min_t(typeof(info->nr_fgfn), XEN_PFN_PER_PAGE, info->nr_fgfn);
+ info->nr_fgfn -= nr_gfn;
+
+ info->h_iter = 0;
+ xen_for_each_gfn(&page, nr_gfn, setup_hparams, info);
+ BUG_ON(info->h_iter != nr_gfn);
+
+ set_xen_guest_handle(xatp.idxs, info->h_idxs);
+ set_xen_guest_handle(xatp.gpfns, info->h_gpfns);
+ set_xen_guest_handle(xatp.errs, info->h_errs);
+ xatp.size = nr_gfn;
+
+ rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
+
+	/* info->err_ptr expects one error status per Xen PFN */
+ for (i = 0; i < nr_gfn; i++) {
+ int err = (rc < 0) ? rc : info->h_errs[i];
+
+ *(info->err_ptr++) = err;
+ if (!err)
+ info->mapped++;
}
- info->fgfn++;
+
+ /*
+	 * Note: The hypercall will return 0 in most of the cases even if
+	 * not all of the fgfn are mapped. We still have to update the pte
+	 * as the userspace may decide to continue.
+ */
+ if (!rc)
+ set_pte_at(info->vma->vm_mm, addr, ptep, pte);
return 0;
}
@@ -102,13 +146,14 @@ int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
{
int err;
struct remap_data data;
- unsigned long range = nr << PAGE_SHIFT;
+ unsigned long range = DIV_ROUND_UP(nr, XEN_PFN_PER_PAGE) << PAGE_SHIFT;
/* Kept here for the purpose of making sure code doesn't break
x86 PVOPS */
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
data.fgfn = gfn;
+ data.nr_fgfn = nr;
data.prot = prot;
data.domid = domid;
data.vma = vma;
@@ -123,21 +168,20 @@ int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
}
EXPORT_SYMBOL_GPL(xen_xlate_remap_gfn_array);
-int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
- int nr, struct page **pages)
+static void unmap_gfn(unsigned long gfn, void *data)
{
- int i;
+ struct xen_remove_from_physmap xrp;
- for (i = 0; i < nr; i++) {
- struct xen_remove_from_physmap xrp;
- unsigned long pfn;
+ xrp.domid = DOMID_SELF;
+ xrp.gpfn = gfn;
+ (void)HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
+}
- pfn = page_to_pfn(pages[i]);
+int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
+ int nr, struct page **pages)
+{
+ xen_for_each_gfn(pages, nr, unmap_gfn, NULL);
- xrp.domid = DOMID_SELF;
- xrp.gpfn = pfn;
- (void)HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
- }
return 0;
}
EXPORT_SYMBOL_GPL(xen_xlate_unmap_gfn_range);
--
2.1.4
The hypercall interface is always using 4KB page granularity. This
requires using the Xen page definition macros when we deal with hypercalls.
Note that pfn_to_gfn is working with a Xen pfn (i.e. 4KB). We may want to
rename pfn_to_gfn to make this explicit.
We also allocate a 64KB page for the shared page even though only the
first 4KB is used. I don't think this is really important for now as it
helps to have the pointer 4KB aligned (XENMEM_add_to_physmap is taking a
Xen PFN).
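A minimal standalone sketch (not part of the patch) of why the conversion has
to shift by XEN_PAGE_SHIFT rather than PAGE_SHIFT when Linux uses 64KB pages.
The shifts and the example address are assumptions, and pfn_to_gfn is assumed
here to be an identity mapping as on auto-translated ARM guests:
#include <stdio.h>

#define XEN_PAGE_SHIFT 12UL   /* 4KB, fixed by the hypercall ABI */
#define PAGE_SHIFT     16UL   /* assumed 64KB Linux pages */

int main(void)
{
	unsigned long phys = 0x80010000UL;  /* example physical address */

	printf("Linux PFN: %lu\n", phys >> PAGE_SHIFT);
	printf("Xen GFN passed to the hypervisor: %lu\n", phys >> XEN_PAGE_SHIFT);
	printf("Xen GFNs covered by one Linux page: %lu\n",
	       1UL << (PAGE_SHIFT - XEN_PAGE_SHIFT));
	return 0;
}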
Signed-off-by: Julien Grall <[email protected]>
---
Cc: Stefano Stabellini <[email protected]>
Cc: Russell King <[email protected]>
Stefano, I've dropped your reviewed-by given I've updated the doc and made
changes to avoid usage of XEN_PAGE_SHIFT.
Changes in v3:
- s/MFN/GFN/ based on the new naming
- Use virt_to_gfn to avoid use XEN_PAGE_SHIFT
- Drop Stefano's reviewed-by
- Add some docs in arch/arm/asm/xen/page.h
Changes in v2
- Add Stefano's reviewed-by
---
arch/arm/include/asm/xen/page.h | 15 +++++++++++++--
arch/arm/xen/enlighten.c | 6 +++---
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 98c9fc3..e3d94cf 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -28,6 +28,17 @@ typedef struct xpaddr {
#define INVALID_P2M_ENTRY (~0UL)
+/*
+ * The pseudo-physical frame (pfn) used in all the helpers is always based
+ * on Xen page granularity (i.e 4KB).
+ *
+ * A Linux page may be split across multiple non-contiguous Xen pages so we
+ * have to keep track of frames based on 4KB page granularity.
+ *
+ * PV drivers should never make direct use of those helpers (particularly
+ * pfn_to_gfn and gfn_to_pfn).
+ */
+
unsigned long __pfn_to_mfn(unsigned long pfn);
extern struct rb_root phys_to_mach;
@@ -64,8 +75,8 @@ static inline unsigned long bfn_to_pfn(unsigned long bfn)
#define bfn_to_local_pfn(bfn) bfn_to_pfn(bfn)
/* VIRT <-> GUEST conversion */
-#define virt_to_gfn(v) (pfn_to_gfn(virt_to_pfn(v)))
-#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << PAGE_SHIFT))
+#define virt_to_gfn(v) (pfn_to_gfn(virt_to_phys(v) >> XEN_PAGE_SHIFT))
+#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << XEN_PAGE_SHIFT))
/* Only used in PV code. But ARM guests are always HVM. */
static inline xmaddr_t arbitrary_virt_to_machine(void *vaddr)
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index eeeab07..50b4769 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,8 +89,8 @@ static void xen_percpu_init(void)
pr_info("Xen: initializing cpu%d\n", cpu);
vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
- info.mfn = __pa(vcpup) >> PAGE_SHIFT;
- info.offset = offset_in_page(vcpup);
+ info.mfn = virt_to_gfn(vcpup);
+ info.offset = xen_offset_in_page(vcpup);
err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
BUG_ON(err);
@@ -213,7 +213,7 @@ static int __init xen_guest_init(void)
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
- xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+ xatp.gpfn = virt_to_gfn(shared_info_page);
if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
BUG();
--
2.1.4
On 07/08/15 17:46, Julien Grall wrote:
> Hi all,
>
> ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
> hypercall interface and PV protocol are always based on 4KB page granularity.
>
> Any attempt to boot a Linux guest with 64KB pages enabled will result to a
> guest crash.
>
> This series is a first attempt to allow those Linux running with the current
> hypercall interface and PV protocol.
>
> This solution has been chosen because we want to run Linux 64KB in released
> Xen ARM version or/and platform using an old version of Linux DOM0.
>
> There is room for improvement, such as support of 64KB grant, modification
> of PV protocol to support different page size... They will be explored in a
> separate patch series later.
>
> TODO list:
> - Convert swiotlb to 64KB
> - Convert xenfb to 64KB
> - Check if backend in QEMU works with DOM0 64KB
> - It may be possible to move some common define between
> netback/netfront and blkfront/blkback in an header
>
> All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
> to run it on x86 as I don't have a box with Xen x86 running. I would be
> happy if someone give a try and see possible regression for x86.
>
> A branch based on the latest linux/master can be found here:
Sorry, I forgot to update these bits in the cover letter.
It's based on xentip for-linus-4.3 and depends on "Use correctly the Xen
memory terminologies". The v3 has been sent a few minutes ago [1].
>
> git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v3
>
> Comments, suggestions are welcomed.
>
> Sincerely yours,
[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg00619.html
--
Julien Grall
On Fri, Aug 07, 2015 at 05:46:40PM +0100, Julien Grall wrote:
> The skb doesn't change within the function. Therefore it's only
> necessary to check if we need GSO once at the beginning.
>
> Signed-off-by: Julien Grall <[email protected]>
>
Acked-by: Wei Liu <[email protected]>
On Fri, Aug 07, 2015 at 05:46:57PM +0100, Julien Grall wrote:
> The PV network protocol is using 4KB page granularity. The goal of this
> patch is to allow a Linux using 64KB page granularity working as a
> network backend on a non-modified Xen.
>
> It's only necessary to adapt the ring size and break skb data in small
> chunk of 4KB. The rest of the code is relying on the grant table code.
>
> Signed-off-by: Julien Grall <[email protected]>
>
> ---
> Cc: Ian Campbell <[email protected]>
> Cc: Wei Liu <[email protected]>
> Cc: [email protected]
>
[...]
> +#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
> +#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
>
> struct xenvif_rx_meta {
> int id;
> @@ -80,16 +81,18 @@ struct xenvif_rx_meta {
> /* Discriminate from any valid pending_idx value. */
> #define INVALID_PENDING_IDX 0xFFFF
>
> -#define MAX_BUFFER_OFFSET PAGE_SIZE
> +#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
>
> #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
>
> +#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
> +
It might be clearer if you add a comment saying the maximum number of
frags is derived from the page size of the grant page, which happens to
be XEN_PAGE_SIZE at the moment.
In the future we need to figure out the page size of grant page in a
dynamic way. We shall cross the bridge when we get there.
> /* It's possible for an skb to have a maximal number of frags
> * but still be less than MAX_BUFFER_OFFSET in size. Thus the
> - * worst-case number of copy operations is MAX_SKB_FRAGS per
> + * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
> * ring slot.
> */
> -#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
> +#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
>
> #define NETBACK_INVALID_HANDLE -1
>
> @@ -203,7 +206,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
> /* Maximum number of Rx slots a to-guest packet may use, including the
> * slot needed for GSO meta-data.
> */
> -#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
> +#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
>
> enum state_bit_shift {
> /* This bit marks that the vif is connected */
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 66f1780..c32a9f2 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
> return meta;
> }
>
[...]
> * Set up the grant operations for this fragment. If it's a flipping
> * interface, we also set up the unmap request from here.
> @@ -272,83 +346,52 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
> struct page *page, unsigned long size,
> unsigned long offset, int *head)
> {
> - struct gnttab_copy *copy_gop;
> - struct xenvif_rx_meta *meta;
> + struct gop_frag_copy info = {
> + .queue = queue,
> + .npo = npo,
> + .head = *head,
> + .gso_type = XEN_NETIF_GSO_TYPE_NONE,
> + };
> unsigned long bytes;
> - int gso_type = XEN_NETIF_GSO_TYPE_NONE;
>
> if (skb_is_gso(skb)) {
> if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> - gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> + info.gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> - gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> + info.gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> }
>
> /* Data must not cross a page boundary. */
> BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
>
> - meta = npo->meta + npo->meta_prod - 1;
> + info.meta = npo->meta + npo->meta_prod - 1;
>
> /* Skip unused frames from start of page */
> page += offset >> PAGE_SHIFT;
> offset &= ~PAGE_MASK;
>
> while (size > 0) {
> - struct xen_page_foreign *foreign;
> -
> BUG_ON(offset >= PAGE_SIZE);
> - BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
> -
> - if (npo->copy_off == MAX_BUFFER_OFFSET)
> - meta = get_next_rx_buffer(queue, npo);
>
> bytes = PAGE_SIZE - offset;
> if (bytes > size)
> bytes = size;
>
> - if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
> - bytes = MAX_BUFFER_OFFSET - npo->copy_off;
> -
> - copy_gop = npo->copy + npo->copy_prod++;
> - copy_gop->flags = GNTCOPY_dest_gref;
> - copy_gop->len = bytes;
> -
> - foreign = xen_page_foreign(page);
> - if (foreign) {
> - copy_gop->source.domid = foreign->domid;
> - copy_gop->source.u.ref = foreign->gref;
> - copy_gop->flags |= GNTCOPY_source_gref;
> - } else {
> - copy_gop->source.domid = DOMID_SELF;
> - copy_gop->source.u.gmfn =
> - virt_to_gfn(page_address(page));
> - }
> - copy_gop->source.offset = offset;
> -
> - copy_gop->dest.domid = queue->vif->domid;
> - copy_gop->dest.offset = npo->copy_off;
> - copy_gop->dest.u.ref = npo->copy_gref;
> -
> - npo->copy_off += bytes;
> - meta->size += bytes;
> -
> - offset += bytes;
> + info.page = page;
> + gnttab_foreach_grant_in_range(page, offset, bytes,
> + xenvif_gop_frag_copy_grant,
> + &info);
Looks like I need to at least wait until the API is settled before giving
my ack.
> size -= bytes;
> + offset = 0;
This looks wrong. Should be offset += bytes.
>
> - /* Next frame */
> - if (offset == PAGE_SIZE && size) {
> + /* Next page */
> + if (size) {
> BUG_ON(!PageCompound(page));
> page++;
> - offset = 0;
And this should not be deleted, I think.
What is the reason for changing the offset calculation? I think there are
still compound pages when using 64K pages.
> }
> -
> - /* Leave a gap for the GSO descriptor. */
> - if (*head && ((1 << gso_type) & queue->vif->gso_mask))
> - queue->rx.req_cons++;
> -
> - *head = 0; /* There must be something in this buffer now. */
> -
> }
> +
> + *head = info.head;
> }
>
The rest looks OK.
Wei.
Hi Wei,
On 08/08/2015 15:55, Wei Liu wrote:
>> struct xenvif_rx_meta {
>> int id;
>> @@ -80,16 +81,18 @@ struct xenvif_rx_meta {
>> /* Discriminate from any valid pending_idx value. */
>> #define INVALID_PENDING_IDX 0xFFFF
>>
>> -#define MAX_BUFFER_OFFSET PAGE_SIZE
>> +#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
>>
>> #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
>>
>> +#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
>> +
>
> It might be clearer if you add a comment saying the maximum number of
> frags is derived from the page size of the grant page, which happens to
> be XEN_PAGE_SIZE at the moment.
Will do.
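A possible wording for such a comment (sketch only; the define itself is unchanged from the patch):

/* The maximum GSO payload is 64KB and grants are currently based on
 * XEN_PAGE_SIZE (4KB), so a single skb may need up to
 * 65536 / XEN_PAGE_SIZE slots, plus one in case the data does not
 * start on a grant boundary.
 */
#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)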
> In the future we need to figure out the page size of grant page in a
> dynamic way. We shall cross the bridge when we get there.
Right, there are a few other places where we would need to do that too (see
MAX_BUFFER_OFFSET, for instance).
[..]
>> + info.page = page;
>> + gnttab_foreach_grant_in_range(page, offset, bytes,
>> + xenvif_gop_frag_copy_grant,
>> + &info);
>
> Looks like I need to at least wait until the API is settle before giving
> my ack.
>
>> size -= bytes;
>> + offset = 0;
>
> This looks wrong. Should be offset += bytes.
With the new implementation of the loop, each iteration will be on a
different page, so only the first page has an offset different from zero.
>
>>
>> - /* Next frame */
>> - if (offset == PAGE_SIZE && size) {
>> + /* Next page */
>> + if (size) {
>> BUG_ON(!PageCompound(page));
>> page++;
>> - offset = 0;
>
> And this should not be deleted, I think.
>
> What is the reason for changing offset calculation? I think there is
> still compound page when using 64K page.
Compound pages still work: gnttab_foreach_grant_in_range is called once
per page, so the offset can be reset to 0 every time. There is no need to
add code that would make the result less clear.
We only need to check that size is not 0 to move to the next page.
The patch may not be clear enough to see that it works, so I've copied the
resulting loop below:
while (size > 0) {
        BUG_ON(offset >= PAGE_SIZE);

        bytes = PAGE_SIZE - offset;
        if (bytes > size)
                bytes = size;

        info.page = page;
        gnttab_foreach_grant_in_range(page, offset, bytes,
                                      xenvif_gop_frag_copy_grant,
                                      &info);
        size -= bytes;
        offset = 0;

        /* Next page */
        if (size) {
                BUG_ON(!PageCompound(page));
                page++;
        }
}
Regards,
--
Julien Grall
On Fri, 7 Aug 2015, Julien Grall wrote:
> They are not used in common code expect in one place in balloon.c which is
> only compiled when Linux is using PV MMU. It's not the case on ARM.
>
> Rather than worrying how to handle the 64KB case, drop them.
>
> Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
> Stefano Stabellini <[email protected]>
> Russell King <[email protected]>
>
> Changes in v3:
> - Patch added
> ---
> arch/arm/include/asm/xen/page.h | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
> index 1279563..98c9fc3 100644
> --- a/arch/arm/include/asm/xen/page.h
> +++ b/arch/arm/include/asm/xen/page.h
> @@ -13,9 +13,6 @@
>
> #define phys_to_machine_mapping_valid(pfn) (1)
>
> -#define pte_mfn pte_pfn
> -#define mfn_pte pfn_pte
> -
> /* Xen machine address */
> typedef struct xmaddr {
> phys_addr_t maddr;
> --
> 2.1.4
>
On Fri, 7 Aug 2015, Julien Grall wrote:
> Currently, a grant is always based on the Xen page granularity (i.e
> 4KB). When Linux is using a different page granularity, a single page
> will be split between multiple grants.
>
> The new helpers will be in charge to split the Linux page into grants and
^ of splitting
> call a function given by the caller on each grant.
>
> Also provide an helper to count the number of grants within a given
> contiguous region.
>
> Note that the x86/include/asm/xen/page.h is now including
> xen/interface/grant_table.h rather than xen/grant_table.h. It's
> necessary because xen/grant_table.h depends on asm/xen/page.h and will
> break the compilation. Furthermore, only definition in
> interface/grant_table.h was required.
^ is
> Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
> ---
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: David Vrabel <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
>
> Changes in v3:
> - Fix error reported by checkpatch.pl
> - Typoes
> - s/pfn/xen_pfn/ in gnttab_foreach_grant
> - Drop the possibility to use less data. The complexity is moved
> in netback which is the only user
> - Rename gnttab_foreach_grant into gnttab_foreach_grant_in_range
> - s/offset/start/ in gnttab_count_grant and update the
> description of the parameter
> - s/mfn/gfn base on the new terminologies
> - Add EXPORT_SYMBOL_GPL for gnttab_foreach_grant_in_range
> - Use xen_offset_in_page and XEN_PFN_DOWN whenever it's possible
> - Fix compilation on x86.
>
> Changes in v2:
> - Patch added
> ---
> arch/x86/include/asm/xen/page.h | 2 +-
> drivers/xen/grant-table.c | 26 +++++++++++++++++++++++++
> include/xen/grant_table.h | 42 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> index 0b762f6..501479e 100644
> --- a/arch/x86/include/asm/xen/page.h
> +++ b/arch/x86/include/asm/xen/page.h
> @@ -12,7 +12,7 @@
> #include <asm/pgtable.h>
>
> #include <xen/interface/xen.h>
> -#include <xen/grant_table.h>
> +#include <xen/interface/grant_table.h>
> #include <xen/features.h>
>
> /* Xen machine address */
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 62f591f..94ae0fd 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -776,6 +776,32 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
> }
> EXPORT_SYMBOL_GPL(gnttab_batch_copy);
>
> +void gnttab_foreach_grant_in_range(struct page *page,
> + unsigned int offset,
> + unsigned int len,
> + xen_grant_fn_t fn,
> + void *data)
> +{
> + unsigned int goffset;
> + unsigned int glen;
> + unsigned long xen_pfn;
> +
> + len = min_t(unsigned int, PAGE_SIZE - offset, len);
> + goffset = xen_offset_in_page(offset);
> +
> + xen_pfn = xen_page_to_pfn(page) + XEN_PFN_DOWN(offset);
> +
> + while (len) {
> + glen = min_t(unsigned int, XEN_PAGE_SIZE - goffset, len);
> + fn(pfn_to_gfn(xen_pfn), goffset, glen, data);
> +
> + goffset = 0;
> + xen_pfn++;
> + len -= glen;
> + }
> +}
> +EXPORT_SYMBOL_GPL(gnttab_foreach_grant_in_range);
> +
> int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
> struct gnttab_map_grant_ref *kmap_ops,
> struct page **pages, unsigned int count)
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 4478f4b..2a8ebe8 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -45,8 +45,10 @@
> #include <asm/xen/hypervisor.h>
>
> #include <xen/features.h>
> +#include <xen/page.h>
> #include <linux/mm_types.h>
> #include <linux/page-flags.h>
> +#include <linux/kernel.h>
>
> #define GNTTAB_RESERVED_XENSTORE 1
>
> @@ -224,4 +226,44 @@ static inline struct xen_page_foreign *xen_page_foreign(struct page *page)
> #endif
> }
>
> +/* Split Linux page in chunk of the size of the grant and call fn
> + *
> + * Parameters of fn:
> + * gfn: guest frame number
> + * offset: offset in the grant
> + * len: length of the data in the grant.
> + * data: internal information
> + */
> +typedef void (*xen_grant_fn_t)(unsigned long gfn, unsigned int offset,
> + unsigned int len, void *data);
> +
> +void gnttab_foreach_grant_in_range(struct page *page,
> + unsigned int offset,
> + unsigned int len,
> + xen_grant_fn_t fn,
> + void *data);
> +
> +/* Helper to get to call fn only on the first "grant chunk" */
> +static inline void gnttab_one_grant(struct page *page, unsigned int offset,
> + unsigned len, xen_grant_fn_t fn,
> + void *data)
> +{
> + /* The first request is limited to the size of one grant */
> + len = min_t(unsigned int, XEN_PAGE_SIZE - (offset & ~XEN_PAGE_MASK),
> + len);
> +
> + gnttab_foreach_grant_in_range(page, offset, len, fn, data);
> +}
> +
> +/* Get the number of grant in a specified region
> + *
> + * start: Offset from the beginning of the first page
> + * len: total length of data (can cross multiple page)
> + */
> +static inline unsigned int gnttab_count_grant(unsigned int start,
> + unsigned int len)
> +{
> + return XEN_PFN_UP(xen_offset_in_page(start) + len);
> +}
> +
> #endif /* __ASM_GNTTAB_H__ */
> --
> 2.1.4
>
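For reference, a minimal sketch of how a caller might use these helpers on a 64KB-page kernel (the callback and the numbers are made up for illustration):

#include <xen/grant_table.h>

/* Count the 4KB grant chunks backing 0x5000 bytes starting at byte
 * offset 0x1800 of a 64KB Linux page.  gnttab_count_grant(0x1800, 0x5000)
 * is XEN_PFN_UP(0x800 + 0x5000) = 6, and the callback below runs once per
 * chunk, so 'seen' also ends up as 6.
 */
static void count_chunk(unsigned long gfn, unsigned int offset,
                        unsigned int len, void *data)
{
        unsigned int *seen = data;

        (*seen)++;
}

static void example(struct page *page)
{
        unsigned int seen = 0;

        gnttab_foreach_grant_in_range(page, 0x1800, 0x5000,
                                      count_chunk, &seen);
}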
On Fri, 7 Aug 2015, Julien Grall wrote:
> The Xen hypercall interface is always using 4K page granularity on ARM
> and x86 architecture.
>
> With the incoming support of 64K page granularity for ARM64 guest, it
> won't be possible to re-use the Linux page definition in Xen drivers.
>
> Introduce Xen page definition helpers based on the Linux page
> definition. They have exactly the same name but prefixed with
> XEN_/xen_ prefix.
>
> Also modify xen_page_to_gfn to use new Xen page definition.
>
> Signed-off-by: Julien Grall <[email protected]>
> Reviewed-by: Stefano Stabellini <[email protected]>
>
> ---
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: David Vrabel <[email protected]>
>
> Changes in v3:
> - Fix errors reported by checkpatch.pl
> - Rename pfn to xen_pfn in xen_pfn_to_page
> - Add a comment that we assume PAGE_SIZE to be a multiple of
> XEN_PAGE_SIZE
> - s/MFN/GFN/ according to new naming
> - Add Stefano's reviewed-by
>
> Changes in v2:
> - Add XEN_PFN_UP
> - Add a comment describing the behavior of page_to_pfn
> ---
> include/xen/page.h | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/xen/page.h b/include/xen/page.h
> index f202992..dac1b26 100644
> --- a/include/xen/page.h
> +++ b/include/xen/page.h
> @@ -1,11 +1,36 @@
> #ifndef _XEN_PAGE_H
> #define _XEN_PAGE_H
>
> +#include <asm/page.h>
> +
> +/* The hypercall interface supports only 4KB page */
> +#define XEN_PAGE_SHIFT 12
> +#define XEN_PAGE_SIZE (_AC(1, UL) << XEN_PAGE_SHIFT)
> +#define XEN_PAGE_MASK (~(XEN_PAGE_SIZE-1))
> +#define xen_offset_in_page(p) ((unsigned long)(p) & ~XEN_PAGE_MASK)
> +
> +/*
> + * We asume that PAGE_SIZE is a multiple of XEN_PAGE_SIZE
^ assume
> + * XXX: Add a BUILD_BUG_ON?
> + */
> +
> +#define xen_pfn_to_page(xen_pfn) \
> + ((pfn_to_page(((unsigned long)(xen_pfn) << XEN_PAGE_SHIFT) >> PAGE_SHIFT)))
> +#define xen_page_to_pfn(page) \
> + (((page_to_pfn(page)) << PAGE_SHIFT) >> XEN_PAGE_SHIFT)
> +
> +#define XEN_PFN_PER_PAGE (PAGE_SIZE / XEN_PAGE_SIZE)
> +
> +#define XEN_PFN_DOWN(x) ((x) >> XEN_PAGE_SHIFT)
> +#define XEN_PFN_UP(x) (((x) + XEN_PAGE_SIZE-1) >> XEN_PAGE_SHIFT)
> +#define XEN_PFN_PHYS(x) ((phys_addr_t)(x) << XEN_PAGE_SHIFT)
> +
> #include <asm/xen/page.h>
>
> +/* Return the GFN associated to the first 4KB of the page */
> static inline unsigned long xen_page_to_gfn(struct page *page)
> {
> - return pfn_to_gfn(page_to_pfn(page));
> + return pfn_to_gfn(xen_page_to_pfn(page));
> }
>
> struct xen_memory_region {
> --
> 2.1.4
>
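For a quick sanity check of these definitions with a 64KB Linux page size (values worked out by hand, not taken from the patch):

/* With PAGE_SHIFT == 16 and XEN_PAGE_SHIFT == 12:
 *
 *   XEN_PFN_PER_PAGE          == 16
 *   xen_page_to_pfn(page)     == page_to_pfn(page) * 16
 *   xen_pfn_to_page(xen_pfn)  == pfn_to_page(xen_pfn / 16)
 *   XEN_PFN_DOWN(0x5800)      == 5, XEN_PFN_UP(0x5800) == 6
 *
 * so every Linux page is backed by 16 consecutive Xen PFNs and
 * xen_page_to_gfn() returns the GFN of the first of them.
 */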
On Fri, 7 Aug 2015, Julien Grall wrote:
> On ARM all dma-capable devices on a same platform may not be protected
> by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in
> order to use correctly the device.
>
> While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
> mapping will screw this contiguous mapping.
>
> When Linux is using 64KB page granularitary, the page may be split
> accross multiple non-contiguous MFN (Xen is using 4KB page
> granularity). Therefore a DMA request will likely fail.
>
> Checking that a 64KB page is using contiguous MFN is tedious. For
> now, always says that biovec are not mergeable.
>
> Signed-off-by: Julien Grall <[email protected]>
Please fix the grammar in the subject line.
Reviewed-by: Stefano Stabellini <[email protected]>
> ---
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: David Vrabel <[email protected]>
>
> There is some ideas to check whether two biovec could be merged
> (see [1]) but it's not critical and can be consider as a performance
> improvement.
>
> Changes in v3:
> - Update commit message
> - s/mfn/bfn/ base on the new renaming
> - Update TODO
>
> Changes in v2:
> - Remove the workaround and check if the Linux page granularity
> is the same as Xen or not
>
> [1] https://lkml.org/lkml/2015/7/17/418
> ---
> drivers/xen/biomerge.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
> index 8ae2fc90..4da69db 100644
> --- a/drivers/xen/biomerge.c
> +++ b/drivers/xen/biomerge.c
> @@ -6,10 +6,18 @@
> bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> const struct bio_vec *vec2)
> {
> +#if XEN_PAGE_SIZE == PAGE_SIZE
> unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
> unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
>
> return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
> ((bfn1 == bfn2) || ((bfn1+1) == bfn2));
> +#else
> + /*
> + * XXX: Add support for merging bio_vec when using different page
> + * size in Xen and Linux.
^ sizes
> + */
> + return 0;
> +#endif
> }
> EXPORT_SYMBOL(xen_biovec_phys_mergeable);
> --
> 2.1.4
>
On Fri, 7 Aug 2015, Julien Grall wrote:
> For ARM64 guests, Linux is able to support either 64K or 4K page
> granularity. Although, the hypercall interface is always based on 4K
> page granularity.
>
> With 64K page granularity, a single page will be spread over multiple
> Xen frame.
>
> To avoid splitting the page into 4K frame, take advantage of the
> extent_order field to directly allocate/free chunk of the Linux page
> size.
>
> Note that PVMMU is only used for PV guest (which is x86) and the page
> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
> that because the code has not been modified.
>
> Signed-off-by: Julien Grall <[email protected]>
This is much better than the previous version. Good idea using the
extent_order field.
I only have a minor comment below.
> ---
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: David Vrabel <[email protected]>
> Cc: Wei Liu <[email protected]>
>
> Changes in v3:
> - Fix errors reported by checkpatch.pl
> - s/mfn/gfn/ based on the new naming
> - Rather than splitting the page into 4KB chunk, use the
> extent_order field to allocate directly a Linux page size. This
> is avoid lots of code for no benefits.
>
> Changes in v2:
> - Use xen_apply_to_page to split a page in 4K chunk
> - It's not necessary to have a smaller frame list. Re-use
> PAGE_SIZE
> - Convert reserve_additional_memory to use XEN_... macro
> ---
> drivers/xen/balloon.c | 47 ++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 36 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 9734649..c8739c8 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -70,6 +70,9 @@
> #include <xen/features.h>
> #include <xen/page.h>
>
> +/* Use one extent per PAGE_SIZE to avoid break down into multiple frame */
> +#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
> +
> /*
> * balloon_process() state:
> *
> @@ -230,6 +233,11 @@ static enum bp_state reserve_additional_memory(long credit)
> nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>
> #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> + * different page granularity.
> + */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
> /*
> * add_memory() will build page tables for the new memory so
> * the p2m must contain invalid entries so the correct
> @@ -326,11 +334,11 @@ static enum bp_state reserve_additional_memory(long credit)
> static enum bp_state increase_reservation(unsigned long nr_pages)
> {
> int rc;
> - unsigned long pfn, i;
> + unsigned long i;
> struct page *page;
> struct xen_memory_reservation reservation = {
> .address_bits = 0,
> - .extent_order = 0,
> + .extent_order = EXTENT_ORDER,
> .domid = DOMID_SELF
> };
>
> @@ -352,7 +360,11 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> nr_pages = i;
> break;
> }
> - frame_list[i] = page_to_pfn(page);
> +
> + /* XENMEM_populate_physmap requires a PFN based on Xen
> + * granularity.
> + */
> + frame_list[i] = xen_page_to_pfn(page);
> page = balloon_next_page(page);
> }
>
> @@ -366,10 +378,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> page = balloon_retrieve(false);
> BUG_ON(page == NULL);
>
> - pfn = page_to_pfn(page);
> -
> #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> + * different page granularity.
> + */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
> if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> + unsigned long pfn = page_to_pfn(page);
> +
> set_phys_to_machine(pfn, frame_list[i]);
>
> /* Link back into the page tables if not highmem. */
> @@ -396,14 +413,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
> {
> enum bp_state state = BP_DONE;
> - unsigned long pfn, i;
> + unsigned long i;
> struct page *page;
> int ret;
> struct xen_memory_reservation reservation = {
> .address_bits = 0,
> - .extent_order = 0,
> + .extent_order = EXTENT_ORDER,
> .domid = DOMID_SELF
> };
> + static struct page *pages[ARRAY_SIZE(frame_list)];
This array can be rather large: I would try to avoid it, see below.
> #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
> if (balloon_stats.hotplug_pages) {
> @@ -426,7 +444,9 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
> }
> scrub_page(page);
>
> - frame_list[i] = page_to_pfn(page);
> + /* XENMEM_decrease_reservation requires a GFN */
> + frame_list[i] = xen_page_to_gfn(page);
> + pages[i] = page;
> }
>
> /*
> @@ -440,12 +460,17 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>
> /* Update direct mapping, invalidate P2M, and add to balloon. */
> for (i = 0; i < nr_pages; i++) {
> - pfn = frame_list[i];
> - frame_list[i] = pfn_to_gfn(pfn);
> - page = pfn_to_page(pfn);
> + page = pages[i];
>
> #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> + * different page granularity.
> + */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
> if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> + unsigned long pfn = page_to_pfn(page);
I would simplify and avoid introducing a new array:
pfn = (frame_list[i] << XEN_PAGE_SHIFT) >> PAGE_SHIFT;
page = pfn_to_page(pfn);
> if (!PageHighMem(page)) {
> ret = HYPERVISOR_update_va_mapping(
> (unsigned long)__va(pfn << PAGE_SHIFT),
> --
> 2.1.4
>
Hi Stefano,
On 10/08/15 11:50, Stefano Stabellini wrote:
> On Fri, 7 Aug 2015, Julien Grall wrote:
>> On ARM all dma-capable devices on a same platform may not be protected
>> by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in
>> order to use correctly the device.
>>
>> While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
>> mapping will screw this contiguous mapping.
>>
>> When Linux is using 64KB page granularitary, the page may be split
>> accross multiple non-contiguous MFN (Xen is using 4KB page
>> granularity). Therefore a DMA request will likely fail.
>>
>> Checking that a 64KB page is using contiguous MFN is tedious. For
>> now, always says that biovec are not mergeable.
>>
>> Signed-off-by: Julien Grall <[email protected]>
>
> Please fix the grammar in the subject line.
If I made a mistake, it's unlikely that I will spot it myself.
Anyway, I guess you mean replacing "merge" with "merged"? I don't see any
other mistake in the subject.
Regards,
--
Julien Grall
On Mon, 10 Aug 2015, Julien Grall wrote:
> Hi Stefano,
>
> On 10/08/15 11:50, Stefano Stabellini wrote:
> > On Fri, 7 Aug 2015, Julien Grall wrote:
> >> On ARM all dma-capable devices on a same platform may not be protected
> >> by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in
> >> order to use correctly the device.
> >>
> >> While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
> >> mapping will screw this contiguous mapping.
> >>
> >> When Linux is using 64KB page granularitary, the page may be split
> >> accross multiple non-contiguous MFN (Xen is using 4KB page
> >> granularity). Therefore a DMA request will likely fail.
> >>
> >> Checking that a 64KB page is using contiguous MFN is tedious. For
> >> now, always says that biovec are not mergeable.
> >>
> >> Signed-off-by: Julien Grall <[email protected]>
> >
> > Please fix the grammar in the subject line.
>
> If I made a mistake it's unlikely that I will find myself which one I made.
>
> Anyway, I guess you mean to replace merge by merged? I don't see any
> other in the subject.
Yes, and page/pages:
xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB pages
Hi Stefano,
On 10/08/15 12:18, Stefano Stabellini wrote:
>> /* Link back into the page tables if not highmem. */
>> @@ -396,14 +413,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>> static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>> {
>> enum bp_state state = BP_DONE;
>> - unsigned long pfn, i;
>> + unsigned long i;
>> struct page *page;
>> int ret;
>> struct xen_memory_reservation reservation = {
>> .address_bits = 0,
>> - .extent_order = 0,
>> + .extent_order = EXTENT_ORDER,
>> .domid = DOMID_SELF
>> };
>> + static struct page *pages[ARRAY_SIZE(frame_list)];
>
> This array can be rather large: I would try to avoid it, see below.
[..]
>
> I would simply and avoid introducing a new array:
> pfn = (frame_list[i] << XEN_PAGE_SHIFT) >> PAGE_SHIFT;
> page = pfn_to_page(pfn);
That won't work because frame_list contains a GFN and not a PFN.
We need to translate the GFN back into a PFN and then into a page.
The cost of the translation may be significant, and I wanted to avoid any
more XEN_PAGE_SHIFT in the code. In general we should avoid dealing with
4KB PFNs when it's not necessary; it makes the code more confusing to read.
If your only concern is the size of the array, we could decrease the
number of frames per batch, or allocate the variable once at boot time.
Regards,
--
Julien Grall
On 10/08/15 12:25, Stefano Stabellini wrote:
> yes and page/pages:
>
> xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB pages
Why the apostrophe in biovec's? Shouldn't we just say biovecs directly?
Regards,
--
Julien Grall
On Mon, Aug 10, 2015 at 10:57:48AM +0100, Julien Grall wrote:
[...]
>
> >>+ info.page = page;
> >>+ gnttab_foreach_grant_in_range(page, offset, bytes,
> >>+ xenvif_gop_frag_copy_grant,
> >>+ &info);
> >
> >Looks like I need to at least wait until the API is settle before giving
> >my ack.
> >
> >> size -= bytes;
> >>+ offset = 0;
> >
> >This looks wrong. Should be offset += bytes.
>
> With the new implementation of the loop, each iteration will be on a
> different page.
> So only the first page has an offset different than zero.
>
> >
> >>
> >>- /* Next frame */
> >>- if (offset == PAGE_SIZE && size) {
> >>+ /* Next page */
> >>+ if (size) {
> >> BUG_ON(!PageCompound(page));
> >> page++;
> >>- offset = 0;
> >
> >And this should not be deleted, I think.
> >
> >What is the reason for changing offset calculation? I think there is
> >still compound page when using 64K page.
>
> The compound pages are still working ... gnttab_foreach_grant_in_range is
> called once per page. So the offset can be reset to 0 every time. No need to
> add code which would make the result less clear.
>
> We only need to know if the size is not 0 to get the next page.
>
> The patch may not be clear enough to see it's working so I've copied the
> result loop below:
>
> while (size > 0) {
> BUG_ON(offset >= PAGE_SIZE);
>
> bytes = PAGE_SIZE - offset;
> if (bytes > size)
> bytes = size;
>
> info.page = page;
> gnttab_foreach_grant_in_range(page, offset, bytes,
> xenvif_gop_frag_copy_grant,
> &info);
> size -= bytes;
> offset = 0;
>
> /* Next page */
> if (size) {
> BUG_ON(!PageCompound(page));
> page++;
> }
> }
>
Right. That doesn't mean the original code was wrong or anything. But I
don't want to bikeshed about this.
Please add a comment saying that offset is always 0 starting from the second
iteration, because gnttab_foreach_grant_in_range makes sure we handle
one page in one go.
Wei.
> Regards,
>
> --
> Julien Grall
On 10/08/15 12:39, Wei Liu wrote:
> On Mon, Aug 10, 2015 at 10:57:48AM +0100, Julien Grall wrote:
>> while (size > 0) {
>> BUG_ON(offset >= PAGE_SIZE);
>>
>> bytes = PAGE_SIZE - offset;
>> if (bytes > size)
>> bytes = size;
>>
>> info.page = page;
>> gnttab_foreach_grant_in_range(page, offset, bytes,
>> xenvif_gop_frag_copy_grant,
>> &info);
>> size -= bytes;
>> offset = 0;
>>
>> /* Next page */
>> if (size) {
>> BUG_ON(!PageCompound(page));
>> page++;
>> }
>> }
>>
>
> Right. That doesn't mean the original code was wrong or anything. But I
> don't want to bikeshed about this.
I never said the original code was wrong... The original code allowed
the possibility of copying less data than the length contained in the
page.
In the new version, that has been pushed into the callback
xenvif_gop_frag_copy_grant.
> Please add a comment saying that offset is always 0 starting from second
> iteration because the gnttab_foreach_grant_in_range makes sure we handle
> one page in one go.
I think this is superfluous. To be honest, the comment should have been
on the original version and not on the new one. From the construction of
the loop it was far from obvious that we copied less data.
In this new version, the reason is not that
gnttab_foreach_grant_in_range always handles one page, but how the loop
has been constructed.
If you look at how bytes is defined, it always contains
min(PAGE_SIZE - offset, size)
So for the first page this will be PAGE_SIZE - offset. At the end of the
loop we reset the offset to 0, since we have copied all the data of the
first page. For the second page onwards this will always be PAGE_SIZE,
except for the last one where it is size.
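For instance, a made-up walk-through with 64KB pages, an initial offset of 0x100 and a size of 0x1f000:

/*   iteration 1: bytes = 0x10000 - 0x100  = 0xff00  (rest of the first page)
 *                size  = 0x1f000 - 0xff00 = 0xf100, offset reset to 0
 *   iteration 2: bytes = min(0x10000, 0xf100) = 0xf100 (tail, loop exits)
 *
 * gnttab_foreach_grant_in_range() then splits each bytes-sized chunk into
 * 4KB grant copies internally.
 */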
Regards,
--
Julien Grall
On Fri, 7 Aug 2015, Julien Grall wrote:
> The hypercall interface (as well as the toolstack) is always using 4KB
> page granularity. When the toolstack is asking for mapping a series of
> guest PFN in a batch, it expects to have the page map contiguously in
> its virtual memory.
>
> When Linux is using 64KB page granularity, the privcmd driver will have
> to map multiple Xen PFN in a single Linux page.
>
> Note that this solution works on page granularity which is a multiple of
> 4KB.
>
> Signed-off-by: Julien Grall <[email protected]>
>
> ---
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: David Vrabel <[email protected]>
>
> I kept the hypercall arguments in remap_data to avoid allocating them on
> the stack every time that remap_pte_fn is called.
> I will keep like that unless someone is strongly disagree.
>
> Changes in v3:
> - The function to split a Linux page in mutiple Xen page has
> been moved internally. It was the only use (not used anymore in
> the balloon) and it's not quite clear what should be the common
> interface. Differ the question until someone need to use it.
> - s/nr_pfn/numgfns/ to make clear that we are dealing with GFN
> - Use DIV_ROUND_UP rather round_up and fix the usage in
> xen_xlate_unmap_gfn_range
>
> Changes in v2:
> - Use xen_apply_to_page
> ---
> drivers/xen/privcmd.c | 8 ++--
> drivers/xen/xlate_mmu.c | 124 ++++++++++++++++++++++++++++++++----------------
> 2 files changed, 89 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index c6deb87..c8798ee 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -446,7 +446,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
> return -EINVAL;
> }
>
> - nr_pages = m.num;
> + nr_pages = DIV_ROUND_UP(m.num, XEN_PFN_PER_PAGE);
> if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
> return -EINVAL;
>
> @@ -494,7 +494,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
> goto out_unlock;
> }
> if (xen_feature(XENFEAT_auto_translated_physmap)) {
> - ret = alloc_empty_pages(vma, m.num);
> + ret = alloc_empty_pages(vma, nr_pages);
> if (ret < 0)
> goto out_unlock;
> } else
> @@ -518,6 +518,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
> state.global_error = 0;
> state.version = version;
>
> + BUILD_BUG_ON(((PAGE_SIZE / sizeof(xen_pfn_t)) % XEN_PFN_PER_PAGE) != 0);
> /* mmap_batch_fn guarantees ret == 0 */
> BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
> &pagelist, mmap_batch_fn, &state));
> @@ -582,12 +583,13 @@ static void privcmd_close(struct vm_area_struct *vma)
> {
> struct page **pages = vma->vm_private_data;
> int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> + int numgfns = (vma->vm_end - vma->vm_start) >> XEN_PAGE_SHIFT;
> int rc;
>
> if (!xen_feature(XENFEAT_auto_translated_physmap) || !numpgs || !pages)
> return;
>
> - rc = xen_unmap_domain_gfn_range(vma, numpgs, pages);
> + rc = xen_unmap_domain_gfn_range(vma, numgfns, pages);
> if (rc == 0)
> free_xenballooned_pages(numpgs, pages);
> else
> diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
> index cff2387..a1d3904 100644
> --- a/drivers/xen/xlate_mmu.c
> +++ b/drivers/xen/xlate_mmu.c
> @@ -38,31 +38,28 @@
> #include <xen/interface/xen.h>
> #include <xen/interface/memory.h>
>
> -/* map fgfn of domid to lpfn in the current domain */
> -static int map_foreign_page(unsigned long lpfn, unsigned long fgfn,
> - unsigned int domid)
> -{
> - int rc;
> - struct xen_add_to_physmap_range xatp = {
> - .domid = DOMID_SELF,
> - .foreign_domid = domid,
> - .size = 1,
> - .space = XENMAPSPACE_gmfn_foreign,
> - };
> - xen_ulong_t idx = fgfn;
> - xen_pfn_t gpfn = lpfn;
> - int err = 0;
> +typedef void (*xen_gfn_fn_t)(unsigned long gfn, void *data);
>
> - set_xen_guest_handle(xatp.idxs, &idx);
> - set_xen_guest_handle(xatp.gpfns, &gpfn);
> - set_xen_guest_handle(xatp.errs, &err);
> +/* Break down the pages in 4KB chunk and call fn for each gfn */
> +static void xen_for_each_gfn(struct page **pages, unsigned nr_gfn,
> + xen_gfn_fn_t fn, void *data)
> +{
> + unsigned long xen_pfn = 0;
> + struct page *page;
> + int i;
>
> - rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
> - return rc < 0 ? rc : err;
> + for (i = 0; i < nr_gfn; i++) {
> + if ((i % XEN_PFN_PER_PAGE) == 0) {
> + page = pages[i / XEN_PFN_PER_PAGE];
If this function is going to be called very frequently you might want to
consider using a shift instead.
page = pages[i >> 4];
With an appropriate macro of course.
> + xen_pfn = xen_page_to_pfn(page);
> + }
> + fn(pfn_to_gfn(xen_pfn++), data);
What is the purpose of incrementing xen_pfn here?
> + }
> }
>
> struct remap_data {
> xen_pfn_t *fgfn; /* foreign domain's gfn */
> + int nr_fgfn; /* Number of foreign gfn left to map */
> pgprot_t prot;
> domid_t domid;
> struct vm_area_struct *vma;
> @@ -71,24 +68,71 @@ struct remap_data {
> struct xen_remap_gfn_info *info;
> int *err_ptr;
> int mapped;
> +
> + /* Hypercall parameters */
> + int h_errs[XEN_PFN_PER_PAGE];
> + xen_ulong_t h_idxs[XEN_PFN_PER_PAGE];
> + xen_pfn_t h_gpfns[XEN_PFN_PER_PAGE];
> +
> + int h_iter; /* Iterator */
> };
>
> +static void setup_hparams(unsigned long gfn, void *data)
> +{
> + struct remap_data *info = data;
> +
> + info->h_idxs[info->h_iter] = *info->fgfn;
> + info->h_gpfns[info->h_iter] = gfn;
> + info->h_errs[info->h_iter] = 0;
> +
> + info->h_iter++;
> + info->fgfn++;
> +}
> +
> static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
> void *data)
> {
> struct remap_data *info = data;
> struct page *page = info->pages[info->index++];
> - unsigned long pfn = page_to_pfn(page);
> - pte_t pte = pte_mkspecial(pfn_pte(pfn, info->prot));
> - int rc;
> + pte_t pte = pte_mkspecial(pfn_pte(page_to_pfn(page), info->prot));
> + int rc, nr_gfn;
> + uint32_t i;
> + struct xen_add_to_physmap_range xatp = {
> + .domid = DOMID_SELF,
> + .foreign_domid = info->domid,
> + .space = XENMAPSPACE_gmfn_foreign,
> + };
>
> - rc = map_foreign_page(pfn, *info->fgfn, info->domid);
> - *info->err_ptr++ = rc;
> - if (!rc) {
> - set_pte_at(info->vma->vm_mm, addr, ptep, pte);
> - info->mapped++;
> + nr_gfn = min_t(typeof(info->nr_fgfn), XEN_PFN_PER_PAGE, info->nr_fgfn);
> + info->nr_fgfn -= nr_gfn;
> +
> + info->h_iter = 0;
> + xen_for_each_gfn(&page, nr_gfn, setup_hparams, info);
> + BUG_ON(info->h_iter != nr_gfn);
> +
> + set_xen_guest_handle(xatp.idxs, info->h_idxs);
> + set_xen_guest_handle(xatp.gpfns, info->h_gpfns);
> + set_xen_guest_handle(xatp.errs, info->h_errs);
> + xatp.size = nr_gfn;
> +
> + rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
> +
> + /* info->err_ptr expect to have one error status per Xen PFN */
> + for (i = 0; i < nr_gfn; i++) {
> + int err = (rc < 0) ? rc : info->h_errs[i];
> +
> + *(info->err_ptr++) = err;
> + if (!err)
> + info->mapped++;
> }
> - info->fgfn++;
> +
> + /*
> + * Note: The hypercall will return 0 in most of the case if even if
> + * all the fgmfn are not mapped. We still have to update the pte
> + * as the userspace may decide to continue.
> + */
> + if (!rc)
> + set_pte_at(info->vma->vm_mm, addr, ptep, pte);
>
> return 0;
> }
> @@ -102,13 +146,14 @@ int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
> {
> int err;
> struct remap_data data;
> - unsigned long range = nr << PAGE_SHIFT;
> + unsigned long range = DIV_ROUND_UP(nr, XEN_PFN_PER_PAGE) << PAGE_SHIFT;
>
> /* Kept here for the purpose of making sure code doesn't break
> x86 PVOPS */
> BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
>
> data.fgfn = gfn;
> + data.nr_fgfn = nr;
> data.prot = prot;
> data.domid = domid;
> data.vma = vma;
> @@ -123,21 +168,20 @@ int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
> }
> EXPORT_SYMBOL_GPL(xen_xlate_remap_gfn_array);
>
> -int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
> - int nr, struct page **pages)
> +static void unmap_gfn(unsigned long gfn, void *data)
> {
> - int i;
> + struct xen_remove_from_physmap xrp;
>
> - for (i = 0; i < nr; i++) {
> - struct xen_remove_from_physmap xrp;
> - unsigned long pfn;
> + xrp.domid = DOMID_SELF;
> + xrp.gpfn = gfn;
> + (void)HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
> +}
>
> - pfn = page_to_pfn(pages[i]);
> +int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
> + int nr, struct page **pages)
> +{
> + xen_for_each_gfn(pages, nr, unmap_gfn, NULL);
>
> - xrp.domid = DOMID_SELF;
> - xrp.gpfn = pfn;
> - (void)HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
> - }
> return 0;
> }
> EXPORT_SYMBOL_GPL(xen_xlate_unmap_gfn_range);
> --
> 2.1.4
>
On 10/08/15 13:03, Stefano Stabellini wrote:
> On Fri, 7 Aug 2015, Julien Grall wrote:
>> - rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
>> - return rc < 0 ? rc : err;
>> + for (i = 0; i < nr_gfn; i++) {
>> + if ((i % XEN_PFN_PER_PAGE) == 0) {
>> + page = pages[i / XEN_PFN_PER_PAGE];
>
> If this function is going to be called very frequently you might want to
> consider using a shift instead.
>
> page = pages[i >> 4];
>
> With an appropriate macro of course.
This change isn't necessary. Compilers already turn divides into
suitable shifts.
David
On 10/08/15 12:32, Julien Grall wrote:
> On 10/08/15 12:25, Stefano Stabellini wrote:
>> yes and page/pages:
>>
>> xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB pages
>
> Why the ' in biovec's ? Shouldn't we says biovecs directly?
Pluralizing named C structures with apostrophes is valid. It makes it
clear we're talking about "struct biovec" and not "struct biovecs" objects.
You could also consider "biovec's" to be a contraction for "biovec
objects" so the apostrophe is "correct".
David
On Fri, 7 Aug 2015, Julien Grall wrote:
> The hypercall interface is always using 4KB page granularity. This is
> requiring to use xen page definition macro when we deal with hypercall.
>
> Note that pfn_to_gfn is working with a Xen pfn (i.e 4KB). We may want to
> rename pfn_gfn to make this explicit.
>
> We also allocate a 64KB page for the shared page even though only the
> first 4KB is used. I don't think this is really important for now as it
> helps to have the pointer 4KB aligned (XENMEM_add_to_physmap is taking a
> Xen PFN).
>
> Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
> ---
> Cc: Stefano Stabellini <[email protected]>
> Cc: Russell King <[email protected]>
>
> Stefano, I've dropped your reviewed-by given I've updated the doc and do
> changes to avoid usage of XEN_PAGE_SHIFT
>
> Changes in v3:
> - s/MFN/GFN/ base on the new naming
> - Use virt_to_gfn to avoid use XEN_PAGE_SHIFT
> - Drop Stefano's reviewed-by
> - Add some docs in arch/arm/asm/xen/page.h
>
> Changes in v2
> - Add Stefano's reviewed-by
> ---
> arch/arm/include/asm/xen/page.h | 15 +++++++++++++--
> arch/arm/xen/enlighten.c | 6 +++---
> 2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
> index 98c9fc3..e3d94cf 100644
> --- a/arch/arm/include/asm/xen/page.h
> +++ b/arch/arm/include/asm/xen/page.h
> @@ -28,6 +28,17 @@ typedef struct xpaddr {
>
> #define INVALID_P2M_ENTRY (~0UL)
>
> +/*
> + * The pseudo-physical frame (pfn) used in all the helpers is always based
> + * on Xen page granularity (i.e 4KB).
> + *
> + * A Linux page may be split across multiple non-contiguous Xen page so we
> + * have to keep track with frame based on 4KB page granularity.
> + *
> + * PV drivers should never make a direct usage of those helpers (particularly
> + * pfn_to_gfn and gfn_to_pfn).
> + */
> +
> unsigned long __pfn_to_mfn(unsigned long pfn);
> extern struct rb_root phys_to_mach;
>
> @@ -64,8 +75,8 @@ static inline unsigned long bfn_to_pfn(unsigned long bfn)
> #define bfn_to_local_pfn(bfn) bfn_to_pfn(bfn)
>
> /* VIRT <-> GUEST conversion */
> -#define virt_to_gfn(v) (pfn_to_gfn(virt_to_pfn(v)))
> -#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << PAGE_SHIFT))
> +#define virt_to_gfn(v) (pfn_to_gfn(virt_to_phys(v) >> XEN_PAGE_SHIFT))
> +#define gfn_to_virt(m) (__va(gfn_to_pfn(m) << XEN_PAGE_SHIFT))
>
> /* Only used in PV code. But ARM guests are always HVM. */
> static inline xmaddr_t arbitrary_virt_to_machine(void *vaddr)
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index eeeab07..50b4769 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -89,8 +89,8 @@ static void xen_percpu_init(void)
> pr_info("Xen: initializing cpu%d\n", cpu);
> vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
>
> - info.mfn = __pa(vcpup) >> PAGE_SHIFT;
> - info.offset = offset_in_page(vcpup);
> + info.mfn = virt_to_gfn(vcpup);
> + info.offset = xen_offset_in_page(vcpup);
>
> err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
> BUG_ON(err);
> @@ -213,7 +213,7 @@ static int __init xen_guest_init(void)
> xatp.domid = DOMID_SELF;
> xatp.idx = 0;
> xatp.space = XENMAPSPACE_shared_info;
> - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> + xatp.gpfn = virt_to_gfn(shared_info_page);
> if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
> BUG();
>
> --
> 2.1.4
>
On Mon, 10 Aug 2015, Julien Grall wrote:
> Hi Stefano,
>
> On 10/08/15 12:18, Stefano Stabellini wrote:
> >> /* Link back into the page tables if not highmem. */
> >> @@ -396,14 +413,15 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
> >> static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
> >> {
> >> enum bp_state state = BP_DONE;
> >> - unsigned long pfn, i;
> >> + unsigned long i;
> >> struct page *page;
> >> int ret;
> >> struct xen_memory_reservation reservation = {
> >> .address_bits = 0,
> >> - .extent_order = 0,
> >> + .extent_order = EXTENT_ORDER,
> >> .domid = DOMID_SELF
> >> };
> >> + static struct page *pages[ARRAY_SIZE(frame_list)];
> >
> > This array can be rather large: I would try to avoid it, see below.
>
> [..]
>
> >
> > I would simply and avoid introducing a new array:
> > pfn = (frame_list[i] << XEN_PAGE_SHIFT) >> PAGE_SHIFT;
> > page = pfn_to_page(pfn);
>
> Which won't work because the frame_list contains a gfn and not a pfn.
> We need to translate back the gfn into a pfn and the into a page.
>
> The cost of the translation may be big and I wanted to avoid anymore
> XEN_PAGE_SHIFT in the code. In general we should avoid to deal with 4KB
> PFN when it's not necessary, it make the code more confusing to read.
That is true
> If your only concern is the size of the array, we could decrease the
> number of frames by batch. Or allocation the variable once a boot time.
Yes, that is my only concern. Allocating only nr_pages new struct page*
would be good enough I guess.
On Mon, 10 Aug 2015, David Vrabel wrote:
> On 10/08/15 13:03, Stefano Stabellini wrote:
> > On Fri, 7 Aug 2015, Julien Grall wrote:
> >> - rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
> >> - return rc < 0 ? rc : err;
> >> + for (i = 0; i < nr_gfn; i++) {
> >> + if ((i % XEN_PFN_PER_PAGE) == 0) {
> >> + page = pages[i / XEN_PFN_PER_PAGE];
> >
> > If this function is going to be called very frequently you might want to
> > consider using a shift instead.
> >
> > page = pages[i >> 4];
> >
> > With an appropriate macro of course.
>
> This change isn't necessary. Compilers already turn divides into
> suitable shifts.
The ARM compiler I used last time I tested this did not, but that was 1
or 2 years ago. In any case, to be clear, this change is not required.
Hi Stefano,
On 10/08/15 13:57, Stefano Stabellini wrote:
> On Mon, 10 Aug 2015, David Vrabel wrote:
>> On 10/08/15 13:03, Stefano Stabellini wrote:
>>> On Fri, 7 Aug 2015, Julien Grall wrote:
>>>> - rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
>>>> - return rc < 0 ? rc : err;
>>>> + for (i = 0; i < nr_gfn; i++) {
>>>> + if ((i % XEN_PFN_PER_PAGE) == 0) {
>>>> + page = pages[i / XEN_PFN_PER_PAGE];
>>>
>>> If this function is going to be called very frequently you might want to
>>> consider using a shift instead.
>>>
>>> page = pages[i >> 4];
>>>
>>> With an appropriate macro of course.
>>
>> This change isn't necessary. Compilers already turn divides into
>> suitable shifts.
>
> The ARM compiler I used last time I tested this did not, but that was 1
> or 2 years ago. In any case to be clear this change is not required.
I gave it a try with the compiler used by Debian Jessie (gcc 4.9.2). It
turns divides into suitable shifts.
Anyway, if it happens that an older ARM compiler doesn't do this
transformation, I'm sure we would have to modify many other places in
order to make the code efficient.
Regards,
--
Julien Grall
On 10/08/15 13:55, Stefano Stabellini wrote:
>> If your only concern is the size of the array, we could decrease the
>> number of frames by batch. Or allocation the variable once a boot time.
>
> Yes, that is my only concern. Allocating only nr_pages new struct page*
> would be good enough I guess.
That would be even worse. We shouldn't allocate the array on every call,
but at boot time.
Note that frame_list is already a static variable and uses 64KB when 64KB
pages are used. I guess it is unlikely that we remove that many frames in
a single batch, but I will keep this optimization for later.
Anyway, I'm wondering if we could re-use the lru field to link the pages
when we allocate them and retrieve them in the second loop, in order to
avoid the pages array.
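Something like this, perhaps (an untested fragment sketching the idea; page and frame_list as in the existing loops):

LIST_HEAD(ballooned);

for (i = 0; i < nr_pages; i++) {
        /* page obtained and scrubbed as before */
        frame_list[i] = xen_page_to_gfn(page);
        /* remember the page on its lru field instead of in pages[] */
        list_add_tail(&page->lru, &ballooned);
}

/* ... XENMEM_decrease_reservation hypercall as before ... */

for (i = 0; i < nr_pages; i++) {
        page = list_first_entry(&ballooned, struct page, lru);
        list_del(&page->lru);
        /* P2M / page-table updates as before */
}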
Regards,
--
Julien Grall
Hi,
Ping? I'm missing some reviews on the block and netfront code.
We'd like to see this series go into Linux 4.3. Some distributions
plan to use this version for aarch64 support. If we miss it, we won't
have any Xen guest support, even minimal, for Linux using 64KB
page granularity.
Regards,
On 07/08/2015 09:46, Julien Grall wrote:
> Hi all,
>
> ARM64 Linux is supporting both 4KB and 64KB page granularity. Although, Xen
> hypercall interface and PV protocol are always based on 4KB page granularity.
>
> Any attempt to boot a Linux guest with 64KB pages enabled will result to a
> guest crash.
>
> This series is a first attempt to allow those Linux running with the current
> hypercall interface and PV protocol.
>
> This solution has been chosen because we want to run Linux 64KB in released
> Xen ARM version or/and platform using an old version of Linux DOM0.
>
> There is room for improvement, such as support of 64KB grant, modification
> of PV protocol to support different page size... They will be explored in a
> separate patch series later.
>
> TODO list:
> - Convert swiotlb to 64KB
> - Convert xenfb to 64KB
> - Check if backend in QEMU works with DOM0 64KB
> - It may be possible to move some common define between
> netback/netfront and blkfront/blkback in an header
>
> All patches has been built tested for ARM32, ARM64, x86. But I haven't tested
> to run it on x86 as I don't have a box with Xen x86 running. I would be
> happy if someone give a try and see possible regression for x86.
>
> A branch based on the latest linux/master can be found here:
>
> git://xenbits.xen.org/people/julieng/linux-arm.git branch xen-64k-v3
>
> Comments, suggestions are welcomed.
>
> Sincerely yours,
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
> Julien Grall (20):
> net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop
> arm/xen: Drop pte_mfn and mfn_pte
> xen: Add Xen specific page definition
> xen/grant: Introduce helpers to split a page into grant
> xen/grant: Add helper gnttab_page_grant_foreign_access_ref_one
> block/xen-blkfront: Split blkif_queue_request in 2
> block/xen-blkfront: Store a page rather a pfn in the grant structure
> block/xen-blkfront: split get_grant in 2
> xen/biomerge: Don't allow biovec to be merge when Linux is not using
> 4KB page
> xen/xenbus: Use Xen page definition
> tty/hvc: xen: Use xen page definition
> xen/balloon: Don't rely on the page granularity is the same for Xen
> and Linux
> xen/events: fifo: Make it running on 64KB granularity
> xen/grant-table: Make it running on 64KB granularity
> block/xen-blkfront: Make it running on 64KB page granularity
> block/xen-blkback: Make it running on 64KB page granularity
> net/xen-netfront: Make it running on 64KB page granularity
> net/xen-netback: Make it running on 64KB page granularity
> xen/privcmd: Add support for Linux 64KB page granularity
> arm/xen: Add support for 64KB page granularity
>
> arch/arm/include/asm/xen/page.h | 18 +-
> arch/arm/xen/enlighten.c | 6 +-
> arch/arm/xen/p2m.c | 6 +-
> arch/x86/include/asm/xen/page.h | 2 +-
> drivers/block/xen-blkback/blkback.c | 5 +-
> drivers/block/xen-blkback/common.h | 17 +-
> drivers/block/xen-blkback/xenbus.c | 9 +-
> drivers/block/xen-blkfront.c | 552 +++++++++++++++++++++++-------------
> drivers/net/xen-netback/common.h | 15 +-
> drivers/net/xen-netback/netback.c | 163 +++++++----
> drivers/net/xen-netfront.c | 122 +++++---
> drivers/tty/hvc/hvc_xen.c | 4 +-
> drivers/xen/balloon.c | 47 ++-
> drivers/xen/biomerge.c | 8 +
> drivers/xen/events/events_base.c | 2 +-
> drivers/xen/events/events_fifo.c | 2 +-
> drivers/xen/grant-table.c | 32 ++-
> drivers/xen/privcmd.c | 8 +-
> drivers/xen/xenbus/xenbus_client.c | 6 +-
> drivers/xen/xenbus/xenbus_probe.c | 3 +-
> drivers/xen/xlate_mmu.c | 124 +++++---
> include/xen/grant_table.h | 51 ++++
> include/xen/page.h | 27 +-
> 23 files changed, 844 insertions(+), 385 deletions(-)
>
--
Julien Grall
On 07/08/15 at 18:46, Julien Grall wrote:
> Prepare the code to support 64KB page granularity. The first
> implementation will use a full Linux page per indirect and persistent
> grant. When non-persistent grant is used, each page of a bio request
> may be split in multiple grant.
>
> Furthermore, the field page of the grant structure is only used to copy
> data from persistent grant or indirect grant. Avoid to set it for other
> use case as it will have no meaning given the page will be split in
> multiple grant.
>
> Provide 2 functions, to setup indirect grant, the other for bio page.
>
> Signed-off-by: Julien Grall <[email protected]>
Thanks, this looks fine:
Acked-by: Roger Pau Monné <[email protected]>
Roger.
Hello,
I have some comments regarding the commit message; IMHO it would be good
if a native English speaker reviewed it too.
On 07/08/15 at 18:46, Julien Grall wrote:
> The PV block protocol is using 4KB page granularity. The goal of this
> patch is to allow a Linux using 64KB page granularity using block
> device on a non-modified Xen.
>
> The block API is using segment which should at least be the size of a
> Linux page. Therefore, the driver will have to break the page in chunk
> of 4K before giving the page to the backend.
>
> Breaking a 64KB segment in 4KB chunk will result to have some chunk with
> no data.
I would rewrite the above line as:
When breaking a 64KB segment into 4KB chunks it is possible that some
chunks are empty.
> As the PV protocol always require to have data in the chunk, we
> have to count the number of Xen page which will be in use and avoid to
^pages
> sent empty chunk.
^and avoid sending empty chunks.
>
> Note that, a pre-defined number of grant is reserved before preparing
^grants are
> the request. This pre-defined number is based on the number and the
> maximum size of the segments. If each segment contain a very small
^contains
> amount of data, the driver may reserve too much grant (16 grant is
^many grants ^grants are
> reserved per segment with 64KB page granularity).
>
> Futhermore, in the case of persistent grant we allocate one Linux page
^grants
> per grant although only the 4KB of the page will be effectively use.
^first ^in
> This could be improved by share the page with multiple grants.
^sharing
>
> Signed-off-by: Julien Grall <[email protected]>
LGTM:
Acked-by: Roger Pau Monné <[email protected]>
Just one question.
[...]
> @@ -559,73 +669,30 @@ static int blkif_queue_rw_req(struct request *req)
> ring_req->operation = 0;
> }
> }
> - ring_req->u.rw.nr_segments = nseg;
> - }
> - for_each_sg(info->shadow[id].sg, sg, nseg, i) {
> - fsect = sg->offset >> 9;
> - lsect = fsect + (sg->length >> 9) - 1;
> -
> - if ((ring_req->operation == BLKIF_OP_INDIRECT) &&
> - (i % SEGS_PER_INDIRECT_FRAME == 0)) {
> - if (segments)
> - kunmap_atomic(segments);
> -
> - n = i / SEGS_PER_INDIRECT_FRAME;
> - gnt_list_entry = get_indirect_grant(&gref_head, info);
> - info->shadow[id].indirect_grants[n] = gnt_list_entry;
> - segments = kmap_atomic(gnt_list_entry->page);
> - ring_req->u.indirect.indirect_grefs[n] = gnt_list_entry->gref;
> - }
> -
> - gnt_list_entry = get_grant(&gref_head,
> - xen_page_to_gfn(sg_page(sg)),
> - info);
> - ref = gnt_list_entry->gref;
> -
> - info->shadow[id].grants_used[i] = gnt_list_entry;
> -
> - if (rq_data_dir(req) && info->feature_persistent) {
> - char *bvec_data;
> - void *shared_data;
> + ring_req->u.rw.nr_segments = num_grant;
> + }
>
> - BUG_ON(sg->offset + sg->length > PAGE_SIZE);
> + setup.ring_req = ring_req;
> + setup.id = id;
> + for_each_sg(info->shadow[id].sg, sg, num_sg, i) {
> + BUG_ON(sg->offset + sg->length > PAGE_SIZE);
>
> - shared_data = kmap_atomic(gnt_list_entry->page);
> - bvec_data = kmap_atomic(sg_page(sg));
> + if (setup.need_copy) {
> + setup.bvec_off = sg->offset;
> + setup.bvec_data = kmap_atomic(sg_page(sg));
> + }
>
> - /*
> - * this does not wipe data stored outside the
> - * range sg->offset..sg->offset+sg->length.
> - * Therefore, blkback *could* see data from
> - * previous requests. This is OK as long as
> - * persistent grants are shared with just one
> - * domain. It may need refactoring if this
> - * changes
> - */
> - memcpy(shared_data + sg->offset,
> - bvec_data + sg->offset,
> - sg->length);
> + gnttab_foreach_grant_in_range(sg_page(sg),
> + sg->offset,
> + sg->length,
> + blkif_setup_rw_req_grant,
> + &setup);
If I'm understanding this right, on x86 gnttab_foreach_grant_in_range is
only going to perform one iteration, since XEN_PAGE_SIZE == PAGE_SIZE.
Roger.
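A rough trace of the helper (quoted earlier in the thread) for the x86 case, assuming sg->offset + sg->length <= PAGE_SIZE as the BUG_ON above enforces:

/* With XEN_PAGE_SIZE == PAGE_SIZE:
 *
 *   len     = min(PAGE_SIZE - offset, len)      <= XEN_PAGE_SIZE - offset
 *   goffset = xen_offset_in_page(offset)        == offset
 *   glen    = min(XEN_PAGE_SIZE - goffset, len) == len
 *
 * so len drops to 0 after the first chunk and blkif_setup_rw_req_grant()
 * is called exactly once per segment, as before this series.
 */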
On 07/08/15 at 18:46, Julien Grall wrote:
> The PV block protocol is using 4KB page granularity. The goal of this
> patch is to allow a Linux using 64KB page granularity behaving as a
> block backend on a non-modified Xen.
>
> It's only necessary to adapt the ring size and the number of request per
> indirect frames. The rest of the code is relying on the grant table
> code.
>
> Note that the grant table code is allocating a Linux page per grant
> which will result to waste 6OKB for every grant when Linux is using 64KB
> page granularity. This could be improved by sharing the page between
> multiple grants.
>
> Signed-off-by: Julien Grall <[email protected]>
LGTM:
Acked-by: Roger Pau Monné <[email protected]>
> ---
>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: "Roger Pau Monné" <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: David Vrabel <[email protected]>
>
> Improvement such as support of 64KB grant is not taken into
> consideration in this patch because we have the requirement to run a
> Linux using 64KB pages on a non-modified Xen.
>
> This has been tested only with a loop device. I plan to test passing
> hard drive partition but I didn't yet convert the swiotlb code.
>
> Changes in v3:
> - Use DIV_ROUND_UP in INDIRECT_PAGES to avoid a line over 80
> characters
> ---
> drivers/block/xen-blkback/blkback.c | 5 +++--
> drivers/block/xen-blkback/common.h | 17 +++++++++++++----
> drivers/block/xen-blkback/xenbus.c | 9 ++++++---
> 3 files changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> index ced9677..d5cce8c 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -961,7 +961,7 @@ static int xen_blkbk_parse_indirect(struct blkif_request *req,
> seg[n].nsec = segments[i].last_sect -
> segments[i].first_sect + 1;
> seg[n].offset = (segments[i].first_sect << 9);
> - if ((segments[i].last_sect >= (PAGE_SIZE >> 9)) ||
> + if ((segments[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
> (segments[i].last_sect < segments[i].first_sect)) {
> rc = -EINVAL;
> goto unmap;
> @@ -1210,6 +1210,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
>
> req_operation = req->operation == BLKIF_OP_INDIRECT ?
> req->u.indirect.indirect_op : req->operation;
> +
> if ((req->operation == BLKIF_OP_INDIRECT) &&
> (req_operation != BLKIF_OP_READ) &&
> (req_operation != BLKIF_OP_WRITE)) {
> @@ -1268,7 +1269,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
> seg[i].nsec = req->u.rw.seg[i].last_sect -
> req->u.rw.seg[i].first_sect + 1;
> seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
> - if ((req->u.rw.seg[i].last_sect >= (PAGE_SIZE >> 9)) ||
> + if ((req->u.rw.seg[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
> (req->u.rw.seg[i].last_sect <
> req->u.rw.seg[i].first_sect))
> goto fail_response;
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index 45a044a..68e87a0 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -39,6 +39,7 @@
> #include <asm/pgalloc.h>
> #include <asm/hypervisor.h>
> #include <xen/grant_table.h>
> +#include <xen/page.h>
> #include <xen/xenbus.h>
> #include <xen/interface/io/ring.h>
> #include <xen/interface/io/blkif.h>
> @@ -51,12 +52,20 @@ extern unsigned int xen_blkif_max_ring_order;
> */
> #define MAX_INDIRECT_SEGMENTS 256
>
> -#define SEGS_PER_INDIRECT_FRAME \
> - (PAGE_SIZE/sizeof(struct blkif_request_segment))
> +/*
> + * Xen use 4K pages. The guest may use different page size (4K or 64K)
Please expand this comment to mention that it only applies to ARM; for
now, on x86 the backend and the frontend always use the same page size.
Roger.
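A possible wording (only a sketch of the suggested comment):

/*
 * Xen always uses 4K pages. On ARM the guest may use a different page
 * size (4K or 64K); on x86 the frontend and the backend currently always
 * use the same 4K page size, so the two granularities match there.
 */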
On 20/08/15 at 02:40, Julien Grall wrote:
> Hi,
>
> Ping? I'm missing some reviews on block and netfront code.
Sorry, I didn't realize some of the patches were changed and the Ack
dropped. I think I've reviewed everything related to block.
Roger.
On 07/08/15 17:46, Julien Grall wrote:
> The Xen hypercall interface is always using 4K page granularity on ARM
> and x86 architecture.
>
> With the incoming support of 64K page granularity for ARM64 guest, it
> won't be possible to re-use the Linux page definition in Xen drivers.
>
> Introduce Xen page definition helpers based on the Linux page
> definition. They have exactly the same name but prefixed with
> XEN_/xen_ prefix.
>
> Also modify xen_page_to_gfn to use new Xen page definition.
[...]
> +#define xen_page_to_pfn(page) \
> + (((page_to_pfn(page)) << PAGE_SHIFT) >> XEN_PAGE_SHIFT)
Rename this to page_to_xen_pfn()
David
On 07/08/15 17:46, Julien Grall wrote:
> Currently, a grant is always based on the Xen page granularity (i.e
> 4KB). When Linux is using a different page granularity, a single page
> will be split between multiple grants.
>
> The new helpers will be in charge to split the Linux page into grants and
> call a function given by the caller on each grant.
>
> Also provide an helper to count the number of grants within a given
> contiguous region.
>
> Note that the x86/include/asm/xen/page.h is now including
> xen/interface/grant_table.h rather than xen/grant_table.h. It's
> necessary because xen/grant_table.h depends on asm/xen/page.h and will
> break the compilation. Furthermore, only definition in
> interface/grant_table.h was required.
Reviewed-by: David Vrabel <[email protected]>
But...
> +/* Helper to get to call fn only on the first "grant chunk" */
> +static inline void gnttab_one_grant(struct page *page, unsigned int offset,
> + unsigned len, xen_grant_fn_t fn,
> + void *data)
...call this gnttab_for_one_grant().
David
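A sketch of what such an iterator boils down to, based only on the
description above (the exact signatures and names in the series may
differ; assumes <linux/kernel.h> for min_t() and the Xen page
definitions shown earlier):

    /* Callback invoked once per 4KB grant chunk of the Linux page. */
    typedef void (*xen_grant_fn_t)(unsigned long gfn, unsigned int offset,
                                   unsigned int len, void *data);

    /* Walk [offset, offset + len) within one Linux page in XEN_PAGE_SIZE
     * chunks and hand each chunk to fn(). With 4KB Linux pages this is a
     * single iteration; with 64KB pages it is up to 16. */
    static inline void gnttab_foreach_grant_in_range(struct page *page,
                                                     unsigned int offset,
                                                     unsigned int len,
                                                     xen_grant_fn_t fn,
                                                     void *data)
    {
            unsigned long xen_pfn = xen_page_to_pfn(page) +
                                    (offset >> XEN_PAGE_SHIFT);
            unsigned int goffset = offset & (XEN_PAGE_SIZE - 1);

            while (len) {
                    unsigned int glen = min_t(unsigned int,
                                              XEN_PAGE_SIZE - goffset, len);

                    /* pfn_to_gfn() translates the Xen pfn into a guest
                     * frame number before handing it to the caller. */
                    fn(pfn_to_gfn(xen_pfn), goffset, glen, data);

                    goffset = 0;
                    xen_pfn++;
                    len -= glen;
            }
    }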
On 07/08/15 17:46, Julien Grall wrote:
> The console ring is always based on the page granularity of Xen.
[...]
> --- a/drivers/tty/hvc/hvc_xen.c
> +++ b/drivers/tty/hvc/hvc_xen.c
> @@ -230,7 +230,7 @@ static int xen_hvm_console_init(void)
> if (r < 0 || v == 0)
> goto err;
> gfn = v;
> - info->intf = xen_remap(gfn << PAGE_SHIFT, PAGE_SIZE);
> + info->intf = xen_remap(gfn << XEN_PAGE_SHIFT, PAGE_SIZE);
You need XEN_PAGE_SIZE here I think...
> if (info->intf == NULL)
> goto err;
> info->vtermno = HVC_COOKIE;
> @@ -472,7 +472,7 @@ static int xencons_resume(struct xenbus_device *dev)
> struct xencons_info *info = dev_get_drvdata(&dev->dev);
>
> xencons_disconnect_backend(info);
> - memset(info->intf, 0, PAGE_SIZE);
> + memset(info->intf, 0, XEN_PAGE_SIZE);
...particularly since you use it here.
David
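In other words, the remap in the first hunk would presumably become
something like:

    -	info->intf = xen_remap(gfn << XEN_PAGE_SHIFT, PAGE_SIZE);
    +	info->intf = xen_remap(gfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE);

since the console interface is a single 4KB Xen page, matching the
XEN_PAGE_SIZE already used in the memset() below.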
On 07/08/15 17:46, Julien Grall wrote:
> For ARM64 guests, Linux is able to support either 64K or 4K page
> granularity. Although, the hypercall interface is always based on 4K
> page granularity.
>
> With 64K page granularity, a single page will be spread over multiple
> Xen frame.
>
> To avoid splitting the page into 4K frame, take advantage of the
> extent_order field to directly allocate/free chunk of the Linux page
> size.
>
> Note that PVMMU is only used for PV guest (which is x86) and the page
> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
> that because the code has not been modified.
[...]
> #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> + * different page granularity.
> + */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
You don't need this BUILD_BUG_ON() twice.
Otherwise,
Reviewed-by: David Vrabel <[email protected]>
David
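For reference, a sketch of the extent_order idea described above
(illustrative values, not the exact balloon.c hunk):

    /* One extent of Linux-page size per balloon operation: order 0 when
     * PAGE_SIZE == XEN_PAGE_SIZE (4KB), order 4 (i.e. a 64KB extent)
     * when Linux uses 64KB pages. */
    #define EXTENT_ORDER    (PAGE_SHIFT - XEN_PAGE_SHIFT)

    struct xen_memory_reservation reservation = {
            .address_bits = 0,
            .extent_order = EXTENT_ORDER,
            .domid        = DOMID_SELF,
    };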
On 07/08/15 17:46, Julien Grall wrote:
> The PV network protocol is using 4KB page granularity. The goal of this
> patch is to allow a Linux using 64KB page granularity using network
> device on a non-modified Xen.
>
> It's only necessary to adapt the ring size and break skb data in small
> chunk of 4KB. The rest of the code is relying on the grant table code.
>
> Note that we allocate a Linux page for each rx skb but only the first
> 4KB is used. We may improve the memory usage by extending the size of
> the rx skb.
Reviewed-by: David Vrabel <[email protected]>
David
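A sketch of the ring-size part mentioned above (assuming the standard
__CONST_RING_SIZE macro; the actual patch may differ):

    /* The shared tx/rx rings live in 4KB Xen pages, so their entry
     * counts must be computed from XEN_PAGE_SIZE, not PAGE_SIZE. */
    #define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
    #define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)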
On 07/08/15 17:46, Julien Grall wrote:
> The hypercall interface (as well as the toolstack) is always using 4KB
> page granularity. When the toolstack is asking for mapping a series of
> guest PFN in a batch, it expects to have the page map contiguously in
> its virtual memory.
>
> When Linux is using 64KB page granularity, the privcmd driver will have
> to map multiple Xen PFN in a single Linux page.
>
> Note that this solution works on page granularity which is a multiple of
> 4KB.
Reviewed-by: David Vrabel <[email protected]>
David
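A sketch of the bookkeeping this implies, under the assumption above that
PAGE_SIZE is a multiple of 4KB (the helper name is made up for
illustration; DIV_ROUND_UP comes from <linux/kernel.h>):

    /* Number of 4KB Xen frames that fit in one Linux page: 1 with 4KB
     * pages, 16 with 64KB pages. */
    #define XEN_PFN_PER_PAGE    (PAGE_SIZE / XEN_PAGE_SIZE)

    /* Linux pages needed to back a toolstack batch of nr Xen frames. */
    static inline unsigned int privcmd_nr_pages(unsigned int nr_xen_frames)
    {
            return DIV_ROUND_UP(nr_xen_frames, XEN_PFN_PER_PAGE);
    }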
On 20/08/15 01:40, Julien Grall wrote:
> Hi,
>
> Ping? I'm missing some reviews on block and netfront code.
>
> We'd like to see this series going in Linux 4.3. Some distributions
> plans to use this version for aarch64 support. If we miss it, we won't
> have any Xen guests support, even it's minimal, for Linux using 64KB
> page granularity.
This is too late for Linux 4.3.
David
On 20/08/2015 03:11, David Vrabel wrote:
> On 20/08/15 01:40, Julien Grall wrote:
>> Hi,
>>
>> Ping? I'm missing some reviews on block and netfront code.
>>
>> We'd like to see this series going in Linux 4.3. Some distributions
>> plans to use this version for aarch64 support. If we miss it, we won't
>> have any Xen guests support, even it's minimal, for Linux using 64KB
>> page granularity.
>
> This is too late for Linux 4.3.
May I know why it's too late for Linux 4.3? AFAICT, Linux 4.2 is not yet
out and therefore the merge window for Linux 4.3 is not yet opened.
Regards,
--
Julien Grall
On 20/08/15 16:03, Julien Grall wrote:
>
>
> On 20/08/2015 03:11, David Vrabel wrote:
>> On 20/08/15 01:40, Julien Grall wrote:
>>> Hi,
>>>
>>> Ping? I'm missing some reviews on block and netfront code.
>>>
>>> We'd like to see this series going in Linux 4.3. Some distributions
>>> plans to use this version for aarch64 support. If we miss it, we won't
>>> have any Xen guests support, even it's minimal, for Linux using 64KB
>>> page granularity.
>>
>> This is too late for Linux 4.3.
>
> May I know why it's too late for Linux 4.3? AFAICT, Linux 4.2 is not yet
> out and therefore the merge window for Linux 4.3 is not yet opened.
Because:
a) There's nothing suitable to apply yet (need a v4 series)
b) The merge window opens this weekend (probably).
c) What's currently queued for 4.3 has been tested and no further tests
can be run until next week (when Boris or Konrad get back).
FWIW, "mm, xen/balloon: memory hotplug improvements" has also not been
queued for 4.3 for similar reasons (despite this series being ready 4
days ago and it having been more extensively tested inside XenServer).
David
Hi David,
On 20/08/15 10:51, David Vrabel wrote:
> On 07/08/15 17:46, Julien Grall wrote:
>> Currently, a grant is always based on the Xen page granularity (i.e
>> 4KB). When Linux is using a different page granularity, a single page
>> will be split between multiple grants.
>>
>> The new helpers will be in charge to split the Linux page into grants and
>> call a function given by the caller on each grant.
>>
>> Also provide an helper to count the number of grants within a given
>> contiguous region.
>>
>> Note that the x86/include/asm/xen/page.h is now including
>> xen/interface/grant_table.h rather than xen/grant_table.h. It's
>> necessary because xen/grant_table.h depends on asm/xen/page.h and will
>> break the compilation. Furthermore, only definition in
>> interface/grant_table.h was required.
>
> Reviewed-by: David Vrabel <[email protected]>
> But...
>
>> +/* Helper to get to call fn only on the first "grant chunk" */
>> +static inline void gnttab_one_grant(struct page *page, unsigned int offset,
>> + unsigned len, xen_grant_fn_t fn,
>> + void *data)
>
> ...call this gnttab_for_one_grant().
Will rename it on the next version.
Regards,
--
Julien Grall
Hi David,
On 20/08/15 10:55, David Vrabel wrote:
> On 07/08/15 17:46, Julien Grall wrote:
>> The console ring is always based on the page granularity of Xen.
> [...]
>> --- a/drivers/tty/hvc/hvc_xen.c
>> +++ b/drivers/tty/hvc/hvc_xen.c
>> @@ -230,7 +230,7 @@ static int xen_hvm_console_init(void)
>> if (r < 0 || v == 0)
>> goto err;
>> gfn = v;
>> - info->intf = xen_remap(gfn << PAGE_SHIFT, PAGE_SIZE);
>> + info->intf = xen_remap(gfn << XEN_PAGE_SHIFT, PAGE_SIZE);
>
> You need XEN_PAGE_SIZE here I think...
Right, I made that mistake while rebasing on my s/mfn/gfn/ series. I will
fix it in the next version.
Regards,
--
Julien Grall
On 20/08/15 10:59, David Vrabel wrote:
> On 07/08/15 17:46, Julien Grall wrote:
>> For ARM64 guests, Linux is able to support either 64K or 4K page
>> granularity. Although, the hypercall interface is always based on 4K
>> page granularity.
>>
>> With 64K page granularity, a single page will be spread over multiple
>> Xen frame.
>>
>> To avoid splitting the page into 4K frame, take advantage of the
>> extent_order field to directly allocate/free chunk of the Linux page
>> size.
>>
>> Note that PVMMU is only used for PV guest (which is x86) and the page
>> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
>> that because the code has not been modified.
> [...]
>> #ifdef CONFIG_XEN_HAVE_PVMMU
>> + /* We don't support PV MMU when Linux and Xen is using
>> + * different page granularity.
>> + */
>> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
>
> You don't need this BUILD_BUG_ON() twice.
I put the BUILD_BUG_ON in twice so that if we ever decide to drop one of
the #ifdef CONFIG_XEN_HAVE_PVMMU blocks, the check will still be present.
So I'd like to keep it.
> Otherwise,
>
> Reviewed-by: David Vrabel <[email protected]>
Based on the discussion with Stefano I have reworked the balloon code a
bit to re-use the lru field (see [1]), so I won't retain your Reviewed-by
on the next series.
Regards,
[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg00781.html
--
Julien Grall
On 20/08/15 09:10, Roger Pau Monné wrote:
> Hello,
Hi,
> I have some comments regarding the commit message, IMHO it would be good
> that a native English speaker reviews it too.
>
> On 07/08/15 at 18:46, Julien Grall wrote:
>> The PV block protocol is using 4KB page granularity. The goal of this
>> patch is to allow a Linux using 64KB page granularity using block
>> device on a non-modified Xen.
>>
>> The block API is using segment which should at least be the size of a
>> Linux page. Therefore, the driver will have to break the page in chunk
>> of 4K before giving the page to the backend.
>>
>> Breaking a 64KB segment in 4KB chunk will result to have some chunk with
>> no data.
>
> I would rewrite the above line as:
>
> When breaking a 64KB segment into 4KB chunks it is possible that some
Correct,
> chunks are empty.
Sounds good, I will replace it with that wording.
>> As the PV protocol always require to have data in the chunk, we
>> have to count the number of Xen page which will be in use and avoid to
> ^pages
>> sent empty chunk.
> ^and avoid sending empty chunks.
>>
>> Note that, a pre-defined number of grant is reserved before preparing
> ^grants are
>> the request. This pre-defined number is based on the number and the
>> maximum size of the segments. If each segment contain a very small
> ^contains
>> amount of data, the driver may reserve too much grant (16 grant is
> ^many grants ^grants are
>> reserved per segment with 64KB page granularity).
>>
>> Futhermore, in the case of persistent grant we allocate one Linux page
> ^grants
>> per grant although only the 4KB of the page will be effectively use.
> ^first ^in
>> This could be improved by share the page with multiple grants.
> ^sharing
>>
>> Signed-off-by: Julien Grall <[email protected]>
>
> LGTM:
>
> Acked-by: Roger Pau Monné <[email protected]>
Thank you, I will fix all the typos in the next version.
> Just one question.
[..]
>> + gnttab_foreach_grant_in_range(sg_page(sg),
>> + sg->offset,
>> + sg->length,
>> + blkif_setup_rw_req_grant,
>> + &setup);
>
> If I'm understanding this right, on x86 gnttab_foreach_grant_in_range is
> only going to perform one iteration, since XEN_PAGE_SIZE == PAGE_SIZE.
Correct, it will only perform one iteration for x86, and likewise for
arm32 and arm64 when 4KB pages are in use.
Regards,
--
Julien Grall
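To spell out the arithmetic behind the one-iteration observation and the
"16 grants reserved per segment" figure in the commit message (macro name
is illustrative):

    /* Worst-case number of 4KB grants needed for one block segment,
     * which is at most one Linux page: 1 grant with 4KB pages (hence a
     * single loop iteration), 16 grants with 64KB pages. */
    #define GRANTS_PER_SEGMENT  (PAGE_SIZE / XEN_PAGE_SIZE)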