2019-05-09 17:27:11

by Ankur Arora

Subject: [RFC PATCH 00/16] xenhost support

Hi all,

This is an RFC for xenhost support, outlined by Juergen here:
https://lkml.org/lkml/2019/4/8/67.

The high level idea is to provide an abstraction of the Xen
communication interface, as a xenhost_t.

xenhost_t exposes ops for communication between the guest and Xen
(hypercall, cpuid, shared_info/vcpu_info, evtchn, grant-table and, on top
of those, xenbus and ballooning), and these can differ based on the kind
of underlying Xen: regular, local, and nested.

(Since this abstraction is largely about guest -- xenhost communication,
no ops are needed for timer, clock, sched, memory (MMU, P2M), VCPU
management, etc.)
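As a rough illustration of the shape such an abstraction takes, here is a minimal userspace sketch of ops dispatch through a xenhost handle. All names here are hypothetical stand-ins; the real interface lives in include/xen/xenhost.h in these patches and is considerably richer:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: a xenhost bundles an ops table for guest <-> Xen communication. */
typedef struct xenhost xenhost_t;

typedef struct xenhost_ops {
	long (*hypercall)(xenhost_t *xh, unsigned int op);
} xenhost_ops_t;

struct xenhost {
	const char *name;		/* "r1" (regular), "r0" (local), "r2" (nested) */
	const xenhost_ops_t *ops;
};

/* Stand-in for a regular xenhost's hypercall (really a vmcall/syscall). */
static long r1_hypercall(xenhost_t *xh, unsigned int op)
{
	(void)xh;
	return (long)op;	/* placeholder: echo the op back */
}

static const xenhost_ops_t r1_ops = { .hypercall = r1_hypercall };

xenhost_t xh_default_sketch = { .name = "r1", .ops = &r1_ops };

/* Callers dispatch through the handle, never a global interface. */
static long xenhost_hypercall(xenhost_t *xh, unsigned int op)
{
	return xh->ops->hypercall(xh, op);
}
```

The point of the indirection is that two (or more) xenhost_t instances with different ops tables can coexist, which is exactly what the nested case below needs.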

Xenhost use-cases:

Regular-Xen: the standard Xen interface presented to a guest,
specifically for communication between Lx-guest and Lx-Xen.

Local-Xen: a Xen-like interface which runs in the same address space as
the guest (dom0). This can act as the default xenhost.

The major ways it differs from a regular Xen interface are in presenting
a different hypercall interface (a function call instead of a
syscall/vmcall), and in an inability to do grant-mappings: since
local-Xen exists in the same address space as the guest, there's no way
for it to cheaply change the physical page that a GFN maps to (assuming
no P2M tables.)
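To make the hypercall difference concrete, here is a hedged userspace sketch (all names hypothetical): for xenhost_r0 the "hypercall" resolves to an ordinary function call into hypervisor logic living in the same address space, whereas a regular xenhost would trap through the hypercall page:

```c
#include <assert.h>

/* Hypothetical: the "hypervisor" logic in the same address space (r0). */
static long local_xen_handle_op(unsigned int op)
{
	return 1000 + op;	/* placeholder for in-process Xen handling */
}

/* xenhost_r0: a hypercall is just a direct call, no vmcall/syscall trap. */
static long r0_hypercall(unsigned int op)
{
	return local_xen_handle_op(op);
}

/* xenhost_r1 would instead emit a vmcall via the hypercall page; that
 * cannot be modeled in userspace, so this stub just signals "would trap". */
static long r1_hypercall_stub(unsigned int op)
{
	(void)op;
	return -1;
}
```

Since both variants sit behind the same function-pointer slot, callers never need to know which kind of Xen they are talking to.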

Nested-Xen: this channel is to Xen, one level removed: from L1-guest to
L0-Xen. The use case is that we want L0-dom0 backends to talk to
L1-dom0 frontend drivers, which can then present PV devices that can
in turn be used by the L1-dom0 backend drivers as raw underlying devices.
The interfaces themselves remain broadly similar.

Note: L0-Xen, L1-Xen represent Xen running at that nesting level,
and L0-guest, L1-guest represent guests that are children of Xen
at that nesting level. Lx represents any level.

Patches 1-7,
"x86/xen: add xenhost_t interface"
"x86/xen: cpuid support in xenhost_t"
"x86/xen: make hypercall_page generic"
"x86/xen: hypercall support for xenhost_t"
"x86/xen: add feature support in xenhost_t"
"x86/xen: add shared_info support to xenhost_t"
"x86/xen: make vcpu_info part of xenhost_t"
abstract out interfaces that set up hypercalls/cpuid/shared_info/vcpu_info etc.

Patch 8, "x86/xen: irq/upcall handling with multiple xenhosts"
sets up the upcall and pv_irq ops based on vcpu_info.

Patch 9, "xen/evtchn: support evtchn in xenhost_t" adds xenhost based
evtchn support for evtchn_2l.

Patches 10 and 16, "xen/balloon: support ballooning in xenhost_t" and
"xen/grant-table: host_addr fixup in mapping on xenhost_r0"
implement support for GNTTABOP_map_grant_ref on xenhosts of type
xenhost_r0 (local xenhost.)

Patch 12, "xen/xenbus: support xenbus frontend/backend with xenhost_t"
reworks xenbus so that its frontend and backend can be bootstrapped
separately via separate xenhosts.

Remaining patches, 11, 13, 14, 15:
"xen/grant-table: make grant-table xenhost aware"
"drivers/xen: gnttab, evtchn, xenbus API changes"
"xen/blk: gnttab, evtchn, xenbus API changes"
"xen/net: gnttab, evtchn, xenbus API changes"
are mostly mechanical changes for APIs that now take xenhost_t *
as parameter.

The code itself is RFC quality, and is mostly meant to get feedback before
proceeding further. Also note that the FIFO event-channel logic and some
Xen drivers (input, pciback, scsi etc.) are mostly unchanged, so they
will not build.


Please take a look.

Thanks
Ankur


Ankur Arora (16):

x86/xen: add xenhost_t interface
x86/xen: cpuid support in xenhost_t
x86/xen: make hypercall_page generic
x86/xen: hypercall support for xenhost_t
x86/xen: add feature support in xenhost_t
x86/xen: add shared_info support to xenhost_t
x86/xen: make vcpu_info part of xenhost_t
x86/xen: irq/upcall handling with multiple xenhosts
xen/evtchn: support evtchn in xenhost_t
xen/balloon: support ballooning in xenhost_t
xen/grant-table: make grant-table xenhost aware
xen/xenbus: support xenbus frontend/backend with xenhost_t
drivers/xen: gnttab, evtchn, xenbus API changes
xen/blk: gnttab, evtchn, xenbus API changes
xen/net: gnttab, evtchn, xenbus API changes
xen/grant-table: host_addr fixup in mapping on xenhost_r0

arch/x86/include/asm/xen/hypercall.h | 239 +++++---
arch/x86/include/asm/xen/hypervisor.h | 3 +-
arch/x86/pci/xen.c | 18 +-
arch/x86/xen/Makefile | 3 +-
arch/x86/xen/enlighten.c | 101 ++--
arch/x86/xen/enlighten_hvm.c | 185 ++++--
arch/x86/xen/enlighten_pv.c | 144 ++++-
arch/x86/xen/enlighten_pvh.c | 25 +-
arch/x86/xen/grant-table.c | 71 ++-
arch/x86/xen/irq.c | 75 ++-
arch/x86/xen/mmu_pv.c | 6 +-
arch/x86/xen/p2m.c | 24 +-
arch/x86/xen/pci-swiotlb-xen.c | 1 +
arch/x86/xen/setup.c | 1 +
arch/x86/xen/smp.c | 25 +-
arch/x86/xen/smp_hvm.c | 17 +-
arch/x86/xen/smp_pv.c | 27 +-
arch/x86/xen/suspend_hvm.c | 6 +-
arch/x86/xen/suspend_pv.c | 14 +-
arch/x86/xen/time.c | 32 +-
arch/x86/xen/xen-asm_32.S | 2 +-
arch/x86/xen/xen-asm_64.S | 2 +-
arch/x86/xen/xen-head.S | 11 +-
arch/x86/xen/xen-ops.h | 8 +-
arch/x86/xen/xenhost.c | 102 ++++
drivers/block/xen-blkback/blkback.c | 56 +-
drivers/block/xen-blkback/common.h | 2 +-
drivers/block/xen-blkback/xenbus.c | 65 +--
drivers/block/xen-blkfront.c | 105 ++--
drivers/input/misc/xen-kbdfront.c | 2 +-
drivers/net/xen-netback/hash.c | 7 +-
drivers/net/xen-netback/interface.c | 15 +-
drivers/net/xen-netback/netback.c | 11 +-
drivers/net/xen-netback/rx.c | 3 +-
drivers/net/xen-netback/xenbus.c | 81 +--
drivers/net/xen-netfront.c | 122 ++--
drivers/pci/xen-pcifront.c | 6 +-
drivers/tty/hvc/hvc_xen.c | 2 +-
drivers/xen/acpi.c | 2 +
drivers/xen/balloon.c | 21 +-
drivers/xen/cpu_hotplug.c | 16 +-
drivers/xen/events/Makefile | 1 -
drivers/xen/events/events_2l.c | 198 +++----
drivers/xen/events/events_base.c | 381 +++++++------
drivers/xen/events/events_fifo.c | 4 +-
drivers/xen/events/events_internal.h | 78 +--
drivers/xen/evtchn.c | 24 +-
drivers/xen/fallback.c | 9 +-
drivers/xen/features.c | 33 +-
drivers/xen/gntalloc.c | 21 +-
drivers/xen/gntdev.c | 26 +-
drivers/xen/grant-table.c | 632 ++++++++++++---------
drivers/xen/manage.c | 37 +-
drivers/xen/mcelog.c | 2 +-
drivers/xen/pcpu.c | 2 +-
drivers/xen/platform-pci.c | 12 +-
drivers/xen/preempt.c | 1 +
drivers/xen/privcmd.c | 5 +-
drivers/xen/sys-hypervisor.c | 14 +-
drivers/xen/time.c | 4 +-
drivers/xen/xen-balloon.c | 16 +-
drivers/xen/xen-pciback/xenbus.c | 2 +-
drivers/xen/xen-scsiback.c | 5 +-
drivers/xen/xen-selfballoon.c | 2 +
drivers/xen/xenbus/xenbus.h | 45 +-
drivers/xen/xenbus/xenbus_client.c | 40 +-
drivers/xen/xenbus/xenbus_comms.c | 121 ++--
drivers/xen/xenbus/xenbus_dev_backend.c | 30 +-
drivers/xen/xenbus/xenbus_dev_frontend.c | 22 +-
drivers/xen/xenbus/xenbus_probe.c | 247 +++++---
drivers/xen/xenbus/xenbus_probe_backend.c | 20 +-
drivers/xen/xenbus/xenbus_probe_frontend.c | 66 ++-
drivers/xen/xenbus/xenbus_xs.c | 192 ++++---
drivers/xen/xenfs/xenstored.c | 7 +-
drivers/xen/xlate_mmu.c | 4 +-
include/xen/balloon.h | 4 +-
include/xen/events.h | 45 +-
include/xen/features.h | 17 +-
include/xen/grant_table.h | 83 +--
include/xen/xen-ops.h | 10 +-
include/xen/xen.h | 3 +
include/xen/xenbus.h | 54 +-
include/xen/xenhost.h | 302 ++++++++++
83 files changed, 2826 insertions(+), 1653 deletions(-)
create mode 100644 arch/x86/xen/xenhost.c
create mode 100644 include/xen/xenhost.h

--
2.20.1


2019-05-09 17:27:24

by Ankur Arora

Subject: [RFC PATCH 05/16] x86/xen: add feature support in xenhost_t

With nested xenhosts, both the xenhosts could have different supported
xen_features. Add support for probing both.

In addition, validate that features are compatible across xenhosts.

For runtime feature checking, the code uses xen_feature() with the
default xenhost. This should be good enough because we do feature
validation early, which guarantees that the features of interest are
compatible. Features not of interest are related to the MMU, clock,
pirq, etc., where the interface to L0-Xen should not matter.

Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/xen/enlighten_hvm.c | 15 +++++++++++----
arch/x86/xen/enlighten_pv.c | 14 ++++++++++----
drivers/xen/features.c | 33 +++++++++++++++++++++++++++------
include/xen/features.h | 17 ++++++++++++++---
include/xen/xenhost.h | 10 ++++++++++
5 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index f84941d6944e..a118b61a1a8a 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -119,17 +119,24 @@ static void __init init_hvm_pv_info(void)

xen_domain_type = XEN_HVM_DOMAIN;

- /* PVH set up hypercall page in xen_prepare_pvh(). */
if (xen_pvh_domain())
pv_info.name = "Xen PVH";
- else {
+ else
pv_info.name = "Xen HVM";

- for_each_xenhost(xh)
+ for_each_xenhost(xh) {
+ /* PVH set up hypercall page in xen_prepare_pvh(). */
+ if (!xen_pvh_domain())
xenhost_setup_hypercall_page(*xh);
+ xen_setup_features(*xh);
}

- xen_setup_features();
+ /*
+ * Check if features are compatible across L1-Xen and L0-Xen;
+ * If not, get rid of xenhost_r2.
+ */
+ if (xen_validate_features() == false)
+ __xenhost_unregister(xenhost_r2);

cpuid(base + 4, &eax, &ebx, &ecx, &edx);
if (eax & XEN_HVM_CPUID_VCPU_ID_PRESENT)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index a2c07cc71498..484968ff16a4 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1236,13 +1236,19 @@ asmlinkage __visible void __init xen_start_kernel(void)
if (xen_driver_domain() && xen_nested())
xenhost_register(xenhost_r2, &xh_pv_nested_ops);

- for_each_xenhost(xh)
- xenhost_setup_hypercall_page(*xh);
-
xen_domain_type = XEN_PV_DOMAIN;
xen_start_flags = xen_start_info->flags;

- xen_setup_features();
+ for_each_xenhost(xh) {
+ xenhost_setup_hypercall_page(*xh);
+ xen_setup_features(*xh);
+ }
+ /*
+ * Check if features are compatible across L1-Xen and L0-Xen;
+ * If not, get rid of xenhost_r2.
+ */
+ if (xen_validate_features() == false)
+ __xenhost_unregister(xenhost_r2);

/* Install Xen paravirt ops */
pv_info = xen_info;
diff --git a/drivers/xen/features.c b/drivers/xen/features.c
index d7d34fdfc993..b4fba808ebae 100644
--- a/drivers/xen/features.c
+++ b/drivers/xen/features.c
@@ -15,19 +15,40 @@
#include <xen/interface/version.h>
#include <xen/features.h>

-u8 xen_features[XENFEAT_NR_SUBMAPS * 32] __read_mostly;
-EXPORT_SYMBOL_GPL(xen_features);
-
-void xen_setup_features(void)
+void xen_setup_features(xenhost_t *xh)
{
struct xen_feature_info fi;
int i, j;

for (i = 0; i < XENFEAT_NR_SUBMAPS; i++) {
fi.submap_idx = i;
- if (HYPERVISOR_xen_version(XENVER_get_features, &fi) < 0)
+ if (hypervisor_xen_version(xh, XENVER_get_features, &fi) < 0)
break;
for (j = 0; j < 32; j++)
- xen_features[i * 32 + j] = !!(fi.submap & 1<<j);
+ xh->features[i * 32 + j] = !!(fi.submap & 1<<j);
}
}
+
+bool xen_validate_features(void)
+{
+ int fail = 0;
+
+ if (xh_default && xh_remote) {
+ /*
+ * Check xh_default->features and xh_remote->features for
+ * compatibility. Relevant features should be compatible
+ * or we are asking for trouble.
+ */
+ fail += __xen_feature(xh_default, XENFEAT_auto_translated_physmap) !=
+ __xen_feature(xh_remote, XENFEAT_auto_translated_physmap);
+
+ /* We would like callbacks via hvm_callback_vector. */
+ fail += __xen_feature(xh_default, XENFEAT_hvm_callback_vector) == 0;
+ fail += __xen_feature(xh_remote, XENFEAT_hvm_callback_vector) == 0;
+
+ if (fail)
+ return false;
+ }
+
+ return fail ? false : true;
+}
diff --git a/include/xen/features.h b/include/xen/features.h
index e4cb464386a9..63e6735ed6a3 100644
--- a/include/xen/features.h
+++ b/include/xen/features.h
@@ -11,14 +11,25 @@
#define __XEN_FEATURES_H__

#include <xen/interface/features.h>
+#include <xen/xenhost.h>

-void xen_setup_features(void);
+void xen_setup_features(xenhost_t *xh);

-extern u8 xen_features[XENFEAT_NR_SUBMAPS * 32];
+bool xen_validate_features(void);

+static inline int __xen_feature(xenhost_t *xh, int flag)
+{
+ return xh->features[flag];
+}
+
+/*
+ * We've validated the features that need to be common for both xenhost_r1 and
+ * xenhost_r2 (XENFEAT_hvm_callback_vector, XENFEAT_auto_translated_physmap.)
+ * Most of the other features should be only needed for the default xenhost.
+ */
static inline int xen_feature(int flag)
{
- return xen_features[flag];
+ return __xen_feature(xh_default, flag);
}

#endif /* __ASM_XEN_FEATURES_H__ */
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
index d9bc1fb6cce4..dd1e2b64f50d 100644
--- a/include/xen/xenhost.h
+++ b/include/xen/xenhost.h
@@ -4,6 +4,7 @@
#include <xen/interface/features.h>
#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
+
/*
* Xenhost abstracts out the Xen interface. It co-exists with the PV/HVM/PVH
* abstractions (x86_init, hypervisor_x86, pv_ops etc) and is meant to
@@ -72,6 +73,15 @@ typedef struct {
struct xenhost_ops *ops;

struct hypercall_entry *hypercall_page;
+
+ /*
+ * Not clear if we need to draw features from two different
+ * hypervisors. There is one feature that seems might be necessary:
+ * XENFEAT_hvm_callback_vector.
+ * Ensuring support in both L1-Xen and L0-Xen means that L0-Xen can
+ * bounce callbacks via L1-Xen.
+ */
+ u8 features[XENFEAT_NR_SUBMAPS * 32];
} xenhost_t;

typedef struct xenhost_ops {
--
2.20.1

2019-05-09 17:27:29

by Ankur Arora

Subject: [RFC PATCH 16/16] xen/grant-table: host_addr fixup in mapping on xenhost_r0

Xenhost type xenhost_r0 does not support standard GNTTABOP_map_grant_ref
semantics (map a gref onto a specified host_addr). That's because the
hypervisor is local (same address space as the caller of
GNTTABOP_map_grant_ref), so there is no external entity that could
map an arbitrary page underneath an arbitrary address.

To handle this, the GNTTABOP_map_grant_ref hypercall on xenhost_r0
treats the host_addr as an OUT parameter instead of IN and expects
gnttab_map_refs() and similar to fix up any state that caches the
value of host_addr from before the hypercall.

Accordingly, gnttab_map_refs() gains two parameters, a fixup function
and a pointer to cached maps to fix up:
int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
- struct page **pages, unsigned int count)
+ struct page **pages, gnttab_map_fixup_t map_fixup_fn,
+ void **map_fixup[], unsigned int count)

The reason we use a fixup function and not an additional mapping op
in the xenhost_t is that, depending on the caller, what we are fixing
might be different: blkback and netback, for instance, cache host_addr
via a struct page *, while __xenbus_map_ring() caches a phys_addr.
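The per-caller fixup pattern described above can be sketched in userspace as a callback applied to each entry once host_addr comes back as an OUT value. The types and names here are hypothetical simplifications; the real code operates on struct gnttab_map_grant_ref and real kernel addresses:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical simplified map op: host_addr is OUT on xenhost_r0. */
struct map_op_sketch {
	uint64_t host_addr;
};

typedef void (*map_fixup_fn_t)(uint64_t host_addr, void *cookie);

/*
 * After an r0-style map, host_addr tells the caller where the page
 * actually landed; each caller repairs its own cached state through
 * the callback, since what is cached differs per caller.
 */
static void map_refs_fixup(struct map_op_sketch *ops, void **cookies,
			   map_fixup_fn_t fn, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++)
		if (fn)
			fn(ops[i].host_addr, cookies[i]);
}

/* A blkback-like caller: in this sketch it just caches the raw address
 * (the real blkbk_map_fixup derives a struct page * via virt_to_page). */
static void cache_addr_fixup(uint64_t host_addr, void *cookie)
{
	*(uint64_t *)cookie = host_addr;
}
```

A __xenbus_map_ring()-style caller would supply a different callback that stores a phys_addr instead, which is precisely why the fixup is a per-call argument rather than a xenhost op.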

This patch fixes up xen-blkback and xen-gntdev drivers.

TODO:
- also rewrite gnttab_batch_map() and __xenbus_map_ring().
- modify xen-netback, scsiback, pciback etc

Co-developed-by: Joao Martins <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
---
drivers/block/xen-blkback/blkback.c | 14 +++++++++++++-
drivers/xen/gntdev.c | 2 +-
drivers/xen/grant-table.c | 20 ++++++++++++++------
include/xen/grant_table.h | 11 ++++++++++-
4 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index d366a17a4bd8..50ce40ba35e5 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -806,11 +806,18 @@ static void xen_blkbk_unmap(struct xen_blkif_ring *ring,
}
}

+static void blkbk_map_fixup(uint64_t host_addr, void **fixup)
+{
+ struct page **pg = (struct page **)fixup;
+ *pg = virt_to_page(host_addr);
+}
+
static int xen_blkbk_map(struct xen_blkif_ring *ring,
struct grant_page *pages[],
int num, bool ro)
{
struct gnttab_map_grant_ref map[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+ struct page **map_fixup[BLKIF_MAX_SEGMENTS_PER_REQUEST];
struct page *pages_to_gnt[BLKIF_MAX_SEGMENTS_PER_REQUEST];
struct persistent_gnt *persistent_gnt = NULL;
phys_addr_t addr = 0;
@@ -858,6 +865,9 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
gnttab_set_map_op(&map[segs_to_map++], addr,
flags, pages[i]->gref,
blkif->domid);
+
+ if (gnttab_map_fixup(dev->xh))
+ map_fixup[i] = &pages[i]->page;
}
map_until = i + 1;
if (segs_to_map == BLKIF_MAX_SEGMENTS_PER_REQUEST)
@@ -865,7 +875,9 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
}

if (segs_to_map) {
- ret = gnttab_map_refs(dev->xh, map, NULL, pages_to_gnt, segs_to_map);
+ ret = gnttab_map_refs(dev->xh, map, NULL, pages_to_gnt,
+ gnttab_map_fixup(dev->xh) ? blkbk_map_fixup : NULL,
+ (void ***) map_fixup, segs_to_map);
BUG_ON(ret);
}

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 40a42abe2dd0..32c6471834ba 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -342,7 +342,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)

pr_debug("map %d+%d\n", map->index, map->count);
err = gnttab_map_refs(xh, map->map_ops, use_ptemod ? map->kmap_ops : NULL,
- map->pages, map->count);
+ map->pages, NULL, NULL, map->count);
if (err)
return err;

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 959b81ade113..2f3a0a4a2660 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1084,7 +1084,8 @@ void gnttab_foreach_grant(struct page **pages,

int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
- struct page **pages, unsigned int count)
+ struct page **pages, gnttab_map_fixup_t map_fixup_fn,
+ void **map_fixup[], unsigned int count)
{
int i, ret;

@@ -1096,12 +1097,19 @@ int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref *map_ops,
switch (map_ops[i].status) {
case GNTST_okay:
{
- struct xen_page_foreign *foreign;
+ if (!gnttab_map_fixup(xh)) {
+ struct xen_page_foreign *foreign;

- SetPageForeign(pages[i]);
- foreign = xen_page_foreign(pages[i]);
- foreign->domid = map_ops[i].dom;
- foreign->gref = map_ops[i].ref;
+ SetPageForeign(pages[i]);
+ foreign = xen_page_foreign(pages[i]);
+ foreign->domid = map_ops[i].dom;
+ foreign->gref = map_ops[i].ref;
+ } else {
+ pages[i] = virt_to_page(map_ops[i].host_addr);
+
+ if (map_fixup_fn)
+ map_fixup_fn(map_ops[i].host_addr, map_fixup[i]);
+ }
break;
}

diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 827b790199fb..14f7cc70cd01 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -219,9 +219,18 @@ int gnttab_dma_free_pages(xenhost_t *xh, struct gnttab_dma_alloc_args *args);
int gnttab_pages_set_private(int nr_pages, struct page **pages);
void gnttab_pages_clear_private(int nr_pages, struct page **pages);

+static inline bool
+gnttab_map_fixup(xenhost_t *xh)
+{
+ return xh->type == xenhost_r0;
+}
+
+typedef void (*gnttab_map_fixup_t)(uint64_t host_addr, void **map_fixup);
+
int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
- struct page **pages, unsigned int count);
+ struct page **pages, gnttab_map_fixup_t map_fixup_fn,
+ void **map_fixup[], unsigned int count);
int gnttab_unmap_refs(xenhost_t *xh, struct gnttab_unmap_grant_ref *unmap_ops,
struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count);
--
2.20.1

2019-05-09 17:27:33

by Ankur Arora

Subject: [RFC PATCH 13/16] drivers/xen: gnttab, evtchn, xenbus API changes

Mechanical changes: most of these calls now take a xenhost_t *
as parameter.

Co-developed-by: Joao Martins <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
---
drivers/xen/cpu_hotplug.c | 14 ++++++-------
drivers/xen/gntalloc.c | 13 ++++++++----
drivers/xen/gntdev.c | 16 +++++++++++----
drivers/xen/manage.c | 37 ++++++++++++++++++-----------------
drivers/xen/platform-pci.c | 12 +++++++-----
drivers/xen/sys-hypervisor.c | 12 ++++++++----
drivers/xen/xen-balloon.c | 10 +++++++---
drivers/xen/xenfs/xenstored.c | 7 ++++---
8 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
index afeb94446d34..4a05bc028956 100644
--- a/drivers/xen/cpu_hotplug.c
+++ b/drivers/xen/cpu_hotplug.c
@@ -31,13 +31,13 @@ static void disable_hotplug_cpu(int cpu)
unlock_device_hotplug();
}

-static int vcpu_online(unsigned int cpu)
+static int vcpu_online(xenhost_t *xh, unsigned int cpu)
{
int err;
char dir[16], state[16];

sprintf(dir, "cpu/%u", cpu);
- err = xenbus_scanf(xh_default, XBT_NIL, dir, "availability", "%15s", state);
+ err = xenbus_scanf(xh, XBT_NIL, dir, "availability", "%15s", state);
if (err != 1) {
if (!xen_initial_domain())
pr_err("Unable to read cpu state\n");
@@ -52,12 +52,12 @@ static int vcpu_online(unsigned int cpu)
pr_err("unknown state(%s) on CPU%d\n", state, cpu);
return -EINVAL;
}
-static void vcpu_hotplug(unsigned int cpu)
+static void vcpu_hotplug(xenhost_t *xh, unsigned int cpu)
{
if (!cpu_possible(cpu))
return;

- switch (vcpu_online(cpu)) {
+ switch (vcpu_online(xh, cpu)) {
case 1:
enable_hotplug_cpu(cpu);
break;
@@ -78,7 +78,7 @@ static void handle_vcpu_hotplug_event(struct xenbus_watch *watch,
cpustr = strstr(path, "cpu/");
if (cpustr != NULL) {
sscanf(cpustr, "cpu/%u", &cpu);
- vcpu_hotplug(cpu);
+ vcpu_hotplug(watch->xh, cpu);
}
}

@@ -93,7 +93,7 @@ static int setup_cpu_watcher(struct notifier_block *notifier,
(void)register_xenbus_watch(xh_default, &cpu_watch);

for_each_possible_cpu(cpu) {
- if (vcpu_online(cpu) == 0) {
+ if (vcpu_online(cpu_watch.xh, cpu) == 0) {
(void)cpu_down(cpu);
set_cpu_present(cpu, false);
}
@@ -114,7 +114,7 @@ static int __init setup_vcpu_hotplug_event(void)
#endif
return -ENODEV;

- register_xenstore_notifier(&xsn_cpu);
+ register_xenstore_notifier(xh_default, &xsn_cpu);

return 0;
}
diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index e07823886fa8..a490e4e8c854 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -79,6 +79,8 @@ static LIST_HEAD(gref_list);
static DEFINE_MUTEX(gref_mutex);
static int gref_size;

+static xenhost_t *xh;
+
struct notify_info {
uint16_t pgoff:12; /* Bits 0-11: Offset of the byte to clear */
uint16_t flags:2; /* Bits 12-13: Unmap notification flags */
@@ -144,7 +146,7 @@ static int add_grefs(struct ioctl_gntalloc_alloc_gref *op,
}

/* Grant foreign access to the page. */
- rc = gnttab_grant_foreign_access(op->domid,
+ rc = gnttab_grant_foreign_access(xh, op->domid,
xen_page_to_gfn(gref->page),
readonly);
if (rc < 0)
@@ -196,13 +198,13 @@ static void __del_gref(struct gntalloc_gref *gref)
gref->notify.flags = 0;

if (gref->gref_id) {
- if (gnttab_query_foreign_access(gref->gref_id))
+ if (gnttab_query_foreign_access(xh, gref->gref_id))
return;

- if (!gnttab_end_foreign_access_ref(gref->gref_id, 0))
+ if (!gnttab_end_foreign_access_ref(xh, gref->gref_id, 0))
return;

- gnttab_free_grant_reference(gref->gref_id);
+ gnttab_free_grant_reference(xh, gref->gref_id);
}

gref_size--;
@@ -586,6 +588,9 @@ static int __init gntalloc_init(void)
if (!xen_domain())
return -ENODEV;

+ /* Limit to default xenhost for now. */
+ xh = xh_default;
+
err = misc_register(&gntalloc_miscdev);
if (err != 0) {
pr_err("Could not register misc gntalloc device\n");
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 0f0c951cd5b1..40a42abe2dd0 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -67,6 +67,8 @@ static atomic_t pages_mapped = ATOMIC_INIT(0);
static int use_ptemod;
#define populate_freeable_maps use_ptemod

+static xenhost_t *xh;
+
static int unmap_grant_pages(struct gntdev_grant_map *map,
int offset, int pages);

@@ -114,7 +116,7 @@ static void gntdev_free_map(struct gntdev_grant_map *map)
} else
#endif
if (map->pages)
- gnttab_free_pages(map->count, map->pages);
+ gnttab_free_pages(xh, map->count, map->pages);

#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
kfree(map->frames);
@@ -183,7 +185,7 @@ struct gntdev_grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
add->dma_bus_addr = args.dev_bus_addr;
} else
#endif
- if (gnttab_alloc_pages(count, add->pages))
+ if (gnttab_alloc_pages(xh, count, add->pages))
goto err;

for (i = 0; i < count; i++) {
@@ -339,7 +341,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
}

pr_debug("map %d+%d\n", map->index, map->count);
- err = gnttab_map_refs(map->map_ops, use_ptemod ? map->kmap_ops : NULL,
+ err = gnttab_map_refs(xh, map->map_ops, use_ptemod ? map->kmap_ops : NULL,
map->pages, map->count);
if (err)
return err;
@@ -385,6 +387,7 @@ static int __unmap_grant_pages(struct gntdev_grant_map *map, int offset,
unmap_data.kunmap_ops = use_ptemod ? map->kunmap_ops + offset : NULL;
unmap_data.pages = map->pages + offset;
unmap_data.count = pages;
+ unmap_data.xh = xh;

err = gnttab_unmap_refs_sync(&unmap_data);
if (err)
@@ -877,7 +880,7 @@ static int gntdev_copy(struct gntdev_copy_batch *batch)
{
unsigned int i;

- gnttab_batch_copy(batch->ops, batch->nr_ops);
+ gnttab_batch_copy(xh, batch->ops, batch->nr_ops);
gntdev_put_pages(batch);

/*
@@ -1210,8 +1213,13 @@ static int __init gntdev_init(void)
if (!xen_domain())
return -ENODEV;

+ /*
+ * Use for mappings grants related to the default xenhost.
+ */
+ xh = xh_default;
use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);

+
err = misc_register(&gntdev_miscdev);
if (err != 0) {
pr_err("Could not register gntdev device\n");
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 9a69d955dd5c..1655d0a039fd 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -227,14 +227,14 @@ static void shutdown_handler(struct xenbus_watch *watch,
return;

again:
- err = xenbus_transaction_start(xh_default, &xbt);
+ err = xenbus_transaction_start(watch->xh, &xbt);
if (err)
return;

- str = (char *)xenbus_read(xh_default, xbt, "control", "shutdown", NULL);
+ str = (char *)xenbus_read(watch->xh, xbt, "control", "shutdown", NULL);
/* Ignore read errors and empty reads. */
if (XENBUS_IS_ERR_READ(str)) {
- xenbus_transaction_end(xh_default, xbt, 1);
+ xenbus_transaction_end(watch->xh, xbt, 1);
return;
}

@@ -245,9 +245,9 @@ static void shutdown_handler(struct xenbus_watch *watch,

/* Only acknowledge commands which we are prepared to handle. */
if (idx < ARRAY_SIZE(shutdown_handlers))
- xenbus_write(xh_default, xbt, "control", "shutdown", "");
+ xenbus_write(watch->xh, xbt, "control", "shutdown", "");

- err = xenbus_transaction_end(xh_default, xbt, 0);
+ err = xenbus_transaction_end(watch->xh, xbt, 0);
if (err == -EAGAIN) {
kfree(str);
goto again;
@@ -272,10 +272,10 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path,
int err;

again:
- err = xenbus_transaction_start(xh_default, &xbt);
+ err = xenbus_transaction_start(watch->xh, &xbt);
if (err)
return;
- err = xenbus_scanf(xh_default, xbt, "control", "sysrq", "%c", &sysrq_key);
+ err = xenbus_scanf(watch->xh, xbt, "control", "sysrq", "%c", &sysrq_key);
if (err < 0) {
/*
* The Xenstore watch fires directly after registering it and
@@ -287,21 +287,21 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path,
if (err != -ENOENT && err != -ERANGE)
pr_err("Error %d reading sysrq code in control/sysrq\n",
err);
- xenbus_transaction_end(xh_default, xbt, 1);
+ xenbus_transaction_end(watch->xh, xbt, 1);
return;
}

if (sysrq_key != '\0') {
- err = xenbus_printf(xh_default, xbt, "control", "sysrq", "%c", '\0');
+ err = xenbus_printf(watch->xh, xbt, "control", "sysrq", "%c", '\0');
if (err) {
pr_err("%s: Error %d writing sysrq in control/sysrq\n",
__func__, err);
- xenbus_transaction_end(xh_default, xbt, 1);
+ xenbus_transaction_end(watch->xh, xbt, 1);
return;
}
}

- err = xenbus_transaction_end(xh_default, xbt, 0);
+ err = xenbus_transaction_end(watch->xh, xbt, 0);
if (err == -EAGAIN)
goto again;

@@ -324,14 +324,14 @@ static struct notifier_block xen_reboot_nb = {
.notifier_call = poweroff_nb,
};

-static int setup_shutdown_watcher(void)
+static int setup_shutdown_watcher(xenhost_t *xh)
{
int err;
int idx;
#define FEATURE_PATH_SIZE (SHUTDOWN_CMD_SIZE + sizeof("feature-"))
char node[FEATURE_PATH_SIZE];

- err = register_xenbus_watch(xh_default, &shutdown_watch);
+ err = register_xenbus_watch(xh, &shutdown_watch);
if (err) {
pr_err("Failed to set shutdown watcher\n");
return err;
@@ -339,7 +339,7 @@ static int setup_shutdown_watcher(void)


#ifdef CONFIG_MAGIC_SYSRQ
- err = register_xenbus_watch(xh_default, &sysrq_watch);
+ err = register_xenbus_watch(xh, &sysrq_watch);
if (err) {
pr_err("Failed to set sysrq watcher\n");
return err;
@@ -351,7 +351,7 @@ static int setup_shutdown_watcher(void)
continue;
snprintf(node, FEATURE_PATH_SIZE, "feature-%s",
shutdown_handlers[idx].command);
- err = xenbus_printf(xh_default, XBT_NIL, "control", node, "%u", 1);
+ err = xenbus_printf(xh, XBT_NIL, "control", node, "%u", 1);
if (err) {
pr_err("%s: Error %d writing %s\n", __func__,
err, node);
@@ -364,9 +364,9 @@ static int setup_shutdown_watcher(void)

static int shutdown_event(struct notifier_block *notifier,
unsigned long event,
- void *data)
+ void *xh)
{
- setup_shutdown_watcher();
+ setup_shutdown_watcher((xenhost_t *) xh);
return NOTIFY_DONE;
}

@@ -378,7 +378,8 @@ int xen_setup_shutdown_event(void)

if (!xen_domain())
return -ENODEV;
- register_xenstore_notifier(&xenstore_notifier);
+
+ register_xenstore_notifier(xh_default, &xenstore_notifier);
register_reboot_notifier(&xen_reboot_nb);

return 0;
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 5d7dcad0b0a0..8fdb01c4a610 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -154,18 +154,20 @@ static int platform_pci_probe(struct pci_dev *pdev,
}
}

- max_nr_gframes = gnttab_max_grant_frames();
+ max_nr_gframes = gnttab_max_grant_frames(xh_default);
grant_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
- ret = gnttab_setup_auto_xlat_frames(grant_frames);
+ ret = gnttab_setup_auto_xlat_frames(xh_default, grant_frames);
if (ret)
goto out;
- ret = gnttab_init();
+
+ /* HVM only, we don't need xh_remote */
+ ret = gnttab_init(xh_default);
if (ret)
goto grant_out;
- xenbus_probe(NULL);
+ __xenbus_probe(xh_default->xenstore_private);
return 0;
grant_out:
- gnttab_free_auto_xlat_frames();
+ gnttab_free_auto_xlat_frames(xh_default);
out:
pci_release_region(pdev, 0);
mem_out:
diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 005a898e7a23..d69c0790692c 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -14,6 +14,7 @@
#include <linux/err.h>

#include <xen/interface/xen.h>
+#include <xen/xenhost.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

@@ -141,15 +142,15 @@ static ssize_t uuid_show_fallback(struct hyp_sysfs_attr *attr, char *buffer)
{
char *vm, *val;
int ret;
- extern int xenstored_ready;

+ /* Disable for now: xenstored_ready is private to xenbus
if (!xenstored_ready)
- return -EBUSY;
+ return -EBUSY;*/

- vm = xenbus_read(XBT_NIL, "vm", "", NULL);
+ vm = xenbus_read(xh_default, XBT_NIL, "vm", "", NULL);
if (IS_ERR(vm))
return PTR_ERR(vm);
- val = xenbus_read(XBT_NIL, vm, "uuid", NULL);
+ val = xenbus_read(xh_default, XBT_NIL, vm, "uuid", NULL);
kfree(vm);
if (IS_ERR(val))
return PTR_ERR(val);
@@ -602,6 +603,9 @@ static struct kobj_type hyp_sysfs_kobj_type = {
.sysfs_ops = &hyp_sysfs_ops,
};

+/*
+ * For now, default xenhost only.
+ */
static int __init hypervisor_subsys_init(void)
{
if (!xen_domain())
diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
index d34d9b1af7a8..9d448dd7ff17 100644
--- a/drivers/xen/xen-balloon.c
+++ b/drivers/xen/xen-balloon.c
@@ -40,6 +40,7 @@

#include <xen/xen.h>
#include <xen/interface/xen.h>
+#include <xen/xenhost.h>
#include <xen/balloon.h>
#include <xen/xenbus.h>
#include <xen/features.h>
@@ -99,11 +100,11 @@ static struct xenbus_watch target_watch = {

static int balloon_init_watcher(struct notifier_block *notifier,
unsigned long event,
- void *data)
+ void *xh)
{
int err;

- err = register_xenbus_watch(xh_default, &target_watch);
+ err = register_xenbus_watch(xh, &target_watch);
if (err)
pr_err("Failed to set balloon watcher\n");

@@ -120,7 +121,10 @@ void xen_balloon_init(void)

register_xen_selfballooning(&balloon_dev);

- register_xenstore_notifier(&xenstore_notifier);
+ /*
+ * ballooning is only concerned with the default xenhost.
+ */
+ register_xenstore_notifier(xh_default, &xenstore_notifier);
}
EXPORT_SYMBOL_GPL(xen_balloon_init);

diff --git a/drivers/xen/xenfs/xenstored.c b/drivers/xen/xenfs/xenstored.c
index f59235f9f8a2..1d66974ae730 100644
--- a/drivers/xen/xenfs/xenstored.c
+++ b/drivers/xen/xenfs/xenstored.c
@@ -8,6 +8,7 @@
#include <xen/xenbus.h>

#include "xenfs.h"
+#include "../xenbus/xenbus.h" /* FIXME */

static ssize_t xsd_read(struct file *file, char __user *buf,
size_t size, loff_t *off)
@@ -25,7 +26,7 @@ static int xsd_release(struct inode *inode, struct file *file)
static int xsd_kva_open(struct inode *inode, struct file *file)
{
file->private_data = (void *)kasprintf(GFP_KERNEL, "0x%p",
- xen_store_interface);
+ xs_priv(xh_default)->store_interface);
if (!file->private_data)
return -ENOMEM;
return 0;
@@ -39,7 +40,7 @@ static int xsd_kva_mmap(struct file *file, struct vm_area_struct *vma)
return -EINVAL;

if (remap_pfn_range(vma, vma->vm_start,
- virt_to_pfn(xen_store_interface),
+ virt_to_pfn(xs_priv(xh_default)->store_interface),
size, vma->vm_page_prot))
return -EAGAIN;

@@ -56,7 +57,7 @@ const struct file_operations xsd_kva_file_ops = {
static int xsd_port_open(struct inode *inode, struct file *file)
{
file->private_data = (void *)kasprintf(GFP_KERNEL, "%d",
- xen_store_evtchn);
+ xs_priv(xh_default)->store_evtchn);
if (!file->private_data)
return -ENOMEM;
return 0;
--
2.20.1

2019-05-09 17:27:38

by Ankur Arora

Subject: [RFC PATCH 08/16] x86/xen: irq/upcall handling with multiple xenhosts

For configurations with multiple xenhosts, we need to handle events
generated from multiple xenhosts.

Having more than one upcall handler would be quite hairy, so it is
simpler to have the callback from L0-Xen bounced via L1-Xen. This also
simplifies the pv_irq_ops code, because the IF flag now maps directly
onto xh_default->vcpu_info->evtchn_upcall_mask.

However, we still update xh_remote->vcpu_info->evtchn_upcall_mask on a
best-effort basis, to minimize unnecessary work in the remote xenhost.

TODO:
- direct pv_ops.irq are disabled.

Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/xen/Makefile | 2 +-
arch/x86/xen/enlighten_pv.c | 4 ++-
arch/x86/xen/irq.c | 69 +++++++++++++++++++++++++++++--------
arch/x86/xen/smp_pv.c | 11 ++++++
4 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 564b4dddbc15..3c7056ad3520 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_XEN_PV) += enlighten_pv.o
obj-$(CONFIG_XEN_PV) += mmu_pv.o
obj-$(CONFIG_XEN_PV) += irq.o
obj-$(CONFIG_XEN_PV) += multicalls.o
-obj-$(CONFIG_XEN_PV) += xen-asm.o
+obj-n += xen-asm.o
obj-$(CONFIG_XEN_PV) += xen-asm_$(BITS).o

obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 5f6a1475ec0c..77b1a0d4aef2 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -996,8 +996,9 @@ void __init xen_setup_vcpu_info_placement(void)
* xen_vcpu_setup managed to place the vcpu_info within the
* percpu area for all cpus, so make use of it.
*/
+#if 0
+ /* Disable direct access for now. */
if (xen_have_vcpu_info_placement && false) {
- /* Disable direct access until we have proper pcpu data structures. */
pv_ops.irq.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
pv_ops.irq.restore_fl =
__PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
@@ -1007,6 +1008,7 @@ void __init xen_setup_vcpu_info_placement(void)
__PV_IS_CALLEE_SAVE(xen_irq_enable_direct);
pv_ops.mmu.read_cr2 = xen_read_cr2_direct;
}
+#endif
}

static const struct pv_info xen_info __initconst = {
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 38ad1a1c4763..f760a6abfb1e 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -19,9 +19,9 @@
* callback mask. We do this in a very simple manner, by making a call
* down into Xen. The pending flag will be checked by Xen on return.
*/
-void xen_force_evtchn_callback(void)
+void xen_force_evtchn_callback(xenhost_t *xh)
{
- (void)HYPERVISOR_xen_version(0, NULL);
+ (void)hypervisor_xen_version(xh, 0, NULL);
}

asmlinkage __visible unsigned long xen_save_fl(void)
@@ -29,6 +29,21 @@ asmlinkage __visible unsigned long xen_save_fl(void)
struct vcpu_info *vcpu;
unsigned long flags;

+ /*
+ * In scenarios with more than one xenhost, the primary xenhost
+ * is responsible for all the upcalls, with the remote xenhost
+ * bouncing its upcalls through it (see comment in
+ * cpu_initialize_context().)
+ *
+ * To minimize unnecessary upcalls, the remote xenhost still looks at
+ * the value of vcpu_info->evtchn_upcall_mask, so we still set and reset
+ * that.
+ *
+ * The fact that the upcall itself is gated by the default xenhost,
+ * also helps in simplifying the logic here because we don't have to
+ * worry about guaranteeing atomicity with updates to
+ * xh_remote->vcpu_info->evtchn_upcall_mask.
+ */
vcpu = xh_default->xen_vcpu[smp_processor_id()];

/* flag has opposite sense of mask */
@@ -38,26 +53,34 @@ asmlinkage __visible unsigned long xen_save_fl(void)
-0 -> 0x00000000
-1 -> 0xffffffff
*/
- return (-flags) & X86_EFLAGS_IF;
+ return ((-flags) & X86_EFLAGS_IF);
}
PV_CALLEE_SAVE_REGS_THUNK(xen_save_fl);

__visible void xen_restore_fl(unsigned long flags)
{
struct vcpu_info *vcpu;
+ xenhost_t **xh;

/* convert from IF type flag */
flags = !(flags & X86_EFLAGS_IF);

/* See xen_irq_enable() for why preemption must be disabled. */
preempt_disable();
- vcpu = xh_default->xen_vcpu[smp_processor_id()];
- vcpu->evtchn_upcall_mask = flags;
+ for_each_xenhost(xh) {
+ vcpu = (*xh)->xen_vcpu[smp_processor_id()];
+ vcpu->evtchn_upcall_mask = flags;
+ }

if (flags == 0) {
barrier(); /* unmask then check (avoid races) */
- if (unlikely(vcpu->evtchn_upcall_pending))
- xen_force_evtchn_callback();
+ for_each_xenhost(xh) {
+ /* Preemption is disabled so we should not have
+ * gotten moved to a different VCPU. */
+ vcpu = (*xh)->xen_vcpu[smp_processor_id()];
+ if (unlikely(vcpu->evtchn_upcall_pending))
+ xen_force_evtchn_callback(*xh);
+ }
preempt_enable();
} else
preempt_enable_no_resched();
@@ -66,11 +89,19 @@ PV_CALLEE_SAVE_REGS_THUNK(xen_restore_fl);

asmlinkage __visible void xen_irq_disable(void)
{
+ xenhost_t **xh;
+
/* There's a one instruction preempt window here. We need to
make sure we're don't switch CPUs between getting the vcpu
pointer and updating the mask. */
preempt_disable();
- xh_default->xen_vcpu[smp_processor_id()]->evtchn_upcall_mask = 1;
+ for_each_xenhost(xh)
+ /*
+ * Mask events on this CPU for both the xenhosts. As the
+ * comment above mentions, disabling preemption means we
+ * can safely do that.
+ */
+ (*xh)->xen_vcpu[smp_processor_id()]->evtchn_upcall_mask = 1;
preempt_enable_no_resched();
}
PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable);
@@ -78,6 +109,7 @@ PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable);
asmlinkage __visible void xen_irq_enable(void)
{
struct vcpu_info *vcpu;
+ xenhost_t **xh;

/*
* We may be preempted as soon as vcpu->evtchn_upcall_mask is
@@ -86,16 +118,25 @@ asmlinkage __visible void xen_irq_enable(void)
*/
preempt_disable();

- vcpu = xh_default->xen_vcpu[smp_processor_id()];
- vcpu->evtchn_upcall_mask = 0;
+ /* Given that the interrupts are generated from the default xenhost,
+ * we should do this in reverse order.
+ */
+ for_each_xenhost(xh) {
+ vcpu = (*xh)->xen_vcpu[smp_processor_id()];
+ vcpu->evtchn_upcall_mask = 0;

- /* Doesn't matter if we get preempted here, because any
- pending event will get dealt with anyway. */
+ /* We could get preempted by an incoming interrupt here with a
+ * half enabled irq (for the first xenhost.)
+ */
+ }

barrier(); /* unmask then check (avoid races) */
- if (unlikely(vcpu->evtchn_upcall_pending))
- xen_force_evtchn_callback();

+ for_each_xenhost(xh) {
+ vcpu = (*xh)->xen_vcpu[smp_processor_id()];
+ if (unlikely(vcpu->evtchn_upcall_pending))
+ xen_force_evtchn_callback(*xh);
+ }
preempt_enable();
}
PV_CALLEE_SAVE_REGS_THUNK(xen_irq_enable);
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 6d9c3e6611ef..f4ea9eac8b6a 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -343,6 +343,17 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
#else
ctxt->gs_base_kernel = per_cpu_offset(cpu);
#endif
+ /*
+ * We setup an upcall handler only for the default xenhost. The remote
+ * xenhost will generate evtchn events, but an additional callback would be
+ * quite hairy, since we would have VCPU state initialised in multiple
+ * hypervisors and issues like re-entrancy of upcalls.
+ *
+ * It would be simpler if the callback from L0-Xen could be bounced
+ * via L1-Xen. This also simplifies the pv_irq_ops code
+ * because now the CPU's IF processing only needs to happen on
+ * xh_default->vcpu_info.
+ */
ctxt->event_callback_eip =
(unsigned long)xen_hypervisor_callback;
ctxt->failsafe_callback_eip =
--
2.20.1

2019-05-09 17:27:39

by Ankur Arora

Subject: [RFC PATCH 04/16] x86/xen: hypercall support for xenhost_t

Allow for different hypercall implementations for different xenhost types.
Nested xenhost, which has two underlying xenhosts, can use both
simultaneously.

The hypercall macros (HYPERVISOR_*) implicitly use the default xenhost.
A new macro (hypervisor_*) takes xenhost_t * as a parameter and does the
right thing.

TODO:
- Multicalls for now assume the default xenhost
- xen_hypercall_* symbols are only generated for the default xenhost.

Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/include/asm/xen/hypercall.h | 233 ++++++++++++++++++---------
arch/x86/xen/enlighten.c | 3 -
arch/x86/xen/enlighten_hvm.c | 23 ++-
arch/x86/xen/enlighten_pv.c | 13 +-
arch/x86/xen/enlighten_pvh.c | 9 +-
arch/x86/xen/xen-head.S | 3 +
drivers/xen/fallback.c | 8 +-
include/xen/xenhost.h | 23 +++
8 files changed, 218 insertions(+), 97 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 1a3cd6680e6f..e138f9c36a5a 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -51,6 +51,7 @@
#include <xen/interface/physdev.h>
#include <xen/interface/platform.h>
#include <xen/interface/xen-mca.h>
+#include <xen/xenhost.h>

struct xen_dm_op_buf;

@@ -88,11 +89,11 @@ struct xen_dm_op_buf;

struct hypercall_entry { char _entry[32]; };
extern struct hypercall_entry xen_hypercall_page[128];
-extern struct hypercall_entry *hypercall_page;
+extern struct hypercall_entry xen_hypercall_page2[128];

#define __HYPERCALL CALL_NOSPEC
-#define __HYPERCALL_ENTRY(x) \
- [thunk_target] "0" (hypercall_page + __HYPERVISOR_##x)
+#define __HYPERCALL_ENTRY(xh, x) \
+ [thunk_target] "0" (xh->hypercall_page + __HYPERVISOR_##x)

#ifdef CONFIG_X86_32
#define __HYPERCALL_RETREG "eax"
@@ -144,57 +145,57 @@ extern struct hypercall_entry *hypercall_page;
#define __HYPERCALL_CLOBBER1 __HYPERCALL_CLOBBER2, __HYPERCALL_ARG2REG
#define __HYPERCALL_CLOBBER0 __HYPERCALL_CLOBBER1, __HYPERCALL_ARG1REG

-#define _hypercall0(type, name) \
+#define _hypercall0(xh, type, name) \
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_0ARG(); \
asm volatile (__HYPERCALL \
: __HYPERCALL_0PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(xh, name) \
: __HYPERCALL_CLOBBER0); \
(type)__res; \
})

-#define _hypercall1(type, name, a1) \
+#define _hypercall1(xh, type, name, a1) \
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_1ARG(a1); \
asm volatile (__HYPERCALL \
: __HYPERCALL_1PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(xh, name) \
: __HYPERCALL_CLOBBER1); \
(type)__res; \
})

-#define _hypercall2(type, name, a1, a2) \
+#define _hypercall2(xh, type, name, a1, a2) \
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_2ARG(a1, a2); \
asm volatile (__HYPERCALL \
: __HYPERCALL_2PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(xh, name) \
: __HYPERCALL_CLOBBER2); \
(type)__res; \
})

-#define _hypercall3(type, name, a1, a2, a3) \
+#define _hypercall3(xh, type, name, a1, a2, a3) \
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_3ARG(a1, a2, a3); \
asm volatile (__HYPERCALL \
: __HYPERCALL_3PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(xh, name) \
: __HYPERCALL_CLOBBER3); \
(type)__res; \
})

-#define _hypercall4(type, name, a1, a2, a3, a4) \
+#define _hypercall4(xh, type, name, a1, a2, a3, a4) \
({ \
__HYPERCALL_DECLS; \
__HYPERCALL_4ARG(a1, a2, a3, a4); \
asm volatile (__HYPERCALL \
: __HYPERCALL_4PARAM \
- : __HYPERCALL_ENTRY(name) \
+ : __HYPERCALL_ENTRY(xh, name) \
: __HYPERCALL_CLOBBER4); \
(type)__res; \
})
@@ -210,7 +211,7 @@ xen_single_call(unsigned int call,

asm volatile(CALL_NOSPEC
: __HYPERCALL_5PARAM
- : [thunk_target] "0" (hypercall_page + call)
+ : [thunk_target] "0" (xh_default->hypercall_page + call)
: __HYPERCALL_CLOBBER5);

return (long)__res;
@@ -232,170 +233,235 @@ privcmd_call(unsigned int call,
}

static inline int
-HYPERVISOR_set_trap_table(struct trap_info *table)
+hypervisor_set_trap_table(xenhost_t *xh, struct trap_info *table)
{
- return _hypercall1(int, set_trap_table, table);
+ return _hypercall1(xh, int, set_trap_table, table);
}

+#define HYPERVISOR_set_trap_table(table) \
+ hypervisor_set_trap_table(xh_default, table)
+
static inline int
-HYPERVISOR_mmu_update(struct mmu_update *req, int count,
+hypervisor_mmu_update(xenhost_t *xh, struct mmu_update *req, int count,
int *success_count, domid_t domid)
{
- return _hypercall4(int, mmu_update, req, count, success_count, domid);
+ return _hypercall4(xh, int, mmu_update, req, count, success_count, domid);
}
+#define HYPERVISOR_mmu_update(req, count, success_count, domid) \
+ hypervisor_mmu_update(xh_default, req, count, success_count, domid)

static inline int
-HYPERVISOR_mmuext_op(struct mmuext_op *op, int count,
+hypervisor_mmuext_op(xenhost_t *xh, struct mmuext_op *op, int count,
int *success_count, domid_t domid)
{
- return _hypercall4(int, mmuext_op, op, count, success_count, domid);
+ return _hypercall4(xh, int, mmuext_op, op, count, success_count, domid);
}

+#define HYPERVISOR_mmuext_op(op, count, success_count, domid) \
+ hypervisor_mmuext_op(xh_default, op, count, success_count, domid)
+
static inline int
-HYPERVISOR_set_gdt(unsigned long *frame_list, int entries)
+hypervisor_set_gdt(xenhost_t *xh, unsigned long *frame_list, int entries)
{
- return _hypercall2(int, set_gdt, frame_list, entries);
+ return _hypercall2(xh, int, set_gdt, frame_list, entries);
}

+#define HYPERVISOR_set_gdt(frame_list, entries) \
+ hypervisor_set_gdt(xh_default, frame_list, entries)
+
static inline int
-HYPERVISOR_callback_op(int cmd, void *arg)
+hypervisor_callback_op(xenhost_t *xh, int cmd, void *arg)
{
- return _hypercall2(int, callback_op, cmd, arg);
+ return _hypercall2(xh, int, callback_op, cmd, arg);
}

+#define HYPERVISOR_callback_op(cmd, arg) \
+ hypervisor_callback_op(xh_default, cmd, arg)
+
static inline int
-HYPERVISOR_sched_op(int cmd, void *arg)
+hypervisor_sched_op(xenhost_t *xh, int cmd, void *arg)
{
- return _hypercall2(int, sched_op, cmd, arg);
+ return _hypercall2(xh, int, sched_op, cmd, arg);
}

+#define HYPERVISOR_sched_op(cmd, arg) \
+ hypervisor_sched_op(xh_default, cmd, arg)
+
static inline long
-HYPERVISOR_set_timer_op(u64 timeout)
+hypervisor_set_timer_op(xenhost_t *xh, u64 timeout)
{
unsigned long timeout_hi = (unsigned long)(timeout>>32);
unsigned long timeout_lo = (unsigned long)timeout;
- return _hypercall2(long, set_timer_op, timeout_lo, timeout_hi);
+ return _hypercall2(xh, long, set_timer_op, timeout_lo, timeout_hi);
}

+#define HYPERVISOR_set_timer_op(timeout) \
+ hypervisor_set_timer_op(xh_default, timeout)
+
static inline int
-HYPERVISOR_mca(struct xen_mc *mc_op)
+hypervisor_mca(xenhost_t *xh, struct xen_mc *mc_op)
{
mc_op->interface_version = XEN_MCA_INTERFACE_VERSION;
- return _hypercall1(int, mca, mc_op);
+ return _hypercall1(xh, int, mca, mc_op);
}

+#define HYPERVISOR_mca(mc_op) \
+ hypervisor_mca(xh_default, mc_op)
+
static inline int
-HYPERVISOR_platform_op(struct xen_platform_op *op)
+hypervisor_platform_op(xenhost_t *xh, struct xen_platform_op *op)
{
op->interface_version = XENPF_INTERFACE_VERSION;
- return _hypercall1(int, platform_op, op);
+ return _hypercall1(xh, int, platform_op, op);
}

+#define HYPERVISOR_platform_op(op) \
+ hypervisor_platform_op(xh_default, op)
+
static inline int
-HYPERVISOR_set_debugreg(int reg, unsigned long value)
+hypervisor_set_debugreg(xenhost_t *xh, int reg, unsigned long value)
{
- return _hypercall2(int, set_debugreg, reg, value);
+ return _hypercall2(xh, int, set_debugreg, reg, value);
}

+#define HYPERVISOR_set_debugreg(reg, value) \
+ hypervisor_set_debugreg(xh_default, reg, value)
+
static inline unsigned long
-HYPERVISOR_get_debugreg(int reg)
+hypervisor_get_debugreg(xenhost_t *xh, int reg)
{
- return _hypercall1(unsigned long, get_debugreg, reg);
+ return _hypercall1(xh, unsigned long, get_debugreg, reg);
}
+#define HYPERVISOR_get_debugreg(reg) \
+ hypervisor_get_debugreg(xh_default, reg)

static inline int
-HYPERVISOR_update_descriptor(u64 ma, u64 desc)
+hypervisor_update_descriptor(xenhost_t *xh, u64 ma, u64 desc)
{
if (sizeof(u64) == sizeof(long))
- return _hypercall2(int, update_descriptor, ma, desc);
- return _hypercall4(int, update_descriptor, ma, ma>>32, desc, desc>>32);
+ return _hypercall2(xh, int, update_descriptor, ma, desc);
+ return _hypercall4(xh, int, update_descriptor, ma, ma>>32, desc, desc>>32);
}

+#define HYPERVISOR_update_descriptor(ma, desc) \
+ hypervisor_update_descriptor(xh_default, ma, desc)
+
static inline long
-HYPERVISOR_memory_op(unsigned int cmd, void *arg)
+hypervisor_memory_op(xenhost_t *xh, unsigned int cmd, void *arg)
{
- return _hypercall2(long, memory_op, cmd, arg);
+ return _hypercall2(xh, long, memory_op, cmd, arg);
}

+#define HYPERVISOR_memory_op(cmd, arg) \
+ hypervisor_memory_op(xh_default, cmd, arg) \
+
static inline int
-HYPERVISOR_multicall(void *call_list, uint32_t nr_calls)
+hypervisor_multicall(xenhost_t *xh, void *call_list, uint32_t nr_calls)
{
- return _hypercall2(int, multicall, call_list, nr_calls);
+ return _hypercall2(xh, int, multicall, call_list, nr_calls);
}

+#define HYPERVISOR_multicall(call_list, nr_calls) \
+ hypervisor_multicall(xh_default, call_list, nr_calls)
+
static inline int
-HYPERVISOR_update_va_mapping(unsigned long va, pte_t new_val,
+hypervisor_update_va_mapping(xenhost_t *xh, unsigned long va, pte_t new_val,
unsigned long flags)
{
if (sizeof(new_val) == sizeof(long))
- return _hypercall3(int, update_va_mapping, va,
+ return _hypercall3(xh, int, update_va_mapping, va,
new_val.pte, flags);
else
- return _hypercall4(int, update_va_mapping, va,
+ return _hypercall4(xh, int, update_va_mapping, va,
new_val.pte, new_val.pte >> 32, flags);
}
-extern int __must_check xen_event_channel_op_compat(int, void *);
+
+#define HYPERVISOR_update_va_mapping(va, new_val, flags) \
+ hypervisor_update_va_mapping(xh_default, va, new_val, flags)
+
+extern int __must_check xen_event_channel_op_compat(xenhost_t *xh, int, void *);

static inline int
-HYPERVISOR_event_channel_op(int cmd, void *arg)
+hypervisor_event_channel_op(xenhost_t *xh, int cmd, void *arg)
{
- int rc = _hypercall2(int, event_channel_op, cmd, arg);
+ int rc = _hypercall2(xh, int, event_channel_op, cmd, arg);
if (unlikely(rc == -ENOSYS))
- rc = xen_event_channel_op_compat(cmd, arg);
+ rc = xen_event_channel_op_compat(xh, cmd, arg);
return rc;
}

+#define HYPERVISOR_event_channel_op(cmd, arg) \
+ hypervisor_event_channel_op(xh_default, cmd, arg)
+
static inline int
-HYPERVISOR_xen_version(int cmd, void *arg)
+hypervisor_xen_version(xenhost_t *xh, int cmd, void *arg)
{
- return _hypercall2(int, xen_version, cmd, arg);
+ return _hypercall2(xh, int, xen_version, cmd, arg);
}

+#define HYPERVISOR_xen_version(cmd, arg) \
+ hypervisor_xen_version(xh_default, cmd, arg)
+
static inline int
-HYPERVISOR_console_io(int cmd, int count, char *str)
+hypervisor_console_io(xenhost_t *xh, int cmd, int count, char *str)
{
- return _hypercall3(int, console_io, cmd, count, str);
+ return _hypercall3(xh, int, console_io, cmd, count, str);
}
+#define HYPERVISOR_console_io(cmd, count, str) \
+ hypervisor_console_io(xh_default, cmd, count, str)

-extern int __must_check xen_physdev_op_compat(int, void *);
+extern int __must_check xen_physdev_op_compat(xenhost_t *xh, int, void *);

static inline int
-HYPERVISOR_physdev_op(int cmd, void *arg)
+hypervisor_physdev_op(xenhost_t *xh, int cmd, void *arg)
{
- int rc = _hypercall2(int, physdev_op, cmd, arg);
+ int rc = _hypercall2(xh, int, physdev_op, cmd, arg);
if (unlikely(rc == -ENOSYS))
- rc = xen_physdev_op_compat(cmd, arg);
+ rc = xen_physdev_op_compat(xh, cmd, arg);
return rc;
}
+#define HYPERVISOR_physdev_op(cmd, arg) \
+ hypervisor_physdev_op(xh_default, cmd, arg)

static inline int
-HYPERVISOR_grant_table_op(unsigned int cmd, void *uop, unsigned int count)
+hypervisor_grant_table_op(xenhost_t *xh, unsigned int cmd, void *uop, unsigned int count)
{
- return _hypercall3(int, grant_table_op, cmd, uop, count);
+ return _hypercall3(xh, int, grant_table_op, cmd, uop, count);
}

+#define HYPERVISOR_grant_table_op(cmd, uop, count) \
+ hypervisor_grant_table_op(xh_default, cmd, uop, count)
+
static inline int
-HYPERVISOR_vm_assist(unsigned int cmd, unsigned int type)
+hypervisor_vm_assist(xenhost_t *xh, unsigned int cmd, unsigned int type)
{
- return _hypercall2(int, vm_assist, cmd, type);
+ return _hypercall2(xh, int, vm_assist, cmd, type);
}

+#define HYPERVISOR_vm_assist(cmd, type) \
+ hypervisor_vm_assist(xh_default, cmd, type)
+
static inline int
-HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args)
+hypervisor_vcpu_op(xenhost_t *xh, int cmd, int vcpuid, void *extra_args)
{
- return _hypercall3(int, vcpu_op, cmd, vcpuid, extra_args);
+ return _hypercall3(xh, int, vcpu_op, cmd, vcpuid, extra_args);
}

+#define HYPERVISOR_vcpu_op(cmd, vcpuid, extra_args) \
+ hypervisor_vcpu_op(xh_default, cmd, vcpuid, extra_args)
+
#ifdef CONFIG_X86_64
static inline int
-HYPERVISOR_set_segment_base(int reg, unsigned long value)
+hypervisor_set_segment_base(xenhost_t *xh, int reg, unsigned long value)
{
- return _hypercall2(int, set_segment_base, reg, value);
+ return _hypercall2(xh, int, set_segment_base, reg, value);
}
+#define HYPERVISOR_set_segment_base(reg, value) \
+ hypervisor_set_segment_base(xh_default, reg, value)
#endif

static inline int
-HYPERVISOR_suspend(unsigned long start_info_mfn)
+hypervisor_suspend(xenhost_t *xh, unsigned long start_info_mfn)
{
struct sched_shutdown r = { .reason = SHUTDOWN_suspend };

@@ -405,38 +471,53 @@ HYPERVISOR_suspend(unsigned long start_info_mfn)
* hypercall calling convention this is the third hypercall
* argument, which is start_info_mfn here.
*/
- return _hypercall3(int, sched_op, SCHEDOP_shutdown, &r, start_info_mfn);
+ return _hypercall3(xh, int, sched_op, SCHEDOP_shutdown, &r, start_info_mfn);
}
+#define HYPERVISOR_suspend(start_info_mfn) \
+ hypervisor_suspend(xh_default, start_info_mfn)

static inline unsigned long __must_check
-HYPERVISOR_hvm_op(int op, void *arg)
+hypervisor_hvm_op(xenhost_t *xh, int op, void *arg)
{
- return _hypercall2(unsigned long, hvm_op, op, arg);
+ return _hypercall2(xh, unsigned long, hvm_op, op, arg);
}

+#define HYPERVISOR_hvm_op(op, arg) \
+ hypervisor_hvm_op(xh_default, op, arg)
+
static inline int
-HYPERVISOR_tmem_op(
+hypervisor_tmem_op(
+ xenhost_t *xh,
struct tmem_op *op)
{
- return _hypercall1(int, tmem_op, op);
+ return _hypercall1(xh, int, tmem_op, op);
}

+#define HYPERVISOR_tmem_op(op) \
+ hypervisor_tmem_op(xh_default, op)
+
static inline int
-HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
+hypervisor_xenpmu_op(xenhost_t *xh, unsigned int op, void *arg)
{
- return _hypercall2(int, xenpmu_op, op, arg);
+ return _hypercall2(xh, int, xenpmu_op, op, arg);
}

+#define HYPERVISOR_xenpmu_op(op, arg) \
+ hypervisor_xenpmu_op(xh_default, op, arg)
+
static inline int
-HYPERVISOR_dm_op(
+hypervisor_dm_op(
+ xenhost_t *xh,
domid_t dom, unsigned int nr_bufs, struct xen_dm_op_buf *bufs)
{
int ret;
stac();
- ret = _hypercall3(int, dm_op, dom, nr_bufs, bufs);
+ ret = _hypercall3(xh, int, dm_op, dom, nr_bufs, bufs);
clac();
return ret;
}
+#define HYPERVISOR_dm_op(dom, nr_bufs, bufs) \
+ hypervisor_dm_op(xh_default, dom, nr_bufs, bufs)

static inline void
MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index e9dc92e79afa..f88bb14da3f2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -20,9 +20,6 @@
#include "smp.h"
#include "pmu.h"

-struct hypercall_entry *hypercall_page;
-EXPORT_SYMBOL_GPL(hypercall_page);
-
/*
* Pointer to the xen_vcpu_info structure or
* &HYPERVISOR_shared_info->vcpu_info[cpu]. See xen_hvm_init_shared_info
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 4d85cd2ff261..f84941d6944e 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -85,8 +85,20 @@ static void __init xen_hvm_init_mem_mapping(void)

extern uint32_t xen_pv_cpuid_base(xenhost_t *xh);

+void xen_hvm_setup_hypercall_page(xenhost_t *xh)
+{
+ u32 msr;
+ u64 pfn;
+
+ msr = cpuid_ebx(xenhost_cpuid_base(xh) + 2);
+ pfn = __pa(xen_hypercall_page);
+ wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+ xh->hypercall_page = xen_hypercall_page;
+}
+
xenhost_ops_t xh_hvm_ops = {
.cpuid_base = xen_pv_cpuid_base,
+ .setup_hypercall_page = xen_hvm_setup_hypercall_page,
};

xenhost_ops_t xh_hvm_nested_ops = {
@@ -96,6 +108,7 @@ static void __init init_hvm_pv_info(void)
{
int major, minor;
uint32_t eax, ebx, ecx, edx, base;
+ xenhost_t **xh;

base = xenhost_cpuid_base(xh_default);
eax = cpuid_eax(base + 1);
@@ -110,14 +123,10 @@ static void __init init_hvm_pv_info(void)
if (xen_pvh_domain())
pv_info.name = "Xen PVH";
else {
- u64 pfn;
- uint32_t msr;
-
pv_info.name = "Xen HVM";
- msr = cpuid_ebx(base + 2);
- pfn = __pa(xen_hypercall_page);
- wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
- hypercall_page = xen_hypercall_page;
+
+ for_each_xenhost(xh)
+ xenhost_setup_hypercall_page(*xh);
}

xen_setup_features();
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 3239e8452ede..a2c07cc71498 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1200,12 +1200,20 @@ uint32_t xen_pv_nested_cpuid_base(xenhost_t *xh)
2 /* nested specific leaf? */);
}

+static void xen_pv_setup_hypercall_page(xenhost_t *xh)
+{
+ xh->hypercall_page = xen_hypercall_page;
+}
+
xenhost_ops_t xh_pv_ops = {
.cpuid_base = xen_pv_cpuid_base,
+
+ .setup_hypercall_page = xen_pv_setup_hypercall_page,
};

xenhost_ops_t xh_pv_nested_ops = {
.cpuid_base = xen_pv_nested_cpuid_base,
+ .setup_hypercall_page = NULL,
};

/* First C function to be called on Xen boot */
@@ -1213,11 +1221,11 @@ asmlinkage __visible void __init xen_start_kernel(void)
{
struct physdev_set_iopl set_iopl;
unsigned long initrd_start = 0;
+ xenhost_t **xh;
int rc;

if (!xen_start_info)
return;
- hypercall_page = xen_hypercall_page;

xenhost_register(xenhost_r1, &xh_pv_ops);

@@ -1228,6 +1236,9 @@ asmlinkage __visible void __init xen_start_kernel(void)
if (xen_driver_domain() && xen_nested())
xenhost_register(xenhost_r2, &xh_pv_nested_ops);

+ for_each_xenhost(xh)
+ xenhost_setup_hypercall_page(*xh);
+
xen_domain_type = XEN_PV_DOMAIN;
xen_start_flags = xen_start_info->flags;

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index e47866fcb7ea..50277dfbdf30 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -26,8 +26,7 @@ extern xenhost_ops_t xh_hvm_ops, xh_hvm_nested_ops;

void __init xen_pvh_init(void)
{
- u32 msr;
- u64 pfn;
+ xenhost_t **xh;

/*
* Note: we have already called xen_cpuid_base() in
@@ -45,10 +44,8 @@ void __init xen_pvh_init(void)
xen_pvh = 1;
xen_start_flags = pvh_start_info.flags;

- msr = cpuid_ebx(xen_cpuid_base() + 2);
- pfn = __pa(xen_hypercall_page);
- wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
- hypercall_page = xen_hypercall_page;
+ for_each_xenhost(xh)
+ xenhost_setup_hypercall_page(*xh);
}

void __init mem_map_via_hcall(struct boot_params *boot_params_p)
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 7ff5437bd83f..6bbf4ff700d6 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -70,6 +70,9 @@ ENTRY(xen_hypercall_page)
#include <asm/xen-hypercalls.h>
#undef HYPERCALL
END(xen_hypercall_page)
+/*
+ * Add xen_hypercall_page2 for remote xenhost?
+ */
.popsection

ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz "linux")
diff --git a/drivers/xen/fallback.c b/drivers/xen/fallback.c
index b04fb64c5a91..ae81cf75ae5f 100644
--- a/drivers/xen/fallback.c
+++ b/drivers/xen/fallback.c
@@ -5,14 +5,14 @@
#include <asm/hypervisor.h>
#include <asm/xen/hypercall.h>

-int xen_event_channel_op_compat(int cmd, void *arg)
+int xen_event_channel_op_compat(xenhost_t *xh, int cmd, void *arg)
{
struct evtchn_op op;
int rc;

op.cmd = cmd;
memcpy(&op.u, arg, sizeof(op.u));
- rc = _hypercall1(int, event_channel_op_compat, &op);
+ rc = _hypercall1(xh, int, event_channel_op_compat, &op);

switch (cmd) {
case EVTCHNOP_close:
@@ -44,14 +44,14 @@ int xen_event_channel_op_compat(int cmd, void *arg)
}
EXPORT_SYMBOL_GPL(xen_event_channel_op_compat);

-int xen_physdev_op_compat(int cmd, void *arg)
+int xen_physdev_op_compat(xenhost_t *xh, int cmd, void *arg)
{
struct physdev_op op;
int rc;

op.cmd = cmd;
memcpy(&op.u, arg, sizeof(op.u));
- rc = _hypercall1(int, physdev_op_compat, &op);
+ rc = _hypercall1(xh, int, physdev_op_compat, &op);

switch (cmd) {
case PHYSDEVOP_IRQ_UNMASK_NOTIFY:
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
index 13a70bdadfd2..d9bc1fb6cce4 100644
--- a/include/xen/xenhost.h
+++ b/include/xen/xenhost.h
@@ -70,6 +70,8 @@ typedef struct {
enum xenhost_type type;

struct xenhost_ops *ops;
+
+ struct hypercall_entry *hypercall_page;
} xenhost_t;

typedef struct xenhost_ops {
@@ -83,6 +85,22 @@ typedef struct xenhost_ops {
* Separate cpuid-leafs?
*/
uint32_t (*cpuid_base)(xenhost_t *xenhost);
+
+ /*
+ * Hypercall page is setup as the first thing once the PV/PVH/PVHVM
+ * code detects that it is selected. The first use is in
+ * xen_setup_features().
+ *
+ * PV/PVH/PVHVM set this up in different ways: hypervisor takes
+ * care of this for PV, PVH and PVHVM use xen_cpuid.
+ *
+ * xenhost_r0: point hypercall_page to external hypercall_page.
+ * xenhost_r1: what we do now.
+ * xenhost_r2: hypercall interface that bypasses L1-Xen to go from
+ * L1-guest to L0-Xen. The interface would allow L0-Xen to be able
+ * to decide which particular L1-guest was the caller.
+ */
+ void (*setup_hypercall_page)(xenhost_t *xenhost);
} xenhost_ops_t;

extern xenhost_t *xh_default, *xh_remote;
@@ -113,4 +131,9 @@ static inline uint32_t xenhost_cpuid_base(xenhost_t *xh)
return xen_cpuid_base();
}

+static inline void xenhost_setup_hypercall_page(xenhost_t *xh)
+{
+ (xh->ops->setup_hypercall_page)(xh);
+}
+
#endif /* __XENHOST_H */
--
2.20.1

2019-05-09 17:27:46

by Ankur Arora

Subject: [RFC PATCH 09/16] xen/evtchn: support evtchn in xenhost_t

Largely mechanical patch that adds a new parameter, xenhost_t *, to the
evtchn interfaces. The evtchn port, instead of being unique per domain,
is now scoped to a xenhost_t.

As part of upcall handling we now look at all the xenhosts and, for
evtchn_2l, at each xenhost's shared_info and vcpu_info. Other than
this, event handling is largely unchanged.

Note that the IPI, timer, VIRQ, FUNCTION, PMU, etc. vectors remain
attached to xh_default. Only interdomain evtchns are allowed on
xh_remote.

TODO:
- to minimize the changes, evtchn FIFO is disabled for now.

Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/pci/xen.c | 16 +-
arch/x86/xen/enlighten_hvm.c | 2 +-
arch/x86/xen/irq.c | 2 +-
arch/x86/xen/smp.c | 16 +-
arch/x86/xen/smp_pv.c | 4 +-
arch/x86/xen/time.c | 5 +-
arch/x86/xen/xen-ops.h | 1 +
arch/x86/xen/xenhost.c | 16 +
drivers/block/xen-blkback/xenbus.c | 2 +-
drivers/block/xen-blkfront.c | 2 +-
drivers/input/misc/xen-kbdfront.c | 2 +-
drivers/net/xen-netback/interface.c | 8 +-
drivers/net/xen-netfront.c | 6 +-
drivers/pci/xen-pcifront.c | 2 +-
drivers/xen/acpi.c | 2 +
drivers/xen/balloon.c | 2 +-
drivers/xen/events/Makefile | 1 -
drivers/xen/events/events_2l.c | 188 +++++-----
drivers/xen/events/events_base.c | 379 ++++++++++++---------
drivers/xen/events/events_fifo.c | 2 +-
drivers/xen/events/events_internal.h | 78 ++---
drivers/xen/evtchn.c | 22 +-
drivers/xen/fallback.c | 1 +
drivers/xen/gntalloc.c | 8 +-
drivers/xen/gntdev.c | 8 +-
drivers/xen/mcelog.c | 2 +-
drivers/xen/pcpu.c | 2 +-
drivers/xen/preempt.c | 1 +
drivers/xen/privcmd.c | 1 +
drivers/xen/sys-hypervisor.c | 2 +-
drivers/xen/time.c | 2 +-
drivers/xen/xen-pciback/xenbus.c | 2 +-
drivers/xen/xen-scsiback.c | 5 +-
drivers/xen/xenbus/xenbus_client.c | 2 +-
drivers/xen/xenbus/xenbus_comms.c | 6 +-
drivers/xen/xenbus/xenbus_probe.c | 1 +
drivers/xen/xenbus/xenbus_probe_backend.c | 1 +
drivers/xen/xenbus/xenbus_probe_frontend.c | 1 +
drivers/xen/xenbus/xenbus_xs.c | 1 +
include/xen/events.h | 45 +--
include/xen/xenhost.h | 17 +
41 files changed, 483 insertions(+), 383 deletions(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index d1a3b9f08289..9aa591b5fa3b 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -19,6 +19,8 @@
#include <asm/pci_x86.h>

#include <asm/xen/hypervisor.h>
+#include <xen/interface/xen.h>
+#include <xen/xenhost.h>

#include <xen/features.h>
#include <xen/events.h>
@@ -46,7 +48,7 @@ static int xen_pcifront_enable_irq(struct pci_dev *dev)
if (gsi < nr_legacy_irqs())
share = 0;

- rc = xen_bind_pirq_gsi_to_irq(gsi, pirq, share, "pcifront");
+ rc = xen_bind_pirq_gsi_to_irq(xh_default, gsi, pirq, share, "pcifront");
if (rc < 0) {
dev_warn(&dev->dev, "Xen PCI: failed to bind GSI%d (PIRQ%d) to IRQ: %d\n",
gsi, pirq, rc);
@@ -96,7 +98,7 @@ static int xen_register_pirq(u32 gsi, int gsi_override, int triggering,
if (gsi_override >= 0)
gsi = gsi_override;

- irq = xen_bind_pirq_gsi_to_irq(gsi, map_irq.pirq, shareable, name);
+ irq = xen_bind_pirq_gsi_to_irq(xh_default, gsi, map_irq.pirq, shareable, name);
if (irq < 0)
goto out;

@@ -180,7 +182,7 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
goto error;
i = 0;
for_each_pci_msi_entry(msidesc, dev) {
- irq = xen_bind_pirq_msi_to_irq(dev, msidesc, v[i],
+ irq = xen_bind_pirq_msi_to_irq(xh_default, dev, msidesc, v[i],
(type == PCI_CAP_ID_MSI) ? nvec : 1,
(type == PCI_CAP_ID_MSIX) ?
"pcifront-msi-x" :
@@ -234,7 +236,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
return 1;

for_each_pci_msi_entry(msidesc, dev) {
- pirq = xen_allocate_pirq_msi(dev, msidesc);
+ pirq = xen_allocate_pirq_msi(xh_default, dev, msidesc);
if (pirq < 0) {
irq = -ENODEV;
goto error;
@@ -242,7 +244,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
xen_msi_compose_msg(dev, pirq, &msg);
__pci_write_msi_msg(msidesc, &msg);
dev_dbg(&dev->dev, "xen: msi bound to pirq=%d\n", pirq);
- irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
+ irq = xen_bind_pirq_msi_to_irq(xh_default, dev, msidesc, pirq,
(type == PCI_CAP_ID_MSI) ? nvec : 1,
(type == PCI_CAP_ID_MSIX) ?
"msi-x" : "msi",
@@ -337,7 +339,7 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
goto out;
}

- ret = xen_bind_pirq_msi_to_irq(dev, msidesc, map_irq.pirq,
+ ret = xen_bind_pirq_msi_to_irq(xh_default, dev, msidesc, map_irq.pirq,
(type == PCI_CAP_ID_MSI) ? nvec : 1,
(type == PCI_CAP_ID_MSIX) ? "msi-x" : "msi",
domid);
@@ -496,7 +498,7 @@ int __init pci_xen_initial_domain(void)
}
if (0 == nr_ioapics) {
for (irq = 0; irq < nr_legacy_irqs(); irq++)
- xen_bind_pirq_gsi_to_irq(irq, irq, 0, "xt-pic");
+ xen_bind_pirq_gsi_to_irq(xh_default, irq, irq, 0, "xt-pic");
}
return 0;
}
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index c1981a3e4989..efe483ceeb9a 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -266,7 +266,7 @@ static void __init xen_hvm_guest_init(void)
xen_hvm_smp_init();
WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_hvm, xen_cpu_dead_hvm));
xen_unplug_emulated_devices();
- x86_init.irqs.intr_init = xen_init_IRQ;
+ x86_init.irqs.intr_init = xenhost_init_IRQ;
xen_hvm_init_time_ops();
xen_hvm_init_mmu_ops();

diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index f760a6abfb1e..3267c3505a64 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -170,5 +170,5 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
void __init xen_init_irq_ops(void)
{
pv_ops.irq = xen_irq_ops;
- x86_init.irqs.intr_init = xen_init_IRQ;
+ x86_init.irqs.intr_init = xenhost_init_IRQ;
}
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 867524be0065..c186d868dc5c 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -66,7 +66,7 @@ int xen_smp_intr_init(unsigned int cpu)
char *resched_name, *callfunc_name, *debug_name;

resched_name = kasprintf(GFP_KERNEL, "resched%d", cpu);
- rc = bind_ipi_to_irqhandler(XEN_RESCHEDULE_VECTOR,
+ rc = bind_ipi_to_irqhandler(xh_default, XEN_RESCHEDULE_VECTOR,
cpu,
xen_reschedule_interrupt,
IRQF_PERCPU|IRQF_NOBALANCING,
@@ -78,7 +78,7 @@ int xen_smp_intr_init(unsigned int cpu)
per_cpu(xen_resched_irq, cpu).name = resched_name;

callfunc_name = kasprintf(GFP_KERNEL, "callfunc%d", cpu);
- rc = bind_ipi_to_irqhandler(XEN_CALL_FUNCTION_VECTOR,
+ rc = bind_ipi_to_irqhandler(xh_default, XEN_CALL_FUNCTION_VECTOR,
cpu,
xen_call_function_interrupt,
IRQF_PERCPU|IRQF_NOBALANCING,
@@ -90,7 +90,7 @@ int xen_smp_intr_init(unsigned int cpu)
per_cpu(xen_callfunc_irq, cpu).name = callfunc_name;

debug_name = kasprintf(GFP_KERNEL, "debug%d", cpu);
- rc = bind_virq_to_irqhandler(VIRQ_DEBUG, cpu, xen_debug_interrupt,
+ rc = bind_virq_to_irqhandler(xh_default, VIRQ_DEBUG, cpu, xen_debug_interrupt,
IRQF_PERCPU | IRQF_NOBALANCING,
debug_name, NULL);
if (rc < 0)
@@ -99,7 +99,7 @@ int xen_smp_intr_init(unsigned int cpu)
per_cpu(xen_debug_irq, cpu).name = debug_name;

callfunc_name = kasprintf(GFP_KERNEL, "callfuncsingle%d", cpu);
- rc = bind_ipi_to_irqhandler(XEN_CALL_FUNCTION_SINGLE_VECTOR,
+ rc = bind_ipi_to_irqhandler(xh_default, XEN_CALL_FUNCTION_SINGLE_VECTOR,
cpu,
xen_call_function_single_interrupt,
IRQF_PERCPU|IRQF_NOBALANCING,
@@ -155,7 +155,7 @@ void __init xen_smp_cpus_done(unsigned int max_cpus)

void xen_smp_send_reschedule(int cpu)
{
- xen_send_IPI_one(cpu, XEN_RESCHEDULE_VECTOR);
+ xen_send_IPI_one(xh_default, cpu, XEN_RESCHEDULE_VECTOR);
}

static void __xen_send_IPI_mask(const struct cpumask *mask,
@@ -164,7 +164,7 @@ static void __xen_send_IPI_mask(const struct cpumask *mask,
unsigned cpu;

for_each_cpu_and(cpu, mask, cpu_online_mask)
- xen_send_IPI_one(cpu, vector);
+ xen_send_IPI_one(xh_default, cpu, vector);
}

void xen_smp_send_call_function_ipi(const struct cpumask *mask)
@@ -242,7 +242,7 @@ void xen_send_IPI_self(int vector)
int xen_vector = xen_map_vector(vector);

if (xen_vector >= 0)
- xen_send_IPI_one(smp_processor_id(), xen_vector);
+ xen_send_IPI_one(xh_default, smp_processor_id(), xen_vector);
}

void xen_send_IPI_mask_allbutself(const struct cpumask *mask,
@@ -259,7 +259,7 @@ void xen_send_IPI_mask_allbutself(const struct cpumask *mask,
if (this_cpu == cpu)
continue;

- xen_send_IPI_one(cpu, xen_vector);
+ xen_send_IPI_one(xh_default, cpu, xen_vector);
}
}

diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index f4ea9eac8b6a..f8292be25d52 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -116,7 +116,7 @@ int xen_smp_intr_init_pv(unsigned int cpu)
char *callfunc_name, *pmu_name;

callfunc_name = kasprintf(GFP_KERNEL, "irqwork%d", cpu);
- rc = bind_ipi_to_irqhandler(XEN_IRQ_WORK_VECTOR,
+ rc = bind_ipi_to_irqhandler(xh_default, XEN_IRQ_WORK_VECTOR,
cpu,
xen_irq_work_interrupt,
IRQF_PERCPU|IRQF_NOBALANCING,
@@ -129,7 +129,7 @@ int xen_smp_intr_init_pv(unsigned int cpu)

if (is_xen_pmu(cpu)) {
pmu_name = kasprintf(GFP_KERNEL, "pmu%d", cpu);
- rc = bind_virq_to_irqhandler(VIRQ_XENPMU, cpu,
+ rc = bind_virq_to_irqhandler(xh_default, VIRQ_XENPMU, cpu,
xen_pmu_irq_handler,
IRQF_PERCPU|IRQF_NOBALANCING,
pmu_name, NULL);
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 217bc4de07ee..2f7ff3272d5d 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -340,11 +340,12 @@ void xen_setup_timer(int cpu)

snprintf(xevt->name, sizeof(xevt->name), "timer%d", cpu);

- irq = bind_virq_to_irqhandler(VIRQ_TIMER, cpu, xen_timer_interrupt,
+ irq = bind_virq_to_irqhandler(xh_default,
+ VIRQ_TIMER, cpu, xen_timer_interrupt,
IRQF_PERCPU|IRQF_NOBALANCING|IRQF_TIMER|
IRQF_FORCE_RESUME|IRQF_EARLY_RESUME,
xevt->name, NULL);
- (void)xen_set_irq_priority(irq, XEN_IRQ_PRIORITY_MAX);
+ (void)xen_set_irq_priority(xh_default, irq, XEN_IRQ_PRIORITY_MAX);

memcpy(evt, xen_clockevent, sizeof(*evt));

diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 96fd7edea7e9..4619808f1640 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -78,6 +78,7 @@ extern int xen_have_vcpu_info_placement;
int xen_vcpu_setup(xenhost_t *xh, int cpu);
void xen_vcpu_info_reset(xenhost_t *xh, int cpu);
void xen_setup_vcpu_info_placement(void);
+void xenhost_init_IRQ(void);

#ifdef CONFIG_SMP
void xen_smp_init(void);
diff --git a/arch/x86/xen/xenhost.c b/arch/x86/xen/xenhost.c
index 3d8ccef89dcd..3bbfd0654833 100644
--- a/arch/x86/xen/xenhost.c
+++ b/arch/x86/xen/xenhost.c
@@ -2,6 +2,7 @@
#include <linux/bug.h>
#include <xen/xen.h>
#include <xen/xenhost.h>
+#include <xen/events.h>
#include "xen-ops.h"

/*
@@ -84,3 +85,18 @@ void __xenhost_unregister(enum xenhost_type type)
BUG();
}
}
+
+void xenhost_init_IRQ(void)
+{
+ xenhost_t **xh;
+ /*
+	 * xenhost_init_IRQ is called via x86_init.irqs.intr_init().
+ * For xenhost_r1 and xenhost_r2, the underlying state is
+ * ready so we can go ahead and init both the variants.
+ *
+	 * xenhost_r0 might be implemented via a loadable module,
+	 * which would do this initialization explicitly.
+ */
+ for_each_xenhost(xh)
+ xen_init_IRQ(*xh);
+}
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index a4bc74e72c39..beea4272cfd3 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -228,7 +228,7 @@ static int xen_blkif_map(struct xen_blkif_ring *ring, grant_ref_t *gref,
BUG();
}

- err = bind_interdomain_evtchn_to_irqhandler(blkif->domid, evtchn,
+ err = bind_interdomain_evtchn_to_irqhandler(xh_default, blkif->domid, evtchn,
xen_blkif_be_int, 0,
"blkif-backend", ring);
if (err < 0) {
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 0ed4b200fa58..a06716424023 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1700,7 +1700,7 @@ static int setup_blkring(struct xenbus_device *dev,
if (err)
goto fail;

- err = bind_evtchn_to_irqhandler(rinfo->evtchn, blkif_interrupt, 0,
+ err = bind_evtchn_to_irqhandler(xh_default, rinfo->evtchn, blkif_interrupt, 0,
"blkif", rinfo);
if (err <= 0) {
xenbus_dev_fatal(dev, err,
diff --git a/drivers/input/misc/xen-kbdfront.c b/drivers/input/misc/xen-kbdfront.c
index 24bc5c5d876f..47c6e499fe31 100644
--- a/drivers/input/misc/xen-kbdfront.c
+++ b/drivers/input/misc/xen-kbdfront.c
@@ -435,7 +435,7 @@ static int xenkbd_connect_backend(struct xenbus_device *dev,
ret = xenbus_alloc_evtchn(dev, &evtchn);
if (ret)
goto error_grant;
- ret = bind_evtchn_to_irqhandler(evtchn, input_handler,
+ ret = bind_evtchn_to_irqhandler(xh_default, evtchn, input_handler,
0, dev->devicetype, info);
if (ret < 0) {
xenbus_dev_fatal(dev, ret, "bind_evtchn_to_irqhandler");
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 182d6770f102..53d4e6351f1e 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -588,7 +588,7 @@ int xenvif_connect_ctrl(struct xenvif *vif, grant_ref_t ring_ref,
shared = (struct xen_netif_ctrl_sring *)addr;
BACK_RING_INIT(&vif->ctrl, shared, XEN_PAGE_SIZE);

- err = bind_interdomain_evtchn_to_irq(vif->domid, evtchn);
+ err = bind_interdomain_evtchn_to_irq(xh_default, vif->domid, evtchn);
if (err < 0)
goto err_unmap;

@@ -646,7 +646,7 @@ int xenvif_connect_data(struct xenvif_queue *queue,

if (tx_evtchn == rx_evtchn) {
/* feature-split-event-channels == 0 */
- err = bind_interdomain_evtchn_to_irqhandler(
+ err = bind_interdomain_evtchn_to_irqhandler(xh_default,
queue->vif->domid, tx_evtchn, xenvif_interrupt, 0,
queue->name, queue);
if (err < 0)
@@ -657,7 +657,7 @@ int xenvif_connect_data(struct xenvif_queue *queue,
/* feature-split-event-channels == 1 */
snprintf(queue->tx_irq_name, sizeof(queue->tx_irq_name),
"%s-tx", queue->name);
- err = bind_interdomain_evtchn_to_irqhandler(
+ err = bind_interdomain_evtchn_to_irqhandler(xh_default,
queue->vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
queue->tx_irq_name, queue);
if (err < 0)
@@ -667,7 +667,7 @@ int xenvif_connect_data(struct xenvif_queue *queue,

snprintf(queue->rx_irq_name, sizeof(queue->rx_irq_name),
"%s-rx", queue->name);
- err = bind_interdomain_evtchn_to_irqhandler(
+ err = bind_interdomain_evtchn_to_irqhandler(xh_default,
queue->vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
queue->rx_irq_name, queue);
if (err < 0)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c914c24f880b..1cd0a2d2ba54 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1468,7 +1468,7 @@ static int setup_netfront_single(struct netfront_queue *queue)
if (err < 0)
goto fail;

- err = bind_evtchn_to_irqhandler(queue->tx_evtchn,
+ err = bind_evtchn_to_irqhandler(xh_default, queue->tx_evtchn,
xennet_interrupt,
0, queue->info->netdev->name, queue);
if (err < 0)
@@ -1498,7 +1498,7 @@ static int setup_netfront_split(struct netfront_queue *queue)

snprintf(queue->tx_irq_name, sizeof(queue->tx_irq_name),
"%s-tx", queue->name);
- err = bind_evtchn_to_irqhandler(queue->tx_evtchn,
+ err = bind_evtchn_to_irqhandler(xh_default, queue->tx_evtchn,
xennet_tx_interrupt,
0, queue->tx_irq_name, queue);
if (err < 0)
@@ -1507,7 +1507,7 @@ static int setup_netfront_split(struct netfront_queue *queue)

snprintf(queue->rx_irq_name, sizeof(queue->rx_irq_name),
"%s-rx", queue->name);
- err = bind_evtchn_to_irqhandler(queue->rx_evtchn,
+ err = bind_evtchn_to_irqhandler(xh_default, queue->rx_evtchn,
xennet_rx_interrupt,
0, queue->rx_irq_name, queue);
if (err < 0)
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index eba6e33147a2..f894290e8b3a 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -800,7 +800,7 @@ static int pcifront_publish_info(struct pcifront_device *pdev)
if (err)
goto out;

- err = bind_evtchn_to_irqhandler(pdev->evtchn, pcifront_handler_aer,
+ err = bind_evtchn_to_irqhandler(xh_default, pdev->evtchn, pcifront_handler_aer,
0, "pcifront", pdev);

if (err < 0)
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 6893c79fd2a1..a959fce175f8 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -30,6 +30,8 @@
* IN THE SOFTWARE.
*/

+#include <linux/types.h>
+#include <xen/interface/xen.h>
#include <xen/acpi.h>
#include <xen/interface/platform.h>
#include <asm/xen/hypercall.h>
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index ceb5048de9a7..5ef4d6ad920d 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -62,11 +62,11 @@
#include <asm/pgtable.h>
#include <asm/tlb.h>

+#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

#include <xen/xen.h>
-#include <xen/interface/xen.h>
#include <xen/interface/memory.h>
#include <xen/balloon.h>
#include <xen/features.h>
diff --git a/drivers/xen/events/Makefile b/drivers/xen/events/Makefile
index 62be55cd981d..08179fe04612 100644
--- a/drivers/xen/events/Makefile
+++ b/drivers/xen/events/Makefile
@@ -2,4 +2,3 @@ obj-y += events.o

events-y += events_base.o
events-y += events_2l.o
-events-y += events_fifo.o
diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c
index f09dbe4e9c33..c69d7a5b3dff 100644
--- a/drivers/xen/events/events_2l.c
+++ b/drivers/xen/events/events_2l.c
@@ -40,50 +40,52 @@

#define EVTCHN_MASK_SIZE (EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD)

-static DEFINE_PER_CPU(xen_ulong_t [EVTCHN_MASK_SIZE], cpu_evtchn_mask);
+static DEFINE_PER_CPU(xen_ulong_t [2][EVTCHN_MASK_SIZE], cpu_evtchn_mask);

-static unsigned evtchn_2l_max_channels(void)
+static unsigned evtchn_2l_max_channels(xenhost_t *xh)
{
return EVTCHN_2L_NR_CHANNELS;
}

static void evtchn_2l_bind_to_cpu(struct irq_info *info, unsigned cpu)
{
- clear_bit(info->evtchn, BM(per_cpu(cpu_evtchn_mask, info->cpu)));
- set_bit(info->evtchn, BM(per_cpu(cpu_evtchn_mask, cpu)));
+ clear_bit(info->evtchn,
+ BM(per_cpu(cpu_evtchn_mask, info->cpu))[info->xh - xenhosts]);
+ set_bit(info->evtchn,
+ BM(per_cpu(cpu_evtchn_mask, cpu))[info->xh - xenhosts]);
}

-static void evtchn_2l_clear_pending(unsigned port)
+static void evtchn_2l_clear_pending(xenhost_t *xh, unsigned port)
{
struct shared_info *s = xh_default->HYPERVISOR_shared_info;
sync_clear_bit(port, BM(&s->evtchn_pending[0]));
}

-static void evtchn_2l_set_pending(unsigned port)
+static void evtchn_2l_set_pending(xenhost_t *xh, unsigned port)
{
struct shared_info *s = xh_default->HYPERVISOR_shared_info;
sync_set_bit(port, BM(&s->evtchn_pending[0]));
}

-static bool evtchn_2l_is_pending(unsigned port)
+static bool evtchn_2l_is_pending(xenhost_t *xh, unsigned port)
{
- struct shared_info *s = xh_default->HYPERVISOR_shared_info;
+ struct shared_info *s = xh->HYPERVISOR_shared_info;
return sync_test_bit(port, BM(&s->evtchn_pending[0]));
}

-static bool evtchn_2l_test_and_set_mask(unsigned port)
+static bool evtchn_2l_test_and_set_mask(xenhost_t *xh, unsigned port)
{
- struct shared_info *s = xh_default->HYPERVISOR_shared_info;
+ struct shared_info *s = xh->HYPERVISOR_shared_info;
return sync_test_and_set_bit(port, BM(&s->evtchn_mask[0]));
}

-static void evtchn_2l_mask(unsigned port)
+static void evtchn_2l_mask(xenhost_t *xh, unsigned port)
{
struct shared_info *s = xh_default->HYPERVISOR_shared_info;
sync_set_bit(port, BM(&s->evtchn_mask[0]));
}

-static void evtchn_2l_unmask(unsigned port)
+static void evtchn_2l_unmask(xenhost_t *xh, unsigned port)
{
struct shared_info *s = xh_default->HYPERVISOR_shared_info;
unsigned int cpu = get_cpu();
@@ -91,7 +93,7 @@ static void evtchn_2l_unmask(unsigned port)

BUG_ON(!irqs_disabled());

- if (unlikely((cpu != cpu_from_evtchn(port))))
+ if (unlikely((cpu != cpu_from_evtchn(xh, port))))
do_hypercall = 1;
else {
/*
@@ -116,9 +118,9 @@ static void evtchn_2l_unmask(unsigned port)
* their own implementation of irq_enable). */
if (do_hypercall) {
struct evtchn_unmask unmask = { .port = port };
- (void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask);
+ (void)hypervisor_event_channel_op(xh, EVTCHNOP_unmask, &unmask);
} else {
- struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu);
+ struct vcpu_info *vcpu_info = xh->xen_vcpu[cpu];

/*
* The following is basically the equivalent of
@@ -134,8 +136,8 @@ static void evtchn_2l_unmask(unsigned port)
put_cpu();
}

-static DEFINE_PER_CPU(unsigned int, current_word_idx);
-static DEFINE_PER_CPU(unsigned int, current_bit_idx);
+static DEFINE_PER_CPU(unsigned int [2], current_word_idx);
+static DEFINE_PER_CPU(unsigned int [2], current_bit_idx);

/*
* Mask out the i least significant bits of w
@@ -143,11 +145,12 @@ static DEFINE_PER_CPU(unsigned int, current_bit_idx);
#define MASK_LSBS(w, i) (w & ((~((xen_ulong_t)0UL)) << i))

static inline xen_ulong_t active_evtchns(unsigned int cpu,
+ xenhost_t *xh,
struct shared_info *sh,
unsigned int idx)
{
return sh->evtchn_pending[idx] &
- per_cpu(cpu_evtchn_mask, cpu)[idx] &
+ per_cpu(cpu_evtchn_mask, cpu)[xh - xenhosts][idx] &
~sh->evtchn_mask[idx];
}

@@ -159,7 +162,7 @@ static inline xen_ulong_t active_evtchns(unsigned int cpu,
* a bitset of words which contain pending event bits. The second
* level is a bitset of pending events themselves.
*/
-static void evtchn_2l_handle_events(unsigned cpu)
+static void evtchn_2l_handle_events(xenhost_t *xh, unsigned cpu)
{
int irq;
xen_ulong_t pending_words;
@@ -167,8 +170,8 @@ static void evtchn_2l_handle_events(unsigned cpu)
int start_word_idx, start_bit_idx;
int word_idx, bit_idx;
int i;
- struct shared_info *s = xh_default->HYPERVISOR_shared_info;
- struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu);
+ struct shared_info *s = xh->HYPERVISOR_shared_info;
+ struct vcpu_info *vcpu_info = xh->xen_vcpu[cpu];

/* Timer interrupt has highest priority. */
irq = irq_from_virq(cpu, VIRQ_TIMER);
@@ -176,7 +179,7 @@ static void evtchn_2l_handle_events(unsigned cpu)
unsigned int evtchn = evtchn_from_irq(irq);
word_idx = evtchn / BITS_PER_LONG;
bit_idx = evtchn % BITS_PER_LONG;
- if (active_evtchns(cpu, s, word_idx) & (1ULL << bit_idx))
+ if (active_evtchns(cpu, xh, s, word_idx) & (1ULL << bit_idx))
generic_handle_irq(irq);
}

@@ -187,8 +190,8 @@ static void evtchn_2l_handle_events(unsigned cpu)
*/
pending_words = xchg_xen_ulong(&vcpu_info->evtchn_pending_sel, 0);

- start_word_idx = __this_cpu_read(current_word_idx);
- start_bit_idx = __this_cpu_read(current_bit_idx);
+ start_word_idx = __this_cpu_read(current_word_idx[xh - xenhosts]);
+ start_bit_idx = __this_cpu_read(current_bit_idx[xh - xenhosts]);

word_idx = start_word_idx;

@@ -207,7 +210,7 @@ static void evtchn_2l_handle_events(unsigned cpu)
}
word_idx = EVTCHN_FIRST_BIT(words);

- pending_bits = active_evtchns(cpu, s, word_idx);
+ pending_bits = active_evtchns(cpu, xh, s, word_idx);
bit_idx = 0; /* usually scan entire word from start */
/*
* We scan the starting word in two parts.
@@ -240,7 +243,7 @@ static void evtchn_2l_handle_events(unsigned cpu)

/* Process port. */
port = (word_idx * BITS_PER_EVTCHN_WORD) + bit_idx;
- irq = get_evtchn_to_irq(port);
+ irq = get_evtchn_to_irq(xh, port);

if (irq != -1)
generic_handle_irq(irq);
@@ -248,10 +251,10 @@ static void evtchn_2l_handle_events(unsigned cpu)
bit_idx = (bit_idx + 1) % BITS_PER_EVTCHN_WORD;

/* Next caller starts at last processed + 1 */
- __this_cpu_write(current_word_idx,
+ __this_cpu_write(current_word_idx[xh - xenhosts],
bit_idx ? word_idx :
(word_idx+1) % BITS_PER_EVTCHN_WORD);
- __this_cpu_write(current_bit_idx, bit_idx);
+ __this_cpu_write(current_bit_idx[xh - xenhosts], bit_idx);
} while (bit_idx != 0);

/* Scan start_l1i twice; all others once. */
@@ -266,78 +269,81 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id)
{
struct shared_info *sh = xh_default->HYPERVISOR_shared_info;
int cpu = smp_processor_id();
- xen_ulong_t *cpu_evtchn = per_cpu(cpu_evtchn_mask, cpu);
+ xen_ulong_t *cpu_evtchn;
int i;
unsigned long flags;
static DEFINE_SPINLOCK(debug_lock);
struct vcpu_info *v;
+ xenhost_t **xh;

spin_lock_irqsave(&debug_lock, flags);

printk("\nvcpu %d\n ", cpu);

- for_each_online_cpu(i) {
- int pending;
- v = per_cpu(xen_vcpu, i);
- pending = (get_irq_regs() && i == cpu)
- ? xen_irqs_disabled(get_irq_regs())
- : v->evtchn_upcall_mask;
- printk("%d: masked=%d pending=%d event_sel %0*"PRI_xen_ulong"\n ", i,
- pending, v->evtchn_upcall_pending,
- (int)(sizeof(v->evtchn_pending_sel)*2),
- v->evtchn_pending_sel);
- }
- v = per_cpu(xen_vcpu, cpu);
+ for_each_xenhost(xh) {
+ cpu_evtchn = per_cpu(cpu_evtchn_mask, cpu)[(*xh) - xenhosts];
+ for_each_online_cpu(i) {
+ int pending;
+ v = (*xh)->xen_vcpu[i];
+ pending = (get_irq_regs() && i == cpu)
+ ? xen_irqs_disabled(get_irq_regs())
+ : v->evtchn_upcall_mask;
+ printk("%d: masked=%d pending=%d event_sel %0*"PRI_xen_ulong"\n ", i,
+ pending, v->evtchn_upcall_pending,
+ (int)(sizeof(v->evtchn_pending_sel)*2),
+ v->evtchn_pending_sel);
+ }
+ v = (*xh)->xen_vcpu[cpu];

- printk("\npending:\n ");
- for (i = ARRAY_SIZE(sh->evtchn_pending)-1; i >= 0; i--)
- printk("%0*"PRI_xen_ulong"%s",
- (int)sizeof(sh->evtchn_pending[0])*2,
- sh->evtchn_pending[i],
- i % 8 == 0 ? "\n " : " ");
- printk("\nglobal mask:\n ");
- for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--)
- printk("%0*"PRI_xen_ulong"%s",
- (int)(sizeof(sh->evtchn_mask[0])*2),
- sh->evtchn_mask[i],
- i % 8 == 0 ? "\n " : " ");
+ printk("\npending:\n ");
+ for (i = ARRAY_SIZE(sh->evtchn_pending)-1; i >= 0; i--)
+ printk("%0*"PRI_xen_ulong"%s",
+ (int)sizeof(sh->evtchn_pending[0])*2,
+ sh->evtchn_pending[i],
+ i % 8 == 0 ? "\n " : " ");
+ printk("\nglobal mask:\n ");
+ for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--)
+ printk("%0*"PRI_xen_ulong"%s",
+ (int)(sizeof(sh->evtchn_mask[0])*2),
+ sh->evtchn_mask[i],
+ i % 8 == 0 ? "\n " : " ");

- printk("\nglobally unmasked:\n ");
- for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--)
- printk("%0*"PRI_xen_ulong"%s",
- (int)(sizeof(sh->evtchn_mask[0])*2),
- sh->evtchn_pending[i] & ~sh->evtchn_mask[i],
- i % 8 == 0 ? "\n " : " ");
+ printk("\nglobally unmasked:\n ");
+ for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--)
+ printk("%0*"PRI_xen_ulong"%s",
+ (int)(sizeof(sh->evtchn_mask[0])*2),
+ sh->evtchn_pending[i] & ~sh->evtchn_mask[i],
+ i % 8 == 0 ? "\n " : " ");
+ printk("\nlocal cpu%d mask:\n ", cpu);
+ for (i = (EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD)-1; i >= 0; i--)
+ printk("%0*"PRI_xen_ulong"%s", (int)(sizeof(cpu_evtchn[0])*2),
+ cpu_evtchn[i],
+ i % 8 == 0 ? "\n " : " ");

- printk("\nlocal cpu%d mask:\n ", cpu);
- for (i = (EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD)-1; i >= 0; i--)
- printk("%0*"PRI_xen_ulong"%s", (int)(sizeof(cpu_evtchn[0])*2),
- cpu_evtchn[i],
- i % 8 == 0 ? "\n " : " ");
+ printk("\nlocally unmasked:\n ");
+ for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--) {
+ xen_ulong_t pending = sh->evtchn_pending[i]
+ & ~sh->evtchn_mask[i]
+ & cpu_evtchn[i];
+ printk("%0*"PRI_xen_ulong"%s",
+ (int)(sizeof(sh->evtchn_mask[0])*2),
+ pending, i % 8 == 0 ? "\n " : " ");
+ }

- printk("\nlocally unmasked:\n ");
- for (i = ARRAY_SIZE(sh->evtchn_mask)-1; i >= 0; i--) {
- xen_ulong_t pending = sh->evtchn_pending[i]
- & ~sh->evtchn_mask[i]
- & cpu_evtchn[i];
- printk("%0*"PRI_xen_ulong"%s",
- (int)(sizeof(sh->evtchn_mask[0])*2),
- pending, i % 8 == 0 ? "\n " : " ");
- }
-
- printk("\npending list:\n");
- for (i = 0; i < EVTCHN_2L_NR_CHANNELS; i++) {
- if (sync_test_bit(i, BM(sh->evtchn_pending))) {
- int word_idx = i / BITS_PER_EVTCHN_WORD;
- printk(" %d: event %d -> irq %d%s%s%s\n",
- cpu_from_evtchn(i), i,
- get_evtchn_to_irq(i),
- sync_test_bit(word_idx, BM(&v->evtchn_pending_sel))
- ? "" : " l2-clear",
- !sync_test_bit(i, BM(sh->evtchn_mask))
- ? "" : " globally-masked",
- sync_test_bit(i, BM(cpu_evtchn))
- ? "" : " locally-masked");
+ printk("\npending list:\n");
+ for (i = 0; i < EVTCHN_2L_NR_CHANNELS; i++) {
+ if (sync_test_bit(i, BM(sh->evtchn_pending))) {
+ int word_idx = i / BITS_PER_EVTCHN_WORD;
+ printk(" %d: event %d -> irq %d%s%s%s\n",
+ cpu_from_evtchn(*xh, i), i,
+ get_evtchn_to_irq(*xh, i),
+ sync_test_bit(word_idx, BM(&v->evtchn_pending_sel))
+ ? "" : " l2-clear",
+ !sync_test_bit(i, BM(sh->evtchn_mask))
+ ? "" : " globally-masked",
+ sync_test_bit(i, BM(cpu_evtchn))
+ ? "" : " locally-masked");
+ }
}
}

@@ -346,12 +352,12 @@ irqreturn_t xen_debug_interrupt(int irq, void *dev_id)
return IRQ_HANDLED;
}

-static void evtchn_2l_resume(void)
+static void evtchn_2l_resume(xenhost_t *xh)
{
int i;

for_each_online_cpu(i)
- memset(per_cpu(cpu_evtchn_mask, i), 0, sizeof(xen_ulong_t) *
+ memset(per_cpu(cpu_evtchn_mask, i)[xh - xenhosts], 0, sizeof(xen_ulong_t) *
EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
}

@@ -369,8 +375,8 @@ static const struct evtchn_ops evtchn_ops_2l = {
.resume = evtchn_2l_resume,
};

-void __init xen_evtchn_2l_init(void)
+void xen_evtchn_2l_init(xenhost_t *xh)
{
pr_info("Using 2-level ABI\n");
- evtchn_ops = &evtchn_ops_2l;
+ xh->evtchn_ops = &evtchn_ops_2l;
}
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index ae497876fe41..99b6b2c57d23 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -77,15 +77,14 @@ static DEFINE_PER_CPU(int [NR_VIRQS], virq_to_irq) = {[0 ... NR_VIRQS-1] = -1};
/* IRQ <-> IPI mapping */
static DEFINE_PER_CPU(int [XEN_NR_IPIS], ipi_to_irq) = {[0 ... XEN_NR_IPIS-1] = -1};

-int **evtchn_to_irq;
#ifdef CONFIG_X86
static unsigned long *pirq_eoi_map;
#endif
static bool (*pirq_needs_eoi)(unsigned irq);

-#define EVTCHN_ROW(e) (e / (PAGE_SIZE/sizeof(**evtchn_to_irq)))
-#define EVTCHN_COL(e) (e % (PAGE_SIZE/sizeof(**evtchn_to_irq)))
-#define EVTCHN_PER_ROW (PAGE_SIZE / sizeof(**evtchn_to_irq))
+#define EVTCHN_ROW(xh, e) (e / (PAGE_SIZE/sizeof(**((xh)->evtchn_to_irq))))
+#define EVTCHN_COL(xh, e) (e % (PAGE_SIZE/sizeof(**((xh)->evtchn_to_irq))))
+#define EVTCHN_PER_ROW(xh) (PAGE_SIZE / sizeof(**((xh)->evtchn_to_irq)))

/* Xen will never allocate port zero for any purpose. */
#define VALID_EVTCHN(chn) ((chn) != 0)
@@ -96,59 +95,62 @@ static struct irq_chip xen_pirq_chip;
static void enable_dynirq(struct irq_data *data);
static void disable_dynirq(struct irq_data *data);

-static void clear_evtchn_to_irq_row(unsigned row)
+static void clear_evtchn_to_irq_row(xenhost_t *xh, unsigned row)
{
unsigned col;

- for (col = 0; col < EVTCHN_PER_ROW; col++)
- evtchn_to_irq[row][col] = -1;
+ for (col = 0; col < EVTCHN_PER_ROW(xh); col++)
+ xh->evtchn_to_irq[row][col] = -1;
}

static void clear_evtchn_to_irq_all(void)
{
unsigned row;
+ xenhost_t **xh;

- for (row = 0; row < EVTCHN_ROW(xen_evtchn_max_channels()); row++) {
- if (evtchn_to_irq[row] == NULL)
- continue;
- clear_evtchn_to_irq_row(row);
+ for_each_xenhost(xh) {
+ for (row = 0; row < EVTCHN_ROW(*xh, xen_evtchn_max_channels(*xh)); row++) {
+ if ((*xh)->evtchn_to_irq[row] == NULL)
+ continue;
+ clear_evtchn_to_irq_row(*xh, row);
+ }
}
}

-static int set_evtchn_to_irq(unsigned evtchn, unsigned irq)
+static int set_evtchn_to_irq(xenhost_t *xh, unsigned evtchn, unsigned irq)
{
unsigned row;
unsigned col;

- if (evtchn >= xen_evtchn_max_channels())
+ if (evtchn >= xen_evtchn_max_channels(xh))
return -EINVAL;

- row = EVTCHN_ROW(evtchn);
- col = EVTCHN_COL(evtchn);
+ row = EVTCHN_ROW(xh, evtchn);
+ col = EVTCHN_COL(xh, evtchn);

- if (evtchn_to_irq[row] == NULL) {
+ if (xh->evtchn_to_irq[row] == NULL) {
/* Unallocated irq entries return -1 anyway */
if (irq == -1)
return 0;

- evtchn_to_irq[row] = (int *)get_zeroed_page(GFP_KERNEL);
- if (evtchn_to_irq[row] == NULL)
+ xh->evtchn_to_irq[row] = (int *)get_zeroed_page(GFP_KERNEL);
+ if (xh->evtchn_to_irq[row] == NULL)
return -ENOMEM;

- clear_evtchn_to_irq_row(row);
+ clear_evtchn_to_irq_row(xh, row);
}

- evtchn_to_irq[row][col] = irq;
+ xh->evtchn_to_irq[row][col] = irq;
return 0;
}

-int get_evtchn_to_irq(unsigned evtchn)
+int get_evtchn_to_irq(xenhost_t *xh, unsigned evtchn)
{
- if (evtchn >= xen_evtchn_max_channels())
+ if (evtchn >= xen_evtchn_max_channels(xh))
return -1;
- if (evtchn_to_irq[EVTCHN_ROW(evtchn)] == NULL)
+ if (xh->evtchn_to_irq[EVTCHN_ROW(xh, evtchn)] == NULL)
return -1;
- return evtchn_to_irq[EVTCHN_ROW(evtchn)][EVTCHN_COL(evtchn)];
+ return xh->evtchn_to_irq[EVTCHN_ROW(xh, evtchn)][EVTCHN_COL(xh, evtchn)];
}

/* Get info for IRQ */
@@ -159,6 +161,7 @@ struct irq_info *info_for_irq(unsigned irq)

/* Constructors for packed IRQ information. */
static int xen_irq_info_common_setup(struct irq_info *info,
+ xenhost_t *xh,
unsigned irq,
enum xen_irq_type type,
unsigned evtchn,
@@ -173,7 +176,7 @@ static int xen_irq_info_common_setup(struct irq_info *info,
info->evtchn = evtchn;
info->cpu = cpu;

- ret = set_evtchn_to_irq(evtchn, irq);
+ ret = set_evtchn_to_irq(xh, evtchn, irq);
if (ret < 0)
return ret;

@@ -182,29 +185,34 @@ static int xen_irq_info_common_setup(struct irq_info *info,
return xen_evtchn_port_setup(info);
}

-static int xen_irq_info_evtchn_setup(unsigned irq,
+static int xen_irq_info_evtchn_setup(xenhost_t *xh,
+ unsigned irq,
unsigned evtchn)
{
struct irq_info *info = info_for_irq(irq);

- return xen_irq_info_common_setup(info, irq, IRQT_EVTCHN, evtchn, 0);
+ return xen_irq_info_common_setup(info, xh, irq, IRQT_EVTCHN, evtchn, 0);
}

-static int xen_irq_info_ipi_setup(unsigned cpu,
+static int xen_irq_info_ipi_setup(xenhost_t *xh,
+ unsigned cpu,
unsigned irq,
unsigned evtchn,
enum ipi_vector ipi)
{
struct irq_info *info = info_for_irq(irq);

+ BUG_ON(xh->type != xenhost_r1);
+
info->u.ipi = ipi;

per_cpu(ipi_to_irq, cpu)[ipi] = irq;

- return xen_irq_info_common_setup(info, irq, IRQT_IPI, evtchn, 0);
+ return xen_irq_info_common_setup(info, xh, irq, IRQT_IPI, evtchn, 0);
}

-static int xen_irq_info_virq_setup(unsigned cpu,
+static int xen_irq_info_virq_setup(xenhost_t *xh,
+ unsigned cpu,
unsigned irq,
unsigned evtchn,
unsigned virq)
@@ -215,10 +223,11 @@ static int xen_irq_info_virq_setup(unsigned cpu,

per_cpu(virq_to_irq, cpu)[virq] = irq;

- return xen_irq_info_common_setup(info, irq, IRQT_VIRQ, evtchn, 0);
+ return xen_irq_info_common_setup(info, xh, irq, IRQT_VIRQ, evtchn, 0);
}

-static int xen_irq_info_pirq_setup(unsigned irq,
+static int xen_irq_info_pirq_setup(xenhost_t *xh,
+ unsigned irq,
unsigned evtchn,
unsigned pirq,
unsigned gsi,
@@ -232,12 +241,12 @@ static int xen_irq_info_pirq_setup(unsigned irq,
info->u.pirq.domid = domid;
info->u.pirq.flags = flags;

- return xen_irq_info_common_setup(info, irq, IRQT_PIRQ, evtchn, 0);
+ return xen_irq_info_common_setup(info, xh, irq, IRQT_PIRQ, evtchn, 0);
}

static void xen_irq_info_cleanup(struct irq_info *info)
{
- set_evtchn_to_irq(info->evtchn, -1);
+ set_evtchn_to_irq(info->xh, info->evtchn, -1);
info->evtchn = 0;
}

@@ -252,9 +261,9 @@ unsigned int evtchn_from_irq(unsigned irq)
return info_for_irq(irq)->evtchn;
}

-unsigned irq_from_evtchn(unsigned int evtchn)
+unsigned irq_from_evtchn(xenhost_t *xh, unsigned int evtchn)
{
- return get_evtchn_to_irq(evtchn);
+ return get_evtchn_to_irq(xh, evtchn);
}
EXPORT_SYMBOL_GPL(irq_from_evtchn);

@@ -303,9 +312,9 @@ unsigned cpu_from_irq(unsigned irq)
return info_for_irq(irq)->cpu;
}

-unsigned int cpu_from_evtchn(unsigned int evtchn)
+unsigned int cpu_from_evtchn(xenhost_t *xh, unsigned int evtchn)
{
- int irq = get_evtchn_to_irq(evtchn);
+ int irq = get_evtchn_to_irq(xh, evtchn);
unsigned ret = 0;

if (irq != -1)
@@ -329,9 +338,9 @@ static bool pirq_needs_eoi_flag(unsigned irq)
return info->u.pirq.flags & PIRQ_NEEDS_EOI;
}

-static void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu)
+static void bind_evtchn_to_cpu(xenhost_t *xh, unsigned int chn, unsigned int cpu)
{
- int irq = get_evtchn_to_irq(chn);
+ int irq = get_evtchn_to_irq(xh, chn);
struct irq_info *info = info_for_irq(irq);

BUG_ON(irq == -1);
@@ -356,11 +365,11 @@ void notify_remote_via_irq(int irq)
int evtchn = evtchn_from_irq(irq);

if (VALID_EVTCHN(evtchn))
- notify_remote_via_evtchn(evtchn);
+ notify_remote_via_evtchn(info_for_irq(irq)->xh, evtchn);
}
EXPORT_SYMBOL_GPL(notify_remote_via_irq);

-static void xen_irq_init(unsigned irq)
+static void xen_irq_init(xenhost_t *xh, unsigned irq)
{
struct irq_info *info;
#ifdef CONFIG_SMP
@@ -374,31 +383,32 @@ static void xen_irq_init(unsigned irq)

info->type = IRQT_UNBOUND;
info->refcnt = -1;
+ info->xh = xh;

irq_set_handler_data(irq, info);

list_add_tail(&info->list, &xen_irq_list_head);
}

-static int __must_check xen_allocate_irqs_dynamic(int nvec)
+static int __must_check xen_allocate_irqs_dynamic(xenhost_t *xh, int nvec)
{
int i, irq = irq_alloc_descs(-1, 0, nvec, -1);

if (irq >= 0) {
for (i = 0; i < nvec; i++)
- xen_irq_init(irq + i);
+ xen_irq_init(xh, irq + i);
}

return irq;
}

-static inline int __must_check xen_allocate_irq_dynamic(void)
+static inline int __must_check xen_allocate_irq_dynamic(xenhost_t *xh)
{

- return xen_allocate_irqs_dynamic(1);
+ return xen_allocate_irqs_dynamic(xh, 1);
}

-static int __must_check xen_allocate_irq_gsi(unsigned gsi)
+static int __must_check xen_allocate_irq_gsi(xenhost_t *xh, unsigned gsi)
{
int irq;

@@ -409,7 +419,7 @@ static int __must_check xen_allocate_irq_gsi(unsigned gsi)
* space.
*/
if (xen_pv_domain() && !xen_initial_domain())
- return xen_allocate_irq_dynamic();
+ return xen_allocate_irq_dynamic(xh);

/* Legacy IRQ descriptors are already allocated by the arch. */
if (gsi < nr_legacy_irqs())
@@ -417,7 +427,7 @@ static int __must_check xen_allocate_irq_gsi(unsigned gsi)
else
irq = irq_alloc_desc_at(gsi, -1);

- xen_irq_init(irq);
+ xen_irq_init(xh, irq);

return irq;
}
@@ -444,12 +454,12 @@ static void xen_free_irq(unsigned irq)
irq_free_desc(irq);
}

-static void xen_evtchn_close(unsigned int port)
+static void xen_evtchn_close(xenhost_t *xh, unsigned int port)
{
struct evtchn_close close;

close.port = port;
- if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+ if (hypervisor_event_channel_op(xh, EVTCHNOP_close, &close) != 0)
BUG();
}

@@ -473,6 +483,7 @@ static void eoi_pirq(struct irq_data *data)
{
int evtchn = evtchn_from_irq(data->irq);
struct physdev_eoi eoi = { .irq = pirq_from_irq(data->irq) };
+ xenhost_t *xh = info_for_irq(data->irq)->xh;
int rc = 0;

if (!VALID_EVTCHN(evtchn))
@@ -480,16 +491,16 @@ static void eoi_pirq(struct irq_data *data)

if (unlikely(irqd_is_setaffinity_pending(data)) &&
likely(!irqd_irq_disabled(data))) {
- int masked = test_and_set_mask(evtchn);
+ int masked = test_and_set_mask(xh, evtchn);

- clear_evtchn(evtchn);
+ clear_evtchn(xh, evtchn);

irq_move_masked_irq(data);

if (!masked)
- unmask_evtchn(evtchn);
+ unmask_evtchn(xh, evtchn);
} else
- clear_evtchn(evtchn);
+ clear_evtchn(xh, evtchn);

if (pirq_needs_eoi(data->irq)) {
rc = HYPERVISOR_physdev_op(PHYSDEVOP_eoi, &eoi);
@@ -519,7 +530,7 @@ static unsigned int __startup_pirq(unsigned int irq)
/* NB. We are happy to share unless we are probing. */
bind_pirq.flags = info->u.pirq.flags & PIRQ_SHAREABLE ?
BIND_PIRQ__WILL_SHARE : 0;
- rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq);
+ rc = hypervisor_event_channel_op(info->xh, EVTCHNOP_bind_pirq, &bind_pirq);
if (rc != 0) {
pr_warn("Failed to obtain physical IRQ %d\n", irq);
return 0;
@@ -528,26 +539,26 @@ static unsigned int __startup_pirq(unsigned int irq)

pirq_query_unmask(irq);

- rc = set_evtchn_to_irq(evtchn, irq);
+ rc = set_evtchn_to_irq(info->xh, evtchn, irq);
if (rc)
goto err;

info->evtchn = evtchn;
- bind_evtchn_to_cpu(evtchn, 0);
+ bind_evtchn_to_cpu(info->xh, evtchn, 0);

rc = xen_evtchn_port_setup(info);
if (rc)
goto err;

out:
- unmask_evtchn(evtchn);
+ unmask_evtchn(info->xh, evtchn);
eoi_pirq(irq_get_irq_data(irq));

return 0;

err:
pr_err("irq%d: Failed to set port to irq mapping (%d)\n", irq, rc);
- xen_evtchn_close(evtchn);
+ xen_evtchn_close(info->xh, evtchn);
return 0;
}

@@ -567,8 +578,8 @@ static void shutdown_pirq(struct irq_data *data)
if (!VALID_EVTCHN(evtchn))
return;

- mask_evtchn(evtchn);
- xen_evtchn_close(evtchn);
+ mask_evtchn(info->xh, evtchn);
+ xen_evtchn_close(info->xh, evtchn);
xen_irq_info_cleanup(info);
}

@@ -612,7 +623,7 @@ static void __unbind_from_irq(unsigned int irq)
if (VALID_EVTCHN(evtchn)) {
unsigned int cpu = cpu_from_irq(irq);

- xen_evtchn_close(evtchn);
+ xen_evtchn_close(info->xh, evtchn);

switch (type_from_irq(irq)) {
case IRQT_VIRQ:
@@ -641,13 +652,15 @@ static void __unbind_from_irq(unsigned int irq)
* Shareable implies level triggered, not shareable implies edge
* triggered here.
*/
-int xen_bind_pirq_gsi_to_irq(unsigned gsi,
+int xen_bind_pirq_gsi_to_irq(xenhost_t *xh, unsigned gsi,
unsigned pirq, int shareable, char *name)
{
int irq = -1;
struct physdev_irq irq_op;
int ret;

+ BUG_ON(xh->type != xenhost_r1);
+
mutex_lock(&irq_mapping_update_lock);

irq = xen_irq_from_gsi(gsi);
@@ -657,7 +670,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi,
goto out;
}

- irq = xen_allocate_irq_gsi(gsi);
+ irq = xen_allocate_irq_gsi(xh, gsi);
if (irq < 0)
goto out;

@@ -668,13 +681,13 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi,
* driver provides a PCI bus that does the call to do exactly
* this in the priv domain. */
if (xen_initial_domain() &&
- HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
+ hypervisor_physdev_op(xh, PHYSDEVOP_alloc_irq_vector, &irq_op)) {
xen_free_irq(irq);
irq = -ENOSPC;
goto out;
}

- ret = xen_irq_info_pirq_setup(irq, 0, pirq, gsi, DOMID_SELF,
+ ret = xen_irq_info_pirq_setup(xh, irq, 0, pirq, gsi, DOMID_SELF,
shareable ? PIRQ_SHAREABLE : 0);
if (ret < 0) {
__unbind_from_irq(irq);
@@ -712,13 +725,13 @@ int xen_bind_pirq_gsi_to_irq(unsigned gsi,
}

#ifdef CONFIG_PCI_MSI
-int xen_allocate_pirq_msi(struct pci_dev *dev, struct msi_desc *msidesc)
+int xen_allocate_pirq_msi(xenhost_t *xh, struct pci_dev *dev, struct msi_desc *msidesc)
{
int rc;
struct physdev_get_free_pirq op_get_free_pirq;

op_get_free_pirq.type = MAP_PIRQ_TYPE_MSI;
- rc = HYPERVISOR_physdev_op(PHYSDEVOP_get_free_pirq, &op_get_free_pirq);
+ rc = hypervisor_physdev_op(xh, PHYSDEVOP_get_free_pirq, &op_get_free_pirq);

WARN_ONCE(rc == -ENOSYS,
"hypervisor does not support the PHYSDEVOP_get_free_pirq interface\n");
@@ -726,21 +739,21 @@ int xen_allocate_pirq_msi(struct pci_dev *dev, struct msi_desc *msidesc)
return rc ? -1 : op_get_free_pirq.pirq;
}

-int xen_bind_pirq_msi_to_irq(struct pci_dev *dev, struct msi_desc *msidesc,
+int xen_bind_pirq_msi_to_irq(xenhost_t *xh, struct pci_dev *dev, struct msi_desc *msidesc,
int pirq, int nvec, const char *name, domid_t domid)
{
int i, irq, ret;

mutex_lock(&irq_mapping_update_lock);

- irq = xen_allocate_irqs_dynamic(nvec);
+ irq = xen_allocate_irqs_dynamic(xh, nvec);
if (irq < 0)
goto out;

for (i = 0; i < nvec; i++) {
irq_set_chip_and_handler_name(irq + i, &xen_pirq_chip, handle_edge_irq, name);

- ret = xen_irq_info_pirq_setup(irq + i, 0, pirq + i, 0, domid,
+ ret = xen_irq_info_pirq_setup(xh, irq + i, 0, pirq + i, 0, domid,
i == 0 ? 0 : PIRQ_MSI_GROUP);
if (ret < 0)
goto error_irq;
@@ -776,7 +789,7 @@ int xen_destroy_irq(int irq)
if (xen_initial_domain() && !(info->u.pirq.flags & PIRQ_MSI_GROUP)) {
unmap_irq.pirq = info->u.pirq.pirq;
unmap_irq.domid = info->u.pirq.domid;
- rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
+ rc = hypervisor_physdev_op(info->xh, PHYSDEVOP_unmap_pirq, &unmap_irq);
/* If another domain quits without making the pci_disable_msix
* call, the Xen hypervisor takes care of freeing the PIRQs
* (free_domain_pirqs).
@@ -826,34 +839,34 @@ int xen_pirq_from_irq(unsigned irq)
}
EXPORT_SYMBOL_GPL(xen_pirq_from_irq);

-int bind_evtchn_to_irq(unsigned int evtchn)
+int bind_evtchn_to_irq(xenhost_t *xh, unsigned int evtchn)
{
int irq;
int ret;

- if (evtchn >= xen_evtchn_max_channels())
+ if (evtchn >= xen_evtchn_max_channels(xh))
return -ENOMEM;

mutex_lock(&irq_mapping_update_lock);

- irq = get_evtchn_to_irq(evtchn);
+ irq = get_evtchn_to_irq(xh, evtchn);

if (irq == -1) {
- irq = xen_allocate_irq_dynamic();
+ irq = xen_allocate_irq_dynamic(xh);
if (irq < 0)
goto out;

irq_set_chip_and_handler_name(irq, &xen_dynamic_chip,
handle_edge_irq, "event");

- ret = xen_irq_info_evtchn_setup(irq, evtchn);
+ ret = xen_irq_info_evtchn_setup(xh, irq, evtchn);
if (ret < 0) {
__unbind_from_irq(irq);
irq = ret;
goto out;
}
/* New interdomain events are bound to VCPU 0. */
- bind_evtchn_to_cpu(evtchn, 0);
+ bind_evtchn_to_cpu(xh, evtchn, 0);
} else {
struct irq_info *info = info_for_irq(irq);
WARN_ON(info == NULL || info->type != IRQT_EVTCHN);
@@ -866,37 +879,39 @@ int bind_evtchn_to_irq(unsigned int evtchn)
}
EXPORT_SYMBOL_GPL(bind_evtchn_to_irq);

-static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
+static int bind_ipi_to_irq(xenhost_t *xh, unsigned int ipi, unsigned int cpu)
{
struct evtchn_bind_ipi bind_ipi;
int evtchn, irq;
int ret;

+ BUG_ON(xh->type == xenhost_r2);
+
mutex_lock(&irq_mapping_update_lock);

irq = per_cpu(ipi_to_irq, cpu)[ipi];

if (irq == -1) {
- irq = xen_allocate_irq_dynamic();
+ irq = xen_allocate_irq_dynamic(xh);
if (irq < 0)
goto out;

irq_set_chip_and_handler_name(irq, &xen_percpu_chip,
handle_percpu_irq, "ipi");

- bind_ipi.vcpu = xen_vcpu_nr(xh_default, cpu);
- if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_ipi,
+ bind_ipi.vcpu = xen_vcpu_nr(xh, cpu);
+ if (hypervisor_event_channel_op(xh, EVTCHNOP_bind_ipi,
&bind_ipi) != 0)
BUG();
evtchn = bind_ipi.port;

- ret = xen_irq_info_ipi_setup(cpu, irq, evtchn, ipi);
+ ret = xen_irq_info_ipi_setup(xh, cpu, irq, evtchn, ipi);
if (ret < 0) {
__unbind_from_irq(irq);
irq = ret;
goto out;
}
- bind_evtchn_to_cpu(evtchn, cpu);
+ bind_evtchn_to_cpu(xh, evtchn, cpu);
} else {
struct irq_info *info = info_for_irq(irq);
WARN_ON(info == NULL || info->type != IRQT_IPI);
@@ -907,7 +922,7 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
return irq;
}

-int bind_interdomain_evtchn_to_irq(unsigned int remote_domain,
+int bind_interdomain_evtchn_to_irq(xenhost_t *xh, unsigned int remote_domain,
unsigned int remote_port)
{
struct evtchn_bind_interdomain bind_interdomain;
@@ -916,28 +931,28 @@ int bind_interdomain_evtchn_to_irq(unsigned int remote_domain,
bind_interdomain.remote_dom = remote_domain;
bind_interdomain.remote_port = remote_port;

- err = HYPERVISOR_event_channel_op(EVTCHNOP_bind_interdomain,
+ err = hypervisor_event_channel_op(xh, EVTCHNOP_bind_interdomain,
&bind_interdomain);

- return err ? : bind_evtchn_to_irq(bind_interdomain.local_port);
+ return err ? : bind_evtchn_to_irq(xh, bind_interdomain.local_port);
}
EXPORT_SYMBOL_GPL(bind_interdomain_evtchn_to_irq);

-static int find_virq(unsigned int virq, unsigned int cpu)
+static int find_virq(xenhost_t *xh, unsigned int virq, unsigned int cpu)
{
struct evtchn_status status;
int port, rc = -ENOENT;

memset(&status, 0, sizeof(status));
- for (port = 0; port < xen_evtchn_max_channels(); port++) {
+ for (port = 0; port < xen_evtchn_max_channels(xh); port++) {
status.dom = DOMID_SELF;
status.port = port;
- rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
+ rc = hypervisor_event_channel_op(xh, EVTCHNOP_status, &status);
if (rc < 0)
continue;
if (status.status != EVTCHNSTAT_virq)
continue;
- if (status.u.virq == virq && status.vcpu == xen_vcpu_nr(xh_default, cpu)) {
+ if (status.u.virq == virq && status.vcpu == xen_vcpu_nr(xh, cpu)) {
rc = port;
break;
}
@@ -952,13 +967,13 @@ static int find_virq(unsigned int virq, unsigned int cpu)
* hypervisor ABI. Use xen_evtchn_max_channels() for the maximum
* supported.
*/
-unsigned xen_evtchn_nr_channels(void)
+unsigned xen_evtchn_nr_channels(xenhost_t *xh)
{
- return evtchn_ops->nr_channels();
+ return evtchn_ops->nr_channels(xh);
}
EXPORT_SYMBOL_GPL(xen_evtchn_nr_channels);

-int bind_virq_to_irq(unsigned int virq, unsigned int cpu, bool percpu)
+int bind_virq_to_irq(xenhost_t *xh, unsigned int virq, unsigned int cpu, bool percpu)
{
struct evtchn_bind_virq bind_virq;
int evtchn, irq, ret;
@@ -968,7 +983,7 @@ int bind_virq_to_irq(unsigned int virq, unsigned int cpu, bool percpu)
irq = per_cpu(virq_to_irq, cpu)[virq];

if (irq == -1) {
- irq = xen_allocate_irq_dynamic();
+ irq = xen_allocate_irq_dynamic(xh);
if (irq < 0)
goto out;

@@ -980,26 +995,26 @@ int bind_virq_to_irq(unsigned int virq, unsigned int cpu, bool percpu)
handle_edge_irq, "virq");

bind_virq.virq = virq;
- bind_virq.vcpu = xen_vcpu_nr(xh_default, cpu);
- ret = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
+ bind_virq.vcpu = xen_vcpu_nr(xh, cpu);
+ ret = hypervisor_event_channel_op(xh, EVTCHNOP_bind_virq,
&bind_virq);
if (ret == 0)
evtchn = bind_virq.port;
else {
if (ret == -EEXIST)
- ret = find_virq(virq, cpu);
+ ret = find_virq(xh, virq, cpu);
BUG_ON(ret < 0);
evtchn = ret;
}

- ret = xen_irq_info_virq_setup(cpu, irq, evtchn, virq);
+ ret = xen_irq_info_virq_setup(xh, cpu, irq, evtchn, virq);
if (ret < 0) {
__unbind_from_irq(irq);
irq = ret;
goto out;
}

- bind_evtchn_to_cpu(evtchn, cpu);
+ bind_evtchn_to_cpu(xh, evtchn, cpu);
} else {
struct irq_info *info = info_for_irq(irq);
WARN_ON(info == NULL || info->type != IRQT_VIRQ);
@@ -1018,14 +1033,15 @@ static void unbind_from_irq(unsigned int irq)
mutex_unlock(&irq_mapping_update_lock);
}

-int bind_evtchn_to_irqhandler(unsigned int evtchn,
+int bind_evtchn_to_irqhandler(xenhost_t *xh,
+ unsigned int evtchn,
irq_handler_t handler,
unsigned long irqflags,
const char *devname, void *dev_id)
{
int irq, retval;

- irq = bind_evtchn_to_irq(evtchn);
+ irq = bind_evtchn_to_irq(xh, evtchn);
if (irq < 0)
return irq;
retval = request_irq(irq, handler, irqflags, devname, dev_id);
@@ -1038,7 +1054,8 @@ int bind_evtchn_to_irqhandler(unsigned int evtchn,
}
EXPORT_SYMBOL_GPL(bind_evtchn_to_irqhandler);

-int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
+int bind_interdomain_evtchn_to_irqhandler(xenhost_t *xh,
+ unsigned int remote_domain,
unsigned int remote_port,
irq_handler_t handler,
unsigned long irqflags,
@@ -1047,7 +1064,7 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
{
int irq, retval;

- irq = bind_interdomain_evtchn_to_irq(remote_domain, remote_port);
+ irq = bind_interdomain_evtchn_to_irq(xh, remote_domain, remote_port);
if (irq < 0)
return irq;

@@ -1061,13 +1078,14 @@ int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
}
EXPORT_SYMBOL_GPL(bind_interdomain_evtchn_to_irqhandler);

-int bind_virq_to_irqhandler(unsigned int virq, unsigned int cpu,
+int bind_virq_to_irqhandler(xenhost_t *xh,
+ unsigned int virq, unsigned int cpu,
irq_handler_t handler,
unsigned long irqflags, const char *devname, void *dev_id)
{
int irq, retval;

- irq = bind_virq_to_irq(virq, cpu, irqflags & IRQF_PERCPU);
+ irq = bind_virq_to_irq(xh, virq, cpu, irqflags & IRQF_PERCPU);
if (irq < 0)
return irq;
retval = request_irq(irq, handler, irqflags, devname, dev_id);
@@ -1080,7 +1098,8 @@ int bind_virq_to_irqhandler(unsigned int virq, unsigned int cpu,
}
EXPORT_SYMBOL_GPL(bind_virq_to_irqhandler);

-int bind_ipi_to_irqhandler(enum ipi_vector ipi,
+int bind_ipi_to_irqhandler(xenhost_t *xh,
+ enum ipi_vector ipi,
unsigned int cpu,
irq_handler_t handler,
unsigned long irqflags,
@@ -1089,7 +1108,7 @@ int bind_ipi_to_irqhandler(enum ipi_vector ipi,
{
int irq, retval;

- irq = bind_ipi_to_irq(ipi, cpu);
+ irq = bind_ipi_to_irq(xh, ipi, cpu);
if (irq < 0)
return irq;

@@ -1119,21 +1138,21 @@ EXPORT_SYMBOL_GPL(unbind_from_irqhandler);
* @irq:irq bound to an event channel.
* @priority: priority between XEN_IRQ_PRIORITY_MAX and XEN_IRQ_PRIORITY_MIN.
*/
-int xen_set_irq_priority(unsigned irq, unsigned priority)
+int xen_set_irq_priority(xenhost_t *xh, unsigned irq, unsigned priority)
{
struct evtchn_set_priority set_priority;

set_priority.port = evtchn_from_irq(irq);
set_priority.priority = priority;

- return HYPERVISOR_event_channel_op(EVTCHNOP_set_priority,
+ return hypervisor_event_channel_op(xh, EVTCHNOP_set_priority,
&set_priority);
}
EXPORT_SYMBOL_GPL(xen_set_irq_priority);

-int evtchn_make_refcounted(unsigned int evtchn)
+int evtchn_make_refcounted(xenhost_t *xh, unsigned int evtchn)
{
- int irq = get_evtchn_to_irq(evtchn);
+ int irq = get_evtchn_to_irq(xh, evtchn);
struct irq_info *info;

if (irq == -1)
@@ -1152,18 +1171,18 @@ int evtchn_make_refcounted(unsigned int evtchn)
}
EXPORT_SYMBOL_GPL(evtchn_make_refcounted);

-int evtchn_get(unsigned int evtchn)
+int evtchn_get(xenhost_t *xh, unsigned int evtchn)
{
int irq;
struct irq_info *info;
int err = -ENOENT;

- if (evtchn >= xen_evtchn_max_channels())
+ if (evtchn >= xen_evtchn_max_channels(xh))
return -EINVAL;

mutex_lock(&irq_mapping_update_lock);

- irq = get_evtchn_to_irq(evtchn);
+ irq = get_evtchn_to_irq(xh, evtchn);
if (irq == -1)
goto done;

@@ -1185,22 +1204,22 @@ int evtchn_get(unsigned int evtchn)
}
EXPORT_SYMBOL_GPL(evtchn_get);

-void evtchn_put(unsigned int evtchn)
+void evtchn_put(xenhost_t *xh, unsigned int evtchn)
{
- int irq = get_evtchn_to_irq(evtchn);
+ int irq = get_evtchn_to_irq(xh, evtchn);
if (WARN_ON(irq == -1))
return;
unbind_from_irq(irq);
}
EXPORT_SYMBOL_GPL(evtchn_put);

-void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector)
+void xen_send_IPI_one(xenhost_t *xh, unsigned int cpu, enum ipi_vector vector)
{
int irq;

#ifdef CONFIG_X86
if (unlikely(vector == XEN_NMI_VECTOR)) {
- int rc = HYPERVISOR_vcpu_op(VCPUOP_send_nmi, xen_vcpu_nr(xh_default, cpu),
+ int rc = hypervisor_vcpu_op(xh, VCPUOP_send_nmi, xen_vcpu_nr(xh, cpu),
NULL);
if (rc < 0)
printk(KERN_WARNING "Sending nmi to CPU%d failed (rc:%d)\n", cpu, rc);
@@ -1216,23 +1235,26 @@ static DEFINE_PER_CPU(unsigned, xed_nesting_count);

static void __xen_evtchn_do_upcall(void)
{
- struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu);
int cpu = get_cpu();
unsigned count;
+ xenhost_t **xh;

- do {
- vcpu_info->evtchn_upcall_pending = 0;
+ for_each_xenhost(xh) {
+ struct vcpu_info *vcpu_info = (*xh)->xen_vcpu[cpu];
+ do {
+ vcpu_info->evtchn_upcall_pending = 0;

- if (__this_cpu_inc_return(xed_nesting_count) - 1)
- goto out;
+ if (__this_cpu_inc_return(xed_nesting_count) - 1)
+ goto out;

- xen_evtchn_handle_events(cpu);
+ xen_evtchn_handle_events(*xh, cpu);

- BUG_ON(!irqs_disabled());
+ BUG_ON(!irqs_disabled());

- count = __this_cpu_read(xed_nesting_count);
- __this_cpu_write(xed_nesting_count, 0);
- } while (count != 1 || vcpu_info->evtchn_upcall_pending);
+ count = __this_cpu_read(xed_nesting_count);
+ __this_cpu_write(xed_nesting_count, 0);
+ } while (count != 1 || vcpu_info->evtchn_upcall_pending);
+ }

out:

@@ -1275,16 +1297,16 @@ void rebind_evtchn_irq(int evtchn, int irq)
mutex_lock(&irq_mapping_update_lock);

/* After resume the irq<->evtchn mappings are all cleared out */
- BUG_ON(get_evtchn_to_irq(evtchn) != -1);
+ BUG_ON(get_evtchn_to_irq(info->xh, evtchn) != -1);
/* Expect irq to have been bound before,
so there should be a proper type */
BUG_ON(info->type == IRQT_UNBOUND);

- (void)xen_irq_info_evtchn_setup(irq, evtchn);
+ (void)xen_irq_info_evtchn_setup(info->xh, irq, evtchn);

mutex_unlock(&irq_mapping_update_lock);

- bind_evtchn_to_cpu(evtchn, info->cpu);
+ bind_evtchn_to_cpu(info->xh, evtchn, info->cpu);
/* This will be deferred until interrupt is processed */
irq_set_affinity(irq, cpumask_of(info->cpu));

@@ -1293,7 +1315,7 @@ void rebind_evtchn_irq(int evtchn, int irq)
}

/* Rebind an evtchn so that it gets delivered to a specific cpu */
-int xen_rebind_evtchn_to_cpu(int evtchn, unsigned tcpu)
+int xen_rebind_evtchn_to_cpu(xenhost_t *xh, int evtchn, unsigned tcpu)
{
struct evtchn_bind_vcpu bind_vcpu;
int masked;
@@ -1306,24 +1328,24 @@ int xen_rebind_evtchn_to_cpu(int evtchn, unsigned tcpu)

/* Send future instances of this interrupt to other vcpu. */
bind_vcpu.port = evtchn;
- bind_vcpu.vcpu = xen_vcpu_nr(xh_default, tcpu);
+ bind_vcpu.vcpu = xen_vcpu_nr(xh, tcpu);

/*
* Mask the event while changing the VCPU binding to prevent
* it being delivered on an unexpected VCPU.
*/
- masked = test_and_set_mask(evtchn);
+ masked = test_and_set_mask(xh, evtchn);

/*
* If this fails, it usually just indicates that we're dealing with a
* virq or IPI channel, which don't actually need to be rebound. Ignore
* it, but don't do the xenlinux-level rebind in that case.
*/
- if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_vcpu, &bind_vcpu) >= 0)
- bind_evtchn_to_cpu(evtchn, tcpu);
+ if (hypervisor_event_channel_op(xh, EVTCHNOP_bind_vcpu, &bind_vcpu) >= 0)
+ bind_evtchn_to_cpu(xh, evtchn, tcpu);

if (!masked)
- unmask_evtchn(evtchn);
+ unmask_evtchn(xh, evtchn);

return 0;
}
@@ -1333,7 +1355,10 @@ static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest,
bool force)
{
unsigned tcpu = cpumask_first_and(dest, cpu_online_mask);
- int ret = xen_rebind_evtchn_to_cpu(evtchn_from_irq(data->irq), tcpu);
+ xenhost_t *xh = info_for_irq(data->irq)->xh;
+ int ret;
+
+ ret = xen_rebind_evtchn_to_cpu(xh, evtchn_from_irq(data->irq), tcpu);

if (!ret)
irq_data_update_effective_affinity(data, cpumask_of(tcpu));
@@ -1344,38 +1369,41 @@ static int set_affinity_irq(struct irq_data *data, const struct cpumask *dest,
static void enable_dynirq(struct irq_data *data)
{
int evtchn = evtchn_from_irq(data->irq);
+ xenhost_t *xh = info_for_irq(data->irq)->xh;

if (VALID_EVTCHN(evtchn))
- unmask_evtchn(evtchn);
+ unmask_evtchn(xh, evtchn);
}

static void disable_dynirq(struct irq_data *data)
{
int evtchn = evtchn_from_irq(data->irq);
+ xenhost_t *xh = info_for_irq(data->irq)->xh;

if (VALID_EVTCHN(evtchn))
- mask_evtchn(evtchn);
+ mask_evtchn(xh, evtchn);
}

static void ack_dynirq(struct irq_data *data)
{
int evtchn = evtchn_from_irq(data->irq);
+ xenhost_t *xh = info_for_irq(data->irq)->xh;

if (!VALID_EVTCHN(evtchn))
return;

if (unlikely(irqd_is_setaffinity_pending(data)) &&
likely(!irqd_irq_disabled(data))) {
- int masked = test_and_set_mask(evtchn);
+ int masked = test_and_set_mask(xh, evtchn);

- clear_evtchn(evtchn);
+ clear_evtchn(xh, evtchn);

irq_move_masked_irq(data);

if (!masked)
- unmask_evtchn(evtchn);
+ unmask_evtchn(xh, evtchn);
} else
- clear_evtchn(evtchn);
+ clear_evtchn(xh, evtchn);
}

static void mask_ack_dynirq(struct irq_data *data)
@@ -1387,15 +1415,16 @@ static void mask_ack_dynirq(struct irq_data *data)
static int retrigger_dynirq(struct irq_data *data)
{
unsigned int evtchn = evtchn_from_irq(data->irq);
+ xenhost_t *xh = info_for_irq(data->irq)->xh;
int masked;

if (!VALID_EVTCHN(evtchn))
return 0;

- masked = test_and_set_mask(evtchn);
- set_evtchn(evtchn);
+ masked = test_and_set_mask(xh, evtchn);
+ set_evtchn(xh, evtchn);
if (!masked)
- unmask_evtchn(evtchn);
+ unmask_evtchn(xh, evtchn);

return 1;
}
@@ -1442,24 +1471,26 @@ static void restore_cpu_virqs(unsigned int cpu)
{
struct evtchn_bind_virq bind_virq;
int virq, irq, evtchn;
+ xenhost_t *xh;

for (virq = 0; virq < NR_VIRQS; virq++) {
if ((irq = per_cpu(virq_to_irq, cpu)[virq]) == -1)
continue;
+ xh = info_for_irq(irq)->xh;

BUG_ON(virq_from_irq(irq) != virq);

/* Get a new binding from Xen. */
bind_virq.virq = virq;
- bind_virq.vcpu = xen_vcpu_nr(xh_default, cpu);
+ bind_virq.vcpu = xen_vcpu_nr(xh, cpu);
- if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
+ if (hypervisor_event_channel_op(xh, EVTCHNOP_bind_virq,
&bind_virq) != 0)
BUG();
evtchn = bind_virq.port;

/* Record the new mapping. */
- (void)xen_irq_info_virq_setup(cpu, irq, evtchn, virq);
- bind_evtchn_to_cpu(evtchn, cpu);
+ (void)xen_irq_info_virq_setup(xh, cpu, irq, evtchn, virq);
+ bind_evtchn_to_cpu(xh, evtchn, cpu);
}
}

@@ -1467,23 +1498,25 @@ static void restore_cpu_ipis(unsigned int cpu)
{
struct evtchn_bind_ipi bind_ipi;
int ipi, irq, evtchn;
+ xenhost_t *xh;

for (ipi = 0; ipi < XEN_NR_IPIS; ipi++) {
if ((irq = per_cpu(ipi_to_irq, cpu)[ipi]) == -1)
continue;
+ xh = info_for_irq(irq)->xh;

BUG_ON(ipi_from_irq(irq) != ipi);

/* Get a new binding from Xen. */
- bind_ipi.vcpu = xen_vcpu_nr(xh_default, cpu);
- if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_ipi,
+ bind_ipi.vcpu = xen_vcpu_nr(xh, cpu);
+ if (hypervisor_event_channel_op(xh, EVTCHNOP_bind_ipi,
&bind_ipi) != 0)
BUG();
evtchn = bind_ipi.port;

/* Record the new mapping. */
- (void)xen_irq_info_ipi_setup(cpu, irq, evtchn, ipi);
- bind_evtchn_to_cpu(evtchn, cpu);
+ (void)xen_irq_info_ipi_setup(xh, cpu, irq, evtchn, ipi);
+ bind_evtchn_to_cpu(xh, evtchn, cpu);
}
}

@@ -1491,26 +1524,29 @@ static void restore_cpu_ipis(unsigned int cpu)
void xen_clear_irq_pending(int irq)
{
int evtchn = evtchn_from_irq(irq);
+ xenhost_t *xh = info_for_irq(irq)->xh;

if (VALID_EVTCHN(evtchn))
- clear_evtchn(evtchn);
+ clear_evtchn(xh, evtchn);
}
EXPORT_SYMBOL(xen_clear_irq_pending);
void xen_set_irq_pending(int irq)
{
int evtchn = evtchn_from_irq(irq);
+ xenhost_t *xh = info_for_irq(irq)->xh;

if (VALID_EVTCHN(evtchn))
- set_evtchn(evtchn);
+ set_evtchn(xh, evtchn);
}

bool xen_test_irq_pending(int irq)
{
int evtchn = evtchn_from_irq(irq);
+ xenhost_t *xh = info_for_irq(irq)->xh;
bool ret = false;

if (VALID_EVTCHN(evtchn))
- ret = test_evtchn(evtchn);
+ ret = test_evtchn(xh, evtchn);

return ret;
}
@@ -1520,10 +1556,13 @@ bool xen_test_irq_pending(int irq)
void xen_poll_irq_timeout(int irq, u64 timeout)
{
evtchn_port_t evtchn = evtchn_from_irq(irq);
+ xenhost_t *xh = info_for_irq(irq)->xh;

if (VALID_EVTCHN(evtchn)) {
struct sched_poll poll;

+ BUG_ON(xh->type != xenhost_r1);
+
poll.nr_ports = 1;
poll.timeout = timeout;
set_xen_guest_handle(poll.ports, &evtchn);
@@ -1665,26 +1704,30 @@ void xen_callback_vector(void) {}
static bool fifo_events = true;
module_param(fifo_events, bool, 0);

-void __init xen_init_IRQ(void)
+void xen_init_IRQ(xenhost_t *xh)
{
int ret = -EINVAL;
unsigned int evtchn;

- if (fifo_events)
- ret = xen_evtchn_fifo_init();
if (ret < 0)
- xen_evtchn_2l_init();
+ xen_evtchn_2l_init(xh);

- evtchn_to_irq = kcalloc(EVTCHN_ROW(xen_evtchn_max_channels()),
- sizeof(*evtchn_to_irq), GFP_KERNEL);
- BUG_ON(!evtchn_to_irq);
+ xh->evtchn_to_irq = kcalloc(EVTCHN_ROW(xh, xen_evtchn_max_channels(xh)),
+ sizeof(*(xh->evtchn_to_irq)), GFP_KERNEL);
+ BUG_ON(!xh->evtchn_to_irq);

/* No event channels are 'live' right now. */
- for (evtchn = 0; evtchn < xen_evtchn_nr_channels(); evtchn++)
- mask_evtchn(evtchn);
+ for (evtchn = 0; evtchn < xen_evtchn_nr_channels(xh); evtchn++)
+ mask_evtchn(xh, evtchn);

pirq_needs_eoi = pirq_needs_eoi_flag;

+ /*
+ * Callback vectors and HW IRQs are only used for xenhost_r1.
+ */
+ if (xh->type != xenhost_r1)
+ return;
+
#ifdef CONFIG_X86
if (xen_pv_domain()) {
irq_ctx_init(smp_processor_id());
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index eed766219dd0..38ce98f96fbb 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -324,7 +324,7 @@ static void consume_one_event(unsigned cpu,
q->head[priority] = head;
}

-static void __evtchn_fifo_handle_events(unsigned cpu, bool drop)
+static void __evtchn_fifo_handle_events(xenhost_t *xh, unsigned cpu, bool drop)
{
struct evtchn_fifo_control_block *control_block;
unsigned long ready;
diff --git a/drivers/xen/events/events_internal.h b/drivers/xen/events/events_internal.h
index 50c2050a1e32..9293c2593846 100644
--- a/drivers/xen/events/events_internal.h
+++ b/drivers/xen/events/events_internal.h
@@ -21,6 +21,7 @@ enum xen_irq_type {
/*
* Packed IRQ information:
* type - enum xen_irq_type
+ * xh - xenhost_t *
* event channel - irq->event channel mapping
* cpu - cpu this event channel is bound to
* index - type-specific information:
@@ -32,6 +33,7 @@ enum xen_irq_type {
*/
struct irq_info {
struct list_head list;
+ xenhost_t *xh;
int refcnt;
enum xen_irq_type type; /* type */
unsigned irq;
@@ -56,35 +58,32 @@ struct irq_info {
#define PIRQ_MSI_GROUP (1 << 2)

struct evtchn_ops {
- unsigned (*max_channels)(void);
- unsigned (*nr_channels)(void);
+ unsigned (*max_channels)(xenhost_t *xh);
+ unsigned (*nr_channels)(xenhost_t *xh);

int (*setup)(struct irq_info *info);
void (*bind_to_cpu)(struct irq_info *info, unsigned cpu);

- void (*clear_pending)(unsigned port);
- void (*set_pending)(unsigned port);
- bool (*is_pending)(unsigned port);
- bool (*test_and_set_mask)(unsigned port);
- void (*mask)(unsigned port);
- void (*unmask)(unsigned port);
+ void (*clear_pending)(xenhost_t *xh, unsigned port);
+ void (*set_pending)(xenhost_t *xh, unsigned port);
+ bool (*is_pending)(xenhost_t *xh, unsigned port);
+ bool (*test_and_set_mask)(xenhost_t *xh, unsigned port);
+ void (*mask)(xenhost_t *xh, unsigned port);
+ void (*unmask)(xenhost_t *xh, unsigned port);

- void (*handle_events)(unsigned cpu);
- void (*resume)(void);
+ void (*handle_events)(xenhost_t *xh, unsigned cpu);
+ void (*resume)(xenhost_t *xh);
};

-extern const struct evtchn_ops *evtchn_ops;
-
-extern int **evtchn_to_irq;
-int get_evtchn_to_irq(unsigned int evtchn);
+int get_evtchn_to_irq(xenhost_t *xh, unsigned int evtchn);

struct irq_info *info_for_irq(unsigned irq);
unsigned cpu_from_irq(unsigned irq);
-unsigned cpu_from_evtchn(unsigned int evtchn);
+unsigned cpu_from_evtchn(xenhost_t *xh, unsigned int evtchn);

-static inline unsigned xen_evtchn_max_channels(void)
+static inline unsigned xen_evtchn_max_channels(xenhost_t *xh)
{
- return evtchn_ops->max_channels();
+ return xh->evtchn_ops->max_channels(xh);
}

/*
@@ -93,59 +92,62 @@ static inline unsigned xen_evtchn_max_channels(void)
*/
static inline int xen_evtchn_port_setup(struct irq_info *info)
{
- if (evtchn_ops->setup)
- return evtchn_ops->setup(info);
+ if (info->xh->evtchn_ops->setup)
+ return info->xh->evtchn_ops->setup(info);
return 0;
}

static inline void xen_evtchn_port_bind_to_cpu(struct irq_info *info,
unsigned cpu)
{
- evtchn_ops->bind_to_cpu(info, cpu);
+ info->xh->evtchn_ops->bind_to_cpu(info, cpu);
}

-static inline void clear_evtchn(unsigned port)
+static inline void clear_evtchn(xenhost_t *xh, unsigned port)
{
- evtchn_ops->clear_pending(port);
+ xh->evtchn_ops->clear_pending(xh, port);
}

-static inline void set_evtchn(unsigned port)
+static inline void set_evtchn(xenhost_t *xh, unsigned port)
{
- evtchn_ops->set_pending(port);
+ xh->evtchn_ops->set_pending(xh, port);
}

-static inline bool test_evtchn(unsigned port)
+static inline bool test_evtchn(xenhost_t *xh, unsigned port)
{
- return evtchn_ops->is_pending(port);
+ return xh->evtchn_ops->is_pending(xh, port);
}

-static inline bool test_and_set_mask(unsigned port)
+static inline bool test_and_set_mask(xenhost_t *xh, unsigned port)
{
- return evtchn_ops->test_and_set_mask(port);
+ return xh->evtchn_ops->test_and_set_mask(xh, port);
}

-static inline void mask_evtchn(unsigned port)
+static inline void mask_evtchn(xenhost_t *xh, unsigned port)
{
- return evtchn_ops->mask(port);
+ return xh->evtchn_ops->mask(xh, port);
}

-static inline void unmask_evtchn(unsigned port)
+static inline void unmask_evtchn(xenhost_t *xh, unsigned port)
{
- return evtchn_ops->unmask(port);
+ return xh->evtchn_ops->unmask(xh, port);
}

-static inline void xen_evtchn_handle_events(unsigned cpu)
+static inline void xen_evtchn_handle_events(xenhost_t *xh, unsigned cpu)
{
- return evtchn_ops->handle_events(cpu);
+ return xh->evtchn_ops->handle_events(xh, cpu);
}

static inline void xen_evtchn_resume(void)
{
- if (evtchn_ops->resume)
- evtchn_ops->resume();
+ xenhost_t **xh;
+
+ for_each_xenhost(xh)
+ if ((*xh)->evtchn_ops->resume)
+ (*xh)->evtchn_ops->resume(*xh);
}

-void xen_evtchn_2l_init(void);
-int xen_evtchn_fifo_init(void);
+void xen_evtchn_2l_init(xenhost_t *xh);
+int xen_evtchn_fifo_init(xenhost_t *xh);

#endif /* #ifndef __EVENTS_INTERNAL_H__ */
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 66622109f2be..b868816874fd 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -292,7 +292,7 @@ static ssize_t evtchn_write(struct file *file, const char __user *buf,
evtchn = find_evtchn(u, port);
if (evtchn && !evtchn->enabled) {
evtchn->enabled = true;
- enable_irq(irq_from_evtchn(port));
+ enable_irq(irq_from_evtchn(xh_default, port));
}
}

@@ -392,18 +392,18 @@ static int evtchn_bind_to_user(struct per_user_data *u, int port)
if (rc < 0)
goto err;

- rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, 0,
+ rc = bind_evtchn_to_irqhandler(xh_default, port, evtchn_interrupt, 0,
u->name, evtchn);
if (rc < 0)
goto err;

- rc = evtchn_make_refcounted(port);
+ rc = evtchn_make_refcounted(xh_default, port);
return rc;

err:
/* bind failed, should close the port now */
close.port = port;
- if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+ if (hypervisor_event_channel_op(xh_default, EVTCHNOP_close, &close) != 0)
BUG();
del_evtchn(u, evtchn);
return rc;
@@ -412,7 +412,7 @@ static int evtchn_bind_to_user(struct per_user_data *u, int port)
static void evtchn_unbind_from_user(struct per_user_data *u,
struct user_evtchn *evtchn)
{
- int irq = irq_from_evtchn(evtchn->port);
+ int irq = irq_from_evtchn(xh_default, evtchn->port);

BUG_ON(irq < 0);

@@ -429,7 +429,7 @@ static void evtchn_bind_interdom_next_vcpu(int evtchn)
struct irq_desc *desc;
unsigned long flags;

- irq = irq_from_evtchn(evtchn);
+ irq = irq_from_evtchn(xh_default, evtchn);
desc = irq_to_desc(irq);

if (!desc)
@@ -447,7 +447,7 @@ static void evtchn_bind_interdom_next_vcpu(int evtchn)
this_cpu_write(bind_last_selected_cpu, selected_cpu);

/* unmask expects irqs to be disabled */
- xen_rebind_evtchn_to_cpu(evtchn, selected_cpu);
+ xen_rebind_evtchn_to_cpu(xh_default, evtchn, selected_cpu);
raw_spin_unlock_irqrestore(&desc->lock, flags);
}

@@ -549,7 +549,7 @@ static long evtchn_ioctl(struct file *file,
break;

rc = -EINVAL;
- if (unbind.port >= xen_evtchn_nr_channels())
+ if (unbind.port >= xen_evtchn_nr_channels(xh_default))
break;

rc = -ENOTCONN;
@@ -557,7 +557,7 @@ static long evtchn_ioctl(struct file *file,
if (!evtchn)
break;

- disable_irq(irq_from_evtchn(unbind.port));
+ disable_irq(irq_from_evtchn(xh_default, unbind.port));
evtchn_unbind_from_user(u, evtchn);
rc = 0;
break;
@@ -574,7 +574,7 @@ static long evtchn_ioctl(struct file *file,
rc = -ENOTCONN;
evtchn = find_evtchn(u, notify.port);
if (evtchn) {
- notify_remote_via_evtchn(notify.port);
+ notify_remote_via_evtchn(xh_default, notify.port);
rc = 0;
}
break;
@@ -676,7 +676,7 @@ static int evtchn_release(struct inode *inode, struct file *filp)
struct user_evtchn *evtchn;

evtchn = rb_entry(node, struct user_evtchn, node);
- disable_irq(irq_from_evtchn(evtchn->port));
+ disable_irq(irq_from_evtchn(xh_default, evtchn->port));
evtchn_unbind_from_user(u, evtchn);
}

diff --git a/drivers/xen/fallback.c b/drivers/xen/fallback.c
index ae81cf75ae5f..9f54fb8cf96d 100644
--- a/drivers/xen/fallback.c
+++ b/drivers/xen/fallback.c
@@ -2,6 +2,7 @@
#include <linux/string.h>
#include <linux/bug.h>
#include <linux/export.h>
+#include <xen/interface/xen.h>
#include <asm/hypervisor.h>
#include <asm/xen/hypercall.h>

diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 3fa40c723e8e..e07823886fa8 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -189,8 +189,8 @@ static void __del_gref(struct gntalloc_gref *gref)
kunmap(gref->page);
}
if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) {
- notify_remote_via_evtchn(gref->notify.event);
- evtchn_put(gref->notify.event);
+ notify_remote_via_evtchn(xh_default, gref->notify.event);
+ evtchn_put(xh_default, gref->notify.event);
}

gref->notify.flags = 0;
@@ -418,14 +418,14 @@ static long gntalloc_ioctl_unmap_notify(struct gntalloc_file_private_data *priv,
* reference to that event channel.
*/
if (op.action & UNMAP_NOTIFY_SEND_EVENT) {
- if (evtchn_get(op.event_channel_port)) {
+ if (evtchn_get(xh_default, op.event_channel_port)) {
rc = -EINVAL;
goto unlock_out;
}
}

if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT)
- evtchn_put(gref->notify.event);
+ evtchn_put(xh_default, gref->notify.event);

gref->notify.flags = op.action;
gref->notify.pgoff = pgoff;
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 5efc5eee9544..0f0c951cd5b1 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -247,8 +247,8 @@ void gntdev_put_map(struct gntdev_priv *priv, struct gntdev_grant_map *map)
atomic_sub(map->count, &pages_mapped);

if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) {
- notify_remote_via_evtchn(map->notify.event);
- evtchn_put(map->notify.event);
+ notify_remote_via_evtchn(xh_default, map->notify.event);
+ evtchn_put(xh_default, map->notify.event);
}

if (populate_freeable_maps && priv) {
@@ -790,7 +790,7 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)
* reference to that event channel.
*/
if (op.action & UNMAP_NOTIFY_SEND_EVENT) {
- if (evtchn_get(op.event_channel_port))
+ if (evtchn_get(xh_default, op.event_channel_port))
return -EINVAL;
}

@@ -829,7 +829,7 @@ static long gntdev_ioctl_notify(struct gntdev_priv *priv, void __user *u)

/* Drop the reference to the event channel we did not save in the map */
if (out_flags & UNMAP_NOTIFY_SEND_EVENT)
- evtchn_put(out_event);
+ evtchn_put(xh_default, out_event);

return rc;
}
diff --git a/drivers/xen/mcelog.c b/drivers/xen/mcelog.c
index b8bf61abb65b..45be85960f53 100644
--- a/drivers/xen/mcelog.c
+++ b/drivers/xen/mcelog.c
@@ -378,7 +378,7 @@ static int bind_virq_for_mce(void)
return ret;
}

- ret = bind_virq_to_irqhandler(VIRQ_MCA, 0,
+ ret = bind_virq_to_irqhandler(xh_default, VIRQ_MCA, 0,
xen_mce_interrupt, 0, "mce", NULL);
if (ret < 0) {
pr_err("Failed to bind virq\n");
diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c
index cdc6daa7a9f6..d0807f8fbd8b 100644
--- a/drivers/xen/pcpu.c
+++ b/drivers/xen/pcpu.c
@@ -387,7 +387,7 @@ static int __init xen_pcpu_init(void)
if (!xen_initial_domain())
return -ENODEV;

- irq = bind_virq_to_irqhandler(VIRQ_PCPU_STATE, 0,
+ irq = bind_virq_to_irqhandler(xh_default, VIRQ_PCPU_STATE, 0,
xen_pcpu_interrupt, 0,
"xen-pcpu", NULL);
if (irq < 0) {
diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c
index 08cb419eb4e6..b5f16a98414b 100644
--- a/drivers/xen/preempt.c
+++ b/drivers/xen/preempt.c
@@ -10,6 +10,7 @@
*/

#include <linux/sched.h>
+#include <xen/interface/xen.h>
#include <xen/xen-ops.h>

#ifndef CONFIG_PREEMPT
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index b24ddac1604b..b5541f862720 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -27,6 +27,7 @@
#include <asm/pgalloc.h>
#include <asm/pgtable.h>
#include <asm/tlb.h>
+#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 9d314bba7c4e..005a898e7a23 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -13,12 +13,12 @@
#include <linux/kobject.h>
#include <linux/err.h>

+#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

#include <xen/xen.h>
#include <xen/xenbus.h>
-#include <xen/interface/xen.h>
#include <xen/interface/version.h>
#ifdef CONFIG_XEN_HAVE_VPMU
#include <xen/interface/xenpmu.h>
diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index feee74bbab0a..73916766dcac 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -8,13 +8,13 @@
#include <linux/gfp.h>
#include <linux/slab.h>

+#include <xen/interface/xen.h>
#include <asm/paravirt.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

#include <xen/events.h>
#include <xen/features.h>
-#include <xen/interface/xen.h>
#include <xen/interface/vcpu.h>
#include <xen/xen-ops.h>

diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index 581c4e1a8b82..b95dd65f3872 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -123,7 +123,7 @@ static int xen_pcibk_do_attach(struct xen_pcibk_device *pdev, int gnt_ref,

pdev->sh_info = vaddr;

- err = bind_interdomain_evtchn_to_irqhandler(
+ err = bind_interdomain_evtchn_to_irqhandler(xh_default,
pdev->xdev->otherend_id, remote_evtchn, xen_pcibk_handle_event,
0, DRV_NAME, pdev);
if (err < 0) {
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index c9e23a126218..8702b1ac92a8 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -54,8 +54,9 @@
#include <target/target_core_base.h>
#include <target/target_core_fabric.h>

+
+#include <xen/interface/xen.h>
#include <asm/hypervisor.h>
-
#include <xen/xen.h>
#include <xen/balloon.h>
#include <xen/events.h>
@@ -829,7 +830,7 @@ static int scsiback_init_sring(struct vscsibk_info *info, grant_ref_t ring_ref,
sring = (struct vscsiif_sring *)area;
BACK_RING_INIT(&info->ring, sring, PAGE_SIZE);

- err = bind_interdomain_evtchn_to_irq(info->domid, evtchn);
+ err = bind_interdomain_evtchn_to_irq(xh_default, info->domid, evtchn);
if (err < 0)
goto unmap_page;

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index e17ca8156171..f0cf47765726 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -36,9 +36,9 @@
#include <linux/spinlock.h>
#include <linux/vmalloc.h>
#include <linux/export.h>
-#include <asm/xen/hypervisor.h>
#include <xen/page.h>
#include <xen/interface/xen.h>
+#include <asm/xen/hypervisor.h>
#include <xen/interface/event_channel.h>
#include <xen/balloon.h>
#include <xen/events.h>
diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
index d239fc3c5e3d..acbc366c1717 100644
--- a/drivers/xen/xenbus/xenbus_comms.c
+++ b/drivers/xen/xenbus/xenbus_comms.c
@@ -151,7 +151,7 @@ static int xb_write(const void *data, unsigned int len)

/* Implies mb(): other side will see the updated producer. */
if (prod <= intf->req_cons)
- notify_remote_via_evtchn(xen_store_evtchn);
+ notify_remote_via_evtchn(xh_default, xen_store_evtchn);
}

return bytes;
@@ -204,7 +204,7 @@ static int xb_read(void *data, unsigned int len)

/* Implies mb(): other side will see the updated consumer. */
if (intf->rsp_prod - cons >= XENSTORE_RING_SIZE)
- notify_remote_via_evtchn(xen_store_evtchn);
+ notify_remote_via_evtchn(xh_default, xen_store_evtchn);
}

return bytes;
@@ -461,7 +461,7 @@ int xb_init_comms(void)
} else {
int err;

- err = bind_evtchn_to_irqhandler(xen_store_evtchn, wake_waiting,
+ err = bind_evtchn_to_irqhandler(xh_default, xen_store_evtchn, wake_waiting,
0, "xenbus", &xb_waitq);
if (err < 0) {
pr_err("request irq failed %i\n", err);
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 5b471889d723..049bd511f36e 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -52,6 +52,7 @@

#include <asm/page.h>
#include <asm/pgtable.h>
+#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>

#include <xen/xen.h>
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..d3c53a9db5e3 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -48,6 +48,7 @@

#include <asm/page.h>
#include <asm/pgtable.h>
+#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
#include <asm/hypervisor.h>
#include <xen/xenbus.h>
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 07896f4b2736..3edab7cc03c3 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -19,6 +19,7 @@

#include <asm/page.h>
#include <asm/pgtable.h>
+#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
#include <xen/xenbus.h>
#include <xen/events.h>
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 3236d1b1fa01..74c2b9416b88 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -46,6 +46,7 @@
#include <linux/reboot.h>
#include <linux/rwsem.h>
#include <linux/mutex.h>
+#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
#include <xen/xenbus.h>
#include <xen/xen.h>
diff --git a/include/xen/events.h b/include/xen/events.h
index a48897199975..138dbbbefc6d 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -11,27 +11,30 @@
#include <asm/xen/hypercall.h>
#include <asm/xen/events.h>

-unsigned xen_evtchn_nr_channels(void);
+unsigned xen_evtchn_nr_channels(xenhost_t *xh);

-int bind_evtchn_to_irq(unsigned int evtchn);
-int bind_evtchn_to_irqhandler(unsigned int evtchn,
+int bind_evtchn_to_irq(xenhost_t *xh, unsigned int evtchn);
+int bind_evtchn_to_irqhandler(xenhost_t *xh, unsigned int evtchn,
irq_handler_t handler,
unsigned long irqflags, const char *devname,
void *dev_id);
-int bind_virq_to_irq(unsigned int virq, unsigned int cpu, bool percpu);
-int bind_virq_to_irqhandler(unsigned int virq, unsigned int cpu,
+int bind_virq_to_irq(xenhost_t *xh, unsigned int virq, unsigned int cpu, bool percpu);
+int bind_virq_to_irqhandler(xenhost_t *xh, unsigned int virq, unsigned int cpu,
irq_handler_t handler,
unsigned long irqflags, const char *devname,
void *dev_id);
-int bind_ipi_to_irqhandler(enum ipi_vector ipi,
+int bind_ipi_to_irqhandler(xenhost_t *xh,
+ enum ipi_vector ipi,
unsigned int cpu,
irq_handler_t handler,
unsigned long irqflags,
const char *devname,
void *dev_id);
-int bind_interdomain_evtchn_to_irq(unsigned int remote_domain,
+int bind_interdomain_evtchn_to_irq(xenhost_t *xh,
+ unsigned int remote_domain,
unsigned int remote_port);
-int bind_interdomain_evtchn_to_irqhandler(unsigned int remote_domain,
+int bind_interdomain_evtchn_to_irqhandler(xenhost_t *xh,
+ unsigned int remote_domain,
unsigned int remote_port,
irq_handler_t handler,
unsigned long irqflags,
@@ -48,23 +51,23 @@ void unbind_from_irqhandler(unsigned int irq, void *dev_id);
#define XEN_IRQ_PRIORITY_MAX EVTCHN_FIFO_PRIORITY_MAX
#define XEN_IRQ_PRIORITY_DEFAULT EVTCHN_FIFO_PRIORITY_DEFAULT
#define XEN_IRQ_PRIORITY_MIN EVTCHN_FIFO_PRIORITY_MIN
-int xen_set_irq_priority(unsigned irq, unsigned priority);
+int xen_set_irq_priority(xenhost_t *xh, unsigned irq, unsigned priority);

/*
* Allow extra references to event channels exposed to userspace by evtchn
*/
-int evtchn_make_refcounted(unsigned int evtchn);
-int evtchn_get(unsigned int evtchn);
-void evtchn_put(unsigned int evtchn);
+int evtchn_make_refcounted(xenhost_t *xh, unsigned int evtchn);
+int evtchn_get(xenhost_t *xh, unsigned int evtchn);
+void evtchn_put(xenhost_t *xh, unsigned int evtchn);

-void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector);
+void xen_send_IPI_one(xenhost_t *xh, unsigned int cpu, enum ipi_vector vector);
void rebind_evtchn_irq(int evtchn, int irq);
-int xen_rebind_evtchn_to_cpu(int evtchn, unsigned tcpu);
+int xen_rebind_evtchn_to_cpu(xenhost_t *xh, int evtchn, unsigned tcpu);

-static inline void notify_remote_via_evtchn(int port)
+static inline void notify_remote_via_evtchn(xenhost_t *xh, int port)
{
struct evtchn_send send = { .port = port };
- (void)HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
+ (void)hypervisor_event_channel_op(xh, EVTCHNOP_send, &send);
}

void notify_remote_via_irq(int irq);
@@ -85,7 +88,7 @@ void xen_poll_irq(int irq);
void xen_poll_irq_timeout(int irq, u64 timeout);

/* Determine the IRQ which is bound to an event channel */
-unsigned irq_from_evtchn(unsigned int evtchn);
+unsigned irq_from_evtchn(xenhost_t *xh, unsigned int evtchn);
int irq_from_virq(unsigned int cpu, unsigned int virq);
unsigned int evtchn_from_irq(unsigned irq);

@@ -101,14 +104,14 @@ void xen_evtchn_do_upcall(struct pt_regs *regs);
void xen_hvm_evtchn_do_upcall(void);

/* Bind a pirq for a physical interrupt to an irq. */
-int xen_bind_pirq_gsi_to_irq(unsigned gsi,
+int xen_bind_pirq_gsi_to_irq(xenhost_t *xh, unsigned gsi,
unsigned pirq, int shareable, char *name);

#ifdef CONFIG_PCI_MSI
/* Allocate a pirq for a MSI style physical interrupt. */
-int xen_allocate_pirq_msi(struct pci_dev *dev, struct msi_desc *msidesc);
+int xen_allocate_pirq_msi(xenhost_t *xh, struct pci_dev *dev, struct msi_desc *msidesc);
/* Bind an PSI pirq to an irq. */
-int xen_bind_pirq_msi_to_irq(struct pci_dev *dev, struct msi_desc *msidesc,
+int xen_bind_pirq_msi_to_irq(xenhost_t *xh, struct pci_dev *dev, struct msi_desc *msidesc,
int pirq, int nvec, const char *name, domid_t domid);
#endif

@@ -128,5 +131,5 @@ int xen_irq_from_gsi(unsigned gsi);
int xen_test_irq_shared(int irq);

/* initialize Xen IRQ subsystem */
-void xen_init_IRQ(void);
+void xen_init_IRQ(xenhost_t *xh);
#endif /* _XEN_EVENTS_H */
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
index f6092a8987f1..c9dabf739ff8 100644
--- a/include/xen/xenhost.h
+++ b/include/xen/xenhost.h
@@ -112,6 +112,23 @@ typedef struct {
*/
uint32_t xen_vcpu_id[NR_CPUS];
};
+
+ /*
+ * evtchn: get init'd via x86_init.irqs.intr_init (xen_init_IRQ()).
+ *
+ * The common functionality for xenhost_* provided by xen_init_IRQ()
+ * is the mapping between evtchn <-> irq.
+ *
+ * For all three of xenhost_r0/r1 and r2, post-init the evtchn logic
+ * should just work using the evtchn_to_irq mapping and the vcpu_info,
+ * shared_info state.
+ * (Plus some state private to evtchn_2l/evtchn_fifo which for now
+ * is defined locally.)
+ */
+ struct {
+ const struct evtchn_ops *evtchn_ops;
+ int **evtchn_to_irq;
+ };
} xenhost_t;

typedef struct xenhost_ops {
--
2.20.1

2019-05-09 17:27:49

by Ankur Arora

Subject: [RFC PATCH 12/16] xen/xenbus: support xenbus frontend/backend with xenhost_t

As part of xenbus init, both the frontend and backend interfaces need to
talk on the correct xenbus. This might be a local xenstore (backend) or
an XS_PV/XS_HVM interface (frontend) which needs to talk over xenbus
with the remote xenstored. We bootstrap all of these with the evtchn/gfn
parameters from (*setup_xs)().

Given this, we can do appropriate device discovery (for the frontend)
and device connectivity for the backend. Once done, we stash the
xenhost_t * in xen_bus_type, xenbus_device or xenbus_watch, and then the
frontend and backend devices implicitly use the correct interface.

The rest of the patch just changes the interfaces where needed.

Signed-off-by: Ankur Arora <[email protected]>
---
drivers/block/xen-blkback/blkback.c | 10 +-
drivers/net/xen-netfront.c | 14 +-
drivers/pci/xen-pcifront.c | 4 +-
drivers/xen/cpu_hotplug.c | 4 +-
drivers/xen/manage.c | 28 +--
drivers/xen/xen-balloon.c | 8 +-
drivers/xen/xenbus/xenbus.h | 45 ++--
drivers/xen/xenbus/xenbus_client.c | 32 +--
drivers/xen/xenbus/xenbus_comms.c | 121 +++++-----
drivers/xen/xenbus/xenbus_dev_backend.c | 30 ++-
drivers/xen/xenbus/xenbus_dev_frontend.c | 22 +-
drivers/xen/xenbus/xenbus_probe.c | 246 +++++++++++++--------
drivers/xen/xenbus/xenbus_probe_backend.c | 19 +-
drivers/xen/xenbus/xenbus_probe_frontend.c | 65 +++---
drivers/xen/xenbus/xenbus_xs.c | 188 +++++++++-------
include/xen/xen-ops.h | 3 +
include/xen/xenbus.h | 54 +++--
include/xen/xenhost.h | 20 ++
18 files changed, 536 insertions(+), 377 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..7ad4423c24b8 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -541,12 +541,12 @@ static void xen_vbd_resize(struct xen_blkif *blkif)
pr_info("VBD Resize: new size %llu\n", new_size);
vbd->size = new_size;
again:
- err = xenbus_transaction_start(&xbt);
+ err = xenbus_transaction_start(dev->xh, &xbt);
if (err) {
pr_warn("Error starting transaction\n");
return;
}
- err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "sectors", "%llu",
(unsigned long long)vbd_sz(vbd));
if (err) {
pr_warn("Error writing new size\n");
@@ -557,20 +557,20 @@ static void xen_vbd_resize(struct xen_blkif *blkif)
* the front-end. If the current state is "connected" the
* front-end will get the new size information online.
*/
- err = xenbus_printf(xbt, dev->nodename, "state", "%d", dev->state);
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "state", "%d", dev->state);
if (err) {
pr_warn("Error writing the state\n");
goto abort;
}

- err = xenbus_transaction_end(xbt, 0);
+ err = xenbus_transaction_end(dev->xh, xbt, 0);
if (err == -EAGAIN)
goto again;
if (err)
pr_warn("Error ending transaction\n");
return;
abort:
- xenbus_transaction_end(xbt, 1);
+ xenbus_transaction_end(dev->xh, xbt, 1);
}

/*
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 1cd0a2d2ba54..ee28e8b85406 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1336,9 +1336,9 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)

xenbus_switch_state(dev, XenbusStateInitialising);
wait_event(module_wq,
- xenbus_read_driver_state(dev->otherend) !=
+ xenbus_read_driver_state(dev, dev->otherend) !=
XenbusStateClosed &&
- xenbus_read_driver_state(dev->otherend) !=
+ xenbus_read_driver_state(dev, dev->otherend) !=
XenbusStateUnknown);
return netdev;

@@ -2145,19 +2145,19 @@ static int xennet_remove(struct xenbus_device *dev)

dev_dbg(&dev->dev, "%s\n", dev->nodename);

- if (xenbus_read_driver_state(dev->otherend) != XenbusStateClosed) {
+ if (xenbus_read_driver_state(dev, dev->otherend) != XenbusStateClosed) {
xenbus_switch_state(dev, XenbusStateClosing);
wait_event(module_wq,
- xenbus_read_driver_state(dev->otherend) ==
+ xenbus_read_driver_state(dev, dev->otherend) ==
XenbusStateClosing ||
- xenbus_read_driver_state(dev->otherend) ==
+ xenbus_read_driver_state(dev, dev->otherend) ==
XenbusStateUnknown);

xenbus_switch_state(dev, XenbusStateClosed);
wait_event(module_wq,
- xenbus_read_driver_state(dev->otherend) ==
+ xenbus_read_driver_state(dev, dev->otherend) ==
XenbusStateClosed ||
- xenbus_read_driver_state(dev->otherend) ==
+ xenbus_read_driver_state(dev, dev->otherend) ==
XenbusStateUnknown);
}

diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index f894290e8b3a..4c7ef1e09ed7 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -860,7 +860,7 @@ static int pcifront_try_connect(struct pcifront_device *pdev)


/* Only connect once */
- if (xenbus_read_driver_state(pdev->xdev->nodename) !=
+ if (xenbus_read_driver_state(pdev->xdev, pdev->xdev->nodename) !=
XenbusStateInitialised)
goto out;

@@ -871,7 +871,7 @@ static int pcifront_try_connect(struct pcifront_device *pdev)
goto out;
}

- err = xenbus_scanf(XBT_NIL, pdev->xdev->otherend,
+ err = xenbus_scanf(pdev->xdev->xh, XBT_NIL, pdev->xdev->otherend,
"root_num", "%d", &num_roots);
if (err == -ENOENT) {
xenbus_dev_error(pdev->xdev, err,
diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
index b1357aa4bc55..afeb94446d34 100644
--- a/drivers/xen/cpu_hotplug.c
+++ b/drivers/xen/cpu_hotplug.c
@@ -37,7 +37,7 @@ static int vcpu_online(unsigned int cpu)
char dir[16], state[16];

sprintf(dir, "cpu/%u", cpu);
- err = xenbus_scanf(XBT_NIL, dir, "availability", "%15s", state);
+ err = xenbus_scanf(xh_default, XBT_NIL, dir, "availability", "%15s", state);
if (err != 1) {
if (!xen_initial_domain())
pr_err("Unable to read cpu state\n");
@@ -90,7 +90,7 @@ static int setup_cpu_watcher(struct notifier_block *notifier,
.node = "cpu",
.callback = handle_vcpu_hotplug_event};

- (void)register_xenbus_watch(&cpu_watch);
+ (void)register_xenbus_watch(xh_default, &cpu_watch);

for_each_possible_cpu(cpu) {
if (vcpu_online(cpu) == 0) {
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 5bb01a62f214..9a69d955dd5c 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -227,14 +227,14 @@ static void shutdown_handler(struct xenbus_watch *watch,
return;

again:
- err = xenbus_transaction_start(&xbt);
+ err = xenbus_transaction_start(xh_default, &xbt);
if (err)
return;

- str = (char *)xenbus_read(xbt, "control", "shutdown", NULL);
+ str = (char *)xenbus_read(xh_default, xbt, "control", "shutdown", NULL);
/* Ignore read errors and empty reads. */
if (XENBUS_IS_ERR_READ(str)) {
- xenbus_transaction_end(xbt, 1);
+ xenbus_transaction_end(xh_default, xbt, 1);
return;
}

@@ -245,9 +245,9 @@ static void shutdown_handler(struct xenbus_watch *watch,

/* Only acknowledge commands which we are prepared to handle. */
if (idx < ARRAY_SIZE(shutdown_handlers))
- xenbus_write(xbt, "control", "shutdown", "");
+ xenbus_write(xh_default, xbt, "control", "shutdown", "");

- err = xenbus_transaction_end(xbt, 0);
+ err = xenbus_transaction_end(xh_default, xbt, 0);
if (err == -EAGAIN) {
kfree(str);
goto again;
@@ -272,10 +272,10 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path,
int err;

again:
- err = xenbus_transaction_start(&xbt);
+ err = xenbus_transaction_start(xh_default, &xbt);
if (err)
return;
- err = xenbus_scanf(xbt, "control", "sysrq", "%c", &sysrq_key);
+ err = xenbus_scanf(xh_default, xbt, "control", "sysrq", "%c", &sysrq_key);
if (err < 0) {
/*
* The Xenstore watch fires directly after registering it and
@@ -287,21 +287,21 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path,
if (err != -ENOENT && err != -ERANGE)
pr_err("Error %d reading sysrq code in control/sysrq\n",
err);
- xenbus_transaction_end(xbt, 1);
+ xenbus_transaction_end(xh_default, xbt, 1);
return;
}

if (sysrq_key != '\0') {
- err = xenbus_printf(xbt, "control", "sysrq", "%c", '\0');
+ err = xenbus_printf(xh_default, xbt, "control", "sysrq", "%c", '\0');
if (err) {
pr_err("%s: Error %d writing sysrq in control/sysrq\n",
__func__, err);
- xenbus_transaction_end(xbt, 1);
+ xenbus_transaction_end(xh_default, xbt, 1);
return;
}
}

- err = xenbus_transaction_end(xbt, 0);
+ err = xenbus_transaction_end(xh_default, xbt, 0);
if (err == -EAGAIN)
goto again;

@@ -331,7 +331,7 @@ static int setup_shutdown_watcher(void)
#define FEATURE_PATH_SIZE (SHUTDOWN_CMD_SIZE + sizeof("feature-"))
char node[FEATURE_PATH_SIZE];

- err = register_xenbus_watch(&shutdown_watch);
+ err = register_xenbus_watch(xh_default, &shutdown_watch);
if (err) {
pr_err("Failed to set shutdown watcher\n");
return err;
@@ -339,7 +339,7 @@ static int setup_shutdown_watcher(void)


#ifdef CONFIG_MAGIC_SYSRQ
- err = register_xenbus_watch(&sysrq_watch);
+ err = register_xenbus_watch(xh_default, &sysrq_watch);
if (err) {
pr_err("Failed to set sysrq watcher\n");
return err;
@@ -351,7 +351,7 @@ static int setup_shutdown_watcher(void)
continue;
snprintf(node, FEATURE_PATH_SIZE, "feature-%s",
shutdown_handlers[idx].command);
- err = xenbus_printf(XBT_NIL, "control", node, "%u", 1);
+ err = xenbus_printf(xh_default, XBT_NIL, "control", node, "%u", 1);
if (err) {
pr_err("%s: Error %d writing %s\n", __func__,
err, node);
diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
index 2acbfe104e46..d34d9b1af7a8 100644
--- a/drivers/xen/xen-balloon.c
+++ b/drivers/xen/xen-balloon.c
@@ -63,7 +63,7 @@ static void watch_target(struct xenbus_watch *watch,
static bool watch_fired;
static long target_diff;

- err = xenbus_scanf(XBT_NIL, "memory", "target", "%llu", &new_target);
+ err = xenbus_scanf(xh_default, XBT_NIL, "memory", "target", "%llu", &new_target);
if (err != 1) {
/* This is ok (for domain0 at least) - so just return */
return;
@@ -77,9 +77,9 @@ static void watch_target(struct xenbus_watch *watch,
if (!watch_fired) {
watch_fired = true;

- if ((xenbus_scanf(XBT_NIL, "memory", "static-max",
+ if ((xenbus_scanf(xh_default, XBT_NIL, "memory", "static-max",
"%llu", &static_max) == 1) ||
- (xenbus_scanf(XBT_NIL, "memory", "memory_static_max",
+ (xenbus_scanf(xh_default, XBT_NIL, "memory", "memory_static_max",
"%llu", &static_max) == 1))
static_max >>= PAGE_SHIFT - 10;
else
@@ -103,7 +103,7 @@ static int balloon_init_watcher(struct notifier_block *notifier,
{
int err;

- err = register_xenbus_watch(&target_watch);
+ err = register_xenbus_watch(xh_default, &target_watch);
if (err)
pr_err("Failed to set balloon watcher\n");

diff --git a/drivers/xen/xenbus/xenbus.h b/drivers/xen/xenbus/xenbus.h
index 092981171df1..183c6e40bdaa 100644
--- a/drivers/xen/xenbus/xenbus.h
+++ b/drivers/xen/xenbus/xenbus.h
@@ -39,9 +39,11 @@
#define XEN_BUS_ID_SIZE 20

struct xen_bus_type {
+ xenhost_t *xh;
char *root;
unsigned int levels;
- int (*get_bus_id)(char bus_id[XEN_BUS_ID_SIZE], const char *nodename);
+ int (*get_bus_id)(struct xen_bus_type *bus, char bus_id[XEN_BUS_ID_SIZE],
+ const char *nodename);
int (*probe)(struct xen_bus_type *bus, const char *type,
const char *dir);
void (*otherend_changed)(struct xenbus_watch *watch, const char *path,
@@ -49,13 +51,30 @@ struct xen_bus_type {
struct bus_type bus;
};

-enum xenstore_init {
- XS_UNKNOWN,
- XS_PV,
- XS_HVM,
- XS_LOCAL,
+struct xenstore_private {
+ /* xenbus_comms.c */
+ struct work_struct probe_work;
+ struct wait_queue_head xb_waitq;
+ struct list_head xb_write_list;
+ struct task_struct *xenbus_task;
+ struct list_head reply_list;
+ int xenbus_irq;
+
+ /* xenbus_probe.c */
+ struct xenstore_domain_interface *store_interface;
+ struct blocking_notifier_head xenstore_chain;
+
+ enum xenstore_init domain_type;
+ xen_pfn_t store_gfn;
+ uint32_t store_evtchn;
+ int xenstored_ready;
+
+ /* xenbus_xs.c */
+ struct list_head watches; /* xenhost local so we don't mix them up. */
};

+#define xs_priv(xh) ((struct xenstore_private *) (xh)->xenstore_private)
+
struct xs_watch_event {
struct list_head list;
unsigned int len;
@@ -87,18 +106,14 @@ struct xb_req_data {
void *par;
};

-extern enum xenstore_init xen_store_domain_type;
extern const struct attribute_group *xenbus_dev_groups[];
extern struct mutex xs_response_mutex;
-extern struct list_head xs_reply_list;
-extern struct list_head xb_write_list;
-extern wait_queue_head_t xb_waitq;
extern struct mutex xb_write_mutex;

-int xs_init(void);
-int xb_init_comms(void);
-void xb_deinit_comms(void);
-int xs_watch_msg(struct xs_watch_event *event);
+int xs_init(xenhost_t *xh);
+int xb_init_comms(xenhost_t *xh);
+void xb_deinit_comms(xenhost_t *xh);
+int xs_watch_msg(xenhost_t *xh, struct xs_watch_event *event);
void xs_request_exit(struct xb_req_data *req);

int xenbus_match(struct device *_dev, struct device_driver *_drv);
@@ -130,7 +145,7 @@ int xenbus_read_otherend_details(struct xenbus_device *xendev,

void xenbus_ring_ops_init(void);

-int xenbus_dev_request_and_reply(struct xsd_sockmsg *msg, void *par);
+int xenbus_dev_request_and_reply(xenhost_t *xh, struct xsd_sockmsg *msg, void *par);
void xenbus_dev_queue_reply(struct xb_req_data *req);

#endif
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 5748fbaf0238..e4f8ecb9490a 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -122,7 +122,7 @@ int xenbus_watch_path(struct xenbus_device *dev, const char *path,
watch->node = path;
watch->callback = callback;

- err = register_xenbus_watch(watch);
+ err = register_xenbus_watch(dev->xh, watch);

if (err) {
watch->node = NULL;
@@ -206,17 +206,17 @@ __xenbus_switch_state(struct xenbus_device *dev,
again:
abort = 1;

- err = xenbus_transaction_start(&xbt);
+ err = xenbus_transaction_start(dev->xh, &xbt);
if (err) {
xenbus_switch_fatal(dev, depth, err, "starting transaction");
return 0;
}

- err = xenbus_scanf(xbt, dev->nodename, "state", "%d", &current_state);
+ err = xenbus_scanf(dev->xh, xbt, dev->nodename, "state", "%d", &current_state);
if (err != 1)
goto abort;

- err = xenbus_printf(xbt, dev->nodename, "state", "%d", state);
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "state", "%d", state);
if (err) {
xenbus_switch_fatal(dev, depth, err, "writing new state");
goto abort;
@@ -224,7 +224,7 @@ __xenbus_switch_state(struct xenbus_device *dev,

abort = 0;
abort:
- err = xenbus_transaction_end(xbt, abort);
+ err = xenbus_transaction_end(dev->xh, xbt, abort);
if (err) {
if (err == -EAGAIN && !abort)
goto again;
@@ -279,7 +279,7 @@ static void xenbus_va_dev_error(struct xenbus_device *dev, int err,

path_buffer = kasprintf(GFP_KERNEL, "error/%s", dev->nodename);
if (path_buffer)
- xenbus_write(XBT_NIL, path_buffer, "error", printf_buffer);
+ xenbus_write(dev->xh, XBT_NIL, path_buffer, "error", printf_buffer);

kfree(printf_buffer);
kfree(path_buffer);
@@ -363,7 +363,7 @@ int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,
int i, j;

for (i = 0; i < nr_pages; i++) {
- err = gnttab_grant_foreign_access(dev->otherend_id,
+ err = gnttab_grant_foreign_access(dev->xh, dev->otherend_id,
virt_to_gfn(vaddr), 0);
if (err < 0) {
xenbus_dev_fatal(dev, err,
@@ -379,7 +379,7 @@ int xenbus_grant_ring(struct xenbus_device *dev, void *vaddr,

fail:
for (j = 0; j < i; j++)
- gnttab_end_foreign_access_ref(grefs[j], 0);
+ gnttab_end_foreign_access_ref(dev->xh, grefs[j], 0);
return err;
}
EXPORT_SYMBOL_GPL(xenbus_grant_ring);
@@ -399,7 +399,7 @@ int xenbus_alloc_evtchn(struct xenbus_device *dev, int *port)
alloc_unbound.dom = DOMID_SELF;
alloc_unbound.remote_dom = dev->otherend_id;

- err = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound,
+ err = hypervisor_event_channel_op(dev->xh, EVTCHNOP_alloc_unbound,
&alloc_unbound);
if (err)
xenbus_dev_fatal(dev, err, "allocating event channel");
@@ -421,7 +421,7 @@ int xenbus_free_evtchn(struct xenbus_device *dev, int port)

close.port = port;

- err = HYPERVISOR_event_channel_op(EVTCHNOP_close, &close);
+ err = hypervisor_event_channel_op(dev->xh, EVTCHNOP_close, &close);
if (err)
xenbus_dev_error(dev, err, "freeing event channel %d", port);

@@ -478,7 +478,7 @@ static int __xenbus_map_ring(struct xenbus_device *dev,
handles[i] = INVALID_GRANT_HANDLE;
}

- gnttab_batch_map(map, i);
+ gnttab_batch_map(dev->xh, map, i);

for (i = 0; i < nr_grefs; i++) {
if (map[i].status != GNTST_okay) {
@@ -503,7 +503,7 @@ static int __xenbus_map_ring(struct xenbus_device *dev,
}
}

- if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap, j))
+ if (hypervisor_grant_table_op(dev->xh, GNTTABOP_unmap_grant_ref, unmap, j))
BUG();

*leaked = false;
@@ -761,7 +761,7 @@ static int xenbus_unmap_ring_vfree_pv(struct xenbus_device *dev, void *vaddr)
unmap[i].handle = node->handles[i];
}

- if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap, i))
+ if (hypervisor_grant_table_op(dev->xh, GNTTABOP_unmap_grant_ref, unmap, i))
BUG();

err = GNTST_okay;
@@ -884,7 +884,7 @@ int xenbus_unmap_ring(struct xenbus_device *dev,
gnttab_set_unmap_op(&unmap[i], vaddrs[i],
GNTMAP_host_map, handles[i]);

- if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap, i))
+ if (hypervisor_grant_table_op(dev->xh, GNTTABOP_unmap_grant_ref, unmap, i))
BUG();

err = GNTST_okay;
@@ -910,10 +910,10 @@ EXPORT_SYMBOL_GPL(xenbus_unmap_ring);
* Return the state of the driver rooted at the given store path, or
* XenbusStateUnknown if no state can be read.
*/
-enum xenbus_state xenbus_read_driver_state(const char *path)
+enum xenbus_state xenbus_read_driver_state(struct xenbus_device *dev, const char *path)
{
enum xenbus_state result;
- int err = xenbus_gather(XBT_NIL, path, "state", "%d", &result, NULL);
+ int err = xenbus_gather(dev->xh, XBT_NIL, path, "state", "%d", &result, NULL);
if (err)
result = XenbusStateUnknown;

diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
index acbc366c1717..2494ae1a0a7e 100644
--- a/drivers/xen/xenbus/xenbus_comms.c
+++ b/drivers/xen/xenbus/xenbus_comms.c
@@ -43,31 +43,21 @@
#include <xen/page.h>
#include "xenbus.h"

-/* A list of replies. Currently only one will ever be outstanding. */
-LIST_HEAD(xs_reply_list);
-
-/* A list of write requests. */
-LIST_HEAD(xb_write_list);
-DECLARE_WAIT_QUEUE_HEAD(xb_waitq);
DEFINE_MUTEX(xb_write_mutex);

/* Protect xenbus reader thread against save/restore. */
DEFINE_MUTEX(xs_response_mutex);

-static int xenbus_irq;
-static struct task_struct *xenbus_task;
-
-static DECLARE_WORK(probe_work, xenbus_probe);
-
-
-static irqreturn_t wake_waiting(int irq, void *unused)
+static irqreturn_t wake_waiting(int irq, void *_xs)
{
- if (unlikely(xenstored_ready == 0)) {
- xenstored_ready = 1;
- schedule_work(&probe_work);
+ struct xenstore_private *xs = (struct xenstore_private *) _xs;
+
+ if (unlikely(xs->xenstored_ready == 0)) {
+ xs->xenstored_ready = 1;
+ schedule_work(&xs->probe_work);
}

- wake_up(&xb_waitq);
+ wake_up(&xs->xb_waitq);
return IRQ_HANDLED;
}

@@ -96,24 +86,26 @@ static const void *get_input_chunk(XENSTORE_RING_IDX cons,
return buf + MASK_XENSTORE_IDX(cons);
}

-static int xb_data_to_write(void)
+static int xb_data_to_write(struct xenstore_private *xs)
{
- struct xenstore_domain_interface *intf = xen_store_interface;
+ struct xenstore_domain_interface *intf = xs->store_interface;

return (intf->req_prod - intf->req_cons) != XENSTORE_RING_SIZE &&
- !list_empty(&xb_write_list);
+ !list_empty(&xs->xb_write_list);
}

/**
* xb_write - low level write
+ * @xh: xenhost to send to
* @data: buffer to send
* @len: length of buffer
*
* Returns number of bytes written or -err.
*/
-static int xb_write(const void *data, unsigned int len)
+static int xb_write(xenhost_t *xh, const void *data, unsigned int len)
{
- struct xenstore_domain_interface *intf = xen_store_interface;
+ struct xenstore_private *xs = xs_priv(xh);
+ struct xenstore_domain_interface *intf = xs->store_interface;
XENSTORE_RING_IDX cons, prod;
unsigned int bytes = 0;

@@ -128,7 +120,7 @@ static int xb_write(const void *data, unsigned int len)
intf->req_cons = intf->req_prod = 0;
return -EIO;
}
- if (!xb_data_to_write())
+ if (!xb_data_to_write(xs))
return bytes;

/* Must write data /after/ reading the consumer index. */
@@ -151,21 +143,22 @@ static int xb_write(const void *data, unsigned int len)

/* Implies mb(): other side will see the updated producer. */
if (prod <= intf->req_cons)
- notify_remote_via_evtchn(xh_default, xen_store_evtchn);
+ notify_remote_via_evtchn(xh, xs->store_evtchn);
}

return bytes;
}

-static int xb_data_to_read(void)
+static int xb_data_to_read(struct xenstore_private *xs)
{
- struct xenstore_domain_interface *intf = xen_store_interface;
+ struct xenstore_domain_interface *intf = xs->store_interface;
return (intf->rsp_cons != intf->rsp_prod);
}

-static int xb_read(void *data, unsigned int len)
+static int xb_read(xenhost_t *xh, void *data, unsigned int len)
{
- struct xenstore_domain_interface *intf = xen_store_interface;
+ struct xenstore_private *xs = xs_priv(xh);
+ struct xenstore_domain_interface *intf = xs->store_interface;
XENSTORE_RING_IDX cons, prod;
unsigned int bytes = 0;

@@ -204,14 +197,15 @@ static int xb_read(void *data, unsigned int len)

/* Implies mb(): other side will see the updated consumer. */
if (intf->rsp_prod - cons >= XENSTORE_RING_SIZE)
- notify_remote_via_evtchn(xh_default, xen_store_evtchn);
+ notify_remote_via_evtchn(xh, xs->store_evtchn);
}

return bytes;
}

-static int process_msg(void)
+static int process_msg(xenhost_t *xh)
{
+ struct xenstore_private *xs = xs_priv(xh);
static struct {
struct xsd_sockmsg msg;
char *body;
@@ -242,7 +236,7 @@ static int process_msg(void)
*/
mutex_lock(&xs_response_mutex);

- if (!xb_data_to_read()) {
+ if (!xb_data_to_read(xh->xenstore_private)) {
/* We raced with save/restore: pending data 'gone'. */
mutex_unlock(&xs_response_mutex);
state.in_msg = false;
@@ -252,7 +246,7 @@ static int process_msg(void)

if (state.in_hdr) {
if (state.read != sizeof(state.msg)) {
- err = xb_read((void *)&state.msg + state.read,
+ err = xb_read(xh, (void *)&state.msg + state.read,
sizeof(state.msg) - state.read);
if (err < 0)
goto out;
@@ -281,7 +275,7 @@ static int process_msg(void)
state.read = 0;
}

- err = xb_read(state.body + state.read, state.msg.len - state.read);
+ err = xb_read(xh, state.body + state.read, state.msg.len - state.read);
if (err < 0)
goto out;

@@ -293,11 +287,11 @@ static int process_msg(void)

if (state.msg.type == XS_WATCH_EVENT) {
state.watch->len = state.msg.len;
- err = xs_watch_msg(state.watch);
+ err = xs_watch_msg(xh, state.watch);
} else {
err = -ENOENT;
mutex_lock(&xb_write_mutex);
- list_for_each_entry(req, &xs_reply_list, list) {
+ list_for_each_entry(req, &xs->reply_list, list) {
if (req->msg.req_id == state.msg.req_id) {
list_del(&req->list);
err = 0;
@@ -333,8 +327,9 @@ static int process_msg(void)
return err;
}

-static int process_writes(void)
+static int process_writes(xenhost_t *xh)
{
+ struct xenstore_private *xs = xs_priv(xh);
static struct {
struct xb_req_data *req;
int idx;
@@ -344,13 +339,13 @@ static int process_writes(void)
unsigned int len;
int err = 0;

- if (!xb_data_to_write())
+ if (!xb_data_to_write(xs))
return 0;

mutex_lock(&xb_write_mutex);

if (!state.req) {
- state.req = list_first_entry(&xb_write_list,
+ state.req = list_first_entry(&xs->xb_write_list,
struct xb_req_data, list);
state.idx = -1;
state.written = 0;
@@ -367,7 +362,7 @@ static int process_writes(void)
base = state.req->vec[state.idx].iov_base;
len = state.req->vec[state.idx].iov_len;
}
- err = xb_write(base + state.written, len - state.written);
+ err = xb_write(xh, base + state.written, len - state.written);
if (err < 0)
goto out_err;
state.written += err;
@@ -380,7 +375,7 @@ static int process_writes(void)

list_del(&state.req->list);
state.req->state = xb_req_state_wait_reply;
- list_add_tail(&state.req->list, &xs_reply_list);
+ list_add_tail(&state.req->list, &xs->reply_list);
state.req = NULL;

out:
@@ -406,42 +401,45 @@ static int process_writes(void)
return err;
}

-static int xb_thread_work(void)
+static int xb_thread_work(struct xenstore_private *xs)
{
- return xb_data_to_read() || xb_data_to_write();
+ return xb_data_to_read(xs) || xb_data_to_write(xs);
}

-static int xenbus_thread(void *unused)
+static int xenbus_thread(void *_xh)
{
+ xenhost_t *xh = (xenhost_t *)_xh;
+ struct xenstore_private *xs = xs_priv(xh);
int err;

while (!kthread_should_stop()) {
- if (wait_event_interruptible(xb_waitq, xb_thread_work()))
+ if (wait_event_interruptible(xs->xb_waitq, xb_thread_work(xs)))
continue;

- err = process_msg();
+ err = process_msg(xh);
if (err == -ENOMEM)
schedule();
else if (err)
pr_warn_ratelimited("error %d while reading message\n",
err);

- err = process_writes();
+ err = process_writes(xh);
if (err)
pr_warn_ratelimited("error %d while writing message\n",
err);
}

- xenbus_task = NULL;
+ xs->xenbus_task = NULL;
return 0;
}

/**
* xb_init_comms - Set up interrupt handler off store event channel.
*/
-int xb_init_comms(void)
+int xb_init_comms(xenhost_t *xh)
{
- struct xenstore_domain_interface *intf = xen_store_interface;
+ struct xenstore_private *xs = xs_priv(xh);
+ struct xenstore_domain_interface *intf = xs->store_interface;

if (intf->req_prod != intf->req_cons)
pr_err("request ring is not quiescent (%08x:%08x)!\n",
@@ -455,34 +453,35 @@ int xb_init_comms(void)
intf->rsp_cons = intf->rsp_prod;
}

- if (xenbus_irq) {
+ if (xs->xenbus_irq) {
/* Already have an irq; assume we're resuming */
- rebind_evtchn_irq(xen_store_evtchn, xenbus_irq);
+ rebind_evtchn_irq(xs->store_evtchn, xs->xenbus_irq);
} else {
int err;

- err = bind_evtchn_to_irqhandler(xh_default, xen_store_evtchn, wake_waiting,
- 0, "xenbus", &xb_waitq);
+ err = bind_evtchn_to_irqhandler(xh, xs->store_evtchn, wake_waiting,
+ 0, "xenbus", xs);
if (err < 0) {
pr_err("request irq failed %i\n", err);
return err;
}

- xenbus_irq = err;
+ xs->xenbus_irq = err;

- if (!xenbus_task) {
- xenbus_task = kthread_run(xenbus_thread, NULL,
+ if (!xs->xenbus_task) {
+ xs->xenbus_task = kthread_run(xenbus_thread, xh,
"xenbus");
- if (IS_ERR(xenbus_task))
- return PTR_ERR(xenbus_task);
+ if (IS_ERR(xs->xenbus_task))
+ return PTR_ERR(xs->xenbus_task);
}
}

return 0;
}

-void xb_deinit_comms(void)
+void xb_deinit_comms(xenhost_t *xh)
{
- unbind_from_irqhandler(xenbus_irq, &xb_waitq);
- xenbus_irq = 0;
+ struct xenstore_private *xs = xs_priv(xh);
+ unbind_from_irqhandler(xs->xenbus_irq, xs);
+ xs->xenbus_irq = 0;
}
diff --git a/drivers/xen/xenbus/xenbus_dev_backend.c b/drivers/xen/xenbus/xenbus_dev_backend.c
index edba5fecde4d..211f1ce53d30 100644
--- a/drivers/xen/xenbus/xenbus_dev_backend.c
+++ b/drivers/xen/xenbus/xenbus_dev_backend.c
@@ -19,6 +19,8 @@

#include "xenbus.h"

+static xenhost_t *xh;
+
static int xenbus_backend_open(struct inode *inode, struct file *filp)
{
if (!capable(CAP_SYS_ADMIN))
@@ -31,6 +33,7 @@ static long xenbus_alloc(domid_t domid)
{
struct evtchn_alloc_unbound arg;
int err = -EEXIST;
+ struct xenstore_private *xs = xs_priv(xh);

xs_suspend();

@@ -44,23 +47,23 @@ static long xenbus_alloc(domid_t domid)
* unnecessarily complex for the intended use where xenstored is only
* started once - so return -EEXIST if it's already running.
*/
- if (xenstored_ready)
+ if (xs->xenstored_ready)
goto out_err;

- gnttab_grant_foreign_access_ref(GNTTAB_RESERVED_XENSTORE, domid,
- virt_to_gfn(xen_store_interface), 0 /* writable */);
+ gnttab_grant_foreign_access_ref(xh, GNTTAB_RESERVED_XENSTORE, domid,
+ virt_to_gfn(xs->store_interface), 0 /* writable */);

arg.dom = DOMID_SELF;
arg.remote_dom = domid;

- err = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, &arg);
+ err = hypervisor_event_channel_op(xh, EVTCHNOP_alloc_unbound, &arg);
if (err)
goto out_err;

- if (xen_store_evtchn > 0)
- xb_deinit_comms();
+ if (xs->store_evtchn > 0)
+ xb_deinit_comms(xh);

- xen_store_evtchn = arg.port;
+ xs->store_evtchn = arg.port;

xs_resume();

@@ -74,13 +77,15 @@ static long xenbus_alloc(domid_t domid)
static long xenbus_backend_ioctl(struct file *file, unsigned int cmd,
unsigned long data)
{
+ struct xenstore_private *xs = xs_priv(xh);
+
if (!capable(CAP_SYS_ADMIN))
return -EPERM;

switch (cmd) {
case IOCTL_XENBUS_BACKEND_EVTCHN:
- if (xen_store_evtchn > 0)
- return xen_store_evtchn;
+ if (xs->store_evtchn > 0)
+ return xs->store_evtchn;
return -ENODEV;
case IOCTL_XENBUS_BACKEND_SETUP:
return xenbus_alloc(data);
@@ -92,6 +97,7 @@ static long xenbus_backend_ioctl(struct file *file, unsigned int cmd,
static int xenbus_backend_mmap(struct file *file, struct vm_area_struct *vma)
{
size_t size = vma->vm_end - vma->vm_start;
+ struct xenstore_private *xs = xs_priv(xh);

if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -100,7 +106,7 @@ static int xenbus_backend_mmap(struct file *file, struct vm_area_struct *vma)
return -EINVAL;

if (remap_pfn_range(vma, vma->vm_start,
- virt_to_pfn(xen_store_interface),
+ virt_to_pfn(xs->store_interface),
size, vma->vm_page_prot))
return -EAGAIN;

@@ -125,6 +131,10 @@ static int __init xenbus_backend_init(void)

if (!xen_initial_domain())
return -ENODEV;
+ /*
+ * Backends should never talk to the remote xenhost; always use the default.
+ */
+ xh = xh_default;

err = misc_register(&xenbus_backend_dev);
if (err)
diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c b/drivers/xen/xenbus/xenbus_dev_frontend.c
index c3e201025ef0..d6e0c397c6a0 100644
--- a/drivers/xen/xenbus/xenbus_dev_frontend.c
+++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
@@ -58,10 +58,14 @@

#include <xen/xenbus.h>
#include <xen/xen.h>
+#include <xen/interface/xen.h>
+#include <xen/xenhost.h>
#include <asm/xen/hypervisor.h>

#include "xenbus.h"

+static xenhost_t *xh;
+
/*
* An element of a list of outstanding transactions, for which we're
* still waiting a reply.
@@ -312,13 +316,13 @@ static void xenbus_file_free(struct kref *kref)
*/

list_for_each_entry_safe(trans, tmp, &u->transactions, list) {
- xenbus_transaction_end(trans->handle, 1);
+ xenbus_transaction_end(xh, trans->handle, 1);
list_del(&trans->list);
kfree(trans);
}

list_for_each_entry_safe(watch, tmp_watch, &u->watches, list) {
- unregister_xenbus_watch(&watch->watch);
+ unregister_xenbus_watch(xh, &watch->watch);
list_del(&watch->list);
free_watch_adapter(watch);
}
@@ -450,7 +454,7 @@ static int xenbus_write_transaction(unsigned msg_type,
(!strcmp(msg->body, "T") || !strcmp(msg->body, "F"))))
return xenbus_command_reply(u, XS_ERROR, "EINVAL");

- rc = xenbus_dev_request_and_reply(&msg->hdr, u);
+ rc = xenbus_dev_request_and_reply(xh, &msg->hdr, u);
if (rc && trans) {
list_del(&trans->list);
kfree(trans);
@@ -489,7 +493,7 @@ static int xenbus_write_watch(unsigned msg_type, struct xenbus_file_priv *u)
watch->watch.callback = watch_fired;
watch->dev_data = u;

- err = register_xenbus_watch(&watch->watch);
+ err = register_xenbus_watch(xh, &watch->watch);
if (err) {
free_watch_adapter(watch);
rc = err;
@@ -500,7 +504,7 @@ static int xenbus_write_watch(unsigned msg_type, struct xenbus_file_priv *u)
list_for_each_entry(watch, &u->watches, list) {
if (!strcmp(watch->token, token) &&
!strcmp(watch->watch.node, path)) {
- unregister_xenbus_watch(&watch->watch);
+ unregister_xenbus_watch(xh, &watch->watch);
list_del(&watch->list);
free_watch_adapter(watch);
break;
@@ -618,8 +622,9 @@ static ssize_t xenbus_file_write(struct file *filp,
static int xenbus_file_open(struct inode *inode, struct file *filp)
{
struct xenbus_file_priv *u;
+ struct xenstore_private *xs = xs_priv(xh);

- if (xen_store_evtchn == 0)
+ if (xs->store_evtchn == 0)
return -ENOENT;

nonseekable_open(inode, filp);
@@ -687,6 +692,11 @@ static int __init xenbus_init(void)
if (!xen_domain())
return -ENODEV;

+ if (xen_driver_domain() && xen_nested())
+ xh = xh_remote;
+ else
+ xh = xh_default;
+
err = misc_register(&xenbus_dev);
if (err)
pr_err("Could not register xenbus frontend device\n");
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 049bd511f36e..bd90ba00d64c 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -65,20 +65,6 @@

#include "xenbus.h"

-
-int xen_store_evtchn;
-EXPORT_SYMBOL_GPL(xen_store_evtchn);
-
-struct xenstore_domain_interface *xen_store_interface;
-EXPORT_SYMBOL_GPL(xen_store_interface);
-
-enum xenstore_init xen_store_domain_type;
-EXPORT_SYMBOL_GPL(xen_store_domain_type);
-
-static unsigned long xen_store_gfn;
-
-static BLOCKING_NOTIFIER_HEAD(xenstore_chain);
-
/* If something in array of ids matches this device, return it. */
static const struct xenbus_device_id *
match_device(const struct xenbus_device_id *arr, struct xenbus_device *dev)
@@ -112,7 +98,7 @@ static void free_otherend_details(struct xenbus_device *dev)
static void free_otherend_watch(struct xenbus_device *dev)
{
if (dev->otherend_watch.node) {
- unregister_xenbus_watch(&dev->otherend_watch);
+ unregister_xenbus_watch(dev->xh, &dev->otherend_watch);
kfree(dev->otherend_watch.node);
dev->otherend_watch.node = NULL;
}
@@ -145,7 +131,7 @@ static int watch_otherend(struct xenbus_device *dev)
int xenbus_read_otherend_details(struct xenbus_device *xendev,
char *id_node, char *path_node)
{
- int err = xenbus_gather(XBT_NIL, xendev->nodename,
+ int err = xenbus_gather(xendev->xh, XBT_NIL, xendev->nodename,
id_node, "%i", &xendev->otherend_id,
path_node, NULL, &xendev->otherend,
NULL);
@@ -156,7 +142,7 @@ int xenbus_read_otherend_details(struct xenbus_device *xendev,
return err;
}
if (strlen(xendev->otherend) == 0 ||
- !xenbus_exists(XBT_NIL, xendev->otherend, "")) {
+ !xenbus_exists(xendev->xh, XBT_NIL, xendev->otherend, "")) {
xenbus_dev_fatal(xendev, -ENOENT,
"unable to read other end from %s. "
"missing or inaccessible.",
@@ -186,7 +172,7 @@ void xenbus_otherend_changed(struct xenbus_watch *watch,
return;
}

- state = xenbus_read_driver_state(dev->otherend);
+ state = xenbus_read_driver_state(dev, dev->otherend);

dev_dbg(&dev->dev, "state is %d, (%s), %s, %s\n",
state, xenbus_strstate(state), dev->otherend_watch.node, path);
@@ -439,7 +425,11 @@ int xenbus_probe_node(struct xen_bus_type *bus,
size_t stringlen;
char *tmpstring;

- enum xenbus_state state = xenbus_read_driver_state(nodename);
+ enum xenbus_state state;
+
+ err = xenbus_gather(bus->xh, XBT_NIL, nodename, "state", "%d", &state, NULL);
+ if (err)
+ state = XenbusStateUnknown;

if (state != XenbusStateInitialising) {
/* Device is not new, so ignore it. This can happen if a
@@ -465,10 +455,11 @@ int xenbus_probe_node(struct xen_bus_type *bus,
xendev->devicetype = tmpstring;
init_completion(&xendev->down);

+ xendev->xh = bus->xh;
xendev->dev.bus = &bus->bus;
xendev->dev.release = xenbus_dev_release;

- err = bus->get_bus_id(devname, xendev->nodename);
+ err = bus->get_bus_id(bus, devname, xendev->nodename);
if (err)
goto fail;

@@ -496,7 +487,7 @@ static int xenbus_probe_device_type(struct xen_bus_type *bus, const char *type)
unsigned int dir_n = 0;
int i;

- dir = xenbus_directory(XBT_NIL, bus->root, type, &dir_n);
+ dir = xenbus_directory(bus->xh, XBT_NIL, bus->root, type, &dir_n);
if (IS_ERR(dir))
return PTR_ERR(dir);

@@ -516,7 +507,7 @@ int xenbus_probe_devices(struct xen_bus_type *bus)
char **dir;
unsigned int i, dir_n;

- dir = xenbus_directory(XBT_NIL, bus->root, "", &dir_n);
+ dir = xenbus_directory(bus->xh, XBT_NIL, bus->root, "", &dir_n);
if (IS_ERR(dir))
return PTR_ERR(dir);

@@ -564,7 +555,7 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)
if (char_count(node, '/') < 2)
return;

- exists = xenbus_exists(XBT_NIL, node, "");
+ exists = xenbus_exists(bus->xh, XBT_NIL, node, "");
if (!exists) {
xenbus_cleanup_devices(node, &bus->bus);
return;
@@ -660,47 +651,61 @@ int xenbus_dev_cancel(struct device *dev)
}
EXPORT_SYMBOL_GPL(xenbus_dev_cancel);

-/* A flag to determine if xenstored is 'ready' (i.e. has started) */
-int xenstored_ready;
-
-
-int register_xenstore_notifier(struct notifier_block *nb)
+int register_xenstore_notifier(xenhost_t *xh, struct notifier_block *nb)
{
int ret = 0;
+ struct xenstore_private *xs = xs_priv(xh);

- if (xenstored_ready > 0)
+ if (xs->xenstored_ready > 0)
ret = nb->notifier_call(nb, 0, NULL);
else
- blocking_notifier_chain_register(&xenstore_chain, nb);
+ blocking_notifier_chain_register(&xs->xenstore_chain, nb);

return ret;
}
EXPORT_SYMBOL_GPL(register_xenstore_notifier);

-void unregister_xenstore_notifier(struct notifier_block *nb)
+void unregister_xenstore_notifier(xenhost_t *xh, struct notifier_block *nb)
{
- blocking_notifier_chain_unregister(&xenstore_chain, nb);
+ struct xenstore_private *xs = xs_priv(xh);
+
+ blocking_notifier_chain_unregister(&xs->xenstore_chain, nb);
}
EXPORT_SYMBOL_GPL(unregister_xenstore_notifier);

-void xenbus_probe(struct work_struct *unused)
+/* Needed by platform-pci */
+void __xenbus_probe(void *_xs)
{
- xenstored_ready = 1;
+ struct xenstore_private *xs = (struct xenstore_private *) _xs;
+ xs->xenstored_ready = 1;

/* Notify others that xenstore is up */
- blocking_notifier_call_chain(&xenstore_chain, 0, NULL);
+ blocking_notifier_call_chain(&xs->xenstore_chain, 0, NULL);
+}
+EXPORT_SYMBOL_GPL(__xenbus_probe);
+
+void xenbus_probe(struct work_struct *w)
+{
+ struct xenstore_private *xs = container_of(w,
+ struct xenstore_private, probe_work);
+
+ __xenbus_probe(xs);
}
-EXPORT_SYMBOL_GPL(xenbus_probe);

static int __init xenbus_probe_initcall(void)
{
+ xenhost_t **xh;
+
if (!xen_domain())
return -ENODEV;

if (xen_initial_domain() || xen_hvm_domain())
return 0;

- xenbus_probe(NULL);
+ for_each_xenhost(xh) {
+ struct xenstore_private *xs = xs_priv(*xh);
+ xenbus_probe(&xs->probe_work);
+ }
return 0;
}

@@ -709,30 +714,31 @@ device_initcall(xenbus_probe_initcall);
/* Set up event channel for xenstored which is run as a local process
* (this is normally used only in dom0)
*/
-static int __init xenstored_local_init(void)
+static int __init xenstored_local_init(xenhost_t *xh)
{
int err = -ENOMEM;
unsigned long page = 0;
struct evtchn_alloc_unbound alloc_unbound;
+ struct xenstore_private *xs = xs_priv(xh);

/* Allocate Xenstore page */
page = get_zeroed_page(GFP_KERNEL);
if (!page)
goto out_err;

- xen_store_gfn = virt_to_gfn((void *)page);
+ xs->store_gfn = virt_to_gfn((void *)page);

/* Next allocate a local port which xenstored can bind to */
alloc_unbound.dom = DOMID_SELF;
alloc_unbound.remote_dom = DOMID_SELF;

- err = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound,
+ err = hypervisor_event_channel_op(xh, EVTCHNOP_alloc_unbound,
&alloc_unbound);
if (err == -ENOSYS)
goto out_err;

BUG_ON(err);
- xen_store_evtchn = alloc_unbound.port;
+ xs->store_evtchn = alloc_unbound.port;

return 0;

@@ -746,18 +752,24 @@ static int xenbus_resume_cb(struct notifier_block *nb,
unsigned long action, void *data)
{
int err = 0;
+ xenhost_t **xh;

- if (xen_hvm_domain()) {
- uint64_t v = 0;
+ for_each_xenhost(xh) {
+ struct xenstore_private *xs = xs_priv(*xh);

- err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v);
- if (!err && v)
- xen_store_evtchn = v;
- else
- pr_warn("Cannot update xenstore event channel: %d\n",
- err);
- } else
- xen_store_evtchn = xen_start_info->store_evtchn;
+ /* FIXME xh->resume_xs()? */
+ if (xen_hvm_domain()) {
+ uint64_t v = 0;
+
+ err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v);
+ if (!err && v)
+ xs->store_evtchn = v;
+ else
+ pr_warn("Cannot update xenstore event channel: %d\n",
+ err);
+ } else
+ xs->store_evtchn = xen_start_info->store_evtchn;
+ }

return err;
}
@@ -766,67 +778,115 @@ static struct notifier_block xenbus_resume_nb = {
.notifier_call = xenbus_resume_cb,
};

-static int __init xenbus_init(void)
+int xenbus_setup(xenhost_t *xh)
{
+ struct xenstore_private *xs = xs_priv(xh);
int err = 0;
- uint64_t v = 0;
- xen_store_domain_type = XS_UNKNOWN;

- if (!xen_domain())
- return -ENODEV;
+ BUG_ON(xs->domain_type == XS_UNKNOWN);

- xenbus_ring_ops_init();
-
- if (xen_pv_domain())
- xen_store_domain_type = XS_PV;
- if (xen_hvm_domain())
- xen_store_domain_type = XS_HVM;
- if (xen_hvm_domain() && xen_initial_domain())
- xen_store_domain_type = XS_LOCAL;
- if (xen_pv_domain() && !xen_start_info->store_evtchn)
- xen_store_domain_type = XS_LOCAL;
- if (xen_pv_domain() && xen_start_info->store_evtchn)
- xenstored_ready = 1;
-
- switch (xen_store_domain_type) {
+ switch (xs->domain_type) {
case XS_LOCAL:
- err = xenstored_local_init();
+ err = xenstored_local_init(xh);
if (err)
- goto out_error;
- xen_store_interface = gfn_to_virt(xen_store_gfn);
+ goto out;
+ xs->store_interface = gfn_to_virt(xs->store_gfn);
break;
case XS_PV:
- xen_store_evtchn = xen_start_info->store_evtchn;
- xen_store_gfn = xen_start_info->store_mfn;
- xen_store_interface = gfn_to_virt(xen_store_gfn);
+ xs->store_interface = gfn_to_virt(xs->store_gfn);
+ xs->xenstored_ready = 1;
break;
case XS_HVM:
- err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v);
- if (err)
- goto out_error;
- xen_store_evtchn = (int)v;
- err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
- if (err)
- goto out_error;
- xen_store_gfn = (unsigned long)v;
- xen_store_interface =
- xen_remap(xen_store_gfn << XEN_PAGE_SHIFT,
+ xs->store_interface =
+ xen_remap(xs->store_gfn << XEN_PAGE_SHIFT,
XEN_PAGE_SIZE);
break;
default:
pr_warn("Xenstore state unknown\n");
break;
}
+out:
+ return err;
+}

- /* Initialize the interface to xenstore. */
- err = xs_init();
- if (err) {
- pr_warn("Error initializing xenstore comms: %i\n", err);
- goto out_error;
+int xen_hvm_setup_xs(xenhost_t *xh)
+{
+ uint64_t v = 0;
+ int err = 0;
+ struct xenstore_private *xs = xs_priv(xh);
+
+ if (xen_initial_domain()) {
+ xs->domain_type = XS_LOCAL;
+ xs->store_evtchn = 0;
+ xs->store_gfn = 0;
+ } else { /* Frontend */
+ xs->domain_type = XS_HVM;
+ err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v);
+ if (err)
+ goto out;
+ xs->store_evtchn = (int) v;
+
+ err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
+ if (err)
+ goto out;
+ xs->store_gfn = (int) v;
+ }
+
+out:
+ return err;
+}
+
+int xen_pv_setup_xs(xenhost_t *xh)
+{
+ struct xenstore_private *xs = xs_priv(xh);
+
+ if (xen_initial_domain()) {
+ xs->domain_type = XS_LOCAL;
+ xs->store_evtchn = 0;
+ xs->store_gfn = 0;
+ } else { /* Frontend */
+ xs->domain_type = XS_PV;
+ xs->store_evtchn = xen_start_info->store_evtchn;
+ xs->store_gfn = xen_start_info->store_mfn;
+ }
+
+ return 0;
+}
+
+static int __init xenbus_init(void)
+{
+ int err = 0;
+ struct xenstore_private *xs;
+ xenhost_t **xh;
+ int notifier = 0;
+
+ if (!xen_domain())
+ return -ENODEV;
+
+ xenbus_ring_ops_init();
+
+ for_each_xenhost(xh) {
+ (*xh)->xenstore_private = kzalloc(sizeof(*xs), GFP_KERNEL);
+ xenhost_setup_xs(*xh);
+ err = xenbus_setup(*xh);
+ if (err)
+ goto out_error;
+
+ /* Initialize the interface to xenstore. */
+ err = xs_init(*xh);
+ if (err) {
+ pr_warn("Error initializing xenstore comms: %i\n", err);
+ goto out_error;
+ }
+
+ xs = xs_priv(*xh);
+
+ if ((xs->domain_type != XS_LOCAL) &&
+ (xs->domain_type != XS_UNKNOWN))
+ notifier++;
}

- if ((xen_store_domain_type != XS_LOCAL) &&
- (xen_store_domain_type != XS_UNKNOWN))
+ if (notifier)
xen_resume_notifier_register(&xenbus_resume_nb);

#ifdef CONFIG_XEN_COMPAT_XENFS
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index d3c53a9db5e3..f030d6ab3c31 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -57,7 +57,8 @@
#include "xenbus.h"

/* backend/<type>/<fe-uuid>/<id> => <type>-<fe-domid>-<id> */
-static int backend_bus_id(char bus_id[XEN_BUS_ID_SIZE], const char *nodename)
+static int backend_bus_id(struct xen_bus_type *bus, char bus_id[XEN_BUS_ID_SIZE],
+ const char *nodename)
{
int domid, err;
const char *devid, *type, *frontend;
@@ -73,14 +74,14 @@ static int backend_bus_id(char bus_id[XEN_BUS_ID_SIZE], const char *nodename)

devid = strrchr(nodename, '/') + 1;

- err = xenbus_gather(XBT_NIL, nodename, "frontend-id", "%i", &domid,
+ err = xenbus_gather(bus->xh, XBT_NIL, nodename, "frontend-id", "%i", &domid,
"frontend", NULL, &frontend,
NULL);
if (err)
return err;
if (strlen(frontend) == 0)
err = -ERANGE;
- if (!err && !xenbus_exists(XBT_NIL, frontend, ""))
+ if (!err && !xenbus_exists(bus->xh, XBT_NIL, frontend, ""))
err = -ENOENT;
kfree(frontend);

@@ -165,7 +166,7 @@ static int xenbus_probe_backend(struct xen_bus_type *bus, const char *type,
if (!nodename)
return -ENOMEM;

- dir = xenbus_directory(XBT_NIL, nodename, "", &dir_n);
+ dir = xenbus_directory(bus->xh, XBT_NIL, nodename, "", &dir_n);
if (IS_ERR(dir)) {
kfree(nodename);
return PTR_ERR(dir);
@@ -189,6 +190,7 @@ static void frontend_changed(struct xenbus_watch *watch,

static struct xen_bus_type xenbus_backend = {
.root = "backend",
+ .xh = NULL, /* Filled in by xenbus_probe_backend_init() */
.levels = 3, /* backend/type/<frontend>/<id> */
.get_bus_id = backend_bus_id,
.probe = xenbus_probe_backend,
@@ -224,7 +226,7 @@ static int read_frontend_details(struct xenbus_device *xendev)

int xenbus_dev_is_online(struct xenbus_device *dev)
{
- return !!xenbus_read_unsigned(dev->nodename, "online", 0);
+ return !!xenbus_read_unsigned(dev->xh, dev->nodename, "online", 0);
}
EXPORT_SYMBOL_GPL(xenbus_dev_is_online);

@@ -244,7 +246,7 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
{
/* Enumerate devices in xenstore and watch for changes. */
xenbus_probe_devices(&xenbus_backend);
- register_xenbus_watch(&be_watch);
+ register_xenbus_watch(xenbus_backend.xh, &be_watch);

return NOTIFY_DONE;
}
@@ -258,12 +260,15 @@ static int __init xenbus_probe_backend_init(void)

DPRINTK("");

+ /* Backends always talk to default xenhost */
+ xenbus_backend.xh = xh_default;
+
/* Register ourselves with the kernel bus subsystem */
err = bus_register(&xenbus_backend.bus);
if (err)
return err;

- register_xenstore_notifier(&xenstore_notifier);
+ register_xenstore_notifier(xenbus_backend.xh, &xenstore_notifier);

return 0;
}
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index 3edab7cc03c3..fa2f733d1f1e 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -20,6 +20,7 @@
#include <asm/page.h>
#include <asm/pgtable.h>
#include <xen/interface/xen.h>
+#include <xen/xenhost.h>
#include <asm/xen/hypervisor.h>
#include <xen/xenbus.h>
#include <xen/events.h>
@@ -33,7 +34,8 @@


/* device/<type>/<id> => <type>-<id> */
-static int frontend_bus_id(char bus_id[XEN_BUS_ID_SIZE], const char *nodename)
+static int frontend_bus_id(struct xen_bus_type *bus, char bus_id[XEN_BUS_ID_SIZE],
+ const char *nodename)
{
nodename = strchr(nodename, '/');
if (!nodename || strlen(nodename + 1) >= XEN_BUS_ID_SIZE) {
@@ -101,13 +103,13 @@ static void xenbus_frontend_delayed_resume(struct work_struct *w)

static int xenbus_frontend_dev_resume(struct device *dev)
{
+ struct xenbus_device *xdev = to_xenbus_device(dev);
+ struct xenstore_private *xs = xs_priv(xdev->xh);
/*
* If xenstored is running in this domain, we cannot access the backend
* state at the moment, so we need to defer xenbus_dev_resume
*/
- if (xen_store_domain_type == XS_LOCAL) {
- struct xenbus_device *xdev = to_xenbus_device(dev);
-
+ if (xs->domain_type == XS_LOCAL) {
schedule_work(&xdev->work);

return 0;
@@ -118,8 +120,10 @@ static int xenbus_frontend_dev_resume(struct device *dev)

static int xenbus_frontend_dev_probe(struct device *dev)
{
- if (xen_store_domain_type == XS_LOCAL) {
- struct xenbus_device *xdev = to_xenbus_device(dev);
+ struct xenbus_device *xdev = to_xenbus_device(dev);
+ struct xenstore_private *xs = xs_priv(xdev->xh);
+
+ if (xs->domain_type == XS_LOCAL) {
INIT_WORK(&xdev->work, xenbus_frontend_delayed_resume);
}

@@ -136,6 +140,7 @@ static const struct dev_pm_ops xenbus_pm_ops = {

static struct xen_bus_type xenbus_frontend = {
.root = "device",
+ .xh = NULL, /* initialized in xenbus_probe_frontend_init() */
.levels = 2, /* device/type/<id> */
.get_bus_id = frontend_bus_id,
.probe = xenbus_probe_frontend,
@@ -242,7 +247,7 @@ static int print_device_status(struct device *dev, void *data)
} else if (xendev->state < XenbusStateConnected) {
enum xenbus_state rstate = XenbusStateUnknown;
if (xendev->otherend)
- rstate = xenbus_read_driver_state(xendev->otherend);
+ rstate = xenbus_read_driver_state(xendev, xendev->otherend);
pr_warn("Timeout connecting to device: %s (local state %d, remote state %d)\n",
xendev->nodename, xendev->state, rstate);
}
@@ -335,7 +340,7 @@ static int backend_state;
static void xenbus_reset_backend_state_changed(struct xenbus_watch *w,
const char *path, const char *token)
{
- if (xenbus_scanf(XBT_NIL, path, "", "%i",
+ if (xenbus_scanf(xenbus_frontend.xh, XBT_NIL, path, "", "%i",
&backend_state) != 1)
backend_state = XenbusStateUnknown;
printk(KERN_DEBUG "XENBUS: backend %s %s\n",
@@ -373,26 +378,27 @@ static void xenbus_reset_frontend(char *fe, char *be, int be_state)
backend_state = XenbusStateUnknown;

pr_info("triggering reconnect on %s\n", be);
- register_xenbus_watch(&be_watch);
+ register_xenbus_watch(xenbus_frontend.xh, &be_watch);

/* fall through to forward backend to state XenbusStateInitialising */
switch (be_state) {
case XenbusStateConnected:
- xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateClosing);
+ xenbus_printf(xenbus_frontend.xh, XBT_NIL, fe,
+ "state", "%d", XenbusStateClosing);
xenbus_reset_wait_for_backend(be, XenbusStateClosing);
/* fall through */

case XenbusStateClosing:
- xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateClosed);
+ xenbus_printf(xenbus_frontend.xh, XBT_NIL, fe, "state", "%d", XenbusStateClosed);
xenbus_reset_wait_for_backend(be, XenbusStateClosed);
/* fall through */

case XenbusStateClosed:
- xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateInitialising);
+ xenbus_printf(xenbus_frontend.xh, XBT_NIL, fe, "state", "%d", XenbusStateInitialising);
xenbus_reset_wait_for_backend(be, XenbusStateInitWait);
}

- unregister_xenbus_watch(&be_watch);
+ unregister_xenbus_watch(xenbus_frontend.xh, &be_watch);
pr_info("reconnect done on %s\n", be);
kfree(be_watch.node);
}
@@ -406,7 +412,7 @@ static void xenbus_check_frontend(char *class, char *dev)
if (!frontend)
return;

- err = xenbus_scanf(XBT_NIL, frontend, "state", "%i", &fe_state);
+ err = xenbus_scanf(xenbus_frontend.xh, XBT_NIL, frontend, "state", "%i", &fe_state);
if (err != 1)
goto out;

@@ -415,10 +421,10 @@ static void xenbus_check_frontend(char *class, char *dev)
case XenbusStateClosed:
printk(KERN_DEBUG "XENBUS: frontend %s %s\n",
frontend, xenbus_strstate(fe_state));
- backend = xenbus_read(XBT_NIL, frontend, "backend", NULL);
+ backend = xenbus_read(xenbus_frontend.xh, XBT_NIL, frontend, "backend", NULL);
if (!backend || IS_ERR(backend))
goto out;
- err = xenbus_scanf(XBT_NIL, backend, "state", "%i", &be_state);
+ err = xenbus_scanf(xenbus_frontend.xh, XBT_NIL, backend, "state", "%i", &be_state);
if (err == 1)
xenbus_reset_frontend(frontend, backend, be_state);
kfree(backend);
@@ -430,18 +436,18 @@ static void xenbus_check_frontend(char *class, char *dev)
kfree(frontend);
}

-static void xenbus_reset_state(void)
+static void xenbus_reset_state(xenhost_t *xh)
{
char **devclass, **dev;
int devclass_n, dev_n;
int i, j;

- devclass = xenbus_directory(XBT_NIL, "device", "", &devclass_n);
+ devclass = xenbus_directory(xh, XBT_NIL, "device", "", &devclass_n);
if (IS_ERR(devclass))
return;

for (i = 0; i < devclass_n; i++) {
- dev = xenbus_directory(XBT_NIL, "device", devclass[i], &dev_n);
+ dev = xenbus_directory(xh, XBT_NIL, "device", devclass[i], &dev_n);
if (IS_ERR(dev))
continue;
for (j = 0; j < dev_n; j++)
@@ -453,14 +459,14 @@ static void xenbus_reset_state(void)

static int frontend_probe_and_watch(struct notifier_block *notifier,
unsigned long event,
- void *data)
+ void *xh)
{
/* reset devices in Connected or Closed state */
if (xen_hvm_domain())
- xenbus_reset_state();
+ xenbus_reset_state((xenhost_t *)xh);
/* Enumerate devices in xenstore and watch for changes. */
xenbus_probe_devices(&xenbus_frontend);
- register_xenbus_watch(&fe_watch);
+ register_xenbus_watch(xh, &fe_watch);

return NOTIFY_DONE;
}
@@ -475,12 +481,19 @@ static int __init xenbus_probe_frontend_init(void)

DPRINTK("");

+ if (xen_driver_domain() && xen_nested())
+ xenbus_frontend.xh = xh_remote;
+ else
+ xenbus_frontend.xh = xh_default;
+
/* Register ourselves with the kernel bus subsystem */
- err = bus_register(&xenbus_frontend.bus);
- if (err)
- return err;
+ if (xenbus_frontend.xh) {
+ err = bus_register(&xenbus_frontend.bus);
+ if (err)
+ return err;

- register_xenstore_notifier(&xenstore_notifier);
+ register_xenstore_notifier(xenbus_frontend.xh, &xenstore_notifier);
+ }

return 0;
}
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 74c2b9416b88..35c771bea9b6 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -76,8 +76,6 @@ static DECLARE_WAIT_QUEUE_HEAD(xs_state_enter_wq);
/* Wait queue for suspend handling waiting for critical region being empty. */
static DECLARE_WAIT_QUEUE_HEAD(xs_state_exit_wq);

-/* List of registered watches, and a lock to protect it. */
-static LIST_HEAD(watches);
static DEFINE_SPINLOCK(watches_lock);

/* List of pending watch callback events, and a lock to protect it. */
@@ -166,9 +164,9 @@ static int get_error(const char *errorstring)
return xsd_errors[i].errnum;
}

-static bool xenbus_ok(void)
+static bool xenbus_ok(struct xenstore_private *xs)
{
- switch (xen_store_domain_type) {
+ switch (xs->domain_type) {
case XS_LOCAL:
switch (system_state) {
case SYSTEM_POWER_OFF:
@@ -190,9 +188,9 @@ static bool xenbus_ok(void)
return false;
}

-static bool test_reply(struct xb_req_data *req)
+static bool test_reply(struct xenstore_private *xs, struct xb_req_data *req)
{
- if (req->state == xb_req_state_got_reply || !xenbus_ok())
+ if (req->state == xb_req_state_got_reply || !xenbus_ok(xs))
return true;

/* Make sure to reread req->state each time. */
@@ -201,12 +199,12 @@ static bool test_reply(struct xb_req_data *req)
return false;
}

-static void *read_reply(struct xb_req_data *req)
+static void *read_reply(struct xenstore_private *xs, struct xb_req_data *req)
{
while (req->state != xb_req_state_got_reply) {
- wait_event(req->wq, test_reply(req));
+ wait_event(req->wq, test_reply(xs, req));

- if (!xenbus_ok())
+ if (!xenbus_ok(xs))
/*
* If we are in the process of being shut-down there is
* no point of trying to contact XenBus - it is either
@@ -222,9 +220,10 @@ static void *read_reply(struct xb_req_data *req)
return req->body;
}

-static void xs_send(struct xb_req_data *req, struct xsd_sockmsg *msg)
+static void xs_send(xenhost_t *xh, struct xb_req_data *req, struct xsd_sockmsg *msg)
{
bool notify;
+ struct xenstore_private *xs = xs_priv(xh);

req->msg = *msg;
req->err = 0;
@@ -236,19 +235,19 @@ static void xs_send(struct xb_req_data *req, struct xsd_sockmsg *msg)
req->msg.req_id = xs_request_enter(req);

mutex_lock(&xb_write_mutex);
- list_add_tail(&req->list, &xb_write_list);
- notify = list_is_singular(&xb_write_list);
+ list_add_tail(&req->list, &xs->xb_write_list);
+ notify = list_is_singular(&xs->xb_write_list);
mutex_unlock(&xb_write_mutex);

if (notify)
- wake_up(&xb_waitq);
+ wake_up(&xs->xb_waitq);
}

-static void *xs_wait_for_reply(struct xb_req_data *req, struct xsd_sockmsg *msg)
+static void *xs_wait_for_reply(struct xenstore_private *xs, struct xb_req_data *req, struct xsd_sockmsg *msg)
{
void *ret;

- ret = read_reply(req);
+ ret = read_reply(xs, req);

xs_request_exit(req);

@@ -271,7 +270,7 @@ static void xs_wake_up(struct xb_req_data *req)
wake_up(&req->wq);
}

-int xenbus_dev_request_and_reply(struct xsd_sockmsg *msg, void *par)
+int xenbus_dev_request_and_reply(xenhost_t *xh, struct xsd_sockmsg *msg, void *par)
{
struct xb_req_data *req;
struct kvec *vec;
@@ -289,14 +288,15 @@ int xenbus_dev_request_and_reply(struct xsd_sockmsg *msg, void *par)
req->cb = xenbus_dev_queue_reply;
req->par = par;

- xs_send(req, msg);
+ xs_send(xh, req, msg);

return 0;
}
EXPORT_SYMBOL(xenbus_dev_request_and_reply);

/* Send message to xs, get kmalloc'ed reply. ERR_PTR() on error. */
-static void *xs_talkv(struct xenbus_transaction t,
+static void *xs_talkv(xenhost_t *xh,
+ struct xenbus_transaction t,
enum xsd_sockmsg_type type,
const struct kvec *iovec,
unsigned int num_vecs,
@@ -307,6 +307,7 @@ static void *xs_talkv(struct xenbus_transaction t,
void *ret = NULL;
unsigned int i;
int err;
+ struct xenstore_private *xs = xs_priv(xh);

req = kmalloc(sizeof(*req), GFP_NOIO | __GFP_HIGH);
if (!req)
@@ -323,9 +324,9 @@ static void *xs_talkv(struct xenbus_transaction t,
for (i = 0; i < num_vecs; i++)
msg.len += iovec[i].iov_len;

- xs_send(req, &msg);
+ xs_send(xh, req, &msg);

- ret = xs_wait_for_reply(req, &msg);
+ ret = xs_wait_for_reply(xs, req, &msg);
if (len)
*len = msg.len;

@@ -348,7 +349,7 @@ static void *xs_talkv(struct xenbus_transaction t,
}

/* Simplified version of xs_talkv: single message. */
-static void *xs_single(struct xenbus_transaction t,
+static void *xs_single(xenhost_t *xh, struct xenbus_transaction t,
enum xsd_sockmsg_type type,
const char *string,
unsigned int *len)
@@ -357,7 +358,7 @@ static void *xs_single(struct xenbus_transaction t,

iovec.iov_base = (void *)string;
iovec.iov_len = strlen(string) + 1;
- return xs_talkv(t, type, &iovec, 1, len);
+ return xs_talkv(xh, t, type, &iovec, 1, len);
}

/* Many commands only need an ack, don't care what it says. */
@@ -415,7 +416,7 @@ static char **split(char *strings, unsigned int len, unsigned int *num)
return ret;
}

-char **xenbus_directory(struct xenbus_transaction t,
+char **xenbus_directory(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, unsigned int *num)
{
char *strings, *path;
@@ -425,7 +426,7 @@ char **xenbus_directory(struct xenbus_transaction t,
if (IS_ERR(path))
return (char **)path;

- strings = xs_single(t, XS_DIRECTORY, path, &len);
+ strings = xs_single(xh, t, XS_DIRECTORY, path, &len);
kfree(path);
if (IS_ERR(strings))
return (char **)strings;
@@ -435,13 +436,13 @@ char **xenbus_directory(struct xenbus_transaction t,
EXPORT_SYMBOL_GPL(xenbus_directory);

/* Check if a path exists. Return 1 if it does. */
-int xenbus_exists(struct xenbus_transaction t,
+int xenbus_exists(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node)
{
char **d;
int dir_n;

- d = xenbus_directory(t, dir, node, &dir_n);
+ d = xenbus_directory(xh, t, dir, node, &dir_n);
if (IS_ERR(d))
return 0;
kfree(d);
@@ -453,7 +454,7 @@ EXPORT_SYMBOL_GPL(xenbus_exists);
* Returns a kmalloced value: call free() on it after use.
* len indicates length in bytes.
*/
-void *xenbus_read(struct xenbus_transaction t,
+void *xenbus_read(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, unsigned int *len)
{
char *path;
@@ -463,7 +464,7 @@ void *xenbus_read(struct xenbus_transaction t,
if (IS_ERR(path))
return (void *)path;

- ret = xs_single(t, XS_READ, path, len);
+ ret = xs_single(xh, t, XS_READ, path, len);
kfree(path);
return ret;
}
@@ -472,7 +473,7 @@ EXPORT_SYMBOL_GPL(xenbus_read);
/* Write the value of a single file.
* Returns -err on failure.
*/
-int xenbus_write(struct xenbus_transaction t,
+int xenbus_write(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, const char *string)
{
const char *path;
@@ -488,14 +489,14 @@ int xenbus_write(struct xenbus_transaction t,
iovec[1].iov_base = (void *)string;
iovec[1].iov_len = strlen(string);

- ret = xs_error(xs_talkv(t, XS_WRITE, iovec, ARRAY_SIZE(iovec), NULL));
+ ret = xs_error(xs_talkv(xh, t, XS_WRITE, iovec, ARRAY_SIZE(iovec), NULL));
kfree(path);
return ret;
}
EXPORT_SYMBOL_GPL(xenbus_write);

/* Create a new directory. */
-int xenbus_mkdir(struct xenbus_transaction t,
+int xenbus_mkdir(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node)
{
char *path;
@@ -505,14 +506,14 @@ int xenbus_mkdir(struct xenbus_transaction t,
if (IS_ERR(path))
return PTR_ERR(path);

- ret = xs_error(xs_single(t, XS_MKDIR, path, NULL));
+ ret = xs_error(xs_single(xh, t, XS_MKDIR, path, NULL));
kfree(path);
return ret;
}
EXPORT_SYMBOL_GPL(xenbus_mkdir);

/* Destroy a file or directory (directories must be empty). */
-int xenbus_rm(struct xenbus_transaction t, const char *dir, const char *node)
+int xenbus_rm(xenhost_t *xh, struct xenbus_transaction t, const char *dir, const char *node)
{
char *path;
int ret;
@@ -521,7 +522,7 @@ int xenbus_rm(struct xenbus_transaction t, const char *dir, const char *node)
if (IS_ERR(path))
return PTR_ERR(path);

- ret = xs_error(xs_single(t, XS_RM, path, NULL));
+ ret = xs_error(xs_single(xh, t, XS_RM, path, NULL));
kfree(path);
return ret;
}
@@ -530,11 +531,11 @@ EXPORT_SYMBOL_GPL(xenbus_rm);
/* Start a transaction: changes by others will not be seen during this
* transaction, and changes will not be visible to others until end.
*/
-int xenbus_transaction_start(struct xenbus_transaction *t)
+int xenbus_transaction_start(xenhost_t *xh, struct xenbus_transaction *t)
{
char *id_str;

- id_str = xs_single(XBT_NIL, XS_TRANSACTION_START, "", NULL);
+ id_str = xs_single(xh, XBT_NIL, XS_TRANSACTION_START, "", NULL);
if (IS_ERR(id_str))
return PTR_ERR(id_str);

@@ -547,7 +548,7 @@ EXPORT_SYMBOL_GPL(xenbus_transaction_start);
/* End a transaction.
* If abandon is true, transaction is discarded instead of committed.
*/
-int xenbus_transaction_end(struct xenbus_transaction t, int abort)
+int xenbus_transaction_end(xenhost_t *xh, struct xenbus_transaction t, int abort)
{
char abortstr[2];

@@ -556,19 +557,19 @@ int xenbus_transaction_end(struct xenbus_transaction t, int abort)
else
strcpy(abortstr, "T");

- return xs_error(xs_single(t, XS_TRANSACTION_END, abortstr, NULL));
+ return xs_error(xs_single(xh, t, XS_TRANSACTION_END, abortstr, NULL));
}
EXPORT_SYMBOL_GPL(xenbus_transaction_end);

/* Single read and scanf: returns -errno or num scanned. */
-int xenbus_scanf(struct xenbus_transaction t,
+int xenbus_scanf(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, const char *fmt, ...)
{
va_list ap;
int ret;
char *val;

- val = xenbus_read(t, dir, node, NULL);
+ val = xenbus_read(xh, t, dir, node, NULL);
if (IS_ERR(val))
return PTR_ERR(val);

@@ -584,13 +585,13 @@ int xenbus_scanf(struct xenbus_transaction t,
EXPORT_SYMBOL_GPL(xenbus_scanf);

/* Read an (optional) unsigned value. */
-unsigned int xenbus_read_unsigned(const char *dir, const char *node,
+unsigned int xenbus_read_unsigned(xenhost_t *xh, const char *dir, const char *node,
unsigned int default_val)
{
unsigned int val;
int ret;

- ret = xenbus_scanf(XBT_NIL, dir, node, "%u", &val);
+ ret = xenbus_scanf(xh, XBT_NIL, dir, node, "%u", &val);
if (ret <= 0)
val = default_val;

@@ -599,7 +600,7 @@ unsigned int xenbus_read_unsigned(const char *dir, const char *node,
EXPORT_SYMBOL_GPL(xenbus_read_unsigned);

/* Single printf and write: returns -errno or 0. */
-int xenbus_printf(struct xenbus_transaction t,
+int xenbus_printf(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, const char *fmt, ...)
{
va_list ap;
@@ -613,7 +614,7 @@ int xenbus_printf(struct xenbus_transaction t,
if (!buf)
return -ENOMEM;

- ret = xenbus_write(t, dir, node, buf);
+ ret = xenbus_write(xh, t, dir, node, buf);

kfree(buf);

@@ -622,7 +623,7 @@ int xenbus_printf(struct xenbus_transaction t,
EXPORT_SYMBOL_GPL(xenbus_printf);

/* Takes tuples of names, scanf-style args, and void **, NULL terminated. */
-int xenbus_gather(struct xenbus_transaction t, const char *dir, ...)
+int xenbus_gather(xenhost_t *xh, struct xenbus_transaction t, const char *dir, ...)
{
va_list ap;
const char *name;
@@ -634,7 +635,7 @@ int xenbus_gather(struct xenbus_transaction t, const char *dir, ...)
void *result = va_arg(ap, void *);
char *p;

- p = xenbus_read(t, dir, name, NULL);
+ p = xenbus_read(xh, t, dir, name, NULL);
if (IS_ERR(p)) {
ret = PTR_ERR(p);
break;
@@ -651,7 +652,7 @@ int xenbus_gather(struct xenbus_transaction t, const char *dir, ...)
}
EXPORT_SYMBOL_GPL(xenbus_gather);

-static int xs_watch(const char *path, const char *token)
+static int xs_watch(xenhost_t *xh, const char *path, const char *token)
{
struct kvec iov[2];

@@ -660,11 +661,11 @@ static int xs_watch(const char *path, const char *token)
iov[1].iov_base = (void *)token;
iov[1].iov_len = strlen(token) + 1;

- return xs_error(xs_talkv(XBT_NIL, XS_WATCH, iov,
+ return xs_error(xs_talkv(xh, XBT_NIL, XS_WATCH, iov,
ARRAY_SIZE(iov), NULL));
}

-static int xs_unwatch(const char *path, const char *token)
+static int xs_unwatch(xenhost_t *xh, const char *path, const char *token)
{
struct kvec iov[2];

@@ -673,24 +674,25 @@ static int xs_unwatch(const char *path, const char *token)
iov[1].iov_base = (char *)token;
iov[1].iov_len = strlen(token) + 1;

- return xs_error(xs_talkv(XBT_NIL, XS_UNWATCH, iov,
+ return xs_error(xs_talkv(xh, XBT_NIL, XS_UNWATCH, iov,
ARRAY_SIZE(iov), NULL));
}

-static struct xenbus_watch *find_watch(const char *token)
+static struct xenbus_watch *find_watch(xenhost_t *xh, const char *token)
{
struct xenbus_watch *i, *cmp;
+ struct xenstore_private *xs = xs_priv(xh);

cmp = (void *)simple_strtoul(token, NULL, 16);

- list_for_each_entry(i, &watches, list)
+ list_for_each_entry(i, &xs->watches, list)
if (i == cmp)
return i;

return NULL;
}

-int xs_watch_msg(struct xs_watch_event *event)
+int xs_watch_msg(xenhost_t *xh, struct xs_watch_event *event)
{
if (count_strings(event->body, event->len) != 2) {
kfree(event);
@@ -700,7 +702,7 @@ int xs_watch_msg(struct xs_watch_event *event)
event->token = (const char *)strchr(event->body, '\0') + 1;

spin_lock(&watches_lock);
- event->handle = find_watch(event->token);
+ event->handle = find_watch(xh, event->token);
if (event->handle != NULL) {
spin_lock(&watch_events_lock);
list_add_tail(&event->list, &watch_events);
@@ -719,7 +721,7 @@ int xs_watch_msg(struct xs_watch_event *event)
* so if we are running on anything older than 4 do not attempt to read
* control/platform-feature-xs_reset_watches.
*/
-static bool xen_strict_xenbus_quirk(void)
+static bool xen_strict_xenbus_quirk(xenhost_t *xh)
{
#ifdef CONFIG_X86
uint32_t eax, ebx, ecx, edx, base;
@@ -733,42 +735,44 @@ static bool xen_strict_xenbus_quirk(void)
return false;

}
-static void xs_reset_watches(void)
+static void xs_reset_watches(xenhost_t *xh)
{
int err;

if (!xen_hvm_domain() || xen_initial_domain())
return;

- if (xen_strict_xenbus_quirk())
+ if (xen_strict_xenbus_quirk(xh))
return;

- if (!xenbus_read_unsigned("control",
+ if (!xenbus_read_unsigned(xh, "control",
"platform-feature-xs_reset_watches", 0))
return;

- err = xs_error(xs_single(XBT_NIL, XS_RESET_WATCHES, "", NULL));
+ err = xs_error(xs_single(xh, XBT_NIL, XS_RESET_WATCHES, "", NULL));
if (err && err != -EEXIST)
pr_warn("xs_reset_watches failed: %d\n", err);
}

/* Register callback to watch this node. */
-int register_xenbus_watch(struct xenbus_watch *watch)
+int register_xenbus_watch(xenhost_t *xh, struct xenbus_watch *watch)
{
/* Pointer in ascii is the token. */
char token[sizeof(watch) * 2 + 1];
+ struct xenstore_private *xs = xs_priv(xh);
int err;

sprintf(token, "%lX", (long)watch);
+ watch->xh = xh;

down_read(&xs_watch_rwsem);

spin_lock(&watches_lock);
- BUG_ON(find_watch(token));
- list_add(&watch->list, &watches);
+ BUG_ON(find_watch(xh, token));
+ list_add(&watch->list, &xs->watches);
spin_unlock(&watches_lock);

- err = xs_watch(watch->node, token);
+ err = xs_watch(xh, watch->node, token);

if (err) {
spin_lock(&watches_lock);
@@ -782,7 +786,7 @@ int register_xenbus_watch(struct xenbus_watch *watch)
}
EXPORT_SYMBOL_GPL(register_xenbus_watch);

-void unregister_xenbus_watch(struct xenbus_watch *watch)
+void unregister_xenbus_watch(xenhost_t *xh, struct xenbus_watch *watch)
{
struct xs_watch_event *event, *tmp;
char token[sizeof(watch) * 2 + 1];
@@ -793,11 +797,11 @@ void unregister_xenbus_watch(struct xenbus_watch *watch)
down_read(&xs_watch_rwsem);

spin_lock(&watches_lock);
- BUG_ON(!find_watch(token));
+ BUG_ON(!find_watch(xh, token));
list_del(&watch->list);
spin_unlock(&watches_lock);

- err = xs_unwatch(watch->node, token);
+ err = xs_unwatch(xh, watch->node, token);
if (err)
pr_warn("Failed to release watch %s: %i\n", watch->node, err);

@@ -831,24 +835,29 @@ void xs_suspend(void)
mutex_lock(&xs_response_mutex);
}

-void xs_resume(void)
+void xs_resume(void)
{
struct xenbus_watch *watch;
char token[sizeof(watch) * 2 + 1];
+ xenhost_t **xh;

- xb_init_comms();
+ for_each_xenhost(xh) {
+ struct xenstore_private *xs = xs_priv(*xh);

- mutex_unlock(&xs_response_mutex);
+ xb_init_comms(*xh);

- xs_suspend_exit();
+ mutex_unlock(&xs_response_mutex);

- /* No need for watches_lock: the xs_watch_rwsem is sufficient. */
- list_for_each_entry(watch, &watches, list) {
- sprintf(token, "%lX", (long)watch);
- xs_watch(watch->node, token);
+ xs_suspend_exit();
+
+ /* No need for watches_lock: the xs_watch_rwsem is sufficient. */
+ list_for_each_entry(watch, &xs->watches, list) {
+ sprintf(token, "%lX", (long)watch);
+ xs_watch(*xh, watch->node, token);
+ }
+
+ up_write(&xs_watch_rwsem);
}
-
- up_write(&xs_watch_rwsem);
}

void xs_suspend_cancel(void)
@@ -905,13 +914,18 @@ static int xs_reboot_notify(struct notifier_block *nb,
unsigned long code, void *unused)
{
struct xb_req_data *req;
+ xenhost_t **xh;

- mutex_lock(&xb_write_mutex);
- list_for_each_entry(req, &xs_reply_list, list)
- wake_up(&req->wq);
- list_for_each_entry(req, &xb_write_list, list)
- wake_up(&req->wq);
- mutex_unlock(&xb_write_mutex);
+ for_each_xenhost(xh) {
+ struct xenstore_private *xs = xs_priv(*xh);
+
+ mutex_lock(&xb_write_mutex);
+ list_for_each_entry(req, &xs->reply_list, list)
+ wake_up(&req->wq);
+ list_for_each_entry(req, &xs->xb_write_list, list)
+ wake_up(&req->wq);
+ mutex_unlock(&xb_write_mutex);
+ }
return NOTIFY_DONE;
}

@@ -919,15 +933,17 @@ static struct notifier_block xs_reboot_nb = {
.notifier_call = xs_reboot_notify,
};

-int xs_init(void)
+int xs_init(xenhost_t *xh)
{
int err;
struct task_struct *task;

- register_reboot_notifier(&xs_reboot_nb);
+ if (xh->type != xenhost_r2)
+ /* Needs to be moved out */
+ register_reboot_notifier(&xs_reboot_nb);

/* Initialize the shared memory rings to talk to xenstored */
- err = xb_init_comms();
+ err = xb_init_comms(xh);
if (err)
return err;

@@ -936,7 +952,7 @@ int xs_init(void)
return PTR_ERR(task);

/* shutdown watches for kexec boot */
- xs_reset_watches();
+ xs_reset_watches(xh);

return 0;
}
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 75be9059893f..3ba2f6b1e196 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -204,6 +204,9 @@ int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
int xen_xlate_map_ballooned_pages(xen_pfn_t **pfns, void **vaddr,
unsigned long nr_grant_frames);

+int xen_hvm_setup_xs(xenhost_t *xh);
+int xen_pv_setup_xs(xenhost_t *xh);
+
bool xen_running_on_version_or_later(unsigned int major, unsigned int minor);

efi_status_t xen_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..8f8c39008e15 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -43,6 +43,7 @@
#include <linux/init.h>
#include <linux/slab.h>
#include <xen/interface/xen.h>
+#include <xen/xenhost.h>
#include <xen/interface/grant_table.h>
#include <xen/interface/io/xenbus.h>
#include <xen/interface/io/xs_wire.h>
@@ -58,6 +59,8 @@ struct xenbus_watch

/* Path being watched. */
const char *node;
+ /* xenhost the watch is registered on. */
+ xenhost_t *xh;

/* Callback (executed in a process context with no locks held). */
void (*callback)(struct xenbus_watch *,
@@ -70,6 +73,7 @@ struct xenbus_device {
const char *devicetype;
const char *nodename;
const char *otherend;
+ xenhost_t *xh;
int otherend_id;
struct xenbus_watch otherend_watch;
struct device dev;
@@ -78,6 +82,13 @@ struct xenbus_device {
struct work_struct work;
};

+enum xenstore_init {
+ XS_UNKNOWN,
+ XS_PV,
+ XS_HVM,
+ XS_LOCAL,
+};
+
static inline struct xenbus_device *to_xenbus_device(struct device *dev)
{
return container_of(dev, struct xenbus_device, dev);
@@ -133,52 +144,51 @@ struct xenbus_transaction
/* Nil transaction ID. */
#define XBT_NIL ((struct xenbus_transaction) { 0 })

-char **xenbus_directory(struct xenbus_transaction t,
+char **xenbus_directory(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, unsigned int *num);
-void *xenbus_read(struct xenbus_transaction t,
+void *xenbus_read(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, unsigned int *len);
-int xenbus_write(struct xenbus_transaction t,
+int xenbus_write(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, const char *string);
-int xenbus_mkdir(struct xenbus_transaction t,
+int xenbus_mkdir(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node);
-int xenbus_exists(struct xenbus_transaction t,
+int xenbus_exists(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node);
-int xenbus_rm(struct xenbus_transaction t, const char *dir, const char *node);
-int xenbus_transaction_start(struct xenbus_transaction *t);
-int xenbus_transaction_end(struct xenbus_transaction t, int abort);
+int xenbus_rm(xenhost_t *xh, struct xenbus_transaction t, const char *dir, const char *node);
+int xenbus_transaction_start(xenhost_t *xh, struct xenbus_transaction *t);
+int xenbus_transaction_end(xenhost_t *xh, struct xenbus_transaction t, int abort);

/* Single read and scanf: returns -errno or num scanned if > 0. */
-__scanf(4, 5)
-int xenbus_scanf(struct xenbus_transaction t,
+__scanf(5, 6)
+int xenbus_scanf(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, const char *fmt, ...);

/* Read an (optional) unsigned value. */
-unsigned int xenbus_read_unsigned(const char *dir, const char *node,
+unsigned int xenbus_read_unsigned(xenhost_t *xh, const char *dir, const char *node,
unsigned int default_val);

/* Single printf and write: returns -errno or 0. */
-__printf(4, 5)
-int xenbus_printf(struct xenbus_transaction t,
+__printf(5, 6)
+int xenbus_printf(xenhost_t *xh, struct xenbus_transaction t,
const char *dir, const char *node, const char *fmt, ...);

/* Generic read function: NULL-terminated triples of name,
* sprintf-style type string, and pointer. Returns 0 or errno.*/
-int xenbus_gather(struct xenbus_transaction t, const char *dir, ...);
+int xenbus_gather(xenhost_t *xh, struct xenbus_transaction t, const char *dir, ...);

/* notifer routines for when the xenstore comes up */
-extern int xenstored_ready;
-int register_xenstore_notifier(struct notifier_block *nb);
-void unregister_xenstore_notifier(struct notifier_block *nb);
+int register_xenstore_notifier(xenhost_t *xh, struct notifier_block *nb);
+void unregister_xenstore_notifier(xenhost_t *xh, struct notifier_block *nb);

-int register_xenbus_watch(struct xenbus_watch *watch);
-void unregister_xenbus_watch(struct xenbus_watch *watch);
+int register_xenbus_watch(xenhost_t *xh, struct xenbus_watch *watch);
+void unregister_xenbus_watch(xenhost_t *xh, struct xenbus_watch *watch);
void xs_suspend(void);
void xs_resume(void);
void xs_suspend_cancel(void);

struct work_struct;

-void xenbus_probe(struct work_struct *);
+void __xenbus_probe(void *xs);

#define XENBUS_IS_ERR_READ(str) ({ \
if (!IS_ERR(str) && strlen(str) == 0) { \
@@ -218,7 +228,7 @@ int xenbus_unmap_ring(struct xenbus_device *dev,
int xenbus_alloc_evtchn(struct xenbus_device *dev, int *port);
int xenbus_free_evtchn(struct xenbus_device *dev, int port);

-enum xenbus_state xenbus_read_driver_state(const char *path);
+enum xenbus_state xenbus_read_driver_state(struct xenbus_device *dev, const char *path);

__printf(3, 4)
void xenbus_dev_error(struct xenbus_device *dev, int err, const char *fmt, ...);
@@ -230,7 +240,5 @@ int xenbus_dev_is_online(struct xenbus_device *dev);
int xenbus_frontend_closed(struct xenbus_device *dev);

extern const struct file_operations xen_xenbus_fops;
-extern struct xenstore_domain_interface *xen_store_interface;
-extern int xen_store_evtchn;

#endif /* _XEN_XENBUS_H */
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
index acee0c7872b6..91574ecaad6c 100644
--- a/include/xen/xenhost.h
+++ b/include/xen/xenhost.h
@@ -140,6 +140,9 @@ typedef struct {
void *gnttab_status_vm_area;
void *auto_xlat_grant_frames;
};
+
+ /* xenstore private state */
+ void *xenstore_private;
} xenhost_t;

typedef struct xenhost_ops {
@@ -228,6 +231,17 @@ typedef struct xenhost_ops {
int (*alloc_ballooned_pages)(xenhost_t *xh, int nr_pages, struct page **pages);
void (*free_ballooned_pages)(xenhost_t *xh, int nr_pages, struct page **pages);

+ /*
+ * xenbus: as part of xenbus-init, frontend/backend need to talk to the
+ * correct xenbus. This might be a local xenstore (backend) or might
+ * be a XS_PV/XS_HVM interface (frontend). We bootstrap these with
+ * evtchn/gfn parameters from (*setup_xs)().
+ *
+ * Once done, stash the xenhost_t * in xen_bus_type, xenbus_device or
+ * xenbus_watch and then the frontend and backend devices implicitly
+ * use the correct interface.
+ */
+ int (*setup_xs)(xenhost_t *xh);
} xenhost_ops_t;

extern xenhost_t *xh_default, *xh_remote;
@@ -279,4 +293,10 @@ static inline void xenhost_probe_vcpu_id(xenhost_t *xh, int cpu)
(xh->ops->probe_vcpu_id)(xh, cpu);
}

+static inline void xenhost_setup_xs(xenhost_t *xh)
+{
+ if (xh)
+ (xh->ops->setup_xs)(xh);
+}
+
#endif /* __XENHOST_H */
--
2.20.1

2019-05-09 17:28:19

by Ankur Arora

Subject: [RFC PATCH 03/16] x86/xen: make hypercall_page generic

Make hypercall_page a generic interface which can be implemented
by other hypervisors. With this change, hypercall_page becomes a
pointer which refers either to the newly introduced xen_hypercall_page,
seeded by Xen, or to a page filled in by a different hypervisor.

Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/include/asm/xen/hypercall.h | 12 +++++++-----
arch/x86/xen/enlighten.c | 1 +
arch/x86/xen/enlighten_hvm.c | 3 ++-
arch/x86/xen/enlighten_pv.c | 1 +
arch/x86/xen/enlighten_pvh.c | 3 ++-
arch/x86/xen/xen-asm_32.S | 2 +-
arch/x86/xen/xen-asm_64.S | 2 +-
arch/x86/xen/xen-head.S | 8 ++++----
8 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index ef05bea7010d..1a3cd6680e6f 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -86,11 +86,13 @@ struct xen_dm_op_buf;
* there aren't more than 5 arguments...)
*/

-extern struct { char _entry[32]; } hypercall_page[];
+struct hypercall_entry { char _entry[32]; };
+extern struct hypercall_entry xen_hypercall_page[128];
+extern struct hypercall_entry *hypercall_page;

-#define __HYPERCALL "call hypercall_page+%c[offset]"
+#define __HYPERCALL CALL_NOSPEC
#define __HYPERCALL_ENTRY(x) \
- [offset] "i" (__HYPERVISOR_##x * sizeof(hypercall_page[0]))
+ [thunk_target] "0" (hypercall_page + __HYPERVISOR_##x)

#ifdef CONFIG_X86_32
#define __HYPERCALL_RETREG "eax"
@@ -116,7 +118,7 @@ extern struct { char _entry[32]; } hypercall_page[];
register unsigned long __arg4 asm(__HYPERCALL_ARG4REG) = __arg4; \
register unsigned long __arg5 asm(__HYPERCALL_ARG5REG) = __arg5;

-#define __HYPERCALL_0PARAM "=r" (__res), ASM_CALL_CONSTRAINT
+#define __HYPERCALL_0PARAM "=&r" (__res), ASM_CALL_CONSTRAINT
#define __HYPERCALL_1PARAM __HYPERCALL_0PARAM, "+r" (__arg1)
#define __HYPERCALL_2PARAM __HYPERCALL_1PARAM, "+r" (__arg2)
#define __HYPERCALL_3PARAM __HYPERCALL_2PARAM, "+r" (__arg3)
@@ -208,7 +210,7 @@ xen_single_call(unsigned int call,

asm volatile(CALL_NOSPEC
: __HYPERCALL_5PARAM
- : [thunk_target] "a" (&hypercall_page[call])
+ : [thunk_target] "0" (hypercall_page + call)
: __HYPERCALL_CLOBBER5);

return (long)__res;
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 750f46ad018a..e9dc92e79afa 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -20,6 +20,7 @@
#include "smp.h"
#include "pmu.h"

+struct hypercall_entry *hypercall_page;
EXPORT_SYMBOL_GPL(hypercall_page);

/*
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index ffc5791675b2..4d85cd2ff261 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -115,8 +115,9 @@ static void __init init_hvm_pv_info(void)

pv_info.name = "Xen HVM";
msr = cpuid_ebx(base + 2);
- pfn = __pa(hypercall_page);
+ pfn = __pa(xen_hypercall_page);
wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+ hypercall_page = xen_hypercall_page;
}

xen_setup_features();
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index a4e04b0cc596..3239e8452ede 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1217,6 +1217,7 @@ asmlinkage __visible void __init xen_start_kernel(void)

if (!xen_start_info)
return;
+ hypercall_page = xen_hypercall_page;

xenhost_register(xenhost_r1, &xh_pv_ops);

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index c07eba169572..e47866fcb7ea 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -46,8 +46,9 @@ void __init xen_pvh_init(void)
xen_start_flags = pvh_start_info.flags;

msr = cpuid_ebx(xen_cpuid_base() + 2);
- pfn = __pa(hypercall_page);
+ pfn = __pa(xen_hypercall_page);
wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+ hypercall_page = xen_hypercall_page;
}

void __init mem_map_via_hcall(struct boot_params *boot_params_p)
diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S
index c15db060a242..ee4998055ea9 100644
--- a/arch/x86/xen/xen-asm_32.S
+++ b/arch/x86/xen/xen-asm_32.S
@@ -121,7 +121,7 @@ xen_iret_end_crit:

hyper_iret:
/* put this out of line since its very rarely used */
- jmp hypercall_page + __HYPERVISOR_iret * 32
+ jmp xen_hypercall_page + __HYPERVISOR_iret * 32

.globl xen_iret_start_crit, xen_iret_end_crit

diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index 1e9ef0ba30a5..2172d6aec9a3 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -70,7 +70,7 @@ ENTRY(xen_early_idt_handler_array)
END(xen_early_idt_handler_array)
__FINIT

-hypercall_iret = hypercall_page + __HYPERVISOR_iret * 32
+hypercall_iret = xen_hypercall_page + __HYPERVISOR_iret * 32
/*
* Xen64 iret frame:
*
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 5077ead5e59c..7ff5437bd83f 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -58,18 +58,18 @@ END(startup_xen)

.pushsection .text
.balign PAGE_SIZE
-ENTRY(hypercall_page)
+ENTRY(xen_hypercall_page)
.rept (PAGE_SIZE / 32)
UNWIND_HINT_EMPTY
.skip 32
.endr

#define HYPERCALL(n) \
- .equ xen_hypercall_##n, hypercall_page + __HYPERVISOR_##n * 32; \
+ .equ xen_hypercall_##n, xen_hypercall_page + __HYPERVISOR_##n * 32; \
.type xen_hypercall_##n, @function; .size xen_hypercall_##n, 32
#include <asm/xen-hypercalls.h>
#undef HYPERCALL
-END(hypercall_page)
+END(xen_hypercall_page)
.popsection

ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz "linux")
@@ -85,7 +85,7 @@ END(hypercall_page)
#ifdef CONFIG_XEN_PV
ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, _ASM_PTR startup_xen)
#endif
- ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
+ ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR xen_hypercall_page)
ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,
.ascii "!writable_page_tables|pae_pgdir_above_4gb")
ELFNOTE(Xen, XEN_ELFNOTE_SUPPORTED_FEATURES,
--
2.20.1

2019-05-09 17:28:21

by Ankur Arora

Subject: [RFC PATCH 02/16] x86/xen: cpuid support in xenhost_t

xen_cpuid_base() is used to probe and set up features early in a
guest's lifetime.

We want this to behave differently depending on xenhost->type: for
instance, local xenhosts cannot intercept the cpuid instruction at all.

Add op (*cpuid_base)() in xenhost_ops_t.

Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/include/asm/xen/hypervisor.h | 2 +-
arch/x86/pci/xen.c | 2 +-
arch/x86/xen/enlighten_hvm.c | 7 +++++--
arch/x86/xen/enlighten_pv.c | 16 +++++++++++++++-
arch/x86/xen/enlighten_pvh.c | 4 ++++
drivers/tty/hvc/hvc_xen.c | 2 +-
drivers/xen/grant-table.c | 3 ++-
drivers/xen/xenbus/xenbus_xs.c | 3 ++-
include/xen/xenhost.h | 21 +++++++++++++++++++++
9 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h
index 39171b3646bb..6c4cdcdf997d 100644
--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -53,7 +53,7 @@ static inline bool xen_x2apic_para_available(void)
#else
static inline bool xen_x2apic_para_available(void)
{
- return (xen_cpuid_base() != 0);
+ return (xen_cpuid_base(NULL) != 0);
}
#endif

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 9112d1cb397b..d1a3b9f08289 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -431,7 +431,7 @@ void __init xen_msi_init(void)
* event channels for MSI handling and instead use regular
* APIC processing
*/
- uint32_t eax = cpuid_eax(xen_cpuid_base() + 4);
+ uint32_t eax = cpuid_eax(xenhost_cpuid_base(xh_default) + 4);

if (((eax & XEN_HVM_CPUID_X2APIC_VIRT) && x2apic_mode) ||
((eax & XEN_HVM_CPUID_APIC_ACCESS_VIRT) && boot_cpu_has(X86_FEATURE_APIC)))
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 100452f4f44c..ffc5791675b2 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -83,7 +83,10 @@ static void __init xen_hvm_init_mem_mapping(void)
xen_vcpu_info_reset(0);
}

+extern uint32_t xen_pv_cpuid_base(xenhost_t *xh);
+
xenhost_ops_t xh_hvm_ops = {
+ .cpuid_base = xen_pv_cpuid_base,
};

xenhost_ops_t xh_hvm_nested_ops = {
@@ -94,7 +97,7 @@ static void __init init_hvm_pv_info(void)
int major, minor;
uint32_t eax, ebx, ecx, edx, base;

- base = xen_cpuid_base();
+ base = xenhost_cpuid_base(xh_default);
eax = cpuid_eax(base + 1);

major = eax >> 16;
@@ -250,7 +253,7 @@ static uint32_t __init xen_platform_hvm(void)
if (xen_pv_domain() || xen_nopv)
return 0;

- return xen_cpuid_base();
+ return xenhost_cpuid_base(xh_default);
}

static __init void xen_hvm_guest_late_init(void)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index bb6e811c1525..a4e04b0cc596 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1189,10 +1189,23 @@ static void __init xen_dom0_set_legacy_features(void)
x86_platform.legacy.rtc = 1;
}

+uint32_t xen_pv_cpuid_base(xenhost_t *xh)
+{
+ return hypervisor_cpuid_base("XenVMMXenVMM", 2);
+}
+
+uint32_t xen_pv_nested_cpuid_base(xenhost_t *xh)
+{
+ return hypervisor_cpuid_base("XenVMMXenVMM",
+ 2 /* nested specific leaf? */);
+}
+
xenhost_ops_t xh_pv_ops = {
+ .cpuid_base = xen_pv_cpuid_base,
};

xenhost_ops_t xh_pv_nested_ops = {
+ .cpuid_base = xen_pv_nested_cpuid_base,
};

/* First C function to be called on Xen boot */
@@ -1469,7 +1482,8 @@ static int xen_cpu_dead_pv(unsigned int cpu)
static uint32_t __init xen_platform_pv(void)
{
if (xen_pv_domain())
- return xen_cpuid_base();
+ /* xenhost is setup in xen_start_kernel. */
+ return xenhost_cpuid_base(xh_default);

return 0;
}
diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 826c296d27a3..c07eba169572 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -29,6 +29,10 @@ void __init xen_pvh_init(void)
u32 msr;
u64 pfn;

+ /*
+ * Note: we have already called xen_cpuid_base() in
+ * hypervisor_specific_init()
+ */
xenhost_register(xenhost_r1, &xh_hvm_ops);

/*
diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index dc43fa96c3de..5e5ca35d7187 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -595,7 +595,7 @@ console_initcall(xen_cons_init);
#ifdef CONFIG_X86
static void xen_hvm_early_write(uint32_t vtermno, const char *str, int len)
{
- if (xen_cpuid_base())
+ if (xen_cpuid_base(xh_default))
outsb(0xe9, str, len);
}
#else
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 7ea6fb6a2e5d..98af259d0d4f 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -50,6 +50,7 @@
#endif

#include <xen/xen.h>
+#include <xen/xenhost.h>
#include <xen/interface/xen.h>
#include <xen/page.h>
#include <xen/grant_table.h>
@@ -1318,7 +1319,7 @@ static bool gnttab_need_v2(void)
uint32_t base, width;

if (xen_pv_domain()) {
- base = xen_cpuid_base();
+ base = xenhost_cpuid_base(xh_default);
if (cpuid_eax(base) < 5)
return false; /* Information not available, use V1. */
width = cpuid_ebx(base + 5) &
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 49a3874ae6bb..3236d1b1fa01 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -49,6 +49,7 @@
#include <asm/xen/hypervisor.h>
#include <xen/xenbus.h>
#include <xen/xen.h>
+#include <xen/xenhost.h>
#include "xenbus.h"

/*
@@ -722,7 +723,7 @@ static bool xen_strict_xenbus_quirk(void)
#ifdef CONFIG_X86
uint32_t eax, ebx, ecx, edx, base;

- base = xen_cpuid_base();
+ base = xenhost_cpuid_base(xh_default);
cpuid(base + 1, &eax, &ebx, &ecx, &edx);

if ((eax >> 16) < 4)
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
index a58e883f144e..13a70bdadfd2 100644
--- a/include/xen/xenhost.h
+++ b/include/xen/xenhost.h
@@ -1,6 +1,9 @@
#ifndef __XENHOST_H
#define __XENHOST_H

+#include <xen/interface/features.h>
+#include <xen/interface/xen.h>
+#include <asm/xen/hypervisor.h>
/*
* Xenhost abstracts out the Xen interface. It co-exists with the PV/HVM/PVH
* abstractions (x86_init, hypervisor_x86, pv_ops etc) and is meant to
@@ -70,6 +73,16 @@ typedef struct {
} xenhost_t;

typedef struct xenhost_ops {
+ /*
+ * xen_cpuid is used to probe features early.
+ * xenhost_r0:
+ * Implementation could not use cpuid at all: it's difficult to
+ * intercept cpuid instruction locally.
+ * xenhost_r1:
+ * xenhost_r2:
+ * Separate cpuid-leafs?
+ */
+ uint32_t (*cpuid_base)(xenhost_t *xenhost);
} xenhost_ops_t;

extern xenhost_t *xh_default, *xh_remote;
@@ -92,4 +105,12 @@ void __xenhost_unregister(enum xenhost_type type);
for ((xh) = (xenhost_t **) &xenhosts[0]; \
(((xh) - (xenhost_t **)&xenhosts) < 2) && (*xh)->type != xenhost_invalid; (xh)++)

+static inline uint32_t xenhost_cpuid_base(xenhost_t *xh)
+{
+ if (xh)
+ return (xh->ops->cpuid_base)(xh);
+ else
+ return xen_cpuid_base();
+}
+
#endif /* __XENHOST_H */
--
2.20.1

2019-05-09 17:28:40

by Ankur Arora

Subject: [RFC PATCH 07/16] x86/xen: make vcpu_info part of xenhost_t

Abstract out xen_vcpu_id probing via (*probe_vcpu_id)(). Once that is
available, the vcpu_info registration happens via the VCPUOP hypercall.

Note that for the nested case, there are two vcpu_ids and two vcpu_info
areas, one each for the default xenhost and the remote xenhost.
The vcpu_info is used by pv_irq_ops and for evtchn signaling.
The vcpu_info is used via pv_irq_ops, and evtchn signaling.

The other VCPUOP hypercalls are used for management (and scheduling),
which is expected to be done purely in the default hypervisor.
However, scheduling of the L1-guest does imply L0-Xen vcpu_info
switching, which might mean that the remote hypervisor needs some
visibility into related events/hypercalls in the default hypervisor.

TODO:
- percpu data structures for xen_vcpu

Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/xen/enlighten.c | 93 +++++++++++++-------------------
arch/x86/xen/enlighten_hvm.c | 87 ++++++++++++++++++------------
arch/x86/xen/enlighten_pv.c | 60 ++++++++++++++-------
arch/x86/xen/enlighten_pvh.c | 3 +-
arch/x86/xen/irq.c | 10 ++--
arch/x86/xen/mmu_pv.c | 6 +--
arch/x86/xen/pci-swiotlb-xen.c | 1 +
arch/x86/xen/setup.c | 1 +
arch/x86/xen/smp.c | 9 +++-
arch/x86/xen/smp_hvm.c | 17 +++---
arch/x86/xen/smp_pv.c | 12 ++---
arch/x86/xen/time.c | 23 ++++----
arch/x86/xen/xen-ops.h | 5 +-
drivers/xen/events/events_base.c | 14 ++---
drivers/xen/events/events_fifo.c | 2 +-
drivers/xen/evtchn.c | 2 +-
drivers/xen/time.c | 2 +-
include/xen/xen-ops.h | 7 +--
include/xen/xenhost.h | 47 ++++++++++++++++
19 files changed, 240 insertions(+), 161 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 20e0de844442..0dafbbc838ef 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -20,35 +20,6 @@
#include "smp.h"
#include "pmu.h"

-/*
- * Pointer to the xen_vcpu_info structure or
- * &HYPERVISOR_shared_info->vcpu_info[cpu]. See xen_hvm_init_shared_info
- * and xen_vcpu_setup for details. By default it points to share_info->vcpu_info
- * but if the hypervisor supports VCPUOP_register_vcpu_info then it can point
- * to xen_vcpu_info. The pointer is used in __xen_evtchn_do_upcall to
- * acknowledge pending events.
- * Also more subtly it is used by the patched version of irq enable/disable
- * e.g. xen_irq_enable_direct and xen_iret in PV mode.
- *
- * The desire to be able to do those mask/unmask operations as a single
- * instruction by using the per-cpu offset held in %gs is the real reason
- * vcpu info is in a per-cpu pointer and the original reason for this
- * hypercall.
- *
- */
-DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu);
-
-/*
- * Per CPU pages used if hypervisor supports VCPUOP_register_vcpu_info
- * hypercall. This can be used both in PV and PVHVM mode. The structure
- * overrides the default per_cpu(xen_vcpu, cpu) value.
- */
-DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info);
-
-/* Linux <-> Xen vCPU id mapping */
-DEFINE_PER_CPU(uint32_t, xen_vcpu_id);
-EXPORT_PER_CPU_SYMBOL(xen_vcpu_id);
-
enum xen_domain_type xen_domain_type = XEN_NATIVE;
EXPORT_SYMBOL_GPL(xen_domain_type);

@@ -112,12 +83,12 @@ int xen_cpuhp_setup(int (*cpu_up_prepare_cb)(unsigned int),
return rc >= 0 ? 0 : rc;
}

-static int xen_vcpu_setup_restore(int cpu)
+static int xen_vcpu_setup_restore(xenhost_t *xh, int cpu)
{
int rc = 0;

/* Any per_cpu(xen_vcpu) is stale, so reset it */
- xen_vcpu_info_reset(cpu);
+ xen_vcpu_info_reset(xh, cpu);

/*
* For PVH and PVHVM, setup online VCPUs only. The rest will
@@ -125,7 +96,7 @@ static int xen_vcpu_setup_restore(int cpu)
*/
if (xen_pv_domain() ||
(xen_hvm_domain() && cpu_online(cpu))) {
- rc = xen_vcpu_setup(cpu);
+ rc = xen_vcpu_setup(xh, cpu);
}

return rc;
@@ -138,30 +109,42 @@ static int xen_vcpu_setup_restore(int cpu)
*/
void xen_vcpu_restore(void)
{
- int cpu, rc;
+ int cpu, rc = 0;

+ /*
+ * VCPU management is primarily the responsibility of xh_default and
+ * xh_remote only needs VCPUOP_register_vcpu_info.
+ * So, we do VPUOP_down and VCPUOP_up only on xh_default.
+ *
+ * (Currently, however, VCPUOP_register_vcpu_info is allowed only
+ * on VCPUs that are self or down, so we might need a new model
+ * there.)
+ */
for_each_possible_cpu(cpu) {
bool other_cpu = (cpu != smp_processor_id());
bool is_up;
+ xenhost_t **xh;

- if (xen_vcpu_nr(cpu) == XEN_VCPU_ID_INVALID)
+ if (xen_vcpu_nr(xh_default, cpu) == XEN_VCPU_ID_INVALID)
continue;

/* Only Xen 4.5 and higher support this. */
is_up = HYPERVISOR_vcpu_op(VCPUOP_is_up,
- xen_vcpu_nr(cpu), NULL) > 0;
+ xen_vcpu_nr(xh_default, cpu), NULL) > 0;

if (other_cpu && is_up &&
- HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(cpu), NULL))
+ HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(xh_default, cpu), NULL))
BUG();

if (xen_pv_domain() || xen_feature(XENFEAT_hvm_safe_pvclock))
xen_setup_runstate_info(cpu);

- rc = xen_vcpu_setup_restore(cpu);
- if (rc)
- pr_emerg_once("vcpu restore failed for cpu=%d err=%d. "
- "System will hang.\n", cpu, rc);
+ for_each_xenhost(xh) {
+ rc = xen_vcpu_setup_restore(*xh, cpu);
+ if (rc)
+ pr_emerg_once("vcpu restore failed for cpu=%d err=%d. "
+ "System will hang.\n", cpu, rc);
+ }
/*
* In case xen_vcpu_setup_restore() fails, do not bring up the
* VCPU. This helps us avoid the resulting OOPS when the VCPU
@@ -172,29 +155,29 @@ void xen_vcpu_restore(void)
* VCPUs to come up.
*/
if (other_cpu && is_up && (rc == 0) &&
- HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL))
+ HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(xh_default, cpu), NULL))
BUG();
}
}

-void xen_vcpu_info_reset(int cpu)
+void xen_vcpu_info_reset(xenhost_t *xh, int cpu)
{
- if (xen_vcpu_nr(cpu) < MAX_VIRT_CPUS) {
- per_cpu(xen_vcpu, cpu) =
- &xh_default->HYPERVISOR_shared_info->vcpu_info[xen_vcpu_nr(cpu)];
+ if (xen_vcpu_nr(xh, cpu) < MAX_VIRT_CPUS) {
+ xh->xen_vcpu[cpu] =
+ &xh->HYPERVISOR_shared_info->vcpu_info[xen_vcpu_nr(xh, cpu)];
} else {
/* Set to NULL so that if somebody accesses it we get an OOPS */
- per_cpu(xen_vcpu, cpu) = NULL;
+ xh->xen_vcpu[cpu] = NULL;
}
}

-int xen_vcpu_setup(int cpu)
+int xen_vcpu_setup(xenhost_t *xh, int cpu)
{
struct vcpu_register_vcpu_info info;
int err;
struct vcpu_info *vcpup;

- BUG_ON(xh_default->HYPERVISOR_shared_info == &xen_dummy_shared_info);
+ BUG_ON(xh->HYPERVISOR_shared_info == &xen_dummy_shared_info);

/*
* This path is called on PVHVM at bootup (xen_hvm_smp_prepare_boot_cpu)
@@ -208,12 +191,12 @@ int xen_vcpu_setup(int cpu)
* use this function.
*/
if (xen_hvm_domain()) {
- if (per_cpu(xen_vcpu, cpu) == &per_cpu(xen_vcpu_info, cpu))
+ if (xh->xen_vcpu[cpu] == &xh->xen_vcpu_info[cpu])
return 0;
}

if (xen_have_vcpu_info_placement) {
- vcpup = &per_cpu(xen_vcpu_info, cpu);
+ vcpup = &xh->xen_vcpu_info[cpu];
info.mfn = arbitrary_virt_to_mfn(vcpup);
info.offset = offset_in_page(vcpup);

@@ -227,8 +210,8 @@ int xen_vcpu_setup(int cpu)
* hypercall does not allow to over-write info.mfn and
* info.offset.
*/
- err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info,
- xen_vcpu_nr(cpu), &info);
+ err = hypervisor_vcpu_op(xh, VCPUOP_register_vcpu_info,
+ xen_vcpu_nr(xh, cpu), &info);

if (err) {
pr_warn_once("register_vcpu_info failed: cpu=%d err=%d\n",
@@ -239,14 +222,14 @@ int xen_vcpu_setup(int cpu)
* This cpu is using the registered vcpu info, even if
* later ones fail to.
*/
- per_cpu(xen_vcpu, cpu) = vcpup;
+ xh->xen_vcpu[cpu] = vcpup;
}
}

if (!xen_have_vcpu_info_placement)
- xen_vcpu_info_reset(cpu);
+ xen_vcpu_info_reset(xh, cpu);

- return ((per_cpu(xen_vcpu, cpu) == NULL) ? -ENODEV : 0);
+ return ((xh->xen_vcpu[cpu] == NULL) ? -ENODEV : 0);
}

void xen_reboot(int reason)
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 0e53363f9d1f..c1981a3e4989 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -5,6 +5,7 @@
#include <linux/kexec.h>
#include <linux/memblock.h>

+#include <xen/interface/xen.h>
#include <xen/xenhost.h>
#include <xen/features.h>
#include <xen/events.h>
@@ -72,22 +73,22 @@ static void __init xen_hvm_init_mem_mapping(void)
{
xenhost_t **xh;

- for_each_xenhost(xh)
+ for_each_xenhost(xh) {
xenhost_reset_shared_info(*xh);

- /*
- * The virtual address of the shared_info page has changed, so
- * the vcpu_info pointer for VCPU 0 is now stale.
- *
- * The prepare_boot_cpu callback will re-initialize it via
- * xen_vcpu_setup, but we can't rely on that to be called for
- * old Xen versions (xen_have_vector_callback == 0).
- *
- * It is, in any case, bad to have a stale vcpu_info pointer
- * so reset it now.
- * For now, this uses xh_default implictly.
- */
- xen_vcpu_info_reset(0);
+ /*
+ * The virtual address of the shared_info page has changed, so
+ * the vcpu_info pointer for VCPU 0 is now stale.
+ *
+ * The prepare_boot_cpu callback will re-initialize it via
+ * xen_vcpu_setup, but we can't rely on that to be called for
+ * old Xen versions (xen_have_vector_callback == 0).
+ *
+ * It is, in any case, bad to have a stale vcpu_info pointer
+ * so reset it now.
+ */
+ xen_vcpu_info_reset(*xh, 0);
+ }
}

extern uint32_t xen_pv_cpuid_base(xenhost_t *xh);
@@ -103,11 +104,32 @@ void xen_hvm_setup_hypercall_page(xenhost_t *xh)
xh->hypercall_page = xen_hypercall_page;
}

+static void xen_hvm_probe_vcpu_id(xenhost_t *xh, int cpu)
+{
+ uint32_t eax, ebx, ecx, edx, base;
+
+ base = xenhost_cpuid_base(xh);
+
+ if (cpu == 0) {
+ cpuid(base + 4, &eax, &ebx, &ecx, &edx);
+ if (eax & XEN_HVM_CPUID_VCPU_ID_PRESENT)
+ xh->xen_vcpu_id[cpu] = ebx;
+ else
+ xh->xen_vcpu_id[cpu] = smp_processor_id();
+ } else {
+ if (cpu_acpi_id(cpu) != U32_MAX)
+ xh->xen_vcpu_id[cpu] = cpu_acpi_id(cpu);
+ else
+ xh->xen_vcpu_id[cpu] = cpu;
+ }
+}
+
xenhost_ops_t xh_hvm_ops = {
.cpuid_base = xen_pv_cpuid_base,
.setup_hypercall_page = xen_hvm_setup_hypercall_page,
.setup_shared_info = xen_hvm_init_shared_info,
.reset_shared_info = xen_hvm_reset_shared_info,
+ .probe_vcpu_id = xen_hvm_probe_vcpu_id,
};

xenhost_ops_t xh_hvm_nested_ops = {
@@ -116,7 +138,7 @@ xenhost_ops_t xh_hvm_nested_ops = {
static void __init init_hvm_pv_info(void)
{
int major, minor;
- uint32_t eax, ebx, ecx, edx, base;
+ uint32_t eax, base;
xenhost_t **xh;

base = xenhost_cpuid_base(xh_default);
@@ -147,11 +169,8 @@ static void __init init_hvm_pv_info(void)
if (xen_validate_features() == false)
__xenhost_unregister(xenhost_r2);

- cpuid(base + 4, &eax, &ebx, &ecx, &edx);
- if (eax & XEN_HVM_CPUID_VCPU_ID_PRESENT)
- this_cpu_write(xen_vcpu_id, ebx);
- else
- this_cpu_write(xen_vcpu_id, smp_processor_id());
+ for_each_xenhost(xh)
+ xenhost_probe_vcpu_id(*xh, smp_processor_id());
}

#ifdef CONFIG_KEXEC_CORE
@@ -172,6 +191,7 @@ static void xen_hvm_crash_shutdown(struct pt_regs *regs)
static int xen_cpu_up_prepare_hvm(unsigned int cpu)
{
int rc = 0;
+ xenhost_t **xh;

/*
* This can happen if CPU was offlined earlier and
@@ -182,13 +202,12 @@ static int xen_cpu_up_prepare_hvm(unsigned int cpu)
xen_uninit_lock_cpu(cpu);
}

- if (cpu_acpi_id(cpu) != U32_MAX)
- per_cpu(xen_vcpu_id, cpu) = cpu_acpi_id(cpu);
- else
- per_cpu(xen_vcpu_id, cpu) = cpu;
- rc = xen_vcpu_setup(cpu);
- if (rc)
- return rc;
+ for_each_xenhost(xh) {
+ xenhost_probe_vcpu_id(*xh, cpu);
+ rc = xen_vcpu_setup(*xh, cpu);
+ if (rc)
+ return rc;
+ }

if (xen_have_vector_callback && xen_feature(XENFEAT_hvm_safe_pvclock))
xen_setup_timer(cpu);
@@ -229,15 +248,15 @@ static void __init xen_hvm_guest_init(void)
for_each_xenhost(xh) {
reserve_shared_info(*xh);
xenhost_setup_shared_info(*xh);
+
+ /*
+ * xen_vcpu is a pointer to the vcpu_info struct in the
+ * shared_info page, we use it in the event channel upcall
+ * and in some pvclock related functions.
+ */
+ xen_vcpu_info_reset(*xh, 0);
}

- /*
- * xen_vcpu is a pointer to the vcpu_info struct in the shared_info
- * page, we use it in the event channel upcall and in some pvclock
- * related functions.
- * For now, this uses xh_default implictly.
- */
- xen_vcpu_info_reset(0);

xen_panic_handler_init();

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 1a9eded4b76b..5f6a1475ec0c 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -36,8 +36,8 @@

#include <xen/xen.h>
#include <xen/events.h>
-#include <xen/xenhost.h>
#include <xen/interface/xen.h>
+#include <xen/xenhost.h>
#include <xen/interface/version.h>
#include <xen/interface/physdev.h>
#include <xen/interface/vcpu.h>
@@ -126,12 +126,12 @@ static void __init xen_pv_init_platform(void)

populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));

- for_each_xenhost(xh)
+ for_each_xenhost(xh) {
xenhost_setup_shared_info(*xh);

- /* xen clock uses per-cpu vcpu_info, need to init it for boot cpu */
- /* For now this uses xh_default implicitly. */
- xen_vcpu_info_reset(0);
+ /* xen clock uses per-cpu vcpu_info, need to init it for boot cpu */
+ xen_vcpu_info_reset(*xh, 0);
+ }

/* pvclock is in shared info area */
xen_init_time_ops();
@@ -973,28 +973,31 @@ static void xen_write_msr(unsigned int msr, unsigned low, unsigned high)
/* This is called once we have the cpu_possible_mask */
void __init xen_setup_vcpu_info_placement(void)
{
+ xenhost_t **xh;
int cpu;

for_each_possible_cpu(cpu) {
- /* Set up direct vCPU id mapping for PV guests. */
- per_cpu(xen_vcpu_id, cpu) = cpu;
+ for_each_xenhost(xh) {
+ xenhost_probe_vcpu_id(*xh, cpu);

- /*
- * xen_vcpu_setup(cpu) can fail -- in which case it
- * falls back to the shared_info version for cpus
- * where xen_vcpu_nr(cpu) < MAX_VIRT_CPUS.
- *
- * xen_cpu_up_prepare_pv() handles the rest by failing
- * them in hotplug.
- */
- (void) xen_vcpu_setup(cpu);
+ /*
+ * xen_vcpu_setup(cpu) can fail -- in which case it
+ * falls back to the shared_info version for cpus
+ * where xen_vcpu_nr(cpu) < MAX_VIRT_CPUS.
+ *
+ * xen_cpu_up_prepare_pv() handles the rest by failing
+ * them in hotplug.
+ */
+ (void) xen_vcpu_setup(*xh, cpu);
+ }
}

/*
* xen_vcpu_setup managed to place the vcpu_info within the
* percpu area for all cpus, so make use of it.
*/
- if (xen_have_vcpu_info_placement) {
+ if (xen_have_vcpu_info_placement && false) {
+ /* Disable direct access until we have proper pcpu data structures. */
pv_ops.irq.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
pv_ops.irq.restore_fl =
__PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
@@ -1110,6 +1113,11 @@ static unsigned char xen_get_nmi_reason(void)
{
unsigned char reason = 0;

+ /*
+ * We could get this information from all the xenhosts and OR it.
+ * But, the remote xenhost isn't really expected to send us NMIs.
+ */
+
/* Construct a value which looks like it came from port 0x61. */
if (test_bit(_XEN_NMIREASON_io_error,
&xh_default->HYPERVISOR_shared_info->arch.nmi_reason))
@@ -1222,6 +1230,12 @@ static void xen_pv_reset_shared_info(xenhost_t *xh)
BUG();
}

+void xen_pv_probe_vcpu_id(xenhost_t *xh, int cpu)
+{
+ /* Set up direct vCPU id mapping for PV guests. */
+ xh->xen_vcpu_id[cpu] = cpu;
+}
+
xenhost_ops_t xh_pv_ops = {
.cpuid_base = xen_pv_cpuid_base,

@@ -1229,6 +1243,8 @@ xenhost_ops_t xh_pv_ops = {

.setup_shared_info = xen_pv_setup_shared_info,
.reset_shared_info = xen_pv_reset_shared_info,
+
+ .probe_vcpu_id = xen_pv_probe_vcpu_id,
};

xenhost_ops_t xh_pv_nested_ops = {
@@ -1283,7 +1299,9 @@ asmlinkage __visible void __init xen_start_kernel(void)
* Don't do the full vcpu_info placement stuff until we have
* the cpu_possible_mask and a non-dummy shared_info.
*/
- xen_vcpu_info_reset(0);
+ for_each_xenhost(xh) {
+ xen_vcpu_info_reset(*xh, 0);
+ }

x86_platform.get_nmi_reason = xen_get_nmi_reason;

@@ -1328,7 +1346,9 @@ asmlinkage __visible void __init xen_start_kernel(void)
get_cpu_address_sizes(&boot_cpu_data);

/* Let's presume PV guests always boot on vCPU with id 0. */
- per_cpu(xen_vcpu_id, 0) = 0;
+ /* Note: we should be doing this before xen_vcpu_info_reset above. */
+ for_each_xenhost(xh)
+ xenhost_probe_vcpu_id(*xh, 0);

idt_setup_early_handler();

@@ -1485,7 +1505,7 @@ static int xen_cpu_up_prepare_pv(unsigned int cpu)
{
int rc;

- if (per_cpu(xen_vcpu, cpu) == NULL)
+ if (xh_default->xen_vcpu[cpu] == NULL)
return -ENODEV;

xen_setup_timer(cpu);
diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 50277dfbdf30..3f98526dd041 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -2,13 +2,14 @@
#include <linux/acpi.h>

#include <xen/hvc-console.h>
+#include <xen/interface/xen.h>

#include <asm/io_apic.h>
#include <asm/hypervisor.h>
#include <asm/e820/api.h>

-#include <xen/xen.h>
#include <xen/xenhost.h>
+#include <xen/xen.h>
#include <asm/xen/interface.h>
#include <asm/xen/hypercall.h>

diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 850c93f346c7..38ad1a1c4763 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -29,7 +29,7 @@ asmlinkage __visible unsigned long xen_save_fl(void)
struct vcpu_info *vcpu;
unsigned long flags;

- vcpu = this_cpu_read(xen_vcpu);
+ vcpu = xh_default->xen_vcpu[smp_processor_id()];

/* flag has opposite sense of mask */
flags = !vcpu->evtchn_upcall_mask;
@@ -51,7 +51,7 @@ __visible void xen_restore_fl(unsigned long flags)

/* See xen_irq_enable() for why preemption must be disabled. */
preempt_disable();
- vcpu = this_cpu_read(xen_vcpu);
+ vcpu = xh_default->xen_vcpu[smp_processor_id()];
vcpu->evtchn_upcall_mask = flags;

if (flags == 0) {
@@ -70,7 +70,7 @@ asmlinkage __visible void xen_irq_disable(void)
make sure we're don't switch CPUs between getting the vcpu
pointer and updating the mask. */
preempt_disable();
- this_cpu_read(xen_vcpu)->evtchn_upcall_mask = 1;
+ xh_default->xen_vcpu[smp_processor_id()]->evtchn_upcall_mask = 1;
preempt_enable_no_resched();
}
PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable);
@@ -86,7 +86,7 @@ asmlinkage __visible void xen_irq_enable(void)
*/
preempt_disable();

- vcpu = this_cpu_read(xen_vcpu);
+ vcpu = xh_default->xen_vcpu[smp_processor_id()];
vcpu->evtchn_upcall_mask = 0;

/* Doesn't matter if we get preempted here, because any
@@ -111,7 +111,7 @@ static void xen_halt(void)
{
if (irqs_disabled())
HYPERVISOR_vcpu_op(VCPUOP_down,
- xen_vcpu_nr(smp_processor_id()), NULL);
+ xen_vcpu_nr(xh_default, smp_processor_id()), NULL);
else
xen_safe_halt();
}
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 0f4fe206dcc2..e99af51ab481 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1304,17 +1304,17 @@ static void __init xen_pagetable_init(void)
}
static void xen_write_cr2(unsigned long cr2)
{
- this_cpu_read(xen_vcpu)->arch.cr2 = cr2;
+ xh_default->xen_vcpu[smp_processor_id()]->arch.cr2 = cr2;
}

static unsigned long xen_read_cr2(void)
{
- return this_cpu_read(xen_vcpu)->arch.cr2;
+ return xh_default->xen_vcpu[smp_processor_id()]->arch.cr2;
}

unsigned long xen_read_cr2_direct(void)
{
- return this_cpu_read(xen_vcpu_info.arch.cr2);
+ return xh_default->xen_vcpu_info[smp_processor_id()].arch.cr2;
}

static noinline void xen_flush_tlb(void)
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 33293ce01d8d..04f9b2e92f06 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@

#include <linux/dma-mapping.h>
#include <linux/pci.h>
+#include <xen/interface/xen.h>
#include <xen/swiotlb-xen.h>

#include <asm/xen/hypervisor.h>
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index d5f303c0e656..ec8f22a54f6e 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -19,6 +19,7 @@
#include <asm/setup.h>
#include <asm/acpi.h>
#include <asm/numa.h>
+#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7a43b2ae19f1..867524be0065 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -6,6 +6,7 @@
#include <linux/percpu.h>

#include <xen/events.h>
+#include <xen/xenhost.h>

#include <xen/hvc-console.h>
#include "xen-ops.h"
@@ -129,7 +130,10 @@ void __init xen_smp_cpus_done(unsigned int max_cpus)
return;

for_each_online_cpu(cpu) {
- if (xen_vcpu_nr(cpu) < MAX_VIRT_CPUS)
+ xenhost_t **xh;
+
+ if ((xen_vcpu_nr(xh_default, cpu) < MAX_VIRT_CPUS) &&
+ (!xh_remote || (xen_vcpu_nr(xh_remote, cpu) < MAX_VIRT_CPUS)))
continue;

rc = cpu_down(cpu);
@@ -138,7 +142,8 @@ void __init xen_smp_cpus_done(unsigned int max_cpus)
/*
* Reset vcpu_info so this cpu cannot be onlined again.
*/
- xen_vcpu_info_reset(cpu);
+ for_each_xenhost(xh)
+ xen_vcpu_info_reset(*xh, cpu);
count++;
} else {
pr_warn("%s: failed to bring CPU %d down, error %d\n",
diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
index f8d39440b292..5e7f591bfdd9 100644
--- a/arch/x86/xen/smp_hvm.c
+++ b/arch/x86/xen/smp_hvm.c
@@ -9,6 +9,7 @@

static void __init xen_hvm_smp_prepare_boot_cpu(void)
{
+ xenhost_t **xh;
BUG_ON(smp_processor_id() != 0);
native_smp_prepare_boot_cpu();

@@ -16,7 +17,8 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)
* Setup vcpu_info for boot CPU. Secondary CPUs get their vcpu_info
* in xen_cpu_up_prepare_hvm().
*/
- xen_vcpu_setup(0);
+ for_each_xenhost(xh)
+ xen_vcpu_setup(*xh, 0);

/*
* The alternative logic (which patches the unlock/lock) runs before
@@ -29,6 +31,7 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)

static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
{
+ xenhost_t **xh;
int cpu;

native_smp_prepare_cpus(max_cpus);
@@ -36,12 +39,14 @@ static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)

xen_init_lock_cpu(0);

- for_each_possible_cpu(cpu) {
- if (cpu == 0)
- continue;
+ for_each_xenhost(xh) {
+ for_each_possible_cpu(cpu) {
+ if (cpu == 0)
+ continue;

- /* Set default vcpu_id to make sure that we don't use cpu-0's */
- per_cpu(xen_vcpu_id, cpu) = XEN_VCPU_ID_INVALID;
+ /* Set default vcpu_id to make sure that we don't use cpu-0's */
+ (*xh)->xen_vcpu_id[cpu] = XEN_VCPU_ID_INVALID;
+ }
}
}

diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 145506f9fdbe..6d9c3e6611ef 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -350,7 +350,7 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);

ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir));
- if (HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(cpu), ctxt))
+ if (HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(xh_default, cpu), ctxt))
BUG();

kfree(ctxt);
@@ -374,7 +374,7 @@ static int xen_pv_cpu_up(unsigned int cpu, struct task_struct *idle)
return rc;

/* make sure interrupts start blocked */
- per_cpu(xen_vcpu, cpu)->evtchn_upcall_mask = 1;
+ xh_default->xen_vcpu[cpu]->evtchn_upcall_mask = 1;

rc = cpu_initialize_context(cpu, idle);
if (rc)
@@ -382,7 +382,7 @@ static int xen_pv_cpu_up(unsigned int cpu, struct task_struct *idle)

xen_pmu_init(cpu);

- rc = HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(cpu), NULL);
+ rc = HYPERVISOR_vcpu_op(VCPUOP_up, xen_vcpu_nr(xh_default, cpu), NULL);
BUG_ON(rc);

while (cpu_report_state(cpu) != CPU_ONLINE)
@@ -407,7 +407,7 @@ static int xen_pv_cpu_disable(void)
static void xen_pv_cpu_die(unsigned int cpu)
{
while (HYPERVISOR_vcpu_op(VCPUOP_is_up,
- xen_vcpu_nr(cpu), NULL)) {
+ xen_vcpu_nr(xh_default, cpu), NULL)) {
__set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(HZ/10);
}
@@ -423,7 +423,7 @@ static void xen_pv_cpu_die(unsigned int cpu)
static void xen_pv_play_dead(void) /* used only with HOTPLUG_CPU */
{
play_dead_common();
- HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(smp_processor_id()), NULL);
+ HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(xh_default, smp_processor_id()), NULL);
cpu_bringup();
/*
* commit 4b0c0f294 (tick: Cleanup NOHZ per cpu data on cpu down)
@@ -464,7 +464,7 @@ static void stop_self(void *v)

set_cpu_online(cpu, false);

- HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(cpu), NULL);
+ HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(xh_default, cpu), NULL);
BUG();
}

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index d4bb1f8b4f58..217bc4de07ee 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -18,12 +18,12 @@
#include <linux/timekeeper_internal.h>

#include <asm/pvclock.h>
+#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

#include <xen/events.h>
#include <xen/features.h>
-#include <xen/interface/xen.h>
#include <xen/interface/vcpu.h>

#include "xen-ops.h"
@@ -48,7 +48,7 @@ static u64 xen_clocksource_read(void)
u64 ret;

preempt_disable_notrace();
- src = &__this_cpu_read(xen_vcpu)->time;
+ src = &xh_default->xen_vcpu[smp_processor_id()]->time;
ret = pvclock_clocksource_read(src);
preempt_enable_notrace();
return ret;
@@ -70,9 +70,10 @@ static void xen_read_wallclock(struct timespec64 *ts)
struct pvclock_wall_clock *wall_clock = &(s->wc);
struct pvclock_vcpu_time_info *vcpu_time;

- vcpu_time = &get_cpu_var(xen_vcpu)->time;
+ preempt_disable_notrace();
+ vcpu_time = &xh_default->xen_vcpu[smp_processor_id()]->time;
pvclock_read_wallclock(wall_clock, vcpu_time, ts);
- put_cpu_var(xen_vcpu);
+ preempt_enable_notrace();
}

static void xen_get_wallclock(struct timespec64 *now)
@@ -233,9 +234,9 @@ static int xen_vcpuop_shutdown(struct clock_event_device *evt)
{
int cpu = smp_processor_id();

- if (HYPERVISOR_vcpu_op(VCPUOP_stop_singleshot_timer, xen_vcpu_nr(cpu),
+ if (HYPERVISOR_vcpu_op(VCPUOP_stop_singleshot_timer, xen_vcpu_nr(xh_default, cpu),
NULL) ||
- HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, xen_vcpu_nr(cpu),
+ HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, xen_vcpu_nr(xh_default, cpu),
NULL))
BUG();

@@ -246,7 +247,7 @@ static int xen_vcpuop_set_oneshot(struct clock_event_device *evt)
{
int cpu = smp_processor_id();

- if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, xen_vcpu_nr(cpu),
+ if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, xen_vcpu_nr(xh_default, cpu),
NULL))
BUG();

@@ -266,7 +267,7 @@ static int xen_vcpuop_set_next_event(unsigned long delta,
/* Get an event anyway, even if the timeout is already expired */
single.flags = 0;

- ret = HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer, xen_vcpu_nr(cpu),
+ ret = HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer, xen_vcpu_nr(xh_default, cpu),
&single);
BUG_ON(ret != 0);

@@ -366,7 +367,7 @@ void xen_timer_resume(void)

for_each_online_cpu(cpu) {
if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer,
- xen_vcpu_nr(cpu), NULL))
+ xen_vcpu_nr(xh_default, cpu), NULL))
BUG();
}
}
@@ -482,7 +483,7 @@ static void __init xen_time_init(void)

clocksource_register_hz(&xen_clocksource, NSEC_PER_SEC);

- if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, xen_vcpu_nr(cpu),
+ if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, xen_vcpu_nr(xh_default, cpu),
NULL) == 0) {
/* Successfully turned off 100Hz tick, so we have the
vcpuop-based timer interface */
@@ -500,7 +501,7 @@ static void __init xen_time_init(void)
* We check ahead on the primary time info if this
* bit is supported hence speeding up Xen clocksource.
*/
- pvti = &__this_cpu_read(xen_vcpu)->time;
+ pvti = &xh_default->xen_vcpu[smp_processor_id()]->time;
if (pvti->flags & PVCLOCK_TSC_STABLE_BIT) {
pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
xen_setup_vsyscall_time_info();
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 5085ce88a8d7..96fd7edea7e9 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -22,7 +22,6 @@ extern void *xen_initial_gdt;
struct trap_info;
void xen_copy_trap_info(struct trap_info *traps);

-DECLARE_PER_CPU(struct vcpu_info, xen_vcpu_info);
DECLARE_PER_CPU(unsigned long, xen_cr3);
DECLARE_PER_CPU(unsigned long, xen_current_cr3);

@@ -76,8 +75,8 @@ bool xen_vcpu_stolen(int vcpu);

extern int xen_have_vcpu_info_placement;

-int xen_vcpu_setup(int cpu);
-void xen_vcpu_info_reset(int cpu);
+int xen_vcpu_setup(xenhost_t *xh, int cpu);
+void xen_vcpu_info_reset(xenhost_t *xh, int cpu);
void xen_setup_vcpu_info_placement(void);

#ifdef CONFIG_SMP
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 117e76b2f939..ae497876fe41 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -884,7 +884,7 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
irq_set_chip_and_handler_name(irq, &xen_percpu_chip,
handle_percpu_irq, "ipi");

- bind_ipi.vcpu = xen_vcpu_nr(cpu);
+ bind_ipi.vcpu = xen_vcpu_nr(xh_default, cpu);
if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_ipi,
&bind_ipi) != 0)
BUG();
@@ -937,7 +937,7 @@ static int find_virq(unsigned int virq, unsigned int cpu)
continue;
if (status.status != EVTCHNSTAT_virq)
continue;
- if (status.u.virq == virq && status.vcpu == xen_vcpu_nr(cpu)) {
+ if (status.u.virq == virq && status.vcpu == xen_vcpu_nr(xh_default, cpu)) {
rc = port;
break;
}
@@ -980,7 +980,7 @@ int bind_virq_to_irq(unsigned int virq, unsigned int cpu, bool percpu)
handle_edge_irq, "virq");

bind_virq.virq = virq;
- bind_virq.vcpu = xen_vcpu_nr(cpu);
+ bind_virq.vcpu = xen_vcpu_nr(xh_default, cpu);
ret = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
&bind_virq);
if (ret == 0)
@@ -1200,7 +1200,7 @@ void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector)

#ifdef CONFIG_X86
if (unlikely(vector == XEN_NMI_VECTOR)) {
- int rc = HYPERVISOR_vcpu_op(VCPUOP_send_nmi, xen_vcpu_nr(cpu),
+ int rc = HYPERVISOR_vcpu_op(VCPUOP_send_nmi, xen_vcpu_nr(xh_default, cpu),
NULL);
if (rc < 0)
printk(KERN_WARNING "Sending nmi to CPU%d failed (rc:%d)\n", cpu, rc);
@@ -1306,7 +1306,7 @@ int xen_rebind_evtchn_to_cpu(int evtchn, unsigned tcpu)

/* Send future instances of this interrupt to other vcpu. */
bind_vcpu.port = evtchn;
- bind_vcpu.vcpu = xen_vcpu_nr(tcpu);
+ bind_vcpu.vcpu = xen_vcpu_nr(xh_default, tcpu);

/*
* Mask the event while changing the VCPU binding to prevent
@@ -1451,7 +1451,7 @@ static void restore_cpu_virqs(unsigned int cpu)

/* Get a new binding from Xen. */
bind_virq.virq = virq;
- bind_virq.vcpu = xen_vcpu_nr(cpu);
+ bind_virq.vcpu = xen_vcpu_nr(xh_default, cpu);
if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
&bind_virq) != 0)
BUG();
@@ -1475,7 +1475,7 @@ static void restore_cpu_ipis(unsigned int cpu)
BUG_ON(ipi_from_irq(irq) != ipi);

/* Get a new binding from Xen. */
- bind_ipi.vcpu = xen_vcpu_nr(cpu);
+ bind_ipi.vcpu = xen_vcpu_nr(xh_default, cpu);
if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_ipi,
&bind_ipi) != 0)
BUG();
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 76b318e88382..eed766219dd0 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -113,7 +113,7 @@ static int init_control_block(int cpu,

init_control.control_gfn = virt_to_gfn(control_block);
init_control.offset = 0;
- init_control.vcpu = xen_vcpu_nr(cpu);
+ init_control.vcpu = xen_vcpu_nr(xh_default, cpu);

return HYPERVISOR_event_channel_op(EVTCHNOP_init_control, &init_control);
}
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 6d1a5e58968f..66622109f2be 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -475,7 +475,7 @@ static long evtchn_ioctl(struct file *file,
break;

bind_virq.virq = bind.virq;
- bind_virq.vcpu = xen_vcpu_nr(0);
+ bind_virq.vcpu = xen_vcpu_nr(xh_default, 0);
rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
&bind_virq);
if (rc != 0)
diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 0968859c29d0..feee74bbab0a 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -164,7 +164,7 @@ void xen_setup_runstate_info(int cpu)
area.addr.v = &per_cpu(xen_runstate, cpu);

if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
- xen_vcpu_nr(cpu), &area))
+ xen_vcpu_nr(xh_default, cpu), &area))
BUG();
}

diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 4969817124a8..75be9059893f 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -9,12 +9,9 @@
#include <asm/xen/interface.h>
#include <xen/interface/vcpu.h>

-DECLARE_PER_CPU(struct vcpu_info *, xen_vcpu);
-
-DECLARE_PER_CPU(uint32_t, xen_vcpu_id);
-static inline uint32_t xen_vcpu_nr(int cpu)
+static inline uint32_t xen_vcpu_nr(xenhost_t *xh, int cpu)
{
- return per_cpu(xen_vcpu_id, cpu);
+ return xh->xen_vcpu_id[cpu];
}

#define XEN_VCPU_ID_INVALID U32_MAX
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
index 7c19c361d16e..f6092a8987f1 100644
--- a/include/xen/xenhost.h
+++ b/include/xen/xenhost.h
@@ -90,6 +90,28 @@ typedef struct {
struct shared_info *HYPERVISOR_shared_info;
unsigned long shared_info_pfn;
};
+
+ struct {
+ /*
+ * Events on xen-evtchn ports show up in struct vcpu_info.
+ * With multiple xenhosts, the evtchn-port numbering space that
+ * was global so far is now attached to a xenhost.
+ *
+ * So, now we allocate vcpu_info for each processor (we had space
+ * for only MAX_VIRT_CPUS in the shared_info above.)
+ *
+ * FIXME we statically allocate for NR_CPUS because alloc_percpu()
+ * isn't available at PV boot time but this is slow.
+ */
+ struct vcpu_info xen_vcpu_info[NR_CPUS];
+ struct vcpu_info *xen_vcpu[NR_CPUS];
+
+ /*
+ * Different xenhosts might have different Linux <-> Xen vCPU-id
+ * mapping.
+ */
+ uint32_t xen_vcpu_id[NR_CPUS];
+ };
} xenhost_t;

typedef struct xenhost_ops {
@@ -139,6 +161,26 @@ typedef struct xenhost_ops {
*/
void (*setup_shared_info)(xenhost_t *xenhost);
void (*reset_shared_info)(xenhost_t *xenhost);
+
+ /*
+ * vcpu_info, vcpu_id: need to be set up early -- all IRQ code accesses
+ * relevant bits.
+ *
+ * vcpu_id is probed on PVH/PVHVM via xen_cpuid(). For PV, it's directly
+ * mapped to smp_processor_id().
+ *
+ * This is part of xenhost_t because we might be registered with two
+ * different xenhosts and both of those might have their own vcpu
+ * numbering.
+ *
+ * After the vcpu numbering is identified, we can go ahead and register
+ * vcpu_info with the xenhost; on the default xenhost this happens via
+ * the register_vcpu_info hypercall.
+ *
+ * Once vcpu_info is setup (this or the shared_info version), it would
+ * get accessed via pv_ops.irq.* and the evtchn logic.
+ */
+ void (*probe_vcpu_id)(xenhost_t *xenhost, int cpu);
} xenhost_ops_t;

extern xenhost_t *xh_default, *xh_remote;
@@ -185,4 +227,9 @@ static inline void xenhost_reset_shared_info(xenhost_t *xh)
(xh->ops->reset_shared_info)(xh);
}

+static inline void xenhost_probe_vcpu_id(xenhost_t *xh, int cpu)
+{
+ (xh->ops->probe_vcpu_id)(xh, cpu);
+}
+
#endif /* __XENHOST_H */
--
2.20.1


2019-05-09 17:28:44

by Ankur Arora

Subject: [RFC PATCH 06/16] x86/xen: add shared_info support to xenhost_t

HYPERVISOR_shared_info is used for irq/evtchn communication between the
guest and the host. Abstract out the setup/reset in xenhost_t such that
nested configurations can use both xenhosts simultaneously.

In addition to irq/evtchn communication, shared_info is also used for
pvclock and p2m-related state. For both of those, the remote xenhost is
not of interest, so we use only the default xenhost.

Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/include/asm/xen/hypervisor.h | 1 -
arch/x86/xen/enlighten.c | 10 ++-----
arch/x86/xen/enlighten_hvm.c | 38 +++++++++++++++++---------
arch/x86/xen/enlighten_pv.c | 28 ++++++++++++++++---
arch/x86/xen/p2m.c | 24 ++++++++---------
arch/x86/xen/suspend_hvm.c | 6 ++++-
arch/x86/xen/suspend_pv.c | 14 +++++-----
arch/x86/xen/time.c | 4 +--
arch/x86/xen/xen-ops.h | 2 --
arch/x86/xen/xenhost.c | 13 ++++++++-
drivers/xen/events/events_2l.c | 16 +++++------
include/xen/xenhost.h | 39 +++++++++++++++++++++++++++
12 files changed, 138 insertions(+), 57 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h
index 6c4cdcdf997d..3e6bd455fbd0 100644
--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -33,7 +33,6 @@
#ifndef _ASM_X86_XEN_HYPERVISOR_H
#define _ASM_X86_XEN_HYPERVISOR_H

-extern struct shared_info *HYPERVISOR_shared_info;
extern struct start_info *xen_start_info;

#include <asm/processor.h>
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index f88bb14da3f2..20e0de844442 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -72,12 +72,6 @@ EXPORT_SYMBOL_GPL(xen_have_vector_callback);
uint32_t xen_start_flags __attribute__((section(".data"))) = 0;
EXPORT_SYMBOL(xen_start_flags);

-/*
- * Point at some empty memory to start with. We map the real shared_info
- * page as soon as fixmap is up and running.
- */
-struct shared_info *HYPERVISOR_shared_info = &xen_dummy_shared_info;
-
/*
* Flag to determine whether vcpu info placement is available on all
* VCPUs. We assume it is to start with, and then set it to zero on
@@ -187,7 +181,7 @@ void xen_vcpu_info_reset(int cpu)
{
if (xen_vcpu_nr(cpu) < MAX_VIRT_CPUS) {
per_cpu(xen_vcpu, cpu) =
- &HYPERVISOR_shared_info->vcpu_info[xen_vcpu_nr(cpu)];
+ &xh_default->HYPERVISOR_shared_info->vcpu_info[xen_vcpu_nr(cpu)];
} else {
/* Set to NULL so that if somebody accesses it we get an OOPS */
per_cpu(xen_vcpu, cpu) = NULL;
@@ -200,7 +194,7 @@ int xen_vcpu_setup(int cpu)
int err;
struct vcpu_info *vcpup;

- BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info);
+ BUG_ON(xh_default->HYPERVISOR_shared_info == &xen_dummy_shared_info);

/*
* This path is called on PVHVM at bootup (xen_hvm_smp_prepare_boot_cpu)
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index a118b61a1a8a..0e53363f9d1f 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -26,21 +26,25 @@
#include "mmu.h"
#include "smp.h"

-static unsigned long shared_info_pfn;
-
-void xen_hvm_init_shared_info(void)
+static void xen_hvm_init_shared_info(xenhost_t *xh)
{
struct xen_add_to_physmap xatp;

xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
- xatp.gpfn = shared_info_pfn;
- if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
+ xatp.gpfn = xh->shared_info_pfn;
+ if (hypervisor_memory_op(xh, XENMEM_add_to_physmap, &xatp))
BUG();
}

-static void __init reserve_shared_info(void)
+static void xen_hvm_reset_shared_info(xenhost_t *xh)
+{
+ early_memunmap(xh->HYPERVISOR_shared_info, PAGE_SIZE);
+ xh->HYPERVISOR_shared_info = __va(PFN_PHYS(xh->shared_info_pfn));
+}
+
+static void __init reserve_shared_info(xenhost_t *xh)
{
u64 pa;

@@ -58,16 +62,18 @@ static void __init reserve_shared_info(void)
pa += PAGE_SIZE)
;

- shared_info_pfn = PHYS_PFN(pa);
+ xh->shared_info_pfn = PHYS_PFN(pa);

memblock_reserve(pa, PAGE_SIZE);
- HYPERVISOR_shared_info = early_memremap(pa, PAGE_SIZE);
+ xh->HYPERVISOR_shared_info = early_memremap(pa, PAGE_SIZE);
}

static void __init xen_hvm_init_mem_mapping(void)
{
- early_memunmap(HYPERVISOR_shared_info, PAGE_SIZE);
- HYPERVISOR_shared_info = __va(PFN_PHYS(shared_info_pfn));
+ xenhost_t **xh;
+
+ for_each_xenhost(xh)
+ xenhost_reset_shared_info(*xh);

/*
* The virtual address of the shared_info page has changed, so
@@ -79,6 +85,7 @@ static void __init xen_hvm_init_mem_mapping(void)
*
* It is, in any case, bad to have a stale vcpu_info pointer
* so reset it now.
+ * For now, this uses xh_default implicitly.
*/
xen_vcpu_info_reset(0);
}
@@ -99,6 +106,8 @@ void xen_hvm_setup_hypercall_page(xenhost_t *xh)
xenhost_ops_t xh_hvm_ops = {
.cpuid_base = xen_pv_cpuid_base,
.setup_hypercall_page = xen_hvm_setup_hypercall_page,
+ .setup_shared_info = xen_hvm_init_shared_info,
+ .reset_shared_info = xen_hvm_reset_shared_info,
};

xenhost_ops_t xh_hvm_nested_ops = {
@@ -204,6 +213,8 @@ static int xen_cpu_dead_hvm(unsigned int cpu)

static void __init xen_hvm_guest_init(void)
{
+ xenhost_t **xh;
+
if (xen_pv_domain())
return;
/*
@@ -215,13 +226,16 @@ static void __init xen_hvm_guest_init(void)

init_hvm_pv_info();

- reserve_shared_info();
- xen_hvm_init_shared_info();
+ for_each_xenhost(xh) {
+ reserve_shared_info(*xh);
+ xenhost_setup_shared_info(*xh);
+ }

/*
* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
* page, we use it in the event channel upcall and in some pvclock
* related functions.
+ * For now, this uses xh_default implicitly.
*/
xen_vcpu_info_reset(0);

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 484968ff16a4..1a9eded4b76b 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -122,12 +122,15 @@ static void __init xen_banner(void)

static void __init xen_pv_init_platform(void)
{
+ xenhost_t **xh;
+
populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));

- set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
- HYPERVISOR_shared_info = (void *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
+ for_each_xenhost(xh)
+ xenhost_setup_shared_info(*xh);

/* xen clock uses per-cpu vcpu_info, need to init it for boot cpu */
+ /* For now this uses xh_default implicitly. */
xen_vcpu_info_reset(0);

/* pvclock is in shared info area */
@@ -1109,10 +1112,10 @@ static unsigned char xen_get_nmi_reason(void)

/* Construct a value which looks like it came from port 0x61. */
if (test_bit(_XEN_NMIREASON_io_error,
- &HYPERVISOR_shared_info->arch.nmi_reason))
+ &xh_default->HYPERVISOR_shared_info->arch.nmi_reason))
reason |= NMI_REASON_IOCHK;
if (test_bit(_XEN_NMIREASON_pci_serr,
- &HYPERVISOR_shared_info->arch.nmi_reason))
+ &xh_default->HYPERVISOR_shared_info->arch.nmi_reason))
reason |= NMI_REASON_SERR;

return reason;
@@ -1205,10 +1208,27 @@ static void xen_pv_setup_hypercall_page(xenhost_t *xh)
xh->hypercall_page = xen_hypercall_page;
}

+static void xen_pv_setup_shared_info(xenhost_t *xh)
+{
+ set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
+ xh->HYPERVISOR_shared_info = (void *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
+}
+
+static void xen_pv_reset_shared_info(xenhost_t *xh)
+{
+ xh->HYPERVISOR_shared_info = &xen_dummy_shared_info;
+ if (hypervisor_update_va_mapping(xh, fix_to_virt(FIX_PARAVIRT_BOOTMAP),
+ __pte_ma(0), 0))
+ BUG();
+}
+
xenhost_ops_t xh_pv_ops = {
.cpuid_base = xen_pv_cpuid_base,

.setup_hypercall_page = xen_pv_setup_hypercall_page,
+
+ .setup_shared_info = xen_pv_setup_shared_info,
+ .reset_shared_info = xen_pv_reset_shared_info,
};

xenhost_ops_t xh_pv_nested_ops = {
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 055e37e43541..8200a9582246 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -270,17 +270,17 @@ void __ref xen_build_mfn_list_list(void)

void xen_setup_mfn_list_list(void)
{
- BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info);
+ BUG_ON(xh_default->HYPERVISOR_shared_info == &xen_dummy_shared_info);

if (xen_start_info->flags & SIF_VIRT_P2M_4TOOLS)
- HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list = ~0UL;
+ xh_default->HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list = ~0UL;
else
- HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list =
+ xh_default->HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list =
virt_to_mfn(p2m_top_mfn);
- HYPERVISOR_shared_info->arch.max_pfn = xen_p2m_last_pfn;
- HYPERVISOR_shared_info->arch.p2m_generation = 0;
- HYPERVISOR_shared_info->arch.p2m_vaddr = (unsigned long)xen_p2m_addr;
- HYPERVISOR_shared_info->arch.p2m_cr3 =
+ xh_default->HYPERVISOR_shared_info->arch.max_pfn = xen_p2m_last_pfn;
+ xh_default->HYPERVISOR_shared_info->arch.p2m_generation = 0;
+ xh_default->HYPERVISOR_shared_info->arch.p2m_vaddr = (unsigned long)xen_p2m_addr;
+ xh_default->HYPERVISOR_shared_info->arch.p2m_cr3 =
xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
}

@@ -496,12 +496,12 @@ static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t *pte_pg)

ptechk = lookup_address(vaddr, &level);
if (ptechk == pte_pg) {
- HYPERVISOR_shared_info->arch.p2m_generation++;
+ xh_default->HYPERVISOR_shared_info->arch.p2m_generation++;
wmb(); /* Tools are synchronizing via p2m_generation. */
set_pmd(pmdp,
__pmd(__pa(pte_newpg[i]) | _KERNPG_TABLE));
wmb(); /* Tools are synchronizing via p2m_generation. */
- HYPERVISOR_shared_info->arch.p2m_generation++;
+ xh_default->HYPERVISOR_shared_info->arch.p2m_generation++;
pte_newpg[i] = NULL;
}

@@ -597,12 +597,12 @@ int xen_alloc_p2m_entry(unsigned long pfn)
spin_lock_irqsave(&p2m_update_lock, flags);

if (pte_pfn(*ptep) == p2m_pfn) {
- HYPERVISOR_shared_info->arch.p2m_generation++;
+ xh_default->HYPERVISOR_shared_info->arch.p2m_generation++;
wmb(); /* Tools are synchronizing via p2m_generation. */
set_pte(ptep,
pfn_pte(PFN_DOWN(__pa(p2m)), PAGE_KERNEL));
wmb(); /* Tools are synchronizing via p2m_generation. */
- HYPERVISOR_shared_info->arch.p2m_generation++;
+ xh_default->HYPERVISOR_shared_info->arch.p2m_generation++;
if (mid_mfn)
mid_mfn[p2m_mid_index(pfn)] = virt_to_mfn(p2m);
p2m = NULL;
@@ -617,7 +617,7 @@ int xen_alloc_p2m_entry(unsigned long pfn)
/* Expanded the p2m? */
if (pfn > xen_p2m_last_pfn) {
xen_p2m_last_pfn = pfn;
- HYPERVISOR_shared_info->arch.max_pfn = xen_p2m_last_pfn;
+ xh_default->HYPERVISOR_shared_info->arch.max_pfn = xen_p2m_last_pfn;
}

return 0;
diff --git a/arch/x86/xen/suspend_hvm.c b/arch/x86/xen/suspend_hvm.c
index e666b614cf6d..cc9a0163845c 100644
--- a/arch/x86/xen/suspend_hvm.c
+++ b/arch/x86/xen/suspend_hvm.c
@@ -2,6 +2,7 @@
#include <linux/types.h>

#include <xen/xen.h>
+#include <xen/xenhost.h>
#include <xen/features.h>
#include <xen/interface/features.h>

@@ -10,7 +11,10 @@
void xen_hvm_post_suspend(int suspend_cancelled)
{
if (!suspend_cancelled) {
- xen_hvm_init_shared_info();
+ xenhost_t **xh;
+
+ for_each_xenhost(xh)
+ xenhost_setup_shared_info(*xh);
xen_vcpu_restore();
}
xen_callback_vector();
diff --git a/arch/x86/xen/suspend_pv.c b/arch/x86/xen/suspend_pv.c
index 8303b58c79a9..87af0c0cc66f 100644
--- a/arch/x86/xen/suspend_pv.c
+++ b/arch/x86/xen/suspend_pv.c
@@ -10,6 +10,8 @@

void xen_pv_pre_suspend(void)
{
+ xenhost_t **xh;
+
xen_mm_pin_all();

xen_start_info->store_mfn = mfn_to_pfn(xen_start_info->store_mfn);
@@ -18,17 +20,17 @@ void xen_pv_pre_suspend(void)

BUG_ON(!irqs_disabled());

- HYPERVISOR_shared_info = &xen_dummy_shared_info;
- if (HYPERVISOR_update_va_mapping(fix_to_virt(FIX_PARAVIRT_BOOTMAP),
- __pte_ma(0), 0))
- BUG();
+ for_each_xenhost(xh)
+ xenhost_reset_shared_info(*xh);
}

void xen_pv_post_suspend(int suspend_cancelled)
{
+ xenhost_t **xh;
+
xen_build_mfn_list_list();
- set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
- HYPERVISOR_shared_info = (void *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
+ for_each_xenhost(xh)
+ xenhost_setup_shared_info(*xh);
xen_setup_mfn_list_list();

if (suspend_cancelled) {
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 6e29794573b7..d4bb1f8b4f58 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -37,7 +37,7 @@ static u64 xen_sched_clock_offset __read_mostly;
static unsigned long xen_tsc_khz(void)
{
struct pvclock_vcpu_time_info *info =
- &HYPERVISOR_shared_info->vcpu_info[0].time;
+ &xh_default->HYPERVISOR_shared_info->vcpu_info[0].time;

return pvclock_tsc_khz(info);
}
@@ -66,7 +66,7 @@ static u64 xen_sched_clock(void)

static void xen_read_wallclock(struct timespec64 *ts)
{
- struct shared_info *s = HYPERVISOR_shared_info;
+ struct shared_info *s = xh_default->HYPERVISOR_shared_info;
struct pvclock_wall_clock *wall_clock = &(s->wc);
struct pvclock_vcpu_time_info *vcpu_time;

diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 0e60bd918695..5085ce88a8d7 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -28,7 +28,6 @@ DECLARE_PER_CPU(unsigned long, xen_current_cr3);

extern struct start_info *xen_start_info;
extern struct shared_info xen_dummy_shared_info;
-extern struct shared_info *HYPERVISOR_shared_info;

void xen_setup_mfn_list_list(void);
void xen_build_mfn_list_list(void);
@@ -56,7 +55,6 @@ void xen_enable_syscall(void);
void xen_vcpu_restore(void);

void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
void xen_unplug_emulated_devices(void);

void __init xen_build_dynamic_phys_to_machine(void);
diff --git a/arch/x86/xen/xenhost.c b/arch/x86/xen/xenhost.c
index ca90acd7687e..3d8ccef89dcd 100644
--- a/arch/x86/xen/xenhost.c
+++ b/arch/x86/xen/xenhost.c
@@ -2,8 +2,19 @@
#include <linux/bug.h>
#include <xen/xen.h>
#include <xen/xenhost.h>
+#include "xen-ops.h"

-xenhost_t xenhosts[2];
+/*
+ * Point at some empty memory to start with. On PV, we map the real shared_info
+ * page as soon as fixmap is up and running; PVH* doesn't use this.
+ */
+xenhost_t xenhosts[2] = {
+ /*
+ * We should probably have two separate dummy shared_info pages.
+ */
+ [0].HYPERVISOR_shared_info = &xen_dummy_shared_info,
+ [1].HYPERVISOR_shared_info = &xen_dummy_shared_info,
+};
/*
* xh_default: interface to the regular hypervisor. xenhost_type is xenhost_r0
* or xenhost_r1.
diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c
index 8edef51c92e5..f09dbe4e9c33 100644
--- a/drivers/xen/events/events_2l.c
+++ b/drivers/xen/events/events_2l.c
@@ -55,37 +55,37 @@ static void evtchn_2l_bind_to_cpu(struct irq_info *info, unsigned cpu)

static void evtchn_2l_clear_pending(unsigned port)
{
- struct shared_info *s = HYPERVISOR_shared_info;
+ struct shared_info *s = xh_default->HYPERVISOR_shared_info;
sync_clear_bit(port, BM(&s->evtchn_pending[0]));
}

static void evtchn_2l_set_pending(unsigned port)
{
- struct shared_info *s = HYPERVISOR_shared_info;
+ struct shared_info *s = xh_default->HYPERVISOR_shared_info;
sync_set_bit(port, BM(&s->evtchn_pending[0]));
}

static bool evtchn_2l_is_pending(unsigned port)
{
- struct shared_info *s = HYPERVISOR_shared_info;
+ struct shared_info *s = xh_default->HYPERVISOR_shared_info;
return sync_test_bit(port, BM(&s->evtchn_pending[0]));
}

static bool evtchn_2l_test_and_set_mask(unsigned port)
{
- struct shared_info *s = HYPERVISOR_shared_info;
+ struct shared_info *s = xh_default->HYPERVISOR_shared_info;
return sync_test_and_set_bit(port, BM(&s->evtchn_mask[0]));
}

static void evtchn_2l_mask(unsigned port)
{
- struct shared_info *s = HYPERVISOR_shared_info;
+ struct shared_info *s = xh_default->HYPERVISOR_shared_info;
sync_set_bit(port, BM(&s->evtchn_mask[0]));
}

static void evtchn_2l_unmask(unsigned port)
{
- struct shared_info *s = HYPERVISOR_shared_info;
+ struct shared_info *s = xh_default->HYPERVISOR_shared_info;
unsigned int cpu = get_cpu();
int do_hypercall = 0, evtchn_pending = 0;

@@ -167,7 +167,7 @@ static void evtchn_2l_handle_events(unsigned cpu)
int start_word_idx, start_bit_idx;
int word_idx, bit_idx;
int i;
- struct shared_info *s = HYPERVISOR_shared_info;
+ struct shared_info *s = xh_default->HYPERVISOR_shared_info;
struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu);

/* Timer interrupt has highest priority. */
@@ -264,7 +264,7 @@ static void evtchn_2l_handle_events(unsigned cpu)

irqreturn_t xen_debug_interrupt(int irq, void *dev_id)
{
- struct shared_info *sh = HYPERVISOR_shared_info;
+ struct shared_info *sh = xh_default->HYPERVISOR_shared_info;
int cpu = smp_processor_id();
xen_ulong_t *cpu_evtchn = per_cpu(cpu_evtchn_mask, cpu);
int i;
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
index dd1e2b64f50d..7c19c361d16e 100644
--- a/include/xen/xenhost.h
+++ b/include/xen/xenhost.h
@@ -82,6 +82,14 @@ typedef struct {
* bounce callbacks via L1-Xen.
*/
u8 features[XENFEAT_NR_SUBMAPS * 32];
+
+ /*
+ * shared-info to communicate with this xenhost instance.
+ */
+ struct {
+ struct shared_info *HYPERVISOR_shared_info;
+ unsigned long shared_info_pfn;
+ };
} xenhost_t;

typedef struct xenhost_ops {
@@ -111,6 +119,26 @@ typedef struct xenhost_ops {
* to decide which particular L1-guest was the caller.
*/
void (*setup_hypercall_page)(xenhost_t *xenhost);
+
+ /*
+ * shared_info: needed before vcpu-info setup.
+ *
+ * Needed early because Xen needs it for irq_disable() and such.
+ * On PV, a dummy_shared_info is set up first and eventually gets
+ * switched to the real one, so this needs to support switching
+ * a xenhost's shared_info.
+ *
+ * Reset for PV is done differently from HVM, so provide a
+ * separate interface.
+ *
+ * xenhost_r0: point xenhost->HYPERVISOR_shared_info to a
+ * newly allocated shared_info page.
+ * xenhost_r1: similar to what we do now.
+ * xenhost_r2: new remote hypercall to setup a shared_info page.
+ * This is where we would now handle L0-Xen irq/evtchns.
+ */
+ void (*setup_shared_info)(xenhost_t *xenhost);
+ void (*reset_shared_info)(xenhost_t *xenhost);
} xenhost_ops_t;

extern xenhost_t *xh_default, *xh_remote;
@@ -146,4 +174,15 @@ static inline void xenhost_setup_hypercall_page(xenhost_t *xh)
(xh->ops->setup_hypercall_page)(xh);
}

+
+static inline void xenhost_setup_shared_info(xenhost_t *xh)
+{
+ (xh->ops->setup_shared_info)(xh);
+}
+
+static inline void xenhost_reset_shared_info(xenhost_t *xh)
+{
+ (xh->ops->reset_shared_info)(xh);
+}
+
#endif /* __XENHOST_H */
--
2.20.1
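The setup_shared_info/reset_shared_info ops added above dispatch through the xenhost's op table via thin inline wrappers, and on PV the shared_info pointer is switched from the early dummy page to the real one. A minimal user-space sketch of that indirection follows; all types and names here are simplified stand-ins for illustration, not the actual kernel definitions:

```c
#include <assert.h>

/* Simplified stand-ins; names are illustrative, not kernel definitions. */
struct shared_info { int placeholder; };

static struct shared_info xen_dummy_shared_info;
static struct shared_info xen_real_shared_info;

typedef struct xenhost xenhost_t;

typedef struct xenhost_ops {
	void (*setup_shared_info)(xenhost_t *xh);
	void (*reset_shared_info)(xenhost_t *xh);
} xenhost_ops_t;

struct xenhost {
	const xenhost_ops_t *ops;
	struct shared_info *HYPERVISOR_shared_info;
};

/* A PV-style implementation: boot starts on the dummy page and later
 * switches to the real shared_info; reset goes back to the dummy. */
static void pv_setup_shared_info(xenhost_t *xh)
{
	xh->HYPERVISOR_shared_info = &xen_real_shared_info;
}

static void pv_reset_shared_info(xenhost_t *xh)
{
	xh->HYPERVISOR_shared_info = &xen_dummy_shared_info;
}

/* The inline wrappers from xenhost.h just indirect through the ops. */
static inline void xenhost_setup_shared_info(xenhost_t *xh)
{
	(xh->ops->setup_shared_info)(xh);
}

static inline void xenhost_reset_shared_info(xenhost_t *xh)
{
	(xh->ops->reset_shared_info)(xh);
}
```

The design point is that callers never test the xenhost type; each variant (r0/r1/r2) supplies its own ops and the wrappers stay identical.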

2019-05-09 17:29:33

by Ankur Arora

[permalink] [raw]
Subject: [RFC PATCH 10/16] xen/balloon: support ballooning in xenhost_t

Xen ballooning uses hollow struct pages (with the underlying GFNs being
populated/unpopulated via hypercalls) which are used by the grant logic
to map grants from other domains.

This patch allows the default xenhost to provide an alternate ballooning
allocation mechanism. This is expected to be useful for local xenhosts
(type xenhost_r0): with regular Xen, an external hypervisor can change
the memory underneath a GFN, but that is not possible when the
hypervisor runs in the same address space as the entity doing the
ballooning.

Co-developed-by: Ankur Arora <[email protected]>
Signed-off-by: Joao Martins <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/xen/enlighten_hvm.c | 7 +++++++
arch/x86/xen/enlighten_pv.c | 8 ++++++++
drivers/xen/balloon.c | 19 ++++++++++++++++---
drivers/xen/grant-table.c | 4 ++--
drivers/xen/privcmd.c | 4 ++--
drivers/xen/xen-selfballoon.c | 2 ++
drivers/xen/xenbus/xenbus_client.c | 6 +++---
drivers/xen/xlate_mmu.c | 4 ++--
include/xen/balloon.h | 4 ++--
include/xen/xenhost.h | 19 +++++++++++++++++++
10 files changed, 63 insertions(+), 14 deletions(-)
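The mechanism described above, an optional per-xenhost ballooning hook with remote (r2) xenhosts disallowed and a NULL hook falling through to the existing path, can be sketched in user-space C as follows. All types and names here are simplified illustrative stand-ins, not the kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; names mirror the patch but are illustrative. */
struct page;

typedef struct xenhost xenhost_t;

typedef struct xenhost_ops {
	int (*alloc_ballooned_pages)(xenhost_t *xh, int nr_pages,
				     struct page **pages);
} xenhost_ops_t;

enum xenhost_type { xenhost_r0, xenhost_r1, xenhost_r2 };

struct xenhost {
	enum xenhost_type type;
	const xenhost_ops_t *ops;
};

#define EINVAL 22

/* Stand-in for the existing balloon_mutex-protected allocation path. */
static int default_balloon_alloc(int nr_pages, struct page **pages)
{
	(void)pages;
	return nr_pages < 0 ? -EINVAL : 0;
}

/* Mirrors the dispatch in alloc_xenballooned_pages(): a remote (r2)
 * xenhost is rejected outright, a non-NULL op overrides the default,
 * and a NULL op falls through to the existing mechanism. */
static int alloc_xenballooned_pages(xenhost_t *xh, int nr_pages,
				    struct page **pages)
{
	if (xh->type == xenhost_r2)
		return -EINVAL;

	if (xh->ops->alloc_ballooned_pages)
		return xh->ops->alloc_ballooned_pages(xh, nr_pages, pages);

	return default_balloon_alloc(nr_pages, pages);
}
```

Leaving the hook NULL (as both xh_hvm_ops and xh_pv_ops do) keeps today's behavior, so the patch is purely additive for existing configurations.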

diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index efe483ceeb9a..a371bb9ee478 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -130,9 +130,16 @@ xenhost_ops_t xh_hvm_ops = {
.setup_shared_info = xen_hvm_init_shared_info,
.reset_shared_info = xen_hvm_reset_shared_info,
.probe_vcpu_id = xen_hvm_probe_vcpu_id,
+
+ /* We just use the default method of ballooning. */
+ .alloc_ballooned_pages = NULL,
+ .free_ballooned_pages = NULL,
};

xenhost_ops_t xh_hvm_nested_ops = {
+ /* Nested xenhosts are not allowed to balloon. */
+ .alloc_ballooned_pages = NULL,
+ .free_ballooned_pages = NULL,
};

static void __init init_hvm_pv_info(void)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 77b1a0d4aef2..2e94e02cdbb4 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1247,11 +1247,19 @@ xenhost_ops_t xh_pv_ops = {
.reset_shared_info = xen_pv_reset_shared_info,

.probe_vcpu_id = xen_pv_probe_vcpu_id,
+
+ /* We just use the default method of ballooning. */
+ .alloc_ballooned_pages = NULL,
+ .free_ballooned_pages = NULL,
};

xenhost_ops_t xh_pv_nested_ops = {
.cpuid_base = xen_pv_nested_cpuid_base,
.setup_hypercall_page = NULL,
+
+ /* Nested xenhosts are not allowed to balloon. */
+ .alloc_ballooned_pages = NULL,
+ .free_ballooned_pages = NULL,
};

/* First C function to be called on Xen boot */
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 5ef4d6ad920d..08becf574743 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -63,6 +63,7 @@
#include <asm/tlb.h>

#include <xen/interface/xen.h>
+#include <xen/xenhost.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>

@@ -583,12 +584,21 @@ static int add_ballooned_pages(int nr_pages)
* @pages: pages returned
* @return 0 on success, error otherwise
*/
-int alloc_xenballooned_pages(int nr_pages, struct page **pages)
+int alloc_xenballooned_pages(xenhost_t *xh, int nr_pages, struct page **pages)
{
int pgno = 0;
struct page *page;
int ret;

+ /*
+ * xenmem transactions for a remote xenhost are disallowed.
+ */
+ if (xh->type == xenhost_r2)
+ return -EINVAL;
+
+ if (xh->ops->alloc_ballooned_pages)
+ return xh->ops->alloc_ballooned_pages(xh, nr_pages, pages);
+
mutex_lock(&balloon_mutex);

balloon_stats.target_unpopulated += nr_pages;
@@ -620,7 +630,7 @@ int alloc_xenballooned_pages(int nr_pages, struct page **pages)
return 0;
out_undo:
mutex_unlock(&balloon_mutex);
- free_xenballooned_pages(pgno, pages);
+ free_xenballooned_pages(xh, pgno, pages);
return ret;
}
EXPORT_SYMBOL(alloc_xenballooned_pages);
@@ -630,10 +640,13 @@ EXPORT_SYMBOL(alloc_xenballooned_pages);
* @nr_pages: Number of pages
* @pages: pages to return
*/
-void free_xenballooned_pages(int nr_pages, struct page **pages)
+void free_xenballooned_pages(xenhost_t *xh, int nr_pages, struct page **pages)
{
int i;

+ if (xh->ops->free_ballooned_pages)
+ return xh->ops->free_ballooned_pages(xh, nr_pages, pages);
+
mutex_lock(&balloon_mutex);

for (i = 0; i < nr_pages; i++) {
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 98af259d0d4f..ec90769907a4 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -804,7 +804,7 @@ int gnttab_alloc_pages(int nr_pages, struct page **pages)
{
int ret;

- ret = alloc_xenballooned_pages(nr_pages, pages);
+ ret = alloc_xenballooned_pages(xh_default, nr_pages, pages);
if (ret < 0)
return ret;

@@ -839,7 +839,7 @@ EXPORT_SYMBOL_GPL(gnttab_pages_clear_private);
void gnttab_free_pages(int nr_pages, struct page **pages)
{
gnttab_pages_clear_private(nr_pages, pages);
- free_xenballooned_pages(nr_pages, pages);
+ free_xenballooned_pages(xh_default, nr_pages, pages);
}
EXPORT_SYMBOL_GPL(gnttab_free_pages);

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index b5541f862720..88cd99e4f5c1 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -427,7 +427,7 @@ static int alloc_empty_pages(struct vm_area_struct *vma, int numpgs)
if (pages == NULL)
return -ENOMEM;

- rc = alloc_xenballooned_pages(numpgs, pages);
+ rc = alloc_xenballooned_pages(xh_default, numpgs, pages);
if (rc != 0) {
pr_warn("%s Could not alloc %d pfns rc:%d\n", __func__,
numpgs, rc);
@@ -928,7 +928,7 @@ static void privcmd_close(struct vm_area_struct *vma)

rc = xen_unmap_domain_gfn_range(vma, numgfns, pages);
if (rc == 0)
- free_xenballooned_pages(numpgs, pages);
+ free_xenballooned_pages(xh_default, numpgs, pages);
else
pr_crit("unable to unmap MFN range: leaking %d pages. rc=%d\n",
numpgs, rc);
diff --git a/drivers/xen/xen-selfballoon.c b/drivers/xen/xen-selfballoon.c
index 246f6122c9ee..83a3995a33e3 100644
--- a/drivers/xen/xen-selfballoon.c
+++ b/drivers/xen/xen-selfballoon.c
@@ -74,6 +74,8 @@
#include <linux/mman.h>
#include <linux/workqueue.h>
#include <linux/device.h>
+#include <xen/interface/xen.h>
+#include <xen/xenhost.h>
#include <xen/balloon.h>
#include <xen/tmem.h>
#include <xen/xen.h>
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index f0cf47765726..5748fbaf0238 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -563,7 +563,7 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
if (!node)
return -ENOMEM;

- err = alloc_xenballooned_pages(nr_pages, node->hvm.pages);
+ err = alloc_xenballooned_pages(xh_default, nr_pages, node->hvm.pages);
if (err)
goto out_err;

@@ -602,7 +602,7 @@ static int xenbus_map_ring_valloc_hvm(struct xenbus_device *dev,
addr, nr_pages);
out_free_ballooned_pages:
if (!leaked)
- free_xenballooned_pages(nr_pages, node->hvm.pages);
+ free_xenballooned_pages(xh_default, nr_pages, node->hvm.pages);
out_err:
kfree(node);
return err;
@@ -849,7 +849,7 @@ static int xenbus_unmap_ring_vfree_hvm(struct xenbus_device *dev, void *vaddr)
info.addrs);
if (!rv) {
vunmap(vaddr);
- free_xenballooned_pages(nr_pages, node->hvm.pages);
+ free_xenballooned_pages(xh_default, nr_pages, node->hvm.pages);
}
else
WARN(1, "Leaking %p, size %u page(s)\n", vaddr, nr_pages);
diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index e7df65d32c91..f25a80a4076b 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -233,7 +233,7 @@ int __init xen_xlate_map_ballooned_pages(xen_pfn_t **gfns, void **virt,
kfree(pages);
return -ENOMEM;
}
- rc = alloc_xenballooned_pages(nr_pages, pages);
+ rc = alloc_xenballooned_pages(xh_default, nr_pages, pages);
if (rc) {
pr_warn("%s Couldn't balloon alloc %ld pages rc:%d\n", __func__,
nr_pages, rc);
@@ -250,7 +250,7 @@ int __init xen_xlate_map_ballooned_pages(xen_pfn_t **gfns, void **virt,
if (!vaddr) {
pr_warn("%s Couldn't map %ld pages rc:%d\n", __func__,
nr_pages, rc);
- free_xenballooned_pages(nr_pages, pages);
+ free_xenballooned_pages(xh_default, nr_pages, pages);
kfree(pages);
kfree(pfns);
return -ENOMEM;
diff --git a/include/xen/balloon.h b/include/xen/balloon.h
index 4914b93a23f2..e8fb5a5ef490 100644
--- a/include/xen/balloon.h
+++ b/include/xen/balloon.h
@@ -24,8 +24,8 @@ extern struct balloon_stats balloon_stats;

void balloon_set_new_target(unsigned long target);

-int alloc_xenballooned_pages(int nr_pages, struct page **pages);
-void free_xenballooned_pages(int nr_pages, struct page **pages);
+int alloc_xenballooned_pages(xenhost_t *xh, int nr_pages, struct page **pages);
+void free_xenballooned_pages(xenhost_t *xh, int nr_pages, struct page **pages);

struct device;
#ifdef CONFIG_XEN_SELFBALLOONING
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
index c9dabf739ff8..9e08627a9e3e 100644
--- a/include/xen/xenhost.h
+++ b/include/xen/xenhost.h
@@ -198,6 +198,25 @@ typedef struct xenhost_ops {
* get accessed via pv_ops.irq.* and the evtchn logic.
*/
void (*probe_vcpu_id)(xenhost_t *xenhost, int cpu);
+
+ /*
+ * We only want to do ballooning with the default xenhost -- two
+ * hypervisors managing a guest's memory is unlikely to lead anywhere
+ * good and xenballooned frames obtained from the default xenhost can
+ * just as well be populated by the remote xenhost (which is what we
+ * will need them for).
+ *
+ * xenhost_r1: unchanged from before.
+ * xenhost_r2: disallowed.
+ * xenhost_r0: for a local xenhost, unlike Xen, there's no external entity
+ * which can remap pages, so the balloon allocation here just returns page-0.
+ * When the allocated page is used (in GNTTABOP_map_grant_ref), we fix this
+ * up by returning the correct page.
+ */
+
+ int (*alloc_ballooned_pages)(xenhost_t *xh, int nr_pages, struct page **pages);
+ void (*free_ballooned_pages)(xenhost_t *xh, int nr_pages, struct page **pages);
+
} xenhost_ops_t;

extern xenhost_t *xh_default, *xh_remote;
--
2.20.1

2019-05-09 17:29:38

by Ankur Arora

[permalink] [raw]
Subject: [RFC PATCH 11/16] xen/grant-table: make grant-table xenhost aware

Largely mechanical changes: the exported grant table symbols now take
xenhost_t * as a parameter. Also, move the grant table global state
inside xenhost_t.

If there's more than one xenhost, initialize both.

Signed-off-by: Ankur Arora <[email protected]>
---
arch/x86/xen/grant-table.c | 71 +++--
drivers/xen/grant-table.c | 611 +++++++++++++++++++++----------------
include/xen/grant_table.h | 72 ++---
include/xen/xenhost.h | 11 +
4 files changed, 443 insertions(+), 322 deletions(-)
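The core of the patch below is moving former file-scope globals (gnttab_list, nr_grant_frames, gnttab_free_count, gnttab_shared, ...) into a per-xenhost struct gnttab_private reached through a gt_priv() accessor. A minimal user-space sketch of that pattern; all types here are simplified and illustrative, not the kernel definitions:

```c
#include <assert.h>

/* Illustrative only: former file-scope globals become per-xenhost
 * state reached through an opaque pointer, as in the patch. */
struct gnttab_private {
	unsigned int nr_grant_frames;
	int gnttab_free_count;
};

typedef struct {
	void *gnttab_private;
} xenhost_t;

/* Accessor mirroring the gt_priv() macro introduced by the patch. */
#define gt_priv(xh) ((struct gnttab_private *)(xh)->gnttab_private)

/* Every former global access becomes an access via gt_priv(xh). */
static int gnttab_free_count_of(xenhost_t *xh)
{
	return gt_priv(xh)->gnttab_free_count;
}
```

With this in place, the mechanical part of the patch is threading the extra xenhost_t * parameter through every function that used to touch the globals, so that two xenhosts carry fully independent grant-table state.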

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index ecb0d5450334..8f4b071427f9 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -23,48 +23,54 @@

#include <asm/pgtable.h>

-static struct gnttab_vm_area {
+struct gnttab_vm_area {
struct vm_struct *area;
pte_t **ptes;
-} gnttab_shared_vm_area, gnttab_status_vm_area;
+};

-int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
- unsigned long max_nr_gframes,
- void **__shared)
+int arch_gnttab_map_shared(xenhost_t *xh, unsigned long *frames,
+ unsigned long nr_gframes,
+ unsigned long max_nr_gframes,
+ void **__shared)
{
void *shared = *__shared;
unsigned long addr;
unsigned long i;

if (shared == NULL)
- *__shared = shared = gnttab_shared_vm_area.area->addr;
+ *__shared = shared = ((struct gnttab_vm_area *)
+ xh->gnttab_shared_vm_area)->area->addr;

addr = (unsigned long)shared;

for (i = 0; i < nr_gframes; i++) {
- set_pte_at(&init_mm, addr, gnttab_shared_vm_area.ptes[i],
- mfn_pte(frames[i], PAGE_KERNEL));
+ set_pte_at(&init_mm, addr,
+ ((struct gnttab_vm_area *) xh->gnttab_shared_vm_area)->ptes[i],
+ mfn_pte(frames[i], PAGE_KERNEL));
addr += PAGE_SIZE;
}

return 0;
}

-int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
- unsigned long max_nr_gframes,
- grant_status_t **__shared)
+int arch_gnttab_map_status(xenhost_t *xh, uint64_t *frames,
+ unsigned long nr_gframes,
+ unsigned long max_nr_gframes,
+ grant_status_t **__shared)
{
grant_status_t *shared = *__shared;
unsigned long addr;
unsigned long i;

if (shared == NULL)
- *__shared = shared = gnttab_status_vm_area.area->addr;
+ *__shared = shared = ((struct gnttab_vm_area *)
+ xh->gnttab_status_vm_area)->area->addr;

addr = (unsigned long)shared;

for (i = 0; i < nr_gframes; i++) {
- set_pte_at(&init_mm, addr, gnttab_status_vm_area.ptes[i],
+ set_pte_at(&init_mm, addr, ((struct gnttab_vm_area *)
+ xh->gnttab_status_vm_area)->ptes[i],
mfn_pte(frames[i], PAGE_KERNEL));
addr += PAGE_SIZE;
}
@@ -72,16 +78,17 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
return 0;
}

-void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
+void arch_gnttab_unmap(xenhost_t *xh, void *shared, unsigned long nr_gframes)
{
pte_t **ptes;
unsigned long addr;
unsigned long i;

- if (shared == gnttab_status_vm_area.area->addr)
- ptes = gnttab_status_vm_area.ptes;
+ if (shared == ((struct gnttab_vm_area *)
+ xh->gnttab_status_vm_area)->area->addr)
+ ptes = ((struct gnttab_vm_area *) xh->gnttab_status_vm_area)->ptes;
else
- ptes = gnttab_shared_vm_area.ptes;
+ ptes = ((struct gnttab_vm_area *) xh->gnttab_shared_vm_area)->ptes;

addr = (unsigned long)shared;

@@ -112,14 +119,15 @@ static void arch_gnttab_vfree(struct gnttab_vm_area *area)
kfree(area->ptes);
}

-int arch_gnttab_init(unsigned long nr_shared, unsigned long nr_status)
+int arch_gnttab_init(xenhost_t *xh, unsigned long nr_shared, unsigned long nr_status)
{
int ret;

if (!xen_pv_domain())
return 0;

- ret = arch_gnttab_valloc(&gnttab_shared_vm_area, nr_shared);
+ ret = arch_gnttab_valloc((struct gnttab_vm_area *)
+ xh->gnttab_shared_vm_area, nr_shared);
if (ret < 0)
return ret;

@@ -127,13 +135,15 @@ int arch_gnttab_init(unsigned long nr_shared, unsigned long nr_status)
* Always allocate the space for the status frames in case
* we're migrated to a host with V2 support.
*/
- ret = arch_gnttab_valloc(&gnttab_status_vm_area, nr_status);
+ ret = arch_gnttab_valloc((struct gnttab_vm_area *)
+ xh->gnttab_status_vm_area, nr_status);
if (ret < 0)
goto err;

return 0;
err:
- arch_gnttab_vfree(&gnttab_shared_vm_area);
+ arch_gnttab_vfree((struct gnttab_vm_area *)
+ xh->gnttab_shared_vm_area);
return -ENOMEM;
}

@@ -142,16 +152,25 @@ int arch_gnttab_init(unsigned long nr_shared, unsigned long nr_status)
#include <xen/xen-ops.h>
static int __init xen_pvh_gnttab_setup(void)
{
+ xenhost_t **xh;
+ int err;
+
if (!xen_pvh_domain())
return -ENODEV;

- xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
+ for_each_xenhost(xh) {
+ struct grant_frames *gf = (struct grant_frames *) (*xh)->auto_xlat_grant_frames;

- return xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
- &xen_auto_xlat_grant_frames.vaddr,
- xen_auto_xlat_grant_frames.count);
+ gf->count = gnttab_max_grant_frames(*xh);
+
+ err = xen_xlate_map_ballooned_pages(&gf->pfn, &gf->vaddr, gf->count);
+ if (err)
+ return err;
+ }
+
+ return 0;
}
/* Call it _before_ __gnttab_init as we need to initialize the
- * xen_auto_xlat_grant_frames first. */
+ * auto_xlat_grant_frames first. */
core_initcall(xen_pvh_gnttab_setup);
#endif
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index ec90769907a4..959b81ade113 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -72,21 +72,10 @@
#define NR_RESERVED_ENTRIES 8
#define GNTTAB_LIST_END 0xffffffff

-static grant_ref_t **gnttab_list;
-static unsigned int nr_grant_frames;
-static int gnttab_free_count;
-static grant_ref_t gnttab_free_head;
static DEFINE_SPINLOCK(gnttab_list_lock);
-struct grant_frames xen_auto_xlat_grant_frames;
static unsigned int xen_gnttab_version;
module_param_named(version, xen_gnttab_version, uint, 0);

-static union {
- struct grant_entry_v1 *v1;
- union grant_entry_v2 *v2;
- void *addr;
-} gnttab_shared;
-
/*This is a structure of function pointers for grant table*/
struct gnttab_ops {
/*
@@ -103,12 +92,12 @@ struct gnttab_ops {
* nr_gframes is the number of frames to map grant table. Returning
* GNTST_okay means success and negative value means failure.
*/
- int (*map_frames)(xen_pfn_t *frames, unsigned int nr_gframes);
+ int (*map_frames)(xenhost_t *xh, xen_pfn_t *frames, unsigned int nr_gframes);
/*
* Release a list of frames which are mapped in map_frames for grant
* entry status.
*/
- void (*unmap_frames)(void);
+ void (*unmap_frames)(xenhost_t *xh);
/*
* Introducing a valid entry into the grant table, granting the frame of
* this grant entry to domain for accessing or transfering. Ref
@@ -116,7 +105,7 @@ struct gnttab_ops {
* granted domain, frame is the page frame to be granted, and flags is
* status of the grant entry to be updated.
*/
- void (*update_entry)(grant_ref_t ref, domid_t domid,
+ void (*update_entry)(xenhost_t *xh, grant_ref_t ref, domid_t domid,
unsigned long frame, unsigned flags);
/*
* Stop granting a grant entry to domain for accessing. Ref parameter is
@@ -126,7 +115,7 @@ struct gnttab_ops {
* directly and don't tear down the grant access. Otherwise, stop grant
* access for this entry and return success(==1).
*/
- int (*end_foreign_access_ref)(grant_ref_t ref, int readonly);
+ int (*end_foreign_access_ref)(xenhost_t *xh, grant_ref_t ref, int readonly);
/*
* Stop granting a grant entry to domain for transfer. Ref parameter is
* reference of a grant entry whose grant transfer will be stopped. If
@@ -134,14 +123,14 @@ struct gnttab_ops {
* failure(==0). Otherwise, wait for the transfer to complete and then
* return the frame.
*/
- unsigned long (*end_foreign_transfer_ref)(grant_ref_t ref);
+ unsigned long (*end_foreign_transfer_ref)(xenhost_t *xh, grant_ref_t ref);
/*
* Query the status of a grant entry. Ref parameter is reference of
* queried grant entry, return value is the status of queried entry.
* Detailed status(writing/reading) can be gotten from the return value
* by bit operations.
*/
- int (*query_foreign_access)(grant_ref_t ref);
+ int (*query_foreign_access)(xenhost_t *xh, grant_ref_t ref);
};

struct unmap_refs_callback_data {
@@ -149,85 +138,105 @@ struct unmap_refs_callback_data {
int result;
};

-static const struct gnttab_ops *gnttab_interface;
+struct gnttab_private {
+ const struct gnttab_ops *gnttab_interface;
+ grant_status_t *grstatus;
+ grant_ref_t gnttab_free_head;
+ unsigned int nr_grant_frames;
+ int gnttab_free_count;
+ struct gnttab_free_callback *gnttab_free_callback_list;
+ struct grant_frames auto_xlat_grant_frames;
+ grant_ref_t **gnttab_list;

-/* This reflects status of grant entries, so act as a global value. */
-static grant_status_t *grstatus;
+ union {
+ struct grant_entry_v1 *v1;
+ union grant_entry_v2 *v2;
+ void *addr;
+ } gnttab_shared;
+};

-static struct gnttab_free_callback *gnttab_free_callback_list;
+#define gt_priv(xh) ((struct gnttab_private *) (xh)->gnttab_private)

-static int gnttab_expand(unsigned int req_entries);
+static int gnttab_expand(xenhost_t *xh, unsigned int req_entries);

#define RPP (PAGE_SIZE / sizeof(grant_ref_t))
#define SPP (PAGE_SIZE / sizeof(grant_status_t))

-static inline grant_ref_t *__gnttab_entry(grant_ref_t entry)
+static inline grant_ref_t *__gnttab_entry(xenhost_t *xh, grant_ref_t entry)
{
- return &gnttab_list[(entry) / RPP][(entry) % RPP];
+ struct gnttab_private *gt = gt_priv(xh);
+
+ return &gt->gnttab_list[(entry) / RPP][(entry) % RPP];
}
/* This can be used as an l-value */
-#define gnttab_entry(entry) (*__gnttab_entry(entry))
+#define gnttab_entry(xh, entry) (*__gnttab_entry(xh, entry))

-static int get_free_entries(unsigned count)
+static int get_free_entries(xenhost_t *xh, unsigned count)
{
unsigned long flags;
int ref, rc = 0;
grant_ref_t head;
+ struct gnttab_private *gt = gt_priv(xh);

spin_lock_irqsave(&gnttab_list_lock, flags);

- if ((gnttab_free_count < count) &&
- ((rc = gnttab_expand(count - gnttab_free_count)) < 0)) {
+ if ((gt->gnttab_free_count < count) &&
+ ((rc = gnttab_expand(xh, count - gt->gnttab_free_count)) < 0)) {
spin_unlock_irqrestore(&gnttab_list_lock, flags);
return rc;
}

- ref = head = gnttab_free_head;
- gnttab_free_count -= count;
+ ref = head = gt->gnttab_free_head;
+ gt->gnttab_free_count -= count;
while (count-- > 1)
- head = gnttab_entry(head);
- gnttab_free_head = gnttab_entry(head);
- gnttab_entry(head) = GNTTAB_LIST_END;
+ head = gnttab_entry(xh, head);
+ gt->gnttab_free_head = gnttab_entry(xh, head);
+ gnttab_entry(xh, head) = GNTTAB_LIST_END;

spin_unlock_irqrestore(&gnttab_list_lock, flags);

return ref;
}

-static void do_free_callbacks(void)
+static void do_free_callbacks(xenhost_t *xh)
{
struct gnttab_free_callback *callback, *next;
+ struct gnttab_private *gt = gt_priv(xh);

- callback = gnttab_free_callback_list;
- gnttab_free_callback_list = NULL;
+ callback = gt->gnttab_free_callback_list;
+ gt->gnttab_free_callback_list = NULL;

while (callback != NULL) {
next = callback->next;
- if (gnttab_free_count >= callback->count) {
+ if (gt->gnttab_free_count >= callback->count) {
callback->next = NULL;
callback->fn(callback->arg);
} else {
- callback->next = gnttab_free_callback_list;
- gnttab_free_callback_list = callback;
+ callback->next = gt->gnttab_free_callback_list;
+ gt->gnttab_free_callback_list = callback;
}
callback = next;
}
}

-static inline void check_free_callbacks(void)
+static inline void check_free_callbacks(xenhost_t *xh)
{
- if (unlikely(gnttab_free_callback_list))
- do_free_callbacks();
+ struct gnttab_private *gt = gt_priv(xh);
+
+ if (unlikely(gt->gnttab_free_callback_list))
+ do_free_callbacks(xh);
}

-static void put_free_entry(grant_ref_t ref)
+static void put_free_entry(xenhost_t *xh, grant_ref_t ref)
{
unsigned long flags;
+ struct gnttab_private *gt = gt_priv(xh);
+
spin_lock_irqsave(&gnttab_list_lock, flags);
- gnttab_entry(ref) = gnttab_free_head;
- gnttab_free_head = ref;
- gnttab_free_count++;
- check_free_callbacks();
+ gnttab_entry(xh, ref) = gt->gnttab_free_head;
+ gt->gnttab_free_head = ref;
+ gt->gnttab_free_count++;
+ check_free_callbacks(xh);
spin_unlock_irqrestore(&gnttab_list_lock, flags);
}

@@ -242,72 +251,85 @@ static void put_free_entry(grant_ref_t ref)
* 3. Write memory barrier (WMB).
* 4. Write ent->flags, inc. valid type.
*/
-static void gnttab_update_entry_v1(grant_ref_t ref, domid_t domid,
+static void gnttab_update_entry_v1(xenhost_t *xh, grant_ref_t ref, domid_t domid,
unsigned long frame, unsigned flags)
{
- gnttab_shared.v1[ref].domid = domid;
- gnttab_shared.v1[ref].frame = frame;
+ struct gnttab_private *gt = gt_priv(xh);
+
+ gt->gnttab_shared.v1[ref].domid = domid;
+ gt->gnttab_shared.v1[ref].frame = frame;
wmb();
- gnttab_shared.v1[ref].flags = flags;
+ gt->gnttab_shared.v1[ref].flags = flags;
}

-static void gnttab_update_entry_v2(grant_ref_t ref, domid_t domid,
+static void gnttab_update_entry_v2(xenhost_t *xh, grant_ref_t ref, domid_t domid,
unsigned long frame, unsigned int flags)
{
- gnttab_shared.v2[ref].hdr.domid = domid;
- gnttab_shared.v2[ref].full_page.frame = frame;
+ struct gnttab_private *gt = gt_priv(xh);
+
+ gt->gnttab_shared.v2[ref].hdr.domid = domid;
+ gt->gnttab_shared.v2[ref].full_page.frame = frame;
wmb(); /* Hypervisor concurrent accesses. */
- gnttab_shared.v2[ref].hdr.flags = GTF_permit_access | flags;
+ gt->gnttab_shared.v2[ref].hdr.flags = GTF_permit_access | flags;
}

/*
* Public grant-issuing interface functions
*/
-void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
+void gnttab_grant_foreign_access_ref(xenhost_t *xh, grant_ref_t ref, domid_t domid,
unsigned long frame, int readonly)
{
- gnttab_interface->update_entry(ref, domid, frame,
+ struct gnttab_private *gt = gt_priv(xh);
+
+ gt->gnttab_interface->update_entry(xh, ref, domid, frame,
GTF_permit_access | (readonly ? GTF_readonly : 0));
}
EXPORT_SYMBOL_GPL(gnttab_grant_foreign_access_ref);

-int gnttab_grant_foreign_access(domid_t domid, unsigned long frame,
+int gnttab_grant_foreign_access(xenhost_t *xh, domid_t domid, unsigned long frame,
int readonly)
{
int ref;

- ref = get_free_entries(1);
+ ref = get_free_entries(xh, 1);
if (unlikely(ref < 0))
return -ENOSPC;

- gnttab_grant_foreign_access_ref(ref, domid, frame, readonly);
+ gnttab_grant_foreign_access_ref(xh, ref, domid, frame, readonly);

return ref;
}
EXPORT_SYMBOL_GPL(gnttab_grant_foreign_access);

-static int gnttab_query_foreign_access_v1(grant_ref_t ref)
+static int gnttab_query_foreign_access_v1(xenhost_t *xh, grant_ref_t ref)
{
- return gnttab_shared.v1[ref].flags & (GTF_reading|GTF_writing);
+ struct gnttab_private *gt = gt_priv(xh);
+
+ return gt->gnttab_shared.v1[ref].flags & (GTF_reading|GTF_writing);
}

-static int gnttab_query_foreign_access_v2(grant_ref_t ref)
+static int gnttab_query_foreign_access_v2(xenhost_t *xh, grant_ref_t ref)
{
- return grstatus[ref] & (GTF_reading|GTF_writing);
+ struct gnttab_private *gt = gt_priv(xh);
+
+ return gt->grstatus[ref] & (GTF_reading|GTF_writing);
}

-int gnttab_query_foreign_access(grant_ref_t ref)
+int gnttab_query_foreign_access(xenhost_t *xh, grant_ref_t ref)
{
- return gnttab_interface->query_foreign_access(ref);
+ struct gnttab_private *gt = gt_priv(xh);
+
+ return gt->gnttab_interface->query_foreign_access(xh, ref);
}
EXPORT_SYMBOL_GPL(gnttab_query_foreign_access);

-static int gnttab_end_foreign_access_ref_v1(grant_ref_t ref, int readonly)
+static int gnttab_end_foreign_access_ref_v1(xenhost_t *xh, grant_ref_t ref, int readonly)
{
+ struct gnttab_private *gt = gt_priv(xh);
u16 flags, nflags;
u16 *pflags;

- pflags = &gnttab_shared.v1[ref].flags;
+ pflags = &gt->gnttab_shared.v1[ref].flags;
nflags = *pflags;
do {
flags = nflags;
@@ -318,11 +340,13 @@ static int gnttab_end_foreign_access_ref_v1(grant_ref_t ref, int readonly)
return 1;
}

-static int gnttab_end_foreign_access_ref_v2(grant_ref_t ref, int readonly)
+static int gnttab_end_foreign_access_ref_v2(xenhost_t *xh, grant_ref_t ref, int readonly)
{
- gnttab_shared.v2[ref].hdr.flags = 0;
+ struct gnttab_private *gt = gt_priv(xh);
+
+ gt->gnttab_shared.v2[ref].hdr.flags = 0;
mb(); /* Concurrent access by hypervisor. */
- if (grstatus[ref] & (GTF_reading|GTF_writing)) {
+ if (gt->grstatus[ref] & (GTF_reading|GTF_writing)) {
return 0;
} else {
/*
@@ -341,14 +365,16 @@ static int gnttab_end_foreign_access_ref_v2(grant_ref_t ref, int readonly)
return 1;
}

-static inline int _gnttab_end_foreign_access_ref(grant_ref_t ref, int readonly)
+static inline int _gnttab_end_foreign_access_ref(xenhost_t *xh, grant_ref_t ref, int readonly)
{
- return gnttab_interface->end_foreign_access_ref(ref, readonly);
+ struct gnttab_private *gt = gt_priv(xh);
+
+ return gt->gnttab_interface->end_foreign_access_ref(xh, ref, readonly);
}

-int gnttab_end_foreign_access_ref(grant_ref_t ref, int readonly)
+int gnttab_end_foreign_access_ref(xenhost_t *xh, grant_ref_t ref, int readonly)
{
- if (_gnttab_end_foreign_access_ref(ref, readonly))
+ if (_gnttab_end_foreign_access_ref(xh, ref, readonly))
return 1;
pr_warn("WARNING: g.e. %#x still in use!\n", ref);
return 0;
@@ -361,6 +387,7 @@ struct deferred_entry {
bool ro;
uint16_t warn_delay;
struct page *page;
+ xenhost_t *xh;
};
static LIST_HEAD(deferred_list);
static void gnttab_handle_deferred(struct timer_list *);
@@ -382,8 +409,8 @@ static void gnttab_handle_deferred(struct timer_list *unused)
break;
list_del(&entry->list);
spin_unlock_irqrestore(&gnttab_list_lock, flags);
- if (_gnttab_end_foreign_access_ref(entry->ref, entry->ro)) {
- put_free_entry(entry->ref);
+ if (_gnttab_end_foreign_access_ref(entry->xh, entry->ref, entry->ro)) {
+ put_free_entry(entry->xh, entry->ref);
if (entry->page) {
pr_debug("freeing g.e. %#x (pfn %#lx)\n",
entry->ref, page_to_pfn(entry->page));
@@ -411,7 +438,7 @@ static void gnttab_handle_deferred(struct timer_list *unused)
spin_unlock_irqrestore(&gnttab_list_lock, flags);
}

-static void gnttab_add_deferred(grant_ref_t ref, bool readonly,
+static void gnttab_add_deferred(xenhost_t *xh, grant_ref_t ref, bool readonly,
struct page *page)
{
struct deferred_entry *entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
@@ -423,6 +450,7 @@ static void gnttab_add_deferred(grant_ref_t ref, bool readonly,
entry->ref = ref;
entry->ro = readonly;
entry->page = page;
+ entry->xh = xh;
entry->warn_delay = 60;
spin_lock_irqsave(&gnttab_list_lock, flags);
list_add_tail(&entry->list, &deferred_list);
@@ -437,46 +465,49 @@ static void gnttab_add_deferred(grant_ref_t ref, bool readonly,
what, ref, page ? page_to_pfn(page) : -1);
}

-void gnttab_end_foreign_access(grant_ref_t ref, int readonly,
+void gnttab_end_foreign_access(xenhost_t *xh, grant_ref_t ref, int readonly,
unsigned long page)
{
- if (gnttab_end_foreign_access_ref(ref, readonly)) {
- put_free_entry(ref);
+ if (gnttab_end_foreign_access_ref(xh, ref, readonly)) {
+ put_free_entry(xh, ref);
if (page != 0)
put_page(virt_to_page(page));
} else
- gnttab_add_deferred(ref, readonly,
+ gnttab_add_deferred(xh, ref, readonly,
page ? virt_to_page(page) : NULL);
}
EXPORT_SYMBOL_GPL(gnttab_end_foreign_access);

-int gnttab_grant_foreign_transfer(domid_t domid, unsigned long pfn)
+int gnttab_grant_foreign_transfer(xenhost_t *xh, domid_t domid, unsigned long pfn)
{
int ref;

- ref = get_free_entries(1);
+ ref = get_free_entries(xh, 1);
if (unlikely(ref < 0))
return -ENOSPC;
- gnttab_grant_foreign_transfer_ref(ref, domid, pfn);
+ gnttab_grant_foreign_transfer_ref(xh, ref, domid, pfn);

return ref;
}
EXPORT_SYMBOL_GPL(gnttab_grant_foreign_transfer);

-void gnttab_grant_foreign_transfer_ref(grant_ref_t ref, domid_t domid,
+void gnttab_grant_foreign_transfer_ref(xenhost_t *xh, grant_ref_t ref, domid_t domid,
unsigned long pfn)
{
- gnttab_interface->update_entry(ref, domid, pfn, GTF_accept_transfer);
+ struct gnttab_private *gt = gt_priv(xh);
+
+ gt->gnttab_interface->update_entry(xh, ref, domid, pfn, GTF_accept_transfer);
}
EXPORT_SYMBOL_GPL(gnttab_grant_foreign_transfer_ref);

-static unsigned long gnttab_end_foreign_transfer_ref_v1(grant_ref_t ref)
+static unsigned long gnttab_end_foreign_transfer_ref_v1(xenhost_t *xh, grant_ref_t ref)
{
+ struct gnttab_private *gt = gt_priv(xh);
unsigned long frame;
u16 flags;
u16 *pflags;

- pflags = &gnttab_shared.v1[ref].flags;
+ pflags = &gt->gnttab_shared.v1[ref].flags;

/*
* If a transfer is not even yet started, try to reclaim the grant
@@ -495,19 +526,20 @@ static unsigned long gnttab_end_foreign_transfer_ref_v1(grant_ref_t ref)
}

rmb(); /* Read the frame number /after/ reading completion status. */
- frame = gnttab_shared.v1[ref].frame;
+ frame = gt->gnttab_shared.v1[ref].frame;
BUG_ON(frame == 0);

return frame;
}

-static unsigned long gnttab_end_foreign_transfer_ref_v2(grant_ref_t ref)
+static unsigned long gnttab_end_foreign_transfer_ref_v2(xenhost_t *xh, grant_ref_t ref)
{
unsigned long frame;
u16 flags;
u16 *pflags;
+ struct gnttab_private *gt = gt_priv(xh);

- pflags = &gnttab_shared.v2[ref].hdr.flags;
+ pflags = &gt->gnttab_shared.v2[ref].hdr.flags;

/*
* If a transfer is not even yet started, try to reclaim the grant
@@ -526,34 +558,39 @@ static unsigned long gnttab_end_foreign_transfer_ref_v2(grant_ref_t ref)
}

rmb(); /* Read the frame number /after/ reading completion status. */
- frame = gnttab_shared.v2[ref].full_page.frame;
+ frame = gt->gnttab_shared.v2[ref].full_page.frame;
BUG_ON(frame == 0);

return frame;
}

-unsigned long gnttab_end_foreign_transfer_ref(grant_ref_t ref)
+unsigned long gnttab_end_foreign_transfer_ref(xenhost_t *xh, grant_ref_t ref)
{
- return gnttab_interface->end_foreign_transfer_ref(ref);
+ struct gnttab_private *gt = gt_priv(xh);
+
+ return gt->gnttab_interface->end_foreign_transfer_ref(xh, ref);
}
EXPORT_SYMBOL_GPL(gnttab_end_foreign_transfer_ref);

-unsigned long gnttab_end_foreign_transfer(grant_ref_t ref)
+unsigned long gnttab_end_foreign_transfer(xenhost_t *xh, grant_ref_t ref)
{
- unsigned long frame = gnttab_end_foreign_transfer_ref(ref);
- put_free_entry(ref);
+ unsigned long frame = gnttab_end_foreign_transfer_ref(xh, ref);
+
+ put_free_entry(xh, ref);
+
return frame;
}
EXPORT_SYMBOL_GPL(gnttab_end_foreign_transfer);

-void gnttab_free_grant_reference(grant_ref_t ref)
+void gnttab_free_grant_reference(xenhost_t *xh, grant_ref_t ref)
{
- put_free_entry(ref);
+ put_free_entry(xh, ref);
}
EXPORT_SYMBOL_GPL(gnttab_free_grant_reference);

-void gnttab_free_grant_references(grant_ref_t head)
+void gnttab_free_grant_references(xenhost_t *xh, grant_ref_t head)
{
+ struct gnttab_private *gt = gt_priv(xh);
grant_ref_t ref;
unsigned long flags;
int count = 1;
@@ -561,21 +598,21 @@ void gnttab_free_grant_references(grant_ref_t head)
return;
spin_lock_irqsave(&gnttab_list_lock, flags);
ref = head;
- while (gnttab_entry(ref) != GNTTAB_LIST_END) {
- ref = gnttab_entry(ref);
+ while (gnttab_entry(xh, ref) != GNTTAB_LIST_END) {
+ ref = gnttab_entry(xh, ref);
count++;
}
- gnttab_entry(ref) = gnttab_free_head;
- gnttab_free_head = head;
- gnttab_free_count += count;
- check_free_callbacks();
+ gnttab_entry(xh, ref) = gt->gnttab_free_head;
+ gt->gnttab_free_head = head;
+ gt->gnttab_free_count += count;
+ check_free_callbacks(xh);
spin_unlock_irqrestore(&gnttab_list_lock, flags);
}
EXPORT_SYMBOL_GPL(gnttab_free_grant_references);

-int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
+int gnttab_alloc_grant_references(xenhost_t *xh, u16 count, grant_ref_t *head)
{
- int h = get_free_entries(count);
+ int h = get_free_entries(xh, count);

if (h < 0)
return -ENOSPC;
@@ -586,40 +623,41 @@ int gnttab_alloc_grant_references(u16 count, grant_ref_t *head)
}
EXPORT_SYMBOL_GPL(gnttab_alloc_grant_references);

-int gnttab_empty_grant_references(const grant_ref_t *private_head)
+int gnttab_empty_grant_references(xenhost_t *xh, const grant_ref_t *private_head)
{
return (*private_head == GNTTAB_LIST_END);
}
EXPORT_SYMBOL_GPL(gnttab_empty_grant_references);

-int gnttab_claim_grant_reference(grant_ref_t *private_head)
+int gnttab_claim_grant_reference(xenhost_t *xh, grant_ref_t *private_head)
{
grant_ref_t g = *private_head;
if (unlikely(g == GNTTAB_LIST_END))
return -ENOSPC;
- *private_head = gnttab_entry(g);
+ *private_head = gnttab_entry(xh, g);
return g;
}
EXPORT_SYMBOL_GPL(gnttab_claim_grant_reference);

-void gnttab_release_grant_reference(grant_ref_t *private_head,
+void gnttab_release_grant_reference(xenhost_t *xh, grant_ref_t *private_head,
grant_ref_t release)
{
- gnttab_entry(release) = *private_head;
+ gnttab_entry(xh, release) = *private_head;
*private_head = release;
}
EXPORT_SYMBOL_GPL(gnttab_release_grant_reference);

-void gnttab_request_free_callback(struct gnttab_free_callback *callback,
+void gnttab_request_free_callback(xenhost_t *xh, struct gnttab_free_callback *callback,
void (*fn)(void *), void *arg, u16 count)
{
unsigned long flags;
struct gnttab_free_callback *cb;
+ struct gnttab_private *gt = gt_priv(xh);

spin_lock_irqsave(&gnttab_list_lock, flags);

/* Check if the callback is already on the list */
- cb = gnttab_free_callback_list;
+ cb = gt->gnttab_free_callback_list;
while (cb) {
if (cb == callback)
goto out;
@@ -629,21 +667,23 @@ void gnttab_request_free_callback(struct gnttab_free_callback *callback,
callback->fn = fn;
callback->arg = arg;
callback->count = count;
- callback->next = gnttab_free_callback_list;
- gnttab_free_callback_list = callback;
- check_free_callbacks();
+ callback->next = gt->gnttab_free_callback_list;
+ gt->gnttab_free_callback_list = callback;
+ check_free_callbacks(xh);
out:
spin_unlock_irqrestore(&gnttab_list_lock, flags);
}
EXPORT_SYMBOL_GPL(gnttab_request_free_callback);

-void gnttab_cancel_free_callback(struct gnttab_free_callback *callback)
+void gnttab_cancel_free_callback(xenhost_t *xh, struct gnttab_free_callback *callback)
{
struct gnttab_free_callback **pcb;
unsigned long flags;
+ struct gnttab_private *gt = gt_priv(xh);

spin_lock_irqsave(&gnttab_list_lock, flags);
- for (pcb = &gnttab_free_callback_list; *pcb; pcb = &(*pcb)->next) {
+ for (pcb = &gt->gnttab_free_callback_list; *pcb; pcb = &(*pcb)->next) {
if (*pcb == callback) {
*pcb = callback->next;
break;
@@ -653,75 +693,78 @@ void gnttab_cancel_free_callback(struct gnttab_free_callback *callback)
}
EXPORT_SYMBOL_GPL(gnttab_cancel_free_callback);

-static unsigned int gnttab_frames(unsigned int frames, unsigned int align)
+static unsigned int gnttab_frames(xenhost_t *xh, unsigned int frames, unsigned int align)
{
- return (frames * gnttab_interface->grefs_per_grant_frame + align - 1) /
+ struct gnttab_private *gt = gt_priv(xh);
+
+ return (frames * gt->gnttab_interface->grefs_per_grant_frame + align - 1) /
align;
}

-static int grow_gnttab_list(unsigned int more_frames)
+static int grow_gnttab_list(xenhost_t *xh, unsigned int more_frames)
{
unsigned int new_nr_grant_frames, extra_entries, i;
unsigned int nr_glist_frames, new_nr_glist_frames;
unsigned int grefs_per_frame;
+ struct gnttab_private *gt = gt_priv(xh);

- BUG_ON(gnttab_interface == NULL);
- grefs_per_frame = gnttab_interface->grefs_per_grant_frame;
+ BUG_ON(gt->gnttab_interface == NULL);
+ grefs_per_frame = gt->gnttab_interface->grefs_per_grant_frame;

- new_nr_grant_frames = nr_grant_frames + more_frames;
+ new_nr_grant_frames = gt->nr_grant_frames + more_frames;
extra_entries = more_frames * grefs_per_frame;

- nr_glist_frames = gnttab_frames(nr_grant_frames, RPP);
- new_nr_glist_frames = gnttab_frames(new_nr_grant_frames, RPP);
+ nr_glist_frames = gnttab_frames(xh, gt->nr_grant_frames, RPP);
+ new_nr_glist_frames = gnttab_frames(xh, new_nr_grant_frames, RPP);
for (i = nr_glist_frames; i < new_nr_glist_frames; i++) {
- gnttab_list[i] = (grant_ref_t *)__get_free_page(GFP_ATOMIC);
- if (!gnttab_list[i])
+ gt->gnttab_list[i] = (grant_ref_t *)__get_free_page(GFP_ATOMIC);
+ if (!gt->gnttab_list[i])
goto grow_nomem;
}


- for (i = grefs_per_frame * nr_grant_frames;
+ for (i = grefs_per_frame * gt->nr_grant_frames;
i < grefs_per_frame * new_nr_grant_frames - 1; i++)
- gnttab_entry(i) = i + 1;
+ gnttab_entry(xh, i) = i + 1;

- gnttab_entry(i) = gnttab_free_head;
- gnttab_free_head = grefs_per_frame * nr_grant_frames;
- gnttab_free_count += extra_entries;
+ gnttab_entry(xh, i) = gt->gnttab_free_head;
+ gt->gnttab_free_head = grefs_per_frame * gt->nr_grant_frames;
+ gt->gnttab_free_count += extra_entries;

- nr_grant_frames = new_nr_grant_frames;
+ gt->nr_grant_frames = new_nr_grant_frames;

- check_free_callbacks();
+ check_free_callbacks(xh);

return 0;

grow_nomem:
while (i-- > nr_glist_frames)
- free_page((unsigned long) gnttab_list[i]);
+ free_page((unsigned long) gt->gnttab_list[i]);
return -ENOMEM;
}

-static unsigned int __max_nr_grant_frames(void)
+static unsigned int __max_nr_grant_frames(xenhost_t *xh)
{
struct gnttab_query_size query;
int rc;

query.dom = DOMID_SELF;

- rc = HYPERVISOR_grant_table_op(GNTTABOP_query_size, &query, 1);
+ rc = hypervisor_grant_table_op(xh, GNTTABOP_query_size, &query, 1);
if ((rc < 0) || (query.status != GNTST_okay))
return 4; /* Legacy max supported number of frames */

return query.max_nr_frames;
}

-unsigned int gnttab_max_grant_frames(void)
+unsigned int gnttab_max_grant_frames(xenhost_t *xh)
{
- unsigned int xen_max = __max_nr_grant_frames();
+ unsigned int xen_max = __max_nr_grant_frames(xh);
static unsigned int boot_max_nr_grant_frames;

/* First time, initialize it properly. */
if (!boot_max_nr_grant_frames)
- boot_max_nr_grant_frames = __max_nr_grant_frames();
+ boot_max_nr_grant_frames = __max_nr_grant_frames(xh);

if (xen_max > boot_max_nr_grant_frames)
return boot_max_nr_grant_frames;
@@ -729,14 +772,15 @@ unsigned int gnttab_max_grant_frames(void)
}
EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);

-int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
+int gnttab_setup_auto_xlat_frames(xenhost_t *xh, phys_addr_t addr)
{
+ struct gnttab_private *gt = gt_priv(xh);
xen_pfn_t *pfn;
- unsigned int max_nr_gframes = __max_nr_grant_frames();
+ unsigned int max_nr_gframes = __max_nr_grant_frames(xh);
unsigned int i;
void *vaddr;

- if (xen_auto_xlat_grant_frames.count)
+ if (gt->auto_xlat_grant_frames.count)
return -EINVAL;

vaddr = xen_remap(addr, XEN_PAGE_SIZE * max_nr_gframes);
@@ -753,24 +797,26 @@ int gnttab_setup_auto_xlat_frames(phys_addr_t addr)
for (i = 0; i < max_nr_gframes; i++)
pfn[i] = XEN_PFN_DOWN(addr) + i;

- xen_auto_xlat_grant_frames.vaddr = vaddr;
- xen_auto_xlat_grant_frames.pfn = pfn;
- xen_auto_xlat_grant_frames.count = max_nr_gframes;
+ gt->auto_xlat_grant_frames.vaddr = vaddr;
+ gt->auto_xlat_grant_frames.pfn = pfn;
+ gt->auto_xlat_grant_frames.count = max_nr_gframes;

return 0;
}
EXPORT_SYMBOL_GPL(gnttab_setup_auto_xlat_frames);

-void gnttab_free_auto_xlat_frames(void)
+void gnttab_free_auto_xlat_frames(xenhost_t *xh)
{
- if (!xen_auto_xlat_grant_frames.count)
+ struct gnttab_private *gt = gt_priv(xh);
+
+ if (!gt->auto_xlat_grant_frames.count)
return;
- kfree(xen_auto_xlat_grant_frames.pfn);
- xen_unmap(xen_auto_xlat_grant_frames.vaddr);
+ kfree(gt->auto_xlat_grant_frames.pfn);
+ xen_unmap(gt->auto_xlat_grant_frames.vaddr);

- xen_auto_xlat_grant_frames.pfn = NULL;
- xen_auto_xlat_grant_frames.count = 0;
- xen_auto_xlat_grant_frames.vaddr = NULL;
+ gt->auto_xlat_grant_frames.pfn = NULL;
+ gt->auto_xlat_grant_frames.count = 0;
+ gt->auto_xlat_grant_frames.vaddr = NULL;
}
EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);

@@ -800,17 +846,17 @@ EXPORT_SYMBOL_GPL(gnttab_pages_set_private);
* @nr_pages: number of pages to alloc
* @pages: returns the pages
*/
-int gnttab_alloc_pages(int nr_pages, struct page **pages)
+int gnttab_alloc_pages(xenhost_t *xh, int nr_pages, struct page **pages)
{
int ret;

- ret = alloc_xenballooned_pages(xh_default, nr_pages, pages);
+ ret = alloc_xenballooned_pages(xh, nr_pages, pages);
if (ret < 0)
return ret;

ret = gnttab_pages_set_private(nr_pages, pages);
if (ret < 0)
- gnttab_free_pages(nr_pages, pages);
+ gnttab_free_pages(xh, nr_pages, pages);

return ret;
}
@@ -836,10 +882,10 @@ EXPORT_SYMBOL_GPL(gnttab_pages_clear_private);
* @nr_pages; number of pages to free
* @pages: the pages
*/
-void gnttab_free_pages(int nr_pages, struct page **pages)
+void gnttab_free_pages(xenhost_t *xh, int nr_pages, struct page **pages)
{
gnttab_pages_clear_private(nr_pages, pages);
- free_xenballooned_pages(xh_default, nr_pages, pages);
+ free_xenballooned_pages(xh, nr_pages, pages);
}
EXPORT_SYMBOL_GPL(gnttab_free_pages);

@@ -848,12 +894,15 @@ EXPORT_SYMBOL_GPL(gnttab_free_pages);
* gnttab_dma_alloc_pages - alloc DMAable pages suitable for grant mapping into
* @args: arguments to the function
*/
-int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args)
+int gnttab_dma_alloc_pages(xenhost_t *xh, struct gnttab_dma_alloc_args *args)
{
unsigned long pfn, start_pfn;
size_t size;
int i, ret;

+ if (xh->type != xenhost_r1)
+ return -EINVAL;
+
size = args->nr_pages << PAGE_SHIFT;
if (args->coherent)
args->vaddr = dma_alloc_coherent(args->dev, size,
@@ -903,11 +952,14 @@ EXPORT_SYMBOL_GPL(gnttab_dma_alloc_pages);
* gnttab_dma_free_pages - free DMAable pages
* @args: arguments to the function
*/
-int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
+int gnttab_dma_free_pages(xenhost_t *xh, struct gnttab_dma_alloc_args *args)
{
size_t size;
int i, ret;

+ if (xh->type != xenhost_r1)
+ return -EINVAL;
+
gnttab_pages_clear_private(args->nr_pages, args->pages);

for (i = 0; i < args->nr_pages; i++)
@@ -939,13 +991,13 @@ EXPORT_SYMBOL_GPL(gnttab_dma_free_pages);
/* Handling of paged out grant targets (GNTST_eagain) */
#define MAX_DELAY 256
static inline void
-gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t *status,
+gnttab_retry_eagain_gop(xenhost_t *xh, unsigned int cmd, void *gop, int16_t *status,
const char *func)
{
unsigned delay = 1;

do {
- BUG_ON(HYPERVISOR_grant_table_op(cmd, gop, 1));
+ BUG_ON(hypervisor_grant_table_op(xh, cmd, gop, 1));
if (*status == GNTST_eagain)
msleep(delay++);
} while ((*status == GNTST_eagain) && (delay < MAX_DELAY));
@@ -956,28 +1008,28 @@ gnttab_retry_eagain_gop(unsigned int cmd, void *gop, int16_t *status,
}
}

-void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count)
+void gnttab_batch_map(xenhost_t *xh, struct gnttab_map_grant_ref *batch, unsigned count)
{
struct gnttab_map_grant_ref *op;

- if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, batch, count))
+ if (hypervisor_grant_table_op(xh, GNTTABOP_map_grant_ref, batch, count))
BUG();
for (op = batch; op < batch + count; op++)
if (op->status == GNTST_eagain)
- gnttab_retry_eagain_gop(GNTTABOP_map_grant_ref, op,
+ gnttab_retry_eagain_gop(xh, GNTTABOP_map_grant_ref, op,
&op->status, __func__);
}
EXPORT_SYMBOL_GPL(gnttab_batch_map);

-void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
+void gnttab_batch_copy(xenhost_t *xh, struct gnttab_copy *batch, unsigned count)
{
struct gnttab_copy *op;

- if (HYPERVISOR_grant_table_op(GNTTABOP_copy, batch, count))
+ if (hypervisor_grant_table_op(xh, GNTTABOP_copy, batch, count))
BUG();
for (op = batch; op < batch + count; op++)
if (op->status == GNTST_eagain)
- gnttab_retry_eagain_gop(GNTTABOP_copy, op,
+ gnttab_retry_eagain_gop(xh, GNTTABOP_copy, op,
&op->status, __func__);
}
EXPORT_SYMBOL_GPL(gnttab_batch_copy);
@@ -1030,13 +1082,13 @@ void gnttab_foreach_grant(struct page **pages,
}
}

-int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
+int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count)
{
int i, ret;

- ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, map_ops, count);
+ ret = hypervisor_grant_table_op(xh, GNTTABOP_map_grant_ref, map_ops, count);
if (ret)
return ret;

@@ -1059,7 +1111,7 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,

case GNTST_eagain:
/* Retry eagain maps */
- gnttab_retry_eagain_gop(GNTTABOP_map_grant_ref,
+ gnttab_retry_eagain_gop(xh, GNTTABOP_map_grant_ref,
map_ops + i,
&map_ops[i].status, __func__);
/* Test status in next loop iteration. */
@@ -1075,14 +1127,14 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
}
EXPORT_SYMBOL_GPL(gnttab_map_refs);

-int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
+int gnttab_unmap_refs(xenhost_t *xh, struct gnttab_unmap_grant_ref *unmap_ops,
struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count)
{
unsigned int i;
int ret;

- ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap_ops, count);
+ ret = hypervisor_grant_table_op(xh, GNTTABOP_unmap_grant_ref, unmap_ops, count);
if (ret)
return ret;

@@ -1122,7 +1174,7 @@ static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
}
}

- ret = gnttab_unmap_refs(item->unmap_ops, item->kunmap_ops,
+ ret = gnttab_unmap_refs(item->xh, item->unmap_ops, item->kunmap_ops,
item->pages, item->count);
item->done(ret, item);
}
@@ -1159,37 +1211,43 @@ int gnttab_unmap_refs_sync(struct gntab_unmap_queue_data *item)
}
EXPORT_SYMBOL_GPL(gnttab_unmap_refs_sync);

-static unsigned int nr_status_frames(unsigned int nr_grant_frames)
+static unsigned int nr_status_frames(xenhost_t *xh, unsigned int nr_grant_frames)
{
- BUG_ON(gnttab_interface == NULL);
- return gnttab_frames(nr_grant_frames, SPP);
+ struct gnttab_private *gt = gt_priv(xh);
+
+ BUG_ON(gt->gnttab_interface == NULL);
+ return gnttab_frames(xh, nr_grant_frames, SPP);
}

-static int gnttab_map_frames_v1(xen_pfn_t *frames, unsigned int nr_gframes)
+static int gnttab_map_frames_v1(xenhost_t *xh, xen_pfn_t *frames, unsigned int nr_gframes)
{
int rc;
+ struct gnttab_private *gt = gt_priv(xh);

- rc = arch_gnttab_map_shared(frames, nr_gframes,
- gnttab_max_grant_frames(),
- &gnttab_shared.addr);
+ rc = arch_gnttab_map_shared(xh, frames, nr_gframes,
+ gnttab_max_grant_frames(xh),
+ &gt->gnttab_shared.addr);
BUG_ON(rc);

return 0;
}

-static void gnttab_unmap_frames_v1(void)
+static void gnttab_unmap_frames_v1(xenhost_t *xh)
{
- arch_gnttab_unmap(gnttab_shared.addr, nr_grant_frames);
+ struct gnttab_private *gt = gt_priv(xh);
+
+ arch_gnttab_unmap(xh, gt->gnttab_shared.addr, gt->nr_grant_frames);
}

-static int gnttab_map_frames_v2(xen_pfn_t *frames, unsigned int nr_gframes)
+static int gnttab_map_frames_v2(xenhost_t *xh, xen_pfn_t *frames, unsigned int nr_gframes)
{
uint64_t *sframes;
unsigned int nr_sframes;
struct gnttab_get_status_frames getframes;
int rc;
+ struct gnttab_private *gt = gt_priv(xh);

- nr_sframes = nr_status_frames(nr_gframes);
+ nr_sframes = nr_status_frames(xh, nr_gframes);

/* No need for kzalloc as it is initialized in following hypercall
* GNTTABOP_get_status_frames.
@@ -1202,7 +1260,7 @@ static int gnttab_map_frames_v2(xen_pfn_t *frames, unsigned int nr_gframes)
getframes.nr_frames = nr_sframes;
set_xen_guest_handle(getframes.frame_list, sframes);

- rc = HYPERVISOR_grant_table_op(GNTTABOP_get_status_frames,
+ rc = hypervisor_grant_table_op(xh, GNTTABOP_get_status_frames,
&getframes, 1);
if (rc == -ENOSYS) {
kfree(sframes);
@@ -1211,38 +1269,41 @@ static int gnttab_map_frames_v2(xen_pfn_t *frames, unsigned int nr_gframes)

BUG_ON(rc || getframes.status);

- rc = arch_gnttab_map_status(sframes, nr_sframes,
- nr_status_frames(gnttab_max_grant_frames()),
- &grstatus);
+ rc = arch_gnttab_map_status(xh, sframes, nr_sframes,
+ nr_status_frames(xh, gnttab_max_grant_frames(xh)),
+ &gt->grstatus);
BUG_ON(rc);
kfree(sframes);

- rc = arch_gnttab_map_shared(frames, nr_gframes,
- gnttab_max_grant_frames(),
- &gnttab_shared.addr);
+ rc = arch_gnttab_map_shared(xh, frames, nr_gframes,
+ gnttab_max_grant_frames(xh),
+ &gt->gnttab_shared.addr);
BUG_ON(rc);

return 0;
}

-static void gnttab_unmap_frames_v2(void)
+static void gnttab_unmap_frames_v2(xenhost_t *xh)
{
- arch_gnttab_unmap(gnttab_shared.addr, nr_grant_frames);
- arch_gnttab_unmap(grstatus, nr_status_frames(nr_grant_frames));
+ struct gnttab_private *gt = gt_priv(xh);
+
+ arch_gnttab_unmap(xh, gt->gnttab_shared.addr, gt->nr_grant_frames);
+ arch_gnttab_unmap(xh, gt->grstatus, nr_status_frames(xh, gt->nr_grant_frames));
}

-static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
+static int gnttab_map(xenhost_t *xh, unsigned int start_idx, unsigned int end_idx)
{
struct gnttab_setup_table setup;
xen_pfn_t *frames;
unsigned int nr_gframes = end_idx + 1;
+ struct gnttab_private *gt = gt_priv(xh);
int rc;

- if (xen_feature(XENFEAT_auto_translated_physmap)) {
+ if (__xen_feature(xh, XENFEAT_auto_translated_physmap)) {
struct xen_add_to_physmap xatp;
unsigned int i = end_idx;
rc = 0;
- BUG_ON(xen_auto_xlat_grant_frames.count < nr_gframes);
+ BUG_ON(gt->auto_xlat_grant_frames.count < nr_gframes);
/*
* Loop backwards, so that the first hypercall has the largest
* index, ensuring that the table will grow only once.
@@ -1251,8 +1312,8 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
xatp.domid = DOMID_SELF;
xatp.idx = i;
xatp.space = XENMAPSPACE_grant_table;
- xatp.gpfn = xen_auto_xlat_grant_frames.pfn[i];
- rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
+ xatp.gpfn = gt->auto_xlat_grant_frames.pfn[i];
+ rc = hypervisor_memory_op(xh, XENMEM_add_to_physmap, &xatp);
if (rc != 0) {
pr_warn("grant table add_to_physmap failed, err=%d\n",
rc);
@@ -1274,7 +1335,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
setup.nr_frames = nr_gframes;
set_xen_guest_handle(setup.frame_list, frames);

- rc = HYPERVISOR_grant_table_op(GNTTABOP_setup_table, &setup, 1);
+ rc = hypervisor_grant_table_op(xh, GNTTABOP_setup_table, &setup, 1);
if (rc == -ENOSYS) {
kfree(frames);
return -ENOSYS;
@@ -1282,7 +1343,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)

BUG_ON(rc || setup.status);

- rc = gnttab_interface->map_frames(frames, nr_gframes);
+ rc = gt->gnttab_interface->map_frames(xh, frames, nr_gframes);

kfree(frames);

@@ -1313,13 +1374,13 @@ static const struct gnttab_ops gnttab_v2_ops = {
.query_foreign_access = gnttab_query_foreign_access_v2,
};

-static bool gnttab_need_v2(void)
+static bool gnttab_need_v2(xenhost_t *xh)
{
#ifdef CONFIG_X86
uint32_t base, width;

if (xen_pv_domain()) {
- base = xenhost_cpuid_base(xh_default);
+ base = xenhost_cpuid_base(xh);
if (cpuid_eax(base) < 5)
return false; /* Information not available, use V1. */
width = cpuid_ebx(base + 5) &
@@ -1330,12 +1391,13 @@ static bool gnttab_need_v2(void)
return !!(max_possible_pfn >> 32);
}

-static void gnttab_request_version(void)
+static void gnttab_request_version(xenhost_t *xh)
{
long rc;
struct gnttab_set_version gsv;
+ struct gnttab_private *gt = gt_priv(xh);

- if (gnttab_need_v2())
+ if (gnttab_need_v2(xh))
gsv.version = 2;
else
gsv.version = 1;
@@ -1344,139 +1406,162 @@ static void gnttab_request_version(void)
if (xen_gnttab_version >= 1 && xen_gnttab_version <= 2)
gsv.version = xen_gnttab_version;

- rc = HYPERVISOR_grant_table_op(GNTTABOP_set_version, &gsv, 1);
+ rc = hypervisor_grant_table_op(xh, GNTTABOP_set_version, &gsv, 1);
if (rc == 0 && gsv.version == 2)
- gnttab_interface = &gnttab_v2_ops;
+ gt->gnttab_interface = &gnttab_v2_ops;
else
- gnttab_interface = &gnttab_v1_ops;
+ gt->gnttab_interface = &gnttab_v1_ops;
+
pr_info("Grant tables using version %d layout\n",
- gnttab_interface->version);
+ gt->gnttab_interface->version);
}

-static int gnttab_setup(void)
+static int gnttab_setup(xenhost_t *xh)
{
unsigned int max_nr_gframes;
+ struct gnttab_private *gt = gt_priv(xh);

- max_nr_gframes = gnttab_max_grant_frames();
- if (max_nr_gframes < nr_grant_frames)
+ max_nr_gframes = gnttab_max_grant_frames(xh);
+ if (max_nr_gframes < gt->nr_grant_frames)
return -ENOSYS;

- if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL) {
- gnttab_shared.addr = xen_auto_xlat_grant_frames.vaddr;
- if (gnttab_shared.addr == NULL) {
+ if (__xen_feature(xh, XENFEAT_auto_translated_physmap) && gt->gnttab_shared.addr == NULL) {
+ gt->gnttab_shared.addr = gt->auto_xlat_grant_frames.vaddr;
+ if (gt->gnttab_shared.addr == NULL) {
pr_warn("gnttab share frames (addr=0x%08lx) is not mapped!\n",
- (unsigned long)xen_auto_xlat_grant_frames.vaddr);
+ (unsigned long)gt->auto_xlat_grant_frames.vaddr);
return -ENOMEM;
}
}
- return gnttab_map(0, nr_grant_frames - 1);
+ return gnttab_map(xh, 0, gt->nr_grant_frames - 1);
}

int gnttab_resume(void)
{
- gnttab_request_version();
- return gnttab_setup();
+ xenhost_t **xh;
+ for_each_xenhost(xh) {
+ int err;
+
+ gnttab_request_version(*xh);
+ err = gnttab_setup(*xh);
+ if (err)
+ return err;
+ }
+ return 0;
}

int gnttab_suspend(void)
{
- if (!xen_feature(XENFEAT_auto_translated_physmap))
- gnttab_interface->unmap_frames();
+ xenhost_t **xh;
+ struct gnttab_private *gt;
+
+ for_each_xenhost(xh) {
+ gt = gt_priv(*xh);
+
+ if (!__xen_feature((*xh), XENFEAT_auto_translated_physmap))
+ gt->gnttab_interface->unmap_frames(*xh);
+ }
return 0;
}

-static int gnttab_expand(unsigned int req_entries)
+static int gnttab_expand(xenhost_t *xh, unsigned int req_entries)
{
int rc;
unsigned int cur, extra;
+ struct gnttab_private *gt = gt_priv(xh);

- BUG_ON(gnttab_interface == NULL);
- cur = nr_grant_frames;
- extra = ((req_entries + gnttab_interface->grefs_per_grant_frame - 1) /
- gnttab_interface->grefs_per_grant_frame);
- if (cur + extra > gnttab_max_grant_frames()) {
+ BUG_ON(gt->gnttab_interface == NULL);
+ cur = gt->nr_grant_frames;
+ extra = ((req_entries + gt->gnttab_interface->grefs_per_grant_frame - 1) /
+ gt->gnttab_interface->grefs_per_grant_frame);
+ if (cur + extra > gnttab_max_grant_frames(xh)) {
pr_warn_ratelimited("xen/grant-table: max_grant_frames reached"
" cur=%u extra=%u limit=%u"
" gnttab_free_count=%u req_entries=%u\n",
- cur, extra, gnttab_max_grant_frames(),
- gnttab_free_count, req_entries);
+ cur, extra, gnttab_max_grant_frames(xh),
+ gt->gnttab_free_count, req_entries);
return -ENOSPC;
}

- rc = gnttab_map(cur, cur + extra - 1);
+ rc = gnttab_map(xh, cur, cur + extra - 1);
if (rc == 0)
- rc = grow_gnttab_list(extra);
+ rc = grow_gnttab_list(xh, extra);

return rc;
}

-int gnttab_init(void)
+int gnttab_init(xenhost_t *xh)
{
int i;
unsigned long max_nr_grant_frames;
unsigned int max_nr_glist_frames, nr_glist_frames;
unsigned int nr_init_grefs;
int ret;
+ struct gnttab_private *gt = gt_priv(xh);

- gnttab_request_version();
- max_nr_grant_frames = gnttab_max_grant_frames();
- nr_grant_frames = 1;
+ gnttab_request_version(xh);
+ max_nr_grant_frames = gnttab_max_grant_frames(xh);
+ gt->nr_grant_frames = 1;

/* Determine the maximum number of frames required for the
* grant reference free list on the current hypervisor.
*/
- BUG_ON(gnttab_interface == NULL);
+ BUG_ON(gt->gnttab_interface == NULL);
max_nr_glist_frames = (max_nr_grant_frames *
- gnttab_interface->grefs_per_grant_frame / RPP);
+ gt->gnttab_interface->grefs_per_grant_frame / RPP);

- gnttab_list = kmalloc_array(max_nr_glist_frames,
+ gt->gnttab_list = kmalloc_array(max_nr_glist_frames,
sizeof(grant_ref_t *),
GFP_KERNEL);
- if (gnttab_list == NULL)
+ if (gt->gnttab_list == NULL)
return -ENOMEM;

- nr_glist_frames = gnttab_frames(nr_grant_frames, RPP);
+ nr_glist_frames = gnttab_frames(xh, gt->nr_grant_frames, RPP);
for (i = 0; i < nr_glist_frames; i++) {
- gnttab_list[i] = (grant_ref_t *)__get_free_page(GFP_KERNEL);
- if (gnttab_list[i] == NULL) {
+ gt->gnttab_list[i] = (grant_ref_t *)__get_free_page(GFP_KERNEL);
+ if (gt->gnttab_list[i] == NULL) {
ret = -ENOMEM;
goto ini_nomem;
}
}

- ret = arch_gnttab_init(max_nr_grant_frames,
- nr_status_frames(max_nr_grant_frames));
+ ret = arch_gnttab_init(xh, max_nr_grant_frames,
+ nr_status_frames(xh, max_nr_grant_frames));
if (ret < 0)
goto ini_nomem;

- if (gnttab_setup() < 0) {
+ if (gnttab_setup(xh) < 0) {
ret = -ENODEV;
goto ini_nomem;
}

- nr_init_grefs = nr_grant_frames *
- gnttab_interface->grefs_per_grant_frame;
+ nr_init_grefs = gt->nr_grant_frames *
+ gt->gnttab_interface->grefs_per_grant_frame;

for (i = NR_RESERVED_ENTRIES; i < nr_init_grefs - 1; i++)
- gnttab_entry(i) = i + 1;
+ gnttab_entry(xh, i) = i + 1;

- gnttab_entry(nr_init_grefs - 1) = GNTTAB_LIST_END;
- gnttab_free_count = nr_init_grefs - NR_RESERVED_ENTRIES;
- gnttab_free_head = NR_RESERVED_ENTRIES;
+ gnttab_entry(xh, nr_init_grefs - 1) = GNTTAB_LIST_END;
+ gt->gnttab_free_count = nr_init_grefs - NR_RESERVED_ENTRIES;
+ gt->gnttab_free_head = NR_RESERVED_ENTRIES;

printk("Grant table initialized\n");
return 0;

ini_nomem:
for (i--; i >= 0; i--)
- free_page((unsigned long)gnttab_list[i]);
- kfree(gnttab_list);
+ free_page((unsigned long)gt->gnttab_list[i]);
+ kfree(gt->gnttab_list);
return ret;
}
EXPORT_SYMBOL_GPL(gnttab_init);

static int __gnttab_init(void)
{
+ xenhost_t **xh;
+ int err;
+
if (!xen_domain())
return -ENODEV;

@@ -1484,8 +1569,14 @@ static int __gnttab_init(void)
if (xen_hvm_domain() && !xen_pvh_domain())
return 0;

- return gnttab_init();
+ for_each_xenhost(xh) {
+ err = gnttab_init(*xh);
+ if (err)
+ return err;
+ }
+
+ return 0;
}
/* Starts after core_initcall so that xen_pvh_gnttab_setup can be called
- * beforehand to initialize xen_auto_xlat_grant_frames. */
+ * beforehand to initialize auto_xlat_grant_frames. */
core_initcall_sync(__gnttab_init);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 9bc5bc07d4d3..827b790199fb 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -74,15 +74,16 @@ struct gntab_unmap_queue_data
struct gnttab_unmap_grant_ref *unmap_ops;
struct gnttab_unmap_grant_ref *kunmap_ops;
struct page **pages;
+ xenhost_t *xh;
unsigned int count;
unsigned int age;
};

-int gnttab_init(void);
+int gnttab_init(xenhost_t *xh);
int gnttab_suspend(void);
int gnttab_resume(void);

-int gnttab_grant_foreign_access(domid_t domid, unsigned long frame,
+int gnttab_grant_foreign_access(xenhost_t *xh, domid_t domid, unsigned long frame,
int readonly);

/*
@@ -90,7 +91,7 @@ int gnttab_grant_foreign_access(domid_t domid, unsigned long frame,
* longer in use. Return 1 if the grant entry was freed, 0 if it is still in
* use.
*/
-int gnttab_end_foreign_access_ref(grant_ref_t ref, int readonly);
+int gnttab_end_foreign_access_ref(xenhost_t *xh, grant_ref_t ref, int readonly);

/*
* Eventually end access through the given grant reference, and once that
@@ -98,49 +99,49 @@ int gnttab_end_foreign_access_ref(grant_ref_t ref, int readonly);
* immediately iff the grant entry is not in use, otherwise it will happen
* some time later. page may be 0, in which case no freeing will occur.
*/
-void gnttab_end_foreign_access(grant_ref_t ref, int readonly,
+void gnttab_end_foreign_access(xenhost_t *xh, grant_ref_t ref, int readonly,
unsigned long page);

-int gnttab_grant_foreign_transfer(domid_t domid, unsigned long pfn);
+int gnttab_grant_foreign_transfer(xenhost_t *xh, domid_t domid, unsigned long pfn);

-unsigned long gnttab_end_foreign_transfer_ref(grant_ref_t ref);
-unsigned long gnttab_end_foreign_transfer(grant_ref_t ref);
+unsigned long gnttab_end_foreign_transfer_ref(xenhost_t *xh, grant_ref_t ref);
+unsigned long gnttab_end_foreign_transfer(xenhost_t *xh, grant_ref_t ref);

-int gnttab_query_foreign_access(grant_ref_t ref);
+int gnttab_query_foreign_access(xenhost_t *xh, grant_ref_t ref);

/*
* operations on reserved batches of grant references
*/
-int gnttab_alloc_grant_references(u16 count, grant_ref_t *pprivate_head);
+int gnttab_alloc_grant_references(xenhost_t *xh, u16 count, grant_ref_t *pprivate_head);

-void gnttab_free_grant_reference(grant_ref_t ref);
+void gnttab_free_grant_reference(xenhost_t *xh, grant_ref_t ref);

-void gnttab_free_grant_references(grant_ref_t head);
+void gnttab_free_grant_references(xenhost_t *xh, grant_ref_t head);

-int gnttab_empty_grant_references(const grant_ref_t *pprivate_head);
+int gnttab_empty_grant_references(xenhost_t *xh, const grant_ref_t *pprivate_head);

-int gnttab_claim_grant_reference(grant_ref_t *pprivate_head);
+int gnttab_claim_grant_reference(xenhost_t *xh, grant_ref_t *pprivate_head);

-void gnttab_release_grant_reference(grant_ref_t *private_head,
+void gnttab_release_grant_reference(xenhost_t *xh, grant_ref_t *private_head,
grant_ref_t release);

-void gnttab_request_free_callback(struct gnttab_free_callback *callback,
+void gnttab_request_free_callback(xenhost_t *xh, struct gnttab_free_callback *callback,
void (*fn)(void *), void *arg, u16 count);
-void gnttab_cancel_free_callback(struct gnttab_free_callback *callback);
+void gnttab_cancel_free_callback(xenhost_t *xh, struct gnttab_free_callback *callback);

-void gnttab_grant_foreign_access_ref(grant_ref_t ref, domid_t domid,
+void gnttab_grant_foreign_access_ref(xenhost_t *xh, grant_ref_t ref, domid_t domid,
unsigned long frame, int readonly);

/* Give access to the first 4K of the page */
static inline void gnttab_page_grant_foreign_access_ref_one(
- grant_ref_t ref, domid_t domid,
+ xenhost_t *xh, grant_ref_t ref, domid_t domid,
struct page *page, int readonly)
{
- gnttab_grant_foreign_access_ref(ref, domid, xen_page_to_gfn(page),
+ gnttab_grant_foreign_access_ref(xh, ref, domid, xen_page_to_gfn(page),
readonly);
}

-void gnttab_grant_foreign_transfer_ref(grant_ref_t, domid_t domid,
+void gnttab_grant_foreign_transfer_ref(xenhost_t *xh, grant_ref_t, domid_t domid,
unsigned long pfn);

static inline void
@@ -174,29 +175,28 @@ gnttab_set_unmap_op(struct gnttab_unmap_grant_ref *unmap, phys_addr_t addr,
unmap->dev_bus_addr = 0;
}

-int arch_gnttab_init(unsigned long nr_shared, unsigned long nr_status);
-int arch_gnttab_map_shared(xen_pfn_t *frames, unsigned long nr_gframes,
+int arch_gnttab_init(xenhost_t *xh, unsigned long nr_shared, unsigned long nr_status);
+int arch_gnttab_map_shared(xenhost_t *xh, xen_pfn_t *frames, unsigned long nr_gframes,
unsigned long max_nr_gframes,
void **__shared);
-int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
+int arch_gnttab_map_status(xenhost_t *xh, uint64_t *frames, unsigned long nr_gframes,
unsigned long max_nr_gframes,
grant_status_t **__shared);
-void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
+void arch_gnttab_unmap(xenhost_t *xh, void *shared, unsigned long nr_gframes);

struct grant_frames {
xen_pfn_t *pfn;
unsigned int count;
void *vaddr;
};
-extern struct grant_frames xen_auto_xlat_grant_frames;
-unsigned int gnttab_max_grant_frames(void);
-int gnttab_setup_auto_xlat_frames(phys_addr_t addr);
-void gnttab_free_auto_xlat_frames(void);
+unsigned int gnttab_max_grant_frames(xenhost_t *xh);
+int gnttab_setup_auto_xlat_frames(xenhost_t *xh, phys_addr_t addr);
+void gnttab_free_auto_xlat_frames(xenhost_t *xh);

#define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))

-int gnttab_alloc_pages(int nr_pages, struct page **pages);
-void gnttab_free_pages(int nr_pages, struct page **pages);
+int gnttab_alloc_pages(xenhost_t *xh, int nr_pages, struct page **pages);
+void gnttab_free_pages(xenhost_t *xh, int nr_pages, struct page **pages);

#ifdef CONFIG_XEN_GRANT_DMA_ALLOC
struct gnttab_dma_alloc_args {
@@ -212,17 +212,17 @@ struct gnttab_dma_alloc_args {
dma_addr_t dev_bus_addr;
};

-int gnttab_dma_alloc_pages(struct gnttab_dma_alloc_args *args);
-int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args);
+int gnttab_dma_alloc_pages(xenhost_t *xh, struct gnttab_dma_alloc_args *args);
+int gnttab_dma_free_pages(xenhost_t *xh, struct gnttab_dma_alloc_args *args);
#endif

int gnttab_pages_set_private(int nr_pages, struct page **pages);
void gnttab_pages_clear_private(int nr_pages, struct page **pages);

-int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
+int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count);
-int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
+int gnttab_unmap_refs(xenhost_t *xh, struct gnttab_unmap_grant_ref *unmap_ops,
struct gnttab_unmap_grant_ref *kunmap_ops,
struct page **pages, unsigned int count);
void gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item);
@@ -238,8 +238,8 @@ int gnttab_unmap_refs_sync(struct gntab_unmap_queue_data *item);
* Return value in each and every status field of the batch guaranteed
* to not be GNTST_eagain.
*/
-void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count);
-void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count);
+void gnttab_batch_map(xenhost_t *xh, struct gnttab_map_grant_ref *batch, unsigned count);
+void gnttab_batch_copy(xenhost_t *xh, struct gnttab_copy *batch, unsigned count);


struct xen_page_foreign {
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
index 9e08627a9e3e..acee0c7872b6 100644
--- a/include/xen/xenhost.h
+++ b/include/xen/xenhost.h
@@ -129,6 +129,17 @@ typedef struct {
const struct evtchn_ops *evtchn_ops;
int **evtchn_to_irq;
};
+
+ /* grant table private state */
+ struct {
+ /* private to drivers/xen/grant-table.c */
+ void *gnttab_private;
+
+ /* x86/xen/grant-table.c */
+ void *gnttab_shared_vm_area;
+ void *gnttab_status_vm_area;
+ void *auto_xlat_grant_frames;
+ };
} xenhost_t;

typedef struct xenhost_ops {
--
2.20.1

2019-05-09 17:29:59

by Ankur Arora

Subject: [RFC PATCH 15/16] xen/net: gnttab, evtchn, xenbus API changes

For the most part, these changes amount to passing a xenhost_t * as an
additional parameter to the gnttab, evtchn, and xenbus calls.

Co-developed-by: Joao Martins <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
---
drivers/net/xen-netback/hash.c | 7 +-
drivers/net/xen-netback/interface.c | 7 +-
drivers/net/xen-netback/netback.c | 11 +--
drivers/net/xen-netback/rx.c | 3 +-
drivers/net/xen-netback/xenbus.c | 81 +++++++++++-----------
drivers/net/xen-netfront.c | 102 +++++++++++++++-------------
6 files changed, 117 insertions(+), 94 deletions(-)

diff --git a/drivers/net/xen-netback/hash.c b/drivers/net/xen-netback/hash.c
index 0ccb021f1e78..93a449571ef3 100644
--- a/drivers/net/xen-netback/hash.c
+++ b/drivers/net/xen-netback/hash.c
@@ -289,6 +289,8 @@ u32 xenvif_set_hash_flags(struct xenvif *vif, u32 flags)
u32 xenvif_set_hash_key(struct xenvif *vif, u32 gref, u32 len)
{
u8 *key = vif->hash.key;
+ struct xenbus_device *dev = xenvif_to_xenbus_device(vif);
+
struct gnttab_copy copy_op = {
.source.u.ref = gref,
.source.domid = vif->domid,
@@ -303,7 +305,7 @@ u32 xenvif_set_hash_key(struct xenvif *vif, u32 gref, u32 len)
return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;

if (copy_op.len != 0) {
- gnttab_batch_copy(&copy_op, 1);
+ gnttab_batch_copy(dev->xh, &copy_op, 1);

if (copy_op.status != GNTST_okay)
return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
@@ -334,6 +336,7 @@ u32 xenvif_set_hash_mapping(struct xenvif *vif, u32 gref, u32 len,
u32 off)
{
u32 *mapping = vif->hash.mapping[!vif->hash.mapping_sel];
+ struct xenbus_device *dev = xenvif_to_xenbus_device(vif);
unsigned int nr = 1;
struct gnttab_copy copy_op[2] = {{
.source.u.ref = gref,
@@ -363,7 +366,7 @@ u32 xenvif_set_hash_mapping(struct xenvif *vif, u32 gref, u32 len,
vif->hash.size * sizeof(*mapping));

if (copy_op[0].len != 0) {
- gnttab_batch_copy(copy_op, nr);
+ gnttab_batch_copy(dev->xh, copy_op, nr);

if (copy_op[0].status != GNTST_okay ||
copy_op[nr - 1].status != GNTST_okay)
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 53d4e6351f1e..329a4c701042 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -519,6 +519,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
int xenvif_init_queue(struct xenvif_queue *queue)
{
int err, i;
+ struct xenbus_device *dev = xenvif_to_xenbus_device(queue->vif);

queue->credit_bytes = queue->remaining_credit = ~0UL;
queue->credit_usec = 0UL;
@@ -542,7 +543,7 @@ int xenvif_init_queue(struct xenvif_queue *queue)
* better enable it. The long term solution would be to use just a
* bunch of valid page descriptors, without dependency on ballooning
*/
- err = gnttab_alloc_pages(MAX_PENDING_REQS,
+ err = gnttab_alloc_pages(dev->xh, MAX_PENDING_REQS,
queue->mmap_pages);
if (err) {
netdev_err(queue->vif->dev, "Could not reserve mmap_pages\n");
@@ -790,7 +791,9 @@ void xenvif_disconnect_ctrl(struct xenvif *vif)
*/
void xenvif_deinit_queue(struct xenvif_queue *queue)
{
- gnttab_free_pages(MAX_PENDING_REQS, queue->mmap_pages);
+ struct xenbus_device *dev = xenvif_to_xenbus_device(queue->vif);
+
+ gnttab_free_pages(dev->xh, MAX_PENDING_REQS, queue->mmap_pages);
}

void xenvif_free(struct xenvif *vif)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 80aae3a32c2a..055de62ecbf5 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1244,6 +1244,7 @@ static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue)
pending_ring_idx_t dc, dp;
u16 pending_idx, pending_idx_release[MAX_PENDING_REQS];
unsigned int i = 0;
+ struct xenbus_device *dev = xenvif_to_xenbus_device(queue->vif);

dc = queue->dealloc_cons;
gop = queue->tx_unmap_ops;
@@ -1280,7 +1281,7 @@ static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue)

if (gop - queue->tx_unmap_ops > 0) {
int ret;
- ret = gnttab_unmap_refs(queue->tx_unmap_ops,
+ ret = gnttab_unmap_refs(dev->xh, queue->tx_unmap_ops,
NULL,
queue->pages_to_unmap,
gop - queue->tx_unmap_ops);
@@ -1310,6 +1311,7 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)
{
unsigned nr_mops, nr_cops = 0;
int work_done, ret;
+ struct xenbus_device *dev = xenvif_to_xenbus_device(queue->vif);

if (unlikely(!tx_work_todo(queue)))
return 0;
@@ -1319,9 +1321,9 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)
if (nr_cops == 0)
return 0;

- gnttab_batch_copy(queue->tx_copy_ops, nr_cops);
+ gnttab_batch_copy(dev->xh, queue->tx_copy_ops, nr_cops);
if (nr_mops != 0) {
- ret = gnttab_map_refs(queue->tx_map_ops,
+ ret = gnttab_map_refs(dev->xh, queue->tx_map_ops,
NULL,
queue->pages_to_map,
nr_mops);
@@ -1391,6 +1393,7 @@ void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
{
int ret;
struct gnttab_unmap_grant_ref tx_unmap_op;
+ struct xenbus_device *dev = xenvif_to_xenbus_device(queue->vif);

gnttab_set_unmap_op(&tx_unmap_op,
idx_to_kaddr(queue, pending_idx),
@@ -1398,7 +1401,7 @@ void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
queue->grant_tx_handle[pending_idx]);
xenvif_grant_handle_reset(queue, pending_idx);

- ret = gnttab_unmap_refs(&tx_unmap_op, NULL,
+ ret = gnttab_unmap_refs(dev->xh, &tx_unmap_op, NULL,
&queue->mmap_pages[pending_idx], 1);
if (ret) {
netdev_err(queue->vif->dev,
diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c
index ef5887037b22..aa8fcbe315a6 100644
--- a/drivers/net/xen-netback/rx.c
+++ b/drivers/net/xen-netback/rx.c
@@ -134,8 +134,9 @@ static void xenvif_rx_copy_flush(struct xenvif_queue *queue)
{
unsigned int i;
int notify;
+ struct xenbus_device *dev = xenvif_to_xenbus_device(queue->vif);

- gnttab_batch_copy(queue->rx_copy.op, queue->rx_copy.num);
+ gnttab_batch_copy(dev->xh, queue->rx_copy.op, queue->rx_copy.num);

for (i = 0; i < queue->rx_copy.num; i++) {
struct gnttab_copy *op;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 2625740bdc4a..09316c221db9 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -257,7 +257,7 @@ static int netback_remove(struct xenbus_device *dev)
if (be->vif) {
kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
xen_unregister_watchers(be->vif);
- xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
+ xenbus_rm(dev->xh, XBT_NIL, dev->nodename, "hotplug-status");
xenvif_free(be->vif);
be->vif = NULL;
}
@@ -299,26 +299,26 @@ static int netback_probe(struct xenbus_device *dev,
sg = 1;

do {
- err = xenbus_transaction_start(&xbt);
+ err = xenbus_transaction_start(dev->xh, &xbt);
if (err) {
xenbus_dev_fatal(dev, err, "starting transaction");
goto fail;
}

- err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", sg);
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-sg", "%d", sg);
if (err) {
message = "writing feature-sg";
goto abort_transaction;
}

- err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv4",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-gso-tcpv4",
"%d", sg);
if (err) {
message = "writing feature-gso-tcpv4";
goto abort_transaction;
}

- err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv6",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-gso-tcpv6",
"%d", sg);
if (err) {
message = "writing feature-gso-tcpv6";
@@ -326,7 +326,7 @@ static int netback_probe(struct xenbus_device *dev,
}

/* We support partial checksum setup for IPv6 packets */
- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"feature-ipv6-csum-offload",
"%d", 1);
if (err) {
@@ -335,7 +335,7 @@ static int netback_probe(struct xenbus_device *dev,
}

/* We support rx-copy path. */
- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"feature-rx-copy", "%d", 1);
if (err) {
message = "writing feature-rx-copy";
@@ -346,7 +346,7 @@ static int netback_probe(struct xenbus_device *dev,
* We don't support rx-flip path (except old guests who don't
* grok this feature flag).
*/
- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"feature-rx-flip", "%d", 0);
if (err) {
message = "writing feature-rx-flip";
@@ -354,14 +354,14 @@ static int netback_probe(struct xenbus_device *dev,
}

/* We support dynamic multicast-control. */
- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"feature-multicast-control", "%d", 1);
if (err) {
message = "writing feature-multicast-control";
goto abort_transaction;
}

- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"feature-dynamic-multicast-control",
"%d", 1);
if (err) {
@@ -369,7 +369,7 @@ static int netback_probe(struct xenbus_device *dev,
goto abort_transaction;
}

- err = xenbus_transaction_end(xbt, 0);
+ err = xenbus_transaction_end(dev->xh, xbt, 0);
} while (err == -EAGAIN);

if (err) {
@@ -381,25 +381,25 @@ static int netback_probe(struct xenbus_device *dev,
* Split event channels support, this is optional so it is not
* put inside the above loop.
*/
- err = xenbus_printf(XBT_NIL, dev->nodename,
+ err = xenbus_printf(dev->xh, XBT_NIL, dev->nodename,
"feature-split-event-channels",
"%u", separate_tx_rx_irq);
if (err)
pr_debug("Error writing feature-split-event-channels\n");

/* Multi-queue support: This is an optional feature. */
- err = xenbus_printf(XBT_NIL, dev->nodename,
+ err = xenbus_printf(dev->xh, XBT_NIL, dev->nodename,
"multi-queue-max-queues", "%u", xenvif_max_queues);
if (err)
pr_debug("Error writing multi-queue-max-queues\n");

- err = xenbus_printf(XBT_NIL, dev->nodename,
+ err = xenbus_printf(dev->xh, XBT_NIL, dev->nodename,
"feature-ctrl-ring",
"%u", true);
if (err)
pr_debug("Error writing feature-ctrl-ring\n");

- script = xenbus_read(XBT_NIL, dev->nodename, "script", NULL);
+ script = xenbus_read(dev->xh, XBT_NIL, dev->nodename, "script", NULL);
if (IS_ERR(script)) {
err = PTR_ERR(script);
xenbus_dev_fatal(dev, err, "reading script");
@@ -417,7 +417,7 @@ static int netback_probe(struct xenbus_device *dev,
return 0;

abort_transaction:
- xenbus_transaction_end(xbt, 1);
+ xenbus_transaction_end(dev->xh, xbt, 1);
xenbus_dev_fatal(dev, err, "%s", message);
fail:
pr_debug("failed\n");
@@ -459,7 +459,7 @@ static int backend_create_xenvif(struct backend_info *be)
if (be->vif != NULL)
return 0;

- err = xenbus_scanf(XBT_NIL, dev->nodename, "handle", "%li", &handle);
+ err = xenbus_scanf(dev->xh, XBT_NIL, dev->nodename, "handle", "%li", &handle);
if (err != 1) {
xenbus_dev_fatal(dev, err, "reading handle");
return (err < 0) ? err : -EINVAL;
@@ -680,7 +680,7 @@ static void xen_net_read_rate(struct xenbus_device *dev,
*bytes = ~0UL;
*usec = 0;

- ratestr = xenbus_read(XBT_NIL, dev->nodename, "rate", NULL);
+ ratestr = xenbus_read(dev->xh, XBT_NIL, dev->nodename, "rate", NULL);
if (IS_ERR(ratestr))
return;

@@ -710,7 +710,7 @@ static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
char *s, *e, *macstr;
int i;

- macstr = s = xenbus_read(XBT_NIL, dev->nodename, "mac", NULL);
+ macstr = s = xenbus_read(dev->xh, XBT_NIL, dev->nodename, "mac", NULL);
if (IS_ERR(macstr))
return PTR_ERR(macstr);

@@ -765,7 +765,7 @@ static int xen_register_credit_watch(struct xenbus_device *dev,
snprintf(node, maxlen, "%s/rate", dev->nodename);
vif->credit_watch.node = node;
vif->credit_watch.callback = xen_net_rate_changed;
- err = register_xenbus_watch(&vif->credit_watch);
+ err = register_xenbus_watch(dev->xh, &vif->credit_watch);
if (err) {
pr_err("Failed to set watcher %s\n", vif->credit_watch.node);
kfree(node);
@@ -777,8 +777,9 @@ static int xen_register_credit_watch(struct xenbus_device *dev,

static void xen_unregister_credit_watch(struct xenvif *vif)
{
+ struct xenbus_device *dev = xenvif_to_xenbus_device(vif);
if (vif->credit_watch.node) {
- unregister_xenbus_watch(&vif->credit_watch);
+ unregister_xenbus_watch(dev->xh, &vif->credit_watch);
kfree(vif->credit_watch.node);
vif->credit_watch.node = NULL;
}
@@ -791,7 +792,7 @@ static void xen_mcast_ctrl_changed(struct xenbus_watch *watch,
mcast_ctrl_watch);
struct xenbus_device *dev = xenvif_to_xenbus_device(vif);

- vif->multicast_control = !!xenbus_read_unsigned(dev->otherend,
+ vif->multicast_control = !!xenbus_read_unsigned(dev->xh, dev->otherend,
"request-multicast-control", 0);
}

@@ -817,7 +818,7 @@ static int xen_register_mcast_ctrl_watch(struct xenbus_device *dev,
dev->otherend);
vif->mcast_ctrl_watch.node = node;
vif->mcast_ctrl_watch.callback = xen_mcast_ctrl_changed;
- err = register_xenbus_watch(&vif->mcast_ctrl_watch);
+ err = register_xenbus_watch(dev->xh, &vif->mcast_ctrl_watch);
if (err) {
pr_err("Failed to set watcher %s\n",
vif->mcast_ctrl_watch.node);
@@ -830,8 +831,10 @@ static int xen_register_mcast_ctrl_watch(struct xenbus_device *dev,

static void xen_unregister_mcast_ctrl_watch(struct xenvif *vif)
{
+ struct xenbus_device *dev = xenvif_to_xenbus_device(vif);
+
if (vif->mcast_ctrl_watch.node) {
- unregister_xenbus_watch(&vif->mcast_ctrl_watch);
+ unregister_xenbus_watch(dev->xh, &vif->mcast_ctrl_watch);
kfree(vif->mcast_ctrl_watch.node);
vif->mcast_ctrl_watch.node = NULL;
}
@@ -853,7 +856,7 @@ static void xen_unregister_watchers(struct xenvif *vif)
static void unregister_hotplug_status_watch(struct backend_info *be)
{
if (be->have_hotplug_status_watch) {
- unregister_xenbus_watch(&be->hotplug_status_watch);
+ unregister_xenbus_watch(be->dev->xh, &be->hotplug_status_watch);
kfree(be->hotplug_status_watch.node);
}
be->have_hotplug_status_watch = 0;
@@ -869,7 +872,7 @@ static void hotplug_status_changed(struct xenbus_watch *watch,
char *str;
unsigned int len;

- str = xenbus_read(XBT_NIL, be->dev->nodename, "hotplug-status", &len);
+ str = xenbus_read(be->dev->xh, XBT_NIL, be->dev->nodename, "hotplug-status", &len);
if (IS_ERR(str))
return;
if (len == sizeof("connected")-1 && !memcmp(str, "connected", len)) {
@@ -891,14 +894,14 @@ static int connect_ctrl_ring(struct backend_info *be)
unsigned int evtchn;
int err;

- err = xenbus_scanf(XBT_NIL, dev->otherend,
+ err = xenbus_scanf(dev->xh, XBT_NIL, dev->otherend,
"ctrl-ring-ref", "%u", &val);
if (err < 0)
goto done; /* The frontend does not have a control ring */

ring_ref = val;

- err = xenbus_scanf(XBT_NIL, dev->otherend,
+ err = xenbus_scanf(dev->xh, XBT_NIL, dev->otherend,
"event-channel-ctrl", "%u", &val);
if (err < 0) {
xenbus_dev_fatal(dev, err,
@@ -936,7 +939,7 @@ static void connect(struct backend_info *be)
/* Check whether the frontend requested multiple queues
* and read the number requested.
*/
- requested_num_queues = xenbus_read_unsigned(dev->otherend,
+ requested_num_queues = xenbus_read_unsigned(dev->xh, dev->otherend,
"multi-queue-num-queues", 1);
if (requested_num_queues > xenvif_max_queues) {
/* buggy or malicious guest */
@@ -1087,7 +1090,7 @@ static int connect_data_rings(struct backend_info *be,
queue->id);
}

- err = xenbus_gather(XBT_NIL, xspath,
+ err = xenbus_gather(dev->xh, XBT_NIL, xspath,
"tx-ring-ref", "%lu", &tx_ring_ref,
"rx-ring-ref", "%lu", &rx_ring_ref, NULL);
if (err) {
@@ -1098,11 +1101,11 @@ static int connect_data_rings(struct backend_info *be,
}

/* Try split event channels first, then single event channel. */
- err = xenbus_gather(XBT_NIL, xspath,
+ err = xenbus_gather(dev->xh, XBT_NIL, xspath,
"event-channel-tx", "%u", &tx_evtchn,
"event-channel-rx", "%u", &rx_evtchn, NULL);
if (err < 0) {
- err = xenbus_scanf(XBT_NIL, xspath,
+ err = xenbus_scanf(dev->xh, XBT_NIL, xspath,
"event-channel", "%u", &tx_evtchn);
if (err < 0) {
xenbus_dev_fatal(dev, err,
@@ -1137,7 +1140,7 @@ static int read_xenbus_vif_flags(struct backend_info *be)
unsigned int rx_copy;
int err;

- err = xenbus_scanf(XBT_NIL, dev->otherend, "request-rx-copy", "%u",
+ err = xenbus_scanf(dev->xh, XBT_NIL, dev->otherend, "request-rx-copy", "%u",
&rx_copy);
if (err == -ENOENT) {
err = 0;
@@ -1151,7 +1154,7 @@ static int read_xenbus_vif_flags(struct backend_info *be)
if (!rx_copy)
return -EOPNOTSUPP;

- if (!xenbus_read_unsigned(dev->otherend, "feature-rx-notify", 0)) {
+ if (!xenbus_read_unsigned(dev->xh, dev->otherend, "feature-rx-notify", 0)) {
/* - Reduce drain timeout to poll more frequently for
* Rx requests.
* - Disable Rx stall detection.
@@ -1160,20 +1163,20 @@ static int read_xenbus_vif_flags(struct backend_info *be)
be->vif->stall_timeout = 0;
}

- vif->can_sg = !!xenbus_read_unsigned(dev->otherend, "feature-sg", 0);
+ vif->can_sg = !!xenbus_read_unsigned(dev->xh, dev->otherend, "feature-sg", 0);

vif->gso_mask = 0;

- if (xenbus_read_unsigned(dev->otherend, "feature-gso-tcpv4", 0))
+ if (xenbus_read_unsigned(dev->xh, dev->otherend, "feature-gso-tcpv4", 0))
vif->gso_mask |= GSO_BIT(TCPV4);

- if (xenbus_read_unsigned(dev->otherend, "feature-gso-tcpv6", 0))
+ if (xenbus_read_unsigned(dev->xh, dev->otherend, "feature-gso-tcpv6", 0))
vif->gso_mask |= GSO_BIT(TCPV6);

- vif->ip_csum = !xenbus_read_unsigned(dev->otherend,
+ vif->ip_csum = !xenbus_read_unsigned(dev->xh, dev->otherend,
"feature-no-csum-offload", 0);

- vif->ipv6_csum = !!xenbus_read_unsigned(dev->otherend,
+ vif->ipv6_csum = !!xenbus_read_unsigned(dev->xh, dev->otherend,
"feature-ipv6-csum-offload", 0);

return 0;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index ee28e8b85406..71007ad822c0 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -285,6 +285,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
RING_IDX req_prod = queue->rx.req_prod_pvt;
int notify;
int err = 0;
+ struct xenbus_device *dev = queue->info->xbdev;

if (unlikely(!netif_carrier_ok(queue->info->netdev)))
return;
@@ -309,14 +310,14 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
BUG_ON(queue->rx_skbs[id]);
queue->rx_skbs[id] = skb;

- ref = gnttab_claim_grant_reference(&queue->gref_rx_head);
+ ref = gnttab_claim_grant_reference(dev->xh, &queue->gref_rx_head);
WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)(int)ref));
queue->grant_rx_ref[id] = ref;

page = skb_frag_page(&skb_shinfo(skb)->frags[0]);

req = RING_GET_REQUEST(&queue->rx, req_prod);
- gnttab_page_grant_foreign_access_ref_one(ref,
+ gnttab_page_grant_foreign_access_ref_one(dev->xh, ref,
queue->info->xbdev->otherend_id,
page,
0);
@@ -377,6 +378,7 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
unsigned short id;
struct sk_buff *skb;
bool more_to_do;
+ struct xenbus_device *dev = queue->info->xbdev;

BUG_ON(!netif_carrier_ok(queue->info->netdev));

@@ -393,15 +395,15 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)

id = txrsp->id;
skb = queue->tx_skbs[id].skb;
- if (unlikely(gnttab_query_foreign_access(
+ if (unlikely(gnttab_query_foreign_access(dev->xh,
queue->grant_tx_ref[id]) != 0)) {
pr_alert("%s: warning -- grant still in use by backend domain\n",
__func__);
BUG();
}
- gnttab_end_foreign_access_ref(
+ gnttab_end_foreign_access_ref(dev->xh,
queue->grant_tx_ref[id], GNTMAP_readonly);
- gnttab_release_grant_reference(
+ gnttab_release_grant_reference(dev->xh,
&queue->gref_tx_head, queue->grant_tx_ref[id]);
queue->grant_tx_ref[id] = GRANT_INVALID_REF;
queue->grant_tx_page[id] = NULL;
@@ -436,13 +438,14 @@ static void xennet_tx_setup_grant(unsigned long gfn, unsigned int offset,
struct page *page = info->page;
struct netfront_queue *queue = info->queue;
struct sk_buff *skb = info->skb;
+ struct xenbus_device *dev = queue->info->xbdev;

id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
- ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
+ ref = gnttab_claim_grant_reference(dev->xh, &queue->gref_tx_head);
WARN_ON_ONCE(IS_ERR_VALUE((unsigned long)(int)ref));

- gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
+ gnttab_grant_foreign_access_ref(dev->xh, ref, queue->info->xbdev->otherend_id,
gfn, GNTMAP_readonly);

queue->tx_skbs[id].skb = skb;
@@ -786,6 +789,7 @@ static int xennet_get_responses(struct netfront_queue *queue,
struct xen_netif_rx_response *rx = &rinfo->rx;
struct xen_netif_extra_info *extras = rinfo->extras;
struct device *dev = &queue->info->netdev->dev;
+ struct xenbus_device *xdev = queue->info->xbdev;
RING_IDX cons = queue->rx.rsp_cons;
struct sk_buff *skb = xennet_get_rx_skb(queue, cons);
grant_ref_t ref = xennet_get_rx_ref(queue, cons);
@@ -823,10 +827,10 @@ static int xennet_get_responses(struct netfront_queue *queue,
goto next;
}

- ret = gnttab_end_foreign_access_ref(ref, 0);
+ ret = gnttab_end_foreign_access_ref(xdev->xh, ref, 0);
BUG_ON(!ret);

- gnttab_release_grant_reference(&queue->gref_rx_head, ref);
+ gnttab_release_grant_reference(xdev->xh, &queue->gref_rx_head, ref);

__skb_queue_tail(list, skb);

@@ -1130,6 +1134,7 @@ static void xennet_release_tx_bufs(struct netfront_queue *queue)
{
struct sk_buff *skb;
int i;
+ struct xenbus_device *dev = queue->info->xbdev;

for (i = 0; i < NET_TX_RING_SIZE; i++) {
/* Skip over entries which are actually freelist references */
@@ -1138,7 +1143,7 @@ static void xennet_release_tx_bufs(struct netfront_queue *queue)

skb = queue->tx_skbs[i].skb;
get_page(queue->grant_tx_page[i]);
- gnttab_end_foreign_access(queue->grant_tx_ref[i],
+ gnttab_end_foreign_access(dev->xh, queue->grant_tx_ref[i],
GNTMAP_readonly,
(unsigned long)page_address(queue->grant_tx_page[i]));
queue->grant_tx_page[i] = NULL;
@@ -1151,6 +1156,7 @@ static void xennet_release_tx_bufs(struct netfront_queue *queue)
static void xennet_release_rx_bufs(struct netfront_queue *queue)
{
int id, ref;
+ struct xenbus_device *dev = queue->info->xbdev;

spin_lock_bh(&queue->rx_lock);

@@ -1172,7 +1178,7 @@ static void xennet_release_rx_bufs(struct netfront_queue *queue)
* foreign access is ended (which may be deferred).
*/
get_page(page);
- gnttab_end_foreign_access(ref, 0,
+ gnttab_end_foreign_access(dev->xh, ref, 0,
(unsigned long)page_address(page));
queue->grant_rx_ref[id] = GRANT_INVALID_REF;

@@ -1186,22 +1192,23 @@ static netdev_features_t xennet_fix_features(struct net_device *dev,
netdev_features_t features)
{
struct netfront_info *np = netdev_priv(dev);
+ struct xenbus_device *xdev = np->xbdev;

if (features & NETIF_F_SG &&
- !xenbus_read_unsigned(np->xbdev->otherend, "feature-sg", 0))
+ !xenbus_read_unsigned(xdev->xh, np->xbdev->otherend, "feature-sg", 0))
features &= ~NETIF_F_SG;

if (features & NETIF_F_IPV6_CSUM &&
- !xenbus_read_unsigned(np->xbdev->otherend,
+ !xenbus_read_unsigned(xdev->xh, np->xbdev->otherend,
"feature-ipv6-csum-offload", 0))
features &= ~NETIF_F_IPV6_CSUM;

if (features & NETIF_F_TSO &&
- !xenbus_read_unsigned(np->xbdev->otherend, "feature-gso-tcpv4", 0))
+ !xenbus_read_unsigned(xdev->xh, np->xbdev->otherend, "feature-gso-tcpv4", 0))
features &= ~NETIF_F_TSO;

if (features & NETIF_F_TSO6 &&
- !xenbus_read_unsigned(np->xbdev->otherend, "feature-gso-tcpv6", 0))
+ !xenbus_read_unsigned(xdev->xh, np->xbdev->otherend, "feature-gso-tcpv6", 0))
features &= ~NETIF_F_TSO6;

return features;
@@ -1375,17 +1382,18 @@ static int netfront_probe(struct xenbus_device *dev,
return 0;
}

-static void xennet_end_access(int ref, void *page)
+static void xennet_end_access(xenhost_t *xh, int ref, void *page)
{
/* This frees the page as a side-effect */
if (ref != GRANT_INVALID_REF)
- gnttab_end_foreign_access(ref, 0, (unsigned long)page);
+ gnttab_end_foreign_access(xh, ref, 0, (unsigned long)page);
}

static void xennet_disconnect_backend(struct netfront_info *info)
{
unsigned int i = 0;
unsigned int num_queues = info->netdev->real_num_tx_queues;
+ struct xenbus_device *dev = info->xbdev;

netif_carrier_off(info->netdev);

@@ -1408,12 +1416,12 @@ static void xennet_disconnect_backend(struct netfront_info *info)

xennet_release_tx_bufs(queue);
xennet_release_rx_bufs(queue);
- gnttab_free_grant_references(queue->gref_tx_head);
- gnttab_free_grant_references(queue->gref_rx_head);
+ gnttab_free_grant_references(dev->xh, queue->gref_tx_head);
+ gnttab_free_grant_references(dev->xh, queue->gref_rx_head);

/* End access and free the pages */
- xennet_end_access(queue->tx_ring_ref, queue->tx.sring);
- xennet_end_access(queue->rx_ring_ref, queue->rx.sring);
+ xennet_end_access(dev->xh, queue->tx_ring_ref, queue->tx.sring);
+ xennet_end_access(dev->xh, queue->rx_ring_ref, queue->rx.sring);

queue->tx_ring_ref = GRANT_INVALID_REF;
queue->rx_ring_ref = GRANT_INVALID_REF;
@@ -1443,7 +1451,7 @@ static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
char *s, *e, *macstr;
int i;

- macstr = s = xenbus_read(XBT_NIL, dev->nodename, "mac", NULL);
+ macstr = s = xenbus_read(dev->xh, XBT_NIL, dev->nodename, "mac", NULL);
if (IS_ERR(macstr))
return PTR_ERR(macstr);

@@ -1588,11 +1596,11 @@ static int setup_netfront(struct xenbus_device *dev,
* granted pages because backend is not accessing it at this point.
*/
alloc_evtchn_fail:
- gnttab_end_foreign_access_ref(queue->rx_ring_ref, 0);
+ gnttab_end_foreign_access_ref(dev->xh, queue->rx_ring_ref, 0);
grant_rx_ring_fail:
free_page((unsigned long)rxs);
alloc_rx_ring_fail:
- gnttab_end_foreign_access_ref(queue->tx_ring_ref, 0);
+ gnttab_end_foreign_access_ref(dev->xh, queue->tx_ring_ref, 0);
grant_tx_ring_fail:
free_page((unsigned long)txs);
fail:
@@ -1608,6 +1616,7 @@ static int xennet_init_queue(struct netfront_queue *queue)
unsigned short i;
int err = 0;
char *devid;
+ struct xenbus_device *dev = queue->info->xbdev;

spin_lock_init(&queue->tx_lock);
spin_lock_init(&queue->rx_lock);
@@ -1633,7 +1642,7 @@ static int xennet_init_queue(struct netfront_queue *queue)
}

/* A grant for every tx ring slot */
- if (gnttab_alloc_grant_references(NET_TX_RING_SIZE,
+ if (gnttab_alloc_grant_references(dev->xh, NET_TX_RING_SIZE,
&queue->gref_tx_head) < 0) {
pr_alert("can't alloc tx grant refs\n");
err = -ENOMEM;
@@ -1641,7 +1650,7 @@ static int xennet_init_queue(struct netfront_queue *queue)
}

/* A grant for every rx ring slot */
- if (gnttab_alloc_grant_references(NET_RX_RING_SIZE,
+ if (gnttab_alloc_grant_references(dev->xh, NET_RX_RING_SIZE,
&queue->gref_rx_head) < 0) {
pr_alert("can't alloc rx grant refs\n");
err = -ENOMEM;
@@ -1651,7 +1660,7 @@ static int xennet_init_queue(struct netfront_queue *queue)
return 0;

exit_free_tx:
- gnttab_free_grant_references(queue->gref_tx_head);
+ gnttab_free_grant_references(dev->xh, queue->gref_tx_head);
exit:
return err;
}
@@ -1685,14 +1694,14 @@ static int write_queue_xenstore_keys(struct netfront_queue *queue,
}

/* Write ring references */
- err = xenbus_printf(*xbt, path, "tx-ring-ref", "%u",
+ err = xenbus_printf(dev->xh, *xbt, path, "tx-ring-ref", "%u",
queue->tx_ring_ref);
if (err) {
message = "writing tx-ring-ref";
goto error;
}

- err = xenbus_printf(*xbt, path, "rx-ring-ref", "%u",
+ err = xenbus_printf(dev->xh, *xbt, path, "rx-ring-ref", "%u",
queue->rx_ring_ref);
if (err) {
message = "writing rx-ring-ref";
@@ -1704,7 +1713,7 @@ static int write_queue_xenstore_keys(struct netfront_queue *queue,
*/
if (queue->tx_evtchn == queue->rx_evtchn) {
/* Shared event channel */
- err = xenbus_printf(*xbt, path,
+ err = xenbus_printf(dev->xh, *xbt, path,
"event-channel", "%u", queue->tx_evtchn);
if (err) {
message = "writing event-channel";
@@ -1712,14 +1721,14 @@ static int write_queue_xenstore_keys(struct netfront_queue *queue,
}
} else {
/* Split event channels */
- err = xenbus_printf(*xbt, path,
+ err = xenbus_printf(dev->xh, *xbt, path,
"event-channel-tx", "%u", queue->tx_evtchn);
if (err) {
message = "writing event-channel-tx";
goto error;
}

- err = xenbus_printf(*xbt, path,
+ err = xenbus_printf(dev->xh, *xbt, path,
"event-channel-rx", "%u", queue->rx_evtchn);
if (err) {
message = "writing event-channel-rx";
@@ -1810,12 +1819,12 @@ static int talk_to_netback(struct xenbus_device *dev,
info->netdev->irq = 0;

/* Check if backend supports multiple queues */
- max_queues = xenbus_read_unsigned(info->xbdev->otherend,
+ max_queues = xenbus_read_unsigned(dev->xh, info->xbdev->otherend,
"multi-queue-max-queues", 1);
num_queues = min(max_queues, xennet_max_queues);

/* Check feature-split-event-channels */
- feature_split_evtchn = xenbus_read_unsigned(info->xbdev->otherend,
+ feature_split_evtchn = xenbus_read_unsigned(dev->xh, info->xbdev->otherend,
"feature-split-event-channels", 0);

/* Read mac addr. */
@@ -1847,16 +1856,16 @@ static int talk_to_netback(struct xenbus_device *dev,
}

again:
- err = xenbus_transaction_start(&xbt);
+ err = xenbus_transaction_start(dev->xh, &xbt);
if (err) {
xenbus_dev_fatal(dev, err, "starting transaction");
goto destroy_ring;
}

- if (xenbus_exists(XBT_NIL,
+ if (xenbus_exists(dev->xh, XBT_NIL,
info->xbdev->otherend, "multi-queue-max-queues")) {
/* Write the number of queues */
- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"multi-queue-num-queues", "%u", num_queues);
if (err) {
message = "writing multi-queue-num-queues";
@@ -1879,45 +1888,45 @@ static int talk_to_netback(struct xenbus_device *dev,
}

/* The remaining keys are not queue-specific */
- err = xenbus_printf(xbt, dev->nodename, "request-rx-copy", "%u",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "request-rx-copy", "%u",
1);
if (err) {
message = "writing request-rx-copy";
goto abort_transaction;
}

- err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1);
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-rx-notify", "%d", 1);
if (err) {
message = "writing feature-rx-notify";
goto abort_transaction;
}

- err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1);
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-sg", "%d", 1);
if (err) {
message = "writing feature-sg";
goto abort_transaction;
}

- err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv4", "%d", 1);
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-gso-tcpv4", "%d", 1);
if (err) {
message = "writing feature-gso-tcpv4";
goto abort_transaction;
}

- err = xenbus_write(xbt, dev->nodename, "feature-gso-tcpv6", "1");
+ err = xenbus_write(dev->xh, xbt, dev->nodename, "feature-gso-tcpv6", "1");
if (err) {
message = "writing feature-gso-tcpv6";
goto abort_transaction;
}

- err = xenbus_write(xbt, dev->nodename, "feature-ipv6-csum-offload",
+ err = xenbus_write(dev->xh, xbt, dev->nodename, "feature-ipv6-csum-offload",
"1");
if (err) {
message = "writing feature-ipv6-csum-offload";
goto abort_transaction;
}

- err = xenbus_transaction_end(xbt, 0);
+ err = xenbus_transaction_end(dev->xh, xbt, 0);
if (err) {
if (err == -EAGAIN)
goto again;
@@ -1930,7 +1939,7 @@ static int talk_to_netback(struct xenbus_device *dev,
abort_transaction:
xenbus_dev_fatal(dev, err, "%s", message);
abort_transaction_no_dev_fatal:
- xenbus_transaction_end(xbt, 1);
+ xenbus_transaction_end(dev->xh, xbt, 1);
destroy_ring:
xennet_disconnect_backend(info);
rtnl_lock();
@@ -1949,8 +1958,9 @@ static int xennet_connect(struct net_device *dev)
int err;
unsigned int j = 0;
struct netfront_queue *queue = NULL;
+ struct xenbus_device *xdev = np->xbdev;

- if (!xenbus_read_unsigned(np->xbdev->otherend, "feature-rx-copy", 0)) {
+ if (!xenbus_read_unsigned(xdev->xh, np->xbdev->otherend, "feature-rx-copy", 0)) {
dev_info(&dev->dev,
"backend does not support copying receive path\n");
return -ENODEV;
--
2.20.1

2019-05-09 17:30:23

by Ankur Arora

Subject: [RFC PATCH 01/16] x86/xen: add xenhost_t interface

Add xenhost_t, which will serve as an abstraction over Xen interfaces.
It co-exists with the PV/HVM/PVH abstractions (x86_init, hypervisor_x86,
pv_ops etc.) and is meant to capture the mechanisms for communication
with Xen, so that we can support different types of underlying Xen:
regular, local, and nested.

Also add xenhost_register() and stub registration in the various guest
types.

Signed-off-by: Ankur Arora <[email protected]>
---
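[ Editor's note: a rough userspace model of the registration flow this
patch introduces, for illustration only; names mirror the patch, but the
types and logic are simplified stand-ins, not the kernel code. ]

```c
#include <assert.h>
#include <stddef.h>

enum xenhost_type {
	xenhost_invalid = 0,
	xenhost_r1,	/* guest to Lx-Xen: the ordinary configuration */
	xenhost_r2,	/* L1-dom0 frontend to L0-Xen: nested configuration */
	xenhost_r0,	/* backend co-located with the hypervisor */
};

typedef struct xenhost_ops {
	int placeholder;	/* the real struct will carry interface ops */
} xenhost_ops_t;

typedef struct {
	enum xenhost_type type;
	xenhost_ops_t *ops;
} xenhost_t;

xenhost_t xenhosts[2];
xenhost_t *xh_default = &xenhosts[0];	/* xenhost_r0 or xenhost_r1 */
xenhost_t *xh_remote = &xenhosts[1];	/* xenhost_r2 */

/* r0/r1 claim the default slot, r2 the remote slot; registering a
 * slot twice is a bug (BUG_ON() in the patch, assert() here). */
void xenhost_register(enum xenhost_type type, xenhost_ops_t *ops)
{
	xenhost_t *xh = (type == xenhost_r2) ? xh_remote : xh_default;

	assert(xh->type == xenhost_invalid);
	xh->type = type;
	xh->ops = ops;
}
```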
arch/x86/xen/Makefile | 1 +
arch/x86/xen/enlighten_hvm.c | 13 +++++
arch/x86/xen/enlighten_pv.c | 16 ++++++
arch/x86/xen/enlighten_pvh.c | 12 +++++
arch/x86/xen/xenhost.c | 75 ++++++++++++++++++++++++++++
include/xen/xen.h | 3 ++
include/xen/xenhost.h | 95 ++++++++++++++++++++++++++++++++++++
7 files changed, 215 insertions(+)
create mode 100644 arch/x86/xen/xenhost.c
create mode 100644 include/xen/xenhost.h

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 084de77a109e..564b4dddbc15 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -18,6 +18,7 @@ obj-y += mmu.o
obj-y += time.o
obj-y += grant-table.o
obj-y += suspend.o
+obj-y += xenhost.o

obj-$(CONFIG_XEN_PVHVM) += enlighten_hvm.o
obj-$(CONFIG_XEN_PVHVM) += mmu_hvm.o
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 0e75642d42a3..100452f4f44c 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -5,6 +5,7 @@
#include <linux/kexec.h>
#include <linux/memblock.h>

+#include <xen/xenhost.h>
#include <xen/features.h>
#include <xen/events.h>
#include <xen/interface/memory.h>
@@ -82,6 +83,12 @@ static void __init xen_hvm_init_mem_mapping(void)
xen_vcpu_info_reset(0);
}

+xenhost_ops_t xh_hvm_ops = {
+};
+
+xenhost_ops_t xh_hvm_nested_ops = {
+};
+
static void __init init_hvm_pv_info(void)
{
int major, minor;
@@ -179,6 +186,12 @@ static void __init xen_hvm_guest_init(void)
{
if (xen_pv_domain())
return;
+ /*
+ * We need only xenhost_r1 for HVM guests, since they cannot be
+ * a driver domain (?) or dom0.
+ */
+ if (!xen_pvh_domain())
+ xenhost_register(xenhost_r1, &xh_hvm_ops);

init_hvm_pv_info();

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index c54a493e139a..bb6e811c1525 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -36,6 +36,7 @@

#include <xen/xen.h>
#include <xen/events.h>
+#include <xen/xenhost.h>
#include <xen/interface/xen.h>
#include <xen/interface/version.h>
#include <xen/interface/physdev.h>
@@ -1188,6 +1189,12 @@ static void __init xen_dom0_set_legacy_features(void)
x86_platform.legacy.rtc = 1;
}

+xenhost_ops_t xh_pv_ops = {
+};
+
+xenhost_ops_t xh_pv_nested_ops = {
+};
+
/* First C function to be called on Xen boot */
asmlinkage __visible void __init xen_start_kernel(void)
{
@@ -1198,6 +1205,15 @@ asmlinkage __visible void __init xen_start_kernel(void)
if (!xen_start_info)
return;

+ xenhost_register(xenhost_r1, &xh_pv_ops);
+
+ /*
+ * Detect in some implementation-defined manner whether this is
+ * nested or not.
+ */
+ if (xen_driver_domain() && xen_nested())
+ xenhost_register(xenhost_r2, &xh_pv_nested_ops);
+
xen_domain_type = XEN_PV_DOMAIN;
xen_start_flags = xen_start_info->flags;

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 35b7599d2d0b..826c296d27a3 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -8,6 +8,7 @@
#include <asm/e820/api.h>

#include <xen/xen.h>
+#include <xen/xenhost.h>
#include <asm/xen/interface.h>
#include <asm/xen/hypercall.h>

@@ -21,11 +22,22 @@
*/
bool xen_pvh __attribute__((section(".data"))) = 0;

+extern xenhost_ops_t xh_hvm_ops, xh_hvm_nested_ops;
+
void __init xen_pvh_init(void)
{
u32 msr;
u64 pfn;

+ xenhost_register(xenhost_r1, &xh_hvm_ops);
+
+ /*
+ * Detect in some implementation-defined manner whether this is
+ * nested or not.
+ */
+ if (xen_driver_domain() && xen_nested())
+ xenhost_register(xenhost_r2, &xh_hvm_nested_ops);
+
xen_pvh = 1;
xen_start_flags = pvh_start_info.flags;

diff --git a/arch/x86/xen/xenhost.c b/arch/x86/xen/xenhost.c
new file mode 100644
index 000000000000..ca90acd7687e
--- /dev/null
+++ b/arch/x86/xen/xenhost.c
@@ -0,0 +1,75 @@
+#include <linux/types.h>
+#include <linux/bug.h>
+#include <xen/xen.h>
+#include <xen/xenhost.h>
+
+xenhost_t xenhosts[2];
+/*
+ * xh_default: interface to the regular hypervisor. xenhost_type is xenhost_r0
+ * or xenhost_r1.
+ *
+ * xh_remote: interface to remote hypervisor. Needed for PV driver support on
+ * L1-dom0/driver-domain for nested Xen. xenhost_type is xenhost_r2.
+ */
+xenhost_t *xh_default = (xenhost_t *) &xenhosts[0];
+xenhost_t *xh_remote = (xenhost_t *) &xenhosts[1];
+
+/*
+ * Exported for use by for_each_xenhost().
+ */
+EXPORT_SYMBOL_GPL(xenhosts);
+
+/*
+ * Some places refer directly to a specific type of xenhost.
+ * This might be better as a macro though.
+ */
+EXPORT_SYMBOL_GPL(xh_default);
+EXPORT_SYMBOL_GPL(xh_remote);
+
+void xenhost_register(enum xenhost_type type, xenhost_ops_t *ops)
+{
+ switch (type) {
+ case xenhost_r0:
+ case xenhost_r1:
+ BUG_ON(xh_default->type != xenhost_invalid);
+
+ xh_default->type = type;
+ xh_default->ops = ops;
+ break;
+ case xenhost_r2:
+ BUG_ON(xh_remote->type != xenhost_invalid);
+
+ /*
+ * We should have a default xenhost by the
+ * time xh_remote is registered.
+ */
+ BUG_ON(!xh_default);
+
+ xh_remote->type = type;
+ xh_remote->ops = ops;
+ break;
+ default:
+ BUG();
+ }
+}
+
+/*
+ * __xenhost_unregister: expected to be called only if there's an
+ * error early in the init.
+ */
+void __xenhost_unregister(enum xenhost_type type)
+{
+ switch (type) {
+ case xenhost_r0:
+ case xenhost_r1:
+ xh_default->type = xenhost_invalid;
+ xh_default->ops = NULL;
+ break;
+ case xenhost_r2:
+ xh_remote->type = xenhost_invalid;
+ xh_remote->ops = NULL;
+ break;
+ default:
+ BUG();
+ }
+}
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 0e2156786ad2..540db8459536 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -42,6 +42,9 @@ extern struct hvm_start_info pvh_start_info;
#define xen_initial_domain() (0)
#endif /* CONFIG_XEN_DOM0 */

+#define xen_driver_domain() xen_initial_domain()
+#define xen_nested() 0
+
struct bio_vec;
bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
const struct bio_vec *vec2);
diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
new file mode 100644
index 000000000000..a58e883f144e
--- /dev/null
+++ b/include/xen/xenhost.h
@@ -0,0 +1,95 @@
+#ifndef __XENHOST_H
+#define __XENHOST_H
+
+/*
+ * Xenhost abstracts out the Xen interface. It co-exists with the PV/HVM/PVH
+ * abstractions (x86_init, hypervisor_x86, pv_ops etc) and is meant to
+ * expose ops for communication between the guest and Xen (hypercall, cpuid,
+ * shared_info/vcpu_info, evtchn, grant-table and on top of those, xenbus, ballooning),
+ * so these could differ based on the kind of underlying Xen: regular, local,
+ * and nested.
+ *
+ * Any call-sites which initiate communication with the hypervisor take
+ * xenhost_t * as a parameter and use the appropriate xenhost interface.
+ *
+ * Note that the init for the nested xenhost (in the nested dom0 case,
+ * there are two) happens for each operation alongside the default xenhost
+ * (which remains similar to the current one) and is not deferred until
+ * later. This allows us to piggy-back on the non-trivial sequencing and
+ * inter-locking logic in the init of the default xenhost.
+ */
+
+/*
+ * xenhost_type: specifies the controlling Xen interface. The notation,
+ * xenhost_r0, xenhost_r1, xenhost_r2 is meant to invoke hypervisor distance
+ * from the guest.
+ *
+ * Note that the distance is relative, and so does not identify a specific
+ * hypervisor, just the role played by the interface: for instance, for an
+ * L0-guest xenhost_r1 would be L0-Xen, and for an L1-guest, L1-Xen.
+ */
+enum xenhost_type {
+ xenhost_invalid = 0,
+ /*
+ * xenhost_r1: the guest's frontend or backend drivers talking
+ * to a hypervisor one level removed.
+ * This is the ordinary, non-nested configuration as well as for the
+ * typical nested frontends and backends.
+ *
+ * The corresponding xenhost_t would continue to use the current
+ * interfaces, via a redirection layer.
+ */
+ xenhost_r1,
+
+ /*
+ * xenhost_r2: frontend drivers communicating with a hypervisor two
+ * levels removed: so L1-dom0-frontends communicating with L0-Xen.
+ *
+ * This is the nested-Xen configuration: L1-dom0-frontend drivers can
+ * now talk to L0-dom0-backend drivers via a separate xenhost_t.
+ */
+ xenhost_r2,
+
+ /*
+ * Local/Co-located case: backend drivers now run in the same address
+ * space as the hypervisor. The driver model remains the same as
+ * xenhost_r1, but with slightly different interfaces.
+ *
+ * Any frontend guests of this hypervisor will continue to be
+ * xenhost_r1.
+ */
+ xenhost_r0,
+};
+
+struct xenhost_ops;
+
+typedef struct {
+ enum xenhost_type type;
+
+ struct xenhost_ops *ops;
+} xenhost_t;
+
+typedef struct xenhost_ops {
+} xenhost_ops_t;
+
+extern xenhost_t *xh_default, *xh_remote;
+extern xenhost_t xenhosts[2];
+
+/*
+ * xenhost_register() is called early in the guest's xen-init, after it
+ * detects, in some implementation-defined manner, what kind of underlying
+ * xenhost or xenhosts exist.
+ * It specifies the type of xenhost being registered and the ops for it.
+ */
+void xenhost_register(enum xenhost_type type, xenhost_ops_t *ops);
+void __xenhost_unregister(enum xenhost_type type);
+
+
+/*
+ * Convoluted interface so we can do this without adding a loop counter.
+ */
+#define for_each_xenhost(xh) \
+ for ((xh) = (xenhost_t **) &xenhosts[0]; \
+ (((xh) - (xenhost_t **)&xenhosts) < 2) && (*xh)->type != xenhost_invalid; (xh)++)
+
+#endif /* __XENHOST_H */
--
2.20.1

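[ Editor's note: the for_each_xenhost() iterator in the header above hands
its caller a xenhost_t **, while the macro as written casts the xenhosts
array itself to xenhost_t **. The sketch below is a hedged userspace model,
not the patch's code: it expresses the same double-pointer convention over
an explicit array of pointers, which keeps the iteration well-defined. ]

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the patch's types. */
enum xenhost_type { xenhost_invalid = 0, xenhost_r1, xenhost_r2, xenhost_r0 };
typedef struct { enum xenhost_type type; } xenhost_t;

static xenhost_t xenhosts[2] = {
	{ .type = xenhost_r1 },		/* xh_default: registered */
	{ .type = xenhost_invalid },	/* xh_remote: not registered */
};

/* One pointer per slot, so iterating with a xenhost_t ** is well-defined. */
static xenhost_t *xenhost_ptrs[2] = { &xenhosts[0], &xenhosts[1] };

/* Same convention as the patch: stop at the array end or at the first
 * unregistered slot, whichever comes first. */
#define for_each_xenhost(xh) \
	for ((xh) = &xenhost_ptrs[0]; \
	     ((xh) - &xenhost_ptrs[0]) < 2 && \
	     (*(xh))->type != xenhost_invalid; \
	     (xh)++)

static int count_registered(void)
{
	xenhost_t **xh;
	int n = 0;

	for_each_xenhost(xh)
		n++;
	return n;
}
```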
2019-05-09 17:30:55

by Ankur Arora

Subject: [RFC PATCH 14/16] xen/blk: gnttab, evtchn, xenbus API changes

For the most part, we now pass xenhost_t * as a parameter.

Co-developed-by: Joao Martins <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
---
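[ Editor's note: the shape of the change can be sketched in a few lines.
This is a hypothetical userspace model: gnttab_alloc_grant_references()
below is a fake, and only the calling convention mirrors the patch --
helpers gain a leading xenhost_t * argument, which callers derive from
the xenbus device's new xh field, as the blkfront changes below do. ]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the calling convention mirrors the patch. */
typedef struct { int id; } xenhost_t;
typedef unsigned int grant_ref_t;
struct xenbus_device { xenhost_t *xh; };

/* Before: gnttab_alloc_grant_references(count, head).
 * After: the xenhost the grants belong to is passed explicitly. */
static int gnttab_alloc_grant_references(xenhost_t *xh, unsigned int count,
					 grant_ref_t *head)
{
	if (xh == NULL || count == 0)
		return -1;
	*head = 1;		/* fake a successful allocation */
	return 0;
}

/* Callers now thread dev->xh through every grant-table call. */
static int init_queue(struct xenbus_device *dev, unsigned int ring_size,
		      grant_ref_t *gref_head)
{
	return gnttab_alloc_grant_references(dev->xh, ring_size, gref_head);
}
```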
drivers/block/xen-blkback/blkback.c | 34 +++++----
drivers/block/xen-blkback/common.h | 2 +-
drivers/block/xen-blkback/xenbus.c | 63 ++++++++---------
drivers/block/xen-blkfront.c | 103 +++++++++++++++-------------
4 files changed, 107 insertions(+), 95 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 7ad4423c24b8..d366a17a4bd8 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -142,7 +142,7 @@ static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
HZ * xen_blkif_pgrant_timeout);
}

-static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
+static inline int get_free_page(xenhost_t *xh, struct xen_blkif_ring *ring, struct page **page)
{
unsigned long flags;

@@ -150,7 +150,7 @@ static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
if (list_empty(&ring->free_pages)) {
BUG_ON(ring->free_pages_num != 0);
spin_unlock_irqrestore(&ring->free_pages_lock, flags);
- return gnttab_alloc_pages(1, page);
+ return gnttab_alloc_pages(xh, 1, page);
}
BUG_ON(ring->free_pages_num == 0);
page[0] = list_first_entry(&ring->free_pages, struct page, lru);
@@ -174,7 +174,7 @@ static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **pag
spin_unlock_irqrestore(&ring->free_pages_lock, flags);
}

-static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
+static inline void shrink_free_pagepool(xenhost_t *xh, struct xen_blkif_ring *ring, int num)
{
/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
struct page *page[NUM_BATCH_FREE_PAGES];
@@ -190,14 +190,14 @@ static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
ring->free_pages_num--;
if (++num_pages == NUM_BATCH_FREE_PAGES) {
spin_unlock_irqrestore(&ring->free_pages_lock, flags);
- gnttab_free_pages(num_pages, page);
+ gnttab_free_pages(xh, num_pages, page);
spin_lock_irqsave(&ring->free_pages_lock, flags);
num_pages = 0;
}
}
spin_unlock_irqrestore(&ring->free_pages_lock, flags);
if (num_pages != 0)
- gnttab_free_pages(num_pages, page);
+ gnttab_free_pages(xh, num_pages, page);
}

#define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
@@ -301,8 +301,8 @@ static void put_persistent_gnt(struct xen_blkif_ring *ring,
atomic_dec(&ring->persistent_gnt_in_use);
}

-static void free_persistent_gnts(struct xen_blkif_ring *ring, struct rb_root *root,
- unsigned int num)
+static void free_persistent_gnts(xenhost_t *xh, struct xen_blkif_ring *ring,
+ struct rb_root *root, unsigned int num)
{
struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
@@ -314,6 +314,7 @@ static void free_persistent_gnts(struct xen_blkif_ring *ring, struct rb_root *ro
unmap_data.pages = pages;
unmap_data.unmap_ops = unmap;
unmap_data.kunmap_ops = NULL;
+ unmap_data.xh = xh;

foreach_grant_safe(persistent_gnt, n, root, node) {
BUG_ON(persistent_gnt->handle ==
@@ -351,10 +352,12 @@ void xen_blkbk_unmap_purged_grants(struct work_struct *work)
int segs_to_unmap = 0;
struct xen_blkif_ring *ring = container_of(work, typeof(*ring), persistent_purge_work);
struct gntab_unmap_queue_data unmap_data;
+ struct xenbus_device *dev = xen_blkbk_xenbus(ring->blkif->be);

unmap_data.pages = pages;
unmap_data.unmap_ops = unmap;
unmap_data.kunmap_ops = NULL;
+ unmap_data.xh = dev->xh;

while(!list_empty(&ring->persistent_purge_list)) {
persistent_gnt = list_first_entry(&ring->persistent_purge_list,
@@ -615,6 +618,7 @@ int xen_blkif_schedule(void *arg)
struct xen_vbd *vbd = &blkif->vbd;
unsigned long timeout;
int ret;
+ struct xenbus_device *dev = xen_blkbk_xenbus(blkif->be);

set_freezable();
while (!kthread_should_stop()) {
@@ -657,7 +661,7 @@ int xen_blkif_schedule(void *arg)
}

/* Shrink if we have more than xen_blkif_max_buffer_pages */
- shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+ shrink_free_pagepool(dev->xh, ring, xen_blkif_max_buffer_pages);

if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -677,18 +681,18 @@ int xen_blkif_schedule(void *arg)
/*
* Remove persistent grants and empty the pool of free pages
*/
-void xen_blkbk_free_caches(struct xen_blkif_ring *ring)
+void xen_blkbk_free_caches(xenhost_t *xh, struct xen_blkif_ring *ring)
{
/* Free all persistent grant pages */
if (!RB_EMPTY_ROOT(&ring->persistent_gnts))
- free_persistent_gnts(ring, &ring->persistent_gnts,
+ free_persistent_gnts(xh, ring, &ring->persistent_gnts,
ring->persistent_gnt_c);

BUG_ON(!RB_EMPTY_ROOT(&ring->persistent_gnts));
ring->persistent_gnt_c = 0;

/* Since we are shutting down remove all pages from the buffer */
- shrink_free_pagepool(ring, 0 /* All */);
+ shrink_free_pagepool(xh, ring, 0 /* All */);
}

static unsigned int xen_blkbk_unmap_prepare(
@@ -784,6 +788,7 @@ static void xen_blkbk_unmap(struct xen_blkif_ring *ring,
struct gnttab_unmap_grant_ref unmap[BLKIF_MAX_SEGMENTS_PER_REQUEST];
struct page *unmap_pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
unsigned int invcount = 0;
+ struct xenbus_device *dev = xen_blkbk_xenbus(ring->blkif->be);
int ret;

while (num) {
@@ -792,7 +797,7 @@ static void xen_blkbk_unmap(struct xen_blkif_ring *ring,
invcount = xen_blkbk_unmap_prepare(ring, pages, batch,
unmap, unmap_pages);
if (invcount) {
- ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
+ ret = gnttab_unmap_refs(dev->xh, unmap, NULL, unmap_pages, invcount);
BUG_ON(ret);
put_free_pages(ring, unmap_pages, invcount);
}
@@ -815,6 +820,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
int last_map = 0, map_until = 0;
int use_persistent_gnts;
struct xen_blkif *blkif = ring->blkif;
+ struct xenbus_device *dev = xen_blkbk_xenbus(blkif->be); /* function call */

use_persistent_gnts = (blkif->vbd.feature_gnt_persistent);

@@ -841,7 +847,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
pages[i]->page = persistent_gnt->page;
pages[i]->persistent_gnt = persistent_gnt;
} else {
- if (get_free_page(ring, &pages[i]->page))
+ if (get_free_page(dev->xh, ring, &pages[i]->page))
goto out_of_memory;
addr = vaddr(pages[i]->page);
pages_to_gnt[segs_to_map] = pages[i]->page;
@@ -859,7 +865,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
}

if (segs_to_map) {
- ret = gnttab_map_refs(map, NULL, pages_to_gnt, segs_to_map);
+ ret = gnttab_map_refs(dev->xh, map, NULL, pages_to_gnt, segs_to_map);
BUG_ON(ret);
}

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..633115888765 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -382,7 +382,7 @@ int xen_blkif_xenbus_init(void);
irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
int xen_blkif_schedule(void *arg);
int xen_blkif_purge_persistent(void *arg);
-void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
+void xen_blkbk_free_caches(xenhost_t *xh, struct xen_blkif_ring *ring);

int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index beea4272cfd3..a3ed34269b23 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -65,7 +65,7 @@ static int blkback_name(struct xen_blkif *blkif, char *buf)
char *devpath, *devname;
struct xenbus_device *dev = blkif->be->dev;

- devpath = xenbus_read(XBT_NIL, dev->nodename, "dev", NULL);
+ devpath = xenbus_read(dev->xh, XBT_NIL, dev->nodename, "dev", NULL);
if (IS_ERR(devpath))
return PTR_ERR(devpath);

@@ -246,6 +246,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
struct pending_req *req, *n;
unsigned int j, r;
bool busy = false;
+ struct xenbus_device *dev = xen_blkbk_xenbus(blkif->be);

for (r = 0; r < blkif->nr_rings; r++) {
struct xen_blkif_ring *ring = &blkif->rings[r];
@@ -279,7 +280,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
}

/* Remove all persistent grants and the cache of ballooned pages. */
- xen_blkbk_free_caches(ring);
+ xen_blkbk_free_caches(dev->xh, ring);

/* Check that there is no request in use */
list_for_each_entry_safe(req, n, &ring->pending_free, free_list) {
@@ -507,7 +508,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
xenvbd_sysfs_delif(dev);

if (be->backend_watch.node) {
- unregister_xenbus_watch(&be->backend_watch);
+ unregister_xenbus_watch(dev->xh, &be->backend_watch);
kfree(be->backend_watch.node);
be->backend_watch.node = NULL;
}
@@ -530,7 +531,7 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
struct xenbus_device *dev = be->dev;
int err;

- err = xenbus_printf(xbt, dev->nodename, "feature-flush-cache",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-flush-cache",
"%d", state);
if (err)
dev_warn(&dev->dev, "writing feature-flush-cache (%d)", err);
@@ -547,18 +548,18 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info
struct block_device *bdev = be->blkif->vbd.bdev;
struct request_queue *q = bdev_get_queue(bdev);

- if (!xenbus_read_unsigned(dev->nodename, "discard-enable", 1))
+ if (!xenbus_read_unsigned(dev->xh, dev->nodename, "discard-enable", 1))
return;

if (blk_queue_discard(q)) {
- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"discard-granularity", "%u",
q->limits.discard_granularity);
if (err) {
dev_warn(&dev->dev, "writing discard-granularity (%d)", err);
return;
}
- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"discard-alignment", "%u",
q->limits.discard_alignment);
if (err) {
@@ -567,7 +568,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info
}
state = 1;
/* Optional. */
- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"discard-secure", "%d",
blkif->vbd.discard_secure);
if (err) {
@@ -575,7 +576,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info
return;
}
}
- err = xenbus_printf(xbt, dev->nodename, "feature-discard",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-discard",
"%d", state);
if (err)
dev_warn(&dev->dev, "writing feature-discard (%d)", err);
@@ -586,7 +587,7 @@ int xen_blkbk_barrier(struct xenbus_transaction xbt,
struct xenbus_device *dev = be->dev;
int err;

- err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-barrier",
"%d", state);
if (err)
dev_warn(&dev->dev, "writing feature-barrier (%d)", err);
@@ -625,7 +626,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
goto fail;
}

- err = xenbus_printf(XBT_NIL, dev->nodename,
+ err = xenbus_printf(dev->xh, XBT_NIL, dev->nodename,
"feature-max-indirect-segments", "%u",
MAX_INDIRECT_SEGMENTS);
if (err)
@@ -634,7 +635,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
dev->nodename, err);

/* Multi-queue: advertise how many queues are supported by us.*/
- err = xenbus_printf(XBT_NIL, dev->nodename,
+ err = xenbus_printf(dev->xh, XBT_NIL, dev->nodename,
"multi-queue-max-queues", "%u", xenblk_max_queues);
if (err)
pr_warn("Error writing multi-queue-max-queues\n");
@@ -647,7 +648,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
if (err)
goto fail;

- err = xenbus_printf(XBT_NIL, dev->nodename, "max-ring-page-order", "%u",
+ err = xenbus_printf(dev->xh, XBT_NIL, dev->nodename, "max-ring-page-order", "%u",
xen_blkif_max_ring_order);
if (err)
pr_warn("%s write out 'max-ring-page-order' failed\n", __func__);
@@ -685,7 +686,7 @@ static void backend_changed(struct xenbus_watch *watch,

pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);

- err = xenbus_scanf(XBT_NIL, dev->nodename, "physical-device", "%x:%x",
+ err = xenbus_scanf(dev->xh, XBT_NIL, dev->nodename, "physical-device", "%x:%x",
&major, &minor);
if (XENBUS_EXIST_ERR(err)) {
/*
@@ -707,7 +708,7 @@ static void backend_changed(struct xenbus_watch *watch,
return;
}

- be->mode = xenbus_read(XBT_NIL, dev->nodename, "mode", NULL);
+ be->mode = xenbus_read(dev->xh, XBT_NIL, dev->nodename, "mode", NULL);
if (IS_ERR(be->mode)) {
err = PTR_ERR(be->mode);
be->mode = NULL;
@@ -715,7 +716,7 @@ static void backend_changed(struct xenbus_watch *watch,
return;
}

- device_type = xenbus_read(XBT_NIL, dev->otherend, "device-type", NULL);
+ device_type = xenbus_read(dev->xh, XBT_NIL, dev->otherend, "device-type", NULL);
if (!IS_ERR(device_type)) {
cdrom = strcmp(device_type, "cdrom") == 0;
kfree(device_type);
@@ -849,7 +850,7 @@ static void connect(struct backend_info *be)

/* Supply the information about the device the frontend needs */
again:
- err = xenbus_transaction_start(&xbt);
+ err = xenbus_transaction_start(dev->xh, &xbt);
if (err) {
xenbus_dev_fatal(dev, err, "starting transaction");
return;
@@ -862,14 +863,14 @@ static void connect(struct backend_info *be)

xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);

- err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "feature-persistent", "%u", 1);
if (err) {
xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
dev->nodename);
goto abort;
}

- err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "sectors", "%llu",
(unsigned long long)vbd_sz(&be->blkif->vbd));
if (err) {
xenbus_dev_fatal(dev, err, "writing %s/sectors",
@@ -878,7 +879,7 @@ static void connect(struct backend_info *be)
}

/* FIXME: use a typename instead */
- err = xenbus_printf(xbt, dev->nodename, "info", "%u",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "info", "%u",
be->blkif->vbd.type |
(be->blkif->vbd.readonly ? VDISK_READONLY : 0));
if (err) {
@@ -886,7 +887,7 @@ static void connect(struct backend_info *be)
dev->nodename);
goto abort;
}
- err = xenbus_printf(xbt, dev->nodename, "sector-size", "%lu",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "sector-size", "%lu",
(unsigned long)
bdev_logical_block_size(be->blkif->vbd.bdev));
if (err) {
@@ -894,13 +895,13 @@ static void connect(struct backend_info *be)
dev->nodename);
goto abort;
}
- err = xenbus_printf(xbt, dev->nodename, "physical-sector-size", "%u",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "physical-sector-size", "%u",
bdev_physical_block_size(be->blkif->vbd.bdev));
if (err)
xenbus_dev_error(dev, err, "writing %s/physical-sector-size",
dev->nodename);

- err = xenbus_transaction_end(xbt, 0);
+ err = xenbus_transaction_end(dev->xh, xbt, 0);
if (err == -EAGAIN)
goto again;
if (err)
@@ -913,7 +914,7 @@ static void connect(struct backend_info *be)

return;
abort:
- xenbus_transaction_end(xbt, 1);
+ xenbus_transaction_end(dev->xh, xbt, 1);
}

/*
@@ -928,7 +929,7 @@ static int read_per_ring_refs(struct xen_blkif_ring *ring, const char *dir)
struct xenbus_device *dev = blkif->be->dev;
unsigned int ring_page_order, nr_grefs, evtchn;

- err = xenbus_scanf(XBT_NIL, dir, "event-channel", "%u",
+ err = xenbus_scanf(dev->xh, XBT_NIL, dir, "event-channel", "%u",
&evtchn);
if (err != 1) {
err = -EINVAL;
@@ -936,10 +937,10 @@ static int read_per_ring_refs(struct xen_blkif_ring *ring, const char *dir)
return err;
}

- err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
+ err = xenbus_scanf(dev->xh, XBT_NIL, dev->otherend, "ring-page-order", "%u",
&ring_page_order);
if (err != 1) {
- err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u", &ring_ref[0]);
+ err = xenbus_scanf(dev->xh, XBT_NIL, dir, "ring-ref", "%u", &ring_ref[0]);
if (err != 1) {
err = -EINVAL;
xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
@@ -962,7 +963,7 @@ static int read_per_ring_refs(struct xen_blkif_ring *ring, const char *dir)
char ring_ref_name[RINGREF_NAME_LEN];

snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
- err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
+ err = xenbus_scanf(dev->xh, XBT_NIL, dir, ring_ref_name,
"%u", &ring_ref[i]);
if (err != 1) {
err = -EINVAL;
@@ -1034,7 +1035,7 @@ static int connect_ring(struct backend_info *be)
pr_debug("%s %s\n", __func__, dev->otherend);

be->blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
- err = xenbus_scanf(XBT_NIL, dev->otherend, "protocol",
+ err = xenbus_scanf(dev->xh, XBT_NIL, dev->otherend, "protocol",
"%63s", protocol);
if (err <= 0)
strcpy(protocol, "unspecified, assuming default");
@@ -1048,7 +1049,7 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
- pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
+ pers_grants = xenbus_read_unsigned(dev->xh, dev->otherend, "feature-persistent",
0);
be->blkif->vbd.feature_gnt_persistent = pers_grants;
be->blkif->vbd.overflow_max_grants = 0;
@@ -1056,7 +1057,7 @@ static int connect_ring(struct backend_info *be)
/*
* Read the number of hardware queues from frontend.
*/
- requested_num_queues = xenbus_read_unsigned(dev->otherend,
+ requested_num_queues = xenbus_read_unsigned(dev->xh, dev->otherend,
"multi-queue-num-queues",
1);
if (requested_num_queues > xenblk_max_queues
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a06716424023..3929370d1f2f 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -341,10 +341,11 @@ static struct grant *get_free_grant(struct blkfront_ring_info *rinfo)
return gnt_list_entry;
}

-static inline void grant_foreign_access(const struct grant *gnt_list_entry,
+static inline void grant_foreign_access(xenhost_t *xh,
+ const struct grant *gnt_list_entry,
const struct blkfront_info *info)
{
- gnttab_page_grant_foreign_access_ref_one(gnt_list_entry->gref,
+ gnttab_page_grant_foreign_access_ref_one(xh, gnt_list_entry->gref,
info->xbdev->otherend_id,
gnt_list_entry->page,
0);
@@ -361,13 +362,13 @@ static struct grant *get_grant(grant_ref_t *gref_head,
return gnt_list_entry;

/* Assign a gref to this page */
- gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
+ gnt_list_entry->gref = gnttab_claim_grant_reference(info->xbdev->xh, gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (info->feature_persistent)
- grant_foreign_access(gnt_list_entry, info);
+ grant_foreign_access(info->xbdev->xh, gnt_list_entry, info);
else {
/* Grant access to the GFN passed by the caller */
- gnttab_grant_foreign_access_ref(gnt_list_entry->gref,
+ gnttab_grant_foreign_access_ref(info->xbdev->xh, gnt_list_entry->gref,
info->xbdev->otherend_id,
gfn, 0);
}
@@ -385,7 +386,7 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
return gnt_list_entry;

/* Assign a gref to this page */
- gnt_list_entry->gref = gnttab_claim_grant_reference(gref_head);
+ gnt_list_entry->gref = gnttab_claim_grant_reference(info->xbdev->xh, gref_head);
BUG_ON(gnt_list_entry->gref == -ENOSPC);
if (!info->feature_persistent) {
struct page *indirect_page;
@@ -397,7 +398,7 @@ static struct grant *get_indirect_grant(grant_ref_t *gref_head,
list_del(&indirect_page->lru);
gnt_list_entry->page = indirect_page;
}
- grant_foreign_access(gnt_list_entry, info);
+ grant_foreign_access(info->xbdev->xh, gnt_list_entry, info);

return gnt_list_entry;
}
@@ -723,10 +724,10 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
if (rinfo->persistent_gnts_c < max_grefs) {
new_persistent_gnts = true;

- if (gnttab_alloc_grant_references(
+ if (gnttab_alloc_grant_references(info->xbdev->xh,
max_grefs - rinfo->persistent_gnts_c,
&setup.gref_head) < 0) {
- gnttab_request_free_callback(
+ gnttab_request_free_callback(info->xbdev->xh,
&rinfo->callback,
blkif_restart_queue_callback,
rinfo,
@@ -835,7 +836,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
rinfo->shadow[extra_id].req = *extra_ring_req;

if (new_persistent_gnts)
- gnttab_free_grant_references(setup.gref_head);
+ gnttab_free_grant_references(info->xbdev->xh, setup.gref_head);

return 0;
}
@@ -1195,7 +1196,7 @@ static void xlvbd_release_gendisk(struct blkfront_info *info)
struct blkfront_ring_info *rinfo = &info->rinfo[i];

/* No more gnttab callback work. */
- gnttab_cancel_free_callback(&rinfo->callback);
+ gnttab_cancel_free_callback(info->xbdev->xh, &rinfo->callback);

/* Flush gnttab callback work. Must be done with no locks held. */
flush_work(&rinfo->work);
@@ -1265,7 +1266,7 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
&rinfo->grants, node) {
list_del(&persistent_gnt->node);
if (persistent_gnt->gref != GRANT_INVALID_REF) {
- gnttab_end_foreign_access(persistent_gnt->gref,
+ gnttab_end_foreign_access(info->xbdev->xh, persistent_gnt->gref,
0, 0UL);
rinfo->persistent_gnts_c--;
}
@@ -1289,7 +1290,7 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
rinfo->shadow[i].req.u.rw.nr_segments;
for (j = 0; j < segs; j++) {
persistent_gnt = rinfo->shadow[i].grants_used[j];
- gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
+ gnttab_end_foreign_access(info->xbdev->xh, persistent_gnt->gref, 0, 0UL);
if (info->feature_persistent)
__free_page(persistent_gnt->page);
kfree(persistent_gnt);
@@ -1304,7 +1305,7 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)

for (j = 0; j < INDIRECT_GREFS(segs); j++) {
persistent_gnt = rinfo->shadow[i].indirect_grants[j];
- gnttab_end_foreign_access(persistent_gnt->gref, 0, 0UL);
+ gnttab_end_foreign_access(info->xbdev->xh, persistent_gnt->gref, 0, 0UL);
__free_page(persistent_gnt->page);
kfree(persistent_gnt);
}
@@ -1319,7 +1320,7 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
}

/* No more gnttab callback work. */
- gnttab_cancel_free_callback(&rinfo->callback);
+ gnttab_cancel_free_callback(info->xbdev->xh, &rinfo->callback);

/* Flush gnttab callback work. Must be done with no locks held. */
flush_work(&rinfo->work);
@@ -1327,7 +1328,7 @@ static void blkif_free_ring(struct blkfront_ring_info *rinfo)
/* Free resources associated with old device channel. */
for (i = 0; i < info->nr_ring_pages; i++) {
if (rinfo->ring_ref[i] != GRANT_INVALID_REF) {
- gnttab_end_foreign_access(rinfo->ring_ref[i], 0, 0);
+ gnttab_end_foreign_access(info->xbdev->xh, rinfo->ring_ref[i], 0, 0);
rinfo->ring_ref[i] = GRANT_INVALID_REF;
}
}
@@ -1491,7 +1492,7 @@ static bool blkif_completion(unsigned long *id,
}
/* Add the persistent grant into the list of free grants */
for (i = 0; i < num_grant; i++) {
- if (gnttab_query_foreign_access(s->grants_used[i]->gref)) {
+ if (gnttab_query_foreign_access(info->xbdev->xh, s->grants_used[i]->gref)) {
/*
* If the grant is still mapped by the backend (the
* backend has chosen to make this grant persistent)
@@ -1510,14 +1511,14 @@ static bool blkif_completion(unsigned long *id,
* so it will not be picked again unless we run out of
* persistent grants.
*/
- gnttab_end_foreign_access(s->grants_used[i]->gref, 0, 0UL);
+ gnttab_end_foreign_access(info->xbdev->xh, s->grants_used[i]->gref, 0, 0UL);
s->grants_used[i]->gref = GRANT_INVALID_REF;
list_add_tail(&s->grants_used[i]->node, &rinfo->grants);
}
}
if (s->req.operation == BLKIF_OP_INDIRECT) {
for (i = 0; i < INDIRECT_GREFS(num_grant); i++) {
- if (gnttab_query_foreign_access(s->indirect_grants[i]->gref)) {
+ if (gnttab_query_foreign_access(info->xbdev->xh, s->indirect_grants[i]->gref)) {
if (!info->feature_persistent)
pr_alert_ratelimited("backed has not unmapped grant: %u\n",
s->indirect_grants[i]->gref);
@@ -1526,7 +1527,7 @@ static bool blkif_completion(unsigned long *id,
} else {
struct page *indirect_page;

- gnttab_end_foreign_access(s->indirect_grants[i]->gref, 0, 0UL);
+ gnttab_end_foreign_access(info->xbdev->xh, s->indirect_grants[i]->gref, 0, 0UL);
/*
* Add the used indirect page back to the list of
* available pages for indirect grefs.
@@ -1726,9 +1727,10 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
unsigned int i;
const char *message = NULL;
struct blkfront_info *info = rinfo->dev_info;
+ xenhost_t *xh = info->xbdev->xh;

if (info->nr_ring_pages == 1) {
- err = xenbus_printf(xbt, dir, "ring-ref", "%u", rinfo->ring_ref[0]);
+ err = xenbus_printf(xh, xbt, dir, "ring-ref", "%u", rinfo->ring_ref[0]);
if (err) {
message = "writing ring-ref";
goto abort_transaction;
@@ -1738,7 +1740,7 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
char ring_ref_name[RINGREF_NAME_LEN];

snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
- err = xenbus_printf(xbt, dir, ring_ref_name,
+ err = xenbus_printf(xh, xbt, dir, ring_ref_name,
"%u", rinfo->ring_ref[i]);
if (err) {
message = "writing ring-ref";
@@ -1747,7 +1749,7 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
}
}

- err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
+ err = xenbus_printf(xh, xbt, dir, "event-channel", "%u", rinfo->evtchn);
if (err) {
message = "writing event-channel";
goto abort_transaction;
@@ -1756,7 +1758,7 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
return 0;

abort_transaction:
- xenbus_transaction_end(xbt, 1);
+ xenbus_transaction_end(xh, xbt, 1);
if (message)
xenbus_dev_fatal(info->xbdev, err, "%s", message);

@@ -1782,7 +1784,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
if (!info)
return -ENODEV;

- max_page_order = xenbus_read_unsigned(info->xbdev->otherend,
+ max_page_order = xenbus_read_unsigned(dev->xh, info->xbdev->otherend,
"max-ring-page-order", 0);
ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
info->nr_ring_pages = 1 << ring_page_order;
@@ -1801,14 +1803,14 @@ static int talk_to_blkback(struct xenbus_device *dev,
}

again:
- err = xenbus_transaction_start(&xbt);
+ err = xenbus_transaction_start(dev->xh, &xbt);
if (err) {
xenbus_dev_fatal(dev, err, "starting transaction");
goto destroy_blkring;
}

if (info->nr_ring_pages > 1) {
- err = xenbus_printf(xbt, dev->nodename, "ring-page-order", "%u",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "ring-page-order", "%u",
ring_page_order);
if (err) {
message = "writing ring-page-order";
@@ -1825,7 +1827,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
char *path;
size_t pathsize;

- err = xenbus_printf(xbt, dev->nodename, "multi-queue-num-queues", "%u",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "multi-queue-num-queues", "%u",
info->nr_rings);
if (err) {
message = "writing multi-queue-num-queues";
@@ -1851,19 +1853,19 @@ static int talk_to_blkback(struct xenbus_device *dev,
}
kfree(path);
}
- err = xenbus_printf(xbt, dev->nodename, "protocol", "%s",
+ err = xenbus_printf(dev->xh, xbt, dev->nodename, "protocol", "%s",
XEN_IO_PROTO_ABI_NATIVE);
if (err) {
message = "writing protocol";
goto abort_transaction;
}
- err = xenbus_printf(xbt, dev->nodename,
+ err = xenbus_printf(dev->xh, xbt, dev->nodename,
"feature-persistent", "%u", 1);
if (err)
dev_warn(&dev->dev,
"writing persistent grants feature to xenbus");

- err = xenbus_transaction_end(xbt, 0);
+ err = xenbus_transaction_end(dev->xh, xbt, 0);
if (err) {
if (err == -EAGAIN)
goto again;
@@ -1884,7 +1886,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
return 0;

abort_transaction:
- xenbus_transaction_end(xbt, 1);
+ xenbus_transaction_end(dev->xh, xbt, 1);
if (message)
xenbus_dev_fatal(dev, err, "%s", message);
destroy_blkring:
@@ -1907,7 +1909,7 @@ static int negotiate_mq(struct blkfront_info *info)
BUG_ON(info->nr_rings);

/* Check if backend supports multiple queues. */
- backend_max_queues = xenbus_read_unsigned(info->xbdev->otherend,
+ backend_max_queues = xenbus_read_unsigned(info->xbdev->xh, info->xbdev->otherend,
"multi-queue-max-queues", 1);
info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
/* We need at least one ring. */
@@ -1948,11 +1950,11 @@ static int blkfront_probe(struct xenbus_device *dev,
struct blkfront_info *info;

/* FIXME: Use dynamic device id if this is not set. */
- err = xenbus_scanf(XBT_NIL, dev->nodename,
+ err = xenbus_scanf(dev->xh, XBT_NIL, dev->nodename,
"virtual-device", "%i", &vdevice);
if (err != 1) {
/* go looking in the extended area instead */
- err = xenbus_scanf(XBT_NIL, dev->nodename, "virtual-device-ext",
+ err = xenbus_scanf(dev->xh, XBT_NIL, dev->nodename, "virtual-device-ext",
"%i", &vdevice);
if (err != 1) {
xenbus_dev_fatal(dev, err, "reading virtual-device");
@@ -1980,7 +1982,7 @@ static int blkfront_probe(struct xenbus_device *dev,
}
}
/* do not create a PV cdrom device if we are an HVM guest */
- type = xenbus_read(XBT_NIL, dev->nodename, "device-type", &len);
+ type = xenbus_read(dev->xh, XBT_NIL, dev->nodename, "device-type", &len);
if (IS_ERR(type))
return -ENODEV;
if (strncmp(type, "cdrom", 5) == 0) {
@@ -2173,7 +2175,7 @@ static void blkfront_setup_discard(struct blkfront_info *info)
unsigned int discard_alignment;

info->feature_discard = 1;
- err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+ err = xenbus_gather(info->xbdev->xh, XBT_NIL, info->xbdev->otherend,
"discard-granularity", "%u", &discard_granularity,
"discard-alignment", "%u", &discard_alignment,
NULL);
@@ -2182,7 +2184,7 @@ static void blkfront_setup_discard(struct blkfront_info *info)
info->discard_alignment = discard_alignment;
}
info->feature_secdiscard =
- !!xenbus_read_unsigned(info->xbdev->otherend, "discard-secure",
+ !!xenbus_read_unsigned(info->xbdev->xh, info->xbdev->otherend, "discard-secure",
0);
}

@@ -2279,6 +2281,7 @@ static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
static void blkfront_gather_backend_features(struct blkfront_info *info)
{
unsigned int indirect_segments;
+ xenhost_t *xh = info->xbdev->xh;

info->feature_flush = 0;
info->feature_fua = 0;
@@ -2290,7 +2293,8 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
*
* If there are barriers, then we use flush.
*/
- if (xenbus_read_unsigned(info->xbdev->otherend, "feature-barrier", 0)) {
+ if (xenbus_read_unsigned(xh, info->xbdev->otherend,
+ "feature-barrier", 0)) {
info->feature_flush = 1;
info->feature_fua = 1;
}
@@ -2299,20 +2303,21 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
* And if there is "feature-flush-cache" use that above
* barriers.
*/
- if (xenbus_read_unsigned(info->xbdev->otherend, "feature-flush-cache",
- 0)) {
+ if (xenbus_read_unsigned(xh, info->xbdev->otherend,
+ "feature-flush-cache", 0)) {
info->feature_flush = 1;
info->feature_fua = 0;
}

- if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
+ if (xenbus_read_unsigned(xh, info->xbdev->otherend,
+ "feature-discard", 0))
blkfront_setup_discard(info);

info->feature_persistent =
- !!xenbus_read_unsigned(info->xbdev->otherend,
+ !!xenbus_read_unsigned(xh, info->xbdev->otherend,
"feature-persistent", 0);

- indirect_segments = xenbus_read_unsigned(info->xbdev->otherend,
+ indirect_segments = xenbus_read_unsigned(xh, info->xbdev->otherend,
"feature-max-indirect-segments", 0);
if (indirect_segments > xen_blkif_max_segments)
indirect_segments = xen_blkif_max_segments;
@@ -2346,7 +2351,7 @@ static void blkfront_connect(struct blkfront_info *info)
* Potentially, the back-end may be signalling
* a capacity change; update the capacity.
*/
- err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+ err = xenbus_scanf(info->xbdev->xh, XBT_NIL, info->xbdev->otherend,
"sectors", "%Lu", &sectors);
if (XENBUS_EXIST_ERR(err))
return;
@@ -2375,7 +2380,7 @@ static void blkfront_connect(struct blkfront_info *info)
dev_dbg(&info->xbdev->dev, "%s:%s.\n",
__func__, info->xbdev->otherend);

- err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+ err = xenbus_gather(info->xbdev->xh, XBT_NIL, info->xbdev->otherend,
"sectors", "%llu", &sectors,
"info", "%u", &binfo,
"sector-size", "%lu", &sector_size,
@@ -2392,7 +2397,7 @@ static void blkfront_connect(struct blkfront_info *info)
* provide this. Assume physical sector size to be the same as
* sector_size in that case.
*/
- physical_sector_size = xenbus_read_unsigned(info->xbdev->otherend,
+ physical_sector_size = xenbus_read_unsigned(info->xbdev->xh, info->xbdev->otherend,
"physical-sector-size",
sector_size);
blkfront_gather_backend_features(info);
@@ -2668,11 +2673,11 @@ static void purge_persistent_grants(struct blkfront_info *info)
list_for_each_entry_safe(gnt_list_entry, tmp, &rinfo->grants,
node) {
if (gnt_list_entry->gref == GRANT_INVALID_REF ||
- gnttab_query_foreign_access(gnt_list_entry->gref))
+ gnttab_query_foreign_access(info->xbdev->xh, gnt_list_entry->gref))
continue;

list_del(&gnt_list_entry->node);
- gnttab_end_foreign_access(gnt_list_entry->gref, 0, 0UL);
+ gnttab_end_foreign_access(info->xbdev->xh, gnt_list_entry->gref, 0, 0UL);
rinfo->persistent_gnts_c--;
gnt_list_entry->gref = GRANT_INVALID_REF;
list_add_tail(&gnt_list_entry->node, &rinfo->grants);
--
2.20.1

2019-06-07 14:53:20

by Jürgen Groß

Subject: Re: [RFC PATCH 00/16] xenhost support

On 09.05.19 19:25, Ankur Arora wrote:
> Hi all,
>
> This is an RFC for xenhost support, outlined here by Juergen here:
> https://lkml.org/lkml/2019/4/8/67.

First: thanks for all the effort you've put into this series!

> The high level idea is to provide an abstraction of the Xen
> communication interface, as a xenhost_t.
>
> xenhost_t exposes ops for communication between the guest and Xen
> (hypercall, cpuid, shared_info/vcpu_info, evtchn, grant-table and on top
> of those, xenbus, ballooning), and these can differ based on the kind
> of underlying Xen: regular, local, and nested.

I'm not sure we need to abstract away hypercalls and cpuid. I believe in
case of nested Xen all contacts to the L0 hypervisor should be done via
the L1 hypervisor. So we might need to issue some kind of passthrough
hypercall when e.g. granting a page to L0 dom0, but this should be
handled via the grant abstraction (events should be similar).

So IMO we should drop patches 2-5.
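To make the suggestion concrete, here is a minimal sketch of how such a proxied grant path could hide behind a per-xenhost ops dispatch. All names here (xenhost_grant_ops, map_grant, the return-value markers) are hypothetical and not from the series:

```c
typedef struct xenhost xenhost_t;

/*
 * Hypothetical per-xenhost grant ops: callers issue grants through the
 * abstraction and never see whether the operation is a direct hypercall
 * or a passthrough forwarded by the L1 hypervisor.
 */
struct xenhost_grant_ops {
	int (*map_grant)(xenhost_t *xh, unsigned int gref);
};

struct xenhost {
	const struct xenhost_grant_ops *grant_ops;
};

/* Direct case: an ordinary GNTTABOP hypercall to the immediate hypervisor. */
static int map_grant_direct(xenhost_t *xh, unsigned int gref)
{
	return 1;	/* marker value: took the direct path */
}

/* Nested case: ask L1 to forward the grant operation to L0. */
static int map_grant_nested(xenhost_t *xh, unsigned int gref)
{
	return 2;	/* marker value: took the passthrough path */
}

static int xenhost_map_grant(xenhost_t *xh, unsigned int gref)
{
	return xh->grant_ops->map_grant(xh, gref);
}

static const struct xenhost_grant_ops direct_ops = { .map_grant = map_grant_direct };
static const struct xenhost_grant_ops nested_ops = { .map_grant = map_grant_nested };

static xenhost_t xh_direct = { .grant_ops = &direct_ops };
static xenhost_t xh_nested = { .grant_ops = &nested_ops };
```

With this shape only the ops tables differ between the two xenhosts, which is why the hypercall and cpuid layers (patches 2-5) would not need abstractions of their own.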

> (Since this abstraction is largely about guest -- xenhost communication,
> no ops are needed for timer, clock, sched, memory (MMU, P2M), VCPU mgmt.
> etc.)
>
> Xenhost use-cases:
>
> Regular-Xen: the standard Xen interface presented to a guest,
> specifically for communication between Lx-guest and Lx-Xen.
>
> Local-Xen: a Xen like interface which runs in the same address space as
> the guest (dom0). This can act as the default xenhost.
>
> The major ways it differs from a regular Xen interface is in presenting
> a different hypercall interface (call instead of a syscall/vmcall), and
> in an inability to do grant-mappings: since local-Xen exists in the same
> address space as Xen, there's no way for it to cheaply change the
> physical page that a GFN maps to (assuming no P2M tables.)
>
> Nested-Xen: this channel is to Xen, one level removed: from L1-guest to
> L0-Xen. The use case is that we want L0-dom0-backends to talk to
> L1-dom0-frontend drivers which can then present PV devices which can
> in turn be used by the L1-dom0-backend drivers as raw underlying devices.
> The interfaces themselves, broadly remain similar.
>
> Note: L0-Xen, L1-Xen represent Xen running at that nesting level
> and L0-guest, L1-guest represent guests that are children of Xen
> at that nesting level. Lx, represents any level.
>
> Patches 1-7,
> "x86/xen: add xenhost_t interface"
> "x86/xen: cpuid support in xenhost_t"
> "x86/xen: make hypercall_page generic"
> "x86/xen: hypercall support for xenhost_t"
> "x86/xen: add feature support in xenhost_t"
> "x86/xen: add shared_info support to xenhost_t"
> "x86/xen: make vcpu_info part of xenhost_t"
> abstract out interfaces that setup hypercalls/cpuid/shared_info/vcpu_info etc.
>
> Patch 8, "x86/xen: irq/upcall handling with multiple xenhosts"
> sets up the upcall and pv_irq ops based on vcpu_info.
>
> Patch 9, "xen/evtchn: support evtchn in xenhost_t" adds xenhost based
> evtchn support for evtchn_2l.
>
> Patches 10 and 16, "xen/balloon: support ballooning in xenhost_t" and
> "xen/grant-table: host_addr fixup in mapping on xenhost_r0"
> implement support for GNTTABOP_map_grant_ref for xenhosts of type
> xenhost_r0 (xenhost local.)
>
> Patch 12, "xen/xenbus: support xenbus frontend/backend with xenhost_t"
> changes xenbus so that both its frontend and backend can be bootstrapped
> separately via separate xenhosts.
>
> Remaining patches, 11, 13, 14, 15:
> "xen/grant-table: make grant-table xenhost aware"
> "drivers/xen: gnttab, evtchn, xenbus API changes"
> "xen/blk: gnttab, evtchn, xenbus API changes"
> "xen/net: gnttab, evtchn, xenbus API changes"
> are mostly mechanical changes for APIs that now take xenhost_t *
> as parameter.
>
> The code itself is RFC quality, and is mostly meant to get feedback before
> proceeding further. Also note that the FIFO logic and some Xen drivers
> (input, pciback, scsi etc) are mostly unchanged, so will not build.
>
>
> Please take a look.


Juergen

2019-06-07 15:06:48

by Jürgen Groß

Subject: Re: [RFC PATCH 01/16] x86/xen: add xenhost_t interface

On 09.05.19 19:25, Ankur Arora wrote:
> Add xenhost_t which will serve as an abstraction over Xen interfaces.
> It co-exists with the PV/HVM/PVH abstractions (x86_init, hypervisor_x86,
> pv_ops etc) and is meant to capture mechanisms for communication with
> Xen so we could have different types of underlying Xen: regular, local,
> and nested.
>
> Also add xenhost_register() and stub registration in the various guest
> types.
>
> Signed-off-by: Ankur Arora <[email protected]>
> ---
> arch/x86/xen/Makefile | 1 +
> arch/x86/xen/enlighten_hvm.c | 13 +++++
> arch/x86/xen/enlighten_pv.c | 16 ++++++
> arch/x86/xen/enlighten_pvh.c | 12 +++++
> arch/x86/xen/xenhost.c | 75 ++++++++++++++++++++++++++++
> include/xen/xen.h | 3 ++
> include/xen/xenhost.h | 95 ++++++++++++++++++++++++++++++++++++
> 7 files changed, 215 insertions(+)
> create mode 100644 arch/x86/xen/xenhost.c
> create mode 100644 include/xen/xenhost.h
>
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index 084de77a109e..564b4dddbc15 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -18,6 +18,7 @@ obj-y += mmu.o
> obj-y += time.o
> obj-y += grant-table.o
> obj-y += suspend.o
> +obj-y += xenhost.o
>
> obj-$(CONFIG_XEN_PVHVM) += enlighten_hvm.o
> obj-$(CONFIG_XEN_PVHVM) += mmu_hvm.o
> diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
> index 0e75642d42a3..100452f4f44c 100644
> --- a/arch/x86/xen/enlighten_hvm.c
> +++ b/arch/x86/xen/enlighten_hvm.c
> @@ -5,6 +5,7 @@
> #include <linux/kexec.h>
> #include <linux/memblock.h>
>
> +#include <xen/xenhost.h>
> #include <xen/features.h>
> #include <xen/events.h>
> #include <xen/interface/memory.h>
> @@ -82,6 +83,12 @@ static void __init xen_hvm_init_mem_mapping(void)
> xen_vcpu_info_reset(0);
> }
>
> +xenhost_ops_t xh_hvm_ops = {
> +};
> +
> +xenhost_ops_t xh_hvm_nested_ops = {
> +};
> +
> static void __init init_hvm_pv_info(void)
> {
> int major, minor;
> @@ -179,6 +186,12 @@ static void __init xen_hvm_guest_init(void)
> {
> if (xen_pv_domain())
> return;
> + /*
> + * We need only xenhost_r1 for HVM guests since they cannot be
> + * driver domain (?) or dom0.

I think even HVM guests could (in theory) be driver domains.

> + */
> + if (!xen_pvh_domain())
> + xenhost_register(xenhost_r1, &xh_hvm_ops);
>
> init_hvm_pv_info();
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index c54a493e139a..bb6e811c1525 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -36,6 +36,7 @@
>
> #include <xen/xen.h>
> #include <xen/events.h>
> +#include <xen/xenhost.h>
> #include <xen/interface/xen.h>
> #include <xen/interface/version.h>
> #include <xen/interface/physdev.h>
> @@ -1188,6 +1189,12 @@ static void __init xen_dom0_set_legacy_features(void)
> x86_platform.legacy.rtc = 1;
> }
>
> +xenhost_ops_t xh_pv_ops = {
> +};
> +
> +xenhost_ops_t xh_pv_nested_ops = {
> +};
> +
> /* First C function to be called on Xen boot */
> asmlinkage __visible void __init xen_start_kernel(void)
> {
> @@ -1198,6 +1205,15 @@ asmlinkage __visible void __init xen_start_kernel(void)
> if (!xen_start_info)
> return;
>
> + xenhost_register(xenhost_r1, &xh_pv_ops);
> +
> + /*
> + * Detect in some implementation defined manner whether this is
> + * nested or not.
> + */
> + if (xen_driver_domain() && xen_nested())
> + xenhost_register(xenhost_r2, &xh_pv_nested_ops);

I don't think a driver domain other than dom0 "knows" this in the
beginning. It will need to register xenhost_r2 in case it learns
about a pv device from L0 hypervisor.

> +
> xen_domain_type = XEN_PV_DOMAIN;
> xen_start_flags = xen_start_info->flags;
>
> diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
> index 35b7599d2d0b..826c296d27a3 100644
> --- a/arch/x86/xen/enlighten_pvh.c
> +++ b/arch/x86/xen/enlighten_pvh.c
> @@ -8,6 +8,7 @@
> #include <asm/e820/api.h>
>
> #include <xen/xen.h>
> +#include <xen/xenhost.h>
> #include <asm/xen/interface.h>
> #include <asm/xen/hypercall.h>
>
> @@ -21,11 +22,22 @@
> */
> bool xen_pvh __attribute__((section(".data"))) = 0;
>
> +extern xenhost_ops_t xh_hvm_ops, xh_hvm_nested_ops;
> +
> void __init xen_pvh_init(void)
> {
> u32 msr;
> u64 pfn;
>
> + xenhost_register(xenhost_r1, &xh_hvm_ops);
> +
> + /*
> + * Detect in some implementation defined manner whether this is
> + * nested or not.
> + */
> + if (xen_driver_domain() && xen_nested())
> + xenhost_register(xenhost_r2, &xh_hvm_nested_ops);
> +
> xen_pvh = 1;
> xen_start_flags = pvh_start_info.flags;
>
> diff --git a/arch/x86/xen/xenhost.c b/arch/x86/xen/xenhost.c
> new file mode 100644
> index 000000000000..ca90acd7687e
> --- /dev/null
> +++ b/arch/x86/xen/xenhost.c
> @@ -0,0 +1,75 @@
> +#include <linux/types.h>
> +#include <linux/bug.h>
> +#include <xen/xen.h>
> +#include <xen/xenhost.h>
> +
> +xenhost_t xenhosts[2];
> +/*
> + * xh_default: interface to the regular hypervisor. xenhost_type is xenhost_r0
> + * or xenhost_r1.
> + *
> + * xh_remote: interface to remote hypervisor. Needed for PV driver support on
> + * L1-dom0/driver-domain for nested Xen. xenhost_type is xenhost_r2.
> + */
> +xenhost_t *xh_default = (xenhost_t *) &xenhosts[0];
> +xenhost_t *xh_remote = (xenhost_t *) &xenhosts[1];
> +
> +/*
> + * Exported for use of for_each_xenhost().
> + */
> +EXPORT_SYMBOL_GPL(xenhosts);
> +
> +/*
> + * Some places refer directly to a specific type of xenhost.
> + * This might be better as a macro though.
> + */
> +EXPORT_SYMBOL_GPL(xh_default);
> +EXPORT_SYMBOL_GPL(xh_remote);
> +
> +void xenhost_register(enum xenhost_type type, xenhost_ops_t *ops)
> +{
> + switch (type) {
> + case xenhost_r0:
> + case xenhost_r1:
> + BUG_ON(xh_default->type != xenhost_invalid);
> +
> + xh_default->type = type;
> + xh_default->ops = ops;
> + break;
> + case xenhost_r2:
> + BUG_ON(xh_remote->type != xenhost_invalid);
> +
> + /*
> + * We should have a default xenhost by the
> + * time xh_remote is registered.
> + */
> + BUG_ON(!xh_default);
> +
> + xh_remote->type = type;
> + xh_remote->ops = ops;
> + break;
> + default:
> + BUG();
> + }
> +}
> +
> +/*
> + * __xenhost_unregister: expected to be called only if there's an
> + * error early in the init.
> + */
> +void __xenhost_unregister(enum xenhost_type type)
> +{
> + switch (type) {
> + case xenhost_r0:
> + case xenhost_r1:
> + xh_default->type = xenhost_invalid;
> + xh_default->ops = NULL;
> + break;
> + case xenhost_r2:
> + xh_remote->type = xenhost_invalid;
> + xh_remote->ops = NULL;
> + break;
> + default:
> + BUG();
> + }
> +}
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index 0e2156786ad2..540db8459536 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -42,6 +42,9 @@ extern struct hvm_start_info pvh_start_info;
> #define xen_initial_domain() (0)
> #endif /* CONFIG_XEN_DOM0 */
>
> +#define xen_driver_domain() xen_initial_domain()
> +#define xen_nested() 0
> +
> struct bio_vec;
> bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> const struct bio_vec *vec2);
> diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
> new file mode 100644
> index 000000000000..a58e883f144e
> --- /dev/null
> +++ b/include/xen/xenhost.h
> @@ -0,0 +1,95 @@
> +#ifndef __XENHOST_H
> +#define __XENHOST_H
> +
> +/*
> + * Xenhost abstracts out the Xen interface. It co-exists with the PV/HVM/PVH
> + * abstractions (x86_init, hypervisor_x86, pv_ops etc) and is meant to
> + * expose ops for communication between the guest and Xen (hypercall, cpuid,
> + * shared_info/vcpu_info, evtchn, grant-table and on top of those, xenbus, ballooning),
> + * so these could differ based on the kind of underlying Xen: regular, local,
> + * and nested.
> + *
> + * Any call-sites which initiate communication with the hypervisor take
> + * xenhost_t * as a parameter and use the appropriate xenhost interface.
> + *
> + * Note, that the init for the nested xenhost (in the nested dom0 case,
> + * there are two) happens for each operation alongside the default xenhost
> + * (which remains similar to the one now) and is not deferred for later.
> + * This allows us to piggy-back on the non-trivial sequencing, inter-locking
> + * logic in the init of the default xenhost.
> + */
> +
> +/*
> + * xenhost_type: specifies the controlling Xen interface. The notation,
> + * xenhost_r0, xenhost_r1, xenhost_r2 is meant to invoke hypervisor distance
> + * from the guest.

This naming makes it hard to correlate the different things: in the
nested case xenhost_r2 means the L0 hypervisor, which is what
xenhost_r1 means in the non-nested case.

What about: xenhost_local (instead xenhost_r0), xenhost_direct (instead
xenhost_r1) and xenhost_nested (instead xenhost_r2). Or you use an
integer to denote the distance enabling even deeper nesting levels (at
least in theory).
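A sketch of what that renaming could look like; the names and the distance helper are illustrative, not from the series:

```c
/*
 * Hypothetical renaming: encode the hypervisor's role by its distance
 * from the guest instead of the r0/r1/r2 suffixes.
 */
enum xenhost_type {
	xenhost_invalid = 0,
	xenhost_local,		/* was xenhost_r0: same address space as the guest */
	xenhost_direct,		/* was xenhost_r1: ordinary, non-nested interface */
	xenhost_nested,		/* was xenhost_r2: one nesting level removed */
};

/*
 * The integer-distance variant: 0 for local, 1 for direct, 2 for nested,
 * extensible to deeper nesting levels in theory.
 */
static int xenhost_distance(enum xenhost_type type)
{
	switch (type) {
	case xenhost_local:	return 0;
	case xenhost_direct:	return 1;
	case xenhost_nested:	return 2;
	default:		return -1;
	}
}
```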

> + *
> + * Note that the distance is relative, and so does not identify a specific
> + * hypervisor, just the role played by the interface: so, instance for L0-guest
> + * xenhost_r1 would be L0-Xen and for an L1-guest, L1-Xen.
> + */
> +enum xenhost_type {
> + xenhost_invalid = 0,
> + /*
> + * xenhost_r1: the guest's frontend or backend drivers talking
> + * to a hypervisor one level removed.
> + * This is the ordinary, non-nested configuration as well as for the
> + * typical nested frontends and backends.
> + *
> + * The corresponding xenhost_t would continue to use the current
> + * interfaces, via a redirection layer.
> + */
> + xenhost_r1,
> +
> + /*
> + * xenhost_r2: frontend drivers communicating with a hypervisor two
> + * levels removed: so L1-dom0-frontends communicating with L0-Xen.
> + *
> + * This is the nested-Xen configuration: L1-dom0-frontend drivers can
> + * now talk to L0-dom0-backend drivers via a separate xenhost_t.
> + */
> + xenhost_r2,
> +
> + /*
> + * Local/Co-located case: backend drivers now run in the same address
> + * space as the hypervisor. The driver model remains same as
> + * xenhost_r1, but with slightly different interfaces.
> + *
> + * Any frontend guests of this hypervisor will continue to be
> + * xenhost_r1.
> + */
> + xenhost_r0,
> +};
> +
> +struct xenhost_ops;
> +
> +typedef struct {
> + enum xenhost_type type;
> +
> + struct xenhost_ops *ops;
> +} xenhost_t;
> +
> +typedef struct xenhost_ops {
> +} xenhost_ops_t;
> +
> +extern xenhost_t *xh_default, *xh_remote;
> +extern xenhost_t xenhosts[2];

Use a max nesting level define here and ...

> +
> +/*
> + * xenhost_register(): is called early in the guest's xen-init, after it detects
> + * in some implementation defined manner what kind of underlying xenhost or
> + * xenhosts exist.
> + * Specifies the type of xenhost being registered and the ops for that.
> + */
> +void xenhost_register(enum xenhost_type type, xenhost_ops_t *ops);
> +void __xenhost_unregister(enum xenhost_type type);
> +
> +
> +/*
> + * Convoluted interface so we can do this without adding a loop counter.
> + */
> +#define for_each_xenhost(xh) \
> + for ((xh) = (xenhost_t **) &xenhosts[0]; \
> + (((xh) - (xenhost_t **)&xenhosts) < 2) && (*xh)->type != xenhost_invalid; (xh)++)

... here, too.
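Concretely, the suggestion could look like the sketch below, with a single define sizing both the array and the iterator. XENHOST_MAX and the stub types are stand-ins for the real definitions in the patch:

```c
/* Illustrative stand-ins for the types posted in the patch. */
enum xenhost_type { xenhost_invalid = 0, xenhost_r1, xenhost_r2 };

typedef struct {
	enum xenhost_type type;
} xenhost_t;

/* One define controls the supported nesting depth... */
#define XENHOST_MAX 2

static xenhost_t xenhosts[XENHOST_MAX] = {
	{ .type = xenhost_r1 },		/* registered default xenhost */
	{ .type = xenhost_invalid },	/* no remote xenhost registered */
};

/* ...and the same define bounds the iterator, still with no loop counter. */
#define for_each_xenhost(xh) \
	for ((xh) = &xenhosts[0]; \
	     (xh) - &xenhosts[0] < XENHOST_MAX && (xh)->type != xenhost_invalid; \
	     (xh)++)

static int xenhost_count(void)
{
	xenhost_t *xh;
	int n = 0;

	for_each_xenhost(xh)
		n++;
	return n;
}
```

As a side effect, the iterator can then use a plain xenhost_t * rather than the double-pointer casts in the posted macro.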

> +
> +#endif /* __XENHOST_H */
>


Juergen

2019-06-07 15:10:13

by Jürgen Groß

Subject: Re: [RFC PATCH 06/16] x86/xen: add shared_info support to xenhost_t

On 09.05.19 19:25, Ankur Arora wrote:
> HYPERVISOR_shared_info is used for irq/evtchn communication between the
> guest and the host. Abstract out the setup/reset in xenhost_t such that
> nested configurations can use both xenhosts simultaneously.

I have mixed feelings about this patch. Most of the shared_info stuff we
don't need for the nested case. In the end only the event channels might
be interesting, but we obviously want them not for all vcpus of the L1
hypervisor, but for those of the current guest.

So I think just drop that patch for now. We can dig it out later in case
nesting wants it again.


Juergen

2019-06-07 15:42:47

by Joao Martins

Subject: Re: [RFC PATCH 00/16] xenhost support

On 6/7/19 3:51 PM, Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> Hi all,
>>
>> This is an RFC for xenhost support, outlined here by Juergen here:
>> https://lkml.org/lkml/2019/4/8/67.
>
> First: thanks for all the effort you've put into this series!
>
>> The high level idea is to provide an abstraction of the Xen
>> communication interface, as a xenhost_t.
>>
>> xenhost_t exposes ops for communication between the guest and Xen
>> (hypercall, cpuid, shared_info/vcpu_info, evtchn, grant-table and on top
>> of those, xenbus, ballooning), and these can differ based on the kind
>> of underlying Xen: regular, local, and nested.
>
> I'm not sure we need to abstract away hypercalls and cpuid. I believe in
> case of nested Xen all contacts to the L0 hypervisor should be done via
> the L1 hypervisor. So we might need to issue some kind of passthrough
> hypercall when e.g. granting a page to L0 dom0, but this should be
> handled via the grant abstraction (events should be similar).
>
Just to be clear: By "kind of passthrough hypercall" you mean (e.g. for every
access/modify of grant table frames) you would proxy hypercall to L0 Xen via L1 Xen?

> So IMO we should drop patches 2-5.
>
>> (Since this abstraction is largely about guest -- xenhost communication,
>> no ops are needed for timer, clock, sched, memory (MMU, P2M), VCPU mgmt.
>> etc.)
>>
>> Xenhost use-cases:
>>
>> Regular-Xen: the standard Xen interface presented to a guest,
>> specifically for communication between Lx-guest and Lx-Xen.
>>
>> Local-Xen: a Xen like interface which runs in the same address space as
>> the guest (dom0). This can act as the default xenhost.
>>
>> The major ways it differs from a regular Xen interface is in presenting
>> a different hypercall interface (call instead of a syscall/vmcall), and
>> in an inability to do grant-mappings: since local-Xen exists in the same
>> address space as Xen, there's no way for it to cheaply change the
>> physical page that a GFN maps to (assuming no P2M tables.)
>>
>> Nested-Xen: this channel is to Xen, one level removed: from L1-guest to
>> L0-Xen. The use case is that we want L0-dom0-backends to talk to
>> L1-dom0-frontend drivers which can then present PV devices which can
>> in turn be used by the L1-dom0-backend drivers as raw underlying devices.
>> The interfaces themselves broadly remain similar.
>>
>> Note: L0-Xen, L1-Xen represent Xen running at that nesting level
>> and L0-guest, L1-guest represent guests that are children of Xen
>> at that nesting level. Lx, represents any level.
>>
>> Patches 1-7,
>> "x86/xen: add xenhost_t interface"
>> "x86/xen: cpuid support in xenhost_t"
>> "x86/xen: make hypercall_page generic"
>> "x86/xen: hypercall support for xenhost_t"
>> "x86/xen: add feature support in xenhost_t"
>> "x86/xen: add shared_info support to xenhost_t"
>> "x86/xen: make vcpu_info part of xenhost_t"
>> abstract out interfaces that setup hypercalls/cpuid/shared_info/vcpu_info etc.
>>
>> Patch 8, "x86/xen: irq/upcall handling with multiple xenhosts"
>> sets up the upcall and pv_irq ops based on vcpu_info.
>>
>> Patch 9, "xen/evtchn: support evtchn in xenhost_t" adds xenhost based
>> evtchn support for evtchn_2l.
>>
>> Patches 10 and 16, "xen/balloon: support ballooning in xenhost_t" and
>> "xen/grant-table: host_addr fixup in mapping on xenhost_r0"
>> implement support for GNTTABOP_map_grant_ref for xenhosts of type
>> xenhost_r0 (xenhost local.)
>>
>> Patch 12, "xen/xenbus: support xenbus frontend/backend with xenhost_t"
>> reworks xenbus so that both its frontend and backend can be bootstrapped
>> separately via separate xenhosts.
>>
>> Remaining patches, 11, 13, 14, 15:
>> "xen/grant-table: make grant-table xenhost aware"
>> "drivers/xen: gnttab, evtchn, xenbus API changes"
>> "xen/blk: gnttab, evtchn, xenbus API changes"
>> "xen/net: gnttab, evtchn, xenbus API changes"
>> are mostly mechanical changes for APIs that now take xenhost_t *
>> as parameter.
>>
>> The code itself is RFC quality, and is mostly meant to get feedback before
>> proceeding further. Also note that the FIFO logic and some Xen drivers
>> (input, pciback, scsi etc) are mostly unchanged, so will not build.
>>
>>
>> Please take a look.
>
>
> Juergen
>

2019-06-07 16:22:50

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] xenhost support

On 07.06.19 17:22, Joao Martins wrote:
> On 6/7/19 3:51 PM, Juergen Gross wrote:
>> On 09.05.19 19:25, Ankur Arora wrote:
>>> Hi all,
>>>
>>> This is an RFC for xenhost support, outlined here by Juergen here:
>>> https://lkml.org/lkml/2019/4/8/67.
>>
>> First: thanks for all the effort you've put into this series!
>>
>>> The high level idea is to provide an abstraction of the Xen
>>> communication interface, as a xenhost_t.
>>>
>>> xenhost_t exposes ops for communication between the guest and Xen
>>> (hypercall, cpuid, shared_info/vcpu_info, evtchn, grant-table and on top
>>> of those, xenbus, ballooning), and these can differ based on the kind
>>> of underlying Xen: regular, local, and nested.
>>
>> I'm not sure we need to abstract away hypercalls and cpuid. I believe in
>> case of nested Xen all contacts to the L0 hypervisor should be done via
>> the L1 hypervisor. So we might need to issue some kind of passthrough
>> hypercall when e.g. granting a page to L0 dom0, but this should be
>> handled via the grant abstraction (events should be similar).
>>
> Just to be clear: By "kind of passthrough hypercall" you mean (e.g. for every
> access/modify of grant table frames) you would proxy hypercall to L0 Xen via L1 Xen?

It might be possible to spare some hypercalls by directly writing to
grant frames mapped into L1 dom0, but in general you are right.


Juergen
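The proxying model Juergen describes can be sketched as a dispatch table in which the nested xenhost's grant op forwards through the L1 hypervisor instead of invoking L0 directly. Everything below is a hypothetical illustration of that shape, not code from the series or an existing Xen interface: the op number, the function names, and the forwarding path are all assumptions.

```c
#include <assert.h>

/* Hypothetical grant-table op number; illustration only. */
#define GNTTABOP_map_grant_ref 0

/*
 * Stand-ins for the two paths: a direct hypercall into L1 Xen, and a
 * "passthrough" hypercall that asks L1 Xen to forward the op to L0.
 * The return values just tag which path ran.
 */
static int l1_grant_op(int op, void *args) { (void)args; return 0x100 | op; }
static int l1_proxy_grant_op_to_l0(int op, void *args) { (void)args; return 0x200 | op; }

typedef struct xenhost_ops {
    int (*grant_op)(int op, void *args);
} xenhost_ops_t;

typedef struct {
    xenhost_ops_t *ops;
} xenhost_t;

/* The direct xenhost talks to L1 Xen; the nested xenhost proxies via L1 Xen. */
static xenhost_ops_t xh_direct_ops = { .grant_op = l1_grant_op };
static xenhost_ops_t xh_nested_ops = { .grant_op = l1_proxy_grant_op_to_l0 };

xenhost_t xh_direct = { .ops = &xh_direct_ops };
xenhost_t xh_nested = { .ops = &xh_nested_ops };

/* Callers pick the xenhost; the op itself stays the same. */
int xenhost_grant_table_op(xenhost_t *xh, int op, void *args)
{
    return xh->ops->grant_op(op, args);
}
```

The point of the sketch is that frontend code would call xenhost_grant_table_op() identically in both cases; only the registered ops decide whether the op is handled by L1 Xen or forwarded to L0, which is also where L1 Xen could batch or elide hypercalls by writing grant frames directly.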

2019-06-08 05:06:13

by Ankur Arora

[permalink] [raw]
Subject: Re: [RFC PATCH 06/16] x86/xen: add shared_info support to xenhost_t

On 2019-06-07 8:08 a.m., Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> HYPERVISOR_shared_info is used for irq/evtchn communication between the
>> guest and the host. Abstract out the setup/reset in xenhost_t such that
>> nested configurations can use both xenhosts simultaneously.
>
> I have mixed feelings about this patch. Most of the shared_info stuff we
> don't need for the nested case. In the end only the event channels might
> be interesting, but we obviously want them not for all vcpus of the L1
> hypervisor, but for those of the current guest.
Agreed about the mixed feelings part. shared_info does feel far too
heavy to drag along just for the event-channel state.
In fact, on thinking a bit more, a better abstraction for nested
event-channels would have been as an extension to the primary
xenhost's event-channel bits.
(The nested upcalls also go via the primary xenhost in patch-8.)

Ankur

>
> So I think just drop that patch for now. We can dig it out later in case
> nesting wants it again.
>
>
> Juergen

2019-06-08 06:14:23

by Ankur Arora

[permalink] [raw]
Subject: Re: [RFC PATCH 00/16] xenhost support

On 2019-06-07 7:51 a.m., Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> Hi all,
>>
>> This is an RFC for xenhost support, outlined here by Juergen here:
>> https://lkml.org/lkml/2019/4/8/67.
>
> First: thanks for all the effort you've put into this series!
>
>> The high level idea is to provide an abstraction of the Xen
>> communication interface, as a xenhost_t.
>>
>> xenhost_t exposes ops for communication between the guest and Xen
>> (hypercall, cpuid, shared_info/vcpu_info, evtchn, grant-table and on top
>> of those, xenbus, ballooning), and these can differ based on the kind
>> of underlying Xen: regular, local, and nested.
>
> I'm not sure we need to abstract away hypercalls and cpuid. I believe in
> case of nested Xen all contacts to the L0 hypervisor should be done via
> the L1 hypervisor. So we might need to issue some kind of passthrough
Yes, that does make sense. This also allows the L1 hypervisor to
control which hypercalls can be nested.
As for cpuid, what about nested feature discovery such as in
gnttab_need_v2()?
(Though for this particular case, the hypercall should be fine.)

> hypercall when e.g. granting a page to L0 dom0, but this should be
> handled via the grant abstraction (events should be similar).
>
> So IMO we should drop patches 2-5.
For 3-5, I'd like to prune them to provide a limited hypercall
registration ability -- this is meant to be used for the
xenhost_r0/xenhost_local case.

Ankur
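The "limited hypercall registration ability" mentioned above for the local (xenhost_r0) case might look roughly like the sketch below: since the hypervisor shares the address space with the guest, a "hypercall" is just an indirect call through a registered function table rather than a vmcall/syscall. The registration API names are assumptions for illustration; only the hypercall number matches the Xen ABI.

```c
#include <assert.h>
#include <stddef.h>

#define __HYPERVISOR_event_channel_op  32   /* number as in the Xen ABI */
#define NR_HYPERCALLS                  64

/*
 * For xenhost_r0 a "hypercall" is a plain function call into
 * co-located hypervisor code, so a table of pointers suffices.
 */
typedef long (*hypercall_fn_t)(void *arg);

static hypercall_fn_t hypercall_table[NR_HYPERCALLS];

int xenhost_r0_register_hypercall(unsigned int nr, hypercall_fn_t fn)
{
    if (nr >= NR_HYPERCALLS || hypercall_table[nr])
        return -1;          /* slot invalid or already taken */
    hypercall_table[nr] = fn;
    return 0;
}

long xenhost_r0_hypercall(unsigned int nr, void *arg)
{
    if (nr >= NR_HYPERCALLS || !hypercall_table[nr])
        return -38;         /* -ENOSYS */
    return hypercall_table[nr](arg);
}

/* A toy event-channel-op handler standing in for local-Xen code. */
static long local_evtchn_op(void *arg) { (void)arg; return 0; }
```

A pruned patch 3-5 could then limit itself to this kind of table setup for xenhost_r0, leaving the regular vmcall/syscall hypercall page untouched for the other xenhost types.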

>
>> (Since this abstraction is largely about guest -- xenhost communication,
>> no ops are needed for timer, clock, sched, memory (MMU, P2M), VCPU mgmt.
>> etc.)
>>
>> Xenhost use-cases:
>>
>> Regular-Xen: the standard Xen interface presented to a guest,
>> specifically for communication between Lx-guest and Lx-Xen.
>>
>> Local-Xen: a Xen like interface which runs in the same address space as
>> the guest (dom0). This can act as the default xenhost.
>>
>> The major ways it differs from a regular Xen interface is in presenting
>> a different hypercall interface (call instead of a syscall/vmcall), and
>> in an inability to do grant-mappings: since local-Xen exists in the same
>> address space as Xen, there's no way for it to cheaply change the
>> physical page that a GFN maps to (assuming no P2M tables.)
>>
>> Nested-Xen: this channel is to Xen, one level removed: from L1-guest to
>> L0-Xen. The use case is that we want L0-dom0-backends to talk to
>> L1-dom0-frontend drivers which can then present PV devices which can
>> in turn be used by the L1-dom0-backend drivers as raw underlying devices.
>> The interfaces themselves broadly remain similar.
>>
>> Note: L0-Xen, L1-Xen represent Xen running at that nesting level
>> and L0-guest, L1-guest represent guests that are children of Xen
>> at that nesting level. Lx, represents any level.
>>
>> Patches 1-7,
>>    "x86/xen: add xenhost_t interface"
>>    "x86/xen: cpuid support in xenhost_t"
>>    "x86/xen: make hypercall_page generic"
>>    "x86/xen: hypercall support for xenhost_t"
>>    "x86/xen: add feature support in xenhost_t"
>>    "x86/xen: add shared_info support to xenhost_t"
>>    "x86/xen: make vcpu_info part of xenhost_t"
>> abstract out interfaces that setup
>> hypercalls/cpuid/shared_info/vcpu_info etc.
>>
>> Patch 8, "x86/xen: irq/upcall handling with multiple xenhosts"
>> sets up the upcall and pv_irq ops based on vcpu_info.
>>
>> Patch 9, "xen/evtchn: support evtchn in xenhost_t" adds xenhost based
>> evtchn support for evtchn_2l.
>>
>> Patches 10 and 16, "xen/balloon: support ballooning in xenhost_t" and
>> "xen/grant-table: host_addr fixup in mapping on xenhost_r0"
>> implement support for GNTTABOP_map_grant_ref for xenhosts of type
>> xenhost_r0 (xenhost local.)
>>
>> Patch 12, "xen/xenbus: support xenbus frontend/backend with xenhost_t"
>> reworks xenbus so that both its frontend and backend can be bootstrapped
>> separately via separate xenhosts.
>>
>> Remaining patches, 11, 13, 14, 15:
>>    "xen/grant-table: make grant-table xenhost aware"
>>    "drivers/xen: gnttab, evtchn, xenbus API changes"
>>    "xen/blk: gnttab, evtchn, xenbus API changes"
>>    "xen/net: gnttab, evtchn, xenbus API changes"
>> are mostly mechanical changes for APIs that now take xenhost_t *
>> as parameter.
>>
>> The code itself is RFC quality, and is mostly meant to get feedback
>> before
>> proceeding further. Also note that the FIFO logic and some Xen drivers
>> (input, pciback, scsi etc) are mostly unchanged, so will not build.
>>
>>
>> Please take a look.
>
>
> Juergen

2019-06-08 06:16:14

by Ankur Arora

[permalink] [raw]
Subject: Re: [Xen-devel] [RFC PATCH 00/16] xenhost support

On 2019-06-07 9:21 a.m., Juergen Gross wrote:
> On 07.06.19 17:22, Joao Martins wrote:
>> On 6/7/19 3:51 PM, Juergen Gross wrote:
>>> On 09.05.19 19:25, Ankur Arora wrote:
>>>> Hi all,
>>>>
>>>> This is an RFC for xenhost support, outlined here by Juergen here:
>>>> https://lkml.org/lkml/2019/4/8/67.
>>>
>>> First: thanks for all the effort you've put into this series!
>>>
>>>> The high level idea is to provide an abstraction of the Xen
>>>> communication interface, as a xenhost_t.
>>>>
>>>> xenhost_t exposes ops for communication between the guest and Xen
>>>> (hypercall, cpuid, shared_info/vcpu_info, evtchn, grant-table and on
>>>> top
>>>> of those, xenbus, ballooning), and these can differ based on the kind
>>>> of underlying Xen: regular, local, and nested.
>>>
>>> I'm not sure we need to abstract away hypercalls and cpuid. I believe in
>>> case of nested Xen all contacts to the L0 hypervisor should be done via
>>> the L1 hypervisor. So we might need to issue some kind of passthrough
>>> hypercall when e.g. granting a page to L0 dom0, but this should be
>>> handled via the grant abstraction (events should be similar).
>>>
>> Just to be clear: By "kind of passthrough hypercall" you mean (e.g.
>> for every
>> access/modify of grant table frames) you would proxy hypercall to L0
>> Xen via L1 Xen?
>
> It might be possible to spare some hypercalls by directly writing to
> grant frames mapped into L1 dom0, but in general you are right.
Wouldn't we still need map/unmap_grant_ref?
AFAICS, both the xenhost_direct and the xenhost_indirect cases should be
very similar (apart from the need to proxy in the indirect case.)

Ankur

>
>
> Juergen
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> https://lists.xenproject.org/mailman/listinfo/xen-devel

2019-06-11 07:17:23

by Ankur Arora

[permalink] [raw]
Subject: Re: [RFC PATCH 01/16] x86/xen: add xenhost_t interface

On 2019-06-07 8:04 a.m., Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> Add xenhost_t which will serve as an abstraction over Xen interfaces.
>> It co-exists with the PV/HVM/PVH abstractions (x86_init, hypervisor_x86,
>> pv_ops etc) and is meant to capture mechanisms for communication with
>> Xen so we could have different types of underlying Xen: regular, local,
>> and nested.
>>
>> Also add xenhost_register() and stub registration in the various guest
>> types.
>>
>> Signed-off-by: Ankur Arora <[email protected]>
>> ---
>>   arch/x86/xen/Makefile        |  1 +
>>   arch/x86/xen/enlighten_hvm.c | 13 +++++
>>   arch/x86/xen/enlighten_pv.c  | 16 ++++++
>>   arch/x86/xen/enlighten_pvh.c | 12 +++++
>>   arch/x86/xen/xenhost.c       | 75 ++++++++++++++++++++++++++++
>>   include/xen/xen.h            |  3 ++
>>   include/xen/xenhost.h        | 95 ++++++++++++++++++++++++++++++++++++
>>   7 files changed, 215 insertions(+)
>>   create mode 100644 arch/x86/xen/xenhost.c
>>   create mode 100644 include/xen/xenhost.h
>>
>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
>> index 084de77a109e..564b4dddbc15 100644
>> --- a/arch/x86/xen/Makefile
>> +++ b/arch/x86/xen/Makefile
>> @@ -18,6 +18,7 @@ obj-y                += mmu.o
>>   obj-y                += time.o
>>   obj-y                += grant-table.o
>>   obj-y                += suspend.o
>> +obj-y                += xenhost.o
>>   obj-$(CONFIG_XEN_PVHVM)        += enlighten_hvm.o
>>   obj-$(CONFIG_XEN_PVHVM)        += mmu_hvm.o
>> diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
>> index 0e75642d42a3..100452f4f44c 100644
>> --- a/arch/x86/xen/enlighten_hvm.c
>> +++ b/arch/x86/xen/enlighten_hvm.c
>> @@ -5,6 +5,7 @@
>>   #include <linux/kexec.h>
>>   #include <linux/memblock.h>
>> +#include <xen/xenhost.h>
>>   #include <xen/features.h>
>>   #include <xen/events.h>
>>   #include <xen/interface/memory.h>
>> @@ -82,6 +83,12 @@ static void __init xen_hvm_init_mem_mapping(void)
>>       xen_vcpu_info_reset(0);
>>   }
>> +xenhost_ops_t xh_hvm_ops = {
>> +};
>> +
>> +xenhost_ops_t xh_hvm_nested_ops = {
>> +};
>> +
>>   static void __init init_hvm_pv_info(void)
>>   {
>>       int major, minor;
>> @@ -179,6 +186,12 @@ static void __init xen_hvm_guest_init(void)
>>   {
>>       if (xen_pv_domain())
>>           return;
>> +    /*
>> +     * We need only xenhost_r1 for HVM guests since they cannot be
>> +     * driver domain (?) or dom0.
>
> I think even HVM guests could (in theory) be driver domains.
>
>> +     */
>> +    if (!xen_pvh_domain())
>> +        xenhost_register(xenhost_r1, &xh_hvm_ops);
>>       init_hvm_pv_info();
>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>> index c54a493e139a..bb6e811c1525 100644
>> --- a/arch/x86/xen/enlighten_pv.c
>> +++ b/arch/x86/xen/enlighten_pv.c
>> @@ -36,6 +36,7 @@
>>   #include <xen/xen.h>
>>   #include <xen/events.h>
>> +#include <xen/xenhost.h>
>>   #include <xen/interface/xen.h>
>>   #include <xen/interface/version.h>
>>   #include <xen/interface/physdev.h>
>> @@ -1188,6 +1189,12 @@ static void __init
>> xen_dom0_set_legacy_features(void)
>>       x86_platform.legacy.rtc = 1;
>>   }
>> +xenhost_ops_t xh_pv_ops = {
>> +};
>> +
>> +xenhost_ops_t xh_pv_nested_ops = {
>> +};
>> +
>>   /* First C function to be called on Xen boot */
>>   asmlinkage __visible void __init xen_start_kernel(void)
>>   {
>> @@ -1198,6 +1205,15 @@ asmlinkage __visible void __init
>> xen_start_kernel(void)
>>       if (!xen_start_info)
>>           return;
>> +    xenhost_register(xenhost_r1, &xh_pv_ops);
>> +
>> +    /*
>> +     * Detect in some implementation defined manner whether this is
>> +     * nested or not.
>> +     */
>> +    if (xen_driver_domain() && xen_nested())
>> +        xenhost_register(xenhost_r2, &xh_pv_nested_ops);
>
> I don't think a driver domain other than dom0 "knows" this in the
> beginning. It will need to register xenhost_r2
Right. No point in needlessly registering as xenhost_r2 without
needing to handle any xenhost_r2 devices.

> in case it learns about a pv device from L0 hypervisor.
What's the mechanism you are thinking of, for this?
I'm guessing this PV device notification could arrive at an
arbitrary point in time after the system has booted.

The earlier reason for my assumption that the driver-domain
would "know" this at boot was that it seemed to me
that we would need to set up hypercall/shared_info/vcpu_info.

Given that we don't need cpuid/hypercall/shared_info, the remaining
few look like they could be made dynamically callable with a bit
of refactoring:
- vcpu_info: the registration logic (xen_vcpu_setup() and friends)
seems straight-forwardly adaptable to be called dynamically for
xenhost_r2. Places where we touch the vcpu_info bits (xen_irq_ops)
also seem fine.
- evtchn: xenhost_r2 should only need interdomain evtchns, so
should be easy to defer to until we get a xenhost_r2 device.
- grant-table/xenbus: the xenhost_r2 logic (in the current patchset)
expects to be inited at core_initcall and postcore_initcall
respectively. Again, doesn't look too hard to defer.
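The deferral sketched in the list above — doing xenhost_r2 setup only once an L0 PV device actually shows up — might look something like the following; xh_r2_get() and the lazy-init shape are hypothetical names for illustration, not code from the series.

```c
#include <assert.h>
#include <stddef.h>

enum xenhost_type { xenhost_invalid = 0, xenhost_r1, xenhost_r2 };

typedef struct xenhost_ops { int dummy; } xenhost_ops_t;

typedef struct {
    enum xenhost_type type;
    xenhost_ops_t *ops;
} xenhost_t;

static xenhost_t xh_remote_state;            /* zero-initialized: invalid */
xenhost_t *xh_remote = &xh_remote_state;
static xenhost_ops_t xh_nested_ops;

/*
 * Lazy registration: nothing is set up at boot. The first time a
 * xenhost_r2 device notification arrives, register the xenhost and
 * do the dynamic vcpu_info/evtchn/grant-table/xenbus setup.
 */
xenhost_t *xh_r2_get(void)
{
    if (xh_remote->type == xenhost_invalid) {
        xh_remote->type = xenhost_r2;
        xh_remote->ops = &xh_nested_ops;
        /* ... dynamic vcpu_info/evtchn/gnttab/xenbus init here ... */
    }
    return xh_remote;
}
```

Anything that handles the L0 PV-device notification would call xh_r2_get() and get a fully initialized xenhost on first use, so no boot-time "am I a nested driver domain?" detection is needed.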

>
>> +
>>       xen_domain_type = XEN_PV_DOMAIN;
>>       xen_start_flags = xen_start_info->flags;
>> diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
>> index 35b7599d2d0b..826c296d27a3 100644
>> --- a/arch/x86/xen/enlighten_pvh.c
>> +++ b/arch/x86/xen/enlighten_pvh.c
>> @@ -8,6 +8,7 @@
>>   #include <asm/e820/api.h>
>>   #include <xen/xen.h>
>> +#include <xen/xenhost.h>
>>   #include <asm/xen/interface.h>
>>   #include <asm/xen/hypercall.h>
>> @@ -21,11 +22,22 @@
>>    */
>>   bool xen_pvh __attribute__((section(".data"))) = 0;
>> +extern xenhost_ops_t xh_hvm_ops, xh_hvm_nested_ops;
>> +
>>   void __init xen_pvh_init(void)
>>   {
>>       u32 msr;
>>       u64 pfn;
>> +    xenhost_register(xenhost_r1, &xh_hvm_ops);
>> +
>> +    /*
>> +     * Detect in some implementation defined manner whether this is
>> +     * nested or not.
>> +     */
>> +    if (xen_driver_domain() && xen_nested())
>> +        xenhost_register(xenhost_r2, &xh_hvm_nested_ops);
>> +
>>       xen_pvh = 1;
>>       xen_start_flags = pvh_start_info.flags;
>> diff --git a/arch/x86/xen/xenhost.c b/arch/x86/xen/xenhost.c
>> new file mode 100644
>> index 000000000000..ca90acd7687e
>> --- /dev/null
>> +++ b/arch/x86/xen/xenhost.c
>> @@ -0,0 +1,75 @@
>> +#include <linux/types.h>
>> +#include <linux/bug.h>
>> +#include <xen/xen.h>
>> +#include <xen/xenhost.h>
>> +
>> +xenhost_t xenhosts[2];
>> +/*
>> + * xh_default: interface to the regular hypervisor. xenhost_type is
>> xenhost_r0
>> + * or xenhost_r1.
>> + *
>> + * xh_remote: interface to remote hypervisor. Needed for PV driver
>> support on
>> + * L1-dom0/driver-domain for nested Xen. xenhost_type is xenhost_r2.
>> + */
>> +xenhost_t *xh_default = (xenhost_t *) &xenhosts[0];
>> +xenhost_t *xh_remote = (xenhost_t *) &xenhosts[1];
>> +
>> +/*
>> + * Exported for use of for_each_xenhost().
>> + */
>> +EXPORT_SYMBOL_GPL(xenhosts);
>> +
>> +/*
>> + * Some places refer directly to a specific type of xenhost.
>> + * This might be better as a macro though.
>> + */
>> +EXPORT_SYMBOL_GPL(xh_default);
>> +EXPORT_SYMBOL_GPL(xh_remote);
>> +
>> +void xenhost_register(enum xenhost_type type, xenhost_ops_t *ops)
>> +{
>> +    switch (type) {
>> +        case xenhost_r0:
>> +        case xenhost_r1:
>> +            BUG_ON(xh_default->type != xenhost_invalid);
>> +
>> +            xh_default->type = type;
>> +            xh_default->ops = ops;
>> +            break;
>> +        case xenhost_r2:
>> +            BUG_ON(xh_remote->type != xenhost_invalid);
>> +
>> +            /*
>> +             * We should have a default xenhost by the
>> +             * time xh_remote is registered.
>> +             */
>> +            BUG_ON(!xh_default);
>> +
>> +            xh_remote->type = type;
>> +            xh_remote->ops = ops;
>> +            break;
>> +        default:
>> +            BUG();
>> +    }
>> +}
>> +
>> +/*
>> + * __xenhost_unregister: expected to be called only if there's an
>> + * error early in the init.
>> + */
>> +void __xenhost_unregister(enum xenhost_type type)
>> +{
>> +    switch (type) {
>> +        case xenhost_r0:
>> +        case xenhost_r1:
>> +            xh_default->type = xenhost_invalid;
>> +            xh_default->ops = NULL;
>> +            break;
>> +        case xenhost_r2:
>> +            xh_remote->type = xenhost_invalid;
>> +            xh_remote->ops = NULL;
>> +            break;
>> +        default:
>> +            BUG();
>> +    }
>> +}
>> diff --git a/include/xen/xen.h b/include/xen/xen.h
>> index 0e2156786ad2..540db8459536 100644
>> --- a/include/xen/xen.h
>> +++ b/include/xen/xen.h
>> @@ -42,6 +42,9 @@ extern struct hvm_start_info pvh_start_info;
>>   #define xen_initial_domain()    (0)
>>   #endif    /* CONFIG_XEN_DOM0 */
>> +#define xen_driver_domain()    xen_initial_domain()
>> +#define xen_nested()    0
>> +
>>   struct bio_vec;
>>   bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
>>           const struct bio_vec *vec2);
>> diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
>> new file mode 100644
>> index 000000000000..a58e883f144e
>> --- /dev/null
>> +++ b/include/xen/xenhost.h
>> @@ -0,0 +1,95 @@
>> +#ifndef __XENHOST_H
>> +#define __XENHOST_H
>> +
>> +/*
>> + * Xenhost abstracts out the Xen interface. It co-exists with the
>> PV/HVM/PVH
>> + * abstractions (x86_init, hypervisor_x86, pv_ops etc) and is meant to
>> + * expose ops for communication between the guest and Xen (hypercall,
>> cpuid,
>> + * shared_info/vcpu_info, evtchn, grant-table and on top of those,
>> xenbus, ballooning),
>> + * so these could differ based on the kind of underlying Xen:
>> regular, local,
>> + * and nested.
>> + *
>> + * Any call-sites which initiate communication with the hypervisor take
>> + * xenhost_t * as a parameter and use the appropriate xenhost interface.
>> + *
>> + * Note, that the init for the nested xenhost (in the nested dom0 case,
>> + * there are two) happens for each operation alongside the default
>> xenhost
>> + * (which remains similar to the one now) and is not deferred for later.
>> + * This allows us to piggy-back on the non-trivial sequencing,
>> inter-locking
>> + * logic in the init of the default xenhost.
>> + */
>> +
>> +/*
>> + * xenhost_type: specifies the controlling Xen interface. The notation,
>> + * xenhost_r0, xenhost_r1, xenhost_r2 is meant to invoke hypervisor
>> distance
>> + * from the guest.
>
> This naming makes it hard to correlate the different things: In the
> nested case xenhost_r2 means L0 hypervisor, same as in the non-nested
> case xenhost_r1 does.
Agreed.

>
> What about: xenhost_local (instead xenhost_r0), xenhost_direct (instead
> xenhost_r1) and xenhost_nested (instead xenhost_r2). Or you use an
> integer to denote the distance enabling even deeper nesting levels (at
> least in theory).
These are clearer. Will change.

>
>> + *
>> + * Note that the distance is relative, and so does not identify a specific
>> + * hypervisor, just the role played by the interface: so, for instance, for
>> + * an L0-guest xenhost_r1 would be L0-Xen and for an L1-guest, L1-Xen.
>> + */
>> +enum xenhost_type {
>> +    xenhost_invalid = 0,
>> +    /*
>> +     * xenhost_r1: the guest's frontend or backend drivers talking
>> +     * to a hypervisor one level removed.
>> +     * This is the ordinary, non-nested configuration as well as for the
>> +     * typical nested frontends and backends.
>> +     *
>> +     * The corresponding xenhost_t would continue to use the current
>> +     * interfaces, via a redirection layer.
>> +     */
>> +    xenhost_r1,
>> +
>> +    /*
>> +     * xenhost_r2: frontend drivers communicating with a hypervisor two
>> +     * levels removed: so L1-dom0-frontends communicating with L0-Xen.
>> +     *
>> +     * This is the nested-Xen configuration: L1-dom0-frontend drivers
>> can
>> +     * now talk to L0-dom0-backend drivers via a separate xenhost_t.
>> +     */
>> +    xenhost_r2,
>> +
>> +    /*
>> +     * Local/Co-located case: backend drivers now run in the same
>> address
>> +     * space as the hypervisor. The driver model remains same as
>> +     * xenhost_r1, but with slightly different interfaces.
>> +     *
>> +     * Any frontend guests of this hypervisor will continue to be
>> +     * xenhost_r1.
>> +     */
>> +    xenhost_r0,
>> +};
>> +
>> +struct xenhost_ops;
>> +
>> +typedef struct {
>> +    enum xenhost_type type;
>> +
>> +    struct xenhost_ops *ops;
>> +} xenhost_t;
>> +
>> +typedef struct xenhost_ops {
>> +} xenhost_ops_t;
>> +
>> +extern xenhost_t *xh_default, *xh_remote;
>> +extern xenhost_t xenhosts[2];
>
> Use a max nesting level define here and ...
>
>> +
>> +/*
>> + * xenhost_register(): is called early in the guest's xen-init, after
>> it detects
>> + * in some implementation defined manner what kind of underlying
>> xenhost or
>> + * xenhosts exist.
>> + * Specifies the type of xenhost being registered and the ops for that.
>> + */
>> +void xenhost_register(enum xenhost_type type, xenhost_ops_t *ops);
>> +void __xenhost_unregister(enum xenhost_type type);
>> +
>> +
>> +/*
>> + * Convoluted interface so we can do this without adding a loop counter.
>> + */
>> +#define for_each_xenhost(xh) \
>> +    for ((xh) = (xenhost_t **) &xenhosts[0];    \
>> +        (((xh) - (xenhost_t **)&xenhosts) < 2) && (*xh)->type != xenhost_invalid; (xh)++)
>
> ... here, too.
Sure.

Ankur

>
>> +
>> +#endif /* __XENHOST_H */
>>
>
>
> Juergen
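Putting Juergen's two suggestions from this review together — the clearer type names and a max-nesting-level define shared by the array and the iterator — might look like the sketch below. This is a simplified standalone illustration, not the patch itself: it iterates `xenhost_t *` directly rather than the double-pointer form in the posted macro.

```c
#include <assert.h>

#define XENHOST_MAX 2   /* deepest supported nesting level */

enum xenhost_type {
    xenhost_invalid = 0,
    xenhost_local,    /* was xenhost_r0: co-located hypervisor */
    xenhost_direct,   /* was xenhost_r1: the immediate hypervisor */
    xenhost_nested,   /* was xenhost_r2: one further level removed */
};

typedef struct {
    enum xenhost_type type;
} xenhost_t;

xenhost_t xenhosts[XENHOST_MAX];

/* Iterate over registered xenhosts; stops at the first unused slot. */
#define for_each_xenhost(xh) \
    for ((xh) = &xenhosts[0]; \
         (xh) - &xenhosts[0] < XENHOST_MAX && (xh)->type != xenhost_invalid; \
         (xh)++)
```

With the bound expressed once as XENHOST_MAX, the array size, the loop bound, and (in principle) deeper nesting levels all change in one place.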

2019-06-12 21:16:17

by Andrew Cooper

[permalink] [raw]
Subject: Re: [Xen-devel] [RFC PATCH 04/16] x86/xen: hypercall support for xenhost_t

On 09/05/2019 18:25, Ankur Arora wrote:
> Allow for different hypercall implementations for different xenhost types.
> Nested xenhost, which has two underlying xenhosts, can use both
> simultaneously.
>
> The hypercall macros (HYPERVISOR_*) implicitly use the default xenhost.
> A new macro (hypervisor_*) takes xenhost_t * as a parameter and does the
> right thing.
>
> TODO:
> - Multicalls for now assume the default xenhost
> - xen_hypercall_* symbols are only generated for the default xenhost.
>
> Signed-off-by: Ankur Arora <[email protected]>

Again, what is the hypervisor nesting and/or guest layout here?

I can't think of any case where a single piece of software can
legitimately have two hypercall pages, because if it has one working
one, it is by definition a guest, and therefore not privileged enough to
use the outer one.

~Andrew

2019-06-12 21:18:47

by Andrew Cooper

[permalink] [raw]
Subject: Re: [Xen-devel] [RFC PATCH 02/16] x86/xen: cpuid support in xenhost_t

On 09/05/2019 18:25, Ankur Arora wrote:
> xen_cpuid_base() is used to probe and setup features early in a
> guest's lifetime.
>
> We want this to behave differently depending on xenhost->type: for
> instance, local xenhosts cannot intercept the cpuid instruction at all.
>
> Add op (*cpuid_base)() in xenhost_ops_t.
>
> Signed-off-by: Ankur Arora <[email protected]>

What is the real layout of hypervisor nesting here?

When Xen is at L0, all HVM guests get working CPUID faulting to combat
this problem, because CPUID faulting can be fully emulated even on older
Intel hardware, and AMD hardware.

It is a far cleaner way of fixing the problem.

~Andrew

2019-06-14 07:22:15

by Ankur Arora

[permalink] [raw]
Subject: Re: [Xen-devel] [RFC PATCH 04/16] x86/xen: hypercall support for xenhost_t

On 2019-06-12 2:15 p.m., Andrew Cooper wrote:
> On 09/05/2019 18:25, Ankur Arora wrote:
>> Allow for different hypercall implementations for different xenhost types.
>> Nested xenhost, which has two underlying xenhosts, can use both
>> simultaneously.
>>
>> The hypercall macros (HYPERVISOR_*) implicitly use the default xenhost.
>> A new macro (hypervisor_*) takes xenhost_t * as a parameter and does the
>> right thing.
>>
>> TODO:
>> - Multicalls for now assume the default xenhost
>> - xen_hypercall_* symbols are only generated for the default xenhost.
>>
>> Signed-off-by: Ankur Arora <[email protected]>
>
> Again, what is the hypervisor nesting and/or guest layout here?
Two hypervisors, L0 and L1, and the guest is a child of the L1
hypervisor but could have PV devices attached to both L0 and L1
hypervisors.

>
> I can't think of any case where a single piece of software can
> legitimately have two hypercall pages, because if it has one working
> one, it is by definition a guest, and therefore not privileged enough to
> use the outer one.
Depending on which hypercall page is used, the hypercall would
(eventually) land in the corresponding hypervisor.

Juergen elsewhere pointed out proxying hypercalls is a better approach,
so I'm not really considering this any more but, given this layout, and
assuming that the hypercall pages could be encoded differently would it
still not work?

Ankur

>
> ~Andrew
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> https://lists.xenproject.org/mailman/listinfo/xen-devel
>

2019-06-14 07:36:02

by Jürgen Groß

[permalink] [raw]
Subject: Re: [Xen-devel] [RFC PATCH 04/16] x86/xen: hypercall support for xenhost_t

On 14.06.19 09:20, Ankur Arora wrote:
> On 2019-06-12 2:15 p.m., Andrew Cooper wrote:
>> On 09/05/2019 18:25, Ankur Arora wrote:
>>> Allow for different hypercall implementations for different xenhost
>>> types.
>>> Nested xenhost, which has two underlying xenhosts, can use both
>>> simultaneously.
>>>
>>> The hypercall macros (HYPERVISOR_*) implicitly use the default xenhost.
>>> A new macro (hypervisor_*) takes xenhost_t * as a parameter and does the
>>> right thing.
>>>
>>> TODO:
>>>    - Multicalls for now assume the default xenhost
>>>    - xen_hypercall_* symbols are only generated for the default xenhost.
>>>
>>> Signed-off-by: Ankur Arora <[email protected]>
>>
>> Again, what is the hypervisor nesting and/or guest layout here?
> Two hypervisors, L0 and L1, and the guest is a child of the L1
> hypervisor but could have PV devices attached to both L0 and L1
> hypervisors.
>
>>
>> I can't think of any case where a single piece of software can
>> legitimately have two hypercall pages, because if it has one working
>> one, it is by definition a guest, and therefore not privileged enough to
>> use the outer one.
> Depending on which hypercall page is used, the hypercall would
> (eventually) land in the corresponding hypervisor.
>
> Juergen elsewhere pointed out proxying hypercalls is a better approach,
> so I'm not really considering this any more but, given this layout, and
> assuming that the hypercall pages could be encoded differently would it
> still not work?

Hypercalls might work, but it is a bad idea and a violation of layering
to let a L1 guest issue hypercalls to L0 hypervisor, as those hypercalls
could influence other L1 guests and even the L1 hypervisor.

Hmm, thinking more about it, I even doubt those hypercalls could work in
all cases: when issued from an L1 PV guest the hypercalls would seem to
be issued from user mode for the L0 hypervisor, and this is not allowed.


Juergen

2019-06-14 08:02:46

by Andrew Cooper

[permalink] [raw]
Subject: Re: [Xen-devel] [RFC PATCH 04/16] x86/xen: hypercall support for xenhost_t

On 14/06/2019 08:35, Juergen Gross wrote:
> On 14.06.19 09:20, Ankur Arora wrote:
>> On 2019-06-12 2:15 p.m., Andrew Cooper wrote:
>>> On 09/05/2019 18:25, Ankur Arora wrote:
>>>> Allow for different hypercall implementations for different xenhost
>>>> types.
>>>> Nested xenhost, which has two underlying xenhosts, can use both
>>>> simultaneously.
>>>>
>>>> The hypercall macros (HYPERVISOR_*) implicitly use the default
>>>> xenhost.
>>>> A new macro (hypervisor_*) takes xenhost_t * as a parameter and
>>>> does the
>>>> right thing.
>>>>
>>>> TODO:
>>>>    - Multicalls for now assume the default xenhost
>>>>    - xen_hypercall_* symbols are only generated for the default
>>>> xenhost.
>>>>
>>>> Signed-off-by: Ankur Arora <[email protected]>
>>>
>>> Again, what is the hypervisor nesting and/or guest layout here?
>> Two hypervisors, L0 and L1, and the guest is a child of the L1
>> hypervisor but could have PV devices attached to both L0 and L1
>> hypervisors.
>>
>>>
>>> I can't think of any case where a single piece of software can
>>> legitimately have two hypercall pages, because if it has one working
>>> one, it is by definition a guest, and therefore not privileged
>>> enough to
>>> use the outer one.
>> Depending on which hypercall page is used, the hypercall would
>> (eventually) land in the corresponding hypervisor.
>>
>> Juergen elsewhere pointed out proxying hypercalls is a better approach,
>> so I'm not really considering this any more but, given this layout, and
>> assuming that the hypercall pages could be encoded differently, would it
>> still not work?
>
> Hypercalls might work, but it is a bad idea and a violation of layering
> to let a L1 guest issue hypercalls to L0 hypervisor, as those hypercalls
> could influence other L1 guests and even the L1 hypervisor.
>
> Hmm, thinking more about it, I even doubt those hypercalls could work in
> all cases: when issued from a L1 PV guest the hypercalls would seem to
> be issued from user mode for the L0 hypervisor, and this is not allowed.

That is exactly the point I was trying to make.

If L2 is an HVM guest, then both its hypercall pages will be using
VMCALL/VMMCALL which will end up making hypercalls to L1, rather than
having one go to L0.

If L2 is a PV guest, then one hypercall page will be SYSCALL/INT 82
which will go to L1, and one will be VMCALL/VMMCALL which goes to L0,
but L0 will see it from ring1/ring3 and reject the hypercall.

However you nest the system, every guest only has a single occurrence of
"supervisor software", so only has a single context that will be
tolerated to make hypercalls by the next hypervisor up.

~Andrew

2019-06-14 11:53:38

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 01/16] x86/xen: add xenhost_t interface

On 11.06.19 09:16, Ankur Arora wrote:
> On 2019-06-07 8:04 a.m., Juergen Gross wrote:
>> On 09.05.19 19:25, Ankur Arora wrote:
>>> Add xenhost_t which will serve as an abstraction over Xen interfaces.
>>> It co-exists with the PV/HVM/PVH abstractions (x86_init, hypervisor_x86,
>>> pv_ops etc) and is meant to capture mechanisms for communication with
>>> Xen so we could have different types of underlying Xen: regular, local,
>>> and nested.
>>>
>>> Also add xenhost_register() and stub registration in the various guest
>>> types.
>>>
>>> Signed-off-by: Ankur Arora <[email protected]>
>>> ---
>>>   arch/x86/xen/Makefile        |  1 +
>>>   arch/x86/xen/enlighten_hvm.c | 13 +++++
>>>   arch/x86/xen/enlighten_pv.c  | 16 ++++++
>>>   arch/x86/xen/enlighten_pvh.c | 12 +++++
>>>   arch/x86/xen/xenhost.c       | 75 ++++++++++++++++++++++++++++
>>>   include/xen/xen.h            |  3 ++
>>>   include/xen/xenhost.h        | 95 ++++++++++++++++++++++++++++++++++++
>>>   7 files changed, 215 insertions(+)
>>>   create mode 100644 arch/x86/xen/xenhost.c
>>>   create mode 100644 include/xen/xenhost.h
>>>
>>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
>>> index 084de77a109e..564b4dddbc15 100644
>>> --- a/arch/x86/xen/Makefile
>>> +++ b/arch/x86/xen/Makefile
>>> @@ -18,6 +18,7 @@ obj-y                += mmu.o
>>>   obj-y                += time.o
>>>   obj-y                += grant-table.o
>>>   obj-y                += suspend.o
>>> +obj-y                += xenhost.o
>>>   obj-$(CONFIG_XEN_PVHVM)        += enlighten_hvm.o
>>>   obj-$(CONFIG_XEN_PVHVM)        += mmu_hvm.o
>>> diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
>>> index 0e75642d42a3..100452f4f44c 100644
>>> --- a/arch/x86/xen/enlighten_hvm.c
>>> +++ b/arch/x86/xen/enlighten_hvm.c
>>> @@ -5,6 +5,7 @@
>>>   #include <linux/kexec.h>
>>>   #include <linux/memblock.h>
>>> +#include <xen/xenhost.h>
>>>   #include <xen/features.h>
>>>   #include <xen/events.h>
>>>   #include <xen/interface/memory.h>
>>> @@ -82,6 +83,12 @@ static void __init xen_hvm_init_mem_mapping(void)
>>>       xen_vcpu_info_reset(0);
>>>   }
>>> +xenhost_ops_t xh_hvm_ops = {
>>> +};
>>> +
>>> +xenhost_ops_t xh_hvm_nested_ops = {
>>> +};
>>> +
>>>   static void __init init_hvm_pv_info(void)
>>>   {
>>>       int major, minor;
>>> @@ -179,6 +186,12 @@ static void __init xen_hvm_guest_init(void)
>>>   {
>>>       if (xen_pv_domain())
>>>           return;
>>> +    /*
>>> +     * We need only xenhost_r1 for HVM guests since they cannot be
>>> +     * driver domain (?) or dom0.
>>
>> I think even HVM guests could (in theory) be driver domains.
>>
>>> +     */
>>> +    if (!xen_pvh_domain())
>>> +        xenhost_register(xenhost_r1, &xh_hvm_ops);
>>>       init_hvm_pv_info();
>>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>>> index c54a493e139a..bb6e811c1525 100644
>>> --- a/arch/x86/xen/enlighten_pv.c
>>> +++ b/arch/x86/xen/enlighten_pv.c
>>> @@ -36,6 +36,7 @@
>>>   #include <xen/xen.h>
>>>   #include <xen/events.h>
>>> +#include <xen/xenhost.h>
>>>   #include <xen/interface/xen.h>
>>>   #include <xen/interface/version.h>
>>>   #include <xen/interface/physdev.h>
>>> @@ -1188,6 +1189,12 @@ static void __init
>>> xen_dom0_set_legacy_features(void)
>>>       x86_platform.legacy.rtc = 1;
>>>   }
>>> +xenhost_ops_t xh_pv_ops = {
>>> +};
>>> +
>>> +xenhost_ops_t xh_pv_nested_ops = {
>>> +};
>>> +
>>>   /* First C function to be called on Xen boot */
>>>   asmlinkage __visible void __init xen_start_kernel(void)
>>>   {
>>> @@ -1198,6 +1205,15 @@ asmlinkage __visible void __init
>>> xen_start_kernel(void)
>>>       if (!xen_start_info)
>>>           return;
>>> +    xenhost_register(xenhost_r1, &xh_pv_ops);
>>> +
>>> +    /*
>>> +     * Detect in some implementation defined manner whether this is
>>> +     * nested or not.
>>> +     */
>>> +    if (xen_driver_domain() && xen_nested())
>>> +        xenhost_register(xenhost_r2, &xh_pv_nested_ops);
>>
>> I don't think a driver domain other than dom0 "knows" this in the
>> beginning. It will need to register xenhost_r2
> Right. No point in needlessly registering as xenhost_r2 without
> needing to handle any xenhost_r2 devices.
>
>>  in case it learns about a pv device from L0 hypervisor.
> What's the mechanism you are thinking of, for this?
> I'm guessing this PV device notification could arrive at an
> arbitrary point in time after the system has booted.

I'm not sure yet how this should be handled.

Maybe an easy solution would be the presence of a Xen PCI device
passed through from L1 hypervisor to L1 dom0. OTOH this would
preclude nested Xen for L1 hypervisor running in PVH mode. And for
L1 driver domains this would need either a shared PCI device or
multiple Xen PCI devices or something new.

There is a design session planned for this topic at the Xen developer
summit in July.


Juergen

2019-06-14 11:55:34

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 07/16] x86/xen: make vcpu_info part of xenhost_t

On 09.05.19 19:25, Ankur Arora wrote:
> Abstract out xen_vcpu_id probing via (*probe_vcpu_id)(). Once that is
> available, the vcpu_info registration happens via the VCPUOP hypercall.
>
> Note that for the nested case, there are two vcpu_ids, and two vcpu_info
> areas, one each for the default xenhost and the remote xenhost.
> The vcpu_info is used via pv_irq_ops, and evtchn signaling.
>
> The other VCPUOP hypercalls are used for management (and scheduling)
> which is expected to be done purely in the default hypervisor.
> However, scheduling of L1-guest does imply L0-Xen-vcpu_info switching,
> which might mean that the remote hypervisor needs some visibility
> into related events/hypercalls in the default hypervisor.

Another candidate for dropping due to layering violation, I guess.


Juergen

2019-06-14 12:01:41

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 08/16] x86/xen: irq/upcall handling with multiple xenhosts

On 09.05.19 19:25, Ankur Arora wrote:
> For configurations with multiple xenhosts, we need to handle events
> generated from multiple xenhosts.
>
> Having more than one upcall handler might be quite hairy, and it would
> be simpler if the callback from L0-Xen could be bounced via L1-Xen.
> This will also mean simpler pv_irq_ops code because now the IF flag
> maps onto the xh_default->vcpu_info->evtchn_upcall_mask.
>
> However, we still update the xh_remote->vcpu_info->evtchn_upcall_mask
> on a best effort basis to minimize unnecessary work in remote xenhost.

This is another design decision yet to be taken.

My current preference is L1 Xen mapping events from L0 to L1 guest
events.


Juergen

2019-06-14 12:04:57

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 09/16] xen/evtchn: support evtchn in xenhost_t

On 09.05.19 19:25, Ankur Arora wrote:
> Largely mechanical patch that adds a new param, xenhost_t * to the
> evtchn interfaces. The evtchn port instead of being domain unique, is
> now scoped to xenhost_t.
>
> As part of upcall handling we now look at all the xenhosts and, for
> evtchn_2l, the xenhost's shared_info and vcpu_info. Other than this
> event handling is largely unchanged.
>
> Note that the IPI, timer, VIRQ, FUNCTION, PMU etc vectors remain
> attached to xh_default. Only interdomain evtchns are allowable as
> xh_remote.

I'd do only the interface changes for now (including evtchn FIFO).

The main difference will be how to call the hypervisor for sending an
event (either direct or via a passthrough-hypercall).


Juergen

2019-06-17 06:11:24

by Ankur Arora

[permalink] [raw]
Subject: Re: [RFC PATCH 09/16] xen/evtchn: support evtchn in xenhost_t

On 2019-06-14 5:04 a.m., Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> Largely mechanical patch that adds a new param, xenhost_t * to the
>> evtchn interfaces. The evtchn port instead of being domain unique, is
>> now scoped to xenhost_t.
>>
>> As part of upcall handling we now look at all the xenhosts and, for
>> evtchn_2l, the xenhost's shared_info and vcpu_info. Other than this
>> event handling is largely unchanged.
>>
>> Note that the IPI, timer, VIRQ, FUNCTION, PMU etc vectors remain
>> attached to xh_default. Only interdomain evtchns are allowable as
>> xh_remote.
>
> I'd do only the interface changes for now (including evtchn FIFO).
Looking at this patch again, it seems to me that it would be best to
limit the interface change (to take the xenhost_t * parameter) only to
bind_interdomain_*. That also happily limits the change to the drivers/
subtree.

>
> The main difference will be how to call the hypervisor for sending an
> event (either direct or via a passthrough-hypercall).
Yeah, though this would depend on how the evtchns are mapped (if it's
L1-Xen that is responsible for mapping the evtchn on behalf of L0-Xen,
then notify_remote_via_evtchn() could just stay the same).
Still, I'll add a send interface (perhaps just an inline function) to
the xenhost interface for this.

Ankur

>
>
> Juergen
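
The send interface mentioned above could be a thin inline over a per-xenhost op, along these lines. Op and type names here are assumptions, not the RFC's final interface; the default op would issue EVTCHNOP_send directly, while a remote xenhost's op would use a passthrough hypercall via L1-Xen:

```c
#include <assert.h>

typedef unsigned int evtchn_port_t;
typedef struct xenhost xenhost_t;

typedef struct xenhost_ops {
	/* direct EVTCHNOP_send for the default xenhost; a passthrough
	 * hypercall for the remote xenhost */
	int (*send_evtchn)(xenhost_t *xh, evtchn_port_t port);
} xenhost_ops_t;

struct xenhost {
	const xenhost_ops_t *ops;
};

/* Callers (e.g. notify_remote_via_evtchn()) dispatch through the op and
 * stay unaware of which xenhost the port is scoped to. */
static inline int xenhost_send_evtchn(xenhost_t *xh, evtchn_port_t port)
{
	return xh->ops->send_evtchn(xh, port);
}
```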

2019-06-17 06:29:52

by Ankur Arora

[permalink] [raw]
Subject: Re: [RFC PATCH 07/16] x86/xen: make vcpu_info part of xenhost_t

On 2019-06-14 4:53 a.m., Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> Abstract out xen_vcpu_id probing via (*probe_vcpu_id)(). Once that is
>> available, the vcpu_info registration happens via the VCPUOP hypercall.
>>
>> Note that for the nested case, there are two vcpu_ids, and two vcpu_info
>> areas, one each for the default xenhost and the remote xenhost.
>> The vcpu_info is used via pv_irq_ops, and evtchn signaling.
>>
>> The other VCPUOP hypercalls are used for management (and scheduling)
>> which is expected to be done purely in the default hypervisor.
>> However, scheduling of L1-guest does imply L0-Xen-vcpu_info switching,
>> which might mean that the remote hypervisor needs some visibility
>> into related events/hypercalls in the default hypervisor.
>
> Another candidate for dropping due to layering violation, I guess.
Yeah, a more narrowly tailored interface, where L1-Xen perhaps maps
events on behalf of L0-Xen, makes sense.
Also, I just realized that since L0-Xen has no control over the
scheduling of L1-Xen's guests (some of which it might want to send
events to), it makes sense for L1-Xen to keep some state for guest
evtchns which pertain to L0-Xen.


Ankur

>
>
> Juergen

2019-06-17 09:29:13

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 10/16] xen/balloon: support ballooning in xenhost_t

On 09.05.19 19:25, Ankur Arora wrote:
> Xen ballooning uses hollow struct pages (with the underlying GFNs being
> populated/unpopulated via hypercalls) which are used by the grant logic
> to map grants from other domains.
>
> This patch allows the default xenhost to provide an alternate ballooning
> allocation mechanism. This is expected to be useful for local xenhosts
> (type xenhost_r0) because unlike Xen, where there is an external
> hypervisor which can change the memory underneath a GFN, that is not
> possible when the hypervisor is running in the same address space
> as the entity doing the ballooning.
>
> Co-developed-by: Ankur Arora <[email protected]>
> Signed-off-by: Joao Martins <[email protected]>
> Signed-off-by: Ankur Arora <[email protected]>
> ---
> arch/x86/xen/enlighten_hvm.c | 7 +++++++
> arch/x86/xen/enlighten_pv.c | 8 ++++++++
> drivers/xen/balloon.c | 19 ++++++++++++++++---
> drivers/xen/grant-table.c | 4 ++--
> drivers/xen/privcmd.c | 4 ++--
> drivers/xen/xen-selfballoon.c | 2 ++
> drivers/xen/xenbus/xenbus_client.c | 6 +++---
> drivers/xen/xlate_mmu.c | 4 ++--
> include/xen/balloon.h | 4 ++--
> include/xen/xenhost.h | 19 +++++++++++++++++++
> 10 files changed, 63 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 5ef4d6ad920d..08becf574743 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -63,6 +63,7 @@
> #include <asm/tlb.h>
>
> #include <xen/interface/xen.h>
> +#include <xen/xenhost.h>
> #include <asm/xen/hypervisor.h>
> #include <asm/xen/hypercall.h>
>
> @@ -583,12 +584,21 @@ static int add_ballooned_pages(int nr_pages)
> * @pages: pages returned
> * @return 0 on success, error otherwise
> */
> -int alloc_xenballooned_pages(int nr_pages, struct page **pages)
> +int alloc_xenballooned_pages(xenhost_t *xh, int nr_pages, struct page **pages)
> {
> int pgno = 0;
> struct page *page;
> int ret;
>
> + /*
> + * xenmem transactions for remote xenhost are disallowed.
> + */
> + if (xh->type == xenhost_r2)
> + return -EINVAL;

Why don't you set a dummy function returning -EINVAL into the xenhost_r2
structure instead?

> +
> + if (xh->ops->alloc_ballooned_pages)
> + return xh->ops->alloc_ballooned_pages(xh, nr_pages, pages);
> +

Please make alloc_xenballooned_pages() an inline wrapper and use the
current implementation as the default. This avoids another if ().

> mutex_lock(&balloon_mutex);
>
> balloon_stats.target_unpopulated += nr_pages;
> @@ -620,7 +630,7 @@ int alloc_xenballooned_pages(int nr_pages, struct page **pages)
> return 0;
> out_undo:
> mutex_unlock(&balloon_mutex);
> - free_xenballooned_pages(pgno, pages);
> + free_xenballooned_pages(xh, pgno, pages);
> return ret;
> }
> EXPORT_SYMBOL(alloc_xenballooned_pages);
> @@ -630,10 +640,13 @@ EXPORT_SYMBOL(alloc_xenballooned_pages);
> * @nr_pages: Number of pages
> * @pages: pages to return
> */
> -void free_xenballooned_pages(int nr_pages, struct page **pages)
> +void free_xenballooned_pages(xenhost_t *xh, int nr_pages, struct page **pages)
> {
> int i;
>
> + if (xh->ops->free_ballooned_pages)
> + return xh->ops->free_ballooned_pages(xh, nr_pages, pages);
> +

Same again: please use an inline wrapper.


Juergen
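
Taken together, the two suggestions might look roughly like this sketch. The xenhost_t/ops shapes follow the RFC; the names of the default implementation and the r2 dummy op are hypothetical:

```c
#include <assert.h>

struct page;			/* opaque here; defined by the kernel */
typedef struct xenhost xenhost_t;

typedef struct xenhost_ops {
	int (*alloc_ballooned_pages)(xenhost_t *xh, int nr_pages,
				     struct page **pages);
} xenhost_ops_t;

struct xenhost {
	const xenhost_ops_t *ops;
};

#ifndef EINVAL
#define EINVAL 22		/* stand-in for the kernel's errno value */
#endif

/* The current balloon.c logic becomes the default op... */
static int default_alloc_ballooned_pages(xenhost_t *xh, int nr_pages,
					 struct page **pages)
{
	(void)xh; (void)nr_pages; (void)pages;
	/* ... existing balloon_mutex / target_unpopulated path ... */
	return 0;
}

/* ...and xenhost_r2 installs a dummy op, instead of the caller testing
 * xh->type == xenhost_r2. */
static int r2_alloc_ballooned_pages(xenhost_t *xh, int nr_pages,
				    struct page **pages)
{
	(void)xh; (void)nr_pages; (void)pages;
	return -EINVAL;		/* xenmem transactions disallowed */
}

/* Inline wrapper: every caller dispatches through ops, no if () needed. */
static inline int alloc_xenballooned_pages(xenhost_t *xh, int nr_pages,
					   struct page **pages)
{
	return xh->ops->alloc_ballooned_pages(xh, nr_pages, pages);
}
```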

2019-06-17 09:37:07

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 11/16] xen/grant-table: make grant-table xenhost aware

On 09.05.19 19:25, Ankur Arora wrote:
> Largely mechanical changes: the exported grant table symbols now take
> xenhost_t * as a parameter. Also, move the grant table global state
> inside xenhost_t.
>
> If there's more than one xenhost, then initialize both.
>
> Signed-off-by: Ankur Arora <[email protected]>
> ---
> arch/x86/xen/grant-table.c | 71 +++--
> drivers/xen/grant-table.c | 611 +++++++++++++++++++++----------------
> include/xen/grant_table.h | 72 ++---
> include/xen/xenhost.h | 11 +
> 4 files changed, 443 insertions(+), 322 deletions(-)
>
> diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
> index 9e08627a9e3e..acee0c7872b6 100644
> --- a/include/xen/xenhost.h
> +++ b/include/xen/xenhost.h
> @@ -129,6 +129,17 @@ typedef struct {
> const struct evtchn_ops *evtchn_ops;
> int **evtchn_to_irq;
> };
> +
> + /* grant table private state */
> + struct {
> + /* private to drivers/xen/grant-table.c */
> + void *gnttab_private;
> +
> + /* x86/xen/grant-table.c */
> + void *gnttab_shared_vm_area;
> + void *gnttab_status_vm_area;
> + void *auto_xlat_grant_frames;

Please use proper types here instead of void *. This avoids lots of
casts. It is okay to just add anonymous struct definitions and keep the
real struct layout local to grant table code.


Juergen
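
The suggestion above could be realized with forward declarations, which give the pointers real types while keeping the actual layouts private to the respective grant-table files. The struct names below are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Forward declarations: layouts stay local to grant-table code. */
struct gnttab_private;		/* drivers/xen/grant-table.c */
struct gnttab_vm_area;		/* arch/x86/xen/grant-table.c */
struct xen_auto_xlat_frames;	/* arch/x86/xen/grant-table.c */

typedef struct {
	/* ... other xenhost state ... */

	/* grant table private state, now with proper types: no casts
	 * needed at the use sites */
	struct {
		struct gnttab_private *gnttab_private;
		struct gnttab_vm_area *gnttab_shared_vm_area;
		struct gnttab_vm_area *gnttab_status_vm_area;
		struct xen_auto_xlat_frames *auto_xlat_grant_frames;
	};
} xenhost_t;
```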

2019-06-17 09:50:54

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 12/16] xen/xenbus: support xenbus frontend/backend with xenhost_t

On 09.05.19 19:25, Ankur Arora wrote:
> As part of xenbus init, both frontend, backend interfaces need to talk
> on the correct xenbus. This might be a local xenstore (backend) or might
> be a XS_PV/XS_HVM interface (frontend) which needs to talk over xenbus
> with the remote xenstored. We bootstrap all of these with evtchn/gfn
> parameters from (*setup_xs)().
>
> Given this we can do appropriate device discovery (in case of frontend)
> and device connectivity for the backend.
> Once done, we stash the xenhost_t * in xen_bus_type, xenbus_device or
> xenbus_watch and then the frontend and backend devices implicitly use
> the correct interface.
>
> The rest of patch is just changing the interfaces where needed.
>
> Signed-off-by: Ankur Arora <[email protected]>
> ---
> drivers/block/xen-blkback/blkback.c | 10 +-
> drivers/net/xen-netfront.c | 14 +-
> drivers/pci/xen-pcifront.c | 4 +-
> drivers/xen/cpu_hotplug.c | 4 +-
> drivers/xen/manage.c | 28 +--
> drivers/xen/xen-balloon.c | 8 +-
> drivers/xen/xenbus/xenbus.h | 45 ++--
> drivers/xen/xenbus/xenbus_client.c | 32 +--
> drivers/xen/xenbus/xenbus_comms.c | 121 +++++-----
> drivers/xen/xenbus/xenbus_dev_backend.c | 30 ++-
> drivers/xen/xenbus/xenbus_dev_frontend.c | 22 +-
> drivers/xen/xenbus/xenbus_probe.c | 246 +++++++++++++--------
> drivers/xen/xenbus/xenbus_probe_backend.c | 19 +-
> drivers/xen/xenbus/xenbus_probe_frontend.c | 65 +++---
> drivers/xen/xenbus/xenbus_xs.c | 188 +++++++++-------
> include/xen/xen-ops.h | 3 +
> include/xen/xenbus.h | 54 +++--
> include/xen/xenhost.h | 20 ++
> 18 files changed, 536 insertions(+), 377 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c b/drivers/xen/xenbus/xenbus_dev_frontend.c
> index c3e201025ef0..d6e0c397c6a0 100644
> --- a/drivers/xen/xenbus/xenbus_dev_frontend.c
> +++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
> @@ -58,10 +58,14 @@
>
> #include <xen/xenbus.h>
> #include <xen/xen.h>
> +#include <xen/interface/xen.h>
> +#include <xen/xenhost.h>
> #include <asm/xen/hypervisor.h>
>
> #include "xenbus.h"
>
> +static xenhost_t *xh;
> +
> /*
> * An element of a list of outstanding transactions, for which we're
> * still waiting a reply.
> @@ -312,13 +316,13 @@ static void xenbus_file_free(struct kref *kref)
> */
>
> list_for_each_entry_safe(trans, tmp, &u->transactions, list) {
> - xenbus_transaction_end(trans->handle, 1);
> + xenbus_transaction_end(xh, trans->handle, 1);
> list_del(&trans->list);
> kfree(trans);
> }
>
> list_for_each_entry_safe(watch, tmp_watch, &u->watches, list) {
> - unregister_xenbus_watch(&watch->watch);
> + unregister_xenbus_watch(xh, &watch->watch);
> list_del(&watch->list);
> free_watch_adapter(watch);
> }
> @@ -450,7 +454,7 @@ static int xenbus_write_transaction(unsigned msg_type,
> (!strcmp(msg->body, "T") || !strcmp(msg->body, "F"))))
> return xenbus_command_reply(u, XS_ERROR, "EINVAL");
>
> - rc = xenbus_dev_request_and_reply(&msg->hdr, u);
> + rc = xenbus_dev_request_and_reply(xh, &msg->hdr, u);
> if (rc && trans) {
> list_del(&trans->list);
> kfree(trans);
> @@ -489,7 +493,7 @@ static int xenbus_write_watch(unsigned msg_type, struct xenbus_file_priv *u)
> watch->watch.callback = watch_fired;
> watch->dev_data = u;
>
> - err = register_xenbus_watch(&watch->watch);
> + err = register_xenbus_watch(xh, &watch->watch);
> if (err) {
> free_watch_adapter(watch);
> rc = err;
> @@ -500,7 +504,7 @@ static int xenbus_write_watch(unsigned msg_type, struct xenbus_file_priv *u)
> list_for_each_entry(watch, &u->watches, list) {
> if (!strcmp(watch->token, token) &&
> !strcmp(watch->watch.node, path)) {
> - unregister_xenbus_watch(&watch->watch);
> + unregister_xenbus_watch(xh, &watch->watch);
> list_del(&watch->list);
> free_watch_adapter(watch);
> break;
> @@ -618,8 +622,9 @@ static ssize_t xenbus_file_write(struct file *filp,
> static int xenbus_file_open(struct inode *inode, struct file *filp)
> {
> struct xenbus_file_priv *u;
> + struct xenstore_private *xs = xs_priv(xh);
>
> - if (xen_store_evtchn == 0)
> + if (xs->store_evtchn == 0)
> return -ENOENT;
>
> nonseekable_open(inode, filp);
> @@ -687,6 +692,11 @@ static int __init xenbus_init(void)
> if (!xen_domain())
> return -ENODEV;
>
> + if (xen_driver_domain() && xen_nested())
> + xh = xh_remote;
> + else
> + xh = xh_default;

This precludes any mixed use of L0 and L1 frontends. With this move you
make it impossible to e.g. use a driver domain for networking in L1 with
a L1-local PV disk, or pygrub in L1 dom0.


Juergen

2019-06-17 10:08:58

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 13/16] drivers/xen: gnttab, evtchn, xenbus API changes

On 09.05.19 19:25, Ankur Arora wrote:
> Mechanical changes, now most of these calls take xenhost_t *
> as parameter.
>
> Co-developed-by: Joao Martins <[email protected]>
> Signed-off-by: Ankur Arora <[email protected]>
> ---
> drivers/xen/cpu_hotplug.c | 14 ++++++-------
> drivers/xen/gntalloc.c | 13 ++++++++----
> drivers/xen/gntdev.c | 16 +++++++++++----
> drivers/xen/manage.c | 37 ++++++++++++++++++-----------------
> drivers/xen/platform-pci.c | 12 +++++++-----
> drivers/xen/sys-hypervisor.c | 12 ++++++++----
> drivers/xen/xen-balloon.c | 10 +++++++---
> drivers/xen/xenfs/xenstored.c | 7 ++++---
> 8 files changed, 73 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
> index afeb94446d34..4a05bc028956 100644
> --- a/drivers/xen/cpu_hotplug.c
> +++ b/drivers/xen/cpu_hotplug.c
> @@ -31,13 +31,13 @@ static void disable_hotplug_cpu(int cpu)
> unlock_device_hotplug();
> }
>
> -static int vcpu_online(unsigned int cpu)
> +static int vcpu_online(xenhost_t *xh, unsigned int cpu)

Do we really need xenhost for cpu on/offlining?

> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
> index 9a69d955dd5c..1655d0a039fd 100644
> --- a/drivers/xen/manage.c
> +++ b/drivers/xen/manage.c
> @@ -227,14 +227,14 @@ static void shutdown_handler(struct xenbus_watch *watch,
> return;
>
> again:
> - err = xenbus_transaction_start(xh_default, &xbt);
> + err = xenbus_transaction_start(watch->xh, &xbt);
> if (err)
> return;
>
> - str = (char *)xenbus_read(xh_default, xbt, "control", "shutdown", NULL);
> + str = (char *)xenbus_read(watch->xh, xbt, "control", "shutdown", NULL);
> /* Ignore read errors and empty reads. */
> if (XENBUS_IS_ERR_READ(str)) {
> - xenbus_transaction_end(xh_default, xbt, 1);
> + xenbus_transaction_end(watch->xh, xbt, 1);
> return;
> }
>
> @@ -245,9 +245,9 @@ static void shutdown_handler(struct xenbus_watch *watch,
>
> /* Only acknowledge commands which we are prepared to handle. */
> if (idx < ARRAY_SIZE(shutdown_handlers))
> - xenbus_write(xh_default, xbt, "control", "shutdown", "");
> + xenbus_write(watch->xh, xbt, "control", "shutdown", "");
>
> - err = xenbus_transaction_end(xh_default, xbt, 0);
> + err = xenbus_transaction_end(watch->xh, xbt, 0);
> if (err == -EAGAIN) {
> kfree(str);
> goto again;
> @@ -272,10 +272,10 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path,
> int err;
>
> again:
> - err = xenbus_transaction_start(xh_default, &xbt);
> + err = xenbus_transaction_start(watch->xh, &xbt);
> if (err)
> return;
> - err = xenbus_scanf(xh_default, xbt, "control", "sysrq", "%c", &sysrq_key);
> + err = xenbus_scanf(watch->xh, xbt, "control", "sysrq", "%c", &sysrq_key);
> if (err < 0) {
> /*
> * The Xenstore watch fires directly after registering it and
> @@ -287,21 +287,21 @@ static void sysrq_handler(struct xenbus_watch *watch, const char *path,
> if (err != -ENOENT && err != -ERANGE)
> pr_err("Error %d reading sysrq code in control/sysrq\n",
> err);
> - xenbus_transaction_end(xh_default, xbt, 1);
> + xenbus_transaction_end(watch->xh, xbt, 1);
> return;
> }
>
> if (sysrq_key != '\0') {
> - err = xenbus_printf(xh_default, xbt, "control", "sysrq", "%c", '\0');
> + err = xenbus_printf(watch->xh, xbt, "control", "sysrq", "%c", '\0');
> if (err) {
> pr_err("%s: Error %d writing sysrq in control/sysrq\n",
> __func__, err);
> - xenbus_transaction_end(xh_default, xbt, 1);
> + xenbus_transaction_end(watch->xh, xbt, 1);
> return;
> }
> }
>
> - err = xenbus_transaction_end(xh_default, xbt, 0);
> + err = xenbus_transaction_end(watch->xh, xbt, 0);
> if (err == -EAGAIN)
> goto again;
>
> @@ -324,14 +324,14 @@ static struct notifier_block xen_reboot_nb = {
> .notifier_call = poweroff_nb,
> };
>
> -static int setup_shutdown_watcher(void)
> +static int setup_shutdown_watcher(xenhost_t *xh)

I think shutdown is purely local, too.


Juergen

2019-06-17 10:14:59

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 14/16] xen/blk: gnttab, evtchn, xenbus API changes

On 09.05.19 19:25, Ankur Arora wrote:
> For the most part, we now pass xenhost_t * as a parameter.
>
> Co-developed-by: Joao Martins <[email protected]>
> Signed-off-by: Ankur Arora <[email protected]>

I don't see how this can be a patch on its own.

The only way to be able to use a patch for each driver would be to
keep the original grant-, event- and xenbus-interfaces and add the
new ones taking xenhost * with a new name. The original interfaces
could then use xenhost_default and you can switch them to the new
interfaces one by one. The last patch could then remove the old
interfaces when there is no user left.


Juergen
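
The conversion scheme described above could be sketched as follows: introduce xenhost-aware variants under new names and keep the old names as inline wrappers over the default xenhost, so each driver can be switched in its own patch and the wrappers removed at the end. All function names below are illustrative:

```c
#include <assert.h>

typedef unsigned short domid_t;
typedef struct xenhost {
	int id;			/* stand-in for real xenhost state */
} xenhost_t;

extern xenhost_t *xh_default;

/* New interface: takes an explicit xenhost. */
static int xenhost_grant_foreign_access(xenhost_t *xh, domid_t domid,
					unsigned long frame, int readonly)
{
	(void)domid; (void)frame; (void)readonly;
	/* ... real grant-table work against *xh ... */
	return xh->id;
}

/* Old interface: unchanged for existing callers, forwards to the default
 * xenhost; removed once the last driver is converted. */
static inline int gnttab_grant_foreign_access(domid_t domid,
					      unsigned long frame,
					      int readonly)
{
	return xenhost_grant_foreign_access(xh_default, domid, frame,
					    readonly);
}
```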

2019-06-17 10:15:39

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 15/16] xen/net: gnttab, evtchn, xenbus API changes

On 09.05.19 19:25, Ankur Arora wrote:
> For the most part, we now pass xenhost_t * as parameter.
>
> Co-developed-by: Joao Martins <[email protected]>
> Signed-off-by: Ankur Arora <[email protected]>

Same as previous patch.


Juergen

2019-06-17 10:56:35

by Jürgen Groß

[permalink] [raw]
Subject: Re: [RFC PATCH 16/16] xen/grant-table: host_addr fixup in mapping on xenhost_r0

On 09.05.19 19:25, Ankur Arora wrote:
> Xenhost type xenhost_r0 does not support standard GNTTABOP_map_grant_ref
> semantics (map a gref onto a specified host_addr). That's because
> since the hypervisor is local (same address space as the caller of
> GNTTABOP_map_grant_ref), there is no external entity that could
> map an arbitrary page underneath an arbitrary address.
>
> To handle this, the GNTTABOP_map_grant_ref hypercall on xenhost_r0
> treats the host_addr as an OUT parameter instead of IN and expects the
> gnttab_map_refs() and similar to fixup any state that caches the
> value of host_addr from before the hypercall.
>
> Accordingly gnttab_map_refs() now adds two parameters, a fixup function
> and a pointer to cached maps to fixup:
> int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref *map_ops,
> struct gnttab_map_grant_ref *kmap_ops,
> - struct page **pages, unsigned int count)
> + struct page **pages, gnttab_map_fixup_t map_fixup_fn,
> + void **map_fixup[], unsigned int count)
>
> The reason we use a fixup function and not an additional mapping op
> in the xenhost_t is because, depending on the caller, what we are fixing
> might be different: blkback, netback for instance cache host_addr in
> via a struct page *, while __xenbus_map_ring() caches a phys_addr.
>
> This patch fixes up xen-blkback and xen-gntdev drivers.
>
> TODO:
> - also rewrite gnttab_batch_map() and __xenbus_map_ring().
> - modify xen-netback, scsiback, pciback etc
>
> Co-developed-by: Joao Martins <[email protected]>
> Signed-off-by: Ankur Arora <[email protected]>

Without seeing the __xenbus_map_ring() modification it is impossible to
do a proper review of this patch.


Juergen
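
The OUT-parameter fixup flow the commit message describes could look roughly like this. The struct layout and names are simplified stand-ins, and the map loop merely models an r0 "hypercall" choosing an address:

```c
#include <assert.h>
#include <stdint.h>

struct gnttab_map_grant_ref {
	uint64_t host_addr;	/* OUT on xenhost_r0, IN elsewhere */
};

typedef void (*gnttab_map_fixup_t)(struct gnttab_map_grant_ref *map,
				   void **cached);

/* A blkback/netback-style caller caches an address-derived cookie; after
 * the r0 map op rewrites host_addr, the fixup function re-derives it. */
static void addr_fixup(struct gnttab_map_grant_ref *map, void **cached)
{
	*cached = (void *)(uintptr_t)map->host_addr;
}

static int gnttab_map_refs_sketch(struct gnttab_map_grant_ref *map_ops,
				  gnttab_map_fixup_t fixup_fn,
				  void **fixups[], unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		/* stand-in for the xenhost_r0 map op choosing an address
		 * and reporting it back via host_addr */
		map_ops[i].host_addr = 0x1000u * (i + 1);
		if (fixup_fn && fixups && fixups[i])
			fixup_fn(&map_ops[i], fixups[i]);
	}
	return 0;
}
```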

2019-06-19 02:25:39

by Ankur Arora

[permalink] [raw]
Subject: Re: [RFC PATCH 10/16] xen/balloon: support ballooning in xenhost_t



On 6/17/19 2:28 AM, Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> Xen ballooning uses hollow struct pages (with the underlying GFNs being
>> populated/unpopulated via hypercalls) which are used by the grant logic
>> to map grants from other domains.
>>
>> This patch allows the default xenhost to provide an alternate ballooning
>> allocation mechanism. This is expected to be useful for local xenhosts
>> (type xenhost_r0) because unlike Xen, where there is an external
>> hypervisor which can change the memory underneath a GFN, that is not
>> possible when the hypervisor is running in the same address space
>> as the entity doing the ballooning.
>>
>> Co-developed-by: Ankur Arora <[email protected]>
>> Signed-off-by: Joao Martins <[email protected]>
>> Signed-off-by: Ankur Arora <[email protected]>
>> ---
>>   arch/x86/xen/enlighten_hvm.c       |  7 +++++++
>>   arch/x86/xen/enlighten_pv.c        |  8 ++++++++
>>   drivers/xen/balloon.c              | 19 ++++++++++++++++---
>>   drivers/xen/grant-table.c          |  4 ++--
>>   drivers/xen/privcmd.c              |  4 ++--
>>   drivers/xen/xen-selfballoon.c      |  2 ++
>>   drivers/xen/xenbus/xenbus_client.c |  6 +++---
>>   drivers/xen/xlate_mmu.c            |  4 ++--
>>   include/xen/balloon.h              |  4 ++--
>>   include/xen/xenhost.h              | 19 +++++++++++++++++++
>>   10 files changed, 63 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
>> index 5ef4d6ad920d..08becf574743 100644
>> --- a/drivers/xen/balloon.c
>> +++ b/drivers/xen/balloon.c
>> @@ -63,6 +63,7 @@
>>   #include <asm/tlb.h>
>>   #include <xen/interface/xen.h>
>> +#include <xen/xenhost.h>
>>   #include <asm/xen/hypervisor.h>
>>   #include <asm/xen/hypercall.h>
>> @@ -583,12 +584,21 @@ static int add_ballooned_pages(int nr_pages)
>>    * @pages: pages returned
>>    * @return 0 on success, error otherwise
>>    */
>> -int alloc_xenballooned_pages(int nr_pages, struct page **pages)
>> +int alloc_xenballooned_pages(xenhost_t *xh, int nr_pages, struct page
>> **pages)
>>   {
>>       int pgno = 0;
>>       struct page *page;
>>       int ret;
>> +    /*
>> +     * xenmem transactions for remote xenhost are disallowed.
>> +     */
>> +    if (xh->type == xenhost_r2)
>> +        return -EINVAL;
>
> Why don't you set a dummy function returning -EINVAL into the xenhost_r2
> structure instead?
Will do. And, same for the two comments below.
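
Something along these lines, I assume -- a toy sketch with stubbed-out
types (the layouts and the xen_alloc_ballooned_pages/xenhost_r2_* names
here are invented for illustration, not from the series):

```c
#include <stddef.h>
#include <errno.h>

/* Minimal stand-ins for the kernel types; invented for illustration. */
struct page;
typedef struct xenhost xenhost_t;

struct xenhost_ops {
	int (*alloc_ballooned_pages)(xenhost_t *xh, int nr_pages,
				     struct page **pages);
};

struct xenhost {
	const struct xenhost_ops *ops;
};

/* The existing balloon.c path becomes the default op for local xenhosts. */
static int xen_alloc_ballooned_pages(xenhost_t *xh, int nr_pages,
				     struct page **pages)
{
	(void)xh; (void)nr_pages; (void)pages;
	/* ... original mutex_lock(&balloon_mutex) allocation path ... */
	return 0;
}

/* xenhost_r2 installs a dummy op: xenmem transactions for a remote
 * xenhost are disallowed, so no type check is needed in the caller. */
static int xenhost_r2_alloc_ballooned_pages(xenhost_t *xh, int nr_pages,
					    struct page **pages)
{
	(void)xh; (void)nr_pages; (void)pages;
	return -EINVAL;
}

/* Inline wrapper: one indirect call, no per-type branching. */
static inline int alloc_xenballooned_pages(xenhost_t *xh, int nr_pages,
					   struct page **pages)
{
	return xh->ops->alloc_ballooned_pages(xh, nr_pages, pages);
}
```

With every xenhost type installing an op, the `if (xh->type == xenhost_r2)`
and `if (xh->ops->alloc_ballooned_pages)` branches both disappear.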

Ankur

>
>> +
>> +    if (xh->ops->alloc_ballooned_pages)
>> +        return xh->ops->alloc_ballooned_pages(xh, nr_pages, pages);
>> +
>
> Please make alloc_xenballooned_pages() an inline wrapper and use the
> current implementation as the default. This avoids another if ().
>
>>       mutex_lock(&balloon_mutex);
>>       balloon_stats.target_unpopulated += nr_pages;
>> @@ -620,7 +630,7 @@ int alloc_xenballooned_pages(int nr_pages, struct
>> page **pages)
>>       return 0;
>>    out_undo:
>>       mutex_unlock(&balloon_mutex);
>> -    free_xenballooned_pages(pgno, pages);
>> +    free_xenballooned_pages(xh, pgno, pages);
>>       return ret;
>>   }
>>   EXPORT_SYMBOL(alloc_xenballooned_pages);
>> @@ -630,10 +640,13 @@ EXPORT_SYMBOL(alloc_xenballooned_pages);
>>    * @nr_pages: Number of pages
>>    * @pages: pages to return
>>    */
>> -void free_xenballooned_pages(int nr_pages, struct page **pages)
>> +void free_xenballooned_pages(xenhost_t *xh, int nr_pages, struct page
>> **pages)
>>   {
>>       int i;
>> +    if (xh->ops->free_ballooned_pages)
>> +        return xh->ops->free_ballooned_pages(xh, nr_pages, pages);
>> +
>
> Same again: please use an inline wrapper.
>
>
> Juergen

2019-06-19 02:27:10

by Ankur Arora

Subject: Re: [RFC PATCH 11/16] xen/grant-table: make grant-table xenhost aware

On 6/17/19 2:36 AM, Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> Largely mechanical changes: the exported grant table symbols now take
>> xenhost_t * as a parameter. Also, move the grant table global state
>> inside xenhost_t.
>>
>> If there's more than one xenhost, then initialize both.
>>
>> Signed-off-by: Ankur Arora <[email protected]>
>> ---
>>   arch/x86/xen/grant-table.c |  71 +++--
>>   drivers/xen/grant-table.c  | 611 +++++++++++++++++++++----------------
>>   include/xen/grant_table.h  |  72 ++---
>>   include/xen/xenhost.h      |  11 +
>>   4 files changed, 443 insertions(+), 322 deletions(-)
>>
>> diff --git a/include/xen/xenhost.h b/include/xen/xenhost.h
>> index 9e08627a9e3e..acee0c7872b6 100644
>> --- a/include/xen/xenhost.h
>> +++ b/include/xen/xenhost.h
>> @@ -129,6 +129,17 @@ typedef struct {
>>           const struct evtchn_ops *evtchn_ops;
>>           int **evtchn_to_irq;
>>       };
>> +
>> +    /* grant table private state */
>> +    struct {
>> +        /* private to drivers/xen/grant-table.c */
>> +        void *gnttab_private;
>> +
>> +        /* x86/xen/grant-table.c */
>> +        void *gnttab_shared_vm_area;
>> +        void *gnttab_status_vm_area;
>> +        void *auto_xlat_grant_frames;
>
> Please use proper types here instead of void *. This avoids lots of
> casts. It is okay to just add anonymous struct definitions and keep the
> real struct layout local to grant table code.
Will fix.
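
Probably something like this (a sketch; the struct names are invented
here, and the real layouts would stay private to the grant-table code):

```c
#include <stddef.h>

/* Opaque to everyone outside grant-table code: xenhost.h only carries
 * typed pointers, so the call sites need no casts. */
struct gnttab_private;		/* would live in drivers/xen/grant-table.c */
struct gnttab_vm_area;		/* would live in arch/x86/xen/grant-table.c */

struct xenhost_gnttab_state {
	struct gnttab_private *gnttab_private;
	struct gnttab_vm_area *gnttab_shared_vm_area;
	struct gnttab_vm_area *gnttab_status_vm_area;
	unsigned long *auto_xlat_grant_frames;
};
```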

Ankur

>
>
> Juergen

2019-06-19 02:39:28

by Ankur Arora

Subject: Re: [RFC PATCH 12/16] xen/xenbus: support xenbus frontend/backend with xenhost_t



On 6/17/19 2:50 AM, Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> As part of xenbus init, both frontend, backend interfaces need to talk
>> on the correct xenbus. This might be a local xenstore (backend) or might
>> be a XS_PV/XS_HVM interface (frontend) which needs to talk over xenbus
>> with the remote xenstored. We bootstrap all of these with evtchn/gfn
>> parameters from (*setup_xs)().
>>
>> Given this we can do appropriate device discovery (in case of frontend)
>> and device connectivity for the backend.
>> Once done, we stash the xenhost_t * in xen_bus_type, xenbus_device or
>> xenbus_watch and then the frontend and backend devices implicitly use
>> the correct interface.
>>
>> The rest of patch is just changing the interfaces where needed.
>>
>> Signed-off-by: Ankur Arora <[email protected]>
>> ---
>>   drivers/block/xen-blkback/blkback.c        |  10 +-
>>   drivers/net/xen-netfront.c                 |  14 +-
>>   drivers/pci/xen-pcifront.c                 |   4 +-
>>   drivers/xen/cpu_hotplug.c                  |   4 +-
>>   drivers/xen/manage.c                       |  28 +--
>>   drivers/xen/xen-balloon.c                  |   8 +-
>>   drivers/xen/xenbus/xenbus.h                |  45 ++--
>>   drivers/xen/xenbus/xenbus_client.c         |  32 +--
>>   drivers/xen/xenbus/xenbus_comms.c          | 121 +++++-----
>>   drivers/xen/xenbus/xenbus_dev_backend.c    |  30 ++-
>>   drivers/xen/xenbus/xenbus_dev_frontend.c   |  22 +-
>>   drivers/xen/xenbus/xenbus_probe.c          | 246 +++++++++++++--------
>>   drivers/xen/xenbus/xenbus_probe_backend.c  |  19 +-
>>   drivers/xen/xenbus/xenbus_probe_frontend.c |  65 +++---
>>   drivers/xen/xenbus/xenbus_xs.c             | 188 +++++++++-------
>>   include/xen/xen-ops.h                      |   3 +
>>   include/xen/xenbus.h                       |  54 +++--
>>   include/xen/xenhost.h                      |  20 ++
>>   18 files changed, 536 insertions(+), 377 deletions(-)
>>
>> diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c
>> b/drivers/xen/xenbus/xenbus_dev_frontend.c
>> index c3e201025ef0..d6e0c397c6a0 100644
>> --- a/drivers/xen/xenbus/xenbus_dev_frontend.c
>> +++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
>> @@ -58,10 +58,14 @@
>>   #include <xen/xenbus.h>
>>   #include <xen/xen.h>
>> +#include <xen/interface/xen.h>
>> +#include <xen/xenhost.h>
>>   #include <asm/xen/hypervisor.h>
>>   #include "xenbus.h"
>> +static xenhost_t *xh;
>> +
>>   /*
>>    * An element of a list of outstanding transactions, for which we're
>>    * still waiting a reply.
>> @@ -312,13 +316,13 @@ static void xenbus_file_free(struct kref *kref)
>>        */
>>       list_for_each_entry_safe(trans, tmp, &u->transactions, list) {
>> -        xenbus_transaction_end(trans->handle, 1);
>> +        xenbus_transaction_end(xh, trans->handle, 1);
>>           list_del(&trans->list);
>>           kfree(trans);
>>       }
>>       list_for_each_entry_safe(watch, tmp_watch, &u->watches, list) {
>> -        unregister_xenbus_watch(&watch->watch);
>> +        unregister_xenbus_watch(xh, &watch->watch);
>>           list_del(&watch->list);
>>           free_watch_adapter(watch);
>>       }
>> @@ -450,7 +454,7 @@ static int xenbus_write_transaction(unsigned
>> msg_type,
>>              (!strcmp(msg->body, "T") || !strcmp(msg->body, "F"))))
>>           return xenbus_command_reply(u, XS_ERROR, "EINVAL");
>> -    rc = xenbus_dev_request_and_reply(&msg->hdr, u);
>> +    rc = xenbus_dev_request_and_reply(xh, &msg->hdr, u);
>>       if (rc && trans) {
>>           list_del(&trans->list);
>>           kfree(trans);
>> @@ -489,7 +493,7 @@ static int xenbus_write_watch(unsigned msg_type,
>> struct xenbus_file_priv *u)
>>           watch->watch.callback = watch_fired;
>>           watch->dev_data = u;
>> -        err = register_xenbus_watch(&watch->watch);
>> +        err = register_xenbus_watch(xh, &watch->watch);
>>           if (err) {
>>               free_watch_adapter(watch);
>>               rc = err;
>> @@ -500,7 +504,7 @@ static int xenbus_write_watch(unsigned msg_type,
>> struct xenbus_file_priv *u)
>>           list_for_each_entry(watch, &u->watches, list) {
>>               if (!strcmp(watch->token, token) &&
>>                   !strcmp(watch->watch.node, path)) {
>> -                unregister_xenbus_watch(&watch->watch);
>> +                unregister_xenbus_watch(xh, &watch->watch);
>>                   list_del(&watch->list);
>>                   free_watch_adapter(watch);
>>                   break;
>> @@ -618,8 +622,9 @@ static ssize_t xenbus_file_write(struct file *filp,
>>   static int xenbus_file_open(struct inode *inode, struct file *filp)
>>   {
>>       struct xenbus_file_priv *u;
>> +    struct xenstore_private *xs = xs_priv(xh);
>> -    if (xen_store_evtchn == 0)
>> +    if (xs->store_evtchn == 0)
>>           return -ENOENT;
>>       nonseekable_open(inode, filp);
>> @@ -687,6 +692,11 @@ static int __init xenbus_init(void)
>>       if (!xen_domain())
>>           return -ENODEV;
>> +    if (xen_driver_domain() && xen_nested())
>> +        xh = xh_remote;
>> +    else
>> +        xh = xh_default;
>
> This precludes any mixed use of L0 and L1 frontends. With this move you
> make it impossible to e.g. use a driver domain for networking in L1 with
> a L1-local PV disk, or pygrub in L1 dom0.
Ah, yes. I hadn't thought about that case.

Let me see how I can rework this interface.
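
One direction I'm considering (purely a sketch -- xenbus_dev_xh() is a
hypothetical helper, and the xenhost_t layout here is invented): resolve
the xenhost per xenbus_device rather than once per module, so L0-facing
and L1-local frontends can coexist in the same domain:

```c
#include <stddef.h>

typedef struct { int nesting_level; } xenhost_t;	/* invented layout */

static xenhost_t xh_default_impl = { 1 };
static xenhost_t xh_remote_impl  = { 0 };
xenhost_t *xh_default = &xh_default_impl;
xenhost_t *xh_remote  = &xh_remote_impl;

struct xenbus_device {
	xenhost_t *xh;		/* stashed when the device is enumerated */
	const char *nodename;
};

/* Hypothetical helper: pick the xenhost per device, falling back to
 * the local default, instead of using a single file-global xh. */
static xenhost_t *xenbus_dev_xh(const struct xenbus_device *dev)
{
	return dev->xh ? dev->xh : xh_default;
}
```

That would let, say, an L1-local PV disk and an L0-facing netfront each
talk on the right xenbus.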

Ankur

>
>
> Juergen

2019-06-19 02:59:05

by Ankur Arora

Subject: Re: [RFC PATCH 13/16] drivers/xen: gnttab, evtchn, xenbus API changes

On 6/17/19 3:07 AM, Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> Mechanical changes, now most of these calls take xenhost_t *
>> as parameter.
>>
>> Co-developed-by: Joao Martins <[email protected]>
>> Signed-off-by: Ankur Arora <[email protected]>
>> ---
>>   drivers/xen/cpu_hotplug.c     | 14 ++++++-------
>>   drivers/xen/gntalloc.c        | 13 ++++++++----
>>   drivers/xen/gntdev.c          | 16 +++++++++++----
>>   drivers/xen/manage.c          | 37 ++++++++++++++++++-----------------
>>   drivers/xen/platform-pci.c    | 12 +++++++-----
>>   drivers/xen/sys-hypervisor.c  | 12 ++++++++----
>>   drivers/xen/xen-balloon.c     | 10 +++++++---
>>   drivers/xen/xenfs/xenstored.c |  7 ++++---
>>   8 files changed, 73 insertions(+), 48 deletions(-)
>>
>> diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
>> index afeb94446d34..4a05bc028956 100644
>> --- a/drivers/xen/cpu_hotplug.c
>> +++ b/drivers/xen/cpu_hotplug.c
>> @@ -31,13 +31,13 @@ static void disable_hotplug_cpu(int cpu)
>>       unlock_device_hotplug();
>>   }
>> -static int vcpu_online(unsigned int cpu)
>> +static int vcpu_online(xenhost_t *xh, unsigned int cpu)
>
> Do we really need xenhost for cpu on/offlinig?
I was in two minds about this. We only need it for the xenbus
interfaces, which could very well have used xh_default.

However, the xenhost is part of the xenbus_watch state, so
I thought it was easier to percolate that down instead of
adding xh_default all over the place.

>
>> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
>> index 9a69d955dd5c..1655d0a039fd 100644
>> --- a/drivers/xen/manage.c
>> +++ b/drivers/xen/manage.c
>> @@ -227,14 +227,14 @@ static void shutdown_handler(struct xenbus_watch
>> *watch,
>>           return;
>>    again:
>> -    err = xenbus_transaction_start(xh_default, &xbt);
>> +    err = xenbus_transaction_start(watch->xh, &xbt);
>>       if (err)
>>           return;
>> -    str = (char *)xenbus_read(xh_default, xbt, "control", "shutdown",
>> NULL);
>> +    str = (char *)xenbus_read(watch->xh, xbt, "control", "shutdown",
>> NULL);
>>       /* Ignore read errors and empty reads. */
>>       if (XENBUS_IS_ERR_READ(str)) {
>> -        xenbus_transaction_end(xh_default, xbt, 1);
>> +        xenbus_transaction_end(watch->xh, xbt, 1);
>>           return;
>>       }
>> @@ -245,9 +245,9 @@ static void shutdown_handler(struct xenbus_watch
>> *watch,
>>       /* Only acknowledge commands which we are prepared to handle. */
>>       if (idx < ARRAY_SIZE(shutdown_handlers))
>> -        xenbus_write(xh_default, xbt, "control", "shutdown", "");
>> +        xenbus_write(watch->xh, xbt, "control", "shutdown", "");
>> -    err = xenbus_transaction_end(xh_default, xbt, 0);
>> +    err = xenbus_transaction_end(watch->xh, xbt, 0);
>>       if (err == -EAGAIN) {
>>           kfree(str);
>>           goto again;
>> @@ -272,10 +272,10 @@ static void sysrq_handler(struct xenbus_watch
>> *watch, const char *path,
>>       int err;
>>    again:
>> -    err = xenbus_transaction_start(xh_default, &xbt);
>> +    err = xenbus_transaction_start(watch->xh, &xbt);
>>       if (err)
>>           return;
>> -    err = xenbus_scanf(xh_default, xbt, "control", "sysrq", "%c",
>> &sysrq_key);
>> +    err = xenbus_scanf(watch->xh, xbt, "control", "sysrq", "%c",
>> &sysrq_key);
>>       if (err < 0) {
>>           /*
>>            * The Xenstore watch fires directly after registering it and
>> @@ -287,21 +287,21 @@ static void sysrq_handler(struct xenbus_watch
>> *watch, const char *path,
>>           if (err != -ENOENT && err != -ERANGE)
>>               pr_err("Error %d reading sysrq code in control/sysrq\n",
>>                      err);
>> -        xenbus_transaction_end(xh_default, xbt, 1);
>> +        xenbus_transaction_end(watch->xh, xbt, 1);
>>           return;
>>       }
>>       if (sysrq_key != '\0') {
>> -        err = xenbus_printf(xh_default, xbt, "control", "sysrq",
>> "%c", '\0');
>> +        err = xenbus_printf(watch->xh, xbt, "control", "sysrq", "%c",
>> '\0');
>>           if (err) {
>>               pr_err("%s: Error %d writing sysrq in control/sysrq\n",
>>                      __func__, err);
>> -            xenbus_transaction_end(xh_default, xbt, 1);
>> +            xenbus_transaction_end(watch->xh, xbt, 1);
>>               return;
>>           }
>>       }
>> -    err = xenbus_transaction_end(xh_default, xbt, 0);
>> +    err = xenbus_transaction_end(watch->xh, xbt, 0);
>>       if (err == -EAGAIN)
>>           goto again;
>> @@ -324,14 +324,14 @@ static struct notifier_block xen_reboot_nb = {
>>       .notifier_call = poweroff_nb,
>>   };
>> -static int setup_shutdown_watcher(void)
>> +static int setup_shutdown_watcher(xenhost_t *xh)
>
> I think shutdown is purely local, too.
Yes, I introduced xenhost for the same reason as above.

I agree that neither of these cases (nor similar others) has any use
for the concept of xenhost. Do you think it makes sense for these
to pass NULL instead, with the underlying interface just assuming
xh_default?
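
A minimal sketch of that convention (xh_get() is an invented name, as is
the xenhost_t layout), so purely-local callers like the shutdown/sysrq
watchers never have to name a xenhost explicitly:

```c
#include <stddef.h>

typedef struct { int id; } xenhost_t;	/* invented layout */

static xenhost_t local_xenhost = { 0 };
xenhost_t *xh_default = &local_xenhost;

/* Hypothetical convention: NULL means "whatever the local Xen is". */
static inline xenhost_t *xh_get(xenhost_t *xh)
{
	return xh ? xh : xh_default;
}
```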

Ankur

>
>
> Juergen

2019-06-19 03:01:51

by Ankur Arora

Subject: Re: [RFC PATCH 14/16] xen/blk: gnttab, evtchn, xenbus API changes

On 6/17/19 3:14 AM, Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> For the most part, we now pass xenhost_t * as a parameter.
>>
>> Co-developed-by: Joao Martins <[email protected]>
>> Signed-off-by: Ankur Arora <[email protected]>
>
> I don't see how this can be a patch on its own.
Yes, the reason this was separate was that, given this was an
RFC, I didn't want to pollute the logic patches with lots of
mechanical changes.

>
> The only way to be able to use a patch for each driver would be to
> keep the original grant-, event- and xenbus-interfaces and add the
> new ones taking xenhost * with a new name. The original interfaces
> could then use xenhost_default and you can switch them to the new
> interfaces one by one. The last patch could then remove the old
> interfaces when there is no user left.
Yes, this makes sense.
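
Roughly this shape, I take it (names and the trivial body are
illustrative stand-ins, not from the series): the old entry point stays
as an inline wrapper over the new xenhost-aware one until all callers
are converted, after which the final patch deletes it:

```c
typedef struct { int id; } xenhost_t;	/* invented layout */

static xenhost_t local_xenhost;
xenhost_t *xh_default = &local_xenhost;

/* New interface, added alongside the old one. */
static int gnttab_grant_foreign_access_xh(xenhost_t *xh, unsigned int domid,
					  unsigned long frame, int readonly)
{
	(void)xh; (void)domid; (void)readonly;
	return (int)frame;	/* stand-in for the real grant allocation */
}

/* Old interface, kept so not-yet-converted drivers still build;
 * removed by the last patch once no users remain. */
static inline int gnttab_grant_foreign_access(unsigned int domid,
					      unsigned long frame,
					      int readonly)
{
	return gnttab_grant_foreign_access_xh(xh_default, domid, frame,
					      readonly);
}
```

That way each driver can be switched over in its own patch.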

Ankur

>
>
> Juergen

2019-06-19 03:02:43

by Ankur Arora

Subject: Re: [RFC PATCH 16/16] xen/grant-table: host_addr fixup in mapping on xenhost_r0

On 6/17/19 3:55 AM, Juergen Gross wrote:
> On 09.05.19 19:25, Ankur Arora wrote:
>> Xenhost type xenhost_r0 does not support standard GNTTABOP_map_grant_ref
>> semantics (map a gref onto a specified host_addr). That's because
>> since the hypervisor is local (same address space as the caller of
>> GNTTABOP_map_grant_ref), there is no external entity that could
>> map an arbitrary page underneath an arbitrary address.
>>
>> To handle this, the GNTTABOP_map_grant_ref hypercall on xenhost_r0
>> treats the host_addr as an OUT parameter instead of IN, and expects
>> gnttab_map_refs() and similar to fix up any state that caches the
>> value of host_addr from before the hypercall.
>>
>> Accordingly, gnttab_map_refs() now takes two additional parameters: a
>> fixup function and a pointer to the cached maps to fix up:
>>   int gnttab_map_refs(xenhost_t *xh, struct gnttab_map_grant_ref
>> *map_ops,
>>               struct gnttab_map_grant_ref *kmap_ops,
>> -            struct page **pages, unsigned int count)
>> +            struct page **pages, gnttab_map_fixup_t map_fixup_fn,
>> +            void **map_fixup[], unsigned int count)
>>
>> The reason we use a fixup function and not an additional mapping op
>> in the xenhost_t is because, depending on the caller, what we are fixing
>> might be different: blkback and netback, for instance, cache host_addr
>> via a struct page *, while __xenbus_map_ring() caches a phys_addr.
>>
>> This patch fixes up xen-blkback and xen-gntdev drivers.
>>
>> TODO:
>>    - also rewrite gnttab_batch_map() and __xenbus_map_ring().
>>    - modify xen-netback, scsiback, pciback etc
>>
>> Co-developed-by: Joao Martins <[email protected]>
>> Signed-off-by: Ankur Arora <[email protected]>
>
> Without seeing the __xenbus_map_ring() modification it is impossible to
> do a proper review of this patch.
Will do in v2.
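
In the meantime, a toy sketch of the fixup-callback shape the commit
message describes (types, names, and the simplified "hypercall already
ran" premise are all invented here): on xenhost_r0 the host_addr comes
back as an OUT value, and the caller's fixup function patches whatever
it had cached:

```c
#include <stdint.h>
#include <stddef.h>

/* Caller-supplied fixup: rewrite one cached entry with the host_addr
 * the (local) hypercall actually chose. */
typedef void (*gnttab_map_fixup_t)(void **cached, uint64_t host_addr);

/* blkback-style fixup: the cache holds a struct page *-like handle. */
static void page_cache_fixup(void **cached, uint64_t host_addr)
{
	*cached = (void *)(uintptr_t)host_addr;
}

/* Stand-in for gnttab_map_refs() on xenhost_r0: here the "hypercall"
 * has already filled host_addrs[], and we run the fixup over the
 * caller's cache. */
static int gnttab_map_refs_sketch(const uint64_t *host_addrs,
				  gnttab_map_fixup_t fixup_fn,
				  void **fixup, unsigned int count)
{
	for (unsigned int i = 0; i < count; i++)
		if (fixup_fn)
			fixup_fn(&fixup[i], host_addrs[i]);
	return 0;
}
```

__xenbus_map_ring() would supply a different fixup_fn that rewrites a
phys_addr instead of a page handle.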

Ankur

>
>
> Juergen