2010-02-08 08:05:59

by Sheng Yang

Subject: [PATCH 0/7][v3] PV featured HVM(Hybrid) for Xen

Hi, Jeremy & Keir & Ian

Here is the third version of the patchset to enable Xen Hybrid extension
support in the Linux kernel.

A guest with the Hybrid extension starts from real mode like an HVM guest, but
also has a range of PV features available (e.g. PV halt, PV timer, event
channels, as well as PV drivers). Such a guest can therefore take advantage of
both H/W virtualization and para-virtualization.

The first two patches of the series import several header files from Jeremy's
tree and the Xen tree; credit to Jeremy's and Keir's work.

The whole patchset is based on upstream Linux.

You need a line like:

cpuid = [ '0x40000002:edx=0x3' ]

in the HVM configuration file to expose the hybrid feature to the guest, and

CONFIG_XEN

set in the guest kernel configuration to enable hybrid support.

The compiled image can then be used as a native kernel, a PV domU, a plain HVM
guest, or a PV featured HVM kernel.
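
For reference, a minimal guest configuration carrying that cpuid line might
look like the sketch below; everything except the cpuid entry is a placeholder
to be adapted to your setup. The value edx=0x3 sets bits 0 and 1 of CPUID leaf
0x40000002 EDX, i.e. XEN_CPUID_FEAT2_HVM_PV and XEN_CPUID_FEAT2_HVM_PV_EVTCHN
introduced in patches 2/7 and 3/7.

kernel  = "/usr/lib/xen/boot/hvmloader"
builder = "hvm"
name    = "hvm-pv-guest"
memory  = 1024
vcpus   = 2
disk    = [ 'file:/path/to/disk.img,hda,w' ]
cpuid   = [ '0x40000002:edx=0x3' ]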

Currently the patchset supports x86_64 only.

Changes from v2:
1. Changed the name "hybrid" to "PV featured HVM".
2. Unified the PV drivers' xen_domain() checks into xen_evtchn_enabled().
3. Moved the feature (evtchn) initialization hypercall next to the place where
the feature is actually enabled, rather than a single unified place before any
feature is enabled.
4. Removed the reserved E820 region for the grant table; the QEmu Xen platform
device's MMIO region is used instead.

But item 4 introduces another issue: gnttab_init() is a core_initcall, while
the QEmu Xen platform device is an ordinary device whose initialization happens
much later (and we would need more code to probe that device). Currently I just
hardcode the MMIO address of the platform device in gnttab_init() for HVM
initialization (see the condensed sketch below), but I don't think that's a
good idea. And since an HVM guest's P2M differs from PV's address-translation
mechanism, we can't use the PV code to initialize the grant table. So I think
we may still need to reserve several pages for it? The pages could be known by
the hypervisor, which would then notify the guest somehow, like Ian proposed
earlier. But it still seems like we need E820... Any more ideas?
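
As a condensed, illustrative sketch of that workaround (roughly what patch 7/7
does), the grant table frames are currently placed in the MMIO window QEmu
reserves for the Xen platform device, at a hardcoded address;
hvm_pv_map_gnttab is a hypothetical helper name used only for this sketch, and
"shared" stands for the grant table mapping pointer in grant-table.c:

/* MMIO region QEmu reserves for the Xen platform device */
#define GNTTAB_START	0xf2000000ul
#define GNTTAB_SIZE	0x20000ul

static int hvm_pv_map_gnttab(unsigned int nr_gframes)
{
	struct xen_add_to_physmap xatp;
	unsigned int i;

	if (PAGE_SIZE * nr_gframes > GNTTAB_SIZE)
		return -EINVAL;

	/* Ask Xen to back the hardcoded MMIO range with grant table frames */
	for (i = 0; i < nr_gframes; i++) {
		xatp.domid = DOMID_SELF;
		xatp.idx = i;
		xatp.space = XENMAPSPACE_grant_table;
		xatp.gpfn = (GNTTAB_START >> PAGE_SHIFT) + i;
		if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
			BUG();
	}

	/* The guest then accesses the frames through ioremap() */
	shared = ioremap(GNTTAB_START, PAGE_SIZE * nr_gframes);
	return shared ? 0 : -ENOMEM;
}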

The major changes from v1:
1. SMP support.
2. Modified the entry point to avoid most generic kernel modifications.
3. Bound the PV timer to the event channel mechanism.

--
regards
Yang, Sheng

arch/x86/include/asm/xen/cpuid.h | 73 +++++++++++++
arch/x86/include/asm/xen/hypercall.h | 6 +
arch/x86/kernel/setup.c | 8 ++
arch/x86/xen/enlighten.c | 188 ++++++++++++++++++++++++++++++++++
arch/x86/xen/irq.c | 54 ++++++++++
arch/x86/xen/smp.c | 144 ++++++++++++++++++++++++++-
arch/x86/xen/xen-head.S | 6 +
arch/x86/xen/xen-ops.h | 4 +
drivers/block/xen-blkfront.c | 3 -
drivers/input/xen-kbdfront.c | 6 +-
drivers/net/xen-netfront.c | 5 -
drivers/video/xen-fbfront.c | 6 +-
drivers/xen/events.c | 66 +++++++++++-
drivers/xen/grant-table.c | 59 ++++++++++-
drivers/xen/xenbus/xenbus_probe.c | 22 +++-
include/xen/events.h | 1 +
include/xen/hvm.h | 28 +++++
include/xen/interface/hvm/hvm_op.h | 79 ++++++++++++++
include/xen/interface/hvm/params.h | 111 ++++++++++++++++++++
include/xen/interface/xen.h | 6 +-
include/xen/xen.h | 11 ++
include/xen/xenbus.h | 3 +
22 files changed, 861 insertions(+), 28 deletions(-)


2010-02-08 08:06:19

by Sheng Yang

Subject: [PATCH 6/7] xen: Unified checking for Xen of PV drivers to xenbus_register_frontend()


Signed-off-by: Sheng Yang <[email protected]>
---
drivers/block/xen-blkfront.c | 3 ---
drivers/input/xen-kbdfront.c | 3 ---
drivers/net/xen-netfront.c | 5 -----
drivers/video/xen-fbfront.c | 3 ---
include/xen/xenbus.h | 3 +++
5 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 05a31e5..d6465c1 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1068,9 +1068,6 @@ static struct xenbus_driver blkfront = {

static int __init xlblk_init(void)
{
- if (!xen_domain())
- return -ENODEV;
-
if (register_blkdev(XENVBD_MAJOR, DEV_NAME)) {
printk(KERN_WARNING "xen_blk: can't get major %d with name %s\n",
XENVBD_MAJOR, DEV_NAME);
diff --git a/drivers/input/xen-kbdfront.c b/drivers/input/xen-kbdfront.c
index c721c0a..febffc3 100644
--- a/drivers/input/xen-kbdfront.c
+++ b/drivers/input/xen-kbdfront.c
@@ -338,9 +338,6 @@ static struct xenbus_driver xenkbd_driver = {

static int __init xenkbd_init(void)
{
- if (!xen_domain())
- return -ENODEV;
-
/* Nothing to do if running in dom0. */
if (xen_initial_domain())
return -ENODEV;
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index a869b45..d89fd0b 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1804,14 +1804,9 @@ static struct xenbus_driver netfront_driver = {

static int __init netif_init(void)
{
- if (!xen_domain())
- return -ENODEV;
-
if (xen_initial_domain())
return 0;

- printk(KERN_INFO "Initialising Xen virtual ethernet driver.\n");
-
return xenbus_register_frontend(&netfront_driver);
}
module_init(netif_init);
diff --git a/drivers/video/xen-fbfront.c b/drivers/video/xen-fbfront.c
index 603598f..daff72f 100644
--- a/drivers/video/xen-fbfront.c
+++ b/drivers/video/xen-fbfront.c
@@ -683,9 +683,6 @@ static struct xenbus_driver xenfb_driver = {

static int __init xenfb_init(void)
{
- if (!xen_domain())
- return -ENODEV;
-
/* Nothing to do if running in dom0. */
if (xen_initial_domain())
return -ENODEV;
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index b9763ba..9f68cf5 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -43,6 +43,7 @@
#include <xen/interface/grant_table.h>
#include <xen/interface/io/xenbus.h>
#include <xen/interface/io/xs_wire.h>
+#include <xen/xen.h>

/* Register callback to watch this node. */
struct xenbus_watch
@@ -112,6 +113,8 @@ static inline int __must_check
xenbus_register_frontend(struct xenbus_driver *drv)
{
WARN_ON(drv->owner != THIS_MODULE);
+ if (!xen_domain())
+ return -ENODEV;
return __xenbus_register_frontend(drv, THIS_MODULE, KBUILD_MODNAME);
}

--
1.5.4.5

2010-02-08 08:06:26

by Sheng Yang

Subject: [PATCH 3/7] xen/hvm: Xen PV featured HVM initialization

A PV featured HVM guest (once known as Hybrid) starts from real mode like an
HVM guest, but with component-based PV feature selection (e.g. PV halt, PV
timer, event channels, and later PV drivers). So the guest can take advantage
of both H/W virtualization and para-virtualization.

This patch introduces the PV featured HVM guest initialization.

The guest detects the capability using CPUID 0x40000002.edx, then calls the
HVMOP_enable_pv hypercall to enable PV support in the hypervisor.

Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Yaozu (Eddie) Dong <[email protected]>
---
arch/x86/include/asm/xen/cpuid.h | 5 ++
arch/x86/xen/enlighten.c | 115 ++++++++++++++++++++++++++++++++++++
arch/x86/xen/irq.c | 21 +++++++
arch/x86/xen/xen-head.S | 6 ++
arch/x86/xen/xen-ops.h | 1 +
include/xen/interface/hvm/hvm_op.h | 7 ++
include/xen/xen.h | 9 +++
7 files changed, 164 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
index 8787f03..a93c851 100644
--- a/arch/x86/include/asm/xen/cpuid.h
+++ b/arch/x86/include/asm/xen/cpuid.h
@@ -65,4 +65,9 @@
#define _XEN_CPUID_FEAT1_MMU_PT_UPDATE_PRESERVE_AD 0
#define XEN_CPUID_FEAT1_MMU_PT_UPDATE_PRESERVE_AD (1u<<0)

+#define _XEN_CPUID_FEAT2_HVM_PV 0
+#define XEN_CPUID_FEAT2_HVM_PV (1u<<0)
+#define _XEN_CPUID_FEAT2_HVM_PV_EVTCHN 1
+#define XEN_CPUID_FEAT2_HVM_PV_EVTCHN (1u<<1)
+
#endif /* __XEN_PUBLIC_ARCH_X86_CPUID_H__ */
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 2b26dd5..cbd1374 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -34,6 +34,8 @@
#include <xen/interface/version.h>
#include <xen/interface/physdev.h>
#include <xen/interface/vcpu.h>
+#include <xen/interface/memory.h>
+#include <xen/interface/hvm/hvm_op.h>
#include <xen/features.h>
#include <xen/page.h>
#include <xen/hvc-console.h>
@@ -43,6 +45,7 @@
#include <asm/page.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/hypervisor.h>
+#include <asm/xen/cpuid.h>
#include <asm/fixmap.h>
#include <asm/processor.h>
#include <asm/proto.h>
@@ -1194,3 +1197,115 @@ asmlinkage void __init xen_start_kernel(void)
x86_64_start_reservations((char *)__pa_symbol(&boot_params));
#endif
}
+
+static void __init xen_hvm_pv_banner(void)
+{
+ unsigned version = HYPERVISOR_xen_version(XENVER_version, NULL);
+ struct xen_extraversion extra;
+ HYPERVISOR_xen_version(XENVER_extraversion, &extra);
+
+ printk(KERN_INFO "Booting PV featured HVM kernel on %s\n",
+ pv_info.name);
+ printk(KERN_INFO "Xen version: %d.%d%s\n",
+ version >> 16, version & 0xffff, extra.extraversion);
+}
+
+static int xen_para_available(void)
+{
+ uint32_t eax, ebx, ecx, edx;
+ cpuid(XEN_CPUID_LEAF(0), &eax, &ebx, &ecx, &edx);
+
+ if (ebx == XEN_CPUID_SIGNATURE_EBX &&
+ ecx == XEN_CPUID_SIGNATURE_ECX &&
+ edx == XEN_CPUID_SIGNATURE_EDX &&
+ ((eax - XEN_CPUID_LEAF(0)) >= 2))
+ return 1;
+
+ return 0;
+}
+
+u32 xen_hvm_pv_status;
+EXPORT_SYMBOL_GPL(xen_hvm_pv_status);
+
+static int enable_hvm_pv(u64 flags)
+{
+ struct xen_hvm_pv_type a;
+
+ a.domid = DOMID_SELF;
+ a.flags = flags;
+ return HYPERVISOR_hvm_op(HVMOP_enable_pv, &a);
+}
+
+static int init_hvm_pv_info(void)
+{
+ uint32_t ecx, edx, pages, msr;
+ u64 pfn;
+
+ if (!xen_para_available())
+ return -EINVAL;
+
+ cpuid(XEN_CPUID_LEAF(2), &pages, &msr, &ecx, &edx);
+
+ /* Check if hvm_pv mode is supported */
+ if (!(edx & XEN_CPUID_FEAT2_HVM_PV))
+ return -ENODEV;
+
+ xen_hvm_pv_status = XEN_HVM_PV_ENABLED;
+
+ /* We only support 1 page of hypercall for now */
+ if (pages != 1)
+ return -ENOMEM;
+
+ pfn = __pa(hypercall_page);
+ wrmsrl(msr, pfn);
+
+ xen_setup_features();
+
+ x86_init.oem.banner = xen_hvm_pv_banner;
+ pv_info = xen_info;
+ pv_info.kernel_rpl = 0;
+
+ return 0;
+}
+
+extern struct shared_info shared_info_page;
+
+static void __init init_shared_info(void)
+{
+ struct xen_add_to_physmap xatp;
+
+ xatp.domid = DOMID_SELF;
+ xatp.idx = 0;
+ xatp.space = XENMAPSPACE_shared_info;
+ xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT;
+ if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
+ BUG();
+
+ HYPERVISOR_shared_info = (struct shared_info *)&shared_info_page;
+
+ /* Don't do the full vcpu_info placement stuff until we have a
+ possible map and a non-dummy shared_info. */
+ per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
+}
+
+void __init xen_guest_init(void)
+{
+#ifdef CONFIG_X86_32
+ return;
+#else
+ int r;
+
+ /* Ensure the we won't confused with PV */
+ if (xen_domain_type == XEN_PV_DOMAIN)
+ return;
+
+ r = init_hvm_pv_info();
+ if (r < 0)
+ return;
+
+ init_shared_info();
+
+ xen_hvm_pv_init_irq_ops();
+#endif
+}
+
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 9d30105..fadaa97 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -131,3 +131,24 @@ void __init xen_init_irq_ops()
pv_irq_ops = xen_irq_ops;
x86_init.irqs.intr_init = xen_init_IRQ;
}
+
+static void xen_hvm_pv_safe_halt(void)
+{
+ /* Do local_irq_enable() explicitly in hvm_pv guest */
+ local_irq_enable();
+ xen_safe_halt();
+}
+
+static void xen_hvm_pv_halt(void)
+{
+ if (irqs_disabled())
+ HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL);
+ else
+ xen_hvm_pv_safe_halt();
+}
+
+void __init xen_hvm_pv_init_irq_ops(void)
+{
+ pv_irq_ops.safe_halt = xen_hvm_pv_safe_halt;
+ pv_irq_ops.halt = xen_hvm_pv_halt;
+}
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 1a5ff24..26041ce 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -33,6 +33,12 @@ ENTRY(hypercall_page)
.skip PAGE_SIZE_asm
.popsection

+.pushsection .data
+ .align PAGE_SIZE_asm
+ENTRY(shared_info_page)
+ .skip PAGE_SIZE_asm
+.popsection
+
ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz "linux")
ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz "2.6")
ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz "xen-3.0")
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index f9153a3..cc00760 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,6 +41,7 @@ void xen_vcpu_restore(void);
void __init xen_build_dynamic_phys_to_machine(void);

void xen_init_irq_ops(void);
+void xen_hvm_pv_init_irq_ops(void);
void xen_setup_timer(int cpu);
void xen_setup_runstate_info(int cpu);
void xen_teardown_timer(int cpu);
diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
index 7c74ba4..0ce8a26 100644
--- a/include/xen/interface/hvm/hvm_op.h
+++ b/include/xen/interface/hvm/hvm_op.h
@@ -69,4 +69,11 @@ DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_set_pci_link_route);
/* Flushes all VCPU TLBs: @arg must be NULL. */
#define HVMOP_flush_tlbs 5

+#define HVMOP_enable_pv 9
+struct xen_hvm_pv_type {
+ domid_t domid;
+ uint32_t flags;
+#define HVM_PV_EVTCHN (1ull<<1)
+};
+
#endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/include/xen/xen.h b/include/xen/xen.h
index a164024..9bb92e5 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -9,6 +9,7 @@ enum xen_domain_type {

#ifdef CONFIG_XEN
extern enum xen_domain_type xen_domain_type;
+extern void xen_guest_init(void);
#else
#define xen_domain_type XEN_NATIVE
#endif
@@ -19,6 +20,14 @@ extern enum xen_domain_type xen_domain_type;
#define xen_hvm_domain() (xen_domain() && \
xen_domain_type == XEN_HVM_DOMAIN)

+#define XEN_HVM_PV_ENABLED (1u << 0)
+#define XEN_HVM_PV_EVTCHN_ENABLED (1u << 1)
+extern u32 xen_hvm_pv_status;
+
+#define xen_hvm_pv_enabled() (xen_hvm_pv_status & XEN_HVM_PV_ENABLED)
+#define xen_hvm_pv_evtchn_enabled() (xen_hvm_pv_enabled() && \
+ (xen_hvm_pv_status & XEN_HVM_PV_EVTCHN_ENABLED))
+
#ifdef CONFIG_XEN_DOM0
#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
--
1.5.4.5

2010-02-08 08:06:24

by Sheng Yang

Subject: [PATCH 7/7] xen: Enable grant table and xenbus

Now the PV drivers (vnif, vbd) can work.

Signed-off-by: Sheng Yang <[email protected]>
---

Ian, I gave vkbd and vfb a try and found they don't work... I took a look
at unmodified_driver and found it doesn't include them either. So I leave them
disabled in the HVM guest.

arch/x86/xen/enlighten.c | 1 +
drivers/input/xen-kbdfront.c | 3 ++
drivers/video/xen-fbfront.c | 3 ++
drivers/xen/grant-table.c | 59 +++++++++++++++++++++++++++++++++++--
drivers/xen/xenbus/xenbus_probe.c | 22 ++++++++++---
include/xen/xen.h | 2 +
include/xen/xenbus.h | 2 +-
7 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 53bf824..d82bfe1 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1273,6 +1273,7 @@ static int init_hvm_pv_info(void)
pv_info = xen_info;
pv_info.kernel_rpl = 0;

+ xen_domain_type = XEN_HVM_DOMAIN;
return 0;
}

diff --git a/drivers/input/xen-kbdfront.c b/drivers/input/xen-kbdfront.c
index febffc3..8e58812 100644
--- a/drivers/input/xen-kbdfront.c
+++ b/drivers/input/xen-kbdfront.c
@@ -342,6 +342,9 @@ static int __init xenkbd_init(void)
if (xen_initial_domain())
return -ENODEV;

+ if (xen_hvm_domain())
+ return -ENODEV;
+
return xenbus_register_frontend(&xenkbd_driver);
}

diff --git a/drivers/video/xen-fbfront.c b/drivers/video/xen-fbfront.c
index daff72f..b173474 100644
--- a/drivers/video/xen-fbfront.c
+++ b/drivers/video/xen-fbfront.c
@@ -687,6 +687,9 @@ static int __init xenfb_init(void)
if (xen_initial_domain())
return -ENODEV;

+ if (xen_hvm_domain())
+ return -ENODEV;
+
return xenbus_register_frontend(&xenfb_driver);
}

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 4c6c0bd..abffda5 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -46,6 +46,8 @@
#include <asm/pgtable.h>
#include <asm/sync_bitops.h>

+#include <xen/interface/memory.h>
+#include <linux/io.h>

/* External tools reserve first few grant table entries. */
#define NR_RESERVED_ENTRIES 8
@@ -441,12 +443,33 @@ static inline unsigned int max_nr_grant_frames(void)
return xen_max;
}

+static unsigned long hvm_pv_resume_frames;
+
static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
{
struct gnttab_setup_table setup;
unsigned long *frames;
unsigned int nr_gframes = end_idx + 1;
int rc;
+ struct xen_add_to_physmap xatp;
+ unsigned int i = end_idx;
+
+ if (xen_hvm_pv_evtchn_enabled()) {
+ /*
+ * Loop backwards, so that the first hypercall has the largest
+ * index, ensuring that the table will grow only once.
+ */
+ do {
+ xatp.domid = DOMID_SELF;
+ xatp.idx = i;
+ xatp.space = XENMAPSPACE_grant_table;
+ xatp.gpfn = (hvm_pv_resume_frames >> PAGE_SHIFT) + i;
+ if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
+ BUG();
+ } while (i-- > start_idx);
+
+ return 0;
+ }

frames = kmalloc(nr_gframes * sizeof(unsigned long), GFP_ATOMIC);
if (!frames)
@@ -473,11 +496,41 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
return 0;
}

+/* The region reserved by QEmu for Xen platform device */
+#define GNTTAB_START 0xf2000000ul
+#define GNTTAB_SIZE 0x20000ul
+
int gnttab_resume(void)
{
- if (max_nr_grant_frames() < nr_grant_frames)
+ unsigned int max_nr_gframes;
+
+ max_nr_gframes = max_nr_grant_frames();
+ if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;
- return gnttab_map(0, nr_grant_frames - 1);
+
+ if (xen_pv_domain())
+ return gnttab_map(0, nr_grant_frames - 1);
+
+ /* Xen PV featured HVM */
+ if (!hvm_pv_resume_frames) {
+ if (PAGE_SIZE * max_nr_gframes > GNTTAB_SIZE) {
+ printk(KERN_WARNING
+ "Grant table size exceed the limit!\n");
+ return -EINVAL;
+ }
+ hvm_pv_resume_frames = GNTTAB_START;
+ shared = ioremap(hvm_pv_resume_frames,
+ PAGE_SIZE * max_nr_gframes);
+ if (shared == NULL) {
+ printk(KERN_WARNING
+ "Fail to ioremap gnttab share frames\n");
+ return -ENOMEM;
+ }
+ }
+
+ gnttab_map(0, nr_grant_frames - 1);
+
+ return 0;
}

int gnttab_suspend(void)
@@ -510,7 +563,7 @@ static int __devinit gnttab_init(void)
unsigned int max_nr_glist_frames, nr_glist_frames;
unsigned int nr_init_grefs;

- if (!xen_domain())
+ if (!xen_evtchn_enabled())
return -ENODEV;

nr_grant_frames = 1;
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 2f7aaa9..2704f0c 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -55,6 +55,8 @@
#include <xen/events.h>
#include <xen/page.h>

+#include <xen/hvm.h>
+
#include "xenbus_comms.h"
#include "xenbus_probe.h"

@@ -786,7 +788,8 @@ static int __init xenbus_probe_init(void)
DPRINTK("");

err = -ENODEV;
- if (!xen_domain())
+
+ if (!xen_evtchn_enabled())
goto out_error;

/* Register ourselves with the kernel bus subsystem */
@@ -805,10 +808,19 @@ static int __init xenbus_probe_init(void)
/* dom0 not yet supported */
} else {
xenstored_ready = 1;
- xen_store_evtchn = xen_start_info->store_evtchn;
- xen_store_mfn = xen_start_info->store_mfn;
+ if (xen_hvm_pv_evtchn_enabled()) {
+ xen_store_evtchn =
+ hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
+ xen_store_mfn =
+ hvm_get_parameter(HVM_PARAM_STORE_PFN);
+ xen_store_interface =
+ ioremap(xen_store_mfn << PAGE_SHIFT, PAGE_SIZE);
+ } else {
+ xen_store_evtchn = xen_start_info->store_evtchn;
+ xen_store_mfn = xen_start_info->store_mfn;
+ xen_store_interface = mfn_to_virt(xen_store_mfn);
+ }
}
- xen_store_interface = mfn_to_virt(xen_store_mfn);

/* Initialize the interface to xenstore. */
err = xs_init();
@@ -922,7 +934,7 @@ static void wait_for_devices(struct xenbus_driver *xendrv)
struct device_driver *drv = xendrv ? &xendrv->driver : NULL;
unsigned int seconds_waited = 0;

- if (!ready_to_wait_for_devices || !xen_domain())
+ if (!ready_to_wait_for_devices || !xen_evtchn_enabled())
return;

while (exists_connecting_device(drv)) {
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 9bb92e5..f949aee 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -28,6 +28,8 @@ extern u32 xen_hvm_pv_status;
#define xen_hvm_pv_evtchn_enabled() (xen_hvm_pv_enabled() && \
(xen_hvm_pv_status & XEN_HVM_PV_EVTCHN_ENABLED))

+#define xen_evtchn_enabled() (xen_pv_domain() || xen_hvm_pv_evtchn_enabled())
+
#ifdef CONFIG_XEN_DOM0
#include <xen/interface/xen.h>
#include <asm/xen/hypervisor.h>
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 9f68cf5..0e79644 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -113,7 +113,7 @@ static inline int __must_check
xenbus_register_frontend(struct xenbus_driver *drv)
{
WARN_ON(drv->owner != THIS_MODULE);
- if (!xen_domain())
+ if (!xen_evtchn_enabled())
return -ENODEV;
return __xenbus_register_frontend(drv, THIS_MODULE, KBUILD_MODNAME);
}
--
1.5.4.5

2010-02-08 08:06:21

by Sheng Yang

Subject: [PATCH 5/7] xen: Make event channel work with PV featured HVM

We map each IOAPIC pin to a VIRQ, so that we can deliver interrupts through
these VIRQs.

We use X86_PLATFORM_IPI_VECTOR as the notification vector for the hypervisor
to notify the guest about events.

The Xen PV timer is used to provide the guest with a reliable timer.

The patch also enables SMP support, so we can deliver IPIs through event
channels as well.

With that, we no longer need the IOAPIC/LAPIC...

Signed-off-by: Sheng Yang <[email protected]>
---
arch/x86/xen/enlighten.c | 72 +++++++++++++++++++++
arch/x86/xen/irq.c | 37 ++++++++++-
arch/x86/xen/smp.c | 144 ++++++++++++++++++++++++++++++++++++++++++-
arch/x86/xen/xen-ops.h | 3 +
drivers/xen/events.c | 66 ++++++++++++++++++-
include/xen/events.h | 1 +
include/xen/hvm.h | 5 ++
include/xen/interface/xen.h | 6 ++-
8 files changed, 326 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index cbd1374..53bf824 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -58,6 +58,9 @@
#include <asm/reboot.h>
#include <asm/stackprotector.h>

+#include <xen/hvm.h>
+#include <xen/events.h>
+
#include "xen-ops.h"
#include "mmu.h"
#include "multicalls.h"
@@ -1208,6 +1211,8 @@ static void __init xen_hvm_pv_banner(void)
pv_info.name);
printk(KERN_INFO "Xen version: %d.%d%s\n",
version >> 16, version & 0xffff, extra.extraversion);
+ if (xen_hvm_pv_evtchn_enabled())
+ printk(KERN_INFO "PV feature: Event channel enabled\n");
}

static int xen_para_available(void)
@@ -1252,6 +1257,9 @@ static int init_hvm_pv_info(void)

xen_hvm_pv_status = XEN_HVM_PV_ENABLED;

+ if (edx & XEN_CPUID_FEAT2_HVM_PV_EVTCHN)
+ xen_hvm_pv_status |= XEN_HVM_PV_EVTCHN_ENABLED;
+
/* We only support 1 page of hypercall for now */
if (pages != 1)
return -ENOMEM;
@@ -1288,12 +1296,42 @@ static void __init init_shared_info(void)
per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
}

+static int set_callback_via(uint64_t via)
+{
+ struct xen_hvm_param a;
+
+ a.domid = DOMID_SELF;
+ a.index = HVM_PARAM_CALLBACK_IRQ;
+ a.value = via;
+ return HYPERVISOR_hvm_op(HVMOP_set_param, &a);
+}
+
+void do_hvm_pv_evtchn_intr(void)
+{
+#ifdef CONFIG_X86_64
+ per_cpu(irq_count, smp_processor_id())++;
+#endif
+ xen_evtchn_do_upcall(get_irq_regs());
+#ifdef CONFIG_X86_64
+ per_cpu(irq_count, smp_processor_id())--;
+#endif
+}
+
+#ifdef CONFIG_X86_LOCAL_APIC
+static void xen_hvm_pv_evtchn_apic_write(u32 reg, u32 val)
+{
+ /* The only one reached here should be EOI */
+ WARN_ON(reg != APIC_EOI);
+}
+#endif
+
void __init xen_guest_init(void)
{
#ifdef CONFIG_X86_32
return;
#else
int r;
+ uint64_t callback_via;

/* Ensure the we won't confused with PV */
if (xen_domain_type == XEN_PV_DOMAIN)
@@ -1306,6 +1344,40 @@ void __init xen_guest_init(void)
init_shared_info();

xen_hvm_pv_init_irq_ops();
+
+ if (xen_hvm_pv_evtchn_enabled()) {
+ if (enable_hvm_pv(HVM_PV_EVTCHN))
+ return -EINVAL;
+
+ pv_time_ops = xen_time_ops;
+
+ x86_init.timers.timer_init = xen_time_init;
+ x86_init.timers.setup_percpu_clockev = x86_init_noop;
+ x86_cpuinit.setup_percpu_clockev = x86_init_noop;
+
+ x86_platform.calibrate_tsc = xen_tsc_khz;
+ x86_platform.get_wallclock = xen_get_wallclock;
+ x86_platform.set_wallclock = xen_set_wallclock;
+
+ pv_apic_ops = xen_apic_ops;
+#ifdef CONFIG_X86_LOCAL_APIC
+ /*
+ * set up the basic apic ops.
+ */
+ set_xen_basic_apic_ops();
+ apic->write = xen_hvm_pv_evtchn_apic_write;
+#endif
+
+ callback_via = HVM_CALLBACK_VECTOR(X86_PLATFORM_IPI_VECTOR);
+ set_callback_via(callback_via);
+
+ x86_platform_ipi_callback = do_hvm_pv_evtchn_intr;
+
+ disable_acpi();
+
+ xen_hvm_pv_smp_init();
+ machine_ops = xen_machine_ops;
+ }
#endif
}

diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index fadaa97..7827a6d 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -5,6 +5,7 @@
#include <xen/interface/xen.h>
#include <xen/interface/sched.h>
#include <xen/interface/vcpu.h>
+#include <xen/xen.h>

#include <asm/xen/hypercall.h>
#include <asm/xen/hypervisor.h>
@@ -132,6 +133,20 @@ void __init xen_init_irq_ops()
x86_init.irqs.intr_init = xen_init_IRQ;
}

+static void xen_hvm_pv_evtchn_disable(void)
+{
+ native_irq_disable();
+ xen_irq_disable();
+}
+PV_CALLEE_SAVE_REGS_THUNK(xen_hvm_pv_evtchn_disable);
+
+static void xen_hvm_pv_evtchn_enable(void)
+{
+ native_irq_enable();
+ xen_irq_enable();
+}
+PV_CALLEE_SAVE_REGS_THUNK(xen_hvm_pv_evtchn_enable);
+
static void xen_hvm_pv_safe_halt(void)
{
/* Do local_irq_enable() explicitly in hvm_pv guest */
@@ -147,8 +162,26 @@ static void xen_hvm_pv_halt(void)
xen_hvm_pv_safe_halt();
}

+static const struct pv_irq_ops xen_hvm_pv_irq_ops __initdata = {
+ .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
+ .restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
+ .irq_disable = PV_CALLEE_SAVE(xen_hvm_pv_evtchn_disable),
+ .irq_enable = PV_CALLEE_SAVE(xen_hvm_pv_evtchn_enable),
+
+ .safe_halt = xen_hvm_pv_safe_halt,
+ .halt = xen_hvm_pv_halt,
+#ifdef CONFIG_X86_64
+ .adjust_exception_frame = paravirt_nop,
+#endif
+};
+
void __init xen_hvm_pv_init_irq_ops(void)
{
- pv_irq_ops.safe_halt = xen_hvm_pv_safe_halt;
- pv_irq_ops.halt = xen_hvm_pv_halt;
+ if (xen_hvm_pv_evtchn_enabled()) {
+ pv_irq_ops = xen_hvm_pv_irq_ops;
+ x86_init.irqs.intr_init = xen_hvm_pv_evtchn_init_IRQ;
+ } else {
+ pv_irq_ops.safe_halt = xen_hvm_pv_safe_halt;
+ pv_irq_ops.halt = xen_hvm_pv_halt;
+ }
}
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 563d205..a85d0b6 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -15,20 +15,26 @@
#include <linux/sched.h>
#include <linux/err.h>
#include <linux/smp.h>
+#include <linux/nmi.h>

#include <asm/paravirt.h>
#include <asm/desc.h>
#include <asm/pgtable.h>
#include <asm/cpu.h>
+#include <asm/trampoline.h>
+#include <asm/tlbflush.h>
+#include <asm/mtrr.h>

#include <xen/interface/xen.h>
#include <xen/interface/vcpu.h>

#include <asm/xen/interface.h>
#include <asm/xen/hypercall.h>
+#include <asm/xen/hypervisor.h>

#include <xen/page.h>
#include <xen/events.h>
+#include <xen/xen.h>

#include "xen-ops.h"
#include "mmu.h"
@@ -171,7 +177,8 @@ static void __init xen_smp_prepare_boot_cpu(void)

/* We've switched to the "real" per-cpu gdt, so make sure the
old memory can be recycled */
- make_lowmem_page_readwrite(xen_initial_gdt);
+ if (xen_pv_domain())
+ make_lowmem_page_readwrite(xen_initial_gdt);

xen_setup_vcpu_info_placement();
}
@@ -480,3 +487,138 @@ void __init xen_smp_init(void)
xen_fill_possible_map();
xen_init_spinlocks();
}
+
+static __cpuinit void xen_hvm_pv_start_secondary(void)
+{
+ int cpu = smp_processor_id();
+
+ cpu_init();
+ touch_nmi_watchdog();
+ preempt_disable();
+
+ /* otherwise gcc will move up smp_processor_id before the cpu_init */
+ barrier();
+ /*
+ * Check TSC synchronization with the BSP:
+ */
+ check_tsc_sync_target();
+
+ /* Done in smp_callin(), move it here */
+ set_mtrr_aps_delayed_init();
+ smp_store_cpu_info(cpu);
+
+ /* This must be done before setting cpu_online_mask */
+ set_cpu_sibling_map(cpu);
+ wmb();
+
+ set_cpu_online(smp_processor_id(), true);
+ per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
+
+ /* enable local interrupts */
+ local_irq_enable();
+
+ xen_setup_cpu_clockevents();
+
+ wmb();
+ cpu_idle();
+}
+
+static __cpuinit int
+hvm_pv_cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
+{
+ struct vcpu_guest_context *ctxt;
+ unsigned long start_ip;
+
+ if (cpumask_test_and_set_cpu(cpu, xen_cpu_initialized_map))
+ return 0;
+
+ ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);
+ if (ctxt == NULL)
+ return -ENOMEM;
+
+ early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
+ initial_code = (unsigned long)xen_hvm_pv_start_secondary;
+ stack_start.sp = (void *) idle->thread.sp;
+
+ /* start_ip had better be page-aligned! */
+ start_ip = setup_trampoline();
+
+ /* only start_ip is what we want */
+ ctxt->flags = VGCF_HVM_GUEST;
+ ctxt->user_regs.eip = start_ip;
+
+ printk(KERN_INFO "Booting processor %d ip 0x%lx\n", cpu, start_ip);
+
+ if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
+ BUG();
+
+ kfree(ctxt);
+ return 0;
+}
+
+static int __init xen_hvm_pv_cpu_up(unsigned int cpu)
+{
+ struct task_struct *idle = idle_task(cpu);
+ int rc;
+ unsigned long flags;
+
+ per_cpu(current_task, cpu) = idle;
+
+#ifdef CONFIG_X86_32
+ irq_ctx_init(cpu);
+#else
+ clear_tsk_thread_flag(idle, TIF_FORK);
+ initial_gs = per_cpu_offset(cpu);
+ per_cpu(kernel_stack, cpu) =
+ (unsigned long)task_stack_page(idle) -
+ KERNEL_STACK_OFFSET + THREAD_SIZE;
+#endif
+
+ xen_setup_timer(cpu);
+ xen_init_lock_cpu(cpu);
+
+ per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+
+ rc = hvm_pv_cpu_initialize_context(cpu, idle);
+ if (rc)
+ return rc;
+
+ if (num_online_cpus() == 1)
+ alternatives_smp_switch(1);
+
+ rc = xen_smp_intr_init(cpu);
+ if (rc)
+ return rc;
+
+ rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
+ BUG_ON(rc);
+
+ /*
+ * Check TSC synchronization with the AP (keep irqs disabled
+ * while doing so):
+ */
+ local_irq_save(flags);
+ check_tsc_sync_source(cpu);
+ local_irq_restore(flags);
+
+ while (!cpu_online(cpu)) {
+ cpu_relax();
+ touch_nmi_watchdog();
+ }
+
+ return 0;
+}
+
+static void xen_hvm_pv_flush_tlb_others(const struct cpumask *cpumask,
+ struct mm_struct *mm, unsigned long va)
+{
+ /* TODO Make it more specific */
+ flush_tlb_all();
+}
+
+void __init xen_hvm_pv_smp_init(void)
+{
+ smp_ops = xen_smp_ops;
+ smp_ops.cpu_up = xen_hvm_pv_cpu_up;
+ pv_mmu_ops.flush_tlb_others = xen_hvm_pv_flush_tlb_others;
+}
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index cc00760..a8cc129 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -34,6 +34,7 @@ void xen_reserve_top(void);
char * __init xen_memory_setup(void);
void __init xen_arch_setup(void);
void __init xen_init_IRQ(void);
+void __init xen_hvm_pv_evtchn_init_IRQ(void);
void xen_enable_sysenter(void);
void xen_enable_syscall(void);
void xen_vcpu_restore(void);
@@ -61,10 +62,12 @@ void xen_setup_vcpu_info_placement(void);

#ifdef CONFIG_SMP
void xen_smp_init(void);
+void xen_hvm_pv_smp_init(void);

extern cpumask_var_t xen_cpu_initialized_map;
#else
static inline void xen_smp_init(void) {}
+static inline void xen_hvm_pv_smp_init(void) {}
#endif

#ifdef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index ce602dd..b1549c0 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -37,9 +37,12 @@

#include <xen/xen-ops.h>
#include <xen/events.h>
+#include <xen/xen.h>
#include <xen/interface/xen.h>
#include <xen/interface/event_channel.h>

+#include <asm/desc.h>
+
/*
* This lock protects updates to the following mapping and reference-count
* arrays. The lock does not need to be acquired to read the mapping tables.
@@ -624,8 +627,13 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
struct vcpu_info *vcpu_info = __get_cpu_var(xen_vcpu);
unsigned count;

- exit_idle();
- irq_enter();
+ /*
+ * If is PV featured HVM, these have already been done
+ */
+ if (!xen_hvm_pv_evtchn_enabled()) {
+ exit_idle();
+ irq_enter();
+ }

do {
unsigned long pending_words;
@@ -662,8 +670,10 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
} while(count != 1);

out:
- irq_exit();
- set_irq_regs(old_regs);
+ if (!xen_hvm_pv_evtchn_enabled()) {
+ irq_exit();
+ set_irq_regs(old_regs);
+ }

put_cpu();
}
@@ -944,3 +954,51 @@ void __init xen_init_IRQ(void)

irq_ctx_init(smp_processor_id());
}
+
+void __init xen_hvm_pv_evtchn_init_IRQ(void)
+{
+ int i;
+
+ xen_init_IRQ();
+ for (i = 0; i < NR_IRQS_LEGACY; i++) {
+ struct evtchn_bind_virq bind_virq;
+ struct irq_desc *desc = irq_to_desc(i);
+ int virq, evtchn;
+
+ virq = i + VIRQ_EMUL_PIN_START;
+ bind_virq.virq = virq;
+ bind_virq.vcpu = 0;
+
+ if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
+ &bind_virq) != 0)
+ BUG();
+
+ evtchn = bind_virq.port;
+ evtchn_to_irq[evtchn] = i;
+ irq_info[i] = mk_virq_info(evtchn, virq);
+
+ desc->status = IRQ_DISABLED;
+ desc->action = NULL;
+ desc->depth = 1;
+
+ /*
+ * 16 old-style INTA-cycle interrupts:
+ */
+ set_irq_chip_and_handler_name(i, &xen_dynamic_chip,
+ handle_level_irq, "event");
+ }
+
+ /*
+ * Cover the whole vector space, no vector can escape
+ * us. (some of these will be overridden and become
+ * 'special' SMP interrupts)
+ */
+ for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
+ int vector = FIRST_EXTERNAL_VECTOR + i;
+ if (vector != IA32_SYSCALL_VECTOR)
+ set_intr_gate(vector, interrupt[i]);
+ }
+
+ /* generic IPI for platform specific use, now used for HVM evtchn */
+ alloc_intr_gate(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi);
+}
diff --git a/include/xen/events.h b/include/xen/events.h
index e68d59a..91755db 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -56,4 +56,5 @@ void xen_poll_irq(int irq);
/* Determine the IRQ which is bound to an event channel */
unsigned irq_from_evtchn(unsigned int evtchn);

+void xen_evtchn_do_upcall(struct pt_regs *regs);
#endif /* _XEN_EVENTS_H */
diff --git a/include/xen/hvm.h b/include/xen/hvm.h
index 4ea8887..c66d788 100644
--- a/include/xen/hvm.h
+++ b/include/xen/hvm.h
@@ -20,4 +20,9 @@ static inline unsigned long hvm_get_parameter(int idx)
return xhv.value;
}

+#define HVM_CALLBACK_VIA_TYPE_VECTOR 0x2
+#define HVM_CALLBACK_VIA_TYPE_SHIFT 56
+#define HVM_CALLBACK_VECTOR(x) (((uint64_t)HVM_CALLBACK_VIA_TYPE_VECTOR)<<\
+ HVM_CALLBACK_VIA_TYPE_SHIFT | (x))
+
#endif /* XEN_HVM_H__ */
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index 2befa3e..9282ff7 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -90,7 +90,11 @@
#define VIRQ_ARCH_6 22
#define VIRQ_ARCH_7 23

-#define NR_VIRQS 24
+#define VIRQ_EMUL_PIN_START 24
+#define VIRQ_EMUL_PIN_NUM 16
+
+#define NR_VIRQS 40
+
/*
* MMU-UPDATE REQUESTS
*
--
1.5.4.5

2010-02-08 08:06:12

by Sheng Yang

Subject: [PATCH 1/7] xen: add support for hvm_op

From: Jeremy Fitzhardinge <[email protected]>

Add support for hvm_op hypercall.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
---
arch/x86/include/asm/xen/hypercall.h | 6 ++
include/xen/hvm.h | 23 +++++++
include/xen/interface/hvm/hvm_op.h | 72 ++++++++++++++++++++++
include/xen/interface/hvm/params.h | 111 ++++++++++++++++++++++++++++++++++
4 files changed, 212 insertions(+), 0 deletions(-)
create mode 100644 include/xen/hvm.h
create mode 100644 include/xen/interface/hvm/hvm_op.h
create mode 100644 include/xen/interface/hvm/params.h

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 9c371e4..47c2ebb 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -417,6 +417,12 @@ HYPERVISOR_nmi_op(unsigned long op, unsigned long arg)
return _hypercall2(int, nmi_op, op, arg);
}

+static inline unsigned long __must_check
+HYPERVISOR_hvm_op(int op, void *arg)
+{
+ return _hypercall2(unsigned long, hvm_op, op, arg);
+}
+
static inline void
MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
{
diff --git a/include/xen/hvm.h b/include/xen/hvm.h
new file mode 100644
index 0000000..4ea8887
--- /dev/null
+++ b/include/xen/hvm.h
@@ -0,0 +1,23 @@
+/* Simple wrappers around HVM functions */
+#ifndef XEN_HVM_H__
+#define XEN_HVM_H__
+
+#include <xen/interface/hvm/params.h>
+
+static inline unsigned long hvm_get_parameter(int idx)
+{
+ struct xen_hvm_param xhv;
+ int r;
+
+ xhv.domid = DOMID_SELF;
+ xhv.index = idx;
+ r = HYPERVISOR_hvm_op(HVMOP_get_param, &xhv);
+ if (r < 0) {
+ printk(KERN_ERR "cannot get hvm parameter %d: %d.\n",
+ idx, r);
+ return 0;
+ }
+ return xhv.value;
+}
+
+#endif /* XEN_HVM_H__ */
diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
new file mode 100644
index 0000000..7c74ba4
--- /dev/null
+++ b/include/xen/interface/hvm/hvm_op.h
@@ -0,0 +1,72 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PUBLIC_HVM_HVM_OP_H__
+#define __XEN_PUBLIC_HVM_HVM_OP_H__
+
+/* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
+#define HVMOP_set_param 0
+#define HVMOP_get_param 1
+struct xen_hvm_param {
+ domid_t domid; /* IN */
+ uint32_t index; /* IN */
+ uint64_t value; /* IN/OUT */
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_param);
+
+/* Set the logical level of one of a domain's PCI INTx wires. */
+#define HVMOP_set_pci_intx_level 2
+struct xen_hvm_set_pci_intx_level {
+ /* Domain to be updated. */
+ domid_t domid;
+ /* PCI INTx identification in PCI topology (domain:bus:device:intx). */
+ uint8_t domain, bus, device, intx;
+ /* Assertion level (0 = unasserted, 1 = asserted). */
+ uint8_t level;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_set_pci_intx_level);
+
+/* Set the logical level of one of a domain's ISA IRQ wires. */
+#define HVMOP_set_isa_irq_level 3
+struct xen_hvm_set_isa_irq_level {
+ /* Domain to be updated. */
+ domid_t domid;
+ /* ISA device identification, by ISA IRQ (0-15). */
+ uint8_t isa_irq;
+ /* Assertion level (0 = unasserted, 1 = asserted). */
+ uint8_t level;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_set_isa_irq_level);
+
+#define HVMOP_set_pci_link_route 4
+struct xen_hvm_set_pci_link_route {
+ /* Domain to be updated. */
+ domid_t domid;
+ /* PCI link identifier (0-3). */
+ uint8_t link;
+ /* ISA IRQ (1-15), or 0 (disable link). */
+ uint8_t isa_irq;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_set_pci_link_route);
+
+/* Flushes all VCPU TLBs: @arg must be NULL. */
+#define HVMOP_flush_tlbs 5
+
+#endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/include/xen/interface/hvm/params.h b/include/xen/interface/hvm/params.h
new file mode 100644
index 0000000..15d828f
--- /dev/null
+++ b/include/xen/interface/hvm/params.h
@@ -0,0 +1,111 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PUBLIC_HVM_PARAMS_H__
+#define __XEN_PUBLIC_HVM_PARAMS_H__
+
+#include "hvm_op.h"
+
+/*
+ * Parameter space for HVMOP_{set,get}_param.
+ */
+
+/*
+ * How should CPU0 event-channel notifications be delivered?
+ * val[63:56] == 0: val[55:0] is a delivery GSI (Global System Interrupt).
+ * val[63:56] == 1: val[55:0] is a delivery PCI INTx line, as follows:
+ * Domain = val[47:32], Bus = val[31:16],
+ * DevFn = val[15: 8], IntX = val[ 1: 0]
+ * If val == 0 then CPU0 event-channel notifications are not delivered.
+ */
+#define HVM_PARAM_CALLBACK_IRQ 0
+
+/*
+ * These are not used by Xen. They are here for convenience of HVM-guest
+ * xenbus implementations.
+ */
+#define HVM_PARAM_STORE_PFN 1
+#define HVM_PARAM_STORE_EVTCHN 2
+
+#define HVM_PARAM_PAE_ENABLED 4
+
+#define HVM_PARAM_IOREQ_PFN 5
+
+#define HVM_PARAM_BUFIOREQ_PFN 6
+
+#ifdef __ia64__
+
+#define HVM_PARAM_NVRAM_FD 7
+#define HVM_PARAM_VHPT_SIZE 8
+#define HVM_PARAM_BUFPIOREQ_PFN 9
+
+#elif defined(__i386__) || defined(__x86_64__)
+
+/* Expose Viridian interfaces to this HVM guest? */
+#define HVM_PARAM_VIRIDIAN 9
+
+#endif
+
+/*
+ * Set mode for virtual timers (currently x86 only):
+ * delay_for_missed_ticks (default):
+ * Do not advance a vcpu's time beyond the correct delivery time for
+ * interrupts that have been missed due to preemption. Deliver missed
+ * interrupts when the vcpu is rescheduled and advance the vcpu's virtual
+ * time stepwise for each one.
+ * no_delay_for_missed_ticks:
+ * As above, missed interrupts are delivered, but guest time always tracks
+ * wallclock (i.e., real) time while doing so.
+ * no_missed_ticks_pending:
+ * No missed interrupts are held pending. Instead, to ensure ticks are
+ * delivered at some non-zero rate, if we detect missed ticks then the
+ * internal tick alarm is not disabled if the VCPU is preempted during the
+ * next tick period.
+ * one_missed_tick_pending:
+ * Missed interrupts are collapsed together and delivered as one 'late tick'.
+ * Guest time always tracks wallclock (i.e., real) time.
+ */
+#define HVM_PARAM_TIMER_MODE 10
+#define HVMPTM_delay_for_missed_ticks 0
+#define HVMPTM_no_delay_for_missed_ticks 1
+#define HVMPTM_no_missed_ticks_pending 2
+#define HVMPTM_one_missed_tick_pending 3
+
+/* Boolean: Enable virtual HPET (high-precision event timer)? (x86-only) */
+#define HVM_PARAM_HPET_ENABLED 11
+
+/* Identity-map page directory used by Intel EPT when CR0.PG=0. */
+#define HVM_PARAM_IDENT_PT 12
+
+/* Device Model domain, defaults to 0. */
+#define HVM_PARAM_DM_DOMAIN 13
+
+/* ACPI S state: currently support S0 and S3 on x86. */
+#define HVM_PARAM_ACPI_S_STATE 14
+
+/* TSS used on Intel when CR0.PE=0. */
+#define HVM_PARAM_VM86_TSS 15
+
+/* Boolean: Enable aligning all periodic vpts to reduce interrupts */
+#define HVM_PARAM_VPT_ALIGN 16
+
+#define HVM_NR_PARAMS 17
+
+#endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
--
1.5.4.5

2010-02-08 08:06:10

by Sheng Yang

Subject: [PATCH 2/7] xen: Import cpuid.h from Xen

From: Keir Fraser <[email protected]>

This will be used by CPUID detection later.

Signed-off-by: Keir Fraser <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
---
arch/x86/include/asm/xen/cpuid.h | 68 ++++++++++++++++++++++++++++++++++++++
1 files changed, 68 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/include/asm/xen/cpuid.h

diff --git a/arch/x86/include/asm/xen/cpuid.h b/arch/x86/include/asm/xen/cpuid.h
new file mode 100644
index 0000000..8787f03
--- /dev/null
+++ b/arch/x86/include/asm/xen/cpuid.h
@@ -0,0 +1,68 @@
+/******************************************************************************
+ * arch/include/asm/xen/cpuid.h
+ *
+ * CPUID interface to Xen.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2007 Citrix Systems, Inc.
+ *
+ * Authors:
+ * Keir Fraser <[email protected]>
+ */
+
+#ifndef __ASM_X86_XEN_CPUID_H__
+#define __ASM_X86_XEN_CPUID_H__
+
+/* Xen identification leaves start at 0x40000000. */
+#define XEN_CPUID_FIRST_LEAF 0x40000000
+#define XEN_CPUID_LEAF(i) (XEN_CPUID_FIRST_LEAF + (i))
+
+/*
+ * Leaf 1 (0x40000000)
+ * EAX: Largest Xen-information leaf. All leaves up to an including @EAX
+ * are supported by the Xen host.
+ * EBX-EDX: "XenVMMXenVMM" signature, allowing positive identification
+ * of a Xen host.
+ */
+#define XEN_CPUID_SIGNATURE_EBX 0x566e6558 /* "XenV" */
+#define XEN_CPUID_SIGNATURE_ECX 0x65584d4d /* "MMXe" */
+#define XEN_CPUID_SIGNATURE_EDX 0x4d4d566e /* "nVMM" */
+
+/*
+ * Leaf 2 (0x40000001)
+ * EAX[31:16]: Xen major version.
+ * EAX[15: 0]: Xen minor version.
+ * EBX-EDX: Reserved (currently all zeroes).
+ */
+
+/*
+ * Leaf 3 (0x40000002)
+ * EAX: Number of hypercall transfer pages. This register is always guaranteed
+ * to specify one hypercall page.
+ * EBX: Base address of Xen-specific MSRs.
+ * ECX: Features 1. Unused bits are set to zero.
+ * EDX: Features 2. Unused bits are set to zero.
+ */
+
+/* Does the host support MMU_PT_UPDATE_PRESERVE_AD for this guest? */
+#define _XEN_CPUID_FEAT1_MMU_PT_UPDATE_PRESERVE_AD 0
+#define XEN_CPUID_FEAT1_MMU_PT_UPDATE_PRESERVE_AD (1u<<0)
+
+#endif /* __XEN_PUBLIC_ARCH_X86_CPUID_H__ */
--
1.5.4.5

2010-02-08 08:06:07

by Sheng Yang

Subject: [PATCH 4/7] xen: The entrance for PV featured HVM

xen_guest_init() sets up the environment.

Signed-off-by: Sheng Yang <[email protected]>
---
arch/x86/kernel/setup.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f7b8b98..1568a27 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -113,6 +113,10 @@
#endif
#include <asm/mce.h>

+#ifdef CONFIG_XEN
+#include <xen/xen.h>
+#endif
+
/*
* end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
* The direct mapping extends to max_pfn_mapped, so that we can directly access
@@ -732,6 +736,10 @@ void __init setup_arch(char **cmdline_p)

x86_init.oem.arch_setup();

+#ifdef CONFIG_XEN
+ xen_guest_init();
+#endif
+
setup_memory_map();
parse_setup_data();
/* update the e820_saved too */
--
1.5.4.5

2010-02-09 11:53:05

by Ian Campbell

Subject: Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Mon, 2010-02-08 at 08:05 +0000, Sheng Yang wrote:
> + if (xen_hvm_pv_evtchn_enabled()) {
> + if (enable_hvm_pv(HVM_PV_EVTCHN))
> + return -EINVAL;
> +[...]
> + callback_via = HVM_CALLBACK_VECTOR(X86_PLATFORM_IPI_VECTOR);
> + set_callback_via(callback_via);
> +
> + x86_platform_ipi_callback = do_hvm_pv_evtchn_intr;

Why this indirection via X86_PLATFORM_IPI_VECTOR?

Apart from that why not use CALLBACKOP_register subop CALLBACKTYPE_event
pointing to xen_hypervisor_callback the same as a full PV guest?

This would remove all the evtchn related code from HVMOP_enable_pv which
I think should be eventually unnecessary as an independent hypercall
since all HVM guests should simply be PV capable by default -- the
hypervisor only needs to track if the guest has made use of specific PV
functionality, not the umbrella "is PV" state.

Ian.

2010-02-09 12:46:49

by Sheng Yang

Subject: Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Tuesday 09 February 2010 19:52:56 Ian Campbell wrote:
> On Mon, 2010-02-08 at 08:05 +0000, Sheng Yang wrote:
> > + if (xen_hvm_pv_evtchn_enabled()) {
> > + if (enable_hvm_pv(HVM_PV_EVTCHN))
> > + return -EINVAL;
> > +[...]
> > + callback_via =
> > HVM_CALLBACK_VECTOR(X86_PLATFORM_IPI_VECTOR); +
> > set_callback_via(callback_via);
> > +
> > + x86_platform_ipi_callback = do_hvm_pv_evtchn_intr;
>
> Why this indirection via X86_PLATFORM_IPI_VECTOR?
>
> Apart from that why not use CALLBACKOP_register subop CALLBACKTYPE_event
> pointing to xen_hypervisor_callback the same as a full PV guest?
>
> This would remove all the evtchn related code from HVMOP_enable_pv which
> I think should be eventually unnecessary as an independent hypercall
> since all HVM guests should simply be PV capable by default -- the
> hypervisor only needs to track if the guest has made use of specific PV
> functionality, not the umbrella "is PV" state.

The reason is that the bounce frame mechanism a PV guest uses to have an event
injected is too complex here... Basically you need to set up a stack just as
hardware would, and return to a particular guest CS:IP to handle it. And you
need to take care of every case, e.g. whether the guest is in ring 0 or ring
3, whether it is in interrupt context or not, recursion of the handler, and so
on. Hardware handles all of these cases elegantly; you just need to inject a
vector through the hardware-provided method, which is much easier and more
elegant. Taking advantage of the hardware is still part of our target. :)

And even with CALLBACKOP_register, I think a change in the hypervisor is still
needed. I think the updated approach is close to your idea, and I totally
agree that functionality-based enabling is better than a big umbrella. As you
can see, the generic enabling has been discarded, and enabling is now done per
feature, one at a time (though there is only one feature now...). The message
conveyed by the evtchn-enabling HVM hypercall is that the guest won't use the
IOAPIC/LAPIC any more; it will purely use event channels, so the hypervisor
does indeed need changes to keep servicing the guest. The umbrella "is PV"
state isn't used any more (in fact I still keep a flag there because I was
wondering whether I could optimize PV halt if the hypervisor were aware of it,
but maybe it's too early to consider that now...). The only three places in
the Xen patch that mention hvm_pv are for evtchn. So the meaning here is: an
HVM guest can be PV featured, but sometimes (e.g. for evtchn) the hypervisor
needs to be aware of the guest's state, so we have the check for evtchn, and
we also clear the hardware TSC offset in preparation for the PV timer that
comes along with evtchn. That is something we have to do. We don't have a
similar thing for halt now, because the hypervisor doesn't need to know about
it. We only let the hypervisor know when it's necessary.

I will remove the XEN_HVM_PV_ENABLED flag from the Xen patchset if it reminds
you of the umbrella of "HVM PV" features.

--
regards
Yang, Sheng

2010-02-09 13:03:59

by Stefano Stabellini

Subject: Re: [Xen-devel] [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Mon, 8 Feb 2010, Sheng Yang wrote:
> We mapped each IOAPIC pin to a VIRQ, so that we can deliver interrupt through
> these VIRQs.
>
> We used X86_PLATFORM_IPI_VECTOR as the noficiation vector for hypervisor
> to notify guest about the event.
>
> The Xen PV timer is used to provide guest a reliable timer.
>
> The patch also enabled SMP support, then we can support IPI through evtchn as well.
>
> Then we don't need IOAPIC/LAPIC...
>

First of all I want to say that this series looks much better than the
previous one.

However I think there might still be some room for improvement:
wouldn't it make more sense to map vectors to event channels instead of
IOAPIC pins? This way it could work for MSI and passthrough devices
too. Also it would make sense to have a per-vector delivery option in
vlapic.c in Xen: vlapic_set_irq should probably be the function that
decides how to inject the vector into the guest, either using the
classic emulated method or an event channel.
The decision should come from a matrix, so that the guest can decide to
enable event channels on a per-vector basis instead of having a single
global switch.
It seems to me that doing an INTx-to-event-channel translation in
hvm/irq.c is not the right thing in the right place.

2010-02-09 14:02:11

by Ian Campbell

Subject: Re: [Xen-devel] Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Tue, 2010-02-09 at 12:46 +0000, Sheng Yang wrote:
> On Tuesday 09 February 2010 19:52:56 Ian Campbell wrote:
> > On Mon, 2010-02-08 at 08:05 +0000, Sheng Yang wrote:
> > > + if (xen_hvm_pv_evtchn_enabled()) {
> > > + if (enable_hvm_pv(HVM_PV_EVTCHN))
> > > + return -EINVAL;
> > > +[...]
> > > + callback_via =
> > > HVM_CALLBACK_VECTOR(X86_PLATFORM_IPI_VECTOR); +
> > > set_callback_via(callback_via);
> > > +
> > > + x86_platform_ipi_callback = do_hvm_pv_evtchn_intr;
> >
> > Why this indirection via X86_PLATFORM_IPI_VECTOR?
> >
> > Apart from that why not use CALLBACKOP_register subop CALLBACKTYPE_event
> > pointing to xen_hypervisor_callback the same as a full PV guest?
> >
> > This would remove all the evtchn related code from HVMOP_enable_pv which
> > I think should be eventually unnecessary as an independent hypercall
> > since all HVM guests should simply be PV capable by default -- the
> > hypervisor only needs to track if the guest has made use of specific PV
> > functionality, not the umbrella "is PV" state.
>
> The reason is the bounce frame buffer implemented by PV guest to inject a
> event is too complex here... Basically you need to setup a stack like hardware
> would do, and return to the certain guest CS:IP to handle this. And you need
> to take care of every case, e.g. guest in the ring0 or ring3, guest in the
> interrupt context or not, and the recursion of the handler, and so on.

The code for all this already exists on both the hypervisor and guest
side in order to support PV guests, would it not just be a case of
wiring it up for this case as well?

> Hardware can easily handle all these elegantly, you just need to inject a
> vector through hardware provided method. That's much easily and elegant. Take
> the advantage of hardware is still a part of our target. :)

I thought one of the points of this patchset was that there was overhead
associated with the hardware event injection mechanisms which you wanted
to avoid?

As it stands what you appear to be implementing does not seem to vary
greatly from the existing PVonHVM PCI IRQ associated with the virtual
PCI device.

> And even with CALLBACKOP_register, I think the change in hypervisor is needed.
> And I think the updated approach is near to your idea, and I am totally agree
> that a functionality based enabling is better than a big umbrella. Now you can
> see, a generic enabling is discard, and current the enabling is in feature
> branch enabling, one at a time(though there is only one now...). The message
> for the evtchn enabling of HVM hypercall transfered is, the guest won't use
> IOAPIC/LAPIC now, it would purely use evtchn; so hypervisor indeed need change
> to continue to service the guest.

There have been objections from several people to this mutually
exclusive *APIC or evtchn approach. I understand that your immediate aim
is to move everything to evtchn and that this is the easiest path to
that goal but you are then tying the hypervisor into supporting the
least flexible possible interface forever. Instead lets try and define
an interface which is flexible enough that we think it can be supported
for the long term which can also be used to meet your immediate aims.
(IOW if the guest wants to request evtchn injection for every individual
interrupt then, fine, it may do so, but if it doesn't want to do that
then the hypervisor should not force it).

If you make the distinction between evtchn and *APIC interrupts in the
LAPIC at the vector level as Stefano suggests doesn't the more flexible
interface naturally present itself? Plus you get MSI and passthrough
support as well.

Ian.

2010-02-09 17:17:49

by Sheng Yang

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Tuesday 09 February 2010 22:02:04 Ian Campbell wrote:
> On Tue, 2010-02-09 at 12:46 +0000, Sheng Yang wrote:
> > On Tuesday 09 February 2010 19:52:56 Ian Campbell wrote:
> > > On Mon, 2010-02-08 at 08:05 +0000, Sheng Yang wrote:
> > > > + if (xen_hvm_pv_evtchn_enabled()) {
> > > > + if (enable_hvm_pv(HVM_PV_EVTCHN))
> > > > + return -EINVAL;
> > > > +[...]
> > > > + callback_via =
> > > > HVM_CALLBACK_VECTOR(X86_PLATFORM_IPI_VECTOR); +
> > > > set_callback_via(callback_via);
> > > > +
> > > > + x86_platform_ipi_callback = do_hvm_pv_evtchn_intr;
> > >
> > > Why this indirection via X86_PLATFORM_IPI_VECTOR?
> > >
> > > Apart from that why not use CALLBACKOP_register subop
> > > CALLBACKTYPE_event pointing to xen_hypervisor_callback the same as a
> > > full PV guest?
> > >
> > > This would remove all the evtchn related code from HVMOP_enable_pv
> > > which I think should be eventually unnecessary as an independent
> > > hypercall since all HVM guests should simply be PV capable by default
> > > -- the hypervisor only needs to track if the guest has made use of
> > > specific PV functionality, not the umbrella "is PV" state.
> >
> > The reason is the bounce frame buffer implemented by PV guest to inject a
> > event is too complex here... Basically you need to setup a stack like
> > hardware would do, and return to the certain guest CS:IP to handle this.
> > And you need to take care of every case, e.g. guest in the ring0 or
> > ring3, guest in the interrupt context or not, and the recursion of the
> > handler, and so on.
>
> The code for all this already exists on both the hypervisor and guest
> side in order to support PV guests, would it not just be a case of
> wiring it up for this case as well?
>
> > Hardware can easily handle all these elegantly, you just need to inject a
> > vector through hardware provided method. That's much easily and elegant.
> > Take the advantage of hardware is still a part of our target. :)
>
> I thought one of the points of this patchset was that there was overhead
> associated with the hardware event injection mechanisms which you wanted
> to avoid?

No, I am not sure which overhead you mean here; I probably have not
expressed myself well. The overhead we want to eliminate is the unnecessary
VMExits caused by the *APIC, e.g. on EOI. PV callback injection would not
help with that, I think.
>
> As it stands what you appear to be implementing does not seem to vary
> greatly from the existing PVonHVM PCI IRQ associated with the virtual
> PCI device.

PV drivers are not our target. Our target is interrupt-intensive assigned
devices with MSI/MSI-X, and we plan to share the solution with pv_ops dom0.

This is also different from the PVonHVM PCI IRQ injection, because acking
the IRQ does not cause a VMExit; that is exactly the overhead we are
targeting.
>
> > And even with CALLBACKOP_register, I think the change in hypervisor is
> > needed. And I think the updated approach is near to your idea, and I am
> > totally agree that a functionality based enabling is better than a big
> > umbrella. Now you can see, a generic enabling is discard, and current the
> > enabling is in feature branch enabling, one at a time(though there is
> > only one now...). The message for the evtchn enabling of HVM hypercall
> > transfered is, the guest won't use IOAPIC/LAPIC now, it would purely use
> > evtchn; so hypervisor indeed need change to continue to service the
> > guest.
>
> There have been objections from several people to this mutually
> exclusive *APIC or evtchn approach. I understand that your immediate aim
> is to move everything to evtchn and that this is the easiest path to
> that goal but you are then tying the hypervisor into supporting the
> least flexible possible interface forever. Instead lets try and define
> an interface which is flexible enough that we think it can be supported
> for the long term which can also be used to meet your immediate aims.
> (IOW if the guest wants to request evtchn injection for every individual
> interrupt then, fine, it may do so, but if it doesn't want to do that
> then the hypervisor should not force it).
>
> If you make the distinction between evtchn and *APIC interrupts in the
> LAPIC at the vector level as Stefano suggests doesn't the more flexible
> interface naturally present itself? Plus you get MSI and passthrough
> support as well.

Thanks Stefano, I had not considered this before...

But for the evtchn/vector mapping, I think there is still a problem in this
case.

To support MSI natively, the LAPIC is a must, because the IA32 MSI
message/address encode not only the vector number but also information such
as the LAPIC delivery mode and destination mode. So if we want to support
MSI natively, we need the LAPIC. But discarding the LAPIC is the point of
this patchset, because of its unnecessary VMExits; we would replace it with
event channels.

And still, the target of this patch is to eliminate the overhead of
interrupt handling. The overhead we are after is the *APIC, because it
causes unnecessary VMExits on current hardware (e.g. on EOI). So we
introduced the event channel, because it is a mature shared-memory based
event delivery mechanism with minimal overhead. We replace the *APIC with a
dynamic IRQ chip, which is more efficient and causes no more unnecessary
VMExits. Because event channels are enabled, PV drivers are supported
seamlessly - though, as you know, that can also be done with the platform
PCI driver. The main goal of this patch is to benefit interrupt-intensive
assigned devices, and we would only support MSI/MSI-X devices (if you don't
mind a couple more lines of code in Xen, we could also get assigned-device
support now with MSI-to-INTx translation, but I think that is a little
hacky). We are working on event channel support for MSI/MSI-X devices; we
already have workable patches, but we want a solution that covers both PV
featured HVM and pv_ops dom0, so we are still proposing an approach that
upstream Linux can accept.
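
To make the wiring concrete, here is a rough guest-side sketch of what the
quoted hunk does. It assumes set_callback_via() boils down to the usual
HVMOP_set_param on HVM_PARAM_CALLBACK_IRQ and that the series provides a
HYPERVISOR_hvm_op wrapper; HVM_CALLBACK_VECTOR() is the patch's own encoding,
so treat this as a sketch rather than the exact code:

#include <xen/interface/hvm/hvm_op.h>
#include <xen/interface/hvm/params.h>
#include <asm/xen/hypercall.h>

/* Sketch only: ask Xen to deliver all event channel notifications
 * through one IDT vector instead of an emulated interrupt line. */
static int xen_set_evtchn_callback_vector(unsigned int vector)
{
	struct xen_hvm_param a = {
		.domid = DOMID_SELF,
		.index = HVM_PARAM_CALLBACK_IRQ,
		.value = HVM_CALLBACK_VECTOR(vector),	/* "deliver via vector" */
	};

	return HYPERVISOR_hvm_op(HVMOP_set_param, &a);
}

The handler behind that vector (do_hvm_pv_evtchn_intr in the posted patch)
then just walks the shared-info pending bitmap, much as the PV upcall does.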

In fact, I don't think the guest evtchn code was written with coexistence
with other interrupt delivery mechanisms in mind. Much of the code is
exclusive and self-contained, so using it exclusively seems a natural way to
keep things simple (and, sure, it is the easy way as well). Making evtchn
coexist with the *APIC would probably require touching some generic code.
At the same time, the MSI/MSI-X benefit is a must for us, which means no
LAPIC...

And I still have a question about "flexibility": how much do we gain if
evtchn can coexist with the *APIC? All I can think of is some level
triggered interrupts, like USB, but those are rare and not very relevant
when we target servers. For that case I think PVonHVM fits the job
better...

--
regards
Yang, Sheng

2010-02-09 17:58:52

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Tue, 9 Feb 2010, Sheng Yang wrote:
> Thanks Stefano, I haven't consider this before...
>
> But for evtchn/vector mapping, I think there is still problem existed for this
> case.
>
> For natively support MSI, LAPIC is a must. Because IA32 MSI msg/addr not
> only contained the vector number, but also contained information like LAPIC
> delivery mode/destination mode etc. If we want to natively support MSI, we
> need LAPIC. But discard LAPIC is the target of this patchset, due to it's
> unnecessary VMExit; and we would replace it with evtchn.
>
> And still, the target of this patch is: we want to eliminate the overhead of
> interrupt handling. Especially, our target overhead is *APIC, because they
> would cause unnecessary VMExit in the current hardware(e.g. EOI). Then we
> introduced the evtchn, because it's a mature shared memory based event
> delivery mechanism, with the minimal overhead. We replace the *APIC with
> dynamic IRQ chip which is more efficient, and no more unnecessary VMExit.
> Because we enabled evtchn, so we can support PV driver seamless - but you know
> this can also be done by platform PCI driver. The main target of this patch is
> to benefit interrupt intensive assigned devices. And we would only support
> MSI/MSI-X devices(if you don't mind two lines more code in Xen, we can also
> get assigned device support now, with MSI2INTx translation, but I think it's a
> little hacky). We are working on evtchn support on MSI/MSI-X devices; we
> already have workable patches, but we want to get a solution for both PV
> featured HVM and pv_ops dom0, so we are still purposing an approach that
> upstream Linux can accept.
>
> In fact, I don't think guest evtchn code was written with coexisted with other
> interrupt delivery mechanism in mind. Many codes is exclusive, self-
> maintained. So use it exclusive seems a good idea to keep it simple and nature
> to me(sure, the easy way as well). I think it's maybe necessary to touch some
> generic code if making evtchn coexist with *APIC. At the same time, MSI/MSI-X
> benefit is a must for us, which means no LAPIC...

First you say that for MSI to work LAPIC is a must, but then you say
that for performance gains you want to avoid LAPIC altogether.
Which one is the correct one?

If you want to avoid LAPIC then my suggestion of mapping vectors into
event channels is still a good one (if it is actually possible to do
without touching generic kernel code, but to be sure it needs to be
tried).

Regarding making event channels coexist with *APIC, my suggestion is
actually more similar to what you have already done than you think:
instead of a global switch just use a per-device (actually
per-vector) switch.
The principal difference would be that in xen instead of having all the
assert_irq related changes and a global if( is_hvm_pv_evtchn_domain(d) ),
your changes would be limited to vlapic.c and you would check that the
guest enabled event channels as a delivery mechanism for that particular
vector, like if ( delivery_mode(vlapic, vec) == EVENT_CHANNEL ).
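
To illustrate the idea (purely hypothetical code, not from any posted patch:
delivery_mode(), EVENT_CHANNEL and vector_to_port() are made-up names, and
the existing Xen helpers are used loosely), the point where vlapic.c would
normally latch the vector would first check whether the guest asked for
event channel delivery of that particular vector:

static void deliver_guest_vector(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
{
    struct vcpu *v = vlapic_vcpu(vlapic);

    if ( delivery_mode(vlapic, vec) == EVENT_CHANNEL )
    {
        /* No emulated IRR/ISR/EOI round trip: just mark the bound
         * event channel pending and kick the vcpu. */
        evtchn_set_pending(v, vector_to_port(vlapic, vec));
        return;
    }

    /* Otherwise fall back to the normal emulated LAPIC injection. */
    vlapic_set_irq(vlapic, vec, trig);
}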

> And I still have question on "flexibility": how much we can benefit if evtchn
> can coexist with *APIC? What I can think of is some level triggered
> interrupts, like USB, but they are rare and not useful when we targeting
> servers. Well, in this case I think PVonHVM could fit the job better...

it is not only about flexibility, but also about code changes in
delicate code paths and designing a system that can work with pci
passthrough and MSI too.
You said that you are working on patches to make MSI devices work: maybe
seeing a working implementation of that would convince us about which one
is the correct approach.

2010-02-10 03:16:25

by Nakajima, Jun

[permalink] [raw]
Subject: RE: [Xen-devel] Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

Ian Campbell wrote on Tue, 9 Feb 2010 at 06:02:04:

> On Tue, 2010-02-09 at 12:46 +0000, Sheng Yang wrote:
>> On Tuesday 09 February 2010 19:52:56 Ian Campbell wrote:
>>> On Mon, 2010-02-08 at 08:05 +0000, Sheng Yang wrote:
>>>> + if (xen_hvm_pv_evtchn_enabled()) {
>>>> + if (enable_hvm_pv(HVM_PV_EVTCHN))
>>>> + return -EINVAL;
>>>> +[...]
>>>> + callback_via =
>>>> HVM_CALLBACK_VECTOR(X86_PLATFORM_IPI_VECTOR); +
>>>> set_callback_via(callback_via);
>>>> +
>>>> + x86_platform_ipi_callback =
> do_hvm_pv_evtchn_intr;
>>>
>>> Why this indirection via X86_PLATFORM_IPI_VECTOR?
>>>
>>> Apart from that why not use CALLBACKOP_register subop
>>> CALLBACKTYPE_event pointing to xen_hypervisor_callback the same as a
>>> full PV guest?
>>>
>>> This would remove all the evtchn related code from HVMOP_enable_pv
>>> which I think should be eventually unnecessary as an independent
>>> hypercall since all HVM guests should simply be PV capable by default
>>> -- the hypervisor only needs to track if the guest has made use of
>>> specific PV functionality, not the umbrella "is PV" state.
>> The reason is the bounce frame buffer implemented by PV guest to
>> inject a event is too complex here... Basically you need to setup a
>> stack like hardware would do, and return to the certain guest CS:IP to
>> handle this. And you need to take care of every case, e.g. guest in the
>> ring0 or ring3, guest in the interrupt context or not, and the
>> recursion of the handler, and so on.
>
> The code for all this already exists on both the hypervisor and guest
> side in order to support PV guests, would it not just be a case of
> wiring it up for this case as well?

The code is not so useful for HVM guests. The current PV code uses the ring
transition, which maintains the processor state on the stack, to switch
between the hypervisor and the guest, but HVM VM entry/exit does not use the
stack at all. To implement an asynchronous event, i.e. a callback handler
for HVM, the simplest (and most reliable) way is to use the architectural
event mechanism (i.e. IDT-based). Otherwise we would need to modify various
VMCS/VMCB fields (e.g. selectors, segments, stacks, etc.) depending on where
the last VM exit happened, using OS-specific knowledge.

Having said that, the interface and implementation are different. I think we can use the same/similar code that registers the callback handler, by hiding such HVM-specific code from the common code path.

>
>> Hardware can easily handle all these elegantly, you just need to inject
>> a vector through hardware provided method. That's much easily and
>> elegant. Take the advantage of hardware is still a part of our target.
>> :)
> I thought one of the points of this patchset was that there was
> overhead associated with the hardware event injection mechanisms which
> you wanted to avoid?

We need to execute VM entry anyway to call back a handler in the guest kernel. Bypassing IDT vectoring does not help.

Jun
___
Intel Open Source Technology Center




2010-02-10 10:20:26

by Ian Campbell

[permalink] [raw]
Subject: RE: [Xen-devel] Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Wed, 2010-02-10 at 03:16 +0000, Nakajima, Jun wrote:
> Ian Campbell wrote on Tue, 9 Feb 2010 at 06:02:04:
>
> > On Tue, 2010-02-09 at 12:46 +0000, Sheng Yang wrote:
> >> On Tuesday 09 February 2010 19:52:56 Ian Campbell wrote:
> >>> On Mon, 2010-02-08 at 08:05 +0000, Sheng Yang wrote:
> >>>> + if (xen_hvm_pv_evtchn_enabled()) {
> >>>> + if (enable_hvm_pv(HVM_PV_EVTCHN))
> >>>> + return -EINVAL;
> >>>> +[...]
> >>>> + callback_via =
> >>>> HVM_CALLBACK_VECTOR(X86_PLATFORM_IPI_VECTOR); +
> >>>> set_callback_via(callback_via);
> >>>> +
> >>>> + x86_platform_ipi_callback =
> > do_hvm_pv_evtchn_intr;
> >>>
> >>> Why this indirection via X86_PLATFORM_IPI_VECTOR?
> >>>
> >>> Apart from that why not use CALLBACKOP_register subop
> >>> CALLBACKTYPE_event pointing to xen_hypervisor_callback the same as a
> >>> full PV guest?
> >>>
> >>> This would remove all the evtchn related code from HVMOP_enable_pv
> >>> which I think should be eventually unnecessary as an independent
> >>> hypercall since all HVM guests should simply be PV capable by default
> >>> -- the hypervisor only needs to track if the guest has made use of
> >>> specific PV functionality, not the umbrella "is PV" state.
> >> The reason is the bounce frame buffer implemented by PV guest to
> >> inject a event is too complex here... Basically you need to setup a
> >> stack like hardware would do, and return to the certain guest CS:IP to
> >> handle this. And you need to take care of every case, e.g. guest in the
> >> ring0 or ring3, guest in the interrupt context or not, and the
> >> recursion of the handler, and so on.
> >
> > The code for all this already exists on both the hypervisor and guest
> > side in order to support PV guests, would it not just be a case of
> > wiring it up for this case as well?
>
> The code is not so useful for HVM guests. The current PV code uses the
> ring transition which maintains the processor state in the stack, to
> switch between the hypervisor and the guest, but HVM VM entry/exit
> does not use the stack at all. To implement an asynchronous event,
> i.e. callback handler for HVM, the simplest (and reliable) way is to
> use the architectural event (i.e. IDT-based). Otherwise, we need to
> modify various VMCS/VMCB fields (e.g. selectors, segments, stacks,
> etc.) depending on where the last VM happened using the OS-specific
> knowledge.

RIP and RSP are taken from the stack just prior to vmentry (by the code
in vmx/entry.S), but you are right that CS/SS etc. are not handled that
way, which would make things more complicated; probably not worth it.

> Having said that, the interface and implementation are different. I
> think we can use the same/similar code that registers the callback
> handler, by hiding such HVM-specific code from the common code path.

Yes, I think that would be an improvement.

Even better would be if we could use the same entry point into the
kernel in both PV and HVM cases, with only the injection method on the
hypervisor side differing. AFAIK xen_hypervisor_callback expects a stack
frame very like a hardware exception so perhaps this works out? IOW can
we reference xen_hypervisor_callback directly in the IDT?

BTW I am not sure we should be repurposing x86_platform_ipi in this way
(maybe this goes away with the above changes); I think it should be fine
to simply pick another free vector < 255 (perhaps even dynamically).
There were objections on LKML to a patch which did a similar thing last
month (thread: Add "handle page fault" PV helper).
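
As a sketch of what that could look like (all names here are assumptions,
not code from this series: XEN_HVM_CALLBACK_VECTOR is an arbitrarily chosen
free vector and xen_hvm_callback_vector is a hypothetical asm stub that
builds a pt_regs frame and calls the evtchn upcall, much as the PV
xen_hypervisor_callback entry does):

#include <linux/init.h>
#include <asm/desc.h>

#define XEN_HVM_CALLBACK_VECTOR	0xf3	/* assumption: any currently free vector */

void xen_hvm_callback_vector(void);	/* hypothetical asm entry stub */

static void __init xen_setup_callback_vector(void)
{
	/* Point the chosen IDT vector at the upcall entry... */
	alloc_intr_gate(XEN_HVM_CALLBACK_VECTOR, xen_hvm_callback_vector);
	/* ...and tell Xen to inject event notifications through it
	 * (see the HVM_PARAM_CALLBACK_IRQ sketch earlier in the thread). */
	xen_set_evtchn_callback_vector(XEN_HVM_CALLBACK_VECTOR);
}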

Ian.

2010-02-11 09:59:14

by Sheng Yang

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Wednesday 10 February 2010 18:20:20 Ian Campbell wrote:
> On Wed, 2010-02-10 at 03:16 +0000, Nakajima, Jun wrote:
> > Ian Campbell wrote on Tue, 9 Feb 2010 at 06:02:04:
> > > On Tue, 2010-02-09 at 12:46 +0000, Sheng Yang wrote:
> > >> On Tuesday 09 February 2010 19:52:56 Ian Campbell wrote:
> > >>> On Mon, 2010-02-08 at 08:05 +0000, Sheng Yang wrote:
> > >>>> + if (xen_hvm_pv_evtchn_enabled()) {
> > >>>> + if (enable_hvm_pv(HVM_PV_EVTCHN))
> > >>>> + return -EINVAL;
> > >>>> +[...]
> > >>>> + callback_via =
> > >>>> HVM_CALLBACK_VECTOR(X86_PLATFORM_IPI_VECTOR); +
> > >>>> set_callback_via(callback_via);
> > >>>> +
> > >>>> + x86_platform_ipi_callback =
> > >
> > > do_hvm_pv_evtchn_intr;
> > >
> > >>> Why this indirection via X86_PLATFORM_IPI_VECTOR?
> > >>>
> > >>> Apart from that why not use CALLBACKOP_register subop
> > >>> CALLBACKTYPE_event pointing to xen_hypervisor_callback the same as a
> > >>> full PV guest?
> > >>>
> > >>> This would remove all the evtchn related code from HVMOP_enable_pv
> > >>> which I think should be eventually unnecessary as an independent
> > >>> hypercall since all HVM guests should simply be PV capable by default
> > >>> -- the hypervisor only needs to track if the guest has made use of
> > >>> specific PV functionality, not the umbrella "is PV" state.
> > >>
> > >> The reason is the bounce frame buffer implemented by PV guest to
> > >> inject a event is too complex here... Basically you need to setup a
> > >> stack like hardware would do, and return to the certain guest CS:IP to
> > >> handle this. And you need to take care of every case, e.g. guest in
> > >> the ring0 or ring3, guest in the interrupt context or not, and the
> > >> recursion of the handler, and so on.
> > >
> > > The code for all this already exists on both the hypervisor and guest
> > > side in order to support PV guests, would it not just be a case of
> > > wiring it up for this case as well?
> >
> > The code is not so useful for HVM guests. The current PV code uses the
> > ring transition which maintains the processor state in the stack, to
> > switch between the hypervisor and the guest, but HVM VM entry/exit
> > does not use the stack at all. To implement an asynchronous event,
> > i.e. callback handler for HVM, the simplest (and reliable) way is to
> > use the architectural event (i.e. IDT-based). Otherwise, we need to
> > modify various VMCS/VMCB fields (e.g. selectors, segments, stacks,
> > etc.) depending on where the last VM happened using the OS-specific
> > knowledge.
>
> RIP and RSP are taken from the stack just prior to vmentry (by the code
> in vmx/entry.S) but you are right that CS/SS etc are not handled in this
> way which would be make things more complicated, probably not worth it.
>
> > Having said that, the interface and implementation are different. I
> > think we can use the same/similar code that registers the callback
> > handler, by hiding such HVM-specific code from the common code path.
>
> Yes, I think that would be an improvement.
>
> Even better would be if we could use the same entry point into the
> kernel in both PV and HVM cases, with only the injection method on the
> hypervisor side differing. AFAIK xen_hypervisor_callback expects a stack
> frame very like a hardware exception so perhaps this works out? IOW can
> we reference xen_hypervisor_callback directly in the IDT?
>
> BTW I not sure we should be repurposing x86_platform_ipi in this way
> (maybe this goes away with the above changes), I think it should be fine
> to simply pick another free vector < 255 (perhaps even dynamically)?
> There were objections on LKML to a patch which did a similar thing last
> month (thread: Add "handle page fault" PV helper).

Of course this would be the best solution. I just thought a brand-new
vector would be harder to get accepted, so I chose a "generic" vector that
is never used in a Xen guest. This can be improved later, since only the
vector number needs to change, if upstream accepts the modification.

--
regards
Yang, Sheng

2010-02-11 09:59:08

by Sheng Yang

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Wednesday 10 February 2010 02:01:30 Stefano Stabellini wrote:
> On Tue, 9 Feb 2010, Sheng Yang wrote:
> > Thanks Stefano, I haven't consider this before...
> >
> > But for evtchn/vector mapping, I think there is still problem existed for
> > this case.
> >
> > For natively support MSI, LAPIC is a must. Because IA32 MSI msg/addr not
> > only contained the vector number, but also contained information like
> > LAPIC delivery mode/destination mode etc. If we want to natively support
> > MSI, we need LAPIC. But discard LAPIC is the target of this patchset, due
> > to it's unnecessary VMExit; and we would replace it with evtchn.
> >
> > And still, the target of this patch is: we want to eliminate the overhead
> > of interrupt handling. Especially, our target overhead is *APIC, because
> > they would cause unnecessary VMExit in the current hardware(e.g. EOI).
> > Then we introduced the evtchn, because it's a mature shared memory based
> > event delivery mechanism, with the minimal overhead. We replace the *APIC
> > with dynamic IRQ chip which is more efficient, and no more unnecessary
> > VMExit. Because we enabled evtchn, so we can support PV driver seamless -
> > but you know this can also be done by platform PCI driver. The main
> > target of this patch is to benefit interrupt intensive assigned devices.
> > And we would only support MSI/MSI-X devices(if you don't mind two lines
> > more code in Xen, we can also get assigned device support now, with
> > MSI2INTx translation, but I think it's a little hacky). We are working on
> > evtchn support on MSI/MSI-X devices; we already have workable patches,
> > but we want to get a solution for both PV featured HVM and pv_ops dom0,
> > so we are still purposing an approach that upstream Linux can accept.
> >
> > In fact, I don't think guest evtchn code was written with coexisted with
> > other interrupt delivery mechanism in mind. Many codes is exclusive,
> > self- maintained. So use it exclusive seems a good idea to keep it simple
> > and nature to me(sure, the easy way as well). I think it's maybe
> > necessary to touch some generic code if making evtchn coexist with *APIC.
> > At the same time, MSI/MSI-X benefit is a must for us, which means no
> > LAPIC...
>
> First you say that for MSI to work LAPIC is a must, but then you say
> that for performance gains you want to avoid LAPIC altogether.
> Which one is the correct one?

What I mean is: if you want to support MSI without modifying generic
kernel code (i.e. natively), the LAPIC is a must. But we want to avoid the
LAPIC to get the performance gain, so we would have to modify generic kernel
code and discard the LAPIC. The same approach applies to pv_ops dom0,
because it uses evtchn as well.
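
For reference, this is just the architectural MSI format (roughly as Linux
spells it out in arch/x86/include/asm/msidef.h, nothing specific to this
patchset): the address/data pair programmed into a device's MSI capability
encodes LAPIC routing information, not only a vector:

#define MSI_ADDR_BASE_LO	0xfee00000u		/* fixed LAPIC address window */
#define MSI_ADDR_DEST_ID(id)	(((id) & 0xff) << 12)	/* target APIC ID */
#define MSI_ADDR_DEST_MODE_LOG	(1u << 2)		/* logical vs. physical */
#define MSI_ADDR_REDIR_LOWPRI	(1u << 3)		/* redirection hint */

#define MSI_DATA_VECTOR(v)	((v) & 0xff)
#define MSI_DATA_DELIVERY(m)	(((m) & 0x7) << 8)	/* fixed, lowest-prio, ... */
#define MSI_DATA_LEVEL_ASSERT	(1u << 14)
#define MSI_DATA_TRIGGER_LEVEL	(1u << 15)

Everything except the vector field is interpreted by the (v)LAPIC, which is
why a purely "native" MSI path cannot simply skip it.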

> If you want to avoid LAPIC then my suggestion of mapping vectors into
> event channels is still a good one (if it is actually possible to do
> without touching generic kernel code, but to be sure it needs to be
> tried).

Could you elaborate on your idea? As far as I understand it, it does not
seem to improve the situation; I have explained part of the reason above.

> Regarding making event channels coexist with *APIC, my suggestion is
> actually more similar to what you have already done than you think:
> instead of a global switch just use a per-device (actually
> per-vector) switch.
>
> The principal difference would be that in xen instead of having all the
> assert_irq related changes and a global if( is_hvm_pv_evtchn_domain(d) ),
> your changes would be limited to vlapic.c and you would check that the
> guest enabled event channels as a delivery mechanism for that particular
> vector, like if ( delivery_mode(vlapic, vec) == EVENT_CHANNEL ).

A per-vector/evtchn switch can be done within the current framework; the
virq_to_evtchn[] array already provides it, because each entry is enabled by
a bind_virq hypercall. I can update the code in that direction, the semantic
argument sounds good. But I still have reservations about changing the
delivery mechanism of each individual vector; I would rather stick to the
typical usage mode and keep it simple.

I also don't think vlapic.c is a good place for this. The interrupt delivery
decision should live at a higher level; putting it in the vlapic implies we
are still tied to the *APIC mechanism, which is not the case.
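
For reference, the per-VIRQ binding mentioned above is the existing
guest-side pattern (cf. bind_virq_to_irq() in drivers/xen/events.c); a
minimal sketch:

#include <xen/interface/event_channel.h>
#include <asm/xen/hypercall.h>

/* Bind a virtual IRQ to a fresh event channel on the given vcpu and
 * return the port; the caller then hooks the port into the IRQ layer. */
static int bind_virq_port(unsigned int virq, unsigned int cpu)
{
	struct evtchn_bind_virq bind = {
		.virq = virq,
		.vcpu = cpu,
	};

	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq, &bind) != 0)
		return -1;

	return bind.port;
}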

> > And I still have question on "flexibility": how much we can benefit if
> > evtchn can coexist with *APIC? What I can think of is some level
> > triggered interrupts, like USB, but they are rare and not useful when we
> > targeting servers. Well, in this case I think PVonHVM could fit the job
> > better...
>
> it is not only about flexibility, but also about code changes in
> delicate code paths and designing a system that can work with pci
> passthrough and MSI too.

MSI/MSI-X is the target, but if we add more code we want to benefit from
it; sticking with the LAPIC brings no benefit, I think.

> You said that you are working on patches to make MSI devices work: maybe
> seeing a working implementation of that would convince us about which one
> is the correct approach.

Well, we don't want to show code to the community before it is mature. I
can describe one implementation: it adds a hook in arch_setup_msi_irqs() and
writes a self-defined MSI data/address pair (containing the event channel
information) to the PCI configuration space/MMIO; the hypervisor/qemu can
then intercept and parse it, so we get the event whenever the real device's
interrupt is injected.
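
Very roughly, such a hook might look like the following (a sketch of the
idea only, not the unposted patches; xen_bind_evtchn_for_msi() and the
address/data encoding are invented here):

#include <linux/pci.h>
#include <linux/msi.h>

int xen_hvm_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
{
	struct msi_msg msg;
	int port, irq;

	port = xen_bind_evtchn_for_msi(dev);	/* hypothetical helper */
	if (port < 0)
		return port;

	irq = bind_evtchn_to_irq(port);		/* existing evtchn -> irq glue */

	/* A self-defined message the hypervisor/qemu side is assumed to
	 * recognise and translate into event channel delivery, instead of
	 * programming a real LAPIC-routed MSI. */
	msg.address_hi = 0;
	msg.address_lo = 0xfee00000;		/* placeholder encoding */
	msg.data = port;
	write_msi_msg(irq, &msg);

	return irq;
}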

--
regards
Yang, Sheng

2010-02-12 11:56:29

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 5/7] xen: Make event channel work with PV featured HVM

On Thu, 11 Feb 2010, Sheng Yang wrote:
>
> The MSI/MSI-X is the target, but if we add more code then we want benefit from
> them. Stick with LAPIC is no benefit I think.
>

We wouldn't stick with LAPIC: the guest could still decide to use event
channels for all the vectors and LAPIC usage would be avoided, and it is
probably what is going to happen.


> > You said that you are working on patches to make MSI devices work: maybe
> > seeing a working implementation of that would convince us about which one
> > is the correct approach.
>
> Um, we don't want to show code to the community before it's mature. I can
> describe one implementation: it add a hook in arch_setup_msi_irqs(), and write
> the self-defined MSI data/addr(contained event channel information) to the PCI
> configuration/MMIO; then hypervisor/qemu can intercept and parse it, then we
> can get the event when real device's interrupt injected.
>

your approach needs:

- global enable of evtchns in place of legacy irqs on the linux side
- special translation irq -> evtchn in irq.c on the xen side
- special requests of evtchns in place of MSIs on the linux side
(touching generic kernel code)
- special handling of evtchns in place of MSIs on the qemu/xen side

the last two points are particularly worrying.
My approach needs:

- per vector enable of evtchns on the linux side
- special delivery of evtchns for guest's vectors in vlapic.c on the xen side

I think it is worth giving it a try, given that it is simpler and it
doesn't need any change in the generic kernel code.

In any case it seems to me that the MSI/evtchn work should be part of
this patch series, because it is difficult to judge whether your
approach makes sense without it. We should probably just wait for
it to be complete before proceeding further.