2010-06-21 16:12:22

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 0/12] PV on HVM Xen

Hi all,
this is another update on the PV on HVM Xen series; a list of changes
compared to the previous version follows:

- the bug caused by modprobing xen frontend modules when xenbus is not
initialized has been fixed: now xenbus is always initialized during
postcore_initcall, before device drivers are initialized, so that we
can be sure that loading xen pv frontends will always happen afterwards,
no matter how the xen platform pci driver is built.
In order to do that xenbus_probe has been moved out of the xenbus
initialization and called it later on at device_initcall.

- a new HVMOP_pagetable_dying hypercall has been added to notify Xen
that a pagetable is going to be destroyed: this improves performances
significantly when running on shadow pagetables.
A patch is currently need on the Xen side for this to work.

Jeremy's comments have been addressed:

- xen_guest_init has been renamed xen_hvm_guest_init;

- init_hvm_time has been moved to arch/x86/xen/time.c;

Konrad's comments have been addressed:

- gnttab_max_nr_grant_frames has been renamed gnttab_max_grant_frames;

- few inaccurate comments have been rewritten;

- the preprocessor checks in platform-pci-unplug.c have been moved to
platform_pci.h;

- few other code style improvements, like using dev_err instead of
printk and strncmp instead of strcmp.



A git tree is available here:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git

branch name 2.6.34-pvhvm-v4.

Cheers,

Stefano


2010-06-21 16:14:00

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH 0/12] PV on HVM Xen

As usual I mistyped the topic, the series is made up of 13 patches now
:)

On Mon, 21 Jun 2010, Stefano Stabellini wrote:
> Hi all,
> this is another update on the PV on HVM Xen series; a list of changes
> compared to the previous version follows:
>
> - the bug caused by modprobing xen frontend modules when xenbus is not
> initialized has been fixed: now xenbus is always initialized during
> postcore_initcall, before device drivers are initialized, so that we
> can be sure that loading xen pv frontends will always happen afterwards,
> no matter how the xen platform pci driver is built.
> In order to do that xenbus_probe has been moved out of the xenbus
> initialization and called it later on at device_initcall.
>
> - a new HVMOP_pagetable_dying hypercall has been added to notify Xen
> that a pagetable is going to be destroyed: this improves performances
> significantly when running on shadow pagetables.
> A patch is currently need on the Xen side for this to work.
>
> Jeremy's comments have been addressed:
>
> - xen_guest_init has been renamed xen_hvm_guest_init;
>
> - init_hvm_time has been moved to arch/x86/xen/time.c;
>
> Konrad's comments have been addressed:
>
> - gnttab_max_nr_grant_frames has been renamed gnttab_max_grant_frames;
>
> - few inaccurate comments have been rewritten;
>
> - the preprocessor checks in platform-pci-unplug.c have been moved to
> platform_pci.h;
>
> - few other code style improvements, like using dev_err instead of
> printk and strncmp instead of strcmp.
>
>
>
> A git tree is available here:
>
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
>
> branch name 2.6.34-pvhvm-v4.
>
> Cheers,
>
> Stefano
>

2010-06-21 16:23:53

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 01/13] Add support for hvm_op

From: Jeremy Fitzhardinge <[email protected]>

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/x86/include/asm/xen/hypercall.h | 6 ++
include/xen/hvm.h | 24 +++++++++
include/xen/interface/hvm/hvm_op.h | 35 ++++++++++++
include/xen/interface/hvm/params.h | 95 ++++++++++++++++++++++++++++++++++
4 files changed, 160 insertions(+), 0 deletions(-)
create mode 100644 include/xen/hvm.h
create mode 100644 include/xen/interface/hvm/hvm_op.h
create mode 100644 include/xen/interface/hvm/params.h

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 9c371e4..7fda040 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -417,6 +417,12 @@ HYPERVISOR_nmi_op(unsigned long op, unsigned long arg)
return _hypercall2(int, nmi_op, op, arg);
}

+static inline unsigned long __must_check
+HYPERVISOR_hvm_op(int op, void *arg)
+{
+ return _hypercall2(unsigned long, hvm_op, op, arg);
+}
+
static inline void
MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
{
diff --git a/include/xen/hvm.h b/include/xen/hvm.h
new file mode 100644
index 0000000..708c821
--- /dev/null
+++ b/include/xen/hvm.h
@@ -0,0 +1,24 @@
+/* Simple wrappers around HVM functions */
+#ifndef XEN_HVM_H__
+#define XEN_HVM_H__
+
+#include <xen/interface/hvm/params.h>
+
+static inline int hvm_get_parameter(int idx, uint64_t *value)
+{
+ struct xen_hvm_param xhv;
+ int r;
+
+ xhv.domid = DOMID_SELF;
+ xhv.index = idx;
+ r = HYPERVISOR_hvm_op(HVMOP_get_param, &xhv);
+ if (r < 0) {
+ printk(KERN_ERR "cannot get hvm parameter %d: %d.\n",
+ idx, r);
+ return r;
+ }
+ *value = xhv.value;
+ return r;
+}
+
+#endif /* XEN_HVM_H__ */
diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
new file mode 100644
index 0000000..73c8c7e
--- /dev/null
+++ b/include/xen/interface/hvm/hvm_op.h
@@ -0,0 +1,35 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PUBLIC_HVM_HVM_OP_H__
+#define __XEN_PUBLIC_HVM_HVM_OP_H__
+
+/* Get/set subcommands: the second argument of the hypercall is a
+ * pointer to a xen_hvm_param struct. */
+#define HVMOP_set_param 0
+#define HVMOP_get_param 1
+struct xen_hvm_param {
+ domid_t domid; /* IN */
+ uint32_t index; /* IN */
+ uint64_t value; /* IN/OUT */
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_param);
+
+#endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
diff --git a/include/xen/interface/hvm/params.h b/include/xen/interface/hvm/params.h
new file mode 100644
index 0000000..1888d8c
--- /dev/null
+++ b/include/xen/interface/hvm/params.h
@@ -0,0 +1,95 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PUBLIC_HVM_PARAMS_H__
+#define __XEN_PUBLIC_HVM_PARAMS_H__
+
+#include "hvm_op.h"
+
+/*
+ * Parameter space for HVMOP_{set,get}_param.
+ */
+
+/*
+ * How should CPU0 event-channel notifications be delivered?
+ * val[63:56] == 0: val[55:0] is a delivery GSI (Global System Interrupt).
+ * val[63:56] == 1: val[55:0] is a delivery PCI INTx line, as follows:
+ * Domain = val[47:32], Bus = val[31:16],
+ * DevFn = val[15: 8], IntX = val[ 1: 0]
+ * val[63:56] == 2: val[7:0] is a vector number.
+ * If val == 0 then CPU0 event-channel notifications are not delivered.
+ */
+#define HVM_PARAM_CALLBACK_IRQ 0
+
+#define HVM_PARAM_STORE_PFN 1
+#define HVM_PARAM_STORE_EVTCHN 2
+
+#define HVM_PARAM_PAE_ENABLED 4
+
+#define HVM_PARAM_IOREQ_PFN 5
+
+#define HVM_PARAM_BUFIOREQ_PFN 6
+
+/*
+ * Set mode for virtual timers (currently x86 only):
+ * delay_for_missed_ticks (default):
+ * Do not advance a vcpu's time beyond the correct delivery time for
+ * interrupts that have been missed due to preemption. Deliver missed
+ * interrupts when the vcpu is rescheduled and advance the vcpu's virtual
+ * time stepwise for each one.
+ * no_delay_for_missed_ticks:
+ * As above, missed interrupts are delivered, but guest time always tracks
+ * wallclock (i.e., real) time while doing so.
+ * no_missed_ticks_pending:
+ * No missed interrupts are held pending. Instead, to ensure ticks are
+ * delivered at some non-zero rate, if we detect missed ticks then the
+ * internal tick alarm is not disabled if the VCPU is preempted during the
+ * next tick period.
+ * one_missed_tick_pending:
+ * Missed interrupts are collapsed together and delivered as one 'late tick'.
+ * Guest time always tracks wallclock (i.e., real) time.
+ */
+#define HVM_PARAM_TIMER_MODE 10
+#define HVMPTM_delay_for_missed_ticks 0
+#define HVMPTM_no_delay_for_missed_ticks 1
+#define HVMPTM_no_missed_ticks_pending 2
+#define HVMPTM_one_missed_tick_pending 3
+
+/* Boolean: Enable virtual HPET (high-precision event timer)? (x86-only) */
+#define HVM_PARAM_HPET_ENABLED 11
+
+/* Identity-map page directory used by Intel EPT when CR0.PG=0. */
+#define HVM_PARAM_IDENT_PT 12
+
+/* Device Model domain, defaults to 0. */
+#define HVM_PARAM_DM_DOMAIN 13
+
+/* ACPI S state: currently support S0 and S3 on x86. */
+#define HVM_PARAM_ACPI_S_STATE 14
+
+/* TSS used on Intel when CR0.PE=0. */
+#define HVM_PARAM_VM86_TSS 15
+
+/* Boolean: Enable aligning all periodic vpts to reduce interrupts */
+#define HVM_PARAM_VPT_ALIGN 16
+
+#define HVM_NR_PARAMS 17
+
+#endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
--
1.7.0.4

2010-06-21 16:23:59

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 03/13] evtchn delivery on HVM

From: Sheng Yang <[email protected]>

Set the callback to receive evtchns from Xen, using the
callback vector delivery mechanism.

The traditional way for receiving event channel notifications from Xen
is interrupts from the platform PCI device.
The callback vector is a newer alternative that allow us to receive
notifications on any vcpu and doesn't need any PCI support: we allocate
a vector exclusively to receive events, in the vector handler we don't
need to interact with the vlapic, therefore we avoid a VMEXIT.

Signed-off-by: Stefano Stabellini <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
---
arch/x86/include/asm/irq_vectors.h | 5 ++
arch/x86/kernel/entry_32.S | 2 +
arch/x86/kernel/entry_64.S | 3 +
arch/x86/xen/enlighten.c | 29 ++++++++++++
arch/x86/xen/xen-ops.h | 2 +
drivers/xen/events.c | 83 +++++++++++++++++++++++++++++++++---
include/xen/events.h | 7 +++
include/xen/hvm.h | 6 +++
include/xen/interface/features.h | 3 +
9 files changed, 133 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 8767d99..ab6ac1b 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -125,6 +125,11 @@
*/
#define MCE_SELF_VECTOR 0xeb

+#ifdef CONFIG_XEN
+/* Xen vector callback to receive events in a HVM domain */
+#define XEN_HVM_EVTCHN_CALLBACK 0xe9
+#endif
+
#define NR_VECTORS 256

#define FPU_IRQ 13
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 44a8e0d..88ed128 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1147,6 +1147,8 @@ ENTRY(xen_failsafe_callback)
.previous
ENDPROC(xen_failsafe_callback)

+BUILD_INTERRUPT(xen_hvm_callback_vector, XEN_HVM_EVTCHN_CALLBACK)
+
#endif /* CONFIG_XEN */

#ifdef CONFIG_FUNCTION_TRACER
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0697ff1..d40c1e1 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1329,6 +1329,9 @@ ENTRY(xen_failsafe_callback)
CFI_ENDPROC
END(xen_failsafe_callback)

+apicinterrupt XEN_HVM_EVTCHN_CALLBACK \
+ xen_hvm_callback_vector smp_xen_hvm_callback_vector
+
#endif /* CONFIG_XEN */

/*
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index d776c02..4b2172d 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -11,6 +11,7 @@
* Jeremy Fitzhardinge <[email protected]>, XenSource Inc, 2007
*/

+#include <linux/cpu.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/smp.h>
@@ -38,6 +39,7 @@
#include <xen/interface/memory.h>
#include <xen/features.h>
#include <xen/page.h>
+#include <xen/hvm.h>
#include <xen/hvc-console.h>

#include <asm/paravirt.h>
@@ -78,6 +80,9 @@ struct shared_info xen_dummy_shared_info;

void *xen_initial_gdt;

+__read_mostly int xen_have_vector_callback;
+EXPORT_SYMBOL_GPL(xen_have_vector_callback);
+
/*
* Point at some empty memory to start with. We map the real shared_info
* page as soon as fixmap is up and running.
@@ -1276,6 +1281,24 @@ static void __init init_shared_info(void)
per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
}

+static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ int cpu = (long)hcpu;
+ switch (action) {
+ case CPU_UP_PREPARE:
+ per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata xen_hvm_cpu_notifier = {
+ .notifier_call = xen_hvm_cpu_notify,
+};
+
void __init xen_hvm_guest_init(void)
{
int r;
@@ -1289,4 +1312,10 @@ void __init xen_hvm_guest_init(void)
return;

init_shared_info();
+
+ if (xen_feature(XENFEAT_hvm_callback_vector))
+ xen_have_vector_callback = 1;
+ register_cpu_notifier(&xen_hvm_cpu_notifier);
+ have_vcpu_info_placement = 0;
+ x86_init.irqs.intr_init = xen_init_IRQ;
}
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index f9153a3..0d0e0e6 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -38,6 +38,8 @@ void xen_enable_sysenter(void);
void xen_enable_syscall(void);
void xen_vcpu_restore(void);

+void xen_callback_vector(void);
+
void __init xen_build_dynamic_phys_to_machine(void);

void xen_init_irq_ops(void);
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index db8f506..0a68343 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -29,6 +29,7 @@
#include <linux/bootmem.h>
#include <linux/slab.h>

+#include <asm/desc.h>
#include <asm/ptrace.h>
#include <asm/irq.h>
#include <asm/idle.h>
@@ -36,10 +37,14 @@
#include <asm/xen/hypercall.h>
#include <asm/xen/hypervisor.h>

+#include <xen/xen.h>
+#include <xen/hvm.h>
#include <xen/xen-ops.h>
#include <xen/events.h>
#include <xen/interface/xen.h>
#include <xen/interface/event_channel.h>
+#include <xen/interface/hvm/hvm_op.h>
+#include <xen/interface/hvm/params.h>

/*
* This lock protects updates to the following mapping and reference-count
@@ -617,17 +622,13 @@ static DEFINE_PER_CPU(unsigned, xed_nesting_count);
* a bitset of words which contain pending event bits. The second
* level is a bitset of pending events themselves.
*/
-void xen_evtchn_do_upcall(struct pt_regs *regs)
+static void __xen_evtchn_do_upcall(void)
{
int cpu = get_cpu();
- struct pt_regs *old_regs = set_irq_regs(regs);
struct shared_info *s = HYPERVISOR_shared_info;
struct vcpu_info *vcpu_info = __get_cpu_var(xen_vcpu);
unsigned count;

- exit_idle();
- irq_enter();
-
do {
unsigned long pending_words;

@@ -667,10 +668,26 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
} while(count != 1);

out:
+
+ put_cpu();
+}
+
+void xen_evtchn_do_upcall(struct pt_regs *regs)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+
+ exit_idle();
+ irq_enter();
+
+ __xen_evtchn_do_upcall();
+
irq_exit();
set_irq_regs(old_regs);
+}

- put_cpu();
+void xen_hvm_evtchn_do_upcall(void)
+{
+ __xen_evtchn_do_upcall();
}

/* Rebind a new event channel to an existing irq. */
@@ -933,6 +950,53 @@ static struct irq_chip xen_dynamic_chip __read_mostly = {
.retrigger = retrigger_dynirq,
};

+int xen_set_callback_via(uint64_t via)
+{
+ struct xen_hvm_param a;
+ a.domid = DOMID_SELF;
+ a.index = HVM_PARAM_CALLBACK_IRQ;
+ a.value = via;
+ return HYPERVISOR_hvm_op(HVMOP_set_param, &a);
+}
+EXPORT_SYMBOL_GPL(xen_set_callback_via);
+
+void smp_xen_hvm_callback_vector(struct pt_regs *regs)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+
+ exit_idle();
+
+ irq_enter();
+
+ __xen_evtchn_do_upcall();
+
+ irq_exit();
+
+ set_irq_regs(old_regs);
+}
+
+/* Vector callbacks are better than PCI interrupts to receive event
+ * channel notifications because we can receive vector callbacks on any
+ * vcpu and we don't need PCI support or APIC interactions. */
+void xen_callback_vector(void)
+{
+ int rc;
+ uint64_t callback_via;
+ if (xen_have_vector_callback) {
+ callback_via = HVM_CALLBACK_VECTOR(XEN_HVM_EVTCHN_CALLBACK);
+ rc = xen_set_callback_via(callback_via);
+ if (rc) {
+ printk(KERN_ERR "Request for Xen HVM callback vector"
+ " failed.\n");
+ xen_have_vector_callback = 0;
+ return;
+ }
+ printk(KERN_INFO "Xen HVM callback vector for event delivery is "
+ "enabled\n");
+ alloc_intr_gate(XEN_HVM_EVTCHN_CALLBACK, xen_hvm_callback_vector);
+ }
+}
+
void __init xen_init_IRQ(void)
{
int i;
@@ -947,5 +1011,10 @@ void __init xen_init_IRQ(void)
for (i = 0; i < NR_EVENT_CHANNELS; i++)
mask_evtchn(i);

- irq_ctx_init(smp_processor_id());
+ if (xen_hvm_domain()) {
+ xen_callback_vector();
+ native_init_IRQ();
+ } else {
+ irq_ctx_init(smp_processor_id());
+ }
}
diff --git a/include/xen/events.h b/include/xen/events.h
index e68d59a..a15d932 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -56,4 +56,11 @@ void xen_poll_irq(int irq);
/* Determine the IRQ which is bound to an event channel */
unsigned irq_from_evtchn(unsigned int evtchn);

+/* Xen HVM evtchn vector callback */
+extern void xen_hvm_callback_vector(void);
+extern int xen_have_vector_callback;
+int xen_set_callback_via(uint64_t via);
+void xen_evtchn_do_upcall(struct pt_regs *regs);
+void xen_hvm_evtchn_do_upcall(void);
+
#endif /* _XEN_EVENTS_H */
diff --git a/include/xen/hvm.h b/include/xen/hvm.h
index 708c821..3427f92 100644
--- a/include/xen/hvm.h
+++ b/include/xen/hvm.h
@@ -3,6 +3,7 @@
#define XEN_HVM_H__

#include <xen/interface/hvm/params.h>
+#include <asm/xen/hypercall.h>

static inline int hvm_get_parameter(int idx, uint64_t *value)
{
@@ -21,4 +22,9 @@ static inline int hvm_get_parameter(int idx, uint64_t *value)
return r;
}

+#define HVM_CALLBACK_VIA_TYPE_VECTOR 0x2
+#define HVM_CALLBACK_VIA_TYPE_SHIFT 56
+#define HVM_CALLBACK_VECTOR(x) (((uint64_t)HVM_CALLBACK_VIA_TYPE_VECTOR)<<\
+ HVM_CALLBACK_VIA_TYPE_SHIFT | (x))
+
#endif /* XEN_HVM_H__ */
diff --git a/include/xen/interface/features.h b/include/xen/interface/features.h
index f51b641..8ab08b9 100644
--- a/include/xen/interface/features.h
+++ b/include/xen/interface/features.h
@@ -41,6 +41,9 @@
/* x86: Does this Xen host support the MMU_PT_UPDATE_PRESERVE_AD hypercall? */
#define XENFEAT_mmu_pt_update_preserve_ad 5

+/* x86: Does this Xen host support the HVM callback vector type? */
+#define XENFEAT_hvm_callback_vector 8
+
#define XENFEAT_NR_SUBMAPS 1

#endif /* __XEN_PUBLIC_FEATURES_H__ */
--
1.7.0.4

2010-06-21 16:24:10

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 05/13] Add suspend\resume support for PV on HVM guests.

From: Stefano Stabellini <[email protected]>

Suspend\resume requires few different thing on HVM: the suspend
hypercall is different, we don't need to save\restore any memory related
setting, but we need to reinitialize the shared info page and the
callback mechanism.

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/x86/xen/enlighten.c | 22 ++++++++++++++++----
arch/x86/xen/suspend.c | 6 +++++
arch/x86/xen/xen-ops.h | 1 +
drivers/xen/manage.c | 45 ++++++++++++++++++++++++++++++++++++++++---
drivers/xen/platform-pci.c | 22 ++++++++++++++++++++-
include/xen/xen-ops.h | 3 ++
6 files changed, 89 insertions(+), 10 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 4b2172d..90bac21 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1263,12 +1263,14 @@ static int init_hvm_pv_info(int *major, int *minor)
return 0;
}

-static void __init init_shared_info(void)
+void xen_hvm_init_shared_info(void)
{
+ int cpu;
struct xen_add_to_physmap xatp;
- struct shared_info *shared_info_page;
+ static struct shared_info *shared_info_page = 0;

- shared_info_page = (struct shared_info *) alloc_bootmem_pages(PAGE_SIZE);
+ if (!shared_info_page)
+ shared_info_page = (struct shared_info *) alloc_bootmem_pages(PAGE_SIZE);
xatp.domid = DOMID_SELF;
xatp.idx = 0;
xatp.space = XENMAPSPACE_shared_info;
@@ -1278,7 +1280,17 @@ static void __init init_shared_info(void)

HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;

- per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
+ /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
+ * page, we use it in the event channel upcall and in some pvclock
+ * related functions. We don't need the vcpu_info placement
+ * optimizations because we don't use any pv_mmu or pv_irq op on
+ * HVM.
+ * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
+ * online but xen_hvm_init_shared_info is run at resume time too and
+ * in that case multiple vcpus might be online. */
+ for_each_online_cpu(cpu) {
+ per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
+ }
}

static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
@@ -1311,7 +1323,7 @@ void __init xen_hvm_guest_init(void)
if (r < 0)
return;

- init_shared_info();
+ xen_hvm_init_shared_info();

if (xen_feature(XENFEAT_hvm_callback_vector))
xen_have_vector_callback = 1;
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 987267f..6ff9665 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -26,6 +26,12 @@ void xen_pre_suspend(void)
BUG();
}

+void xen_hvm_post_suspend(int suspend_cancelled)
+{
+ xen_hvm_init_shared_info();
+ xen_callback_vector();
+}
+
void xen_post_suspend(int suspend_cancelled)
{
xen_build_mfn_list_list();
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 0d0e0e6..01c9dd3 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -39,6 +39,7 @@ void xen_enable_syscall(void);
void xen_vcpu_restore(void);

void xen_callback_vector(void);
+void xen_hvm_init_shared_info(void);

void __init xen_build_dynamic_phys_to_machine(void);

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 2ac4440..0716ba6 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -9,6 +9,7 @@
#include <linux/stop_machine.h>
#include <linux/freezer.h>

+#include <xen/xen.h>
#include <xen/xenbus.h>
#include <xen/grant_table.h>
#include <xen/events.h>
@@ -17,6 +18,7 @@

#include <asm/xen/hypercall.h>
#include <asm/xen/page.h>
+#include <asm/xen/hypervisor.h>

enum shutdown_state {
SHUTDOWN_INVALID = -1,
@@ -33,10 +35,30 @@ enum shutdown_state {
static enum shutdown_state shutting_down = SHUTDOWN_INVALID;

#ifdef CONFIG_PM_SLEEP
-static int xen_suspend(void *data)
+static int xen_hvm_suspend(void *data)
{
+ struct sched_shutdown r = { .reason = SHUTDOWN_suspend };
int *cancelled = data;
+
+ BUG_ON(!irqs_disabled());
+
+ *cancelled = HYPERVISOR_sched_op(SCHEDOP_shutdown, &r);
+
+ xen_hvm_post_suspend(*cancelled);
+ gnttab_resume();
+
+ if (!*cancelled) {
+ xen_irq_resume();
+ xen_timer_resume();
+ }
+
+ return 0;
+}
+
+static int xen_suspend(void *data)
+{
int err;
+ int *cancelled = data;

BUG_ON(!irqs_disabled());

@@ -112,7 +134,10 @@ static void do_suspend(void)
goto out_resume;
}

- err = stop_machine(xen_suspend, &cancelled, cpumask_of(0));
+ if (xen_hvm_domain())
+ err = stop_machine(xen_hvm_suspend, &cancelled, cpumask_of(0));
+ else
+ err = stop_machine(xen_suspend, &cancelled, cpumask_of(0));

dpm_resume_noirq(PMSG_RESUME);

@@ -261,7 +286,19 @@ static int shutdown_event(struct notifier_block *notifier,
return NOTIFY_DONE;
}

-static int __init setup_shutdown_event(void)
+static int __init __setup_shutdown_event(void)
+{
+ /* Delay initialization in the PV on HVM case */
+ if (xen_hvm_domain())
+ return 0;
+
+ if (!xen_pv_domain())
+ return -ENODEV;
+
+ return xen_setup_shutdown_event();
+}
+
+int xen_setup_shutdown_event(void)
{
static struct notifier_block xenstore_notifier = {
.notifier_call = shutdown_event
@@ -271,4 +308,4 @@ static int __init setup_shutdown_event(void)
return 0;
}

-subsys_initcall(setup_shutdown_event);
+subsys_initcall(__setup_shutdown_event);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index cdb2199..6663201 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -31,6 +31,7 @@
#include <xen/xenbus.h>
#include <xen/events.h>
#include <xen/hvm.h>
+#include <xen/xen-ops.h>

#define DRV_NAME "xen-platform-pci"

@@ -41,6 +42,7 @@ MODULE_LICENSE("GPL");
static unsigned long platform_mmio;
static unsigned long platform_mmio_alloc;
static unsigned long platform_mmiolen;
+static uint64_t callback_via;

unsigned long alloc_xen_mmio(unsigned long len)
{
@@ -85,13 +87,25 @@ static int xen_allocate_irq(struct pci_dev *pdev)
"xen-platform-pci", pdev);
}

+static int platform_pci_resume(struct pci_dev *pdev)
+{
+ int err;
+ if (xen_have_vector_callback)
+ return 0;
+ err = xen_set_callback_via(callback_via);
+ if (err) {
+ dev_err(&pdev->dev, "platform_pci_resume failure!\n");
+ return err;
+ }
+ return 0;
+}
+
static int __devinit platform_pci_init(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
int i, ret;
long ioaddr, iolen;
long mmio_addr, mmio_len;
- uint64_t callback_via;

i = pci_enable_device(pdev);
if (i)
@@ -143,6 +157,9 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
if (ret)
goto out;
xenbus_probe(NULL);
+ ret = xen_setup_shutdown_event();
+ if (ret)
+ goto out;

out:
if (ret) {
@@ -166,6 +183,9 @@ static struct pci_driver platform_driver = {
.name = DRV_NAME,
.probe = platform_pci_init,
.id_table = platform_pci_tbl,
+#ifdef CONFIG_PM
+ .resume_early = platform_pci_resume,
+#endif
};

static int __init platform_pci_module_init(void)
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 883a21b..46bc81e 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -7,6 +7,7 @@ DECLARE_PER_CPU(struct vcpu_info *, xen_vcpu);

void xen_pre_suspend(void);
void xen_post_suspend(int suspend_cancelled);
+void xen_hvm_post_suspend(int suspend_cancelled);

void xen_mm_pin_all(void);
void xen_mm_unpin_all(void);
@@ -14,4 +15,6 @@ void xen_mm_unpin_all(void);
void xen_timer_resume(void);
void xen_arch_resume(void);

+int xen_setup_shutdown_event(void);
+
#endif /* INCLUDE_XEN_OPS_H */
--
1.7.0.4

2010-06-21 16:24:19

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 08/13] Fix possible NULL pointer dereference in print_IO_APIC

From: Stefano Stabellini <[email protected]>

Make sure chip_data is not NULL before accessing it (the VIRQ_TIMER
handler and virq handlers in general don't have any chip_data).

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index eb2789c..c64499c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1732,6 +1732,8 @@ __apicdebuginit(void) print_IO_APIC(void)
struct irq_pin_list *entry;

cfg = desc->chip_data;
+ if (!cfg)
+ continue;
entry = cfg->irq_2_pin;
if (!entry)
continue;
--
1.7.0.4

2010-06-21 16:24:27

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 10/13] Do not try to disable hpet if it hasn't been initialized before

From: Stefano Stabellini <[email protected]>

hpet_disable is called unconditionally on machine reboot if hpet support
is compiled in the kernel.
hpet_disable only checks if the machine is hpet capable but doesn't make
sure that hpet has been initialized.

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/x86/kernel/hpet.c | 18 ++++++++++--------
1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 23b4ecd..2b299da 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -959,16 +959,18 @@ fs_initcall(hpet_late_init);

void hpet_disable(void)
{
- if (is_hpet_capable()) {
- unsigned int cfg = hpet_readl(HPET_CFG);
+ unsigned int cfg;

- if (hpet_legacy_int_enabled) {
- cfg &= ~HPET_CFG_LEGACY;
- hpet_legacy_int_enabled = 0;
- }
- cfg &= ~HPET_CFG_ENABLE;
- hpet_writel(cfg, HPET_CFG);
+ if (!is_hpet_capable() || !hpet_address || !hpet_virt_address)
+ return;
+
+ cfg = hpet_readl(HPET_CFG);
+ if (hpet_legacy_int_enabled) {
+ cfg &= ~HPET_CFG_LEGACY;
+ hpet_legacy_int_enabled = 0;
}
+ cfg &= ~HPET_CFG_ENABLE;
+ hpet_writel(cfg, HPET_CFG);
}

#ifdef CONFIG_HPET_EMULATE_RTC
--
1.7.0.4

2010-06-21 16:24:35

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 13/13] Call HVMOP_pagetable_dying on exit_mmap

From: Stefano Stabellini <[email protected]>

When a pagetable is about to be destroyed, we notify Xen so that the
hypervisor can clear the related shadow pagetable.

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/x86/xen/enlighten.c | 1 +
arch/x86/xen/mmu.c | 33 +++++++++++++++++++++++++++++++++
arch/x86/xen/mmu.h | 1 +
include/xen/interface/hvm/hvm_op.h | 11 +++++++++++
4 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5126b53..c8d9198 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1321,4 +1321,5 @@ void __init xen_hvm_guest_init(void)
have_vcpu_info_placement = 0;
x86_init.irqs.intr_init = xen_init_IRQ;
xen_hvm_init_time_ops();
+ xen_hvm_init_mmu_ops();
}
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 914f046..726c5bd 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -58,6 +58,7 @@

#include <xen/page.h>
#include <xen/interface/xen.h>
+#include <xen/interface/hvm/hvm_op.h>
#include <xen/interface/version.h>
#include <xen/hvc-console.h>

@@ -1941,6 +1942,38 @@ void __init xen_init_mmu_ops(void)
pv_mmu_ops = xen_mmu_ops;
}

+static void xen_hvm_exit_mmap(struct mm_struct *mm)
+{
+ struct xen_hvm_pagetable_dying a;
+ int rc;
+
+ a.domid = DOMID_SELF;
+ a.gpa = __pa(mm->pgd);
+ rc = HYPERVISOR_hvm_op(HVMOP_pagetable_dying, &a);
+ WARN_ON(rc < 0);
+}
+
+static int is_pagetable_dying_supported(void)
+{
+ struct xen_hvm_pagetable_dying a;
+ int rc = 0;
+
+ a.domid = DOMID_SELF;
+ a.gpa = 0x00;
+ rc = HYPERVISOR_hvm_op(HVMOP_pagetable_dying, &a);
+ if (rc == -EINVAL) {
+ printk(KERN_INFO "HVMOP_pagetable_dying not supported\n");
+ return 0;
+ }
+ return 1;
+}
+
+void __init xen_hvm_init_mmu_ops(void)
+{
+ if (is_pagetable_dying_supported())
+ pv_mmu_ops.exit_mmap = xen_hvm_exit_mmap;
+}
+
#ifdef CONFIG_XEN_DEBUG_FS

static struct dentry *d_mmu_debug;
diff --git a/arch/x86/xen/mmu.h b/arch/x86/xen/mmu.h
index 5fe6bc7..fa938c4 100644
--- a/arch/x86/xen/mmu.h
+++ b/arch/x86/xen/mmu.h
@@ -60,4 +60,5 @@ void xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
unsigned long xen_read_cr2_direct(void);

extern void xen_init_mmu_ops(void);
+extern void xen_hvm_init_mmu_ops(void);
#endif /* _XEN_MMU_H */
diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
index 73c8c7e..a4827f4 100644
--- a/include/xen/interface/hvm/hvm_op.h
+++ b/include/xen/interface/hvm/hvm_op.h
@@ -32,4 +32,15 @@ struct xen_hvm_param {
};
DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_param);

+/* Hint from PV drivers for pagetable destruction. */
+#define HVMOP_pagetable_dying 9
+struct xen_hvm_pagetable_dying {
+ /* Domain with a pagetable about to be destroyed. */
+ domid_t domid;
+ /* guest physical address of the toplevel pagetable dying */
+ aligned_u64 gpa;
+};
+typedef struct xen_hvm_pagetable_dying xen_hvm_pagetable_dying_t;
+DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_pagetable_dying_t);
+
#endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
--
1.7.0.4

2010-06-21 16:24:38

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 11/13] Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.

From: Stefano Stabellini <[email protected]>

Use xen_vcpuop_clockevent instead of hpet and APIC timers as main
clockevent device on all vcpus, use the xen wallclock time as wallclock
instead of rtc and use xen_clocksource as clocksource.
The pv clock algorithm needs to work correctly for the xen_clocksource
and xen wallclock to be usable, only modern Xen versions offer a
reliable pv clock in HVM guests (XENFEAT_hvm_safe_pvclock).

Using the hpet as clocksource means a VMEXIT every time we read/write to
the hpet mmio addresses, pvclock give us a better rating without
VMEXITs. Same goes for the xen wallclock and xen_vcpuop_clockevent

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/x86/xen/enlighten.c | 14 +--------
arch/x86/xen/suspend.c | 4 ++
arch/x86/xen/time.c | 59 ++++++++++++++++++++++++++++++++++---
arch/x86/xen/xen-ops.h | 7 +---
include/xen/interface/features.h | 3 ++
5 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 90bac21..bcd98a9 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -933,10 +933,6 @@ static const struct pv_init_ops xen_init_ops __initdata = {
.patch = xen_patch,
};

-static const struct pv_time_ops xen_time_ops __initdata = {
- .sched_clock = xen_sched_clock,
-};
-
static const struct pv_cpu_ops xen_cpu_ops __initdata = {
.cpuid = xen_cpuid,

@@ -1074,7 +1070,6 @@ asmlinkage void __init xen_start_kernel(void)
/* Install Xen paravirt ops */
pv_info = xen_info;
pv_init_ops = xen_init_ops;
- pv_time_ops = xen_time_ops;
pv_cpu_ops = xen_cpu_ops;
pv_apic_ops = xen_apic_ops;

@@ -1082,13 +1077,7 @@ asmlinkage void __init xen_start_kernel(void)
x86_init.oem.arch_setup = xen_arch_setup;
x86_init.oem.banner = xen_banner;

- x86_init.timers.timer_init = xen_time_init;
- x86_init.timers.setup_percpu_clockev = x86_init_noop;
- x86_cpuinit.setup_percpu_clockev = x86_init_noop;
-
- x86_platform.calibrate_tsc = xen_tsc_khz;
- x86_platform.get_wallclock = xen_get_wallclock;
- x86_platform.set_wallclock = xen_set_wallclock;
+ xen_init_time_ops();

/*
* Set up some pagetable state before starting to set any ptes.
@@ -1330,4 +1319,5 @@ void __init xen_hvm_guest_init(void)
register_cpu_notifier(&xen_hvm_cpu_notifier);
have_vcpu_info_placement = 0;
x86_init.irqs.intr_init = xen_init_IRQ;
+ xen_hvm_init_time_ops();
}
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 6ff9665..0774c67 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -28,8 +28,12 @@ void xen_pre_suspend(void)

void xen_hvm_post_suspend(int suspend_cancelled)
{
+ int cpu;
xen_hvm_init_shared_info();
xen_callback_vector();
+ for_each_online_cpu(cpu) {
+ xen_setup_runstate_info(cpu);
+ }
}

void xen_post_suspend(int suspend_cancelled)
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 32764b8..2d76c28 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -20,6 +20,7 @@
#include <asm/xen/hypercall.h>

#include <xen/events.h>
+#include <xen/features.h>
#include <xen/interface/xen.h>
#include <xen/interface/vcpu.h>

@@ -160,7 +161,7 @@ static void do_stolen_accounting(void)
* nanoseconds, which is nanoseconds the VCPU spent in RUNNING+BLOCKED
* states.
*/
-unsigned long long xen_sched_clock(void)
+static unsigned long long xen_sched_clock(void)
{
struct vcpu_runstate_info state;
cycle_t now;
@@ -195,7 +196,7 @@ unsigned long long xen_sched_clock(void)


/* Get the TSC speed from Xen */
-unsigned long xen_tsc_khz(void)
+static unsigned long xen_tsc_khz(void)
{
struct pvclock_vcpu_time_info *info =
&HYPERVISOR_shared_info->vcpu_info[0].time;
@@ -230,7 +231,7 @@ static void xen_read_wallclock(struct timespec *ts)
put_cpu_var(xen_vcpu);
}

-unsigned long xen_get_wallclock(void)
+static unsigned long xen_get_wallclock(void)
{
struct timespec ts;

@@ -238,7 +239,7 @@ unsigned long xen_get_wallclock(void)
return ts.tv_sec;
}

-int xen_set_wallclock(unsigned long now)
+static int xen_set_wallclock(unsigned long now)
{
/* do nothing for domU */
return -1;
@@ -473,7 +474,11 @@ void xen_timer_resume(void)
}
}

-__init void xen_time_init(void)
+static const struct pv_time_ops xen_time_ops __initdata = {
+ .sched_clock = xen_sched_clock,
+};
+
+static __init void xen_time_init(void)
{
int cpu = smp_processor_id();

@@ -497,3 +502,47 @@ __init void xen_time_init(void)
xen_setup_timer(cpu);
xen_setup_cpu_clockevents();
}
+
+__init void xen_init_time_ops(void)
+{
+ pv_time_ops = xen_time_ops;
+
+ x86_init.timers.timer_init = xen_time_init;
+ x86_init.timers.setup_percpu_clockev = x86_init_noop;
+ x86_cpuinit.setup_percpu_clockev = x86_init_noop;
+
+ x86_platform.calibrate_tsc = xen_tsc_khz;
+ x86_platform.get_wallclock = xen_get_wallclock;
+ x86_platform.set_wallclock = xen_set_wallclock;
+}
+
+static void xen_hvm_setup_cpu_clockevents(void)
+{
+ int cpu = smp_processor_id();
+ xen_setup_runstate_info(cpu);
+ xen_setup_timer(cpu);
+ xen_setup_cpu_clockevents();
+}
+
+__init void xen_hvm_init_time_ops(void)
+{
+ /* vector callback is needed otherwise we cannot receive interrupts
+ * on cpu > 0 */
+ if (!xen_have_vector_callback && num_present_cpus() > 1)
+ return;
+ if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
+ printk(KERN_INFO "Xen doesn't support pvclock on HVM,"
+ "disable pv timer\n");
+ return;
+ }
+
+ pv_time_ops = xen_time_ops;
+ x86_init.timers.timer_init = xen_time_init;
+ x86_init.timers.setup_percpu_clockev = x86_init_noop;
+ x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
+
+ x86_platform.calibrate_tsc = xen_tsc_khz;
+ x86_platform.get_wallclock = xen_get_wallclock;
+ x86_platform.set_wallclock = xen_set_wallclock;
+}
+
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 01c9dd3..089d189 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -49,11 +49,8 @@ void xen_setup_runstate_info(int cpu);
void xen_teardown_timer(int cpu);
cycle_t xen_clocksource_read(void);
void xen_setup_cpu_clockevents(void);
-unsigned long xen_tsc_khz(void);
-void __init xen_time_init(void);
-unsigned long xen_get_wallclock(void);
-int xen_set_wallclock(unsigned long time);
-unsigned long long xen_sched_clock(void);
+void __init xen_init_time_ops(void);
+void __init xen_hvm_init_time_ops(void);

irqreturn_t xen_debug_interrupt(int irq, void *dev_id);

diff --git a/include/xen/interface/features.h b/include/xen/interface/features.h
index 8ab08b9..70d2563 100644
--- a/include/xen/interface/features.h
+++ b/include/xen/interface/features.h
@@ -44,6 +44,9 @@
/* x86: Does this Xen host support the HVM callback vector type? */
#define XENFEAT_hvm_callback_vector 8

+/* x86: pvclock algorithm is safe to use on HVM */
+#define XENFEAT_hvm_safe_pvclock 9
+
#define XENFEAT_NR_SUBMAPS 1

#endif /* __XEN_PUBLIC_FEATURES_H__ */
--
1.7.0.4

2010-06-21 16:24:56

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 12/13] Unplug emulated disks and nics

From: Stefano Stabellini <[email protected]>

Add a xen_emul_unplug command line option to the kernel to unplug
xen emulated disks and nics.

Set the default value of xen_emul_unplug depending on whether or
not the Xen PV frontends and the Xen platform PCI driver have
been compiled for this kernel (modules or built-in are both OK).

Signed-off-by: Stefano Stabellini <[email protected]>
---
Documentation/kernel-parameters.txt | 11 +++
arch/x86/xen/Makefile | 2 +-
arch/x86/xen/enlighten.c | 1 +
arch/x86/xen/platform-pci-unplug.c | 130 +++++++++++++++++++++++++++++++++++
arch/x86/xen/xen-ops.h | 1 +
drivers/xen/platform-pci.c | 4 +
drivers/xen/xenbus/xenbus_probe.c | 4 +
include/xen/platform_pci.h | 49 +++++++++++++
8 files changed, 201 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/xen/platform-pci-unplug.c
create mode 100644 include/xen/platform_pci.h

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 839b21b..716eea8 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -113,6 +113,7 @@ parameter is applicable:
More X86-64 boot options can be found in
Documentation/x86/x86_64/boot-options.txt .
X86 Either 32bit or 64bit x86 (same as X86-32+X86-64)
+ XEN Xen support is enabled

In addition, the following text indicates that the option:

@@ -2834,6 +2835,16 @@ and is between 256 and 4096 characters. It is defined in the file
xd= [HW,XT] Original XT pre-IDE (RLL encoded) disks.
xd_geo= See header of drivers/block/xd.c.

+ xen_emul_unplug= [HW,X86,XEN]
+ Unplug Xen emulated devices
+ Format: [unplug0,][unplug1]
+ ide-disks -- unplug primary master IDE devices
+ aux-ide-disks -- unplug non-primary-master IDE devices
+ nics -- unplug network devices
+ all -- unplug all emulated devices (NICs and IDE disks)
+ ignore -- continue loading the Xen platform PCI driver even
+ if the version check failed
+
xirc2ps_cs= [NET,PCMCIA]
Format:
<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 3bb4fc2..9309546 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -12,7 +12,7 @@ CFLAGS_mmu.o := $(nostackp)

obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
time.o xen-asm.o xen-asm_$(BITS).o \
- grant-table.o suspend.o
+ grant-table.o suspend.o platform-pci-unplug.o

obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index bcd98a9..5126b53 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1317,6 +1317,7 @@ void __init xen_hvm_guest_init(void)
if (xen_feature(XENFEAT_hvm_callback_vector))
xen_have_vector_callback = 1;
register_cpu_notifier(&xen_hvm_cpu_notifier);
+ xen_unplug_emulated_devices();
have_vcpu_info_placement = 0;
x86_init.irqs.intr_init = xen_init_IRQ;
xen_hvm_init_time_ops();
diff --git a/arch/x86/xen/platform-pci-unplug.c b/arch/x86/xen/platform-pci-unplug.c
new file mode 100644
index 0000000..d686440
--- /dev/null
+++ b/arch/x86/xen/platform-pci-unplug.c
@@ -0,0 +1,130 @@
+/******************************************************************************
+ * platform-pci-unplug.c
+ *
+ * Xen platform PCI device driver
+ * Copyright (c) 2010, Citrix
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include <asm/io.h>
+
+#include <linux/init.h>
+#include <linux/module.h>
+
+#include <xen/platform_pci.h>
+
+/* boolean to signal that the platform pci device can be used */
+bool xen_platform_pci_enabled;
+EXPORT_SYMBOL_GPL(xen_platform_pci_enabled);
+static int xen_emul_unplug;
+
+static int __init check_platform_magic(void)
+{
+ short magic;
+ char protocol;
+
+ magic = inw(XEN_IOPORT_MAGIC);
+ if (magic != XEN_IOPORT_MAGIC_VAL) {
+ printk(KERN_ERR "Xen Platform PCI: unrecognised magic value\n");
+ return -1;
+ }
+
+ protocol = inb(XEN_IOPORT_PROTOVER);
+
+ printk(KERN_DEBUG "Xen Platform PCI: I/O protocol version %d\n",
+ protocol);
+
+ switch (protocol) {
+ case 1:
+ outw(XEN_IOPORT_LINUX_PRODNUM, XEN_IOPORT_PRODNUM);
+ outl(XEN_IOPORT_LINUX_DRVVER, XEN_IOPORT_DRVVER);
+ if (inw(XEN_IOPORT_MAGIC) != XEN_IOPORT_MAGIC_VAL) {
+ printk(KERN_ERR "Xen Platform: blacklisted by host\n");
+ return -3;
+ }
+ break;
+ default:
+ printk(KERN_WARNING "Xen Platform PCI: unknown I/O protocol version");
+ return -2;
+ }
+
+ return 0;
+}
+
+void __init xen_unplug_emulated_devices(void)
+{
+ int r;
+
+ /* check the version of the xen platform PCI device */
+ r = check_platform_magic();
+ /* If the version matches enable the Xen platform PCI driver.
+ * Also enable the Xen platform PCI driver if the version is really old
+ * and the user told us to ignore it. */
+ if (!r || (r == -1 && (xen_emul_unplug & XEN_UNPLUG_IGNORE)))
+ xen_platform_pci_enabled = 1;
+ /* Set the default value of xen_emul_unplug depending on whether or
+ * not the Xen PV frontends and the Xen platform PCI driver have
+ * been compiled for this kernel (modules or built-in are both OK). */
+ if (xen_platform_pci_enabled && !xen_emul_unplug) {
+ if (xen_must_unplug_nics()) {
+ printk(KERN_INFO "Netfront and the Xen platform PCI driver have "
+ "been compiled for this kernel: unplug emulated NICs.\n");
+ xen_emul_unplug |= XEN_UNPLUG_ALL_NICS;
+ }
+ if (xen_must_unplug_disks()) {
+ printk(KERN_INFO "Blkfront and the Xen platform PCI driver have "
+ "been compiled for this kernel: unplug emulated disks.\n"
+ "You might have to change the root device\n"
+ "from /dev/hd[a-d] to /dev/xvd[a-d]\n"
+ "in your root= kernel command line option\n");
+ xen_emul_unplug |= XEN_UNPLUG_ALL_IDE_DISKS;
+ }
+ }
+ /* Now unplug the emulated devices */
+ if (xen_platform_pci_enabled && !(xen_emul_unplug & XEN_UNPLUG_IGNORE))
+ outw(xen_emul_unplug, XEN_IOPORT_UNPLUG);
+}
+
+static int __init parse_xen_emul_unplug(char *arg)
+{
+ char *p, *q;
+ int l;
+
+ for (p = arg; p; p = q) {
+ q = strchr(p, ',');
+ if (q) {
+ l = q - p;
+ q++;
+ } else {
+ l = strlen(p);
+ }
+ if (!strncmp(p, "all", l))
+ xen_emul_unplug |= XEN_UNPLUG_ALL;
+ else if (!strncmp(p, "ide-disks", l))
+ xen_emul_unplug |= XEN_UNPLUG_ALL_IDE_DISKS;
+ else if (!strncmp(p, "aux-ide-disks", l))
+ xen_emul_unplug |= XEN_UNPLUG_AUX_IDE_DISKS;
+ else if (!strncmp(p, "nics", l))
+ xen_emul_unplug |= XEN_UNPLUG_ALL_NICS;
+ else if (!strncmp(p, "ignore", l))
+ xen_emul_unplug |= XEN_UNPLUG_IGNORE;
+ else
+ printk(KERN_WARNING "unrecognised option '%s' "
+ "in module parameter 'dev_unplug'\n", p);
+ }
+ return 0;
+}
+early_param("xen_emul_unplug", parse_xen_emul_unplug);
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 089d189..ed77694 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -40,6 +40,7 @@ void xen_vcpu_restore(void);

void xen_callback_vector(void);
void xen_hvm_init_shared_info(void);
+void __init xen_unplug_emulated_devices(void);

void __init xen_build_dynamic_phys_to_machine(void);

diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 10b92ec..35f162d 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -27,6 +27,7 @@
#include <linux/module.h>
#include <linux/pci.h>

+#include <xen/platform_pci.h>
#include <xen/grant_table.h>
#include <xen/xenbus.h>
#include <xen/events.h>
@@ -195,6 +196,9 @@ static int __init platform_pci_module_init(void)
{
int rc;

+ if (!xen_platform_pci_enabled)
+ return -ENODEV;
+
rc = pci_register_driver(&platform_driver);
if (rc) {
printk(KERN_INFO DRV_NAME
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3d941ec..243279a 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -56,6 +56,7 @@
#include <xen/events.h>
#include <xen/page.h>

+#include <xen/platform_pci.h>
#include <xen/hvm.h>

#include "xenbus_comms.h"
@@ -977,6 +978,9 @@ static void wait_for_devices(struct xenbus_driver *xendrv)
#ifndef MODULE
static int __init boot_wait_for_devices(void)
{
+ if (xen_hvm_domain() && !xen_platform_pci_enabled)
+ return -ENODEV;
+
ready_to_wait_for_devices = 1;
wait_for_devices(NULL);
return 0;
diff --git a/include/xen/platform_pci.h b/include/xen/platform_pci.h
new file mode 100644
index 0000000..afa8855
--- /dev/null
+++ b/include/xen/platform_pci.h
@@ -0,0 +1,49 @@
+#ifndef _XEN_PLATFORM_PCI_H
+#define _XEN_PLATFORM_PCI_H
+
+#define XEN_IOPORT_MAGIC_VAL 0x49d2
+#define XEN_IOPORT_LINUX_PRODNUM 0x0003
+#define XEN_IOPORT_LINUX_DRVVER 0x0001
+
+#define XEN_IOPORT_BASE 0x10
+
+#define XEN_IOPORT_PLATFLAGS (XEN_IOPORT_BASE + 0) /* 1 byte access (R/W) */
+#define XEN_IOPORT_MAGIC (XEN_IOPORT_BASE + 0) /* 2 byte access (R) */
+#define XEN_IOPORT_UNPLUG (XEN_IOPORT_BASE + 0) /* 2 byte access (W) */
+#define XEN_IOPORT_DRVVER (XEN_IOPORT_BASE + 0) /* 4 byte access (W) */
+
+#define XEN_IOPORT_SYSLOG (XEN_IOPORT_BASE + 2) /* 1 byte access (W) */
+#define XEN_IOPORT_PROTOVER (XEN_IOPORT_BASE + 2) /* 1 byte access (R) */
+#define XEN_IOPORT_PRODNUM (XEN_IOPORT_BASE + 2) /* 2 byte access (W) */
+
+#define XEN_UNPLUG_ALL_IDE_DISKS 1
+#define XEN_UNPLUG_ALL_NICS 2
+#define XEN_UNPLUG_AUX_IDE_DISKS 4
+#define XEN_UNPLUG_ALL 7
+#define XEN_UNPLUG_IGNORE 8
+
+static inline int xen_must_unplug_nics(void) {
+#if (defined(CONFIG_XEN_NETDEV_FRONTEND) || \
+ defined(CONFIG_XEN_NETDEV_FRONTEND_MODULE)) && \
+ (defined(CONFIG_XEN_PLATFORM_PCI) || \
+ defined(CONFIG_XEN_PLATFORM_PCI_MODULE))
+ return 1;
+#else
+ return 0;
+#endif
+}
+
+static inline int xen_must_unplug_disks(void) {
+#if (defined(CONFIG_XEN_BLKDEV_FRONTEND) || \
+ defined(CONFIG_XEN_BLKDEV_FRONTEND_MODULE)) && \
+ (defined(CONFIG_XEN_PLATFORM_PCI) || \
+ defined(CONFIG_XEN_PLATFORM_PCI_MODULE))
+ return 1;
+#else
+ return 0;
+#endif
+}
+
+extern bool xen_platform_pci_enabled;
+
+#endif /* _XEN_PLATFORM_PCI_H */
--
1.7.0.4

2010-06-21 16:25:23

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 09/13] __setup_vector_irq: handle NULL chip_data

From: Stefano Stabellini <[email protected]>

the VIRQ_TIMER handler and virq handlers in general don't have any
chip_data.

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c64499c..4d3d391 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1269,6 +1269,8 @@ void __setup_vector_irq(int cpu)
/* Mark the inuse vectors */
for_each_irq_desc(irq, desc) {
cfg = desc->chip_data;
+ if (!cfg)
+ continue;

/*
* If it is a legacy IRQ handled by the legacy PIC, this cpu
--
1.7.0.4

2010-06-21 16:25:58

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 07/13] Fix find_unbound_irq in presence of ioapic irqs.

From: Stefano Stabellini <[email protected]>

Don't break the assumption that the first 16 irqs are ISA irqs;
make sure that the irq is actually free before using it.

Signed-off-by: Stefano Stabellini <[email protected]>
---
drivers/xen/events.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 0f7546e..b029a32 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -340,9 +340,18 @@ static int find_unbound_irq(void)
int irq;
struct irq_desc *desc;

- for (irq = 0; irq < nr_irqs; irq++)
+ for (irq = 0; irq < nr_irqs; irq++) {
+ desc = irq_to_desc(irq);
+ /* only 0->15 have init'd desc; handle irq > 16 */
+ if (desc == NULL)
+ break;
+ if (desc->chip == &no_irq_chip)
+ break;
+ if (desc->chip != &xen_dynamic_chip)
+ continue;
if (irq_info[irq].type == IRQT_UNBOUND)
break;
+ }

if (irq == nr_irqs)
panic("No available IRQ to bind to: increase nr_irqs!\n");
--
1.7.0.4

2010-06-21 16:24:07

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 04/13] Xen PCI platform device driver

From: Stefano Stabellini <[email protected]>

Add the xen pci platform device driver that is responsible
for initializing the grant table and xenbus in PV on HVM mode.
Few changes to xenbus and grant table are necessary to allow the delayed
initialization in HVM mode.
Grant table needs few additional modifications to work in HVM mode.

The Xen PCI platform device raises an irq every time an event has been
delivered to us. However these interrupts are only delivered to vcpu 0.
The Xen PCI platform interrupt handler calls xen_hvm_evtchn_do_upcall
that is a little wrapper around __xen_evtchn_do_upcall, the traditional
Xen upcall handler, the very same used with traditional PV guests.

When running on HVM the event channel upcall is never called while in
progress because it is a normal Linux irq handler, therefore we cannot
be sure that evtchn_upcall_pending is 0 when returning.
For this reason if evtchn_upcall_pending is set by Xen we need to loop
again on the event channels set pending otherwise we might loose some
event channel deliveries.

Signed-off-by: Stefano Stabellini <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
---
drivers/xen/Kconfig | 9 ++
drivers/xen/Makefile | 3 +-
drivers/xen/events.c | 7 +-
drivers/xen/grant-table.c | 69 ++++++++++++--
drivers/xen/platform-pci.c | 184 +++++++++++++++++++++++++++++++++++
drivers/xen/xenbus/xenbus_probe.c | 22 +++-
include/linux/pci_ids.h | 3 +
include/xen/grant_table.h | 1 +
include/xen/interface/grant_table.h | 1 +
9 files changed, 284 insertions(+), 15 deletions(-)
create mode 100644 drivers/xen/platform-pci.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index fad3df2..8819202 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -62,4 +62,13 @@ config XEN_SYS_HYPERVISOR
virtual environment, /sys/hypervisor will still be present,
but will have no xen contents.

+config XEN_PLATFORM_PCI
+ tristate "xen platform pci device driver"
+ depends on XEN
+ default y
+ help
+ Driver for the Xen PCI Platform device: it is responsible for
+ initializing xenbus and grant_table when running in a Xen HVM
+ domain. As a consequence this driver is required to run any Xen PV
+ frontend on Xen HVM.
endmenu
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 7c28434..e392fb7 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -9,4 +9,5 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
obj-$(CONFIG_XEN_BALLOON) += balloon.o
obj-$(CONFIG_XEN_DEV_EVTCHN) += evtchn.o
obj-$(CONFIG_XENFS) += xenfs/
-obj-$(CONFIG_XEN_SYS_HYPERVISOR) += sys-hypervisor.o
\ No newline at end of file
+obj-$(CONFIG_XEN_SYS_HYPERVISOR) += sys-hypervisor.o
+obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 0a68343..44fdbbb 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -665,7 +665,7 @@ static void __xen_evtchn_do_upcall(void)

count = __get_cpu_var(xed_nesting_count);
__get_cpu_var(xed_nesting_count) = 0;
- } while(count != 1);
+ } while (count != 1 || vcpu_info->evtchn_upcall_pending);

out:

@@ -725,7 +725,10 @@ static int rebind_irq_to_cpu(unsigned irq, unsigned tcpu)
struct evtchn_bind_vcpu bind_vcpu;
int evtchn = evtchn_from_irq(irq);

- if (!VALID_EVTCHN(evtchn))
+ /* events delivered via platform PCI interrupts are always
+ * routed to vcpu 0 */
+ if (!VALID_EVTCHN(evtchn) ||
+ (xen_hvm_domain() && !xen_have_vector_callback))
return -1;

/* Send future instances of this interrupt to other vcpu. */
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index f66db3b..4c959a5 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -37,11 +37,13 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
+#include <linux/io.h>

#include <xen/xen.h>
#include <xen/interface/xen.h>
#include <xen/page.h>
#include <xen/grant_table.h>
+#include <xen/interface/memory.h>
#include <asm/xen/hypercall.h>

#include <asm/pgtable.h>
@@ -59,6 +61,7 @@ static unsigned int boot_max_nr_grant_frames;
static int gnttab_free_count;
static grant_ref_t gnttab_free_head;
static DEFINE_SPINLOCK(gnttab_list_lock);
+static unsigned long hvm_pv_resume_frames;

static struct grant_entry *shared;

@@ -449,6 +452,30 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
unsigned int nr_gframes = end_idx + 1;
int rc;

+ if (xen_hvm_domain()) {
+ struct xen_add_to_physmap xatp;
+ unsigned int i = end_idx;
+ rc = 0;
+ /*
+ * Loop backwards, so that the first hypercall has the largest
+ * index, ensuring that the table will grow only once.
+ */
+ do {
+ xatp.domid = DOMID_SELF;
+ xatp.idx = i;
+ xatp.space = XENMAPSPACE_grant_table;
+ xatp.gpfn = (hvm_pv_resume_frames >> PAGE_SHIFT) + i;
+ rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
+ if (rc != 0) {
+ printk(KERN_WARNING
+ "grant table add_to_physmap failed, err=%d\n", rc);
+ break;
+ }
+ } while (i-- > start_idx);
+
+ return rc;
+ }
+
frames = kmalloc(nr_gframes * sizeof(unsigned long), GFP_ATOMIC);
if (!frames)
return -ENOMEM;
@@ -476,9 +503,28 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)

int gnttab_resume(void)
{
- if (max_nr_grant_frames() < nr_grant_frames)
+ unsigned int max_nr_gframes;
+
+ max_nr_gframes = max_nr_grant_frames();
+ if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;
- return gnttab_map(0, nr_grant_frames - 1);
+
+ if (xen_pv_domain())
+ return gnttab_map(0, nr_grant_frames - 1);
+
+ if (!hvm_pv_resume_frames) {
+ hvm_pv_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
+ shared = ioremap(hvm_pv_resume_frames, PAGE_SIZE * max_nr_gframes);
+ if (shared == NULL) {
+ printk(KERN_WARNING
+ "Fail to ioremap gnttab share frames\n");
+ return -ENOMEM;
+ }
+ }
+
+ gnttab_map(0, nr_grant_frames - 1);
+
+ return 0;
}

int gnttab_suspend(void)
@@ -505,15 +551,12 @@ static int gnttab_expand(unsigned int req_entries)
return rc;
}

-static int __devinit gnttab_init(void)
+int gnttab_init(void)
{
int i;
unsigned int max_nr_glist_frames, nr_glist_frames;
unsigned int nr_init_grefs;

- if (!xen_domain())
- return -ENODEV;
-
nr_grant_frames = 1;
boot_max_nr_grant_frames = __max_nr_grant_frames();

@@ -557,4 +600,16 @@ static int __devinit gnttab_init(void)
return -ENOMEM;
}

-core_initcall(gnttab_init);
+static int __devinit __gnttab_init(void)
+{
+ /* Delay grant-table initialization in the PV on HVM case */
+ if (xen_hvm_domain())
+ return 0;
+
+ if (!xen_pv_domain())
+ return -ENODEV;
+
+ return gnttab_init();
+}
+
+core_initcall(__gnttab_init);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
new file mode 100644
index 0000000..cdb2199
--- /dev/null
+++ b/drivers/xen/platform-pci.c
@@ -0,0 +1,184 @@
+/******************************************************************************
+ * platform-pci.c
+ *
+ * Xen platform PCI device driver
+ * Copyright (c) 2005, Intel Corporation.
+ * Copyright (c) 2007, XenSource Inc.
+ * Copyright (c) 2010, Citrix
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include <asm/io.h>
+
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+
+#include <xen/grant_table.h>
+#include <xen/xenbus.h>
+#include <xen/events.h>
+#include <xen/hvm.h>
+
+#define DRV_NAME "xen-platform-pci"
+
+MODULE_AUTHOR("[email protected] and [email protected]");
+MODULE_DESCRIPTION("Xen platform PCI device");
+MODULE_LICENSE("GPL");
+
+static unsigned long platform_mmio;
+static unsigned long platform_mmio_alloc;
+static unsigned long platform_mmiolen;
+
+unsigned long alloc_xen_mmio(unsigned long len)
+{
+ unsigned long addr;
+
+ addr = platform_mmio + platform_mmio_alloc;
+ platform_mmio_alloc += len;
+ BUG_ON(platform_mmio_alloc > platform_mmiolen);
+
+ return addr;
+}
+
+static uint64_t get_callback_via(struct pci_dev *pdev)
+{
+ u8 pin;
+ int irq;
+
+ irq = pdev->irq;
+ if (irq < 16)
+ return irq; /* ISA IRQ */
+
+ pin = pdev->pin;
+
+ /* We don't know the GSI. Specify the PCI INTx line instead. */
+ return ((uint64_t)0x01 << 56) | /* PCI INTx identifier */
+ ((uint64_t)pci_domain_nr(pdev->bus) << 32) |
+ ((uint64_t)pdev->bus->number << 16) |
+ ((uint64_t)(pdev->devfn & 0xff) << 8) |
+ ((uint64_t)(pin - 1) & 3);
+}
+
+static irqreturn_t do_hvm_evtchn_intr(int irq, void *dev_id)
+{
+ xen_hvm_evtchn_do_upcall();
+ return IRQ_HANDLED;
+}
+
+static int xen_allocate_irq(struct pci_dev *pdev)
+{
+ return request_irq(pdev->irq, do_hvm_evtchn_intr,
+ IRQF_DISABLED | IRQF_NOBALANCING | IRQF_TRIGGER_RISING,
+ "xen-platform-pci", pdev);
+}
+
+static int __devinit platform_pci_init(struct pci_dev *pdev,
+ const struct pci_device_id *ent)
+{
+ int i, ret;
+ long ioaddr, iolen;
+ long mmio_addr, mmio_len;
+ uint64_t callback_via;
+
+ i = pci_enable_device(pdev);
+ if (i)
+ return i;
+
+ ioaddr = pci_resource_start(pdev, 0);
+ iolen = pci_resource_len(pdev, 0);
+
+ mmio_addr = pci_resource_start(pdev, 1);
+ mmio_len = pci_resource_len(pdev, 1);
+
+ if (mmio_addr == 0 || ioaddr == 0) {
+ dev_err(&pdev->dev, "no resources found\n");
+ ret = -ENOENT;
+ }
+
+ if (request_mem_region(mmio_addr, mmio_len, DRV_NAME) == NULL) {
+ dev_err(&pdev->dev, "MEM I/O resource 0x%lx @ 0x%lx busy\n",
+ mmio_addr, mmio_len);
+ ret = -EBUSY;
+ }
+
+ if (request_region(ioaddr, iolen, DRV_NAME) == NULL) {
+ dev_err(&pdev->dev, "I/O resource 0x%lx @ 0x%lx busy\n",
+ iolen, ioaddr);
+ ret = -EBUSY;
+ goto out;
+ }
+
+ platform_mmio = mmio_addr;
+ platform_mmiolen = mmio_len;
+
+ if (!xen_have_vector_callback) {
+ ret = xen_allocate_irq(pdev);
+ if (ret) {
+ printk(KERN_WARNING "request_irq failed err=%d\n", ret);
+ goto out;
+ }
+ callback_via = get_callback_via(pdev);
+ ret = xen_set_callback_via(callback_via);
+ if (ret) {
+ printk(KERN_WARNING
+ "Unable to set the evtchn callback err=%d\n", ret);
+ goto out;
+ }
+ }
+
+ ret = gnttab_init();
+ if (ret)
+ goto out;
+ xenbus_probe(NULL);
+
+out:
+ if (ret) {
+ release_mem_region(mmio_addr, mmio_len);
+ release_region(ioaddr, iolen);
+ pci_disable_device(pdev);
+ }
+
+ return ret;
+}
+
+static struct pci_device_id platform_pci_tbl[] __devinitdata = {
+ {PCI_VENDOR_ID_XEN, PCI_DEVICE_ID_XEN_PLATFORM,
+ PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+ {0,}
+};
+
+MODULE_DEVICE_TABLE(pci, platform_pci_tbl);
+
+static struct pci_driver platform_driver = {
+ .name = DRV_NAME,
+ .probe = platform_pci_init,
+ .id_table = platform_pci_tbl,
+};
+
+static int __init platform_pci_module_init(void)
+{
+ int rc;
+
+ rc = pci_register_driver(&platform_driver);
+ if (rc) {
+ printk(KERN_INFO DRV_NAME
+ ": No platform pci device model found\n");
+ return rc;
+ }
+ return 0;
+}
+
+module_init(platform_pci_module_init);
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 0b05b62..3d941ec 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -781,8 +781,23 @@ void xenbus_probe(struct work_struct *unused)
/* Notify others that xenstore is up */
blocking_notifier_call_chain(&xenstore_chain, 0, NULL);
}
+EXPORT_SYMBOL_GPL(xenbus_probe);

-static int __init xenbus_probe_init(void)
+static int __init xenbus_probe_initcall(void)
+{
+ if (!xen_domain())
+ return -ENODEV;
+
+ if (xen_initial_domain() || xen_hvm_domain())
+ return 0;
+
+ xenbus_probe(NULL);
+ return 0;
+}
+
+device_initcall(xenbus_probe_initcall);
+
+static int __init xenbus_init(void)
{
int err = 0;

@@ -834,9 +849,6 @@ static int __init xenbus_probe_init(void)
goto out_unreg_back;
}

- if (!xen_initial_domain())
- xenbus_probe(NULL);
-
#ifdef CONFIG_XEN_COMPAT_XENFS
/*
* Create xenfs mountpoint in /proc for compatibility with
@@ -857,7 +869,7 @@ static int __init xenbus_probe_init(void)
return err;
}

-postcore_initcall(xenbus_probe_init);
+postcore_initcall(xenbus_init);

MODULE_LICENSE("GPL");

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 9f688d2..64b528d 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2718,3 +2718,6 @@
#define PCI_DEVICE_ID_RME_DIGI32 0x9896
#define PCI_DEVICE_ID_RME_DIGI32_PRO 0x9897
#define PCI_DEVICE_ID_RME_DIGI32_8 0x9898
+
+#define PCI_VENDOR_ID_XEN 0x5853
+#define PCI_DEVICE_ID_XEN_PLATFORM 0x0001
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index a40f1cd..811cda5 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -51,6 +51,7 @@ struct gnttab_free_callback {
u16 count;
};

+int gnttab_init(void);
int gnttab_suspend(void);
int gnttab_resume(void);

diff --git a/include/xen/interface/grant_table.h b/include/xen/interface/grant_table.h
index 39da93c..39e5717 100644
--- a/include/xen/interface/grant_table.h
+++ b/include/xen/interface/grant_table.h
@@ -28,6 +28,7 @@
#ifndef __XEN_PUBLIC_GRANT_TABLE_H__
#define __XEN_PUBLIC_GRANT_TABLE_H__

+#include <xen/interface/xen.h>

/***********************************
* GRANT TABLE REPRESENTATION
--
1.7.0.4

2010-06-21 16:26:22

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 06/13] Allow xen platform pci device to be compiled as a module

From: Stefano Stabellini <[email protected]>

Signed-off-by: Stefano Stabellini <[email protected]>
---
drivers/xen/events.c | 1 +
drivers/xen/grant-table.c | 20 +++++++++++---------
drivers/xen/manage.c | 1 +
drivers/xen/platform-pci.c | 3 +++
include/xen/grant_table.h | 3 +++
5 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 44fdbbb..0f7546e 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -689,6 +689,7 @@ void xen_hvm_evtchn_do_upcall(void)
{
__xen_evtchn_do_upcall();
}
+EXPORT_SYMBOL_GPL(xen_hvm_evtchn_do_upcall);

/* Rebind a new event channel to an existing irq. */
void rebind_evtchn_irq(int evtchn, int irq)
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 4c959a5..4a3f1b1 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -61,7 +61,8 @@ static unsigned int boot_max_nr_grant_frames;
static int gnttab_free_count;
static grant_ref_t gnttab_free_head;
static DEFINE_SPINLOCK(gnttab_list_lock);
-static unsigned long hvm_pv_resume_frames;
+unsigned long xen_hvm_resume_frames;
+EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);

static struct grant_entry *shared;

@@ -436,7 +437,7 @@ static unsigned int __max_nr_grant_frames(void)
return query.max_nr_frames;
}

-static inline unsigned int max_nr_grant_frames(void)
+unsigned int gnttab_max_grant_frames(void)
{
unsigned int xen_max = __max_nr_grant_frames();

@@ -444,6 +445,7 @@ static inline unsigned int max_nr_grant_frames(void)
return boot_max_nr_grant_frames;
return xen_max;
}
+EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);

static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
{
@@ -464,7 +466,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
xatp.domid = DOMID_SELF;
xatp.idx = i;
xatp.space = XENMAPSPACE_grant_table;
- xatp.gpfn = (hvm_pv_resume_frames >> PAGE_SHIFT) + i;
+ xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
if (rc != 0) {
printk(KERN_WARNING
@@ -492,7 +494,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)

BUG_ON(rc || setup.status);

- rc = arch_gnttab_map_shared(frames, nr_gframes, max_nr_grant_frames(),
+ rc = arch_gnttab_map_shared(frames, nr_gframes, gnttab_max_grant_frames(),
&shared);
BUG_ON(rc);

@@ -505,16 +507,15 @@ int gnttab_resume(void)
{
unsigned int max_nr_gframes;

- max_nr_gframes = max_nr_grant_frames();
+ max_nr_gframes = gnttab_max_grant_frames();
if (max_nr_gframes < nr_grant_frames)
return -ENOSYS;

if (xen_pv_domain())
return gnttab_map(0, nr_grant_frames - 1);

- if (!hvm_pv_resume_frames) {
- hvm_pv_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
- shared = ioremap(hvm_pv_resume_frames, PAGE_SIZE * max_nr_gframes);
+ if (!shared) {
+ shared = ioremap(xen_hvm_resume_frames, PAGE_SIZE * max_nr_gframes);
if (shared == NULL) {
printk(KERN_WARNING
"Fail to ioremap gnttab share frames\n");
@@ -541,7 +542,7 @@ static int gnttab_expand(unsigned int req_entries)
cur = nr_grant_frames;
extra = ((req_entries + (GREFS_PER_GRANT_FRAME-1)) /
GREFS_PER_GRANT_FRAME);
- if (cur + extra > max_nr_grant_frames())
+ if (cur + extra > gnttab_max_grant_frames())
return -ENOSPC;

rc = gnttab_map(cur, cur + extra - 1);
@@ -599,6 +600,7 @@ int gnttab_init(void)
kfree(gnttab_list);
return -ENOMEM;
}
+EXPORT_SYMBOL_GPL(gnttab_init);

static int __devinit __gnttab_init(void)
{
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 0716ba6..f5162e4 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -307,5 +307,6 @@ int xen_setup_shutdown_event(void)

return 0;
}
+EXPORT_SYMBOL_GPL(xen_setup_shutdown_event);

subsys_initcall(__setup_shutdown_event);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 6663201..10b92ec 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -106,6 +106,7 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
int i, ret;
long ioaddr, iolen;
long mmio_addr, mmio_len;
+ unsigned int max_nr_gframes;

i = pci_enable_device(pdev);
if (i)
@@ -153,6 +154,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
}
}

+ max_nr_gframes = gnttab_max_grant_frames();
+ xen_hvm_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
ret = gnttab_init();
if (ret)
goto out;
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 811cda5..9a73170 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -113,6 +113,9 @@ int arch_gnttab_map_shared(unsigned long *frames, unsigned long nr_gframes,
void arch_gnttab_unmap_shared(struct grant_entry *shared,
unsigned long nr_gframes);

+extern unsigned long xen_hvm_resume_frames;
+unsigned int gnttab_max_grant_frames(void);
+
#define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))

#endif /* __ASM_GNTTAB_H__ */
--
1.7.0.4

2010-06-21 16:26:42

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH 02/13] early PV on HVM

From: Sheng Yang <[email protected]>

Initialize basic pv on hvm features in xen_hvm_guest_init.

The hook in arch/x86/kernel/setup.c can easily be removed using the new
generic hypervisor independent initialization infrastructure present in
the linux-2.6-tip tree.

Signed-off-by: Stefano Stabellini <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Yaozu (Eddie) Dong <[email protected]>
---
arch/x86/kernel/setup.c | 2 +
arch/x86/xen/enlighten.c | 84 +++++++++++++++++++++++++++++++++++++
drivers/input/xen-kbdfront.c | 2 +-
drivers/video/xen-fbfront.c | 2 +-
drivers/xen/xenbus/xenbus_probe.c | 21 ++++++++-
include/xen/xen.h | 2 +
6 files changed, 108 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4851ef..de6d23f 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -69,6 +69,7 @@
#include <linux/tboot.h>

#include <video/edid.h>
+#include <xen/xen.h>

#include <asm/mtrr.h>
#include <asm/apic.h>
@@ -1032,6 +1033,7 @@ void __init setup_arch(char **cmdline_p)
probe_nr_irqs_gsi();

kvm_guest_init();
+ xen_hvm_guest_init();

e820_reserve_resources();
e820_mark_nosave_regions(max_low_pfn);
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 65d8d79..d776c02 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -35,6 +35,7 @@
#include <xen/interface/version.h>
#include <xen/interface/physdev.h>
#include <xen/interface/vcpu.h>
+#include <xen/interface/memory.h>
#include <xen/features.h>
#include <xen/page.h>
#include <xen/hvc-console.h>
@@ -56,6 +57,7 @@
#include <asm/tlbflush.h>
#include <asm/reboot.h>
#include <asm/stackprotector.h>
+#include <asm/hypervisor.h>

#include "xen-ops.h"
#include "mmu.h"
@@ -1206,3 +1208,85 @@ asmlinkage void __init xen_start_kernel(void)
x86_64_start_reservations((char *)__pa_symbol(&boot_params));
#endif
}
+
+static uint32_t xen_cpuid_base(void)
+{
+ uint32_t base, eax, ebx, ecx, edx;
+ char signature[13];
+
+ for (base = 0x40000000; base < 0x40010000; base += 0x100) {
+ cpuid(base, &eax, &ebx, &ecx, &edx);
+ *(uint32_t *)(signature + 0) = ebx;
+ *(uint32_t *)(signature + 4) = ecx;
+ *(uint32_t *)(signature + 8) = edx;
+ signature[12] = 0;
+
+ if (!strcmp("XenVMMXenVMM", signature) && ((eax - base) >= 2))
+ return base;
+ }
+
+ return 0;
+}
+
+static int init_hvm_pv_info(int *major, int *minor)
+{
+ uint32_t eax, ebx, ecx, edx, pages, msr, base;
+ u64 pfn;
+
+ base = xen_cpuid_base();
+ if (!base)
+ return -EINVAL;
+
+ cpuid(base + 1, &eax, &ebx, &ecx, &edx);
+
+ *major = eax >> 16;
+ *minor = eax & 0xffff;
+ printk(KERN_INFO "Xen version %d.%d.\n", *major, *minor);
+
+ cpuid(base + 2, &pages, &msr, &ecx, &edx);
+
+ pfn = __pa(hypercall_page);
+ wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
+
+ xen_setup_features();
+
+ pv_info = xen_info;
+ pv_info.kernel_rpl = 0;
+
+ xen_domain_type = XEN_HVM_DOMAIN;
+
+ return 0;
+}
+
+static void __init init_shared_info(void)
+{
+ struct xen_add_to_physmap xatp;
+ struct shared_info *shared_info_page;
+
+ shared_info_page = (struct shared_info *) alloc_bootmem_pages(PAGE_SIZE);
+ xatp.domid = DOMID_SELF;
+ xatp.idx = 0;
+ xatp.space = XENMAPSPACE_shared_info;
+ xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+ if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
+ BUG();
+
+ HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+
+ per_cpu(xen_vcpu, 0) = &HYPERVISOR_shared_info->vcpu_info[0];
+}
+
+void __init xen_hvm_guest_init(void)
+{
+ int r;
+ int major, minor;
+
+ if (xen_pv_domain())
+ return;
+
+ r = init_hvm_pv_info(&major, &minor);
+ if (r < 0)
+ return;
+
+ init_shared_info();
+}
diff --git a/drivers/input/xen-kbdfront.c b/drivers/input/xen-kbdfront.c
index e140816..7451d78 100644
--- a/drivers/input/xen-kbdfront.c
+++ b/drivers/input/xen-kbdfront.c
@@ -339,7 +339,7 @@ static struct xenbus_driver xenkbd_driver = {

static int __init xenkbd_init(void)
{
- if (!xen_domain())
+ if (!xen_domain() || xen_hvm_domain())
return -ENODEV;

/* Nothing to do if running in dom0. */
diff --git a/drivers/video/xen-fbfront.c b/drivers/video/xen-fbfront.c
index fa97d3e..a105a19 100644
--- a/drivers/video/xen-fbfront.c
+++ b/drivers/video/xen-fbfront.c
@@ -684,7 +684,7 @@ static struct xenbus_driver xenfb_driver = {

static int __init xenfb_init(void)
{
- if (!xen_domain())
+ if (!xen_domain() || xen_hvm_domain())
return -ENODEV;

/* Nothing to do if running in dom0. */
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 3479332..0b05b62 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -56,6 +56,8 @@
#include <xen/events.h>
#include <xen/page.h>

+#include <xen/hvm.h>
+
#include "xenbus_comms.h"
#include "xenbus_probe.h"

@@ -806,10 +808,23 @@ static int __init xenbus_probe_init(void)
/* dom0 not yet supported */
} else {
xenstored_ready = 1;
- xen_store_evtchn = xen_start_info->store_evtchn;
- xen_store_mfn = xen_start_info->store_mfn;
+ if (xen_hvm_domain()) {
+ uint64_t v = 0;
+ err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v);
+ if (err)
+ goto out_error;
+ xen_store_evtchn = (int)v;
+ err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
+ if (err)
+ goto out_error;
+ xen_store_mfn = (unsigned long)v;
+ xen_store_interface = ioremap(xen_store_mfn << PAGE_SHIFT, PAGE_SIZE);
+ } else {
+ xen_store_evtchn = xen_start_info->store_evtchn;
+ xen_store_mfn = xen_start_info->store_mfn;
+ xen_store_interface = mfn_to_virt(xen_store_mfn);
+ }
}
- xen_store_interface = mfn_to_virt(xen_store_mfn);

/* Initialize the interface to xenstore. */
err = xs_init();
diff --git a/include/xen/xen.h b/include/xen/xen.h
index a164024..77604ed 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -9,8 +9,10 @@ enum xen_domain_type {

#ifdef CONFIG_XEN
extern enum xen_domain_type xen_domain_type;
+extern void xen_hvm_guest_init(void);
#else
#define xen_domain_type XEN_NATIVE
+#define xen_hvm_guest_init() do { } while (0)
#endif

#define xen_domain() (xen_domain_type != XEN_NATIVE)
--
1.7.0.4

2010-06-30 17:55:34

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 10/13] Do not try to disable hpet if it hasn't been initialized before

On Mon, Jun 21, 2010 at 05:14:04PM +0100, [email protected] wrote:
> From: Stefano Stabellini <[email protected]>
>
> hpet_disable is called unconditionally on machine reboot if hpet support
> is compiled in the kernel.
> hpet_disable only checks if the machine is hpet capable but doesn't make
> sure that hpet has been initialized.

CC-ing John Stultz who has had his head wrapped around this time-thingy.

Hi John!

What are your thoughts about this patch?
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> arch/x86/kernel/hpet.c | 18 ++++++++++--------
> 1 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index 23b4ecd..2b299da 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -959,16 +959,18 @@ fs_initcall(hpet_late_init);
>
> void hpet_disable(void)
> {
> - if (is_hpet_capable()) {
> - unsigned int cfg = hpet_readl(HPET_CFG);
> + unsigned int cfg;
>
> - if (hpet_legacy_int_enabled) {
> - cfg &= ~HPET_CFG_LEGACY;
> - hpet_legacy_int_enabled = 0;
> - }
> - cfg &= ~HPET_CFG_ENABLE;
> - hpet_writel(cfg, HPET_CFG);
> + if (!is_hpet_capable() || !hpet_address || !hpet_virt_address)
> + return;
> +
> + cfg = hpet_readl(HPET_CFG);
> + if (hpet_legacy_int_enabled) {
> + cfg &= ~HPET_CFG_LEGACY;
> + hpet_legacy_int_enabled = 0;
> }
> + cfg &= ~HPET_CFG_ENABLE;
> + hpet_writel(cfg, HPET_CFG);
> }
>
> #ifdef CONFIG_HPET_EMULATE_RTC
> --
> 1.7.0.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2010-06-30 17:57:45

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 04/13] Xen PCI platform device driver

> @@ -476,9 +503,28 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
>
> int gnttab_resume(void)
> {
> - if (max_nr_grant_frames() < nr_grant_frames)
> + unsigned int max_nr_gframes;
> +
> + max_nr_gframes = max_nr_grant_frames();
> + if (max_nr_gframes < nr_grant_frames)
> return -ENOSYS;
> - return gnttab_map(0, nr_grant_frames - 1);
> +
> + if (xen_pv_domain())
> + return gnttab_map(0, nr_grant_frames - 1);
> +
> + if (!hvm_pv_resume_frames) {
> + hvm_pv_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
> + shared = ioremap(hvm_pv_resume_frames, PAGE_SIZE * max_nr_gframes);
> + if (shared == NULL) {
> + printk(KERN_WARNING
> + "Fail to ioremap gnttab share frames\n");

I would say: "Failed to ioremap gnttab share frames!\n".

.. snip ..

> + if (request_region(ioaddr, iolen, DRV_NAME) == NULL) {
> + dev_err(&pdev->dev, "I/O resource 0x%lx @ 0x%lx busy\n",
> + iolen, ioaddr);
> + ret = -EBUSY;
> + goto out;
> + }
> +
> + platform_mmio = mmio_addr;
> + platform_mmiolen = mmio_len;
> +
> + if (!xen_have_vector_callback) {
> + ret = xen_allocate_irq(pdev);
> + if (ret) {
> + printk(KERN_WARNING "request_irq failed err=%d\n", ret);
Use dev_warn here.
> + goto out;
> + }
> + callback_via = get_callback_via(pdev);
> + ret = xen_set_callback_via(callback_via);
> + if (ret) {
> + printk(KERN_WARNING
> + "Unable to set the evtchn callback err=%d\n", ret);
ditto.
> + goto out;
> + }
> + }
> +
> + ret = gnttab_init();
> + if (ret)
> + goto out;
> + xenbus_probe(NULL);
> +
> +out:
> + if (ret) {
> + release_mem_region(mmio_addr, mmio_len);
> + release_region(ioaddr, iolen);
> + pci_disable_device(pdev);
> + }
> +
> + return ret;
> +}
> +
> +static struct pci_device_id platform_pci_tbl[] __devinitdata = {
> + {PCI_VENDOR_ID_XEN, PCI_DEVICE_ID_XEN_PLATFORM,
> + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
> + {0,}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, platform_pci_tbl);
> +
> +static struct pci_driver platform_driver = {
> + .name = DRV_NAME,
> + .probe = platform_pci_init,
> + .id_table = platform_pci_tbl,
> +};
> +
> +static int __init platform_pci_module_init(void)
> +{
> + int rc;
> +
> + rc = pci_register_driver(&platform_driver);
> + if (rc) {
> + printk(KERN_INFO DRV_NAME
> + ": No platform pci device model found\n");

No need for that. The pci_register_driver call to ->probe and
its subsequent setup, if it failed, should have displayed the
appropiate error+explanation?

How about just making it:

return pci_register_driver(&platform_driver);

2010-06-30 17:58:30

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 0/12] PV on HVM Xen

On Mon, Jun 21, 2010 at 05:12:20PM +0100, Stefano Stabellini wrote:
> Hi all,
> this is another update on the PV on HVM Xen series; a list of changes
> compared to the previous version follows:

Can you use the [v4] in the main patch (this is the fourth version of
these patches, right?)? I am getting confused as to which version I am
reviewing and if I've missed one or what not.

And in this first intro email state:

Since v7 the following things have changed:
- Konrad made some comments.
- etc.

2010-06-30 21:24:26

by Venkatesh Pallipadi

[permalink] [raw]
Subject: Re: [PATCH 10/13] Do not try to disable hpet if it hasn't been initialized before

Looks Good.
Acked-by: Venkatesh Pallipadi <[email protected]>

Copying Thomas/Ingo/Peter.

On Mon, Jun 21, 2010 at 9:14 AM, <[email protected]> wrote:
> From: Stefano Stabellini <[email protected]>
>
> hpet_disable is called unconditionally on machine reboot if hpet support
> is compiled in the kernel.
> hpet_disable only checks if the machine is hpet capable but doesn't make
> sure that hpet has been initialized.
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> ?arch/x86/kernel/hpet.c | ? 18 ++++++++++--------
> ?1 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index 23b4ecd..2b299da 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -959,16 +959,18 @@ fs_initcall(hpet_late_init);
>
> ?void hpet_disable(void)
> ?{
> - ? ? ? if (is_hpet_capable()) {
> - ? ? ? ? ? ? ? unsigned int cfg = hpet_readl(HPET_CFG);
> + ? ? ? unsigned int cfg;
>
> - ? ? ? ? ? ? ? if (hpet_legacy_int_enabled) {
> - ? ? ? ? ? ? ? ? ? ? ? cfg &= ~HPET_CFG_LEGACY;
> - ? ? ? ? ? ? ? ? ? ? ? hpet_legacy_int_enabled = 0;
> - ? ? ? ? ? ? ? }
> - ? ? ? ? ? ? ? cfg &= ~HPET_CFG_ENABLE;
> - ? ? ? ? ? ? ? hpet_writel(cfg, HPET_CFG);
> + ? ? ? if (!is_hpet_capable() || !hpet_address || !hpet_virt_address)
> + ? ? ? ? ? ? ? return;
> +
> + ? ? ? cfg = hpet_readl(HPET_CFG);
> + ? ? ? if (hpet_legacy_int_enabled) {
> + ? ? ? ? ? ? ? cfg &= ~HPET_CFG_LEGACY;
> + ? ? ? ? ? ? ? hpet_legacy_int_enabled = 0;
> ? ? ? ?}
> + ? ? ? cfg &= ~HPET_CFG_ENABLE;
> + ? ? ? hpet_writel(cfg, HPET_CFG);
> ?}
>
> ?#ifdef CONFIG_HPET_EMULATE_RTC
> --
> 1.7.0.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at ?http://www.tux.org/lkml/
>

2010-07-01 11:38:24

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH 0/12] PV on HVM Xen

On Wed, 30 Jun 2010, Konrad Rzeszutek Wilk wrote:
> On Mon, Jun 21, 2010 at 05:12:20PM +0100, Stefano Stabellini wrote:
> > Hi all,
> > this is another update on the PV on HVM Xen series; a list of changes
> > compared to the previous version follows:
>
> Can you use the [v4] in the main patch (this is the fourth version of
> these patches, right?)? I am getting confused as to which version I am
> reviewing and if I've missed one or what not.
>
> And in this first intro email state:
>
> Since v7 the following things have changed:
> - Konrad made some comments.
> - etc.
>

Good point, I'll do that from now on.
I am pretty sure that you are reviewing the latest version I sent.

2010-07-01 11:38:35

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH 04/13] Xen PCI platform device driver

On Wed, 30 Jun 2010, Konrad Rzeszutek Wilk wrote:
> > @@ -476,9 +503,28 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
> >
> > int gnttab_resume(void)
> > {
> > - if (max_nr_grant_frames() < nr_grant_frames)
> > + unsigned int max_nr_gframes;
> > +
> > + max_nr_gframes = max_nr_grant_frames();
> > + if (max_nr_gframes < nr_grant_frames)
> > return -ENOSYS;
> > - return gnttab_map(0, nr_grant_frames - 1);
> > +
> > + if (xen_pv_domain())
> > + return gnttab_map(0, nr_grant_frames - 1);
> > +
> > + if (!hvm_pv_resume_frames) {
> > + hvm_pv_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
> > + shared = ioremap(hvm_pv_resume_frames, PAGE_SIZE * max_nr_gframes);
> > + if (shared == NULL) {
> > + printk(KERN_WARNING
> > + "Fail to ioremap gnttab share frames\n");
>
> I would say: "Failed to ioremap gnttab share frames!\n".
>
> .. snip ..
>
> > + if (request_region(ioaddr, iolen, DRV_NAME) == NULL) {
> > + dev_err(&pdev->dev, "I/O resource 0x%lx @ 0x%lx busy\n",
> > + iolen, ioaddr);
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
> > + platform_mmio = mmio_addr;
> > + platform_mmiolen = mmio_len;
> > +
> > + if (!xen_have_vector_callback) {
> > + ret = xen_allocate_irq(pdev);
> > + if (ret) {
> > + printk(KERN_WARNING "request_irq failed err=%d\n", ret);
> Use dev_warn here.
> > + goto out;
> > + }
> > + callback_via = get_callback_via(pdev);
> > + ret = xen_set_callback_via(callback_via);
> > + if (ret) {
> > + printk(KERN_WARNING
> > + "Unable to set the evtchn callback err=%d\n", ret);
> ditto.
> > + goto out;
> > + }
> > + }
> > +
> > + ret = gnttab_init();
> > + if (ret)
> > + goto out;
> > + xenbus_probe(NULL);
> > +
> > +out:
> > + if (ret) {
> > + release_mem_region(mmio_addr, mmio_len);
> > + release_region(ioaddr, iolen);
> > + pci_disable_device(pdev);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static struct pci_device_id platform_pci_tbl[] __devinitdata = {
> > + {PCI_VENDOR_ID_XEN, PCI_DEVICE_ID_XEN_PLATFORM,
> > + PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
> > + {0,}
> > +};
> > +
> > +MODULE_DEVICE_TABLE(pci, platform_pci_tbl);
> > +
> > +static struct pci_driver platform_driver = {
> > + .name = DRV_NAME,
> > + .probe = platform_pci_init,
> > + .id_table = platform_pci_tbl,
> > +};
> > +
> > +static int __init platform_pci_module_init(void)
> > +{
> > + int rc;
> > +
> > + rc = pci_register_driver(&platform_driver);
> > + if (rc) {
> > + printk(KERN_INFO DRV_NAME
> > + ": No platform pci device model found\n");
>
> No need for that. The pci_register_driver call to ->probe and
> its subsequent setup, if it failed, should have displayed the
> appropiate error+explanation?
>
> How about just making it:
>
> return pci_register_driver(&platform_driver);
>
>


Thanks for all the comments and suggestions, I'll make the changes.

2010-07-01 19:42:23

by Donald Dutile

[permalink] [raw]
Subject: Re: [PATCH 12/13] Unplug emulated disks and nics

[email protected] wrote:
> From: Stefano Stabellini <[email protected]>
>
> Add a xen_emul_unplug command line option to the kernel to unplug
> xen emulated disks and nics.
>
> Set the default value of xen_emul_unplug depending on whether or
> not the Xen PV frontends and the Xen platform PCI driver have
> been compiled for this kernel (modules or built-in are both OK).
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> Documentation/kernel-parameters.txt | 11 +++
> arch/x86/xen/Makefile | 2 +-
> arch/x86/xen/enlighten.c | 1 +
> arch/x86/xen/platform-pci-unplug.c | 130 +++++++++++++++++++++++++++++++++++
> arch/x86/xen/xen-ops.h | 1 +
> drivers/xen/platform-pci.c | 4 +
> drivers/xen/xenbus/xenbus_probe.c | 4 +
> include/xen/platform_pci.h | 49 +++++++++++++
> 8 files changed, 201 insertions(+), 1 deletions(-)
> create mode 100644 arch/x86/xen/platform-pci-unplug.c
> create mode 100644 include/xen/platform_pci.h
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 839b21b..716eea8 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -113,6 +113,7 @@ parameter is applicable:
> More X86-64 boot options can be found in
> Documentation/x86/x86_64/boot-options.txt .
> X86 Either 32bit or 64bit x86 (same as X86-32+X86-64)
> + XEN Xen support is enabled
>
> In addition, the following text indicates that the option:
>
> @@ -2834,6 +2835,16 @@ and is between 256 and 4096 characters. It is defined in the file
> xd= [HW,XT] Original XT pre-IDE (RLL encoded) disks.
> xd_geo= See header of drivers/block/xd.c.
>
> + xen_emul_unplug= [HW,X86,XEN]
> + Unplug Xen emulated devices
> + Format: [unplug0,][unplug1]
> + ide-disks -- unplug primary master IDE devices
> + aux-ide-disks -- unplug non-primary-master IDE devices
> + nics -- unplug network devices
> + all -- unplug all emulated devices (NICs and IDE disks)
> + ignore -- continue loading the Xen platform PCI driver even
> + if the version check failed
> +
> xirc2ps_cs= [NET,PCMCIA]
> Format:
> <irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index 3bb4fc2..9309546 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -12,7 +12,7 @@ CFLAGS_mmu.o := $(nostackp)
>
> obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
> time.o xen-asm.o xen-asm_$(BITS).o \
> - grant-table.o suspend.o
> + grant-table.o suspend.o platform-pci-unplug.o
>
> obj-$(CONFIG_SMP) += smp.o
> obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index bcd98a9..5126b53 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1317,6 +1317,7 @@ void __init xen_hvm_guest_init(void)
> if (xen_feature(XENFEAT_hvm_callback_vector))
> xen_have_vector_callback = 1;
> register_cpu_notifier(&xen_hvm_cpu_notifier);
> + xen_unplug_emulated_devices();
> have_vcpu_info_placement = 0;
> x86_init.irqs.intr_init = xen_init_IRQ;
> xen_hvm_init_time_ops();
> diff --git a/arch/x86/xen/platform-pci-unplug.c b/arch/x86/xen/platform-pci-unplug.c
> new file mode 100644
> index 0000000..d686440
> --- /dev/null
> +++ b/arch/x86/xen/platform-pci-unplug.c
> @@ -0,0 +1,130 @@
> +/******************************************************************************
> + * platform-pci-unplug.c
> + *
> + * Xen platform PCI device driver
> + * Copyright (c) 2010, Citrix
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + *
> + */
> +
> +#include <asm/io.h>
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +
> +#include <xen/platform_pci.h>
> +
> +/* boolean to signal that the platform pci device can be used */
> +bool xen_platform_pci_enabled;
> +EXPORT_SYMBOL_GPL(xen_platform_pci_enabled);
> +static int xen_emul_unplug;
> +
> +static int __init check_platform_magic(void)
> +{
> + short magic;
> + char protocol;
> +
> + magic = inw(XEN_IOPORT_MAGIC);
> + if (magic != XEN_IOPORT_MAGIC_VAL) {
> + printk(KERN_ERR "Xen Platform PCI: unrecognised magic value\n");
> + return -1;
> + }
> +
> + protocol = inb(XEN_IOPORT_PROTOVER);
> +
> + printk(KERN_DEBUG "Xen Platform PCI: I/O protocol version %d\n",
> + protocol);
> +
> + switch (protocol) {
> + case 1:
> + outw(XEN_IOPORT_LINUX_PRODNUM, XEN_IOPORT_PRODNUM);
> + outl(XEN_IOPORT_LINUX_DRVVER, XEN_IOPORT_DRVVER);
> + if (inw(XEN_IOPORT_MAGIC) != XEN_IOPORT_MAGIC_VAL) {
> + printk(KERN_ERR "Xen Platform: blacklisted by host\n");
> + return -3;
> + }
> + break;
> + default:
> + printk(KERN_WARNING "Xen Platform PCI: unknown I/O protocol version");
> + return -2;
> + }
> +
> + return 0;
> +}
> +
> +void __init xen_unplug_emulated_devices(void)
> +{
> + int r;
> +
> + /* check the version of the xen platform PCI device */
> + r = check_platform_magic();
> + /* If the version matches enable the Xen platform PCI driver.
> + * Also enable the Xen platform PCI driver if the version is really old
> + * and the user told us to ignore it. */
> + if (!r || (r == -1 && (xen_emul_unplug & XEN_UNPLUG_IGNORE)))
> + xen_platform_pci_enabled = 1;
> + /* Set the default value of xen_emul_unplug depending on whether or
> + * not the Xen PV frontends and the Xen platform PCI driver have
> + * been compiled for this kernel (modules or built-in are both OK). */
> + if (xen_platform_pci_enabled && !xen_emul_unplug) {
> + if (xen_must_unplug_nics()) {
> + printk(KERN_INFO "Netfront and the Xen platform PCI driver have "
> + "been compiled for this kernel: unplug emulated NICs.\n");
> + xen_emul_unplug |= XEN_UNPLUG_ALL_NICS;
> + }
> + if (xen_must_unplug_disks()) {
> + printk(KERN_INFO "Blkfront and the Xen platform PCI driver have "
> + "been compiled for this kernel: unplug emulated disks.\n"
> + "You might have to change the root device\n"
> + "from /dev/hd[a-d] to /dev/xvd[a-d]\n"
> + "in your root= kernel command line option\n");
> + xen_emul_unplug |= XEN_UNPLUG_ALL_IDE_DISKS;
> + }
> + }
> + /* Now unplug the emulated devices */
> + if (xen_platform_pci_enabled && !(xen_emul_unplug & XEN_UNPLUG_IGNORE))
> + outw(xen_emul_unplug, XEN_IOPORT_UNPLUG);
> +}
> +
> +static int __init parse_xen_emul_unplug(char *arg)
> +{
> + char *p, *q;
> + int l;
> +
> + for (p = arg; p; p = q) {
> + q = strchr(p, ',');
> + if (q) {
> + l = q - p;
> + q++;
> + } else {
> + l = strlen(p);
> + }
> + if (!strncmp(p, "all", l))
> + xen_emul_unplug |= XEN_UNPLUG_ALL;
> + else if (!strncmp(p, "ide-disks", l))
> + xen_emul_unplug |= XEN_UNPLUG_ALL_IDE_DISKS;
> + else if (!strncmp(p, "aux-ide-disks", l))
> + xen_emul_unplug |= XEN_UNPLUG_AUX_IDE_DISKS;
> + else if (!strncmp(p, "nics", l))
> + xen_emul_unplug |= XEN_UNPLUG_ALL_NICS;
> + else if (!strncmp(p, "ignore", l))
> + xen_emul_unplug |= XEN_UNPLUG_IGNORE;
> + else
> + printk(KERN_WARNING "unrecognised option '%s' "
> + "in module parameter 'dev_unplug'\n", p);
> + }
> + return 0;
> +}
> +early_param("xen_emul_unplug", parse_xen_emul_unplug);
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 089d189..ed77694 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -40,6 +40,7 @@ void xen_vcpu_restore(void);
>
> void xen_callback_vector(void);
> void xen_hvm_init_shared_info(void);
> +void __init xen_unplug_emulated_devices(void);
>
> void __init xen_build_dynamic_phys_to_machine(void);
>
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 10b92ec..35f162d 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -27,6 +27,7 @@
> #include <linux/module.h>
> #include <linux/pci.h>
>
> +#include <xen/platform_pci.h>
> #include <xen/grant_table.h>
> #include <xen/xenbus.h>
> #include <xen/events.h>
> @@ -195,6 +196,9 @@ static int __init platform_pci_module_init(void)
> {
> int rc;
>
> + if (!xen_platform_pci_enabled)
> + return -ENODEV;
> +

The problem with this check/enable is that if you run
this on an older qemu-xen that doesn't have unplug support,
it fails pv-hvm configuration.

But, all that means is that you can't use an xvd as the boot device,
and you have to use an emulated IDE device as boot device.
There are a couple ways to configure the vnif correctly (in guest
or in xen guest config file).

So, on an older (say, rhel5) xen, I don't have this check;
the boot device is required to be spec'd as hda, not vda, and
xen-blkfront is not allowed to configure blk major nums
for IDE (& SCSI) (to avoid 2 drivers twiddling w/same phys backend... not good!).

When that older qemu is updated, this restriction will be lifted
(in the guest kernel & related xen guest config file spec).

> rc = pci_register_driver(&platform_driver);
> if (rc) {
> printk(KERN_INFO DRV_NAME
> diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
> index 3d941ec..243279a 100644
> --- a/drivers/xen/xenbus/xenbus_probe.c
> +++ b/drivers/xen/xenbus/xenbus_probe.c
> @@ -56,6 +56,7 @@
> #include <xen/events.h>
> #include <xen/page.h>
>
> +#include <xen/platform_pci.h>
> #include <xen/hvm.h>
>
> #include "xenbus_comms.h"
> @@ -977,6 +978,9 @@ static void wait_for_devices(struct xenbus_driver *xendrv)
> #ifndef MODULE
> static int __init boot_wait_for_devices(void)
> {
> + if (xen_hvm_domain() && !xen_platform_pci_enabled)
> + return -ENODEV;
> +

This check also fails on qemu-xen that doesn't support
ide-unplug but pv-hvm is configured in (aka, on rhel5 today).
The flag should be something different.


> ready_to_wait_for_devices = 1;
> wait_for_devices(NULL);
> return 0;
> diff --git a/include/xen/platform_pci.h b/include/xen/platform_pci.h
> new file mode 100644
> index 0000000..afa8855
> --- /dev/null
> +++ b/include/xen/platform_pci.h
> @@ -0,0 +1,49 @@
> +#ifndef _XEN_PLATFORM_PCI_H
> +#define _XEN_PLATFORM_PCI_H
> +
> +#define XEN_IOPORT_MAGIC_VAL 0x49d2
> +#define XEN_IOPORT_LINUX_PRODNUM 0x0003
> +#define XEN_IOPORT_LINUX_DRVVER 0x0001
> +
> +#define XEN_IOPORT_BASE 0x10
> +
> +#define XEN_IOPORT_PLATFLAGS (XEN_IOPORT_BASE + 0) /* 1 byte access (R/W) */
> +#define XEN_IOPORT_MAGIC (XEN_IOPORT_BASE + 0) /* 2 byte access (R) */
> +#define XEN_IOPORT_UNPLUG (XEN_IOPORT_BASE + 0) /* 2 byte access (W) */
> +#define XEN_IOPORT_DRVVER (XEN_IOPORT_BASE + 0) /* 4 byte access (W) */
> +
> +#define XEN_IOPORT_SYSLOG (XEN_IOPORT_BASE + 2) /* 1 byte access (W) */
> +#define XEN_IOPORT_PROTOVER (XEN_IOPORT_BASE + 2) /* 1 byte access (R) */
> +#define XEN_IOPORT_PRODNUM (XEN_IOPORT_BASE + 2) /* 2 byte access (W) */
> +
> +#define XEN_UNPLUG_ALL_IDE_DISKS 1
> +#define XEN_UNPLUG_ALL_NICS 2
> +#define XEN_UNPLUG_AUX_IDE_DISKS 4
> +#define XEN_UNPLUG_ALL 7
> +#define XEN_UNPLUG_IGNORE 8
> +
> +static inline int xen_must_unplug_nics(void) {
> +#if (defined(CONFIG_XEN_NETDEV_FRONTEND) || \
> + defined(CONFIG_XEN_NETDEV_FRONTEND_MODULE)) && \
> + (defined(CONFIG_XEN_PLATFORM_PCI) || \
> + defined(CONFIG_XEN_PLATFORM_PCI_MODULE))
> + return 1;
> +#else
> + return 0;
> +#endif
> +}
> +
> +static inline int xen_must_unplug_disks(void) {
> +#if (defined(CONFIG_XEN_BLKDEV_FRONTEND) || \
> + defined(CONFIG_XEN_BLKDEV_FRONTEND_MODULE)) && \
> + (defined(CONFIG_XEN_PLATFORM_PCI) || \
> + defined(CONFIG_XEN_PLATFORM_PCI_MODULE))
> + return 1;
> +#else
> + return 0;
> +#endif
> +}
> +
> +extern bool xen_platform_pci_enabled;
> +
> +#endif /* _XEN_PLATFORM_PCI_H */

2010-07-01 19:42:25

by Donald Dutile

[permalink] [raw]
Subject: Re: [PATCH 11/13] Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.

[email protected] wrote:
> From: Stefano Stabellini <[email protected]>
>
> Use xen_vcpuop_clockevent instead of hpet and APIC timers as main
> clockevent device on all vcpus, use the xen wallclock time as wallclock
> instead of rtc and use xen_clocksource as clocksource.
> The pv clock algorithm needs to work correctly for the xen_clocksource
> and xen wallclock to be usable, only modern Xen versions offer a
> reliable pv clock in HVM guests (XENFEAT_hvm_safe_pvclock).
>
> Using the hpet as clocksource means a VMEXIT every time we read/write to
> the hpet mmio addresses, pvclock give us a better rating without
> VMEXITs. Same goes for the xen wallclock and xen_vcpuop_clockevent
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> arch/x86/xen/enlighten.c | 14 +--------
> arch/x86/xen/suspend.c | 4 ++
> arch/x86/xen/time.c | 59 ++++++++++++++++++++++++++++++++++---
> arch/x86/xen/xen-ops.h | 7 +---
> include/xen/interface/features.h | 3 ++
> 5 files changed, 65 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 90bac21..bcd98a9 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -933,10 +933,6 @@ static const struct pv_init_ops xen_init_ops __initdata = {
> .patch = xen_patch,
> };
>
> -static const struct pv_time_ops xen_time_ops __initdata = {
> - .sched_clock = xen_sched_clock,
> -};
> -
> static const struct pv_cpu_ops xen_cpu_ops __initdata = {
> .cpuid = xen_cpuid,
>
> @@ -1074,7 +1070,6 @@ asmlinkage void __init xen_start_kernel(void)
> /* Install Xen paravirt ops */
> pv_info = xen_info;
> pv_init_ops = xen_init_ops;
> - pv_time_ops = xen_time_ops;
> pv_cpu_ops = xen_cpu_ops;
> pv_apic_ops = xen_apic_ops;
>
> @@ -1082,13 +1077,7 @@ asmlinkage void __init xen_start_kernel(void)
> x86_init.oem.arch_setup = xen_arch_setup;
> x86_init.oem.banner = xen_banner;
>
> - x86_init.timers.timer_init = xen_time_init;
> - x86_init.timers.setup_percpu_clockev = x86_init_noop;
> - x86_cpuinit.setup_percpu_clockev = x86_init_noop;
> -
> - x86_platform.calibrate_tsc = xen_tsc_khz;
> - x86_platform.get_wallclock = xen_get_wallclock;
> - x86_platform.set_wallclock = xen_set_wallclock;
> + xen_init_time_ops();
>
> /*
> * Set up some pagetable state before starting to set any ptes.
> @@ -1330,4 +1319,5 @@ void __init xen_hvm_guest_init(void)
> register_cpu_notifier(&xen_hvm_cpu_notifier);
> have_vcpu_info_placement = 0;
> x86_init.irqs.intr_init = xen_init_IRQ;
> + xen_hvm_init_time_ops();
> }
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index 6ff9665..0774c67 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -28,8 +28,12 @@ void xen_pre_suspend(void)
>
> void xen_hvm_post_suspend(int suspend_cancelled)
> {
> + int cpu;
> xen_hvm_init_shared_info();
> xen_callback_vector();
> + for_each_online_cpu(cpu) {
> + xen_setup_runstate_info(cpu);
> + }
> }
>

As I found out on older xen (non 'modern Xen version' ;-) )
w/o XENFEAT_hvm_safe_pvclock, the above patch should look like:

if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
int cpu;
for_each_online_cpu(cpu) {
xen_setup_runstate_info(cpu);
}
}

- Don

> void xen_post_suspend(int suspend_cancelled)
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 32764b8..2d76c28 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -20,6 +20,7 @@
> #include <asm/xen/hypercall.h>
>
> #include <xen/events.h>
> +#include <xen/features.h>
> #include <xen/interface/xen.h>
> #include <xen/interface/vcpu.h>
>
> @@ -160,7 +161,7 @@ static void do_stolen_accounting(void)
> * nanoseconds, which is nanoseconds the VCPU spent in RUNNING+BLOCKED
> * states.
> */
> -unsigned long long xen_sched_clock(void)
> +static unsigned long long xen_sched_clock(void)
> {
> struct vcpu_runstate_info state;
> cycle_t now;
> @@ -195,7 +196,7 @@ unsigned long long xen_sched_clock(void)
>
>
> /* Get the TSC speed from Xen */
> -unsigned long xen_tsc_khz(void)
> +static unsigned long xen_tsc_khz(void)
> {
> struct pvclock_vcpu_time_info *info =
> &HYPERVISOR_shared_info->vcpu_info[0].time;
> @@ -230,7 +231,7 @@ static void xen_read_wallclock(struct timespec *ts)
> put_cpu_var(xen_vcpu);
> }
>
> -unsigned long xen_get_wallclock(void)
> +static unsigned long xen_get_wallclock(void)
> {
> struct timespec ts;
>
> @@ -238,7 +239,7 @@ unsigned long xen_get_wallclock(void)
> return ts.tv_sec;
> }
>
> -int xen_set_wallclock(unsigned long now)
> +static int xen_set_wallclock(unsigned long now)
> {
> /* do nothing for domU */
> return -1;
> @@ -473,7 +474,11 @@ void xen_timer_resume(void)
> }
> }
>
> -__init void xen_time_init(void)
> +static const struct pv_time_ops xen_time_ops __initdata = {
> + .sched_clock = xen_sched_clock,
> +};
> +
> +static __init void xen_time_init(void)
> {
> int cpu = smp_processor_id();
>
> @@ -497,3 +502,47 @@ __init void xen_time_init(void)
> xen_setup_timer(cpu);
> xen_setup_cpu_clockevents();
> }
> +
> +__init void xen_init_time_ops(void)
> +{
> + pv_time_ops = xen_time_ops;
> +
> + x86_init.timers.timer_init = xen_time_init;
> + x86_init.timers.setup_percpu_clockev = x86_init_noop;
> + x86_cpuinit.setup_percpu_clockev = x86_init_noop;
> +
> + x86_platform.calibrate_tsc = xen_tsc_khz;
> + x86_platform.get_wallclock = xen_get_wallclock;
> + x86_platform.set_wallclock = xen_set_wallclock;
> +}
> +
> +static void xen_hvm_setup_cpu_clockevents(void)
> +{
> + int cpu = smp_processor_id();
> + xen_setup_runstate_info(cpu);
> + xen_setup_timer(cpu);
> + xen_setup_cpu_clockevents();
> +}
> +
> +__init void xen_hvm_init_time_ops(void)
> +{
> + /* vector callback is needed otherwise we cannot receive interrupts
> + * on cpu > 0 */
> + if (!xen_have_vector_callback && num_present_cpus() > 1)
> + return;
> + if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
> + printk(KERN_INFO "Xen doesn't support pvclock on HVM,"
> + "disable pv timer\n");
> + return;
> + }
> +
> + pv_time_ops = xen_time_ops;
> + x86_init.timers.timer_init = xen_time_init;
> + x86_init.timers.setup_percpu_clockev = x86_init_noop;
> + x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
> +
> + x86_platform.calibrate_tsc = xen_tsc_khz;
> + x86_platform.get_wallclock = xen_get_wallclock;
> + x86_platform.set_wallclock = xen_set_wallclock;
> +}
> +
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 01c9dd3..089d189 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -49,11 +49,8 @@ void xen_setup_runstate_info(int cpu);
> void xen_teardown_timer(int cpu);
> cycle_t xen_clocksource_read(void);
> void xen_setup_cpu_clockevents(void);
> -unsigned long xen_tsc_khz(void);
> -void __init xen_time_init(void);
> -unsigned long xen_get_wallclock(void);
> -int xen_set_wallclock(unsigned long time);
> -unsigned long long xen_sched_clock(void);
> +void __init xen_init_time_ops(void);
> +void __init xen_hvm_init_time_ops(void);
>
> irqreturn_t xen_debug_interrupt(int irq, void *dev_id);
>
> diff --git a/include/xen/interface/features.h b/include/xen/interface/features.h
> index 8ab08b9..70d2563 100644
> --- a/include/xen/interface/features.h
> +++ b/include/xen/interface/features.h
> @@ -44,6 +44,9 @@
> /* x86: Does this Xen host support the HVM callback vector type? */
> #define XENFEAT_hvm_callback_vector 8
>
> +/* x86: pvclock algorithm is safe to use on HVM */
> +#define XENFEAT_hvm_safe_pvclock 9
> +
> #define XENFEAT_NR_SUBMAPS 1
>
> #endif /* __XEN_PUBLIC_FEATURES_H__ */

2010-07-01 19:44:57

by Dan Magenheimer

[permalink] [raw]
Subject: RE: [Xen-devel] Re: [PATCH 0/12] PV on HVM Xen

Hi Stefano --

What versions of Xen are compatible with this series?
I.e. is 4.0 required? Are all Xen-side changes in
official 4.0.0? Or only in 4.0-testing (and therefore
will be in 4.0.1)? Or only in xen-unstable?

If post-4.0.0 xen changes are necessary, could you point
out the changesets?

Thanks,
Dan

P.S. Sorry if I missed earlier discussion or documentation
on this... but if not otherwise documented, maybe the
info should be in the patch comment?

> -----Original Message-----
> From: Stefano Stabellini [mailto:[email protected]]
> Sent: Thursday, July 01, 2010 5:39 AM
> To: Konrad Rzeszutek Wilk
> Cc: Fitzhardinge; [email protected];
> [email protected]; Stefano Stabellini; linux-
> [email protected]; Don Dutile; Sheng Yang
> Subject: [Xen-devel] Re: [PATCH 0/12] PV on HVM Xen
>
> On Wed, 30 Jun 2010, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jun 21, 2010 at 05:12:20PM +0100, Stefano Stabellini wrote:
> > > Hi all,
> > > this is another update on the PV on HVM Xen series; a list of
> changes
> > > compared to the previous version follows:
> >
> > Can you use the [v4] in the main patch (this is the fourth version of
> > these patches, right?)? I am getting confused as to which version I
> am
> > reviewing and if I've missed one or what not.
> >
> > And in this first intro email state:
> >
> > Since v7 the following things have changed:
> > - Konrad made some comments.
> > - etc.
> >
>
> Good point, I'll do that from now on.
> I am pretty sure that you are reviewing the latest version I sent.
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> http://lists.xensource.com/xen-devel

2010-07-02 10:43:38

by Stefano Stabellini

[permalink] [raw]
Subject: RE: [Xen-devel] Re: [PATCH 0/12] PV on HVM Xen

On Thu, 1 Jul 2010, Dan Magenheimer wrote:
> Hi Stefano --
>
> What versions of Xen are compatible with this series?
> I.e. is 4.0 required? Are all Xen-side changes in
> official 4.0.0? Or only in 4.0-testing (and therefore
> will be in 4.0.1)? Or only in xen-unstable?
>
> If post-4.0.0 xen changes are necessary, could you point
> out the changesets?
>


Any Xen version is going to work, but you need unstable to get all the
features.

In particular you need these patches:

- 21647: implement HVMOP_pagetable_dying

- 21493: update_runstate_area for 32 bit PV on HVM guests

- 21449: implement vector callback for evtchn delivery

- 21445: TSC handling cleanups (version 2)

- 21341: Export timer hypercalls to HVM guests too

- 21339: TSC handling cleanups

2010-07-02 10:45:01

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 10/13] Do not try to disable hpet if it hasn't been initialized before

On 06/21/2010 06:14 PM, [email protected] wrote:
> - if (is_hpet_capable()) {
> - unsigned int cfg = hpet_readl(HPET_CFG);
> + unsigned int cfg;

You changed this earlier from unsigned long to unsigned int, but it is
wrong (though it works because bits 32-63 of the HPET_CFG register are
unused). Can you please make a note to change it back sometime?

Thanks,

Paolo

2010-07-02 16:09:26

by Dan Magenheimer

[permalink] [raw]
Subject: RE: [Xen-devel] Re: [PATCH 0/12] PV on HVM Xen

> From: Stefano Stabellini [mailto:[email protected]]
>
> On Thu, 1 Jul 2010, Dan Magenheimer wrote:
> > Hi Stefano --
> >
> > What versions of Xen are compatible with this series?
> > I.e. is 4.0 required? Are all Xen-side changes in
> > official 4.0.0? Or only in 4.0-testing (and therefore
> > will be in 4.0.1)? Or only in xen-unstable?
> >
> > If post-4.0.0 xen changes are necessary, could you point
> > out the changesets?
>
> Any Xen version is going to work, but you need unstable to get all the
> features.
>
> In particular you need these patches:
>
> - 21647: implement HVMOP_pagetable_dying
>
> - 21493: update_runstate_area for 32 bit PV on HVM guests
>
> - 21449: implement vector callback for evtchn delivery
>
> - 21445: TSC handling cleanups (version 2)
>
> - 21341: Export timer hypercalls to HVM guests too
>
> - 21339: TSC handling cleanups

Thanks for the thorough reply! Any reason that any of these
cannot or should not be back-ported to go into 4.0.1?

Thanks,
Dan

2010-07-02 17:14:14

by Stefano Stabellini

[permalink] [raw]
Subject: RE: [Xen-devel] Re: [PATCH 0/12] PV on HVM Xen

On Fri, 2 Jul 2010, Dan Magenheimer wrote:
> > From: Stefano Stabellini [mailto:[email protected]]
> >
> > On Thu, 1 Jul 2010, Dan Magenheimer wrote:
> > > Hi Stefano --
> > >
> > > What versions of Xen are compatible with this series?
> > > I.e. is 4.0 required? Are all Xen-side changes in
> > > official 4.0.0? Or only in 4.0-testing (and therefore
> > > will be in 4.0.1)? Or only in xen-unstable?
> > >
> > > If post-4.0.0 xen changes are necessary, could you point
> > > out the changesets?
> >
> > Any Xen version is going to work, but you need unstable to get all the
> > features.
> >
> > In particular you need these patches:
> >
> > - 21647: implement HVMOP_pagetable_dying
> >
> > - 21493: update_runstate_area for 32 bit PV on HVM guests
> >
> > - 21449: implement vector callback for evtchn delivery
> >
> > - 21445: TSC handling cleanups (version 2)
> >
> > - 21341: Export timer hypercalls to HVM guests too
> >
> > - 21339: TSC handling cleanups
>
> Thanks for the thorough reply! Any reason that any of these
> cannot or should not be back-ported to go into 4.0.1?
>

They should be safe to backport, the only ones we should be extra
careful are the TSC related patches, and you were the one to raise that
complain :)
If you have some spare time to test 21445 and 21339 on 4.0.1 to make
sure that everything keeps working as expected, then we can safely
apply them to xen 4.0 testing.

2010-07-02 17:17:53

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH 11/13] Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.

On Thu, Jul 1, 2010 at 8:41 PM, Don Dutile <[email protected]> wrote:
> [email protected] wrote:
>> From: Stefano Stabellini <[email protected]>
>>
>> Use xen_vcpuop_clockevent instead of hpet and APIC timers as main
>> clockevent device on all vcpus, use the xen wallclock time as wallclock
>> instead of rtc and use xen_clocksource as clocksource.
>> The pv clock algorithm needs to work correctly for the xen_clocksource
>> and xen wallclock to be usable, only modern Xen versions offer a
>> reliable pv clock in HVM guests (XENFEAT_hvm_safe_pvclock).
>>
>> Using the hpet as clocksource means a VMEXIT every time we read/write to
>> the hpet mmio addresses, pvclock give us a better rating without
>> VMEXITs. Same goes for the xen wallclock and xen_vcpuop_clockevent
>>
>> Signed-off-by: Stefano Stabellini <[email protected]>
>> ---
>> ?arch/x86/xen/enlighten.c ? ? ? ? | ? 14 +--------
>> ?arch/x86/xen/suspend.c ? ? ? ? ? | ? ?4 ++
>> ?arch/x86/xen/time.c ? ? ? ? ? ? ?| ? 59 ++++++++++++++++++++++++++++++++++---
>> ?arch/x86/xen/xen-ops.h ? ? ? ? ? | ? ?7 +---
>> ?include/xen/interface/features.h | ? ?3 ++
>> ?5 files changed, 65 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 90bac21..bcd98a9 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -933,10 +933,6 @@ static const struct pv_init_ops xen_init_ops __initdata = {
>> ? ? ? .patch = xen_patch,
>> ?};
>>
>> -static const struct pv_time_ops xen_time_ops __initdata = {
>> - ? ? .sched_clock = xen_sched_clock,
>> -};
>> -
>> ?static const struct pv_cpu_ops xen_cpu_ops __initdata = {
>> ? ? ? .cpuid = xen_cpuid,
>>
>> @@ -1074,7 +1070,6 @@ asmlinkage void __init xen_start_kernel(void)
>> ? ? ? /* Install Xen paravirt ops */
>> ? ? ? pv_info = xen_info;
>> ? ? ? pv_init_ops = xen_init_ops;
>> - ? ? pv_time_ops = xen_time_ops;
>> ? ? ? pv_cpu_ops = xen_cpu_ops;
>> ? ? ? pv_apic_ops = xen_apic_ops;
>>
>> @@ -1082,13 +1077,7 @@ asmlinkage void __init xen_start_kernel(void)
>> ? ? ? x86_init.oem.arch_setup = xen_arch_setup;
>> ? ? ? x86_init.oem.banner = xen_banner;
>>
>> - ? ? x86_init.timers.timer_init = xen_time_init;
>> - ? ? x86_init.timers.setup_percpu_clockev = x86_init_noop;
>> - ? ? x86_cpuinit.setup_percpu_clockev = x86_init_noop;
>> -
>> - ? ? x86_platform.calibrate_tsc = xen_tsc_khz;
>> - ? ? x86_platform.get_wallclock = xen_get_wallclock;
>> - ? ? x86_platform.set_wallclock = xen_set_wallclock;
>> + ? ? xen_init_time_ops();
>>
>> ? ? ? /*
>> ? ? ? ?* Set up some pagetable state before starting to set any ptes.
>> @@ -1330,4 +1319,5 @@ void __init xen_hvm_guest_init(void)
>> ? ? ? register_cpu_notifier(&xen_hvm_cpu_notifier);
>> ? ? ? have_vcpu_info_placement = 0;
>> ? ? ? x86_init.irqs.intr_init = xen_init_IRQ;
>> + ? ? xen_hvm_init_time_ops();
>> ?}
>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>> index 6ff9665..0774c67 100644
>> --- a/arch/x86/xen/suspend.c
>> +++ b/arch/x86/xen/suspend.c
>> @@ -28,8 +28,12 @@ void xen_pre_suspend(void)
>>
>> ?void xen_hvm_post_suspend(int suspend_cancelled)
>> ?{
>> + ? ? int cpu;
>> ? ? ? xen_hvm_init_shared_info();
>> ? ? ? xen_callback_vector();
>> + ? ? for_each_online_cpu(cpu) {
>> + ? ? ? ? ? ? xen_setup_runstate_info(cpu);
>> + ? ? }
>> ?}
>>
>
> As I found out on older xen (non 'modern Xen version' ;-) )
> w/o XENFEAT_hvm_safe_pvclock, the above patch should look like:
>
> ? ? ? ?if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> ? ? ? ? ? ? ? ?int cpu;
> ? ? ? ? ? ? ? ?for_each_online_cpu(cpu) {
> ? ? ? ? ? ? ? ? ? ? ? ?xen_setup_runstate_info(cpu);
> ? ? ? ? ? ? ? ?}
> ? ? ? ?}
>

Thank you for testing my series and for the patch, I'll include it in
the next version.

2010-07-02 19:51:16

by Dan Magenheimer

[permalink] [raw]
Subject: RE: [Xen-devel] Re: [PATCH 0/12] PV on HVM Xen

> From: Stefano Stabellini [mailto:[email protected]]
> >
> > Thanks for the thorough reply! Any reason that any of these
> > cannot or should not be back-ported to go into 4.0.1?
>
> They should be safe to backport, the only ones we should be extra
> careful are the TSC related patches, and you were the one to raise that
> complain :)
> If you have some spare time to test 21445 and 21339 on 4.0.1 to make
> sure that everything keeps working as expected, then we can safely
> apply them to xen 4.0 testing.

Thanks, yes, I was putting that task on my list and wasn't sure
what the full set of required changes were and if there were
interdependencies so that I could test on 4.0-testing instead
of on xen-unstable. (Depending on Keir's 4.0.1 schedule, I
might not get it done in time for 4.0.1 though.)

Dan

2010-07-05 11:58:33

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH 12/13] Unplug emulated disks and nics

On Thu, 1 Jul 2010, Don Dutile wrote:
> The problem with this check/enable is that if you run
> this on an older qemu-xen that doesn't have unplug support,
> it fails pv-hvm configuration.
>
> But, all that means is that you can't use an xvd as the boot device,
> and you have to use an emulated IDE device as boot device.
> There are a couple ways to configure the vnif correctly (in guest
> or in xen guest config file).
>
> So, on an older (say, rhel5) xen, I don't have this check;
> the boot device is required to be spec'd as hda, not vda, and
> xen-blkfront is not allowed to configure blk major nums
> for IDE (& SCSI) (to avoid 2 drivers twiddling w/same phys backend... not good!).
>

Who is requiring that the boot device is spec'd as hda and not xvda?
I don't think there is such limitation in xend or libxl at the moment.

2010-07-07 20:02:47

by Donald Dutile

[permalink] [raw]
Subject: Re: [PATCH 12/13] Unplug emulated disks and nics

Stefano Stabellini wrote:
> On Thu, 1 Jul 2010, Don Dutile wrote:
>> The problem with this check/enable is that if you run
>> this on an older qemu-xen that doesn't have unplug support,
>> it fails pv-hvm configuration.
>>
>> But, all that means is that you can't use an xvd as the boot device,
>> and you have to use an emulated IDE device as boot device.
>> There are a couple ways to configure the vnif correctly (in guest
>> or in xen guest config file).
>>
>> So, on an older (say, rhel5) xen, I don't have this check;
>> the boot device is required to be spec'd as hda, not vda, and
>> xen-blkfront is not allowed to configure blk major nums
>> for IDE (& SCSI) (to avoid 2 drivers twiddling w/same phys backend... not good!).
>>
>
> Who is requiring that the boot device is spec'd as hda and not xvda?
> I don't think there is such limitation in xend or libxl at the moment.
>

If you take a previous xen HVM guest spec & just run it on a guest
that has pv-hvm added to it, then one has hda spec'd as boot device (by default,
by not editing the guest config spec/file).

Ideally, both config specs should/would work.

- Don

2010-07-08 13:12:54

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH 12/13] Unplug emulated disks and nics

On Wed, 7 Jul 2010, Don Dutile wrote:
> Stefano Stabellini wrote:
> > On Thu, 1 Jul 2010, Don Dutile wrote:
> >> The problem with this check/enable is that if you run
> >> this on an older qemu-xen that doesn't have unplug support,
> >> it fails pv-hvm configuration.
> >>
> >> But, all that means is that you can't use an xvd as the boot device,
> >> and you have to use an emulated IDE device as boot device.
> >> There are a couple ways to configure the vnif correctly (in guest
> >> or in xen guest config file).
> >>
> >> So, on an older (say, rhel5) xen, I don't have this check;
> >> the boot device is required to be spec'd as hda, not vda, and
> >> xen-blkfront is not allowed to configure blk major nums
> >> for IDE (& SCSI) (to avoid 2 drivers twiddling w/same phys backend... not good!).
> >>
> >
> > Who is requiring that the boot device is spec'd as hda and not xvda?
> > I don't think there is such limitation in xend or libxl at the moment.
> >
>
> If you take a previous xen HVM guest spec & just run it on a guest
> that has pv-hvm added to it, then one has hda spec'd as boot device (by default,
> by not editing the guest config spec/file).
>
> Ideally, both config specs should/would work.

I wouldn't want to cause data corruptions by default to people that
specified xvda in their HVM config files by mistake (for example because
they copied and pasted from a PV guest config file).
But I agree that we should be able to do the right thing in your case
scenario too.

I propose the appended patch (to be merge with "Unplug emulated disks
and nics"): if the user specifies xen_emul_unplug=ignore and the unplug
protocol is not supported (old xen installations like rhel5), we
continue with the PV on HVM initialization and we make sure that
blkfront doesn't hook any IDE or SCSI device.

Don, Jeremy, what do you think about it?


---


diff --git a/arch/x86/xen/platform-pci-unplug.c b/arch/x86/xen/platform-pci-unplug.c
index 72a3da6..2f7f3fb 100644
--- a/arch/x86/xen/platform-pci-unplug.c
+++ b/arch/x86/xen/platform-pci-unplug.c
@@ -29,9 +29,9 @@
#define XEN_PLATFORM_ERR_PROTOCOL -2
#define XEN_PLATFORM_ERR_BLACKLIST -3

-/* boolean to signal that the platform pci device can be used */
-bool xen_platform_pci_enabled;
-EXPORT_SYMBOL_GPL(xen_platform_pci_enabled);
+/* store the value of xen_emul_unplug after the unplug is done */
+int xen_platform_pci_unplug;
+EXPORT_SYMBOL_GPL(xen_platform_pci_unplug);
static int xen_emul_unplug;

static int __init check_platform_magic(void)
@@ -76,13 +76,13 @@ void __init xen_unplug_emulated_devices(void)
/* If the version matches enable the Xen platform PCI driver.
* Also enable the Xen platform PCI driver if the version is really old
* and the user told us to ignore it. */
- if (!r || (r == XEN_PLATFORM_ERR_MAGIC &&
- (xen_emul_unplug & XEN_UNPLUG_IGNORE)))
- xen_platform_pci_enabled = 1;
+ if (r && !(r == XEN_PLATFORM_ERR_MAGIC &&
+ (xen_emul_unplug & XEN_UNPLUG_IGNORE)))
+ return;
/* Set the default value of xen_emul_unplug depending on whether or
* not the Xen PV frontends and the Xen platform PCI driver have
* been compiled for this kernel (modules or built-in are both OK). */
- if (xen_platform_pci_enabled && !xen_emul_unplug) {
+ if (!xen_emul_unplug) {
if (xen_must_unplug_nics()) {
printk(KERN_INFO "Netfront and the Xen platform PCI driver have "
"been compiled for this kernel: unplug emulated NICs.\n");
@@ -98,8 +98,9 @@ void __init xen_unplug_emulated_devices(void)
}
}
/* Now unplug the emulated devices */
- if (xen_platform_pci_enabled && !(xen_emul_unplug & XEN_UNPLUG_IGNORE))
+ if (!(xen_emul_unplug & XEN_UNPLUG_IGNORE))
outw(xen_emul_unplug, XEN_IOPORT_UNPLUG);
+ xen_platform_pci_unplug = xen_emul_unplug;
}

static int __init parse_xen_emul_unplug(char *arg)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 82ed403..6eb2989 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -48,6 +48,7 @@
#include <xen/grant_table.h>
#include <xen/events.h>
#include <xen/page.h>
+#include <xen/platform_pci.h>

#include <xen/interface/grant_table.h>
#include <xen/interface/io/blkif.h>
@@ -737,6 +738,22 @@ static int blkfront_probe(struct xenbus_device *dev,
}
}

+ /* no unplug has been done: do not hook devices != xen vbds */
+ if (xen_hvm_domain() && (xen_platform_pci_unplug & XEN_UNPLUG_IGNORE)) {
+ int major;
+
+ if (!VDEV_IS_EXTENDED(vdevice))
+ major = BLKIF_MAJOR(vdevice);
+ else
+ major = XENVBD_MAJOR;
+
+ if (major != XENVBD_MAJOR) {
+ printk(KERN_INFO
+ "%s: HVM does not support vbd %d as xen block device\n",
+ __FUNCTION__, vdevice);
+ return -ENODEV;
+ }
+ }
info = kzalloc(sizeof(*info), GFP_KERNEL);
if (!info) {
xenbus_dev_fatal(dev, -ENOMEM, "allocating info structure");
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index be8b4f3..c01b5dd 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -196,7 +196,9 @@ static struct pci_driver platform_driver = {

static int __init platform_pci_module_init(void)
{
- if (!xen_platform_pci_enabled)
+ /* no unplug has been done, IGNORE hasn't been specified: just
+ * return now */
+ if (!xen_platform_pci_unplug)
return -ENODEV;

return pci_register_driver(&platform_driver);
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 243279a..34287da 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -978,7 +978,7 @@ static void wait_for_devices(struct xenbus_driver *xendrv)
#ifndef MODULE
static int __init boot_wait_for_devices(void)
{
- if (xen_hvm_domain() && !xen_platform_pci_enabled)
+ if (xen_hvm_domain() && !xen_platform_pci_unplug)
return -ENODEV;

ready_to_wait_for_devices = 1;
diff --git a/include/xen/platform_pci.h b/include/xen/platform_pci.h
index afa8855..ce9d671 100644
--- a/include/xen/platform_pci.h
+++ b/include/xen/platform_pci.h
@@ -44,6 +44,6 @@ static inline int xen_must_unplug_disks(void) {
#endif
}

-extern bool xen_platform_pci_enabled;
+extern int xen_platform_pci_unplug;

#endif /* _XEN_PLATFORM_PCI_H */

2010-07-08 19:59:10

by Donald Dutile

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 12/13] Unplug emulated disks and nics

Stefano Stabellini wrote:
> On Wed, 7 Jul 2010, Don Dutile wrote:
>> Stefano Stabellini wrote:
>>> On Thu, 1 Jul 2010, Don Dutile wrote:
>>>> The problem with this check/enable is that if you run
>>>> this on an older qemu-xen that doesn't have unplug support,
>>>> it fails pv-hvm configuration.
>>>>
>>>> But, all that means is that you can't use an xvd as the boot device,
>>>> and you have to use an emulated IDE device as boot device.
>>>> There are a couple ways to configure the vnif correctly (in guest
>>>> or in xen guest config file).
>>>>
>>>> So, on an older (say, rhel5) xen, I don't have this check;
>>>> the boot device is required to be spec'd as hda, not vda, and
>>>> xen-blkfront is not allowed to configure blk major nums
>>>> for IDE (& SCSI) (to avoid 2 drivers twiddling w/same phys backend... not good!).
>>>>
>>> Who is requiring that the boot device is spec'd as hda and not xvda?
>>> I don't think there is such limitation in xend or libxl at the moment.
>>>
>> If you take a previous xen HVM guest spec & just run it on a guest
>> that has pv-hvm added to it, then one has hda spec'd as boot device (by default,
>> by not editing the guest config spec/file).
>>
>> Ideally, both config specs should/would work.
>
> I wouldn't want to cause data corruptions by default to people that
> specified xvda in their HVM config files by mistake (for example because
> they copied and pasted from a PV guest config file).
> But I agree that we should be able to do the right thing in your case
> scenario too.
>
> I propose the appended patch (to be merge with "Unplug emulated disks
> and nics"): if the user specifies xen_emul_unplug=ignore and the unplug
> protocol is not supported (old xen installations like rhel5), we
> continue with the PV on HVM initialization and we make sure that
> blkfront doesn't hook any IDE or SCSI device.
>
> Don, Jeremy, what do you think about it?
>
>

I guess what I'm wondering is why not set xen_emul_unplug to ignore by
default (static int xen_emul_unplug=XEN_UNPLUG_IGNORE), which handles
the case I mentioned (just take existing guest config file as is, no edits,
pre-pv-hvm added to guest kernel), and if person edits config file to
change boot device to xvda, they would then edit the config to add
-x xen_emul_unplug=[all|ide-disks|...] as well.

overall, i like the direction of the change to the unplug patch;
there'd be minor changes if you adopted the above default, but effectively
what's below would work well on older (rhel5) dom0's/tools.

- Don

> ---
>
>
> diff --git a/arch/x86/xen/platform-pci-unplug.c b/arch/x86/xen/platform-pci-unplug.c
> index 72a3da6..2f7f3fb 100644
> --- a/arch/x86/xen/platform-pci-unplug.c
> +++ b/arch/x86/xen/platform-pci-unplug.c
> @@ -29,9 +29,9 @@
> #define XEN_PLATFORM_ERR_PROTOCOL -2
> #define XEN_PLATFORM_ERR_BLACKLIST -3
>
> -/* boolean to signal that the platform pci device can be used */
> -bool xen_platform_pci_enabled;
> -EXPORT_SYMBOL_GPL(xen_platform_pci_enabled);
> +/* store the value of xen_emul_unplug after the unplug is done */
> +int xen_platform_pci_unplug;
> +EXPORT_SYMBOL_GPL(xen_platform_pci_unplug);
> static int xen_emul_unplug;
>
> static int __init check_platform_magic(void)
> @@ -76,13 +76,13 @@ void __init xen_unplug_emulated_devices(void)
> /* If the version matches enable the Xen platform PCI driver.
> * Also enable the Xen platform PCI driver if the version is really old
> * and the user told us to ignore it. */
> - if (!r || (r == XEN_PLATFORM_ERR_MAGIC &&
> - (xen_emul_unplug & XEN_UNPLUG_IGNORE)))
> - xen_platform_pci_enabled = 1;
> + if (r && !(r == XEN_PLATFORM_ERR_MAGIC &&
> + (xen_emul_unplug & XEN_UNPLUG_IGNORE)))
> + return;

and it should return ??? ^^^^


> /* Set the default value of xen_emul_unplug depending on whether or
> * not the Xen PV frontends and the Xen platform PCI driver have
> * been compiled for this kernel (modules or built-in are both OK). */
> - if (xen_platform_pci_enabled && !xen_emul_unplug) {
> + if (!xen_emul_unplug) {
> if (xen_must_unplug_nics()) {
> printk(KERN_INFO "Netfront and the Xen platform PCI driver have "
> "been compiled for this kernel: unplug emulated NICs.\n");
> @@ -98,8 +98,9 @@ void __init xen_unplug_emulated_devices(void)
> }
> }
> /* Now unplug the emulated devices */
> - if (xen_platform_pci_enabled && !(xen_emul_unplug & XEN_UNPLUG_IGNORE))
> + if (!(xen_emul_unplug & XEN_UNPLUG_IGNORE))
> outw(xen_emul_unplug, XEN_IOPORT_UNPLUG);
> + xen_platform_pci_unplug = xen_emul_unplug;
> }
>
> static int __init parse_xen_emul_unplug(char *arg)
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 82ed403..6eb2989 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -48,6 +48,7 @@
> #include <xen/grant_table.h>
> #include <xen/events.h>
> #include <xen/page.h>
> +#include <xen/platform_pci.h>
>
> #include <xen/interface/grant_table.h>
> #include <xen/interface/io/blkif.h>
> @@ -737,6 +738,22 @@ static int blkfront_probe(struct xenbus_device *dev,
> }
> }
>
> + /* no unplug has been done: do not hook devices != xen vbds */
> + if (xen_hvm_domain() && (xen_platform_pci_unplug & XEN_UNPLUG_IGNORE)) {
> + int major;
> +
> + if (!VDEV_IS_EXTENDED(vdevice))
> + major = BLKIF_MAJOR(vdevice);
> + else
> + major = XENVBD_MAJOR;
> +
> + if (major != XENVBD_MAJOR) {
> + printk(KERN_INFO
> + "%s: HVM does not support vbd %d as xen block device\n",
> + __FUNCTION__, vdevice);
> + return -ENODEV;
> + }
> + }

Close to what I have for rhel5 (except I specifically scanned for ide & scsi blk major);
I like the above better -- xen-blkfront only supports.... xvd's ! :)


> info = kzalloc(sizeof(*info), GFP_KERNEL);
> if (!info) {
> xenbus_dev_fatal(dev, -ENOMEM, "allocating info structure");
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index be8b4f3..c01b5dd 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -196,7 +196,9 @@ static struct pci_driver platform_driver = {
>
> static int __init platform_pci_module_init(void)
> {
> - if (!xen_platform_pci_enabled)
> + /* no unplug has been done, IGNORE hasn't been specified: just
> + * return now */
> + if (!xen_platform_pci_unplug)
> return -ENODEV;
>
> return pci_register_driver(&platform_driver);
> diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
> index 243279a..34287da 100644
> --- a/drivers/xen/xenbus/xenbus_probe.c
> +++ b/drivers/xen/xenbus/xenbus_probe.c
> @@ -978,7 +978,7 @@ static void wait_for_devices(struct xenbus_driver *xendrv)
> #ifndef MODULE
> static int __init boot_wait_for_devices(void)
> {
> - if (xen_hvm_domain() && !xen_platform_pci_enabled)
> + if (xen_hvm_domain() && !xen_platform_pci_unplug)
> return -ENODEV;
>
> ready_to_wait_for_devices = 1;
> diff --git a/include/xen/platform_pci.h b/include/xen/platform_pci.h
> index afa8855..ce9d671 100644
> --- a/include/xen/platform_pci.h
> +++ b/include/xen/platform_pci.h
> @@ -44,6 +44,6 @@ static inline int xen_must_unplug_disks(void) {
> #endif
> }
>
> -extern bool xen_platform_pci_enabled;
> +extern int xen_platform_pci_unplug;
>
> #endif /* _XEN_PLATFORM_PCI_H */
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> http://lists.xensource.com/xen-devel

2010-07-08 21:39:18

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 12/13] Unplug emulated disks and nics

On Thu, 2010-07-08 at 20:57 +0100, Don Dutile wrote:
> I guess what I'm wondering is why not set xen_emul_unplug to ignore by
> default (static int xen_emul_unplug=XEN_UNPLUG_IGNORE), which handles
> the case I mentioned (just take existing guest config file as is, no edits,
> pre-pv-hvm added to guest kernel), and if person edits config file to
> change boot device to xvda, they would then edit the config to add
> -x xen_emul_unplug=[all|ide-disks|...] as well.

Can you guarantee that nobody is running an HVM guest today with a
configuration file that specifies xvda (I believe it would work)? In
other words can you be sure that defaulting to XEN_UNPLUG_IGNORE is
_always_ going to be safe? Not just on RHEL hosts and with
configurations generated by the RH tools or according to the RH docs but
on any host with any (possibly hand-crafted) configuration?

Any guest which uses xvda in its configuration file today will be using
emulated devices but I think that with Stefano's patch and your proposed
change in default on a Xen system without support for unplug will start
using PV devices without unplugging the emulated ones first.

I don't think there is any way for a guest running on a platform which
does not support the unplug protocol to know automatically if it is safe
to use the PV devices or not, therefore we have to err on the side of
caution and ask users with such systems who know that their
configuration is safe to explicitly request PV devices by using the
command line option. Doing anything else is taking risks with people's
data.

Ian.

2010-07-08 22:00:50

by Donald Dutile

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 12/13] Unplug emulated disks and nics

Ian Campbell wrote:
> On Thu, 2010-07-08 at 20:57 +0100, Don Dutile wrote:
>> I guess what I'm wondering is why not set xen_emul_unplug to ignore by
>> default (static int xen_emul_unplug=XEN_UNPLUG_IGNORE), which handles
>> the case I mentioned (just take existing guest config file as is, no edits,
>> pre-pv-hvm added to guest kernel), and if person edits config file to
>> change boot device to xvda, they would then edit the config to add
>> -x xen_emul_unplug=[all|ide-disks|...] as well.
>
> Can you guarantee that nobody is running an HVM guest today with a
> configuration file that specifies xvda (I believe it would work)? In
> other words can you be sure that defaulting to XEN_UNPLUG_IGNORE is
> _always_ going to be safe? Not just on RHEL hosts and with
> configurations generated by the RH tools or according to the RH docs but
> on any host with any (possibly hand-crafted) configuration?
>
No, you have a valid point. We have pv-on-hvm support for rhel3->rhel5
HVM guests, and they support xvda on boot devices (once initrd is rebuilt),
so it's possible to have that config (spec'd) as well, and someone
to copy & edit it for use on a more current disk image w/relatively current
kernel.
But I'm considering 2.6.32+ HVM guests that didn't have xvda spec'd in
the boot path ever, and are upgraded to a post-2.6.32 kernel that has
pv-on-hvm added to it.

> Any guest which uses xvda in its configuration file today will be using
> emulated devices but I think that with Stefano's patch and your proposed
> change in default on a Xen system without support for unplug will start
> using PV devices without unplugging the emulated ones first.
>
Well, Stefano requires the admin to add unplug switch to kernel cmd line,
so I don't see the harm in defaulting to unplug...

> I don't think there is any way for a guest running on a platform which
> does not support the unplug protocol to know automatically if it is safe
> to use the PV devices or not, therefore we have to err on the side of
> caution and ask users with such systems who know that their
> configuration is safe to explicitly request PV devices by using the
> command line option. Doing anything else is taking risks with people's
> data.
>
> Ian.
>
>
Either way, the user/admin has to add cmdline to unplug to be safe.
I don't see how defaulting to UNPLUG_IGNORE changes that requirement.
... or am I missing a case? -- ah, if IGNORE isn't spec'd, pvhvm just won't
be configured in, and blkfront wont run, and cant have blkfront & ide
accessing the same device.... is that the case I'm missing ?

- Don

2010-07-09 01:05:54

by john stultz

[permalink] [raw]
Subject: Re: [PATCH 10/13] Do not try to disable hpet if it hasn't been initialized before

On Wed, 2010-06-30 at 13:53 -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Jun 21, 2010 at 05:14:04PM +0100, [email protected] wrote:
> > From: Stefano Stabellini <[email protected]>
> >
> > hpet_disable is called unconditionally on machine reboot if hpet support
> > is compiled in the kernel.
> > hpet_disable only checks if the machine is hpet capable but doesn't make
> > sure that hpet has been initialized.
>
> CC-ing John Stultz who has had his head wrapped around this time-thingy.
>
> Hi John!
>
> What are your thoughts about this patch?

Sorry for the slow reply.

The patch looks good to me. It makes the code easier to read as well!

However, I've not had my head in the hpet details for awhile, so
Venkatesh's ack (already provided) is the better endorsement.

thanks
-john


> > Signed-off-by: Stefano Stabellini <[email protected]>
> > ---
> > arch/x86/kernel/hpet.c | 18 ++++++++++--------
> > 1 files changed, 10 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> > index 23b4ecd..2b299da 100644
> > --- a/arch/x86/kernel/hpet.c
> > +++ b/arch/x86/kernel/hpet.c
> > @@ -959,16 +959,18 @@ fs_initcall(hpet_late_init);
> >
> > void hpet_disable(void)
> > {
> > - if (is_hpet_capable()) {
> > - unsigned int cfg = hpet_readl(HPET_CFG);
> > + unsigned int cfg;
> >
> > - if (hpet_legacy_int_enabled) {
> > - cfg &= ~HPET_CFG_LEGACY;
> > - hpet_legacy_int_enabled = 0;
> > - }
> > - cfg &= ~HPET_CFG_ENABLE;
> > - hpet_writel(cfg, HPET_CFG);
> > + if (!is_hpet_capable() || !hpet_address || !hpet_virt_address)
> > + return;
> > +
> > + cfg = hpet_readl(HPET_CFG);
> > + if (hpet_legacy_int_enabled) {
> > + cfg &= ~HPET_CFG_LEGACY;
> > + hpet_legacy_int_enabled = 0;
> > }
> > + cfg &= ~HPET_CFG_ENABLE;
> > + hpet_writel(cfg, HPET_CFG);
> > }
> >
> > #ifdef CONFIG_HPET_EMULATE_RTC
> > --
> > 1.7.0.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/

2010-07-09 08:02:31

by Ian Campbell

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 12/13] Unplug emulated disks and nics

On Thu, 2010-07-08 at 22:59 +0100, Don Dutile wrote:
> Ian Campbell wrote:
> > On Thu, 2010-07-08 at 20:57 +0100, Don Dutile wrote:
> >> I guess what I'm wondering is why not set xen_emul_unplug to ignore by
> >> default (static int xen_emul_unplug=XEN_UNPLUG_IGNORE), which handles
> >> the case I mentioned (just take existing guest config file as is, no edits,
> >> pre-pv-hvm added to guest kernel), and if person edits config file to
> >> change boot device to xvda, they would then edit the config to add
> >> -x xen_emul_unplug=[all|ide-disks|...] as well.
> >
> > Can you guarantee that nobody is running an HVM guest today with a
> > configuration file that specifies xvda (I believe it would work)? In
> > other words can you be sure that defaulting to XEN_UNPLUG_IGNORE is
> > _always_ going to be safe? Not just on RHEL hosts and with
> > configurations generated by the RH tools or according to the RH docs but
> > on any host with any (possibly hand-crafted) configuration?
> >
> No, you have a valid point. We have pv-on-hvm support for rhel3->rhel5
> HVM guests, and they support xvda on boot devices (once initrd is rebuilt),
> so it's possible to have that config (spec'd) as well, and someone
> to copy & edit it for use on a more current disk image w/relatively current
> kernel.
> But I'm considering 2.6.32+ HVM guests that didn't have xvda spec'd in
> the boot path ever, and are upgraded to a post-2.6.32 kernel that has
> pv-on-hvm added to it.

An HVM guest could have any kernel version, even one prior to 2.6.32,
and be updated to a post-2.6.32 with PV-on-HVM, we cannot restrict
ourselves to just 2.6.32+ kernels (not that I think it makes a
difference anyhow).

>
> > Any guest which uses xvda in its configuration file today will be using
> > emulated devices but I think that with Stefano's patch and your proposed
> > change in default on a Xen system without support for unplug will start
> > using PV devices without unplugging the emulated ones first.
> >
> Well, Stefano requires the admin to add unplug switch to kernel cmd line,

In the case where the host platform does not support the unplug protocol
this is correct and requiring explicit admin action to allow the PV
frontends to activate is the only safe option WRT the users data.

However if the host platform does support the unplug protocol then this
is incorrect. In that case the default (in the absence of the command
line option) is to automatically unplug any device for which a PV driver
is available and so no command line option will be required in the
common case. (see xen_unplug_emulated_devices() under the comment "Set
the default value of xen_emul_unplug depending on...")

> so I don't see the harm in defaulting to unplug...

As I described in my previous mail this is unsafe on host platforms
which do not support unplug. As I describe above it is unnecessary on
host platforms which do support unplug

> Either way, the user/admin has to add cmdline to unplug to be safe.

Not true, on a recent Xen system which supports the unplug protocol then
devices will be unplugged automatically without an additional command
line option.

Ian.

2010-07-09 10:54:48

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 12/13] Unplug emulated disks and nics

On Fri, 9 Jul 2010, Ian Campbell wrote:
> >
> > > Any guest which uses xvda in its configuration file today will be using
> > > emulated devices but I think that with Stefano's patch and your proposed
> > > change in default on a Xen system without support for unplug will start
> > > using PV devices without unplugging the emulated ones first.
> > >
> > Well, Stefano requires the admin to add unplug switch to kernel cmd line,
>
> In the case where the host platform does not support the unplug protocol
> this is correct and requiring explicit admin action to allow the PV
> frontends to activate is the only safe option WRT the users data.
>
> However if the host platform does support the unplug protocol then this
> is incorrect. In that case the default (in the absence of the command
> line option) is to automatically unplug any device for which a PV driver
> is available and so no command line option will be required in the
> common case. (see xen_unplug_emulated_devices() under the comment "Set
> the default value of xen_emul_unplug depending on...")
>

that's right: on host platforms supporting unplug no command line options
are required; on the other hand if the host platform does not support
unplug then xen_emul_unplug=ignore is required to use PV drivers anyway.


> > so I don't see the harm in defaulting to unplug...
>
> As I described in my previous mail this is unsafe on host platforms
> which do not support unplug. As I describe above it is unnecessary on
> host platforms which do support unplug
>

yep, keep in mind the copy and paste example.



If you are happy about the patch, I'll include it in my next version.

2010-07-09 13:43:38

by Donald Dutile

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [PATCH 12/13] Unplug emulated disks and nics

Stefano Stabellini wrote:
> On Fri, 9 Jul 2010, Ian Campbell wrote:
>>>> Any guest which uses xvda in its configuration file today will be using
>>>> emulated devices but I think that with Stefano's patch and your proposed
>>>> change in default on a Xen system without support for unplug will start
>>>> using PV devices without unplugging the emulated ones first.
>>>>
>>> Well, Stefano requires the admin to add unplug switch to kernel cmd line,
>> In the case where the host platform does not support the unplug protocol
>> this is correct and requiring explicit admin action to allow the PV
>> frontends to activate is the only safe option WRT the users data.
>>
>> However if the host platform does support the unplug protocol then this
>> is incorrect. In that case the default (in the absence of the command
>> line option) is to automatically unplug any device for which a PV driver
>> is available and so no command line option will be required in the
>> common case. (see xen_unplug_emulated_devices() under the comment "Set
>> the default value of xen_emul_unplug depending on...")
>>
>
> that's right: on host platforms supporting unplug no command line options
> are required; on the other hand if the host platform does not support
> unplug then xen_emul_unplug=ignore is required to use PV drivers anyway.
>
>
I see by looking at xen_unplug_emulated_devices() closer, that the default
unplugs emulated ide & emulated nic:
if (xen_platform_pci_enabled && !xen_emul_unplug)

I had to work around the fact that xen_platform_pci_enabled would
not be set on rhel5-dom0 (no emul-unplug), so I hadn't looked at this default
so closely.


>>> so I don't see the harm in defaulting to unplug...
>> As I described in my previous mail this is unsafe on host platforms
>> which do not support unplug. As I describe above it is unnecessary on
>> host platforms which do support unplug
>>
>
> yep, keep in mind the copy and paste example.
>
>
>
> If you are happy about the patch, I'll include it in my next version.
>

Yes, the patch seems like a reasonable solution for running on older qemu-dm's.

Thanks for the clarification(s).

- Don