2009-03-13 08:12:26

by Jeremy Fitzhardinge

Subject: [GIT PULL] 2.6.30 Xen core updates


This series updates the kernel's baseline domU Xen functionality.
It's mostly bugfixes, but there are a couple of new Xen-specific drivers.

The series depends on the earlier x86/brk and x86/paravirt patches I
posted a couple of days ago.

Thanks,
J

The following changes since commit 6b3933081104945c557d8fe678301cc1bdefdcc8:
Jeremy Fitzhardinge (1):
Merge branch 'push/x86/brk' into HEAD

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git push/xen/master

Alex Nixon (1):
Xen: Add virt_to_pfn helper function

Hannes Eder (1):
NULL noise: arch/x86/xen/smp.c

Ian Campbell (6):
xen: add irq_from_evtchn
xen: add /dev/xen/evtchn driver
xen: export ioctl headers to userspace
xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet
xen: remove suspend_cancel hook
xen: use device model for suspending xenbus devices

Jeremy Fitzhardinge (18):
xen: disable preempt for leave_lazy_mmu
xen: separate p2m allocation from setting
xen: dynamically allocate p2m tables
xen: split construction of p2m mfn tables from registration
xen: clean up xen_load_gdt
xen: make xen_load_gdt simpler
xen: remove xen_load_gdt debug
xen: reserve i386 Xen pagetables
xen: mask XSAVE from cpuid
xen: add FIX_TEXT_POKE to fixmap
x86-64: remove PGE from must-have feature list
xen/dev-evtchn: clean up locking in evtchn
xen: add "capabilities" file
xen: add /sys/hypervisor support
xen/sys/hypervisor: change writable_pt to features
xen/xenbus: export xenbus_dev_changed
Merge branches 'push/xen/dev-evtchn', 'push/xen/xenfs' and 'push/xen/sys-hypervisor' into push/xen/control
Merge branches 'push/xen/control' and 'push/xen/xenbus' into push/xen/master

arch/x86/include/asm/required-features.h | 2 +-
arch/x86/include/asm/xen/page.h | 3 +-
arch/x86/xen/enlighten.c | 76 ++++-
arch/x86/xen/mmu.c | 116 ++++++--
arch/x86/xen/mmu.h | 3 +
arch/x86/xen/smp.c | 4 +-
drivers/xen/Kconfig | 20 ++
drivers/xen/Makefile | 4 +-
drivers/xen/events.c | 6 +
drivers/xen/evtchn.c | 507 ++++++++++++++++++++++++++++++
drivers/xen/manage.c | 9 +-
drivers/xen/sys-hypervisor.c | 445 ++++++++++++++++++++++++++
drivers/xen/xenbus/xenbus_probe.c | 61 +---
drivers/xen/xenbus/xenbus_xs.c | 2 +
drivers/xen/xenfs/super.c | 19 +-
include/Kbuild | 1 +
include/xen/Kbuild | 1 +
include/xen/events.h | 3 +
include/xen/evtchn.h | 88 +++++
include/xen/interface/version.h | 3 +
include/xen/xenbus.h | 3 +-
21 files changed, 1269 insertions(+), 107 deletions(-)
create mode 100644 drivers/xen/evtchn.c
create mode 100644 drivers/xen/sys-hypervisor.c
create mode 100644 include/xen/Kbuild
create mode 100644 include/xen/evtchn.h


2009-03-13 08:13:20

by Jeremy Fitzhardinge

Subject: [PATCH 02/24] xen: separate p2m allocation from setting

From: Jeremy Fitzhardinge <[email protected]>

When setting p2m entries very early, we need to separate setting
from allocation, so split the two operations apart.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/mmu.c | 61 +++++++++++++++++++++++++++++++++++++--------------
arch/x86/xen/mmu.h | 3 ++
2 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index eceff87..d534986 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -233,47 +233,74 @@ unsigned long get_phys_to_machine(unsigned long pfn)
}
EXPORT_SYMBOL_GPL(get_phys_to_machine);

-static void alloc_p2m(unsigned long **pp, unsigned long *mfnp)
+/* install a new p2m_top page */
+bool install_p2mtop_page(unsigned long pfn, unsigned long *p)
{
- unsigned long *p;
+ unsigned topidx = p2m_top_index(pfn);
+ unsigned long **pfnp, *mfnp;
unsigned i;

- p = (void *)__get_free_page(GFP_KERNEL | __GFP_NOFAIL);
- BUG_ON(p == NULL);
+ pfnp = &p2m_top[topidx];
+ mfnp = &p2m_top_mfn[topidx];

for (i = 0; i < P2M_ENTRIES_PER_PAGE; i++)
p[i] = INVALID_P2M_ENTRY;

- if (cmpxchg(pp, p2m_missing, p) != p2m_missing)
- free_page((unsigned long)p);
- else
+ if (cmpxchg(pfnp, p2m_missing, p) == p2m_missing) {
*mfnp = virt_to_mfn(p);
+ return true;
+ }
+
+ return false;
}

-void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+static void alloc_p2m(unsigned long pfn)
{
- unsigned topidx, idx;
+ unsigned long *p;

- if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
- BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
- return;
- }
+ p = (void *)__get_free_page(GFP_KERNEL | __GFP_NOFAIL);
+ BUG_ON(p == NULL);
+
+ if (!install_p2mtop_page(pfn, p))
+ free_page((unsigned long)p);
+}
+
+/* Try to install p2m mapping; fail if intermediate bits missing */
+bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+ unsigned topidx, idx;

if (unlikely(pfn >= MAX_DOMAIN_PAGES)) {
BUG_ON(mfn != INVALID_P2M_ENTRY);
- return;
+ return true;
}

topidx = p2m_top_index(pfn);
if (p2m_top[topidx] == p2m_missing) {
- /* no need to allocate a page to store an invalid entry */
if (mfn == INVALID_P2M_ENTRY)
- return;
- alloc_p2m(&p2m_top[topidx], &p2m_top_mfn[topidx]);
+ return true;
+ return false;
}

idx = p2m_index(pfn);
p2m_top[topidx][idx] = mfn;
+
+ return true;
+}
+
+void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
+ BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
+ return;
+ }
+
+ if (unlikely(!__set_phys_to_machine(pfn, mfn))) {
+ alloc_p2m(pfn);
+
+ if (!__set_phys_to_machine(pfn, mfn))
+ BUG();
+ }
}

unsigned long arbitrary_virt_to_mfn(void *vaddr)
diff --git a/arch/x86/xen/mmu.h b/arch/x86/xen/mmu.h
index 24d1b44..da73026 100644
--- a/arch/x86/xen/mmu.h
+++ b/arch/x86/xen/mmu.h
@@ -11,6 +11,9 @@ enum pt_level {
};


+bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
+bool install_p2mtop_page(unsigned long pfn, unsigned long *p);
+
void set_pte_mfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);


--
1.6.0.6
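To illustrate the split outside the kernel, here is a minimal userspace sketch. It assumes a toy two-level table. The names `P2M_PER_PAGE`, `p2m_try_set`, `p2m_install_page` and the rest are hypothetical stand-ins for `__set_phys_to_machine()`, `install_p2mtop_page()` and friends, and C11 atomics take the place of the kernel's `cmpxchg()`. The try-set step never allocates. If the leaf page is still the shared "missing" page, it fails, and the caller decides where a new page comes from.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define P2M_PER_PAGE  4UL            /* hypothetical: entries per leaf page */
#define P2M_TOP       4UL            /* hypothetical: number of leaf slots */
#define MAX_PFN       (P2M_PER_PAGE * P2M_TOP)
#define INVALID_ENTRY (~0UL)

/* Shared all-invalid page standing in for holes, like p2m_missing. */
static unsigned long p2m_missing[P2M_PER_PAGE];
static _Atomic(unsigned long *) p2m_top[P2M_TOP];

static void p2m_init(void)
{
    for (unsigned long i = 0; i < P2M_PER_PAGE; i++)
        p2m_missing[i] = INVALID_ENTRY;
    for (unsigned long i = 0; i < P2M_TOP; i++)
        atomic_store(&p2m_top[i], p2m_missing);
}

/* Install a caller-provided leaf page; false if another thread won the race. */
static bool p2m_install_page(unsigned long pfn, unsigned long *page)
{
    unsigned long *expected = p2m_missing;

    for (unsigned long i = 0; i < P2M_PER_PAGE; i++)
        page[i] = INVALID_ENTRY;
    return atomic_compare_exchange_strong(&p2m_top[pfn / P2M_PER_PAGE],
                                          &expected, page);
}

/* Set one entry without allocating; false if the leaf page is missing. */
static bool p2m_try_set(unsigned long pfn, unsigned long mfn)
{
    unsigned long *leaf;

    if (pfn >= MAX_PFN) {
        assert(mfn == INVALID_ENTRY);   /* only "invalid" fits out of range */
        return true;
    }
    leaf = atomic_load(&p2m_top[pfn / P2M_PER_PAGE]);
    if (leaf == p2m_missing)
        return mfn == INVALID_ENTRY;    /* no page needed to store "invalid" */
    leaf[pfn % P2M_PER_PAGE] = mfn;
    return true;
}

/* Normal path: try, allocate on demand, then retry (which must succeed). */
static void p2m_set(unsigned long pfn, unsigned long mfn)
{
    if (!p2m_try_set(pfn, mfn)) {
        unsigned long *page = malloc(P2M_PER_PAGE * sizeof(*page));

        assert(page != NULL);
        if (!p2m_install_page(pfn, page))
            free(page);                 /* lost the race: use the winner's page */
        if (!p2m_try_set(pfn, mfn))
            abort();                    /* mirrors the BUG() in the patch */
    }
}

static unsigned long p2m_get(unsigned long pfn)
{
    unsigned long *leaf;

    if (pfn >= MAX_PFN)
        return INVALID_ENTRY;
    leaf = atomic_load(&p2m_top[pfn / P2M_PER_PAGE]);
    return leaf[pfn % P2M_PER_PAGE];
}
```

Presumably the benefit of the split is that an early-boot caller can call the try-set step directly and install leaf pages from a pre-reserved area, instead of relying on the page allocator.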

2009-03-13 08:12:56

by Jeremy Fitzhardinge

Subject: [PATCH 04/24] xen: split construction of p2m mfn tables from registration

From: Jeremy Fitzhardinge <[email protected]>

Build the p2m_mfn_list_list early with the rest of the p2m table, but
register it later when the real shared_info structure is in place.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/mmu.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 05280b4..2d30b74 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -187,7 +187,7 @@ RESERVE_BRK(xen_top_mfn, SIZE_TOP_MFN);
RESERVE_BRK(xen_top_mfn_list, SIZE_TOP_MFN_LIST);

/* Build the parallel p2m_top_mfn structures */
-void xen_setup_mfn_list_list(void)
+static void __init xen_build_mfn_list_list(void)
{
unsigned pfn, idx;

@@ -204,7 +204,10 @@ void xen_setup_mfn_list_list(void)
unsigned topidx = idx * P2M_ENTRIES_PER_PAGE;
p2m_top_mfn_list[idx] = virt_to_mfn(&p2m_top_mfn[topidx]);
}
+}

+void xen_setup_mfn_list_list(void)
+{
BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info);

HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list =
@@ -238,6 +241,8 @@ void __init xen_build_dynamic_phys_to_machine(void)

p2m_top[topidx] = &mfn_list[pfn];
}
+
+ xen_build_mfn_list_list();
}

unsigned long get_phys_to_machine(unsigned long pfn)
--
1.6.0.6
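For save/restore, the structure built here forms a three-level tree of machine frame numbers. `p2m_top_mfn_list` names the frames that hold `p2m_top_mfn`, and the entries of `p2m_top_mfn` name the frames of the leaf p2m pages. `xen_setup_mfn_list_list()` later publishes the root of this tree through `pfn_to_mfn_frame_list_list` in shared_info. The sketch below shows which slot at each level covers a given pfn. It assumes x86-64 with 4k pages, so that P2M_ENTRIES_PER_PAGE is 512, and `p2m_path_for` is a hypothetical helper, not kernel code.

```c
#include <assert.h>

/* Assumption: 4 KiB pages and 8-byte entries, i.e. 512 entries per page. */
#define ENTRIES_PER_PAGE 512UL

struct p2m_path {
    unsigned long list_idx;   /* slot in p2m_top_mfn_list */
    unsigned long top_idx;    /* slot in p2m_top / p2m_top_mfn */
    unsigned long leaf_idx;   /* slot in the leaf p2m page */
};

static struct p2m_path p2m_path_for(unsigned long pfn)
{
    struct p2m_path p;

    p.top_idx  = pfn / ENTRIES_PER_PAGE;       /* like p2m_top_index() */
    p.leaf_idx = pfn % ENTRIES_PER_PAGE;       /* like p2m_index() */
    /* p2m_top_mfn_list[i] names the page holding
     * p2m_top_mfn[i * ENTRIES_PER_PAGE] onwards. */
    p.list_idx = p.top_idx / ENTRIES_PER_PAGE;
    return p;
}
```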

2009-03-13 08:13:44

by Jeremy Fitzhardinge

Subject: [PATCH 03/24] xen: dynamically allocate p2m tables

From: Jeremy Fitzhardinge <[email protected]>

Saves about 128k of static object size.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/mmu.c | 38 +++++++++++++++++++++++++++++---------
1 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d534986..05280b4 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -159,18 +159,14 @@ DEFINE_PER_CPU(unsigned long, xen_current_cr3); /* actual vcpu cr3 */
#define TOP_ENTRIES (MAX_DOMAIN_PAGES / P2M_ENTRIES_PER_PAGE)

/* Placeholder for holes in the address space */
-static unsigned long p2m_missing[P2M_ENTRIES_PER_PAGE] __page_aligned_data =
- { [ 0 ... P2M_ENTRIES_PER_PAGE-1 ] = ~0UL };
+static unsigned long *p2m_missing;

/* Array of pointers to pages containing p2m entries */
-static unsigned long *p2m_top[TOP_ENTRIES] __page_aligned_data =
- { [ 0 ... TOP_ENTRIES - 1] = &p2m_missing[0] };
+static unsigned long **p2m_top;

/* Arrays of p2m arrays expressed in mfns used for save/restore */
-static unsigned long p2m_top_mfn[TOP_ENTRIES] __page_aligned_bss;
-
-static unsigned long p2m_top_mfn_list[TOP_ENTRIES / P2M_ENTRIES_PER_PAGE]
- __page_aligned_bss;
+static unsigned long *p2m_top_mfn;
+static unsigned long *p2m_top_mfn_list;

static inline unsigned p2m_top_index(unsigned long pfn)
{
@@ -183,18 +179,28 @@ static inline unsigned p2m_index(unsigned long pfn)
return pfn % P2M_ENTRIES_PER_PAGE;
}

+#define SIZE_TOP_MFN sizeof(*p2m_top_mfn) * TOP_ENTRIES
+#define SIZE_TOP_MFN_LIST sizeof(*p2m_top_mfn_list) * \
+ (TOP_ENTRIES / P2M_ENTRIES_PER_PAGE)
+
+RESERVE_BRK(xen_top_mfn, SIZE_TOP_MFN);
+RESERVE_BRK(xen_top_mfn_list, SIZE_TOP_MFN_LIST);
+
/* Build the parallel p2m_top_mfn structures */
void xen_setup_mfn_list_list(void)
{
unsigned pfn, idx;

+ p2m_top_mfn = extend_brk(SIZE_TOP_MFN, PAGE_SIZE);
+ p2m_top_mfn_list = extend_brk(SIZE_TOP_MFN_LIST, PAGE_SIZE);
+
for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn += P2M_ENTRIES_PER_PAGE) {
unsigned topidx = p2m_top_index(pfn);

p2m_top_mfn[topidx] = virt_to_mfn(p2m_top[topidx]);
}

- for (idx = 0; idx < ARRAY_SIZE(p2m_top_mfn_list); idx++) {
+ for (idx = 0; idx < (TOP_ENTRIES / P2M_ENTRIES_PER_PAGE); idx++) {
unsigned topidx = idx * P2M_ENTRIES_PER_PAGE;
p2m_top_mfn_list[idx] = virt_to_mfn(&p2m_top_mfn[topidx]);
}
@@ -206,12 +212,26 @@ void xen_setup_mfn_list_list(void)
HYPERVISOR_shared_info->arch.max_pfn = xen_start_info->nr_pages;
}

+#define SIZE_P2M_MISSING sizeof(*p2m_missing) * P2M_ENTRIES_PER_PAGE
+#define SIZE_P2M_TOP sizeof(*p2m_top) * TOP_ENTRIES
+RESERVE_BRK(xen_p2m_missing, SIZE_P2M_MISSING);
+RESERVE_BRK(xen_p2m_top, SIZE_P2M_TOP);
+
/* Set up p2m_top to point to the domain-builder provided p2m pages */
void __init xen_build_dynamic_phys_to_machine(void)
{
unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
unsigned pfn;
+ unsigned i;
+
+ p2m_missing = extend_brk(SIZE_P2M_MISSING, PAGE_SIZE);
+ for(i = 0; i < P2M_ENTRIES_PER_PAGE; i++)
+ p2m_missing[i] = ~0ul;
+
+ p2m_top = extend_brk(SIZE_P2M_TOP, PAGE_SIZE);
+ for(i = 0; i < TOP_ENTRIES; i++)
+ p2m_top[i] = p2m_missing;

for (pfn = 0; pfn < max_pfn; pfn += P2M_ENTRIES_PER_PAGE) {
unsigned topidx = p2m_top_index(pfn);
--
1.6.0.6
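This patch replaces the statically sized arrays with space that `RESERVE_BRK()` reserves and `extend_brk()` hands out at boot. That region only ever grows. The sketch below is a minimal userspace model of such a bump allocator over a fixed arena. `brk_extend` and `ARENA_SIZE` are hypothetical names, and the code is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE (64 * 1024)    /* hypothetical: space reserved up front */

static _Alignas(4096) unsigned char arena[ARENA_SIZE];
static size_t brk_used;           /* bytes handed out so far */

/* Return 'size' bytes aligned to 'align' (a power of two).
 * Nothing is ever freed, so this is one round-up plus one add. */
static void *brk_extend(size_t size, size_t align)
{
    uintptr_t base = (uintptr_t)arena;
    uintptr_t cur = base + brk_used;

    assert(align != 0 && (align & (align - 1)) == 0);
    cur = (cur + align - 1) & ~(uintptr_t)(align - 1);
    assert(cur - base + size <= ARENA_SIZE);   /* reservation exhausted */
    brk_used = cur - base + size;
    return (void *)cur;
}
```

Each `extend_brk(SIZE_..., PAGE_SIZE)` call in the patch corresponds to one page-aligned carve-out of this kind. The sizes come from the matching `RESERVE_BRK()` line, so the reservation is sized to fit them.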

2009-03-13 08:14:36

by Jeremy Fitzhardinge

Subject: [PATCH 07/24] xen: remove xen_load_gdt debug

From: Jeremy Fitzhardinge <[email protected]>

Remove the debug printk; we don't need the noise.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/enlighten.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 48b399b..75b7a0f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -310,9 +310,6 @@ static void xen_load_gdt(const struct desc_ptr *dtr)

frames[f] = mfn;

- printk("xen_load_gdt: %d va=%p mfn=%lx pfn=%lx va'=%p\n",
- f, (void *)va, mfn, pfn, virt);
-
make_lowmem_page_readonly((void *)va);
make_lowmem_page_readonly(virt);
}
--
1.6.0.6

2009-03-13 08:14:11

by Jeremy Fitzhardinge

Subject: [PATCH 01/24] xen: disable preempt for leave_lazy_mmu

From: Jeremy Fitzhardinge <[email protected]>

xen_mc_flush() requires preemption to be disabled for its own sanity,
so disable it while we're flushing.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/mmu.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 185b547..eceff87 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1812,8 +1812,10 @@ __init void xen_post_allocator_init(void)

static void xen_leave_lazy_mmu(void)
{
+ preempt_disable();
xen_mc_flush();
paravirt_leave_lazy_mmu();
+ preempt_enable();
}

const struct pv_mmu_ops xen_mmu_ops __initdata = {
--
1.6.0.6

2009-03-13 08:15:06

by Jeremy Fitzhardinge

Subject: [PATCH 06/24] xen: make xen_load_gdt simpler

From: Jeremy Fitzhardinge <[email protected]>

Stop using the multicall machinery here; it buys nothing, because GDT
loading is never performance-critical. This removes the implicit use
of percpu variables, which makes it simpler to understand how the
percpu code's use of load_gdt interacts with this code.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/enlighten.c | 14 ++++++--------
1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5776dc2..48b399b 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -284,12 +284,11 @@ static void xen_set_ldt(const void *addr, unsigned entries)

static void xen_load_gdt(const struct desc_ptr *dtr)
{
- unsigned long *frames;
unsigned long va = dtr->address;
unsigned int size = dtr->size + 1;
unsigned pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
+ unsigned long frames[pages];
int f;
- struct multicall_space mcs;

/* A GDT can be up to 64k in size, which corresponds to 8192
8-byte entries, or 16 4k pages.. */
@@ -297,9 +296,6 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
BUG_ON(size > 65536);
BUG_ON(va & ~PAGE_MASK);

- mcs = xen_mc_entry(sizeof(*frames) * pages);
- frames = mcs.args;
-
for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) {
int level;
pte_t *ptep = lookup_address(va, &level);
@@ -314,13 +310,15 @@ static void xen_load_gdt(const struct desc_ptr *dtr)

frames[f] = mfn;

+ printk("xen_load_gdt: %d va=%p mfn=%lx pfn=%lx va'=%p\n",
+ f, (void *)va, mfn, pfn, virt);
+
make_lowmem_page_readonly((void *)va);
make_lowmem_page_readonly(virt);
}

- MULTI_set_gdt(mcs.mc, frames, size / sizeof(struct desc_struct));
-
- xen_mc_issue(PARAVIRT_LAZY_CPU);
+ if (HYPERVISOR_set_gdt(frames, size / sizeof(struct desc_struct)))
+ BUG();
}

static void load_TLS_descriptor(struct thread_struct *t,
--
1.6.0.6
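`xen_load_gdt()` derives two numbers from the descriptor-table pointer. The first is how many pages, and therefore how many `frames[]` slots, to pass to `HYPERVISOR_set_gdt()`. The second is how many 8-byte descriptors those pages cover. The sketch below isolates that arithmetic and assumes 4k pages. `gdt_geometry` is a hypothetical helper.

```c
#include <assert.h>

#define PAGE_SIZE 4096u
#define DESC_SIZE 8u       /* sizeof(struct desc_struct) on x86 */

struct gdt_geometry {
    unsigned pages;        /* frames[] slots needed */
    unsigned entries;      /* descriptors passed to the hypercall */
};

/* 'limit' is the table limit from the descriptor pointer (size - 1). */
static struct gdt_geometry gdt_geometry(unsigned limit)
{
    unsigned size = limit + 1;
    struct gdt_geometry g;

    assert(size <= 65536);                      /* mirrors the BUG_ON */
    g.pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
    g.entries = size / DESC_SIZE;
    return g;
}
```

A full 64k GDT gives 16 pages and 8192 entries, which matches the comment in the patch. As a result, the on-stack `frames[]` array never needs more than 16 slots.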

2009-03-13 08:16:06

by Jeremy Fitzhardinge

Subject: [PATCH 05/24] xen: clean up xen_load_gdt

From: Jeremy Fitzhardinge <[email protected]>

Makes the logic a bit clearer.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/enlighten.c | 15 +++++++++++++--
1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 70b355d..5776dc2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -301,10 +301,21 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
frames = mcs.args;

for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) {
- frames[f] = arbitrary_virt_to_mfn((void *)va);
+ int level;
+ pte_t *ptep = lookup_address(va, &level);
+ unsigned long pfn, mfn;
+ void *virt;
+
+ BUG_ON(ptep == NULL);
+
+ pfn = pte_pfn(*ptep);
+ mfn = pfn_to_mfn(pfn);
+ virt = __va(PFN_PHYS(pfn));
+
+ frames[f] = mfn;

make_lowmem_page_readonly((void *)va);
- make_lowmem_page_readonly(mfn_to_virt(frames[f]));
+ make_lowmem_page_readonly(virt);
}

MULTI_set_gdt(mcs.mc, frames, size / sizeof(struct desc_struct));
--
1.6.0.6

2009-03-13 08:15:37

by Jeremy Fitzhardinge

Subject: [PATCH 08/24] xen: reserve i386 Xen pagetables

From: Jeremy Fitzhardinge <[email protected]>

Make sure the Xen-provided pagetables are reserved on x86-32.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/mmu.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 2d30b74..065fe8d 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1789,6 +1789,11 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,

pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));

+ reserve_early(__pa(xen_start_info->pt_base),
+ __pa(xen_start_info->pt_base +
+ xen_start_info->nr_pt_frames * PAGE_SIZE),
+ "XEN PAGETABLES");
+
return swapper_pg_dir;
}
#endif /* CONFIG_X86_64 */
--
1.6.0.6

2009-03-13 08:16:29

by Jeremy Fitzhardinge

Subject: [PATCH 10/24] xen: mask XSAVE from cpuid

From: Jeremy Fitzhardinge <[email protected]>

Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE
to be set. This confuses the kernel and it ends up crashing on
an xsetbv instruction.

At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of
cpuid if we can't. This will produce a spurious error from Xen,
but allows us to support XSAVE if/when Xen does.

This also moves the cpuid mask decisions to boot time.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/enlighten.c | 50 ++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 75b7a0f..da33e0c 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -168,21 +168,23 @@ static void __init xen_banner(void)
xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
}

+static __read_mostly unsigned int cpuid_leaf1_edx_mask = ~0;
+static __read_mostly unsigned int cpuid_leaf1_ecx_mask = ~0;
+
static void xen_cpuid(unsigned int *ax, unsigned int *bx,
unsigned int *cx, unsigned int *dx)
{
+ unsigned maskecx = ~0;
unsigned maskedx = ~0;

/*
* Mask out inconvenient features, to try and disable as many
* unsupported kernel subsystems as possible.
*/
- if (*ax == 1)
- maskedx = ~((1 << X86_FEATURE_APIC) | /* disable APIC */
- (1 << X86_FEATURE_ACPI) | /* disable ACPI */
- (1 << X86_FEATURE_MCE) | /* disable MCE */
- (1 << X86_FEATURE_MCA) | /* disable MCA */
- (1 << X86_FEATURE_ACC)); /* thermal monitoring */
+ if (*ax == 1) {
+ maskecx = cpuid_leaf1_ecx_mask;
+ maskedx = cpuid_leaf1_edx_mask;
+ }

asm(XEN_EMULATE_PREFIX "cpuid"
: "=a" (*ax),
@@ -190,9 +192,43 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
"=c" (*cx),
"=d" (*dx)
: "0" (*ax), "2" (*cx));
+
+ *cx &= maskecx;
*dx &= maskedx;
}

+static __init void xen_init_cpuid_mask(void)
+{
+ unsigned int ax, bx, cx, dx;
+
+ cpuid_leaf1_edx_mask =
+ ~((1 << X86_FEATURE_MCE) | /* disable MCE */
+ (1 << X86_FEATURE_MCA) | /* disable MCA */
+ (1 << X86_FEATURE_ACC)); /* thermal monitoring */
+
+ if (!xen_initial_domain())
+ cpuid_leaf1_edx_mask &=
+ ~((1 << X86_FEATURE_APIC) | /* disable local APIC */
+ (1 << X86_FEATURE_ACPI)); /* disable ACPI */
+
+ ax = 1;
+ xen_cpuid(&ax, &bx, &cx, &dx);
+
+ /* cpuid claims we support xsave; try enabling it to see what happens */
+ if (cx & (1 << (X86_FEATURE_XSAVE % 32))) {
+ unsigned long cr4;
+
+ set_in_cr4(X86_CR4_OSXSAVE);
+
+ cr4 = read_cr4();
+
+ if ((cr4 & X86_CR4_OSXSAVE) == 0)
+ cpuid_leaf1_ecx_mask &= ~(1 << (X86_FEATURE_XSAVE % 32));
+
+ clear_in_cr4(X86_CR4_OSXSAVE);
+ }
+}
+
static void xen_set_debugreg(int reg, unsigned long val)
{
HYPERVISOR_set_debugreg(reg, val);
@@ -901,6 +937,8 @@ asmlinkage void __init xen_start_kernel(void)

xen_init_irq_ops();

+ xen_init_cpuid_mask();
+
#ifdef CONFIG_X86_LOCAL_APIC
/*
* set up the basic apic ops.
--
1.6.0.6
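The mask decisions in `xen_init_cpuid_mask()` depend on two facts: whether we are the initial domain, and whether setting cr4.OSXSAVE actually stuck. The sketch below reduces them to a pure function. It assumes the CPUID leaf 1 bit positions from the Intel SDM: in EDX, MCE is bit 7, APIC bit 9, MCA bit 14, ACPI bit 22 and TM (the kernel's ACC) bit 29; in ECX, XSAVE is bit 26. `compute_masks` and the `EDX_*`/`ECX_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1 bit positions; matching kernel names in comments. */
#define EDX_MCE    7   /* X86_FEATURE_MCE */
#define EDX_APIC   9   /* X86_FEATURE_APIC */
#define EDX_MCA   14   /* X86_FEATURE_MCA */
#define EDX_ACPI  22   /* X86_FEATURE_ACPI */
#define EDX_TM    29   /* X86_FEATURE_ACC */
#define ECX_XSAVE 26   /* X86_FEATURE_XSAVE % 32 */

struct cpuid_masks { uint32_t ecx; uint32_t edx; };

/* osxsave_sticks: result of the "set cr4.OSXSAVE and read it back" probe. */
static struct cpuid_masks compute_masks(bool initial_domain, bool osxsave_sticks)
{
    struct cpuid_masks m = { UINT32_MAX, UINT32_MAX };

    /* Always hidden from a PV guest. */
    m.edx &= ~((1u << EDX_MCE) | (1u << EDX_MCA) | (1u << EDX_TM));
    /* Only the initial domain keeps the local APIC and ACPI bits. */
    if (!initial_domain)
        m.edx &= ~((1u << EDX_APIC) | (1u << EDX_ACPI));
    /* Advertise XSAVE only if the OS could actually enable it. */
    if (!osxsave_sticks)
        m.ecx &= ~(1u << ECX_XSAVE);
    return m;
}
```

The patch runs the probe only when cpuid claims XSAVE. If it doesn't, the raw bit is already clear, so the ECX mask value doesn't matter.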

2009-03-13 08:17:22

by Jeremy Fitzhardinge

Subject: [PATCH 11/24] xen: add FIX_TEXT_POKE to fixmap

From: Jeremy Fitzhardinge <[email protected]>

FIX_TEXT_POKE[01] are used to map kernel addresses, so they're mapping
pfns, not mfns.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/mmu.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 065fe8d..8969353 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1821,6 +1821,9 @@ static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
#ifdef CONFIG_X86_LOCAL_APIC
case FIX_APIC_BASE: /* maps dummy local APIC */
#endif
+ case FIX_TEXT_POKE0:
+ case FIX_TEXT_POKE1:
+ /* All local page mappings */
pte = pfn_pte(phys, prot);
break;

--
1.6.0.6

2009-03-13 08:16:57

by Jeremy Fitzhardinge

Subject: [PATCH 09/24] NULL noise: arch/x86/xen/smp.c

From: Hannes Eder <[email protected]>

Fix these sparse warnings:
arch/x86/xen/smp.c:316:52: warning: Using plain integer as NULL pointer
arch/x86/xen/smp.c:421:60: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <[email protected]>
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/smp.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 8d47056..304d832 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -317,7 +317,7 @@ static int __cpuinit xen_cpu_up(unsigned int cpu)
BUG_ON(rc);

while(per_cpu(cpu_state, cpu) != CPU_ONLINE) {
- HYPERVISOR_sched_op(SCHEDOP_yield, 0);
+ HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
barrier();
}

@@ -422,7 +422,7 @@ static void xen_smp_send_call_function_ipi(const struct cpumask *mask)
/* Make sure other vcpus get a chance to run if they need to. */
for_each_cpu(cpu, mask) {
if (xen_vcpu_stolen(cpu)) {
- HYPERVISOR_sched_op(SCHEDOP_yield, 0);
+ HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
break;
}
}
--
1.6.0.6

2009-03-13 08:17:47

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 12/24] x86-64: remove PGE from must-have feature list

From: Jeremy Fitzhardinge <[email protected]>

PGE may not be available when running paravirtualized, so test the cpuid
bit before using it.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/include/asm/required-features.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
index d5cd6c5..a4737dd 100644
--- a/arch/x86/include/asm/required-features.h
+++ b/arch/x86/include/asm/required-features.h
@@ -50,7 +50,7 @@
#ifdef CONFIG_X86_64
#define NEED_PSE 0
#define NEED_MSR (1<<(X86_FEATURE_MSR & 31))
-#define NEED_PGE (1<<(X86_FEATURE_PGE & 31))
+#define NEED_PGE 0
#define NEED_FXSR (1<<(X86_FEATURE_FXSR & 31))
#define NEED_XMM (1<<(X86_FEATURE_XMM & 31))
#define NEED_XMM2 (1<<(X86_FEATURE_XMM2 & 31))
--
1.6.0.6

2009-03-13 08:18:42

by Jeremy Fitzhardinge

Subject: [PATCH 20/24] xen/sys/hypervisor: change writable_pt to features

From: Jeremy Fitzhardinge <[email protected]>

/sys/hypervisor/properties/writable_pt was misnamed. Rename it to features,
which reports the whole feature bitmap in hex.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/xen/sys-hypervisor.c | 41 ++++++++++++++++++++++++++---------------
1 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index cb29d1c..1267d6f 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -293,37 +293,48 @@ static ssize_t pagesize_show(struct hyp_sysfs_attr *attr, char *buffer)

HYPERVISOR_ATTR_RO(pagesize);

-/* eventually there will be several more features to export */
static ssize_t xen_feature_show(int index, char *buffer)
{
- int ret = -ENOMEM;
- struct xen_feature_info *info;
+ ssize_t ret;
+ struct xen_feature_info info;

- info = kmalloc(sizeof(struct xen_feature_info), GFP_KERNEL);
- if (info) {
- info->submap_idx = index;
- ret = HYPERVISOR_xen_version(XENVER_get_features, info);
- if (!ret)
- ret = sprintf(buffer, "%d\n", info->submap);
- kfree(info);
- }
+ info.submap_idx = index;
+ ret = HYPERVISOR_xen_version(XENVER_get_features, &info);
+ if (!ret)
+ ret = sprintf(buffer, "%08x", info.submap);

return ret;
}

-static ssize_t writable_pt_show(struct hyp_sysfs_attr *attr, char *buffer)
+static ssize_t features_show(struct hyp_sysfs_attr *attr, char *buffer)
{
- return xen_feature_show(XENFEAT_writable_page_tables, buffer);
+ ssize_t len;
+ int i;
+
+ len = 0;
+ for (i = XENFEAT_NR_SUBMAPS-1; i >= 0; i--) {
+ int ret = xen_feature_show(i, buffer + len);
+ if (ret < 0) {
+ if (len == 0)
+ len = ret;
+ break;
+ }
+ len += ret;
+ }
+ if (len > 0)
+ buffer[len++] = '\n';
+
+ return len;
}

-HYPERVISOR_ATTR_RO(writable_pt);
+HYPERVISOR_ATTR_RO(features);

static struct attribute *xen_properties_attrs[] = {
&capabilities_attr.attr,
&changeset_attr.attr,
&virtual_start_attr.attr,
&pagesize_attr.attr,
- &writable_pt_attr.attr,
+ &features_attr.attr,
NULL
};

--
1.6.0.6
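The new `features` file prints each 32-bit feature submap as eight hex digits, starting with the highest submap, and ends with a newline. The sketch below models that formatting in userspace. It assumes the submaps have already been fetched and leaves out the hypercall and its error path. Unlike the sysfs handler, it also NUL-terminates its buffer. `format_features` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format submaps highest index first, "%08x" each, then '\n'.
 * Returns the number of characters written, excluding the NUL. */
static int format_features(const uint32_t *submaps, int nr_submaps,
                           char *buf, size_t bufsz)
{
    int len = 0;

    assert(bufsz >= (size_t)nr_submaps * 8 + 2);   /* digits + '\n' + NUL */
    for (int i = nr_submaps - 1; i >= 0; i--)
        len += snprintf(buf + len, bufsz - (size_t)len, "%08x",
                        (unsigned)submaps[i]);
    buf[len++] = '\n';
    buf[len] = '\0';
    return len;
}
```

Because every submap is fixed-width, a reader can split the string into 8-digit groups without needing a separator.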

2009-03-13 08:18:17

by Jeremy Fitzhardinge

Subject: [PATCH 22/24] xen: remove suspend_cancel hook

From: Ian Campbell <[email protected]>

Remove suspend_cancel hook from xenbus_driver, in preparation for using
the device model for suspending.

Signed-off-by: Ian Campbell <[email protected]>
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/xen/xenbus/xenbus_probe.c | 23 -----------------------
include/xen/xenbus.h | 1 -
2 files changed, 0 insertions(+), 24 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 773d1cf..bd20361 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -689,27 +689,6 @@ static int suspend_dev(struct device *dev, void *data)
return 0;
}

-static int suspend_cancel_dev(struct device *dev, void *data)
-{
- int err = 0;
- struct xenbus_driver *drv;
- struct xenbus_device *xdev;
-
- DPRINTK("");
-
- if (dev->driver == NULL)
- return 0;
- drv = to_xenbus_driver(dev->driver);
- xdev = container_of(dev, struct xenbus_device, dev);
- if (drv->suspend_cancel)
- err = drv->suspend_cancel(xdev);
- if (err)
- printk(KERN_WARNING
- "xenbus: suspend_cancel %s failed: %i\n",
- dev_name(dev), err);
- return 0;
-}
-
static int resume_dev(struct device *dev, void *data)
{
int err;
@@ -777,8 +756,6 @@ EXPORT_SYMBOL_GPL(xenbus_resume);
void xenbus_suspend_cancel(void)
{
xs_suspend_cancel();
- bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, suspend_cancel_dev);
- xenbus_backend_resume(suspend_cancel_dev);
}
EXPORT_SYMBOL_GPL(xenbus_suspend_cancel);

diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index f87f961..0836772 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -92,7 +92,6 @@ struct xenbus_driver {
enum xenbus_state backend_state);
int (*remove)(struct xenbus_device *dev);
int (*suspend)(struct xenbus_device *dev);
- int (*suspend_cancel)(struct xenbus_device *dev);
int (*resume)(struct xenbus_device *dev);
int (*uevent)(struct xenbus_device *, char **, int, char *, int);
struct device_driver driver;
--
1.6.0.6

2009-03-13 08:19:13

by Jeremy Fitzhardinge

Subject: [PATCH 18/24] xen: add "capabilities" file

The xenfs capabilities file allows usermode to determine what
capabilities the domain has. At present the only one is "control_d",
reported in the privileged (initial) domain.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/xen/xenfs/super.c | 19 ++++++++++++++++++-
1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index 515741a..6559e0c 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -20,10 +20,27 @@
MODULE_DESCRIPTION("Xen filesystem");
MODULE_LICENSE("GPL");

+static ssize_t capabilities_read(struct file *file, char __user *buf,
+ size_t size, loff_t *off)
+{
+ char *tmp = "";
+
+ if (xen_initial_domain())
+ tmp = "control_d\n";
+
+ return simple_read_from_buffer(buf, size, off, tmp, strlen(tmp));
+}
+
+static const struct file_operations capabilities_file_ops = {
+ .read = capabilities_read,
+};
+
static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
{
static struct tree_descr xenfs_files[] = {
- [2] = {"xenbus", &xenbus_file_ops, S_IRUSR|S_IWUSR},
+ [1] = {},
+ { "xenbus", &xenbus_file_ops, S_IRUSR|S_IWUSR },
+ { "capabilities", &capabilities_file_ops, S_IRUGO },
{""},
};

--
1.6.0.6
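`capabilities_read()` relies on `simple_read_from_buffer()` to honour the file offset. As a result, repeated reads return the string once and then report end-of-file. The sketch below models those semantics in userspace. It uses a plain `memcpy()` in place of `copy_to_user()` and returns -1 where the kernel returns -EINVAL. `read_from_buffer` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Copy up to 'count' bytes of 'from' starting at *ppos.
 * Returns bytes copied (0 at end of data) and advances *ppos. */
static ssize_t read_from_buffer(char *to, size_t count, off_t *ppos,
                                const char *from, size_t available)
{
    off_t pos = *ppos;

    if (pos < 0)
        return -1;                              /* kernel: -EINVAL */
    if ((size_t)pos >= available || count == 0)
        return 0;                               /* end of file */
    if (count > available - (size_t)pos)
        count = available - (size_t)pos;        /* clamp to what is left */
    memcpy(to, from + pos, count);
    *ppos = pos + (off_t)count;
    return (ssize_t)count;
}
```

In a domain that is not the initial domain the string is empty, so the very first read already returns 0.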

2009-03-13 08:19:35

by Jeremy Fitzhardinge

Subject: [PATCH 13/24] Xen: Add virt_to_pfn helper function

From: Alex Nixon <[email protected]>

Signed-off-by: Alex Nixon <[email protected]>
---
arch/x86/include/asm/xen/page.h | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 1a918dd..018a0a4 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -124,7 +124,8 @@ static inline unsigned long mfn_to_local_pfn(unsigned long mfn)

/* VIRT <-> MACHINE conversion */
#define virt_to_machine(v) (phys_to_machine(XPADDR(__pa(v))))
-#define virt_to_mfn(v) (pfn_to_mfn(PFN_DOWN(__pa(v))))
+#define virt_to_pfn(v) (PFN_DOWN(__pa(v)))
+#define virt_to_mfn(v) (pfn_to_mfn(virt_to_pfn(v)))
#define mfn_to_virt(m) (__va(mfn_to_pfn(m) << PAGE_SHIFT))

static inline unsigned long pte_mfn(pte_t pte)
--
1.6.0.6
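The change factors the pfn step out of `virt_to_mfn()`, so the two conversions compose. The sketch below models that composition in userspace. It assumes a toy direct map at a hypothetical `PAGE_OFFSET` and a four-entry p2m table with made-up contents. `pa_of` and `toy_p2m` are hypothetical, and the functions take integer addresses rather than pointers.

```c
#include <assert.h>

#define PAGE_SHIFT  12
#define PAGE_OFFSET 0xc0000000UL   /* hypothetical 32-bit direct-map base */
#define TOY_PAGES   4UL

/* Hypothetical p2m contents: pfn i lives in machine frame toy_p2m[i]. */
static const unsigned long toy_p2m[TOY_PAGES] = { 0x1000, 0x2a7, 0x33, 0x5 };

/* Like __pa(): direct-mapped kernel virtual address to physical. */
static unsigned long pa_of(unsigned long vaddr) { return vaddr - PAGE_OFFSET; }

/* Like the new virt_to_pfn(): PFN_DOWN(__pa(v)). */
static unsigned long virt_to_pfn(unsigned long vaddr)
{
    return pa_of(vaddr) >> PAGE_SHIFT;
}

static unsigned long pfn_to_mfn(unsigned long pfn)
{
    assert(pfn < TOY_PAGES);
    return toy_p2m[pfn];
}

/* Like the rewritten virt_to_mfn(): reuse virt_to_pfn(). */
static unsigned long virt_to_mfn(unsigned long vaddr)
{
    return pfn_to_mfn(virt_to_pfn(vaddr));
}
```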

2009-03-13 08:20:04

by Jeremy Fitzhardinge

Subject: [PATCH 17/24] xen/dev-evtchn: clean up locking in evtchn

From: Jeremy Fitzhardinge <[email protected]>

Add a mutex to per_user_data that serializes bind/unbind operations,
so they can't race with each other. Fix error returns, and don't do
a bind while holding a spinlock.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/xen/evtchn.c | 37 +++++++++++++++++++++++++------------
1 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 517b9ee..af03195 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -54,6 +54,8 @@
#include <asm/xen/hypervisor.h>

struct per_user_data {
+ struct mutex bind_mutex; /* serialize bind/unbind operations */
+
/* Notification ring, accessed via /dev/xen/evtchn. */
#define EVTCHN_RING_SIZE (PAGE_SIZE / sizeof(evtchn_port_t))
#define EVTCHN_RING_MASK(_i) ((_i)&(EVTCHN_RING_SIZE-1))
@@ -69,7 +71,7 @@ struct per_user_data {

/* Who's bound to each port? */
static struct per_user_data *port_user[NR_EVENT_CHANNELS];
-static DEFINE_SPINLOCK(port_user_lock);
+static DEFINE_SPINLOCK(port_user_lock); /* protects port_user[] and ring_prod */

irqreturn_t evtchn_interrupt(int irq, void *data)
{
@@ -210,22 +212,24 @@ static ssize_t evtchn_write(struct file *file, const char __user *buf,

static int evtchn_bind_to_user(struct per_user_data *u, int port)
{
- int irq;
int rc = 0;

- spin_lock_irq(&port_user_lock);
-
+ /*
+ * Ports are never reused, so every caller should pass in a
+ * unique port.
+ *
+ * (Locking not necessary because we haven't registered the
+ * interrupt handler yet, and our caller has already
+ * serialized bind operations.)
+ */
BUG_ON(port_user[port] != NULL);
-
- irq = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
- u->name, (void *)(unsigned long)port);
- if (rc < 0)
- goto fail;
-
port_user[port] = u;

-fail:
- spin_unlock_irq(&port_user_lock);
+ rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
+ u->name, (void *)(unsigned long)port);
+ if (rc >= 0)
+ rc = 0;
+
return rc;
}

@@ -234,6 +238,10 @@ static void evtchn_unbind_from_user(struct per_user_data *u, int port)
int irq = irq_from_evtchn(port);

unbind_from_irqhandler(irq, (void *)(unsigned long)port);
+
+ /* make sure we unbind the irq handler before clearing the port */
+ barrier();
+
port_user[port] = NULL;
}

@@ -244,6 +252,9 @@ static long evtchn_ioctl(struct file *file,
struct per_user_data *u = file->private_data;
void __user *uarg = (void __user *) arg;

+ /* Prevent bind from racing with unbind */
+ mutex_lock(&u->bind_mutex);
+
switch (cmd) {
case IOCTL_EVTCHN_BIND_VIRQ: {
struct ioctl_evtchn_bind_virq bind;
@@ -368,6 +379,7 @@ static long evtchn_ioctl(struct file *file,
rc = -ENOSYS;
break;
}
+ mutex_unlock(&u->bind_mutex);

return rc;
}
@@ -414,6 +426,7 @@ static int evtchn_open(struct inode *inode, struct file *filp)
return -ENOMEM;
}

+ mutex_init(&u->bind_mutex);
mutex_init(&u->ring_cons_mutex);

filp->private_data = u;
--
1.6.0.6

2009-03-13 08:20:37

by Jeremy Fitzhardinge

Subject: [PATCH 16/24] xen: export ioctl headers to userspace

From: Ian Campbell <[email protected]>

Signed-off-by: Ian Campbell <[email protected]>
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
include/Kbuild | 1 +
include/xen/Kbuild | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)
create mode 100644 include/xen/Kbuild

diff --git a/include/Kbuild b/include/Kbuild
index d8c3e3c..fe36acc 100644
--- a/include/Kbuild
+++ b/include/Kbuild
@@ -8,3 +8,4 @@ header-y += mtd/
header-y += rdma/
header-y += video/
header-y += drm/
+header-y += xen/
diff --git a/include/xen/Kbuild b/include/xen/Kbuild
new file mode 100644
index 0000000..4e65c16
--- /dev/null
+++ b/include/xen/Kbuild
@@ -0,0 +1 @@
+header-y += evtchn.h
--
1.6.0.6
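
Once include/xen/evtchn.h is exported, a userspace program can drive the device with the ioctl numbers it defines. The sketch below copies one request definition from the header so that it builds without the header installed. It assumes the device node lives at /dev/xen/evtchn, although the actual location depends on the system's udev configuration. The helper names `open_evtchn()` and `alloc_unbound_port()` are hypothetical.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>

/* Copied from include/xen/evtchn.h so the sketch builds without it. */
struct ioctl_evtchn_bind_unbound_port {
	unsigned int remote_domain;
};
#define IOCTL_EVTCHN_BIND_UNBOUND_PORT \
	_IOC(_IOC_NONE, 'E', 2, sizeof(struct ioctl_evtchn_bind_unbound_port))

/* Hypothetical helper: open the device; returns -1 when it is absent
 * (for example, when not running under Xen). */
static int open_evtchn(void)
{
	return open("/dev/xen/evtchn", O_RDWR);
}

/* Hypothetical helper: allocate a local port that remote_domain may bind
 * to later. On success, the driver returns the new port number. */
static int alloc_unbound_port(int fd, unsigned int remote_domain)
{
	struct ioctl_evtchn_bind_unbound_port bind = {
		.remote_domain = remote_domain,
	};

	return ioctl(fd, IOCTL_EVTCHN_BIND_UNBOUND_PORT, &bind);
}
```

After binding, the process read()s pending port numbers from the same file descriptor. The driver masks a port's IRQ each time it fires, so the process must write() the port number back to unmask it.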

2009-03-13 08:21:03

by Jeremy Fitzhardinge

Subject: [PATCH 14/24] xen: add irq_from_evtchn

From: Ian Campbell <[email protected]>

Given an evtchn, return the corresponding irq.

Signed-off-by: Ian Campbell <[email protected]>
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/xen/events.c | 6 ++++++
include/xen/events.h | 3 +++
2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 30963af..1cd2a0e 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -151,6 +151,12 @@ static unsigned int evtchn_from_irq(unsigned irq)
return info_for_irq(irq)->evtchn;
}

+unsigned irq_from_evtchn(unsigned int evtchn)
+{
+ return evtchn_to_irq[evtchn];
+}
+EXPORT_SYMBOL_GPL(irq_from_evtchn);
+
static enum ipi_vector ipi_from_irq(unsigned irq)
{
struct irq_info *info = info_for_irq(irq);
diff --git a/include/xen/events.h b/include/xen/events.h
index 0d5f1ad..e68d59a 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -53,4 +53,7 @@ bool xen_test_irq_pending(int irq);
irq will be disabled so it won't deliver an interrupt. */
void xen_poll_irq(int irq);

+/* Determine the IRQ which is bound to an event channel */
+unsigned irq_from_evtchn(unsigned int evtchn);
+
#endif /* _XEN_EVENTS_H */
--
1.6.0.6
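
irq_from_evtchn() is a direct lookup in the evtchn_to_irq[] array with no bounds check. Callers must therefore validate the port first, as evtchn_write() does with `kbuf[i] < NR_EVENT_CHANNELS`. The standalone model below shows the same reverse-mapping table with the range check folded into the lookup. The table size `NR_CHANNELS` and the use of -1 to mark unbound channels are assumptions made for this sketch.

```c
#define NR_CHANNELS 16 /* hypothetical size; the kernel uses NR_EVENT_CHANNELS */

/* channel -> irq; -1 marks an unbound channel (assumed convention). */
static int chan_to_irq[NR_CHANNELS];

static void chan_table_init(void)
{
	for (unsigned int i = 0; i < NR_CHANNELS; i++)
		chan_to_irq[i] = -1;
}

/* Record that channel chan now delivers interrupts on irq. */
static void chan_bind(unsigned int chan, int irq)
{
	chan_to_irq[chan] = irq;
}

/* Like irq_from_evtchn(), but rejects out-of-range channels instead of
 * reading past the end of the table. */
static int irq_for_channel(unsigned int chan)
{
	if (chan >= NR_CHANNELS)
		return -1;
	return chan_to_irq[chan];
}
```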

2009-03-13 08:21:26

by Jeremy Fitzhardinge

Subject: [PATCH 24/24] xen/xenbus: export xenbus_dev_changed

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/xen/xenbus/xenbus_probe.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 4649213..d42e25d 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -660,6 +660,7 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)

kfree(root);
}
+EXPORT_SYMBOL_GPL(xenbus_dev_changed);

static void frontend_changed(struct xenbus_watch *watch,
const char **vec, unsigned int len)
--
1.6.0.6

2009-03-13 08:21:54

by Jeremy Fitzhardinge

Subject: [PATCH 15/24] xen: add /dev/xen/evtchn driver

From: Ian Campbell <[email protected]>

This driver is used by applications that wish to receive notifications
from the hypervisor or other guests via Xen's event channel
mechanism. In particular, it is used by the xenstore daemon in domain
0.

Signed-off-by: Ian Campbell <[email protected]>
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/xen/Kconfig | 10 +
drivers/xen/Makefile | 3 +-
drivers/xen/evtchn.c | 494 ++++++++++++++++++++++++++++++++++++++++++++++++++
include/xen/evtchn.h | 88 +++++++++
4 files changed, 594 insertions(+), 1 deletions(-)
create mode 100644 drivers/xen/evtchn.c
create mode 100644 include/xen/evtchn.h

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 526187c..1bbb910 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -18,6 +18,16 @@ config XEN_SCRUB_PAGES
secure, but slightly less efficient.
If in doubt, say yes.

+config XEN_DEV_EVTCHN
+ tristate "Xen /dev/xen/evtchn device"
+ depends on XEN
+ default y
+ help
+ The evtchn driver allows a userspace process to trigger event
+ channels and to receive notification of an event channel
+ firing.
+ If in doubt, say yes.
+
config XENFS
tristate "Xen filesystem"
depends on XEN
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index ff8accc..1567639 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -4,4 +4,5 @@ obj-y += xenbus/
obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
obj-$(CONFIG_XEN_BALLOON) += balloon.o
-obj-$(CONFIG_XENFS) += xenfs/
\ No newline at end of file
+obj-$(CONFIG_XEN_DEV_EVTCHN) += evtchn.o
+obj-$(CONFIG_XENFS) += xenfs/
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
new file mode 100644
index 0000000..517b9ee
--- /dev/null
+++ b/drivers/xen/evtchn.c
@@ -0,0 +1,494 @@
+/******************************************************************************
+ * evtchn.c
+ *
+ * Driver for receiving and demuxing event-channel signals.
+ *
+ * Copyright (c) 2004-2005, K A Fraser
+ * Multi-process extensions Copyright (c) 2004, Steven Smith
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/errno.h>
+#include <linux/miscdevice.h>
+#include <linux/major.h>
+#include <linux/proc_fs.h>
+#include <linux/stat.h>
+#include <linux/poll.h>
+#include <linux/irq.h>
+#include <linux/init.h>
+#include <linux/gfp.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
+#include <xen/events.h>
+#include <xen/evtchn.h>
+#include <asm/xen/hypervisor.h>
+
+struct per_user_data {
+ /* Notification ring, accessed via /dev/xen/evtchn. */
+#define EVTCHN_RING_SIZE (PAGE_SIZE / sizeof(evtchn_port_t))
+#define EVTCHN_RING_MASK(_i) ((_i)&(EVTCHN_RING_SIZE-1))
+ evtchn_port_t *ring;
+ unsigned int ring_cons, ring_prod, ring_overflow;
+ struct mutex ring_cons_mutex; /* protect against concurrent readers */
+
+ /* Processes wait on this queue when ring is empty. */
+ wait_queue_head_t evtchn_wait;
+ struct fasync_struct *evtchn_async_queue;
+ const char *name;
+};
+
+/* Who's bound to each port? */
+static struct per_user_data *port_user[NR_EVENT_CHANNELS];
+static DEFINE_SPINLOCK(port_user_lock);
+
+irqreturn_t evtchn_interrupt(int irq, void *data)
+{
+ unsigned int port = (unsigned long)data;
+ struct per_user_data *u;
+
+ spin_lock(&port_user_lock);
+
+ u = port_user[port];
+
+ disable_irq_nosync(irq);
+
+ if ((u->ring_prod - u->ring_cons) < EVTCHN_RING_SIZE) {
+ u->ring[EVTCHN_RING_MASK(u->ring_prod)] = port;
+ wmb(); /* Ensure ring contents visible */
+ if (u->ring_cons == u->ring_prod++) {
+ wake_up_interruptible(&u->evtchn_wait);
+ kill_fasync(&u->evtchn_async_queue,
+ SIGIO, POLL_IN);
+ }
+ } else {
+ u->ring_overflow = 1;
+ }
+
+ spin_unlock(&port_user_lock);
+
+ return IRQ_HANDLED;
+}
+
+static ssize_t evtchn_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int rc;
+ unsigned int c, p, bytes1 = 0, bytes2 = 0;
+ struct per_user_data *u = file->private_data;
+
+ /* Whole number of ports. */
+ count &= ~(sizeof(evtchn_port_t)-1);
+
+ if (count == 0)
+ return 0;
+
+ if (count > PAGE_SIZE)
+ count = PAGE_SIZE;
+
+ for (;;) {
+ mutex_lock(&u->ring_cons_mutex);
+
+ rc = -EFBIG;
+ if (u->ring_overflow)
+ goto unlock_out;
+
+ c = u->ring_cons;
+ p = u->ring_prod;
+ if (c != p)
+ break;
+
+ mutex_unlock(&u->ring_cons_mutex);
+
+ if (file->f_flags & O_NONBLOCK)
+ return -EAGAIN;
+
+ rc = wait_event_interruptible(u->evtchn_wait,
+ u->ring_cons != u->ring_prod);
+ if (rc)
+ return rc;
+ }
+
+ /* Byte lengths of two chunks. Chunk split (if any) is at ring wrap. */
+ if (((c ^ p) & EVTCHN_RING_SIZE) != 0) {
+ bytes1 = (EVTCHN_RING_SIZE - EVTCHN_RING_MASK(c)) *
+ sizeof(evtchn_port_t);
+ bytes2 = EVTCHN_RING_MASK(p) * sizeof(evtchn_port_t);
+ } else {
+ bytes1 = (p - c) * sizeof(evtchn_port_t);
+ bytes2 = 0;
+ }
+
+ /* Truncate chunks according to caller's maximum byte count. */
+ if (bytes1 > count) {
+ bytes1 = count;
+ bytes2 = 0;
+ } else if ((bytes1 + bytes2) > count) {
+ bytes2 = count - bytes1;
+ }
+
+ rc = -EFAULT;
+ rmb(); /* Ensure that we see the port before we copy it. */
+ if (copy_to_user(buf, &u->ring[EVTCHN_RING_MASK(c)], bytes1) ||
+ ((bytes2 != 0) &&
+ copy_to_user(&buf[bytes1], &u->ring[0], bytes2)))
+ goto unlock_out;
+
+ u->ring_cons += (bytes1 + bytes2) / sizeof(evtchn_port_t);
+ rc = bytes1 + bytes2;
+
+ unlock_out:
+ mutex_unlock(&u->ring_cons_mutex);
+ return rc;
+}
+
+static ssize_t evtchn_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int rc, i;
+ evtchn_port_t *kbuf = (evtchn_port_t *)__get_free_page(GFP_KERNEL);
+ struct per_user_data *u = file->private_data;
+
+ if (kbuf == NULL)
+ return -ENOMEM;
+
+ /* Whole number of ports. */
+ count &= ~(sizeof(evtchn_port_t)-1);
+
+ rc = 0;
+ if (count == 0)
+ goto out;
+
+ if (count > PAGE_SIZE)
+ count = PAGE_SIZE;
+
+ rc = -EFAULT;
+ if (copy_from_user(kbuf, buf, count) != 0)
+ goto out;
+
+ spin_lock_irq(&port_user_lock);
+ for (i = 0; i < (count/sizeof(evtchn_port_t)); i++)
+ if ((kbuf[i] < NR_EVENT_CHANNELS) && (port_user[kbuf[i]] == u))
+ enable_irq(irq_from_evtchn(kbuf[i]));
+ spin_unlock_irq(&port_user_lock);
+
+ rc = count;
+
+ out:
+ free_page((unsigned long)kbuf);
+ return rc;
+}
+
+static int evtchn_bind_to_user(struct per_user_data *u, int port)
+{
+ int irq;
+ int rc = 0;
+
+ spin_lock_irq(&port_user_lock);
+
+ BUG_ON(port_user[port] != NULL);
+
+ irq = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
+ u->name, (void *)(unsigned long)port);
+ if (rc < 0)
+ goto fail;
+
+ port_user[port] = u;
+
+fail:
+ spin_unlock_irq(&port_user_lock);
+ return rc;
+}
+
+static void evtchn_unbind_from_user(struct per_user_data *u, int port)
+{
+ int irq = irq_from_evtchn(port);
+
+ unbind_from_irqhandler(irq, (void *)(unsigned long)port);
+ port_user[port] = NULL;
+}
+
+static long evtchn_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+ int rc;
+ struct per_user_data *u = file->private_data;
+ void __user *uarg = (void __user *) arg;
+
+ switch (cmd) {
+ case IOCTL_EVTCHN_BIND_VIRQ: {
+ struct ioctl_evtchn_bind_virq bind;
+ struct evtchn_bind_virq bind_virq;
+
+ rc = -EFAULT;
+ if (copy_from_user(&bind, uarg, sizeof(bind)))
+ break;
+
+ bind_virq.virq = bind.virq;
+ bind_virq.vcpu = 0;
+ rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
+ &bind_virq);
+ if (rc != 0)
+ break;
+
+ rc = evtchn_bind_to_user(u, bind_virq.port);
+ if (rc == 0)
+ rc = bind_virq.port;
+ break;
+ }
+
+ case IOCTL_EVTCHN_BIND_INTERDOMAIN: {
+ struct ioctl_evtchn_bind_interdomain bind;
+ struct evtchn_bind_interdomain bind_interdomain;
+
+ rc = -EFAULT;
+ if (copy_from_user(&bind, uarg, sizeof(bind)))
+ break;
+
+ bind_interdomain.remote_dom = bind.remote_domain;
+ bind_interdomain.remote_port = bind.remote_port;
+ rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_interdomain,
+ &bind_interdomain);
+ if (rc != 0)
+ break;
+
+ rc = evtchn_bind_to_user(u, bind_interdomain.local_port);
+ if (rc == 0)
+ rc = bind_interdomain.local_port;
+ break;
+ }
+
+ case IOCTL_EVTCHN_BIND_UNBOUND_PORT: {
+ struct ioctl_evtchn_bind_unbound_port bind;
+ struct evtchn_alloc_unbound alloc_unbound;
+
+ rc = -EFAULT;
+ if (copy_from_user(&bind, uarg, sizeof(bind)))
+ break;
+
+ alloc_unbound.dom = DOMID_SELF;
+ alloc_unbound.remote_dom = bind.remote_domain;
+ rc = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound,
+ &alloc_unbound);
+ if (rc != 0)
+ break;
+
+ rc = evtchn_bind_to_user(u, alloc_unbound.port);
+ if (rc == 0)
+ rc = alloc_unbound.port;
+ break;
+ }
+
+ case IOCTL_EVTCHN_UNBIND: {
+ struct ioctl_evtchn_unbind unbind;
+
+ rc = -EFAULT;
+ if (copy_from_user(&unbind, uarg, sizeof(unbind)))
+ break;
+
+ rc = -EINVAL;
+ if (unbind.port >= NR_EVENT_CHANNELS)
+ break;
+
+ spin_lock_irq(&port_user_lock);
+
+ rc = -ENOTCONN;
+ if (port_user[unbind.port] != u) {
+ spin_unlock_irq(&port_user_lock);
+ break;
+ }
+
+ evtchn_unbind_from_user(u, unbind.port);
+
+ spin_unlock_irq(&port_user_lock);
+
+ rc = 0;
+ break;
+ }
+
+ case IOCTL_EVTCHN_NOTIFY: {
+ struct ioctl_evtchn_notify notify;
+
+ rc = -EFAULT;
+ if (copy_from_user(&notify, uarg, sizeof(notify)))
+ break;
+
+ if (notify.port >= NR_EVENT_CHANNELS) {
+ rc = -EINVAL;
+ } else if (port_user[notify.port] != u) {
+ rc = -ENOTCONN;
+ } else {
+ notify_remote_via_evtchn(notify.port);
+ rc = 0;
+ }
+ break;
+ }
+
+ case IOCTL_EVTCHN_RESET: {
+ /* Initialise the ring to empty. Clear errors. */
+ mutex_lock(&u->ring_cons_mutex);
+ spin_lock_irq(&port_user_lock);
+ u->ring_cons = u->ring_prod = u->ring_overflow = 0;
+ spin_unlock_irq(&port_user_lock);
+ mutex_unlock(&u->ring_cons_mutex);
+ rc = 0;
+ break;
+ }
+
+ default:
+ rc = -ENOSYS;
+ break;
+ }
+
+ return rc;
+}
+
+static unsigned int evtchn_poll(struct file *file, poll_table *wait)
+{
+ unsigned int mask = POLLOUT | POLLWRNORM;
+ struct per_user_data *u = file->private_data;
+
+ poll_wait(file, &u->evtchn_wait, wait);
+ if (u->ring_cons != u->ring_prod)
+ mask |= POLLIN | POLLRDNORM;
+ if (u->ring_overflow)
+ mask = POLLERR;
+ return mask;
+}
+
+static int evtchn_fasync(int fd, struct file *filp, int on)
+{
+ struct per_user_data *u = filp->private_data;
+ return fasync_helper(fd, filp, on, &u->evtchn_async_queue);
+}
+
+static int evtchn_open(struct inode *inode, struct file *filp)
+{
+ struct per_user_data *u;
+
+ u = kzalloc(sizeof(*u), GFP_KERNEL);
+ if (u == NULL)
+ return -ENOMEM;
+
+ u->name = kasprintf(GFP_KERNEL, "evtchn:%s", current->comm);
+ if (u->name == NULL) {
+ kfree(u);
+ return -ENOMEM;
+ }
+
+ init_waitqueue_head(&u->evtchn_wait);
+
+ u->ring = (evtchn_port_t *)__get_free_page(GFP_KERNEL);
+ if (u->ring == NULL) {
+ kfree(u->name);
+ kfree(u);
+ return -ENOMEM;
+ }
+
+ mutex_init(&u->ring_cons_mutex);
+
+ filp->private_data = u;
+
+ return 0;
+}
+
+static int evtchn_release(struct inode *inode, struct file *filp)
+{
+ int i;
+ struct per_user_data *u = filp->private_data;
+
+ spin_lock_irq(&port_user_lock);
+
+ free_page((unsigned long)u->ring);
+
+ for (i = 0; i < NR_EVENT_CHANNELS; i++) {
+ if (port_user[i] != u)
+ continue;
+
+ evtchn_unbind_from_user(port_user[i], i);
+ }
+
+ spin_unlock_irq(&port_user_lock);
+
+ kfree(u->name);
+ kfree(u);
+
+ return 0;
+}
+
+static const struct file_operations evtchn_fops = {
+ .owner = THIS_MODULE,
+ .read = evtchn_read,
+ .write = evtchn_write,
+ .unlocked_ioctl = evtchn_ioctl,
+ .poll = evtchn_poll,
+ .fasync = evtchn_fasync,
+ .open = evtchn_open,
+ .release = evtchn_release,
+};
+
+static struct miscdevice evtchn_miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "evtchn",
+ .fops = &evtchn_fops,
+};
+static int __init evtchn_init(void)
+{
+ int err;
+
+ if (!xen_domain())
+ return -ENODEV;
+
+ spin_lock_init(&port_user_lock);
+ memset(port_user, 0, sizeof(port_user));
+
+ /* Create '/dev/misc/evtchn'. */
+ err = misc_register(&evtchn_miscdev);
+ if (err != 0) {
+ printk(KERN_ALERT "Could not register /dev/misc/evtchn\n");
+ return err;
+ }
+
+ printk(KERN_INFO "Event-channel device installed.\n");
+
+ return 0;
+}
+
+static void __exit evtchn_cleanup(void)
+{
+ misc_deregister(&evtchn_miscdev);
+}
+
+module_init(evtchn_init);
+module_exit(evtchn_cleanup);
+
+MODULE_LICENSE("GPL");
diff --git a/include/xen/evtchn.h b/include/xen/evtchn.h
new file mode 100644
index 0000000..14e833e
--- /dev/null
+++ b/include/xen/evtchn.h
@@ -0,0 +1,88 @@
+/******************************************************************************
+ * evtchn.h
+ *
+ * Interface to /dev/xen/evtchn.
+ *
+ * Copyright (c) 2003-2005, K A Fraser
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __LINUX_PUBLIC_EVTCHN_H__
+#define __LINUX_PUBLIC_EVTCHN_H__
+
+/*
+ * Bind a fresh port to VIRQ @virq.
+ * Return allocated port.
+ */
+#define IOCTL_EVTCHN_BIND_VIRQ \
+ _IOC(_IOC_NONE, 'E', 0, sizeof(struct ioctl_evtchn_bind_virq))
+struct ioctl_evtchn_bind_virq {
+ unsigned int virq;
+};
+
+/*
+ * Bind a fresh port to remote <@remote_domain, @remote_port>.
+ * Return allocated port.
+ */
+#define IOCTL_EVTCHN_BIND_INTERDOMAIN \
+ _IOC(_IOC_NONE, 'E', 1, sizeof(struct ioctl_evtchn_bind_interdomain))
+struct ioctl_evtchn_bind_interdomain {
+ unsigned int remote_domain, remote_port;
+};
+
+/*
+ * Allocate a fresh port for binding to @remote_domain.
+ * Return allocated port.
+ */
+#define IOCTL_EVTCHN_BIND_UNBOUND_PORT \
+ _IOC(_IOC_NONE, 'E', 2, sizeof(struct ioctl_evtchn_bind_unbound_port))
+struct ioctl_evtchn_bind_unbound_port {
+ unsigned int remote_domain;
+};
+
+/*
+ * Unbind previously allocated @port.
+ */
+#define IOCTL_EVTCHN_UNBIND \
+ _IOC(_IOC_NONE, 'E', 3, sizeof(struct ioctl_evtchn_unbind))
+struct ioctl_evtchn_unbind {
+ unsigned int port;
+};
+
+/*
+ * Send an event to previously allocated @port.
+ */
+#define IOCTL_EVTCHN_NOTIFY \
+ _IOC(_IOC_NONE, 'E', 4, sizeof(struct ioctl_evtchn_notify))
+struct ioctl_evtchn_notify {
+ unsigned int port;
+};
+
+/* Clear and reinitialise the event buffer. Clear error condition. */
+#define IOCTL_EVTCHN_RESET \
+ _IOC(_IOC_NONE, 'E', 5, 0)
+
+#endif /* __LINUX_PUBLIC_EVTCHN_H__ */
--
1.6.0.6
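
The notification ring in evtchn.c uses free-running `ring_prod`/`ring_cons` counters, masked with `EVTCHN_RING_MASK()` only when indexing. Because `EVTCHN_RING_SIZE` is a power of two, the ring can hold exactly `EVTCHN_RING_SIZE` entries, and `(c ^ p) & EVTCHN_RING_SIZE` is nonzero exactly when the pending entries wrap past the end of the page. The sketch below is a standalone, single-threaded model of that arithmetic. It uses a hypothetical 8-entry ring and has no locking, blocking, or memory barriers. It returns -1 where the driver returns -EFBIG.

```c
#include <string.h>

typedef unsigned int port_t;

#define RING_SIZE 8u /* hypothetical; must be a power of two */
#define RING_MASK(i) ((i) & (RING_SIZE - 1))

struct ring {
	port_t buf[RING_SIZE];
	unsigned int cons, prod; /* free-running, never reset on wrap */
	int overflow;
};

/* Producer side, corresponding to evtchn_interrupt(). */
static void ring_put(struct ring *r, port_t port)
{
	if (r->prod - r->cons < RING_SIZE)
		r->buf[RING_MASK(r->prod++)] = port;
	else
		r->overflow = 1;
}

/* Consumer side, corresponding to evtchn_read(): copy up to max ports,
 * in two chunks split at the wrap point. */
static int ring_get(struct ring *r, port_t *out, unsigned int max)
{
	unsigned int c = r->cons, p = r->prod, n1, n2;

	if (r->overflow)
		return -1;

	if ((c ^ p) & RING_SIZE) {      /* pending entries wrap */
		n1 = RING_SIZE - RING_MASK(c);
		n2 = RING_MASK(p);
	} else {                        /* one contiguous chunk */
		n1 = p - c;
		n2 = 0;
	}

	/* Truncate to the caller's limit, as evtchn_read() does. */
	if (n1 > max) {
		n1 = max;
		n2 = 0;
	} else if (n1 + n2 > max) {
		n2 = max - n1;
	}

	memcpy(out, &r->buf[RING_MASK(c)], n1 * sizeof(port_t));
	memcpy(out + n1, &r->buf[0], n2 * sizeof(port_t));
	r->cons += n1 + n2;
	return (int)(n1 + n2);
}
```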

2009-03-13 08:22:40

by Jeremy Fitzhardinge

Subject: [PATCH 23/24] xen: use device model for suspending xenbus devices

From: Ian Campbell <[email protected]>

Signed-off-by: Ian Campbell <[email protected]>
Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/xen/manage.c | 9 ++++-----
drivers/xen/xenbus/xenbus_probe.c | 37 +++++++++----------------------------
drivers/xen/xenbus/xenbus_xs.c | 2 ++
include/xen/xenbus.h | 2 +-
4 files changed, 16 insertions(+), 34 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 3ccd348..0489ea2 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -104,9 +104,8 @@ static void do_suspend(void)
goto out;
}

- printk("suspending xenbus...\n");
- /* XXX use normal device tree? */
- xenbus_suspend();
+ printk(KERN_DEBUG "suspending xenstore...\n");
+ xs_suspend();

err = stop_machine(xen_suspend, &cancelled, cpumask_of(0));
if (err) {
@@ -116,9 +115,9 @@ static void do_suspend(void)

if (!cancelled) {
xen_arch_resume();
- xenbus_resume();
+ xs_resume();
} else
- xenbus_suspend_cancel();
+ xs_suspend_cancel();

device_resume(PMSG_RESUME);

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index bd20361..4649213 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -71,6 +71,9 @@ static int xenbus_probe_frontend(const char *type, const char *name);

static void xenbus_dev_shutdown(struct device *_dev);

+static int xenbus_dev_suspend(struct device *dev, pm_message_t state);
+static int xenbus_dev_resume(struct device *dev);
+
/* If something in array of ids matches this device, return it. */
static const struct xenbus_device_id *
match_device(const struct xenbus_device_id *arr, struct xenbus_device *dev)
@@ -188,6 +191,9 @@ static struct xen_bus_type xenbus_frontend = {
.remove = xenbus_dev_remove,
.shutdown = xenbus_dev_shutdown,
.dev_attrs = xenbus_dev_attrs,
+
+ .suspend = xenbus_dev_suspend,
+ .resume = xenbus_dev_resume,
},
};

@@ -669,7 +675,7 @@ static struct xenbus_watch fe_watch = {
.callback = frontend_changed,
};

-static int suspend_dev(struct device *dev, void *data)
+static int xenbus_dev_suspend(struct device *dev, pm_message_t state)
{
int err = 0;
struct xenbus_driver *drv;
@@ -682,14 +688,14 @@ static int suspend_dev(struct device *dev, void *data)
drv = to_xenbus_driver(dev->driver);
xdev = container_of(dev, struct xenbus_device, dev);
if (drv->suspend)
- err = drv->suspend(xdev);
+ err = drv->suspend(xdev, state);
if (err)
printk(KERN_WARNING
"xenbus: suspend %s failed: %i\n", dev_name(dev), err);
return 0;
}

-static int resume_dev(struct device *dev, void *data)
+static int xenbus_dev_resume(struct device *dev)
{
int err;
struct xenbus_driver *drv;
@@ -734,31 +740,6 @@ static int resume_dev(struct device *dev, void *data)
return 0;
}

-void xenbus_suspend(void)
-{
- DPRINTK("");
-
- bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, suspend_dev);
- xenbus_backend_suspend(suspend_dev);
- xs_suspend();
-}
-EXPORT_SYMBOL_GPL(xenbus_suspend);
-
-void xenbus_resume(void)
-{
- xb_init_comms();
- xs_resume();
- bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, resume_dev);
- xenbus_backend_resume(resume_dev);
-}
-EXPORT_SYMBOL_GPL(xenbus_resume);
-
-void xenbus_suspend_cancel(void)
-{
- xs_suspend_cancel();
-}
-EXPORT_SYMBOL_GPL(xenbus_suspend_cancel);
-
/* A flag to determine if xenstored is 'ready' (i.e. has started) */
int xenstored_ready = 0;

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index e325eab..eab33f1 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -673,6 +673,8 @@ void xs_resume(void)
struct xenbus_watch *watch;
char token[sizeof(watch) * 2 + 1];

+ xb_init_comms();
+
mutex_unlock(&xs_state.response_mutex);
mutex_unlock(&xs_state.request_mutex);
up_write(&xs_state.transaction_mutex);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 0836772..b9763ba 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -91,7 +91,7 @@ struct xenbus_driver {
void (*otherend_changed)(struct xenbus_device *dev,
enum xenbus_state backend_state);
int (*remove)(struct xenbus_device *dev);
- int (*suspend)(struct xenbus_device *dev);
+ int (*suspend)(struct xenbus_device *dev, pm_message_t state);
int (*resume)(struct xenbus_device *dev);
int (*uevent)(struct xenbus_device *, char **, int, char *, int);
struct device_driver driver;
--
1.6.0.6
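
With this change, xenbus devices are suspended through the driver core's bus `.suspend` hook, and the `pm_message_t` is passed down to each driver's callback. xenbus_dev_suspend() logs a driver failure but still returns 0, so one misbehaving device does not abort the save. The model below illustrates that dispatch. It uses hypothetical stand-in types (`pm_msg`, `fake_driver`, `fake_xdev`) rather than the kernel structures and assumes that devices with no bound driver are skipped.

```c
#include <stdio.h>

/* Hypothetical stand-ins for pm_message_t and the xenbus structures. */
typedef struct { int event; } pm_msg;
struct fake_xdev;
struct fake_driver {
	int (*suspend)(struct fake_xdev *dev, pm_msg state);
};
struct fake_xdev {
	const char *name;
	struct fake_driver *drv;
};

static int failed_suspends; /* counts logged driver failures */
static int last_event = -1; /* state seen by example_suspend() */

/* Models xenbus_dev_suspend(): forward the state, log errors, and
 * never fail the bus-level suspend. */
static int bus_suspend(struct fake_xdev *xdev, pm_msg state)
{
	int err = 0;

	if (xdev->drv == NULL) /* assumed: unbound devices are skipped */
		return 0;
	if (xdev->drv->suspend)
		err = xdev->drv->suspend(xdev, state);
	if (err) {
		failed_suspends++;
		fprintf(stderr, "xenbus: suspend %s failed: %i\n",
			xdev->name, err);
	}
	return 0;
}

/* Example driver callback: records the state and reports failure. */
static int example_suspend(struct fake_xdev *dev, pm_msg state)
{
	(void)dev;
	last_event = state.event;
	return -5; /* pretend the device could not quiesce */
}
```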

2009-03-13 08:22:16

by Jeremy Fitzhardinge

Subject: [PATCH 21/24] xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet

From: Ian Campbell <[email protected]>

I needed this to compile, since there is no kexec support in the pvops kernel yet:
CC drivers/xen/sys-hypervisor.o
drivers/xen/sys-hypervisor.c: In function 'hyper_sysfs_init':
drivers/xen/sys-hypervisor.c:405: error: 'vmcoreinfo_size_xen' undeclared (first use in this function)
drivers/xen/sys-hypervisor.c:405: error: (Each undeclared identifier is reported only once
drivers/xen/sys-hypervisor.c:405: error: for each function it appears in.)
drivers/xen/sys-hypervisor.c:406: error: implicit declaration of function 'xen_sysfs_vmcoreinfo_init'
drivers/xen/sys-hypervisor.c: In function 'hyper_sysfs_exit':
drivers/xen/sys-hypervisor.c:433: error: 'vmcoreinfo_size_xen' undeclared (first use in this function)
drivers/xen/sys-hypervisor.c:434: error: implicit declaration of function 'xen_sysfs_vmcoreinfo_destroy'

Signed-off-by: Ian Campbell <[email protected]>
---
drivers/xen/sys-hypervisor.c | 41 -----------------------------------------
1 files changed, 0 insertions(+), 41 deletions(-)

diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 1267d6f..88a60e0 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -353,32 +353,6 @@ static void xen_properties_destroy(void)
sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
}

-#ifdef CONFIG_KEXEC
-
-extern size_t vmcoreinfo_size_xen;
-extern unsigned long paddr_vmcoreinfo_xen;
-
-static ssize_t vmcoreinfo_show(struct hyp_sysfs_attr *attr, char *page)
-{
- return sprintf(page, "%lx %zx\n",
- paddr_vmcoreinfo_xen, vmcoreinfo_size_xen);
-}
-
-HYPERVISOR_ATTR_RO(vmcoreinfo);
-
-static int __init xen_sysfs_vmcoreinfo_init(void)
-{
- return sysfs_create_file(hypervisor_kobj,
- &vmcoreinfo_attr.attr);
-}
-
-static void xen_sysfs_vmcoreinfo_destroy(void)
-{
- sysfs_remove_file(hypervisor_kobj, &vmcoreinfo_attr.attr);
-}
-
-#endif
-
static int __init hyper_sysfs_init(void)
{
int ret;
@@ -401,20 +375,9 @@ static int __init hyper_sysfs_init(void)
ret = xen_properties_init();
if (ret)
goto prop_out;
-#ifdef CONFIG_KEXEC
- if (vmcoreinfo_size_xen != 0) {
- ret = xen_sysfs_vmcoreinfo_init();
- if (ret)
- goto vmcoreinfo_out;
- }
-#endif

goto out;

-#ifdef CONFIG_KEXEC
-vmcoreinfo_out:
-#endif
- xen_properties_destroy();
prop_out:
xen_sysfs_uuid_destroy();
uuid_out:
@@ -429,10 +392,6 @@ out:

static void __exit hyper_sysfs_exit(void)
{
-#ifdef CONFIG_KEXEC
- if (vmcoreinfo_size_xen != 0)
- xen_sysfs_vmcoreinfo_destroy();
-#endif
xen_properties_destroy();
xen_compilation_destroy();
xen_sysfs_uuid_destroy();
--
1.6.0.6
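
In hyper_sysfs_init(), each error label undoes only the steps that have already succeeded, then falls through to the labels for earlier steps. This is why the `xen_properties_destroy()` call under the removed `vmcoreinfo_out:` label can be dropped as well: without CONFIG_KEXEC, it followed an unconditional `goto out` and was unreachable. The model below shows the same unwinding idiom. The hypothetical `step_init()`/`step_destroy()` functions stand in for the `xen_sysfs_*_init`/`_destroy` pairs, and `fail_at` injects a failure.

```c
static int fail_at; /* step number that should fail; 0 means none */
static int live[4]; /* live[i] is 1 while step i's resources exist */

static int step_init(int i)
{
	if (i == fail_at)
		return -1;
	live[i] = 1;
	return 0;
}

static void step_destroy(int i)
{
	live[i] = 0;
}

static int model_init(void)
{
	int ret;

	ret = step_init(1);
	if (ret)
		goto out;
	ret = step_init(2);
	if (ret)
		goto step2_out;
	ret = step_init(3);
	if (ret)
		goto step3_out;

	goto out; /* success: keep everything registered */

step3_out:
	step_destroy(2); /* step 3 failed: undo step 2, then ... */
step2_out:
	step_destroy(1); /* ... undo step 1 */
out:
	return ret;
}
```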

2009-03-13 08:23:06

by Jeremy Fitzhardinge

Subject: [PATCH 19/24] xen: add /sys/hypervisor support

From: Jeremy Fitzhardinge <[email protected]>

Adds support for Xen info under /sys/hypervisor. Taken from the Novell
2.6.27 backport tree.
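
The version/major and version/minor files decode the single integer returned by the XENVER_version hypercall. Xen packs this value as major:minor in 16:16 bits, and a zero result is reported as -ENODEV. The sketch below shows the decoding. `decode_xen_version()` is a hypothetical helper that returns -1 in place of -ENODEV. It keeps the full 16-bit minor field, whereas minor_show() masks only the low 8 bits; the two agree for any minor number below 256.

```c
/* Split the XENVER_version result into its 16-bit major and minor
 * fields; returns -1 for a zero result. */
static int decode_xen_version(int version, int *maj, int *min)
{
	if (version == 0)
		return -1;
	*maj = version >> 16;
	*min = version & 0xffff;
	return 0;
}
```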

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
drivers/xen/Kconfig | 10 +
drivers/xen/Makefile | 3 +-
drivers/xen/sys-hypervisor.c | 475 +++++++++++++++++++++++++++++++++++++++
include/xen/interface/version.h | 3 +
4 files changed, 490 insertions(+), 1 deletions(-)
create mode 100644 drivers/xen/sys-hypervisor.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 526187c..88bca1c 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -41,3 +41,13 @@ config XEN_COMPAT_XENFS
a xen platform.
If in doubt, say yes.

+config XEN_SYS_HYPERVISOR
+ bool "Create xen entries under /sys/hypervisor"
+ depends on XEN && SYSFS
+ select SYS_HYPERVISOR
+ default y
+ help
+ Create entries under /sys/hypervisor describing the Xen
+ hypervisor environment. When running native or in another
+ virtual environment, /sys/hypervisor will still be present,
+ but will have no xen contents.
\ No newline at end of file
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index ff8accc..f3603a3 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -4,4 +4,5 @@ obj-y += xenbus/
obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
obj-$(CONFIG_XEN_BALLOON) += balloon.o
-obj-$(CONFIG_XENFS) += xenfs/
\ No newline at end of file
+obj-$(CONFIG_XENFS) += xenfs/
+obj-$(CONFIG_XEN_SYS_HYPERVISOR) += sys-hypervisor.o
diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
new file mode 100644
index 0000000..cb29d1c
--- /dev/null
+++ b/drivers/xen/sys-hypervisor.c
@@ -0,0 +1,475 @@
+/*
+ * copyright (c) 2006 IBM Corporation
+ * Authored by: Mike D. Day <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kobject.h>
+
+#include <asm/xen/hypervisor.h>
+#include <asm/xen/hypercall.h>
+
+#include <xen/xenbus.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/version.h>
+
+#define HYPERVISOR_ATTR_RO(_name) \
+static struct hyp_sysfs_attr _name##_attr = __ATTR_RO(_name)
+
+#define HYPERVISOR_ATTR_RW(_name) \
+static struct hyp_sysfs_attr _name##_attr = \
+ __ATTR(_name, 0644, _name##_show, _name##_store)
+
+struct hyp_sysfs_attr {
+ struct attribute attr;
+ ssize_t (*show)(struct hyp_sysfs_attr *, char *);
+ ssize_t (*store)(struct hyp_sysfs_attr *, const char *, size_t);
+ void *hyp_attr_data;
+};
+
+static ssize_t type_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ return sprintf(buffer, "xen\n");
+}
+
+HYPERVISOR_ATTR_RO(type);
+
+static int __init xen_sysfs_type_init(void)
+{
+ return sysfs_create_file(hypervisor_kobj, &type_attr.attr);
+}
+
+static void xen_sysfs_type_destroy(void)
+{
+ sysfs_remove_file(hypervisor_kobj, &type_attr.attr);
+}
+
+/* xen version attributes */
+static ssize_t major_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int version = HYPERVISOR_xen_version(XENVER_version, NULL);
+ if (version)
+ return sprintf(buffer, "%d\n", version >> 16);
+ return -ENODEV;
+}
+
+HYPERVISOR_ATTR_RO(major);
+
+static ssize_t minor_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int version = HYPERVISOR_xen_version(XENVER_version, NULL);
+ if (version)
+ return sprintf(buffer, "%d\n", version & 0xff);
+ return -ENODEV;
+}
+
+HYPERVISOR_ATTR_RO(minor);
+
+static ssize_t extra_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret = -ENOMEM;
+ char *extra;
+
+ extra = kmalloc(XEN_EXTRAVERSION_LEN, GFP_KERNEL);
+ if (extra) {
+ ret = HYPERVISOR_xen_version(XENVER_extraversion, extra);
+ if (!ret)
+ ret = sprintf(buffer, "%s\n", extra);
+ kfree(extra);
+ }
+
+ return ret;
+}
+
+HYPERVISOR_ATTR_RO(extra);
+
+static struct attribute *version_attrs[] = {
+ &major_attr.attr,
+ &minor_attr.attr,
+ &extra_attr.attr,
+ NULL
+};
+
+static struct attribute_group version_group = {
+ .name = "version",
+ .attrs = version_attrs,
+};
+
+static int __init xen_sysfs_version_init(void)
+{
+ return sysfs_create_group(hypervisor_kobj, &version_group);
+}
+
+static void xen_sysfs_version_destroy(void)
+{
+ sysfs_remove_group(hypervisor_kobj, &version_group);
+}
+
+/* UUID */
+
+static ssize_t uuid_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ char *vm, *val;
+ int ret;
+ extern int xenstored_ready;
+
+ if (!xenstored_ready)
+ return -EBUSY;
+
+ vm = xenbus_read(XBT_NIL, "vm", "", NULL);
+ if (IS_ERR(vm))
+ return PTR_ERR(vm);
+ val = xenbus_read(XBT_NIL, vm, "uuid", NULL);
+ kfree(vm);
+ if (IS_ERR(val))
+ return PTR_ERR(val);
+ ret = sprintf(buffer, "%s\n", val);
+ kfree(val);
+ return ret;
+}
+
+HYPERVISOR_ATTR_RO(uuid);
+
+static int __init xen_sysfs_uuid_init(void)
+{
+ return sysfs_create_file(hypervisor_kobj, &uuid_attr.attr);
+}
+
+static void xen_sysfs_uuid_destroy(void)
+{
+ sysfs_remove_file(hypervisor_kobj, &uuid_attr.attr);
+}
+
+/* xen compilation attributes */
+
+static ssize_t compiler_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret = -ENOMEM;
+ struct xen_compile_info *info;
+
+ info = kmalloc(sizeof(struct xen_compile_info), GFP_KERNEL);
+ if (info) {
+ ret = HYPERVISOR_xen_version(XENVER_compile_info, info);
+ if (!ret)
+ ret = sprintf(buffer, "%s\n", info->compiler);
+ kfree(info);
+ }
+
+ return ret;
+}
+
+HYPERVISOR_ATTR_RO(compiler);
+
+static ssize_t compiled_by_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret = -ENOMEM;
+ struct xen_compile_info *info;
+
+ info = kmalloc(sizeof(struct xen_compile_info), GFP_KERNEL);
+ if (info) {
+ ret = HYPERVISOR_xen_version(XENVER_compile_info, info);
+ if (!ret)
+ ret = sprintf(buffer, "%s\n", info->compile_by);
+ kfree(info);
+ }
+
+ return ret;
+}
+
+HYPERVISOR_ATTR_RO(compiled_by);
+
+static ssize_t compile_date_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret = -ENOMEM;
+ struct xen_compile_info *info;
+
+ info = kmalloc(sizeof(struct xen_compile_info), GFP_KERNEL);
+ if (info) {
+ ret = HYPERVISOR_xen_version(XENVER_compile_info, info);
+ if (!ret)
+ ret = sprintf(buffer, "%s\n", info->compile_date);
+ kfree(info);
+ }
+
+ return ret;
+}
+
+HYPERVISOR_ATTR_RO(compile_date);
+
+static struct attribute *xen_compile_attrs[] = {
+ &compiler_attr.attr,
+ &compiled_by_attr.attr,
+ &compile_date_attr.attr,
+ NULL
+};
+
+static struct attribute_group xen_compilation_group = {
+ .name = "compilation",
+ .attrs = xen_compile_attrs,
+};
+
+static int __init xen_compilation_init(void)
+{
+ return sysfs_create_group(hypervisor_kobj, &xen_compilation_group);
+}
+
+static void xen_compilation_destroy(void)
+{
+ sysfs_remove_group(hypervisor_kobj, &xen_compilation_group);
+}
+
+/* xen properties info */
+
+static ssize_t capabilities_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret = -ENOMEM;
+ char *caps;
+
+ caps = kmalloc(XEN_CAPABILITIES_INFO_LEN, GFP_KERNEL);
+ if (caps) {
+ ret = HYPERVISOR_xen_version(XENVER_capabilities, caps);
+ if (!ret)
+ ret = sprintf(buffer, "%s\n", caps);
+ kfree(caps);
+ }
+
+ return ret;
+}
+
+HYPERVISOR_ATTR_RO(capabilities);
+
+static ssize_t changeset_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret = -ENOMEM;
+ char *cset;
+
+ cset = kmalloc(XEN_CHANGESET_INFO_LEN, GFP_KERNEL);
+ if (cset) {
+ ret = HYPERVISOR_xen_version(XENVER_changeset, cset);
+ if (!ret)
+ ret = sprintf(buffer, "%s\n", cset);
+ kfree(cset);
+ }
+
+ return ret;
+}
+
+HYPERVISOR_ATTR_RO(changeset);
+
+static ssize_t virtual_start_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret = -ENOMEM;
+ struct xen_platform_parameters *parms;
+
+ parms = kmalloc(sizeof(struct xen_platform_parameters), GFP_KERNEL);
+ if (parms) {
+ ret = HYPERVISOR_xen_version(XENVER_platform_parameters,
+ parms);
+ if (!ret)
+ ret = sprintf(buffer, "%lx\n", parms->virt_start);
+ kfree(parms);
+ }
+
+ return ret;
+}
+
+HYPERVISOR_ATTR_RO(virtual_start);
+
+static ssize_t pagesize_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ int ret;
+
+ ret = HYPERVISOR_xen_version(XENVER_pagesize, NULL);
+ if (ret > 0)
+ ret = sprintf(buffer, "%x\n", ret);
+
+ return ret;
+}
+
+HYPERVISOR_ATTR_RO(pagesize);
+
+/* eventually there will be several more features to export */
+static ssize_t xen_feature_show(int index, char *buffer)
+{
+ int ret = -ENOMEM;
+ struct xen_feature_info *info;
+
+ info = kmalloc(sizeof(struct xen_feature_info), GFP_KERNEL);
+ if (info) {
+ info->submap_idx = index;
+ ret = HYPERVISOR_xen_version(XENVER_get_features, info);
+ if (!ret)
+ ret = sprintf(buffer, "%d\n", info->submap);
+ kfree(info);
+ }
+
+ return ret;
+}
+
+static ssize_t writable_pt_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+ return xen_feature_show(XENFEAT_writable_page_tables, buffer);
+}
+
+HYPERVISOR_ATTR_RO(writable_pt);
+
+static struct attribute *xen_properties_attrs[] = {
+ &capabilities_attr.attr,
+ &changeset_attr.attr,
+ &virtual_start_attr.attr,
+ &pagesize_attr.attr,
+ &writable_pt_attr.attr,
+ NULL
+};
+
+static struct attribute_group xen_properties_group = {
+ .name = "properties",
+ .attrs = xen_properties_attrs,
+};
+
+static int __init xen_properties_init(void)
+{
+ return sysfs_create_group(hypervisor_kobj, &xen_properties_group);
+}
+
+static void xen_properties_destroy(void)
+{
+ sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
+}
+
+#ifdef CONFIG_KEXEC
+
+extern size_t vmcoreinfo_size_xen;
+extern unsigned long paddr_vmcoreinfo_xen;
+
+static ssize_t vmcoreinfo_show(struct hyp_sysfs_attr *attr, char *page)
+{
+ return sprintf(page, "%lx %zx\n",
+ paddr_vmcoreinfo_xen, vmcoreinfo_size_xen);
+}
+
+HYPERVISOR_ATTR_RO(vmcoreinfo);
+
+static int __init xen_sysfs_vmcoreinfo_init(void)
+{
+ return sysfs_create_file(hypervisor_kobj,
+ &vmcoreinfo_attr.attr);
+}
+
+static void xen_sysfs_vmcoreinfo_destroy(void)
+{
+ sysfs_remove_file(hypervisor_kobj, &vmcoreinfo_attr.attr);
+}
+
+#endif
+
+static int __init hyper_sysfs_init(void)
+{
+ int ret;
+
+ if (!xen_domain())
+ return -ENODEV;
+
+ ret = xen_sysfs_type_init();
+ if (ret)
+ goto out;
+ ret = xen_sysfs_version_init();
+ if (ret)
+ goto version_out;
+ ret = xen_compilation_init();
+ if (ret)
+ goto comp_out;
+ ret = xen_sysfs_uuid_init();
+ if (ret)
+ goto uuid_out;
+ ret = xen_properties_init();
+ if (ret)
+ goto prop_out;
+#ifdef CONFIG_KEXEC
+ if (vmcoreinfo_size_xen != 0) {
+ ret = xen_sysfs_vmcoreinfo_init();
+ if (ret)
+ goto vmcoreinfo_out;
+ }
+#endif
+
+ goto out;
+
+#ifdef CONFIG_KEXEC
+vmcoreinfo_out:
+#endif
+ xen_properties_destroy();
+prop_out:
+ xen_sysfs_uuid_destroy();
+uuid_out:
+ xen_compilation_destroy();
+comp_out:
+ xen_sysfs_version_destroy();
+version_out:
+ xen_sysfs_type_destroy();
+out:
+ return ret;
+}
+
+static void __exit hyper_sysfs_exit(void)
+{
+#ifdef CONFIG_KEXEC
+ if (vmcoreinfo_size_xen != 0)
+ xen_sysfs_vmcoreinfo_destroy();
+#endif
+ xen_properties_destroy();
+ xen_compilation_destroy();
+ xen_sysfs_uuid_destroy();
+ xen_sysfs_version_destroy();
+ xen_sysfs_type_destroy();
+
+}
+module_init(hyper_sysfs_init);
+module_exit(hyper_sysfs_exit);
+
+static ssize_t hyp_sysfs_show(struct kobject *kobj,
+ struct attribute *attr,
+ char *buffer)
+{
+ struct hyp_sysfs_attr *hyp_attr;
+ hyp_attr = container_of(attr, struct hyp_sysfs_attr, attr);
+ if (hyp_attr->show)
+ return hyp_attr->show(hyp_attr, buffer);
+ return 0;
+}
+
+static ssize_t hyp_sysfs_store(struct kobject *kobj,
+ struct attribute *attr,
+ const char *buffer,
+ size_t len)
+{
+ struct hyp_sysfs_attr *hyp_attr;
+ hyp_attr = container_of(attr, struct hyp_sysfs_attr, attr);
+ if (hyp_attr->store)
+ return hyp_attr->store(hyp_attr, buffer, len);
+ return 0;
+}
+
+static struct sysfs_ops hyp_sysfs_ops = {
+ .show = hyp_sysfs_show,
+ .store = hyp_sysfs_store,
+};
+
+static struct kobj_type hyp_sysfs_kobj_type = {
+ .sysfs_ops = &hyp_sysfs_ops,
+};
+
+static int __init hypervisor_subsys_init(void)
+{
+ if (!xen_domain())
+ return -ENODEV;
+
+ hypervisor_kobj->ktype = &hyp_sysfs_kobj_type;
+ return 0;
+}
+device_initcall(hypervisor_subsys_init);
diff --git a/include/xen/interface/version.h b/include/xen/interface/version.h
index 453235e..e8b6519 100644
--- a/include/xen/interface/version.h
+++ b/include/xen/interface/version.h
@@ -57,4 +57,7 @@ struct xen_feature_info {
/* Declares the features reported by XENVER_get_features. */
#include "features.h"

+/* arg == NULL; returns host memory page size. */
+#define XENVER_pagesize 7
+
#endif /* __XEN_PUBLIC_VERSION_H__ */
--
1.6.0.6
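The hyp_sysfs_show()/hyp_sysfs_store() glue at the end of this patch receives only the generic struct attribute pointer. It uses container_of() to get back to the enclosing struct hyp_sysfs_attr and then calls that attribute's own show/store callback. The user-space sketch below reproduces the same dispatch under simplified assumptions. The two structs, the local container_of macro and all `demo_` names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Local stand-in for the kernel's container_of(): step back from a member
 * pointer to the start of the struct that embeds it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified versions of struct attribute / hyp_sysfs_attr. */
struct attribute {
	const char *name;
};

struct hyp_sysfs_attr {
	struct attribute attr;
	ssize_t (*show)(struct hyp_sysfs_attr *, char *);
};

static ssize_t demo_type_show(struct hyp_sysfs_attr *attr, char *buffer)
{
	(void)attr;
	return sprintf(buffer, "xen\n");
}

static struct hyp_sysfs_attr demo_type_attr = {
	.attr = { .name = "type" },
	.show = demo_type_show,
};

/* An attribute without a show callback, to exercise the fallback path. */
static struct hyp_sysfs_attr demo_noshow_attr = {
	.attr = { .name = "noshow" },
};

/* Generic entry point: only the embedded struct attribute is passed in. */
ssize_t demo_sysfs_show(struct attribute *attr, char *buffer)
{
	struct hyp_sysfs_attr *hyp_attr =
		container_of(attr, struct hyp_sysfs_attr, attr);

	if (hyp_attr->show)
		return hyp_attr->show(hyp_attr, buffer);
	return 0;
}
```

Because the generic layer never needs to know which attribute it was handed, adding another file only requires a new show function and a matching attribute definition.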

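hyper_sysfs_init() uses the kernel's usual goto-unwind idiom. Each label is reached when the step after it fails, and falling through the labels tears down the pieces that were already created, in reverse order. The following small, self-contained sketch follows the same control flow. It uses hypothetical `demo_` steps that log "+N" on creation and "-N" on teardown.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical log of setup/teardown calls so the unwind order is visible. */
static char demo_log[128];
static int demo_fail_step; /* which step should fail (0 = none) */

static int demo_init(int step)
{
	char entry[16];

	if (step == demo_fail_step)
		return -1;
	snprintf(entry, sizeof(entry), "+%d ", step);
	strcat(demo_log, entry);
	return 0;
}

static void demo_destroy(int step)
{
	char entry[16];

	snprintf(entry, sizeof(entry), "-%d ", step);
	strcat(demo_log, entry);
}

/* Same shape as hyper_sysfs_init(): a failure in step N jumps to the label
 * that destroys steps N-1 ... 1, and success skips all the labels. */
int demo_setup(int fail_step)
{
	int ret;

	demo_log[0] = '\0';
	demo_fail_step = fail_step;

	ret = demo_init(1);
	if (ret)
		goto out;
	ret = demo_init(2);
	if (ret)
		goto step2_failed;
	ret = demo_init(3);
	if (ret)
		goto step3_failed;
	goto out;

step3_failed:
	demo_destroy(2);
step2_failed:
	demo_destroy(1);
out:
	return ret;
}
```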
2009-03-13 10:23:19

by Jan Beulich

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

>>> Jeremy Fitzhardinge <[email protected]> 13.03.09 09:11 >>>
>From: Jeremy Fitzhardinge <[email protected]>
>
>Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE
>to be set. This confuses the kernel and it ends up crashing on
>an xsetbv instruction.
>
>At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of
>cpuid if we can't. This will produce a spurious error from Xen,
>but allows us to support XSAVE if/when Xen does.

As pointed out on an earlier thread, it seems inappropriate to do probing
like this when there is a cpuid feature flag (osxsave) that can be used to
determine whether XSAVE can be used. And even without that flag,
simply reading CR4 and checking whether osxsave is set there would
suffice. This is under the assumption that Xen's to-be-done implementation
of XSAVE support would match that of FXSAVE (Xen turns its support on
unconditionally and for all [pv] guests).

Jan
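Jan's alternative to probing can be sketched as a pure CPUID test. CPUID leaf 1 reports XSAVE support in ECX bit 26. It reports OSXSAVE, which reflects whether CR4.OSXSAVE is set, in ECX bit 27. Jan assumes that Xen would enable OSXSAVE itself, as it does for FXSAVE. Under that assumption, a guest could check both bits instead of trying to write CR4. This user-space version relies on GCC/Clang's <cpuid.h> __get_cpuid(). The `demo_` function names are hypothetical, and this is not code from the patch series.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_XSAVE   (UINT32_C(1) << 26) /* CPU implements XSAVE */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27) /* CR4.OSXSAVE is set */

/* XSAVE is usable only if the CPU implements it AND CR4.OSXSAVE has been
 * set (by the OS or, in the scenario above, by Xen). CPUID exposes both
 * facts, so no trial write to CR4 is needed. */
bool demo_xsave_usable(uint32_t leaf1_ecx)
{
	return (leaf1_ecx & CPUID1_ECX_XSAVE) &&
	       (leaf1_ecx & CPUID1_ECX_OSXSAVE);
}

/* Read CPUID.1:ECX on x86; report "no features" elsewhere. */
uint32_t demo_cpuid1_ecx(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return ecx;
#endif
	return 0;
}
```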

2009-03-13 10:23:35

by Jan Beulich

Subject: Re: [Xen-devel] [PATCH 22/24] xen: remove suspend_cancel hook

>>> Jeremy Fitzhardinge <[email protected]> 13.03.09 09:11 >>>
>From: Ian Campbell <[email protected]>
>
>Remove suspend_cancel hook from xenbus_driver, in preparation for using
>the device model for suspending.
>
>Signed-off-by: Ian Campbell <[email protected]>
>Signed-off-by: Jeremy Fitzhardinge <[email protected]>

Does that mean that there are no intentions to ever support the accelerator
stuff found in the 2.6.18-based Xenified Linux tree? That was the apparent
only user of the cancel hook...

Jan

2009-03-13 10:23:51

by Jan Beulich

Subject: Re: [Xen-devel] [PATCH 23/24] xen: use device model for suspending xenbus devices

>>> Jeremy Fitzhardinge <[email protected]> 13.03.09 09:11 >>>

Shouldn't this also include removing the explicit calls to
gnttab_{suspend,resume}() in favor of making gnttab a sysdev?

Jan

2009-03-13 15:13:33

by Jeremy Fitzhardinge

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

Jan Beulich wrote:
> As pointed out on an earlier thread, it seems inappropriate to do probing
> like this when there is a cpuid feature flag (osxsave) that can be used to
> determine whether XSAVE can be used. And even without that flag,
> simply reading CR4 and checking whether osxsave is set there would
> suffice. This is under the assumption that Xen's to-be-done implementation
> of XSAVE support would match that of FXSAVE (Xen turns its support on
> unconditionally and for all [pv] guests).

I didn't want to make too many assumptions about how Xen's XSAVE support
would look. In particular, I thought it might virtualize the state of
OSXSAVE to give the guest the honour of appearing to enable it. A guest
kernel may get confused if it starts with OSXSAVE set, as it may use it
to control its own init logic.

J

2009-03-13 15:17:22

by Jeremy Fitzhardinge

Subject: Re: [Xen-devel] [PATCH 22/24] xen: remove suspend_cancel hook

Jan Beulich wrote:
> Does that mean that there are no intentions to ever support the accelerator
> stuff found in the 2.6.18-based Xenified Linux tree? That was the apparent
> only user of the cancel hook...
>

I don't have any immediate plan to merge it myself, but I'm sure we can
either add this back or otherwise work out how to implement it in the
pvops kernel.

J

2009-03-15 18:50:26

by H. Peter Anvin

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

Jeremy Fitzhardinge wrote:
> Jan Beulich wrote:
>> As pointed out on an earlier thread, it seems inappropriate to do probing
>> like this when there is a cpuid feature flag (osxsave) that can be
>> used to
>> determine whether XSAVE can be used. And even without that flag,
>> simply reading CR4 and checking whether osxsave is set there would
>> suffice. This is under the assumption that Xen's to-be-done
>> implementation
>> of XSAVE support would match that of FXSAVE (Xen turns its support on
>> unconditionally and for all [pv] guests).
>
> I didn't want to make too many assumptions about how Xen's XSAVE support
> would look. In particular, I thought it might virtualize the state of
> OSXSAVE to give the guest the honour of appearing to enable it. A guest
> kernel may get confused if it starts with OSXSAVE set, as it may use it
> to control its own init logic.

That wouldn't be an issue if you use the *native* CPUID to look for
OSXSAVE early on, since such virtualization would only be visible through
the PV interface, right?

It seems cleaner than probing, to be sure...

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-03-15 21:03:23

by Jeremy Fitzhardinge

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>
>> Jan Beulich wrote:
>>
>>> As pointed out on an earlier thread, it seems inappropriate to do probing
>>> like this when there is a cpuid feature flag (osxsave) that can be
>>> used to
>>> determine whether XSAVE can be used. And even without that flag,
>>> simply reading CR4 and checking whether osxsave is set there would
>>> suffice. This is under the assumption that Xen's to-be-done
>>> implementation
>>> of XSAVE support would match that of FXSAVE (Xen turns its support on
>>> unconditionally and for all [pv] guests).
>>>
>> I didn't want to make too many assumptions about how Xen's XSAVE support
>> would look. In particular, I thought it might virtualize the state of
>> OSXSAVE to give the guest the honour of appearing to enable it. A guest
>> kernel may get confused if it starts with OSXSAVE set, as it may use it
>> to control its own init logic.
>>
>
> That wouldn't be an issue if you use the *native* CPUID to look for
> OSXSAVE early on, since such virtualization would only be visible through
> the PV interface, right?
>
> It seems cleaner than probing, to be sure...
>

Well, at the moment the problem is that cpuid (both PV and native) show
XSAVE, but Xen prevents cr4.OSXSAVE from being set, crashing the
kernel. There's now a patch in Xen to mask XSAVE in CPUID, so that
guests don't try to use it; the patch in the kernel is just to support
non-bleeding-edge versions of Xen.

There have been some patches floating around for Xen support of XSAVE,
but I think there are some issues with the variable-sized CPU context
and save/restore/migrate, so they've been put on the backburner until
there's a real need for them. I haven't looked at them, but I wouldn't
have assumed that Xen would necessarily set OSXSAVE for itself, or
require guests to do so (if a guest can make do with a simpler CPU
context structure, then that might be simpler for things like
cross-architecture migration, etc). I think that the only safe
assumption is that XSAVE is available iff cpuid.XSAVE is set, modulo the
bug mentioned above.

I guess if we support XSAVE for any vcpu, all the pcpus must have
OSXSAVE set, and we rely on the fact that the XSAVE format is compatible
with FXSAVE where they overlap. But I really don't know what happens
when guests use xsetbv and how that might be virtualized/paravirtualized.

J
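Jeremy's workaround is to try setting cr4.OSXSAVE at boot and hide XSAVE if that fails. This boils down to deriving a mask for later CPUID leaf 1 ECX results. The sketch below shows only that decision, with the CR4 write-and-read-back probe abstracted into a boolean. The function name is hypothetical, and the kernel patch's own implementation is not reproduced in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_XSAVE (UINT32_C(1) << 26)

/* Hypothetical helper: given CPUID.1:ECX as reported by the hypervisor and
 * whether a trial write of CR4.OSXSAVE actually took effect, return the
 * mask to apply to later CPUID.1:ECX results. */
uint32_t demo_leaf1_ecx_mask(uint32_t reported_ecx, bool osxsave_stuck)
{
	uint32_t mask = UINT32_MAX; /* pass every bit through by default */

	/* CPUID claims XSAVE, but the hypervisor refused to let us enable
	 * it: hide XSAVE so nothing later tries to execute xsetbv. */
	if ((reported_ecx & CPUID1_ECX_XSAVE) && !osxsave_stuck)
		mask &= ~CPUID1_ECX_XSAVE;

	return mask;
}
```

Once Xen itself masks XSAVE, the probe never fires and the mask stays all-ones. That matches Jeremy's point that the kernel side only exists for older hypervisors.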

2009-03-15 21:23:06

by H. Peter Anvin

Subject: Re: [PATCH 12/24] x86-64: remove PGE from must-have feature list

Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <[email protected]>
>
> PGE may not be available when running paravirtualized, so test the cpuid
> bit before using it.
>
> Signed-off-by: Jeremy Fitzhardinge <[email protected]>
> ---
> arch/x86/include/asm/required-features.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
> index d5cd6c5..a4737dd 100644
> --- a/arch/x86/include/asm/required-features.h
> +++ b/arch/x86/include/asm/required-features.h
> @@ -50,7 +50,7 @@
> #ifdef CONFIG_X86_64
> #define NEED_PSE 0
> #define NEED_MSR (1<<(X86_FEATURE_MSR & 31))
> -#define NEED_PGE (1<<(X86_FEATURE_PGE & 31))
> +#define NEED_PGE 0
> #define NEED_FXSR (1<<(X86_FEATURE_FXSR & 31))
> #define NEED_XMM (1<<(X86_FEATURE_XMM & 31))
> #define NEED_XMM2 (1<<(X86_FEATURE_XMM2 & 31))

This should be conditionalized on CONFIG_PARAVIRT, since doing this
removes real-hardware optimizations.

-hpa

2009-03-15 21:26:18

by Jeremy Fitzhardinge

Subject: Re: [PATCH 12/24] x86-64: remove PGE from must-have feature list

H. Peter Anvin wrote:
>> --- a/arch/x86/include/asm/required-features.h
>> +++ b/arch/x86/include/asm/required-features.h
>> @@ -50,7 +50,7 @@
>> #ifdef CONFIG_X86_64
>> #define NEED_PSE 0
>> #define NEED_MSR (1<<(X86_FEATURE_MSR & 31))
>> -#define NEED_PGE (1<<(X86_FEATURE_PGE & 31))
>> +#define NEED_PGE 0
>> #define NEED_FXSR (1<<(X86_FEATURE_FXSR & 31))
>> #define NEED_XMM (1<<(X86_FEATURE_XMM & 31))
>> #define NEED_XMM2 (1<<(X86_FEATURE_XMM2 & 31))
>>
>
> This should be conditionalized on CONFIG_PARAVIRT, since doing this
> removes real-hardware optimizations.
>

OK. Can do the same for PSE.

J
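hpa's request can be met by making the PGE requirement depend on CONFIG_PARAVIRT. Native-only builds then keep PGE in the required-features mask, which hpa notes is what enables real-hardware optimizations, while paravirtualized builds fall back to a runtime cpuid test. Below is a sketch of the header fragment. It assumes a local stand-in for X86_FEATURE_PGE (PGE is CPUID leaf 1 EDX bit 13, i.e. word 0 of the kernel's feature array), and the same shape would apply to NEED_PSE. It is not the header as eventually committed.

```c
/* Stand-in for <asm/cpufeature.h>: PGE lives in capability word 0, bit 13. */
#define X86_FEATURE_PGE (0*32+13)

#ifdef CONFIG_PARAVIRT
/* Under a hypervisor PGE may be hidden, so it cannot be required at
 * build time; code must test the cpuid bit before using global pages. */
#define NEED_PGE 0
#else
/* Native-only kernels keep PGE mandatory, so checks for it can be
 * treated as always true. */
#define NEED_PGE (1<<(X86_FEATURE_PGE & 31))
#endif
```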

2009-03-15 22:53:19

by Arjan van de Ven

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

On Sun, 15 Mar 2009 14:03:10 -0700
Jeremy Fitzhardinge <[email protected]> wrote:

> H. Peter Anvin wrote:
> > Jeremy Fitzhardinge wrote:
> >
> >> Jan Beulich wrote:
> >>
> >>> As pointed out on an earlier thread, it seems inappropriate to do
> >>> probing like this when there is a cpuid feature flag (osxsave)
> >>> that can be used to
> >>> determine whether XSAVE can be used. And even without that flag,
> >>> simply reading CR4 and checking whether osxsave is set there would
> >>> suffice. This is under the assumption that Xen's to-be-done
> >>> implementation
> >>> of XSAVE support would match that of FXSAVE (Xen turns its
> >>> support on unconditionally and for all [pv] guests).
> >>>
> >> I didn't want to make too many assumptions about how Xen's XSAVE
> >> support would look. In particular, I thought it might virtualize
> >> the state of OSXSAVE to give the guest the honour of appearing to
> >> enable it. A guest kernel may get confused if it starts with
> >> OSXSAVE set, as it may use it to control its own init logic.
> >>
> >
> > That wouldn't be an issue if you use the *native* CPUID to look for
> > OSXSAVE early on, since such virtualization would only be visible
> > through the PV interface, right?
> >
> > It seems cleaner than probing, to be sure...
> >
>
> Well, at the moment the problem is that cpuid (both PV and native)
> show XSAVE, but Xen prevents cr4.OSXSAVE from being set, crashing the
> kernel. There's now a patch in Xen to mask XSAVE in CPUID, so that
> guests don't try to use it; the patch in the kernel is just to
> support non-bleeding-edge versions of Xen.

This is indicative of something that might be a huge bug in Xen:
Xen should never ever pass through CPUID bits it does not know.
If Xen does not honor that, there is a fundamental and eternally
recurring problem.... every time something new gets introduced Xen
likely breaks.


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-03-16 00:05:43

by Jeremy Fitzhardinge

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

Arjan van de Ven wrote:
> This is indicative of something that might be a huge bug in Xen:
> Xen should never ever pass through CPUID bits it does not know.
> If Xen does not honor that, there is a fundamental and eternally
> recurring problem.... every time something new gets introduced Xen
> likely breaks.

Yes, I'd agree; Xen should whitelist cpu capabilities rather than
blacklist them. Jan expressed the opposite opinion (on the grounds that
it precludes using features which don't require special OS or hypervisor
support without Xen modifications). But if its just a matter of
sticking a bit into a mask, its easy and quick to roll a new version of
Xen (esp since it can generally be done before CPUs with the new feature
get into people's hands).

J
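The whitelist/blacklist distinction can be shown on a single CPUID register. This sketch uses hypothetical function names; bit 26 of CPUID leaf 1 ECX is XSAVE. A blacklist written before anyone knew XSAVE was a problem lets the bit through to the guest. A whitelist hides it until it is explicitly approved.

```c
#include <stdint.h>

/* Whitelist: only bits this hypervisor version knows how to virtualize
 * reach the guest; anything new stays hidden until the list is updated. */
uint32_t demo_whitelist_filter(uint32_t host_bits, uint32_t known_safe)
{
	return host_bits & known_safe;
}

/* Blacklist: every bit passes except those already known to be unsafe,
 * so a feature the hypervisor has never heard of leaks through. */
uint32_t demo_blacklist_filter(uint32_t host_bits, uint32_t known_unsafe)
{
	return host_bits & ~known_unsafe;
}
```

The trade-off is the one argued in the thread. The whitelist is safe by default but needs a hypervisor update to expose each new feature. The blacklist exposes new features immediately, at the risk of advertising something the hypervisor cannot actually support.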

2009-03-16 00:11:18

by Arjan van de Ven

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

On Sun, 15 Mar 2009 17:05:26 -0700
Jeremy Fitzhardinge <[email protected]> wrote:

> Arjan van de Ven wrote:
> > This is indicative of something that might be a huge bug in Xen:
> > Xen should never ever pass through CPUID bits it does not know.
> > If Xen does not honor that, there is a fundamental and eternally
> > recurring problem.... every time something new gets introduced Xen
> > likely breaks.
>
> Yes, I'd agree; Xen should whitelist cpu capabilities rather than
> blacklist them. Jan expressed the opposite opinion (on the grounds
> that it precludes using features which don't require special OS or
> hypervisor support without Xen modifications).

Well.. pretty much all new instructions need Xen modifications due to
the need to be emulated to deal with traps/vmexits/etc, right?
So I don't quite see many cpuid bits that would NOT involve some Xen
modification or another ;)

2009-03-16 01:06:52

by H. Peter Anvin

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

Arjan van de Ven wrote:
>
> Well.. pretty much all new instructions need Xen modifications due to
> the need to be emulated to deal with traps/vmexits/etc, right?
> So I don't quite see many cpuid bits that would NOT involve some Xen
> modification or another ;)
>

There are going to be a very small number which need only userspace
support. However, there will be absolutely no way for Xen to know this
a priori.

-hpa

2009-03-16 14:17:12

by Jan Beulich

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

>>> Arjan van de Ven <[email protected]> 16.03.09 01:09 >>>
>Well.. pretty much all new instructions need Xen modifications due to
>the need to be emulated to deal with traps/vmexits/etc, right?
>So I don't quite see many cpuid bits that would NOT involve some Xen
>modification or another ;)

No, new (user-mode accessible) instructions represent precisely the kind
of extension that do not require hypervisor (or OS) awareness (see SSE2
etc, AES, FMA). New registers otoh are examples of where awareness is
needed (SSE, AVX), as would be new privileged instructions.

Jan

2009-03-16 14:29:26

by Arjan van de Ven

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

On Mon, 16 Mar 2009 14:16:32 +0000
"Jan Beulich" <[email protected]> wrote:

> >>> Arjan van de Ven <[email protected]> 16.03.09 01:09 >>>
> >Well.. pretty much all new instructions need Xen modifications due to
> >the need to be emulated to deal with traps/vmexits/etc, right?
> >So I don't quite see many cpuid bits that would NOT involve some Xen
> >modification or another ;)
>
> No, new (user-mode accessible) instructions represent precisely the
> kind of extension that do not require hypervisor (or OS) awareness
> (see SSE2 etc, AES, FMA).

so Xen doesn't need to handle a case where the kernel does AES on
uncached IO memory ?

2009-03-17 00:02:26

by Andi Kleen

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

"Jan Beulich" <[email protected]> writes:

>>>> Arjan van de Ven <[email protected]> 16.03.09 01:09 >>>
>>Well.. pretty much all new instructions need Xen modifications due to
>>the need to be emulated to deal with traps/vmexits/etc, right?
>>So I don't quite see many cpuid bits that would NOT involve some Xen
>>modification or another ;)
>
> No, new (user-mode accessible) instructions represent precisely the kind
> of extension that do not require hypervisor (or OS) awareness (see SSE2
> etc, AES, FMA). New registers otoh are examples of where awareness is
> needed (SSE, AVX), as would be new privileged instructions.

Why would another hypothetical FP register extension need Xen support
once it gets proper XSAVE support? I can't think of a reason why
(assuming XSAVE support) it would need to know of a new kind of
FP register or similar. They very likely won't appear in any
instructions that need mmio. Or are you worried about the real
mode emulator?

-Andi

--
[email protected] -- Speaking for myself only.

2009-03-17 01:42:32

by H. Peter Anvin

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

Andi Kleen wrote:
> "Jan Beulich" <[email protected]> writes:
>
>>>>> Arjan van de Ven <[email protected]> 16.03.09 01:09 >>>
>>> Well.. pretty much all new instructions need Xen modifications due to
>>> the need to be emulated to deal with traps/vmexits/etc, right?
>>> So I don't quite see many cpuid bits that would NOT involve some Xen
>>> modification or another ;)
>> No, new (user-mode accessible) instructions represent precisely the kind
>> of extension that do not require hypervisor (or OS) awareness (see SSE2
>> etc, AES, FMA). New registers otoh are examples of where awareness is
>> needed (SSE, AVX), as would be new privileged instructions.
>
> Why would another hypothetical FP register extension need Xen support
> once it gets proper XSAVE support? I can't think of a reason why
> (assuming XSAVE support) it would need to know of a new kind of
> FP register or similar. They very likely won't appear in any
> instructions that need mmio. Or are you worried about the real
> mode emulator?
>

The point is YOU DON'T KNOW. In particular, there might be new traps,
there might be new state, there might be new MSRs, there might be new
control bits... anything. Therefore, you cannot blindly pass the bit
on, even though XSAVE solves one part of the problem.

-hpa

2009-03-17 07:52:49

by Jan Beulich

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

>>> Andi Kleen <[email protected]> 17.03.09 00:59 >>>
>"Jan Beulich" <[email protected]> writes:
>
>>>>> Arjan van de Ven <[email protected]> 16.03.09 01:09 >>>
>>>Well.. pretty much all new instructions need Xen modifications due to
>>>the need to be emulated to deal with traps/vmexits/etc, right?
>>>So I don't quite see many cpuid bits that would NOT involve some Xen
>>>modification or another ;)
>>
>> No, new (user-mode accessible) instructions represent precisely the kind
>> of extension that do not require hypervisor (or OS) awareness (see SSE2
>> etc, AES, FMA). New registers otoh are examples of where awareness is
>> needed (SSE, AVX), as would be new privileged instructions.
>
>Why would another hypothetical FP register extension need Xen support
>once it gets proper XSAVE support? I can't think of a reason why
>(assuming XSAVE support) it would need to know of a new kind of
>FP register or similar. They very likely won't appear in any
>instructions that need mmio. Or are you worried about the real
>mode emulator?

No, properly coded xsave support will (hopefully) make user-visible context
extensions transparent to hypervisor and OS. But I was giving a general
example here, and the change from xmm to ymm registers is one that does
need hypervisor (and OS) changes.

Jan

2009-03-17 10:48:10

by Andi Kleen

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

> No, properly coded xsave support will (hopefully) make user-visible context
> extensions transparent to hypervisor and OS. But I was giving a general
> example here, and the change from xmm to ymm registers is one that does
> need hypervisor (and OS) changes.

Again except for XSAVE support it doesn't, does it?

-Andi

2009-03-17 10:55:00

by Jan Beulich

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

>>> Andi Kleen <[email protected]> 17.03.09 11:48 >>>
>> No, properly coded xsave support will (hopefully) make user-visible context
>> extensions transparent to hypervisor and OS. But I was giving a general
>> example here, and the change from xmm to ymm registers is one that does
>> need hypervisor (and OS) changes.
>
>Again except for XSAVE support it doesn't, does it?

No, it shouldn't (minus the valid comments hpa added in another reply).

Jan

2009-03-17 11:56:20

by Andi Kleen

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

> The point is YOU DON'T KNOW. In particular, there might be new traps,
> there might be new state, there might be new MSRs, there might be new
> control bits... anything. Therefore, you cannot blindly pass the bit
> on, even though XSAVE solves one part of the problem.

I think what will happen if you don't expose it is that there will
be always hypervisors which are behind and applications/OS will end up
doing probing for opcodes instead of trusting CPUID bits.

Probably not what you intended.

-Andi

2009-03-17 15:52:17

by Arjan van de Ven

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

On Tue, 17 Mar 2009 12:56:21 +0100
Andi Kleen <[email protected]> wrote:

> > The point is YOU DON'T KNOW. In particular, there might be new
> > traps, there might be new state, there might be new MSRs, there
> > might be new control bits... anything. Therefore, you cannot
> > blindly pass the bit on, even though XSAVE solves one part of the
> > problem.
>
> I think what will happen if you don't expose it is that there will
> be always hypervisors which are behind and applications/OS will end up
> doing probing for opcodes instead of trusting CPUID bits.
>
> Probably not what you intended.
>

well the choice fundamentally is
1) Have correct applications work, even though you might not always get
all new features that the hardware could have done.. at the expense
that someone who wants to do horrible things can
2) Have all latest features always there, but break correctly written
apps/oses every 2 years.

I'd go for option 1 any day of the week, hands down.
Esp if the "cpu cloaking" kind of things really disable the
instructions... but even without.

2009-03-17 15:55:45

by Andi Kleen

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

> well the choice fundamentally is
> 1) Have correct applications work, even though you might not always get
> all new features that the hardware could have done.. at the expense
> that someone who wants to do horrible things can
> 2) Have all latest features always there, but break correctly written
> apps/oses every 2 years.

I'm not sure there will be that much breakage. It seems more like
a theoretical danger.

>
> I'd go for option 1 any day of the week, hands down.
> Esp if the "cpu cloaking" kind of things really disable the
> instructions... but even without.

With cpu cloaking and disabling unknown instructions it would be fine
to go conservative. But that's not what is being proposed.

-Andi

2009-03-17 15:56:21

by H. Peter Anvin

Subject: Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid

Andi Kleen wrote:
>> The point is YOU DON'T KNOW. In particular, there might be new traps,
>> there might be new state, there might be new MSRs, there might be new
>> control bits... anything. Therefore, you cannot blindly pass the bit
>> on, even though XSAVE solves one part of the problem.
>
> I think what will happen if you don't expose it is that there will
> be always hypervisors which are behind and applications/OS will end up
> doing probing for opcodes instead of trusting CPUID bits.
>
> Probably not what you intended.
>

Probing for opcodes is even more harmful, though. But yes, we don't
have a good answer to this, and I believe we *can't* have a good answer
to this either -- we could architect the CPUID instruction a bit
differently, but that doesn't account for the various needs of
different hypervisors.

Hypervisor vendors can of course make this easier by making their CPUID
code pluggable so the end user can "hotfix" upgrade it without upgrading
the hypervisor (which makes a lot of them nervous.)

-hpa
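One way to read hpa's suggestion of pluggable CPUID code is as a table of per-leaf overrides that an administrator can update without replacing the hypervisor. The sketch below only illustrates that idea. The table format, the struct and the function are hypothetical and do not correspond to any existing Xen interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-supplied override for one CPUID leaf/register:
 * bits cleared in and_mask are hidden, bits set in or_mask are forced on. */
struct demo_cpuid_override {
	uint32_t leaf;
	int reg;            /* 0 = EAX, 1 = EBX, 2 = ECX, 3 = EDX */
	uint32_t and_mask;
	uint32_t or_mask;
};

/* Apply every override that matches `leaf` to the host's CPUID output. */
void demo_apply_overrides(uint32_t leaf, uint32_t regs[4],
			  const struct demo_cpuid_override *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].leaf != leaf || tbl[i].reg < 0 || tbl[i].reg > 3)
			continue;
		regs[tbl[i].reg] = (regs[tbl[i].reg] & tbl[i].and_mask) |
				   tbl[i].or_mask;
	}
}
```

With such a table, hiding a newly problematic bit (for example XSAVE, leaf 1 ECX bit 26) becomes a data change instead of a hypervisor rebuild. Deciding which bits are safe still requires the knowledge hpa says the hypervisor cannot have a priori.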