2007-12-20 20:07:15

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 0/15] adjust pvops to accomodate its x86_64 variant

Hi folks,

With this series, the bulk of the work of pvops64 is done.
Here, I integrate most of the paravirt.c and paravirt.h files, making
them applicable to both architectures.

CONFIG_PARAVIRT is _not_ present yet. Basically, this code is missing page
table integration (patches currently being worked on by Jeremy).

Enjoy


2007-12-20 20:08:28

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 02/15] adjust PVOP_CALL/VCALL macros for x86_64

This patch adjust the PVOP_VCALL and PVOP_CALL macros to
work with x86_64. It has a different calling convention, and
we use auxiliary macros to account for both calling conventions
as cleanly as possible

Comments are adjusted accordingly.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
include/asm-x86/paravirt.h | 87 +++++++++++++++++++++++++++++++++-----------
1 files changed, 65 insertions(+), 22 deletions(-)

Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:07:07.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:07:17.000000000 -0800
@@ -320,7 +320,7 @@
* runtime.
*
* Normally, a call to a pv_op function is a simple indirect call:
- * (paravirt_ops.operations)(args...).
+ * (pv_op_struct.operations)(args...).
*
* Unfortunately, this is a relatively slow operation for modern CPUs,
* because it cannot necessarily determine what the destination
@@ -330,11 +330,17 @@
* calls are essentially free, because the call and return addresses
* are completely predictable.)
*
- * These macros rely on the standard gcc "regparm(3)" calling
+ * For i386, these macros rely on the standard gcc "regparm(3)" calling
* convention, in which the first three arguments are placed in %eax,
* %edx, %ecx (in that order), and the remaining arguments are placed
* on the stack. All caller-save registers (eax,edx,ecx) are expected
* to be modified (either clobbered or used for return values).
+ * X86_64, on the other hand, already specifies a register-based calling
+ * conventions, returning at %rax, with parameteres going on %rdi, %rsi,
+ * %rdx, and %rcx. Note that for this reason, x86_64 does not need any
+ * special handling for dealing with 4 arguments, unlike i386.
+ * However, x86_64 also have to clobber all caller saved registers, which
+ * unfortunately, are quite a bit (r8 - r11)
*
* The call instruction itself is marked by placing its start address
* and size into the .parainstructions section, so that
@@ -357,10 +363,12 @@
* the return type. The macro then uses sizeof() on that type to
* determine whether its a 32 or 64 bit value, and places the return
* in the right register(s) (just %eax for 32-bit, and %edx:%eax for
- * 64-bit).
+ * 64-bit). For x86_64 machines, it just returns at %rax regardless of
+ * the return value size.
*
* 64-bit arguments are passed as a pair of adjacent 32-bit arguments
- * in low,high order.
+ * i386 also passes 64-bit arguments as a pair of adjacent 32-bit arguments
+ * in low,high order
*
* Small structures are passed and returned in registers. The macro
* calling convention can't directly deal with this, so the wrapper
@@ -370,46 +378,67 @@
* means that all uses must be wrapped in inline functions. This also
* makes sure the incoming and outgoing types are always correct.
*/
+#ifdef CONFIG_X86_32
+#define PVOP_VCALL_ARGS unsigned long __eax, __edx, __ecx
+#define PVOP_CALL_ARGS PVOP_VCALL_ARGS
+#define PVOP_VCALL_CLOBBERS "=a" (__eax), "=d" (__edx), \
+ "=c" (__ecx)
+#define PVOP_CALL_CLOBBERS PVOP_VCALL_CLOBBERS
+#define EXTRA_CLOBBERS
+#define VEXTRA_CLOBBERS
+#else
+#define PVOP_VCALL_ARGS unsigned long __edi, __esi, __edx, __ecx
+#define PVOP_CALL_ARGS PVOP_VCALL_ARGS, __eax
+#define PVOP_VCALL_CLOBBERS "=D" (__edi), \
+ "=S" (__esi), "=d" (__edx), \
+ "=c" (__ecx)
+
+#define PVOP_CALL_CLOBBERS PVOP_VCALL_CLOBBERS, "=a" (__eax)
+
+#define EXTRA_CLOBBERS , "r8", "r9", "r10", "r11"
+#define VEXTRA_CLOBBERS , "rax", "r8", "r9", "r10", "r11"
+#endif
+
#define __PVOP_CALL(rettype, op, pre, post, ...) \
({ \
rettype __ret; \
- unsigned long __eax, __edx, __ecx; \
+ PVOP_CALL_ARGS; \
+ /* This is 32-bit specific, but is okay in 64-bit */ \
+ /* since this condition will never hold */ \
if (sizeof(rettype) > sizeof(unsigned long)) { \
asm volatile(pre \
paravirt_alt(PARAVIRT_CALL) \
post \
- : "=a" (__eax), "=d" (__edx), \
- "=c" (__ecx) \
+ : PVOP_CALL_CLOBBERS \
: paravirt_type(op), \
paravirt_clobber(CLBR_ANY), \
##__VA_ARGS__ \
- : "memory", "cc"); \
+ : "memory", "cc" EXTRA_CLOBBERS); \
__ret = (rettype)((((u64)__edx) << 32) | __eax); \
} else { \
asm volatile(pre \
paravirt_alt(PARAVIRT_CALL) \
post \
- : "=a" (__eax), "=d" (__edx), \
- "=c" (__ecx) \
+ : PVOP_CALL_CLOBBERS \
: paravirt_type(op), \
paravirt_clobber(CLBR_ANY), \
##__VA_ARGS__ \
- : "memory", "cc"); \
+ : "memory", "cc" EXTRA_CLOBBERS); \
__ret = (rettype)__eax; \
} \
__ret; \
})
#define __PVOP_VCALL(op, pre, post, ...) \
({ \
- unsigned long __eax, __edx, __ecx; \
+ PVOP_VCALL_ARGS; \
asm volatile(pre \
paravirt_alt(PARAVIRT_CALL) \
post \
- : "=a" (__eax), "=d" (__edx), "=c" (__ecx) \
+ : PVOP_VCALL_CLOBBERS \
: paravirt_type(op), \
paravirt_clobber(CLBR_ANY), \
##__VA_ARGS__ \
- : "memory", "cc"); \
+ : "memory", "cc" VEXTRA_CLOBBERS); \
})

#define PVOP_CALL0(rettype, op) \
@@ -418,22 +447,26 @@
__PVOP_VCALL(op, "", "")

#define PVOP_CALL1(rettype, op, arg1) \
- __PVOP_CALL(rettype, op, "", "", "0" ((u32)(arg1)))
+ __PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)))
#define PVOP_VCALL1(op, arg1) \
- __PVOP_VCALL(op, "", "", "0" ((u32)(arg1)))
+ __PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)))

#define PVOP_CALL2(rettype, op, arg1, arg2) \
- __PVOP_CALL(rettype, op, "", "", "0" ((u32)(arg1)), "1" ((u32)(arg2)))
+ __PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)), \
+ "1" ((unsigned long)(arg2)))
#define PVOP_VCALL2(op, arg1, arg2) \
- __PVOP_VCALL(op, "", "", "0" ((u32)(arg1)), "1" ((u32)(arg2)))
+ __PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)), \
+ "1" ((unsigned long)(arg2)))

#define PVOP_CALL3(rettype, op, arg1, arg2, arg3) \
- __PVOP_CALL(rettype, op, "", "", "0" ((u32)(arg1)), \
- "1"((u32)(arg2)), "2"((u32)(arg3)))
+ __PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)), \
+ "1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)))
#define PVOP_VCALL3(op, arg1, arg2, arg3) \
- __PVOP_VCALL(op, "", "", "0" ((u32)(arg1)), "1"((u32)(arg2)), \
- "2"((u32)(arg3)))
+ __PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)), \
+ "1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)))

+/* This is the only difference in x86_64. We can make it much simpler */
+#ifdef CONFIG_X86_32
#define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4) \
__PVOP_CALL(rettype, op, \
"push %[_arg4];", "lea 4(%%esp),%%esp;", \
@@ -444,6 +477,16 @@
"push %[_arg4];", "lea 4(%%esp),%%esp;", \
"0" ((u32)(arg1)), "1" ((u32)(arg2)), \
"2" ((u32)(arg3)), [_arg4] "mr" ((u32)(arg4)))
+#else
+#define PVOP_CALL4(rettype, op, arg1, arg2, arg3, arg4) \
+ __PVOP_CALL(rettype, op, "", "", "0" ((unsigned long)(arg1)), \
+ "1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)), \
+ "3"((unsigned long)(arg4)))
+#define PVOP_VCALL4(op, arg1, arg2, arg3, arg4) \
+ __PVOP_VCALL(op, "", "", "0" ((unsigned long)(arg1)), \
+ "1"((unsigned long)(arg2)), "2"((unsigned long)(arg3)), \
+ "3"((unsigned long)(arg4)))
+#endif

static inline int paravirt_enabled(void)
{

2007-12-20 20:08:43

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 01/15] change paravirt_32.c name

This patch changes paravirt_32.c to paravirt.c. The goal
is to have paravirt support in x86_64, so we do it in a common file

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/x86/kernel/Makefile_32 | 2 +-
arch/x86/kernel/paravirt.c | 475 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/paravirt_32.c | 472 ----------------------------------------
3 files changed, 476 insertions(+), 473 deletions(-)
create mode 100644 arch/x86/kernel/paravirt.c
delete mode 100644 arch/x86/kernel/paravirt_32.c

Index: linux-2.6-x86/arch/x86/kernel/Makefile_32
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/Makefile_32 2007-12-20 19:07:08.000000000 -0800
+++ linux-2.6-x86/arch/x86/kernel/Makefile_32 2007-12-20 19:07:15.000000000 -0800
@@ -48,7 +48,7 @@
obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o

obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o
-obj-$(CONFIG_PARAVIRT) += paravirt_32.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
obj-y += pcspeaker.o

obj-$(CONFIG_SCx200) += scx200_32.o
Index: linux-2.6-x86/arch/x86/kernel/paravirt.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-x86/arch/x86/kernel/paravirt.c 2007-12-20 19:07:15.000000000 -0800
@@ -0,0 +1,475 @@
+/* Paravirtualization interfaces
+ Copyright (C) 2006 Rusty Russell IBM Corporation
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+
+ 2007 - x86_64 support added by Glauber de Oliveira Costa, Red Hat Inc
+*/
+
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/efi.h>
+#include <linux/bcd.h>
+#include <linux/highmem.h>
+
+#include <asm/bug.h>
+#include <asm/paravirt.h>
+#include <asm/desc.h>
+#include <asm/setup.h>
+#include <asm/arch_hooks.h>
+#include <asm/time.h>
+#include <asm/irq.h>
+#include <asm/delay.h>
+#include <asm/fixmap.h>
+#include <asm/apic.h>
+#include <asm/tlbflush.h>
+#include <asm/timer.h>
+
+/* nop stub */
+void _paravirt_nop(void)
+{
+}
+
+static void __init default_banner(void)
+{
+ printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
+ pv_info.name);
+}
+
+char *memory_setup(void)
+{
+ return pv_init_ops.memory_setup();
+}
+
+/* Simple instruction patching code. */
+#define DEF_NATIVE(ops, name, code) \
+ extern const char start_##ops##_##name[], end_##ops##_##name[]; \
+ asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
+
+DEF_NATIVE(pv_irq_ops, irq_disable, "cli");
+DEF_NATIVE(pv_irq_ops, irq_enable, "sti");
+DEF_NATIVE(pv_irq_ops, restore_fl, "push %eax; popf");
+DEF_NATIVE(pv_irq_ops, save_fl, "pushf; pop %eax");
+DEF_NATIVE(pv_cpu_ops, iret, "iret");
+DEF_NATIVE(pv_cpu_ops, irq_enable_syscall_ret, "sti; sysexit");
+DEF_NATIVE(pv_mmu_ops, read_cr2, "mov %cr2, %eax");
+DEF_NATIVE(pv_mmu_ops, write_cr3, "mov %eax, %cr3");
+DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
+DEF_NATIVE(pv_cpu_ops, clts, "clts");
+DEF_NATIVE(pv_cpu_ops, read_tsc, "rdtsc");
+
+/* Undefined instruction for dealing with missing ops pointers. */
+static const unsigned char ud2a[] = { 0x0f, 0x0b };
+
+static unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
+ unsigned long addr, unsigned len)
+{
+ const unsigned char *start, *end;
+ unsigned ret;
+
+ switch(type) {
+#define SITE(ops, x) \
+ case PARAVIRT_PATCH(ops.x): \
+ start = start_##ops##_##x; \
+ end = end_##ops##_##x; \
+ goto patch_site
+
+ SITE(pv_irq_ops, irq_disable);
+ SITE(pv_irq_ops, irq_enable);
+ SITE(pv_irq_ops, restore_fl);
+ SITE(pv_irq_ops, save_fl);
+ SITE(pv_cpu_ops, iret);
+ SITE(pv_cpu_ops, irq_enable_syscall_ret);
+ SITE(pv_mmu_ops, read_cr2);
+ SITE(pv_mmu_ops, read_cr3);
+ SITE(pv_mmu_ops, write_cr3);
+ SITE(pv_cpu_ops, clts);
+ SITE(pv_cpu_ops, read_tsc);
+#undef SITE
+
+ patch_site:
+ ret = paravirt_patch_insns(ibuf, len, start, end);
+ break;
+
+ default:
+ ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
+ break;
+ }
+
+ return ret;
+}
+
+unsigned paravirt_patch_nop(void)
+{
+ return 0;
+}
+
+unsigned paravirt_patch_ignore(unsigned len)
+{
+ return len;
+}
+
+struct branch {
+ unsigned char opcode;
+ u32 delta;
+} __attribute__((packed));
+
+unsigned paravirt_patch_call(void *insnbuf,
+ const void *target, u16 tgt_clobbers,
+ unsigned long addr, u16 site_clobbers,
+ unsigned len)
+{
+ struct branch *b = insnbuf;
+ unsigned long delta = (unsigned long)target - (addr+5);
+
+ if (tgt_clobbers & ~site_clobbers)
+ return len; /* target would clobber too much for this site */
+ if (len < 5)
+ return len; /* call too long for patch site */
+
+ b->opcode = 0xe8; /* call */
+ b->delta = delta;
+ BUILD_BUG_ON(sizeof(*b) != 5);
+
+ return 5;
+}
+
+unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
+ unsigned long addr, unsigned len)
+{
+ struct branch *b = insnbuf;
+ unsigned long delta = (unsigned long)target - (addr+5);
+
+ if (len < 5)
+ return len; /* call too long for patch site */
+
+ b->opcode = 0xe9; /* jmp */
+ b->delta = delta;
+
+ return 5;
+}
+
+/* Neat trick to map patch type back to the call within the
+ * corresponding structure. */
+static void *get_call_destination(u8 type)
+{
+ struct paravirt_patch_template tmpl = {
+ .pv_init_ops = pv_init_ops,
+ .pv_time_ops = pv_time_ops,
+ .pv_cpu_ops = pv_cpu_ops,
+ .pv_irq_ops = pv_irq_ops,
+ .pv_apic_ops = pv_apic_ops,
+ .pv_mmu_ops = pv_mmu_ops,
+ };
+ return *((void **)&tmpl + type);
+}
+
+unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
+ unsigned long addr, unsigned len)
+{
+ void *opfunc = get_call_destination(type);
+ unsigned ret;
+
+ if (opfunc == NULL)
+ /* If there's no function, patch it with a ud2a (BUG) */
+ ret = paravirt_patch_insns(insnbuf, len, ud2a, ud2a+sizeof(ud2a));
+ else if (opfunc == paravirt_nop)
+ /* If the operation is a nop, then nop the callsite */
+ ret = paravirt_patch_nop();
+ else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
+ type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_syscall_ret))
+ /* If operation requires a jmp, then jmp */
+ ret = paravirt_patch_jmp(insnbuf, opfunc, addr, len);
+ else
+ /* Otherwise call the function; assume target could
+ clobber any caller-save reg */
+ ret = paravirt_patch_call(insnbuf, opfunc, CLBR_ANY,
+ addr, clobbers, len);
+
+ return ret;
+}
+
+unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
+ const char *start, const char *end)
+{
+ unsigned insn_len = end - start;
+
+ if (insn_len > len || start == NULL)
+ insn_len = len;
+ else
+ memcpy(insnbuf, start, insn_len);
+
+ return insn_len;
+}
+
+void init_IRQ(void)
+{
+ pv_irq_ops.init_IRQ();
+}
+
+static void native_flush_tlb(void)
+{
+ __native_flush_tlb();
+}
+
+/*
+ * Global pages have to be flushed a bit differently. Not a real
+ * performance problem because this does not happen often.
+ */
+static void native_flush_tlb_global(void)
+{
+ __native_flush_tlb_global();
+}
+
+static void native_flush_tlb_single(unsigned long addr)
+{
+ __native_flush_tlb_single(addr);
+}
+
+/* These are in entry.S */
+extern void native_iret(void);
+extern void native_irq_enable_syscall_ret(void);
+
+static int __init print_banner(void)
+{
+ pv_init_ops.banner();
+ return 0;
+}
+core_initcall(print_banner);
+
+static struct resource reserve_ioports = {
+ .start = 0,
+ .end = IO_SPACE_LIMIT,
+ .name = "paravirt-ioport",
+ .flags = IORESOURCE_IO | IORESOURCE_BUSY,
+};
+
+static struct resource reserve_iomem = {
+ .start = 0,
+ .end = -1,
+ .name = "paravirt-iomem",
+ .flags = IORESOURCE_MEM | IORESOURCE_BUSY,
+};
+
+/*
+ * Reserve the whole legacy IO space to prevent any legacy drivers
+ * from wasting time probing for their hardware. This is a fairly
+ * brute-force approach to disabling all non-virtual drivers.
+ *
+ * Note that this must be called very early to have any effect.
+ */
+int paravirt_disable_iospace(void)
+{
+ int ret;
+
+ ret = request_resource(&ioport_resource, &reserve_ioports);
+ if (ret == 0) {
+ ret = request_resource(&iomem_resource, &reserve_iomem);
+ if (ret)
+ release_resource(&reserve_ioports);
+ }
+
+ return ret;
+}
+
+static DEFINE_PER_CPU(enum paravirt_lazy_mode, paravirt_lazy_mode) = PARAVIRT_LAZY_NONE;
+
+static inline void enter_lazy(enum paravirt_lazy_mode mode)
+{
+ BUG_ON(x86_read_percpu(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);
+ BUG_ON(preemptible());
+
+ x86_write_percpu(paravirt_lazy_mode, mode);
+}
+
+void paravirt_leave_lazy(enum paravirt_lazy_mode mode)
+{
+ BUG_ON(x86_read_percpu(paravirt_lazy_mode) != mode);
+ BUG_ON(preemptible());
+
+ x86_write_percpu(paravirt_lazy_mode, PARAVIRT_LAZY_NONE);
+}
+
+void paravirt_enter_lazy_mmu(void)
+{
+ enter_lazy(PARAVIRT_LAZY_MMU);
+}
+
+void paravirt_leave_lazy_mmu(void)
+{
+ paravirt_leave_lazy(PARAVIRT_LAZY_MMU);
+}
+
+void paravirt_enter_lazy_cpu(void)
+{
+ enter_lazy(PARAVIRT_LAZY_CPU);
+}
+
+void paravirt_leave_lazy_cpu(void)
+{
+ paravirt_leave_lazy(PARAVIRT_LAZY_CPU);
+}
+
+enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
+{
+ return x86_read_percpu(paravirt_lazy_mode);
+}
+
+struct pv_info pv_info = {
+ .name = "bare hardware",
+ .paravirt_enabled = 0,
+ .kernel_rpl = 0,
+ .shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */
+};
+
+struct pv_init_ops pv_init_ops = {
+ .patch = native_patch,
+ .banner = default_banner,
+ .arch_setup = paravirt_nop,
+ .memory_setup = machine_specific_memory_setup,
+};
+
+struct pv_time_ops pv_time_ops = {
+ .time_init = hpet_time_init,
+ .get_wallclock = native_get_wallclock,
+ .set_wallclock = native_set_wallclock,
+ .sched_clock = native_sched_clock,
+ .get_cpu_khz = native_calculate_cpu_khz,
+};
+
+struct pv_irq_ops pv_irq_ops = {
+ .init_IRQ = native_init_IRQ,
+ .save_fl = native_save_fl,
+ .restore_fl = native_restore_fl,
+ .irq_disable = native_irq_disable,
+ .irq_enable = native_irq_enable,
+ .safe_halt = native_safe_halt,
+ .halt = native_halt,
+};
+
+struct pv_cpu_ops pv_cpu_ops = {
+ .cpuid = native_cpuid,
+ .get_debugreg = native_get_debugreg,
+ .set_debugreg = native_set_debugreg,
+ .clts = native_clts,
+ .read_cr0 = native_read_cr0,
+ .write_cr0 = native_write_cr0,
+ .read_cr4 = native_read_cr4,
+ .read_cr4_safe = native_read_cr4_safe,
+ .write_cr4 = native_write_cr4,
+ .wbinvd = native_wbinvd,
+ .read_msr = native_read_msr_safe,
+ .write_msr = native_write_msr_safe,
+ .read_tsc = native_read_tsc,
+ .read_pmc = native_read_pmc,
+ .load_tr_desc = native_load_tr_desc,
+ .set_ldt = native_set_ldt,
+ .load_gdt = native_load_gdt,
+ .load_idt = native_load_idt,
+ .store_gdt = native_store_gdt,
+ .store_idt = native_store_idt,
+ .store_tr = native_store_tr,
+ .load_tls = native_load_tls,
+ .write_ldt_entry = native_write_ldt_entry,
+ .write_gdt_entry = native_write_gdt_entry,
+ .write_idt_entry = native_write_idt_entry,
+ .load_sp0 = native_load_sp0,
+
+ .irq_enable_syscall_ret = native_irq_enable_syscall_ret,
+ .iret = native_iret,
+
+ .set_iopl_mask = native_set_iopl_mask,
+ .io_delay = native_io_delay,
+
+ .lazy_mode = {
+ .enter = paravirt_nop,
+ .leave = paravirt_nop,
+ },
+};
+
+struct pv_apic_ops pv_apic_ops = {
+#ifdef CONFIG_X86_LOCAL_APIC
+ .apic_write = native_apic_write,
+ .apic_write_atomic = native_apic_write_atomic,
+ .apic_read = native_apic_read,
+ .setup_boot_clock = setup_boot_APIC_clock,
+ .setup_secondary_clock = setup_secondary_APIC_clock,
+ .startup_ipi_hook = paravirt_nop,
+#endif
+};
+
+struct pv_mmu_ops pv_mmu_ops = {
+ .pagetable_setup_start = native_pagetable_setup_start,
+ .pagetable_setup_done = native_pagetable_setup_done,
+
+ .read_cr2 = native_read_cr2,
+ .write_cr2 = native_write_cr2,
+ .read_cr3 = native_read_cr3,
+ .write_cr3 = native_write_cr3,
+
+ .flush_tlb_user = native_flush_tlb,
+ .flush_tlb_kernel = native_flush_tlb_global,
+ .flush_tlb_single = native_flush_tlb_single,
+ .flush_tlb_others = native_flush_tlb_others,
+
+ .alloc_pt = paravirt_nop,
+ .alloc_pd = paravirt_nop,
+ .alloc_pd_clone = paravirt_nop,
+ .release_pt = paravirt_nop,
+ .release_pd = paravirt_nop,
+
+ .set_pte = native_set_pte,
+ .set_pte_at = native_set_pte_at,
+ .set_pmd = native_set_pmd,
+ .pte_update = paravirt_nop,
+ .pte_update_defer = paravirt_nop,
+
+#ifdef CONFIG_HIGHPTE
+ .kmap_atomic_pte = kmap_atomic,
+#endif
+
+#ifdef CONFIG_X86_PAE
+ .set_pte_atomic = native_set_pte_atomic,
+ .set_pte_present = native_set_pte_present,
+ .set_pud = native_set_pud,
+ .pte_clear = native_pte_clear,
+ .pmd_clear = native_pmd_clear,
+
+ .pmd_val = native_pmd_val,
+ .make_pmd = native_make_pmd,
+#endif
+
+ .pte_val = native_pte_val,
+ .pgd_val = native_pgd_val,
+
+ .make_pte = native_make_pte,
+ .make_pgd = native_make_pgd,
+
+ .dup_mmap = paravirt_nop,
+ .exit_mmap = paravirt_nop,
+ .activate_mm = paravirt_nop,
+
+ .lazy_mode = {
+ .enter = paravirt_nop,
+ .leave = paravirt_nop,
+ },
+};
+
+EXPORT_SYMBOL_GPL(pv_time_ops);
+EXPORT_SYMBOL (pv_cpu_ops);
+EXPORT_SYMBOL (pv_mmu_ops);
+EXPORT_SYMBOL_GPL(pv_apic_ops);
+EXPORT_SYMBOL_GPL(pv_info);
+EXPORT_SYMBOL (pv_irq_ops);
Index: linux-2.6-x86/arch/x86/kernel/paravirt_32.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/paravirt_32.c 2007-12-20 19:07:08.000000000 -0800
+++ /dev/null 1970-01-01 00:00:00.000000000 +0000
@@ -1,472 +0,0 @@
-/* Paravirtualization interfaces
- Copyright (C) 2006 Rusty Russell IBM Corporation
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
-*/
-#include <linux/errno.h>
-#include <linux/module.h>
-#include <linux/efi.h>
-#include <linux/bcd.h>
-#include <linux/highmem.h>
-
-#include <asm/bug.h>
-#include <asm/paravirt.h>
-#include <asm/desc.h>
-#include <asm/setup.h>
-#include <asm/arch_hooks.h>
-#include <asm/time.h>
-#include <asm/irq.h>
-#include <asm/delay.h>
-#include <asm/fixmap.h>
-#include <asm/apic.h>
-#include <asm/tlbflush.h>
-#include <asm/timer.h>
-
-/* nop stub */
-void _paravirt_nop(void)
-{
-}
-
-static void __init default_banner(void)
-{
- printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
- pv_info.name);
-}
-
-char *memory_setup(void)
-{
- return pv_init_ops.memory_setup();
-}
-
-/* Simple instruction patching code. */
-#define DEF_NATIVE(ops, name, code) \
- extern const char start_##ops##_##name[], end_##ops##_##name[]; \
- asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
-
-DEF_NATIVE(pv_irq_ops, irq_disable, "cli");
-DEF_NATIVE(pv_irq_ops, irq_enable, "sti");
-DEF_NATIVE(pv_irq_ops, restore_fl, "push %eax; popf");
-DEF_NATIVE(pv_irq_ops, save_fl, "pushf; pop %eax");
-DEF_NATIVE(pv_cpu_ops, iret, "iret");
-DEF_NATIVE(pv_cpu_ops, irq_enable_syscall_ret, "sti; sysexit");
-DEF_NATIVE(pv_mmu_ops, read_cr2, "mov %cr2, %eax");
-DEF_NATIVE(pv_mmu_ops, write_cr3, "mov %eax, %cr3");
-DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
-DEF_NATIVE(pv_cpu_ops, clts, "clts");
-DEF_NATIVE(pv_cpu_ops, read_tsc, "rdtsc");
-
-/* Undefined instruction for dealing with missing ops pointers. */
-static const unsigned char ud2a[] = { 0x0f, 0x0b };
-
-static unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
- unsigned long addr, unsigned len)
-{
- const unsigned char *start, *end;
- unsigned ret;
-
- switch(type) {
-#define SITE(ops, x) \
- case PARAVIRT_PATCH(ops.x): \
- start = start_##ops##_##x; \
- end = end_##ops##_##x; \
- goto patch_site
-
- SITE(pv_irq_ops, irq_disable);
- SITE(pv_irq_ops, irq_enable);
- SITE(pv_irq_ops, restore_fl);
- SITE(pv_irq_ops, save_fl);
- SITE(pv_cpu_ops, iret);
- SITE(pv_cpu_ops, irq_enable_syscall_ret);
- SITE(pv_mmu_ops, read_cr2);
- SITE(pv_mmu_ops, read_cr3);
- SITE(pv_mmu_ops, write_cr3);
- SITE(pv_cpu_ops, clts);
- SITE(pv_cpu_ops, read_tsc);
-#undef SITE
-
- patch_site:
- ret = paravirt_patch_insns(ibuf, len, start, end);
- break;
-
- default:
- ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
- break;
- }
-
- return ret;
-}
-
-unsigned paravirt_patch_nop(void)
-{
- return 0;
-}
-
-unsigned paravirt_patch_ignore(unsigned len)
-{
- return len;
-}
-
-struct branch {
- unsigned char opcode;
- u32 delta;
-} __attribute__((packed));
-
-unsigned paravirt_patch_call(void *insnbuf,
- const void *target, u16 tgt_clobbers,
- unsigned long addr, u16 site_clobbers,
- unsigned len)
-{
- struct branch *b = insnbuf;
- unsigned long delta = (unsigned long)target - (addr+5);
-
- if (tgt_clobbers & ~site_clobbers)
- return len; /* target would clobber too much for this site */
- if (len < 5)
- return len; /* call too long for patch site */
-
- b->opcode = 0xe8; /* call */
- b->delta = delta;
- BUILD_BUG_ON(sizeof(*b) != 5);
-
- return 5;
-}
-
-unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
- unsigned long addr, unsigned len)
-{
- struct branch *b = insnbuf;
- unsigned long delta = (unsigned long)target - (addr+5);
-
- if (len < 5)
- return len; /* call too long for patch site */
-
- b->opcode = 0xe9; /* jmp */
- b->delta = delta;
-
- return 5;
-}
-
-/* Neat trick to map patch type back to the call within the
- * corresponding structure. */
-static void *get_call_destination(u8 type)
-{
- struct paravirt_patch_template tmpl = {
- .pv_init_ops = pv_init_ops,
- .pv_time_ops = pv_time_ops,
- .pv_cpu_ops = pv_cpu_ops,
- .pv_irq_ops = pv_irq_ops,
- .pv_apic_ops = pv_apic_ops,
- .pv_mmu_ops = pv_mmu_ops,
- };
- return *((void **)&tmpl + type);
-}
-
-unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
- unsigned long addr, unsigned len)
-{
- void *opfunc = get_call_destination(type);
- unsigned ret;
-
- if (opfunc == NULL)
- /* If there's no function, patch it with a ud2a (BUG) */
- ret = paravirt_patch_insns(insnbuf, len, ud2a, ud2a+sizeof(ud2a));
- else if (opfunc == paravirt_nop)
- /* If the operation is a nop, then nop the callsite */
- ret = paravirt_patch_nop();
- else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
- type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_syscall_ret))
- /* If operation requires a jmp, then jmp */
- ret = paravirt_patch_jmp(insnbuf, opfunc, addr, len);
- else
- /* Otherwise call the function; assume target could
- clobber any caller-save reg */
- ret = paravirt_patch_call(insnbuf, opfunc, CLBR_ANY,
- addr, clobbers, len);
-
- return ret;
-}
-
-unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
- const char *start, const char *end)
-{
- unsigned insn_len = end - start;
-
- if (insn_len > len || start == NULL)
- insn_len = len;
- else
- memcpy(insnbuf, start, insn_len);
-
- return insn_len;
-}
-
-void init_IRQ(void)
-{
- pv_irq_ops.init_IRQ();
-}
-
-static void native_flush_tlb(void)
-{
- __native_flush_tlb();
-}
-
-/*
- * Global pages have to be flushed a bit differently. Not a real
- * performance problem because this does not happen often.
- */
-static void native_flush_tlb_global(void)
-{
- __native_flush_tlb_global();
-}
-
-static void native_flush_tlb_single(unsigned long addr)
-{
- __native_flush_tlb_single(addr);
-}
-
-/* These are in entry.S */
-extern void native_iret(void);
-extern void native_irq_enable_syscall_ret(void);
-
-static int __init print_banner(void)
-{
- pv_init_ops.banner();
- return 0;
-}
-core_initcall(print_banner);
-
-static struct resource reserve_ioports = {
- .start = 0,
- .end = IO_SPACE_LIMIT,
- .name = "paravirt-ioport",
- .flags = IORESOURCE_IO | IORESOURCE_BUSY,
-};
-
-static struct resource reserve_iomem = {
- .start = 0,
- .end = -1,
- .name = "paravirt-iomem",
- .flags = IORESOURCE_MEM | IORESOURCE_BUSY,
-};
-
-/*
- * Reserve the whole legacy IO space to prevent any legacy drivers
- * from wasting time probing for their hardware. This is a fairly
- * brute-force approach to disabling all non-virtual drivers.
- *
- * Note that this must be called very early to have any effect.
- */
-int paravirt_disable_iospace(void)
-{
- int ret;
-
- ret = request_resource(&ioport_resource, &reserve_ioports);
- if (ret == 0) {
- ret = request_resource(&iomem_resource, &reserve_iomem);
- if (ret)
- release_resource(&reserve_ioports);
- }
-
- return ret;
-}
-
-static DEFINE_PER_CPU(enum paravirt_lazy_mode, paravirt_lazy_mode) = PARAVIRT_LAZY_NONE;
-
-static inline void enter_lazy(enum paravirt_lazy_mode mode)
-{
- BUG_ON(x86_read_percpu(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);
- BUG_ON(preemptible());
-
- x86_write_percpu(paravirt_lazy_mode, mode);
-}
-
-void paravirt_leave_lazy(enum paravirt_lazy_mode mode)
-{
- BUG_ON(x86_read_percpu(paravirt_lazy_mode) != mode);
- BUG_ON(preemptible());
-
- x86_write_percpu(paravirt_lazy_mode, PARAVIRT_LAZY_NONE);
-}
-
-void paravirt_enter_lazy_mmu(void)
-{
- enter_lazy(PARAVIRT_LAZY_MMU);
-}
-
-void paravirt_leave_lazy_mmu(void)
-{
- paravirt_leave_lazy(PARAVIRT_LAZY_MMU);
-}
-
-void paravirt_enter_lazy_cpu(void)
-{
- enter_lazy(PARAVIRT_LAZY_CPU);
-}
-
-void paravirt_leave_lazy_cpu(void)
-{
- paravirt_leave_lazy(PARAVIRT_LAZY_CPU);
-}
-
-enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
-{
- return x86_read_percpu(paravirt_lazy_mode);
-}
-
-struct pv_info pv_info = {
- .name = "bare hardware",
- .paravirt_enabled = 0,
- .kernel_rpl = 0,
- .shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */
-};
-
-struct pv_init_ops pv_init_ops = {
- .patch = native_patch,
- .banner = default_banner,
- .arch_setup = paravirt_nop,
- .memory_setup = machine_specific_memory_setup,
-};
-
-struct pv_time_ops pv_time_ops = {
- .time_init = hpet_time_init,
- .get_wallclock = native_get_wallclock,
- .set_wallclock = native_set_wallclock,
- .sched_clock = native_sched_clock,
- .get_cpu_khz = native_calculate_cpu_khz,
-};
-
-struct pv_irq_ops pv_irq_ops = {
- .init_IRQ = native_init_IRQ,
- .save_fl = native_save_fl,
- .restore_fl = native_restore_fl,
- .irq_disable = native_irq_disable,
- .irq_enable = native_irq_enable,
- .safe_halt = native_safe_halt,
- .halt = native_halt,
-};
-
-struct pv_cpu_ops pv_cpu_ops = {
- .cpuid = native_cpuid,
- .get_debugreg = native_get_debugreg,
- .set_debugreg = native_set_debugreg,
- .clts = native_clts,
- .read_cr0 = native_read_cr0,
- .write_cr0 = native_write_cr0,
- .read_cr4 = native_read_cr4,
- .read_cr4_safe = native_read_cr4_safe,
- .write_cr4 = native_write_cr4,
- .wbinvd = native_wbinvd,
- .read_msr = native_read_msr_safe,
- .write_msr = native_write_msr_safe,
- .read_tsc = native_read_tsc,
- .read_pmc = native_read_pmc,
- .load_tr_desc = native_load_tr_desc,
- .set_ldt = native_set_ldt,
- .load_gdt = native_load_gdt,
- .load_idt = native_load_idt,
- .store_gdt = native_store_gdt,
- .store_idt = native_store_idt,
- .store_tr = native_store_tr,
- .load_tls = native_load_tls,
- .write_ldt_entry = native_write_ldt_entry,
- .write_gdt_entry = native_write_gdt_entry,
- .write_idt_entry = native_write_idt_entry,
- .load_sp0 = native_load_sp0,
-
- .irq_enable_syscall_ret = native_irq_enable_syscall_ret,
- .iret = native_iret,
-
- .set_iopl_mask = native_set_iopl_mask,
- .io_delay = native_io_delay,
-
- .lazy_mode = {
- .enter = paravirt_nop,
- .leave = paravirt_nop,
- },
-};
-
-struct pv_apic_ops pv_apic_ops = {
-#ifdef CONFIG_X86_LOCAL_APIC
- .apic_write = native_apic_write,
- .apic_write_atomic = native_apic_write_atomic,
- .apic_read = native_apic_read,
- .setup_boot_clock = setup_boot_APIC_clock,
- .setup_secondary_clock = setup_secondary_APIC_clock,
- .startup_ipi_hook = paravirt_nop,
-#endif
-};
-
-struct pv_mmu_ops pv_mmu_ops = {
- .pagetable_setup_start = native_pagetable_setup_start,
- .pagetable_setup_done = native_pagetable_setup_done,
-
- .read_cr2 = native_read_cr2,
- .write_cr2 = native_write_cr2,
- .read_cr3 = native_read_cr3,
- .write_cr3 = native_write_cr3,
-
- .flush_tlb_user = native_flush_tlb,
- .flush_tlb_kernel = native_flush_tlb_global,
- .flush_tlb_single = native_flush_tlb_single,
- .flush_tlb_others = native_flush_tlb_others,
-
- .alloc_pt = paravirt_nop,
- .alloc_pd = paravirt_nop,
- .alloc_pd_clone = paravirt_nop,
- .release_pt = paravirt_nop,
- .release_pd = paravirt_nop,
-
- .set_pte = native_set_pte,
- .set_pte_at = native_set_pte_at,
- .set_pmd = native_set_pmd,
- .pte_update = paravirt_nop,
- .pte_update_defer = paravirt_nop,
-
-#ifdef CONFIG_HIGHPTE
- .kmap_atomic_pte = kmap_atomic,
-#endif
-
-#ifdef CONFIG_X86_PAE
- .set_pte_atomic = native_set_pte_atomic,
- .set_pte_present = native_set_pte_present,
- .set_pud = native_set_pud,
- .pte_clear = native_pte_clear,
- .pmd_clear = native_pmd_clear,
-
- .pmd_val = native_pmd_val,
- .make_pmd = native_make_pmd,
-#endif
-
- .pte_val = native_pte_val,
- .pgd_val = native_pgd_val,
-
- .make_pte = native_make_pte,
- .make_pgd = native_make_pgd,
-
- .dup_mmap = paravirt_nop,
- .exit_mmap = paravirt_nop,
- .activate_mm = paravirt_nop,
-
- .lazy_mode = {
- .enter = paravirt_nop,
- .leave = paravirt_nop,
- },
-};
-
-EXPORT_SYMBOL_GPL(pv_time_ops);
-EXPORT_SYMBOL (pv_cpu_ops);
-EXPORT_SYMBOL (pv_mmu_ops);
-EXPORT_SYMBOL_GPL(pv_apic_ops);
-EXPORT_SYMBOL_GPL(pv_info);
-EXPORT_SYMBOL (pv_irq_ops);

2007-12-20 20:09:04

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 03/15] cleanup write_tsc

write_tsc() does not need to be enclosed in any paravirt closure,
as it uses wrmsr(). So we rip off the duplicate in msr.h
and the definition from paravirt.h

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
include/asm-x86/msr.h | 2 --
include/asm-x86/paravirt.h | 2 --
2 files changed, 0 insertions(+), 4 deletions(-)

Index: linux-2.6-x86/include/asm-x86/msr.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/msr.h 2007-12-20 19:06:59.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/msr.h 2007-12-20 19:07:18.000000000 -0800
@@ -153,8 +153,6 @@
#define rdtscll(val) \
((val) = native_read_tsc())

-#define write_tsc(val1,val2) wrmsr(0x10, val1, val2)
-
#define rdpmc(counter,low,high) \
do { \
u64 _l = native_read_pmc(counter); \
Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:07:17.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:07:18.000000000 -0800
@@ -657,8 +657,6 @@
}
#define calculate_cpu_khz() (pv_time_ops.get_cpu_khz())

-#define write_tsc(val1,val2) wrmsr(0x10, val1, val2)
-
static inline unsigned long long paravirt_read_pmc(int counter)
{
return PVOP_CALL1(u64, pv_cpu_ops.read_pmc, counter);

2007-12-20 20:09:32

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 04/15] provide paravirtualized hook for rdtscp

This patch adds a field in pv_cpu_ops for a paravirtualized hook
for rdtscp, needed for x86_64.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/x86/kernel/paravirt.c | 1 +
include/asm-x86/paravirt.h | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 0 deletions(-)

Index: linux-2.6-x86/arch/x86/kernel/paravirt.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/paravirt.c 2007-12-20 19:07:15.000000000 -0800
+++ linux-2.6-x86/arch/x86/kernel/paravirt.c 2007-12-20 19:07:22.000000000 -0800
@@ -374,6 +374,7 @@
.write_msr = native_write_msr_safe,
.read_tsc = native_read_tsc,
.read_pmc = native_read_pmc,
+ .read_tscp = native_read_tscp,
.load_tr_desc = native_load_tr_desc,
.set_ldt = native_set_ldt,
.load_gdt = native_load_gdt,
Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:07:18.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:07:22.000000000 -0800
@@ -120,6 +120,7 @@

u64 (*read_tsc)(void);
u64 (*read_pmc)(int counter);
+ unsigned long long (*read_tscp)(unsigned int *aux);

/* These two are jmp to, not actually called. */
void (*irq_enable_syscall_ret)(void);
@@ -668,6 +669,27 @@
high = _l >> 32; \
} while(0)

+static inline unsigned long long paravirt_rdtscp(unsigned int *aux)
+{
+ return PVOP_CALL1(u64, pv_cpu_ops.read_tscp, aux);
+}
+
+#define rdtscp(low, high, aux) \
+do { \
+ int __aux; \
+ unsigned long __val = paravirt_rdtscp(&__aux); \
+ (low) = (u32)__val; \
+ (high) = (u32)(__val >> 32); \
+ (aux) = __aux; \
+} while (0)
+
+#define rdtscpll(val, aux) \
+do { \
+ unsigned long __aux; \
+ val = paravirt_rdtscp(&__aux); \
+ (aux) = __aux; \
+} while (0)
+
static inline void load_TR_desc(void)
{
PVOP_VCALL0(pv_cpu_ops.load_tr_desc);

2007-12-20 20:09:47

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 05/15] change assembly definition of paravirt_patch_site

To account for differences in x86_64, we change the macros that
create raw instances of the paravirt_patch_site struct.
We need to align 64-pointers to 64-bit boundaries, so we add an alignment
directive. Also, we need to make room for a word-sized pointer,
instead of a fixed 32-bit one

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
include/asm-x86/paravirt.h | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:07:22.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:07:25.000000000 -0800
@@ -5,6 +5,7 @@

#ifdef CONFIG_PARAVIRT
#include <asm/page.h>
+#include <asm/asm.h>

/* Bitmask of what can be clobbered: usually at least eax. */
#define CLBR_NONE 0x0
@@ -281,7 +282,8 @@
#define _paravirt_alt(insn_string, type, clobber) \
"771:\n\t" insn_string "\n" "772:\n" \
".pushsection .parainstructions,\"a\"\n" \
- " .long 771b\n" \
+ _ASM_ALIGN "\n" \
+ _ASM_PTR " 771b\n" \
" .byte " type "\n" \
" .byte 772b-771b\n" \
" .short " clobber "\n" \
@@ -1159,17 +1161,25 @@

#define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 4)

-#define PARA_SITE(ptype, clobbers, ops) \
+#define _PVSITE(ptype, clobbers, ops, word, algn) \
771:; \
ops; \
772:; \
.pushsection .parainstructions,"a"; \
- .long 771b; \
+ .align algn; \
+ word 771b; \
.byte ptype; \
.byte 772b-771b; \
.short clobbers; \
.popsection

+
+#ifdef CONFIG_X86_64
+#define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .quad, 8)
+#else
+#define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .long, 4)
+#endif
+
#define INTERRUPT_RETURN \
PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_iret), CLBR_NONE, \
jmp *%cs:pv_cpu_ops+PV_CPU_iret)

2007-12-20 20:10:03

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 06/15] adjust assembly macros to x86_64 as well.

This patch adjust the paravirt macros used in assembly code
to accomodate for x86_64 as well.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
include/asm-x86/paravirt.h | 18 ++++++++++++------
1 files changed, 12 insertions(+), 6 deletions(-)

Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:07:25.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:07:27.000000000 -0800
@@ -1159,8 +1159,6 @@

#else /* __ASSEMBLY__ */

-#define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 4)
-
#define _PVSITE(ptype, clobbers, ops, word, algn) \
771:; \
ops; \
@@ -1175,8 +1173,14 @@


#ifdef CONFIG_X86_64
+#define PV_SAVE_REGS pushq %rax; pushq %rdi; pushq %rcx; pushq %rdx
+#define PV_RESTORE_REGS popq %rdx; popq %rcx; popq %rdi; popq %rax
+#define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 8)
#define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .quad, 8)
#else
+#define PV_SAVE_REGS pushl %eax; pushl %edi; pushl %ecx; pushl %edx
+#define PV_RESTORE_REGS popl %edx; popl %ecx; popl %edi; popl %eax
+#define PARA_PATCH(struct, off) ((PARAVIRT_PATCH_##struct + (off)) / 4)
#define PARA_SITE(ptype, clobbers, ops) _PVSITE(ptype, clobbers, ops, .long, 4)
#endif

@@ -1186,25 +1190,27 @@

#define DISABLE_INTERRUPTS(clobbers) \
PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \
- pushl %eax; pushl %ecx; pushl %edx; \
+ PV_SAVE_REGS; \
call *%cs:pv_irq_ops+PV_IRQ_irq_disable; \
- popl %edx; popl %ecx; popl %eax) \
+ PV_RESTORE_REGS;) \

#define ENABLE_INTERRUPTS(clobbers) \
PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_enable), clobbers, \
- pushl %eax; pushl %ecx; pushl %edx; \
+ PV_SAVE_REGS; \
call *%cs:pv_irq_ops+PV_IRQ_irq_enable; \
- popl %edx; popl %ecx; popl %eax)
+ PV_RESTORE_REGS;)

#define ENABLE_INTERRUPTS_SYSCALL_RET \
PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_irq_enable_syscall_ret),\
CLBR_NONE, \
jmp *%cs:pv_cpu_ops+PV_CPU_irq_enable_syscall_ret)

+#ifdef CONFIG_X86_32
#define GET_CR0_INTO_EAX \
push %ecx; push %edx; \
call *pv_cpu_ops+PV_CPU_read_cr0; \
pop %edx; pop %ecx
+#endif

#endif /* __ASSEMBLY__ */
#endif /* CONFIG_PARAVIRT */

2007-12-20 20:10:47

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 08/15] add macro for privileged x86_64 operation

i386 has a macro GET_CR0_INTO_EAX, used in early trap handling code.
x86_64 has similar needs, only it needs to put cr2 into rcx. We provide
a macro for such task, in the same way

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
include/asm-x86/paravirt.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:07:28.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:07:29.000000000 -0800
@@ -1227,6 +1227,12 @@
push %ecx; push %edx; \
call *pv_cpu_ops+PV_CPU_read_cr0; \
pop %edx; pop %ecx
+#else
+#define GET_CR2_INTO_RCX \
+ call *pv_mmu_ops+PV_MMU_read_cr2; \
+ movq %rax, %rcx; \
+ xorq %rax, %rax;
+
#endif

#endif /* __ASSEMBLY__ */

2007-12-20 20:11:11

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 07/15] change irq functions to accomodate x86_64

This patch changes the irq handling function definitions
in paravirt.h (like raw_local_irq_disable) to accomodate for x86_64.
The differences are in the calling convention.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
include/asm-x86/paravirt.h | 43 ++++++++++++++++++++++++++++++-------------
1 files changed, 30 insertions(+), 13 deletions(-)

Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:07:27.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:07:28.000000000 -0800
@@ -1085,52 +1085,68 @@
extern struct paravirt_patch_site __parainstructions[],
__parainstructions_end[];

+#ifdef CONFIG_X86_32
+#define PV_SAVE_REGS "pushl %%ecx; pushl %%edx;"
+#define PV_RESTORE_REGS "popl %%edx; popl %%ecx"
+#define PV_FLAGS_ARG "0"
+#define PV_EXTRA_CLOBBERS
+#define PV_VEXTRA_CLOBBERS
+#else
+/* We save some registers, but all of them, that's too much. We clobber all
+ * caller saved registers but the argument parameter */
+#define PV_SAVE_REGS "pushq %%rdi;"
+#define PV_RESTORE_REGS "popq %%rdi;"
+#define PV_EXTRA_CLOBBERS EXTRA_CLOBBERS, "rcx" , "rdx"
+#define PV_VEXTRA_CLOBBERS EXTRA_CLOBBERS, "rdi", "rcx" , "rdx"
+#define PV_FLAGS_ARG "D"
+#endif
+
static inline unsigned long __raw_local_save_flags(void)
{
unsigned long f;

- asm volatile(paravirt_alt("pushl %%ecx; pushl %%edx;"
+ asm volatile(paravirt_alt(PV_SAVE_REGS
PARAVIRT_CALL
- "popl %%edx; popl %%ecx")
+ PV_RESTORE_REGS)
: "=a"(f)
: paravirt_type(pv_irq_ops.save_fl),
paravirt_clobber(CLBR_EAX)
- : "memory", "cc");
+ : "memory", "cc" PV_VEXTRA_CLOBBERS);
return f;
}

static inline void raw_local_irq_restore(unsigned long f)
{
- asm volatile(paravirt_alt("pushl %%ecx; pushl %%edx;"
+ asm volatile(paravirt_alt(PV_SAVE_REGS
PARAVIRT_CALL
- "popl %%edx; popl %%ecx")
+ PV_RESTORE_REGS)
: "=a"(f)
- : "0"(f),
+ : PV_FLAGS_ARG(f),
paravirt_type(pv_irq_ops.restore_fl),
paravirt_clobber(CLBR_EAX)
- : "memory", "cc");
+ : "memory", "cc" PV_EXTRA_CLOBBERS);
}

static inline void raw_local_irq_disable(void)
{
- asm volatile(paravirt_alt("pushl %%ecx; pushl %%edx;"
+ asm volatile(paravirt_alt(PV_SAVE_REGS
PARAVIRT_CALL
- "popl %%edx; popl %%ecx")
+ PV_RESTORE_REGS)
:
: paravirt_type(pv_irq_ops.irq_disable),
paravirt_clobber(CLBR_EAX)
- : "memory", "eax", "cc");
+ : "memory", "eax", "cc" PV_EXTRA_CLOBBERS);
}

static inline void raw_local_irq_enable(void)
{
- asm volatile(paravirt_alt("pushl %%ecx; pushl %%edx;"
+ asm volatile(paravirt_alt(PV_SAVE_REGS
PARAVIRT_CALL
- "popl %%edx; popl %%ecx")
+ PV_RESTORE_REGS)
:
: paravirt_type(pv_irq_ops.irq_enable),
paravirt_clobber(CLBR_EAX)
- : "memory", "eax", "cc");
+ : "memory", "eax", "cc" PV_EXTRA_CLOBBERS);
}

static inline unsigned long __raw_local_irq_save(void)
@@ -1205,6 +1221,7 @@
CLBR_NONE, \
jmp *%cs:pv_cpu_ops+PV_CPU_irq_enable_syscall_ret)

+
#ifdef CONFIG_X86_32
#define GET_CR0_INTO_EAX \
push %ecx; push %edx; \

2007-12-20 20:11:35

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 10/15] replace privileged instructions with paravirt macros

The assembly code in entry_64.S issues a bunch of privileged instructions,
like cli, sti, swapgs, and others. Paravirt guests are forbidden to do so,
and we then replace them with macros that will do the right thing.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/x86/kernel/entry_64.S | 101 +++++++++++++++++++++++++------------------
1 files changed, 59 insertions(+), 42 deletions(-)

Index: linux-2.6-x86/arch/x86/kernel/entry_64.S
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/entry_64.S 2007-12-20 19:06:59.000000000 -0800
+++ linux-2.6-x86/arch/x86/kernel/entry_64.S 2007-12-20 19:08:08.000000000 -0800
@@ -50,6 +50,7 @@
#include <asm/hw_irq.h>
#include <asm/page.h>
#include <asm/irqflags.h>
+#include <asm/paravirt.h>

.code64

@@ -57,6 +58,13 @@
#define retint_kernel retint_restore_args
#endif

+#ifdef CONFIG_PARAVIRT
+ENTRY(native_irq_enable_syscall_ret)
+ movq %gs:pda_oldrsp,%rsp
+ swapgs
+ sysretq
+#endif /* CONFIG_PARAVIRT */
+

.macro TRACE_IRQS_IRETQ offset=ARGOFFSET
#ifdef CONFIG_TRACE_IRQFLAGS
@@ -216,14 +224,21 @@
CFI_DEF_CFA rsp,PDA_STACKOFFSET
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
- swapgs
+ SWAPGS_UNSAFE_STACK
+ /*
+ * A hypervisor implementation might want to use a label
+ * after the swapgs, so that it can do the swapgs
+ * for the guest and jump here on syscall.
+ */
+ENTRY(system_call_after_swapgs)
+
movq %rsp,%gs:pda_oldrsp
movq %gs:pda_kernelstack,%rsp
/*
* No need to follow this irqs off/on section - it's straight
* and short:
*/
- sti
+ ENABLE_INTERRUPTS(CLBR_NONE)
SAVE_ARGS 8,1
movq %rax,ORIG_RAX-ARGOFFSET(%rsp)
movq %rcx,RIP-ARGOFFSET(%rsp)
@@ -246,7 +261,7 @@
sysret_check:
LOCKDEP_SYS_EXIT
GET_THREAD_INFO(%rcx)
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
movl threadinfo_flags(%rcx),%edx
andl %edi,%edx
@@ -260,9 +275,7 @@
CFI_REGISTER rip,rcx
RESTORE_ARGS 0,-ARG_SKIP,1
/*CFI_REGISTER rflags,r11*/
- movq %gs:pda_oldrsp,%rsp
- swapgs
- sysretq
+ ENABLE_INTERRUPTS_SYSCALL_RET

CFI_RESTORE_STATE
/* Handle reschedules */
@@ -271,7 +284,7 @@
bt $TIF_NEED_RESCHED,%edx
jnc sysret_signal
TRACE_IRQS_ON
- sti
+ ENABLE_INTERRUPTS(CLBR_NONE)
pushq %rdi
CFI_ADJUST_CFA_OFFSET 8
call schedule
@@ -282,7 +295,7 @@
/* Handle a signal */
sysret_signal:
TRACE_IRQS_ON
- sti
+ ENABLE_INTERRUPTS(CLBR_NONE)
testl $(_TIF_SIGPENDING|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
jz 1f

@@ -295,7 +308,7 @@
1: movl $_TIF_NEED_RESCHED,%edi
/* Use IRET because user could have changed frame. This
works because ptregscall_common has called FIXUP_TOP_OF_STACK. */
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check

@@ -327,7 +340,7 @@
*/
.globl int_ret_from_sys_call
int_ret_from_sys_call:
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
testl $3,CS-ARGOFFSET(%rsp)
je retint_restore_args
@@ -349,20 +362,20 @@
bt $TIF_NEED_RESCHED,%edx
jnc int_very_careful
TRACE_IRQS_ON
- sti
+ ENABLE_INTERRUPTS(CLBR_NONE)
pushq %rdi
CFI_ADJUST_CFA_OFFSET 8
call schedule
popq %rdi
CFI_ADJUST_CFA_OFFSET -8
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check

/* handle signals and tracing -- both require a full stack frame */
int_very_careful:
TRACE_IRQS_ON
- sti
+ ENABLE_INTERRUPTS(CLBR_NONE)
SAVE_REST
/* Check for syscall exit trace */
testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP),%edx
@@ -385,7 +398,7 @@
1: movl $_TIF_NEED_RESCHED,%edi
int_restore_rest:
RESTORE_REST
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check
CFI_ENDPROC
@@ -506,7 +519,7 @@
CFI_DEF_CFA_REGISTER rbp
testl $3,CS(%rdi)
je 1f
- swapgs
+ SWAPGS
/* irqcount is used to check if a CPU is already on an interrupt
stack or not. While this is essentially redundant with preempt_count
it is a little cheaper to use a separate counter in the PDA
@@ -527,7 +540,7 @@
interrupt do_IRQ
/* 0(%rsp): oldrsp-ARGOFFSET */
ret_from_intr:
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
decl %gs:pda_irqcount
leaveq
@@ -556,13 +569,13 @@
/*
* The iretq could re-enable interrupts:
*/
- cli
+ DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_IRETQ
- swapgs
+ SWAPGS
jmp restore_args

retint_restore_args: /* return to kernel space */
- cli
+ DISABLE_INTERRUPTS(CLBR_ANY)
/*
* The iretq could re-enable interrupts:
*/
@@ -570,10 +583,14 @@
restore_args:
RESTORE_ARGS 0,8,0
iret_label:
+#ifdef CONFIG_PARAVIRT
+ INTERRUPT_RETURN
+#endif
+ENTRY(native_iret)
iretq

.section __ex_table,"a"
- .quad iret_label,bad_iret
+ .quad native_iret, bad_iret
.previous
.section .fixup,"ax"
/* force a signal here? this matches i386 behaviour */
@@ -581,24 +598,24 @@
bad_iret:
movq $11,%rdi /* SIGSEGV */
TRACE_IRQS_ON
- sti
- jmp do_exit
- .previous
-
+ ENABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
+ jmp do_exit
+ .previous
+
/* edi: workmask, edx: work */
retint_careful:
CFI_RESTORE_STATE
bt $TIF_NEED_RESCHED,%edx
jnc retint_signal
TRACE_IRQS_ON
- sti
+ ENABLE_INTERRUPTS(CLBR_NONE)
pushq %rdi
CFI_ADJUST_CFA_OFFSET 8
call schedule
popq %rdi
CFI_ADJUST_CFA_OFFSET -8
GET_THREAD_INFO(%rcx)
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp retint_check

@@ -606,14 +623,14 @@
testl $(_TIF_SIGPENDING|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
jz retint_swapgs
TRACE_IRQS_ON
- sti
+ ENABLE_INTERRUPTS(CLBR_NONE)
SAVE_REST
movq $-1,ORIG_RAX(%rsp)
xorl %esi,%esi # oldset
movq %rsp,%rdi # &pt_regs
call do_notify_resume
RESTORE_REST
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
movl $_TIF_NEED_RESCHED,%edi
GET_THREAD_INFO(%rcx)
@@ -731,7 +748,7 @@
rdmsr
testl %edx,%edx
js 1f
- swapgs
+ SWAPGS
xorl %ebx,%ebx
1:
.if \ist
@@ -747,7 +764,7 @@
.if \ist
addq $EXCEPTION_STKSZ, per_cpu__init_tss + TSS_ist + (\ist - 1) * 8(%rbp)
.endif
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
.if \irqtrace
TRACE_IRQS_OFF
.endif
@@ -776,10 +793,10 @@
.if \trace
TRACE_IRQS_IRETQ 0
.endif
- swapgs
+ SWAPGS_UNSAFE_STACK
paranoid_restore\trace:
RESTORE_ALL 8
- iretq
+ INTERRUPT_RETURN
paranoid_userspace\trace:
GET_THREAD_INFO(%rcx)
movl threadinfo_flags(%rcx),%ebx
@@ -794,11 +811,11 @@
.if \trace
TRACE_IRQS_ON
.endif
- sti
+ ENABLE_INTERRUPTS(CLBR_NONE)
xorl %esi,%esi /* arg2: oldset */
movq %rsp,%rdi /* arg1: &pt_regs */
call do_notify_resume
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
.if \trace
TRACE_IRQS_OFF
.endif
@@ -807,9 +824,9 @@
.if \trace
TRACE_IRQS_ON
.endif
- sti
+ ENABLE_INTERRUPTS(CLBR_ANY)
call schedule
- cli
+ DISABLE_INTERRUPTS(CLBR_ANY)
.if \trace
TRACE_IRQS_OFF
.endif
@@ -862,7 +879,7 @@
testl $3,CS(%rsp)
je error_kernelspace
error_swapgs:
- swapgs
+ SWAPGS
error_sti:
movq %rdi,RDI(%rsp)
CFI_REL_OFFSET rdi,RDI
@@ -874,7 +891,7 @@
error_exit:
movl %ebx,%eax
RESTORE_REST
- cli
+ DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
GET_THREAD_INFO(%rcx)
testl %eax,%eax
@@ -911,12 +928,12 @@
CFI_STARTPROC
pushf
CFI_ADJUST_CFA_OFFSET 8
- cli
- swapgs
+ DISABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
+ SWAPGS
gs_change:
movl %edi,%gs
2: mfence /* workaround */
- swapgs
+ SWAPGS
popf
CFI_ADJUST_CFA_OFFSET -8
ret
@@ -930,7 +947,7 @@
.section .fixup,"ax"
/* running with kernelgs */
bad_gs:
- swapgs /* switch back to user gs */
+ SWAPGS /* switch back to user gs */
xorl %eax,%eax
movl %eax,%gs
jmp 2b

2007-12-20 20:11:51

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 09/15] adds paravirt hook for swapgs

This patch adds paravirt hook for swapgs operation, which is a privileged
operation in x86_64.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/x86/kernel/paravirt.c | 1 +
include/asm-x86/paravirt.h | 9 +++++++++
include/asm-x86/processor.h | 8 ++++++++
3 files changed, 18 insertions(+), 0 deletions(-)

Index: linux-2.6-x86/arch/x86/kernel/paravirt.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/paravirt.c 2007-12-20 19:07:22.000000000 -0800
+++ linux-2.6-x86/arch/x86/kernel/paravirt.c 2007-12-20 19:08:06.000000000 -0800
@@ -390,6 +390,7 @@

.irq_enable_syscall_ret = native_irq_enable_syscall_ret,
.iret = native_iret,
+ .swapgs = native_swapgs,

.set_iopl_mask = native_set_iopl_mask,
.io_delay = native_io_delay,
Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:07:29.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:08:06.000000000 -0800
@@ -127,6 +127,8 @@
void (*irq_enable_syscall_ret)(void);
void (*iret)(void);

+ void (*swapgs)(void);
+
struct pv_lazy_ops lazy_mode;
};

@@ -1228,6 +1230,13 @@
call *pv_cpu_ops+PV_CPU_read_cr0; \
pop %edx; pop %ecx
#else
+#define SWAPGS \
+ PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_swapgs), CLBR_NONE, \
+ PV_SAVE_REGS \
+ call *pv_cpu_ops+PV_CPU_swapgs; \
+ PV_RESTORE_REGS \
+ )
+
#define GET_CR2_INTO_RCX \
call *pv_mmu_ops+PV_MMU_read_cr2; \
movq %rax, %rcx; \
Index: linux-2.6-x86/include/asm-x86/processor.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/processor.h 2007-12-20 19:06:59.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/processor.h 2007-12-20 19:08:06.000000000 -0800
@@ -435,6 +435,13 @@
#endif
}

+static inline void native_swapgs(void)
+{
+#ifdef CONFIG_X86_64
+ asm volatile("swapgs" ::: "memory");
+#endif
+}
+
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else
@@ -456,6 +463,7 @@
}

#define set_iopl_mask native_set_iopl_mask
+#define SWAPGS swapgs
#endif /* CONFIG_PARAVIRT */

/*

2007-12-20 20:12:19

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 11/15] cleanup CLI_STRING, STI_STRING and friends

Since the advent of ticket locking, CLI_STRING, STI_STRING, and friends
are not used anymore. They can now be safely deleted.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
include/asm-x86/spinlock.h | 9 ---------
1 files changed, 0 insertions(+), 9 deletions(-)

Index: linux-2.6-x86/include/asm-x86/spinlock.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/spinlock.h 2007-12-20 19:06:59.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/spinlock.h 2007-12-20 19:08:09.000000000 -0800
@@ -19,15 +19,6 @@
* (the type definitions are in asm/spinlock_types.h)
*/

-#ifdef CONFIG_PARAVIRT
-#include <asm/paravirt.h>
-#else
-#define CLI_STRING "cli"
-#define STI_STRING "sti"
-#define CLI_STI_CLOBBERS
-#define CLI_STI_INPUT_ARGS
-#endif /* CONFIG_PARAVIRT */
-
#ifdef CONFIG_X86_32
typedef char _slock_t;
# define LOCK_INS_DEC "decb"

2007-12-20 20:12:41

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 12/15] add CLBR_ defines for x86_64.

x86_64 needs a potentially larger clobber list than i386, due to its calling
convention. So we add more CLBR_ defines for it.
Note that CLBR_ANY is different for each of the architectures, since it comprises
the notion of "All call clobbers in this architecture"

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
include/asm-x86/paravirt.h | 23 ++++++++++++++++++-----
1 files changed, 18 insertions(+), 5 deletions(-)

Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:08:06.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:08:10.000000000 -0800
@@ -8,11 +8,24 @@
#include <asm/asm.h>

/* Bitmask of what can be clobbered: usually at least eax. */
-#define CLBR_NONE 0x0
-#define CLBR_EAX 0x1
-#define CLBR_ECX 0x2
-#define CLBR_EDX 0x4
-#define CLBR_ANY 0x7
+#define CLBR_NONE 0
+#define CLBR_EAX (1 << 0)
+#define CLBR_ECX (1 << 1)
+#define CLBR_EDX (1 << 2)
+
+#ifdef CONFIG_X86_64
+#define CLBR_RSI (1 << 3)
+#define CLBR_RDI (1 << 4)
+#define CLBR_R8 (1 << 5)
+#define CLBR_R9 (1 << 6)
+#define CLBR_R10 (1 << 7)
+#define CLBR_R11 (1 << 8)
+#define CLBR_ANY ((1 << 9) - 1)
+#include <asm/desc_defs.h>
+#else
+/* CLBR_ANY should match all regs platform has. For i386, that's just it */
+#define CLBR_ANY ((1 << 3) - 1)
+#endif /* X86_64 */

#ifndef __ASSEMBLY__
#include <linux/types.h>

2007-12-20 20:12:59

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 13/15] move patching code to arch-specific file.

The core patching code for paravirt is sufficiently different
among i386 and x86_64, and we move them to specific files.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/x86/kernel/Makefile_32 | 2 +-
arch/x86/kernel/paravirt.c | 50 -------------------------------------------
include/asm-x86/paravirt.h | 8 +++++++
3 files changed, 9 insertions(+), 51 deletions(-)

Index: linux-2.6-x86/arch/x86/kernel/Makefile_32
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/Makefile_32 2007-12-20 19:07:15.000000000 -0800
+++ linux-2.6-x86/arch/x86/kernel/Makefile_32 2007-12-20 19:08:11.000000000 -0800
@@ -48,7 +48,7 @@
obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o

obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o
-obj-$(CONFIG_PARAVIRT) += paravirt.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_32.o
obj-y += pcspeaker.o

obj-$(CONFIG_SCx200) += scx200_32.o
Index: linux-2.6-x86/arch/x86/kernel/paravirt.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/paravirt.c 2007-12-20 19:08:06.000000000 -0800
+++ linux-2.6-x86/arch/x86/kernel/paravirt.c 2007-12-20 19:08:11.000000000 -0800
@@ -58,59 +58,9 @@
extern const char start_##ops##_##name[], end_##ops##_##name[]; \
asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")

-DEF_NATIVE(pv_irq_ops, irq_disable, "cli");
-DEF_NATIVE(pv_irq_ops, irq_enable, "sti");
-DEF_NATIVE(pv_irq_ops, restore_fl, "push %eax; popf");
-DEF_NATIVE(pv_irq_ops, save_fl, "pushf; pop %eax");
-DEF_NATIVE(pv_cpu_ops, iret, "iret");
-DEF_NATIVE(pv_cpu_ops, irq_enable_syscall_ret, "sti; sysexit");
-DEF_NATIVE(pv_mmu_ops, read_cr2, "mov %cr2, %eax");
-DEF_NATIVE(pv_mmu_ops, write_cr3, "mov %eax, %cr3");
-DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
-DEF_NATIVE(pv_cpu_ops, clts, "clts");
-DEF_NATIVE(pv_cpu_ops, read_tsc, "rdtsc");
-
/* Undefined instruction for dealing with missing ops pointers. */
static const unsigned char ud2a[] = { 0x0f, 0x0b };

-static unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
- unsigned long addr, unsigned len)
-{
- const unsigned char *start, *end;
- unsigned ret;
-
- switch(type) {
-#define SITE(ops, x) \
- case PARAVIRT_PATCH(ops.x): \
- start = start_##ops##_##x; \
- end = end_##ops##_##x; \
- goto patch_site
-
- SITE(pv_irq_ops, irq_disable);
- SITE(pv_irq_ops, irq_enable);
- SITE(pv_irq_ops, restore_fl);
- SITE(pv_irq_ops, save_fl);
- SITE(pv_cpu_ops, iret);
- SITE(pv_cpu_ops, irq_enable_syscall_ret);
- SITE(pv_mmu_ops, read_cr2);
- SITE(pv_mmu_ops, read_cr3);
- SITE(pv_mmu_ops, write_cr3);
- SITE(pv_cpu_ops, clts);
- SITE(pv_cpu_ops, read_tsc);
-#undef SITE
-
- patch_site:
- ret = paravirt_patch_insns(ibuf, len, start, end);
- break;
-
- default:
- ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
- break;
- }
-
- return ret;
-}
-
unsigned paravirt_patch_nop(void)
{
return 0;
Index: linux-2.6-x86/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-x86.orig/include/asm-x86/paravirt.h 2007-12-20 19:08:10.000000000 -0800
+++ linux-2.6-x86/include/asm-x86/paravirt.h 2007-12-20 19:08:11.000000000 -0800
@@ -308,6 +308,11 @@
#define paravirt_alt(insn_string) \
_paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")

+/* Simple instruction patching code. */
+#define DEF_NATIVE(ops, name, code) \
+ extern const char start_##ops##_##name[], end_##ops##_##name[]; \
+ asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
+
unsigned paravirt_patch_nop(void);
unsigned paravirt_patch_ignore(unsigned len);
unsigned paravirt_patch_call(void *insnbuf,
@@ -322,6 +327,9 @@
unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
const char *start, const char *end);

+unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
+ unsigned long addr, unsigned len);
+
int paravirt_disable_iospace(void);

/*

2007-12-20 20:13:22

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 14/15] x86_64 patching functions

Like i386, x86_64 also need to include its own patching function.
(Well, if you're not in a hurry, and don't care about speed, you don't
really _need_ ;-))

So here they are. Not much different in essence from i386

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/x86/kernel/Makefile_64 | 1 +
arch/x86/kernel/paravirt_patch_64.c | 56 +++++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/kernel/paravirt_patch_64.c

Index: linux-2.6-x86/arch/x86/kernel/Makefile_64
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/Makefile_64 2007-12-20 19:05:52.000000000 -0800
+++ linux-2.6-x86/arch/x86/kernel/Makefile_64 2007-12-20 19:08:14.000000000 -0800
@@ -41,6 +41,7 @@
obj-$(CONFIG_K8_NB) += k8.o
obj-$(CONFIG_AUDIT) += audit_64.o
obj-$(CONFIG_EFI) += efi.o efi_64.o efi_stub_64.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_64.o

obj-$(CONFIG_MODULES) += module_64.o
obj-$(CONFIG_PCI) += early-quirks.o
Index: linux-2.6-x86/arch/x86/kernel/paravirt_patch_64.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-x86/arch/x86/kernel/paravirt_patch_64.c 2007-12-20 19:08:14.000000000 -0800
@@ -0,0 +1,56 @@
+#include <asm/paravirt.h>
+#include <asm/asm-offsets.h>
+
+DEF_NATIVE(pv_irq_ops, irq_disable, "cli");
+DEF_NATIVE(pv_irq_ops, irq_enable, "sti");
+DEF_NATIVE(pv_irq_ops, restore_fl, "pushq %rdi; popfq");
+DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax");
+DEF_NATIVE(pv_cpu_ops, iret, "iretq");
+DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax");
+DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax");
+DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3");
+DEF_NATIVE(pv_mmu_ops, flush_tlb_single, "invlpg (%rdi)");
+DEF_NATIVE(pv_cpu_ops, clts, "clts");
+DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd");
+
+/* the three commands give us more control to how to return from a syscall */
+DEF_NATIVE(pv_cpu_ops, irq_enable_syscall_ret, "movq %gs:" __stringify(pda_oldrsp) ", %rsp; swapgs; sysretq;");
+DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs");
+
+unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
+ unsigned long addr, unsigned len)
+{
+ const unsigned char *start, *end;
+ unsigned ret;
+
+#define PATCH_SITE(ops, x) \
+ case PARAVIRT_PATCH(ops.x): \
+ start = start_##ops##_##x; \
+ end = end_##ops##_##x; \
+ goto patch_site
+ switch(type) {
+ PATCH_SITE(pv_irq_ops, restore_fl);
+ PATCH_SITE(pv_irq_ops, save_fl);
+ PATCH_SITE(pv_irq_ops, irq_enable);
+ PATCH_SITE(pv_irq_ops, irq_disable);
+ PATCH_SITE(pv_cpu_ops, iret);
+ PATCH_SITE(pv_cpu_ops, irq_enable_syscall_ret);
+ PATCH_SITE(pv_cpu_ops, swapgs);
+ PATCH_SITE(pv_mmu_ops, read_cr2);
+ PATCH_SITE(pv_mmu_ops, read_cr3);
+ PATCH_SITE(pv_mmu_ops, write_cr3);
+ PATCH_SITE(pv_cpu_ops, clts);
+ PATCH_SITE(pv_mmu_ops, flush_tlb_single);
+ PATCH_SITE(pv_cpu_ops, wbinvd);
+
+ patch_site:
+ ret = paravirt_patch_insns(ibuf, len, start, end);
+ break;
+
+ default:
+ ret = paravirt_patch_default(type, clobbers, ibuf, addr, len);
+ break;
+ }
+#undef PATCH_SITE
+ return ret;
+}

2007-12-20 20:13:37

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 15/15] replace x86_read/write_per_cpu with a common function.

x86_read_per_cpu() and its writeish sister are not present in x86_64. So in
this patch, we replace them with __get_cpu_var(), which is present in both

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/x86/kernel/paravirt.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6-x86/arch/x86/kernel/paravirt.c
===================================================================
--- linux-2.6-x86.orig/arch/x86/kernel/paravirt.c 2007-12-20 19:08:11.000000000 -0800
+++ linux-2.6-x86/arch/x86/kernel/paravirt.c 2007-12-20 19:08:18.000000000 -0800
@@ -238,18 +238,18 @@

static inline void enter_lazy(enum paravirt_lazy_mode mode)
{
- BUG_ON(x86_read_percpu(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);
+ BUG_ON(__get_cpu_var(paravirt_lazy_mode) != PARAVIRT_LAZY_NONE);
BUG_ON(preemptible());

- x86_write_percpu(paravirt_lazy_mode, mode);
+ __get_cpu_var(paravirt_lazy_mode) = mode;
}

void paravirt_leave_lazy(enum paravirt_lazy_mode mode)
{
- BUG_ON(x86_read_percpu(paravirt_lazy_mode) != mode);
+ BUG_ON(__get_cpu_var(paravirt_lazy_mode) != mode);
BUG_ON(preemptible());

- x86_write_percpu(paravirt_lazy_mode, PARAVIRT_LAZY_NONE);
+ __get_cpu_var(paravirt_lazy_mode) = PARAVIRT_LAZY_NONE;
}

void paravirt_enter_lazy_mmu(void)
@@ -274,7 +274,7 @@

enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
{
- return x86_read_percpu(paravirt_lazy_mode);
+ return __get_cpu_var(paravirt_lazy_mode);
}

struct pv_info pv_info = {

2007-12-20 20:34:24

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 13/15] move patching code to arch-specific file.


* Ingo Molnar <[email protected]> wrote:

> this patch adds the paravirt_patch_32.o:
>
> > -obj-$(CONFIG_PARAVIRT) += paravirt.o
> > +obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_32.o
>
> but does not add that file nor any other rule to build that target, so
> it fails to build with the attached config:
>
> make[1]: *** No rule to make target
> `arch/x86/kernel/paravirt_patch_32.o', needed by
> `arch/x86/kernel/built-in.o'. Stop.

if it's just that single missing file then please send me a patch that
adds that file and i'll add it to this patch. (instead of resending the
whole series)

Ingo

2007-12-20 20:34:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 13/15] move patching code to arch-specific file.


* Glauber de Oliveira Costa <[email protected]> wrote:

> The core patching code for paravirt is sufficiently different
> among i386 and x86_64, and we move them to specific files.

this patch adds the paravirt_patch_32.o:

> -obj-$(CONFIG_PARAVIRT) += paravirt.o
> +obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_32.o

but does not add that file nor any other rule to build that target, so
it fails to build with the attached config:

make[1]: *** No rule to make target
`arch/x86/kernel/paravirt_patch_32.o', needed by
`arch/x86/kernel/built-in.o'. Stop.

Ingo


Attachments:
(No filename) (578.00 B)
config (51.24 kB)
Download all attachments

2007-12-20 21:22:37

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH 13/15] move patching code to arch-specific file.

On Dec 20, 2007 6:33 PM, Ingo Molnar <[email protected]> wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > this patch adds the paravirt_patch_32.o:
> >
> > > -obj-$(CONFIG_PARAVIRT) += paravirt.o
> > > +obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_32.o
> >
> > but does not add that file nor any other rule to build that target, so
> > it fails to build with the attached config:
> >
> > make[1]: *** No rule to make target
> > `arch/x86/kernel/paravirt_patch_32.o', needed by
> > `arch/x86/kernel/built-in.o'. Stop.
>
> if it's just that single missing file then please send me a patch that
> adds that file and i'll add it to this patch. (instead of resending the
> whole series)

Yeah, I must have forgotten to git-add it.
It was building here fine, because I clearly have the file locally ;-)

I'm attaching a new patch that replaces just this one.

--
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."


Attachments:
(No filename) (1.00 kB)
0001-move-patching-code-to-arch-specific-file.patch (5.86 kB)
Download all attachments

2007-12-20 21:23:52

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 15/15] replace x86_read/write_per_cpu with a common function.

Glauber de Oliveira Costa wrote:
> x86_read_per_cpu() and its writeish sister are not present in x86_64. So in
> this patch, we replace them with __get_cpu_var(), which is present in both

Eventually we'd like to use something closer to the i386 code for
x86_64, obviously...

-hpa

2007-12-20 21:28:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 13/15] move patching code to arch-specific file.


* Glauber de Oliveira Costa <[email protected]> wrote:

> > > make[1]: *** No rule to make target
> > > `arch/x86/kernel/paravirt_patch_32.o', needed by
> > > `arch/x86/kernel/built-in.o'. Stop.
> >
> > if it's just that single missing file then please send me a patch that
> > adds that file and i'll add it to this patch. (instead of resending the
> > whole series)
>
> Yeah, I must have forgotten to git-add it. It was building here fine,
> because I clearly have the file locally ;-)
>
> I'm attaching a new patch that replaces just this one.

thanks, this did the trick, it builds fine now.

Ingo