2009-09-18 00:17:11

by Alok Kataria

Subject: Paravirtualization on VMware's Platform [VMI].

Hi,

We ran a few experiments to compare performance of VMware's
paravirtualization technique (VMI) and hardware MMU technologies (HWMMU)
on VMware's hypervisor.

To give some background, VMI is VMware's paravirtualization
specification which tries to optimize CPU and MMU operations of the
guest operating system. For more information take a look at this
http://www.vmware.com/interfaces/paravirtualization.html

In most of the benchmarks, EPT/NPT (hwmmu) technologies are at par or
provide better performance compared to VMI.
The experiments included comparing performance across various
micro-benchmarks and real-world-like benchmarks.

Host configuration used for testing.
* Dell PowerEdge 2970
* 2 x quad-core AMD Opteron 2384 2.7GHz (Shanghai C2), RVI capable.
* 8 GB (4 x 2GB) memory, NUMA enabled
* 2 x 300GB RAID 0 storage
* 2 x embedded 1Gb NICs (Broadcom NetXtreme II BCM5708 1000Base-T)
* Running development build of ESX.

The guest VM was a SLES 10 SP2 based VM for both the VMI and non-VMI
case. kernel version: 2.6.16.60-0.37_f594963d-vmipae.

Below is a short summary of performance results between HWMMU and VMI.
These results are averaged over 9 runs. The memory was sized at 512MB
per VCPU in all experiments.
For the ratio results comparing hwmmu technologies to vmi, higher than 1
means hwmmu is better than vmi.

compile workloads - 4-way : 1.02, i.e. about 2% better.
compile workloads - 8-way : 1.14, i.e. 14% better.
oracle swingbench - 4-way (small pages) : 1.34, i.e. 34% better.
oracle swingbench - 4-way (large pages) : 1.03, i.e. 3% better.
specjbb (large pages) : 0.99, i.e. 1% degradation.

Please note that specjbb is the worst case benchmark for hwmmu, due to
the higher TLB miss latency, so it's a good result that the worst case
benchmark has a degradation of only 1%.

VMware expects that these hardware virtualization features will be
ubiquitous by 2011.

Apart from the performance benefit, VMI was important for Linux on
VMware's platform from a timekeeping point of view, but with the tickless
kernels and TSC improvements that were done for the mainline tree, we
think VMI has outlived those requirements too.

In light of these results and availability of such hardware, we have
decided to stop supporting VMI in our future products.

Given this new development, I wanted to discuss how we should go about
retiring the VMI code from mainline Linux, i.e. the vmi_32.c and
vmiclock_32.c bits.

One of the options that I am contemplating is to drop the code from the
tip tree in this release cycle, and given that this should be a low risk
change we can remove it from Linus's tree later in the merge cycle.

Let me know your views on this or if you think we should do this some
other way.

Thanks,
Alok


2009-09-18 00:40:24

by Chris Wright

Subject: Re: Paravirtualization on VMware's Platform [VMI].

* Alok Kataria ([email protected]) wrote:
> We ran a few experiments to compare performance of VMware's
> paravirtualization technique (VMI) and hardware MMU technologies (HWMMU)
> on VMware's hypervisor.
>
> To give some background, VMI is VMware's paravirtualization
> specification which tries to optimize CPU and MMU operations of the
> guest operating system. For more information take a look at this
> http://www.vmware.com/interfaces/paravirtualization.html
>
> In most of the benchmarks, EPT/NPT (hwmmu) technologies are at par or
> provide better performance compared to VMI.
> The experiments included comparing performance across various
> micro-benchmarks and real-world-like benchmarks.
>
> Host configuration used for testing.
> * Dell PowerEdge 2970
> * 2 x quad-core AMD Opteron 2384 2.7GHz (Shanghai C2), RVI capable.
> * 8 GB (4 x 2GB) memory, NUMA enabled
> * 2 x 300GB RAID 0 storage
> * 2 x embedded 1Gb NICs (Broadcom NetXtreme II BCM5708 1000Base-T)
> * Running development build of ESX.
>
> The guest VM was a SLES 10 SP2 based VM for both the VMI and non-VMI
> case. kernel version: 2.6.16.60-0.37_f594963d-vmipae.
>
> Below is a short summary of performance results between HWMMU and VMI.
> These results are averaged over 9 runs. The memory was sized at 512MB
> per VCPU in all experiments.
> For the ratio results comparing hwmmu technologies to vmi, higher than 1
> means hwmmu is better than vmi.
>
> compile workloads - 4-way : 1.02, i.e. about 2% better.
> compile workloads - 8-way : 1.14, i.e. 14% better.
> oracle swingbench - 4-way (small pages) : 1.34, i.e. 34% better.
> oracle swingbench - 4-way (large pages) : 1.03, i.e. 3% better.
> specjbb (large pages) : 0.99, i.e. 1% degradation.

Not entirely surprising. Curious if you ran specjbb w/ small pages too?

> Please note that specjbb is the worst case benchmark for hwmmu, due to
> the higher TLB miss latency, so it's a good result that the worst case
> benchmark has a degradation of only 1%.
>
> VMware expects that these hardware virtualization features will be
> ubiquitous by 2011.
>
> Apart from the performance benefit, VMI was important for Linux on
> VMware's platform from a timekeeping point of view, but with the tickless
> kernels and TSC improvements that were done for the mainline tree, we
> think VMI has outlived those requirements too.
>
> In light of these results and availability of such hardware, we have
> decided to stop supporting VMI in our future products.
>
> Given this new development, I wanted to discuss how we should go about
> retiring the VMI code from mainline Linux, i.e. the vmi_32.c and
> vmiclock_32.c bits.
>
> One of the options that I am contemplating is to drop the code from the
> tip tree in this release cycle, and given that this should be a low risk
> change we can remove it from Linus's tree later in the merge cycle.
>
> Let me know your views on this or if you think we should do this some
> other way.

Typically we give time measured in multiple release cycles
before deprecating a feature. This means placing an entry in
Documentation/feature-removal-schedule.txt, and potentially
adding some noise to warn users they are using a deprecated
feature.
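
For illustration only, entries in that file follow a What/When/Why/Who
layout. The values below are hypothetical placeholders showing the
shape of such an entry, not an actual proposed one:

```text
What:	CONFIG_VMI (VMI paravirtualized guest support)
When:	<target release, to be decided>
Why:	<reason for removal, e.g. hardware MMU virtualization
	performs as well as or better than VMI>
Who:	<maintainer name and address>
```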

thanks,
-chris

2009-09-18 00:53:33

by Jeremy Fitzhardinge

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/17/09 17:34, Chris Wright wrote:
>> One of the options that I am contemplating is to drop the code from the
>> tip tree in this release cycle, and given that this should be a low risk
>> change we can remove it from Linus's tree later in the merge cycle.
>>
>> Let me know your views on this or if you think we should do this some
>> other way.
>>
> Typically we give time measured in multiple release cycles
> before deprecating a feature. This means placing an entry in
> Documentation/feature-removal-schedule.txt, and potentially
> adding some noise to warn users they are using a deprecated
> feature.
>

That's true if the feature has some functional effect on users. But at
first sight, VMI is really just an optimisation, and a non-VMI-equipped
kernel would be completely functionally equivalent, right?

On the other hand, there could well be a performance regression which
could affect users. However they're taking the explicit step of
withdrawing support for VMI, so I guess they can just take that in their
stride.

J

2009-09-18 01:04:18

by Chris Wright

Subject: Re: Paravirtualization on VMware's Platform [VMI].

* Jeremy Fitzhardinge ([email protected]) wrote:
> On 09/17/09 17:34, Chris Wright wrote:
> >> One of the options that I am contemplating is to drop the code from the
> >> tip tree in this release cycle, and given that this should be a low risk
> >> change we can remove it from Linus's tree later in the merge cycle.
> >>
> >> Let me know your views on this or if you think we should do this some
> >> other way.
> >>
> > Typically we give time measured in multiple release cycles
> > before deprecating a feature. This means placing an entry in
> > Documentation/feature-removal-schedule.txt, and potentially
> > adding some noise to warn users they are using a deprecated
> > feature.
>
> That's true if the feature has some functional effect on users. But at
> first sight, VMI is really just an optimisation, and a non-VMI-equipped
> kernel would be completely functionally equivalent, right?

True. I'm all for removing code that's got no planned maintenance and
no place to run ;-)

> On the other hand, there could well be a performance regression which
> could affect users. However they're taking the explicit step of
> withdrawing support for VMI, so I guess they can just take that in their
> stride.

Yeah. Different than normal deprecation since it's atop VMware's HV
which is all in their domain.

thanks,
-chris

2009-09-18 01:43:12

by Alok Kataria

Subject: Re: Paravirtualization on VMware's Platform [VMI].


On Thu, 2009-09-17 at 17:58 -0700, Chris Wright wrote:
> * Jeremy Fitzhardinge ([email protected]) wrote:
> > On 09/17/09 17:34, Chris Wright wrote:
> > >> One of the options that I am contemplating is to drop the code from the
> > >> tip tree in this release cycle, and given that this should be a low risk
> > >> change we can remove it from Linus's tree later in the merge cycle.
> > >>
> > >> Let me know your views on this or if you think we should do this some
> > >> other way.
> > >>
> > > Typically we give time measured in multiple release cycles
> > > before deprecating a feature. This means placing an entry in
> > > Documentation/feature-removal-schedule.txt, and potentially
> > > adding some noise to warn users they are using a deprecated
> > > feature.
> >
> > That's true if the feature has some functional effect on users. But at
> > first sight, VMI is really just an optimisation, and a non-VMI-equipped
> > kernel would be completely functionally equivalent, right?
>
> True. I'm all for removing code that's got no planned maintenance and
> no place to run ;-)

That's correct; Jeremy put it as well as I could. VMI was always an
optimization, and we expect the new HW features to bridge that performance
gap too. So a generic kernel will run just as well on VMware's platform.

Having said that, I would like to clarify that existing products which
support VMI will still carry on supporting it for the current customer
base. It's only the new products which will stop supporting this feature.

>
> > On the other hand, there could well be a performance regression which
> > could affect users. However they're taking the explicit step of
> > withdrawing support for VMI, so I guess they can just take that in their
> > stride.
>
> Yeah. Different than normal deprecation since it's atop VMware's HV
> which is all in their domain.

Yep that's true.

Thanks,
Alok

2009-09-19 07:47:06

by Avi Kivity

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/18/2009 03:17 AM, Alok Kataria wrote:
> Hi,
>
> We ran a few experiments to compare performance of VMware's
> paravirtualization technique (VMI) and hardware MMU technologies (HWMMU)
> on VMware's hypervisor.
>
> To give some background, VMI is VMware's paravirtualization
> specification which tries to optimize CPU and MMU operations of the
> guest operating system. For more information take a look at this
> http://www.vmware.com/interfaces/paravirtualization.html
>
> In most of the benchmarks, EPT/NPT (hwmmu) technologies are at par or
> provide better performance compared to VMI.
> The experiments included comparing performance across various
> micro-benchmarks and real-world-like benchmarks.
>

We've reached a similar conclusion for kvm pvmmu vs ept/npt.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2009-09-19 22:47:57

by Greg KH

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On Thu, Sep 17, 2009 at 05:17:08PM -0700, Alok Kataria wrote:
> Given this new development, I wanted to discuss how we should go about
> retiring the VMI code from mainline Linux, i.e. the vmi_32.c and
> vmiclock_32.c bits.
>
> One of the options that I am contemplating is to drop the code from the
> tip tree in this release cycle, and given that this should be a low risk
> change we can remove it from Linus's tree later in the merge cycle.

That sounds good to me, how intrusive are the patches to do this? Is it
going to be tricky to get everything merged properly in -tip for it?

thanks,

greg k-h

2009-09-20 01:05:00

by Jeremy Fitzhardinge

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/19/09 15:44, Greg KH wrote:
> That sounds good to me, how intrusive are the patches to do this? Is it
> going to be tricky to get everything merged properly in -tip for it?

They should be very local - just a matter of removing a couple of files
and dropping some config options.

J

2009-09-20 03:56:32

by Alok Kataria

Subject: Re: Paravirtualization on VMware's Platform [VMI].


On Sat, 2009-09-19 at 15:44 -0700, Greg KH wrote:
> On Thu, Sep 17, 2009 at 05:17:08PM -0700, Alok Kataria wrote:
> > Given this new development, I wanted to discuss how we should go about
> > retiring the VMI code from mainline Linux, i.e. the vmi_32.c and
> > vmiclock_32.c bits.
> >
> > One of the options that I am contemplating is to drop the code from the
> > tip tree in this release cycle, and given that this should be a low risk
> > change we can remove it from Linus's tree later in the merge cycle.
>
> That sounds good to me, how intrusive are the patches to do this?

It's a single patch, and the changes are pretty much self-contained;
the meat of the patch consists of removing the vmi_32.c and vmiclock_32.c
files. I don't think we need to break the changes down.

The diffstat is below; I will post the patch in a separate mail.

====
Documentation/kernel-parameters.txt | 2
arch/x86/Kconfig | 10
arch/x86/include/asm/vmi.h | 269 ----------
arch/x86/include/asm/vmi_time.h | 98 ----
arch/x86/kernel/Makefile | 1
arch/x86/kernel/setup.c | 7
arch/x86/kernel/smpboot.c | 9
arch/x86/kernel/vmi_32.c | 913 -----------------------------------
arch/x86/kernel/vmiclock_32.c | 321 ------------
9 files changed, 1 insertions(+), 1629 deletions(-)
delete mode 100644 arch/x86/include/asm/vmi.h
delete mode 100644 arch/x86/include/asm/vmi_time.h
delete mode 100644 arch/x86/kernel/vmi_32.c
delete mode 100644 arch/x86/kernel/vmiclock_32.c
====

> Is it going to be tricky to get everything merged properly in -tip
> for it?

IMO, shouldn't be a problem.

Thanks,
Alok

2009-09-20 03:59:45

by Alok Kataria

Subject: Re: Paravirtualization on VMware's Platform [VMI].

Here is the patch which actually removes the vmi code.

Signed-off-by: Alok N Kataria <[email protected]>
---

Documentation/kernel-parameters.txt | 2
arch/x86/Kconfig | 10
arch/x86/include/asm/vmi.h | 269 ----------
arch/x86/include/asm/vmi_time.h | 98 ----
arch/x86/kernel/Makefile | 1
arch/x86/kernel/setup.c | 7
arch/x86/kernel/smpboot.c | 9
arch/x86/kernel/vmi_32.c | 913 -----------------------------------
arch/x86/kernel/vmiclock_32.c | 321 ------------
9 files changed, 1 insertions(+), 1629 deletions(-)
delete mode 100644 arch/x86/include/asm/vmi.h
delete mode 100644 arch/x86/include/asm/vmi_time.h
delete mode 100644 arch/x86/kernel/vmi_32.c
delete mode 100644 arch/x86/kernel/vmiclock_32.c


diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index f45d0d8..3c679aa 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -463,7 +463,7 @@ and is between 256 and 4096 characters. It is defined in the file
[ARM] imx_timer1,OSTS,netx_timer,mpu_timer2,
pxa_timer,timer3,32k_counter,timer0_1
[AVR32] avr32
- [X86-32] pit,hpet,tsc,vmi-timer;
+ [X86-32] pit,hpet,tsc;
scx200_hrt on Geode; cyclone on IBM x440
[MIPS] MIPS
[PARISC] cr16
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index beed5c2..c761aeb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -484,16 +484,6 @@ if PARAVIRT_GUEST

source "arch/x86/xen/Kconfig"

-config VMI
- bool "VMI Guest support"
- select PARAVIRT
- depends on X86_32
- ---help---
- VMI provides a paravirtualized interface to the VMware ESX server
- (it could be used by other hypervisors in theory too, but is not
- at the moment), by linking the kernel to a GPL-ed ROM module
- provided by the hypervisor.
-
config KVM_CLOCK
bool "KVM paravirtualized clock"
select PARAVIRT
diff --git a/arch/x86/include/asm/vmi.h b/arch/x86/include/asm/vmi.h
deleted file mode 100644
index 61e08c0..0000000
--- a/arch/x86/include/asm/vmi.h
+++ /dev/null
@@ -1,269 +0,0 @@
-/*
- * VMI interface definition
- *
- * Copyright (C) 2005, VMware, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT. See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Maintained by: Zachary Amsden [email protected]
- *
- */
-#include <linux/types.h>
-
-/*
- *---------------------------------------------------------------------
- *
- * VMI Option ROM API
- *
- *---------------------------------------------------------------------
- */
-#define VMI_SIGNATURE 0x696d5663 /* "cVmi" */
-
-#define PCI_VENDOR_ID_VMWARE 0x15AD
-#define PCI_DEVICE_ID_VMWARE_VMI 0x0801
-
-/*
- * We use two version numbers for compatibility, with the major
- * number signifying interface breakages, and the minor number
- * interface extensions.
- */
-#define VMI_API_REV_MAJOR 3
-#define VMI_API_REV_MINOR 0
-
-#define VMI_CALL_CPUID 0
-#define VMI_CALL_WRMSR 1
-#define VMI_CALL_RDMSR 2
-#define VMI_CALL_SetGDT 3
-#define VMI_CALL_SetLDT 4
-#define VMI_CALL_SetIDT 5
-#define VMI_CALL_SetTR 6
-#define VMI_CALL_GetGDT 7
-#define VMI_CALL_GetLDT 8
-#define VMI_CALL_GetIDT 9
-#define VMI_CALL_GetTR 10
-#define VMI_CALL_WriteGDTEntry 11
-#define VMI_CALL_WriteLDTEntry 12
-#define VMI_CALL_WriteIDTEntry 13
-#define VMI_CALL_UpdateKernelStack 14
-#define VMI_CALL_SetCR0 15
-#define VMI_CALL_SetCR2 16
-#define VMI_CALL_SetCR3 17
-#define VMI_CALL_SetCR4 18
-#define VMI_CALL_GetCR0 19
-#define VMI_CALL_GetCR2 20
-#define VMI_CALL_GetCR3 21
-#define VMI_CALL_GetCR4 22
-#define VMI_CALL_WBINVD 23
-#define VMI_CALL_SetDR 24
-#define VMI_CALL_GetDR 25
-#define VMI_CALL_RDPMC 26
-#define VMI_CALL_RDTSC 27
-#define VMI_CALL_CLTS 28
-#define VMI_CALL_EnableInterrupts 29
-#define VMI_CALL_DisableInterrupts 30
-#define VMI_CALL_GetInterruptMask 31
-#define VMI_CALL_SetInterruptMask 32
-#define VMI_CALL_IRET 33
-#define VMI_CALL_SYSEXIT 34
-#define VMI_CALL_Halt 35
-#define VMI_CALL_Reboot 36
-#define VMI_CALL_Shutdown 37
-#define VMI_CALL_SetPxE 38
-#define VMI_CALL_SetPxELong 39
-#define VMI_CALL_UpdatePxE 40
-#define VMI_CALL_UpdatePxELong 41
-#define VMI_CALL_MachineToPhysical 42
-#define VMI_CALL_PhysicalToMachine 43
-#define VMI_CALL_AllocatePage 44
-#define VMI_CALL_ReleasePage 45
-#define VMI_CALL_InvalPage 46
-#define VMI_CALL_FlushTLB 47
-#define VMI_CALL_SetLinearMapping 48
-
-#define VMI_CALL_SetIOPLMask 61
-#define VMI_CALL_SetInitialAPState 62
-#define VMI_CALL_APICWrite 63
-#define VMI_CALL_APICRead 64
-#define VMI_CALL_IODelay 65
-#define VMI_CALL_SetLazyMode 73
-
-/*
- *---------------------------------------------------------------------
- *
- * MMU operation flags
- *
- *---------------------------------------------------------------------
- */
-
-/* Flags used by VMI_{Allocate|Release}Page call */
-#define VMI_PAGE_PAE 0x10 /* Allocate PAE shadow */
-#define VMI_PAGE_CLONE 0x20 /* Clone from another shadow */
-#define VMI_PAGE_ZEROED 0x40 /* Page is pre-zeroed */
-
-
-/* Flags shared by Allocate|Release Page and PTE updates */
-#define VMI_PAGE_PT 0x01
-#define VMI_PAGE_PD 0x02
-#define VMI_PAGE_PDP 0x04
-#define VMI_PAGE_PML4 0x08
-
-#define VMI_PAGE_NORMAL 0x00 /* for debugging */
-
-/* Flags used by PTE updates */
-#define VMI_PAGE_CURRENT_AS 0x10 /* implies VMI_PAGE_VA_MASK is valid */
-#define VMI_PAGE_DEFER 0x20 /* may queue update until TLB inval */
-#define VMI_PAGE_VA_MASK 0xfffff000
-
-#ifdef CONFIG_X86_PAE
-#define VMI_PAGE_L1 (VMI_PAGE_PT | VMI_PAGE_PAE | VMI_PAGE_ZEROED)
-#define VMI_PAGE_L2 (VMI_PAGE_PD | VMI_PAGE_PAE | VMI_PAGE_ZEROED)
-#else
-#define VMI_PAGE_L1 (VMI_PAGE_PT | VMI_PAGE_ZEROED)
-#define VMI_PAGE_L2 (VMI_PAGE_PD | VMI_PAGE_ZEROED)
-#endif
-
-/* Flags used by VMI_FlushTLB call */
-#define VMI_FLUSH_TLB 0x01
-#define VMI_FLUSH_GLOBAL 0x02
-
-/*
- *---------------------------------------------------------------------
- *
- * VMI relocation definitions for ROM call get_reloc
- *
- *---------------------------------------------------------------------
- */
-
-/* VMI Relocation types */
-#define VMI_RELOCATION_NONE 0
-#define VMI_RELOCATION_CALL_REL 1
-#define VMI_RELOCATION_JUMP_REL 2
-#define VMI_RELOCATION_NOP 3
-
-#ifndef __ASSEMBLY__
-struct vmi_relocation_info {
- unsigned char *eip;
- unsigned char type;
- unsigned char reserved[3];
-};
-#endif
-
-
-/*
- *---------------------------------------------------------------------
- *
- * Generic ROM structures and definitions
- *
- *---------------------------------------------------------------------
- */
-
-#ifndef __ASSEMBLY__
-
-struct vrom_header {
- u16 rom_signature; /* option ROM signature */
- u8 rom_length; /* ROM length in 512 byte chunks */
- u8 rom_entry[4]; /* 16-bit code entry point */
- u8 rom_pad0; /* 4-byte align pad */
- u32 vrom_signature; /* VROM identification signature */
- u8 api_version_min;/* Minor version of API */
- u8 api_version_maj;/* Major version of API */
- u8 jump_slots; /* Number of jump slots */
- u8 reserved1; /* Reserved for expansion */
- u32 virtual_top; /* Hypervisor virtual address start */
- u16 reserved2; /* Reserved for expansion */
- u16 license_offs; /* Offset to License string */
- u16 pci_header_offs;/* Offset to PCI OPROM header */
- u16 pnp_header_offs;/* Offset to PnP OPROM header */
- u32 rom_pad3; /* PnP reserverd / VMI reserved */
- u8 reserved[96]; /* Reserved for headers */
- char vmi_init[8]; /* VMI_Init jump point */
- char get_reloc[8]; /* VMI_GetRelocationInfo jump point */
-} __attribute__((packed));
-
-struct pnp_header {
- char sig[4];
- char rev;
- char size;
- short next;
- short res;
- long devID;
- unsigned short manufacturer_offset;
- unsigned short product_offset;
-} __attribute__((packed));
-
-struct pci_header {
- char sig[4];
- short vendorID;
- short deviceID;
- short vpdData;
- short size;
- char rev;
- char class;
- char subclass;
- char interface;
- short chunks;
- char rom_version_min;
- char rom_version_maj;
- char codetype;
- char lastRom;
- short reserved;
-} __attribute__((packed));
-
-/* Function prototypes for bootstrapping */
-#ifdef CONFIG_VMI
-extern void vmi_init(void);
-extern void vmi_activate(void);
-extern void vmi_bringup(void);
-#else
-static inline void vmi_init(void) {}
-static inline void vmi_activate(void) {}
-static inline void vmi_bringup(void) {}
-#endif
-
-/* State needed to start an application processor in an SMP system. */
-struct vmi_ap_state {
- u32 cr0;
- u32 cr2;
- u32 cr3;
- u32 cr4;
-
- u64 efer;
-
- u32 eip;
- u32 eflags;
- u32 eax;
- u32 ebx;
- u32 ecx;
- u32 edx;
- u32 esp;
- u32 ebp;
- u32 esi;
- u32 edi;
- u16 cs;
- u16 ss;
- u16 ds;
- u16 es;
- u16 fs;
- u16 gs;
- u16 ldtr;
-
- u16 gdtr_limit;
- u32 gdtr_base;
- u32 idtr_base;
- u16 idtr_limit;
-};
-
-#endif
diff --git a/arch/x86/include/asm/vmi_time.h b/arch/x86/include/asm/vmi_time.h
deleted file mode 100644
index c6e0bee..0000000
--- a/arch/x86/include/asm/vmi_time.h
+++ /dev/null
@@ -1,98 +0,0 @@
-/*
- * VMI Time wrappers
- *
- * Copyright (C) 2006, VMware, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT. See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send feedback to [email protected]
- *
- */
-
-#ifndef _ASM_X86_VMI_TIME_H
-#define _ASM_X86_VMI_TIME_H
-
-/*
- * Raw VMI call indices for timer functions
- */
-#define VMI_CALL_GetCycleFrequency 66
-#define VMI_CALL_GetCycleCounter 67
-#define VMI_CALL_SetAlarm 68
-#define VMI_CALL_CancelAlarm 69
-#define VMI_CALL_GetWallclockTime 70
-#define VMI_CALL_WallclockUpdated 71
-
-/* Cached VMI timer operations */
-extern struct vmi_timer_ops {
- u64 (*get_cycle_frequency)(void);
- u64 (*get_cycle_counter)(int);
- u64 (*get_wallclock)(void);
- int (*wallclock_updated)(void);
- void (*set_alarm)(u32 flags, u64 expiry, u64 period);
- void (*cancel_alarm)(u32 flags);
-} vmi_timer_ops;
-
-/* Prototypes */
-extern void __init vmi_time_init(void);
-extern unsigned long vmi_get_wallclock(void);
-extern int vmi_set_wallclock(unsigned long now);
-extern unsigned long long vmi_sched_clock(void);
-extern unsigned long vmi_tsc_khz(void);
-
-#ifdef CONFIG_X86_LOCAL_APIC
-extern void __devinit vmi_time_bsp_init(void);
-extern void __devinit vmi_time_ap_init(void);
-#endif
-
-/*
- * When run under a hypervisor, a vcpu is always in one of three states:
- * running, halted, or ready. The vcpu is in the 'running' state if it
- * is executing. When the vcpu executes the halt interface, the vcpu
- * enters the 'halted' state and remains halted until there is some work
- * pending for the vcpu (e.g. an alarm expires, host I/O completes on
- * behalf of virtual I/O). At this point, the vcpu enters the 'ready'
- * state (waiting for the hypervisor to reschedule it). Finally, at any
- * time when the vcpu is not in the 'running' state nor the 'halted'
- * state, it is in the 'ready' state.
- *
- * Real time is advances while the vcpu is 'running', 'ready', or
- * 'halted'. Stolen time is the time in which the vcpu is in the
- * 'ready' state. Available time is the remaining time -- the vcpu is
- * either 'running' or 'halted'.
- *
- * All three views of time are accessible through the VMI cycle
- * counters.
- */
-
-/* The cycle counters. */
-#define VMI_CYCLES_REAL 0
-#define VMI_CYCLES_AVAILABLE 1
-#define VMI_CYCLES_STOLEN 2
-
-/* The alarm interface 'flags' bits */
-#define VMI_ALARM_COUNTERS 2
-
-#define VMI_ALARM_COUNTER_MASK 0x000000ff
-
-#define VMI_ALARM_WIRED_IRQ0 0x00000000
-#define VMI_ALARM_WIRED_LVTT 0x00010000
-
-#define VMI_ALARM_IS_ONESHOT 0x00000000
-#define VMI_ALARM_IS_PERIODIC 0x00000100
-
-#define CONFIG_VMI_ALARM_HZ 100
-
-#endif /* _ASM_X86_VMI_TIME_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 91d4189..255d6c8 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,7 +92,6 @@ obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o
obj-$(CONFIG_DEBUG_RODATA_TEST) += test_rodata.o
obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o

-obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o
obj-$(CONFIG_KVM_GUEST) += kvm.o
obj-$(CONFIG_KVM_CLOCK) += kvmclock.o
obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f327bcc..f2fefe0 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -82,7 +82,6 @@
#include <asm/dmi.h>
#include <asm/io_apic.h>
#include <asm/ist.h>
-#include <asm/vmi.h>
#include <asm/setup_arch.h>
#include <asm/bios_ebda.h>
#include <asm/cacheflush.h>
@@ -697,9 +696,6 @@ void __init setup_arch(char **cmdline_p)
printk(KERN_INFO "Command line: %s\n", boot_command_line);
#endif

- /* VMI may relocate the fixmap; do this before touching ioremap area */
- vmi_init();
-
early_cpu_init();
early_ioremap_init();

@@ -798,9 +794,6 @@ void __init setup_arch(char **cmdline_p)
check_efer();
#endif

- /* Must be before kernel pagetables are setup */
- vmi_activate();
-
/* after early param, so could get panic from serial */
reserve_early_setup_data();

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index a9ccc17..52b60dd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -60,7 +60,6 @@
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
#include <asm/mtrr.h>
-#include <asm/vmi.h>
#include <asm/apic.h>
#include <asm/setup.h>
#include <asm/uv/uv.h>
@@ -274,7 +273,6 @@ notrace static void __cpuinit start_secondary(void *unused)
* fragile that we want to limit the things done here to the
* most necessary things.
*/
- vmi_bringup();
cpu_init();
preempt_disable();
smp_callin();
@@ -599,13 +597,6 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
num_starts = 0;

/*
- * Paravirt / VMI wants a startup IPI hook here to set up the
- * target processor state.
- */
- startup_ipi_hook(phys_apicid, (unsigned long) start_secondary,
- (unsigned long)stack_start.sp);
-
- /*
* Run STARTUP IPI loop.
*/
pr_debug("#startup loops: %d.\n", num_starts);
diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
deleted file mode 100644
index 31e6f6c..0000000
--- a/arch/x86/kernel/vmi_32.c
+++ /dev/null
@@ -1,913 +0,0 @@
-/*
- * VMI specific paravirt-ops implementation
- *
- * Copyright (C) 2005, VMware, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT. See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send feedback to [email protected]
- *
- */
-
-#include <linux/module.h>
-#include <linux/cpu.h>
-#include <linux/bootmem.h>
-#include <linux/mm.h>
-#include <linux/highmem.h>
-#include <linux/sched.h>
-#include <asm/vmi.h>
-#include <asm/io.h>
-#include <asm/fixmap.h>
-#include <asm/apicdef.h>
-#include <asm/apic.h>
-#include <asm/processor.h>
-#include <asm/timer.h>
-#include <asm/vmi_time.h>
-#include <asm/kmap_types.h>
-#include <asm/setup.h>
-
-/* Convenient for calling VMI functions indirectly in the ROM */
-typedef u32 __attribute__((regparm(1))) (VROMFUNC)(void);
-typedef u64 __attribute__((regparm(2))) (VROMLONGFUNC)(int);
-
-#define call_vrom_func(rom,func) \
- (((VROMFUNC *)(rom->func))())
-
-#define call_vrom_long_func(rom,func,arg) \
- (((VROMLONGFUNC *)(rom->func)) (arg))
-
-static struct vrom_header *vmi_rom;
-static int disable_pge;
-static int disable_pse;
-static int disable_sep;
-static int disable_tsc;
-static int disable_mtrr;
-static int disable_noidle;
-static int disable_vmi_timer;
-
-/* Cached VMI operations */
-static struct {
- void (*cpuid)(void /* non-c */);
- void (*_set_ldt)(u32 selector);
- void (*set_tr)(u32 selector);
- void (*write_idt_entry)(struct desc_struct *, int, u32, u32);
- void (*write_gdt_entry)(struct desc_struct *, int, u32, u32);
- void (*write_ldt_entry)(struct desc_struct *, int, u32, u32);
- void (*set_kernel_stack)(u32 selector, u32 sp0);
- void (*allocate_page)(u32, u32, u32, u32, u32);
- void (*release_page)(u32, u32);
- void (*set_pte)(pte_t, pte_t *, unsigned);
- void (*update_pte)(pte_t *, unsigned);
- void (*set_linear_mapping)(int, void *, u32, u32);
- void (*_flush_tlb)(int);
- void (*set_initial_ap_state)(int, int);
- void (*halt)(void);
- void (*set_lazy_mode)(int mode);
-} vmi_ops;
-
-/* Cached VMI operations */
-struct vmi_timer_ops vmi_timer_ops;
-
-/*
- * VMI patching routines.
- */
-#define MNEM_CALL 0xe8
-#define MNEM_JMP 0xe9
-#define MNEM_RET 0xc3
-
-#define IRQ_PATCH_INT_MASK 0
-#define IRQ_PATCH_DISABLE 5
-
-static inline void patch_offset(void *insnbuf,
- unsigned long ip, unsigned long dest)
-{
- *(unsigned long *)(insnbuf+1) = dest-ip-5;
-}
-
-static unsigned patch_internal(int call, unsigned len, void *insnbuf,
-			       unsigned long ip)
-{
-	u64 reloc;
-	struct vmi_relocation_info *const rel = (struct vmi_relocation_info *)&reloc;
-	reloc = call_vrom_long_func(vmi_rom, get_reloc, call);
-	switch(rel->type) {
-	case VMI_RELOCATION_CALL_REL:
-		BUG_ON(len < 5);
-		*(char *)insnbuf = MNEM_CALL;
-		patch_offset(insnbuf, ip, (unsigned long)rel->eip);
-		return 5;
-
-	case VMI_RELOCATION_JUMP_REL:
-		BUG_ON(len < 5);
-		*(char *)insnbuf = MNEM_JMP;
-		patch_offset(insnbuf, ip, (unsigned long)rel->eip);
-		return 5;
-
-	case VMI_RELOCATION_NOP:
-		/* obliterate the whole thing */
-		return 0;
-
-	case VMI_RELOCATION_NONE:
-		/* leave native code in place */
-		break;
-
-	default:
-		BUG();
-	}
-	return len;
-}
-
-/*
- * Apply patch if appropriate, return length of new instruction
- * sequence. The callee does nop padding for us.
- */
-static unsigned vmi_patch(u8 type, u16 clobbers, void *insns,
- unsigned long ip, unsigned len)
-{
- switch (type) {
- case PARAVIRT_PATCH(pv_irq_ops.irq_disable):
- return patch_internal(VMI_CALL_DisableInterrupts, len,
- insns, ip);
- case PARAVIRT_PATCH(pv_irq_ops.irq_enable):
- return patch_internal(VMI_CALL_EnableInterrupts, len,
- insns, ip);
- case PARAVIRT_PATCH(pv_irq_ops.restore_fl):
- return patch_internal(VMI_CALL_SetInterruptMask, len,
- insns, ip);
- case PARAVIRT_PATCH(pv_irq_ops.save_fl):
- return patch_internal(VMI_CALL_GetInterruptMask, len,
- insns, ip);
- case PARAVIRT_PATCH(pv_cpu_ops.iret):
- return patch_internal(VMI_CALL_IRET, len, insns, ip);
- case PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit):
- return patch_internal(VMI_CALL_SYSEXIT, len, insns, ip);
- default:
- break;
- }
- return len;
-}
-
-/* CPUID has non-C semantics, and paravirt-ops API doesn't match hardware ISA */
-static void vmi_cpuid(unsigned int *ax, unsigned int *bx,
- unsigned int *cx, unsigned int *dx)
-{
- int override = 0;
- if (*ax == 1)
- override = 1;
- asm volatile ("call *%6"
- : "=a" (*ax),
- "=b" (*bx),
- "=c" (*cx),
- "=d" (*dx)
- : "0" (*ax), "2" (*cx), "r" (vmi_ops.cpuid));
- if (override) {
- if (disable_pse)
- *dx &= ~X86_FEATURE_PSE;
- if (disable_pge)
- *dx &= ~X86_FEATURE_PGE;
- if (disable_sep)
- *dx &= ~X86_FEATURE_SEP;
- if (disable_tsc)
- *dx &= ~X86_FEATURE_TSC;
- if (disable_mtrr)
- *dx &= ~X86_FEATURE_MTRR;
- }
-}
-
-static inline void vmi_maybe_load_tls(struct desc_struct *gdt, int nr, struct desc_struct *new)
-{
- if (gdt[nr].a != new->a || gdt[nr].b != new->b)
- write_gdt_entry(gdt, nr, new, 0);
-}
-
-static void vmi_load_tls(struct thread_struct *t, unsigned int cpu)
-{
- struct desc_struct *gdt = get_cpu_gdt_table(cpu);
- vmi_maybe_load_tls(gdt, GDT_ENTRY_TLS_MIN + 0, &t->tls_array[0]);
- vmi_maybe_load_tls(gdt, GDT_ENTRY_TLS_MIN + 1, &t->tls_array[1]);
- vmi_maybe_load_tls(gdt, GDT_ENTRY_TLS_MIN + 2, &t->tls_array[2]);
-}
-
-static void vmi_set_ldt(const void *addr, unsigned entries)
-{
- unsigned cpu = smp_processor_id();
- struct desc_struct desc;
-
- pack_descriptor(&desc, (unsigned long)addr,
- entries * sizeof(struct desc_struct) - 1,
- DESC_LDT, 0);
- write_gdt_entry(get_cpu_gdt_table(cpu), GDT_ENTRY_LDT, &desc, DESC_LDT);
- vmi_ops._set_ldt(entries ? GDT_ENTRY_LDT*sizeof(struct desc_struct) : 0);
-}
-
-static void vmi_set_tr(void)
-{
- vmi_ops.set_tr(GDT_ENTRY_TSS*sizeof(struct desc_struct));
-}
-
-static void vmi_write_idt_entry(gate_desc *dt, int entry, const gate_desc *g)
-{
- u32 *idt_entry = (u32 *)g;
- vmi_ops.write_idt_entry(dt, entry, idt_entry[0], idt_entry[1]);
-}
-
-static void vmi_write_gdt_entry(struct desc_struct *dt, int entry,
- const void *desc, int type)
-{
- u32 *gdt_entry = (u32 *)desc;
- vmi_ops.write_gdt_entry(dt, entry, gdt_entry[0], gdt_entry[1]);
-}
-
-static void vmi_write_ldt_entry(struct desc_struct *dt, int entry,
- const void *desc)
-{
- u32 *ldt_entry = (u32 *)desc;
- vmi_ops.write_ldt_entry(dt, entry, ldt_entry[0], ldt_entry[1]);
-}
-
-static void vmi_load_sp0(struct tss_struct *tss,
- struct thread_struct *thread)
-{
- tss->x86_tss.sp0 = thread->sp0;
-
- /* This can only happen when SEP is enabled, no need to test "SEP"arately */
- if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) {
- tss->x86_tss.ss1 = thread->sysenter_cs;
- wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
- }
- vmi_ops.set_kernel_stack(__KERNEL_DS, tss->x86_tss.sp0);
-}
-
-static void vmi_flush_tlb_user(void)
-{
- vmi_ops._flush_tlb(VMI_FLUSH_TLB);
-}
-
-static void vmi_flush_tlb_kernel(void)
-{
- vmi_ops._flush_tlb(VMI_FLUSH_TLB | VMI_FLUSH_GLOBAL);
-}
-
-/* Stub to do nothing at all; used for delays and unimplemented calls */
-static void vmi_nop(void)
-{
-}
-
-#ifdef CONFIG_HIGHPTE
-static void *vmi_kmap_atomic_pte(struct page *page, enum km_type type)
-{
- void *va = kmap_atomic(page, type);
-
- /*
- * Internally, the VMI ROM must map virtual addresses to physical
- * addresses for processing MMU updates. By the time MMU updates
- * are issued, this information is typically already lost.
- * Fortunately, the VMI provides a cache of mapping slots for active
- * page tables.
- *
- * We use slot zero for the linear mapping of physical memory, and
- * in HIGHPTE kernels, slot 1 and 2 for KM_PTE0 and KM_PTE1.
- *
- * args: SLOT VA COUNT PFN
- */
- BUG_ON(type != KM_PTE0 && type != KM_PTE1);
- vmi_ops.set_linear_mapping((type - KM_PTE0)+1, va, 1, page_to_pfn(page));
-
- return va;
-}
-#endif
-
-static void vmi_allocate_pte(struct mm_struct *mm, unsigned long pfn)
-{
- vmi_ops.allocate_page(pfn, VMI_PAGE_L1, 0, 0, 0);
-}
-
-static void vmi_allocate_pmd(struct mm_struct *mm, unsigned long pfn)
-{
- /*
- * This call comes in very early, before mem_map is setup.
- * It is called only for swapper_pg_dir, which already has
- * data on it.
- */
- vmi_ops.allocate_page(pfn, VMI_PAGE_L2, 0, 0, 0);
-}
-
-static void vmi_allocate_pmd_clone(unsigned long pfn, unsigned long clonepfn, unsigned long start, unsigned long count)
-{
- vmi_ops.allocate_page(pfn, VMI_PAGE_L2 | VMI_PAGE_CLONE, clonepfn, start, count);
-}
-
-static void vmi_release_pte(unsigned long pfn)
-{
- vmi_ops.release_page(pfn, VMI_PAGE_L1);
-}
-
-static void vmi_release_pmd(unsigned long pfn)
-{
- vmi_ops.release_page(pfn, VMI_PAGE_L2);
-}
-
-/*
- * We use the pgd_free hook for releasing the pgd page:
- */
-static void vmi_pgd_free(struct mm_struct *mm, pgd_t *pgd)
-{
- unsigned long pfn = __pa(pgd) >> PAGE_SHIFT;
-
- vmi_ops.release_page(pfn, VMI_PAGE_L2);
-}
-
-/*
- * Helper macros for MMU update flags. We can defer updates until a flush
- * or page invalidation only if the update is to the current address space
- * (otherwise, there is no flush). We must check against init_mm, since
- * this could be a kernel update, which usually passes init_mm, although
- * sometimes this check can be skipped if we know the particular function
- * is only called on user mode PTEs. We could change the kernel to pass
- * current->active_mm here, but in particular, I was unsure if changing
- * mm/highmem.c to do this would still be correct on other architectures.
- */
-#define is_current_as(mm, mustbeuser) ((mm) == current->active_mm || \
- (!mustbeuser && (mm) == &init_mm))
-#define vmi_flags_addr(mm, addr, level, user) \
- ((level) | (is_current_as(mm, user) ? \
- (VMI_PAGE_CURRENT_AS | ((addr) & VMI_PAGE_VA_MASK)) : 0))
-#define vmi_flags_addr_defer(mm, addr, level, user) \
- ((level) | (is_current_as(mm, user) ? \
- (VMI_PAGE_DEFER | VMI_PAGE_CURRENT_AS | ((addr) & VMI_PAGE_VA_MASK)) : 0))
-
-static void vmi_update_pte(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
- vmi_ops.update_pte(ptep, vmi_flags_addr(mm, addr, VMI_PAGE_PT, 0));
-}
-
-static void vmi_update_pte_defer(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
- vmi_ops.update_pte(ptep, vmi_flags_addr_defer(mm, addr, VMI_PAGE_PT, 0));
-}
-
-static void vmi_set_pte(pte_t *ptep, pte_t pte)
-{
- /* XXX because of set_pmd_pte, this can be called on PT or PD layers */
- vmi_ops.set_pte(pte, ptep, VMI_PAGE_PT);
-}
-
-static void vmi_set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
-{
- vmi_ops.set_pte(pte, ptep, vmi_flags_addr(mm, addr, VMI_PAGE_PT, 0));
-}
-
-static void vmi_set_pmd(pmd_t *pmdp, pmd_t pmdval)
-{
-#ifdef CONFIG_X86_PAE
- const pte_t pte = { .pte = pmdval.pmd };
-#else
- const pte_t pte = { pmdval.pud.pgd.pgd };
-#endif
- vmi_ops.set_pte(pte, (pte_t *)pmdp, VMI_PAGE_PD);
-}
-
-#ifdef CONFIG_X86_PAE
-
-static void vmi_set_pte_atomic(pte_t *ptep, pte_t pteval)
-{
- /*
- * XXX This is called from set_pmd_pte, but at both PT
- * and PD layers so the VMI_PAGE_PT flag is wrong. But
- * it is only called for large page mapping changes,
- * the Xen backend, doesn't support large pages, and the
- * ESX backend doesn't depend on the flag.
- */
- set_64bit((unsigned long long *)ptep,pte_val(pteval));
- vmi_ops.update_pte(ptep, VMI_PAGE_PT);
-}
-
-static void vmi_set_pud(pud_t *pudp, pud_t pudval)
-{
- /* Um, eww */
- const pte_t pte = { .pte = pudval.pgd.pgd };
- vmi_ops.set_pte(pte, (pte_t *)pudp, VMI_PAGE_PDP);
-}
-
-static void vmi_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
- const pte_t pte = { .pte = 0 };
- vmi_ops.set_pte(pte, ptep, vmi_flags_addr(mm, addr, VMI_PAGE_PT, 0));
-}
-
-static void vmi_pmd_clear(pmd_t *pmd)
-{
- const pte_t pte = { .pte = 0 };
- vmi_ops.set_pte(pte, (pte_t *)pmd, VMI_PAGE_PD);
-}
-#endif
-
-#ifdef CONFIG_SMP
-static void __devinit
-vmi_startup_ipi_hook(int phys_apicid, unsigned long start_eip,
- unsigned long start_esp)
-{
- struct vmi_ap_state ap;
-
- /* Default everything to zero. This is fine for most GPRs. */
- memset(&ap, 0, sizeof(struct vmi_ap_state));
-
- ap.gdtr_limit = GDT_SIZE - 1;
- ap.gdtr_base = (unsigned long) get_cpu_gdt_table(phys_apicid);
-
- ap.idtr_limit = IDT_ENTRIES * 8 - 1;
- ap.idtr_base = (unsigned long) idt_table;
-
- ap.ldtr = 0;
-
- ap.cs = __KERNEL_CS;
- ap.eip = (unsigned long) start_eip;
- ap.ss = __KERNEL_DS;
- ap.esp = (unsigned long) start_esp;
-
- ap.ds = __USER_DS;
- ap.es = __USER_DS;
- ap.fs = __KERNEL_PERCPU;
- ap.gs = __KERNEL_STACK_CANARY;
-
- ap.eflags = 0;
-
-#ifdef CONFIG_X86_PAE
- /* efer should match BSP efer. */
- if (cpu_has_nx) {
- unsigned l, h;
- rdmsr(MSR_EFER, l, h);
- ap.efer = (unsigned long long) h << 32 | l;
- }
-#endif
-
- ap.cr3 = __pa(swapper_pg_dir);
- /* Protected mode, paging, AM, WP, NE, MP. */
- ap.cr0 = 0x80050023;
- ap.cr4 = mmu_cr4_features;
- vmi_ops.set_initial_ap_state((u32)&ap, phys_apicid);
-}
-#endif
-
-static void vmi_start_context_switch(struct task_struct *prev)
-{
- paravirt_start_context_switch(prev);
- vmi_ops.set_lazy_mode(2);
-}
-
-static void vmi_end_context_switch(struct task_struct *next)
-{
- vmi_ops.set_lazy_mode(0);
- paravirt_end_context_switch(next);
-}
-
-static void vmi_enter_lazy_mmu(void)
-{
- paravirt_enter_lazy_mmu();
- vmi_ops.set_lazy_mode(1);
-}
-
-static void vmi_leave_lazy_mmu(void)
-{
- vmi_ops.set_lazy_mode(0);
- paravirt_leave_lazy_mmu();
-}
-
-static inline int __init check_vmi_rom(struct vrom_header *rom)
-{
- struct pci_header *pci;
- struct pnp_header *pnp;
- const char *manufacturer = "UNKNOWN";
- const char *product = "UNKNOWN";
- const char *license = "unspecified";
-
- if (rom->rom_signature != 0xaa55)
- return 0;
- if (rom->vrom_signature != VMI_SIGNATURE)
- return 0;
- if (rom->api_version_maj != VMI_API_REV_MAJOR ||
- rom->api_version_min+1 < VMI_API_REV_MINOR+1) {
- printk(KERN_WARNING "VMI: Found mismatched rom version %d.%d\n",
- rom->api_version_maj,
- rom->api_version_min);
- return 0;
- }
-
- /*
- * Relying on the VMI_SIGNATURE field is not 100% safe, so check
- * the PCI header and device type to make sure this is really a
- * VMI device.
- */
- if (!rom->pci_header_offs) {
- printk(KERN_WARNING "VMI: ROM does not contain PCI header.\n");
- return 0;
- }
-
- pci = (struct pci_header *)((char *)rom+rom->pci_header_offs);
- if (pci->vendorID != PCI_VENDOR_ID_VMWARE ||
- pci->deviceID != PCI_DEVICE_ID_VMWARE_VMI) {
- /* Allow it to run... anyways, but warn */
- printk(KERN_WARNING "VMI: ROM from unknown manufacturer\n");
- }
-
- if (rom->pnp_header_offs) {
- pnp = (struct pnp_header *)((char *)rom+rom->pnp_header_offs);
- if (pnp->manufacturer_offset)
- manufacturer = (const char *)rom+pnp->manufacturer_offset;
- if (pnp->product_offset)
- product = (const char *)rom+pnp->product_offset;
- }
-
- if (rom->license_offs)
- license = (char *)rom+rom->license_offs;
-
- printk(KERN_INFO "VMI: Found %s %s, API version %d.%d, ROM version %d.%d\n",
- manufacturer, product,
- rom->api_version_maj, rom->api_version_min,
- pci->rom_version_maj, pci->rom_version_min);
-
- /* Don't allow BSD/MIT here for now because we don't want to end up
- with any binary only shim layers */
- if (strcmp(license, "GPL") && strcmp(license, "GPL v2")) {
- printk(KERN_WARNING "VMI: Non GPL license `%s' found for ROM. Not used.\n",
- license);
- return 0;
- }
-
- return 1;
-}
-
-/*
- * Probe for the VMI option ROM
- */
-static inline int __init probe_vmi_rom(void)
-{
- unsigned long base;
-
- /* VMI ROM is in option ROM area, check signature */
- for (base = 0xC0000; base < 0xE0000; base += 2048) {
- struct vrom_header *romstart;
- romstart = (struct vrom_header *)isa_bus_to_virt(base);
- if (check_vmi_rom(romstart)) {
- vmi_rom = romstart;
- return 1;
- }
- }
- return 0;
-}
-
-/*
- * VMI setup common to all processors
- */
-void vmi_bringup(void)
-{
- /* We must establish the lowmem mapping for MMU ops to work */
- if (vmi_ops.set_linear_mapping)
- vmi_ops.set_linear_mapping(0, (void *)__PAGE_OFFSET, MAXMEM_PFN, 0);
-}
-
-/*
- * Return a pointer to a VMI function or NULL if unimplemented
- */
-static void *vmi_get_function(int vmicall)
-{
- u64 reloc;
- const struct vmi_relocation_info *rel = (struct vmi_relocation_info *)&reloc;
- reloc = call_vrom_long_func(vmi_rom, get_reloc, vmicall);
- BUG_ON(rel->type == VMI_RELOCATION_JUMP_REL);
- if (rel->type == VMI_RELOCATION_CALL_REL)
- return (void *)rel->eip;
- else
- return NULL;
-}
-
-/*
- * Helper macro for making the VMI paravirt-ops fill code readable.
- * For unimplemented operations, fall back to default, unless nop
- * is returned by the ROM.
- */
-#define para_fill(opname, vmicall) \
-do { \
-	reloc = call_vrom_long_func(vmi_rom, get_reloc, \
-				    VMI_CALL_##vmicall); \
-	if (rel->type == VMI_RELOCATION_CALL_REL) \
-		opname = (void *)rel->eip; \
-	else if (rel->type == VMI_RELOCATION_NOP) \
-		opname = (void *)vmi_nop; \
-	else if (rel->type != VMI_RELOCATION_NONE) \
-		printk(KERN_WARNING "VMI: Unknown relocation " \
-				    "type %d for " #vmicall"\n",\
-		       rel->type); \
-} while (0)
-
-/*
- * Helper macro for making the VMI paravirt-ops fill code readable.
- * For cached operations which do not match the VMI ROM ABI and must
- * go through a tranlation stub. Ignore NOPs, since it is not clear
- * a NOP * VMI function corresponds to a NOP paravirt-op when the
- * functions are not in 1-1 correspondence.
- */
-#define para_wrap(opname, wrapper, cache, vmicall) \
-do { \
-	reloc = call_vrom_long_func(vmi_rom, get_reloc, \
-				    VMI_CALL_##vmicall); \
-	BUG_ON(rel->type == VMI_RELOCATION_JUMP_REL); \
-	if (rel->type == VMI_RELOCATION_CALL_REL) { \
-		opname = wrapper; \
-		vmi_ops.cache = (void *)rel->eip; \
-	} \
-} while (0)
-
-/*
- * Activate the VMI interface and switch into paravirtualized mode
- */
-static inline int __init activate_vmi(void)
-{
- short kernel_cs;
- u64 reloc;
- const struct vmi_relocation_info *rel = (struct vmi_relocation_info *)&reloc;
-
- if (call_vrom_func(vmi_rom, vmi_init) != 0) {
- printk(KERN_ERR "VMI ROM failed to initialize!");
- return 0;
- }
- savesegment(cs, kernel_cs);
-
- pv_info.paravirt_enabled = 1;
- pv_info.kernel_rpl = kernel_cs & SEGMENT_RPL_MASK;
- pv_info.name = "vmi";
-
- pv_init_ops.patch = vmi_patch;
-
- /*
- * Many of these operations are ABI compatible with VMI.
- * This means we can fill in the paravirt-ops with direct
- * pointers into the VMI ROM. If the calling convention for
- * these operations changes, this code needs to be updated.
- *
- * Exceptions
- * CPUID paravirt-op uses pointers, not the native ISA
- * halt has no VMI equivalent; all VMI halts are "safe"
- * no MSR support yet - just trap and emulate. VMI uses the
- * same ABI as the native ISA, but Linux wants exceptions
- * from bogus MSR read / write handled
- * rdpmc is not yet used in Linux
- */
-
- /* CPUID is special, so very special it gets wrapped like a present */
- para_wrap(pv_cpu_ops.cpuid, vmi_cpuid, cpuid, CPUID);
-
- para_fill(pv_cpu_ops.clts, CLTS);
- para_fill(pv_cpu_ops.get_debugreg, GetDR);
- para_fill(pv_cpu_ops.set_debugreg, SetDR);
- para_fill(pv_cpu_ops.read_cr0, GetCR0);
- para_fill(pv_mmu_ops.read_cr2, GetCR2);
- para_fill(pv_mmu_ops.read_cr3, GetCR3);
- para_fill(pv_cpu_ops.read_cr4, GetCR4);
- para_fill(pv_cpu_ops.write_cr0, SetCR0);
- para_fill(pv_mmu_ops.write_cr2, SetCR2);
- para_fill(pv_mmu_ops.write_cr3, SetCR3);
- para_fill(pv_cpu_ops.write_cr4, SetCR4);
-
- para_fill(pv_irq_ops.save_fl.func, GetInterruptMask);
- para_fill(pv_irq_ops.restore_fl.func, SetInterruptMask);
- para_fill(pv_irq_ops.irq_disable.func, DisableInterrupts);
- para_fill(pv_irq_ops.irq_enable.func, EnableInterrupts);
-
- para_fill(pv_cpu_ops.wbinvd, WBINVD);
- para_fill(pv_cpu_ops.read_tsc, RDTSC);
-
- /* The following we emulate with trap and emulate for now */
- /* paravirt_ops.read_msr = vmi_rdmsr */
- /* paravirt_ops.write_msr = vmi_wrmsr */
- /* paravirt_ops.rdpmc = vmi_rdpmc */
-
- /* TR interface doesn't pass TR value, wrap */
- para_wrap(pv_cpu_ops.load_tr_desc, vmi_set_tr, set_tr, SetTR);
-
- /* LDT is special, too */
- para_wrap(pv_cpu_ops.set_ldt, vmi_set_ldt, _set_ldt, SetLDT);
-
- para_fill(pv_cpu_ops.load_gdt, SetGDT);
- para_fill(pv_cpu_ops.load_idt, SetIDT);
- para_fill(pv_cpu_ops.store_gdt, GetGDT);
- para_fill(pv_cpu_ops.store_idt, GetIDT);
- para_fill(pv_cpu_ops.store_tr, GetTR);
- pv_cpu_ops.load_tls = vmi_load_tls;
- para_wrap(pv_cpu_ops.write_ldt_entry, vmi_write_ldt_entry,
- write_ldt_entry, WriteLDTEntry);
- para_wrap(pv_cpu_ops.write_gdt_entry, vmi_write_gdt_entry,
- write_gdt_entry, WriteGDTEntry);
- para_wrap(pv_cpu_ops.write_idt_entry, vmi_write_idt_entry,
- write_idt_entry, WriteIDTEntry);
- para_wrap(pv_cpu_ops.load_sp0, vmi_load_sp0, set_kernel_stack, UpdateKernelStack);
- para_fill(pv_cpu_ops.set_iopl_mask, SetIOPLMask);
- para_fill(pv_cpu_ops.io_delay, IODelay);
-
- para_wrap(pv_cpu_ops.start_context_switch, vmi_start_context_switch,
- set_lazy_mode, SetLazyMode);
- para_wrap(pv_cpu_ops.end_context_switch, vmi_end_context_switch,
- set_lazy_mode, SetLazyMode);
-
- para_wrap(pv_mmu_ops.lazy_mode.enter, vmi_enter_lazy_mmu,
- set_lazy_mode, SetLazyMode);
- para_wrap(pv_mmu_ops.lazy_mode.leave, vmi_leave_lazy_mmu,
- set_lazy_mode, SetLazyMode);
-
- /* user and kernel flush are just handled with different flags to FlushTLB */
- para_wrap(pv_mmu_ops.flush_tlb_user, vmi_flush_tlb_user, _flush_tlb, FlushTLB);
- para_wrap(pv_mmu_ops.flush_tlb_kernel, vmi_flush_tlb_kernel, _flush_tlb, FlushTLB);
- para_fill(pv_mmu_ops.flush_tlb_single, InvalPage);
-
- /*
- * Until a standard flag format can be agreed on, we need to
- * implement these as wrappers in Linux. Get the VMI ROM
- * function pointers for the two backend calls.
- */
-#ifdef CONFIG_X86_PAE
- vmi_ops.set_pte = vmi_get_function(VMI_CALL_SetPxELong);
- vmi_ops.update_pte = vmi_get_function(VMI_CALL_UpdatePxELong);
-#else
- vmi_ops.set_pte = vmi_get_function(VMI_CALL_SetPxE);
- vmi_ops.update_pte = vmi_get_function(VMI_CALL_UpdatePxE);
-#endif
-
- if (vmi_ops.set_pte) {
- pv_mmu_ops.set_pte = vmi_set_pte;
- pv_mmu_ops.set_pte_at = vmi_set_pte_at;
- pv_mmu_ops.set_pmd = vmi_set_pmd;
-#ifdef CONFIG_X86_PAE
- pv_mmu_ops.set_pte_atomic = vmi_set_pte_atomic;
- pv_mmu_ops.set_pud = vmi_set_pud;
- pv_mmu_ops.pte_clear = vmi_pte_clear;
- pv_mmu_ops.pmd_clear = vmi_pmd_clear;
-#endif
- }
-
- if (vmi_ops.update_pte) {
- pv_mmu_ops.pte_update = vmi_update_pte;
- pv_mmu_ops.pte_update_defer = vmi_update_pte_defer;
- }
-
- vmi_ops.allocate_page = vmi_get_function(VMI_CALL_AllocatePage);
- if (vmi_ops.allocate_page) {
- pv_mmu_ops.alloc_pte = vmi_allocate_pte;
- pv_mmu_ops.alloc_pmd = vmi_allocate_pmd;
- pv_mmu_ops.alloc_pmd_clone = vmi_allocate_pmd_clone;
- }
-
- vmi_ops.release_page = vmi_get_function(VMI_CALL_ReleasePage);
- if (vmi_ops.release_page) {
- pv_mmu_ops.release_pte = vmi_release_pte;
- pv_mmu_ops.release_pmd = vmi_release_pmd;
- pv_mmu_ops.pgd_free = vmi_pgd_free;
- }
-
- /* Set linear is needed in all cases */
- vmi_ops.set_linear_mapping = vmi_get_function(VMI_CALL_SetLinearMapping);
-#ifdef CONFIG_HIGHPTE
- if (vmi_ops.set_linear_mapping)
- pv_mmu_ops.kmap_atomic_pte = vmi_kmap_atomic_pte;
-#endif
-
- /*
- * These MUST always be patched. Don't support indirect jumps
- * through these operations, as the VMI interface may use either
- * a jump or a call to get to these operations, depending on
- * the backend. They are performance critical anyway, so requiring
- * a patch is not a big problem.
- */
- pv_cpu_ops.irq_enable_sysexit = (void *)0xfeedbab0;
- pv_cpu_ops.iret = (void *)0xbadbab0;
-
-#ifdef CONFIG_SMP
- para_wrap(pv_apic_ops.startup_ipi_hook, vmi_startup_ipi_hook, set_initial_ap_state, SetInitialAPState);
-#endif
-
-#ifdef CONFIG_X86_LOCAL_APIC
- para_fill(apic->read, APICRead);
- para_fill(apic->write, APICWrite);
-#endif
-
- /*
- * Check for VMI timer functionality by probing for a cycle frequency method
- */
- reloc = call_vrom_long_func(vmi_rom, get_reloc, VMI_CALL_GetCycleFrequency);
- if (!disable_vmi_timer && rel->type != VMI_RELOCATION_NONE) {
- vmi_timer_ops.get_cycle_frequency = (void *)rel->eip;
- vmi_timer_ops.get_cycle_counter =
- vmi_get_function(VMI_CALL_GetCycleCounter);
- vmi_timer_ops.get_wallclock =
- vmi_get_function(VMI_CALL_GetWallclockTime);
- vmi_timer_ops.wallclock_updated =
- vmi_get_function(VMI_CALL_WallclockUpdated);
- vmi_timer_ops.set_alarm = vmi_get_function(VMI_CALL_SetAlarm);
- vmi_timer_ops.cancel_alarm =
- vmi_get_function(VMI_CALL_CancelAlarm);
- x86_init.timers.timer_init = vmi_time_init;
-#ifdef CONFIG_X86_LOCAL_APIC
- x86_init.timers.setup_percpu_clockev = vmi_time_bsp_init;
- x86_cpuinit.setup_percpu_clockev = vmi_time_ap_init;
-#endif
- pv_time_ops.sched_clock = vmi_sched_clock;
- x86_platform.calibrate_tsc = vmi_tsc_khz;
- x86_platform.get_wallclock = vmi_get_wallclock;
- x86_platform.set_wallclock = vmi_set_wallclock;
-
- /* We have true wallclock functions; disable CMOS clock sync */
- no_sync_cmos_clock = 1;
- } else {
- disable_noidle = 1;
- disable_vmi_timer = 1;
- }
-
- para_fill(pv_irq_ops.safe_halt, Halt);
-
- /*
- * Alternative instruction rewriting doesn't happen soon enough
- * to convert VMI_IRET to a call instead of a jump; so we have
- * to do this before IRQs get reenabled. Fortunately, it is
- * idempotent.
- */
- apply_paravirt(__parainstructions, __parainstructions_end);
-
- vmi_bringup();
-
- return 1;
-}
-
-#undef para_fill
-
-void __init vmi_init(void)
-{
- if (!vmi_rom)
- probe_vmi_rom();
- else
- check_vmi_rom(vmi_rom);
-
- /* In case probing for or validating the ROM failed, basil */
- if (!vmi_rom)
- return;
-
- reserve_top_address(-vmi_rom->virtual_top);
-
-#ifdef CONFIG_X86_IO_APIC
- /* This is virtual hardware; timer routing is wired correctly */
- no_timer_check = 1;
-#endif
-}
-
-void __init vmi_activate(void)
-{
- unsigned long flags;
-
- if (!vmi_rom)
- return;
-
- local_irq_save(flags);
- activate_vmi();
- local_irq_restore(flags & X86_EFLAGS_IF);
-}
-
-static int __init parse_vmi(char *arg)
-{
-	if (!arg)
-		return -EINVAL;
-
-	if (!strcmp(arg, "disable_pge")) {
-		clear_cpu_cap(&boot_cpu_data, X86_FEATURE_PGE);
-		disable_pge = 1;
-	} else if (!strcmp(arg, "disable_pse")) {
-		clear_cpu_cap(&boot_cpu_data, X86_FEATURE_PSE);
-		disable_pse = 1;
-	} else if (!strcmp(arg, "disable_sep")) {
-		clear_cpu_cap(&boot_cpu_data, X86_FEATURE_SEP);
-		disable_sep = 1;
-	} else if (!strcmp(arg, "disable_tsc")) {
-		clear_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC);
-		disable_tsc = 1;
-	} else if (!strcmp(arg, "disable_mtrr")) {
-		clear_cpu_cap(&boot_cpu_data, X86_FEATURE_MTRR);
-		disable_mtrr = 1;
-	} else if (!strcmp(arg, "disable_timer")) {
-		disable_vmi_timer = 1;
-		disable_noidle = 1;
-	} else if (!strcmp(arg, "disable_noidle"))
-		disable_noidle = 1;
-	return 0;
-}
-
-early_param("vmi", parse_vmi);
diff --git a/arch/x86/kernel/vmiclock_32.c b/arch/x86/kernel/vmiclock_32.c
deleted file mode 100644
index 611b9e2..0000000
--- a/arch/x86/kernel/vmiclock_32.c
+++ /dev/null
@@ -1,321 +0,0 @@
-/*
- * VMI paravirtual timer support routines.
- *
- * Copyright (C) 2007, VMware, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT. See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- */
-
-#include <linux/smp.h>
-#include <linux/interrupt.h>
-#include <linux/cpumask.h>
-#include <linux/clocksource.h>
-#include <linux/clockchips.h>
-
-#include <asm/vmi.h>
-#include <asm/vmi_time.h>
-#include <asm/apicdef.h>
-#include <asm/apic.h>
-#include <asm/timer.h>
-#include <asm/i8253.h>
-#include <asm/irq_vectors.h>
-
-#define VMI_ONESHOT (VMI_ALARM_IS_ONESHOT | VMI_CYCLES_REAL | vmi_get_alarm_wiring())
-#define VMI_PERIODIC (VMI_ALARM_IS_PERIODIC | VMI_CYCLES_REAL | vmi_get_alarm_wiring())
-
-static DEFINE_PER_CPU(struct clock_event_device, local_events);
-
-static inline u32 vmi_counter(u32 flags)
-{
- /* Given VMI_ONESHOT or VMI_PERIODIC, return the corresponding
- * cycle counter. */
- return flags & VMI_ALARM_COUNTER_MASK;
-}
-
-/* paravirt_ops.get_wallclock = vmi_get_wallclock */
-unsigned long vmi_get_wallclock(void)
-{
- unsigned long long wallclock;
- wallclock = vmi_timer_ops.get_wallclock(); // nsec
- (void)do_div(wallclock, 1000000000); // sec
-
- return wallclock;
-}
-
-/* paravirt_ops.set_wallclock = vmi_set_wallclock */
-int vmi_set_wallclock(unsigned long now)
-{
- return 0;
-}
-
-/* paravirt_ops.sched_clock = vmi_sched_clock */
-unsigned long long vmi_sched_clock(void)
-{
- return cycles_2_ns(vmi_timer_ops.get_cycle_counter(VMI_CYCLES_AVAILABLE));
-}
-
-/* x86_platform.calibrate_tsc = vmi_tsc_khz */
-unsigned long vmi_tsc_khz(void)
-{
- unsigned long long khz;
- khz = vmi_timer_ops.get_cycle_frequency();
- (void)do_div(khz, 1000);
- return khz;
-}
-
-static inline unsigned int vmi_get_timer_vector(void)
-{
-#ifdef CONFIG_X86_IO_APIC
- return FIRST_DEVICE_VECTOR;
-#else
- return FIRST_EXTERNAL_VECTOR;
-#endif
-}
-
-/** vmi clockchip */
-#ifdef CONFIG_X86_LOCAL_APIC
-static unsigned int startup_timer_irq(unsigned int irq)
-{
- unsigned long val = apic_read(APIC_LVTT);
- apic_write(APIC_LVTT, vmi_get_timer_vector());
-
- return (val & APIC_SEND_PENDING);
-}
-
-static void mask_timer_irq(unsigned int irq)
-{
- unsigned long val = apic_read(APIC_LVTT);
- apic_write(APIC_LVTT, val | APIC_LVT_MASKED);
-}
-
-static void unmask_timer_irq(unsigned int irq)
-{
- unsigned long val = apic_read(APIC_LVTT);
- apic_write(APIC_LVTT, val & ~APIC_LVT_MASKED);
-}
-
-static void ack_timer_irq(unsigned int irq)
-{
- ack_APIC_irq();
-}
-
-static struct irq_chip vmi_chip __read_mostly = {
- .name = "VMI-LOCAL",
- .startup = startup_timer_irq,
- .mask = mask_timer_irq,
- .unmask = unmask_timer_irq,
- .ack = ack_timer_irq
-};
-#endif
-
-/** vmi clockevent */
-#define VMI_ALARM_WIRED_IRQ0 0x00000000
-#define VMI_ALARM_WIRED_LVTT 0x00010000
-static int vmi_wiring = VMI_ALARM_WIRED_IRQ0;
-
-static inline int vmi_get_alarm_wiring(void)
-{
- return vmi_wiring;
-}
-
-static void vmi_timer_set_mode(enum clock_event_mode mode,
-			       struct clock_event_device *evt)
-{
-	cycle_t now, cycles_per_hz;
-	BUG_ON(!irqs_disabled());
-
-	switch (mode) {
-	case CLOCK_EVT_MODE_ONESHOT:
-	case CLOCK_EVT_MODE_RESUME:
-		break;
-	case CLOCK_EVT_MODE_PERIODIC:
-		cycles_per_hz = vmi_timer_ops.get_cycle_frequency();
-		(void)do_div(cycles_per_hz, HZ);
-		now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_PERIODIC));
-		vmi_timer_ops.set_alarm(VMI_PERIODIC, now, cycles_per_hz);
-		break;
-	case CLOCK_EVT_MODE_UNUSED:
-	case CLOCK_EVT_MODE_SHUTDOWN:
-		switch (evt->mode) {
-		case CLOCK_EVT_MODE_ONESHOT:
-			vmi_timer_ops.cancel_alarm(VMI_ONESHOT);
-			break;
-		case CLOCK_EVT_MODE_PERIODIC:
-			vmi_timer_ops.cancel_alarm(VMI_PERIODIC);
-			break;
-		default:
-			break;
-		}
-		break;
-	default:
-		break;
-	}
-}
-
-static int vmi_timer_next_event(unsigned long delta,
- struct clock_event_device *evt)
-{
- /* Unfortunately, set_next_event interface only passes relative
- * expiry, but we want absolute expiry. It'd be better if were
- * were passed an aboslute expiry, since a bunch of time may
- * have been stolen between the time the delta is computed and
- * when we set the alarm below. */
- cycle_t now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_ONESHOT));
-
- BUG_ON(evt->mode != CLOCK_EVT_MODE_ONESHOT);
- vmi_timer_ops.set_alarm(VMI_ONESHOT, now + delta, 0);
- return 0;
-}
-
-static struct clock_event_device vmi_clockevent = {
- .name = "vmi-timer",
- .features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
- .shift = 22,
- .set_mode = vmi_timer_set_mode,
- .set_next_event = vmi_timer_next_event,
- .rating = 1000,
- .irq = 0,
-};
-
-static irqreturn_t vmi_timer_interrupt(int irq, void *dev_id)
-{
- struct clock_event_device *evt = &__get_cpu_var(local_events);
- evt->event_handler(evt);
- return IRQ_HANDLED;
-}
-
-static struct irqaction vmi_clock_action = {
- .name = "vmi-timer",
- .handler = vmi_timer_interrupt,
- .flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_TIMER,
-};
-
-static void __devinit vmi_time_init_clockevent(void)
-{
- cycle_t cycles_per_msec;
- struct clock_event_device *evt;
-
- int cpu = smp_processor_id();
- evt = &__get_cpu_var(local_events);
-
- /* Use cycles_per_msec since div_sc params are 32-bits. */
- cycles_per_msec = vmi_timer_ops.get_cycle_frequency();
- (void)do_div(cycles_per_msec, 1000);
-
- memcpy(evt, &vmi_clockevent, sizeof(*evt));
- /* Must pick .shift such that .mult fits in 32-bits. Choosing
- * .shift to be 22 allows 2^(32-22) cycles per nano-seconds
- * before overflow. */
- evt->mult = div_sc(cycles_per_msec, NSEC_PER_MSEC, evt->shift);
- /* Upper bound is clockevent's use of ulong for cycle deltas. */
- evt->max_delta_ns = clockevent_delta2ns(ULONG_MAX, evt);
- evt->min_delta_ns = clockevent_delta2ns(1, evt);
- evt->cpumask = cpumask_of(cpu);
-
- printk(KERN_WARNING "vmi: registering clock event %s. mult=%lu shift=%u\n",
- evt->name, evt->mult, evt->shift);
- clockevents_register_device(evt);
-}
-
-void __init vmi_time_init(void)
-{
- unsigned int cpu;
- /* Disable PIT: BIOSes start PIT CH0 with 18.2hz peridic. */
- outb_pit(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */
-
- vmi_time_init_clockevent();
- setup_irq(0, &vmi_clock_action);
- for_each_possible_cpu(cpu)
- per_cpu(vector_irq, cpu)[vmi_get_timer_vector()] = 0;
-}
-
-#ifdef CONFIG_X86_LOCAL_APIC
-void __devinit vmi_time_bsp_init(void)
-{
- /*
- * On APIC systems, we want local timers to fire on each cpu. We do
- * this by programming LVTT to deliver timer events to the IRQ handler
- * for IRQ-0, since we can't re-use the APIC local timer handler
- * without interfering with that code.
- */
- clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);
- local_irq_disable();
-#ifdef CONFIG_SMP
- /*
- * XXX handle_percpu_irq only defined for SMP; we need to switch over
- * to using it, since this is a local interrupt, which each CPU must
- * handle individually without locking out or dropping simultaneous
- * local timers on other CPUs. We also don't want to trigger the
- * quirk workaround code for interrupts which gets invoked from
- * handle_percpu_irq via eoi, so we use our own IRQ chip.
- */
- set_irq_chip_and_handler_name(0, &vmi_chip, handle_percpu_irq, "lvtt");
-#else
- set_irq_chip_and_handler_name(0, &vmi_chip, handle_edge_irq, "lvtt");
-#endif
- vmi_wiring = VMI_ALARM_WIRED_LVTT;
- apic_write(APIC_LVTT, vmi_get_timer_vector());
- local_irq_enable();
- clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
-}
-
-void __devinit vmi_time_ap_init(void)
-{
- vmi_time_init_clockevent();
- apic_write(APIC_LVTT, vmi_get_timer_vector());
-}
-#endif
-
-/** vmi clocksource */
-static struct clocksource clocksource_vmi;
-
-static cycle_t read_real_cycles(struct clocksource *cs)
-{
- cycle_t ret = (cycle_t)vmi_timer_ops.get_cycle_counter(VMI_CYCLES_REAL);
- return max(ret, clocksource_vmi.cycle_last);
-}
-
-static struct clocksource clocksource_vmi = {
- .name = "vmi-timer",
- .rating = 450,
- .read = read_real_cycles,
- .mask = CLOCKSOURCE_MASK(64),
- .mult = 0, /* to be set */
- .shift = 22,
- .flags = CLOCK_SOURCE_IS_CONTINUOUS,
-};
-
-static int __init init_vmi_clocksource(void)
-{
- cycle_t cycles_per_msec;
-
- if (!vmi_timer_ops.get_cycle_frequency)
- return 0;
- /* Use khz2mult rather than hz2mult since hz arg is only 32-bits. */
- cycles_per_msec = vmi_timer_ops.get_cycle_frequency();
- (void)do_div(cycles_per_msec, 1000);
-
- /* Note that clocksource.{mult, shift} converts in the opposite direction
- * as clockevents. */
- clocksource_vmi.mult = clocksource_khz2mult(cycles_per_msec,
- clocksource_vmi.shift);
-
- printk(KERN_WARNING "vmi: registering clock source khz=%lld\n", cycles_per_msec);
- return clocksource_register(&clocksource_vmi);
-
-}
-module_init(init_vmi_clocksource);
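
Editorial note on the mult/shift arithmetic in the removed vmiclock_32.c
code above. What follows is a minimal userspace sketch, not the kernel code:
scaled_div() and khz_to_mult() re-implement the arithmetic that div_sc() and
clocksource_khz2mult() perform, and the helper names and the 2.7 GHz counter
frequency are assumptions chosen for illustration (the thread does not say
what frequency the VMI counter reports).

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel's div_sc(): (from << shift) / to.
 * Hypothetical userspace helper, named differently on purpose. */
static uint64_t scaled_div(uint64_t from, uint64_t to, unsigned int shift)
{
    assert(to != 0);
    return (from << shift) / to;
}

/* Clockevent direction: nanoseconds -> device cycles. */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t mult, unsigned int shift)
{
    return (ns * mult) >> shift;
}

/* Same arithmetic as clocksource_khz2mult(): a rounded
 * (1000000 << shift) / khz, i.e. the opposite direction. */
static uint64_t khz_to_mult(uint64_t khz, unsigned int shift)
{
    uint64_t tmp = (uint64_t)1000000 << shift;

    assert(khz != 0);
    tmp += khz / 2; /* round to nearest */
    return tmp / khz;
}

/* Clocksource direction: counter cycles -> nanoseconds. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t mult, unsigned int shift)
{
    return (cycles * mult) >> shift;
}
```

With shift = 22 and an assumed 2,700,000 cycles per millisecond,
scaled_div(2700000, 1000000, 22) gives a clockevent mult of 11324620 and
khz_to_mult(2700000, 22) gives a clocksource mult of 1553446. Both fit in
32 bits, consistent with the comment that a shift of 22 leaves room for up
to 2^(32-22) = 1024 cycles per nanosecond before mult overflows.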

2009-09-20 07:43:32

by Ingo Molnar

Subject: Re: Paravirtualization on VMware's Platform [VMI].


* Alok Kataria <[email protected]> wrote:

> Here is the patch which actually removes the vmi code.
>
> Signed-off-by: Alok N Kataria <[email protected]>
> ---
>
> Documentation/kernel-parameters.txt | 2
> arch/x86/Kconfig | 10
> arch/x86/include/asm/vmi.h | 269 ----------
> arch/x86/include/asm/vmi_time.h | 98 ----
> arch/x86/kernel/Makefile | 1
> arch/x86/kernel/setup.c | 7
> arch/x86/kernel/smpboot.c | 9
> arch/x86/kernel/vmi_32.c | 913 -----------------------------------
> arch/x86/kernel/vmiclock_32.c | 321 ------------
> 9 files changed, 1 insertions(+), 1629 deletions(-)
> delete mode 100644 arch/x86/include/asm/vmi.h
> delete mode 100644 arch/x86/include/asm/vmi_time.h
> delete mode 100644 arch/x86/kernel/vmi_32.c
> delete mode 100644 arch/x86/kernel/vmiclock_32.c

The thing is, the overwhelming majority of vmware users dont benefit
from hardware features like nested page tables yet. So this needs to be
done _way_ more carefully, with a proper sunset period of a couple of
kernel cycles.

This is as if Intel had sent a patch to desupport say Core2
optimizations, now that Nehalem is out.

'Virtual hardware' is no different in this respect: until users benefit
from something we want to keep it, even if the vendor would like to sell
new hardware and would like the new hardware to have an edge over the
installed base.

If we were able to rip out all (or most) of paravirt from arch/x86 it
would be tempting for other technical reasons - but the patch above is
well localized.

Ingo

2009-09-20 07:55:28

by Arjan van de Ven

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On Sun, 20 Sep 2009 09:42:47 +0200
Ingo Molnar <[email protected]> wrote:

> If we were able to rip out all (or most) of paravirt from arch/x86 it
> would be tempting for other technical reasons - but the patch above
> is well localized.

interesting question is if this would allow us to remove a few of the
paravirt hooks....


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-20 09:02:12

by Avi Kivity

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/20/2009 10:52 AM, Arjan van de Ven wrote:
> On Sun, 20 Sep 2009 09:42:47 +0200
> Ingo Molnar<[email protected]> wrote:
>
>
>> If we were able to rip out all (or most) of paravirt from arch/x86 it
>> would be tempting for other technical reasons - but the patch above
>> is well localized.
>>
> interesting question is if this would allow us to remove a few of the
> paravirt hooks....
>

kvm will be removing the pvmmu support soon; and Xen is talking about
running paravirtualized guests in a vmx/svm container where they don't
need most of the hooks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2009-09-20 15:49:16

by Jeremy Fitzhardinge

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/20/09 02:00, Avi Kivity wrote:
> On 09/20/2009 10:52 AM, Arjan van de Ven wrote:
>> On Sun, 20 Sep 2009 09:42:47 +0200
>> Ingo Molnar<[email protected]> wrote:
>>
>>
>>> If we were able to rip out all (or most) of paravirt from arch/x86 it
>>> would be tempting for other technical reasons - but the patch above
>>> is well localized.
>>>
>> interesting question is if this would allow us to remove a few of the
>> paravirt hooks....
>>
>
> kvm will be removing the pvmmu support soon; and Xen is talking about
> running paravirtualized guests in a vmx/svm container where they don't
> need most of the hooks.
>

We have no plans to drop support for non-vmx/svm capable processors, let
alone require ept/npt.

J

2009-09-20 19:02:44

by Avi Kivity

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/20/2009 06:49 PM, Jeremy Fitzhardinge wrote:
>> kvm will be removing the pvmmu support soon; and Xen is talking about
>> running paravirtualized guests in a vmx/svm container where they don't
>> need most of the hooks.
>>
>>
> We have no plans to drop support for non-vmx/svm capable processors, let
> alone require ept/npt.
>

Today, certainly; similarly kvm will keep host-side pvmmu support for a
while to support live migration from older hosts.

But in a few years it may make sense to run everything in a vmx/svm
container even for Xen; we can then drop x86 pv_ops for good.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2009-09-22 07:22:37

by Rusty Russell

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On Sun, 20 Sep 2009 06:30:21 pm Avi Kivity wrote:
> On 09/20/2009 10:52 AM, Arjan van de Ven wrote:
> > On Sun, 20 Sep 2009 09:42:47 +0200
> > Ingo Molnar<[email protected]> wrote:
> >
> >
> >> If we were able to rip out all (or most) of paravirt from arch/x86 it
> >> would be tempting for other technical reasons - but the patch above
> >> is well localized.
> >>
> > interesting question is if this would allow us to remove a few of the
> > paravirt hooks....
> >
>
> kvm will be removing the pvmmu support soon; and Xen is talking about
> running paravirtualized guests in a vmx/svm container where they don't
> need most of the hooks.

When they're all gone, even I don't think lguest is sufficient excuse
to keep CONFIG_PARAVIRT. Oh well. But that will probably be a while.

Cheers,
Rusty.

2009-09-22 08:10:17

by Ingo Molnar

Subject: Re: Paravirtualization on VMware's Platform [VMI].


* Jeremy Fitzhardinge <[email protected]> wrote:

> On 09/20/09 02:00, Avi Kivity wrote:
> > On 09/20/2009 10:52 AM, Arjan van de Ven wrote:
> >> On Sun, 20 Sep 2009 09:42:47 +0200
> >> Ingo Molnar<[email protected]> wrote:
> >>
> >>
> >>> If we were able to rip out all (or most) of paravirt from arch/x86 it
> >>> would be tempting for other technical reasons - but the patch above
> >>> is well localized.
> >>>
> >> interesting question is if this would allow us to remove a few of the
> >> paravirt hooks....
> >>
> >
> > kvm will be removing the pvmmu support soon; and Xen is talking about
> > running paravirtualized guests in a vmx/svm container where they don't
> > need most of the hooks.
>
> We have no plans to drop support for non-vmx/svm capable processors,
> let alone require ept/npt.

But, just to map out our plans for the future, do you concur with the
statements and numbers offered here by the VMware and KVM folks that
on sufficiently recent hardware, hardware-assisted virtualization
outperforms paravirt_ops in many (most?) workloads?

I.e. paravirt_ops becomes a legacy hardware thing, not a core component
of the design of arch/x86/.

(with a long obsoletion period, of course.)

Thanks,

Ingo

2009-09-22 16:52:30

by Jeremy Fitzhardinge

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/22/09 01:09, Ingo Molnar wrote:
>>> kvm will be removing the pvmmu support soon; and Xen is talking about
>>> running paravirtualized guests in a vmx/svm container where they don't
>>> need most of the hooks.
>>>
>> We have no plans to drop support for non-vmx/svm capable processors,
>> let alone require ept/npt.
>>
> But, just to map out our plans for the future, do you concur with the
> statements and numbers offered here by the VMware and KVM folks that
> on sufficiently recent hardware, hardware-assisted virtualization
> outperforms paravirt_ops in many (most?) workloads?
>

Well, what Avi is referring to here is some discussions about a hybrid
paravirtualized mode, in which Xen runs a normal Xen PV guest within a
hardware container in order to get some immediate optimisations, and
allow further optimisations like using hardware assisted paging extensions.

For KVM and VMI, which always use a shadow pagetable scheme, hardware
paging is now unambiguously better than shadow pagetables, but for Xen
PV guests the picture is mixed since they don't use shadow pagetables.
The NPT/EPT extensions make updating the pagetable more efficient, but
actual access is more expensive because of the higher load on the TLB
and the increased expense of a TLB miss, so the actual performance
effects are very workload dependent.

> I.e. paravirt_ops becomes a legacy hardware thing, not a core component
> of the design of arch/x86/.
>
> (with a long obsoletion period, of course.)
>

I expect we'll eventually get to the point that the performance delta
and the installed userbase will no longer justify the effort in
maintaining the full set of pvops hooks. But I don't have a good
feeling for when that might be.

J
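
Editorial note on the "increased expense of a TLB miss" mentioned in the
message above. A rough way to see it is to count worst-case memory
references for one miss when every guest page-table pointer must itself be
translated through the host's nested tables. The sketch below assumes no
paging-structure caches, no nested TLB and no large pages, so real hardware
does considerably better; the function names are hypothetical, and the
level counts used as examples (3-level PAE guest, 4-level host) are
assumptions based on the vmipae guest kernel named in the original post.

```c
#include <assert.h>

/* Worst-case memory references for one TLB miss on bare hardware:
 * one read per page-table level. */
static unsigned int native_walk_refs(unsigned int levels)
{
    assert(levels > 0);
    return levels;
}

/* Worst-case references under nested paging (NPT/EPT):
 * - each of the guest_levels table pointers, plus the final guest
 *   physical address, needs a full host walk of host_levels reads;
 * - plus guest_levels reads of the guest entries themselves. */
static unsigned int nested_walk_refs(unsigned int guest_levels,
                                     unsigned int host_levels)
{
    assert(guest_levels > 0 && host_levels > 0);
    return (guest_levels + 1) * host_levels + guest_levels;
}
```

Under these assumptions a 4-level guest on a 4-level host needs up to 24
references per miss instead of 4, and a 3-level PAE guest on a 4-level
host up to 19 instead of 3. That is the cost side of the trade-off; the
benefit side is that pagetable updates no longer trap to the hypervisor.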

2009-09-22 16:53:57

by Jeremy Fitzhardinge

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/22/09 00:22, Rusty Russell wrote:
> When they're all gone, even I don't think lguest is sufficient excuse
> to keep CONFIG_PARAVIRT. Oh well. But that will probably be a while.
>

/Solidarność/!

J

2009-09-22 18:03:54

by Ingo Molnar

Subject: Re: Paravirtualization on VMware's Platform [VMI].


* Jeremy Fitzhardinge <[email protected]> wrote:

> On 09/22/09 01:09, Ingo Molnar wrote:
> >>> kvm will be removing the pvmmu support soon; and Xen is talking about
> >>> running paravirtualized guests in a vmx/svm container where they don't
> >>> need most of the hooks.
> >>>
> >> We have no plans to drop support for non-vmx/svm capable processors,
> >> let alone require ept/npt.
> >
> > But, just to map out our plans for the future, do you concur with
> > the statements and numbers offered here by the VMware and KVM folks
> > that on sufficiently recent hardware, hardware-assisted
> > virtualization outperforms paravirt_ops in many (most?) workloads?
>
> Well, what Avi is referring to here is some discussions about a hybrid
> paravirtualized mode, in which Xen runs a normal Xen PV guest within a
> hardware container in order to get some immediate optimisations, and
> allow further optimisations like using hardware assisted paging
> extensions.
>
> For KVM and VMI, which always use a shadow pagetable scheme, hardware
> paging is now unambiguously better than shadow pagetables, but for Xen
> PV guests the picture is mixed since they don't use shadow pagetables.
> The NPT/EPT extensions make updating the pagetable more efficient, but
> actual access is more expensive because of the higher load on the TLB
> and the increased expense of a TLB miss, so the actual performance
> effects are very workload dependent.

obviously they are workload dependent - that's why numbers were posted
in this thread with various workloads. Do you concur with those
conclusions that they are generally a speedup over paravirt? If not,
which are the workloads where paravirt offers significant speedup over
hardware acceleration?

Ingo

2009-09-22 18:16:22

by Jeremy Fitzhardinge

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/22/09 11:02, Ingo Molnar wrote:
> obviously they are workload dependent - that's why numbers were posted
> in this thread with various workloads. Do you concur with those
> conclusions that they are generally a speedup over paravirt? If not,
> which are the workloads where paravirt offers significant speedup over
> hardware acceleration?
>

We're not in a position to do any useful measurements yet.

J

2009-09-22 19:06:18

by Ingo Molnar

Subject: Re: Paravirtualization on VMware's Platform [VMI].


* Jeremy Fitzhardinge <[email protected]> wrote:

> On 09/22/09 11:02, Ingo Molnar wrote:
>
> > obviously they are workload dependent - that's why numbers were
> > posted in this thread with various workloads. Do you concur with
> > those conclusions that they are generally a speedup over paravirt?
> > If not, which are the workloads where paravirt offers significant
> > speedup over hardware acceleration?
>
> We're not in a position to do any useful measurements yet.

Sorry for being dense, but what does that mean precisely? No available
hardware? Xen doesnt run?

Ingo

2009-09-22 19:30:38

by Jeremy Fitzhardinge

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/22/09 12:04, Ingo Molnar wrote:
> Sorry for being dense, but what does that mean precisely? No available
> hardware? Xen doesnt run?

Nobody has implemented hybrid PV mode yet, so we haven't got anything to
measure.

Also, I don't think there have been very many measurements of Linux HVM
(full virtualization) Xen guests, because Linux is typically run
paravirtualized and HVM support is primarily tuned for Windows guests.

J

2009-09-22 19:30:46

by Alok Kataria

Subject: Re: Paravirtualization on VMware's Platform [VMI].

Hi Ingo,

On Sun, 2009-09-20 at 00:42 -0700, Ingo Molnar wrote:

>
> The thing is, the overwhelming majority of vmware users dont benefit
> from hardware features like nested page tables yet. So this needs to be
> done _way_ more carefully, with a proper sunset period of a couple of
> kernel cycles.

I am fine with that too. Below is a patch which adds notes in
feature-removal-schedule.txt, I have marked it for removal from 2.6.34.
Please consider this patch for 2.6.32.

> If we were able to rip out all (or most) of paravirt from arch/x86 it
> would be tempting for other technical reasons - but the patch above is
> well localized.

We can certainly look at removing some paravirt-hooks which are only
used by VMI. Not sure if there are any but will take a look when we
actually remove VMI.

Thanks,
Alok

--

Mark VMI for deprecation in feature-removal-schedule.txt.

From: Alok N Kataria <[email protected]>

Add text in feature-removal.txt and also modify Kconfig to disable
vmi by default.
Patch on top of tip/master.

Details about VMware's plan about retiring VMI can be found here
http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html

---

Documentation/feature-removal-schedule.txt | 24 ++++++++++++++++++++++++
arch/x86/Kconfig | 8 +++++---
2 files changed, 29 insertions(+), 3 deletions(-)


diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index fa75220..b985328 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -459,3 +459,27 @@ Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
will also allow making ALSA OSS emulation independent of
sound_core. The dependency will be broken then too.
Who: Tejun Heo <[email protected]>
+
+----------------------------
+
+What: Support for VMware's guest paravirtualization technique [VMI] will be
+ dropped.
+When: 2.6.34
+Why: With the recent innovations in CPU hardware acceleration technologies
+ from Intel and AMD, VMware ran a few experiments to compare these
+ techniques to guest paravirtualization technique on VMware's platform.
+ These hardware assisted virtualization techniques have outperformed the
+ performance benefits provided by VMI in most of the workloads. VMware
+ expects that these hardware features will be ubiquitous in a couple of
+ years, as a result, VMware has started a phased retirement of this
+ feature from the hypervisor. We will be removing this feature from the
+ Kernel too, in a couple of releases.
+ Please note that VMI has always been an optimization and non-VMI kernels
+ still work fine on VMware's platform.
+
+ For more details about VMI retirement take a look at this,
+ http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
+
+Who: Alok N Kataria <[email protected]>
+
+----------------------------
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e214f45..1f3e156 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -485,14 +485,16 @@ if PARAVIRT_GUEST
source "arch/x86/xen/Kconfig"

config VMI
- bool "VMI Guest support"
- select PARAVIRT
- depends on X86_32
+ bool "VMI Guest support [will be deprecated soon]"
+ default n
+ depends on X86_32 && PARAVIRT
---help---
VMI provides a paravirtualized interface to the VMware ESX server
(it could be used by other hypervisors in theory too, but is not
at the moment), by linking the kernel to a GPL-ed ROM module
provided by the hypervisor.
+ VMware has started a phased retirement of this feature from their
+ products. Please see feature-removal-schedule.txt for details.

config KVM_CLOCK
bool "KVM paravirtualized clock"

2009-09-22 19:47:44

by Jeremy Fitzhardinge

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/22/09 12:30, Alok Kataria wrote:
> We can certainly look at removing some paravirt-hooks which are only
> used by VMI. Not sure if there are any but will take a look when we
> actually remove VMI.
>

There are a couple:

* pte_update_defer
* alloc_pmd_clone

lguest appears to still use pte_update(), but I suspect its two
callsites could be recast in the form of other existing pvops.

J

2009-09-22 21:29:36

by H. Peter Anvin

Subject: Re: Paravirtualization on VMware's Platform [VMI].

Alok Kataria wrote:
> Hi Ingo,
>
> On Sun, 2009-09-20 at 00:42 -0700, Ingo Molnar wrote:
>
>> The thing is, the overwhelming majority of vmware users dont benefit
>> from hardware features like nested page tables yet. So this needs to be
>> done _way_ more carefully, with a proper sunset period of a couple of
>> kernel cycles.
>
> I am fine with that too. Below is a patch which adds notes in
> feature-removal-schedule.txt, I have marked it for removal from 2.6.34.
> Please consider this patch for 2.6.32.
>

This seems way, way too early still.

-hpa

2009-09-22 21:54:33

by Alok Kataria

Subject: Re: Paravirtualization on VMware's Platform [VMI].


On Tue, 2009-09-22 at 14:27 -0700, H. Peter Anvin wrote:
> Alok Kataria wrote:
> > Hi Ingo,
> >
> > On Sun, 2009-09-20 at 00:42 -0700, Ingo Molnar wrote:
> >
> >> The thing is, the overwhelming majority of vmware users dont benefit
> >> from hardware features like nested page tables yet. So this needs to be
> >> done _way_ more carefully, with a proper sunset period of a couple of
> >> kernel cycles.
> >
> > I am fine with that too. Below is a patch which adds notes in
> > feature-removal-schedule.txt, I have marked it for removal from 2.6.34.
> > Please consider this patch for 2.6.32.
> >
>
> This seems way, way too early still.

What do you suggest would be the right time ?

Please note that the next major release of VMware's product will not
have this supported. Also that, most of our customers will actually be
running some distro's enterprise release, rather than running the
cutting edge kernel. So IMO there is still a window of around 1-1.5
years, until a customer actually sees a kernel which has dropped VMI
support.

Thanks,
Alok

2009-09-22 23:00:44

by H. Peter Anvin

Subject: Re: Paravirtualization on VMware's Platform [VMI].

Alok Kataria wrote:
>
> What do you suggest would be the right time ?
>
> Please note that the next major release of VMware's product will not
> have this supported. Also that, most of our customers will actually be
> running some distro's enterprise release, rather than running the
> cutting edge kernel. So IMO there is still a window of around 1-1.5
> years, until a customer actually sees a kernel which has dropped VMI
> support.
>

I would say it might make sense pulling it out around the end of 2010,
which would be about 6 kernel releases from now -- 2.6.37.

-hpa

2009-09-23 07:33:20

by Gerd Hoffmann

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/22/09 21:30, Alok Kataria wrote:
> Hi Ingo,
>
> On Sun, 2009-09-20 at 00:42 -0700, Ingo Molnar wrote:
>
>>
>> The thing is, the overwhelming majority of vmware users dont benefit
>> from hardware features like nested page tables yet. So this needs to be
>> done _way_ more carefully, with a proper sunset period of a couple of
>> kernel cycles.
>
> I am fine with that too. Below is a patch which adds notes in
> feature-removal-schedule.txt, I have marked it for removal from 2.6.34.
> Please consider this patch for 2.6.32.

Hmm. Given that you are talking about vmi not being supported any more
in *future* products, there is a huge installed base with vmi support
available, right? I don't think we should zap the code that quickly.

> config VMI
> - bool "VMI Guest support"
> - select PARAVIRT
> - depends on X86_32
> + bool "VMI Guest support [will be deprecated soon]"
> + default n
> + depends on X86_32 && PARAVIRT
> ---help---
> VMI provides a paravirtualized interface to the VMware ESX server
> (it could be used by other hypervisors in theory too, but is not
> at the moment), by linking the kernel to a GPL-ed ROM module
> provided by the hypervisor.
> + VMware has started a phased retirement of this feature from their
> + products. Please see feature-removal-schedule.txt for details.

How about adding version numbers here? i.e. latest versions with vmi
support are workstation x.y, ...

So people can easily figure whenever it makes sense to turn this on for
their environment.

cheers
Gerd

2009-09-29 00:45:06

by Alok Kataria

Subject: Re: Paravirtualization on VMware's Platform [VMI].


On Wed, 2009-09-23 at 00:29 -0700, Gerd Hoffmann wrote:
> On 09/22/09 21:30, Alok Kataria wrote:
> > Hi Ingo,
> >
> > On Sun, 2009-09-20 at 00:42 -0700, Ingo Molnar wrote:
> >
> >>
> >> The thing is, the overwhelming majority of vmware users dont benefit
> >> from hardware features like nested page tables yet. So this needs to be
> >> done _way_ more carefully, with a proper sunset period of a couple of
> >> kernel cycles.
> >
> > I am fine with that too. Below is a patch which adds notes in
> > feature-removal-schedule.txt, I have marked it for removal from 2.6.34.
> > Please consider this patch for 2.6.32.
>
> Hmm. Given that you are talking about vmi not being supported any more
> in *future* products, there is a huge installed base with vmi support
> available, right? I don't think we should zap the code that quickly.

Yep, hpa too raised the same issue, I spoke to him during LPC and we
decided that 2.6.37 will be the right time frame for removal of this
code.
For now I have just added some text in the feature-removal file and
disabled VMI by default in the Kconfig. That needs to be done because
"Live Migration" of a VMI-enabled VM to future products which don't
support VMI will not work, so it's important that newer distros keep
this disabled if they want seamless migration.

>
> > config VMI
> > - bool "VMI Guest support"
> > - select PARAVIRT
> > - depends on X86_32
> > + bool "VMI Guest support [will be deprecated soon]"
> > + default n
> > + depends on X86_32 && PARAVIRT
> > ---help---
> > VMI provides a paravirtualized interface to the VMware ESX server
> > (it could be used by other hypervisors in theory too, but is not
> > at the moment), by linking the kernel to a GPL-ed ROM module
> > provided by the hypervisor.
> > + VMware has started a phased retirement of this feature from their
> > + products. Please see feature-removal-schedule.txt for details.
>
> How about adding version numbers here? i.e. latest versions with vmi
> support are workstation x.y, ...
>
> So people can easily figure whenever it makes sense to turn this on for
> their environment.

Okay, have added that text too.

Thanks for your comments.

Ingo/hpa, please consider the patch below for tip.

--

Mark VMI for deprecation in feature-removal-schedule.txt.

From: Alok N Kataria <[email protected]>

Add text in feature-removal.txt and also modify Kconfig to disable
vmi by default.
---

Documentation/feature-removal-schedule.txt | 30 ++++++++++++++++++++++++++++
arch/x86/Kconfig | 13 +++++++++---
2 files changed, 40 insertions(+), 3 deletions(-)


diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index fa75220..0271f37 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -459,3 +459,33 @@ Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
will also allow making ALSA OSS emulation independent of
sound_core. The dependency will be broken then too.
Who: Tejun Heo <[email protected]>
+
+----------------------------
+
+What: Support for VMware's guest paravirtualization technique [VMI] will be
+ dropped.
+When: 2.6.37 or earlier.
+Why: With the recent innovations in CPU hardware acceleration technologies
+ from Intel and AMD, VMware ran a few experiments to compare these
+ techniques to guest paravirtualization technique on VMware's platform.
+ These hardware assisted virtualization techniques have outperformed the
+ performance benefits provided by VMI in most of the workloads. VMware
+ expects that these hardware features will be ubiquitous in a couple of
+ years, as a result, VMware has started a phased retirement of this
+ feature from the hypervisor. We will be removing this feature from the
+ Kernel too. Right now we are targeting 2.6.37 but can retire earlier if
+ technical reasons ( read opportunity to remove major chunk of pvops)
+ arise.
+
+ Please note that VMI has always been an optimization and non-VMI kernels
+ still work fine on VMware's platform.
+ Latest versions of VMware's product which support VMI are,
+ Workstation 7.0 and VSphere 4.0 on ESX side, future maintenance
+ releases for these products will continue supporting VMI.
+
+ For more details about VMI retirement take a look at this,
+ http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
+
+Who: Alok N Kataria <[email protected]>
+
+----------------------------
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e214f45..84fd47c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -485,14 +485,21 @@ if PARAVIRT_GUEST
source "arch/x86/xen/Kconfig"

config VMI
- bool "VMI Guest support"
- select PARAVIRT
- depends on X86_32
+ bool "VMI Guest support [will be deprecated soon]"
+ default n
+ depends on X86_32 && PARAVIRT
---help---
VMI provides a paravirtualized interface to the VMware ESX server
(it could be used by other hypervisors in theory too, but is not
at the moment), by linking the kernel to a GPL-ed ROM module
provided by the hypervisor.
+ As of September 2009, VMware has started a phased retirement of this
+ feature from VMware's products. Please see
+ feature-removal-schedule.txt for details.
+ If you are planning to enable this option, please note that you
+ cannot live migrate a VMI enabled VM to a future VMware product,
+ which doesn't support VMI. So if you expect your kernel to seamlessly
+ migrate to newer VMware products, keep this disabled.

config KVM_CLOCK
bool "KVM paravirtualized clock"

2009-09-29 02:31:05

by H. Peter Anvin

Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/28/2009 05:45 PM, Alok Kataria wrote:
> + bool "VMI Guest support [will be deprecated soon]"
> + default n

This is incorrect use of the word "deprecated"... it's *already*
deprecated (a word which pretty much means the opposite of "recommended".)

As far as "default n" is concerned... this is usually not necessary; "n"
is the default unless anything else is specified.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-09-29 03:00:35

by Alok Kataria

Subject: Re: Paravirtualization on VMware's Platform [VMI].


On Mon, 2009-09-28 at 19:25 -0700, H. Peter Anvin wrote:
> On 09/28/2009 05:45 PM, Alok Kataria wrote:
> > + bool "VMI Guest support [will be deprecated soon]"
> > + default n
>
> This is incorrect use of the word "deprecated"... it's *already*
> deprecated (a word which pretty much means the opposite of "recommended".)
>
> As far as "default n" is concerned... this is usually not necessary; "n"
> is the default unless anything else is specified.

How about this ? Thanks.

--
Mark VMI for removal in feature-removal-schedule.txt.

From: Alok N Kataria <[email protected]>

Add text in feature-removal.txt and also modify Kconfig to disable
vmi by default.

---

Documentation/feature-removal-schedule.txt | 30 ++++++++++++++++++++++++++++
arch/x86/Kconfig | 12 ++++++++---
2 files changed, 39 insertions(+), 3 deletions(-)


diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 89a47b5..d24c1af 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -451,3 +451,33 @@ Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
will also allow making ALSA OSS emulation independent of
sound_core. The dependency will be broken then too.
Who: Tejun Heo <[email protected]>
+
+----------------------------
+
+What: Support for VMware's guest paravirtualization technique [VMI] will be
+ dropped.
+When: 2.6.37 or earlier.
+Why: With the recent innovations in CPU hardware acceleration technologies
+ from Intel and AMD, VMware ran a few experiments to compare these
+ techniques to guest paravirtualization technique on VMware's platform.
+ These hardware assisted virtualization techniques have outperformed the
+ performance benefits provided by VMI in most of the workloads. VMware
+ expects that these hardware features will be ubiquitous in a couple of
+ years, as a result, VMware has started a phased retirement of this
+ feature from the hypervisor. We will be removing this feature from the
+ Kernel too. Right now we are targeting 2.6.37 but can retire earlier if
+ technical reasons ( read opportunity to remove major chunk of pvops)
+ arise.
+
+ Please note that VMI has always been an optimization and non-VMI kernels
+ still work fine on VMware's platform.
+ Latest versions of VMware's product which support VMI are,
+ Workstation 7.0 and vSphere 4.0 on ESX side, future maintenance
+ releases for these products will continue supporting VMI.
+
+ For more details about VMI retirement take a look at this,
+ http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
+
+Who: Alok N Kataria <[email protected]>
+
+----------------------------
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f777aaf..44c1660 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -496,14 +496,20 @@ if PARAVIRT_GUEST
source "arch/x86/xen/Kconfig"

config VMI
- bool "VMI Guest support"
- select PARAVIRT
- depends on X86_32
+ bool "VMI Guest support [deprecated]"
+ depends on X86_32 && PARAVIRT
---help---
VMI provides a paravirtualized interface to the VMware ESX server
(it could be used by other hypervisors in theory too, but is not
at the moment), by linking the kernel to a GPL-ed ROM module
provided by the hypervisor.
+ As of September 2009, VMware has started a phased retirement of this
+ feature from VMware's products. Please see
+ feature-removal-schedule.txt for details.
+ If you are planning to enable this option, please note that you
+ cannot live migrate a VMI enabled VM to a future VMware product,
+ which doesn't support VMI. So if you expect your kernel to seamlessly
+ migrate to newer VMware products, keep this disabled.

config KVM_CLOCK
bool "KVM paravirtualized clock"


2009-09-29 08:12:05

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].

> For now I have just added some text in the feature-removal file and
> disabled VMI by default in the Kconfig, the reason that needs to be
> done is because "Live Migration" of a VMI enabled VM to future
> products which don't support VMI will not work, so its important that
> newer distros keep this disabled, if they want seamless migration
> that is.

btw the "default" in KConfig tends to be totally ignored by distro
kernel maintainers... please don't assume that just because some default
is set in KConfig it has ANY impact on what shows up in distributions.

2009-09-29 09:03:24

by Chris Wright

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].

* Alok Kataria ([email protected]) wrote:
>
> On Mon, 2009-09-28 at 19:25 -0700, H. Peter Anvin wrote:
> > On 09/28/2009 05:45 PM, Alok Kataria wrote:
> > > + bool "VMI Guest support [will be deprecated soon]"
> > > + default n
> >
> > This is incorrect use of the word "deprecated"... it's *already*
> > deprecated (a word which pretty much means the opposite of "recommended".)
> >
> > As far as "default n" is concerned... this is usually not necessary; "n"
> > is the default unless anything else is specified.
>
> How about this ? Thanks.

Looks good to me (missing Signed-off-by). I think it's also useful
to generate some runtime noise saying it's a deprecated option.

Even something as simple as:

- pv_info.name = "vmi";
+ pv_info.name = "vmi [deprecated]";

2009-09-29 16:49:47

by Alok Kataria

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].


On Tue, 2009-09-29 at 01:08 -0700, Arjan van de Ven wrote:
> > For now I have just added some text in the feature-removal file and
> > disabled VMI by default in the Kconfig, the reason that needs to be
> > done is because "Live Migration" of a VMI enabled VM to future
> > products which don't support VMI will not work, so its important that
> > newer distros keep this disabled, if they want seamless migration
> > that is.
>
> btw the "default" in KConfig tends to be totally ignored by distro
> kernel maintainers... please don't assume that just because some default
> is set in KConfig it has ANY impact on what shows up in distributions.

So, are you saying that we should be doing something else along with
toggling it off in the Kconfig?
We have already informed most of the distro folks about this deprecation
so I think we should be okay there, but if there is something else that
should be done, do let me know.

Thanks,
Alok


2009-09-29 16:57:54

by H. Peter Anvin

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/29/2009 09:49 AM, Alok Kataria wrote:
>
> So, are you saying that we should be doing something else along with
> toggling it off in the Kconfig ?
> We have already informed most of the distro folks about this deprecation
> so I think we should be okay there, but if there is something else that
> should be done, do let me know.
>

I don't see it ever having been anything other than off by default in Kconfig.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-09-29 17:25:22

by Alok Kataria

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].


On Tue, 2009-09-29 at 02:01 -0700, Chris Wright wrote:
> * Alok Kataria ([email protected]) wrote:
> >
> > On Mon, 2009-09-28 at 19:25 -0700, H. Peter Anvin wrote:
> > > On 09/28/2009 05:45 PM, Alok Kataria wrote:
> > > > + bool "VMI Guest support [will be deprecated soon]"
> > > > + default n
> > >
> > > This is incorrect use of the word "deprecated"... it's *already*
> > > deprecated (a word which pretty much means the opposite of "recommended".)
> > >
> > > As far as "default n" is concerned... this is usually not necessary; "n"
> > > is the default unless anything else is specified.
> >
> > How about this ? Thanks.
>
> Looks good to me (missing Signed-off-by). I think it's also useful
> to generate some runtime noise saying it's a deprecated option.
>
> Even something as simple as:
>
> - pv_info.name = "vmi"
> + pv_info.name = "vmi [deprecated]";
>

Yep, I was thinking of adding KERN_WARNING messages in vmi_init, but I like
your suggestion better. Also added the Signed-off-by line. Thanks.

--

Mark VMI for removal in feature-removal-schedule.txt.

From: Alok N Kataria <[email protected]>

Add text in feature-removal.txt and also modify Kconfig to disable
vmi by default.

Signed-off-by: Alok N Kataria <[email protected]>
---

Documentation/feature-removal-schedule.txt | 30 ++++++++++++++++++++++++++++
arch/x86/Kconfig | 12 ++++++++---
arch/x86/kernel/vmi_32.c | 2 +-
3 files changed, 40 insertions(+), 4 deletions(-)


diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 89a47b5..04e6c81 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -451,3 +451,33 @@ Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
will also allow making ALSA OSS emulation independent of
sound_core. The dependency will be broken then too.
Who: Tejun Heo <[email protected]>
+
+----------------------------
+
+What: Support for VMware's guest paravirtualization technique [VMI] will be
+ dropped.
+When: 2.6.37 or earlier.
+Why: With the recent innovations in CPU hardware acceleration technologies
+ from Intel and AMD, VMware ran a few experiments to compare these
+ techniques to guest paravirtualization technique on VMware's platform.
+ These hardware assisted virtualization techniques have outperformed the
+ performance benefits provided by VMI in most of the workloads. VMware
+ expects that these hardware features will be ubiquitous in a couple of
+ years, as a result, VMware has started a phased retirement of this
+ feature from the hypervisor. We will be removing this feature from the
+ Kernel too. Right now we are targeting 2.6.37 but can retire earlier if
+ technical reasons (read opportunity to remove major chunk of pvops)
+ arise.
+
+ Please note that VMI has always been an optimization and non-VMI kernels
+ still work fine on VMware's platform.
+ Latest versions of VMware's product which support VMI are,
+ Workstation 7.0 and vSphere 4.0 on ESX side, future maintenance
+ releases for these products will continue supporting VMI.
+
+ For more details about VMI retirement take a look at this,
+ http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
+
+Who: Alok N Kataria <[email protected]>
+
+----------------------------
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f777aaf..44c1660 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -496,14 +496,20 @@ if PARAVIRT_GUEST
source "arch/x86/xen/Kconfig"

config VMI
- bool "VMI Guest support"
- select PARAVIRT
- depends on X86_32
+ bool "VMI Guest support [deprecated]"
+ depends on X86_32 && PARAVIRT
---help---
VMI provides a paravirtualized interface to the VMware ESX server
(it could be used by other hypervisors in theory too, but is not
at the moment), by linking the kernel to a GPL-ed ROM module
provided by the hypervisor.
+ As of September 2009, VMware has started a phased retirement of this
+ feature from VMware's products. Please see
+ feature-removal-schedule.txt for details.
+ If you are planning to enable this option, please note that you
+ cannot live migrate a VMI enabled VM to a future VMware product,
+ which doesn't support VMI. So if you expect your kernel to seamlessly
+ migrate to newer VMware products, keep this disabled.

config KVM_CLOCK
bool "KVM paravirtualized clock"
diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
index 31e6f6c..d430e4c 100644
--- a/arch/x86/kernel/vmi_32.c
+++ b/arch/x86/kernel/vmi_32.c
@@ -648,7 +648,7 @@ static inline int __init activate_vmi(void)

pv_info.paravirt_enabled = 1;
pv_info.kernel_rpl = kernel_cs & SEGMENT_RPL_MASK;
- pv_info.name = "vmi";
+ pv_info.name = "vmi [deprecated]";

pv_init_ops.patch = vmi_patch;


2009-09-29 17:30:31

by H. Peter Anvin

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].

On 09/29/2009 10:25 AM, Alok Kataria wrote:
> diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
> index 31e6f6c..d430e4c 100644
> --- a/arch/x86/kernel/vmi_32.c
> +++ b/arch/x86/kernel/vmi_32.c
> @@ -648,7 +648,7 @@ static inline int __init activate_vmi(void)
>
> pv_info.paravirt_enabled = 1;
> pv_info.kernel_rpl = kernel_cs & SEGMENT_RPL_MASK;
> - pv_info.name = "vmi";
> + pv_info.name = "vmi [deprecated]";
>
> pv_init_ops.patch = vmi_patch;
>
>
>

Where is this string used, and could this break something?

-hpa

2009-09-29 17:36:52

by Alok Kataria

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].


On Tue, 2009-09-29 at 10:27 -0700, H. Peter Anvin wrote:
> On 09/29/2009 10:25 AM, Alok Kataria wrote:
> > diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
> > index 31e6f6c..d430e4c 100644
> > --- a/arch/x86/kernel/vmi_32.c
> > +++ b/arch/x86/kernel/vmi_32.c
> > @@ -648,7 +648,7 @@ static inline int __init activate_vmi(void)
> >
> > pv_info.paravirt_enabled = 1;
> > pv_info.kernel_rpl = kernel_cs & SEGMENT_RPL_MASK;
> > - pv_info.name = "vmi";
> > + pv_info.name = "vmi [deprecated]";
> >
> > pv_init_ops.patch = vmi_patch;
> >
> >
> >
>
> Where is this string used, and could this break something?

It is used by default_banner() in arch/x86/kernel/paravirt.c. IMO this
just prints some info for users, so it shouldn't break anything.

Alok
>
> -hpa
>
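
To illustrate why the longer string should be harmless, here is a minimal
userspace sketch of a banner routine that only formats the name into a log
line. The struct pv_info_sketch type, the format_banner() helper and the exact
banner wording are assumptions made for this example; they are not copied from
paravirt.c.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's pv_info; only the name is modeled. */
struct pv_info_sketch {
	const char *name;
};

/*
 * Format a boot banner around the name. The name is only interpolated
 * into the message, never parsed or compared, so a longer string such
 * as "vmi [deprecated]" simply shows up verbatim in the output.
 * Returns the value of snprintf().
 */
static int format_banner(const struct pv_info_sketch *info,
			 char *buf, size_t len)
{
	return snprintf(buf, len,
			"Booting paravirtualized kernel on %s", info->name);
}
```

Calling format_banner() with name set to "vmi [deprecated]" yields a single
line ending in that string. Anything that compared the name against "vmi"
would be affected by the change, which is the case worth grepping for.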

2009-09-29 18:23:54

by Chris Wright

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].

* Alok Kataria ([email protected]) wrote:
> Mark VMI for removal in feature-removal-schedule.txt.
>
> From: Alok N Kataria <[email protected]>
>
> Add text in feature-removal.txt and also modify Kconfig to disable
> vmi by default.
>
> Signed-off-by: Alok N Kataria <[email protected]>

Acked-by: Chris Wright <[email protected]>

2009-10-02 03:01:08

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].

Alok Kataria <[email protected]> writes:

> On Tue, 2009-09-29 at 01:08 -0700, Arjan van de Ven wrote:
>> > For now I have just added some text in the feature-removal file and
>> > disabled VMI by default in the Kconfig, the reason that needs to be
>> > done is because "Live Migration" of a VMI enabled VM to future
>> > products which don't support VMI will not work, so its important that
>> > newer distros keep this disabled, if they want seamless migration
>> > that is.
>>
>> btw the "default" in KConfig tends to be totally ignored by distro
>> kernel maintainers... please don't assume that just because some default
>> is set in KConfig it has ANY impact on what shows up in distributions.
>
> So, are you saying that we should be doing something else along with
> toggling it off in the Kconfig ?
> We have already informed most of the distro folks about this deprecation
> so I think we should be okay there, but if there is something else that
> should be done, do let me know.

Perhaps log a message when it is first used?

I do that for sysctl right now.

Eric
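
For reference, the "log a message when it is first used" idea can be sketched
in plain userspace C as below. The helper name warn_deprecated_once() and the
message text are made up for this example; this is not the sysctl code and it
is not thread-safe.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: print a deprecation notice the first time it is
 * called and stay silent afterwards. Returns true only on the call that
 * actually printed. A single static flag is shared by all callers, so a
 * real version would want one flag per feature or per call site.
 */
static bool warn_deprecated_once(const char *feature)
{
	static bool warned;

	if (warned)
		return false;
	warned = true;
	fprintf(stderr,
		"warning: %s is deprecated and scheduled for removal\n",
		feature);
	return true;
}
```

The point of the pattern is that users see the notice in their logs once per
boot without the log being flooded on every subsequent use.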

2009-10-02 04:45:19

by Alok Kataria

[permalink] [raw]
Subject: Re: Paravirtualization on VMware's Platform [VMI].


On Thu, 2009-10-01 at 20:00 -0700, Eric W. Biederman wrote:

> >> btw the "default" in KConfig tends to be totally ignored by distro
> >> kernel maintainers... please don't assume that just because some default
> >> is set in KConfig it has ANY impact on what shows up in distributions.
> >
> > So, are you saying that we should be doing something else along with
> > toggling it off in the Kconfig ?
> > We have already informed most of the distro folks about this deprecation
> > so I think we should be okay there, but if there is something else that
> > should be done, do let me know.
>
> Perhaps log a message when it is first used?

Yeah, Chris' suggestion about adding a "deprecated" string to pv_info.name
does just that.

Thanks,
Alok

2009-10-08 20:25:57

by Alok Kataria

[permalink] [raw]
Subject: [tip:x86/urgent] x86, vmi: Mark VMI deprecated and schedule it for remval

Commit-ID: 6c42ffab4dc0e21f9c6adc906368e7e7c12df47f
Gitweb: http://git.kernel.org/tip/6c42ffab4dc0e21f9c6adc906368e7e7c12df47f
Author: Alok Kataria <[email protected]>
AuthorDate: Tue, 29 Sep 2009 10:25:24 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Thu, 8 Oct 2009 13:21:04 -0700

x86, vmi: Mark VMI deprecated and schedule it for remval

Add text in feature-removal.txt indicating that VMI will be removed in
the 2.6.37 timeframe, and modify Kconfig to disable VMI by default.

[ hpa: removed a bogus Kconfig change, marked (DEPRECATED) in Kconfig ]

Signed-off-by: Alok N Kataria <[email protected]>
Acked-by: Chris Wright <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
Documentation/feature-removal-schedule.txt | 30 ++++++++++++++++++++++++++++
arch/x86/Kconfig | 11 +++++++++-
arch/x86/kernel/vmi_32.c | 2 +-
3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 89a47b5..04e6c81 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -451,3 +451,33 @@ Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
will also allow making ALSA OSS emulation independent of
sound_core. The dependency will be broken then too.
Who: Tejun Heo <[email protected]>
+
+----------------------------
+
+What: Support for VMware's guest paravirtualization technique [VMI] will be
+ dropped.
+When: 2.6.37 or earlier.
+Why: With the recent innovations in CPU hardware acceleration technologies
+ from Intel and AMD, VMware ran a few experiments to compare these
+ techniques to guest paravirtualization technique on VMware's platform.
+ These hardware assisted virtualization techniques have outperformed the
+ performance benefits provided by VMI in most of the workloads. VMware
+ expects that these hardware features will be ubiquitous in a couple of
+ years, as a result, VMware has started a phased retirement of this
+ feature from the hypervisor. We will be removing this feature from the
+ Kernel too. Right now we are targeting 2.6.37 but can retire earlier if
+ technical reasons (read opportunity to remove major chunk of pvops)
+ arise.
+
+ Please note that VMI has always been an optimization and non-VMI kernels
+ still work fine on VMware's platform.
+ Latest versions of VMware's product which support VMI are,
+ Workstation 7.0 and vSphere 4.0 on ESX side, future maintenance
+ releases for these products will continue supporting VMI.
+
+ For more details about VMI retirement take a look at this,
+ http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
+
+Who: Alok N Kataria <[email protected]>
+
+----------------------------
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c876bac..07e0114 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -491,7 +491,7 @@ if PARAVIRT_GUEST
source "arch/x86/xen/Kconfig"

config VMI
- bool "VMI Guest support"
+ bool "VMI Guest support (DEPRECATED)"
select PARAVIRT
depends on X86_32
---help---
@@ -500,6 +500,15 @@ config VMI
at the moment), by linking the kernel to a GPL-ed ROM module
provided by the hypervisor.

+ As of September 2009, VMware has started a phased retirement
+ of this feature from VMware's products. Please see
+ feature-removal-schedule.txt for details. If you are
+ planning to enable this option, please note that you cannot
+ live migrate a VMI enabled VM to a future VMware product,
+ which doesn't support VMI. So if you expect your kernel to
+ seamlessly migrate to newer VMware products, keep this
+ disabled.
+
config KVM_CLOCK
bool "KVM paravirtualized clock"
select PARAVIRT
diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
index 31e6f6c..d430e4c 100644
--- a/arch/x86/kernel/vmi_32.c
+++ b/arch/x86/kernel/vmi_32.c
@@ -648,7 +648,7 @@ static inline int __init activate_vmi(void)

pv_info.paravirt_enabled = 1;
pv_info.kernel_rpl = kernel_cs & SEGMENT_RPL_MASK;
- pv_info.name = "vmi";
+ pv_info.name = "vmi [deprecated]";

pv_init_ops.patch = vmi_patch;

2009-10-08 20:35:26

by Alok Kataria

[permalink] [raw]
Subject: [tip:x86/urgent] x86, vmi: Mark VMI deprecated and schedule it for removal

Commit-ID: d0153ca35d344d9b640dc305031b0703ba3f30f0
Gitweb: http://git.kernel.org/tip/d0153ca35d344d9b640dc305031b0703ba3f30f0
Author: Alok Kataria <[email protected]>
AuthorDate: Tue, 29 Sep 2009 10:25:24 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 8 Oct 2009 22:27:55 +0200

x86, vmi: Mark VMI deprecated and schedule it for removal

Add text in feature-removal.txt indicating that VMI will be removed in
the 2.6.37 timeframe.

Signed-off-by: Alok N Kataria <[email protected]>
Acked-by: Chris Wright <[email protected]>
LKML-Reference: <[email protected]>
[ removed a bogus Kconfig change, marked (DEPRECATED) in Kconfig ]
Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
Documentation/feature-removal-schedule.txt | 30 ++++++++++++++++++++++++++++
arch/x86/Kconfig | 11 +++++++++-
arch/x86/kernel/vmi_32.c | 2 +-
3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 89a47b5..04e6c81 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -451,3 +451,33 @@ Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
will also allow making ALSA OSS emulation independent of
sound_core. The dependency will be broken then too.
Who: Tejun Heo <[email protected]>
+
+----------------------------
+
+What: Support for VMware's guest paravirtualization technique [VMI] will be
+ dropped.
+When: 2.6.37 or earlier.
+Why: With the recent innovations in CPU hardware acceleration technologies
+ from Intel and AMD, VMware ran a few experiments to compare these
+ techniques to guest paravirtualization technique on VMware's platform.
+ These hardware assisted virtualization techniques have outperformed the
+ performance benefits provided by VMI in most of the workloads. VMware
+ expects that these hardware features will be ubiquitous in a couple of
+ years, as a result, VMware has started a phased retirement of this
+ feature from the hypervisor. We will be removing this feature from the
+ Kernel too. Right now we are targeting 2.6.37 but can retire earlier if
+ technical reasons (read opportunity to remove major chunk of pvops)
+ arise.
+
+ Please note that VMI has always been an optimization and non-VMI kernels
+ still work fine on VMware's platform.
+ Latest versions of VMware's product which support VMI are,
+ Workstation 7.0 and vSphere 4.0 on ESX side, future maintenance
+ releases for these products will continue supporting VMI.
+
+ For more details about VMI retirement take a look at this,
+ http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
+
+Who: Alok N Kataria <[email protected]>
+
+----------------------------
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c876bac..07e0114 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -491,7 +491,7 @@ if PARAVIRT_GUEST
source "arch/x86/xen/Kconfig"

config VMI
- bool "VMI Guest support"
+ bool "VMI Guest support (DEPRECATED)"
select PARAVIRT
depends on X86_32
---help---
@@ -500,6 +500,15 @@ config VMI
at the moment), by linking the kernel to a GPL-ed ROM module
provided by the hypervisor.

+ As of September 2009, VMware has started a phased retirement
+ of this feature from VMware's products. Please see
+ feature-removal-schedule.txt for details. If you are
+ planning to enable this option, please note that you cannot
+ live migrate a VMI enabled VM to a future VMware product,
+ which doesn't support VMI. So if you expect your kernel to
+ seamlessly migrate to newer VMware products, keep this
+ disabled.
+
config KVM_CLOCK
bool "KVM paravirtualized clock"
select PARAVIRT
diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
index 31e6f6c..d430e4c 100644
--- a/arch/x86/kernel/vmi_32.c
+++ b/arch/x86/kernel/vmi_32.c
@@ -648,7 +648,7 @@ static inline int __init activate_vmi(void)

pv_info.paravirt_enabled = 1;
pv_info.kernel_rpl = kernel_cs & SEGMENT_RPL_MASK;
- pv_info.name = "vmi";
+ pv_info.name = "vmi [deprecated]";

pv_init_ops.patch = vmi_patch;