2022-08-20 06:01:45

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 00/18] KVM hardware enable/disable reorganize

From: Isaku Yamahata <[email protected]>

The purpose of this patch series is to get feedback before going further.
e.g. rebasing TDX KVM series, etc.

This patch series is to implement the suggestion by Sean Christopherson [1]
to reorganize enable/disable cpu virtualization feature by replacing
the arch-generic current enable/disable logic with PM related hooks. And
convert kvm/x86 to use new hooks.

- Untable x86 hardware enable logic, snapshot MSRs for user return notifier,
enabling cpu virtualization on cpu online and platform resume. and real
enabling of CPU virtualization feature
- Introduce hooks related to PM.
- Convert kvm/x86 code to user those hooks.
- Split out hardware enabling/disabling logic into a separate file. Compile
it for non-x86 code. Once conversion of other KVM archs is done, this file
can be dropped.
- Delete cpus_hardware_enabled. 17/18 and 18/18

[1] https://lore.kernel.org/kvm/[email protected]/

Chao Gao (2):
KVM: x86: Move check_processor_compatibility from init ops to runtime
ops
Partially revert "KVM: Pass kvm_init()'s opaque param to additional
arch funcs"

Isaku Yamahata (16):
KVM: x86: Drop kvm_user_return_msr_cpu_online()
KVM: x86: Use this_cpu_ptr() instead of
per_cpu_ptr(smp_processor_id())
KVM: Drop kvm_count_lock and instead protect kvm_usage_count with
kvm_lock
KVM: Add arch hooks for PM events with empty stub
KVM: x86: Move TSC fixup logic to KVM arch resume callback
KVM: Add arch hook when VM is added/deleted
KVM: Move out KVM arch PM hooks and hardware enable/disable logic
KVM: kvm_arch.c: Remove _nolock post fix
KVM: kvm_arch.c: Remove a global variable, hardware_enable_failed
KVM: Do processor compatibility check on cpu online and resume
KVM: x86: Duplicate arch callbacks related to pm events
KVM: Eliminate kvm_arch_post_init_vm()
KVM: x86: Delete kvm_arch_hardware_enable/disable()
KVM: Add config to not compile kvm_arch.c
KVM: x86: Remove cpus_hardware_enabled and related sanity check
KVM: Remove cpus_hardware_enabled and related sanity check

Documentation/virt/kvm/locking.rst | 14 +--
arch/arm64/kvm/arm.c | 2 +-
arch/mips/kvm/mips.c | 2 +-
arch/powerpc/kvm/powerpc.c | 2 +-
arch/riscv/kvm/main.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/vmx/vmx.c | 14 +--
arch/x86/kvm/x86.c | 184 ++++++++++++++++++++++-------
include/linux/kvm_host.h | 14 ++-
virt/kvm/Kconfig | 3 +
virt/kvm/Makefile.kvm | 3 +
virt/kvm/kvm_arch.c | 119 +++++++++++++++++++
virt/kvm/kvm_main.c | 177 ++++++++-------------------
17 files changed, 349 insertions(+), 197 deletions(-)
create mode 100644 virt/kvm/kvm_arch.c

--
2.25.1


2022-08-20 06:01:58

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 02/18] KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())

From: Isaku Yamahata <[email protected]>

convert per_cpu_ptr(smp_processor_id()) to this_cpu_ptr() as trivial
cleanup.

Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 16104a2f7d8e..7d5fff68befe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -416,8 +416,7 @@ EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);

int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
struct kvm_user_return_msr_values *values = &msrs->values[slot];
int err;

@@ -449,8 +448,7 @@ EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);

static void drop_user_return_notifiers(void)
{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+ struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);

if (msrs->registered)
kvm_on_user_return(&msrs->urn);
--
2.25.1

2022-08-20 06:02:06

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 03/18] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock

From: Isaku Yamahata <[email protected]>

Because kvm_count_lock unnecessarily complicates the KVM locking convention
Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
simplicity.

Opportunistically add some comments on locking.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
Documentation/virt/kvm/locking.rst | 14 +++++-------
virt/kvm/kvm_main.c | 36 +++++++++++++++++++++---------
2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 845a561629f1..8957e32aa724 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -216,15 +216,11 @@ time it will be set using the Dirty tracking mechanism described above.
:Type: mutex
:Arch: any
:Protects: - vm_list
-
-``kvm_count_lock``
-^^^^^^^^^^^^^^^^^^
-
-:Type: raw_spinlock_t
-:Arch: any
-:Protects: - hardware virtualization enable/disable
-:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
- migration.
+ - kvm_usage_count
+ - hardware virtualization enable/disable
+:Comment: Use cpus_read_lock() for hardware virtualization enable/disable
+ because hardware enabling/disabling must be atomic /wrt
+ migration. The lock order is cpus lock => kvm_lock.

``kvm->mn_invalidate_lock``
^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 515dfe9d3bcf..c6781fa30461 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -100,7 +100,6 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
*/

DEFINE_MUTEX(kvm_lock);
-static DEFINE_RAW_SPINLOCK(kvm_count_lock);
LIST_HEAD(vm_list);

static cpumask_var_t cpus_hardware_enabled;
@@ -4999,6 +4998,8 @@ static void hardware_enable_nolock(void *junk)
int cpu = raw_smp_processor_id();
int r;

+ WARN_ON_ONCE(preemptible());
+
if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
return;

@@ -5015,10 +5016,10 @@ static void hardware_enable_nolock(void *junk)

static int kvm_starting_cpu(unsigned int cpu)
{
- raw_spin_lock(&kvm_count_lock);
+ mutex_lock(&kvm_lock);
if (kvm_usage_count)
hardware_enable_nolock(NULL);
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
return 0;
}

@@ -5026,6 +5027,8 @@ static void hardware_disable_nolock(void *junk)
{
int cpu = raw_smp_processor_id();

+ WARN_ON_ONCE(preemptible());
+
if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
return;
cpumask_clear_cpu(cpu, cpus_hardware_enabled);
@@ -5034,10 +5037,10 @@ static void hardware_disable_nolock(void *junk)

static int kvm_dying_cpu(unsigned int cpu)
{
- raw_spin_lock(&kvm_count_lock);
+ mutex_lock(&kvm_lock);
if (kvm_usage_count)
hardware_disable_nolock(NULL);
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
return 0;
}

@@ -5052,16 +5055,19 @@ static void hardware_disable_all_nolock(void)

static void hardware_disable_all(void)
{
- raw_spin_lock(&kvm_count_lock);
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);
hardware_disable_all_nolock();
- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
}

static int hardware_enable_all(void)
{
int r = 0;

- raw_spin_lock(&kvm_count_lock);
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);

kvm_usage_count++;
if (kvm_usage_count == 1) {
@@ -5074,7 +5080,8 @@ static int hardware_enable_all(void)
}
}

- raw_spin_unlock(&kvm_count_lock);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();

return r;
}
@@ -5680,15 +5687,22 @@ static void kvm_init_debug(void)

static int kvm_suspend(void)
{
- if (kvm_usage_count)
+ /*
+ * The caller ensures that CPU hotlug is disabled by
+ * cpu_hotplug_disable() and other CPUs are offlined. No need for
+ * locking.
+ */
+ if (kvm_usage_count) {
+ lockdep_assert_not_held(&kvm_lock);
hardware_disable_nolock(NULL);
+ }
return 0;
}

static void kvm_resume(void)
{
if (kvm_usage_count) {
- lockdep_assert_not_held(&kvm_count_lock);
+ lockdep_assert_not_held(&kvm_lock);
hardware_enable_nolock(NULL);
}
}
--
2.25.1

2022-08-20 06:02:44

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 08/18] KVM: kvm_arch.c: Remove _nolock post fix

From: Isaku Yamahata <[email protected]>

Now all related callbacks are called under kvm_lock, no point for _nolock
post fix. Remove _nolock post fix for short function name.

Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_arch.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index d593610edb0a..8a5d88b02aab 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -20,7 +20,7 @@ __weak int kvm_arch_post_init_vm(struct kvm *kvm)
return 0;
}

-static void hardware_enable_nolock(void *junk)
+static void hardware_enable(void *junk)
{
int cpu = raw_smp_processor_id();
int r;
@@ -41,7 +41,7 @@ static void hardware_enable_nolock(void *junk)
}
}

-static void hardware_disable_nolock(void *junk)
+static void hardware_disable(void *junk)
{
int cpu = raw_smp_processor_id();

@@ -55,7 +55,7 @@ static void hardware_disable_nolock(void *junk)

__weak void kvm_arch_pre_hardware_unsetup(void)
{
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
}

/*
@@ -70,7 +70,7 @@ __weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
return 0;

atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable_nolock, NULL, 1);
+ on_each_cpu(hardware_enable, NULL, 1);

if (atomic_read(&hardware_enable_failed)) {
r = -EBUSY;
@@ -80,7 +80,7 @@ __weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
r = kvm_arch_post_init_vm(kvm);
err:
if (r)
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
return r;
}

@@ -89,39 +89,39 @@ __weak int kvm_arch_del_vm(int usage_count)
if (usage_count)
return 0;

- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
return 0;
}

__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
if (usage_count)
- hardware_enable_nolock(NULL);
+ hardware_enable(NULL);
return 0;
}

__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
{
if (usage_count)
- hardware_disable_nolock(NULL);
+ hardware_disable(NULL);
return 0;
}

__weak int kvm_arch_reboot(int val)
{
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(hardware_disable, NULL, 1);
return NOTIFY_OK;
}

__weak int kvm_arch_suspend(int usage_count)
{
if (usage_count)
- hardware_disable_nolock(NULL);
+ hardware_disable(NULL);
return 0;
}

__weak void kvm_arch_resume(int usage_count)
{
if (usage_count)
- hardware_enable_nolock(NULL);
+ hardware_enable(NULL);
}
--
2.25.1

2022-08-20 06:04:09

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 15/18] KVM: x86: Delete kvm_arch_hardware_enable/disable()

From: Isaku Yamahata <[email protected]>

Now they're function call and there is no point to keep them.
Opportunistically make kvm_arch_pre_hardware_unsetup() empty.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1e8d15aa6b8..5aa6d5308ee8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -354,7 +354,7 @@ static void kvm_on_user_return(struct user_return_notifier *urn)

/*
* Disabling irqs at this point since the following code could be
- * interrupted and executed through kvm_arch_hardware_disable()
+ * interrupted and executed through hardware_disable()
*/
local_irq_save(flags);
if (msrs->registered) {
@@ -11830,17 +11830,6 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);

-int kvm_arch_hardware_enable(void)
-{
- return static_call(kvm_x86_hardware_enable)();
-}
-
-void kvm_arch_hardware_disable(void)
-{
- static_call(kvm_x86_hardware_disable)();
- drop_user_return_notifiers();
-}
-
static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;

static int __hardware_enable(void)
@@ -11852,7 +11841,7 @@ static int __hardware_enable(void)

if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
return 0;
- r = kvm_arch_hardware_enable();
+ r = static_call(kvm_x86_hardware_enable)();
if (r)
pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
else
@@ -11877,12 +11866,13 @@ static void hardware_disable(void *junk)
if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
return;
cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
- kvm_arch_hardware_disable();
+ static_call(kvm_x86_hardware_disable)();
+ drop_user_return_notifiers();
}

void kvm_arch_pre_hardware_unsetup(void)
{
- on_each_cpu(hardware_disable, NULL, 1);
+ /* TODO: eliminate this function */
}

/*
@@ -11968,7 +11958,7 @@ void kvm_arch_resume(int usage_count)

if (kvm_arch_check_processor_compat())
return;
- if (kvm_arch_hardware_enable())
+ if (static_call(kvm_x86_hardware_enable)())
return;

local_tsc = rdtsc();
@@ -12109,6 +12099,8 @@ int kvm_arch_hardware_setup(void *opaque)

void kvm_arch_hardware_unsetup(void)
{
+ on_each_cpu(hardware_disable, NULL, 1);
+
kvm_unregister_perf_callbacks();

static_call(kvm_x86_hardware_unsetup)();
--
2.25.1

2022-08-20 06:04:09

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 11/18] Partially revert "KVM: Pass kvm_init()'s opaque param to additional arch funcs"

From: Chao Gao <[email protected]>

This partially reverts commit b99040853738 ("KVM: Pass kvm_init()'s opaque
param to additional arch funcs") remove opaque from
kvm_arch_check_processor_compat because no one uses this opaque now.
Address conflicts for ARM (due to file movement) and manually handle RISC-V
which comes after the commit. The change about kvm_arch_hardware_setup()
in original commit are still needed so they are not reverted.

The current implementation enables hardware (e.g. enable VMX on all CPUs),
arch-specific initialization for VM creation, and disables hardware (in
x86, disable VMX on all CPUs) for last VM destruction.

TDX requires its initialization on loading KVM module with VMX enabled on
all available CPUs. It needs to enable/disable hardware on module
initialization. To reuse the same logic, one way is to pass around the
unused opaque argument, another way is to remove the unused opaque
argument. This patch is a preparation for the latter by removing the
argument.

Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
Acked-by: Anup Patel <[email protected]>
Acked-by: Claudio Imbrenda <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/arm64/kvm/arm.c | 2 +-
arch/mips/kvm/mips.c | 2 +-
arch/powerpc/kvm/powerpc.c | 2 +-
arch/riscv/kvm/main.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 16 +++-------------
8 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 986cee6fbc7f..e66dacfe7f74 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -68,7 +68,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index a25e0b73ee70..092d09fb6a7e 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -140,7 +140,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index fb1490761c87..7b56d6ccfdfb 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -447,7 +447,7 @@ int kvm_arch_hardware_setup(void *opaque)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return kvmppc_core_check_processor_compat();
}
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 1549205fe5fe..f8d6372d208f 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -20,7 +20,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -EINVAL;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index edfd4bbd0cba..e26d4dd85668 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -254,7 +254,7 @@ int kvm_arch_hardware_enable(void)
return 0;
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
return 0;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 890985c284be..0b112cd7de58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11999,7 +11999,7 @@ void kvm_arch_hardware_unsetup(void)
static_call(kvm_x86_hardware_unsetup)();
}

-int kvm_arch_check_processor_compat(void *opaque)
+int kvm_arch_check_processor_compat(void)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fdde9c59756d..9584500eb4fa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1439,7 +1439,7 @@ void kvm_arch_hardware_disable(void);
int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_pre_hardware_unsetup(void);
void kvm_arch_hardware_unsetup(void);
-int kvm_arch_check_processor_compat(void *opaque);
+int kvm_arch_check_processor_compat(void);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b8e8addd1e5..9d917e3e47a9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5741,22 +5741,14 @@ void kvm_unregister_perf_callbacks(void)
}
#endif

-struct kvm_cpu_compat_check {
- void *opaque;
- int *ret;
-};
-
-static void check_processor_compat(void *data)
+static void check_processor_compat(void *rtn)
{
- struct kvm_cpu_compat_check *c = data;
-
- *c->ret = kvm_arch_check_processor_compat(c->opaque);
+ *(int *)rtn = kvm_arch_check_processor_compat();
}

int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
struct module *module)
{
- struct kvm_cpu_compat_check c;
int r;
int cpu;

@@ -5779,10 +5771,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
if (r < 0)
goto out_free_1;

- c.ret = &r;
- c.opaque = opaque;
for_each_online_cpu(cpu) {
- smp_call_function_single(cpu, check_processor_compat, &c, 1);
+ smp_call_function_single(cpu, check_processor_compat, &r, 1);
if (r < 0)
goto out_free_2;
}
--
2.25.1

2022-08-20 06:04:09

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 13/18] KVM: x86: Duplicate arch callbacks related to pm events

From: Isaku Yamahata <[email protected]>

KVM/X86 can change those callbacks without worrying about breaking other
archs.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 125 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 120 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b112cd7de58..71e90d0f0da9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11841,6 +11841,124 @@ void kvm_arch_hardware_disable(void)
drop_user_return_notifiers();
}

+static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
+
+int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+ return kvm_mmu_post_init_vm(kvm);
+}
+
+static int __hardware_enable(void)
+{
+ int cpu = raw_smp_processor_id();
+ int r;
+
+ WARN_ON_ONCE(preemptible());
+
+ if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return 0;
+ r = kvm_arch_hardware_enable();
+ if (r)
+ pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
+ else
+ cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ return r;
+}
+
+static void hardware_enable(void *arg)
+{
+ atomic_t *failed = arg;
+
+ if (__hardware_enable())
+ atomic_inc(failed);
+}
+
+static void hardware_disable(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+
+ WARN_ON_ONCE(preemptible());
+
+ if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return;
+ cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
+ kvm_arch_hardware_disable();
+}
+
+void kvm_arch_pre_hardware_unsetup(void)
+{
+ on_each_cpu(hardware_disable, NULL, 1);
+}
+
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
+{
+ atomic_t failed;
+ int r = 0;
+
+ if (usage_count != 1)
+ return 0;
+
+ atomic_set(&failed, 0);
+ on_each_cpu(hardware_enable, &failed, 1);
+
+ if (atomic_read(&failed)) {
+ r = -EBUSY;
+ goto err;
+ }
+
+ r = kvm_arch_post_init_vm(kvm);
+err:
+ if (r && usage_count == 1)
+ on_each_cpu(hardware_disable, NULL, 1);
+ return r;
+}
+
+int kvm_arch_del_vm(int usage_count)
+{
+ if (usage_count)
+ return 0;
+
+ on_each_cpu(hardware_disable, NULL, 1);
+ return 0;
+}
+
+int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
+{
+ int r;
+
+ if (!usage_count)
+ return 0;
+
+ r = kvm_arch_check_processor_compat();
+ if (r)
+ return r;
+ return __hardware_enable();
+}
+
+int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
+{
+ if (usage_count)
+ hardware_disable(NULL);
+ return 0;
+}
+
+int kvm_arch_reboot(int val)
+{
+ on_each_cpu(hardware_disable, NULL, 1);
+ return NOTIFY_OK;
+}
+
+int kvm_arch_suspend(int usage_count)
+{
+ if (usage_count)
+ hardware_disable(NULL);
+ return 0;
+}
+
void kvm_arch_resume(int usage_count)
{
struct kvm *kvm;
@@ -11853,6 +11971,8 @@ void kvm_arch_resume(int usage_count)
if (!usage_count)
return;

+ if (kvm_arch_check_processor_compat())
+ return;
if (kvm_arch_hardware_enable())
return;

@@ -12104,11 +12224,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return ret;
}

-int kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return kvm_mmu_post_init_vm(kvm);
-}
-
static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
{
vcpu_load(vcpu);
--
2.25.1

2022-08-20 06:04:09

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 10/18] KVM: x86: Move check_processor_compatibility from init ops to runtime ops

From: Chao Gao <[email protected]>

so that KVM can do compatibility checks on hotplugged CPUs. Drop __init
from check_processor_compatibility() and its callees.

use a static_call() to invoke .check_processor_compatibility.

Opportunistically rename {svm,vmx}_check_processor_compat to conform
to the naming convention of fields of kvm_x86_ops.

Signed-off-by: Chao Gao <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/vmx/vmx.c | 14 +++++++-------
arch/x86/kvm/x86.c | 3 +--
5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 51f777071584..3bc45932e2d1 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -129,6 +129,7 @@ KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP(check_processor_compatibility)

#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ffa578cafe1..f23d9852b07a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1445,6 +1445,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
struct kvm_x86_ops {
const char *name;

+ int (*check_processor_compatibility)(void);
int (*hardware_enable)(void);
void (*hardware_disable)(void);
void (*hardware_unsetup)(void);
@@ -1655,7 +1656,6 @@ struct kvm_x86_nested_ops {
struct kvm_x86_init_ops {
int (*cpu_has_kvm_support)(void);
int (*disabled_by_bios)(void);
- int (*check_processor_compatibility)(void);
int (*hardware_setup)(void);
unsigned int (*handle_intel_pt_intr)(void);

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f3813dbacb9f..371300f03f55 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4134,7 +4134,7 @@ svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
hypercall[2] = 0xd9;
}

-static int __init svm_check_processor_compat(void)
+static int svm_check_processor_compatibility(void)
{
return 0;
}
@@ -4740,6 +4740,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = "kvm_amd",

.hardware_unsetup = svm_hardware_unsetup,
+ .check_processor_compatibility = svm_check_processor_compatibility,
.hardware_enable = svm_hardware_enable,
.hardware_disable = svm_hardware_disable,
.has_emulated_msr = svm_has_emulated_msr,
@@ -5122,7 +5123,6 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
.hardware_setup = svm_hardware_setup,
- .check_processor_compatibility = svm_check_processor_compat,

.runtime_ops = &svm_x86_ops,
.pmu_ops = &amd_pmu_ops,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d7f8331d6f7e..3cf7f18a4115 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2495,8 +2495,8 @@ static bool cpu_has_sgx(void)
return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
}

-static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
- u32 msr, u32 *result)
+static int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
+ u32 msr, u32 *result)
{
u32 vmx_msr_low, vmx_msr_high;
u32 ctl = ctl_min | ctl_opt;
@@ -2514,7 +2514,7 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
return 0;
}

-static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
+static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
{
u64 allowed;

@@ -2523,8 +2523,8 @@ static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
return ctl_opt & allowed;
}

-static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
- struct vmx_capability *vmx_cap)
+static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
+ struct vmx_capability *vmx_cap)
{
u32 vmx_msr_low, vmx_msr_high;
u32 min, opt, min2, opt2;
@@ -7417,7 +7417,7 @@ static int vmx_vm_init(struct kvm *kvm)
return 0;
}

-static int __init vmx_check_processor_compat(void)
+static int vmx_check_processor_compatibility(void)
{
struct vmcs_config vmcs_conf;
struct vmx_capability vmx_cap;
@@ -8015,6 +8015,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {

.hardware_unsetup = vmx_hardware_unsetup,

+ .check_processor_compatibility = vmx_check_processor_compatibility,
.hardware_enable = vmx_hardware_enable,
.hardware_disable = vmx_hardware_disable,
.has_emulated_msr = vmx_has_emulated_msr,
@@ -8404,7 +8405,6 @@ static __init int hardware_setup(void)
static struct kvm_x86_init_ops vmx_init_ops __initdata = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
- .check_processor_compatibility = vmx_check_processor_compat,
.hardware_setup = hardware_setup,
.handle_intel_pt_intr = NULL,

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a7ab9984a236..890985c284be 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12002,7 +12002,6 @@ void kvm_arch_hardware_unsetup(void)
int kvm_arch_check_processor_compat(void *opaque)
{
struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
- struct kvm_x86_init_ops *ops = opaque;

WARN_ON(!irqs_disabled());

@@ -12010,7 +12009,7 @@ int kvm_arch_check_processor_compat(void *opaque)
__cr4_reserved_bits(cpu_has, &boot_cpu_data))
return -EIO;

- return ops->check_processor_compatibility();
+ return static_call(kvm_x86_check_processor_compatibility)();
}

bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
--
2.25.1

2022-08-20 06:04:41

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 14/18] KVM: Eliminate kvm_arch_post_init_vm()

From: Isaku Yamahata <[email protected]>

Now kvm_arch_post_init_vm() is used only by x86 kvm_arch_add_vm(). Other
arch doesn't define it. Merge x86 kvm_arch_post_init_vm() int x86
kvm_arch_add_vm() and eliminate kvm_arch_post_init_vm().

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 7 +------
include/linux/kvm_host.h | 1 -
virt/kvm/kvm_arch.c | 12 +-----------
3 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 71e90d0f0da9..a1e8d15aa6b8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11843,11 +11843,6 @@ void kvm_arch_hardware_disable(void)

static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;

-int kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return kvm_mmu_post_init_vm(kvm);
-}
-
static int __hardware_enable(void)
{
int cpu = raw_smp_processor_id();
@@ -11910,7 +11905,7 @@ int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
goto err;
}

- r = kvm_arch_post_init_vm(kvm);
+ r = kvm_mmu_post_init_vm(kvm);
err:
if (r && usage_count == 1)
on_each_cpu(hardware_disable, NULL, 1);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9584500eb4fa..7983744addbf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1445,7 +1445,6 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
-int kvm_arch_post_init_vm(struct kvm *kvm);
int kvm_arch_add_vm(struct kvm *kvm, int usage_count);
int kvm_arch_del_vm(int usage_count);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 20971f43df95..94dd57bbc8bd 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -14,11 +14,6 @@

static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;

-__weak int kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return 0;
-}
-
static int __hardware_enable(void)
{
int cpu = raw_smp_processor_id();
@@ -78,13 +73,8 @@ __weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)

if (atomic_read(&failed)) {
r = -EBUSY;
- goto err;
- }
-
- r = kvm_arch_post_init_vm(kvm);
-err:
- if (r)
on_each_cpu(hardware_disable, NULL, 1);
+ }
return r;
}

--
2.25.1

2022-08-20 06:06:03

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 17/18] KVM: x86: Remove cpus_hardware_enabled and related sanity check

From: Isaku Yamahata <[email protected]>

cpus_hardware_enabled mask seems incomplete protection against other kernel
component using CPU virtualization feature. Because it's obscure and
incomplete, remove the check.

Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5aa6d5308ee8..f43caa8919ea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11830,22 +11830,15 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);

-static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-
static int __hardware_enable(void)
{
- int cpu = raw_smp_processor_id();
int r;

WARN_ON_ONCE(preemptible());

- if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return 0;
r = static_call(kvm_x86_hardware_enable)();
if (r)
- pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
- else
- cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ pr_info("kvm: enabling virtualization on CPU%d failed\n", smp_processor_id());
return r;
}

@@ -11859,13 +11852,7 @@ static void hardware_enable(void *arg)

static void hardware_disable(void *junk)
{
- int cpu = raw_smp_processor_id();
-
WARN_ON_ONCE(preemptible());
-
- if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return;
- cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
static_call(kvm_x86_hardware_disable)();
drop_user_return_notifiers();
}
--
2.25.1

2022-08-20 06:06:25

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 16/18] KVM: Add config to not compile kvm_arch.c

From: Isaku Yamahata <[email protected]>

So that kvm_arch_hardware_enable/disable() aren't defined.

Once the conversion of all KVM archs is done, this config and kvm_arch.c
should be removed.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
include/linux/kvm_host.h | 2 ++
virt/kvm/Kconfig | 3 +++
virt/kvm/Makefile.kvm | 5 ++++-
4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index e3cbd7706136..e2e16205425d 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -25,6 +25,7 @@ config KVM
depends on X86_LOCAL_APIC
select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
+ select HAVE_KVM_OVERRIDE_HARDWARE_ENABLE
select HAVE_KVM_IRQCHIP
select HAVE_KVM_PFNCACHE
select HAVE_KVM_IRQFD
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7983744addbf..1a05438ef33f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1434,8 +1434,10 @@ void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_
static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
#endif

+#ifndef CONFIG_HAVE_KVM_OVERRIDE_HARDWARE_ENABLE
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
+#endif
int kvm_arch_hardware_setup(void *opaque);
void kvm_arch_pre_hardware_unsetup(void);
void kvm_arch_hardware_unsetup(void);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index a8c5c9f06b3c..917314a87696 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -72,3 +72,6 @@ config KVM_XFER_TO_GUEST_WORK

config HAVE_KVM_PM_NOTIFIER
bool
+
+config HAVE_KVM_OVERRIDE_HARDWARE_ENABLE
+ def_bool n
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index c4210acabd35..c0187ec4f83c 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -5,7 +5,10 @@

KVM ?= ../../../virt/kvm

-kvm-y := $(KVM)/kvm_main.o $(KVM)/kvm_arch.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+ifneq ($(CONFIG_HAVE_KVM_OVERRIDE_HARDWARE_ENABLE), y)
+kvm-y += $(KVM)/kvm_arch.o
+endif
kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
--
2.25.1

2022-08-20 06:06:27

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 18/18] KVM: Remove cpus_hardware_enabled and related sanity check

From: Isaku Yamahata <[email protected]>

cpus_hardware_enabled mask seems incomplete protection against other kernel
component using CPU virtualization feature. Because it's obscure and
incomplete, remove the check.

Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_arch.c | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 94dd57bbc8bd..03946321a21c 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -12,22 +12,16 @@

#include <linux/kvm_host.h>

-static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-
static int __hardware_enable(void)
{
- int cpu = raw_smp_processor_id();
int r;

WARN_ON_ONCE(preemptible());

- if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return 0;
r = kvm_arch_hardware_enable();
if (r)
- pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
- else
- cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ pr_info("kvm: enabling virtualization on CPU%d failed\n",
+ smp_processor_id());
return r;
}

@@ -41,13 +35,7 @@ static void hardware_enable(void *arg)

static void hardware_disable(void *junk)
{
- int cpu = raw_smp_processor_id();
-
WARN_ON_ONCE(preemptible());
-
- if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return;
- cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
kvm_arch_hardware_disable();
}

--
2.25.1

2022-08-20 06:14:42

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 06/18] KVM: Add arch hook when VM is added/deleted

From: Isaku Yamahata <[email protected]>

and pass kvm_usage_count with kvm_lock. Move kvm_arch_post_init_vm() under
kvm_arch_add_vm(). Later kvm_arch_post_init_vm() is deleted once x86
overrides kvm_arch_add_vm().

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_arch.c | 12 +++++++++++-
virt/kvm/kvm_main.c | 21 +++++++++++++++++----
3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c38f382f3808..5ae66062620c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1445,6 +1445,8 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
int kvm_arch_post_init_vm(struct kvm *kvm);
+int kvm_arch_add_vm(struct kvm *kvm, int usage_count);
+int kvm_arch_del_vm(int usage_count);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
int kvm_arch_create_vm_debugfs(struct kvm *kvm);

diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 4748a76bcb03..0eac996f4981 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -10,11 +10,21 @@

#include <linux/kvm_host.h>

+__weak int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+ return 0;
+}
+
/*
* Called after the VM is otherwise initialized, but just before adding it to
* the vm_list.
*/
-__weak int kvm_arch_post_init_vm(struct kvm *kvm)
+__weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
+{
+ return kvm_arch_post_init_vm(kvm);
+}
+
+__weak int kvm_arch_del_vm(int usage_count)
{
return 0;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8d08126e6c74..0ba370acff8d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -144,6 +144,7 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
#endif
static int hardware_enable_all(void);
static void hardware_disable_all(void);
+static void kvm_del_vm(void);

static void kvm_io_bus_destroy(struct kvm_io_bus *bus);

@@ -1201,11 +1202,12 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_mmu_notifier;

- r = kvm_arch_post_init_vm(kvm);
- if (r)
- goto out_err_mmu_notifier;
-
mutex_lock(&kvm_lock);
+ r = kvm_arch_add_vm(kvm, kvm_usage_count);
+ if (r) {
+ mutex_unlock(&kvm_lock);
+ goto out_err_mmu_notifier;
+ }
list_add(&kvm->vm_list, &vm_list);
mutex_unlock(&kvm_lock);

@@ -1237,6 +1239,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
#endif
out_err_no_mmu_notifier:
hardware_disable_all();
+ kvm_del_vm();
out_err_no_disable:
kvm_arch_destroy_vm(kvm);
out_err_no_arch_destroy_vm:
@@ -1316,6 +1319,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
kvm_arch_free_vm(kvm);
preempt_notifier_dec();
hardware_disable_all();
+ kvm_del_vm();
mmdrop(mm);
module_put(kvm_chardev_ops.owner);
}
@@ -5055,6 +5059,15 @@ static void hardware_disable_all(void)
cpus_read_unlock();
}

+static void kvm_del_vm(void)
+{
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);
+ kvm_arch_del_vm(kvm_usage_count);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
+}
+
static int hardware_enable_all(void)
{
int r = 0;
--
2.25.1

2022-08-20 06:14:54

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 04/18] KVM: Add arch hooks for PM events with empty stub

From: Isaku Yamahata <[email protected]>

Add arch hooks for reboot, suspend, resume, and CPU-online/offline events
with empty stub functions.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 6 ++++++
virt/kvm/Makefile.kvm | 2 +-
virt/kvm/kvm_arch.c | 44 ++++++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 35 ++++++++++++++++----------------
4 files changed, 69 insertions(+), 18 deletions(-)
create mode 100644 virt/kvm/kvm_arch.c

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1c480b1821e1..c38f382f3808 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1448,6 +1448,12 @@ int kvm_arch_post_init_vm(struct kvm *kvm);
void kvm_arch_pre_destroy_vm(struct kvm *kvm);
int kvm_arch_create_vm_debugfs(struct kvm *kvm);

+int kvm_arch_suspend(int usage_count);
+void kvm_arch_resume(int usage_count);
+int kvm_arch_reboot(int val);
+int kvm_arch_online_cpu(unsigned int cpu, int usage_count);
+int kvm_arch_offline_cpu(unsigned int cpu, int usage_count);
+
#ifndef __KVM_HAVE_ARCH_VM_ALLOC
/*
* All architectures that want to use vzalloc currently also
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 2c27d5d0c367..c4210acabd35 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -5,7 +5,7 @@

KVM ?= ../../../virt/kvm

-kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+kvm-y := $(KVM)/kvm_main.o $(KVM)/kvm_arch.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
new file mode 100644
index 000000000000..4748a76bcb03
--- /dev/null
+++ b/virt/kvm/kvm_arch.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * kvm_arch.c: kvm default arch hooks for hardware enabling/disabling
+ * Copyright (c) 2022 Intel Corporation.
+ *
+ * Author:
+ * Isaku Yamahata <[email protected]>
+ * <[email protected]>
+ */
+
+#include <linux/kvm_host.h>
+
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+__weak int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+ return 0;
+}
+
+__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
+{
+ return 0;
+}
+
+__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
+{
+ return 0;
+}
+
+__weak int kvm_arch_reboot(int val)
+{
+ return NOTIFY_OK;
+}
+
+__weak int kvm_arch_suspend(int usage_count)
+{
+ return 0;
+}
+
+__weak void kvm_arch_resume(int usage_count)
+{
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c6781fa30461..8d08126e6c74 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1095,15 +1095,6 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
return ret;
}

-/*
- * Called after the VM is otherwise initialized, but just before adding it to
- * the vm_list.
- */
-int __weak kvm_arch_post_init_vm(struct kvm *kvm)
-{
- return 0;
-}
-
/*
* Called just after removing the VM from the vm_list, but before doing any
* other destruction.
@@ -5019,6 +5010,7 @@ static int kvm_starting_cpu(unsigned int cpu)
mutex_lock(&kvm_lock);
if (kvm_usage_count)
hardware_enable_nolock(NULL);
+ (void)kvm_arch_online_cpu(cpu, kvm_usage_count);
mutex_unlock(&kvm_lock);
return 0;
}
@@ -5040,6 +5032,7 @@ static int kvm_dying_cpu(unsigned int cpu)
mutex_lock(&kvm_lock);
if (kvm_usage_count)
hardware_disable_nolock(NULL);
+ (void)kvm_arch_offline_cpu(cpu, kvm_usage_count);
mutex_unlock(&kvm_lock);
return 0;
}
@@ -5089,6 +5082,8 @@ static int hardware_enable_all(void)
static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
void *v)
{
+ int r;
+
/*
* Some (well, at least mine) BIOSes hang on reboot if
* in vmx root mode.
@@ -5097,8 +5092,15 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
*/
pr_info("kvm: exiting hardware virtualization\n");
kvm_rebooting = true;
+
+ /* This hook is called without cpuhotplug disabled. */
+ cpus_read_lock();
+ mutex_lock(&kvm_lock);
on_each_cpu(hardware_disable_nolock, NULL, 1);
- return NOTIFY_OK;
+ r = kvm_arch_reboot(val);
+ mutex_unlock(&kvm_lock);
+ cpus_read_unlock();
+ return r;
}

static struct notifier_block kvm_reboot_notifier = {
@@ -5692,19 +5694,18 @@ static int kvm_suspend(void)
* cpu_hotplug_disable() and other CPUs are offlined. No need for
* locking.
*/
- if (kvm_usage_count) {
- lockdep_assert_not_held(&kvm_lock);
+ lockdep_assert_not_held(&kvm_lock);
+ if (kvm_usage_count)
hardware_disable_nolock(NULL);
- }
- return 0;
+ return kvm_arch_suspend(kvm_usage_count);
}

static void kvm_resume(void)
{
- if (kvm_usage_count) {
- lockdep_assert_not_held(&kvm_lock);
+ kvm_arch_resume(kvm_usage_count);
+ lockdep_assert_not_held(&kvm_lock);
+ if (kvm_usage_count)
hardware_enable_nolock(NULL);
- }
}

static struct syscore_ops kvm_syscore_ops = {
--
2.25.1

2022-08-20 06:15:56

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 07/18] KVM: Move out KVM arch PM hooks and hardware enable/disable logic

From: Isaku Yamahata <[email protected]>

To make clear that those files are default implementation that KVM/x86 (and
other KVM arch in future) will override them, split out those into a single
file. Once conversions for all kvm archs are done, the file will be
deleted. kvm_arch_pre_hardware_unsetup() is introduced to avoid cross-arch
code churn for now. Once it's settled down,
kvm_arch_pre_hardware_unsetup() can be merged into
kvm_arch_hardware_unsetup() in each arch code.

Signed-off-by: Isaku Yamahata <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_arch.c | 75 ++++++++++++++++++++++++-
virt/kvm/kvm_main.c | 115 ++++-----------------------------------
3 files changed, 85 insertions(+), 106 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5ae66062620c..fdde9c59756d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1437,6 +1437,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
int kvm_arch_hardware_setup(void *opaque);
+void kvm_arch_pre_hardware_unsetup(void);
void kvm_arch_hardware_unsetup(void);
int kvm_arch_check_processor_compat(void *opaque);
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 0eac996f4981..d593610edb0a 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -6,49 +6,122 @@
* Author:
* Isaku Yamahata <[email protected]>
* <[email protected]>
+ *
+ * TODO: Delete this file once the conversion of all KVM arch is done.
*/

#include <linux/kvm_host.h>

+static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
+static atomic_t hardware_enable_failed;
+
__weak int kvm_arch_post_init_vm(struct kvm *kvm)
{
return 0;
}

+static void hardware_enable_nolock(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+ int r;
+
+ WARN_ON_ONCE(preemptible());
+
+ if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return;
+
+ cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+
+ r = kvm_arch_hardware_enable();
+
+ if (r) {
+ cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
+ atomic_inc(&hardware_enable_failed);
+ pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
+ }
+}
+
+static void hardware_disable_nolock(void *junk)
+{
+ int cpu = raw_smp_processor_id();
+
+ WARN_ON_ONCE(preemptible());
+
+ if (!cpumask_test_cpu(cpu, &cpus_hardware_enabled))
+ return;
+ cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
+ kvm_arch_hardware_disable();
+}
+
+__weak void kvm_arch_pre_hardware_unsetup(void)
+{
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+}
+
/*
* Called after the VM is otherwise initialized, but just before adding it to
* the vm_list.
*/
__weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
{
- return kvm_arch_post_init_vm(kvm);
+ int r = 0;
+
+ if (usage_count != 1)
+ return 0;
+
+ atomic_set(&hardware_enable_failed, 0);
+ on_each_cpu(hardware_enable_nolock, NULL, 1);
+
+ if (atomic_read(&hardware_enable_failed)) {
+ r = -EBUSY;
+ goto err;
+ }
+
+ r = kvm_arch_post_init_vm(kvm);
+err:
+ if (r)
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
+ return r;
}

__weak int kvm_arch_del_vm(int usage_count)
{
+ if (usage_count)
+ return 0;
+
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
return 0;
}

__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
+ if (usage_count)
+ hardware_enable_nolock(NULL);
return 0;
}

__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
{
+ if (usage_count)
+ hardware_disable_nolock(NULL);
return 0;
}

__weak int kvm_arch_reboot(int val)
{
+ on_each_cpu(hardware_disable_nolock, NULL, 1);
return NOTIFY_OK;
}

__weak int kvm_arch_suspend(int usage_count)
{
+ if (usage_count)
+ hardware_disable_nolock(NULL);
return 0;
}

__weak void kvm_arch_resume(int usage_count)
{
+ if (usage_count)
+ hardware_enable_nolock(NULL);
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0ba370acff8d..5b8e8addd1e5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,9 +102,7 @@ EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
DEFINE_MUTEX(kvm_lock);
LIST_HEAD(vm_list);

-static cpumask_var_t cpus_hardware_enabled;
static int kvm_usage_count;
-static atomic_t hardware_enable_failed;

static struct kmem_cache *kvm_vcpu_cache;

@@ -142,8 +140,6 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
#define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \
.open = kvm_no_compat_open
#endif
-static int hardware_enable_all(void);
-static void hardware_disable_all(void);
static void kvm_del_vm(void);

static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
@@ -1190,10 +1186,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_arch_destroy_vm;

- r = hardware_enable_all();
- if (r)
- goto out_err_no_disable;
-
#ifdef CONFIG_HAVE_KVM_IRQFD
INIT_HLIST_HEAD(&kvm->irq_ack_notifier_list);
#endif
@@ -1202,14 +1194,18 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_mmu_notifier;

+ cpus_read_lock();
mutex_lock(&kvm_lock);
+ kvm_usage_count++;
r = kvm_arch_add_vm(kvm, kvm_usage_count);
if (r) {
+ /* the following kvm_del_vm() decrements kvm_usage_count. */
mutex_unlock(&kvm_lock);
goto out_err_mmu_notifier;
}
list_add(&kvm->vm_list, &vm_list);
mutex_unlock(&kvm_lock);
+ cpus_read_unlock();

preempt_notifier_inc();
kvm_init_pm_notifier(kvm);
@@ -1238,9 +1234,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
#endif
out_err_no_mmu_notifier:
- hardware_disable_all();
kvm_del_vm();
-out_err_no_disable:
kvm_arch_destroy_vm(kvm);
out_err_no_arch_destroy_vm:
WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
@@ -1318,7 +1312,6 @@ static void kvm_destroy_vm(struct kvm *kvm)
cleanup_srcu_struct(&kvm->srcu);
kvm_arch_free_vm(kvm);
preempt_notifier_dec();
- hardware_disable_all();
kvm_del_vm();
mmdrop(mm);
module_put(kvm_chardev_ops.owner);
@@ -4988,110 +4981,33 @@ static struct miscdevice kvm_dev = {
&kvm_chardev_ops,
};

-static void hardware_enable_nolock(void *junk)
-{
- int cpu = raw_smp_processor_id();
- int r;
-
- WARN_ON_ONCE(preemptible());
-
- if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
- return;
-
- cpumask_set_cpu(cpu, cpus_hardware_enabled);
-
- r = kvm_arch_hardware_enable();
-
- if (r) {
- cpumask_clear_cpu(cpu, cpus_hardware_enabled);
- atomic_inc(&hardware_enable_failed);
- pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
- }
-}
-
static int kvm_starting_cpu(unsigned int cpu)
{
mutex_lock(&kvm_lock);
- if (kvm_usage_count)
- hardware_enable_nolock(NULL);
(void)kvm_arch_online_cpu(cpu, kvm_usage_count);
mutex_unlock(&kvm_lock);
return 0;
}

-static void hardware_disable_nolock(void *junk)
-{
- int cpu = raw_smp_processor_id();
-
- WARN_ON_ONCE(preemptible());
-
- if (!cpumask_test_cpu(cpu, cpus_hardware_enabled))
- return;
- cpumask_clear_cpu(cpu, cpus_hardware_enabled);
- kvm_arch_hardware_disable();
-}
-
static int kvm_dying_cpu(unsigned int cpu)
{
mutex_lock(&kvm_lock);
- if (kvm_usage_count)
- hardware_disable_nolock(NULL);
(void)kvm_arch_offline_cpu(cpu, kvm_usage_count);
mutex_unlock(&kvm_lock);
return 0;
}

-static void hardware_disable_all_nolock(void)
-{
- BUG_ON(!kvm_usage_count);
-
- kvm_usage_count--;
- if (!kvm_usage_count)
- on_each_cpu(hardware_disable_nolock, NULL, 1);
-}
-
-static void hardware_disable_all(void)
-{
- cpus_read_lock();
- mutex_lock(&kvm_lock);
- hardware_disable_all_nolock();
- mutex_unlock(&kvm_lock);
- cpus_read_unlock();
-}
-
static void kvm_del_vm(void)
{
cpus_read_lock();
mutex_lock(&kvm_lock);
+ WARN_ON_ONCE(!kvm_usage_count);
+ kvm_usage_count--;
kvm_arch_del_vm(kvm_usage_count);
mutex_unlock(&kvm_lock);
cpus_read_unlock();
}

-static int hardware_enable_all(void)
-{
- int r = 0;
-
- cpus_read_lock();
- mutex_lock(&kvm_lock);
-
- kvm_usage_count++;
- if (kvm_usage_count == 1) {
- atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable_nolock, NULL, 1);
-
- if (atomic_read(&hardware_enable_failed)) {
- hardware_disable_all_nolock();
- r = -EBUSY;
- }
- }
-
- mutex_unlock(&kvm_lock);
- cpus_read_unlock();
-
- return r;
-}
-
static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
void *v)
{
@@ -5109,7 +5025,6 @@ static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
/* This hook is called without cpuhotplug disabled. */
cpus_read_lock();
mutex_lock(&kvm_lock);
- on_each_cpu(hardware_disable_nolock, NULL, 1);
r = kvm_arch_reboot(val);
mutex_unlock(&kvm_lock);
cpus_read_unlock();
@@ -5708,17 +5623,13 @@ static int kvm_suspend(void)
* locking.
*/
lockdep_assert_not_held(&kvm_lock);
- if (kvm_usage_count)
- hardware_disable_nolock(NULL);
return kvm_arch_suspend(kvm_usage_count);
}

static void kvm_resume(void)
{
- kvm_arch_resume(kvm_usage_count);
lockdep_assert_not_held(&kvm_lock);
- if (kvm_usage_count)
- hardware_enable_nolock(NULL);
+ kvm_arch_resume(kvm_usage_count);
}

static struct syscore_ops kvm_syscore_ops = {
@@ -5864,11 +5775,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
if (r)
goto out_irqfd;

- if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
- r = -ENOMEM;
- goto out_free_0;
- }
-
r = kvm_arch_hardware_setup(opaque);
if (r < 0)
goto out_free_1;
@@ -5947,8 +5853,6 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
out_free_2:
kvm_arch_hardware_unsetup();
out_free_1:
- free_cpumask_var(cpus_hardware_enabled);
-out_free_0:
kvm_irqfd_exit();
out_irqfd:
kvm_arch_exit();
@@ -5970,11 +5874,12 @@ void kvm_exit(void)
unregister_syscore_ops(&kvm_syscore_ops);
unregister_reboot_notifier(&kvm_reboot_notifier);
cpuhp_remove_state_nocalls(CPUHP_AP_KVM_STARTING);
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ cpus_read_lock();
+ kvm_arch_pre_hardware_unsetup();
kvm_arch_hardware_unsetup();
+ cpus_read_unlock();
kvm_arch_exit();
kvm_irqfd_exit();
- free_cpumask_var(cpus_hardware_enabled);
kvm_vfio_ops_exit();
}
EXPORT_SYMBOL_GPL(kvm_exit);
--
2.25.1

2022-08-20 06:16:25

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 01/18] KVM: x86: Drop kvm_user_return_msr_cpu_online()

From: Isaku Yamahata <[email protected]>

KVM/X86 uses user return notifier to switch MSR for guest or user space.
Snapshot host values on CPU online, change MSR values for guest, and
restore them on returning to user space. The current code abuses
kvm_arch_hardware_enable() which is called on kvm module initialization or
CPU online.

Remove such the abuse of kvm_arch_hardware_enable by capturing the host
value on the first change of the MSR value to guest VM instead of CPU
online.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 43 ++++++++++++++++++++++++-------------------
1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 205ebdc2b11b..16104a2f7d8e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -200,6 +200,7 @@ struct kvm_user_return_msrs {
struct kvm_user_return_msr_values {
u64 host;
u64 curr;
+ bool initialized;
} values[KVM_MAX_NR_USER_RETURN_MSRS];
};

@@ -363,6 +364,10 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
local_irq_restore(flags);
for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
values = &msrs->values[slot];
+ /*
+ * No need to check values->initialized because host = curr = 0
+ * by __GFP_ZERO when !values->initialized.
+ */
if (values->host != values->curr) {
wrmsrl(kvm_uret_msrs_list[slot], values->host);
values->curr = values->host;
@@ -409,34 +414,30 @@ int kvm_find_user_return_msr(u32 msr)
}
EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);

-static void kvm_user_return_msr_cpu_online(void)
-{
- unsigned int cpu = smp_processor_id();
- struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
- u64 value;
- int i;
-
- for (i = 0; i < kvm_nr_uret_msrs; ++i) {
- rdmsrl_safe(kvm_uret_msrs_list[i], &value);
- msrs->values[i].host = value;
- msrs->values[i].curr = value;
- }
-}
-
int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
{
unsigned int cpu = smp_processor_id();
struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+ struct kvm_user_return_msr_values *values = &msrs->values[slot];
int err;

- value = (value & mask) | (msrs->values[slot].host & ~mask);
- if (value == msrs->values[slot].curr)
+ if (unlikely(!values->initialized)) {
+ u64 host_value;
+
+ rdmsrl_safe(kvm_uret_msrs_list[slot], &host_value);
+ values->host = host_value;
+ values->curr = host_value;
+ values->initialized = true;
+ }
+
+ value = (value & mask) | (values->host & ~mask);
+ if (value == values->curr)
return 0;
err = wrmsrl_safe(kvm_uret_msrs_list[slot], value);
if (err)
return 1;

- msrs->values[slot].curr = value;
+ values->curr = value;
if (!msrs->registered) {
msrs->urn.on_user_return = kvm_on_user_return;
user_return_notifier_register(&msrs->urn);
@@ -9212,7 +9213,12 @@ int kvm_arch_init(void *opaque)
return -ENOMEM;
}

- user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
+ /*
+ * __GFP_ZERO to ensure user_return_msrs.values[].{host, curr} match.
+ * See kvm_on_user_return()
+ */
+ user_return_msrs = alloc_percpu_gfp(struct kvm_user_return_msrs,
+ GFP_KERNEL | __GFP_ZERO);
if (!user_return_msrs) {
printk(KERN_ERR "kvm: failed to allocate percpu kvm_user_return_msrs\n");
r = -ENOMEM;
@@ -11836,7 +11842,6 @@ int kvm_arch_hardware_enable(void)
u64 max_tsc = 0;
bool stable, backwards_tsc = false;

- kvm_user_return_msr_cpu_online();
ret = static_call(kvm_x86_hardware_enable)();
if (ret != 0)
return ret;
--
2.25.1

2022-08-20 06:16:29

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 12/18] KVM: Do processor compatibility check on cpu online and resume

From: Isaku Yamahata <[email protected]>

So far the processor compatibility check is not done for newly added CPU.
It should be done.

Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_arch.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 2ed8de0591c9..20971f43df95 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -99,9 +99,15 @@ __weak int kvm_arch_del_vm(int usage_count)

__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
- if (usage_count)
- return __hardware_enable();
- return 0;
+ int r;
+
+ if (!usage_count)
+ return 0;
+
+ r = kvm_arch_check_processor_compat();
+ if (r)
+ return r;
+ return __hardware_enable();
}

__weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
@@ -126,6 +132,10 @@ __weak int kvm_arch_suspend(int usage_count)

__weak void kvm_arch_resume(int usage_count)
{
- if (usage_count)
- (void)__hardware_enable();
+ if (!usage_count)
+ return;
+
+ if (kvm_arch_check_processor_compat())
+ return; /* FIXME: disable KVM */
+ (void)__hardware_enable();
}
--
2.25.1

2022-08-20 06:16:38

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 05/18] KVM: x86: Move TSC fixup logic to KVM arch resume callback

From: Isaku Yamahata <[email protected]>

commit 0dd6a6edb012 ("KVM: Dont mark TSC unstable due to S4 suspend") made
use of kvm_arch_hardware_enable() callback to detect that TSC goes backward
due to S4 suspend. It has to check it only when resuming from S4. Not
every time virtualization hardware ennoblement. Move the logic to
kvm_arch_resume() callback.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
---
arch/x86/kvm/x86.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7d5fff68befe..a7ab9984a236 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11831,18 +11831,30 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);

int kvm_arch_hardware_enable(void)
+{
+ return static_call(kvm_x86_hardware_enable)();
+}
+
+void kvm_arch_hardware_disable(void)
+{
+ static_call(kvm_x86_hardware_disable)();
+ drop_user_return_notifiers();
+}
+
+void kvm_arch_resume(int usage_count)
{
struct kvm *kvm;
struct kvm_vcpu *vcpu;
unsigned long i;
- int ret;
u64 local_tsc;
u64 max_tsc = 0;
bool stable, backwards_tsc = false;

- ret = static_call(kvm_x86_hardware_enable)();
- if (ret != 0)
- return ret;
+ if (!usage_count)
+ return;
+
+ if (kvm_arch_hardware_enable())
+ return;

local_tsc = rdtsc();
stable = !kvm_check_tsc_unstable();
@@ -11917,13 +11929,6 @@ int kvm_arch_hardware_enable(void)
}

}
- return 0;
-}
-
-void kvm_arch_hardware_disable(void)
-{
- static_call(kvm_x86_hardware_disable)();
- drop_user_return_notifiers();
}

static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
--
2.25.1

2022-08-20 06:17:26

by Isaku Yamahata

[permalink] [raw]
Subject: [RFC PATCH 09/18] KVM: kvm_arch.c: Remove a global variable, hardware_enable_failed

From: Isaku Yamahata <[email protected]>

A global variable hardware_enable_failed in kvm_arch.c is used only by
kvm_arch_add_vm() and hardware_enable(). It doesn't have to be a global
variable. Make it function local.

Signed-off-by: Isaku Yamahata <[email protected]>
---
virt/kvm/kvm_arch.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
index 8a5d88b02aab..2ed8de0591c9 100644
--- a/virt/kvm/kvm_arch.c
+++ b/virt/kvm/kvm_arch.c
@@ -13,14 +13,13 @@
#include <linux/kvm_host.h>

static cpumask_t cpus_hardware_enabled = CPU_MASK_NONE;
-static atomic_t hardware_enable_failed;

__weak int kvm_arch_post_init_vm(struct kvm *kvm)
{
return 0;
}

-static void hardware_enable(void *junk)
+static int __hardware_enable(void)
{
int cpu = raw_smp_processor_id();
int r;
@@ -28,17 +27,21 @@ static void hardware_enable(void *junk)
WARN_ON_ONCE(preemptible());

if (cpumask_test_cpu(cpu, &cpus_hardware_enabled))
- return;
-
- cpumask_set_cpu(cpu, &cpus_hardware_enabled);
-
+ return 0;
r = kvm_arch_hardware_enable();
-
- if (r) {
- cpumask_clear_cpu(cpu, &cpus_hardware_enabled);
- atomic_inc(&hardware_enable_failed);
+ if (r)
pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
- }
+ else
+ cpumask_set_cpu(cpu, &cpus_hardware_enabled);
+ return r;
+}
+
+static void hardware_enable(void *arg)
+{
+ atomic_t *failed = arg;
+
+ if (__hardware_enable())
+ atomic_inc(failed);
}

static void hardware_disable(void *junk)
@@ -64,15 +67,16 @@ __weak void kvm_arch_pre_hardware_unsetup(void)
*/
__weak int kvm_arch_add_vm(struct kvm *kvm, int usage_count)
{
+ atomic_t failed;
int r = 0;

if (usage_count != 1)
return 0;

- atomic_set(&hardware_enable_failed, 0);
- on_each_cpu(hardware_enable, NULL, 1);
+ atomic_set(&failed, 0);
+ on_each_cpu(hardware_enable, &failed, 1);

- if (atomic_read(&hardware_enable_failed)) {
+ if (atomic_read(&failed)) {
r = -EBUSY;
goto err;
}
@@ -96,7 +100,7 @@ __weak int kvm_arch_del_vm(int usage_count)
__weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
{
if (usage_count)
- hardware_enable(NULL);
+ return __hardware_enable();
return 0;
}

@@ -123,5 +127,5 @@ __weak int kvm_arch_suspend(int usage_count)
__weak void kvm_arch_resume(int usage_count)
{
if (usage_count)
- hardware_enable(NULL);
+ (void)__hardware_enable();
}
--
2.25.1

2022-08-23 02:42:30

by Chao Gao

[permalink] [raw]
Subject: Re: [RFC PATCH 03/18] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock

On Fri, Aug 19, 2022 at 11:00:09PM -0700, [email protected] wrote:
>From: Isaku Yamahata <[email protected]>
>
>Because kvm_count_lock unnecessarily complicates the KVM locking convention
>Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
>simplicity.
>
>Opportunistically add some comments on locking.
>
>Suggested-by: Sean Christopherson <[email protected]>
>Signed-off-by: Isaku Yamahata <[email protected]>
>---
> static cpumask_var_t cpus_hardware_enabled;
>@@ -4999,6 +4998,8 @@ static void hardware_enable_nolock(void *junk)
> int cpu = raw_smp_processor_id();
> int r;
>
>+ WARN_ON_ONCE(preemptible());
>+
> if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
> return;
>
>@@ -5015,10 +5016,10 @@ static void hardware_enable_nolock(void *junk)
>
> static int kvm_starting_cpu(unsigned int cpu)
> {
>- raw_spin_lock(&kvm_count_lock);
>+ mutex_lock(&kvm_lock);

kvm_starting_cpu() is called with interrupt disabled. So we cannot use
sleeping locks (e.g., mutex) here.

2022-08-23 08:03:31

by Chao Gao

[permalink] [raw]
Subject: Re: [RFC PATCH 12/18] KVM: Do processor compatibility check on cpu online and resume

On Fri, Aug 19, 2022 at 11:00:18PM -0700, [email protected] wrote:
>From: Isaku Yamahata <[email protected]>
>
>So far the processor compatibility check is not done for newly added CPU.
>It should be done.
>
>Signed-off-by: Isaku Yamahata <[email protected]>
>---
> virt/kvm/kvm_arch.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
>diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
>index 2ed8de0591c9..20971f43df95 100644
>--- a/virt/kvm/kvm_arch.c
>+++ b/virt/kvm/kvm_arch.c
>@@ -99,9 +99,15 @@ __weak int kvm_arch_del_vm(int usage_count)
>
> __weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
> {
>- if (usage_count)
>- return __hardware_enable();
>- return 0;
>+ int r;
>+
>+ if (!usage_count)
>+ return 0;
>+
>+ r = kvm_arch_check_processor_compat();
>+ if (r)
>+ return r;

I think kvm_arch_check_processor_compat() should be called even when
usage_count is 0. Otherwise, compatibility checks may be missing on some
CPUs if no VM is running when those CPUs becomes online.

>+ return __hardware_enable();
> }
>
> __weak int kvm_arch_offline_cpu(unsigned int cpu, int usage_count)
>@@ -126,6 +132,10 @@ __weak int kvm_arch_suspend(int usage_count)
>
> __weak void kvm_arch_resume(int usage_count)
> {
>- if (usage_count)
>- (void)__hardware_enable();
>+ if (!usage_count)
>+ return;
>+
>+ if (kvm_arch_check_processor_compat())
>+ return; /* FIXME: disable KVM */

Ditto.

>+ (void)__hardware_enable();
> }
>--
>2.25.1
>

2022-08-23 08:53:18

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [RFC PATCH 12/18] KVM: Do processor compatibility check on cpu online and resume

On Tue, Aug 23, 2022 at 03:50:09PM +0800,
Chao Gao <[email protected]> wrote:

> >diff --git a/virt/kvm/kvm_arch.c b/virt/kvm/kvm_arch.c
> >index 2ed8de0591c9..20971f43df95 100644
> >--- a/virt/kvm/kvm_arch.c
> >+++ b/virt/kvm/kvm_arch.c
> >@@ -99,9 +99,15 @@ __weak int kvm_arch_del_vm(int usage_count)
> >
> > __weak int kvm_arch_online_cpu(unsigned int cpu, int usage_count)
> > {
> >- if (usage_count)
> >- return __hardware_enable();
> >- return 0;
> >+ int r;
> >+
> >+ if (!usage_count)
> >+ return 0;
> >+
> >+ r = kvm_arch_check_processor_compat();
> >+ if (r)
> >+ return r;
>
> I think kvm_arch_check_processor_compat() should be called even when
> usage_count is 0. Otherwise, compatibility checks may be missing on some
> CPUs if no VM is running when those CPUs becomes online.

Oh, right. Compatibility check should be done unconditionally.
--
Isaku Yamahata <[email protected]>

2022-08-23 09:57:45

by Isaku Yamahata

[permalink] [raw]
Subject: Re: [RFC PATCH 03/18] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock

On Tue, Aug 23, 2022 at 10:26:09AM +0800,
Chao Gao <[email protected]> wrote:

> On Fri, Aug 19, 2022 at 11:00:09PM -0700, [email protected] wrote:
> >From: Isaku Yamahata <[email protected]>
> >
> >Because kvm_count_lock unnecessarily complicates the KVM locking convention
> >Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock for
> >simplicity.
> >
> >Opportunistically add some comments on locking.
> >
> >Suggested-by: Sean Christopherson <[email protected]>
> >Signed-off-by: Isaku Yamahata <[email protected]>
> >---
> > static cpumask_var_t cpus_hardware_enabled;
> >@@ -4999,6 +4998,8 @@ static void hardware_enable_nolock(void *junk)
> > int cpu = raw_smp_processor_id();
> > int r;
> >
> >+ WARN_ON_ONCE(preemptible());
> >+
> > if (cpumask_test_cpu(cpu, cpus_hardware_enabled))
> > return;
> >
> >@@ -5015,10 +5016,10 @@ static void hardware_enable_nolock(void *junk)
> >
> > static int kvm_starting_cpu(unsigned int cpu)
> > {
> >- raw_spin_lock(&kvm_count_lock);
> >+ mutex_lock(&kvm_lock);
>
> kvm_starting_cpu() is called with interrupt disabled. So we cannot use
> sleeping locks (e.g., mutex) here.


So your patch to move it to online section [1] is needed.
I thought I can pullin only some of your patches. But whole your patches are
needed.

[1] https://lore.kernel.org/lkml/20220317091539.GA7257@gao-cwp/T/#mcc0fd81e7a19601e7c3ce451582c516d38f977f6
--
Isaku Yamahata <[email protected]>