2017-04-06 20:21:20

by Radim Krčmář

Subject: [PATCH 0/6] KVM: towards maintainable kvm_make_all_cpus_request()

[1/6] makes a significant change for s390 and might be too dangerous
because of that.
I'm ok with returning 0 from s390's kvm_arch_vcpu_should_kick() until we
sort out architecture-specific kicks.

Adding kvm_vcpu_wake_up() in [6/6] is the reason why the other patches
were included.

Compile-tested on s390; lightly tested on x86, where I checked that the
kvm_*_request() uses are being optimized.


Radim Krčmář (6):
KVM: fix guest_mode optimization in kvm_make_all_cpus_request()
KVM: use kvm_{test,clear}_request instead of {test,clear}_bit
KVM: x86: use kvm_make_request instead of set_bit
KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
KVM: mark requests that do not need a wakeup
KVM: perform a wake_up in kvm_make_all_cpus_request

arch/arm/include/asm/kvm_host.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/mips/kvm/emulate.c | 2 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 4 ++--
arch/powerpc/kvm/powerpc.c | 2 +-
arch/s390/kvm/kvm-s390.c | 6 ++----
arch/x86/include/asm/kvm_host.h | 6 +++---
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 18 ++++++------------
include/linux/kvm_host.h | 22 +++++++++++++++++-----
virt/kvm/kvm_main.c | 7 ++++---
13 files changed, 41 insertions(+), 36 deletions(-)

--
2.12.0


2017-04-06 20:21:45

by Radim Krčmář

Subject: [PATCH 2/6] KVM: use kvm_{test,clear}_request instead of {test,clear}_bit

Users were expected to use kvm_check_request() for testing and clearing,
but requests have expanded in use since then, and some users want to
only test, or to do a faster clear.

Make sure that requests are not accessed directly with bit operations,
because a later patch in this series changes how request numbers are
encoded.

Signed-off-by: Radim Krčmář <[email protected]>
---
arch/mips/kvm/emulate.c | 2 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 4 ++--
arch/powerpc/kvm/powerpc.c | 2 +-
arch/s390/kvm/kvm-s390.c | 2 +-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 8 ++++----
include/linux/kvm_host.h | 14 ++++++++++++--
9 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/mips/kvm/emulate.c b/arch/mips/kvm/emulate.c
index d40cfaad4529..aee5ba5840af 100644
--- a/arch/mips/kvm/emulate.c
+++ b/arch/mips/kvm/emulate.c
@@ -865,7 +865,7 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
* check if any I/O interrupts are pending.
*/
if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
- clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_UNHALT, vcpu);
vcpu->run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
}
}
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d4dfc0ca2a44..d8ace42f34a6 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -349,7 +349,7 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
if (msr & MSR_POW) {
if (!vcpu->arch.pending_exceptions) {
kvm_vcpu_block(vcpu);
- clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_UNHALT, vcpu);
vcpu->stat.halt_wakeup++;

/* Unset POW bit after we woke up */
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index f102616febc7..bcbeeb62dd13 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -344,7 +344,7 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
case H_CEDE:
kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
kvm_vcpu_block(vcpu);
- clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_UNHALT, vcpu);
vcpu->stat.halt_wakeup++;
return EMULATE_DONE;
case H_LOGICAL_CI_LOAD:
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 0514cbd4e533..ab968f60d14c 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -579,7 +579,7 @@ static void arm_next_watchdog(struct kvm_vcpu *vcpu)
* userspace, so clear the KVM_REQ_WATCHDOG request.
*/
if ((vcpu->arch.tsr & (TSR_ENW | TSR_WIS)) != (TSR_ENW | TSR_WIS))
- clear_bit(KVM_REQ_WATCHDOG, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_WATCHDOG, vcpu);

spin_lock_irqsave(&vcpu->arch.wdt_lock, flags);
nr_jiffies = watchdog_next_timeout(vcpu);
@@ -690,7 +690,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
if (vcpu->arch.shared->msr & MSR_WE) {
local_irq_enable();
kvm_vcpu_block(vcpu);
- clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_UNHALT, vcpu);
hard_irq_disable();

kvmppc_set_exit_type(vcpu, EMULATED_MTMSRWE_EXITS);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 0e42aa8a279f..63bdf14d4389 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -232,7 +232,7 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
case EV_HCALL_TOKEN(EV_IDLE):
r = EV_SUCCESS;
kvm_vcpu_block(vcpu);
- clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_UNHALT, vcpu);
break;
default:
r = EV_UNIMPLEMENTED;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 45b6d9ca5d24..ead4b476cce9 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2444,7 +2444,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
}

/* nothing to do, just clear the request */
- clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_UNHALT, vcpu);

return 0;
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cfdb0d9389d1..24cb416aa84f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6317,7 +6317,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
if (intr_window_requested && vmx_interrupt_allowed(vcpu))
return handle_interrupt_window(&vmx->vcpu);

- if (test_bit(KVM_REQ_EVENT, &vcpu->requests))
+ if (kvm_test_request(KVM_REQ_EVENT, vcpu))
return 1;

err = emulate_instruction(vcpu, EMULTYPE_NO_REEXECUTE);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6bc47e2712c8..71a019832df9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1768,7 +1768,7 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)

/* guest entries allowed */
kvm_for_each_vcpu(i, vcpu, kvm)
- clear_bit(KVM_REQ_MCLOCK_INPROGRESS, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);

spin_unlock(&ka->pvclock_gtod_sync_lock);
#endif
@@ -7045,7 +7045,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
if (r <= 0)
break;

- clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_PENDING_TIMER, vcpu);
if (kvm_cpu_has_pending_timer(vcpu))
kvm_inject_pending_timer_irqs(vcpu);

@@ -7173,7 +7173,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
kvm_vcpu_block(vcpu);
kvm_apic_accept_events(vcpu);
- clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
+ kvm_clear_request(KVM_REQ_UNHALT, vcpu);
r = -EAGAIN;
goto out;
}
@@ -8383,7 +8383,7 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
if (atomic_read(&vcpu->arch.nmi_queued))
return true;

- if (test_bit(KVM_REQ_SMI, &vcpu->requests))
+ if (kvm_test_request(KVM_REQ_SMI, vcpu))
return true;

if (kvm_arch_interrupt_allowed(vcpu) &&
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e74ae4d99bb..51737d6401ae 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1079,10 +1079,20 @@ static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
set_bit(req, &vcpu->requests);
}

+static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
+{
+ return test_bit(req, &vcpu->requests);
+}
+
+static inline void kvm_clear_request(int req, struct kvm_vcpu *vcpu)
+{
+ clear_bit(req, &vcpu->requests);
+}
+
static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
{
- if (test_bit(req, &vcpu->requests)) {
- clear_bit(req, &vcpu->requests);
+ if (kvm_test_request(req, vcpu)) {
+ kvm_clear_request(req, vcpu);

/*
* Ensure the rest of the request is visible to kvm_check_request's
--
2.12.0

2017-04-06 20:22:05

by Radim Krčmář

Subject: [PATCH RFC 5/6] KVM: mark requests that do not need a wakeup

Some operations must ensure that the guest is not running with stale
data, but if the guest is halted, then the update can wait until another
event happens. kvm_make_all_cpus_request() currently doesn't wake up, so
we can mark all requests used with it.

The first 8 bits were arbitrarily reserved for request numbers.

Most uses of requests have the request type as a constant, so a compiler
will optimize the '&'.

An alternative would be to have an inline function that would return
whether the request needs a wake-up or not, but I like this one better
even though it might produce worse assembly.

Suggested-by: Christoffer Dall <[email protected]>
Signed-off-by: Radim Krčmář <[email protected]>
---
Btw. do you recall which macro allowed to define bitmasks? (It has
two arguments, FROM and TO.)

arch/arm/include/asm/kvm_host.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/x86/include/asm/kvm_host.h | 6 +++---
include/linux/kvm_host.h | 12 +++++++-----
4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index de67ce647501..49358f20d36f 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -44,7 +44,7 @@
#define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
#endif

-#define KVM_REQ_VCPU_EXIT 8
+#define KVM_REQ_VCPU_EXIT (8 | KVM_REQUEST_NO_WAKEUP)

u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
int __attribute_const__ kvm_target_cpu(void);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 522e4f60976e..1c9458a7ec92 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -41,7 +41,7 @@

#define KVM_VCPU_MAX_FEATURES 4

-#define KVM_REQ_VCPU_EXIT 8
+#define KVM_REQ_VCPU_EXIT (8 | KVM_REQUEST_NO_WAKEUP)

int __attribute_const__ kvm_target_cpu(void);
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d962fa998a6f..901fc1ab1206 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -61,10 +61,10 @@
#define KVM_REQ_PMI 19
#define KVM_REQ_SMI 20
#define KVM_REQ_MASTERCLOCK_UPDATE 21
-#define KVM_REQ_MCLOCK_INPROGRESS 22
-#define KVM_REQ_SCAN_IOAPIC 23
+#define KVM_REQ_MCLOCK_INPROGRESS (22 | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_SCAN_IOAPIC (23 | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_GLOBAL_CLOCK_UPDATE 24
-#define KVM_REQ_APIC_PAGE_RELOAD 25
+#define KVM_REQ_APIC_PAGE_RELOAD (25 | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_CRASH 26
#define KVM_REQ_IOAPIC_EOI_EXIT 27
#define KVM_REQ_HV_RESET 28
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 51737d6401ae..2b706704bd92 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -115,12 +115,14 @@ static inline bool is_error_page(struct page *page)
return IS_ERR(page);
}

+#define KVM_REQUEST_MASK 0xff
+#define KVM_REQUEST_NO_WAKEUP BIT(8)
/*
* Architecture-independent vcpu->requests bit members
* Bits 4-7 are reserved for more arch-independent bits.
*/
-#define KVM_REQ_TLB_FLUSH 0
-#define KVM_REQ_MMU_RELOAD 1
+#define KVM_REQ_TLB_FLUSH (0 | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_MMU_RELOAD (1 | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_PENDING_TIMER 2
#define KVM_REQ_UNHALT 3

@@ -1076,17 +1078,17 @@ static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
* caller. Paired with the smp_mb__after_atomic in kvm_check_request.
*/
smp_wmb();
- set_bit(req, &vcpu->requests);
+ set_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
}

static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
{
- return test_bit(req, &vcpu->requests);
+ return test_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
}

static inline void kvm_clear_request(int req, struct kvm_vcpu *vcpu)
{
- clear_bit(req, &vcpu->requests);
+ clear_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
}

static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
--
2.12.0

2017-04-06 20:22:45

by Radim Krčmář

Subject: [PATCH 3/6] KVM: x86: use kvm_make_request instead of set_bit

Signed-off-by: Radim Krčmář <[email protected]>
---
arch/x86/kvm/x86.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 71a019832df9..57e9989232e5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2229,8 +2229,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
bool tmp = (msr == MSR_KVM_SYSTEM_TIME);

if (ka->boot_vcpu_runs_old_kvmclock != tmp)
- set_bit(KVM_REQ_MASTERCLOCK_UPDATE,
- &vcpu->requests);
+ kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);

ka->boot_vcpu_runs_old_kvmclock = tmp;
}
@@ -2803,11 +2802,6 @@ static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
return kvm_arch_has_noncoherent_dma(vcpu->kvm);
}

-static inline void kvm_migrate_timers(struct kvm_vcpu *vcpu)
-{
- set_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests);
-}
-
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
/* Address WBINVD may be executed by guest */
@@ -2851,7 +2845,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
if (vcpu->cpu != cpu)
- kvm_migrate_timers(vcpu);
+ kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
vcpu->cpu = cpu;
}

--
2.12.0

2017-04-06 20:21:56

by Radim Krčmář

Subject: [PATCH 6/6] KVM: perform a wake_up in kvm_make_all_cpus_request

We want kvm_make_all_cpus_request() to be an optimized version of

kvm_for_each_vcpu(i, vcpu, kvm) {
kvm_make_request(request, vcpu);
kvm_vcpu_kick(vcpu);
}

and kvm_vcpu_kick() wakes up the target vcpu. We know which requests do
not need the wake-up and use that knowledge to optimize the loop.

Thanks to that, this patch doesn't change the behavior of current users
(they all don't need the wake-up) and only prepares for a future where
the wake-up is going to be needed.

I think that most requests do not need the wake-up, so we may want to
flip the meaning of the bit later.

Signed-off-by: Radim Krčmář <[email protected]>
---
virt/kvm/kvm_main.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a486c6ad27a6..1db503bab3dc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -186,6 +186,9 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
/* Set ->requests bit before we read ->mode. */
smp_mb__after_atomic();

+ if (!(req & KVM_REQUEST_NO_WAKEUP))
+ kvm_vcpu_wake_up(vcpu);
+
if (cpus != NULL && cpu != -1 && cpu != me &&
kvm_arch_vcpu_should_kick(vcpu))
cpumask_set_cpu(cpu, cpus);
--
2.12.0

2017-04-06 20:22:34

by Radim Krčmář

Subject: [PATCH 4/6] KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up

The #ifndef was protecting a missing halt_wakeup stat, but that is no
longer necessary.

Signed-off-by: Radim Krčmář <[email protected]>
---
virt/kvm/kvm_main.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2389e9c41cd2..a486c6ad27a6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2195,7 +2195,6 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_block);

-#ifndef CONFIG_S390
void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
{
struct swait_queue_head *wqp;
@@ -2225,7 +2224,6 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
put_cpu();
}
EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
-#endif /* !CONFIG_S390 */

int kvm_vcpu_yield_to(struct kvm_vcpu *target)
{
--
2.12.0

2017-04-06 20:21:33

by Radim Krčmář

Subject: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()

We have kvm_arch_vcpu_should_kick() to decide whether the target cpu
needs to be kicked. The previous condition was wrong, because
architectures that don't use vcpu->mode would not get interrupts, and
also suboptimal, because it sent IPIs in cases where none were
necessary.

The situation is even more convoluted. MIPS and POWERPC return 1 from
kvm_arch_vcpu_should_kick(), but implement vcpu->mode for some reason,
so now they might kick uselessly. This is not a huge problem.

s390, on the other hand, never changed vcpu->mode, so it would always be
OUTSIDE_GUEST_MODE before and therefore didn't send IPIs.
I don't see a reason why s390 had a kvm_make_all_cpus_request() that
did nothing but set the bit in vcpu->requests, so the new behavior seems
better.

Signed-off-by: Radim Krčmář <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 4 +---
virt/kvm/kvm_main.c | 2 +-
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index fd6cd05bb6a7..45b6d9ca5d24 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2130,9 +2130,7 @@ static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,

int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
- /* kvm common code refers to this, but never calls it */
- BUG();
- return 0;
+ return 1;
}

static int kvm_arch_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f489167839c4..2389e9c41cd2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -187,7 +187,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
smp_mb__after_atomic();

if (cpus != NULL && cpu != -1 && cpu != me &&
- kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
+ kvm_arch_vcpu_should_kick(vcpu))
cpumask_set_cpu(cpu, cpus);
}
if (unlikely(cpus == NULL))
--
2.12.0

2017-04-06 21:02:20

by James Hogan

Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()

On Thu, Apr 06, 2017 at 10:20:51PM +0200, Radim Krčmář wrote:
> We have kvm_arch_vcpu_should_kick() to decide whether the target cpu
> needs to be kicked. The previous condition was wrong, because
> architectures that don't use vcpu->mode would not get interrupts and
> also suboptimal, because it sent IPI in cases where none was necessary.
>
> The situation is even more convoluted. MIPS and POWERPC return 1 from
> kvm_arch_vcpu_should_kick(), but implement vcpu->mode for some reason,
> so now they might kick uselessly. This is not a huge problem.

Whoops. I hadn't spotted kvm_arch_vcpu_should_kick() when I added the
vcpu->mode stuff in 4.11... I'm guessing I need to implement it
similarly to ARM / x86... though MIPS doesn't use kvm_vcpu_kick() yet.

>
> s390, on the other hand, never changed vcpu->mode, so it would always be
> OUTSIDE_GUEST_MODE before and therefore didn't send IPIs.
> I don't see a reason why s390 had kvm_make_all_cpus_request() that did
> nothing but set the bit in vcpu->request, so the new behavior seems
> better.
>
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> arch/s390/kvm/kvm-s390.c | 4 +---
> virt/kvm/kvm_main.c | 2 +-
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fd6cd05bb6a7..45b6d9ca5d24 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2130,9 +2130,7 @@ static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
>
> int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> {
> - /* kvm common code refers to this, but never calls it */
> - BUG();
> - return 0;
> + return 1;
> }
>
> static int kvm_arch_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f489167839c4..2389e9c41cd2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -187,7 +187,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> smp_mb__after_atomic();
>
> if (cpus != NULL && cpu != -1 && cpu != me &&
> - kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
> + kvm_arch_vcpu_should_kick(vcpu))

This presumably changes the behaviour on x86 from != OUTSIDE_GUEST_MODE
to == IN_GUEST_MODE, so:
- you'll no longer get IPIs if it's in READING_SHADOW_PAGE_TABLES (which
MIPS also now uses when accessing mappings outside of guest mode and
depends upon to wait until the old mappings are no longer in use).
- you'll no longer get IPIs if it's in EXITING_GUEST_MODE (i.e. if you
get two of these in quick succession, only the first will wait for the
IPI, which might work as long as they're already serialised, but it
still feels wrong).

But it doesn't seem right to change the kvm_arch_vcpu_should_kick()
implementations to check kvm_vcpu_exiting_guest_mode(vcpu) !=
OUTSIDE_GUEST_MODE to match the condition either, since kvm_vcpu_kick()
doesn't seem to need synchronisation, only to know that it won't be
delayed in reaching hypervisor code.

Cheers
James


> cpumask_set_cpu(cpu, cpus);
> }
> if (unlikely(cpus == NULL))
> --
> 2.12.0
>



2017-04-07 08:19:13

by David Hildenbrand

Subject: Re: [PATCH 3/6] KVM: x86: use kvm_make_request instead of set_bit

On 06.04.2017 22:20, Radim Krčmář wrote:
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> arch/x86/kvm/x86.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 71a019832df9..57e9989232e5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2229,8 +2229,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> bool tmp = (msr == MSR_KVM_SYSTEM_TIME);
>
> if (ka->boot_vcpu_runs_old_kvmclock != tmp)
> - set_bit(KVM_REQ_MASTERCLOCK_UPDATE,
> - &vcpu->requests);
> + kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
>
> ka->boot_vcpu_runs_old_kvmclock = tmp;
> }
> @@ -2803,11 +2802,6 @@ static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
> return kvm_arch_has_noncoherent_dma(vcpu->kvm);
> }
>
> -static inline void kvm_migrate_timers(struct kvm_vcpu *vcpu)
> -{
> - set_bit(KVM_REQ_MIGRATE_TIMER, &vcpu->requests);
> -}
> -
> void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> {
> /* Address WBINVD may be executed by guest */
> @@ -2851,7 +2845,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
> kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> if (vcpu->cpu != cpu)
> - kvm_migrate_timers(vcpu);
> + kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
> vcpu->cpu = cpu;
> }
>
>

Reviewed-by: David Hildenbrand <[email protected]>

--

Thanks,

David

2017-04-07 08:27:09

by Marc Zyngier

Subject: Re: [PATCH RFC 5/6] KVM: mark requests that do not need a wakeup

On 06/04/17 21:20, Radim Krčmář wrote:
> Some operations must ensure that the guest is not running with stale
> data, but if the guest is halted, then the update can wait until another
> event happens. kvm_make_all_requests() currently doesn't wake up, so we
> can mark all requests used with it.
>
> First 8 bits were arbitrarily reserved for request numbers.
>
> Most uses of requests have the request type as a constant, so a compiler
> will optimize the '&'.
>
> An alternative would be to have an inline function that would return
> whether the request needs a wake-up or not, but I like this one better
> even though it might produce worse assembly.
>
> Suggested-by: Christoffer Dall <[email protected]>
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> Btw. do you recall which macro allowed to define bitmasks? (It has
> two arguments, FROM and TO.)

GENMASK (and its _ULL variant), defined in include/linux/bitops.h.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2017-04-07 10:47:53

by Christian Borntraeger

Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()

On 04/06/2017 10:20 PM, Radim Krčmář wrote:
> We have kvm_arch_vcpu_should_kick() to decide whether the target cpu
> needs to be kicked. The previous condition was wrong, because
> architectures that don't use vcpu->mode would not get interrupts and
> also suboptimal, because it sent IPI in cases where none was necessary.
>
> The situation is even more convoluted. MIPS and POWERPC return 1 from
> kvm_arch_vcpu_should_kick(), but implement vcpu->mode for some reason,
> so now they might kick uselessly. This is not a huge problem.
>
> s390, on the other hand, never changed vcpu->mode, so it would always be
> OUTSIDE_GUEST_MODE before and therefore didn't send IPIs.
> I don't see a reason why s390 had kvm_make_all_cpus_request() that did
> nothing but set the bit in vcpu->request, so the new behavior seems
> better.

As nobody on s390 ever called kvm_make_all_cpus_request(), this patch
should be fine for s390. But even if somebody started calling
kvm_make_all_cpus_request(), it should do what we want (as long as we do
not need the special "make sure to be really out of guest" handling).

Not sure about the x86 things that James mentioned.
>
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> arch/s390/kvm/kvm-s390.c | 4 +---
> virt/kvm/kvm_main.c | 2 +-
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fd6cd05bb6a7..45b6d9ca5d24 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2130,9 +2130,7 @@ static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
>
> int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> {
> - /* kvm common code refers to this, but never calls it */
> - BUG();
> - return 0;
> + return 1;
> }
>
> static int kvm_arch_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f489167839c4..2389e9c41cd2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -187,7 +187,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> smp_mb__after_atomic();
>
> if (cpus != NULL && cpu != -1 && cpu != me &&
> - kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
> + kvm_arch_vcpu_should_kick(vcpu))
> cpumask_set_cpu(cpu, cpus);
> }
> if (unlikely(cpus == NULL))
>

2017-04-07 10:55:35

by Christian Borntraeger

Subject: Re: [PATCH 2/6] KVM: use kvm_{test,clear}_request instead of {test,clear}_bit

On 04/06/2017 10:20 PM, Radim Krčmář wrote:
> Users were expected to use kvm_check_request() for testing and clearing,
> but request have expanded their use since then and some users want to
> only test or do a faster clear.
>
> Make sure that requests are not directly accessed with bit operations, because
> we'll be clearing them later.
>
> Signed-off-by: Radim Krčmář <[email protected]>

Patch itself looks sane
Reviewed-by: Christian Borntraeger <[email protected]>


one question:
> static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
> {
> - if (test_bit(req, &vcpu->requests)) {
> - clear_bit(req, &vcpu->requests);
> + if (kvm_test_request(req, vcpu)) {
> + kvm_clear_request(req, vcpu);

This looks fine. I am just asking myself why we do not use
test_and_clear_bit? Do we expect gcc to merge all the test bits into a
fast path? That does not seem to work as far as I can tell, and almost
everybody implements a fast path like in



arch/s390/kvm/kvm-s390.c:
if (!vcpu->requests)
return 0;

arch/x86/kvm/x86.c:
if (vcpu->requests) {

>
> /*
> * Ensure the rest of the request is visible to kvm_check_request's
>

2017-04-07 11:01:36

by Christian Borntraeger

Subject: Re: [PATCH 4/6] KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up

On 04/06/2017 10:20 PM, Radim Krčmář wrote:
> The #ifndef was protecting a missing halt_wakeup stat, but that is no
> longer necessary.


Acked-by: Christian Borntraeger <[email protected]>
>
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> virt/kvm/kvm_main.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2389e9c41cd2..a486c6ad27a6 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2195,7 +2195,6 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> }
> EXPORT_SYMBOL_GPL(kvm_vcpu_block);
>
> -#ifndef CONFIG_S390
> void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
> {
> struct swait_queue_head *wqp;
> @@ -2225,7 +2224,6 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> put_cpu();
> }
> EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
> -#endif /* !CONFIG_S390 */
>
> int kvm_vcpu_yield_to(struct kvm_vcpu *target)
> {
>

2017-04-07 12:24:26

by Radim Krčmář

Subject: Re: [PATCH 2/6] KVM: use kvm_{test,clear}_request instead of {test,clear}_bit

2017-04-07 12:55+0200, Christian Borntraeger:
> On 04/06/2017 10:20 PM, Radim Krčmář wrote:
>> static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
>> {
>> - if (test_bit(req, &vcpu->requests)) {
>> - clear_bit(req, &vcpu->requests);
>> + if (kvm_test_request(req, vcpu)) {
>> + kvm_clear_request(req, vcpu);
>
> This looks fine. I am just asking myself why we do not use
> test_and_clear_bit? Do we expect gcc to merge all test bits as
> a fast path? This does not seem to work as far as I can tell and
> almost everybody does a fast path like in

test_and_clear_bit() is a slower operation even if the test is false (at
least on x86), because it needs to be fully atomic.

> arch/s390/kvm/kvm-s390.c:
> if (!vcpu->requests)
> return 0;
>
> arch/x86/kvm/x86.c:
> if (vcpu->requests) {

We'll mostly have only one request set, so splitting the test_and_clear
improves the performance of many subsequent test_and_clear()s even if
the compiler doesn't optimize.

GCC couldn't optimize even if we used test_and_clear_bit(), because that
instruction adds barriers. The forward check for vcpu->requests is there
because we do not trust the optimizer to do it for us, and it makes a
big difference.

2017-04-07 12:29:24

by Radim Krčmář

Subject: Re: [PATCH RFC 5/6] KVM: mark requests that do not need a wakeup

2017-04-07 09:27+0100, Marc Zyngier:
> On 06/04/17 21:20, Radim Krčmář wrote:
>> Some operations must ensure that the guest is not running with stale
>> data, but if the guest is halted, then the update can wait until another
>> event happens. kvm_make_all_requests() currently doesn't wake up, so we
>> can mark all requests used with it.
>>
>> First 8 bits were arbitrarily reserved for request numbers.
>>
>> Most uses of requests have the request type as a constant, so a compiler
>> will optimize the '&'.
>>
>> An alternative would be to have an inline function that would return
>> whether the request needs a wake-up or not, but I like this one better
>> even though it might produce worse assembly.
>>
>> Suggested-by: Christoffer Dall <[email protected]>
>> Signed-off-by: Radim Krčmář <[email protected]>
>> ---
>> Btw. do you recall which macro allowed to define bitmasks? (It has
>> two arguments, FROM and TO.)
>
> GENMASK (and its _ULL variant), defined in include/linux/bitops.h.

Thank you, it is under BIT() ... I am blind.

> +#define KVM_REQUEST_MASK 0xff

The 0xff should be "GENMASK(7,0)".

The first 8 bits are plenty, and the masking should be fast even if the
compiler doesn't optimize it, because the request is not always a
constant.

2017-04-07 14:05:59

by Radim Krčmář

Subject: Re: [PATCH 2/6] KVM: use kvm_{test,clear}_request instead of {test,clear}_bit

2017-04-07 14:24+0200, Radim Krčmář:
> 2017-04-07 12:55+0200, Christian Borntraeger:
>> On 04/06/2017 10:20 PM, Radim Krčmář wrote:
>>> static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
>>> {
>>> - if (test_bit(req, &vcpu->requests)) {
>>> - clear_bit(req, &vcpu->requests);
>>> + if (kvm_test_request(req, vcpu)) {
>>> + kvm_clear_request(req, vcpu);
>>
>> This looks fine. I am just asking myself why we do not use
>> test_and_clear_bit? Do we expect gcc to merge all test bits as
>> a fast path? This does not seem to work as far as I can tell and
>> almost everybody does a fast path like in
>
> test_and_clear_bit() is a slower operation even if the test is false (at
> least on x86), because it needs to be fully atomic.
>
>> arch/s390/kvm/kvm-s390.c:
>> if (!vcpu->requests)
>> return 0;
>>
>> arch/x86/kvm/x86.c:
>> if (vcpu->requests) {
>
> We'll mostly have only one request set, so splitting the test_and_clear
> improves the performance of many subsequent tests_and_clear()s even if
> the compiler doesn't optimize.
>
> GCC couldn't even optimize if we used test_and_clear_bit(), because that
> instruction adds barriers, but the forward check for vcpu->requests is
> there because we do not trust the optimizer to do it for us and it would
> make a big difference.

Ugh, I started thinking that bitops are not atomic because I looked at
the wrong boot/bitops.h by mistake. The compiler cannot merge test_bit()s,
but the speed difference holds.

2017-04-10 11:14:34

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH 6/6] KVM: perform a wake_up in kvm_make_all_cpus_request

On Thu, Apr 06, 2017 at 10:20:56PM +0200, Radim Krčmář wrote:
> We want to have kvm_make_all_cpus_request() to be an optimized version of
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> kvm_make_request(vcpu, request);
> kvm_vcpu_kick(vcpu);
> }
>
> and kvm_vcpu_kick() wakes up the target vcpu. We know which requests do
> not need the wake up and use it to optimize the loop.

Any reason we don't want kvm_vcpu_kick() to also get the
if (!(req & KVM_REQUEST_NO_WAKEUP)) optimization condition? I did some
grepping, and don't see any kicks of the requests that have been marked as
NO_WAKEUP, so nothing should change by adding it now. But the consistency
would be nice for the doc I'm writing.

Also, the condition in kvm_vcpu_kick() looks like overkill

cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)

How could vcpu->cpu ever be any offline/invalid cpu, other than -1? The
condition in kvm_make_all_cpus_request() makes more sense to me

cpu != -1 && cpu != me

I guess a lot of this stuff is planned for a larger requests rework, when
kicks get integrated with requests? I'm a bit anxious, though, as it
changes how I document stuff now, and even how I approach the ARM series.
For example, if kvm_make_request() already integrated kvm_vcpu_kick(),
which means also adding the smp_mb__after_atomic(), like
kvm_make_all_cpus_request() has, then I wouldn't need to add the smp_mb()
to kvm_arch_vcpu_should_kick().

Thanks,
drew

>
> Thanks to that, this patch doesn't change the behavior of current users
> (they all don't need the wake up) and only prepares for a future where the
> wake up is going to be needed.
>
> I think that most requests do not need the wake up, so we would flip the
> bit then.
>
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> virt/kvm/kvm_main.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a486c6ad27a6..1db503bab3dc 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -186,6 +186,9 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> /* Set ->requests bit before we read ->mode. */
> smp_mb__after_atomic();
>
> + if (!(req & KVM_REQUEST_NO_WAKEUP))
> + kvm_vcpu_wake_up(vcpu);
> +
> if (cpus != NULL && cpu != -1 && cpu != me &&
> kvm_arch_vcpu_should_kick(vcpu))
> cpumask_set_cpu(cpu, cpus);
> --
> 2.12.0
>

2017-04-10 15:59:51

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()

On Thu, Apr 06, 2017 at 10:02:15PM +0100, James Hogan wrote:
> On Thu, Apr 06, 2017 at 10:20:51PM +0200, Radim Krčmář wrote:
> > We have kvm_arch_vcpu_should_kick() to decide whether the target cpu
> > needs to be kicked. The previous condition was wrong, because
> > architectures that don't use vcpu->mode would not get interrupts; it was
> > also suboptimal, because it sent IPIs in cases where none were necessary.
> >
> > The situation is even more convoluted. MIPS and POWERPC return 1 from
> > kvm_arch_vcpu_should_kick(), but implement vcpu->mode for some reason,
> > so now they might kick uselessly. This is not a huge problem.
>
> Whoops. I hadn't spotted kvm_arch_vcpu_should_kick() when I added
> vcpu->mode stuff in 4.11... I'm guessing I need to implement that
> similar to ARM / x86... though MIPS doesn't use kvm_vcpu_kick() yet.
>
> >
> > s390, on the other hand, never changed vcpu->mode, so it would always be
> > OUTSIDE_GUEST_MODE before and therefore didn't send IPIs.
> > I don't see a reason why s390 had kvm_make_all_cpus_request() that did
> > nothing but set the bit in vcpu->request, so the new behavior seems
> > better.
> >
> > Signed-off-by: Radim Krčmář <[email protected]>
> > ---
> > arch/s390/kvm/kvm-s390.c | 4 +---
> > virt/kvm/kvm_main.c | 2 +-
> > 2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index fd6cd05bb6a7..45b6d9ca5d24 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -2130,9 +2130,7 @@ static void kvm_gmap_notifier(struct gmap *gmap, unsigned long start,
> >
> > int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> > {
> > - /* kvm common code refers to this, but never calls it */
> > - BUG();
> > - return 0;
> > + return 1;
> > }
> >
> > static int kvm_arch_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu,
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index f489167839c4..2389e9c41cd2 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -187,7 +187,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> > smp_mb__after_atomic();
> >
> > if (cpus != NULL && cpu != -1 && cpu != me &&
> > - kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
> > + kvm_arch_vcpu_should_kick(vcpu))

Hi James,

I'm actually thinking we should do away with kvm_arch_vcpu_should_kick(),
putting the x86 implementation of it directly in the common code. The
reason is that, while there are currently two implementations for the
function, the x86 one and 'return true', I don't think 'return true'
is correct. 'return true' doesn't consider whether or not the target VCPU
has interrupts disabled, and I don't see how sending the IPI when the
vcpu is not in guest mode, or has disabled interrupts prior to entering
guest mode, makes sense. To consider whether the target vcpu has
interrupts disabled we need to require it to tell us. Requiring the
setting of IN_GUEST_MODE after calling local_irq_disable() is how x86
does it, and seems like it should work for all architectures. So, can you
help me better understand the concerns you have below?

>
> This presumably changes the behaviour on x86, from != OUTSIDE_GUEST_MODE
> to == IN_GUEST_MODE. so:
> - you'll no longer get IPIs if its in READING_SHADOW_PAGE_TABLES (which
> MIPS also now uses when accessing mappings outside of guest mode and
> depends upon to wait until the old mappings are no longer in use).

But as long as the kicks were due to vcpu requests, then, since the VCPU
should check requests again before reentering guest mode, it'll still
handle them. I see the comment under the setting of
READING_SHADOW_PAGE_TABLES in arch/mips/kvm/trap_emul.c refers to TLB
flush requests, so that one should be OK. Are there other kicks that
are request-less to be concerned with?

> - you'll no longer get IPIs if its in EXITING_GUEST_MODE (i.e. if you
> get two of these in quick succession only the first will wait for the
> IPI, which might work as long as they're already serialised but it
> still feels wrong).

Can you elaborate on this one? My understanding is that there should
be no harm in coalescing these IPIs.

>
> But it doesn't seem right to change the kvm_arch_vcpu_should_kick()
> implementations to check kvm_vcpu_exiting_guest_mode(vcpu) !=
> OUTSIDE_GUEST_MODE to match condition either, since kvm_vcpu_kick()
> doesn't seem to need synchronisation, only to know that it won't be
> delayed in reaching hypervisor code.

I can't think of a scenario that requires sending an IPI to a vcpu
outside guest mode, particularly because if it's the only runnable
task on the cpu then nothing will happen anyway.

Thanks,
drew

2017-04-11 09:14:34

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()



On 07/04/2017 05:02, James Hogan wrote:
> This presumably changes the behaviour on x86, from != OUTSIDE_GUEST_MODE
> to == IN_GUEST_MODE. so:
> - you'll no longer get IPIs if its in READING_SHADOW_PAGE_TABLES (which
> MIPS also now uses when accessing mappings outside of guest mode and
> depends upon to wait until the old mappings are no longer in use).

This is wrong, the purpose of READING_SHADOW_PAGE_TABLES is "kvm_flush_remote_tlbs
should send me an IPI, because I want to stop kvm_flush_remote_tlbs until I'm done
reading the page tables".

> - you'll no longer get IPIs if its in EXITING_GUEST_MODE (i.e. if you
> get two of these in quick succession only the first will wait for the
> IPI, which might work as long as they're already serialised but it
> still feels wrong).

But this is okay---avoiding multiple IPIs is the exact purpose of
EXITING_GUEST_MODE.

There are evidently multiple uses of kvm_make_all_cpus_request, and we
should avoid smp_call_function_many(..., true) if possible. So perhaps:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a17d78759727..20e3bd60bdda 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -169,7 +169,7 @@ static void ack_flush(void *_completed)
{
}

-bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
+bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req, bool wait)
{
int i, cpu, me;
cpumask_var_t cpus;
@@ -182,18 +182,19 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
kvm_for_each_vcpu(i, vcpu, kvm) {
kvm_make_request(req, vcpu);
cpu = vcpu->cpu;
+ if (cpus == NULL || cpu == -1 || cpu == me)
+ continue;

/* Set ->requests bit before we read ->mode. */
smp_mb__after_atomic();
-
- if (cpus != NULL && cpu != -1 && cpu != me &&
- kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
+ if (kvm_arch_vcpu_should_kick(vcpu) ||
+ (wait && vcpu->mode != OUTSIDE_GUEST_MODE))
cpumask_set_cpu(cpu, cpus);
}
if (unlikely(cpus == NULL))
- smp_call_function_many(cpu_online_mask, ack_flush, NULL, 1);
+ smp_call_function_many(cpu_online_mask, ack_flush, NULL, wait);
else if (!cpumask_empty(cpus))
- smp_call_function_many(cpus, ack_flush, NULL, 1);
+ smp_call_function_many(cpus, ack_flush, NULL, wait);
else
called = false;
put_cpu();
@@ -221,7 +222,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
* kvm_make_all_cpus_request() reads vcpu->mode. We reuse that
* barrier here.
*/
- if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
+ if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH, true))
++kvm->stat.remote_tlb_flush;
cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
}
@@ -230,7 +231,7 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);

void kvm_reload_remote_mmus(struct kvm *kvm)
{
- kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+ /* FIXME, is wait=true really needed? */
+ kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD, true);
}

int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)


Other users do not need wait=false.

Or another idea is to embed wait in the request number, as suggested in the
ARM thread, so that for example:

- bits 0-4 = bit number in vcpu->requests

- bit 8 = wait when making request

- bit 9 = kick after making request


Responding to Andrew, I agree that "we should do away with
kvm_arch_vcpu_should_kick(), putting the x86 implementation of it
directly in the common code" (inlining kvm_vcpu_exiting_guest_mode,
I may add). However, kvm_arch_vcpu_should_kick is just an optimization,
it's not a bug not to use it. So let's first iron out
kvm_make_all_cpus_request.

Paolo

2017-04-11 09:15:01

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 6/6] KVM: perform a wake_up in kvm_make_all_cpus_request



On 10/04/2017 19:14, Andrew Jones wrote:
> Any reason we don't want kvm_vcpu_kick() to also get the
> if (!(req & KVM_REQUEST_NO_WAKEUP)) optimization condition?

Because what we want is kvm_make_request to do the kick instead,
"if (!(req & KVM_REQUEST_NO_WAKEUP))", I think.

> I did some
> grepping, and don't see any kicks of the requests that have been marked as
> NO_WAKEUP, so nothing should change by adding it now. But the consistency
> would be nice for the doc I'm writing.
>
> Also, the condition in kvm_vcpu_kick() looks like overkill
>
> cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)
>
> How could vcpu->cpu ever be any offline/invalid cpu, other than -1? The
> condition in kvm_make_all_cpus_request() makes more sense to me
>
> cpu != -1 && cpu != me
>
> I guess a lot of this stuff is planned for a larger requests rework, when
> kicks get integrated with requests?

Yes, this is more or less what I meant above.

> I'm a bit anxious, though, as it
> changes how I document stuff now, and even how I approach the ARM series.
> For example, if kvm_make_request() already integrated kvm_vcpu_kick(),
> which means also adding the smp_mb__after_atomic(), like
> kvm_make_all_cpus_request() has, then I wouldn't need to add the smp_mb()
> to kvm_arch_vcpu_should_kick().

kvm_arch_vcpu_should_kick() does cmpxchg, which already includes a
memory barrier when it succeeds, so you need not add smp_mb() there.
And indeed by integrating kicks and requests we know that all callers of
kvm_arch_vcpu_should_kick() already do an atomic +
smp_mb__after_atomic(), so there's even less reason to worry about
memory barriers.

kvm_arch_vcpu_should_kick() could then use cmpxchg_relaxed if it helps
ARM, and you could even split the loop in two to limit the number of
memory barriers:

kvm_for_each_vcpu(i, vcpu, kvm) {
set_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
smp_mb__after_atomic();
/* now kick and/or wakeup */

It won't make a difference in practice because there's something wrong
if kvm_make_all_cpus_request is a hot spot, but it's readable code and
it makes sense.

In any case, as soon as your patches get in, whoever does the cleanup
also has the honor of updating the docs. Radim could also get extra
karma for putting your documentation at the beginning of this series,
and updating it at the same time. :)

Paolo

2017-04-11 09:15:00

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 6/6] KVM: perform a wake_up in kvm_make_all_cpus_request



On 07/04/2017 04:20, Radim Krčmář wrote:
> I think that most requests do not need the wake up, so we would flip the
> bit then.

True. I may need a bit more convincing, but let's see the patches:

- point against: no wakeup is a bug, possibly hard to find, while an
extra wakeup is just annoying.

- point in favor: the same argument (multiplied by over 9000) would
apply to a wait flag in the request number, but it would be obviously
stupid to add a no_wait flag to all requests except the couple that
need it.

Thanks,

Paolo


2017-04-11 09:37:31

by James Hogan

[permalink] [raw]
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()

Hi Paolo,

On Tue, Apr 11, 2017 at 01:25:04PM +0800, Paolo Bonzini wrote:
> On 07/04/2017 05:02, James Hogan wrote:
> > This presumably changes the behaviour on x86, from != OUTSIDE_GUEST_MODE
> > to == IN_GUEST_MODE. so:
> > - you'll no longer get IPIs if its in READING_SHADOW_PAGE_TABLES (which
> > MIPS also now uses when accessing mappings outside of guest mode and
> > depends upon to wait until the old mappings are no longer in use).
>
> This is wrong, the purpose of READING_SHADOW_PAGE_TABLES is "kvm_flush_remote_tlbs
> should send me an IPI, because I want to stop kvm_flush_remote_tlbs until I'm done
> reading the page tables".

That sounds equivalent to what I meant for MIPS, i.e.
kvm_flush_remote_tlbs() does the waiting (not the thing accessing guest
mappings).

Cheers
James



2017-04-11 10:44:44

by James Hogan

[permalink] [raw]
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()

Hi Drew,

Note, MIPS doesn't directly use kicks as far as I can tell, only the TLB
flush request, so what I say below is in the context of requests where
the IPI is waited for.

On Mon, Apr 10, 2017 at 05:59:42PM +0200, Andrew Jones wrote:
> I'm actually thinking we should do away with kvm_arch_vcpu_should_kick(),
> putting the x86 implementation of it directly in the common code. The
> reason is that, while there are currently two implementations for the
> function, the x86 one and 'return true', I don't think 'return true'
> is correct. 'return true' doesn't consider whether or not the target VCPU
> has interrupts disabled, and I don't see how sending the IPI when the
> vcpu is not in guest mode,

Generally agreed, except for the READING_SHADOW_PAGE_TABLES case.

> or has disabled interrupts prior to entering guest mode, makes sense.

OTOH:
1) disable interrupts
2) set mode to IN_GUEST_MODE
3) check requests
4) enter guest mode

Before (3) an IPI is redundant since the requests will be checked prior
to entering guest mode anyway, before any guest mappings are accessed.

After (3) the IPI is important as its too late to handle the request
before entering guest mode, so the IPI is needed to inform
kvm_flush_remote_tlbs() when it is safe to continue, and to trigger a
prompt exit from guest mode so it isn't waiting too long.

> To consider whether the target vcpu has
> interrupts disabled we need to require it to tell us. Requiring the
> setting of IN_GUEST_MODE after calling local_irq_disable() is how x86
> does it, and seems like it should work for all architectures. So, can you
> help me better understand the concerns you have below?
>
> >
> > This presumably changes the behaviour on x86, from != OUTSIDE_GUEST_MODE
> > to == IN_GUEST_MODE. so:
> > - you'll no longer get IPIs if its in READING_SHADOW_PAGE_TABLES (which
> > MIPS also now uses when accessing mappings outside of guest mode and
> > depends upon to wait until the old mappings are no longer in use).
>
> But as long as the kicks were due to vcpu requests, then, since the VCPU
> should check requests again before reentering guest mode, it'll still
> handle them.

At least for MIPS reading shadow page tables is treated a bit like being
in guest mode, in that guest mappings are accessed (including
potentially stale ones before kvm_flush_remote_tlbs() has returned), and
has to be done with IRQs disabled before also checking requests (to
handle requests sent prior to reading shadow page tables). The only
difference is it doesn't happen in guest mode and IRQs are properly
disabled, so the IPI is delayed rather than interrupting the activity.

> I see the comment under the setting of
> READING_SHADOW_PAGE_TABLES in arch/mips/kvm/trap_emul.c refers to TLB
> flush requests, so that one should be OK. Are there other kicks that
> are request-less to be concerned with?

Not that I'm aware of for MIPS.

>
> > - you'll no longer get IPIs if its in EXITING_GUEST_MODE (i.e. if you
> > get two of these in quick succession only the first will wait for the
> > IPI, which might work as long as they're already serialised but it
> > still feels wrong).
>
> Can you elaborate on this one? My understanding is that there should
> be no harm in coalescing these IPIs.

My concern was e.g.:

CPU1 CPU2 CPU3 (in guest mode)
----------------------- ----------------------- ------------------------
kvm_flush_remote_tlbs()
kvm_flush_remote_tlbs()
IN_GUEST_MODE->EXITING_GUEST_MODE
EXITING_GUEST_MODE
return without IPI
*continue accessing*
*guest mappings*

send IPI to CPU3 & wait ----------------------> Exit guest mode
irqs enable
take IPI
<-----------------------------------------------
wake and return

(i.e. kvm_flush_remote_tlbs() on CPU2 returned while stale mappings
still in use).

However at least for MIPS I think kvm->mmu_lock should protect against
that by serialising the second kvm_flush_remote_tlbs() after the first
is complete. If anything else can switch mode to EXITING_GUEST_MODE (a
kick?) without locking, then perhaps it could still be a problem?

Cheers
James



2017-04-11 12:04:39

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH 6/6] KVM: perform a wake_up in kvm_make_all_cpus_request

On Tue, Apr 11, 2017 at 01:34:49PM +0800, Paolo Bonzini wrote:
> kvm_arch_vcpu_should_kick() does cmpxchg, which already includes a
> memory barrier when it succeeds, so you need not add smp_mb() there.

When the cmpxchg() fails it only guarantees ACQUIRE semantics, meaning
the request setting may appear to happen after its completion. This
would break our delicate vcpu->requests, vcpu->mode two-variable memory
barrier pattern that prohibits a VCPU entering guest mode with a
pending request and no IPI. IOW, on ARM we need an explicit smp_mb()
before the cmpxchg(), otherwise it's incomplete. I think adding a
smp_mb__before_atomic() should cover ARM and any other relaxed memory
model arches without impacting x86.

Thanks,
drew

2017-04-11 19:31:33

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()

2017-04-11 10:37+0100, James Hogan:
> Hi Paolo,
>
> On Tue, Apr 11, 2017 at 01:25:04PM +0800, Paolo Bonzini wrote:
>> On 07/04/2017 05:02, James Hogan wrote:
>> > This presumably changes the behaviour on x86, from != OUTSIDE_GUEST_MODE
>> > to == IN_GUEST_MODE. so:
>> > - you'll no longer get IPIs if its in READING_SHADOW_PAGE_TABLES (which
>> > MIPS also now uses when accessing mappings outside of guest mode and
>> > depends upon to wait until the old mappings are no longer in use).
>>
>> This is wrong, the purpose of READING_SHADOW_PAGE_TABLES is "kvm_flush_remote_tlbs
>> should send me an IPI, because I want to stop kvm_flush_remote_tlbs until I'm done
>> reading the page tables".
>
> That sounds equivalent to what I meant for MIPS, i.e.
> kvm_flush_remote_tlbs() does the waiting (not the thing accessing guest
> mappings).

I agree, thanks for noticing this. It would be a huge mistake to drop
the synchronization.

2017-04-11 19:45:45

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()



----- Original Message -----
> From: "Radim Krčmář" <[email protected]>
> To: "James Hogan" <[email protected]>
> Cc: "Paolo Bonzini" <[email protected]>, [email protected], [email protected], "Christoffer Dall"
> <[email protected]>, "Andrew Jones" <[email protected]>, "Marc Zyngier" <[email protected]>, "Christian
> Borntraeger" <[email protected]>, "Cornelia Huck" <[email protected]>, "Paul Mackerras"
> <[email protected]>
> Sent: Wednesday, April 12, 2017 3:31:24 AM
> Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()
>
> 2017-04-11 10:37+0100, James Hogan:
> > Hi Paolo,
> >
> > On Tue, Apr 11, 2017 at 01:25:04PM +0800, Paolo Bonzini wrote:
> >> On 07/04/2017 05:02, James Hogan wrote:
> >> > This presumably changes the behaviour on x86, from != OUTSIDE_GUEST_MODE
> >> > to == IN_GUEST_MODE. so:
> >> > - you'll no longer get IPIs if its in READING_SHADOW_PAGE_TABLES (which
> >> > MIPS also now uses when accessing mappings outside of guest mode and
> >> > depends upon to wait until the old mappings are no longer in use).
> >>
> >> This is wrong, the purpose of READING_SHADOW_PAGE_TABLES is
> >> "kvm_flush_remote_tlbs
> >> should send me an IPI, because I want to stop kvm_flush_remote_tlbs until
> >> I'm done
> >> reading the page tables".
> >
> > That sounds equivalent to what I meant for MIPS, i.e.
> > kvm_flush_remote_tlbs() does the waiting (not the thing accessing guest
> > mappings).

Yeah, I meant "it's wrong in Radim's patches". Not hard to fix though.

Thanks for the review!

Paolo

> I agree, thanks for noticing this. It would be a huge mistake to drop
> the synchronization.
>

2017-04-11 20:45:44

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()

2017-04-11 13:25+0800, Paolo Bonzini:
> On 07/04/2017 05:02, James Hogan wrote:
>> - you'll no longer get IPIs if its in EXITING_GUEST_MODE (i.e. if you
>> get two of these in quick succession only the first will wait for the
>> IPI, which might work as long as they're already serialised but it
>> still feels wrong).
>
> But this is okay---avoiding multiple IPIs is the exact purpose of
> EXITING_GUEST_MODE.

I think this applied to the missed synchronization, in which case the
point is valid as the latter caller would assume that it can proceed to
reuse the memory even though the guest was still using it.

> There are evidently multiple uses of kvm_make_all_cpus_request, and we
> should avoid smp_call_function_many(..., true) if possible. So perhaps:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a17d78759727..20e3bd60bdda 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -169,7 +169,7 @@ static void ack_flush(void *_completed)
> {
> }
>
> -bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> +bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req, bool wait)
> {
> int i, cpu, me;
> cpumask_var_t cpus;
> @@ -182,18 +182,19 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> kvm_for_each_vcpu(i, vcpu, kvm) {
> kvm_make_request(req, vcpu);
> cpu = vcpu->cpu;
> + if (cpus == NULL || cpu == -1 || cpu == me)
> + continue;
>
> /* Set ->requests bit before we read ->mode. */
> smp_mb__after_atomic();
> -
> - if (cpus != NULL && cpu != -1 && cpu != me &&
> - kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
> + if (kvm_arch_vcpu_should_kick(vcpu) ||
> + (wait && vcpu->mode != OUTSIDE_GUEST_MODE))
> cpumask_set_cpu(cpu, cpus);
> }
> if (unlikely(cpus == NULL))
> - smp_call_function_many(cpu_online_mask, ack_flush, NULL, 1);
> + smp_call_function_many(cpu_online_mask, ack_flush, NULL, wait);
> else if (!cpumask_empty(cpus))
> - smp_call_function_many(cpus, ack_flush, NULL, 1);
> + smp_call_function_many(cpus, ack_flush, NULL, wait);
> else
> called = false;
> put_cpu();
> @@ -221,7 +222,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> * kvm_make_all_cpus_request() reads vcpu->mode. We reuse that
> * barrier here.
> */
> - if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
> + if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH, true))
> ++kvm->stat.remote_tlb_flush;
> cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> }
> @@ -230,7 +231,7 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
>
> void kvm_reload_remote_mmus(struct kvm *kvm)
> {
> - kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> + /* FIXME, is wait=true really needed? */

Probably not. There are two uses,

in kvm_mmu_prepare_zap_page():
The only change that happens between kvm_reload_remote_mmus() and
kvm_flush_remote_tlbs() in kvm_mmu_commit_zap_page() is setting of
sp->role.invalid -- synchronizing it doesn't prevent any race with
READING_SHADOW_PAGE_TABLES mode and the unconditional TLB flush is the
important one. I think that kvm_reload_remote_mmus doesn't even need
to kick in this case.

in kvm_mmu_invalidate_zap_all_pages():
Same situation: the guest cannot do an entry without increasing the
generation number, but can enter READING_SHADOW_PAGE_TABLES mode
between reload and flush.
I think that we don't need to call

but my knowledge of this area is obviously lacking ...

> + kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD, true);
> }
>
> int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>
>
> Other users do not need wait=false.

You mean "wait=true"?

(Would be safer to assume they depend on the VM exit wait until proved
otherwise ...)

> Or another idea is to embed wait in the request number, as suggested in the
> ARM thread, so that for example:

Right, I don't think that a TLB flush makes sense without
synchronization and adding context sensitivity

> - bits 0-4 = bit number in vcpu->requests
>
> - bit 8 = wait when making request

Sounds good. The single-target kvm_make_request() + kvm_vcpu_kick()
should use this as well.

> - bit 9 = kick after making request

Maybe add a bit mask to denote in which modes the kick/wait is necessary?

bit 9 : IN_GUEST_MODE
bit 10 : EXITING_GUEST_MODE
bit 11 : READING_SHADOW_PAGE_TABLES

TLB_FLUSH would set bits 8-11. IIUC, ARM has use for requests that need
to make sure that the guest is not in guest mode before proceeding and
those would set bits 8-10.

The common requests, "notice me as soon as possible", would set bit 9.
The bits 9-11 could also be used only when bit 8 is set, to make the
transition easier. (9 and 10 could be squished then as well.)

2017-04-12 00:15:19

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()



----- Original Message -----
> From: "Radim Krčmář" <[email protected]>
> To: "Paolo Bonzini" <[email protected]>
> Cc: "James Hogan" <[email protected]>, [email protected], [email protected], "Christoffer Dall"
> <[email protected]>, "Andrew Jones" <[email protected]>, "Marc Zyngier" <[email protected]>, "Christian
> Borntraeger" <[email protected]>, "Cornelia Huck" <[email protected]>, "Paul Mackerras"
> <[email protected]>
> Sent: Wednesday, April 12, 2017 4:45:36 AM
> Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()
>
> > void kvm_reload_remote_mmus(struct kvm *kvm)
> > {
> > - kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> > + /* FIXME, is wait=true really needed? */
>
> Probably not. There are two uses,
>
> in kvm_mmu_prepare_zap_page():
> The only change that happens between kvm_reload_remote_mmus() and
> kvm_flush_remote_tlbs() in kvm_mmu_commit_zap_page() is setting of
> sp->role.invalid -- synchronizing it doesn't prevent any race with
> READING_SHADOW_PAGE_TABLES mode and the unconditional TLB flush is the
> important one. I think that kvm_reload_remote_mmus doesn't even need
> to kick in this case.
>
> in kvm_mmu_invalidate_zap_all_pages():
> Same situation: the guest cannot do an entry without increasing the
> generation number, but can enter READING_SHADOW_PAGE_TABLES mode
> between reload and flush.
> I think that we don't need to call
>
> but my knowledge of this area is obviously lacking ...

Yes, you're right - I just was too lazy. :)

> > + kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD, true);
> > }
> >
> > int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
> >
> >
> > Other users do not need wait=false.
>
> You mean "wait=true"?
>
> (Would be safer to assume they depend on the VM exit wait until proved
> otherwise ...)

Yeah, I audited them.

> > - bit 9 = kick after making request
>
> Maybe add bit mask to denote in which modes the kick/wait is necessary?
>
> bit 9 : IN_GUEST_MODE
> bit 10 : EXITING_GUEST_MODE
> bit 11 : READING_SHADOW_PAGE_TABLES
>
> TLB_FLUSH would set bits 8-11. IIUC, ARM has use for requests that need
> to make sure that the guest is not in guest mode before proceeding and
> those would set bit 8-10.

No, checking vcpu->requests after setting IN_GUEST_MODE is done separately.
EXITING_GUEST_MODE's meaning *is* "no IPI needed".

> The common requests, "notice me as soon as possible", would set bit 9.
> The bits 9-11 could also be used only when bit 8 is set, to make the
> transition easier. (9 and 10 could be squished then as well.)

Maybe, depending on what the code looks like. But considering we have
to do the cmpxchg, I think the should_kick and should_wait logic should
be embedded in kvm_make_all_cpus_request (and later on, kvm_make_request).

Paolo