2019-06-17 11:25:19

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 0/5] KVM: LAPIC: Implement Exitless Timer

Dedicated instances are currently disturbed by unnecessary jitter due
to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
There is no hardware virtual timer on Intel for guest like ARM. Both
programming timer in guest and the emulated timer fires incur vmexits.
This patchset tries to avoid vmexit which is incurred by the emulated
timer fires in dedicated instance scenario.

When nohz_full is enabled in dedicated instances scenario, the unpinned
timer will be moved to the nearest busy housekeepers after commit
9642d18eee2cd (nohz: Affine unpinned timers to housekeepers) and commit
444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However,
KVM always makes lapic timer pinned to the pCPU which vCPU residents, the
reason is explained by commit 61abdbe0 (kvm: x86: make lapic hrtimer
pinned). Actually, these emulated timers can be offload to the housekeeping
cpus since APICv is really common in recent years. The guest timer interrupt
is injected by posted-interrupt which is delivered by housekeeping cpu
once the emulated timer fires.

The host admin should fine tuned, e.g. dedicated instances scenario w/
nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
mode, ~3% redis performance benefit can be observed on Skylake server.

w/o patchset:

VM-EXIT Samples Samples% Time% Min Time Max Time Avg time

EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )

w/ patchset:

VM-EXIT Samples Samples% Time% Min Time Max Time Avg time

EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Marcelo Tosatti <[email protected]>

v3 -> v4:
* drop the HRTIMER_MODE_ABS_PINNED, add kick after set pending timer
* don't posted inject already-expired timer

v2 -> v3:
* disarming the vmx preemption timer when posted_interrupt_inject_timer_enabled()
* check kvm_hlt_in_guest instead

v1 -> v2:
* check vcpu_halt_in_guest
* move module parameter from kvm-intel to kvm
* add housekeeping_enabled
* rename apic_timer_expired_pi to kvm_apic_inject_pending_timer_irqs

Wanpeng Li (5):
KVM: LAPIC: Make lapic timer unpinned
KVM: LAPIC: inject lapic timer interrupt by posted interrupt
KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi
KVM: LAPIC: Don't posted inject already-expired timer
KVM: LAPIC: add advance timer support to pi_inject_timer

arch/x86/kvm/lapic.c | 62 ++++++++++++++++++++++++++++-------------
arch/x86/kvm/lapic.h | 3 +-
arch/x86/kvm/svm.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 5 ++--
arch/x86/kvm/x86.c | 11 ++++----
arch/x86/kvm/x86.h | 2 ++
include/linux/sched/isolation.h | 2 ++
kernel/sched/isolation.c | 6 ++++
8 files changed, 64 insertions(+), 29 deletions(-)

--
2.7.4


2019-06-17 11:25:21

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned

From: Wanpeng Li <[email protected]>

Make lapic timer unpinned when timer is injected by posted-interrupt,
the emulated timer can be offload to the housekeeping cpus, kick after
setting the pending timer request as alternative to commit 61abdbe0bcc.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 8 ++++----
arch/x86/kvm/x86.c | 6 +-----
2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e82a18c..87ecb56 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1578,7 +1578,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
likely(ns > apic->lapic_timer.timer_advance_ns)) {
expire = ktime_add_ns(now, ns);
expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
- hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS_PINNED);
+ hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
} else
apic_timer_expired(apic);

@@ -1680,7 +1680,7 @@ static void start_sw_period(struct kvm_lapic *apic)

hrtimer_start(&apic->lapic_timer.timer,
apic->lapic_timer.target_expiration,
- HRTIMER_MODE_ABS_PINNED);
+ HRTIMER_MODE_ABS);
}

bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
@@ -2317,7 +2317,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
apic->vcpu = vcpu;

hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
- HRTIMER_MODE_ABS_PINNED);
+ HRTIMER_MODE_ABS);
apic->lapic_timer.timer.function = apic_timer_fn;
if (timer_advance_ns == -1) {
apic->lapic_timer.timer_advance_ns = 1000;
@@ -2506,7 +2506,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)

timer = &vcpu->arch.apic->lapic_timer.timer;
if (hrtimer_cancel(timer))
- hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
+ hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
}

/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a05a4e..9450a16 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1437,12 +1437,8 @@ static void update_pvclock_gtod(struct timekeeper *tk)

void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
{
- /*
- * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
- * vcpu_enter_guest. This function is only called from
- * the physical CPU that is running vcpu.
- */
kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
+ kvm_vcpu_kick(vcpu);
}

static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
--
2.7.4

2019-06-17 11:25:26

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

From: Wanpeng Li <[email protected]>

Dedicated instances are currently disturbed by unnecessary jitter due
to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
There is no hardware virtual timer on Intel for guest like ARM. Both
programming timer in guest and the emulated timer fires incur vmexits.
This patch tries to avoid vmexit which is incurred by the emulated
timer fires in dedicated instance scenario.

When nohz_full is enabled in dedicated instances scenario, the emulated
timers can be offload to the nearest busy housekeeping cpus since APICv
is really common in recent years. The guest timer interrupt is injected
by posted-interrupt which is delivered by housekeeping cpu once the emulated
timer fires.

The host admin should fine tuned, e.g. dedicated instances scenario w/
nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
mode, ~3% redis performance benefit can be observed on Skylake server.

w/o patch:

VM-EXIT Samples Samples% Time% Min Time Max Time Avg time

EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )

w/ patch:

VM-EXIT Samples Samples% Time% Min Time Max Time Avg time

EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/vmx/vmx.c | 3 ++-
arch/x86/kvm/x86.c | 5 +++++
arch/x86/kvm/x86.h | 2 ++
include/linux/sched/isolation.h | 2 ++
kernel/sched/isolation.c | 6 ++++++
7 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 87ecb56..9ceeee5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
return apic->vcpu->vcpu_id;
}

+bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
+{
+ return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
+ kvm_hlt_in_guest(vcpu->kvm);
+}
+EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
+
static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
switch (map->mode) {
@@ -1432,6 +1439,19 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
}
}

+static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
+{
+ struct kvm_timer *ktimer = &apic->lapic_timer;
+
+ kvm_apic_local_deliver(apic, APIC_LVTT);
+ if (apic_lvtt_tscdeadline(apic))
+ ktimer->tscdeadline = 0;
+ if (apic_lvtt_oneshot(apic)) {
+ ktimer->tscdeadline = 0;
+ ktimer->target_expiration = 0;
+ }
+}
+
static void apic_timer_expired(struct kvm_lapic *apic)
{
struct kvm_vcpu *vcpu = apic->vcpu;
@@ -1441,6 +1461,11 @@ static void apic_timer_expired(struct kvm_lapic *apic)
if (atomic_read(&apic->lapic_timer.pending))
return;

+ if (posted_interrupt_inject_timer(apic->vcpu)) {
+ kvm_apic_inject_pending_timer_irqs(apic);
+ return;
+ }
+
atomic_inc(&apic->lapic_timer.pending);
kvm_set_pending_timer(vcpu);

@@ -2373,13 +2398,7 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
struct kvm_lapic *apic = vcpu->arch.apic;

if (atomic_read(&apic->lapic_timer.pending) > 0) {
- kvm_apic_local_deliver(apic, APIC_LVTT);
- if (apic_lvtt_tscdeadline(apic))
- apic->lapic_timer.tscdeadline = 0;
- if (apic_lvtt_oneshot(apic)) {
- apic->lapic_timer.tscdeadline = 0;
- apic->lapic_timer.target_expiration = 0;
- }
+ kvm_apic_inject_pending_timer_irqs(apic);
atomic_set(&apic->lapic_timer.pending, 0);
}
}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 3674717..e41936b 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -236,6 +236,7 @@ void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu);
void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu);
bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu);
void kvm_lapic_restart_hv_timer(struct kvm_vcpu *vcpu);
+bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu);

static inline enum lapic_mode kvm_apic_mode(u64 apic_base)
{
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8fbea03..f45c51e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7059,7 +7059,8 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles;
struct kvm_timer *ktimer = &vcpu->arch.apic->lapic_timer;

- if (kvm_mwait_in_guest(vcpu->kvm))
+ if (kvm_mwait_in_guest(vcpu->kvm) ||
+ posted_interrupt_inject_timer(vcpu))
return -EOPNOTSUPP;

vmx = to_vmx(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9450a16..dae18f7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -54,6 +54,7 @@
#include <linux/kvm_irqfd.h>
#include <linux/irqbypass.h>
#include <linux/sched/stat.h>
+#include <linux/sched/isolation.h>
#include <linux/mem_encrypt.h>

#include <trace/events/kvm.h>
@@ -155,6 +156,9 @@ EXPORT_SYMBOL_GPL(enable_vmware_backdoor);
static bool __read_mostly force_emulation_prefix = false;
module_param(force_emulation_prefix, bool, S_IRUGO);

+bool __read_mostly pi_inject_timer = 0;
+module_param(pi_inject_timer, bool, S_IRUGO | S_IWUSR);
+
#define KVM_NR_SHARED_MSRS 16

struct kvm_shared_msrs_global {
@@ -7032,6 +7036,7 @@ int kvm_arch_init(void *opaque)
host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);

kvm_lapic_init();
+ pi_inject_timer = housekeeping_enabled(HK_FLAG_TIMER);
#ifdef CONFIG_X86_64
pvclock_gtod_register_notifier(&pvclock_gtod_notifier);

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e08a128..10b26f4 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -301,6 +301,8 @@ extern unsigned int min_timer_period_us;

extern bool enable_vmware_backdoor;

+extern bool pi_inject_timer;
+
extern struct static_key kvm_no_apic_vcpu;

static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index b0fb144..6fc5407 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -19,6 +19,7 @@ enum hk_flags {
DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
extern int housekeeping_any_cpu(enum hk_flags flags);
extern const struct cpumask *housekeeping_cpumask(enum hk_flags flags);
+extern bool housekeeping_enabled(enum hk_flags flags);
extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
extern void __init housekeeping_init(void);
@@ -38,6 +39,7 @@ static inline const struct cpumask *housekeeping_cpumask(enum hk_flags flags)
static inline void housekeeping_affine(struct task_struct *t,
enum hk_flags flags) { }
static inline void housekeeping_init(void) { }
+static inline bool housekeeping_enabled(enum hk_flags flags) { }
#endif /* CONFIG_CPU_ISOLATION */

static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 123ea07..ccb2808 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -14,6 +14,12 @@ EXPORT_SYMBOL_GPL(housekeeping_overridden);
static cpumask_var_t housekeeping_mask;
static unsigned int housekeeping_flags;

+bool housekeeping_enabled(enum hk_flags flags)
+{
+ return !!(housekeeping_flags & flags);
+}
+EXPORT_SYMBOL_GPL(housekeeping_enabled);
+
int housekeeping_any_cpu(enum hk_flags flags)
{
if (static_branch_unlikely(&housekeeping_overridden))
--
2.7.4

2019-06-17 11:25:30

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 3/5] KVM: LAPIC: Ignore timer migration when lapic timer is injected by pi

From: Wanpeng Li <[email protected]>

When lapic timer is injected by posted-interrupt, the emulated timer is
offload to the housekeeping cpu. The timer interrupt will be delivered
properly, no need to migrate timer.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9ceeee5..665b1bb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2520,7 +2520,8 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
{
struct hrtimer *timer;

- if (!lapic_in_kernel(vcpu))
+ if (!lapic_in_kernel(vcpu) ||
+ posted_interrupt_inject_timer(vcpu))
return;

timer = &vcpu->arch.apic->lapic_timer.timer;
--
2.7.4

2019-06-17 11:25:31

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 4/5] KVM: LAPIC: Don't posted inject already-expired timer

From: Wanpeng Li <[email protected]>

already-expired timer interrupt can be injected to guest when vCPU who
arms the lapic timer re-vmentry, don't posted inject in this case.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 665b1bb..1a31389 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1452,7 +1452,7 @@ static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
}
}

-static void apic_timer_expired(struct kvm_lapic *apic)
+static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
{
struct kvm_vcpu *vcpu = apic->vcpu;
struct swait_queue_head *q = &vcpu->wq;
@@ -1461,7 +1461,7 @@ static void apic_timer_expired(struct kvm_lapic *apic)
if (atomic_read(&apic->lapic_timer.pending))
return;

- if (posted_interrupt_inject_timer(apic->vcpu)) {
+ if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
kvm_apic_inject_pending_timer_irqs(apic);
return;
}
@@ -1605,7 +1605,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
} else
- apic_timer_expired(apic);
+ apic_timer_expired(apic, false);

local_irq_restore(flags);
}
@@ -1695,7 +1695,7 @@ static void start_sw_period(struct kvm_lapic *apic)

if (ktime_after(ktime_get(),
apic->lapic_timer.target_expiration)) {
- apic_timer_expired(apic);
+ apic_timer_expired(apic, false);

if (apic_lvtt_oneshot(apic))
return;
@@ -1757,7 +1757,7 @@ static bool start_hv_timer(struct kvm_lapic *apic)
if (atomic_read(&ktimer->pending)) {
cancel_hv_timer(apic);
} else if (expired) {
- apic_timer_expired(apic);
+ apic_timer_expired(apic, false);
cancel_hv_timer(apic);
}
}
@@ -1807,7 +1807,7 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
goto out;
WARN_ON(swait_active(&vcpu->wq));
cancel_hv_timer(apic);
- apic_timer_expired(apic);
+ apic_timer_expired(apic, false);

if (apic_lvtt_period(apic) && apic->lapic_timer.period) {
advance_periodic_target_expiration(apic);
@@ -2310,7 +2310,7 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
struct kvm_lapic *apic = container_of(ktimer, struct kvm_lapic, lapic_timer);

- apic_timer_expired(apic);
+ apic_timer_expired(apic, true);

if (lapic_is_periodic(apic)) {
advance_periodic_target_expiration(apic);
--
2.7.4

2019-06-17 11:26:44

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer

From: Wanpeng Li <[email protected]>

Wait before calling posted-interrupt deliver function directly to add
advance timer support to pi_inject_timer.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 6 ++++--
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/svm.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1a31389..1a31ba5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1462,6 +1462,8 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
return;

if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
+ if (apic->lapic_timer.timer_advance_ns)
+ kvm_wait_lapic_expire(vcpu, true);
kvm_apic_inject_pending_timer_irqs(apic);
return;
}
@@ -1553,7 +1555,7 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
apic->lapic_timer.timer_advance_ns = timer_advance_ns;
}

-void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu, bool pi_inject)
{
struct kvm_lapic *apic = vcpu->arch.apic;
u64 guest_tsc, tsc_deadline;
@@ -1561,7 +1563,7 @@ void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
if (apic->lapic_timer.expired_tscdeadline == 0)
return;

- if (!lapic_timer_int_injected(vcpu))
+ if (!lapic_timer_int_injected(vcpu) && !pi_inject)
return;

tsc_deadline = apic->lapic_timer.expired_tscdeadline;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index e41936b..3d8a043 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -225,7 +225,7 @@ static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)

bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);

-void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu);
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu, bool pi_inject);

bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
struct kvm_vcpu **dest_vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bbc31f7..7e65de4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5648,7 +5648,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)

if (lapic_in_kernel(vcpu) &&
vcpu->arch.apic->lapic_timer.timer_advance_ns)
- kvm_wait_lapic_expire(vcpu);
+ kvm_wait_lapic_expire(vcpu, false);

/*
* If this vCPU has touched SPEC_CTRL, restore the guest's value if
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f45c51e..718a3ad 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6462,7 +6462,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)

if (lapic_in_kernel(vcpu) &&
vcpu->arch.apic->lapic_timer.timer_advance_ns)
- kvm_wait_lapic_expire(vcpu);
+ kvm_wait_lapic_expire(vcpu, false);

/*
* If this vCPU has touched SPEC_CTRL, restore the guest's value if
--
2.7.4

2019-06-17 11:50:11

by Peter Xu

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned

On Mon, Jun 17, 2019 at 07:24:43PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <[email protected]>
>
> Make lapic timer unpinned when timer is injected by posted-interrupt,

It has nothing to do with PI, yet?

And, how about mentioning 61abdbe0bc and telling that this could be
another solution for that problem (but will be used in follow up
patches)?

> the emulated timer can be offload to the housekeeping cpus, kick after
> setting the pending timer request as alternative to commit 61abdbe0bcc.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Marcelo Tosatti <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 8 ++++----
> arch/x86/kvm/x86.c | 6 +-----
> 2 files changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index e82a18c..87ecb56 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1578,7 +1578,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
> likely(ns > apic->lapic_timer.timer_advance_ns)) {
> expire = ktime_add_ns(now, ns);
> expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
> - hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS_PINNED);
> + hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
> } else
> apic_timer_expired(apic);
>
> @@ -1680,7 +1680,7 @@ static void start_sw_period(struct kvm_lapic *apic)
>
> hrtimer_start(&apic->lapic_timer.timer,
> apic->lapic_timer.target_expiration,
> - HRTIMER_MODE_ABS_PINNED);
> + HRTIMER_MODE_ABS);
> }
>
> bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
> @@ -2317,7 +2317,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
> apic->vcpu = vcpu;
>
> hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
> - HRTIMER_MODE_ABS_PINNED);
> + HRTIMER_MODE_ABS);
> apic->lapic_timer.timer.function = apic_timer_fn;
> if (timer_advance_ns == -1) {
> apic->lapic_timer.timer_advance_ns = 1000;
> @@ -2506,7 +2506,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
>
> timer = &vcpu->arch.apic->lapic_timer.timer;
> if (hrtimer_cancel(timer))
> - hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
> + hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
> }
>
> /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0a05a4e..9450a16 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1437,12 +1437,8 @@ static void update_pvclock_gtod(struct timekeeper *tk)
>
> void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
> {
> - /*
> - * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
> - * vcpu_enter_guest. This function is only called from
> - * the physical CPU that is running vcpu.
> - */
> kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> + kvm_vcpu_kick(vcpu);
> }
>
> static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
> --
> 2.7.4
>

Regards,

--
Peter Xu

2019-06-17 21:33:14

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer

2019-06-17 19:24+0800, Wanpeng Li:
> From: Wanpeng Li <[email protected]>
>
> Wait before calling posted-interrupt deliver function directly to add
> advance timer support to pi_inject_timer.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Marcelo Tosatti <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---

Please merge this patch with [2/5], so bisection doesn't break.

> arch/x86/kvm/lapic.c | 6 ++++--
> arch/x86/kvm/lapic.h | 2 +-
> arch/x86/kvm/svm.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 4 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 1a31389..1a31ba5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1462,6 +1462,8 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
> return;
>
> if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
> + if (apic->lapic_timer.timer_advance_ns)
> + kvm_wait_lapic_expire(vcpu, true);

From where does kvm_wait_lapic_expire() take
apic->lapic_timer.expired_tscdeadline?

(I think it would be best to take the functional core of
kvm_wait_lapic_expire() and make it into a function that takes the
expired_tscdeadline as an argument.)

Thanks.

2019-06-18 00:38:41

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v4 1/5] KVM: LAPIC: Make lapic timer unpinned

On Mon, 17 Jun 2019 at 19:48, Peter Xu <[email protected]> wrote:
>
> On Mon, Jun 17, 2019 at 07:24:43PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <[email protected]>
> >
> > Make lapic timer unpinned when timer is injected by posted-interrupt,
>
> It has nothing to do with PI, yet?
>
> And, how about mentioning 61abdbe0bc and telling that this could be
> another solution for that problem (but will be used in follow up
> patches)?
>
> > the emulated timer can be offload to the housekeeping cpus, kick after
> > setting the pending timer request as alternative to commit 61abdbe0bcc.

Yeah, the patch description needs to rework. Thank you.

Regards,
Wanpeng Li

2019-06-18 00:43:46

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer

On Tue, 18 Jun 2019 at 05:32, Radim Krčmář <[email protected]> wrote:
>
> 2019-06-17 19:24+0800, Wanpeng Li:
> > From: Wanpeng Li <[email protected]>
> >
> > Wait before calling posted-interrupt deliver function directly to add
> > advance timer support to pi_inject_timer.
> >
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: Radim Krčmář <[email protected]>
> > Cc: Marcelo Tosatti <[email protected]>
> > Signed-off-by: Wanpeng Li <[email protected]>
> > ---
>
> Please merge this patch with [2/5], so bisection doesn't break.

Agreed.

>
> > arch/x86/kvm/lapic.c | 6 ++++--
> > arch/x86/kvm/lapic.h | 2 +-
> > arch/x86/kvm/svm.c | 2 +-
> > arch/x86/kvm/vmx/vmx.c | 2 +-
> > 4 files changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 1a31389..1a31ba5 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1462,6 +1462,8 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
> > return;
> >
> > if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
> > + if (apic->lapic_timer.timer_advance_ns)
> > + kvm_wait_lapic_expire(vcpu, true);
>
> From where does kvm_wait_lapic_expire() take
> apic->lapic_timer.expired_tscdeadline?

Sorry, I failed to understand this.
https://git.kernel.org/pub/scm/virt/kvm/kvm.git/tree/arch/x86/kvm/lapic.c?h=queue#n1541
We can get apic->lapic_timer.expired_tscdeadline in
kvm_wait_lapic_expire() directly.

Regards,
Wanpeng Li

2019-06-18 00:57:37

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] KVM: LAPIC: add advance timer support to pi_inject_timer

On Tue, 18 Jun 2019 at 08:44, Wanpeng Li <[email protected]> wrote:
>
> On Tue, 18 Jun 2019 at 05:32, Radim Krčmář <[email protected]> wrote:
> >
> > 2019-06-17 19:24+0800, Wanpeng Li:
> > > From: Wanpeng Li <[email protected]>
> > >
> > > Wait before calling posted-interrupt deliver function directly to add
> > > advance timer support to pi_inject_timer.
> > >
> > > Cc: Paolo Bonzini <[email protected]>
> > > Cc: Radim Krčmář <[email protected]>
> > > Cc: Marcelo Tosatti <[email protected]>
> > > Signed-off-by: Wanpeng Li <[email protected]>
> > > ---
> >
> > Please merge this patch with [2/5], so bisection doesn't break.
>
> Agreed.
>
> >
> > > arch/x86/kvm/lapic.c | 6 ++++--
> > > arch/x86/kvm/lapic.h | 2 +-
> > > arch/x86/kvm/svm.c | 2 +-
> > > arch/x86/kvm/vmx/vmx.c | 2 +-
> > > 4 files changed, 7 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 1a31389..1a31ba5 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -1462,6 +1462,8 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool can_pi_inject)
> > > return;
> > >
> > > if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
> > > + if (apic->lapic_timer.timer_advance_ns)
> > > + kvm_wait_lapic_expire(vcpu, true);
> >
> > From where does kvm_wait_lapic_expire() take
> > apic->lapic_timer.expired_tscdeadline?
>
> Sorry, I failed to understand this.
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git/tree/arch/x86/kvm/lapic.c?h=queue#n1541
> We can get apic->lapic_timer.expired_tscdeadline in
> kvm_wait_lapic_expire() directly.

Oh, miss the latest expired_tscdeadline, how about something like below?

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1a31ba5..7cd95ea 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1461,6 +1461,9 @@ static void apic_timer_expired(struct kvm_lapic
*apic, bool can_pi_inject)
if (atomic_read(&apic->lapic_timer.pending))
return;

+ if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
+ ktimer->expired_tscdeadline = ktimer->tscdeadline;
+
if (can_pi_inject && posted_interrupt_inject_timer(apic->vcpu)) {
if (apic->lapic_timer.timer_advance_ns)
kvm_wait_lapic_expire(vcpu, true);
@@ -1477,9 +1480,6 @@ static void apic_timer_expired(struct kvm_lapic
*apic, bool can_pi_inject)
*/
if (swait_active(q))
swake_up_one(q);
-
- if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
- ktimer->expired_tscdeadline = ktimer->tscdeadline;
}

/*

2019-06-18 13:37:03

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <[email protected]>
>
> Dedicated instances are currently disturbed by unnecessary jitter due
> to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> There is no hardware virtual timer on Intel for guest like ARM. Both
> programming timer in guest and the emulated timer fires incur vmexits.
> This patch tries to avoid vmexit which is incurred by the emulated
> timer fires in dedicated instance scenario.
>
> When nohz_full is enabled in dedicated instances scenario, the emulated
> timers can be offload to the nearest busy housekeeping cpus since APICv
> is really common in recent years. The guest timer interrupt is injected
> by posted-interrupt which is delivered by housekeeping cpu once the emulated
> timer fires.
>
> The host admin should fine tuned, e.g. dedicated instances scenario w/
> nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> mode, ~3% redis performance benefit can be observed on Skylake server.
>
> w/o patch:
>
> VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
>
> EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
>
> w/ patch:
>
> VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
>
> EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Marcelo Tosatti <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> arch/x86/kvm/lapic.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 3 ++-
> arch/x86/kvm/x86.c | 5 +++++
> arch/x86/kvm/x86.h | 2 ++
> include/linux/sched/isolation.h | 2 ++
> kernel/sched/isolation.c | 6 ++++++
> 7 files changed, 44 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 87ecb56..9ceeee5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> return apic->vcpu->vcpu_id;
> }
>
> +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> +{
> + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> + kvm_hlt_in_guest(vcpu->kvm);
> +}
> +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);

Paolo, can you explain the reasoning behind this?

Should not be necessary...

2019-06-19 00:35:21

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
>
> On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <[email protected]>
> >
> > Dedicated instances are currently disturbed by unnecessary jitter due
> > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > There is no hardware virtual timer on Intel for guest like ARM. Both
> > programming timer in guest and the emulated timer fires incur vmexits.
> > This patch tries to avoid vmexit which is incurred by the emulated
> > timer fires in dedicated instance scenario.
> >
> > When nohz_full is enabled in dedicated instances scenario, the emulated
> > timers can be offload to the nearest busy housekeeping cpus since APICv
> > is really common in recent years. The guest timer interrupt is injected
> > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > timer fires.
> >
> > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > mode, ~3% redis performance benefit can be observed on Skylake server.
> >
> > w/o patch:
> >
> > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> >
> > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> >
> > w/ patch:
> >
> > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> >
> > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> >
> > Cc: Paolo Bonzini <[email protected]>
> > Cc: Radim Krčmář <[email protected]>
> > Cc: Marcelo Tosatti <[email protected]>
> > Signed-off-by: Wanpeng Li <[email protected]>
> > ---
> > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > arch/x86/kvm/lapic.h | 1 +
> > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > arch/x86/kvm/x86.c | 5 +++++
> > arch/x86/kvm/x86.h | 2 ++
> > include/linux/sched/isolation.h | 2 ++
> > kernel/sched/isolation.c | 6 ++++++
> > 7 files changed, 44 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 87ecb56..9ceeee5 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > return apic->vcpu->vcpu_id;
> > }
> >
> > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > +{
> > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > + kvm_hlt_in_guest(vcpu->kvm);
> > +}
> > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
>
> Paolo, can you explain the reasoning behind this?
>
> Should not be necessary...

Here some new discussions:
https://lkml.org/lkml/2019/6/13/1423
https://lkml.org/lkml/2019/6/13/1420

2019-06-19 21:04:41

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

Hi Li,

On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
> >
> > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > From: Wanpeng Li <[email protected]>
> > >
> > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > programming timer in guest and the emulated timer fires incur vmexits.
> > > This patch tries to avoid vmexit which is incurred by the emulated
> > > timer fires in dedicated instance scenario.
> > >
> > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > is really common in recent years. The guest timer interrupt is injected
> > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > timer fires.
> > >
> > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > >
> > > w/o patch:
> > >
> > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > >
> > > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> > >
> > > w/ patch:
> > >
> > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > >
> > > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> > >
> > > Cc: Paolo Bonzini <[email protected]>
> > > Cc: Radim Krčmář <[email protected]>
> > > Cc: Marcelo Tosatti <[email protected]>
> > > Signed-off-by: Wanpeng Li <[email protected]>
> > > ---
> > > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > > arch/x86/kvm/lapic.h | 1 +
> > > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > > arch/x86/kvm/x86.c | 5 +++++
> > > arch/x86/kvm/x86.h | 2 ++
> > > include/linux/sched/isolation.h | 2 ++
> > > kernel/sched/isolation.c | 6 ++++++
> > > 7 files changed, 44 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 87ecb56..9ceeee5 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > return apic->vcpu->vcpu_id;
> > > }
> > >
> > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > +{
> > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > + kvm_hlt_in_guest(vcpu->kvm);
> > > +}
> > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> >
> > Paolo, can you explain the reasoning behind this?
> >
> > Should not be necessary...
>
> Here some new discussions:
> https://lkml.org/lkml/2019/6/13/1423

Not sure what this has to do with injecting timer
interrupts via posted interrupts ?

> https://lkml.org/lkml/2019/6/13/1420

Two things (unrelated to the above):

1) hrtimer_reprogram is unable to wakeup a remote vCPU, therefore
i believe execution of apic_timer_expired can be delayed.
Should wakeup the CPU which hosts apic_timer_expired.


/*
* If the timer is not on the current cpu, we cannot reprogram
* the other cpus clock event device.
*/
if (base->cpu_base != cpu_base)
return;

2) Getting an oops when running cyclictest, debugging...

2019-06-20 00:51:48

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <[email protected]> wrote:
>
> Hi Li,
>
> On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
> > >
> > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > From: Wanpeng Li <[email protected]>
> > > >
> > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > timer fires in dedicated instance scenario.
> > > >
> > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > is really common in recent years. The guest timer interrupt is injected
> > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > timer fires.
> > > >
> > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > >
> > > > w/o patch:
> > > >
> > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > >
> > > > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> > > >
> > > > w/ patch:
> > > >
> > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > >
> > > > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> > > >
> > > > Cc: Paolo Bonzini <[email protected]>
> > > > Cc: Radim Krčmář <[email protected]>
> > > > Cc: Marcelo Tosatti <[email protected]>
> > > > Signed-off-by: Wanpeng Li <[email protected]>
> > > > ---
> > > > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > > > arch/x86/kvm/lapic.h | 1 +
> > > > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > > > arch/x86/kvm/x86.c | 5 +++++
> > > > arch/x86/kvm/x86.h | 2 ++
> > > > include/linux/sched/isolation.h | 2 ++
> > > > kernel/sched/isolation.c | 6 ++++++
> > > > 7 files changed, 44 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 87ecb56..9ceeee5 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > return apic->vcpu->vcpu_id;
> > > > }
> > > >
> > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > + kvm_hlt_in_guest(vcpu->kvm);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > >
> > > Paolo, can you explain the reasoning behind this?
> > >
> > > Should not be necessary...
> >
> > Here some new discussions:
> > https://lkml.org/lkml/2019/6/13/1423
>
> Not sure what this has to do with injecting timer
> interrupts via posted interrupts ?

Yeah, need more explain from Paolo! Ping Paolo,

>
> > https://lkml.org/lkml/2019/6/13/1420
>
> Two things (unrelated to the above):
>
> 1) hrtimer_reprogram is unable to wakeup a remote vCPU, therefore
> i believe execution of apic_timer_expired can be delayed.
> Should wakeup the CPU which hosts apic_timer_expired.
>
>
> /*
> * If the timer is not on the current cpu, we cannot reprogram
> * the other cpus clock event device.
> */
> if (base->cpu_base != cpu_base)
> return;

If it is not the first expiring timer on the new target, we don't need
to reprogram. It can't be the first expired timer on the new target
since below:

/*
* We switch the timer base to a power-optimized selected CPU target,
* if:
* - NO_HZ_COMMON is enabled
* - timer migration is enabled
* - the timer callback is not running
* - the timer is not the first expiring timer on the new target
*
* If one of the above requirements is not fulfilled we move the timer
* to the current CPU or leave it on the previously assigned CPU if
* the timer callback is currently running.
*/
static inline struct hrtimer_clock_base *
switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
int pinned)

>
> 2) Getting an oops when running cyclictest, debugging...

Radim point out one issue in patch 5/5, not sure that is cause.

Regards,
Wanpeng Li

2019-06-21 01:43:37

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <[email protected]> wrote:
>
> Hi Li,
>
> On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
> > >
> > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > From: Wanpeng Li <[email protected]>
> > > >
> > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > timer fires in dedicated instance scenario.
> > > >
> > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > is really common in recent years. The guest timer interrupt is injected
> > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > timer fires.
> > > >
> > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > >
> > > > w/o patch:
> > > >
> > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > >
> > > > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> > > >
> > > > w/ patch:
> > > >
> > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > >
> > > > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> > > >
> > > > Cc: Paolo Bonzini <[email protected]>
> > > > Cc: Radim Krčmář <[email protected]>
> > > > Cc: Marcelo Tosatti <[email protected]>
> > > > Signed-off-by: Wanpeng Li <[email protected]>
> > > > ---
> > > > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > > > arch/x86/kvm/lapic.h | 1 +
> > > > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > > > arch/x86/kvm/x86.c | 5 +++++
> > > > arch/x86/kvm/x86.h | 2 ++
> > > > include/linux/sched/isolation.h | 2 ++
> > > > kernel/sched/isolation.c | 6 ++++++
> > > > 7 files changed, 44 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 87ecb56..9ceeee5 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > return apic->vcpu->vcpu_id;
> > > > }
> > > >
> > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > + kvm_hlt_in_guest(vcpu->kvm);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > >
> > > Paolo, can you explain the reasoning behind this?
> > >
> > > Should not be necessary...

https://lkml.org/lkml/2019/6/5/436 "Here you need to check
kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
through kvm_apic_expired if the guest needs to be woken up from
kvm_vcpu_block."

I think we can still be woken up from kvm_vcpu_block() if pir is set.

Regards,
Wanpeng Li

2019-06-21 22:12:00

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <[email protected]> wrote:
> >
> > Hi Li,
> >
> > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
> > > >
> > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > From: Wanpeng Li <[email protected]>
> > > > >
> > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > timer fires in dedicated instance scenario.
> > > > >
> > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > timer fires.
> > > > >
> > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > >
> > > > > w/o patch:
> > > > >
> > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > >
> > > > > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> > > > >
> > > > > w/ patch:
> > > > >
> > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > >
> > > > > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> > > > >
> > > > > Cc: Paolo Bonzini <[email protected]>
> > > > > Cc: Radim Krčmář <[email protected]>
> > > > > Cc: Marcelo Tosatti <[email protected]>
> > > > > Signed-off-by: Wanpeng Li <[email protected]>
> > > > > ---
> > > > > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > > > > arch/x86/kvm/lapic.h | 1 +
> > > > > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > > > > arch/x86/kvm/x86.c | 5 +++++
> > > > > arch/x86/kvm/x86.h | 2 ++
> > > > > include/linux/sched/isolation.h | 2 ++
> > > > > kernel/sched/isolation.c | 6 ++++++
> > > > > 7 files changed, 44 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > index 87ecb56..9ceeee5 100644
> > > > > --- a/arch/x86/kvm/lapic.c
> > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > return apic->vcpu->vcpu_id;
> > > > > }
> > > > >
> > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > + kvm_hlt_in_guest(vcpu->kvm);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > >
> > > > Paolo, can you explain the reasoning behind this?
> > > >
> > > > Should not be necessary...
>
> https://lkml.org/lkml/2019/6/5/436 "Here you need to check
> kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> through kvm_apic_expired if the guest needs to be woken up from
> kvm_vcpu_block."

Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
be woken up, if it receives a timer interrupt.

But your patch will go through:

kvm_apic_inject_pending_timer_irqs
__apic_accept_irq ->
vmx_deliver_posted_interrupt ->
kvm_vcpu_trigger_posted_interrupt returns false
(because vcpu->mode != IN_GUEST_MODE) ->
kvm_vcpu_kick

Which will wakeup the vcpu.

Apart from this oops, which triggers when running:
taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n -i 50000 -b 40

Timer interruption from housekeeping vcpus is normal to me
(without requiring kvm_hlt_in_guest).

[ 1145.849646] BUG: kernel NULL pointer dereference, address:
0000000000000000
[ 1145.850481] #PF: supervisor instruction fetch in kernel mode
[ 1145.851161] #PF: error_code(0x0010) - not-present page
[ 1145.851772] PGD 80000002a9fa5067 P4D 80000002a9fa5067 PUD 2abcbb067
PMD 0
[ 1145.852578] Oops: 0010 [#1] PREEMPT SMP PTI
[ 1145.853066] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.0-rc1+ #11
[ 1145.853809] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
[ 1145.854554] RIP: 0010:0x0
[ 1145.854879] Code: Bad RIP value.
[ 1145.855270] RSP: 0018:ffffc90001903e68 EFLAGS: 00010013
[ 1145.855902] RAX: 0000010ac9f60043 RBX: ffff8882b58a8320 RCX:
00000000c526b7c4
[ 1145.856726] RDX: 0000000000000000 RSI: ffffffff820d9640 RDI:
ffff8882b58a8320
[ 1145.857560] RBP: ffffffff820d9640 R08: 00000000c526b7c4 R09:
0000000000000832
[ 1145.858390] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[ 1145.859222] R13: ffffffff820d9658 R14: ffff8881063b2880 R15:
0000000000000002
[ 1145.860047] FS: 0000000000000000(0000) GS:ffff8882b5880000(0000)
knlGS:0000000000000000
[ 1145.860994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1145.861692] CR2: ffffffffffffffd6 CR3: 00000002ab1de001 CR4:
0000000000160ee0
[ 1145.862570] Call Trace:
[ 1145.862877] cpuidle_enter_state+0x7c/0x3e0
[ 1145.863392] cpuidle_enter+0x29/0x40


> I think we can still be woken up from kvm_vcpu_block() if pir is set.

Exactly.

2019-06-24 09:01:16

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <[email protected]> wrote:
>
> On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <[email protected]> wrote:
> > >
> > > Hi Li,
> > >
> > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
> > > > >
> > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > From: Wanpeng Li <[email protected]>
> > > > > >
> > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > timer fires in dedicated instance scenario.
> > > > > >
> > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > timer fires.
> > > > > >
> > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > >
> > > > > > w/o patch:
> > > > > >
> > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > >
> > > > > > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> > > > > >
> > > > > > w/ patch:
> > > > > >
> > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > >
> > > > > > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> > > > > >
> > > > > > Cc: Paolo Bonzini <[email protected]>
> > > > > > Cc: Radim Krčmář <[email protected]>
> > > > > > Cc: Marcelo Tosatti <[email protected]>
> > > > > > Signed-off-by: Wanpeng Li <[email protected]>
> > > > > > ---
> > > > > > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > > > > > arch/x86/kvm/lapic.h | 1 +
> > > > > > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > > > > > arch/x86/kvm/x86.c | 5 +++++
> > > > > > arch/x86/kvm/x86.h | 2 ++
> > > > > > include/linux/sched/isolation.h | 2 ++
> > > > > > kernel/sched/isolation.c | 6 ++++++
> > > > > > 7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > index 87ecb56..9ceeee5 100644
> > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > > return apic->vcpu->vcpu_id;
> > > > > > }
> > > > > >
> > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > + kvm_hlt_in_guest(vcpu->kvm);
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > >
> > > > > Paolo, can you explain the reasoning behind this?
> > > > >
> > > > > Should not be necessary...
> >
> > https://lkml.org/lkml/2019/6/5/436 "Here you need to check
> > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > through kvm_apic_expired if the guest needs to be woken up from
> > kvm_vcpu_block."
>
> Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> be woken up, if it receives a timer interrupt.
>
> But your patch will go through:
>
> kvm_apic_inject_pending_timer_irqs
> __apic_accept_irq ->
> vmx_deliver_posted_interrupt ->
> kvm_vcpu_trigger_posted_interrupt returns false
> (because vcpu->mode != IN_GUEST_MODE) ->
> kvm_vcpu_kick
>
> Which will wakeup the vcpu.

Hi Marcelo,

>
> Apart from this oops, which triggers when running:
> taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n -i 50000 -b 40

I try both host and guest use latest kvm/queue w/ CONFIG_PREEMPT
enabled, and expose mwait as your config, however, there is no oops.
Can you reproduce steadily or encounter casually? Can you reproduce
w/o the patchset?

>
> Timer interruption from housekeeping vcpus is normal to me
> (without requiring kvm_hlt_in_guest).
>
> [ 1145.849646] BUG: kernel NULL pointer dereference, address:
> 0000000000000000
> [ 1145.850481] #PF: supervisor instruction fetch in kernel mode
> [ 1145.851161] #PF: error_code(0x0010) - not-present page
> [ 1145.851772] PGD 80000002a9fa5067 P4D 80000002a9fa5067 PUD 2abcbb067
> PMD 0
> [ 1145.852578] Oops: 0010 [#1] PREEMPT SMP PTI
> [ 1145.853066] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.0-rc1+ #11
> [ 1145.853809] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> [ 1145.854554] RIP: 0010:0x0
> [ 1145.854879] Code: Bad RIP value.
> [ 1145.855270] RSP: 0018:ffffc90001903e68 EFLAGS: 00010013
> [ 1145.855902] RAX: 0000010ac9f60043 RBX: ffff8882b58a8320 RCX:
> 00000000c526b7c4
> [ 1145.856726] RDX: 0000000000000000 RSI: ffffffff820d9640 RDI:
> ffff8882b58a8320
> [ 1145.857560] RBP: ffffffff820d9640 R08: 00000000c526b7c4 R09:
> 0000000000000832
> [ 1145.858390] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000
> [ 1145.859222] R13: ffffffff820d9658 R14: ffff8881063b2880 R15:
> 0000000000000002
> [ 1145.860047] FS: 0000000000000000(0000) GS:ffff8882b5880000(0000)
> knlGS:0000000000000000
> [ 1145.860994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1145.861692] CR2: ffffffffffffffd6 CR3: 00000002ab1de001 CR4:
> 0000000000160ee0
> [ 1145.862570] Call Trace:
> [ 1145.862877] cpuidle_enter_state+0x7c/0x3e0
> [ 1145.863392] cpuidle_enter+0x29/0x40
>
>
> > I think we can still be woken up from kvm_vcpu_block() if pir is set.
>
> Exactly.
>

2019-06-25 19:43:19

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On 21/06/19 23:42, Marcelo Tosatti wrote:
> On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
>> On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <[email protected]> wrote:
>>>
>>> Hi Li,
>>>
>>> On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
>>>> On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
>>>>>
>>>>> On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
>>>>>> From: Wanpeng Li <[email protected]>
>>>>>>
>>>>>> Dedicated instances are currently disturbed by unnecessary jitter due
>>>>>> to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
>>>>>> There is no hardware virtual timer on Intel for guest like ARM. Both
>>>>>> programming timer in guest and the emulated timer fires incur vmexits.
>>>>>> This patch tries to avoid vmexit which is incurred by the emulated
>>>>>> timer fires in dedicated instance scenario.
>>>>>>
>>>>>> When nohz_full is enabled in dedicated instances scenario, the emulated
>>>>>> timers can be offload to the nearest busy housekeeping cpus since APICv
>>>>>> is really common in recent years. The guest timer interrupt is injected
>>>>>> by posted-interrupt which is delivered by housekeeping cpu once the emulated
>>>>>> timer fires.
>>>>>>
>>>>>> The host admin should fine tuned, e.g. dedicated instances scenario w/
>>>>>> nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
>>>>>> for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
>>>>>> mode, ~3% redis performance benefit can be observed on Skylake server.
>>>>>>
>>>>>> w/o patch:
>>>>>>
>>>>>> VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
>>>>>>
>>>>>> EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
>>>>>>
>>>>>> w/ patch:
>>>>>>
>>>>>> VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
>>>>>>
>>>>>> EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
>>>>>>
>>>>>> Cc: Paolo Bonzini <[email protected]>
>>>>>> Cc: Radim Krčmář <[email protected]>
>>>>>> Cc: Marcelo Tosatti <[email protected]>
>>>>>> Signed-off-by: Wanpeng Li <[email protected]>
>>>>>> ---
>>>>>> arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
>>>>>> arch/x86/kvm/lapic.h | 1 +
>>>>>> arch/x86/kvm/vmx/vmx.c | 3 ++-
>>>>>> arch/x86/kvm/x86.c | 5 +++++
>>>>>> arch/x86/kvm/x86.h | 2 ++
>>>>>> include/linux/sched/isolation.h | 2 ++
>>>>>> kernel/sched/isolation.c | 6 ++++++
>>>>>> 7 files changed, 44 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>>>> index 87ecb56..9ceeee5 100644
>>>>>> --- a/arch/x86/kvm/lapic.c
>>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>>> @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
>>>>>> return apic->vcpu->vcpu_id;
>>>>>> }
>>>>>>
>>>>>> +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
>>>>>> + kvm_hlt_in_guest(vcpu->kvm);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
>>>>>
>>>>> Paolo, can you explain the reasoning behind this?
>>>>>
>>>>> Should not be necessary...
>>
>> https://lkml.org/lkml/2019/6/5/436 "Here you need to check
>> kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
>> through kvm_apic_expired if the guest needs to be woken up from
>> kvm_vcpu_block."
>
> Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> be woken up, if it receives a timer interrupt.

Yes, this is true.

Paolo

> But your patch will go through:
>
> kvm_apic_inject_pending_timer_irqs
> __apic_accept_irq ->
> vmx_deliver_posted_interrupt ->
> kvm_vcpu_trigger_posted_interrupt returns false
> (because vcpu->mode != IN_GUEST_MODE) ->
> kvm_vcpu_kick
>
> Which will wakeup the vcpu.
>
> Apart from this oops, which triggers when running:
> taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n -i 50000 -b 40
>
> Timer interruption from housekeeping vcpus is normal to me
> (without requiring kvm_hlt_in_guest).
>
> [ 1145.849646] BUG: kernel NULL pointer dereference, address:
> 0000000000000000
> [ 1145.850481] #PF: supervisor instruction fetch in kernel mode
> [ 1145.851161] #PF: error_code(0x0010) - not-present page
> [ 1145.851772] PGD 80000002a9fa5067 P4D 80000002a9fa5067 PUD 2abcbb067
> PMD 0
> [ 1145.852578] Oops: 0010 [#1] PREEMPT SMP PTI
> [ 1145.853066] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.0-rc1+ #11
> [ 1145.853809] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> [ 1145.854554] RIP: 0010:0x0
> [ 1145.854879] Code: Bad RIP value.
> [ 1145.855270] RSP: 0018:ffffc90001903e68 EFLAGS: 00010013
> [ 1145.855902] RAX: 0000010ac9f60043 RBX: ffff8882b58a8320 RCX:
> 00000000c526b7c4
> [ 1145.856726] RDX: 0000000000000000 RSI: ffffffff820d9640 RDI:
> ffff8882b58a8320
> [ 1145.857560] RBP: ffffffff820d9640 R08: 00000000c526b7c4 R09:
> 0000000000000832
> [ 1145.858390] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000
> [ 1145.859222] R13: ffffffff820d9658 R14: ffff8881063b2880 R15:
> 0000000000000002
> [ 1145.860047] FS: 0000000000000000(0000) GS:ffff8882b5880000(0000)
> knlGS:0000000000000000
> [ 1145.860994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1145.861692] CR2: ffffffffffffffd6 CR3: 00000002ab1de001 CR4:
> 0000000000160ee0
> [ 1145.862570] Call Trace:
> [ 1145.862877] cpuidle_enter_state+0x7c/0x3e0
> [ 1145.863392] cpuidle_enter+0x29/0x40
>
>
>> I think we can still be woken up from kvm_vcpu_block() if pir is set.
>
> Exactly.
>

2019-06-25 19:59:57

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Mon, Jun 24, 2019 at 04:53:53PM +0800, Wanpeng Li wrote:
> On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <[email protected]> wrote:
> >
> > On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <[email protected]> wrote:
> > > >
> > > > Hi Li,
> > > >
> > > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > > From: Wanpeng Li <[email protected]>
> > > > > > >
> > > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > > timer fires in dedicated instance scenario.
> > > > > > >
> > > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > > timer fires.
> > > > > > >
> > > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > > >
> > > > > > > w/o patch:
> > > > > > >
> > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > > >
> > > > > > > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> > > > > > >
> > > > > > > w/ patch:
> > > > > > >
> > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > > >
> > > > > > > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> > > > > > >
> > > > > > > Cc: Paolo Bonzini <[email protected]>
> > > > > > > Cc: Radim Krčmář <[email protected]>
> > > > > > > Cc: Marcelo Tosatti <[email protected]>
> > > > > > > Signed-off-by: Wanpeng Li <[email protected]>
> > > > > > > ---
> > > > > > > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > > > > > > arch/x86/kvm/lapic.h | 1 +
> > > > > > > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > > > > > > arch/x86/kvm/x86.c | 5 +++++
> > > > > > > arch/x86/kvm/x86.h | 2 ++
> > > > > > > include/linux/sched/isolation.h | 2 ++
> > > > > > > kernel/sched/isolation.c | 6 ++++++
> > > > > > > 7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > > index 87ecb56..9ceeee5 100644
> > > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > > > return apic->vcpu->vcpu_id;
> > > > > > > }
> > > > > > >
> > > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > > + kvm_hlt_in_guest(vcpu->kvm);
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > > >
> > > > > > Paolo, can you explain the reasoning behind this?
> > > > > >
> > > > > > Should not be necessary...
> > >
> > > https://lkml.org/lkml/2019/6/5/436 "Here you need to check
> > > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > > through kvm_apic_expired if the guest needs to be woken up from
> > > kvm_vcpu_block."
> >
> > Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> > be woken up, if it receives a timer interrupt.
> >
> > But your patch will go through:
> >
> > kvm_apic_inject_pending_timer_irqs
> > __apic_accept_irq ->
> > vmx_deliver_posted_interrupt ->
> > kvm_vcpu_trigger_posted_interrupt returns false
> > (because vcpu->mode != IN_GUEST_MODE) ->
> > kvm_vcpu_kick
> >
> > Which will wakeup the vcpu.
>
> Hi Marcelo,
>
> >
> > Apart from this oops, which triggers when running:
> > taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n -i 50000 -b 40
>
> I try both host and guest use latest kvm/queue w/ CONFIG_PREEMPT
> enabled, and expose mwait as your config, however, there is no oops.
> Can you reproduce steadily or encounter casually? Can you reproduce
> w/o the patchset?

Hi Li,

Steadily.

Do you have this as well:

Index: kvm/arch/x86/kvm/lapic.c
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -129,8 +129,7 @@ static inline u32 kvm_x2apic_id(struct k

bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
{
- return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
- kvm_hlt_in_guest(vcpu->kvm);
+ return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
}
EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);

2019-06-26 11:04:22

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Wed, 26 Jun 2019 at 03:03, Marcelo Tosatti <[email protected]> wrote:
>
> On Mon, Jun 24, 2019 at 04:53:53PM +0800, Wanpeng Li wrote:
> > On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <[email protected]> wrote:
> > >
> > > On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > > > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <[email protected]> wrote:
> > > > >
> > > > > Hi Li,
> > > > >
> > > > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > > > From: Wanpeng Li <[email protected]>
> > > > > > > >
> > > > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > > > timer fires in dedicated instance scenario.
> > > > > > > >
> > > > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > > > timer fires.
> > > > > > > >
> > > > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > > > >
> > > > > > > > w/o patch:
> > > > > > > >
> > > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > > > >
> > > > > > > > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> > > > > > > >
> > > > > > > > w/ patch:
> > > > > > > >
> > > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > > > >
> > > > > > > > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> > > > > > > >
> > > > > > > > Cc: Paolo Bonzini <[email protected]>
> > > > > > > > Cc: Radim Krčmář <[email protected]>
> > > > > > > > Cc: Marcelo Tosatti <[email protected]>
> > > > > > > > Signed-off-by: Wanpeng Li <[email protected]>
> > > > > > > > ---
> > > > > > > > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > > > > > > > arch/x86/kvm/lapic.h | 1 +
> > > > > > > > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > > > > > > > arch/x86/kvm/x86.c | 5 +++++
> > > > > > > > arch/x86/kvm/x86.h | 2 ++
> > > > > > > > include/linux/sched/isolation.h | 2 ++
> > > > > > > > kernel/sched/isolation.c | 6 ++++++
> > > > > > > > 7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > > > index 87ecb56..9ceeee5 100644
> > > > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > > > > return apic->vcpu->vcpu_id;
> > > > > > > > }
> > > > > > > >
> > > > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > > > +{
> > > > > > > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > > > + kvm_hlt_in_guest(vcpu->kvm);
> > > > > > > > +}
> > > > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > > > >
> > > > > > > Paolo, can you explain the reasoning behind this?
> > > > > > >
> > > > > > > Should not be necessary...
> > > >
> > > > https://lkml.org/lkml/2019/6/5/436 "Here you need to check
> > > > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > > > through kvm_apic_expired if the guest needs to be woken up from
> > > > kvm_vcpu_block."
> > >
> > > Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> > > be woken up, if it receives a timer interrupt.
> > >
> > > But your patch will go through:
> > >
> > > kvm_apic_inject_pending_timer_irqs
> > > __apic_accept_irq ->
> > > vmx_deliver_posted_interrupt ->
> > > kvm_vcpu_trigger_posted_interrupt returns false
> > > (because vcpu->mode != IN_GUEST_MODE) ->
> > > kvm_vcpu_kick
> > >
> > > Which will wakeup the vcpu.
> >
> > Hi Marcelo,
> >
> > >
> > > Apart from this oops, which triggers when running:
> > > taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n -i 50000 -b 40
> >
> > I try both host and guest use latest kvm/queue w/ CONFIG_PREEMPT
> > enabled, and expose mwait as your config, however, there is no oops.
> > Can you reproduce steadily or encounter casually? Can you reproduce
> > w/o the patchset?
>
> Hi Li,

Hi Marcelo,

>
> Steadily.
>
> Do you have this as well:

w/ or w/o below diff, testing on both SKX and HSW servers on hand, I
didn't see any oops. Could you observe the oops disappear when w/o
below diff? If the answer is yes, then the oops will not block to
merge the patchset since Paolo prefers to add the kvm_hlt_in_guest()
condition to guarantee be woken up from kvm_vcpu_block(). For the
exitless injection if the guest is running(DPDK style workloads that
busy-spin on network card) scenarios, we can find a solution later.

Regards,
Wanpeng Li

>
> Index: kvm/arch/x86/kvm/lapic.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.c
> +++ kvm/arch/x86/kvm/lapic.c
> @@ -129,8 +129,7 @@ static inline u32 kvm_x2apic_id(struct k
>
> bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> {
> - return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> - kvm_hlt_in_guest(vcpu->kvm);
> + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
> }
> EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);

2019-06-26 16:45:36

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Wed, Jun 26, 2019 at 07:02:13PM +0800, Wanpeng Li wrote:
> On Wed, 26 Jun 2019 at 03:03, Marcelo Tosatti <[email protected]> wrote:
> >
> > On Mon, Jun 24, 2019 at 04:53:53PM +0800, Wanpeng Li wrote:
> > > On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <[email protected]> wrote:
> > > >
> > > > On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > > > > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <[email protected]> wrote:
> > > > > >
> > > > > > Hi Li,
> > > > > >
> > > > > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > > > > From: Wanpeng Li <[email protected]>
> > > > > > > > >
> > > > > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > > > > timer fires in dedicated instance scenario.
> > > > > > > > >
> > > > > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > > > > timer fires.
> > > > > > > > >
> > > > > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > > > > >
> > > > > > > > > w/o patch:
> > > > > > > > >
> > > > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > > > > >
> > > > > > > > > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> > > > > > > > >
> > > > > > > > > w/ patch:
> > > > > > > > >
> > > > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > > > > >
> > > > > > > > > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> > > > > > > > >
> > > > > > > > > Cc: Paolo Bonzini <[email protected]>
> > > > > > > > > Cc: Radim Krčmář <[email protected]>
> > > > > > > > > Cc: Marcelo Tosatti <[email protected]>
> > > > > > > > > Signed-off-by: Wanpeng Li <[email protected]>
> > > > > > > > > ---
> > > > > > > > > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > > > > > > > > arch/x86/kvm/lapic.h | 1 +
> > > > > > > > > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > > > > > > > > arch/x86/kvm/x86.c | 5 +++++
> > > > > > > > > arch/x86/kvm/x86.h | 2 ++
> > > > > > > > > include/linux/sched/isolation.h | 2 ++
> > > > > > > > > kernel/sched/isolation.c | 6 ++++++
> > > > > > > > > 7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > > > > index 87ecb56..9ceeee5 100644
> > > > > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > > > > > return apic->vcpu->vcpu_id;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > > > > +{
> > > > > > > > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > > > > + kvm_hlt_in_guest(vcpu->kvm);
> > > > > > > > > +}
> > > > > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > > > > >
> > > > > > > > Paolo, can you explain the reasoning behind this?
> > > > > > > >
> > > > > > > > Should not be necessary...
> > > > >
> > > > > https://lkml.org/lkml/2019/6/5/436 "Here you need to check
> > > > > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > > > > through kvm_apic_expired if the guest needs to be woken up from
> > > > > kvm_vcpu_block."
> > > >
> > > > Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> > > > be woken up, if it receives a timer interrupt.
> > > >
> > > > But your patch will go through:
> > > >
> > > > kvm_apic_inject_pending_timer_irqs
> > > > __apic_accept_irq ->
> > > > vmx_deliver_posted_interrupt ->
> > > > kvm_vcpu_trigger_posted_interrupt returns false
> > > > (because vcpu->mode != IN_GUEST_MODE) ->
> > > > kvm_vcpu_kick
> > > >
> > > > Which will wakeup the vcpu.
> > >
> > > Hi Marcelo,
> > >
> > > >
> > > > Apart from this oops, which triggers when running:
> > > > taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n -i 50000 -b 40
> > >
> > > I try both host and guest use latest kvm/queue w/ CONFIG_PREEMPT
> > > enabled, and expose mwait as your config, however, there is no oops.
> > > Can you reproduce steadily or encounter casually? Can you reproduce
> > > w/o the patchset?
> >
> > Hi Li,
>
> Hi Marcelo,
>
> >
> > Steadily.
> >
> > Do you have this as well:
>
> w/ or w/o below diff, testing on both SKX and HSW servers on hand, I
> didn't see any oops. Could you observe the oops disappear when w/o
> below diff? If the answer is yes, then the oops will not block to
> merge the patchset since Paolo prefers to add the kvm_hlt_in_guest()
> condition to guarantee be woken up from kvm_vcpu_block().

He agreed that its not necessary. Removing the HLT in guest widens
the scope of the patch greatly.

> For the
> exitless injection if the guest is running(DPDK style workloads that
> busy-spin on network card) scenarios, we can find a solution later.

What is the use-case for HLT in guest again?

I'll find the source for the oops (or confirm can't reproduce with
kvm/queue RSN).

2019-06-28 08:28:11

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] KVM: LAPIC: inject lapic timer interrupt by posted interrupt

On Thu, 27 Jun 2019 at 00:44, Marcelo Tosatti <[email protected]> wrote:
>
> On Wed, Jun 26, 2019 at 07:02:13PM +0800, Wanpeng Li wrote:
> > On Wed, 26 Jun 2019 at 03:03, Marcelo Tosatti <[email protected]> wrote:
> > >
> > > On Mon, Jun 24, 2019 at 04:53:53PM +0800, Wanpeng Li wrote:
> > > > On Sat, 22 Jun 2019 at 06:11, Marcelo Tosatti <[email protected]> wrote:
> > > > >
> > > > > On Fri, Jun 21, 2019 at 09:42:39AM +0800, Wanpeng Li wrote:
> > > > > > On Thu, 20 Jun 2019 at 05:04, Marcelo Tosatti <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Li,
> > > > > > >
> > > > > > > On Wed, Jun 19, 2019 at 08:36:06AM +0800, Wanpeng Li wrote:
> > > > > > > > On Tue, 18 Jun 2019 at 21:36, Marcelo Tosatti <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 17, 2019 at 07:24:44PM +0800, Wanpeng Li wrote:
> > > > > > > > > > From: Wanpeng Li <[email protected]>
> > > > > > > > > >
> > > > > > > > > > Dedicated instances are currently disturbed by unnecessary jitter due
> > > > > > > > > > to the emulated lapic timers fire on the same pCPUs which vCPUs resident.
> > > > > > > > > > There is no hardware virtual timer on Intel for guest like ARM. Both
> > > > > > > > > > programming timer in guest and the emulated timer fires incur vmexits.
> > > > > > > > > > This patch tries to avoid vmexit which is incurred by the emulated
> > > > > > > > > > timer fires in dedicated instance scenario.
> > > > > > > > > >
> > > > > > > > > > When nohz_full is enabled in dedicated instances scenario, the emulated
> > > > > > > > > > timers can be offload to the nearest busy housekeeping cpus since APICv
> > > > > > > > > > is really common in recent years. The guest timer interrupt is injected
> > > > > > > > > > by posted-interrupt which is delivered by housekeeping cpu once the emulated
> > > > > > > > > > timer fires.
> > > > > > > > > >
> > > > > > > > > > The host admin should fine tuned, e.g. dedicated instances scenario w/
> > > > > > > > > > nohz_full cover the pCPUs which vCPUs resident, several pCPUs surplus
> > > > > > > > > > for busy housekeeping, disable mwait/hlt/pause vmexits to keep in non-root
> > > > > > > > > > mode, ~3% redis performance benefit can be observed on Skylake server.
> > > > > > > > > >
> > > > > > > > > > w/o patch:
> > > > > > > > > >
> > > > > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > > > > > >
> > > > > > > > > > EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% )
> > > > > > > > > >
> > > > > > > > > > w/ patch:
> > > > > > > > > >
> > > > > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
> > > > > > > > > >
> > > > > > > > > > EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% )
> > > > > > > > > >
> > > > > > > > > > Cc: Paolo Bonzini <[email protected]>
> > > > > > > > > > Cc: Radim Krčmář <[email protected]>
> > > > > > > > > > Cc: Marcelo Tosatti <[email protected]>
> > > > > > > > > > Signed-off-by: Wanpeng Li <[email protected]>
> > > > > > > > > > ---
> > > > > > > > > > arch/x86/kvm/lapic.c | 33 ++++++++++++++++++++++++++-------
> > > > > > > > > > arch/x86/kvm/lapic.h | 1 +
> > > > > > > > > > arch/x86/kvm/vmx/vmx.c | 3 ++-
> > > > > > > > > > arch/x86/kvm/x86.c | 5 +++++
> > > > > > > > > > arch/x86/kvm/x86.h | 2 ++
> > > > > > > > > > include/linux/sched/isolation.h | 2 ++
> > > > > > > > > > kernel/sched/isolation.c | 6 ++++++
> > > > > > > > > > 7 files changed, 44 insertions(+), 8 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > > > > > > index 87ecb56..9ceeee5 100644
> > > > > > > > > > --- a/arch/x86/kvm/lapic.c
> > > > > > > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > > > > > > @@ -122,6 +122,13 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
> > > > > > > > > > return apic->vcpu->vcpu_id;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > +bool posted_interrupt_inject_timer(struct kvm_vcpu *vcpu)
> > > > > > > > > > +{
> > > > > > > > > > + return pi_inject_timer && kvm_vcpu_apicv_active(vcpu) &&
> > > > > > > > > > + kvm_hlt_in_guest(vcpu->kvm);
> > > > > > > > > > +}
> > > > > > > > > > +EXPORT_SYMBOL_GPL(posted_interrupt_inject_timer);
> > > > > > > > >
> > > > > > > > > Paolo, can you explain the reasoning behind this?
> > > > > > > > >
> > > > > > > > > Should not be necessary...
> > > > > >
> > > > > > https://lkml.org/lkml/2019/6/5/436 "Here you need to check
> > > > > > kvm_halt_in_guest, not kvm_mwait_in_guest, because you need to go
> > > > > > through kvm_apic_expired if the guest needs to be woken up from
> > > > > > kvm_vcpu_block."
> > > > >
> > > > > Ah, i think he means that a sleeping vcpu (in kvm_vcpu_block) must
> > > > > be woken up, if it receives a timer interrupt.
> > > > >
> > > > > But your patch will go through:
> > > > >
> > > > > kvm_apic_inject_pending_timer_irqs
> > > > > __apic_accept_irq ->
> > > > > vmx_deliver_posted_interrupt ->
> > > > > kvm_vcpu_trigger_posted_interrupt returns false
> > > > > (because vcpu->mode != IN_GUEST_MODE) ->
> > > > > kvm_vcpu_kick
> > > > >
> > > > > Which will wakeup the vcpu.
> > > >
> > > > Hi Marcelo,
> > > >
> > > > >
> > > > > Apart from this oops, which triggers when running:
> > > > > taskset -c 1 ./cyclictest -D 3600 -p 99 -t 1 -h 30 -m -n -i 50000 -b 40
> > > >
> > > > I try both host and guest use latest kvm/queue w/ CONFIG_PREEMPT
> > > > enabled, and expose mwait as your config, however, there is no oops.
> > > > Can you reproduce steadily or encounter casually? Can you reproduce
> > > > w/o the patchset?
> > >
> > > Hi Li,
> >
> > Hi Marcelo,
> >
> > >
> > > Steadily.
> > >
> > > Do you have this as well:
> >
> > w/ or w/o below diff, testing on both SKX and HSW servers on hand, I
> > didn't see any oops. Could you observe the oops disappear when w/o
> > below diff? If the answer is yes, then the oops will not block to
> > merge the patchset since Paolo prefers to add the kvm_hlt_in_guest()
> > condition to guarantee be woken up from kvm_vcpu_block().
>
> He agreed that its not necessary. Removing the HLT in guest widens
> the scope of the patch greatly.
>
> > For the
> > exitless injection if the guest is running(DPDK style workloads that
> > busy-spin on network card) scenarios, we can find a solution later.
>
> What is the use-case for HLT in guest again?

Together w/ mwait/hlt/pause no vmexits for dedicated instances. In
addition, hlt in guest will disable pv qspinlock which is not optimal
for dedicated instance. Refer to commit
b2798ba0b876 (KVM: X86: Choose qspinlock when dedicated physical CPUs
are available) and commit caa057a2cad64 (KVM: X86: Provide a
capability to disable HLT intercepts).

>
> I'll find the source for the oops (or confirm can't reproduce with
> kvm/queue RSN).

Looking forward to it, thanks Marcelo, hope we can catch the upcoming
merge window. :)

Regards,
Wanpeng Li