2020-05-02 14:00:55

by Maxim Levitsky

[permalink] [raw]
Subject: AVIC related warning in enable_irq_window

Hi!

On kernel 5.7-rc3 I get the following warning when I boot a windows 10 VM with AVIC enabled

[ 6702.706124] WARNING: CPU: 0 PID: 118232 at arch/x86/kvm/svm/svm.c:1372 enable_irq_window+0x6a/0xa0 [kvm_amd]
[ 6702.706124] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio vhost_net vhost vhost_iotlb tap kvm_amd kvm irqbypass hfsplus ntfs msdos xfs hidp ccm rfcomm xt_MASQUERADE xt_conntrack
ipt_REJECT iptable_mangle iptable_nat nf_nat ebtable_filter ebtables ip6table_filter ip6_tables tun bridge pmbus ee1004 pmbus_core jc42 cmac bnep sunrpc nls_iso8859_1 nls_cp437 vfat fat dm_mirror
dm_region_hash dm_log nvidia_uvm(O) ucsi_ccg typec_ucsi typec wmi_bmof mxm_wmi iwlmvm mac80211 edac_mce_amd edac_core libarc4 joydev nvidia(PO) iwlwifi btusb btrtl input_leds btbcm btintel
snd_hda_codec_hdmi bluetooth snd_usb_audio cdc_acm xpad snd_hda_intel cfg80211 snd_usbmidi_lib ff_memless snd_intel_dspcfg snd_rawmidi snd_hda_codec mc ecdh_generic snd_hwdep snd_seq ecc snd_hda_core
thunderbolt rfkill i2c_nvidia_gpu pcspkr efi_pstore snd_seq_device bfq snd_pcm snd_timer snd zenpower i2c_piix4 rtc_cmos tpm_crb tpm_tis tpm_tis_core wmi tpm button binfmt_misc ext4 mbcache jbd2
dm_crypt sd_mod uas
[ 6702.706146] usb_storage hid_generic usbhid hid crc32_pclmul amdgpu crc32c_intel gpu_sched ttm drm_kms_helper cfbfillrect ahci syscopyarea libahci cfbimgblt sysfillrect libata sysimgblt igb
fb_sys_fops ccp cfbcopyarea i2c_algo_bit cec rng_core xhci_pci nvme xhci_hcd drm nvme_core drm_panel_orientation_quirks t10_pi it87 hwmon_vid fuse i2c_dev i2c_core ipv6 autofs4 [last unloaded:
irqbypass]
[ 6702.706640] CPU: 0 PID: 118232 Comm: CPU 0/KVM Tainted: P W O 5.7.0-rc3+ #30
[ 6702.706667] Hardware name: Gigabyte Technology Co., Ltd. TRX40 DESIGNARE/TRX40 DESIGNARE, BIOS F4c 03/05/2020
[ 6702.706712] RIP: 0010:enable_irq_window+0x6a/0xa0 [kvm_amd]
[ 6702.706759] Code: 0c 10 48 89 df e8 56 3c 00 00 48 8b 83 c8 2d 00 00 f6 40 0c 10 74 31 48 83 bb f0 02 00 00 00 74 0b 80 bb f8 02 00 00 00 74 02 <0f> 0b 81 48 60 00 01 0f 00 c7 40 64 00 00 00 00 48
8b 83 c8 2d 00
[ 6702.706804] RSP: 0018:ffffc900073fbd48 EFLAGS: 00010202
[ 6702.706847] RAX: ffff889f56a51000 RBX: ffff889f6c4f0000 RCX: 0000000000000000
[ 6702.706871] RDX: 000060603ee37040 RSI: 00000000000000fb RDI: ffff889f6c4f0000
[ 6702.706913] RBP: ffffc900073fbd50 R08: 0000000000000001 R09: 0000000000000000
[ 6702.706959] R10: 000000000000029f R11: 0000000000000018 R12: 0000000000000001
[ 6702.706983] R13: 8000000000000000 R14: ffff889f6c4f0000 R15: ffffc9000729e1a0
[ 6702.707034] FS: 00007f8fa96f9700(0000) GS:ffff889fbe000000(0000) knlGS:0000000000000000
[ 6702.707090] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6702.707114] CR2: 0000000000000000 CR3: 0000001f5d236000 CR4: 0000000000340ef0
[ 6702.707173] Call Trace:
[ 6702.707217] kvm_arch_vcpu_ioctl_run+0x6e3/0x1b50 [kvm]
[ 6702.707273] ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm]
[ 6702.707298] ? _copy_to_user+0x26/0x30
[ 6702.707332] ? kvm_vm_ioctl+0xb3e/0xd90 [kvm]
[ 6702.707374] ? set_next_entity+0x78/0xc0
[ 6702.707407] kvm_vcpu_ioctl+0x236/0x610 [kvm]
[ 6702.707431] ksys_ioctl+0x8a/0xc0
[ 6702.707492] __x64_sys_ioctl+0x1a/0x20
[ 6702.707528] do_syscall_64+0x58/0x210
[ 6702.707553] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 6702.707596] RIP: 0033:0x7f8fb35e235b
[ 6702.707637] Code: 0f 1e fa 48 8b 05 2d 9b 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 9a 0c 00 f7
d8 64 89 01 48
[ 6702.707722] RSP: 002b:00007f8fa96f7728 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 6702.707747] RAX: ffffffffffffffda RBX: 0000556b7c859b90 RCX: 00007f8fb35e235b
[ 6702.707788] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000022
[ 6702.707845] RBP: 00007f8fa96f7820 R08: 0000556b7a6207f0 R09: 00000000000000ff
[ 6702.707868] R10: 0000556b79f86ecd R11: 0000000000000246 R12: 00007ffd9a4264de
[ 6702.707908] R13: 00007ffd9a4264df R14: 00007ffd9a4265a0 R15: 00007f8fa96f7a40
[ 6702.707951] ---[ end trace d8146ba85c79e2a9 ]---


It looks like what happening is that enable_irq_window sometimes disables AVIC and then
calls the svm_enable_vintr to enable intercept on the moment when guest enables interrupts.

The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
kvm_request_apicv_update, which broadcasts the KVM_REQ_APICV_UPDATE vcpu request,
however it doesn't broadcast it to CPU on which now we are running, which seems OK,
because the code that handles that broadcast runs on each VCPU entry, thus
when this CPU will enter guest mode it will notice and disable the AVIC.

However later in svm_enable_vintr, there is test 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
which is still true on current CPU because of the above.

The code containing this warning was added in commit

64b5bd27042639dfcc1534f01771b7b871a02ffe
KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1

The VM is running with SVM nesting disabled (-cpu svm=off), x2apic disabled, hv_synic disabled
(but I will send another mail about this soon), and pit lost tick policy set to discard,
to make AVIC actually be used.

The cpu is 3970X running latest mainline git head. I also tried 'next' branch of the kvm tree
without noticeable differences.

One thing that is curious is that this CPU (probably bios bug doesn't advertise x2apic, but it
is actually there. I added a hack to the kernel to pretend that the corresponding CPUID bit is
set and it actually works. But AVIC doesn't have any support for guest's x2apic thus I disable
x2apic for the guest completely.

Best regards,
Maxim Levitsky



2020-05-02 16:44:32

by Paolo Bonzini

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

On 02/05/20 15:58, Maxim Levitsky wrote:
> The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
> kvm_request_apicv_update, which broadcasts the KVM_REQ_APICV_UPDATE vcpu request,
> however it doesn't broadcast it to CPU on which now we are running, which seems OK,
> because the code that handles that broadcast runs on each VCPU entry, thus
> when this CPU will enter guest mode it will notice and disable the AVIC.
>
> However later in svm_enable_vintr, there is test 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
> which is still true on current CPU because of the above.

Good point! We can just remove the WARN_ON I think. Can you send a patch?

svm_set_vintr also has a rather silly

static void svm_set_vintr(struct vcpu_svm *svm)
{
set_intercept(svm, INTERCEPT_VINTR);
if (is_intercept(svm, INTERCEPT_VINTR))
svm_enable_vintr(svm);
}

so I'm thinking of just inlining svm_enable_vintr and renaming
svm_{set,clear}_vintr to svm_{enable,disable}_vintr_window. Would you
like to send two patches for this, the first to remove the WARN_ON and
the second to do the cleanup?

Thanks,

Paolo

> The code containing this warning was added in commit
>
> 64b5bd27042639dfcc1534f01771b7b871a02ffe
> KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1

2020-05-02 16:47:57

by Maxim Levitsky

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

On Sat, 2020-05-02 at 18:42 +0200, Paolo Bonzini wrote:
> On 02/05/20 15:58, Maxim Levitsky wrote:
> > The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
> > kvm_request_apicv_update, which broadcasts the KVM_REQ_APICV_UPDATE vcpu request,
> > however it doesn't broadcast it to CPU on which now we are running, which seems OK,
> > because the code that handles that broadcast runs on each VCPU entry, thus
> > when this CPU will enter guest mode it will notice and disable the AVIC.
> >
> > However later in svm_enable_vintr, there is test 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
> > which is still true on current CPU because of the above.
>
> Good point! We can just remove the WARN_ON I think. Can you send a patch?
>
> svm_set_vintr also has a rather silly
>
> static void svm_set_vintr(struct vcpu_svm *svm)
> {
> set_intercept(svm, INTERCEPT_VINTR);
> if (is_intercept(svm, INTERCEPT_VINTR))
> svm_enable_vintr(svm);
> }
>
> so I'm thinking of just inlining svm_enable_vintr and renaming
> svm_{set,clear}_vintr to svm_{enable,disable}_vintr_window. Would you
> like to send two patches for this, the first to remove the WARN_ON and
> the second to do the cleanup?

Absolutely! I will send a patch very soon.
Best regards,
Maxim Levitsky

>
> Thanks,
>
> Paolo
>
> > The code containing this warning was added in commit
> >
> > 64b5bd27042639dfcc1534f01771b7b871a02ffe
> > KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1


2020-05-03 19:20:07

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

Maxim / Paolo,

On 5/2/20 11:43 PM, Maxim Levitsky wrote:
> On Sat, 2020-05-02 at 18:42 +0200, Paolo Bonzini wrote:
>> On 02/05/20 15:58, Maxim Levitsky wrote:
>>> The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
>>> kvm_request_apicv_update, which broadcasts the KVM_REQ_APICV_UPDATE vcpu request,
>>> however it doesn't broadcast it to CPU on which now we are running, which seems OK,
>>> because the code that handles that broadcast runs on each VCPU entry, thus
>>> when this CPU will enter guest mode it will notice and disable the AVIC.
>>>
>>> However later in svm_enable_vintr, there is test 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
>>> which is still true on current CPU because of the above.
>>
>> Good point! We can just remove the WARN_ON I think. Can you send a patch?
>>
>> svm_set_vintr also has a rather silly
>>
>> static void svm_set_vintr(struct vcpu_svm *svm)
>> {
>> set_intercept(svm, INTERCEPT_VINTR);
>> if (is_intercept(svm, INTERCEPT_VINTR))
>> svm_enable_vintr(svm);
>> }
>>
>> so I'm thinking of just inlining svm_enable_vintr and renaming
>> svm_{set,clear}_vintr to svm_{enable,disable}_vintr_window. Would you
>> like to send two patches for this, the first to remove the WARN_ON and
>> the second to do the cleanup?
>
> Absolutely! I will send a patch very soon.

I have been debugging this and I have a patch that is supposed to fix this
(instead of removing the WARN ON). Please do not remove the warn on just yet.

Thanks,
Suravee

2020-05-04 09:02:31

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

Paolo / Maxim,

On 5/2/20 11:42 PM, Paolo Bonzini wrote:
> On 02/05/20 15:58, Maxim Levitsky wrote:
>> The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
>> kvm_request_apicv_update, which broadcasts the KVM_REQ_APICV_UPDATE vcpu request,
>> however it doesn't broadcast it to CPU on which now we are running, which seems OK,
>> because the code that handles that broadcast runs on each VCPU entry, thus
>> when this CPU will enter guest mode it will notice and disable the AVIC.
>>
>> However later in svm_enable_vintr, there is test 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
>> which is still true on current CPU because of the above.
> Good point! We can just remove the WARN_ON I think. Can you send a patch?

Instead, as an alternative to remove the WARN_ON(), would it be better to just explicitly
calling kvm_vcpu_update_apicv(vcpu) to update the apicv_active flag right after
kvm_request_apicv_update()?

Thanks,
Suravee

2020-05-04 09:16:11

by Maxim Levitsky

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

On Mon, 2020-05-04 at 15:46 +0700, Suravee Suthikulpanit wrote:
> Paolo / Maxim,
>
> On 5/2/20 11:42 PM, Paolo Bonzini wrote:
> > On 02/05/20 15:58, Maxim Levitsky wrote:
> > > The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
> > > kvm_request_apicv_update, which broadcasts the KVM_REQ_APICV_UPDATE vcpu request,
> > > however it doesn't broadcast it to CPU on which now we are running, which seems OK,
> > > because the code that handles that broadcast runs on each VCPU entry, thus
> > > when this CPU will enter guest mode it will notice and disable the AVIC.
> > >
> > > However later in svm_enable_vintr, there is test 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
> > > which is still true on current CPU because of the above.
> >
> > Good point! We can just remove the WARN_ON I think. Can you send a patch?
>
> Instead, as an alternative to remove the WARN_ON(), would it be better to just explicitly
> calling kvm_vcpu_update_apicv(vcpu) to update the apicv_active flag right after
> kvm_request_apicv_update()?
>
> Thanks,
> Suravee
>

This should work IMHO, other that the fact kvm_vcpu_update_apicv will be called again,
when this vcpu is entered since the KVM_REQ_APICV_UPDATE will still be pending on it.
It shoudn't be a problem, and we can even add a check to do nothing when it is called
while avic is already in target enable state.

Best regards,
Maxim Levitsky

2020-05-04 10:53:15

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

Paolo / Maxim,

On 5/4/20 4:25 PM, Paolo Bonzini wrote:
> On 04/05/20 11:13, Maxim Levitsky wrote:
>> On Mon, 2020-05-04 at 15:46 +0700, Suravee Suthikulpanit wrote:
>>> Paolo / Maxim,
>>>
>>> On 5/2/20 11:42 PM, Paolo Bonzini wrote:
>>>> On 02/05/20 15:58, Maxim Levitsky wrote:
>>>>> The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
>>>>> kvm_request_apicv_update, which broadcasts the KVM_REQ_APICV_UPDATE vcpu request,
>>>>> however it doesn't broadcast it to CPU on which now we are running, which seems OK,
>>>>> because the code that handles that broadcast runs on each VCPU entry, thus
>>>>> when this CPU will enter guest mode it will notice and disable the AVIC.
>>>>>
>>>>> However later in svm_enable_vintr, there is test 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
>>>>> which is still true on current CPU because of the above.
>>>>
>>>> Good point! We can just remove the WARN_ON I think. Can you send a patch?
>>>
>>> Instead, as an alternative to remove the WARN_ON(), would it be better to just explicitly
>>> calling kvm_vcpu_update_apicv(vcpu) to update the apicv_active flag right after
>>> kvm_request_apicv_update()?
>>>
>> This should work IMHO, other that the fact kvm_vcpu_update_apicv will be called again,
>> when this vcpu is entered since the KVM_REQ_APICV_UPDATE will still be pending on it.
>> It shoudn't be a problem, and we can even add a check to do nothing when it is called
>> while avic is already in target enable state.
>
> I thought about that but I think it's a bit confusing. If we want to
> keep the WARN_ON, Maxim can add an equivalent one to svm_vcpu_run, which
> is even better because the invariant is clearer.
>
> WARN_ON((vmcb->control.int_ctl & (AVIC_ENABLE_MASK | V_IRQ_MASK))
> == (AVIC_ENABLE_MASK | V_IRQ_MASK));
>
> Paolo
>

Quick update. I tried your suggestion as following, and it's showing the warning still.
I'll look further into this.

arch/x86/kvm/svm/svm.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2f379ba..142c4b9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1368,9 +1368,6 @@ static inline void svm_enable_vintr(struct vcpu_svm *svm)
{
struct vmcb_control_area *control;

- /* The following fields are ignored when AVIC is enabled */
- WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));
-
/*
* This is just a dummy VINTR to actually cause a vmexit to happen.
* Actual injection of virtual interrupts happens through EVENTINJ.
@@ -3322,6 +3319,11 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
vcpu->arch.apic->lapic_timer.timer_advance_ns)
kvm_wait_lapic_expire(vcpu);

+//SURAVEE
+ WARN_ON((svm->vmcb->control.int_ctl &
+ (AVIC_ENABLE_MASK | V_IRQ_MASK))
+ == (AVIC_ENABLE_MASK | V_IRQ_MASK));
+

Suravee

2020-05-04 10:57:23

by Paolo Bonzini

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

On 04/05/20 12:37, Suravee Suthikulpanit wrote:
> Paolo / Maxim,
>
> On 5/4/20 4:25 PM, Paolo Bonzini wrote:
>> On 04/05/20 11:13, Maxim Levitsky wrote:
>>> On Mon, 2020-05-04 at 15:46 +0700, Suravee Suthikulpanit wrote:
>>>> Paolo / Maxim,
>>>>
>>>> On 5/2/20 11:42 PM, Paolo Bonzini wrote:
>>>>> On 02/05/20 15:58, Maxim Levitsky wrote:
>>>>>> The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
>>>>>> kvm_request_apicv_update, which broadcasts the
>>>>>> KVM_REQ_APICV_UPDATE vcpu request,
>>>>>> however it doesn't broadcast it to CPU on which now we are
>>>>>> running, which seems OK,
>>>>>> because the code that handles that broadcast runs on each VCPU
>>>>>> entry, thus
>>>>>> when this CPU will enter guest mode it will notice and disable the
>>>>>> AVIC.
>>>>>>
>>>>>> However later in svm_enable_vintr, there is test
>>>>>> 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
>>>>>> which is still true on current CPU because of the above.
>>>>>
>>>>> Good point!  We can just remove the WARN_ON I think.  Can you send
>>>>> a patch?
>>>>
>>>> Instead, as an alternative to remove the WARN_ON(), would it be
>>>> better to just explicitly
>>>> calling kvm_vcpu_update_apicv(vcpu) to update the apicv_active flag
>>>> right after
>>>> kvm_request_apicv_update()?
>>>>
>>> This should work IMHO, other that the fact kvm_vcpu_update_apicv will
>>> be called again,
>>> when this vcpu is entered since the KVM_REQ_APICV_UPDATE will still
>>> be pending on it.
>>> It shoudn't be a problem, and we can even add a check to do nothing
>>> when it is called
>>> while avic is already in target enable state.
>>
>> I thought about that but I think it's a bit confusing.  If we want to
>> keep the WARN_ON, Maxim can add an equivalent one to svm_vcpu_run, which
>> is even better because the invariant is clearer.
>>
>> WARN_ON((vmcb->control.int_ctl & (AVIC_ENABLE_MASK | V_IRQ_MASK))
>>     == (AVIC_ENABLE_MASK | V_IRQ_MASK));
>>
>> Paolo
>>
>
> Quick update. I tried your suggestion as following, and it's showing the
> warning still.
> I'll look further into this.

Ok, thanks. By the way, there is another possible cleanup: the clearing
of V_IRQ_MASK can be removed from interrupt_window_interception since it
has already called svm_clear_vintr.

Paolo

>  arch/x86/kvm/svm/svm.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 2f379ba..142c4b9 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1368,9 +1368,6 @@ static inline void svm_enable_vintr(struct
> vcpu_svm *svm)
>  {
>         struct vmcb_control_area *control;
>
> -       /* The following fields are ignored when AVIC is enabled */
> -       WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));
> -
>         /*
>          * This is just a dummy VINTR to actually cause a vmexit to happen.
>          * Actual injection of virtual interrupts happens through EVENTINJ.
> @@ -3322,6 +3319,11 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>                 vcpu->arch.apic->lapic_timer.timer_advance_ns)
>                 kvm_wait_lapic_expire(vcpu);
>
> +//SURAVEE
> +       WARN_ON((svm->vmcb->control.int_ctl &
> +                (AVIC_ENABLE_MASK | V_IRQ_MASK))
> +                == (AVIC_ENABLE_MASK | V_IRQ_MASK));
> +
>
> Suravee
>

2020-05-04 12:39:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

On 04/05/20 11:13, Maxim Levitsky wrote:
> On Mon, 2020-05-04 at 15:46 +0700, Suravee Suthikulpanit wrote:
>> Paolo / Maxim,
>>
>> On 5/2/20 11:42 PM, Paolo Bonzini wrote:
>>> On 02/05/20 15:58, Maxim Levitsky wrote:
>>>> The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
>>>> kvm_request_apicv_update, which broadcasts the KVM_REQ_APICV_UPDATE vcpu request,
>>>> however it doesn't broadcast it to CPU on which now we are running, which seems OK,
>>>> because the code that handles that broadcast runs on each VCPU entry, thus
>>>> when this CPU will enter guest mode it will notice and disable the AVIC.
>>>>
>>>> However later in svm_enable_vintr, there is test 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
>>>> which is still true on current CPU because of the above.
>>>
>>> Good point! We can just remove the WARN_ON I think. Can you send a patch?
>>
>> Instead, as an alternative to remove the WARN_ON(), would it be better to just explicitly
>> calling kvm_vcpu_update_apicv(vcpu) to update the apicv_active flag right after
>> kvm_request_apicv_update()?
>>
> This should work IMHO, other that the fact kvm_vcpu_update_apicv will be called again,
> when this vcpu is entered since the KVM_REQ_APICV_UPDATE will still be pending on it.
> It shoudn't be a problem, and we can even add a check to do nothing when it is called
> while avic is already in target enable state.

I thought about that but I think it's a bit confusing. If we want to
keep the WARN_ON, Maxim can add an equivalent one to svm_vcpu_run, which
is even better because the invariant is clearer.

WARN_ON((vmcb->control.int_ctl & (AVIC_ENABLE_MASK | V_IRQ_MASK))
== (AVIC_ENABLE_MASK | V_IRQ_MASK));

Paolo

2020-05-05 08:00:09

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

Paolo / Maxim,

On 5/4/20 5:49 PM, Paolo Bonzini wrote:
> On 04/05/20 12:37, Suravee Suthikulpanit wrote:
>> On 5/4/20 4:25 PM, Paolo Bonzini wrote:
>>> On 04/05/20 11:13, Maxim Levitsky wrote:
>>>> On Mon, 2020-05-04 at 15:46 +0700, Suravee Suthikulpanit wrote:
>>>>> On 5/2/20 11:42 PM, Paolo Bonzini wrote:
>>>>>> On 02/05/20 15:58, Maxim Levitsky wrote:
>>>>>>> The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
>>>>>>> kvm_request_apicv_update, which broadcasts the
>>>>>>> KVM_REQ_APICV_UPDATE vcpu request,
>>>>>>> however it doesn't broadcast it to CPU on which now we are
>>>>>>> running, which seems OK,
>>>>>>> because the code that handles that broadcast runs on each VCPU
>>>>>>> entry, thus when this CPU will enter guest mode it will notice
>>>>>>> and disable the AVIC. >>>>>>>
>>>>>>> However later in svm_enable_vintr, there is test
>>>>>>> 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
>>>>>>> which is still true on current CPU because of the above.
>>>>>>
>>>>>> Good point!  We can just remove the WARN_ON I think.  Can you send
>>>>>> a patch?
>>>>>>
>>>>> Instead, as an alternative to remove the WARN_ON(), would it be
>>>>> better to just explicitly calling kvm_vcpu_update_apicv(vcpu)
>>>>> to update the apicv_active flag right after kvm_request_apicv_update()?
>>>>>
>>>> This should work IMHO, other that the fact kvm_vcpu_update_apicv will
>>>> be called again, when this vcpu is entered since the KVM_REQ_APICV_UPDATE
>>>> will still be pending on it.
>>>> It shoudn't be a problem, and we can even add a check to do nothing
>>>> when it is called while avic is already in target enable state.
>>>
>>> I thought about that but I think it's a bit confusing.  If we want to
>>> keep the WARN_ON, Maxim can add an equivalent one to svm_vcpu_run, which
>>> is even better because the invariant is clearer.
>>>
>>> WARN_ON((vmcb->control.int_ctl & (AVIC_ENABLE_MASK | V_IRQ_MASK))
>>>     == (AVIC_ENABLE_MASK | V_IRQ_MASK));
>>>

Based on my experiment, it seems that the hardware sets the V_IRQ_MASK bit
when #VMEXIT despite this bit being ignored when AVIC is enabled.
(I'll double check w/ HW team on this.) In this case, I don't think we can
use the WARN_ON() as suggested above.

I think we should keep the warning in the svm_set_vintr() since we want to know
if the V_IRQ, V_INTR_PRIO, V_IGN_TPR, and V_INTR_VECTOR are ignored when calling
svm_set_vintr().

Instead, I would consider explicitly call kvm_vcpu_update_apicv() since it would
be benefit from not having to wait for the next vcpu_enter_guest for this vcpu to process
the request. This is less confusing to me. In this case, we would need to
kvm_clear_request(KVM_REQ_APICV_UPDATE) for this vcpu as well.

On the other hand, would be it useful to implement kvm_make_all_cpus_request_but_self(),
which sends request to all other vcpus excluding itself?

> By the way, there is another possible cleanup: the clearing
> of V_IRQ_MASK can be removed from interrupt_window_interception since it
> has already called svm_clear_vintr.

Maxim, I can help with the clean up patches if you would prefer.

Thanks,
Suravee


> Paolo
>

2020-05-05 11:42:40

by Maxim Levitsky

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

On Tue, 2020-05-05 at 14:55 +0700, Suravee Suthikulpanit wrote:
> Paolo / Maxim,
>
> On 5/4/20 5:49 PM, Paolo Bonzini wrote:
> > On 04/05/20 12:37, Suravee Suthikulpanit wrote:
> > > On 5/4/20 4:25 PM, Paolo Bonzini wrote:
> > > > On 04/05/20 11:13, Maxim Levitsky wrote:
> > > > > On Mon, 2020-05-04 at 15:46 +0700, Suravee Suthikulpanit wrote:
> > > > > > On 5/2/20 11:42 PM, Paolo Bonzini wrote:
> > > > > > > On 02/05/20 15:58, Maxim Levitsky wrote:
> > > > > > > > The AVIC is disabled by svm_toggle_avic_for_irq_window, which calls
> > > > > > > > kvm_request_apicv_update, which broadcasts the
> > > > > > > > KVM_REQ_APICV_UPDATE vcpu request,
> > > > > > > > however it doesn't broadcast it to CPU on which now we are
> > > > > > > > running, which seems OK,
> > > > > > > > because the code that handles that broadcast runs on each VCPU
> > > > > > > > entry, thus when this CPU will enter guest mode it will notice
> > > > > > > > and disable the AVIC. >>>>>>>
> > > > > > > > However later in svm_enable_vintr, there is test
> > > > > > > > 'WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));'
> > > > > > > > which is still true on current CPU because of the above.
> > > > > > >
> > > > > > > Good point! We can just remove the WARN_ON I think. Can you send
> > > > > > > a patch?
> > > > > > >
> > > > > >
> > > > > > Instead, as an alternative to remove the WARN_ON(), would it be
> > > > > > better to just explicitly calling kvm_vcpu_update_apicv(vcpu)
> > > > > > to update the apicv_active flag right after kvm_request_apicv_update()?
> > > > > >
> > > > >
> > > > > This should work IMHO, other that the fact kvm_vcpu_update_apicv will
> > > > > be called again, when this vcpu is entered since the KVM_REQ_APICV_UPDATE
> > > > > will still be pending on it.
> > > > > It shoudn't be a problem, and we can even add a check to do nothing
> > > > > when it is called while avic is already in target enable state.
> > > >
> > > > I thought about that but I think it's a bit confusing. If we want to
> > > > keep the WARN_ON, Maxim can add an equivalent one to svm_vcpu_run, which
> > > > is even better because the invariant is clearer.
> > > >
> > > > WARN_ON((vmcb->control.int_ctl & (AVIC_ENABLE_MASK | V_IRQ_MASK))
> > > > == (AVIC_ENABLE_MASK | V_IRQ_MASK));
> > > >
>
> Based on my experiment, it seems that the hardware sets the V_IRQ_MASK bit
> when #VMEXIT despite this bit being ignored when AVIC is enabled.
> (I'll double check w/ HW team on this.) In this case, I don't think we can
> use the WARN_ON() as suggested above.
>
> I think we should keep the warning in the svm_set_vintr() since we want to know
> if the V_IRQ, V_INTR_PRIO, V_IGN_TPR, and V_INTR_VECTOR are ignored when calling
> svm_set_vintr().
>
> Instead, I would consider explicitly call kvm_vcpu_update_apicv() since it would
> be benefit from not having to wait for the next vcpu_enter_guest for this vcpu to process
> the request. This is less confusing to me. In this case, we would need to
> kvm_clear_request(KVM_REQ_APICV_UPDATE) for this vcpu as well.
>
> On the other hand, would be it useful to implement kvm_make_all_cpus_request_but_self(),
> which sends request to all other corpus excluding itself?
>
> > By the way, there is another possible cleanup: the clearing
> > of V_IRQ_MASK can be removed from interrupt_window_interception since it
> > has already called svm_clear_vintr.
>
> Maxim, I can help with the clean up patches if you would prefer.

I currently am waiting for the decision on how to we are going to fix this.
I don't have a strong opinion on how to fix this, but at least I think that we know
what is going on.

Initially I was thinking that something was broken in AVIC, especially when I noticed
that guest would hang when I did LatencyMon benchmark in it.
Luckily the other fix that I tested and reviewed seems to fix those hangs.

In a few days I plan to do some nvme passthrough stress testing as I used to do when I was
developing my nvme-mdev driver with AVIC. I am very curios on how this will turn out.

Best regards,
Maxim Levitsky

>
> Thanks,
> Suravee
>
>
> > Paolo
> >
>
>


2020-05-05 12:14:51

by Paolo Bonzini

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window

On 05/05/20 09:55, Suravee Suthikulpanit wrote:
>> WARN_ON((vmcb->control.int_ctl & (AVIC_ENABLE_MASK | V_IRQ_MASK))
>> == (AVIC_ENABLE_MASK | V_IRQ_MASK));
>
> Based on my experiment, it seems that the hardware sets the V_IRQ_MASK bit
> when #VMEXIT despite this bit being ignored when AVIC is enabled.
> (I'll double check w/ HW team on this.) In this case, I don't think we can
> use the WARN_ON() as suggested above.

Indeed this is even documented:

NOTE: This value is written back to the VMCB at #VMEXIT.
This field is ignored on VMRUN when AVIC is
enabled.

> I think we should keep the warning in the svm_set_vintr() since we want
> to know if the V_IRQ, V_INTR_PRIO, V_IGN_TPR, and V_INTR_VECTOR are ignored when
> calling svm_set_vintr().
>
> Instead, I would consider explicitly call kvm_vcpu_update_apicv() since
> it would be benefit from not having to wait for the next vcpu_enter_guest for
> this vcpu to process the request. This is less confusing to me. In this case,
> we would need to kvm_clear_request(KVM_REQ_APICV_UPDATE) for this vcpu as well.
>
> On the other hand, would be it useful to implement
> kvm_make_all_cpus_request_but_self(),
> which sends request to all other vcpus excluding itself?

Yes, that's also a possibility. It's not too much extra complication if
we add a new argument to kvm_make_vcpus_request_mask, like this:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 74bdb7bf3295..8f9dadb1ef42 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -258,7 +258,7 @@ static inline bool kvm_kick_many_cpus(const struct cpumask *cpus, bool wait)
return true;
}

-bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
+bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req, struct kvm_vcpu *except,
unsigned long *vcpu_bitmap, cpumask_var_t tmp)
{
int i, cpu, me;
@@ -270,6 +270,8 @@ bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
kvm_for_each_vcpu(i, vcpu, kvm) {
if (vcpu_bitmap && !test_bit(i, vcpu_bitmap))
continue;
+ if (vcpu == except)
+ continue;

kvm_make_request(req, vcpu);
cpu = vcpu->cpu;


Paolo

2020-05-06 00:45:18

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: AVIC related warning in enable_irq_window



On 5/5/20 7:12 PM, Paolo Bonzini wrote:
> On 05/05/20 09:55, Suravee Suthikulpanit wrote:
>> On the other hand, would be it useful to implement
>> kvm_make_all_cpus_request_but_self(),
>> which sends request to all other vcpus excluding itself?
>
> Yes, that's also a possibility. It's not too much extra complication if
> we add a new argument to kvm_make_vcpus_request_mask, like this:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 74bdb7bf3295..8f9dadb1ef42 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -258,7 +258,7 @@ static inline bool kvm_kick_many_cpus(const struct cpumask *cpus, bool wait)
> return true;
> }
>
> -bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
> +bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req, struct kvm_vcpu *except,
> unsigned long *vcpu_bitmap, cpumask_var_t tmp)
> {
> int i, cpu, me;
> @@ -270,6 +270,8 @@ bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
> kvm_for_each_vcpu(i, vcpu, kvm) {
> if (vcpu_bitmap && !test_bit(i, vcpu_bitmap))
> continue;
> + if (vcpu == except)
> + continue;
>
> kvm_make_request(req, vcpu);
> cpu = vcpu->cpu;
>
>
> Paolo
>

Sounds good. I'll take care of this today.

Suravee