Date:   Thu, 16 Dec 2021 15:42:35 +0000
From:   Sean Christopherson <seanjc@google.com>
To:     "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)" 
        <longpeng2@huawei.com>
Cc:     "pbonzini@redhat.com" <pbonzini@redhat.com>,
        "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
        "Gonglei (Arei)" <arei.gonglei@huawei.com>,
        Huangzhichao <huangzhichao@huawei.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>,
        linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: The vcpu won't be wakened for a long time
Message-ID: <Ybtea42RxZ9aVzCh@google.com>
References: <73d46f3cc46a499c8e39fdf704b2deaf@huawei.com>
 <YbjWFTtNo9Ap7kDp@google.com>
 <9e5aef1ae0c141e49c2b1d19692b9295@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <9e5aef1ae0c141e49c2b1d19692b9295@huawei.com>
Precedence: bulk

On Thu, Dec 16, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > What kernel version?  There have been a variety of fixes/changes in the
> > area in recent kernels.
> 
> The kernel version is 4.18, and it seems the latest kernel also has this problem.
> 
> The following code fixes this bug; I've tested it on 4.18.
> 
> (4.18)
> 
> @@ -3944,6 +3944,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>         if (pi_test_and_set_on(&vmx->pi_desc))
>                 return;
>  
> +       if (swq_has_sleeper(kvm_arch_vcpu_wq(vcpu))) {
> +               kvm_vcpu_kick(vcpu);
> +               return;
> +       }
> +
>         if (vcpu != kvm_get_running_vcpu() &&
>                 !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
>                 kvm_vcpu_kick(vcpu);
> 
> 
> (latest)
> 
> @@ -3959,6 +3959,11 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>         if (pi_test_and_set_on(&vmx->pi_desc))
>                 return 0;
>  
> +       if (rcuwait_active(&vcpu->wait)) {
> +               kvm_vcpu_kick(vcpu);
> +               return 0;
> +       }
> +
>         if (vcpu != kvm_get_running_vcpu() &&
>             !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
>                 kvm_vcpu_kick(vcpu);
> 
> Do you have any suggestions?

Hmm, that strongly suggests the "vcpu != kvm_get_running_vcpu()" check is at
fault.  Can you try running with the commit below?  It's currently sitting in
kvm/queue, but not marked for stable because I didn't think it was possible for
the check to cause a missed wake event in KVM's current code base.

commit 6a8110fea2c1b19711ac1ef718680dfd940363c6
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Dec 8 01:52:27 2021 +0000

    KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU

    Drop a check that guards triggering a posted interrupt on the currently
    running vCPU, and more importantly guards waking the target vCPU if
    triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
    The "do nothing" logic when "vcpu == running_vcpu" works only because KVM
    doesn't have a path to ->deliver_posted_interrupt() from asynchronous
    context.  E.g. if apic_timer_expired() were changed to always go down the
    posted interrupt path for APICv, or if the IN_GUEST_MODE check in
    kvm_use_posted_timer_interrupt() were dropped, and the hrtimer fired in
    kvm_vcpu_block() after the final kvm_vcpu_check_block() check, the vCPU
    would be scheduled out without being awakened, i.e. would "miss" the
    timer interrupt.

    One could argue that invoking kvm_apic_local_deliver() from (soft) IRQ
    context for the current running vCPU should be illegal, but nothing in
    KVM actually enforces that rule.  There's also no strong obvious benefit
    to making such behavior illegal, e.g. checking IN_GUEST_MODE and calling
    kvm_vcpu_wake_up() is at worst marginally more costly than querying the
    current running vCPU.

    Lastly, this aligns the non-nested and nested usage of triggering posted
    interrupts, and will allow for additional cleanups.

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20211208015236.1616697-18-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 38749063da0e..f61a6348cffd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3995,8 +3995,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
         * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
         * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
         */
-       if (vcpu != kvm_get_running_vcpu() &&
-           !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
+       if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
                kvm_vcpu_wake_up(vcpu);

        return 0;
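
To make the difference concrete, here is a rough userspace model of the two
delivery paths.  It is only a sketch: every toy_* name is made up for
illustration and is not kernel API, "woken" stands in for kvm_vcpu_wake_up(),
and the model assumes the only inputs that matter are vcpu->mode, PID.ON and
whether the target is the currently running vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for KVM state; all toy_* names are hypothetical. */
enum toy_mode { TOY_OUTSIDE_GUEST_MODE, TOY_IN_GUEST_MODE };

struct toy_vcpu {
	enum toy_mode mode;
	bool pid_on;	/* models PID.ON in the posted-interrupt descriptor */
	bool woken;	/* set where the real code would call kvm_vcpu_wake_up() */
};

/* Models kvm_vcpu_trigger_posted_interrupt(): "fails" outside guest mode. */
static bool toy_trigger_posted_interrupt(const struct toy_vcpu *vcpu)
{
	return vcpu->mode == TOY_IN_GUEST_MODE;
}

/* Pre-patch logic: nothing happens when the target is the running vCPU. */
static void toy_deliver_old(struct toy_vcpu *vcpu,
			    const struct toy_vcpu *running)
{
	assert(vcpu != NULL);

	/* Models pi_test_and_set_on(): bail if ON was already set. */
	if (vcpu->pid_on)
		return;
	vcpu->pid_on = true;

	if (vcpu != running && !toy_trigger_posted_interrupt(vcpu))
		vcpu->woken = true;
}

/* Post-patch logic: wake whenever triggering the posted interrupt fails. */
static void toy_deliver_new(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);

	if (vcpu->pid_on)
		return;
	vcpu->pid_on = true;

	if (!toy_trigger_posted_interrupt(vcpu))
		vcpu->woken = true;
}
```

In this model, calling toy_deliver_old(&v, &v) on a vCPU that is outside guest
mode sets pid_on but leaves woken clear, which is the "missed" wake described
in the changelog.  toy_deliver_new(&v) on the same starting state sets both.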