From: Wanpeng Li
Date: Wed, 19 Feb 2020 08:32:26 +0800
Subject: Re: [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots
To: Vitaly Kuznetsov
Cc: Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel, LKML, kvm
References: <1581988630-19182-1-git-send-email-wanpengli@tencent.com> <87r1ys7xpk.fsf@vitty.brq.redhat.com>
In-Reply-To: <87r1ys7xpk.fsf@vitty.brq.redhat.com>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 18 Feb 2020 at 22:54, Vitaly Kuznetsov wrote:
>
> Wanpeng Li writes:
>
> > From: Wanpeng Li
> >
> > During vCPU creation, a kvmclock sync worker is queued to the global
> > workqueue before each vCPU's creation completes. Each worker is
> > scheduled after a 300 * HZ delay, requests a kvmclock update for all
> > vCPUs, and kicks them out. This gets especially bad when scaling to
> > large VMs because of the many resulting vmexits. A single worker,
> > acting as a leader that triggers the kvmclock sync request for all
> > vCPUs, is enough.
> >
> > Signed-off-by: Wanpeng Li
> > ---
> > v3 -> v4:
> >  * check vcpu->vcpu_idx
> >
> >  arch/x86/kvm/x86.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index fb5d64e..d0ba2d4 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9390,8 +9390,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
> >  	if (!kvmclock_periodic_sync)
> >  		return;
> >
> > -	schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> > -			      KVMCLOCK_SYNC_PERIOD);
> > +	if (vcpu->vcpu_idx == 0)
> > +		schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> > +				      KVMCLOCK_SYNC_PERIOD);
> >  }
> >
> >  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>
> Forgive my ignorance, but I was under the impression that
> schedule_delayed_work() doesn't do anything if the work is already
> queued (see queue_delayed_work_on()), and here we seem to be scheduling
> the same work (&kvm->arch.kvmclock_sync_work), which is per-kvm, not
> per-vcpu. Do we actually finish executing it before the next vCPU is
> created, or why does the storm you describe happen?

You're right, I missed that. OK, let's just take patch 2/2 upstream.

    Wanpeng