From: Wanpeng Li
Date: Thu, 20 Dec 2018 14:16:14 +0800
Subject: Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes
To: LKML, kvm
Cc: Paolo Bonzini, Radim Krcmar

kindly ping,

On Fri, 14 Dec 2018 at 15:24, Wanpeng Li wrote:
>
> ping,
> On Thu, 6 Dec 2018 at 15:58, Wanpeng Li wrote:
> >
> > From: Wanpeng Li
> >
> > Last year, engineers from Huawei reported that a call to memory_global_dirty_log_start/stop()
> > takes 13s for 4T of memory and freezes the guest for too long, which pushes the
> > migration downtime past an acceptable level [1] [2].
> >
> > Guangrong pointed out:
> >
> > | collapsible_sptes zaps 4k mappings to make memory-read happy, it is not
> > | required by the semanteme of KVM_SET_USER_MEMORY_REGION and it is not
> > | urgent for vCPU's running, it could be done in a separate thread and use
> > | lock-break technology.
> >
> > [1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05249.html
> > [2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg449994.html
> >
> > Guests with several TB of memory are common now that NVDIMM is deployed in
> > cloud environments. This patch uses a worker thread to zap collapsible sptes,
> > lazily collapsing small sptes into large sptes during the rollback after a
> > live migration fails.
> >
> > Cc: Paolo Bonzini
> > Cc: Radim Krčmář
> > Signed-off-by: Wanpeng Li
> > ---
> >  arch/x86/include/asm/kvm_host.h |  3 +++
> >  arch/x86/kvm/mmu.c              | 37 ++++++++++++++++++++++++++++++++-----
> >  arch/x86/kvm/x86.c              |  4 ++++
> >  3 files changed, 39 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index fbda5a9..dde32f9 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -892,6 +892,8 @@ struct kvm_arch {
> >         u64 master_cycle_now;
> >         struct delayed_work kvmclock_update_work;
> >         struct delayed_work kvmclock_sync_work;
> > +       struct delayed_work kvm_mmu_zap_collapsible_sptes_work;
> > +       bool zap_in_progress;
> >
> >         struct kvm_xen_hvm_config xen_hvm_config;
> >
> > @@ -1247,6 +1249,7 @@ void kvm_mmu_zap_all(struct kvm *kvm);
> >  void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, struct kvm_memslots *slots);
> >  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
> >  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> > +void zap_collapsible_sptes_fn(struct work_struct *work);
> >
> >  int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
> >  bool pdptrs_changed(struct kvm_vcpu *vcpu);
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index 7c03c0f..fe87dd3 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -5679,14 +5679,41 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
> >         return need_tlb_flush;
> >  }
> >
> > +void zap_collapsible_sptes_fn(struct work_struct *work)
> > +{
> > +       struct kvm_memory_slot *memslot;
> > +       struct kvm_memslots *slots;
> > +       struct delayed_work *dwork = to_delayed_work(work);
> > +       struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
> > +                                          kvm_mmu_zap_collapsible_sptes_work);
> > +       struct kvm *kvm = container_of(ka, struct kvm, arch);
> > +       int i;
> > +
> > +       mutex_lock(&kvm->slots_lock);
> > +       for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > +               spin_lock(&kvm->mmu_lock);
> > +               slots = __kvm_memslots(kvm, i);
> > +               kvm_for_each_memslot(memslot, slots) {
> > +                       slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> > +                                        kvm_mmu_zap_collapsible_spte, true);
> > +                       if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> > +                               cond_resched_lock(&kvm->mmu_lock);
> > +               }
> > +               spin_unlock(&kvm->mmu_lock);
> > +       }
> > +       kvm->arch.zap_in_progress = false;
> > +       mutex_unlock(&kvm->slots_lock);
> > +}
> > +
> > +#define KVM_MMU_ZAP_DELAYED (60 * HZ)
> >  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> >                                    const struct kvm_memory_slot *memslot)
> >  {
> > -       /* FIXME: const-ify all uses of struct kvm_memory_slot. */
> > -       spin_lock(&kvm->mmu_lock);
> > -       slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> > -                        kvm_mmu_zap_collapsible_spte, true);
> > -       spin_unlock(&kvm->mmu_lock);
> > +       if (!kvm->arch.zap_in_progress) {
> > +               kvm->arch.zap_in_progress = true;
> > +               schedule_delayed_work(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work,
> > +                                     KVM_MMU_ZAP_DELAYED);
> > +       }
> >  }
> >
> >  void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d029377..c2af289 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9019,6 +9019,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >
> >         INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
> >         INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
> > +       INIT_DELAYED_WORK(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work,
> > +                         zap_collapsible_sptes_fn);
> > +       kvm->arch.zap_in_progress = false;
> >
> >         kvm_hv_init_vm(kvm);
> >         kvm_page_track_init(kvm);
> > @@ -9064,6 +9067,7 @@ void kvm_arch_sync_events(struct kvm *kvm)
> >  {
> >         cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
> >         cancel_delayed_work_sync(&kvm->arch.kvmclock_update_work);
> > +       cancel_delayed_work_sync(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work);
> >         kvm_free_pit(kvm);
> >  }
> >
> > --
> > 2.7.4
> >
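[Editor's note, not part of the patch above: the patch leans on the kernel's
delayed-work API plus an "in progress" flag to coalesce requests and keep the
expensive zap off the memslot-update path. The sketch below shows only that
pattern in isolation, as a hypothetical standalone module; the names zap_ctx,
zap_worker and request_zap are invented for illustration and do not exist in
KVM.]

#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/workqueue.h>

struct zap_ctx {
        struct delayed_work work;       /* deferred, coalesced job */
        bool in_progress;               /* set when scheduled, cleared by the worker */
};

static struct zap_ctx ctx;

static void zap_worker(struct work_struct *work)
{
        struct delayed_work *dwork = to_delayed_work(work);
        struct zap_ctx *c = container_of(dwork, struct zap_ctx, work);

        /* The expensive scan would run here, yielding locks periodically. */
        pr_info("deferred zap ran\n");
        c->in_progress = false;
}

static void request_zap(void)
{
        /* Coalesce requests: keep at most one deferred zap outstanding. */
        if (!ctx.in_progress) {
                ctx.in_progress = true;
                schedule_delayed_work(&ctx.work, 60 * HZ);
        }
}

static int __init zap_demo_init(void)
{
        INIT_DELAYED_WORK(&ctx.work, zap_worker);
        request_zap();
        return 0;
}

static void __exit zap_demo_exit(void)
{
        /* Mirror kvm_arch_sync_events(): never leave the worker running at teardown. */
        cancel_delayed_work_sync(&ctx.work);
}

module_init(zap_demo_init);
module_exit(zap_demo_exit);
MODULE_LICENSE("GPL");

Whether a 60 * HZ delay and a plain (non-atomic) flag are the right choices for
the real KVM path is exactly the kind of question the list review would settle;
the sketch only illustrates the scheduling shape.]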