From: Wanpeng Li
Date: Fri, 14 Dec 2018 15:24:01 +0800
Subject: Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes
To: LKML, kvm
Cc: Paolo Bonzini, Radim Krcmar
In-Reply-To: <1544083089-13000-1-git-send-email-wanpengli@tencent.com>
References: <1544083089-13000-1-git-send-email-wanpengli@tencent.com>
X-Mailing-List: linux-kernel@vger.kernel.org

ping,

On Thu, 6 Dec 2018 at 15:58, Wanpeng Li wrote:
>
> From: Wanpeng Li
>
> Last year, engineers from Huawei reported that a single call to
> memory_global_dirty_log_start/stop() takes 13s for a 4T-memory guest,
> freezing the guest for so long that it pushes live-migration downtime
> past acceptable limits. [1] [2]
>
> Guangrong pointed out:
>
> | collapsible_sptes zaps 4k mappings to make memory-read happy, it is not
> | required by the semanteme of KVM_SET_USER_MEMORY_REGION and it is not
> | urgent for vCPU's running, it could be done in a separate thread and use
> | lock-break technology.
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05249.html
> [2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg449994.html
>
> Guests with several TB of memory are common now that NVDIMM is deployed
> in cloud environments. This patch uses a worker thread to zap collapsible
> sptes, lazily collapsing small sptes back into large sptes during the
> roll-back after a live migration fails.
>
> Cc: Paolo Bonzini
> Cc: Radim Krčmář
> Signed-off-by: Wanpeng Li
> ---
>  arch/x86/include/asm/kvm_host.h |  3 +++
>  arch/x86/kvm/mmu.c              | 37 ++++++++++++++++++++++++++++++++-----
>  arch/x86/kvm/x86.c              |  4 ++++
>  3 files changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index fbda5a9..dde32f9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -892,6 +892,8 @@ struct kvm_arch {
>  	u64 master_cycle_now;
>  	struct delayed_work kvmclock_update_work;
>  	struct delayed_work kvmclock_sync_work;
> +	struct delayed_work kvm_mmu_zap_collapsible_sptes_work;
> +	bool zap_in_progress;
>
>  	struct kvm_xen_hvm_config xen_hvm_config;
>
> @@ -1247,6 +1249,7 @@ void kvm_mmu_zap_all(struct kvm *kvm);
>  void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, struct kvm_memslots *slots);
>  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> +void zap_collapsible_sptes_fn(struct work_struct *work);
>
>  int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
>  bool pdptrs_changed(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 7c03c0f..fe87dd3 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -5679,14 +5679,41 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>  	return need_tlb_flush;
>  }
>
> +void zap_collapsible_sptes_fn(struct work_struct *work)
> +{
> +	struct kvm_memory_slot *memslot;
> +	struct kvm_memslots *slots;
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
> +					   kvm_mmu_zap_collapsible_sptes_work);
> +	struct kvm *kvm = container_of(ka, struct kvm, arch);
> +	int i;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> +		spin_lock(&kvm->mmu_lock);
> +		slots = __kvm_memslots(kvm, i);
> +		kvm_for_each_memslot(memslot, slots) {
> +			slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> +					 kvm_mmu_zap_collapsible_spte, true);
> +			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +				cond_resched_lock(&kvm->mmu_lock);
> +		}
> +		spin_unlock(&kvm->mmu_lock);
> +	}
> +	kvm->arch.zap_in_progress = false;
> +	mutex_unlock(&kvm->slots_lock);
> +}
> +
> +#define KVM_MMU_ZAP_DELAYED (60 * HZ)
>  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>  				   const struct kvm_memory_slot *memslot)
>  {
> -	/* FIXME: const-ify all uses of struct kvm_memory_slot. */
> -	spin_lock(&kvm->mmu_lock);
> -	slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> -			 kvm_mmu_zap_collapsible_spte, true);
> -	spin_unlock(&kvm->mmu_lock);
> +	if (!kvm->arch.zap_in_progress) {
> +		kvm->arch.zap_in_progress = true;
> +		schedule_delayed_work(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work,
> +				      KVM_MMU_ZAP_DELAYED);
> +	}
>  }
>
>  void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d029377..c2af289 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9019,6 +9019,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>
>  	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
>  	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
> +	INIT_DELAYED_WORK(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work,
> +			  zap_collapsible_sptes_fn);
> +	kvm->arch.zap_in_progress = false;
>
>  	kvm_hv_init_vm(kvm);
>  	kvm_page_track_init(kvm);
> @@ -9064,6 +9067,7 @@ void kvm_arch_sync_events(struct kvm *kvm)
>  {
>  	cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
>  	cancel_delayed_work_sync(&kvm->arch.kvmclock_update_work);
> +	cancel_delayed_work_sync(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work);
>  	kvm_free_pit(kvm);
>  }
>
> --
> 2.7.4
>