From: Wanpeng Li
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini, Radim Krčmář
Subject: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes
Date: Thu, 6 Dec 2018 15:58:09 +0800
Message-Id: <1544083089-13000-1-git-send-email-wanpengli@tencent.com>
X-Mailer: git-send-email 2.7.4

From: Wanpeng Li

Last year, engineers from Huawei reported that a call to
memory_global_dirty_log_start/stop() takes 13 seconds for a guest with
4T of memory, freezing the guest for so long that it pushes migration
downtime past acceptable limits. [1] [2] Guangrong pointed out:

| collapsible_sptes zaps 4k mappings to make memory-read happy, it is not
| required by the semanteme of KVM_SET_USER_MEMORY_REGION and it is not
| urgent for vCPU's running, it could be done in a separate thread and use
| lock-break technology.
[1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05249.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg449994.html

Guests with several TB of memory are common now that NVDIMM is deployed
in cloud environments. This patch uses a worker thread to zap
collapsible sptes, lazily collapsing small sptes back into large sptes
during rollback after a failed live migration.

Cc: Paolo Bonzini
Cc: Radim Krčmář
Signed-off-by: Wanpeng Li
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/mmu.c              | 37 ++++++++++++++++++++++++++++++-----
 arch/x86/kvm/x86.c              |  4 ++++
 3 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fbda5a9..dde32f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -892,6 +892,8 @@ struct kvm_arch {
 	u64 master_cycle_now;
 	struct delayed_work kvmclock_update_work;
 	struct delayed_work kvmclock_sync_work;
+	struct delayed_work kvm_mmu_zap_collapsible_sptes_work;
+	bool zap_in_progress;
 
 	struct kvm_xen_hvm_config xen_hvm_config;
@@ -1247,6 +1249,7 @@ void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, struct kvm_memslots *slots);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
+void zap_collapsible_sptes_fn(struct work_struct *work);
 
 int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
 bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7c03c0f..fe87dd3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5679,14 +5679,41 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 	return need_tlb_flush;
 }
 
+void zap_collapsible_sptes_fn(struct work_struct *work)
+{
+	struct kvm_memory_slot *memslot;
+	struct kvm_memslots *slots;
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
+					   kvm_mmu_zap_collapsible_sptes_work);
+	struct kvm *kvm = container_of(ka, struct kvm, arch);
+	int i;
+
+	mutex_lock(&kvm->slots_lock);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		spin_lock(&kvm->mmu_lock);
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot(memslot, slots) {
+			slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
+					 kvm_mmu_zap_collapsible_spte, true);
+			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+				cond_resched_lock(&kvm->mmu_lock);
+		}
+		spin_unlock(&kvm->mmu_lock);
+	}
+	kvm->arch.zap_in_progress = false;
+	mutex_unlock(&kvm->slots_lock);
+}
+
+#define KVM_MMU_ZAP_DELAYED (60 * HZ)
+
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot)
 {
-	/* FIXME: const-ify all uses of struct kvm_memory_slot. */
-	spin_lock(&kvm->mmu_lock);
-	slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
-			 kvm_mmu_zap_collapsible_spte, true);
-	spin_unlock(&kvm->mmu_lock);
+	if (!kvm->arch.zap_in_progress) {
+		kvm->arch.zap_in_progress = true;
+		schedule_delayed_work(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work,
+				      KVM_MMU_ZAP_DELAYED);
+	}
 }
 
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d029377..c2af289 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9019,6 +9019,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
+	INIT_DELAYED_WORK(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work,
+			  zap_collapsible_sptes_fn);
+	kvm->arch.zap_in_progress = false;
 
 	kvm_hv_init_vm(kvm);
 	kvm_page_track_init(kvm);
@@ -9064,6 +9067,7 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 	cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
 	cancel_delayed_work_sync(&kvm->arch.kvmclock_update_work);
+	cancel_delayed_work_sync(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work);
 	kvm_free_pit(kvm);
 }
-- 
2.7.4