Date: Thu, 20 Dec 2018 15:43:45 +0100
From: Radim Krčmář
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini
Subject: Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes
Message-ID: <20181220144345.GB19579@flask>
References: <1544083089-13000-1-git-send-email-wanpengli@tencent.com>
In-Reply-To: <1544083089-13000-1-git-send-email-wanpengli@tencent.com>
2018-12-06 15:58+0800, Wanpeng Li:
> From: Wanpeng Li
>
> Last year, engineers from Huawei reported that a call to
> memory_global_dirty_log_start/stop() takes 13s for a guest with 4T of
> memory and freezes the guest for too long, adding unacceptable
> migration downtime. [1] [2]
>
> Guangrong pointed out:
>
> | collapsible_sptes zaps 4k mappings to make memory-read happy, it is not
> | required by the semanteme of KVM_SET_USER_MEMORY_REGION and it is not
> | urgent for vCPU's running, it could be done in a separate thread and use
> | lock-break technology.
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05249.html
> [2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg449994.html
>
> Guests with several TB of memory are common now that NVDIMM is
> deployed in cloud environments. This patch uses a worker thread to zap
> collapsible sptes, so that small sptes are lazily collapsed into large
> sptes during the rollback after a live migration fails.
>
> Cc: Paolo Bonzini
> Cc: Radim Krčmář
> Signed-off-by: Wanpeng Li
> ---
> @@ -5679,14 +5679,41 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>  	return need_tlb_flush;
>  }
>  
> +void zap_collapsible_sptes_fn(struct work_struct *work)
> +{
> +	struct kvm_memory_slot *memslot;
> +	struct kvm_memslots *slots;
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
> +				kvm_mmu_zap_collapsible_sptes_work);
> +	struct kvm *kvm = container_of(ka, struct kvm, arch);
> +	int i;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> +		spin_lock(&kvm->mmu_lock);
> +		slots = __kvm_memslots(kvm, i);
> +		kvm_for_each_memslot(memslot, slots) {
> +			slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> +				kvm_mmu_zap_collapsible_spte, true);
> +			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +				cond_resched_lock(&kvm->mmu_lock);

I think we shouldn't zap all memslots when kvm_mmu_zap_collapsible_sptes
only wanted to zap a specific one.

Please add a list of memslots to be zapped; delete from the list here
and add in kvm_mmu_zap_collapsible_sptes().

> +		}
> +		spin_unlock(&kvm->mmu_lock);
> +	}
> +	kvm->arch.zap_in_progress = false;
> +	mutex_unlock(&kvm->slots_lock);
> +}
> +
> +#define KVM_MMU_ZAP_DELAYED (60 * HZ)
>  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>  				   const struct kvm_memory_slot *memslot)
>  {
> -	/* FIXME: const-ify all uses of struct kvm_memory_slot. */
> -	spin_lock(&kvm->mmu_lock);
> -	slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> -			 kvm_mmu_zap_collapsible_spte, true);
> -	spin_unlock(&kvm->mmu_lock);
> +	if (!kvm->arch.zap_in_progress) {

The list can also serve in place of zap_in_progress -- if there are any
elements in it, there is no need to schedule the work again.

Thanks.
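
For concreteness, here is a minimal sketch of the queue-based approach
suggested above. It is not the patch and not a definitive
implementation: the names zap_list, zap_list_node, zap_list_lock and
zap_work are hypothetical, and the sketch glosses over the
per-address-space iteration and the const-ness of the memslot pointer.

	/*
	 * Hypothetical sketch only; field names are illustrative.
	 */
	#include <linux/list.h>
	#include <linux/spinlock.h>
	#include <linux/workqueue.h>

	/* Fields that would be added to the existing structures: */
	struct kvm_memory_slot {
		/* ... existing fields ... */
		struct list_head zap_list_node;	/* self-empty when not queued */
	};

	struct kvm_arch {
		/* ... existing fields ... */
		struct list_head zap_list;	/* memslots awaiting a zap */
		spinlock_t zap_list_lock;	/* protects zap_list */
		struct delayed_work zap_work;
	};

	/* Caller side: queue just this memslot instead of setting a flag. */
	void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
					   struct kvm_memory_slot *memslot)
	{
		bool was_empty;

		spin_lock(&kvm->arch.zap_list_lock);
		was_empty = list_empty(&kvm->arch.zap_list);
		/* list_del_init() in the worker keeps unqueued nodes self-empty. */
		if (list_empty(&memslot->zap_list_node))
			list_add_tail(&memslot->zap_list_node, &kvm->arch.zap_list);
		spin_unlock(&kvm->arch.zap_list_lock);

		/* Empty -> non-empty means no work is scheduled yet. */
		if (was_empty)
			schedule_delayed_work(&kvm->arch.zap_work,
					      KVM_MMU_ZAP_DELAYED);
	}

	/* Worker side: drain the queue, one memslot per mmu_lock section. */
	void zap_collapsible_sptes_fn(struct work_struct *work)
	{
		struct kvm_arch *ka = container_of(to_delayed_work(work),
						   struct kvm_arch, zap_work);
		struct kvm *kvm = container_of(ka, struct kvm, arch);
		struct kvm_memory_slot *memslot;

		mutex_lock(&kvm->slots_lock);
		spin_lock(&kvm->arch.zap_list_lock);
		while (!list_empty(&kvm->arch.zap_list)) {
			memslot = list_first_entry(&kvm->arch.zap_list,
						   struct kvm_memory_slot,
						   zap_list_node);
			list_del_init(&memslot->zap_list_node);
			spin_unlock(&kvm->arch.zap_list_lock);

			/* Zap only the slot that was actually requested. */
			spin_lock(&kvm->mmu_lock);
			slot_handle_leaf(kvm, memslot,
					 kvm_mmu_zap_collapsible_spte, true);
			spin_unlock(&kvm->mmu_lock);

			spin_lock(&kvm->arch.zap_list_lock);
		}
		spin_unlock(&kvm->arch.zap_list_lock);
		mutex_unlock(&kvm->slots_lock);
	}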
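With a queue like this, kvm->arch.zap_in_progress becomes redundant:
whether work is already scheduled falls out of the empty/non-empty
state of the list, and since the worker re-checks the list after each
slot, memslots queued while it runs are picked up without rescheduling.
Dropping mmu_lock between slots also preserves the lock-break behavior
the original loop achieved with cond_resched_lock().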