Date: Sun, 9 Jun 2013 11:53:42 +0300
From: Gleb Natapov
To: Xiao Guangrong
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v8 00/11] KVM: MMU: fast zap all shadow pages
Message-ID: <20130609085342.GI4725@redhat.com>
References: <1369960590-14138-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
In-Reply-To: <1369960590-14138-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com>

On Fri, May 31, 2013 at 08:36:19AM +0800, Xiao Guangrong wrote:
> Hi Gleb, Paolo, Marcelo,
>
> I have put the potentially controversial patches last: those are
> patches 8 ~ 10, and patch 11 depends on patch 9. The other patches are
> fully reviewed and I think they are ready to be merged. If we are not
> lucky and further discussion is needed, could you please apply those
> patches first? :)
>
> Thank you in advance!
>
> Some points were raised during discussion but are not included in this
> version:
> 1) Gleb's idea to skip obsolete pages in the hash list walker
>
>    Unfortunately, it is not safe. There is a window between updating
>    valid-gen and reloading the mmu; in that window, the obsolete page can
>    be used by a vcpu, but the guest page table fails to be write-protected
>    (since the obsolete page is skipped in mmu_need_write_protect()).
>
Can you elaborate on how this can happen?
valid_gen is updated under mmu_lock, and the reloading of mmus happens
under the same lock, so to all other vcpus this should look like an
atomic operation.

>    Instead, we can only skip the zapped-obsolete pages
>    (is_obsolete_sp(sp) && sp->role.invalid); the current code already
>    skips them, but a comment is put around the hash list walker to warn
>    future development.
>
> 2) Marcelo's comment that obsolete pages can cause the number of shadow
>    pages to be greater than n_max_mmu_pages
>
>    I am not sure this is really a problem; it only exists in a really tiny
>    window, and the page-reclaim path is able to handle the obsolete pages.
>    Furthermore, we can properly reduce n_max_mmu_pages to make that window
>    even smaller.
>
>    Anyway, as commit 5d21881432 shows, "the mmu counters are for
>    beancounting purposes only", so maybe that window is allowed.
>
> Changelog:
> V8:
>   1): add some comments to explain the FIFO around active_mmu_list, to
>       address Marcelo's comments.
>
>   2): the page-reclaim path may fail to free zapped-obsolete pages, as
>       pointed out by Marcelo; the patchset fixes it by listing all zapped
>       obsolete pages on a global list and always freeing pages from that
>       list first.
>
>   3): address Marcelo's suggestion to move the "zap pages in batch" patch
>       to the end of the series.
>
>   4): drop the previous patch which introduced
>       kvm_mmu_prepare_zap_obsolete_page(); instead, we put comments
>       around the hash list walker to warn the user that the
>       zapped-obsolete pages still live on the hash list.
>
>   5): add a note to the changelog of the "zap pages in batch" patch to
>       explain that the batch number is a speculative value, based on
>       Takuya's comments.
>
> V7:
>   1): separate some optimizations into two patches which do not reuse
>       the obsolete pages and collapse tlb flushes, as suggested by Marcelo.
>
>   2): base the patch on Gleb's diff change which reduces
>       KVM_REQ_MMU_RELOAD when the root page is being zapped.
>
>   3): remove the call to kvm_mmu_zap_page when patching the hypercall, as
>       investigated by Gleb.
>
>   4): drop the patch which deleted pages from the hash list at "prepare"
>       time, since it can break the walk based on the hash list.
>
>   5): rename kvm_mmu_invalidate_all_pages to kvm_mmu_invalidate_zap_all_pages.
>
>   6): introduce kvm_mmu_prepare_zap_obsolete_page, which is used to zap
>       obsolete pages so that tlb flushes can be collapsed.
>
> V6:
>   1): walk active_list in reverse to skip the newly created pages, based
>       on the comments from Gleb and Paolo.
>
>   2): completely replace kvm_mmu_zap_all with kvm_mmu_invalidate_all_pages,
>       based on Gleb's comments.
>
>   3): improve the parameters of kvm_mmu_invalidate_all_pages, based on
>       Gleb's comments.
>
>   4): rename kvm_mmu_invalidate_memslot_pages to kvm_mmu_invalidate_all_pages
>   5): rename zap_invalid_pages to kvm_zap_obsolete_pages
>
> V5:
>   1): rename is_valid_sp to is_obsolete_sp
>   2): use the lock-break technique to zap all old pages instead of only the
>       pages linked on the invalid slot's rmap, as suggested by Marcelo.
>   3): trace invalid pages and kvm_mmu_invalidate_memslot_pages()
>   4): rename kvm_mmu_invalid_memslot_pages to kvm_mmu_invalidate_memslot_pages,
>       according to Takuya's comments.
>
> V4:
>   1): drop unmapping the invalid rmap outside of mmu-lock and use the
>       lock-break technique instead. Thanks to Gleb's comments.
>
>   2): no need to handle invalid-gen pages specially, since the page table is
>       always switched by KVM_REQ_MMU_RELOAD. Thanks to Marcelo's comments.
>
> V3:
>   completely redesign the algorithm, please see below.
>
> V2:
>   - do not reset n_requested_mmu_pages and n_max_mmu_pages
>   - batch free root shadow pages to reduce vcpu notification and mmu-lock
>     contention
>   - remove the first patch that introduced kvm->arch.mmu_cache, since we only
>     'memset zero' the hashtable rather than all mmu cache members in this
>     version
>   - remove unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all
>
> * Issue
> The current kvm_mmu_zap_all is really slow - it holds mmu-lock while it
> walks and zaps all shadow pages one by one, and it also needs to zap all
> guest pages' rmaps and all shadow pages' parent spte lists. Things become
> worse as the guest uses more memory or vcpus. It is not good for
> scalability.
>
> * Idea
> KVM maintains a global mmu invalid generation-number which is stored in
> kvm->arch.mmu_valid_gen, and every shadow page stores the current global
> generation-number into sp->mmu_valid_gen when it is created.
>
> When KVM needs to zap all shadow page sptes, it simply increases the
> global generation-number and then reloads the root shadow pages on all
> vcpus. Each vcpu will create a new shadow page table according to the
> current kvm generation-number. This ensures the old pages are not used
> any more.
>
> Then the invalid-gen pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
> are zapped by using the lock-break technique.
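
A minimal userspace sketch of the generation-number idea quoted above, in
plain C. It is not code from the patch set: the sketch_* types and functions
are hypothetical stand-ins for kvm->arch.mmu_valid_gen, sp->mmu_valid_gen and
is_obsolete_sp(), and locking, root reloading and TLB flushing are left out
entirely. The batch limit only marks the point where real code could drop
mmu-lock (the "lock-break").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-VM state. */
struct sketch_kvm {
	unsigned long mmu_valid_gen;	/* global generation number */
};

/* Hypothetical, simplified stand-in for a shadow page. */
struct sketch_sp {
	unsigned long mmu_valid_gen;	/* generation recorded at creation */
	bool zapped;
};

/* A newly created shadow page records the current global generation. */
static void sketch_sp_init(const struct sketch_kvm *kvm, struct sketch_sp *sp)
{
	assert(kvm != NULL && sp != NULL);
	sp->mmu_valid_gen = kvm->mmu_valid_gen;
	sp->zapped = false;
}

/* A page is obsolete once its generation differs from the global one. */
static bool sketch_is_obsolete_sp(const struct sketch_kvm *kvm,
				  const struct sketch_sp *sp)
{
	return sp->mmu_valid_gen != kvm->mmu_valid_gen;
}

/*
 * "Zap all" is a single increment; every existing page becomes obsolete
 * at once. Assumption: the real code does this under mmu-lock and then
 * asks all vcpus to reload their roots.
 */
static void sketch_invalidate_all(struct sketch_kvm *kvm)
{
	kvm->mmu_valid_gen++;
}

/*
 * Zap at most 'batch' obsolete pages and return how many were zapped.
 * The caller loops until this returns 0; between calls is where a
 * lock-break would happen in real code.
 */
static size_t sketch_zap_obsolete(const struct sketch_kvm *kvm,
				  struct sketch_sp *pages, size_t n,
				  size_t batch)
{
	size_t zapped = 0;

	for (size_t i = 0; i < n && zapped < batch; i++) {
		if (!pages[i].zapped && sketch_is_obsolete_sp(kvm, &pages[i])) {
			pages[i].zapped = true;
			zapped++;
		}
	}
	return zapped;
}
```

Pages created after sketch_invalidate_all() carry the new generation, so a
later sketch_zap_obsolete() pass leaves them alone; only pages from older
generations are reclaimed, a batch at a time.
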
>
> Gleb Natapov (1):
>   KVM: MMU: reduce KVM_REQ_MMU_RELOAD when root page is zapped
>
> Xiao Guangrong (10):
>   KVM: x86: drop calling kvm_mmu_zap_all in emulator_fix_hypercall
>   KVM: MMU: drop unnecessary kvm_reload_remote_mmus
>   KVM: MMU: fast invalidate all pages
>   KVM: x86: use the fast way to invalidate all pages
>   KVM: MMU: show mmu_valid_gen in shadow page related tracepoints
>   KVM: MMU: add tracepoint for kvm_mmu_invalidate_all_pages
>   KVM: MMU: do not reuse the obsolete page
>   KVM: MMU: zap pages in batch
>   KVM: MMU: collapse TLB flushes when zap all pages
>   KVM: MMU: reclaim the zapped-obsolete page first
>
>  arch/x86/include/asm/kvm_host.h |    4 +
>  arch/x86/kvm/mmu.c              |  128 ++++++++++++++++++++++++++++++++++++---
>  arch/x86/kvm/mmu.h              |    1 +
>  arch/x86/kvm/mmutrace.h         |   42 ++++++++++---
>  arch/x86/kvm/x86.c              |   17 +----
>  5 files changed, 161 insertions(+), 31 deletions(-)
>
> --
> 1.7.7.6

--
			Gleb.