Message-ID: <56DFCE50.2000807@intel.com>
Date: Wed, 09 Mar 2016 15:18:40 +0800
From: Lan Tianyu
To: Paolo Bonzini, Thomas Gleixner
CC: gleb@kernel.org, mingo@redhat.com, hpa@zytor.com, x86@kernel.org,
 kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Xiao Guangrong
Subject: Re: [PATCH] KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()
References: <1457055312-27067-1-git-send-email-tianyu.lan@intel.com>
 <56D9354E.9040908@intel.com> <56D94BFE.1080406@redhat.com>
 <56DE8F1A.9000401@intel.com> <56DEEF69.20208@redhat.com>
In-Reply-To: <56DEEF69.20208@redhat.com>

On 03/08/2016 23:27, Paolo Bonzini wrote:
> Unfortunately that patch added a bad memory barrier: 1) it lacks a
> comment; 2) it lacks obvious pairing; 3) it is an smp_mb() after a read,
> so it's not even obvious that this memory barrier has to do with the
> immediately preceding read of kvm->tlbs_dirty. It is also not
> documented in Documentation/virtual/kvm/mmu.txt (Guangrong documented
> most of his other work there, back in 2013, but not this one :)).
>
> The cmpxchg is ordered anyway against the read, because 1) x86 has
> implicit ordering between earlier loads and later stores; 2) even
> store-load barriers are unnecessary for accesses to the same variable
> (in this case kvm->tlbs_dirty).
>
> So offhand, I cannot say what it orders. There are four possibilities:
>
> 1) it orders the read of tlbs_dirty with the read of mode. In this
> case, an smp_rmb() would have been enough, and it's not clear where
> the matching smp_wmb() is.
>
> 2) it orders the read of tlbs_dirty with the KVM_REQ_TLB_FLUSH request.
> In this case an smp_load_acquire would be better.
>
> 3) it does the same as kvm_mmu_commit_zap_page's smp_mb(), but for other
> callers of kvm_flush_remote_tlbs(). In this case, we know what the
> matching memory barrier is (walk_shadow_page_lockless_*).
>
> 4) it is completely unnecessary.
>
> My guess is either (2) or (3), but I am not sure. We know that
> anticipating kvm->tlbs_dirty should be safe; worst case, it causes the
> cmpxchg to fail and an extra TLB flush on the next call to the MMU
> notifier. But I'm not sure what happens if the processor moves the
> read later.

I found that the smp_mb() in kvm_mmu_commit_zap_page() was removed by
commit 5befdc38 and that the commit was reverted by commit a086f6a1e.
Both the removal and the revert hinged on whether kvm_flush_remote_tlbs()
runs under mmu_lock. The mode and request accesses are not always under
mmu_lock, so I think the smp_mb() was not related to mode or the request
when it was introduced.

>> The smp_mb() in kvm_mmu_commit_zap_page() was introduced by commit
>> c142786c6, which was merged later than commit a4ee1ca4. It pairs with
>> the smp_mb() in walk_shadow_page_lockless_begin/end() to order
>> modifications of the page tables against the read of mode.

> Yes; it also pairs with the smp_mb__after_srcu_read_unlock() in
> vcpu_enter_guest.

>> The smp_mb() in kvm_make_all_cpus_request() was introduced by commit
>> 6b7e2d09. It orders setting the request bit against reading mode.

> Yes.
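To make the discussion concrete: the function in question looks roughly
like this as of this thread (paraphrased from virt/kvm/kvm_main.c; a
sketch, not an exact copy of the source):

    void kvm_flush_remote_tlbs(struct kvm *kvm)
    {
            long dirty_count = kvm->tlbs_dirty;

            /* The barrier this thread is trying to justify. */
            smp_mb();
            if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
                    ++kvm->stat.remote_tlb_flush;
            /* Only clear tlbs_dirty if nobody dirtied it meanwhile. */
            cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
    }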
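And the lockless-walk side of the pairing is roughly the following
(simplified from arch/x86/kvm/mmu.c; again a sketch, details may differ
from the exact source):

    static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
    {
            /*
             * Make any freer of shadow pages wait for us inside the
             * kvm_flush_remote_tlbs() IPI before tearing them down.
             */
            local_irq_disable();
            vcpu->mode = READING_SHADOW_PAGE_TABLES;
            /* Don't let spte reads be reordered before the mode write. */
            smp_mb();
    }

    static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
    {
            /* Don't let the mode write be reordered before the spte reads. */
            smp_mb();
            vcpu->mode = OUTSIDE_GUEST_MODE;
            local_irq_enable();
    }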
>>> So you can:
>>>
>>> 1) add a comment to kvm_flush_remote_tlbs like:
>>>
>>>    /*
>>>     * We want to publish modifications to the page tables before reading
>>>     * mode. Pairs with a memory barrier in arch-specific code.
>>>     * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
>>>     * - powerpc: smp_mb in kvmppc_prepare_to_enter.
>>>     */
>>>
>>> 2) add a comment to vcpu_enter_guest and kvmppc_prepare_to_enter, saying
>>> that the memory barrier also orders the write to mode against any reads
>>> of the page tables done while the VCPU is running. In other words, on
>>> entry a single memory barrier achieves two purposes (write ->mode before
>>> reading requests, write ->mode before reading page tables).

>> These sound good.

>>> The same should be true in kvm_flush_remote_tlbs(). So you may
>>> investigate removing the barrier from kvm_flush_remote_tlbs, because
>>> kvm_make_all_cpus_request already includes a memory barrier. Like
>>> Thomas suggested, leave a comment in kvm_flush_remote_tlbs() saying
>>> which memory barrier you are relying on and for what.

>> If we remove the smp_mb() in kvm_flush_remote_tlbs(), we need to
>> leave comments both in kvm_flush_remote_tlbs() and
>> kvm_mmu_commit_zap_page(), right?

> Yes. In fact, instead of removing it, I would change it to
>
>     smp_mb__before_atomic();
>
> with a comment that points to the addition of the barrier in commit
> a4ee1ca4. Unless Guangrong can enlighten us. :)

How about the following comments?

Comment for kvm_mmu_commit_zap_page():

    /*
     * We need to make sure everyone sees our modifications to
     * the page tables and sees the change to vcpu->mode here. The
     * barrier in kvm_flush_remote_tlbs() helps us achieve this;
     * otherwise, wait for all vcpus to exit guest mode and/or
     * finish their lockless shadow page table walks.
     */
    kvm_flush_remote_tlbs(kvm);

Comment for kvm_flush_remote_tlbs():

    /*
     * We want to publish modifications to the page tables before
     * reading mode. Pairs with a memory barrier in arch-specific
     * code.
     * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
     * - powerpc: smp_mb in kvmppc_prepare_to_enter.
     */
    smp_mb__before_atomic();

>>> And finally, the memory barrier in kvm_make_all_cpus_request can become
>>> smp_mb__after_atomic, which is free on x86.

>> I found you have already done this in your tree :)
>>
>> commit 5b06b60b72b73cbe3805d2a64d223e0264bd0479
>> Author: Paolo Bonzini
>> Date:   Wed Feb 24 12:44:17 2016 +0100
>>
>>     KVM: mark memory barrier with smp_mb__after_atomic
>>
>>     Signed-off-by: Paolo Bonzini

> Yeah, but I reverted it because it makes sense to do it together with
> your patch.
>
> Paolo

-- 
Best regards
Tianyu Lan