Date: Tue, 11 Apr 2017 22:45:36 +0200
From: Radim Krčmář <rkrcmar@redhat.com>
To: Paolo Bonzini
Cc: James Hogan, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Christoffer Dall, Andrew Jones, Marc Zyngier,
	Christian Borntraeger, Cornelia Huck, Paul Mackerras
Subject: Re: [PATCH RFC 1/6] KVM: fix guest_mode optimization in kvm_make_all_cpus_request()
Message-ID: <20170411204536.GA20384@potion>
References: <20170406202056.18379-1-rkrcmar@redhat.com>
	<20170406202056.18379-2-rkrcmar@redhat.com>
	<20170406210215.GV31606@jhogan-linux.le.imgtec.org>

2017-04-11 13:25+0800, Paolo Bonzini:
> On 07/04/2017 05:02, James Hogan wrote:
>> - you'll no longer get IPIs if it's in EXITING_GUEST_MODE (i.e. if you
>>   get two of these in quick succession, only the first will wait for the
>>   IPI, which might work as long as they're already serialised, but it
>>   still feels wrong).
>
> But this is okay---avoiding multiple IPIs is the exact purpose of
> EXITING_GUEST_MODE.

I think this applied to the missed synchronization, in which case the
point is valid: the latter caller would assume that it can proceed to
reuse the memory even though the guest was still using it.

> There are evidently multiple uses of kvm_make_all_cpus_request, and we
> should avoid smp_call_function_many(..., true) if possible.  So perhaps:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a17d78759727..20e3bd60bdda 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -169,7 +169,7 @@ static void ack_flush(void *_completed)
>  {
>  }
>  
> -bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> +bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req, bool wait)
>  {
>  	int i, cpu, me;
>  	cpumask_var_t cpus;
> @@ -182,18 +182,19 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		kvm_make_request(req, vcpu);
>  		cpu = vcpu->cpu;
> +		if (cpus == NULL || cpu == -1 || cpu == me)
> +			continue;
>  
>  		/* Set ->requests bit before we read ->mode. */
>  		smp_mb__after_atomic();
> -
> -		if (cpus != NULL && cpu != -1 && cpu != me &&
> -		    kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
> +		if (kvm_arch_vcpu_should_kick(vcpu) ||
> +		    (wait && vcpu->mode != OUTSIDE_GUEST_MODE))
>  			cpumask_set_cpu(cpu, cpus);
>  	}
>  	if (unlikely(cpus == NULL))
> -		smp_call_function_many(cpu_online_mask, ack_flush, NULL, 1);
> +		smp_call_function_many(cpu_online_mask, ack_flush, NULL, wait);
>  	else if (!cpumask_empty(cpus))
> -		smp_call_function_many(cpus, ack_flush, NULL, 1);
> +		smp_call_function_many(cpus, ack_flush, NULL, wait);
>  	else
>  		called = false;
>  	put_cpu();
> @@ -221,7 +222,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>  	 * kvm_make_all_cpus_request() reads vcpu->mode. We reuse that
>  	 * barrier here.
>  	 */
> -	if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
> +	if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH, true))
>  		++kvm->stat.remote_tlb_flush;
>  	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>  }
> @@ -230,7 +231,7 @@ EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
>  
>  void kvm_reload_remote_mmus(struct kvm *kvm)
>  {
> -	kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> +	/* FIXME, is wait=true really needed? */

Probably not.  There are two uses:

In kvm_mmu_prepare_zap_page(): the only change that happens between
kvm_reload_remote_mmus() and kvm_flush_remote_tlbs() in
kvm_mmu_commit_zap_page() is the setting of sp->role.invalid --
synchronizing it doesn't prevent any race with the
READING_SHADOW_PAGE_TABLES mode, and the unconditional TLB flush is the
important one.  I think that kvm_reload_remote_mmus() doesn't even need
to kick in this case.

In kvm_mmu_invalidate_zap_all_pages(): same situation -- the guest
cannot do an entry without increasing the generation number, but it can
enter READING_SHADOW_PAGE_TABLES mode between the reload and the flush.
I think that we don't need to wait here either, but my knowledge of
this area is obviously lacking ...

> +	kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD, true);
>  }
>  
>  int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>
>
> Other users do not need wait=false.

You mean "wait=true"?  (It would be safer to assume they depend on the
VM-exit wait until proved otherwise ...)

> Or another idea is to embed wait in the request number, as suggested in
> the ARM thread, so that for example:

Right, I don't think that a TLB flush makes sense without
synchronization, and embedding the context sensitivity in the request
number would make that explicit.

> - bits 0-4 = bit number in vcpu->requests
>
> - bit 8 = wait when making request

Sounds good.  The single-target kvm_make_request() + kvm_vcpu_kick()
path should use this as well.

> - bit 9 = kick after making request

Maybe add a bit mask to denote in which modes the kick/wait is
necessary?
  bit  9 : IN_GUEST_MODE
  bit 10 : EXITING_GUEST_MODE
  bit 11 : READING_SHADOW_PAGE_TABLES

TLB_FLUSH would set bits 8-11.

IIUC, ARM has a use for requests that need to make sure that the guest
is not in guest mode before proceeding, and those would set bits 8-10.
The common requests, "notice me as soon as possible", would set bit 9.

The bits 9-11 could also be used only when bit 8 is set, to make the
transition easier.  (9 and 10 could be squished then as well.)