Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S937421AbdDZUoj (ORCPT ); Wed, 26 Apr 2017 16:44:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39008 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935071AbdDZUoc (ORCPT ); Wed, 26 Apr 2017 16:44:32 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com EF6B2C05974C Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=rkrcmar@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com EF6B2C05974C Date: Wed, 26 Apr 2017 22:44:28 +0200 From: Radim =?utf-8?B?S3LEjW3DocWZ?= To: Brijesh Singh Cc: pbonzini@redhat.com, joro@8bytes.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, Thomas.Lendacky@amd.com Subject: Re: [PATCH] x86: kvm: Avoid guest page table walk when gpa_available is set Message-ID: <20170426204427.GA7135@potion> References: <1493049146-19261-1-git-send-email-brijesh.singh@amd.com> <20170424205236.GE5713@potion> <77f51978-5937-0c94-13b6-885345921b03@amd.com> <20170425140351.GF5713@potion> <6e453f35-cd26-1df8-5f8e-68fa09c6a1a3@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6e453f35-cd26-1df8-5f8e-68fa09c6a1a3@amd.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Wed, 26 Apr 2017 20:44:32 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3876 Lines: 98 2017-04-25 17:02-0500, Brijesh Singh: > > > I also wanted to avoid adding yet another variable but we can't depend on > > > cr2 parameters passed into x86_emulate_instruction(). > > > > > > The x86_emulate_instruction() function is called from two places: > > > > > > 1) handling the page-fault. > > > pf_interception [svm.c] > > > kvm_mmu_page_fault [mmu.c] > > > x86_emulate_instruction [x86.c] > > > > > > 2) completing the IO/MMIO's from previous instruction decode > > > kvm_arch_vcpu_ioctl_run > > > complete_emulated_io > > > emulate_instruction > > > x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0) > > > > > > In #1, we are guaranteed that cr2 variable will contain a valid GPA but > > > in #2, CR2 is set to zero. > > > > We are setting up the completion in #1 x86_emulate_instruction(), where > > the gpa (cr2) is available, so we could store the value while arming > > vcpu->arch.complete_userspace_io. > > > > emulator_read_write_onepage() already saves gpa in frag->gpa, which is > > then passed into complete_emulated_mmio -- isn't that mechanism > > sufficient? > > > > I see that complete_emulated_mmio() saves the frag>gpa into run->mmio.phys_addr, > so based on the exit_reason we should be able to get the saved gpa. I mean, we could pass the frag->gpa into x86_emulate_instruction(). I think that exit_reason is related just accidentally and relying on it inside x86_emulate_instruction() is bad. > In my debug patch below, I tried doing something similar to verify that frag->gpa > contains the valid CR2 value but I saw a bunch of mismatch. So it seems like we > may not able to use frag->gpa mechanism. Additionally we also need to handle the > PIO cases (e.g what if we are called from complete_emulated_pio), which also takes > similar code path Yes, but PIO should trigger a NPF exit relatively rarely and generalizing the method from MMIO will be easy. > complete_emulated_pio > completed_emulated_io > emulate_instruction > x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0) > > @@ -5682,13 +5686,20 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, > restart: > /* > - * Save the faulting GPA (cr2) in the address field > - * NOTE: If gpa_available is set then gpa_val will contain a valid GPA > + * if previous exit was due to userspace mmio completion then actual > + * cr2 is stored in mmio.phys_addr. > */ > - if (vcpu->arch.gpa_available) > - ctxt->exception.address = vcpu->arch.gpa_val; > - else > - ctxt->exception.address = cr2; > + if (vcpu->run->exit_reason == KVM_EXIT_MMIO) { > + cr2 = vcpu->run->mmio.phys_addr; > + if (cr2 != vcpu->arch.gpa_val) > + pr_err("** mismatch %llx %llx\n", > + vcpu->run->mmio.phys_addr, vcpu->arch.gpa_val); This will probably return false negatives when then vcpu->arch.gpa_val couldn't be used anyway (possibly after a VM exits), so it is hard to draw a conclusion. I would really like if we had a solution that passed the gpa inside function parameters. (Btw. cr2 can also be a virtual address, so we can call it gpa directly) > > If at all possible, I'd like to have the gpa passed with other relevant > > data, instead of having it isolated like this ... and we can't manage > > that, then at least good benchmark results to excuse the bad code. > > > > I ran two tests to see how many times we walk guest page table. > > Test1: run kvm-unit-test > Test2: launch Ubuntu 16.06 guest, run a stressapptest for 20 seconds, shutdown the VM > > Before patch > * Test1: 10419 > * Test2: 243365 > > After patch: > * Test1: 1259 > * Test2: 1221 > > Please let me know if you want me to run other other benchmark and capture the data. Looks convincing, thanks.