Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp3000441pxk; Mon, 21 Sep 2020 02:41:37 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzVf7L8KBCHs7eGGQojBalRLot/24Lc6KaVjAB+4IXLSBo7Zlh2+vZBhRtIsLyKz/7eIGqf X-Received: by 2002:aa7:d35a:: with SMTP id m26mr49661462edr.183.1600681297614; Mon, 21 Sep 2020 02:41:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600681297; cv=none; d=google.com; s=arc-20160816; b=NYWq5T/YKJ6B6ci5QE+IKzUxpTQPza5CX5Tk0+Wx/drNudbuLbjrtY41LhwdThBmZr hW3ij70YXx3rSVf8Wbm6Icf9Epi31aZ6F8ToEcM2NXv8//vOltVxpdjDvl4CtKOpgNbh LyfWFOrtFAodXqLU/mpXSHcl1QLwakTLFcHKu4sRYo6Qd7/PLs4h1M/kla1LuCWOPg+y yMpWUMGAerDrWRClY2WKnt0FEEicwSRCEBIi0NFGf4yjjRtFnt3dNwp9cHGh6HpFraca eInGZ5dZVT+CwpLS/k1e6Ka+o/9Jb1NQMC74afrvCXLxaYSZAXzqxFPe70jJOIuWpss1 eUaA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:date:cc:to:from:subject :message-id:dkim-signature; bh=hzal5HS5mBmQ65nzGT4AtLr7x+11VqTcaPyuQCAIt4g=; b=GtpGvIFlWY7/rjawKiYvLcdbiq9zPjAvWqvQ4udxHQk4a4w7CDjpQTelyZx+MvwEWp UrU0PkXt9+ts3MDOPqgREVUNYkEao84CazLpZDO3uVwuTQ0GeVRxcNpVLPDWoqr8Wg+I LKt8fsthItsGH0EtWyUJ+pWSNgZL7rOkVFjdHHt4X/o/ExP9TA714B1Wgnz5eDWwDuxO GaZ3aQ3vp+3Tu1UW9P5k+Cvc6J70q9vh35Cig1TYNGFh1EtcWs22DKbJ1KtUnzV3G3pP RpI+A1IYjP8LC6eIn6h7UFe/ikmziTJnkAeEoi7efnXbTO52Ingfr8smYRRtlTYhQbAX /H9A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PwYXWwAQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id yc15si9666904ejb.674.2020.09.21.02.41.14; Mon, 21 Sep 2020 02:41:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PwYXWwAQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726318AbgIUJkN (ORCPT + 99 others); Mon, 21 Sep 2020 05:40:13 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:41630 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726333AbgIUJkN (ORCPT ); Mon, 21 Sep 2020 05:40:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600681211; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hzal5HS5mBmQ65nzGT4AtLr7x+11VqTcaPyuQCAIt4g=; b=PwYXWwAQkD7qZ9nfS1xSPKAl2N3rMA1+Z4FxFSl/u9WySsO7EZG7YhNZm9qDzCgWQ1rwWs 8b305apwqOUwyEJpgLmrf+L0UvQfvGOuDutwxoaWldB1iUJ5EwZ+3fvH6Y4oXp+UB1aFSN s5UzIXwlNtvgxAbVKhCBCEbKAWVT6kM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-116-xapV6KnjMf2PtreSsxQg_g-1; Mon, 21 Sep 2020 05:40:09 -0400 X-MC-Unique: xapV6KnjMf2PtreSsxQg_g-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 6D0216A2A0; Mon, 21 Sep 2020 09:40:07 +0000 (UTC) Received: from starship (unknown [10.35.206.28]) by smtp.corp.redhat.com (Postfix) with ESMTP id 091987882A; Mon, 21 Sep 2020 09:40:00 +0000 (UTC) Message-ID: Subject: Re: [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for nested migration From: Maxim Levitsky To: Sean Christopherson Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" , Jim Mattson , Borislav Petkov , Joerg Roedel , "H. Peter Anvin" , Paolo Bonzini , Wanpeng Li , Ingo Molnar , Thomas Gleixner , Vitaly Kuznetsov Date: Mon, 21 Sep 2020 12:39:59 +0300 In-Reply-To: References: <20200917110723.820666-1-mlevitsk@redhat.com> <20200917110723.820666-2-mlevitsk@redhat.com> <20200917161135.GC13522@sjchrist-ice> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.36.3 (3.36.3-1.fc32) MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2020-09-21 at 12:25 +0300, Maxim Levitsky wrote: > On Thu, 2020-09-17 at 09:11 -0700, Sean Christopherson wrote: > > On Thu, Sep 17, 2020 at 02:07:23PM +0300, Maxim Levitsky wrote: > > > MSR reads/writes should always access the L1 state, since the (nested) > > > hypervisor should intercept all the msrs it wants to adjust, and these > > > that it doesn't should be read by the guest as if the host had read it. > > > > > > However IA32_TSC is an exception.Even when not intercepted, guest still > > > > Missing a space after the period. > Fixed > > > reads the value + TSC offset. > > > The write however does not take any TSC offset in the account. > > > > s/in the/into > Fixed. > > > This is documented in Intel's PRM and seems also to happen on AMD as well. > > > > Ideally we'd get confirmation from AMD that this is the correct behavior. > It would be great. This isn't a blocker for this patch however since I didn't > change the current emulation behavier which already assumes this. > Also we don't really trap TSC reads, so this code isn't really executed. I did now find out an explict mention that AMD's TSC scaling affects the MSR reads, and while this doesn't expictily mention the offset, this does give more ground to the assumption that the offset is added as well. "Writing to the TSC Ratio MSR allows the hypervisor to control the guest's view of the Time Stamp Counter. The contents of TSC Ratio MSR sets the value of the TSCRatio. This constant scales the timestamp value returned when the TSC is read by a guest via the RDTSC or RDTSCP instructions or when the TSC, MPERF, or MPerfReadOnly MSRs are read via the RDMSR instruction by a guest running under virtualization." Best regards, Maxim Levitsky > > (I haven't checked what corner cases when we do this. It can happen in theory, > if MSR read is done from the emulator or something like that). > > > > This creates a problem when userspace wants to read the IA32_TSC value and then > > > write it. (e.g for migration) > > > > > > In this case it reads L2 value but write is interpreted as an L1 value. > > > > It _may_ read the L2 value, e.g. it's not going to read the L2 value if L1 > > is active. > > I didn't thought about this this way. I guess I always thought that L2 is, > L2 if L2 is running, otherwise L1, but now I understand what you mean, > and I agree. > > > > To fix this make the userspace initiated reads of IA32_TSC return L1 value > > > as well. > > > > > > Huge thanks to Dave Gilbert for helping me understand this very confusing > > > semantic of MSR writes. > > > > > > Signed-off-by: Maxim Levitsky > > > --- > > > arch/x86/kvm/x86.c | 19 ++++++++++++++++++- > > > 1 file changed, 18 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index 17f4995e80a7e..d10d5c6add359 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -2025,6 +2025,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc) > > > } > > > EXPORT_SYMBOL_GPL(kvm_read_l1_tsc); > > > > > > +static u64 kvm_read_l2_tsc(struct kvm_vcpu *vcpu, u64 host_tsc) > > > > This is definitely not L2 specific. I would vote to just omit the helper so > > that we don't need to come up with a name that is correct across the board, > > e.g. "raw" is also not quite correct. > Yes, now I see this. > > > An alternative would be to do: > > > > u64 tsc_offset = msr_info->host_initiated ? vcpu->arch.l1_tsc_offset : > > vcpu->arch.tsc_offset; > > > > msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + tsc_offset; > > > > Which I kind of like because the behavioral difference is a bit more obvious. > Yep did that. The onl minor downside is that I need a C scope in the switch block. > I can add kvm_read_tsc but I think that this is not worth it. > > > > +{ > > > + return vcpu->arch.tsc_offset + kvm_scale_tsc(vcpu, host_tsc); > > > +} > > > + > > > static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) > > > { > > > vcpu->arch.l1_tsc_offset = offset; > > > @@ -3220,7 +3225,19 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > > msr_info->data = vcpu->arch.msr_ia32_power_ctl; > > > break; > > > case MSR_IA32_TSC: > > > - msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + vcpu->arch.tsc_offset; > > > + /* > > > + * Intel PRM states that MSR_IA32_TSC read adds the TSC offset > > > + * even when not intercepted. AMD manual doesn't define this > > > + * but appears to behave the same > > > + * > > > + * However when userspace wants to read this MSR, return its > > > + * real L1 value so that its restore will be correct > > > + * > > > > Extra line is unnecessary. > This is a bit of my OCD :-) I don't mind to remove it. > > > + */ > > > + if (msr_info->host_initiated) > > > + msr_info->data = kvm_read_l1_tsc(vcpu, rdtsc()); > > > + else > > > + msr_info->data = kvm_read_l2_tsc(vcpu, rdtsc()); > > > break; > > > case MSR_MTRRcap: > > > case 0x200 ... 0x2ff: > > > -- > > > 2.26.2 > > > > > Thanks for the review, the V2 is on the way. > Best regards, > Maxim Levitsky >