From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Jim Mattson, Sean Christopherson, Borislav Petkov, Joerg Roedel, "H. Peter Anvin", Paolo Bonzini, Wanpeng Li, Ingo Molnar, Thomas Gleixner, Vitaly Kuznetsov, Maxim Levitsky
Subject: [PATCH 1/1] KVM: x86: fix MSR_IA32_TSC read for nested migration
Date: Thu, 17 Sep 2020 14:07:23 +0300
Message-Id: <20200917110723.820666-2-mlevitsk@redhat.com>
In-Reply-To: <20200917110723.820666-1-mlevitsk@redhat.com>
References: <20200917110723.820666-1-mlevitsk@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

MSR reads/writes should always access the L1 state, since the (nested)
hypervisor should intercept all the MSRs it wants to adjust, and those
that it doesn't should be read by the guest as if the host had read them.

However, IA32_TSC is an exception. Even when not intercepted, the guest
still reads the value + TSC offset. The write, however, does not take
any TSC offset into account.

This is documented in Intel's PRM and also seems to happen on AMD.

This creates a problem when userspace wants to read the IA32_TSC value
and then write it back (e.g. for migration).

In this case it reads the L2 value, but the write is interpreted as an
L1 value. To fix this, make userspace-initiated reads of IA32_TSC
return the L1 value as well.

Huge thanks to Dave Gilbert for helping me understand this very
confusing semantic of MSR writes.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 17f4995e80a7e..d10d5c6add359 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2025,6 +2025,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+static u64 kvm_read_l2_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
+{
+	return vcpu->arch.tsc_offset + kvm_scale_tsc(vcpu, host_tsc);
+}
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
@@ -3220,7 +3225,19 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
 		break;
 	case MSR_IA32_TSC:
-		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + vcpu->arch.tsc_offset;
+		/*
+		 * Intel PRM states that MSR_IA32_TSC read adds the TSC offset
+		 * even when not intercepted. AMD manual doesn't define this
+		 * but appears to behave the same
+		 *
+		 * However when userspace wants to read this MSR, return its
+		 * real L1 value so that its restore will be correct
+		 *
+		 */
+		if (msr_info->host_initiated)
+			msr_info->data = kvm_read_l1_tsc(vcpu, rdtsc());
+		else
+			msr_info->data = kvm_read_l2_tsc(vcpu, rdtsc());
 		break;
 	case MSR_MTRRcap:
 	case 0x200 ... 0x2ff:
-- 
2.26.2
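
For readers who want to see the read/write asymmetry in isolation, here is a
minimal userspace sketch of the save/restore round trip described in the
commit message. It is not kernel code: struct toy_vcpu and every toy_*
helper are hypothetical names made up for illustration, TSC scaling is
assumed to be 1:1, and the write path is a simplified guess at "the write is
interpreted as an L1 value" rather than a copy of what KVM does.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the vCPU TSC state (not KVM's struct). */
struct toy_vcpu {
	uint64_t l1_tsc_offset;	/* offset applied for L1 */
	uint64_t tsc_offset;	/* offset in effect now (L1 + L2 part while L2 runs) */
};

/* L1 view of the TSC; scaling is assumed to be 1:1 in this sketch. */
static uint64_t toy_read_l1_tsc(const struct toy_vcpu *v, uint64_t host_tsc)
{
	return host_tsc + v->l1_tsc_offset;
}

/* L2 view of the TSC: uses the offset currently in effect. */
static uint64_t toy_read_l2_tsc(const struct toy_vcpu *v, uint64_t host_tsc)
{
	return host_tsc + v->tsc_offset;
}

/* Assumed write semantics: the value is an L1 value, the L2 delta is kept. */
static void toy_write_tsc(struct toy_vcpu *v, uint64_t host_tsc, uint64_t value)
{
	uint64_t l2_delta = v->tsc_offset - v->l1_tsc_offset;

	v->l1_tsc_offset = value - host_tsc;
	v->tsc_offset = v->l1_tsc_offset + l2_delta;
}

/*
 * Save the TSC through one of the two read paths, write it back, and
 * return how far the L1 TSC moved. read_l1 != 0 models the patched
 * host-initiated read; read_l1 == 0 models the old behaviour.
 */
static int64_t toy_roundtrip_skew(int read_l1)
{
	struct toy_vcpu v = { .l1_tsc_offset = 1000, .tsc_offset = 1000 + 500 };
	uint64_t host_tsc = 1000000;
	uint64_t before = toy_read_l1_tsc(&v, host_tsc);
	uint64_t saved = read_l1 ? toy_read_l1_tsc(&v, host_tsc)
				 : toy_read_l2_tsc(&v, host_tsc);

	toy_write_tsc(&v, host_tsc, saved);
	return (int64_t)(toy_read_l1_tsc(&v, host_tsc) - before);
}
```

With these made-up numbers, toy_roundtrip_skew(0) shifts the L1 TSC by the
L2 part of the offset, while toy_roundtrip_skew(1) leaves it unchanged,
which is the property the host_initiated branch in the patch is after.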