From: Andy Lutomirski
Date: Thu, 13 May 2021 22:11:36 -0700
Subject: Re: [PATCH] KVM: x86: add hint to skip hidden rdpkru under kvm_load_host_xsave_state
To: Jon Kohler
Cc: Babu Moger, Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
 "H. Peter Anvin", Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
 Wanpeng Li, Jim Mattson, Joerg Roedel, Dave Hansen, Fenghua Yu,
 Yu-cheng Yu, Tony Luck, Uros Bizjak, Petteri Aimonen, Kan Liang,
 Andrew Morton, Mike Rapoport, Benjamin Thiel, Fan Yang, Juergen Gross,
 Dave Jiang, "Peter Zijlstra (Intel)", Ricardo Neri, Arvind Sankar,
 LKML, kvm list

On Fri, May 7, 2021 at 9:45 AM Jon Kohler wrote:
>
> kvm_load_host_xsave_state handles xsave on vm exit, part of which is
> managing memory protection key state. The latest arch.pkru is updated
> with a rdpkru, and if that doesn't match the base host_pkru (which it
> doesn't about 70% of the time), we issue a __write_pkru.

This thread caused me to read the code, and I don't get how it's even
remotely correct.

First, kvm_load_guest_fpu() has this delight:

	/*
	 * Guests with protected state can't have it set by the hypervisor,
	 * so skip trying to set it.
	 */
	if (vcpu->arch.guest_fpu)
		/* PKRU is separately restored in kvm_x86_ops.run. */
		__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
					~XFEATURE_MASK_PKRU);

That's nice, but it fails to restore XINUSE[PKRU].  As far as I know,
that bit is live, and the only way to restore it to 0 is with XRSTOR(S).

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cebdaa1e3cf5..cd95adbd140c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -912,10 +912,10 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
>  	}
>
>  	if (static_cpu_has(X86_FEATURE_PKU) &&
> -	    (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
> -	     (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
> -	    vcpu->arch.pkru != vcpu->arch.host_pkru)
> -		__write_pkru(vcpu->arch.pkru);
> +	    vcpu->arch.pkru != vcpu->arch.host_pkru &&
> +	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
> +	     kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))
> +		__write_pkru(vcpu->arch.pkru, false);

Please tell me I'm missing something (e.g. KVM very cleverly managing
the PKRU register using intercepts) that makes this reliably load the
guest value.  An innocent or malicious guest could easily make that
condition evaluate to false, thus allowing the host PKRU value to be
live in guest mode.  (Or is something fancy going on here?)

I don't even want to think about what happens if a perf NMI hits and
accesses host user memory while the guest PKRU is live (on VMX -- I
think this can't happen on SVM).
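Going back to the XINUSE[PKRU] point above, here is a rough sketch of
why only XRSTOR(S) can put that bit back.  xrstor_pkru_only() is a
made-up name, not a real arch/x86 helper; struct xregs_state and
XFEATURE_MASK_PKRU are the existing kernel definitions, and the asm is
written with the plain mnemonic rather than the kernel's opcode macros:

	/*
	 * Sketch only: restore just the PKRU component via XRSTOR, which
	 * takes a requested-feature bitmap in EDX:EAX.  Unlike WRPKRU,
	 * which always marks the component as in use, XRSTOR lets the
	 * XINUSE[PKRU] tracking follow the xsave buffer.
	 */
	static void xrstor_pkru_only(struct xregs_state *xsave)
	{
		u64 mask = XFEATURE_MASK_PKRU;
		u32 lmask = mask;
		u32 hmask = mask >> 32;

		/*
		 * If xsave->header.xfeatures has the PKRU bit clear, this
		 * re-initializes PKRU and can clear XINUSE[PKRU]; if the
		 * bit is set, PKRU is loaded from the buffer.
		 */
		asm volatile("xrstor %[xa]"
			     : : [xa] "m" (*xsave), "a" (lmask), "d" (hmask)
			     : "memory");
	}

The masked __copy_kernel_to_fpregs() call above never issues an XRSTOR
covering PKRU at all, which is why the in-use bit is left behind.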
>  }
>  EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
>
> @@ -925,11 +925,11 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
>  		return;
>
>  	if (static_cpu_has(X86_FEATURE_PKU) &&
> -	    (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
> -	     (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) {
> +	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
> +	     kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) {
>  		vcpu->arch.pkru = rdpkru();
>  		if (vcpu->arch.pkru != vcpu->arch.host_pkru)
> -			__write_pkru(vcpu->arch.host_pkru);
> +			__write_pkru(vcpu->arch.host_pkru, true);
>  	}

Suppose the guest writes to PKRU and then, without exiting, sets PKE = 0
and XCR0[PKRU] = 0.  (Or are the intercepts such that this can't happen,
except on SEV where maybe SEV magic makes the problem go away?)

I admit I'm fairly mystified as to why KVM doesn't handle PKRU like the
rest of guest XSTATE.
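For comparison, "like the rest of guest XSTATE" would roughly look like
the sketch below.  This is purely illustrative, not a proposed patch:
load_guest_fpu_including_pkru() is a made-up name, and it assumes the
host's own PKRU value is saved and restored by the regular kernel FPU
switching code around it rather than by KVM's special case:

	/*
	 * Sketch: drop the ~XFEATURE_MASK_PKRU mask (and, with it, the
	 * separate rdpkru()/__write_pkru() calls in
	 * kvm_load_{guest,host}_xsave_state), so PKRU and XINUSE[PKRU]
	 * ride along in the XRSTOR of the guest xsave image like every
	 * other component.
	 */
	static void load_guest_fpu_including_pkru(struct kvm_vcpu *vcpu)
	{
		if (vcpu->arch.guest_fpu)
			/* -1ULL: restore every component, PKRU included. */
			__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
						-1ULL);
	}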