Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp2427412pxj; Mon, 17 May 2021 01:00:21 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzDsVteH4nB08wbPJz8Gqwi5PT78z2p1GGXMk6q9R49rftyiyLlOslhQYu5awKi1LM0sGyE X-Received: by 2002:a17:906:c0c3:: with SMTP id bn3mr61269530ejb.498.1621238420884; Mon, 17 May 2021 01:00:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1621238420; cv=none; d=google.com; s=arc-20160816; b=HC6mJZW2ekKlodT20TKL/04Kr1e2NU26YP9VkYbw8kG6crPw3/HcO9WBxUpsNINoYY JK+/u5nr5TjtCN3h3H2fzZbIeCu0sDpG1NaUU9Sk+Pu+/ARVjHwSKR561e1NB94jFt1t tJLv9WVgbESSpiFTDK8iNDYODt3J8fOKDFhWdS9WZDvSfO6SSlkdj1d3wNN8TAdHkONs fnme9rhCelbg4vwvd7IYwGgwEC4m0F6kaCN4UAJc//EwraHpAZvpv6iLX9a7jhcnNsNA LuUuvJlDtlZA+/B0R89qIA3ABlD5CMoWwBvnfuNQDU0xiOhcB/wYXp9ld4uWwy87hncE 7wqA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:subject:from :references:cc:to:dkim-signature; bh=Pr+SsTDKN3t1n8i1sa/wBoRt+XTnEThJ5HCbyq0kBx8=; b=f+ZxqKfNeyEgptziCTJl/qSOtMOStVdslT7vDCkubLzoMy5RHRMzT7TcLEQmCTm77O 6mP9Q+8rKUcfuc/0v2B1H3qOfJd8CxhkGkuMDNnii+kGxoq83RGGMLn21tSIQntey/Cl +Ze8aWCnLIsJX7kQwODZmSZFkNSKJ6fJU7KPO+YY9DjVWaQSjrYNko1kmkP/lXGLr0LD lYrNcNNkoto9H6NKJnj5T+cXL6c0myfukgS3gDkXdAScflcoIChSQELOJ9rM4qeI/KY5 ay3zUQmrCRcuz9r0A5m+W/XDlYYGPlH72mAlAVO409Be8wOe0SuctkckfG6NBhZbXL/p rzaw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="imEL/CWC"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y18si5760495ejr.515.2021.05.17.00.59.58; Mon, 17 May 2021 01:00:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="imEL/CWC"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233807AbhEQHsW (ORCPT + 99 others); Mon, 17 May 2021 03:48:22 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:51648 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232040AbhEQHsU (ORCPT ); Mon, 17 May 2021 03:48:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1621237624; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Pr+SsTDKN3t1n8i1sa/wBoRt+XTnEThJ5HCbyq0kBx8=; b=imEL/CWCpH8JehSfzermR39Fl6bjn+7L3pceZLMpIn94R7wVGB39kEO0V5Tj7AtsCfKz2d +JAbzai6FjVQjjIToRjweqqSbRXzCWXwi2cdQcnlZq/isA1tX1pgZ4+kX4how8eqEKw64+ q0HGuob+R6k80OBkv402gZzMTPZKq/s= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-456-LO38JR1EO5iOvei9zPNiwA-1; Mon, 17 May 2021 03:46:58 -0400 X-MC-Unique: LO38JR1EO5iOvei9zPNiwA-1 Received: by mail-wr1-f72.google.com with SMTP id i102-20020adf90ef0000b029010dfcfc46c0so3455914wri.1 for ; Mon, 17 May 2021 00:46:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:cc:references:from:subject:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=Pr+SsTDKN3t1n8i1sa/wBoRt+XTnEThJ5HCbyq0kBx8=; b=h0m4Y9ZTzlUFBtcG7frzhEPE8hO8hM1jpbA9hDqYaDiVw+NrOXNj/9KjYiIe0K1Mpj UziLx8PeSbhkIjMQNRUQWUOXslnSUP2tisW/YEKPaJNhj8tGLYtokGD84SJjGZ9zCEfK o7Y0PMWd0LJmfOXnuG7/Gnkzj34eTalJl6vcVPXx9hLceSssG3b891uJAzDBOF8DJV4U c/tget3BltMnBWVeffJ+sEYIBFgw06lbrgjvkRNdposFMbWaEaktOO15QqUa+Q+iQcqj 9ovM6j3geWiPw58eMFTIdJdoCZZBsdzdISspfQBCfZAiqkF560w6+qwKTuFl+KGgTaC0 r07A== X-Gm-Message-State: AOAM530A7DWlziFQfHEZiy4bjB0E/B7TlAKmj/EUISeHpJ5v1bN/BFUO uEV1A+3bgn9BE3xmDp5Bxeg0AtU0vnxJNsYj0b80gDZVDq4kYgMv/Q7NLvfGVOsyABhrOmiV1kp U4Omrxgf6IsMjPh49gWLKX3fv X-Received: by 2002:a05:600c:410a:: with SMTP id j10mr16693579wmi.26.1621237617435; Mon, 17 May 2021 00:46:57 -0700 (PDT) X-Received: by 2002:a05:600c:410a:: with SMTP id j10mr16693534wmi.26.1621237617222; Mon, 17 May 2021 00:46:57 -0700 (PDT) Received: from ?IPv6:2001:b07:6468:f312:63a7:c72e:ea0e:6045? ([2001:b07:6468:f312:63a7:c72e:ea0e:6045]) by smtp.gmail.com with ESMTPSA id s6sm23233419wms.0.2021.05.17.00.46.55 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 17 May 2021 00:46:56 -0700 (PDT) To: Andy Lutomirski , Jon Kohler , Dave Hansen , Sean Christopherson Cc: Babu Moger , Thomas Gleixner , Ingo Molnar , Borislav Petkov , X86 ML , "H. Peter Anvin" , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Fenghua Yu , Yu-cheng Yu , Tony Luck , Uros Bizjak , Petteri Aimonen , Kan Liang , Andrew Morton , Mike Rapoport , Benjamin Thiel , Fan Yang , Juergen Gross , Dave Jiang , "Peter Zijlstra (Intel)" , Ricardo Neri , Arvind Sankar , LKML , kvm list References: <20210507164456.1033-1-jon@nutanix.com> From: Paolo Bonzini Subject: Re: [PATCH] KVM: x86: add hint to skip hidden rdpkru under kvm_load_host_xsave_state Message-ID: <5e01d18b-123c-b91f-c7b4-7ec583dd1ec6@redhat.com> Date: Mon, 17 May 2021 09:46:52 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 14/05/21 07:11, Andy Lutomirski wrote: > That's nice, but it fails to restore XINUSE[PKRU]. As far as I know, > that bit is live, and the only way to restore it to 0 is with > XRSTOR(S). The manual says "It is possible for XINUSE[i] to be 1 even when state component i is in its initial configuration" so this is architecturally valid. Does the XINUSE optimization matter for PKRU which is a single word? >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index cebdaa1e3cf5..cd95adbd140c 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -912,10 +912,10 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu) >> } >> >> if (static_cpu_has(X86_FEATURE_PKU) && >> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || >> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) && >> - vcpu->arch.pkru != vcpu->arch.host_pkru) >> - __write_pkru(vcpu->arch.pkru); >> + vcpu->arch.pkru != vcpu->arch.host_pkru && >> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || >> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) >> + __write_pkru(vcpu->arch.pkru, false); > > Please tell me I'm missing something (e.g. KVM very cleverly managing > the PKRU register using intercepts) that makes this reliably load the > guest value. An innocent or malicious guest could easily make that > condition evaluate to false, thus allowing the host PKRU value to be > live in guest mode. (Or is something fancy going on here?) RDPKRU/WRPKRU cannot be used unless CR4.PKE=1, but PKRU can still be accessed using XSAVE/XRSTOR. However both CR4 and XCR0 have their writes trapped, so the guest will not use the host PKRU value before the next vmexit if CR4.PKE=0 and XCR0.PKRU=0. > I don't even want to think about what happens if a perf NMI hits and > accesses host user memory while the guest PKRU is live (on VMX -- I > think this can't happen on SVM). This is indeed a problem, which indeed cannot happen on SVM but is there on VMX. Note that the function above is not handling all of the xstate, it's handling the *XSAVE state*, that is XCR0, XSS and PKRU. Thus the window is small, but it's there. Is it solvable at all, without having PKRU fields in the VMCS (and without masking NMIs in the LAPIC which would be too expensive)? Dave, Sean, what do you think? >> } >> EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state); >> >> @@ -925,11 +925,11 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu) >> return; >> >> if (static_cpu_has(X86_FEATURE_PKU) && >> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || >> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) { >> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) || >> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) { >> vcpu->arch.pkru = rdpkru(); >> if (vcpu->arch.pkru != vcpu->arch.host_pkru) >> - __write_pkru(vcpu->arch.host_pkru); >> + __write_pkru(vcpu->arch.host_pkru, true); >> } > > Suppose the guest writes to PKRU and then, without exiting, sets PKE = > 0 and XCR0[PKRU] = 0. (Or are the intercepts such that this can't > happen except on SEV where maybe SEV magic makes the problem go away?) Yes, see above. KVM needs to trap CR4 and XCR0 anyway (CR4 because you don't want the guest to clear e.g. MCE, XCR0 to forbid setting bits that the host kernel does not have in its own xstate). > I admit I'm fairly mystified as to why KVM doesn't handle PKRU like > the rest of guest XSTATE. Because the rest of the guest XSTATE is live too early. The problem you mention above with respect to perf, where you access host memory with the guest PKRU, would be very much amplified. It is basically the opposite problem of what you have in switch_fpu_finish, which loads PKRU eagerly while delaying the rest to the switch to userland. Paolo