Date: Wed, 3 Apr 2024 08:58:50 -0700
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v19 106/130] KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL
From: Sean Christopherson
To: isaku.yamahata@intel.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, Paolo Bonzini,
	erdemaktas@google.com, Sagi Shahar, Kai Huang, chen.bo@intel.com, hang.yuan@intel.com, tina.zhang@intel.com

On Mon, Feb 26, 2024, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata
>
> Some of TDG.VP.VMCALL require device model, for example, qemu, to handle
> them on behalf of kvm kernel module. TDVMCALL_REPORT_FATAL_ERROR,
> TDVMCALL_MAP_GPA, TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT, and
> TDVMCALL_GET_QUOTE requires user space VMM handling.
>
> Introduce new kvm exit, KVM_EXIT_TDX, and functions to setup it. Device
> model should update R10 if necessary as return value.

Hard NAK.

KVM needs its own ABI, under no circumstance should KVM inherit ABI directly
from the GHCI. Even worse, this doesn't even sanity check the "unknown"
VMCALLs, KVM just blindly punts *everything* to userspace. And even worse than
that, KVM already has at least one user exit that overlaps, TDVMCALL_MAP_GPA =>
KVM_HC_MAP_GPA_RANGE.

If the userspace VMM wants to run an end-around on KVM and directly communicate
with the guest, e.g. via a synthetic device (a la virtio), that's totally fine,
because *KVM* is not defining any unique ABI, KVM is purely providing the
transport, e.g. emulated MMIO or PIO (and maybe not even that). IIRC, this
option even came up in the context of GET_QUOTE.

But explicitly exiting to userspace with KVM_EXIT_TDX is very different. KVM
is creating a contract with userspace that says "for TDX VMCALLs [a-z], KVM
will exit to userspace with values [a-z]". *Every* new VMCALL that's added to
the GHCI will become KVM ABI, e.g. if Intel ships a TDX module that adds a new
VMCALL, then KVM will forward the exit to userspace, and userspace can then
start relying on that behavior.

And punting all register state, decoding, etc. to userspace creates a crap ABI.
KVM effectively did this for SEV and SEV-ES by copying the PSP ABI verbatim into
KVM ioctls(), and it's a gross, ugly mess.
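
To make the decoding burden concrete, here is a rough, userspace-compilable
sketch of what a VMM would have to open-code for just one leaf (MAP_GPA) if KVM
forwards raw registers. Everything prefixed with ex_/EX_ is made up for
illustration, the struct is a trimmed stand-in for the proposed struct
kvm_tdx_vmcall, and the leaf number, the "R10 == 0 means standard call" rule,
the R12/R13 argument layout and the status value are my reading of the GHCI and
should be double-checked against the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT_ULL(n)	(1ULL << (n))

/* Bit positions as defined in the quoted patch. */
#define TDX_VMCALL_REG_MASK_R10	BIT_ULL(10)
#define TDX_VMCALL_REG_MASK_R11	BIT_ULL(11)
#define TDX_VMCALL_REG_MASK_R12	BIT_ULL(12)
#define TDX_VMCALL_REG_MASK_R13	BIT_ULL(13)

/* ASSUMPTION: values as I read them from the GHCI, verify before use. */
#define EX_TDVMCALL_MAP_GPA		0x10001ULL
#define EX_TDVMCALL_INVALID_OPERAND	0x8000000000000000ULL

/* Hypothetical, trimmed stand-in for the proposed struct kvm_tdx_vmcall. */
struct ex_tdx_vmcall {
	uint64_t reg_mask;	/* guest RCX: which registers are valid */
	uint64_t in_r10;	/* call type */
	uint64_t in_r11;	/* subfunction (leaf) */
	uint64_t in_r12;	/* leaf specific, assumed GPA for MAP_GPA */
	uint64_t in_r13;	/* leaf specific, assumed size for MAP_GPA */
	uint64_t out_r10;	/* status returned to the guest */
};

/*
 * Decode a raw register dump into a MAP_GPA request.  Returns 0 and fills
 * @gpa/@size on success; returns -1 and sets the guest-visible status to
 * "invalid operand" if anything looks wrong.
 */
int ex_decode_map_gpa(struct ex_tdx_vmcall *call, uint64_t *gpa, uint64_t *size)
{
	const uint64_t need = TDX_VMCALL_REG_MASK_R10 | TDX_VMCALL_REG_MASK_R11 |
			      TDX_VMCALL_REG_MASK_R12 | TDX_VMCALL_REG_MASK_R13;

	assert(call != NULL && gpa != NULL && size != NULL);

	/* The guest controls the mask, so required registers may be missing. */
	if ((call->reg_mask & need) != need)
		goto invalid;

	/* Only the standard call type and the one leaf handled here. */
	if (call->in_r10 != 0 || call->in_r11 != EX_TDVMCALL_MAP_GPA)
		goto invalid;

	/* Non-empty, 4KiB aligned, and no wrap-around. */
	if (call->in_r13 == 0 || ((call->in_r12 | call->in_r13) & 0xfffULL))
		goto invalid;
	if (call->in_r12 + call->in_r13 < call->in_r12)
		goto invalid;

	*gpa = call->in_r12;
	*size = call->in_r13;
	return 0;

invalid:
	call->out_r10 = EX_TDVMCALL_INVALID_OPERAND;
	return -1;
}
```

Multiply that by every leaf and every VMM, and then again for anyone staring at
a register dump while debugging.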

Each VMCALL that KVM wants to forward needs a dedicated KVM_EXIT_ and associated
struct in the exit union. Yes, it's slightly more work now, but it's one-time
pain. Whereas copying all registers is endless misery for everyone involved,
e.g. *every* userspace VMM needs to decipher the registers, do sanity checking,
etc. And *every* end user needs to do the same when debugging inevitable
failures.

This also solves Chao's comment about XMM registers. Except for emulating
Hyper-V hypercalls, which have very explicit handling, KVM does NOT support
using XMM registers in hypercalls.

> Signed-off-by: Isaku Yamahata
> ---
> v14 -> v15:
> - updated struct kvm_tdx_exit with union
> - export constants for reg bitmask
>
> Signed-off-by: Isaku Yamahata
> ---
>  arch/x86/kvm/vmx/tdx.c   | 83 ++++++++++++++++++++++++++++++++++++-
>  include/uapi/linux/kvm.h | 89 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 170 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index c8eb47591105..72dbe2ff9062 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1038,6 +1038,78 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>
> +static int tdx_complete_vp_vmcall(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_tdx_vmcall *tdx_vmcall = &vcpu->run->tdx.u.vmcall;
> +	__u64 reg_mask = kvm_rcx_read(vcpu);
> +
> +#define COPY_REG(MASK, REG)							\
> +	do {									\
> +		if (reg_mask & TDX_VMCALL_REG_MASK_ ## MASK)			\
> +			kvm_## REG ## _write(vcpu, tdx_vmcall->out_ ## REG);	\
> +	} while (0)
> +
> +
> +	COPY_REG(R10, r10);
> +	COPY_REG(R11, r11);
> +	COPY_REG(R12, r12);
> +	COPY_REG(R13, r13);
> +	COPY_REG(R14, r14);
> +	COPY_REG(R15, r15);
> +	COPY_REG(RBX, rbx);
> +	COPY_REG(RDI, rdi);
> +	COPY_REG(RSI, rsi);
> +	COPY_REG(R8, r8);
> +	COPY_REG(R9, r9);
> +	COPY_REG(RDX, rdx);
> +
> +#undef COPY_REG
> +
> +	return 1;
> +}
> +
> +static int tdx_vp_vmcall_to_user(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_tdx_vmcall *tdx_vmcall = &vcpu->run->tdx.u.vmcall;
> +	__u64 reg_mask;
> +
> +	vcpu->arch.complete_userspace_io = tdx_complete_vp_vmcall;
> +	memset(tdx_vmcall, 0, sizeof(*tdx_vmcall));
> +
> +	vcpu->run->exit_reason = KVM_EXIT_TDX;
> +	vcpu->run->tdx.type = KVM_EXIT_TDX_VMCALL;
> +
> +	reg_mask = kvm_rcx_read(vcpu);
> +	tdx_vmcall->reg_mask = reg_mask;
> +
> +#define COPY_REG(MASK, REG)							\
> +	do {									\
> +		if (reg_mask & TDX_VMCALL_REG_MASK_ ## MASK) {			\
> +			tdx_vmcall->in_ ## REG = kvm_ ## REG ## _read(vcpu);	\
> +			tdx_vmcall->out_ ## REG = tdx_vmcall->in_ ## REG;	\
> +		}								\
> +	} while (0)
> +
> +
> +	COPY_REG(R10, r10);
> +	COPY_REG(R11, r11);
> +	COPY_REG(R12, r12);
> +	COPY_REG(R13, r13);
> +	COPY_REG(R14, r14);
> +	COPY_REG(R15, r15);
> +	COPY_REG(RBX, rbx);
> +	COPY_REG(RDI, rdi);
> +	COPY_REG(RSI, rsi);
> +	COPY_REG(R8, r8);
> +	COPY_REG(R9, r9);
> +	COPY_REG(RDX, rdx);
> +
> +#undef COPY_REG
> +
> +	/* notify userspace to handle the request */
> +	return 0;
> +}
> +
>  static int handle_tdvmcall(struct kvm_vcpu *vcpu)
>  {
>  	if (tdvmcall_exit_type(vcpu))
> @@ -1048,8 +1120,15 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu)
>  		break;
>  	}
>
> -	tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
> -	return 1;
> +	/*
> +	 * Unknown VMCALL.  Toss the request to the user space VMM, e.g. qemu,
> +	 * as it may know how to handle.
> +	 *
> +	 * Those VMCALLs require user space VMM:
> +	 * TDVMCALL_REPORT_FATAL_ERROR, TDVMCALL_MAP_GPA,
> +	 * TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT, and TDVMCALL_GET_QUOTE.
> +	 */
> +	return tdx_vp_vmcall_to_user(vcpu);
>  }
>
>  void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 5e2b28934aa9..a7aa804ef021 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -167,6 +167,92 @@ struct kvm_xen_exit {
>  	} u;
>  };
>
> +/* masks for reg_mask to indicate which registers are passed. */
> +#define TDX_VMCALL_REG_MASK_RBX	BIT_ULL(2)
> +#define TDX_VMCALL_REG_MASK_RDX	BIT_ULL(3)
> +#define TDX_VMCALL_REG_MASK_RSI	BIT_ULL(6)
> +#define TDX_VMCALL_REG_MASK_RDI	BIT_ULL(7)
> +#define TDX_VMCALL_REG_MASK_R8	BIT_ULL(8)
> +#define TDX_VMCALL_REG_MASK_R9	BIT_ULL(9)
> +#define TDX_VMCALL_REG_MASK_R10	BIT_ULL(10)
> +#define TDX_VMCALL_REG_MASK_R11	BIT_ULL(11)
> +#define TDX_VMCALL_REG_MASK_R12	BIT_ULL(12)
> +#define TDX_VMCALL_REG_MASK_R13	BIT_ULL(13)
> +#define TDX_VMCALL_REG_MASK_R14	BIT_ULL(14)
> +#define TDX_VMCALL_REG_MASK_R15	BIT_ULL(15)
> +
> +struct kvm_tdx_exit {
> +#define KVM_EXIT_TDX_VMCALL	1
> +	__u32 type;
> +	__u32 pad;
> +
> +	union {
> +		struct kvm_tdx_vmcall {
> +			/*
> +			 * RAX(bit 0), RCX(bit 1) and RSP(bit 4) are reserved.
> +			 * RAX(bit 0): TDG.VP.VMCALL status code.
> +			 * RCX(bit 1): bitmap for used registers.
> +			 * RSP(bit 4): the caller stack.
> +			 */
> +			union {
> +				__u64 in_rcx;
> +				__u64 reg_mask;
> +			};
> +
> +			/*
> +			 * Guest-Host-Communication Interface for TDX spec
> +			 * defines the ABI for TDG.VP.VMCALL.
> +			 */
> +			/* Input parameters: guest -> VMM */
> +			union {
> +				__u64 in_r10;
> +				__u64 type;
> +			};
> +			union {
> +				__u64 in_r11;
> +				__u64 subfunction;
> +			};
> +			/*
> +			 * Subfunction specific.
> +			 * Registers are used in this order to pass input
> +			 * arguments.  r12=arg0, r13=arg1, etc.
> +			 */
> +			__u64 in_r12;
> +			__u64 in_r13;
> +			__u64 in_r14;
> +			__u64 in_r15;
> +			__u64 in_rbx;
> +			__u64 in_rdi;
> +			__u64 in_rsi;
> +			__u64 in_r8;
> +			__u64 in_r9;
> +			__u64 in_rdx;
> +
> +			/* Output parameters: VMM -> guest */
> +			union {
> +				__u64 out_r10;
> +				__u64 status_code;
> +			};
> +			/*
> +			 * Subfunction specific.
> +			 * Registers are used in this order to output return
> +			 * values.  r11=ret0, r12=ret1, etc.
> +			 */
> +			__u64 out_r11;
> +			__u64 out_r12;
> +			__u64 out_r13;
> +			__u64 out_r14;
> +			__u64 out_r15;
> +			__u64 out_rbx;
> +			__u64 out_rdi;
> +			__u64 out_rsi;
> +			__u64 out_r8;
> +			__u64 out_r9;
> +			__u64 out_rdx;
> +		} vmcall;
> +	} u;
> +};
> +
>  #define KVM_S390_GET_SKEYS_NONE   1
>  #define KVM_S390_SKEYS_MAX        1048576
>
> @@ -210,6 +296,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_NOTIFY           37
>  #define KVM_EXIT_LOONGARCH_IOCSR  38
>  #define KVM_EXIT_MEMORY_FAULT     39
> +#define KVM_EXIT_TDX              40
>
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -470,6 +557,8 @@ struct kvm_run {
>  			__u64 gpa;
>  			__u64 size;
>  		} memory_fault;
> +		/* KVM_EXIT_TDX_VMCALL */
> +		struct kvm_tdx_exit tdx;
>  		/* Fix the size of the union. */
>  		char padding[256];
>  	};
> --
> 2.25.1
>
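
Coming back to the dedicated-exit suggestion above, here is a very rough sketch
of the shape I have in mind, written so that it compiles in userspace. This is
not real KVM code or uAPI: every ex_/EX_ name is hypothetical, the fatal error
exit number and payload are invented, the hypercall exit is only a stand-in for
the existing KVM_EXIT_HYPERCALL + KVM_HC_MAP_GPA_RANGE path, and the leaf
numbers and register layout are my reading of the GHCI. The point is only the
structure: KVM decodes and validates, known leaves get a typed payload, and
unknown leaves fail back to the guest instead of leaking into userspace.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: GHCI values as I read them, verify against the spec. */
#define EX_TDVMCALL_MAP_GPA		0x10001ULL
#define EX_TDVMCALL_REPORT_FATAL_ERROR	0x10003ULL
#define EX_TDVMCALL_INVALID_OPERAND	0x8000000000000000ULL

/* Stand-ins for the existing hypercall exit; not the real uAPI headers. */
#define EX_KVM_EXIT_HYPERCALL		3
#define EX_KVM_HC_MAP_GPA_RANGE		12
/* Hypothetical new exit reason, number chosen arbitrarily. */
#define EX_KVM_EXIT_TDX_FATAL_ERROR	1000

enum ex_action { EX_RESUME_GUEST, EX_EXIT_TO_USER };

/* Hypothetical typed payload: userspace never sees raw register names. */
struct ex_tdx_fatal_error {
	uint64_t error_code;
	uint64_t gpa;
};
static_assert(sizeof(struct ex_tdx_fatal_error) <= 256,
	      "payload must fit in the fixed-size exit union");

struct ex_run {
	uint32_t exit_reason;
	union {
		struct {
			uint64_t nr;
			uint64_t args[3];
		} hypercall;
		struct ex_tdx_fatal_error fatal;
		char padding[256];
	};
};

/* Only the registers this sketch needs. */
struct ex_guest_regs {
	uint64_t r10, r11, r12, r13;
};

enum ex_action ex_handle_tdvmcall(struct ex_guest_regs *regs, struct ex_run *run)
{
	/* Assumed: a non-zero R10 is not a standard GHCI call. */
	if (regs->r10 != 0)
		goto invalid;

	switch (regs->r11) {
	case EX_TDVMCALL_MAP_GPA:
		/* Validate in the kernel, then reuse the existing exit. */
		if (regs->r13 == 0 || ((regs->r12 | regs->r13) & 0xfffULL))
			goto invalid;
		memset(run, 0, sizeof(*run));
		run->exit_reason = EX_KVM_EXIT_HYPERCALL;
		run->hypercall.nr = EX_KVM_HC_MAP_GPA_RANGE;
		run->hypercall.args[0] = regs->r12;		/* gpa */
		run->hypercall.args[1] = regs->r13 >> 12;	/* 4KiB pages */
		return EX_EXIT_TO_USER;
	case EX_TDVMCALL_REPORT_FATAL_ERROR:
		memset(run, 0, sizeof(*run));
		run->exit_reason = EX_KVM_EXIT_TDX_FATAL_ERROR;
		run->fatal.error_code = regs->r12;
		run->fatal.gpa = regs->r13;
		return EX_EXIT_TO_USER;
	default:
		break;
	}

invalid:
	/* Unknown or malformed: fail back to the guest, no userspace exit. */
	regs->r10 = EX_TDVMCALL_INVALID_OPERAND;
	return EX_RESUME_GUEST;
}
```

With something along these lines, a new leaf added to the GHCI does nothing
until KVM explicitly grows a case (and a documented payload) for it.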