Date: Tue, 5 Dec 2023 16:07:15 -0800
References: <20231108111806.92604-1-nsaenz@amazon.com> <20231108111806.92604-6-nsaenz@amazon.com>
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page
From: Sean Christopherson
To: Maxim Levitsky
Cc: Nicolas Saenz Julienne, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-hyperv@vger.kernel.org, pbonzini@redhat.com, vkuznets@redhat.com,
 anelkz@amazon.com, graf@amazon.com, dwmw@amazon.co.uk, jgowans@amazon.com,
 kys@microsoft.com, haiyangz@microsoft.com,
 decui@microsoft.com, x86@kernel.org, linux-doc@vger.kernel.org

On Tue, Dec 05, 2023, Maxim Levitsky wrote:
> On Tue, 2023-12-05 at 11:21 -0800, Sean Christopherson wrote:
> > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > On Fri Dec 1, 2023 at 5:47 PM UTC, Sean Christopherson wrote:
> > > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > > On Fri Dec 1, 2023 at 4:32 PM UTC, Sean Christopherson wrote:
> > > > > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > > > > > To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> > > > > > > > although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.
> > > > > > >
> > > > > > > I thought about it at the time. It's not that simple though, we should
> > > > > > > still let KVM set the hypercall bytecode, and other quirks like the Xen
> > > > > > > one.
> > > > > >
> > > > > > Yeah, that Xen quirk is quite the killer.
> > > > > >
> > > > > > Can you provide pseudo-assembly for what the final page is supposed to look like?
> > > > > > I'm struggling mightily to understand what this is actually trying to do.
> > > > >
> > > > > I'll make it as simple as possible (disregard 32bit support and that xen
> > > > > exists):
> > > > >
> > > > >     vmcall         <- Offset 0, regular Hyper-V hypercalls enter here
> > > > >     ret
> > > > >     mov rax,rcx    <- VTL call hypercall enters here
> > > >
> > > > I'm missing who/what defines "here" though. What generates the CALL that points
> > > > at this exact offset? If the exact offset is dictated in the TLFS, then aren't
> > > > we screwed with the whole Xen quirk, which inserts 5 bytes before that first VMCALL?
> > >
> > > Yes, sorry, I should've included some more context.
> > >
> > > Here's a rundown (from memory) of how the first VTL call happens:
> > >  - CPU0 starts running at VTL0.
> > >  - Hyper-V enables VTL1 on the partition.
> > >  - Hyper-V enables VTL1 on CPU0, but doesn't yet switch to it. It passes
> > >    the initial VTL1 CPU state alongside the enablement hypercall
> > >    arguments.
> > >  - Hyper-V sets the Hypercall page overlay address through
> > >    HV_X64_MSR_HYPERCALL. KVM fills it.
> > >  - Hyper-V gets the VTL-call and VTL-return offset into the hypercall
> > >    page using the VP Register HvRegisterVsmCodePageOffsets (VP register
> > >    handling is in user-space).
> >
> > Ah, so the guest sets the offsets by "writing" HvRegisterVsmCodePageOffsets via
> > a HvSetVpRegisters() hypercall.
>
> No, you didn't understand this correctly.
>
> The guest writes the HV_X64_MSR_HYPERCALL, and in the response hyperv fills

When people say "Hyper-V", do y'all mean "root partition"? If so, can we just
say "root partition"? Part of my confusion is that I don't instinctively know
whether things like "Hyper-V enables VTL1 on the partition" are talking about
the root partition (or I guess parent partition?) or the hypervisor. Functionally
it probably doesn't matter, it's just hard to reconcile things with the TLFS, which
is written largely to describe the hypervisor's behavior.

> the hypercall page, including the VSM thunks.
>
> Then the guest can _read_ the offsets Hyper-V chose by issuing another hypercall.

Hrm, now I'm really confused. Ah, the TLFS contradicts itself. The blurb for
AccessVpRegisters says:

  The partition can invoke the hypercalls HvSetVpRegisters and HvGetVpRegisters.

And HvSetVpRegisters confirms that requirement:

  The caller must either be the parent of the partition specified by PartitionId,
  or the partition specified must be “self” and the partition must have the
  AccessVpRegisters privilege

But it's absent from HvGetVpRegisters:

  The caller must be the parent of the partition specified by PartitionId or
  the partition specifying its own partition ID.

> In the current implementation, the offsets that the kernel chooses are first
> exposed to the userspace via a new ioctl, and then the userspace exposes these
> offsets to the guest via that 'another hypercall' (reading a pseudo partition
> register 'HvRegisterVsmCodePageOffsets')
>
> I personally don't know for sure anymore if the userspace or kernel based
> hypercall page is better here, it's ugly regardless :(

Hrm. Requiring userspace to intercept the WRMSR will be a mess because then KVM
will have zero knowledge of the hypercall page, e.g. userspace would be forced to
intercept HV_X64_MSR_GUEST_OS_ID as well. That's not the end of the world, but
it's not exactly ideal either.

What if we exit to userspace with a new kvm_hyperv_exit reason that requires
completion? I.e. punt to userspace if VSM is enabled, but still record the data
in KVM? Ugh, but even that's a mess because kvm_hv_set_msr_pw() is deep in the
WRMSR emulation call stack and can't easily signal that an exit to userspace is
needed. Blech.
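
To make the offset concern from earlier in the thread concrete, here is a toy
Python model of the page layout in the quoted listing. It is only a sketch
under stated assumptions, not KVM code: build_hypercall_page() and the
constant names are hypothetical, the byte encodings are assumed encodings for
the instructions named in the thread (with a 5-byte "or eax, imm32" standing in
for the Xen quirk prefix), and only the first instruction of the VTL-call thunk
is modeled.

```python
# Toy model of the hypercall page layout discussed in the thread.
# All names and byte encodings here are illustrative assumptions.
XEN_QUIRK = bytes([0x0D, 0x00, 0x00, 0x00, 0x80])  # assumed: or eax, 0x80000000 (5 bytes)
VMCALL = bytes([0x0F, 0x01, 0xC1])                 # assumed: Intel VMCALL encoding
RET = bytes([0xC3])                                # near return
MOV_RAX_RCX = bytes([0x48, 0x89, 0xC8])            # first instruction of the VTL-call thunk


def build_hypercall_page(xen_quirk):
    """Return (page bytes, offset of the VTL-call entry point)."""
    page = bytearray()
    if xen_quirk:
        # The quirk prepends 5 bytes before the first VMCALL.
        page += XEN_QUIRK
    # Regular hypercalls always enter at offset 0.
    page += VMCALL + RET
    # The VTL-call thunk starts wherever the regular stub ends.
    vtl_call_offset = len(page)
    page += MOV_RAX_RCX
    return bytes(page), vtl_call_offset


for quirk in (False, True):
    page, offset = build_hypercall_page(quirk)
    print(f"xen_quirk={quirk}: VTL-call entry at offset {offset}, bytes {page.hex()}")
```

In this model the regular hypercall entry stays at offset 0 either way, but the
VTL-call entry moves by the size of the prefix. That is consistent with the
guest reading the offsets via HvRegisterVsmCodePageOffsets rather than assuming
a fixed value.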