Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp3655344rwd; Mon, 22 May 2023 18:09:32 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4NAjU8VUN5oqiNGDDZhs0u0b7q4lneoTPp5WxqR5E4OwOoU5JHKd7OwR0KXZ/LHzvlnnB4 X-Received: by 2002:a17:902:7008:b0:1ac:84dd:6d1f with SMTP id y8-20020a170902700800b001ac84dd6d1fmr12322480plk.1.1684804171828; Mon, 22 May 2023 18:09:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684804171; cv=none; d=google.com; s=arc-20160816; b=xYC+IycM0hWie93SZnNkHXOMZ9TcHF57i/AkZzHwDO/nsfzZDSsisTFySQcWdwhDDc J78P1eHXipJ5dkaQx7RFLJc5v2Dn5uaJSRybKt7bgoFS5yw3EtjkSUjDnapJ3aUZNu4y eBcy8Tz5yzP6U6LCLmhsaLAa2iv8Vdth5z5mH37jWuiDztoiNHR7w0UExdFcpqnig34J EJUdvqFqLmmRlbMJFtFV321KT2WQCDznppXlhtMNE0Q7G09lMiCf5VJsQa33oUMhxOgm WFPDdKpTenwuKKlu4AE8HQlpuD6VJqvzMf3YPkZw3KzLKyINgAXANW3vWahb8xKD7G0n +5Ag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=2aYdF7DTvHPO7OMZesJvRkwGqhq0minNXqz8wSE/nz4=; b=ztaivvSHnKz93x3KoDWbJw2zUat5PHaDIz+k3sRUKctlN+erZ1qggz7Vz9pWGKecxC fLHDkw5ebLXJHj1s7tV1UW1Yy7XXXJ9eANd4BKZHodmR5ocjItOJBxxUoJJMqtCIB1tc yUAHEPV+JBWqDn79VMWkhVeRIuviteVlDD42UV1ZW+zY/Wh0FMTbr2j51qtFR+xyAQnx 0HHZmS0sCXRCVpx+7HUQLQNPxzK1hIXhscGmlswhO7d5xzhNKtyYkhZyGeOlTq2aQIBV Z4b79jdLRI2tTZHEfo2P3lZFMGFXc0iydYIa/55XYd8gOR6lbpYagwyNDpDAlF2PFa1t 2LoQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=6hkfuhgD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j4-20020a170902c08400b001ae3214162dsi5387859pld.548.2023.05.22.18.09.17; Mon, 22 May 2023 18:09:31 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=6hkfuhgD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234983AbjEWAkh (ORCPT + 99 others); Mon, 22 May 2023 20:40:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42262 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235540AbjEWAbC (ORCPT ); Mon, 22 May 2023 20:31:02 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C49DCE47 for ; Mon, 22 May 2023 17:21:42 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id d2e1a72fcca58-64d2d0f5132so2429962b3a.3 for ; Mon, 22 May 2023 17:21:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684801302; x=1687393302; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=2aYdF7DTvHPO7OMZesJvRkwGqhq0minNXqz8wSE/nz4=; b=6hkfuhgDV+Cm6kpsvzWJ48OVTPAuJNbAy//iZV3iTyUkEm0CBAiqKh89/T3iZhHz2Q cS/nl4xPBMzFPSFWoI45FkfXCfhPFhmdrAtsbpG7H2XU7N4TH9lA1jBlH6doK5D5jGF9 ssywWHAIrAIm4hg395CyoE06GD7PWLdSA8pWKEnXN5od0s8MM/vGppgN7Y1k1zVrN/5l t+Xuz2ALaAtGJt9hjPyJJtXoDPu6+f2qM+ISuGV3bk/DwEkTviRlnRJWj+jRfCEMNUsw xubELT0MLwIRN2O5A4wXqEY5VMPb6RITikcAhhPQuoPKHRCzm+uZO4JRG5iNINtlxx8O gjCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684801302; x=1687393302; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=2aYdF7DTvHPO7OMZesJvRkwGqhq0minNXqz8wSE/nz4=; b=CFqTd0BDADi/3BEz+HszE2a/DKHArjhu/tkK0e/NZAuhJpRMSZjF7Q2XCnynbBpklP 814BtD7aV/Rb+bBwYd60A2MzMs3lgBtHF4DJOZV430BvTidzYl0GMQR5TG+DZL71abql uUx6CwnAyKOfsDhAXv5uIGryEx/4vrE8i/jjRwpE68xDdB28Sdid3YAmNniZeji2vw8Z seQUJqEdiCIQfW0wQT9ce2yMGI+ZJuCZ//02vE8yd43FXEPPA04rdzxMMcrKoyLT2git O+BCYh5gmoqK1MqQGdnkveHbYGfyEqGCGEJxH9UMtZjDZJUrEGVBszOt1XErqkpBld6D HNZA== X-Gm-Message-State: AC+VfDwMIpMAvxZo2LwY2r0zj0nkBSc2GjfyG/DeProt0uFWDZgozLMK FfX7HtZGNcY8eVJ1hMyLSa8Sj5AZA+Y= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6a00:986:b0:63d:397a:8fdd with SMTP id u6-20020a056a00098600b0063d397a8fddmr5586115pfg.0.1684801302311; Mon, 22 May 2023 17:21:42 -0700 (PDT) Date: Tue, 23 May 2023 00:21:40 +0000 In-Reply-To: <20230522235838.ov3722lcusotzlvo@amd.com> Mime-Version: 1.0 References: <9efef45f-e9f4-18d1-0120-f0fc0961761c@redhat.com> <5869f50f-0858-ab0c-9049-4345abcf5641@redhat.com> <20230512002124.3sap3kzxpegwj3n2@amd.com> <20230522135036.wnvsmryhkvstwvw2@amd.com> <20230522235838.ov3722lcusotzlvo@amd.com> Message-ID: Subject: Re: Rename restrictedmem => guardedmem? (was: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM) From: Sean Christopherson To: Michael Roth Cc: David Hildenbrand , Chao Peng , Paolo Bonzini , Vitaly Kuznetsov , Jim Mattson , Joerg Roedel , "Maciej S . Szmigiero" , Vlastimil Babka , Vishal Annapurve , Yu Zhang , "Kirill A . Shutemov" , dhildenb@redhat.com, Quentin Perret , tabba@google.com, wei.w.wang@intel.com, Mike Rapoport , Liam Merwick , Isaku Yamahata , Jarkko Sakkinen , Ackerley Tng , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Hugh Dickins , Christian Brauner Content-Type: text/plain; charset="us-ascii" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 22, 2023, Michael Roth wrote: > On Mon, May 22, 2023 at 10:09:40AM -0700, Sean Christopherson wrote: > > On Mon, May 22, 2023, Michael Roth wrote: > > > On Fri, May 12, 2023 at 11:01:10AM -0700, Sean Christopherson wrote: > > > > On Thu, May 11, 2023, Michael Roth wrote: > > > I put together a tree with some fixups that are needed for against the > > > kvm_gmem_solo base tree, and a set of hooks to handle invalidations, > > > preparing the initial private state as suggested above, and a > > > platform-configurable mask that the x86 MMU code can use for determining > > > whether a fault is for private vs. shared pages. > > > > > > KVM: x86: Determine shared/private faults using a configurable mask > > > ^ for TDX we could trivially add an inverted analogue of the mask/logic > > > KVM: x86: Use full 64-bit error code for kvm_mmu_do_page_fault > > > KVM: x86: Add platform hooks for private memory invalidations > > > > Hrm, I'd prefer to avoid adding another hook for this case, arch code already has > > a "hook" in the form of kvm_unmap_gfn_range(). We'd probably just need a > > kvm_gfn_range.is_private flag to communicate to arch/vendor code that the memory > > being zapped is private. > > kvm_unmap_gfn_range() does however get called with kvm->mmu_lock held so > it might be tricky to tie RMP updates into that path. Gah, I caught the mmu_lock issue before the end of my email, but forgot to go back and rethink the first half. > > That'd leave a gap for the unbind() case because kvm_unmap_gfn_range() is invoked > > if and only if there's an overlapping memslot. I'll chew on that a bit to see if > > there's a way to cleanly handle that case without another hook. I think it's worth > > mapping out exactly what we want unbind() to look like anyways, e.g. right now the > > code subtly relies on private memslots being immutable. m > I thought the direction you sort of driving at was to completely decouple > RMP updates for physical pages from the KVM MMU map/unmap paths since the > life-cycles of those backing pages and associated RMP state are somewhat > separate from the state of the GFNs and kvm->mem_attr_array. It seems to > make sense when dealing with things like this unbind() case. > > There's also cases like userspaces that opt to not discard memory after > conversions because they highly favor performance over memory usage. In > those cases it would make sense to defer marking the pages as shared in > the RMP until the FALLOC_FL_PUNCH_HOLE, rather than triggering it via > KVM MMU invalidation path after a conversion. Hmm, right. I got overzealous in my desire to avoid new hooks. > > > KVM: x86: Add platform hook for initializing private memory > > > > This should also be unnecessary. The call to kvm_gmem_get_pfn() is from arch > > code, KVM just needs to ensure the RMP is converted before acquiring mmu_lock, > > e.g. KVM has all the necessary info in kvm_tdp_mmu_page_fault(). > > I think that approach would work fine. The way I was thinking of things > is that KVM MMU would necessarily call kvm_gmem_get_pfn() to grab the > page before mapping it into the guest, so moving it out into an explicit > call should work just as well. That would also drop the need for the > __kvm_gmem_get_pfn() stuff I needed to add for the initial case where we > need to access the PFN prior to making it private. > > > > > The only reason to add another arch hook would be if we wanted to converted the > > RMP when _allocating_, e.g. to preconvert in response to fallocate() instead of > > waiting until #NPF. But I think I would rather add a generic ioctl() to allow > > userspace to effectively prefault guest memory, e.g. to setup the RMP before > > running a vCPU. Such an ioctl() would potentially be useful in other scenarios, > > e.g. on the dest during live migration to reduce jitter. > > Agreed, deferring the RMPUPDATE until it's actually needed would give us > more flexibility on optimizing for things like lazy-acceptance. > > For less-common scenarios like preallocation it makes sense to make that > an opt-in sort of thing for userspace to configure explicitly. > > > > > > *fixup (kvm_gmem_solo): KVM: Fix end range calculation for MMU invalidations > > > > There was another bug in this path. The math for handling a non-zero offsets into > > the file was wrong. The code now looks like: > > > > xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) { > > struct kvm_gfn_range gfn_range = { > > .start = slot->base_gfn + start - slot->gmem.index, > > Sorry if I'm missing something here, but isn't there a risk that: > > start - slot->gmem.index > > would be less than zero? E.g. starting GFN was 0, but current slot is bound > at some non-zero offset in the same gmem instance. I guess the warning below > shouldn't caught that, but it seems like a real scenario. Heh, only if there's a testcase for it. Assuming start >= the slot offset does seem broken, e.g. if the range-to-invalidate overlaps multiple slots, later slots will have index==slot->gmem.index > start. > Since 'index' corresponds to the gmem offset of the current slot, is there any > reason not to do something like this?: > > .start = slot->base_gfn + index - slot->gmem.index, > > But then, if that's the case, wouldn't index == slot->gmem.index? Suggesting > we case just simplify to this?: > > .start = slot->base_gfn, No, e.g. if start is partway through a memslot, there's no need to invalidate the entire memslot. I'll stare at this tomorrow when my brain is hopefully a bit more functional, I suspect there is a min() and/or max() needed somewhere.