Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp5750472rwr; Tue, 9 May 2023 06:01:51 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6N5cjSgyxNgwN15EZXQmye1qWYt0844efqVHx5YslCfilwC8dTYv9/nNvZVw0kW4ZH7Q0N X-Received: by 2002:a05:6870:5141:b0:187:9dfd:87cf with SMTP id z1-20020a056870514100b001879dfd87cfmr5446903oak.37.1683637310880; Tue, 09 May 2023 06:01:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683637310; cv=none; d=google.com; s=arc-20160816; b=UArqdKxF1gpv0q6ikHkXBF1NawaXMiVuBzvDin0hv8Pc3bg1mp4Gppf0s3F8yVrict Yx3QW1G+0LGFubahX4Axs6Mz4vQ8gwR2EOWj4x0c6gjjDla9jEP8xmO08i6O+/DWR70/ Adl9FrXfBuj2Iyq2jMbq5mwv4aOhv0FKn3K2/Kn0YGIBFWpMoLcika0T8VU5sv0twPSM /FyNe1fsFlG0z0UPeqORJ+PeyitoEvZcgI08RSJJ/BRpU26LYo59AsT9oKJCibyHL4PT 3/zys20dDx6hi+XLO7gNTmisLVNLA4nNY3JYcbuuKLbgYsxIYjrW5QUqgh/kuvFtsVGM 7fpA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=glCHMCJ4NE2rmBeG0lmxKYSV9bs0908wnv7L8ZlBvJo=; b=iNqdBGxVgJNEip212MfLxNaGk3+gk5VFWCRp5o0QOhNMDm5gpIpPnmJW6ANwOIQycy FBPGlUsIG9D+64BYKNGFk4Cnv8YzafiBrrJFNzbsth2JVzK4lslC2RE+yTST00egVB0P wsjG2Lnnkh/EHjrLZbch6CPD/QziARcDO9hCuYQTh7dkVPwzfKGsCUJivKlK0OzrFtGN dk8RVxOv+1ReopgtH7OKZAG3yFHNvU6Peny/Yfc55OKDR6asO7DUWtcqhx1HFhJmiJcw njRlTlvO1hHywsNVFDnw0t5y1kjEcSqaV8U0SNJgSI3nHHEFUYNCW15eEnB3UPqmACSp umaA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UQd+EizI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a19-20020a0568300b9300b006a5f1fdd0aasi6479474otv.74.2023.05.09.06.01.35; Tue, 09 May 2023 06:01:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UQd+EizI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235055AbjEIMwJ (ORCPT + 99 others); Tue, 9 May 2023 08:52:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234981AbjEIMwD (ORCPT ); Tue, 9 May 2023 08:52:03 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 727AA2694; Tue, 9 May 2023 05:52:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1683636720; x=1715172720; h=date:from:to:cc:subject:message-id:reply-to:references: mime-version:content-transfer-encoding:in-reply-to; bh=rrSQ/FJcEqHkrsjB0HKViGMNUOtQomkux4NFU2fyLL0=; b=UQd+EizIVOZpQSRFB1ChVWrnDtLeMUmqaZsyXRzu8n5A1wMND7BGtcr9 dCosj+p3HS3vxF1yS36bX+LDLCuYk4zjgBxN0lxh3oa9Yz2BLekLZKel4 IBWohBrUsz+CyPxguby8SUTReCXjmV+Fgy6kdULuhJK4BMETaU5r8qM5z i7bAsRZ5/yBn/Ja+W4oit51fwHr711jbWjpwi8NvWUmZd5/HEsRyqSfuS gzR+v6b9syaG7Omf4Rz8aGo4tfuXNsYpOhPkUuw/S0ylczJakaSs2eiyv aMIqJxYQ6geE3dLEX1wgv8es5nqVUMqncM7cDKorYBapaNEv9NIFNfccp Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10705"; a="347373707" X-IronPort-AV: E=Sophos;i="5.99,262,1677571200"; d="scan'208";a="347373707" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2023 05:52:00 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10705"; a="768484681" X-IronPort-AV: E=Sophos;i="5.99,262,1677571200"; d="scan'208";a="768484681" Received: from chaop.bj.intel.com (HELO localhost) ([10.240.192.105]) by fmsmga004.fm.intel.com with ESMTP; 09 May 2023 05:51:54 -0700 Date: Tue, 9 May 2023 20:44:28 +0800 From: Chao Peng To: Ackerley Tng Cc: Sean Christopherson , david@redhat.com, pbonzini@redhat.com, vkuznets@redhat.com, jmattson@google.com, joro@8bytes.org, mail@maciej.szmigiero.name, vbabka@suse.cz, vannapurve@google.com, yu.c.zhang@linux.intel.com, kirill.shutemov@linux.intel.com, dhildenb@redhat.com, qperret@google.com, tabba@google.com, michael.roth@amd.com, wei.w.wang@intel.com, rppt@kernel.org, liam.merwick@oracle.com, isaku.yamahata@gmail.com, jarkko@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hughd@google.com, brauner@kernel.org Subject: Re: Rename restrictedmem => guardedmem? (was: Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM) Message-ID: <20230509124428.GA217130@chaop.bj.intel.com> Reply-To: Chao Peng References: MIME-Version: 1.0 Content-Type: text/plain; charset=gb2312 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 05, 2023 at 07:39:36PM +0000, Ackerley Tng wrote: > > Hi Sean, > > Thanks for implementing this POC! > > I??ve started porting the selftests (both Chao??s and those I added [1]). Hi Sean/Ackerley, Thanks for doing that. Overall making gmem a KVM ioctl() looks good to me and it should also play nice with Intel TDX. Besides what Ackerley mentioned below, I think we haven't discussed device assignment, which will be supported in not too long distance. Current VFIO_IOMMU_MAP_DMA consumes virtual address so that needs to be fixed for fd-based memory anyway, and the fix looks not related to whether this being a syscall() or a KVM ioctl(). There will be some initialization sequence dependency, e.g. if gmem is finally a VM-scope ioctl() then we need VM created first before can we map fd-based memory in VFIO, but that sounds not an issue at all. I also see Vlastimil/David expressed their preference on ioctl. So maybe we can move forward on your current PoC. Do you already have a plan to post a formal version? Chao > > guest mem seems to cover the use cases that have been discussed and > proposed so far, but I still need to figure out how gmem can work with > > + hugetlbfs > + specification of/storing memory policy (for NUMA node bindings) > + memory accounting - we may need to account for memory used separately, > so that guest mem shows up separately on /proc/meminfo and similar > places. > > One issue I??ve found so far is that the pointer to kvm (gmem->kvm) is > not cleaned up, and hence it is possible to crash the host kernel in the > following way > > 1. Create a KVM VM > 2. Create a guest mem fd on that VM > 3. Create a memslot with the guest mem fd (hence binding the fd to the > VM) > 4. Close/destroy the KVM VM > 5. Call fallocate(PUNCH_HOLE) on the guest mem fd, which uses gmem->kvm > when it tries to do invalidation. > > I then tried to clean up the gmem->kvm pointer during unbinding when the > KVM VM is destroyed. > > That works, but then I realized there??s a simpler way to use the pointer > after freeing: > > 1. Create a KVM VM > 2. Create a guest mem fd on that VM > 3. Close/destroy the KVM VM > 4. Call fallocate(PUNCH_HOLE) on the guest mem fd, which uses gmem->kvm > when it tries to do invalidation. > > Perhaps binding should mean setting the gmem->kvm pointer in addition to > gmem->bindings. This makes binding and unbinding symmetric and avoids > the use-after-frees described above. > > This also means that creating a guest mem fd is no longer dependent on > the VM. Perhaps we can make creating a gmem fd a system ioctl (like > KVM_GET_API_VERSION and KVM_CREATE_VM) instead of a vm ioctl? > > [1] > https://lore.kernel.org/all/cover.1678926164.git.ackerleytng@google.com/T/ > > Ackerley