Date: Mon, 15 Aug 2022 21:04:11 +0800
From: Chao Peng
To: "Nikunj A. Dadhania"
Cc: "Gupta, Pankaj", Sean Christopherson, Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price, "Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang, "Kirill A . Shutemov", luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret, Michael Roth, mhocko@suse.com, Muchun Song, bharata@amd.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
Message-ID: <20220815130411.GA1073443@chaop.bj.intel.com>
References: <20220706082016.2603916-1-chao.p.peng@linux.intel.com> <9e86daea-5619-a216-fe02-0562cf14c501@amd.com> <9dc91ce8-4cb6-37e6-4c25-27a72dc11dd0@amd.com> <422b9f97-fdf5-54bf-6c56-3c45eff5e174@amd.com> <1407c70c-0c0b-6955-10bb-d44c5928f2d9@amd.com> <1136925c-2e37-6af4-acac-be8bed9f6ed5@amd.com> <1b02db9d-f2f1-94dd-6f37-59481525abff@amd.com>
In-Reply-To: <1b02db9d-f2f1-94dd-6f37-59481525abff@amd.com>

On Fri, Aug 12, 2022 at 02:18:43PM +0530, Nikunj A. Dadhania wrote:
>
>
> On 12/08/22 12:48, Gupta, Pankaj wrote:
> >
> >>>>>>
> >>>>>> However, fallocate() preallocates full guest memory before starting the guest.
> >>>>>> With this behaviour guest memory is *not* demand pinned. Is there a way to
> >>>>>> prevent fallocate() from reserving full guest memory?
> >>>>>
> >>>>> Isn't the pinning being handled by the corresponding host memory backend with mmu
> >>>>> notifier and architecture support while doing the memory operations e.g page
> >>>>> migration and swapping/reclaim (not supported currently AFAIU). But yes, we need
> >>>>> to allocate entire guest memory with the new flags MEMFILE_F_{UNMOVABLE, UNRECLAIMABLE etc}.
> >>>>
> >>>> That is correct, but the question is when does the memory allocated, as these flags are set,
> >>>> memory is neither moved nor reclaimed. In current scenario, if I start a 32GB guest, all 32GB is
> >>>> allocated.
> >>>
> >>> I guess so if guest memory is private by default.
> >>>
> >>> Other option would be to allocate memory as shared by default and
> >>> handle on demand allocation and RMPUPDATE with page state change event. But still that would be done at guest boot time, IIUC.
> >>
> >> Sorry! Don't want to hijack the other thread so replying here.
> >>
> >> I thought the question is for SEV SNP. For SEV, maybe the hypercall with the page state information can be used to allocate memory as we use it or something like quota based memory allocation (just thinking).
> >
> > But all this would have considerable performance overhead (if by default memory is shared) and used mostly at boot time.
> >
> > So, preallocating memory (default memory private) seems better approach for both SEV & SEV SNP with later page management (pinning, reclaim) taken care by host memory backend & architecture together.
>
> I am not sure how will pre-allocating memory help, even if guest would not use full memory it will be pre-allocated. Which if I understand correctly is not expected.

Actually, the current version lets you delay the allocation to a later time (e.g. page fault time) if you don't call fallocate() on the private fd. fallocate() was necessary in previous versions because we treated the existence of a page in the fd as meaning 'private', but in this version we track the private/shared info in KVM, so we no longer rely on that from the memory backing store.

The page will definitely still be pinned once it's allocated; with the current code there is no way to swap it out, for example. That kind of support, if desirable, can be added later through the MOVABLE flag and some other callbacks that let feature-specific code get involved.

Chao

>
> Regards
> Nikunj
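To illustrate the allocation point described above, here is a minimal sketch that uses an ordinary memfd (shmem) as a stand-in for the private fd. It does not use the v7 private-fd API, and the helpers make_guest_memfd() and allocated_bytes() are hypothetical names for illustration. ftruncate() only sets the file size, so no pages are backed until they are first touched, whereas fallocate() populates the whole range up front, which is the "start a 32GB guest, all 32GB is allocated" behaviour from the thread.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* fallocate() */
#include <sys/mman.h>   /* memfd_create(), MFD_CLOEXEC */
#include <sys/stat.h>   /* fstat() */
#include <sys/types.h>
#include <unistd.h>     /* ftruncate(), close() */

/* Hypothetical helper: bytes actually backed by memory for fd, or -1 on error.
 * st_blocks is reported in 512-byte units. */
static long long allocated_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}

/* Hypothetical helper: create a plain memfd of `size` bytes as a stand-in
 * for guest memory. If `prealloc` is non-zero, populate the whole range
 * with fallocate(); otherwise pages are only allocated on first use. */
static int make_guest_memfd(off_t size, int prealloc)
{
    int fd = memfd_create("guest-mem-sketch", MFD_CLOEXEC);

    if (fd < 0)
        return -1;
    /* Sets the file size only; no pages are allocated here. */
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    /* Mode 0: allocate backing pages for [0, size) up front. */
    if (prealloc && fallocate(fd, 0, 0, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With a 4 MiB size, the lazily sized fd should report well under 4 MiB allocated (zero on a typical tmpfs), while the fallocate()d one reports at least 4 MiB. In the private-fd case, the deferred allocation would instead happen when the guest's page fault is handled.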
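The design change (tracking private/shared state in KVM rather than inferring it from whether a page exists in the fd) can be sketched as follows. This is a hypothetical model, not KVM's actual code: vm_mem_sketch, set_private(), is_private(), handle_private_fault() and allocated_count() are invented names. A per-VM bitmap holds the private/shared attribute, and the `allocated` array stands in for "page exists in the private fd". Because the attribute no longer depends on allocation, a page can be marked private without being allocated and is only allocated on its first private fault. Nothing in the sketch ever clears `allocated`, mirroring the point that an allocated page stays pinned.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-VM state, for illustration only (not KVM's structures). */
struct vm_mem_sketch {
    uint64_t npages;
    uint8_t *private_bits;  /* 1 bit per gfn: 1 = private, 0 = shared */
    bool *allocated;        /* stand-in for "page exists in the private fd" */
};

static struct vm_mem_sketch *vm_create(uint64_t npages)
{
    struct vm_mem_sketch *vm = calloc(1, sizeof(*vm));

    if (!vm)
        return NULL;
    vm->npages = npages;
    vm->private_bits = calloc((npages + 7) / 8, 1);  /* all shared at start */
    vm->allocated = calloc(npages, sizeof(bool));    /* nothing allocated */
    if (!vm->private_bits || !vm->allocated) {
        free(vm->private_bits);
        free(vm->allocated);
        free(vm);
        return NULL;
    }
    return vm;
}

static void vm_destroy(struct vm_mem_sketch *vm)
{
    free(vm->private_bits);
    free(vm->allocated);
    free(vm);
}

/* Mark [gfn, gfn + n) private or shared. Only the attribute changes;
 * allocation is untouched in this sketch. */
static void set_private(struct vm_mem_sketch *vm, uint64_t gfn, uint64_t n, bool priv)
{
    for (uint64_t i = gfn; i < gfn + n && i < vm->npages; i++) {
        if (priv)
            vm->private_bits[i / 8] |= (uint8_t)(1u << (i % 8));
        else
            vm->private_bits[i / 8] &= (uint8_t)~(1u << (i % 8));
    }
}

static bool is_private(const struct vm_mem_sketch *vm, uint64_t gfn)
{
    return gfn < vm->npages && ((vm->private_bits[gfn / 8] >> (gfn % 8)) & 1u);
}

/* Fault path: allocate a private page lazily on first access.
 * Returns true only if this fault performed the allocation. */
static bool handle_private_fault(struct vm_mem_sketch *vm, uint64_t gfn)
{
    if (!is_private(vm, gfn) || vm->allocated[gfn])
        return false;
    vm->allocated[gfn] = true;  /* pinned from here on: no free path */
    return true;
}

static uint64_t allocated_count(const struct vm_mem_sketch *vm)
{
    uint64_t count = 0;

    for (uint64_t i = 0; i < vm->npages; i++)
        count += vm->allocated[i];
    return count;
}
```

For example, marking 8 of 16 pages private allocates nothing. The first fault on a private page allocates it, a repeat fault does not, and faults on shared pages never touch the private store.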