Date: Tue, 7 Jun 2022 14:57:49 +0800
From: Chao Peng
To: Vishal Annapurve
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
    Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins,
    Jeff Layton, "J . Bruce Fields", Andrew Morton, Mike Rapoport,
    Steven Price, "Maciej S . Szmigiero", Vlastimil Babka, Yu Zhang,
    "Kirill A . Shutemov", Andy Lutomirski, Jun Nakajima,
    dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
    aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
    Quentin Perret, Michael Roth, mhocko@suse.com
Subject: Re: [PATCH v6 0/8] KVM: mm: fd-based approach for supporting KVM guest private memory
Message-ID: <20220607065749.GA1513445@chaop.bj.intel.com>
References: <20220519153713.819591-1-chao.p.peng@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 06, 2022 at 01:09:50PM -0700, Vishal Annapurve wrote:
> >
> > Private memory map/unmap and conversion
> > ---------------------------------------
> > Userspace's map/unmap operations are done by fallocate() ioctl on the
> > backing store fd.
> >   - map: default fallocate() with mode=0.
> >   - unmap: fallocate() with FALLOC_FL_PUNCH_HOLE.
> > The map/unmap will trigger above memfile_notifier_ops to let KVM map/unmap
> > secondary MMU page tables.
> >
> ....
> > QEMU: https://github.com/chao-p/qemu/tree/privmem-v6
> >
> > An example QEMU command line for TDX test:
> > -object tdx-guest,id=tdx \
> > -object memory-backend-memfd-private,id=ram1,size=2G \
> > -machine q35,kvm-type=tdx,pic=no,kernel_irqchip=split,memory-encryption=tdx,memory-backend=ram1
> >
>
> There should be more discussion around double allocation scenarios
> when using the private fd approach. A malicious guest or buggy
> userspace VMM can cause physical memory getting allocated for both
> shared (memory accessible from host) and private fds backing the guest
> memory.
> Userspace VMM will need to unback the shared guest memory while
> handling the conversion from shared to private in order to prevent
> double allocation even with malicious guests or bugs in userspace VMM.

I don't know how a malicious guest can cause that. The initial design of
this series is to put the private and shared memory into two different
address spaces and give the userspace VMM the flexibility to convert
between the two. It can choose whether or not to respect the guest's
conversion request.

It's possible for a userspace VMM to cause double allocation if it fails
to call the unback operation during the conversion; this may or may not
be a bug. Double allocation may not be wrong, even conceptually. At least
TDX allows you to use half shared and half private memory in the guest,
which means both shared and private memory can be in effect at the same
time. Unbacking the memory is just the current QEMU implementation
choice.

Chao

>
> Options to unback shared guest memory seem to be:
> 1) madvise(.., MADV_DONTNEED/MADV_REMOVE) - This option won't stop
> kernel from backing the shared memory on subsequent write accesses
> 2) fallocate(..., FALLOC_FL_PUNCH_HOLE...) - For file backed shared
> guest memory, this option still is similar to madvice since this would
> still allow shared memory to get backed on write accesses
> 3) munmap - This would give away the contiguous virtual memory region
> reservation with holes in the guest backing memory, which might make
> guest memory management difficult.
> 4) mprotect(... PROT_NONE) - This would keep the virtual memory
> address range backing the guest memory preserved
>
> ram_block_discard_range_fd from reference implementation:
> https://github.com/chao-p/qemu/tree/privmem-v6 seems to be relying on
> fallocate/madvise.
>
> Any thoughts/suggestions around better ways to unback the shared
> memory in order to avoid double allocation scenarios?
>
> Regards,
> Vishal
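
For readers following the fallocate() discussion, here is a minimal
userspace sketch of the behaviour described in option 2 of the quoted
list. It is not taken from the patch series or from QEMU: it assumes an
ordinary memfd as a stand-in for file-backed shared guest memory, and the
helper names (backed_blocks, punch_hole_then_write) and the memfd name are
made up for illustration. It allocates the range with mode=0, punches a
hole, and then writes through a shared mapping to see whether the range
gets backed again.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* */
#include <stdio.h>
#include <sys/mman.h>   /* memfd_create(), mmap(), munmap() */
#include <sys/stat.h>   /* fstat() */
#include <unistd.h>     /* ftruncate(), close() */

/* Number of 512-byte blocks currently allocated to fd, or -1 on error. */
static long backed_blocks(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? (long)st.st_blocks : -1;
}

/*
 * Hypothetical helper: allocate -> punch hole -> write.
 * Returns 0 if the write re-backed the punched range, 1 if the block
 * counts did not follow that pattern, -1 if a system call failed.
 * len should be a multiple of the page size.
 */
static int punch_hole_then_write(size_t len)
{
    long after_alloc, after_punch, after_write;
    char *p = MAP_FAILED;
    int ret = -1;
    int fd;

    assert(len > 0);

    /* The memfd name is illustrative only. */
    fd = memfd_create("shared-guest-ram", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0)
        goto out;

    /* "map": default fallocate() with mode=0 allocates backing pages. */
    if (fallocate(fd, 0, 0, (off_t)len) != 0)
        goto out;
    after_alloc = backed_blocks(fd);

    /* "unmap": PUNCH_HOLE must be combined with KEEP_SIZE. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, (off_t)len) != 0)
        goto out;
    after_punch = backed_blocks(fd);

    /* A later write through a shared mapping faults a page back in. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    p[0] = 1;
    after_write = backed_blocks(fd);

    printf("blocks: alloc=%ld punch=%ld write=%ld\n",
           after_alloc, after_punch, after_write);
    ret = (after_alloc > 0 && after_punch == 0 && after_write > 0) ? 0 : 1;
out:
    if (p != MAP_FAILED)
        munmap(p, len);
    close(fd);
    return ret;
}
```

Calling punch_hole_then_write(2 * 1024 * 1024) from a small main() and
checking for a return value of 0 is enough to try it. The sketch only
shows that punching a hole frees the pages without preventing a later
write from allocating them again; it says nothing about how the
memfile_notifier path or the private fd behaves.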