Date: Wed, 23 Feb 2022 20:00:47 +0800
From: Chao Peng
To: "Maciej S. Szmigiero"
Cc: Yu Zhang, Paolo Bonzini, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
    kvm@vger.kernel.org, Borislav Petkov, x86@kernel.org,
    "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields",
    Andrew Morton, "Kirill A . Shutemov", luto@kernel.org,
    jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com,
    david@redhat.com, qemu-devel@nongnu.org
Subject: Re: [PATCH v4 12/12] KVM: Expose KVM_MEM_PRIVATE
Message-ID: <20220223120047.GB53733@chaop.bj.intel.com>
In-Reply-To: <45148f5f-fe79-b452-f3b2-482c5c3291c4@maciej.szmigiero.name>

On Tue, Feb 22, 2022 at 02:16:46AM +0100, Maciej S. Szmigiero wrote:
> On 17.02.2022 14:45, Chao Peng wrote:
> > On Tue, Jan 25, 2022 at 09:20:39PM +0100, Maciej S. Szmigiero wrote:
> > > On 18.01.2022 14:21, Chao Peng wrote:
> > > > KVM_MEM_PRIVATE is not exposed by default but architecture code can turn
> > > > on it by implementing kvm_arch_private_memory_supported().
> > > >
> > > > Also private memslot cannot be movable and the same file+offset can not
> > > > be mapped into different GFNs.
> > > >
> > > > Signed-off-by: Yu Zhang
> > > > Signed-off-by: Chao Peng
> > > > ---
> > > (..)
> > > >   static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
> > > > -				      gfn_t start, gfn_t end)
> > > > +				      struct file *file,
> > > > +				      gfn_t start, gfn_t end,
> > > > +				      loff_t start_off, loff_t end_off)
> > > >   {
> > > >   	struct kvm_memslot_iter iter;
> > > > +	struct kvm_memory_slot *slot;
> > > > +	struct inode *inode;
> > > > +	int bkt;
> > > >   	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
> > > >   		if (iter.slot->id != id)
> > > >   			return true;
> > > >   	}
> > > > +	/* Disallow mapping the same file+offset into multiple gfns. */
> > > > +	if (file) {
> > > > +		inode = file_inode(file);
> > > > +		kvm_for_each_memslot(slot, bkt, slots) {
> > > > +			if (slot->private_file &&
> > > > +			     file_inode(slot->private_file) == inode &&
> > > > +			     !(end_off <= slot->private_offset ||
> > > > +			       start_off >= slot->private_offset
> > > > +					     + (slot->npages >> PAGE_SHIFT)))
> > > > +				return true;
> > > > +		}
> > > > +	}
> > >
> > > That's a linear scan of all memslots on each CREATE (and MOVE) operation
> > > with a fd - we just spent more than a year rewriting similar linear scans
> > > into more efficient operations in KVM.
> >
> > In the last version I tried to solve this problem by using interval tree
> > (just like existing hva_tree), but finally we realized that in one VM we
> > can have multiple fds with overlapped offsets so that approach is
> > incorrect. See https://lkml.org/lkml/2021/12/28/480 for the discussion.
>
> That's right, in this case a two-level structure would be necessary:
> the first level matching a file, then the second level matching that
> file ranges.
> However, if such data is going to be used just for checking possible
> overlap at memslot add or move time it is almost certainly an overkill.

Yes, that is also what I'm seeing.

>
> > So linear scan is used before I can find a better way.
>
> Another option would be to simply not check for overlap at add or move
> time, declare such configuration undefined behavior under KVM API and
> make sure in MMU notifiers that nothing bad happens to the host kernel
> if it turns out somebody actually set up a VM this way (it could be
> inefficient in this case, since it's not supposed to ever happen
> unless there is a bug somewhere in the userspace part).

Specific to the TDX case, SEAMMODULE will fail the overlapping case and
then KVM prints a message to the kernel log. It will not cause any other
side effect, though it does look weird. Yes, a warning about that in the
API document can help to some extent.

Thanks,
Chao
>
> > Chao
>
> Thanks,
> Maciej
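
For readers following the thread, the file+offset check discussed above can be sketched in plain userspace C. This is only an illustration, not the patch itself: struct sketch_slot, SKETCH_PAGE_SHIFT, ranges_overlap() and file_range_in_use() are hypothetical names, file_id stands in for comparing file_inode() pointers, and the sketch assumes that offsets are byte offsets. Under that assumption a slot's end offset is computed as offset + (npages << shift), whereas the quoted hunk uses >>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for a private memslot; not the kernel's struct. */
struct sketch_slot {
	uint64_t file_id;	/* stands in for file_inode(slot->private_file) */
	uint64_t offset;	/* byte offset into the backing file */
	uint64_t npages;	/* slot length in pages */
};

/*
 * Half-open byte ranges [a_start, a_end) and [b_start, b_end) overlap
 * unless one of them ends at or before the point where the other starts.
 */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
			   uint64_t b_start, uint64_t b_end)
{
	return !(a_end <= b_start || a_start >= b_end);
}

/*
 * Linear scan over all slots: returns true if [start_off, end_off) of the
 * file identified by file_id overlaps a range already used by some slot.
 */
static bool file_range_in_use(const struct sketch_slot *slots, size_t n,
			      uint64_t file_id,
			      uint64_t start_off, uint64_t end_off)
{
	for (size_t i = 0; i < n; i++) {
		/* Assumes offsets are in bytes, hence the left shift. */
		uint64_t slot_end = slots[i].offset +
				    (slots[i].npages << SKETCH_PAGE_SHIFT);

		if (slots[i].file_id == file_id &&
		    ranges_overlap(start_off, end_off,
				   slots[i].offset, slot_end))
			return true;
	}
	return false;
}
```

Matching on file_id before comparing ranges is what lets two different fds use the same offsets without being flagged, which is the case that made a single interval tree keyed only on offset incorrect. The scan stays O(number of slots) per call, which is the cost raised in the review.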