Message-ID: <45148f5f-fe79-b452-f3b2-482c5c3291c4@maciej.szmigiero.name>
Date: Tue, 22 Feb 2022 02:16:46 +0100
To: Chao Peng
Cc: Yu Zhang, Paolo Bonzini, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
 kvm@vger.kernel.org, Borislav Petkov, x86@kernel.org,
 "H. Peter Anvin", Hugh Dickins, Jeff Layton, "J. Bruce Fields",
 Andrew Morton, "Kirill A. Shutemov", luto@kernel.org,
 jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com,
 david@redhat.com, qemu-devel@nongnu.org
References: <20220118132121.31388-1-chao.p.peng@linux.intel.com>
 <20220118132121.31388-13-chao.p.peng@linux.intel.com>
 <20220217134548.GA33836@chaop.bj.intel.com>
From: "Maciej S. Szmigiero"
Subject: Re: [PATCH v4 12/12] KVM: Expose KVM_MEM_PRIVATE
In-Reply-To: <20220217134548.GA33836@chaop.bj.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 17.02.2022 14:45, Chao Peng wrote:
> On Tue, Jan 25, 2022 at 09:20:39PM +0100, Maciej S. Szmigiero wrote:
>> On 18.01.2022 14:21, Chao Peng wrote:
>>> KVM_MEM_PRIVATE is not exposed by default, but architecture code can turn
>>> it on by implementing kvm_arch_private_memory_supported().
>>>
>>> Also, a private memslot cannot be moved, and the same file+offset cannot
>>> be mapped into different GFNs.
>>>
>>> Signed-off-by: Yu Zhang
>>> Signed-off-by: Chao Peng
>>> ---
>> (..)
>>>  static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
>>> -				      gfn_t start, gfn_t end)
>>> +				      struct file *file,
>>> +				      gfn_t start, gfn_t end,
>>> +				      loff_t start_off, loff_t end_off)
>>>  {
>>>  	struct kvm_memslot_iter iter;
>>> +	struct kvm_memory_slot *slot;
>>> +	struct inode *inode;
>>> +	int bkt;
>>>  	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
>>>  		if (iter.slot->id != id)
>>>  			return true;
>>>  	}
>>> +	/* Disallow mapping the same file+offset into multiple gfns. */
>>> +	if (file) {
>>> +		inode = file_inode(file);
>>> +		kvm_for_each_memslot(slot, bkt, slots) {
>>> +			if (slot->private_file &&
>>> +			     file_inode(slot->private_file) == inode &&
>>> +			     !(end_off <= slot->private_offset ||
>>> +			       start_off >= slot->private_offset
>>> +					     + (slot->npages >> PAGE_SHIFT)))
>>> +				return true;
>>> +		}
>>> +	}
>>
>> That's a linear scan of all memslots on each CREATE (and MOVE) operation
>> with an fd; we just spent more than a year rewriting similar linear scans
>> into more efficient operations in KVM.
>
> In the last version I tried to solve this problem by using an interval tree
> (just like the existing hva_tree), but we finally realized that in one VM we
> can have multiple fds with overlapping offsets, so that approach is
> incorrect. See https://lkml.org/lkml/2021/12/28/480 for the discussion.

That's right; in this case a two-level structure would be necessary: the
first level matching a file, then the second level matching that file's
ranges.
However, if such data is going to be used just for checking possible
overlap at memslot add or move time, it is almost certainly overkill.

> So a linear scan is used until I can find a better way.

Another option would be to simply not check for overlap at add or move
time, declare such a configuration undefined behavior under the KVM API,
and make sure in the MMU notifiers that nothing bad happens to the host
kernel if it turns out somebody actually set up a VM this way (it could be
inefficient in this case, since it's not supposed to ever happen unless
there is a bug somewhere in the userspace part).

> Chao

Thanks,
Maciej
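The two-level structure mentioned in the reply (first level matching a file, second level matching that file's ranges) can be sketched roughly as follows. This is a minimal userspace illustration under stated assumptions, not KVM code: struct private_index, struct file_ranges, lookup_file(), index_add(), index_destroy(), the integer file_id standing in for the backing inode, the 64-bucket chained hash table used as the first level, and a PAGE_SHIFT of 12 (4 KiB pages) are all hypothetical choices made for the example. Because each file's byte ranges are kept sorted and disjoint, an add-time overlap check becomes a binary search plus one comparison with the predecessor, instead of a walk over every memslot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model: these types and functions are not KVM's,
 * and file_id stands in for the backing inode of a private memslot. */
#define PAGE_SHIFT 12
#define NBUCKETS 64

/* Half-open byte range [start, end) inside one backing file. */
struct range { int64_t start, end; };

/* Level 2: the ranges registered for one file, sorted and disjoint. */
struct file_ranges {
	uintptr_t file_id;
	struct range *r;
	size_t n, cap;
	struct file_ranges *next;	/* hash-bucket chain */
};

/* Level 1: file_id -> file_ranges, a small chained hash table. */
struct private_index { struct file_ranges *bucket[NBUCKETS]; };

static struct file_ranges *lookup_file(struct private_index *idx, uintptr_t file_id)
{
	struct file_ranges **head = &idx->bucket[file_id % NBUCKETS];
	struct file_ranges *f;

	for (f = *head; f; f = f->next)
		if (f->file_id == file_id)
			return f;
	f = calloc(1, sizeof(*f));	/* file not indexed yet: add an empty entry */
	if (!f)
		abort();
	f->file_id = file_id;
	f->next = *head;
	*head = f;
	return f;
}

/*
 * Register @npages pages of @file_id starting at byte @offset.  Returns
 * false, leaving the stored ranges unchanged, if the new range overlaps
 * one already registered for the same file.
 */
static bool index_add(struct private_index *idx, uintptr_t file_id,
		      int64_t offset, uint64_t npages)
{
	/* Pages -> bytes is a left shift by PAGE_SHIFT. */
	struct range nr = { offset, offset + (int64_t)(npages << PAGE_SHIFT) };
	struct file_ranges *f = lookup_file(idx, file_id);
	size_t lo = 0, hi = f->n;

	assert(npages > 0);
	/* Binary search: lo ends as the first range starting at or after nr.end. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (f->r[mid].start < nr.end)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* Stored ranges are sorted and disjoint, so only r[lo - 1] can overlap. */
	if (lo > 0 && f->r[lo - 1].end > nr.start)
		return false;

	if (f->n == f->cap) {
		size_t cap = f->cap ? 2 * f->cap : 4;
		struct range *r = realloc(f->r, cap * sizeof(*r));

		if (!r)
			abort();
		f->r = r;
		f->cap = cap;
	}
	/* Insert at position lo to keep the array sorted by start. */
	memmove(&f->r[lo + 1], &f->r[lo], (f->n - lo) * sizeof(*f->r));
	f->r[lo] = nr;
	f->n++;
	return true;
}

static void index_destroy(struct private_index *idx)
{
	for (size_t i = 0; i < NBUCKETS; i++) {
		while (idx->bucket[i]) {
			struct file_ranges *f = idx->bucket[i];

			idx->bucket[i] = f->next;
			free(f->r);
			free(f);
		}
	}
}
```

For instance, adding 16 pages at offset 0 of file 1 and then of file 2 both succeed, since identical offsets in different files are independent (the case that made a single offset-keyed interval tree incorrect), while adding 4 pages of file 1 at byte offset 32768 is rejected. Whether this bookkeeping pays off for a check that runs only at memslot add or move time is the trade-off raised in the reply.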