Date:   Tue, 4 Jan 2022 17:43:50 +0000
From:   Sean Christopherson <seanjc@google.com>
To:     Chao Peng <chao.p.peng@linux.intel.com>
Cc:     kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
        linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
        qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
        Jonathan Corbet <corbet@lwn.net>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
        x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
        Hugh Dickins <hughd@google.com>,
        Jeff Layton <jlayton@kernel.org>,
        "J . Bruce Fields" <bfields@fieldses.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Yu Zhang <yu.c.zhang@linux.intel.com>,
        "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
        luto@kernel.org, john.ji@intel.com, susie.li@intel.com,
        jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com,
        david@redhat.com
Subject: Re: [PATCH v3 kvm/queue 05/16] KVM: Maintain ofs_tree for fast
 memslot lookup by file offset
Message-ID: <YdSHViDXGkjz5t/Q@google.com>
References: <20211223123011.41044-1-chao.p.peng@linux.intel.com>
 <20211223123011.41044-6-chao.p.peng@linux.intel.com>
 <YcS5uStTallwRs0G@google.com>
 <20211224035418.GA43608@chaop.bj.intel.com>
 <YcuGGCo5pR31GkZE@google.com>
 <20211231022636.GA7025@chaop.bj.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20211231022636.GA7025@chaop.bj.intel.com>
Precedence: bulk

On Fri, Dec 31, 2021, Chao Peng wrote:
> On Tue, Dec 28, 2021 at 09:48:08PM +0000, Sean Christopherson wrote:
> > KVM handles
> > reverse engineering the memslot to get the offset and whatever else it needs.
> > notify_fallocate() and other callbacks are unchanged, though they probably can
> > drop the inode.
> > 
> > E.g. likely with bad math and handwaving on the overlap detection:
> > 
> > int kvm_private_fd_fallocate_range(void *owner, pgoff_t start, pgoff_t end)
> > {
> > 	struct kvm_memory_slot *slot = owner;
> > 	struct kvm_gfn_range gfn_range = {
> > 		.slot	   = slot,
> > 		.start	   = (start - slot->private_offset) >> PAGE_SHIFT,
> > 		.end	   = (end - slot->private_offset) >> PAGE_SHIFT,
> > 		.may_block = true,
> > 	};
> > 
> > 	if (!has_overlap(slot, start, end))
> > 		return 0;
> > 
> > 	gfn_range.end = min(gfn_range.end, slot->base_gfn + slot->npages);
> > 
> > 	kvm_unmap_gfn_range(slot->kvm, &gfn_range);
> > 	return 0;
> > }
> 
> I understand this KVM side handling, but again one fd can have multiple
> memslots. How does shmem decide which memslot(s) to notify when it invokes
> notify_fallocate()? Or does it just notify all the possible memslots and
> let KVM check?

Heh, yeah, those are the two choices.  :-)

Either the backing store needs to support registering callbacks for specific,
arbitrary ranges, or it needs to invoke all registered callbacks.  Invoking all
callbacks has my vote; it's much simpler to implement and is unlikely to incur
meaningful overhead.  _Something_ has to find the overlapping ranges; that cost
doesn't magically go away if it's pushed into the backing store.
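
To make the "invoke all callbacks" option concrete, here is a minimal user-space
sketch, assuming a backing store that keeps nothing but a flat list of
registered notifiers (one per memslot bound to the fd) and calls every one of
them on each fallocate()/hole punch.  Each callback does its own half-open
overlap check against the slice of the file its slot maps.  All names here
(struct range_notifier, struct backing_store, store_register(),
store_fallocate(), slot_invalidate()) are hypothetical stand-ins, not the
shmem or KVM API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's pgoff_t: a page-granular file offset. */
typedef unsigned long pgoff_t;

/* Hypothetical per-memslot notifier registered with the backing store. */
struct range_notifier {
	pgoff_t base;     /* first file page mapped by the slot */
	pgoff_t npages;   /* number of file pages mapped by the slot */
	void (*invalidate)(struct range_notifier *n, pgoff_t start, pgoff_t end);
	unsigned int hits; /* times this slot actually found an overlap */
};

#define MAX_NOTIFIERS 8

/* Hypothetical backing store: no per-range bookkeeping, just a list. */
struct backing_store {
	struct range_notifier *notifiers[MAX_NOTIFIERS];
	size_t nr;
};

static int store_register(struct backing_store *s, struct range_notifier *n)
{
	if (s->nr == MAX_NOTIFIERS)
		return -1;
	s->notifiers[s->nr++] = n;
	return 0;
}

/* Every registered notifier sees every [start, end) range. */
static void store_fallocate(struct backing_store *s, pgoff_t start, pgoff_t end)
{
	for (size_t i = 0; i < s->nr; i++)
		s->notifiers[i]->invalidate(s->notifiers[i], start, end);
}

/* The consumer (KVM, in the real design) filters out non-overlapping ranges. */
static void slot_invalidate(struct range_notifier *n, pgoff_t start, pgoff_t end)
{
	assert(start <= end);
	if (end <= n->base || start >= n->base + n->npages)
		return; /* no overlap with this slot's slice of the file */
	n->hits++;      /* real code would zap the overlapping gfns here */
}
```

With two slots covering file pages [0, 256) and [512, 768), a punch of
[255, 513) reaches both callbacks and each records a hit, while [300, 400) is
delivered to both and ignored by both.  The per-call cost is one overlap check
per registered slot, which is the same search a range-aware backing store would
have to do internally.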

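Note, invoking all notifiers is also aligned with the mmu_notifier behavior:
KVM's mmu_notifier is called for every invalidation in the mm, and KVM walks its
memslots to find the overlapping ranges itself.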
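
The quoted kvm_private_fd_fallocate_range() sketch is explicitly hand-waved on
the math.  As one plausible reading, assuming start/end are page offsets into
the fd (as the pgoff_t parameters suggest) and private_offset is a byte offset,
the callback has to clamp the range to the slot's window before subtracting
(so nothing underflows) and add base_gfn so the result is a gfn rather than a
slot-relative page index.  struct slot_view, fd_range_to_gfn_range() and the
4 KiB PAGE_SHIFT below are hypothetical stand-ins, not KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

typedef uint64_t gfn_t;

/* Hypothetical view of the memslot fields the quoted helper touches. */
struct slot_view {
	gfn_t base_gfn;          /* first guest frame mapped by the slot */
	uint64_t npages;         /* slot size in pages */
	uint64_t private_offset; /* byte offset of the slot within the fd */
};

/*
 * Translate file pages [start, end) into the slot's gfn range.
 * Returns false if the range does not overlap the slot at all.
 */
static bool fd_range_to_gfn_range(const struct slot_view *slot,
				  uint64_t start, uint64_t end,
				  gfn_t *gfn_start, gfn_t *gfn_end)
{
	uint64_t base_pgoff = slot->private_offset >> PAGE_SHIFT;
	uint64_t slot_end = base_pgoff + slot->npages;

	assert(start <= end);

	/* Clamp to the slot's window first so the subtraction can't underflow. */
	if (start < base_pgoff)
		start = base_pgoff;
	if (end > slot_end)
		end = slot_end;
	if (start >= end)
		return false;

	/* Rebase from file pages to guest frame numbers. */
	*gfn_start = slot->base_gfn + (start - base_pgoff);
	*gfn_end = slot->base_gfn + (end - base_pgoff);
	return true;
}
```

Under these assumptions, a slot at private_offset 0x400000 (file page 0x400)
with 0x200 pages turns a punch of file pages [0x300, 0x500) into gfns
[base_gfn, base_gfn + 0x100), and a range lying entirely before or after the
slot returns false without touching anything.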