Date: Thu, 8 Apr 2021 22:29:27 +0100
From: Matthew Wilcox
To: Daniel Xu
Cc: bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@fb.com, jolsa@kernel.org,
    hannes@cmpxchg.org, yhs@fb.com
Subject: Re: [RFC bpf-next 1/1] bpf: Introduce iter_pagecache
Message-ID: <20210408212927.GQ2531743@casper.infradead.org>
References: <22bededbd502e0df45326a54b3056941de65a101.1617831474.git.dxu@dxuuu.xyz>
 <20210408061401.GI2531743@casper.infradead.org>
 <20210408194849.wmueo74qcxghhf2d@dlxu-fedora-R90QNFJV>
In-Reply-To: <20210408194849.wmueo74qcxghhf2d@dlxu-fedora-R90QNFJV>

On Thu, Apr 08, 2021 at 12:48:49PM -0700, Daniel Xu wrote:
> No reason other than I didn't know about the latter. Thanks for the
> hint. find_get_entries() seems to return a pagevec of entries, which
> would complicate the iteration (a 4th layer of things to iterate over).
>
> But I did find find_get_pages_range(), which I think can be used to
> find one page at a time. I'll look into it further.

Please don't; that's going to be a pagevec too.

> > I'm not really keen on the idea of random BPF programs being able to
> > poke at pages in the page cache like this. From your initial
> > description, it sounded like all you needed was a list of which pages
> > are present.
>
> Could you elaborate on what "list of which pages are present" implies?
> The overall goal with this patch is to detect duplicate content in the
> page cache, so anything that helps achieve that goal I would (in
> theory) be OK with.
>
> My understanding is that the user would need to hash the contents of
> each page in the page cache, and BPF provides the flexibility such that
> this work could be reused for currently unanticipated use cases.

But if you need the contents, then you'll need to kmap() the pages.
I don't see people being keen on exposing kmap() to bpf either. I think
you're much better off providing an interface that returns a hash of
each page to the BPF program.
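Something along these lines, perhaps (an untested sketch, not the RFC's
actual code; the iterator context struct and its fields are illustrative
names, and xxhash() is just an arbitrary choice of hash function). The
point is that the kernel does the kmap and only the hash ever crosses
the BPF boundary:

#include <linux/bpf.h>
#include <linux/highmem.h>
#include <linux/xxhash.h>

/* Hypothetical iterator context; these names are made up. */
struct bpf_iter__pagecache {
	struct bpf_iter_meta *meta;
	u64 inode;
	u64 index;
	u64 hash;	/* computed by the kernel, see below */
};

/* Hash the page contents without ever exposing the mapping to BPF. */
static u64 hash_page_contents(struct page *page)
{
	void *addr = kmap_local_page(page);
	u64 hash = xxhash(addr, PAGE_SIZE, 0);

	kunmap_local(addr);
	return hash;
}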
> Furthermore, bpf programs could already look at all the pages in the
> page cache by hooking into
> tracepoint:filemap:mm_filemap_add_to_page_cache, albeit at a much
> slower rate. I figure the downside of adding this page cache iterator
> is that we're explicitly condoning the behavior.

That should never have been exposed. It's only supposed to be for error
injection. If people have started actually using it for something, then
it's time we delete that tracepoint.

> The idea behind the radix tree was to deduplicate the mounts by
> superblock, because a single filesystem may be mounted in different
> locations. I didn't find a set data structure I could reuse, so I
> figured a radix tree / xarray would work too.
>
> Happy to take any better ideas too.
>
> > If you don't understand why this is so bad, call xa_dump() on it
> > after constructing it. I'll wait.
>
> I did a dump and got the following results: http://ix.io/2VpY .
>
> I received a hint that you may be referring to how the xarray/radix
> tree would be as large as the largest pointer. To my uneducated eye it
> doesn't look like that's the case in this dump. Could you please
> clarify?

We get seven nodes per 4kB page.

$ grep -c 'value 0' 2VpY
15
$ grep -c node 2VpY
43

so we use 6+1/7 pages (43 nodes at seven per page) in order to store 15
values. Each node is 576 bytes, i.e. nine cache lines, so that's 387
cache lines for an amount of data that could fit in two. Liam and I are
working on a data structure that would support doing something along
these lines in an efficient manner, but it's not ready yet.
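To spell out where that shape comes from, the pattern under discussion
is roughly this (my paraphrase of the RFC in xarray terms, not a quote
of the actual patch; the xarray name and helper are mine):

#include <linux/fs.h>
#include <linux/xarray.h>

/* Deduplicate superblocks by storing each one at the index formed from
 * its own address.  Kernel pointers are huge, sparsely scattered
 * numbers, so almost every entry needs its own spine of interior nodes,
 * hence 43 nodes for only 15 values in the dump above.
 */
static void mark_sb_seen(struct xarray *seen, struct super_block *sb)
{
	xa_store(seen, (unsigned long)sb, sb, GFP_KERNEL);
}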