Date: Thu, 8 Apr 2021 12:48:49 -0700
From: Daniel Xu
To: Matthew Wilcox
Cc: bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@fb.com, jolsa@kernel.org,
    hannes@cmpxchg.org, yhs@fb.com
Subject: Re: [RFC bpf-next 1/1] bpf: Introduce iter_pagecache
Message-ID: <20210408194849.wmueo74qcxghhf2d@dlxu-fedora-R90QNFJV>
In-Reply-To: <20210408061401.GI2531743@casper.infradead.org>

On Thu, Apr 08, 2021 at 07:14:01AM +0100, Matthew Wilcox wrote:
> On Wed, Apr 07, 2021 at 02:46:11PM -0700, Daniel Xu wrote:
> > +struct bpf_iter_seq_pagecache_info {
> > +	struct mnt_namespace *ns;
> > +	struct radix_tree_root superblocks;
>
> Why are you adding a new radix tree? Use an XArray instead.

Ah right, sorry. Will do.

> > +static struct page *goto_next_page(struct bpf_iter_seq_pagecache_info *info)
> > +{
> > +	struct page *page, *ret = NULL;
> > +	unsigned long idx;
> > +
> > +	rcu_read_lock();
> > +retry:
> > +	BUG_ON(!info->cur_inode);
> > +	ret = NULL;
> > +	xa_for_each_start(&info->cur_inode->i_data.i_pages, idx, page,
> > +			  info->cur_page_idx) {
> > +		if (!page_cache_get_speculative(page))
> > +			continue;
>
> Why do you feel the need to poke around in i_pages directly? Is there
> something wrong with find_get_entries()?

No reason other than I didn't know about the latter. Thanks for the
hint. find_get_entries() seems to return a pagevec of entries which
would complicate the iteration (a 4th layer of things to iterate over).
But I did find find_get_pages_range() which I think can be used to find
1 page at a time. I'll look into it further.

> > +static int __pagecache_seq_show(struct seq_file *seq, struct page *page,
> > +				bool in_stop)
> > +{
> > +	struct bpf_iter_meta meta;
> > +	struct bpf_iter__pagecache ctx;
> > +	struct bpf_prog *prog;
> > +
> > +	meta.seq = seq;
> > +	prog = bpf_iter_get_info(&meta, in_stop);
> > +	if (!prog)
> > +		return 0;
> > +
> > +	meta.seq = seq;
> > +	ctx.meta = &meta;
> > +	ctx.page = page;
> > +	return bpf_iter_run_prog(prog, &ctx);
>
> I'm not really keen on the idea of random BPF programs being able to poke
> at pages in the page cache like this. From your initial description,
> it sounded like all you needed was a list of which pages are present.

Could you elaborate on what "list of which pages are present" implies?
The overall goal with this patch is to detect duplicate content in the
page cache. So anything that helps achieve that goal I would (in
theory) be OK with.

My understanding is the user would need to hash the contents of each
page in the page cache. And BPF provides the flexibility such that this
work could be reused for currently unanticipated use cases.

Furthermore, bpf programs could already look at all the pages in the
page cache by hooking into
tracepoint:filemap:mm_filemap_add_to_page_cache, albeit at a much
slower rate. I figure the downside of adding this page cache iterator
is we're explicitly condoning the behavior.
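Re find_get_pages_range() above, roughly what I have in mind is the
untested sketch below. It is not part of the patch; the struct and
field names (goto_next_page, bpf_iter_seq_pagecache_info, cur_inode,
cur_page_idx) are the ones the patch already uses, and my reading of
find_get_pages_range() -- that it takes a reference on the returned
page and advances the start index past it -- still needs verifying.

static struct page *goto_next_page(struct bpf_iter_seq_pagecache_info *info)
{
	struct address_space *mapping = info->cur_inode->i_mapping;
	struct page *page;
	unsigned int nr;

	/* Fetch a single page at or after cur_page_idx; the helper takes
	 * a reference on the returned page and advances cur_page_idx past
	 * it, so repeated calls walk the whole mapping.
	 */
	nr = find_get_pages_range(mapping, &info->cur_page_idx,
				  (pgoff_t)-1, 1, &page);
	if (!nr)
		return NULL;	/* no more pages in this inode */

	return page;
}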
> > +	INIT_RADIX_TREE(&info->superblocks, GFP_KERNEL);
> > +
> > +	spin_lock(&info->ns->ns_lock);
> > +	list_for_each_entry(mnt, &info->ns->list, mnt_list) {
> > +		sb = mnt->mnt.mnt_sb;
> > +
> > +		/* The same mount may be mounted in multiple places */
> > +		if (radix_tree_lookup(&info->superblocks, (unsigned long)sb))
> > +			continue;
> > +
> > +		err = radix_tree_insert(&info->superblocks,
> > +					(unsigned long)sb, (void *)1);
> > +		if (err)
> > +			goto out;
> > +	}
> > +
> > +	radix_tree_for_each_slot(slot, &info->superblocks, &iter, 0) {
> > +		sb = (struct super_block *)iter.index;
> > +		atomic_inc(&sb->s_active);
> > +	}
>
> Uh. What on earth made you think this was a good way to use the radix
> tree? And, no, the XArray doesn't change that.

The idea behind the radix tree was to deduplicate the mounts by
superblock, because a single filesystem may be mounted in different
locations. I didn't find a set data structure I could reuse, so I
figured a radix tree / xarray would work too. Happy to take any better
ideas. (A rough array-based alternative is sketched below, after the
sign-off.)

> If you don't understand why this is so bad, call xa_dump() on it after
> constructing it. I'll wait.

I did a dump and got the following results: http://ix.io/2VpY .

I received a hint that you may be referring to how the xarray / radix
tree would be as large as the largest pointer. To my uneducated eye it
doesn't look like that's the case in this dump. Could you please
clarify?

<...>

Thanks,
Daniel
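(Appendix: the array-based alternative mentioned above. This is an
untested sketch, not from the patch: it assumes the number of distinct
superblocks in a mount namespace is small enough for a linear scan, and
it reuses the fs/mount.h internals the iterator already touches. The
name collect_superblocks, the MAX_NS_SBS cap, and the error handling
are made up for the sketch.)

#define MAX_NS_SBS 256	/* arbitrary cap for the sketch */

static int collect_superblocks(struct mnt_namespace *ns,
			       struct super_block **sbs, unsigned int *nr)
{
	struct super_block *sb;
	struct mount *mnt;
	unsigned int i, n = 0;
	int err = 0;

	spin_lock(&ns->ns_lock);
	list_for_each_entry(mnt, &ns->list, mnt_list) {
		sb = mnt->mnt.mnt_sb;

		/* The same superblock may back several mounts; skip
		 * duplicates with a linear scan instead of keying a
		 * radix tree / XArray by pointer value.
		 */
		for (i = 0; i < n; i++)
			if (sbs[i] == sb)
				break;
		if (i < n)
			continue;

		if (n == MAX_NS_SBS) {
			err = -ENOSPC;
			break;
		}

		atomic_inc(&sb->s_active);	/* pin it, as the patch does */
		sbs[n++] = sb;
	}
	spin_unlock(&ns->ns_lock);

	/* *nr reflects how many superblocks were pinned, even on error,
	 * so the caller can drop the references on teardown.
	 */
	*nr = n;
	return err;
}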