Date: Thu, 8 Apr 2021 13:44:10 -0700
From: Daniel Xu
To: Christian Brauner
Cc: bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@fb.com, jolsa@kernel.org,
    hannes@cmpxchg.org, yhs@fb.com, Al Viro
Subject: Re: [RFC bpf-next 1/1] bpf: Introduce iter_pagecache
Message-ID: <20210408204410.wszz3rjmqbg4ps3q@dlxu-fedora-R90QNFJV>
References: <22bededbd502e0df45326a54b3056941de65a101.1617831474.git.dxu@dxuuu.xyz>
 <20210408081935.b3xollrzl6lejbyf@wittgenstein>
In-Reply-To: <20210408081935.b3xollrzl6lejbyf@wittgenstein>

On Thu, Apr 08, 2021 at 10:19:35AM +0200, Christian Brauner wrote:
> On Wed, Apr 07, 2021 at 02:46:11PM -0700, Daniel Xu wrote:
> > This commit introduces the bpf page cache iterator. This iterator allows
> > users to run a bpf prog against each page in the "page cache".
> > Internally, the "page cache" is closely tied to the VFS superblock + inode
> > combo. Because of this, iter_pagecache will only examine pages in the
> > caller's mount namespace.
> >
> > Signed-off-by: Daniel Xu
> > ---
> >  kernel/bpf/Makefile         |   2 +-
> >  kernel/bpf/pagecache_iter.c | 293 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 294 insertions(+), 1 deletion(-)
> >  create mode 100644 kernel/bpf/pagecache_iter.c

<...>

> > +static int init_seq_pagecache(void *priv_data, struct bpf_iter_aux_info *aux)
> > +{
> > +	struct bpf_iter_seq_pagecache_info *info = priv_data;
> > +	struct radix_tree_iter iter;
> > +	struct super_block *sb;
> > +	struct mount *mnt;
> > +	void **slot;
> > +	int err;
> > +
> > +	info->ns = current->nsproxy->mnt_ns;
> > +	get_mnt_ns(info->ns);
> > +	INIT_RADIX_TREE(&info->superblocks, GFP_KERNEL);
> > +
> > +	spin_lock(&info->ns->ns_lock);
> > +	list_for_each_entry(mnt, &info->ns->list, mnt_list) {
>
> Not only are there helpers for taking ns_lock,
> static inline void lock_ns_list(struct mnt_namespace *ns)
> static inline void unlock_ns_list(struct mnt_namespace *ns)
> they are also private to fs/namespace.c because it's the only place that
> should ever walk this list.

Thanks for the hints. Would it be acceptable to add some helpers to
fs/namespace.c to allow walking the list?

IIUC the only way to find the list of mounts is by looking at the mount
namespace. And walking each mount and looking at each `struct
super_block`'s inodes and their `struct address_space` seemed like the
cleanest way to walk the page cache.

> This seems buggy: why is it ok here to only take ns_lock and not also
> namespace_sem like mnt_already_visible() and __is_local_mountpoint()
> or the relevant proc iterators? I might be missing something.

Thanks for the hints. I'll take a closer look at the locking. Most
probably I didn't get it right. I should have also mentioned in the
cover letter that I'm fairly sure I messed up the locking somewhere.

> > +		sb = mnt->mnt.mnt_sb;
> > +
> > +		/* The same mount may be mounted in multiple places */
> > +		if (radix_tree_lookup(&info->superblocks, (unsigned long)sb))
> > +			continue;
> > +
> > +		err = radix_tree_insert(&info->superblocks,
> > +					(unsigned long)sb, (void *)1);
> > +		if (err)
> > +			goto out;
> > +	}
> > +
> > +	radix_tree_for_each_slot(slot, &info->superblocks, &iter, 0) {
> > +		sb = (struct super_block *)iter.index;
> > +		atomic_inc(&sb->s_active);
>
> It also isn't nice that you mess with sb->s_active directly.
>
> Imho, this is poking around in a lot of fs/ specific stuff that other
> parts of the kernel should not care about or have access to.

Re above: do you think it'd be appropriate to add more helpers to fs/?

<...>

Thanks,
Daniel
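
P.S. To make the question above more concrete, here is roughly the shape
of helper I was imagining for fs/namespace.c. This is an untested sketch
for discussion only: the name mnt_ns_for_each_sb() and its signature are
made up, and the locking mirrors the RFC (ns_lock only), which may well
also need namespace_sem per your earlier comment.

/*
 * Hypothetical helper that would live in fs/namespace.c: walk every
 * mount in a mount namespace and hand each mount's superblock to a
 * callback, so callers outside fs/ never touch ns->list or ns_lock
 * directly.
 *
 * Note: the callback runs under ns_lock (a spinlock, taken via
 * lock_ns_list()), so it must not sleep. Duplicate superblocks are not
 * filtered here; the caller would still deduplicate them, e.g. with the
 * radix tree from the RFC.
 */
int mnt_ns_for_each_sb(struct mnt_namespace *ns,
		       int (*cb)(struct super_block *sb, void *data),
		       void *data)
{
	struct mount *mnt;
	int err = 0;

	lock_ns_list(ns);
	list_for_each_entry(mnt, &ns->list, mnt_list) {
		err = cb(mnt->mnt.mnt_sb, data);
		if (err)
			break;
	}
	unlock_ns_list(ns);

	return err;
}

init_seq_pagecache() could then pass a callback that only inserts the
superblock into its radix tree, instead of walking ns->list itself.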