From: Al Viro
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [RFC PATCH 03/62] new inode method: ->free_inode()
Date: Tue, 16 Apr 2019 18:52:41 +0100
Message-Id: <20190416175340.21068-3-viro@ZenIV.linux.org.uk>
In-Reply-To: <20190416175340.21068-1-viro@ZenIV.linux.org.uk>
References: <20190416174900.GT2217@ZenIV.linux.org.uk> <20190416175340.21068-1-viro@ZenIV.linux.org.uk>

A lot of ->destroy_inode() instances end with call_rcu() of a callback
that does RCU-delayed part of freeing.  Introduce a new method for
doing just that, with saner signature.

Rules:
->destroy_inode ->free_inode
        f               g       immediate call of f(),
                                RCU-delayed call of g()
        f               NULL    immediate call of f(),
                                no RCU-delayed calls
        NULL            g       RCU-delayed call of g()
        NULL            NULL    RCU-delayed default freeing

IOW, NULL ->free_inode gives the same behaviour as now.

Note that NULL, NULL is equivalent to NULL, free_inode_nonrcu; we could
mandate the latter form, but that would have very little benefit beyond
making rules a bit more symmetric.  It would break backwards compatibility,
require extra boilerplate and expected semantics for (NULL, NULL) pair
would have no use whatsoever...

Signed-off-by: Al Viro
---
 Documentation/filesystems/Locking |  2 ++
 Documentation/filesystems/porting | 17 ++++++++++++
 fs/inode.c                        | 54 +++++++++++++++++++++++----------------
 include/linux/fs.h                |  1 +
 4 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index efea228ccd8a..7b20c385cc02 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -118,6 +118,7 @@ set: 	exclusive
 --------------------------- super_operations ---------------------------
 prototypes:
 	struct inode *(*alloc_inode)(struct super_block *sb);
+	void (*free_inode)(struct inode *);
 	void (*destroy_inode)(struct inode *);
 	void (*dirty_inode) (struct inode *, int flags);
 	int (*write_inode) (struct inode *, struct writeback_control *wbc);
@@ -139,6 +140,7 @@ locking rules:
 	All may block [not true, see below]
 			s_umount
 alloc_inode:
+free_inode:				called from RCU callback
 destroy_inode:
 dirty_inode:
 write_inode:
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index cf43bc4dbf31..9d80f9e0855e 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -638,3 +638,20 @@ in your dentry operations instead.
 	inode to d_splice_alias() will also do the right thing (equivalent of
 	d_add(dentry, NULL); return NULL;), so that kind of special cases
 	also doesn't need a separate treatment.
+--
+[strongly recommended]
+	take the RCU-delayed parts of ->destroy_inode() into a new method -
+	->free_inode().  If ->destroy_inode() becomes empty - all the better,
+	just get rid of it.  Synchronous work (e.g. the stuff that can't
+	be done from an RCU callback, or any WARN_ON() where we want the
+	stack trace) *might* be movable to ->evict_inode(); however,
+	that goes only for the things that are not needed to balance something
+	done by ->alloc_inode().  IOW, if it's cleaning up the stuff that
+	might have accumulated over the life of in-core inode, ->evict_inode()
+	might be a fit.
+
+	Rules for inode destruction:
+		* if ->destroy_inode() is non-NULL, it gets called
+		* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
+		* combination of NULL ->destroy_inode and NULL ->free_inode is
+		  treated as NULL/free_inode_nonrcu, to preserve the compatibility.
diff --git a/fs/inode.c b/fs/inode.c
index e9d97add2b36..fb45590d284e 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -202,12 +202,28 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 }
 EXPORT_SYMBOL(inode_init_always);
 
+void free_inode_nonrcu(struct inode *inode)
+{
+	kmem_cache_free(inode_cachep, inode);
+}
+EXPORT_SYMBOL(free_inode_nonrcu);
+
+static void i_callback(struct rcu_head *head)
+{
+	struct inode *inode = container_of(head, struct inode, i_rcu);
+	if (inode->i_sb->s_op->free_inode)
+		inode->i_sb->s_op->free_inode(inode);
+	else
+		free_inode_nonrcu(inode);
+}
+
 static struct inode *alloc_inode(struct super_block *sb)
 {
+	const struct super_operations *ops = sb->s_op;
 	struct inode *inode;
 
-	if (sb->s_op->alloc_inode)
-		inode = sb->s_op->alloc_inode(sb);
+	if (ops->alloc_inode)
+		inode = ops->alloc_inode(sb);
 	else
 		inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
 
@@ -215,22 +231,18 @@ static struct inode *alloc_inode(struct super_block *sb)
 		return NULL;
 
 	if (unlikely(inode_init_always(sb, inode))) {
-		if (inode->i_sb->s_op->destroy_inode)
-			inode->i_sb->s_op->destroy_inode(inode);
-		else
-			kmem_cache_free(inode_cachep, inode);
+		if (ops->destroy_inode) {
+			ops->destroy_inode(inode);
+			if (!ops->free_inode)
+				return NULL;
+		}
+		i_callback(&inode->i_rcu);
 		return NULL;
 	}
 
 	return inode;
 }
 
-void free_inode_nonrcu(struct inode *inode)
-{
-	kmem_cache_free(inode_cachep, inode);
-}
-EXPORT_SYMBOL(free_inode_nonrcu);
-
 void __destroy_inode(struct inode *inode)
 {
 	BUG_ON(inode_has_buffers(inode));
@@ -253,20 +265,18 @@ void __destroy_inode(struct inode *inode)
 }
 EXPORT_SYMBOL(__destroy_inode);
 
-static void i_callback(struct rcu_head *head)
-{
-	struct inode *inode = container_of(head, struct inode, i_rcu);
-	kmem_cache_free(inode_cachep, inode);
-}
-
 static void destroy_inode(struct inode *inode)
 {
+	const struct super_operations *ops = inode->i_sb->s_op;
+
 	BUG_ON(!list_empty(&inode->i_lru));
 	__destroy_inode(inode);
-	if (inode->i_sb->s_op->destroy_inode)
-		inode->i_sb->s_op->destroy_inode(inode);
-	else
-		call_rcu(&inode->i_rcu, i_callback);
+	if (ops->destroy_inode) {
+		ops->destroy_inode(inode);
+		if (!ops->free_inode)
+			return;
+	}
+	call_rcu(&inode->i_rcu, i_callback);
 }
 
 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd28e7679089..2e9b9f87caca 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1903,6 +1903,7 @@ extern loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
 struct super_operations {
 	struct inode *(*alloc_inode)(struct super_block *sb);
 	void (*destroy_inode)(struct inode *);
+	void (*free_inode)(struct inode *);
 
 	void (*dirty_inode) (struct inode *, int flags);
 	int (*write_inode) (struct inode *, struct writeback_control *wbc);
-- 
2.11.0
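
The four-row rules table in the commit message can be checked against the new destroy_inode() logic with a small userspace model. What follows is only a sketch under stated assumptions, not kernel code: struct inode and struct super_operations are cut-down stand-ins, call_rcu() is replaced by a hypothetical one-slot pending queue, and run_grace_period(), model_destroy_inode(), run_row() and struct row_result are names invented for the illustration. Default freeing is counted rather than performed.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types; illustration only. */
struct inode { int unused; };

struct super_operations {
	void (*destroy_inode)(struct inode *);
	void (*free_inode)(struct inode *);
};

/* What happened to one inode, as counted by the model below. */
struct row_result {
	int immediate;	/* synchronous ->destroy_inode() calls */
	int delayed;	/* deferred ->free_inode() calls */
	int dflt;	/* deferred default frees */
};

static struct row_result seen;

/* One-slot stand-in for call_rcu(): at most one inode is pending. */
static const struct super_operations *pending_ops;
static struct inode *pending_inode;

/* Sample methods; "f" and "g" are the names used in the rules table. */
void f(struct inode *inode) { (void)inode; seen.immediate++; }
void g(struct inode *inode) { (void)inode; seen.delayed++; }

/* Models i_callback(): ->free_inode() if set, default freeing otherwise. */
static void run_grace_period(void)
{
	if (!pending_inode)
		return;
	if (pending_ops->free_inode)
		pending_ops->free_inode(pending_inode);
	else
		seen.dflt++;	/* stands in for free_inode_nonrcu() */
	pending_inode = NULL;
}

/* Models destroy_inode() in fs/inode.c as changed by the patch. */
static void model_destroy_inode(const struct super_operations *ops,
				struct inode *inode)
{
	if (ops->destroy_inode) {
		ops->destroy_inode(inode);
		if (!ops->free_inode)
			return;		/* (f, NULL): no RCU-delayed calls */
	}
	pending_ops = ops;		/* stands in for call_rcu() */
	pending_inode = inode;
}

/* Runs one row of the table and reports which paths were taken. */
struct row_result run_row(void (*destroy)(struct inode *),
			  void (*free_fn)(struct inode *))
{
	struct super_operations ops = { destroy, free_fn };
	struct inode inode = { 0 };

	seen = (struct row_result){ 0, 0, 0 };
	pending_ops = NULL;
	pending_inode = NULL;
	model_destroy_inode(&ops, &inode);
	run_grace_period();
	return seen;
}
```

Calling run_row(f, NULL) from a test harness reports one immediate call and nothing deferred, which is the second row of the table; run_row(NULL, NULL) reports only a deferred default free, i.e. the pre-patch behaviour for filesystems that set neither method.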