From: "Kirill A. Shutemov" <[email protected]>
Sorry for the resend. The original mail had too long a cc list.
====
There's no reason to call rcu_barrier() on every deactivate_locked_super().
We only need to make sure that all delayed rcu free inodes are flushed
before we destroy the related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some
fast paths. E.g. on my machine exit_group() of the last process in an IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
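
For background: the filesystems free their inodes via call_rcu(), so free
callbacks can still be pending when the inode cache is destroyed. The
per-filesystem pattern looks roughly like this (a sketch modeled on ext4's
code; other filesystems are analogous):

static void ext4_i_callback(struct rcu_head *head)
{
	struct inode *inode = container_of(head, struct inode, i_rcu);
	kmem_cache_free(ext4_inode_cachep, EXT4_I(inode));
}

static void ext4_destroy_inode(struct inode *inode)
{
	/* the kmem_cache_free() only happens after a grace period */
	call_rcu(&inode->i_rcu, ext4_i_callback);
}

The rcu_barrier() waits for all pending call_rcu() callbacks, so no inode
gets freed into a cache that has already been destroyed.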
Signed-off-by: Kirill A. Shutemov <[email protected]>
---
fs/9p/v9fs.c | 5 +++++
fs/adfs/super.c | 5 +++++
fs/affs/super.c | 5 +++++
fs/afs/super.c | 5 +++++
fs/befs/linuxvfs.c | 5 +++++
fs/bfs/inode.c | 5 +++++
fs/btrfs/extent_io.c | 6 ++++++
fs/btrfs/inode.c | 5 +++++
fs/ceph/super.c | 5 +++++
fs/cifs/cifsfs.c | 5 +++++
fs/coda/inode.c | 5 +++++
fs/ecryptfs/main.c | 6 ++++++
fs/efs/super.c | 5 +++++
fs/exofs/super.c | 5 +++++
fs/ext2/super.c | 5 +++++
fs/ext3/super.c | 5 +++++
fs/ext4/super.c | 5 +++++
fs/fat/inode.c | 5 +++++
fs/freevxfs/vxfs_super.c | 5 +++++
fs/fuse/inode.c | 6 ++++++
fs/hfs/super.c | 6 ++++++
fs/hfsplus/super.c | 6 ++++++
fs/hpfs/super.c | 5 +++++
fs/hugetlbfs/inode.c | 5 +++++
fs/isofs/inode.c | 5 +++++
fs/jffs2/super.c | 6 ++++++
fs/jfs/super.c | 6 ++++++
fs/logfs/inode.c | 5 +++++
fs/minix/inode.c | 5 +++++
fs/ncpfs/inode.c | 5 +++++
fs/nfs/inode.c | 5 +++++
fs/nilfs2/super.c | 6 ++++++
fs/ntfs/super.c | 6 ++++++
fs/ocfs2/dlmfs/dlmfs.c | 5 +++++
fs/ocfs2/super.c | 5 +++++
fs/openpromfs/inode.c | 5 +++++
fs/qnx4/inode.c | 5 +++++
fs/qnx6/inode.c | 5 +++++
fs/reiserfs/super.c | 5 +++++
fs/romfs/super.c | 5 +++++
fs/squashfs/super.c | 5 +++++
fs/super.c | 6 ------
fs/sysv/inode.c | 5 +++++
fs/ubifs/super.c | 6 ++++++
fs/udf/super.c | 5 +++++
fs/ufs/super.c | 5 +++++
fs/xfs/xfs_super.c | 5 +++++
47 files changed, 240 insertions(+), 6 deletions(-)
diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index b85efa7..392c5da 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -560,6 +560,11 @@ static int v9fs_init_inode_cache(void)
*/
static void v9fs_destroy_inode_cache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(v9fs_inode_cache);
}
diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index 06fdcc9..f137165 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -276,6 +276,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(adfs_inode_cachep);
}
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 0782653..4fe18a8 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -129,6 +129,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(affs_inode_cachep);
}
diff --git a/fs/afs/super.c b/fs/afs/super.c
index f02b31e..af69050 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -123,6 +123,11 @@ void __exit afs_fs_exit(void)
BUG();
}
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(afs_inode_cachep);
_leave("");
}
diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index e18da23..02f3331 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -454,6 +454,11 @@ befs_init_inodecache(void)
static void
befs_destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(befs_inode_cachep);
}
diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 9870417..d5fc598 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -280,6 +280,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(bfs_inode_cachep);
}
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2c8f7b2..010623a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -106,6 +106,12 @@ void extent_io_exit(void)
list_del(&eb->leak_list);
kmem_cache_free(extent_buffer_cache, eb);
}
+
+ /*
+ * Make sure all delayed rcu free are flushed before we
+ * destroy caches.
+ */
+ rcu_barrier();
if (extent_state_cache)
kmem_cache_destroy(extent_state_cache);
if (extent_buffer_cache)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f6ab6f5..971d222 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6976,6 +6976,11 @@ static void init_once(void *foo)
void btrfs_destroy_cachep(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
if (btrfs_inode_cachep)
kmem_cache_destroy(btrfs_inode_cachep);
if (btrfs_trans_handle_cachep)
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 1e67dd7..ecc251e 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -602,6 +602,11 @@ bad_cap:
static void destroy_caches(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(ceph_inode_cachep);
kmem_cache_destroy(ceph_cap_cachep);
kmem_cache_destroy(ceph_dentry_cachep);
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 8b6e344..93ce9b3f 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -975,6 +975,11 @@ cifs_init_inodecache(void)
static void
cifs_destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(cifs_inode_cachep);
}
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index f181312..0c70460 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -85,6 +85,11 @@ int coda_init_inodecache(void)
void coda_destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(coda_inode_cachep);
}
diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index 6895493..f6acf8a 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -693,6 +693,12 @@ static void ecryptfs_free_kmem_caches(void)
{
int i;
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
+
for (i = 0; i < ARRAY_SIZE(ecryptfs_cache_infos); i++) {
struct ecryptfs_cache_info *info;
diff --git a/fs/efs/super.c b/fs/efs/super.c
index e755ec7..2002431 100644
--- a/fs/efs/super.c
+++ b/fs/efs/super.c
@@ -96,6 +96,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(efs_inode_cachep);
}
diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 4337836..d271f96 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -206,6 +206,11 @@ static int init_inodecache(void)
*/
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(exofs_inode_cachep);
}
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index b3621cb..de2051c 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -204,6 +204,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(ext2_inode_cachep);
}
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 8c3a44b..bd60e08 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -543,6 +543,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(ext3_inode_cachep);
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index eb7aa3e..4e2aacb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1046,6 +1046,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(ext4_inode_cachep);
}
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index a3d81eb..35113bc 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -526,6 +526,11 @@ static int __init fat_init_inodecache(void)
static void __exit fat_destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(fat_inode_cachep);
}
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index d4fabd2..fed2c8a 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -279,6 +279,11 @@ static void __exit
vxfs_cleanup(void)
{
unregister_filesystem(&vxfs_fs_type);
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(vxfs_inode_cachep);
}
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 1cd6165..9078173 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1171,6 +1171,12 @@ static void fuse_fs_cleanup(void)
{
unregister_filesystem(&fuse_fs_type);
unregister_fuseblk();
+
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(fuse_inode_cachep);
}
diff --git a/fs/hfs/super.c b/fs/hfs/super.c
index 7b4c537..373a9e5 100644
--- a/fs/hfs/super.c
+++ b/fs/hfs/super.c
@@ -483,6 +483,12 @@ static int __init init_hfs_fs(void)
static void __exit exit_hfs_fs(void)
{
unregister_filesystem(&hfs_fs_type);
+
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(hfs_inode_cachep);
}
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index a9bca4b..bde9ecf 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -615,6 +615,12 @@ static int __init init_hfsplus_fs(void)
static void __exit exit_hfsplus_fs(void)
{
unregister_filesystem(&hfsplus_fs_type);
+
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(hfsplus_inode_cachep);
}
diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
index 706a12c..3cb1da5 100644
--- a/fs/hpfs/super.c
+++ b/fs/hpfs/super.c
@@ -210,6 +210,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(hpfs_inode_cachep);
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cc9281b..b482175 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1042,6 +1042,11 @@ static int __init init_hugetlbfs_fs(void)
static void __exit exit_hugetlbfs_fs(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(hugetlbfs_inode_cachep);
kern_unmount(hugetlbfs_vfsmount);
unregister_filesystem(&hugetlbfs_fs_type);
diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 29037c3..f94cde4 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -114,6 +114,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(isofs_inode_cachep);
}
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index 61ea413..ff48795 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -418,6 +418,12 @@ static void __exit exit_jffs2_fs(void)
unregister_filesystem(&jffs2_fs_type);
jffs2_destroy_slab_caches();
jffs2_compressors_exit();
+
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(jffs2_inode_cachep);
}
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 4a82950..2fc7602 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -898,6 +898,12 @@ static void __exit exit_jfs_fs(void)
jfs_proc_clean();
#endif
unregister_filesystem(&jfs_fs_type);
+
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(jfs_inode_cachep);
}
diff --git a/fs/logfs/inode.c b/fs/logfs/inode.c
index a422f42..3d1b9fa 100644
--- a/fs/logfs/inode.c
+++ b/fs/logfs/inode.c
@@ -401,5 +401,10 @@ int logfs_init_inode_cache(void)
void logfs_destroy_inode_cache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(logfs_inode_cache);
}
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 2a503ad..dc8d362 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -100,6 +100,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(minix_inode_cachep);
}
diff --git a/fs/ncpfs/inode.c b/fs/ncpfs/inode.c
index 333df07..0c62c55 100644
--- a/fs/ncpfs/inode.c
+++ b/fs/ncpfs/inode.c
@@ -89,6 +89,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(ncp_inode_cachep);
}
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index e605d69..39aae2e 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1566,6 +1566,11 @@ static int __init nfs_init_inodecache(void)
static void nfs_destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(nfs_inode_cachep);
}
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 1099a76..956e5a4 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1386,6 +1386,12 @@ static void nilfs_segbuf_init_once(void *obj)
static void nilfs_destroy_cachep(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
+
if (nilfs_inode_cachep)
kmem_cache_destroy(nilfs_inode_cachep);
if (nilfs_transaction_cachep)
diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
index b341492..ecc8625 100644
--- a/fs/ntfs/super.c
+++ b/fs/ntfs/super.c
@@ -3185,6 +3185,12 @@ static void __exit exit_ntfs_fs(void)
ntfs_debug("Unregistering NTFS driver.");
unregister_filesystem(&ntfs_fs_type);
+
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(ntfs_big_inode_cache);
kmem_cache_destroy(ntfs_inode_cache);
kmem_cache_destroy(ntfs_name_cache);
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index e31d6ae..6934459 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -691,6 +691,11 @@ static void __exit exit_dlmfs_fs(void)
flush_workqueue(user_dlm_worker);
destroy_workqueue(user_dlm_worker);
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(dlmfs_inode_cache);
bdi_destroy(&dlmfs_backing_dev_info);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 68f4541..0e91ec2 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1818,6 +1818,11 @@ static int ocfs2_initialize_mem_caches(void)
static void ocfs2_free_mem_caches(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
if (ocfs2_inode_cachep)
kmem_cache_destroy(ocfs2_inode_cachep);
ocfs2_inode_cachep = NULL;
diff --git a/fs/openpromfs/inode.c b/fs/openpromfs/inode.c
index bc49c97..1b76370 100644
--- a/fs/openpromfs/inode.c
+++ b/fs/openpromfs/inode.c
@@ -463,6 +463,11 @@ static int __init init_openprom_fs(void)
static void __exit exit_openprom_fs(void)
{
unregister_filesystem(&openprom_fs_type);
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(op_inode_cachep);
}
diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
index 552e994..9534b4f 100644
--- a/fs/qnx4/inode.c
+++ b/fs/qnx4/inode.c
@@ -391,6 +391,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(qnx4_inode_cachep);
}
diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c
index e44012d..3cba5d2 100644
--- a/fs/qnx6/inode.c
+++ b/fs/qnx6/inode.c
@@ -652,6 +652,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(qnx6_inode_cachep);
}
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 651ce76..ad39139 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -603,6 +603,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(reiserfs_inode_cachep);
}
diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index e64f6b5..f901eaf 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -648,6 +648,11 @@ error_register:
static void __exit exit_romfs_fs(void)
{
unregister_filesystem(&romfs_fs_type);
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(romfs_inode_cachep);
}
diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
index 29cd014..260e392 100644
--- a/fs/squashfs/super.c
+++ b/fs/squashfs/super.c
@@ -425,6 +425,11 @@ static int __init init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(squashfs_inode_cachep);
}
diff --git a/fs/super.c b/fs/super.c
index cf00177..2d962aa 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -256,12 +256,6 @@ void deactivate_locked_super(struct super_block *s)
/* caches are now gone, we can safely kill the shrinker now */
unregister_shrinker(&s->s_shrink);
-
- /*
- * We need to call rcu_barrier so all the delayed rcu free
- * inodes are flushed before we release the fs module.
- */
- rcu_barrier();
put_filesystem(fs);
put_super(s);
} else {
diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
index 08d0b25..a08341c 100644
--- a/fs/sysv/inode.c
+++ b/fs/sysv/inode.c
@@ -376,5 +376,10 @@ int __init sysv_init_icache(void)
void sysv_destroy_icache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(sysv_inode_cachep);
}
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 5862dd9..e8ae748 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2302,6 +2302,12 @@ static void __exit ubifs_exit(void)
dbg_debugfs_exit();
ubifs_compressors_exit();
unregister_shrinker(&ubifs_shrinker_info);
+
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(ubifs_inode_slab);
unregister_filesystem(&ubifs_fs_type);
}
diff --git a/fs/udf/super.c b/fs/udf/super.c
index ac8a348..ebe9f52 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -170,6 +170,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(udf_inode_cachep);
}
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 302f340..f655c1d 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1449,6 +1449,11 @@ static int init_inodecache(void)
static void destroy_inodecache(void)
{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
kmem_cache_destroy(ufs_inode_cachep);
}
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 0d9de41..bbcbf2a 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1582,6 +1582,11 @@ xfs_init_zones(void)
STATIC void
xfs_destroy_zones(void)
{
+ /*
+ * Make sure all delayed rcu free are flushed before we
+ * destroy caches.
+ */
+ rcu_barrier();
kmem_zone_destroy(xfs_ili_zone);
kmem_zone_destroy(xfs_inode_zone);
kmem_zone_destroy(xfs_efi_zone);
--
1.7.10
On Fri, Jun 08, 2012 at 02:43:58PM -0700, Linus Torvalds wrote:
> On Fri, Jun 8, 2012 at 2:28 PM, Kirill A. Shutemov
> <[email protected]> wrote:
> > From: "Kirill A. Shutemov" <[email protected]>
> >
> > There's no reason to call rcu_barrier() on every deactivate_locked_super().
> > We only need to make sure that all delayed rcu free inodes are flushed
> > before we destroy the related cache.
> >
> > Removing rcu_barrier() from deactivate_locked_super() affects some
> > fast paths. E.g. on my machine exit_group() of the last process in an IPC
> > namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
>
> I think we should just delete it.
>
> kmem_cache_destroy() (at least for SLUB) already has:
>
> 	if (s->flags & SLAB_DESTROY_BY_RCU)
> 		rcu_barrier();
>
> in it. But I think it's too late - it gets called *after* we do
> kmem_cache_close(), and I get the feeling that we should do it before.
>
> Shouldn't that be sufficient? And if other slab allocators don't have
> this, we should add it to them too.
>
> Hmm?
When I tried SLAB_DESTROY_BY_RCU I ran into a problem:
[ 36.687999] Pid: 3455, comm: rmmod Not tainted 3.5.0-rc1-00130-g48d212a-dirty #40
[ 36.688001] Call Trace:
[ 36.688012] [<ffffffff8113367a>] slab_err+0xaa/0xd0
[ 36.688020] [<ffffffff8113515a>] ? __kmalloc+0x10a/0x110
[ 36.688026] [<ffffffff8113647d>] kmem_cache_destroy+0x1dd/0x420
[ 36.688056] [<ffffffffa00f0f25>] btrfs_destroy_cachep+0x15/0x60 [btrfs]
[ 36.688076] [<ffffffffa013cac3>] exit_btrfs_fs+0x9/0x3a [btrfs]
[ 36.688083] [<ffffffff810c324e>] sys_delete_module+0x16e/0x2f0
[ 36.688090] [<ffffffff8128cf29>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 36.688097] [<ffffffff8161eba6>] system_call_fastpath+0x1a/0x1f
[ 36.688111] Pid: 3455, comm: rmmod Not tainted 3.5.0-rc1-00130-g48d212a-dirty #40
[ 36.688114] Call Trace:
[ 36.688119] [<ffffffff811365ee>] kmem_cache_destroy+0x34e/0x420
[ 36.688143] [<ffffffffa00f0f25>] btrfs_destroy_cachep+0x15/0x60 [btrfs]
[ 36.688162] [<ffffffffa013cac3>] exit_btrfs_fs+0x9/0x3a [btrfs]
[ 36.688168] [<ffffffff810c324e>] sys_delete_module+0x16e/0x2f0
[ 36.688174] [<ffffffff8128cf29>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 36.688179] [<ffffffff8161eba6>] system_call_fastpath+0x1a/0x1f
IIUC, moving rcu_barrier() up should help, but I can't say that I fully
understand SLAB_DESTROY_BY_RCU semantics.
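
Something like this, I suppose (a sketch against the 3.5-era mm/slub.c,
untested):

void kmem_cache_destroy(struct kmem_cache *s)
{
	down_write(&slub_lock);
	s->refcount--;
	if (!s->refcount) {
		list_del(&s->list);
		up_write(&slub_lock);
		/* moved up: flush delayed frees before closing the cache */
		if (s->flags & SLAB_DESTROY_BY_RCU)
			rcu_barrier();
		if (kmem_cache_close(s)) {
			printk(KERN_ERR "SLUB %s: %s called for cache that "
				"still has objects.\n", s->name, __func__);
			dump_stack();
		}
		sysfs_slab_remove(s);
	} else
		up_write(&slub_lock);
}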
--
Kirill A. Shutemov
On Sat, 9 Jun 2012 00:41:03 +0300
"Kirill A. Shutemov" <[email protected]> wrote:
> There's no reason to call rcu_barrier() on every deactivate_locked_super().
> We only need to make sure that all delayed rcu free inodes are flushed
> before we destroy the related cache.
>
> Removing rcu_barrier() from deactivate_locked_super() affects some
> fast paths. E.g. on my machine exit_group() of the last process in an IPC
> namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
What an unpleasant patch. Is final-process-exiting-ipc-namespace a
sufficiently high-frequency operation to justify the change?
I don't really understand what's going on here. Are you saying that
there is some filesystem against which we run deactivate_locked_super()
during exit_group(), and that this filesystem doesn't use rcu-freeing
of inodes? The description needs this level of detail, please.
The implementation would be less unpleasant if we could do the
rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that
without adding a dedicated slab flag, which would require editing all
the filesystems anyway.
(kmem_cache_destroy() already has an rcu_barrier(). Can we do away
with the private rcu games in the vfs and switch to
SLAB_DESTROY_BY_RCU?)
On Fri, Jun 8, 2012 at 3:00 PM, Kirill A. Shutemov
<[email protected]> wrote:
>
> IIUC, moving rcu_barrier() up should help, but I can't say that I fully
> understand SLAB_DESTROY_BY_RCU semantics.
.. hmm. I think you may be right. Even if we do move it up, we
probably shouldn't use it.
We don't even want SLAB_DESTROY_BY_RCU, since we do the delayed RCU
free for other reasons anyway, so it would duplicate the RCU delaying
and cause problems. I forgot about that little complication.
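(For background: SLAB_DESTROY_BY_RCU only rcu-delays freeing of the
underlying slab pages, and individual objects can be reused immediately
after kmem_cache_free(), so it is not a substitute for the per-object
call_rcu() freeing the VFS does anyway.)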
We could have a separate "RCU_BARRIER_ON_DESTROY" thing, but that's
just silly too.
Maybe your patch is the right thing.
Linus
On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote:
> On Sat, 9 Jun 2012 00:41:03 +0300
> "Kirill A. Shutemov" <[email protected]> wrote:
>
> > There's no reason to call rcu_barrier() on every deactivate_locked_super().
> > We only need to make sure that all delayed rcu free inodes are flushed
> > before we destroy the related cache.
> >
> > Removing rcu_barrier() from deactivate_locked_super() affects some
> > fast paths. E.g. on my machine exit_group() of the last process in an IPC
> > namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
>
> What an unpleasant patch. Is final-process-exiting-ipc-namespace a
> sufficiently high-frequency operation to justify the change?
> I don't really understand what's going on here. Are you saying that
> there is some filesystem against which we run deactivate_locked_super()
> during exit_group(), and that this filesystem doesn't use rcu-freeing
> of inodes? The description needs this level of detail, please.
I think the rcu_barrier() is in the wrong place. We need it to safely destroy
the inode cache. deactivate_locked_super() is part of the umount() path, but all
filesystems I've checked have an inode cache for the whole filesystem, not
per-mount.
> The implementation would be less unpleasant if we could do the
> rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that
> without adding a dedicated slab flag, which would require editing all
> the filesystems anyway.
I think rcu_barrier() for all kmem_cache_destroy() would be too expensive.
> (kmem_cache_destroy() already has an rcu_barrier(). Can we do away
> with the private rcu games in the vfs and switch to
> SLAB_DESTROY_BY_RCU?)
--
Kirill A. Shutemov
On Sat, Jun 09, 2012 at 01:14:46AM +0300, Kirill A. Shutemov wrote:
> > The implementation would be less unpleasant if we could do the
> > rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that
> > without adding a dedicated slab flag, which would require editing all
> > the filesystems anyway.
>
> I think rcu_barrier() for all kmem_cache_destroy() would be too expensive.
You've got to be kidding. Please, show us the codepath that would be hot
enough to make that too expensive and would contain kmem_cache_destroy().
Note that module unload is *not* a hot path - not on any even remotely sane
use.
On Fri, Jun 08, 2012 at 03:06:20PM -0700, Linus Torvalds wrote:
> .. hmm. I think you may be right. Even if we do move it up, we
> probably shouldn't use it.
>
> We don't even want SLAB_DESTROY_BY_RCU, since we do the delayed RCU
> free for other reasons anyway, so it would duplicate the RCU delaying
> and cause problems. I forgot about that little complication.
>
> We could have a separate "RCU_BARRIER_ON_DESTROY" thing, but that's
> just silly too.
Why not make that rcu_barrier() in there unconditional? Where are
we creating/destroying caches often enough for that to become a problem?
On Sat, 9 Jun 2012 01:14:46 +0300
"Kirill A. Shutemov" <[email protected]> wrote:
> On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote:
> > On Sat, 9 Jun 2012 00:41:03 +0300
> > "Kirill A. Shutemov" <[email protected]> wrote:
> >
> > > There's no reason to call rcu_barrier() on every deactivate_locked_super().
> > > We only need to make sure that all delayed rcu free inodes are flushed
> > > before we destroy the related cache.
> > >
> > > Removing rcu_barrier() from deactivate_locked_super() affects some
> > > fast paths. E.g. on my machine exit_group() of the last process in an IPC
> > > namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
> >
> > What an unpleasant patch. Is final-process-exiting-ipc-namespace a
> > sufficiently high-frequency operation to justify the change?
This, please.
> > I don't really understand what's going on here. Are you saying that
> > there is some filesystem against which we run deactivate_locked_super()
> > during exit_group(), and that this filesystem doesn't use rcu-freeing
> > of inodes? The description needs this level of detail, please.
You still haven't explained where this deactivate_locked_super() call
is coming from. Oh well.
> I think the rcu_barrier() is in the wrong place. We need it to safely destroy
> the inode cache. deactivate_locked_super() is part of the umount() path, but all
> filesystems I've checked have an inode cache for the whole filesystem, not
> per-mount.
Well from a design perspective, putting the rcu_barrier() in the vfs is
the *correct* place. Individual filesystems shouldn't be hard-coding
knowledge about vfs internal machinery.
A neater implementation might be to add a kmem_cache* argument to
unregister_filesystem(). If that is non-NULL, unregister_filesystem()
does the rcu_barrier() and destroys the cache. That way we get to
delete (rather than add) a bunch of code from all filesystems and new
and out-of-tree filesystems cannot forget to perform the rcu_barrier().
> > The implementation would be less unpleasant if we could do the
> > rcu_barrier() in kmem_cache_destroy(). I can't see a way of doing that
> > without adding a dedicated slab flag, which would require editing all
> > the filesystems anyway.
>
> I think rcu_barrier() for all kmem_cache_destroy() would be too expensive.
That is not what I proposed.
On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
> A neater implementation might be to add a kmem_cache* argument to
> unregister_filesystem(). If that is non-NULL, unregister_filesystem()
> does the rcu_barrier() and destroys the cache. That way we get to
> delete (rather than add) a bunch of code from all filesystems and new
> and out-of-tree filesystems cannot forget to perform the rcu_barrier().
There's often enough more than one cache, so that one is no-go.
On Fri, Jun 8, 2012 at 3:23 PM, Al Viro <[email protected]> wrote:
>
> Note that module unload is *not* a hot path - not on any even remotely sane
> use.
Actually, I think we've had distributions that basically did a "load
pretty much everything, and let God sort it out" approach to modules.
I know some people *have* actually worried about module load/unload
performance.
Whether it is "remotely sane" I'm not going to argue for, but ..
Linus
On Fri, 8 Jun 2012 23:27:34 +0100
Al Viro <[email protected]> wrote:
> On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
>
> > A neater implementation might be to add a kmem_cache* argument to
> > unregister_filesystem(). If that is non-NULL, unregister_filesystem()
> > does the rcu_barrier() and destroys the cache. That way we get to
> > delete (rather than add) a bunch of code from all filesystems and new
> > and out-of-tree filesystems cannot forget to perform the rcu_barrier().
>
> There's often enough more than one cache, so that one is no-go.
kmem_cache** ;)
Which filesystems have multiple inode caches?
On Fri, Jun 08, 2012 at 03:31:20PM -0700, Andrew Morton wrote:
> On Fri, 8 Jun 2012 23:27:34 +0100
> Al Viro <[email protected]> wrote:
>
> > On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
> >
> > > A neater implementation might be to add a kmem_cache* argument to
> > > unregister_filesystem(). If that is non-NULL, unregister_filesystem()
> > > does the rcu_barrier() and destroys the cache. That way we get to
> > > delete (rather than add) a bunch of code from all filesystems and new
> > > and out-of-tree filesystems cannot forget to perform the rcu_barrier().
> >
> > There's often enough more than one cache, so that one is no-go.
>
> kmem_cache** ;)
>
> Which filesystems have multiple inode caches?
inodes are not the only things that get caches of their own...
BTW, Kirill, would you mind not cross-posting to that many lists ever again?
On Fri, 8 Jun 2012 23:36:24 +0100
Al Viro <[email protected]> wrote:
> On Fri, Jun 08, 2012 at 03:31:20PM -0700, Andrew Morton wrote:
> > On Fri, 8 Jun 2012 23:27:34 +0100
> > Al Viro <[email protected]> wrote:
> >
> > > On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
> > >
> > > > A neater implementation might be to add a kmem_cache* argument to
> > > > unregister_filesystem(). If that is non-NULL, unregister_filesystem()
> > > > does the rcu_barrier() and destroys the cache. That way we get to
> > > > delete (rather than add) a bunch of code from all filesystems and new
> > > > and out-of-tree filesystems cannot forget to perform the rcu_barrier().
> > >
> > > There's often enough more than one cache, so that one is no-go.
> >
> > kmem_cache** ;)
> >
> > Which filesystems have multiple inode caches?
>
> inodes are not the only things that get caches of their own...
Yes, but other random non-inode caches do not get rcu requirements
secretly forced upon them by the vfs, so they don't need rcu_barrier()
prior to their destruction?
> BTW, Kirill, would you mind not cross-posting to that many lists ever again?
I dunno, I like all those little messages - it makes me feel important.
On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
> On Sat, 9 Jun 2012 01:14:46 +0300
> "Kirill A. Shutemov" <[email protected]> wrote:
>
> > On Fri, Jun 08, 2012 at 03:02:53PM -0700, Andrew Morton wrote:
> > > On Sat, 9 Jun 2012 00:41:03 +0300
> > > "Kirill A. Shutemov" <[email protected]> wrote:
> > >
> > > > There's no reason to call rcu_barrier() on every deactivate_locked_super().
> > > > We only need to make sure that all delayed rcu free inodes are flushed
> > > > before we destroy the related cache.
> > > >
> > > > Removing rcu_barrier() from deactivate_locked_super() affects some
> > > > fast paths. E.g. on my machine exit_group() of the last process in an IPC
> > > > namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
> > >
> > > What an unpleasant patch. Is final-process-exiting-ipc-namespace a
> > > sufficiently high-frequency operation to justify the change?
>
> This, please.
The ALT Linux guys use namespaces (including the IPC namespace) to create a
sandbox[1] for their build system and other use cases. The build system
calls the sandboxing wrapper frequently while setting up the build chroot
and preparing the build. Delays like this affect timings significantly.
[1] http://git.altlinux.org/people/ldv/packages/hasher-priv.git
> > > I don't really understand what's going on here. Are you saying that
> > > there is some filesystem against which we run deactivate_locked_super()
> > > during exit_group(), and that this filesystem doesn't use rcu-freeing
> > > of inodes? The description needs this level of detail, please.
>
> You still haven't explained where this deactivate_locked_super() call
> is coming from. Oh well.
Call Trace:
[<ffffffff81443a2a>] schedule+0x3a/0x50
[<ffffffff81441e7d>] schedule_timeout+0x1cd/0x2c0
[<ffffffff811f8f87>] ? mqueue_destroy_inode+0x17/0x20
[<ffffffff81443044>] wait_for_common+0xc4/0x160
[<ffffffff8107af50>] ? try_to_wake_up+0x2a0/0x2a0
[<ffffffff810d63b0>] ? call_rcu_sched+0x10/0x20
[<ffffffff810d63a0>] ? call_rcu_bh+0x20/0x20
[<ffffffff81443188>] wait_for_completion+0x18/0x20
[<ffffffff810d5a9b>] _rcu_barrier.clone.31+0x9b/0xb0
[<ffffffff810d5ac0>] rcu_barrier_sched+0x10/0x20
[<ffffffff810d5ad9>] rcu_barrier+0x9/0x10
[<ffffffff811602c9>] deactivate_locked_super+0x49/0x90
[<ffffffff81160d35>] deactivate_super+0x45/0x60
[<ffffffff8117ad74>] mntput_no_expire+0x104/0x150
[<ffffffff8117addc>] mntput+0x1c/0x30
[<ffffffff8117cda7>] kern_unmount+0x27/0x30
[<ffffffff811faeb0>] mq_put_mnt+0x10/0x20
[<ffffffff811fb57f>] put_ipc_ns+0x3f/0xb0
[<ffffffff81071f5c>] free_nsproxy+0x3c/0xa0
[<ffffffff81072143>] switch_task_namespaces+0x33/0x40
[<ffffffff8107215b>] exit_task_namespaces+0xb/0x10
[<ffffffff8104f154>] do_exit+0x4b4/0x8a0
[<ffffffff8104f7e3>] do_group_exit+0x53/0xc0
[<ffffffff8104f862>] sys_exit_group+0x12/0x20
[<ffffffff8144c939>] system_call_fastpath+0x16/0x1b
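(That's the mqueue filesystem: put_ipc_ns() kern_unmount()s its internal
mount when the last task in the IPC namespace exits, and that lands in
deactivate_locked_super().)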
--
Kirill A. Shutemov
On Fri, Jun 08, 2012 at 03:31:20PM -0700, Andrew Morton wrote:
> On Fri, 8 Jun 2012 23:27:34 +0100
> Al Viro <[email protected]> wrote:
>
> > On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
> >
> > > A neater implementation might be to add a kmem_cache* argument to
> > > unregister_filesystem(). If that is non-NULL, unregister_filesystem()
> > > does the rcu_barrier() and destroys the cache. That way we get to
> > > delete (rather than add) a bunch of code from all filesystems and new
> > > and out-of-tree filesystems cannot forget to perform the rcu_barrier().
> >
> > There's often enough more than one cache, so that one is no-go.
>
> kmem_cache** ;)
>
> Which filesystems have multiple inode caches?
Multiple inode caches? No.
Multiple caches with call_rcu() free? See btrfs or gfs2.
--
Kirill A. Shutemov
On Fri, Jun 08, 2012 at 11:36:24PM +0100, Al Viro wrote:
> On Fri, Jun 08, 2012 at 03:31:20PM -0700, Andrew Morton wrote:
> > On Fri, 8 Jun 2012 23:27:34 +0100
> > Al Viro <[email protected]> wrote:
> >
> > > On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
> > >
> > > > A neater implementation might be to add a kmem_cache* argument to
> > > > unregister_filesystem(). If that is non-NULL, unregister_filesystem()
> > > > does the rcu_barrier() and destroys the cache. That way we get to
> > > > delete (rather than add) a bunch of code from all filesystems and new
> > > > and out-of-tree filesystems cannot forget to perform the rcu_barrier().
> > >
> > > There's often enough more than one cache, so that one is no-go.
> >
> > kmem_cache** ;)
> >
> > Which filesystems have multiple inode caches?
>
> inodes are not the only things that get caches of their own...
>
> BTW, Kirill, would you mind not cross-posting to that many lists ever again?
Sorry for that. I haven't dealt with patches that potentially affect so
many people before.
--
Kirill A. Shutemov
On Sat, 9 Jun 2012 02:31:27 +0300
"Kirill A. Shutemov" <[email protected]> wrote:
> On Fri, Jun 08, 2012 at 03:31:20PM -0700, Andrew Morton wrote:
> > On Fri, 8 Jun 2012 23:27:34 +0100
> > Al Viro <[email protected]> wrote:
> >
> > > On Fri, Jun 08, 2012 at 03:25:50PM -0700, Andrew Morton wrote:
> > >
> > > > A neater implementation might be to add a kmem_cache* argument to
> > > > unregister_filesystem(). If that is non-NULL, unregister_filesystem()
> > > > does the rcu_barrier() and destroys the cache. That way we get to
> > > > delete (rather than add) a bunch of code from all filesystems and new
> > > > and out-of-tree filesystems cannot forget to perform the rcu_barrier().
> > >
> > > There's often enough more than one cache, so that one is no-go.
> >
> > kmem_cache** ;)
> >
> > Which filesystems have multiple inode caches?
>
> Multiple inode caches? No.
> Multiple caches with call_rcu() free? See btrfs or gfs2.
OK. But for those non-inode caches, the rcu treatment is private to
the filesystem. Hence it is appropriate that the filesystem call
rcu_barrier() for those caches.
But in the case of the inode caches, the rcu treatment is a vfs thing,
so it is the vfs which should perform the rcu_barrier().
This is a red herring - those non-inode caches have nothing to do with
the issue we're discussing.
So how about open-coding the rcu_barrier() in btrfs and gfs2 for the
non-inode caches (which is the appropriate place), and hand the inode
cache over to the vfs for treatment (which is the appropriate place).
The downside is that btrfs and gfs2 will do an extra rcu_barrier() at
umount time. Shrug.
If they really want to super-optimise that, they can skip the private
rcu_barrier() call and assume that the vfs will be doing it. Not a
good idea, IMO.
On Fri, Jun 8, 2012 at 4:37 PM, Andrew Morton <[email protected]> wrote:
>
> So how about open-coding the rcu_barrier() in btrfs and gfs2 for the
> non-inode caches (which is the appropriate place), and hand the inode
> cache over to the vfs for treatment (which is the appropriate place).
The thing is, none of the inode caches are really up to the VFS. They
are per-filesystem caches, that just *embed* an inode as part of the
bigger ext4_inode or whatever. But apart from the fact that the
embedded inode requires them to then use the proper "call_rcu()" stuff
to do the delayed free, they really are pretty much filesystem data
structures. The VFS layer can neither free them nor allocate them,
since the VFS layer doesn't even know how big the structures are, or
where the inodes are embedded, or how to initialize them (or even when
to allocate them).
Of course, if you just mean having a VFS wrapper that does
static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
{
	rcu_barrier();
	kmem_cache_destroy(cachep);
}
then we could do that. Not much better than what Kirill's patch did,
but at least we could have that comment in just one single place.
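
Each filesystem's destroy_inodecache() would then shrink to a single line,
e.g. (foo_inode_cachep is a made-up name):

	vfs_inode_kmem_cache_destroy(foo_inode_cachep);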
Linus
On Fri, 8 Jun 2012 16:46:47 -0700 Linus Torvalds <[email protected]> wrote:
> Of course, if you just mean having a VFS wrapper that does
>
> static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
> {
> 	rcu_barrier();
> 	kmem_cache_destroy(cachep);
> }
>
> then we could do that. Not much better than what Kirill's patch did,
> but at least we could have that comment in just one single place.
That's conceptually what I meant. But it has the problem that new and
out-of-tree filesystems might forget to do it. Which is why I suggest
adding a kmem_cache* argument to unregister_filesystem() for this.
It's a bit awkward, and the fs can pass in NULL if it knows what it's
doing. But it's reliable.
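
One way to prototype it without changing the existing signature would be a
helper like this (hypothetical name, purely illustrative):

int unregister_filesystem_and_cache(struct file_system_type *fs,
				    struct kmem_cache *inode_cachep)
{
	int err = unregister_filesystem(fs);

	if (!err && inode_cachep) {
		/* flush delayed rcu-freed inodes before killing the cache */
		rcu_barrier();
		kmem_cache_destroy(inode_cachep);
	}
	return err;
}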
On 09/06/2012 02:28, Andrew Morton wrote:
> On Fri, 8 Jun 2012 16:46:47 -0700 Linus Torvalds <[email protected]> wrote:
>
>> Of course, if you just mean having a VFS wrapper that does
>>
>> static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
>> {
>> 	rcu_barrier();
>> 	kmem_cache_destroy(cachep);
>> }
>>
>> then we could do that. Not much better than what Kirill's patch did,
>> but at least we could have that comment in just one single place.
>
> That's conceptually what I meant. But it has the problem that new and
> out-of-tree filesystems might forget to do it. Which is why I suggest
> adding a kmem_cache* argument to unregister_filesystem() for this.
>
> It's a bit awkward, and the fs can pass in NULL if it knows what it's
> doing. But it's reliable.
> --
The call to rcu_barrier() should be mandatory for the "unload fs module"
problem, right? If the fs is compiled statically maybe we could avoid
it, but in the end this kind of decision is per-fs, so this could be a
clue that the rcu_barrier() call belongs inside each fs, not in the VFS.
Marco
On Sat, 09 Jun 2012 09:06:28 +0200 Marco Stornelli <[email protected]> wrote:
> On 09/06/2012 02:28, Andrew Morton wrote:
> > On Fri, 8 Jun 2012 16:46:47 -0700 Linus Torvalds <[email protected]> wrote:
> >
> >> Of course, if you just mean having a VFS wrapper that does
> >>
> >> static void vfs_inode_kmem_cache_destroy(struct kmem_cache *cachep)
> >> {
> >> 	rcu_barrier();
> >> 	kmem_cache_destroy(cachep);
> >> }
> >>
> >> then we could do that. Not much better than what Kirill's patch did,
> >> but at least we could have that comment in just one single place.
> >
> > That's conceptually what I meant. But it has the problem that new and
> > out-of-tree filesystems might forget to do it. Which is why I suggest
> > adding a kmem_cache* argument to unregister_filesystem() for this.
> >
> > It's a bit awkward, and the fs can pass in NULL if it knows what it's
> > doing. But it's reliable.
> > --
>
> The call to rcu_barrier() should be mandatory for the "unload fs module"
> problem, right? If the fs is compiled statically maybe we could avoid
> it, but in the end this kind of decision is per-fs, so this could be a
> clue that the rcu_barrier() call belongs inside each fs, not in the VFS.
>
No, this is unrelated to module unloading and the problem affects
statically linked filesystems also. The requirement is that all inodes
which are pending rcu freeing be flushed (and freed) before their cache
is destroyed in kmem_cache_destroy().
And... it seems that I misread what's going on. The individual
filesystems are doing the rcu freeing of their inodes, so it is
appropriate that they also call rcu_barrier() prior to running
kmem_cache_destroy(). Which is what Kirill's patch does. oops.
On Sat, Jun 09, 2012 at 12:25:57AM -0700, Andrew Morton wrote:
> And... it seems that I misread what's going on. The individual
> filesystems are doing the rcu freeing of their inodes, so it is
> appropriate that they also call rcu_barrier() prior to running
> kmem_cache_free(). Which is what Kirill's patch does. oops.
Ack? ;)
--
Kirill A. Shutemov