Reduce contention around global locks used in kernfs.
The current kernfs design uses 3 global locks to synchronize various
operations. There are a few other global locks as well, but those are used
only for specific cases and hence do not cause severe contention.
The 3 main global locks used in kernfs are:
1. A global mutex, kernfs_open_file_mutex, to protect the list of
kernfs_open_file instances corresponding to a sysfs attribute.
2. A global spinlock, kernfs_open_node_lock, to protect
kernfs_node->attr.open, which points to the kernfs_open_node instance
corresponding to a kernfs_node.
3. A per-fs rw semaphore, root->kernfs_rwsem, to synchronize most of
the other operations, such as adding, removing or renaming a file
or directory.
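For reference, a simplified sketch of these locks as they exist before this
series (the first two are defined in fs/kernfs/file.c, and the rwsem lives in
struct kernfs_root; field layout abbreviated):

	/* Simplified sketch of the pre-series locks (not a complete listing). */
	static DEFINE_MUTEX(kernfs_open_file_mutex);
	static DEFINE_SPINLOCK(kernfs_open_node_lock);

	struct kernfs_root {
		/* ... other fields ... */
		struct rw_semaphore kernfs_rwsem;	/* one rwsem per kernfs root */
	};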
Since these locks are global, they can cause contention when multiple
(for example, a few hundred) applications try to access sysfs (or other
kernfs based file system) files in parallel, even if the applications are
accessing different and unrelated files.
For example, on a system with 384 cores, if I run 200 instances of an
application that mostly executes the following loop:
for (int loop = 0; loop < 100; loop++)
{
	for (int port_num = 1; port_num < 2; port_num++)
	{
		for (int gid_index = 0; gid_index < 254; gid_index++)
		{
			char ret_buf[64];
			char gid_file_path[1024];
			int ret_len;
			int ret_fd;
			ssize_t ret_rd;
			unsigned int saved_errno;

			memset(ret_buf, 0, sizeof(ret_buf));
			memset(gid_file_path, 0, sizeof(gid_file_path));
			ret_len = snprintf(gid_file_path, sizeof(gid_file_path),
					   "/sys/class/infiniband/%s/ports/%d/gids/%d",
					   dev_name, port_num, gid_index);

			ret_fd = open(gid_file_path, O_RDONLY | O_CLOEXEC);
			if (ret_fd < 0)
			{
				printf("Failed to open %s\n", gid_file_path);
				continue;
			}

			/* Read the GID */
			ret_rd = read(ret_fd, ret_buf, 40);
			if (ret_rd == -1)
			{
				saved_errno = errno;
				printf("Failed to read from file %s, errno: %u\n",
				       gid_file_path, saved_errno);
				close(ret_fd);
				continue;
			}

			close(ret_fd);
		}
	}
}
I can see contention around the above mentioned locks, as shown in the
following perf report:
- 54.07% 53.60% showgids [kernel.kallsyms] [k] osq_lock
- 53.60% __libc_start_main
- 32.29% __GI___libc_open
entry_SYSCALL_64_after_hwframe
do_syscall_64
sys_open
do_sys_open
do_filp_open
path_openat
vfs_open
do_dentry_open
kernfs_fop_open
mutex_lock
- __mutex_lock_slowpath
- 32.23% __mutex_lock.isra.5
osq_lock
- 21.31% __GI___libc_close
entry_SYSCALL_64_after_hwframe
do_syscall_64
exit_to_usermode_loop
task_work_run
____fput
__fput
kernfs_fop_release
kernfs_put_open_node.isra.8
mutex_lock
- __mutex_lock_slowpath
- 21.28% __mutex_lock.isra.5
osq_lock
- 10.49% 10.39% showgids [kernel.kallsyms] [k] down_read
10.39% __libc_start_main
__GI___libc_open
entry_SYSCALL_64_after_hwframe
do_syscall_64
sys_open
do_sys_open
do_filp_open
- path_openat
- 9.72% link_path_walk
- 5.21% inode_permission
- __inode_permission
- 5.21% kernfs_iop_permission
down_read
- 4.08% walk_component
lookup_fast
- d_revalidate.part.24
- 4.08% kernfs_dop_revalidate
- 7.48% 7.41% showgids [kernel.kallsyms] [k] up_read
7.41% __libc_start_main
__GI___libc_open
entry_SYSCALL_64_after_hwframe
do_syscall_64
sys_open
do_sys_open
do_filp_open
- path_openat
- 7.01% link_path_walk
- 4.12% inode_permission
- __inode_permission
- 4.12% kernfs_iop_permission
up_read
- 2.61% walk_component
lookup_fast
- d_revalidate.part.24
- 2.61% kernfs_dop_revalidate
Moreover, this run of 200 application instances takes 32-34 secs. to
complete.
This patch set reduces the above mentioned contention by replacing these
global locks with hashed locks.
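At a high level, the hashing approach looks like the following sketch
(simplified; the actual helpers introduced by this series are
kernfs_open_file_mutex_ptr(), kernfs_open_node_spinlock_ptr() etc., and the
real table size depends on NR_CPUS):

	/* Sketch only (hypothetical helper): pick a lock from a hashed
	 * array, keyed by the kernfs_node address. */
	static struct mutex open_file_mutex[NR_KERNFS_LOCKS];

	static inline struct mutex *node_to_open_file_mutex(struct kernfs_node *kn)
	{
		int idx = hash_ptr(kn, NR_KERNFS_LOCK_BITS);

		return &open_file_mutex[idx];
	}

Two kernfs_node objects contend only if their addresses hash to the same
index, so unrelated files usually end up on different locks.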
For example, with the patched kernel and on the same test setup, we no
longer see contention around osq_lock (i.e. kernfs_open_file_mutex), and
contention around the per-fs kernfs_rwsem has reduced significantly as well.
This can be seen in the following perf snippet:
- 1.66% 1.65% showgids [kernel.kallsyms] [k] down_read
1.65% __libc_start_main
__GI___libc_open
entry_SYSCALL_64_after_hwframe
do_syscall_64
sys_open
do_sys_open
do_filp_open
- path_openat
- 1.62% link_path_walk
- 0.98% inode_permission
- __inode_permission
+ 0.98% kernfs_iop_permission
- 0.52% walk_component
lookup_fast
- d_revalidate.part.24
- 0.52% kernfs_dop_revalidate
- 1.12% 1.11% showgids [kernel.kallsyms] [k] up_read
1.11% __libc_start_main
__GI___libc_open
entry_SYSCALL_64_after_hwframe
do_syscall_64
sys_open
do_sys_open
do_filp_open
- path_openat
- 1.11% link_path_walk
- 0.69% inode_permission
- __inode_permission
- 0.69% kernfs_iop_permission
up_read
Moreover, the test execution time has reduced from 32-34 secs to 18-19 secs.
The patches in this series introduce the following changes:
PATCH-1: Introduce hashed mutexes to replace global kernfs_open_file_mutex.
PATCH-2: Replace global kernfs_open_file_mutex with hashed mutexes.
PATCH-3: Introduce hashed spinlocks to replace global kernfs_open_node_lock.
PATCH-4: Replace global kernfs_open_node_lock with hashed spinlocks.
PATCH-5: Use a per-fs rwsem to protect list of kernfs_super_info.
PATCH-6: Introduce hashed rw-sem to replace per-fs kernfs_rwsem.
PATCH-7: Replace per-fs rwsem with hashed ones.
------------------------------------------------------------------
Changes since v5:
- Addressed review comments from Greg
- Clean up duplicate code.
- Addressed review comments from Tejun
- Explain how the current values of NR_KERNFS_LOCKS were obtained for
different values of NR_CPUS.
- Use single hash table for locks in place of per-fs hash table
for locks.
- Move introduction of supers_rwsem to a separate patch.
- Separate interface and usage part of hashed rwsem implementation
into 2 patches.
- Use the address of the rwsems to determine locking order in case of
nested locking. This approach avoids the possibility of an ABBA
deadlock; see the sketch after this list.
- Change some #define macros into enum, with proper prefix.
- Taking a cue from Tejun's feedback about separating the hashed rwsem
interface and usage into 2 patches, I have also divided the patch
that introduced the hashed mutexes and spinlocks into separate patches.
- Rebase on linux-next tag: next-20220211
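The address-based ordering rule mentioned above can be sketched as follows
(the helper name is illustrative, not the exact interface used in the
patches):

	/* Sketch (hypothetical helper): take two hashed rwsems in address
	 * order to avoid ABBA deadlocks. */
	static inline void down_write_pair(struct rw_semaphore *rwsem1,
					   struct rw_semaphore *rwsem2)
	{
		if (rwsem1 == rwsem2) {
			down_write(rwsem1);
			return;
		}

		if (rwsem1 > rwsem2)
			swap(rwsem1, rwsem2);

		down_write(rwsem1);
		down_write_nested(rwsem2, SINGLE_DEPTH_NESTING);
	}

Since every caller acquires the two semaphores in the same (address) order,
no two tasks can end up holding them in opposite orders.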
Changes since v4:
- Removed compilation warnings reported by the 0-day bot.
Changes since v3:
- Make open_file_mutex and open_node_lock per-fs.
- Replace per-fs rwsem with per-fs hashed rwsem.
(Suggested by Tejun Heo <[email protected]>)
Imran Khan (7):
kernfs: Introduce hashed mutexes to replace global
kernfs_open_file_mutex.
kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.
kernfs: Introduce hashed spinlocks to replace global
kernfs_open_node_lock.
kernfs: Replace global kernfs_open_node_lock with hashed spinlocks.
kernfs: Use a per-fs rwsem to protect per-fs list of
kernfs_super_info.
kernfs: Introduce hashed rw-sem to replace per-fs kernfs_rwsem.
kernfs: Replace per-fs rwsem with hashed ones.
fs/kernfs/dir.c | 132 ++++----
fs/kernfs/file.c | 65 ++--
fs/kernfs/inode.c | 22 +-
fs/kernfs/kernfs-internal.h | 604 +++++++++++++++++++++++++++++++++++-
fs/kernfs/mount.c | 29 +-
fs/kernfs/symlink.c | 5 +-
include/linux/kernfs.h | 69 ++++
7 files changed, 802 insertions(+), 124 deletions(-)
base-commit: 6d9bd4ad4ca08b1114e814c2c42383b8b13be631
--
2.30.2
In the current kernfs design a single spinlock, kernfs_open_node_lock,
protects kernfs_node->attr.open, i.e. the kernfs_open_node instances
corresponding to sysfs attributes. So even if different tasks are opening
or closing different sysfs files, they can contend on this spinlock. The
contention is more apparent on large scale systems with a few hundred CPUs,
where most of the CPUs have running tasks that are opening, accessing or
closing sysfs files at any point of time.
Using hashed spinlocks in place of a single global spinlock can
significantly reduce contention around the global spinlock and hence
provide better scalability. Moreover, as these hashed spinlocks are not
part of kernfs_node objects, we will not see any significant change in
memory utilization of kernfs based file systems like sysfs, cgroupfs etc.
This patch introduces hashed spinlocks that can be used in place of the
above mentioned global spinlock. It also provides the interfaces needed to
use the hashed spinlocks. The next patch makes use of these interfaces and
replaces the global spinlock with hashed ones.
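For illustration, a caller is expected to use these helpers roughly as
follows (the actual conversions are done in the next patch):

	spinlock_t *lock;

	lock = kernfs_open_node_spinlock(kn);	/* hashes kn and takes the lock */
	/* ... manipulate kn->attr.open ... */
	spin_unlock_irq(lock);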
Signed-off-by: Imran Khan <[email protected]>
---
fs/kernfs/kernfs-internal.h | 20 ++++++++++++++++++++
fs/kernfs/mount.c | 4 +++-
include/linux/kernfs.h | 11 +++++++++--
3 files changed, 32 insertions(+), 3 deletions(-)
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 03e983953eda4..593395f325a18 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -170,4 +170,24 @@ static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
return lock;
}
+static inline spinlock_t *
+kernfs_open_node_spinlock_ptr(struct kernfs_node *kn)
+{
+ int idx = hash_ptr(kn, NR_KERNFS_LOCK_BITS);
+
+ return &kernfs_locks->open_node_locks[idx].lock;
+}
+
+static inline spinlock_t *
+kernfs_open_node_spinlock(struct kernfs_node *kn)
+{
+ spinlock_t *lock;
+
+ lock = kernfs_open_node_spinlock_ptr(kn);
+
+ spin_lock_irq(lock);
+
+ return lock;
+}
+
#endif /* __KERNFS_INTERNAL_H */
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index fa3fa22c95b21..809b738739b18 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -395,8 +395,10 @@ void __init kernfs_lock_init(void)
kernfs_locks = kmalloc(sizeof(struct kernfs_global_locks), GFP_KERNEL);
WARN_ON(!kernfs_locks);
- for (count = 0; count < NR_KERNFS_LOCKS; count++)
+ for (count = 0; count < NR_KERNFS_LOCKS; count++) {
mutex_init(&kernfs_locks->open_file_mutex[count].lock);
+ spin_lock_init(&kernfs_locks->open_node_locks[count].lock);
+ }
}
void __init kernfs_init(void)
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 3f72d38d48e31..7ee0595b315a2 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -19,6 +19,7 @@
#include <linux/wait.h>
#include <linux/rwsem.h>
#include <linux/cache.h>
+#include <linux/spinlock.h>
struct file;
struct dentry;
@@ -75,20 +76,26 @@ struct kernfs_iattrs;
* kernfs_open_file.
* kernfs_open_files are chained at kernfs_open_node->files, which is
* protected by kernfs_open_file_mutex.lock.
+ *
+ * kernfs_node->attr.open points to kernfs_open_node. attr.open is
+ * protected by kernfs_open_node_lock.lock.
*/
-
struct kernfs_open_file_mutex {
struct mutex lock;
} ____cacheline_aligned_in_smp;
+struct kernfs_open_node_lock {
+ spinlock_t lock;
+} ____cacheline_aligned_in_smp;
+
/*
* To reduce possible contention in sysfs access, arising due to single
* locks, use an array of locks and use kernfs_node object address as
* hash keys to get the index of these locks.
*/
-
struct kernfs_global_locks {
struct kernfs_open_file_mutex open_file_mutex[NR_KERNFS_LOCKS];
+ struct kernfs_open_node_lock open_node_locks[NR_KERNFS_LOCKS];
};
enum kernfs_node_type {
--
2.30.2
In the current kernfs design a single mutex, kernfs_open_file_mutex,
protects the list of kernfs_open_file instances corresponding to a sysfs
attribute. So even if different tasks are opening or closing different
sysfs files, they can contend on the osq_lock of this mutex. The contention
is more apparent on large scale systems with a few hundred CPUs, where most
of the CPUs have running tasks that are opening, accessing or closing sysfs
files at any point of time.
Using hashed mutexes in place of a single global mutex can significantly
reduce contention around the global mutex and hence can provide better
scalability. Moreover, as these hashed mutexes are not part of kernfs_node
objects, we will not see any significant change in memory utilization of
kernfs based file systems like sysfs, cgroupfs etc.
This patch introduces hashed mutexes that can be used in place of the
above mentioned global mutex. It also provides the interfaces needed to use
the hashed mutexes. The next patch makes use of these interfaces and
replaces the global mutex with hashed ones.
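For illustration, a caller is expected to use these helpers roughly as
follows (the actual conversions are done in the next patch):

	struct mutex *mutex;

	mutex = kernfs_open_file_mutex_lock(kn);  /* hashes kn and takes the mutex */
	/* ... walk or modify the kernfs_open_file list ... */
	mutex_unlock(mutex);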
Signed-off-by: Imran Khan <[email protected]>
---
fs/kernfs/kernfs-internal.h | 23 +++++++++++++++
fs/kernfs/mount.c | 13 ++++++++
include/linux/kernfs.h | 59 +++++++++++++++++++++++++++++++++++++
3 files changed, 95 insertions(+)
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index f9cc912c31e1b..03e983953eda4 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -147,4 +147,27 @@ void kernfs_drain_open_files(struct kernfs_node *kn);
*/
extern const struct inode_operations kernfs_symlink_iops;
+/*
+ * kernfs locks
+ */
+extern struct kernfs_global_locks *kernfs_locks;
+
+static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
+{
+ int idx = hash_ptr(kn, NR_KERNFS_LOCK_BITS);
+
+ return &kernfs_locks->open_file_mutex[idx].lock;
+}
+
+static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
+{
+ struct mutex *lock;
+
+ lock = kernfs_open_file_mutex_ptr(kn);
+
+ mutex_lock(lock);
+
+ return lock;
+}
+
#endif /* __KERNFS_INTERNAL_H */
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index cfa79715fc1a7..fa3fa22c95b21 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -20,6 +20,7 @@
#include "kernfs-internal.h"
struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache;
+struct kernfs_global_locks *kernfs_locks;
static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry)
{
@@ -387,6 +388,17 @@ void kernfs_kill_sb(struct super_block *sb)
kfree(info);
}
+void __init kernfs_lock_init(void)
+{
+ int count;
+
+ kernfs_locks = kmalloc(sizeof(struct kernfs_global_locks), GFP_KERNEL);
+ WARN_ON(!kernfs_locks);
+
+ for (count = 0; count < NR_KERNFS_LOCKS; count++)
+ mutex_init(&kernfs_locks->open_file_mutex[count].lock);
+}
+
void __init kernfs_init(void)
{
kernfs_node_cache = kmem_cache_create("kernfs_node_cache",
@@ -397,4 +409,5 @@ void __init kernfs_init(void)
kernfs_iattrs_cache = kmem_cache_create("kernfs_iattrs_cache",
sizeof(struct kernfs_iattrs),
0, SLAB_PANIC, NULL);
+ kernfs_lock_init();
}
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 861c4f0f8a29f..3f72d38d48e31 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -18,6 +18,7 @@
#include <linux/uidgid.h>
#include <linux/wait.h>
#include <linux/rwsem.h>
+#include <linux/cache.h>
struct file;
struct dentry;
@@ -34,6 +35,62 @@ struct kernfs_fs_context;
struct kernfs_open_node;
struct kernfs_iattrs;
+/*
+ * NR_KERNFS_LOCK_BITS determines size (NR_KERNFS_LOCKS) of hash
+ * table of locks.
+ * Having a small hash table would impact scalability, since
+ * more and more kernfs_node objects will end up using same lock
+ * and having a very large hash table would waste memory.
+ *
+ * At the moment size of hash table of locks is being set based on
+ * the number of CPUs as follows:
+ *
+ * NR_CPU NR_KERNFS_LOCK_BITS NR_KERNFS_LOCKS
+ * 1 1 2
+ * 2-3 2 4
+ * 4-7 4 16
+ * 8-15 6 64
+ * 16-31 8 256
+ * 32 and more 10 1024
+ *
+ * The above relation between NR_CPU and number of locks is based
+ * on some internal experimentation which involved booting qemu
+ * with different values of smp, performing some sysfs operations
+ * on all CPUs and observing how increase in number of locks impacts
+ * completion time of these sysfs operations on each CPU.
+ */
+#ifdef CONFIG_SMP
+#define NR_KERNFS_LOCK_BITS (2 * (ilog2(NR_CPUS < 32 ? NR_CPUS : 32)))
+#else
+#define NR_KERNFS_LOCK_BITS 1
+#endif
+
+#define NR_KERNFS_LOCKS (1 << NR_KERNFS_LOCK_BITS)
+
+/*
+ * There's one kernfs_open_file for each open file and one kernfs_open_node
+ * for each kernfs_node with one or more open files.
+ *
+ * filp->private_data points to seq_file whose ->private points to
+ * kernfs_open_file.
+ * kernfs_open_files are chained at kernfs_open_node->files, which is
+ * protected by kernfs_open_file_mutex.lock.
+ */
+
+struct kernfs_open_file_mutex {
+ struct mutex lock;
+} ____cacheline_aligned_in_smp;
+
+/*
+ * To reduce possible contention in sysfs access, arising due to single
+ * locks, use an array of locks and use kernfs_node object address as
+ * hash keys to get the index of these locks.
+ */
+
+struct kernfs_global_locks {
+ struct kernfs_open_file_mutex open_file_mutex[NR_KERNFS_LOCKS];
+};
+
enum kernfs_node_type {
KERNFS_DIR = 0x0001,
KERNFS_FILE = 0x0002,
@@ -413,6 +470,8 @@ void kernfs_kill_sb(struct super_block *sb);
void kernfs_init(void);
+void kernfs_lock_init(void);
+
struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
u64 id);
#else /* CONFIG_KERNFS */
--
2.30.2
Hello,
The patchset generally looks good to me but I think it can be split a bit
better and the locking helper code can be more compact. Also, it'd be
great to include the benchmark method and results in the commit log so that
they can be looked up easily later. It just has to be in one of the patch
descriptions, whether the first or last of the series, and the rest can
refer to it.
Thanks.
--
tejun
Remove the global kernfs_open_node_lock, using the hashed spinlocks and
corresponding interface introduced in the previous patch.
Signed-off-by: Imran Khan <[email protected]>
---
fs/kernfs/file.c | 30 +++++++++++++-----------------
1 file changed, 13 insertions(+), 17 deletions(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 295fe67950346..f3ecc6fe8aedc 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -18,15 +18,6 @@
#include "kernfs-internal.h"
-/*
- * kernfs_node->attr.open points to kernfs_open_node. attr.open is
- * protected by kernfs_open_node_lock.
- *
- * filp->private_data points to seq_file whose ->private points to
- * kernfs_open_file.
- */
-static DEFINE_SPINLOCK(kernfs_open_node_lock);
-
struct kernfs_open_node {
atomic_t refcnt;
atomic_t event;
@@ -520,10 +511,11 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
{
struct kernfs_open_node *on, *new_on = NULL;
struct mutex *mutex = NULL;
+ spinlock_t *lock = NULL;
retry:
mutex = kernfs_open_file_mutex_lock(kn);
- spin_lock_irq(&kernfs_open_node_lock);
+ lock = kernfs_open_node_spinlock(kn);
if (!kn->attr.open && new_on) {
kn->attr.open = new_on;
@@ -536,7 +528,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
list_add_tail(&of->list, &on->files);
}
- spin_unlock_irq(&kernfs_open_node_lock);
+ spin_unlock_irq(lock);
mutex_unlock(mutex);
if (on) {
@@ -572,10 +564,13 @@ static void kernfs_put_open_node(struct kernfs_node *kn,
{
struct kernfs_open_node *on = kn->attr.open;
struct mutex *mutex = NULL;
+ spinlock_t *lock = NULL;
unsigned long flags;
mutex = kernfs_open_file_mutex_lock(kn);
- spin_lock_irqsave(&kernfs_open_node_lock, flags);
+ lock = kernfs_open_node_spinlock_ptr(kn);
+
+ spin_lock_irqsave(lock, flags);
if (of)
list_del(&of->list);
@@ -585,7 +580,7 @@ static void kernfs_put_open_node(struct kernfs_node *kn,
else
on = NULL;
- spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
+ spin_unlock_irqrestore(lock, flags);
mutex_unlock(mutex);
kfree(on);
@@ -768,15 +763,16 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
struct kernfs_open_node *on;
struct kernfs_open_file *of;
struct mutex *mutex = NULL;
+ spinlock_t *lock = NULL;
if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
return;
- spin_lock_irq(&kernfs_open_node_lock);
+ lock = kernfs_open_node_spinlock(kn);
on = kn->attr.open;
if (on)
atomic_inc(&on->refcnt);
- spin_unlock_irq(&kernfs_open_node_lock);
+ spin_unlock_irq(lock);
if (!on)
return;
@@ -921,13 +917,13 @@ void kernfs_notify(struct kernfs_node *kn)
return;
/* kick poll immediately */
- spin_lock_irqsave(&kernfs_open_node_lock, flags);
+ spin_lock_irqsave(kernfs_open_node_spinlock_ptr(kn), flags);
on = kn->attr.open;
if (on) {
atomic_inc(&on->event);
wake_up_interruptible(&on->poll);
}
- spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
+ spin_unlock_irqrestore(kernfs_open_node_spinlock_ptr(kn), flags);
/* schedule work to kick fsnotify */
spin_lock_irqsave(&kernfs_notify_lock, flags);
--
2.30.2
Hello,
On Mon, Feb 14, 2022 at 11:03:16PM +1100, Imran Khan wrote:
> +extern struct kernfs_global_locks *kernfs_locks;
> +
> +static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
> +{
> + int idx = hash_ptr(kn, NR_KERNFS_LOCK_BITS);
> +
> + return &kernfs_locks->open_file_mutex[idx].lock;
> +}
> +
> +static inline struct mutex *kernfs_open_file_mutex_lock(struct kernfs_node *kn)
> +{
> + struct mutex *lock;
> +
> + lock = kernfs_open_file_mutex_ptr(kn);
> +
> + mutex_lock(lock);
> +
> + return lock;
> +}
So, splitting patches this way doesn't really help. Because this patch
introduces code which isn't used and the second patch does all the
meaningful changes. It'd be better if the first patch introduces the
interface without changing the actual locking - ie. introduce and convert to
use kernfs_open_file_mutex*() but make it return the same old global mutex,
and then the second patch adds the hashed locks and updates
kernfs_open_file_mutex*() to actually return hashed locks. This way, the
meaningful changes are split into two patches which can be verified
independently.
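For example, the first patch could provide something like the following
(a sketch of the suggested intermediate step; not actual code from the
series), and only the follow-up patch would change what it returns:

	static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
	{
		/* Sketch: still the old global mutex; hashed locks come later. */
		return &kernfs_open_file_mutex;
	}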
Thanks.
--
tejun
Remove the global kernfs_open_file_mutex, using the hashed mutexes and
corresponding interface introduced in the previous patch.
Signed-off-by: Imran Khan <[email protected]>
---
fs/kernfs/file.c | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 9414a7a60a9f4..295fe67950346 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -19,18 +19,13 @@
#include "kernfs-internal.h"
/*
- * There's one kernfs_open_file for each open file and one kernfs_open_node
- * for each kernfs_node with one or more open files.
- *
* kernfs_node->attr.open points to kernfs_open_node. attr.open is
* protected by kernfs_open_node_lock.
*
* filp->private_data points to seq_file whose ->private points to
- * kernfs_open_file. kernfs_open_files are chained at
- * kernfs_open_node->files, which is protected by kernfs_open_file_mutex.
+ * kernfs_open_file.
*/
static DEFINE_SPINLOCK(kernfs_open_node_lock);
-static DEFINE_MUTEX(kernfs_open_file_mutex);
struct kernfs_open_node {
atomic_t refcnt;
@@ -524,9 +519,10 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
struct kernfs_open_file *of)
{
struct kernfs_open_node *on, *new_on = NULL;
+ struct mutex *mutex = NULL;
retry:
- mutex_lock(&kernfs_open_file_mutex);
+ mutex = kernfs_open_file_mutex_lock(kn);
spin_lock_irq(&kernfs_open_node_lock);
if (!kn->attr.open && new_on) {
@@ -541,7 +537,7 @@ static int kernfs_get_open_node(struct kernfs_node *kn,
}
spin_unlock_irq(&kernfs_open_node_lock);
- mutex_unlock(&kernfs_open_file_mutex);
+ mutex_unlock(mutex);
if (on) {
kfree(new_on);
@@ -575,9 +571,10 @@ static void kernfs_put_open_node(struct kernfs_node *kn,
struct kernfs_open_file *of)
{
struct kernfs_open_node *on = kn->attr.open;
+ struct mutex *mutex = NULL;
unsigned long flags;
- mutex_lock(&kernfs_open_file_mutex);
+ mutex = kernfs_open_file_mutex_lock(kn);
spin_lock_irqsave(&kernfs_open_node_lock, flags);
if (of)
@@ -589,7 +586,7 @@ static void kernfs_put_open_node(struct kernfs_node *kn,
on = NULL;
spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
- mutex_unlock(&kernfs_open_file_mutex);
+ mutex_unlock(mutex);
kfree(on);
}
@@ -729,11 +726,11 @@ static void kernfs_release_file(struct kernfs_node *kn,
/*
* @of is guaranteed to have no other file operations in flight and
* we just want to synchronize release and drain paths.
- * @kernfs_open_file_mutex is enough. @of->mutex can't be used
+ * @open_file_mutex is enough. @of->mutex can't be used
* here because drain path may be called from places which can
* cause circular dependency.
*/
- lockdep_assert_held(&kernfs_open_file_mutex);
+ lockdep_assert_held(kernfs_open_file_mutex_ptr(kn));
if (!of->released) {
/*
@@ -750,11 +747,12 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
{
struct kernfs_node *kn = inode->i_private;
struct kernfs_open_file *of = kernfs_of(filp);
+ struct mutex *lock = NULL;
if (kn->flags & KERNFS_HAS_RELEASE) {
- mutex_lock(&kernfs_open_file_mutex);
+ lock = kernfs_open_file_mutex_lock(kn);
kernfs_release_file(kn, of);
- mutex_unlock(&kernfs_open_file_mutex);
+ mutex_unlock(lock);
}
kernfs_put_open_node(kn, of);
@@ -769,6 +767,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
{
struct kernfs_open_node *on;
struct kernfs_open_file *of;
+ struct mutex *mutex = NULL;
if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
return;
@@ -781,7 +780,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
if (!on)
return;
- mutex_lock(&kernfs_open_file_mutex);
+ mutex = kernfs_open_file_mutex_lock(kn);
list_for_each_entry(of, &on->files, list) {
struct inode *inode = file_inode(of->file);
@@ -793,7 +792,7 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
kernfs_release_file(kn, of);
}
- mutex_unlock(&kernfs_open_file_mutex);
+ mutex_unlock(mutex);
kernfs_put_open_node(kn, NULL);
}
--
2.30.2
Remove the per-fs rwsem, using the hashed rwsems and corresponding
interface introduced in the previous patch. Also, as we are removing use of
the per-fs rwsem, we no longer need the lockdep checks under kernfs_active.
Wherever needed, lockdep checks can be made on a per-node basis using the
interfaces provided in the previous patch.
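For example, a function that must be called with the node's hashed rwsem
held can assert that on the node itself (as done for kernfs_find_ns(),
kernfs_walk_ns() and kernfs_next_descendant_post() in this patch):

	kernfs_rwsem_assert_held(parent);	/* held for read or write */
	kernfs_rwsem_assert_held_read(parent);
	kernfs_rwsem_assert_held_write(root);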
Signed-off-by: Imran Khan <[email protected]>
---
fs/kernfs/dir.c | 130 ++++++++++++++++++------------------
fs/kernfs/file.c | 4 +-
fs/kernfs/inode.c | 22 +++---
fs/kernfs/kernfs-internal.h | 1 -
fs/kernfs/mount.c | 5 +-
fs/kernfs/symlink.c | 5 +-
6 files changed, 80 insertions(+), 87 deletions(-)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 0dac58f8091c9..42404f8bc6acd 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -449,21 +449,22 @@ void kernfs_put_active(struct kernfs_node *kn)
/**
* kernfs_drain - drain kernfs_node
* @kn: kernfs_node to drain
+ * @anc: ancestor of kernfs_node to drain
*
* Drain existing usages and nuke all existing mmaps of @kn. Mutiple
* removers may invoke this function concurrently on @kn and all will
* return after draining is complete.
*/
-static void kernfs_drain(struct kernfs_node *kn)
- __releases(&kernfs_root(kn)->kernfs_rwsem)
- __acquires(&kernfs_root(kn)->kernfs_rwsem)
+static void kernfs_drain(struct kernfs_node *kn, struct kernfs_node *anc)
+ __releases(kernfs_rwsem_ptr(anc))
+ __acquires(kernfs_rwsem_ptr(anc))
{
struct kernfs_root *root = kernfs_root(kn);
- lockdep_assert_held_write(&root->kernfs_rwsem);
+ kernfs_rwsem_assert_held_write(anc);
WARN_ON_ONCE(kernfs_active(kn));
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem(anc);
if (kernfs_lockdep(kn)) {
rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_);
@@ -482,7 +483,7 @@ static void kernfs_drain(struct kernfs_node *kn)
kernfs_drain_open_files(kn);
- down_write(&root->kernfs_rwsem);
+ down_write_kernfs_rwsem(anc, KERNFS_RWSEM_LOCK_SELF);
}
/**
@@ -717,12 +718,16 @@ struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root,
int kernfs_add_one(struct kernfs_node *kn)
{
struct kernfs_node *parent = kn->parent;
- struct kernfs_root *root = kernfs_root(parent);
struct kernfs_iattrs *ps_iattr;
bool has_ns;
int ret;
- down_write(&root->kernfs_rwsem);
+ /**
+ * The node being added is not active at this point of time and may
+ * be activated later depending on CREATE_DEACTIVATED flag. So at
+ * this point of time just locking the parent is enough.
+ */
+ down_write_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
ret = -EINVAL;
has_ns = kernfs_ns_enabled(parent);
@@ -753,7 +758,7 @@ int kernfs_add_one(struct kernfs_node *kn)
ps_iattr->ia_mtime = ps_iattr->ia_ctime;
}
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem(parent);
/*
* Activate the new node unless CREATE_DEACTIVATED is requested.
@@ -767,7 +772,7 @@ int kernfs_add_one(struct kernfs_node *kn)
return 0;
out_unlock:
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem(parent);
return ret;
}
@@ -788,7 +793,7 @@ static struct kernfs_node *kernfs_find_ns(struct kernfs_node *parent,
bool has_ns = kernfs_ns_enabled(parent);
unsigned int hash;
- lockdep_assert_held(&kernfs_root(parent)->kernfs_rwsem);
+ kernfs_rwsem_assert_held(parent);
if (has_ns != (bool)ns) {
WARN(1, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n",
@@ -820,7 +825,7 @@ static struct kernfs_node *kernfs_walk_ns(struct kernfs_node *parent,
size_t len;
char *p, *name;
- lockdep_assert_held_read(&kernfs_root(parent)->kernfs_rwsem);
+ kernfs_rwsem_assert_held_read(parent);
/* grab kernfs_rename_lock to piggy back on kernfs_pr_cont_buf */
spin_lock_irq(&kernfs_rename_lock);
@@ -859,12 +864,11 @@ struct kernfs_node *kernfs_find_and_get_ns(struct kernfs_node *parent,
const char *name, const void *ns)
{
struct kernfs_node *kn;
- struct kernfs_root *root = kernfs_root(parent);
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
kn = kernfs_find_ns(parent, name, ns);
kernfs_get(kn);
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(parent);
return kn;
}
@@ -884,12 +888,11 @@ struct kernfs_node *kernfs_walk_and_get_ns(struct kernfs_node *parent,
const char *path, const void *ns)
{
struct kernfs_node *kn;
- struct kernfs_root *root = kernfs_root(parent);
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
kn = kernfs_walk_ns(parent, path, ns);
kernfs_get(kn);
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(parent);
return kn;
}
@@ -914,7 +917,6 @@ struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,
return ERR_PTR(-ENOMEM);
idr_init(&root->ino_idr);
- init_rwsem(&root->kernfs_rwsem);
INIT_LIST_HEAD(&root->supers);
init_rwsem(&root->supers_rwsem);
@@ -1045,7 +1047,6 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent,
static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
{
struct kernfs_node *kn;
- struct kernfs_root *root;
if (flags & LOOKUP_RCU)
return -ECHILD;
@@ -1061,13 +1062,12 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
parent = kernfs_dentry_node(dentry->d_parent);
if (parent) {
spin_unlock(&dentry->d_lock);
- root = kernfs_root(parent);
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
if (kernfs_dir_changed(parent, dentry)) {
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(parent);
return 0;
}
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(parent);
} else
spin_unlock(&dentry->d_lock);
@@ -1078,8 +1078,7 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
}
kn = kernfs_dentry_node(dentry);
- root = kernfs_root(kn);
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
/* The kernfs node has been deactivated */
if (!kernfs_active(kn))
@@ -1098,10 +1097,10 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
kernfs_info(dentry->d_sb)->ns != kn->ns)
goto out_bad;
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(kn);
return 1;
out_bad:
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(kn);
return 0;
}
@@ -1115,28 +1114,29 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
{
struct kernfs_node *parent = dir->i_private;
struct kernfs_node *kn;
- struct kernfs_root *root;
struct inode *inode = NULL;
const void *ns = NULL;
- root = kernfs_root(parent);
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
if (kernfs_ns_enabled(parent))
ns = kernfs_info(dir->i_sb)->ns;
kn = kernfs_find_ns(parent, dentry->d_name.name, ns);
+ up_read_kernfs_rwsem(parent);
/* attach dentry and inode */
if (kn) {
/* Inactive nodes are invisible to the VFS so don't
* create a negative.
*/
+ down_read_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
if (!kernfs_active(kn)) {
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(kn);
return NULL;
}
inode = kernfs_get_inode(dir->i_sb, kn);
if (!inode)
inode = ERR_PTR(-ENOMEM);
+ up_read_kernfs_rwsem(kn);
}
/*
* Needed for negative dentry validation.
@@ -1144,9 +1144,10 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
* or transforms from positive dentry in dentry_unlink_inode()
* called from vfs_rmdir().
*/
+ down_read_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
if (!IS_ERR(inode))
kernfs_set_rev(parent, dentry);
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(parent);
/* instantiate and hash (possibly negative) dentry */
return d_splice_alias(inode, dentry);
@@ -1269,7 +1270,7 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
{
struct rb_node *rbn;
- lockdep_assert_held_write(&kernfs_root(root)->kernfs_rwsem);
+ kernfs_rwsem_assert_held_write(root);
/* if first iteration, visit leftmost descendant which may be root */
if (!pos)
@@ -1304,9 +1305,8 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
void kernfs_activate(struct kernfs_node *kn)
{
struct kernfs_node *pos;
- struct kernfs_root *root = kernfs_root(kn);
- down_write(&root->kernfs_rwsem);
+ down_write_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
pos = NULL;
while ((pos = kernfs_next_descendant_post(pos, kn))) {
@@ -1320,15 +1320,13 @@ void kernfs_activate(struct kernfs_node *kn)
pos->flags |= KERNFS_ACTIVATED;
}
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem(kn);
}
static void __kernfs_remove(struct kernfs_node *kn)
{
struct kernfs_node *pos;
- lockdep_assert_held_write(&kernfs_root(kn)->kernfs_rwsem);
-
/*
* Short-circuit if non-root @kn has already finished removal.
* This is for kernfs_remove_self() which plays with active ref
@@ -1341,12 +1339,16 @@ static void __kernfs_remove(struct kernfs_node *kn)
/* prevent any new usage under @kn by deactivating all nodes */
pos = NULL;
+
+ down_write_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
while ((pos = kernfs_next_descendant_post(pos, kn)))
if (kernfs_active(pos))
atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
+ up_write_kernfs_rwsem(kn);
/* deactivate and unlink the subtree node-by-node */
do {
+ down_write_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
pos = kernfs_leftmost_descendant(kn);
/*
@@ -1364,10 +1366,15 @@ static void __kernfs_remove(struct kernfs_node *kn)
* error paths without worrying about draining.
*/
if (kn->flags & KERNFS_ACTIVATED)
- kernfs_drain(pos);
+ kernfs_drain(pos, kn);
else
WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
+ up_write_kernfs_rwsem(kn);
+
+ if (pos->parent)
+ down_write_kernfs_rwsem(pos->parent, KERNFS_RWSEM_LOCK_SELF);
+
/*
* kernfs_unlink_sibling() succeeds once per node. Use it
* to decide who's responsible for cleanups.
@@ -1385,6 +1392,9 @@ static void __kernfs_remove(struct kernfs_node *kn)
kernfs_put(pos);
}
+ if (pos->parent)
+ up_write_kernfs_rwsem(pos->parent);
+
kernfs_put(pos);
} while (pos != kn);
}
@@ -1397,11 +1407,7 @@ static void __kernfs_remove(struct kernfs_node *kn)
*/
void kernfs_remove(struct kernfs_node *kn)
{
- struct kernfs_root *root = kernfs_root(kn);
-
- down_write(&root->kernfs_rwsem);
__kernfs_remove(kn);
- up_write(&root->kernfs_rwsem);
}
/**
@@ -1487,9 +1493,8 @@ void kernfs_unbreak_active_protection(struct kernfs_node *kn)
bool kernfs_remove_self(struct kernfs_node *kn)
{
bool ret;
- struct kernfs_root *root = kernfs_root(kn);
- down_write(&root->kernfs_rwsem);
+ down_write_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
kernfs_break_active_protection(kn);
/*
@@ -1503,9 +1508,11 @@ bool kernfs_remove_self(struct kernfs_node *kn)
*/
if (!(kn->flags & KERNFS_SUICIDAL)) {
kn->flags |= KERNFS_SUICIDAL;
+ up_write_kernfs_rwsem(kn);
__kernfs_remove(kn);
kn->flags |= KERNFS_SUICIDED;
ret = true;
+ down_write_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
} else {
wait_queue_head_t *waitq = &kernfs_root(kn)->deactivate_waitq;
DEFINE_WAIT(wait);
@@ -1517,9 +1524,9 @@ bool kernfs_remove_self(struct kernfs_node *kn)
atomic_read(&kn->active) == KN_DEACTIVATED_BIAS)
break;
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem(kn);
schedule();
- down_write(&root->kernfs_rwsem);
+ down_write_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF_AND_PARENT);
}
finish_wait(waitq, &wait);
WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb));
@@ -1532,7 +1539,7 @@ bool kernfs_remove_self(struct kernfs_node *kn)
*/
kernfs_unbreak_active_protection(kn);
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem(kn);
return ret;
}
@@ -1549,7 +1556,6 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name,
const void *ns)
{
struct kernfs_node *kn;
- struct kernfs_root *root;
if (!parent) {
WARN(1, KERN_WARNING "kernfs: can not remove '%s', no directory\n",
@@ -1557,15 +1563,15 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name,
return -ENOENT;
}
- root = kernfs_root(parent);
- down_write(&root->kernfs_rwsem);
+ down_write_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
kn = kernfs_find_ns(parent, name, ns);
+
+ up_write_kernfs_rwsem(parent);
+
if (kn)
__kernfs_remove(kn);
- up_write(&root->kernfs_rwsem);
-
if (kn)
return 0;
else
@@ -1583,7 +1589,6 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
const char *new_name, const void *new_ns)
{
struct kernfs_node *old_parent;
- struct kernfs_root *root;
const char *old_name = NULL;
int error;
@@ -1591,8 +1596,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
if (!kn->parent)
return -EINVAL;
- root = kernfs_root(kn);
- down_write(&root->kernfs_rwsem);
+ down_write_kernfs_rwsem_rename_ns(kn, kn->parent, new_parent);
error = -ENOENT;
if (!kernfs_active(kn) || !kernfs_active(new_parent) ||
@@ -1646,7 +1650,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
error = 0;
out:
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem_rename_ns(kn, new_parent, old_parent);
return error;
}
@@ -1717,14 +1721,12 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
struct dentry *dentry = file->f_path.dentry;
struct kernfs_node *parent = kernfs_dentry_node(dentry);
struct kernfs_node *pos = file->private_data;
- struct kernfs_root *root;
const void *ns = NULL;
if (!dir_emit_dots(file, ctx))
return 0;
- root = kernfs_root(parent);
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
if (kernfs_ns_enabled(parent))
ns = kernfs_info(dentry->d_sb)->ns;
@@ -1741,12 +1743,12 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
file->private_data = pos;
kernfs_get(pos);
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(parent);
if (!dir_emit(ctx, name, len, ino, type))
return 0;
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
}
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(parent);
file->private_data = NULL;
ctx->pos = INT_MAX;
return 0;
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index af2046bc63aa1..03d8f0d087cb8 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -857,7 +857,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
root = kernfs_root(kn);
/* kick fsnotify */
- down_write(&root->kernfs_rwsem);
+ down_write_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
down_write(&root->supers_rwsem);
list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
@@ -897,7 +897,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
}
up_write(&root->supers_rwsem);
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem(kn);
kernfs_put(kn);
goto repeat;
}
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 3d783d80f5daa..4de65f9c21d85 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -99,11 +99,10 @@ int __kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr)
int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr)
{
int ret;
- struct kernfs_root *root = kernfs_root(kn);
- down_write(&root->kernfs_rwsem);
+ down_write_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
ret = __kernfs_setattr(kn, iattr);
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem(kn);
return ret;
}
@@ -112,14 +111,12 @@ int kernfs_iop_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
{
struct inode *inode = d_inode(dentry);
struct kernfs_node *kn = inode->i_private;
- struct kernfs_root *root;
int error;
if (!kn)
return -EINVAL;
- root = kernfs_root(kn);
- down_write(&root->kernfs_rwsem);
+ down_write_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
error = setattr_prepare(&init_user_ns, dentry, iattr);
if (error)
goto out;
@@ -132,7 +129,7 @@ int kernfs_iop_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
setattr_copy(&init_user_ns, inode, iattr);
out:
- up_write(&root->kernfs_rwsem);
+ up_write_kernfs_rwsem(kn);
return error;
}
@@ -187,14 +184,13 @@ int kernfs_iop_getattr(struct user_namespace *mnt_userns,
{
struct inode *inode = d_inode(path->dentry);
struct kernfs_node *kn = inode->i_private;
- struct kernfs_root *root = kernfs_root(kn);
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
spin_lock(&inode->i_lock);
kernfs_refresh_inode(kn, inode);
generic_fillattr(&init_user_ns, inode, stat);
spin_unlock(&inode->i_lock);
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(kn);
return 0;
}
@@ -278,21 +274,19 @@ int kernfs_iop_permission(struct user_namespace *mnt_userns,
struct inode *inode, int mask)
{
struct kernfs_node *kn;
- struct kernfs_root *root;
int ret;
if (mask & MAY_NOT_BLOCK)
return -ECHILD;
kn = inode->i_private;
- root = kernfs_root(kn);
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(kn, KERNFS_RWSEM_LOCK_SELF);
spin_lock(&inode->i_lock);
kernfs_refresh_inode(kn, inode);
ret = generic_permission(&init_user_ns, inode, mask);
spin_unlock(&inode->i_lock);
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(kn);
return ret;
}
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index ba89de378f240..f19b180557559 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -133,7 +133,6 @@ int __kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr);
/*
* dir.c
*/
-extern struct rw_semaphore kernfs_rwsem;
extern const struct dentry_operations kernfs_dops;
extern const struct file_operations kernfs_dir_fops;
extern const struct inode_operations kernfs_dir_iops;
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index d28f8a3eeb215..2816750f798e2 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -237,7 +237,6 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *kfc)
{
struct kernfs_super_info *info = kernfs_info(sb);
- struct kernfs_root *kf_root = kfc->root;
struct inode *inode;
struct dentry *root;
@@ -257,9 +256,9 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
sb->s_shrink.seeks = 0;
/* get root inode, initialize and unlock it */
- down_read(&kf_root->kernfs_rwsem);
+ down_read_kernfs_rwsem(info->root->kn, KERNFS_RWSEM_LOCK_SELF);
inode = kernfs_get_inode(sb, info->root->kn);
- up_read(&kf_root->kernfs_rwsem);
+ up_read_kernfs_rwsem(info->root->kn);
if (!inode) {
pr_debug("kernfs: could not get root inode\n");
return -ENOMEM;
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 0ab13824822f7..24d0f64460bda 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -113,12 +113,11 @@ static int kernfs_getlink(struct inode *inode, char *path)
struct kernfs_node *kn = inode->i_private;
struct kernfs_node *parent = kn->parent;
struct kernfs_node *target = kn->symlink.target_kn;
- struct kernfs_root *root = kernfs_root(parent);
int error;
- down_read(&root->kernfs_rwsem);
+ down_read_kernfs_rwsem(parent, KERNFS_RWSEM_LOCK_SELF);
error = kernfs_get_target_path(parent, target, path);
- up_read(&root->kernfs_rwsem);
+ up_read_kernfs_rwsem(parent);
return error;
}
--
2.30.2
Hello Tejun,
Thanks again for reviewing this. I have refactored the code.
I have also added a document describing lock usage and a proof
of correctness in the new patch set at [1]. This took some time,
but it should help highlight whether I have misunderstood anything
while making these changes.
[1]
https://lore.kernel.org/lkml/[email protected]/
Thanks
-- Imran
On 15/2/22 5:15 am, Tejun Heo wrote:
> Hello,
>
> The patchset generally looks good to me but I think it can be split a bit
> better and the locking helper code can be more compacted. Also, it'd be
> great o include the benchmark method and result in the commit log so that it
> can be looked up easily later. It just has to be in one of the patch
> descriptions whether the first or last of the series and the rest can refer
> to it.
>
> Thanks.
>