2013-07-09 01:17:44

by Waiman Long

Subject: [PATCH v6 00/14] Lockless update of reference count protected by spinlock

v5->v6:
- Add a new GENERIC_SPINLOCK_REFCOUNT config parameter for using the
generic implementation.
- Add two parameters LOCKREF_WAIT_SHIFT and LOCKREF_RETRY_COUNT which
can be specified differently for each architecture.
- Update various spinlock_refcount.* files to incorporate review
comments.
- Replace references to the d_refcount() macro in the Lustre filesystem
code in the staging tree with the new d_count() helper function.

v4->v5:
- Add a d_count() helper for readonly access of reference count and
change all references to d_count outside of dcache.c, dcache.h
and namei.c to use d_count().

v3->v4:
- Replace helper function access to d_lock and d_count by using
macros to redefine the old d_lock name to the spinlock and the new
d_refcount name to the reference count. This greatly reduces the
size of this patchset from 25 to 12 and makes it easier to review.

v2->v3:
- Completely revamp the packaging by adding a new lockref data
structure that combines the spinlock with the reference
count. Helper functions are also added to manipulate the new data
structure. That results in modifying over 50 files, but the changes
were trivial in most of them.
- Change initial spinlock wait to use a timeout.
- Force 64-bit alignment of the spinlock & reference count structure.
- Add a new way to use the combo by using a new union and helper
functions.

v1->v2:
- Add one more layer of indirection to LOCK_WITH_REFCOUNT macro.
- Add __LINUX_SPINLOCK_REFCOUNT_H protection to spinlock_refcount.h.
- Add some generic get/put macros into spinlock_refcount.h.

This patchset supports a generic mechanism to atomically update
a reference count that is protected by a spinlock without actually
acquiring the lock itself. If the update doesn't succeed, the caller
will have to acquire the lock and update the reference count in
the old way. This will help in situations where there is a lot of
spinlock contention because of frequent reference count updates.
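
As a rough illustration (a minimal sketch, not taken from the patches
themselves; the helper names match the lockref API introduced in patch
1), a caller is expected to try the lockless path first and fall back
to the lock only when needed:

/*
 * Illustrative usage sketch only: try a lockless decrement first.
 * lockref_put_or_lock() returns 0 with the spinlock held when the
 * count would drop to <= 1, so the caller finishes the slow path
 * under the lock.
 */
static void example_put(struct lockref *ref)
{
        if (lockref_put_or_lock(ref))
                return;         /* lockless decrement succeeded */

        /* lock is held here; do whatever teardown the count protects */
        spin_unlock(&ref->lock);
}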

The d_lock and d_count fields of struct dentry in dcache.h were
modified to use the new lockref data structure, and the d_lock name
is now a macro for the actual spinlock. The d_count name, however,
cannot be reused as it collides with other uses elsewhere in the
kernel. So a new d_refcount macro is now used for the reference count,
and files outside of dcache.c, dcache.h and namei.c are modified to
use the d_count() helper function.
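
For example, a read-only check in a filesystem changes from accessing
the field directly to calling the helper (do_something() is just a
placeholder here):

        /* before */
        if (dentry->d_count > 1)
                do_something();

        /* after */
        if (d_count(dentry) > 1)
                do_something();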

The various patches were applied and built one-by-one to make sure
that there were no broken builds.

This patch set causes significant performance improvement in the
short workload of the AIM7 benchmark on an 8-socket x86-64 machine
with 80 cores.

patch 1: Introduce the new lockref data structure
patch 2: Enable x86 architecture to use the feature
patch 3: Introduce the new d_count() helper function
patches 4-11: Rename all d_count references to d_count() helper
patch 12: Replace the d_refcount() macro with the d_count() helper
patch 13: Rename the d_count field to d_refcount
patch 14: Change the dentry structure to use the lockref
structure to improve performance for high contention
cases

Thanks to Thomas Gleixner, Andi Kleen and Linus for their valuable
input in shaping this patchset.

Signed-off-by: Waiman Long <[email protected]>

Waiman Long (14):
spinlock: A new lockref structure for lockless update of refcount
spinlock: Enable x86 architecture to do lockless refcount update
dcache: Add a new helper function d_count() to return refcount
auto-fs: replace direct access of d_count with the d_count() helper
ceph-fs: replace direct access of d_count with the d_count() helper
coda-fs: replace direct access of d_count with the d_count() helper
config-fs: replace direct access of d_count with the d_count() helper
ecrypt-fs: replace direct access of d_count with the d_count() helper
file locking: replace direct access of d_count with the d_count()
helper
nfs: replace direct access of d_count with the d_count() helper
nilfs2: replace direct access of d_count with the d_count() helper
lustre-fs: Use the standard d_count() helper to access refcount
dcache: rename d_count field of dentry to d_refcount
dcache: Enable lockless update of refcount in dentry structure

arch/x86/Kconfig | 3 +
.../lustre/include/linux/lustre_patchless_compat.h | 2 -
drivers/staging/lustre/lustre/include/linux/lvfs.h | 2 +-
drivers/staging/lustre/lustre/llite/dcache.c | 8 +-
.../staging/lustre/lustre/llite/llite_internal.h | 4 +-
drivers/staging/lustre/lustre/llite/llite_lib.c | 2 +-
drivers/staging/lustre/lustre/llite/namei.c | 4 +-
drivers/staging/lustre/lustre/lvfs/lvfs_linux.c | 4 +-
fs/autofs4/expire.c | 8 +-
fs/autofs4/root.c | 2 +-
fs/ceph/inode.c | 4 +-
fs/ceph/mds_client.c | 2 +-
fs/coda/dir.c | 2 +-
fs/configfs/dir.c | 2 +-
fs/dcache.c | 72 ++++---
fs/ecryptfs/inode.c | 2 +-
fs/locks.c | 2 +-
fs/namei.c | 6 +-
fs/nfs/dir.c | 6 +-
fs/nfs/unlink.c | 2 +-
fs/nilfs2/super.c | 2 +-
include/asm-generic/spinlock_refcount.h | 46 ++++
include/linux/dcache.h | 31 ++-
include/linux/spinlock_refcount.h | 142 +++++++++++++
kernel/Kconfig.locks | 15 ++
lib/Makefile | 2 +
lib/spinlock_refcount.c | 218 ++++++++++++++++++++
27 files changed, 519 insertions(+), 76 deletions(-)
create mode 100644 include/asm-generic/spinlock_refcount.h
create mode 100644 include/linux/spinlock_refcount.h
create mode 100644 lib/spinlock_refcount.c


2013-07-09 01:17:51

by Waiman Long

Subject: [PATCH v6 01/14] spinlock: A new lockref structure for lockless update of refcount

This patch introduces a new set of spinlock_refcount.h header files to
be included by kernel code that wants to do a faster lockless update
of a reference count protected by a spinlock.

The new lockref structure consists of just the spinlock and the
reference count data. Helper functions are defined in the new
<linux/spinlock_refcount.h> header file to access the content of
the new structure. There is a generic structure defined for all
architectures, but each architecture can also optionally define its
own structure and use its own helper functions.

Three new config parameters are introduced:
1. SPINLOCK_REFCOUNT
2. GENERIC_SPINLOCK_REFCOUNT
3. ARCH_SPINLOCK_REFCOUNT

The first one is defined in kernel/Kconfig.locks and is used
to enable or disable the faster lockless reference count update
optimization. The second and third ones have to be defined in each
architecture's Kconfig file to enable the optimization for that
architecture. Therefore, each architecture has to opt in to this
optimization or it won't get it. This gives each architecture plenty
of time to test it out before deciding to use it or to replace it
with a better architecture-specific solution. An architecture should
set only GENERIC_SPINLOCK_REFCOUNT to use the generic implementation
without customization. By setting only ARCH_SPINLOCK_REFCOUNT,
the architecture will have to provide its own implementation. By
setting both, an architecture uses the generic implementation with
customized parameters.
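
For illustration only, a hypothetical <asm/spinlock_refcount.h> for an
architecture that sets both parameters might look like the sketch below
(the architecture name and the tuning values are made up, not
recommendations):

/* arch/foo/include/asm/spinlock_refcount.h -- hypothetical example */
#ifndef _ASM_FOO_SPINLOCK_REFCOUNT_H
#define _ASM_FOO_SPINLOCK_REFCOUNT_H

/* Customized tuning parameters (example values only) */
#define LOCKREF_WAIT_SHIFT      12      /* wait up to 2^12 loops for a free lock */
#define LOCKREF_RETRY_COUNT     4       /* lockless update attempts before locking */

/* Pull in the generic lockref structure and helper declarations */
#include <asm-generic/spinlock_refcount.h>

#endif /* _ASM_FOO_SPINLOCK_REFCOUNT_H */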

This optimization won't work for non-SMP systems or when spinlock
debugging is turned on. As a result, it is turned off if either of
them is true. It also won't work with full preempt-RT and so should
be turned off in that case.

To maximize the chance of doing a lockless update in the generic
version, the inlined lockref_add_unless() function will wait for a
certain amount of time if the lock is not free before trying to do
the update. The amount of time is controlled by the LOCKREF_WAIT_SHIFT
macro.

The new code also attempts the lockless atomic update a few times
before falling back to the old code path of acquiring the lock
before doing the update. Similarly, this is controlled by the
LOCKREF_RETRY_COUNT macro.

Signed-off-by: Waiman Long <[email protected]>
---
include/asm-generic/spinlock_refcount.h | 46 +++++++
include/linux/spinlock_refcount.h | 142 ++++++++++++++++++++
kernel/Kconfig.locks | 15 ++
lib/Makefile | 2 +
lib/spinlock_refcount.c | 218 +++++++++++++++++++++++++++++++
5 files changed, 423 insertions(+), 0 deletions(-)
create mode 100644 include/asm-generic/spinlock_refcount.h
create mode 100644 include/linux/spinlock_refcount.h
create mode 100644 lib/spinlock_refcount.c

diff --git a/include/asm-generic/spinlock_refcount.h b/include/asm-generic/spinlock_refcount.h
new file mode 100644
index 0000000..d3a4119
--- /dev/null
+++ b/include/asm-generic/spinlock_refcount.h
@@ -0,0 +1,46 @@
+/*
+ * Spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (c) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <[email protected]>
+ */
+#ifndef __ASM_GENERIC_SPINLOCK_REFCOUNT_H
+#define __ASM_GENERIC_SPINLOCK_REFCOUNT_H
+
+/*
+ * The lockref structure defines a combined spinlock with reference count
+ * data structure to be embedded in a larger structure. The combined data
+ * structure is always 8-byte aligned. So proper placement of this structure
+ * in the larger embedding data structure is needed to ensure that there is
+ * no hole in it.
+ */
+struct __aligned(sizeof(u64)) lockref {
+ union {
+ u64 lock_count;
+ struct {
+ unsigned int refcnt; /* Reference count */
+ spinlock_t lock;
+ };
+ };
+};
+
+/*
+ * Struct lockref helper functions
+ */
+extern void lockref_get(struct lockref *lockcnt);
+extern int lockref_put(struct lockref *lockcnt);
+extern int lockref_get_not_zero(struct lockref *lockcnt);
+extern int lockref_put_or_lock(struct lockref *lockcnt);
+
+#endif /* __ASM_GENERIC_SPINLOCK_REFCOUNT_H */
diff --git a/include/linux/spinlock_refcount.h b/include/linux/spinlock_refcount.h
new file mode 100644
index 0000000..32389a9
--- /dev/null
+++ b/include/linux/spinlock_refcount.h
@@ -0,0 +1,142 @@
+/*
+ * Spinlock with reference count combo data structure
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (c) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <[email protected]>
+ */
+#ifndef __LINUX_SPINLOCK_REFCOUNT_H
+#define __LINUX_SPINLOCK_REFCOUNT_H
+
+#include <linux/spinlock.h>
+
+/*
+ * To enable lockless update of reference count, an architecture has to define
+ * at least one of the following two config parameters in its Kconfig file:
+ * 1. GENERIC_SPINLOCK_REFCOUNT
+ * 2. ARCH_SPINLOCK_REFCOUNT
+ *
+ * By defining just the GENERIC_SPINLOCK_REFCOUNT parameter, the architecture
+ * will use the generic implementation with default tuning parameters. There
+ * is nothing else an architecture needs to do.
+ *
+ * On the other hand, defining just the ARCH_SPINLOCK_REFCOUNT parameter
+ * indicates that the architecture is providing its own implementation. It
+ * has to provide an <asm/spinlock_refcount.h> header file.
+ *
+ * By defining both parameters, the architecture indicates that it is using
+ * the generic implementation, but with customized tuning parameters specified
+ * in the <asm/spinlock_refcount.h> header file. This header file should also
+ * include the <asm-generic/spinlock_refcount.h> header file.
+ *
+ * The supported tuning parameters that can be specified in
+ * <asm/spinlock_refcount.h> are:
+ * 1. LOCKREF_WAIT_SHIFT (default = 0)
+ * A value of 0 will force the code to wait indefinitely until the spinlock
+ * is free before attempting to do a lockless update of reference count.
+ * A non-zero value (n) will cause the waiting code to loop 2^n times
+ * before aborting the wait for a free spinlock.
+ * 2. LOCKREF_RETRY_COUNT (default = 2)
+ * The number of lockless reference count updates attempted by the code
+ * before falling back to taking the spinlock.
+ */
+#ifdef CONFIG_SPINLOCK_REFCOUNT
+
+# ifdef CONFIG_ARCH_SPINLOCK_REFCOUNT
+# include <asm/spinlock_refcount.h>
+# else
+# include <asm-generic/spinlock_refcount.h>
+# endif
+
+#else
+/*
+ * If the spinlock & reference count optimization feature is disabled,
+ * the lock and the reference count are accessed separately on their own.
+ */
+struct lockref {
+ unsigned int refcnt; /* Reference count */
+ spinlock_t lock;
+};
+
+/*
+ * Struct lockref helper functions
+ */
+/**
+ * lockref_get - Increments reference count unconditionally
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline void lockref_get(struct lockref *lockcnt)
+{
+ spin_lock(&lockcnt->lock);
+ lockcnt->refcnt++;
+ spin_unlock(&lockcnt->lock);
+}
+
+/**
+ * lockref_get_not_zero - Increments count unless the count is 0
+ * @lockcnt: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count is 0
+ */
+static __always_inline int lockref_get_not_zero(struct lockref *lockcnt)
+{
+ int retval = 0;
+
+ spin_lock(&lockcnt->lock);
+ if (likely(lockcnt->refcnt)) {
+ lockcnt->refcnt++;
+ retval = 1;
+ }
+ spin_unlock(&lockcnt->lock);
+ return retval;
+}
+
+/**
+ * lockref_put - Decrements count unless count <= 1 before decrement
+ * @lockcnt: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1
+ */
+static __always_inline int lockref_put(struct lockref *lockcnt)
+{
+ int retval = 0;
+
+ spin_lock(&lockcnt->lock);
+ if (likely(lockcnt->refcnt > 1)) {
+ lockcnt->refcnt--;
+ retval = 1;
+ }
+ spin_unlock(&lockcnt->lock);
+ return retval;
+}
+
+/**
+ * lockref_put_or_lock - decrements count unless count <= 1 before decrement
+ * @lockcnt: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
+ *
+ * The only difference between lockref_put_or_lock and lockref_put is that
+ * the former function will hold the lock on return if the count was not
+ * decremented, while the latter one always releases the lock before returning.
+ */
+static __always_inline int lockref_put_or_lock(struct lockref *lockcnt)
+{
+ spin_lock(&lockcnt->lock);
+ if (likely(lockcnt->refcnt > 1)) {
+ lockcnt->refcnt--;
+ spin_unlock(&lockcnt->lock);
+ return 1;
+ }
+ return 0;
+}
+
+#endif /* !CONFIG_SPINLOCK_REFCOUNT */
+#endif /* __LINUX_SPINLOCK_REFCOUNT_H */
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index d2b32ac..67ff90b 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -223,3 +223,18 @@ endif
config MUTEX_SPIN_ON_OWNER
def_bool y
depends on SMP && !DEBUG_MUTEXES
+
+#
+# Spinlock with reference count optimization
+#
+config GENERIC_SPINLOCK_REFCOUNT
+ bool
+
+config ARCH_SPINLOCK_REFCOUNT
+ bool
+
+config SPINLOCK_REFCOUNT
+ def_bool y
+ depends on ARCH_SPINLOCK_REFCOUNT || GENERIC_SPINLOCK_REFCOUNT
+ depends on SMP
+ depends on !GENERIC_LOCKBREAK && !DEBUG_SPINLOCK && !DEBUG_LOCK_ALLOC
diff --git a/lib/Makefile b/lib/Makefile
index c09e38e..bb2d83b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -183,3 +183,5 @@ quiet_cmd_build_OID_registry = GEN $@
clean-files += oid_registry_data.c

obj-$(CONFIG_UCS2_STRING) += ucs2_string.o
+
+obj-$(CONFIG_GENERIC_SPINLOCK_REFCOUNT) += spinlock_refcount.o
diff --git a/lib/spinlock_refcount.c b/lib/spinlock_refcount.c
new file mode 100644
index 0000000..e32dc58
--- /dev/null
+++ b/lib/spinlock_refcount.c
@@ -0,0 +1,218 @@
+/*
+ * Generic spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <[email protected]>
+ */
+
+#include <linux/spinlock.h>
+#include <linux/spinlock_refcount.h>
+
+/*
+ * The _SHOW_WAIT_LOOP_TIME macro is for internal instrumentation purposes
+ * only during development. It should not be set in a production system.
+ */
+#ifdef _SHOW_WAIT_LOOP_TIME
+#include <linux/time.h>
+#endif
+
+/*
+ * The LOCKREF_WAIT_SHIFT macro defines the maximum number of waiting spins
+ * when the lock is not free before trying to attempt a lockless update.
+ * The purpose of the timeout is to limit the amount of unfairness to the
+ * thread that is doing the reference count update. Otherwise, it is
+ * theoretically possible for that thread to be starved for a really long
+ * time causing all kind of problems. If it times out and the lock is still
+ * not free, the code will fall back to the traditional way of queuing up
+ * to acquire the lock before updating the count.
+ *
+ * The actual time spent in the wait loop very much depends on the CPU being
+ * used. On a 2.4GHz Westmere CPU, the execution time of a PAUSE instruction
+ * (cpu_relax) in a 4k loop is about 16us. The lock checking and looping
+ * overhead is about 8us. With 4 cpu_relax's in the loop, it will wait
+ * about 72us before the count reaches 0. With cacheline contention, the
+ * wait time can go up to 3x as much (about 210us). Having multiple
+ * cpu_relax's in the wait loop does seem to reduce cacheline contention
+ * somewhat and give slightly better performance.
+ *
+ * The preset timeout value could be rather arbitrary and really depends on
+ * the CPU being used. The exact value is not that important. A longer wait
+ * time will increase the chance of doing a lockless update, but it results
+ * in more unfairness to the waiting thread and vice versa. A value of 0
+ * indicates that the waiting code will wait forever until the lock is free.
+ *
+ * Default = 0
+ */
+#ifndef LOCKREF_WAIT_SHIFT
+#define LOCKREF_WAIT_SHIFT 0
+#endif
+#if LOCKREF_WAIT_SHIFT == 0
+#define LOCKREF_SPIN_WAIT_MAX INT_MAX
+#else
+#define LOCKREF_SPIN_WAIT_MAX (1<<LOCKREF_WAIT_SHIFT)
+#endif
+
+/*
+ * The number of attempts to update the reference count locklessly before
+ * quitting (default = 2).
+ */
+#ifndef LOCKREF_RETRY_COUNT
+#define LOCKREF_RETRY_COUNT 2
+#endif
+
+/*
+ * The maximum backoff shift value
+ */
+#define LOCKREF_BACKOFF_SHIFT_MAX 6
+
+/**
+ *
+ * lockref_add_unless - atomically add to count unless locked or threshold reached
+ *
+ * @lockcnt : pointer to the lockref structure
+ * @value : value to be added
+ * @threshold: threshold value for acquiring the lock
+ * Return : 1 if operation succeeds, 0 otherwise
+ *
+ * If the lock was not acquired, lockref_add_unless() atomically adds the
+ * given value to the reference count unless the given threshold is reached.
+ * If the lock was acquired or the threshold was reached, 0 is returned and
+ * the caller will have to acquire the lock and update the count accordingly
+ * (can be done in a non-atomic way).
+ */
+static __always_inline int
+lockref_add_unless(struct lockref *lockcnt, int value, int threshold)
+{
+ struct lockref old, new;
+ int loopcnt;
+#ifdef _SHOW_WAIT_LOOP_TIME
+ struct timespec tv1, tv2;
+#endif
+
+ /*
+ * Code doesn't work if raw spinlock is larger than 4 bytes
+ * or is empty.
+ */
+ BUILD_BUG_ON((sizeof(arch_spinlock_t) == 0) ||
+ (sizeof(arch_spinlock_t) > 4));
+
+ /*
+ * Wait a certain amount of time until the lock is free or the loop
+ * counter reaches 0. Doing multiple cpu_relax() helps to reduce
+ * contention in the spinlock cacheline and achieve better performance.
+ */
+#ifdef _SHOW_WAIT_LOOP_TIME
+ getnstimeofday(&tv1);
+#endif
+ loopcnt = LOCKREF_SPIN_WAIT_MAX;
+ while (loopcnt && spin_is_locked(&lockcnt->lock)) {
+ loopcnt--;
+ cpu_relax();
+ cpu_relax();
+ cpu_relax();
+ cpu_relax();
+ }
+
+#ifdef _SHOW_WAIT_LOOP_TIME
+ if (loopcnt == 0) {
+ unsigned long ns;
+
+ getnstimeofday(&tv2);
+ ns = (tv2.tv_sec - tv1.tv_sec) * NSEC_PER_SEC +
+ (tv2.tv_nsec - tv1.tv_nsec);
+ pr_info("lockref wait loop time = %lu ns\n", ns);
+ }
+#endif
+
+ loopcnt = 0;
+ barrier();
+ do {
+ unsigned u;
+
+ old.lock_count = ACCESS_ONCE(lockcnt->lock_count);
+ if ((threshold >= 0) && (old.refcnt <= threshold))
+ break;
+ if (likely(!spin_is_locked(&old.lock))) {
+ new.lock_count = old.lock_count;
+ new.refcnt += value;
+ if (cmpxchg64(&lockcnt->lock_count, old.lock_count,
+ new.lock_count) == old.lock_count)
+ return 1;
+ }
+ /*
+ * Exponential backoff when contended
+ * The LOCKREF_RETRY_COUNT should be a small number.
+ */
+ u = 1 << min(loopcnt, LOCKREF_BACKOFF_SHIFT_MAX);
+ do {
+ cpu_relax();
+ } while (--u > 0);
+ } while (++loopcnt <= LOCKREF_RETRY_COUNT);
+ return 0;
+}
+
+/*
+ * Struct lockref helper functions
+ */
+/**
+ * lockref_get - Increments reference count unconditionally
+ * @lockcnt: pointer to struct lockref structure
+ */
+void lockref_get(struct lockref *lockcnt)
+{
+ if (lockref_add_unless(lockcnt, 1, -1))
+ return;
+ spin_lock(&lockcnt->lock);
+ lockcnt->refcnt++;
+ spin_unlock(&lockcnt->lock);
+}
+EXPORT_SYMBOL(lockref_get);
+
+/**
+ * lockref_get_not_zero - Increments count unless the count is 0
+ * @lockcnt: pointer to struct lockref structure
+ * Return: 1 if count updated successfully or 0 if count is 0 and lock taken
+ */
+int lockref_get_not_zero(struct lockref *lockcnt)
+{
+ return lockref_add_unless(lockcnt, 1, 0);
+}
+EXPORT_SYMBOL(lockref_get_not_zero);
+
+/**
+ * lockref_put - Decrements count unless the count <= 1
+ * @lockcnt: pointer to struct lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1
+ */
+int lockref_put(struct lockref *lockcnt)
+{
+ return lockref_add_unless(lockcnt, -1, 1);
+}
+EXPORT_SYMBOL(lockref_put);
+
+/**
+ * lockref_put_or_lock - Decrements count unless the count is <= 1
+ * otherwise, the lock will be taken
+ * @lockcnt: pointer to struct lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
+ */
+int
+lockref_put_or_lock(struct lockref *lockcnt)
+{
+ if (lockref_add_unless(lockcnt, -1, 1))
+ return 1;
+ spin_lock(&lockcnt->lock);
+ return 0;
+}
+EXPORT_SYMBOL(lockref_put_or_lock);
--
1.7.1

2013-07-09 01:17:57

by Waiman Long

Subject: [PATCH v6 11/14] nilfs2: replace direct access of d_count with the d_count() helper

All readonly references to d_count outside of the core dcache code
should be changed to use the new d_count() helper as they shouldn't
access its value directly. There is no change in logic and everything
should just work.

Signed-off-by: Waiman Long <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]>
---
fs/nilfs2/super.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 1427de5..af3ba04 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -996,7 +996,7 @@ static int nilfs_attach_snapshot(struct super_block *s, __u64 cno,

static int nilfs_tree_was_touched(struct dentry *root_dentry)
{
- return root_dentry->d_count > 1;
+ return d_count(root_dentry) > 1;
}

/**
--
1.7.1

2013-07-09 01:18:09

by Waiman Long

Subject: [PATCH v6 14/14] dcache: Enable lockless update of refcount in dentry structure

The current code takes the dentry's d_lock whenever d_refcount is
being updated. In reality, nothing big really happens until d_refcount
goes to 0 in dput(). So it is not necessary to take the lock if the
reference count won't go to 0. On the other hand, there are cases
where d_refcount should not be updated or was not expected to be
updated while d_lock was acquired by another thread.

To use the new lockref infrastructure to do lockless reference count
updates, the d_lock and d_refcount fields of the dentry structure were
combined into a new d_lockcnt field.

The offsets of the new d_lockcnt field are at bytes 72 and 88 for
32-bit and 64-bit SMP systems respectively. In both cases, they are
8-byte aligned and their combination into a single 8-byte word will
not introduce a hole that increases the size of the dentry structure.
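
As an illustrative, hypothetical compile-time check (not part of this
patch), the layout assumption could be expressed as:

/* Hypothetical sanity check, not part of this series: with the generic
 * lockref definition the combined word is exactly 8 bytes and d_lockcnt
 * must land on an 8-byte boundary so no padding hole is created.
 */
static inline void dentry_lockcnt_layout_check(void)
{
        BUILD_BUG_ON(sizeof(struct lockref) != sizeof(u64));
        BUILD_BUG_ON(offsetof(struct dentry, d_lockcnt) % sizeof(u64));
}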

This patch has a particularly big impact on the short workload of the
AIM7 benchmark with a ramdisk filesystem. The table below shows the
performance improvement to the JPM (jobs per minute) throughput
due to this patch on an 8-socket 80-core x86-64 system with a 3.10
kernel in a 1/2/4/8 node configuration, using numactl to restrict
the execution of the workload to certain nodes.

+-----------------+----------------+-----------------+----------+
| Configuration | Mean JPM | Mean JPM | % Change |
| | Rate w/o patch | Rate with patch | |
+-----------------+---------------------------------------------+
| | User Range 10 - 100 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1650355 | 5191497 | +214.6% |
| 4 nodes, HT off | 1665137 | 5204267 | +212.5% |
| 2 nodes, HT off | 1667552 | 3815637 | +128.8% |
| 1 node , HT off | 2442998 | 2352103 | -3.7% |
+-----------------+---------------------------------------------+
| | User Range 200 - 1000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1008604 | 5972142 | +492.1% |
| 4 nodes, HT off | 1317284 | 7190302 | +445.8% |
| 2 nodes, HT off | 1048363 | 4516400 | +330.8% |
| 1 node , HT off | 2461802 | 2466583 | +0.2% |
+-----------------+---------------------------------------------+
| | User Range 1100 - 2000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 995149 | 6424182 | +545.6% |
| 4 nodes, HT off | 1313386 | 7012193 | +433.9% |
| 2 nodes, HT off | 1041411 | 4478519 | +330.0% |
| 1 node , HT off | 2511186 | 2482650 | -1.1% |
+-----------------+----------------+-----------------+----------+

It can be seen that with 20 CPUs (2 nodes) or more, this patch can
significantly improve the short workload performance. With only 1
node, the performance is similar with or without the patch. The short
workload also scales pretty well up to 4 nodes with this patch.

The following table shows the short workload performance difference
between the original 3.10 kernel and the one with the patch but with
the SPINLOCK_REFCOUNT config variable disabled.

+-----------------+----------------+-----------------+----------+
| Configuration | Mean JPM | Mean JPM | % Change |
| | Rate w/o patch | Rate with patch | |
+-----------------+---------------------------------------------+
| | User Range 10 - 100 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1650355 | 1634232 | -1.0% |
| 4 nodes, HT off | 1665137 | 1675791 | +0.6% |
| 2 nodes, HT off | 1667552 | 2985552 | +79.0% |
| 1 node , HT off | 2442998 | 2396091 | -1.9% |
+-----------------+---------------------------------------------+
| | User Range 200 - 1000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1008604 | 1005153 | -0.3% |
| 4 nodes, HT off | 1317284 | 1330782 | +1.0% |
| 2 nodes, HT off | 1048363 | 2056871 | +96.2% |
| 1 node , HT off | 2461802 | 2463877 | +0.1% |
+-----------------+---------------------------------------------+
| | User Range 1100 - 2000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 995149 | 991157 | -0.4% |
| 4 nodes, HT off | 1313386 | 1321806 | +0.6% |
| 2 nodes, HT off | 1041411 | 2032808 | +95.2% |
| 1 node , HT off | 2511186 | 2483815 | -1.1% |
+-----------------+----------------+-----------------+----------+

There are some abnormalities in the original 3.10 2-node data. Ignoring
that, the performance difference for the other node counts, if any,
is insignificant.

A perf call-graph report of the short workload at 1500 users
without the patch on the same 8-node machine indicates that about
78% of the workload's total time was spent in the _raw_spin_lock()
function. Almost all of it can be attributed to the following 2
kernel functions:
1. dget_parent (49.91%)
2. dput (49.89%)

The relevant perf report lines are:
+ 78.37% reaim [kernel.kallsyms] [k] _raw_spin_lock
+ 0.09% reaim [kernel.kallsyms] [k] dput
+ 0.05% reaim [kernel.kallsyms] [k] _raw_spin_lock_irq
+ 0.00% reaim [kernel.kallsyms] [k] dget_parent

With this patch installed, the new perf report lines are:
+ 19.65% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
+ 3.94% reaim [kernel.kallsyms] [k] _raw_spin_lock
+ 2.47% reaim [kernel.kallsyms] [k] lockref_get_not_zero
+ 0.62% reaim [kernel.kallsyms] [k] lockref_put_or_locked
+ 0.36% reaim [kernel.kallsyms] [k] dput
+ 0.31% reaim [kernel.kallsyms] [k] lockref_get
+ 0.02% reaim [kernel.kallsyms] [k] dget_parent

- 3.94% reaim [kernel.kallsyms] [k] _raw_spin_lock
- _raw_spin_lock
+ 32.86% SyS_getcwd
+ 31.99% d_path
+ 4.81% prepend_path
+ 4.14% __rcu_process_callbacks
+ 3.73% complete_walk
+ 2.31% dget_parent
+ 1.99% unlazy_walk
+ 1.44% do_anonymous_page
+ 1.22% lockref_put_or_locked
+ 1.16% sem_lock
+ 0.95% task_rq_lock
+ 0.89% selinux_inode_free_security
+ 0.89% process_backlog
+ 0.79% enqueue_to_backlog
+ 0.72% unix_dgram_sendmsg
+ 0.69% unix_stream_sendmsg

The lockref_put_or_locked used up only 1.22% of the _raw_spin_lock
time while dget_parent used only 2.31%.

The impact of this patch on other AIM7 workloads was much more
modest. The table below shows the mean % change due to this patch on
the same 8-socket system with a 3.10 kernel.

+--------------+---------------+----------------+-----------------+
| Workload | mean % change | mean % change | mean % change |
| | 10-100 users | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| alltests | -0.2% | +0.5% | -0.3% |
| five_sec | +2.5% | -4.2% | -4.7% |
| fserver | +1.7% | +1.6% | +0.3% |
| high_systime | +0.1% | +1.4% | +5.5% |
| new_fserver | +0.4% | +1.2% | +0.3% |
| shared | +0.8% | -0.3% | 0.0% |
+--------------+---------------+----------------+-----------------+

There are slight drops in performance for the five_sec workload,
but a slight increase in the high_systime workload.

The checkpatch.pl script reported errors for the d_lock and d_refcount
macros as it wanted to have parentheses around the actual names.
That won't work for those macros, so the errors should be ignored.

Signed-off-by: Waiman Long <[email protected]>
---
fs/dcache.c | 18 ++++++++++++------
include/linux/dcache.h | 17 ++++++++++-------
2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 20def64..095ee18 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -515,7 +515,9 @@ void dput(struct dentry *dentry)
repeat:
if (dentry->d_refcount == 1)
might_sleep();
- spin_lock(&dentry->d_lock);
+ if (lockref_put_or_lock(&dentry->d_lockcnt))
+ return;
+ /* dentry's lock taken */
BUG_ON(!dentry->d_refcount);
if (dentry->d_refcount > 1) {
dentry->d_refcount--;
@@ -611,26 +613,30 @@ static inline void __dget_dlock(struct dentry *dentry)

static inline void __dget(struct dentry *dentry)
{
- spin_lock(&dentry->d_lock);
- __dget_dlock(dentry);
- spin_unlock(&dentry->d_lock);
+ lockref_get(&dentry->d_lockcnt);
}

struct dentry *dget_parent(struct dentry *dentry)
{
struct dentry *ret;

+ rcu_read_lock();
+ ret = rcu_dereference(dentry->d_parent);
+ if (lockref_get_not_zero(&ret->d_lockcnt)) {
+ rcu_read_unlock();
+ return ret;
+ }
repeat:
/*
* Don't need rcu_dereference because we re-check it was correct under
* the lock.
*/
- rcu_read_lock();
- ret = dentry->d_parent;
+ ret = ACCESS_ONCE(dentry->d_parent);
spin_lock(&ret->d_lock);
if (unlikely(ret != dentry->d_parent)) {
spin_unlock(&ret->d_lock);
rcu_read_unlock();
+ rcu_read_lock();
goto repeat;
}
rcu_read_unlock();
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 6b9b7f4..c6e9c7a 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -9,6 +9,7 @@
#include <linux/seqlock.h>
#include <linux/cache.h>
#include <linux/rcupdate.h>
+#include <linux/spinlock_refcount.h>

struct nameidata;
struct path;
@@ -112,8 +113,7 @@ struct dentry {
unsigned char d_iname[DNAME_INLINE_LEN]; /* small names */

/* Ref lookup also touches following */
- unsigned int d_refcount; /* protected by d_lock */
- spinlock_t d_lock; /* per dentry lock */
+ struct lockref d_lockcnt; /* per dentry lock & count */
const struct dentry_operations *d_op;
struct super_block *d_sb; /* The root of the dentry tree */
unsigned long d_time; /* used by d_revalidate */
@@ -132,6 +132,12 @@ struct dentry {
};

/*
+ * Define macros to access the name-changed spinlock and reference count
+ */
+#define d_lock d_lockcnt.lock
+#define d_refcount d_lockcnt.refcnt
+
+/*
* dentry->d_lock spinlock nesting subclasses:
*
* 0: normal
@@ -367,11 +373,8 @@ static inline struct dentry *dget_dlock(struct dentry *dentry)

static inline struct dentry *dget(struct dentry *dentry)
{
- if (dentry) {
- spin_lock(&dentry->d_lock);
- dget_dlock(dentry);
- spin_unlock(&dentry->d_lock);
- }
+ if (dentry)
+ lockref_get(&dentry->d_lockcnt);
return dentry;
}

--
1.7.1

2013-07-09 01:18:27

by Waiman Long

Subject: [PATCH v6 12/14] lustre-fs: Use the standard d_count() helper to access refcount

The Lustre FS should use the newly defined d_count() helper function
to access the dentry's reference count instead of defining its own
d_refcount() macro for the same purpose. Since the current lustre
code is marked as broken, no build test was attempted for this change.

Signed-off-by: Waiman Long <[email protected]>
---
.../lustre/include/linux/lustre_patchless_compat.h | 2 --
drivers/staging/lustre/lustre/include/linux/lvfs.h | 2 +-
drivers/staging/lustre/lustre/llite/dcache.c | 8 ++++----
.../staging/lustre/lustre/llite/llite_internal.h | 4 ++--
drivers/staging/lustre/lustre/llite/llite_lib.c | 2 +-
drivers/staging/lustre/lustre/llite/namei.c | 4 ++--
drivers/staging/lustre/lustre/lvfs/lvfs_linux.c | 4 ++--
7 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h b/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
index f050808..a8e9c0c 100644
--- a/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
+++ b/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
@@ -60,8 +60,6 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
ll_delete_from_page_cache(page);
}

-# define d_refcount(d) ((d)->d_count)
-
#ifdef ATTR_OPEN
# define ATTR_FROM_OPEN ATTR_OPEN
#else
diff --git a/drivers/staging/lustre/lustre/include/linux/lvfs.h b/drivers/staging/lustre/lustre/include/linux/lvfs.h
index b4db6cb..eb59ac7 100644
--- a/drivers/staging/lustre/lustre/include/linux/lvfs.h
+++ b/drivers/staging/lustre/lustre/include/linux/lvfs.h
@@ -99,7 +99,7 @@ static inline void l_dput(struct dentry *de)
if (!de || IS_ERR(de))
return;
//shrink_dcache_parent(de);
- LASSERT(d_refcount(de) > 0);
+ LASSERT(d_count(de) > 0);
dput(de);
}

diff --git a/drivers/staging/lustre/lustre/llite/dcache.c b/drivers/staging/lustre/lustre/llite/dcache.c
index 7d6abff..ff0d085 100644
--- a/drivers/staging/lustre/lustre/llite/dcache.c
+++ b/drivers/staging/lustre/lustre/llite/dcache.c
@@ -98,7 +98,7 @@ int ll_dcompare(const struct dentry *parent, const struct inode *pinode,

CDEBUG(D_DENTRY, "found name %.*s(%p) flags %#x refc %d\n",
name->len, name->name, dentry, dentry->d_flags,
- d_refcount(dentry));
+ d_count(dentry));

/* mountpoint is always valid */
if (d_mountpoint((struct dentry *)dentry))
@@ -165,7 +165,7 @@ static int ll_ddelete(const struct dentry *de)
list_empty(&de->d_subdirs) ? "" : "subdirs");

/* kernel >= 2.6.38 last refcount is decreased after this function. */
- LASSERT(d_refcount(de) == 1);
+ LASSERT(d_count(de) == 1);

/* Disable this piece of code temproarily because this is called
* inside dcache_lock so it's not appropriate to do lots of work
@@ -190,7 +190,7 @@ static int ll_set_dd(struct dentry *de)

CDEBUG(D_DENTRY, "ldd on dentry %.*s (%p) parent %p inode %p refc %d\n",
de->d_name.len, de->d_name.name, de, de->d_parent, de->d_inode,
- d_refcount(de));
+ d_count(de));

if (de->d_fsdata == NULL) {
struct ll_dentry_data *lld;
@@ -540,7 +540,7 @@ out:
CDEBUG(D_DENTRY, "revalidated dentry %.*s (%p) parent %p "
"inode %p refc %d\n", de->d_name.len,
de->d_name.name, de, de->d_parent, de->d_inode,
- d_refcount(de));
+ d_count(de));

ll_set_lock_data(exp, de->d_inode, it, &bits);

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 992cd20..5227c5c 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1529,12 +1529,12 @@ static inline void d_lustre_invalidate(struct dentry *dentry, int nested)
{
CDEBUG(D_DENTRY, "invalidate dentry %.*s (%p) parent %p inode %p "
"refc %d\n", dentry->d_name.len, dentry->d_name.name, dentry,
- dentry->d_parent, dentry->d_inode, d_refcount(dentry));
+ dentry->d_parent, dentry->d_inode, d_count(dentry));

spin_lock_nested(&dentry->d_lock,
nested ? DENTRY_D_LOCK_NESTED : DENTRY_D_LOCK_NORMAL);
__d_lustre_invalidate(dentry);
- if (d_refcount(dentry) == 0)
+ if (d_count(dentry) == 0)
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
}
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 2311b20..afae801 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -659,7 +659,7 @@ void lustre_dump_dentry(struct dentry *dentry, int recur)
" flags=0x%x, fsdata=%p, %d subdirs\n", dentry,
dentry->d_name.len, dentry->d_name.name,
dentry->d_parent->d_name.len, dentry->d_parent->d_name.name,
- dentry->d_parent, dentry->d_inode, d_refcount(dentry),
+ dentry->d_parent, dentry->d_inode, d_count(dentry),
dentry->d_flags, dentry->d_fsdata, subdirs);
if (dentry->d_inode != NULL)
ll_dump_inode(dentry->d_inode);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 58d59aa..ff8f63d 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -409,7 +409,7 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
iput(inode);
CDEBUG(D_DENTRY,
"Reuse dentry %p inode %p refc %d flags %#x\n",
- new, new->d_inode, d_refcount(new), new->d_flags);
+ new, new->d_inode, d_count(new), new->d_flags);
return new;
}
}
@@ -417,7 +417,7 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
__d_lustre_invalidate(de);
d_add(de, inode);
CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
- de, de->d_inode, d_refcount(de), de->d_flags);
+ de, de->d_inode, d_count(de), de->d_flags);
return de;
}

diff --git a/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c b/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
index 1e6f32c..e70d8fe 100644
--- a/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
+++ b/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
@@ -121,8 +121,8 @@ void push_ctxt(struct lvfs_run_ctxt *save, struct lvfs_run_ctxt *new_ctx,
OBD_SET_CTXT_MAGIC(save);

save->fs = get_fs();
- LASSERT(d_refcount(cfs_fs_pwd(current->fs)));
- LASSERT(d_refcount(new_ctx->pwd));
+ LASSERT(d_count(cfs_fs_pwd(current->fs)));
+ LASSERT(d_count(new_ctx->pwd));
save->pwd = dget(cfs_fs_pwd(current->fs));
save->pwdmnt = mntget(cfs_fs_mnt(current->fs));
save->luc.luc_umask = current_umask();
--
1.7.1

2013-07-09 01:18:50

by Waiman Long

Subject: [PATCH v6 10/14] nfs: replace direct access of d_count with the d_count() helper

All readonly references to d_count outside of the core dcache code
should be changed to use the new d_count() helper as they shouldn't
access its value directly. There is no change in logic and everything
should just work.

Signed-off-by: Waiman Long <[email protected]>
---
fs/nfs/dir.c | 6 +++---
fs/nfs/unlink.c | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d7ed697..c933bdf 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1721,7 +1721,7 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry)
dir->i_ino, dentry->d_name.name);

spin_lock(&dentry->d_lock);
- if (dentry->d_count > 1) {
+ if (d_count(dentry) > 1) {
spin_unlock(&dentry->d_lock);
/* Start asynchronous writeout of the inode */
write_inode_now(dentry->d_inode, 0);
@@ -1866,7 +1866,7 @@ int nfs_rename(struct inode *old_dir, struct dentry *old_dentry,
dfprintk(VFS, "NFS: rename(%s/%s -> %s/%s, ct=%d)\n",
old_dentry->d_parent->d_name.name, old_dentry->d_name.name,
new_dentry->d_parent->d_name.name, new_dentry->d_name.name,
- new_dentry->d_count);
+ d_count(new_dentry));

/*
* For non-directories, check whether the target is busy and if so,
@@ -1884,7 +1884,7 @@ int nfs_rename(struct inode *old_dir, struct dentry *old_dentry,
rehash = new_dentry;
}

- if (new_dentry->d_count > 2) {
+ if (d_count(new_dentry) > 2) {
int err;

/* copy the target dentry's name */
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index 1f1f38f..60395ad 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -479,7 +479,7 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)

dfprintk(VFS, "NFS: silly-rename(%s/%s, ct=%d)\n",
dentry->d_parent->d_name.name, dentry->d_name.name,
- dentry->d_count);
+ d_count(dentry));
nfs_inc_stats(dir, NFSIOS_SILLYRENAME);

/*
--
1.7.1

2013-07-09 01:18:48

by Waiman Long

Subject: [PATCH v6 13/14] dcache: rename d_count field of dentry to d_refcount

Before converting the d_lock and d_count fields of the dentry data
structure to the new lockref structure, we need to consider the
implications of such a change. All current references to d_count and
d_lock have to be changed accordingly.

One way to minimize the changes is to redefine the original field
names as macros for the new names. For d_lock, it is possible to do
so, saving a lot of changes, as this name is not used anywhere else in
the kernel. For d_count, this is not possible as the name is used
elsewhere in the kernel for different things.

The dcache.c and namei.c files need to change the reference count
value. They will be modified to use a different reference count name,
"d_refcount", which is unique in the kernel source code.

Signed-off-by: Waiman Long <[email protected]>
---
fs/dcache.c | 54 ++++++++++++++++++++++++------------------------
fs/namei.c | 6 ++--
include/linux/dcache.h | 8 +++---
3 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 87bdb53..20def64 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -54,7 +54,7 @@
* - d_flags
* - d_name
* - d_lru
- * - d_count
+ * - d_refcount
* - d_unhashed()
* - d_parent and d_subdirs
* - childrens' d_child and d_parent
@@ -229,7 +229,7 @@ static void __d_free(struct rcu_head *head)
*/
static void d_free(struct dentry *dentry)
{
- BUG_ON(dentry->d_count);
+ BUG_ON(dentry->d_refcount);
this_cpu_dec(nr_dentry);
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
@@ -467,7 +467,7 @@ relock:
}

if (ref)
- dentry->d_count--;
+ dentry->d_refcount--;
/*
* inform the fs via d_prune that this dentry is about to be
* unhashed and destroyed.
@@ -513,12 +513,12 @@ void dput(struct dentry *dentry)
return;

repeat:
- if (dentry->d_count == 1)
+ if (dentry->d_refcount == 1)
might_sleep();
spin_lock(&dentry->d_lock);
- BUG_ON(!dentry->d_count);
- if (dentry->d_count > 1) {
- dentry->d_count--;
+ BUG_ON(!dentry->d_refcount);
+ if (dentry->d_refcount > 1) {
+ dentry->d_refcount--;
spin_unlock(&dentry->d_lock);
return;
}
@@ -535,7 +535,7 @@ repeat:
dentry->d_flags |= DCACHE_REFERENCED;
dentry_lru_add(dentry);

- dentry->d_count--;
+ dentry->d_refcount--;
spin_unlock(&dentry->d_lock);
return;

@@ -590,7 +590,7 @@ int d_invalidate(struct dentry * dentry)
* We also need to leave mountpoints alone,
* directory or not.
*/
- if (dentry->d_count > 1 && dentry->d_inode) {
+ if (dentry->d_refcount > 1 && dentry->d_inode) {
if (S_ISDIR(dentry->d_inode->i_mode) || d_mountpoint(dentry)) {
spin_unlock(&dentry->d_lock);
return -EBUSY;
@@ -606,7 +606,7 @@ EXPORT_SYMBOL(d_invalidate);
/* This must be called with d_lock held */
static inline void __dget_dlock(struct dentry *dentry)
{
- dentry->d_count++;
+ dentry->d_refcount++;
}

static inline void __dget(struct dentry *dentry)
@@ -634,8 +634,8 @@ repeat:
goto repeat;
}
rcu_read_unlock();
- BUG_ON(!ret->d_count);
- ret->d_count++;
+ BUG_ON(!ret->d_refcount);
+ ret->d_refcount++;
spin_unlock(&ret->d_lock);
return ret;
}
@@ -718,7 +718,7 @@ restart:
spin_lock(&inode->i_lock);
hlist_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
- if (!dentry->d_count) {
+ if (!dentry->d_refcount) {
__dget_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
@@ -734,7 +734,7 @@ EXPORT_SYMBOL(d_prune_aliases);

/*
* Try to throw away a dentry - free the inode, dput the parent.
- * Requires dentry->d_lock is held, and dentry->d_count == 0.
+ * Requires dentry->d_lock is held, and dentry->d_refcount == 0.
* Releases dentry->d_lock.
*
* This may fail if locks cannot be acquired no problem, just try again.
@@ -764,8 +764,8 @@ static void try_prune_one_dentry(struct dentry *dentry)
dentry = parent;
while (dentry) {
spin_lock(&dentry->d_lock);
- if (dentry->d_count > 1) {
- dentry->d_count--;
+ if (dentry->d_refcount > 1) {
+ dentry->d_refcount--;
spin_unlock(&dentry->d_lock);
return;
}
@@ -793,7 +793,7 @@ static void shrink_dentry_list(struct list_head *list)
* the LRU because of laziness during lookup. Do not free
* it - just keep it off the LRU list.
*/
- if (dentry->d_count) {
+ if (dentry->d_refcount) {
dentry_lru_del(dentry);
spin_unlock(&dentry->d_lock);
continue;
@@ -913,7 +913,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
dentry_lru_del(dentry);
__d_shrink(dentry);

- if (dentry->d_count != 0) {
+ if (dentry->d_refcount != 0) {
printk(KERN_ERR
"BUG: Dentry %p{i=%lx,n=%s}"
" still in use (%d)"
@@ -922,7 +922,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
dentry->d_inode ?
dentry->d_inode->i_ino : 0UL,
dentry->d_name.name,
- dentry->d_count,
+ dentry->d_refcount,
dentry->d_sb->s_type->name,
dentry->d_sb->s_id);
BUG();
@@ -933,7 +933,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
list_del(&dentry->d_u.d_child);
} else {
parent = dentry->d_parent;
- parent->d_count--;
+ parent->d_refcount--;
list_del(&dentry->d_u.d_child);
}

@@ -981,7 +981,7 @@ void shrink_dcache_for_umount(struct super_block *sb)

dentry = sb->s_root;
sb->s_root = NULL;
- dentry->d_count--;
+ dentry->d_refcount--;
shrink_dcache_for_umount_subtree(dentry);

while (!hlist_bl_empty(&sb->s_anon)) {
@@ -1147,7 +1147,7 @@ resume:
* loop in shrink_dcache_parent() might not make any progress
* and loop forever.
*/
- if (dentry->d_count) {
+ if (dentry->d_refcount) {
dentry_lru_del(dentry);
} else if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
dentry_lru_move_list(dentry, dispose);
@@ -1269,7 +1269,7 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
smp_wmb();
dentry->d_name.name = dname;

- dentry->d_count = 1;
+ dentry->d_refcount = 1;
dentry->d_flags = 0;
spin_lock_init(&dentry->d_lock);
seqcount_init(&dentry->d_seq);
@@ -1970,7 +1970,7 @@ struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name)
goto next;
}

- dentry->d_count++;
+ dentry->d_refcount++;
found = dentry;
spin_unlock(&dentry->d_lock);
break;
@@ -2069,7 +2069,7 @@ again:
spin_lock(&dentry->d_lock);
inode = dentry->d_inode;
isdir = S_ISDIR(inode->i_mode);
- if (dentry->d_count == 1) {
+ if (dentry->d_refcount == 1) {
if (!spin_trylock(&inode->i_lock)) {
spin_unlock(&dentry->d_lock);
cpu_relax();
@@ -2937,7 +2937,7 @@ resume:
}
if (!(dentry->d_flags & DCACHE_GENOCIDE)) {
dentry->d_flags |= DCACHE_GENOCIDE;
- dentry->d_count--;
+ dentry->d_refcount--;
}
spin_unlock(&dentry->d_lock);
}
@@ -2945,7 +2945,7 @@ resume:
struct dentry *child = this_parent;
if (!(this_parent->d_flags & DCACHE_GENOCIDE)) {
this_parent->d_flags |= DCACHE_GENOCIDE;
- this_parent->d_count--;
+ this_parent->d_refcount--;
}
this_parent = try_to_ascend(this_parent, locked, seq);
if (!this_parent)
diff --git a/fs/namei.c b/fs/namei.c
index b2beee7..1c96aa6 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -536,8 +536,8 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry)
* a reference at this point.
*/
BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
- BUG_ON(!parent->d_count);
- parent->d_count++;
+ BUG_ON(!parent->d_refcount);
+ parent->d_refcount++;
spin_unlock(&dentry->d_lock);
}
spin_unlock(&parent->d_lock);
@@ -3327,7 +3327,7 @@ void dentry_unhash(struct dentry *dentry)
{
shrink_dcache_parent(dentry);
spin_lock(&dentry->d_lock);
- if (dentry->d_count == 1)
+ if (dentry->d_refcount == 1)
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 7c6bbf0..6b9b7f4 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -112,7 +112,7 @@ struct dentry {
unsigned char d_iname[DNAME_INLINE_LEN]; /* small names */

/* Ref lookup also touches following */
- unsigned int d_count; /* protected by d_lock */
+ unsigned int d_refcount; /* protected by d_lock */
spinlock_t d_lock; /* per dentry lock */
const struct dentry_operations *d_op;
struct super_block *d_sb; /* The root of the dentry tree */
@@ -264,7 +264,7 @@ extern void d_rehash(struct dentry *);
*/
static inline unsigned int d_count(struct dentry *entry)
{
- return entry->d_count;
+ return entry->d_refcount;
}

/**
@@ -328,7 +328,7 @@ static inline int __d_rcu_to_refcount(struct dentry *dentry, unsigned seq)
assert_spin_locked(&dentry->d_lock);
if (!read_seqcount_retry(&dentry->d_seq, seq)) {
ret = 1;
- dentry->d_count++;
+ dentry->d_refcount++;
}

return ret;
@@ -361,7 +361,7 @@ extern char *dentry_path(struct dentry *, char *, int);
static inline struct dentry *dget_dlock(struct dentry *dentry)
{
if (dentry)
- dentry->d_count++;
+ dentry->d_refcount++;
return dentry;
}

--
1.7.1

2013-07-09 01:19:24

by Waiman Long

Subject: [PATCH v6 09/14] file locking: replace direct access of d_count with the d_count() helper

All readonly references to d_count outside of the core dcache code
should be changed to use the new d_count() helper as they shouldn't
access its value directly. There is no change in logic and everything
should just work.

Signed-off-by: Waiman Long <[email protected]>
---
fs/locks.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 04e2c1f..c98e1a1 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1454,7 +1454,7 @@ static int generic_add_lease(struct file *filp, long arg, struct file_lock **flp
if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
goto out;
if ((arg == F_WRLCK)
- && ((dentry->d_count > 1)
+ && ((d_count(dentry) > 1)
|| (atomic_read(&inode->i_count) > 1)))
goto out;

--
1.7.1

2013-07-09 01:17:47

by Waiman Long

Subject: [PATCH v6 02/14] spinlock: Enable x86 architecture to do lockless refcount update

This patch enables the x86 architecture to do lockless reference
count update using the generic lockref implementation with default
parameters. Only the x86/Kconfig file needs to be changed.

Signed-off-by: Waiman Long <[email protected]>
---
arch/x86/Kconfig | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 265c672..6a86061 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -261,6 +261,9 @@ config ARCH_CPU_PROBE_RELEASE
config ARCH_SUPPORTS_UPROBES
def_bool y

+config GENERIC_SPINLOCK_REFCOUNT
+ def_bool y
+
source "init/Kconfig"
source "kernel/Kconfig.freezer"

--
1.7.1

2013-07-09 01:19:47

by Waiman Long

Subject: [PATCH v6 07/14] config-fs: replace direct access of d_count with the d_count() helper

All readonly references to d_count outside of the core dcache code
should be changed to use the new d_count() helper as they shouldn't
access its value directly. There is no change in logic and everything
should just work.

Signed-off-by: Waiman Long <[email protected]>
---
fs/configfs/dir.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c
index 64e5323..1d75ec5 100644
--- a/fs/configfs/dir.c
+++ b/fs/configfs/dir.c
@@ -387,7 +387,7 @@ static void remove_dir(struct dentry * d)
if (d->d_inode)
simple_rmdir(parent->d_inode,d);

- pr_debug(" o %s removing done (%d)\n",d->d_name.name, d->d_count);
+ pr_debug(" o %s removing done (%d)\n", d->d_name.name, d_count(d));

dput(parent);
}
--
1.7.1

2013-07-09 01:19:45

by Waiman Long

Subject: [PATCH v6 08/14] ecrypt-fs: replace direct access of d_count with the d_count() helper

All readonly references to d_count outside of the core dcache code
should be changed to use the new d_count() helper as they shouldn't
access its value directly. There is no change in logic and everything
should just work.

Signed-off-by: Waiman Long <[email protected]>
---
fs/ecryptfs/inode.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index a2f2bb2..67e9b63 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -358,7 +358,7 @@ static int ecryptfs_lookup_interpose(struct dentry *dentry,

lower_mnt = mntget(ecryptfs_dentry_to_lower_mnt(dentry->d_parent));
fsstack_copy_attr_atime(dir_inode, lower_dentry->d_parent->d_inode);
- BUG_ON(!lower_dentry->d_count);
+ BUG_ON(!d_count(lower_dentry));

ecryptfs_set_dentry_private(dentry, dentry_info);
ecryptfs_set_dentry_lower(dentry, lower_dentry);
--
1.7.1

2013-07-09 01:20:24

by Waiman Long

Subject: [PATCH v6 06/14] coda-fs: replace direct access of d_count with the d_count() helper

All readonly references to d_count outside of the core dcache code
should be changed to use the new d_count() helper as they shouldn't
access its value directly. There is no change in logic and everything
should just work.

Signed-off-by: Waiman Long <[email protected]>
---
fs/coda/dir.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/coda/dir.c b/fs/coda/dir.c
index 14a1480..190effc 100644
--- a/fs/coda/dir.c
+++ b/fs/coda/dir.c
@@ -526,7 +526,7 @@ static int coda_dentry_revalidate(struct dentry *de, unsigned int flags)
if (cii->c_flags & C_FLUSH)
coda_flag_inode_children(inode, C_FLUSH);

- if (de->d_count > 1)
+ if (d_count(de) > 1)
/* pretend it's valid, but don't change the flags */
goto out;

--
1.7.1

2013-07-09 01:20:42

by Waiman Long

Subject: [PATCH v6 05/14] ceph-fs: replace direct access of d_count with the d_count() helper

All readonly references to d_count outside of the core dcache code
should be changed to use the new d_count() helper as they shouldn't
access its value directly. There is no change in logic and everything
should just work.

Signed-off-by: Waiman Long <[email protected]>
---
fs/ceph/inode.c | 4 ++--
fs/ceph/mds_client.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index be0f7e2..bd2289a 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -903,8 +903,8 @@ static struct dentry *splice_dentry(struct dentry *dn, struct inode *in,
} else if (realdn) {
dout("dn %p (%d) spliced with %p (%d) "
"inode %p ino %llx.%llx\n",
- dn, dn->d_count,
- realdn, realdn->d_count,
+ dn, d_count(dn),
+ realdn, d_count(realdn),
realdn->d_inode, ceph_vinop(realdn->d_inode));
dput(dn);
dn = realdn;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 74fd289..99890b0 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1553,7 +1553,7 @@ retry:
*base = ceph_ino(temp->d_inode);
*plen = len;
dout("build_path on %p %d built %llx '%.*s'\n",
- dentry, dentry->d_count, *base, len, path);
+ dentry, d_count(dentry), *base, len, path);
return path;
}

--
1.7.1

2013-07-09 01:21:08

by Waiman Long

Subject: [PATCH v6 04/14] auto-fs: replace direct access of d_count with the d_count() helper

All readonly references to d_count outside of the core dcache code
should be changed to use the new d_count() helper as they shouldn't
access its value directly. There is no change in logic and everything
should just work.

Signed-off-by: Waiman Long <[email protected]>
---
fs/autofs4/expire.c | 8 ++++----
fs/autofs4/root.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 13ddec9..aac0006 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -109,7 +109,7 @@ cont:

spin_lock_nested(&q->d_lock, DENTRY_D_LOCK_NESTED);
/* Already gone or negative dentry (under construction) - try next */
- if (q->d_count == 0 || !simple_positive(q)) {
+ if (d_count(q) == 0 || !simple_positive(q)) {
spin_unlock(&q->d_lock);
next = q->d_u.d_child.next;
goto cont;
@@ -267,7 +267,7 @@ static int autofs4_tree_busy(struct vfsmount *mnt,
else
ino_count++;

- if (p->d_count > ino_count) {
+ if (d_count(p) > ino_count) {
top_ino->last_used = jiffies;
dput(p);
return 1;
@@ -409,7 +409,7 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
if (!exp_leaves) {
/* Path walk currently on this dentry? */
ino_count = atomic_read(&ino->count) + 1;
- if (dentry->d_count > ino_count)
+ if (d_count(dentry) > ino_count)
goto next;

if (!autofs4_tree_busy(mnt, dentry, timeout, do_now)) {
@@ -423,7 +423,7 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
} else {
/* Path walk currently on this dentry? */
ino_count = atomic_read(&ino->count) + 1;
- if (dentry->d_count > ino_count)
+ if (d_count(dentry) > ino_count)
goto next;

expired = autofs4_check_leaves(mnt, dentry, timeout, do_now);
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index ca8e555..78f9b0a 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -179,7 +179,7 @@ static struct dentry *autofs4_lookup_active(struct dentry *dentry)
spin_lock(&active->d_lock);

/* Already gone? */
- if (active->d_count == 0)
+ if (d_count(active) == 0)
goto next;

qstr = &active->d_name;
--
1.7.1

2013-07-09 01:21:31

by Waiman Long

[permalink] [raw]
Subject: [PATCH v6 03/14] dcache: Add a new helper function d_count() to return refcount

This patch adds a new helper function d_count() in dcache.h for
returning the current reference count of the dentry object. It
should be used by all the files outside of the core dcache.c and
namei.c files.

Signed-off-by: Waiman Long <[email protected]>
---
include/linux/dcache.h | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index f42dbe1..7c6bbf0 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -258,6 +258,16 @@ extern int have_submounts(struct dentry *);
extern void d_rehash(struct dentry *);

/**
+ * d_count - return the reference count in dentry
+ * @entry: dentry pointer
+ * Returns: current value of reference count
+ */
+static inline unsigned int d_count(struct dentry *entry)
+{
+ return entry->d_count;
+}
+
+/**
* d_add - add dentry to hash queues
* @entry: dentry to add
* @inode: The inode to attach to this dentry
--
1.7.1
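
To make the conversion rule concrete, here is a minimal before/after
sketch; the function name is hypothetical, and only struct dentry and the
new d_count() helper come from the patches themselves:

/* Hypothetical example, not taken from any patch in this series. */
#include <linux/dcache.h>

static int example_dentry_is_busy(struct dentry *dentry)
{
	/* Before: direct field access, now reserved for dcache.c/namei.c */
	/*	return dentry->d_count > 1;				   */

	/* After: read-only access through the helper everywhere else */
	return d_count(dentry) > 1;
}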

2013-07-10 09:49:37

by Peng, Tao

[permalink] [raw]
Subject: RE: [PATCH v6 12/14] lustre-fs: Use the standard d_count() helper to access refcount

> -----Original Message-----
> From: Waiman Long [mailto:[email protected]]
> Sent: Tuesday, July 09, 2013 9:10 AM
> To: Alexander Viro; Jeff Layton; Miklos Szeredi; Ingo Molnar; Thomas Gleixner
> Cc: Waiman Long; [email protected]; Greg Kroah-Hartman; Andreas Dilger; Peng, Tao; Oleg Drokin;
> Fan Yong; Ned Bass; [email protected]; [email protected]; Peter Zijlstra; Steven
> Rostedt; Linus Torvalds; Benjamin Herrenschmidt; Andi Kleen; Chandramouleeswaran, Aswin; Norton, Scott J
> Subject: [PATCH v6 12/14] lustre-fs: Use the standard d_count() helper to access refcount
>
> The Lustre FS should use the newly defined d_count() helper function
> to access the dentry's reference count instead of defining its own
> d_refcount() macro for the same purpose. Since the current lustre
> code is marked as broken, no build test was attempted for this change.
>
This was already fixed by Al. See commit 193deee199c55ce06bca2b3e5e2d3c10208a942a in Linus' tree.

Thanks,
Tao

> Signed-off-by: Waiman Long <[email protected]>
> ---
> .../lustre/include/linux/lustre_patchless_compat.h | 2 --
> drivers/staging/lustre/lustre/include/linux/lvfs.h | 2 +-
> drivers/staging/lustre/lustre/llite/dcache.c | 8 ++++----
> .../staging/lustre/lustre/llite/llite_internal.h | 4 ++--
> drivers/staging/lustre/lustre/llite/llite_lib.c | 2 +-
> drivers/staging/lustre/lustre/llite/namei.c | 4 ++--
> drivers/staging/lustre/lustre/lvfs/lvfs_linux.c | 4 ++--
> 7 files changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
> b/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
> index f050808..a8e9c0c 100644
> --- a/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
> +++ b/drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h
> @@ -60,8 +60,6 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
> ll_delete_from_page_cache(page);
> }
>
> -# define d_refcount(d) ((d)->d_count)
> -
> #ifdef ATTR_OPEN
> # define ATTR_FROM_OPEN ATTR_OPEN
> #else
> diff --git a/drivers/staging/lustre/lustre/include/linux/lvfs.h
> b/drivers/staging/lustre/lustre/include/linux/lvfs.h
> index b4db6cb..eb59ac7 100644
> --- a/drivers/staging/lustre/lustre/include/linux/lvfs.h
> +++ b/drivers/staging/lustre/lustre/include/linux/lvfs.h
> @@ -99,7 +99,7 @@ static inline void l_dput(struct dentry *de)
> if (!de || IS_ERR(de))
> return;
> //shrink_dcache_parent(de);
> - LASSERT(d_refcount(de) > 0);
> + LASSERT(d_count(de) > 0);
> dput(de);
> }
>
> diff --git a/drivers/staging/lustre/lustre/llite/dcache.c b/drivers/staging/lustre/lustre/llite/dcache.c
> index 7d6abff..ff0d085 100644
> --- a/drivers/staging/lustre/lustre/llite/dcache.c
> +++ b/drivers/staging/lustre/lustre/llite/dcache.c
> @@ -98,7 +98,7 @@ int ll_dcompare(const struct dentry *parent, const struct inode *pinode,
>
> CDEBUG(D_DENTRY, "found name %.*s(%p) flags %#x refc %d\n",
> name->len, name->name, dentry, dentry->d_flags,
> - d_refcount(dentry));
> + d_count(dentry));
>
> /* mountpoint is always valid */
> if (d_mountpoint((struct dentry *)dentry))
> @@ -165,7 +165,7 @@ static int ll_ddelete(const struct dentry *de)
> list_empty(&de->d_subdirs) ? "" : "subdirs");
>
> /* kernel >= 2.6.38 last refcount is decreased after this function. */
> - LASSERT(d_refcount(de) == 1);
> + LASSERT(d_count(de) == 1);
>
> /* Disable this piece of code temproarily because this is called
> * inside dcache_lock so it's not appropriate to do lots of work
> @@ -190,7 +190,7 @@ static int ll_set_dd(struct dentry *de)
>
> CDEBUG(D_DENTRY, "ldd on dentry %.*s (%p) parent %p inode %p refc %d\n",
> de->d_name.len, de->d_name.name, de, de->d_parent, de->d_inode,
> - d_refcount(de));
> + d_count(de));
>
> if (de->d_fsdata == NULL) {
> struct ll_dentry_data *lld;
> @@ -540,7 +540,7 @@ out:
> CDEBUG(D_DENTRY, "revalidated dentry %.*s (%p) parent %p "
> "inode %p refc %d\n", de->d_name.len,
> de->d_name.name, de, de->d_parent, de->d_inode,
> - d_refcount(de));
> + d_count(de));
>
> ll_set_lock_data(exp, de->d_inode, it, &bits);
>
> diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h
> b/drivers/staging/lustre/lustre/llite/llite_internal.h
> index 992cd20..5227c5c 100644
> --- a/drivers/staging/lustre/lustre/llite/llite_internal.h
> +++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
> @@ -1529,12 +1529,12 @@ static inline void d_lustre_invalidate(struct dentry *dentry, int nested)
> {
> CDEBUG(D_DENTRY, "invalidate dentry %.*s (%p) parent %p inode %p "
> "refc %d\n", dentry->d_name.len, dentry->d_name.name, dentry,
> - dentry->d_parent, dentry->d_inode, d_refcount(dentry));
> + dentry->d_parent, dentry->d_inode, d_count(dentry));
>
> spin_lock_nested(&dentry->d_lock,
> nested ? DENTRY_D_LOCK_NESTED : DENTRY_D_LOCK_NORMAL);
> __d_lustre_invalidate(dentry);
> - if (d_refcount(dentry) == 0)
> + if (d_count(dentry) == 0)
> __d_drop(dentry);
> spin_unlock(&dentry->d_lock);
> }
> diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c
> b/drivers/staging/lustre/lustre/llite/llite_lib.c
> index 2311b20..afae801 100644
> --- a/drivers/staging/lustre/lustre/llite/llite_lib.c
> +++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
> @@ -659,7 +659,7 @@ void lustre_dump_dentry(struct dentry *dentry, int recur)
> " flags=0x%x, fsdata=%p, %d subdirs\n", dentry,
> dentry->d_name.len, dentry->d_name.name,
> dentry->d_parent->d_name.len, dentry->d_parent->d_name.name,
> - dentry->d_parent, dentry->d_inode, d_refcount(dentry),
> + dentry->d_parent, dentry->d_inode, d_count(dentry),
> dentry->d_flags, dentry->d_fsdata, subdirs);
> if (dentry->d_inode != NULL)
> ll_dump_inode(dentry->d_inode);
> diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
> index 58d59aa..ff8f63d 100644
> --- a/drivers/staging/lustre/lustre/llite/namei.c
> +++ b/drivers/staging/lustre/lustre/llite/namei.c
> @@ -409,7 +409,7 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
> iput(inode);
> CDEBUG(D_DENTRY,
> "Reuse dentry %p inode %p refc %d flags %#x\n",
> - new, new->d_inode, d_refcount(new), new->d_flags);
> + new, new->d_inode, d_count(new), new->d_flags);
> return new;
> }
> }
> @@ -417,7 +417,7 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
> __d_lustre_invalidate(de);
> d_add(de, inode);
> CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
> - de, de->d_inode, d_refcount(de), de->d_flags);
> + de, de->d_inode, d_count(de), de->d_flags);
> return de;
> }
>
> diff --git a/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
> b/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
> index 1e6f32c..e70d8fe 100644
> --- a/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
> +++ b/drivers/staging/lustre/lustre/lvfs/lvfs_linux.c
> @@ -121,8 +121,8 @@ void push_ctxt(struct lvfs_run_ctxt *save, struct lvfs_run_ctxt *new_ctx,
> OBD_SET_CTXT_MAGIC(save);
>
> save->fs = get_fs();
> - LASSERT(d_refcount(cfs_fs_pwd(current->fs)));
> - LASSERT(d_refcount(new_ctx->pwd));
> + LASSERT(d_count(cfs_fs_pwd(current->fs)));
> + LASSERT(d_count(new_ctx->pwd));
> save->pwd = dget(cfs_fs_pwd(current->fs));
> save->pwdmnt = mntget(cfs_fs_mnt(current->fs));
> save->luc.luc_umask = current_umask();
> --
> 1.7.1
>


2013-07-11 13:49:06

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v6 03/14] dcache: Add a new helper function d_count() to return refcount

On 07/08/2013 09:09 PM, Waiman Long wrote:
> This patch adds a new helper function d_count() in dcache.h for
> returning the current reference count of the dentry object. It
> should be used by all the files outside of the core dcache.c and
> namei.c files.

I would like to know people's thoughts on spinning out patches 3-11 of
this patch series, as making the d_count() helper the standard way of
accessing the reference count in dentry outside of dcache.c and namei.c
should be non-controversial. Merging them first will make revising the
rest of this patch series easier and involve fewer people.

Regards,
Longman

by Masami Hiramatsu

Subject: Re: [PATCH v6 01/14] spinlock: A new lockref structure for lockless update of refcount

Hi,

(2013/07/09 10:09), Waiman Long wrote:
> +/**
> + * lockref_put_or_lock - decrements count unless count <= 1 before decrement
> + * @lockcnt: pointer to lockref structure
> + * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
> + *
> + * The only difference between lockref_put_or_lock and lockref_put is that
> + * the former function will hold the lock on return while the latter one
> + * will free it on return.
> + */
> +static __always_inline int lockref_put_or_locked(struct lockref *lockcnt)

There is a function name typo here: _locked should be _lock.
Also, I think a note should be added here to say that this function does *not*
guarantee lockcnt->refcnt == 0 or 1 until unlocked if it returns 0.

> +{
> + spin_lock(&lockcnt->lock);
> + if (likely(lockcnt->refcnt > 1)) {
> + lockcnt->refcnt--;
> + spin_unlock(&lockcnt->lock);
> + return 1;
> + }
> + return 0;
> +}

This implementation guarantees that lockcnt->refcnt is 0 or 1 until unlocked
if it returns 0.

However, the one below does not appear to guarantee that. Since the
lockref_add_unless() and the spin_lock() are not done atomically, there is
a window in which someone can increment the count right before the lock
is taken.

Or have I missed something?

> +/**
> + * lockref_put_or_lock - Decrements count unless the count is <= 1
> + * otherwise, the lock will be taken
> + * @lockcnt: pointer to struct lockref structure
> + * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
> + */
> +int
> +lockref_put_or_lock(struct lockref *lockcnt)
> +{
> + if (lockref_add_unless(lockcnt, -1, 1))
> + return 1;
> + spin_lock(&lockcnt->lock);
> + return 0;
> +}

BTW, it looks like your dcache patch is aware of this and keeps a double
check for the lockcnt->refcnt > 1 case in dput().

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-07-15 21:00:38

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v6 01/14] spinlock: A new lockref structure for lockless update of refcount

On 07/13/2013 12:58 PM, Masami Hiramatsu wrote:
> Hi,
>
> (2013/07/09 10:09), Waiman Long wrote:
>> +/**
>> + * lockref_put_or_lock - decrements count unless count <= 1 before decrement
>> + * @lockcnt: pointer to lockref structure
>> + * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
>> + *
>> + * The only difference between lockref_put_or_lock and lockref_put is that
>> + * the former function will hold the lock on return while the latter one
>> + * will free it on return.
>> + */
>> +static __always_inline int lockref_put_or_locked(struct lockref *lockcnt)
> There is a function name typo here: _locked should be _lock.
> Also, I think a note should be added here to say that this function does *not*
> guarantee lockcnt->refcnt == 0 or 1 until unlocked if it returns 0.

Thanks for pointing this out. I will fix the typo and add an additional note
to the comments.

>> +{
>> + spin_lock(&lockcnt->lock);
>> + if (likely(lockcnt->refcnt > 1)) {
>> + lockcnt->refcnt--;
>> + spin_unlock(&lockcnt->lock);
>> + return 1;
>> + }
>> + return 0;
>> +}
> This implementation guarantees that lockcnt->refcnt is 0 or 1 until unlocked
> if it returns 0.
>
> However, the one below does not appear to guarantee that. Since the
> lockref_add_unless() and the spin_lock() are not done atomically, there is
> a window in which someone can increment the count right before the lock
> is taken.
>
> Or have I missed something?

For both functions, the reference count won't be decremented to 0; the
caller has to handle that case by taking the lock and doing whatever is
needed. When refcnt > 1, the decrement is done atomically, either by
cmpxchg or with the spinlock held. The reason for these two functions is
to save an extra lock/unlock sequence when this feature is disabled.
I will add comments to clarify that.
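
For readers following the discussion, here is a minimal sketch of the kind
of cmpxchg loop being described, assuming a plain refcnt field in the
lockref structure; the real lockref_add_unless() in this series does more
(it also has to respect the spinlock and the retry/wait parameters), and
the function name below is made up, so treat this as illustrative only:

/*
 * Illustrative sketch only -- not the code from the lockref patch.
 * Atomically add @add to the count unless the count is <= @threshold,
 * in which case fail so that the caller falls back to taking the lock.
 */
static int add_unless_sketch(struct lockref *lockcnt, int add,
			     unsigned int threshold)
{
	unsigned int old = ACCESS_ONCE(lockcnt->refcnt);

	while (old > threshold) {
		unsigned int prev = cmpxchg(&lockcnt->refcnt, old, old + add);

		if (prev == old)
			return 1;	/* count updated without the lock */
		old = prev;		/* lost a race, retry with new value */
	}
	return 0;			/* caller must take the spinlock */
}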

>> +/**
>> + * lockref_put_or_lock - Decrements count unless the count is <= 1
>> + * otherwise, the lock will be taken
>> + * @lockcnt: pointer to struct lockref structure
>> + * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
>> + */
>> +int
>> +lockref_put_or_lock(struct lockref *lockcnt)
>> +{
>> + if (lockref_add_unless(lockcnt, -1, 1))
>> + return 1;
>> + spin_lock(&lockcnt->lock);
>> + return 0;
>> +}
> BTW, it looks like your dcache patch is aware of this and keeps a double
> check for the lockcnt->refcnt > 1 case in dput().

There is a slight chance that the refcnt may be changed before the
locked section is entered, so it is prudent to double check before
decrementing it to zero.
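
As an illustration of that caller-side pattern (a sketch only, not the
actual dcache change in this series; the helper name put_ref_sketch is
made up): when the lockless decrement fails, the lock is held on return
and the count has to be rechecked before it is treated as the last
reference:

/*
 * Illustrative only -- the real logic lives in dput() after the dentry
 * conversion. lockref_put_or_lock() returns 1 if it decremented the
 * count locklessly; otherwise it returns 0 with lockcnt->lock held.
 */
static void put_ref_sketch(struct lockref *lockcnt)
{
	if (lockref_put_or_lock(lockcnt))
		return;			/* fast path: count was > 1 */

	/* Lock is held; the count may have been raised since the failed try. */
	if (lockcnt->refcnt > 1) {
		lockcnt->refcnt--;	/* not the last reference after all */
		spin_unlock(&lockcnt->lock);
		return;
	}

	/* This really is the last reference: tear it down under the lock. */
	lockcnt->refcnt = 0;
	spin_unlock(&lockcnt->lock);
	/* ... free the object protected by the count here ... */
}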

Regards,
Longman