2013-08-06 03:13:08

by Waiman Long

Subject: [PATCH v7 0/4] Lockless update of reference count protected by spinlock

v6->v7:
- Substantially reduce the number of patches from 14 to 4 because a
lot of the minor filesystem related changes had been merged to
v3.11-rc1.
- Remove architecture specific customization (LOCKREF_WAIT_SHIFT &
LOCKREF_RETRY_COUNT).
- Tune single-thread performance of lockref_put/get to within 10%
of old lock->update->unlock code.

v5->v6:
- Add a new GENERIC_SPINLOCK_REFCOUNT config parameter for using the
generic implementation.
- Add two parameters LOCKREF_WAIT_SHIFT and LOCKREF_RETRY_COUNT which
can be specified differently for each architecture.
- Update various spinlock_refcount.* files to incorporate review
comments.
- Replace references to the d_refcount() macro in the Lustre filesystem
code in the staging tree with the new d_count() helper function.

v4->v5:
- Add a d_count() helper for readonly access of reference count and
change all references to d_count outside of dcache.c, dcache.h
and namei.c to use d_count().

v3->v4:
- Replace helper function access to d_lock and d_count by using
macros to redefine the old d_lock name to the spinlock and new
d_refcount name to the reference count. This greatly reduces the
size of this patchset from 25 to 12 and makes it easier to review.

v2->v3:
- Completely revamp the packaging by adding a new lockref data
structure that combines the spinlock with the reference
count. Helper functions are also added to manipulate the new data
structure. That results in modifying over 50 files, but the changes
were trivial in most of them.
- Change initial spinlock wait to use a timeout.
- Force 64-bit alignment of the spinlock & reference count structure.
- Add a new way to use the combo by using a new union and helper
functions.

v1->v2:
- Add one more layer of indirection to LOCK_WITH_REFCOUNT macro.
- Add __LINUX_SPINLOCK_REFCOUNT_H protection to spinlock_refcount.h.
- Add some generic get/put macros into spinlock_refcount.h.

This patchset supports a generic mechanism to atomically update
a reference count that is protected by a spinlock without actually
acquiring the lock itself. If the update doesn't succeed, the caller
will have to acquire the lock and update the reference count in the
old way. This will help in situations where there is a lot of
spinlock contention because of frequent reference count updates.
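
For illustration only, the intended caller-side pattern looks roughly
like this (a sketch, not code from the patches themselves; my_obj,
my_put and my_free are made-up names, while lockref_put_or_lock() and
the lock/refcnt fields are the helpers and layout introduced in patch 1):

#include <linux/slab.h>
#include <linux/spinlock_refcount.h>

struct my_obj {
	struct lockref ref;	/* combined spinlock + reference count */
	/* ... payload ... */
};

static void my_free(struct my_obj *obj)
{
	kfree(obj);
}

static void my_put(struct my_obj *obj)
{
	/* Fast path: lockless decrement as long as the count stays > 1 */
	if (lockref_put_or_lock(&obj->ref))
		return;

	/* Slow path: count was <= 1 and the spinlock is now held */
	if (--obj->ref.refcnt == 0) {
		/* assumes the object is no longer reachable at this point */
		spin_unlock(&obj->ref.lock);
		my_free(obj);
		return;
	}
	spin_unlock(&obj->ref.lock);
}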

The d_lock and d_count fields of the struct dentry in dcache.h were
modified to use the new lockref data structure, and the d_lock name
is now a macro that maps to the actual spinlock.

This patch set yields a significant performance improvement in the
short workload of the AIM7 benchmark on an 8-socket x86-64 machine
with 80 cores.

Thanks to Thomas Gleixner, Andi Kleen and Linus for their valuable
input in shaping this patchset.

Signed-off-by: Waiman Long <[email protected]>

Waiman Long (4):
spinlock: A new lockref structure for lockless update of refcount
spinlock: Enable x86 architecture to do lockless refcount update
dcache: replace d_lock/d_count by d_lockcnt
dcache: Enable lockless update of dentry's refcount

arch/x86/Kconfig | 3 +
fs/dcache.c | 78 +++++++------
fs/namei.c | 6 +-
include/asm-generic/spinlock_refcount.h | 46 +++++++
include/linux/dcache.h | 22 ++--
include/linux/spinlock_refcount.h | 126 ++++++++++++++++++++
kernel/Kconfig.locks | 15 +++
lib/Makefile | 2 +
lib/spinlock_refcount.c | 198 +++++++++++++++++++++++++++++++
9 files changed, 449 insertions(+), 47 deletions(-)
create mode 100644 include/asm-generic/spinlock_refcount.h
create mode 100644 include/linux/spinlock_refcount.h
create mode 100644 lib/spinlock_refcount.c


2013-08-06 03:13:10

by Waiman Long

Subject: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

This patch introduces a new set of spinlock_refcount.h header files to
be included by kernel code that wants to do a faster lockless update
of a reference count protected by a spinlock.

The new lockref structure consists of just the spinlock and the
reference count data. Helper functions are defined in the new
<linux/spinlock_refcount.h> header file to access the content of
the new structure. There is a generic structure defined for all
architectures, but each architecture can also optionally define its
own structure and use its own helper functions.

Three new config parameters are introduced:
1. SPINLOCK_REFCOUNT
2. GENERIC_SPINLOCK_REFCOUNT
3. ARCH_SPINLOCK_REFCOUNT

The first one is defined in kernel/Kconfig.locks and is used to enable
or disable the faster lockless reference count update optimization.
The second and third ones have to be defined in each architecture's
Kconfig file to enable the optimization for that architecture. Each
architecture therefore has to opt in to this optimization or it won't
get it. This allows each architecture plenty of time to test it out
before deciding to use it or replace it with a better
architecture-specific solution. An architecture should set only
GENERIC_SPINLOCK_REFCOUNT to use the generic implementation without
customization. By setting only ARCH_SPINLOCK_REFCOUNT, the
architecture has to provide its own implementation.

This optimization won't work on non-SMP systems or when spinlock
debugging is turned on, so it is disabled when either of those is
true. It also won't work with full preempt-RT, so it should be turned
off in that case as well.

To maximize the chance of doing lockless atomic update, the new code
will wait until the lock is free before trying to do the update.
The code will also attempt to do lockless atomic update a few times
before falling back to the old code path of acquiring a lock before
doing the update.

The table below shows the average JPM (jobs/minute) number (out of
3 runs) of the AIM7 short workload at 1500 users for different
configurations on an 8-socket 80-core DL980 with HT off, using a
kernel based on 3.11-rc3.

Configuration JPM
------------- ---
Wait till lock free, 1 update attempt 5899907
Wait till lock free, 2 update attempts 6534958
Wait till lock free, 3 update attempts 6868170
Wait till lock free, 4 update attempts 6905332
No wait, 2 update attempts 1091273
No wait, 4 update attempts 1281867
No wait, 8 update attempts 5095203
No wait, 16 update attempts 6392709
No wait, 32 update attempts 6438080

The "no wait, 8 update attempts" test showed high variability in the
results. One run can have 6M JPM whereas the other one is only 2M
JPM, for example. The "wait till lock free" tests, on the other hand,
are much more stable in their throughput numbers.

For this initial version, the code waits until the lock is free and
then makes up to 4 update attempts.

To evaluate the performance difference between doing a reference count
update the old way (lock->update->unlock) and with the new lockref
functions in the uncontended case, a 256K-iteration loop was run on a
2.4GHz Westmere x86-64 CPU. The following table shows the average time
(in ns) for a single update operation (including the looping and
timing overhead):

Update Type Time (ns)
----------- ---------
lock->update->unlock 14.7
lockref_get/lockref_put 16.0

In the uncontended case, the new lockref* functions are about 10%
slower than the plain lock->update->unlock sequence. Since a reference
count update is usually a very small part of a typical workload, the
actual performance impact of this change is negligible when there is
no contention.
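
The measurement loop was of this general form (a sketch only, not the
exact test code; it assumes the lockref was initialized with refcnt >= 2
so that lockref_put() always succeeds):

#include <linux/ktime.h>
#include <linux/spinlock_refcount.h>

#define NR_LOOPS	(256 * 1024)

/* Returns the average cost of one lockref update, in nanoseconds */
static u64 time_lockref_update(struct lockref *lr)
{
	ktime_t start = ktime_get();
	int i;

	for (i = 0; i < NR_LOOPS; i++) {
		lockref_get(lr);	/* one update ... */
		lockref_put(lr);	/* ... and its inverse */
	}
	return ktime_to_ns(ktime_sub(ktime_get(), start)) / (2 * NR_LOOPS);
}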

Signed-off-by: Waiman Long <[email protected]>
---
include/asm-generic/spinlock_refcount.h | 46 +++++++
include/linux/spinlock_refcount.h | 126 ++++++++++++++++++++
kernel/Kconfig.locks | 15 +++
lib/Makefile | 2 +
lib/spinlock_refcount.c | 198 +++++++++++++++++++++++++++++++
5 files changed, 387 insertions(+), 0 deletions(-)
create mode 100644 include/asm-generic/spinlock_refcount.h
create mode 100644 include/linux/spinlock_refcount.h
create mode 100644 lib/spinlock_refcount.c

diff --git a/include/asm-generic/spinlock_refcount.h b/include/asm-generic/spinlock_refcount.h
new file mode 100644
index 0000000..d3a4119
--- /dev/null
+++ b/include/asm-generic/spinlock_refcount.h
@@ -0,0 +1,46 @@
+/*
+ * Spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (c) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <[email protected]>
+ */
+#ifndef __ASM_GENERIC_SPINLOCK_REFCOUNT_H
+#define __ASM_GENERIC_SPINLOCK_REFCOUNT_H
+
+/*
+ * The lockref structure defines a combined spinlock with reference count
+ * data structure to be embedded in a larger structure. The combined data
+ * structure is always 8-byte aligned. So proper placement of this structure
+ * in the larger embedding data structure is needed to ensure that there is
+ * no hole in it.
+ */
+struct __aligned(sizeof(u64)) lockref {
+ union {
+ u64 lock_count;
+ struct {
+ unsigned int refcnt; /* Reference count */
+ spinlock_t lock;
+ };
+ };
+};
+
+/*
+ * Struct lockref helper functions
+ */
+extern void lockref_get(struct lockref *lockcnt);
+extern int lockref_put(struct lockref *lockcnt);
+extern int lockref_get_not_zero(struct lockref *lockcnt);
+extern int lockref_put_or_lock(struct lockref *lockcnt);
+
+#endif /* __ASM_GENERIC_SPINLOCK_REFCOUNT_H */
diff --git a/include/linux/spinlock_refcount.h b/include/linux/spinlock_refcount.h
new file mode 100644
index 0000000..abadd87
--- /dev/null
+++ b/include/linux/spinlock_refcount.h
@@ -0,0 +1,126 @@
+/*
+ * Spinlock with reference count combo data structure
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (c) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <[email protected]>
+ */
+#ifndef __LINUX_SPINLOCK_REFCOUNT_H
+#define __LINUX_SPINLOCK_REFCOUNT_H
+
+#include <linux/spinlock.h>
+
+/*
+ * To enable lockless update of reference count, an architecture has to define
+ * either one of the following two config parameters in its Kconfig file:
+ * 1. GENERIC_SPINLOCK_REFCOUNT
+ * 2. ARCH_SPINLOCK_REFCOUNT
+ *
+ * By defining just the GENERIC_SPINLOCK_REFCOUNT parameter, the architecture
+ * will use the generic implementation. There is nothing else an architecture
+ * needs to do.
+ *
+ * On the other hand, defining the ARCH_SPINLOCK_REFCOUNT parameter indicates
+ * that the architecture is providing its own implementation. It has to provide
+ * an <asm/spinlock_refcount.h> header file.
+ */
+#ifdef CONFIG_SPINLOCK_REFCOUNT
+
+# ifdef CONFIG_ARCH_SPINLOCK_REFCOUNT
+# include <asm/spinlock_refcount.h>
+# else
+# include <asm-generic/spinlock_refcount.h>
+# endif
+
+#else
+/*
+ * If the spinlock & reference count optimization feature is disabled,
+ * the spinlock and the reference count are accessed separately.
+ */
+struct lockref {
+ unsigned int refcnt; /* Reference count */
+ spinlock_t lock;
+};
+
+/*
+ * Struct lockref helper functions
+ */
+/**
+ * lockref_get - Increments reference count unconditionally
+ * @lockcnt: pointer to lockref structure
+ */
+static __always_inline void lockref_get(struct lockref *lockcnt)
+{
+ spin_lock(&lockcnt->lock);
+ lockcnt->refcnt++;
+ spin_unlock(&lockcnt->lock);
+}
+
+/**
+ * lockref_get_not_zero - Increments count unless the count is 0
+ * @lockcnt: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count is 0
+ */
+static __always_inline int lockref_get_not_zero(struct lockref *lockcnt)
+{
+ int retval = 0;
+
+ spin_lock(&lockcnt->lock);
+ if (likely(lockcnt->refcnt)) {
+ lockcnt->refcnt++;
+ retval = 1;
+ }
+ spin_unlock(&lockcnt->lock);
+ return retval;
+}
+
+/**
+ * lockref_put - Decrements count unless count <= 1 before decrement
+ * @lockcnt: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1
+ */
+static __always_inline int lockref_put(struct lockref *lockcnt)
+{
+ int retval = 0;
+
+ spin_lock(&lockcnt->lock);
+ if (likely(lockcnt->refcnt > 1)) {
+ lockcnt->refcnt--;
+ retval = 1;
+ }
+ spin_unlock(&lockcnt->lock);
+ return retval;
+}
+
+/**
+ * lockref_put_or_lock - decrements count unless count <= 1 before decrement
+ * @lockcnt: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
+ *
+ * The only difference between lockref_put_or_lock and lockref_put is that
+ * the former function will hold the lock on return while the latter one
+ * will free it on return.
+ */
+static __always_inline int lockref_put_or_lock(struct lockref *lockcnt)
+{
+ spin_lock(&lockcnt->lock);
+ if (likely(lockcnt->refcnt > 1)) {
+ lockcnt->refcnt--;
+ spin_unlock(&lockcnt->lock);
+ return 1;
+ }
+ return 0;
+}
+
+#endif /* !CONFIG_SPINLOCK_REFCOUNT */
+#endif /* __LINUX_SPINLOCK_REFCOUNT_H */
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index d2b32ac..67ff90b 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -223,3 +223,18 @@ endif
config MUTEX_SPIN_ON_OWNER
def_bool y
depends on SMP && !DEBUG_MUTEXES
+
+#
+# Spinlock with reference count optimization
+#
+config GENERIC_SPINLOCK_REFCOUNT
+ bool
+
+config ARCH_SPINLOCK_REFCOUNT
+ bool
+
+config SPINLOCK_REFCOUNT
+ def_bool y
+ depends on ARCH_SPINLOCK_REFCOUNT || GENERIC_SPINLOCK_REFCOUNT
+ depends on SMP
+ depends on !GENERIC_LOCKBREAK && !DEBUG_SPINLOCK && !DEBUG_LOCK_ALLOC
diff --git a/lib/Makefile b/lib/Makefile
index 7baccfd..91de559 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -187,3 +187,5 @@ quiet_cmd_build_OID_registry = GEN $@
clean-files += oid_registry_data.c

obj-$(CONFIG_UCS2_STRING) += ucs2_string.o
+
+obj-$(CONFIG_GENERIC_SPINLOCK_REFCOUNT) += spinlock_refcount.o
diff --git a/lib/spinlock_refcount.c b/lib/spinlock_refcount.c
new file mode 100644
index 0000000..963ff07
--- /dev/null
+++ b/lib/spinlock_refcount.c
@@ -0,0 +1,198 @@
+/*
+ * Generic spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2013 Hewlett-Packard Development Company, L.P.
+ *
+ * Authors: Waiman Long <[email protected]>
+ */
+
+#ifdef CONFIG_SPINLOCK_REFCOUNT
+#include <linux/spinlock.h>
+#include <linux/spinlock_refcount.h>
+
+/*
+ * The number of attempts to update the reference count locklessly before
+ * quitting (default = 4).
+ */
+#ifndef LOCKREF_RETRY_COUNT
+#define LOCKREF_RETRY_COUNT 4
+#endif
+
+/**
+ *
+ * add_unless - atomically add to count unless locked or threshold reached
+ *
+ * @lockcnt : pointer to the lockref structure
+ * @value : value to be added
+ * @threshold: threshold value for acquiring the lock
+ * Return : 1 if operation succeeds, -1 if threshold reached, 0 otherwise
+ *
+ * If the lock was not acquired, add_unless() atomically adds the given value
+ * to the reference count unless the given threshold is reached. If the lock
+ * was acquired or the threshold was reached, 0 is returned and the caller
+ * will have to acquire the lock and update the count accordingly (can be
+ * done in a non-atomic way).
+ */
+static __always_inline int
+add_unless(struct lockref *lockcnt, int value, int threshold)
+{
+ struct lockref old;
+ register struct lockref new;
+
+ old.lock_count = ACCESS_ONCE(lockcnt->lock_count);
+ if ((threshold >= 0) && (old.refcnt <= threshold))
+ return -1;
+ if (likely(!spin_is_locked(&old.lock))) {
+ new.lock_count = old.lock_count;
+ new.refcnt += value;
+ if (likely(cmpxchg64(&lockcnt->lock_count, old.lock_count,
+ new.lock_count) == old.lock_count))
+ return 1;
+ }
+ return 0;
+}
+
+/**
+ *
+ * add_unless_loop - call add_unless in a loop
+ *
+ * @lockcnt : pointer to the lockref structure
+ * @value : value to be added
+ * @threshold: threshold value for acquiring the lock
+ * @loopcnt : loop count
+ * Return : 1 if operation succeeds, 0 otherwise
+ */
+static noinline int
+add_unless_loop(struct lockref *lockcnt, int value, int threshold, int loopcnt)
+{
+ int ret;
+
+ if (threshold >= 0) {
+ for (; loopcnt > 0; loopcnt--) {
+ ret = add_unless(lockcnt, value, threshold);
+ if (ret > 0)
+ return 1;
+ else if (ret < 0)
+ return 0;
+ cpu_relax();
+ }
+ } else {
+ for (; loopcnt > 0; loopcnt--) {
+ if (add_unless(lockcnt, value, -1) > 0)
+ return 1;
+ cpu_relax();
+ }
+ }
+ return 0;
+}
+
+/**
+ *
+ * lockref_add_unless - atomically add to count unless locked or threshold reached
+ *
+ * @lockcnt : pointer to the lockref structure
+ * @value : value to be added
+ * @threshold: threshold value for acquiring the lock
+ * Return : 1 if operation succeeds, 0 otherwise
+ *
+ * The reason for separating out the first lockless update attempt from the
+ * rest is that the gcc compiler seems to be less able to optimize complex
+ * operations in a loop. So we try it once; if it doesn't work, we try the
+ * remaining attempts in a separate slowpath function.
+ */
+static __always_inline int
+lockref_add_unless(struct lockref *lockcnt, int value, int threshold)
+{
+ int ret;
+
+ /*
+ * Code doesn't work if raw spinlock is larger than 4 bytes
+ * or is empty.
+ */
+ BUILD_BUG_ON((sizeof(arch_spinlock_t) == 0) ||
+ (sizeof(arch_spinlock_t) > 4));
+
+ /*
+ * Wait until the lock is free before attempting to do a lockless
+ * reference count update.
+ */
+ while (spin_is_locked(&lockcnt->lock))
+ cpu_relax();
+
+ ret = add_unless(lockcnt, value, threshold);
+ if (likely(ret > 0))
+ return 1;
+ if (unlikely((ret == 0) && (LOCKREF_RETRY_COUNT > 1))) {
+ cpu_relax();
+ if (add_unless_loop(lockcnt, value, threshold,
+ LOCKREF_RETRY_COUNT - 1))
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Struct lockref helper functions
+ */
+/**
+ * lockref_get - Increments reference count unconditionally
+ * @lockcnt: pointer to struct lockref structure
+ */
+void lockref_get(struct lockref *lockcnt)
+{
+ if (likely(lockref_add_unless(lockcnt, 1, -1)))
+ return;
+ spin_lock(&lockcnt->lock);
+ lockcnt->refcnt++;
+ spin_unlock(&lockcnt->lock);
+}
+EXPORT_SYMBOL(lockref_get);
+
+/**
+ * lockref_get_not_zero - Increments count unless the count is 0
+ * @lockcnt: pointer to struct lockref structure
+ * Return: 1 if count updated successfully or 0 if count is 0 and lock taken
+ */
+int lockref_get_not_zero(struct lockref *lockcnt)
+{
+ return lockref_add_unless(lockcnt, 1, 0);
+}
+EXPORT_SYMBOL(lockref_get_not_zero);
+
+/**
+ * lockref_put - Decrements count unless the count <= 1
+ * @lockcnt: pointer to struct lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1
+ */
+int lockref_put(struct lockref *lockcnt)
+{
+ return lockref_add_unless(lockcnt, -1, 1);
+}
+EXPORT_SYMBOL(lockref_put);
+
+/**
+ * lockref_put_or_lock - Decrements count unless the count is <= 1
+ * otherwise, the lock will be taken
+ * @lockcnt: pointer to struct lockref structure
+ * Return: 1 if count updated successfully or 0 if count <= 1 and lock taken
+ */
+int
+lockref_put_or_lock(struct lockref *lockcnt)
+{
+ if (likely(lockref_add_unless(lockcnt, -1, 1)))
+ return 1;
+ spin_lock(&lockcnt->lock);
+ return 0;
+}
+EXPORT_SYMBOL(lockref_put_or_lock);
+#endif /* CONFIG_SPINLOCK_REFCOUNT */
--
1.7.1

2013-08-06 03:13:23

by Waiman Long

Subject: [PATCH v7 4/4] dcache: Enable lockless update of dentry's refcount

The current code takes the dentry's d_lock spinlock whenever the refcnt
is being updated. In reality, nothing big really happens until refcnt
goes to 0 in dput(). So it is not necessary to take the lock if the
reference count won't go to 0. On the other hand, there are cases
where refcnt should not be updated, or was not expected to be updated,
while d_lock was acquired by another thread.

This patch changes the code in dput(), dget(), __dget() and
dget_parent() to use lockless reference count update function calls.

This patch has a particularly big impact on the short workload of the
AIM7 benchmark with a ramdisk filesystem. The table below shows the
improvement in JPM (jobs per minute) throughput due to this patch on
an 8-socket 80-core x86-64 system with a 3.11-rc3 kernel in 1/2/4/8
node configurations, using numactl to restrict the execution of the
workload to certain nodes.

+-----------------+----------------+-----------------+----------+
| Configuration | Mean JPM | Mean JPM | % Change |
| | Rate w/o patch | Rate with patch | |
+-----------------+---------------------------------------------+
| | User Range 10 - 100 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1760523 | 4225737 | +140.0% |
| 4 nodes, HT off | 2020076 | 3206202 | +58.7% |
| 2 nodes, HT off | 2391359 | 2654701 | +11.0% |
| 1 node , HT off | 2302912 | 2302433 | 0.0% |
+-----------------+---------------------------------------------+
| | User Range 200 - 1000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1078421 | 7380760 | +584.4% |
| 4 nodes, HT off | 1371040 | 4212007 | +207.2% |
| 2 nodes, HT off | 2844720 | 2783442 | -2.2% |
| 1 node , HT off | 2433443 | 2415590 | -0.7% |
+-----------------+---------------------------------------------+
| | User Range 1100 - 2000 |
+-----------------+---------------------------------------------+
| 8 nodes, HT off | 1055626 | 7118985 | +574.4% |
| 4 nodes, HT off | 1352329 | 4512914 | +233.7% |
| 2 nodes, HT off | 2793037 | 2758652 | -1.2% |
| 1 node , HT off | 2458125 | 2445069 | -0.5% |
+-----------------+----------------+-----------------+----------+

With 4 nodes and above, there is a significant performance improvement
with this patch. With only 1 or 2 nodes, the performance is very close.
Because of the variability of the AIM7 benchmark, a difference of a few
percent may not indicate a real performance gain or loss.

A perf call-graph report of the short workload at 1500 users
without the patch on the same 8-node machine indicates that about
79% of the workload's total time was spent in the _raw_spin_lock()
function, almost all of which can be attributed to the following 2
kernel functions:
1. dget_parent (49.92%)
2. dput (49.84%)

The relevant perf report lines are:
+ 78.76% reaim [kernel.kallsyms] [k] _raw_spin_lock
+ 0.05% reaim [kernel.kallsyms] [k] dput
+ 0.01% reaim [kernel.kallsyms] [k] dget_parent

With this patch installed, the new perf report lines are:
+ 19.66% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
+ 2.46% reaim [kernel.kallsyms] [k] _raw_spin_lock
+ 2.23% reaim [kernel.kallsyms] [k] lockref_get_not_zero
+ 0.50% reaim [kernel.kallsyms] [k] dput
+ 0.32% reaim [kernel.kallsyms] [k] lockref_put_or_lock
+ 0.30% reaim [kernel.kallsyms] [k] lockref_get
+ 0.01% reaim [kernel.kallsyms] [k] dget_parent

- 2.46% reaim [kernel.kallsyms] [k] _raw_spin_lock
- _raw_spin_lock
+ 23.89% sys_getcwd
+ 23.60% d_path
+ 8.01% prepend_path
+ 5.18% complete_walk
+ 4.21% __rcu_process_callbacks
+ 3.08% inet_twsk_schedule
+ 2.36% do_anonymous_page
+ 2.24% unlazy_walk
+ 2.02% sem_lock
+ 1.82% process_backlog
+ 1.62% selinux_inode_free_security
+ 1.54% task_rq_lock
+ 1.45% unix_dgram_sendmsg
+ 1.18% enqueue_to_backlog
+ 1.06% unix_stream_sendmsg
+ 0.94% tcp_v4_rcv
+ 0.87% unix_create1
+ 0.71% scheduler_tick
+ 0.60% unix_release_sock
+ 0.59% do_wp_page
+ 0.59% unix_stream_recvmsg
+ 0.58% handle_pte_fault
+ 0.57% __do_fault
+ 0.53% unix_peer_get

The dput() and dget_parent() functions didn't show up in the
_raw_spin_lock callers at all.

The impact of this patch on other AIM7 workloads was much more
modest. Besides short, the other AIM7 workload that showed consistent
improvement is the high_systime workload. For the other workloads,
the changes were so minor that there is no significant difference
with and without the patch.

+--------------+---------------+----------------+-----------------+
| Workload | mean % change | mean % change | mean % change |
| | 10-100 users | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime | +0.1% | +1.1% | +3.4% |
+--------------+---------------+----------------+-----------------+

Signed-off-by: Waiman Long <[email protected]>
---
fs/dcache.c | 26 ++++++++++++++++++--------
include/linux/dcache.h | 7 ++-----
2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3adb6aa..9a4cf30 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -513,9 +513,15 @@ void dput(struct dentry *dentry)
return;

repeat:
- if (d_count(dentry) == 1)
- might_sleep();
- spin_lock(&dentry->d_lock);
+ if (d_count(dentry) > 1) {
+ if (lockref_put_or_lock(&dentry->d_lockcnt))
+ return;
+ /* dentry's lock taken */
+ } else {
+ if (d_count(dentry) == 1)
+ might_sleep();
+ spin_lock(&dentry->d_lock);
+ }
BUG_ON(!d_count(dentry));
if (d_count(dentry) > 1) {
dentry->d_lockcnt.refcnt--;
@@ -611,26 +617,30 @@ static inline void __dget_dlock(struct dentry *dentry)

static inline void __dget(struct dentry *dentry)
{
- spin_lock(&dentry->d_lock);
- __dget_dlock(dentry);
- spin_unlock(&dentry->d_lock);
+ lockref_get(&dentry->d_lockcnt);
}

struct dentry *dget_parent(struct dentry *dentry)
{
struct dentry *ret;

+ rcu_read_lock();
+ ret = rcu_dereference(dentry->d_parent);
+ if (lockref_get_not_zero(&ret->d_lockcnt)) {
+ rcu_read_unlock();
+ return ret;
+ }
repeat:
/*
* Don't need rcu_dereference because we re-check it was correct under
* the lock.
*/
- rcu_read_lock();
- ret = dentry->d_parent;
+ ret = ACCESS_ONCE(dentry->d_parent);
spin_lock(&ret->d_lock);
if (unlikely(ret != dentry->d_parent)) {
spin_unlock(&ret->d_lock);
rcu_read_unlock();
+ rcu_read_lock();
goto repeat;
}
rcu_read_unlock();
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 20e6f2e..ec9206e 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -367,11 +367,8 @@ static inline struct dentry *dget_dlock(struct dentry *dentry)

static inline struct dentry *dget(struct dentry *dentry)
{
- if (dentry) {
- spin_lock(&dentry->d_lock);
- dget_dlock(dentry);
- spin_unlock(&dentry->d_lock);
- }
+ if (dentry)
+ lockref_get(&dentry->d_lockcnt);
return dentry;
}

--
1.7.1

2013-08-06 03:13:41

by Waiman Long

Subject: [PATCH v7 3/4] dcache: replace d_lock/d_count by d_lockcnt

This patch replaces the d_lock and d_count fields of the dentry
data structure with the combined d_lockcnt structure. A d_lock macro
is defined to remap the old d_lock name to the new d_lockcnt.lock
name. This is needed because a lot of files use the d_lock spinlock.

Read accesses to d_count are replaced by the d_count() helper
function. Write accesses to d_count are replaced by the new
d_lockcnt.refcnt name. Other than that, there is no other functional
change in this patch.

The offset of the new d_lockcnt field is at byte 72 for 32-bit SMP
systems and byte 88 for 64-bit SMP systems. In both cases it is 8-byte
aligned, so combining the lock and the count into a single 8-byte word
does not introduce a hole that increases the size of the dentry
structure.

Signed-off-by: Waiman Long <[email protected]>
---
fs/dcache.c | 54 ++++++++++++++++++++++++------------------------
fs/namei.c | 6 ++--
include/linux/dcache.h | 15 ++++++++----
3 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 87bdb53..3adb6aa 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -54,7 +54,7 @@
* - d_flags
* - d_name
* - d_lru
- * - d_count
+ * - d_lockcnt.refcnt
* - d_unhashed()
* - d_parent and d_subdirs
* - childrens' d_child and d_parent
@@ -229,7 +229,7 @@ static void __d_free(struct rcu_head *head)
*/
static void d_free(struct dentry *dentry)
{
- BUG_ON(dentry->d_count);
+ BUG_ON(d_count(dentry));
this_cpu_dec(nr_dentry);
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
@@ -467,7 +467,7 @@ relock:
}

if (ref)
- dentry->d_count--;
+ dentry->d_lockcnt.refcnt--;
/*
* inform the fs via d_prune that this dentry is about to be
* unhashed and destroyed.
@@ -513,12 +513,12 @@ void dput(struct dentry *dentry)
return;

repeat:
- if (dentry->d_count == 1)
+ if (d_count(dentry) == 1)
might_sleep();
spin_lock(&dentry->d_lock);
- BUG_ON(!dentry->d_count);
- if (dentry->d_count > 1) {
- dentry->d_count--;
+ BUG_ON(!d_count(dentry));
+ if (d_count(dentry) > 1) {
+ dentry->d_lockcnt.refcnt--;
spin_unlock(&dentry->d_lock);
return;
}
@@ -535,7 +535,7 @@ repeat:
dentry->d_flags |= DCACHE_REFERENCED;
dentry_lru_add(dentry);

- dentry->d_count--;
+ dentry->d_lockcnt.refcnt--;
spin_unlock(&dentry->d_lock);
return;

@@ -590,7 +590,7 @@ int d_invalidate(struct dentry * dentry)
* We also need to leave mountpoints alone,
* directory or not.
*/
- if (dentry->d_count > 1 && dentry->d_inode) {
+ if (d_count(dentry) > 1 && dentry->d_inode) {
if (S_ISDIR(dentry->d_inode->i_mode) || d_mountpoint(dentry)) {
spin_unlock(&dentry->d_lock);
return -EBUSY;
@@ -606,7 +606,7 @@ EXPORT_SYMBOL(d_invalidate);
/* This must be called with d_lock held */
static inline void __dget_dlock(struct dentry *dentry)
{
- dentry->d_count++;
+ dentry->d_lockcnt.refcnt++;
}

static inline void __dget(struct dentry *dentry)
@@ -634,8 +634,8 @@ repeat:
goto repeat;
}
rcu_read_unlock();
- BUG_ON(!ret->d_count);
- ret->d_count++;
+ BUG_ON(!d_count(ret));
+ ret->d_lockcnt.refcnt++;
spin_unlock(&ret->d_lock);
return ret;
}
@@ -718,7 +718,7 @@ restart:
spin_lock(&inode->i_lock);
hlist_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
- if (!dentry->d_count) {
+ if (!d_count(dentry)) {
__dget_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
@@ -734,7 +734,7 @@ EXPORT_SYMBOL(d_prune_aliases);

/*
* Try to throw away a dentry - free the inode, dput the parent.
- * Requires dentry->d_lock is held, and dentry->d_count == 0.
+ * Requires dentry->d_lock is held, and dentry->d_lockcnt.refcnt == 0.
* Releases dentry->d_lock.
*
* This may fail if locks cannot be acquired no problem, just try again.
@@ -764,8 +764,8 @@ static void try_prune_one_dentry(struct dentry *dentry)
dentry = parent;
while (dentry) {
spin_lock(&dentry->d_lock);
- if (dentry->d_count > 1) {
- dentry->d_count--;
+ if (d_count(dentry) > 1) {
+ dentry->d_lockcnt.refcnt--;
spin_unlock(&dentry->d_lock);
return;
}
@@ -793,7 +793,7 @@ static void shrink_dentry_list(struct list_head *list)
* the LRU because of laziness during lookup. Do not free
* it - just keep it off the LRU list.
*/
- if (dentry->d_count) {
+ if (d_count(dentry)) {
dentry_lru_del(dentry);
spin_unlock(&dentry->d_lock);
continue;
@@ -913,7 +913,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
dentry_lru_del(dentry);
__d_shrink(dentry);

- if (dentry->d_count != 0) {
+ if (d_count(dentry) != 0) {
printk(KERN_ERR
"BUG: Dentry %p{i=%lx,n=%s}"
" still in use (%d)"
@@ -922,7 +922,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
dentry->d_inode ?
dentry->d_inode->i_ino : 0UL,
dentry->d_name.name,
- dentry->d_count,
+ d_count(dentry),
dentry->d_sb->s_type->name,
dentry->d_sb->s_id);
BUG();
@@ -933,7 +933,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
list_del(&dentry->d_u.d_child);
} else {
parent = dentry->d_parent;
- parent->d_count--;
+ parent->d_lockcnt.refcnt--;
list_del(&dentry->d_u.d_child);
}

@@ -981,7 +981,7 @@ void shrink_dcache_for_umount(struct super_block *sb)

dentry = sb->s_root;
sb->s_root = NULL;
- dentry->d_count--;
+ dentry->d_lockcnt.refcnt--;
shrink_dcache_for_umount_subtree(dentry);

while (!hlist_bl_empty(&sb->s_anon)) {
@@ -1147,7 +1147,7 @@ resume:
* loop in shrink_dcache_parent() might not make any progress
* and loop forever.
*/
- if (dentry->d_count) {
+ if (d_count(dentry)) {
dentry_lru_del(dentry);
} else if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
dentry_lru_move_list(dentry, dispose);
@@ -1269,7 +1269,7 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
smp_wmb();
dentry->d_name.name = dname;

- dentry->d_count = 1;
+ dentry->d_lockcnt.refcnt = 1;
dentry->d_flags = 0;
spin_lock_init(&dentry->d_lock);
seqcount_init(&dentry->d_seq);
@@ -1970,7 +1970,7 @@ struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name)
goto next;
}

- dentry->d_count++;
+ dentry->d_lockcnt.refcnt++;
found = dentry;
spin_unlock(&dentry->d_lock);
break;
@@ -2069,7 +2069,7 @@ again:
spin_lock(&dentry->d_lock);
inode = dentry->d_inode;
isdir = S_ISDIR(inode->i_mode);
- if (dentry->d_count == 1) {
+ if (d_count(dentry) == 1) {
if (!spin_trylock(&inode->i_lock)) {
spin_unlock(&dentry->d_lock);
cpu_relax();
@@ -2937,7 +2937,7 @@ resume:
}
if (!(dentry->d_flags & DCACHE_GENOCIDE)) {
dentry->d_flags |= DCACHE_GENOCIDE;
- dentry->d_count--;
+ dentry->d_lockcnt.refcnt--;
}
spin_unlock(&dentry->d_lock);
}
@@ -2945,7 +2945,7 @@ resume:
struct dentry *child = this_parent;
if (!(this_parent->d_flags & DCACHE_GENOCIDE)) {
this_parent->d_flags |= DCACHE_GENOCIDE;
- this_parent->d_count--;
+ this_parent->d_lockcnt.refcnt--;
}
this_parent = try_to_ascend(this_parent, locked, seq);
if (!this_parent)
diff --git a/fs/namei.c b/fs/namei.c
index 8b61d10..28e5152 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -536,8 +536,8 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry)
* a reference at this point.
*/
BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
- BUG_ON(!parent->d_count);
- parent->d_count++;
+ BUG_ON(!d_count(parent));
+ parent->d_lockcnt.refcnt++;
spin_unlock(&dentry->d_lock);
}
spin_unlock(&parent->d_lock);
@@ -3327,7 +3327,7 @@ void dentry_unhash(struct dentry *dentry)
{
shrink_dcache_parent(dentry);
spin_lock(&dentry->d_lock);
- if (dentry->d_count == 1)
+ if (d_count(dentry) == 1)
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
}
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index b90337c..20e6f2e 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -9,6 +9,7 @@
#include <linux/seqlock.h>
#include <linux/cache.h>
#include <linux/rcupdate.h>
+#include <linux/spinlock_refcount.h>

struct nameidata;
struct path;
@@ -112,8 +113,7 @@ struct dentry {
unsigned char d_iname[DNAME_INLINE_LEN]; /* small names */

/* Ref lookup also touches following */
- unsigned int d_count; /* protected by d_lock */
- spinlock_t d_lock; /* per dentry lock */
+ struct lockref d_lockcnt; /* per dentry lock & count */
const struct dentry_operations *d_op;
struct super_block *d_sb; /* The root of the dentry tree */
unsigned long d_time; /* used by d_revalidate */
@@ -132,6 +132,11 @@ struct dentry {
};

/*
+ * Define macros to access the name-changed spinlock
+ */
+#define d_lock d_lockcnt.lock
+
+/*
* dentry->d_lock spinlock nesting subclasses:
*
* 0: normal
@@ -318,7 +323,7 @@ static inline int __d_rcu_to_refcount(struct dentry *dentry, unsigned seq)
assert_spin_locked(&dentry->d_lock);
if (!read_seqcount_retry(&dentry->d_seq, seq)) {
ret = 1;
- dentry->d_count++;
+ dentry->d_lockcnt.refcnt++;
}

return ret;
@@ -326,7 +331,7 @@ static inline int __d_rcu_to_refcount(struct dentry *dentry, unsigned seq)

static inline unsigned d_count(const struct dentry *dentry)
{
- return dentry->d_count;
+ return dentry->d_lockcnt.refcnt;
}

/* validate "insecure" dentry pointer */
@@ -356,7 +361,7 @@ extern char *dentry_path(struct dentry *, char *, int);
static inline struct dentry *dget_dlock(struct dentry *dentry)
{
if (dentry)
- dentry->d_count++;
+ dentry->d_lockcnt.refcnt++;
return dentry;
}

--
1.7.1

2013-08-06 03:14:11

by Waiman Long

Subject: [PATCH v7 2/4] spinlock: Enable x86 architecture to do lockless refcount update

This patch enables the x86 architecture to do lockless reference
count update using the generic lockref implementation with default
parameters. Only the x86/Kconfig file needs to be changed.

Signed-off-by: Waiman Long <[email protected]>
---
arch/x86/Kconfig | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b32ebf9..79a9309 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -262,6 +262,9 @@ config ARCH_CPU_PROBE_RELEASE
config ARCH_SUPPORTS_UPROBES
def_bool y

+config GENERIC_SPINLOCK_REFCOUNT
+ def_bool y
+
source "init/Kconfig"
source "kernel/Kconfig.freezer"

--
1.7.1

2013-08-13 18:03:12

by Waiman Long

Subject: Re: [PATCH v7 0/4] Lockless update of reference count protected by spinlock

On 08/05/2013 11:12 PM, Waiman Long wrote:
> v6->v7:
> - Substantially reduce the number of patches from 14 to 4 because a
> lot of the minor filesystem related changes had been merged to
> v3.11-rc1.
> - Remove architecture specific customization (LOCKREF_WAIT_SHIFT &
> LOCKREF_RETRY_COUNT).
> - Tune single-thread performance of lockref_put/get to within 10%
> of old lock->update->unlock code.
>
> v5->v6:
> - Add a new GENERIC_SPINLOCK_REFCOUNT config parameter for using the
> generic implementation.
> - Add two parameters LOCKREF_WAIT_SHIFT and LOCKREF_RETRY_COUNT which
> can be specified differently for each architecture.
> - Update various spinlock_refcount.* files to incorporate review
> comments.
> - Replace references to the d_refcount() macro in the Lustre filesystem
> code in the staging tree with the new d_count() helper function.
>
> v4->v5:
> - Add a d_count() helper for readonly access of reference count and
> change all references to d_count outside of dcache.c, dcache.h
> and namei.c to use d_count().
>
> v3->v4:
> - Replace helper function access to d_lock and d_count by using
> macros to redefine the old d_lock name to the spinlock and new
> d_refcount name to the reference count. This greatly reduces the
> size of this patchset from 25 to 12 and makes it easier to review.
>
> v2->v3:
> - Completely revamp the packaging by adding a new lockref data
> structure that combines the spinlock with the reference
> count. Helper functions are also added to manipulate the new data
> structure. That results in modifying over 50 files, but the changes
> were trivial in most of them.
> - Change initial spinlock wait to use a timeout.
> - Force 64-bit alignment of the spinlock & reference count structure.
> - Add a new way to use the combo by using a new union and helper
> functions.
>
> v1->v2:
> - Add one more layer of indirection to LOCK_WITH_REFCOUNT macro.
> - Add __LINUX_SPINLOCK_REFCOUNT_H protection to spinlock_refcount.h.
> - Add some generic get/put macros into spinlock_refcount.h.
>
> This patchset supports a generic mechanism to atomically update
> a reference count that is protected by a spinlock without actually
> acquiring the lock itself. If the update doesn't succeed, the caller
> will have to acquire the lock and update the reference count in the
> old way. This will help in situations where there is a lot of
> spinlock contention because of frequent reference count updates.
>
> The d_lock and d_count fields of the struct dentry in dcache.h were
> modified to use the new lockref data structure, and the d_lock name
> is now a macro that maps to the actual spinlock.
>
> This patch set yields a significant performance improvement in the
> short workload of the AIM7 benchmark on an 8-socket x86-64 machine
> with 80 cores.
>
> Thanks to Thomas Gleixner, Andi Kleen and Linus for their valuable
> input in shaping this patchset.
>
> Signed-off-by: Waiman Long <[email protected]>
>
> Waiman Long (4):
> spinlock: A new lockref structure for lockless update of refcount
> spinlock: Enable x86 architecture to do lockless refcount update
> dcache: replace d_lock/d_count by d_lockcnt
> dcache: Enable lockless update of dentry's refcount
>
> arch/x86/Kconfig | 3 +
> fs/dcache.c | 78 +++++++------
> fs/namei.c | 6 +-
> include/asm-generic/spinlock_refcount.h | 46 +++++++
> include/linux/dcache.h | 22 ++--
> include/linux/spinlock_refcount.h | 126 ++++++++++++++++++++
> kernel/Kconfig.locks | 15 +++
> lib/Makefile | 2 +
> lib/spinlock_refcount.c | 198 +++++++++++++++++++++++++++++++
> 9 files changed, 449 insertions(+), 47 deletions(-)
> create mode 100644 include/asm-generic/spinlock_refcount.h
> create mode 100644 include/linux/spinlock_refcount.h
> create mode 100644 lib/spinlock_refcount.c

So far, I haven't heard back anything about whether further changes or
improvements are needed for this patch set. Any comments or feedback
will be appreciated.

Thanks in advance for your time.

Regards,
Longman

2013-08-29 01:40:08

by Linus Torvalds

Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

Just FYI: I've merged two preparatory patches in my tree for the whole
lockref thing. Instead of applying your four patches as-is during the
merge window, I ended up writing two patches that introduce the
concept and use it in the dentry code *without* introducing any of the
new semantics yet.

Waiman, I attributed the patches to you, even if they don't actually
look much like any of the patches you sent out. And because I was
trying very hard to make sure that no actual semantics changed, my
version doesn't have the dget_parent() lockless update code, for
example. I literally just did a search-and-replace of "->d_count" with
"->d_lockref.count" and then I fixed up a few things by hand (undid
one replacement in a comment, and used the helper functions where they
were semantically identical).

You don't have to rewrite your patches if you don't want to, I'm
planning on cherry-picking the actual code changes during the merge
window.

Linus

2013-08-29 04:46:13

by Benjamin Herrenschmidt

Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Wed, 2013-08-28 at 18:40 -0700, Linus Torvalds wrote:
> Just FYI: I've merged two preparatory patches in my tree for the whole
> lockref thing. Instead of applying your four patches as-is during the
> merge window, I ended up writing two patches that introduce the
> concept and use it in the dentry code *without* introducing any of the
> new semantics yet.
>
> Waiman, I attributed the patches to you, even if they don't actually
> look much like any of the patches you sent out. And because I was
> trying very hard to make sure that no actual semantics changed, my
> version doesn't have the dget_parent() lockless update code, for
> example. I literally just did a search-and-replace of "->d_count" with
> "->d_lockref.count" and then I fixed up a few things by hand (undid
> one replacement in a comment, and used the helper functions where they
> were semantically identical).
>
> You don't have to rewrite your patches if you don't want to, I'm
> planning on cherry-picking the actual code changes during the merge
> window.

I've somewhat lost track of this; will I need some arch support for
powerpc?

Cheers,
Ben.

2013-08-29 07:00:22

by Ingo Molnar

Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount


* Benjamin Herrenschmidt <[email protected]> wrote:

> On Wed, 2013-08-28 at 18:40 -0700, Linus Torvalds wrote:
> > Just FYI: I've merged two preparatory patches in my tree for the whole
> > lockref thing. Instead of applying your four patches as-is during the
> > merge window, I ended up writing two patches that introduce the
> > concept and use it in the dentry code *without* introducing any of the
> > new semantics yet.
> >
> > Waiman, I attributed the patches to you, even if they don't actually
> > look much like any of the patches you sent out. And because I was
> > trying very hard to make sure that no actual semantics changed, my
> > version doesn't have the dget_parent() lockless update code, for
> > example. I literally just did a search-and-replace of "->d_count" with
> > "->d_lockref.count" and then I fixed up a few things by hand (undid
> > one replacement in a comment, and used the helper functions where they
> > were semantically identical).
> >
> > You don't have to rewrite your patches if you don't want to, I'm
> > planning on cherry-picking the actual code changes during the merge
> > window.
>
> I've somewhat lost track of this, will I need some arch support for
> powerpc ?

Lockrefs are combined spinlock+count objects that fit into a
MESI-cacheline and can be accessed via the cmpxchg8b() primitives and
allow smart combined operations on the count field without necessarily
taking the lock.

So if an architecture meets the assumptions of the generic lockref code
(spinlock + a u32 fits in an aligned cacheline, has the cmpxchg8b()
primitive, lockdep is off, etc.) then it needs no changes.
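
In other words, the fast path boils down to something like this
(illustration only, reusing the struct lockref layout from Waiman's
patch 1; the helper name here is made up):

static inline int lockref_get_cmpxchg(struct lockref *lr)
{
	struct lockref old, new;

	old.lock_count = ACCESS_ONCE(lr->lock_count);

	/* Give up (and let the caller take the spinlock) if it is held */
	if (spin_is_locked(&old.lock))
		return 0;

	new.lock_count = old.lock_count;
	new.refcnt++;

	/* Succeeds only if neither the lock nor the count changed meanwhile */
	return cmpxchg64(&lr->lock_count, old.lock_count,
			 new.lock_count) == old.lock_count;
}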

You won't see these arch requirements from Linus's current patches yet,
but the followup changes that actually add the optimization should make
this clear.

Thanks,

Ingo

2013-08-29 15:20:55

by Waiman Long

Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/28/2013 09:40 PM, Linus Torvalds wrote:
> Just FYI: I've merged two preparatory patches in my tree for the whole
> lockref thing. Instead of applying your four patches as-is during the
> merge window, I ended up writing two patches that introduce the
> concept and use it in the dentry code *without* introducing any of the
> new semantics yet.
>
> Waiman, I attributed the patches to you, even if they don't actually
> look much like any of the patches you sent out. And because I was
> trying very hard to make sure that no actual semantics changed, my
> version doesn't have the dget_parent() lockless update code, for
> example. I literally just did a search-and-replace of "->d_count" with
> "->d_lockref.count" and then I fixed up a few things by hand (undid
> one replacement in a comment, and used the helper functions where they
> were semantically identical).
>
> You don't have to rewrite your patches if you don't want to, I'm
> planning on cherry-picking the actual code changes during the merge
> window.
>
> Linus

Thanks for merging the 2 preparatory patches for me. I will rebase my
patches with the latest linux git tree. A new v8 patch set will be sent
out sometime next week. I am looking forward to the v3.12 merge window
which I think is coming soon.

Cheers,
Longman

2013-08-29 16:43:10

by Linus Torvalds

Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, Aug 29, 2013 at 12:00 AM, Ingo Molnar <[email protected]> wrote:
>
> Lockrefs are combiend spinlock+count objects that fit into a
> MESI-cacheline and can be accessed via the cmpxchg8b() primitives and
> allow smart combined operations on the count field without necessarily
> taking the lock.

Side note: I'm going to finally build a new desktop, and made sure
that I got a CPU that supports TSX. I'm not convinced transactional
memory ends up being usable in general without hardware support for
predicting transaction success, but using transactions to do small
specialized things like this may well be worth it. Especially since
the cmpxchg approach has some issues.

We'll see. The real problem is that I'm not sure if I can even see the
scalability issue on any machine I actually personally want to use
(read: silent). On my current system I can only get up to 15%
_raw_spin_lock by just stat'ing the same file over and over and over
again from lots of threads.

Linus

2013-08-29 19:25:15

by Linus Torvalds

Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, Aug 29, 2013 at 9:43 AM, Linus Torvalds
<[email protected]> wrote:
>
> We'll see. The real problem is that I'm not sure if I can even see the
> scalability issue on any machine I actually personally want to use
> (read: silent). On my current system I can only get up to 15%
> _raw_spin_lock by just stat'ing the same file over and over and over
> again from lots of threads.

Hmm. I can see it, but it turns out that for normal pathname walking,
one of the main stumbling blocks is the RCU case of complete_walk(),
which cannot be done with the lockless lockref model.

Why? It needs to check the sequence count too and cannot touch the
refcount unless it matches under the spinlock. We could use
lockref_get_non_zero(), but for the final path component (which this
is) the zero refcount is actually a common case.

Waiman worked around this by having some rather complex code to retry
and wait for the dentry lock to be released in his lockref code. But
that has a lot of tuning implications, and I wanted to see what it is
*without* that kind of tuning. And that's when you hit the "lockless
case fails all the time because the lock is actually held" case.

I'm going to play around with changing the semantics of
"lockref_get_non_zero()" to match "lockref_put_or_lock()":
instead of failing when the count is zero, it gets the lock. That
won't generally see any contention, because if the count is zero,
there generally isn't anybody else playing with that dentry.
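
Ignoring the lockless fast path, the semantics I have in mind look
roughly like this (a sketch only - the function name is made up, and
the count field follows the d_lockref.count naming rather than
Waiman's refcnt):

static int lockref_get_or_lock_sketch(struct lockref *lockref)
{
	/* (a lockless cmpxchg attempt would go here first) */
	spin_lock(&lockref->lock);
	if (!lockref->count)
		return 0;	/* count was zero: return with the lock held */
	lockref->count++;
	spin_unlock(&lockref->lock);
	return 1;		/* got a reference, lock not held */
}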

Linus

2013-08-29 23:42:39

by Linus Torvalds

Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, Aug 29, 2013 at 12:25 PM, Linus Torvalds
<[email protected]> wrote:
>
> Hmm. I can see it, but it turns out that for normal pathname walking,
> one of the main stumbling blocks is the RCU case of complete_walk(),
> which cannot be done with the lockless lockref model.
> [... snip ...]

Ok, here's a patch for people to try out if they want to. It's tested,
and works for me, but it is - like the two preparatory patches I
already applied - not really based on Waiman's patches except from a
conceptual standpoint.

For architecture people (ie Ben, if you want to try this on ppc64),
the thing that it needs from an architecture:

- the raw_spinlock_t and the "unsigned long" needs to fit in a u64.

This is normally true anyway, but if your architecture code has a
big spinlock, you can't use it.

- the architecture needs to support "cmpxchg()" on that "u64" type
(so x86-32 does *not* use this, although we could teach it to do so by
teaching x86-32 to use "cmpxchg8b" for the 8-byte case)

A 64-bit architecture needs to support cmpxchg() on an u64 anyway,
so this is always true on 64-bit. It _can_ be true on 32-bit
architectures too.

- the architecture needs to implement a simple
"arch_spin_value_unlocked()" macro, which takes a raw_spinlock_t value
and says whether it is unlocked or not.

This is a new macro/function. It's generally a one-liner. For the
x86 ticket locks, for example, the test is simply "lock.tickets.head
== lock.tickets.tail".

- the architecture code needs to let the generic code know that it
supports all this by doing a "select ARCH_USE_CMPXCHG_LOCKREF"

Add it to your Kconfig file in the appropriate place. You do *not*
need to worry about LOCKDEP etc, you only need to worry about your own
architecture details above.

That's it. If it does that, the lockref code will use the cmpxchg
optimization when appropriate (ie spinlock debugging is not turned on
etc etc).
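
As a concrete sketch of the arch_spin_value_unlocked() point above, the
x86 ticket-lock version would be a one-liner along these lines (the
exact mainline code may differ in detail); the only other arch change
is the ARCH_USE_CMPXCHG_LOCKREF select in the Kconfig:

static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
{
	/* Free when the next ticket to serve equals the next ticket handed out */
	return lock.tickets.head == lock.tickets.tail;
}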

For Waiman: your patch had that adaptive waiting thing, and "wait for
unlocked" code, and I threw all that away. I didn't like it, and the
only reason for it existing was that the spinlock could be taken in a
hot path, which I felt was against the whole point of this "lockref"
thing.

So I fixed the VFS layer instead. With dput() and friends using
lockrefs, the only thing remaining in the hot RCU dentry lookup path
was the nasty __d_rcu_to_refcount() thing in complete_walk(). I
rewrote that to locklessly increment the refcount when it was nonzero,
and get the lock if it was zero, and that all seems to work fine.

And once the only case that is relevant for the fast-path is "d_lock
is unlocked", all your games with waiting for the spinlock to be
released are unnecessary. Making everything much simpler. If the
spinlock isn't unlocked, we always kick out to the fallback case (with
real locking).

NOTE! My test-case was very small and simple, so it may not trigger
other cases that might trigger d_lock in a hotpath. Anything that
kicks us out of rcu mode (like a symlink, for example) will trigger
"unlazy_walk()", and I didn't do the same thing there. So there's
still details like that to sort out, but I very much think this whole
"only an unlocked spinlock is a fastpath" is the right approach.

My simple testing shows that this has about the same best-case
performance, and the 15% _raw_spin_lock load I was able to trigger is
totally gone. That doesn't make things *faster* for me (because the
cost of the cmpxchg is pretty much comparable to the cost of the
spinlocks), but the big difference is the contended behavior where we
don't actually have to wait for the spinlock, we can just locklessly
increment the counter.

I can't trigger the CPU-eating contention case on my single-socket
system, which is why I'm sending out this patch for testing (despite
it not having that unlazy_walk() thing etc).

Also note that this is one single patch, not split up. Again, that's
because what I'm really hoping to get is just "does this fix the
contention-case on the 80-core monster machine that I don't have
access to?"

Side note: the whole cmpxchg() loop is written to basically generate
the perfect cmpxchg sequence on x86. The assembly results actually
look pretty good. I can't take advantage of the eflags setting of the
instruction, because gcc inline asms don't support that (even with
"asm goto" - I'd need to have output values for that, and "asm goto"
does not allow that). So there's one extra "cmpq" instruction, and gcc
makes a mess of "add 1 to structure member in high bytes of a 64-bit
structure", but it's actually fairly readable and short assembly code,
which was *not* true of the original patches.

Waiman? Mind looking at this and testing?

Linus


Attachments:
patch.diff (8.37 kB)

2013-08-30 00:28:23

by Benjamin Herrenschmidt

Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, 2013-08-29 at 16:42 -0700, Linus Torvalds wrote:

> For architecture people (ie Ben, if you want to try this on ppc64),
> the thing that it needs from an architecture:
>
> - the raw_spinlock_t and the "unsigned long" needs to fit in a u64.

I assume you mean unsigned int ? :-)

.../...

> - the architecture needs to implement a simple
> "arch_spin_value_unlocked()" macro, which takes a raw_spinlock_t value
> and says whether it is unlocked or not.

What's wrong with the existing arch_spin_is_locked() ?

BTW. Do you have your test case at hand ?

Cheers,
Ben.

2013-08-30 00:49:49

by Linus Torvalds

Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, Aug 29, 2013 at 5:26 PM, Benjamin Herrenschmidt
<[email protected]> wrote:
>
> I assume you mean unsigned int ? :-)

Oops, yes.

> What's wrong with the existing arch_spin_is_locked() ?

It takes a memory location. And we very much want to test the value we
loaded into a register.

And yes, gcc can do the right thing. But at least on x86,
arch_spin_is_locked() actually uses ACCESS_ONCE() to load the value
from the memory location, and I actually think that is the right thing
to do (or at least not incorrect). So the end result is that
arch_spin_value_unlocked() is actually fairly fundamentally different
from arch_spin_is_locked().

So I could have re-used arch_spin_is_locked() after having changed the
semantics of it, but I really didn't want to possibly change totally
unrelated users for this particular feature.

> BTW. Do you have your test case at hand ?

My test-case is a joke. It's explicitly *trying* to get as much
contention as possible on a dentry, by just starting up a lot of
threads that look up one single pathname (the same one for everybody).
It defaults to using /tmp for this, but you can specify the filename.

Note that directories, regular files and symlinks have fundamentally
different dentry lookup behavior:

- directories tend to have an elevated reference count (because they
have children). This was my primary test-case (because while I suspect
that there are crazy loads (and AIM7 may be one of them) that open the
same _regular_ file all concurrently, I don't think it's a "normal"
load). But opening the same directory concurrently as part of pathname
lookup is certainly normal.

- regular files tend to have a dentry count of zero unless they are
actively open, and the patch I sent out will take the dentry spinlock
for them when doing the final RCU finishing touches if that's the
case. So this one *will* still use the per-dentry spinlock rather than
the lockless refcount increments, but as outlined above I don't think
that should be a scalability issue unless you're crazy.

- symlink traversal causes us to drop out of RCU lookup mode, and thus
cause various slow-paths to happen. Some of that we can improve on,
but I suspect it will cause the lockless refcount paths to take a hit
too.

Anyway, I'm attaching my completely mindless test program. It has
hacky things like "unsigned long count[MAXTHREADS][32]" which are
purely to just spread out the counts so that they aren't in the same
cacheline etc.
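
(For readers without the attachment, a rough re-creation of the kind of test
being described - the real t.c differs in details, and everything below is a
guess at its shape:)

#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAXTHREADS 16

static const char *path = "/tmp";
static volatile int stop;
/* [32] pads each counter out so the threads don't share a cacheline. */
static unsigned long count[MAXTHREADS][32];

static void *worker(void *arg)
{
	long idx = (long)arg;
	struct stat st;

	while (!stop) {
		stat(path, &st);	/* one pathname lookup on one dentry */
		count[idx][0]++;
	}
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tid[MAXTHREADS];
	unsigned long total = 0;

	if (argc > 1)
		path = argv[1];
	for (long i = 0; i < MAXTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	sleep(10);
	stop = 1;
	for (long i = 0; i < MAXTHREADS; i++) {
		pthread_join(tid[i], NULL);
		total += count[i][0];
	}
	printf("Total loops: %lu\n", total);
	return 0;
}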

Also note that the performance numbers it spits out depend a lot on
things like how long the dcache hash chains etc are, so they are not
really reliable. Running the test-program right after reboot when the
dentries haven't been populated can result in much higher numbers -
without that having anything to do with contention or locking at all.

Linus


Attachments:
t.c (0.98 kB)

2013-08-30 02:06:24

by Michael Neuling

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

> Anyway, I'm attaching my completely mindless test program. It has
> hacky things like "unsigned long count[MAXTHREADS][32]" which are
> purely to just spread out the counts so that they aren't in the same
> cacheline etc.
>
> Also note that the performance numbers it spits out depend a lot on
> tings like how long the dcache hash chains etc are, so they are not
> really reliable. Running the test-program right after reboot when the
> dentries haven't been populated can result in much higher numbers -
> without that having anything to do with contention or locking at all.

Running on a POWER7 here with 32 threads (8 cores x 4 threads) I'm
getting some good improvements:

Without patch:
# ./t
Total loops: 3730618

With patch:
# ./t
Total loops: 16826271

The numbers move around about 10% from run to run. I didn't change your
program at all, so it's still running with MAXTHREADS 16.

powerpc patch below. I'm using arch_spin_is_locked() to implement
arch_spin_value_unlocked().

Mikey

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9cf59816d..4a3f86b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -139,6 +139,7 @@ config PPC
select OLD_SIGSUSPEND
select OLD_SIGACTION if PPC32
select HAVE_DEBUG_STACKOVERFLOW
+ select ARCH_USE_CMPXCHG_LOCKREF

config EARLY_PRINTK
bool
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 5b23f91..65c25272 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -156,6 +156,11 @@ extern void arch_spin_unlock_wait(arch_spinlock_t *lock);
do { while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
#endif

+static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
+{
+ return !arch_spin_is_locked(&lock);
+}
+
/*
* Read-write spinlocks, allowing multiple readers
* but only one writer.

2013-08-30 02:31:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, Aug 29, 2013 at 7:06 PM, Michael Neuling <[email protected]> wrote:
>
> Running on a POWER7 here with 32 threads (8 cores x 4 threads) I'm
> getting some good improvements:

That's *much* better than I get. But I literally just have a single
socket with two cores (and HT, so four threads) in my test machine, so
I really have a hard time getting any real contention. And the main
advantage of the patch should be when you actually have CPU's spinning
on that dentry d_lock.

Also, on x86, there are no advantages to cmpxchg over a spinlock -
they are both exactly one equally serializing instruction. If
anything, cmpxchg is worse due to having a cache read before the
write, and a few cycles slower anyway. So I actually expect the x86
code to slow down a tiny bit for the single-threaded case, although
that should be hopefully unmeasurable.

On POWER, you may have much less serialization for the cmpxchg. That
may sadly be something we'll need to fix - the serialization between
getting a lockref and checking sequence counts etc may need some extra
work.

So it may be that you are seeing unrealistically good numbers, and
that we will need to add a memory barrier or two. On x86, due to the
locked instruction semantics, that just isn't an issue.

> The numbers move around about 10% from run to run.

Please note that the whole "dentry hash chains may be better for one
run vs another" thing is something that will _persist_ between
subsequent runs, so you may see "only 10% variability", but there may
be a bigger picture variability that you're not noticing because you
had to reboot in between.

To be really comparable, you should really run the stupid benchmark
after fairly equal boot up sequences. If the machine had been up for
several days for one set of numbers, and freshly rebooted for the
other, it can be a very unfair comparison.

(I long ago had a nice "L1 dentry cache" patch that helped with the
fact that the dentry chains *can* get long especially if you have tons
of memory, and that helped with this kind of variability a lot - and
improved performance too. It was slightly racy, though, which is why
it never got merged).

> powerpc patch below. I'm using arch_spin_is_locked() to implement
> arch_spin_value_unlocked().

Your "slock" is of type "volatile unsigned int slock", so it may well
cause those temporaries to be written to memory.

It probably doesn't matter, but you may want to check that the result
of "make lib/lockref.s" looks ok.

Linus

2013-08-30 02:32:33

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, 2013-08-30 at 12:06 +1000, Michael Neuling wrote:

> powerpc patch below. I'm using arch_spin_is_locked() to implement
> arch_spin_value_unlocked().

>
> +static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> +{
> + return !arch_spin_is_locked(&lock);
> +}
> +

Arguably, it should be done the other way around :-)

The semantics of arch_spin_value_unlocked() are to basically operate on an
already-read copy of the value, while arch_spin_is_locked() has ACCESS_ONCE()
semantics on *top* of that.

Or we can keep both completely separate like Linus does on x86.
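
(A minimal sketch of that layering - the type, the ACCESS_ONCE() stand-in and
the "0 means unlocked" encoding are assumptions for illustration, not the real
powerpc definitions:)

typedef struct { unsigned int slock; } arch_spinlock_t;

#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Primitive: operates on an already-read copy of the lock value. */
static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
{
	return lock.slock == 0;
}

/* Built on top of it, with the once-only load made explicit. */
static inline int arch_spin_is_locked(arch_spinlock_t *lock)
{
	arch_spinlock_t tmp;

	tmp.slock = ACCESS_ONCE(lock->slock);
	return !arch_spin_value_unlocked(tmp);
}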

Cheers,
Ben.

2013-08-30 02:35:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, Aug 29, 2013 at 7:30 PM, Benjamin Herrenschmidt
<[email protected]> wrote:
>
> Or we can keep both completely separate like Linus does on x86.

I did it that way mainly to minimize the patch.

I agree with you that it probably makes sense to layer them the other
way around from what Michael's patch did, iow implement
arch_spin_is_locked() in terms of arch_spin_value_unlocked().

That said, on power, you have that "ACCESS_ONCE()" implicit in the
*type*, not in the code, so an "arch_spinlock_t" is fundamentally
volatile in itself. It's one of the reasons I despise "volatile":
things like volatility are _not_ attributes of a variable or a type,
but of the code in question. Something can be volatile in one context,
but not in another (one context might be locked, for example).

Linus

2013-08-30 02:44:26

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, 2013-08-29 at 19:31 -0700, Linus Torvalds wrote:

> Also, on x86, there are no advantages to cmpxchg over a spinlock -
> they are both exactly one equally serializing instruction. If
> anything, cmpxchg is worse due to having a cache read before the
> write, and a few cycles slower anyway. So I actually expect the x86
> code to slow down a tiny bit for the single-threaded case, although
> that should be hopefully unmeasurable.
>
> On POWER, you may have much less serialization for the cmpxchg. That
> may sadly be something we'll need to fix - the serialization between
> getting a lockref and checking sequence counts etc may need some extra
> work.

> So it may be that you are seeing unrealistically good numbers, and
> that we will need to add a memory barrier or two. On x86, due to the
> locked instruction semantics, that just isn't an issue.

Dunno, our cmpxchg has both acquire and release barriers. It basically
does release, xchg, then acquire. So it is equivalent to an unlock
followed by a lock.

> > The numbers move around about 10% from run to run.
>
> Please note that the whole "dentry hash chains may be better" for one
> run vs another, and that's something that will _persist_ between
> subsequent runs, so you may see "only 10% variability", but there may
> be a bigger picture variability that you're not noticing because you
> had to reboot in between.
>
> To be really comparable, you should really run the stupid benchmark
> after fairly equal boot up sequences. If the machine had been up for
> several days for one set of numbers, and freshly rebooted for the
> other, it can be a very unfair comparison.
>
> (I long ago had a nice "L1 dentry cache" patch that helped with the
> fact that the dentry chains *can* get long especially if you have tons
> of memory, and that helped with this kind of variability a lot - and
> improved performance too. It was slightly racy, though, which is why
> it never got merged).
>
> > powerpc patch below. I'm using arch_spin_is_locked() to implement
> > arch_spin_value_unlocked().
>
> Your "slock" is of type "volatile unsigned int slock", so it may well
> cause those temporaries to be written to memory.
>
> It probably doesn't matter, but you may want to check that the result
> of "make lib/lockref.s" looks ok.
>
> Linus

2013-08-30 02:46:04

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, 2013-08-29 at 19:35 -0700, Linus Torvalds wrote:
> That said, on power, you have that "ACCESS_ONCE()" implicit in the
> *type*, not in the code, so an "arch_spinlock_t" is fundamentally
> volatile in itself. It's one of the reasons I despise "volatile":
> things like volatility are _not_ attributes of a variable or a type,
> but of the code in question. Something can be volatile in one context,
> but not in another (one context might be locked, for example).

Right, we can probably change that to use ACCESS_ONCE()... volatile tends
to never quite do what you expect anyway.

Cheers,
Ben.

2013-08-30 03:13:05

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/29/2013 07:42 PM, Linus Torvalds wrote:
> Waiman? Mind looking at this and testing? Linus

Sure, I will try out the patch tomorrow morning and see how it works out
for my test case.

Regards,
Longman

2013-08-30 03:54:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, Aug 29, 2013 at 8:12 PM, Waiman Long <[email protected]> wrote:
> On 08/29/2013 07:42 PM, Linus Torvalds wrote:
>>
>> Waiman? Mind looking at this and testing? Linus
>
> Sure, I will try out the patch tomorrow morning and see how it works out for
> my test case.

Ok, thanks, please use this slightly updated patch attached here.

It improves on the previous version in actually handling the
"unlazy_walk()" case with native lockref handling, which means that
one other not entirely odd case (symlink traversal) avoids the d_lock
contention.

It also refactored the __d_rcu_to_refcount() to be more readable, and
adds a big comment about what the heck is going on. The old code was
clever, but I suspect not very many people could possibly understand
what it actually did. Plus it used nested spinlocks because it wanted
to avoid checking the sequence count twice. Which is stupid, since
nesting locks is how you get really bad contention, and the sequence
count check is really cheap anyway. Plus the nesting *really* didn't
work with the whole lockref model.
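
(The refactored helper itself is only in the attached patch, but the pattern
being described boils down to something like this user-space sketch - the
names and the sequence-count handling are made up, this is not the dcache
code: take the reference optimistically, then do one cheap sequence-count
check and back out if the object changed, instead of nesting two locks.)

#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	_Atomic unsigned int seq;	/* bumped on every modification */
	_Atomic unsigned int refcount;
};

/* 'seq_seen' was sampled earlier, during the lockless lookup. */
static bool grab_ref_if_unchanged(struct obj *o, unsigned int seq_seen)
{
	atomic_fetch_add(&o->refcount, 1);		/* optimistic get */
	if (atomic_load(&o->seq) != seq_seen) {		/* changed under us? */
		atomic_fetch_sub(&o->refcount, 1);	/* undo, caller retries */
		return false;
	}
	return true;
}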

With this, my stupid thread-lookup thing doesn't show any spinlock
contention even for the "look up symlink" case.

It also avoids the unnecessary aligned u64 for when we don't actually
use cmpxchg at all.

It's still one single patch, since I was working on lots of small
cleanups. I think it's pretty close to done now (assuming your testing
shows it performs fine - the powerpc numbers are promising, though),
so I'll split it up into proper chunks rather than random commit
points. But I'm done for today at least.

NOTE NOTE NOTE! My test coverage really has been pretty pitiful. You
may hit cases I didn't test. I think it should be *stable*, but maybe
there's some other d_lock case that your tuned waiting hid, and that
my "fastpath only for unlocked case" version ends up having problems
with.

Linus


Attachments:
patch.diff (13.27 kB)

2013-08-30 07:16:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount


* Linus Torvalds <[email protected]> wrote:

> > BTW. Do you have your test case at hand ?
>
> My test-case is a joke. It's explicitly *trying* to get as much
> contention as possible on a dentry, by just starting up a lot of threads
> that look up one single pathname (the same one for everybody). It
> defaults to using /tmp for this, but you can specify the filename.

Waiman's tests seemed to use sufficiently generic and varied workloads
(AIM7) and they showed pretty nice unconditional improvements with his
variant of this scheme, so I think testing with your simple testcase that
intentionally magnifies the scalability issue is 100% legit and may in
fact help tune the changes more accurately, because it has less inherent
noise.

And that was on an 80-core system. The speedup should be exponentially more
dramatic on silly large systems. A nicely parallel VFS isn't a bad thing
to have, especially on ridiculously loud hardware you want to run a
continent away from you.

Thanks,

Ingo

2013-08-30 07:55:29

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 5:54 AM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Aug 29, 2013 at 8:12 PM, Waiman Long <[email protected]> wrote:
>> On 08/29/2013 07:42 PM, Linus Torvalds wrote:
>>>
>>> Waiman? Mind looking at this and testing? Linus
>>
>> Sure, I will try out the patch tomorrow morning and see how it works out for
>> my test case.
>
> Ok, thanks, please use this slightly updated patch attached here.
>
> It improves on the previous version in actually handling the
> "unlazy_walk()" case with native lockref handling, which means that
> one other not entirely odd case (symlink traversal) avoids the d_lock
> contention.
>
> It also refactored the __d_rcu_to_refcount() to be more readable, and
> adds a big comment about what the heck is going on. The old code was
> clever, but I suspect not very many people could possibly understand
> what it actually did. Plus it used nested spinlocks because it wanted
> to avoid checking the sequence count twice. Which is stupid, since
> nesting locks is how you get really bad contention, and the sequence
> count check is really cheap anyway. Plus the nesting *really* didn't
> work with the whole lockref model.
>
> With this, my stupid thread-lookup thing doesn't show any spinlock
> contention even for the "look up symlink" case.
>
> It also avoids the unnecessary aligned u64 for when we don't actually
> use cmpxchg at all.
>
> It's still one single patch, since I was working on lots of small
> cleanups. I think it's pretty close to done now (assuming your testing
> shows it performs fine - the powerpc numbers are promising, though),
> so I'll split it up into proper chunks rather than random commit
> points. But I'm done for today at least.
>
> NOTE NOTE NOTE! My test coverage really has been pretty pitiful. You
> may hit cases I didn't test. I think it should be *stable*, but maybe
> there's some other d_lock case that your tuned waiting hid, and that
> my "fastpath only for unlocked case" version ends up having problems
> with.
>

Following this thread with half an eye... Was that "unsigned" issue
fixed (someone pointed it out)?
What is the subject line of that test patch?
I would like to test it on my SNB ultrabook with your test-case script.

- Sedat -

2013-08-30 08:10:39

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 5:54 AM, Linus Torvalds
> <[email protected]> wrote:
>> [...]
>
> Following this thread with half an eye... Was that "unsigned" stuff
> fixed (someone pointed to it).
> How do you call that test-patch (subject)?
> I would like to test it on my SNB ultrabook with your test-case script.
>

Can you explain why CONFIG_DEBUG_SPINLOCK=n (here: x86-64)?
( Will this be changed in further releases? )

# CONFIG_DEBUG_SPINLOCK is not set
CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_CMPXCHG_LOCKREF=y

- Sedat -

2013-08-30 09:27:06

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 5:54 AM, Linus Torvalds
> <[email protected]> wrote:
>> [...]
>
> Following this thread with half an eye... Was that "unsigned" stuff
> fixed (someone pointed to it).
> How do you call that test-patch (subject)?
> I would like to test it on my SNB ultrabook with your test-case script.
>

Here on Ubuntu/precise v12.04.3 AMD64 I get these numbers for total loops:

lockref: w/o patch | w/ patch
======================
Run #1: 2.688.094 | 2.643.004
Run #2: 2.678.884 | 2.652.787
Run #3: 2.686.450 | 2.650.142
Run #4: 2.688.435 | 2.648.409
Run #5: 2.693.770 | 2.651.514

Average: 2687126,6 VS. 2649171,2 ( −37955,4 )

- Sedat -

2013-08-30 09:49:05

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount


* Sedat Dilek <[email protected]> wrote:

> On Fri, Aug 30, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
> > On Fri, Aug 30, 2013 at 5:54 AM, Linus Torvalds
> > <[email protected]> wrote:
> >> [...]
> >
> > Following this thread with half an eye... Was that "unsigned" stuff
> > fixed (someone pointed to it).
> > How do you call that test-patch (subject)?
> > I would like to test it on my SNB ultrabook with your test-case script.
> >
>
> Here on Ubuntu/precise v12.04.3 AMD64 I get these numbers for total loops:
>
> lockref: w/o patch | w/ patch
> ======================
> Run #1: 2.688.094 | 2.643.004
> Run #2: 2.678.884 | 2.652.787
> Run #3: 2.686.450 | 2.650.142
> Run #4: 2.688.435 | 2.648.409
> Run #5: 2.693.770 | 2.651.514
>
> Average: 2687126,6 VS. 2649171,2 ( −37955,4 )

For precise stddev numbers you can run it like this:

perf stat --null --repeat 5 ./test

and it will measure time only and print the stddev in percentage:

Performance counter stats for './test' (5 runs):

1.001008928 seconds time elapsed ( +- 0.00% )

Thanks,

Ingo

2013-08-30 09:56:12

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 11:48 AM, Ingo Molnar <[email protected]> wrote:
>
> * Sedat Dilek <[email protected]> wrote:
>
>> On Fri, Aug 30, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
>> > On Fri, Aug 30, 2013 at 5:54 AM, Linus Torvalds
>> > <[email protected]> wrote:
>> >> [...]
>> >
>> > Following this thread with half an eye... Was that "unsigned" stuff
>> > fixed (someone pointed to it).
>> > How do you call that test-patch (subject)?
>> > I would like to test it on my SNB ultrabook with your test-case script.
>> >
>>
>> Here on Ubuntu/precise v12.04.3 AMD64 I get these numbers for total loops:
>>
>> lockref: w/o patch | w/ patch
>> ======================
>> Run #1: 2.688.094 | 2.643.004
>> Run #2: 2.678.884 | 2.652.787
>> Run #3: 2.686.450 | 2.650.142
>> Run #4: 2.688.435 | 2.648.409
>> Run #5: 2.693.770 | 2.651.514
>>
>> Average: 2687126,6 VS. 2649171,2 ( −37955,4 )
>
> For precise stddev numbers you can run it like this:
>
> perf stat --null --repeat 5 ./test
>
> and it will measure time only and print the stddev in percentage:
>
> Performance counter stats for './test' (5 runs):
>
> 1.001008928 seconds time elapsed ( +- 0.00% )
>

Hi Ingo,

that sounds really good :-).

AFAICS 'make deb-pkg' has no support for building the linux-tools
Debian package, which is where perf is included.
Can I run an older version of perf, or do I have to try the one
shipped in the Linux v3.11-rc7+ sources?
How can I build perf standalone, out of my source tree?

- Sedat -

2013-08-30 09:58:26

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 11:56 AM, Sedat Dilek <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 11:48 AM, Ingo Molnar <[email protected]> wrote:
>>
>> * Sedat Dilek <[email protected]> wrote:
>>
>>> On Fri, Aug 30, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
>>> > On Fri, Aug 30, 2013 at 5:54 AM, Linus Torvalds
>>> > <[email protected]> wrote:
>>> >> [...]
>>> >
>>> > Following this thread with half an eye... Was that "unsigned" stuff
>>> > fixed (someone pointed to it).
>>> > How do you call that test-patch (subject)?
>>> > I would like to test it on my SNB ultrabook with your test-case script.
>>> >
>>>
>>> Here on Ubuntu/precise v12.04.3 AMD64 I get these numbers for total loops:
>>>
>>> lockref: w/o patch | w/ patch
>>> ======================
>>> Run #1: 2.688.094 | 2.643.004
>>> Run #2: 2.678.884 | 2.652.787
>>> Run #3: 2.686.450 | 2.650.142
>>> Run #4: 2.688.435 | 2.648.409
>>> Run #5: 2.693.770 | 2.651.514
>>>
>>> Average: 2687126,6 VS. 2649171,2 ( −37955,4 )
>>
>> For precise stddev numbers you can run it like this:
>>
>> perf stat --null --repeat 5 ./test
>>
>> and it will measure time only and print the stddev in percentage:
>>
>> Performance counter stats for './test' (5 runs):
>>
>> 1.001008928 seconds time elapsed ( +- 0.00% )
>>
>
> Hi Ingo,
>
> that sounds really good :-).
>
> AFAICS 'make deb-pkg' does not have support to build linux-tools
> Debian package where perf is included.
> Can I run an older version of perf or should I / have to try with the
> one shipped in Linux v3.11-rc7+ sources?
> How can I build perf standalone, out of my sources?
>

Hmm, I installed linux-tools-common (3.2.0-53.81).

$ perf stat --null --repeat 5 ./t_lockref_from-linus
perf_3.11.0-rc7 not found
You may need to install linux-tools-3.11.0-rc7

- Sedat -

2013-08-30 10:29:38

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 11:58 AM, Sedat Dilek <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 11:56 AM, Sedat Dilek <[email protected]> wrote:
>> On Fri, Aug 30, 2013 at 11:48 AM, Ingo Molnar <[email protected]> wrote:
>>>
>>> * Sedat Dilek <[email protected]> wrote:
>>>
>>>> On Fri, Aug 30, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
>>>> > On Fri, Aug 30, 2013 at 5:54 AM, Linus Torvalds
>>>> > <[email protected]> wrote:
>>>> >> [...]
>>>> >
>>>> > Following this thread with half an eye... Was that "unsigned" stuff
>>>> > fixed (someone pointed to it).
>>>> > How do you call that test-patch (subject)?
>>>> > I would like to test it on my SNB ultrabook with your test-case script.
>>>> >
>>>>
>>>> Here on Ubuntu/precise v12.04.3 AMD64 I get these numbers for total loops:
>>>>
>>>> lockref: w/o patch | w/ patch
>>>> ======================
>>>> Run #1: 2.688.094 | 2.643.004
>>>> Run #2: 2.678.884 | 2.652.787
>>>> Run #3: 2.686.450 | 2.650.142
>>>> Run #4: 2.688.435 | 2.648.409
>>>> Run #5: 2.693.770 | 2.651.514
>>>>
>>>> Average: 2687126,6 VS. 2649171,2 ( −37955,4 )
>>>
>>> For precise stddev numbers you can run it like this:
>>>
>>> perf stat --null --repeat 5 ./test
>>>
>>> and it will measure time only and print the stddev in percentage:
>>>
>>> Performance counter stats for './test' (5 runs):
>>>
>>> 1.001008928 seconds time elapsed ( +- 0.00% )
>>>
>>
>> Hi Ingo,
>>
>> that sounds really good :-).
>>
>> AFAICS 'make deb-pkg' does not have support to build linux-tools
>> Debian package where perf is included.
>> Can I run an older version of perf or should I / have to try with the
>> one shipped in Linux v3.11-rc7+ sources?
>> How can I build perf standalone, out of my sources?
>>
>
> Hmm, I installed linux-tools-common (3.2.0-53.81).
>
> $ perf stat --null --repeat 5 ./t_lockref_from-linus
> perf_3.11.0-rc7 not found
> You may need to install linux-tools-3.11.0-rc7
>

[ Sorry for being off-topic ]

Hey Ingo,

can you help, please?

I installed so far all missing -dev packages...

$ sudo apt-get install libelf-dev libdw-dev libunwind7-dev libslang2-dev

...and then want a perf-only build...

[ See tools/Makefile ]

$ LANG=C LC_ALL=C make -C tools/ perf_install 2>&1 | tee ../perf_install-log.txt

This ends up like this:
...
make[2]: Entering directory
`/home/wearefam/src/linux-kernel/linux/tools/lib/traceevent'
make[2]: Leaving directory
`/home/wearefam/src/linux-kernel/linux/tools/lib/traceevent'
LINK perf
gcc: error: /home/wearefam/src/linux-kernel/linux/tools/lib/lk/liblk.a:
No such file or directory
make[1]: *** [perf] Error 1
make[1]: Leaving directory `/home/wearefam/src/linux-kernel/linux/tools/perf'
make: *** [perf_install] Error 2

$ LANG=C LC_ALL=C ll tools/lib/lk/
total 20
drwxr-xr-x 2 wearefam wearefam 4096 Aug 30 12:11 ./
drwxr-xr-x 4 wearefam wearefam 4096 Jul 11 19:42 ../
-rw-r--r-- 1 wearefam wearefam 1430 Aug 30 09:56 Makefile
-rw-r--r-- 1 wearefam wearefam 2144 Jul 11 19:42 debugfs.c
-rw-r--r-- 1 wearefam wearefam 619 Jul 11 19:42 debugfs.h

Why is liblk not built?

- Sedat -

P.S.: To clean perf build, run...

$ LANG=C LC_ALL=C make -C tools/ perf_clean

2013-08-30 10:36:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:29:34PM +0200, Sedat Dilek wrote:
> [ Sorry for being off-topic ]
>
> Hey Ingo,
>
> can you help, please?
>
> I installed so far all missing -dev packages...
>
> $ sudo apt-get install libelf-dev libdw-dev libunwind7-dev libslang2-dev

It seems to me the easier way is:

$ apt-get build-dep linux-tools

> ...and then want a perf-only build...
>
> [ See tools/Makefile ]
>
> $ LANG=C LC_ALL=C make -C tools/ perf_install 2>&1 | tee ../perf_install-log.txt

The way I always build that stuff is simply:

$ cd tools/perf
$ make -j
$ cp perf `which perf`
$ cd -

No idea about liblk though, never had that issue but maybe something
like:

$ cd tools/lib/lk/
$ make clean
$ make

will get you a more useful error. Dunno, it's a fairly trivial little
library, only a single .c file.

2013-08-30 10:38:23

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:29 PM, Sedat Dilek <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 11:58 AM, Sedat Dilek <[email protected]> wrote:
>> On Fri, Aug 30, 2013 at 11:56 AM, Sedat Dilek <[email protected]> wrote:
>>> On Fri, Aug 30, 2013 at 11:48 AM, Ingo Molnar <[email protected]> wrote:
>>>>
>>>> * Sedat Dilek <[email protected]> wrote:
>>>>
>>>>> On Fri, Aug 30, 2013 at 9:55 AM, Sedat Dilek <[email protected]> wrote:
>>>>> > On Fri, Aug 30, 2013 at 5:54 AM, Linus Torvalds
>>>>> > <[email protected]> wrote:
>>>>> >> [...]
>>>>> >
>>>>> > Following this thread with half an eye... Was that "unsigned" stuff
>>>>> > fixed (someone pointed to it).
>>>>> > How do you call that test-patch (subject)?
>>>>> > I would like to test it on my SNB ultrabook with your test-case script.
>>>>> >
>>>>>
>>>>> Here on Ubuntu/precise v12.04.3 AMD64 I get these numbers for total loops:
>>>>>
>>>>> lockref: w/o patch | w/ patch
>>>>> ======================
>>>>> Run #1: 2.688.094 | 2.643.004
>>>>> Run #2: 2.678.884 | 2.652.787
>>>>> Run #3: 2.686.450 | 2.650.142
>>>>> Run #4: 2.688.435 | 2.648.409
>>>>> Run #5: 2.693.770 | 2.651.514
>>>>>
>>>>> Average: 2687126,6 VS. 2649171,2 ( −37955,4 )
>>>>
>>>> For precise stddev numbers you can run it like this:
>>>>
>>>> perf stat --null --repeat 5 ./test
>>>>
>>>> and it will measure time only and print the stddev in percentage:
>>>>
>>>> Performance counter stats for './test' (5 runs):
>>>>
>>>> 1.001008928 seconds time elapsed ( +- 0.00% )
>>>>
>>>
>>> Hi Ingo,
>>>
>>> that sounds really good :-).
>>>
>>> AFAICS 'make deb-pkg' does not have support to build linux-tools
>>> Debian package where perf is included.
>>> Can I run an older version of perf or should I / have to try with the
>>> one shipped in Linux v3.11-rc7+ sources?
>>> How can I build perf standalone, out of my sources?
>>>
>>
>> Hmm, I installed linux-tools-common (3.2.0-53.81).
>>
>> $ perf stat --null --repeat 5 ./t_lockref_from-linus
>> perf_3.11.0-rc7 not found
>> You may need to install linux-tools-3.11.0-rc7
>>
>
> [ Sorry for being off-topic ]
>
> Hey Ingo,
>
> can you help, please?
>
> I installed so far all missing -dev packages...
>
> $ sudo apt-get install libelf-dev libdw-dev libunwind7-dev libslang2-dev
>
> ...and then want a perf-only build...
>
> [ See tools/Makefile ]
>
> $ LANG=C LC_ALL=C make -C tools/ perf_install 2>&1 | tee ../perf_install-log.txt
>
> This ends up like this:
> ...
> make[2]: Entering directory
> `/home/wearefam/src/linux-kernel/linux/tools/lib/traceevent'
> make[2]: Leaving directory
> `/home/wearefam/src/linux-kernel/linux/tools/lib/traceevent'
> LINK perf
> gcc: error: /home/wearefam/src/linux-kernel/linux/tools/lib/lk/liblk.a:
> No such file or directory
> make[1]: *** [perf] Error 1
> make[1]: Leaving directory `/home/wearefam/src/linux-kernel/linux/tools/perf'
> make: *** [perf_install] Error 2
>
> $ LANG=C LC_ALL=C ll tools/lib/lk/
> total 20
> drwxr-xr-x 2 wearefam wearefam 4096 Aug 30 12:11 ./
> drwxr-xr-x 4 wearefam wearefam 4096 Jul 11 19:42 ../
> -rw-r--r-- 1 wearefam wearefam 1430 Aug 30 09:56 Makefile
> -rw-r--r-- 1 wearefam wearefam 2144 Jul 11 19:42 debugfs.c
> -rw-r--r-- 1 wearefam wearefam 619 Jul 11 19:42 debugfs.h
>
> Why is liblk not built?
>
> - Sedat -
>
> P.S.: To clean perf build, run...
>
> $ LANG=C LC_ALL=C make -C tools/ perf_clean

Sorry for flooding...

The tools/perf-only build seems to be BROKEN in v3.11-rc7.

WORKAROUND:

$ sudo apt-get install libelf-dev libdw-dev libunwind7-dev
libslang2-dev libnuma-dev

$ LANG=C LC_ALL=C make -C tools/ liblk

$ LANG=C LC_ALL=C make -C tools/ perf_install

This works here.

- Sedat -

2013-08-30 10:44:32

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:36 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 12:29:34PM +0200, Sedat Dilek wrote:
>> [ Sorry for being off-topic ]
>>
>> Hey Ingo,
>>
>> can you help, please?
>>
>> I installed so far all missing -dev packages...
>>
>> $ sudo apt-get install libelf-dev libdw-dev libunwind7-dev libslang2-dev
>
> It seems to me the easier way is:
>
> $ apt-get build-dep linux-tools
>

No, see the Build-Depends (B-Ds) for the Ubuntu v3.2 kernel:

Build-Depends: dpkg (>= 1.13.19), debhelper (>= 5), gawk

http://archive.ubuntu.com/ubuntu/pool/main/l/linux-meta/linux-meta_3.2.0.52.62.dsc

>> ...and then want a perf-only build...
>>
>> [ See tools/Makefile ]
>>
>> $ LANG=C LC_ALL=C make -C tools/ perf_install 2>&1 | tee ../perf_install-log.txt
>
> The way I always build that stuff is simply:
>
> $ cd tools/perf
> $ make -j
> $ cp perf `which perf`
> $ cd -
>

Please see the advice in 'tools/Makefile':

@echo 'You can do:'
@echo ' $$ make -C tools/ <tool>_install'
@echo ''
@echo ' from the kernel command line to build and install one of'
@echo ' the tools above'
@echo ''
@echo ' $$ make tools/install'
@echo ''
@echo ' installs all tools.'


> No idea about liblk though, never had that issue but maybe something
> like:
>
> $ cd tools/lib/lk/
> $ make clean
> $ make
>
> will get you a more useful error. dunno, its a fairly trivial little
> library, only a single .c file.

I looked quickly over the various Makefiles; one says liblk is FORCE'd, so
it should be built, but it is NOT!

Thanks anyway, Peter for all the hints!

If you tell me where you discuss perf issues I can describe the problem.

- Sedat -

2013-08-30 10:46:55

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:44 PM, Sedat Dilek <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 12:36 PM, Peter Zijlstra <[email protected]> wrote:
>> On Fri, Aug 30, 2013 at 12:29:34PM +0200, Sedat Dilek wrote:
>>> [ Sorry for being off-topic ]
>>>
>>> Hey Ingo,
>>>
>>> can you help, please?
>>>
>>> I installed so far all missing -dev packages...
>>>
>>> $ sudo apt-get install libelf-dev libdw-dev libunwind7-dev libslang2-dev
>>
>> It seems to me the easier way is:
>>
>> $ apt-get build-dep linux-tools
>>
>
> NO, se B-Ds for Ubuntu-kernel v3.2:
>
> Build-Depends: dpkg (>= 1.13.19), debhelper (>= 5), gawk
>
> http://archive.ubuntu.com/ubuntu/pool/main/l/linux-meta/linux-meta_3.2.0.52.62.dsc
>
>>> ...and then want a perf-only build...
>>>
>>> [ See tools/Makefile ]
>>>
>>> $ LANG=C LC_ALL=C make -C tools/ perf_install 2>&1 | tee ../perf_install-log.txt
>>
>> The way I always build that stuff is simply:
>>
>> $ cd tools/perf
>> $ make -j
>> $ cp perf `which perf`
>> $ cd -
>>
>
> Please, see advices in 'tools/Makefile':
>
> @echo 'You can do:'
> @echo ' $$ make -C tools/ <tool>_install'
> @echo ''
> @echo ' from the kernel command line to build and install one of'
> @echo ' the tools above'
> @echo ''
> @echo ' $$ make tools/install'
> @echo ''
> @echo ' installs all tools.'
>
>
>> No idea about liblk though, never had that issue but maybe something
>> like:
>>
>> $ cd tools/lib/lk/
>> $ make clean
>> $ make
>>
>> will get you a more useful error. dunno, its a fairly trivial little
>> library, only a single .c file.
>
> If looked quickly over diverse Makefile one was saying liblk FORCE, so
> it should be built, but is NOT!
>
> Thanks anyway, Peter for all the hints!
>
> If you tell me where you discuss perf issues I can describe the problem.
>

Just as a sidenote for cleaning up (that seems to be the "official" way):

$ LANG=C LC_ALL=C make -C tools/ liblk_clean

$ LANG=C LC_ALL=C make -C tools/ perf_clean

- Sedat -

2013-08-30 10:52:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:44:29PM +0200, Sedat Dilek wrote:
> Please, see advices in 'tools/Makefile':
>
> @echo 'You can do:'
> @echo ' $$ make -C tools/ <tool>_install'
> @echo ''
> @echo ' from the kernel command line to build and install one of'
> @echo ' the tools above'
> @echo ''
> @echo ' $$ make tools/install'
> @echo ''
> @echo ' installs all tools.'

I never follow advice :-) Muwhaha, also I'm generally not interested in
'all' tools anyway.

> If you tell me where you discuss perf issues I can describe the problem.

On this list, start a new thread CC ingo, acme and me. There's also an
IRC channel but I keep forgetting what the official channel is and I'm
likely not on it. I do tend to abuse irc.oftc.net/#linux-rt for it since
most people are on there anyway.

2013-08-30 10:57:43

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:52 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 12:44:29PM +0200, Sedat Dilek wrote:
>> Please, see advices in 'tools/Makefile':
>>
>> @echo 'You can do:'
>> @echo ' $$ make -C tools/ <tool>_install'
>> @echo ''
>> @echo ' from the kernel command line to build and install one of'
>> @echo ' the tools above'
>> @echo ''
>> @echo ' $$ make tools/install'
>> @echo ''
>> @echo ' installs all tools.'
>
> I never follow advice :-) Muwhaha, also I'm generally not interested in
> 'all' tools anywya.
>
>> If you tell me where you discuss perf issues I can describe the problem.
>
> On this list, start a new thread CC ingo, acme and me. There's also an
> IRC channel but I keep forgetting what the official channel is and I'm
> likely not on it. I do tend to abuse irc.oftc.net/#linux-rt for it since
> most people are on there anyway.

Hi rebel :-)!

I have been away from IRC for about a year.
I will write some words...

- Sedat -

P.S.: It worked...

$ ~/src/linux-kernel/linux/tools/perf/perf --version
perf version 3.11.rc7.ga7370

$ ~/src/linux-kernel/linux/tools/perf/perf stat --null --repeat 5
./t_lockref_from-linus
Total loops: 2652351
Total loops: 2604876
Total loops: 2649696
Total loops: 2651417
Total loops: 2644068

Performance counter stats for './t_lockref_from-linus' (5 runs):

10,002926693 seconds time elapsed ( +- 0,00% )

$ cat /proc/version
Linux version 3.11.0-rc7-1-lockref-small
([email protected]@fambox) (gcc version 4.6.3 (Ubuntu/Linaro
4.6.3-1ubuntu5) ) #1 SMP Fri Aug 30 10:23:19 CEST 2013

2013-08-30 11:19:47

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:36 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 12:29:34PM +0200, Sedat Dilek wrote:
>> [ Sorry for being off-topic ]
>>
>> Hey Ingo,
>>
>> can you help, please?
>>
>> I installed so far all missing -dev packages...
>>
>> $ sudo apt-get install libelf-dev libdw-dev libunwind7-dev libslang2-dev
>
> It seems to me the easier way is:
>
> $ apt-get build-dep linux-tools
>

The B-Ds for latest linux-tools (3.11~rc4-1~exp1) in
Debian/experimental look like this:

Build-Depends: debhelper (>> 7), python, asciidoc, binutils-dev,
bison, flex, libdw-dev, libelf-dev, libnewt-dev, libperl-dev,
python-dev, xmlto, autoconf, automake, libtool, libglib2.0-dev,
libsysfs-dev, libwrap0-dev

- Sedat -

[1] http://ftp.de.debian.org/debian/pool/main/l/linux-tools/linux-tools_3.11~rc4-1~exp1.dsc

2013-08-30 14:05:08

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:57 PM, Sedat Dilek <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 12:52 PM, Peter Zijlstra <[email protected]> wrote:
>> On Fri, Aug 30, 2013 at 12:44:29PM +0200, Sedat Dilek wrote:

[...]

>>> If you tell me where you discuss perf issues I can describe the problem.
>>
>> On this list, start a new thread CC ingo, acme and me. There's also an
>> IRC channel but I keep forgetting what the official channel is and I'm
>> likely not on it. I do tend to abuse irc.oftc.net/#linux-rt for it since
>> most people are on there anyway.
>
> Hi rebel :-)!
>
> I was away from IRC for quite a year.
> I will write some words...
>

Here we go.

- Sedat -

[1] http://marc.info/?l=linux-kernel&m=137786590626809&w=2

2013-08-30 15:28:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:16 AM, Ingo Molnar <[email protected]> wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
>> > BTW. Do you have your test case at hand ?
>>
>> My test-case is a joke. It's explicitly *trying* to get as much
>> contention as possible on a dentry, by just starting up a lot of threads
>> that look up one single pathname (the same one for everybody). It
>> defaults to using /tmp for this, but you can specify the filename.
>
> Waiman's tests seemed to use sufficiently generic and varied workloads
> (AIM7) and they showed pretty nice unconditional improvements with his
> variant of this scheme, so I think testing with your simple testcase that
> intentionally magnifies the scalability issue is 100% legit and may in
> fact help tune the changes more accurately, because it has less inherent
> noise.

Yes. However, what I am (not very) worried about is that people will
hit some particular codepath that ends up having bad behavior.

I think I covered all the normal hotpaths in pathname lookup, which is
why I'm not *that* worried, but it's still the case that my silly
test-case is very limited. It's limited for a good *reason* (to try to
show the worst-case scalability problem), but it's limited.

Linus

2013-08-30 15:34:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 2:27 AM, Sedat Dilek <[email protected]> wrote:
>
> Here on Ubuntu/precise v12.04.3 AMD64 I get these numbers for total loops:
>
> lockref: w/o patch | w/ patch
> ======================
> Run #1: 2.688.094 | 2.643.004
> Run #2: 2.678.884 | 2.652.787
> Run #3: 2.686.450 | 2.650.142
> Run #4: 2.688.435 | 2.648.409
> Run #5: 2.693.770 | 2.651.514

Yes, so this is pretty much expected.

If you don't have a very high core count (you don't mention your
system, but that's pretty low - I get ~65 million repetitions in 10
seconds on my i5-670), the cmpxchg will not help - because you don't
actually see the bad "wait on spinlock" behavior in the first place.

And a "cmpxchg" is slightly slower than the very optimized spinlocks,
and has that annoying "read original value" first issue too. So the
patch can make things a bit slower, although it will depend on the
microarchitecture (and as mentioned elsewhere, there are other things
that can make a bigger difference boot-to-boot - dentry allocation
details etc can have "sticky" performance impact).

So we may take a small hit in order to then *not* have horrible
scalability at the high end.

Linus

2013-08-30 15:38:52

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 5:34 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Aug 30, 2013 at 2:27 AM, Sedat Dilek <[email protected]> wrote:
>>
>> Here on Ubuntu/precise v12.04.3 AMD64 I get these numbers for total loops:
>>
>> lockref: w/o patch | w/ patch
>> ======================
>> Run #1: 2.688.094 | 2.643.004
>> Run #2: 2.678.884 | 2.652.787
>> Run #3: 2.686.450 | 2.650.142
>> Run #4: 2.688.435 | 2.648.409
>> Run #5: 2.693.770 | 2.651.514
>
> Yes, so this is pretty much expected.
>
> If you don't have a very high core count (you don't mention your
> system, but that's pretty - I get ~65 million repetitions in 10
> seconds on my i5-670), the cmpxchg will not help - because you don't
> actually see the bad "wait on spinlock" behavior in the first place.
>
> And a "cmpxchg" is slightly slower than the very optimized spinlocks,
> and has that annoying "read original value" first issue too. So the
> patch can make things a bit slower, although it will depend on the
> microarchitecture (and as mentioned elsewhere, there are other things
> that can make a bigger difference boot-to-boot - dentry allocation
> details etc can have "sticky" performance impact).
>
> So we may take a small hit in order to then *not* have horrible
> scalability at the high end.
>

A Samsung series-5 ultrabook.

$ grep "model name" /proc/cpuinfo | uniq
model name : Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz

- Sedat -

2013-08-30 16:12:32

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, 30 Aug 2013 17:38:47 +0200
Sedat Dilek <[email protected]> wrote:


> A Samsung series-5 ultrabook.
>
> $ grep "model name" /proc/cpuinfo | uniq
> model name : Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz

I believe the number of CPUs is more important. But as this is an
ultrabook, I doubt that is very high.

Now I know this isn't going to be popular, but I'll suggest it anyway.
What about only implementing the lockref locking when CPUs are greater
than 7; 7 or fewer will still use the normal optimized spinlocks.

-- Steve

2013-08-30 16:16:25

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 6:12 PM, Steven Rostedt <[email protected]> wrote:
> On Fri, 30 Aug 2013 17:38:47 +0200
> Sedat Dilek <[email protected]> wrote:
>
>
>> A Samsung series-5 ultrabook.
>>
>> $ grep "model name" /proc/cpuinfo | uniq
>> model name : Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz
>
> I believe the number of CPUs is more important. But as this is an
> ultrabook, I doubt that is very high.
>

Fantastic Four.

> Now I know this isn't going to be popular, but I'll suggest it anyway.
> What about only implementing the lockref locking when CPUs are greater
> than 7, 7 or less will still use the normal optimized spinlocks.
>

I have seen that the spinlock-lockref stuff is more important on those
monster machines.
It's good to see it does not break "smaller" systems.

- Sedat -

2013-08-30 16:32:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 8:38 AM, Sedat Dilek <[email protected]> wrote:
>
> A Samsung series-5 ultrabook.
>
> $ grep "model name" /proc/cpuinfo | uniq
> model name : Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz

Hmm. Do you have debugging options enabled? Because that CPU should
have the same core count as mine (two+HT), a slightly smaller cache
(3M vs 4M) and runs at a noticeably lower frequency (1.6GHz vs 3.5).
It probably also has slower memory etc, but that should still make it
maybe half speed of mine. Not 1/20th.

As mentioned, I get numbers in the 65M range. Yours are under 2.7M.
Even with some thermal throttling, I would expect better than that.

My pixel (1.8GHz i5-3427U) should be a *bit* faster than yours. And I
get 54M iterations on that.

I saw you mentioned CONFIG_CMPXCHG_LOCKREF=y in your .config, so you
don't have spinlock debugging enabled, but maybe you have some other
expensive debug option enabled. Like DEBUG_PAGEALLOC etc.

If you get "perf" compiled, mind doing a

perf record -f -e cycles:pp ./a.out
perf report

on it and look what that says?

Linus

2013-08-30 16:37:09

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 6:32 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Aug 30, 2013 at 8:38 AM, Sedat Dilek <[email protected]> wrote:
>>
>> A Samsung series-5 ultrabook.
>>
>> $ grep "model name" /proc/cpuinfo | uniq
>> model name : Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz
>
> Hmm. Do you have debugging options enabled? Because that CPU should
> have the same core count as mine (two+HT), a slightly smaller cache
> (3M vs 4M) and runs at a noticeably lower frequency (1.6GHz vs 3.5).
> It probably also has slower memory etc, but that should still make it
> maybe half speed of mine. Not 1/20th.
>
> As mentioned, I get numbers in the 65M range. Yours are under 2.7M.
> Even with some thermal throttling, I would expect better than that.
>
> My pixel (1.8GHz i5-3427U) should be a *bit* faster than yours. And I
> get 54M iterations on that.
>
> I saw you mentioned CONFIG_CMPXCHG_LOCKREF=y in your .config, so you
> don't have spinlock debugging enabled, but maybe you have some other
> expensive debug option enabled. Like DEBUG_PAGEALLOC etc.
>

Yeah, this was built with some debug options enabled (see my attached kernel-config).

> If you get "perf" compiled, mind doing a
>
> perf record -f -e cycles:pp ./a.out
> perf report
>
> on it and look what that says?
>

Where does this a.out file come from, or how do I generate it?

- Sedat -


Attachments:
config-3.11.0-rc7-1-lockref-small (112.30 kB)

2013-08-30 16:52:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 9:37 AM, Sedat Dilek <[email protected]> wrote:
>
> Where is this a.out file from or how to generate it?

Oh, that's just the silly threaded test-binary. I don't know what you
called it.

As to your config options, yesh, you have some expensive stuff.
DEBUG_OBJECTS and DEBUG_MUTEXES in particular tend to cause lots of
horrible performance issues. I didn't check if there might be other
things..

Linus

2013-08-30 17:11:30

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 6:52 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Aug 30, 2013 at 9:37 AM, Sedat Dilek <[email protected]> wrote:
>>
>> Where is this a.out file from or how to generate it?
>
> Oh, that's just the silly threaded test-binary. I don't know what you
> called it.
>
> As to your config options, yesh, you have some expensive stuff.
> DEBUG_OBJECTS and DEBUG_MUTEXES in particular tend to cause lots of
> horrible performance issues. I didn't check if there might be other
> things..
>

There is no -f option for record, only for report, so I swapped them:

$ sudo ~/src/linux-kernel/linux/tools/perf/perf record -e cycles:pp
./scripts/t_lockref_from-linus
Total loops: 2240273
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 6.080 MB perf.data (~265641 samples) ]

$ sudo ~/src/linux-kernel/linux/tools/perf/perf report -f

Samples: 159K of event 'cycles:pp', Event count (approx.): 76535356682
84,10% t_lockref_from- [kernel.kallsyms] [k] check_poison_obj
5,22% t_lockref_from- [kernel.kallsyms] [k] memset
1,15% t_lockref_from- [kernel.kallsyms] [k] irq_return
0,45% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_alloc
0,44% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_free
0,36% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
0,35% t_lockref_from- [kernel.kallsyms] [k] cache_free_debugcheck
0,35% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
0,34% t_lockref_from- [kernel.kallsyms] [k] user_exit
0,33% t_lockref_from- [kernel.kallsyms] [k] __d_lookup_rcu
0,26% t_lockref_from- libc-2.15.so [.] __xstat64
0,25% t_lockref_from- [kernel.kallsyms] [k] poison_obj
0,24% t_lockref_from- [kernel.kallsyms] [k] local_clock
0,19% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
0,19% t_lockref_from- [kernel.kallsyms] [k] link_path_walk
0,19% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_enter_common.isra.43
0,19% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_exit_common.isra.41
0,17% t_lockref_from- [kernel.kallsyms] [k] native_read_tsc
0,17% t_lockref_from- [kernel.kallsyms] [k] user_enter
0,16% t_lockref_from- [kernel.kallsyms] [k] sched_clock_cpu
0,16% t_lockref_from- [kernel.kallsyms] [k] path_lookupat
0,14% t_lockref_from- [kernel.kallsyms] [k] vfs_getattr
0,14% t_lockref_from- [kernel.kallsyms] [k] lockref_put_or_lock
0,14% t_lockref_from- [kernel.kallsyms] [k] path_init
0,13% t_lockref_from- [kernel.kallsyms] [k] tracesys
0,13% t_lockref_from- [kernel.kallsyms] [k] native_sched_clock
0,13% t_lockref_from- [kernel.kallsyms] [k] strncpy_from_user
0,12% t_lockref_from- [kernel.kallsyms] [k] cp_new_stat
0,12% t_lockref_from- [kernel.kallsyms] [k] cache_alloc_debugcheck_after.isra.61
0,12% t_lockref_from- [kernel.kallsyms] [k] account_system_time
0,12% t_lockref_from- [kernel.kallsyms] [k] copy_user_generic_unrolled
0,12% t_lockref_from- [kernel.kallsyms] [k] syscall_trace_enter
0,12% t_lockref_from- [kernel.kallsyms] [k] jiffies_to_timeval
0,11% t_lockref_from- [kernel.kallsyms] [k] get_vtime_delta
0,11% t_lockref_from- t_lockref_from-linus [.] __stat
0,10% t_lockref_from- [kernel.kallsyms] [k] check_irq_off
0,10% t_lockref_from- [kernel.kallsyms] [k] common_perm
0,10% t_lockref_from- [kernel.kallsyms] [k] lookup_fast
0,09% t_lockref_from- [kernel.kallsyms] [k] getname_flags
0,09% t_lockref_from- [kernel.kallsyms] [k] syscall_trace_leave
0,08% t_lockref_from- t_lockref_from-linus [.] start_routine
0,08% t_lockref_from- [kernel.kallsyms] [k] vfs_fstatat
0,08% t_lockref_from- [kernel.kallsyms] [k] system_call_after_swapgs
0,08% t_lockref_from- [kernel.kallsyms] [k] user_path_at_empty
0,08% t_lockref_from- [kernel.kallsyms] [k] account_user_time
0,07% t_lockref_from- [kernel.kallsyms] [k] generic_fillattr
0,07% t_lockref_from- [kernel.kallsyms] [k] complete_walk
0,06% t_lockref_from- [kernel.kallsyms] [k] security_inode_getattr
0,06% t_lockref_from- [kernel.kallsyms] [k] _raw_spin_lock
0,06% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_exit
0,06% t_lockref_from- [kernel.kallsyms] [k] vtime_account_user
0,06% t_lockref_from- [kernel.kallsyms] [k] dput
0,06% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_enter
0,06% t_lockref_from- [kernel.kallsyms] [k] __virt_addr_valid

- Sedat -

2013-08-30 17:17:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Thu, Aug 29, 2013 at 09:43:07AM -0700, Linus Torvalds wrote:

> We'll see. The real problem is that I'm not sure if I can even see the
> scalability issue on any machine I actually personally want to use
> (read: silent). On my current system I can only get up to 15%
> _raw_spin_lock by just stat'ing the same file over and over and over
> again from lots of threads.

Yeah, silent basically limits you to i7 single socket systems and sadly
Intel doesn't seem to want to make those with more than 4 cores on :/

I've got a i7-K part (SNB iirc) with a _huge_ scythe cooler and a high
efficiency fanless PSU for a system that's near noiseless -- as in my
Thinkpad actually makes more noise.

2013-08-30 17:26:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 10:11 AM, Sedat Dilek <[email protected]> wrote:
>
> There is no -f option for record but for report, so I swapped them:

That's odd. "perf record -f" very much works for me, and I use it all
the time to make sure that I don't mix perf data from older runs..

[ Time passes. I check current 'perf'. Dammit. This has changed. I
seldom rebuild "perf", so I never noticed. It was removed in commit
4a4d371a4dfb, apparently because the old "append" mode no longer even
exists ]

But never mind, that doesn't matter for your numbers:

> $ sudo ~/src/linux-kernel/linux/tools/perf/perf report -f
>
> Samples: 159K of event 'cycles:pp', Event count (approx.): 76535356682
> 84,10% t_lockref_from- [kernel.kallsyms] [k] check_poison_obj
> 5,22% t_lockref_from- [kernel.kallsyms] [k] memset
> 1,15% t_lockref_from- [kernel.kallsyms] [k] irq_return
> 0,45% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_alloc
> 0,44% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_free

You're wasting all the time in slab debugging, probably due to the
object debug option.

None of the rest is even remotely interesting, and the spinlock hits
you do have are more likely to come from the same object debug code
than from pathname lookup.

Linus

2013-08-30 17:29:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 10:17 AM, Peter Zijlstra <[email protected]> wrote:
>
> Yeah, silent basically limits you to i7 single socket systems and sadly
> Intel doesn't seem to want to make those with more than 4 cores on :/

Yup. And even if they had more cores in a single socket, the real
scalability issues won't happen until you start crossing sockets and
serialization slows down by a big amount due to cachelines moving
outside the die.

> I've got a i7-K part (SNB iirc) with a _huge_ scythe cooler and a high
> efficiency fanless PSU for a system that's near noiseless -- as in my
> Thinkpad actually makes more noise.

I've got a 4770S on order, it should arrive tomorrow. It's the 65W
part, and it has TSX. But no, I doubt I'll see any real scalability
issues with it, but at least I can test any TSX codepaths.

Linus

2013-08-30 17:33:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 10:28 AM, Linus Torvalds
<[email protected]> wrote:
>
> I've got a 4770S on order, it should arrive tomorrow. It's the 65W
> part, and it has TSX. But no, I doubt I'll see any real scalability
> issues with it, but at least I can test any TSX codepaths.

Side note: whatever marketing person inside Intel that decided that
the "K" parts shouldn't get TSX-NI support should be fired. Or at
least have a stern talking to (and by "stern" I obviously mean that
thumbscrews and hot pins under their nails should be involved).

If I wanted to build a peak performance machine (rather than a silent
one), I wouldn't have had TSX. What the hell is wrong with Intel
marketing? Their "lets fragment things" crap is making it harder to
actually support their new technologies.

Linus

2013-08-30 18:33:32

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/29/2013 11:54 PM, Linus Torvalds wrote:
> On Thu, Aug 29, 2013 at 8:12 PM, Waiman Long<[email protected]> wrote:
>> On 08/29/2013 07:42 PM, Linus Torvalds wrote:
>>> Waiman? Mind looking at this and testing? Linus
>> Sure, I will try out the patch tomorrow morning and see how it works out for
>> my test case.
> Ok, thanks, please use this slightly updated patch attached here.
>
>

I tested your patch on a 2-socket (12 cores, 24 threads) DL380 with
2.9GHz Westmere-EX CPUs, the test results of your test program (max
threads increased to 24 to match the thread count) were:

with patch = 68M
w/o patch = 12M

So it was almost a 6X improvement. I think that is really good.
Dual-socket machines, these days, shouldn't be considered "BIG"
machines. They are pretty common in different organizations.

I have reviewed the patch, and it looks good to me with the exception
that I added a cpu_relax() call at the end of the while loop in the
CMPXCHG_LOOP macro.
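
For readers following along, here is a rough sketch of what such a
cmpxchg fast path with the extra cpu_relax() back-off looks like. It is
an illustration only, not the actual code from the patch: the helper
name lockref_inc_fast() is made up, and the layout assumes a 64-bit
build with a 4-byte spinlock_t (i.e. no lock debugging).

struct lockref {
	union {
		u64 lock_count;			/* whole <lock, count> word */
		struct {
			spinlock_t lock;	/* assumes a 4-byte spinlock_t */
			unsigned int count;
		};
	};
};

/* Returns 1 if the refcount was bumped without touching the spinlock. */
static int lockref_inc_fast(struct lockref *lr)
{
	struct lockref old, new;

	old.lock_count = ACCESS_ONCE(lr->lock_count);
	while (arch_spin_value_unlocked(old.lock.rlock.raw_lock)) {
		u64 prev;

		new = old;
		new.count++;			/* the refcount half of the word */
		prev = cmpxchg(&lr->lock_count,	/* 64-bit compare-and-swap */
			       old.lock_count, new.lock_count);
		if (prev == old.lock_count)
			return 1;		/* lockless update won */
		old.lock_count = prev;		/* raced, retry from the new value */
		cpu_relax();			/* the back-off Waiman suggests */
	}
	return 0;				/* lock held: take the slow path */
}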

I also got the perf data of the test runs with and without the patch.

With patch:

29.24% a.out [kernel.kallsyms] [k] lockref_get_or_lock
19.65% a.out [kernel.kallsyms] [k] lockref_put_or_lock
14.11% a.out [kernel.kallsyms] [k] dput
5.37% a.out [kernel.kallsyms] [k] __d_lookup_rcu
5.29% a.out [kernel.kallsyms] [k] lg_local_lock
4.59% a.out [kernel.kallsyms] [k] d_rcu_to_refcount
:
0.13% a.out [kernel.kallsyms] [k] complete_walk
:
0.01% a.out [kernel.kallsyms] [k] _raw_spin_lock

Without patch:

93.50% a.out [kernel.kallsyms] [k] _raw_spin_lock
0.96% a.out [kernel.kallsyms] [k] dput
0.80% a.out [kernel.kallsyms] [k] kmem_cache_free
0.75% a.out [kernel.kallsyms] [k] lg_local_lock
0.48% a.out [kernel.kallsyms] [k] complete_walk
0.45% a.out [kernel.kallsyms] [k] __d_lookup_rcu

For the other test cases that I am interested in, like the AIM7
benchmark, your patch may not be as good as my original one. I got 1-3M
JPM (varied quite a lot in different runs) in the short workloads on a
80-core system. My original one got 6M JPM. However, the test was done
on 3.10 based kernel. So I need to do more test to see if that has an
effect on the JPM results.

Anyway, I think this patch is good performance-wise. I remember that a
while ago someone internally reported a lock contention problem in the
dentry code, probably involving complete_walk(). This patch will
certainly help for that case.

I will do more investigation to see how to make this patch work better
for my test cases.

Thanks for taking the effort to optimize the complete_walk() and
unlazy_walk() functions, which were not covered in my original patch.
That will make the patch work even better under more circumstances. I
really appreciate that.

Best regards,
Longman


2013-08-30 18:42:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 9:12 AM, Steven Rostedt <[email protected]> wrote:
>
> Now I know this isn't going to be popular, but I'll suggest it anyway.
> What about only implementing the lockref locking when CPUs are greater
> than 7, 7 or less will still use the normal optimized spinlocks.

I considered it. It's not hugely difficult to do, in that we could
make it a static key thing, but I'd actually rather make it depend on
some actual user-settable thing than on some arbitrary number of
cpu's.

See the CMPXCHG_LOOP() macro in lib/lockref.c: it would be easy to
just enclose the whole thing in a

if (static_key_enabled(&cmpxchg_lockref)) { .. }

and then it could be enabled/disabled at will with very little
performance downside. And I don't think it's necessarily a bad idea.
The code has a very natural "fall back to spinlock" model.
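
Spelled out a little more, the gate could look like the sketch below;
only the static_key_enabled() test comes from the paragraph above, while
the STATIC_KEY_INIT_TRUE default and the reuse of the hypothetical
lockref_inc_fast() helper from the earlier sketch are assumptions for
illustration.

static struct static_key cmpxchg_lockref = STATIC_KEY_INIT_TRUE;

void lockref_get(struct lockref *lockref)
{
	if (static_key_enabled(&cmpxchg_lockref)) {
		if (lockref_inc_fast(lockref))	/* cmpxchg loop, as sketched earlier */
			return;
	}
	/* key disabled, or the lock was held: old lock -> update -> unlock path */
	spin_lock(&lockref->lock);
	lockref->count++;
	spin_unlock(&lockref->lock);
}

Flipping the key at runtime would then switch between the two paths for
benchmarking the same boot, as described above.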

THAT SAID.

Even though uncontended spinlocks are faster than a cmpxchg, under any
real normal load I don't think you can necessarily measure the
difference. Remember: this particular benchmark does absolutely
*nothing* but pathname lookups, and even then it's pretty close to
noise. And the biggest disadvantage of cmpxchg - the fact that you
have to read the cache line before you do the r-m-w cycle, and thus
might have an extra cache coherency cycle - shouldn't be an issue for
the dentry use when you don't try to hit the same dentry over and over
again, because the code has already read the dentry hash etc.

So I'm not sure it's really worth it. It might be interesting to try
that static_key approach simply for benchmarking, though. That way you
could benchmark the exact same boot with pretty much the exact same
dentry population, just switch the static key around and run a few
path-intensive benchmarks.

If anybody is willing to write the patch and do the benchmarking (I
would suggest *not* using my idiotic test-program for this), and then
send it to me with numbers, that would be interesting...

Linus

2013-08-30 18:53:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 11:33 AM, Waiman Long <[email protected]> wrote:
>
> I tested your patch on a 2-socket (12 cores, 24 threads) DL380 with 2.9GHz
> Westmere-EX CPUs, the test results of your test program (max threads
> increased to 24 to match the thread count) were:
>
> with patch = 68M
> w/o patch = 12M

Ok, that's certainly noticeable.

> I have reviewed the patch, and it looks good to me with the exception that I
> added a cpu_relax() call at the end of while loop in the CMPXCHG_LOOP macro.

Yeah, that's probably a good idea.

> I also got the perf data of the test runs with and without the patch.

So the perf data would be *much* more interesting for a more varied
load. I know pretty much exactly what happens with my silly
test-program, and as you can see it never really gets to the actual
spinlock, because that test program will only ever hit the fast-path
case.

It would be much more interesting to see another load that may trigger
the d_lock actually being taken. So:

> For the other test cases that I am interested in, like the AIM7 benchmark,
> your patch may not be as good as my original one. I got 1-3M JPM (varied
> quite a lot in different runs) in the short workloads on a 80-core system.
> My original one got 6M JPM. However, the test was done on 3.10 based kernel.
> So I need to do more test to see if that has an effect on the JPM results.

I'd really like to see a perf profile of that, particularly with some
call chain data for the relevant functions (ie "what it is that causes
us to get to spinlocks"). Because it may well be that you're hitting
some of the cases that I didn't see, and thus didn't notice.

In particular, I suspect AIM7 actually creates/deletes files and/or
renames them too. Or maybe I screwed up the dget_parent() special case
thing, which mattered because AIM7 did a lot of getcwd() calls or
something odd like that.

Linus

2013-08-30 19:21:09

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/30/2013 02:53 PM, Linus Torvalds wrote:
> So the perf data would be *much* more interesting for a more varied
> load. I know pretty much exactly what happens with my silly
> test-program, and as you can see it never really gets to the actual
> spinlock, because that test program will only ever hit the fast-path
> case. It would be much more interesting to see another load that may
> trigger the d_lock actually being taken. So:
>> For the other test cases that I am interested in, like the AIM7 benchmark,
>> your patch may not be as good as my original one. I got 1-3M JPM (varied
>> quite a lot in different runs) in the short workloads on a 80-core system.
>> My original one got 6M JPM. However, the test was done on 3.10 based kernel.
>> So I need to do more test to see if that has an effect on the JPM results.
> I'd really like to see a perf profile of that, particularly with some
> call chain data for the relevant functions (ie "what it is that causes
> us to get to spinlocks"). Because it may well be that you're hitting
> some of the cases that I didn't see, and thus didn't notice.
>
> In particular, I suspect AIM7 actually creates/deletes files and/or
> renames them too. Or maybe I screwed up the dget_parent() special case
> thing, which mattered because AIM7 did a lot of getcwd() calls or
> someting odd like that.
>
> Linus

Below is the perf data of my short workloads run in an 80-core DL980:

13.60% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|--48.79%-- tty_ldisc_try
|--48.58%-- tty_ldisc_deref
--2.63%-- [...]

11.31% swapper [kernel.kallsyms] [k] intel_idle
|--99.94%-- cpuidle_enter_state
--0.06%-- [...]

4.86% reaim [kernel.kallsyms] [k] lg_local_lock
|--59.41%-- mntput_no_expire
|--19.37%-- path_init
|--15.14%-- d_path
|--5.88%-- sys_getcwd
--0.21%-- [...]

3.00% reaim reaim [.] mul_short

2.41% reaim reaim [.] mul_long
|--87.21%-- 0xbc614e
--12.79%-- (nil)

2.29% reaim reaim [.] mul_int

2.20% reaim [kernel.kallsyms] [k] _raw_spin_lock
|--12.81%-- prepend_path
|--9.90%-- lockref_put_or_lock
|--9.62%-- __rcu_process_callbacks
|--8.77%-- load_balance
|--6.40%-- lockref_get
|--5.55%-- __mutex_lock_slowpath
|--4.85%-- __mutex_unlock_slowpath
|--4.83%-- inet_twsk_schedule
|--4.27%-- lockref_get_or_lock
|--2.19%-- task_rq_lock
|--2.13%-- sem_lock
|--2.09%-- scheduler_tick
|--1.88%-- try_to_wake_up
|--1.53%-- kmem_cache_free
|--1.30%-- unix_create1
|--1.22%-- unix_release_sock
|--1.21%-- process_backlog
|--1.11%-- unix_stream_sendmsg
|--1.03%-- enqueue_to_backlog
|--0.85%-- rcu_accelerate_cbs
|--0.79%-- unix_dgram_sendmsg
|--0.76%-- do_anonymous_page
|--0.70%-- unix_stream_recvmsg
|--0.69%-- unix_stream_connect
|--0.64%-- net_rx_action
|--0.61%-- tcp_v4_rcv
|--0.59%-- __do_fault
|--0.54%-- new_inode_pseudo
|--0.52%-- __d_lookup
--10.62%-- [...]

1.19% reaim [kernel.kallsyms] [k] mspin_lock
|--99.82%-- __mutex_lock_slowpath
--0.18%-- [...]

1.01% reaim [kernel.kallsyms] [k] lg_global_lock
|--51.62%-- __shmdt
--48.38%-- __shmctl

There is more contention in the lglock than I remember from the run on
3.10. This is an area that I need to look at. In fact, lglock is
becoming a problem for really large machines with a lot of cores. We have
a prototype 16-socket machine with 240 cores under development. The cost
of doing a lg_global_lock will be very high on that type of machine,
given that it is already high on this 80-core machine. I have been
thinking that instead of per-cpu spinlocks, we could change the locking
to the per-node level. While there will be more contention for
lg_local_lock, the cost of doing a lg_global_lock will be much lower, and
contention within the local die should not be too bad. That will require
either a per-node variable infrastructure or simulating it with the
existing per-cpu subsystem.
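
For illustration, a very rough sketch of what such a per-node variant
could look like. The nglock name, the flat per-node array and the
helpers are made up; lockdep annotations, CPU hotplug and memory
footprint concerns are all ignored here.

struct nglock {
	spinlock_t node_lock[MAX_NUMNODES];	/* one lock per NUMA node */
};

/* Lock "our" node's lock; the caller passes the returned node to the unlock. */
static inline int ng_local_lock(struct nglock *ng)
{
	int node = numa_node_id();

	spin_lock(&ng->node_lock[node]);
	return node;
}

static inline void ng_local_unlock(struct nglock *ng, int node)
{
	spin_unlock(&ng->node_lock[node]);
}

/* The global side sweeps a handful of node locks instead of one per CPU. */
static inline void ng_global_lock(struct nglock *ng)
{
	int node;

	for_each_node(node)
		spin_lock(&ng->node_lock[node]);
}

static inline void ng_global_unlock(struct nglock *ng)
{
	int node;

	for_each_node(node)
		spin_unlock(&ng->node_lock[node]);
}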

I will also need to look at ways to reduce the need to take d_lock in
existing code. One area that I am looking at is whether we can take out
the lock/unlock pair in prepend_path(). This function can only be called
with the rename_lock taken. So no filename change or deletion will be
allowed. It will only be a problem if somehow the dentry itself got
killed or dropped while the name is being copied out. The first dentry
referenced by the path structure should have a non-zero reference count,
so that shouldn't happen. I am not so sure about the parents of that
dentry as I am not so familiar with that part of the filesystem code.

Regards,
Longman

2013-08-30 19:33:55

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 12:20 PM, Waiman Long <[email protected]> wrote:
>
> Below is the perf data of my short workloads run in an 80-core DL980:

Ok, that doesn't look much like d_lock any more. Sure, there's a small
amount of spinlocking going on with lockref being involved, but on the
whole even that looks more like getcwd and other random things.

I do agree that getcwd() can probably be hugely optimized. Nobody has
ever bothered, because it's never really performance-critical, and I
think AIM7 ends up just doing something really odd. I bet we could fix
it entirely if we cared enough.

I just wonder if it's even worth it (I assume AIM7 is something HP
uses internally, because I've never really heard of anybody else
caring)

But I'll look at getcwd anyway.

Linus

2013-08-30 19:41:19

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 03:20:48PM -0400, Waiman Long wrote:

> There are more contention in the lglock than I remember for the run
> in 3.10. This is an area that I need to look at. In fact, lglock is
> becoming a problem for really large machine with a lot of cores. We
> have a prototype 16-socket machine with 240 cores under development.
> The cost of doing a lg_global_lock will be very high in that type of
> machine given that it is already high in this 80-core machine. I
> have been thinking about instead of per-cpu spinlocks, we could
> change the locking to per-node level. While there will be more
> contention for lg_local_lock, the cost of doing a lg_global_lock
> will be much lower and contention within the local die should not be
> too bad. That will require either a per-node variable infrastructure
> or simulated with the existing per-cpu subsystem.

Speaking of lglock, there's a low-hanging fruit in that area: we have
no reason whatsoever to put anything but regular files with FMODE_WRITE
on the damn per-superblock list - the *only* thing it's used for is
mark_files_ro(), which will skip everything except those. And since
read opens normally outnumber the writes quite a bit... Could you
try the diff below and see if it changes the picture? files_lglock
situation ought to get better...

diff --git a/fs/file_table.c b/fs/file_table.c
index b44e4c5..322cd37 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -385,6 +385,10 @@ static inline void __file_sb_list_add(struct file *file, struct super_block *sb)
*/
void file_sb_list_add(struct file *file, struct super_block *sb)
{
+ if (likely(!(file->f_mode & FMODE_WRITE)))
+ return;
+ if (!S_ISREG(file_inode(file)->i_mode))
+ return;
lg_local_lock(&files_lglock);
__file_sb_list_add(file, sb);
lg_local_unlock(&files_lglock);
@@ -450,8 +454,6 @@ void mark_files_ro(struct super_block *sb)

lg_global_lock(&files_lglock);
do_file_list_for_each_entry(sb, f) {
- if (!S_ISREG(file_inode(f)->i_mode))
- continue;
if (!file_count(f))
continue;
if (!(f->f_mode & FMODE_WRITE))

2013-08-30 19:53:09

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/30/2013 03:40 PM, Al Viro wrote:
> On Fri, Aug 30, 2013 at 03:20:48PM -0400, Waiman Long wrote:
>
>> There are more contention in the lglock than I remember for the run
>> in 3.10. This is an area that I need to look at. In fact, lglock is
>> becoming a problem for really large machine with a lot of cores. We
>> have a prototype 16-socket machine with 240 cores under development.
>> The cost of doing a lg_global_lock will be very high in that type of
>> machine given that it is already high in this 80-core machine. I
>> have been thinking about instead of per-cpu spinlocks, we could
>> change the locking to per-node level. While there will be more
>> contention for lg_local_lock, the cost of doing a lg_global_lock
>> will be much lower and contention within the local die should not be
>> too bad. That will require either a per-node variable infrastructure
>> or simulated with the existing per-cpu subsystem.
> Speaking of lglock, there's a low-hanging fruit in that area: we have
> no reason whatsoever to put anything but regular files with FMODE_WRITE
> on the damn per-superblock list - the *only* thing it's used for is
> mark_files_ro(), which will skip everything except those. And since
> read opens normally outnumber the writes quite a bit... Could you
> try the diff below and see if it changes the picture? files_lglock
> situation ought to get better...
>
>

Sure. I will try that out, but it probably won't help too much in this
test case. The perf profile that I sent out in my previous mail is only
partial. The actual one for lg_global_lock was:

1.01% reaim [kernel.kallsyms] [k] lg_global_lock
|
--- lg_global_lock
mntput_no_expire
mntput
__fput
____fput
task_work_run
do_notify_resume
int_signal
|
|--51.62%-- __shmdt
|
--48.38%-- __shmctl

So it is the mntput_no_expire() function that is doing all the
lg_global_lock() calls.

Regards,
Longman

2013-08-30 20:15:32

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/30/2013 03:33 PM, Linus Torvalds wrote:
> On Fri, Aug 30, 2013 at 12:20 PM, Waiman Long<[email protected]> wrote:
>> Below is the perf data of my short workloads run in an 80-core DL980:
> Ok, that doesn't look much like d_lock any more. Sure, there's a small
> amount of spinlocking going on with lockref being involved, but on the
> whole even that looks more like getcwd and other random things.

Yes, d_lock contention isn't a major one in the perf profile. However,
sometimes a small improvement can lead to a noticeable improvement in
performance.
> I do agree that getcwd() can probably be hugely optimized. Nobody has
> ever bothered, because it's never really performance-critical, and I
> think AIM7 ends up just doing something really odd. I bet we could fix
> it entirely if we cared enough.
The prepend_path() isn't all due to getcwd. The correct profile should be:

|--12.81%-- prepend_path
|          |
|          |--67.35%-- d_path
|          |          |
|          |          |--60.72%-- proc_pid_readlink
|          |          |           sys_readlinkat
|          |          |           sys_readlink
|          |          |           system_call_fastpath
|          |          |           __GI___readlink
|          |          |           0x302f64662f666c
|          |          |
|          |           --39.28%-- perf_event_mmap_event
|          |
|           --32.65%-- sys_getcwd
|                      system_call_fastpath
|                      __getcwd

Yes, the perf subsystem itself can contribute a sizeable portion of the
spinlock contention. In fact, I have also applied my seqlock patch that
was sent a while ago to the test kernel in order to get a more accurate
perf profile. The seqlock patch will allow concurrent d_path() calls
without one blocking the others. On the 240-core prototype machine, it
was not possible to get an accurate perf profile for some workloads
because more than 50% of the time was spent in spinlock contention due
to the use of perf. An accurate perf profile can only be obtained in
those cases by applying my lockref and seqlock patches. I hope someone
will have the time to review my seqlock patch to see what additional
changes will be needed. I would really like to see it merged in some
form into 3.12.

> I just wonder if it's even worth it (I assume AIM7 is something HP
> uses internally, because I've never really heard of anybody else
> caring)

Our performance group is actually pretty new. It was formed 2 years ago
and we began actively participating in the Linux kernel development just
in the past year.

We use the AIM7 benchmark internally primarily because it is easy to run
and covers quite a lot of different areas in the kernel. We are also
using specJBB and SwingBench for performance benchmarking. We are
looking for more benchmarks to use in the future.

Regards,
Longman

2013-08-30 20:26:21

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 03:52:49PM -0400, Waiman Long wrote:

> So it is the mntput_no_expire() function that is doing all the
> lg_global_lock() calls.

Interesting... So you are getting a lot of mntput() with ->mnt_ns being
NULL? I wonder which type it is... Note that anything mounted will
have non-NULL ->mnt_ns until umount and anything obtained via
kern_mount/kern_mount_data will also have a non-NULL ->mnt_ns - until
kern_unmount().

Could you try to gather stats of that sort? Normally that path should
be only hit by fput() when we have a lazy-unmounted fs and close an opened
file on it...

I see one potential stupidity in that area (simple_pin_fs() ought to set
->mnt_ns, with simple_release_fs() clearing it), but there's not a lot
of fs types that would use the damn thing...

2013-08-30 20:36:07

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/30/2013 04:26 PM, Al Viro wrote:
> On Fri, Aug 30, 2013 at 03:52:49PM -0400, Waiman Long wrote:
>
> >> So it is the mntput_no_expire() function that is doing all the
>> lg_global_lock() calls.
> Interesting... So you are getting a lot of mntput() with ->mnt_ns being
> NULL? I wonder which type it is... Note that anything mounted will
> have non-NULL ->mnt_ns until umount and anything obtained via
> kern_mount/kern_mount_data will also have a non-NULL ->mnt_ns - until
> kern_unmount().
>
> Could you try to gather stats of that sort? Normally that path should
> be only hit by fput() when we have a lazy-unmounted fs and close an opened
> file on it...
>
> I see one potential stupidity in that area (simple_pin_fs() ought to set
> ->mnt_ns, with simple_release_fs() clearing it), but there's not a lot
> of fs types that would use the damn thing...

The AIM7 test was run on a set of 16 ramdisks formatted with the ext3
filesystem with the following mount options:
barrier=0,async,noatime,nodiratime. Maybe that is a factor.

Regards,
Longman

2013-08-30 20:43:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 1:15 PM, Waiman Long <[email protected]> wrote:
>
> The prepend_path() isn't all due to getcwd. The correct profile should be

Ugh. I really think that prepend_path() should just be rewritten to
run entirely under RCU.

Then we can remove *all* the stupid locking, and replace it with doing
a read-lock on the rename sequence count, and repeating if required.

That shouldn't even be hard to do, it just requires mindless massaging
and being careful.

Linus

2013-08-30 20:49:03

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 04:35:49PM -0400, Waiman Long wrote:

> The AIM7 test was run on a set of 16 ramdisk formated with ext3
> filesystem with the following mount options:
> barrier=0,async,noatime,nodiratime. Maybe that is a factor.

I would be really surprised if it was... Could you slap the following
into __fput():

	struct mount *m = real_mount(mnt);
	if (unlikely(!m->mnt_ns)) {
		printk(KERN_INFO "type = %s",
			mnt->mnt_sb->s_type->name);
		WARN_ON(1);
	}
and see what it catches? That'll need #include "fs/mount.h" in
fs/file_table.c to compile...

2013-08-30 20:54:12

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 01:43:11PM -0700, Linus Torvalds wrote:
> On Fri, Aug 30, 2013 at 1:15 PM, Waiman Long <[email protected]> wrote:
> >
> > The prepend_path() isn't all due to getcwd. The correct profile should be
>
> Ugh. I really think that prepend_path() should just be rewritten to
> run entirely under RCU.
>
> Then we can remove *all* the stupid locking, and replace it with doing
> a read-lock on the rename sequence count, and repeating if requited.
>
> That shouldn't even be hard to do, it just requires mindless massaging
> and being careful.

Not really. Sure, you'll retry it if you race with d_move(); that's not
the real problem - access past the end of the object containing ->d_name.name
would screw you and that's what ->d_lock is preventing there. Delayed freeing
of what ->d_name is pointing into is fine, but it's not the only way to get
hurt there...

2013-08-30 21:04:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 1:54 PM, Al Viro <[email protected]> wrote:
>
> Not really. Sure, you'll retry it if you race with d_move(); that's not
> the real problem - access past the end of the object containing ->d_name.name
> would screw you and that's what ->d_lock is preventing there. Delayed freeing
> of what ->d_name is pointing into is fine, but it's not the only way to get
> hurt there...

Umm? We follow d->d_name.name without d_lock under RCU all the time -
that's what the pathname lookup is all about, after all.

Yes, yes, you have to be careful and cannot just blindly trust the
length: you also have to check for NUL character as you are copying it
and stop if you hit it. But that's trivial.

Why would d_prepend be any different?

Linus
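
A bare-bones sketch of that kind of careful copy follows; the function
name is made up, and the %pd patch later in this thread does essentially
the same thing, plus field-width padding.

static size_t copy_dentry_name_rcu(char *dst, size_t dstlen,
				   const struct dentry *dentry)
{
	const char *s;
	size_t n = 0;
	char c;

	rcu_read_lock();
	s = ACCESS_ONCE(dentry->d_name.name);
	while (n + 1 < dstlen && (c = *s++) != 0)
		dst[n++] = c;		/* stop at NUL even if d_move() races */
	rcu_read_unlock();
	dst[n] = '\0';
	return n;
}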

2013-08-30 21:11:04

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/30/2013 04:54 PM, Al Viro wrote:
> On Fri, Aug 30, 2013 at 01:43:11PM -0700, Linus Torvalds wrote:
>> On Fri, Aug 30, 2013 at 1:15 PM, Waiman Long<[email protected]> wrote:
>>> The prepend_path() isn't all due to getcwd. The correct profile should be
>> Ugh. I really think that prepend_path() should just be rewritten to
>> run entirely under RCU.
>>
>> Then we can remove *all* the stupid locking, and replace it with doing
>> a read-lock on the rename sequence count, and repeating if requited.
>>
>> That shouldn't even be hard to do, it just requires mindless massaging
>> and being careful.
> Not really. Sure, you'll retry it if you race with d_move(); that's not
> the real problem - access past the end of the object containing ->d_name.name
> would screw you and that's what ->d_lock is preventing there. Delayed freeing
> of what ->d_name is pointing into is fine, but it's not the only way to get
> hurt there...

Actually, prepend_path() was called with rename_lock taken. So d_move()
couldn't be run at the same time. Am I right?

Regards,
Longman

2013-08-30 21:23:00

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 2:10 PM, Waiman Long <[email protected]> wrote:
>
> Actually, prepend_path() was called with rename_lock taken. So d_move()
> couldn't be run at the same time. Am I right?

Al was discussing the case I mentioned: getting rid of that lock
entirely, running it all just under RCU, and then just checking the
rename sequence count around it all and retrying if required.

It would have the advantage of not only not having to get the lock,
but by doing it as an RCU walk, we would avoid all the nasty reference
counting costs too. We wouldn't even need to get refcounts on the
root/pwd entries (which currently cost us quite a bit), since we could
just check the sequence number in "struct fs_struct" too. That also
gets rid of the necessity for the fs->lock spinlock.

You do have to be a bit careful when following the dentry pointers
under RCU (and you cannot just do a "memcpy()" on the name, as Al
points out), but it really doesn't look nasty. It just looks "you have
to be careful".

Linus
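
The overall shape Linus describes would be something like the sketch
below. The __prepend_path_rcu() helper is hypothetical and the actual
RCU walk of ->d_parent is elided; the fs_struct sequence count he
mentions is only hinted at in a comment.

static int get_path_rcu(const struct path *path, const struct path *root,
			char *buf, int buflen)
{
	unsigned seq;
	int error;

	rcu_read_lock();
	do {
		seq = read_seqbegin(&rename_lock);
		/*
		 * Walk ->d_parent and copy the names without taking any
		 * d_lock; a similar check against fs->seq could replace
		 * the refcounts on the root/pwd dentries.
		 */
		error = __prepend_path_rcu(path, root, buf, buflen);
	} while (read_seqretry(&rename_lock, seq));
	rcu_read_unlock();

	return error;
}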

2013-08-30 21:30:40

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 05:10:45PM -0400, Waiman Long wrote:
> On 08/30/2013 04:54 PM, Al Viro wrote:
> >On Fri, Aug 30, 2013 at 01:43:11PM -0700, Linus Torvalds wrote:
> >>On Fri, Aug 30, 2013 at 1:15 PM, Waiman Long<[email protected]> wrote:
> >>>The prepend_path() isn't all due to getcwd. The correct profile should be
> >>Ugh. I really think that prepend_path() should just be rewritten to
> >>run entirely under RCU.
> >>
> >>Then we can remove *all* the stupid locking, and replace it with doing
^^^^^^^^^^^^^^^^^^^^^
> >>a read-lock on the rename sequence count, and repeating if requited.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>
> >>That shouldn't even be hard to do, it just requires mindless massaging
> >>and being careful.
> >Not really. Sure, you'll retry it if you race with d_move(); that's not
> >the real problem - access past the end of the object containing ->d_name.name
> >would screw you and that's what ->d_lock is preventing there. Delayed freeing
> >of what ->d_name is pointing into is fine, but it's not the only way to get
> >hurt there...
>
> Actually, prepend_path() was called with rename_lock taken. So
> d_move() couldn't be run at the same time. Am I right?

See above. You are right, but if Linus wants to turn that sucker into
reader (which is possible - see e.g. cifs build_path_from_dentry() and its
ilk), d_move() races will start to play.

2013-08-30 21:43:15

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/30/2013 05:30 PM, Al Viro wrote:
> On Fri, Aug 30, 2013 at 05:10:45PM -0400, Waiman Long wrote:
> See above. You are right, but if Linus wants to turn that sucker into
> reader (which is possible - see e.g. cifs build_path_from_dentry() and
> its ilk), d_move() races will start to play.

Thank for the clarification.

-Longman

2013-08-30 21:45:03

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 02:03:59PM -0700, Linus Torvalds wrote:

> Yes, yes, you have to be careful and cannot just blindly trust the
> length: you also have to check for NUL character as you are copying it
> and stop if you hit it. But that's trivial.

Point... Actually, I wonder if _that_ could be a solution for ->d_name.name
printk races as well. Remember that story? You objected against taking
spinlocks in printk, no matter how specialized and how narrow the area
over which those are taken, but rcu_read_lock/rcu_read_unlock should be
OK... Something like %pd expecting dentry pointer and producing dentry
name. Sure, we still get garbage if we race with d_move(), but at least
it's a contained garbage that way...

2013-08-30 22:30:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 2:44 PM, Al Viro <[email protected]> wrote:
>
> Point... Actually, I wonder if _that_ could be a solution for ->d_name.name
> printk races as well. Remember that story? You objected against taking
> spinlocks in printk, no matter how specialized and how narrow the area
> over which those are taken, but rcu_read_lock/rcu_read_unlock should be
> OK... Something like %pd expecting dentry pointer and producing dentry
> name. Sure, we still get garbage if we race with d_move(), but at least
> it's a contained garbage that way...

Yes, that sounds quite reasonable. For printk, we'd probably want to
limit the max size and depth to something fairly small (32 bytes, max
four deep or something), and we cannot take cwd/root into account
since it can happen from interrupts, but other than that it doesn't
sound horrible.

Linus
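
The conversion at call sites would then be trivial; roughly (the message
text here is made up):

	/* today: open-coded and racy against d_move() */
	printk(KERN_WARNING "failing dentry: %s\n", dentry->d_name.name);

	/* with the proposed %pd: the name is copied safely under RCU */
	printk(KERN_WARNING "failing dentry: %pd\n", dentry);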

2013-08-31 02:02:56

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/30/2013 04:48 PM, Al Viro wrote:
> On Fri, Aug 30, 2013 at 04:35:49PM -0400, Waiman Long wrote:
>
>> The AIM7 test was run on a set of 16 ramdisk formated with ext3
>> filesystem with the following mount options:
>> barrier=0,async,noatime,nodiratime. Maybe that is a factor.
> I would be really surprised if it was... Could you slap the following
> into __fput():
>
> struct mount *m = real_mount(mnt);
> if (unlikely(!m->mnt_ns)) {
> printk(KERN_INFO "type = %s",
> mnt->mnt_sb->s_type->name);
> WARN_ON(1);
> }
> and see what it catches? That'll need #include "fs/mount.h" in
> fs/file_table.c to compile...

I slapped in the code segment, and the following was logged:

[ 340.871590] type = tmpfs
[ 340.871596] ------------[ cut here ]------------
[ 340.871606] WARNING: CPU: 37 PID: 63276 at fs/file_table.c:239
__fput+0x23d/0x270()
[ 340.871607] Modules linked in: brd(F) ip6table_filter(F)
ip6_tables(F) iptable_filter(F) ip_tables(F) ebtable_nat(F) ebtables(F)
x_tables(F) edd(F) af_packet(F) bridge(F) stp(F) llc(F)
cpufreq_conservative(F) cpufreq_userspace(F) cpufreq_powersave(F)
pcc_cpufreq(F) microcode(F) fuse(F) loop(F) vhost_net(F) macvtap(F)
macvlan(F) vhost(F) tun(F) kvm_intel(F) ipv6(F) kvm(F) iTCO_wdt(F)
iTCO_vendor_support(F) joydev(F) igb(F) tpm_infineon(F) dca(F) ptp(F)
hid_generic(F) i7core_edac(F) sr_mod(F) tpm_tis(F) pps_core(F) qlcnic(F)
tpm(F) edac_core(F) be2net(F) netxen_nic(F) lpc_ich(F) ehci_pci(F)
mfd_core(F) hpwdt(F) hpilo(F) pcspkr(F) serio_raw(F) cdrom(F)
tpm_bios(F) sg(F) rtc_cmos(F) mperf(F) button(F) acpi_power_meter(F)
ext3(F) jbd(F) mbcache(F) dm_mirror(F) dm_region_hash(F) dm_log(F)
linear(F) radeon(F) ttm(F) drm_kms_helper(F) drm(F) i2c_algo_bit(F)
i2c_core(F) usbhid(F) hid(F) uhci_hcd(F) ehci_hcd(F) usbcore(F)
qla2xxx(F) sd_mod(F) usb_common(F) thermal(F) processor(F)
thermal_sys(F) hwmon(F) scsi_dh_emc(F) scsi_dh_rdac(F) scsi_dh_hp_sw(F)
scsi_dh_alua(F) scsi_dh(F) dm_snapshot(F) dm_mod(F) ata_generic(F)
ata_piix(F) libata(F) hpsa(F) lpfc(F) scsi_transport_fc(F) scsi_tgt(F)
crc_t10dif(F) cciss(F) scsi_mod(F)
[ 340.871663] CPU: 37 PID: 63276 Comm: reaim Tainted: GF W
3.11.0-rc7-lockref-0.11-default #4
[ 340.871665] Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012
[ 340.871667] 00000000000000ef ffff899f6b03dd88 ffffffff814992f5
ffff899f6b03ddc8
[ 340.871681] ffffffff8104a187 ffff899f6b03dda8 0000000000000000
ffff899f6b03aa40
[ 340.871686] ffff899f68fc7cc0 ffff899f69ee8ae0 ffff899f69ee8ae0
ffff899f6b03ddd8
[ 340.871692] Call Trace:
[ 340.871700] [<ffffffff814992f5>] dump_stack+0x6a/0x7d
[ 340.871705] [<ffffffff8104a187>] warn_slowpath_common+0x87/0xb0
[ 340.871709] [<ffffffff8104a1c5>] warn_slowpath_null+0x15/0x20
[ 340.871712] [<ffffffff8116c5bd>] __fput+0x23d/0x270
[ 340.871715] [<ffffffff8116c699>] ____fput+0x9/0x10
[ 340.871719] [<ffffffff81068ae1>] task_work_run+0xb1/0xe0
[ 340.871724] [<ffffffff81002990>] do_notify_resume+0x80/0x1b0
[ 340.871728] [<ffffffff811ed120>] ? ipc_lock+0x30/0x50
[ 340.871732] [<ffffffff8113a356>] ? remove_vma+0x56/0x60
[ 340.871736] [<ffffffff8113c1bf>] ? do_munmap+0x34f/0x380
[ 340.871741] [<ffffffff814a5fda>] int_signal+0x12/0x17
[ 340.871744] ---[ end trace aafa6c45f3388d65 ]---

Regards,
Longman

2013-08-31 02:35:44

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 10:02:36PM -0400, Waiman Long wrote:

> I slapped in the code segment, and the following was logged:
>
> [ 340.871590] type = tmpfs
> [ 340.871712] [<ffffffff8116c5bd>] __fput+0x23d/0x270
> [ 340.871715] [<ffffffff8116c699>] ____fput+0x9/0x10
> [ 340.871719] [<ffffffff81068ae1>] task_work_run+0xb1/0xe0
> [ 340.871724] [<ffffffff81002990>] do_notify_resume+0x80/0x1b0
> [ 340.871728] [<ffffffff811ed120>] ? ipc_lock+0x30/0x50
> [ 340.871732] [<ffffffff8113a356>] ? remove_vma+0x56/0x60
> [ 340.871736] [<ffffffff8113c1bf>] ? do_munmap+0x34f/0x380
> [ 340.871741] [<ffffffff814a5fda>] int_signal+0x12/0x17
> [ 340.871744] ---[ end trace aafa6c45f3388d65 ]---

Aha... OK, I see what's going on. We end up with shm_mnt *not* marked
as long-living vfsmount, even though it lives forever. See if the
following helps; if it does (and I very much expect it to), we want to
put it in -stable. As it is, you get slow path in mntput() each time
a file created by shmem_file_setup() gets closed. For no reason whatsoever...

Signed-off-by: Al Viro <[email protected]>
---
diff --git a/mm/shmem.c b/mm/shmem.c
index e43dc55..445162c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2615,7 +2615,7 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
* tmpfs instance, limiting inodes to one per page of lowmem;
* but the internal instance is left unlimited.
*/
- if (!(sb->s_flags & MS_NOUSER)) {
+ if (!(sb->s_flags & MS_KERNMOUNT)) {
sbinfo->max_blocks = shmem_default_max_blocks();
sbinfo->max_inodes = shmem_default_max_inodes();
if (shmem_parse_options(data, sbinfo, false)) {
@@ -2831,8 +2831,7 @@ int __init shmem_init(void)
goto out2;
}

- shm_mnt = vfs_kern_mount(&shmem_fs_type, MS_NOUSER,
- shmem_fs_type.name, NULL);
+ shm_mnt = kern_mount(&shmem_fs_type);
if (IS_ERR(shm_mnt)) {
error = PTR_ERR(shm_mnt);
printk(KERN_ERR "Could not kern_mount tmpfs\n");

2013-08-31 02:42:40

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sat, Aug 31, 2013 at 03:35:16AM +0100, Al Viro wrote:

> Aha... OK, I see what's going on. We end up with shm_mnt *not* marked
> as long-living vfsmount, even though it lives forever. See if the
> following helps; if it does (and I very much expect it to), we want to
> put it in -stable. As it is, you get slow path in mntput() each time
> a file created by shmem_file_setup() gets closed. For no reason whatsoever...

We still want MS_NOUSER on shm_mnt, so we'd better make sure that
shmem_fill_super() sets it on the internal instance... Fixed variant
follows:

Signed-off-by: Al Viro <[email protected]>
diff --git a/mm/shmem.c b/mm/shmem.c
index e43dc55..5261498 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2615,13 +2615,15 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
* tmpfs instance, limiting inodes to one per page of lowmem;
* but the internal instance is left unlimited.
*/
- if (!(sb->s_flags & MS_NOUSER)) {
+ if (!(sb->s_flags & MS_KERNMOUNT)) {
sbinfo->max_blocks = shmem_default_max_blocks();
sbinfo->max_inodes = shmem_default_max_inodes();
if (shmem_parse_options(data, sbinfo, false)) {
err = -EINVAL;
goto failed;
}
+ } else {
+ sb->s_flags |= MS_NOUSER;
}
sb->s_export_op = &shmem_export_ops;
sb->s_flags |= MS_NOSEC;
@@ -2831,8 +2833,7 @@ int __init shmem_init(void)
goto out2;
}

- shm_mnt = vfs_kern_mount(&shmem_fs_type, MS_NOUSER,
- shmem_fs_type.name, NULL);
+ shm_mnt = kern_mount(&shmem_fs_type);
if (IS_ERR(shm_mnt)) {
error = PTR_ERR(shm_mnt);
printk(KERN_ERR "Could not kern_mount tmpfs\n");

2013-08-31 03:06:39

by George Spelvin

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

Just noticing that you are adding several functions that return a boolean
value as an int. And a "gotref" local variable.

Is that just not wanting to bother with these newfangled C99 innovations,
or do you dislike the "bool" type for some reason?

Even if it doesn't change the code in the slightest, I like to declare
things with the bool type for documentation. I can see avoiding code
churn, but this is all new code, so I thought I'd ask.

(FWIW, stdbool.h was in gcc 3.2, which README says is the minimum
supported version, although that's probably outdated information.)

2013-08-31 17:16:55

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 8:06 PM, George Spelvin <[email protected]> wrote:
> Just noticing that you are adding several functions that return a boolean
> value as an int. And a "gotref" local variable.
>
> Is that just not wanting to bother with these newfangled C99 innovations,
> or do you dislike the "bool" type for some reason?

I don't use "bool" in code I write. I don't think it adds any actual
value, and I think the data type is badly designed and of dubious
value in C. It has very few actual advantages.

That said, it's not like I *hate* the type, and I won't remove bool
from code other people write. I just think it's superfluous and
stupid, and another case of C++ people thinking too much "this is a
cool feature" without then actually doing it well. The C people then
picked it up because it was less onerous than some other C++ features,
and all the compilers had the logic anyway.

If "bool" had real advantages (like having a dense array
representation, for example), that would be one thing. It doesn't.
Sure, now you can take an address of a bool (which you couldn't
generally do efficiently if it really was a bit array), but it also
means that in practice, "bool" is normally nothing but "char" with
some really odd and special implicit type casting rules.

I doubt most people really even understand how "bool" casting works.

And bool is actually really *dangerous* to use if you don't understand
it. There are people who use "bool", but then because they want to be
portable, they have a compatibility #ifdef or other configuration
thing that does something like

typedef int bool;
#define true 1
#define false 0

and it will actually work. Most of the time. And then the semantic
differences from a _real_ C compiler that supports the C99 _Bool/bool
type are really really subtle.

IOW, bool has very few real upsides, and it has a real downside: it's
subtle, and people really never even seem to _realize_ just how subtle
it is. I suspect that if you ask ten random average C programmers if
the above is equivalent to stdbool.h, nine of them will say "sure".

And they'd be *COMPLETELY* wrong.
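
[Editor's illustration, not part of the original mail: a minimal userspace
sketch of the subtlety being described. The value 2 survives assignment to
the "typedef int" kludge but is collapsed to 1 by a real C99 _Bool, so the
two versions disagree about whether the flag equals "true".]

#include <stdbool.h>
#include <stdio.h>

typedef int fake_bool;          /* the compatibility kludge from above */

int main(void)
{
        bool      real = 2;     /* C99 _Bool: any nonzero value becomes 1 */
        fake_bool fake = 2;     /* plain int: stays 2 */

        /* prints "1 0": real == true, but fake != true */
        printf("%d %d\n", real == true, fake == true);
        return 0;
}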

So no. I'm definitely not a fan of bool. I think there are better
types, and I think there are better ways to document things.

Linus

2013-08-31 21:24:12

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 03:30:14PM -0700, Linus Torvalds wrote:
> On Fri, Aug 30, 2013 at 2:44 PM, Al Viro <[email protected]> wrote:
> >
> > Point... Actually, I wonder if _that_ could be a solution for ->d_name.name
> > printk races as well. Remember that story? You objected against taking
> > spinlocks in printk, no matter how specialized and how narrow the area
> > over which those are taken, but rcu_read_lock/rcu_read_unlock should be
> > OK... Something like %pd expecting dentry pointer and producing dentry
> > name. Sure, we still get garbage if we race with d_move(), but at least
> > it's a contained garbage that way...
>
> Yes, that sounds quite reasonable. For printk, we'd probably want to
> limit the max size and depth to something fairly small (32 bytes, max
> four deep or something), and we cannot take cwd/root into account
> since it can happen from interrupts, but other than that it doesn't
> sound horrible.

Hmm... OK, most of these suckers are actually doing just one component;
we can look into 'print the ancestors as well' later, but the minimal
variant would be something like this and it already covers a lot of those
guys. Comments?

diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index 3e8cb73..259f8c3 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -168,6 +168,13 @@ UUID/GUID addresses:
Where no additional specifiers are used the default little endian
order with lower case hex characters will be printed.

+dentry names:
+ %pd
+
+ For printing dentry name; if we race with d_move(), the name might be
+ a mix of old and new ones, but it won't oops. %pd dentry is a safer
+ equivalent of %s dentry->d_name.name we used to use.
+
struct va_format:

%pV
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 739a3636..941509e 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -26,6 +26,7 @@
#include <linux/math64.h>
#include <linux/uaccess.h>
#include <linux/ioport.h>
+#include <linux/dcache.h>
#include <net/addrconf.h>

#include <asm/page.h> /* for PAGE_SIZE */
@@ -532,6 +533,56 @@ char *string(char *buf, char *end, const char *s, struct printf_spec spec)
return buf;
}

+static void widen(char *buf, char *end, unsigned len, unsigned spaces)
+{
+ size_t size;
+ if (buf >= end) /* nowhere to put anything */
+ return;
+ size = end - buf;
+ if (size <= spaces) {
+ memset(buf, ' ', size);
+ return;
+ }
+ if (len) {
+ if (len > size - spaces)
+ len = size - spaces;
+ memmove(buf + spaces, buf, len);
+ }
+ memset(buf, ' ', spaces);
+}
+
+static noinline_for_stack
+char *dentry_name(char *buf, char *end, const struct dentry *d, struct printf_spec spec)
+{
+ int n;
+ const char *s;
+ char *p = buf;
+ char c;
+
+ rcu_read_lock();
+ s = ACCESS_ONCE(d->d_name.name);
+ for (n = 0; n != spec.precision && (c = *s++) != 0; n++) {
+ if (buf < end)
+ *buf = c;
+ buf++;
+ }
+ rcu_read_unlock();
+ if (n < spec.field_width) {
+ /* we want to pad the sucker */
+ unsigned spaces = spec.field_width - n;
+ if (!(spec.flags & LEFT)) {
+ widen(p, end, n, spaces);
+ return buf + spaces;
+ }
+ while (spaces--) {
+ if (buf < end)
+ *buf = ' ';
+ ++buf;
+ }
+ }
+ return buf;
+}
+
static noinline_for_stack
char *symbol_string(char *buf, char *end, void *ptr,
struct printf_spec spec, const char *fmt)
@@ -1253,6 +1304,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
spec.base = 16;
return number(buf, end,
(unsigned long long) *((phys_addr_t *)ptr), spec);
+ case 'd':
+ return dentry_name(buf, end, ptr, spec);
}
spec.flags |= SMALL;
if (spec.field_width == -1) {
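
[Editor's note: a hypothetical caller, for illustration only -- not a hunk
from the patch above. It shows the intended conversion: hand printk the
dentry itself and let %pd copy the name under rcu_read_lock() instead of
dereferencing ->d_name.name directly.]

static void report_stale(struct dentry *dentry)
{
        /* old style, reads ->d_name.name directly and races with d_move(): */
        printk(KERN_DEBUG "stale dentry %s\n", dentry->d_name.name);
        /* with the %pd format added above: */
        printk(KERN_DEBUG "stale dentry %pd\n", dentry);
}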

2013-08-31 22:49:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sat, Aug 31, 2013 at 2:23 PM, Al Viro <[email protected]> wrote:
>
> Hmm... OK, most of these suckers are actually doing just one component;
> we can look into 'print the ancestors as well' later, but the minimal
> variant would be something like this and it already covers a lot of those
> guys. Comments?

Doesn't look wrong, but remember the /proc debugging thing? We
definitely wanted more than just one pathname component, and I don't
think that's completely rare.

So I think it would be better to prepare for that, and simply print to
a local buffer, and then use the "string()" function on the end
result. Rather than do it directly from the dentry like you do, and
then having to do that widen() thing because you couldn't do the
strnlen() that that code wanted..

Hmm?

Linus

2013-08-31 23:28:15

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sat, Aug 31, 2013 at 03:49:31PM -0700, Linus Torvalds wrote:
> On Sat, Aug 31, 2013 at 2:23 PM, Al Viro <[email protected]> wrote:
> >
> > Hmm... OK, most of these suckers are actually doing just one component;
> > we can look into 'print the ancestors as well' later, but the minimal
> > variant would be something like this and it already covers a lot of those
> > guys. Comments?
>
> Doesn't look wrong, but remember the /proc debugging thing? We
> definitely wanted more than just one pathname component, and I don't
> think that's completely rare.
>
> So I think it would be better to prepare for that, and simply print to
> a local buffer, and then use the "string()" function on the end
> result. Rather than do it directly from the dentry like you do, and
> then having to do that widen() thing because you couldn't do the
> strnlen() that that code wanted..

Actually, right now I'm debugging a variant that avoids local buffers; use
is %pD3 for grandparent/parent/name, etc., up to %pD4. %pd is equivalent
to %pD1 (just the dentry name). Keep in mind that things like NFS use
a _lot_ of what would be %pD2 in debugging printks and the string can grow
fairly long, so I'd rather live with widen() than mess with local buffers
here. I'll send an updated variant when I'm more or less satisfied with
it...

2013-09-01 00:13:21

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 01, 2013 at 12:27:58AM +0100, Al Viro wrote:
> On Sat, Aug 31, 2013 at 03:49:31PM -0700, Linus Torvalds wrote:
> > On Sat, Aug 31, 2013 at 2:23 PM, Al Viro <[email protected]> wrote:
> > >
> > > Hmm... OK, most of these suckers are actually doing just one component;
> > > we can look into 'print the ancestors as well' later, but the minimal
> > > variant would be something like this and it already covers a lot of those
> > > guys. Comments?
> >
> > Doesn't look wrong, but remember the /proc debugging thing? We
> > definitely wanted more than just one pathname component, and I don't
> > think that's completely rare.
> >
> > So I think it would be better to prepare for that, and simply print to
> > a local buffer, and then use the "string()" function on the end
> > result. Rather than do it directly from the dentry like you do, and
> > then having to do that widen() thing because you couldn't do the
> > strnlen() that that code wanted..
>
> Actually, right now I'm debugging a variant that avoids local buffers; use
> is %pD3 for grandparent/parent/name, etc., up to %pD4. %pd is equivalent
> to %pD1 (just the dentry name). Keep in mind that things like NFS use
> a _lot_ of what would be %pD2 in debugging printks and the string can grow
> fairly long, so I'd rather live with widen() than mess with local buffers
> here. I'll send an updated variant when I'm more or less satisfied with
> it...

Seems to be working... This doesn't include the metric arseload of
conversions in fs/*/* - just the sprintf part.

diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index 3e8cb73..826147b 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -168,6 +168,17 @@ UUID/GUID addresses:
Where no additional specifiers are used the default little endian
order with lower case hex characters will be printed.

+dentry names:
+ %pd
+ %pD1
+ ...
+ %pD4
+
+ For printing dentry name; if we race with d_move(), the name might be
+ a mix of old and new ones, but it won't oops. %pd dentry is a safer
+ equivalent of %s dentry->d_name.name we used to use, %pD<n> prints
+ n last components (IOW, %pD1 is equivalent to %pd).
+
struct va_format:

%pV
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 739a3636..5db62bf 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -26,6 +26,7 @@
#include <linux/math64.h>
#include <linux/uaccess.h>
#include <linux/ioport.h>
+#include <linux/dcache.h>
#include <net/addrconf.h>

#include <asm/page.h> /* for PAGE_SIZE */
@@ -532,6 +533,88 @@ char *string(char *buf, char *end, const char *s, struct printf_spec spec)
return buf;
}

+static void widen(char *buf, char *end, unsigned len, unsigned spaces)
+{
+ size_t size;
+ if (buf >= end) /* nowhere to put anything */
+ return;
+ size = end - buf;
+ if (size <= spaces) {
+ memset(buf, ' ', size);
+ return;
+ }
+ if (len) {
+ if (len > size - spaces)
+ len = size - spaces;
+ memmove(buf + spaces, buf, len);
+ }
+ memset(buf, ' ', spaces);
+}
+
+static noinline_for_stack
+char *dentry_name(char *buf, char *end, const struct dentry *d, struct printf_spec spec,
+ int depth)
+{
+ int i, n = 0;
+ const char *s;
+ char *p = buf;
+ const struct dentry *array[4];
+ char c;
+
+ if (depth < 0) {
+ depth = 1;
+ WARN_ON(1);
+ }
+ if (depth > 4) {
+ depth = 4;
+ WARN_ON(1);
+ }
+
+ rcu_read_lock();
+ for (i = 0; i < depth; i++) {
+ struct dentry *p = ACCESS_ONCE(d->d_parent);
+ array[i] = d;
+ if (d == p)
+ break;
+ d = p;
+ }
+ if (!i) { /* root dentry has a bloody inconvenient name */
+ i++;
+ goto do_name;
+ }
+ if (i == depth)
+ goto do_name;
+ while (i && n != spec.precision) {
+ if (buf < end)
+ *buf = '/';
+ buf++;
+ n++;
+do_name:
+ s = ACCESS_ONCE(array[--i]->d_name.name);
+ while (n != spec.precision && (c = *s++) != '\0') {
+ if (buf < end)
+ *buf = c;
+ buf++;
+ n++;
+ }
+ }
+ rcu_read_unlock();
+ if (n < spec.field_width) {
+ /* we want to pad the sucker */
+ unsigned spaces = spec.field_width - n;
+ if (!(spec.flags & LEFT)) {
+ widen(p, end, n, spaces);
+ return buf + spaces;
+ }
+ while (spaces--) {
+ if (buf < end)
+ *buf = ' ';
+ ++buf;
+ }
+ }
+ return buf;
+}
+
static noinline_for_stack
char *symbol_string(char *buf, char *end, void *ptr,
struct printf_spec spec, const char *fmt)
@@ -1253,6 +1336,14 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
spec.base = 16;
return number(buf, end,
(unsigned long long) *((phys_addr_t *)ptr), spec);
+ case 'd':
+ return dentry_name(buf, end, ptr, spec, 1);
+ case 'D':
+ switch (fmt[1]) {
+ case '1': case '2': case '3': case '4':
+ return dentry_name(buf, end, ptr, spec, fmt[1] - '0');
+ }
+ break;
}
spec.flags |= SMALL;
if (spec.field_width == -1) {
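
[Editor's note: a hypothetical caller of the variant above, for illustration
only. In this version %pD<n> still takes a struct dentry * and prints the
last n path components; a later mail in the thread proposes reserving %pD
for struct file, so treat the exact letters as provisional.]

static void debug_rename(struct dentry *old, struct dentry *new)
{
        /* e.g. "rename dir/old -> dir/new": parent + name for each side */
        printk(KERN_DEBUG "rename %pD2 -> %pD2\n", old, new);
}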

2013-09-01 08:50:55

by George Spelvin

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

> If "bool" had real advantages (like having a dense array
> representation, for example), that would be one thing. It doesn't.
> Sure, now you can take an address of a bool (which you couldn't
> generally do efficiently if it really was a bit array), but it also
> means that in practice, "bool" is normally nothing but "char" with
> some really odd and special implicit type casting rules.

Huh. For me, the big advantage is, to paraphrase Michael Palin, that
the number of possible values is two, no more, and no less. Two is
the number of possibilities, and the number of possibilities is two.
Three is Right Out.

I agree that as a *storage* type (in a data structure), it's of limited
usefulness. But as a *parameter* type, to pass to and return from
functions, it's wonderful.

There are lots of naming conventions (like the word "flag", or a function
name starting with "is") to indicate that the return value is a simple
true/false, but declaring it that way provides compile-time checking.

> I doubt most people really even understand how "bool" casting works.

There's a reason for that; you rarely need to cast to bool, and you
could forbid it outright with very little impact on bool's usefulness.
What fraction of C programmers remember off the top of their heads that
casting from float to integer rounds toward zero?

Heck, if you asked me which conversions were defined and which were
undefined, I'd have to look it up. (I'd guess it's the same as the
different overflow rules for signed and unsigned types, but I'd have
to check.)

The place bool is most useful is control flow flags and other things
where all you're doing is assigning literal values and testing it. And,
if you want to get fancy, assigning to bool variables from boolean-valued
expressions like "flag = value >= limit;"


That said, I agree that the fact that 2 != (bool)2 is a bit subtle, but
doesn't that basic problem apply to *any* narrowing cast? (And thus,
is not really an *additional* subtlety that a programmer needs to learn
about.) And in this case, I can't really think of a *better* choice
that could have been made.

I can only see three plausible alternatives:

1) Use the lsbit, (bool)x == x & 1
2) Use zero/nonzero, and make (bool)x == !!x
3) Forbid casting to bool entirely.

Option 1 matches all other narrowing casts in C, and would be how I'd
expect a "bit" type to work.

Option 2 is how C conditional statements already work, so for any value x,
"if (x)" is the same as "if ((bool)x)". It's a rule that C programmers
already need to know; they only need to know to *apply* it in this one
extra case.

In fact, it arguably makes things *simpler* to explain, since it at least
gives a name to the magic that C condition expressions are subject to.

Option 3 is attractive, but ends up breaking the analogy to conditional
expressions. I'd recommend it as a coding style, however.

> And bool is actually really *dangerous* to use if you don't understand
> it. There are people who use "bool", but then because they want to be
> portable, they have a compatibility #ifdef or other configuration
> thing that does something like
>
> typedef int bool;
> #define true 1
> #define false 0
>
> and it will actually work. Most of the time. And then the semantic
> differences from a _real_ C compiler that supports the C99 _Bool/bool
> type are really really subtle.

But *all* of the subtleties arise when casting other types to bool.
If you just avoid that one thing (which it's hopefully obvious won't be
faithfully emulated by a compatibility kludge anyway), it all goes away.

And this rule is itself just a special case of "be very careful with
narrowing casts", which is already common wisdom.

For that kludge, I think a better equivalent would be

typedef enum { false, true } bool;

which at least communicates the idea (again, this is an annoying subtlety
that C programmers *already* have to deal with, so no additional cognitive
effort) that "the compiler might not catch you assigning out-of-range
values, but if you do, demons might fly out of your nose."


Anyway, thanks for sharing your opinion. To me, it just looked like exactly
the sort of code where the bool type was a natural fit.

2013-09-01 10:01:18

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Fri, Aug 30, 2013 at 6:52 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Aug 30, 2013 at 9:37 AM, Sedat Dilek <[email protected]> wrote:
>>
>> Where is this a.out file from or how to generate it?
>
> Oh, that's just the silly threaded test-binary. I don't know what you
> called it.
>
> As to your config options, yesh, you have some expensive stuff.
> DEBUG_OBJECTS and DEBUG_MUTEXES in particular tend to cause lots of
> horrible performance issues. I didn't check if there might be other
> things..
>

I tried w/o DEBUG_OBJECTS and DEBUG_MUTEXES and disabled some
unnecessary debug-options, too (see attached diff).

This is what I get now...

[ TEST-CASE ]

$ ~/src/linux-kernel/linux/tools/perf/perf stat --null --repeat 5
./scripts/t_lockref_from-linus
Total loops: 26480075
Total loops: 27002388
Total loops: 25761463
Total loops: 26877615
Total loops: 27047644

Performance counter stats for './scripts/t_lockref_from-linus' (5 runs):

10,008617789 seconds time elapsed
( +- 0,07% )


Looks like this is now 10x faster: ~2.66Mloops (debug) VS.
~26.60Mloops (no-debug).

[ PERF-RECORD ]

$ sudo ~/src/linux-kernel/linux/tools/perf/perf record -e cycles:pp
./scripts/t_lockref_from-linus
Total loops: 26601346
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 6.100 MB perf.data (~266501 samples) ]

[ PERF-REPORT ]

$ sudo ~/src/linux-kernel/linux/tools/perf/perf report -f

Samples: 159K of event 'cycles:pp', Event count (approx.): 76968896763
12,79% t_lockref_from- [kernel.kallsyms] [k] irq_return
4,36% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
4,36% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
4,07% t_lockref_from- [kernel.kallsyms] [k] user_exit
3,12% t_lockref_from- [kernel.kallsyms] [k] local_clock
2,83% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
2,73% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_alloc
2,62% t_lockref_from- [kernel.kallsyms] [k] __d_lookup_rcu
2,53% t_lockref_from- libc-2.15.so [.] __xstat64
2,53% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_free
2,28% t_lockref_from- [kernel.kallsyms] [k] path_init
2,27% t_lockref_from- [kernel.kallsyms] [k] link_path_walk
1,86% t_lockref_from- [kernel.kallsyms] [k] user_enter
1,85% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_exit_common.isra.43
1,81% t_lockref_from- [kernel.kallsyms] [k] sched_clock_cpu
1,79% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_enter_common.isra.45
1,78% t_lockref_from- [kernel.kallsyms] [k] path_lookupat
1,67% t_lockref_from- [kernel.kallsyms] [k] native_read_tsc
1,63% t_lockref_from- [kernel.kallsyms] [k] cp_new_stat
1,61% t_lockref_from- [kernel.kallsyms] [k] lockref_put_or_lock
1,53% t_lockref_from- [kernel.kallsyms] [k] account_system_time
1,48% t_lockref_from- [kernel.kallsyms] [k] tracesys
1,47% t_lockref_from- [kernel.kallsyms] [k] copy_user_generic_unrolled
1,46% t_lockref_from- [kernel.kallsyms] [k] syscall_trace_enter
1,39% t_lockref_from- [kernel.kallsyms] [k] jiffies_to_timeval
1,33% t_lockref_from- [kernel.kallsyms] [k] native_sched_clock
1,27% t_lockref_from- [kernel.kallsyms] [k] getname_flags
1,27% t_lockref_from- [kernel.kallsyms] [k] lookup_fast
1,18% t_lockref_from- [kernel.kallsyms] [k] get_vtime_delta
1,05% t_lockref_from- [kernel.kallsyms] [k] syscall_trace_leave
1,03% t_lockref_from- [kernel.kallsyms] [k] generic_fillattr
1,02% t_lockref_from- [kernel.kallsyms] [k] strncpy_from_user
1,00% t_lockref_from- [kernel.kallsyms] [k] user_path_at_empty
0,97% t_lockref_from- [kernel.kallsyms] [k] account_user_time
0,95% t_lockref_from- [kernel.kallsyms] [k] vfs_fstatat
0,95% t_lockref_from- [kernel.kallsyms] [k] system_call_after_swapgs
0,92% t_lockref_from- [kernel.kallsyms] [k] generic_permission
0,91% t_lockref_from- [kernel.kallsyms] [k] filename_lookup
0,80% t_lockref_from- [kernel.kallsyms] [k] vfs_getattr
0,78% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_unlock
0,74% t_lockref_from- [kernel.kallsyms] [k] complete_walk
0,70% t_lockref_from- [kernel.kallsyms] [k] vtime_account_user
0,68% t_lockref_from- [kernel.kallsyms] [k] d_rcu_to_refcount
0,65% t_lockref_from- [kernel.kallsyms] [k] common_perm
0,62% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_enter
0,58% t_lockref_from- [kernel.kallsyms] [k] vtime_user_enter
0,57% t_lockref_from- [kernel.kallsyms] [k] __inode_permission
0,55% t_lockref_from- [kernel.kallsyms] [k] dput
0,52% t_lockref_from- [kernel.kallsyms] [k] apparmor_inode_getattr
0,52% t_lockref_from- [kernel.kallsyms] [k] SYSC_newstat
0,52% t_lockref_from- [kernel.kallsyms] [k] mntget
0,49% t_lockref_from- [kernel.kallsyms] [k] cpuacct_account_field
0,48% t_lockref_from- [kernel.kallsyms] [k] __vtime_account_system
0,46% t_lockref_from- t_lockref_from-linus [.] start_routine

Thanks for all the explanations and hints.

Regards,
- Sedat -

P.S.: Some words to "perf -f"...

$ sudo ~/src/linux-kernel/linux/tools/perf/perf record -f -e cycles:pp
./scripts/t_lockref_from-linus
[sudo] password for wearefam:
Error: unknown switch `f'

usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]

-e, --event <event> event selector. use 'perf list' to list
available events
--filter <filter>
event filter
-p, --pid <pid> record events on existing process id
-t, --tid <tid> record events on existing thread id
-r, --realtime <n> collect data with this RT SCHED_FIFO priority
-D, --no-delay collect data without buffering
-R, --raw-samples collect raw sample records from all opened counters
-a, --all-cpus system-wide collection from all CPUs
-C, --cpu <cpu> list of cpus to monitor
-c, --count <n> event period to sample
-o, --output <file> output file name
-i, --no-inherit child tasks do not inherit counters
-F, --freq <n> profile at this frequency
-m, --mmap-pages <n> number of mmap data pages
--group put the counters into a counter group
-g, --call-graph <mode[,dump_size]>
do call-graph (stack chain/backtrace)
recording: [fp] dwarf
-v, --verbose be more verbose (show counter open errors, etc)
-q, --quiet don't print any message
-s, --stat per thread counts
-d, --data Sample addresses
-T, --timestamp Sample timestamps
-P, --period Sample period
-n, --no-samples don't sample
-N, --no-buildid-cache
do not update the buildid cache
-B, --no-buildid do not collect buildids in perf.data
-G, --cgroup <name> monitor event in cgroup name only
-u, --uid <user> user to profile
-b, --branch-any sample any taken branches
-j, --branch-filter <branch filter mask>
branch stack filter modes
-W, --weight sample by weight (on special events only)

- EOT -


Attachments:
kernel-config.diff (2.77 kB)

2013-09-01 10:33:13

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 12:01 PM, Sedat Dilek <[email protected]> wrote:
> On Fri, Aug 30, 2013 at 6:52 PM, Linus Torvalds
> <[email protected]> wrote:
>> On Fri, Aug 30, 2013 at 9:37 AM, Sedat Dilek <[email protected]> wrote:
>>>
>>> Where is this a.out file from or how to generate it?
>>
>> Oh, that's just the silly threaded test-binary. I don't know what you
>> called it.
>>
>> As to your config options, yesh, you have some expensive stuff.
>> DEBUG_OBJECTS and DEBUG_MUTEXES in particular tend to cause lots of
>> horrible performance issues. I didn't check if there might be other
>> things..
>>
>
> I tried w/o DEBUG_OBJECTS and DEBUG_MUTEXES and disabled some
> unnecessary debug-options, too (see attached diff).
>
> This is what I get now...
>
> [ TEST-CASE ]
>
> $ ~/src/linux-kernel/linux/tools/perf/perf stat --null --repeat 5
> ./scripts/t_lockref_from-linus
> Total loops: 26480075
> Total loops: 27002388
> Total loops: 25761463
> Total loops: 26877615
> Total loops: 27047644
>
> Performance counter stats for './scripts/t_lockref_from-linus' (5 runs):
>
> 10,008617789 seconds time elapsed
> ( +- 0,07% )
>
>
> Looks like this is now 10x faster: ~2.66Mloops (debug) VS.
> ~26.60Mloops (no-debug).
>
> [ PERF-RECORD ]
>
> $ sudo ~/src/linux-kernel/linux/tools/perf/perf record -e cycles:pp
> ./scripts/t_lockref_from-linus
> Total loops: 26601346
> [ perf record: Woken up 25 times to write data ]
> [ perf record: Captured and wrote 6.100 MB perf.data (~266501 samples) ]
>
> [ PERF-REPORT ]
>
> $ sudo ~/src/linux-kernel/linux/tools/perf/perf report -f
>
> Samples: 159K of event 'cycles:pp', Event count (approx.): 76968896763
> 12,79% t_lockref_from- [kernel.kallsyms] [k] irq_return
> 4,36% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
> 4,36% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
> 4,07% t_lockref_from- [kernel.kallsyms] [k] user_exit
> 3,12% t_lockref_from- [kernel.kallsyms] [k] local_clock
> 2,83% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
> 2,73% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_alloc
> 2,62% t_lockref_from- [kernel.kallsyms] [k] __d_lookup_rcu
> 2,53% t_lockref_from- libc-2.15.so [.] __xstat64
> 2,53% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_free
> 2,28% t_lockref_from- [kernel.kallsyms] [k] path_init
> 2,27% t_lockref_from- [kernel.kallsyms] [k] link_path_walk
> 1,86% t_lockref_from- [kernel.kallsyms] [k] user_enter
> 1,85% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_exit_common.isra.43
> 1,81% t_lockref_from- [kernel.kallsyms] [k] sched_clock_cpu
> 1,79% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_enter_common.isra.45
> 1,78% t_lockref_from- [kernel.kallsyms] [k] path_lookupat
> 1,67% t_lockref_from- [kernel.kallsyms] [k] native_read_tsc
> 1,63% t_lockref_from- [kernel.kallsyms] [k] cp_new_stat
> 1,61% t_lockref_from- [kernel.kallsyms] [k] lockref_put_or_lock
> 1,53% t_lockref_from- [kernel.kallsyms] [k] account_system_time
> 1,48% t_lockref_from- [kernel.kallsyms] [k] tracesys
> 1,47% t_lockref_from- [kernel.kallsyms] [k] copy_user_generic_unrolled
> 1,46% t_lockref_from- [kernel.kallsyms] [k] syscall_trace_enter
> 1,39% t_lockref_from- [kernel.kallsyms] [k] jiffies_to_timeval
> 1,33% t_lockref_from- [kernel.kallsyms] [k] native_sched_clock
> 1,27% t_lockref_from- [kernel.kallsyms] [k] getname_flags
> 1,27% t_lockref_from- [kernel.kallsyms] [k] lookup_fast
> 1,18% t_lockref_from- [kernel.kallsyms] [k] get_vtime_delta
> 1,05% t_lockref_from- [kernel.kallsyms] [k] syscall_trace_leave
> 1,03% t_lockref_from- [kernel.kallsyms] [k] generic_fillattr
> 1,02% t_lockref_from- [kernel.kallsyms] [k] strncpy_from_user
> 1,00% t_lockref_from- [kernel.kallsyms] [k] user_path_at_empty
> 0,97% t_lockref_from- [kernel.kallsyms] [k] account_user_time
> 0,95% t_lockref_from- [kernel.kallsyms] [k] vfs_fstatat
> 0,95% t_lockref_from- [kernel.kallsyms] [k] system_call_after_swapgs
> 0,92% t_lockref_from- [kernel.kallsyms] [k] generic_permission
> 0,91% t_lockref_from- [kernel.kallsyms] [k] filename_lookup
> 0,80% t_lockref_from- [kernel.kallsyms] [k] vfs_getattr
> 0,78% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_unlock
> 0,74% t_lockref_from- [kernel.kallsyms] [k] complete_walk
> 0,70% t_lockref_from- [kernel.kallsyms] [k] vtime_account_user
> 0,68% t_lockref_from- [kernel.kallsyms] [k] d_rcu_to_refcount
> 0,65% t_lockref_from- [kernel.kallsyms] [k] common_perm
> 0,62% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_enter
> 0,58% t_lockref_from- [kernel.kallsyms] [k] vtime_user_enter
> 0,57% t_lockref_from- [kernel.kallsyms] [k] __inode_permission
> 0,55% t_lockref_from- [kernel.kallsyms] [k] dput
> 0,52% t_lockref_from- [kernel.kallsyms] [k] apparmor_inode_getattr
> 0,52% t_lockref_from- [kernel.kallsyms] [k] SYSC_newstat
> 0,52% t_lockref_from- [kernel.kallsyms] [k] mntget
> 0,49% t_lockref_from- [kernel.kallsyms] [k] cpuacct_account_field
> 0,48% t_lockref_from- [kernel.kallsyms] [k] __vtime_account_system
> 0,46% t_lockref_from- t_lockref_from-linus [.] start_routine
>
> Thanks for all the explanations and hints.
>
> Regards,
> - Sedat -
>
> P.S.: Some words to "perf -f"...
>
> $ sudo ~/src/linux-kernel/linux/tools/perf/perf record -f -e cycles:pp
> ./scripts/t_lockref_from-linus
> [sudo] password for wearefam:
> Error: unknown switch `f'
>
> usage: perf record [<options>] [<command>]
> or: perf record [<options>] -- <command> [<options>]
>
> -e, --event <event> event selector. use 'perf list' to list
> available events
> --filter <filter>
> event filter
> -p, --pid <pid> record events on existing process id
> -t, --tid <tid> record events on existing thread id
> -r, --realtime <n> collect data with this RT SCHED_FIFO priority
> -D, --no-delay collect data without buffering
> -R, --raw-samples collect raw sample records from all opened counters
> -a, --all-cpus system-wide collection from all CPUs
> -C, --cpu <cpu> list of cpus to monitor
> -c, --count <n> event period to sample
> -o, --output <file> output file name
> -i, --no-inherit child tasks do not inherit counters
> -F, --freq <n> profile at this frequency
> -m, --mmap-pages <n> number of mmap data pages
> --group put the counters into a counter group
> -g, --call-graph <mode[,dump_size]>
> do call-graph (stack chain/backtrace)
> recording: [fp] dwarf
> -v, --verbose be more verbose (show counter open errors, etc)
> -q, --quiet don't print any message
> -s, --stat per thread counts
> -d, --data Sample addresses
> -T, --timestamp Sample timestamps
> -P, --period Sample period
> -n, --no-samples don't sample
> -N, --no-buildid-cache
> do not update the buildid cache
> -B, --no-buildid do not collect buildids in perf.data
> -G, --cgroup <name> monitor event in cgroup name only
> -u, --uid <user> user to profile
> -b, --branch-any sample any taken branches
> -j, --branch-filter <branch filter mask>
> branch stack filter modes
> -W, --weight sample by weight (on special events only)
>
> - EOT -

Attached are the results without the patch from Linus.

- Sedat -


Attachments:
RESULTS_SPINLOCK-LOCKREF-TESTING_WITHOUT-PATCH_3.11.0-rc7-3-iniza-small.txt (5.18 kB)

2013-09-01 11:10:22

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

Speaking of bool (and I'm not a fan of it either), is this warning
just noise (which is bad enough since it masks real warnings), or is
this going to cause serious problems?

CHECK /usr/projects/linux/ext4/kernel/trace/trace.c
/usr/projects/linux/ext4/kernel/trace/trace.c:559:6: warning: symbol 'free_snapshot' was not declared. Should it be static?
/usr/projects/linux/ext4/kernel/trace/trace.c:1489:14: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1489:14: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1489:14: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1489:14: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1492:9: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1492:9: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1492:9: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1492:9: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1539:9: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1539:9: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1539:9: warning: expression using sizeof bool
/usr/projects/linux/ext4/kernel/trace/trace.c:1539:9: warning: expression using sizeof bool

(I.e., is a C compiler allowed to pack multiple bools stored in a data
structure into a single byte, with potentially hilarious results for C
code trying to do magic utilizing sizeof and offsetof, which some of
our kernel macros do use, IIRC?)

- Ted

P.S. BTW, this is not the only set of sparse warnings that are
emitted by files in kernel/trace/*; maybe it would be good if the
ftrace maintainers worked on nuking them all? There are also a bunch
of warnings like the following, which I've only recently learned
indicate potential RCU bugs:

/usr/projects/linux/ext4/kernel/trace/trace.c:1885:17: warning: incorrect type in assignment (different address spaces)
/usr/projects/linux/ext4/kernel/trace/trace.c:1885:17: expected struct trace_buffer_struct *buffers
/usr/projects/linux/ext4/kernel/trace/trace.c:1885:17: got struct trace_buffer_struct [noderef] <asn:3>*<noident>

2013-09-01 15:32:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek <[email protected]> wrote:
>
> Looks like this is now 10x faster: ~2.66Mloops (debug) VS.
> ~26.60Mloops (no-debug).

Ok, that's getting to be in the right ballpark.

But your profile is still odd.

> Samples: 159K of event 'cycles:pp', Event count (approx.): 76968896763
> 12,79% t_lockref_from- [kernel.kallsyms] [k] irq_return
> 4,36% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock

If you do the profile with "-g", what are the top callers of this? You
shouldn't see any spinlock load from the path lookup, but you have all
these other things going on..

> 4,36% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
> 4,07% t_lockref_from- [kernel.kallsyms] [k] user_exit
> 3,12% t_lockref_from- [kernel.kallsyms] [k] local_clock
> 2,83% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
> 2,73% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_alloc
> 2,62% t_lockref_from- [kernel.kallsyms] [k] __d_lookup_rcu

You're spending more time on the task stats than on the actual lookup.
Maybe you should turn off CONFIG_TASKSTATS..But why that whole
irq_return thing? Odd.

Linus

2013-09-01 15:45:09

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 5:32 PM, Linus Torvalds
<[email protected]> wrote:
> On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek <[email protected]> wrote:
>>
>> Looks like this is now 10x faster: ~2.66Mloops (debug) VS.
>> ~26.60Mloops (no-debug).
>
> Ok, that's getting to be in the right ballpark.
>
> But your profile is still odd.
>
>> Samples: 159K of event 'cycles:pp', Event count (approx.): 76968896763
>> 12,79% t_lockref_from- [kernel.kallsyms] [k] irq_return
>> 4,36% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
>
> If you do the profile with "-g", what are the top callers of this? You
> shouldn't see any spinlock load from the path lookup, but you have all
> these other things going on..
>

$ sudo ~/src/linux-kernel/linux/tools/perf/perf record -g -e cycles:pp
./scripts/t_lockref_from-linus
Total loops: 26205085
[ perf record: Woken up 77 times to write data ]
[ perf record: Captured and wrote 19.778 MB perf.data (~864092 samples) ]


$ sudo ~/src/linux-kernel/linux/tools/perf/perf report <--- last time I
used the -f option here; dropped it this time.

Samples: 160K of event 'cycles:pp', Event count (approx.): 77003901089
+ 12,46% t_lockref_from- [kernel.kallsyms] [k] irq_return
+ 4,86% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
+ 4,42% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
+ 4,28% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
+ 3,97% t_lockref_from- [kernel.kallsyms] [k] user_exit
+ 3,04% t_lockref_from- [kernel.kallsyms] [k] local_clock
+ 2,71% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_alloc
+ 2,50% t_lockref_from- [kernel.kallsyms] [k] link_path_walk
+ 2,46% t_lockref_from- libc-2.15.so [.] __xstat64
+ 2,38% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_free
+ 1,96% t_lockref_from- [kernel.kallsyms] [k] path_lookupat
+ 1,88% t_lockref_from- [kernel.kallsyms] [k] __d_lookup_rcu
+ 1,87% t_lockref_from- [kernel.kallsyms] [k] tracesys
+ 1,84% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_exit_common.isra.43
+ 1,81% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_enter_common.isra.45
+ 1,80% t_lockref_from- [kernel.kallsyms] [k] user_enter
+ 1,79% t_lockref_from- [kernel.kallsyms] [k] sched_clock_cpu
+ 1,61% t_lockref_from- [kernel.kallsyms] [k] native_read_tsc
+ 1,56% t_lockref_from- [kernel.kallsyms] [k] cp_new_stat
+ 1,52% t_lockref_from- [kernel.kallsyms] [k] lockref_put_or_lock
+ 1,51% t_lockref_from- [kernel.kallsyms] [k] account_system_time
+ 1,46% t_lockref_from- [kernel.kallsyms] [k] path_init
+ 1,46% t_lockref_from- [kernel.kallsyms] [k] copy_user_generic_unrolled
+ 1,42% t_lockref_from- [kernel.kallsyms] [k] syscall_trace_enter
+ 1,38% t_lockref_from- [kernel.kallsyms] [k] jiffies_to_timeval
+ 1,32% t_lockref_from- [kernel.kallsyms] [k] lookup_fast
+ 1,31% t_lockref_from- [kernel.kallsyms] [k] native_sched_clock
+ 1,24% t_lockref_from- [kernel.kallsyms] [k] getname_flags
+ 1,17% t_lockref_from- [kernel.kallsyms] [k] vfs_getattr
+ 1,15% t_lockref_from- [kernel.kallsyms] [k] get_vtime_delta
+ 1,03% t_lockref_from- [kernel.kallsyms] [k] syscall_trace_leave
+ 0,95% t_lockref_from- [kernel.kallsyms] [k] generic_fillattr
+ 0,94% t_lockref_from- [kernel.kallsyms] [k] user_path_at_empty
+ 0,93% t_lockref_from- [kernel.kallsyms] [k] system_call_after_swapgs
+ 0,93% t_lockref_from- [kernel.kallsyms] [k] account_user_time
+ 0,89% t_lockref_from- [kernel.kallsyms] [k] strncpy_from_user
+ 0,86% t_lockref_from- [kernel.kallsyms] [k] complete_walk
+ 0,80% t_lockref_from- [kernel.kallsyms] [k] filename_lookup
+ 0,80% t_lockref_from- [kernel.kallsyms] [k] vfs_fstatat
+ 0,78% t_lockref_from- [kernel.kallsyms] [k] generic_permission
+ 0,77% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_unlock
+ 0,73% t_lockref_from- [kernel.kallsyms] [k] __inode_permission
+ 0,69% t_lockref_from- [kernel.kallsyms] [k] vtime_account_user
+ 0,66% t_lockref_from- [kernel.kallsyms] [k] d_rcu_to_refcount
+ 0,61% t_lockref_from- [kernel.kallsyms] [k] common_perm
+ 0,60% t_lockref_from- [kernel.kallsyms] [k] rcu_eqs_enter
+ 0,59% t_lockref_from- [kernel.kallsyms] [k] dput
+ 0,54% t_lockref_from- [kernel.kallsyms] [k] vtime_user_enter
+ 0,51% t_lockref_from- [kernel.kallsyms] [k] cpuacct_account_field
+ 0,50% t_lockref_from- [kernel.kallsyms] [k] mntput
+ 0,48% t_lockref_from- [kernel.kallsyms] [k] lg_local_lock
+ 0,48% t_lockref_from- [kernel.kallsyms] [k] apparmor_inode_getattr
+ 0,45% t_lockref_from- t_lockref_from-linus [.] start_routine
+ 0,45% t_lockref_from- [kernel.kallsyms] [k] __vtime_account_system
Press '?' for help on key bindings

- Sedat -

>> 4,36% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
>> 4,07% t_lockref_from- [kernel.kallsyms] [k] user_exit
>> 3,12% t_lockref_from- [kernel.kallsyms] [k] local_clock
>> 2,83% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
>> 2,73% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_alloc
>> 2,62% t_lockref_from- [kernel.kallsyms] [k] __d_lookup_rcu
>
> You're spending more time on the task stats than on the actual lookup.
> Maybe you should turn off CONFIG_TASKSTATS..But why that whole
> irq_return thing? Odd.
>

Yes, I have CONFIG_TASKSTATS=y.
I can try a -4 build w/o it.

- Sedat -

2013-09-01 15:49:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 4:10 AM, Theodore Ts'o <[email protected]> wrote:
> Speaking of bool (and I'm not a fan of it either), is this warning
> just noise (which is bad enough since it masks real warnings), or is
> this going to cause serious problems?
>
> CHECK /usr/projects/linux/ext4/kernel/trace/trace.c
> /usr/projects/linux/ext4/kernel/trace/trace.c:559:6: warning: symbol 'free_snapshot' was not declared. Should it be static?
> /usr/projects/linux/ext4/kernel/trace/trace.c:1489:14: warning: expression using sizeof bool

It's just because sparse is being a bit odd. Internally, sparse thinks
that "bool" has a size of one bit. So it then the sizeof code has a
special case to make it one byte, and when that special case was
added, people added the warning too.

I suspect sparse should just make the size of "bool" be 8 bits
internally, and we should drop the warning. As it is, sparse will
actually do odd and wrong things due to that "bool is one bit" notion
if you put "bool" types in structures or unions, I think.

Nobody used to care, because we used to not use that broken type in the kernel.

Linus

2013-09-01 15:55:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 8:45 AM, Sedat Dilek <[email protected]> wrote:
>
> Samples: 160K of event 'cycles:pp', Event count (approx.): 77003901089
> + 12,46% t_lockref_from- [kernel.kallsyms] [k] irq_return
> + 4,86% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
> + 4,42% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
> + 4,28% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals

You need to go into __ticket_spin_lock to see who the callers are.

Just go down to it and press enter to expand it (and then you need to
go and expand that entry too to get the callers)

I still don't know how you get to irq_return. It should use sysret. Odd.

Linus

2013-09-01 17:49:13

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 01, 2013 at 01:13:06AM +0100, Al Viro wrote:
> > Actually, right now I'm debugging a variant that avoids local buffers; use
> > is %pD3 for grandparent/parent/name, etc., up to %pD4. %pd is equivalent
> > to %pD1 (just the dentry name). Keep in mind that things like NFS use
> > a _lot_ of what would be %pD2 in debugging printks and the string can grow
> > fairly long, so I'd rather live with widen() than mess with local buffers
> > here. I'll send an updated variant when I'm more or less satisfied with
> > it...
>
> Seems to be working... This doesn't include the metric arseload of
> conversions in fs/*/* - just the sprintf part.

FWIW, now that I've looked at more users (and we do have a shitload
of those), it seems that we need the following set:
- dentry name
- dentry path 2--4 levels deep (most of the users want 2 right now,
  but that's mostly a matter of arguments being too painful to type for
  deeper ones)
- same for file - sure, we can just pass file->f_path.dentry, but
  there's a lot of such guys and I'd like to reduce the amount of
  places where we have ->f_path.dentry mentioned in source.

Suggestions regarding formats to use would be welcome. For now I'm using
pd/pd<n>/pD/pD<n>, the latter two being for struct file, but I'd gladly
take anything prettier than that.

2013-09-01 18:11:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, 1 Sep 2013 08:49:48 -0700
Linus Torvalds <[email protected]> wrote:


> Nobody used to care, because we used to not use that broken type in the kernel.

I've been told that gcc works better with 'bool' than with an int.
Should I replace those bools with bit fields in the structure?

-- Steve

2013-09-01 20:03:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 11:11 AM, Steven Rostedt <[email protected]> wrote:
>
> I've been told that gcc works better with 'bool' than with an int.
> Should I replace those bools with bit fields in the structure?

I think bitfields are a better idea in a struct, yes. They take less
space, and there's a possibility to generate better code with a
bitfield test than with a bool, especially if you have multiple values
and you test them together.

If it's a single bool in a structure, it really doesn't matter. It's
going to take up at least a char (and usually more due to padding of
other fields) regardless of bool-vs-bitfield issues. But I'd much
rather see bitfields in structs because they *can* work better, and
they are more flexible than "bool" anyway.

Bools generally should generate the same code as "char", and yes, that
can be better than "int". On x86, for example, the "setcc" instruction
always sets a char, so if you use an "int" for boolean values, the
compiler usually generates something like "xor %eax,%eax; test ..;
setne %al" or similar. A bool or a char will skip the "xor", because
the "setne" will set the low bits that are sufficient. That said, it's
not actually noticeable in practice, and most routines that return
true/false will just do plain "return 0" or whatever, so there's no
use for "setcc" to begin with.

Linus

2013-09-01 20:59:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 8:32 AM, Linus Torvalds
<[email protected]> wrote:
> On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek <[email protected]> wrote:
>>
>> Looks like this is now 10x faster: ~2.66Mloops (debug) VS.
>> ~26.60Mloops (no-debug).
>
> Ok, that's getting to be in the right ballpark.

So I installed my new i7-4770S yesterday - somewhat lower frequency
than my previous CPU, but it has four cores plus HT, and boy does that
show the scalability problems better.

My test-program used to get maybe 15% time in spinlock. On the 4770S,
with current -git (so no lockref) I get this:

[torvalds@i5 test-lookup]$ for i in 1 2 3 4 5; do ./a.out ; done
Total loops: 26656873
Total loops: 26701572
Total loops: 26698526
Total loops: 26752993
Total loops: 26710556

with a profile that looks roughly like:

84.14% a.out _raw_spin_lock
3.04% a.out lg_local_lock
2.16% a.out vfs_getattr
1.16% a.out dput.part.15
0.67% a.out copy_user_enhanced_fast_string
0.55% a.out complete_walk

[ Side note: Al, that lg_local_lock really is annoying: it's
br_read_lock(mntput_no_expire), with two thirds of the calls coming
from mntput_no_expire, and the rest from path_init -> lock_rcu_walk.

I really really wonder if we could get rid of the
br_read_lock(&vfsmount_lock) for rcu_walk_init(), and use just the RCU
read accesses for the mount-namespaces too. What is that lock really
protecting against during lookup anyway? ]

With the last lockref patch I sent out, it looks like this:

[torvalds@i5 test-lookup]$ for i in 1 2 3 4 5; do ./a.out ; done
Total loops: 54740529
Total loops: 54568346
Total loops: 54715686
Total loops: 54715854
Total loops: 54790592

28.55% a.out lockref_get_or_lock
20.65% a.out lockref_put_or_lock
9.06% a.out dput
6.37% a.out lg_local_lock
5.45% a.out lookup_fast
3.77% a.out d_rcu_to_refcount
2.03% a.out vfs_getattr
1.75% a.out copy_user_enhanced_fast_string
1.16% a.out link_path_walk
1.15% a.out avc_has_perm_noaudit
1.14% a.out __lookup_mnt

so performance more than doubled (on that admittedly stupid
benchmark), and you can see that the cacheline bouncing for that
reference count is still a big deal, but at least it gets some real
work done now because we're not spinning waiting for it.

So you can see the bad case with even just a single socket when the
benchmark is just targeted enough. But two cores just wasn't enough to
show any performance advantage.

Linus

2013-09-01 21:24:16

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 01, 2013 at 01:59:22PM -0700, Linus Torvalds wrote:

> [ Side note: Al, that lg_local_lock really is annoying: it's
> br_read_lock(mntput_no_expire), with two thirds of the calls coming
> from mntput_no_expire, and the rest from path_init -> lock_rcu_walk.

How much of that is due to br_write_lock() taken in mntput_no_expire()
for no good reason? IOW, could you try the shmem.c patch I sent yesterday
and see how much effect it has? [1] Basically, we get it grabbed
exclusive on each final fput() of a struct file created by shmem_file_setup(),
which is _not_ a rare event. And the only reason for that is not having
shm_mnt marked long-living, even though its refcount never hits 0...

> I really really wonder if we could get rid of the
> br_read_lock(&vfsmount_lock) for rcu_walk_init(), and use just the RCU
> read accesses for the mount-namespaces too. What is that lock really
> protecting against during lookup anyway? ]

A lot of things, I'm afraid. It's not as simple as just the access
to vfsmount hash... ;-/ I'll need to do some digging to put together
a full analysis, but there had been quite a few subtle issues where
it played...

[1] sits in the local queue, will push tonight:

commit e7db6c4c1d01032f53262f03b5f38899f9db8add
Author: Al Viro <[email protected]>
Date: Sat Aug 31 12:57:10 2013 -0400

shm_mnt is as longterm as it gets, TYVM...

Signed-off-by: Al Viro <[email protected]>

diff --git a/mm/shmem.c b/mm/shmem.c
index e43dc55..5261498 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2615,13 +2615,15 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
* tmpfs instance, limiting inodes to one per page of lowmem;
* but the internal instance is left unlimited.
*/
- if (!(sb->s_flags & MS_NOUSER)) {
+ if (!(sb->s_flags & MS_KERNMOUNT)) {
sbinfo->max_blocks = shmem_default_max_blocks();
sbinfo->max_inodes = shmem_default_max_inodes();
if (shmem_parse_options(data, sbinfo, false)) {
err = -EINVAL;
goto failed;
}
+ } else {
+ sb->s_flags |= MS_NOUSER;
}
sb->s_export_op = &shmem_export_ops;
sb->s_flags |= MS_NOSEC;
@@ -2831,8 +2833,7 @@ int __init shmem_init(void)
goto out2;
}

- shm_mnt = vfs_kern_mount(&shmem_fs_type, MS_NOUSER,
- shmem_fs_type.name, NULL);
+ shm_mnt = kern_mount(&shmem_fs_type);
if (IS_ERR(shm_mnt)) {
error = PTR_ERR(shm_mnt);
printk(KERN_ERR "Could not kern_mount tmpfs\n");

2013-09-01 22:16:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 2:23 PM, Al Viro <[email protected]> wrote:
>
> How much of that is due to br_write_lock() taken in mntput_no_expire()
> for no good reason? IOW, could you try shmem.c patch I've sent yesterday
> and see how much effect does it have?[1] Basically, we get it grabbed
> exclusive on each final fput() of a struct file created by shmem_file_setup(),
> which is _not_ a rare event. And the only reason for that is not having
> shm_mnt marked long-living, even though its refcount never hits 0...

Does not seem to matter. Still 66% mntput_no_expire, 31% path_init.
And that lg_local_lock() takes 5-6% of CPU, pretty much all of which
is that single xadd instruction that implements the spinlock.

This is on /tmp, which is tmpfs. But I don't see how any of that could
matter. "mntput()" does an unconditional call to mntput_no_expire(),
and mntput_no_expire() does that br_read_lock() unconditionally too.

Note that I'm talking about that "cheap" *read* lock being expensive.
It's the local one, not the global one. So it's not what Waiman saw
with the global lock. This is a local per-cpu thing.

That read-lock is supposed to be very cheap - it's just a per-cpu
spinlock. But it ends up being very expensive for some reason. I'm not
quite sure why - I don't see any lg_global_lock() calls at all, so...

I wonder if there is some false sharing going on. But I don't see that
either, this is the percpu offset map afaik:

000000000000f560 d files_lglock_lock
000000000000f564 d nr_dentry
000000000000f568 d last_ino
000000000000f56c d nr_unused
000000000000f570 d nr_inodes
000000000000f574 d vfsmount_lock_lock
000000000000f580 d bh_accounting

and I don't see anything there that would get cross-cpu accesses, so
there shouldn't be any cacheline bouncing. That's the whole point of
percpu variables, after all.

Odd. What am I missing?

Linus
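
[Editor's sketch of the false-sharing scenario being checked for above, as
a self-contained userspace fragment; the 64-byte cacheline size is an
assumption. Two per-thread counters packed into one cacheline bounce that
line between CPUs even though each thread only writes its own counter;
aligning each one to its own cacheline removes the bouncing, which is the
property the per-cpu layout is supposed to give the per-cpu locks.]

#include <stdio.h>

struct shared {                 /* both counters share one cacheline */
        unsigned long a;
        unsigned long b;
};

struct padded {                 /* each counter gets its own cacheline */
        unsigned long a __attribute__((aligned(64)));
        unsigned long b __attribute__((aligned(64)));
};

int main(void)
{
        /* typically prints "16 128" with gcc on x86-64 */
        printf("%zu %zu\n", sizeof(struct shared), sizeof(struct padded));
        return 0;
}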

2013-09-01 22:35:37

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 01, 2013 at 03:16:24PM -0700, Linus Torvalds wrote:

> Does not seem to matter. Still 66% mntput_no_expire, 31% path_init.
> And that lg_local_lock() takes 5-6% of CPU, pretty much all of which
> is that single xadd instruction that implements the spinlock.
>
> This is on /tmp, which is tmpfs. But I don't see how any of that could
> matter. "mntput()" does an unconditional call to mntput_no_expire(),
> and mntput_no_expire() does that br_read_lock() unconditionally too.
>
> Note that I'm talking about that "cheap" *read* lock being expensive.
> It's the local one, not the global one. So it's not what Waiman saw
> with the global lock. This is a local per-cpu thing.
>
> That read-lock is supposed to be very cheap - it's just a per-cpu
> spinlock. But it ends up being very expensive for some reason. I'm not
> quite sure why - I don't see any lg_global_lock() calls at all, so...
>
> I wonder if there is some false sharing going on. But I don't see that
> either, this is the percpu offset map afaik:
>
> 000000000000f560 d files_lglock_lock
> 000000000000f564 d nr_dentry
> 000000000000f568 d last_ino
> 000000000000f56c d nr_unused
> 000000000000f570 d nr_inodes
> 000000000000f574 d vfsmount_lock_lock
> 000000000000f580 d bh_accounting
>
> and I don't see anything there that would get cross-cpu accesses, so
> there shouldn't be any cacheline bouncing. That's the whole point of
> percpu variables, after all.

Hell knows... Are you sure you don't see br_write_lock() at all? I don't
see anything else that would cause cross-cpu traffic with that layout...

2013-09-01 22:45:11

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 01, 2013 at 11:35:21PM +0100, Al Viro wrote:
> > I wonder if there is some false sharing going on. But I don't see that
> > either, this is the percpu offset map afaik:
> >
> > 000000000000f560 d files_lglock_lock
> > 000000000000f564 d nr_dentry
> > 000000000000f568 d last_ino
> > 000000000000f56c d nr_unused
> > 000000000000f570 d nr_inodes
> > 000000000000f574 d vfsmount_lock_lock
> > 000000000000f580 d bh_accounting
> >
> > and I don't see anything there that would get cross-cpu accesses, so
> > there shouldn't be any cacheline bouncing. That's the whole point of
> > percpu variables, after all.
>
> Hell knows... Are you sure you don't see br_write_lock() at all? I don't
> see anything else that would cause cross-cpu traffic with that layout...

GRRR... I see something else:
void file_sb_list_del(struct file *file)
{
if (!list_empty(&file->f_u.fu_list)) {
lg_local_lock_cpu(&files_lglock, file_list_cpu(file));
list_del_init(&file->f_u.fu_list);
lg_local_unlock_cpu(&files_lglock, file_list_cpu(file));
}
}
will cheerfully cause cross-CPU traffic. If that's what is going on, the
earlier patch I've sent (not putting non-regulars and files opened r/o
on ->s_list) should reduce the cacheline bouncing on that cacheline.

2013-09-01 22:48:05

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 3:16 PM, Linus Torvalds
<[email protected]> wrote:
>
> I wonder if there is some false sharing going on. But I don't see that
> either, this is the percpu offset map afaik:
>
> 000000000000f560 d files_lglock_lock
> 000000000000f564 d nr_dentry
> 000000000000f568 d last_ino
> 000000000000f56c d nr_unused
> 000000000000f570 d nr_inodes
> 000000000000f574 d vfsmount_lock_lock
> 000000000000f580 d bh_accounting

I made DEFINE_LGLOCK use DEFINE_PER_CPU_SHARED_ALIGNED for the
spinlock, so that each local lock gets its own cacheline, and the
total loops jumped to 62M (from 52-54M before). So when I looked at
the numbers, I thought "oh, that helped".

But then I looked closer, and realized that I just see a fair amount
of boot-to-boot variation anyway (probably a lot to do with cache
placement and how dentries got allocated etc). And it didn't actually
help at all, the problem is still there, and lg_local_lock is still
really really high on the profile, at 8% cpu time:

- 8.00% lg_local_lock
- lg_local_lock
+ 64.83% mntput_no_expire
+ 33.81% path_init
+ 0.78% mntput
+ 0.58% path_lookupat

which just looks insane. And no, no lg_global_lock visible anywhere..

So it's not false sharing. But something is bouncing *that* particular
lock around.

Linus

---
34.60% lockref_get_or_lock
23.35% lockref_put_or_lock
10.57% dput
8.00% lg_local_lock
1.79% copy_user_enhanced_fast_string
1.15% link_path_walk
1.04% path_lookupat
1.03% sysret_check
1.01% kmem_cache_alloc
1.00% selinux_inode_permission
0.97% __d_lookup_rcu
0.95% kmem_cache_free
0.90% 0x00007f03e0800ee3
0.88% avc_has_perm_noaudit
0.79% cp_new_stat
0.76% avc_has_perm_flags
0.69% path_init
0.68% getname_flags
0.66% system_call
0.58% generic_permission
0.55% lookup_fast
0.54% user_path_at_empty
0.51% vfs_fstatat
0.49% vfs_getattr
0.49% filename_lookup
0.49% strncpy_from_user
0.44% generic_fillattr
0.40% inode_has_perm.isra.32.constprop.61
0.38% ext4_getattr
0.34% complete_walk
0.34% lg_local_unlock
0.27% d_rcu_to_refcount
0.25% __inode_permission
0.23% _copy_to_user
0.23% security_inode_getattr
0.22% mntget
0.22% selinux_inode_getattr
0.21% SYSC_newstat
0.21% mntput_no_expire
0.20% putname
0.17% path_put
0.16% security_inode_permission
0.16% start_routine
0.14% mntput
0.14% final_putname
0.14% _cond_resched
0.12% inode_permission
0.10% user_path_at
0.09% __xstat64
0.07% sys_newstat
0.03% __xstat@plt
0.03% update_cfs_rq_blocked_load
0.02% task_tick_fair
0.01% common_interrupt
0.01% ktime_get
0.01% lapic_next_deadline
0.01% run_timer_softirq
0.01% hsw_unclaimed_reg_check.isra.6
0.01% sched_clock_cpu
0.01% rcu_check_callbacks
0.01% update_cfs_shares
0.01% _raw_spin_lock
0.01% irqtime_account_irq
0.01% __do_softirq
0.01% ret_from_sys_call
0.01% i915_read32
0.01% hrtimer_interrupt
0.01% update_curr
0.01% profile_tick
0.00% intel_pmu_disable_all
0.00% intel_pmu_enable_all
0.00% tg_load_down
0.00% native_sched_clock
0.00% native_apic_msr_eoi_write
0.00% irqtime_account_process_tick.isra.2
0.00% perf_event_task_tick
0.00% clockevents_program_event
0.00% __acct_update_integrals
0.00% rcu_irq_exit

2013-09-01 22:58:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 3:44 PM, Al Viro <[email protected]> wrote:
>
> GRRR... I see something else:
> void file_sb_list_del(struct file *file)
> {
> 	if (!list_empty(&file->f_u.fu_list)) {
> 		lg_local_lock_cpu(&files_lglock, file_list_cpu(file));
> 		list_del_init(&file->f_u.fu_list);
> 		lg_local_unlock_cpu(&files_lglock, file_list_cpu(file));
> 	}
> }
> will cheerfully cause cross-CPU traffic. If that's what is going on, the
> earlier patch I've sent (not putting non-regulars and files opened r/o
> on ->s_list) should reduce the cacheline bouncing on that cacheline.

Hmm. That might indeed be a bad sources of cross-cpu bouncing on some
loads, but the load I test doesn't actually open any files. It just
does "stat()" on a filename.

So no "struct file *" anywhere for me.. It really seems to be
vfsmount_lock itself somehow.

Linus

2013-09-01 23:30:20

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 01, 2013 at 03:48:01PM -0700, Linus Torvalds wrote:
> I made DEFINE_LGLOCK use DEFINE_PER_CPU_SHARED_ALIGNED for the
> spinlock, so that each local lock gets its own cacheline, and the
> total loops jumped to 62M (from 52-54M before). So when I looked at
> the numbers, I thought "oh, that helped".
>
> But then I looked closer, and realized that I just see a fair amount
> of boot-to-boot variation anyway (probably a lot to do with cache
> placement and how dentries got allocated etc). And it didn't actually
> help at all, the problem is still there, and lg_local_lock is still
> really really high on the profile, at 8% cpu time:
>
> - 8.00% lg_local_lock
> - lg_local_lock
> + 64.83% mntput_no_expire
> + 33.81% path_init
> + 0.78% mntput
> + 0.58% path_lookupat
>
> which just looks insane. And no, no lg_global_lock visible anywhere..
>
> So it's not false sharing. But something is bouncing *that* particular
> lock around.

Hrm... It excludes sharing between the locks, all right. AFAICS, that
won't exclude sharing with plain per-cpu vars, will it? Could you
tell what vfsmount_lock is sharing with on that build? The stuff between
it and files_lock doesn't have any cross-CPU writers, but with that
change it's the stuff after it that becomes interesting...

2013-09-02 00:13:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 4:30 PM, Al Viro <[email protected]> wrote:
>
> Hrm... It excludes sharing between the locks, all right. AFAICS, that
> won't exclude sharing with plain per-cpu vars, will it?

Yes it will. DEFINE_PER_CPU_SHARED_ALIGNED not only aligns the data,
it also puts it in a separate section with only other aligned data
entries. So now the percpu address map around it looks like this:

...
0000000000013a80 d call_single_queue
0000000000013ac0 d cfd_data
0000000000013b00 d files_lglock_lock
0000000000013b40 d vfsmount_lock_lock
0000000000013b80 d file_lock_lglock_lock
0000000000013bc0 D softnet_data
0000000000013d40 D __per_cpu_end
..

So there shouldn't be anything to share falsely with.
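
(For illustration, a minimal user-space analogue of what that aligned
placement buys: each slot padded onto its own cache line so neighbouring
slots cannot false-share. The 64-byte line size and all names below are
assumptions for the sketch, not kernel code.)

#include <stdalign.h>
#include <stdint.h>

/* One slot per CPU; alignas(64) pads each slot out to a full cache
 * line, so an update of slot N never dirties the line holding N+1. */
struct percpu_lock_slot {
        alignas(64) uint16_t ticket_lock;
};

static struct percpu_lock_slot slots[64];       /* 64 stands in for NR_CPUS */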

I'd like to say that the profile is bad, but this is *so* consistent,
and the profile data really looks perfectly fine in every other way.
I'm using "-e cycles:pp", so it's using hardware profiling and all the
other functions really look correct.

It *is* one of the few locked accesses remaining, and it's clearly
getting called a lot (three calls per system call: two mntput's - one
for the root path, one for the result path, and one from path_init ->
rcu_walk_init), but with up to 8% CPU time for basically that one
"lock xadd" instruction is damn odd. I can't see how that could happen
without seriously nasty cacheline bouncing, but I can't see how *that*
can happen when all the accesses seem to be from the current CPU.

This is a new Haswell-based machine that I put together yesterday, and
I haven't used it for profiling before. So maybe it _is_ something odd
with the profiling after all, and atomic serializing instructions get
incorrect profile counts.

Linus

2013-09-02 00:50:15

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 5:12 PM, Linus Torvalds
<[email protected]> wrote:
>
> It *is* one of the few locked accesses remaining, and it's clearly
> getting called a lot (three calls per system call: two mntput's - one
> for the root path, one for the result path, and one from path_init ->
> rcu_walk_init), but with up to 8% CPU time for basically that one
> "lock xadd" instruction is damn odd. I can't see how that could happen
> without seriously nasty cacheline bouncing, but I can't see how *that*
> can happen when all the accesses seem to be from the current CPU.

So, I wanted to double-check that "it can only be that expensive if
there's cacheline bouncing" statement. Thinking "maybe it's just
really expensive. Even when running just a single thread".

So I set MAX_THREADS to 1 in my stupid benchmark, just to see what happens..

And almost everything changes as expected: now we don't have any
cacheline bouncing any more, so lockref_put_or_lock() and
lockref_get_or_lock() no longer dominate - instead of being 20%+ each,
they are now just 3%.

What _didn't_ change? Right. lg_local_lock() is still 6.40%. Even when
single-threaded. It's now the #1 function in my profile:

6.40% lg_local_lock
5.42% copy_user_enhanced_fast_string
5.14% sysret_check
4.79% link_path_walk
4.41% 0x00007ff861834ee3
4.33% avc_has_perm_flags
4.19% __lookup_mnt
3.83% lookup_fast

(that "copy_user_enhanced_fast_string" is when we copy the "struct
stat" from kernel space to user space)

The instruction-level profile just looking like

│ ffffffff81078e70 <lg_local_lock>:
2.06 │ push %rbp
1.06 │ mov %rsp,%rbp
0.11 │ mov (%rdi),%rdx
2.13 │ add %gs:0xcd48,%rdx
0.92 │ mov $0x100,%eax
85.87 │ lock xadd %ax,(%rdx)
0.04 │ movzbl %ah,%ecx
│ cmp %al,%cl
3.60 │ ↓ je 31
│ nop
│28: pause
│ movzbl (%rdx),%eax
│ cmp %cl,%al
│ ↑ jne 28
│31: pop %rbp
4.22 │ ← retq

so that instruction sequence is just expensive, and it is expensive
without any cacheline bouncing. The expense seems to be 100% simply
due to the fact that it's an atomic serializing instruction, and it
just gets called way too much.

So lockref_[get|put]_or_lock() are each called once per pathname
lookup (because the RCU accesses to the dentries get turned into a
refcount, and then that refcount gets dropped). But lg_local_lock()
gets called twice: once for path_init(), and once for mntput() - I
think I was wrong about mntput getting called twice.
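
(Side note for readers following along: a rough user-space sketch of the
idea behind lockref_get_or_lock() -- the spinlock and the count share one
64-bit word, and the count is bumped with a cmpxchg as long as the lock
half looks free. The layout and names below are illustrative only, not
the kernel's actual code.)

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct lockref_sketch {
        _Atomic uint64_t lock_count;    /* low 32 bits: lock, high 32 bits: count */
};

static bool lockref_get_if_unlocked(struct lockref_sketch *lr)
{
        uint64_t old = atomic_load(&lr->lock_count);

        for (int tries = 0; tries < 16; tries++) {
                if ((uint32_t)old != 0)         /* lock half held: caller falls back */
                        return false;
                uint64_t new = old + ((uint64_t)1 << 32);       /* bump the count half */
                if (atomic_compare_exchange_weak(&lr->lock_count, &old, new))
                        return true;            /* got the reference without the lock */
        }
        return false;                           /* too much churn: take the lock instead */
}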

So it doesn't seem to be cacheline bouncing at all. It's just
"serializing instructions are really expensive" together with calling
that function too much. And we've optimized pathname lookup so much
that even a single locked instruction shows up like a sort thumb.

I guess we should be proud.

Linus

2013-09-02 07:05:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount


* Linus Torvalds <[email protected]> wrote:

> On Sun, Sep 1, 2013 at 5:12 PM, Linus Torvalds
> <[email protected]> wrote:
> >
> > It *is* one of the few locked accesses remaining, and it's clearly
> > getting called a lot (three calls per system call: two mntput's - one
> > for the root path, one for the result path, and one from path_init ->
> > rcu_walk_init), but with up to 8% CPU time for basically that one
> > "lock xadd" instruction is damn odd. I can't see how that could happen
> > without seriously nasty cacheline bouncing, but I can't see how *that*
> > can happen when all the accesses seem to be from the current CPU.
>
> So, I wanted to double-check that "it can only be that expensive if
> there's cacheline bouncing" statement. Thinking "maybe it's just
> really expensive. Even when running just a single thread".
>
> So I set MAX_THREADS to 1 in my stupid benchmark, just to see what happens..
>
> And almost everything changes as expected: now we don't have any
> cacheline bouncing any more, so lockref_put_or_lock() and
> lockref_get_or_lock() no longer dominate - instead of being 20%+ each,
> they are now just 3%.
>
> What _didn't_ change? Right. lg_local_lock() is still 6.40%. Even when
> single-threaded. It's now the #1 function in my profile:
>
> 6.40% lg_local_lock
> 5.42% copy_user_enhanced_fast_string
> 5.14% sysret_check
> 4.79% link_path_walk
> 4.41% 0x00007ff861834ee3
> 4.33% avc_has_perm_flags
> 4.19% __lookup_mnt
> 3.83% lookup_fast
>
> (that "copy_user_enhanced_fast_string" is when we copy the "struct
> stat" from kernel space to user space)
>
> The instruction-level profile just looking like
>
> │ ffffffff81078e70 <lg_local_lock>:
> 2.06 │ push %rbp
> 1.06 │ mov %rsp,%rbp
> 0.11 │ mov (%rdi),%rdx
> 2.13 │ add %gs:0xcd48,%rdx
> 0.92 │ mov $0x100,%eax
> 85.87 │ lock xadd %ax,(%rdx)
> 0.04 │ movzbl %ah,%ecx
> │ cmp %al,%cl
> 3.60 │ ↓ je 31
> │ nop
> │28: pause
> │ movzbl (%rdx),%eax
> │ cmp %cl,%al
> │ ↑ jne 28
> │31: pop %rbp
> 4.22 │ ← retq

The Haswell perf code isn't very widely tested yet as it took quite some
time to get it ready for upstream and thus got merged late, but on its
face this looks like a pretty good profile.

With one detail:

> so that instruction sequence is just expensive, and it is expensive
> without any cacheline bouncing. The expense seems to be 100% simply due
> to the fact that it's an atomic serializing instruction, and it just
> gets called way too much.
>
> So lockref_[get|put]_or_lock() are each called once per pathname lookup
> (because the RCU accesses to the dentries get turned into a refcount,
> and then that refcount gets dropped). But lg_local_lock() gets called
> twice: once for path_init(), and once for mntput() - I think I was wrong
> about mntput getting called twice.
>
> So it doesn't seem to be cacheline bouncing at all. It's just
> "serializing instructions are really expensive" together with calling
> that function too much. And we've optimized pathname lookup so much that
> even a single locked instruction shows up like a sore thumb.
>
> I guess we should be proud.

It still looks anomalous to me, on fresh Intel hardware. One suggestion:
could you, just for pure testing purposes, turn HT off and do a quick
profile that way?

The XADD, even if it's all in the fast path, could be a pretty natural
point to 'yield' an SMT context on a given core, giving it artificially
high overhead.

Note that to test HT off an intrusive reboot is probably not needed, if
the HT siblings are right after each other in the CPU enumeration sequence
then you can turn HT "off" effectively by running the workload only on 4
cores:

taskset 0x55 ./my-test

and reducing the # of your workload threads to 4 or so.

Thanks,

Ingo

2013-09-02 10:30:57

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 5:55 PM, Linus Torvalds
<[email protected]> wrote:
> On Sun, Sep 1, 2013 at 8:45 AM, Sedat Dilek <[email protected]> wrote:
>>
>> Samples: 160K of event 'cycles:pp', Event count (approx.): 77003901089
>> + 12,46% t_lockref_from- [kernel.kallsyms] [k] irq_return
>> + 4,86% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
>> + 4,42% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
>> + 4,28% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
>
> You need to go into __ticket_spin_lock to see who the callers are.
>
> Just go down to it and press enter to expand it (and then you need to
> go and expand that entry too to get the callers)
>

I am new to perf usage.

4,60% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock

Which entry to select?

Annotate __ticket_spin_lock
Zoom into t_lockref_from-(3962) thread
Zoom into the Kernel DSO
Browse map details
Run scripts for samples of thread [t_lockref_from-]
Run scripts for samples of symbol [__ticket_spin_lock]
Run scripts for all samples
Switch to another data file in PWD
Exit

> I still don't know how you get to irq_return. It should use sysret. Odd.

- Sedat -

2013-09-02 16:09:45

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 9/2/13 4:30 AM, Sedat Dilek wrote:
> On Sun, Sep 1, 2013 at 5:55 PM, Linus Torvalds
> <[email protected]> wrote:
>> On Sun, Sep 1, 2013 at 8:45 AM, Sedat Dilek <[email protected]> wrote:
>>>
>>> Samples: 160K of event 'cycles:pp', Event count (approx.): 77003901089
>>> + 12,46% t_lockref_from- [kernel.kallsyms] [k] irq_return
>>> + 4,86% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
>>> + 4,42% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
>>> + 4,28% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
>>
>> You need to go into __ticket_spin_lock to see who the callers are.
>>
>> Just go down to it and press enter to expand it (and then you need to
>> go and expand that entry too to get the callers)
>>
>
> I am new to perf usage.

One option for you is 'perf report --stdio ..'. That bypasses the tui.

David

2013-09-02 16:44:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Mon, Sep 2, 2013 at 12:05 AM, Ingo Molnar <[email protected]> wrote:
>
> The Haswell perf code isn't very widely tested yet as it took quite some
> time to get it ready for upstream and thus got merged late, but on its
> face this looks like a pretty good profile.

Yes. And everything else looks fine too. Profiles without locked
instructions all look very reasonable, and have the expected patterns.

> It still looks anomalous to me, on fresh Intel hardware. One suggestion:
> could you, just for pure testing purposes, turn HT off and do a quick
> profile that way?
>
> The XADD, even if it's all in the fast path, could be a pretty natural
> point to 'yield' an SMT context on a given core, giving it artificially
> high overhead.
>
> Note that to test HT off an intrusive reboot is probably not needed, if
> the HT siblings are right after each other in the CPU enumeration sequence
> then you can turn HT "off" effectively by running the workload only on 4
> cores:
>
> taskset 0x55 ./my-test
>
> and reducing the # of your workload threads to 4 or so.

Remember: I see the exact same profile for single-thread behavior.
Other things change (iow, lockref_get_or_lock() is either ~3% or ~30%
- the latter case is for when there are bouncing cachelines), but
lg_local_lock() stays pretty constant.

So it's not a HT artifact or anything like that.

I've timed "lock xadd" separately, and it's not a slow instruction. I
also tried (in user space, using thread-local storage) to see if it's
the combination of creating the address through a segment load and
that somehow causing a micro-exception or something (the P4 used to
have things like that), and that doesn't seem to account for it
either.
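
(For reference, a rough user-space timing loop of that flavour -- just a
plain global, without the thread-local-storage addressing mentioned
above; purely illustrative, not the actual test that was run.)

#include <stdio.h>
#include <stdint.h>
#include <time.h>

static uint16_t ticket;

int main(void)
{
        const long loops = 100 * 1000 * 1000;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < loops; i++) {
                uint16_t inc = 0x100;   /* same increment the ticket lock uses */
                asm volatile("lock xaddw %0, %1"
                             : "+r" (inc), "+m" (ticket) : : "memory");
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.2f ns per lock xadd\n", ns / loops);
        return 0;
}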

It is entirely possible that it is just a "cycles:pp" oddity - because
the "lock xadd" is serializing, it can't retire until everything
around it has been sorted out, and maybe it just shows up in profiles
more than is really "fair" to the instruction itself, because it ends
up being that stable point for potentially hundreds of instructions
around it.

Linus

2013-09-02 19:26:20

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 08/30/2013 10:42 PM, Al Viro wrote:
> On Sat, Aug 31, 2013 at 03:35:16AM +0100, Al Viro wrote:
>
>> Aha... OK, I see what's going on. We end up with shm_mnt *not* marked
>> as long-living vfsmount, even though it lives forever. See if the
>> following helps; if it does (and I very much expect it to), we want to
>> put it in -stable. As it is, you get slow path in mntput() each time
>> a file created by shmem_file_setup() gets closed. For no reason whatsoever...
> We still want MS_NOUSER on shm_mnt, so we'd better make sure that
> shmem_fill_super() sets it on the internal instance... Fixed variant
> follows:
>
> Signed-off-by: Al Viro <[email protected]>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index e43dc55..5261498 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2615,13 +2615,15 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
>  	 * tmpfs instance, limiting inodes to one per page of lowmem;
>  	 * but the internal instance is left unlimited.
>  	 */
> -	if (!(sb->s_flags & MS_NOUSER)) {
> +	if (!(sb->s_flags & MS_KERNMOUNT)) {
>  		sbinfo->max_blocks = shmem_default_max_blocks();
>  		sbinfo->max_inodes = shmem_default_max_inodes();
>  		if (shmem_parse_options(data, sbinfo, false)) {
>  			err = -EINVAL;
>  			goto failed;
>  		}
> +	} else {
> +		sb->s_flags |= MS_NOUSER;
>  	}
>  	sb->s_export_op = &shmem_export_ops;
>  	sb->s_flags |= MS_NOSEC;
> @@ -2831,8 +2833,7 @@ int __init shmem_init(void)
>  		goto out2;
>  	}
> 
> -	shm_mnt = vfs_kern_mount(&shmem_fs_type, MS_NOUSER,
> -				 shmem_fs_type.name, NULL);
> +	shm_mnt = kern_mount(&shmem_fs_type);
>  	if (IS_ERR(shm_mnt)) {
>  		error = PTR_ERR(shm_mnt);
>  		printk(KERN_ERR "Could not kern_mount tmpfs\n");

Yes, that patch worked. It eliminated the lglock as a bottleneck in the
AIM7 workload. The lg_global_lock did not show up in the perf profile,
whereas the lg_local_lock was only 0.07%.

Regards,
Longman

2013-09-03 06:01:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount


* Waiman Long <[email protected]> wrote:

> On 08/30/2013 10:42 PM, Al Viro wrote:
> >On Sat, Aug 31, 2013 at 03:35:16AM +0100, Al Viro wrote:
> >
> >>Aha... OK, I see what's going on. We end up with shm_mnt *not* marked
> >>as long-living vfsmount, even though it lives forever. See if the
> >>following helps; if it does (and I very much expect it to), we want to
> >>put it in -stable. As it is, you get slow path in mntput() each time
> >>a file created by shmem_file_setup() gets closed. For no reason whatsoever...
> >We still want MS_NOUSER on shm_mnt, so we'd better make sure that
> >shmem_fill_super() sets it on the internal instance... Fixed variant
> >follows:
> >
> >Signed-off-by: Al Viro<[email protected]>
> >diff --git a/mm/shmem.c b/mm/shmem.c
> >index e43dc55..5261498 100644
> >--- a/mm/shmem.c
> >+++ b/mm/shmem.c
> >@@ -2615,13 +2615,15 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
> > * tmpfs instance, limiting inodes to one per page of lowmem;
> > * but the internal instance is left unlimited.
> > */
> >- if (!(sb->s_flags& MS_NOUSER)) {
> >+ if (!(sb->s_flags& MS_KERNMOUNT)) {
> > sbinfo->max_blocks = shmem_default_max_blocks();
> > sbinfo->max_inodes = shmem_default_max_inodes();
> > if (shmem_parse_options(data, sbinfo, false)) {
> > err = -EINVAL;
> > goto failed;
> > }
> >+ } else {
> >+ sb->s_flags |= MS_NOUSER;
> > }
> > sb->s_export_op =&shmem_export_ops;
> > sb->s_flags |= MS_NOSEC;
> >@@ -2831,8 +2833,7 @@ int __init shmem_init(void)
> > goto out2;
> > }
> >
> >- shm_mnt = vfs_kern_mount(&shmem_fs_type, MS_NOUSER,
> >- shmem_fs_type.name, NULL);
> >+ shm_mnt = kern_mount(&shmem_fs_type);
> > if (IS_ERR(shm_mnt)) {
> > error = PTR_ERR(shm_mnt);
> > printk(KERN_ERR "Could not kern_mount tmpfs\n");
>
> Yes, that patch worked. It eliminated the lglock as a bottleneck in
> the AIM7 workload. The lg_global_lock did not show up in the perf
> profile, whereas the lg_local_lock was only 0.07%.

Just curious: what's the worst bottleneck now in the optimized kernel? :-)

Thanks,

Ingo

2013-09-03 07:24:09

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 8:01 AM, Ingo Molnar <[email protected]> wrote:
>
> * Waiman Long <[email protected]> wrote:
>
>> On 08/30/2013 10:42 PM, Al Viro wrote:
>> >On Sat, Aug 31, 2013 at 03:35:16AM +0100, Al Viro wrote:
>> >
>> >>Aha... OK, I see what's going on. We end up with shm_mnt *not* marked
>> >>as long-living vfsmount, even though it lives forever. See if the
>> >>following helps; if it does (and I very much expect it to), we want to
>> >>put it in -stable. As it is, you get slow path in mntput() each time
>> >>a file created by shmem_file_setup() gets closed. For no reason whatsoever...
>> >We still want MS_NOUSER on shm_mnt, so we'd better make sure that
>> >shmem_fill_super() sets it on the internal instance... Fixed variant
>> >follows:
>> >
>> >Signed-off-by: Al Viro<[email protected]>
>> >diff --git a/mm/shmem.c b/mm/shmem.c
>> >index e43dc55..5261498 100644
>> >--- a/mm/shmem.c
>> >+++ b/mm/shmem.c
>> >@@ -2615,13 +2615,15 @@ int shmem_fill_super(struct super_block *sb, void *data, int silent)
>> > * tmpfs instance, limiting inodes to one per page of lowmem;
>> > * but the internal instance is left unlimited.
>> > */
>> >- if (!(sb->s_flags& MS_NOUSER)) {
>> >+ if (!(sb->s_flags& MS_KERNMOUNT)) {
>> > sbinfo->max_blocks = shmem_default_max_blocks();
>> > sbinfo->max_inodes = shmem_default_max_inodes();
>> > if (shmem_parse_options(data, sbinfo, false)) {
>> > err = -EINVAL;
>> > goto failed;
>> > }
>> >+ } else {
>> >+ sb->s_flags |= MS_NOUSER;
>> > }
>> > sb->s_export_op =&shmem_export_ops;
>> > sb->s_flags |= MS_NOSEC;
>> >@@ -2831,8 +2833,7 @@ int __init shmem_init(void)
>> > goto out2;
>> > }
>> >
>> >- shm_mnt = vfs_kern_mount(&shmem_fs_type, MS_NOUSER,
>> >- shmem_fs_type.name, NULL);
>> >+ shm_mnt = kern_mount(&shmem_fs_type);
>> > if (IS_ERR(shm_mnt)) {
>> > error = PTR_ERR(shm_mnt);
>> > printk(KERN_ERR "Could not kern_mount tmpfs\n");
>>
>> Yes, that patch worked. It eliminated the lglock as a bottleneck in
>> the AIM7 workload. The lg_global_lock did not show up in the perf
>> profile, whereas the lg_local_lock was only 0.07%.
>
> Just curious: what's the worst bottleneck now in the optimized kernel? :-)
>
> Thanks,
>
> Ingo

Can someone summarize this thread (70+ postings)?
Which patches are needed? And fixing what?
( Can people provide separate patches with a proper changelog? )
Improvements?


Thanks.

- Sedat (still struggling with perf) -

2013-09-03 10:15:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount


* Linus Torvalds <[email protected]> wrote:

> On Mon, Sep 2, 2013 at 12:05 AM, Ingo Molnar <[email protected]> wrote:
> >
> > The Haswell perf code isn't very widely tested yet as it took quite some
> > time to get it ready for upstream and thus got merged late, but on its
> > face this looks like a pretty good profile.
>
> Yes. And everything else looks fine too. Profiles without locked
> instructions all look very reasonable, and have the expected patterns.
>
> > It still looks anomalous to me, on fresh Intel hardware. One suggestion:
> > could you, just for pure testing purposes, turn HT off and do a quick
> > profile that way?
> >
> > The XADD, even if it's all in the fast path, could be a pretty natural
> > point to 'yield' an SMT context on a given core, giving it artificially
> > high overhead.
> >
> > Note that to test HT off an intrusive reboot is probably not needed, if
> > the HT siblings are right after each other in the CPU enumeration sequence
> > then you can turn HT "off" effectively by running the workload only on 4
> > cores:
> >
> > taskset 0x55 ./my-test
> >
> > and reducing the # of your workload threads to 4 or so.
>
> Remember: I see the exact same profile for single-thread behavior.

Oh, indeed.

> Other things change (iow, lockref_get_or_lock() is either ~3% or ~30% -
> the latter case is for when there are bouncing cachelines), but
> lg_local_lock() stays pretty constant.
>
> So it's not a HT artifact or anything like that.
>
> I've timed "lock xadd" separately, and it's not a slow instruction. I
> also tried (in user space, using thread-local storage) to see if it's
> the combination of creating the address through a segment load and that
> somehow causing a micro-exception or something (the P4 used to have
> things like that), and that doesn't seem to account for it either.
>
> It is entirely possible that it is just a "cycles:pp" oddity - because
> the "lock xadd" is serializing, it can't retire until everything around
> it has been sorted out, and maybe it just shows up in profiles more than
> is really "fair" to the instruction itself, because it ends up being
> that stable point for potentially hundreds of instructions around it.

One more thing to try would be a regular '-e cycles' non-PEBS run and see
whether there's still largish overhead visible around that instruction.

That reintroduces skid, but it eliminates any PEBS and LBR funnies, as our
cycles:pp event is a really tricky/complex beast internally.

Thanks,

Ingo

2013-09-03 14:08:12

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

Hi!

> It is entirely possible that it is just a "cycles:pp" oddity - because
> the "lock xadd" is serializing, it can't retire until everything
> around it has been sorted out, and maybe it just shows up in profiles
> more than is really "fair" to the instruction itself, because it ends
> up being that stable point for potentially hundreds of instructions
> around it.

Hmm, turn "lock xadd" into plain add, and see if it improves loops per
second? Preferably on scratch filesystem, because you may corrupt it...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2013-09-03 15:15:13

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 09/03/2013 02:01 AM, Ingo Molnar wrote:
> * Waiman Long<[email protected]> wrote:
>
>> Yes, that patch worked. It eliminated the lglock as a bottleneck in
>> the AIM7 workload. The lg_global_lock did not show up in the perf
>> profile, whereas the lg_local_lock was only 0.07%.
> Just curious: what's the worst bottleneck now in the optimized kernel? :-)
>
> Thanks,
>
> Ingo
With the following patches on v3.11:
1. Linus's version of lockref patch
2. Al's lglock patch
3. My preliminary patch to convert prepend_path under RCU

The perf profile of the kernel portion of the short workload in a
80-core system became like this:

29.87% reaim [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|--50.00%-- tty_ldisc_deref
|--49.01%-- tty_ldisc_try
--0.99%-- [...]

7.55% swapper [kernel.kallsyms] [k] intel_idle
1.03% reaim [kernel.kallsyms] [k] copy_user_generic_string
0.91% reaim [kernel.kallsyms] [k] _raw_spin_lock
|--15.88%-- __rcu_process_callbacks
|--6.55%-- load_balance
|--6.02%-- sem_lock
|--4.77%-- enqueue_to_backlog
|--4.21%-- task_rq_lock
|--3.97%-- process_backlog
|--3.35%-- unix_dgram_sendmsg
|--3.28%-- kmem_cache_free
|--3.16%-- tcp_v4_rcv
|--2.77%-- unix_stream_sendmsg
|--2.36%-- rcu_accelerate_cbs
|--2.02%-- do_wp_page
|--2.02%-- unix_create1
|--1.83%-- unix_peer_get
|--1.67%-- udp_lib_get_port
|--1.66%-- unix_stream_recvmsg
|--1.63%-- handle_pte_fault
|--1.63%-- udp_queue_rcv_skb
|--1.54%-- unix_release_sock
|--1.48%-- try_to_wake_up
|--1.37%-- do_anonymous_page
|--1.37%-- new_inode_pseudo
|--1.33%-- __d_lookup
|--1.20%-- free_one_page
|--1.11%-- __do_fault
|--1.06%-- scheduler_tick
|--0.90%-- __drain_alien_cache
|--0.81%-- inet_csk_get_port
|--0.76%-- sock_alloc
|--0.76%-- shmem_lock
|--0.75%-- __d_instantiate
|--0.70%-- __inet_hash_connect
|--0.69%-- __inet_hash_nolisten
|--0.68%-- ip_local_deliver_finish
|--0.64%-- inet_hash
|--0.64%-- kfree
|--0.60%-- d_path
|--0.58%-- __close_fd
|--0.51%-- evict
--11.76%-- [...]

0.51% reaim [ip_tables] [k] ipt_do_table
0.46% reaim [kernel.kallsyms] [k] __alloc_skb
0.38% reaim [kernel.kallsyms] [k] kfree
0.36% reaim [kernel.kallsyms] [k] kmem_cache_free
0.34% reaim [kernel.kallsyms] [k] system_call_after_swapg
0.32% reaim [kernel.kallsyms] [k] fsnotify
0.32% reaim [kernel.kallsyms] [k] ip_finish_output
0.27% reaim [kernel.kallsyms] [k] system_call

Other than the global tty_ldisc_lock, there is no other major
bottleneck. I am not that worried about the tty_ldisc_lock bottleneck
as real world applications probably won't have that many calls to
set the tty driver.

Regards,
Longman

2013-09-03 15:34:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 8:14 AM, Waiman Long <[email protected]> wrote:
>
> Other than the global tty_ldisc_lock, there is no other major
> bottleneck. I am not that worried about the tty_ldisc_lock bottleneck
> as real world applications probably won't have that many calls to
> set the tty driver.

I suspect the tty_ldisc_lock() could be made to go away if we care.
Making the ldisc be rcu-free'd, coupled with just optimistically
updating the count using "atomic_inc_not_zero()" and re-checking the
ldisc pointer afterwards should probably do it. But I agree that it
probably isn't worth it for any actual real load.
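
(A rough standalone sketch of that optimistic pattern -- take the
reference only if the count is already non-zero, then re-check that the
ldisc pointer didn't change underneath us. C11 atomics stand in for RCU
and atomic_inc_not_zero() here, and every name below is made up.)

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ldisc_sketch {
        _Atomic int users;
};

struct tty_sketch {
        struct ldisc_sketch *_Atomic ldisc;
};

static bool get_if_nonzero(_Atomic int *count)
{
        int old = atomic_load(count);

        while (old != 0) {
                if (atomic_compare_exchange_weak(count, &old, old + 1))
                        return true;    /* bumped a live count */
        }
        return false;                   /* already zero: object is going away */
}

static struct ldisc_sketch *ldisc_ref_sketch(struct tty_sketch *tty)
{
        struct ldisc_sketch *ld = atomic_load(&tty->ldisc);

        if (ld && get_if_nonzero(&ld->users)) {
                /* Re-check: was the ldisc swapped out while we took the ref? */
                if (atomic_load(&tty->ldisc) == ld)
                        return ld;
                atomic_fetch_sub(&ld->users, 1);        /* lost the race, drop it */
        }
        return NULL;
}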

Linus

2013-09-03 15:38:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 12:24 AM, Sedat Dilek <[email protected]> wrote:
>
> Can someone summarize this thread (70+ postings)?
> Which patches are needed? And fixing what?
> ( Can people provide separate patches with a proper changelog? )
> Improvements?

The core lockref part is now merged and available in my git tree (and
yes, as individual patches with proper logs).

Al's patch to avoid the lg_lock is still pending, I'm assuming I'll
get it through his VFS tree.

Linus

2013-09-03 15:41:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 3:15 AM, Ingo Molnar <[email protected]> wrote:
>
> One more thing to try would be a regular '-e cycles' non-PEBS run and see
> whether there's still largish overhead visible around that instruction.

I've done that, and it matches the PEBS runs, except obviously with
the instruction skew (so then depending on run it's 95% the
instruction after the xadd). So the PEBS profiles are entirely
consistent with other data.

Linus

2013-09-03 18:34:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 8:41 AM, Linus Torvalds
<[email protected]> wrote:
>
> I've done that, and it matches the PEBS runs, except obviously with
> the instruction skew (so then depending on run it's 95% the
> instruction after the xadd). So the PEBS profiles are entirely
> consistent with other data.

So one thing that strikes me about our lg-locks is that they are
designed to be cheap, but they force this insane 3-deep memory access
chain to lock them.

That may be a large part of why lg_local_lock shows up so clearly on
my profiles: the single "lock xadd" instruction ends up not just being
serializing, but it is what actually consumes the previous memory
reads.

The core of the lg_local_lock sequence ends up being this
four-instruction sequence:

mov (%rdi),%rdx
add %gs:0xcd48,%rdx
mov $0x100,%eax
lock xadd %ax,(%rdx)

and that's a nasty chain of dependent memory loads. First we load the
percpu address, then we add the percpu offset to that, and then we do
the xadd on the result.

It's kind of sad, because in *theory* we could get rid of that whole
thing entirely, and just do it as one single

mov $0x100,%eax
lock xadd %ax,%gs:vfsmount_lock

that only has one single memory access, not three dependent ones.

But the two extra memory accesses come from:

- the lglock data structure isn't a percpu data structure, it's this
stupid global data structure that has a percpu pointer in it. So that
first "mov (%rdi),%rdx" is purely to load what is effectively a
constant address (per lglock).

And that's not because it wants to be, but because we associate
global lockdep data with it. Ugh. If it wasn't for that, we could just
make them percpu.

- we don't have a percpu spinlock accessor, so we always need to turn
the percpu address into a global address by adding the percpu base
(and that's the "add %gs:...,%rdx" part).
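
(As a rough illustration of the difference, with an ordinary array
standing in for per-cpu storage and made-up names: the current shape
needs a dependent pointer load before the per-cpu offset can be applied,
while a truly per-cpu lock needs only the offset.)

#include <stdint.h>

#define NR_CPUS_SKETCH 64

/* Current shape: a global struct holding a pointer to the per-cpu
 * lock storage -- the pointer has to be loaded before we can even
 * start computing the per-cpu address. */
struct lglock_sketch {
        uint16_t *percpu_locks;
};

static uint16_t lock_storage[NR_CPUS_SKETCH];
static struct lglock_sketch files_lglock_sketch = { lock_storage };

static uint16_t *lock_addr_indirect(int cpu)
{
        return &files_lglock_sketch.percpu_locks[cpu];  /* load pointer, then index */
}

/* The shape being wished for: the lock itself is the per-cpu object,
 * so only the "add the per-cpu offset" step remains. */
static uint16_t direct_locks[NR_CPUS_SKETCH];

static uint16_t *lock_addr_direct(int cpu)
{
        return &direct_locks[cpu];                      /* index directly */
}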

Oh well. This whole "lg_local_lock" is really noticeable on my
test-case mainly because my test-case only stat's a pathname with a
single path component, so the whole lookup really is dominated by all
the "setup/teardown" code. Real loads tend to look up much longer
pathnames, so the setup/teardown isn't so dominant, and actually
looking up the dentries from the hash chain is where most of the time
goes. But it's annoying to have that one big spike in the profile and
not being able to do anything about it.

Linus

2013-09-03 19:09:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 8:34 AM, Linus Torvalds
<[email protected]> wrote:
>
> I suspect the tty_ldisc_lock() could be made to go away if we care.

Heh. I just pulled the tty patches from Greg, and the locking has
changed completely.

It may actually fix your AIM7 test-case, because while the global
spinlock remains (it got renamed to "tty_ldiscs_lock" - there's an
added "s"), the common operations now take the per-tty lock to get the
ldisc for that tty, rather than that global spinlock (which just
protects the actual ldisc array now).

That said, I don't know what AIM7 really ends up doing, but your
profile seems to have every access through tty_ldisc_[de]ref() that
now uses only the per-tty lock. Of course, how much that helps ends up
depending on whether AIM7 uses lots of tty's or just one shared one.

Anyway, it might be worth testing my current -git tree.

Linus

2013-09-03 19:19:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount


* Linus Torvalds <[email protected]> wrote:

> On Tue, Sep 3, 2013 at 8:41 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > I've done that, and it matches the PEBS runs, except obviously with
> > the instruction skew (so then depending on run it's 95% the
> > instruction after the xadd). So the PEBS profiles are entirely
> > consistent with other data.
>
> So one thing that strikes me about our lg-locks is that they are
> designed to be cheap, but they force this insane 3-deep memory access
> chain to lock them.
>
> That may be a large part of why lg_local_lock shows up so clearly on my
> profiles: the single "lock xadd" instruction ends up not just being
> serializing, but it is what actually consumes the previous memory reads.
>
> The core of the lg_local_lock sequence ends up being this
> four-instruction sequence:
>
> mov (%rdi),%rdx
> add %gs:0xcd48,%rdx
> mov $0x100,%eax
> lock xadd %ax,(%rdx)
>
> and that's a nasty chain of dependent memory loads. First we load the
> percpu address, then we add the percpu offset to that, and then we do
> the xadd on the result.
>
> It's kind of sad, because in *theory* we could get rid of that whole
> thing entirely, and just do it as one single
>
> mov $0x100,%eax
> lock xadd %ax,%gs:vfsmount_lock
>
> that only has one single memory access, not three dependent ones.
>
> But the two extra memory accesses come from:
>
> - the lglock data structure isn't a percpu data structure, it's this
> stupid global data structure that has a percpu pointer in it. So that
> first "mov (%rdi),%rdx" is purely to load what is effectively a constant
> address (per lglock).
>
> And that's not because it wants to be, but because we associate
> global lockdep data with it. Ugh. If it wasn't for that, we could just
> make them percpu.

I don't think that's fundamental - the per CPU lock was percpu before:

#define DEFINE_LGLOCK(name) \
- \
- DEFINE_SPINLOCK(name##_cpu_lock); \
- DEFINE_PER_CPU(arch_spinlock_t, name##_lock); \
- DEFINE_LGLOCK_LOCKDEP(name); \


but AFAICS got converted to a pointer via this commit:

commit eea62f831b8030b0eeea8314eed73b6132d1de26
Author: Andi Kleen <[email protected]>
Date: Tue May 8 13:32:24 2012 +0930

brlocks/lglocks: turn into functions

lglocks and brlocks are currently generated with some complicated
macros in lglock.h. But there's no reason to not just use common
utility functions and put all the data into a common data structure.

Since there are at least two users it makes sense to share this code
in a library. This is also easier maintainable than a macro forest.

This will also make it later possible to dynamically allocate lglocks
and also use them in modules (this would both still need some
additional, but now straightforward, code)

Which was a rather misguided premise IMHO.

Thanks,

Ingo

2013-09-03 21:01:23

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 09/03/2013 03:09 PM, Linus Torvalds wrote:
> On Tue, Sep 3, 2013 at 8:34 AM, Linus Torvalds
> <[email protected]> wrote:
>> I suspect the tty_ldisc_lock() could be made to go away if we care.
> Heh. I just pulled the tty patches from Greg, and the locking has
> changed completely.
>
> It may actually fix your AIM7 test-case, because while the global
> spinlock remains (it got renamed to "tty_ldiscs_lock" - there's an
> added "s"), the common operations now take the per-tty lock to get the
> ldisc for that tty, rather than that global spinlock (which just
> protects the actual ldisc array now).
>
> That said, I don't know what AIM7 really ends up doing, but your
> profile seems to have every access through tty_ldisc_[de]ref() that
> now uses only the per-tty lock. Of course, how much that helps ends up
> depending on whether AIM7 uses lots of tty's or just one shared one.
>
> Anyway, it might be worth testing my current -git tree.
>
> Linus

Thanks for the news. I will fetch your latest git tree and try it out.

-Longman

2013-09-03 21:05:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 12:19 PM, Ingo Molnar <[email protected]> wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
>>
>> - the lglock data structure isn't a percpu data structure, it's this
>> stupid global data structure that has a percpu pointer in it. So that
>> first "mov (%rdi),%rdx" is purely to load what is effectively a constant
>> address (per lglock).
>>
>> And that's not because it wants to be, but because we associate
>> global lockdep data with it. Ugh. If it wasn't for that, we could just
>> make them percpu.
>
> I don't think that's fundamental - the per CPU lock was percpu before:
[...]
> but AFAICS got converted to a pointer via this commit:
>
> commit eea62f831b8030b0eeea8314eed73b6132d1de26
> Author: Andi Kleen <[email protected]>
> Date: Tue May 8 13:32:24 2012 +0930
>
> brlocks/lglocks: turn into functions

So instead of reverting that entirely, how about making "struct
lglock" always entirely per-cpu, and replacing the percpu pointer
with the lock itself.

Then, we say "the lockdep map is always on CPU#0".

TOTALLY UNTESTED PATCH ATTACHED. It compiles in at least a couple of
configurations, and I checked that this removes _one_ of the
indirections (the other one is because we don't have a native per-cpu
spinlock helper function, so we need to do that percpu base addition),
but I haven't actually dared to try to boot it.

Comments?

I'll try booting it and seeing if it actually works (and if it makes
any difference), but it seems to be a reasonable approach. I think it
actually cleans things up a bit, but maybe that's just because I
touched the code now.

Linus


Attachments:
patch.diff (6.27 kB)

2013-09-03 21:13:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 2:05 PM, Linus Torvalds
<[email protected]> wrote:
>
> TOTALLY UNTESTED PATCH ATTACHED.

Actually, that was the previous (broken) version of that patch - I
hadn't regenerated it after fixing some stupid compile errors, and it
had the DECLARE parts wrong.

This is the one that actually compiles. Whether it *works* is still a
total mystery.

Linus


Attachments:
patch.diff (6.27 kB)

2013-09-03 21:34:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 2:13 PM, Linus Torvalds
<[email protected]> wrote:
>
> This is the one that actually compiles. Whether it *works* is still a
> total mystery.

It generates ok code, and it booted, so it seems to work at least for my config.

However, it seems to make no performance-difference what-so-ever, and
lg_local_lock is still using about 7% cpu per the profiles.

The code generation is slightly better, but the profile looks the same:

│ ffffffff81078e70 <lg_local_lock>:
0.62 │ push %rbp
0.28 │ mov %rsp,%rbp
0.22 │ add %gs:0xcd48,%rdi
0.27 │ mov $0x100,%eax
97.22 │ lock xadd %ax,(%rdi)
0.01 │ movzbl %ah,%edx
│ cmp %al,%dl
0.56 │ ↓ je 29
│ xchg %ax,%ax
0.00 │20: pause
0.00 │ movzbl (%rdi),%eax
│ cmp %dl,%al
│ ↑ jne 20
│29: pop %rbp
0.81 │ ← retq

but it still obviously doesn't do the "lock xadd %ax,%gs:(%rdi)"
(without the preceding 'add') that would be the optimal code.

I'll try to hack that up too, but it's looking like it really is just
the "lock xadd", not the memory dependency chain..

Linus

2013-09-03 21:39:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 2:34 PM, Linus Torvalds
<[email protected]> wrote:
>
> I'll try to hack that up too, but it's looking like it really is just
> the "lock xadd", not the memory dependency chain..

Yeah, no difference: Better code generation with my quick hack for a
percpu spinlock:

│ ffffffff81078e70 <lg_local_lock>:
0.59 │ push %rbp
0.25 │ mov %rsp,%rbp
0.07 │ mov $0x100,%eax
97.55 │ lock xadd %ax,%gs:(%rdi)
0.01 │ movzbl %ah,%edx
│ cmp %al,%dl
0.68 │ ↓ je 29
│ nop
│20: pause
│ mov %gs:(%rdi),%al
│ cmp %dl,%al
│ ↑ jne 20
│29: pop %rbp
0.84 │ ← retq

but the actual cost is pretty much the same:

6.81% lg_local_lock

so it doesn't seem to be some odd weakness of the microarchitecture.

Linus

2013-09-03 22:37:29

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 1, 2013 at 5:32 PM, Linus Torvalds
<[email protected]> wrote:
> On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek <[email protected]> wrote:
>>
>> Looks like this is now 10x faster: ~2.66Mloops (debug) VS.
>> ~26.60Mloops (no-debug).
>
> Ok, that's getting to be in the right ballpark.
>
> But your profile is still odd.
>
>> Samples: 159K of event 'cycles:pp', Event count (approx.): 76968896763
>> 12,79% t_lockref_from- [kernel.kallsyms] [k] irq_return
>> 4,36% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
>
> If you do the profile with "-g", what are the top callers of this? You
> shouldn't see any spinlock load from the path lookup, but you have all
> these other things going on..
>
>> 4,36% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
>> 4,07% t_lockref_from- [kernel.kallsyms] [k] user_exit
>> 3,12% t_lockref_from- [kernel.kallsyms] [k] local_clock
>> 2,83% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
>> 2,73% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_alloc
>> 2,62% t_lockref_from- [kernel.kallsyms] [k] __d_lookup_rcu
>
> You're spending more time on the task stats than on the actual lookup.
> Maybe you should turn off CONFIG_TASKSTATS..But why that whole
> irq_return thing? Odd.
>

[ init/Kconfig ]
...
config TASKSTATS
	bool "Export task/process statistics through netlink"
	depends on NET      <--- Difficult to disable it?!
	default n
	help
	  Export selected statistics for tasks/processes through the
	  generic netlink interface. Unlike BSD process accounting, the
	  statistics are available during the lifetime of tasks/processes as
	  responses to commands. Like BSD accounting, they are sent to user
	  space on task exit.

	  Say N if unsure.
...

- Sedat -

2013-09-03 22:41:35

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Tue, Sep 3, 2013 at 5:14 PM, Waiman Long <[email protected]> wrote:
> On 09/03/2013 02:01 AM, Ingo Molnar wrote:
>>
>> * Waiman Long<[email protected]> wrote:
>>
>>> Yes, that patch worked. It eliminated the lglock as a bottleneck in the
>>> AIM7 workload. The lg_global_lock did not show up in the perf profile,
>>> whereas the lg_local_lock was only 0.07%.
>>
>> Just curious: what's the worst bottleneck now in the optimized kernel? :-)
>>
>> Thanks,
>>
>> Ingo
>
> With the following patches on v3.11:
> 1. Linus's version of lockref patch
> 2. Al's lglock patch
> 3. My preliminary patch to convert prepend_path under RCU
>

With no reference to where to get those patches, it's a bit hard to follow.

I will try some perf benchmarking with the attached patch against
Linux "WfW" edition.

- Sedat -


Attachments:
3.11.0-1-lockref-small.patch (15.47 kB)

2013-09-03 22:56:36

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Wed, Sep 04, 2013 at 12:37:25AM +0200, Sedat Dilek wrote:

> > You're spending more time on the task stats than on the actual lookup.
> > Maybe you should turn off CONFIG_TASKSTATS..But why that whole
> > irq_return thing? Odd.
> >
>
> [ init/Kconfig ]
> ...
> config TASKSTATS
> bool "Export task/process statistics through netlink"
> depends on NET <--- Difficult to disable it?!

More likely you're getting bitten by the fact that CONFIG_KVM has
a 'select TASKSTATS'

Dave

2013-09-03 23:05:41

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Wed, Sep 4, 2013 at 12:55 AM, Dave Jones <[email protected]> wrote:
> On Wed, Sep 04, 2013 at 12:37:25AM +0200, Sedat Dilek wrote:
>
> > > You're spending more time on the task stats than on the actual lookup.
> > > Maybe you should turn off CONFIG_TASKSTATS..But why that whole
> > > irq_return thing? Odd.
> > >
> >
> > [ init/Kconfig ]
> > ...
> > config TASKSTATS
> > bool "Export task/process statistics through netlink"
> > depends on NET <--- Difficult to disable it?!
>
> More likely you're getting bitten by the fact that CONFIG_KVM has
> a 'select TASKSTATS'
>

With CONFIG_KVM=n ...

$ egrep 'CONFIG_NET=|CONFIG_KVM=|CONFIG_TASKSTATS' .config
CONFIG_TASKSTATS=y
CONFIG_NET=y

- Sedat -

2013-09-03 23:11:45

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Wed, Sep 4, 2013 at 12:41 AM, Sedat Dilek <[email protected]> wrote:
> On Tue, Sep 3, 2013 at 5:14 PM, Waiman Long <[email protected]> wrote:
>> On 09/03/2013 02:01 AM, Ingo Molnar wrote:
>>>
>>> * Waiman Long<[email protected]> wrote:
>>>
>>>> Yes, that patch worked. It eliminated the lglock as a bottleneck in the
>>>> AIM7 workload. The lg_global_lock did not show up in the perf profile,
>>>> whereas the lg_local_lock was only 0.07%.
>>>
>>> Just curious: what's the worst bottleneck now in the optimized kernel? :-)
>>>
>>> Thanks,
>>>
>>> Ingo
>>
>> With the following patches on v3.11:
>> 1. Linus's version of lockref patch
>> 2. Al's lglock patch
>> 3. My preliminary patch to convert prepend_path under RCU
>>
>
> With no reference where to get those patches, it's a bit hard to follow.
>
> I will try some perf benchmarking with the attached patch against
> Linux "WfW" edition.
>

Eat thiz...

$ cat /proc/version
Linux version 3.11.0-1-lockref-small ([email protected]@fambox)
(gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #1 SMP Wed Sep 4
00:53:25 CEST 2013

$ ~/src/linux-kernel/linux/tools/perf/perf stat --null --repeat 5
../scripts/t_lockref_from-linus
Total loops: 26786226
Total loops: 26970142
Total loops: 26593312
Total loops: 26885806
Total loops: 26944076

Performance counter stats for '../scripts/t_lockref_from-linus' (5 runs):

10,011755076 seconds time elapsed
( +- 0,10% )

$ sudo ~/src/linux-kernel/linux/tools/perf/perf record -e cycles:pp
../scripts/t_lockref_from-linus
Total loops: 26267751
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 6.112 MB perf.data (~267015 samples) ]

$ sudo ~/src/linux-kernel/linux/tools/perf/perf report -tui

Samples: 159K of event 'cycles:pp', Event count (approx.): 77088218721
 12,52%  t_lockref_from-  [kernel.kallsyms]  [k] irq_return
  4,37%  t_lockref_from-  [kernel.kallsyms]  [k] __ticket_spin_lock
  4,18%  t_lockref_from-  [kernel.kallsyms]  [k] __acct_update_integrals
  3,90%  t_lockref_from-  [kernel.kallsyms]  [k] user_exit
  3,17%  t_lockref_from-  [kernel.kallsyms]  [k] __d_lookup_rcu
  3,14%  t_lockref_from-  [kernel.kallsyms]  [k] lockref_get_or_lock
  3,01%  t_lockref_from-  [kernel.kallsyms]  [k] local_clock
  2,72%  t_lockref_from-  [kernel.kallsyms]  [k] kmem_cache_alloc
  2,54%  t_lockref_from-  libc-2.15.so       [.] __xstat64
  2,45%  t_lockref_from-  [kernel.kallsyms]  [k] link_path_walk
  2,23%  t_lockref_from-  [kernel.kallsyms]  [k] kmem_cache_free
  1,90%  t_lockref_from-  [kernel.kallsyms]  [k] rcu_eqs_exit_common.isra.43
  1,88%  t_lockref_from-  [kernel.kallsyms]  [k] tracesys
  1,82%  t_lockref_from-  [kernel.kallsyms]  [k] rcu_eqs_enter_common.isra.45
  1,77%  t_lockref_from-  [kernel.kallsyms]  [k] sched_clock_cpu
  1,76%  t_lockref_from-  [kernel.kallsyms]  [k] user_enter
  1,73%  t_lockref_from-  [kernel.kallsyms]  [k] lockref_put_or_lock
  1,70%  t_lockref_from-  [kernel.kallsyms]  [k] path_lookupat
  1,53%  t_lockref_from-  [kernel.kallsyms]  [k] native_read_tsc
  1,52%  t_lockref_from-  [kernel.kallsyms]  [k] native_sched_clock
  1,51%  t_lockref_from-  [kernel.kallsyms]  [k] cp_new_stat
  1,51%  t_lockref_from-  [kernel.kallsyms]  [k] syscall_trace_enter
  1,46%  t_lockref_from-  [kernel.kallsyms]  [k] account_system_time
  1,42%  t_lockref_from-  [kernel.kallsyms]  [k] path_init
  1,42%  t_lockref_from-  [kernel.kallsyms]  [k] copy_user_generic_unrolled
  1,39%  t_lockref_from-  [kernel.kallsyms]  [k] jiffies_to_timeval
  1,39%  t_lockref_from-  [kernel.kallsyms]  [k] getname_flags
  1,37%  t_lockref_from-  [kernel.kallsyms]  [k] vfs_getattr
  1,25%  t_lockref_from-  [kernel.kallsyms]  [k] common_perm
  1,14%  t_lockref_from-  [kernel.kallsyms]  [k] get_vtime_delta
  1,13%  t_lockref_from-  [kernel.kallsyms]  [k] lookup_fast
  1,12%  t_lockref_from-  [kernel.kallsyms]  [k] syscall_trace_leave
  1,05%  t_lockref_from-  [kernel.kallsyms]  [k] system_call
  0,99%  t_lockref_from-  [kernel.kallsyms]  [k] generic_fillattr
  0,94%  t_lockref_from-  [kernel.kallsyms]  [k] user_path_at_empty
  0,91%  t_lockref_from-  [kernel.kallsyms]  [k] account_user_time
  0,90%  t_lockref_from-  [kernel.kallsyms]  [k] __ticket_spin_unlock
  0,87%  t_lockref_from-  [kernel.kallsyms]  [k] strncpy_from_user
  0,83%  t_lockref_from-  [kernel.kallsyms]  [k] filename_lookup
  0,82%  t_lockref_from-  [kernel.kallsyms]  [k] generic_permission
  0,78%  t_lockref_from-  [kernel.kallsyms]  [k] complete_walk
  0,75%  t_lockref_from-  [kernel.kallsyms]  [k] vfs_fstatat
  0,74%  t_lockref_from-  [kernel.kallsyms]  [k] lg_local_lock
  0,72%  t_lockref_from-  [kernel.kallsyms]  [k] vtime_account_user
  0,67%  t_lockref_from-  [kernel.kallsyms]  [k] dput
  0,66%  t_lockref_from-  [kernel.kallsyms]  [k] __inode_permission
  0,62%  t_lockref_from-  [kernel.kallsyms]  [k] rcu_eqs_enter
  0,58%  t_lockref_from-  [kernel.kallsyms]  [k] lg_local_unlock
  0,56%  t_lockref_from-  [kernel.kallsyms]  [k] vtime_user_enter
  0,50%  t_lockref_from-  [kernel.kallsyms]  [k] cpuacct_account_field
  0,48%  t_lockref_from-  [kernel.kallsyms]  [k] security_inode_permission
  0,48%  t_lockref_from-  t_lockref_from-linus  [.] start_routine
  0,47%  t_lockref_from-  [kernel.kallsyms]  [k] security_inode_getattr
  0,47%  t_lockref_from-  [kernel.kallsyms]  [k] acct_account_cputime
Press '?' for help on key bindings

Here the annotated entries for the first two entries:

irq_return



│ Disassembly of section .text:

│ ffffffff816d4f2c <irq_return>:
100,00 │ ↓ jmpq 120
│ data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)


__ticket_spin_lock



│ Disassembly of section .text:

│ ffffffff8104ff10 <__ticket_spin_lock>:
2,55 │ push %rbp
1,19 │ mov $0x10000,%eax
2,16 │ mov %rsp,%rbp
84,70 │ lock xadd %eax,(%rdi)
0,14 │ mov %eax,%edx
│ shr $0x10,%edx
4,33 │ cmp %ax,%dx
0,03 │ ↓ je 2a
│ nop
│20: pause
0,03 │ movzwl (%rdi),%eax
│ cmp %dx,%ax
│ ↑ jne 20
0,03 │2a: pop %rbp
4,84 │ ← retq

- Sedat -

2013-09-03 23:17:03

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Wed, Sep 04, 2013 at 01:05:38AM +0200, Sedat Dilek wrote:
> On Wed, Sep 4, 2013 at 12:55 AM, Dave Jones <[email protected]> wrote:
> > On Wed, Sep 04, 2013 at 12:37:25AM +0200, Sedat Dilek wrote:
> >
> > > > You're spending more time on the task stats than on the actual lookup.
> > > > Maybe you should turn off CONFIG_TASKSTATS..But why that whole
> > > > irq_return thing? Odd.
> > > >
> > >
> > > [ init/Kconfig ]
> > > ...
> > > config TASKSTATS
> > > bool "Export task/process statistics through netlink"
> > > depends on NET <--- Difficult to disable it?!
> >
> > More likely you're getting bitten by the fact that CONFIG_KVM has
> > a 'select TASKSTATS'
> >
>
> With CONFIG_KVM=n ...
>
> $ egrep 'CONFIG_NET=|CONFIG_KVM=|CONFIG_TASKSTATS' .config
> CONFIG_TASKSTATS=y
> CONFIG_NET=y

Weird.

sed -i '/TASKSTATS/d' .config
sed -i '/KVM/d' .config
make oldconfig
...
egrep 'CONFIG_NET=|CONFIG_KVM=|CONFIG_TASKSTATS' .config
# CONFIG_TASKSTATS is not set
CONFIG_NET=y


The NET dependency shouldn't matter at all.
I don't see any other 'select TASKSTATS' in the tree.

Dave

2013-09-03 23:20:18

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Wed, Sep 4, 2013 at 1:15 AM, Dave Jones <[email protected]> wrote:
> On Wed, Sep 04, 2013 at 01:05:38AM +0200, Sedat Dilek wrote:
> > On Wed, Sep 4, 2013 at 12:55 AM, Dave Jones <[email protected]> wrote:
> > > On Wed, Sep 04, 2013 at 12:37:25AM +0200, Sedat Dilek wrote:
> > >
> > > > > You're spending more time on the task stats than on the actual lookup.
> > > > > Maybe you should turn off CONFIG_TASKSTATS..But why that whole
> > > > > irq_return thing? Odd.
> > > > >
> > > >
> > > > [ init/Kconfig ]
> > > > ...
> > > > config TASKSTATS
> > > > bool "Export task/process statistics through netlink"
> > > > depends on NET <--- Difficult to disable it?!
> > >
> > > More likely you're getting bitten by the fact that CONFIG_KVM has
> > > a 'select TASKSTATS'
> > >
> >
> > With CONFIG_KVM=n ...
> >
> > $ egrep 'CONFIG_NET=|CONFIG_KVM=|CONFIG_TASKSTATS' .config
> > CONFIG_TASKSTATS=y
> > CONFIG_NET=y
>
> Weird.
>
> sed -i '/TASKSTATS/d' .config
> sed -i '/KVM/d' .config
> make oldconfig
> ...
> egrep 'CONFIG_NET=|CONFIG_KVM=|CONFIG_TASKSTATS' .config
> # CONFIG_TASKSTATS is not set
> CONFIG_NET=y
>
>
> The NET dependancy shouldn't matter at all.
> I don't see any other 'select TASKSTATS' in the tree.
>

Ah, cool.

With CONFIG_KVM=n and CONFIG_TASKSTATS=n plus...

$ yes "" | make oldconfig && make silentoldconfig </dev/null

...I get now:

$ egrep 'CONFIG_NET=|CONFIG_KVM=|CONFIG_TASKSTATS' .config
# CONFIG_TASKSTATS is not set
CONFIG_NET=y

Thanks, Dave!

- Sedat -

2013-09-03 23:45:10

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Wed, Sep 4, 2013 at 12:37 AM, Sedat Dilek <[email protected]> wrote:
> On Sun, Sep 1, 2013 at 5:32 PM, Linus Torvalds
> <[email protected]> wrote:
>> On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek <[email protected]> wrote:
>>>
>>> Looks like this is now 10x faster: ~2.66Mloops (debug) VS.
>>> ~26.60Mloops (no-debug).
>>
>> Ok, that's getting to be in the right ballpark.
>>
>> But your profile is still odd.
>>
>>> Samples: 159K of event 'cycles:pp', Event count (approx.): 76968896763
>>> 12,79% t_lockref_from- [kernel.kallsyms] [k] irq_return
>>> 4,36% t_lockref_from- [kernel.kallsyms] [k] __ticket_spin_lock
>>
>> If you do the profile with "-g", what are the top callers of this? You
>> shouldn't see any spinlock load from the path lookup, but you have all
>> these other things going on..
>>
>>> 4,36% t_lockref_from- [kernel.kallsyms] [k] __acct_update_integrals
>>> 4,07% t_lockref_from- [kernel.kallsyms] [k] user_exit
>>> 3,12% t_lockref_from- [kernel.kallsyms] [k] local_clock
>>> 2,83% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock
>>> 2,73% t_lockref_from- [kernel.kallsyms] [k] kmem_cache_alloc
>>> 2,62% t_lockref_from- [kernel.kallsyms] [k] __d_lookup_rcu
>>
>> You're spending more time on the task stats than on the actual lookup.
>> Maybe you should turn off CONFIG_TASKSTATS..But why that whole
>> irq_return thing? Odd.
>>
>
> [ init/Kconfig ]
> ...
> config TASKSTATS
> bool "Export task/process statistics through netlink"
> depends on NET <--- Difficult to disable it?!
> default n
> help
> Export selected statistics for tasks/processes through the
> generic netlink interface. Unlike BSD process accounting, the
> statistics are available during the lifetime of tasks/processes as
> responses to commands. Like BSD accounting, they are sent to user
> space on task exit.
>
> Say N if unsure.
> ...
>

So with Dave J.'s help I disabled CONFIG_TASKSTATS.

But I still see that odd irq_ret* thing.

My kernel-config and patch (on top of Linux v3.11) are attached.

- Sedat -

$ sudo ~/src/linux-kernel/linux/tools/perf/perf report -tui

Samples: 161K of event 'cycles:pp', Event count (approx.): 76595555357
13,30%  t_lockref_from-  [kernel.kallsyms]  [k] irq_return
 5,25%  t_lockref_from-  [kernel.kallsyms]  [k] lockref_get_or_lock
 4,82%  t_lockref_from-  [kernel.kallsyms]  [k] __ticket_spin_lock
 4,23%  t_lockref_from-  [kernel.kallsyms]  [k] user_exit
 3,17%  t_lockref_from-  [kernel.kallsyms]  [k] local_clock
 2,98%  t_lockref_from-  [kernel.kallsyms]  [k] kmem_cache_alloc
 2,61%  t_lockref_from-  libc-2.15.so       [.] __xstat64
 2,55%  t_lockref_from-  [kernel.kallsyms]  [k] link_path_walk
 2,49%  t_lockref_from-  [kernel.kallsyms]  [k] kmem_cache_free
 2,03%  t_lockref_from-  [kernel.kallsyms]  [k] tracesys
 2,01%  t_lockref_from-  [kernel.kallsyms]  [k] path_lookupat
 1,99%  t_lockref_from-  [kernel.kallsyms]  [k] rcu_eqs_exit_common.isra.43
 1,94%  t_lockref_from-  [kernel.kallsyms]  [k] user_enter
 1,92%  t_lockref_from-  [kernel.kallsyms]  [k] rcu_eqs_enter_common.isra.45
 1,86%  t_lockref_from-  [kernel.kallsyms]  [k] sched_clock_cpu
 1,72%  t_lockref_from-  [kernel.kallsyms]  [k] __d_lookup_rcu
 1,71%  t_lockref_from-  [kernel.kallsyms]  [k] native_read_tsc
 1,68%  t_lockref_from-  [kernel.kallsyms]  [k] cp_new_stat
 1,65%  t_lockref_from-  [kernel.kallsyms]  [k] lockref_put_or_lock
 1,64%  t_lockref_from-  [kernel.kallsyms]  [k] lookup_fast
 1,59%  t_lockref_from-  [kernel.kallsyms]  [k] path_init
 1,58%  t_lockref_from-  [kernel.kallsyms]  [k] native_sched_clock
 1,57%  t_lockref_from-  [kernel.kallsyms]  [k] copy_user_generic_unrolled
 1,56%  t_lockref_from-  [kernel.kallsyms]  [k] syscall_trace_enter
 1,53%  t_lockref_from-  [kernel.kallsyms]  [k] account_system_time
 1,34%  t_lockref_from-  [kernel.kallsyms]  [k] getname_flags
 1,28%  t_lockref_from-  [kernel.kallsyms]  [k] get_vtime_delta
 1,26%  t_lockref_from-  [kernel.kallsyms]  [k] vfs_getattr
 1,14%  t_lockref_from-  [kernel.kallsyms]  [k] syscall_trace_leave
 1,11%  t_lockref_from-  [kernel.kallsyms]  [k] system_call
 1,06%  t_lockref_from-  [kernel.kallsyms]  [k] strncpy_from_user
 1,01%  t_lockref_from-  [kernel.kallsyms]  [k] generic_fillattr
 0,97%  t_lockref_from-  [kernel.kallsyms]  [k] account_user_time
 0,96%  t_lockref_from-  [kernel.kallsyms]  [k] user_path_at_empty
 0,92%  t_lockref_from-  [kernel.kallsyms]  [k] filename_lookup
 0,88%  t_lockref_from-  [kernel.kallsyms]  [k] __ticket_spin_unlock
 0,87%  t_lockref_from-  [kernel.kallsyms]  [k] complete_walk
 0,86%  t_lockref_from-  [kernel.kallsyms]  [k] generic_permission
 0,86%  t_lockref_from-  [kernel.kallsyms]  [k] common_perm
 0,82%  t_lockref_from-  [kernel.kallsyms]  [k] vfs_fstatat
 0,77%  t_lockref_from-  [kernel.kallsyms]  [k] rcu_eqs_enter
 0,75%  t_lockref_from-  [kernel.kallsyms]  [k] vtime_account_user
 0,72%  t_lockref_from-  [kernel.kallsyms]  [k] __inode_permission
 0,65%  t_lockref_from-  [kernel.kallsyms]  [k] dput
 0,62%  t_lockref_from-  [kernel.kallsyms]  [k] vtime_user_enter
 0,59%  t_lockref_from-  [kernel.kallsyms]  [k] apparmor_inode_getattr
 0,55%  t_lockref_from-  [kernel.kallsyms]  [k] lg_local_lock
 0,53%  t_lockref_from-  [kernel.kallsyms]  [k] __vtime_account_system
 0,51%  t_lockref_from-  [kernel.kallsyms]  [k] security_inode_permission
 0,51%  t_lockref_from-  [kernel.kallsyms]  [k] mntput
 0,47%  t_lockref_from-  t_lockref_from-linus  [.] start_routine
 0,45%  t_lockref_from-  [kernel.kallsyms]  [k] cpuacct_account_field
 0,45%  t_lockref_from-  [kernel.kallsyms]  [k] int_with_check
 0,44%  t_lockref_from-  [kernel.kallsyms]  [k] rcu_eqs_exit
Press '?' for help on key bindings


Attachments:
config-3.11.0-2-lockref-small (111.48 kB)
3.11.0-2-lockref-small.patch (16.76 kB)
Download all attachments

2013-09-04 14:53:22

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 09/03/2013 03:09 PM, Linus Torvalds wrote:
> On Tue, Sep 3, 2013 at 8:34 AM, Linus Torvalds
> <[email protected]> wrote:
>> I suspect the tty_ldisc_lock() could be made to go away if we care.
> Heh. I just pulled the tty patches from Greg, and the locking has
> changed completely.
>
> It may actually fix your AIM7 test-case, because while the global
> spinlock remains (it got renamed to "tty_ldiscs_lock" - there's an
> added "s"), the common operations now take the per-tty lock to get the
> ldisc for that tty, rather than that global spinlock (which just
> protects the actual ldisk array now).
>
> That said, I don't know what AIM7 really ends up doing, but your
> profile seems to have every access through tty_ldisc_[de]ref() that
> now uses only the per-tty lock. Of course, how much that helps ends up
> depending on whether AIM7 uses lots of tty's or just one shared one.
>
> Anyway, it might be worth testing my current -git tree.
>
> Linus

The latest tty patches did work. The tty related spinlock contention is
now completely gone. The short workload can now reach over 8M JPM which
is the highest I have ever seen.

The perf profile was:

5.85% reaim reaim [.] mul_short
4.87% reaim [kernel.kallsyms] [k] ebitmap_get_bit
4.72% reaim reaim [.] mul_int
4.71% reaim reaim [.] mul_long
2.67% reaim libc-2.12.so [.] __random_r
2.64% reaim [kernel.kallsyms] [k] lockref_get_not_zero
1.58% reaim [kernel.kallsyms] [k] copy_user_generic_string
1.48% reaim [kernel.kallsyms] [k] mls_level_isvalid
1.35% reaim [kernel.kallsyms] [k] find_next_bit
1.23% reaim [kernel.kallsyms] [k] system_call
1.21% reaim libc-2.12.so [.] memcpy
1.19% reaim [kernel.kallsyms] [k] _raw_spin_lock
1.06% reaim [kernel.kallsyms] [k] avc_has_perm_flags
1.04% reaim libc-2.12.so [.] __srandom_r
1.02% reaim reaim [.] newton_raphson
1.01% reaim [kernel.kallsyms] [k] update_cfs_rq_blocked_load
0.98% reaim [kernel.kallsyms] [k] fsnotify
0.94% reaim [kernel.kallsyms] [k] avtab_search_node
0.91% reaim libm-2.12.so [.] __sincos

I have a patch in linux-next that should eliminate ebitmap_get_bit,
mls_level_isvalid and find_next_bit from the top of the list once it is merged.

Regards,
Longman

2013-09-04 15:14:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Wed, Sep 4, 2013 at 7:52 AM, Waiman Long <[email protected]> wrote:
>
> The latest tty patches did work. The tty related spinlock contention is now
> completely gone. The short workload can now reach over 8M JPM which is the
> highest I have ever seen.

Good. And this was with the 80-core machine, so there aren't any
scalability issues hiding?

Linus

2013-09-04 19:25:55

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 09/04/2013 11:14 AM, Linus Torvalds wrote:
> On Wed, Sep 4, 2013 at 7:52 AM, Waiman Long<[email protected]> wrote:
>> The latest tty patches did work. The tty related spinlock contention is now
>> completely gone. The short workload can now reach over 8M JPM which is the
>> highest I have ever seen.
> Good. And this was with the 80-core machine, so there aren't any
> scalability issues hiding?
>
> Linus

Yes, the perf profile was taken from an 80-core machine. There isn't
any scalability issue hiding for the short workload on an 80-core machine.

However, I am certain that more will pop up when running on an even
larger machine, like the prototype 240-core machine that our team has
been testing on.

-Longman

2013-09-04 21:34:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Wed, Sep 4, 2013 at 12:25 PM, Waiman Long <[email protected]> wrote:
>
> Yes, the perf profile was taking from an 80-core machine. There isn't any
> scalability issue hiding for the short workload on an 80-core machine.
>
> However, I am certain that more may pop up when running in an even larger
> machine like the prototype 240-core machine that our team has been testing
> on.

Sure. Please let us know, I think it's going to be interesting to see
what that shows.

SGI certainly did much larger machines, but their primary target
tended to be all user space, so they had things like "tons of
concurrent page faults in the same process" rather than filename
lookup or the tty layer.

Linus

2013-09-05 02:35:59

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 09/04/2013 05:34 PM, Linus Torvalds wrote:
> On Wed, Sep 4, 2013 at 12:25 PM, Waiman Long<[email protected]> wrote:
>> Yes, the perf profile was taking from an 80-core machine. There isn't any
>> scalability issue hiding for the short workload on an 80-core machine.
>>
>> However, I am certain that more may pop up when running in an even larger
>> machine like the prototype 240-core machine that our team has been testing
>> on.
> Sure. Please let us know, I think it's going to be interesting to see
> what that shows.
>
> SGI certainly did much larger machines, but their primary target
> tended to be all user space, so they had things like "tons of
> concurrent page faults in the same process" rather than filename
> lookup or the tty layer.
>
> Linus

I think SGI is more focused on compute-intensive workloads. HP is more
focused on high-end commercial workloads like SAP HANA. Below is a sample
perf profile of the high-systime workload on a 240-core prototype
machine (HT off) running a 3.10-rc1 kernel with my lockref and seqlock patches:

   9.61%  3382925  swapper  [kernel.kallsyms]  [k] _raw_spin_lock
              |--59.90%-- rcu_process_callbacks
              |--19.41%-- load_balance
              |--9.58%-- rcu_accelerate_cbs
              |--6.70%-- tick_do_update_jiffies64
              |--1.46%-- scheduler_tick
              |--1.17%-- sched_rt_period_timer
              |--0.56%-- perf_adjust_freq_unthr_context
               --1.21%-- [...]

   6.34%       99  reaim    [kernel.kallsyms]  [k] _raw_spin_lock
              |--73.96%-- load_balance
              |--11.98%-- rcu_process_callbacks
              |--2.21%-- __mutex_lock_slowpath
              |--2.02%-- rcu_accelerate_cbs
              |--1.95%-- wake_up_new_task
              |--1.70%-- scheduler_tick
              |--1.67%-- xfs_alloc_log_agf
              |--1.24%-- task_rq_lock
              |--1.15%-- try_to_wake_up
               --2.12%-- [...]

   5.39%        2  reaim    [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
              |--95.08%-- rwsem_wake
              |--1.80%-- rcu_process_callbacks
              |--1.03%-- prepare_to_wait
              |--0.59%-- __wake_up
               --1.50%-- [...]

   2.28%        1  reaim    [kernel.kallsyms]  [k] _raw_spin_lock_irq
              |--90.56%-- rwsem_down_write_failed
              |--9.25%-- __schedule
               --0.19%-- [...]

Longman

2013-09-05 13:31:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount


* Waiman Long <[email protected]> wrote:

> On 09/03/2013 03:09 PM, Linus Torvalds wrote:
> >On Tue, Sep 3, 2013 at 8:34 AM, Linus Torvalds
> ><[email protected]> wrote:
> >>I suspect the tty_ldisc_lock() could be made to go away if we care.
> >Heh. I just pulled the tty patches from Greg, and the locking has
> >changed completely.
> >
> >It may actually fix your AIM7 test-case, because while the global
> >spinlock remains (it got renamed to "tty_ldiscs_lock" - there's an
> >added "s"), the common operations now take the per-tty lock to get the
> >ldisc for that tty, rather than that global spinlock (which just
> >protects the actual ldisk array now).
> >
> >That said, I don't know what AIM7 really ends up doing, but your
> >profile seems to have every access through tty_ldisc_[de]ref() that
> >now uses only the per-tty lock. Of course, how much that helps ends up
> >depending on whether AIM7 uses lots of tty's or just one shared one.
> >
> >Anyway, it might be worth testing my current -git tree.
> >
> > Linus
>
> The latest tty patches did work. The tty related spinlock contention
> is now completely gone. The short workload can now reach over 8M JPM
> which is the highest I have ever seen.
>
> The perf profile was:
>
> 5.85% reaim reaim [.] mul_short
> 4.87% reaim [kernel.kallsyms] [k] ebitmap_get_bit
> 4.72% reaim reaim [.] mul_int
> 4.71% reaim reaim [.] mul_long
> 2.67% reaim libc-2.12.so [.] __random_r
> 2.64% reaim [kernel.kallsyms] [k] lockref_get_not_zero
> 1.58% reaim [kernel.kallsyms] [k] copy_user_generic_string
> 1.48% reaim [kernel.kallsyms] [k] mls_level_isvalid
> 1.35% reaim [kernel.kallsyms] [k] find_next_bit

6%+ spent in ebitmap_get_bit() and mls_level_isvalid() looks like
something worth optimizing.

Is that called very often, or is it perhaps cache-bouncing for some
reason?

Btw., you ought to be able to see instructions where the CPU is in some
sort of stall (either it ran out of work, or it is cache-missing, or it is
executing something complex), via:

perf top -e stalled-cycles-frontend -e stalled-cycles-backend

run it for a while and pick the one which has more entries and have a
look. Both profiles will keep updating in the background.

(Note: on Haswell CPUs stalled-cycles events are not yet available.)

Another performance analysis trick is to run this while your workload is
executing:

perf stat -a -ddd sleep 60

and have a look at the output - it will color-code suspicious looking
counts.

For example, this is on a 32-way box running a kernel build:

vega:~> perf stat -addd sleep 10

Performance counter stats for 'sleep 10':

320753.639566 task-clock # 32.068 CPUs utilized [100.00%]
187,962 context-switches # 0.586 K/sec [100.00%]
22,989 cpu-migrations # 0.072 K/sec [100.00%]
6,622,424 page-faults # 0.021 M/sec
817,576,186,285 cycles # 2.549 GHz [27.82%]
214,366,744,930 stalled-cycles-frontend # 26.22% frontend cycles idle [16.75%]
45,454,323,703 stalled-cycles-backend # 5.56% backend cycles idle [16.72%]
474,770,833,376 instructions # 0.58 insns per cycle
# 0.45 stalled cycles per insn [22.27%]
105,860,676,229 branches # 330.037 M/sec [33.37%]
5,964,088,457 branch-misses # 5.63% of all branches [33.36%]
244,982,563,232 L1-dcache-loads # 763.772 M/sec [33.35%]
7,503,377,286 L1-dcache-load-misses # 3.06% of all L1-dcache hits [33.36%]
19,606,194,180 LLC-loads # 61.125 M/sec [22.26%]
1,232,340,603 LLC-load-misses # 6.29% of all LL-cache hits [16.69%]
261,749,251,526 L1-icache-loads # 816.045 M/sec [16.69%]
11,821,747,974 L1-icache-load-misses # 4.52% of all L1-icache hits [16.67%]
244,253,746,431 dTLB-loads # 761.500 M/sec [27.78%]
126,546,407 dTLB-load-misses # 0.05% of all dTLB cache hits [33.30%]
260,909,042,891 iTLB-loads # 813.425 M/sec [33.31%]
73,911,000 iTLB-load-misses # 0.03% of all iTLB cache hits [33.31%]
7,989,072,388 L1-dcache-prefetches # 24.907 M/sec [27.76%]
0 L1-dcache-prefetch-misses # 0.000 K/sec [33.32%]

10.002245831 seconds time elapsed

the system is nicely saturated, caches are more or less normally utilized,
but about a quarter of all frontend cycles are idle.

So then I ran:

perf top -e stalled-cycles-frontend

which gave me this profile:

2.21% cc1 cc1 [.] ht_lookup_with_hash
1.86% cc1 libc-2.15.so [.] _int_malloc
1.66% cc1 [kernel.kallsyms] [k] page_fault
1.48% cc1 cc1 [.] _cpp_lex_direct
1.33% cc1 cc1 [.] grokdeclarator
1.26% cc1 cc1 [.] ggc_internal_alloc_stat
1.19% cc1 cc1 [.] ggc_internal_cleared_alloc_stat
1.12% cc1 libc-2.15.so [.] _int_free
1.10% cc1 libc-2.15.so [.] malloc
0.95% cc1 cc1 [.] c_lex_with_flags
0.95% cc1 cc1 [.] cpp_get_token_1
0.92% cc1 cc1 [.] c_parser_declspecs

where gcc's ht_lookup_with_hash() function is visibly getting stalls from a hash walk:

0.79 : a0f303: addl $0x1,0x80(%rdi)
0.01 : a0f30a: mov %rdi,%rbx
0.18 : a0f30d: mov %rsi,%r15
0.00 : a0f310: lea -0x1(%r14),%r13d
0.13 : a0f314: and %r13d,%r10d
0.05 : a0f317: mov %r10d,%eax
0.34 : a0f31a: mov (%rcx,%rax,8),%r9
24.87 : a0f31e: lea 0x0(,%rax,8),%rdx
0.02 : a0f326: test %r9,%r9
0.00 : a0f329: je a0f5bd <ht_lookup_with_hash+0x2ed>
0.31 : a0f32f: cmp $0xffffffffffffffff,%r9
0.18 : a0f333: je a0f6e1 <ht_lookup_with_hash+0x411>
0.37 : a0f339: cmp %r12d,0xc(%r9)
24.41 : a0f33d: jne a0f3a0 <ht_lookup_with_hash+0xd0>

So giving that function some attention would probably give the most bang
for the buck on this particular workload.

Thanks,

Ingo

2013-09-05 17:33:40

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On 09/05/2013 09:31 AM, Ingo Molnar wrote:
> * Waiman Long<[email protected]> wrote:
>
>
>> The latest tty patches did work. The tty related spinlock contention
>> is now completely gone. The short workload can now reach over 8M JPM
>> which is the highest I have ever seen.
>>
>> The perf profile was:
>>
>> 5.85% reaim reaim [.] mul_short
>> 4.87% reaim [kernel.kallsyms] [k] ebitmap_get_bit
>> 4.72% reaim reaim [.] mul_int
>> 4.71% reaim reaim [.] mul_long
>> 2.67% reaim libc-2.12.so [.] __random_r
>> 2.64% reaim [kernel.kallsyms] [k] lockref_get_not_zero
>> 1.58% reaim [kernel.kallsyms] [k] copy_user_generic_string
>> 1.48% reaim [kernel.kallsyms] [k] mls_level_isvalid
>> 1.35% reaim [kernel.kallsyms] [k] find_next_bit
> 6%+ spent in ebitmap_get_bit() and mls_level_isvalid() looks like
> something worth optimizing.
>
> Is that called very often, or is it perhaps cache-bouncing for some
> reason?

The high cycle count is due more to an inefficient algorithm in the
mls_level_isvalid() function than to cacheline contention in that code. The
attached patch should address this problem. It is in linux-next and
will hopefully be merged in 3.12.
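
For readers without the attachment handy, the general idea is to test
containment a word at a time instead of calling ebitmap_get_bit() once per
category bit. The sketch below is illustrative only - the names are
hypothetical and this is not the attached patch:

static bool bitmap_word_contains(const unsigned long *superset,
                                 const unsigned long *subset,
                                 unsigned int nbits)
{
        unsigned int i, words = BITS_TO_LONGS(nbits);

        for (i = 0; i < words; i++) {
                /* every bit set in subset must also be set in superset */
                if (subset[i] & ~superset[i])
                        return false;
        }
        return true;
}

One pass over the bitmap words replaces thousands of per-bit lookups, which
is where the cycles in the profile above were going.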

-Longman


Attachments:
0001-SELinux-Reduce-overhead-of-mls_level_isvalid-functio.patch (7.67 kB)

2013-09-05 17:40:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount


* Waiman Long <[email protected]> wrote:

> On 09/05/2013 09:31 AM, Ingo Molnar wrote:
> >* Waiman Long<[email protected]> wrote:
> >
> >
> >>The latest tty patches did work. The tty related spinlock contention
> >>is now completely gone. The short workload can now reach over 8M JPM
> >>which is the highest I have ever seen.
> >>
> >>The perf profile was:
> >>
> >>5.85% reaim reaim [.] mul_short
> >>4.87% reaim [kernel.kallsyms] [k] ebitmap_get_bit
> >>4.72% reaim reaim [.] mul_int
> >>4.71% reaim reaim [.] mul_long
> >>2.67% reaim libc-2.12.so [.] __random_r
> >>2.64% reaim [kernel.kallsyms] [k] lockref_get_not_zero
> >>1.58% reaim [kernel.kallsyms] [k] copy_user_generic_string
> >>1.48% reaim [kernel.kallsyms] [k] mls_level_isvalid
> >>1.35% reaim [kernel.kallsyms] [k] find_next_bit
> >6%+ spent in ebitmap_get_bit() and mls_level_isvalid() looks like
> >something worth optimizing.
> >
> >Is that called very often, or is it perhaps cache-bouncing for some
> >reason?
>
> The high cycle count is due more to inefficient algorithm in the
> mls_level_isvalid() function than cacheline contention in the code. The
> attached patch should address this problem. It is in linux-next and
> hopefully will be merged in 3.12.

Great!

If/when you happen to boot the latest & greatest kernel that has all these
scalability patches applied it would be nice if you could send an updated
profile into this thread.

Thanks,

Ingo

2013-09-08 21:45:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

Al, this is mainly a rambling question for you, but others left on the
Cc just FYI..

On Thu, Aug 29, 2013 at 4:42 PM, Linus Torvalds
<[email protected]> wrote:
>
> So I fixed the VFS layer instead. With dput() and friends using
> lockrefs, the only thing remaining in the hot RCU dentry lookup path
> was the nasty __d_rcu_to_refcount() thing in complete_walk(). I
> rewrote that to locklessly increment the refcount when it was nonzero,
> and get the lock if it was zero, and that all seems to work fine.
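
For context, the get-or-lock operation described in that quote works roughly
as follows. This is an illustrative sketch, not the kernel's lockref code: it
assumes the spinlock and the count share a single 64-bit word, and the real
implementation uses arch_spin_value_unlocked() on the snapshot rather than
spin_is_locked():

struct lockref_sketch {
        union {
                u64 lock_count;
                struct {
                        spinlock_t lock;
                        unsigned int count;
                };
        };
};

/* Returns true with a reference taken, or false with ->lock held. */
static bool lockref_sketch_get_or_lock(struct lockref_sketch *lr)
{
        struct lockref_sketch old, new;

        old.lock_count = ACCESS_ONCE(lr->lock_count);
        while (!spin_is_locked(&old.lock) && old.count > 0) {
                new.lock_count = old.lock_count;
                new.count++;
                /* one cmpxchg bumps the count without touching the lock */
                if (cmpxchg64(&lr->lock_count, old.lock_count,
                              new.lock_count) == old.lock_count)
                        return true;
                old.lock_count = ACCESS_ONCE(lr->lock_count);
        }
        /* count was zero or someone holds the lock: fall back to locking */
        spin_lock(&lr->lock);
        return false;
}

The slow path hands the caller the spinlock, which is exactly the "get the
lock if it was zero" fallback described above.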

Today I realized that my new simplified d_rcu_to_refcount() was
fundamentally buggy in a really annoying way.

I had worried mainly about the races with the dentry being
invalidated, and how we must not touch the dentry reference count if
it might be dead, because once the dentry has been killed, it cannot
be killed again.

My lockref_get_or_lock() avoided all that by making sure that it
always got the reference on a dentry that was still live, and this
everything was fine. If it turns out that the sequence count says that
the lookup wasn't valid, we could just dput() that dentry, and
everything was fine.

And you know what? That's even true. It's true from a dentry standpoint.

But today I was - for totally unrelated reasons, just as a result of
looking at performance profiles - looking at dput(), and realized that
dput() really *can* sleep. Not for any actual dentry reasons, but
because the inode associated with the dentry is being released. And
the reason I noticed this from the performance profile was that the
dput code did

        if (dentry->d_lockref.count == 1)
                might_sleep();

where reading that lockref count showed up in the profile.

But that test never triggered any of my thinking, and it's racy
anyway, and the condition we care about is another race, which means
that it not only didn't trigger my thinking, it doesn't trigger in any
normal testing either. The fact is, dput() can always sleep,
regardless of count, because of the race (ok, the exception being that
you actually have multiple references to the same dentry _yourself_,
but who does that? Nobody).

Anyway, the reason the new d_rcu_to_refcount() is buggy isn't because
it does bad things to dentries - the dentry reference counts and
lifetimes are perfectly fine. No, the reason it does bad things is
that it does an otherwise perfectly fine dput() in a context where it
definitely is _not_ fine to sleep. We are in RCU lookup, so we are in
a RCU-locked region, we hold the percpu vfsmount_lock spinlock, and in
fact in unlazy_walk() we also potentially hold the "fs->lock"
spinlock.

Ugh, ugh, ugh.

I really like the much simplified d_rcu_to_refcount() conceptually,
though. So my current plan is to keep it, but to just delay the
"dput()" it does until after we've done the "unlock_rcu_walk()".

"complete_walk()" is actually quite simple to fix, because it does the
unlock_rcu_walk() itself - so it's fairly trivial to just move the
dput() to be after it. In preparation for this, I already ended up
committing my "dead lockref" dentry code, because that simplifies
d_rcu_to_refcount() to the degree that splitting the code up into the
callers and then moving the dput() looks like the right thing to do.
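
Concretely, the RCU leg of complete_walk() ends up shaped roughly like the
sketch below. This is a paraphrase of the approach described here, not the
committed patch, and it leaves out the root/vfsmount bookkeeping:

static int complete_walk_rcu_sketch(struct nameidata *nd)
{
        struct dentry *dentry = nd->path.dentry;

        if (!(nd->flags & LOOKUP_RCU))
                return 0;

        nd->flags &= ~LOOKUP_RCU;
        if (unlikely(!lockref_get_not_dead(&dentry->d_lockref))) {
                unlock_rcu_walk();
                return -ECHILD;
        }
        if (read_seqcount_retry(&dentry->d_seq, nd->seq)) {
                unlock_rcu_walk();
                dput(dentry);   /* safe: no longer under rcu_read_lock() */
                return -ECHILD;
        }
        unlock_rcu_walk();
        return 0;
}

The point is the ordering: the reference is taken while the dentry is known
live, but the dput() on the sequence-count failure happens only after we have
left the RCU-locked region.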

The problem I see right now is unlazy_walk(). It does *not* do the
unlock_rcu_walk() itself for the error case (and it's the error case
that needs this), but instead just returns -ECHILD and expects the
caller to do it. Even then it happens fairly obscurely in
"terminate_walk()".

So Al, any ideas? I currently have two:

- always do that unlock_rcu_walk() in unlazy_walk(), and exit RCU
mode even for the error case (similar to how complete_walk() does it).

That makes solving this for unlazy_walk() as straightforward as it
is for complete_walk(), and this is, I think, the right thing to do.

- add a couple of "to be freed" dentry pointers to the 'nd' and let
terminate_walk() do the dput(). This leaves the unlazy_walk()
semantics alone.

The second one is hacky, but it doesn't change the semantics of
unlazy_walk(). I think the current semantics (drop out of RCU mode on
success, but not on failure) are nasty, but I wonder if there is some
really subtle reason for it. So the hacky second solution has got that
whole "no change to odd/subtle semantics" thing going for it..

Comments?

Linus

2013-09-09 00:03:21

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 08, 2013 at 02:45:32PM -0700, Linus Torvalds wrote:

> But that test never triggered any of my thinking, and it's racy
> anyway, and the condition we care about is another race, which means
> that it not only didn't trigger my thinking, it doesn't trigger in any
> normal testing either. The fact is, dput() can always sleep,
> regardless of count, because of the race (ok, the exception being that
> you actually have multiple references to the same dentry _yourself_,
> but who does that? Nobody).

There's one exception - basically, we decide to put duplicates of
reference(s) we hold into (a bunch of) structures being created. If
we decide that we'd failed and need to roll back, the structures
need to be taken out of whatever lists, etc. they'd been already
put on and references held in them - dropped. That removal gets done
under a spinlock. Sure, we can string those structures on some kind
of temp list, drop the spinlock and do dput() on everything in there,
but it's much more convenient to just free them as we are evicting
them, doing dput() as we go. Which is safe, since we still have
the references used to create these buggers pinned down.

> Anyway, the reason the new d_rcu_to_refcount() is buggy isn't because
> it does bad things to dentries - the dentry reference counts and
> lifetimes are perfectly fine. No, the reason it does bad things is
> that it does an otherwise perfectly fine dput() in a context where it
> definitely is _not_ fine to sleep. We are in RCU lookup, so we are in
> a RCU-locked region, we hold the percpu vfsmount_lock spinlock, and in
> fact in unlazy_walk() we also potentially hold the "fs->lock"
> spinlock.
>
> Ugh, ugh, ugh.
>
> I really like the much simplified d_rcu_to_refcount() conceptually,
> though. So my current plan is to keep it, but to just delay the
> "dput()" it does until after we've done the "unlock_rcu_walk()".

> "complete_walk()" is actually quite simple to fix, because it does the
> unlock_rcu_walk() itself - so it's fairly trivial to just move the
> dput() to be after it. In preparation for this, I already ended up
> committing my "dead lockref" dentry code, because that simplifies
> d_rcu_to_refcount() to the degree that splitting the code up into the
> callers and then moving the dput() looks like the right thing to do.
>
> The problem I see right now is unlazy_walk(). It does *not* do the
> unlock_rcu_walk() itself for the error case (and it's the error case
> that needs this), but instead just returns -ECHILD and expects the
> caller to do it. Even then it happens fairly obscurely in
> "terminate_walk()".
>
> So Al, any ideas? I currently have two:
>
> - always do that unlock_rcu_walk() in unlazy_walk(), and exit RCU
> mode even for the error case (similar to how complete_walk() does it).
>
> That makes solving this for unlazy_walk() as straightforward as it
> is for complete_walk(), and this is, I think, the right thing to do.
>
> - add a couple of "to be freed" dentry pointers to the 'nd' and let
> terminate_walk() do the dput(). This leaves the unlazy_walk()
> semantics alone.
>
> The second one is hacky, but it doesn't change the semantics of
> unlzay_walk(). I think the current semantics (drop out of RCU mode on
> success, but not on failure) are nasty, but I wonder if there is some
> really subtle reason for it. So the hacky second solution has got that
> whole "no change to odd/subtle semantics" thing going for it..

Well... unlazy_walk() is always followed by terminate_walk() very shortly,
but there's a minor problem - terminate_walk() uses "are we in RCU
mode?" for two things:
a) do we need to do path_put() here?
b) do we need to unlock?
If you introduce the third case ("no need to do unlock and no need to
do path_put()"), we'd better decide how to check for that case...

I suspect that minimal variant would be along the lines of
* have unlazy_walk() slap NULL into ->path.mnt on error, clear
LOOKUP_RCU and unlock
* have terminate_walk() check ->path.mnt before doing path_put()
in !RCU case
* in do_last() replace bool got_write with struct vfsmount *got_write,
storing the reference to vfsmount we'd fed to mnt_want_write().
And use its value when we call mnt_put_write() in there...

I'll put together a commit like that on top of what I was going to push
into public queues tonight; give me about half an hour, OK?

2013-09-09 00:25:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 8, 2013 at 5:03 PM, Al Viro <[email protected]> wrote:
>
> Well... unlazy_walk() is always followed by terminate_walk() very shortly,
> but there's a minor problem - terminate_walk() uses "are we in RCU
> mode?" for two things:
> a) do we need to do path_put() here?
> b) do we need to unlock?
> If you introduce the third case ("no need to do unlock and no need to
> do path_put()"), we'd better decide how to check for that case...

Actually, I decided to take advantage of those two cases instead, and
I have a patch that I think does the right thing. Basically, I start
off unlazy_walk() with just doing that lockref_get_not_dead() on the
parent dentry, and if that fails I just return an error in RCU mode
(so terminate_walk() does what it always used to do, and we haven't
done anything else to any refcounts).

Now, if the lockref_get_not_dead() succeeded, that means that we have
a reference on the nd->path.dentry, and we can now just do
"mntget(nd->path.mnt);". Ta-Daa! We now have everything done for the
non-RCU case for terminate_walk().

So after that point, we clear LOOKUP_RCU, and make the rule be that
any return (error or success) has to do unlock_rcu_walk(). And then
all the other refcounts are easy, we can just "dput(dentry);" after
that.
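
The control flow being described boils down to something like the sketch
below. This is not the actual patch - the child-dentry and sequence-count
revalidation that the real unlazy_walk() does between these steps is omitted,
and it assumes the usual fs/namei.c context (struct nameidata,
unlock_rcu_walk(), and so on):

static int unlazy_walk_sketch(struct nameidata *nd)
{
        struct dentry *parent = nd->path.dentry;

        BUG_ON(!(nd->flags & LOOKUP_RCU));

        /* 1. Pin the parent.  On failure nothing has been taken, so we
         *    return in RCU mode and terminate_walk() behaves as before. */
        if (unlikely(!lockref_get_not_dead(&parent->d_lockref)))
                return -ECHILD;

        /* 2. With the dentry pinned, take the vfsmount reference too;
         *    nd->path now carries normal ref-walk references. */
        mntget(nd->path.mnt);

        /* 3. Committed: leave RCU-walk mode.  Any later failure can use
         *    plain dput()/path_put(), since we are no longer under
         *    rcu_read_lock() or the vfsmount br-lock. */
        nd->flags &= ~LOOKUP_RCU;
        unlock_rcu_walk();
        return 0;
}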

I haven't tested it yet, I was going to reboot into it just now. But
I'm attaching the patch here. Maybe I missed some detail, but it all
seems simpler.

Note that this patch requires the "lockref_get_not_dead()" cleanup at
the top of my current -git.

Linus


Attachments:
patch.diff (4.51 kB)

2013-09-09 00:30:36

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Mon, Sep 09, 2013 at 01:03:00AM +0100, Al Viro wrote:

> Well... unlazy_walk() is always followed by terminate_walk() very shortly,
> but there's a minor problem - terminate_walk() uses "are we in RCU
> mode?" for two things:
> a) do we need to do path_put() here?
> b) do we need to unlock?
> If you introduce the third case ("no need to do unlock and no need to
> do path_put()"), we'd better decide how to check for that case...
>
> I suspect that minimal variant would be along the lines of
> * have unlazy_walk() slap NULL into ->path.mnt on error, clear
> LOOKUP_RCU and unlock
> * have terminate_walk() check ->path.mnt before doing path_put()
> in !RCU case
> * in do_last() replace bool got_write with struct vfsmount *got_write,
> storing the reference to vfsmount we'd fed to mnt_want_write().
> And use its value when we call mnt_put_write() in there...
>
> I'll put together a commit like that on top of what I was going to push
> into public queues tonight; give me about half an hour, OK?

See the last commit in vfs.git#for-next (38373e1).

2013-09-09 00:35:26

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 08, 2013 at 05:25:40PM -0700, Linus Torvalds wrote:
> On Sun, Sep 8, 2013 at 5:03 PM, Al Viro <[email protected]> wrote:
> >
> > Well... unlazy_walk() is always followed by terminate_walk() very shortly,
> > but there's a minor problem - terminate_walk() uses "are we in RCU
> > mode?" for two things:
> > a) do we need to do path_put() here?
> > b) do we need to unlock?
> > If you introduce the third case ("no need to do unlock and no need to
> > do path_put()"), we'd better decide how to check for that case...
>
> Actually, I decided to take advantage of those two cases instead, and
> I have a patch that I think does the right thing. Basically, I start
> off unlazy_walk() with just doing that lockref_get_not_dead() on the
> parent dentry, and if that fails I just return an error in RCU mode
> (so terminate_walk() does what it always used to do, and we haven't
> done anything else to any refcounts).
>
> Now, if the lockref_get_not_dead() succeeded, that means that we have
> a reference on the nd->path.dentry, and we can now just do
> "mntget(nd->path.mnt);". Ta-Daa! We now have everything done for the
> non-RCU case for terminate_walk().
>
> So after that point, we clear LOOKUP_RCU, and make the rule be that
> any return (error or success) has to do unlock_rcu_walk(). And then
> all the other refcounts are easy, we can just "dput(dentry);" after
> that.
>
> I haven't tested it yet, I was going to reboot into it just now. But
> I'm attaching the patch here. Maybe I missed some detail, but it all
> seems simpler.
>
> Note that this patch requires the "lockref_get_not_dead()" cleanup at
> the top of my current -git.

That should also work, replacing the current tip of #for-next. Do you
prefer to merge those two diffs of yours into a single commit?

2013-09-09 00:38:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 8, 2013 at 5:35 PM, Al Viro <[email protected]> wrote:
>
> That should also work, replacing the current tip of #for-next. Do you
> prefer to merge those two diffs of yours into a single commit?

If you're ok with my patch (it's now also tested, I'm running with it
and it looks fine), I'll commit that one as-is.

When you say "those two diffs of yours", which two are you talking
about? I already committed the "dead lockref" part separately - it may
be "preparatory", but it was preparatory cleanup that didn't change
semantics, so it's better to be separate anyway. The last patch I sent
out a few moments ago is the one that actually fixes things so that
"dput()" isn't done under the RCU lock etc.

Linus

2013-09-09 00:57:45

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 08, 2013 at 05:38:46PM -0700, Linus Torvalds wrote:
> On Sun, Sep 8, 2013 at 5:35 PM, Al Viro <[email protected]> wrote:
> >
> > That should also work, replacing the current tip of #for-next. Do you
> > prefer to merge those two diffs of yours into a single commit?
>
> If you're ok with my patch (it's now also tested, I'm running with it
> and it looks fine), I'll commit that one as-is.

OK... I'm dropping 38373e1ee72bf57b6f9b1ab0979a60887352c7e6, then...

2013-09-09 02:10:14

by Ramkumar Ramachandra

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

Linus Torvalds wrote:
> On Sun, Sep 8, 2013 at 5:03 PM, Al Viro <[email protected]> wrote:
>>
>> Well... unlazy_walk() is always followed by terminate_walk() very shortly,
>> but there's a minor problem - terminate_walk() uses "are we in RCU
>> mode?" for two things:
>> a) do we need to do path_put() here?
>> b) do we need to unlock?
>> If you introduce the third case ("no need to do unlock and no need to
>> do path_put()"), we'd better decide how to check for that case...

Yeah, I think b is a crucial step: it depends on how you unlock. Does
the unlocking result in data corruption and chaos, or does it result
in ordered data being unlocked? If it's the former case, people are
sure to take advantage of you; otherwise, you can use people.

> Actually, I decided to take advantage of those two cases instead, and
> I have a patch that I think does the right thing. Basically, I start
> off unlazy_walk() with just doing that lockref_get_not_dead() on the
> parent dentry, and if that fails I just return an error in RCU mode
> (so terminate_walk() does what it always used to do, and we haven't
> done anything else to any refcounts).

LRU cache requires that a single processor is accessing memory at a
time: if there are multiple processors, L1/ L2 cache concurrent access
is a problem.

> Now, if the lockref_get_not_dead() succeeded, that means that we have
> a reference on the nd->path.dentry, and we can now just do
> "mntget(nd->path.mnt);". Ta-Daa! We now have everything done for the
> non-RCU case for terminate_walk().

How efficient is the mounting? Has data been corrupted while taking
the portable filesystem around?

> So after that point, we clear LOOKUP_RCU, and make the rule be that
> any return (error or success) has to do unlock_rcu_walk(). And then
> all the other refcounts are easy, we can just "dput(dentry);" after
> that.

Drag ropes.

> I haven't tested it yet, I was going to reboot into it just now. But
> I'm attaching the patch here. Maybe I missed some detail, but it all
> seems simpler.

Some amount of garbage collection is healthy every once in a while,
but the project requires more yak shavers than anything else at this
point.

> Note that this patch requires the "lockref_get_not_dead()" cleanup at
> the top of my current -git.

Why is the current gitster not up to date? We should discuss automated
version control.

2013-09-09 03:32:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 8, 2013 at 5:03 PM, Al Viro <[email protected]> wrote:
>
> There's one exception - basically, we decide to put duplicates of
> reference(s) we hold into (a bunch of) structures being created. If
> we decide that we'd failed and need to roll back, the structures
> need to be taken out of whatever lists, etc. they'd been already
> put on and references held in them - dropped. That removal gets done
> under a spinlock. Sure, we can string those structures on some kind
> of temp list, drop the spinlock and do dput() on everything in there,
> but it's much more convenient to just free them as we are evicting
> them, doing dput() as we go. Which is safe, since we are still have
> the references used to create these buggers pinned down.

Hmm. Which codepath does this? Because I got curious and added back
the __might_sleep() unconditionally to dput() just to see (now that I
think that the dput() bugs are gone), and at least under normal load
it doesn't trigger. I even wrote a thing that just constantly creates
and renames the target file concurrently with looking it up, so that
I've stress-tested the RCU sequence number failure path (and verified
with a profile that yes, it does trigger the "oops, need to retry"
case). I didn't test anything odd at all (just my dentry stress tests
and a regular graphical desktop), though.

And I have too much memory to sanely stress any out-of-memory situations.

#firstworldkerneldeveloperproblems

Linus

2013-09-09 04:07:21

by Ramkumar Ramachandra

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

Linus Torvalds wrote:
> On Sun, Sep 8, 2013 at 5:03 PM, Al Viro <[email protected]> wrote:
>>
>> There's one exception - basically, we decide to put duplicates of
>> reference(s) we hold into (a bunch of) structures being created. If
>> we decide that we'd failed and need to roll back, the structures
>> need to be taken out of whatever lists, etc. they'd been already
>> put on and references held in them - dropped. That removal gets done
>> under a spinlock. Sure, we can string those structures on some kind
>> of temp list, drop the spinlock and do dput() on everything in there,
>> but it's much more convenient to just free them as we are evicting
>> them, doing dput() as we go. Which is safe, since we are still have
>> the references used to create these buggers pinned down.

Dropping the spinlocks means more cores; unfortunately, a quad-core
seems to be the limit. Users must divide their time between reading
history and contributing to the present: some amount of persistent
data is a must on every user's machine. Pixel seems to be heading in
the wrong direction: that's what is stressing us out.

Ram

2013-09-09 05:44:24

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 08, 2013 at 08:32:03PM -0700, Linus Torvalds wrote:
> On Sun, Sep 8, 2013 at 5:03 PM, Al Viro <[email protected]> wrote:
> >
> > There's one exception - basically, we decide to put duplicates of
> > reference(s) we hold into (a bunch of) structures being created. If
> > we decide that we'd failed and need to roll back, the structures
> > need to be taken out of whatever lists, etc. they'd been already
> > put on and references held in them - dropped. That removal gets done
> > under a spinlock. Sure, we can string those structures on some kind
> > of temp list, drop the spinlock and do dput() on everything in there,
> > but it's much more convenient to just free them as we are evicting
> > them, doing dput() as we go. Which is safe, since we are still have
> > the references used to create these buggers pinned down.
>
> Hmm. Which codepath does this? Because I got curious and added back
> the __might_sleep() unconditionally to dput() just to see (now that I
> think that the dput() bugs are gone), and at least under normal load
> it doesn't trigger. I even wrote a thing that just constantly creates
> and renames the target file concurrently with looking it up, so that
> I've stress-tested the RCU sequence number failure path (and verified
> with a profile that yes, it does trigger the "oops, need to retry"
> case). I didn't test anything odd at all (just my dentry stress tests
> and a regular graphical desktop), though.

Not sure if we have anything of that sort in the current tree; I remember
that kind of stuff in shared subtree code (basically, if we decided that
operation should fail halfway through the process, we could just evict
all created vfsmounts from the lists and mntput them, spinlocks or no
spinlocks - they all had been copied from existing ones protected by
the system-wide serialization on namespace modifications, so resulting
dput() calls wouldn't have d_count on anything reach zero). But I'm not
sure if it had been about vfsmount_lock or namespace_sem (we really don't
want any IO under the latter, since one stuck fs can make _any_ umount
impossible afterwards) and all remnants of that got killed off by
reorganizations of locking in there - all pending dput()/mntput() calls are
delayed until namespace_unlock() now.

Anyway, that wouldn't break even now - I'm not saying that it's a good
pattern to use, but it's legitimate. If you are holding a reference
already, something like
        p = alloc_foo();
        spin_lock(&lock);
        ...
        p->dentry = dget(dentry);
        ...
        if (error) {
                ...
                free_foo(p);
                ...
                spin_unlock(&lock);
        }
with free_foo(p) including dput(p->dentry) is safe. The whole thing was just
a comment on your "who does that? Nobody" - that kind of stuff has uses and
it did happen at least once. And yes, it is safe *and* anybody writing
anything of that sort needs to look hard at whether it can be reorganized into
something less subtle...
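
Fleshed out slightly, the pattern looks like this. All names here are
hypothetical, and the dput() under the spinlock is safe only because the
caller's own reference keeps the count from reaching zero:

struct foo {
        struct list_head list;
        struct dentry *dentry;
};

static DEFINE_SPINLOCK(foo_lock);
static LIST_HEAD(foo_list);

static void free_foo(struct foo *p)
{
        dput(p->dentry);        /* drops a duplicate ref; caller still holds its own */
        kfree(p);
}

/* Caller holds a reference to @dentry for the whole call. */
static int add_foo(struct dentry *dentry, bool error)
{
        struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

        if (!p)
                return -ENOMEM;

        spin_lock(&foo_lock);
        p->dentry = dget(dentry);
        list_add(&p->list, &foo_list);
        /* ... more setup under the lock that may discover a failure ... */
        if (error) {
                list_del(&p->list);
                free_foo(p);    /* dput() under foo_lock: safe, see above */
                spin_unlock(&foo_lock);
                return -EINVAL;
        }
        spin_unlock(&foo_lock);
        return 0;
}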

> And I have too much memory to sanely stress any out-of-memory situations.

KVM image with -m <size> or mem=<size> in kernel command line ;-)

2013-09-09 08:31:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

On Sun, Sep 01, 2013 at 01:13:06AM +0100, Al Viro wrote:

> +static noinline_for_stack
> +char *dentry_name(char *buf, char *end, const struct dentry *d, struct printf_spec spec,
> +                  int depth)
> +{
> +        int i, n = 0;
> +        const char *s;
> +        char *p = buf;
> +        const struct dentry *array[4];
> +        char c;
> +
> +        if (depth < 0) {
> +                depth = 1;
> +                WARN_ON(1);
> +        }
> +        if (depth > 4) {
> +                depth = 4;
> +                WARN_ON(1);
> +        }
> +
> +        rcu_read_lock();
> +        for (i = 0; i < depth; i++) {
> +                struct dentry *p = ACCESS_ONCE(d->d_parent);
> +                array[i] = d;
> +                if (d == p)
> +                        break;
> +                d = p;
> +        }
> +        if (!i) {        /* root dentry has a bloody inconvenient name */
> +                i++;
> +                goto do_name;
> +        }
> +        if (i == depth)
> +                goto do_name;
> +        while (i && n != spec.precision) {
> +                if (buf < end)
> +                        *buf = '/';
> +                buf++;
> +                n++;
> +do_name:
> +                s = ACCESS_ONCE(array[--i]->d_name.name);
> +                while (n != spec.precision && (c = *s++) != '\0') {
> +                        if (buf < end)
> +                                *buf = c;
> +                        buf++;
> +                        n++;
> +                }
> +        }
> +        rcu_read_unlock();

Should one or both of those ACCESS_ONCE()s be an rcu_dereference()?
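
For comparison, the parent walk written with rcu_dereference() would look
like the sketch below. This assumes ->d_parent were __rcu-annotated (it is
not in mainline, so sparse would complain and rcu_dereference_raw() would be
the closest clean equivalent); the generated code matches ACCESS_ONCE() on
everything except Alpha, the difference being the added lockdep/sparse
checking:

/* Hypothetical helper, for illustration only. */
static int depth_to_root(const struct dentry *dentry, int max_depth)
{
        int i;

        rcu_read_lock();
        for (i = 0; i < max_depth; i++) {
                const struct dentry *p = rcu_dereference(dentry->d_parent);

                if (dentry == p)        /* the root dentry points to itself */
                        break;
                dentry = p;
        }
        rcu_read_unlock();
        return i;
}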