2019-03-25 05:06:56

by NeilBrown

Subject: [PATCH 0/4] Convert rhashtable to use bitlocks

This series converts rhashtable to use a per-bucket bitlock
rather than a separate array of spinlocks.
This:
reduces memory usage
results in slightly fewer memory accesses
slightly improves parallelism
makes a configuration option unnecessary

Thanks,
NeilBrown

---

NeilBrown (4):
rhashtable: use cmpxchg() in nested_table_alloc()
rhashtable: allow rht_bucket_var to return NULL.
rhashtable: use bit_spin_locks to protect hash bucket.
rhashtable: add lockdep tracking to bucket bit-spin-locks.


include/linux/rhashtable-types.h | 2
include/linux/rhashtable.h | 204 ++++++++++++++++++++++++++------------
ipc/util.c | 1
lib/rhashtable.c | 143 +++++++++++++++------------
net/bridge/br_fdb.c | 1
net/bridge/br_multicast.c | 1
net/bridge/br_vlan.c | 1
net/bridge/br_vlan_tunnel.c | 1
net/ipv4/ipmr.c | 1
net/ipv6/ip6mr.c | 1
net/netfilter/nf_tables_api.c | 1
11 files changed, 219 insertions(+), 138 deletions(-)

--
Signature



2019-03-25 05:07:09

by NeilBrown

Subject: [PATCH 1/4] rhashtable: use cmpxchg() in nested_table_alloc()

nested_table_alloc() relies on the fact that there is
at most one spinlock allocated for every slot in the top
level nested table, so it is not possible for two threads
to try to allocate the same table at the same time.

This assumption is a little fragile (it is not explicit) and is
unnecessary as cmpxchg() can be used instead.

A future patch will replace the spinlocks by per-bucket bitlocks,
and then we won't be able to protect the slot pointer with a spinlock.

So replace rcu_assign_pointer() with cmpxchg() - which has equivalent
barrier properties.
If the cmpxchg() fails, free the table that was just allocated.
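
The pattern is the common "allocate, try to publish with cmpxchg(), free
on failure" idiom. In isolation it looks roughly like this (an illustrative
sketch only - 'struct foo' and install_once() are made-up names; the real
code is in the diff below):

#include <linux/slab.h>
#include <linux/rcupdate.h>

struct foo { int val; };

/* Caller is assumed to be in a suitable RCU critical section. */
static struct foo *install_once(struct foo __rcu **slot)
{
	struct foo *new = kzalloc(sizeof(*new), GFP_ATOMIC);

	if (!new)
		return NULL;
	/* cmpxchg() is fully ordered when it succeeds, so the initialised
	 * object is visible to other CPUs before the pointer to it is.
	 */
	if (cmpxchg(slot, NULL, new) == NULL)
		return new;			/* we won the race */
	kfree(new);				/* somebody beat us to it */
	return rcu_dereference(*slot);		/* use the winner's object */
}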

Signed-off-by: NeilBrown <[email protected]>
---
lib/rhashtable.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index f65e43fb1ff8..afe01a5048d7 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -131,9 +131,11 @@ static union nested_table *nested_table_alloc(struct rhashtable *ht,
INIT_RHT_NULLS_HEAD(ntbl[i].bucket);
}

- rcu_assign_pointer(*prev, ntbl);
-
- return ntbl;
+ if (cmpxchg(prev, NULL, ntbl) == NULL)
+ return ntbl;
+ /* Raced with another thread. */
+ kfree(ntbl);
+ return rcu_dereference(*prev);
}

static struct bucket_table *nested_bucket_table_alloc(struct rhashtable *ht,



2019-03-25 05:07:20

by NeilBrown

Subject: [PATCH 2/4] rhashtable: allow rht_bucket_var to return NULL.

Rather than returning a pointer to a static nulls, rht_bucket_var()
now returns NULL if the bucket doesn't exist.
This will make the next patch, which stores a bitlock in the
bucket pointer, somewhat cleaner.

This change involves introducing __rht_bucket_nested() which is
like rht_bucket_nested(), but doesn't provide the static nulls,
and changing rht_bucket_nested() to call this and possibly
provide a static nulls - as is still needed for the non-var case.
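
For callers of rht_bucket_var() the practical effect is an extra NULL
check, with NULL treated as "no such entry". Condensed from the call
sites changed below (locking omitted; example_bucket_contains() is an
illustrative name):

#include <linux/rhashtable.h>

static bool example_bucket_contains(struct bucket_table *tbl,
				    unsigned int hash,
				    const struct rhash_head *obj)
{
	struct rhash_head __rcu **pprev;
	struct rhash_head *he;

	pprev = rht_bucket_var(tbl, hash);
	if (!pprev)		/* bucket never allocated in a nested table */
		return false;

	rht_for_each_from(he, *pprev, tbl, hash)
		if (he == obj)
			return true;
	return false;
}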

Signed-off-by: NeilBrown <[email protected]>
---
include/linux/rhashtable.h | 11 +++++++++--
lib/rhashtable.c | 29 ++++++++++++++++++++---------
2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 86dfa417848d..0c9175aeab8a 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -265,6 +265,8 @@ void rhashtable_destroy(struct rhashtable *ht);

struct rhash_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
unsigned int hash);
+struct rhash_head __rcu **__rht_bucket_nested(const struct bucket_table *tbl,
+ unsigned int hash);
struct rhash_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,
struct bucket_table *tbl,
unsigned int hash);
@@ -294,7 +296,7 @@ static inline struct rhash_head __rcu *const *rht_bucket(
static inline struct rhash_head __rcu **rht_bucket_var(
struct bucket_table *tbl, unsigned int hash)
{
- return unlikely(tbl->nest) ? rht_bucket_nested(tbl, hash) :
+ return unlikely(tbl->nest) ? __rht_bucket_nested(tbl, hash) :
&tbl->buckets[hash];
}

@@ -890,6 +892,8 @@ static inline int __rhashtable_remove_fast_one(
spin_lock_bh(lock);

pprev = rht_bucket_var(tbl, hash);
+ if (!pprev)
+ goto out;
rht_for_each_from(he, *pprev, tbl, hash) {
struct rhlist_head *list;

@@ -934,6 +938,7 @@ static inline int __rhashtable_remove_fast_one(
break;
}

+out:
spin_unlock_bh(lock);

if (err > 0) {
@@ -1042,6 +1047,8 @@ static inline int __rhashtable_replace_fast(
spin_lock_bh(lock);

pprev = rht_bucket_var(tbl, hash);
+ if (!pprev)
+ goto out;
rht_for_each_from(he, *pprev, tbl, hash) {
if (he != obj_old) {
pprev = &he->next;
@@ -1053,7 +1060,7 @@ static inline int __rhashtable_replace_fast(
err = 0;
break;
}
-
+out:
spin_unlock_bh(lock);

return err;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index afe01a5048d7..e3b726bf2dd3 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -237,8 +237,10 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
goto out;

err = -ENOENT;
+ if (!pprev)
+ goto out;

- rht_for_each(entry, old_tbl, old_hash) {
+ rht_for_each_from(entry, *pprev, old_tbl, old_hash) {
err = 0;
next = rht_dereference_bucket(entry->next, old_tbl, old_hash);

@@ -492,6 +494,8 @@ static void *rhashtable_lookup_one(struct rhashtable *ht,

elasticity = RHT_ELASTICITY;
pprev = rht_bucket_var(tbl, hash);
+ if (!pprev)
+ return ERR_PTR(-ENOENT);
rht_for_each_from(head, *pprev, tbl, hash) {
struct rhlist_head *list;
struct rhlist_head *plist;
@@ -1157,11 +1161,10 @@ void rhashtable_destroy(struct rhashtable *ht)
}
EXPORT_SYMBOL_GPL(rhashtable_destroy);

-struct rhash_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
- unsigned int hash)
+struct rhash_head __rcu **__rht_bucket_nested(const struct bucket_table *tbl,
+ unsigned int hash)
{
const unsigned int shift = PAGE_SHIFT - ilog2(sizeof(void *));
- static struct rhash_head __rcu *rhnull;
unsigned int index = hash & ((1 << tbl->nest) - 1);
unsigned int size = tbl->size >> tbl->nest;
unsigned int subhash = hash;
@@ -1179,15 +1182,23 @@ struct rhash_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
subhash >>= shift;
}

- if (!ntbl) {
- if (!rhnull)
- INIT_RHT_NULLS_HEAD(rhnull);
- return &rhnull;
- }
+ if (!ntbl)
+ return NULL;

return &ntbl[subhash].bucket;

}
+EXPORT_SYMBOL_GPL(__rht_bucket_nested);
+
+struct rhash_head __rcu **rht_bucket_nested(const struct bucket_table *tbl,
+ unsigned int hash)
+{
+ static struct rhash_head __rcu *rhnull;
+
+ if (!rhnull)
+ INIT_RHT_NULLS_HEAD(rhnull);
+ return __rht_bucket_nested(tbl, hash) ?: &rhnull;
+}
EXPORT_SYMBOL_GPL(rht_bucket_nested);

struct rhash_head __rcu **rht_bucket_nested_insert(struct rhashtable *ht,



2019-03-25 05:08:02

by NeilBrown

Subject: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.

This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
bucket pointer to lock the hash chain for that bucket.

The benefits of a bit spin_lock are:
- no need to allocate a separate array of locks.
- no need to have a configuration option to guide the
choice of the size of this array
- locking cost is often a single test-and-set in a cache line
that will have to be loaded anyway. When inserting at, or removing
from, the head of the chain, the unlock is free - writing the new
address in the bucket head implicitly clears the lock bit.
For __rhashtable_insert_fast() we ensure this always happens
when adding a new key.
- even when locking costs 2 updates (lock and unlock), they are
in a cacheline that needs to be read anyway.

The cost of using a bit spin_lock is a little bit of code complexity,
which I think is quite manageable.

Bit spin_locks are sometimes inappropriate because they are not fair -
if multiple CPUs repeatedly contend on the same lock, one CPU can
easily be starved. This is not a credible situation with rhashtable.
Multiple CPUs may want to repeatedly add or remove objects, but they
will typically do so at different buckets, so they will attempt to
acquire different locks.

As we have more bit-locks than we previously had spinlocks (by at
least a factor of two) we can expect slightly less contention to
go with the slightly better cache behavior and reduced memory
consumption.
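
For orientation, here is the core of the scheme condensed from the diff
below (not a drop-in implementation; rht_insert_head_unlocked() is an
illustrative name):

#include <linux/bit_spinlock.h>
#include <linux/rcupdate.h>
#include <linux/rhashtable-types.h>

/* BIT(1) of the bucket-head pointer is the lock; it is never set in a
 * real pointer or in a nulls marker.
 */
static inline void rht_lock(struct rhash_head __rcu **bkt)
{
	local_bh_disable();
	bit_spin_lock(1, (unsigned long *)bkt);
}

static inline void rht_unlock(struct rhash_head __rcu **bkt)
{
	bit_spin_unlock(1, (unsigned long *)bkt);
	local_bh_enable();
}

/* Strip the lock bit before following the chain. */
static inline struct rhash_head *rht_ptr(struct rhash_head *p)
{
	return (struct rhash_head *)((unsigned long)p & ~2UL);
}

/* Inserting at the head: the pointer that publishes the new object has
 * BIT(1) clear, so the same store also releases the lock; only the
 * bookkeeping done by bit_spin_lock()/local_bh_disable() is undone.
 */
static inline void rht_insert_head_unlocked(struct rhash_head __rcu **bkt,
					    struct rhash_head *obj)
{
	struct rhash_head *head = rht_ptr(rcu_dereference_protected(*bkt, 1));

	RCU_INIT_POINTER(obj->next, head);
	rcu_assign_pointer(*bkt, obj);
	preempt_enable();
	local_bh_enable();
}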

Signed-off-by: NeilBrown <[email protected]>
---
include/linux/rhashtable-types.h | 2
include/linux/rhashtable.h | 191 ++++++++++++++++++++++++--------------
ipc/util.c | 1
lib/rhashtable.c | 113 +++++++++++-----------
net/bridge/br_fdb.c | 1
net/bridge/br_multicast.c | 1
net/bridge/br_vlan.c | 1
net/bridge/br_vlan_tunnel.c | 1
net/ipv4/ipmr.c | 1
net/ipv6/ip6mr.c | 1
net/netfilter/nf_tables_api.c | 1
11 files changed, 179 insertions(+), 135 deletions(-)

diff --git a/include/linux/rhashtable-types.h b/include/linux/rhashtable-types.h
index 763d613ce2c2..57467cbf4c5b 100644
--- a/include/linux/rhashtable-types.h
+++ b/include/linux/rhashtable-types.h
@@ -48,7 +48,6 @@ typedef int (*rht_obj_cmpfn_t)(struct rhashtable_compare_arg *arg,
* @head_offset: Offset of rhash_head in struct to be hashed
* @max_size: Maximum size while expanding
* @min_size: Minimum size while shrinking
- * @locks_mul: Number of bucket locks to allocate per cpu (default: 32)
* @automatic_shrinking: Enable automatic shrinking of tables
* @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
* @obj_hashfn: Function to hash object
@@ -62,7 +61,6 @@ struct rhashtable_params {
unsigned int max_size;
u16 min_size;
bool automatic_shrinking;
- u8 locks_mul;
rht_hashfn_t hashfn;
rht_obj_hashfn_t obj_hashfn;
rht_obj_cmpfn_t obj_cmpfn;
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 0c9175aeab8a..14bac26cc92c 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -24,6 +24,7 @@
#include <linux/list_nulls.h>
#include <linux/workqueue.h>
#include <linux/rculist.h>
+#include <linux/bit_spinlock.h>

#include <linux/rhashtable-types.h>
/*
@@ -52,8 +53,6 @@
* @nest: Number of bits of first-level nested table.
* @rehash: Current bucket being rehashed
* @hash_rnd: Random seed to fold into hash
- * @locks_mask: Mask to apply before accessing locks[]
- * @locks: Array of spinlocks protecting individual buckets
* @walkers: List of active walkers
* @rcu: RCU structure for freeing the table
* @future_tbl: Table under construction during rehashing
@@ -64,8 +63,6 @@ struct bucket_table {
unsigned int size;
unsigned int nest;
u32 hash_rnd;
- unsigned int locks_mask;
- spinlock_t *locks;
struct list_head walkers;
struct rcu_head rcu;

@@ -74,6 +71,63 @@ struct bucket_table {
struct rhash_head __rcu *buckets[] ____cacheline_aligned_in_smp;
};

+/*
+ * We lock a bucket by setting BIT(1) in the pointer - this is always
+ * zero in real pointers and in the nulls marker.
+ * bit_spin_locks do not handle contention well, but the whole point
+ * of the hashtable design is to achieve minimum per-bucket contention.
+ * A nested hash table might not have a bucket pointer. In that case
+ * we cannot get a lock. For remove and replace the bucket cannot be
+ * interesting and doesn't need locking.
+ * For insert we allocate the bucket if this is the last bucket_table,
+ * and then take the lock.
+ * Sometimes we unlock a bucket by writing a new pointer there. In that
+ * case we don't need to unlock, but we do need to reset state such as
+ * local_bh. For that we have rht_unlocked(). This doesn't include
+ * the memory barrier that bit_spin_unlock() provides, but rcu_assign_pointer()
+ * will have provided that.
+ */
+
+static inline void rht_lock(struct rhash_head **bucket)
+{
+ local_bh_disable();
+ bit_spin_lock(1, (unsigned long *)bucket);
+}
+
+static inline void rht_unlock(struct rhash_head **bucket)
+{
+ bit_spin_unlock(1, (unsigned long *)bucket);
+ local_bh_enable();
+}
+
+static inline void rht_unlocked(void)
+{
+ preempt_enable();
+ __release(bitlock);
+ local_bh_enable();
+}
+
+/*
+ * If 'p' is a bucket head and might be locked:
+ * rht_ptr() returns the address without the lock bit.
+ * rht_ptr_locked() returns the address WITH the lock bit.
+ * rht_is_locked() tests if the lock bit is set.
+ */
+static inline struct rhash_head __rcu *rht_ptr(const struct rhash_head *p)
+{
+ return (void *)(((unsigned long)p) & ~2UL);
+}
+
+static inline struct rhash_head __rcu *rht_ptr_locked(const struct rhash_head *p)
+{
+ return (void *)(((unsigned long)p) | 2UL);
+}
+
+static inline bool rht_is_locked(const struct rhash_head *p)
+{
+ return rht_ptr_locked(p) == p;
+}
+
/*
* NULLS_MARKER() expects a hash value with the low
* bits mostly likely to be significant, and it discards
@@ -206,25 +260,6 @@ static inline bool rht_grow_above_max(const struct rhashtable *ht,
return atomic_read(&ht->nelems) >= ht->max_elems;
}

-/* The bucket lock is selected based on the hash and protects mutations
- * on a group of hash buckets.
- *
- * A maximum of tbl->size/2 bucket locks is allocated. This ensures that
- * a single lock always covers both buckets which may both contains
- * entries which link to the same bucket of the old table during resizing.
- * This allows to simplify the locking as locking the bucket in both
- * tables during resize always guarantee protection.
- *
- * IMPORTANT: When holding the bucket lock of both the old and new table
- * during expansions and shrinking, the old bucket lock must always be
- * acquired first.
- */
-static inline spinlock_t *rht_bucket_lock(const struct bucket_table *tbl,
- unsigned int hash)
-{
- return &tbl->locks[hash & tbl->locks_mask];
-}
-
#ifdef CONFIG_PROVE_LOCKING
int lockdep_rht_mutex_is_held(struct rhashtable *ht);
int lockdep_rht_bucket_is_held(const struct bucket_table *tbl, u32 hash);
@@ -326,7 +361,7 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
* @hash: the hash value / bucket index
*/
#define rht_for_each(pos, tbl, hash) \
- rht_for_each_from(pos, *rht_bucket(tbl, hash), tbl, hash)
+ rht_for_each_from(pos, rht_ptr(*rht_bucket(tbl, hash)), tbl, hash)

/**
* rht_for_each_entry_from - iterate over hash chain from given head
@@ -351,7 +386,7 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
* @member: name of the &struct rhash_head within the hashable struct.
*/
#define rht_for_each_entry(tpos, pos, tbl, hash, member) \
- rht_for_each_entry_from(tpos, pos, *rht_bucket(tbl, hash), \
+ rht_for_each_entry_from(tpos, pos, rht_ptr(*rht_bucket(tbl, hash)), \
tbl, hash, member)

/**
@@ -367,7 +402,8 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
* remove the loop cursor from the list.
*/
#define rht_for_each_entry_safe(tpos, pos, next, tbl, hash, member) \
- for (pos = rht_dereference_bucket(*rht_bucket(tbl, hash), tbl, hash), \
+ for (pos = rht_dereference_bucket(rht_ptr(*rht_bucket(tbl, hash)), \
+ tbl, hash), \
next = !rht_is_a_nulls(pos) ? \
rht_dereference_bucket(pos->next, tbl, hash) : NULL; \
(!rht_is_a_nulls(pos)) && rht_entry(tpos, pos, member); \
@@ -388,7 +424,7 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
*/
#define rht_for_each_rcu_from(pos, head, tbl, hash) \
for (({barrier(); }), \
- pos = rht_dereference_bucket_rcu(head, tbl, hash); \
+ pos = rht_ptr(rht_dereference_bucket_rcu(head, tbl, hash)); \
!rht_is_a_nulls(pos); \
pos = rcu_dereference_raw(pos->next))

@@ -437,7 +473,8 @@ static inline struct rhash_head __rcu **rht_bucket_insert(
* traversal is guarded by rcu_read_lock().
*/
#define rht_for_each_entry_rcu(tpos, pos, tbl, hash, member) \
- rht_for_each_entry_rcu_from(tpos, pos, *rht_bucket(tbl, hash), \
+ rht_for_each_entry_rcu_from(tpos, pos, \
+ rht_ptr(*rht_bucket(tbl, hash)), \
tbl, hash, member)

/**
@@ -600,9 +637,9 @@ static inline void *__rhashtable_insert_fast(
.key = key,
};
struct rhash_head __rcu **pprev;
+ struct rhash_head __rcu **lock;
struct bucket_table *tbl;
struct rhash_head *head;
- spinlock_t *lock;
unsigned int hash;
int elasticity;
void *data;
@@ -611,23 +648,22 @@ static inline void *__rhashtable_insert_fast(

tbl = rht_dereference_rcu(ht->tbl, ht);
hash = rht_head_hashfn(ht, tbl, obj, params);
- lock = rht_bucket_lock(tbl, hash);
- spin_lock_bh(lock);
+ elasticity = RHT_ELASTICITY;
+ pprev = rht_bucket_insert(ht, tbl, hash);
+ data = ERR_PTR(-ENOMEM);
+ if (!pprev)
+ goto out;
+ lock = pprev;
+ rht_lock(lock);

if (unlikely(rcu_access_pointer(tbl->future_tbl))) {
slow_path:
- spin_unlock_bh(lock);
+ rht_unlock(lock);
rcu_read_unlock();
return rhashtable_insert_slow(ht, key, obj);
}

- elasticity = RHT_ELASTICITY;
- pprev = rht_bucket_insert(ht, tbl, hash);
- data = ERR_PTR(-ENOMEM);
- if (!pprev)
- goto out;
-
- rht_for_each_from(head, *pprev, tbl, hash) {
+ rht_for_each_from(head, rht_ptr(*pprev), tbl, hash) {
struct rhlist_head *plist;
struct rhlist_head *list;

@@ -643,7 +679,7 @@ static inline void *__rhashtable_insert_fast(
data = rht_obj(ht, head);

if (!rhlist)
- goto out;
+ goto out_unlock;


list = container_of(obj, struct rhlist_head, rhead);
@@ -662,14 +698,16 @@ static inline void *__rhashtable_insert_fast(

data = ERR_PTR(-E2BIG);
if (unlikely(rht_grow_above_max(ht, tbl)))
- goto out;
+ goto out_unlock;

if (unlikely(rht_grow_above_100(ht, tbl)))
goto slow_path;

+ /* Inserting at head of list make unlocking free. */
+ pprev = lock;
head = rht_dereference_bucket(*pprev, tbl, hash);

- RCU_INIT_POINTER(obj->next, head);
+ RCU_INIT_POINTER(obj->next, rht_ptr(head));
if (rhlist) {
struct rhlist_head *list;

@@ -686,8 +724,16 @@ static inline void *__rhashtable_insert_fast(
good:
data = NULL;

+ if (pprev == lock) {
+ /* Assigning to *headp unlocked the chain, so we
+ * mustn't do it again.
+ */
+ rht_unlocked();
+ } else {
+out_unlock:
+ rht_unlock(lock);
+ }
out:
- spin_unlock_bh(lock);
rcu_read_unlock();

return data;
@@ -699,9 +745,9 @@ static inline void *__rhashtable_insert_fast(
* @obj: pointer to hash head inside object
* @params: hash table parameters
*
- * Will take a per bucket spinlock to protect against mutual mutations
+ * Will take the per bucket bitlock to protect against mutual mutations
* on the same bucket. Multiple insertions may occur in parallel unless
- * they map to the same bucket lock.
+ * they map to the same bucket.
*
* It is safe to call this function from atomic context.
*
@@ -728,9 +774,9 @@ static inline int rhashtable_insert_fast(
* @list: pointer to hash list head inside object
* @params: hash table parameters
*
- * Will take a per bucket spinlock to protect against mutual mutations
+ * Will take the per bucket bitlock to protect against mutual mutations
* on the same bucket. Multiple insertions may occur in parallel unless
- * they map to the same bucket lock.
+ * they map to the same bucket.
*
* It is safe to call this function from atomic context.
*
@@ -751,9 +797,9 @@ static inline int rhltable_insert_key(
* @list: pointer to hash list head inside object
* @params: hash table parameters
*
- * Will take a per bucket spinlock to protect against mutual mutations
+ * Will take the per bucket bitlock to protect against mutual mutations
* on the same bucket. Multiple insertions may occur in parallel unless
- * they map to the same bucket lock.
+ * they map to the same bucket.
*
* It is safe to call this function from atomic context.
*
@@ -881,20 +927,19 @@ static inline int __rhashtable_remove_fast_one(
bool rhlist)
{
struct rhash_head __rcu **pprev;
+ struct rhash_head __rcu **lock;
struct rhash_head *he;
- spinlock_t * lock;
unsigned int hash;
int err = -ENOENT;

hash = rht_head_hashfn(ht, tbl, obj, params);
- lock = rht_bucket_lock(tbl, hash);
-
- spin_lock_bh(lock);
-
pprev = rht_bucket_var(tbl, hash);
if (!pprev)
- goto out;
- rht_for_each_from(he, *pprev, tbl, hash) {
+ return -ENOENT;
+ lock = pprev;
+ rht_lock(lock);
+
+ rht_for_each_from(he, rht_ptr(*pprev), tbl, hash) {
struct rhlist_head *list;

list = container_of(he, struct rhlist_head, rhead);
@@ -935,12 +980,16 @@ static inline int __rhashtable_remove_fast_one(
}

rcu_assign_pointer(*pprev, obj);
+ if (lock == pprev) {
+ /* That rcu_assign_pointer() unlocked the chain */
+ rht_unlocked();
+ goto unlocked;
+ }
break;
}

-out:
- spin_unlock_bh(lock);
-
+ rht_unlock(lock);
+unlocked:
if (err > 0) {
atomic_dec(&ht->nelems);
if (unlikely(ht->p.automatic_shrinking &&
@@ -1030,8 +1079,8 @@ static inline int __rhashtable_replace_fast(
const struct rhashtable_params params)
{
struct rhash_head __rcu **pprev;
+ struct rhash_head __rcu **lock;
struct rhash_head *he;
- spinlock_t *lock;
unsigned int hash;
int err = -ENOENT;

@@ -1042,14 +1091,14 @@ static inline int __rhashtable_replace_fast(
if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
return -EINVAL;

- lock = rht_bucket_lock(tbl, hash);
-
- spin_lock_bh(lock);
-
pprev = rht_bucket_var(tbl, hash);
if (!pprev)
- goto out;
- rht_for_each_from(he, *pprev, tbl, hash) {
+ return -ENOENT;
+
+ lock = pprev;
+ rht_lock(lock);
+
+ rht_for_each_from(he, rht_ptr(*pprev), tbl, hash) {
if (he != obj_old) {
pprev = &he->next;
continue;
@@ -1058,11 +1107,17 @@ static inline int __rhashtable_replace_fast(
rcu_assign_pointer(obj_new->next, obj_old->next);
rcu_assign_pointer(*pprev, obj_new);
err = 0;
+ if (pprev == lock) {
+ /* We just unlocked the chain by assigning to *pprev */
+ rht_unlocked();
+ goto unlocked;
+ }
break;
}
-out:
- spin_unlock_bh(lock);

+ rht_unlock(lock);
+
+unlocked:
return err;
}

diff --git a/ipc/util.c b/ipc/util.c
index 0af05752969f..095274a871f8 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -101,7 +101,6 @@ static const struct rhashtable_params ipc_kht_params = {
.head_offset = offsetof(struct kern_ipc_perm, khtnode),
.key_offset = offsetof(struct kern_ipc_perm, key),
.key_len = FIELD_SIZEOF(struct kern_ipc_perm, key),
- .locks_mul = 1,
.automatic_shrinking = true,
};

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index e3b726bf2dd3..beb942aa1e50 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -31,7 +31,6 @@

#define HASH_DEFAULT_SIZE 64UL
#define HASH_MIN_SIZE 4U
-#define BUCKET_LOCKS_PER_CPU 32UL

union nested_table {
union nested_table __rcu *table;
@@ -56,9 +55,11 @@ EXPORT_SYMBOL_GPL(lockdep_rht_mutex_is_held);

int lockdep_rht_bucket_is_held(const struct bucket_table *tbl, u32 hash)
{
- spinlock_t *lock = rht_bucket_lock(tbl, hash);
-
- return (debug_locks) ? lockdep_is_held(lock) : 1;
+ if (!debug_locks)
+ return 1;
+ if (unlikely(tbl->nest))
+ return 1;
+ return bit_spin_is_locked(1, (unsigned long *)&tbl->buckets[hash]);
}
EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
#else
@@ -104,7 +105,6 @@ static void bucket_table_free(const struct bucket_table *tbl)
if (tbl->nest)
nested_bucket_table_free(tbl);

- free_bucket_spinlocks(tbl->locks);
kvfree(tbl);
}

@@ -171,7 +171,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
gfp_t gfp)
{
struct bucket_table *tbl = NULL;
- size_t size, max_locks;
+ size_t size;
int i;

size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
@@ -189,16 +189,6 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,

tbl->size = size;

- max_locks = size >> 1;
- if (tbl->nest)
- max_locks = min_t(size_t, max_locks, 1U << tbl->nest);
-
- if (alloc_bucket_spinlocks(&tbl->locks, &tbl->locks_mask, max_locks,
- ht->p.locks_mul, gfp) < 0) {
- bucket_table_free(tbl);
- return NULL;
- }
-
rcu_head_init(&tbl->rcu);
INIT_LIST_HEAD(&tbl->walkers);

@@ -223,24 +213,22 @@ static struct bucket_table *rhashtable_last_table(struct rhashtable *ht,
return new_tbl;
}

-static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
+static int rhashtable_rehash_one(struct rhashtable *ht,
+ struct rhash_head __rcu **pprev,
+ unsigned int old_hash)
{
struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
struct bucket_table *new_tbl = rhashtable_last_table(ht, old_tbl);
- struct rhash_head __rcu **pprev = rht_bucket_var(old_tbl, old_hash);
int err = -EAGAIN;
struct rhash_head *head, *next, *entry;
- spinlock_t *new_bucket_lock;
unsigned int new_hash;

if (new_tbl->nest)
goto out;

err = -ENOENT;
- if (!pprev)
- goto out;

- rht_for_each_from(entry, *pprev, old_tbl, old_hash) {
+ rht_for_each_from(entry, rht_ptr(*pprev), old_tbl, old_hash) {
err = 0;
next = rht_dereference_bucket(entry->next, old_tbl, old_hash);

@@ -255,16 +243,19 @@ static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)

new_hash = head_hashfn(ht, new_tbl, entry);

- new_bucket_lock = rht_bucket_lock(new_tbl, new_hash);
+ rht_lock(&new_tbl->buckets[new_hash]);

- spin_lock_nested(new_bucket_lock, SINGLE_DEPTH_NESTING);
- head = rht_dereference_bucket(new_tbl->buckets[new_hash],
- new_tbl, new_hash);
+ head = rht_ptr(rht_dereference_bucket(new_tbl->buckets[new_hash],
+ new_tbl, new_hash));

RCU_INIT_POINTER(entry->next, head);

rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
- spin_unlock(new_bucket_lock);
+ rht_unlocked();
+
+ /* Need to preserved the bit lock. */
+ if (rht_is_locked(*pprev))
+ next = rht_ptr_locked(next);

rcu_assign_pointer(*pprev, next);

@@ -276,19 +267,19 @@ static int rhashtable_rehash_chain(struct rhashtable *ht,
unsigned int old_hash)
{
struct bucket_table *old_tbl = rht_dereference(ht->tbl, ht);
- spinlock_t *old_bucket_lock;
+ struct rhash_head __rcu **pprev = rht_bucket_var(old_tbl, old_hash);
int err;

- old_bucket_lock = rht_bucket_lock(old_tbl, old_hash);
+ if (!pprev)
+ return 0;
+ rht_lock(pprev);

- spin_lock_bh(old_bucket_lock);
- while (!(err = rhashtable_rehash_one(ht, old_hash)))
+ while (!(err = rhashtable_rehash_one(ht, pprev, old_hash)))
;

if (err == -ENOENT)
err = 0;
-
- spin_unlock_bh(old_bucket_lock);
+ rht_unlock(pprev);

return err;
}
@@ -481,6 +472,7 @@ static int rhashtable_insert_rehash(struct rhashtable *ht,
}

static void *rhashtable_lookup_one(struct rhashtable *ht,
+ struct rhash_head __rcu **pprev,
struct bucket_table *tbl, unsigned int hash,
const void *key, struct rhash_head *obj)
{
@@ -488,15 +480,12 @@ static void *rhashtable_lookup_one(struct rhashtable *ht,
.ht = ht,
.key = key,
};
- struct rhash_head __rcu **pprev;
+ struct rhash_head **lock = pprev;
struct rhash_head *head;
int elasticity;

elasticity = RHT_ELASTICITY;
- pprev = rht_bucket_var(tbl, hash);
- if (!pprev)
- return ERR_PTR(-ENOENT);
- rht_for_each_from(head, *pprev, tbl, hash) {
+ rht_for_each_from(head, rht_ptr(*pprev), tbl, hash) {
struct rhlist_head *list;
struct rhlist_head *plist;

@@ -518,6 +507,9 @@ static void *rhashtable_lookup_one(struct rhashtable *ht,
RCU_INIT_POINTER(list->next, plist);
head = rht_dereference_bucket(head->next, tbl, hash);
RCU_INIT_POINTER(list->rhead.next, head);
+ if (pprev == lock)
+ /* Need to preserve the bit lock */
+ obj = rht_ptr_locked(obj);
rcu_assign_pointer(*pprev, obj);

return NULL;
@@ -530,12 +522,12 @@ static void *rhashtable_lookup_one(struct rhashtable *ht,
}

static struct bucket_table *rhashtable_insert_one(struct rhashtable *ht,
+ struct rhash_head __rcu **pprev,
struct bucket_table *tbl,
unsigned int hash,
struct rhash_head *obj,
void *data)
{
- struct rhash_head __rcu **pprev;
struct bucket_table *new_tbl;
struct rhash_head *head;

@@ -558,11 +550,7 @@ static struct bucket_table *rhashtable_insert_one(struct rhashtable *ht,
if (unlikely(rht_grow_above_100(ht, tbl)))
return ERR_PTR(-EAGAIN);

- pprev = rht_bucket_insert(ht, tbl, hash);
- if (!pprev)
- return ERR_PTR(-ENOMEM);
-
- head = rht_dereference_bucket(*pprev, tbl, hash);
+ head = rht_ptr(rht_dereference_bucket(*pprev, tbl, hash));

RCU_INIT_POINTER(obj->next, head);
if (ht->rhlist) {
@@ -572,6 +560,10 @@ static struct bucket_table *rhashtable_insert_one(struct rhashtable *ht,
RCU_INIT_POINTER(list->next, NULL);
}

+ /* pprev is always the head of the list, so it holds
+ * the lock, which we need to preserve
+ */
+ obj = rht_ptr_locked(obj);
rcu_assign_pointer(*pprev, obj);

atomic_inc(&ht->nelems);
@@ -586,6 +578,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
{
struct bucket_table *new_tbl;
struct bucket_table *tbl;
+ struct rhash_head __rcu **pprev;
unsigned int hash;
void *data;

@@ -594,14 +587,25 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
do {
tbl = new_tbl;
hash = rht_head_hashfn(ht, tbl, obj, ht->p);
- spin_lock_bh(rht_bucket_lock(tbl, hash));
-
- data = rhashtable_lookup_one(ht, tbl, hash, key, obj);
- new_tbl = rhashtable_insert_one(ht, tbl, hash, obj, data);
- if (PTR_ERR(new_tbl) != -EEXIST)
- data = ERR_CAST(new_tbl);
-
- spin_unlock_bh(rht_bucket_lock(tbl, hash));
+ if (rcu_access_pointer(tbl->future_tbl))
+ /* Failure is OK */
+ pprev = rht_bucket_var(tbl, hash);
+ else
+ pprev = rht_bucket_insert(ht, tbl, hash);
+ if (pprev == NULL) {
+ new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+ data = ERR_PTR(-EAGAIN);
+ } else {
+ rht_lock(pprev);
+ data = rhashtable_lookup_one(ht, pprev, tbl,
+ hash, key, obj);
+ new_tbl = rhashtable_insert_one(ht, pprev, tbl,
+ hash, obj, data);
+ if (PTR_ERR(new_tbl) != -EEXIST)
+ data = ERR_CAST(new_tbl);
+
+ rht_unlock(pprev);
+ }
} while (!IS_ERR_OR_NULL(new_tbl));

if (PTR_ERR(data) == -EAGAIN)
@@ -1028,11 +1032,6 @@ int rhashtable_init(struct rhashtable *ht,

size = rounded_hashtable_size(&ht->p);

- if (params->locks_mul)
- ht->p.locks_mul = roundup_pow_of_two(params->locks_mul);
- else
- ht->p.locks_mul = BUCKET_LOCKS_PER_CPU;
-
ht->key_len = ht->p.key_len;
if (!params->hashfn) {
ht->p.hashfn = jhash;
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 00573cc46c98..b1c91f66d79c 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -33,7 +33,6 @@ static const struct rhashtable_params br_fdb_rht_params = {
.key_offset = offsetof(struct net_bridge_fdb_entry, key),
.key_len = sizeof(struct net_bridge_fdb_key),
.automatic_shrinking = true,
- .locks_mul = 1,
};

static struct kmem_cache *br_fdb_cache __read_mostly;
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index b257342c0860..2e162943a141 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -44,7 +44,6 @@ static const struct rhashtable_params br_mdb_rht_params = {
.key_offset = offsetof(struct net_bridge_mdb_entry, addr),
.key_len = sizeof(struct br_ip),
.automatic_shrinking = true,
- .locks_mul = 1,
};

static void br_multicast_start_querier(struct net_bridge *br,
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 96abf8feb9dc..0a02822b5667 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -21,7 +21,6 @@ static const struct rhashtable_params br_vlan_rht_params = {
.key_offset = offsetof(struct net_bridge_vlan, vid),
.key_len = sizeof(u16),
.nelem_hint = 3,
- .locks_mul = 1,
.max_size = VLAN_N_VID,
.obj_cmpfn = br_vlan_cmp,
.automatic_shrinking = true,
diff --git a/net/bridge/br_vlan_tunnel.c b/net/bridge/br_vlan_tunnel.c
index 6d2c4eed2dc8..758151863669 100644
--- a/net/bridge/br_vlan_tunnel.c
+++ b/net/bridge/br_vlan_tunnel.c
@@ -34,7 +34,6 @@ static const struct rhashtable_params br_vlan_tunnel_rht_params = {
.key_offset = offsetof(struct net_bridge_vlan, tinfo.tunnel_id),
.key_len = sizeof(__be64),
.nelem_hint = 3,
- .locks_mul = 1,
.obj_cmpfn = br_vlan_tunid_cmp,
.automatic_shrinking = true,
};
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 2c931120c494..9a3f13edc98e 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -373,7 +373,6 @@ static const struct rhashtable_params ipmr_rht_params = {
.key_offset = offsetof(struct mfc_cache, cmparg),
.key_len = sizeof(struct mfc_cache_cmp_arg),
.nelem_hint = 3,
- .locks_mul = 1,
.obj_cmpfn = ipmr_hash_cmp,
.automatic_shrinking = true,
};
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index e4dd57976737..4e69847ed5be 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -355,7 +355,6 @@ static const struct rhashtable_params ip6mr_rht_params = {
.key_offset = offsetof(struct mfc6_cache, cmparg),
.key_len = sizeof(struct mfc6_cache_cmp_arg),
.nelem_hint = 3,
- .locks_mul = 1,
.obj_cmpfn = ip6mr_hash_cmp,
.automatic_shrinking = true,
};
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 513f93118604..40253eb38176 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -53,7 +53,6 @@ static const struct rhashtable_params nft_chain_ht_params = {
.hashfn = nft_chain_hash,
.obj_hashfn = nft_chain_hash_obj,
.obj_cmpfn = nft_chain_hash_cmp,
- .locks_mul = 1,
.automatic_shrinking = true,
};




2019-03-25 05:08:12

by NeilBrown

Subject: [PATCH 4/4] rhashtable: add lockdep tracking to bucket bit-spin-locks.

Native bit_spin_locks are not tracked by lockdep.

The bit_spin_locks used for rhashtable buckets are local
to the rhashtable implementation, so there is little opportunity
for the sort of misuse that lockdep might detect.
However, locks are held while a hash function or compare
function is called, and if one of these took a lock,
a misbehaviour is possible.

As it is quite easy to add lockdep support, this unlikely
possibility seems to be enough justification.

So create a lockdep class for the bucket bit_spin_locks and attach it
through a lockdep_map in each bucket_table.

Without the 'nested' annotation in rhashtable_rehash_one(), lockdep
correctly reports a possible problem as this lock is taken
while another bucket lock (in another table) is held. This
confirms that the added support works.
With the correct nested annotation in place, lockdep reports
no problems.

Signed-off-by: NeilBrown <[email protected]>
---
include/linux/rhashtable.h | 40 +++++++++++++++++++++++++++-------------
lib/rhashtable.c | 15 +++++++++------
2 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 14bac26cc92c..e954d2ff5463 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -68,6 +68,8 @@ struct bucket_table {

struct bucket_table __rcu *future_tbl;

+ struct lockdep_map dep_map;
+
struct rhash_head __rcu *buckets[] ____cacheline_aligned_in_smp;
};

@@ -88,20 +90,32 @@ struct bucket_table {
* will have provided that.
*/

-static inline void rht_lock(struct rhash_head **bucket)
+static inline void rht_lock(struct bucket_table *tbl, struct rhash_head **bucket)
+{
+ local_bh_disable();
+ bit_spin_lock(1, (unsigned long *)bucket);
+ lock_map_acquire(&tbl->dep_map);
+}
+
+static inline void rht_lock_nested(struct bucket_table *tbl,
+ struct rhash_head **bucket,
+ unsigned int subclass)
{
local_bh_disable();
bit_spin_lock(1, (unsigned long *)bucket);
+ lock_acquire_exclusive(&tbl->dep_map, subclass, 0, NULL, _THIS_IP_);
}

-static inline void rht_unlock(struct rhash_head **bucket)
+static inline void rht_unlock(struct bucket_table *tbl, struct rhash_head **bucket)
{
+ lock_map_release(&tbl->dep_map);
bit_spin_unlock(1, (unsigned long *)bucket);
local_bh_enable();
}

-static inline void rht_unlocked(void)
+static inline void rht_unlocked(struct bucket_table *tbl)
{
+ lock_map_release(&tbl->dep_map);
preempt_enable();
__release(bitlock);
local_bh_enable();
@@ -654,11 +668,11 @@ static inline void *__rhashtable_insert_fast(
if (!pprev)
goto out;
lock = pprev;
- rht_lock(lock);
+ rht_lock(tbl, lock);

if (unlikely(rcu_access_pointer(tbl->future_tbl))) {
slow_path:
- rht_unlock(lock);
+ rht_unlock(tbl, lock);
rcu_read_unlock();
return rhashtable_insert_slow(ht, key, obj);
}
@@ -728,10 +742,10 @@ static inline void *__rhashtable_insert_fast(
/* Assigning to *headp unlocked the chain, so we
* mustn't do it again.
*/
- rht_unlocked();
+ rht_unlocked(tbl);
} else {
out_unlock:
- rht_unlock(lock);
+ rht_unlock(tbl, lock);
}
out:
rcu_read_unlock();
@@ -937,7 +951,7 @@ static inline int __rhashtable_remove_fast_one(
if (!pprev)
return -ENOENT;
lock = pprev;
- rht_lock(lock);
+ rht_lock(tbl, lock);

rht_for_each_from(he, rht_ptr(*pprev), tbl, hash) {
struct rhlist_head *list;
@@ -982,13 +996,13 @@ static inline int __rhashtable_remove_fast_one(
rcu_assign_pointer(*pprev, obj);
if (lock == pprev) {
/* That rcu_assign_pointer() unlocked the chain */
- rht_unlocked();
+ rht_unlocked(tbl);
goto unlocked;
}
break;
}

- rht_unlock(lock);
+ rht_unlock(tbl, lock);
unlocked:
if (err > 0) {
atomic_dec(&ht->nelems);
@@ -1096,7 +1110,7 @@ static inline int __rhashtable_replace_fast(
return -ENOENT;

lock = pprev;
- rht_lock(lock);
+ rht_lock(tbl, lock);

rht_for_each_from(he, rht_ptr(*pprev), tbl, hash) {
if (he != obj_old) {
@@ -1109,13 +1123,13 @@ static inline int __rhashtable_replace_fast(
err = 0;
if (pprev == lock) {
/* We just unlocked the chain by assigning to *pprev */
- rht_unlocked();
+ rht_unlocked(tbl);
goto unlocked;
}
break;
}

- rht_unlock(lock);
+ rht_unlock(tbl, lock);

unlocked:
return err;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index beb942aa1e50..db66562fc17c 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -173,6 +173,7 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
struct bucket_table *tbl = NULL;
size_t size;
int i;
+ static struct lock_class_key __key;

size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
tbl = kvzalloc(size, gfp);
@@ -187,6 +188,8 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
if (tbl == NULL)
return NULL;

+ lockdep_init_map(&tbl->dep_map, "rhashtable_bucket", &__key, 0);
+
tbl->size = size;

rcu_head_init(&tbl->rcu);
@@ -243,7 +246,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht,

new_hash = head_hashfn(ht, new_tbl, entry);

- rht_lock(&new_tbl->buckets[new_hash]);
+ rht_lock_nested(new_tbl, &new_tbl->buckets[new_hash], SINGLE_DEPTH_NESTING);

head = rht_ptr(rht_dereference_bucket(new_tbl->buckets[new_hash],
new_tbl, new_hash));
@@ -251,7 +254,7 @@ static int rhashtable_rehash_one(struct rhashtable *ht,
RCU_INIT_POINTER(entry->next, head);

rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
- rht_unlocked();
+ rht_unlocked(new_tbl);

/* Need to preserved the bit lock. */
if (rht_is_locked(*pprev))
@@ -272,14 +275,14 @@ static int rhashtable_rehash_chain(struct rhashtable *ht,

if (!pprev)
return 0;
- rht_lock(pprev);
+ rht_lock(old_tbl, pprev);

while (!(err = rhashtable_rehash_one(ht, pprev, old_hash)))
;

if (err == -ENOENT)
err = 0;
- rht_unlock(pprev);
+ rht_unlock(old_tbl, pprev);

return err;
}
@@ -596,7 +599,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
data = ERR_PTR(-EAGAIN);
} else {
- rht_lock(pprev);
+ rht_lock(tbl, pprev);
data = rhashtable_lookup_one(ht, pprev, tbl,
hash, key, obj);
new_tbl = rhashtable_insert_one(ht, pprev, tbl,
@@ -604,7 +607,7 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key,
if (PTR_ERR(new_tbl) != -EEXIST)
data = ERR_CAST(new_tbl);

- rht_unlock(pprev);
+ rht_unlock(tbl, pprev);
}
} while (!IS_ERR_OR_NULL(new_tbl));




2019-03-26 05:04:58

by Herbert Xu

Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.

On Mon, Mar 25, 2019 at 04:05:39PM +1100, NeilBrown wrote:
>
> + * Sometimes we unlock a bucket by writing a new pointer there. In that
> + * case we don't need to unlock, but we do need to reset state such as
> + * local_bh. For that we have rht_unlocked(). This doesn't include
> + * the memory barrier that bit_spin_unlock() provides, but rcu_assign_pointer()
> + * will have provided that.

Hmm, are you sure that's enough? IIRC rcu_assign_pointer only
provides a write barrier compared to the more complete (but one-way)
barrier that a spin-lock provides.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-03-26 05:28:20

by Herbert Xu

Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.

On Mon, Mar 25, 2019 at 04:05:39PM +1100, NeilBrown wrote:
>
> +/*
> + * If 'p' is a bucket head and might be locked:
> + * rht_ptr() returns the address without the lock bit.
> + * rht_ptr_locked() returns the address WITH the lock bit.
> + * rht_is_locked() tests if the lock bit is set.
> + */
> +static inline struct rhash_head __rcu *rht_ptr(const struct rhash_head *p)
> +{
> + return (void *)(((unsigned long)p) & ~2UL);
> +}

My preference would be to turn rhash_head * into an opaque type
that can only be dereferenced through rht_ptr.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-03-26 15:39:14

by Paul E. McKenney

Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.

On Tue, Mar 26, 2019 at 01:03:20PM +0800, Herbert Xu wrote:
> On Mon, Mar 25, 2019 at 04:05:39PM +1100, NeilBrown wrote:
> >
> > + * Sometimes we unlock a bucket by writing a new pointer there. In that
> > + * case we don't need to unlock, but we do need to reset state such as
> > + * local_bh. For that we have rht_unlocked(). This doesn't include
> > + * the memory barrier that bit_spin_unlock() provides, but rcu_assign_pointer()
> > + * will have provided that.
>
> Hmm, are you sure that's enough? IIRC rcu_assign_pointer only
> provides a write barrier compared to the more complete (but one-way)
> barrier that a spin-lock provides.

Not seeing the code, I have no opinion on the safety in this case,
but I did want to mention that rcu_assign_pointer() has been upgraded
to a release store, so that it orders all prior accesses from the
viewpoint of some other thread that just picked up the stored pointer
via rcu_dereference().

But you are quite right, rcu_assign_pointer() used to just do an
smp_wmb(). It is now new and improved! ;-)

Thanx, Paul


2019-03-27 01:06:12

by NeilBrown

Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.

On Tue, Mar 26 2019, Herbert Xu wrote:

> On Mon, Mar 25, 2019 at 04:05:39PM +1100, NeilBrown wrote:
>>
>> + * Sometimes we unlock a bucket by writing a new pointer there. In that
>> + * case we don't need to unlock, but we do need to reset state such as
>> + * local_bh. For that we have rht_unlocked(). This doesn't include
>> + * the memory barrier that bit_spin_unlock() provides, but rcu_assign_pointer()
>> + * will have provided that.
>
> Hmm, are you sure that's enough? IIRC rcu_assign_pointer only
> provides a write barrier compared to the more complete (but one-way)
> barrier that a spin-lock provides.
>

The bit_spin_unlock(), which I am avoiding as unnecessary, would have
provided release semantics.
i.e. any write by this CPU that happened before the releasing write
will be visible to other CPUs before (or when) they see the result of
the releasing write.
This is (as I understand it) exactly what rcu_assign_pointer() promises
- even before the upgrade to a release store that Paul just reported.

So yes, I am sure (surer now that I've walked through it carefully).

Thanks,
NeilBrown



2019-03-27 01:11:28

by NeilBrown

Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.

On Tue, Mar 26 2019, Herbert Xu wrote:

> On Mon, Mar 25, 2019 at 04:05:39PM +1100, NeilBrown wrote:
>>
>> +/*
>> + * If 'p' is a bucket head and might be locked:
>> + * rht_ptr() returns the address without the lock bit.
>> + * rht_ptr_locked() returns the address WITH the lock bit.
>> + * rht_is_locked() tests if the lock bit is set.
>> + */
>> +static inline struct rhash_head __rcu *rht_ptr(const struct rhash_head *p)
>> +{
>> + return (void *)(((unsigned long)p) & ~2UL);
>> +}
>
> My preference would be to turn rhash_head * into an opaque type
> that can only be dereferenced through rht_ptr.

I can try that.
Presumably the 'next' pointer in 'struct rhash_head' can still be a
normal pointer, but the buckets in the bucket_table would need to be
opaque pointers.
Maybe pointers to some new undefined struct, which gets cast and masked
as appropriate?

I'll have a go and see how it looks.
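
For concreteness, one possible shape (the struct tag and helper names
here are illustrative, not a settled interface):

/* Deliberately never defined, so a bucket head cannot be dereferenced
 * except via the helpers below.
 */
struct rhash_lock_head;

static inline struct rhash_head *rht_ptr(const struct rhash_lock_head *p)
{
	return (struct rhash_head *)((unsigned long)p & ~2UL);
}

static inline struct rhash_lock_head *rht_ptr_locked(const struct rhash_head *p)
{
	return (struct rhash_lock_head *)((unsigned long)p | 2UL);
}

The bucket array in bucket_table would then become
"struct rhash_lock_head __rcu *buckets[]", and any attempt to follow a
bucket head without going through rht_ptr() fails to compile.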

Thanks,
NeilBrown



2019-03-27 03:46:12

by Herbert Xu

Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.

On Tue, Mar 26, 2019 at 08:36:46AM -0700, Paul E. McKenney wrote:
>
> But you are quite right, rcu_assign_pointer() used to just do an
> smp_wmb(). It is now new and improved! ;-)

Thanks for confirming! I had forgotten about that upgrade :)
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-03-27 03:46:43

by Herbert Xu

Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.

On Wed, Mar 27, 2019 at 09:35:18AM +1100, NeilBrown wrote:
>
> The bit_spin_unlock(), which I am avoiding as unnecessary, would have
> provided release semantics.
> i.e. any write by this CPU that happened before the releasing write
> will be visible to other CPUs before (or when) they see the result of
> the releasing write.
> This is (as I understand it) exactly what rcu_assign_pointer() promises
> - even before the upgrade to a release store that Paul just reported.
>
> So yes, I am sure (surer now that I've walked through it carefully).

Given that rcu_assign_pointer now does a store release, it should
indeed be safe. Thanks!
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-03-27 15:07:12

by Paul E. McKenney

Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.

On Wed, Mar 27, 2019 at 09:35:18AM +1100, NeilBrown wrote:
> On Tue, Mar 26 2019, Herbert Xu wrote:
>
> > On Mon, Mar 25, 2019 at 04:05:39PM +1100, NeilBrown wrote:
> >>
> >> + * Sometimes we unlock a bucket by writing a new pointer there. In that
> >> + * case we don't need to unlock, but we do need to reset state such as
> >> + * local_bh. For that we have rht_unlocked(). This doesn't include
> >> + * the memory barrier that bit_spin_unlock() provides, but rcu_assign_pointer()
> >> + * will have provided that.
> >
> > Hmm, are you sure that's enough? IIRC rcu_assign_pointer only
> > provides a write barrier compared to the more complete (but one-way)
> > barrier that a spin-lock provides.
> >
>
> The bit_spin_unlock(), which I am avoiding as unnecessary, would have
> provided release semantics.
> i.e. any write by this CPU that happened before the releasing write
> will be visible to other CPUs before (or when) they see the result of
> the releasing write.
> This is (as I understand it) exactly what rcu_assign_pointer() promises
> - even before the upgrade to a release store that Paul just reported.
>
> So yes, I am sure (surer now that I've walked through it carefully).

But why not construct a litmus test and apply tools/memory-model? ;-)

Thanx, Paul
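
A minimal test along those lines, modelled on the existing
MP+onceassign+derefonce.litmus in tools/memory-model/litmus-tests/,
checks the property NeilBrown is relying on - that a write made before
rcu_assign_pointer() is visible to any reader that picks up the newly
assigned pointer (herd7 should report the "exists" clause as Never):

C MP+onceassign+derefonce

{
	y=z;
	z=0;
}

P0(int *x, int **y)
{
	WRITE_ONCE(*x, 1);
	rcu_assign_pointer(*y, x);
}

P1(int **y, int *x)
{
	int *r0;
	int r1;

	rcu_read_lock();
	r0 = rcu_dereference(*y);
	r1 = READ_ONCE(*r0);
	rcu_read_unlock();
}

exists (1:r0=x /\ 1:r1=0)

Modelling the full hand-off - where the "reader" is the next CPU to take
the bucket bit-lock with a test-and-set - would need a more elaborate
test, but the release ordering shown here is what rht_unlocked() depends on.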