2012-10-28 19:03:08

by Sasha Levin

Subject: [PATCH v7 01/16] hashtable: introduce a small and naive hashtable

This hashtable implementation uses hlist buckets to provide a simple
hashtable, so that it no longer has to be reimplemented all over the kernel.

Signed-off-by: Sasha Levin <[email protected]>
---

Sorry for the long delay, I was busy with a bunch of personal things.

Changes since v6:

- Use macros that point to internal static inline functions instead of
implementing everything as a macro.
- Rebase on latest -next.
- Resending the entire patch series on request.
- Break early from hash_empty() if found to be non-empty.
- DECLARE_HASHTABLE/DEFINE_HASHTABLE.
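
For anyone who wants to try the bucket scheme outside the kernel, here is a
minimal userspace sketch of the same pattern: a statically sized array of
list heads, a key hashed down to a bucket index, and a lookup that walks only
that bucket. It is not the header below. All names (struct hnode, struct
hhead, hash_u32, item_add, item_del, item_find) are hypothetical stand-ins,
and the multiplier in hash_u32() is an assumption rather than the kernel's
hash_32() constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for hlist_node / hlist_head (assumption). */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

#define TABLE_BITS 3
#define TABLE_SIZE (1u << TABLE_BITS)

/* Statically sized table; zero-initialised, so every bucket starts empty. */
static struct hhead table[TABLE_SIZE];

struct item {
	uint32_t key;
	int value;
	struct hnode node;	/* links the item into one bucket */
};

/* Multiplicative hash keeping the top `bits` bits (stand-in for hash_32()). */
static uint32_t hash_u32(uint32_t val, unsigned int bits)
{
	return (val * 0x9E3779B1u) >> (32 - bits);
}

/* Add at the head of the bucket selected by the key. */
static void item_add(struct item *it)
{
	struct hhead *h = &table[hash_u32(it->key, TABLE_BITS)];

	assert(!it->node.pprev);	/* must not already be hashed */
	it->node.next = h->first;
	if (h->first)
		h->first->pprev = &it->node.next;
	h->first = &it->node;
	it->node.pprev = &h->first;
}

/* Unlink and reinitialise the node; a no-op if the item is not hashed. */
static void item_del(struct item *it)
{
	if (!it->node.pprev)
		return;
	*it->node.pprev = it->node.next;
	if (it->node.next)
		it->node.next->pprev = it->node.pprev;
	it->node.next = NULL;
	it->node.pprev = NULL;
}

/* Walk only the bucket the key hashes to, then compare the full key. */
static struct item *item_find(uint32_t key)
{
	struct hnode *n;

	for (n = table[hash_u32(key, TABLE_BITS)].first; n; n = n->next) {
		struct item *it = (struct item *)
			((char *)n - offsetof(struct item, node));

		if (it->key == key)
			return it;
	}
	return NULL;
}
```

In terms of this patch, item_add() plays the role of hash_add(), item_find()
the role of a hash_for_each_possible() loop with a key comparison in its
body, and item_del() the role of hash_del(). Different keys can share a
bucket, which is why the lookup still has to compare the full key.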


include/linux/hashtable.h | 193 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 193 insertions(+)
create mode 100644 include/linux/hashtable.h

diff --git a/include/linux/hashtable.h b/include/linux/hashtable.h
new file mode 100644
index 0000000..1fb8c97
--- /dev/null
+++ b/include/linux/hashtable.h
@@ -0,0 +1,193 @@
+/*
+ * Statically sized hash table implementation
+ * (C) 2012 Sasha Levin <[email protected]>
+ */
+
+#ifndef _LINUX_HASHTABLE_H
+#define _LINUX_HASHTABLE_H
+
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/hash.h>
+#include <linux/rculist.h>
+
+#define DEFINE_HASHTABLE(name, bits) \
+ struct hlist_head name[1 << (bits)] = \
+ { [0 ... ((1 << (bits)) - 1)] = HLIST_HEAD_INIT }
+
+#define DECLARE_HASHTABLE(name, bits) \
+ struct hlist_head name[1 << (bits)]
+
+#define HASH_SIZE(name) (ARRAY_SIZE(name))
+#define HASH_BITS(name) ilog2(HASH_SIZE(name))
+
+/* Use hash_32 when possible to allow for fast 32bit hashing in 64bit kernels. */
+#define hash_min(val, bits) \
+({ \
+ sizeof(val) <= 4 ? \
+ hash_32(val, bits) : \
+ hash_long(val, bits); \
+})
+
+static inline void __hash_init(struct hlist_head *ht, int sz)
+{
+ int i;
+
+ for (i = 0; i < sz; i++)
+ INIT_HLIST_HEAD(&ht[i]);
+}
+
+/**
+ * hash_init - initialize a hash table
+ * @hashtable: hashtable to be initialized
+ *
+ * Calculates the size of the hashtable from the given parameter and
+ * initializes every bucket to an empty list.
+ *
+ * This has to be a macro since HASH_SIZE() will not work on pointers: it
+ * derives the size from the array type at compile time.
+ */
+#define hash_init(hashtable) __hash_init(hashtable, HASH_SIZE(hashtable))
+
+/**
+ * hash_add - add an object to a hashtable
+ * @hashtable: hashtable to add to
+ * @node: the &struct hlist_node of the object to be added
+ * @key: the key of the object to be added
+ */
+#define hash_add(hashtable, node, key) \
+ hlist_add_head(node, &hashtable[hash_min(key, HASH_BITS(hashtable))])
+
+/**
+ * hash_add_rcu - add an object to a rcu enabled hashtable
+ * @hashtable: hashtable to add to
+ * @node: the &struct hlist_node of the object to be added
+ * @key: the key of the object to be added
+ */
+#define hash_add_rcu(hashtable, node, key) \
+ hlist_add_head_rcu(node, &hashtable[hash_min(key, HASH_BITS(hashtable))])
+
+/**
+ * hash_hashed - check whether an object is in any hashtable
+ * @node: the &struct hlist_node of the object to be checked
+ */
+#define hash_hashed(node) (!hlist_unhashed(node))
+
+static inline bool __hash_empty(struct hlist_head *ht, int sz)
+{
+ int i;
+
+ for (i = 0; i < sz; i++)
+ if (!hlist_empty(&ht[i]))
+ return false;
+
+ return true;
+}
+
+/**
+ * hash_empty - check whether a hashtable is empty
+ * @hashtable: hashtable to check
+ *
+ * This has to be a macro since HASH_SIZE() will not work on pointers: it
+ * derives the size from the array type at compile time.
+ */
+#define hash_empty(hashtable) __hash_empty(hashtable, HASH_SIZE(hashtable))
+
+/**
+ * hash_del - remove an object from a hashtable
+ * @node: &struct hlist_node of the object to remove
+ */
+static inline void hash_del(struct hlist_node *node)
+{
+ hlist_del_init(node);
+}
+
+/**
+ * hash_del_rcu - remove an object from a rcu enabled hashtable
+ * @node: &struct hlist_node of the object to remove
+ */
+static inline void hash_del_rcu(struct hlist_node *node)
+{
+ hlist_del_init_rcu(node);
+}
+
+/**
+ * hash_for_each - iterate over a hashtable
+ * @name: hashtable to iterate
+ * @bkt: integer to use as bucket loop cursor
+ * @node: the &struct hlist_node to use as a loop cursor for each entry
+ * @obj: the type * to use as a loop cursor for each entry
+ * @member: the name of the hlist_node within the struct
+ */
+#define hash_for_each(name, bkt, node, obj, member) \
+ for (bkt = 0, node = NULL; node == NULL && bkt < HASH_SIZE(name); bkt++)\
+ hlist_for_each_entry(obj, node, &name[bkt], member)
+
+/**
+ * hash_for_each_rcu - iterate over a rcu enabled hashtable
+ * @name: hashtable to iterate
+ * @bkt: integer to use as bucket loop cursor
+ * @node: the &struct hlist_node to use as a loop cursor for each entry
+ * @obj: the type * to use as a loop cursor for each entry
+ * @member: the name of the hlist_node within the struct
+ */
+#define hash_for_each_rcu(name, bkt, node, obj, member) \
+ for (bkt = 0, node = NULL; node == NULL && bkt < HASH_SIZE(name); bkt++)\
+ hlist_for_each_entry_rcu(obj, node, &name[bkt], member)
+
+/**
+ * hash_for_each_safe - iterate over a hashtable safe against removal of
+ * hash entry
+ * @name: hashtable to iterate
+ * @bkt: integer to use as bucket loop cursor
+ * @node: the &struct hlist_node to use as a loop cursor for each entry
+ * @tmp: a &struct hlist_node used for temporary storage
+ * @obj: the type * to use as a loop cursor for each entry
+ * @member: the name of the hlist_node within the struct
+ */
+#define hash_for_each_safe(name, bkt, node, tmp, obj, member) \
+ for (bkt = 0, node = NULL; node == NULL && bkt < HASH_SIZE(name); bkt++)\
+ hlist_for_each_entry_safe(obj, node, tmp, &name[bkt], member)
+
+/**
+ * hash_for_each_possible - iterate over all possible objects hashing to the
+ * same bucket
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each entry
+ * @node: the &struct hlist_node to use as a loop cursor for each entry
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ */
+#define hash_for_each_possible(name, obj, node, member, key) \
+ hlist_for_each_entry(obj, node, &name[hash_min(key, HASH_BITS(name))], member)
+
+/**
+ * hash_for_each_possible_rcu - iterate over all possible objects hashing to
+ * the same bucket in an rcu enabled
+ * hashtable
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each entry
+ * @node: the &struct hlist_node to use as a loop cursor for each entry
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ */
+#define hash_for_each_possible_rcu(name, obj, node, member, key) \
+ hlist_for_each_entry_rcu(obj, node, &name[hash_min(key, HASH_BITS(name))], member)
+
+/**
+ * hash_for_each_possible_safe - iterate over all possible objects hashing to the
+ * same bucket safe against removals
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each entry
+ * @node: the &struct hlist_node to use as a loop cursor for each entry
+ * @tmp: a &struct hlist_node used for temporary storage
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ */
+#define hash_for_each_possible_safe(name, obj, node, tmp, member, key) \
+ hlist_for_each_entry_safe(obj, node, tmp, \
+ &name[hash_min(key, HASH_BITS(name))], member)
+
+
+#endif
--
1.7.12.4


2012-10-28 19:03:27

by Sasha Levin

Subject: [PATCH v7 05/16] mm/huge_memory: use new hashtable implementation

Switch hugemem to use the new hashtable implementation. This reduces the amount of
generic unrelated code in the hugemem.

This also removes the dynamic allocation of the hash table. The size of the table is
constant so there's no point in paying the price of an extra dereference when accessing
it.

Signed-off-by: Sasha Levin <[email protected]>
---
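
As a rough illustration of the static-versus-dynamic point above, the
following userspace sketch contrasts a fixed-size array with a heap-allocated
table reached through a pointer. With the array, the bucket count can be
recovered from the object itself via sizeof; with the pointer, sizeof only
reports the size of the pointer, and every access goes through one extra
load. The names (struct bucket, static_table, dynamic_table, SLOT_BITS and
the helper functions) are hypothetical and simplified, not taken from
mm/huge_memory.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for struct hlist_head (assumption). */
struct bucket { void *first; };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define SLOT_BITS 10

/* Static table: the size is part of the array type. */
static struct bucket static_table[1 << SLOT_BITS];

/* Dynamic table: only a pointer is visible, the size must be tracked apart. */
static struct bucket *dynamic_table;

/* The bucket count falls out of the array type at compile time. */
static size_t static_bucket_count(void)
{
	return ARRAY_SIZE(static_table);
}

/* Direct indexing into the array, no pointer load needed first. */
static struct bucket *static_bucket(size_t idx)
{
	assert(idx < ARRAY_SIZE(static_table));
	return &static_table[idx];
}

/* Allocation can fail, so callers need an error path. */
static int dynamic_table_alloc(void)
{
	dynamic_table = calloc((size_t)1 << SLOT_BITS, sizeof(*dynamic_table));
	return dynamic_table ? 0 : -1;
}

/* sizeof on the pointer says nothing about the number of buckets. */
static size_t dynamic_sizeof(void)
{
	return sizeof(dynamic_table);
}
```

This is also why size-deriving macros such as HASH_SIZE() can only be applied
to the array itself and not to a pointer to its first element.
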
mm/huge_memory.c | 55 ++++++++++++++-----------------------------------------
1 file changed, 14 insertions(+), 41 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3c14a96..38ce8e9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -19,6 +19,7 @@
#include <linux/mman.h>
#include <linux/pagemap.h>
#include <linux/migrate.h>
+#include <linux/hashtable.h>
#include <asm/tlb.h>
#include <asm/pgalloc.h>
#include "internal.h"
@@ -59,12 +60,12 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;

static int khugepaged(void *none);
-static int mm_slots_hash_init(void);
static int khugepaged_slab_init(void);
static void khugepaged_slab_free(void);

-#define MM_SLOTS_HASH_HEADS 1024
-static struct hlist_head *mm_slots_hash __read_mostly;
+#define MM_SLOTS_HASH_BITS 10
+static DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
+
static struct kmem_cache *mm_slot_cache __read_mostly;

/**
@@ -545,12 +546,6 @@ static int __init hugepage_init(void)
if (err)
goto out;

- err = mm_slots_hash_init();
- if (err) {
- khugepaged_slab_free();
- goto out;
- }
-
/*
* By default disable transparent hugepages on smaller systems,
* where the extra memory used could hurt more than TLB overhead
@@ -1673,6 +1668,8 @@ static int __init khugepaged_slab_init(void)
if (!mm_slot_cache)
return -ENOMEM;

+ hash_init(mm_slots_hash);
+
return 0;
}

@@ -1694,47 +1691,23 @@ static inline void free_mm_slot(struct mm_slot *mm_slot)
kmem_cache_free(mm_slot_cache, mm_slot);
}

-static int __init mm_slots_hash_init(void)
-{
- mm_slots_hash = kzalloc(MM_SLOTS_HASH_HEADS * sizeof(struct hlist_head),
- GFP_KERNEL);
- if (!mm_slots_hash)
- return -ENOMEM;
- return 0;
-}
-
-#if 0
-static void __init mm_slots_hash_free(void)
-{
- kfree(mm_slots_hash);
- mm_slots_hash = NULL;
-}
-#endif
-
static struct mm_slot *get_mm_slot(struct mm_struct *mm)
{
- struct mm_slot *mm_slot;
- struct hlist_head *bucket;
+ struct mm_slot *slot;
struct hlist_node *node;

- bucket = &mm_slots_hash[((unsigned long)mm / sizeof(struct mm_struct))
- % MM_SLOTS_HASH_HEADS];
- hlist_for_each_entry(mm_slot, node, bucket, hash) {
- if (mm == mm_slot->mm)
- return mm_slot;
- }
+ hash_for_each_possible(mm_slots_hash, slot, node, hash, (unsigned long) mm)
+ if (slot->mm == mm)
+ return slot;
+
return NULL;
}

static void insert_to_mm_slots_hash(struct mm_struct *mm,
struct mm_slot *mm_slot)
{
- struct hlist_head *bucket;
-
- bucket = &mm_slots_hash[((unsigned long)mm / sizeof(struct mm_struct))
- % MM_SLOTS_HASH_HEADS];
mm_slot->mm = mm;
- hlist_add_head(&mm_slot->hash, bucket);
+ hash_add(mm_slots_hash, &mm_slot->hash, (unsigned long)mm);
}

static inline int khugepaged_test_exit(struct mm_struct *mm)
@@ -1803,7 +1776,7 @@ void __khugepaged_exit(struct mm_struct *mm)
spin_lock(&khugepaged_mm_lock);
mm_slot = get_mm_slot(mm);
if (mm_slot && khugepaged_scan.mm_slot != mm_slot) {
- hlist_del(&mm_slot->hash);
+ hash_del(&mm_slot->hash);
list_del(&mm_slot->mm_node);
free = 1;
}
@@ -2252,7 +2225,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)

if (khugepaged_test_exit(mm)) {
/* free mm_slot */
- hlist_del(&mm_slot->hash);
+ hash_del(&mm_slot->hash);
list_del(&mm_slot->mm_node);

/*
--
1.7.12.4

2012-10-28 19:03:21

by Sasha Levin

Subject: [PATCH v7 04/16] workqueue: use new hashtable implementation

Switch workqueues to use the new hashtable implementation. This reduces the amount of
generic unrelated code in the workqueues.

Signed-off-by: Sasha Levin <[email protected]>
---
kernel/workqueue.c | 86 ++++++++++--------------------------------------------
1 file changed, 15 insertions(+), 71 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a1135c6..8f6e1bf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -41,6 +41,7 @@
#include <linux/debug_locks.h>
#include <linux/lockdep.h>
#include <linux/idr.h>
+#include <linux/hashtable.h>

#include "workqueue_sched.h"

@@ -82,8 +83,6 @@ enum {
NR_WORKER_POOLS = 2, /* # worker pools per gcwq */

BUSY_WORKER_HASH_ORDER = 6, /* 64 pointers */
- BUSY_WORKER_HASH_SIZE = 1 << BUSY_WORKER_HASH_ORDER,
- BUSY_WORKER_HASH_MASK = BUSY_WORKER_HASH_SIZE - 1,

MAX_IDLE_WORKERS_RATIO = 4, /* 1/4 of busy can be idle */
IDLE_WORKER_TIMEOUT = 300 * HZ, /* keep idle ones for 5 mins */
@@ -180,7 +179,7 @@ struct global_cwq {
unsigned int flags; /* L: GCWQ_* flags */

/* workers are chained either in busy_hash or pool idle_list */
- struct hlist_head busy_hash[BUSY_WORKER_HASH_SIZE];
+ DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
/* L: hash of busy workers */

struct worker_pool pools[NR_WORKER_POOLS];
@@ -285,8 +284,7 @@ EXPORT_SYMBOL_GPL(system_freezable_wq);
(pool) < &(gcwq)->pools[NR_WORKER_POOLS]; (pool)++)

#define for_each_busy_worker(worker, i, pos, gcwq) \
- for (i = 0; i < BUSY_WORKER_HASH_SIZE; i++) \
- hlist_for_each_entry(worker, pos, &gcwq->busy_hash[i], hentry)
+ hash_for_each(gcwq->busy_hash, i, pos, worker, hentry)

static inline int __next_gcwq_cpu(int cpu, const struct cpumask *mask,
unsigned int sw)
@@ -857,63 +855,6 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
}

/**
- * busy_worker_head - return the busy hash head for a work
- * @gcwq: gcwq of interest
- * @work: work to be hashed
- *
- * Return hash head of @gcwq for @work.
- *
- * CONTEXT:
- * spin_lock_irq(gcwq->lock).
- *
- * RETURNS:
- * Pointer to the hash head.
- */
-static struct hlist_head *busy_worker_head(struct global_cwq *gcwq,
- struct work_struct *work)
-{
- const int base_shift = ilog2(sizeof(struct work_struct));
- unsigned long v = (unsigned long)work;
-
- /* simple shift and fold hash, do we need something better? */
- v >>= base_shift;
- v += v >> BUSY_WORKER_HASH_ORDER;
- v &= BUSY_WORKER_HASH_MASK;
-
- return &gcwq->busy_hash[v];
-}
-
-/**
- * __find_worker_executing_work - find worker which is executing a work
- * @gcwq: gcwq of interest
- * @bwh: hash head as returned by busy_worker_head()
- * @work: work to find worker for
- *
- * Find a worker which is executing @work on @gcwq. @bwh should be
- * the hash head obtained by calling busy_worker_head() with the same
- * work.
- *
- * CONTEXT:
- * spin_lock_irq(gcwq->lock).
- *
- * RETURNS:
- * Pointer to worker which is executing @work if found, NULL
- * otherwise.
- */
-static struct worker *__find_worker_executing_work(struct global_cwq *gcwq,
- struct hlist_head *bwh,
- struct work_struct *work)
-{
- struct worker *worker;
- struct hlist_node *tmp;
-
- hlist_for_each_entry(worker, tmp, bwh, hentry)
- if (worker->current_work == work)
- return worker;
- return NULL;
-}
-
-/**
* find_worker_executing_work - find worker which is executing a work
* @gcwq: gcwq of interest
* @work: work to find worker for
@@ -932,8 +873,14 @@ static struct worker *__find_worker_executing_work(struct global_cwq *gcwq,
static struct worker *find_worker_executing_work(struct global_cwq *gcwq,
struct work_struct *work)
{
- return __find_worker_executing_work(gcwq, busy_worker_head(gcwq, work),
- work);
+ struct worker *worker;
+ struct hlist_node *tmp;
+
+ hash_for_each_possible(gcwq->busy_hash, worker, tmp, hentry, (unsigned long)work)
+ if (worker->current_work == work)
+ return worker;
+
+ return NULL;
}

/**
@@ -2160,7 +2107,6 @@ __acquires(&gcwq->lock)
struct cpu_workqueue_struct *cwq = get_work_cwq(work);
struct worker_pool *pool = worker->pool;
struct global_cwq *gcwq = pool->gcwq;
- struct hlist_head *bwh = busy_worker_head(gcwq, work);
bool cpu_intensive = cwq->wq->flags & WQ_CPU_INTENSIVE;
work_func_t f = work->func;
int work_color;
@@ -2192,7 +2138,7 @@ __acquires(&gcwq->lock)
* already processing the work. If so, defer the work to the
* currently executing one.
*/
- collision = __find_worker_executing_work(gcwq, bwh, work);
+ collision = find_worker_executing_work(gcwq, work);
if (unlikely(collision)) {
move_linked_works(work, &collision->scheduled, NULL);
return;
@@ -2200,7 +2146,7 @@ __acquires(&gcwq->lock)

/* claim and dequeue */
debug_work_deactivate(work);
- hlist_add_head(&worker->hentry, bwh);
+ hash_add(gcwq->busy_hash, &worker->hentry, (unsigned long)work);
worker->current_work = work;
worker->current_cwq = cwq;
work_color = get_work_color(work);
@@ -2258,7 +2204,7 @@ __acquires(&gcwq->lock)
worker_clr_flags(worker, WORKER_CPU_INTENSIVE);

/* we're done with it, release */
- hlist_del_init(&worker->hentry);
+ hash_del(&worker->hentry);
worker->current_work = NULL;
worker->current_cwq = NULL;
cwq_dec_nr_in_flight(cwq, work_color);
@@ -3823,7 +3769,6 @@ out_unlock:
static int __init init_workqueues(void)
{
unsigned int cpu;
- int i;

/* make sure we have enough bits for OFFQ CPU number */
BUILD_BUG_ON((1LU << (BITS_PER_LONG - WORK_OFFQ_CPU_SHIFT)) <
@@ -3841,8 +3786,7 @@ static int __init init_workqueues(void)
gcwq->cpu = cpu;
gcwq->flags |= GCWQ_DISASSOCIATED;

- for (i = 0; i < BUSY_WORKER_HASH_SIZE; i++)
- INIT_HLIST_HEAD(&gcwq->busy_hash[i]);
+ hash_init(gcwq->busy_hash);

for_each_worker_pool(pool, gcwq) {
pool->gcwq = gcwq;
--
1.7.12.4

2012-10-28 19:03:34

by Sasha Levin

Subject: [PATCH v7 07/16] net,9p: use new hashtable implementation

Switch 9p error table to use the new hashtable implementation. This reduces the amount of
generic unrelated code in 9p.

Signed-off-by: Sasha Levin <[email protected]>
---
net/9p/error.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/net/9p/error.c b/net/9p/error.c
index 2ab2de7..a5cc7dd 100644
--- a/net/9p/error.c
+++ b/net/9p/error.c
@@ -34,7 +34,7 @@
#include <linux/jhash.h>
#include <linux/errno.h>
#include <net/9p/9p.h>
-
+#include <linux/hashtable.h>
/**
* struct errormap - map string errors from Plan 9 to Linux numeric ids
* @name: string sent over 9P
@@ -50,8 +50,8 @@ struct errormap {
struct hlist_node list;
};

-#define ERRHASHSZ 32
-static struct hlist_head hash_errmap[ERRHASHSZ];
+#define ERR_HASH_BITS 5
+static DEFINE_HASHTABLE(hash_errmap, ERR_HASH_BITS);

/* FixMe - reduce to a reasonable size */
static struct errormap errmap[] = {
@@ -193,18 +193,17 @@ static struct errormap errmap[] = {
int p9_error_init(void)
{
struct errormap *c;
- int bucket;
+ u32 hash;

/* initialize hash table */
- for (bucket = 0; bucket < ERRHASHSZ; bucket++)
- INIT_HLIST_HEAD(&hash_errmap[bucket]);
+ hash_init(hash_errmap);

/* load initial error map into hash table */
for (c = errmap; c->name != NULL; c++) {
c->namelen = strlen(c->name);
- bucket = jhash(c->name, c->namelen, 0) % ERRHASHSZ;
+ hash = jhash(c->name, c->namelen, 0);
INIT_HLIST_NODE(&c->list);
- hlist_add_head(&c->list, &hash_errmap[bucket]);
+ hash_add(hash_errmap, &c->list, hash);
}

return 1;
@@ -223,13 +222,13 @@ int p9_errstr2errno(char *errstr, int len)
int errno;
struct hlist_node *p;
struct errormap *c;
- int bucket;
+ u32 hash;

errno = 0;
p = NULL;
c = NULL;
- bucket = jhash(errstr, len, 0) % ERRHASHSZ;
- hlist_for_each_entry(c, p, &hash_errmap[bucket], list) {
+ hash = jhash(errstr, len, 0);
+ hash_for_each_possible(hash_errmap, c, p, list, hash) {
if (c->namelen == len && !memcmp(c->name, errstr, len)) {
errno = c->val;
break;
--
1.7.12.4

2012-10-28 19:03:45

by Sasha Levin

Subject: [PATCH v7 09/16] SUNRPC/cache: use new hashtable implementation

Switch cache to use the new hashtable implementation. This reduces the amount of
generic unrelated code in the cache implementation.

Signed-off-by: Sasha Levin <[email protected]>
---
net/sunrpc/cache.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index fc2f7aa..0490546 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -28,6 +28,7 @@
#include <linux/workqueue.h>
#include <linux/mutex.h>
#include <linux/pagemap.h>
+#include <linux/hashtable.h>
#include <asm/ioctls.h>
#include <linux/sunrpc/types.h>
#include <linux/sunrpc/cache.h>
@@ -524,19 +525,18 @@ EXPORT_SYMBOL_GPL(cache_purge);
* it to be revisited when cache info is available
*/

-#define DFR_HASHSIZE (PAGE_SIZE/sizeof(struct list_head))
-#define DFR_HASH(item) ((((long)item)>>4 ^ (((long)item)>>13)) % DFR_HASHSIZE)
+#define DFR_HASH_BITS 9

#define DFR_MAX 300 /* ??? */

static DEFINE_SPINLOCK(cache_defer_lock);
static LIST_HEAD(cache_defer_list);
-static struct hlist_head cache_defer_hash[DFR_HASHSIZE];
+static DEFINE_HASHTABLE(cache_defer_hash, DFR_HASH_BITS);
static int cache_defer_cnt;

static void __unhash_deferred_req(struct cache_deferred_req *dreq)
{
- hlist_del_init(&dreq->hash);
+ hash_del(&dreq->hash);
if (!list_empty(&dreq->recent)) {
list_del_init(&dreq->recent);
cache_defer_cnt--;
@@ -545,10 +545,7 @@ static void __unhash_deferred_req(struct cache_deferred_req *dreq)

static void __hash_deferred_req(struct cache_deferred_req *dreq, struct cache_head *item)
{
- int hash = DFR_HASH(item);
-
- INIT_LIST_HEAD(&dreq->recent);
- hlist_add_head(&dreq->hash, &cache_defer_hash[hash]);
+ hash_add(cache_defer_hash, &dreq->hash, (unsigned long)item);
}

static void setup_deferral(struct cache_deferred_req *dreq,
@@ -600,7 +597,7 @@ static void cache_wait_req(struct cache_req *req, struct cache_head *item)
* to clean up
*/
spin_lock(&cache_defer_lock);
- if (!hlist_unhashed(&sleeper.handle.hash)) {
+ if (hash_hashed(&sleeper.handle.hash)) {
__unhash_deferred_req(&sleeper.handle);
spin_unlock(&cache_defer_lock);
} else {
@@ -671,12 +668,11 @@ static void cache_revisit_request(struct cache_head *item)
struct cache_deferred_req *dreq;
struct list_head pending;
struct hlist_node *lp, *tmp;
- int hash = DFR_HASH(item);

INIT_LIST_HEAD(&pending);
spin_lock(&cache_defer_lock);

- hlist_for_each_entry_safe(dreq, lp, tmp, &cache_defer_hash[hash], hash)
+ hash_for_each_possible_safe(cache_defer_hash, dreq, lp, tmp, hash, (unsigned long)item)
if (dreq->item == item) {
__unhash_deferred_req(dreq);
list_add(&dreq->recent, &pending);
@@ -1636,6 +1632,8 @@ static int create_cache_proc_entries(struct cache_detail *cd, struct net *net)
void __init cache_initialize(void)
{
INIT_DEFERRABLE_WORK(&cache_cleaner, do_cache_clean);
+
+ hash_init(cache_defer_hash);
}

int cache_register_net(struct cache_detail *cd, struct net *net)
--
1.7.12.4

2012-10-28 19:03:51

by Sasha Levin

Subject: [PATCH v7 11/16] net,l2tp: use new hashtable implementation

Switch l2tp to use the new hashtable implementation. This reduces the amount of
generic unrelated code in l2tp.

Signed-off-by: Sasha Levin <[email protected]>
---
net/l2tp/l2tp_core.c | 134 ++++++++++++++++++------------------------------
net/l2tp/l2tp_core.h | 8 +--
net/l2tp/l2tp_debugfs.c | 19 +++----
3 files changed, 61 insertions(+), 100 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 1a9f372..77029b0 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -44,6 +44,7 @@
#include <linux/udp.h>
#include <linux/l2tp.h>
#include <linux/hash.h>
+#include <linux/hashtable.h>
#include <linux/sort.h>
#include <linux/file.h>
#include <linux/nsproxy.h>
@@ -107,8 +108,8 @@ static unsigned int l2tp_net_id;
struct l2tp_net {
struct list_head l2tp_tunnel_list;
spinlock_t l2tp_tunnel_list_lock;
- struct hlist_head l2tp_session_hlist[L2TP_HASH_SIZE_2];
- spinlock_t l2tp_session_hlist_lock;
+ DECLARE_HASHTABLE(l2tp_session_hash, L2TP_HASH_BITS_2);
+ spinlock_t l2tp_session_hash_lock;
};

static void l2tp_session_set_header_len(struct l2tp_session *session, int version);
@@ -156,30 +157,17 @@ do { \
#define l2tp_tunnel_dec_refcount(t) l2tp_tunnel_dec_refcount_1(t)
#endif

-/* Session hash global list for L2TPv3.
- * The session_id SHOULD be random according to RFC3931, but several
- * L2TP implementations use incrementing session_ids. So we do a real
- * hash on the session_id, rather than a simple bitmask.
- */
-static inline struct hlist_head *
-l2tp_session_id_hash_2(struct l2tp_net *pn, u32 session_id)
-{
- return &pn->l2tp_session_hlist[hash_32(session_id, L2TP_HASH_BITS_2)];
-
-}
-
/* Lookup a session by id in the global session list
*/
static struct l2tp_session *l2tp_session_find_2(struct net *net, u32 session_id)
{
struct l2tp_net *pn = l2tp_pernet(net);
- struct hlist_head *session_list =
- l2tp_session_id_hash_2(pn, session_id);
struct l2tp_session *session;
struct hlist_node *walk;

rcu_read_lock_bh();
- hlist_for_each_entry_rcu(session, walk, session_list, global_hlist) {
+ hash_for_each_possible_rcu(pn->l2tp_session_hash, session, walk,
+ global_hlist, session_id) {
if (session->session_id == session_id) {
rcu_read_unlock_bh();
return session;
@@ -190,23 +178,10 @@ static struct l2tp_session *l2tp_session_find_2(struct net *net, u32 session_id)
return NULL;
}

-/* Session hash list.
- * The session_id SHOULD be random according to RFC2661, but several
- * L2TP implementations (Cisco and Microsoft) use incrementing
- * session_ids. So we do a real hash on the session_id, rather than a
- * simple bitmask.
- */
-static inline struct hlist_head *
-l2tp_session_id_hash(struct l2tp_tunnel *tunnel, u32 session_id)
-{
- return &tunnel->session_hlist[hash_32(session_id, L2TP_HASH_BITS)];
-}
-
/* Lookup a session by id
*/
struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunnel, u32 session_id)
{
- struct hlist_head *session_list;
struct l2tp_session *session;
struct hlist_node *walk;

@@ -217,15 +192,14 @@ struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunn
if (tunnel == NULL)
return l2tp_session_find_2(net, session_id);

- session_list = l2tp_session_id_hash(tunnel, session_id);
- read_lock_bh(&tunnel->hlist_lock);
- hlist_for_each_entry(session, walk, session_list, hlist) {
+ read_lock_bh(&tunnel->hash_lock);
+ hash_for_each_possible(tunnel->session_hash, session, walk, hlist, session_id) {
if (session->session_id == session_id) {
- read_unlock_bh(&tunnel->hlist_lock);
+ read_unlock_bh(&tunnel->hash_lock);
return session;
}
}
- read_unlock_bh(&tunnel->hlist_lock);
+ read_unlock_bh(&tunnel->hash_lock);

return NULL;
}
@@ -238,17 +212,15 @@ struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth)
struct l2tp_session *session;
int count = 0;

- read_lock_bh(&tunnel->hlist_lock);
- for (hash = 0; hash < L2TP_HASH_SIZE; hash++) {
- hlist_for_each_entry(session, walk, &tunnel->session_hlist[hash], hlist) {
- if (++count > nth) {
- read_unlock_bh(&tunnel->hlist_lock);
- return session;
- }
+ read_lock_bh(&tunnel->hash_lock);
+ hash_for_each(tunnel->session_hash, hash, walk, session, hlist) {
+ if (++count > nth) {
+ read_unlock_bh(&tunnel->hash_lock);
+ return session;
}
}

- read_unlock_bh(&tunnel->hlist_lock);
+ read_unlock_bh(&tunnel->hash_lock);

return NULL;
}
@@ -265,12 +237,10 @@ struct l2tp_session *l2tp_session_find_by_ifname(struct net *net, char *ifname)
struct l2tp_session *session;

rcu_read_lock_bh();
- for (hash = 0; hash < L2TP_HASH_SIZE_2; hash++) {
- hlist_for_each_entry_rcu(session, walk, &pn->l2tp_session_hlist[hash], global_hlist) {
- if (!strcmp(session->ifname, ifname)) {
- rcu_read_unlock_bh();
- return session;
- }
+ hash_for_each_rcu(pn->l2tp_session_hash, hash, walk, session, global_hlist) {
+ if (!strcmp(session->ifname, ifname)) {
+ rcu_read_unlock_bh();
+ return session;
}
}

@@ -1272,7 +1242,7 @@ end:
*/
static void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
{
- int hash;
+ int hash, found = 0;
struct hlist_node *walk;
struct hlist_node *tmp;
struct l2tp_session *session;
@@ -1282,16 +1252,14 @@ static void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
l2tp_info(tunnel, L2TP_MSG_CONTROL, "%s: closing all sessions...\n",
tunnel->name);

- write_lock_bh(&tunnel->hlist_lock);
- for (hash = 0; hash < L2TP_HASH_SIZE; hash++) {
-again:
- hlist_for_each_safe(walk, tmp, &tunnel->session_hlist[hash]) {
- session = hlist_entry(walk, struct l2tp_session, hlist);
-
+ write_lock_bh(&tunnel->hash_lock);
+ do {
+ found = 0;
+ hash_for_each_safe(tunnel->session_hash, hash, walk, tmp, session, hlist) {
l2tp_info(session, L2TP_MSG_CONTROL,
"%s: closing session\n", session->name);

- hlist_del_init(&session->hlist);
+ hash_del(&session->hlist);

/* Since we should hold the sock lock while
* doing any unbinding, we need to release the
@@ -1302,14 +1270,14 @@ again:
if (session->ref != NULL)
(*session->ref)(session);

- write_unlock_bh(&tunnel->hlist_lock);
+ write_unlock_bh(&tunnel->hash_lock);

if (tunnel->version != L2TP_HDR_VER_2) {
struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);

- spin_lock_bh(&pn->l2tp_session_hlist_lock);
- hlist_del_init_rcu(&session->global_hlist);
- spin_unlock_bh(&pn->l2tp_session_hlist_lock);
+ spin_lock_bh(&pn->l2tp_session_hash_lock);
+ hash_del_rcu(&session->global_hlist);
+ spin_unlock_bh(&pn->l2tp_session_hash_lock);
synchronize_rcu();
}

@@ -1319,17 +1287,17 @@ again:
if (session->deref != NULL)
(*session->deref)(session);

- write_lock_bh(&tunnel->hlist_lock);
+ write_lock_bh(&tunnel->hash_lock);

/* Now restart from the beginning of this hash
* chain. We always remove a session from the
* list so we are guaranteed to make forward
* progress.
*/
- goto again;
+ found = 1;
}
- }
- write_unlock_bh(&tunnel->hlist_lock);
+ } while (found);
+ write_unlock_bh(&tunnel->hash_lock);
}

/* Really kill the tunnel.
@@ -1576,7 +1544,7 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32

tunnel->magic = L2TP_TUNNEL_MAGIC;
sprintf(&tunnel->name[0], "tunl %u", tunnel_id);
- rwlock_init(&tunnel->hlist_lock);
+ rwlock_init(&tunnel->hash_lock);

/* The net we belong to */
tunnel->l2tp_net = net;
@@ -1613,6 +1581,8 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32

/* Add tunnel to our list */
INIT_LIST_HEAD(&tunnel->list);
+
+ hash_init(tunnel->session_hash);
atomic_inc(&l2tp_tunnel_count);

/* Bump the reference count. The tunnel context is deleted
@@ -1677,17 +1647,17 @@ void l2tp_session_free(struct l2tp_session *session)
BUG_ON(tunnel->magic != L2TP_TUNNEL_MAGIC);

/* Delete the session from the hash */
- write_lock_bh(&tunnel->hlist_lock);
- hlist_del_init(&session->hlist);
- write_unlock_bh(&tunnel->hlist_lock);
+ write_lock_bh(&tunnel->hash_lock);
+ hash_del(&session->hlist);
+ write_unlock_bh(&tunnel->hash_lock);

/* Unlink from the global hash if not L2TPv2 */
if (tunnel->version != L2TP_HDR_VER_2) {
struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);

- spin_lock_bh(&pn->l2tp_session_hlist_lock);
- hlist_del_init_rcu(&session->global_hlist);
- spin_unlock_bh(&pn->l2tp_session_hlist_lock);
+ spin_lock_bh(&pn->l2tp_session_hash_lock);
+ hash_del_rcu(&session->global_hlist);
+ spin_unlock_bh(&pn->l2tp_session_hash_lock);
synchronize_rcu();
}

@@ -1800,19 +1770,17 @@ struct l2tp_session *l2tp_session_create(int priv_size, struct l2tp_tunnel *tunn
sock_hold(tunnel->sock);

/* Add session to the tunnel's hash list */
- write_lock_bh(&tunnel->hlist_lock);
- hlist_add_head(&session->hlist,
- l2tp_session_id_hash(tunnel, session_id));
- write_unlock_bh(&tunnel->hlist_lock);
+ write_lock_bh(&tunnel->hash_lock);
+ hash_add(tunnel->session_hash, &session->hlist, session_id);
+ write_unlock_bh(&tunnel->hash_lock);

/* And to the global session list if L2TPv3 */
if (tunnel->version != L2TP_HDR_VER_2) {
struct l2tp_net *pn = l2tp_pernet(tunnel->l2tp_net);

- spin_lock_bh(&pn->l2tp_session_hlist_lock);
- hlist_add_head_rcu(&session->global_hlist,
- l2tp_session_id_hash_2(pn, session_id));
- spin_unlock_bh(&pn->l2tp_session_hlist_lock);
+ spin_lock_bh(&pn->l2tp_session_hash_lock);
+ hash_add_rcu(pn->l2tp_session_hash, &session->global_hlist, session_id);
+ spin_unlock_bh(&pn->l2tp_session_hash_lock);
}

/* Ignore management session in session count value */
@@ -1831,15 +1799,13 @@ EXPORT_SYMBOL_GPL(l2tp_session_create);
static __net_init int l2tp_init_net(struct net *net)
{
struct l2tp_net *pn = net_generic(net, l2tp_net_id);
- int hash;

INIT_LIST_HEAD(&pn->l2tp_tunnel_list);
spin_lock_init(&pn->l2tp_tunnel_list_lock);

- for (hash = 0; hash < L2TP_HASH_SIZE_2; hash++)
- INIT_HLIST_HEAD(&pn->l2tp_session_hlist[hash]);
+ hash_init(pn->l2tp_session_hash);

- spin_lock_init(&pn->l2tp_session_hlist_lock);
+ spin_lock_init(&pn->l2tp_session_hash_lock);

return 0;
}
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 56d583e..dcbec9e 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -11,17 +11,17 @@
#ifndef _L2TP_CORE_H_
#define _L2TP_CORE_H_

+#include <linux/hashtable.h>
+
/* Just some random numbers */
#define L2TP_TUNNEL_MAGIC 0x42114DDA
#define L2TP_SESSION_MAGIC 0x0C04EB7D

/* Per tunnel, session hash table size */
#define L2TP_HASH_BITS 4
-#define L2TP_HASH_SIZE (1 << L2TP_HASH_BITS)

/* System-wide, session hash table size */
#define L2TP_HASH_BITS_2 8
-#define L2TP_HASH_SIZE_2 (1 << L2TP_HASH_BITS_2)

/* Debug message categories for the DEBUG socket option */
enum {
@@ -164,8 +164,8 @@ struct l2tp_tunnel_cfg {
struct l2tp_tunnel {
int magic; /* Should be L2TP_TUNNEL_MAGIC */
struct rcu_head rcu;
- rwlock_t hlist_lock; /* protect session_hlist */
- struct hlist_head session_hlist[L2TP_HASH_SIZE];
+ rwlock_t hash_lock; /* protect session_hash */
+ DECLARE_HASHTABLE(session_hash, L2TP_HASH_BITS);
/* hashed list of sessions,
* hashed by id */
u32 tunnel_id;
diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index c3813bc..655f1fa 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -105,21 +105,16 @@ static void l2tp_dfs_seq_tunnel_show(struct seq_file *m, void *v)
int session_count = 0;
int hash;
struct hlist_node *walk;
- struct hlist_node *tmp;
+ struct l2tp_session *session;

- read_lock_bh(&tunnel->hlist_lock);
- for (hash = 0; hash < L2TP_HASH_SIZE; hash++) {
- hlist_for_each_safe(walk, tmp, &tunnel->session_hlist[hash]) {
- struct l2tp_session *session;
+ read_lock_bh(&tunnel->hash_lock);
+ hash_for_each(tunnel->session_hash, hash, walk, session, hlist) {
+ if (session->session_id == 0)
+ continue;

- session = hlist_entry(walk, struct l2tp_session, hlist);
- if (session->session_id == 0)
- continue;
-
- session_count++;
- }
+ session_count++;
}
- read_unlock_bh(&tunnel->hlist_lock);
+ read_unlock_bh(&tunnel->hash_lock);

seq_printf(m, "\nTUNNEL %u peer %u", tunnel->tunnel_id, tunnel->peer_tunnel_id);
if (tunnel->sock) {
--
1.7.12.4

2012-10-28 19:03:40

by Sasha Levin

Subject: [PATCH v7 08/16] block,elevator: use new hashtable implementation

Switch elevator to use the new hashtable implementation. This reduces the amount of
generic unrelated code in the elevator.

This also removes the dynamic allocation of the hash table. The size of the table is
constant, so there's no point in paying the price of an extra dereference when accessing
it.
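
The trade-off described above can be sketched in user space. This is a minimal
model under stated assumptions, not the kernel code: struct bucket,
struct queue_model, MODEL_HASH_BITS and queue_model_init() are hypothetical names
standing in for hlist_head, elevator_queue, ELV_HASH_BITS and hash_init().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct hlist_head. */
struct bucket {
	void *first;
};

#define MODEL_HASH_BITS 6
#define MODEL_HASH_SIZE (1u << MODEL_HASH_BITS)

/*
 * The bucket array is embedded in the owning object, so there is no
 * second allocation that can fail and no extra pointer to follow on
 * each access.
 */
struct queue_model {
	int registered;
	struct bucket hash[MODEL_HASH_SIZE];
};

/* Models hash_init(): mark every bucket as empty. */
static void queue_model_init(struct queue_model *q)
{
	size_t i;

	assert(q != NULL);
	for (i = 0; i < MODEL_HASH_SIZE; i++)
		q->hash[i].first = NULL;
}

/* Number of buckets, derived from the array type itself. */
static size_t queue_model_buckets(const struct queue_model *q)
{
	return sizeof(q->hash) / sizeof(q->hash[0]);
}
```

With the table embedded, freeing the owner frees the table as well, which is why
a separate release step for the buckets becomes unnecessary in this model.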

Signed-off-by: Sasha Levin <[email protected]>
---
block/blk.h | 2 +-
block/elevator.c | 23 ++++-------------------
include/linux/elevator.h | 5 ++++-
3 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index ca51543..a0abbf6 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -61,7 +61,7 @@ static inline void blk_clear_rq_complete(struct request *rq)
/*
* Internal elevator interface
*/
-#define ELV_ON_HASH(rq) (!hlist_unhashed(&(rq)->hash))
+#define ELV_ON_HASH(rq) hash_hashed(&(rq)->hash)

void blk_insert_flush(struct request *rq);
void blk_abort_flushes(struct request_queue *q);
diff --git a/block/elevator.c b/block/elevator.c
index 9b1d42b..898d0eb 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -46,11 +46,6 @@ static LIST_HEAD(elv_list);
/*
* Merge hash stuff.
*/
-static const int elv_hash_shift = 6;
-#define ELV_HASH_BLOCK(sec) ((sec) >> 3)
-#define ELV_HASH_FN(sec) \
- (hash_long(ELV_HASH_BLOCK((sec)), elv_hash_shift))
-#define ELV_HASH_ENTRIES (1 << elv_hash_shift)
#define rq_hash_key(rq) (blk_rq_pos(rq) + blk_rq_sectors(rq))

/*
@@ -142,7 +137,6 @@ static struct elevator_queue *elevator_alloc(struct request_queue *q,
struct elevator_type *e)
{
struct elevator_queue *eq;
- int i;

eq = kmalloc_node(sizeof(*eq), GFP_KERNEL | __GFP_ZERO, q->node);
if (unlikely(!eq))
@@ -151,14 +145,7 @@ static struct elevator_queue *elevator_alloc(struct request_queue *q,
eq->type = e;
kobject_init(&eq->kobj, &elv_ktype);
mutex_init(&eq->sysfs_lock);
-
- eq->hash = kmalloc_node(sizeof(struct hlist_head) * ELV_HASH_ENTRIES,
- GFP_KERNEL, q->node);
- if (!eq->hash)
- goto err;
-
- for (i = 0; i < ELV_HASH_ENTRIES; i++)
- INIT_HLIST_HEAD(&eq->hash[i]);
+ hash_init(eq->hash);

return eq;
err:
@@ -173,7 +160,6 @@ static void elevator_release(struct kobject *kobj)

e = container_of(kobj, struct elevator_queue, kobj);
elevator_put(e->type);
- kfree(e->hash);
kfree(e);
}

@@ -240,7 +226,7 @@ EXPORT_SYMBOL(elevator_exit);

static inline void __elv_rqhash_del(struct request *rq)
{
- hlist_del_init(&rq->hash);
+ hash_del(&rq->hash);
}

static void elv_rqhash_del(struct request_queue *q, struct request *rq)
@@ -254,7 +240,7 @@ static void elv_rqhash_add(struct request_queue *q, struct request *rq)
struct elevator_queue *e = q->elevator;

BUG_ON(ELV_ON_HASH(rq));
- hlist_add_head(&rq->hash, &e->hash[ELV_HASH_FN(rq_hash_key(rq))]);
+ hash_add(e->hash, &rq->hash, rq_hash_key(rq));
}

static void elv_rqhash_reposition(struct request_queue *q, struct request *rq)
@@ -266,11 +252,10 @@ static void elv_rqhash_reposition(struct request_queue *q, struct request *rq)
static struct request *elv_rqhash_find(struct request_queue *q, sector_t offset)
{
struct elevator_queue *e = q->elevator;
- struct hlist_head *hash_list = &e->hash[ELV_HASH_FN(offset)];
struct hlist_node *entry, *next;
struct request *rq;

- hlist_for_each_entry_safe(rq, entry, next, hash_list, hash) {
+ hash_for_each_possible_safe(e->hash, rq, entry, next, hash, offset) {
BUG_ON(!ELV_ON_HASH(rq));

if (unlikely(!rq_mergeable(rq))) {
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index c03af76..7587f7f 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -2,6 +2,7 @@
#define _LINUX_ELEVATOR_H

#include <linux/percpu.h>
+#include <linux/hashtable.h>

#ifdef CONFIG_BLOCK

@@ -96,6 +97,8 @@ struct elevator_type
struct list_head list;
};

+#define ELV_HASH_BITS 6
+
/*
* each queue has an elevator_queue associated with it
*/
@@ -105,7 +108,7 @@ struct elevator_queue
void *elevator_data;
struct kobject kobj;
struct mutex sysfs_lock;
- struct hlist_head *hash;
+ DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
unsigned int registered:1;
};

--
1.7.12.4

2012-10-28 19:03:59

by Sasha Levin

Subject: [PATCH v7 13/16] lockd: use new hashtable implementation

Switch lockd to use the new hashtable implementation. This reduces the amount of
generic unrelated code in lockd.
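
A rough user-space sketch of the pattern this conversion relies on: the caller
computes an unmasked key once, and the table folds that key into a bucket. All
names here (file_model, handle_key, key_to_bucket, find_or_insert) are
hypothetical, and the multiplicative constant is an assumption about how a
32-bit key might be reduced, not a statement of what the kernel does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FH_SIZE    32                  /* models the file handle length */
#define TABLE_BITS 7
#define TABLE_SIZE (1u << TABLE_BITS)

struct file_model {
	unsigned char handle[FH_SIZE];
	int count;
	struct file_model *next;       /* chain within one bucket */
};

static struct file_model *table[TABLE_SIZE];

/* Unmasked key: the plain sum of the handle bytes. */
static uint32_t handle_key(const unsigned char *fh)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < FH_SIZE; i++)
		sum += fh[i];
	return sum;
}

/* The table, not the caller, reduces the key to a bucket index. */
static uint32_t key_to_bucket(uint32_t key)
{
	return (key * UINT32_C(0x9e370001)) >> (32 - TABLE_BITS);
}

/* Look the handle up; insert the candidate if it is not present. */
static struct file_model *find_or_insert(struct file_model *candidate)
{
	uint32_t key = handle_key(candidate->handle);  /* computed once */
	uint32_t b = key_to_bucket(key);
	struct file_model *f;

	assert(b < TABLE_SIZE);
	for (f = table[b]; f != NULL; f = f->next) {
		/* Equal keys only narrow the search; compare the handle. */
		if (memcmp(f->handle, candidate->handle, FH_SIZE) == 0) {
			f->count++;
			return f;
		}
	}
	candidate->count = 1;
	candidate->next = table[b];
	table[b] = candidate;
	return candidate;
}
```

The same key value serves both the lookup and the insertion, so it only needs
to be computed once per call.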

Signed-off-by: Sasha Levin <[email protected]>
---
fs/lockd/svcsubs.c | 66 +++++++++++++++++++++++++++++-------------------------
1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index 0deb5f6..d223a1f 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -20,6 +20,7 @@
#include <linux/lockd/share.h>
#include <linux/module.h>
#include <linux/mount.h>
+#include <linux/hashtable.h>

#define NLMDBG_FACILITY NLMDBG_SVCSUBS

@@ -28,8 +29,7 @@
* Global file hash table
*/
#define FILE_HASH_BITS 7
-#define FILE_NRHASH (1<<FILE_HASH_BITS)
-static struct hlist_head nlm_files[FILE_NRHASH];
+static DEFINE_HASHTABLE(nlm_files, FILE_HASH_BITS);
static DEFINE_MUTEX(nlm_file_mutex);

#ifdef NFSD_DEBUG
@@ -68,7 +68,7 @@ static inline unsigned int file_hash(struct nfs_fh *f)
int i;
for (i=0; i<NFS2_FHSIZE;i++)
tmp += f->data[i];
- return tmp & (FILE_NRHASH - 1);
+ return tmp;
}

/*
@@ -86,17 +86,17 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result,
{
struct hlist_node *pos;
struct nlm_file *file;
- unsigned int hash;
+ unsigned int key;
__be32 nfserr;

nlm_debug_print_fh("nlm_lookup_file", f);

- hash = file_hash(f);
+ key = file_hash(f);

/* Lock file table */
mutex_lock(&nlm_file_mutex);

- hlist_for_each_entry(file, pos, &nlm_files[hash], f_list)
+ hash_for_each_possible(nlm_files, file, pos, f_list, key)
if (!nfs_compare_fh(&file->f_handle, f))
goto found;

@@ -123,7 +123,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result,
goto out_free;
}

- hlist_add_head(&file->f_list, &nlm_files[hash]);
+ hash_add(nlm_files, &file->f_list, key);

found:
dprintk("lockd: found file %p (count %d)\n", file, file->f_count);
@@ -147,8 +147,8 @@ static inline void
nlm_delete_file(struct nlm_file *file)
{
nlm_debug_print_file("closing file", file);
- if (!hlist_unhashed(&file->f_list)) {
- hlist_del(&file->f_list);
+ if (hash_hashed(&file->f_list)) {
+ hash_del(&file->f_list);
nlmsvc_ops->fclose(file->f_file);
kfree(file);
} else {
@@ -253,27 +253,25 @@ nlm_traverse_files(void *data, nlm_host_match_fn_t match,
int i, ret = 0;

mutex_lock(&nlm_file_mutex);
- for (i = 0; i < FILE_NRHASH; i++) {
- hlist_for_each_entry_safe(file, pos, next, &nlm_files[i], f_list) {
- if (is_failover_file && !is_failover_file(data, file))
- continue;
- file->f_count++;
- mutex_unlock(&nlm_file_mutex);
-
- /* Traverse locks, blocks and shares of this file
- * and update file->f_locks count */
- if (nlm_inspect_file(data, file, match))
- ret = 1;
-
- mutex_lock(&nlm_file_mutex);
- file->f_count--;
- /* No more references to this file. Let go of it. */
- if (list_empty(&file->f_blocks) && !file->f_locks
- && !file->f_shares && !file->f_count) {
- hlist_del(&file->f_list);
- nlmsvc_ops->fclose(file->f_file);
- kfree(file);
- }
+ hash_for_each_safe(nlm_files, i, pos, next, file, f_list) {
+ if (is_failover_file && !is_failover_file(data, file))
+ continue;
+ file->f_count++;
+ mutex_unlock(&nlm_file_mutex);
+
+ /* Traverse locks, blocks and shares of this file
+ * and update file->f_locks count */
+ if (nlm_inspect_file(data, file, match))
+ ret = 1;
+
+ mutex_lock(&nlm_file_mutex);
+ file->f_count--;
+ /* No more references to this file. Let go of it. */
+ if (list_empty(&file->f_blocks) && !file->f_locks
+ && !file->f_shares && !file->f_count) {
+ hash_del(&file->f_list);
+ nlmsvc_ops->fclose(file->f_file);
+ kfree(file);
}
}
mutex_unlock(&nlm_file_mutex);
@@ -451,3 +449,11 @@ nlmsvc_unlock_all_by_ip(struct sockaddr *server_addr)
return ret ? -EIO : 0;
}
EXPORT_SYMBOL_GPL(nlmsvc_unlock_all_by_ip);
+
+static int __init nlm_init(void)
+{
+ hash_init(nlm_files);
+ return 0;
+}
+
+module_init(nlm_init);
--
1.7.12.4

2012-10-28 19:04:05

by Sasha Levin

Subject: [PATCH v7 15/16] openvswitch: use new hashtable implementation

Switch openvswitch to use the new hashtable implementation. This reduces the amount of
generic unrelated code in openvswitch.
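
One point worth illustrating when a key is derived from only part of an entry's
identity: equal keys narrow the candidates, but the match test still has to
compare everything that identifies the entry. The sketch below is a
hypothetical user-space model; port_model, name_key() and port_matches() are
invented for illustration, and the string hash is FNV-1a, used here only as a
stand-in and not as the kernel's name hash.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a port: a device name plus its namespace. */
struct port_model {
	const char *name;
	int net_id;
};

/* Illustrative string hash (FNV-1a); the namespace is not mixed in. */
static uint32_t name_key(const char *name)
{
	uint32_t h = UINT32_C(2166136261);

	assert(name != NULL);
	while (*name) {
		h ^= (unsigned char)*name++;
		h *= UINT32_C(16777619);
	}
	return h;
}

/* A candidate matches only if both the name and the namespace agree. */
static int port_matches(const struct port_model *p, const char *name,
			int net_id)
{
	return strcmp(p->name, name) == 0 && p->net_id == net_id;
}
```

Two ports with the same name in different namespaces share a key in this model,
so a comparison on the name alone would return whichever one is found first.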

Signed-off-by: Sasha Levin <[email protected]>
---
net/openvswitch/vport.c | 34 +++++++++++++---------------------
1 file changed, 13 insertions(+), 21 deletions(-)

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 03779e8..3cb9caa 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -28,6 +28,7 @@
#include <linux/rtnetlink.h>
#include <linux/compat.h>
#include <net/net_namespace.h>
+#include <linux/hashtable.h>

#include "datapath.h"
#include "vport.h"
@@ -41,8 +42,8 @@ static const struct vport_ops *vport_ops_list[] = {
};

/* Protected by RCU read lock for reading, RTNL lock for writing. */
-static struct hlist_head *dev_table;
-#define VPORT_HASH_BUCKETS 1024
+#define VPORT_HASH_BITS 10
+static DEFINE_HASHTABLE(dev_table, VPORT_HASH_BITS);

/**
* ovs_vport_init - initialize vport subsystem
@@ -51,10 +52,7 @@ static struct hlist_head *dev_table;
*/
int ovs_vport_init(void)
{
- dev_table = kzalloc(VPORT_HASH_BUCKETS * sizeof(struct hlist_head),
- GFP_KERNEL);
- if (!dev_table)
- return -ENOMEM;
+ hash_init(dev_table);

return 0;
}
@@ -69,12 +67,6 @@ void ovs_vport_exit(void)
kfree(dev_table);
}

-static struct hlist_head *hash_bucket(struct net *net, const char *name)
-{
- unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
- return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
-}
-
/**
* ovs_vport_locate - find a port that has already been created
*
@@ -84,13 +76,12 @@ static struct hlist_head *hash_bucket(struct net *net, const char *name)
*/
struct vport *ovs_vport_locate(struct net *net, const char *name)
{
- struct hlist_head *bucket = hash_bucket(net, name);
struct vport *vport;
struct hlist_node *node;
+ int key = full_name_hash(name, strlen(name));

- hlist_for_each_entry_rcu(vport, node, bucket, hash_node)
- if (!strcmp(name, vport->ops->get_name(vport)) &&
- net_eq(ovs_dp_get_net(vport->dp), net))
+ hash_for_each_possible_rcu(dev_table, vport, node, hash_node, key)
+ if (!strcmp(name, vport->ops->get_name(vport)))
return vport;

return NULL;
@@ -174,7 +165,8 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)

for (i = 0; i < ARRAY_SIZE(vport_ops_list); i++) {
if (vport_ops_list[i]->type == parms->type) {
- struct hlist_head *bucket;
+ int key;
+ const char *name;

vport = vport_ops_list[i]->create(parms);
if (IS_ERR(vport)) {
@@ -182,9 +174,9 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
goto out;
}

- bucket = hash_bucket(ovs_dp_get_net(vport->dp),
- vport->ops->get_name(vport));
- hlist_add_head_rcu(&vport->hash_node, bucket);
+ name = vport->ops->get_name(vport);
+ key = full_name_hash(name, strlen(name));
+ hash_add_rcu(dev_table, &vport->hash_node, key);
return vport;
}
}
@@ -225,7 +217,7 @@ void ovs_vport_del(struct vport *vport)
{
ASSERT_RTNL();

- hlist_del_rcu(&vport->hash_node);
+ hash_del_rcu(&vport->hash_node);

vport->ops->destroy(vport);
}
--
1.7.12.4

2012-10-28 19:04:23

by Sasha Levin

Subject: [PATCH v7 16/16] tracing output: use new hashtable implementation

Switch tracing to use the new hashtable implementation. This reduces the amount of
generic unrelated code in the tracing module.
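
The conversion replaces a bucket count with a bit count (128 buckets become 7
bits). As a small sketch of that arithmetic, assuming only that the old size is
a power of two, the helper names below (is_power_of_two, size_to_bits) are
hypothetical and merely mirror what an integer log2 would compute.

```c
#include <assert.h>
#include <stdint.h>

/* True when exactly one bit is set. */
static int is_power_of_two(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Convert a power-of-two bucket count into the equivalent bit count. */
static unsigned int size_to_bits(uint32_t size)
{
	unsigned int bits = 0;

	assert(is_power_of_two(size));
	while (size > 1) {
		size >>= 1;
		bits++;
	}
	return bits;
}
```

Going the other way, a table declared with a given number of bits holds
1 << bits buckets, so the round trip preserves the original size.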

Signed-off-by: Sasha Levin <[email protected]>
---
kernel/trace/trace_output.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 123b189..1324c1a 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -8,15 +8,15 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/ftrace.h>
+#include <linux/hashtable.h>

#include "trace_output.h"

-/* must be a power of 2 */
-#define EVENT_HASHSIZE 128
+#define EVENT_HASH_BITS 7

DECLARE_RWSEM(trace_event_mutex);

-static struct hlist_head event_hash[EVENT_HASHSIZE] __read_mostly;
+static DEFINE_HASHTABLE(event_hash, EVENT_HASH_BITS);

static int next_event_type = __TRACE_LAST_TYPE + 1;

@@ -712,11 +712,8 @@ struct trace_event *ftrace_find_event(int type)
{
struct trace_event *event;
struct hlist_node *n;
- unsigned key;

- key = type & (EVENT_HASHSIZE - 1);
-
- hlist_for_each_entry(event, n, &event_hash[key], node) {
+ hash_for_each_possible(event_hash, event, n, node, type) {
if (event->type == type)
return event;
}
@@ -781,7 +778,6 @@ void trace_event_read_unlock(void)
*/
int register_ftrace_event(struct trace_event *event)
{
- unsigned key;
int ret = 0;

down_write(&trace_event_mutex);
@@ -833,9 +829,7 @@ int register_ftrace_event(struct trace_event *event)
if (event->funcs->binary == NULL)
event->funcs->binary = trace_nop_print;

- key = event->type & (EVENT_HASHSIZE - 1);
-
- hlist_add_head(&event->node, &event_hash[key]);
+ hash_add(event_hash, &event->node, event->type);

ret = event->type;
out:
@@ -850,7 +844,7 @@ EXPORT_SYMBOL_GPL(register_ftrace_event);
*/
int __unregister_ftrace_event(struct trace_event *event)
{
- hlist_del(&event->node);
+ hash_del(&event->node);
list_del(&event->list);
return 0;
}
@@ -1323,6 +1317,8 @@ __init static int init_events(void)
}
}

+ hash_init(event_hash);
+
return 0;
}
early_initcall(init_events);
--
1.7.12.4

2012-10-28 19:04:57

by Sasha Levin

Subject: [PATCH v7 14/16] net,rds: use new hashtable implementation

Switch rds to use the new hashtable implementation. This reduces the amount of
generic unrelated code in rds.
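
Much of the reduction comes from collapsing the "loop over buckets, then loop
over each chain" pattern into a single iterator. The following user-space
sketch shows one way such an iterator can be built; node_model and
model_for_each are hypothetical names, and the macro is a simplified model
rather than the macro from the hashtable header.

```c
#include <assert.h>
#include <stddef.h>

struct node_model {
	int value;
	struct node_model *next;
};

#define MODEL_BITS 2
#define MODEL_SIZE (1u << MODEL_BITS)

/*
 * Visit every node in every bucket as a single statement. The outer
 * loop stops early when pos is non-NULL, which only happens if the
 * body left the inner loop with break.
 */
#define model_for_each(table, bkt, pos)					\
	for ((bkt) = 0, (pos) = NULL;					\
	     (pos) == NULL && (bkt) < MODEL_SIZE; (bkt)++)		\
		for ((pos) = (table)[bkt]; (pos) != NULL;		\
		     (pos) = (pos)->next)

/* Sum the values of all nodes in the table. */
static int sum_all(struct node_model *table[MODEL_SIZE])
{
	size_t bkt;
	struct node_model *pos;
	int total = 0;

	model_for_each(table, bkt, pos)
		total += pos->value;
	return total;
}

/* Count the nodes visited before one with the given value is seen. */
static int count_until(struct node_model *table[MODEL_SIZE], int stop)
{
	size_t bkt;
	struct node_model *pos;
	int seen = 0;

	model_for_each(table, bkt, pos) {
		if (pos->value == stop)
			break;
		seen++;
	}
	return seen;
}
```

Because the two loops are hidden behind one macro, the loop body loses two
levels of indentation, which accounts for most of the churn in the hunks that
convert whole-table walks.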

Signed-off-by: Sasha Levin <[email protected]>
---
net/rds/bind.c | 28 +++++++++-----
net/rds/connection.c | 102 +++++++++++++++++++++++----------------------------
2 files changed, 63 insertions(+), 67 deletions(-)

diff --git a/net/rds/bind.c b/net/rds/bind.c
index 637bde5..79d65ce 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -36,16 +36,16 @@
#include <linux/if_arp.h>
#include <linux/jhash.h>
#include <linux/ratelimit.h>
+#include <linux/hashtable.h>
#include "rds.h"

-#define BIND_HASH_SIZE 1024
-static struct hlist_head bind_hash_table[BIND_HASH_SIZE];
+#define BIND_HASH_BITS 10
+static DEFINE_HASHTABLE(bind_hash_table, BIND_HASH_BITS);
static DEFINE_SPINLOCK(rds_bind_lock);

-static struct hlist_head *hash_to_bucket(__be32 addr, __be16 port)
+static u32 rds_hash(__be32 addr, __be16 port)
{
- return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
- (BIND_HASH_SIZE - 1));
+ return jhash_2words((u32)addr, (u32)port, 0);
}

static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
@@ -53,12 +53,12 @@ static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
{
struct rds_sock *rs;
struct hlist_node *node;
- struct hlist_head *head = hash_to_bucket(addr, port);
+ u32 key = rds_hash(addr, port);
u64 cmp;
u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);

rcu_read_lock();
- hlist_for_each_entry_rcu(rs, node, head, rs_bound_node) {
+ hash_for_each_possible_rcu(bind_hash_table, rs, node, rs_bound_node, key) {
cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
be16_to_cpu(rs->rs_bound_port);

@@ -74,13 +74,13 @@ static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
* make sure our addr and port are set before
* we are added to the list, other people
* in rcu will find us as soon as the
- * hlist_add_head_rcu is done
+ * hash_add_rcu is done
*/
insert->rs_bound_addr = addr;
insert->rs_bound_port = port;
rds_sock_addref(insert);

- hlist_add_head_rcu(&insert->rs_bound_node, head);
+ hash_add_rcu(bind_hash_table, &insert->rs_bound_node, key);
}
return NULL;
}
@@ -152,7 +152,7 @@ void rds_remove_bound(struct rds_sock *rs)
rs, &rs->rs_bound_addr,
ntohs(rs->rs_bound_port));

- hlist_del_init_rcu(&rs->rs_bound_node);
+ hash_del_rcu(&rs->rs_bound_node);
rds_sock_put(rs);
rs->rs_bound_addr = 0;
}
@@ -202,3 +202,11 @@ out:
synchronize_rcu();
return ret;
}
+
+static int __init rds_init(void)
+{
+ hash_init(bind_hash_table);
+ return 0;
+}
+
+module_init(rds_init);
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 9e07c75..5b09ee1 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -34,28 +34,24 @@
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/export.h>
+#include <linux/hashtable.h>
#include <net/inet_hashtables.h>

#include "rds.h"
#include "loop.h"

#define RDS_CONNECTION_HASH_BITS 12
-#define RDS_CONNECTION_HASH_ENTRIES (1 << RDS_CONNECTION_HASH_BITS)
-#define RDS_CONNECTION_HASH_MASK (RDS_CONNECTION_HASH_ENTRIES - 1)

/* converting this to RCU is a chore for another day.. */
static DEFINE_SPINLOCK(rds_conn_lock);
static unsigned long rds_conn_count;
-static struct hlist_head rds_conn_hash[RDS_CONNECTION_HASH_ENTRIES];
+static DEFINE_HASHTABLE(rds_conn_hash, RDS_CONNECTION_HASH_BITS);
static struct kmem_cache *rds_conn_slab;

-static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr)
+static unsigned long rds_conn_hashfn(__be32 laddr, __be32 faddr)
{
/* Pass NULL, don't need struct net for hash */
- unsigned long hash = inet_ehashfn(NULL,
- be32_to_cpu(laddr), 0,
- be32_to_cpu(faddr), 0);
- return &rds_conn_hash[hash & RDS_CONNECTION_HASH_MASK];
+ return inet_ehashfn(NULL, be32_to_cpu(laddr), 0, be32_to_cpu(faddr), 0);
}

#define rds_conn_info_set(var, test, suffix) do { \
@@ -64,14 +60,14 @@ static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr)
} while (0)

/* rcu read lock must be held or the connection spinlock */
-static struct rds_connection *rds_conn_lookup(struct hlist_head *head,
- __be32 laddr, __be32 faddr,
+static struct rds_connection *rds_conn_lookup(__be32 laddr, __be32 faddr,
struct rds_transport *trans)
{
struct rds_connection *conn, *ret = NULL;
struct hlist_node *pos;
+ unsigned long key = rds_conn_hashfn(laddr, faddr);

- hlist_for_each_entry_rcu(conn, pos, head, c_hash_node) {
+ hash_for_each_possible_rcu(rds_conn_hash, conn, pos, c_hash_node, key) {
if (conn->c_faddr == faddr && conn->c_laddr == laddr &&
conn->c_trans == trans) {
ret = conn;
@@ -117,13 +113,12 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
int is_outgoing)
{
struct rds_connection *conn, *parent = NULL;
- struct hlist_head *head = rds_conn_bucket(laddr, faddr);
struct rds_transport *loop_trans;
unsigned long flags;
int ret;

rcu_read_lock();
- conn = rds_conn_lookup(head, laddr, faddr, trans);
+ conn = rds_conn_lookup(laddr, faddr, trans);
if (conn && conn->c_loopback && conn->c_trans != &rds_loop_transport &&
!is_outgoing) {
/* This is a looped back IB connection, and we're
@@ -224,13 +219,15 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
/* Creating normal conn */
struct rds_connection *found;

- found = rds_conn_lookup(head, laddr, faddr, trans);
+ found = rds_conn_lookup(laddr, faddr, trans);
if (found) {
trans->conn_free(conn->c_transport_data);
kmem_cache_free(rds_conn_slab, conn);
conn = found;
} else {
- hlist_add_head_rcu(&conn->c_hash_node, head);
+ unsigned long key = rds_conn_hashfn(laddr, faddr);
+
+ hash_add_rcu(rds_conn_hash, &conn->c_hash_node, key);
rds_cong_add_conn(conn);
rds_conn_count++;
}
@@ -303,7 +300,7 @@ void rds_conn_shutdown(struct rds_connection *conn)
* conn - the reconnect is always triggered by the active peer. */
cancel_delayed_work_sync(&conn->c_conn_w);
rcu_read_lock();
- if (!hlist_unhashed(&conn->c_hash_node)) {
+ if (hash_hashed(&conn->c_hash_node)) {
rcu_read_unlock();
rds_queue_reconnect(conn);
} else {
@@ -329,7 +326,7 @@ void rds_conn_destroy(struct rds_connection *conn)

/* Ensure conn will not be scheduled for reconnect */
spin_lock_irq(&rds_conn_lock);
- hlist_del_init_rcu(&conn->c_hash_node);
+ hash_del_rcu(&conn->c_hash_node);
spin_unlock_irq(&rds_conn_lock);
synchronize_rcu();

@@ -375,7 +372,6 @@ static void rds_conn_message_info(struct socket *sock, unsigned int len,
struct rds_info_lengths *lens,
int want_send)
{
- struct hlist_head *head;
struct hlist_node *pos;
struct list_head *list;
struct rds_connection *conn;
@@ -388,27 +384,24 @@ static void rds_conn_message_info(struct socket *sock, unsigned int len,

rcu_read_lock();

- for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
- i++, head++) {
- hlist_for_each_entry_rcu(conn, pos, head, c_hash_node) {
- if (want_send)
- list = &conn->c_send_queue;
- else
- list = &conn->c_retrans;
-
- spin_lock_irqsave(&conn->c_lock, flags);
-
- /* XXX too lazy to maintain counts.. */
- list_for_each_entry(rm, list, m_conn_item) {
- total++;
- if (total <= len)
- rds_inc_info_copy(&rm->m_inc, iter,
- conn->c_laddr,
- conn->c_faddr, 0);
- }
-
- spin_unlock_irqrestore(&conn->c_lock, flags);
+ hash_for_each_rcu(rds_conn_hash, i, pos, conn, c_hash_node) {
+ if (want_send)
+ list = &conn->c_send_queue;
+ else
+ list = &conn->c_retrans;
+
+ spin_lock_irqsave(&conn->c_lock, flags);
+
+ /* XXX too lazy to maintain counts.. */
+ list_for_each_entry(rm, list, m_conn_item) {
+ total++;
+ if (total <= len)
+ rds_inc_info_copy(&rm->m_inc, iter,
+ conn->c_laddr,
+ conn->c_faddr, 0);
}
+
+ spin_unlock_irqrestore(&conn->c_lock, flags);
}
rcu_read_unlock();

@@ -438,7 +431,6 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
size_t item_len)
{
uint64_t buffer[(item_len + 7) / 8];
- struct hlist_head *head;
struct hlist_node *pos;
struct rds_connection *conn;
size_t i;
@@ -448,23 +440,19 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
lens->nr = 0;
lens->each = item_len;

- for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
- i++, head++) {
- hlist_for_each_entry_rcu(conn, pos, head, c_hash_node) {
-
- /* XXX no c_lock usage.. */
- if (!visitor(conn, buffer))
- continue;
-
- /* We copy as much as we can fit in the buffer,
- * but we count all items so that the caller
- * can resize the buffer. */
- if (len >= item_len) {
- rds_info_copy(iter, buffer, item_len);
- len -= item_len;
- }
- lens->nr++;
+ hash_for_each_rcu(rds_conn_hash, i, pos, conn, c_hash_node) {
+ /* XXX no c_lock usage.. */
+ if (!visitor(conn, buffer))
+ continue;
+
+ /* We copy as much as we can fit in the buffer,
+ * but we count all items so that the caller
+ * can resize the buffer. */
+ if (len >= item_len) {
+ rds_info_copy(iter, buffer, item_len);
+ len -= item_len;
}
+ lens->nr++;
}
rcu_read_unlock();
}
@@ -518,6 +506,8 @@ int rds_conn_init(void)
rds_info_register_func(RDS_INFO_RETRANS_MESSAGES,
rds_conn_message_info_retrans);

+ hash_init(rds_conn_hash);
+
return 0;
}

@@ -525,8 +515,6 @@ void rds_conn_exit(void)
{
rds_loop_exit();

- WARN_ON(!hlist_empty(rds_conn_hash));
-
kmem_cache_destroy(rds_conn_slab);

rds_info_deregister_func(RDS_INFO_CONNECTIONS, rds_conn_info);
--
1.7.12.4

2012-10-28 19:05:27

by Sasha Levin

Subject: [PATCH v7 12/16] dm: use new hashtable implementation

Switch dm to use the new hashtable implementation. This reduces the amount of
generic unrelated code in dm.
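
Part of what goes away here is dm's private block hash. For comparison, the
sketch below places the removed helper next to a generic 32-bit multiplicative
hash. The second function is an assumption: model_hash_32() and its constant
are illustrative, and since dm_block_t is 64 bits wide the real table may well
pick a different, wider hash for these keys.

```c
#include <assert.h>
#include <stdint.h>

/* The helper removed by this patch: multiply by a prime, then mask. */
static unsigned int old_dm_hash_block(uint64_t b, unsigned int hash_mask)
{
	const uint32_t BIG_PRIME = 4294967291u;

	return ((uint32_t)b * BIG_PRIME) & hash_mask;
}

/*
 * Assumed model of a generic multiplicative hash: multiply, then keep
 * the top `bits` bits instead of masking the low ones.
 */
static uint32_t model_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (val * UINT32_C(0x9e370001)) >> (32 - bits);
}
```

The practical difference is which bits of the product select the bucket: the
old helper keeps the low bits, while the multiplicative form keeps the high
ones.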

Signed-off-by: Sasha Levin <[email protected]>
---
drivers/md/dm-snap.c | 24 +++++------------
drivers/md/persistent-data/dm-block-manager.c | 1 -
.../persistent-data/dm-persistent-data-internal.h | 19 --------------
.../md/persistent-data/dm-transaction-manager.c | 30 +++++++---------------
4 files changed, 16 insertions(+), 58 deletions(-)
delete mode 100644 drivers/md/persistent-data/dm-persistent-data-internal.h

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 223e7eb..4b19fa0 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -34,9 +34,7 @@ static const char dm_snapshot_merge_target_name[] = "snapshot-merge";
*/
#define MIN_IOS 256

-#define DM_TRACKED_CHUNK_HASH_SIZE 16
-#define DM_TRACKED_CHUNK_HASH(x) ((unsigned long)(x) & \
- (DM_TRACKED_CHUNK_HASH_SIZE - 1))
+#define DM_TRACKED_CHUNK_HASH_BITS 4

struct dm_exception_table {
uint32_t hash_mask;
@@ -80,7 +78,7 @@ struct dm_snapshot {
/* Chunks with outstanding reads */
spinlock_t tracked_chunk_lock;
mempool_t *tracked_chunk_pool;
- struct hlist_head tracked_chunk_hash[DM_TRACKED_CHUNK_HASH_SIZE];
+ DECLARE_HASHTABLE(tracked_chunk_hash, DM_TRACKED_CHUNK_HASH_BITS);

/* The on disk metadata handler */
struct dm_exception_store *store;
@@ -202,8 +200,7 @@ static struct dm_snap_tracked_chunk *track_chunk(struct dm_snapshot *s,
c->chunk = chunk;

spin_lock_irq(&s->tracked_chunk_lock);
- hlist_add_head(&c->node,
- &s->tracked_chunk_hash[DM_TRACKED_CHUNK_HASH(chunk)]);
+ hash_add(s->tracked_chunk_hash, &c->node, chunk);
spin_unlock_irq(&s->tracked_chunk_lock);

return c;
@@ -215,7 +212,7 @@ static void stop_tracking_chunk(struct dm_snapshot *s,
unsigned long flags;

spin_lock_irqsave(&s->tracked_chunk_lock, flags);
- hlist_del(&c->node);
+ hash_del(&c->node);
spin_unlock_irqrestore(&s->tracked_chunk_lock, flags);

mempool_free(c, s->tracked_chunk_pool);
@@ -229,8 +226,7 @@ static int __chunk_is_tracked(struct dm_snapshot *s, chunk_t chunk)

spin_lock_irq(&s->tracked_chunk_lock);

- hlist_for_each_entry(c, hn,
- &s->tracked_chunk_hash[DM_TRACKED_CHUNK_HASH(chunk)], node) {
+ hash_for_each_possible(s->tracked_chunk_hash, c, hn, node, chunk) {
if (c->chunk == chunk) {
found = 1;
break;
@@ -1032,7 +1028,6 @@ static void stop_merge(struct dm_snapshot *s)
static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
struct dm_snapshot *s;
- int i;
int r = -EINVAL;
char *origin_path, *cow_path;
unsigned args_used, num_flush_requests = 1;
@@ -1127,8 +1122,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
goto bad_tracked_chunk_pool;
}

- for (i = 0; i < DM_TRACKED_CHUNK_HASH_SIZE; i++)
- INIT_HLIST_HEAD(&s->tracked_chunk_hash[i]);
+ hash_init(s->tracked_chunk_hash);

spin_lock_init(&s->tracked_chunk_lock);

@@ -1252,9 +1246,6 @@ static void __handover_exceptions(struct dm_snapshot *snap_src,

static void snapshot_dtr(struct dm_target *ti)
{
-#ifdef CONFIG_DM_DEBUG
- int i;
-#endif
struct dm_snapshot *s = ti->private;
struct dm_snapshot *snap_src = NULL, *snap_dest = NULL;

@@ -1285,8 +1276,7 @@ static void snapshot_dtr(struct dm_target *ti)
smp_mb();

#ifdef CONFIG_DM_DEBUG
- for (i = 0; i < DM_TRACKED_CHUNK_HASH_SIZE; i++)
- BUG_ON(!hlist_empty(&s->tracked_chunk_hash[i]));
+ BUG_ON(!hash_empty(s->tracked_chunk_hash));
#endif

mempool_destroy(s->tracked_chunk_pool);
diff --git a/drivers/md/persistent-data/dm-block-manager.c b/drivers/md/persistent-data/dm-block-manager.c
index 5ba2777..31edaf13 100644
--- a/drivers/md/persistent-data/dm-block-manager.c
+++ b/drivers/md/persistent-data/dm-block-manager.c
@@ -4,7 +4,6 @@
* This file is released under the GPL.
*/
#include "dm-block-manager.h"
-#include "dm-persistent-data-internal.h"
#include "../dm-bufio.h"

#include <linux/crc32c.h>
diff --git a/drivers/md/persistent-data/dm-persistent-data-internal.h b/drivers/md/persistent-data/dm-persistent-data-internal.h
deleted file mode 100644
index c49e26f..0000000
--- a/drivers/md/persistent-data/dm-persistent-data-internal.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/*
- * Copyright (C) 2011 Red Hat, Inc.
- *
- * This file is released under the GPL.
- */
-
-#ifndef _DM_PERSISTENT_DATA_INTERNAL_H
-#define _DM_PERSISTENT_DATA_INTERNAL_H
-
-#include "dm-block-manager.h"
-
-static inline unsigned dm_hash_block(dm_block_t b, unsigned hash_mask)
-{
- const unsigned BIG_PRIME = 4294967291UL;
-
- return (((unsigned) b) * BIG_PRIME) & hash_mask;
-}
-
-#endif /* _PERSISTENT_DATA_INTERNAL_H */
diff --git a/drivers/md/persistent-data/dm-transaction-manager.c b/drivers/md/persistent-data/dm-transaction-manager.c
index d247a35..9eb9417 100644
--- a/drivers/md/persistent-data/dm-transaction-manager.c
+++ b/drivers/md/persistent-data/dm-transaction-manager.c
@@ -7,11 +7,11 @@
#include "dm-space-map.h"
#include "dm-space-map-disk.h"
#include "dm-space-map-metadata.h"
-#include "dm-persistent-data-internal.h"

#include <linux/export.h>
#include <linux/slab.h>
#include <linux/device-mapper.h>
+#include <linux/hashtable.h>

#define DM_MSG_PREFIX "transaction manager"

@@ -25,8 +25,7 @@ struct shadow_info {
/*
* It would be nice if we scaled with the size of transaction.
*/
-#define HASH_SIZE 256
-#define HASH_MASK (HASH_SIZE - 1)
+#define DM_HASH_BITS 8

struct dm_transaction_manager {
int is_clone;
@@ -36,7 +35,7 @@ struct dm_transaction_manager {
struct dm_space_map *sm;

spinlock_t lock;
- struct hlist_head buckets[HASH_SIZE];
+ DECLARE_HASHTABLE(hash, DM_HASH_BITS);
};

/*----------------------------------------------------------------*/
@@ -44,12 +43,11 @@ struct dm_transaction_manager {
static int is_shadow(struct dm_transaction_manager *tm, dm_block_t b)
{
int r = 0;
- unsigned bucket = dm_hash_block(b, HASH_MASK);
struct shadow_info *si;
struct hlist_node *n;

spin_lock(&tm->lock);
- hlist_for_each_entry(si, n, tm->buckets + bucket, hlist)
+ hash_for_each_possible(tm->hash, si, n, hlist, b)
if (si->where == b) {
r = 1;
break;
@@ -65,15 +63,13 @@ static int is_shadow(struct dm_transaction_manager *tm, dm_block_t b)
*/
static void insert_shadow(struct dm_transaction_manager *tm, dm_block_t b)
{
- unsigned bucket;
struct shadow_info *si;

si = kmalloc(sizeof(*si), GFP_NOIO);
if (si) {
si->where = b;
- bucket = dm_hash_block(b, HASH_MASK);
spin_lock(&tm->lock);
- hlist_add_head(&si->hlist, tm->buckets + bucket);
+ hash_add(tm->hash, &si->hlist, b);
spin_unlock(&tm->lock);
}
}
@@ -82,18 +78,12 @@ static void wipe_shadow_table(struct dm_transaction_manager *tm)
{
struct shadow_info *si;
struct hlist_node *n, *tmp;
- struct hlist_head *bucket;
int i;

spin_lock(&tm->lock);
- for (i = 0; i < HASH_SIZE; i++) {
- bucket = tm->buckets + i;
- hlist_for_each_entry_safe(si, n, tmp, bucket, hlist)
- kfree(si);
-
- INIT_HLIST_HEAD(bucket);
- }
-
+ hash_for_each_safe(tm->hash, i, n, tmp, si, hlist)
+ kfree(si);
+ hash_init(tm->hash);
spin_unlock(&tm->lock);
}

@@ -102,7 +92,6 @@ static void wipe_shadow_table(struct dm_transaction_manager *tm)
static struct dm_transaction_manager *dm_tm_create(struct dm_block_manager *bm,
struct dm_space_map *sm)
{
- int i;
struct dm_transaction_manager *tm;

tm = kmalloc(sizeof(*tm), GFP_KERNEL);
@@ -115,8 +104,7 @@ static struct dm_transaction_manager *dm_tm_create(struct dm_block_manager *bm,
tm->sm = sm;

spin_lock_init(&tm->lock);
- for (i = 0; i < HASH_SIZE; i++)
- INIT_HLIST_HEAD(tm->buckets + i);
+ hash_init(tm->hash);

return tm;
}
--
1.7.12.4

2012-10-28 19:06:46

by Sasha Levin

Subject: [PATCH v7 10/16] dlm: use new hashtable implementation

Switch dlm to use the new hashtable implementation. This reduces the amount of
generic unrelated code in the dlm.

Signed-off-by: Sasha Levin <[email protected]>
---
fs/dlm/lowcomms.c | 47 +++++++++++++----------------------------------
1 file changed, 13 insertions(+), 34 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 331ea4f..9f21774 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -55,6 +55,7 @@
#include <net/sctp/sctp.h>
#include <net/sctp/user.h>
#include <net/ipv6.h>
+#include <linux/hashtable.h>

#include "dlm_internal.h"
#include "lowcomms.h"
@@ -62,7 +63,7 @@
#include "config.h"

#define NEEDED_RMEM (4*1024*1024)
-#define CONN_HASH_SIZE 32
+#define CONN_HASH_BITS 5

/* Number of messages to send before rescheduling */
#define MAX_SEND_MSG_COUNT 25
@@ -158,34 +159,21 @@ static int dlm_allow_conn;
static struct workqueue_struct *recv_workqueue;
static struct workqueue_struct *send_workqueue;

-static struct hlist_head connection_hash[CONN_HASH_SIZE];
+static DEFINE_HASHTABLE(connection_hash, CONN_HASH_BITS);
static DEFINE_MUTEX(connections_lock);
static struct kmem_cache *con_cache;

static void process_recv_sockets(struct work_struct *work);
static void process_send_sockets(struct work_struct *work);

-
-/* This is deliberately very simple because most clusters have simple
- sequential nodeids, so we should be able to go straight to a connection
- struct in the array */
-static inline int nodeid_hash(int nodeid)
-{
- return nodeid & (CONN_HASH_SIZE-1);
-}
-
static struct connection *__find_con(int nodeid)
{
- int r;
struct hlist_node *h;
struct connection *con;

- r = nodeid_hash(nodeid);
-
- hlist_for_each_entry(con, h, &connection_hash[r], list) {
+ hash_for_each_possible(connection_hash, con, h, list, nodeid)
if (con->nodeid == nodeid)
return con;
- }
return NULL;
}

@@ -196,7 +184,6 @@ static struct connection *__find_con(int nodeid)
static struct connection *__nodeid2con(int nodeid, gfp_t alloc)
{
struct connection *con = NULL;
- int r;

con = __find_con(nodeid);
if (con || !alloc)
@@ -206,8 +193,7 @@ static struct connection *__nodeid2con(int nodeid, gfp_t alloc)
if (!con)
return NULL;

- r = nodeid_hash(nodeid);
- hlist_add_head(&con->list, &connection_hash[r]);
+ hash_add(connection_hash, &con->list, nodeid);

con->nodeid = nodeid;
mutex_init(&con->sock_mutex);
@@ -235,11 +221,8 @@ static void foreach_conn(void (*conn_func)(struct connection *c))
struct hlist_node *h, *n;
struct connection *con;

- for (i = 0; i < CONN_HASH_SIZE; i++) {
- hlist_for_each_entry_safe(con, h, n, &connection_hash[i], list){
- conn_func(con);
- }
- }
+ hash_for_each_safe(connection_hash, i, h, n, con, list)
+ conn_func(con);
}

static struct connection *nodeid2con(int nodeid, gfp_t allocation)
@@ -262,12 +245,10 @@ static struct connection *assoc2con(int assoc_id)

mutex_lock(&connections_lock);

- for (i = 0 ; i < CONN_HASH_SIZE; i++) {
- hlist_for_each_entry(con, h, &connection_hash[i], list) {
- if (con->sctp_assoc == assoc_id) {
- mutex_unlock(&connections_lock);
- return con;
- }
+ hash_for_each(connection_hash, i, h, con, list) {
+ if (con->sctp_assoc == assoc_id) {
+ mutex_unlock(&connections_lock);
+ return con;
}
}
mutex_unlock(&connections_lock);
@@ -1638,7 +1619,7 @@ static void free_conn(struct connection *con)
close_connection(con, true);
if (con->othercon)
kmem_cache_free(con_cache, con->othercon);
- hlist_del(&con->list);
+ hash_del(&con->list);
kmem_cache_free(con_cache, con);
}

@@ -1667,10 +1648,8 @@ int dlm_lowcomms_start(void)
{
int error = -EINVAL;
struct connection *con;
- int i;

- for (i = 0; i < CONN_HASH_SIZE; i++)
- INIT_HLIST_HEAD(&connection_hash[i]);
+ hash_init(connection_hash);

init_local();
if (!dlm_local_count) {
--
1.7.12.4

2012-10-28 19:07:45

by Sasha Levin

Subject: [PATCH v7 06/16] tracepoint: use new hashtable implementation

Switch tracepoints to use the new hashtable implementation. This reduces the amount of
generic unrelated code in the tracepoints.

Signed-off-by: Sasha Levin <[email protected]>
---
kernel/tracepoint.c | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index d96ba22..854df92 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -26,6 +26,7 @@
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/static_key.h>
+#include <linux/hashtable.h>

extern struct tracepoint * const __start___tracepoints_ptrs[];
extern struct tracepoint * const __stop___tracepoints_ptrs[];
@@ -49,8 +50,7 @@ static LIST_HEAD(tracepoint_module_list);
* Protected by tracepoints_mutex.
*/
#define TRACEPOINT_HASH_BITS 6
-#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
-static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
+static DEFINE_HASHTABLE(tracepoint_table, TRACEPOINT_HASH_BITS);

/*
* Note about RCU :
@@ -191,16 +191,15 @@ tracepoint_entry_remove_probe(struct tracepoint_entry *entry,
*/
static struct tracepoint_entry *get_tracepoint(const char *name)
{
- struct hlist_head *head;
struct hlist_node *node;
struct tracepoint_entry *e;
u32 hash = jhash(name, strlen(name), 0);

- head = &tracepoint_table[hash & (TRACEPOINT_TABLE_SIZE - 1)];
- hlist_for_each_entry(e, node, head, hlist) {
+ hash_for_each_possible(tracepoint_table, e, node, hlist, hash) {
if (!strcmp(name, e->name))
return e;
}
+
return NULL;
}

@@ -210,19 +209,13 @@ static struct tracepoint_entry *get_tracepoint(const char *name)
*/
static struct tracepoint_entry *add_tracepoint(const char *name)
{
- struct hlist_head *head;
- struct hlist_node *node;
struct tracepoint_entry *e;
size_t name_len = strlen(name) + 1;
u32 hash = jhash(name, name_len-1, 0);

- head = &tracepoint_table[hash & (TRACEPOINT_TABLE_SIZE - 1)];
- hlist_for_each_entry(e, node, head, hlist) {
- if (!strcmp(name, e->name)) {
- printk(KERN_NOTICE
- "tracepoint %s busy\n", name);
- return ERR_PTR(-EEXIST); /* Already there */
- }
+ if (get_tracepoint(name)) {
+ printk(KERN_NOTICE "tracepoint %s busy\n", name);
+ return ERR_PTR(-EEXIST); /* Already there */
}
/*
* Using kmalloc here to allocate a variable length element. Could
@@ -234,7 +227,7 @@ static struct tracepoint_entry *add_tracepoint(const char *name)
memcpy(&e->name[0], name, name_len);
e->funcs = NULL;
e->refcount = 0;
- hlist_add_head(&e->hlist, head);
+ hash_add(tracepoint_table, &e->hlist, hash);
return e;
}

@@ -244,7 +237,7 @@ static struct tracepoint_entry *add_tracepoint(const char *name)
*/
static inline void remove_tracepoint(struct tracepoint_entry *e)
{
- hlist_del(&e->hlist);
+ hash_del(&e->hlist);
kfree(e);
}

@@ -722,6 +715,8 @@ struct notifier_block tracepoint_module_nb = {

static int init_tracepoints(void)
{
+ hash_init(tracepoint_table);
+
return register_module_notifier(&tracepoint_module_nb);
}
__initcall(init_tracepoints);
--
1.7.12.4

2012-10-28 19:03:14

by Sasha Levin

Subject: [PATCH v7 02/16] userns: use new hashtable implementation

Switch to using the new hashtable implementation to store user structs.
This reduces the amount of generic unrelated code in kernel/user.c.

Signed-off-by: Sasha Levin <[email protected]>
---
kernel/user.c | 33 +++++++++++++--------------------
1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/kernel/user.c b/kernel/user.c
index 750acff..8cd922a 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -16,6 +16,7 @@
#include <linux/interrupt.h>
#include <linux/export.h>
#include <linux/user_namespace.h>
+#include <linux/hashtable.h>

/*
* userns count is 1 for root user, 1 for init_uts_ns,
@@ -60,13 +61,9 @@ EXPORT_SYMBOL_GPL(init_user_ns);
*/

#define UIDHASH_BITS (CONFIG_BASE_SMALL ? 3 : 7)
-#define UIDHASH_SZ (1 << UIDHASH_BITS)
-#define UIDHASH_MASK (UIDHASH_SZ - 1)
-#define __uidhashfn(uid) (((uid >> UIDHASH_BITS) + uid) & UIDHASH_MASK)
-#define uidhashentry(uid) (uidhash_table + __uidhashfn((__kuid_val(uid))))

static struct kmem_cache *uid_cachep;
-struct hlist_head uidhash_table[UIDHASH_SZ];
+static DEFINE_HASHTABLE(uidhash_table, UIDHASH_BITS);

/*
* The uidhash_lock is mostly taken from process context, but it is
@@ -92,22 +89,22 @@ struct user_struct root_user = {
/*
* These routines must be called with the uidhash spinlock held!
*/
-static void uid_hash_insert(struct user_struct *up, struct hlist_head *hashent)
+static void uid_hash_insert(struct user_struct *up)
{
- hlist_add_head(&up->uidhash_node, hashent);
+ hash_add(uidhash_table, &up->uidhash_node, __kuid_val(up->uid));
}

static void uid_hash_remove(struct user_struct *up)
{
- hlist_del_init(&up->uidhash_node);
+ hash_del(&up->uidhash_node);
}

-static struct user_struct *uid_hash_find(kuid_t uid, struct hlist_head *hashent)
+static struct user_struct *uid_hash_find(kuid_t uid)
{
struct user_struct *user;
struct hlist_node *h;

- hlist_for_each_entry(user, h, hashent, uidhash_node) {
+ hash_for_each_possible(uidhash_table, user, h, uidhash_node, __kuid_val(uid)) {
if (uid_eq(user->uid, uid)) {
atomic_inc(&user->__count);
return user;
@@ -143,7 +140,7 @@ struct user_struct *find_user(kuid_t uid)
unsigned long flags;

spin_lock_irqsave(&uidhash_lock, flags);
- ret = uid_hash_find(uid, uidhashentry(uid));
+ ret = uid_hash_find(uid);
spin_unlock_irqrestore(&uidhash_lock, flags);
return ret;
}
@@ -164,11 +161,10 @@ void free_uid(struct user_struct *up)

struct user_struct *alloc_uid(kuid_t uid)
{
- struct hlist_head *hashent = uidhashentry(uid);
struct user_struct *up, *new;

spin_lock_irq(&uidhash_lock);
- up = uid_hash_find(uid, hashent);
+ up = uid_hash_find(uid);
spin_unlock_irq(&uidhash_lock);

if (!up) {
@@ -184,13 +180,13 @@ struct user_struct *alloc_uid(kuid_t uid)
* on adding the same user already..
*/
spin_lock_irq(&uidhash_lock);
- up = uid_hash_find(uid, hashent);
+ up = uid_hash_find(uid);
if (up) {
key_put(new->uid_keyring);
key_put(new->session_keyring);
kmem_cache_free(uid_cachep, new);
} else {
- uid_hash_insert(new, hashent);
+ uid_hash_insert(new);
up = new;
}
spin_unlock_irq(&uidhash_lock);
@@ -204,17 +200,14 @@ out_unlock:

static int __init uid_cache_init(void)
{
- int n;
-
uid_cachep = kmem_cache_create("uid_cache", sizeof(struct user_struct),
0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);

- for(n = 0; n < UIDHASH_SZ; ++n)
- INIT_HLIST_HEAD(uidhash_table + n);
+ hash_init(uidhash_table);

/* Insert the root user immediately (init already runs as root) */
spin_lock_irq(&uidhash_lock);
- uid_hash_insert(&root_user, uidhashentry(GLOBAL_ROOT_UID));
+ uid_hash_insert(&root_user);
spin_unlock_irq(&uidhash_lock);

return 0;
--
1.7.12.4

2012-10-28 19:08:37

by Sasha Levin

Subject: [PATCH v7 03/16] mm,ksm: use new hashtable implementation

Switch ksm to use the new hashtable implementation. This reduces the amount of
generic unrelated code in the ksm module.

Signed-off-by: Sasha Levin <[email protected]>
---
mm/ksm.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 31ae5ea..36ba1a8 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -33,7 +33,7 @@
#include <linux/mmu_notifier.h>
#include <linux/swap.h>
#include <linux/ksm.h>
-#include <linux/hash.h>
+#include <linux/hashtable.h>
#include <linux/freezer.h>
#include <linux/oom.h>

@@ -156,9 +156,8 @@ struct rmap_item {
static struct rb_root root_stable_tree = RB_ROOT;
static struct rb_root root_unstable_tree = RB_ROOT;

-#define MM_SLOTS_HASH_SHIFT 10
-#define MM_SLOTS_HASH_HEADS (1 << MM_SLOTS_HASH_SHIFT)
-static struct hlist_head mm_slots_hash[MM_SLOTS_HASH_HEADS];
+#define MM_SLOTS_HASH_BITS 10
+static DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);

static struct mm_slot ksm_mm_head = {
.mm_list = LIST_HEAD_INIT(ksm_mm_head.mm_list),
@@ -275,26 +274,21 @@ static inline void free_mm_slot(struct mm_slot *mm_slot)

static struct mm_slot *get_mm_slot(struct mm_struct *mm)
{
- struct mm_slot *mm_slot;
- struct hlist_head *bucket;
struct hlist_node *node;
+ struct mm_slot *slot;
+
+ hash_for_each_possible(mm_slots_hash, slot, node, link, (unsigned long)mm)
+ if (slot->mm == mm)
+ return slot;

- bucket = &mm_slots_hash[hash_ptr(mm, MM_SLOTS_HASH_SHIFT)];
- hlist_for_each_entry(mm_slot, node, bucket, link) {
- if (mm == mm_slot->mm)
- return mm_slot;
- }
return NULL;
}

static void insert_to_mm_slots_hash(struct mm_struct *mm,
struct mm_slot *mm_slot)
{
- struct hlist_head *bucket;
-
- bucket = &mm_slots_hash[hash_ptr(mm, MM_SLOTS_HASH_SHIFT)];
mm_slot->mm = mm;
- hlist_add_head(&mm_slot->link, bucket);
+ hash_add(mm_slots_hash, &mm_slot->link, (unsigned long)mm);
}

static inline int in_stable_tree(struct rmap_item *rmap_item)
@@ -647,7 +641,7 @@ static int unmerge_and_remove_all_rmap_items(void)
ksm_scan.mm_slot = list_entry(mm_slot->mm_list.next,
struct mm_slot, mm_list);
if (ksm_test_exit(mm)) {
- hlist_del(&mm_slot->link);
+ hash_del(&mm_slot->link);
list_del(&mm_slot->mm_list);
spin_unlock(&ksm_mmlist_lock);

@@ -1392,7 +1386,7 @@ next_mm:
* or when all VM_MERGEABLE areas have been unmapped (and
* mmap_sem then protects against race with MADV_MERGEABLE).
*/
- hlist_del(&slot->link);
+ hash_del(&slot->link);
list_del(&slot->mm_list);
spin_unlock(&ksm_mmlist_lock);

@@ -1559,7 +1553,7 @@ void __ksm_exit(struct mm_struct *mm)
mm_slot = get_mm_slot(mm);
if (mm_slot && ksm_scan.mm_slot != mm_slot) {
if (!mm_slot->rmap_list) {
- hlist_del(&mm_slot->link);
+ hash_del(&mm_slot->link);
list_del(&mm_slot->mm_list);
easy_to_free = 1;
} else {
@@ -2038,6 +2032,9 @@ static int __init ksm_init(void)
*/
hotplug_memory_notifier(ksm_memory_callback, 100);
#endif
+
+ hash_init(mm_slots_hash);
+
return 0;

out_free:
--
1.7.12.4

2012-10-29 01:25:49

by Tejun Heo

Subject: Re: [PATCH v7 04/16] workqueue: use new hashtable implementation

On Sun, Oct 28, 2012 at 03:02:16PM -0400, Sasha Levin wrote:
> Switch workqueues to use the new hashtable implementation. This reduces the amount of
> generic unrelated code in the workqueues.
>
> Signed-off-by: Sasha Levin <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks!

--
tejun

2012-10-29 01:29:35

by Tejun Heo

Subject: Re: [PATCH v7 08/16] block,elevator: use new hashtable implementation

On Sun, Oct 28, 2012 at 03:02:20PM -0400, Sasha Levin wrote:
> Switch elevator to use the new hashtable implementation. This reduces the amount of
> generic unrelated code in the elevator.
>
> This also removes the dynamic allocation of the hash table. The size of the table is
> constant so there's no point in paying the price of an extra dereference when accessing
> it.
>
> Signed-off-by: Sasha Levin <[email protected]>

Reviewed-by: Tejun Heo <[email protected]>

But please reformat commit message to fit inside 80col (preferably 74
or something like that).

Thanks.

--
tejun

2012-10-29 11:29:14

by Mathieu Desnoyers

Subject: Re: [PATCH v7 01/16] hashtable: introduce a small and naive hashtable

* Sasha Levin ([email protected]) wrote:
> This hashtable implementation is using hlist buckets to provide a simple
> hashtable to prevent it from getting reimplemented all over the kernel.
>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
>
> Sorry for the long delay, I was busy with a bunch of personal things.
>
> Changes since v6:
>
> - Use macros that point to internal static inline functions instead of
> implementing everything as a macro.
> - Rebase on latest -next.
> - Resending the entire patch series on request.
> - Break early from hash_empty() if found to be non-empty.
> - DECLARE_HASHTABLE/DEFINE_HASHTABLE.
>
>
> include/linux/hashtable.h | 193 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 193 insertions(+)
> create mode 100644 include/linux/hashtable.h
>
> diff --git a/include/linux/hashtable.h b/include/linux/hashtable.h
> new file mode 100644
> index 0000000..1fb8c97
> --- /dev/null
> +++ b/include/linux/hashtable.h
> @@ -0,0 +1,193 @@
> +/*
> + * Statically sized hash table implementation
> + * (C) 2012 Sasha Levin <[email protected]>
> + */
> +
> +#ifndef _LINUX_HASHTABLE_H
> +#define _LINUX_HASHTABLE_H
> +
> +#include <linux/list.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/hash.h>
> +#include <linux/rculist.h>
> +
> +#define DEFINE_HASHTABLE(name, bits) \
> + struct hlist_head name[1 << bits] = \
> + { [0 ... ((1 << bits) - 1)] = HLIST_HEAD_INIT }

Although it's unlikely that someone would pass, as "bits", an expression
using a binary operator with lower precedence than "<<" (see e.g. the
precedence table at http://www.swansontec.com/sopc.html), the lack of
parentheses around "bits" would be unexpected by the caller, and could
introduce bugs. Please review all macros with the precedence table in
mind, and ask yourself whether missing parentheses could introduce a
subtle bug.
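
To illustrate the concern, here is a minimal userspace sketch. The macro
and function names are hypothetical (they are not part of the patch); the
argument mimics a conditional such as "CONFIG_BASE_SMALL ? 3 : 7" passed
without its own parentheses.

```c
/* Hypothetical macros for illustration; not the definitions from the patch. */
#define TABLE_SIZE_UNSAFE(bits)	(1 << bits)	/* argument not parenthesized */
#define TABLE_SIZE_SAFE(bits)	(1 << (bits))	/* argument parenthesized */

/* Table size as computed through the unparenthesized macro. */
unsigned int size_unsafe(int small)
{
	/* Expands to (1 << small ? 3 : 7), parsed as ((1 << small) ? 3 : 7). */
	return TABLE_SIZE_UNSAFE(small ? 3 : 7);
}

/* Table size as computed through the parenthesized macro. */
unsigned int size_safe(int small)
{
	/* Expands to (1 << (small ? 3 : 7)), which is what the caller meant. */
	return TABLE_SIZE_SAFE(small ? 3 : 7);
}
```

With the unsafe variant the shift result is used as the condition of "?:",
so the "size" is always 3 regardless of the argument, while the safe
variant gives 8 or 128 buckets as intended.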

> +
> +#define DECLARE_HASHTABLE(name, bits) \
> + struct hlist_head name[1 << (bits)]

Here, you have parentheses around "bits", but not above (inconsistency).

> +
> +#define HASH_SIZE(name) (ARRAY_SIZE(name))
> +#define HASH_BITS(name) ilog2(HASH_SIZE(name))
> +
> +/* Use hash_32 when possible to allow for fast 32bit hashing in 64bit kernels. */
> +#define hash_min(val, bits) \
> +({ \
> + sizeof(val) <= 4 ? \
> + hash_32(val, bits) : \
> + hash_long(val, bits); \
> +})
> +
> +static inline void __hash_init(struct hlist_head *ht, int sz)

int -> unsigned int.

> +{
> + int i;

int -> unsigned int.

> +
> + for (i = 0; i < sz; i++)
> + INIT_HLIST_HEAD(&ht[sz]);

ouch. How did this work ? Has it been tested at all ?

sz -> i
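
For reference, a sketch of the loop with the index fixed and unsigned
types, written against minimal userspace stand-ins for the hlist types
(the stand-ins and the helper names are assumptions made so that the
fragment compiles outside the kernel). As posted, the loop only ever
writes ht[sz], which is one element past the end of the array.

```c
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's hlist types (assumption). */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define INIT_HLIST_HEAD(ptr) ((ptr)->first = NULL)

/* Corrected loop: index with the cursor i, not with the bound sz. */
void hash_init_sketch(struct hlist_head *ht, unsigned int sz)
{
	unsigned int i;

	for (i = 0; i < sz; i++)
		INIT_HLIST_HEAD(&ht[i]);
}

/* Returns 1 when every bucket in ht[0..sz) is empty, 0 otherwise. */
int all_buckets_empty(const struct hlist_head *ht, unsigned int sz)
{
	unsigned int i;

	for (i = 0; i < sz; i++)
		if (ht[i].first != NULL)
			return 0;
	return 1;
}
```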


> +}
> +
> +/**
> + * hash_init - initialize a hash table
> + * @hashtable: hashtable to be initialized
> + *
> + * Calculates the size of the hashtable from the given parameter, otherwise
> + * same as hash_init_size.
> + *
> + * This has to be a macro since HASH_BITS() will not work on pointers since
> + * it calculates the size during preprocessing.
> + */
> +#define hash_init(hashtable) __hash_init(hashtable, HASH_SIZE(hashtable))
> +
> +/**
> + * hash_add - add an object to a hashtable
> + * @hashtable: hashtable to add to
> + * @node: the &struct hlist_node of the object to be added
> + * @key: the key of the object to be added
> + */
> +#define hash_add(hashtable, node, key) \
> + hlist_add_head(node, &hashtable[hash_min(key, HASH_BITS(hashtable))]);

extra ";" at the end to remove.

> +
> +/**
> + * hash_add_rcu - add an object to a rcu enabled hashtable
> + * @hashtable: hashtable to add to
> + * @node: the &struct hlist_node of the object to be added
> + * @key: the key of the object to be added
> + */
> +#define hash_add_rcu(hashtable, node, key) \
> + hlist_add_head_rcu(node, &hashtable[hash_min(key, HASH_BITS(hashtable))]);

extra ";" at the end to remove.

> +
> +/**
> + * hash_hashed - check whether an object is in any hashtable
> + * @node: the &struct hlist_node of the object to be checked
> + */
> +#define hash_hashed(node) (!hlist_unhashed(node))

Please use a static inline for this instead of a macro.

> +
> +static inline bool __hash_empty(struct hlist_head *ht, int sz)

int -> unsigned int.

> +{
> + int i;

int -> unsigned int.

> +
> + for (i = 0; i < sz; i++)
> + if (!hlist_empty(&ht[i]))
> + return false;
> +
> + return true;
> +}
> +
> +/**
> + * hash_empty - check whether a hashtable is empty
> + * @hashtable: hashtable to check
> + *
> + * This has to be a macro since HASH_BITS() will not work on pointers since
> + * it calculates the size during preprocessing.
> + */
> +#define hash_empty(hashtable) __hash_empty(hashtable, HASH_SIZE(hashtable))
> +
> +/**
> + * hash_del - remove an object from a hashtable
> + * @node: &struct hlist_node of the object to remove
> + */
> +static inline void hash_del(struct hlist_node *node)
> +{
> + hlist_del_init(node);
> +}
> +
> +/**
> + * hash_del_rcu - remove an object from a rcu enabled hashtable
> + * @node: &struct hlist_node of the object to remove
> + */
> +static inline void hash_del_rcu(struct hlist_node *node)
> +{
> + hlist_del_init_rcu(node);
> +}
> +
> +/**
> + * hash_for_each - iterate over a hashtable
> + * @name: hashtable to iterate
> + * @bkt: integer to use as bucket loop cursor
> + * @node: the &struct list_head to use as a loop cursor for each entry
> + * @obj: the type * to use as a loop cursor for each entry
> + * @member: the name of the hlist_node within the struct
> + */
> +#define hash_for_each(name, bkt, node, obj, member) \
> + for (bkt = 0, node = NULL; node == NULL && bkt < HASH_SIZE(name); bkt++)\

if "bkt" happens to be a dereferenced pointer (unary operator '*'), we
get into a situation where "*blah" has higher precedence than "=",
higher than "<", but lower than "++". Any thoughts on fixing this ?
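
One possible fix is to parenthesize every use of "bkt" in the macro. The
sketch below models only the bucket cursor handling; the macro and
function names are hypothetical, and the loop bound is kept at 1 so that
the unsafe variant stays inside the array.

```c
#include <stddef.h>

/* Hypothetical loop macros; only the bucket cursor handling is modeled. */
#define FOR_EACH_BUCKET_UNSAFE(bkt, n)	for (bkt = 0; bkt < (n); bkt++)
#define FOR_EACH_BUCKET_SAFE(bkt, n)	for ((bkt) = 0; (bkt) < (n); (bkt)++)

/* Returns how many elements the cursor pointer itself moved. */
ptrdiff_t pointer_drift_unsafe(void)
{
	unsigned int cells[2] = { 9, 9 };
	unsigned int *p = cells;
	unsigned int visited = 0;

	/* The increment expands to *p++, i.e. *(p++): p advances instead. */
	FOR_EACH_BUCKET_UNSAFE(*p, 1)
		visited++;
	(void)visited;
	return p - cells;
}

/* Same loop through the parenthesized macro. */
ptrdiff_t pointer_drift_safe(void)
{
	unsigned int cells[2] = { 9, 9 };
	unsigned int *p = cells;
	unsigned int visited = 0;

	/* The increment expands to (*p)++: the cursor value is incremented. */
	FOR_EACH_BUCKET_SAFE(*p, 1)
		visited++;
	(void)visited;
	return p - cells;
}
```

In the unsafe variant the loop only terminates because the next array
element happens to hold a value above the bound; the pointer has moved by
one element and the cursor was never incremented.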

> + hlist_for_each_entry(obj, node, &name[bkt], member)
> +
> +/**
> + * hash_for_each_rcu - iterate over a rcu enabled hashtable
> + * @name: hashtable to iterate
> + * @bkt: integer to use as bucket loop cursor
> + * @node: the &struct list_head to use as a loop cursor for each entry
> + * @obj: the type * to use as a loop cursor for each entry
> + * @member: the name of the hlist_node within the struct
> + */
> +#define hash_for_each_rcu(name, bkt, node, obj, member) \
> + for (bkt = 0, node = NULL; node == NULL && bkt < HASH_SIZE(name); bkt++)\

Same comment as above about "bkt".

> + hlist_for_each_entry_rcu(obj, node, &name[bkt], member)
> +
> +/**
> + * hash_for_each_safe - iterate over a hashtable safe against removal of
> + * hash entry
> + * @name: hashtable to iterate
> + * @bkt: integer to use as bucket loop cursor
> + * @node: the &struct list_head to use as a loop cursor for each entry
> + * @tmp: a &struct used for temporary storage
> + * @obj: the type * to use as a loop cursor for each entry
> + * @member: the name of the hlist_node within the struct
> + */
> +#define hash_for_each_safe(name, bkt, node, tmp, obj, member) \
> + for (bkt = 0, node = NULL; node == NULL && bkt < HASH_SIZE(name); bkt++)\

Same comment as above about "bkt".

Thanks,

Mathieu

> + hlist_for_each_entry_safe(obj, node, tmp, &name[bkt], member)
> +
> +/**
> + * hash_for_each_possible - iterate over all possible objects hashing to the
> + * same bucket
> + * @name: hashtable to iterate
> + * @obj: the type * to use as a loop cursor for each entry
> + * @node: the &struct list_head to use as a loop cursor for each entry
> + * @member: the name of the hlist_node within the struct
> + * @key: the key of the objects to iterate over
> + */
> +#define hash_for_each_possible(name, obj, node, member, key) \
> + hlist_for_each_entry(obj, node, &name[hash_min(key, HASH_BITS(name))], member)
> +
> +/**
> + * hash_for_each_possible_rcu - iterate over all possible objects hashing to the
> + * same bucket in an rcu enabled hashtable
> + * in a rcu enabled hashtable
> + * @name: hashtable to iterate
> + * @obj: the type * to use as a loop cursor for each entry
> + * @node: the &struct list_head to use as a loop cursor for each entry
> + * @member: the name of the hlist_node within the struct
> + * @key: the key of the objects to iterate over
> + */
> +#define hash_for_each_possible_rcu(name, obj, node, member, key) \
> + hlist_for_each_entry_rcu(obj, node, &name[hash_min(key, HASH_BITS(name))], member)
> +
> +/**
> + * hash_for_each_possible_safe - iterate over all possible objects hashing to the
> + * same bucket safe against removals
> + * @name: hashtable to iterate
> + * @obj: the type * to use as a loop cursor for each entry
> + * @node: the &struct list_head to use as a loop cursor for each entry
> + * @tmp: a &struct used for temporary storage
> + * @member: the name of the hlist_node within the struct
> + * @key: the key of the objects to iterate over
> + */
> +#define hash_for_each_possible_safe(name, obj, node, tmp, member, key) \
> + hlist_for_each_entry_safe(obj, node, tmp, \
> + &name[hash_min(key, HASH_BITS(name))], member)
> +
> +
> +#endif
> --
> 1.7.12.4
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 11:35:20

by Mathieu Desnoyers

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> Switch tracepoints to use the new hashtable implementation. This reduces the amount of
> generic unrelated code in the tracepoints.
>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> kernel/tracepoint.c | 27 +++++++++++----------------
> 1 file changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index d96ba22..854df92 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -26,6 +26,7 @@
> #include <linux/slab.h>
> #include <linux/sched.h>
> #include <linux/static_key.h>
> +#include <linux/hashtable.h>
>
> extern struct tracepoint * const __start___tracepoints_ptrs[];
> extern struct tracepoint * const __stop___tracepoints_ptrs[];
> @@ -49,8 +50,7 @@ static LIST_HEAD(tracepoint_module_list);
> * Protected by tracepoints_mutex.
> */
> #define TRACEPOINT_HASH_BITS 6
> -#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
> -static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
> +static DEFINE_HASHTABLE(tracepoint_table, TRACEPOINT_HASH_BITS);
>
[...]
>
> @@ -722,6 +715,8 @@ struct notifier_block tracepoint_module_nb = {
>
> static int init_tracepoints(void)
> {
> + hash_init(tracepoint_table);
> +
> return register_module_notifier(&tracepoint_module_nb);
> }
> __initcall(init_tracepoints);

So we have a hash table defined in .bss (therefore entirely initialized
to NULL), and you add a call to "hash_init", which iterates over the whole
array and initializes it to NULL (again)?

This extra initialization is redundant. I think it should be removed
from here, and hashtable.h should document that hash_init() doesn't need
to be called on zeroed memory (which includes static/global variables,
kzalloc'd memory, etc).
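
A small userspace sketch of that point, assuming a stand-in struct
hlist_head and using calloc() in the role of kzalloc() (both are
assumptions for illustration; the helper names are hypothetical):

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct hlist_head (assumption). */
struct hlist_node;
struct hlist_head { struct hlist_node *first; };

#define SKETCH_HASH_BITS 6

/* Static storage duration: every .first starts out as a null pointer. */
static struct hlist_head static_table[1 << SKETCH_HASH_BITS];

/* Returns 1 when all n buckets are empty, i.e. look freshly initialized. */
int table_is_empty(const struct hlist_head *ht, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ht[i].first != NULL)
			return 0;
	return 1;
}

int static_table_starts_empty(void)
{
	return table_is_empty(static_table, 1 << SKETCH_HASH_BITS);
}

/*
 * calloc() stands in for kzalloc(). This relies on NULL being all-bits-zero,
 * which the kernel assumes as well. Returns -1 if the allocation fails.
 */
int zeroed_heap_table_starts_empty(void)
{
	struct hlist_head *ht = calloc(1 << SKETCH_HASH_BITS, sizeof(*ht));
	int empty;

	if (!ht)
		return -1;
	empty = table_is_empty(ht, 1 << SKETCH_HASH_BITS);
	free(ht);
	return empty;
}
```

In both cases the table is already in the state hash_init() would
produce, without any explicit loop.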

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 12:15:11

by Mathieu Desnoyers

Subject: Re: [PATCH v7 07/16] net,9p: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> Switch 9p error table to use the new hashtable implementation. This reduces the amount of
> generic unrelated code in 9p.
>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> net/9p/error.c | 21 ++++++++++-----------
> 1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/net/9p/error.c b/net/9p/error.c
> index 2ab2de7..a5cc7dd 100644
> --- a/net/9p/error.c
> +++ b/net/9p/error.c
> @@ -34,7 +34,7 @@
> #include <linux/jhash.h>
> #include <linux/errno.h>
> #include <net/9p/9p.h>
> -
> +#include <linux/hashtable.h>

missing newline.

> /**
> * struct errormap - map string errors from Plan 9 to Linux numeric ids
> * @name: string sent over 9P
> @@ -50,8 +50,8 @@ struct errormap {
> struct hlist_node list;
> };
>
> -#define ERRHASHSZ 32
> -static struct hlist_head hash_errmap[ERRHASHSZ];
> +#define ERR_HASH_BITS 5
> +static DEFINE_HASHTABLE(hash_errmap, ERR_HASH_BITS);
>
> /* FixMe - reduce to a reasonable size */
> static struct errormap errmap[] = {
> @@ -193,18 +193,17 @@ static struct errormap errmap[] = {
> int p9_error_init(void)
> {
> struct errormap *c;
> - int bucket;
> + u32 hash;
>
> /* initialize hash table */
> - for (bucket = 0; bucket < ERRHASHSZ; bucket++)
> - INIT_HLIST_HEAD(&hash_errmap[bucket]);
> + hash_init(hash_errmap);

As for most of the other patches in this series, the hash_init is
redundant for a statically defined hash table.

Thanks,

Mathieu

>
> /* load initial error map into hash table */
> for (c = errmap; c->name != NULL; c++) {
> c->namelen = strlen(c->name);
> - bucket = jhash(c->name, c->namelen, 0) % ERRHASHSZ;
> + hash = jhash(c->name, c->namelen, 0);
> INIT_HLIST_NODE(&c->list);
> - hlist_add_head(&c->list, &hash_errmap[bucket]);
> + hash_add(hash_errmap, &c->list, hash);
> }
>
> return 1;
> @@ -223,13 +222,13 @@ int p9_errstr2errno(char *errstr, int len)
> int errno;
> struct hlist_node *p;
> struct errormap *c;
> - int bucket;
> + u32 hash;
>
> errno = 0;
> p = NULL;
> c = NULL;
> - bucket = jhash(errstr, len, 0) % ERRHASHSZ;
> - hlist_for_each_entry(c, p, &hash_errmap[bucket], list) {
> + hash = jhash(errstr, len, 0);
> + hash_for_each_possible(hash_errmap, c, p, list, hash) {
> if (c->namelen == len && !memcmp(c->name, errstr, len)) {
> errno = c->val;
> break;
> --
> 1.7.12.4
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 12:20:56

by Mathieu Desnoyers

Subject: Re: [PATCH v7 08/16] block,elevator: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
[...]
> @@ -96,6 +97,8 @@ struct elevator_type
> struct list_head list;
> };
>
> +#define ELV_HASH_BITS 6
> +
> /*
> * each queue has an elevator_queue associated with it
> */
> @@ -105,7 +108,7 @@ struct elevator_queue
> void *elevator_data;
> struct kobject kobj;
> struct mutex sysfs_lock;
> - struct hlist_head *hash;
> + DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
> unsigned int registered:1;

Hrm, so this moves "registered" out of the first cache line of
elevator_queue by turning the pointer into a 256- or 512-byte hash table.

Maybe we should consider moving "registered" before the "hash" field?
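
To put numbers on the shift, here is a simplified userspace model. It is
only a sketch: "kobj" and "sysfs_lock" are omitted, the flag is a plain
unsigned int rather than a bitfield (offsetof() cannot be applied to
bitfields), and the "ops" member and all struct names are placeholders.

```c
#include <stddef.h>

#define ELV_HASH_BITS 6

/* Stand-in for struct hlist_head: a single pointer (assumption). */
struct hlist_head_model { void *first; };

/* Layout as in the patch: the flag follows the embedded table. */
struct queue_flag_after_table {
	void *ops;		/* placeholder member */
	void *elevator_data;
	struct hlist_head_model hash[1 << ELV_HASH_BITS];
	unsigned int registered;
};

/* Suggested layout: the flag precedes the embedded table. */
struct queue_flag_before_table {
	void *ops;		/* placeholder member */
	void *elevator_data;
	unsigned int registered;
	struct hlist_head_model hash[1 << ELV_HASH_BITS];
};

/* Size of the embedded table: 64 pointers. */
size_t table_bytes(void)
{
	return sizeof(((struct queue_flag_after_table *)0)->hash);
}

size_t flag_offset_after_table(void)
{
	return offsetof(struct queue_flag_after_table, registered);
}

size_t flag_offset_before_table(void)
{
	return offsetof(struct queue_flag_before_table, registered);
}
```

With 6 hash bits the table holds 64 heads, so the flag moves by 64
pointers: 256 bytes with 32-bit pointers, 512 bytes with 64-bit ones.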

Thanks,

Mathieu

> };
>
> --
> 1.7.12.4
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 12:42:35

by Mathieu Desnoyers

Subject: Re: [PATCH v7 09/16] SUNRPC/cache: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> Switch cache to use the new hashtable implementation. This reduces the amount of
> generic unrelated code in the cache implementation.
>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> net/sunrpc/cache.c | 20 +++++++++-----------
> 1 file changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index fc2f7aa..0490546 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -28,6 +28,7 @@
> #include <linux/workqueue.h>
> #include <linux/mutex.h>
> #include <linux/pagemap.h>
> +#include <linux/hashtable.h>
> #include <asm/ioctls.h>
> #include <linux/sunrpc/types.h>
> #include <linux/sunrpc/cache.h>
> @@ -524,19 +525,18 @@ EXPORT_SYMBOL_GPL(cache_purge);
> * it to be revisited when cache info is available
> */
>
> -#define DFR_HASHSIZE (PAGE_SIZE/sizeof(struct list_head))
> -#define DFR_HASH(item) ((((long)item)>>4 ^ (((long)item)>>13)) % DFR_HASHSIZE)
> +#define DFR_HASH_BITS 9

If we look at a bit of history, mainly commit:

commit 1117449276bb909b029ed0b9ba13f53e4784db9d
Author: NeilBrown <[email protected]>
Date: Thu Aug 12 17:04:08 2010 +1000

sunrpc/cache: change deferred-request hash table to use hlist.


we'll notice that the only reason why the prior DFR_HASHSIZE was using

(PAGE_SIZE/sizeof(struct list_head))

instead of

(PAGE_SIZE/sizeof(struct hlist_head))

is that the change was forgotten in that commit. The intent there is to
make the hash table array fit in one page.

By defining DFR_HASH_BITS arbitrarily to "9", this indeed fulfills this
purpose on architectures with 4kB page size and 64-bit pointers, but not
on some powerpc configurations, and Tile architectures, which have more
exotic 64kB page size, and of course on the far less exotic 32-bit
pointer architectures.

So defining e.g.:

#include <linux/log2.h>

#define DFR_HASH_BITS (PAGE_SHIFT - ilog2(BITS_PER_LONG))

would keep the intended behavior in all cases: use one page for the hash
array.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 12:46:59

by Mathieu Desnoyers

Subject: Re: [PATCH v7 10/16] dlm: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
[...]
> @@ -158,34 +159,21 @@ static int dlm_allow_conn;
> static struct workqueue_struct *recv_workqueue;
> static struct workqueue_struct *send_workqueue;
>
> -static struct hlist_head connection_hash[CONN_HASH_SIZE];
> +static struct hlist_head connection_hash[CONN_HASH_BITS];
> static DEFINE_MUTEX(connections_lock);
> static struct kmem_cache *con_cache;
>
> static void process_recv_sockets(struct work_struct *work);
> static void process_send_sockets(struct work_struct *work);
>
> -
> -/* This is deliberately very simple because most clusters have simple
> - sequential nodeids, so we should be able to go straight to a connection
> - struct in the array */
> -static inline int nodeid_hash(int nodeid)
> -{
> - return nodeid & (CONN_HASH_SIZE-1);
> -}

There is one thing I dislike about this change: you remove a useful
comment. It's good to be informed of the reason why a direct mapping
"value -> hash" without any dispersion function is preferred here.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 13:04:08

by Mathieu Desnoyers

Subject: Re: [PATCH v7 11/16] net,l2tp: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
[...]
> -/* Session hash global list for L2TPv3.
> - * The session_id SHOULD be random according to RFC3931, but several
> - * L2TP implementations use incrementing session_ids. So we do a real
> - * hash on the session_id, rather than a simple bitmask.
> - */
> -static inline struct hlist_head *
> -l2tp_session_id_hash_2(struct l2tp_net *pn, u32 session_id)
> -{
> - return &pn->l2tp_session_hlist[hash_32(session_id, L2TP_HASH_BITS_2)];
> -
> -}

I understand that you removed this hash function, as well as
"l2tp_session_id_hash" below, but is there any way we could leave those
comments in place ? They look useful.

> -/* Session hash list.
> - * The session_id SHOULD be random according to RFC2661, but several
> - * L2TP implementations (Cisco and Microsoft) use incrementing
> - * session_ids. So we do a real hash on the session_id, rather than a
> - * simple bitmask.

Ditto.

> - */
> -static inline struct hlist_head *
> -l2tp_session_id_hash(struct l2tp_tunnel *tunnel, u32 session_id)
> -{
> - return &tunnel->session_hlist[hash_32(session_id, L2TP_HASH_BITS)];
> -}
> -
> /* Lookup a session by id
> */

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 13:07:46

by Mathieu Desnoyers

Subject: Re: [PATCH v7 10/16] dlm: use new hashtable implementation

* Mathieu Desnoyers ([email protected]) wrote:
> * Sasha Levin ([email protected]) wrote:
> [...]
> > @@ -158,34 +159,21 @@ static int dlm_allow_conn;
> > static struct workqueue_struct *recv_workqueue;
> > static struct workqueue_struct *send_workqueue;
> >
> > -static struct hlist_head connection_hash[CONN_HASH_SIZE];
> > +static struct hlist_head connection_hash[CONN_HASH_BITS];
> > static DEFINE_MUTEX(connections_lock);
> > static struct kmem_cache *con_cache;
> >
> > static void process_recv_sockets(struct work_struct *work);
> > static void process_send_sockets(struct work_struct *work);
> >
> > -
> > -/* This is deliberately very simple because most clusters have simple
> > - sequential nodeids, so we should be able to go straight to a connection
> > - struct in the array */
> > -static inline int nodeid_hash(int nodeid)
> > -{
> > - return nodeid & (CONN_HASH_SIZE-1);
> > -}
>
> There is one thing I dislike about this change: you remove a useful
> comment. It's good to be informed of the reason why a direct mapping
> "value -> hash" without any dispersion function is preferred here.

And now that I come to think of it: you're changing the behavior : you
will now use a dispersion function on the key, which goes against the
intent expressed in this comment.

It might be good to change hash_add(), hash_add_rcu(),
hash_for_each_possible*() key parameter for a "hash" parameter, and let
the caller provide the hash value computed by the function they like as
parameter, rather than enforcing hash_32/hash_64.

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 13:23:27

by Mathieu Desnoyers

Subject: Re: [PATCH v7 13/16] lockd: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> Switch lockd to use the new hashtable implementation. This reduces the amount of
> generic unrelated code in lockd.
>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> fs/lockd/svcsubs.c | 66 +++++++++++++++++++++++++++++-------------------------
> 1 file changed, 36 insertions(+), 30 deletions(-)
>
> diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
> index 0deb5f6..d223a1f 100644
> --- a/fs/lockd/svcsubs.c
> +++ b/fs/lockd/svcsubs.c
> @@ -20,6 +20,7 @@
> #include <linux/lockd/share.h>
> #include <linux/module.h>
> #include <linux/mount.h>
> +#include <linux/hashtable.h>
>
> #define NLMDBG_FACILITY NLMDBG_SVCSUBS
>
> @@ -28,8 +29,7 @@
> * Global file hash table
> */
> #define FILE_HASH_BITS 7
> -#define FILE_NRHASH (1<<FILE_HASH_BITS)
> -static struct hlist_head nlm_files[FILE_NRHASH];
> +static DEFINE_HASHTABLE(nlm_files, FILE_HASH_BITS);
> static DEFINE_MUTEX(nlm_file_mutex);
>
> #ifdef NFSD_DEBUG
> @@ -68,7 +68,7 @@ static inline unsigned int file_hash(struct nfs_fh *f)
> int i;
> for (i=0; i<NFS2_FHSIZE;i++)
> tmp += f->data[i];
> - return tmp & (FILE_NRHASH - 1);
> + return tmp;
> }
>
> /*
> @@ -86,17 +86,17 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result,
> {
> struct hlist_node *pos;
> struct nlm_file *file;
> - unsigned int hash;
> + unsigned int key;
> __be32 nfserr;
>
> nlm_debug_print_fh("nlm_lookup_file", f);
>
> - hash = file_hash(f);
> + key = file_hash(f);
>
> /* Lock file table */
> mutex_lock(&nlm_file_mutex);
>
> - hlist_for_each_entry(file, pos, &nlm_files[hash], f_list)
> + hash_for_each_possible(nlm_files, file, pos, f_list, file_hash(f))

we have a nice example of weirdness about key vs hash here:

1) "key" is computed from file_hash(f)
2) file_hash(f) is computed again and again in hash_for_each_possible()

> if (!nfs_compare_fh(&file->f_handle, f))
> goto found;
>
> @@ -123,7 +123,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result,
> goto out_free;
> }
>
> - hlist_add_head(&file->f_list, &nlm_files[hash]);
> + hash_add(nlm_files, &file->f_list, key);

3) then we use "key" as parameter to hash_add.

Moreover, we're adding dispersion to the file_hash() with the hash_32()
called under the hood within hashtable.h. Is it an intended behavior ?
This should at the very least be documented in the changelog.

[...]

> +static int __init nlm_init(void)
> +{
> + hash_init(nlm_files);

Useless.

Thanks,

Mathieu

> + return 0;
> +}
> +
> +module_init(nlm_init);
> --
> 1.7.12.4
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 13:25:56

by Mathieu Desnoyers

Subject: Re: [PATCH v7 14/16] net,rds: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> Switch rds to use the new hashtable implementation. This reduces the amount of
> generic unrelated code in rds.
>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> net/rds/bind.c | 28 +++++++++-----
> net/rds/connection.c | 102 +++++++++++++++++++++++----------------------------
> 2 files changed, 63 insertions(+), 67 deletions(-)
>
> diff --git a/net/rds/bind.c b/net/rds/bind.c
> index 637bde5..79d65ce 100644
> --- a/net/rds/bind.c
> +++ b/net/rds/bind.c
> @@ -36,16 +36,16 @@
> #include <linux/if_arp.h>
> #include <linux/jhash.h>
> #include <linux/ratelimit.h>
> +#include <linux/hashtable.h>
> #include "rds.h"
>
> -#define BIND_HASH_SIZE 1024
> -static struct hlist_head bind_hash_table[BIND_HASH_SIZE];
> +#define BIND_HASH_BITS 10
> +static DEFINE_HASHTABLE(bind_hash_table, BIND_HASH_BITS);
> static DEFINE_SPINLOCK(rds_bind_lock);
>
> -static struct hlist_head *hash_to_bucket(__be32 addr, __be16 port)
> +static u32 rds_hash(__be32 addr, __be16 port)
> {
> - return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
> - (BIND_HASH_SIZE - 1));
> + return jhash_2words((u32)addr, (u32)port, 0);
> }
>
> static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
> @@ -53,12 +53,12 @@ static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port,
> {
> struct rds_sock *rs;
> struct hlist_node *node;
> - struct hlist_head *head = hash_to_bucket(addr, port);
> + u32 key = rds_hash(addr, port);
> u64 cmp;
> u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
>
> rcu_read_lock();
> - hlist_for_each_entry_rcu(rs, node, head, rs_bound_node) {
> + hash_for_each_possible_rcu(bind_hash_table, rs, node, rs_bound_node, key) {

here too, key will be hashed twice:

- once by jhash_2words,
- once by hash_32(),

is this intended ?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 13:29:38

by Mathieu Desnoyers

Subject: Re: [PATCH v7 15/16] openvswitch: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
[...]
> -static struct hlist_head *hash_bucket(struct net *net, const char *name)
> -{
> - unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
> - return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
> -}
> -
> /**
> * ovs_vport_locate - find a port that has already been created
> *
> @@ -84,13 +76,12 @@ static struct hlist_head *hash_bucket(struct net *net, const char *name)
> */
> struct vport *ovs_vport_locate(struct net *net, const char *name)
> {
> - struct hlist_head *bucket = hash_bucket(net, name);
> struct vport *vport;
> struct hlist_node *node;
> + int key = full_name_hash(name, strlen(name));
>
> - hlist_for_each_entry_rcu(vport, node, bucket, hash_node)
> - if (!strcmp(name, vport->ops->get_name(vport)) &&
> - net_eq(ovs_dp_get_net(vport->dp), net))
> + hash_for_each_possible_rcu(dev_table, vport, node, hash_node, key)

Is applying hash_32() on top of full_name_hash() needed and expected ?

Thanks,

Mathieu

> + if (!strcmp(name, vport->ops->get_name(vport)))
> return vport;
>
> return NULL;
> @@ -174,7 +165,8 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
>
> for (i = 0; i < ARRAY_SIZE(vport_ops_list); i++) {
> if (vport_ops_list[i]->type == parms->type) {
> - struct hlist_head *bucket;
> + int key;
> + const char *name;
>
> vport = vport_ops_list[i]->create(parms);
> if (IS_ERR(vport)) {
> @@ -182,9 +174,9 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
> goto out;
> }
>
> - bucket = hash_bucket(ovs_dp_get_net(vport->dp),
> - vport->ops->get_name(vport));
> - hlist_add_head_rcu(&vport->hash_node, bucket);
> + name = vport->ops->get_name(vport);
> + key = full_name_hash(name, strlen(name));
> + hash_add_rcu(dev_table, &vport->hash_node, key);
> return vport;
> }
> }
> @@ -225,7 +217,7 @@ void ovs_vport_del(struct vport *vport)
> {
> ASSERT_RTNL();
>
> - hlist_del_rcu(&vport->hash_node);
> + hash_del_rcu(&vport->hash_node);
>
> vport->ops->destroy(vport);
> }
> --
> 1.7.12.4
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 14:50:07

by Linus Torvalds

Subject: Re: [PATCH v7 09/16] SUNRPC/cache: use new hashtable implementation

On Mon, Oct 29, 2012 at 5:42 AM, Mathieu Desnoyers
<[email protected]> wrote:
>
> So defining e.g.:
>
> #include <linux/log2.h>
>
> #define DFR_HASH_BITS (PAGE_SHIFT - ilog2(BITS_PER_LONG))
>
> would keep the intended behavior in all cases: use one page for the hash
> array.

Well, since that wasn't true before either because of the long-time
bug you point out, clearly the page size isn't all that important. I
think it's more important to have small and simple code, and "9" is
certainly that, compared to playing ilog2 games with not-so-obvious
things.

Because there's no reason to believe that '9' is in any way a worse
random number than something page-shift-related, is there? And getting
away from *previous* overly-complicated size calculations that had
been broken because they were too complicated and random, sounds like
a good idea.

Linus

2012-10-29 15:13:48

by Mathieu Desnoyers

Subject: Re: [PATCH v7 09/16] SUNRPC/cache: use new hashtable implementation

* Linus Torvalds ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 5:42 AM, Mathieu Desnoyers
> <[email protected]> wrote:
> >
> > So defining e.g.:
> >
> > #include <linux/log2.h>
> >
> > #define DFR_HASH_BITS (PAGE_SHIFT - ilog2(BITS_PER_LONG))
> >
> > would keep the intended behavior in all cases: use one page for the hash
> > array.
>
> Well, since that wasn't true before either because of the long-time
> bug you point out, clearly the page size isn't all that important. I
> think it's more important to have small and simple code, and "9" is
> certainly that, compared to playing ilog2 games with not-so-obvious
> things.
>
> Because there's no reason to believe that '9' is in any way a worse
> random number than something page-shift-related, is there? And getting
> away from *previous* overly-complicated size calculations that had
> been broken because they were too complicated and random, sounds like
> a good idea.

Good point. I agree that unless we really care about the precise number
of TLB entries and cache lines used by this hash table, we might want to
stay away from page-size and pointer-size based calculation.

It might not hurt to explain this in the patch changelog though.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 15:17:35

by J. Bruce Fields

Subject: Re: [PATCH v7 09/16] SUNRPC/cache: use new hashtable implementation

On Mon, Oct 29, 2012 at 11:13:43AM -0400, Mathieu Desnoyers wrote:
> * Linus Torvalds ([email protected]) wrote:
> > On Mon, Oct 29, 2012 at 5:42 AM, Mathieu Desnoyers
> > <[email protected]> wrote:
> > >
> > > So defining e.g.:
> > >
> > > #include <linux/log2.h>
> > >
> > > #define DFR_HASH_BITS (PAGE_SHIFT - ilog2(BITS_PER_LONG))
> > >
> > > would keep the intended behavior in all cases: use one page for the hash
> > > array.
> >
> > Well, since that wasn't true before either because of the long-time
> > bug you point out, clearly the page size isn't all that important. I
> > think it's more important to have small and simple code, and "9" is
> > certainly that, compared to playing ilog2 games with not-so-obvious
> > things.
> >
> > Because there's no reason to believe that '9' is in any way a worse
> > random number than something page-shift-related, is there? And getting
> > away from *previous* overly-complicated size calculations that had
> > been broken because they were too complicated and random, sounds like
> > a good idea.
>
> Good point. I agree that unless we really care about the precise number
> of TLB entries and cache lines used by this hash table, we might want to
> stay away from page-size and pointer-size based calculation.
>
> It might not hurt to explain this in the patch changelog though.

I'd also be happy to take that as a separate patch now.

--b.

2012-10-29 15:41:14

by Mathieu Desnoyers

Subject: Re: [PATCH v7 09/16] SUNRPC/cache: use new hashtable implementation

* J. Bruce Fields ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 11:13:43AM -0400, Mathieu Desnoyers wrote:
> > * Linus Torvalds ([email protected]) wrote:
> > > On Mon, Oct 29, 2012 at 5:42 AM, Mathieu Desnoyers
> > > <[email protected]> wrote:
> > > >
> > > > So defining e.g.:
> > > >
> > > > #include <linux/log2.h>
> > > >
> > > > #define DFR_HASH_BITS (PAGE_SHIFT - ilog2(BITS_PER_LONG))
> > > >
> > > > would keep the intended behavior in all cases: use one page for the hash
> > > > array.
> > >
> > > Well, since that wasn't true before either because of the long-time
> > > bug you point out, clearly the page size isn't all that important. I
> > > think it's more important to have small and simple code, and "9" is
> > > certainly that, compared to playing ilog2 games with not-so-obvious
> > > things.
> > >
> > > Because there's no reason to believe that '9' is in any way a worse
> > > random number than something page-shift-related, is there? And getting
> > > away from *previous* overly-complicated size calculations that had
> > > been broken because they were too complicated and random, sounds like
> > > a good idea.
> >
> > Good point. I agree that unless we really care about the precise number
> > of TLB entries and cache lines used by this hash table, we might want to
> > stay away from page-size and pointer-size based calculation.
> >
> > It might not hurt to explain this in the patch changelog though.
>
> I'd also be happy to take that as a separate patch now.

FWIW: I've made a nice boo-boo above. It should have been:

#define DFR_HASH_BITS (PAGE_SHIFT - ilog2(sizeof(struct hlist_head)))

Because memory is indexed in bytes, not in bits. I
guess this goes a long way toward proving Linus' point about the virtues of
trivial code. ;-)
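
To make the corrected arithmetic concrete, here is a minimal userspace
sketch (not kernel code). The helper names ilog2_sketch() and
dfr_hash_bits_sketch() are made up for illustration, and the page shift and
bucket-head size are passed in as plain parameters, standing in for
PAGE_SHIFT and sizeof(struct hlist_head):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's ilog2(); only valid for n >= 1. */
static unsigned int ilog2_sketch(size_t n)
{
	unsigned int log = 0;

	assert(n >= 1);
	while (n >>= 1)
		log++;
	return log;
}

/*
 * Number of hash bits such that (1 << bits) bucket heads fill exactly one
 * page. Both arguments are assumptions of this sketch: in the kernel they
 * would be PAGE_SHIFT and sizeof(struct hlist_head) (a single pointer).
 */
static unsigned int dfr_hash_bits_sketch(unsigned int page_shift,
					 size_t head_size)
{
	return page_shift - ilog2_sketch(head_size);
}
```

With a 4kB page and 8-byte pointers this gives 12 - 3 = 9, matching the
hard-coded value; with 4-byte pointers it gives 10, and with a 64kB page and
8-byte pointers it gives 13. Subtracting ilog2(BITS_PER_LONG) instead would
give 6 on a 64-bit, 4kB-page system, i.e. 64 buckets or 512 bytes.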

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 15:44:25

by Sasha Levin

Subject: Re: [PATCH v7 15/16] openvswitch: use new hashtable implementation

Hi Mathieu,

On Mon, Oct 29, 2012 at 9:29 AM, Mathieu Desnoyers
<[email protected]> wrote:
> * Sasha Levin ([email protected]) wrote:
> [...]
>> -static struct hlist_head *hash_bucket(struct net *net, const char *name)
>> -{
>> - unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
>> - return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
>> -}
>> -
>> /**
>> * ovs_vport_locate - find a port that has already been created
>> *
>> @@ -84,13 +76,12 @@ static struct hlist_head *hash_bucket(struct net *net, const char *name)
>> */
>> struct vport *ovs_vport_locate(struct net *net, const char *name)
>> {
>> - struct hlist_head *bucket = hash_bucket(net, name);
>> struct vport *vport;
>> struct hlist_node *node;
>> + int key = full_name_hash(name, strlen(name));
>>
>> - hlist_for_each_entry_rcu(vport, node, bucket, hash_node)
>> - if (!strcmp(name, vport->ops->get_name(vport)) &&
>> - net_eq(ovs_dp_get_net(vport->dp), net))
>> + hash_for_each_possible_rcu(dev_table, vport, node, hash_node, key)
>
> Is applying hash_32() on top of full_name_hash() needed and expected ?

Since this was pointed out in several of the patches, I'll answer it
just once here.

I've intentionally "allowed" double hashing with hash_32 to keep the
code simple.

hash_32() is pretty simple and gcc optimizes it to be almost nothing,
so doing that costs us a multiplication and a shift. On the other
hand, we benefit from keeping our code simple - how would we avoid
doing this double hash? adding a different hashtable function for
strings? or a new function for already hashed keys? I think we benefit
a lot from having to mul/shr instead of adding extra lines of code
here.
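
As a rough userspace illustration of the double hashing described above: a
caller hashes a string to 32 bits, and the table then applies a
multiply-and-shift on top to pick a bucket. Everything here is an assumption
made for the sketch: name_hash_sketch() is plain FNV-1a, not
full_name_hash(), and hash_32_sketch() is only modelled on the kernel's
hash_32() with the 0x9e370001 multiplier.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in string hash (32-bit FNV-1a); NOT the kernel's full_name_hash(). */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t h = UINT32_C(2166136261);	/* FNV offset basis */

	while (*name) {
		h ^= (unsigned char)*name++;
		h *= UINT32_C(16777619);	/* FNV prime */
	}
	return h;
}

/* One multiplication and one shift, keeping the top "bits" bits. */
static uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(val * UINT32_C(0x9e370001)) >> (32 - bits);
}

/* The "double hash": string hash first, then the table's own hash on top. */
static uint32_t bucket_for_name(const char *name, unsigned int bits)
{
	return hash_32_sketch(name_hash_sketch(name), bits);
}
```

The second step costs the mul/shr mentioned above and always yields an index
below 1 << bits, whatever the first hash produced.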


Thanks,
Sasha

2012-10-29 15:54:22

by Sasha Levin

Subject: Re: [PATCH v7 10/16] dlm: use new hashtable implementation

On Mon, Oct 29, 2012 at 9:07 AM, Mathieu Desnoyers
<[email protected]> wrote:
> * Mathieu Desnoyers ([email protected]) wrote:
>> * Sasha Levin ([email protected]) wrote:
>> [...]
>> > @@ -158,34 +159,21 @@ static int dlm_allow_conn;
>> > static struct workqueue_struct *recv_workqueue;
>> > static struct workqueue_struct *send_workqueue;
>> >
>> > -static struct hlist_head connection_hash[CONN_HASH_SIZE];
>> > +static struct hlist_head connection_hash[CONN_HASH_BITS];
>> > static DEFINE_MUTEX(connections_lock);
>> > static struct kmem_cache *con_cache;
>> >
>> > static void process_recv_sockets(struct work_struct *work);
>> > static void process_send_sockets(struct work_struct *work);
>> >
>> > -
>> > -/* This is deliberately very simple because most clusters have simple
>> > - sequential nodeids, so we should be able to go straight to a connection
>> > - struct in the array */
>> > -static inline int nodeid_hash(int nodeid)
>> > -{
>> > - return nodeid & (CONN_HASH_SIZE-1);
>> > -}
>>
>> There is one thing I dislike about this change: you remove a useful
>> comment. It's good to be informed of the reason why a direct mapping
>> "value -> hash" without any dispersion function is preferred here.

Yes, I've removed the comment because it's no longer true with the patch :)

> And now that I come to think of it: you're changing the behavior : you
> will now use a dispersion function on the key, which goes against the
> intent expressed in this comment.

The comment gave us the information that nodeids are mostly
sequential, we no longer need to rely on that.

> It might be good to change hash_add(), hash_add_rcu(),
> hash_for_each_possible*() key parameter for a "hash" parameter, and let
> the caller provide the hash value computed by the function they like as
> parameter, rather than enforcing hash_32/hash_64.

Why? We already proved that hash_32() is more than enough as a hashing
function, why complicate things?

Even doing hash_32() on top of another hash is probably a good idea to
keep things simple.

Thanks,
Sasha

2012-10-29 16:00:03

by Mathieu Desnoyers

Subject: Re: [PATCH v7 15/16] openvswitch: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> Hi Mathieu,
>
> On Mon, Oct 29, 2012 at 9:29 AM, Mathieu Desnoyers
> <[email protected]> wrote:
> > * Sasha Levin ([email protected]) wrote:
> > [...]
> >> -static struct hlist_head *hash_bucket(struct net *net, const char *name)
> >> -{
> >> - unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
> >> - return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
> >> -}
> >> -
> >> /**
> >> * ovs_vport_locate - find a port that has already been created
> >> *
> >> @@ -84,13 +76,12 @@ static struct hlist_head *hash_bucket(struct net *net, const char *name)
> >> */
> >> struct vport *ovs_vport_locate(struct net *net, const char *name)
> >> {
> >> - struct hlist_head *bucket = hash_bucket(net, name);
> >> struct vport *vport;
> >> struct hlist_node *node;
> >> + int key = full_name_hash(name, strlen(name));
> >>
> >> - hlist_for_each_entry_rcu(vport, node, bucket, hash_node)
> >> - if (!strcmp(name, vport->ops->get_name(vport)) &&
> >> - net_eq(ovs_dp_get_net(vport->dp), net))
> >> + hash_for_each_possible_rcu(dev_table, vport, node, hash_node, key)
> >
> > Is applying hash_32() on top of full_name_hash() needed and expected ?
>
> Since this was pointed out in several of the patches, I'll answer it
> just once here.
>
> I've intentionally "allowed" double hashing with hash_32 to keep the
> code simple.
>
> hash_32() is pretty simple and gcc optimizes it to be almost nothing,
> so doing that costs us a multiplication and a shift. On the other
> hand, we benefit from keeping our code simple - how would we avoid
> doing this double hash? adding a different hashtable function for
> strings? or a new function for already hashed keys? I think we benefit
> a lot from having to mul/shr instead of adding extra lines of code
> here.

This could be done, as I pointed out in another email within this
thread, by changing the "key" argument of add/for_each_possible to an
expected "hash" value, and letting the caller invoke hash_32() if they want.
I doubt this would add a significant amount of complexity for users of
this API, but it would allow much more flexibility in choosing hash
functions.
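
A minimal userspace sketch of what such an interface could look like. This
is not the hashtable.h API from the patch: every name below is hypothetical,
the list is a plain singly linked list rather than an hlist, and reducing
the caller's hash with a mask is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS_SKETCH 4
#define TABLE_SIZE_SKETCH (1u << TABLE_BITS_SKETCH)

struct node_sketch {
	struct node_sketch *next;
	uint32_t id;
};

struct table_sketch {
	struct node_sketch *buckets[TABLE_SIZE_SKETCH];
};

/* The table takes an already computed hash and only reduces it to an index. */
static void table_add_hashed(struct table_sketch *t, struct node_sketch *n,
			     uint32_t hash)
{
	struct node_sketch **head = &t->buckets[hash & (TABLE_SIZE_SKETCH - 1)];

	n->next = *head;
	*head = n;
}

static struct node_sketch *table_find_hashed(struct table_sketch *t,
					     uint32_t hash, uint32_t id)
{
	struct node_sketch *n;

	for (n = t->buckets[hash & (TABLE_SIZE_SKETCH - 1)]; n; n = n->next)
		if (n->id == id)
			return n;
	return NULL;
}

/* Two caller-side policies: direct mapping, or multiplicative dispersion. */
static uint32_t hash_identity(uint32_t id)
{
	return id;
}

static uint32_t hash_mult(uint32_t id)
{
	return (uint32_t)(id * UINT32_C(0x9e370001)) >> (32 - TABLE_BITS_SKETCH);
}
```

A caller that wants the old direct mapping passes hash_identity(id); one
that wants dispersion passes hash_mult(id). The price is that each caller
must use the same function for add and lookup.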

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 16:07:09

by Sasha Levin

Subject: Re: [PATCH v7 01/16] hashtable: introduce a small and naive hashtable

On Mon, Oct 29, 2012 at 7:29 AM, Mathieu Desnoyers
<[email protected]> wrote:
> * Sasha Levin ([email protected]) wrote:
>> +
>> + for (i = 0; i < sz; i++)
>> + INIT_HLIST_HEAD(&ht[sz]);
>
> ouch. How did this work ? Has it been tested at all ?
>
> sz -> i

Funny enough, it works perfectly. Generally as a test I boot the
kernel in a VM and let it fuzz with trinity for a bit, doing that with
the code above worked flawlessly.

While it works, it's obviously wrong. Why does it work though? Usually
there's a list op happening pretty soon after that which brings the
list into proper state.

I've been playing with a patch that adds a magic value into list_head
if CONFIG_DEBUG_LIST is set, and checks that magic in the list debug
code in lib/list_debug.c.

Does it sound like something useful? If so I'll send that patch out.


Thanks,
Sasha

2012-10-29 16:07:22

by Mathieu Desnoyers

Subject: Re: [PATCH v7 10/16] dlm: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 9:07 AM, Mathieu Desnoyers
> <[email protected]> wrote:
> > * Mathieu Desnoyers ([email protected]) wrote:
> >> * Sasha Levin ([email protected]) wrote:
> >> [...]
> >> > @@ -158,34 +159,21 @@ static int dlm_allow_conn;
> >> > static struct workqueue_struct *recv_workqueue;
> >> > static struct workqueue_struct *send_workqueue;
> >> >
> >> > -static struct hlist_head connection_hash[CONN_HASH_SIZE];
> >> > +static struct hlist_head connection_hash[CONN_HASH_BITS];
> >> > static DEFINE_MUTEX(connections_lock);
> >> > static struct kmem_cache *con_cache;
> >> >
> >> > static void process_recv_sockets(struct work_struct *work);
> >> > static void process_send_sockets(struct work_struct *work);
> >> >
> >> > -
> >> > -/* This is deliberately very simple because most clusters have simple
> >> > - sequential nodeids, so we should be able to go straight to a connection
> >> > - struct in the array */
> >> > -static inline int nodeid_hash(int nodeid)
> >> > -{
> >> > - return nodeid & (CONN_HASH_SIZE-1);
> >> > -}
> >>
> >> There is one thing I dislike about this change: you remove a useful
> >> comment. It's good to be informed of the reason why a direct mapping
> >> "value -> hash" without any dispersion function is preferred here.
>
> Yes, I've removed the comment because it's no longer true with the patch :)
>
> > And now that I come to think of it: you're changing the behavior : you
> > will now use a dispersion function on the key, which goes against the
> > intent expressed in this comment.
>
> The comment gave us the information that nodeids are mostly
> sequential, we no longer need to rely on that.

I'm fine with turning a direct + modulo mapping into a dispersed hash as
long as there are no underlying assumptions about sequentiality of value
accesses.

If the access pattern happens to be typically sequential, then
adding dispersion could hurt performance significantly, turning a
frequent L1 access into an L2 access, for instance.
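
To illustrate the difference in bucket placement, here is a small userspace
sketch comparing the removed mask-based nodeid_hash() with a
multiply-and-shift hash. The table size of 32 buckets and the 0x9e370001
multiplier are assumptions of this sketch, and hash_32_sketch() is only
modelled on the kernel's hash_32(), not copied from it:

```c
#include <assert.h>
#include <stdint.h>

#define CONN_HASH_BITS_SKETCH 5	/* hypothetical: 32 buckets */
#define CONN_HASH_SIZE_SKETCH (1u << CONN_HASH_BITS_SKETCH)

/* Direct mapping, as in the removed nodeid_hash(): keep the low bits. */
static unsigned int nodeid_hash_direct(int nodeid)
{
	return (unsigned int)nodeid & (CONN_HASH_SIZE_SKETCH - 1);
}

/* Multiplicative dispersion: multiply, then keep the top "bits" bits. */
static uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(val * UINT32_C(0x9e370001)) >> (32 - bits);
}
```

With these assumptions, nodeids 1 and 2 land in adjacent buckets 1 and 2
under the direct mapping, but in buckets 19 and 7 under the multiplicative
hash, so sequential ids no longer touch neighbouring bucket heads.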

>
> > It might be good to change hash_add(), hash_add_rcu(),
> > hash_for_each_possible*() key parameter for a "hash" parameter, and let
> > the caller provide the hash value computed by the function they like as
> > parameter, rather than enforcing hash_32/hash_64.
>
> Why? We already proved that hash_32() is more than enough as a hashing
> function, why complicate things?
>
> Even doing hash_32() on top of another hash is probably a good idea to
> keep things simple.

All I'm asking is: have you made sure that this hash table is not
deliberately kept sequential (without dispersion) to accelerate specific
access patterns ? This should at least be documented in the changelog.

Thanks,

Mathieu


>
> Thanks,
> Sasha

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 16:14:19

by Mathieu Desnoyers

Subject: Re: [PATCH v7 01/16] hashtable: introduce a small and naive hashtable

* Sasha Levin ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 7:29 AM, Mathieu Desnoyers
> <[email protected]> wrote:
> > * Sasha Levin ([email protected]) wrote:
> >> +
> >> + for (i = 0; i < sz; i++)
> >> + INIT_HLIST_HEAD(&ht[sz]);
> >
> > ouch. How did this work ? Has it been tested at all ?
> >
> > sz -> i
>
> Funny enough, it works perfectly. Generally as a test I boot the
> kernel in a VM and let it fuzz with trinity for a bit, doing that with
> the code above worked flawlessly.
>
> While it works, it's obviously wrong. Why does it work though? Usually
> there's a list op happening pretty soon after that which brings the
> list into proper state.
>
> I've been playing with a patch that adds a magic value into list_head
> if CONFIG_DEBUG_LIST is set, and checks that magic in the list debug
> code in lib/list_debug.c.
>
> Does it sound like something useful? If so I'll send that patch out.

Most of the calls to this initialization function apply it to zeroed
memory (static/kzalloc'd...), which makes it useless. I'd actually be in
favor of removing those redundant calls (as I pointed out in another
email), and documenting that zeroed memory doesn't need to be explicitly
initialized.

Those sites that need to really reinitialize memory, or initialize it
(if located on the stack or in non-zeroed dynamically allocated memory)
could use a memset to 0, which will likely be faster than setting to
NULL on many architectures.

About testing, I'd recommend taking the few sites that still need the
initialization function, and just initialize the array with garbage
before calling the initialization function. Things should blow up quite
quickly. Doing it as a one-off thing might be enough to catch any issue.
I don't think we need extra magic numbers to catch issues in this rather
obvious init function.
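
A userspace sketch of that one-off test, under stated assumptions:
struct hlist_head_sketch and hash_init_sketch() are stand-ins written for
this example (the latter with the sz -> i fix applied), and 0xa5 is an
arbitrary garbage pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct hlist_head: a single "first" pointer. */
struct hlist_head_sketch {
	void *first;
};

/* Corrected init loop: index with i, not with sz. */
static void hash_init_sketch(struct hlist_head_sketch *ht, size_t sz)
{
	size_t i;

	for (i = 0; i < sz; i++)
		ht[i].first = NULL;
}

/*
 * Fill the array with garbage, run the init function, and count how many
 * heads were left untouched. A correct init function yields 0.
 */
static size_t count_stale_heads_after_init(struct hlist_head_sketch *ht,
					   size_t sz)
{
	size_t i, stale = 0;

	memset(ht, 0xa5, sz * sizeof(*ht));
	hash_init_sketch(ht, sz);
	for (i = 0; i < sz; i++)
		if (ht[i].first != NULL)
			stale++;
	return stale;
}
```

Run against already zeroed memory, the same check would pass even with the
buggy loop, which is why the garbage fill matters.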

Thanks,

Mathieu

>
>
> Thanks,
> Sasha

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 16:18:17

by Tejun Heo

Subject: Re: [PATCH v7 01/16] hashtable: introduce a small and naive hashtable

Hello,

On Mon, Oct 29, 2012 at 12:14:12PM -0400, Mathieu Desnoyers wrote:
> Most of the calls to this initialization function apply it to zeroed
> memory (static/kzalloc'd...), which makes it useless. I'd actually be in
> favor of removing those redundant calls (as I pointed out in another
> email), and documenting that zeroed memory doesn't need to be explicitly
> initialized.
>
> Those sites that need to really reinitialize memory, or initialize it
> (if located on the stack or in non-zeroed dynamically allocated memory)
> could use a memset to 0, which will likely be faster than setting to
> NULL on many architectures.

I don't think it's a good idea to optimize out the basic encapsulation
there. We're talking about re-zeroing some static memory areas which
are pretty small. It's just not worth optimizing out at the cost of
proper initialization, e.g. we might add debug fields to list_head
later.

Thanks.

--
tejun

2012-10-29 16:22:26

by Mathieu Desnoyers

Subject: Re: [PATCH v7 01/16] hashtable: introduce a small and naive hashtable

* Tejun Heo ([email protected]) wrote:
> Hello,
>
> On Mon, Oct 29, 2012 at 12:14:12PM -0400, Mathieu Desnoyers wrote:
> > Most of the calls to this initialization function apply it on zeroed
> > memory (static/kzalloc'd...), which makes it useless. I'd actually be in
> > favor of removing those redundant calls (as I pointed out in another
> > email), and document that zeroed memory don't need to be explicitly
> > initialized.
> >
> > Those sites that need to really reinitialize memory, or initialize it
> > (if located on the stack or in non-zeroed dynamically allocated memory)
> > could use a memset to 0, which will likely be faster than setting to
> > NULL on many architectures.
>
> I don't think it's a good idea to optimize out the basic encapsulation
> there. We're talking about re-zeroing some static memory areas which
> are pretty small. It's just not worth optimizing out at the cost of
> proper initialization. e.g. We might add debug fields to list_head
> later.

Future-proofness for debugging fields is indeed a very compelling
argument. Fair enough!

We might want to document this intent at the top of the initialization
function though, just in case anyone wants to short-circuit it.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 16:24:46

by David Teigland

Subject: Re: [PATCH v7 10/16] dlm: use new hashtable implementation

On Mon, Oct 29, 2012 at 12:07:10PM -0400, Mathieu Desnoyers wrote:
> I'm fine with turning a direct + modulo mapping into a dispersed hash as
> long as there are no underlying assumptions about sequentiality of value
> accesses.
>
> If the access pattern would happen to be typically sequential, then
> adding dispersion could hurt performances significantly, turning a
> frequent L1 access into a L2 access for instance.

> All I'm asking is: have you made sure that this hash table is not
> deliberately kept sequential (without dispersion) to accelerate specific
> access patterns ? This should at least be documented in the changelog.

It was not intentional. I don't expect any benefit would be lost by
making it non-sequential.
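
To make the difference concrete, here is a userspace sketch comparing a
direct modulo mapping with a dispersed one. It assumes the multiplicative
hash_32() of kernels from this period; the constant and function body are
reproduced from memory of include/linux/hash.h and should be checked
against the tree. bucket_modulo() and bucket_hashed() are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the 32-bit multiplier used by hash_32() at the time. */
#define GOLDEN_RATIO_PRIME_32 0x9e370001U

/* Multiply, then keep the top 'bits' bits; bits must be in 1..32. */
static inline uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	uint32_t hash = val * GOLDEN_RATIO_PRIME_32;

	assert(bits >= 1 && bits <= 32);
	return hash >> (32 - bits);
}

/* Direct mapping: sequential keys land in adjacent buckets. */
static inline uint32_t bucket_modulo(uint32_t key, unsigned int bits)
{
	assert(bits >= 1 && bits < 32);
	return key & ((1U << bits) - 1);
}

/* Dispersed mapping: sequential keys are scattered across the table. */
static inline uint32_t bucket_hashed(uint32_t key, unsigned int bits)
{
	return hash_32_sketch(key, bits);
}
```

With bits = 4, keys 1, 2 and 3 map to buckets 1, 2 and 3 under the modulo
mapping, but to buckets 9, 3 and 13 under this hash_32() sketch.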

2012-10-29 16:27:11

by Sasha Levin

Subject: Re: [PATCH v7 01/16] hashtable: introduce a small and naive hashtable

On Mon, Oct 29, 2012 at 12:14 PM, Mathieu Desnoyers
<[email protected]> wrote:
> * Sasha Levin ([email protected]) wrote:
>> On Mon, Oct 29, 2012 at 7:29 AM, Mathieu Desnoyers
>> <[email protected]> wrote:
>> > * Sasha Levin ([email protected]) wrote:
>> >> +
>> >> + for (i = 0; i < sz; i++)
>> >> + INIT_HLIST_HEAD(&ht[sz]);
>> >
>> > ouch. How did this work ? Has it been tested at all ?
>> >
>> > sz -> i
>>
>> Funny enough, it works perfectly. Generally as a test I boot the
>> kernel in a VM and let it fuzz with trinity for a bit, doing that with
>> the code above worked flawlessly.
>>
>> While it works, it's obviously wrong. Why does it work though? Usually
>> there's a list op happening pretty soon after that which brings the
>> list into proper state.
>>
>> I've been playing with a patch that adds a magic value into list_head
>> if CONFIG_DEBUG_LIST is set, and checks that magic in the list debug
>> code in lib/list_debug.c.
>>
>> Does it sound like something useful? If so I'll send that patch out.
>
> Most of the calls to this initialization function apply it on zeroed
> memory (static/kzalloc'd...), which makes it useless. I'd actually be in
> favor of removing those redundant calls (as I pointed out in another
> email), and document that zeroed memory don't need to be explicitly
> initialized.

Why would that make it useless? The idea is that the init functions
will set the magic field to something random, like:

.magic = 0xBADBEEF0;

And have list_add() and friends WARN(.magic != 0xBADBEEF0, "Using an
uninitialized list\n");

This way we'll catch all places that don't go through list initialization code.
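
A rough userspace sketch of the idea, not the actual patch: the struct
layout, dbg_init_hlist_head() and dbg_hlist_add_head() are hypothetical,
and the add helper returns an error where the kernel version would WARN().

```c
#include <assert.h>
#include <stddef.h>

#define HLIST_MAGIC_SKETCH 0xBADBEEF0U	/* value taken from the mail above */

struct hlist_node { struct hlist_node *next, **pprev; };

/* Hypothetical debug variant of hlist_head carrying a magic field. */
struct dbg_hlist_head {
	struct hlist_node *first;
	unsigned int magic;
};

/* The only place that sets the magic, so skipping it becomes detectable. */
static inline void dbg_init_hlist_head(struct dbg_hlist_head *h)
{
	assert(h != NULL);
	h->first = NULL;
	h->magic = HLIST_MAGIC_SKETCH;
}

/* Return 0 on success, -1 if the head never went through the initializer. */
static inline int dbg_hlist_add_head(struct hlist_node *n,
				     struct dbg_hlist_head *h)
{
	if (h->magic != HLIST_MAGIC_SKETCH)
		return -1;	/* the kernel version would WARN() here */

	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
	return 0;
}
```

With a non-zero magic, a head that sits in zeroed memory but was never
initialized fails the check, which is why zeroed memory alone would no
longer count as initialized.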


Thanks,
Sasha

2012-10-29 16:28:16

by Andrew Morton

Subject: Re: [PATCH v7 09/16] SUNRPC/cache: use new hashtable implementation

On Mon, 29 Oct 2012 07:49:42 -0700 Linus Torvalds <[email protected]> wrote:

> Because there's no reason to believe that '9' is in any way a worse
> random number than something page-shift-related, is there?

9 is much better than PAGE_SHIFT. PAGE_SIZE can vary by a factor of
16, depending on config.

Everyone thinks 4k, and tests only for that. There's potential for
very large performance and behavior changes when their code gets run
on a 64k PAGE_SIZE machine.
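
As a worked example, assume (hypothetically) a table whose bit count is
tied directly to PAGE_SHIFT. The helper below, buckets_for_bits(), is an
illustrative name and only computes the bucket count for a given number of
bits.

```c
#include <assert.h>

/* Number of buckets in a table indexed by 'bits' hash bits. */
static inline unsigned long buckets_for_bits(unsigned int bits)
{
	assert(bits < 32);	/* keep the shift well-defined on 32-bit longs */
	return 1UL << bits;
}
```

A fixed 9 bits always gives 512 buckets. A PAGE_SHIFT of 12 (4k pages)
gives 4096 buckets and a PAGE_SHIFT of 16 (64k pages) gives 65536, which
is the factor of 16 mentioned above.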

2012-10-29 16:29:09

by Mathieu Desnoyers

Subject: Re: [PATCH v7 01/16] hashtable: introduce a small and naive hashtable

* Sasha Levin ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 12:14 PM, Mathieu Desnoyers
> <[email protected]> wrote:
> > * Sasha Levin ([email protected]) wrote:
> >> On Mon, Oct 29, 2012 at 7:29 AM, Mathieu Desnoyers
> >> <[email protected]> wrote:
> >> > * Sasha Levin ([email protected]) wrote:
> >> >> +
> >> >> + for (i = 0; i < sz; i++)
> >> >> + INIT_HLIST_HEAD(&ht[sz]);
> >> >
> >> > ouch. How did this work ? Has it been tested at all ?
> >> >
> >> > sz -> i
> >>
> >> Funny enough, it works perfectly. Generally as a test I boot the
> >> kernel in a VM and let it fuzz with trinity for a bit, doing that with
> >> the code above worked flawlessly.
> >>
> >> While it works, it's obviously wrong. Why does it work though? Usually
> >> there's a list op happening pretty soon after that which brings the
> >> list into proper state.
> >>
> >> I've been playing with a patch that adds a magic value into list_head
> >> if CONFIG_DEBUG_LIST is set, and checks that magic in the list debug
> >> code in lib/list_debug.c.
> >>
> >> Does it sound like something useful? If so I'll send that patch out.
> >
> > Most of the calls to this initialization function apply it on zeroed
> > memory (static/kzalloc'd...), which makes it useless. I'd actually be in
> > favor of removing those redundant calls (as I pointed out in another
> > email), and document that zeroed memory don't need to be explicitly
> > initialized.
>
> Why would that make it useless? The idea is that the init functions
> will set the magic field to something random, like:
>
> .magic = 0xBADBEEF0;
>
> And have list_add() and friends WARN(.magic != 0xBADBEEF0, "Using an
> uninitialized list\n");
>
> This way we'll catch all places that don't go through list initialization code.

As I replied to Tejun Heo already, I agree that keeping the
initialization in place makes sense for future-proofness. This intent
should probably be documented in a comment about the initialization
function though, just to make sure nobody will try to skip it.

Thanks,

Mathieu

>
>
> Thanks,
> Sasha

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 17:29:48

by Sasha Levin

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

On Mon, Oct 29, 2012 at 7:35 AM, Mathieu Desnoyers
<[email protected]> wrote:
> * Sasha Levin ([email protected]) wrote:
>> Switch tracepoints to use the new hashtable implementation. This reduces the amount of
>> generic unrelated code in the tracepoints.
>>
>> Signed-off-by: Sasha Levin <[email protected]>
>> ---
>> kernel/tracepoint.c | 27 +++++++++++----------------
>> 1 file changed, 11 insertions(+), 16 deletions(-)
>>
>> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
>> index d96ba22..854df92 100644
>> --- a/kernel/tracepoint.c
>> +++ b/kernel/tracepoint.c
>> @@ -26,6 +26,7 @@
>> #include <linux/slab.h>
>> #include <linux/sched.h>
>> #include <linux/static_key.h>
>> +#include <linux/hashtable.h>
>>
>> extern struct tracepoint * const __start___tracepoints_ptrs[];
>> extern struct tracepoint * const __stop___tracepoints_ptrs[];
>> @@ -49,8 +50,7 @@ static LIST_HEAD(tracepoint_module_list);
>> * Protected by tracepoints_mutex.
>> */
>> #define TRACEPOINT_HASH_BITS 6
>> -#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
>> -static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
>> +static DEFINE_HASHTABLE(tracepoint_table, TRACEPOINT_HASH_BITS);
>>
> [...]
>>
>> @@ -722,6 +715,8 @@ struct notifier_block tracepoint_module_nb = {
>>
>> static int init_tracepoints(void)
>> {
>> + hash_init(tracepoint_table);
>> +
>> return register_module_notifier(&tracepoint_module_nb);
>> }
>> __initcall(init_tracepoints);
>
> So we have a hash table defined in .bss (therefore entirely initialized
> to NULL), and you add a call to "hash_init", which iterates on the whole
> array and initialize it to NULL (again) ?
>
> This extra initialization is redundant. I think it should be removed
> from here, and hashtable.h should document that hash_init() don't need
> to be called on zeroed memory (which includes static/global variables,
> kzalloc'd memory, etc).

This was discussed in the previous series, the conclusion was to call
hash_init() either way to keep the encapsulation and consistency.

It's cheap enough and happens only once, so why not?


Thanks,
Sasha

2012-10-29 17:35:58

by Sasha Levin

Subject: Re: [PATCH v7 15/16] openvswitch: use new hashtable implementation

On Mon, Oct 29, 2012 at 11:59 AM, Mathieu Desnoyers
<[email protected]> wrote:
> * Sasha Levin ([email protected]) wrote:
>> Hi Mathieu,
>>
>> On Mon, Oct 29, 2012 at 9:29 AM, Mathieu Desnoyers
>> <[email protected]> wrote:
>> > * Sasha Levin ([email protected]) wrote:
>> > [...]
>> >> -static struct hlist_head *hash_bucket(struct net *net, const char *name)
>> >> -{
>> >> - unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
>> >> - return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
>> >> -}
>> >> -
>> >> /**
>> >> * ovs_vport_locate - find a port that has already been created
>> >> *
>> >> @@ -84,13 +76,12 @@ static struct hlist_head *hash_bucket(struct net *net, const char *name)
>> >> */
>> >> struct vport *ovs_vport_locate(struct net *net, const char *name)
>> >> {
>> >> - struct hlist_head *bucket = hash_bucket(net, name);
>> >> struct vport *vport;
>> >> struct hlist_node *node;
>> >> + int key = full_name_hash(name, strlen(name));
>> >>
>> >> - hlist_for_each_entry_rcu(vport, node, bucket, hash_node)
>> >> - if (!strcmp(name, vport->ops->get_name(vport)) &&
>> >> - net_eq(ovs_dp_get_net(vport->dp), net))
>> >> + hash_for_each_possible_rcu(dev_table, vport, node, hash_node, key)
>> >
>> > Is applying hash_32() on top of full_name_hash() needed and expected ?
>>
>> Since this was pointed out in several of the patches, I'll answer it
>> just once here.
>>
>> I've intentionally "allowed" double hashing with hash_32 to keep the
>> code simple.
>>
>> hash_32() is pretty simple and gcc optimizes it to be almost nothing,
>> so doing that costs us a multiplication and a shift. On the other
>> hand, we benefit from keeping our code simple - how would we avoid
>> doing this double hash? adding a different hashtable function for
>> strings? or a new function for already hashed keys? I think we benefit
>> a lot from having to mul/shr instead of adding extra lines of code
>> here.
>
> This could be done, as I pointed out in another email within this
> thread, by changing the "key" argument from add/for_each_possible to an
> expected "hash" value, and let the caller invoke hash_32() if they want.
> I doubt this would add a significant amount of complexity for users of
> this API, but would allow much more flexibility to choose hash
> functions.

Most callers do need to do the hashing though, so why add an
additional step for all callers instead of doing another hash_32 for
the ones that don't really need it?

Another question is why do you need flexibility? I think that
simplicity wins over flexibility here.
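
For reference, the double hashing under discussion looks roughly like the
userspace sketch below. name_hash_sketch() is a hypothetical stand-in for
a string hash such as full_name_hash() and is NOT the kernel's algorithm;
hash_32_sketch() assumes the multiplicative hash_32() of this period, with
the constant reproduced from memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of the 32-bit multiplier used by hash_32() at the time. */
#define GOLDEN_RATIO_PRIME_32 0x9e370001U

/* Second hash: one multiplication and one shift; bits must be in 1..32. */
static inline uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	uint32_t hash = val * GOLDEN_RATIO_PRIME_32;

	assert(bits >= 1 && bits <= 32);
	return hash >> (32 - bits);
}

/* First hash: hypothetical string hash, only here to produce a 32-bit key. */
static inline uint32_t name_hash_sketch(const char *name, size_t len)
{
	uint32_t h = 0;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 31 + (unsigned char)name[i];
	return h;
}

/* The caller hashes the string; the table hashes the resulting key again. */
static inline uint32_t bucket_for_name(const char *name, size_t len,
				       unsigned int bits)
{
	uint32_t key = name_hash_sketch(name, len);

	return hash_32_sketch(key, bits);
}
```

The second step also reduces the 32-bit key to a bucket index, so an API
taking an already hashed value would still need some reduction step in its
place.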

Thanks,
Sasha

2012-10-29 17:50:12

by Mathieu Desnoyers

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 7:35 AM, Mathieu Desnoyers
> <[email protected]> wrote:
> > * Sasha Levin ([email protected]) wrote:
> >> Switch tracepoints to use the new hashtable implementation. This reduces the amount of
> >> generic unrelated code in the tracepoints.
> >>
> >> Signed-off-by: Sasha Levin <[email protected]>
> >> ---
> >> kernel/tracepoint.c | 27 +++++++++++----------------
> >> 1 file changed, 11 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> >> index d96ba22..854df92 100644
> >> --- a/kernel/tracepoint.c
> >> +++ b/kernel/tracepoint.c
> >> @@ -26,6 +26,7 @@
> >> #include <linux/slab.h>
> >> #include <linux/sched.h>
> >> #include <linux/static_key.h>
> >> +#include <linux/hashtable.h>
> >>
> >> extern struct tracepoint * const __start___tracepoints_ptrs[];
> >> extern struct tracepoint * const __stop___tracepoints_ptrs[];
> >> @@ -49,8 +50,7 @@ static LIST_HEAD(tracepoint_module_list);
> >> * Protected by tracepoints_mutex.
> >> */
> >> #define TRACEPOINT_HASH_BITS 6
> >> -#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
> >> -static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
> >> +static DEFINE_HASHTABLE(tracepoint_table, TRACEPOINT_HASH_BITS);
> >>
> > [...]
> >>
> >> @@ -722,6 +715,8 @@ struct notifier_block tracepoint_module_nb = {
> >>
> >> static int init_tracepoints(void)
> >> {
> >> + hash_init(tracepoint_table);
> >> +
> >> return register_module_notifier(&tracepoint_module_nb);
> >> }
> >> __initcall(init_tracepoints);
> >
> > So we have a hash table defined in .bss (therefore entirely initialized
> > to NULL), and you add a call to "hash_init", which iterates on the whole
> > array and initialize it to NULL (again) ?
> >
> > This extra initialization is redundant. I think it should be removed
> > from here, and hashtable.h should document that hash_init() don't need
> > to be called on zeroed memory (which includes static/global variables,
> > kzalloc'd memory, etc).
>
> This was discussed in the previous series, the conclusion was to call
> hash_init() either way to keep the encapsulation and consistency.

Agreed,

Thanks,

Mathieu

>
> It's cheap enough and happens only once, so why not?
>
>
> Thanks,
> Sasha

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 18:16:53

by Mathieu Desnoyers

Subject: Re: [PATCH v7 15/16] openvswitch: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 11:59 AM, Mathieu Desnoyers
> <[email protected]> wrote:
> > * Sasha Levin ([email protected]) wrote:
> >> Hi Mathieu,
> >>
> >> On Mon, Oct 29, 2012 at 9:29 AM, Mathieu Desnoyers
> >> <[email protected]> wrote:
> >> > * Sasha Levin ([email protected]) wrote:
> >> > [...]
> >> >> -static struct hlist_head *hash_bucket(struct net *net, const char *name)
> >> >> -{
> >> >> - unsigned int hash = jhash(name, strlen(name), (unsigned long) net);
> >> >> - return &dev_table[hash & (VPORT_HASH_BUCKETS - 1)];
> >> >> -}
> >> >> -
> >> >> /**
> >> >> * ovs_vport_locate - find a port that has already been created
> >> >> *
> >> >> @@ -84,13 +76,12 @@ static struct hlist_head *hash_bucket(struct net *net, const char *name)
> >> >> */
> >> >> struct vport *ovs_vport_locate(struct net *net, const char *name)
> >> >> {
> >> >> - struct hlist_head *bucket = hash_bucket(net, name);
> >> >> struct vport *vport;
> >> >> struct hlist_node *node;
> >> >> + int key = full_name_hash(name, strlen(name));
> >> >>
> >> >> - hlist_for_each_entry_rcu(vport, node, bucket, hash_node)
> >> >> - if (!strcmp(name, vport->ops->get_name(vport)) &&
> >> >> - net_eq(ovs_dp_get_net(vport->dp), net))
> >> >> + hash_for_each_possible_rcu(dev_table, vport, node, hash_node, key)
> >> >
> >> > Is applying hash_32() on top of full_name_hash() needed and expected ?
> >>
> >> Since this was pointed out in several of the patches, I'll answer it
> >> just once here.
> >>
> >> I've intentionally "allowed" double hashing with hash_32 to keep the
> >> code simple.
> >>
> >> hash_32() is pretty simple and gcc optimizes it to be almost nothing,
> >> so doing that costs us a multiplication and a shift. On the other
> >> hand, we benefit from keeping our code simple - how would we avoid
> >> doing this double hash? adding a different hashtable function for
> >> strings? or a new function for already hashed keys? I think we benefit
> >> a lot from having to mul/shr instead of adding extra lines of code
> >> here.
> >
> > This could be done, as I pointed out in another email within this
> > thread, by changing the "key" argument from add/for_each_possible to an
> > expected "hash" value, and let the caller invoke hash_32() if they want.
> > I doubt this would add a significant amount of complexity for users of
> > this API, but would allow much more flexibility to choose hash
> > functions.
>
> Most callers do need to do the hashing though, so why add an
> additional step for all callers instead of doing another hash_32 for
> the ones that don't really need it?
>
> Another question is why do you need flexibility? I think that
> simplicity wins over flexibility here.

I usually try to make things as simple as possible, but not simplistic
compared to the problem tackled. In this case, I would ask the following
question: if you standardize the hash function of all those pieces of
kernel infrastructure on hash_32(), including parts of the kernel
network infrastructure and parts of the kernel that can be fed values
coming from user-space (through the VFS), how can you guarantee that
hash_32() won't be the cause of a DoS attack, given that this algorithm
a) is known to attackers and b) has no randomness? DoS attacks on poorly
implemented hashing functions have been a recent trend.

This is just one example in an attempt to show why different hash table
users may have different constraints: for a hash table entirely
populated by keys generated internally by the kernel, a random seed
might not be required, but for cases where values are fed by user-space
and from the NIC, I would argue that flexibility to implement a
randomizable hash function beats implementation simplicity any time.

And you could keep the basic use-case simple by providing hints to the
hash_32()/hash_64()/hash_long() helpers in comments.

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 18:22:18

by Tejun Heo

Subject: Re: [PATCH v7 15/16] openvswitch: use new hashtable implementation

Hello,

On Mon, Oct 29, 2012 at 02:16:48PM -0400, Mathieu Desnoyers wrote:
> This is just one example in an attempt to show why different hash table
> users may have different constraints: for a hash table entirely
> populated by keys generated internally by the kernel, a random seed
> might not be required, but for cases where values are fed by user-space
> and from the NIC, I would argue that flexibility to implement a
> randomizable hash function beats implementation simplicity any time.
>
> And you could keep the basic use-case simple by providing hints to the
> hash_32()/hash_64()/hash_long() helpers in comments.

If all you need is throwing in a salt value to avoid attacks, can't
you just do that from caller side? Scrambling the key before feeding
it into hash_*() should work, no?
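
Something along these lines, as a userspace sketch: salted_bucket() is a
hypothetical helper, the salt is assumed to be a per-boot random value
picked by the caller, and hash_32_sketch() assumes the multiplicative
hash_32() of this period with the constant reproduced from memory. Whether
a plain XOR is strong enough for a given user is a separate question.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the 32-bit multiplier used by hash_32() at the time. */
#define GOLDEN_RATIO_PRIME_32 0x9e370001U

/* Multiply, then keep the top 'bits' bits; bits must be in 1..32. */
static inline uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	uint32_t hash = val * GOLDEN_RATIO_PRIME_32;

	assert(bits >= 1 && bits <= 32);
	return hash >> (32 - bits);
}

/*
 * Caller-side salting: scramble the key with a secret value before it
 * reaches the fixed hash, so the bucket cannot be predicted from the key
 * alone. How the salt is generated is out of scope for this sketch.
 */
static inline uint32_t salted_bucket(uint32_t key, uint32_t salt,
				     unsigned int bits)
{
	return hash_32_sketch(key ^ salt, bits);
}
```

The hashtable API stays unchanged in this scheme; only callers fed by
untrusted input pay for the extra XOR.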

Thanks.

--
tejun

2012-10-29 18:32:12

by Josh Triplett

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

On Mon, Oct 29, 2012 at 01:29:24PM -0400, Sasha Levin wrote:
> On Mon, Oct 29, 2012 at 7:35 AM, Mathieu Desnoyers
> <[email protected]> wrote:
> > * Sasha Levin ([email protected]) wrote:
> >> Switch tracepoints to use the new hashtable implementation. This reduces the amount of
> >> generic unrelated code in the tracepoints.
> >>
> >> Signed-off-by: Sasha Levin <[email protected]>
> >> ---
> >> kernel/tracepoint.c | 27 +++++++++++----------------
> >> 1 file changed, 11 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> >> index d96ba22..854df92 100644
> >> --- a/kernel/tracepoint.c
> >> +++ b/kernel/tracepoint.c
> >> @@ -26,6 +26,7 @@
> >> #include <linux/slab.h>
> >> #include <linux/sched.h>
> >> #include <linux/static_key.h>
> >> +#include <linux/hashtable.h>
> >>
> >> extern struct tracepoint * const __start___tracepoints_ptrs[];
> >> extern struct tracepoint * const __stop___tracepoints_ptrs[];
> >> @@ -49,8 +50,7 @@ static LIST_HEAD(tracepoint_module_list);
> >> * Protected by tracepoints_mutex.
> >> */
> >> #define TRACEPOINT_HASH_BITS 6
> >> -#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
> >> -static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
> >> +static DEFINE_HASHTABLE(tracepoint_table, TRACEPOINT_HASH_BITS);
> >>
> > [...]
> >>
> >> @@ -722,6 +715,8 @@ struct notifier_block tracepoint_module_nb = {
> >>
> >> static int init_tracepoints(void)
> >> {
> >> + hash_init(tracepoint_table);
> >> +
> >> return register_module_notifier(&tracepoint_module_nb);
> >> }
> >> __initcall(init_tracepoints);
> >
> > So we have a hash table defined in .bss (therefore entirely initialized
> > to NULL), and you add a call to "hash_init", which iterates on the whole
> > array and initialize it to NULL (again) ?
> >
> > This extra initialization is redundant. I think it should be removed
> > from here, and hashtable.h should document that hash_init() don't need
> > to be called on zeroed memory (which includes static/global variables,
> > kzalloc'd memory, etc).
>
> This was discussed in the previous series, the conclusion was to call
> hash_init() either way to keep the encapsulation and consistency.
>
> It's cheap enough and happens only once, so why not?

Unnecessary work adds up. Better not to do it unnecessarily, even if by
itself it doesn't cost that much.

It doesn't seem that difficult for future fields to have 0 as their
initialized state.

- Josh Triplett

2012-10-29 18:35:15

by Mathieu Desnoyers

Subject: Re: [PATCH v7 15/16] openvswitch: use new hashtable implementation

* Tejun Heo ([email protected]) wrote:
> Hello,
>
> On Mon, Oct 29, 2012 at 02:16:48PM -0400, Mathieu Desnoyers wrote:
> > This is just one example in an attempt to show why different hash table
> > users may have different constraints: for a hash table entirely
> > populated by keys generated internally by the kernel, a random seed
> > might not be required, but for cases where values are fed by user-space
> > and from the NIC, I would argue that flexibility to implement a
> > randomizable hash function beats implementation simplicity any time.
> >
> > And you could keep the basic use-case simple by providing hints to the
> > hash_32()/hash_64()/hash_long() helpers in comments.
>
> If all you need is throwing in a salt value to avoid attacks, can't
> you just do that from caller side? Scrambling the key before feeding
> it into hash_*() should work, no?

Yes, I think salting the "key" parameter would work.

Thanks,

Mathieu

>
> Thanks.
>
> --
> tejun

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 18:42:55

by Sasha Levin

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

On Mon, Oct 29, 2012 at 2:31 PM, Josh Triplett <[email protected]> wrote:
> On Mon, Oct 29, 2012 at 01:29:24PM -0400, Sasha Levin wrote:
>> On Mon, Oct 29, 2012 at 7:35 AM, Mathieu Desnoyers
>> <[email protected]> wrote:
>> > * Sasha Levin ([email protected]) wrote:
>> >> Switch tracepoints to use the new hashtable implementation. This reduces the amount of
>> >> generic unrelated code in the tracepoints.
>> >>
>> >> Signed-off-by: Sasha Levin <[email protected]>
>> >> ---
>> >> kernel/tracepoint.c | 27 +++++++++++----------------
>> >> 1 file changed, 11 insertions(+), 16 deletions(-)
>> >>
>> >> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
>> >> index d96ba22..854df92 100644
>> >> --- a/kernel/tracepoint.c
>> >> +++ b/kernel/tracepoint.c
>> >> @@ -26,6 +26,7 @@
>> >> #include <linux/slab.h>
>> >> #include <linux/sched.h>
>> >> #include <linux/static_key.h>
>> >> +#include <linux/hashtable.h>
>> >>
>> >> extern struct tracepoint * const __start___tracepoints_ptrs[];
>> >> extern struct tracepoint * const __stop___tracepoints_ptrs[];
>> >> @@ -49,8 +50,7 @@ static LIST_HEAD(tracepoint_module_list);
>> >> * Protected by tracepoints_mutex.
>> >> */
>> >> #define TRACEPOINT_HASH_BITS 6
>> >> -#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
>> >> -static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
>> >> +static DEFINE_HASHTABLE(tracepoint_table, TRACEPOINT_HASH_BITS);
>> >>
>> > [...]
>> >>
>> >> @@ -722,6 +715,8 @@ struct notifier_block tracepoint_module_nb = {
>> >>
>> >> static int init_tracepoints(void)
>> >> {
>> >> + hash_init(tracepoint_table);
>> >> +
>> >> return register_module_notifier(&tracepoint_module_nb);
>> >> }
>> >> __initcall(init_tracepoints);
>> >
>> > So we have a hash table defined in .bss (therefore entirely initialized
>> > to NULL), and you add a call to "hash_init", which iterates on the whole
>> > array and initialize it to NULL (again) ?
>> >
>> > This extra initialization is redundant. I think it should be removed
>> > from here, and hashtable.h should document that hash_init() don't need
>> > to be called on zeroed memory (which includes static/global variables,
>> > kzalloc'd memory, etc).
>>
>> This was discussed in the previous series, the conclusion was to call
>> hash_init() either way to keep the encapsulation and consistency.
>>
>> It's cheap enough and happens only once, so why not?
>
> Unnecessary work adds up. Better not to do it unnecessarily, even if by
> itself it doesn't cost that much.
>
> It doesn't seem that difficult for future fields to have 0 as their
> initialized state.

Let's put it this way: hlist requires the user to initialize hlist
head before usage, therefore as a hlist user, hashtable implementation
must do that.

We do it automatically when the hashtable user does
DEFINE_HASHTABLE(), but we can't do that if he does
DECLARE_HASHTABLE(). This means that the hashtable user must call
hash_init() whenever he uses DECLARE_HASHTABLE() to create his
hashtable.

There are two options here, either we specify that hash_init() should
only be called if DECLARE_HASHTABLE() was called, which is confusing,
inconsistent and prone to errors, or we can just say that it should be
called whenever a hashtable is used.

The only way to work around it IMO is to get hlist to not require
initializing before usage, and there are good reasons that that won't
happen.
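
To illustrate the two cases, here is a userspace sketch with simplified
stand-ins for the hlist types and the hashtable macros. struct flow_cache,
flow_cache_setup(), static_table and hash_init_buckets() are hypothetical
names, and the range designator is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define HLIST_HEAD_INIT { .first = NULL }
#define INIT_HLIST_HEAD(ptr) ((ptr)->first = NULL)

/* Defines the table and initializes every bucket statically. */
#define DEFINE_HASHTABLE(name, bits) \
	struct hlist_head name[1 << (bits)] = \
		{ [0 ... ((1 << (bits)) - 1)] = HLIST_HEAD_INIT }

/* Only declares storage, e.g. as a struct member; nothing is initialized. */
#define DECLARE_HASHTABLE(name, bits) \
	struct hlist_head name[1 << (bits)]

#define HASH_SIZE(name) (sizeof(name) / sizeof((name)[0]))

/* Run-time initializer; walks every bucket by index. */
static inline void hash_init_buckets(struct hlist_head *ht, size_t sz)
{
	size_t i;

	assert(ht != NULL);
	for (i = 0; i < sz; i++)
		INIT_HLIST_HEAD(&ht[i]);
}

#define hash_init(table) hash_init_buckets(table, HASH_SIZE(table))

/* Case 1: statically defined table, initialized by the macro itself. */
static DEFINE_HASHTABLE(static_table, 3);

/* Case 2: table embedded in an object, which needs hash_init(). */
struct flow_cache {
	DECLARE_HASHTABLE(flows, 3);
};

static inline void flow_cache_setup(struct flow_cache *fc)
{
	hash_init(fc->flows);
}
```

In this sketch only the embedded table strictly needs the run-time call;
the open question is whether the statically defined one should get it too
for the sake of consistency.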


Thanks,
Sasha

2012-10-29 18:53:25

by Mathieu Desnoyers

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 2:31 PM, Josh Triplett <[email protected]> wrote:
> > On Mon, Oct 29, 2012 at 01:29:24PM -0400, Sasha Levin wrote:
> >> On Mon, Oct 29, 2012 at 7:35 AM, Mathieu Desnoyers
> >> <[email protected]> wrote:
> >> > * Sasha Levin ([email protected]) wrote:
> >> >> Switch tracepoints to use the new hashtable implementation. This reduces the amount of
> >> >> generic unrelated code in the tracepoints.
> >> >>
> >> >> Signed-off-by: Sasha Levin <[email protected]>
> >> >> ---
> >> >> kernel/tracepoint.c | 27 +++++++++++----------------
> >> >> 1 file changed, 11 insertions(+), 16 deletions(-)
> >> >>
> >> >> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> >> >> index d96ba22..854df92 100644
> >> >> --- a/kernel/tracepoint.c
> >> >> +++ b/kernel/tracepoint.c
> >> >> @@ -26,6 +26,7 @@
> >> >> #include <linux/slab.h>
> >> >> #include <linux/sched.h>
> >> >> #include <linux/static_key.h>
> >> >> +#include <linux/hashtable.h>
> >> >>
> >> >> extern struct tracepoint * const __start___tracepoints_ptrs[];
> >> >> extern struct tracepoint * const __stop___tracepoints_ptrs[];
> >> >> @@ -49,8 +50,7 @@ static LIST_HEAD(tracepoint_module_list);
> >> >> * Protected by tracepoints_mutex.
> >> >> */
> >> >> #define TRACEPOINT_HASH_BITS 6
> >> >> -#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
> >> >> -static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
> >> >> +static DEFINE_HASHTABLE(tracepoint_table, TRACEPOINT_HASH_BITS);
> >> >>
> >> > [...]
> >> >>
> >> >> @@ -722,6 +715,8 @@ struct notifier_block tracepoint_module_nb = {
> >> >>
> >> >> static int init_tracepoints(void)
> >> >> {
> >> >> + hash_init(tracepoint_table);
> >> >> +
> >> >> return register_module_notifier(&tracepoint_module_nb);
> >> >> }
> >> >> __initcall(init_tracepoints);
> >> >
> >> > So we have a hash table defined in .bss (therefore entirely initialized
> >> > to NULL), and you add a call to "hash_init", which iterates on the whole
> >> > array and initialize it to NULL (again) ?
> >> >
> >> > This extra initialization is redundant. I think it should be removed
> >> > from here, and hashtable.h should document that hash_init() don't need
> >> > to be called on zeroed memory (which includes static/global variables,
> >> > kzalloc'd memory, etc).
> >>
> >> This was discussed in the previous series, the conclusion was to call
> >> hash_init() either way to keep the encapsulation and consistency.
> >>
> >> It's cheap enough and happens only once, so why not?
> >
> > Unnecessary work adds up. Better not to do it unnecessarily, even if by
> > itself it doesn't cost that much.
> >
> > It doesn't seem that difficult for future fields to have 0 as their
> > initialized state.
>
> Let's put it this way: hlist requires the user to initialize hlist
> head before usage, therefore as a hlist user, hashtable implementation
> must do that.
>
> We do it automatically when the hashtable user does
> DEFINE_HASHTABLE(), but we can't do that if he does
> DECLARE_HASHTABLE(). This means that the hashtable user must call
> hash_init() whenever he uses DECLARE_HASHTABLE() to create his
> hashtable.
>
> There are two options here, either we specify that hash_init() should
> only be called if DECLARE_HASHTABLE() was called, which is confusing,
> inconsistent and prone to errors, or we can just say that it should be
> called whenever a hashtable is used.
>
> The only way to work around it IMO is to get hlist to not require
> initializing before usage, and there are good reasons that that won't
> happen.

Hrm, just a second here.

The argument about hash_init being useful to add magic values in the
future only works for the cases where a hash table is declared with
DECLARE_HASHTABLE(). It's completely pointless with DEFINE_HASHTABLE(),
because we could initialize any debugging variables from within
DEFINE_HASHTABLE().

So I take my "Agreed" back. I disagree with initializing the hash table
twice redundantly. There should be either a DEFINE_HASHTABLE() or a
hash_init() (for DECLARE_HASHTABLE()), but not a useless run-time
initialization on top of an already statically initialized hash table.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 18:58:21

by Tejun Heo

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

On Mon, Oct 29, 2012 at 02:53:19PM -0400, Mathieu Desnoyers wrote:
> The argument about hash_init being useful to add magic values in the
> future only works for the cases where a hash table is declared with
> DECLARE_HASHTABLE(). It's completely pointless with DEFINE_HASHTABLE(),
> because we could initialize any debugging variables from within
> DEFINE_HASHTABLE().

You can do that with a [0 ... HASH_SIZE - 1] initializer.

Thanks.

--
tejun

2012-10-29 19:01:15

by Tejun Heo

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

On Mon, Oct 29, 2012 at 11:58:14AM -0700, Tejun Heo wrote:
> On Mon, Oct 29, 2012 at 02:53:19PM -0400, Mathieu Desnoyers wrote:
> > The argument about hash_init being useful to add magic values in the
> > future only works for the cases where a hash table is declared with
> > DECLARE_HASHTABLE(). It's completely pointless with DEFINE_HASHTABLE(),
> > because we could initialize any debugging variables from within
> > DEFINE_HASHTABLE().
>
> You can do that with a [0 ... HASH_SIZE - 1] initializer.

And in general, let's please try not to do optimizations which are
pointless. Just stick to the usual semantics. You have an abstract
data structure - invoke the initializer before using it. Sure,
optimize it if it shows up somewhere. And here, if we do the
initializers properly, it shouldn't cause any more actual overhead -
i.e. DEFINE_HASHTABLE() will basically boil down to all zero
assignments and the compiler will put the whole thing in .bss anyway.

Thanks.

--
tejun

2012-10-29 19:09:59

by Sasha Levin

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

On Mon, Oct 29, 2012 at 2:53 PM, Mathieu Desnoyers
<[email protected]> wrote:
> * Sasha Levin ([email protected]) wrote:
>> On Mon, Oct 29, 2012 at 2:31 PM, Josh Triplett <[email protected]> wrote:
>> > On Mon, Oct 29, 2012 at 01:29:24PM -0400, Sasha Levin wrote:
>> >> On Mon, Oct 29, 2012 at 7:35 AM, Mathieu Desnoyers
>> >> <[email protected]> wrote:
>> >> > * Sasha Levin ([email protected]) wrote:
>> >> >> Switch tracepoints to use the new hashtable implementation. This reduces the amount of
>> >> >> generic unrelated code in the tracepoints.
>> >> >>
>> >> >> Signed-off-by: Sasha Levin <[email protected]>
>> >> >> ---
>> >> >> kernel/tracepoint.c | 27 +++++++++++----------------
>> >> >> 1 file changed, 11 insertions(+), 16 deletions(-)
>> >> >>
>> >> >> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
>> >> >> index d96ba22..854df92 100644
>> >> >> --- a/kernel/tracepoint.c
>> >> >> +++ b/kernel/tracepoint.c
>> >> >> @@ -26,6 +26,7 @@
>> >> >> #include <linux/slab.h>
>> >> >> #include <linux/sched.h>
>> >> >> #include <linux/static_key.h>
>> >> >> +#include <linux/hashtable.h>
>> >> >>
>> >> >> extern struct tracepoint * const __start___tracepoints_ptrs[];
>> >> >> extern struct tracepoint * const __stop___tracepoints_ptrs[];
>> >> >> @@ -49,8 +50,7 @@ static LIST_HEAD(tracepoint_module_list);
>> >> >> * Protected by tracepoints_mutex.
>> >> >> */
>> >> >> #define TRACEPOINT_HASH_BITS 6
>> >> >> -#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
>> >> >> -static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
>> >> >> +static DEFINE_HASHTABLE(tracepoint_table, TRACEPOINT_HASH_BITS);
>> >> >>
>> >> > [...]
>> >> >>
>> >> >> @@ -722,6 +715,8 @@ struct notifier_block tracepoint_module_nb = {
>> >> >>
>> >> >> static int init_tracepoints(void)
>> >> >> {
>> >> >> + hash_init(tracepoint_table);
>> >> >> +
>> >> >> return register_module_notifier(&tracepoint_module_nb);
>> >> >> }
>> >> >> __initcall(init_tracepoints);
>> >> >
>> >> > So we have a hash table defined in .bss (therefore entirely initialized
>> >> > to NULL), and you add a call to "hash_init", which iterates on the whole
>> >> > array and initialize it to NULL (again) ?
>> >> >
>> >> > This extra initialization is redundant. I think it should be removed
>> >> > from here, and hashtable.h should document that hash_init() don't need
>> >> > to be called on zeroed memory (which includes static/global variables,
>> >> > kzalloc'd memory, etc).
>> >>
>> >> This was discussed in the previous series, the conclusion was to call
>> >> hash_init() either way to keep the encapsulation and consistency.
>> >>
>> >> It's cheap enough and happens only once, so why not?
>> >
>> > Unnecessary work adds up. Better not to do it unnecessarily, even if by
>> > itself it doesn't cost that much.
>> >
>> > It doesn't seem that difficult for future fields to have 0 as their
>> > initialized state.
>>
>> Let's put it this way: hlist requires the user to initialize hlist
>> head before usage, therefore as a hlist user, hashtable implementation
>> must do that.
>>
>> We do it automatically when the hashtable user does
>> DEFINE_HASHTABLE(), but we can't do that if he does
>> DECLARE_HASHTABLE(). This means that the hashtable user must call
>> hash_init() whenever he uses DECLARE_HASHTABLE() to create his
>> hashtable.
>>
>> There are two options here, either we specify that hash_init() should
>> only be called if DECLARE_HASHTABLE() was called, which is confusing,
>> inconsistent and prone to errors, or we can just say that it should be
>> called whenever a hashtable is used.
>>
>> The only way to work around it IMO is to get hlist to not require
>> initializing before usage, and there are good reasons that that won't
>> happen.
>
> Hrm, just a second here.
>
> The argument about hash_init being useful to add magic values in the
> future only works for the cases where a hash table is declared with
> DECLARE_HASHTABLE(). It's completely pointless with DEFINE_HASHTABLE(),
> because we could initialize any debugging variables from within
> DEFINE_HASHTABLE().
>
> So I take my "Agreed" back. I disagree with initializing the hash table
> twice redundantly. There should be at least "DEFINE_HASHTABLE()" or a
> hash_init() (for DECLARE_HASHTABLE()), but not useless execution
> initialization on top of an already statically initialized hash table.

The "magic values" argument was used to point out that some sort of
initialization *must* occur, either by hash_init() or by a proper
initialization in DEFINE_HASHTABLE(), and we can't simply memset() it
to 0. It appears that we all agree on that.

The other thing is whether hash_init() should be called for hashtables
that were created with DEFINE_HASHTABLE(). That point was raised by
Neil Brown last time this series went around, and it seems that no one
objected to the point that it should be consistent across the code.

Even if we ignore hash_init() being mostly optimized out, is it really
worth taking the risk that some future patch moves a hashtable that
uses DEFINE_HASHTABLE() into a struct, switches it to
DECLARE_HASHTABLE(), and forgets to initialize it, for example?
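
To make the scenario concrete, here is a rough userspace sketch of a table that lives inside a struct. It assumes stand-in hlist types; struct demo_cache, demo_cache_create() and hash_init_array() are hypothetical names, not part of the patch:

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's hlist types (illustrative only). */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_head *unused; struct hlist_node *first; };

#define DECLARE_HASHTABLE(name, bits) struct hlist_head name[1 << (bits)]
#define HASH_SIZE(name) (sizeof(name) / sizeof((name)[0]))

/* Run-time initializer: reset every bucket to the empty state. */
static inline void hash_init_array(struct hlist_head *ht, size_t sz)
{
	size_t i;

	for (i = 0; i < sz; i++)
		ht[i].first = NULL;
}
#define hash_init(table) hash_init_array(table, HASH_SIZE(table))

/* Hypothetical container: a member array cannot carry a static initializer. */
struct demo_cache {
	int id;
	DECLARE_HASHTABLE(buckets, 4);
};

/* malloc() returns uninitialized memory, so hash_init() is mandatory here. */
static struct demo_cache *demo_cache_create(int id)
{
	struct demo_cache *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->id = id;
	hash_init(c->buckets);
	return c;
}
```

Dropping the hash_init() call in demo_cache_create() would leave the buckets holding whatever malloc() returned, which is the failure mode being worried about.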


Thanks,
Sasha

2012-10-29 19:10:46

by Mathieu Desnoyers

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

* Tejun Heo ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 11:58:14AM -0700, Tejun Heo wrote:
> > On Mon, Oct 29, 2012 at 02:53:19PM -0400, Mathieu Desnoyers wrote:
> > > The argument about hash_init being useful to add magic values in the
> > > future only works for the cases where a hash table is declared with
> > > DECLARE_HASHTABLE(). It's completely pointless with DEFINE_HASHTABLE(),
> > > because we could initialize any debugging variables from within
> > > DEFINE_HASHTABLE().
> >
> > You can do that with a [0 ... HASH_SIZE - 1] initializer.
>
> And in general, let's please try not to do optimizations which are
> pointless. Just stick to the usual semantics. You have an abstract
> data structure - invoke the initializer before using it. Sure,
> optimize it if it shows up somewhere. And here, if we do the
> initializers properly, it shouldn't cause any more actual overhead -
> i.e. DEFINE_HASHTABLE() will basically boil down to all zero
> assignments and the compiler will put the whole thing in .bss anyway.

Yes, agreed. I was going too far in optimization land by proposing
assumptions on zeroed memory. All I actually really care about is that
we don't end up calling hash_init() on a statically defined (and thus
already initialized) hash table.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 19:13:06

by Tejun Heo

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

On Mon, Oct 29, 2012 at 03:09:36PM -0400, Sasha Levin wrote:
> The other thing is whether hash_init() should be called for hashtables
> that were created with DEFINE_HASHTABLE(). That point was raised by
> Neil Brown last time this series went around, and it seems that no one
> objected to the point that it should be consistent across the code.

Hmmm? If something is DEFINE_XXX()'d, you definitely shouldn't be
calling XXX_init() on it. That's how it is with most other abstract
data types and you need *VERY* strong rationale to deviate from that.

Thanks.

--
tejun

2012-10-29 19:17:05

by Mathieu Desnoyers

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

* Sasha Levin ([email protected]) wrote:
> On Mon, Oct 29, 2012 at 2:53 PM, Mathieu Desnoyers
> <[email protected]> wrote:
> > * Sasha Levin ([email protected]) wrote:
> >> On Mon, Oct 29, 2012 at 2:31 PM, Josh Triplett <[email protected]> wrote:
> >> > On Mon, Oct 29, 2012 at 01:29:24PM -0400, Sasha Levin wrote:
> >> >> On Mon, Oct 29, 2012 at 7:35 AM, Mathieu Desnoyers
> >> >> <[email protected]> wrote:
> >> >> > * Sasha Levin ([email protected]) wrote:
> >> >> >> Switch tracepoints to use the new hashtable implementation. This reduces the amount of
> >> >> >> generic unrelated code in the tracepoints.
> >> >> >>
> >> >> >> Signed-off-by: Sasha Levin <[email protected]>
> >> >> >> ---
> >> >> >> kernel/tracepoint.c | 27 +++++++++++----------------
> >> >> >> 1 file changed, 11 insertions(+), 16 deletions(-)
> >> >> >>
> >> >> >> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> >> >> >> index d96ba22..854df92 100644
> >> >> >> --- a/kernel/tracepoint.c
> >> >> >> +++ b/kernel/tracepoint.c
> >> >> >> @@ -26,6 +26,7 @@
> >> >> >> #include <linux/slab.h>
> >> >> >> #include <linux/sched.h>
> >> >> >> #include <linux/static_key.h>
> >> >> >> +#include <linux/hashtable.h>
> >> >> >>
> >> >> >> extern struct tracepoint * const __start___tracepoints_ptrs[];
> >> >> >> extern struct tracepoint * const __stop___tracepoints_ptrs[];
> >> >> >> @@ -49,8 +50,7 @@ static LIST_HEAD(tracepoint_module_list);
> >> >> >> * Protected by tracepoints_mutex.
> >> >> >> */
> >> >> >> #define TRACEPOINT_HASH_BITS 6
> >> >> >> -#define TRACEPOINT_TABLE_SIZE (1 << TRACEPOINT_HASH_BITS)
> >> >> >> -static struct hlist_head tracepoint_table[TRACEPOINT_TABLE_SIZE];
> >> >> >> +static DEFINE_HASHTABLE(tracepoint_table, TRACEPOINT_HASH_BITS);
> >> >> >>
> >> >> > [...]
> >> >> >>
> >> >> >> @@ -722,6 +715,8 @@ struct notifier_block tracepoint_module_nb = {
> >> >> >>
> >> >> >> static int init_tracepoints(void)
> >> >> >> {
> >> >> >> + hash_init(tracepoint_table);
> >> >> >> +
> >> >> >> return register_module_notifier(&tracepoint_module_nb);
> >> >> >> }
> >> >> >> __initcall(init_tracepoints);
> >> >> >
> >> >> > So we have a hash table defined in .bss (therefore entirely initialized
> >> >> > to NULL), and you add a call to "hash_init", which iterates on the whole
> >> >> > array and initialize it to NULL (again) ?
> >> >> >
> >> >> > This extra initialization is redundant. I think it should be removed
> >> >> > from here, and hashtable.h should document that hash_init() don't need
> >> >> > to be called on zeroed memory (which includes static/global variables,
> >> >> > kzalloc'd memory, etc).
> >> >>
> >> >> This was discussed in the previous series, the conclusion was to call
> >> >> hash_init() either way to keep the encapsulation and consistency.
> >> >>
> >> >> It's cheap enough and happens only once, so why not?
> >> >
> >> > Unnecessary work adds up. Better not to do it unnecessarily, even if by
> >> > itself it doesn't cost that much.
> >> >
> >> > It doesn't seem that difficult for future fields to have 0 as their
> >> > initialized state.
> >>
> >> Let's put it this way: hlist requires the user to initialize hlist
> >> head before usage, therefore as a hlist user, hashtable implementation
> >> must do that.
> >>
> >> We do it automatically when the hashtable user does
> >> DEFINE_HASHTABLE(), but we can't do that if he does
> >> DECLARE_HASHTABLE(). This means that the hashtable user must call
> >> hash_init() whenever he uses DECLARE_HASHTABLE() to create his
> >> hashtable.
> >>
> >> There are two options here, either we specify that hash_init() should
> >> only be called if DECLARE_HASHTABLE() was called, which is confusing,
> >> inconsistent and prone to errors, or we can just say that it should be
> >> called whenever a hashtable is used.
> >>
> >> The only way to work around it IMO is to get hlist to not require
> >> initializing before usage, and there are good reasons that that won't
> >> happen.
> >
> > Hrm, just a second here.
> >
> > The argument about hash_init being useful to add magic values in the
> > future only works for the cases where a hash table is declared with
> > DECLARE_HASHTABLE(). It's completely pointless with DEFINE_HASHTABLE(),
> > because we could initialize any debugging variables from within
> > DEFINE_HASHTABLE().
> >
> > So I take my "Agreed" back. I disagree with initializing the hash table
> > twice redundantly. There should be at least "DEFINE_HASHTABLE()" or a
> > hash_init() (for DECLARE_HASHTABLE()), but not useless execution
> > initialization on top of an already statically initialized hash table.
>
> The "magic values" argument was used to point out that some sort of
> initialization *must* occur, either by hash_init() or by a proper
> initialization in DEFINE_HASHTABLE(), and we can't simply memset() it
> to 0. It appears that we all agree on that.

Yes.

> The other thing is whether hash_init() should be called for hashtables
> that were created with DEFINE_HASHTABLE(). That point was raised by
> Neil Brown last time this series went around, and it seems that no one
> objected to the point that it should be consistent across the code.

I was probably busy in the San Diego area at that time, or preparing for
it, sorry! :)

>
> Even if we ignore hash_init() being mostly optimized out, is it really
> worth it taking the risk that some future patch would move a hashtable
> that uses DEFINE_HASHTABLE() into a struct and will start using
> DECLARE_HASHTABLE() and forgetting to initialize it, for example?

There is a saying that with "if"s, we could put Paris in a bottle. ;)

Please have a look at "linux/wait.h", where if a wait queue is defined
with DEFINE_*(), there is just no need to initialize it at runtime.
There are plenty of other kernel headers that do the same. I don't see why
hashtable.h should be different.
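
As a sketch of that convention, here is a userspace copy of the list.h pairing (used instead of wait.h only because it is shorter): a static initializer macro and a run-time init function that produce the same state, with only one of them applied to any given object. static_list is a hypothetical name:

```c
#include <stddef.h>

/* Userspace copy of the kernel's circular list head, for illustration. */
struct list_head { struct list_head *next, *prev; };

/* Static path: the head points at itself from the moment it exists. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Run-time path: for heads embedded in structs or allocated dynamically. */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* An empty list is one whose head points back at itself. */
static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Defined with the static macro, so no INIT_LIST_HEAD() call follows. */
static LIST_HEAD(static_list);
```

Note that the initialized state here is not all zeroes, so the static macro is doing real work and a later INIT_LIST_HEAD(&static_list) would only repeat it.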

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2012-10-29 19:17:40

by Sasha Levin

Subject: Re: [PATCH v7 06/16] tracepoint: use new hashtable implementation

On Mon, Oct 29, 2012 at 3:12 PM, Tejun Heo <[email protected]> wrote:
> On Mon, Oct 29, 2012 at 03:09:36PM -0400, Sasha Levin wrote:
>> The other thing is whether hash_init() should be called for hashtables
>> that were created with DEFINE_HASHTABLE(). That point was raised by
>> Neil Brown last time this series went around, and it seems that no one
>> objected to the point that it should be consistent across the code.
>
> Hmmm? If something is DEFINE_XXX()'d, you definitely shouldn't be
> calling XXX_init() on it. That's how it is with most other abstract
> data types and you need *VERY* strong rationale to deviate from that.

Neil Brown raised that point the last time this series went around,
and suggested that this should be consistent and that hash_init()
should appear everywhere, even if DEFINE_HASHTABLE() was used. Since
no one objected to that, I thought we were going with that.

I'll chalk it up to me getting confused :)


Thanks,
Sasha