2013-06-19 00:02:41

by Kent Overstreet

Subject: [PATCH v2] lib/idr.c rewrite, percpu ida/tag allocator

This is the second iteration of patches 1-4 - there have only been a
few trivial bugfixes for those.

The rest is the idr rewrite - I reimplemented it on top of the new ida
implementation and the existing radix tree implementation.

Patch series is available in my git repo:
git://evilpiepirate.org/~kent/linux-bcache.git idr

arch/powerpc/mm/icswx_pid.c | 36 +-
arch/powerpc/mm/mmu_context_hash64.c | 28 +-
block/blk-core.c | 4 +-
block/blk-sysfs.c | 2 +-
block/bsg.c | 2 +-
block/genhd.c | 2 +-
drivers/atm/nicstar.c | 4 +-
drivers/base/platform.c | 6 +-
drivers/base/soc.c | 18 +-
drivers/block/drbd/drbd_main.c | 6 +-
drivers/block/drbd/drbd_nl.c | 2 +-
drivers/block/loop.c | 4 +-
drivers/block/mtip32xx/mtip32xx.c | 24 +-
drivers/block/nvme-core.c | 33 +-
drivers/block/rsxx/core.c | 21 +-
drivers/block/virtio_blk.c | 6 +-
drivers/dca/dca-sysfs.c | 18 +-
drivers/dma/dmaengine.c | 2 +-
drivers/firewire/core-cdev.c | 5 +-
drivers/firewire/core-device.c | 2 +-
drivers/gpio/gpiolib.c | 2 +-
drivers/gpu/drm/drm_context.c | 2 +-
drivers/gpu/drm/drm_crtc.c | 2 +-
drivers/gpu/drm/drm_gem.c | 8 +-
drivers/gpu/drm/drm_stub.c | 2 +-
drivers/gpu/drm/exynos/exynos_drm_ipp.c | 2 +-
drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
drivers/gpu/drm/qxl/qxl_cmd.c | 4 +-
drivers/gpu/drm/qxl/qxl_drv.h | 1 -
drivers/gpu/drm/qxl/qxl_kms.c | 1 -
drivers/gpu/drm/qxl/qxl_release.c | 19 +-
drivers/gpu/drm/sis/sis_mm.c | 2 +-
drivers/gpu/drm/via/via_mm.c | 2 +-
drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 31 +-
drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 4 +-
drivers/hwmon/hwmon.c | 6 +-
drivers/hwmon/ibmaem.c | 10 +-
drivers/i2c/i2c-core.c | 4 +-
drivers/iio/industrialio-core.c | 4 +-
drivers/iio/industrialio-trigger.c | 6 +-
drivers/infiniband/core/cm.c | 7 +-
drivers/infiniband/core/cma.c | 2 +-
drivers/infiniband/core/sa_query.c | 4 +-
drivers/infiniband/core/ucm.c | 2 +-
drivers/infiniband/core/ucma.c | 4 +-
drivers/infiniband/core/uverbs_cmd.c | 4 +-
drivers/infiniband/hw/amso1100/c2.h | 1 -
drivers/infiniband/hw/amso1100/c2_qp.c | 20 +-
drivers/infiniband/hw/cxgb3/iwch.h | 4 +-
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 4 +-
drivers/infiniband/hw/ehca/ehca_cq.c | 4 +-
drivers/infiniband/hw/ehca/ehca_qp.c | 4 +-
drivers/infiniband/hw/ipath/ipath_driver.c | 4 +-
drivers/infiniband/hw/mlx4/cm.c | 2 +-
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 2 +-
drivers/infiniband/hw/qib/qib_init.c | 4 +-
drivers/input/input.c | 6 +-
drivers/iommu/iommu.c | 18 +-
drivers/ipack/ipack.c | 4 +-
drivers/md/dm.c | 22 +-
drivers/memstick/core/memstick.c | 17 +-
drivers/memstick/core/mspro_block.c | 2 +-
drivers/mfd/rtsx_pcr.c | 13 +-
drivers/misc/c2port/core.c | 11 +-
drivers/misc/cb710/core.c | 19 +-
drivers/misc/tifm_core.c | 15 +-
drivers/mmc/core/host.c | 13 +-
drivers/mtd/mtdcore.c | 4 +-
drivers/net/macvtap.c | 2 +-
drivers/net/ppp/ppp_generic.c | 4 +-
drivers/power/bq2415x_charger.c | 2 +-
drivers/power/bq27x00_battery.c | 2 +-
drivers/power/ds2782_battery.c | 2 +-
drivers/pps/kapi.c | 2 +-
drivers/pps/pps.c | 4 +-
drivers/ptp/ptp_clock.c | 4 +-
drivers/remoteproc/remoteproc_core.c | 8 +-
drivers/rpmsg/virtio_rpmsg_bus.c | 4 +-
drivers/rtc/class.c | 6 +-
drivers/scsi/bfa/bfad_im.c | 2 +-
drivers/scsi/ch.c | 14 +-
drivers/scsi/lpfc/lpfc_init.c | 2 +-
drivers/scsi/osd/osd_uld.c | 9 +-
drivers/scsi/scsi_transport_iscsi.c | 6 +-
drivers/scsi/sd.c | 19 +-
drivers/scsi/sg.c | 4 +-
drivers/scsi/st.c | 13 +-
drivers/staging/tidspbridge/rmgr/drv.c | 4 +-
drivers/staging/zcache/ramster/tcp.c | 2 +-
drivers/target/iscsi/iscsi_target.c | 17 +-
drivers/target/iscsi/iscsi_target.h | 1 -
drivers/target/iscsi/iscsi_target_login.c | 12 +-
drivers/thermal/cpu_cooling.c | 2 +-
drivers/thermal/thermal_core.c | 2 +-
drivers/uio/uio.c | 2 +-
drivers/usb/chipidea/core.c | 6 +-
drivers/vfio/vfio.c | 2 +-
drivers/virtio/virtio.c | 4 +-
drivers/w1/slaves/w1_ds2760.c | 6 +-
drivers/w1/slaves/w1_ds2780.c | 6 +-
drivers/w1/slaves/w1_ds2781.c | 6 +-
drivers/watchdog/watchdog_core.c | 12 +-
fs/devpts/inode.c | 24 +-
fs/dlm/lock.c | 4 +-
fs/dlm/recover.c | 4 +-
fs/namespace.c | 49 +-
fs/nfs/nfs4client.c | 4 +-
fs/nfs/nfs4state.c | 33 +-
fs/notify/inotify/inotify_user.c | 2 +-
fs/ocfs2/cluster/tcp.c | 2 +-
fs/proc/generic.c | 29 +-
fs/super.c | 37 +-
fs/sysfs/dir.c | 20 +-
include/linux/cgroup.h | 1 -
include/linux/idr.h | 358 +++--
include/net/sctp/sctp.h | 1 -
init/main.c | 1 -
ipc/util.c | 6 +-
kernel/cgroup.c | 42 +-
kernel/events/core.c | 2 +-
kernel/workqueue.c | 17 +-
lib/idr.c | 1794 ++++++++++++-------------
mm/memcontrol.c | 6 +-
net/9p/util.c | 15 +-
net/bluetooth/hci_core.c | 8 +-
net/core/net_namespace.c | 15 +-
net/mac80211/tx.c | 2 +-
net/nfc/core.c | 4 +-
net/sctp/associola.c | 14 +-
net/sctp/protocol.c | 1 -
net/sctp/socket.c | 2 -
131 files changed, 1371 insertions(+), 1874 deletions(-)


2013-06-19 00:02:53

by Kent Overstreet

Subject: [PATCH 03/10] idr: Rewrite ida

This is a new, from scratch implementation of ida that should be
simpler, faster and more space efficient.

Two primary reasons for the rewrite:
* A future patch will reimplement idr on top of this ida implementation +
radix trees. Once that's done, the end result will be ~1k fewer lines
of code, much simpler and easier to understand and it should be quite
a bit faster.

* The performance improvements and addition of ganged allocation should
make ida more suitable for use by a percpu id/tag allocator, which
would then act as a frontend to this allocator.

The old ida implementation was done with the idr data structures - this
was IMO backwards. I'll soon be reimplementing idr on top of this new
ida implementation and radix trees - using a separate dedicated data
structure for the free ID bitmap should actually make idr faster, and
the end result is _significantly_ less code.

This implementation conceptually isn't that different from the old one -
it's a tree of bitmaps, where one bit in a given node indicates whether
or not there are free bits in a child node.

The main difference (and advantage) over the old version is that the
tree isn't implemented with pointers - it's implemented in an array,
like how heaps are implemented, which gives both better space efficiency
and better performance, since there's no pointer chasing.
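
As a rough illustration of the index arithmetic described above (these
helpers are just for exposition - they're not the names used in the
patch):

    /* IDA_TREE_ARY == BITS_PER_LONG, i.e. 32 or 64 */

    /* the jth child of node i */
    static unsigned ida_child(unsigned i, unsigned j)
    {
            return i * IDA_TREE_ARY + 1 + j;
    }

    /* the parent of node i (for i > 0; node 0 is the root) */
    static unsigned ida_parent(unsigned i)
    {
            return (i - 1) / IDA_TREE_ARY;
    }

    /* leaf node and bit within that node tracking a given id */
    static unsigned ida_id_to_node(const struct ida *ida, unsigned id)
    {
            return ida->first_leaf + id / BITS_PER_LONG;
    }

    static unsigned ida_id_to_bit(unsigned id)
    {
            return id % BITS_PER_LONG;
    }

Since the leaves all sit contiguously at the end of the array, mapping
between an id and its (node, bit) position is just the integer math
above - no pointer chasing at any point.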

This does mean that the entire bitmap is stored in one contiguous memory
allocation - and as currently implemented we won't be able to allocate
_quite_ as many ids as with the previous implementation.
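
To put rough numbers on it: the parent nodes add only about
1/BITS_PER_LONG on top of what a flat bitmap would use (one bit of
parent per leaf word), so a 1 MB contiguous allocation - roughly 8
million bits, minus the ~1.5% spent on parent nodes - still covers
slightly under 8 million ids.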

I don't expect this to be an issue in practice, since anywhere this is
used an id corresponds to a struct allocation somewhere else - we can't
allocate an unbounded number of ids, we'll run out of memory somewhere
else eventually, and I expect that to be the limiting factor in
practice.

If a user/use case does come up where this matters I can add some
sharding (or perhaps add a separate big_ida implementation) - but the
extra complexity would adversely affect performance for the users that
don't need more than a few million ids, so I intend to leave the
implementation as is unless and until this becomes an issue.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Fengguang Wu <[email protected]>
---
include/linux/idr.h | 118 +++++---
lib/idr.c | 776 +++++++++++++++++++++++++++++++++-------------------
2 files changed, 573 insertions(+), 321 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index c0e0c54..a78cf99 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -17,6 +17,88 @@
#include <linux/init.h>
#include <linux/rcupdate.h>

+/* IDA */
+
+#define IDA_INLINE_NODES 1
+
+struct ida {
+ spinlock_t lock;
+
+ /*
+ * cur_id and allocated_ids are for ida_alloc_cyclic. For cyclic
+ * allocations we search for new ids to allocate starting from the last
+ * id allocated - cur_id is the next id to try allocating.
+ *
+ * But we also don't want the allocated ids to be arbitrarily sparse -
+ * the memory usage for the bitmap could be arbitrarily bad, and if
+ * they're used as keys in a radix tree the memory overhead of the radix
+ * tree could be quite bad as well. So we use allocated_ids to decide
+ * when to restart cur_id from 0, and bound how sparse the bitmap can
+ * be.
+ */
+ unsigned cur_id;
+ unsigned allocated_ids;
+
+ /* size of ida->tree */
+ unsigned nodes;
+
+ /*
+ * Index of first leaf node in ida->tree; equal to the number of non
+ * leaf nodes, ida->nodes - ida->first_leaf == number of leaf nodes
+ */
+ unsigned first_leaf;
+
+ unsigned long *tree;
+ unsigned long inline_nodes[IDA_INLINE_NODES];
+};
+
+#define IDA_INIT(name) \
+{ \
+ .lock = __SPIN_LOCK_UNLOCKED(name.lock), \
+ .nodes = IDA_INLINE_NODES, \
+ .first_leaf = 1, \
+ .tree = name.inline_nodes, \
+}
+#define DEFINE_IDA(name) struct ida name = IDA_INIT(name)
+
+void ida_remove(struct ida *ida, unsigned id);
+int ida_alloc_range(struct ida *ida, unsigned int start,
+ unsigned int end, gfp_t gfp);
+int ida_alloc_cyclic(struct ida *ida, unsigned start, unsigned end, gfp_t gfp);
+void ida_destroy(struct ida *ida);
+int ida_init_prealloc(struct ida *ida, unsigned prealloc);
+
+/**
+ * ida_alloc - allocate a new id.
+ * @ida: the (initialized) ida.
+ * @gfp_mask: memory allocation flags
+ *
+ * Allocates an id in the range [0, INT_MAX]. Returns -ENOSPC if no ids are
+ * available, or -ENOMEM on memory allocation failure.
+ *
+ * Returns the smallest available id
+ *
+ * Use ida_remove() to get rid of an id.
+ */
+static inline int ida_alloc(struct ida *ida, gfp_t gfp_mask)
+{
+ return ida_alloc_range(ida, 0, 0, gfp_mask);
+}
+
+/**
+ * ida_init - initialize ida handle
+ * @ida: ida handle
+ *
+ * This function is used to set up the handle (@ida) that you will pass
+ * to the rest of the functions.
+ */
+static inline void ida_init(struct ida *ida)
+{
+ ida_init_prealloc(ida, 0);
+}
+
+/* IDR */
+
/*
* We want shallower trees and thus more bits covered at each layer. 8
* bits gives us large enough first layer for most use cases and maximum
@@ -195,42 +277,6 @@ static inline void __deprecated idr_remove_all(struct idr *idp)
__idr_remove_all(idp);
}

-/*
- * IDA - IDR based id allocator, use when translation from id to
- * pointer isn't necessary.
- *
- * IDA_BITMAP_LONGS is calculated to be one less to accommodate
- * ida_bitmap->nr_busy so that the whole struct fits in 128 bytes.
- */
-#define IDA_CHUNK_SIZE 128 /* 128 bytes per chunk */
-#define IDA_BITMAP_LONGS (IDA_CHUNK_SIZE / sizeof(long) - 1)
-#define IDA_BITMAP_BITS (IDA_BITMAP_LONGS * sizeof(long) * 8)
-
-struct ida_bitmap {
- long nr_busy;
- unsigned long bitmap[IDA_BITMAP_LONGS];
-};
-
-struct ida {
- struct idr idr;
- struct ida_bitmap *free_bitmap;
-};
-
-#define IDA_INIT(name) { .idr = IDR_INIT((name).idr), .free_bitmap = NULL, }
-#define DEFINE_IDA(name) struct ida name = IDA_INIT(name)
-
-void ida_destroy(struct ida *ida);
-void ida_init(struct ida *ida);
-
-int ida_alloc_range(struct ida *ida, unsigned int start, unsigned int end,
- gfp_t gfp_mask);
-void ida_remove(struct ida *ida, unsigned int id);
-
-static inline int ida_alloc(struct ida *ida, gfp_t gfp_mask)
-{
- return ida_alloc_range(ida, 0, 0, gfp_mask);
-}
-
void __init idr_init_cache(void);

#endif /* __IDR_H__ */
diff --git a/lib/idr.c b/lib/idr.c
index df2d32e..e6b837a 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -8,6 +8,8 @@
*
* Modified by Nadia Derbey to make it RCU safe.
*
+ * IDA completely rewritten by Kent Overstreet <[email protected]>
+ *
* Small id to pointer translation service.
*
* It uses a radix tree like structure as a sparse array indexed
@@ -38,6 +40,495 @@
#include <linux/percpu.h>
#include <linux/hardirq.h>

+#define IDA_TREE_ARY BITS_PER_LONG
+
+/**
+ * DOC: IDA description
+ * IDA - ID (small integer) allocator
+ *
+ * This works much like using a simple bitmap to allocate indices - ida_alloc()
+ * is equivalent to find_first_zero_bit() then __set_bit(), and ida_remove() is
+ * equivalent to __clear_bit(). But it's much more efficient than a large
+ * bitmap, and resizes itself as needed.
+ *
+ * It's implemented as a tree of bitmaps: a node in the tree is a single
+ * unsigned long. The leaf nodes of the tree are segments of the entire bitmap -
+ * a cleared bit indicates a free id, and a set bit indicates an allocated one.
+ * Bits in the parent nodes indicate whether or not there are free bits in the
+ * corresponding child node - when all the bits in a parent node are set, none
+ * of its children have bits free.
+ *
+ * The splay factor of the tree (IDA_TREE_ARY) == BITS_PER_LONG - parent nodes
+ * have 32 or 64 children.
+ *
+ * The tree itself is implemented with an array instead of pointers - exactly
+ * like the textbook implementation of D-ary heaps. The root of the bitmap tree
+ * is at ida->tree[0]. The children of node i are at i * IDA_TREE_ARY + 1 + j,
+ * where j is in the range [0, 63], and the parent of node i is at (i - 1) /
+ * IDA_TREE_ARY.
+ *
+ * This conveniently means that our leaf nodes are all contiguous in memory -
+ * the bit for id i is bit id % BITS_PER_LONG in ida->tree[ida->first_leaf + i /
+ * BITS_PER_LONG].
+ *
+ * Note that the number of ids we can allocate is limited by the amount of
+ * memory we can contiguously allocate. The amount of memory used for the bitmap
+ * tree is only slightly more than a flat bitmap would use - about 1 / TREE_ARY
+ * * (sizeof flat bitmap).
+ *
+ * So for 1 mb of memory (and allocating more than that should be fine with
+ * CONFIG_COMPACTION) you get slightly under 8 million IDs.
+ */
+
+/*
+ * For a given number of nodes, calculate how many are going to be parent nodes
+ * (equal to ida->first_leaf) and by extension how many will be leaves.
+ */
+static unsigned first_leaf_from_nodes(unsigned nodes)
+{
+ unsigned ret = 0;
+
+ while (ret * IDA_TREE_ARY + 1 < nodes)
+ ret = ret * IDA_TREE_ARY + 1;
+
+ return ret;
+}
+
+/*
+ * Given some number of leaf nodes, calculate the number of parent nodes the
+ * bitmap tree will require - i.e. the new ida->first_leaf
+ */
+static unsigned first_leaf_from_leaves(unsigned leaves)
+{
+ unsigned ret = 0;
+
+ while (ret * IDA_TREE_ARY + 1 < ret + leaves)
+ ret = ret * IDA_TREE_ARY + 1;
+
+ return ret;
+}
+
+static void __ida_remove(struct ida *ida, unsigned int id)
+{
+ unsigned i = ida->first_leaf + id / BITS_PER_LONG;
+ unsigned bit = id % BITS_PER_LONG;
+
+ BUG_ON(i >= ida->nodes);
+
+ --ida->allocated_ids;
+
+ while (1) {
+ unsigned long *node = ida->tree + i, old = *node;
+
+ WARN_ON(!test_bit(bit, node));
+ __clear_bit(bit, node);
+
+ if (~old || !i)
+ break;
+
+ /*
+ * If this node's bits were all 1s before we cleared this bit,
+ * we need to clear this node's bit in the parent node - and so
+ * on up to the root.
+ */
+
+ bit = (i - 1) % IDA_TREE_ARY;
+ i = (i - 1) / IDA_TREE_ARY;
+ }
+}
+
+/**
+ * ida_remove - remove an allocated id.
+ * @ida: the (initialized) ida.
+ * @id: the id returned by ida_alloc_range.
+ */
+void ida_remove(struct ida *ida, unsigned int id)
+{
+ unsigned long flags;
+ spin_lock_irqsave(&ida->lock, flags);
+ __ida_remove(ida, id);
+ spin_unlock_irqrestore(&ida->lock, flags);
+}
+EXPORT_SYMBOL(ida_remove);
+
+static void ida_free_tree(unsigned long *tree, unsigned nodes)
+{
+ size_t bytes = nodes * sizeof(unsigned long);
+
+ if (bytes < PAGE_SIZE)
+ kfree(tree);
+ else
+ free_pages((unsigned long) tree,
+ get_order(bytes));
+
+}
+
+/*
+ * Attempt to double the size of the tree. We have to drop ida->lock to allocate
+ * memory, so we might race with another allocation that also tries to resize.
+ * So if the tree's not the size it originally was when we retake ida->lock,
+ * just return 0 - but the caller needs to recheck for the tree being full in
+ * case we _really_ raced badly.
+ */
+static inline int __ida_resize(struct ida *ida, unsigned max_id,
+ gfp_t gfp, unsigned long *flags)
+ __releases(&ida->lock)
+ __acquires(&ida->lock)
+{
+ unsigned long *tree;
+ unsigned old_nodes = ida->nodes;
+ unsigned cur_leaves = ida->nodes - ida->first_leaf;
+ unsigned new_nodes = roundup_pow_of_two(ida->nodes + 1);
+ unsigned first_leaf = first_leaf_from_nodes(new_nodes);
+ size_t bytes;
+
+ if (cur_leaves >= BITS_TO_LONGS(max_id))
+ return -ENOSPC;
+
+ spin_unlock_irqrestore(&ida->lock, *flags);
+
+ bytes = new_nodes * sizeof(unsigned long);
+
+ tree = bytes < PAGE_SIZE
+ ? kmalloc(bytes, gfp)
+ : (void *) __get_free_pages(gfp, get_order(bytes));
+
+ spin_lock_irqsave(&ida->lock, *flags);
+
+ if (!tree)
+ return -ENOMEM;
+
+ if (old_nodes != ida->nodes) {
+ ida_free_tree(tree, new_nodes);
+ return 0;
+ }
+
+ if (first_leaf == ida->first_leaf) {
+ /* Depth doesn't change, just appending leaf nodes */
+ memcpy(tree, ida->tree, ida->nodes * sizeof(unsigned long));
+ } else {
+ unsigned i, j, bit;
+
+ /* Zero out new parent nodes, reconstructing them below */
+ memset(tree, 0, first_leaf * sizeof(unsigned long));
+
+ memcpy(tree + first_leaf,
+ ida->tree + ida->first_leaf,
+ cur_leaves * sizeof(unsigned long));
+
+ for (i = first_leaf; i < first_leaf + cur_leaves; i++) {
+ j = i;
+
+ while (!~tree[j] && j) {
+ bit = (j - 1) % IDA_TREE_ARY;
+ j = (j - 1) / IDA_TREE_ARY;
+
+ __set_bit(bit, tree + j);
+ }
+ }
+ }
+
+ /* Zero out new leaf nodes */
+ memset(tree + first_leaf + cur_leaves, 0,
+ (new_nodes - first_leaf - cur_leaves) * sizeof(unsigned long));
+
+ if (ida->tree != ida->inline_nodes)
+ ida_free_tree(ida->tree, ida->nodes);
+
+ ida->nodes = new_nodes;
+ ida->first_leaf = first_leaf;
+ ida->tree = tree;
+
+ return 0;
+}
+
+/*
+ * Ganged allocation - amortize locking and tree traversal for when we've got
+ * another allocator (i.e. a percpu version) acting as a frontend to this code
+ */
+static int __ida_alloc_range_multiple(struct ida *ida, unsigned *ids,
+ unsigned nr_ids, unsigned min_id,
+ unsigned max_id, gfp_t gfp,
+ unsigned long *flags)
+ __releases(&ida->lock)
+ __acquires(&ida->lock)
+{
+ unsigned i = 0, bit, bit_offset, id, ids_found = 0;
+ unsigned long *node = ida->tree;
+ int err = 0;
+
+ if (!max_id)
+ max_id = (unsigned) INT_MAX + 1;
+
+ if (min_id >= max_id)
+ return -ENOSPC;
+
+ while (ids_found < nr_ids) {
+ /*
+ * If all bits are set in the root, no bits free and we need to
+ * resize.
+ */
+ while (!~*node) {
+resize:
+ err = __ida_resize(ida, max_id, gfp, flags);
+ if (err)
+ goto err;
+
+ i = 0;
+ node = ida->tree;
+ }
+
+ if (min_id) {
+ /*
+ * If we're starting from a specific index, skip to that
+ * leaf node and start looking there:
+ */
+ bit_offset = min_id % BITS_PER_LONG;
+ i = ida->first_leaf + min_id / BITS_PER_LONG;
+
+ if (i >= ida->nodes)
+ goto resize;
+
+ while (1) {
+ node = ida->tree + i;
+ bit = ffz(*node >> bit_offset) + bit_offset;
+
+ /*
+ * We might have had to go back up the tree
+ * before we found a free bit - so skip down to
+ * where we recurse down the tree.
+ */
+ if (~*node && bit < BITS_PER_LONG)
+ goto found;
+
+ if (!i)
+ goto resize;
+
+ /*
+ * Ok, no bits available in this node - go up a
+ * level. But we have to update bit_offset so we
+ * start searching in the parent _after_ the
+ * node we're currently at
+ */
+ bit_offset = (i - 1) % IDA_TREE_ARY + 1;
+ i = (i - 1) / IDA_TREE_ARY;
+ }
+ }
+
+ /*
+ * Recurse down the tree looking for a free bit. We already
+ * checked to make sure there _were_ free bits, but we might end
+ * up at a leaf node we haven't allocated yet.
+ */
+ while (1) {
+ bit = ffz(*node);
+found:
+ /*
+ * Found a bit - if we're at a leaf node, great! We're
+ * done:
+ */
+ if (i >= ida->first_leaf)
+ break;
+
+ i = i * IDA_TREE_ARY + 1 + bit;
+ node = ida->tree + i;
+
+ /*
+ * Recurse. But if we'd recurse to a node that hasn't
+ * been allocated yet, resize:
+ */
+
+ if (i >= ida->nodes)
+ goto resize;
+
+ BUG_ON(!~*node);
+ }
+
+ /*
+ * Our leaves are contiguous, so we can calculate the id we
+ * allocated from the node we're at and the bit we found within
+ * that node:
+ */
+ id = (i - ida->first_leaf) * BITS_PER_LONG + bit;
+ BUG_ON(id < min_id);
+
+ if (id >= max_id) {
+ err = -ENOSPC;
+ goto err;
+ }
+
+ ids[ids_found++] = id;
+ ida->allocated_ids++;
+
+ /*
+ * Mark the id as allocated. If all the bits are now set in this
+ * node, set this node's bit in the parent node - and so on up
+ * to the root:
+ */
+ while (1) {
+ __set_bit(bit, node);
+
+ if (~*node || !i)
+ break;
+
+ bit = (i - 1) % IDA_TREE_ARY;
+ i = (i - 1) / IDA_TREE_ARY;
+
+ node = ida->tree + i;
+ }
+ }
+err:
+ return ids_found ? ids_found : err;
+}
+
+/**
+ * ida_alloc_range - allocate a new id.
+ * @ida: the (initialized) ida.
+ * @start: the minimum id (inclusive, <= INT_MAX)
+ * @end: the maximum id (exclusive, <= INT_MAX + 1 or 0 for unlimited)
+ * @gfp: memory allocation flags
+ *
+ * Allocates an id in the range [start, end). Returns -ENOSPC if no ids are
+ * available, or -ENOMEM on memory allocation failure.
+ *
+ * Returns the smallest free id >= start.
+ *
+ * Use ida_remove() to get rid of an id.
+ */
+int ida_alloc_range(struct ida *ida, unsigned int start,
+ unsigned int end, gfp_t gfp)
+{
+ int ret;
+ unsigned id;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ida->lock, flags);
+ ret = __ida_alloc_range_multiple(ida, &id, 1, start, end, gfp, &flags);
+ spin_unlock_irqrestore(&ida->lock, flags);
+
+ return ret == 1 ? id : ret;
+}
+EXPORT_SYMBOL(ida_alloc_range);
+
+static int __ida_alloc_cyclic(struct ida *ida, unsigned start, unsigned end,
+ gfp_t gfp, unsigned long *flags)
+ __releases(&ida->lock)
+ __acquires(&ida->lock)
+{
+ int ret;
+ unsigned id;
+
+ ret = __ida_alloc_range_multiple(ida, &id, 1,
+ max(start, ida->cur_id),
+ end, gfp, flags);
+
+ if (ret < 0)
+ ret = __ida_alloc_range_multiple(ida, &id, 1, start,
+ end, gfp, flags);
+ if (ret == 1) {
+ ida->cur_id = id + 1;
+ if ((ida->cur_id - start) / 2 > max(1024U, ida->allocated_ids))
+ ida->cur_id = 0;
+
+ return id;
+ }
+
+ return ret;
+}
+
+/**
+ * ida_alloc_cyclic - allocate new ids cyclically
+ * @ida: the (initialized) ida.
+ * @start: the minimum id (inclusive, <= INT_MAX)
+ * @end: the maximum id (exclusive, <= INT_MAX + 1 or 0 for unlimited)
+ * @gfp: memory allocation flags
+ *
+ * Allocates an id in the range start <= id < end, or returns -ENOSPC.
+ * On memory allocation failure, returns -ENOMEM.
+ *
+ * Instead of returning the smallest free id, start searching from the position
+ * where the last id was allocated - i.e. it won't reuse freed ids right away.
+ *
+ * To avoid the allocated id space (and internal bitmap) becoming arbitrarily
+ * sparse, it can wrap before reaching the maximum id - if less than half of our
+ * current id space is allocated, it resets cur_id to 0
+ *
+ * But we don't want to wrap when the id space is small, so we use the maximum
+ * of (1024, allocated_ids) - see __ida_alloc_cyclic().
+ *
+ * Use ida_remove() to get rid of an id.
+ */
+int ida_alloc_cyclic(struct ida *ida, unsigned start, unsigned end, gfp_t gfp)
+{
+ int ret;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ida->lock, flags);
+ ret = __ida_alloc_cyclic(ida, start, end, gfp, &flags);
+ spin_unlock_irqrestore(&ida->lock, flags);
+
+ return ret;
+}
+EXPORT_SYMBOL(ida_alloc_cyclic);
+
+/**
+ * ida_destroy - release all cached layers within an ida tree
+ * @ida: ida handle
+ */
+void ida_destroy(struct ida *ida)
+{
+ if (ida->tree != ida->inline_nodes)
+ ida_free_tree(ida->tree, ida->nodes);
+}
+EXPORT_SYMBOL(ida_destroy);
+
+/**
+ * ida_init_prealloc - initialize ida handle
+ * @ida: ida handle
+ * @prealloc: number of ids to preallocate memory for
+ *
+ * Initialize an ida, and preallocate enough memory that ida_alloc() will never
+ * return -ENOMEM if passed max_id <= prealloc.
+ */
+int ida_init_prealloc(struct ida *ida, unsigned prealloc)
+{
+ memset(ida, 0, sizeof(*ida));
+
+ spin_lock_init(&ida->lock);
+
+ ida->nodes = IDA_INLINE_NODES;
+ ida->first_leaf = 0;
+ ida->tree = ida->inline_nodes;
+
+ if (prealloc) {
+ unsigned leaves = BITS_TO_LONGS(prealloc);
+ unsigned first_leaf = first_leaf_from_leaves(leaves);
+ unsigned nodes = first_leaf + leaves;
+ size_t bytes;
+
+ if (nodes > ida->nodes) {
+ nodes = roundup_pow_of_two(nodes);
+ bytes = nodes * sizeof(unsigned long);
+
+ ida->tree = bytes < PAGE_SIZE
+ ? kzalloc(bytes, GFP_KERNEL)
+ : (void *) __get_free_pages(GFP_KERNEL|__GFP_ZERO,
+ get_order(bytes));
+ if (!ida->tree)
+ return -ENOMEM;
+
+ ida->nodes = nodes;
+ ida->first_leaf = first_leaf;
+ }
+ }
+
+ return 0;
+
+}
+EXPORT_SYMBOL(ida_init_prealloc);
+
+/* IDR */
+
#define MAX_IDR_SHIFT (sizeof(int) * 8 - 1)
#define MAX_IDR_BIT (1U << MAX_IDR_SHIFT)

@@ -50,7 +541,6 @@
static struct kmem_cache *idr_layer_cache;
static DEFINE_PER_CPU(struct idr_layer *, idr_preload_head);
static DEFINE_PER_CPU(int, idr_preload_cnt);
-static DEFINE_SPINLOCK(simple_ida_lock);

/* the maximum ID which can be allocated given idr->layers */
static int idr_max(int layers)
@@ -870,287 +1360,3 @@ void idr_init(struct idr *idp)
spin_lock_init(&idp->lock);
}
EXPORT_SYMBOL(idr_init);
-
-
-/**
- * DOC: IDA description
- * IDA - IDR based ID allocator
- *
- * This is id allocator without id -> pointer translation. Memory
- * usage is much lower than full blown idr because each id only
- * occupies a bit. ida uses a custom leaf node which contains
- * IDA_BITMAP_BITS slots.
- *
- * 2007-04-25 written by Tejun Heo <[email protected]>
- */
-
-static void free_bitmap(struct ida *ida, struct ida_bitmap *bitmap)
-{
- unsigned long flags;
-
- if (!ida->free_bitmap) {
- spin_lock_irqsave(&ida->idr.lock, flags);
- if (!ida->free_bitmap) {
- ida->free_bitmap = bitmap;
- bitmap = NULL;
- }
- spin_unlock_irqrestore(&ida->idr.lock, flags);
- }
-
- kfree(bitmap);
-}
-
-/**
- * ida_pre_get - reserve resources for ida allocation
- * @ida: ida handle
- * @gfp_mask: memory allocation flag
- *
- * This function should be called prior to locking and calling the
- * following function. It preallocates enough memory to satisfy the
- * worst possible allocation.
- *
- * If the system is REALLY out of memory this function returns %0,
- * otherwise %1.
- */
-static int ida_pre_get(struct ida *ida, gfp_t gfp_mask)
-{
- /* allocate idr_layers */
- if (!__idr_pre_get(&ida->idr, gfp_mask))
- return 0;
-
- /* allocate free_bitmap */
- if (!ida->free_bitmap) {
- struct ida_bitmap *bitmap;
-
- bitmap = kmalloc(sizeof(struct ida_bitmap), gfp_mask);
- if (!bitmap)
- return 0;
-
- free_bitmap(ida, bitmap);
- }
-
- return 1;
-}
-
-/**
- * ida_get_new_above - allocate new ID above or equal to a start id
- * @ida: ida handle
- * @starting_id: id to start search at
- * @p_id: pointer to the allocated handle
- *
- * Allocate new ID above or equal to @starting_id. It should be called
- * with any required locks.
- *
- * If memory is required, it will return %-EAGAIN, you should unlock
- * and go back to the ida_pre_get() call. If the ida is full, it will
- * return %-ENOSPC.
- *
- * @p_id returns a value in the range @starting_id ... %0x7fffffff.
- */
-static int ida_get_new_above(struct ida *ida, int starting_id, int *p_id)
-{
- struct idr_layer *pa[MAX_IDR_LEVEL + 1];
- struct ida_bitmap *bitmap;
- unsigned long flags;
- int idr_id = starting_id / IDA_BITMAP_BITS;
- int offset = starting_id % IDA_BITMAP_BITS;
- int t, id;
-
- restart:
- /* get vacant slot */
- t = idr_get_empty_slot(&ida->idr, idr_id, pa, 0, &ida->idr);
- if (t < 0)
- return t == -ENOMEM ? -EAGAIN : t;
-
- if (t * IDA_BITMAP_BITS >= MAX_IDR_BIT)
- return -ENOSPC;
-
- if (t != idr_id)
- offset = 0;
- idr_id = t;
-
- /* if bitmap isn't there, create a new one */
- bitmap = (void *)pa[0]->ary[idr_id & IDR_MASK];
- if (!bitmap) {
- spin_lock_irqsave(&ida->idr.lock, flags);
- bitmap = ida->free_bitmap;
- ida->free_bitmap = NULL;
- spin_unlock_irqrestore(&ida->idr.lock, flags);
-
- if (!bitmap)
- return -EAGAIN;
-
- memset(bitmap, 0, sizeof(struct ida_bitmap));
- rcu_assign_pointer(pa[0]->ary[idr_id & IDR_MASK],
- (void *)bitmap);
- pa[0]->count++;
- }
-
- /* lookup for empty slot */
- t = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, offset);
- if (t == IDA_BITMAP_BITS) {
- /* no empty slot after offset, continue to the next chunk */
- idr_id++;
- offset = 0;
- goto restart;
- }
-
- id = idr_id * IDA_BITMAP_BITS + t;
- if (id >= MAX_IDR_BIT)
- return -ENOSPC;
-
- __set_bit(t, bitmap->bitmap);
- if (++bitmap->nr_busy == IDA_BITMAP_BITS)
- idr_mark_full(pa, idr_id);
-
- *p_id = id;
-
- /* Each leaf node can handle nearly a thousand slots and the
- * whole idea of ida is to have small memory foot print.
- * Throw away extra resources one by one after each successful
- * allocation.
- */
- if (ida->idr.id_free_cnt || ida->free_bitmap) {
- struct idr_layer *p = get_from_free_list(&ida->idr);
- if (p)
- kmem_cache_free(idr_layer_cache, p);
- }
-
- return 0;
-}
-
-static void __ida_remove(struct ida *ida, int id)
-{
- struct idr_layer *p = ida->idr.top;
- int shift = (ida->idr.layers - 1) * IDR_BITS;
- int idr_id = id / IDA_BITMAP_BITS;
- int offset = id % IDA_BITMAP_BITS;
- int n;
- struct ida_bitmap *bitmap;
-
- /* clear full bits while looking up the leaf idr_layer */
- while ((shift > 0) && p) {
- n = (idr_id >> shift) & IDR_MASK;
- __clear_bit(n, p->bitmap);
- p = p->ary[n];
- shift -= IDR_BITS;
- }
-
- if (p == NULL)
- goto err;
-
- n = idr_id & IDR_MASK;
- __clear_bit(n, p->bitmap);
-
- bitmap = (void *)p->ary[n];
- if (!test_bit(offset, bitmap->bitmap))
- goto err;
-
- /* update bitmap and remove it if empty */
- __clear_bit(offset, bitmap->bitmap);
- if (--bitmap->nr_busy == 0) {
- __set_bit(n, p->bitmap); /* to please idr_remove() */
- idr_remove(&ida->idr, idr_id);
- free_bitmap(ida, bitmap);
- }
-
- return;
-
- err:
- printk(KERN_WARNING
- "ida_remove called for id=%d which is not allocated.\n", id);
-}
-
-/**
- * ida_destroy - release all cached layers within an ida tree
- * @ida: ida handle
- */
-void ida_destroy(struct ida *ida)
-{
- idr_destroy(&ida->idr);
- kfree(ida->free_bitmap);
-}
-EXPORT_SYMBOL(ida_destroy);
-
-/**
- * ida_alloc_range - get a new id.
- * @ida: the (initialized) ida.
- * @start: the minimum id (inclusive, < 0x8000000)
- * @end: the maximum id (exclusive, < 0x8000000 or 0)
- * @gfp_mask: memory allocation flags
- *
- * Allocates an id in the range start <= id < end, or returns -ENOSPC.
- * On memory allocation failure, returns -ENOMEM.
- *
- * Use ida_remove() to get rid of an id.
- */
-int ida_alloc_range(struct ida *ida, unsigned int start, unsigned int end,
- gfp_t gfp_mask)
-{
- int ret, id;
- unsigned int max;
- unsigned long flags;
-
- BUG_ON((int)start < 0);
- BUG_ON((int)end < 0);
-
- if (end == 0)
- max = 0x80000000;
- else {
- BUG_ON(end < start);
- max = end - 1;
- }
-
-again:
- if (!ida_pre_get(ida, gfp_mask))
- return -ENOMEM;
-
- spin_lock_irqsave(&simple_ida_lock, flags);
- ret = ida_get_new_above(ida, start, &id);
- if (!ret) {
- if (id > max) {
- __ida_remove(ida, id);
- ret = -ENOSPC;
- } else {
- ret = id;
- }
- }
- spin_unlock_irqrestore(&simple_ida_lock, flags);
-
- if (unlikely(ret == -EAGAIN))
- goto again;
-
- return ret;
-}
-EXPORT_SYMBOL(ida_alloc_range);
-
-/**
- * ida_remove - remove an allocated id.
- * @ida: the (initialized) ida.
- * @id: the id returned by ida_alloc_range.
- */
-void ida_remove(struct ida *ida, unsigned int id)
-{
- unsigned long flags;
-
- BUG_ON((int)id < 0);
- spin_lock_irqsave(&simple_ida_lock, flags);
- __ida_remove(ida, id);
- spin_unlock_irqrestore(&simple_ida_lock, flags);
-}
-EXPORT_SYMBOL(ida_remove);
-
-/**
- * ida_init - initialize ida handle
- * @ida: ida handle
- *
- * This function is use to set up the handle (@ida) that you will pass
- * to the rest of the functions.
- */
-void ida_init(struct ida *ida)
-{
- memset(ida, 0, sizeof(struct ida));
- idr_init(&ida->idr);
-
-}
-EXPORT_SYMBOL(ida_init);
--
1.8.3.1

2013-06-19 00:02:58

by Kent Overstreet

Subject: [PATCH 05/10] idr: Kill old deprecated idr interfaces

The deprecated idr interfaces don't have any in-kernel users, so let's
delete them as prep work for the idr rewrite.
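
For anyone converting a remaining (out-of-tree) user, the replacement
is the pattern the deprecation comments already point at - roughly the
sketch below, where 'lock', 'ptr' and the error handling are the
caller's, and the allocator names are the ones used by this series:

    /* old, deprecated interface */
    do {
            if (!idr_pre_get(&my_idr, GFP_KERNEL))
                    return -ENOMEM;
            spin_lock(&lock);
            err = idr_get_new(&my_idr, ptr, &id);
            spin_unlock(&lock);
    } while (err == -EAGAIN);
    if (err)
            return err;

    /* new interface */
    idr_preload(GFP_KERNEL);
    spin_lock(&lock);
    id = idr_alloc_range(&my_idr, ptr, 0, 0, GFP_NOWAIT);
    spin_unlock(&lock);
    idr_preload_end();
    if (id < 0)
            return id;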

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tejun Heo <[email protected]>
---
include/linux/idr.h | 63 -----------------------------------------------------
lib/idr.c | 36 +++---------------------------
2 files changed, 3 insertions(+), 96 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 3ba796b..35df6c7 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -262,69 +262,6 @@ static inline void *idr_find(struct idr *idr, int id)
#define idr_for_each_entry(idp, entry, id) \
for (id = 0; ((entry) = idr_get_next(idp, &(id))) != NULL; ++id)

-/*
- * Don't use the following functions. These exist only to suppress
- * deprecated warnings on EXPORT_SYMBOL()s.
- */
-int __idr_pre_get(struct idr *idp, gfp_t gfp_mask);
-int __idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id);
-void __idr_remove_all(struct idr *idp);
-
-/**
- * idr_pre_get - reserve resources for idr allocation
- * @idp: idr handle
- * @gfp_mask: memory allocation flags
- *
- * Part of old alloc interface. This is going away. Use
- * idr_preload[_end]() and idr_alloc() instead.
- */
-static inline int __deprecated idr_pre_get(struct idr *idp, gfp_t gfp_mask)
-{
- return __idr_pre_get(idp, gfp_mask);
-}
-
-/**
- * idr_get_new_above - allocate new idr entry above or equal to a start id
- * @idp: idr handle
- * @ptr: pointer you want associated with the id
- * @starting_id: id to start search at
- * @id: pointer to the allocated handle
- *
- * Part of old alloc interface. This is going away. Use
- * idr_preload[_end]() and idr_alloc() instead.
- */
-static inline int __deprecated idr_get_new_above(struct idr *idp, void *ptr,
- int starting_id, int *id)
-{
- return __idr_get_new_above(idp, ptr, starting_id, id);
-}
-
-/**
- * idr_get_new - allocate new idr entry
- * @idp: idr handle
- * @ptr: pointer you want associated with the id
- * @id: pointer to the allocated handle
- *
- * Part of old alloc interface. This is going away. Use
- * idr_preload[_end]() and idr_alloc() instead.
- */
-static inline int __deprecated idr_get_new(struct idr *idp, void *ptr, int *id)
-{
- return __idr_get_new_above(idp, ptr, 0, id);
-}
-
-/**
- * idr_remove_all - remove all ids from the given idr tree
- * @idp: idr handle
- *
- * If you're trying to destroy @idp, calling idr_destroy() is enough.
- * This is going away. Don't use.
- */
-static inline void __deprecated idr_remove_all(struct idr *idp)
-{
- __idr_remove_all(idp);
-}
-
void __init idr_init_cache(void);

#endif /* __IDR_H__ */
diff --git a/lib/idr.c b/lib/idr.c
index 4962e5d..dae0920 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -982,19 +982,6 @@ static void idr_mark_full(struct idr_layer **pa, int id)
}
}

-int __idr_pre_get(struct idr *idp, gfp_t gfp_mask)
-{
- while (idp->id_free_cnt < MAX_IDR_FREE) {
- struct idr_layer *new;
- new = kmem_cache_zalloc(idr_layer_cache, gfp_mask);
- if (new == NULL)
- return (0);
- move_to_free_list(idp, new);
- }
- return 1;
-}
-EXPORT_SYMBOL(__idr_pre_get);
-
/**
* sub_alloc - try to allocate an id without growing the tree depth
* @idp: idr handle
@@ -1160,21 +1147,6 @@ static void idr_fill_slot(struct idr *idr, void *ptr, int id,
idr_mark_full(pa, id);
}

-int __idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id)
-{
- struct idr_layer *pa[MAX_IDR_LEVEL + 1];
- int rv;
-
- rv = idr_get_empty_slot(idp, starting_id, pa, 0, idp);
- if (rv < 0)
- return rv == -ENOMEM ? -EAGAIN : rv;
-
- idr_fill_slot(idp, ptr, rv, pa);
- *id = rv;
- return 0;
-}
-EXPORT_SYMBOL(__idr_get_new_above);
-
/**
* idr_preload - preload for idr_alloc()
* @gfp_mask: allocation mask to use for preloading
@@ -1395,7 +1367,7 @@ void idr_remove(struct idr *idp, int id)
}
EXPORT_SYMBOL(idr_remove);

-void __idr_remove_all(struct idr *idp)
+static void __idr_remove_all(struct idr *idp)
{
int n, id, max;
int bt_mask;
@@ -1428,7 +1400,6 @@ void __idr_remove_all(struct idr *idp)
}
idp->layers = 0;
}
-EXPORT_SYMBOL(__idr_remove_all);

/**
* idr_destroy - release all cached layers within an idr tree
@@ -1490,13 +1461,12 @@ EXPORT_SYMBOL(idr_find_slowpath);
* callback function will be called for each pointer currently
* registered, passing the id, the pointer and the data pointer passed
* to this function. It is not safe to modify the idr tree while in
- * the callback, so functions such as idr_get_new and idr_remove are
- * not allowed.
+ * the callback, so functions such as idr_remove are not allowed.
*
* We check the return of @fn each time. If it returns anything other
* than %0, we break out and return that value.
*
- * The caller must serialize idr_for_each() vs idr_get_new() and idr_remove().
+ * The caller must serialize idr_for_each() vs idr_remove().
*/
int idr_for_each(struct idr *idp,
int (*fn)(int id, void *p, void *data), void *data)
--
1.8.3.1

2013-06-19 00:03:18

by Kent Overstreet

Subject: [PATCH 06/10] idr: Rename idr_get_next() -> idr_find_next()

get() implies taking a ref or sometimes an allocation, which this
function definitely does not do - rename it to something more sensible.
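
After the rename, a caller that open-codes the iteration (instead of
using idr_for_each_entry()) looks something like this - a sketch, with
'my_idr' standing in for whatever idr the caller owns:

    int id = 0;
    void *p;

    while ((p = idr_find_next(&my_idr, &id)) != NULL) {
            /* p is the pointer registered at 'id' */
            id++;   /* step past this entry before the next lookup */
    }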

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tejun Heo <[email protected]>
---
drivers/block/drbd/drbd_main.c | 2 +-
drivers/block/drbd/drbd_nl.c | 2 +-
drivers/mtd/mtdcore.c | 2 +-
include/linux/idr.h | 4 ++--
lib/idr.c | 6 +++---
5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a5dca6a..b84e4b2 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -500,7 +500,7 @@ int conn_lowest_minor(struct drbd_tconn *tconn)
int vnr = 0, m;

rcu_read_lock();
- mdev = idr_get_next(&tconn->volumes, &vnr);
+ mdev = idr_find_next(&tconn->volumes, &vnr);
m = mdev ? mdev_to_minor(mdev) : -1;
rcu_read_unlock();

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 9e3f441..8fa7eb1 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2837,7 +2837,7 @@ int get_one_status(struct sk_buff *skb, struct netlink_callback *cb)
}
if (tconn) {
next_tconn:
- mdev = idr_get_next(&tconn->volumes, &volume);
+ mdev = idr_find_next(&tconn->volumes, &volume);
if (!mdev) {
/* No more volumes to dump on this tconn.
* Advance tconn iterator. */
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index c400c57..eaa1fcc 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(mtd_table_mutex);

struct mtd_info *__mtd_next_device(int i)
{
- return idr_get_next(&mtd_idr, &i);
+ return idr_find_next(&mtd_idr, &i);
}
EXPORT_SYMBOL_GPL(__mtd_next_device);

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 35df6c7..ae3a3b4 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -209,7 +209,7 @@ int idr_alloc(struct idr *idp, void *ptr, int start, int end, gfp_t gfp_mask);
int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask);
int idr_for_each(struct idr *idp,
int (*fn)(int id, void *p, void *data), void *data);
-void *idr_get_next(struct idr *idp, int *nextid);
+void *idr_find_next(struct idr *idp, int *nextid);
void *idr_replace(struct idr *idp, void *ptr, int id);
void idr_remove(struct idr *idp, int id);
void idr_free(struct idr *idp, int id);
@@ -260,7 +260,7 @@ static inline void *idr_find(struct idr *idr, int id)
* is convenient for a "not found" value.
*/
#define idr_for_each_entry(idp, entry, id) \
- for (id = 0; ((entry) = idr_get_next(idp, &(id))) != NULL; ++id)
+ for (id = 0; ((entry) = idr_find_next(idp, &(id))) != NULL; ++id)

void __init idr_init_cache(void);

diff --git a/lib/idr.c b/lib/idr.c
index dae0920..21f86e8 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -1506,7 +1506,7 @@ int idr_for_each(struct idr *idp,
EXPORT_SYMBOL(idr_for_each);

/**
- * idr_get_next - lookup next object of id to given id.
+ * idr_find_next - lookup next object of id to given id.
* @idp: idr handle
* @nextidp: pointer to lookup key
*
@@ -1517,7 +1517,7 @@ EXPORT_SYMBOL(idr_for_each);
* This function can be called under rcu_read_lock(), given that the leaf
* pointers lifetimes are correctly managed.
*/
-void *idr_get_next(struct idr *idp, int *nextidp)
+void *idr_find_next(struct idr *idp, int *nextidp)
{
struct idr_layer *p, *pa[MAX_IDR_LEVEL + 1];
struct idr_layer **paa = &pa[0];
@@ -1558,7 +1558,7 @@ void *idr_get_next(struct idr *idp, int *nextidp)
}
return NULL;
}
-EXPORT_SYMBOL(idr_get_next);
+EXPORT_SYMBOL(idr_find_next);


/**
--
1.8.3.1

2013-06-19 00:03:12

by Kent Overstreet

Subject: [PATCH 08/10] idr: Reimplement idr on top of ida/radix trees

The old idr code was really a second radix tree implementation - we
already have one in lib/radix-tree.c.

This patch reimplements idr on top of our existing radix trees, using
our shiny new ida implementation for allocating/freeing the ids. The old
idr code was noticeably slower than lib/radix-tree.c in at least some
benchmarks, so in addition to being ~500 lines less code this patch
should improve performance too.

There's one thing left unfinished in this patch - the existing
idr_preload() interface won't work for ida. Another patch on top of this
will fix idr_preload() and update existing users to the new interface.
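
Conceptually, the new allocation path is just the composition of the
two existing pieces - something like the sketch below (not the literal
code from the patch; locking and the preload interaction are glossed
over):

    static int idr_alloc_sketch(struct idr *idr, void *ptr,
                                unsigned start, unsigned end, gfp_t gfp)
    {
            int id, err;

            /* grab an unused small integer from the ida */
            id = ida_alloc_range(&idr->ida, start, end, gfp);
            if (id < 0)
                    return id;

            /*
             * Store the pointer at that index; the radix tree can't
             * hold NULL, hence the __idr_radix_ptr() translation.
             */
            err = radix_tree_insert(&idr->ptrs, id, __idr_radix_ptr(ptr));
            if (err) {
                    ida_remove(&idr->ida, id);
                    return err;
            }

            return id;
    }

Lookup is then just radix_tree_lookup() plus the inverse translation,
which is exactly what the new inline idr_find() in this patch does.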

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tejun Heo <[email protected]>
---
include/linux/idr.h | 157 ++++-----
init/main.c | 1 -
lib/idr.c | 896 ++++++++++------------------------------------------
3 files changed, 249 insertions(+), 805 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 4bce55c..ec789f5 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -1,6 +1,6 @@
/*
* include/linux/idr.h
- *
+ *
* 2002-10-18 written by Jim Houston [email protected]
* Copyright (C) 2002 by Concurrent Computer Corporation
* Distributed under the GNU GPL license version 2.
@@ -12,10 +12,8 @@
#ifndef __IDR_H__
#define __IDR_H__

-#include <linux/types.h>
-#include <linux/bitops.h>
-#include <linux/init.h>
-#include <linux/rcupdate.h>
+#include <linux/gfp.h>
+#include <linux/radix-tree.h>
#include <linux/spinlock_types.h>
#include <linux/wait.h>

@@ -147,74 +145,42 @@ int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags);

/* IDR */

-/*
- * We want shallower trees and thus more bits covered at each layer. 8
- * bits gives us large enough first layer for most use cases and maximum
- * tree depth of 4. Each idr_layer is slightly larger than 2k on 64bit and
- * 1k on 32bit.
+/**
+ * DOC: idr sync
+ * idr synchronization (stolen from radix-tree.h)
+ *
+ * idr_alloc() and idr_remove() do their own locking internally - the user need
+ * not be concerned with synchronization unless there's other operations that
+ * need to be done atomically.
+ *
+ * idr_find() does no locking - it can be called locklessly using RCU, if the
+ * caller ensures calls to this function are made within rcu_read_lock()
+ * regions and does all the other appropriate RCU stuff.
*/
-#define IDR_BITS 8
-#define IDR_SIZE (1 << IDR_BITS)
-#define IDR_MASK ((1 << IDR_BITS)-1)
-
-struct idr_layer {
- int prefix; /* the ID prefix of this idr_layer */
- DECLARE_BITMAP(bitmap, IDR_SIZE); /* A zero bit means "space here" */
- struct idr_layer __rcu *ary[1<<IDR_BITS];
- int count; /* When zero, we can release it */
- int layer; /* distance from leaf */
- struct rcu_head rcu_head;
-};

struct idr {
- struct idr_layer __rcu *hint; /* the last layer allocated from */
- struct idr_layer __rcu *top;
- struct idr_layer *id_free;
- int layers; /* only valid w/o concurrent changes */
- int id_free_cnt;
- int cur; /* current pos for cyclic allocation */
- spinlock_t lock;
+ struct ida ida;
+ struct radix_tree_root ptrs;
};

#define IDR_INIT(name) \
{ \
- .lock = __SPIN_LOCK_UNLOCKED(name.lock), \
+ .ida = IDA_INIT(name.ida), \
+ .ptrs = RADIX_TREE_INIT(GFP_NOWAIT), \
}
#define DEFINE_IDR(name) struct idr name = IDR_INIT(name)

-/**
- * DOC: idr sync
- * idr synchronization (stolen from radix-tree.h)
- *
- * idr_find() is able to be called locklessly, using RCU. The caller must
- * ensure calls to this function are made within rcu_read_lock() regions.
- * Other readers (lock-free or otherwise) and modifications may be running
- * concurrently.
- *
- * It is still required that the caller manage the synchronization and
- * lifetimes of the items. So if RCU lock-free lookups are used, typically
- * this would mean that the items have their own locks, or are amenable to
- * lock-free access; and that the items are freed by RCU (or only freed after
- * having been deleted from the idr tree *and* a synchronize_rcu() grace
- * period).
- */
-
-/*
- * This is what we export.
- */
-
-void *idr_find_slowpath(struct idr *idp, int id);
-void idr_preload(gfp_t gfp_mask);
-int idr_alloc_range(struct idr *idp, void *ptr, int start, int end, gfp_t gfp_mask);
-int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask);
-int idr_for_each(struct idr *idp,
+void *idr_find_next(struct idr *idr, int *nextid);
+int idr_for_each(struct idr *idr,
int (*fn)(int id, void *p, void *data), void *data);
-void *idr_find_next(struct idr *idp, int *nextid);
-void *idr_replace(struct idr *idp, void *ptr, int id);
-void idr_remove(struct idr *idp, int id);
-void idr_free(struct idr *idp, int id);
-void idr_destroy(struct idr *idp);
-void idr_init(struct idr *idp);
+void *idr_replace(struct idr *idr, void *ptr, unsigned id);
+void idr_remove(struct idr *idr, unsigned id);
+int idr_alloc_range(struct idr *idr, void *ptr, unsigned start,
+ unsigned end, gfp_t gfp);
+int idr_alloc_cyclic(struct idr *idr, void *ptr, unsigned start,
+ unsigned end, gfp_t gfp_mask);
+void idr_destroy(struct idr *idr);
+void idr_init(struct idr *idr);

static inline int idr_alloc(struct idr *idr, void *ptr, gfp_t gfp)
{
@@ -229,7 +195,53 @@ static inline int idr_alloc(struct idr *idr, void *ptr, gfp_t gfp)
*/
static inline void idr_preload_end(void)
{
- preempt_enable();
+ radix_tree_preload_end();
+}
+
+/**
+ * idr_preload - preload for idr_alloc_range()
+ * @gfp: allocation mask to use for preloading
+ *
+ * Preload per-cpu layer buffer for idr_alloc_range(). Can only be used from
+ * process context and each idr_preload() invocation should be matched with
+ * idr_preload_end(). Note that preemption is disabled while preloaded.
+ *
+ * The first idr_alloc_range() in the preloaded section can be treated as if it
+ * were invoked with @gfp used for preloading. This allows using more
+ * permissive allocation masks for idrs protected by spinlocks.
+ *
+ * For example, if idr_alloc_range() below fails, the failure can be treated as
+ * if idr_alloc_range() were called with GFP_KERNEL rather than GFP_NOWAIT.
+ *
+ * idr_preload(GFP_KERNEL);
+ * spin_lock(lock);
+ *
+ * id = idr_alloc_range(idr, ptr, start, end, GFP_NOWAIT);
+ *
+ * spin_unlock(lock);
+ * idr_preload_end();
+ * if (id < 0)
+ * error;
+ */
+static inline void idr_preload(gfp_t gfp)
+{
+ might_sleep_if(gfp & __GFP_WAIT);
+
+ /* Well this is horrible, but idr_preload doesn't return errors */
+ if (radix_tree_preload(gfp))
+ preempt_disable();
+}
+
+/* radix tree can't store NULL pointers, so we have to translate... */
+static inline void *__radix_idr_ptr(void *ptr)
+{
+ return ptr != (void *) (~0UL & ~RADIX_TREE_INDIRECT_PTR)
+ ? ptr : NULL;
+}
+
+static inline void *__idr_radix_ptr(void *ptr)
+{
+ return ptr ?: (void *) (~0UL & ~RADIX_TREE_INDIRECT_PTR);
}

/**
@@ -239,24 +251,19 @@ static inline void idr_preload_end(void)
*
* Return the pointer given the id it has been registered with. A %NULL
* return indicates that @id is not valid or you passed %NULL in
- * idr_get_new().
+ * idr_alloc().
*
* This function can be called under rcu_read_lock(), given that the leaf
* pointers lifetimes are correctly managed.
*/
-static inline void *idr_find(struct idr *idr, int id)
+static inline void *idr_find(struct idr *idr, unsigned id)
{
- struct idr_layer *hint = rcu_dereference_raw(idr->hint);
-
- if (hint && (id & ~IDR_MASK) == hint->prefix)
- return rcu_dereference_raw(hint->ary[id & IDR_MASK]);
-
- return idr_find_slowpath(idr, id);
+ return __radix_idr_ptr(radix_tree_lookup(&idr->ptrs, id));
}

/**
* idr_for_each_entry - iterate over an idr's elements of a given type
- * @idp: idr handle
+ * @idr: idr handle
* @entry: the type * to use as cursor
* @id: id entry's key
*
@@ -264,9 +271,7 @@ static inline void *idr_find(struct idr *idr, int id)
* after normal terminatinon @entry is left with the value NULL. This
* is convenient for a "not found" value.
*/
-#define idr_for_each_entry(idp, entry, id) \
- for (id = 0; ((entry) = idr_find_next(idp, &(id))) != NULL; ++id)
-
-void __init idr_init_cache(void);
+#define idr_for_each_entry(idr, entry, id) \
+ for (id = 0; ((entry) = idr_find_next(idr, &(id))) != NULL; ++id)

#endif /* __IDR_H__ */
diff --git a/init/main.c b/init/main.c
index 9484f4b..87b5a0f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -541,7 +541,6 @@ asmlinkage void __init start_kernel(void)
preempt_disable();
if (WARN(!irqs_disabled(), "Interrupts were enabled *very* early, fixing it\n"))
local_irq_disable();
- idr_init_cache();
perf_event_init();
rcu_init();
tick_nohz_init();
diff --git a/lib/idr.c b/lib/idr.c
index d0f05e9..c2fb8bc 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -8,24 +8,10 @@
*
* Modified by Nadia Derbey to make it RCU safe.
*
- * IDA completely rewritten by Kent Overstreet <[email protected]>
+ * Completely rewritten by Kent Overstreet <[email protected]>.
*
- * Small id to pointer translation service.
- *
- * It uses a radix tree like structure as a sparse array indexed
- * by the id to obtain the pointer. The bitmap makes allocating
- * a new id quick.
- *
- * You call it to allocate an id (an int) an associate with that id a
- * pointer or what ever, we treat it as a (void *). You can pass this
- * id to a user for him to pass back at a later time. You then pass
- * that id to this code and it returns your pointer.
-
- * You can release ids at any time. When all ids are released, most of
- * the memory is returned (we keep MAX_IDR_FREE) in a local pool so we
- * don't need to go to the memory "store" during an id allocate, just
- * so you don't need to be too concerned about locking and conflicts
- * with the slab allocator.
+ * id allocator (scalable/resizable bitmap, essentially), and also idr which
+ * combines ida with a radix tree to map pointers to small integers for you.
*/

#include <linux/bitmap.h>
@@ -33,11 +19,10 @@
#include <linux/bug.h>
#include <linux/err.h>
#include <linux/export.h>
-#include <linux/hardirq.h>
#include <linux/idr.h>
-#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/percpu.h>
+#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/string.h>
@@ -823,389 +808,158 @@ err:
}
EXPORT_SYMBOL_GPL(percpu_ida_init);

-/* IDR */
-
-#define MAX_IDR_SHIFT (sizeof(int) * 8 - 1)
-#define MAX_IDR_BIT (1U << MAX_IDR_SHIFT)
-
-/* Leave the possibility of an incomplete final layer */
-#define MAX_IDR_LEVEL ((MAX_IDR_SHIFT + IDR_BITS - 1) / IDR_BITS)
-
-/* Number of id_layer structs to leave in free list */
-#define MAX_IDR_FREE (MAX_IDR_LEVEL * 2)
-
-static struct kmem_cache *idr_layer_cache;
-static DEFINE_PER_CPU(struct idr_layer *, idr_preload_head);
-static DEFINE_PER_CPU(int, idr_preload_cnt);
-
-/* the maximum ID which can be allocated given idr->layers */
-static int idr_max(int layers)
-{
- int bits = min_t(int, layers * IDR_BITS, MAX_IDR_SHIFT);
-
- return (1 << bits) - 1;
-}
-
-/*
- * Prefix mask for an idr_layer at @layer. For layer 0, the prefix mask is
- * all bits except for the lower IDR_BITS. For layer 1, 2 * IDR_BITS, and
- * so on.
+/**
+ * DOC: IDR description
+ * IDR: Maps ids (small integers) to pointers.
+ *
+ * This merely combines ida (id allocation) with a radix tree; idr_alloc()
+ * stores a pointer, and returns you a small integer by which you can refer to
+ * it.
+ *
+ * It'll give you the smallest available integer (within a specified range if
+ * you use idr_alloc_range()) - there's also idr_alloc_cyclic() if you don't
+ * want ids to be reused right away.
+ *
+ * id -> pointer mappings can be deleted with idr_remove().
*/
-static int idr_layer_prefix_mask(int layer)
-{
- return ~idr_max(layer + 1);
-}
-
-static struct idr_layer *get_from_free_list(struct idr *idp)
-{
- struct idr_layer *p;
- unsigned long flags;
-
- spin_lock_irqsave(&idp->lock, flags);
- if ((p = idp->id_free)) {
- idp->id_free = p->ary[0];
- idp->id_free_cnt--;
- p->ary[0] = NULL;
- }
- spin_unlock_irqrestore(&idp->lock, flags);
- return(p);
-}

/**
- * idr_layer_alloc - allocate a new idr_layer
- * @gfp_mask: allocation mask
- * @layer_idr: optional idr to allocate from
+ * idr_find_next - lookup next object of id to given id.
+ * @idr: idr handle
+ * @nextidp: pointer to lookup key
*
- * If @layer_idr is %NULL, directly allocate one using @gfp_mask or fetch
- * one from the per-cpu preload buffer. If @layer_idr is not %NULL, fetch
- * an idr_layer from @idr->id_free.
+ * Returns pointer to registered object with id, which is next number to
+ * given id. After being looked up, *@nextidp will be updated for the next
+ * iteration.
*
- * @layer_idr is to maintain backward compatibility with the old alloc
- * interface - idr_pre_get() and idr_get_new*() - and will be removed
- * together with per-pool preload buffer.
+ * This function can be called under rcu_read_lock(), given that the leaf
+ * pointers lifetimes are correctly managed.
*/
-static struct idr_layer *idr_layer_alloc(gfp_t gfp_mask, struct idr *layer_idr)
+void *idr_find_next(struct idr *idr, int *nextidp)
{
- struct idr_layer *new;
-
- /* this is the old path, bypass to get_from_free_list() */
- if (layer_idr)
- return get_from_free_list(layer_idr);
-
- /*
- * Try to allocate directly from kmem_cache. We want to try this
- * before preload buffer; otherwise, non-preloading idr_alloc_range()
- * users will end up taking advantage of preloading ones. As the
- * following is allowed to fail for preloaded cases, suppress
- * warning this time.
- */
- new = kmem_cache_zalloc(idr_layer_cache, gfp_mask | __GFP_NOWARN);
- if (new)
- return new;
-
- /*
- * Try to fetch one from the per-cpu preload buffer if in process
- * context. See idr_preload() for details.
- */
- if (!in_interrupt()) {
- preempt_disable();
- new = __this_cpu_read(idr_preload_head);
- if (new) {
- __this_cpu_write(idr_preload_head, new->ary[0]);
- __this_cpu_dec(idr_preload_cnt);
- new->ary[0] = NULL;
- }
- preempt_enable();
- if (new)
- return new;
- }
+ void **slot;
+ struct radix_tree_iter iter;
+ void *ret = NULL;

- /*
- * Both failed. Try kmem_cache again w/o adding __GFP_NOWARN so
- * that memory allocation failure warning is printed as intended.
- */
- return kmem_cache_zalloc(idr_layer_cache, gfp_mask);
-}
-
-static void idr_layer_rcu_free(struct rcu_head *head)
-{
- struct idr_layer *layer;
-
- layer = container_of(head, struct idr_layer, rcu_head);
- kmem_cache_free(idr_layer_cache, layer);
-}
-
-static inline void free_layer(struct idr *idr, struct idr_layer *p)
-{
- if (idr->hint && idr->hint == p)
- RCU_INIT_POINTER(idr->hint, NULL);
- call_rcu(&p->rcu_head, idr_layer_rcu_free);
-}
+ rcu_read_lock();

-/* only called when idp->lock is held */
-static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
-{
- p->ary[0] = idp->id_free;
- idp->id_free = p;
- idp->id_free_cnt++;
-}
-
-static void move_to_free_list(struct idr *idp, struct idr_layer *p)
-{
- unsigned long flags;
+ radix_tree_for_each_slot(slot, &idr->ptrs, &iter, *nextidp) {
+ *nextidp = iter.index;
+ ret = radix_tree_deref_slot(slot);
+ break;
+ }

- /*
- * Depends on the return element being zeroed.
- */
- spin_lock_irqsave(&idp->lock, flags);
- __move_to_free_list(idp, p);
- spin_unlock_irqrestore(&idp->lock, flags);
-}
+ rcu_read_unlock();

-static void idr_mark_full(struct idr_layer **pa, int id)
-{
- struct idr_layer *p = pa[0];
- int l = 0;
-
- __set_bit(id & IDR_MASK, p->bitmap);
- /*
- * If this layer is full mark the bit in the layer above to
- * show that this part of the radix tree is full. This may
- * complete the layer above and require walking up the radix
- * tree.
- */
- while (bitmap_full(p->bitmap, IDR_SIZE)) {
- if (!(p = pa[++l]))
- break;
- id = id >> IDR_BITS;
- __set_bit((id & IDR_MASK), p->bitmap);
- }
+ return __radix_idr_ptr(ret);
}
+EXPORT_SYMBOL(idr_find_next);

/**
- * sub_alloc - try to allocate an id without growing the tree depth
- * @idp: idr handle
- * @starting_id: id to start search at
- * @pa: idr_layer[MAX_IDR_LEVEL] used as backtrack buffer
- * @gfp_mask: allocation mask for idr_layer_alloc()
- * @layer_idr: optional idr passed to idr_layer_alloc()
+ * idr_for_each - iterate through all stored pointers
+ * @idr: idr handle
+ * @fn: function to be called for each pointer
+ * @data: data passed back to callback function
+ *
+ * Iterate over the pointers registered with the given idr. The
+ * callback function will be called for each pointer currently
+ * registered, passing the id, the pointer and the data pointer passed
+ * to this function. It is not safe to modify the idr tree while in
+ * the callback, so functions such as idr_remove are not allowed.
*
- * Allocate an id in range [@starting_id, INT_MAX] from @idp without
- * growing its depth. Returns
+ * We check the return of @fn each time. If it returns anything other
+ * than %0, we break out and return that value.
*
- * the allocated id >= 0 if successful,
- * -EAGAIN if the tree needs to grow for allocation to succeed,
- * -ENOSPC if the id space is exhausted,
- * -ENOMEM if more idr_layers need to be allocated.
+ * The caller must serialize idr_for_each() vs idr_remove().
*/
-static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa,
- gfp_t gfp_mask, struct idr *layer_idr)
+int idr_for_each(struct idr *idr,
+ int (*fn)(int id, void *p, void *data), void *data)
{
- int n, m, sh;
- struct idr_layer *p, *new;
- int l, id, oid;
-
- id = *starting_id;
- restart:
- p = idp->top;
- l = idp->layers;
- pa[l--] = NULL;
- while (1) {
- /*
- * We run around this while until we reach the leaf node...
- */
- n = (id >> (IDR_BITS*l)) & IDR_MASK;
- m = find_next_zero_bit(p->bitmap, IDR_SIZE, n);
- if (m == IDR_SIZE) {
- /* no space available go back to previous layer. */
- l++;
- oid = id;
- id = (id | ((1 << (IDR_BITS * l)) - 1)) + 1;
-
- /* if already at the top layer, we need to grow */
- if (id >= 1 << (idp->layers * IDR_BITS)) {
- *starting_id = id;
- return -EAGAIN;
- }
- p = pa[l];
- BUG_ON(!p);
+ void *p;
+ unsigned id;
+ int error = 0;

- /* If we need to go up one layer, continue the
- * loop; otherwise, restart from the top.
- */
- sh = IDR_BITS * (l + 1);
- if (oid >> sh == id >> sh)
- continue;
- else
- goto restart;
- }
- if (m != n) {
- sh = IDR_BITS*l;
- id = ((id >> sh) ^ n ^ m) << sh;
- }
- if ((id >= MAX_IDR_BIT) || (id < 0))
- return -ENOSPC;
- if (l == 0)
+ idr_for_each_entry(idr, p, id) {
+ error = fn(id, p, data);
+ if (error)
break;
- /*
- * Create the layer below if it is missing.
- */
- if (!p->ary[m]) {
- new = idr_layer_alloc(gfp_mask, layer_idr);
- if (!new)
- return -ENOMEM;
- new->layer = l-1;
- new->prefix = id & idr_layer_prefix_mask(new->layer);
- rcu_assign_pointer(p->ary[m], new);
- p->count++;
- }
- pa[l--] = p;
- p = p->ary[m];
}

- pa[l] = p;
- return id;
+ return error;
}
+EXPORT_SYMBOL(idr_for_each);

-static int idr_get_empty_slot(struct idr *idp, int starting_id,
- struct idr_layer **pa, gfp_t gfp_mask,
- struct idr *layer_idr)
+/**
+ * idr_replace - replace pointer for given id
+ * @idr: idr handle
+ * @ptr: pointer you want associated with the id
+ * @id: lookup key
+ *
+ * Replace the pointer registered with an id and return the old value.
+ * A %-ENOENT return indicates that @id was not found.
+ * (%-EINVAL is never returned here, since @id is unsigned.)
+ */
+void *idr_replace(struct idr *idr, void *ptr, unsigned id)
{
- struct idr_layer *p, *new;
- int layers, v, id;
+ void **slot, *old = ERR_PTR(-ENOENT);
unsigned long flags;

- id = starting_id;
-build_up:
- p = idp->top;
- layers = idp->layers;
- if (unlikely(!p)) {
- if (!(p = idr_layer_alloc(gfp_mask, layer_idr)))
- return -ENOMEM;
- p->layer = 0;
- layers = 1;
- }
- /*
- * Add a new layer to the top of the tree if the requested
- * id is larger than the currently allocated space.
- */
- while (id > idr_max(layers)) {
- layers++;
- if (!p->count) {
- /* special case: if the tree is currently empty,
- * then we grow the tree by moving the top node
- * upwards.
- */
- p->layer++;
- WARN_ON_ONCE(p->prefix);
- continue;
- }
- if (!(new = idr_layer_alloc(gfp_mask, layer_idr))) {
- /*
- * The allocation failed. If we built part of
- * the structure tear it down.
- */
- spin_lock_irqsave(&idp->lock, flags);
- for (new = p; p && p != idp->top; new = p) {
- p = p->ary[0];
- new->ary[0] = NULL;
- new->count = 0;
- bitmap_clear(new->bitmap, 0, IDR_SIZE);
- __move_to_free_list(idp, new);
- }
- spin_unlock_irqrestore(&idp->lock, flags);
- return -ENOMEM;
- }
- new->ary[0] = p;
- new->count = 1;
- new->layer = layers-1;
- new->prefix = id & idr_layer_prefix_mask(new->layer);
- if (bitmap_full(p->bitmap, IDR_SIZE))
- __set_bit(0, new->bitmap);
- p = new;
+ rcu_read_lock();
+ spin_lock_irqsave(&idr->ida.lock, flags);
+
+ slot = radix_tree_lookup_slot(&idr->ptrs, id);
+
+ if (slot) {
+ old = radix_tree_deref_slot(slot);
+ if (old)
+ radix_tree_replace_slot(slot, __idr_radix_ptr(ptr));
}
- rcu_assign_pointer(idp->top, p);
- idp->layers = layers;
- v = sub_alloc(idp, &id, pa, gfp_mask, layer_idr);
- if (v == -EAGAIN)
- goto build_up;
- return(v);
-}

-/*
- * @id and @pa are from a successful allocation from idr_get_empty_slot().
- * Install the user pointer @ptr and mark the slot full.
- */
-static void idr_fill_slot(struct idr *idr, void *ptr, int id,
- struct idr_layer **pa)
-{
- /* update hint used for lookup, cleared from free_layer() */
- rcu_assign_pointer(idr->hint, pa[0]);
+ spin_unlock_irqrestore(&idr->ida.lock, flags);
+ rcu_read_unlock();

- rcu_assign_pointer(pa[0]->ary[id & IDR_MASK], (struct idr_layer *)ptr);
- pa[0]->count++;
- idr_mark_full(pa, id);
+ return __radix_idr_ptr(old);
}
+EXPORT_SYMBOL(idr_replace);

/**
- * idr_preload - preload for idr_alloc_range()
- * @gfp_mask: allocation mask to use for preloading
- *
- * Preload per-cpu layer buffer for idr_alloc_range(). Can only be used from
- * process context and each idr_preload() invocation should be matched with
- * idr_preload_end(). Note that preemption is disabled while preloaded.
- *
- * The first idr_alloc_range() in the preloaded section can be treated as if it
- * were invoked with @gfp_mask used for preloading. This allows using more
- * permissive allocation masks for idrs protected by spinlocks.
- *
- * For example, if idr_alloc_range() below fails, the failure can be treated as
- * if idr_alloc_range() were called with GFP_KERNEL rather than GFP_NOWAIT.
- *
- * idr_preload(GFP_KERNEL);
- * spin_lock(lock);
- *
- * id = idr_alloc_range(idr, ptr, start, end, GFP_NOWAIT);
- *
- * spin_unlock(lock);
- * idr_preload_end();
- * if (id < 0)
- * error;
+ * idr_remove - remove the given id and free its slot
+ * @idr: idr handle
+ * @id: unique key
*/
-void idr_preload(gfp_t gfp_mask)
+void idr_remove(struct idr *idr, unsigned id)
{
- /*
- * Consuming preload buffer from non-process context breaks preload
- * allocation guarantee. Disallow usage from those contexts.
- */
- WARN_ON_ONCE(in_interrupt());
- might_sleep_if(gfp_mask & __GFP_WAIT);
-
- preempt_disable();
-
- /*
- * idr_alloc_range() is likely to succeed w/o full idr_layer buffer and
- * return value from idr_alloc_range() needs to be checked for failure
- * anyway. Silently give up if allocation fails. The caller can
- * treat failures from idr_alloc_range() as if idr_alloc() were called
- * with @gfp_mask which should be enough.
- */
- while (__this_cpu_read(idr_preload_cnt) < MAX_IDR_FREE) {
- struct idr_layer *new;
-
- preempt_enable();
- new = kmem_cache_zalloc(idr_layer_cache, gfp_mask);
- preempt_disable();
- if (!new)
- break;
+ unsigned long flags;

- /* link the new one to per-cpu preload list */
- new->ary[0] = __this_cpu_read(idr_preload_head);
- __this_cpu_write(idr_preload_head, new);
- __this_cpu_inc(idr_preload_cnt);
+ spin_lock_irqsave(&idr->ida.lock, flags);
+
+ radix_tree_delete(&idr->ptrs, id);
+ __ida_remove(&idr->ida, id);
+
+ spin_unlock_irqrestore(&idr->ida.lock, flags);
+}
+EXPORT_SYMBOL(idr_remove);
+
+static int idr_insert(struct idr *idr, void *ptr, unsigned id,
+ gfp_t gfp, unsigned long *flags)
+{
+ int ret = radix_tree_preload(GFP_NOWAIT);
+ if (ret) {
+ spin_unlock_irqrestore(&idr->ida.lock, *flags);
+ ret = radix_tree_preload(gfp);
+ spin_lock_irqsave(&idr->ida.lock, *flags);
+
+ if (ret) {
+ __ida_remove(&idr->ida, id);
+ return ret;
+ }
}
+
+ ret = radix_tree_insert(&idr->ptrs, id, __idr_radix_ptr(ptr));
+ BUG_ON(ret);
+ radix_tree_preload_end();
+ return id;
}
-EXPORT_SYMBOL(idr_preload);

/**
* idr_alloc_range - allocate new idr entry
@@ -1213,43 +967,34 @@ EXPORT_SYMBOL(idr_preload);
* @ptr: pointer to be associated with the new id
* @start: the minimum id (inclusive)
* @end: the maximum id (exclusive, <= 0 for max)
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
*
* Allocate an id in [start, end) and associate it with @ptr. If no ID is
* available in the specified range, returns -ENOSPC. On memory allocation
* failure, returns -ENOMEM.
*
- * Note that @end is treated as max when <= 0. This is to always allow
- * using @start + N as @end as long as N is inside integer range.
- *
- * The user is responsible for exclusively synchronizing all operations
- * which may modify @idr. However, read-only accesses such as idr_find()
- * or iteration can be performed under RCU read lock provided the user
- * destroys @ptr in RCU-safe way after removal from idr.
+ * Note that @end is treated as max when <= 0. This is to always allow using
+ * @start + N as @end as long as N is inside integer range.
*/
-int idr_alloc_range(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask)
+int idr_alloc_range(struct idr *idr, void *ptr, unsigned start,
+ unsigned end, gfp_t gfp)
{
- int max = end > 0 ? end - 1 : INT_MAX; /* inclusive upper limit */
- struct idr_layer *pa[MAX_IDR_LEVEL + 1];
- int id;
+ int ret;
+ unsigned id;
+ unsigned long flags;

- might_sleep_if(gfp_mask & __GFP_WAIT);
+ might_sleep_if(gfp & __GFP_WAIT);

- /* sanity checks */
- if (WARN_ON_ONCE(start < 0))
- return -EINVAL;
- if (unlikely(max < start))
- return -ENOSPC;
+ spin_lock_irqsave(&idr->ida.lock, flags);

- /* allocate id */
- id = idr_get_empty_slot(idr, start, pa, gfp_mask, NULL);
- if (unlikely(id < 0))
- return id;
- if (unlikely(id > max))
- return -ENOSPC;
+ ret = __ida_alloc_range_multiple(&idr->ida, &id, 1, start,
+ end, gfp, &flags);
+ if (ret == 1)
+ ret = idr_insert(idr, ptr, id, gfp, &flags);

- idr_fill_slot(idr, ptr, id, pa);
- return id;
+ spin_unlock_irqrestore(&idr->ida.lock, flags);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(idr_alloc_range);

@@ -1261,368 +1006,63 @@ EXPORT_SYMBOL_GPL(idr_alloc_range);
* @end: the maximum id (exclusive, <= 0 for max)
* @gfp_mask: memory allocation flags
*
- * Essentially the same as idr_alloc_range, but prefers to allocate progressively
- * higher ids if it can. If the "cur" counter wraps, then it will start again
- * at the "start" end of the range and allocate one that has already been used.
+ * Essentially the same as idr_alloc_range, but prefers to allocate
+ * progressively higher ids if it can. If the "cur" counter wraps, then it will
+ * start again at the "start" end of the range and allocate one that has already
+ * been used.
*/
-int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end,
- gfp_t gfp_mask)
+int idr_alloc_cyclic(struct idr *idr, void *ptr, unsigned start,
+ unsigned end, gfp_t gfp)
{
- int id;
+ int ret;
+ unsigned long flags;

- id = idr_alloc_range(idr, ptr, max(start, idr->cur), end, gfp_mask);
- if (id == -ENOSPC)
- id = idr_alloc_range(idr, ptr, start, end, gfp_mask);
+ might_sleep_if(gfp & __GFP_WAIT);

- if (likely(id >= 0))
- idr->cur = id + 1;
- return id;
-}
-EXPORT_SYMBOL(idr_alloc_cyclic);
+ spin_lock_irqsave(&idr->ida.lock, flags);

-static void idr_remove_warning(int id)
-{
- printk(KERN_WARNING
- "idr_remove called for id=%d which is not allocated.\n", id);
- dump_stack();
-}
+ ret = __ida_alloc_cyclic(&idr->ida, start, end, gfp, &flags);
+ if (ret >= 0)
+ ret = idr_insert(idr, ptr, ret, gfp, &flags);

-static void sub_remove(struct idr *idp, int shift, int id)
-{
- struct idr_layer *p = idp->top;
- struct idr_layer **pa[MAX_IDR_LEVEL + 1];
- struct idr_layer ***paa = &pa[0];
- struct idr_layer *to_free;
- int n;
-
- *paa = NULL;
- *++paa = &idp->top;
-
- while ((shift > 0) && p) {
- n = (id >> shift) & IDR_MASK;
- __clear_bit(n, p->bitmap);
- *++paa = &p->ary[n];
- p = p->ary[n];
- shift -= IDR_BITS;
- }
- n = id & IDR_MASK;
- if (likely(p != NULL && test_bit(n, p->bitmap))) {
- __clear_bit(n, p->bitmap);
- rcu_assign_pointer(p->ary[n], NULL);
- to_free = NULL;
- while(*paa && ! --((**paa)->count)){
- if (to_free)
- free_layer(idp, to_free);
- to_free = **paa;
- **paa-- = NULL;
- }
- if (!*paa)
- idp->layers = 0;
- if (to_free)
- free_layer(idp, to_free);
- } else
- idr_remove_warning(id);
-}
+ spin_unlock_irqrestore(&idr->ida.lock, flags);

-/**
- * idr_remove - remove the given id and free its slot
- * @idp: idr handle
- * @id: unique key
- */
-void idr_remove(struct idr *idp, int id)
-{
- struct idr_layer *p;
- struct idr_layer *to_free;
-
- if (id < 0)
- return;
-
- sub_remove(idp, (idp->layers - 1) * IDR_BITS, id);
- if (idp->top && idp->top->count == 1 && (idp->layers > 1) &&
- idp->top->ary[0]) {
- /*
- * Single child at leftmost slot: we can shrink the tree.
- * This level is not needed anymore since when layers are
- * inserted, they are inserted at the top of the existing
- * tree.
- */
- to_free = idp->top;
- p = idp->top->ary[0];
- rcu_assign_pointer(idp->top, p);
- --idp->layers;
- to_free->count = 0;
- bitmap_clear(to_free->bitmap, 0, IDR_SIZE);
- free_layer(idp, to_free);
- }
- while (idp->id_free_cnt >= MAX_IDR_FREE) {
- p = get_from_free_list(idp);
- /*
- * Note: we don't call the rcu callback here, since the only
- * layers that fall into the freelist are those that have been
- * preallocated.
- */
- kmem_cache_free(idr_layer_cache, p);
- }
- return;
-}
-EXPORT_SYMBOL(idr_remove);
-
-static void __idr_remove_all(struct idr *idp)
-{
- int n, id, max;
- int bt_mask;
- struct idr_layer *p;
- struct idr_layer *pa[MAX_IDR_LEVEL + 1];
- struct idr_layer **paa = &pa[0];
-
- n = idp->layers * IDR_BITS;
- p = idp->top;
- rcu_assign_pointer(idp->top, NULL);
- max = idr_max(idp->layers);
-
- id = 0;
- while (id >= 0 && id <= max) {
- while (n > IDR_BITS && p) {
- n -= IDR_BITS;
- *paa++ = p;
- p = p->ary[(id >> n) & IDR_MASK];
- }
-
- bt_mask = id;
- id += 1 << n;
- /* Get the highest bit that the above add changed from 0->1. */
- while (n < fls(id ^ bt_mask)) {
- if (p)
- free_layer(idp, p);
- n += IDR_BITS;
- p = *--paa;
- }
- }
- idp->layers = 0;
+ return ret;
}
+EXPORT_SYMBOL(idr_alloc_cyclic);

/**
- * idr_destroy - release all cached layers within an idr tree
- * @idp: idr handle
+ * idr_destroy - free all memory owned by @idr
+ * @idr: idr handle
*
- * Free all id mappings and all idp_layers. After this function, @idp is
- * completely unused and can be freed / recycled. The caller is
- * responsible for ensuring that no one else accesses @idp during or after
- * idr_destroy().
+ * After this function, @idr is completely unused and can be freed / recycled.
*
* A typical clean-up sequence for objects stored in an idr tree will use
 * idr_for_each() to free all objects, if necessary, then idr_destroy() to
- * free up the id mappings and cached idr_layers.
- */
-void idr_destroy(struct idr *idp)
-{
- __idr_remove_all(idp);
-
- while (idp->id_free_cnt) {
- struct idr_layer *p = get_from_free_list(idp);
- kmem_cache_free(idr_layer_cache, p);
- }
-}
-EXPORT_SYMBOL(idr_destroy);
-
-void *idr_find_slowpath(struct idr *idp, int id)
-{
- int n;
- struct idr_layer *p;
-
- if (id < 0)
- return NULL;
-
- p = rcu_dereference_raw(idp->top);
- if (!p)
- return NULL;
- n = (p->layer+1) * IDR_BITS;
-
- if (id > idr_max(p->layer + 1))
- return NULL;
- BUG_ON(n == 0);
-
- while (n > 0 && p) {
- n -= IDR_BITS;
- BUG_ON(n != p->layer*IDR_BITS);
- p = rcu_dereference_raw(p->ary[(id >> n) & IDR_MASK]);
- }
- return((void *)p);
-}
-EXPORT_SYMBOL(idr_find_slowpath);
-
-/**
- * idr_for_each - iterate through all stored pointers
- * @idp: idr handle
- * @fn: function to be called for each pointer
- * @data: data passed back to callback function
- *
- * Iterate over the pointers registered with the given idr. The
- * callback function will be called for each pointer currently
- * registered, passing the id, the pointer and the data pointer passed
- * to this function. It is not safe to modify the idr tree while in
- * the callback, so functions such as idr_remove are not allowed.
- *
- * We check the return of @fn each time. If it returns anything other
- * than %0, we break out and return that value.
- *
- * The caller must serialize idr_for_each() vs idr_remove().
+ * free the embedded ida and radix tree.
*/
-int idr_for_each(struct idr *idp,
- int (*fn)(int id, void *p, void *data), void *data)
+void idr_destroy(struct idr *idr)
{
- int n, id, max, error = 0;
- struct idr_layer *p;
- struct idr_layer *pa[MAX_IDR_LEVEL + 1];
- struct idr_layer **paa = &pa[0];
-
- n = idp->layers * IDR_BITS;
- p = rcu_dereference_raw(idp->top);
- max = idr_max(idp->layers);
-
- id = 0;
- while (id >= 0 && id <= max) {
- while (n > 0 && p) {
- n -= IDR_BITS;
- *paa++ = p;
- p = rcu_dereference_raw(p->ary[(id >> n) & IDR_MASK]);
- }
-
- if (p) {
- error = fn(id, (void *)p, data);
- if (error)
- break;
- }
-
- id += 1 << n;
- while (n < fls(id)) {
- n += IDR_BITS;
- p = *--paa;
- }
- }
-
- return error;
-}
-EXPORT_SYMBOL(idr_for_each);
-
-/**
- * idr_find_next - lookup next object of id to given id.
- * @idp: idr handle
- * @nextidp: pointer to lookup key
- *
- * Returns pointer to registered object with id, which is next number to
- * given id. After being looked up, *@nextidp will be updated for the next
- * iteration.
- *
- * This function can be called under rcu_read_lock(), given that the leaf
- * pointers lifetimes are correctly managed.
- */
-void *idr_find_next(struct idr *idp, int *nextidp)
-{
- struct idr_layer *p, *pa[MAX_IDR_LEVEL + 1];
- struct idr_layer **paa = &pa[0];
- int id = *nextidp;
- int n, max;
-
- /* find first ent */
- p = rcu_dereference_raw(idp->top);
- if (!p)
- return NULL;
- n = (p->layer + 1) * IDR_BITS;
- max = idr_max(p->layer + 1);
-
- while (id >= 0 && id <= max) {
- while (n > 0 && p) {
- n -= IDR_BITS;
- *paa++ = p;
- p = rcu_dereference_raw(p->ary[(id >> n) & IDR_MASK]);
- }
-
- if (p) {
- *nextidp = id;
- return p;
- }
-
- /*
- * Proceed to the next layer at the current level. Unlike
- * idr_for_each(), @id isn't guaranteed to be aligned to
- * layer boundary at this point and adding 1 << n may
- * incorrectly skip IDs. Make sure we jump to the
- * beginning of the next layer using round_up().
- */
- id = round_up(id + 1, 1 << n);
- while (n < fls(id)) {
- n += IDR_BITS;
- p = *--paa;
- }
- }
- return NULL;
-}
-EXPORT_SYMBOL(idr_find_next);
-
-
-/**
- * idr_replace - replace pointer for given id
- * @idp: idr handle
- * @ptr: pointer you want associated with the id
- * @id: lookup key
- *
- * Replace the pointer registered with an id and return the old value.
- * A %-ENOENT return indicates that @id was not found.
- * A %-EINVAL return indicates that @id was not within valid constraints.
- *
- * The caller must serialize with writers.
- */
-void *idr_replace(struct idr *idp, void *ptr, int id)
-{
- int n;
- struct idr_layer *p, *old_p;
-
- if (id < 0)
- return ERR_PTR(-EINVAL);
-
- p = idp->top;
- if (!p)
- return ERR_PTR(-EINVAL);
-
- n = (p->layer+1) * IDR_BITS;
-
- if (id >= (1 << n))
- return ERR_PTR(-EINVAL);
-
- n -= IDR_BITS;
- while ((n > 0) && p) {
- p = p->ary[(id >> n) & IDR_MASK];
- n -= IDR_BITS;
- }
-
- n = id & IDR_MASK;
- if (unlikely(p == NULL || !test_bit(n, p->bitmap)))
- return ERR_PTR(-ENOENT);
-
- old_p = p->ary[n];
- rcu_assign_pointer(p->ary[n], ptr);
+ void *p;
+ unsigned id;

- return old_p;
-}
-EXPORT_SYMBOL(idr_replace);
+ idr_for_each_entry(idr, p, id)
+ idr_remove(idr, id);

-void __init idr_init_cache(void)
-{
- idr_layer_cache = kmem_cache_create("idr_layer_cache",
- sizeof(struct idr_layer), 0, SLAB_PANIC, NULL);
+ ida_destroy(&idr->ida);
}
+EXPORT_SYMBOL(idr_destroy);

/**
- * idr_init - initialize idr handle
- * @idp: idr handle
+ * idr_init - initialize sparse idr handle
+ * @idr: idr handle
*
- * This function is use to set up the handle (@idp) that you will pass
+ * This function is used to set up the handle (@idr) that you will pass
* to the rest of the functions.
*/
-void idr_init(struct idr *idp)
+void idr_init(struct idr *idr)
{
- memset(idp, 0, sizeof(struct idr));
- spin_lock_init(&idp->lock);
+ ida_init(&idr->ida);
+ INIT_RADIX_TREE(&idr->ptrs, GFP_NOWAIT);
}
EXPORT_SYMBOL(idr_init);
--
1.8.3.1
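
For illustration, here is a minimal usage sketch of the interface as it looks
after this rewrite - struct my_obj, my_objects, my_obj_add() and my_obj_del()
are made-up names for this example; idr_init(), idr_alloc_range(),
idr_remove() and idr_destroy() are the calls from the patch above:

/* Hypothetical object type and idr instance - not part of the patch */
struct my_obj {
        int id;
};

static struct idr my_objects;   /* idr_init(&my_objects) at init time */

static int my_obj_add(struct my_obj *obj)
{
        int id;

        /* allocate an unused id in [1, 1024) and bind it to obj */
        id = idr_alloc_range(&my_objects, obj, 1, 1024, GFP_KERNEL);
        if (id < 0)
                return id;      /* -ENOSPC or -ENOMEM */

        obj->id = id;
        return 0;
}

static void my_obj_del(struct my_obj *obj)
{
        idr_remove(&my_objects, obj->id);
}

/* at teardown, after all objects are gone: idr_destroy(&my_objects) */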

2013-06-19 00:04:11

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 04/10] idr: Percpu ida

Percpu frontend for allocating ids. With percpu allocation (that works,
i.e. that caches ids on percpu freelists instead of hitting a global
lock every time), it's impossible to guarantee it will always be
possible to allocate all nr_tags - typically, some will be stuck on a
remote percpu freelist where the current job can't get to them.

We do guarantee that it will always be possible to allocate at least
(nr_tags / 2) tags - this is done by keeping track of which and how many
cpus have tags on their percpu freelists. On allocation failure if
enough cpus have tags that there could potentially be (nr_tags / 2) tags
stuck on remote percpu freelists, we then pick a remote cpu at random to
steal from.

Note that the synchronization is _definitely_ tricky - we're using
xchg()/cmpxchg() on the percpu lists to synchronize between normal
allocation/freeing on the local cpu and steal_tags() running remotely.

The alternative would've been adding a spinlock to protect the percpu
freelists, but that would've required some different tricky code to
avoid deadlock because of the lock ordering.

Note that there's no cpu hotplug notifier - we don't care, because
steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
more allocations if we had a notifier - but we'll still meet our
guarantees and it's absolutely not a correctness issue, so I don't think
it's worth the extra code.
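
To make the intended usage concrete, here's a rough sketch of how a driver
might use this - my_cmd, NR_CMDS, cmds[] and the wrapper functions are made
up for this example; only the percpu_ida_* calls are the interface added by
this patch:

/* hypothetical command structure and pool size - not part of this patch */
struct my_cmd {
        int in_flight;
};
#define NR_CMDS 256

static struct my_cmd cmds[NR_CMDS];
static struct percpu_ida tag_pool;

static int driver_init(void)
{
        /* one tag per preallocated command slot */
        return percpu_ida_init(&tag_pool, NR_CMDS);
}

/* submission path - may run in interrupt context, so no __GFP_WAIT */
static struct my_cmd *cmd_get(void)
{
        int tag = percpu_ida_alloc(&tag_pool, GFP_NOWAIT);

        return tag < 0 ? NULL : &cmds[tag];
}

/* completion path */
static void cmd_put(struct my_cmd *cmd)
{
        percpu_ida_free(&tag_pool, cmd - cmds);
}

static void driver_exit(void)
{
        percpu_ida_destroy(&tag_pool);
}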

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Nicholas A. Bellinger" <[email protected]>
---
include/linux/idr.h | 48 ++++++++
lib/idr.c | 312 ++++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 352 insertions(+), 8 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index a78cf99..3ba796b 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -16,6 +16,8 @@
#include <linux/bitops.h>
#include <linux/init.h>
#include <linux/rcupdate.h>
+#include <linux/spinlock_types.h>
+#include <linux/wait.h>

/* IDA */

@@ -97,6 +99,52 @@ static inline void ida_init(struct ida *ida)
ida_init_prealloc(ida, 0);
}

+/* Percpu IDA/tag allocator */
+
+struct percpu_ida_cpu;
+
+struct percpu_ida {
+ /*
+ * number of tags available to be allocated, as passed to
+ * percpu_ida_init()
+ */
+ unsigned nr_tags;
+
+ struct percpu_ida_cpu __percpu *tag_cpu;
+
+ /*
+ * Bitmap of cpus that (may) have tags on their percpu freelists:
+ * steal_tags() uses this to decide when to steal tags, and which cpus
+ * to try stealing from.
+ *
+ * It's ok for a freelist to be empty when its bit is set - steal_tags()
+ * will just keep looking - but the bitmap _must_ be set whenever a
+ * percpu freelist does have tags.
+ */
+ unsigned long *cpus_have_tags;
+
+ struct {
+ /*
+ * When we go to steal tags from another cpu (see steal_tags()),
+ * we want to pick a cpu at random. Cycling through them every
+ * time we steal is a bit easier and more or less equivalent:
+ */
+ unsigned cpu_last_stolen;
+
+ /* For sleeping on allocation failure */
+ wait_queue_head_t wait;
+
+ /* Global freelist */
+ struct ida ida;
+ } ____cacheline_aligned_in_smp;
+};
+
+int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp);
+void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
+
+void percpu_ida_destroy(struct percpu_ida *pool);
+int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags);
+
/* IDR */

/*
diff --git a/lib/idr.c b/lib/idr.c
index e6b837a..4962e5d 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -28,17 +28,20 @@
* with the slab allocator.
*/

-#ifndef TEST // to test in user space...
-#include <linux/slab.h>
-#include <linux/init.h>
-#include <linux/export.h>
-#endif
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
#include <linux/err.h>
-#include <linux/string.h>
+#include <linux/export.h>
+#include <linux/hardirq.h>
#include <linux/idr.h>
-#include <linux/spinlock.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
#include <linux/percpu.h>
-#include <linux/hardirq.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/spinlock.h>

#define IDA_TREE_ARY BITS_PER_LONG

@@ -527,6 +530,299 @@ int ida_init_prealloc(struct ida *ida, unsigned prealloc)
}
EXPORT_SYMBOL(ida_init_prealloc);

+/* Percpu IDA */
+
+/*
+ * Number of tags we move between the percpu freelist and the global freelist at
+ * a time
+ */
+#define TAG_CPU_BATCH_MOVE 32U
+
+/* Max size of percpu freelist; when full, tags are moved back to the global pool */
+#define TAG_CPU_SIZE (TAG_CPU_BATCH_MOVE + TAG_CPU_BATCH_MOVE / 2)
+
+struct percpu_ida_cpu {
+ spinlock_t lock;
+ unsigned nr_free;
+ unsigned freelist[];
+};
+
+/*
+ * Try to steal tags from a remote cpu's percpu freelist.
+ *
+ * We first check how many percpu freelists have tags - we don't steal tags
+ * unless enough percpu freelists have tags on them that it's possible more than
+ * half the total tags could be stuck on remote percpu freelists.
+ *
+ * Then we iterate through the cpus until we find some tags - we don't attempt
+ * to find the "best" cpu to steal from, to keep cacheline bouncing to a
+ * minimum.
+ *
+ * Returns true on success (our percpu freelist is no longer empty), false on
+ * failure.
+ */
+static inline bool steal_tags(struct percpu_ida *pool,
+ struct percpu_ida_cpu *tags)
+{
+ unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
+ struct percpu_ida_cpu *remote;
+
+ for (cpus_have_tags = bitmap_weight(pool->cpus_have_tags, nr_cpu_ids);
+ cpus_have_tags * TAG_CPU_SIZE > pool->nr_tags / 2;
+ cpus_have_tags--) {
+ cpu = find_next_bit(pool->cpus_have_tags, nr_cpu_ids, cpu);
+
+ if (cpu == nr_cpu_ids)
+ cpu = find_first_bit(pool->cpus_have_tags, nr_cpu_ids);
+
+ if (cpu == nr_cpu_ids)
+ BUG();
+
+ pool->cpu_last_stolen = cpu;
+ remote = per_cpu_ptr(pool->tag_cpu, cpu);
+
+ clear_bit(cpu, pool->cpus_have_tags);
+
+ if (remote == tags)
+ continue;
+
+ spin_lock(&remote->lock);
+
+ if (remote->nr_free) {
+ memcpy(tags->freelist,
+ remote->freelist,
+ sizeof(unsigned) * remote->nr_free);
+
+ tags->nr_free = remote->nr_free;
+ remote->nr_free = 0;
+ }
+
+ spin_unlock(&remote->lock);
+
+ if (tags->nr_free)
+ return true;
+ }
+
+ return false;
+}
+
+static inline bool alloc_global_tags(struct percpu_ida *pool,
+ struct percpu_ida_cpu *tags)
+{
+ int nr_free = __ida_alloc_range_multiple(&pool->ida, tags->freelist,
+ TAG_CPU_BATCH_MOVE, 0,
+ pool->nr_tags, GFP_NOWAIT,
+ NULL);
+
+ if (nr_free <= 0)
+ return false;
+
+ tags->nr_free = nr_free;
+ return true;
+}
+
+static inline unsigned alloc_local_tag(struct percpu_ida *pool,
+ struct percpu_ida_cpu *tags)
+{
+ int tag = -ENOSPC;
+
+ spin_lock(&tags->lock);
+ if (tags->nr_free)
+ tag = tags->freelist[--tags->nr_free];
+ spin_unlock(&tags->lock);
+
+ return tag;
+}
+
+/**
+ * percpu_ida_alloc - allocate a tag
+ * @pool: pool to allocate from
+ * @gfp: gfp flags
+ *
+ * Returns a tag - an integer in the range [0..nr_tags) (as passed to
+ * percpu_ida_init()), or -ENOSPC on allocation failure.
+ *
+ * Safe to be called from interrupt context (assuming it isn't passed
+ * __GFP_WAIT, of course).
+ *
+ * Will not fail if passed __GFP_WAIT.
+ */
+int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
+{
+ DEFINE_WAIT(wait);
+ struct percpu_ida_cpu *tags;
+ unsigned long flags;
+ unsigned this_cpu;
+ int tag;
+
+ local_irq_save(flags);
+ this_cpu = smp_processor_id();
+ tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
+
+ /* Fastpath */
+ tag = alloc_local_tag(pool, tags);
+ if (likely(tag >= 0)) {
+ local_irq_restore(flags);
+ return tag;
+ }
+
+ while (1) {
+ spin_lock(&pool->ida.lock);
+
+ /*
+ * prepare_to_wait() must come before steal_tags(), in case
+ * percpu_ida_free() on another cpu flips a bit in
+ * cpus_have_tags
+ */
+ prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
+
+ /*
+ * alloc_global_tags(), steal_tags() return true iff we now have
+ * tags on our percpu freelist
+ */
+ if (tags->nr_free ||
+ alloc_global_tags(pool, tags) ||
+ steal_tags(pool, tags)) {
+ /*
+ * Global lock held and irqs disabled, don't need percpu
+ * lock
+ */
+ tag = tags->freelist[--tags->nr_free];
+ if (tags->nr_free)
+ set_bit(this_cpu, pool->cpus_have_tags);
+ }
+
+ spin_unlock(&pool->ida.lock);
+ local_irq_restore(flags);
+
+ if (tag >= 0 || !(gfp & __GFP_WAIT))
+ break;
+
+ schedule();
+
+ local_irq_save(flags);
+ this_cpu = smp_processor_id();
+ tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
+ }
+
+ finish_wait(&pool->wait, &wait);
+ return tag;
+}
+EXPORT_SYMBOL_GPL(percpu_ida_alloc);
+
+/**
+ * percpu_ida_free - free a tag
+ * @pool: pool @tag was allocated from
+ * @tag: a tag previously allocated with percpu_ida_alloc()
+ *
+ * Safe to be called from interrupt context.
+ */
+void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
+{
+ struct percpu_ida_cpu *tags;
+ unsigned long flags;
+ unsigned nr_free, this_cpu;
+
+ BUG_ON(tag >= pool->nr_tags);
+
+ local_irq_save(flags);
+ this_cpu = smp_processor_id();
+ tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
+
+ spin_lock(&tags->lock);
+ nr_free = tags->nr_free;
+ tags->freelist[tags->nr_free++] = tag;
+ spin_unlock(&tags->lock);
+
+ if (!nr_free) {
+ set_bit(this_cpu, pool->cpus_have_tags);
+ wake_up(&pool->wait);
+ }
+
+ if (nr_free + 1 == TAG_CPU_SIZE) {
+ spin_lock(&pool->ida.lock);
+
+ /*
+ * Global lock held and irqs disabled, don't need percpu
+ * lock
+ */
+ while (tags->nr_free > TAG_CPU_SIZE - TAG_CPU_BATCH_MOVE)
+ __ida_remove(&pool->ida,
+ tags->freelist[--tags->nr_free]);
+
+ wake_up(&pool->wait);
+ spin_unlock(&pool->ida.lock);
+ }
+
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(percpu_ida_free);
+
+/**
+ * percpu_ida_destroy - release a tag pool's resources
+ * @pool: pool to free
+ *
+ * Frees the resources allocated by percpu_ida_init().
+ */
+void percpu_ida_destroy(struct percpu_ida *pool)
+{
+ free_percpu(pool->tag_cpu);
+ kfree(pool->cpus_have_tags);
+ ida_destroy(&pool->ida);
+}
+EXPORT_SYMBOL_GPL(percpu_ida_destroy);
+
+/**
+ * percpu_ida_init - initialize a percpu tag pool
+ * @pool: pool to initialize
+ * @nr_tags: number of tags that will be available for allocation
+ *
+ * Initializes @pool so that it can be used to allocate tags - integers in the
+ * range [0, nr_tags). Typically, they'll be used by driver code to refer to a
+ * preallocated array of tag structures.
+ *
+ * Allocation is percpu, but sharding is limited by nr_tags - for best
+ * performance, the workload should not span more cpus than nr_tags / 128.
+ */
+int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags)
+{
+ unsigned cpu;
+
+ memset(pool, 0, sizeof(*pool));
+
+ init_waitqueue_head(&pool->wait);
+ pool->nr_tags = nr_tags;
+
+ /* Guard against overflow */
+ if (nr_tags > (unsigned) INT_MAX + 1) {
+ pr_err("percpu_ida_init: nr_tags too large\n");
+ return -EINVAL;
+ }
+
+ if (ida_init_prealloc(&pool->ida, nr_tags))
+ return -ENOMEM;
+
+ pool->cpus_have_tags = kzalloc(BITS_TO_LONGS(nr_cpu_ids) *
+ sizeof(unsigned long), GFP_KERNEL);
+ if (!pool->cpus_have_tags)
+ goto err;
+
+ pool->tag_cpu = __alloc_percpu(sizeof(struct percpu_ida_cpu) +
+ TAG_CPU_SIZE * sizeof(unsigned),
+ sizeof(unsigned));
+ if (!pool->tag_cpu)
+ goto err;
+
+ for_each_possible_cpu(cpu)
+ spin_lock_init(&per_cpu_ptr(pool->tag_cpu, cpu)->lock);
+
+ return 0;
+err:
+ percpu_ida_destroy(pool);
+ return -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(percpu_ida_init);
+
/* IDR */

#define MAX_IDR_SHIFT (sizeof(int) * 8 - 1)
--
1.8.3.1

2013-06-19 09:43:05

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [PATCH 03/10] idr: Rewrite ida

Hi,

On Tue, 2013-06-18 at 17:02 -0700, Kent Overstreet wrote:
> This is a new, from scratch implementation of ida that should be
> simpler, faster and more space efficient.
>
[...]

>
> This does mean that the entire bitmap is stored in one contiguous memory
> allocation - and as currently implemented we won't be able to allocate
> _quite_ as many ids as with the previous implementation.
>
> I don't expect this to be an issue in practice since anywhere this is
> used, an id corresponds to a struct allocation somewher else - we can't
> allocate an unbounded number of ids, we'll run out of memory somewhere
> else eventually, and I expect that to be the limiting factor in
> practice.
>
> If a user/use case does come up where this matters I can add some
> sharding (or perhaps add a separate big_ida implementation) - but the
> extra complexity would adversely affect performance for the users that
> don't need > millions of ids, so I intend to leave the implementation as
> is until if and when this becomes an issue.
>

Millions of IDs is something that is fairly normal for DLM, since there
will be two DLM locks per cached inode with GFS2 and people tend to use
it on pretty large servers with lots of memory,

Steve.

2013-06-19 23:38:43

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 03/10] idr: Rewrite ida

On Wed, Jun 19, 2013 at 10:40:22AM +0100, Steven Whitehouse wrote:
> Hi,
>
> On Tue, 2013-06-18 at 17:02 -0700, Kent Overstreet wrote:
> > This is a new, from scratch implementation of ida that should be
> > simpler, faster and more space efficient.
> >
> [...]
>
> >
> > This does mean that the entire bitmap is stored in one contiguous memory
> > allocation - and as currently implemented we won't be able to allocate
> > _quite_ as many ids as with the previous implementation.
> >
> > I don't expect this to be an issue in practice since anywhere this is
> > used, an id corresponds to a struct allocation somewher else - we can't
> > allocate an unbounded number of ids, we'll run out of memory somewhere
> > else eventually, and I expect that to be the limiting factor in
> > practice.
> >
> > If a user/use case does come up where this matters I can add some
> > sharding (or perhaps add a separate big_ida implementation) - but the
> > extra complexity would adversely affect performance for the users that
> > don't need > millions of ids, so I intend to leave the implementation as
> > is until if and when this becomes an issue.
> >
>
> Millions of IDs is something that is fairly normal for DLM, since there
> will be two DLM locks per cached inode with GFS2 and people tend to use
> it on pretty large servers with lots of memory,

Thanks, I wasn't aware of that. Is the 31 bit limit on ids an
issue for you? While I'm at it, changing ids to longs should be fairly
trivial.

2013-06-21 16:33:53

by David Teigland

[permalink] [raw]
Subject: Re: [PATCH 03/10] idr: Rewrite ida

On Wed, Jun 19, 2013 at 04:38:36PM -0700, Kent Overstreet wrote:
> On Wed, Jun 19, 2013 at 10:40:22AM +0100, Steven Whitehouse wrote:
> > Millions of IDs is something that is fairly normal for DLM, since there
> > will be two DLM locks per cached inode with GFS2 and people tend to use
> > it on pretty large servers with lots of memory,
>
> Thanks, I wasn't aware of that. Is the 31 bit limit on ids an
> issue for you? While I'm at it, changing ids to longs should be fairly
> trivial.

There is a dlm_lkb struct in memory for each id, so 31 bits will not
be a problem.
Dave