Message-ID: <48438ACB.8040500@bull.net>
Date: Mon, 02 Jun 2008 07:53:15 +0200
From: Nadia Derbey <Nadia.Derbey@bull.net>
Organization: BULL/DT/OSwR&D/Linux
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115
MIME-Version: 1.0
To: paulmck@linux.vnet.ibm.com
Cc: manfred@colorfullife.com, lnxninja@linux.vnet.ibm.com,
       linux-kernel@vger.kernel.org, efault@gmx.de, akpm@linux-foundation.org
Subject: Re: [PATCH 0/9] Scalability requirements for sysv ipc - v3
References: <20080507113553.395937000@bull.net> <20080530082214.GD4943@linux.vnet.ibm.com>
In-Reply-To: <20080530082214.GD4943@linux.vnet.ibm.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 28293
Lines: 1099

Paul E. McKenney wrote:
> On Wed, May 07, 2008 at 01:35:53PM +0200, Nadia.Derbey@bull.net wrote:
> 
>>After scalability problems have been detected when using the sysV ipcs, I
>>have proposed to use an RCU based implementation of the IDR api instead (see
>>threads http://lkml.org/lkml/2008/4/11/212 and
>>http://lkml.org/lkml/2008/4/29/295).
>>
>>This resulted in many people asking to convert the idr API and make it
>>rcu safe (because most of the code was duplicated and thus unmaintanable
>>and unreviewable).
>>
>>So here is a first attempt.
>>
>>The important change wrt to the idr API itself is during idr removes:
>>idr layers are freed after a grace period, instead of being moved to the
>>free list.
>>
>>The important change wrt to ipcs, is that idr_find() can now be called
>>locklessly inside a rcu read critical section.
>>
>>Here are the results I've got for the pmsg test sent by Manfred: 
>>
>>   2.6.25-rc3-mm1   2.6.25-rc3-mm1+   2.6.25-mm1   Patched 2.6.25-mm1
>>1         1168441           1064021       876000               947488
>>2         1094264            921059      1549592              1730685
>>3         2082520           1738165      1694370              2324880
>>4         2079929           1695521       404553              2400408
>>5         2898758            406566       391283              3246580
>>6         2921417            261275       263249              3752148
>>7         3308761            126056       191742              4243142
>>8         3329456            100129       141722              4275780
>>
>>1st column: stock 2.6.25-rc3-mm1
>>2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
>>3nd column: stock 2.6.25-mm1
>>4th column: 2.6.25-mm1 + this pacth series.
>>
>>I'll send a chart as an answer to this mail: don't know how to do that
>>with quilt :-(
>>
>>
>>Reviewers are more than ever welcome!
>>
>>Patches should be applied on linux-2.6.25-mm1, in the following order:
>>
>>[ PATCH 01/09 ] : idr_add_rcu_head.patch
>>[ PATCH 02/09 ] : idr_rename_routines.patch
>>[ PATCH 03/09 ] : idr_fix_printk.patch
>>[ PATCH 04/09 ] : idr_rc_to_errno.patch
>>[ PATCH 05/09 ] : idr_get_new_rcu_safe.patch
>>[ PATCH 06/09 ] : idr_find_rcu_safe.patch
>>[ PATCH 07/09 ] : idr_remove_rcu_safe.patch
>>[ PATCH 08/09 ] : ipc_fix_ipc_lock.patch
>>[ PATCH 09/09 ] : remove_ipc_lock_down.patch
>>
>>Patches 2, 3 and 4 do not introduce actual changes.
>>
>>I won't be available before next Tuesday, so, please, don't be mad at me if
>>I'm not answering fast enough.
> 
> 
> I guess in my case, next Tuesday was not an issue.  :-/

Anyway, thanks for reviewing the code!

> 
> Anyway, the idr.c changes look good to me.  Not sure why you are using
> INIT_RCU_HEAD() given that call_rcu() completely initializes the fields.
> Using INIT_RCU_HEAD() doesn't cause any problems, but does add needless
> code.

Well the INIT_RCU_HEAD calls are here because I first thought of moving 
the freed idr layers to the free list instead of directly removing them.
So the INIT_RCU_HEAD was a way for me to have a "blank" structure when 
getting a layer from the free list, i.e. reusing an old layer.
But I finally gave up the idea... but left the INI_RCU_HEAD() calls.
Will fix this.
But I have a "process" question: should I resend the complete patch 
series or would a patch above the already submitted ones be enough?

Regards,
Nadia

> 
> Commentary below, looks good from an RCU viewpoint.
> 
> 							Thanx, Paul
> 
> 
>>/*
>> * 2002-10-18  written by Jim Houston jim.houston@ccur.com
>> *	Copyright (C) 2002 by Concurrent Computer Corporation
>> *	Distributed under the GNU GPL license version 2.
>> *
>> * Modified by George Anzinger to reuse immediately and to use
>> * find bit instructions.  Also removed _irq on spinlocks.
>> *
>> * Modified by Nadia Derbey to make it RCU safe.
>> *
>> * Small id to pointer translation service.
>> *
>> * It uses a radix tree like structure as a sparse array indexed
>> * by the id to obtain the pointer.  The bitmap makes allocating
>> * a new id quick.
>> *
>> * You call it to allocate an id (an int) an associate with that id a
>> * pointer or what ever, we treat it as a (void *).  You can pass this
>> * id to a user for him to pass back at a later time.  You then pass
>> * that id to this code and it returns your pointer.
>>
>> * You can release ids at any time. When all ids are released, most of
>> * the memory is returned (we keep IDR_FREE_MAX) in a local pool so we
>> * don't need to go to the memory "store" during an id allocate, just
>> * so you don't need to be too concerned about locking and conflicts
>> * with the slab allocator.
>> */
>>
>>#ifndef TEST                        // to test in user space...
>>#include <linux/slab.h>
>>#include <linux/init.h>
>>#include <linux/module.h>
>>#endif
>>#include <linux/err.h>
>>#include <linux/string.h>
>>#include <linux/idr.h>
>>
>>static struct kmem_cache *idr_layer_cache;
>>
>>static struct idr_layer *get_from_free_list(struct idr *idp)
>>{
>>	struct idr_layer *p;
>>	unsigned long flags;
>>
>>	spin_lock_irqsave(&idp->lock, flags);
>>	if ((p = idp->id_free)) {
>>		idp->id_free = p->ary[0];
>>		idp->id_free_cnt--;
>>		p->ary[0] = NULL;
> 
> 
> OK, this is the freelist which is inaccessible to readers.
> 
> 
>>	}
>>	spin_unlock_irqrestore(&idp->lock, flags);
>>	return(p);
>>}
>>
>>static void idr_layer_rcu_free(struct rcu_head *head)
>>{
>>	struct idr_layer *layer;
>>
>>	layer = container_of(head, struct idr_layer, rcu_head);
>>	kmem_cache_free(idr_layer_cache, layer);
>>}
>>
>>static inline void free_layer(struct idr_layer *p)
>>{
>>	call_rcu(&p->rcu_head, idr_layer_rcu_free);
>>}
>>
>>/* only called when idp->lock is held */
>>static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
>>{
>>	p->ary[0] = idp->id_free;
> 
> 
> OK, this is the freelist which is inaccessible to readers.
> 
> 
>>	idp->id_free = p;
>>	idp->id_free_cnt++;
>>}
>>
>>static void move_to_free_list(struct idr *idp, struct idr_layer *p)
>>{
>>	unsigned long flags;
>>
>>	/*
>>	 * Depends on the return element being zeroed.
>>	 */
>>	spin_lock_irqsave(&idp->lock, flags);
>>	__move_to_free_list(idp, p);
>>	spin_unlock_irqrestore(&idp->lock, flags);
>>}
>>
>>static void idr_mark_full(struct idr_layer **pa, int id)
>>{
>>	struct idr_layer *p = pa[0];
>>	int l = 0;
>>
>>	__set_bit(id & IDR_MASK, &p->bitmap);
>>	/*
>>	 * If this layer is full mark the bit in the layer above to
>>	 * show that this part of the radix tree is full.  This may
>>	 * complete the layer above and require walking up the radix
>>	 * tree.
>>	 */
>>	while (p->bitmap == IDR_FULL) {
>>		if (!(p = pa[++l]))
>>			break;
>>		id = id >> IDR_BITS;
>>		__set_bit((id & IDR_MASK), &p->bitmap);
>>	}
>>}
>>
>>/**
>> * idr_pre_get - reserver resources for idr allocation
>> * @idp:	idr handle
>> * @gfp_mask:	memory allocation flags
>> *
>> * This function should be called prior to locking and calling the
>> * idr_get_new* functions. It preallocates enough memory to satisfy
>> * the worst possible allocation.
>> *
>> * If the system is REALLY out of memory this function returns 0,
>> * otherwise 1.
>> */
>>int idr_pre_get(struct idr *idp, gfp_t gfp_mask)
>>{
>>	while (idp->id_free_cnt < IDR_FREE_MAX) {
>>		struct idr_layer *new;
>>		new = kmem_cache_alloc(idr_layer_cache, gfp_mask);
>>		if (new == NULL)
>>			return (0);
>>		move_to_free_list(idp, new);
>>	}
>>	return 1;
>>}
>>EXPORT_SYMBOL(idr_pre_get);
>>
>>static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa)
>>{
>>	int n, m, sh;
>>	struct idr_layer *p, *new;
>>	int l, id, oid;
>>	unsigned long bm;
>>
>>	id = *starting_id;
>> restart:
>>	p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	l = idp->layers;
>>	pa[l--] = NULL;
>>	while (1) {
>>		/*
>>		 * We run around this while until we reach the leaf node...
>>		 */
>>		n = (id >> (IDR_BITS*l)) & IDR_MASK;
>>		bm = ~p->bitmap;
>>		m = find_next_bit(&bm, IDR_SIZE, n);
>>		if (m == IDR_SIZE) {
>>			/* no space available go back to previous layer. */
>>			l++;
>>			oid = id;
>>			id = (id | ((1 << (IDR_BITS * l)) - 1)) + 1;
>>
>>			/* if already at the top layer, we need to grow */
>>			if (!(p = pa[l])) {
>>				*starting_id = id;
>>				return IDR_NEED_TO_GROW;
>>			}
>>
>>			/* If we need to go up one layer, continue the
>>			 * loop; otherwise, restart from the top.
>>			 */
>>			sh = IDR_BITS * (l + 1);
>>			if (oid >> sh == id >> sh)
>>				continue;
>>			else
>>				goto restart;
>>		}
>>		if (m != n) {
>>			sh = IDR_BITS*l;
>>			id = ((id >> sh) ^ n ^ m) << sh;
>>		}
>>		if ((id >= MAX_ID_BIT) || (id < 0))
>>			return IDR_NOMORE_SPACE;
>>		if (l == 0)
>>			break;
>>		/*
>>		 * Create the layer below if it is missing.
>>		 */
>>		if (!p->ary[m]) {
> 
> 
> OK, we aren't dereferencing.  Besides, we should hold the update-side
> lock at this point.
> 
> 
>>			new = get_from_free_list(idp);
>>			if (!new)
>>				return -1;
>>			INIT_RCU_HEAD(&new->rcu_head);
> 
> 
> Not needed, unless you want this zeroed for debug purposes.
> 
> 
>>			rcu_assign_pointer(p->ary[m], new);
>>			p->count++;
>>		}
>>		pa[l--] = p;
>>		p = p->ary[m];
> 
> 
> Holding update-side lock.
> 
> 
>>	}
>>
>>	pa[l] = p;
>>	return id;
>>}
>>
>>static int idr_get_empty_slot(struct idr *idp, int starting_id,
>>			      struct idr_layer **pa)
>>{
>>	struct idr_layer *p, *new;
>>	int layers, v, id;
>>	unsigned long flags;
>>
>>	id = starting_id;
>>build_up:
>>	p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	layers = idp->layers;
>>	if (unlikely(!p)) {
>>		if (!(p = get_from_free_list(idp)))
>>			return -1;
>>		INIT_RCU_HEAD(&p->rcu_head);
> 
> 
> Not needed, unless you want this zeroed for debug purposes.
> 
> 
>>		layers = 1;
>>	}
>>	/*
>>	 * Add a new layer to the top of the tree if the requested
>>	 * id is larger than the currently allocated space.
>>	 */
>>	while ((layers < (MAX_LEVEL - 1)) && (id >= (1 << (layers*IDR_BITS)))) {
>>		layers++;
>>		if (!p->count)
>>			continue;
>>		if (!(new = get_from_free_list(idp))) {
>>			/*
>>			 * The allocation failed.  If we built part of
>>			 * the structure tear it down.
>>			 */
>>			spin_lock_irqsave(&idp->lock, flags);
>>			for (new = p; p && p != idp->top; new = p) {
>>				p = p->ary[0];
>>				new->ary[0] = NULL;
> 
> 
> OK, this presumably has not yet been made accessible to readers.
> 
> 
>>				new->bitmap = new->count = 0;
>>				__move_to_free_list(idp, new);
>>			}
>>			spin_unlock_irqrestore(&idp->lock, flags);
>>			return -1;
>>		}
>>		new->ary[0] = p;
> 
> 
> OK, this presumably has not yet been made accessible to readers.
> 
> 
>>		new->count = 1;
>>		INIT_RCU_HEAD(&new->rcu_head);
> 
> 
> Not needed, unless you want this zeroed for debug purposes.
> 
> 
>>		if (p->bitmap == IDR_FULL)
>>			__set_bit(0, &new->bitmap);
>>		p = new;
>>	}
>>	rcu_assign_pointer(idp->top, p);
>>	idp->layers = layers;
>>	v = sub_alloc(idp, &id, pa);
>>	if (v == IDR_NEED_TO_GROW)
>>		goto build_up;
>>	return(v);
>>}
>>
>>static int idr_get_new_above_int(struct idr *idp, void *ptr, int starting_id)
>>{
>>	struct idr_layer *pa[MAX_LEVEL];
>>	int id;
>>
>>	id = idr_get_empty_slot(idp, starting_id, pa);
>>	if (id >= 0) {
>>		/*
>>		 * Successfully found an empty slot.  Install the user
>>		 * pointer and mark the slot full.
>>		 */
>>		rcu_assign_pointer(pa[0]->ary[id & IDR_MASK],
>>				(struct idr_layer *)ptr);
>>		pa[0]->count++;
>>		idr_mark_full(pa, id);
>>	}
>>
>>	return id;
>>}
>>
>>/**
>> * idr_get_new_above - allocate new idr entry above or equal to a start id
>> * @idp: idr handle
>> * @ptr: pointer you want associated with the ide
>> * @start_id: id to start search at
>> * @id: pointer to the allocated handle
>> *
>> * This is the allocate id function.  It should be called with any
>> * required locks.
>> *
>> * If memory is required, it will return -EAGAIN, you should unlock
>> * and go back to the idr_pre_get() call.  If the idr is full, it will
>> * return -ENOSPC.
>> *
>> * @id returns a value in the range 0 ... 0x7fffffff
>> */
>>int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id)
>>{
>>	int rv;
>>
>>	rv = idr_get_new_above_int(idp, ptr, starting_id);
>>	/*
>>	 * This is a cheap hack until the IDR code can be fixed to
>>	 * return proper error values.
>>	 */
>>	if (rv < 0)
>>		return _idr_rc_to_errno(rv);
>>	*id = rv;
>>	return 0;
>>}
>>EXPORT_SYMBOL(idr_get_new_above);
>>
>>/**
>> * idr_get_new - allocate new idr entry
>> * @idp: idr handle
>> * @ptr: pointer you want associated with the ide
>> * @id: pointer to the allocated handle
>> *
>> * This is the allocate id function.  It should be called with any
>> * required locks.
>> *
>> * If memory is required, it will return -EAGAIN, you should unlock
>> * and go back to the idr_pre_get() call.  If the idr is full, it will
>> * return -ENOSPC.
>> *
>> * @id returns a value in the range 0 ... 0x7fffffff
>> */
>>int idr_get_new(struct idr *idp, void *ptr, int *id)
>>{
>>	int rv;
>>
>>	rv = idr_get_new_above_int(idp, ptr, 0);
>>	/*
>>	 * This is a cheap hack until the IDR code can be fixed to
>>	 * return proper error values.
>>	 */
>>	if (rv < 0)
>>		return _idr_rc_to_errno(rv);
>>	*id = rv;
>>	return 0;
>>}
>>EXPORT_SYMBOL(idr_get_new);
>>
>>static void idr_remove_warning(int id)
>>{
>>	printk(KERN_WARNING
>>		"idr_remove called for id=%d which is not allocated.\n", id);
>>	dump_stack();
>>}
>>
>>static void sub_remove(struct idr *idp, int shift, int id)
>>{
>>	struct idr_layer *p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	struct idr_layer **pa[MAX_LEVEL];
>>	struct idr_layer ***paa = &pa[0];
>>	struct idr_layer *to_free;
>>	int n;
>>
>>	*paa = NULL;
>>	*++paa = &idp->top;
>>
>>	while ((shift > 0) && p) {
>>		n = (id >> shift) & IDR_MASK;
>>		__clear_bit(n, &p->bitmap);
>>		*++paa = &p->ary[n];
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>		p = p->ary[n];
>>		shift -= IDR_BITS;
>>	}
>>	n = id & IDR_MASK;
>>	if (likely(p != NULL && test_bit(n, &p->bitmap))){
>>		__clear_bit(n, &p->bitmap);
>>		rcu_assign_pointer(p->ary[n], NULL);
>>		to_free = NULL;
>>		while(*paa && ! --((**paa)->count)){
>>			if (to_free)
>>				free_layer(to_free);
>>			to_free = **paa;
>>			**paa-- = NULL;
>>		}
>>		if (!*paa)
>>			idp->layers = 0;
>>		if (to_free)
>>			free_layer(to_free);
>>	} else
>>		idr_remove_warning(id);
>>}
>>
>>/**
>> * idr_remove - remove the given id and free it's slot
>> * @idp: idr handle
>> * @id: unique key
>> */
>>void idr_remove(struct idr *idp, int id)
>>{
>>	struct idr_layer *p;
>>	struct idr_layer *to_free;
>>
>>	/* Mask off upper bits we don't use for the search. */
>>	id &= MAX_ID_MASK;
>>
>>	sub_remove(idp, (idp->layers - 1) * IDR_BITS, id);
>>	if (idp->top && idp->top->count == 1 && (idp->layers > 1) &&
>>	    idp->top->ary[0]) {
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		/*
>>		 * Single child at leftmost slot: we can shrink the tree.
>>		 * This level is not needed anymore since when layers are
>>		 * inserted, they are inserted at the top of the existing
>>		 * tree.
>>		 */
>>		to_free = idp->top;
>>		p = idp->top->ary[0];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		rcu_assign_pointer(idp->top, p);
>>		--idp->layers;
>>		to_free->bitmap = to_free->count = 0;
>>		free_layer(to_free);
>>	}
>>	while (idp->id_free_cnt >= IDR_FREE_MAX) {
>>		p = get_from_free_list(idp);
>>		/*
>>		 * Note: we don't call the rcu callback here, since the only
>>		 * layers that fall into the freelist are those that have been
>>		 * preallocated.
>>		 */
>>		kmem_cache_free(idr_layer_cache, p);
>>	}
>>	return;
>>}
>>EXPORT_SYMBOL(idr_remove);
>>
>>/**
>> * idr_remove_all - remove all ids from the given idr tree
>> * @idp: idr handle
>> *
>> * idr_destroy() only frees up unused, cached idp_layers, but this
>> * function will remove all id mappings and leave all idp_layers
>> * unused.
>> *
>> * A typical clean-up sequence for objects stored in an idr tree, will
>> * use idr_for_each() to free all objects, if necessay, then
>> * idr_remove_all() to remove all ids, and idr_destroy() to free
>> * up the cached idr_layers.
>> */
>>void idr_remove_all(struct idr *idp)
>>{
>>	int n, id, max;
>>	struct idr_layer *p;
>>	struct idr_layer *pa[MAX_LEVEL];
>>	struct idr_layer **paa = &pa[0];
>>
>>	n = idp->layers * IDR_BITS;
>>	p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	max = 1 << n;
>>
>>	id = 0;
>>	while (id < max) {
>>		while (n > IDR_BITS && p) {
>>			n -= IDR_BITS;
>>			*paa++ = p;
>>			p = p->ary[(id >> n) & IDR_MASK];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		}
>>
>>		id += 1 << n;
>>		while (n < fls(id)) {
>>			if (p)
>>				free_layer(p);
>>			n += IDR_BITS;
>>			p = *--paa;
>>		}
>>	}
>>	rcu_assign_pointer(idp->top, NULL);
>>	idp->layers = 0;
>>}
>>EXPORT_SYMBOL(idr_remove_all);
>>
>>/**
>> * idr_destroy - release all cached layers within an idr tree
>> * idp: idr handle
>> */
>>void idr_destroy(struct idr *idp)
>>{
>>	while (idp->id_free_cnt) {
>>		struct idr_layer *p = get_from_free_list(idp);
>>		kmem_cache_free(idr_layer_cache, p);
>>	}
>>}
>>EXPORT_SYMBOL(idr_destroy);
>>
>>/**
>> * idr_find - return pointer for given id
>> * @idp: idr handle
>> * @id: lookup key
>> *
>> * Return the pointer given the id it has been registered with.  A %NULL
>> * return indicates that @id is not valid or you passed %NULL in
>> * idr_get_new().
>> *
>> * This function can be called under rcu_read_lock(), given that the leaf
>> * pointers lifetimes are correctly managed.
>> */
>>void *idr_find(struct idr *idp, int id)
>>{
>>	int n;
>>	struct idr_layer *p;
>>
>>	n = idp->layers * IDR_BITS;
>>	p = rcu_dereference(idp->top);
>>
>>	/* Mask off upper bits we don't use for the search. */
>>	id &= MAX_ID_MASK;
>>
>>	if (id >= (1 << n))
>>		return NULL;
>>
>>	while (n > 0 && p) {
>>		n -= IDR_BITS;
>>		p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
>>	}
>>	return((void *)p);
>>}
>>EXPORT_SYMBOL(idr_find);
>>
>>/**
>> * idr_for_each - iterate through all stored pointers
>> * @idp: idr handle
>> * @fn: function to be called for each pointer
>> * @data: data passed back to callback function
>> *
>> * Iterate over the pointers registered with the given idr.  The
>> * callback function will be called for each pointer currently
>> * registered, passing the id, the pointer and the data pointer passed
>> * to this function.  It is not safe to modify the idr tree while in
>> * the callback, so functions such as idr_get_new and idr_remove are
>> * not allowed.
>> *
>> * We check the return of @fn each time. If it returns anything other
>> * than 0, we break out and return that value.
>> *
>> * The caller must serialize idr_for_each() vs idr_get_new() and idr_remove().
>> */
>>int idr_for_each(struct idr *idp,
>>		 int (*fn)(int id, void *p, void *data), void *data)
>>{
>>	int n, id, max, error = 0;
>>	struct idr_layer *p;
>>	struct idr_layer *pa[MAX_LEVEL];
>>	struct idr_layer **paa = &pa[0];
>>
>>	n = idp->layers * IDR_BITS;
>>	p = rcu_dereference(idp->top);
>>	max = 1 << n;
>>
>>	id = 0;
>>	while (id < max) {
>>		while (n > 0 && p) {
>>			n -= IDR_BITS;
>>			*paa++ = p;
>>			p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
>>		}
>>
>>		if (p) {
>>			error = fn(id, (void *)p, data);
>>			if (error)
>>				break;
>>		}
>>
>>		id += 1 << n;
>>		while (n < fls(id)) {
>>			n += IDR_BITS;
>>			p = *--paa;
>>		}
>>	}
>>
>>	return error;
>>}
>>EXPORT_SYMBOL(idr_for_each);
>>
>>/**
>> * idr_replace - replace pointer for given id
>> * @idp: idr handle
>> * @ptr: pointer you want associated with the id
>> * @id: lookup key
>> *
>> * Replace the pointer registered with an id and return the old value.
>> * A -ENOENT return indicates that @id was not found.
>> * A -EINVAL return indicates that @id was not within valid constraints.
>> *
>> * The caller must serialize with writers.
>> */
>>void *idr_replace(struct idr *idp, void *ptr, int id)
>>{
>>	int n;
>>	struct idr_layer *p, *old_p;
>>
>>	n = idp->layers * IDR_BITS;
>>	p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	id &= MAX_ID_MASK;
>>
>>	if (id >= (1 << n))
>>		return ERR_PTR(-EINVAL);
>>
>>	n -= IDR_BITS;
>>	while ((n > 0) && p) {
>>		p = p->ary[(id >> n) & IDR_MASK];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		n -= IDR_BITS;
>>	}
>>
>>	n = id & IDR_MASK;
>>	if (unlikely(p == NULL || !test_bit(n, &p->bitmap)))
>>		return ERR_PTR(-ENOENT);
>>
>>	old_p = p->ary[n];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>	rcu_assign_pointer(p->ary[n], ptr);
>>
>>	return old_p;
>>}
>>EXPORT_SYMBOL(idr_replace);
>>
>>static void idr_cache_ctor(struct kmem_cache *idr_layer_cache, void *idr_layer)
>>{
>>	memset(idr_layer, 0, sizeof(struct idr_layer));
>>}
>>
>>void __init idr_init_cache(void)
>>{
>>	idr_layer_cache = kmem_cache_create("idr_layer_cache",
>>				sizeof(struct idr_layer), 0, SLAB_PANIC,
>>				idr_cache_ctor);
>>}
>>
>>/**
>> * idr_init - initialize idr handle
>> * @idp:	idr handle
>> *
>> * This function is use to set up the handle (@idp) that you will pass
>> * to the rest of the functions.
>> */
>>void idr_init(struct idr *idp)
>>{
>>	memset(idp, 0, sizeof(struct idr));
>>	spin_lock_init(&idp->lock);
>>}
>>EXPORT_SYMBOL(idr_init);
>>
>>
>>/*
>> * IDA - IDR based ID allocator
>> *
>> * this is id allocator without id -> pointer translation.  Memory
>> * usage is much lower than full blown idr because each id only
>> * occupies a bit.  ida uses a custom leaf node which contains
>> * IDA_BITMAP_BITS slots.
>> *
>> * 2007-04-25  written by Tejun Heo <htejun@gmail.com>
>> */
>>
>>static void free_bitmap(struct ida *ida, struct ida_bitmap *bitmap)
>>{
>>	unsigned long flags;
>>
>>	if (!ida->free_bitmap) {
>>		spin_lock_irqsave(&ida->idr.lock, flags);
>>		if (!ida->free_bitmap) {
>>			ida->free_bitmap = bitmap;
>>			bitmap = NULL;
>>		}
>>		spin_unlock_irqrestore(&ida->idr.lock, flags);
>>	}
>>
>>	kfree(bitmap);
>>}
>>
>>/**
>> * ida_pre_get - reserve resources for ida allocation
>> * @ida:	ida handle
>> * @gfp_mask:	memory allocation flag
>> *
>> * This function should be called prior to locking and calling the
>> * following function.  It preallocates enough memory to satisfy the
>> * worst possible allocation.
>> *
>> * If the system is REALLY out of memory this function returns 0,
>> * otherwise 1.
>> */
>>int ida_pre_get(struct ida *ida, gfp_t gfp_mask)
>>{
>>	/* allocate idr_layers */
>>	if (!idr_pre_get(&ida->idr, gfp_mask))
>>		return 0;
>>
>>	/* allocate free_bitmap */
>>	if (!ida->free_bitmap) {
>>		struct ida_bitmap *bitmap;
>>
>>		bitmap = kmalloc(sizeof(struct ida_bitmap), gfp_mask);
>>		if (!bitmap)
>>			return 0;
>>
>>		free_bitmap(ida, bitmap);
>>	}
>>
>>	return 1;
>>}
>>EXPORT_SYMBOL(ida_pre_get);
>>
>>/**
>> * ida_get_new_above - allocate new ID above or equal to a start id
>> * @ida:	ida handle
>> * @staring_id:	id to start search at
>> * @p_id:	pointer to the allocated handle
>> *
>> * Allocate new ID above or equal to @ida.  It should be called with
>> * any required locks.
>> *
>> * If memory is required, it will return -EAGAIN, you should unlock
>> * and go back to the ida_pre_get() call.  If the ida is full, it will
>> * return -ENOSPC.
>> *
>> * @p_id returns a value in the range 0 ... 0x7fffffff.
>> */
>>int ida_get_new_above(struct ida *ida, int starting_id, int *p_id)
>>{
>>	struct idr_layer *pa[MAX_LEVEL];
>>	struct ida_bitmap *bitmap;
>>	unsigned long flags;
>>	int idr_id = starting_id / IDA_BITMAP_BITS;
>>	int offset = starting_id % IDA_BITMAP_BITS;
>>	int t, id;
>>
>> restart:
>>	/* get vacant slot */
>>	t = idr_get_empty_slot(&ida->idr, idr_id, pa);
>>	if (t < 0)
>>		return _idr_rc_to_errno(t);
>>
>>	if (t * IDA_BITMAP_BITS >= MAX_ID_BIT)
>>		return -ENOSPC;
>>
>>	if (t != idr_id)
>>		offset = 0;
>>	idr_id = t;
>>
>>	/* if bitmap isn't there, create a new one */
>>	bitmap = (void *)pa[0]->ary[idr_id & IDR_MASK];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>	if (!bitmap) {
>>		spin_lock_irqsave(&ida->idr.lock, flags);
>>		bitmap = ida->free_bitmap;
>>		ida->free_bitmap = NULL;
>>		spin_unlock_irqrestore(&ida->idr.lock, flags);
>>
>>		if (!bitmap)
>>			return -EAGAIN;
>>
>>		memset(bitmap, 0, sizeof(struct ida_bitmap));
>>		rcu_assign_pointer(pa[0]->ary[idr_id & IDR_MASK],
>>				(void *)bitmap);
>>		pa[0]->count++;
>>	}
>>
>>	/* lookup for empty slot */
>>	t = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, offset);
>>	if (t == IDA_BITMAP_BITS) {
>>		/* no empty slot after offset, continue to the next chunk */
>>		idr_id++;
>>		offset = 0;
>>		goto restart;
>>	}
>>
>>	id = idr_id * IDA_BITMAP_BITS + t;
>>	if (id >= MAX_ID_BIT)
>>		return -ENOSPC;
>>
>>	__set_bit(t, bitmap->bitmap);
>>	if (++bitmap->nr_busy == IDA_BITMAP_BITS)
>>		idr_mark_full(pa, idr_id);
>>
>>	*p_id = id;
>>
>>	/* Each leaf node can handle nearly a thousand slots and the
>>	 * whole idea of ida is to have small memory foot print.
>>	 * Throw away extra resources one by one after each successful
>>	 * allocation.
>>	 */
>>	if (ida->idr.id_free_cnt || ida->free_bitmap) {
>>		struct idr_layer *p = get_from_free_list(&ida->idr);
>>		if (p)
>>			kmem_cache_free(idr_layer_cache, p);
>>	}
>>
>>	return 0;
>>}
>>EXPORT_SYMBOL(ida_get_new_above);
>>
>>/**
>> * ida_get_new - allocate new ID
>> * @ida:	idr handle
>> * @p_id:	pointer to the allocated handle
>> *
>> * Allocate new ID.  It should be called with any required locks.
>> *
>> * If memory is required, it will return -EAGAIN, you should unlock
>> * and go back to the idr_pre_get() call.  If the idr is full, it will
>> * return -ENOSPC.
>> *
>> * @id returns a value in the range 0 ... 0x7fffffff.
>> */
>>int ida_get_new(struct ida *ida, int *p_id)
>>{
>>	return ida_get_new_above(ida, 0, p_id);
>>}
>>EXPORT_SYMBOL(ida_get_new);
>>
>>/**
>> * ida_remove - remove the given ID
>> * @ida:	ida handle
>> * @id:		ID to free
>> */
>>void ida_remove(struct ida *ida, int id)
>>{
>>	struct idr_layer *p = ida->idr.top;
>>	int shift = (ida->idr.layers - 1) * IDR_BITS;
>>	int idr_id = id / IDA_BITMAP_BITS;
>>	int offset = id % IDA_BITMAP_BITS;
>>	int n;
>>	struct ida_bitmap *bitmap;
>>
>>	/* clear full bits while looking up the leaf idr_layer */
>>	while ((shift > 0) && p) {
>>		n = (idr_id >> shift) & IDR_MASK;
>>		__clear_bit(n, &p->bitmap);
>>		p = p->ary[n];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		shift -= IDR_BITS;
>>	}
>>
>>	if (p == NULL)
>>		goto err;
>>
>>	n = idr_id & IDR_MASK;
>>	__clear_bit(n, &p->bitmap);
>>
>>	bitmap = (void *)p->ary[n];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>	if (!test_bit(offset, bitmap->bitmap))
>>		goto err;
>>
>>	/* update bitmap and remove it if empty */
>>	__clear_bit(offset, bitmap->bitmap);
>>	if (--bitmap->nr_busy == 0) {
>>		__set_bit(n, &p->bitmap);	/* to please idr_remove() */
>>		idr_remove(&ida->idr, idr_id);
>>		free_bitmap(ida, bitmap);
>>	}
>>
>>	return;
>>
>> err:
>>	printk(KERN_WARNING
>>	       "ida_remove called for id=%d which is not allocated.\n", id);
>>}
>>EXPORT_SYMBOL(ida_remove);
>>
>>/**
>> * ida_destroy - release all cached layers within an ida tree
>> * ida:		ida handle
>> */
>>void ida_destroy(struct ida *ida)
>>{
>>	idr_destroy(&ida->idr);
>>	kfree(ida->free_bitmap);
>>}
>>EXPORT_SYMBOL(ida_destroy);
>>
>>/**
>> * ida_init - initialize ida handle
>> * @ida:	ida handle
>> *
>> * This function is use to set up the handle (@ida) that you will pass
>> * to the rest of the functions.
>> */
>>void ida_init(struct ida *ida)
>>{
>>	memset(ida, 0, sizeof(struct ida));
>>	idr_init(&ida->idr);
>>
>>}
>>EXPORT_SYMBOL(ida_init);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/