2010-11-16 00:53:52

by Huang, Ying

[permalink] [raw]
Subject: [PATCH -v4 0/2] Lockless memory allocator and list

Hi, Len,

This patchset is needed by the APEI support, which involves some
lockless operations. I think they can be used by some other parts of
the kernel too.

v4:

- Split from APEI patchset
- Update patch description and comments according to ML comments

v3:

- Rework lockless memory allocator and list according to ML comments

[PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless
[PATCH -v4 2/2] lib, Add lock-less NULL terminated single list


2010-11-16 00:53:57

by Huang, Ying

[permalink] [raw]
Subject: [PATCH -v4 2/2] lib, Add lock-less NULL terminated single list

Cmpxchg is used to implement adding a new entry to the list, deleting
all entries from the list, deleting the first entry of the list and
some other operations.

Because this is a singly linked list, the tail can not be accessed in
O(1) time.

This can be used in NMI handler.

If there are multiple producers and multiple consumers, llist_add can
be used in producers and llist_del_all can be used in consumers. They
can work simultaneously without a lock. But llist_del_first can not be
used here, because llist_del_first depends on list->first->next not
changing while list->first stays unchanged during its operation; a
llist_del_first, llist_add, llist_add sequence in another consumer may
violate that.

If there are multiple producers and one consumer, llist_add can be
used in producers and llist_del_all or llist_del_first can be used in
the consumer.

The list entries deleted via llist_del_all can be traversed with
traversal functions such as llist_for_each etc. But the list entries
can not be traversed safely before being deleted from the list,
without proper synchronization with the list consumers.

Signed-off-by: Huang Ying <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
include/linux/llist.h | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++
lib/Kconfig | 3 +
lib/Makefile | 2 +
lib/llist.c | 80 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 177 insertions(+)
create mode 100644 include/linux/llist.h
create mode 100644 lib/llist.c

--- /dev/null
+++ b/include/linux/llist.h
@@ -0,0 +1,92 @@
+#ifndef LLIST_H
+#define LLIST_H
+/*
+ * Lock-less NULL terminated single linked list
+ *
+ * If there are multiple producers and multiple consumers, llist_add
+ * can be used in producers and llist_del_all can be used in
+ * consumers. They can work simultaneously without a lock. But
+ * llist_del_first can not be used here, because llist_del_first
+ * depends on list->first->next not changing while list->first stays
+ * unchanged during its operation; a llist_del_first, llist_add,
+ * llist_add sequence in another consumer may violate that.
+ *
+ * If there are multiple producers and one consumer, llist_add can be
+ * used in producers and llist_del_all or llist_del_first can be used
+ * in the consumer.
+ *
+ * The list entries deleted via llist_del_all can be traversed with
+ * traversal functions such as llist_for_each etc. But the list
+ * entries can not be traversed safely before being deleted from the
+ * list, without proper synchronization with the list consumers.
+ */
+
+struct llist_head {
+ struct llist_node *first;
+};
+
+struct llist_node {
+ struct llist_node *next;
+};
+
+#define LLIST_HEAD_INIT(name) { NULL }
+#define LLIST_HEAD(name) struct llist_head name = LLIST_HEAD_INIT(name)
+
+/**
+ * init_llist_head - initialize lock-less list head
+ * @list: the lock-less list to initialize
+ */
+static inline void init_llist_head(struct llist_head *list)
+{
+ list->first = NULL;
+}
+
+/**
+ * llist_entry - get the struct of this entry
+ * @ptr: the &struct llist_node pointer.
+ * @type: the type of the struct this is embedded in.
+ * @member: the name of the llist_node within the struct.
+ */
+#define llist_entry(ptr, type, member) \
+ container_of(ptr, type, member)
+
+/**
+ * llist_for_each - iterate over some entries of a lock-less list
+ * @pos: the &struct llist_node to use as a loop cursor
+ * @node: the first entry of deleted list entries
+ *
+ * In general, some entries of the lock-less list can be traversed
+ * safely only after being deleted from the list, so start with an
+ * entry instead of the list head.
+ */
+#define llist_for_each(pos, node) \
+ for (pos = (node); pos; pos = pos->next)
+
+/**
+ * llist_for_each_entry - iterate over some entries of lock-less list of given type
+ * @pos: the type * to use as a loop cursor.
+ * @node: the first entry of deleted list entries.
+ * @member: the name of the llist_node within the struct.
+ *
+ * In general, some entries of the lock-less list can be traversed
+ * safely only after being removed from the list, so start with an
+ * entry instead of the list head.
+ */
+#define llist_for_each_entry(pos, node, member) \
+ for (pos = llist_entry((node), typeof(*pos), member); \
+ &pos->member != NULL; \
+ pos = llist_entry(pos->member.next, typeof(*pos), member))
+
+/**
+ * llist_empty - tests whether a lock-less list is empty
+ * @head: the list to test
+ */
+static inline int llist_empty(const struct llist_head *head)
+{
+ return head->first == NULL;
+}
+
+void llist_add(struct llist_node *new, struct llist_head *head);
+struct llist_node *llist_del_first(struct llist_head *head);
+struct llist_node *llist_del_all(struct llist_head *head);
+#endif /* LLIST_H */
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -210,4 +210,7 @@ config GENERIC_ATOMIC64
config LRU_CACHE
tristate

+config LLIST
+ bool
+
endmenu
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ obj-$(CONFIG_GENERIC_ATOMIC64) += atomic

obj-$(CONFIG_ATOMIC64_SELFTEST) += atomic64_test.o

+obj-$(CONFIG_LLIST) += llist.o
+
hostprogs-y := gen_crc32table
clean-files := crc32table.h

--- /dev/null
+++ b/lib/llist.c
@@ -0,0 +1,80 @@
+/*
+ * Lock-less NULL terminated single linked list
+ *
+ * Copyright 2010 Intel Corp.
+ * Author: Huang Ying <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation;
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/llist.h>
+#include <linux/errno.h>
+
+#include <asm/system.h>
+
+/**
+ * llist_add - add a new entry
+ * @new: new entry to be added
+ * @head: the head for your lock-less list
+ */
+void llist_add(struct llist_node *new, struct llist_head *head)
+{
+ struct llist_node *entry;
+
+ do {
+ entry = head->first;
+ new->next = entry;
+ } while (cmpxchg(&head->first, entry, new) != entry);
+}
+EXPORT_SYMBOL_GPL(llist_add);
+
+/**
+ * llist_del_first - delete the first entry of lock-less list
+ * @head: the head for your lock-less list
+ *
+ * If list is empty, return NULL, otherwise, return the first entry deleted.
+ *
+ * Only one llist_del_first user can run simultaneously with multiple
+ * llist_add users without a lock, because otherwise a
+ * llist_del_first, llist_add, llist_add sequence in another user may
+ * change @head->first->next while keeping @head->first the same. If
+ * multiple consumers are needed, please use llist_del_all.
+ */
+struct llist_node *llist_del_first(struct llist_head *head)
+{
+ struct llist_node *entry;
+
+ do {
+ entry = head->first;
+ if (entry == NULL)
+ return NULL;
+ } while (cmpxchg(&head->first, entry, entry->next) != entry);
+
+ return entry;
+}
+EXPORT_SYMBOL_GPL(llist_del_first);
+
+/**
+ * llist_del_all - delete all entries from lock-less list
+ * @head: the head of lock-less list to delete all entries
+ *
+ * If list is empty, return NULL, otherwise, delete all entries and
+ * return the pointer to the first entry.
+ */
+struct llist_node *llist_del_all(struct llist_head *head)
+{
+ return xchg(&head->first, NULL);
+}
+EXPORT_SYMBOL_GPL(llist_del_all);

2010-11-16 00:53:55

by Huang, Ying

[permalink] [raw]
Subject: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

This version of the gen_pool memory allocator supports lockless
operation.

This makes it safe to use in NMI handlers and other special
unblockable contexts that could otherwise deadlock on locks. This is
implemented by using atomic operations and retries on any conflicts.
The disadvantage is that there may be livelocks in extreme cases. For
better scalability, one gen_pool allocator can be used for each CPU.

The lockless operation only works if there is enough memory
available. If new memory is added to the pool, a lock still has to be
taken. So any user relying on locklessness has to ensure that
sufficient memory is preallocated.

The basic atomic operation of this allocator is cmpxchg on long. On
architectures that don't support cmpxchg natively a fallback is used.
If the fallback uses locks it may not be safe to use it in NMI
contexts on these architectures.

Signed-off-by: Huang Ying <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
---
include/linux/bitmap.h | 1
include/linux/genalloc.h | 35 +++++--
lib/bitmap.c | 2
lib/genalloc.c | 228 ++++++++++++++++++++++++++++++++++++++---------
4 files changed, 215 insertions(+), 51 deletions(-)

--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -142,6 +142,7 @@ extern void bitmap_release_region(unsign
extern int bitmap_allocate_region(unsigned long *bitmap, int pos, int order);
extern void bitmap_copy_le(void *dst, const unsigned long *src, int nbits);

+#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))
#define BITMAP_LAST_WORD_MASK(nbits) \
( \
((nbits) % BITS_PER_LONG) ? \
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -1,19 +1,26 @@
+#ifndef GENALLOC_H
+#define GENALLOC_H
/*
- * Basic general purpose allocator for managing special purpose memory
- * not managed by the regular kmalloc/kfree interface.
- * Uses for this includes on-device special memory, uncached memory
- * etc.
+ * Basic general purpose allocator for managing special purpose
+ * memory, for example, memory not managed by the regular
+ * kmalloc/kfree interface. Uses for this include on-device special
+ * memory, uncached memory etc.
+ *
+ * The gen_pool_alloc, gen_pool_free, gen_pool_avail and gen_pool_size
+ * implementation is lockless, that is, multiple users can
+ * allocate/free memory in the pool simultaneously without a lock.
+ * This also makes it possible to use the gen_pool memory allocator to
+ * allocate/free memory in IRQ/NMI handlers.
*
* This source code is licensed under the GNU General Public License,
* Version 2. See the file COPYING for more details.
*/

-
/*
* General purpose special memory pool descriptor.
*/
struct gen_pool {
- rwlock_t lock;
+ spinlock_t lock;
struct list_head chunks; /* list of chunks in this pool */
int min_alloc_order; /* minimum allocation order */
};
@@ -22,15 +29,29 @@ struct gen_pool {
* General purpose special memory pool chunk descriptor.
*/
struct gen_pool_chunk {
- spinlock_t lock;
struct list_head next_chunk; /* next chunk in pool */
+ atomic_t avail;
unsigned long start_addr; /* starting address of memory chunk */
unsigned long end_addr; /* ending address of memory chunk */
unsigned long bits[0]; /* bitmap for allocating memory chunk */
};

+/**
+ * gen_pool_for_each_chunk - iterate over chunks of generic memory pool
+ * @chunk: the struct gen_pool_chunk * to use as a loop cursor
+ * @pool: the generic memory pool
+ *
+ * Not lockless; proper mutual exclusion is needed to use this macro
+ * with other gen_pool functions simultaneously.
+ */
+#define gen_pool_for_each_chunk(chunk, pool) \
+ list_for_each_entry(chunk, &pool->chunks, next_chunk)
+
extern struct gen_pool *gen_pool_create(int, int);
extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int);
extern void gen_pool_destroy(struct gen_pool *);
extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+extern size_t gen_pool_avail(struct gen_pool *);
+extern size_t gen_pool_size(struct gen_pool *);
+#endif /* GENALLOC_H */
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -271,8 +271,6 @@ int __bitmap_weight(const unsigned long
}
EXPORT_SYMBOL(__bitmap_weight);

-#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))
-
void bitmap_set(unsigned long *map, int start, int nr)
{
unsigned long *p = map + BIT_WORD(start);
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -1,8 +1,25 @@
/*
- * Basic general purpose allocator for managing special purpose memory
- * not managed by the regular kmalloc/kfree interface.
- * Uses for this includes on-device special memory, uncached memory
- * etc.
+ * Basic general purpose allocator for managing special purpose
+ * memory, for example, memory that is not managed by the regular
+ * kmalloc/kfree interface. Uses for this include on-device special
+ * memory, uncached memory etc.
+ *
+ * It is safe to use the allocator in NMI handlers and other special
+ * unblockable contexts that could otherwise deadlock on locks. This
+ * is implemented by using atomic operations and retries on any
+ * conflicts. The disadvantage is that there may be livelocks in
+ * extreme cases. For better scalability, one allocator can be used
+ * for each CPU.
+ *
+ * The lockless operation only works if there is enough memory
+ * available. If new memory is added to the pool a lock has to be
+ * still taken. So any user relying on locklessness has to ensure
+ * that sufficient memory is preallocated.
+ *
+ * The basic atomic operation of this allocator is cmpxchg on long.
+ * On architectures that don't support cmpxchg natively a fallback is
+ * used. If the fallback uses locks it may not be safe to use it in
+ * NMI contexts on these architectures.
*
* Copyright 2005 (C) Jes Sorensen <[email protected]>
*
@@ -13,8 +30,107 @@
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/bitmap.h>
+#include <linux/rculist.h>
#include <linux/genalloc.h>

+static inline int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
+{
+ unsigned long val, nval;
+
+ nval = *addr;
+ do {
+ val = nval;
+ if (val & mask_to_set)
+ return -EBUSY;
+ } while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
+
+ return 0;
+}
+
+static inline int clear_bits_ll(unsigned long *addr,
+ unsigned long mask_to_clear)
+{
+ unsigned long val, nval;
+
+ nval = *addr;
+ do {
+ val = nval;
+ if ((val & mask_to_clear) != mask_to_clear)
+ return -EBUSY;
+ } while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
+
+ return 0;
+}
+
+/*
+ * bitmap_set_ll - set the specified number of bits at the specified position
+ * @map: pointer to a bitmap
+ * @start: a bit position in @map
+ * @nr: number of bits to set
+ *
+ * Set @nr bits starting from @start in @map locklessly. Several
+ * users can set/clear the same bitmap simultaneously without a lock.
+ * If two users set the same bit, one user will return the number of
+ * remaining bits, otherwise 0 is returned.
+ */
+static int bitmap_set_ll(unsigned long *map, int start, int nr)
+{
+ unsigned long *p = map + BIT_WORD(start);
+ const int size = start + nr;
+ int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
+ unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
+
+ while (nr - bits_to_set >= 0) {
+ if (set_bits_ll(p, mask_to_set))
+ return nr;
+ nr -= bits_to_set;
+ bits_to_set = BITS_PER_LONG;
+ mask_to_set = ~0UL;
+ p++;
+ }
+ if (nr) {
+ mask_to_set &= BITMAP_LAST_WORD_MASK(size);
+ if (set_bits_ll(p, mask_to_set))
+ return nr;
+ }
+
+ return 0;
+}
+
+/*
+ * bitmap_clear_ll - clear the specified number of bits at the specified position
+ * @map: pointer to a bitmap
+ * @start: a bit position in @map
+ * @nr: number of bits to clear
+ *
+ * Clear @nr bits starting from @start in @map locklessly. Several
+ * users can set/clear the same bitmap simultaneously without a lock.
+ * If two users clear the same bit, one user will return the number
+ * of remaining bits, otherwise 0 is returned.
+ */
+static int bitmap_clear_ll(unsigned long *map, int start, int nr)
+{
+ unsigned long *p = map + BIT_WORD(start);
+ const int size = start + nr;
+ int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+ unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+
+ while (nr - bits_to_clear >= 0) {
+ if (clear_bits_ll(p, mask_to_clear))
+ return nr;
+ nr -= bits_to_clear;
+ bits_to_clear = BITS_PER_LONG;
+ mask_to_clear = ~0UL;
+ p++;
+ }
+ if (nr) {
+ mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+ if (clear_bits_ll(p, mask_to_clear))
+ return nr;
+ }
+
+ return 0;
+}

/**
* gen_pool_create - create a new special memory pool
@@ -30,7 +146,7 @@ struct gen_pool *gen_pool_create(int min

pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid);
if (pool != NULL) {
- rwlock_init(&pool->lock);
+ spin_lock_init(&pool->lock);
INIT_LIST_HEAD(&pool->chunks);
pool->min_alloc_order = min_alloc_order;
}
@@ -58,15 +174,15 @@ int gen_pool_add(struct gen_pool *pool,

chunk = kmalloc_node(nbytes, GFP_KERNEL | __GFP_ZERO, nid);
if (unlikely(chunk == NULL))
- return -1;
+ return -ENOMEM;

- spin_lock_init(&chunk->lock);
chunk->start_addr = addr;
chunk->end_addr = addr + size;
+ atomic_set(&chunk->avail, size);

- write_lock(&pool->lock);
- list_add(&chunk->next_chunk, &pool->chunks);
- write_unlock(&pool->lock);
+ spin_lock(&pool->lock);
+ list_add_rcu(&chunk->next_chunk, &pool->chunks);
+ spin_unlock(&pool->lock);

return 0;
}
@@ -112,39 +228,39 @@ EXPORT_SYMBOL(gen_pool_destroy);
*/
unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
{
- struct list_head *_chunk;
struct gen_pool_chunk *chunk;
- unsigned long addr, flags;
+ unsigned long addr;
int order = pool->min_alloc_order;
- int nbits, start_bit, end_bit;
+ int nbits, start_bit = 0, end_bit, remain;

if (size == 0)
return 0;

nbits = (size + (1UL << order) - 1) >> order;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
+ if (size > atomic_read(&chunk->avail))
+ continue;

end_bit = (chunk->end_addr - chunk->start_addr) >> order;
-
- spin_lock_irqsave(&chunk->lock, flags);
- start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0,
- nbits, 0);
- if (start_bit >= end_bit) {
- spin_unlock_irqrestore(&chunk->lock, flags);
+retry:
+ start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit,
+ start_bit, nbits, 0);
+ if (start_bit >= end_bit)
continue;
+ remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
+ if (remain) {
+ remain = bitmap_clear_ll(chunk->bits, start_bit,
+ nbits - remain);
+ BUG_ON(remain);
+ goto retry;
}

addr = chunk->start_addr + ((unsigned long)start_bit << order);
-
- bitmap_set(chunk->bits, start_bit, nbits);
- spin_unlock_irqrestore(&chunk->lock, flags);
- read_unlock(&pool->lock);
+ size = nbits << order;
+ atomic_sub(size, &chunk->avail);
return addr;
}
- read_unlock(&pool->lock);
return 0;
}
EXPORT_SYMBOL(gen_pool_alloc);
@@ -159,29 +275,57 @@ EXPORT_SYMBOL(gen_pool_alloc);
*/
void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
{
- struct list_head *_chunk;
struct gen_pool_chunk *chunk;
- unsigned long flags;
int order = pool->min_alloc_order;
- int bit, nbits;
+ int start_bit, nbits, remain;

nbits = (size + (1UL << order) - 1) >> order;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
-
+ list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
if (addr >= chunk->start_addr && addr < chunk->end_addr) {
BUG_ON(addr + size > chunk->end_addr);
- spin_lock_irqsave(&chunk->lock, flags);
- bit = (addr - chunk->start_addr) >> order;
- while (nbits--)
- __clear_bit(bit++, chunk->bits);
- spin_unlock_irqrestore(&chunk->lock, flags);
- break;
+ start_bit = (addr - chunk->start_addr) >> order;
+ remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
+ BUG_ON(remain);
+ size = nbits << order;
+ atomic_add(size, &chunk->avail);
+ return;
}
}
- BUG_ON(nbits > 0);
- read_unlock(&pool->lock);
+ BUG();
}
EXPORT_SYMBOL(gen_pool_free);
+
+/**
+ * gen_pool_avail - get available free space of the pool
+ * @pool: pool to get available free space
+ *
+ * Return available free space of the specified pool.
+ */
+size_t gen_pool_avail(struct gen_pool *pool)
+{
+ struct gen_pool_chunk *chunk;
+ size_t avail = 0;
+
+ list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk)
+ avail += atomic_read(&chunk->avail);
+ return avail;
+}
+EXPORT_SYMBOL_GPL(gen_pool_avail);
+
+/**
+ * gen_pool_size - get size in bytes of memory managed by the pool
+ * @pool: pool to get size
+ *
+ * Return size in bytes of memory managed by the pool.
+ */
+size_t gen_pool_size(struct gen_pool *pool)
+{
+ struct gen_pool_chunk *chunk;
+ size_t size = 0;
+
+ list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk)
+ size += chunk->end_addr - chunk->start_addr;
+ return size;
+}
+EXPORT_SYMBOL_GPL(gen_pool_size);

2010-11-16 11:49:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH -v4 0/2] Lockless memory allocator and list

On Tue, 2010-11-16 at 08:53 +0800, Huang Ying wrote:
> Hi, Len,
>
> This patchset is needed by APEI support. Which involves some lockless
> operations. And I think they can be used by some other part of kernel
> too.

The whole notion of allocating memory from NMI context seems incredibly
offensive.

And then there's the whole point of not merging stuff without a user.

2010-11-16 11:50:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH -v4 2/2] lib, Add lock-less NULL terminated single list

On Tue, 2010-11-16 at 08:53 +0800, Huang Ying wrote:
> +/**
> + * llist_del_all - delete all entries from lock-less list
> + * @head: the head of lock-less list to delete all entries
> + *
> + * If list is empty, return NULL, otherwise, delete all entries and
> + * return the pointer to the first entry.
> + */
> +struct llist_node *llist_del_all(struct llist_head *head)
> +{
> + return xchg(&head->first, NULL);
> +}
> +EXPORT_SYMBOL_GPL(llist_del_all);

It's not del_all, since it returns the actual list. move or splice might
be better names.

2010-11-16 16:34:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH -v4 2/2] lib, Add lock-less NULL terminated single list

On Tue, Nov 16, 2010 at 3:50 AM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2010-11-16 at 08:53 +0800, Huang Ying wrote:
>> +/**
>> + * llist_del_all - delete all entries from lock-less list
>> + * @head: the head of lock-less list to delete all entries
>> + *
>> + * If list is empty, return NULL, otherwise, delete all entries and
>> + * return the pointer to the first entry.
>> + */
>> +struct llist_node *llist_del_all(struct llist_head *head)
>> +{
>> +	return xchg(&head->first, NULL);
>> +}
>> +EXPORT_SYMBOL_GPL(llist_del_all);
>
> Its not del_all, since it returns the actual list. move or splice might
> be better names.

I don't think it's a "move", since the result isn't actually a
list_head (it's by definition a "we remove all entries from the list,
and then we can look at the list entries in private afterwards").

So the "delete" part _is_ important - before the op, we had an atomic
list. After the op, we just have a bunch of entries (they may be a
list, but they aren't a _lock-less_ list any more: they are literally
just private entries now).

So both "move" and "splice" imply at least to me that the target is
equivalent to the source. And it really isn't. The target is something
totally different. In fact, my original objection to this code was
that there wasn't a good separation between "entries" and "lock-less
head of list", and it used the same type for both - the way the
regular list functions do. In the regular list implementation, each
list entry really is a list head too, and you can have lists that
don't even _have_ separate heads, but everything is a list entry
(common in things like the dentry "siblings" list, where each dentry
can be used as the "head" to that sibling list - all siblings are
equal).

Now, I don't know if "del_all" is necessarily the proper name, but I
do think that it's important to make it clear that as far as the
lock-lessness of the list is concerned, the operation really is about
removing all entries, and it's not some kind of symmetric operation
wrt src/dest like with normal lists.

Linus

2010-11-16 16:39:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH -v4 0/2] Lockless memory allocator and list

On Tue, Nov 16, 2010 at 3:49 AM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2010-11-16 at 08:53 +0800, Huang Ying wrote:
>> Hi, Len,
>>
>> This patchset is needed by APEI support. Which involves some lockless
>> operations. And I think they can be used by some other part of kernel
>> too.
>
> The whole notion of allocating memory from NMI context seems incredibly
> offensive.
>
> And then there's the whole point of not merging stuff without a user.

So I do agree that people should look very hard at trying to use
existing infrastructure.

I kind of like the lock-less list implementation (it could easily be
useful for random things, and it's very simple). And I don't think the
notion of a lockless memory allocator is wrong either, although it
looks a lot more specialized than the list thing (the solution to
lockless allocations is generally simply to do them ahead of time).

So the part I'm really not all that comfy with is the whole APEI side
of things. I'm not at all convinced that we want yet another random
hw-specific interface, and I really have yet to hear why it's so
magical.

Linus

2010-11-16 18:04:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH -v4 0/2] Lockless memory allocator and list

On Tue, 2010-11-16 at 08:38 -0800, Linus Torvalds wrote:
>
> I kind of like the lock-less list implementation (it could easily be
> useful for random things, and it's very simple).

Yes, there's various implementations floating around, and we already
have one in-kernel ( net/rds/xlist.h ), mason and axboe and me have been
kicking around various patches using that thing in other circumstances
as well.

[ At some point we had perf -- what now is kernel/irq_work.c -- using
it as well, but the new code grew too complex due to requirements
from Huang ]

> And I don't think the
> notion of a lockless memory allocator is wrong either, although it
> looks a lot more specialized than the list thing (the solution to
> lockless allocations is generally simply to do them ahead of time).
>
Right, I don't generally object to lockless things, but they either need
to be 1) faster than the existing code, and/or 2) have a very convincing
use-case (other than performance) for their added complexity.

Afaict the proposed patch adds lots more LOCK'ed instructions into that
allocator path than it removes, i.e. it's a slowdown for existing users.

2010-11-16 21:51:55

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Tue, 16 Nov 2010 08:53:10 +0800
Huang Ying <[email protected]> wrote:

> This version of the gen_pool memory allocator supports lockless
> operation.
>
> This makes it safe to use in NMI handlers and other special
> unblockable contexts that could otherwise deadlock on locks. This is
> implemented by using atomic operations and retries on any conflicts.
> The disadvantage is that there may be livelocks in extreme cases. For
> better scalability, one gen_pool allocator can be used for each CPU.
>
> The lockless operation only works if there is enough memory available.
> If new memory is added to the pool a lock has to be still taken. So
> any user relying on locklessness has to ensure that sufficient memory
> is preallocated.
>
> The basic atomic operation of this allocator is cmpxchg on long. On
> architectures that don't support cmpxchg natively a fallback is used.
> If the fallback uses locks it may not be safe to use it in NMI
> contexts on these architectures.

The code assumes that cmpxchg is atomic wrt NMI. That would be news to
me - at present an architecture can legitimately implement cmpxchg()
with, say, spin_lock_irqsave() on a hashed spinlock. I don't know
whether any architectures _do_ do anything like that. If so then
that's a problem. If not, it's an additional requirement on future
architecture ports.

> Signed-off-by: Huang Ying <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> ---
> include/linux/bitmap.h | 1
> include/linux/genalloc.h | 35 +++++--
> lib/bitmap.c | 2
> lib/genalloc.c | 228 ++++++++++++++++++++++++++++++++++++++---------
> 4 files changed, 215 insertions(+), 51 deletions(-)
>
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -142,6 +142,7 @@ extern void bitmap_release_region(unsign
> extern int bitmap_allocate_region(unsigned long *bitmap, int pos, int order);
> extern void bitmap_copy_le(void *dst, const unsigned long *src, int nbits);
>
> +#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))
> #define BITMAP_LAST_WORD_MASK(nbits) \
> ( \
> ((nbits) % BITS_PER_LONG) ? \
> --- a/include/linux/genalloc.h
> +++ b/include/linux/genalloc.h
> @@ -1,19 +1,26 @@
> +#ifndef GENALLOC_H
> +#define GENALLOC_H
> /*
> - * Basic general purpose allocator for managing special purpose memory
> - * not managed by the regular kmalloc/kfree interface.
> - * Uses for this includes on-device special memory, uncached memory
> - * etc.
> + * Basic general purpose allocator for managing special purpose
> + * memory, for example, memory not managed by the regular
> + * kmalloc/kfree interface. Uses for this includes on-device special
> + * memory, uncached memory etc.
> + *
> + * The gen_pool_alloc, gen_pool_free, gen_pool_avail and gen_pool_size
> + * implementation is lockless, that is, multiple users can
> + * allocate/free memory in the pool simultaneously without lock. This
> + * also makes the gen_pool memory allocator can be used to

That sentence needs a fixup.

>
> +static inline int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
> +{
> + unsigned long val, nval;
> +
> + nval = *addr;
> + do {
> + val = nval;
> + if (val & mask_to_set)
> + return -EBUSY;
> + } while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
> +
> + return 0;
> +}
> +
> +static inline int clear_bits_ll(unsigned long *addr,
> + unsigned long mask_to_clear)
> +{
> + unsigned long val, nval;
> +
> + nval = *addr;
> + do {
> + val = nval;
> + if ((val & mask_to_clear) != mask_to_clear)
> + return -EBUSY;
> + } while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
> +
> + return 0;
> +}

These are waaaay too big to be inlined. Let the compiler decide.

2010-11-17 01:03:11

by Huang, Ying

[permalink] [raw]
Subject: Re: [PATCH -v4 0/2] Lockless memory allocator and list

On Wed, 2010-11-17 at 00:38 +0800, Linus Torvalds wrote:
> On Tue, Nov 16, 2010 at 3:49 AM, Peter Zijlstra <[email protected]> wrote:
> > On Tue, 2010-11-16 at 08:53 +0800, Huang Ying wrote:
> >> Hi, Len,
> >>
> >> This patchset is needed by APEI support. Which involves some lockless
> >> operations. And I think they can be used by some other part of kernel
> >> too.
> >
> > The whole notion of allocating memory from NMI context seems incredibly
> > offensive.
> >
> > And then there's the whole point of not merging stuff without a user.
>
> So I do agree that people should look very hard at trying to use
> existing infrastructure.
>
> I kind of like the lock-less list implementation (it could easily be
> useful for random things, and it's very simple). And I don't think the
> notion of a lockless memory allocator is wrong either, although it
> looks a lot more specialized than the list thing (the solution to
> lockless allocations is generally simply to do them ahead of time).

Yes, that is the general solution. But a lockless memory allocator has
some advantages in certain situations. For example, the memory pool
backing the lockless allocator can be enlarged or shrunk (not
implemented yet) as needed, which is hard to do with pre-allocation.

> So the part I'm really not all that comfy with is the whole APEI side
> of things. I'm not at all convinced that we want yet another random
> hw-specific interface, and I really have yet to hear why it's so
> magical.

Yes, a random hardware-specific interface is not good. In fact, the
original patchset includes a generic hardware error reporting
infrastructure that can be used by Machine Check, EDAC, PCIe AER, APEI,
etc.:

http://lkml.org/lkml/2010/10/27/23
http://lkml.org/lkml/2010/10/27/18

But it seems that nobody wanted to take a look at it. I will re-post the
generic hardware error reporting infrastructure patchset separately with
a better description. Hopefully somebody will take a look at it this
time.

Best Regards,
Huang Ying

2010-11-17 01:45:35

by Huang, Ying

Subject: Re: [PATCH -v4 0/2] Lockless memory allocator and list

On Wed, 2010-11-17 at 02:04 +0800, Peter Zijlstra wrote:
> On Tue, 2010-11-16 at 08:38 -0800, Linus Torvalds wrote:
> >
> > I kind of like the lock-less list implementation (it could easily be
> > useful for random things, and it's very simple).
>
> Yes, there's various implementations floating around, and we already
> have one in-kernel ( net/rds/xlist.h ), mason and axboe and me have been
> kicking around various patches using that thing in other circumstances
> as well.
>
> [ At some point we had perf -- what now is kernel/irq_work.c -- using
> it as well, but the new code grew too complex due to requirements
> from Huang ]

I think it should be possible for them to use the general lockless list
implementation in this patch, which should reduce some code duplication
and complexity. Do you agree?

> > And I don't think the
> > notion of a lockless memory allocator is wrong either, although it
> > looks a lot more specialized than the list thing (the solution to
> > lockless allocations is generally simply to do them ahead of time).
> >
> Right, I don't generally object to lockless things, but they either need
> to be 1) faster than the existing code, and/or 2) have a very convincing
> use-case (other than performance) for their added complexity.

I will post a generic hardware error reporting mechanism patchset soon;
the lock-less memory allocator is used there. I think we may be able to
use it in the lockdep code too, which needs to allocate something
locklessly if my understanding is correct.

> Afaict the proposed patch adds lots more LOCK'ed instructions into that
> allocator path than it removes, ie its a slow down for existing users.

Let's take a look at gen_pool_alloc.

The locks removed:

- one rwlock: pool->lock
- one spinlock for each chunk: chunk->lock

The LOCK'ed instructions added:

- one or two cmpxchg in most cases. Under heavy contention between
  users there will be more, so I suggest using one gen_pool per CPU
  in heavy-contention situations.

BTW: the original gen_pool was designed to manage special-purpose
memory in some drivers, so I don't think performance is a big issue for
it.

Best Regards,
Huang Ying

2010-11-17 02:18:07

by Huang, Ying

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 2010-11-17 at 05:50 +0800, Andrew Morton wrote:
> On Tue, 16 Nov 2010 08:53:10 +0800
> Huang Ying <[email protected]> wrote:
>
> > This version of the gen_pool memory allocator supports lockless
> > operation.
> >
> > This makes it safe to use in NMI handlers and other special
> > unblockable contexts that could otherwise deadlock on locks. This is
> > implemented by using atomic operations and retries on any conflicts.
> > The disadvantage is that there may be livelocks in extreme cases. For
> > better scalability, one gen_pool allocator can be used for each CPU.
> >
> > The lockless operation only works if there is enough memory available.
> > If new memory is added to the pool a lock has to be still taken. So
> > any user relying on locklessness has to ensure that sufficient memory
> > is preallocated.
> >
> > The basic atomic operation of this allocator is cmpxchg on long. On
> > architectures that don't support cmpxchg natively a fallback is used.
> > If the fallback uses locks it may not be safe to use it in NMI
> > contexts on these architectures.
>
> The code assumes that cmpxchg is atomic wrt NMI. That would be news to
> me - at present an architecture can legitimately implement cmpxchg()
> with, say, spin_lock_irqsave() on a hashed spinlock. I don't know
> whether any architectures _do_ do anything like that. If so then
> that's a problem. If not, it's an additional requirement on future
> architecture ports.

cmpxchg has been used in that way by ftrace and perf for a long time. So
I agree to make it a requirement on future architecture ports.

> > Signed-off-by: Huang Ying <[email protected]>
> > Reviewed-by: Andi Kleen <[email protected]>
> > ---
> > include/linux/bitmap.h | 1
> > include/linux/genalloc.h | 35 +++++--
> > lib/bitmap.c | 2
> > lib/genalloc.c | 228 ++++++++++++++++++++++++++++++++++++++---------
> > 4 files changed, 215 insertions(+), 51 deletions(-)
> >
> > --- a/include/linux/bitmap.h
> > +++ b/include/linux/bitmap.h
> > @@ -142,6 +142,7 @@ extern void bitmap_release_region(unsign
> > extern int bitmap_allocate_region(unsigned long *bitmap, int pos, int order);
> > extern void bitmap_copy_le(void *dst, const unsigned long *src, int nbits);
> >
> > +#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))
> > #define BITMAP_LAST_WORD_MASK(nbits) \
> > ( \
> > ((nbits) % BITS_PER_LONG) ? \
> > --- a/include/linux/genalloc.h
> > +++ b/include/linux/genalloc.h
> > @@ -1,19 +1,26 @@
> > +#ifndef GENALLOC_H
> > +#define GENALLOC_H
> > /*
> > - * Basic general purpose allocator for managing special purpose memory
> > - * not managed by the regular kmalloc/kfree interface.
> > - * Uses for this includes on-device special memory, uncached memory
> > - * etc.
> > + * Basic general purpose allocator for managing special purpose
> > + * memory, for example, memory not managed by the regular
> > + * kmalloc/kfree interface. Uses for this includes on-device special
> > + * memory, uncached memory etc.
> > + *
> > + * The gen_pool_alloc, gen_pool_free, gen_pool_avail and gen_pool_size
> > + * implementation is lockless, that is, multiple users can
> > + * allocate/free memory in the pool simultaneously without lock. This
> > + * also makes the gen_pool memory allocator can be used to
>
> That sentence needs a fixup.

Yes. I will fix it.

> > +static inline int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
> > +{
> > + unsigned long val, nval;
> > +
> > + nval = *addr;
> > + do {
> > + val = nval;
> > + if (val & mask_to_set)
> > + return -EBUSY;
> > + } while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
> > +
> > + return 0;
> > +}
> > +
> > +static inline int clear_bits_ll(unsigned long *addr,
> > + unsigned long mask_to_clear)
> > +{
> > + unsigned long val, nval;
> > +
> > + nval = *addr;
> > + do {
> > + val = nval;
> > + if ((val & mask_to_clear) != mask_to_clear)
> > + return -EBUSY;
> > + } while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
> > +
> > + return 0;
> > +}
>
> These are waaaay too big to be inlined. Let the compiler decide.

Yes. Will change it.

Best Regards,
Huang Ying

2010-11-17 02:39:27

by Andrew Morton

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 17 Nov 2010 10:18:01 +0800 Huang Ying <[email protected]> wrote:

> On Wed, 2010-11-17 at 05:50 +0800, Andrew Morton wrote:
> > On Tue, 16 Nov 2010 08:53:10 +0800
> > Huang Ying <[email protected]> wrote:
> >
> > > This version of the gen_pool memory allocator supports lockless
> > > operation.
> > >
> > > This makes it safe to use in NMI handlers and other special
> > > unblockable contexts that could otherwise deadlock on locks. This is
> > > implemented by using atomic operations and retries on any conflicts.
> > > The disadvantage is that there may be livelocks in extreme cases. For
> > > better scalability, one gen_pool allocator can be used for each CPU.
> > >
> > > The lockless operation only works if there is enough memory available.
> > > If new memory is added to the pool a lock has to be still taken. So
> > > any user relying on locklessness has to ensure that sufficient memory
> > > is preallocated.
> > >
> > > The basic atomic operation of this allocator is cmpxchg on long. On
> > > architectures that don't support cmpxchg natively a fallback is used.
> > > If the fallback uses locks it may not be safe to use it in NMI
> > > contexts on these architectures.
> >
> > The code assumes that cmpxchg is atomic wrt NMI. That would be news to
> > me - at present an architecture can legitimately implement cmpxchg()
> > with, say, spin_lock_irqsave() on a hashed spinlock. I don't know
> > whether any architectures _do_ do anything like that. If so then
> > that's a problem. If not, it's an additional requirement on future
> > architecture ports.
>
> cmpxchg has been used in that way by ftrace and perf for a long time. So
> I agree to make it a requirement on future architecture ports.

All I was really doing was inviting you to check your assumptions for
the known architecture ports. Seems that I must do it myself.


dude, take a look at include/asm-generic/cmpxchg-local.h. Not NMI-safe!

arch/arm/include/asm/atomic.h's atomic_cmpxchg() isn't NMI-safe.

arch/arm/include/asm/system.h uses include/asm-generic/cmpxchg-local.h.

as does avr32

and blackfin

Now go take a look at cris.

h8300 atomic_cmpxchg() isn't NMI-safe.

m32r isn't NMI-safe

go look at m68k, see if you can work it out.

microblaze? Dunno.

mn10300 uniprocessor isn't NMI-safe

score isn't NMI-safe

I stopped looking there.

2010-11-17 03:03:30

by Huang, Ying

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 2010-11-17 at 10:35 +0800, Andrew Morton wrote:
> On Wed, 17 Nov 2010 10:18:01 +0800 Huang Ying <[email protected]> wrote:
>
> > On Wed, 2010-11-17 at 05:50 +0800, Andrew Morton wrote:
> > > On Tue, 16 Nov 2010 08:53:10 +0800
> > > Huang Ying <[email protected]> wrote:
> > >
> > > > This version of the gen_pool memory allocator supports lockless
> > > > operation.
> > > >
> > > > This makes it safe to use in NMI handlers and other special
> > > > unblockable contexts that could otherwise deadlock on locks. This is
> > > > implemented by using atomic operations and retries on any conflicts.
> > > > The disadvantage is that there may be livelocks in extreme cases. For
> > > > better scalability, one gen_pool allocator can be used for each CPU.
> > > >
> > > > The lockless operation only works if there is enough memory available.
> > > > If new memory is added to the pool a lock has to be still taken. So
> > > > any user relying on locklessness has to ensure that sufficient memory
> > > > is preallocated.
> > > >
> > > > The basic atomic operation of this allocator is cmpxchg on long. On
> > > > architectures that don't support cmpxchg natively a fallback is used.
> > > > If the fallback uses locks it may not be safe to use it in NMI
> > > > contexts on these architectures.
> > >
> > > The code assumes that cmpxchg is atomic wrt NMI. That would be news to
> > > me - at present an architecture can legitimately implement cmpxchg()
> > > with, say, spin_lock_irqsave() on a hashed spinlock. I don't know
> > > whether any architectures _do_ do anything like that. If so then
> > > that's a problem. If not, it's an additional requirement on future
> > > architecture ports.
> >
> > cmpxchg has been used in that way by ftrace and perf for a long time. So
> > I agree to make it a requirement on future architecture ports.
>
> All I was really doing was inviting you to check your assumptions for
> the known architecture ports. Seems that I must do it myself.

Sorry. I should have done that by myself.

> dude, take a look at include/asm-generic/cmpxchg-local.h. Not NMI-safe!
>
> arch/arm/include/asm/atomic.h's atomic_cmpxchg() isn't NMI-safe.
>
> arch/arm/include/asm/system.h uses include/asm-generic/cmpxchg-local.h.
>
> as does avr32
>
> and blackfin
>
> Now go take a look at cris.
>
> h8300 atomic_cmpxchg() isn't NMI-safe.
>
> m32r isn't NMI-safe
>
> go look at m68k, see if you can work it out.
>
> microblaze? Dunno.
>
> mn10300 uniprocessor isn't NMI-safe
>
> score isn't NMI-safe
>
> I stopped looking there.

I have talked about the NMI-safety of cmpxchg with Steven Rostedt
before, in the following thread:

http://lkml.org/lkml/2009/6/10/518

It seems that Steven thinks many architectures without an NMI-safe
cmpxchg have no real NMI either.

The patch description and comments say that on architectures without an
NMI-safe cmpxchg, gen_pool cannot be used safely in NMI handlers.

Or do you think it would be better to use a spin_trylock-based fallback
when an NMI-safe cmpxchg is not available? Or to require that cmpxchg
implementations use spin_trylock instead of spin_lock?

Best Regards,
Huang Ying

2010-11-17 04:01:24

by Andrew Morton

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 17 Nov 2010 11:03:25 +0800 Huang Ying <[email protected]> wrote:

> It seems that Steven thinks many architectures without NMI-safe cmpxchg
> have no real NMI too.

Could be.

Really, we should nail this down and work out the matrix of
what-works-with-what, and then arrange for the things which _won't_
work to be either non-Kconfigurable or non-compilable.

The worst thing we could do would be to advertise some code as
"nmi-safe!!", then to have someone go and use it on that basis, only
for their users to later discover ghastly rare races.

> In the patch description and comments, it is said that on architectures
> without NMI-safe cmpxchg, gen_pool can not be used in NMI handler
> safely.
>
> Or do you think it is better to use a spin_trylock based fallback if
> NMI-safe cmpxchg is not available? Or require cmpxchg implementation
> uses spin_trylock instead of spin_lock?

As a first step, a typical thing to do would be to create
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, define that in the appropriate
architectures, make ftrace and perf and genpool and anything else
dependent upon that at Kconfig-time.

A spin_trylock_irqsave() implementation would do what? Rarely fail the
memory allocation attempt if the trylock failed? I guess that's
acceptable in the context of gen_pool, because memory allocators are
expected to fail, and everyone carefully tests the allocation-failed
error paths (lol).

But rare failures may not be useful within the context of future
clients of the "lockless" list implementation so I'd say that a safer
approach would be to make the list implementation require
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG and be done with it.
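Concretely, the Kconfig gating proposed above might look something like this. This is only a sketch: the ARCH_HAVE_NMI_SAFE_CMPXCHG name follows the suggestion in this mail, and LLIST is an assumed name for the lock-less list library symbol:

```
config ARCH_HAVE_NMI_SAFE_CMPXCHG
	bool

config LLIST
	bool
	depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
	help
	  Lock-less NULL terminated single list, safe for use from NMI
	  context only because the architecture guarantees an NMI-safe
	  cmpxchg.
```

Architectures providing an NMI-safe cmpxchg would then select ARCH_HAVE_NMI_SAFE_CMPXCHG from their own Kconfig, and ftrace, perf, genpool, etc. could depend on it.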

So that's all pretty simple so far. However...

The list implementation is still useful to
non-CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG clients, as long as they aren't
using it from NMI context, yes?

In which case I suppose we could add a library of lockless_list_foo()
functions which are usable from non-NMI contexts and which are
available to all configs. And on top of that implement a library of
nmi_safe_lockless_list_foo() functions which are just wrappers around
lockless_list_foo(), but which are only available if
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y.

Which is getting to be a bit of a pain, so you might choose to
disregard everything after "However..." ;)

2010-11-17 06:05:20

by Huang, Ying

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 2010-11-17 at 11:57 +0800, Andrew Morton wrote:
> On Wed, 17 Nov 2010 11:03:25 +0800 Huang Ying <[email protected]> wrote:
>
> > It seems that Steven thinks many architectures without NMI-safe cmpxchg
> > have no real NMI too.
>
> Could be.
>
> Really, we should nail this down and work out the matrix of
> what-works-with-what, and then arrange for the things which _won't_
> work to be either non-Kconfigurable or non-compilable.

Yes. I will try to work out a draft matrix for review first.

> The worst thing we could do would be to advertise some code as
> "nmi-safe!!", then to have someone go and use it on that basis, only
> for their users to later discover ghastly rare races.

Yes.

> > In the patch description and comments, it is said that on architectures
> > without NMI-safe cmpxchg, gen_pool can not be used in NMI handler
> > safely.
> >
> > Or do you think it is better to use a spin_trylock based fallback if
> > NMI-safe cmpxchg is not available? Or require cmpxchg implementation
> > uses spin_trylock instead of spin_lock?
>
> As a first step, a typical thing to do would be to create
> CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, define that in the appropriate
> architectures, make ftrace and perf and genpool and anything else
> dependent upon that at Kconfig-time.

Yes. At least the lockless list should depend on that. And we need to
select between cmpxchg and spin_trylock_irqsave based on it.

Hi, Peter,

Do you think irq_work should depend on that? Or should we just
reimplement irq_work based on the lockless list and make irq_work
depend on the lockless list?

> A spin_trylock_irqsave() implementation would do what? Rarely fail the
> memory allocation attempt if the trylock failed? I guess that's
> acceptable in the context of gen_pool, because memory allocators are
> expected to fail, and everyone carefully tests the allocation-failed
> error paths (lol).

Yes. I will implement the fallback for gen_pool.

> But rare failures may not be useful within the context of future
> clients of the "lockless" list implementation so I'd say that a safer
> approach would be to make the list implementation require
> CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG and be done with it.

Yes.

> So that's all pretty simple so far. However...
>
> The list implementation is still useful to
> non-CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG clients, as long as they aren't
> using it from NMI context, yes?
>
> In which case I suppose we could add a library of lockless_list_foo()
> functions which are usable from non-NMI contexts and which are
> available to all configs. And on top of that implement a library of
> nmi_safe_lockless_list_foo() functions which are just wrappers around
> lockless_list_foo(), but which are only available if
> CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y.
>
> Which is getting to be a bit of a pain, so you might choose to
> disregard everything after "However..." ;)

At least as a first step, I prefer to just make the lockless list
depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.

Best Regards,
Huang Ying

2010-11-17 10:41:02

by Peter Zijlstra

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 2010-11-17 at 10:18 +0800, Huang Ying wrote:
>
> cmpxchg has been used in that way by ftrace and perf for a long time. So
> I agree to make it a requirement on future architecture ports.

Neither mandates that an architecture do this, though; only that when an
architecture wants to support either feature and has NMIs (not all archs
have NMI equivalents), cmpxchg has to be safe.



2010-11-17 10:49:47

by Peter Zijlstra

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 2010-11-17 at 14:05 +0800, Huang Ying wrote:
> Hi, Peter,
>
> Do you think irq_work should depend on that? Or should we just
> reimplement irq_work based on the lockless list and make irq_work
> depend on the lockless list?

If you can make it use a lockless list thing that's fine.

I'm not sure Andrew's CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG makes much sense
for CONFIG_ARCH_HAVE_NMI=n though..

Anyway, make sure to consolidate net/rds/xlist.h and its users, having
two lockless lists is one too many.

> At least as the first step, I prefer to just make lockless list depend
> on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.

But only if the platform has NMIs, otherwise it's moot.

2010-11-17 11:16:07

by huang ying

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, Nov 17, 2010 at 6:49 PM, Peter Zijlstra <[email protected]> wrote:
> On Wed, 2010-11-17 at 14:05 +0800, Huang Ying wrote:
>> Hi, Peter,
>>
>> Do you think irq_work should depend on that? Or should we just
>> reimplement irq_work based on the lockless list and make irq_work
>> depend on the lockless list?
>
> If you can make it use a lockless list thing that's fine.

I will try to do that.

> I'm not sure Andrew's CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG makes much sense
> for CONFIG_ARCH_HAVE_NMI=n though..

Sorry, I cannot find ARCH_HAVE_NMI in any Kconfig; can you point me to
it? I think cmpxchg can be used safely in lock-less code on
architectures without NMIs.

> Anyway, make sure to consolidate net/rds/xlist.h and its users, having
> two lockless lists is one too many.

Sure.

>> At least as the first step, I prefer to just make lockless list depend
>> on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
>
> But only if the platform has NMIs, otherwise it's moot.

Yes.

Best Regards,
Huang Ying

2010-11-17 11:38:59

by Peter Zijlstra

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 2010-11-17 at 19:16 +0800, huang ying wrote:
> > I'm not sure Andrew's CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG makes much sense
> > for CONFIG_ARCH_HAVE_NMI=n though..
>
> Sorry, I do not find ARCH_HAVE_NMI in any Kconfig, can you help me to
> point out? I think cmpxchg can be used safely in lock-less code on
> architectures without NMI.

It doesn't exist, but what I'm saying is that HAVE_NMI_SAFE_CMPXCHG is
pointless for architectures that don't actually have an NMI, although I
guess you can argue that since such an architecture doesn't have one
it's safe by default; still slightly confusing.

Some architectures don't currently implement NMI-like functionality
(ARM comes to mind), but they could, using interrupt priorities (as
SPARC64 does, for example).

2010-11-17 11:47:12

by huang ying

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, Nov 17, 2010 at 6:40 PM, Peter Zijlstra <[email protected]> wrote:
> On Wed, 2010-11-17 at 10:18 +0800, Huang Ying wrote:
>>
>> cmpxchg has been used in that way by ftrace and perf for a long time. So
>> I agree to make it a requirement on future architecture ports.
>
> Neither mandate an architecture do this though, only that when an
> architecture wants to support either feature and has NMIs (not all archs
> have NMI equivalents) it has to be safe.

So we can assume that cmpxchg is safe to use in lock-less code on
architectures with perf, irq_work or ftrace enabled?

Best Regards,
Huang Ying

2010-11-17 11:53:30

by Peter Zijlstra

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 2010-11-17 at 19:47 +0800, huang ying wrote:
> On Wed, Nov 17, 2010 at 6:40 PM, Peter Zijlstra <[email protected]> wrote:
> > On Wed, 2010-11-17 at 10:18 +0800, Huang Ying wrote:
> >>
> >> cmpxchg has been used in that way by ftrace and perf for a long time. So
> >> I agree to make it a requirement on future architecture ports.
> >
> > Neither mandate an architecture do this though, only that when an
> > architecture wants to support either feature and has NMIs (not all archs
> > have NMI equivalents) it has to be safe.
>
> So we can make sure cmpxchg can be used in lock-less code on
> architectures with perf, irq_work or ftrace enabled?

It had better, otherwise stuff is broken.

2010-11-18 01:14:52

by Huang, Ying

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Wed, 2010-11-17 at 19:53 +0800, Peter Zijlstra wrote:
> On Wed, 2010-11-17 at 19:47 +0800, huang ying wrote:
> > On Wed, Nov 17, 2010 at 6:40 PM, Peter Zijlstra <[email protected]> wrote:
> > > On Wed, 2010-11-17 at 10:18 +0800, Huang Ying wrote:
> > >>
> > >> cmpxchg has been used in that way by ftrace and perf for a long time. So
> > >> I agree to make it a requirement on future architecture ports.
> > >
> > > Neither mandate an architecture do this though, only that when an
> > > architecture wants to support either feature and has NMIs (not all archs
> > > have NMI equivalents) it has to be safe.
> >
> > So we can make sure cmpxchg can be used in lock-less code on
> > architectures with perf, irq_work or ftrace enabled?
>
> It had better, otherwise stuff is broken.

Take a look at the superh cmpxchg implementation. It seems that cmpxchg
is implemented with a special instruction if CONFIG_GUSA_RB=y or
CONFIG_CPU_SH4A=y, and with local_irq_save otherwise. Is it possible
for superh to have PMU support with CONFIG_GUSA_RB=n and
CONFIG_CPU_SH4A=n, so that perf works properly but there is no NMI-safe
cmpxchg in that situation?

Best Regards,
Huang Ying

2010-11-18 08:34:50

by Peter Zijlstra

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Thu, 2010-11-18 at 09:14 +0800, Huang Ying wrote:
> On Wed, 2010-11-17 at 19:53 +0800, Peter Zijlstra wrote:
> > On Wed, 2010-11-17 at 19:47 +0800, huang ying wrote:
> > > On Wed, Nov 17, 2010 at 6:40 PM, Peter Zijlstra <[email protected]> wrote:
> > > > On Wed, 2010-11-17 at 10:18 +0800, Huang Ying wrote:
> > > >>
> > > >> cmpxchg has been used in that way by ftrace and perf for a long time. So
> > > >> I agree to make it a requirement on future architecture ports.
> > > >
> > > > Neither mandate an architecture do this though, only that when an
> > > > architecture wants to support either feature and has NMIs (not all archs
> > > > have NMI equivalents) it has to be safe.
> > >
> > > So we can make sure cmpxchg can be used in lock-less code on
> > > architectures with perf, irq_work or ftrace enabled?
> >
> > It had better, otherwise stuff is broken.
>
> Take a look at superh architecture cmpxchg implementation. It seems that
> cmpxchg is implemented with special instruction if CONFIG_GUSA_RB=y or
> CONFIG_CPU_SH4A=y, otherwise it is implemented with local_irq_save. Is
> it possible that superh has not PMU support if CONFIG_GUSA_RB=n and
> CONFIG_CPU_SH4A=n, so that perf work properly but no NMI safe cmpxchg in
> that situation?

Dunno, you forgot to CC the author of that code.. I've really no clue
about SH.

2010-11-18 08:44:22

by Paul Mundt

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Thu, Nov 18, 2010 at 09:34:35AM +0100, Peter Zijlstra wrote:
> On Thu, 2010-11-18 at 09:14 +0800, Huang Ying wrote:
> > On Wed, 2010-11-17 at 19:53 +0800, Peter Zijlstra wrote:
> > > On Wed, 2010-11-17 at 19:47 +0800, huang ying wrote:
> > > > On Wed, Nov 17, 2010 at 6:40 PM, Peter Zijlstra <[email protected]> wrote:
> > > > > On Wed, 2010-11-17 at 10:18 +0800, Huang Ying wrote:
> > > > >>
> > > > >> cmpxchg has been used in that way by ftrace and perf for a long time. So
> > > > >> I agree to make it a requirement on future architecture ports.
> > > > >
> > > > > Neither mandate an architecture do this though, only that when an
> > > > > architecture wants to support either feature and has NMIs (not all archs
> > > > > have NMI equivalents) it has to be safe.
> > > >
> > > > So we can make sure cmpxchg can be used in lock-less code on
> > > > architectures with perf, irq_work or ftrace enabled?
> > >
> > > It had better, otherwise stuff is broken.
> >
> > Take a look at superh architecture cmpxchg implementation. It seems that
> > cmpxchg is implemented with special instruction if CONFIG_GUSA_RB=y or
> > CONFIG_CPU_SH4A=y, otherwise it is implemented with local_irq_save. Is
> > it possible that superh has not PMU support if CONFIG_GUSA_RB=n and
> > CONFIG_CPU_SH4A=n, so that perf work properly but no NMI safe cmpxchg in
> > that situation?
>
> Dunno, you forgot to CC the author of that code.. I've really no clue
> about SH.

At the moment it's only SH-4 and SH-4A CPUs that implement PMU support,
so all of these are covered by a theoretically NMI-safe cmpxchg
implementation. This is more by coincidence than design, though.

The only cause for concern really is SH-2A which supports hardware
breakpoints (and perf events by proxy) but doesn't contain a PMU, and
only uses an IRQs disabled cmpxchg for the moment. It also supports
ftrace. I suppose I'll need to come up with a hack for this case..

2010-11-18 08:58:02

by Peter Zijlstra

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Thu, 2010-11-18 at 17:43 +0900, Paul Mundt wrote:
>
> The only cause for concern really is SH-2A which supports hardware
> breakpoints (and perf events by proxy) but doesn't contain a PMU, and
> only uses an IRQs disabled cmpxchg for the moment. It also supports
> ftrace. I suppose I'll need to come up with a hack for this case..

Well, as long as it doesn't actually have NMIs it doesn't matter.. If it
does and they're actually used by linux, then yes that needs fixing.

2010-11-18 09:04:19

by Paul Mundt

Subject: Re: [PATCH -v4 1/2] lib, Make gen_pool memory allocator lockless

On Thu, Nov 18, 2010 at 09:57:54AM +0100, Peter Zijlstra wrote:
> On Thu, 2010-11-18 at 17:43 +0900, Paul Mundt wrote:
> >
> > The only cause for concern really is SH-2A which supports hardware
> > breakpoints (and perf events by proxy) but doesn't contain a PMU, and
> > only uses an IRQs disabled cmpxchg for the moment. It also supports
> > ftrace. I suppose I'll need to come up with a hack for this case..
>
> Well, as long as it doesn't actually have NMIs it doesn't matter.. If it
> does and they're actually used by linux, then yes that needs fixing.

Yes, that's what I meant. Most SH-2A boards expose the NMI to a push
switch, so we either need to fix up cmpxchg() or start shipping cardboard
cutouts for obscuring the switch from view on existing boards. I'll be
lobbying for the latter while working on the former.. :-)