2011-04-13 08:54:04

by Huang, Ying

Subject: [PATCH -v3 0/4] ACPI, APEI, GHES, printk support for recoverable error via NMI

v3:

- Some changes to lock-less list based on Mathieu's comments

v2:

- Update some comments of llist per Mathieu's comments


[PATCH -v3 1/4] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
[PATCH -v3 2/4] lib, Add lock-less NULL terminated single list
[PATCH -v3 3/4] lib, Make gen_pool memory allocator lockless
[PATCH -v3 4/4] ACPI, APEI, GHES, printk support for recoverable error via NMI


2011-04-13 08:54:13

by Huang, Ying

Subject: [PATCH -v3 2/4] lib, Add lock-less NULL terminated single list

Cmpxchg is used to implement adding a new entry to the list, deleting
all entries from the list, deleting the first entry of the list, and
some other operations.

Because this is a singly linked list, the tail can not be accessed in
O(1) time.

If there are multiple producers and multiple consumers, llist_add can
be used in the producers and llist_del_all can be used in the
consumers. They can work simultaneously without a lock. But
llist_del_first can not be used here, because llist_del_first depends
on list->first->next not changing while list->first stays unchanged
during its operation, and an llist_del_first, llist_add, llist_add
(or llist_del_all, llist_add, llist_add) sequence in another consumer
may violate that assumption.

If there are multiple producers and one consumer, llist_add can be
used in the producers and either llist_del_all or llist_del_first can
be used in the consumer.

This can be summarized as follows:

           |   add   | del_first | del_all
 add       |    -    |     -     |    -
 del_first |         |     L     |    L
 del_all   |         |           |    -

where "-" means that no lock is needed, while "L" means that a lock is
needed.

The list entries deleted via llist_del_all can be traversed with
traversal functions such as llist_for_each etc. But the list entries
can not be traversed safely before being deleted from the list. The
order of the deleted entries is from the newest to the oldest added
one. If you want to traverse from the oldest to the newest, you must
reverse the order yourself before traversing.

The basic atomic operation of this list is cmpxchg on a long. On
architectures that don't have an NMI-safe cmpxchg implementation, the
list can NOT be used in an NMI handler. So code that uses the list in
an NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
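
For illustration only (this sketch is not part of the patch), a
typical multiple-producer / single-consumer use of the API above could
look like the following; struct my_event and process_event() are
hypothetical:

struct my_event {
	int data;
	struct llist_node llnode;
};

static LLIST_HEAD(my_event_list);

/* Producers: may run concurrently on any CPU, even in NMI context on
 * architectures that select ARCH_HAVE_NMI_SAFE_CMPXCHG. */
static void my_event_report(struct my_event *event)
{
	llist_add(&event->llnode, &my_event_list);
}

/* Single consumer: delete all entries at once, reverse the
 * newest-first order, then traverse from the oldest to the newest. */
static void my_event_consume(void)
{
	struct llist_node *node, *next, *oldest_first = NULL;
	struct my_event *event;

	node = llist_del_all(&my_event_list);
	while (node) {
		next = node->next;
		node->next = oldest_first;
		oldest_first = node;
		node = next;
	}
	llist_for_each_entry(event, oldest_first, llnode)
		process_event(event);	/* hypothetical handler */
}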

Signed-off-by: Huang Ying <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
---
include/linux/llist.h | 126 ++++++++++++++++++++++++++++++++++++++++++++++++
lib/Kconfig | 3 +
lib/Makefile | 2
lib/llist.c | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 260 insertions(+)
create mode 100644 include/linux/llist.h
create mode 100644 lib/llist.c

--- /dev/null
+++ b/include/linux/llist.h
@@ -0,0 +1,126 @@
+#ifndef LLIST_H
+#define LLIST_H
+/*
+ * Lock-less NULL terminated single linked list
+ *
+ * If there are multiple producers and multiple consumers, llist_add
+ * can be used in producers and llist_del_all can be used in
+ * consumers. They can work simultaneously without lock. But
+ * llist_del_first can not be used here. Because llist_del_first
+ * depends on list->first->next does not changed if list->first is not
+ * changed during its operation, but llist_del_first, llist_add,
+ * llist_add (or llist_del_all, llist_add, llist_add) sequence in
+ * another consumer may violate that.
+ *
+ * If there are multiple producers and one consumer, llist_add can be
+ * used in producers and llist_del_all or llist_del_first can be used
+ * in the consumer.
+ *
+ * This can be summarized as follow:
+ *
+ * | add | del_first | del_all
+ * add | - | - | -
+ * del_first | | L | L
+ * del_all | | | -
+ *
+ * Where "-" stands for no lock is needed, while "L" stands for lock
+ * is needed.
+ *
+ * The list entries deleted via llist_del_all can be traversed with
+ * traversing function such as llist_for_each etc. But the list
+ * entries can not be traversed safely before deleted from the list.
+ * The order of deleted entries is from the newest to the oldest added
+ * one. If you want to traverse from the oldest to the newest, you
+ * must reverse the order by yourself before traversing.
+ *
+ * The basic atomic operation of this list is cmpxchg on long. On
+ * architectures that don't have NMI-safe cmpxchg implementation, the
+ * list can NOT be used in NMI handler. So code uses the list in NMI
+ * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
+ */
+
+struct llist_head {
+ struct llist_node *first;
+};
+
+struct llist_node {
+ struct llist_node *next;
+};
+
+#define LLIST_HEAD_INIT(name) { NULL }
+#define LLIST_HEAD(name) struct llist_head name = LLIST_HEAD_INIT(name)
+
+/**
+ * init_llist_head - initialize lock-less list head
+ * @head: the head for your lock-less list
+ */
+static inline void init_llist_head(struct llist_head *list)
+{
+ list->first = NULL;
+}
+
+/**
+ * llist_entry - get the struct of this entry
+ * @ptr: the &struct llist_node pointer.
+ * @type: the type of the struct this is embedded in.
+ * @member: the name of the llist_node within the struct.
+ */
+#define llist_entry(ptr, type, member) \
+ container_of(ptr, type, member)
+
+/**
+ * llist_for_each - iterate over some deleted entries of a lock-less list
+ * @pos: the &struct llist_node to use as a loop cursor
+ * @node: the first entry of deleted list entries
+ *
+ * In general, some entries of the lock-less list can be traversed
+ * safely only after being deleted from list, so start with an entry
+ * instead of list head.
+ *
+ * If being used on entries deleted from lock-less list directly, the
+ * traverse order is from the newest to the oldest added entry. If
+ * you want to traverse from the oldest to the newest, you must
+ * reverse the order by yourself before traversing.
+ */
+#define llist_for_each(pos, node) \
+ for ((pos) = (node); pos; (pos) = (pos)->next)
+
+/**
+ * llist_for_each_entry - iterate over some deleted entries of lock-less list of given type
+ * @pos: the type * to use as a loop cursor.
+ * @node: the fist entry of deleted list entries.
+ * @member: the name of the llist_node with the struct.
+ *
+ * In general, some entries of the lock-less list can be traversed
+ * safely only after being removed from list, so start with an entry
+ * instead of list head.
+ *
+ * If being used on entries deleted from lock-less list directly, the
+ * traverse order is from the newest to the oldest added entry. If
+ * you want to traverse from the oldest to the newest, you must
+ * reverse the order by yourself before traversing.
+ */
+#define llist_for_each_entry(pos, node, member) \
+ for ((pos) = llist_entry((node), typeof(*(pos)), member); \
+ &(pos)->member != NULL; \
+ (pos) = llist_entry((pos)->member.next, typeof(*(pos)), member))
+
+/**
+ * llist_empty - tests whether a lock-less list is empty
+ * @head: the list to test
+ *
+ * Not guaranteed to be accurate or up to date. Just a quick way to
+ * test whether the list is empty without deleting something from the
+ * list.
+ */
+static inline int llist_empty(const struct llist_head *head)
+{
+ return ACCESS_ONCE(head->first) == NULL;
+}
+
+void llist_add(struct llist_node *new, struct llist_head *head);
+void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
+ struct llist_head *head);
+struct llist_node *llist_del_first(struct llist_head *head);
+struct llist_node *llist_del_all(struct llist_head *head);
+#endif /* LLIST_H */
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -272,4 +272,7 @@ config AVERAGE

If unsure, say N.

+config LLIST
+ bool
+
endmenu
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -115,6 +115,8 @@ obj-$(CONFIG_AVERAGE) += average.o

obj-$(CONFIG_CPU_RMAP) += cpu_rmap.o

+obj-$(CONFIG_LLIST) += llist.o
+
hostprogs-y := gen_crc32table
clean-files := crc32table.h

--- /dev/null
+++ b/lib/llist.c
@@ -0,0 +1,129 @@
+/*
+ * Lock-less NULL terminated single linked list
+ *
+ * The basic atomic operation of this list is cmpxchg on long. On
+ * architectures that don't have NMI-safe cmpxchg implementation, the
+ * list can NOT be used in NMI handler. So code uses the list in NMI
+ * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
+ *
+ * Copyright 2010,2011 Intel Corp.
+ * Author: Huang Ying <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation;
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/llist.h>
+
+#include <asm/system.h>
+
+/**
+ * llist_add - add a new entry
+ * @new: new entry to be added
+ * @head: the head for your lock-less list
+ */
+void llist_add(struct llist_node *new, struct llist_head *head)
+{
+ struct llist_node *entry, *old_entry;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ BUG_ON(in_nmi());
+#endif
+
+ entry = head->first;
+ do {
+ old_entry = entry;
+ new->next = entry;
+ cpu_relax();
+ } while ((entry = cmpxchg(&head->first, old_entry, new)) != old_entry);
+}
+EXPORT_SYMBOL_GPL(llist_add);
+
+/**
+ * llist_add_batch - add several linked entries in batch
+ * @new_first: first entry in batch to be added
+ * @new_last: last entry in batch to be added
+ * @head: the head for your lock-less list
+ */
+void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
+ struct llist_head *head)
+{
+ struct llist_node *entry, *old_entry;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ BUG_ON(in_nmi());
+#endif
+
+ entry = head->first;
+ do {
+ old_entry = entry;
+ new_last->next = entry;
+ cpu_relax();
+ } while ((entry = cmpxchg(&head->first, old_entry, new_first)) != old_entry);
+}
+EXPORT_SYMBOL_GPL(llist_add_batch);
+
+/**
+ * llist_del_first - delete the first entry of lock-less list
+ * @head: the head for your lock-less list
+ *
+ * If list is empty, return NULL, otherwise, return the first entry
+ * deleted, this is the newest added one.
+ *
+ * Only one llist_del_first user can be used simultaneously with
+ * multiple llist_add users without lock. Because otherwise
+ * llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
+ * llist_add) sequence in another user may change @head->first->next,
+ * but keep @head->first. If multiple consumers are needed, please
+ * use llist_del_all or use lock between consumers.
+ */
+struct llist_node *llist_del_first(struct llist_head *head)
+{
+ struct llist_node *entry, *old_entry, *next;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ BUG_ON(in_nmi());
+#endif
+
+ entry = head->first;
+ do {
+ if (entry == NULL)
+ return NULL;
+ old_entry = entry;
+ next = entry->next;
+ cpu_relax();
+ } while ((entry = cmpxchg(&head->first, old_entry, next)) != old_entry);
+
+ return entry;
+}
+EXPORT_SYMBOL_GPL(llist_del_first);
+
+/**
+ * llist_del_all - delete all entries from lock-less list
+ * @head: the head of lock-less list to delete all entries
+ *
+ * If list is empty, return NULL, otherwise, delete all entries and
+ * return the pointer to the first entry. The order of entries
+ * deleted is from the newest to the oldest added one.
+ */
+struct llist_node *llist_del_all(struct llist_head *head)
+{
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ BUG_ON(in_nmi());
+#endif
+
+ return xchg(&head->first, NULL);
+}
+EXPORT_SYMBOL_GPL(llist_del_all);

2011-04-13 08:54:11

by Huang, Ying

Subject: [PATCH -v3 1/4] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG

cmpxchg() is widely used by lockless code, including NMI-safe lockless
code. But on some architectures, the cmpxchg() implementation is not
NMI-safe; on these architectures the lockless code may need to fall
back to a spin_trylock_irqsave() based implementation.

This patch adds a Kconfig option, ARCH_HAVE_NMI_SAFE_CMPXCHG, so that
NMI-safe lockless code can depend on it or provide a different
implementation according to it.

On many architectures, cmpxchg is only NMI-safe for several specific
operand sizes. So the ARCH_HAVE_NMI_SAFE_CMPXCHG option defined in
this patch only guarantees that cmpxchg is NMI-safe for operands of
sizeof(unsigned long).
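
For illustration only (not part of this patch), lockless code that
wants an NMI-safe fast path where available could be structured
roughly as follows; reserve_slot() and slot_lock are hypothetical
names:

#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
/* cmpxchg on an unsigned long is NMI-safe here */
static int reserve_slot(unsigned long *slot, unsigned long new)
{
	return cmpxchg(slot, 0UL, new) == 0UL ? 0 : -EBUSY;
}
#else
static DEFINE_SPINLOCK(slot_lock);

/* Fallback: not NMI-safe, callers must not run in NMI context */
static int reserve_slot(unsigned long *slot, unsigned long new)
{
	unsigned long flags;
	int ret = -EBUSY;

	if (!spin_trylock_irqsave(&slot_lock, flags))
		return -EBUSY;
	if (*slot == 0UL) {
		*slot = new;
		ret = 0;
	}
	spin_unlock_irqrestore(&slot_lock, flags);
	return ret;
}
#endif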

Signed-off-by: Huang Ying <[email protected]>
Acked-by: Mike Frysinger <[email protected]>
Acked-by: Paul Mundt <[email protected]>
Acked-by: Hans-Christian Egtvedt <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
Acked-by: Chris Metcalf <[email protected]>
CC: Richard Henderson <[email protected]>
CC: Mikael Starvik <[email protected]>
CC: David Howells <[email protected]>
CC: Yoshinori Sato <[email protected]>
CC: Tony Luck <[email protected]>
CC: Hirokazu Takata <[email protected]>
CC: Geert Uytterhoeven <[email protected]>
CC: Michal Simek <[email protected]>
CC: Ralf Baechle <[email protected]>
CC: Kyle McMartin <[email protected]>
CC: Martin Schwidefsky <[email protected]>
CC: Chen Liqin <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Chris Zankel <[email protected]>
---
arch/Kconfig | 3 +++
arch/alpha/Kconfig | 1 +
arch/avr32/Kconfig | 1 +
arch/frv/Kconfig | 1 +
arch/ia64/Kconfig | 1 +
arch/m68k/Kconfig | 1 +
arch/parisc/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/sh/Kconfig | 1 +
arch/sparc/Kconfig | 1 +
arch/tile/Kconfig | 1 +
arch/x86/Kconfig | 1 +
13 files changed, 15 insertions(+)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -178,4 +178,7 @@ config HAVE_ARCH_JUMP_LABEL
config HAVE_ARCH_MUTEX_CPU_RELAX
bool

+config ARCH_HAVE_NMI_SAFE_CMPXCHG
+ bool
+
source "kernel/gcov/Kconfig"
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -12,6 +12,7 @@ config ALPHA
select GENERIC_IRQ_PROBE
select AUTO_IRQ_AFFINITY if SMP
select GENERIC_IRQ_SHOW
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG
help
The Alpha is a 64-bit general-purpose processor designed and
marketed by the Digital Equipment Corporation of blessed memory,
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -10,6 +10,7 @@ config AVR32
select GENERIC_IRQ_PROBE
select HARDIRQS_SW_RESEND
select GENERIC_IRQ_SHOW
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG
help
AVR32 is a high-performance 32-bit RISC microprocessor core,
designed for cost-sensitive embedded applications, with particular
--- a/arch/frv/Kconfig
+++ b/arch/frv/Kconfig
@@ -7,6 +7,7 @@ config FRV
select HAVE_PERF_EVENTS
select HAVE_GENERIC_HARDIRQS
select GENERIC_IRQ_SHOW
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG

config ZONE_DMA
bool
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -27,6 +27,7 @@ config IA64
select GENERIC_PENDING_IRQ if SMP
select IRQ_PER_CPU
select GENERIC_IRQ_SHOW
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG
default y
help
The Itanium Processor Family is Intel's 64-bit successor to
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -5,6 +5,7 @@ config M68K
select HAVE_AOUT if MMU
select GENERIC_ATOMIC64 if MMU
select HAVE_GENERIC_HARDIRQS if !MMU
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG if RMW_INSNS

config RWSEM_GENERIC_SPINLOCK
bool
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -15,6 +15,7 @@ config PARISC
select HAVE_GENERIC_HARDIRQS
select GENERIC_IRQ_PROBE
select IRQ_PER_CPU
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG

help
The PA-RISC microprocessor is designed by Hewlett-Packard and used
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -140,6 +140,7 @@ config PPC
select IRQ_PER_CPU
select GENERIC_IRQ_SHOW
select GENERIC_IRQ_SHOW_LEVEL
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG

config EARLY_PRINTK
bool
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -81,6 +81,7 @@ config S390
select INIT_ALL_POSSIBLE
select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_BZIP2
select HAVE_KERNEL_LZMA
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -11,6 +11,7 @@ config SUPERH
select HAVE_DMA_ATTRS
select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
select PERF_USE_VMALLOC
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_BZIP2
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -53,6 +53,7 @@ config SPARC64
select HAVE_GENERIC_HARDIRQS
select GENERIC_IRQ_SHOW
select IRQ_PREFLOW_FASTEOI
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG

config ARCH_DEFCONFIG
string
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -12,6 +12,7 @@ config TILE
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
select GENERIC_IRQ_SHOW
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG if !M386

# FIXME: investigate whether we need/want these options.
# select HAVE_IOREMAP_PROT
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -72,6 +72,7 @@ config X86
select IRQ_FORCED_THREADING
select USE_GENERIC_SMP_HELPERS if SMP
select ARCH_NO_SYSDEV_OPS
+ select ARCH_HAVE_NMI_SAFE_CMPXCHG

config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS)

2011-04-13 08:54:44

by Huang, Ying

Subject: [PATCH -v3 4/4] ACPI, APEI, GHES, printk support for recoverable error via NMI

Some APEI GHES recoverable errors are reported via NMI, but printk is
not safe in NMI context.

To solve the issue, the NMI handler uses a lock-less memory allocator
to allocate memory, saves the error record into the allocated memory,
and puts the record onto a lock-less list. An irq_work is then used
to delay further processing from NMI context to IRQ context. The
irq_work handler removes the nodes from the lock-less list, printks
the error records, does further processing including recovery
operations, and then frees the memory.
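
In outline, the NMI-side flow added by the patch below looks roughly
like this (a simplified sketch: severity handling and error paths are
omitted, and the wrapper function name is made up; in the patch the
logic lives in ghes_notify_nmi()):

static void ghes_report_in_nmi(struct ghes *ghes)
{
	struct ghes_estatus_node *estatus_node;
	u32 len, node_len;

	/* NMI context: no printk, no regular memory allocator */
	len = apei_estatus_len(ghes->estatus);
	node_len = GHES_ESTATUS_NODE_LEN(len);
	estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool, node_len);
	if (estatus_node) {
		estatus_node->generic = ghes->generic;
		memcpy(GHES_ESTATUS_FROM_NODE(estatus_node),
		       ghes->estatus, len);
		llist_add(&estatus_node->llnode, &ghes_estatus_llist);
	}
	/* defer printk and recovery processing to IRQ context */
	irq_work_queue(&ghes_proc_irq_work);
}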

Signed-off-by: Huang Ying <[email protected]>
---
drivers/acpi/apei/Kconfig | 2
drivers/acpi/apei/ghes.c | 188 ++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 176 insertions(+), 14 deletions(-)

--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -12,6 +12,8 @@ config ACPI_APEI_GHES
tristate "APEI Generic Hardware Error Source"
depends on ACPI_APEI && X86
select ACPI_HED
+ select LLIST
+ select GENERIC_ALLOCATOR
help
Generic Hardware Error Source provides a way to report
platform hardware errors (such as that from chipset). It
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -12,7 +12,7 @@
* For more information about Generic Hardware Error Source, please
* refer to ACPI Specification version 4.0, section 17.3.2.6
*
- * Copyright 2010 Intel Corp.
+ * Copyright 2010,2011 Intel Corp.
* Author: Huang Ying <[email protected]>
*
* This program is free software; you can redistribute it and/or
@@ -42,6 +42,9 @@
#include <linux/mutex.h>
#include <linux/ratelimit.h>
#include <linux/vmalloc.h>
+#include <linux/irq_work.h>
+#include <linux/llist.h>
+#include <linux/genalloc.h>
#include <acpi/apei.h>
#include <acpi/atomicio.h>
#include <acpi/hed.h>
@@ -53,6 +56,15 @@
#define GHES_PFX "GHES: "

#define GHES_ESTATUS_MAX_SIZE 65536
+#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
+
+#define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3
+
+#define GHES_ESTATUS_NODE_LEN(estatus_len) \
+ (sizeof(struct ghes_estatus_node) + (estatus_len))
+#define GHES_ESTATUS_FROM_NODE(estatus_node) \
+ ((struct acpi_hest_generic_status *) \
+ ((struct ghes_estatus_node *)(estatus_node) + 1))

/*
* One struct ghes is created for each generic hardware error source.
@@ -77,6 +89,11 @@ struct ghes {
};
};

+struct ghes_estatus_node {
+ struct llist_node llnode;
+ struct acpi_hest_generic *generic;
+};
+
static int ghes_panic_timeout __read_mostly = 30;

/*
@@ -121,6 +138,19 @@ static struct vm_struct *ghes_ioremap_ar
static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);

+/*
+ * printk is not safe in NMI context. So in NMI handler, we allocate
+ * required memory from lock-less memory allocator
+ * (ghes_estatus_pool), save estatus into it, put them into lock-less
+ * list (ghes_estatus_llist), then delay printk into IRQ context via
+ * irq_work (ghes_proc_irq_work). ghes_estatus_size_request record
+ * required pool size by all NMI error source.
+ */
+static struct gen_pool *ghes_estatus_pool;
+static unsigned long ghes_estatus_pool_size_request;
+static struct llist_head ghes_estatus_llist;
+static struct irq_work ghes_proc_irq_work;
+
static int ghes_ioremap_init(void)
{
ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
@@ -180,6 +210,50 @@ static void ghes_iounmap_irq(void __iome
__flush_tlb_one(vaddr);
}

+static int ghes_estatus_pool_init(void)
+{
+ ghes_estatus_pool = gen_pool_create(GHES_ESTATUS_POOL_MIN_ALLOC_ORDER, -1);
+ if (!ghes_estatus_pool)
+ return -ENOMEM;
+ return 0;
+}
+
+static void ghes_estatus_pool_exit(void)
+{
+ struct gen_pool_chunk *chunk;
+
+ gen_pool_for_each_chunk(chunk, ghes_estatus_pool)
+ free_page(chunk->start_addr);
+ gen_pool_destroy(ghes_estatus_pool);
+}
+
+static int ghes_estatus_pool_expand(unsigned long len)
+{
+ unsigned long i, pages, size, addr;
+ int ret;
+
+ ghes_estatus_pool_size_request += PAGE_ALIGN(len);
+ size = gen_pool_size(ghes_estatus_pool);
+ if (size >= ghes_estatus_pool_size_request)
+ return 0;
+ pages = (ghes_estatus_pool_size_request - size) / PAGE_SIZE;
+ for (i = 0; i < pages; i++) {
+ addr = __get_free_page(GFP_KERNEL);
+ if (!addr)
+ return -ENOMEM;
+ ret = gen_pool_add(ghes_estatus_pool, addr, PAGE_SIZE, -1);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static void ghes_estatus_pool_shrink(unsigned long len)
+{
+ ghes_estatus_pool_size_request -= PAGE_ALIGN(len);
+}
+
static struct ghes *ghes_new(struct acpi_hest_generic *generic)
{
struct ghes *ghes;
@@ -341,13 +415,13 @@ static void ghes_clear_estatus(struct gh
ghes->flags &= ~GHES_TO_CLEAR;
}

-static void ghes_do_proc(struct ghes *ghes)
+static void ghes_do_proc(const struct acpi_hest_generic_status *estatus)
{
int sev, processed = 0;
struct acpi_hest_generic_data *gdata;

- sev = ghes_severity(ghes->estatus->error_severity);
- apei_estatus_for_each_section(ghes->estatus, gdata) {
+ sev = ghes_severity(estatus->error_severity);
+ apei_estatus_for_each_section(estatus, gdata) {
#ifdef CONFIG_X86_MCE
if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
CPER_SEC_PLATFORM_MEM)) {
@@ -360,13 +434,15 @@ static void ghes_do_proc(struct ghes *gh
}
}

-static void ghes_print_estatus(const char *pfx, struct ghes *ghes)
+static void ghes_print_estatus(const char *pfx,
+ const struct acpi_hest_generic *generic,
+ const struct acpi_hest_generic_status *estatus)
{
/* Not more than 2 messages every 5 seconds */
static DEFINE_RATELIMIT_STATE(ratelimit, 5*HZ, 2);

if (pfx == NULL) {
- if (ghes_severity(ghes->estatus->error_severity) <=
+ if (ghes_severity(estatus->error_severity) <=
GHES_SEV_CORRECTED)
pfx = KERN_WARNING HW_ERR;
else
@@ -375,8 +451,8 @@ static void ghes_print_estatus(const cha
if (__ratelimit(&ratelimit)) {
printk(
"%s""Hardware error from APEI Generic Hardware Error Source: %d\n",
- pfx, ghes->generic->header.source_id);
- apei_estatus_print(pfx, ghes->estatus);
+ pfx, generic->header.source_id);
+ apei_estatus_print(pfx, estatus);
}
}

@@ -387,8 +463,8 @@ static int ghes_proc(struct ghes *ghes)
rc = ghes_read_estatus(ghes, 0);
if (rc)
goto out;
- ghes_print_estatus(NULL, ghes);
- ghes_do_proc(ghes);
+ ghes_print_estatus(NULL, ghes->generic, ghes->estatus);
+ ghes_do_proc(ghes->estatus);

out:
ghes_clear_estatus(ghes);
@@ -447,6 +523,40 @@ static int ghes_notify_sci(struct notifi
return ret;
}

+static void ghes_proc_in_irq(struct irq_work *irq_work)
+{
+ struct llist_node *llnode, *next, *tail = NULL;
+ struct ghes_estatus_node *estatus_node;
+ struct acpi_hest_generic_status *estatus;
+ u32 len, node_len;
+
+ /*
+ * Because the time order of estatus in list is reversed,
+ * revert it back to proper order.
+ */
+ llnode = llist_del_all(&ghes_estatus_llist);
+ while (llnode) {
+ next = llnode->next;
+ llnode->next = tail;
+ tail = llnode;
+ llnode = next;
+ }
+ llnode = tail;
+ while (llnode) {
+ next = llnode->next;
+ estatus_node = llist_entry(llnode, struct ghes_estatus_node,
+ llnode);
+ estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+ len = apei_estatus_len(estatus);
+ node_len = GHES_ESTATUS_NODE_LEN(len);
+ ghes_do_proc(estatus);
+ ghes_print_estatus(NULL, estatus_node->generic, estatus);
+ gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
+ node_len);
+ llnode = next;
+ }
+}
+
static int ghes_notify_nmi(struct notifier_block *this,
unsigned long cmd, void *data)
{
@@ -476,7 +586,8 @@ static int ghes_notify_nmi(struct notifi

if (sev_global >= GHES_SEV_PANIC) {
oops_begin();
- ghes_print_estatus(KERN_EMERG HW_ERR, ghes_global);
+ ghes_print_estatus(KERN_EMERG HW_ERR, ghes_global->generic,
+ ghes_global->estatus);
/* reboot to log the error! */
if (panic_timeout == 0)
panic_timeout = ghes_panic_timeout;
@@ -484,12 +595,31 @@ static int ghes_notify_nmi(struct notifi
}

list_for_each_entry_rcu(ghes, &ghes_nmi, list) {
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ u32 len, node_len;
+ struct ghes_estatus_node *estatus_node;
+ struct acpi_hest_generic_status *estatus;
+#endif
if (!(ghes->flags & GHES_TO_CLEAR))
continue;
- /* Do not print estatus because printk is not NMI safe */
- ghes_do_proc(ghes);
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ /* Save estatus for further processing in IRQ context */
+ len = apei_estatus_len(ghes->estatus);
+ node_len = GHES_ESTATUS_NODE_LEN(len);
+ estatus_node = (void *)gen_pool_alloc(ghes_estatus_pool,
+ node_len);
+ if (estatus_node) {
+ estatus_node->generic = ghes->generic;
+ estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+ memcpy(estatus, ghes->estatus, len);
+ llist_add(&estatus_node->llnode, &ghes_estatus_llist);
+ }
+#endif
ghes_clear_estatus(ghes);
}
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ irq_work_queue(&ghes_proc_irq_work);
+#endif

out:
raw_spin_unlock(&ghes_nmi_lock);
@@ -504,10 +634,26 @@ static struct notifier_block ghes_notifi
.notifier_call = ghes_notify_nmi,
};

+static unsigned long ghes_esource_prealloc_size(
+ const struct acpi_hest_generic *generic)
+{
+ unsigned long block_length, prealloc_records, prealloc_size;
+
+ block_length = min_t(unsigned long, generic->error_block_length,
+ GHES_ESTATUS_MAX_SIZE);
+ prealloc_records = max_t(unsigned long,
+ generic->records_to_preallocate, 1);
+ prealloc_size = min_t(unsigned long, block_length * prealloc_records,
+ GHES_ESOURCE_PREALLOC_MAX_SIZE);
+
+ return prealloc_size;
+}
+
static int __devinit ghes_probe(struct platform_device *ghes_dev)
{
struct acpi_hest_generic *generic;
struct ghes *ghes = NULL;
+ unsigned long len;
int rc = -EINVAL;

generic = *(struct acpi_hest_generic **)ghes_dev->dev.platform_data;
@@ -573,6 +719,8 @@ static int __devinit ghes_probe(struct p
mutex_unlock(&ghes_list_mutex);
break;
case ACPI_HEST_NOTIFY_NMI:
+ len = ghes_esource_prealloc_size(generic);
+ ghes_estatus_pool_expand(len);
mutex_lock(&ghes_list_mutex);
if (list_empty(&ghes_nmi))
register_die_notifier(&ghes_notifier_nmi);
@@ -597,6 +745,7 @@ static int __devexit ghes_remove(struct
{
struct ghes *ghes;
struct acpi_hest_generic *generic;
+ unsigned long len;

ghes = platform_get_drvdata(ghes_dev);
generic = ghes->generic;
@@ -627,6 +776,8 @@ static int __devexit ghes_remove(struct
* freed after NMI handler finishes.
*/
synchronize_rcu();
+ len = ghes_esource_prealloc_size(generic);
+ ghes_estatus_pool_shrink(len);
break;
default:
BUG();
@@ -662,15 +813,23 @@ static int __init ghes_init(void)
return -EINVAL;
}

+ init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq);
+
rc = ghes_ioremap_init();
if (rc)
goto err;

- rc = platform_driver_register(&ghes_platform_driver);
+ rc = ghes_estatus_pool_init();
if (rc)
goto err_ioremap_exit;

+ rc = platform_driver_register(&ghes_platform_driver);
+ if (rc)
+ goto err_pool_exit;
+
return 0;
+err_pool_exit:
+ ghes_estatus_pool_exit();
err_ioremap_exit:
ghes_ioremap_exit();
err:
@@ -680,6 +839,7 @@ err:
static void __exit ghes_exit(void)
{
platform_driver_unregister(&ghes_platform_driver);
+ ghes_estatus_pool_exit();
ghes_ioremap_exit();
}

2011-04-13 08:54:57

by Huang, Ying

Subject: [PATCH -v3 3/4] lib, Make gen_pool memory allocator lockless

This version of the gen_pool memory allocator supports lockless
operation.

This makes it safe to use in NMI handlers and other special
unblockable contexts that could otherwise deadlock on locks. This is
implemented by using atomic operations and retries on any conflicts.
The disadvantage is that there may be livelocks in extreme cases. For
better scalability, one gen_pool allocator can be used for each CPU.

The lockless operation only works if there is enough memory
available. If new memory is added to the pool, a lock still has to be
taken. So any user relying on locklessness has to ensure that
sufficient memory is preallocated.

The basic atomic operation of this allocator is cmpxchg on a long. On
architectures that don't have an NMI-safe cmpxchg implementation, the
allocator can NOT be used in an NMI handler. So code that uses the
allocator in an NMI handler should depend on
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
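
For illustration only (not part of the patch), a user relying on the
lockless paths might preallocate memory in process context and then
allocate and free from atomic or NMI context; my_pool and the function
names are hypothetical:

static struct gen_pool *my_pool;

/* Process context: creating the pool and adding memory still takes locks */
static int my_pool_init(void)
{
	unsigned long page;

	my_pool = gen_pool_create(3, -1);	/* minimum allocation: 8 bytes */
	if (!my_pool)
		return -ENOMEM;
	page = __get_free_page(GFP_KERNEL);
	if (!page) {
		gen_pool_destroy(my_pool);
		return -ENOMEM;
	}
	return gen_pool_add(my_pool, page, PAGE_SIZE, -1);
}

/* Atomic/NMI context: lockless as long as the preallocation suffices */
static void *my_record_alloc(size_t size)
{
	return (void *)gen_pool_alloc(my_pool, size);
}

static void my_record_free(void *record, size_t size)
{
	gen_pool_free(my_pool, (unsigned long)record, size);
}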

Signed-off-by: Huang Ying <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
---
include/linux/bitmap.h | 1
include/linux/genalloc.h | 46 +++++++-
lib/bitmap.c | 2
lib/genalloc.c | 256 ++++++++++++++++++++++++++++++++++++++---------
4 files changed, 250 insertions(+), 55 deletions(-)

--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -142,6 +142,7 @@ extern void bitmap_release_region(unsign
extern int bitmap_allocate_region(unsigned long *bitmap, int pos, int order);
extern void bitmap_copy_le(void *dst, const unsigned long *src, int nbits);

+#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))
#define BITMAP_LAST_WORD_MASK(nbits) \
( \
((nbits) % BITS_PER_LONG) ? \
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -1,8 +1,28 @@
+#ifndef GENALLOC_H
+#define GENALLOC_H
/*
- * Basic general purpose allocator for managing special purpose memory
- * not managed by the regular kmalloc/kfree interface.
- * Uses for this includes on-device special memory, uncached memory
- * etc.
+ * Basic general purpose allocator for managing special purpose
+ * memory, for example, memory that is not managed by the regular
+ * kmalloc/kfree interface. Uses for this includes on-device special
+ * memory, uncached memory etc.
+ *
+ * It is safe to use the allocator in NMI handlers and other special
+ * unblockable contexts that could otherwise deadlock on locks. This
+ * is implemented by using atomic operations and retries on any
+ * conflicts. The disadvantage is that there may be livelocks in
+ * extreme cases. For better scalability, one allocator can be used
+ * for each CPU.
+ *
+ * The lockless operation only works if there is enough memory
+ * available. If new memory is added to the pool a lock has to be
+ * still taken. So any user relying on locklessness has to ensure
+ * that sufficient memory is preallocated.
+ *
+ * The basic atomic operation of this allocator is cmpxchg on long.
+ * On architectures that don't have NMI-safe cmpxchg implementation,
+ * the allocator can NOT be used in NMI handler. So code uses the
+ * allocator in NMI handler should depend on
+ * CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
*
* This source code is licensed under the GNU General Public License,
* Version 2. See the file COPYING for more details.
@@ -13,7 +33,7 @@
* General purpose special memory pool descriptor.
*/
struct gen_pool {
- rwlock_t lock;
+ spinlock_t lock;
struct list_head chunks; /* list of chunks in this pool */
int min_alloc_order; /* minimum allocation order */
};
@@ -22,15 +42,29 @@ struct gen_pool {
* General purpose special memory pool chunk descriptor.
*/
struct gen_pool_chunk {
- spinlock_t lock;
struct list_head next_chunk; /* next chunk in pool */
+ atomic_t avail;
unsigned long start_addr; /* starting address of memory chunk */
unsigned long end_addr; /* ending address of memory chunk */
unsigned long bits[0]; /* bitmap for allocating memory chunk */
};

+/**
+ * gen_pool_for_each_chunk - iterate over chunks of generic memory pool
+ * @chunk: the struct gen_pool_chunk * to use as a loop cursor
+ * @pool: the generic memory pool
+ *
+ * Not lockless, proper mutual exclusion is needed to use this macro
+ * with other gen_pool function simultaneously.
+ */
+#define gen_pool_for_each_chunk(chunk, pool) \
+ list_for_each_entry_rcu(chunk, &(pool)->chunks, next_chunk)
+
extern struct gen_pool *gen_pool_create(int, int);
extern int gen_pool_add(struct gen_pool *, unsigned long, size_t, int);
extern void gen_pool_destroy(struct gen_pool *);
extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+extern size_t gen_pool_avail(struct gen_pool *);
+extern size_t gen_pool_size(struct gen_pool *);
+#endif /* GENALLOC_H */
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -271,8 +271,6 @@ int __bitmap_weight(const unsigned long
}
EXPORT_SYMBOL(__bitmap_weight);

-#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) % BITS_PER_LONG))
-
void bitmap_set(unsigned long *map, int start, int nr)
{
unsigned long *p = map + BIT_WORD(start);
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -1,8 +1,33 @@
/*
- * Basic general purpose allocator for managing special purpose memory
- * not managed by the regular kmalloc/kfree interface.
- * Uses for this includes on-device special memory, uncached memory
- * etc.
+ * Basic general purpose allocator for managing special purpose
+ * memory, for example, memory that is not managed by the regular
+ * kmalloc/kfree interface. Uses for this includes on-device special
+ * memory, uncached memory etc.
+ *
+ * It is safe to use the allocator in NMI handlers and other special
+ * unblockable contexts that could otherwise deadlock on locks. This
+ * is implemented by using atomic operations and retries on any
+ * conflicts. The disadvantage is that there may be livelocks in
+ * extreme cases. For better scalability, one allocator can be used
+ * for each CPU.
+ *
+ * The lockless operation only works if there is enough memory
+ * available. If new memory is added to the pool a lock has to be
+ * still taken. So any user relying on locklessness has to ensure
+ * that sufficient memory is preallocated.
+ *
+ * The basic atomic operation of this allocator is cmpxchg on long.
+ * On architectures that don't have NMI-safe cmpxchg implementation,
+ * the allocator can NOT be used in NMI handler. So code uses the
+ * allocator in NMI handler should depend on
+ * CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
+ *
+ * rcu_read_lock and rcu_read_unlock is not used int gen_pool_alloc,
+ * gen_pool_free, gen_pool_avail and gen_pool_size etc, because chunks
+ * are only added into pool, not deleted from pool unless the pool
+ * itself is destroyed. If chunk will be deleted from pool,
+ * rcu_read_lock and rcu_read_unlock should be uses in these
+ * functions.
*
* Copyright 2005 (C) Jes Sorensen <[email protected]>
*
@@ -13,8 +38,109 @@
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/bitmap.h>
+#include <linux/rculist.h>
+#include <linux/interrupt.h>
#include <linux/genalloc.h>

+static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
+{
+ unsigned long val, nval;
+
+ nval = *addr;
+ do {
+ val = nval;
+ if (val & mask_to_set)
+ return -EBUSY;
+ cpu_relax();
+ } while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
+
+ return 0;
+}
+
+static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
+{
+ unsigned long val, nval;
+
+ nval = *addr;
+ do {
+ val = nval;
+ if ((val & mask_to_clear) != mask_to_clear)
+ return -EBUSY;
+ cpu_relax();
+ } while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
+
+ return 0;
+}
+
+/*
+ * bitmap_set_ll - set the specified number of bits at the specified position
+ * @map: pointer to a bitmap
+ * @start: a bit position in @map
+ * @nr: number of bits to set
+ *
+ * Set @nr bits start from @start in @map lock-lessly. Several users
+ * can set/clear the same bitmap simultaneously without lock. If two
+ * users set the same bit, one user will return remain bits, otherwise
+ * return 0.
+ */
+static int bitmap_set_ll(unsigned long *map, int start, int nr)
+{
+ unsigned long *p = map + BIT_WORD(start);
+ const int size = start + nr;
+ int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
+ unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
+
+ while (nr - bits_to_set >= 0) {
+ if (set_bits_ll(p, mask_to_set))
+ return nr;
+ nr -= bits_to_set;
+ bits_to_set = BITS_PER_LONG;
+ mask_to_set = ~0UL;
+ p++;
+ }
+ if (nr) {
+ mask_to_set &= BITMAP_LAST_WORD_MASK(size);
+ if (set_bits_ll(p, mask_to_set))
+ return nr;
+ }
+
+ return 0;
+}
+
+/*
+ * bitmap_clear_ll - clear the specified number of bits at the specified position
+ * @map: pointer to a bitmap
+ * @start: a bit position in @map
+ * @nr: number of bits to set
+ *
+ * Clear @nr bits start from @start in @map lock-lessly. Several users
+ * can set/clear the same bitmap simultaneously without lock. If two
+ * users clear the same bit, one user will return remain bits,
+ * otherwise return 0.
+ */
+static int bitmap_clear_ll(unsigned long *map, int start, int nr)
+{
+ unsigned long *p = map + BIT_WORD(start);
+ const int size = start + nr;
+ int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+ unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+
+ while (nr - bits_to_clear >= 0) {
+ if (clear_bits_ll(p, mask_to_clear))
+ return nr;
+ nr -= bits_to_clear;
+ bits_to_clear = BITS_PER_LONG;
+ mask_to_clear = ~0UL;
+ p++;
+ }
+ if (nr) {
+ mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+ if (clear_bits_ll(p, mask_to_clear))
+ return nr;
+ }
+
+ return 0;
+}

/**
* gen_pool_create - create a new special memory pool
@@ -30,7 +156,7 @@ struct gen_pool *gen_pool_create(int min

pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid);
if (pool != NULL) {
- rwlock_init(&pool->lock);
+ spin_lock_init(&pool->lock);
INIT_LIST_HEAD(&pool->chunks);
pool->min_alloc_order = min_alloc_order;
}
@@ -58,15 +184,15 @@ int gen_pool_add(struct gen_pool *pool,

chunk = kmalloc_node(nbytes, GFP_KERNEL | __GFP_ZERO, nid);
if (unlikely(chunk == NULL))
- return -1;
+ return -ENOMEM;

- spin_lock_init(&chunk->lock);
chunk->start_addr = addr;
chunk->end_addr = addr + size;
+ atomic_set(&chunk->avail, size);

- write_lock(&pool->lock);
- list_add(&chunk->next_chunk, &pool->chunks);
- write_unlock(&pool->lock);
+ spin_lock(&pool->lock);
+ list_add_rcu(&chunk->next_chunk, &pool->chunks);
+ spin_unlock(&pool->lock);

return 0;
}
@@ -86,7 +212,6 @@ void gen_pool_destroy(struct gen_pool *p
int order = pool->min_alloc_order;
int bit, end_bit;

-
list_for_each_safe(_chunk, _next_chunk, &pool->chunks) {
chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
list_del(&chunk->next_chunk);
@@ -108,43 +233,47 @@ EXPORT_SYMBOL(gen_pool_destroy);
* @size: number of bytes to allocate from the pool
*
* Allocate the requested number of bytes from the specified pool.
- * Uses a first-fit algorithm.
+ * Uses a first-fit algorithm. Can not be used in NMI handler on
+ * architectures without NMI-safe cmpxchg implementation.
*/
unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
{
- struct list_head *_chunk;
struct gen_pool_chunk *chunk;
- unsigned long addr, flags;
+ unsigned long addr;
int order = pool->min_alloc_order;
- int nbits, start_bit, end_bit;
+ int nbits, start_bit = 0, end_bit, remain;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ BUG_ON(in_nmi());
+#endif

if (size == 0)
return 0;

nbits = (size + (1UL << order) - 1) >> order;
-
- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+ list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
+ if (size > atomic_read(&chunk->avail))
+ continue;

end_bit = (chunk->end_addr - chunk->start_addr) >> order;
-
- spin_lock_irqsave(&chunk->lock, flags);
- start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0,
- nbits, 0);
- if (start_bit >= end_bit) {
- spin_unlock_irqrestore(&chunk->lock, flags);
+retry:
+ start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit,
+ start_bit, nbits, 0);
+ if (start_bit >= end_bit)
continue;
+ remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
+ if (remain) {
+ remain = bitmap_clear_ll(chunk->bits, start_bit,
+ nbits - remain);
+ BUG_ON(remain);
+ goto retry;
}

addr = chunk->start_addr + ((unsigned long)start_bit << order);
-
- bitmap_set(chunk->bits, start_bit, nbits);
- spin_unlock_irqrestore(&chunk->lock, flags);
- read_unlock(&pool->lock);
+ size = nbits << order;
+ atomic_sub(size, &chunk->avail);
return addr;
}
- read_unlock(&pool->lock);
return 0;
}
EXPORT_SYMBOL(gen_pool_alloc);
@@ -155,33 +284,66 @@ EXPORT_SYMBOL(gen_pool_alloc);
* @addr: starting address of memory to free back to pool
* @size: size in bytes of memory to free
*
- * Free previously allocated special memory back to the specified pool.
+ * Free previously allocated special memory back to the specified
+ * pool. Can not be used in NMI handler on architectures without
+ * NMI-safe cmpxchg implementation.
*/
void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
{
- struct list_head *_chunk;
struct gen_pool_chunk *chunk;
- unsigned long flags;
int order = pool->min_alloc_order;
- int bit, nbits;
-
- nbits = (size + (1UL << order) - 1) >> order;
+ int start_bit, nbits, remain;

- read_lock(&pool->lock);
- list_for_each(_chunk, &pool->chunks) {
- chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+ BUG_ON(in_nmi());
+#endif

+ nbits = (size + (1UL << order) - 1) >> order;
+ list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
if (addr >= chunk->start_addr && addr < chunk->end_addr) {
BUG_ON(addr + size > chunk->end_addr);
- spin_lock_irqsave(&chunk->lock, flags);
- bit = (addr - chunk->start_addr) >> order;
- while (nbits--)
- __clear_bit(bit++, chunk->bits);
- spin_unlock_irqrestore(&chunk->lock, flags);
- break;
+ start_bit = (addr - chunk->start_addr) >> order;
+ remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
+ BUG_ON(remain);
+ size = nbits << order;
+ atomic_add(size, &chunk->avail);
+ return;
}
}
- BUG_ON(nbits > 0);
- read_unlock(&pool->lock);
+ BUG();
}
EXPORT_SYMBOL(gen_pool_free);
+
+/**
+ * gen_pool_avail - get available free space of the pool
+ * @pool: pool to get available free space
+ *
+ * Return available free space of the specified pool.
+ */
+size_t gen_pool_avail(struct gen_pool *pool)
+{
+ struct gen_pool_chunk *chunk;
+ size_t avail = 0;
+
+ list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk)
+ avail += atomic_read(&chunk->avail);
+ return avail;
+}
+EXPORT_SYMBOL_GPL(gen_pool_avail);
+
+/**
+ * gen_pool_size - get size in bytes of memory managed by the pool
+ * @pool: pool to get size
+ *
+ * Return size in bytes of memory managed by the pool.
+ */
+size_t gen_pool_size(struct gen_pool *pool)
+{
+ struct gen_pool_chunk *chunk;
+ size_t size = 0;
+
+ list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk)
+ size += chunk->end_addr - chunk->start_addr;
+ return size;
+}
+EXPORT_SYMBOL_GPL(gen_pool_size);

2011-04-13 21:08:01

by Mathieu Desnoyers

Subject: Re: [PATCH -v3 3/4] lib, Make gen_pool memory allocator lockless

* Huang Ying ([email protected]) wrote:
[...]
> + * rcu_read_lock and rcu_read_unlock is not used int gen_pool_alloc,
> + * gen_pool_free, gen_pool_avail and gen_pool_size etc, because chunks
> + * are only added into pool, not deleted from pool unless the pool
> + * itself is destroyed. If chunk will be deleted from pool,
> + * rcu_read_lock and rcu_read_unlock should be uses in these
> + * functions.

So how do you protect between pool destruction and adding chunks into
the pool ?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2011-04-13 21:11:04

by Mathieu Desnoyers

Subject: Re: [PATCH -v3 2/4] lib, Add lock-less NULL terminated single list

* Huang Ying ([email protected]) wrote:
> Cmpxchg is used to implement adding new entry to the list, deleting
> all entries from the list, deleting first entry of the list and some
> other operations.
>
> Because this is a single list, so the tail can not be accessed in O(1).
>
> If there are multiple producers and multiple consumers, llist_add can
> be used in producers and llist_del_all can be used in consumers. They
> can work simultaneously without lock. But llist_del_first can not be
> used here. Because llist_del_first depends on list->first->next does
> not changed if list->first is not changed during its operation, but
> llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
> llist_add) sequence in another consumer may violate that.
>
> If there are multiple producers and one consumer, llist_add can be
> used in producers and llist_del_all or llist_del_first can be used in
> the consumer.
>
> This can be summarized as follow:
>
> | add | del_first | del_all
> add | - | - | -
> del_first | | L | L
> del_all | | | -
>
> Where "-" stands for no lock is needed, while "L" stands for lock is
> needed.
>
> The list entries deleted via llist_del_all can be traversed with
> traversing function such as llist_for_each etc. But the list entries
> can not be traversed safely before deleted from the list. The order
> of deleted entries is from the newest to the oldest added one. If you
> want to traverse from the oldest to the newest, you must reverse the
> order by yourself before traversing.
>
> The basic atomic operation of this list is cmpxchg on long. On
> architectures that don't have NMI-safe cmpxchg implementation, the
> list can NOT be used in NMI handler. So code uses the list in NMI
> handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
>
> Signed-off-by: Huang Ying <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>

Reviewed-by: Mathieu Desnoyers <[email protected]>

Thanks,

Mathieu

> [...]

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2011-04-14 01:29:17

by Huang, Ying

Subject: Re: [PATCH -v3 3/4] lib, Make gen_pool memory allocator lockless

On 04/14/2011 05:07 AM, Mathieu Desnoyers wrote:
> * Huang Ying ([email protected]) wrote:
> [...]
>> + * rcu_read_lock and rcu_read_unlock is not used int gen_pool_alloc,
>> + * gen_pool_free, gen_pool_avail and gen_pool_size etc, because chunks
>> + * are only added into pool, not deleted from pool unless the pool
>> + * itself is destroyed. If chunk will be deleted from pool,
>> + * rcu_read_lock and rcu_read_unlock should be uses in these
>> + * functions.
>
> So how do you protect between pool destruction and adding chunks into
> the pool ?

Because the pool itself is freed on destruction, we need some
mechanism outside of the pool. For example, if gen_pool_add() is
called via a device file IOCTL, we must un-register the device file
first, and destroy the pool only after the last reference to the
device has gone.
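
To make the ordering concrete, a minimal sketch of the teardown
described above.  The miscdevice and gen_pool details are hypothetical
(the ioctl handler that would call gen_pool_add() is not shown), and a
real driver must also wait for the last reference to the device to go
away before destroying the pool:

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/genalloc.h>
#include <linux/miscdevice.h>

static struct gen_pool *err_pool;	/* chunks added via ioctl */

static const struct file_operations err_dev_fops = {
	.owner = THIS_MODULE,
	/* .unlocked_ioctl would call gen_pool_add() on err_pool */
};

static struct miscdevice err_dev = {
	.minor = MISC_DYNAMIC_MINOR,
	.name  = "err_pool",
	.fops  = &err_dev_fops,
};

static int __init err_dev_init(void)
{
	int ret;

	err_pool = gen_pool_create(6, -1);	/* 2^6-byte min alloc, any node */
	if (!err_pool)
		return -ENOMEM;

	ret = misc_register(&err_dev);
	if (ret)
		gen_pool_destroy(err_pool);
	return ret;
}

static void __exit err_dev_exit(void)
{
	/*
	 * Un-register the device first so that no new gen_pool_add()
	 * callers can appear; only then is it safe to free the pool.
	 */
	misc_deregister(&err_dev);
	gen_pool_destroy(err_pool);
}

module_init(err_dev_init);
module_exit(err_dev_exit);
MODULE_LICENSE("GPL");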

Best Regards,
Huang Ying

2011-04-15 16:46:40

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH -v3 3/4] lib, Make gen_pool memory allocator lockless

* Huang Ying ([email protected]) wrote:
> On 04/14/2011 05:07 AM, Mathieu Desnoyers wrote:
> > * Huang Ying ([email protected]) wrote:
> > [...]
> >> + * rcu_read_lock and rcu_read_unlock is not used int gen_pool_alloc,
> >> + * gen_pool_free, gen_pool_avail and gen_pool_size etc, because chunks
> >> + * are only added into pool, not deleted from pool unless the pool
> >> + * itself is destroyed. If chunk will be deleted from pool,
> >> + * rcu_read_lock and rcu_read_unlock should be uses in these
> >> + * functions.
> >
> > So how do you protect between pool destruction and adding chunks into
> > the pool ?
>
> Because the pool itself will be freed when destruction, we need some
> mechanism outside of pool. For example, if gen_pool_add() is called via
> device file IOCTL, we must un-register the device file first, and
> destroy the pool after the last reference to device has gone.

I am concerned about the list_for_each_entry_rcu() (and thus
rcu_dereference_raw()) used outside of rcu_read_lock/unlock pairs.
Validation infrastructure has recently been added to RCU: it triggers
warnings when these situations are encountered in some RCU debugging
configurations. The case of RCU list iteration is not yet covered by
those checks, but it is worth being aware of.

So although it seems like your code does not require RCU read-side
critical sections, I'd prefer to let Paul McKenney have a look.

Thanks,

Mathieu

>
> Best Regards,
> Huang Ying

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2011-04-15 17:43:26

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH -v3 3/4] lib, Make gen_pool memory allocator lockless

On Fri, Apr 15, 2011 at 12:46:29PM -0400, Mathieu Desnoyers wrote:
> * Huang Ying ([email protected]) wrote:
> > On 04/14/2011 05:07 AM, Mathieu Desnoyers wrote:
> > > * Huang Ying ([email protected]) wrote:
> > > [...]
> > >> + * rcu_read_lock and rcu_read_unlock is not used int gen_pool_alloc,
> > >> + * gen_pool_free, gen_pool_avail and gen_pool_size etc, because chunks
> > >> + * are only added into pool, not deleted from pool unless the pool
> > >> + * itself is destroyed. If chunk will be deleted from pool,
> > >> + * rcu_read_lock and rcu_read_unlock should be uses in these
> > >> + * functions.
> > >
> > > So how do you protect between pool destruction and adding chunks into
> > > the pool ?
> >
> > Because the pool itself will be freed when destruction, we need some
> > mechanism outside of pool. For example, if gen_pool_add() is called via
> > device file IOCTL, we must un-register the device file first, and
> > destroy the pool after the last reference to device has gone.
>
> I am concerned about the list_for_each_entry_rcu() (and thus
> rcu_dereference_raw()) used outside of rcu_read_lock/unlock pairs.
> Validation infrastructure as recently been added to RCU: it triggers
> warnings when these situations are encountered in some RCU debugging
> configurations. The case of RCU list iteration is not covered by the
> checks, but it would make sense to be aware of it.
>
> So although it seems like your code does not require rcu read lock
> critical sections, I'd prefer to let Paul McKenney have a look.

As long as you add elements and never remove them, then you can get
away with using list_for_each_entry_rcu() outside of an RCU read-side
critical section. But please comment this -- it is all too easy
for someone to decide later to start deleting elements without also
inserting the needed rcu_read_lock() and rcu_read_unlock() pairs.
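
For illustration, a sketch of the kind of comment being asked for,
with demo_pool/demo_chunk as stand-in types loosely modeled on the
gen_pool patch (they are not the patch's actual identifiers):

#include <linux/rculist.h>

struct demo_chunk {
	struct list_head next_chunk;	/* linked into demo_pool.chunks */
	unsigned long start, end;
};

struct demo_pool {
	struct list_head chunks;	/* chunks are added, never removed */
};

static unsigned long demo_pool_size(struct demo_pool *pool)
{
	struct demo_chunk *chunk;
	unsigned long size = 0;

	/*
	 * No rcu_read_lock() here: chunks are only ever added to the
	 * pool and never removed before the pool itself is destroyed,
	 * so iterating with list_for_each_entry_rcu() outside an RCU
	 * read-side critical section is safe.  If chunk removal is
	 * ever introduced, rcu_read_lock()/rcu_read_unlock() (plus a
	 * grace period before freeing) become mandatory.
	 */
	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk)
		size += chunk->end - chunk->start + 1;

	return size;
}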

But I have lost the thread -- what code am I supposed to look at?

Thanx, Paul

2011-04-15 17:45:03

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH -v3 3/4] lib, Make gen_pool memory allocator lockless

* Paul E. McKenney ([email protected]) wrote:
> On Fri, Apr 15, 2011 at 12:46:29PM -0400, Mathieu Desnoyers wrote:
> > * Huang Ying ([email protected]) wrote:
> > > On 04/14/2011 05:07 AM, Mathieu Desnoyers wrote:
> > > > * Huang Ying ([email protected]) wrote:
> > > > [...]
> > > >> + * rcu_read_lock and rcu_read_unlock is not used int gen_pool_alloc,
> > > >> + * gen_pool_free, gen_pool_avail and gen_pool_size etc, because chunks
> > > >> + * are only added into pool, not deleted from pool unless the pool
> > > >> + * itself is destroyed. If chunk will be deleted from pool,
> > > >> + * rcu_read_lock and rcu_read_unlock should be uses in these
> > > >> + * functions.
> > > >
> > > > So how do you protect between pool destruction and adding chunks into
> > > > the pool ?
> > >
> > > Because the pool itself will be freed when destruction, we need some
> > > mechanism outside of pool. For example, if gen_pool_add() is called via
> > > device file IOCTL, we must un-register the device file first, and
> > > destroy the pool after the last reference to device has gone.
> >
> > I am concerned about the list_for_each_entry_rcu() (and thus
> > rcu_dereference_raw()) used outside of rcu_read_lock/unlock pairs.
> > Validation infrastructure as recently been added to RCU: it triggers
> > warnings when these situations are encountered in some RCU debugging
> > configurations. The case of RCU list iteration is not covered by the
> > checks, but it would make sense to be aware of it.
> >
> > So although it seems like your code does not require rcu read lock
> > critical sections, I'd prefer to let Paul McKenney have a look.
>
> As long as you add elements and never remove them, then you can get
> away with using list_for_each_entry_rcu() outside of an RCU read-side
> critical section. But please comment this -- it is all too easy
> for someone to decide later to start deleting elements without also
> inserting the needed rcu_read_lock() and rcu_read_unlock() pairs.
>
> But I have lost the thread -- what code am I supposed to look at?

You can have a look at https://lkml.org/lkml/2011/4/13/56

Thanks!

Mathieu

>
> Thanx, Paul

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2011-04-18 11:42:25

by huang ying

[permalink] [raw]
Subject: Re: [PATCH -v3 3/4] lib, Make gen_pool memory allocator lockless

Hi, Paul,

On Sat, Apr 16, 2011 at 1:44 AM, Mathieu Desnoyers
<[email protected]> wrote:
> * Paul E. McKenney ([email protected]) wrote:
>> On Fri, Apr 15, 2011 at 12:46:29PM -0400, Mathieu Desnoyers wrote:
>> > * Huang Ying ([email protected]) wrote:
>> > > On 04/14/2011 05:07 AM, Mathieu Desnoyers wrote:
>> > > > * Huang Ying ([email protected]) wrote:
>> > > > [...]
>> > > >> + * rcu_read_lock and rcu_read_unlock is not used int gen_pool_alloc,
>> > > >> + * gen_pool_free, gen_pool_avail and gen_pool_size etc, because chunks
>> > > >> + * are only added into pool, not deleted from pool unless the pool
>> > > >> + * itself is destroyed.  If chunk will be deleted from pool,
>> > > >> + * rcu_read_lock and rcu_read_unlock should be uses in these
>> > > >> + * functions.
>> > > >
>> > > > So how do you protect between pool destruction and adding chunks into
>> > > > the pool ?
>> > >
>> > > Because the pool itself will be freed when destruction, we need some
>> > > mechanism outside of pool.  For example, if gen_pool_add() is called via
>> > > device file IOCTL, we must un-register the device file first, and
>> > > destroy the pool after the last reference to device has gone.
>> >
>> > I am concerned about the list_for_each_entry_rcu() (and thus
>> > rcu_dereference_raw()) used outside of rcu_read_lock/unlock pairs.
>> > Validation infrastructure as recently been added to RCU: it triggers
>> > warnings when these situations are encountered in some RCU debugging
>> > configurations. The case of RCU list iteration is not covered by the
>> > checks, but it would make sense to be aware of it.
>> >
>> > So although it seems like your code does not require rcu read lock
>> > critical sections, I'd prefer to let Paul McKenney have a look.
>>
>> As long as you add elements and never remove them, then you can get
>> away with using list_for_each_entry_rcu() outside of an RCU read-side
>> critical section.  But please comment this -- it is all too easy
>> for someone to decide later to start deleting elements without also
>> inserting the needed rcu_read_lock() and rcu_read_unlock() pairs.
>>
>> But I have lost the thread -- what code am I supposed to look at?
>
> You can have a look at https://lkml.org/lkml/2011/4/13/56

What do you think about this patch and its usage of RCU?

Best Regards,
Huang Ying