2008-07-31 09:41:09

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH 0/3] powerpc - Make the irq reverse mapping tree lockless

Hi ,

here is a respin of the patches I posted last week for the RT kernel now targeted
for mainline (http://lkml.org/lkml/2008/7/24/98). Thomas, steven, a note for you
at the end.

The goal of this patchset is to simplify the locking constraints on the radix
tree used for IRQ reverse mapping on the pSeries machines and provide lockless
access to this tree.

This also solves the following BUG under preempt-rt:

BUG: sleeping function called from invalid context swapper(1) at kernel/rtmutex.c:739
in_atomic():1 [00000002], irqs_disabled():1
Call Trace:
[c0000001e20f3340] [c000000000010370] .show_stack+0x70/0x1bc (unreliable)
[c0000001e20f33f0] [c000000000049380] .__might_sleep+0x11c/0x138
[c0000001e20f3470] [c0000000002a2f64] .__rt_spin_lock+0x3c/0x98
[c0000001e20f34f0] [c0000000000c3f20] .kmem_cache_alloc+0x68/0x184
[c0000001e20f3590] [c000000000193f3c] .radix_tree_node_alloc+0xf0/0x144
[c0000001e20f3630] [c000000000195190] .radix_tree_insert+0x18c/0x2fc
[c0000001e20f36f0] [c00000000000c710] .irq_radix_revmap+0x1a4/0x1e4
[c0000001e20f37b0] [c00000000003b3f0] .xics_startup+0x30/0x54
[c0000001e20f3840] [c00000000008b864] .setup_irq+0x26c/0x370
[c0000001e20f38f0] [c00000000008ba68] .request_irq+0x100/0x158
[c0000001e20f39a0] [c0000000001ee9c0] .hvc_open+0xb4/0x148
[c0000001e20f3a40] [c0000000001d72ec] .tty_open+0x200/0x368
[c0000001e20f3af0] [c0000000000ce928] .chrdev_open+0x1f4/0x25c
[c0000001e20f3ba0] [c0000000000c8bf0] .__dentry_open+0x188/0x2c8
[c0000001e20f3c50] [c0000000000c8dec] .do_filp_open+0x50/0x70
[c0000001e20f3d70] [c0000000000c8e8c] .do_sys_open+0x80/0x148
[c0000001e20f3e20] [c00000000000928c] .init_post+0x4c/0x100
[c0000001e20f3ea0] [c0000000003c0e0c] .kernel_init+0x428/0x478
[c0000001e20f3f90] [c000000000027448] .kernel_thread+0x4c/0x68

The root cause of this bug lies in the fact that the XICS interrupt controller
uses a radix tree for its reverse irq mapping and that we cannot allocate the tree
nodes (even GFP_ATOMIC) with preemption disabled.

In fact, we have 2 nested preemption disabling when we want to allocate
a new node:

- setup_irq() does a spin_lock_irqsave() before calling xics_startup() which
then calls irq_radix_revmap() to insert a new node in the tree

- irq_radix_revmap() also does a spin_lock_irqsave() (in irq_radix_wrlock())
before the radix_tree_insert()

Also, if an IRQ gets registered before the tree is initialized (namely the
IPI), it will be inserted into the tree in interrupt context once the tree
have been initialized, hence the need for a spin_lock_irqsave() in the insertion
path.

This serie is split into 3 patches:

- The first patch moves the initialization of the radix tree earlier in the
boot process before any IRQ gets registered, but after the mm is up.

- The second patch splits irq_radix_revmap() into its 2 components: one
for lookup and one for insertion into the radix tree.

- And finally, the third patch makes the radix tree fully lockless on the
lookup side.


Here is the diffstat for the whole patchset:

arch/powerpc/kernel/irq.c | 134 ++++++++-------------------------
arch/powerpc/platforms/pseries/smp.c | 1 +
arch/powerpc/platforms/pseries/xics.c | 11 +--
include/asm-powerpc/irq.h | 24 +++++-
4 files changed, 58 insertions(+), 112 deletions(-)


Thomas, Steven, the first 2 patches can be applied seamlessly to 2.6.26-rt1
with offsets, the third patch has a trivial to fix reject in
arch/powerpc/kernel/irq.c because the irq_big_lock is changed to a raw spinlock
in preempt-rt. If you want those patches for RT, just flag me, I have those
sitting on my test box.



Thanks,

Sebastien.



2008-07-31 09:40:32

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH] powerpc - Separate the irq radix tree insertion and lookup

irq_radix_revmap() currently serves 2 purposes, irq mapping lookup
and insertion which happen in interrupt and process context respectively.

Separate the function into its 2 components, one for lookup only and one
for insertion only.


Signed-off-by: Sebastien Dugue <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
---
arch/powerpc/kernel/irq.c | 25 ++++++++++++++-----------
arch/powerpc/platforms/pseries/xics.c | 11 ++++-------
include/asm-powerpc/irq.h | 18 +++++++++++++++---
3 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 0a1445c..083b181 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -900,34 +900,37 @@ void __init irq_radix_revmap_init(void)
}
}

-unsigned int irq_radix_revmap(struct irq_host *host,
- irq_hw_number_t hwirq)
+unsigned int irq_radix_revmap_lookup(struct irq_host *host,
+ irq_hw_number_t hwirq)
{
struct irq_map_entry *ptr;
- unsigned int virq;
+ unsigned int virq = NO_IRQ;
unsigned long flags;

WARN_ON(host->revmap_type != IRQ_HOST_MAP_TREE);

- /* Try to resolve */
irq_radix_rdlock(&flags);
ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);
irq_radix_rdunlock(flags);

- /* Found it, return */
- if (ptr) {
+ if (ptr)
virq = ptr - irq_map;
- return virq;
- }

- /* If not there, try to insert it */
- virq = irq_find_mapping(host, hwirq);
+ return virq;
+}
+
+void irq_radix_revmap_insert(struct irq_host *host, unsigned int virq,
+ irq_hw_number_t hwirq)
+{
+ unsigned long flags;
+
+ WARN_ON(host->revmap_type != IRQ_HOST_MAP_TREE);
+
if (virq != NO_IRQ) {
irq_radix_wrlock(&flags);
radix_tree_insert(&host->revmap_data.tree, hwirq, &irq_map[virq]);
irq_radix_wrunlock(flags);
}
- return virq;
}

unsigned int irq_linear_revmap(struct irq_host *host,
diff --git a/arch/powerpc/platforms/pseries/xics.c b/arch/powerpc/platforms/pseries/xics.c
index 0fc830f..6b1a005 100644
--- a/arch/powerpc/platforms/pseries/xics.c
+++ b/arch/powerpc/platforms/pseries/xics.c
@@ -310,12 +310,6 @@ static void xics_mask_irq(unsigned int virq)

static unsigned int xics_startup(unsigned int virq)
{
- unsigned int irq;
-
- /* force a reverse mapping of the interrupt so it gets in the cache */
- irq = (unsigned int)irq_map[virq].hwirq;
- irq_radix_revmap(xics_host, irq);
-
/* unmask it */
xics_unmask_irq(virq);
return 0;
@@ -346,7 +340,7 @@ static inline unsigned int xics_remap_irq(unsigned int vec)

if (vec == XICS_IRQ_SPURIOUS)
return NO_IRQ;
- irq = irq_radix_revmap(xics_host, vec);
+ irq = irq_radix_revmap_lookup(xics_host, vec);
if (likely(irq != NO_IRQ))
return irq;

@@ -530,6 +524,9 @@ static int xics_host_map(struct irq_host *h, unsigned int virq,
{
pr_debug("xics: map virq %d, hwirq 0x%lx\n", virq, hw);

+ /* Insert the interrupt mapping into the radix tree for fast lookup */
+ irq_radix_revmap_insert(xics_host, virq, hw);
+
get_irq_desc(virq)->status |= IRQ_LEVEL;
set_irq_chip_and_handler(virq, xics_irq_chip, handle_fasteoi_irq);
return 0;
diff --git a/include/asm-powerpc/irq.h b/include/asm-powerpc/irq.h
index 47b8119..5c88acf 100644
--- a/include/asm-powerpc/irq.h
+++ b/include/asm-powerpc/irq.h
@@ -243,15 +243,27 @@ extern unsigned int irq_create_direct_mapping(struct irq_host *host);
extern void __init irq_radix_revmap_init(void);

/**
- * irq_radix_revmap - Find a linux virq from a hw irq number.
+ * irq_radix_revmap_insert - Insert a hw irq to linux virq number mapping.
+ * @host: host owning this hardware interrupt
+ * @virq: linux irq number
+ * @hwirq: hardware irq number in that host space
+ *
+ * This is for use by irq controllers that use a radix tree reverse
+ * mapping for fast lookup.
+ */
+extern void irq_radix_revmap_insert(struct irq_host *host, unsigned int virq,
+ irq_hw_number_t hwirq);
+
+/**
+ * irq_radix_revmap_lookup - Find a linux virq from a hw irq number.
* @host: host owning this hardware interrupt
* @hwirq: hardware irq number in that host space
*
* This is a fast path, for use by irq controller code that uses radix tree
* revmaps
*/
-extern unsigned int irq_radix_revmap(struct irq_host *host,
- irq_hw_number_t hwirq);
+extern unsigned int irq_radix_revmap_lookup(struct irq_host *host,
+ irq_hw_number_t hwirq);

/**
* irq_linear_revmap - Find a linux virq from a hw irq number.
--
1.5.5.1

2008-07-31 09:40:49

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH] powerpc - Initialize the irq radix tree earlier

The radix tree used for fast irq reverse mapping by the XICS is initialized
late in the boot process, after the first interrupt (IPI) gets registered
and after the first IPI is received.

This patch moves the initialization of the XICS radix tree earlier into
the boot process in smp_xics_probe() (the mm is already up but no interrupts
have been registered at that point) to avoid having to insert a mapping into
the tree in interrupt context. This will help in simplifying the locking
constraints and move to a lockless radix tree in subsequent patches.

As a nice side effect, there is no need any longer to check for
(host->revmap_data.tree.gfp_mask != 0) to know if the tree have been
initialized.


Signed-off-by: Sebastien Dugue <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
---
arch/powerpc/kernel/irq.c | 44 +++++++++------------------------
arch/powerpc/platforms/pseries/smp.c | 1 +
include/asm-powerpc/irq.h | 5 ++++
3 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 6ac8612..0a1445c 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -840,9 +840,6 @@ void irq_dispose_mapping(unsigned int virq)
host->revmap_data.linear.revmap[hwirq] = NO_IRQ;
break;
case IRQ_HOST_MAP_TREE:
- /* Check if radix tree allocated yet */
- if (host->revmap_data.tree.gfp_mask == 0)
- break;
irq_radix_wrlock(&flags);
radix_tree_delete(&host->revmap_data.tree, hwirq);
irq_radix_wrunlock(flags);
@@ -893,28 +890,28 @@ unsigned int irq_find_mapping(struct irq_host *host,
}
EXPORT_SYMBOL_GPL(irq_find_mapping);

+void __init irq_radix_revmap_init(void)
+{
+ struct irq_host *h;
+
+ list_for_each_entry(h, &irq_hosts, link) {
+ if (h->revmap_type == IRQ_HOST_MAP_TREE)
+ INIT_RADIX_TREE(&h->revmap_data.tree, GFP_ATOMIC);
+ }
+}

unsigned int irq_radix_revmap(struct irq_host *host,
irq_hw_number_t hwirq)
{
- struct radix_tree_root *tree;
struct irq_map_entry *ptr;
unsigned int virq;
unsigned long flags;

WARN_ON(host->revmap_type != IRQ_HOST_MAP_TREE);

- /* Check if the radix tree exist yet. We test the value of
- * the gfp_mask for that. Sneaky but saves another int in the
- * structure. If not, we fallback to slow mode
- */
- tree = &host->revmap_data.tree;
- if (tree->gfp_mask == 0)
- return irq_find_mapping(host, hwirq);
-
- /* Now try to resolve */
+ /* Try to resolve */
irq_radix_rdlock(&flags);
- ptr = radix_tree_lookup(tree, hwirq);
+ ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);
irq_radix_rdunlock(flags);

/* Found it, return */
@@ -927,7 +924,7 @@ unsigned int irq_radix_revmap(struct irq_host *host,
virq = irq_find_mapping(host, hwirq);
if (virq != NO_IRQ) {
irq_radix_wrlock(&flags);
- radix_tree_insert(tree, hwirq, &irq_map[virq]);
+ radix_tree_insert(&host->revmap_data.tree, hwirq, &irq_map[virq]);
irq_radix_wrunlock(flags);
}
return virq;
@@ -1035,23 +1032,6 @@ void irq_early_init(void)
get_irq_desc(i)->status |= IRQ_NOREQUEST;
}

-/* We need to create the radix trees late */
-static int irq_late_init(void)
-{
- struct irq_host *h;
- unsigned long flags;
-
- irq_radix_wrlock(&flags);
- list_for_each_entry(h, &irq_hosts, link) {
- if (h->revmap_type == IRQ_HOST_MAP_TREE)
- INIT_RADIX_TREE(&h->revmap_data.tree, GFP_ATOMIC);
- }
- irq_radix_wrunlock(flags);
-
- return 0;
-}
-arch_initcall(irq_late_init);
-
#ifdef CONFIG_VIRQ_DEBUG
static int virq_debug_show(struct seq_file *m, void *private)
{
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 9d8f8c8..b143fe7 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -130,6 +130,7 @@ static void smp_xics_message_pass(int target, int msg)

static int __init smp_xics_probe(void)
{
+ irq_radix_revmap_init();
xics_request_IPIs();

return cpus_weight(cpu_possible_map);
diff --git a/include/asm-powerpc/irq.h b/include/asm-powerpc/irq.h
index 1ef8e30..47b8119 100644
--- a/include/asm-powerpc/irq.h
+++ b/include/asm-powerpc/irq.h
@@ -237,6 +237,11 @@ extern unsigned int irq_find_mapping(struct irq_host *host,
*/
extern unsigned int irq_create_direct_mapping(struct irq_host *host);

+/*
+ * Initialize the radix tree used by some irq controllers
+ */
+extern void __init irq_radix_revmap_init(void);
+
/**
* irq_radix_revmap - Find a linux virq from a hw irq number.
* @host: host owning this hardware interrupt
--
1.5.5.1

2008-07-31 09:41:29

by Sébastien Dugué

[permalink] [raw]
Subject: [PATCH] powerpc - Make the irq reverse mapping radix tree lockless

The radix trees used by interrupt controllers for their irq reverse mapping
(currently only the XICS found on pSeries) have a complex locking scheme
dating back to before the advent of the lockless radix tree.

Take advantage of this and of the fact that the items of the tree are
pointers to a static array (irq_map) elements which can never go under us
to simplify the locking.

Concurrency between readers and writers is handled by the intrinsic
properties of the lockless radix tree. Concurrency between writers is handled
with a spinlock added to the irq_host structure.


Signed-off-by: Sebastien Dugue <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
---
arch/powerpc/kernel/irq.c | 75 ++++++--------------------------------------
include/asm-powerpc/irq.h | 1 +
2 files changed, 12 insertions(+), 64 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 083b181..3aa683b 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -458,8 +458,6 @@ void do_softirq(void)

static LIST_HEAD(irq_hosts);
static DEFINE_SPINLOCK(irq_big_lock);
-static DEFINE_PER_CPU(unsigned int, irq_radix_reader);
-static unsigned int irq_radix_writer;
struct irq_map_entry irq_map[NR_IRQS];
static unsigned int irq_virq_count = NR_IRQS;
static struct irq_host *irq_default_host;
@@ -602,57 +600,6 @@ void irq_set_virq_count(unsigned int count)
irq_virq_count = count;
}

-/* radix tree not lockless safe ! we use a brlock-type mecanism
- * for now, until we can use a lockless radix tree
- */
-static void irq_radix_wrlock(unsigned long *flags)
-{
- unsigned int cpu, ok;
-
- spin_lock_irqsave(&irq_big_lock, *flags);
- irq_radix_writer = 1;
- smp_mb();
- do {
- barrier();
- ok = 1;
- for_each_possible_cpu(cpu) {
- if (per_cpu(irq_radix_reader, cpu)) {
- ok = 0;
- break;
- }
- }
- if (!ok)
- cpu_relax();
- } while(!ok);
-}
-
-static void irq_radix_wrunlock(unsigned long flags)
-{
- smp_wmb();
- irq_radix_writer = 0;
- spin_unlock_irqrestore(&irq_big_lock, flags);
-}
-
-static void irq_radix_rdlock(unsigned long *flags)
-{
- local_irq_save(*flags);
- __get_cpu_var(irq_radix_reader) = 1;
- smp_mb();
- if (likely(irq_radix_writer == 0))
- return;
- __get_cpu_var(irq_radix_reader) = 0;
- smp_wmb();
- spin_lock(&irq_big_lock);
- __get_cpu_var(irq_radix_reader) = 1;
- spin_unlock(&irq_big_lock);
-}
-
-static void irq_radix_rdunlock(unsigned long flags)
-{
- __get_cpu_var(irq_radix_reader) = 0;
- local_irq_restore(flags);
-}
-
static int irq_setup_virq(struct irq_host *host, unsigned int virq,
irq_hw_number_t hwirq)
{
@@ -807,7 +754,6 @@ void irq_dispose_mapping(unsigned int virq)
{
struct irq_host *host;
irq_hw_number_t hwirq;
- unsigned long flags;

if (virq == NO_IRQ)
return;
@@ -840,9 +786,9 @@ void irq_dispose_mapping(unsigned int virq)
host->revmap_data.linear.revmap[hwirq] = NO_IRQ;
break;
case IRQ_HOST_MAP_TREE:
- irq_radix_wrlock(&flags);
+ spin_lock(&host->tree_lock);
radix_tree_delete(&host->revmap_data.tree, hwirq);
- irq_radix_wrunlock(flags);
+ spin_unlock(&host->tree_lock);
break;
}

@@ -895,8 +841,10 @@ void __init irq_radix_revmap_init(void)
struct irq_host *h;

list_for_each_entry(h, &irq_hosts, link) {
- if (h->revmap_type == IRQ_HOST_MAP_TREE)
+ if (h->revmap_type == IRQ_HOST_MAP_TREE) {
INIT_RADIX_TREE(&h->revmap_data.tree, GFP_ATOMIC);
+ spin_lock_init(&h->tree_lock);
+ }
}
}

@@ -905,13 +853,14 @@ unsigned int irq_radix_revmap_lookup(struct irq_host *host,
{
struct irq_map_entry *ptr;
unsigned int virq = NO_IRQ;
- unsigned long flags;

WARN_ON(host->revmap_type != IRQ_HOST_MAP_TREE);

- irq_radix_rdlock(&flags);
+ /*
+ * No rcu_read_lock(ing) needed, the ptr returned can't go under us
+ * as it's referencing an entry in the static irq_map table.
+ */
ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);
- irq_radix_rdunlock(flags);

if (ptr)
virq = ptr - irq_map;
@@ -922,14 +871,12 @@ unsigned int irq_radix_revmap_lookup(struct irq_host *host,
void irq_radix_revmap_insert(struct irq_host *host, unsigned int virq,
irq_hw_number_t hwirq)
{
- unsigned long flags;
-
WARN_ON(host->revmap_type != IRQ_HOST_MAP_TREE);

if (virq != NO_IRQ) {
- irq_radix_wrlock(&flags);
+ spin_lock(&host->tree_lock);
radix_tree_insert(&host->revmap_data.tree, hwirq, &irq_map[virq]);
- irq_radix_wrunlock(flags);
+ spin_unlock(&host->tree_lock);
}
}

diff --git a/include/asm-powerpc/irq.h b/include/asm-powerpc/irq.h
index 5c88acf..2ae395f 100644
--- a/include/asm-powerpc/irq.h
+++ b/include/asm-powerpc/irq.h
@@ -121,6 +121,7 @@ struct irq_host {
} linear;
struct radix_tree_root tree;
} revmap_data;
+ spinlock_t tree_lock;
struct irq_host_ops *ops;
void *host_data;
irq_hw_number_t inval_irq;
--
1.5.5.1

2008-07-31 10:11:57

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [PATCH 0/3] powerpc - Make the irq reverse mapping tree lockless


OK, I goofed up with git-format-patch, forgot the --numbered option.

The patches subjects should read:

[PATCH 1/3] powerpc - Initialize the irq radix tree earlier
[PATCH 2/3] powerpc - Separate the irq radix tree insertion and lookup
[PATCH 3/3] powerpc - Make the irq reverse mapping radix tree lockless

Sorry for that.

Sebastien.

2008-07-31 11:41:12

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH] powerpc - Initialize the irq radix tree earlier

On Thu, 2008-07-31 at 11:40 +0200, Sebastien Dugue wrote:
> The radix tree used for fast irq reverse mapping by the XICS is initialized
> late in the boot process, after the first interrupt (IPI) gets registered
> and after the first IPI is received.
>
> This patch moves the initialization of the XICS radix tree earlier into
> the boot process in smp_xics_probe() (the mm is already up but no interrupts
> have been registered at that point) to avoid having to insert a mapping into
> the tree in interrupt context. This will help in simplifying the locking
> constraints and move to a lockless radix tree in subsequent patches.
>
> As a nice side effect, there is no need any longer to check for
> (host->revmap_data.tree.gfp_mask != 0) to know if the tree have been
> initialized.

Hi Sebastien,

This is a nice cleanup, I think :)

> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index 6ac8612..0a1445c 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -893,28 +890,28 @@ unsigned int irq_find_mapping(struct irq_host *host,
> }
> EXPORT_SYMBOL_GPL(irq_find_mapping);
>
> +void __init irq_radix_revmap_init(void)
> +{
> + struct irq_host *h;
> +
> + list_for_each_entry(h, &irq_hosts, link) {
> + if (h->revmap_type == IRQ_HOST_MAP_TREE)
> + INIT_RADIX_TREE(&h->revmap_data.tree, GFP_ATOMIC);
> + }
> +}

Note irq_radix_revmap_init() loops over all irq_hosts ...

> diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
> index 9d8f8c8..b143fe7 100644
> --- a/arch/powerpc/platforms/pseries/smp.c
> +++ b/arch/powerpc/platforms/pseries/smp.c
> @@ -130,6 +130,7 @@ static void smp_xics_message_pass(int target, int msg)
>
> static int __init smp_xics_probe(void)
> {
> + irq_radix_revmap_init();
> xics_request_IPIs();

But now it's only called from the xics setup code.

Which seems a bit ugly. In practice it doesn't matter because at the
moment xics is the only user of the radix revmap. But if we're going to
switch to this sort of initialisation I think xics should only be
init'ing the revmap for itself.


This boot ordering stuff is pretty hairy, so I might have missed
something, but this is how the code is ordered AFAICT:

start_kernel()
init_IRQ()
...
local_irq_enable()
...
rest_init()
kernel_thread()
kernel_init()
smp_prepare_cpus()
smp_xics_probe() (via smp_ops->probe())


What's stopping us from taking an irq between local_irq_enable() and
smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet?


cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-07-31 12:01:15

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [PATCH] powerpc - Initialize the irq radix tree earlier


Hi Michael,

On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman <[email protected]> wrote:

> On Thu, 2008-07-31 at 11:40 +0200, Sebastien Dugue wrote:
> > The radix tree used for fast irq reverse mapping by the XICS is initialized
> > late in the boot process, after the first interrupt (IPI) gets registered
> > and after the first IPI is received.
> >
> > This patch moves the initialization of the XICS radix tree earlier into
> > the boot process in smp_xics_probe() (the mm is already up but no interrupts
> > have been registered at that point) to avoid having to insert a mapping into
> > the tree in interrupt context. This will help in simplifying the locking
> > constraints and move to a lockless radix tree in subsequent patches.
> >
> > As a nice side effect, there is no need any longer to check for
> > (host->revmap_data.tree.gfp_mask != 0) to know if the tree have been
> > initialized.
>
> Hi Sebastien,
>
> This is a nice cleanup, I think :)

Thanks.

>
> > diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> > index 6ac8612..0a1445c 100644
> > --- a/arch/powerpc/kernel/irq.c
> > +++ b/arch/powerpc/kernel/irq.c
> > @@ -893,28 +890,28 @@ unsigned int irq_find_mapping(struct irq_host *host,
> > }
> > EXPORT_SYMBOL_GPL(irq_find_mapping);
> >
> > +void __init irq_radix_revmap_init(void)
> > +{
> > + struct irq_host *h;
> > +
> > + list_for_each_entry(h, &irq_hosts, link) {
> > + if (h->revmap_type == IRQ_HOST_MAP_TREE)
> > + INIT_RADIX_TREE(&h->revmap_data.tree, GFP_ATOMIC);
> > + }
> > +}
>
> Note irq_radix_revmap_init() loops over all irq_hosts ...

Yep, but there's only one host (xics)

>
> > diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
> > index 9d8f8c8..b143fe7 100644
> > --- a/arch/powerpc/platforms/pseries/smp.c
> > +++ b/arch/powerpc/platforms/pseries/smp.c
> > @@ -130,6 +130,7 @@ static void smp_xics_message_pass(int target, int msg)
> >
> > static int __init smp_xics_probe(void)
> > {
> > + irq_radix_revmap_init();
> > xics_request_IPIs();
>
> But now it's only called from the xics setup code.
>
> Which seems a bit ugly. In practice it doesn't matter because at the
> moment xics is the only user of the radix revmap. But if we're going to
> switch to this sort of initialisation I think xics should only be
> init'ing the revmap for itself.

You're right, that's what I intended to do from the beginning but
stumbled upon ... hmm, can't remember what, that made me change
my mind. But I agree, I'm not particularly proud of that. Will look
again into that.

>
>
> This boot ordering stuff is pretty hairy, so I might have missed
> something, but this is how the code is ordered AFAICT:
> 
> start_kernel()
> init_IRQ()
> ...
> local_irq_enable()
> ...
> rest_init()
> kernel_thread()
> kernel_init()
> smp_prepare_cpus()
> smp_xics_probe() (via smp_ops->probe())
>
>
> What's stopping us from taking an irq between local_irq_enable() and
> smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet?

It's hairy, I agree, but as you've mentioned no one has done a request_irq()
at that point. The first one to do it is smp_xics_probe() for the IPI.

Thanks for your comments.

Sebastien.

2008-07-31 12:10:32

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [PATCH] powerpc - Initialize the irq radix tree earlier

On Thu, 31 Jul 2008 14:00:02 +0200 Sebastien Dugue <[email protected]> wrote:

>
> Hi Michael,
>
> On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman <[email protected]> wrote:
>
> > On Thu, 2008-07-31 at 11:40 +0200, Sebastien Dugue wrote:
> > > The radix tree used for fast irq reverse mapping by the XICS is initialized
> > > late in the boot process, after the first interrupt (IPI) gets registered
> > > and after the first IPI is received.
> > >
> > > This patch moves the initialization of the XICS radix tree earlier into
> > > the boot process in smp_xics_probe() (the mm is already up but no interrupts
> > > have been registered at that point) to avoid having to insert a mapping into
> > > the tree in interrupt context. This will help in simplifying the locking
> > > constraints and move to a lockless radix tree in subsequent patches.
> > >
> > > As a nice side effect, there is no need any longer to check for
> > > (host->revmap_data.tree.gfp_mask != 0) to know if the tree have been
> > > initialized.
> >
> > Hi Sebastien,
> >
> > This is a nice cleanup, I think :)
>
> Thanks.
>
> >
> > > diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> > > index 6ac8612..0a1445c 100644
> > > --- a/arch/powerpc/kernel/irq.c
> > > +++ b/arch/powerpc/kernel/irq.c
> > > @@ -893,28 +890,28 @@ unsigned int irq_find_mapping(struct irq_host *host,
> > > }
> > > EXPORT_SYMBOL_GPL(irq_find_mapping);
> > >
> > > +void __init irq_radix_revmap_init(void)
> > > +{
> > > + struct irq_host *h;
> > > +
> > > + list_for_each_entry(h, &irq_hosts, link) {
> > > + if (h->revmap_type == IRQ_HOST_MAP_TREE)
> > > + INIT_RADIX_TREE(&h->revmap_data.tree, GFP_ATOMIC);
> > > + }
> > > +}
> >
> > Note irq_radix_revmap_init() loops over all irq_hosts ...
>
> Yep, but there's only one host (xics)
>
> >
> > > diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
> > > index 9d8f8c8..b143fe7 100644
> > > --- a/arch/powerpc/platforms/pseries/smp.c
> > > +++ b/arch/powerpc/platforms/pseries/smp.c
> > > @@ -130,6 +130,7 @@ static void smp_xics_message_pass(int target, int msg)
> > >
> > > static int __init smp_xics_probe(void)
> > > {
> > > + irq_radix_revmap_init();
> > > xics_request_IPIs();
> >
> > But now it's only called from the xics setup code.
> >
> > Which seems a bit ugly. In practice it doesn't matter because at the
> > moment xics is the only user of the radix revmap. But if we're going to
> > switch to this sort of initialisation I think xics should only be
> > init'ing the revmap for itself.
>
> You're right, that's what I intended to do from the beginning but
> stumbled upon ... hmm, can't remember what, that made me change
> my mind.

Ah yes, I wanted to do it from xics_init_host() but backed off
because at that point the mm is not up. But it does not make a difference
as the first request_irq() happens after the mm is up. A bit shaky I
concede.

> But I agree, I'm not particularly proud of that. Will look
> again into that.
>
> >
> >
> > This boot ordering stuff is pretty hairy, so I might have missed
> > something, but this is how the code is ordered AFAICT:
> > 
> > start_kernel()
> > init_IRQ()
> > ...
> > local_irq_enable()
> > ...
> > rest_init()
> > kernel_thread()
> > kernel_init()
> > smp_prepare_cpus()
> > smp_xics_probe() (via smp_ops->probe())
> >
> >
> > What's stopping us from taking an irq between local_irq_enable() and
> > smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet?
>
> It's hairy, I agree, but as you've mentioned no one has done a request_irq()
> at that point. The first one to do it is smp_xics_probe() for the IPI.
>
> Thanks for your comments.
>
> Sebastien.
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://ozlabs.org/mailman/listinfo/linuxppc-dev

2008-07-31 12:58:40

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH] powerpc - Initialize the irq radix tree earlier

On Thu, 2008-07-31 at 14:00 +0200, Sebastien Dugue wrote:
> On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman <[email protected]> wrote:
> >
> > This boot ordering stuff is pretty hairy, so I might have missed
> > something, but this is how the code is ordered AFAICT:
> > 
> > start_kernel()
> > init_IRQ()
> > ...
> > local_irq_enable()
> > ...
> > rest_init()
> > kernel_thread()
> > kernel_init()
> > smp_prepare_cpus()
> > smp_xics_probe() (via smp_ops->probe())
> >
> >
> > What's stopping us from taking an irq between local_irq_enable() and
> > smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet?
>
> It's hairy, I agree, but as you've mentioned no one has done a request_irq()
> at that point. The first one to do it is smp_xics_probe() for the IPI.

Hmm, I don't think that's strong enough. I can trivially cause irqs to
fire during a kexec reboot just by mashing the keyboard.

And during a kdump boot all sorts of stuff could be firing. Even during
a clean boot, from firmware, I don't think we can guarantee that
nothing's going to fire.

.. after a bit of testing ..

It seems it actually works (sort of).

xics_remap_irq() calls irq_radix_revmap_lookup(), which calls:

ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);

And because host->revmap_data.tree was zalloc'ed we trip on the first
check here:



cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-07-31 13:01:54

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH] powerpc - Initialize the irq radix tree earlier

On Thu, 2008-07-31 at 22:58 +1000, Michael Ellerman wrote:
> On Thu, 2008-07-31 at 14:00 +0200, Sebastien Dugue wrote:
> > On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman <[email protected]> wrote:
> > >
> > > This boot ordering stuff is pretty hairy, so I might have missed
> > > something, but this is how the code is ordered AFAICT:
> > > 
> > > start_kernel()
> > > init_IRQ()
> > > ...
> > > local_irq_enable()
> > > ...
> > > rest_init()
> > > kernel_thread()
> > > kernel_init()
> > > smp_prepare_cpus()
> > > smp_xics_probe() (via smp_ops->probe())
> > >
> > >
> > > What's stopping us from taking an irq between local_irq_enable() and
> > > smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet?
> >
> > It's hairy, I agree, but as you've mentioned no one has done a request_irq()
> > at that point. The first one to do it is smp_xics_probe() for the IPI.
>
> Hmm, I don't think that's strong enough. I can trivially cause irqs to
> fire during a kexec reboot just by mashing the keyboard.
>
> And during a kdump boot all sorts of stuff could be firing. Even during
> a clean boot, from firmware, I don't think we can guarantee that
> nothing's going to fire.
>
> .. after a bit of testing ..
>
> It seems it actually works (sort of).
>
> xics_remap_irq() calls irq_radix_revmap_lookup(), which calls:
>
> ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);
>
> And because host->revmap_data.tree was zalloc'ed we trip on the first
> check here:

@#$% ctrl-enter == send!

Continuing ...

void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
{
unsigned int height, shift;
struct radix_tree_node *node, **slot;

node = rcu_dereference(root->rnode);
if (node == NULL)
return NULL;

Which means irq_radix_revmap_lookup() will return NO_IRQ, which is cool.


So I think it can fly, as long as we're happy that we can't reverse map
anything until smp_xics_probe() - and I think that's true, as any irq we
take will be invalid.

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-07-31 13:25:46

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [PATCH] powerpc - Initialize the irq radix tree earlier

On Thu, 31 Jul 2008 23:01:39 +1000 Michael Ellerman <[email protected]> wrote:

> On Thu, 2008-07-31 at 22:58 +1000, Michael Ellerman wrote:
> > On Thu, 2008-07-31 at 14:00 +0200, Sebastien Dugue wrote:
> > > On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman <[email protected]> wrote:
> > > >
> > > > This boot ordering stuff is pretty hairy, so I might have missed
> > > > something, but this is how the code is ordered AFAICT:
> > > > 
> > > > start_kernel()
> > > > init_IRQ()
> > > > ...
> > > > local_irq_enable()
> > > > ...
> > > > rest_init()
> > > > kernel_thread()
> > > > kernel_init()
> > > > smp_prepare_cpus()
> > > > smp_xics_probe() (via smp_ops->probe())
> > > >
> > > >
> > > > What's stopping us from taking an irq between local_irq_enable() and
> > > > smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet?
> > >
> > > It's hairy, I agree, but as you've mentioned no one has done a request_irq()
> > > at that point. The first one to do it is smp_xics_probe() for the IPI.
> >
> > Hmm, I don't think that's strong enough. I can trivially cause irqs to
> > fire during a kexec reboot just by mashing the keyboard.
> >
> > And during a kdump boot all sorts of stuff could be firing. Even during
> > a clean boot, from firmware, I don't think we can guarantee that
> > nothing's going to fire.
> >
> > .. after a bit of testing ..
> >
> > It seems it actually works (sort of).
> >
> > xics_remap_irq() calls irq_radix_revmap_lookup(), which calls:
> >
> > ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);
> >
> > And because host->revmap_data.tree was zalloc'ed we trip on the first
> > check here:
>
> @#$% ctrl-enter == send!
>
> Continuing ...
>
> void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
> {
> unsigned int height, shift;
> struct radix_tree_node *node, **slot;
>
> node = rcu_dereference(root->rnode);
> if (node == NULL)
> return NULL;
>
> Which means irq_radix_revmap_lookup() will return NO_IRQ, which is cool.

Which is what I intended so that as long as no IRQ is registered we
return NO_IRQ.

>
>
> So I think it can fly, as long as we're happy that we can't reverse map
> anything until smp_xics_probe() - and I think that's true, as any irq we
> take will be invalid.

That's true as no IRQs are registered before smp_xics_probe() and for any
interrupt we might get before that, irq_radix_revmap_lookup() will return
NO_IRQ.

Thanks,

Sebastien.

2008-07-31 13:39:42

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH] powerpc - Initialize the irq radix tree earlier

On Thu, 2008-07-31 at 15:26 +0200, Sebastien Dugue wrote:
> On Thu, 31 Jul 2008 23:01:39 +1000 Michael Ellerman <[email protected]> wrote:
>
> > On Thu, 2008-07-31 at 22:58 +1000, Michael Ellerman wrote:
> > > On Thu, 2008-07-31 at 14:00 +0200, Sebastien Dugue wrote:
> > > > On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman <[email protected]> wrote:
> > > > >
> > > > > This boot ordering stuff is pretty hairy, so I might have missed
> > > > > something, but this is how the code is ordered AFAICT:
> > > > > 
> > > > > start_kernel()
> > > > > init_IRQ()
> > > > > ...
> > > > > local_irq_enable()
> > > > > ...
> > > > > rest_init()
> > > > > kernel_thread()
> > > > > kernel_init()
> > > > > smp_prepare_cpus()
> > > > > smp_xics_probe() (via smp_ops->probe())
> > > > >
> > > > >
> > > > > What's stopping us from taking an irq between local_irq_enable() and
> > > > > smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet?
> > > >
> > > > It's hairy, I agree, but as you've mentioned no one has done a request_irq()
> > > > at that point. The first one to do it is smp_xics_probe() for the IPI.
> > >
> > > Hmm, I don't think that's strong enough. I can trivially cause irqs to
> > > fire during a kexec reboot just by mashing the keyboard.
> > >
> > > And during a kdump boot all sorts of stuff could be firing. Even during
> > > a clean boot, from firmware, I don't think we can guarantee that
> > > nothing's going to fire.
> > >
> > > .. after a bit of testing ..
> > >
> > > It seems it actually works (sort of).
> > >
> > > xics_remap_irq() calls irq_radix_revmap_lookup(), which calls:
> > >
> > > ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);
> > >
> > > And because host->revmap_data.tree was zalloc'ed we trip on the first
> > > check here:
> >
> > @#$% ctrl-enter == send!
> >
> > Continuing ...
> >
> > void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
> > {
> > unsigned int height, shift;
> > struct radix_tree_node *node, **slot;
> >
> > node = rcu_dereference(root->rnode);
> > if (node == NULL)
> > return NULL;
> >
> > Which means irq_radix_revmap_lookup() will return NO_IRQ, which is cool.
>
> Which is what I intended so that as long as no IRQ is registered we
> return NO_IRQ.
>
> >
> >
> > So I think it can fly, as long as we're happy that we can't reverse map
> > anything until smp_xics_probe() - and I think that's true, as any irq we
> > take will be invalid.
>
> That's true as no IRQs are registered before smp_xics_probe() and for any
> interrupt we might get before that, irq_radix_revmap_lookup() will return
> NO_IRQ.

Cool, we agree :)

My only worry is that we might be relying on on the particular radix
tree implementation a bit too much. Is it documented somewhere that
the /very/ first check is for root->rnode != NULL, and the rest of the
root may be unintialised?

And I think it needs a big fat comment in the irq code saying that it's
safe because revmap_data is zalloc'ed, and that means the radix lookup
will fail (safely).

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2008-07-31 14:14:21

by Sébastien Dugué

[permalink] [raw]
Subject: Re: [PATCH] powerpc - Initialize the irq radix tree earlier

On Thu, 31 Jul 2008 23:39:26 +1000 Michael Ellerman <[email protected]> wrote:

> On Thu, 2008-07-31 at 15:26 +0200, Sebastien Dugue wrote:
> > On Thu, 31 Jul 2008 23:01:39 +1000 Michael Ellerman <[email protected]> wrote:
> >
> > > On Thu, 2008-07-31 at 22:58 +1000, Michael Ellerman wrote:
> > > > On Thu, 2008-07-31 at 14:00 +0200, Sebastien Dugue wrote:
> > > > > On Thu, 31 Jul 2008 21:40:56 +1000 Michael Ellerman <[email protected]> wrote:
> > > > > >
> > > > > > This boot ordering stuff is pretty hairy, so I might have missed
> > > > > > something, but this is how the code is ordered AFAICT:
> > > > > > 
> > > > > > start_kernel()
> > > > > > init_IRQ()
> > > > > > ...
> > > > > > local_irq_enable()
> > > > > > ...
> > > > > > rest_init()
> > > > > > kernel_thread()
> > > > > > kernel_init()
> > > > > > smp_prepare_cpus()
> > > > > > smp_xics_probe() (via smp_ops->probe())
> > > > > >
> > > > > >
> > > > > > What's stopping us from taking an irq between local_irq_enable() and
> > > > > > smp_xics_probe() ? Is it just that no one's request_irq()'ed them yet?
> > > > >
> > > > > It's hairy, I agree, but as you've mentioned no one has done a request_irq()
> > > > > at that point. The first one to do it is smp_xics_probe() for the IPI.
> > > >
> > > > Hmm, I don't think that's strong enough. I can trivially cause irqs to
> > > > fire during a kexec reboot just by mashing the keyboard.
> > > >
> > > > And during a kdump boot all sorts of stuff could be firing. Even during
> > > > a clean boot, from firmware, I don't think we can guarantee that
> > > > nothing's going to fire.
> > > >
> > > > .. after a bit of testing ..
> > > >
> > > > It seems it actually works (sort of).
> > > >
> > > > xics_remap_irq() calls irq_radix_revmap_lookup(), which calls:
> > > >
> > > > ptr = radix_tree_lookup(&host->revmap_data.tree, hwirq);
> > > >
> > > > And because host->revmap_data.tree was zalloc'ed we trip on the first
> > > > check here:
> > >
> > > @#$% ctrl-enter == send!
> > >
> > > Continuing ...
> > >
> > > void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
> > > {
> > > unsigned int height, shift;
> > > struct radix_tree_node *node, **slot;
> > >
> > > node = rcu_dereference(root->rnode);
> > > if (node == NULL)
> > > return NULL;
> > >
> > > Which means irq_radix_revmap_lookup() will return NO_IRQ, which is cool.
> >
> > Which is what I intended so that as long as no IRQ is registered we
> > return NO_IRQ.
> >
> > >
> > >
> > > So I think it can fly, as long as we're happy that we can't reverse map
> > > anything until smp_xics_probe() - and I think that's true, as any irq we
> > > take will be invalid.
> >
> > That's true as no IRQs are registered before smp_xics_probe() and for any
> > interrupt we might get before that, irq_radix_revmap_lookup() will return
> > NO_IRQ.
>
> Cool, we agree :)
>
> My only worry is that we might be relying on on the particular radix
> tree implementation a bit too much.

Well maybe we could revert back to testing a flag just like we
do for host->revmap_data.tree.gfp_mask != 0. Dunno.

> Is it documented somewhere that
> the /very/ first check is for root->rnode != NULL, and the rest of the
> root may be unintialised?

Not in anything I could read except in looking at the code.

>
> And I think it needs a big fat comment in the irq code saying that it's
> safe because revmap_data is zalloc'ed, and that means the radix lookup
> will fail (safely).

Yep, right. Will advertise this properly for the next round if this
remains the prefered solution.

Thanks,

Sebastien.