2015-02-04 13:28:05

by Jörg Rödel

[permalink] [raw]
Subject: [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable

Hi,

here is a patch to fix a kernel panic at shutdown we have seen recently.
The issue is hard to reproduce, so the patch description about the root
cause of the bug comes only from review and my current understanding of
x86 irq code.

So if what I wrote in the patch description doesn't make sense, please
let me know.

Constructive comments and feedback welcome.

Thanks,

Joerg

>From 153e8f6cf39c42dd9767eb49a27eacfb69738fb0 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <[email protected]>
Date: Wed, 4 Feb 2015 13:33:33 +0100
Subject: [PATCH] x86: irq: Check for valid irq descriptor in
check_irq_vectors_for_cpu_disable

When an interrupt is migrated away from a cpu it will stay
in its vector_irq array until smp_irq_move_cleanup_interrupt
succeeded. The cfg->move_in_progress flag is cleared already
when the IPI was sent.

When the interrupt is destroyed after migration its 'struct
irq_desc' is freed and the vector_irq arrays are cleaned up.
But since cfg->move_in_progress is already 0 the references
at cpus before the last migration will not be cleared. So
this would leave a reference to an already destroyed irq
alive.

When the cpu is taken down at this point, the
check_irq_vectors_for_cpu_disable function finds a valid irq
number in the vector_irq array, but gets NULL for its
descriptor and dereferences it, causing a kernel panic.

This has been observed on real systems at shutdown. Add a
check to check_irq_vectors_for_cpu_disable for a valid
'struct irq_desc' to prevent this issue.

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/kernel/irq.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 705ef8d..67b1cbe 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -302,6 +302,9 @@ int check_irq_vectors_for_cpu_disable(void)
irq = __this_cpu_read(vector_irq[vector]);
if (irq >= 0) {
desc = irq_to_desc(irq);
+ if (!desc)
+ continue;
+
data = irq_desc_get_irq_data(desc);
cpumask_copy(&affinity_new, data->affinity);
cpu_clear(this_cpu, affinity_new);
--
1.8.4.5


2015-02-05 05:51:46

by Jiang Liu

[permalink] [raw]
Subject: Re: [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable

On 2015/2/4 21:27, Joerg Roedel wrote:
> Hi,
>
> here is a patch to fix a kernel panic at shutdown we have seen recently.
> The issue is hard to reproduce, so the patch description about the root
> cause of the bug comes only from review and my current understanding of
> x86 irq code.
>
> So if what I wrote in the patch description doesn't make sense, please
> let me know.
>
> Constructive comments and feedback welcome.
>
> Thanks,
>
> Joerg
>
> From 153e8f6cf39c42dd9767eb49a27eacfb69738fb0 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <[email protected]>
> Date: Wed, 4 Feb 2015 13:33:33 +0100
> Subject: [PATCH] x86: irq: Check for valid irq descriptor in
> check_irq_vectors_for_cpu_disable
>
> When an interrupt is migrated away from a cpu it will stay
> in its vector_irq array until smp_irq_move_cleanup_interrupt
> succeeded. The cfg->move_in_progress flag is cleared already
> when the IPI was sent.
>
> When the interrupt is destroyed after migration its 'struct
> irq_desc' is freed and the vector_irq arrays are cleaned up.
> But since cfg->move_in_progress is already 0 the references
> at cpus before the last migration will not be cleared. So
> this would leave a reference to an already destroyed irq
> alive.
>
> When the cpu is taken down at this point, the
> check_irq_vectors_for_cpu_disable function finds a valid irq
> number in the vector_irq array, but gets NULL for its
> descriptor and dereferences it, causing a kernel panic.
>
> This has been observed on real systems at shutdown. Add a
> check to check_irq_vectors_for_cpu_disable for a valid
> 'struct irq_desc' to prevent this issue.
>
> Signed-off-by: Joerg Roedel <[email protected]>
Reviewed-by: Jiang Liu <[email protected]>

Actually there's another racing pattern.
for (irq = 0; irq < nr_irqs; irq++) {
desc = irq_to_desc(irq);
access desc->xxx
}

When sparsing IRQ is enabled, there's no mechanism to protect
desc returned by irq_to_desc(). Once I have considered a brute
solution of disabling freeing of irq_desc:(
Regards!
Gerry
> ---
> arch/x86/kernel/irq.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index 705ef8d..67b1cbe 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -302,6 +302,9 @@ int check_irq_vectors_for_cpu_disable(void)
> irq = __this_cpu_read(vector_irq[vector]);
> if (irq >= 0) {
> desc = irq_to_desc(irq);
> + if (!desc)
> + continue;
> +
> data = irq_desc_get_irq_data(desc);
> cpumask_copy(&affinity_new, data->affinity);
> cpu_clear(this_cpu, affinity_new);
>

2015-02-06 12:28:08

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] x86: irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable

Hi Jiang,

On Thu, Feb 05, 2015 at 01:51:26PM +0800, Jiang Liu wrote:
> Reviewed-by: Jiang Liu <[email protected]>

Thanks for your review.

> Actually there's another racing pattern.
> for (irq = 0; irq < nr_irqs; irq++) {
> desc = irq_to_desc(irq);
> access desc->xxx
> }
>
> When sparsing IRQ is enabled, there's no mechanism to protect
> desc returned by irq_to_desc(). Once I have considered a brute
> solution of disabling freeing of irq_desc:(

Hmm, how about wrapping the places that use irq_desc in rcu_read_lock()
and do a synchronize_rcu() before we free it (at least in the SPARSE_IRQ
case)? It wouldn't be a real RCU data structure, but we make at least
sure that we don't free an irq_desc thats in use.


Joerg

Subject: [tip:irq/urgent] x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable()

Commit-ID: d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1
Gitweb: http://git.kernel.org/tip/d97eb8966c91f2c9d05f0a22eb89ed5b76d966d1
Author: Joerg Roedel <[email protected]>
AuthorDate: Wed, 4 Feb 2015 13:33:33 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 18 Feb 2015 15:01:42 +0100

x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable()

When an interrupt is migrated away from a cpu it will stay
in its vector_irq array until smp_irq_move_cleanup_interrupt
succeeded. The cfg->move_in_progress flag is cleared already
when the IPI was sent.

When the interrupt is destroyed after migration its 'struct
irq_desc' is freed and the vector_irq arrays are cleaned up.
But since cfg->move_in_progress is already 0 the references
at cpus before the last migration will not be cleared. So
this would leave a reference to an already destroyed irq
alive.

When the cpu is taken down at this point, the
check_irq_vectors_for_cpu_disable() function finds a valid irq
number in the vector_irq array, but gets NULL for its
descriptor and dereferences it, causing a kernel panic.

This has been observed on real systems at shutdown. Add a
check to check_irq_vectors_for_cpu_disable() for a valid
'struct irq_desc' to prevent this issue.

Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Jiang Liu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: K. Y. Srinivasan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/irq.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 705ef8d..67b1cbe 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -302,6 +302,9 @@ int check_irq_vectors_for_cpu_disable(void)
irq = __this_cpu_read(vector_irq[vector]);
if (irq >= 0) {
desc = irq_to_desc(irq);
+ if (!desc)
+ continue;
+
data = irq_desc_get_irq_data(desc);
cpumask_copy(&affinity_new, data->affinity);
cpu_clear(this_cpu, affinity_new);