2012-05-21 16:50:03

by Dimitri Sivanich

[permalink] [raw]
Subject: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

The smp_irq_move_cleanup_interrupt routine should be checking for a valid
irq_cfg pointer prior to accessing it. It also seems that this should be
done after taking the desc lock.

Signed-off-by: Dimitri Sivanich <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux.orig/arch/x86/kernel/apic/io_apic.c
+++ linux/arch/x86/kernel/apic/io_apic.c
@@ -2478,9 +2478,12 @@ asmlinkage void smp_irq_move_cleanup_int
if (!desc)
continue;

- cfg = irq_cfg(irq);
raw_spin_lock(&desc->lock);

+ cfg = irq_cfg(irq);
+ if (!cfg)
+ goto unlock;
+
/*
* Check if the irq migration is in progress. If so, we
* haven't received the cleanup request yet for this irq.


2012-05-21 21:07:20

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Mon, 21 May 2012, Dimitri Sivanich wrote:

> The smp_irq_move_cleanup_interrupt routine should be checking for a valid
> irq_cfg pointer prior to accessing it. It also seems that this should be
> done after taking the desc lock.

It seems that you either missed or failed to explain why it should be
done _after_ taking the lock.

Changelogs matter, really.

Thanks,

tglx

2012-05-21 21:07:39

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Mon, 2012-05-21 at 11:49 -0500, Dimitri Sivanich wrote:
> The smp_irq_move_cleanup_interrupt routine should be checking for a valid
> irq_cfg pointer prior to accessing it. It also seems that this should be
> done after taking the desc lock.

I think these changes are correct. Did you see any crashes during module
unload etc?

Reviewed-by: Suresh Siddha <[email protected]>

>
> Signed-off-by: Dimitri Sivanich <[email protected]>
> ---
> arch/x86/kernel/apic/io_apic.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> Index: linux/arch/x86/kernel/apic/io_apic.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/apic/io_apic.c
> +++ linux/arch/x86/kernel/apic/io_apic.c
> @@ -2478,9 +2478,12 @@ asmlinkage void smp_irq_move_cleanup_int
> if (!desc)
> continue;
>
> - cfg = irq_cfg(irq);
> raw_spin_lock(&desc->lock);
>
> + cfg = irq_cfg(irq);
> + if (!cfg)
> + goto unlock;
> +
> /*
> * Check if the irq migration is in progress. If so, we
> * haven't received the cleanup request yet for this irq.

2012-05-21 21:09:05

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Mon, May 21, 2012 at 02:05:53PM -0700, Suresh Siddha wrote:
> On Mon, 2012-05-21 at 11:49 -0500, Dimitri Sivanich wrote:
> > The smp_irq_move_cleanup_interrupt routine should be checking for a valid
> > irq_cfg pointer prior to accessing it. It also seems that this should be
> > done after taking the desc lock.
>
> I think these changes are correct. Did you see any crashes during module
> unload etc?

Yes, we have seen these on occasion during boot.

>
> Reviewed-by: Suresh Siddha <[email protected]>
>
> >
> > Signed-off-by: Dimitri Sivanich <[email protected]>
> > ---
> > arch/x86/kernel/apic/io_apic.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > Index: linux/arch/x86/kernel/apic/io_apic.c
> > ===================================================================
> > --- linux.orig/arch/x86/kernel/apic/io_apic.c
> > +++ linux/arch/x86/kernel/apic/io_apic.c
> > @@ -2478,9 +2478,12 @@ asmlinkage void smp_irq_move_cleanup_int
> > if (!desc)
> > continue;
> >
> > - cfg = irq_cfg(irq);
> > raw_spin_lock(&desc->lock);
> >
> > + cfg = irq_cfg(irq);
> > + if (!cfg)
> > + goto unlock;
> > +
> > /*
> > * Check if the irq migration is in progress. If so, we
> > * haven't received the cleanup request yet for this irq.
>

2012-05-21 21:12:30

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Mon, 2012-05-21 at 16:09 -0500, Dimitri Sivanich wrote:
> On Mon, May 21, 2012 at 02:05:53PM -0700, Suresh Siddha wrote:
> > On Mon, 2012-05-21 at 11:49 -0500, Dimitri Sivanich wrote:
> > > The smp_irq_move_cleanup_interrupt routine should be checking for a valid
> > > irq_cfg pointer prior to accessing it. It also seems that this should be
> > > done after taking the desc lock.
> >
> > I think these changes are correct. Did you see any crashes during module
> > unload etc?
>
> Yes, we have seen these on occasion during boot.

During boot or shutdown?

Review of the code shows that this can trigger during module unload
which can call destroy_irq() etc and can trigger the crash if there is a
parallel irq migration related cleanup.

Unsuccessful module loads can also call destroy_irq() but I doubt that
is what happening here.

thanks,
suresh

2012-05-21 21:19:24

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Mon, May 21, 2012 at 11:07:04PM +0200, Thomas Gleixner wrote:
> On Mon, 21 May 2012, Dimitri Sivanich wrote:
>
> > The smp_irq_move_cleanup_interrupt routine should be checking for a valid
> > irq_cfg pointer prior to accessing it. It also seems that this should be
> > done after taking the desc lock.
>
> It seems that you either missed or failed to explain why it should be
> done _after_ taking the lock.
>
> Changelogs matter, really.
>
How about this?


The smp_irq_move_cleanup_interrupt routine should be checking for a valid
irq_cfg pointer prior to accessing it.

Follow the same protocol shown in irq_set_chip_data(), by taking the desc
lock before accessing this location.

Signed-off-by: Dimitri Sivanich <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux.orig/arch/x86/kernel/apic/io_apic.c
+++ linux/arch/x86/kernel/apic/io_apic.c
@@ -2478,9 +2478,12 @@ asmlinkage void smp_irq_move_cleanup_int
if (!desc)
continue;

- cfg = irq_cfg(irq);
raw_spin_lock(&desc->lock);

+ cfg = irq_cfg(irq);
+ if (!cfg)
+ goto unlock;
+
/*
* Check if the irq migration is in progress. If so, we
* haven't received the cleanup request yet for this irq.

2012-05-21 21:34:50

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Mon, 21 May 2012, Dimitri Sivanich wrote:

> On Mon, May 21, 2012 at 11:07:04PM +0200, Thomas Gleixner wrote:
> > On Mon, 21 May 2012, Dimitri Sivanich wrote:
> >
> > > The smp_irq_move_cleanup_interrupt routine should be checking for a valid
> > > irq_cfg pointer prior to accessing it. It also seems that this should be
> > > done after taking the desc lock.
> >
> > It seems that you either missed or failed to explain why it should be
> > done _after_ taking the lock.
> >
> > Changelogs matter, really.
> >
> How about this?
>
> The smp_irq_move_cleanup_interrupt routine should be checking for a valid
> irq_cfg pointer prior to accessing it.

Why should it?

> Follow the same protocol shown in irq_set_chip_data(), by taking the desc
> lock before accessing this location.

This is not a proper explanation. irq_set_chip_data() might be wrong
as well and aside of that it might be correct to ignore that protocol
in that particular situation.

What's wrong with adding the actual wreckage scenario _AND_ the
solution to the changelog so that a casual reader, who is not
completely familiar with the code can understand what you are trying
to solve?

Don't misunderstand me. The patch is correct, just the explanation
sucks.

Thanks,

tglx

2012-05-22 02:41:06

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Mon, May 21, 2012 at 02:10:39PM -0700, Suresh Siddha wrote:
> On Mon, 2012-05-21 at 16:09 -0500, Dimitri Sivanich wrote:
> > On Mon, May 21, 2012 at 02:05:53PM -0700, Suresh Siddha wrote:
> > > On Mon, 2012-05-21 at 11:49 -0500, Dimitri Sivanich wrote:
> > > > The smp_irq_move_cleanup_interrupt routine should be checking for a valid
> > > > irq_cfg pointer prior to accessing it. It also seems that this should be
> > > > done after taking the desc lock.
> > >
> > > I think these changes are correct. Did you see any crashes during module
> > > unload etc?
> >
> > Yes, we have seen these on occasion during boot.
>
> During boot or shutdown?
>
Early on in boot, on rare occasions.

Easiest to reproduce doing module repeated module loads/unloads.

> Review of the code shows that this can trigger during module unload
> which can call destroy_irq() etc and can trigger the crash if there is a
> parallel irq migration related cleanup.
>
> Unsuccessful module loads can also call destroy_irq() but I doubt that
> is what happening here.
>
> thanks,
> suresh
>

2012-05-23 18:16:39

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Mon, May 21, 2012 at 11:34:32PM +0200, Thomas Gleixner wrote:
> Don't misunderstand me. The patch is correct, just the explanation
> sucks.
>
Hopefully this explanation is descriptive enough.



A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.

In create_irq_nr() there is a window where we have set vector_irq in
__assign_irq_vector(), but not yet called irq_set_chip_data() to set the
irq_cfg pointer.

Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
irq, but panic when accessing irq_cfg.

Only continue processing the irq if irq_cfg is non-NULL.

Signed-off-by: Dimitri Sivanich <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux.orig/arch/x86/kernel/apic/io_apic.c
+++ linux/arch/x86/kernel/apic/io_apic.c
@@ -2478,9 +2478,12 @@ asmlinkage void smp_irq_move_cleanup_int
if (!desc)
continue;

- cfg = irq_cfg(irq);
raw_spin_lock(&desc->lock);

+ cfg = irq_cfg(irq);
+ if (!cfg)
+ goto unlock;
+
/*
* Check if the irq migration is in progress. If so, we
* haven't received the cleanup request yet for this irq.

2012-05-23 19:04:19

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Wed, May 23, 2012 at 01:16:36PM -0500, Dimitri Sivanich wrote:
> In create_irq_nr() there is a window where we have set vector_irq in
> __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> irq_cfg pointer.

BTW - is there a reason why we're calling irq_set_chip_data() in create_irq_nr()
rather than in __assign_irq_vector() for the case where irq_cfg is NULL?

2012-05-23 19:24:27

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Wed, 23 May 2012, Dimitri Sivanich wrote:

> On Wed, May 23, 2012 at 01:16:36PM -0500, Dimitri Sivanich wrote:
> > In create_irq_nr() there is a window where we have set vector_irq in
> > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > irq_cfg pointer.
>
> BTW - is there a reason why we're calling irq_set_chip_data() in create_irq_nr()
> rather than in __assign_irq_vector() for the case where irq_cfg is NULL?

We can't nest desc->lock inside vector lock.

2012-05-23 19:26:20

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Wed, 2012-05-23 at 14:04 -0500, Dimitri Sivanich wrote:
> On Wed, May 23, 2012 at 01:16:36PM -0500, Dimitri Sivanich wrote:
> > In create_irq_nr() there is a window where we have set vector_irq in
> > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > irq_cfg pointer.

Ha. Now I understand how it can happen during boot/module load time.
Thanks.

> BTW - is there a reason why we're calling irq_set_chip_data() in create_irq_nr()
> rather than in __assign_irq_vector() for the case where irq_cfg is NULL?

assign_irq_vector() is also for setting up vectors during irq migration.
So may be we could have done the irq_set_chip_data() in create_irq_nr()
itself before calling assign_irq_vector(). Anyways, this change can't
help in case of destroy irq path which can also lead to the same issue
of de-referencing null pointer.

Also, it will be nice if you can refer to this destroy irq path in your
changelog.

Acked-by: Suresh Siddha <[email protected]>

2012-05-23 20:02:31

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Wed, May 23, 2012 at 12:24:46PM -0700, Suresh Siddha wrote:
> On Wed, 2012-05-23 at 14:04 -0500, Dimitri Sivanich wrote:
> > On Wed, May 23, 2012 at 01:16:36PM -0500, Dimitri Sivanich wrote:
> > > In create_irq_nr() there is a window where we have set vector_irq in
> > > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > > irq_cfg pointer.
>
> Ha. Now I understand how it can happen during boot/module load time.
> Thanks.
>
> > BTW - is there a reason why we're calling irq_set_chip_data() in create_irq_nr()
> > rather than in __assign_irq_vector() for the case where irq_cfg is NULL?
>
> assign_irq_vector() is also for setting up vectors during irq migration.
> So may be we could have done the irq_set_chip_data() in create_irq_nr()
> itself before calling assign_irq_vector(). Anyways, this change can't
> help in case of destroy irq path which can also lead to the same issue
> of de-referencing null pointer.
>
> Also, it will be nice if you can refer to this destroy irq path in your
> changelog.
>
> Acked-by: Suresh Siddha <[email protected]>

OK. Hopefully this covers it.



A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.

In create_irq_nr() there is a window where we have set vector_irq in
__assign_irq_vector(), but not yet called irq_set_chip_data() to set the
irq_cfg pointer.

Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
irq, but panic when accessing irq_cfg.

There is also a window in destroy_irq() where we've cleared the irq_cfg
pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.

Only continue processing the irq if irq_cfg is non-NULL.

Signed-off-by: Dimitri Sivanich <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux.orig/arch/x86/kernel/apic/io_apic.c
+++ linux/arch/x86/kernel/apic/io_apic.c
@@ -2478,9 +2478,12 @@ asmlinkage void smp_irq_move_cleanup_int
if (!desc)
continue;

- cfg = irq_cfg(irq);
raw_spin_lock(&desc->lock);

+ cfg = irq_cfg(irq);
+ if (!cfg)
+ goto unlock;
+
/*
* Check if the irq migration is in progress. If so, we
* haven't received the cleanup request yet for this irq.

2012-05-23 23:51:00

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Wed, 2012-05-23 at 15:02 -0500, Dimitri Sivanich wrote:
> OK. Hopefully this covers it.

Sorry No. Now you will understand why Thomas wanted detailed changelog.
I found one more issue with the help of your new modification to the
changelog.

> A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
> we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
>
> In create_irq_nr() there is a window where we have set vector_irq in
> __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> irq_cfg pointer.
>
> Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
> smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
> irq, but panic when accessing irq_cfg.
>
> There is also a window in destroy_irq() where we've cleared the irq_cfg
> pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
> that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
> but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.

So, what happens if the irq_desc gets freed by the destroy_irq() in the
sparse irq case? smp_irq_move_cleanup_interrupt() will refer to freed
irq desc memory! Right?

May we should really do something like the appended (untested patch)?
Can you please review and give this a try? Let me review a bit more to
see if this really fixes the issue.

Thanks.
---
arch/x86/kernel/apic/io_apic.c | 19 +++++++++----------
1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ffdc152..81f4cab 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2295,6 +2295,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
exit_idle();

me = smp_processor_id();
+
+ raw_spin_lock(&vector_lock);
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
unsigned int irq;
unsigned int irr;
@@ -2310,17 +2312,16 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
continue;

cfg = irq_cfg(irq);
- raw_spin_lock(&desc->lock);

/*
* Check if the irq migration is in progress. If so, we
* haven't received the cleanup request yet for this irq.
*/
if (cfg->move_in_progress)
- goto unlock;
+ continue;

if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain))
- goto unlock;
+ continue;

irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
/*
@@ -2332,12 +2333,11 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
*/
if (irr & (1 << (vector % 32))) {
apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);
- goto unlock;
+ continue;
}
__this_cpu_write(vector_irq[vector], -1);
-unlock:
- raw_spin_unlock(&desc->lock);
}
+ raw_spin_unlock(&vector_lock);

irq_exit();
}
@@ -2986,17 +2986,16 @@ unsigned int create_irq_nr(unsigned int from, int node)
return 0;
}

+ irq_set_chip_data(irq, cfg);
raw_spin_lock_irqsave(&vector_lock, flags);
if (!__assign_irq_vector(irq, cfg, apic->target_cpus()))
ret = irq;
raw_spin_unlock_irqrestore(&vector_lock, flags);

- if (ret) {
- irq_set_chip_data(irq, cfg);
+ if (ret)
irq_clear_status_flags(irq, IRQ_NOREQUEST);
- } else {
+ else
free_irq_at(irq, cfg);
- }
return ret;
}


2012-05-24 01:40:25

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Wed, May 23, 2012 at 04:49:29PM -0700, Suresh Siddha wrote:
> On Wed, 2012-05-23 at 15:02 -0500, Dimitri Sivanich wrote:
> > OK. Hopefully this covers it.
>
> Sorry No. Now you will understand why Thomas wanted detailed changelog.
> I found one more issue with the help of your new modification to the
> changelog.

> > A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
> > we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
> >
> > In create_irq_nr() there is a window where we have set vector_irq in
> > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > irq_cfg pointer.
> >
> > Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
> > smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
> > irq, but panic when accessing irq_cfg.
> >
> > There is also a window in destroy_irq() where we've cleared the irq_cfg
> > pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
> > that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
> > but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.
>
> So, what happens if the irq_desc gets freed by the destroy_irq() in the
> sparse irq case? smp_irq_move_cleanup_interrupt() will refer to freed
> irq desc memory! Right?

I don't believe I ever hit that case, but yes, I see the possibility, so
we have at least 3 holes here.

My test (other than repeated boots) was to repeatedly insert and remove a
module that sets up irq's on every cpu. Without my patch it panic'ed fairly
quickly. With the patch it never failed.

>
> May we should really do something like the appended (untested patch)?
> Can you please review and give this a try? Let me review a bit more to
> see if this really fixes the issue.

I wonder how many instances of smp_irq_move_cleanup_interrupt() might
run simultaneously on a large system? Would there be any issue with lock
pressure from holding the vector_lock through the entire loop?

I can probably test the code (after further review), but the only case I
saw through testing was the create_irq_nr case.

>
> Thanks.
> ---
> arch/x86/kernel/apic/io_apic.c | 19 +++++++++----------
> 1 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index ffdc152..81f4cab 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2295,6 +2295,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> exit_idle();
>
> me = smp_processor_id();
> +
> + raw_spin_lock(&vector_lock);
> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> unsigned int irq;
> unsigned int irr;
> @@ -2310,17 +2312,16 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> continue;
>
> cfg = irq_cfg(irq);
> - raw_spin_lock(&desc->lock);
>
> /*
> * Check if the irq migration is in progress. If so, we
> * haven't received the cleanup request yet for this irq.
> */
> if (cfg->move_in_progress)
> - goto unlock;
> + continue;
>
> if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain))
> - goto unlock;
> + continue;
>
> irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
> /*
> @@ -2332,12 +2333,11 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> */
> if (irr & (1 << (vector % 32))) {
> apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);
> - goto unlock;
> + continue;
> }
> __this_cpu_write(vector_irq[vector], -1);
> -unlock:
> - raw_spin_unlock(&desc->lock);
> }
> + raw_spin_unlock(&vector_lock);
>
> irq_exit();
> }
> @@ -2986,17 +2986,16 @@ unsigned int create_irq_nr(unsigned int from, int node)
> return 0;
> }
>
> + irq_set_chip_data(irq, cfg);
> raw_spin_lock_irqsave(&vector_lock, flags);
> if (!__assign_irq_vector(irq, cfg, apic->target_cpus()))
> ret = irq;
> raw_spin_unlock_irqrestore(&vector_lock, flags);
>
> - if (ret) {
> - irq_set_chip_data(irq, cfg);
> + if (ret)
> irq_clear_status_flags(irq, IRQ_NOREQUEST);
> - } else {
> + else
> free_irq_at(irq, cfg);
> - }
> return ret;
> }
>
>

2012-05-24 14:37:16

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Wed, May 23, 2012 at 04:49:29PM -0700, Suresh Siddha wrote:
> On Wed, 2012-05-23 at 15:02 -0500, Dimitri Sivanich wrote:
> > OK. Hopefully this covers it.
>
> Sorry No. Now you will understand why Thomas wanted detailed changelog.
> I found one more issue with the help of your new modification to the
> changelog.
>
> > A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
> > we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
> >
> > In create_irq_nr() there is a window where we have set vector_irq in
> > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > irq_cfg pointer.
> >
> > Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
> > smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
> > irq, but panic when accessing irq_cfg.
> >
> > There is also a window in destroy_irq() where we've cleared the irq_cfg
> > pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
> > that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
> > but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.
>
> So, what happens if the irq_desc gets freed by the destroy_irq() in the
> sparse irq case? smp_irq_move_cleanup_interrupt() will refer to freed
> irq desc memory! Right?


And speaking of possible holes in destroy_irq()..

What happens if we're running __assign_irq_vector() (say we're changing irq
affinity), and on another cpu we had just run through __clear_irq_vector()
via destroy_irq(). Now destroy_irq() is going to call
free_irq_at()->free_irq_cfg, which will clear irq_cfg. Then
__assign_irq_vector goes to access irq_cfg (cfg->vector or
cfg->move_in_progress, for instance), which was already freed.

I'm not sure if this can happen, but just eyeballing it, it does look that
that way.

>
> May we should really do something like the appended (untested patch)?
> Can you please review and give this a try? Let me review a bit more to
> see if this really fixes the issue.
>
> Thanks.
> ---
> arch/x86/kernel/apic/io_apic.c | 19 +++++++++----------
> 1 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index ffdc152..81f4cab 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2295,6 +2295,8 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> exit_idle();
>
> me = smp_processor_id();
> +
> + raw_spin_lock(&vector_lock);
> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> unsigned int irq;
> unsigned int irr;
> @@ -2310,17 +2312,16 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> continue;
>
> cfg = irq_cfg(irq);
> - raw_spin_lock(&desc->lock);
>
> /*
> * Check if the irq migration is in progress. If so, we
> * haven't received the cleanup request yet for this irq.
> */
> if (cfg->move_in_progress)
> - goto unlock;
> + continue;
>
> if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain))
> - goto unlock;
> + continue;
>
> irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
> /*
> @@ -2332,12 +2333,11 @@ asmlinkage void smp_irq_move_cleanup_interrupt(void)
> */
> if (irr & (1 << (vector % 32))) {
> apic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);
> - goto unlock;
> + continue;
> }
> __this_cpu_write(vector_irq[vector], -1);
> -unlock:
> - raw_spin_unlock(&desc->lock);
> }
> + raw_spin_unlock(&vector_lock);
>
> irq_exit();
> }
> @@ -2986,17 +2986,16 @@ unsigned int create_irq_nr(unsigned int from, int node)
> return 0;
> }
>
> + irq_set_chip_data(irq, cfg);
> raw_spin_lock_irqsave(&vector_lock, flags);
> if (!__assign_irq_vector(irq, cfg, apic->target_cpus()))
> ret = irq;
> raw_spin_unlock_irqrestore(&vector_lock, flags);
>
> - if (ret) {
> - irq_set_chip_data(irq, cfg);
> + if (ret)
> irq_clear_status_flags(irq, IRQ_NOREQUEST);
> - } else {
> + else
> free_irq_at(irq, cfg);
> - }
> return ret;
> }
>
>

2012-05-24 14:53:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Wed, 23 May 2012, Suresh Siddha wrote:
> On Wed, 2012-05-23 at 15:02 -0500, Dimitri Sivanich wrote:
> > OK. Hopefully this covers it.
>
> Sorry No. Now you will understand why Thomas wanted detailed changelog.
> I found one more issue with the help of your new modification to the
> changelog.
>
> > A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
> > we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
> >
> > In create_irq_nr() there is a window where we have set vector_irq in
> > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > irq_cfg pointer.
> >
> > Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
> > smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
> > irq, but panic when accessing irq_cfg.
> >
> > There is also a window in destroy_irq() where we've cleared the irq_cfg
> > pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
> > that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
> > but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.
>
> So, what happens if the irq_desc gets freed by the destroy_irq() in the
> sparse irq case? smp_irq_move_cleanup_interrupt() will refer to freed
> irq desc memory! Right?
>
> May we should really do something like the appended (untested patch)?
> Can you please review and give this a try? Let me review a bit more to
> see if this really fixes the issue.

It's fixing the problem.

But this move_cleanup stuff could be made less stupid.

The check for irq_desc is superflous. irq_cfg() calls
irq_get_chip_data() which will return NULL if the irq descriptor is
not there.

To avoid the lookup business completely we should really store
irq_desc instead of the irq number in the per cpu vector array, that
would also get rid of the lookup in the irq delivery path.

Now that still needs to iterate over all vectors, but this could be
optimized in a second step.

In complete_move() we send the IPI to all cpus in the old mask. We
really should set the corresponding vector bit in a per cpu bitfield
on those cpus in the mask. The cleanup can rely on the bits and avoid
looking at 200+ vectors to find a single one.

Thoughts?

tglx

2012-05-24 15:36:12

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Thu, May 24, 2012 at 04:53:06PM +0200, Thomas Gleixner wrote:
> On Wed, 23 May 2012, Suresh Siddha wrote:
> > On Wed, 2012-05-23 at 15:02 -0500, Dimitri Sivanich wrote:
> > > OK. Hopefully this covers it.
> >
> > Sorry No. Now you will understand why Thomas wanted detailed changelog.
> > I found one more issue with the help of your new modification to the
> > changelog.
> >
> > > A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
> > > we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.
> > >
> > > In create_irq_nr() there is a window where we have set vector_irq in
> > > __assign_irq_vector(), but not yet called irq_set_chip_data() to set the
> > > irq_cfg pointer.
> > >
> > > Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
> > > smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
> > > irq, but panic when accessing irq_cfg.
> > >
> > > There is also a window in destroy_irq() where we've cleared the irq_cfg
> > > pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
> > > that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
> > > but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.
> >
> > So, what happens if the irq_desc gets freed by the destroy_irq() in the
> > sparse irq case? smp_irq_move_cleanup_interrupt() will refer to freed
> > irq desc memory! Right?
> >
> > May we should really do something like the appended (untested patch)?
> > Can you please review and give this a try? Let me review a bit more to
> > see if this really fixes the issue.
>
> It's fixing the problem.
>
> But this move_cleanup stuff could be made less stupid.
>
> The check for irq_desc is superflous. irq_cfg() calls
> irq_get_chip_data() which will return NULL if the irq descriptor is
> not there.
>
> To avoid the lookup business completely we should really store
> irq_desc instead of the irq number in the per cpu vector array, that
> would also get rid of the lookup in the irq delivery path.

So if the irq_desc gets deallocated you would clear all corresponding
per cpu vector_irq references before deallocation, protecting accesses
by smp_irq_move_cleanup_interrupt?

>
> Now that still needs to iterate over all vectors, but this could be
> optimized in a second step.
>
> In complete_move() we send the IPI to all cpus in the old mask. We
> really should set the corresponding vector bit in a per cpu bitfield
> on those cpus in the mask. The cleanup can rely on the bits and avoid
> looking at 200+ vectors to find a single one.

This part does sound more efficient at first glance.

>
> Thoughts?
>
> tglx

2012-05-24 18:20:47

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Thu, 2012-05-24 at 09:37 -0500, Dimitri Sivanich wrote:
> And speaking of possible holes in destroy_irq()..
>
> What happens if we're running __assign_irq_vector() (say we're changing irq
> affinity), and on another cpu we had just run through __clear_irq_vector()
> via destroy_irq(). Now destroy_irq() is going to call
> free_irq_at()->free_irq_cfg, which will clear irq_cfg. Then
> __assign_irq_vector goes to access irq_cfg (cfg->vector or
> cfg->move_in_progress, for instance), which was already freed.
>
> I'm not sure if this can happen, but just eyeballing it, it does look that
> that way.
>

I wanted to say, irq desc is locked when we change the irq affinity,
which calls assign_irq_vector() and friends, so this should be fine.

BUT NO. I don't see any reference counts being maintained when we do
irq_to_desc(). So locking/unlocking that desc pointer is bogus when
destroy_irq() can go ahead and free the desc in parallel.

So, SPARSE_IRQ looks terribly broken! Yinghai, Thomas?

thanks,
suresh

2012-05-24 19:16:49

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Thu, 24 May 2012, Suresh Siddha wrote:
> On Thu, 2012-05-24 at 09:37 -0500, Dimitri Sivanich wrote:
> > And speaking of possible holes in destroy_irq()..
> >
> > What happens if we're running __assign_irq_vector() (say we're changing irq
> > affinity), and on another cpu we had just run through __clear_irq_vector()
> > via destroy_irq(). Now destroy_irq() is going to call
> > free_irq_at()->free_irq_cfg, which will clear irq_cfg. Then
> > __assign_irq_vector goes to access irq_cfg (cfg->vector or
> > cfg->move_in_progress, for instance), which was already freed.
> >
> > I'm not sure if this can happen, but just eyeballing it, it does look that
> > that way.
> >
>
> I wanted to say, irq desc is locked when we change the irq affinity,
> which calls assign_irq_vector() and friends, so this should be fine.
>
> BUT NO. I don't see any reference counts being maintained when we do
> irq_to_desc(). So locking/unlocking that desc pointer is bogus when
> destroy_irq() can go ahead and free the desc in parallel.
>
> So, SPARSE_IRQ looks terribly broken! Yinghai, Thomas?

Yes, we need refcounts for that. We talked about that before, but then
the argument was against it was that all that code is serialized
already, so no need. How wrong :)

2012-05-26 00:24:52

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Thu, 2012-05-24 at 21:16 +0200, Thomas Gleixner wrote:
> On Thu, 24 May 2012, Suresh Siddha wrote:
> > On Thu, 2012-05-24 at 09:37 -0500, Dimitri Sivanich wrote:
> > > And speaking of possible holes in destroy_irq()..
> > >
> > > What happens if we're running __assign_irq_vector() (say we're changing irq
> > > affinity), and on another cpu we had just run through __clear_irq_vector()
> > > via destroy_irq(). Now destroy_irq() is going to call
> > > free_irq_at()->free_irq_cfg, which will clear irq_cfg. Then
> > > __assign_irq_vector goes to access irq_cfg (cfg->vector or
> > > cfg->move_in_progress, for instance), which was already freed.
> > >
> > > I'm not sure if this can happen, but just eyeballing it, it does look that
> > > that way.
> > >
> >
> > I wanted to say, irq desc is locked when we change the irq affinity,
> > which calls assign_irq_vector() and friends, so this should be fine.
> >
> > BUT NO. I don't see any reference counts being maintained when we do
> > irq_to_desc(). So locking/unlocking that desc pointer is bogus when
> > destroy_irq() can go ahead and free the desc in parallel.
> >
> > So, SPARSE_IRQ looks terribly broken! Yinghai, Thomas?
>
> Yes, we need refcounts for that. We talked about that before, but then
> the argument was against it was that all that code is serialized
> already, so no need. How wrong :)

I looked a bit more at this and it looks like unregister_handler_proc()
(that gets called from the free_irq path) will ensure that all the
existing proc writers modifying the irq affinity from
the /proc/irq/N/smp_affinity interface will be completed. So Dimitri,
the code path changing irq affinity looks fine.

We also do synchronize_irq() in __free_irq(). Followed by
masking/shutdown the irq etc and the destroy_irq() which happens after
all this is probably getting lucky because of timing reasons (any other
cpu handling that irq would have completed the irq handling and the
corresponding reference to the 'desc' etc). But we should be also doing
synchronize_rcu()/synchronize_sched() to ensure that other cpu's are not
in the irq handling paths having a reference to the 'desc'.

There are other (not-so common) irq desc references, like in the
show_interrupts() (cat /proc/interrupts path) etc, that does things like
this in the process context:

desc = irq_to_desc(i);
if (!desc)
return 0;

raw_spin_lock_irqsave(&desc->lock, flags);

May be we should introduce something like
get_irq_desc_locked()/put_irq_desc_locked() that can safely access the
irq desc with pre-emption/irq's disabled and lock it etc. And the
synchronize_sched() will enable the destroy_irq()/free_desc() to free it
safely etc.

thanks,
suresh

2012-05-26 10:18:52

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Fri, 25 May 2012, Suresh Siddha wrote:
> On Thu, 2012-05-24 at 21:16 +0200, Thomas Gleixner wrote:
> There are other (not-so common) irq desc references, like in the
> show_interrupts() (cat /proc/interrupts path) etc, that does things like
> this in the process context:
>
> desc = irq_to_desc(i);
> if (!desc)
> return 0;
>
> raw_spin_lock_irqsave(&desc->lock, flags);
>
> May be we should introduce something like
> get_irq_desc_locked()/put_irq_desc_locked() that can safely access the
> irq desc with pre-emption/irq's disabled and lock it etc. And the
> synchronize_sched() will enable the destroy_irq()/free_desc() to free it
> safely etc.

I want to avoid that and instead use proper refcounting. The reason is
that we want to move the irq descriptor when the affinity changes
nodes, and for that we need refcounting anyway.

Thanks,

tglx

2012-05-27 01:41:41

by Jiang Liu

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

I have realized the same issue with typical usages of for_each_irq_desc(),
which may access freed memory with SPARSE_IRQ. My naive solution was to
avoid freeing irq_desc even SPARSE_IRQ_ is enabled.

for_each_irq_desc(i, desc) {
raw_spin_lock_irq(&desc->lock);

On 05/26/2012 06:18 PM, Thomas Gleixner wrote:
> On Fri, 25 May 2012, Suresh Siddha wrote:
>> On Thu, 2012-05-24 at 21:16 +0200, Thomas Gleixner wrote:
>> There are other (not-so common) irq desc references, like in the
>> show_interrupts() (cat /proc/interrupts path) etc, that does things like
>> this in the process context:
>>
>> desc = irq_to_desc(i);
>> if (!desc)
>> return 0;
>>
>> raw_spin_lock_irqsave(&desc->lock, flags);
>>
>> May be we should introduce something like
>> get_irq_desc_locked()/put_irq_desc_locked() that can safely access the
>> irq desc with pre-emption/irq's disabled and lock it etc. And the
>> synchronize_sched() will enable the destroy_irq()/free_desc() to free it
>> safely etc.
>
> I want to avoid that and instead use proper refcounting. The reason is
> that we want to move the irq descriptor when the affinity changes
> nodes, and for that we need refcounting anyway.
>
> Thanks,
>
> tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2012-05-30 13:50:29

by Dimitri Sivanich

[permalink] [raw]
Subject: Re: [PATCH] x86: check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt

On Sat, May 26, 2012 at 12:18:21PM +0200, Thomas Gleixner wrote:
> On Fri, 25 May 2012, Suresh Siddha wrote:
> > On Thu, 2012-05-24 at 21:16 +0200, Thomas Gleixner wrote:
> > There are other (not-so common) irq desc references, like in the
> > show_interrupts() (cat /proc/interrupts path) etc, that does things like
> > this in the process context:
> >
> > desc = irq_to_desc(i);
> > if (!desc)
> > return 0;
> >
> > raw_spin_lock_irqsave(&desc->lock, flags);
> >
> > May be we should introduce something like
> > get_irq_desc_locked()/put_irq_desc_locked() that can safely access the
> > irq desc with pre-emption/irq's disabled and lock it etc. And the
> > synchronize_sched() will enable the destroy_irq()/free_desc() to free it
> > safely etc.
>
> I want to avoid that and instead use proper refcounting. The reason is
> that we want to move the irq descriptor when the affinity changes
> nodes, and for that we need refcounting anyway.
>

While this proposal sounds good, in the meantime would there be any harm in
putting the NULL cfg check into smp_irq_move_cleanup_interrupt()?

It's a minimal change, and eliminates the panics that I've encountered thus far.

Reposting the patch.



A NULL pointer dereference can occur in smp_irq_move_cleanup_interrupt() if
we haven't yet setup the irq_cfg pointer in the irq_desc.irq_data.chip_data.

In create_irq_nr() there is a window where we have set vector_irq in
__assign_irq_vector(), but not yet called irq_set_chip_data() to set the
irq_cfg pointer.

Should an IRQ_MOVE_CLEANUP_VECTOR hit the cpu in question during this time,
smp_irq_move_cleanup_interrupt() will attempt to process the aforementioned
irq, but panic when accessing irq_cfg.

There is also a window in destroy_irq() where we've cleared the irq_cfg
pointer in free_irq_cfg(), but have not yet called irq_free_desc(). Note
that we have cleared vector_irq in __clear_irq_vector() prior to free_irq_cfg(),
but smp_irq_move_cleanup_interrupt() might've already referenced the irq_desc.

Only continue processing the irq if irq_cfg is non-NULL.

Signed-off-by: Dimitri Sivanich <[email protected]>
---
arch/x86/kernel/apic/io_apic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux.orig/arch/x86/kernel/apic/io_apic.c
+++ linux/arch/x86/kernel/apic/io_apic.c
@@ -2478,9 +2478,12 @@ asmlinkage void smp_irq_move_cleanup_int
if (!desc)
continue;

- cfg = irq_cfg(irq);
raw_spin_lock(&desc->lock);

+ cfg = irq_cfg(irq);
+ if (!cfg)
+ goto unlock;
+
/*
* Check if the irq migration is in progress. If so, we
* haven't received the cleanup request yet for this irq.