2009-06-04 19:02:21

by Suresh Siddha

[permalink] [raw]
Subject: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

From: Suresh Siddha <[email protected]>
Subject: x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

In the presence of interrupt-remapping, irqs will be migrated in the
process context and we don't do (and there is no need to) irq_chip mask/unmask
while migrating the interrupt.

Similarly fix the fixup_irqs() that get called during cpu offline and avoid
calling irq_chip mask/unmask for irqs that are ok to be migrated in the
process context.

While we didn't observe any race condition with the existing code,
this change takes complete advantage of interrupt-remapping in
the newer generation platforms and avoids any potential HW lockup's
(that often worry Eric :)

Signed-off-by: Suresh Siddha <[email protected]>
Cc: Eric W. Biederman <[email protected]>
---

diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
index 977d8b4..82265a5 100644
--- a/arch/x86/kernel/irq_64.c
+++ b/arch/x86/kernel/irq_64.c
@@ -95,7 +95,7 @@ void fixup_irqs(void)
affinity = cpu_all_mask;
}

- if (desc->chip->mask)
+ if (!(desc->status & IRQ_MOVE_PCNTXT) && desc->chip->mask)
desc->chip->mask(irq);

if (desc->chip->set_affinity)
@@ -103,7 +103,7 @@ void fixup_irqs(void)
else if (!(warned++))
set_affinity = 0;

- if (desc->chip->unmask)
+ if (!(desc->status & IRQ_MOVE_PCNTXT) && desc->chip->unmask)
desc->chip->unmask(irq);

spin_unlock(&desc->lock);


2009-06-04 23:13:31

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

Suresh Siddha <[email protected]> writes:

> From: Suresh Siddha <[email protected]>
> Subject: x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping
>
> In the presence of interrupt-remapping, irqs will be migrated in the
> process context and we don't do (and there is no need to) irq_chip mask/unmask
> while migrating the interrupt.
>
> Similarly fix the fixup_irqs() that get called during cpu offline and avoid
> calling irq_chip mask/unmask for irqs that are ok to be migrated in the
> process context.
>
> While we didn't observe any race condition with the existing code,
> this change takes complete advantage of interrupt-remapping in
> the newer generation platforms and avoids any potential HW lockup's
> (that often worry Eric :)

You now apparently fail to migrate the irq threads in tandem with
the rest of the irqs.

Eric

> Signed-off-by: Suresh Siddha <[email protected]>
> Cc: Eric W. Biederman <[email protected]>
> ---
>
> diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c
> index 977d8b4..82265a5 100644
> --- a/arch/x86/kernel/irq_64.c
> +++ b/arch/x86/kernel/irq_64.c
> @@ -95,7 +95,7 @@ void fixup_irqs(void)
> affinity = cpu_all_mask;
> }
>
> - if (desc->chip->mask)
> + if (!(desc->status & IRQ_MOVE_PCNTXT) && desc->chip->mask)
> desc->chip->mask(irq);
>
> if (desc->chip->set_affinity)
> @@ -103,7 +103,7 @@ void fixup_irqs(void)
> else if (!(warned++))
> set_affinity = 0;
>
> - if (desc->chip->unmask)
> + if (!(desc->status & IRQ_MOVE_PCNTXT) && desc->chip->unmask)
> desc->chip->unmask(irq);
>
> spin_unlock(&desc->lock);

2009-06-05 01:20:45

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

On Thu, 2009-06-04 at 16:13 -0700, Eric W. Biederman wrote:
> Suresh Siddha <[email protected]> writes:
>
> > From: Suresh Siddha <[email protected]>
> > Subject: x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping
> >
> > In the presence of interrupt-remapping, irqs will be migrated in the
> > process context and we don't do (and there is no need to) irq_chip mask/unmask
> > while migrating the interrupt.
> >
> > Similarly fix the fixup_irqs() that get called during cpu offline and avoid
> > calling irq_chip mask/unmask for irqs that are ok to be migrated in the
> > process context.
> >
> > While we didn't observe any race condition with the existing code,
> > this change takes complete advantage of interrupt-remapping in
> > the newer generation platforms and avoids any potential HW lockup's
> > (that often worry Eric :)
>
> You now apparently fail to migrate the irq threads in tandem with
> the rest of the irqs.

Eric, Are you referring to Gary's issues? As far as I understand, they
don't happen in the presence of interrupt-remapping.

Can you ack this patch, as this avoid touching IO-APIC and MSI entries
and does fixup_irqs() in a much more reliable fashion.

I haven't followed Gary's couple of patches related to non
interrupt-remapping case. I will go through them and see how I can help
there.

thanks,
suresh

2009-06-05 01:22:18

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

On Thu, 2009-06-04 at 18:18 -0700, Suresh Siddha wrote:
> On Thu, 2009-06-04 at 16:13 -0700, Eric W. Biederman wrote:
> > Suresh Siddha <[email protected]> writes:
> >
> > > From: Suresh Siddha <[email protected]>
> > > Subject: x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping
> > >
> > > In the presence of interrupt-remapping, irqs will be migrated in the
> > > process context and we don't do (and there is no need to) irq_chip mask/unmask
> > > while migrating the interrupt.
> > >
> > > Similarly fix the fixup_irqs() that get called during cpu offline and avoid
> > > calling irq_chip mask/unmask for irqs that are ok to be migrated in the
> > > process context.
> > >
> > > While we didn't observe any race condition with the existing code,
> > > this change takes complete advantage of interrupt-remapping in
> > > the newer generation platforms and avoids any potential HW lockup's
> > > (that often worry Eric :)
> >
> > You now apparently fail to migrate the irq threads in tandem with
> > the rest of the irqs.
>
> Eric, Are you referring to Gary's issues? As far as I understand, they
> don't happen in the presence of interrupt-remapping.
>
> Can you ack this patch, as this avoid touching IO-APIC and MSI entries
> and does fixup_irqs() in a much more reliable fashion.

in the presence of interrupt-remapping ofcourse :)

>
> I haven't followed Gary's couple of patches related to non
> interrupt-remapping case. I will go through them and see how I can help
> there.
>
> thanks,
> suresh

2009-06-05 01:47:39

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

Suresh Siddha <[email protected]> writes:

> On Thu, 2009-06-04 at 18:18 -0700, Suresh Siddha wrote:
>> On Thu, 2009-06-04 at 16:13 -0700, Eric W. Biederman wrote:
>> > Suresh Siddha <[email protected]> writes:
>> >
>> > > From: Suresh Siddha <[email protected]>
>> > > Subject: x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping
>> > >
>> > > In the presence of interrupt-remapping, irqs will be migrated in the
>> > > process context and we don't do (and there is no need to) irq_chip mask/unmask
>> > > while migrating the interrupt.
>> > >
>> > > Similarly fix the fixup_irqs() that get called during cpu offline and avoid
>> > > calling irq_chip mask/unmask for irqs that are ok to be migrated in the
>> > > process context.
>> > >
>> > > While we didn't observe any race condition with the existing code,
>> > > this change takes complete advantage of interrupt-remapping in
>> > > the newer generation platforms and avoids any potential HW lockup's
>> > > (that often worry Eric :)
>> >
>> > You now apparently fail to migrate the irq threads in tandem with
>> > the rest of the irqs.
>>
>> Eric, Are you referring to Gary's issues? As far as I understand, they
>> don't happen in the presence of interrupt-remapping.
>>
>> Can you ack this patch, as this avoid touching IO-APIC and MSI entries
>> and does fixup_irqs() in a much more reliable fashion.
>
> in the presence of interrupt-remapping ofcourse :)

As far as this patch goes it looks like an improvement.

Acked-by: "Eric W. Biederman" <[email protected]>

However after looking at Gary's issues I see some things that are still wrong
on this path.

1) We don't do the part of irq migration that moves irq threads.
We aren't using irq threads yet but still

If we could figure out how to call irq_set_affinity for the IRQ_MOVE_PCNTXT
code path that would make the maintenance a lot simpler.

2) We still diverge on 32bit vs 64bit for no reason.
I expect the fixed 64bit version should be moved into apic/io_apic.c

3) We still enable irqs for a short while after this to let things drain.
I am wondering if that is really necessary. It does very simply
allow the irq cleanup ipi to happen, and it unjams any irqs that happened
before we migrated them.

If we wanted to very strictly follow the rules I guess we could do something
like the cleanup_ipi by hand on the cpu that is going down and rebroadcast
all of the pending irqs to another cpu to process.

Eric

2009-06-05 22:20:26

by Gary Hade

[permalink] [raw]
Subject: Re: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

On Thu, Jun 04, 2009 at 06:18:14PM -0700, Suresh Siddha wrote:
> On Thu, 2009-06-04 at 16:13 -0700, Eric W. Biederman wrote:
> > Suresh Siddha <[email protected]> writes:
> >
> > > From: Suresh Siddha <[email protected]>
> > > Subject: x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping
> > >
> > > In the presence of interrupt-remapping, irqs will be migrated in the
> > > process context and we don't do (and there is no need to) irq_chip mask/unmask
> > > while migrating the interrupt.
> > >
> > > Similarly fix the fixup_irqs() that get called during cpu offline and avoid
> > > calling irq_chip mask/unmask for irqs that are ok to be migrated in the
> > > process context.
> > >
> > > While we didn't observe any race condition with the existing code,
> > > this change takes complete advantage of interrupt-remapping in
> > > the newer generation platforms and avoids any potential HW lockup's
> > > (that often worry Eric :)
> >
> > You now apparently fail to migrate the irq threads in tandem with
> > the rest of the irqs.
>
> Eric, Are you referring to Gary's issues? As far as I understand, they
> don't happen in the presence of interrupt-remapping.

Suresh,
We do not currently have the h/w on which to test this assertion
but it seems like there is a good chance that at least the race that
http://lkml.org/lkml/2009/6/2/377
fixes could reproduce there.

The other problem that is repaired by
http://lkml.org/lkml/2009/6/2/378
depends on the i/o redirection table write with remote IRR bit
set lockup anomaly that the interrupt-remapping code may avoid
or perhaps is not even present with that h/w. My proposed fix
for the problem is based on previous interrupt-remapping code
that you recently removed with
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0280f7c416c652a2fd95d166f52b199ae61122c0
If I correctly understand your justification for the change
it sounds like the interrupt-remapping code now "avoids" the problem.

Thanks,
Gary

--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
[email protected]
http://www.ibm.com/linux/ltc

2009-06-06 01:00:04

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

On Fri, 2009-06-05 at 15:20 -0700, Gary Hade wrote:
> On Thu, Jun 04, 2009 at 06:18:14PM -0700, Suresh Siddha wrote:
> > On Thu, 2009-06-04 at 16:13 -0700, Eric W. Biederman wrote:
> > > Suresh Siddha <[email protected]> writes:
> > >
> > > > From: Suresh Siddha <[email protected]>
> > > > Subject: x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping
> > > >
> > > > In the presence of interrupt-remapping, irqs will be migrated in the
> > > > process context and we don't do (and there is no need to) irq_chip mask/unmask
> > > > while migrating the interrupt.
> > > >
> > > > Similarly fix the fixup_irqs() that get called during cpu offline and avoid
> > > > calling irq_chip mask/unmask for irqs that are ok to be migrated in the
> > > > process context.
> > > >
> > > > While we didn't observe any race condition with the existing code,
> > > > this change takes complete advantage of interrupt-remapping in
> > > > the newer generation platforms and avoids any potential HW lockup's
> > > > (that often worry Eric :)
> > >
> > > You now apparently fail to migrate the irq threads in tandem with
> > > the rest of the irqs.
> >
> > Eric, Are you referring to Gary's issues? As far as I understand, they
> > don't happen in the presence of interrupt-remapping.
>
> Suresh,
> We do not currently have the h/w on which to test this assertion
> but it seems like there is a good chance that at least the race that
> http://lkml.org/lkml/2009/6/2/377
> fixes could reproduce there.

No. In the presence of interrupt-remapping, migration of the irq will be
done atomically from the process context. So we don't depend for the
next interrupt to arrive for the migration.

In the particular case that you mentioned above, we are calling the
send_cleanup_vector() from the set_affinity() itself in case of
interrupt-remapping. And this will reset the cfg->move_in_progress.

But I agree that this bug is pretty nasty for non interrupt-remapping
cases and we are very lucky so far for not hitting it with all the
irqbalance changes and with suspend/resume code path.

While I agree with your online fix, I have to catch up with Eric's
concerns.

>
> The other problem that is repaired by
> http://lkml.org/lkml/2009/6/2/378

oh! This one is the famous Eric's rant about broken fixup_irqs() in the
presence of non interrupt-remapping.

Long time back I have proposed a solution to Eric to resolve this for
non interrupt-remapping cases but don't think I never addressed that.
Again let me catch up with Eric's comments and see if we can comeup with
a solution that is acceptable to everyone.

> depends on the i/o redirection table write with remote IRR bit
> set lockup anomaly that the interrupt-remapping code may avoid
> or perhaps is not even present with that h/w. My proposed fix
> for the problem is based on previous interrupt-remapping code
> that you recently removed with
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0280f7c416c652a2fd95d166f52b199ae61122c0

I cleaned up my code for a reason (that I never liked it and is
complex). So I would def recommend not to go down that path.

This second issue also doesn't happen with interrupt-remapping, as we avoid
touching the io-apic RTE's and use a virtual vector in the RTE
(which is same irrespective of the destination).

Will work with you next week in coming up with clean fixes.

thanks,
suresh

2009-06-06 01:22:07

by Suresh Siddha

[permalink] [raw]
Subject: Re: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

On Thu, 2009-06-04 at 18:47 -0700, Eric W. Biederman wrote:
> As far as this patch goes it looks like an improvement.
>
> Acked-by: "Eric W. Biederman" <[email protected]>

Thanks. Ingo, Peter: Can you please queue this patch? Eric, more
comments below.

> However after looking at Gary's issues I see some things that are still wrong
> on this path.
>
> 1) We don't do the part of irq migration that moves irq threads.
> We aren't using irq threads yet but still
>

Ok. Will look at the irq threads code (-rt tree?).

> If we could figure out how to call irq_set_affinity for the IRQ_MOVE_PCNTXT
> code path that would make the maintenance a lot simpler.

Ok. Will post this cleanup sometime.

>
> 2) We still diverge on 32bit vs 64bit for no reason.
> I expect the fixed 64bit version should be moved into apic/io_apic.c

Agree.

>
> 3) We still enable irqs for a short while after this to let things drain.
> I am wondering if that is really necessary. It does very simply
> allow the irq cleanup ipi to happen, and it unjams any irqs that happened
> before we migrated them.

Yes.

> If we wanted to very strictly follow the rules I guess we could do something
> like the cleanup_ipi by hand on the cpu that is going down and rebroadcast
> all of the pending irqs to another cpu to process.
>

This will be ok for all the cases except the most difficult case (non
interrupt-remapping and IO-APIC level triggered). We should service the
pending interrupt from the same cpu 'X' (that is going down) because of
the vector information (that is cpu 'X' specific) in the IO-APIC RTE.
Otherwise we have to do directed EOI ( and I am not sure if all the
IO-APIC's support that) to the io-apic.

Is there a problem with enabling irqs in the fixup_irqs()?

My old proposal (which will fix the stuck IRR issue seen by Gary) for
fixing the level irq migration in fixup_irqs() is to do something like:

Disable interrupts
..
Mask io-apic RTE
check if the IRR is set
if so,
unmask the io-apic RTE
enable interrupts
and go back to top
else
ok to migrate the io-apic rte.

So if there is any other reason for keeping the interrupts disabled
during fixup_irqs(), then we need to think of another strategy to
address this.

Otherwise, If everyone agrees with this direction then we can try to
comeup with a clean patch for this.

thanks,
suresh

2009-06-06 02:58:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

Suresh Siddha <[email protected]> writes:

> On Thu, 2009-06-04 at 18:47 -0700, Eric W. Biederman wrote:
>> As far as this patch goes it looks like an improvement.
>>
>> Acked-by: "Eric W. Biederman" <[email protected]>
>
> Thanks. Ingo, Peter: Can you please queue this patch? Eric, more
> comments below.
>
>> However after looking at Gary's issues I see some things that are still wrong
>> on this path.
>>
>> 1) We don't do the part of irq migration that moves irq threads.
>> We aren't using irq threads yet but still
>>
>
> Ok. Will look at the irq threads code (-rt tree?).

In 2.6.30 No one is using them yet but the code is merged.

>> If we could figure out how to call irq_set_affinity for the IRQ_MOVE_PCNTXT
>> code path that would make the maintenance a lot simpler.
>
> Ok. Will post this cleanup sometime.

Thanks.

>> 2) We still diverge on 32bit vs 64bit for no reason.
>> I expect the fixed 64bit version should be moved into apic/io_apic.c
>
> Agree.
>
>>
>> 3) We still enable irqs for a short while after this to let things drain.
>> I am wondering if that is really necessary. It does very simply
>> allow the irq cleanup ipi to happen, and it unjams any irqs that happened
>> before we migrated them.
>
> Yes.
>
>> If we wanted to very strictly follow the rules I guess we could do something
>> like the cleanup_ipi by hand on the cpu that is going down and rebroadcast
>> all of the pending irqs to another cpu to process.
>>
>
> This will be ok for all the cases except the most difficult case (non
> interrupt-remapping and IO-APIC level triggered). We should service the
> pending interrupt from the same cpu 'X' (that is going down) because of
> the vector information (that is cpu 'X' specific) in the IO-APIC RTE.
> Otherwise we have to do directed EOI ( and I am not sure if all the
> IO-APIC's support that) to the io-apic.
>
> Is there a problem with enabling irqs in the fixup_irqs()?

Strictly speaking it is against the rules. At least it was last time
I audited the cpu hotunplug path. In practice it seems to work well.

> My old proposal (which will fix the stuck IRR issue seen by Gary) for
> fixing the level irq migration in fixup_irqs() is to do something like:
>
> Disable interrupts
> ..
> Mask io-apic RTE
> check if the IRR is set
> if so,
> unmask the io-apic RTE
> enable interrupts
> and go back to top
> else
> ok to migrate the io-apic rte.
>
> So if there is any other reason for keeping the interrupts disabled
> during fixup_irqs(), then we need to think of another strategy to
> address this.

As long as fixup_irqs is where we grow the nasty work-arounds and
we don't burden the other saner paths. I am generally in favor of it.

> Otherwise, If everyone agrees with this direction then we can try to
> comeup with a clean patch for this.

There is also something else we can do. We can register a cpu hotplug
notifier and tell hotunplug to fail if we dealing with a case that
we can not actually support cleanly.

At the same time there are a couple of more case that we can move into
process context. MSI cpu irq migration being the primary.

It needs a really good testing but I think all of lowest priority interrupt
delivery can be done safely from process context as well. Since we only
need a single register write and if we have properly shutdown the cpu
the hardware just won't deliver irqs to it automagically.

My pipe dream is that we can move just enough irq migration cases into
process context that the suspend to ram code code will be happy. And
we can make cpu hotunplug fail if the irqs are only safe to migrate
in interrupt context.

Eric

2009-06-06 23:38:15

by Gary Hade

[permalink] [raw]
Subject: Re: [patch] x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping

On Fri, Jun 05, 2009 at 05:57:29PM -0700, Suresh Siddha wrote:
> On Fri, 2009-06-05 at 15:20 -0700, Gary Hade wrote:
> > On Thu, Jun 04, 2009 at 06:18:14PM -0700, Suresh Siddha wrote:
> > > On Thu, 2009-06-04 at 16:13 -0700, Eric W. Biederman wrote:
> > > > Suresh Siddha <[email protected]> writes:
> > > >
> > > > > From: Suresh Siddha <[email protected]>
> > > > > Subject: x64: Avoid irq_chip mask/unmask in fixup_irqs for interrupt-remapping
> > > > >
> > > > > In the presence of interrupt-remapping, irqs will be migrated in the
> > > > > process context and we don't do (and there is no need to) irq_chip mask/unmask
> > > > > while migrating the interrupt.
> > > > >
> > > > > Similarly fix the fixup_irqs() that get called during cpu offline and avoid
> > > > > calling irq_chip mask/unmask for irqs that are ok to be migrated in the
> > > > > process context.
> > > > >
> > > > > While we didn't observe any race condition with the existing code,
> > > > > this change takes complete advantage of interrupt-remapping in
> > > > > the newer generation platforms and avoids any potential HW lockup's
> > > > > (that often worry Eric :)
> > > >
> > > > You now apparently fail to migrate the irq threads in tandem with
> > > > the rest of the irqs.
> > >
> > > Eric, Are you referring to Gary's issues? As far as I understand, they
> > > don't happen in the presence of interrupt-remapping.
> >
> > Suresh,
> > We do not currently have the h/w on which to test this assertion
> > but it seems like there is a good chance that at least the race that
> > http://lkml.org/lkml/2009/6/2/377
> > fixes could reproduce there.
>
> No. In the presence of interrupt-remapping, migration of the irq will be
> done atomically from the process context. So we don't depend for the
> next interrupt to arrive for the migration.

Thats good.

>
> In the particular case that you mentioned above, we are calling the
> send_cleanup_vector() from the set_affinity() itself in case of
> interrupt-remapping. And this will reset the cfg->move_in_progress.

Now that you mention this, I do remember seeing those send_cleanup()
calls in ir_set_msi_irq_affinity() and migrate_ioapic_irq_desc().
Much better than having to depend on the IRQ move destination CPU,
especially when correct behavior depends on it's timely receipt
of an interrupt.

>
> But I agree that this bug is pretty nasty for non interrupt-remapping
> cases and we are very lucky so far for not hitting it with all the
> irqbalance changes and with suspend/resume code path.

Yes, _very_ nasty. It was extremely easy for me to initially
reproduce the problem so I have also been wondering why others
have been so lucky. The first time I saw the problem was after
the first run of a simple script similar to the below script
that I had just written to do a quick-and-dirty CPU hotplug
sanity check. I then started seeing file i/o plus sas and aic94xx
device driver errors and found myself with a very trashed system.
Not good!

::::::::::::::
cpus_offline_all.sh
::::::::::::::
#!/bin/sh

##
# Offline all offlineable CPUs
##

SYS_CPU_DIR=/sys/devices/system/cpu

for d in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
cpu="`basename $d`"
state=`cat $d/online`
if [ "$state" = 1 ]; then
echo $cpu: offlining
echo 0 > $d/online
else
echo $cpu: already offline
fi
done

>
> While I agree with your online fix,

Thats encouraging. :)

> I have to catch up with Eric's concerns.
>
> >
> > The other problem that is repaired by
> > http://lkml.org/lkml/2009/6/2/378
>
> oh! This one is the famous Eric's rant about broken fixup_irqs() in the
> presence of non interrupt-remapping.
>
> Long time back I have proposed a solution to Eric to resolve this for
> non interrupt-remapping cases but don't think I never addressed that.
> Again let me catch up with Eric's comments and see if we can comeup with
> a solution that is acceptable to everyone.
>
> > depends on the i/o redirection table write with remote IRR bit
> > set lockup anomaly that the interrupt-remapping code may avoid
> > or perhaps is not even present with that h/w. My proposed fix
> > for the problem is based on previous interrupt-remapping code
> > that you recently removed with
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0280f7c416c652a2fd95d166f52b199ae61122c0
>
> I cleaned up my code for a reason (that I never liked it and is
> complex).

That delay code may look a little complex but it sure seems
to do a nice job. Based on my testing it certainly feels
solid from a functional standpoint.

> So I would def recommend not to go down that path.

Since it has been working so great for us, I am sorry
to hear you say this. However, I am open to any ideas you
have that would maintain the current user interface and would
simple and non-intrusive enough for our distribution partners
to feel comfortable adding via an update an existing release.
It would be nice to get this kind of fix into mainline as
quickly as possible to prevent further propagation of the bug
before a better but more disruptive overall solution, such as
the one that Eric has been pushing for, is available.

BTW, this one has been much harder to reproduce than the
other issue so I am not nearly as surprised that others have
not reported it. I tripped on it after fixing the other issue
and deciding that it might be a good idea to crank up the
interrupt rate while testing the fix.

I am actually at work today because I had some ideas on how
to revise this patch to address Eric's most recent concerns
about the IMHO minor, yet still very worrysome to Eric, changes
it makes to the interrupt context migration path. Since it
sounds like you have concerns beyond it not touching the
interrupt context migration path, I guess I will hold off on
this work until you have a chance to look at the issue next week.

>
> This second issue also doesn't happen with interrupt-remapping, as we avoid
> touching the io-apic RTE's and use a virtual vector in the RTE
> (which is same irrespective of the destination).
>
> Will work with you next week in coming up with clean fixes.

Thanks. I really appreciate this.

Gary

--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
[email protected]
http://www.ibm.com/linux/ltc