2010-11-30 08:29:13

by Kenji Kaneshige

[permalink] [raw]
Subject: [PATCH 0/2] Fix dmar fault interrupt problems

Hi,

Here are patches to fix the following problems regarding dmar fault interrupt.

- No dmar fault event is notified in x2apic cluster mode
- Changing IRQ affinity of dmar fault interrupt causes "No irq handler
for vector (irq XX)" message and dmar fault interrupts are never
notified after that.

Regards,
Kenji Kaneshige


2010-11-30 08:31:27

by Kenji Kaneshige

[permalink] [raw]
Subject: [PATCH 1/2] dmar: fix fault interrupt setup

Fix the problem dmar fault event is not notified in x2apic cluster mode.

In the current code, dmar fault event is configured before setting up
the x86_cpu_to_logical_apicid percpu variable in x2apic cluster
mode. Because of this, invalid apic ID is used for dmar fault
interrupt and this cuases the problem.

To fix the problem, do dmar fault event configuration after local apic
setup (after end_local_APIC_setup()).

Signed-off-by: Kenji Kaneshige <[email protected]>


---
arch/x86/kernel/apic/apic.c | 9 +++++++++
arch/x86/kernel/apic/probe_64.c | 7 -------
2 files changed, 9 insertions(+), 7 deletions(-)

Index: linux-next-20101125/arch/x86/kernel/apic/apic.c
===================================================================
--- linux-next-20101125.orig/arch/x86/kernel/apic/apic.c
+++ linux-next-20101125/arch/x86/kernel/apic/apic.c
@@ -1745,6 +1745,15 @@ int __init APIC_init_uniprocessor(void)

end_local_APIC_setup();

+#ifdef CONFIG_INTR_REMAP
+ /*
+ * Now that local APIC setup is completed, configure the fault
+ * handling for interrupt remapping.
+ */
+ if (intr_remapping_enabled)
+ enable_drhd_fault_handling();
+#endif
+
#ifdef CONFIG_X86_IO_APIC
if (smp_found_config && !skip_ioapic_setup && nr_ioapics)
setup_IO_APIC();
Index: linux-next-20101125/arch/x86/kernel/apic/probe_64.c
===================================================================
--- linux-next-20101125.orig/arch/x86/kernel/apic/probe_64.c
+++ linux-next-20101125/arch/x86/kernel/apic/probe_64.c
@@ -79,13 +79,6 @@ void __init default_setup_apic_routing(v
/* need to update phys_pkg_id */
apic->phys_pkg_id = apicid_phys_pkg_id;
}
-
- /*
- * Now that apic routing model is selected, configure the
- * fault handling for intr remapping.
- */
- if (intr_remapping_enabled)
- enable_drhd_fault_handling();
}

/* Same for both flat and physical. */

2010-11-30 08:33:00

by Kenji Kaneshige

[permalink] [raw]
Subject: [PATCH 2/2] dmar: Fix dmar interrupt affinity handling

Fix the problem that changing IRQ affinity of dmar fault interrupt
causes "No irq handler for vector (irq XX)" message and dmar fault
interrupts are never notified after that.

The dmar_msi_set_affinity() must configure upper address register of
remapping hardware interrupt in x2apic mode.

Signed-off-by: Kenji Kaneshige <[email protected]>

---
arch/x86/kernel/apic/io_apic.c | 3 +++
1 file changed, 3 insertions(+)

Index: linux-next-20101125/arch/x86/kernel/apic/io_apic.c
===================================================================
--- linux-next-20101125.orig/arch/x86/kernel/apic/io_apic.c
+++ linux-next-20101125/arch/x86/kernel/apic/io_apic.c
@@ -3413,6 +3413,9 @@ dmar_msi_set_affinity(struct irq_data *d
msg.data |= MSI_DATA_VECTOR(cfg->vector);
msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
msg.address_lo |= MSI_ADDR_DEST_ID(dest);
+ msg.address_hi = MSI_ADDR_BASE_HI;
+ if (x2apic_enabled())
+ msg.address_hi |= MSI_ADDR_EXT_DEST_ID(dest);

dmar_msi_write(irq, &msg);


2010-11-30 18:09:11

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix dmar fault interrupt problems

On Tue, 2010-11-30 at 00:28 -0800, Kenji Kaneshige wrote:
> Hi,
>
> Here are patches to fix the following problems regarding dmar fault interrupt.
>
> - No dmar fault event is notified in x2apic cluster mode
> - Changing IRQ affinity of dmar fault interrupt causes "No irq handler
> for vector (irq XX)" message and dmar fault interrupts are never
> notified after that.

Kenji, In addition to these two, there are couple of more patches that
are needed to fix the kexec/kdump issue that we are debugging on RH
bugzilla.

I will post all these 4 patches shortly. I already noticed and queued
the second patch (dmar irq migration) you mentioned.

thanks,
suresh

2010-11-30 18:15:36

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix dmar fault interrupt problems

* Suresh Siddha ([email protected]) wrote:
> On Tue, 2010-11-30 at 00:28 -0800, Kenji Kaneshige wrote:
> > Hi,
> >
> > Here are patches to fix the following problems regarding dmar fault interrupt.
> >
> > - No dmar fault event is notified in x2apic cluster mode
> > - Changing IRQ affinity of dmar fault interrupt causes "No irq handler
> > for vector (irq XX)" message and dmar fault interrupts are never
> > notified after that.
>
> Kenji, In addition to these two, there are couple of more patches that
> are needed to fix the kexec/kdump issue that we are debugging on RH
> bugzilla.
>
> I will post all these 4 patches shortly. I already noticed and queued
> the second patch (dmar irq migration) you mentioned.

And one of them (the additional call to flushing existing dmar faults)
is not needed. So the quirk plus ensuring ir faults show up as msi
interrupts should be sufficient.

thanks,
-chris

2010-11-30 18:20:16

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix dmar fault interrupt problems

On Tue, 2010-11-30 at 10:13 -0800, Chris Wright wrote:
> * Suresh Siddha ([email protected]) wrote:
> > On Tue, 2010-11-30 at 00:28 -0800, Kenji Kaneshige wrote:
> > > Hi,
> > >
> > > Here are patches to fix the following problems regarding dmar fault interrupt.
> > >
> > > - No dmar fault event is notified in x2apic cluster mode
> > > - Changing IRQ affinity of dmar fault interrupt causes "No irq handler
> > > for vector (irq XX)" message and dmar fault interrupts are never
> > > notified after that.
> >
> > Kenji, In addition to these two, there are couple of more patches that
> > are needed to fix the kexec/kdump issue that we are debugging on RH
> > bugzilla.
> >
> > I will post all these 4 patches shortly. I already noticed and queued
> > the second patch (dmar irq migration) you mentioned.
>
> And one of them (the additional call to flushing existing dmar faults)
> is not needed.

yes, that patch is not needed for the current kexec/kdump issue. But
nevertheless that is an issue that needs to be addressed aswell.

thanks,
suresh

2010-11-30 19:29:15

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix dmar fault interrupt problems

* Suresh Siddha ([email protected]) wrote:
> On Tue, 2010-11-30 at 10:13 -0800, Chris Wright wrote:
> > * Suresh Siddha ([email protected]) wrote:
> > > On Tue, 2010-11-30 at 00:28 -0800, Kenji Kaneshige wrote:
> > > > Hi,
> > > >
> > > > Here are patches to fix the following problems regarding dmar fault interrupt.
> > > >
> > > > - No dmar fault event is notified in x2apic cluster mode
> > > > - Changing IRQ affinity of dmar fault interrupt causes "No irq handler
> > > > for vector (irq XX)" message and dmar fault interrupts are never
> > > > notified after that.
> > >
> > > Kenji, In addition to these two, there are couple of more patches that
> > > are needed to fix the kexec/kdump issue that we are debugging on RH
> > > bugzilla.
> > >
> > > I will post all these 4 patches shortly. I already noticed and queued
> > > the second patch (dmar irq migration) you mentioned.
> >
> > And one of them (the additional call to flushing existing dmar faults)
> > is not needed.
>
> yes, that patch is not needed for the current kexec/kdump issue. But
> nevertheless that is an issue that needs to be addressed aswell.

AFAICT, it just becomes a duplicate call in a narrow window.

native_smp_prepare_cpus() (or APIC_init_uniprocessor()):
enable_IR_x2apic()
enable_IR()
enable_intr_remapping() <-- clears faults
default_setup_apic_routing()
enable_drhd_fault_handling() <-- added call to clear faults

That's what I meant by not needed.

thanks,
-chris

2010-11-30 19:52:56

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix dmar fault interrupt problems

On Tue, 2010-11-30 at 11:28 -0800, Chris Wright wrote:
> AFAICT, it just becomes a duplicate call in a narrow window.
>
> native_smp_prepare_cpus() (or APIC_init_uniprocessor()):
> enable_IR_x2apic()
> enable_IR()
> enable_intr_remapping() <-- clears faults
> default_setup_apic_routing()
> enable_drhd_fault_handling() <-- added call to clear faults
>
> That's what I meant by not needed.

I agree that we will call clear faults twice.

With out the second clear fault, any fault in that narrow window of
enabling intr-remapping and enabling fault handling will go unnoticed
and block further faults getting reported.

I am not sure if the first fault is necessary. I can check and remove it
if not needed.

Ideally we want to do something like:

Setup fault handling
Clear any faults
Enable intr-remapping

But fault handling interrupt configuration depends on the apic mode that
gets selected based on whether intr-remapping was enabled successfully
or not. And hence we were enabling fault handling after enabling the
intr-remapping mode.

I can see if I can cleanup any of these flows for 2.6.38. For 2.6.37 and
older kernels I would like to keep the patch simple.

thanks,
suresh

2010-11-30 20:18:20

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH 1/2] dmar: fix fault interrupt setup

On Tue, 2010-11-30 at 00:30 -0800, Kenji Kaneshige wrote:
> Fix the problem dmar fault event is not notified in x2apic cluster mode.
>
> In the current code, dmar fault event is configured before setting up
> the x86_cpu_to_logical_apicid percpu variable in x2apic cluster
> mode. Because of this, invalid apic ID is used for dmar fault
> interrupt and this cuases the problem.
>
> To fix the problem, do dmar fault event configuration after local apic
> setup (after end_local_APIC_setup()).
>
> Signed-off-by: Kenji Kaneshige <[email protected]>
>
>
> ---
> arch/x86/kernel/apic/apic.c | 9 +++++++++
> arch/x86/kernel/apic/probe_64.c | 7 -------
> 2 files changed, 9 insertions(+), 7 deletions(-)
>
> Index: linux-next-20101125/arch/x86/kernel/apic/apic.c
> ===================================================================
> --- linux-next-20101125.orig/arch/x86/kernel/apic/apic.c
> +++ linux-next-20101125/arch/x86/kernel/apic/apic.c
> @@ -1745,6 +1745,15 @@ int __init APIC_init_uniprocessor(void)
>
> end_local_APIC_setup();
>
> +#ifdef CONFIG_INTR_REMAP
> + /*
> + * Now that local APIC setup is completed, configure the fault
> + * handling for interrupt remapping.
> + */
> + if (intr_remapping_enabled)
> + enable_drhd_fault_handling();
> +#endif

BTW, this is wrong. APIC_init_uniprocessor() gets called only for UP
kernel. I will move it to the end_local_APIC_setup()

> +
> #ifdef CONFIG_X86_IO_APIC
> if (smp_found_config && !skip_ioapic_setup && nr_ioapics)
> setup_IO_APIC();
> Index: linux-next-20101125/arch/x86/kernel/apic/probe_64.c
> ===================================================================
> --- linux-next-20101125.orig/arch/x86/kernel/apic/probe_64.c
> +++ linux-next-20101125/arch/x86/kernel/apic/probe_64.c
> @@ -79,13 +79,6 @@ void __init default_setup_apic_routing(v
> /* need to update phys_pkg_id */
> apic->phys_pkg_id = apicid_phys_pkg_id;
> }
> -
> - /*
> - * Now that apic routing model is selected, configure the
> - * fault handling for intr remapping.
> - */
> - if (intr_remapping_enabled)
> - enable_drhd_fault_handling();
> }
>
> /* Same for both flat and physical. */
>
>

2010-11-30 22:21:10

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 0/2] Fix dmar fault interrupt problems

* Suresh Siddha ([email protected]) wrote:
> On Tue, 2010-11-30 at 11:28 -0800, Chris Wright wrote:
> > AFAICT, it just becomes a duplicate call in a narrow window.
> >
> > native_smp_prepare_cpus() (or APIC_init_uniprocessor()):
> > enable_IR_x2apic()
> > enable_IR()
> > enable_intr_remapping() <-- clears faults
> > default_setup_apic_routing()
> > enable_drhd_fault_handling() <-- added call to clear faults
> >
> > That's what I meant by not needed.
>
> I agree that we will call clear faults twice.
>
> With out the second clear fault, any fault in that narrow window of
> enabling intr-remapping and enabling fault handling will go unnoticed
> and block further faults getting reported.
>
> I am not sure if the first fault is necessary. I can check and remove it
> if not needed.
>
> Ideally we want to do something like:
>
> Setup fault handling
> Clear any faults
> Enable intr-remapping

Agree, that's ideal.

> But fault handling interrupt configuration depends on the apic mode that
> gets selected based on whether intr-remapping was enabled successfully
> or not. And hence we were enabling fault handling after enabling the
> intr-remapping mode.
>
> I can see if I can cleanup any of these flows for 2.6.38. For 2.6.37 and
> older kernels I would like to keep the patch simple.

Yup, makes sense.

thanks,
-chris

2010-12-01 05:14:37

by Kenji Kaneshige

[permalink] [raw]
Subject: Re: [PATCH 1/2] dmar: fix fault interrupt setup

(2010/12/01 5:18), Suresh Siddha wrote:
> On Tue, 2010-11-30 at 00:30 -0800, Kenji Kaneshige wrote:
>> Fix the problem dmar fault event is not notified in x2apic cluster mode.
>>
>> In the current code, dmar fault event is configured before setting up
>> the x86_cpu_to_logical_apicid percpu variable in x2apic cluster
>> mode. Because of this, invalid apic ID is used for dmar fault
>> interrupt and this cuases the problem.
>>
>> To fix the problem, do dmar fault event configuration after local apic
>> setup (after end_local_APIC_setup()).
>>
>> Signed-off-by: Kenji Kaneshige<[email protected]>
>>
>>
>> ---
>> arch/x86/kernel/apic/apic.c | 9 +++++++++
>> arch/x86/kernel/apic/probe_64.c | 7 -------
>> 2 files changed, 9 insertions(+), 7 deletions(-)
>>
>> Index: linux-next-20101125/arch/x86/kernel/apic/apic.c
>> ===================================================================
>> --- linux-next-20101125.orig/arch/x86/kernel/apic/apic.c
>> +++ linux-next-20101125/arch/x86/kernel/apic/apic.c
>> @@ -1745,6 +1745,15 @@ int __init APIC_init_uniprocessor(void)
>>
>> end_local_APIC_setup();
>>
>> +#ifdef CONFIG_INTR_REMAP
>> + /*
>> + * Now that local APIC setup is completed, configure the fault
>> + * handling for interrupt remapping.
>> + */
>> + if (intr_remapping_enabled)
>> + enable_drhd_fault_handling();
>> +#endif
>
> BTW, this is wrong. APIC_init_uniprocessor() gets called only for UP
> kernel. I will move it to the end_local_APIC_setup()
>

Oh, it's totally wrong. I'm sorry about that.

Maybe even with my wrong patch, dmar interrupt is configured in the
intel_iommu_init()->init_dmars()->dmar_set_interrupt() code path. I
think this is why I could not noticed my badness in my test.

By the way, why do we need to try to configure dmar interrupt in the
intel_iommu_init() code path again?

Regards,
Kenji Kaneshige