2008-10-24 12:46:50

by Kumar Gala

[permalink] [raw]
Subject: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

It appears the default IRQ affinity changes from being just cpu 0 to
all cpu's. This breaks several PPC SMP systems in which only a single
processor is allowed to be selected as the destination of the IRQ.

What is the right answer in fixing this? Should we:

cpumask_t irq_default_affinity = 1;

instead of

cpumask_t irq_default_affinity = CPU_MASK_ALL?

- k


2008-10-24 15:17:15

by Chris Snook

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

Kumar Gala wrote:
> It appears the default IRQ affinity changes from being just cpu 0 to all
> cpu's. This breaks several PPC SMP systems in which only a single
> processor is allowed to be selected as the destination of the IRQ.
>
> What is the right answer in fixing this? Should we:
>
> cpumask_t irq_default_affinity = 1;
>
> instead of
>
> cpumask_t irq_default_affinity = CPU_MASK_ALL?

On those systems, perhaps, but not universally. There's plenty of hardware
where the physical topology of the machine is abstracted away from the OS, and
you need to leave the mask wide open and let the APIC figure out where to map
the IRQs. Ideally, we should probably make this decision based on the APIC, but
if there's no PPC hardware that uses this technique, then it would suffice to
make this arch-specific.

-- Chris

2008-10-24 15:40:14

by Kumar Gala

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)


On Oct 24, 2008, at 10:17 AM, Chris Snook wrote:

> Kumar Gala wrote:
>> It appears the default IRQ affinity changes from being just cpu 0
>> to all cpu's. This breaks several PPC SMP systems in which only a
>> single processor is allowed to be selected as the destination of
>> the IRQ.
>> What is the right answer in fixing this? Should we:
>> cpumask_t irq_default_affinity = 1;
>> instead of
>> cpumask_t irq_default_affinity = CPU_MASK_ALL?
>
> On those systems, perhaps, but not universally. There's plenty of
> hardware where the physical topology of the machine is abstracted
> away from the OS, and you need to leave the mask wide open and let
> the APIC figure out where to map the IRQs. Ideally, we should
> probably make this decision based on the APIC, but if there's no PPC
> hardware that uses this technique, then it would suffice to make
> this arch-specific.


What did those systems do before this patch? Its one thing to expose
a mask in the ability to change the default mask in /proc/irq/
default_smp_affinity. Its another (and a regression in my opinion) to
change the mask value itself.

As for making it ARCH specific, that doesn't really help since not all
PPC hw has the limitation I spoke of. Not even all MPIC (in our
cases) have the limitation.

- k

2008-10-24 16:08:51

by Chris Snook

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

Kumar Gala wrote:
>
> On Oct 24, 2008, at 10:17 AM, Chris Snook wrote:
>
>> Kumar Gala wrote:
>>> It appears the default IRQ affinity changes from being just cpu 0 to
>>> all cpu's. This breaks several PPC SMP systems in which only a
>>> single processor is allowed to be selected as the destination of the
>>> IRQ.
>>> What is the right answer in fixing this? Should we:
>>> cpumask_t irq_default_affinity = 1;
>>> instead of
>>> cpumask_t irq_default_affinity = CPU_MASK_ALL?
>>
>> On those systems, perhaps, but not universally. There's plenty of
>> hardware where the physical topology of the machine is abstracted away
>> from the OS, and you need to leave the mask wide open and let the APIC
>> figure out where to map the IRQs. Ideally, we should probably make
>> this decision based on the APIC, but if there's no PPC hardware that
>> uses this technique, then it would suffice to make this arch-specific.
>
>
> What did those systems do before this patch? Its one thing to expose a
> mask in the ability to change the default mask in
> /proc/irq/default_smp_affinity. Its another (and a regression in my
> opinion) to change the mask value itself.

Before the patch they took an extremely long time to boot if they had storage
attached to each node of a multi-chassis system, performed poorly unless special
irqbalance hackery or manual assignment was used, and imposed artificial
restrictions on the granularity of hardware partitioning to ensure that CPU 0
would always be a CPU that could service all interrupts necessary to boot the OS.

> As for making it ARCH specific, that doesn't really help since not all
> PPC hw has the limitation I spoke of. Not even all MPIC (in our cases)
> have the limitation.

What did those systems do before this patch? :)

Making it arch-specific is an extremely simple way to solve your problem without
making trouble for the people who wanted this patch in the first place. If PPC
needs further refinement to handle particular *PICs, you can implement that
without touching any arch-generic code.

-- Chris

2008-10-24 16:37:54

by Kumar Gala

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)


On Oct 24, 2008, at 11:09 AM, Chris Snook wrote:

> Kumar Gala wrote:
>> On Oct 24, 2008, at 10:17 AM, Chris Snook wrote:
>>> Kumar Gala wrote:
>>>> It appears the default IRQ affinity changes from being just cpu 0
>>>> to all cpu's. This breaks several PPC SMP systems in which only
>>>> a single processor is allowed to be selected as the destination
>>>> of the IRQ.
>>>> What is the right answer in fixing this? Should we:
>>>> cpumask_t irq_default_affinity = 1;
>>>> instead of
>>>> cpumask_t irq_default_affinity = CPU_MASK_ALL?
>>>
>>> On those systems, perhaps, but not universally. There's plenty of
>>> hardware where the physical topology of the machine is abstracted
>>> away from the OS, and you need to leave the mask wide open and let
>>> the APIC figure out where to map the IRQs. Ideally, we should
>>> probably make this decision based on the APIC, but if there's no
>>> PPC hardware that uses this technique, then it would suffice to
>>> make this arch-specific.
>> What did those systems do before this patch? Its one thing to
>> expose a mask in the ability to change the default mask in /proc/
>> irq/default_smp_affinity. Its another (and a regression in my
>> opinion) to change the mask value itself.
>
> Before the patch they took an extremely long time to boot if they
> had storage attached to each node of a multi-chassis system,
> performed poorly unless special irqbalance hackery or manual
> assignment was used, and imposed artificial restrictions on the
> granularity of hardware partitioning to ensure that CPU 0 would
> always be a CPU that could service all interrupts necessary to boot
> the OS.
>
>> As for making it ARCH specific, that doesn't really help since not
>> all PPC hw has the limitation I spoke of. Not even all MPIC (in
>> our cases) have the limitation.
>
> What did those systems do before this patch? :)
>
> Making it arch-specific is an extremely simple way to solve your
> problem without making trouble for the people who wanted this patch
> in the first place. If PPC needs further refinement to handle
> particular *PICs, you can implement that without touching any arch-
> generic code.


So why not just have x86 startup code set irq_default_affinity =
CPU_MASK_ALL than?

- k

2008-10-24 17:41:25

by Scott Wood

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

Kumar Gala wrote:
> So why not just have x86 startup code set irq_default_affinity =
> CPU_MASK_ALL than?

That doesn't really solve the problem, as a user could still manually
set an invalid affinity. The MPIC driver should reduce the affinity
itself to what the hardware can handle.

-Scott

2008-10-24 17:51:36

by Chris Snook

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

Kumar Gala wrote:
>
> On Oct 24, 2008, at 11:09 AM, Chris Snook wrote:
>
>> Kumar Gala wrote:
>>> On Oct 24, 2008, at 10:17 AM, Chris Snook wrote:
>>>> Kumar Gala wrote:
>>>>> It appears the default IRQ affinity changes from being just cpu 0
>>>>> to all cpu's. This breaks several PPC SMP systems in which only a
>>>>> single processor is allowed to be selected as the destination of
>>>>> the IRQ.
>>>>> What is the right answer in fixing this? Should we:
>>>>> cpumask_t irq_default_affinity = 1;
>>>>> instead of
>>>>> cpumask_t irq_default_affinity = CPU_MASK_ALL?
>>>>
>>>> On those systems, perhaps, but not universally. There's plenty of
>>>> hardware where the physical topology of the machine is abstracted
>>>> away from the OS, and you need to leave the mask wide open and let
>>>> the APIC figure out where to map the IRQs. Ideally, we should
>>>> probably make this decision based on the APIC, but if there's no PPC
>>>> hardware that uses this technique, then it would suffice to make
>>>> this arch-specific.
>>> What did those systems do before this patch? Its one thing to expose
>>> a mask in the ability to change the default mask in
>>> /proc/irq/default_smp_affinity. Its another (and a regression in my
>>> opinion) to change the mask value itself.
>>
>> Before the patch they took an extremely long time to boot if they had
>> storage attached to each node of a multi-chassis system, performed
>> poorly unless special irqbalance hackery or manual assignment was
>> used, and imposed artificial restrictions on the granularity of
>> hardware partitioning to ensure that CPU 0 would always be a CPU that
>> could service all interrupts necessary to boot the OS.
>>
>>> As for making it ARCH specific, that doesn't really help since not
>>> all PPC hw has the limitation I spoke of. Not even all MPIC (in our
>>> cases) have the limitation.
>>
>> What did those systems do before this patch? :)
>>
>> Making it arch-specific is an extremely simple way to solve your
>> problem without making trouble for the people who wanted this patch in
>> the first place. If PPC needs further refinement to handle particular
>> *PICs, you can implement that without touching any arch-generic code.
>
>
> So why not just have x86 startup code set irq_default_affinity =
> CPU_MASK_ALL than?

It's an issue on Itanium as well, and potentially any SMP architecture with a
non-trivial interconnect.

-- Chris

2008-10-24 18:18:54

by Chris Snook

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

Scott Wood wrote:
> Kumar Gala wrote:
>> So why not just have x86 startup code set irq_default_affinity =
>> CPU_MASK_ALL than?
>
> That doesn't really solve the problem, as a user could still manually
> set an invalid affinity. The MPIC driver should reduce the affinity
> itself to what the hardware can handle.

Does the MPIC code actually allow that to happen? I can't quite tell, but I
noticed this:

[csnook@bernoulli sysdev]$ fgrep '#ifdef CONFIG_' mpic.c | sort -u
#ifdef CONFIG_IRQ_ALL_CPUS
#ifdef CONFIG_MPIC_BROKEN_REGREAD
#ifdef CONFIG_MPIC_U3_HT_IRQS
#ifdef CONFIG_MPIC_WEIRD
#ifdef CONFIG_PCI_MSI
#ifdef CONFIG_PM
#ifdef CONFIG_PPC32 /* XXX for now */
#ifdef CONFIG_PPC_DCR
#ifdef CONFIG_SMP

Do any of those config options (or combinations thereof) imply an MPIC that
can't handle an IRQ masked to multiple CPUs? If so, this can be fixed rather
easily at build time, without having to muck around with arch-specific
initialization code.

-- Chris

2008-10-24 18:27:39

by Scott Wood

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

Chris Snook wrote:
> Scott Wood wrote:
>> Kumar Gala wrote:
>>> So why not just have x86 startup code set irq_default_affinity =
>>> CPU_MASK_ALL than?
>>
>> That doesn't really solve the problem, as a user could still manually
>> set an invalid affinity. The MPIC driver should reduce the affinity
>> itself to what the hardware can handle.
>
> Does the MPIC code actually allow that to happen?

As far as I can tell, though I haven't tested it.

> I can't quite tell, but I noticed this:
>
> [csnook@bernoulli sysdev]$ fgrep '#ifdef CONFIG_' mpic.c | sort -u
> #ifdef CONFIG_IRQ_ALL_CPUS
> #ifdef CONFIG_MPIC_BROKEN_REGREAD
> #ifdef CONFIG_MPIC_U3_HT_IRQS
> #ifdef CONFIG_MPIC_WEIRD
> #ifdef CONFIG_PCI_MSI
> #ifdef CONFIG_PM
> #ifdef CONFIG_PPC32 /* XXX for now */
> #ifdef CONFIG_PPC_DCR
> #ifdef CONFIG_SMP
>
> Do any of those config options (or combinations thereof) imply an MPIC
> that can't handle an IRQ masked to multiple CPUs? If so, this can be
> fixed rather easily at build time, without having to muck around with
> arch-specific initialization code.

I don't think so, and in any case it should be detected at runtime from
the device tree.

-Scott

2008-10-24 23:18:59

by David Miller

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)

From: Kumar Gala <[email protected]>
Date: Fri, 24 Oct 2008 10:39:05 -0500

> As for making it ARCH specific, that doesn't really help since not
> all PPC hw has the limitation I spoke of. Not even all MPIC (in our
> cases) have the limitation.

Since the PPC code knows exactly which MPICs have the problem the
PPC code is where the constraining can occur.

I agree completely with the suggestion that the arch code has to
interpret the cpumask as appropriate for the hardware, since the
user can stick "illegal" values there anyways.

2008-11-19 06:43:47

by Max Krasnyansky

[permalink] [raw]
Subject: Re: default IRQ affinity change in v2.6.27 (breaking several SMP PPC based systems)



David Miller wrote:
> From: Kumar Gala <[email protected]>
> Date: Fri, 24 Oct 2008 10:39:05 -0500
>
>> As for making it ARCH specific, that doesn't really help since not
>> all PPC hw has the limitation I spoke of. Not even all MPIC (in our
>> cases) have the limitation.
>
> Since the PPC code knows exactly which MPICs have the problem the
> PPC code is where the constraining can occur.
>
> I agree completely with the suggestion that the arch code has to
> interpret the cpumask as appropriate for the hardware, since the
> user can stick "illegal" values there anyways.

Sorry for delay in replying to this. And sorry for causing regression on some
ppc platforms.
I totally agree with what Dave said above. ALL_CPUS is a sane default,
platform code has to sanity check masks passed via set_affinity() calls
anyway. So I beleive it should be fixed in the platform code.

Max