This erratum is observed on the ThunderX2 GICv3 ITS. When a
MOVI command is used to change affinity of a LPI to a collection/cpu
on another node, the LPI is not delivered to the cpu.
An additional INV command is required after the MOVI to deliver
the LPI to the new destination.
If we add INV after MOVI, there is a chance that we lose LPIs which
are raised when the affinity is changed. So for now, adding workaround fix
to disable inter node affinity change.
Signed-off-by: Ganapatrao Kulkarni <[email protected]>
---
v2: Added workaround to avoid inter node affinity change.
v1: Initial patch
Documentation/arm64/silicon-errata.txt | 1 +
arch/arm64/Kconfig | 10 ++++++++++
drivers/irqchip/irq-gic-v3-its.c | 21 ++++++++++++++++++++-
3 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index fc1c884..fb27cb5 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -63,6 +63,7 @@ stable kernels.
| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
| Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
| Cavium | ThunderX SMMUv2 | #27704 | N/A |
+| Cavium | ThunderX2 ITS | #174 | CAVIUM_ERRATUM_174 |
| Cavium | ThunderX2 SMMUv3| #74 | N/A |
| Cavium | ThunderX2 SMMUv3| #126 | N/A |
| | | | |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e..0dbf3bd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
If unsure, say Y.
+config CAVIUM_ERRATUM_174
+ bool "Cavium ThunderX2 erratum 174"
+ default y
+ help
+ Cavium ThunderX2 dual socket systems may loose interrupts
+ on affinity change to a cpu on other node.
+ This workaround fix avoids inter node affinity change.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 06f025f..b0cb528 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -46,6 +46,7 @@
#define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1ULL << 0)
#define ITS_FLAGS_WORKAROUND_CAVIUM_22375 (1ULL << 1)
#define ITS_FLAGS_WORKAROUND_CAVIUM_23144 (1ULL << 2)
+#define ITS_FLAGS_WORKAROUND_CAVIUM_174 (1ULL << 3)
#define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
@@ -1102,7 +1103,8 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
return -EINVAL;
/* lpi cannot be routed to a redistributor that is on a foreign node */
- if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) {
+ if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144 ||
+ its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_174) {
if (its_dev->its->numa_node >= 0) {
cpu_mask = cpumask_of_node(its_dev->its->numa_node);
if (!cpumask_intersects(mask_val, cpu_mask))
@@ -2904,6 +2906,15 @@ static int its_force_quiescent(void __iomem *base)
}
}
+static bool __maybe_unused its_enable_quirk_cavium_174(void *data)
+{
+ struct its_node *its = data;
+
+ its->flags |= ITS_FLAGS_WORKAROUND_CAVIUM_174;
+
+ return true;
+}
+
static bool __maybe_unused its_enable_quirk_cavium_22375(void *data)
{
struct its_node *its = data;
@@ -3031,6 +3042,14 @@ static const struct gic_quirk its_quirks[] = {
.init = its_enable_quirk_hip07_161600802,
},
#endif
+#ifdef CONFIG_CAVIUM_ERRATUM_174
+ {
+ .desc = "ITS: Cavium ThunderX2 erratum 174",
+ .iidr = 0x13f, /* ThunderX2 pass A1/A2/B0 */
+ .mask = 0xffffffff,
+ .init = its_enable_quirk_cavium_174,
+ },
+#endif
{
}
};
--
2.9.4
On 18/01/18 05:28, Ganapatrao Kulkarni wrote:
> This erratum is observed on the ThunderX2 GICv3 ITS. When a
> MOVI command is used to change affinity of a LPI to a collection/cpu
> on another node, the LPI is not delivered to the cpu.
> An additional INV command is required after the MOVI to deliver
> the LPI to the new destination.
>
> If we add INV after MOVI, there is a chance that we lose LPIs which
> are raised when the affinity is changed. So for now, adding workaround fix
> to disable inter node affinity change.
>
> Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> ---
>
> v2: Added workaround to avoid inter node affinity change.
>
> v1: Initial patch
>
> Documentation/arm64/silicon-errata.txt | 1 +
> arch/arm64/Kconfig | 10 ++++++++++
> drivers/irqchip/irq-gic-v3-its.c | 21 ++++++++++++++++++++-
> 3 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index fc1c884..fb27cb5 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -63,6 +63,7 @@ stable kernels.
> | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
> | Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
> | Cavium | ThunderX SMMUv2 | #27704 | N/A |
> +| Cavium | ThunderX2 ITS | #174 | CAVIUM_ERRATUM_174 |
> | Cavium | ThunderX2 SMMUv3| #74 | N/A |
> | Cavium | ThunderX2 SMMUv3| #126 | N/A |
> | | | | |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c9a7e9e..0dbf3bd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
>
> If unsure, say Y.
>
> +config CAVIUM_ERRATUM_174
> + bool "Cavium ThunderX2 erratum 174"
> + default y
> + help
> + Cavium ThunderX2 dual socket systems may loose interrupts
> + on affinity change to a cpu on other node.
> + This workaround fix avoids inter node affinity change.
> +
> + If unsure, say Y.
> +
> config CAVIUM_ERRATUM_22375
> bool "Cavium erratum 22375, 24313"
> default y
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 06f025f..b0cb528 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -46,6 +46,7 @@
> #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1ULL << 0)
> #define ITS_FLAGS_WORKAROUND_CAVIUM_22375 (1ULL << 1)
> #define ITS_FLAGS_WORKAROUND_CAVIUM_23144 (1ULL << 2)
> +#define ITS_FLAGS_WORKAROUND_CAVIUM_174 (1ULL << 3)
Instead of inventing a new flag, please rename the existing one to
ITS_FLAG_WORKAROUND_RESTRICT_NODE_AFFINITY (or something similar). There
is really no need to have two flags that do the exact same thing,
>
> #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
>
> @@ -1102,7 +1103,8 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
> return -EINVAL;
>
> /* lpi cannot be routed to a redistributor that is on a foreign node */
> - if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) {
> + if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144 ||
> + its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_174) {
> if (its_dev->its->numa_node >= 0) {
> cpu_mask = cpumask_of_node(its_dev->its->numa_node);
> if (!cpumask_intersects(mask_val, cpu_mask))
> @@ -2904,6 +2906,15 @@ static int its_force_quiescent(void __iomem *base)
> }
> }
>
> +static bool __maybe_unused its_enable_quirk_cavium_174(void *data)
> +{
> + struct its_node *its = data;
> +
> + its->flags |= ITS_FLAGS_WORKAROUND_CAVIUM_174;
> +
> + return true;
> +}
> +
> static bool __maybe_unused its_enable_quirk_cavium_22375(void *data)
> {
> struct its_node *its = data;
> @@ -3031,6 +3042,14 @@ static const struct gic_quirk its_quirks[] = {
> .init = its_enable_quirk_hip07_161600802,
> },
> #endif
> +#ifdef CONFIG_CAVIUM_ERRATUM_174
> + {
> + .desc = "ITS: Cavium ThunderX2 erratum 174",
> + .iidr = 0x13f, /* ThunderX2 pass A1/A2/B0 */
> + .mask = 0xffffffff,
> + .init = its_enable_quirk_cavium_174,
> + },
> +#endif
> {
> }
> };
>
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On Fri, Jan 19, 2018 at 5:53 PM, Marc Zyngier <[email protected]> wrote:
> On 18/01/18 05:28, Ganapatrao Kulkarni wrote:
>> This erratum is observed on the ThunderX2 GICv3 ITS. When a
>> MOVI command is used to change affinity of a LPI to a collection/cpu
>> on another node, the LPI is not delivered to the cpu.
>> An additional INV command is required after the MOVI to deliver
>> the LPI to the new destination.
>>
>> If we add INV after MOVI, there is a chance that we lose LPIs which
>> are raised when the affinity is changed. So for now, adding workaround fix
>> to disable inter node affinity change.
>>
>> Signed-off-by: Ganapatrao Kulkarni <[email protected]>
>> ---
>>
>> v2: Added workaround to avoid inter node affinity change.
>>
>> v1: Initial patch
>>
>> Documentation/arm64/silicon-errata.txt | 1 +
>> arch/arm64/Kconfig | 10 ++++++++++
>> drivers/irqchip/irq-gic-v3-its.c | 21 ++++++++++++++++++++-
>> 3 files changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>> index fc1c884..fb27cb5 100644
>> --- a/Documentation/arm64/silicon-errata.txt
>> +++ b/Documentation/arm64/silicon-errata.txt
>> @@ -63,6 +63,7 @@ stable kernels.
>> | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
>> | Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
>> | Cavium | ThunderX SMMUv2 | #27704 | N/A |
>> +| Cavium | ThunderX2 ITS | #174 | CAVIUM_ERRATUM_174 |
>> | Cavium | ThunderX2 SMMUv3| #74 | N/A |
>> | Cavium | ThunderX2 SMMUv3| #126 | N/A |
>> | | | | |
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index c9a7e9e..0dbf3bd 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
>>
>> If unsure, say Y.
>>
>> +config CAVIUM_ERRATUM_174
>> + bool "Cavium ThunderX2 erratum 174"
>> + default y
>> + help
>> + Cavium ThunderX2 dual socket systems may loose interrupts
>> + on affinity change to a cpu on other node.
>> + This workaround fix avoids inter node affinity change.
>> +
>> + If unsure, say Y.
>> +
>> config CAVIUM_ERRATUM_22375
>> bool "Cavium erratum 22375, 24313"
>> default y
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 06f025f..b0cb528 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -46,6 +46,7 @@
>> #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1ULL << 0)
>> #define ITS_FLAGS_WORKAROUND_CAVIUM_22375 (1ULL << 1)
>> #define ITS_FLAGS_WORKAROUND_CAVIUM_23144 (1ULL << 2)
>> +#define ITS_FLAGS_WORKAROUND_CAVIUM_174 (1ULL << 3)
>
> Instead of inventing a new flag, please rename the existing one to
> ITS_FLAG_WORKAROUND_RESTRICT_NODE_AFFINITY (or something similar). There
> is really no need to have two flags that do the exact same thing,
#23144 is used to restrict ITS to collection mapping too,
where as 174 is only restricts cross node affinity.
Having said that, Since we are restricting affinity in #174,
i see there is no use of having ITS to other node collection mapping.
There should not be any issue if we club flag. I will post this change
in next version.
>
>>
>> #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
>>
>> @@ -1102,7 +1103,8 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>> return -EINVAL;
>>
>> /* lpi cannot be routed to a redistributor that is on a foreign node */
>> - if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) {
>> + if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144 ||
>> + its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_174) {
>> if (its_dev->its->numa_node >= 0) {
>> cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>> if (!cpumask_intersects(mask_val, cpu_mask))
>> @@ -2904,6 +2906,15 @@ static int its_force_quiescent(void __iomem *base)
>> }
>> }
>>
>> +static bool __maybe_unused its_enable_quirk_cavium_174(void *data)
>> +{
>> + struct its_node *its = data;
>> +
>> + its->flags |= ITS_FLAGS_WORKAROUND_CAVIUM_174;
>> +
>> + return true;
>> +}
>> +
>> static bool __maybe_unused its_enable_quirk_cavium_22375(void *data)
>> {
>> struct its_node *its = data;
>> @@ -3031,6 +3042,14 @@ static const struct gic_quirk its_quirks[] = {
>> .init = its_enable_quirk_hip07_161600802,
>> },
>> #endif
>> +#ifdef CONFIG_CAVIUM_ERRATUM_174
>> + {
>> + .desc = "ITS: Cavium ThunderX2 erratum 174",
>> + .iidr = 0x13f, /* ThunderX2 pass A1/A2/B0 */
>> + .mask = 0xffffffff,
>> + .init = its_enable_quirk_cavium_174,
>> + },
>> +#endif
>> {
>> }
>> };
>>
>
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...
thanks
Ganapat
On Thu, Jan 18, 2018 at 10:58:20AM +0530, Ganapatrao Kulkarni wrote:
> This erratum is observed on the ThunderX2 GICv3 ITS. When a
> MOVI command is used to change affinity of a LPI to a collection/cpu
> on another node, the LPI is not delivered to the cpu.
> An additional INV command is required after the MOVI to deliver
> the LPI to the new destination.
>
> If we add INV after MOVI, there is a chance that we lose LPIs which
> are raised when the affinity is changed. So for now, adding workaround fix
> to disable inter node affinity change.
>
> Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> ---
>
> v2: Added workaround to avoid inter node affinity change.
>
> v1: Initial patch
>
> Documentation/arm64/silicon-errata.txt | 1 +
> arch/arm64/Kconfig | 10 ++++++++++
> drivers/irqchip/irq-gic-v3-its.c | 21 ++++++++++++++++++++-
> 3 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index fc1c884..fb27cb5 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -63,6 +63,7 @@ stable kernels.
> | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
> | Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
> | Cavium | ThunderX SMMUv2 | #27704 | N/A |
> +| Cavium | ThunderX2 ITS | #174 | CAVIUM_ERRATUM_174 |
> | Cavium | ThunderX2 SMMUv3| #74 | N/A |
> | Cavium | ThunderX2 SMMUv3| #126 | N/A |
> | | | | |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c9a7e9e..0dbf3bd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
>
> If unsure, say Y.
>
> +config CAVIUM_ERRATUM_174
> + bool "Cavium ThunderX2 erratum 174"
> + default y
> + help
> + Cavium ThunderX2 dual socket systems may loose interrupts
> + on affinity change to a cpu on other node.
> + This workaround fix avoids inter node affinity change.
This has to be fixed up to match the commit message (and for spelling).
I have seen some questions offlist about how important this fix is,
and how it can affect users - so that would be useful to have in the
description as well.
To clarify, this errata comes into play only when the irq affinity is
forced from the node given by the device (and ITS) affinity to another
node. This should not happen in normal, useful configurations.
Also, we will hold further posting of this errata until we do another
round of investigation with the hardware team for a better solution.
If we can handle the pending interrupts for the small window of MOVI/INV
in first workaround, we will not need this restriction at all.
JC.
On Sun, 21 Jan 2018 07:00:48 +0000,
Jayachandran C wrote:
>
> On Thu, Jan 18, 2018 at 10:58:20AM +0530, Ganapatrao Kulkarni wrote:
> > This erratum is observed on the ThunderX2 GICv3 ITS. When a
> > MOVI command is used to change affinity of a LPI to a collection/cpu
> > on another node, the LPI is not delivered to the cpu.
> > An additional INV command is required after the MOVI to deliver
> > the LPI to the new destination.
> >
> > If we add INV after MOVI, there is a chance that we lose LPIs which
> > are raised when the affinity is changed. So for now, adding workaround fix
> > to disable inter node affinity change.
> >
> > Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> > ---
> >
> > v2: Added workaround to avoid inter node affinity change.
> >
> > v1: Initial patch
> >
> > Documentation/arm64/silicon-errata.txt | 1 +
> > arch/arm64/Kconfig | 10 ++++++++++
> > drivers/irqchip/irq-gic-v3-its.c | 21 ++++++++++++++++++++-
> > 3 files changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> > index fc1c884..fb27cb5 100644
> > --- a/Documentation/arm64/silicon-errata.txt
> > +++ b/Documentation/arm64/silicon-errata.txt
> > @@ -63,6 +63,7 @@ stable kernels.
> > | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
> > | Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
> > | Cavium | ThunderX SMMUv2 | #27704 | N/A |
> > +| Cavium | ThunderX2 ITS | #174 | CAVIUM_ERRATUM_174 |
> > | Cavium | ThunderX2 SMMUv3| #74 | N/A |
> > | Cavium | ThunderX2 SMMUv3| #126 | N/A |
> > | | | | |
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index c9a7e9e..0dbf3bd 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
> >
> > If unsure, say Y.
> >
> > +config CAVIUM_ERRATUM_174
> > + bool "Cavium ThunderX2 erratum 174"
> > + default y
> > + help
> > + Cavium ThunderX2 dual socket systems may loose interrupts
> > + on affinity change to a cpu on other node.
> > + This workaround fix avoids inter node affinity change.
>
> This has to be fixed up to match the commit message (and for spelling).
> I have seen some questions offlist about how important this fix is,
> and how it can affect users - so that would be useful to have in the
> description as well.
>
> To clarify, this errata comes into play only when the irq affinity is
> forced from the node given by the device (and ITS) affinity to another
> node. This should not happen in normal, useful configurations.
Define normal. That's all under control of userspace, and the kernel
doesn't really have a say. irqbalance will happily move interrupts
around. Disable all CPUs from node at runtime (again, from userspace),
and you'll get the exact same thing. I can't see what's so "abnormal"
about any of that.
> Also, we will hold further posting of this errata until we do another
> round of investigation with the hardware team for a better solution.
> If we can handle the pending interrupts for the small window of MOVI/INV
> in first workaround, we will not need this restriction at all.
What do you mean by "If we can handle the pending interrupts for the
small window of MOVI/INV"? Taking the interrupt on the source CPU?
Sure, that would be fine. But that's assuming that the souce CPU is in
a position to actually handle this, and is not simply going down.
If there is only a slight possibility that you may loose an interrupt
in the MOVI/INV window (which is not that small, since that's a 4
command sequence), your only other solution is to inject a spurious
interrupt to replace the one you may have lost in that window.
In the meantime, and until I see a patch fixing this (or a decent
explanation of why this isn't a problem), I'll consider it broken.
Thanks,
M.
--
Jazz is not dead, it just smell funny.
On Sun, Jan 21, 2018 at 11:35:34AM +0000, Marc Zyngier wrote:
> On Sun, 21 Jan 2018 07:00:48 +0000,
> Jayachandran C wrote:
> >
> > On Thu, Jan 18, 2018 at 10:58:20AM +0530, Ganapatrao Kulkarni wrote:
> > > This erratum is observed on the ThunderX2 GICv3 ITS. When a
> > > MOVI command is used to change affinity of a LPI to a collection/cpu
> > > on another node, the LPI is not delivered to the cpu.
> > > An additional INV command is required after the MOVI to deliver
> > > the LPI to the new destination.
> > >
> > > If we add INV after MOVI, there is a chance that we lose LPIs which
> > > are raised when the affinity is changed. So for now, adding workaround fix
> > > to disable inter node affinity change.
> > >
> > > Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> > > ---
> > >
> > > v2: Added workaround to avoid inter node affinity change.
> > >
> > > v1: Initial patch
> > >
> > > Documentation/arm64/silicon-errata.txt | 1 +
> > > arch/arm64/Kconfig | 10 ++++++++++
> > > drivers/irqchip/irq-gic-v3-its.c | 21 ++++++++++++++++++++-
> > > 3 files changed, 31 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> > > index fc1c884..fb27cb5 100644
> > > --- a/Documentation/arm64/silicon-errata.txt
> > > +++ b/Documentation/arm64/silicon-errata.txt
> > > @@ -63,6 +63,7 @@ stable kernels.
> > > | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
> > > | Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
> > > | Cavium | ThunderX SMMUv2 | #27704 | N/A |
> > > +| Cavium | ThunderX2 ITS | #174 | CAVIUM_ERRATUM_174 |
> > > | Cavium | ThunderX2 SMMUv3| #74 | N/A |
> > > | Cavium | ThunderX2 SMMUv3| #126 | N/A |
> > > | | | | |
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index c9a7e9e..0dbf3bd 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
> > >
> > > If unsure, say Y.
> > >
> > > +config CAVIUM_ERRATUM_174
> > > + bool "Cavium ThunderX2 erratum 174"
> > > + default y
> > > + help
> > > + Cavium ThunderX2 dual socket systems may loose interrupts
> > > + on affinity change to a cpu on other node.
> > > + This workaround fix avoids inter node affinity change.
> >
> > This has to be fixed up to match the commit message (and for spelling).
> > I have seen some questions offlist about how important this fix is,
> > and how it can affect users - so that would be useful to have in the
> > description as well.
> >
> > To clarify, this errata comes into play only when the irq affinity is
> > forced from the node given by the device (and ITS) affinity to another
> > node. This should not happen in normal, useful configurations.
>
> Define normal. That's all under control of userspace, and the kernel
> doesn't really have a say. irqbalance will happily move interrupts
> around. Disable all CPUs from node at runtime (again, from userspace),
> and you'll get the exact same thing. I can't see what's so "abnormal"
> about any of that.
>
> > Also, we will hold further posting of this errata until we do another
> > round of investigation with the hardware team for a better solution.
> > If we can handle the pending interrupts for the small window of MOVI/INV
> > in first workaround, we will not need this restriction at all.
>
> What do you mean by "If we can handle the pending interrupts for the
> small window of MOVI/INV"? Taking the interrupt on the source CPU?
> Sure, that would be fine. But that's assuming that the souce CPU is in
> a position to actually handle this, and is not simply going down.
>
> If there is only a slight possibility that you may loose an interrupt
> in the MOVI/INV window (which is not that small, since that's a 4
> command sequence), your only other solution is to inject a spurious
> interrupt to replace the one you may have lost in that window.
>
> In the meantime, and until I see a patch fixing this (or a decent
> explanation of why this isn't a problem), I'll consider it broken.
After reviewing the issue with our hardware team, we decided to
tweak the redistributor cache configuration from firmware rather
than go with this errata workaournd in Linux (and other OSes).
So, with the new firmware MOVI will work across nodes as expected,
and this patch is no longer neeeded.
Thanks,
JC.
On Mon, 19 Feb 2018 21:12:10 +0000,
Jayachandran C wrote:
>
> On Sun, Jan 21, 2018 at 11:35:34AM +0000, Marc Zyngier wrote:
> > On Sun, 21 Jan 2018 07:00:48 +0000,
> > Jayachandran C wrote:
> > >
> > > On Thu, Jan 18, 2018 at 10:58:20AM +0530, Ganapatrao Kulkarni wrote:
> > > > This erratum is observed on the ThunderX2 GICv3 ITS. When a
> > > > MOVI command is used to change affinity of a LPI to a collection/cpu
> > > > on another node, the LPI is not delivered to the cpu.
> > > > An additional INV command is required after the MOVI to deliver
> > > > the LPI to the new destination.
> > > >
> > > > If we add INV after MOVI, there is a chance that we lose LPIs which
> > > > are raised when the affinity is changed. So for now, adding workaround fix
> > > > to disable inter node affinity change.
> > > >
> > > > Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> > > > ---
> > > >
> > > > v2: Added workaround to avoid inter node affinity change.
> > > >
> > > > v1: Initial patch
> > > >
> > > > Documentation/arm64/silicon-errata.txt | 1 +
> > > > arch/arm64/Kconfig | 10 ++++++++++
> > > > drivers/irqchip/irq-gic-v3-its.c | 21 ++++++++++++++++++++-
> > > > 3 files changed, 31 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> > > > index fc1c884..fb27cb5 100644
> > > > --- a/Documentation/arm64/silicon-errata.txt
> > > > +++ b/Documentation/arm64/silicon-errata.txt
> > > > @@ -63,6 +63,7 @@ stable kernels.
> > > > | Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 |
> > > > | Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 |
> > > > | Cavium | ThunderX SMMUv2 | #27704 | N/A |
> > > > +| Cavium | ThunderX2 ITS | #174 | CAVIUM_ERRATUM_174 |
> > > > | Cavium | ThunderX2 SMMUv3| #74 | N/A |
> > > > | Cavium | ThunderX2 SMMUv3| #126 | N/A |
> > > > | | | | |
> > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > index c9a7e9e..0dbf3bd 100644
> > > > --- a/arch/arm64/Kconfig
> > > > +++ b/arch/arm64/Kconfig
> > > > @@ -461,6 +461,16 @@ config ARM64_ERRATUM_843419
> > > >
> > > > If unsure, say Y.
> > > >
> > > > +config CAVIUM_ERRATUM_174
> > > > + bool "Cavium ThunderX2 erratum 174"
> > > > + default y
> > > > + help
> > > > + Cavium ThunderX2 dual socket systems may loose interrupts
> > > > + on affinity change to a cpu on other node.
> > > > + This workaround fix avoids inter node affinity change.
> > >
> > > This has to be fixed up to match the commit message (and for spelling).
> > > I have seen some questions offlist about how important this fix is,
> > > and how it can affect users - so that would be useful to have in the
> > > description as well.
> > >
> > > To clarify, this errata comes into play only when the irq affinity is
> > > forced from the node given by the device (and ITS) affinity to another
> > > node. This should not happen in normal, useful configurations.
> >
> > Define normal. That's all under control of userspace, and the kernel
> > doesn't really have a say. irqbalance will happily move interrupts
> > around. Disable all CPUs from node at runtime (again, from userspace),
> > and you'll get the exact same thing. I can't see what's so "abnormal"
> > about any of that.
> >
> > > Also, we will hold further posting of this errata until we do another
> > > round of investigation with the hardware team for a better solution.
> > > If we can handle the pending interrupts for the small window of MOVI/INV
> > > in first workaround, we will not need this restriction at all.
> >
> > What do you mean by "If we can handle the pending interrupts for the
> > small window of MOVI/INV"? Taking the interrupt on the source CPU?
> > Sure, that would be fine. But that's assuming that the souce CPU is in
> > a position to actually handle this, and is not simply going down.
> >
> > If there is only a slight possibility that you may loose an interrupt
> > in the MOVI/INV window (which is not that small, since that's a 4
> > command sequence), your only other solution is to inject a spurious
> > interrupt to replace the one you may have lost in that window.
> >
> > In the meantime, and until I see a patch fixing this (or a decent
> > explanation of why this isn't a problem), I'll consider it broken.
>
> After reviewing the issue with our hardware team, we decided to
> tweak the redistributor cache configuration from firmware rather
> than go with this errata workaournd in Linux (and other OSes).
>
> So, with the new firmware MOVI will work across nodes as expected,
> and this patch is no longer neeeded.
That's good news! One question though: how do we detect the old (or
new) firmware? We'd still need mitigation for the "old firmware"
case, unless you have a way to guarantee that all existing boxes are
magically updated to a fixed version?
Thanks,
M.
--
Jazz is not dead, it just smell funny.