Now cpu cap check is done every exception happens on every arm64 platform,
but this check is necessary on just few of then, so we can drop this
check at compile time on others. This can decrease exception handle time
on most cases.
Fixes: 6d4e11c5e2e8 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154")
Signed-off-by: Puyou Lu <[email protected]>
---
drivers/irqchip/irq-gic-v3.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 262658fd5f9e..3f08c2ef1251 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -237,9 +237,11 @@ static void gic_redist_wait_for_rwp(void)
static u64 __maybe_unused gic_read_iar(void)
{
+#ifdef CONFIG_CAVIUM_ERRATUM_23154
if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_23154))
return gic_read_iar_cavium_thunderx();
else
+#endif
return gic_read_iar_common();
}
#endif
--
2.17.1
On Sat, 27 Aug 2022 06:19:27 +0100,
Puyou Lu <[email protected]> wrote:
>
> Now cpu cap check is done every exception happens on every arm64 platform,
> but this check is necessary on just few of then, so we can drop this
> check at compile time on others. This can decrease exception handle time
> on most cases.
>
> Fixes: 6d4e11c5e2e8 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154")
> Signed-off-by: Puyou Lu <[email protected]>
> ---
> drivers/irqchip/irq-gic-v3.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 262658fd5f9e..3f08c2ef1251 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -237,9 +237,11 @@ static void gic_redist_wait_for_rwp(void)
>
> static u64 __maybe_unused gic_read_iar(void)
> {
> +#ifdef CONFIG_CAVIUM_ERRATUM_23154
> if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_23154))
> return gic_read_iar_cavium_thunderx();
> else
> +#endif
> return gic_read_iar_common();
> }
> #endif
You realise that cpus_have_const_cap() results purely in a couple of
branches once the caps have been finalised, right?
Please provide data showing that it actually "can decrease exception
handle time on most cases", because I'm pretty sure you cannot measure
the difference in any meaningful way.
M.
--
Without deviation from the norm, progress is not possible.
On Sat, Aug 27, 2022 at 04:13:00PM +0100, Marc Zyngier wrote:
> On Sat, 27 Aug 2022 06:19:27 +0100,
> Puyou Lu <[email protected]> wrote:
> >
> > Now cpu cap check is done every exception happens on every arm64 platform,
> > but this check is necessary on just few of then, so we can drop this
> > check at compile time on others. This can decrease exception handle time
> > on most cases.
> >
> > Fixes: 6d4e11c5e2e8 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154")
> > Signed-off-by: Puyou Lu <[email protected]>
> > ---
> > drivers/irqchip/irq-gic-v3.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index 262658fd5f9e..3f08c2ef1251 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -237,9 +237,11 @@ static void gic_redist_wait_for_rwp(void)
> >
> > static u64 __maybe_unused gic_read_iar(void)
> > {
> > +#ifdef CONFIG_CAVIUM_ERRATUM_23154
> > if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_23154))
> > return gic_read_iar_cavium_thunderx();
> > else
> > +#endif
> > return gic_read_iar_common();
> > }
> > #endif
>
> You realise that cpus_have_const_cap() results purely in a couple of
> branches once the caps have been finalised, right?
>
> Please provide data showing that it actually "can decrease exception
> handle time on most cases", because I'm pretty sure you cannot measure
> the difference in any meaningful way.
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
Hi Marc,
Thank you for the reply. Actually I did no test, just from the disassemble
code of vmlinux, I saw about 6 instruction generated by
cpus_have_const_cap, and about 36 by gic_read_iar_cavium_thunderx, which
is useless for most CPUs. I think this will waste some cpu cycles, as
exceptions can occur hunderds or thousands times per second. Also
(6+36)*4=168 bytes of icache is wasted, and icache misses increase
somewhere else.
If I got things wrong, please correct me.
On Sun, 28 Aug 2022 08:56:23 +0100,
Puyou Lu <[email protected]> wrote:
>
> On Sat, Aug 27, 2022 at 04:13:00PM +0100, Marc Zyngier wrote:
> > On Sat, 27 Aug 2022 06:19:27 +0100,
> > Puyou Lu <[email protected]> wrote:
> > >
> > > Now cpu cap check is done every exception happens on every arm64 platform,
> > > but this check is necessary on just few of then, so we can drop this
> > > check at compile time on others. This can decrease exception handle time
> > > on most cases.
> > >
> > > Fixes: 6d4e11c5e2e8 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154")
> > > Signed-off-by: Puyou Lu <[email protected]>
> > > ---
> > > drivers/irqchip/irq-gic-v3.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > > index 262658fd5f9e..3f08c2ef1251 100644
> > > --- a/drivers/irqchip/irq-gic-v3.c
> > > +++ b/drivers/irqchip/irq-gic-v3.c
> > > @@ -237,9 +237,11 @@ static void gic_redist_wait_for_rwp(void)
> > >
> > > static u64 __maybe_unused gic_read_iar(void)
> > > {
> > > +#ifdef CONFIG_CAVIUM_ERRATUM_23154
> > > if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_23154))
> > > return gic_read_iar_cavium_thunderx();
> > > else
> > > +#endif
> > > return gic_read_iar_common();
> > > }
> > > #endif
> >
> > You realise that cpus_have_const_cap() results purely in a couple of
> > branches once the caps have been finalised, right?
> >
> > Please provide data showing that it actually "can decrease exception
> > handle time on most cases", because I'm pretty sure you cannot measure
> > the difference in any meaningful way.
> >
> > M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
>
> Hi Marc,
> Thank you for the reply. Actually I did no test, just from the disassemble
> code of vmlinux, I saw about 6 instruction generated by
> cpus_have_const_cap, and about 36 by gic_read_iar_cavium_thunderx, which
> is useless for most CPUs. I think this will waste some cpu cycles, as
> exceptions can occur hunderds or thousands times per second. Also
> (6+36)*4=168 bytes of icache is wasted, and icache misses increase
> somewhere else.
> If I got things wrong, please correct me.
Well, what you got wrong is that these instructions are stepped over
two branches when the caps are finalised, and that doesn't appear in
the disassembly (you need to look at the code that is actually
executed).
Now, any optimisation of the sort must be backed by some performance
numbers. If you can show that this has a meaningful impact on a given
workload, I'm happy to look into it. But only if you can show that
data.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Sun, Aug 28, 2022 at 06:11:15PM +0100, Marc Zyngier wrote:
> Well, what you got wrong is that these instructions are stepped over
> two branches when the caps are finalised, and that doesn't appear in
> the disassembly (you need to look at the code that is actually
> executed).
>
> Now, any optimisation of the sort must be backed by some performance
> numbers. If you can show that this has a meaningful impact on a given
> workload, I'm happy to look into it. But only if you can show that
> data.
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
Thank you very much for correcting me, and sorry for wasting your time, I
got it now.