Every time the vPE is scheduled, the code starts polling the
GICR_VPENDBASER.Dirty bit until it becomes 0. It's OK. But
the delay_us of the poll function is directly set to 10, which
is a long time for most interrupts. In our measurement, it takes
only 1~2 microseconds, or even hundreds of nanoseconds, to finish
parsing the VPT in most cases. However, in the current implementation,
if the GICR_VPENDBASER.Dirty bit is not 0 immediately after the
vPE is scheduled, it will directly wait for 10 microseconds,
resulting in meaningless waiting.
In order to avoid meaningless waiting, we can set the delay_us
to 0, which can exit the poll function immediately when the Dirty
bit becomes 0.
Signed-off-by: Shenming Lu <[email protected]>
---
drivers/irqchip/irq-gic-v3-its.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 548de7538632..5cfcf0c2ce1a 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3803,7 +3803,7 @@ static void its_wait_vpt_parse_complete(void)
WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base + GICR_VPENDBASER,
val,
!(val & GICR_VPENDBASER_Dirty),
- 10, 500));
+ 0, 500));
}
static void its_vpe_schedule(struct its_vpe *vpe)
--
2.19.1
On 2020-09-15 08:22, Shenming Lu wrote:
> Every time the vPE is scheduled, the code starts polling the
> GICR_VPENDBASER.Dirty bit until it becomes 0. It's OK. But
> the delay_us of the poll function is directly set to 10, which
> is a long time for most interrupts. In our measurement, it takes
> only 1~2 microseconds, or even hundreds of nanoseconds, to finish
> parsing the VPT in most cases. However, in the current implementation,
> if the GICR_VPENDBASER.Dirty bit is not 0 immediately after the
> vPE is scheduled, it will directly wait for 10 microseconds,
> resulting in meaningless waiting.
>
> In order to avoid meaningless waiting, we can set the delay_us
> to 0, which can exit the poll function immediately when the Dirty
> bit becomes 0.
We clearly have a difference in interpretation of the word
"meaningless".
With this, you are busy-waiting on the register, adding even more
overhead
at the RD level. How is that better? The whole point is to back off and
let
the RD do its stuff in the background. This is also based on a massive
sample of *one* implementation. How is that representative?
It would be a lot more convincing if you measured the difference it
makes on the total scheduling latency of a vcpu. Assuming it makes
*any* observable difference.
Thanks,
M.
>
> Signed-off-by: Shenming Lu <[email protected]>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c
> b/drivers/irqchip/irq-gic-v3-its.c
> index 548de7538632..5cfcf0c2ce1a 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3803,7 +3803,7 @@ static void its_wait_vpt_parse_complete(void)
> WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base +
> GICR_VPENDBASER,
> val,
> !(val & GICR_VPENDBASER_Dirty),
> - 10, 500));
> + 0, 500));
> }
>
> static void its_vpe_schedule(struct its_vpe *vpe)
--
Jazz is not dead. It just smells funny...