by Ganapatrao Kulkarni


Subject: Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.

Hi Marc,

On 05-03-2024 08:33 pm, Marc Zyngier wrote:
> On Tue, 05 Mar 2024 13:29:08 +0000,
> Ganapatrao Kulkarni <[email protected]> wrote:
>>
>>
>>
>> What are the core issues (please forgive me if you mentioned already)?
>> certainly we will prioritise them than this.
>
> AT is a big one. Maintenance interrupts are more or less broken. I'm
> slowly plugging PAuth, but there's no testing whatsoever (running
> Linux doesn't count). Lack of SVE support is also definitely a
> blocker.
>

I am debugging an issue where the EDK2 (ArmVirtPkg) boot hangs when I try
to boot it from L1 using QEMU.

The hang is caused by a failing AT instruction, which results in an
immediate return to the guest (L2), and the loop continues...

The AT instruction is executed in __get_fault_info()
(__translate_far_to_hpfar()) in L1 when the data abort is forwarded. The
AT instruction is then trapped and emulated in L0 in "__kvm_at_s1e01",
where it fails, resulting in the return to the guest.

Is this also a manifestation of the AT issue that you are referring to?

Thanks,
Ganapat

2024-03-27 12:41:32

by Marc Zyngier


Subject: Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.

On Tue, 26 Mar 2024 11:33:27 +0000,
Ganapatrao Kulkarni <[email protected]> wrote:
>
>
> Hi Marc,
>
> On 05-03-2024 08:33 pm, Marc Zyngier wrote:
> > On Tue, 05 Mar 2024 13:29:08 +0000,
> > Ganapatrao Kulkarni <[email protected]> wrote:
> >>
> >>
> >>
> >> What are the core issues (please forgive me if you mentioned already)?
> >> certainly we will prioritise them than this.
> >
> > AT is a big one. Maintenance interrupts are more or less broken. I'm
> > slowly plugging PAuth, but there's no testing whatsoever (running
> > Linux doesn't count). Lack of SVE support is also definitely a
> > blocker.
> >
>
> I am debugging an issue where EDK2(ArmVirtPkg) boot hangs when tried
> to boot from L1 using QEMU.
>
> The hang is due to failure of AT instruction and resulting in
> immediate return to Guest(L2) and the loop continues...
>
> AT instruction is executed in function of
> __get_fault_info(__translate_far_to_hpfar) in L1 when data abort is
> forwarded. Then AT instruction is trapped and executed/emulated in L0
> in function "__kvm_at_s1e01" is failing and resulting in the return to
> guest.
>
> Is this also the manifestation of the issue of AT that you are referring to?

It's possible, but you are looking at the symptom and not necessarily
the problem.

FWIW, I can boot EDK2 as built by Debian as an L2 using QEMU without
any problem, but I'm not using QEMU for L1 (it shouldn't have much of
an impact anyway).

I expect AT S1E1R to fail if any of the following are true:

- the guest S1 page tables have been swapped out in the interval
between the fault and the AT emulation

- the shadow S2 page tables do not have a translation for the output
of the S1 page tables yet

You will need to work out which of these two is true. It is perfectly
possible that there are more edge cases that need addressing, as what
you describe just works with my setup. It could also be that 4kB page
support at L1 is broken (as I have no way to test it and only run with
a 16kB L1).
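
One way to start narrowing this down is to dump PAR_EL1 right after the
failing AT and decode it. What follows is only a minimal sketch of such a
decoder, written as standalone C rather than kernel code. The struct and
function names (par_fault, decode_par, print_par) are hypothetical and do
not exist in the kernel; the bit positions are the architectural PAR_EL1
layout for a failed translation (F == 1). The S bit reports whether the
fault was taken in stage 2, and PTW whether it was a stage 2 fault on a
stage 1 table walk, which may help tell the cases above apart.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the PAR_EL1 fields that matter when F == 1. */
struct par_fault {
	int failed;	/* PAR_EL1.F,   bit 0: 1 if the translation aborted */
	unsigned fst;	/* PAR_EL1.FST, bits [6:1]: fault status code */
	int ptw;	/* PAR_EL1.PTW, bit 8: stage 2 fault on a stage 1 walk */
	int stage2;	/* PAR_EL1.S,   bit 9: fault was taken in stage 2 */
};

/* Decode a raw PAR_EL1 value; fault fields stay zero if F == 0. */
static struct par_fault decode_par(uint64_t par)
{
	struct par_fault f = { 0, 0, 0, 0 };

	f.failed = (int)(par & 1);
	if (!f.failed)
		return f;

	f.fst = (unsigned)((par >> 1) & 0x3f);
	f.ptw = (int)((par >> 8) & 1);
	f.stage2 = (int)((par >> 9) & 1);
	return f;
}

/* Print a one-line summary suitable for a debug log. */
static void print_par(uint64_t par)
{
	struct par_fault f = decode_par(par);

	if (!f.failed) {
		printf("PAR=0x%llx: translation succeeded\n",
		       (unsigned long long)par);
		return;
	}
	printf("PAR=0x%llx: FST=0x%02x PTW=%d S=%d\n",
	       (unsigned long long)par, f.fst, f.ptw, f.stage2);
}
```

For example, print_par(0x30b) reports FST=0x05 (translation fault, level
1) with PTW=1 and S=1, i.e. a stage 2 fault hit while walking the stage 1
tables.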

M.

--
Without deviation from the norm, progress is not possible.