2019-09-04 18:22:17

by Heinrich Schuchardt

[permalink] [raw]
Subject: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

If an application tries to access memory that is not mapped, an error
ENOSYS, "load/store instruction decoding not implemented" may occur.
QEMU will hang with a register dump.

Instead create a data abort that can be handled gracefully by the
application running in the virtual environment.

Now the virtual machine can react to the event in the most appropriate
way - by recovering, by writing an informative log, or by rebooting.

Signed-off-by: Heinrich Schuchardt <[email protected]>
---
virt/kvm/arm/mmio.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
index a8a6a0c883f1..0cbed7d6a0f4 100644
--- a/virt/kvm/arm/mmio.c
+++ b/virt/kvm/arm/mmio.c
@@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
if (ret)
return ret;
} else {
- kvm_err("load/store instruction decoding not implemented\n");
- return -ENOSYS;
+ kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+ return 1;
}

rt = vcpu->arch.mmio_decode.rt;
--
2.23.0.rc1


2019-09-05 08:41:57

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

[Please use my kernel.org address. My arm.com address will disappear shortly]

On Wed, 04 Sep 2019 19:07:36 +0100,
Heinrich Schuchardt <[email protected]> wrote:
>
> If an application tries to access memory that is not mapped, an error
> ENOSYS, "load/store instruction decoding not implemented" may occur.
> QEMU will hang with a register dump.
>
> Instead create a data abort that can be handled gracefully by the
> application running in the virtual environment.
>
> Now the virtual machine can react to the event in the most appropriate
> way - by recovering, by writing an informative log, or by rebooting.
>
> Signed-off-by: Heinrich Schuchardt <[email protected]>
> ---
> virt/kvm/arm/mmio.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> index a8a6a0c883f1..0cbed7d6a0f4 100644
> --- a/virt/kvm/arm/mmio.c
> +++ b/virt/kvm/arm/mmio.c
> @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> if (ret)
> return ret;
> } else {
> - kvm_err("load/store instruction decoding not implemented\n");
> - return -ENOSYS;
> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> + return 1;

How can you tell that the access would fault? You have no idea at that
stage (the kernel doesn't know about the MMIO ranges that userspace
handles). All you know is that you're faced with a memory access that
you cannot emulate in the kernel. Injecting a data abort at that stage
is not something that the architecture allows.

If you want to address this, consider forwarding the access to
userspace. You'll only need an instruction decoder (supporting T1, T2,
A32 and A64) and a S1 page table walker (one per page table format,
all three of them) to emulate the access (having taken care of
stopping all the other vcpus to make sure there is no concurrent
modification of the page tables). You'll then be able to return the
result of the access back to the kernel.

Of course, the best thing would be to actually fix the guest so that
it doesn't use non-emulatable MMIO accesses. In general, that the sign
of a bug in low-level accessors.

M.

--
Jazz is not dead, it just smells funny.

2019-09-05 08:47:31

by Peter Maydell

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <[email protected]> wrote:
> How can you tell that the access would fault? You have no idea at that
> stage (the kernel doesn't know about the MMIO ranges that userspace
> handles). All you know is that you're faced with a memory access that
> you cannot emulate in the kernel. Injecting a data abort at that stage
> is not something that the architecture allows.

To be fair, locking up the whole CPU (which is effectively
what the kvm_err/ENOSYS is going to do to the VM) isn't
something the architecture allows either :-)

> Of course, the best thing would be to actually fix the guest so that
> it doesn't use non-emulatable MMIO accesses. In general, that the sign
> of a bug in low-level accessors.

This is true, but the problem is that barfing out to userspace
makes it harder to debug the guest because it means that
the VM is immediately destroyed, whereas AIUI if we
inject some kind of exception then (assuming you're set up
to do kernel-debug via gdbstub) you can actually examine
the offending guest code with a debugger because at least
your VM is still around to inspect...

thanks
-- PMM

2019-09-05 09:57:20

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, Sep 05, 2019 at 09:16:54AM +0100, Peter Maydell wrote:
> On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <[email protected]> wrote:
> > How can you tell that the access would fault? You have no idea at that
> > stage (the kernel doesn't know about the MMIO ranges that userspace
> > handles). All you know is that you're faced with a memory access that
> > you cannot emulate in the kernel. Injecting a data abort at that stage
> > is not something that the architecture allows.
>
> To be fair, locking up the whole CPU (which is effectively
> what the kvm_err/ENOSYS is going to do to the VM) isn't
> something the architecture allows either :-)
>
> > Of course, the best thing would be to actually fix the guest so that
> > it doesn't use non-emulatable MMIO accesses. In general, that the sign
> > of a bug in low-level accessors.
>
> This is true, but the problem is that barfing out to userspace
> makes it harder to debug the guest because it means that
> the VM is immediately destroyed, whereas AIUI if we
> inject some kind of exception then (assuming you're set up
> to do kernel-debug via gdbstub) you can actually examine
> the offending guest code with a debugger because at least
> your VM is still around to inspect...
>

Is it really going to be easier to debug a guest that sees behavior
which may not be architecturally correct? For example, seeing a data
abort on an access to an MMIO region because the guest used a strange
instruction?

I appreaciate that the current way we handle this is confusing and has
led many people down a rabbit hole, so we should do better.

Would a better approach not be to return to userspace saying, "we can't
handle this in the kernel, you decide", without printing the dubious
kernel error message. Then user space could suspend the VM and print a
lenghty explanation of all the possible problems there could be, or
re-inject something back into the guest, or whatever, for a particular
environment.

Thoughts?


Thanks,

Christoffer

2019-09-05 10:04:58

by Peter Maydell

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
>
> On Thu, 05 Sep 2019 09:16:54 +0100,
> Peter Maydell <[email protected]> wrote:
> > This is true, but the problem is that barfing out to userspace
> > makes it harder to debug the guest because it means that
> > the VM is immediately destroyed, whereas AIUI if we
> > inject some kind of exception then (assuming you're set up
> > to do kernel-debug via gdbstub) you can actually examine
> > the offending guest code with a debugger because at least
> > your VM is still around to inspect...
>
> To Christoffer's point, I find the benefit a bit dubious. Yes, you get
> an exception, but the instruction that caused it may be completely
> legal (store with post-increment, for example), leading to an even
> more puzzled developer (that exception should never have been
> delivered the first place).

Right, but the combination of "host kernel prints a message
about an unsupported load/store insn" and "within-guest debug
dump/stack trace/etc" is much more useful than just having
"host kernel prints message" and "QEMU exits"; and it requires
about 3 lines of code change...

> I'm far more in favour of dumping the state of the access in the run
> structure (much like we do for a MMIO access) and let userspace do
> something about it (such as dumping information on the console or
> breaking). It could even inject an exception *if* the user has asked
> for it.

...whereas this requires agreement on a kernel-userspace API,
larger changes in the kernel, somebody to implement the userspace
side of things, and the user to update both the kernel and QEMU.
It's hard for me to see that the benefit here over the 3-line
approach really outweighs the extra effort needed. In practice
saying "we should do this" is saying "we're going to do nothing",
based on the historical record.

thanks
-- PMM

2019-09-05 10:17:14

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, 05 Sep 2019 09:56:44 +0100,
Peter Maydell <[email protected]> wrote:
>
> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
> >
> > On Thu, 05 Sep 2019 09:16:54 +0100,
> > Peter Maydell <[email protected]> wrote:
> > > This is true, but the problem is that barfing out to userspace
> > > makes it harder to debug the guest because it means that
> > > the VM is immediately destroyed, whereas AIUI if we
> > > inject some kind of exception then (assuming you're set up
> > > to do kernel-debug via gdbstub) you can actually examine
> > > the offending guest code with a debugger because at least
> > > your VM is still around to inspect...
> >
> > To Christoffer's point, I find the benefit a bit dubious. Yes, you get
> > an exception, but the instruction that caused it may be completely
> > legal (store with post-increment, for example), leading to an even
> > more puzzled developer (that exception should never have been
> > delivered the first place).
>
> Right, but the combination of "host kernel prints a message
> about an unsupported load/store insn" and "within-guest debug
> dump/stack trace/etc" is much more useful than just having
> "host kernel prints message" and "QEMU exits"; and it requires
> about 3 lines of code change...

Which is wrong, and creates a new behaviour that isn't specified
anywhere.

>
> > I'm far more in favour of dumping the state of the access in the run
> > structure (much like we do for a MMIO access) and let userspace do
> > something about it (such as dumping information on the console or
> > breaking). It could even inject an exception *if* the user has asked
> > for it.
>
> ...whereas this requires agreement on a kernel-userspace API,
> larger changes in the kernel, somebody to implement the userspace
> side of things, and the user to update both the kernel and QEMU.
> It's hard for me to see that the benefit here over the 3-line
> approach really outweighs the extra effort needed.

3 lines that already require the host kernel to be updated, and create
a legacy that we'll never be able to get rid of.

> In practice saying "we should do this" is saying "we're going to do
> nothing", based on the historical record.

Thanks for the vote of confidence...

M.

--
Jazz is not dead, it just smells funny.

2019-09-05 10:18:41

by Daniel P. Berrangé

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, Sep 05, 2019 at 10:20:39AM +0100, Stefan Hajnoczi wrote:
> On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote:
> > If an application tries to access memory that is not mapped, an error
> > ENOSYS, "load/store instruction decoding not implemented" may occur.
> > QEMU will hang with a register dump.
> >
> > Instead create a data abort that can be handled gracefully by the
> > application running in the virtual environment.
> >
> > Now the virtual machine can react to the event in the most appropriate
> > way - by recovering, by writing an informative log, or by rebooting.
> >
> > Signed-off-by: Heinrich Schuchardt <[email protected]>
> > ---
> > virt/kvm/arm/mmio.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> > index a8a6a0c883f1..0cbed7d6a0f4 100644
> > --- a/virt/kvm/arm/mmio.c
> > +++ b/virt/kvm/arm/mmio.c
> > @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > if (ret)
> > return ret;
> > } else {
> > - kvm_err("load/store instruction decoding not implemented\n");
> > - return -ENOSYS;
> > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> > + return 1;
>
> I see this more as a temporary debugging hack than something to merge.
>
> It sounds like in your case the guest environment provided good
> debugging information and you preferred it over debugging this from the
> host side. That's fine, but allowing the guest to continue running in
> the general case makes it much harder to track down the root cause of a
> problem because many guest CPU instructions may be executed after the
> original problem occurs. Other guest software may fail silently in
> weird ways. IMO it's best to fail early.

The current error message is quite limited in its usefulness - mostly
you have to be able to google the message and hope to hit a previous
report which explains the problem, or find someone on IRC who remembers
the problem, etc.

Could we put a text doc in the kernel tree explaining the problem in
enough detail that people can identify their next steps to resolve it,
and then make this error message tell people to read that text doc.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

2019-09-05 11:10:22

by Heinrich Schuchardt

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On 9/5/19 10:03 AM, Marc Zyngier wrote:
> [Please use my kernel.org address. My arm.com address will disappear shortly]
>
> On Wed, 04 Sep 2019 19:07:36 +0100,
> Heinrich Schuchardt <[email protected]> wrote:
>>
>> If an application tries to access memory that is not mapped, an error
>> ENOSYS, "load/store instruction decoding not implemented" may occur.
>> QEMU will hang with a register dump.
>>
>> Instead create a data abort that can be handled gracefully by the
>> application running in the virtual environment.
>>
>> Now the virtual machine can react to the event in the most appropriate
>> way - by recovering, by writing an informative log, or by rebooting.
>>
>> Signed-off-by: Heinrich Schuchardt <[email protected]>
>> ---
>> virt/kvm/arm/mmio.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
>> index a8a6a0c883f1..0cbed7d6a0f4 100644
>> --- a/virt/kvm/arm/mmio.c
>> +++ b/virt/kvm/arm/mmio.c
>> @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> if (ret)
>> return ret;
>> } else {
>> - kvm_err("load/store instruction decoding not implemented\n");
>> - return -ENOSYS;
>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>> + return 1;
>
> How can you tell that the access would fault? You have no idea at that
> stage (the kernel doesn't know about the MMIO ranges that userspace
> handles). All you know is that you're faced with a memory access that
> you cannot emulate in the kernel. Injecting a data abort at that stage
> is not something that the architecture allows.
>
> If you want to address this, consider forwarding the access to
> userspace. You'll only need an instruction decoder (supporting T1, T2,
> A32 and A64) and a S1 page table walker (one per page table format,
> all three of them) to emulate the access (having taken care of
> stopping all the other vcpus to make sure there is no concurrent
> modification of the page tables). You'll then be able to return the
> result of the access back to the kernel.

If I understand you right, the problem has to be fixed in QEMU and not
in KVM.

Is there an example of such a forwarding already available in QEMU?

>
> Of course, the best thing would be to actually fix the guest so that
> it doesn't use non-emulatable MMIO accesses. In general, that the sign
> of a bug in low-level accessors.

My problem was to find out where in my guest (U-Boot running UEFI SCT)
the problem occurred. With this patch U-Boot gave me the relative
address in Shell.efi and I was able to locate what was wrong in U-Boot's
UEFI implementation.

Thanks for reviewing.

Heinrich

2019-09-05 11:10:38

by Peter Maydell

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, 5 Sep 2019 at 09:25, Christoffer Dall <[email protected]> wrote:
>
> On Thu, Sep 05, 2019 at 09:16:54AM +0100, Peter Maydell wrote:
> > This is true, but the problem is that barfing out to userspace
> > makes it harder to debug the guest because it means that
> > the VM is immediately destroyed, whereas AIUI if we
> > inject some kind of exception then (assuming you're set up
> > to do kernel-debug via gdbstub) you can actually examine
> > the offending guest code with a debugger because at least
> > your VM is still around to inspect...
> >
>
> Is it really going to be easier to debug a guest that sees behavior
> which may not be architecturally correct? For example, seeing a data
> abort on an access to an MMIO region because the guest used a strange
> instruction?

Yeah, a data abort is not ideal. You could UNDEF the insn, which
probably is more likely to result in getting control in the
debugger I suppose.

As for whether it's going to be easier to debug, for the
user who reported this in the first place it certainly was.
(Consider even a simple Linux guest not under a debugger --
if we UNDEF the insn the guest kernel will print a helpful
backtrace so you can tell where the problem is; at the moment
we just print a register dump from the host kernel, which is a
lot less informative.)

> I appreaciate that the current way we handle this is confusing and has
> led many people down a rabbit hole, so we should do better.
>
> Would a better approach not be to return to userspace saying, "we can't
> handle this in the kernel, you decide", without printing the dubious
> kernel error message.

Printing the message in the kernel is the best clue we give
the user at the moment that they've run into this problem;
I would be wary of removing it (even if we decide to also
do something else).

> Then user space could suspend the VM and print a
> lenghty explanation of all the possible problems there could be, or
> re-inject something back into the guest, or whatever, for a particular
> environment.

In theory I guess so. In practice that's not what userspace
currently in the wild does, and injecting an exception from
userspace is a bit awkward (I dunno if kvmtool does it,
QEMU only needs to in really obscure circumstances and
was buggy in how it tried to do it until very recently)...

thanks
-- PMM

2019-09-05 11:23:40

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, 05 Sep 2019 09:16:54 +0100,
Peter Maydell <[email protected]> wrote:
>
> On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <[email protected]> wrote:
> > How can you tell that the access would fault? You have no idea at that
> > stage (the kernel doesn't know about the MMIO ranges that userspace
> > handles). All you know is that you're faced with a memory access that
> > you cannot emulate in the kernel. Injecting a data abort at that stage
> > is not something that the architecture allows.
>
> To be fair, locking up the whole CPU (which is effectively
> what the kvm_err/ENOSYS is going to do to the VM) isn't
> something the architecture allows either :-)

Hey, I didn't say things were good as they are now! ;-)

I'm definitely willing to change things in that area, but I also don't
want anyone to start relying on things that are not specified anywhere.

>
> > Of course, the best thing would be to actually fix the guest so that
> > it doesn't use non-emulatable MMIO accesses. In general, that the sign
> > of a bug in low-level accessors.
>
> This is true, but the problem is that barfing out to userspace
> makes it harder to debug the guest because it means that
> the VM is immediately destroyed, whereas AIUI if we
> inject some kind of exception then (assuming you're set up
> to do kernel-debug via gdbstub) you can actually examine
> the offending guest code with a debugger because at least
> your VM is still around to inspect...

To Christoffer's point, I find the benefit a bit dubious. Yes, you get
an exception, but the instruction that caused it may be completely
legal (store with post-increment, for example), leading to an even
more puzzled developer (that exception should never have been
delivered the first place).

I'm far more in favour of dumping the state of the access in the run
structure (much like we do for a MMIO access) and let userspace do
something about it (such as dumping information on the console or
breaking). It could even inject an exception *if* the user has asked
for it.

Thanks,

M.

--
Jazz is not dead, it just smells funny.

2019-09-05 11:25:19

by Heinrich Schuchardt

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On 9/5/19 10:16 AM, Peter Maydell wrote:
> On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <[email protected]> wrote:
>> How can you tell that the access would fault? You have no idea at that
>> stage (the kernel doesn't know about the MMIO ranges that userspace
>> handles). All you know is that you're faced with a memory access that
>> you cannot emulate in the kernel. Injecting a data abort at that stage
>> is not something that the architecture allows.
>
> To be fair, locking up the whole CPU (which is effectively
> what the kvm_err/ENOSYS is going to do to the VM) isn't
> something the architecture allows either :-)
>
>> Of course, the best thing would be to actually fix the guest so that
>> it doesn't use non-emulatable MMIO accesses. In general, that the sign
>> of a bug in low-level accessors.
>
> This is true, but the problem is that barfing out to userspace
> makes it harder to debug the guest because it means that
> the VM is immediately destroyed, whereas AIUI if we
> inject some kind of exception then (assuming you're set up
> to do kernel-debug via gdbstub) you can actually examine
> the offending guest code with a debugger because at least
> your VM is still around to inspect...

Stopping the CPU and debugging is not what I am interested in. I want
the QEMU guest to be able to react to an incorrect memory access.

Imagine Apollo 11's computer not restarting when hitting an exception.
They would never have reached the moon. - I think allowing an emulation
guest to react to an exception, e.g. by resetting, is a necessity.

In my case U-Boot as a guest creates an output like the one below when a
data abort occurs:

"Synchronous Abort" handler, esr 0x02000000
elr: fffffffffdeac19c lr : fffffffffdeac19c (reloc)
elr: 000000007ddd719c lr : 000000007ddd719c
x0 : 0000000000000000 x1 : 000000007ffbc000
x2 : 000000000000000a x3 : 000000007ffbcd80
x4 : 0000000000002800 x5 : 000000007ffbcdb0
x6 : 0000000000000001 x7 : 000000007eef8b80
x8 : 000000000000003f x9 : 0000000000000004
x10: 0000000000000001 x11: 000000000000000d
x12: 0000000000000006 x13: 000000000001869f
x14: 0000000047f00000 x15: 0000000000000000
x16: 000000007ff5b194 x17: 0000000000000000
x18: 0000000000000000 x19: 000000007ffbcd30
x20: 0000000000000000 x21: 000000007ffeb000
x22: 0000000000000009 x23: 000000007eef5cf0
x24: 0000000000000000 x25: 000000007ffa7806
x26: 000000007ffa7834 x27: 0000000000000024
x28: 000000007dddd040 x29: 000000007ede9990

UEFI image [0x000000007ddd7000:0x000000007ddd749f] pc=0x19c '/bug.efi'
Resetting CPU ...

With this information I see that the problem occurred at 0x019C from the
start of the loaded binary bug.efi. Next thing is to look at the map
file of bug.efi to find out in which instruction the problem occurred.

After providing the dump U-Boot continues to reset the system.

When U-Boot is running the EDK II SCT (a test suite for UEFI firmware),
SCT will log that a restart occurred (indicating that a test failed) and
continue to run the next test.

Best regards

Heinrich

2019-09-05 11:56:16

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, 05 Sep 2019 09:28:42 +0100,
Heinrich Schuchardt <[email protected]> wrote:
>
> On 9/5/19 10:03 AM, Marc Zyngier wrote:
> > [Please use my kernel.org address. My arm.com address will disappear shortly]
> >
> > On Wed, 04 Sep 2019 19:07:36 +0100,
> > Heinrich Schuchardt <[email protected]> wrote:
> >>
> >> If an application tries to access memory that is not mapped, an error
> >> ENOSYS, "load/store instruction decoding not implemented" may occur.
> >> QEMU will hang with a register dump.
> >>
> >> Instead create a data abort that can be handled gracefully by the
> >> application running in the virtual environment.
> >>
> >> Now the virtual machine can react to the event in the most appropriate
> >> way - by recovering, by writing an informative log, or by rebooting.
> >>
> >> Signed-off-by: Heinrich Schuchardt <[email protected]>
> >> ---
> >> virt/kvm/arm/mmio.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> >> index a8a6a0c883f1..0cbed7d6a0f4 100644
> >> --- a/virt/kvm/arm/mmio.c
> >> +++ b/virt/kvm/arm/mmio.c
> >> @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >> if (ret)
> >> return ret;
> >> } else {
> >> - kvm_err("load/store instruction decoding not implemented\n");
> >> - return -ENOSYS;
> >> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> >> + return 1;
> >
> > How can you tell that the access would fault? You have no idea at that
> > stage (the kernel doesn't know about the MMIO ranges that userspace
> > handles). All you know is that you're faced with a memory access that
> > you cannot emulate in the kernel. Injecting a data abort at that stage
> > is not something that the architecture allows.
> >
> > If you want to address this, consider forwarding the access to
> > userspace. You'll only need an instruction decoder (supporting T1, T2,
> > A32 and A64) and a S1 page table walker (one per page table format,
> > all three of them) to emulate the access (having taken care of
> > stopping all the other vcpus to make sure there is no concurrent
> > modification of the page tables). You'll then be able to return the
> > result of the access back to the kernel.
>
> If I understand you right, the problem has to be fixed in QEMU and not
> in KVM.

It is a shared responsibility.

> Is there an example of such a forwarding already available in QEMU?

Yes. That's how we ask userspace (QEMU) to perform a MMIO access on
behalf of the guest (see how the run structure is populated and the
vcpu thread returns to userspace).

> >
> > Of course, the best thing would be to actually fix the guest so that
> > it doesn't use non-emulatable MMIO accesses. In general, that the sign
> > of a bug in low-level accessors.
>
> My problem was to find out where in my guest (U-Boot running UEFI SCT)
> the problem occurred. With this patch U-Boot gave me the relative
> address in Shell.efi and I was able to locate what was wrong in U-Boot's
> UEFI implementation.

I understand that there is a need for more precise information. I'm
just wary of adding debug features that become something that people
rely on, despite being in contradiction with the architecture.

I can help with the kernel side of the reporting, should someone want
to tackle the userspace side of it.

Thanks,

M.

--
Jazz is not dead, it just smells funny.

2019-09-05 11:58:59

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote:
> If an application tries to access memory that is not mapped, an error
> ENOSYS, "load/store instruction decoding not implemented" may occur.
> QEMU will hang with a register dump.
>
> Instead create a data abort that can be handled gracefully by the
> application running in the virtual environment.
>
> Now the virtual machine can react to the event in the most appropriate
> way - by recovering, by writing an informative log, or by rebooting.
>
> Signed-off-by: Heinrich Schuchardt <[email protected]>
> ---
> virt/kvm/arm/mmio.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> index a8a6a0c883f1..0cbed7d6a0f4 100644
> --- a/virt/kvm/arm/mmio.c
> +++ b/virt/kvm/arm/mmio.c
> @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> if (ret)
> return ret;
> } else {
> - kvm_err("load/store instruction decoding not implemented\n");
> - return -ENOSYS;
> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> + return 1;

I see this more as a temporary debugging hack than something to merge.

It sounds like in your case the guest environment provided good
debugging information and you preferred it over debugging this from the
host side. That's fine, but allowing the guest to continue running in
the general case makes it much harder to track down the root cause of a
problem because many guest CPU instructions may be executed after the
original problem occurs. Other guest software may fail silently in
weird ways. IMO it's best to fail early.

Stefan


Attachments:
(No filename) (1.65 kB)
signature.asc (499.00 B)
Download all attachments

2019-09-05 11:59:00

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:
> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
> >
> > On Thu, 05 Sep 2019 09:16:54 +0100,
> > Peter Maydell <[email protected]> wrote:
> > > This is true, but the problem is that barfing out to userspace
> > > makes it harder to debug the guest because it means that
> > > the VM is immediately destroyed, whereas AIUI if we
> > > inject some kind of exception then (assuming you're set up
> > > to do kernel-debug via gdbstub) you can actually examine
> > > the offending guest code with a debugger because at least
> > > your VM is still around to inspect...
> >
> > To Christoffer's point, I find the benefit a bit dubious. Yes, you get
> > an exception, but the instruction that caused it may be completely
> > legal (store with post-increment, for example), leading to an even
> > more puzzled developer (that exception should never have been
> > delivered the first place).
>
> Right, but the combination of "host kernel prints a message
> about an unsupported load/store insn" and "within-guest debug
> dump/stack trace/etc" is much more useful than just having
> "host kernel prints message" and "QEMU exits"; and it requires
> about 3 lines of code change...
>
> > I'm far more in favour of dumping the state of the access in the run
> > structure (much like we do for a MMIO access) and let userspace do
> > something about it (such as dumping information on the console or
> > breaking). It could even inject an exception *if* the user has asked
> > for it.
>
> ...whereas this requires agreement on a kernel-userspace API,
> larger changes in the kernel, somebody to implement the userspace
> side of things, and the user to update both the kernel and QEMU.
> It's hard for me to see that the benefit here over the 3-line
> approach really outweighs the extra effort needed. In practice
> saying "we should do this" is saying "we're going to do nothing",
> based on the historical record.
>

How about something like the following (completely untested, liable for
ABI discussions etc. etc., but for illustration purposes).

I think it raises the question (and likely many other) of whether we can
break the existing 'ABI' and change behavior for missing ISV
retrospectively for legacy user space when the issue has occurred?

Someone might have written code that reacts to the -ENOSYS, so I've
taken the conservative approach for this for the time being.


diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 8a37c8e89777..19a92c49039c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -76,6 +76,14 @@ struct kvm_arch {

/* Mandated version of PSCI */
u32 psci_version;
+
+ /*
+ * If we encounter a data abort without valid instruction syndrome
+ * information, report this to user space. User space can (and
+ * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
+ * supported.
+ */
+ bool return_nisv_io_abort_to_user;
};

#define KVM_NR_MEM_OBJS 40
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f656169db8c3..019bc560edc1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -83,6 +83,14 @@ struct kvm_arch {

/* Mandated version of PSCI */
u32 psci_version;
+
+ /*
+ * If we encounter a data abort without valid instruction syndrome
+ * information, report this to user space. User space can (and
+ * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
+ * supported.
+ */
+ bool return_nisv_io_abort_to_user;
};

#define KVM_NR_MEM_OBJS 40
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5e3f12d5359e..a4dd004d0db9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
#define KVM_EXIT_S390_STSI 25
#define KVM_EXIT_IOAPIC_EOI 26
#define KVM_EXIT_HYPERV 27
+#define KVM_EXIT_ARM_NISV 28

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
#define KVM_CAP_ARM_PTRAUTH_GENERIC 172
#define KVM_CAP_PMU_EVENT_FILTER 173
+#define KVM_CAP_ARM_NISV_TO_USER 174

#ifdef KVM_CAP_IRQ_ROUTING

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 35a069815baf..2ce94bd9d4a9 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
return 0;
}

+int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
+ struct kvm_enable_cap *cap)
+{
+ int r;
+
+ if (cap->flags)
+ return -EINVAL;
+
+ switch (cap->cap) {
+ case KVM_CAP_ARM_NISV_TO_USER:
+ r = 0;
+ kvm->arch.return_nisv_io_abort_to_user = true;
+ break;
+ default:
+ r = -EINVAL;
+ break;
+ }
+
+ return r;
+}

/**
* kvm_arch_init_vm - initializes a VM data structure
@@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_MP_STATE:
case KVM_CAP_IMMEDIATE_EXIT:
case KVM_CAP_VCPU_EVENTS:
+ case KVM_CAP_ARM_NISV_TO_USER:
r = 1;
break;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
@@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
ret = kvm_handle_mmio_return(vcpu, vcpu->run);
if (ret)
return ret;
+ } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
+ kvm_inject_undefined(vcpu);
}

if (run->immediate_exit)
diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
index 6af5c91337f2..62e6ef47a6de 100644
--- a/virt/kvm/arm/mmio.c
+++ b/virt/kvm/arm/mmio.c
@@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
if (ret)
return ret;
} else {
- kvm_err("load/store instruction decoding not implemented\n");
- return -ENOSYS;
+ if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
+ run->exit_reason = KVM_EXIT_ARM_NISV;
+ run->mmio.phys_addr = fault_ipa;
+ vcpu->stat.mmio_exit_user++;
+ return 0;
+ } else {
+ kvm_info("encountered data abort without syndrome info\n");
+ return -ENOSYS;
+ }
}

rt = vcpu->arch.mmio_decode.rt;


Thanks,

Christoffer

2019-09-05 13:27:07

by Heinrich Schuchardt

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On 9/5/19 11:20 AM, Stefan Hajnoczi wrote:
> On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote:
>> If an application tries to access memory that is not mapped, an error
>> ENOSYS, "load/store instruction decoding not implemented" may occur.
>> QEMU will hang with a register dump.
>>
>> Instead create a data abort that can be handled gracefully by the
>> application running in the virtual environment.
>>
>> Now the virtual machine can react to the event in the most appropriate
>> way - by recovering, by writing an informative log, or by rebooting.
>>
>> Signed-off-by: Heinrich Schuchardt <[email protected]>
>> ---
>> virt/kvm/arm/mmio.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
>> index a8a6a0c883f1..0cbed7d6a0f4 100644
>> --- a/virt/kvm/arm/mmio.c
>> +++ b/virt/kvm/arm/mmio.c
>> @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> if (ret)
>> return ret;
>> } else {
>> - kvm_err("load/store instruction decoding not implemented\n");
>> - return -ENOSYS;
>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>> + return 1;
>
> I see this more as a temporary debugging hack than something to merge.
>
> It sounds like in your case the guest environment provided good
> debugging information and you preferred it over debugging this from the
> host side. That's fine, but allowing the guest to continue running in
> the general case makes it much harder to track down the root cause of a
> problem because many guest CPU instructions may be executed after the
> original problem occurs. Other guest software may fail silently in
> weird ways. IMO it's best to fail early.
>
> Stefan
>

As virtual machine are ubiquitous, expect also mission critical system
to run on them. At development time halting a machine may be a good
idea. In production this is often the worst solution. Rebooting may be
essential for survival.

For an anecdotal example see:
https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html

I am convinced that leaving it to the guest to decide how to react is
the best choice.

Best regards

Heinrich

2019-09-05 14:03:25

by Heinrich Schuchardt

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On 9/5/19 11:22 AM, Christoffer Dall wrote:
> On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:
>> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
>>>
>>> On Thu, 05 Sep 2019 09:16:54 +0100,
>>> Peter Maydell <[email protected]> wrote:
>>>> This is true, but the problem is that barfing out to userspace
>>>> makes it harder to debug the guest because it means that
>>>> the VM is immediately destroyed, whereas AIUI if we
>>>> inject some kind of exception then (assuming you're set up
>>>> to do kernel-debug via gdbstub) you can actually examine
>>>> the offending guest code with a debugger because at least
>>>> your VM is still around to inspect...
>>>
>>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get
>>> an exception, but the instruction that caused it may be completely
>>> legal (store with post-increment, for example), leading to an even
>>> more puzzled developer (that exception should never have been
>>> delivered the first place).
>>
>> Right, but the combination of "host kernel prints a message
>> about an unsupported load/store insn" and "within-guest debug
>> dump/stack trace/etc" is much more useful than just having
>> "host kernel prints message" and "QEMU exits"; and it requires
>> about 3 lines of code change...
>>
>>> I'm far more in favour of dumping the state of the access in the run
>>> structure (much like we do for a MMIO access) and let userspace do
>>> something about it (such as dumping information on the console or
>>> breaking). It could even inject an exception *if* the user has asked
>>> for it.
>>
>> ...whereas this requires agreement on a kernel-userspace API,
>> larger changes in the kernel, somebody to implement the userspace
>> side of things, and the user to update both the kernel and QEMU.
>> It's hard for me to see that the benefit here over the 3-line
>> approach really outweighs the extra effort needed. In practice
>> saying "we should do this" is saying "we're going to do nothing",
>> based on the historical record.
>>
>
> How about something like the following (completely untested, liable for
> ABI discussions etc. etc., but for illustration purposes).
>
> I think it raises the question (and likely many other) of whether we can
> break the existing 'ABI' and change behavior for missing ISV
> retrospectively for legacy user space when the issue has occurred?
>
> Someone might have written code that reacts to the -ENOSYS, so I've
> taken the conservative approach for this for the time being.
>
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 8a37c8e89777..19a92c49039c 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -76,6 +76,14 @@ struct kvm_arch {
>
> /* Mandated version of PSCI */
> u32 psci_version;
> +
> + /*
> + * If we encounter a data abort without valid instruction syndrome
> + * information, report this to user space. User space can (and
> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> + * supported.
> + */
> + bool return_nisv_io_abort_to_user;
> };
>
> #define KVM_NR_MEM_OBJS 40
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index f656169db8c3..019bc560edc1 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -83,6 +83,14 @@ struct kvm_arch {
>
> /* Mandated version of PSCI */
> u32 psci_version;
> +
> + /*
> + * If we encounter a data abort without valid instruction syndrome
> + * information, report this to user space. User space can (and
> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> + * supported.
> + */
> + bool return_nisv_io_abort_to_user;

How about 32bit ARM?

> };
>
> #define KVM_NR_MEM_OBJS 40
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 5e3f12d5359e..a4dd004d0db9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
> #define KVM_EXIT_S390_STSI 25
> #define KVM_EXIT_IOAPIC_EOI 26
> #define KVM_EXIT_HYPERV 27
> +#define KVM_EXIT_ARM_NISV 28
>
> /* For KVM_EXIT_INTERNAL_ERROR */
> /* Emulate instruction failed. */
> @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
> #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
> #define KVM_CAP_PMU_EVENT_FILTER 173
> +#define KVM_CAP_ARM_NISV_TO_USER 174
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 35a069815baf..2ce94bd9d4a9 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
> return 0;
> }
>
> +int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> + struct kvm_enable_cap *cap)

This overrides the weak implementation in virt/kvm/kvm_main.c. OK.

> +{
> + int r;
> +
> + if (cap->flags)
> + return -EINVAL;
> +
> + switch (cap->cap) {
> + case KVM_CAP_ARM_NISV_TO_USER:
> + r = 0;
> + kvm->arch.return_nisv_io_abort_to_user = true;
> + break;
> + default:
> + r = -EINVAL;
> + break;
> + }
> +
> + return r;
> +}
>
> /**
> * kvm_arch_init_vm - initializes a VM data structure
> @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_MP_STATE:
> case KVM_CAP_IMMEDIATE_EXIT:
> case KVM_CAP_VCPU_EVENTS:
> + case KVM_CAP_ARM_NISV_TO_USER:
> r = 1;
> break;
> case KVM_CAP_ARM_SET_DEVICE_ADDR:
> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> ret = kvm_handle_mmio_return(vcpu, vcpu->run);
> if (ret)
> return ret;
> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
> + kvm_inject_undefined(vcpu);

So QEMU can try to enable the feature via IOCTL. And here you would
raise the 'undefined instruction' exception which QEMU will have to
handle in the loop calling KVM either by trying to make sense of the
instruction or by passing it on to the guest.

Conceptually this looks good to me and meets the requirements of my
application.

Thanks a lot for your suggestion.

Regards

Heinrich

> }
>
> if (run->immediate_exit)
> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> index 6af5c91337f2..62e6ef47a6de 100644
> --- a/virt/kvm/arm/mmio.c
> +++ b/virt/kvm/arm/mmio.c
> @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> if (ret)
> return ret;
> } else {
> - kvm_err("load/store instruction decoding not implemented\n");
> - return -ENOSYS;
> + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
> + run->exit_reason = KVM_EXIT_ARM_NISV;
> + run->mmio.phys_addr = fault_ipa;
> + vcpu->stat.mmio_exit_user++;
> + return 0;
> + } else {
> + kvm_info("encountered data abort without syndrome info\n");
> + return -ENOSYS;
> + }
> }
>
> rt = vcpu->arch.mmio_decode.rt;
>
>
> Thanks,
>
> Christoffer
>

2019-09-05 16:49:31

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

Hi Heinrich,

On Thu, Sep 05, 2019 at 02:01:36PM +0200, Heinrich Schuchardt wrote:
> On 9/5/19 11:20 AM, Stefan Hajnoczi wrote:
> > On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote:
> > > If an application tries to access memory that is not mapped, an error
> > > ENOSYS, "load/store instruction decoding not implemented" may occur.
> > > QEMU will hang with a register dump.
> > >
> > > Instead create a data abort that can be handled gracefully by the
> > > application running in the virtual environment.
> > >
> > > Now the virtual machine can react to the event in the most appropriate
> > > way - by recovering, by writing an informative log, or by rebooting.
> > >
> > > Signed-off-by: Heinrich Schuchardt <[email protected]>
> > > ---
> > > virt/kvm/arm/mmio.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> > > index a8a6a0c883f1..0cbed7d6a0f4 100644
> > > --- a/virt/kvm/arm/mmio.c
> > > +++ b/virt/kvm/arm/mmio.c
> > > @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > > if (ret)
> > > return ret;
> > > } else {
> > > - kvm_err("load/store instruction decoding not implemented\n");
> > > - return -ENOSYS;
> > > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> > > + return 1;
> >
> > I see this more as a temporary debugging hack than something to merge.
> >
> > It sounds like in your case the guest environment provided good
> > debugging information and you preferred it over debugging this from the
> > host side. That's fine, but allowing the guest to continue running in
> > the general case makes it much harder to track down the root cause of a
> > problem because many guest CPU instructions may be executed after the
> > original problem occurs. Other guest software may fail silently in
> > weird ways. IMO it's best to fail early.
> >
> > Stefan
> >
>
> As virtual machine are ubiquitous, expect also mission critical system
> to run on them. At development time halting a machine may be a good
> idea. In production this is often the worst solution. Rebooting may be
> essential for survival.
>
> For an anecdotal example see:
> https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html
>
> I am convinced that leaving it to the guest to decide how to react is
> the best choice.
>
Maintaining strong adherence to the architecture is equally important,
and I'm sure we can find anecdotes to support how not doing the
expected, can also lead to disastrous outcomes.

Have you had a look at the suggested patch I sent? The idea is that we
can preserve existing legacy ABI, allow for a better debugging
experience, allow userspace to do emulation if it so wishes, and provide
a better error message if userspace doesn't handle this properly.

One thing we could change from my proposed patch would be to have KVM
inject the access as an external abort if the target address also
doesn't hit an MMIO device, which is by far the common scenario reported
here on the list.

Hopefully, a mission critical deployment based on KVM/Arm (scary as that
sounds), would use a recent and patched VMM (QEMU) that either causes
the external abort, or reboots the VM, as per the configuration of the
particular system in question.


Thanks,

Christoffer

2019-09-05 17:22:21

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On 05/09/2019 10:22, Christoffer Dall wrote:
> On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:
>> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
>>>
>>> On Thu, 05 Sep 2019 09:16:54 +0100,
>>> Peter Maydell <[email protected]> wrote:
>>>> This is true, but the problem is that barfing out to userspace
>>>> makes it harder to debug the guest because it means that
>>>> the VM is immediately destroyed, whereas AIUI if we
>>>> inject some kind of exception then (assuming you're set up
>>>> to do kernel-debug via gdbstub) you can actually examine
>>>> the offending guest code with a debugger because at least
>>>> your VM is still around to inspect...
>>>
>>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get
>>> an exception, but the instruction that caused it may be completely
>>> legal (store with post-increment, for example), leading to an even
>>> more puzzled developer (that exception should never have been
>>> delivered the first place).
>>
>> Right, but the combination of "host kernel prints a message
>> about an unsupported load/store insn" and "within-guest debug
>> dump/stack trace/etc" is much more useful than just having
>> "host kernel prints message" and "QEMU exits"; and it requires
>> about 3 lines of code change...
>>
>>> I'm far more in favour of dumping the state of the access in the run
>>> structure (much like we do for a MMIO access) and let userspace do
>>> something about it (such as dumping information on the console or
>>> breaking). It could even inject an exception *if* the user has asked
>>> for it.
>>
>> ...whereas this requires agreement on a kernel-userspace API,
>> larger changes in the kernel, somebody to implement the userspace
>> side of things, and the user to update both the kernel and QEMU.
>> It's hard for me to see that the benefit here over the 3-line
>> approach really outweighs the extra effort needed. In practice
>> saying "we should do this" is saying "we're going to do nothing",
>> based on the historical record.
>>
>
> How about something like the following (completely untested, liable for
> ABI discussions etc. etc., but for illustration purposes).
>
> I think it raises the question (and likely many other) of whether we can
> break the existing 'ABI' and change behavior for missing ISV
> retrospectively for legacy user space when the issue has occurred?
>
> Someone might have written code that reacts to the -ENOSYS, so I've
> taken the conservative approach for this for the time being.
>
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 8a37c8e89777..19a92c49039c 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -76,6 +76,14 @@ struct kvm_arch {
>
> /* Mandated version of PSCI */
> u32 psci_version;
> +
> + /*
> + * If we encounter a data abort without valid instruction syndrome
> + * information, report this to user space. User space can (and
> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> + * supported.
> + */
> + bool return_nisv_io_abort_to_user;
> };
>
> #define KVM_NR_MEM_OBJS 40
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index f656169db8c3..019bc560edc1 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -83,6 +83,14 @@ struct kvm_arch {
>
> /* Mandated version of PSCI */
> u32 psci_version;
> +
> + /*
> + * If we encounter a data abort without valid instruction syndrome
> + * information, report this to user space. User space can (and
> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> + * supported.
> + */
> + bool return_nisv_io_abort_to_user;
> };
>
> #define KVM_NR_MEM_OBJS 40
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 5e3f12d5359e..a4dd004d0db9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
> #define KVM_EXIT_S390_STSI 25
> #define KVM_EXIT_IOAPIC_EOI 26
> #define KVM_EXIT_HYPERV 27
> +#define KVM_EXIT_ARM_NISV 28
>
> /* For KVM_EXIT_INTERNAL_ERROR */
> /* Emulate instruction failed. */
> @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
> #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
> #define KVM_CAP_PMU_EVENT_FILTER 173
> +#define KVM_CAP_ARM_NISV_TO_USER 174
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 35a069815baf..2ce94bd9d4a9 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
> return 0;
> }
>
> +int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> + struct kvm_enable_cap *cap)
> +{
> + int r;
> +
> + if (cap->flags)
> + return -EINVAL;
> +
> + switch (cap->cap) {
> + case KVM_CAP_ARM_NISV_TO_USER:
> + r = 0;
> + kvm->arch.return_nisv_io_abort_to_user = true;
> + break;
> + default:
> + r = -EINVAL;
> + break;
> + }
> +
> + return r;
> +}
>
> /**
> * kvm_arch_init_vm - initializes a VM data structure
> @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_MP_STATE:
> case KVM_CAP_IMMEDIATE_EXIT:
> case KVM_CAP_VCPU_EVENTS:
> + case KVM_CAP_ARM_NISV_TO_USER:
> r = 1;
> break;
> case KVM_CAP_ARM_SET_DEVICE_ADDR:
> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> ret = kvm_handle_mmio_return(vcpu, vcpu->run);
> if (ret)
> return ret;
> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
> + kvm_inject_undefined(vcpu);

Just to make sure I understand: Is the expectation here that userspace
could clear the exit reason if it managed to handle the exit? And
otherwise we'd inject an UNDEF on reentry?

> }
>
> if (run->immediate_exit)
> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> index 6af5c91337f2..62e6ef47a6de 100644
> --- a/virt/kvm/arm/mmio.c
> +++ b/virt/kvm/arm/mmio.c
> @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> if (ret)
> return ret;
> } else {
> - kvm_err("load/store instruction decoding not implemented\n");
> - return -ENOSYS;
> + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
> + run->exit_reason = KVM_EXIT_ARM_NISV;
> + run->mmio.phys_addr = fault_ipa;

We could also record whether that's a read or a write (WnR should still
be valid). Actually, we could store a sanitized version of the ESR.

> + vcpu->stat.mmio_exit_user++;
> + return 0;
> + } else {
> + kvm_info("encountered data abort without syndrome info\n");

My only issue with this is that the previous message has been sort of
documented...

Thanks,

M.
--
Jazz is not dead, it just smells funny...

2019-09-06 11:09:54

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, Sep 05, 2019 at 03:25:47PM +0200, Heinrich Schuchardt wrote:
> On 9/5/19 11:22 AM, Christoffer Dall wrote:
> > On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:
> > > On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
> > > >
> > > > On Thu, 05 Sep 2019 09:16:54 +0100,
> > > > Peter Maydell <[email protected]> wrote:
> > > > > This is true, but the problem is that barfing out to userspace
> > > > > makes it harder to debug the guest because it means that
> > > > > the VM is immediately destroyed, whereas AIUI if we
> > > > > inject some kind of exception then (assuming you're set up
> > > > > to do kernel-debug via gdbstub) you can actually examine
> > > > > the offending guest code with a debugger because at least
> > > > > your VM is still around to inspect...
> > > >
> > > > To Christoffer's point, I find the benefit a bit dubious. Yes, you get
> > > > an exception, but the instruction that caused it may be completely
> > > > legal (store with post-increment, for example), leading to an even
> > > > more puzzled developer (that exception should never have been
> > > > delivered the first place).
> > >
> > > Right, but the combination of "host kernel prints a message
> > > about an unsupported load/store insn" and "within-guest debug
> > > dump/stack trace/etc" is much more useful than just having
> > > "host kernel prints message" and "QEMU exits"; and it requires
> > > about 3 lines of code change...
> > >
> > > > I'm far more in favour of dumping the state of the access in the run
> > > > structure (much like we do for a MMIO access) and let userspace do
> > > > something about it (such as dumping information on the console or
> > > > breaking). It could even inject an exception *if* the user has asked
> > > > for it.
> > >
> > > ...whereas this requires agreement on a kernel-userspace API,
> > > larger changes in the kernel, somebody to implement the userspace
> > > side of things, and the user to update both the kernel and QEMU.
> > > It's hard for me to see that the benefit here over the 3-line
> > > approach really outweighs the extra effort needed. In practice
> > > saying "we should do this" is saying "we're going to do nothing",
> > > based on the historical record.
> > >
> >
> > How about something like the following (completely untested, liable for
> > ABI discussions etc. etc., but for illustration purposes).
> >
> > I think it raises the question (and likely many other) of whether we can
> > break the existing 'ABI' and change behavior for missing ISV
> > retrospectively for legacy user space when the issue has occurred?
> >
> > Someone might have written code that reacts to the -ENOSYS, so I've
> > taken the conservative approach for this for the time being.
> >
> >
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 8a37c8e89777..19a92c49039c 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -76,6 +76,14 @@ struct kvm_arch {
> >
> > /* Mandated version of PSCI */
> > u32 psci_version;
> > +
> > + /*
> > + * If we encounter a data abort without valid instruction syndrome
> > + * information, report this to user space. User space can (and
> > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> > + * supported.
> > + */
> > + bool return_nisv_io_abort_to_user;
> > };
> >
> > #define KVM_NR_MEM_OBJS 40
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index f656169db8c3..019bc560edc1 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -83,6 +83,14 @@ struct kvm_arch {
> >
> > /* Mandated version of PSCI */
> > u32 psci_version;
> > +
> > + /*
> > + * If we encounter a data abort without valid instruction syndrome
> > + * information, report this to user space. User space can (and
> > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> > + * supported.
> > + */
> > + bool return_nisv_io_abort_to_user;
>
> How about 32bit ARM?
>

What about it? Not sure I understand the comment.

> > };
> >
> > #define KVM_NR_MEM_OBJS 40
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 5e3f12d5359e..a4dd004d0db9 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
> > #define KVM_EXIT_S390_STSI 25
> > #define KVM_EXIT_IOAPIC_EOI 26
> > #define KVM_EXIT_HYPERV 27
> > +#define KVM_EXIT_ARM_NISV 28
> >
> > /* For KVM_EXIT_INTERNAL_ERROR */
> > /* Emulate instruction failed. */
> > @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
> > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
> > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
> > #define KVM_CAP_PMU_EVENT_FILTER 173
> > +#define KVM_CAP_ARM_NISV_TO_USER 174
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
> >
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 35a069815baf..2ce94bd9d4a9 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
> > return 0;
> > }
> >
> > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > + struct kvm_enable_cap *cap)
>
> This overrides the weak implementation in virt/kvm/kvm_main.c. OK.
>

Yes.

> > +{
> > + int r;
> > +
> > + if (cap->flags)
> > + return -EINVAL;
> > +
> > + switch (cap->cap) {
> > + case KVM_CAP_ARM_NISV_TO_USER:
> > + r = 0;
> > + kvm->arch.return_nisv_io_abort_to_user = true;
> > + break;
> > + default:
> > + r = -EINVAL;
> > + break;
> > + }
> > +
> > + return r;
> > +}
> >
> > /**
> > * kvm_arch_init_vm - initializes a VM data structure
> > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_MP_STATE:
> > case KVM_CAP_IMMEDIATE_EXIT:
> > case KVM_CAP_VCPU_EVENTS:
> > + case KVM_CAP_ARM_NISV_TO_USER:
> > r = 1;
> > break;
> > case KVM_CAP_ARM_SET_DEVICE_ADDR:
> > @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > ret = kvm_handle_mmio_return(vcpu, vcpu->run);
> > if (ret)
> > return ret;
> > + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
> > + kvm_inject_undefined(vcpu);
>
> So QEMU can try to enable the feature via IOCTL. And here you would
> raise the 'undefined instruction' exception which QEMU will have to
> handle in the loop calling KVM either by trying to make sense of the
> instruction or by passing it on to the guest.
>
> Conceptually this looks good to me and meets the requirements of my
> application.
>
> Thanks a lot for your suggestion.
>

I will change the undef to an external abort as I think that's more in
line with the architecture, and document this, test and send out as a
proper patch.

Thanks,

Christoffer

2019-09-06 14:06:14

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote:
> On 05/09/2019 10:22, Christoffer Dall wrote:
> > On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:
> >> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
> >>>
> >>> On Thu, 05 Sep 2019 09:16:54 +0100,
> >>> Peter Maydell <[email protected]> wrote:
> >>>> This is true, but the problem is that barfing out to userspace
> >>>> makes it harder to debug the guest because it means that
> >>>> the VM is immediately destroyed, whereas AIUI if we
> >>>> inject some kind of exception then (assuming you're set up
> >>>> to do kernel-debug via gdbstub) you can actually examine
> >>>> the offending guest code with a debugger because at least
> >>>> your VM is still around to inspect...
> >>>
> >>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get
> >>> an exception, but the instruction that caused it may be completely
> >>> legal (store with post-increment, for example), leading to an even
> >>> more puzzled developer (that exception should never have been
> >>> delivered the first place).
> >>
> >> Right, but the combination of "host kernel prints a message
> >> about an unsupported load/store insn" and "within-guest debug
> >> dump/stack trace/etc" is much more useful than just having
> >> "host kernel prints message" and "QEMU exits"; and it requires
> >> about 3 lines of code change...
> >>
> >>> I'm far more in favour of dumping the state of the access in the run
> >>> structure (much like we do for a MMIO access) and let userspace do
> >>> something about it (such as dumping information on the console or
> >>> breaking). It could even inject an exception *if* the user has asked
> >>> for it.
> >>
> >> ...whereas this requires agreement on a kernel-userspace API,
> >> larger changes in the kernel, somebody to implement the userspace
> >> side of things, and the user to update both the kernel and QEMU.
> >> It's hard for me to see that the benefit here over the 3-line
> >> approach really outweighs the extra effort needed. In practice
> >> saying "we should do this" is saying "we're going to do nothing",
> >> based on the historical record.
> >>
> >
> > How about something like the following (completely untested, liable for
> > ABI discussions etc. etc., but for illustration purposes).
> >
> > I think it raises the question (and likely many other) of whether we can
> > break the existing 'ABI' and change behavior for missing ISV
> > retrospectively for legacy user space when the issue has occurred?
> >
> > Someone might have written code that reacts to the -ENOSYS, so I've
> > taken the conservative approach for this for the time being.
> >
> >
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 8a37c8e89777..19a92c49039c 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -76,6 +76,14 @@ struct kvm_arch {
> >
> > /* Mandated version of PSCI */
> > u32 psci_version;
> > +
> > + /*
> > + * If we encounter a data abort without valid instruction syndrome
> > + * information, report this to user space. User space can (and
> > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> > + * supported.
> > + */
> > + bool return_nisv_io_abort_to_user;
> > };
> >
> > #define KVM_NR_MEM_OBJS 40
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index f656169db8c3..019bc560edc1 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -83,6 +83,14 @@ struct kvm_arch {
> >
> > /* Mandated version of PSCI */
> > u32 psci_version;
> > +
> > + /*
> > + * If we encounter a data abort without valid instruction syndrome
> > + * information, report this to user space. User space can (and
> > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> > + * supported.
> > + */
> > + bool return_nisv_io_abort_to_user;
> > };
> >
> > #define KVM_NR_MEM_OBJS 40
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 5e3f12d5359e..a4dd004d0db9 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
> > #define KVM_EXIT_S390_STSI 25
> > #define KVM_EXIT_IOAPIC_EOI 26
> > #define KVM_EXIT_HYPERV 27
> > +#define KVM_EXIT_ARM_NISV 28
> >
> > /* For KVM_EXIT_INTERNAL_ERROR */
> > /* Emulate instruction failed. */
> > @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
> > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
> > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
> > #define KVM_CAP_PMU_EVENT_FILTER 173
> > +#define KVM_CAP_ARM_NISV_TO_USER 174
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
> >
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 35a069815baf..2ce94bd9d4a9 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
> > return 0;
> > }
> >
> > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > + struct kvm_enable_cap *cap)
> > +{
> > + int r;
> > +
> > + if (cap->flags)
> > + return -EINVAL;
> > +
> > + switch (cap->cap) {
> > + case KVM_CAP_ARM_NISV_TO_USER:
> > + r = 0;
> > + kvm->arch.return_nisv_io_abort_to_user = true;
> > + break;
> > + default:
> > + r = -EINVAL;
> > + break;
> > + }
> > +
> > + return r;
> > +}
> >
> > /**
> > * kvm_arch_init_vm - initializes a VM data structure
> > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_MP_STATE:
> > case KVM_CAP_IMMEDIATE_EXIT:
> > case KVM_CAP_VCPU_EVENTS:
> > + case KVM_CAP_ARM_NISV_TO_USER:
> > r = 1;
> > break;
> > case KVM_CAP_ARM_SET_DEVICE_ADDR:
> > @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > ret = kvm_handle_mmio_return(vcpu, vcpu->run);
> > if (ret)
> > return ret;
> > + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
> > + kvm_inject_undefined(vcpu);
>
> Just to make sure I understand: Is the expectation here that userspace
> could clear the exit reason if it managed to handle the exit? And
> otherwise we'd inject an UNDEF on reentry?
>

Yes, but I think we should change that to an external abort. I'll test
something and send a proper patch with more clear documentation.

> > }
> >
> > if (run->immediate_exit)
> > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> > index 6af5c91337f2..62e6ef47a6de 100644
> > --- a/virt/kvm/arm/mmio.c
> > +++ b/virt/kvm/arm/mmio.c
> > @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > if (ret)
> > return ret;
> > } else {
> > - kvm_err("load/store instruction decoding not implemented\n");
> > - return -ENOSYS;
> > + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
> > + run->exit_reason = KVM_EXIT_ARM_NISV;
> > + run->mmio.phys_addr = fault_ipa;
>
> We could also record whether that's a read or a write (WnR should still
> be valid). Actually, we could store a sanitized version of the ESR.
>

Ah yes, I'll incorporate that.

> > + vcpu->stat.mmio_exit_user++;
> > + return 0;
> > + } else {
> > + kvm_info("encountered data abort without syndrome info\n");
>
> My only issue with this is that the previous message has been sort of
> documented...

Well, my main gripe with the current code is that the error message is
massively misleading because it explains one possible case, which is
very "kernel part of a KVM VM centric" and is actually not the common
scenario that people encounter.

Let me work on the particular wording of the error message and see if I
can achieve something self-documenting.


Thanks,

Christoffer

2019-09-06 17:51:10

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded



On 06.09.19 10:00, Christoffer Dall wrote:
> On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote:
>> On 05/09/2019 10:22, Christoffer Dall wrote:
>>> On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:
>>>> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
>>>>>
>>>>> On Thu, 05 Sep 2019 09:16:54 +0100,
>>>>> Peter Maydell <[email protected]> wrote:
>>>>>> This is true, but the problem is that barfing out to userspace
>>>>>> makes it harder to debug the guest because it means that
>>>>>> the VM is immediately destroyed, whereas AIUI if we
>>>>>> inject some kind of exception then (assuming you're set up
>>>>>> to do kernel-debug via gdbstub) you can actually examine
>>>>>> the offending guest code with a debugger because at least
>>>>>> your VM is still around to inspect...
>>>>>
>>>>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get
>>>>> an exception, but the instruction that caused it may be completely
>>>>> legal (store with post-increment, for example), leading to an even
>>>>> more puzzled developer (that exception should never have been
>>>>> delivered the first place).
>>>>
>>>> Right, but the combination of "host kernel prints a message
>>>> about an unsupported load/store insn" and "within-guest debug
>>>> dump/stack trace/etc" is much more useful than just having
>>>> "host kernel prints message" and "QEMU exits"; and it requires
>>>> about 3 lines of code change...
>>>>
>>>>> I'm far more in favour of dumping the state of the access in the run
>>>>> structure (much like we do for a MMIO access) and let userspace do
>>>>> something about it (such as dumping information on the console or
>>>>> breaking). It could even inject an exception *if* the user has asked
>>>>> for it.
>>>>
>>>> ...whereas this requires agreement on a kernel-userspace API,
>>>> larger changes in the kernel, somebody to implement the userspace
>>>> side of things, and the user to update both the kernel and QEMU.
>>>> It's hard for me to see that the benefit here over the 3-line
>>>> approach really outweighs the extra effort needed. In practice
>>>> saying "we should do this" is saying "we're going to do nothing",
>>>> based on the historical record.
>>>>
>>>
>>> How about something like the following (completely untested, liable for
>>> ABI discussions etc. etc., but for illustration purposes).
>>>
>>> I think it raises the question (and likely many other) of whether we can
>>> break the existing 'ABI' and change behavior for missing ISV
>>> retrospectively for legacy user space when the issue has occurred?
>>>
>>> Someone might have written code that reacts to the -ENOSYS, so I've
>>> taken the conservative approach for this for the time being.
>>>
>>>
>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>> index 8a37c8e89777..19a92c49039c 100644
>>> --- a/arch/arm/include/asm/kvm_host.h
>>> +++ b/arch/arm/include/asm/kvm_host.h
>>> @@ -76,6 +76,14 @@ struct kvm_arch {
>>>
>>> /* Mandated version of PSCI */
>>> u32 psci_version;
>>> +
>>> + /*
>>> + * If we encounter a data abort without valid instruction syndrome
>>> + * information, report this to user space. User space can (and
>>> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
>>> + * supported.
>>> + */
>>> + bool return_nisv_io_abort_to_user;
>>> };
>>>
>>> #define KVM_NR_MEM_OBJS 40
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index f656169db8c3..019bc560edc1 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -83,6 +83,14 @@ struct kvm_arch {
>>>
>>> /* Mandated version of PSCI */
>>> u32 psci_version;
>>> +
>>> + /*
>>> + * If we encounter a data abort without valid instruction syndrome
>>> + * information, report this to user space. User space can (and
>>> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
>>> + * supported.
>>> + */
>>> + bool return_nisv_io_abort_to_user;
>>> };
>>>
>>> #define KVM_NR_MEM_OBJS 40
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 5e3f12d5359e..a4dd004d0db9 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
>>> #define KVM_EXIT_S390_STSI 25
>>> #define KVM_EXIT_IOAPIC_EOI 26
>>> #define KVM_EXIT_HYPERV 27
>>> +#define KVM_EXIT_ARM_NISV 28
>>>
>>> /* For KVM_EXIT_INTERNAL_ERROR */
>>> /* Emulate instruction failed. */
>>> @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
>>> #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
>>> #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
>>> #define KVM_CAP_PMU_EVENT_FILTER 173
>>> +#define KVM_CAP_ARM_NISV_TO_USER 174
>>>
>>> #ifdef KVM_CAP_IRQ_ROUTING
>>>
>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>> index 35a069815baf..2ce94bd9d4a9 100644
>>> --- a/virt/kvm/arm/arm.c
>>> +++ b/virt/kvm/arm/arm.c
>>> @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
>>> return 0;
>>> }
>>>
>>> +int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>> + struct kvm_enable_cap *cap)
>>> +{
>>> + int r;
>>> +
>>> + if (cap->flags)
>>> + return -EINVAL;
>>> +
>>> + switch (cap->cap) {
>>> + case KVM_CAP_ARM_NISV_TO_USER:
>>> + r = 0;
>>> + kvm->arch.return_nisv_io_abort_to_user = true;
>>> + break;
>>> + default:
>>> + r = -EINVAL;
>>> + break;
>>> + }
>>> +
>>> + return r;
>>> +}
>>>
>>> /**
>>> * kvm_arch_init_vm - initializes a VM data structure
>>> @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>> case KVM_CAP_MP_STATE:
>>> case KVM_CAP_IMMEDIATE_EXIT:
>>> case KVM_CAP_VCPU_EVENTS:
>>> + case KVM_CAP_ARM_NISV_TO_USER:
>>> r = 1;
>>> break;
>>> case KVM_CAP_ARM_SET_DEVICE_ADDR:
>>> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>> ret = kvm_handle_mmio_return(vcpu, vcpu->run);
>>> if (ret)
>>> return ret;
>>> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
>>> + kvm_inject_undefined(vcpu);
>>
>> Just to make sure I understand: Is the expectation here that userspace
>> could clear the exit reason if it managed to handle the exit? And
>> otherwise we'd inject an UNDEF on reentry?
>>
>
> Yes, but I think we should change that to an external abort. I'll test
> something and send a proper patch with more clear documentation.

Why not leave the injection to user space in any case? API wise there is
no need to be backwards compatible, as we require the CAP to be enabled,
right?

IMHO it should be 100% a policy decision in user space whether to
emulate and what type of exception to inject, if anything.


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


2019-09-06 18:26:09

by Marc Zyngier

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On 06/09/2019 13:08, Alexander Graf wrote:
>
>
> On 06.09.19 10:00, Christoffer Dall wrote:
>> On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote:

[...]

>>>> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>> ret = kvm_handle_mmio_return(vcpu, vcpu->run);
>>>> if (ret)
>>>> return ret;
>>>> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
>>>> + kvm_inject_undefined(vcpu);
>>>
>>> Just to make sure I understand: Is the expectation here that userspace
>>> could clear the exit reason if it managed to handle the exit? And
>>> otherwise we'd inject an UNDEF on reentry?
>>>
>>
>> Yes, but I think we should change that to an external abort. I'll test
>> something and send a proper patch with more clear documentation.
>
> Why not leave the injection to user space in any case? API wise there is
> no need to be backwards compatible, as we require the CAP to be enabled,
> right?
>
> IMHO it should be 100% a policy decision in user space whether to
> emulate and what type of exception to inject, if anything.

The exception has to be something that the trapped instruction can
actually generate. An UNDEF is definitely wrong, as the guest would have
otherwise UNDEF'd at EL1, and KVM would have never seen it. You cannot
deviate from the rule of architecture, and userspace feels like the
wrong place to enforce it.

M.
--
Jazz is not dead, it just smells funny...

2019-09-06 18:46:12

by Alexander Graf

[permalink] [raw]
Subject: Re: [UNVERIFIED SENDER] Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded



On 06.09.19 14:34, Marc Zyngier wrote:
> On 06/09/2019 13:08, Alexander Graf wrote:
>>
>>
>> On 06.09.19 10:00, Christoffer Dall wrote:
>>> On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote:
>
> [...]
>
>>>>> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>> ret = kvm_handle_mmio_return(vcpu, vcpu->run);
>>>>> if (ret)
>>>>> return ret;
>>>>> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
>>>>> + kvm_inject_undefined(vcpu);
>>>>
>>>> Just to make sure I understand: Is the expectation here that userspace
>>>> could clear the exit reason if it managed to handle the exit? And
>>>> otherwise we'd inject an UNDEF on reentry?
>>>>
>>>
>>> Yes, but I think we should change that to an external abort. I'll test
>>> something and send a proper patch with more clear documentation.
>>
>> Why not leave the injection to user space in any case? API wise there is
>> no need to be backwards compatible, as we require the CAP to be enabled,
>> right?
>>
>> IMHO it should be 100% a policy decision in user space whether to
>> emulate and what type of exception to inject, if anything.
>
> The exception has to be something that the trapped instruction can
> actually generate. An UNDEF is definitely wrong, as the guest would have
> otherwise UNDEF'd at EL1, and KVM would have never seen it. You cannot
> deviate from the rule of architecture, and userspace feels like the
> wrong place to enforce it.

There are multiple viable options user space has:

1) Trigger an external abort
2) Emulate the instruction in user space
3) Inject a PV mechanism into the guest to emulate the insn inside
guest space

Why should we treat 1) any different from 2) or 3)? Why is there a "fast
path" for the external abort, on an exit that is not performance
critical or has any other reason to get special attention from kernel
space. All we're doing is add more code in a privileged layer for a case
that realistically should never occur in the first place.


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


2019-09-06 18:48:41

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded



On 06.09.19 15:12, Christoffer Dall wrote:
> On Fri, Sep 06, 2019 at 02:08:15PM +0200, Alexander Graf wrote:
>>
>>
>> On 06.09.19 10:00, Christoffer Dall wrote:
>>> On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote:
>>>> On 05/09/2019 10:22, Christoffer Dall wrote:
>>>>> On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:
>>>>>> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
>>>>>>>
>>>>>>> On Thu, 05 Sep 2019 09:16:54 +0100,
>>>>>>> Peter Maydell <[email protected]> wrote:
>>>>>>>> This is true, but the problem is that barfing out to userspace
>>>>>>>> makes it harder to debug the guest because it means that
>>>>>>>> the VM is immediately destroyed, whereas AIUI if we
>>>>>>>> inject some kind of exception then (assuming you're set up
>>>>>>>> to do kernel-debug via gdbstub) you can actually examine
>>>>>>>> the offending guest code with a debugger because at least
>>>>>>>> your VM is still around to inspect...
>>>>>>>
>>>>>>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get
>>>>>>> an exception, but the instruction that caused it may be completely
>>>>>>> legal (store with post-increment, for example), leading to an even
>>>>>>> more puzzled developer (that exception should never have been
>>>>>>> delivered the first place).
>>>>>>
>>>>>> Right, but the combination of "host kernel prints a message
>>>>>> about an unsupported load/store insn" and "within-guest debug
>>>>>> dump/stack trace/etc" is much more useful than just having
>>>>>> "host kernel prints message" and "QEMU exits"; and it requires
>>>>>> about 3 lines of code change...
>>>>>>
>>>>>>> I'm far more in favour of dumping the state of the access in the run
>>>>>>> structure (much like we do for a MMIO access) and let userspace do
>>>>>>> something about it (such as dumping information on the console or
>>>>>>> breaking). It could even inject an exception *if* the user has asked
>>>>>>> for it.
>>>>>>
>>>>>> ...whereas this requires agreement on a kernel-userspace API,
>>>>>> larger changes in the kernel, somebody to implement the userspace
>>>>>> side of things, and the user to update both the kernel and QEMU.
>>>>>> It's hard for me to see that the benefit here over the 3-line
>>>>>> approach really outweighs the extra effort needed. In practice
>>>>>> saying "we should do this" is saying "we're going to do nothing",
>>>>>> based on the historical record.
>>>>>>
>>>>>
>>>>> How about something like the following (completely untested, liable for
>>>>> ABI discussions etc. etc., but for illustration purposes).
>>>>>
>>>>> I think it raises the question (and likely many other) of whether we can
>>>>> break the existing 'ABI' and change behavior for missing ISV
>>>>> retrospectively for legacy user space when the issue has occurred?
>>>>> Someone might have written code that reacts to the -ENOSYS, so I've
>>>>> taken the conservative approach for this for the time being.
>>>>>
>>>>>
>>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>>>> index 8a37c8e89777..19a92c49039c 100644
>>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>>> @@ -76,6 +76,14 @@ struct kvm_arch {
>>>>> /* Mandated version of PSCI */
>>>>> u32 psci_version;
>>>>> +
>>>>> + /*
>>>>> + * If we encounter a data abort without valid instruction syndrome
>>>>> + * information, report this to user space. User space can (and
>>>>> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
>>>>> + * supported.
>>>>> + */
>>>>> + bool return_nisv_io_abort_to_user;
>>>>> };
>>>>> #define KVM_NR_MEM_OBJS 40
>>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>>> index f656169db8c3..019bc560edc1 100644
>>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>>> @@ -83,6 +83,14 @@ struct kvm_arch {
>>>>> /* Mandated version of PSCI */
>>>>> u32 psci_version;
>>>>> +
>>>>> + /*
>>>>> + * If we encounter a data abort without valid instruction syndrome
>>>>> + * information, report this to user space. User space can (and
>>>>> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
>>>>> + * supported.
>>>>> + */
>>>>> + bool return_nisv_io_abort_to_user;
>>>>> };
>>>>> #define KVM_NR_MEM_OBJS 40
>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>> index 5e3f12d5359e..a4dd004d0db9 100644
>>>>> --- a/include/uapi/linux/kvm.h
>>>>> +++ b/include/uapi/linux/kvm.h
>>>>> @@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
>>>>> #define KVM_EXIT_S390_STSI 25
>>>>> #define KVM_EXIT_IOAPIC_EOI 26
>>>>> #define KVM_EXIT_HYPERV 27
>>>>> +#define KVM_EXIT_ARM_NISV 28
>>>>> /* For KVM_EXIT_INTERNAL_ERROR */
>>>>> /* Emulate instruction failed. */
>>>>> @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
>>>>> #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
>>>>> #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
>>>>> #define KVM_CAP_PMU_EVENT_FILTER 173
>>>>> +#define KVM_CAP_ARM_NISV_TO_USER 174
>>>>> #ifdef KVM_CAP_IRQ_ROUTING
>>>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>>>> index 35a069815baf..2ce94bd9d4a9 100644
>>>>> --- a/virt/kvm/arm/arm.c
>>>>> +++ b/virt/kvm/arm/arm.c
>>>>> @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
>>>>> return 0;
>>>>> }
>>>>> +int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>>>> + struct kvm_enable_cap *cap)
>>>>> +{
>>>>> + int r;
>>>>> +
>>>>> + if (cap->flags)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + switch (cap->cap) {
>>>>> + case KVM_CAP_ARM_NISV_TO_USER:
>>>>> + r = 0;
>>>>> + kvm->arch.return_nisv_io_abort_to_user = true;
>>>>> + break;
>>>>> + default:
>>>>> + r = -EINVAL;
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> + return r;
>>>>> +}
>>>>> /**
>>>>> * kvm_arch_init_vm - initializes a VM data structure
>>>>> @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>> case KVM_CAP_MP_STATE:
>>>>> case KVM_CAP_IMMEDIATE_EXIT:
>>>>> case KVM_CAP_VCPU_EVENTS:
>>>>> + case KVM_CAP_ARM_NISV_TO_USER:
>>>>> r = 1;
>>>>> break;
>>>>> case KVM_CAP_ARM_SET_DEVICE_ADDR:
>>>>> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>> ret = kvm_handle_mmio_return(vcpu, vcpu->run);
>>>>> if (ret)
>>>>> return ret;
>>>>> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
>>>>> + kvm_inject_undefined(vcpu);
>>>>
>>>> Just to make sure I understand: Is the expectation here that userspace
>>>> could clear the exit reason if it managed to handle the exit? And
>>>> otherwise we'd inject an UNDEF on reentry?
>>>>
>>>
>>> Yes, but I think we should change that to an external abort. I'll test
>>> something and send a proper patch with more clear documentation.
>>
>> Why not leave the injection to user space in any case? API wise there is no
>> need to be backwards compatible, as we require the CAP to be enabled, right?
>>
>
> I'd prefer leaving it to userspace to worry about, but I thought Peter
> said that had been problematic historically, which I took at face value,
> but I could have misunderstood.
>
> If QEMU, kvmtool, and whatever the crazy^H cool kids are using in
> userspace these days are happy emulating the exception, then that's a
> viable approach. The main concern I have with that is whether they'll
> all get it right, and since we already have the code in the kernel to do
> this, it might make sense to re-use the kernel logic for it.

You could make the same argument about injecting an #SError on an out of
bounds access to MMIO.

If injecting a fault is too complicated, we should fix that rather than
create an unbalanced user space interface :).

> I'll leave it in for v1 of the patch, and if based on how that code and
> interface looks like, we agree it's better to leave it to userspace, I
> can remove it in v2.

Sure, works for me :). Please CC me on v1 so I can comment on it ;)


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


2019-09-06 18:48:42

by Peter Maydell

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Fri, 6 Sep 2019 at 14:13, Christoffer Dall <[email protected]> wrote:
> I'd prefer leaving it to userspace to worry about, but I thought Peter
> said that had been problematic historically, which I took at face value,
> but I could have misunderstood.
>
> If QEMU, kvmtool, and whatever the crazy^H cool kids are using in
> userspace these days are happy emulating the exception, then that's a
> viable approach. The main concern I have with that is whether they'll
> all get it right, and since we already have the code in the kernel to do
> this, it might make sense to re-use the kernel logic for it.

Well, for QEMU we've had code that in theory might do this but
in practice it's never been tested. Essentially the problem is
that nobody ever wants to inject an exception from userspace
except in incredibly rare cases like "trying to use h/w breakpoints
simultaneously inside the VM and also to debug the VM from outside"
or "we're running on RAS hardware that just told us that the VM's
RAM was faulty". There's no even vaguely commonly-used usecase
for it today; and this ABI suggestion adds another "this is in
practice almost never going to happen" case to the pile. The
codepath is unreliable in QEMU because (a) it requires getting
syncing of sysreg values to and from the kernel right -- this is
about the only situation where userspace wants to modify sysregs
during execution of the VM, as opposed to just reading them -- and
(b) we try to reuse the code we already have that does TCG exception
injection, which might or might not be a design mistake, and
(c) as noted above it's a never-actually-used untested codepath...

So I think if I were you I wouldn't commit to the kernel ABI until
somebody had at least written some RFC-quality patches for QEMU and
tested that they work and the ABI is OK in that sense. (For the
avoidance of doubt, I'm not volunteering to do that myself.)
I don't object to the idea in principle, though.

PS: the other "injecting exceptions to the guest" oddity is that
if the kernel *does* find the ISV information and returns to userspace
for it to handle the MMIO, there's no way for userspace to say
"actually that address is supposed to generate a data abort".

thanks
-- PMM

2019-09-06 18:48:42

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Fri, Sep 06, 2019 at 02:08:15PM +0200, Alexander Graf wrote:
>
>
> On 06.09.19 10:00, Christoffer Dall wrote:
> > On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote:
> > > On 05/09/2019 10:22, Christoffer Dall wrote:
> > > > On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:
> > > > > On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, 05 Sep 2019 09:16:54 +0100,
> > > > > > Peter Maydell <[email protected]> wrote:
> > > > > > > This is true, but the problem is that barfing out to userspace
> > > > > > > makes it harder to debug the guest because it means that
> > > > > > > the VM is immediately destroyed, whereas AIUI if we
> > > > > > > inject some kind of exception then (assuming you're set up
> > > > > > > to do kernel-debug via gdbstub) you can actually examine
> > > > > > > the offending guest code with a debugger because at least
> > > > > > > your VM is still around to inspect...
> > > > > >
> > > > > > To Christoffer's point, I find the benefit a bit dubious. Yes, you get
> > > > > > an exception, but the instruction that caused it may be completely
> > > > > > legal (store with post-increment, for example), leading to an even
> > > > > > more puzzled developer (that exception should never have been
> > > > > > delivered the first place).
> > > > >
> > > > > Right, but the combination of "host kernel prints a message
> > > > > about an unsupported load/store insn" and "within-guest debug
> > > > > dump/stack trace/etc" is much more useful than just having
> > > > > "host kernel prints message" and "QEMU exits"; and it requires
> > > > > about 3 lines of code change...
> > > > >
> > > > > > I'm far more in favour of dumping the state of the access in the run
> > > > > > structure (much like we do for a MMIO access) and let userspace do
> > > > > > something about it (such as dumping information on the console or
> > > > > > breaking). It could even inject an exception *if* the user has asked
> > > > > > for it.
> > > > >
> > > > > ...whereas this requires agreement on a kernel-userspace API,
> > > > > larger changes in the kernel, somebody to implement the userspace
> > > > > side of things, and the user to update both the kernel and QEMU.
> > > > > It's hard for me to see that the benefit here over the 3-line
> > > > > approach really outweighs the extra effort needed. In practice
> > > > > saying "we should do this" is saying "we're going to do nothing",
> > > > > based on the historical record.
> > > > >
> > > >
> > > > How about something like the following (completely untested, liable for
> > > > ABI discussions etc. etc., but for illustration purposes).
> > > >
> > > > I think it raises the question (and likely many other) of whether we can
> > > > break the existing 'ABI' and change behavior for missing ISV
> > > > retrospectively for legacy user space when the issue has occurred?
> > > > Someone might have written code that reacts to the -ENOSYS, so I've
> > > > taken the conservative approach for this for the time being.
> > > >
> > > >
> > > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > > > index 8a37c8e89777..19a92c49039c 100644
> > > > --- a/arch/arm/include/asm/kvm_host.h
> > > > +++ b/arch/arm/include/asm/kvm_host.h
> > > > @@ -76,6 +76,14 @@ struct kvm_arch {
> > > > /* Mandated version of PSCI */
> > > > u32 psci_version;
> > > > +
> > > > + /*
> > > > + * If we encounter a data abort without valid instruction syndrome
> > > > + * information, report this to user space. User space can (and
> > > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> > > > + * supported.
> > > > + */
> > > > + bool return_nisv_io_abort_to_user;
> > > > };
> > > > #define KVM_NR_MEM_OBJS 40
> > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > index f656169db8c3..019bc560edc1 100644
> > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > @@ -83,6 +83,14 @@ struct kvm_arch {
> > > > /* Mandated version of PSCI */
> > > > u32 psci_version;
> > > > +
> > > > + /*
> > > > + * If we encounter a data abort without valid instruction syndrome
> > > > + * information, report this to user space. User space can (and
> > > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> > > > + * supported.
> > > > + */
> > > > + bool return_nisv_io_abort_to_user;
> > > > };
> > > > #define KVM_NR_MEM_OBJS 40
> > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > > index 5e3f12d5359e..a4dd004d0db9 100644
> > > > --- a/include/uapi/linux/kvm.h
> > > > +++ b/include/uapi/linux/kvm.h
> > > > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
> > > > #define KVM_EXIT_S390_STSI 25
> > > > #define KVM_EXIT_IOAPIC_EOI 26
> > > > #define KVM_EXIT_HYPERV 27
> > > > +#define KVM_EXIT_ARM_NISV 28
> > > > /* For KVM_EXIT_INTERNAL_ERROR */
> > > > /* Emulate instruction failed. */
> > > > @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
> > > > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
> > > > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
> > > > #define KVM_CAP_PMU_EVENT_FILTER 173
> > > > +#define KVM_CAP_ARM_NISV_TO_USER 174
> > > > #ifdef KVM_CAP_IRQ_ROUTING
> > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > > index 35a069815baf..2ce94bd9d4a9 100644
> > > > --- a/virt/kvm/arm/arm.c
> > > > +++ b/virt/kvm/arm/arm.c
> > > > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
> > > > return 0;
> > > > }
> > > > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > > > + struct kvm_enable_cap *cap)
> > > > +{
> > > > + int r;
> > > > +
> > > > + if (cap->flags)
> > > > + return -EINVAL;
> > > > +
> > > > + switch (cap->cap) {
> > > > + case KVM_CAP_ARM_NISV_TO_USER:
> > > > + r = 0;
> > > > + kvm->arch.return_nisv_io_abort_to_user = true;
> > > > + break;
> > > > + default:
> > > > + r = -EINVAL;
> > > > + break;
> > > > + }
> > > > +
> > > > + return r;
> > > > +}
> > > > /**
> > > > * kvm_arch_init_vm - initializes a VM data structure
> > > > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > > > case KVM_CAP_MP_STATE:
> > > > case KVM_CAP_IMMEDIATE_EXIT:
> > > > case KVM_CAP_VCPU_EVENTS:
> > > > + case KVM_CAP_ARM_NISV_TO_USER:
> > > > r = 1;
> > > > break;
> > > > case KVM_CAP_ARM_SET_DEVICE_ADDR:
> > > > @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > > ret = kvm_handle_mmio_return(vcpu, vcpu->run);
> > > > if (ret)
> > > > return ret;
> > > > + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
> > > > + kvm_inject_undefined(vcpu);
> > >
> > > Just to make sure I understand: Is the expectation here that userspace
> > > could clear the exit reason if it managed to handle the exit? And
> > > otherwise we'd inject an UNDEF on reentry?
> > >
> >
> > Yes, but I think we should change that to an external abort. I'll test
> > something and send a proper patch with more clear documentation.
>
> Why not leave the injection to user space in any case? API wise there is no
> need to be backwards compatible, as we require the CAP to be enabled, right?
>

I'd prefer leaving it to userspace to worry about, but I thought Peter
said that had been problematic historically, which I took at face value,
but I could have misunderstood.

If QEMU, kvmtool, and whatever the crazy^H cool kids are using in
userspace these days are happy emulating the exception, then that's a
viable approach. The main concern I have with that is whether they'll
all get it right, and since we already have the code in the kernel to do
this, it might make sense to re-use the kernel logic for it.

I'll leave it in for v1 of the patch, and if based on how that code and
interface looks like, we agree it's better to leave it to userspace, I
can remove it in v2.


Thanks,

Christoffer

2019-09-06 18:50:03

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded



On 06.09.19 15:31, Peter Maydell wrote:
> On Fri, 6 Sep 2019 at 14:13, Christoffer Dall <[email protected]> wrote:
>> I'd prefer leaving it to userspace to worry about, but I thought Peter
>> said that had been problematic historically, which I took at face value,
>> but I could have misunderstood.
>>
>> If QEMU, kvmtool, and whatever the crazy^H cool kids are using in
>> userspace these days are happy emulating the exception, then that's a
>> viable approach. The main concern I have with that is whether they'll
>> all get it right, and since we already have the code in the kernel to do
>> this, it might make sense to re-use the kernel logic for it.
>
> Well, for QEMU we've had code that in theory might do this but
> in practice it's never been tested. Essentially the problem is
> that nobody ever wants to inject an exception from userspace
> except in incredibly rare cases like "trying to use h/w breakpoints
> simultaneously inside the VM and also to debug the VM from outside"
> or "we're running on RAS hardware that just told us that the VM's
> RAM was faulty". There's no even vaguely commonly-used usecase
> for it today; and this ABI suggestion adds another "this is in
> practice almost never going to happen" case to the pile. The
> codepath is unreliable in QEMU because (a) it requires getting
> syncing of sysreg values to and from the kernel right -- this is
> about the only situation where userspace wants to modify sysregs
> during execution of the VM, as opposed to just reading them -- and
> (b) we try to reuse the code we already have that does TCG exception
> injection, which might or might not be a design mistake, and

That's probably a design mistake, correct :)

> (c) as noted above it's a never-actually-used untested codepath...

Sounds like an easy thing to resolve using kvm-unit-tests?

> So I think if I were you I wouldn't commit to the kernel ABI until
> somebody had at least written some RFC-quality patches for QEMU and
> tested that they work and the ABI is OK in that sense. (For the
> avoidance of doubt, I'm not volunteering to do that myself.)
> I don't object to the idea in principle, though.
>
> PS: the other "injecting exceptions to the guest" oddity is that
> if the kernel *does* find the ISV information and returns to userspace
> for it to handle the MMIO, there's no way for userspace to say
> "actually that address is supposed to generate a data abort".

I think we're converging here. My proposal is that "inject a fault"
should not be something special cased for the "I can't decode the
instruction" case, but rather that we need a more generic mechanism.

Whether that's a new ioctl, a flag we set on entry or something else is
an implementation detail I'll be happy to leave for discussion.

The only thing I'd like to avoid seeing is that we create a new user
space ABI that makes it easy to inject a single, particular exception,
but not solve all of the other cases while creating extra work to just
implement instruction decoding in user space.


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


2019-09-06 18:52:15

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Fri, Sep 06, 2019 at 02:31:42PM +0100, Peter Maydell wrote:
> On Fri, 6 Sep 2019 at 14:13, Christoffer Dall <[email protected]> wrote:
> > I'd prefer leaving it to userspace to worry about, but I thought Peter
> > said that had been problematic historically, which I took at face value,
> > but I could have misunderstood.
> >
> > If QEMU, kvmtool, and whatever the crazy^H cool kids are using in
> > userspace these days are happy emulating the exception, then that's a
> > viable approach. The main concern I have with that is whether they'll
> > all get it right, and since we already have the code in the kernel to do
> > this, it might make sense to re-use the kernel logic for it.
>
> Well, for QEMU we've had code that in theory might do this but
> in practice it's never been tested. Essentially the problem is
> that nobody ever wants to inject an exception from userspace
> except in incredibly rare cases like "trying to use h/w breakpoints
> simultaneously inside the VM and also to debug the VM from outside"
> or "we're running on RAS hardware that just told us that the VM's
> RAM was faulty". There's no even vaguely commonly-used usecase
> for it today; and this ABI suggestion adds another "this is in
> practice almost never going to happen" case to the pile. The
> codepath is unreliable in QEMU because (a) it requires getting
> syncing of sysreg values to and from the kernel right -- this is
> about the only situation where userspace wants to modify sysregs
> during execution of the VM, as opposed to just reading them -- and
> (b) we try to reuse the code we already have that does TCG exception
> injection, which might or might not be a design mistake, and
> (c) as noted above it's a never-actually-used untested codepath...
>
> So I think if I were you I wouldn't commit to the kernel ABI until
> somebody had at least written some RFC-quality patches for QEMU and
> tested that they work and the ABI is OK in that sense. (For the
> avoidance of doubt, I'm not volunteering to do that myself.)
> I don't object to the idea in principle, though.
>
> PS: the other "injecting exceptions to the guest" oddity is that
> if the kernel *does* find the ISV information and returns to userspace
> for it to handle the MMIO, there's no way for userspace to say
> "actually that address is supposed to generate a data abort".
>

That's a good point. A synchronous interface with a separate mechanism
to ask the kernel to inject an exception might be a good solution, if
there's an elegant way to do the latter. I'll have a look at that.

Thanks,

Christoffer

2019-09-06 18:56:46

by Peter Maydell

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Fri, 6 Sep 2019 at 14:41, Alexander Graf <[email protected]> wrote:
> On 06.09.19 15:31, Peter Maydell wrote:
> > (b) we try to reuse the code we already have that does TCG exception
> > injection, which might or might not be a design mistake, and
>
> That's probably a design mistake, correct :)

Well, conceptually it's not necessarily a bad idea, because
in both cases what we're doing is "change the system register
state (PC, ESR_EL1, ELR_EL1 etc) so that the CPU looks like
it's just taken an exception"; but some of what the
TCG code needs to do isn't necessary for KVM and all of it
was not written with the idea of KVM in mind at all...

thanks
-- PMM

2019-09-06 19:15:42

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded



On 06.09.19 15:50, Peter Maydell wrote:
> On Fri, 6 Sep 2019 at 14:41, Alexander Graf <[email protected]> wrote:
>> On 06.09.19 15:31, Peter Maydell wrote:
>>> (b) we try to reuse the code we already have that does TCG exception
>>> injection, which might or might not be a design mistake, and
>>
>> That's probably a design mistake, correct :)
>
> Well, conceptually it's not necessarily a bad idea, because
> in both cases what we're doing is "change the system register
> state (PC, ESR_EL1, ELR_EL1 etc) so that the CPU looks like
> it's just taken an exception"; but some of what the
> TCG code needs to do isn't necessary for KVM and all of it
> was not written with the idea of KVM in mind at all...

Yes, and it probably makes sense to model the QEMU internal API
similarly, so that exception generating code does not have to distinguish.

However, it's much easier for KVM to ensure exception prioritization as
well as internal state synchronization. Conceptually, as you've seen, it
really doesn't make a lot of sense to pull register state from KVM,
wiggle it and then push it down when KVM has awareness of the same
transformation anyway.

So I guess the path is clear: Create a generic interface for exception
injection and a separate patch similar to what Christoffer already
posted with the new CAP to route illegal MMIO traps into user space.

Connecting the two dots in user space really should be trivial then.

(famous last words)


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879