Hi,
I am writing firmware in Rust to support SEV, based on the td-shim project[1].
But when I create a SEV VM (plain SEV, no SEV-ES and no SEV-SNP) with this firmware,
the Linux kernel crashes because the int3 instruction in int3_selftest() causes a
#UD.
The stack trace is as follows:
[ 0.141804] invalid opcode: 0000 [#1] PREEMPT SMP
[ 0.141804] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0+ #37
[ 0.141804] RIP: 0010:int3_selftest_ip+0x0/0x2a
[ 0.141804] Code: eb bc 66 90 0f 1f 44 00 00 48 83 ec 08 48 c7 c7 90 0d 78 83
c7 44 24 04 00 00 00 00 e8 23 fe ac fd 85 c0 75 22 48 8d 7c 24 04 <cc>
90 90 90 90 83 7c 24 04 01 75 13 48 c7 c7 90 0d 78 83 e8 42 fc
[ 0.141804] RSP: 0000:ffffffff82803f18 EFLAGS: 00010246
[ 0.141804] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000007ffffffe
[ 0.141804] RDX: ffffffff82fd4938 RSI: 0000000000000296 RDI: ffffffff82803f1c
[ 0.141804] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000fffeffff
[ 0.141804] R10: ffffffff82803e08 R11: ffffffff82f615a8 R12: 00000000ff062350
[ 0.141804] R13: 000000001fddc20a R14: 000000000090122c R15: 0000000002000000
[ 0.141804] FS: 0000000000000000(0000) GS:ffff88801f200000(0000) knlGS:0000000000000000
[ 0.141804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.141804] CR2: ffff888004c00000 CR3: 000800000281f000 CR4: 00000000003506f0
[ 0.141804] Call Trace:
[ 0.141804] <TASK>
[ 0.141804] alternative_instructions+0xe/0x100
[ 0.141804] check_bugs+0xa7/0x110
[ 0.141804] start_kernel+0x320/0x430
[ 0.141804] secondary_startup_64_no_verify+0xd3/0xdb
[ 0.141804] </TASK>
[ 0.141804] Modules linked in:
[ 0.141804] ---[ end trace 0000000000000000 ]---
I then tried to narrow down the problem by running some tests with QEMU and OVMF under SEV.
The behaviour is also odd when I create a SEV VM with QEMU and OVMF.
An int3 instruction always generates a #UD if I place it before
gen_pool_create() in mce_gen_pool_create(). But if I place it after
gen_pool_create(), it correctly generates a #BP.
// linux/arch/x86/kernel/cpu/mce/genpool.c
static int mce_gen_pool_create(void)
{
        struct gen_pool *tmpp;
        int ret = -ENOMEM;

        asm volatile ("int3\n\t"); // an int3 placed here generates a #UD
        tmpp = gen_pool_create(ilog2(sizeof(struct mce_evt_list)), -1);
        asm volatile ("int3\n\t"); // an int3 placed here generates a #BP
        ...
}
The stack is as follows when I put the int3 instruction before gen_pool_create().
[ 0.094846] invalid opcode: 0000 [#1] PREEMPT SMP
[ 0.094994] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0+ #101
[ 0.094994] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[ 0.094994] RIP: 0010:mcheck_cpu_init+0x4e/0x150
[ 0.094994] Code: 84 c0 0f 89 97 00 00 00 48 8b 45 28 f6 c4 40 0f 84 8a 00 00 00 e8 c2
e6 ff ff 48 89 ef e8 8a e2 ff ff 85 c0 0f 88 94 00 00 00 <cc> e8 dc 05 00 00 85 c0 75 76
80 0d a1 90 0a 02 20 0f b6 45 01 3c
[ 0.094994] RSP: 0000:ffffffff92803ef8 EFLAGS: 00010246
[ 0.094994] RAX: 0000000000000000 RBX: 0000000000000058 RCX: 00000000ffffffff
[ 0.094994] RDX: 0000000000000002 RSI: 00000000000000ff RDI: ffffffff930ed860
[ 0.094994] RBP: ffffffff930ed860 R08: 0000000000000000 R09: 0000000000000000
[ 0.094994] R10: 0000000000000000 R11: 0000000000000254 R12: 0000000000000207
[ 0.094994] R13: 000000001f9ec018 R14: 000000001fe85928 R15: 0000000000000001
[ 0.094994] FS: 0000000000000000(0000) GS:ffff8ae0dca00000(0000) knlGS:0000000000000000
[ 0.094994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.094994] CR2: ffff8ae0dac01000 CR3: 000800001881f000 CR4: 00000000003506f0
[ 0.094994] Call Trace:
[ 0.094994] <TASK>
[ 0.094994] identify_cpu+0x2cb/0x500
[ 0.094994] identify_boot_cpu+0x10/0xb0
[ 0.094994] check_bugs+0xf/0x110
[ 0.094994] start_kernel+0x320/0x430
[ 0.094994] secondary_startup_64_no_verify+0xd3/0xdb
[ 0.094994] </TASK>
[ 0.094994] Modules linked in:
[ 0.094995] ---[ end trace 0000000000000000 ]---
The stack is as follows when I put the int3 instruction after gen_pool_create().
[ 0.095585] int3: 0000 [#1] PREEMPT SMP
[ 0.095590] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0+ #101
[ 0.095593] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[ 0.095594] RIP: 0010:mcheck_cpu_init+0x4f/0x150
[ 0.095597] Code: c0 0f 89 97 00 00 00 48 8b 45 28 f6 c4 40 0f 84 8a 00 00 00 e8 c2 e6
ff ff 48 89 ef e8 8a e2 ff ff 85 c0 0f 88 94 00 00 00 cc <e8> dc 05 00 00 85 c0 75 76 80
0d a1 90 0a 02 20 0f b6 45 01 3c 02
[ 0.095598] RSP: 0000:ffffffff86803ef8 EFLAGS: 00000246
[ 0.095599] RAX: 0000000000000000 RBX: 0000000000000058 RCX: 00000000ffffffff
[ 0.095600] RDX: 0000000000000002 RSI: 00000000000000ff RDI: ffffffff870ed860
[ 0.095601] RBP: ffffffff870ed860 R08: 0000000000000000 R09: 0000000000000000
[ 0.095601] R10: 0000000000000000 R11: 0000000000000254 R12: 0000000000000207
[ 0.095602] R13: 000000001f9ec018 R14: 000000001fe85928 R15: 0000000000000001
[ 0.095604] FS: 0000000000000000(0000) GS:ffff901e5ca00000(0000) knlGS:0000000000000000
[ 0.095605] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.095606] CR2: ffff901e5ac01000 CR3: 000800001881f000 CR4: 00000000003506f0
[ 0.095606] Call Trace:
[ 0.095611] <TASK>
[ 0.095612] identify_cpu+0x2cb/0x500
[ 0.095615] identify_boot_cpu+0x10/0xb0
[ 0.095618] check_bugs+0xf/0x110
[ 0.095620] start_kernel+0x320/0x430
[ 0.095622] secondary_startup_64_no_verify+0xd3/0xdb
[ 0.095625] </TASK>
[ 0.095625] Modules linked in:
[ 0.096577] ---[ end trace 0000000000000000 ]---
BTW, if I create a normal VM without SEV using QEMU and OVMF, the int3 instruction always generates a
#BP.
So I am confused about the behaviour of the int3 instruction. Could anyone help explain it?
Any suggestions are appreciated!
[1] https://github.com/confidential-containers/td-shim
Thanks,
Wu Zongyo
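
A side note on reading the two QEMU/OVMF traces above: the byte wrapped in <..> on the "Code:" line is the byte at the reported RIP. In the #UD trace RIP is mcheck_cpu_init+0x4e and the marked byte is cc (the int3 itself), which fits a fault reported at the faulting instruction. In the #BP trace RIP is mcheck_cpu_init+0x4f and the marked byte is e8, which fits a trap reported after the int3 has completed. Below is a minimal C sketch for pulling the marked byte out of such a line. The names marked_byte() and check_examples() are hypothetical helpers for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the value of the byte wrapped in <..> on an oops "Code:" line,
 * or -1 if the line has no well-formed marker.
 */
int marked_byte(const char *code_line)
{
    const char *open = strchr(code_line, '<');
    char *end;
    long value;

    if (!open)
        return -1;

    /* Parse the hex digits that follow '<'; they must be closed by '>'. */
    value = strtol(open + 1, &end, 16);
    if (end == open + 1 || *end != '>' || value < 0 || value > 0xff)
        return -1;

    return (int)value;
}

/* Excerpts of the two "Code:" lines from the QEMU/OVMF traces. */
void check_examples(void)
{
    /* #UD trace: RIP points at the int3 opcode itself. */
    assert(marked_byte("85 c0 0f 88 94 00 00 00 <cc> e8 dc 05 00 00") == 0xcc);
    /* #BP trace: RIP points at the call that follows the int3. */
    assert(marked_byte("85 c0 0f 88 94 00 00 00 cc <e8> dc 05 00 00") == 0xe8);
}
```

Calling check_examples() from a small main() is enough to confirm the reading of both traces.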
On 7/31/23 09:30, Sean Christopherson wrote:
> On Sat, Jul 29, 2023, wuzongyong wrote:
>> Hi,
>> I am writing a firmware in Rust to support SEV based on project td-shim[1].
>> But when I create a SEV VM (just SEV, no SEV-ES and no SEV-SNP) with the firmware,
>> the linux kernel crashed because the int3 instruction in int3_selftest() cause a
>> #UD.
>
> ...
>
>> BTW, if a create a normal VM without SEV by qemu & OVMF, the int3 instruction always generates a
>> #BP.
>> So I am confused now about the behaviour of int3 instruction, could anyone help to explain the behaviour?
>> Any suggestion is appreciated!
>
> Have you tried my suggestions from the other thread[*]?
>
> : > > I'm curious how this happend. I cannot find any condition that would
> : > > cause the int3 instruction generate a #UD according to the AMD's spec.
> :
> : One possibility is that the value from memory that gets executed diverges from the
> : value that is read out be the #UD handler, e.g. due to patching (doesn't seem to
> : be the case in this test), stale cache/tlb entries, etc.
> :
> : > > BTW, it worked nomarlly with qemu and ovmf.
> : >
> : > Does this happen every time you boot the guest with your firmware? What
> : > processor are you running on?
> :
> : And have you ruled out KVM as the culprit? I.e. verified that KVM is NOT injecting
> : a #UD. That obviously shouldn't happen, but it should be easy to check via KVM
> : tracepoints.
I have a feeling that KVM is injecting the #UD, but it will take
instrumenting KVM to see which path the #UD is being injected from.
Wu Zongyo, can you add some instrumentation to figure that out if the
trace points towards KVM injecting the #UD?
Thanks,
Tom
>
> [*] https://lore.kernel.org/all/[email protected]
On 2023/7/31 23:03, Tom Lendacky wrote:
> On 7/31/23 09:30, Sean Christopherson wrote:
>> On Sat, Jul 29, 2023, wuzongyong wrote:
>>> Hi,
>>> I am writing a firmware in Rust to support SEV based on project td-shim[1].
>>> But when I create a SEV VM (just SEV, no SEV-ES and no SEV-SNP) with the firmware,
>>> the linux kernel crashed because the int3 instruction in int3_selftest() cause a
>>> #UD.
>>
>> ...
>>
>>> BTW, if a create a normal VM without SEV by qemu & OVMF, the int3 instruction always generates a
>>> #BP.
>>> So I am confused now about the behaviour of int3 instruction, could anyone help to explain the behaviour?
>>> Any suggestion is appreciated!
>>
>> Have you tried my suggestions from the other thread[*]?
First, I'm sorry for sending multiple mails with the same content. I thought the mails I sent previously
had not been delivered.
Let's discuss the problem here.
>>
>> : > > I'm curious how this happend. I cannot find any condition that would
>> : > > cause the int3 instruction generate a #UD according to the AMD's spec.
>> :
>> : One possibility is that the value from memory that gets executed diverges from the
>> : value that is read out be the #UD handler, e.g. due to patching (doesn't seem to
>> : be the case in this test), stale cache/tlb entries, etc.
>> :
>> : > > BTW, it worked nomarlly with qemu and ovmf.
>> : >
>> : > Does this happen every time you boot the guest with your firmware? What
>> : > processor are you running on?
>> :
Yes, every time.
The processor I used is EPYC 7T83.
>> : And have you ruled out KVM as the culprit? I.e. verified that KVM is NOT injecting
>> : a #UD. That obviously shouldn't happen, but it should be easy to check via KVM
>> : tracepoints.
>
> I have a feeling that KVM is injecting the #UD, but it will take instrumenting KVM to see which path the #UD is being injected from.
>
> Wu Zongyo, can you add some instrumentation to figure that out if the trace points towards KVM injecting the #UD?
Ok, I will try to do that.
>
> Thanks,
> Tom
>
>>
>> [*] https://lore.kernel.org/all/[email protected]
On Sat, Jul 29, 2023, wuzongyong wrote:
> Hi,
> I am writing a firmware in Rust to support SEV based on project td-shim[1].
> But when I create a SEV VM (just SEV, no SEV-ES and no SEV-SNP) with the firmware,
> the linux kernel crashed because the int3 instruction in int3_selftest() cause a
> #UD.
...
> BTW, if a create a normal VM without SEV by qemu & OVMF, the int3 instruction always generates a
> #BP.
> So I am confused now about the behaviour of int3 instruction, could anyone help to explain the behaviour?
> Any suggestion is appreciated!
Have you tried my suggestions from the other thread[*]?
: > > I'm curious how this happend. I cannot find any condition that would
: > > cause the int3 instruction generate a #UD according to the AMD's spec.
:
: One possibility is that the value from memory that gets executed diverges from the
: value that is read out be the #UD handler, e.g. due to patching (doesn't seem to
: be the case in this test), stale cache/tlb entries, etc.
:
: > > BTW, it worked nomarlly with qemu and ovmf.
: >
: > Does this happen every time you boot the guest with your firmware? What
: > processor are you running on?
:
: And have you ruled out KVM as the culprit? I.e. verified that KVM is NOT injecting
: a #UD. That obviously shouldn't happen, but it should be easy to check via KVM
: tracepoints.
[*] https://lore.kernel.org/all/[email protected]
On Mon, Jul 31, 2023 at 11:45:29PM +0800, wuzongyong wrote:
>
> On 2023/7/31 23:03, Tom Lendacky wrote:
> > On 7/31/23 09:30, Sean Christopherson wrote:
> >> On Sat, Jul 29, 2023, wuzongyong wrote:
> >>> Hi,
> >>> I am writing a firmware in Rust to support SEV based on project td-shim[1].
> >>> But when I create a SEV VM (just SEV, no SEV-ES and no SEV-SNP) with the firmware,
> >>> the linux kernel crashed because the int3 instruction in int3_selftest() cause a
> >>> #UD.
> >>
> >> ...
> >>
> >>> BTW, if a create a normal VM without SEV by qemu & OVMF, the int3 instruction always generates a
> >>> #BP.
> >>> So I am confused now about the behaviour of int3 instruction, could anyone help to explain the behaviour?
> >>> Any suggestion is appreciated!
> >>
> >> Have you tried my suggestions from the other thread[*]?
> Firstly, I'm sorry for sending muliple mails with the same content. I thought the mails I sent previously
> didn't be sent successfully.
> And let's talk the problem here.
> >>
> >> : > > I'm curious how this happend. I cannot find any condition that would
> >> : > > cause the int3 instruction generate a #UD according to the AMD's spec.
> >> :
> >> : One possibility is that the value from memory that gets executed diverges from the
> >> : value that is read out be the #UD handler, e.g. due to patching (doesn't seem to
> >> : be the case in this test), stale cache/tlb entries, etc.
> >> :
> >> : > > BTW, it worked nomarlly with qemu and ovmf.
> >> : >
> >> : > Does this happen every time you boot the guest with your firmware? What
> >> : > processor are you running on?
> >> :
> Yes, every time.
> The processor I used is EPYC 7T83.
> >> : And have you ruled out KVM as the culprit? I.e. verified that KVM is NOT injecting
> >> : a #UD. That obviously shouldn't happen, but it should be easy to check via KVM
> >> : tracepoints.
> >
> > I have a feeling that KVM is injecting the #UD, but it will take instrumenting KVM to see which path the #UD is being injected from.
> >
> > Wu Zongyo, can you add some instrumentation to figure that out if the trace points towards KVM injecting the #UD?
> Ok, I will try to do that.
You're right. The #UD is injected by KVM.
The path I found is:
svm_vcpu_run
svm_complete_interrupts
kvm_requeue_exception // vector = 3
kvm_make_request

vcpu_enter_guest
kvm_check_and_inject_events
svm_inject_exception
svm_update_soft_interrupt_rip
__svm_skip_emulated_instruction
x86_emulate_instruction
svm_can_emulate_instruction
kvm_queue_exception(vcpu, UD_VECTOR)

Does this mean a #PF intercept occurs when the guest tries to deliver a
#BP through the IDT? But why?
Thanks
> >
> > Thanks,
> > Tom
> >
> >>
> >> [*] https://lore.kernel.org/all/[email protected]
On Wed, Aug 02, 2023, Wu Zongyo wrote:
> On Mon, Jul 31, 2023 at 11:45:29PM +0800, wuzongyong wrote:
> >
> > On 2023/7/31 23:03, Tom Lendacky wrote:
> > > On 7/31/23 09:30, Sean Christopherson wrote:
> > >> On Sat, Jul 29, 2023, wuzongyong wrote:
> > >>> Hi,
> > >>> I am writing a firmware in Rust to support SEV based on project td-shim[1].
> > >>> But when I create a SEV VM (just SEV, no SEV-ES and no SEV-SNP) with the firmware,
> > >>> the linux kernel crashed because the int3 instruction in int3_selftest() cause a
> > >>> #UD.
> > >>
> > >> ...
> > >>
> > >>> BTW, if a create a normal VM without SEV by qemu & OVMF, the int3 instruction always generates a
> > >>> #BP.
> > >>> So I am confused now about the behaviour of int3 instruction, could anyone help to explain the behaviour?
> > >>> Any suggestion is appreciated!
> > >>
> > >> Have you tried my suggestions from the other thread[*]?
> > Firstly, I'm sorry for sending muliple mails with the same content. I thought the mails I sent previously
> > didn't be sent successfully.
> > And let's talk the problem here.
> > >>
> > >> : > > I'm curious how this happend. I cannot find any condition that would
> > >> : > > cause the int3 instruction generate a #UD according to the AMD's spec.
> > >> :
> > >> : One possibility is that the value from memory that gets executed diverges from the
> > >> : value that is read out be the #UD handler, e.g. due to patching (doesn't seem to
> > >> : be the case in this test), stale cache/tlb entries, etc.
> > >> :
> > >> : > > BTW, it worked nomarlly with qemu and ovmf.
> > >> : >
> > >> : > Does this happen every time you boot the guest with your firmware? What
> > >> : > processor are you running on?
> > >> :
> > Yes, every time.
> > The processor I used is EPYC 7T83.
> > >> : And have you ruled out KVM as the culprit? I.e. verified that KVM is NOT injecting
> > >> : a #UD. That obviously shouldn't happen, but it should be easy to check via KVM
> > >> : tracepoints.
> > >
> > > I have a feeling that KVM is injecting the #UD, but it will take instrumenting KVM to see which path the #UD is being injected from.
> > >
> > > Wu Zongyo, can you add some instrumentation to figure that out if the trace points towards KVM injecting the #UD?
> > Ok, I will try to do that.
> You're right. The #UD is injected by KVM.
>
> The path I found is:
> svm_vcpu_run
> svm_complete_interrupts
> kvm_requeue_exception // vector = 3
> kvm_make_request
>
> vcpu_enter_guest
> kvm_check_and_inject_events
> svm_inject_exception
> svm_update_soft_interrupt_rip
> __svm_skip_emulated_instruction
> x86_emulate_instruction
> svm_can_emulate_instruction
> kvm_queue_exception(vcpu, UD_VECTOR)
>
> Does this mean a #PF intercept occur when the guest try to deliver a
> #BP through the IDT? But why?
I doubt it's a #PF. A #NPF is much more likely, though it could be something
else entirely, but I'm pretty sure that would require bugs in both the host and
guest.
What is the last exit recorded by trace_kvm_exit() before the #UD is injected?
On 8/2/23 09:25, Tom Lendacky wrote:
> On 8/2/23 09:01, Sean Christopherson wrote:
>> On Wed, Aug 02, 2023, Wu Zongyo wrote:
>>> On Mon, Jul 31, 2023 at 11:45:29PM +0800, wuzongyong wrote:
>>>>
>>>> On 2023/7/31 23:03, Tom Lendacky wrote:
>>>>> On 7/31/23 09:30, Sean Christopherson wrote:
>>>>>> On Sat, Jul 29, 2023, wuzongyong wrote:
>>>>>>> Hi,
>>>>>>> I am writing a firmware in Rust to support SEV based on project
>>>>>>> td-shim[1].
>>>>>>> But when I create a SEV VM (just SEV, no SEV-ES and no SEV-SNP)
>>>>>>> with the firmware,
>>>>>>> the linux kernel crashed because the int3 instruction in
>>>>>>> int3_selftest() cause a
>>>>>>> #UD.
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>> BTW, if a create a normal VM without SEV by qemu & OVMF, the int3
>>>>>>> instruction always generates a
>>>>>>> #BP.
>>>>>>> So I am confused now about the behaviour of int3 instruction, could
>>>>>>> anyone help to explain the behaviour?
>>>>>>> Any suggestion is appreciated!
>>>>>>
>>>>>> Have you tried my suggestions from the other thread[*]?
>>>> Firstly, I'm sorry for sending muliple mails with the same content. I
>>>> thought the mails I sent previously
>>>> didn't be sent successfully.
>>>> And let's talk the problem here.
>>>>>>
>>>>>> : > > I'm curious how this happend. I cannot find any condition
>>>>>> that would
>>>>>> : > > cause the int3 instruction generate a #UD according to the
>>>>>> AMD's spec.
>>>>>> :
>>>>>> : One possibility is that the value from memory that gets
>>>>>> executed diverges from the
>>>>>> : value that is read out be the #UD handler, e.g. due to
>>>>>> patching (doesn't seem to
>>>>>> : be the case in this test), stale cache/tlb entries, etc.
>>>>>> :
>>>>>> : > > BTW, it worked nomarlly with qemu and ovmf.
>>>>>> : >
>>>>>> : > Does this happen every time you boot the guest with your
>>>>>> firmware? What
>>>>>> : > processor are you running on?
>>>>>> :
>>>> Yes, every time.
>>>> The processor I used is EPYC 7T83.
>>>>>> : And have you ruled out KVM as the culprit? I.e. verified that
>>>>>> KVM is NOT injecting
>>>>>> : a #UD. That obviously shouldn't happen, but it should be easy
>>>>>> to check via KVM
>>>>>> : tracepoints.
>>>>>
>>>>> I have a feeling that KVM is injecting the #UD, but it will take
>>>>> instrumenting KVM to see which path the #UD is being injected from.
>>>>>
>>>>> Wu Zongyo, can you add some instrumentation to figure that out if the
>>>>> trace points towards KVM injecting the #UD?
>>>> Ok, I will try to do that.
>>> You're right. The #UD is injected by KVM.
>>>
>>> The path I found is:
>>> svm_vcpu_run
>>> svm_complete_interrupts
>>> kvm_requeue_exception // vector = 3
>>> kvm_make_request
>>>
>>> vcpu_enter_guest
>>> kvm_check_and_inject_events
>>> svm_inject_exception
>>> svm_update_soft_interrupt_rip
>>> __svm_skip_emulated_instruction
>>> x86_emulate_instruction
>>> svm_can_emulate_instruction
>>> kvm_queue_exception(vcpu, UD_VECTOR)
>>>
>>> Does this mean a #PF intercept occur when the guest try to deliver a
>>> #BP through the IDT? But why?
>>
>> I doubt it's a #PF. A #NPF is much more likely, though it could be
>> something
>> else entirely, but I'm pretty sure that would require bugs in both the
>> host and
>> guest.
>>
>> What is the last exit recorded by trace_kvm_exit() before the #UD is
>> injected?
>
> I'm guessing it was a #NPF, too. Could it be related to the changes that
> went in around svm_update_soft_interrupt_rip()?
>
> 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the
> instruction")
Sorry, that should have been:
7e5b5ef8dca3 ("KVM: SVM: Re-inject INTn instead of retrying the insn on "failure"")
>
> Before this the !nrips check would prevent the call into
> svm_skip_emulated_instruction(). But now, there is a call to:
>
> svm_update_soft_interrupt_rip()
> __svm_skip_emulated_instruction()
> kvm_emulate_instruction()
> x86_emulate_instruction() (passed a NULL insn pointer)
> kvm_can_emulate_insn() (passed a NULL insn pointer)
> svm_can_emulate_instruction() (passed NULL insn pointer)
>
> Because it is an SEV guest, it ends up in the "if (unlikely(!insn))" path
> and injects the #UD.
>
> Thanks,
> Tom
>
On Wed, Aug 02, 2023, Tom Lendacky wrote:
> On 8/2/23 09:25, Tom Lendacky wrote:
> > On 8/2/23 09:01, Sean Christopherson wrote:
> > > > You're right. The #UD is injected by KVM.
> > > >
> > > > The path I found is:
> > > > svm_vcpu_run
> > > > svm_complete_interrupts
> > > > kvm_requeue_exception // vector = 3
> > > > kvm_make_request
> > > >
> > > > vcpu_enter_guest
> > > > kvm_check_and_inject_events
> > > > svm_inject_exception
> > > > svm_update_soft_interrupt_rip
> > > > __svm_skip_emulated_instruction
> > > > x86_emulate_instruction
> > > > svm_can_emulate_instruction
> > > > kvm_queue_exception(vcpu, UD_VECTOR)
> > > >
> > > > Does this mean a #PF intercept occur when the guest try to deliver a
> > > > #BP through the IDT? But why?
> > >
> > > I doubt it's a #PF. A #NPF is much more likely, though it could be
> > > something
> > > else entirely, but I'm pretty sure that would require bugs in both
> > > the host and
> > > guest.
> > >
> > > What is the last exit recorded by trace_kvm_exit() before the #UD is
> > > injected?
> >
> > I'm guessing it was a #NPF, too. Could it be related to the changes that
> > went in around svm_update_soft_interrupt_rip()?
> >
> > 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the
> > instruction")
>
> Sorry, that should have been:
>
> 7e5b5ef8dca3 ("KVM: SVM: Re-inject INTn instead of retrying the insn on "failure"")
>
> >
> > Before this the !nrips check would prevent the call into
> > svm_skip_emulated_instruction(). But now, there is a call to:
> >
> > svm_update_soft_interrupt_rip()
> > __svm_skip_emulated_instruction()
> > kvm_emulate_instruction()
> > x86_emulate_instruction() (passed a NULL insn pointer)
> > kvm_can_emulate_insn() (passed a NULL insn pointer)
> > svm_can_emulate_instruction() (passed NULL insn pointer)
> >
> > Because it is an SEV guest, it ends up in the "if (unlikely(!insn))" path
> > and injects the #UD.
Yeah, my money is on that too. I believe this is the least awful solution:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d381ad424554..2eace114a934 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -385,6 +385,9 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
         }
 
         if (!svm->next_rip) {
+                if (sev_guest(vcpu->kvm))
+                        return 0;
+
                 if (unlikely(!commit_side_effects))
                         old_rflags = svm->vmcb->save.rflags;

I'll send a formal patch (with a comment) if that solves the problem.
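
To make the effect of the proposed early return easier to see, here is a small standalone C model. It is a sketch under stated assumptions, not KVM code: struct vcpu_model, emulate_skip() and skip_insn() are hypothetical names, next_rip == 0 stands for "no next RIP available", and the emulator fallback is reduced to "cannot decode the instruction of an SEV guest, so a #UD gets queued", which is the behaviour reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the vCPU state discussed here. */
struct vcpu_model {
    bool sev_guest;     /* guest memory is encrypted */
    uint64_t rip;
    uint64_t next_rip;  /* 0 models "no next RIP available" */
    bool ud_queued;     /* set when the model would queue a #UD */
};

/* Models the emulator fallback, which needs readable guest code bytes. */
static int emulate_skip(struct vcpu_model *v, unsigned int insn_len)
{
    if (v->sev_guest) {
        v->ud_queued = true;    /* the failure mode seen in the traces */
        return 0;
    }
    v->rip += insn_len;
    return 1;
}

/* Returns 1 if the instruction was skipped, 0 otherwise. */
int skip_insn(struct vcpu_model *v, unsigned int insn_len, bool with_fix)
{
    if (v->next_rip) {
        v->rip = v->next_rip;   /* fast path: use the provided next RIP */
        return 1;
    }
    if (with_fix && v->sev_guest)
        return 0;               /* proposed early return: no emulation, no #UD */
    return emulate_skip(v, insn_len);
}
```

In this model an SEV vCPU with next_rip == 0 ends up with ud_queued set when with_fix is false, and is left untouched (return value 0, RIP unchanged, no #UD) when with_fix is true. What the caller then does with the 0 return is outside the scope of the sketch.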
Side topic, KVM should require nrips for SEV and beyond, I don't see how SEV can
possibly work if KVM doesn't utilize nrips. E.g. this
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2eace114a934..43e500503d48 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5111,9 +5111,11 @@ static __init int svm_hardware_setup(void)
 
         svm_adjust_mmio_mask();
 
+        nrips = nrips && boot_cpu_has(X86_FEATURE_NRIPS);
+
         /*
          * Note, SEV setup consumes npt_enabled and enable_mmio_caching (which
-         * may be modified by svm_adjust_mmio_mask()).
+         * may be modified by svm_adjust_mmio_mask()), as well as nrips.
          */
         sev_hardware_setup();
 
@@ -5125,11 +5127,6 @@ static __init int svm_hardware_setup(void)
                         goto err;
         }
 
-        if (nrips) {
-                if (!boot_cpu_has(X86_FEATURE_NRIPS))
-                        nrips = false;
-        }
-
         enable_apicv = avic = avic && avic_hardware_setup();
 
         if (!enable_apicv) {
On 8/2/23 09:01, Sean Christopherson wrote:
> On Wed, Aug 02, 2023, Wu Zongyo wrote:
>> On Mon, Jul 31, 2023 at 11:45:29PM +0800, wuzongyong wrote:
>>>
>>> On 2023/7/31 23:03, Tom Lendacky wrote:
>>>> On 7/31/23 09:30, Sean Christopherson wrote:
>>>>> On Sat, Jul 29, 2023, wuzongyong wrote:
>>>>>> Hi,
>>>>>> I am writing a firmware in Rust to support SEV based on project td-shim[1].
>>>>>> But when I create a SEV VM (just SEV, no SEV-ES and no SEV-SNP) with the firmware,
>>>>>> the linux kernel crashed because the int3 instruction in int3_selftest() cause a
>>>>>> #UD.
>>>>>
>>>>> ...
>>>>>
>>>>>> BTW, if a create a normal VM without SEV by qemu & OVMF, the int3 instruction always generates a
>>>>>> #BP.
>>>>>> So I am confused now about the behaviour of int3 instruction, could anyone help to explain the behaviour?
>>>>>> Any suggestion is appreciated!
>>>>>
>>>>> Have you tried my suggestions from the other thread[*]?
>>> Firstly, I'm sorry for sending muliple mails with the same content. I thought the mails I sent previously
>>> didn't be sent successfully.
>>> And let's talk the problem here.
>>>>>
>>>>> : > > I'm curious how this happend. I cannot find any condition that would
>>>>> : > > cause the int3 instruction generate a #UD according to the AMD's spec.
>>>>> :
>>>>> : One possibility is that the value from memory that gets executed diverges from the
>>>>> : value that is read out be the #UD handler, e.g. due to patching (doesn't seem to
>>>>> : be the case in this test), stale cache/tlb entries, etc.
>>>>> :
>>>>> : > > BTW, it worked nomarlly with qemu and ovmf.
>>>>> : >
>>>>> : > Does this happen every time you boot the guest with your firmware? What
>>>>> : > processor are you running on?
>>>>> :
>>> Yes, every time.
>>> The processor I used is EPYC 7T83.
>>>>> : And have you ruled out KVM as the culprit? I.e. verified that KVM is NOT injecting
>>>>> : a #UD. That obviously shouldn't happen, but it should be easy to check via KVM
>>>>> : tracepoints.
>>>>
>>>> I have a feeling that KVM is injecting the #UD, but it will take instrumenting KVM to see which path the #UD is being injected from.
>>>>
>>>> Wu Zongyo, can you add some instrumentation to figure that out if the trace points towards KVM injecting the #UD?
>>> Ok, I will try to do that.
>> You're right. The #UD is injected by KVM.
>>
>> The path I found is:
>> svm_vcpu_run
>> svm_complete_interrupts
>> kvm_requeue_exception // vector = 3
>> kvm_make_request
>>
>> vcpu_enter_guest
>> kvm_check_and_inject_events
>> svm_inject_exception
>> svm_update_soft_interrupt_rip
>> __svm_skip_emulated_instruction
>> x86_emulate_instruction
>> svm_can_emulate_instruction
>> kvm_queue_exception(vcpu, UD_VECTOR)
>>
>> Does this mean a #PF intercept occur when the guest try to deliver a
>> #BP through the IDT? But why?
>
> I doubt it's a #PF. A #NPF is much more likely, though it could be something
> else entirely, but I'm pretty sure that would require bugs in both the host and
> guest.
>
> What is the last exit recorded by trace_kvm_exit() before the #UD is injected?
I'm guessing it was a #NPF, too. Could it be related to the changes that
went in around svm_update_soft_interrupt_rip()?
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
Before this the !nrips check would prevent the call into
svm_skip_emulated_instruction(). But now, there is a call to:
svm_update_soft_interrupt_rip()
__svm_skip_emulated_instruction()
kvm_emulate_instruction()
x86_emulate_instruction() (passed a NULL insn pointer)
kvm_can_emulate_insn() (passed a NULL insn pointer)
svm_can_emulate_instruction() (passed NULL insn pointer)
Because it is an SEV guest, it ends up in the "if (unlikely(!insn))" path
and injects the #UD.
Thanks,
Tom
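
The decision at the end of that call chain can be modelled in a few lines of C. This is only a sketch of the logic as described above, not the kernel function: can_emulate() and enum emul_verdict are hypothetical names, and the real check has more cases than the two shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum emul_verdict {
    EMUL_OK,            /* emulation may proceed */
    EMUL_REFUSED_UD     /* emulation refused and a #UD is queued */
};

/*
 * insn == NULL models "the caller did not supply instruction bytes",
 * which is the case on the skip path described above.
 */
enum emul_verdict can_emulate(bool sev_guest, const unsigned char *insn)
{
    /* Without SEV the host can read and decode guest code itself. */
    if (!sev_guest)
        return EMUL_OK;

    /* SEV guest and no bytes supplied: nothing to decode. */
    if (insn == NULL)
        return EMUL_REFUSED_UD;

    return EMUL_OK;
}

/* The combination from this thread: SEV guest, NULL insn pointer. */
void check_sev_null_insn(void)
{
    assert(can_emulate(true, NULL) == EMUL_REFUSED_UD);
}
```

The same NULL pointer is harmless for a non-SEV guest in this model, which matches the observation that a normal VM always sees the #BP.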
On Wed, Aug 02, 2023, Tom Lendacky wrote:
> On 8/2/23 10:04, Sean Christopherson wrote:
> > Side topic, KVM should require nrips for SEV and beyond, I don't see how SEV can
> > possibly work if KVM doesn't utilize nrips. E.g. this
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 2eace114a934..43e500503d48 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -5111,9 +5111,11 @@ static __init int svm_hardware_setup(void)
> > svm_adjust_mmio_mask();
> > + nrips = nrips && boot_cpu_has(X86_FEATURE_NRIPS);
> > +
> > /*
> > * Note, SEV setup consumes npt_enabled and enable_mmio_caching (which
> > - * may be modified by svm_adjust_mmio_mask()).
> > + * may be modified by svm_adjust_mmio_mask()), as well as nrips.
> > */
> > sev_hardware_setup();
>
> You moved the setting of nrips up, I'm assuming you then want to add a check
> in sev_hardware_setup() for nrips?
Doh. I like to think I would have noticed that I forgot to add that check before
posting a patch, but I give myself 50/50 odds at best.
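
For illustration only, the ordering being discussed can be modelled as follows. The nrips gate on SEV is the check Tom asks about; it does not appear in any diff in this thread, so its exact form here is an assumption, and struct hw_model, struct hw_result and hardware_setup() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs: module parameters and a CPU feature bit. */
struct hw_model {
    bool nrips_param;    /* the nrips module parameter */
    bool cpu_has_nrips;  /* stands in for boot_cpu_has(X86_FEATURE_NRIPS) */
    bool sev_param;      /* SEV requested and otherwise supported */
};

struct hw_result {
    bool nrips;
    bool sev_enabled;
};

struct hw_result hardware_setup(struct hw_model hw)
{
    struct hw_result r;

    /* Resolve nrips first, as in the diff above... */
    r.nrips = hw.nrips_param && hw.cpu_has_nrips;

    /* ...so that SEV setup can consume it (assumed form of the check). */
    r.sev_enabled = hw.sev_param && r.nrips;

    return r;
}

/* With nrips=0 on the command line, SEV would end up disabled. */
void check_nrips_off_disables_sev(void)
{
    struct hw_model hw = { .nrips_param = false, .cpu_has_nrips = true, .sev_param = true };
    struct hw_result r = hardware_setup(hw);

    assert(!r.nrips && !r.sev_enabled);
}
```

The point of the model is only the ordering: if nrips were resolved after SEV setup, the second assignment would see the raw module parameter rather than the final value.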
On 8/2/23 10:04, Sean Christopherson wrote:
> On Wed, Aug 02, 2023, Tom Lendacky wrote:
>> On 8/2/23 09:25, Tom Lendacky wrote:
>>> On 8/2/23 09:01, Sean Christopherson wrote:
>>>>> You're right. The #UD is injected by KVM.
>>>>>
>>>>> The path I found is:
>>>>> svm_vcpu_run
>>>>> svm_complete_interrupts
>>>>> kvm_requeue_exception // vector = 3
>>>>> kvm_make_request
>>>>>
>>>>> vcpu_enter_guest
>>>>> kvm_check_and_inject_events
>>>>> svm_inject_exception
>>>>> svm_update_soft_interrupt_rip
>>>>> __svm_skip_emulated_instruction
>>>>> x86_emulate_instruction
>>>>> svm_can_emulate_instruction
>>>>> kvm_queue_exception(vcpu, UD_VECTOR)
>>>>>
>>>>> Does this mean a #PF intercept occur when the guest try to deliver a
>>>>> #BP through the IDT? But why?
>>>>
>>>> I doubt it's a #PF. A #NPF is much more likely, though it could be
>>>> something
>>>> else entirely, but I'm pretty sure that would require bugs in both
>>>> the host and
>>>> guest.
>>>>
>>>> What is the last exit recorded by trace_kvm_exit() before the #UD is
>>>> injected?
>>>
>>> I'm guessing it was a #NPF, too. Could it be related to the changes that
>>> went in around svm_update_soft_interrupt_rip()?
>>>
>>> 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the
>>> instruction")
>>
>> Sorry, that should have been:
>>
>> 7e5b5ef8dca3 ("KVM: SVM: Re-inject INTn instead of retrying the insn on "failure"")
>>
>>>
>>> Before this the !nrips check would prevent the call into
>>> svm_skip_emulated_instruction(). But now, there is a call to:
>>>
>>> svm_update_soft_interrupt_rip()
>>> __svm_skip_emulated_instruction()
>>> kvm_emulate_instruction()
>>> x86_emulate_instruction() (passed a NULL insn pointer)
>>> kvm_can_emulate_insn() (passed a NULL insn pointer)
>>> svm_can_emulate_instruction() (passed NULL insn pointer)
>>>
>>> Because it is an SEV guest, it ends up in the "if (unlikely(!insn))" path
>>> and injects the #UD.
>
> Yeah, my money is on that too. I believe this is the least awful solution:
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d381ad424554..2eace114a934 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -385,6 +385,9 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
> }
>
> if (!svm->next_rip) {
> + if (sev_guest(vcpu->kvm))
> + return 0;
> +
> if (unlikely(!commit_side_effects))
> old_rflags = svm->vmcb->save.rflags;
>
> I'll send a formal patch (with a comment) if that solves the problem.
>
>
> Side topic, KVM should require nrips for SEV and beyond, I don't see how SEV can
> possibly work if KVM doesn't utilize nrips. E.g. this
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 2eace114a934..43e500503d48 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5111,9 +5111,11 @@ static __init int svm_hardware_setup(void)
>
> svm_adjust_mmio_mask();
>
> + nrips = nrips && boot_cpu_has(X86_FEATURE_NRIPS);
> +
> /*
> * Note, SEV setup consumes npt_enabled and enable_mmio_caching (which
> - * may be modified by svm_adjust_mmio_mask()).
> + * may be modified by svm_adjust_mmio_mask()), as well as nrips.
> */
> sev_hardware_setup();

You moved the setting of nrips up. I'm assuming you then want to add a
check in sev_hardware_setup() for nrips?

Thanks,
Tom

>
> @@ -5125,11 +5127,6 @@ static __init int svm_hardware_setup(void)
> goto err;
> }
>
> - if (nrips) {
> - if (!boot_cpu_has(X86_FEATURE_NRIPS))
> - nrips = false;
> - }
> -
> enable_apicv = avic = avic && avic_hardware_setup();
>
> if (!enable_apicv) {
>
On 8/2/23 09:33, Tom Lendacky wrote:
> On 8/2/23 09:25, Tom Lendacky wrote:
>> On 8/2/23 09:01, Sean Christopherson wrote:
>>> On Wed, Aug 02, 2023, Wu Zongyo wrote:
>>>> On Mon, Jul 31, 2023 at 11:45:29PM +0800, wuzongyong wrote:
>>>>>
>>>>> On 2023/7/31 23:03, Tom Lendacky wrote:
>>>>>> On 7/31/23 09:30, Sean Christopherson wrote:
>>>>>>> On Sat, Jul 29, 2023, wuzongyong wrote:
>>>>>>>> Hi,
>>>>>>>> I am writing a firmware in Rust to support SEV based on project
>>>>>>>> td-shim[1].
>>>>>>>> But when I create a SEV VM (just SEV, no SEV-ES and no SEV-SNP)
>>>>>>>> with the firmware,
>>>>>>>> the linux kernel crashed because the int3 instruction in
>>>>>>>> int3_selftest() cause a
>>>>>>>> #UD.
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>> BTW, if a create a normal VM without SEV by qemu & OVMF, the int3
>>>>>>>> instruction always generates a
>>>>>>>> #BP.
>>>>>>>> So I am confused now about the behaviour of int3 instruction,
>>>>>>>> could anyone help to explain the behaviour?
>>>>>>>> Any suggestion is appreciated!
>>>>>>>
>>>>>>> Have you tried my suggestions from the other thread[*]?
>>>>> Firstly, I'm sorry for sending muliple mails with the same content. I
>>>>> thought the mails I sent previously
>>>>> didn't be sent successfully.
>>>>> And let's talk the problem here.
>>>>>>>
>>>>>>> : > > I'm curious how this happend. I cannot find any condition
>>>>>>> that would
>>>>>>> : > > cause the int3 instruction generate a #UD according to
>>>>>>> the AMD's spec.
>>>>>>> :
>>>>>>> : One possibility is that the value from memory that gets
>>>>>>> executed diverges from the
>>>>>>> : value that is read out be the #UD handler, e.g. due to
>>>>>>> patching (doesn't seem to
>>>>>>> : be the case in this test), stale cache/tlb entries, etc.
>>>>>>> :
>>>>>>> : > > BTW, it worked nomarlly with qemu and ovmf.
>>>>>>> : >
>>>>>>> : > Does this happen every time you boot the guest with your
>>>>>>> firmware? What
>>>>>>> : > processor are you running on?
>>>>>>> :
>>>>> Yes, every time.
>>>>> The processor I used is EPYC 7T83.
>>>>>>> : And have you ruled out KVM as the culprit? I.e. verified
>>>>>>> that KVM is NOT injecting
>>>>>>> : a #UD. That obviously shouldn't happen, but it should be
>>>>>>> easy to check via KVM
>>>>>>> : tracepoints.
>>>>>>
>>>>>> I have a feeling that KVM is injecting the #UD, but it will take
>>>>>> instrumenting KVM to see which path the #UD is being injected from.
>>>>>>
>>>>>> Wu Zongyo, can you add some instrumentation to figure that out if
>>>>>> the trace points towards KVM injecting the #UD?
>>>>> Ok, I will try to do that.
>>>> You're right. The #UD is injected by KVM.
>>>>
>>>> The path I found is:
>>>>     svm_vcpu_run
>>>>         svm_complete_interrupts
>>>>             kvm_requeue_exception // vector = 3
>>>>                 kvm_make_request
>>>>
>>>>     vcpu_enter_guest
>>>>         kvm_check_and_inject_events
>>>>             svm_inject_exception
>>>>                 svm_update_soft_interrupt_rip
>>>>                     __svm_skip_emulated_instruction
>>>>                         x86_emulate_instruction
>>>>                             svm_can_emulate_instruction
>>>>                                 kvm_queue_exception(vcpu, UD_VECTOR)
>>>>
>>>> Does this mean a #PF intercept occur when the guest try to deliver a
>>>> #BP through the IDT? But why?
>>>
>>> I doubt it's a #PF. A #NPF is much more likely, though it could be
>>> something
>>> else entirely, but I'm pretty sure that would require bugs in both the
>>> host and
>>> guest.
>>>
>>> What is the last exit recorded by trace_kvm_exit() before the #UD is
>>> injected?
>>
>> I'm guessing it was a #NPF, too. Could it be related to the changes that
>> went in around svm_update_soft_interrupt_rip()?
>>
>> 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the
>> instruction")
>
> Sorry, that should have been:
>
> 7e5b5ef8dca3 ("KVM: SVM: Re-inject INTn instead of retrying the insn on
> "failure"")

Doh! I was right the first time... sigh

6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")

Thanks,
Tom

>
>>
>> Before this the !nrips check would prevent the call into
>> svm_skip_emulated_instruction(). But now, there is a call to:
>>
>>   svm_update_soft_interrupt_rip()
>>     __svm_skip_emulated_instruction()
>>       kvm_emulate_instruction()
>>         x86_emulate_instruction() (passed a NULL insn pointer)
>>           kvm_can_emulate_insn() (passed a NULL insn pointer)
>>             svm_can_emulate_instruction() (passed NULL insn pointer)
>>
>> Because it is an SEV guest, it ends up in the "if (unlikely(!insn))" path
>> and injects the #UD.
>>
>> Thanks,
>> Tom
>>
On Wed, Aug 02, 2023 at 03:03:45PM -0500, Tom Lendacky wrote:
> On 8/2/23 09:33, Tom Lendacky wrote:
> > On 8/2/23 09:25, Tom Lendacky wrote:
> > > On 8/2/23 09:01, Sean Christopherson wrote:
> > > > On Wed, Aug 02, 2023, Wu Zongyo wrote:
> > > > > On Mon, Jul 31, 2023 at 11:45:29PM +0800, wuzongyong wrote:
> > > > > >
> > > > > > On 2023/7/31 23:03, Tom Lendacky wrote:
> > > > > > > On 7/31/23 09:30, Sean Christopherson wrote:
> > > > > > > > On Sat, Jul 29, 2023, wuzongyong wrote:
> > > > > > > > > Hi,
> > > > > > > > > I am writing a firmware in Rust to support
> > > > > > > > > SEV based on project td-shim[1].
> > > > > > > > > But when I create a SEV VM (just SEV, no
> > > > > > > > > SEV-ES and no SEV-SNP) with the firmware,
> > > > > > > > > the linux kernel crashed because the int3
> > > > > > > > > instruction in int3_selftest() cause a
> > > > > > > > > #UD.
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > > BTW, if a create a normal VM without SEV by
> > > > > > > > > qemu & OVMF, the int3 instruction always
> > > > > > > > > generates a
> > > > > > > > > #BP.
> > > > > > > > > So I am confused now about the behaviour of
> > > > > > > > > int3 instruction, could anyone help to
> > > > > > > > > explain the behaviour?
> > > > > > > > > Any suggestion is appreciated!
> > > > > > > >
> > > > > > > > Have you tried my suggestions from the other thread[*]?
> > > > > > Firstly, I'm sorry for sending muliple mails with the
> > > > > > same content. I thought the mails I sent previously
> > > > > > didn't be sent successfully.
> > > > > > And let's talk the problem here.
> > > > > > > >
> > > > > > > > : > > I'm curious how this happend. I cannot
> > > > > > > > find any condition that would
> > > > > > > > : > > cause the int3 instruction generate a
> > > > > > > > #UD according to the AMD's spec.
> > > > > > > > :
> > > > > > > > : One possibility is that the value from
> > > > > > > > memory that gets executed diverges from the
> > > > > > > > : value that is read out be the #UD handler,
> > > > > > > > e.g. due to patching (doesn't seem to
> > > > > > > > : be the case in this test), stale cache/tlb entries, etc.
> > > > > > > > :
> > > > > > > > : > > BTW, it worked nomarlly with qemu and ovmf.
> > > > > > > > : >
> > > > > > > > : > Does this happen every time you boot the
> > > > > > > > guest with your firmware? What
> > > > > > > > : > processor are you running on?
> > > > > > > > :
> > > > > > Yes, every time.
> > > > > > The processor I used is EPYC 7T83.
> > > > > > > > : And have you ruled out KVM as the
> > > > > > > > culprit? I.e. verified that KVM is NOT
> > > > > > > > injecting
> > > > > > > > : a #UD. That obviously shouldn't happen,
> > > > > > > > but it should be easy to check via KVM
> > > > > > > > : tracepoints.
> > > > > > >
> > > > > > > I have a feeling that KVM is injecting the #UD, but
> > > > > > > it will take instrumenting KVM to see which path the
> > > > > > > #UD is being injected from.
> > > > > > >
> > > > > > > Wu Zongyo, can you add some instrumentation to
> > > > > > > figure that out if the trace points towards KVM
> > > > > > > injecting the #UD?
> > > > > > Ok, I will try to do that.
> > > > > You're right. The #UD is injected by KVM.
> > > > >
> > > > > The path I found is:
> > > > >     svm_vcpu_run
> > > > >         svm_complete_interrupts
> > > > >             kvm_requeue_exception // vector = 3
> > > > >                 kvm_make_request
> > > > >
> > > > >     vcpu_enter_guest
> > > > >         kvm_check_and_inject_events
> > > > >             svm_inject_exception
> > > > >                 svm_update_soft_interrupt_rip
> > > > >                     __svm_skip_emulated_instruction
> > > > >                         x86_emulate_instruction
> > > > >                             svm_can_emulate_instruction
> > > > >                                 kvm_queue_exception(vcpu, UD_VECTOR)
> > > > >
> > > > > Does this mean a #PF intercept occur when the guest try to deliver a
> > > > > #BP through the IDT? But why?
> > > >
> > > > I doubt it's a #PF. A #NPF is much more likely, though it could
> > > > be something
> > > > else entirely, but I'm pretty sure that would require bugs in
> > > > both the host and
> > > > guest.
> > > >
> > > > What is the last exit recorded by trace_kvm_exit() before the
> > > > #UD is injected?
> > >
> > > I'm guessing it was a #NPF, too. Could it be related to the changes that
> > > went in around svm_update_soft_interrupt_rip()?

Yes, it's a #NPF with exit code 0x400.

There must be something I didn't handle correctly, since it behaves normally
with qemu & OVMF if I don't add an int3 before mcheck_cpu_init().

So it's about memory. Is there something I need to pay special attention
to?

Thanks

> > >
> > > 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the
> > > instruction")
> >
> > Sorry, that should have been:
> >
> > 7e5b5ef8dca3 ("KVM: SVM: Re-inject INTn instead of retrying the insn on
> > "failure"")
>
> Doh! I was right the first time... sigh
>
> 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
>
> Thanks,
> Tom
>
> >
> > >
> > > Before this the !nrips check would prevent the call into
> > > svm_skip_emulated_instruction(). But now, there is a call to:
> > >
> > >   svm_update_soft_interrupt_rip()
> > >     __svm_skip_emulated_instruction()
> > >       kvm_emulate_instruction()
> > >         x86_emulate_instruction() (passed a NULL insn pointer)
> > >           kvm_can_emulate_insn() (passed a NULL insn pointer)
> > >             svm_can_emulate_instruction() (passed NULL insn pointer)
> > >
> > > Because it is an SEV guest, it ends up in the "if (unlikely(!insn))" path
> > > and injects the #UD.
> > >
> > > Thanks,
> > > Tom
> > >
On Thu, Aug 03, 2023 at 11:27:12AM +0800, Wu Zongyo wrote:
> On Wed, Aug 02, 2023 at 03:03:45PM -0500, Tom Lendacky wrote:
> > On 8/2/23 09:33, Tom Lendacky wrote:
> > > On 8/2/23 09:25, Tom Lendacky wrote:
> > > > On 8/2/23 09:01, Sean Christopherson wrote:
> > > > > On Wed, Aug 02, 2023, Wu Zongyo wrote:
> > > > > > On Mon, Jul 31, 2023 at 11:45:29PM +0800, wuzongyong wrote:
> > > > > > >
> > > > > > > On 2023/7/31 23:03, Tom Lendacky wrote:
> > > > > > > > On 7/31/23 09:30, Sean Christopherson wrote:
> > > > > > > > > On Sat, Jul 29, 2023, wuzongyong wrote:
> > > > > > > > > > Hi,
> > > > > > > > > > I am writing a firmware in Rust to support
> > > > > > > > > > SEV based on project td-shim[1].
> > > > > > > > > > But when I create a SEV VM (just SEV, no
> > > > > > > > > > SEV-ES and no SEV-SNP) with the firmware,
> > > > > > > > > > the linux kernel crashed because the int3
> > > > > > > > > > instruction in int3_selftest() cause a
> > > > > > > > > > #UD.
> > > > > > > > >
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > > BTW, if a create a normal VM without SEV by
> > > > > > > > > > qemu & OVMF, the int3 instruction always
> > > > > > > > > > generates a
> > > > > > > > > > #BP.
> > > > > > > > > > So I am confused now about the behaviour of
> > > > > > > > > > int3 instruction, could anyone help to
> > > > > > > > > > explain the behaviour?
> > > > > > > > > > Any suggestion is appreciated!
> > > > > > > > >
> > > > > > > > > Have you tried my suggestions from the other thread[*]?
> > > > > > > Firstly, I'm sorry for sending muliple mails with the
> > > > > > > same content. I thought the mails I sent previously
> > > > > > > didn't be sent successfully.
> > > > > > > And let's talk the problem here.
> > > > > > > > >
> > > > > > > > > : > > I'm curious how this happend. I cannot
> > > > > > > > > find any condition that would
> > > > > > > > > : > > cause the int3 instruction generate a
> > > > > > > > > #UD according to the AMD's spec.
> > > > > > > > > :
> > > > > > > > > : One possibility is that the value from
> > > > > > > > > memory that gets executed diverges from the
> > > > > > > > > : value that is read out be the #UD handler,
> > > > > > > > > e.g. due to patching (doesn't seem to
> > > > > > > > > : be the case in this test), stale cache/tlb entries, etc.
> > > > > > > > > :
> > > > > > > > > : > > BTW, it worked nomarlly with qemu and ovmf.
> > > > > > > > > : >
> > > > > > > > > : > Does this happen every time you boot the
> > > > > > > > > guest with your firmware? What
> > > > > > > > > : > processor are you running on?
> > > > > > > > > :
> > > > > > > Yes, every time.
> > > > > > > The processor I used is EPYC 7T83.
> > > > > > > > > : And have you ruled out KVM as the
> > > > > > > > > culprit? I.e. verified that KVM is NOT
> > > > > > > > > injecting
> > > > > > > > > : a #UD. That obviously shouldn't happen,
> > > > > > > > > but it should be easy to check via KVM
> > > > > > > > > : tracepoints.
> > > > > > > >
> > > > > > > > I have a feeling that KVM is injecting the #UD, but
> > > > > > > > it will take instrumenting KVM to see which path the
> > > > > > > > #UD is being injected from.
> > > > > > > >
> > > > > > > > Wu Zongyo, can you add some instrumentation to
> > > > > > > > figure that out if the trace points towards KVM
> > > > > > > > injecting the #UD?
> > > > > > > Ok, I will try to do that.
> > > > > > You're right. The #UD is injected by KVM.
> > > > > >
> > > > > > The path I found is:
> > > > > >     svm_vcpu_run
> > > > > >         svm_complete_interrupts
> > > > > >             kvm_requeue_exception // vector = 3
> > > > > >                 kvm_make_request
> > > > > >
> > > > > >     vcpu_enter_guest
> > > > > >         kvm_check_and_inject_events
> > > > > >             svm_inject_exception
> > > > > >                 svm_update_soft_interrupt_rip
> > > > > >                     __svm_skip_emulated_instruction
> > > > > >                         x86_emulate_instruction
> > > > > >                             svm_can_emulate_instruction
> > > > > >                                 kvm_queue_exception(vcpu, UD_VECTOR)
> > > > > >
> > > > > > Does this mean a #PF intercept occur when the guest try to deliver a
> > > > > > #BP through the IDT? But why?
> > > > >
> > > > > I doubt it's a #PF. A #NPF is much more likely, though it could
> > > > > be something
> > > > > else entirely, but I'm pretty sure that would require bugs in
> > > > > both the host and
> > > > > guest.
> > > > >
> > > > > What is the last exit recorded by trace_kvm_exit() before the
> > > > > #UD is injected?
> > > >
> > > > I'm guessing it was a #NPF, too. Could it be related to the changes that
> > > > went in around svm_update_soft_interrupt_rip()?
> Yes, it's a #NPF with exit code 0x400.
>
> There must be something I didn't handle corretly since it behave normally with
> qemu & ovmf If I don't add int3 before mcheck_cpu_init().
>
> So it'a about memory, is there something I need to pay special attention
> to?
>
> Thanks

I checked the fault address of the #NPF, and it is the address of the guest
kernel's IDT entry. The NPT mapping for the IDT entry has not been
constructed, so the #NPF is generated when the guest tries to access the IDT.

With qemu & OVMF, I didn't see the #NPF when the guest invokes the int3
handler. That means the NPT mapping had already been constructed, but
when?

> > > >
> > > > 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the
> > > > instruction")
> > >
> > > Sorry, that should have been:
> > >
> > > 7e5b5ef8dca3 ("KVM: SVM: Re-inject INTn instead of retrying the insn on
> > > "failure"")
> >
> > Doh! I was right the first time... sigh
> >
> > 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
> >
> > Thanks,
> > Tom
> >
> > >
> > > >
> > > > Before this the !nrips check would prevent the call into
> > > > svm_skip_emulated_instruction(). But now, there is a call to:
> > > >
> > > >   svm_update_soft_interrupt_rip()
> > > >     __svm_skip_emulated_instruction()
> > > >       kvm_emulate_instruction()
> > > >         x86_emulate_instruction() (passed a NULL insn pointer)
> > > >           kvm_can_emulate_insn() (passed a NULL insn pointer)
> > > >             svm_can_emulate_instruction() (passed NULL insn pointer)
> > > >
> > > > Because it is an SEV guest, it ends up in the "if (unlikely(!insn))" path
> > > > and injects the #UD.
> > > >
> > > > Thanks,
> > > > Tom
> > > >
On Thu, Aug 03, 2023, Wu Zongyo wrote:
> On Thu, Aug 03, 2023 at 11:27:12AM +0800, Wu Zongyo wrote:
> > > > >
> > > > > I'm guessing it was a #NPF, too. Could it be related to the changes that
> > > > > went in around svm_update_soft_interrupt_rip()?
> > Yes, it's a #NPF with exit code 0x400.
> >
> > There must be something I didn't handle corretly since it behave normally with
> > qemu & ovmf If I don't add int3 before mcheck_cpu_init().
> >
> > So it'a about memory, is there something I need to pay special attention
> > to?
> >
> > Thanks
> I check the fault address of #NPF, and it is the IDT entry address of
> the guest kernel. The NPT page table is not constructed for the IDT
> entry and the #NPF is generated when guest try to access IDT.
>
> With qemu & ovmf, I didn't see the #NPF when guest invoke the int3
> handler. That means the NPT page table has already been constructed, but
> when?

More than likely, the page was used by the guest at some point earlier in boot.
Why the page is faulted in for certain setups but not others isn't really all
that interesting in terms of fixing the KVM bug; both guest behaviors are completely
normal and should work.

Can you try this patch I suggested earlier? If this fixes the problem, I'll post
a formal patch.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d381ad424554..2eace114a934 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -385,6 +385,9 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
 	}

 	if (!svm->next_rip) {
+		if (sev_guest(vcpu->kvm))
+			return 0;
+
 		if (unlikely(!commit_side_effects))
 			old_rflags = svm->vmcb->save.rflags;

On Thu, Aug 03, 2023 at 02:34:50PM +0000, Sean Christopherson wrote:
> On Thu, Aug 03, 2023, Wu Zongyo wrote:
> > On Thu, Aug 03, 2023 at 11:27:12AM +0800, Wu Zongyo wrote:
> > > > > >
> > > > > > I'm guessing it was a #NPF, too. Could it be related to the changes that
> > > > > > went in around svm_update_soft_interrupt_rip()?
> > > Yes, it's a #NPF with exit code 0x400.
> > >
> > > There must be something I didn't handle corretly since it behave normally with
> > > qemu & ovmf If I don't add int3 before mcheck_cpu_init().
> > >
> > > So it'a about memory, is there something I need to pay special attention
> > > to?
> > >
> > > Thanks
> > I check the fault address of #NPF, and it is the IDT entry address of
> > the guest kernel. The NPT page table is not constructed for the IDT
> > entry and the #NPF is generated when guest try to access IDT.
> >
> > With qemu & ovmf, I didn't see the #NPF when guest invoke the int3
> > handler. That means the NPT page table has already been constructed, but
> > when?
>
> More than likely, the page was used by the guest at some point earlier in boot.
> Why the page is faulted in for certain setups but not others isn't really all
> that interesting in terms of fixing the KVM bug, both guest behaviors are completely
> normal and should work.
>
> Can you try this patch I suggested earlier? If this fixes the problem, I'll post
> a formal patch.
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d381ad424554..2eace114a934 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -385,6 +385,9 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
>  	}
>
>  	if (!svm->next_rip) {
> +		if (sev_guest(vcpu->kvm))
> +			return 0;
> +
>  		if (unlikely(!commit_side_effects))
>  			old_rflags = svm->vmcb->save.rflags;
>
Yes, the patch solves the problem.
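
For anyone following the thread later, the effect of the proposed check can
be sketched as a small standalone C model. This is not KVM code: struct
toy_vcpu, toy_emulate_skip(), toy_skip_insn() and the "patched" flag are
hypothetical names made up for illustration, and the emulator is reduced to
a stub that assumes a 1-byte INT3. It only mirrors the decision described
above: with no next_rip, the emulator has no instruction bytes for an SEV
guest and ends up queuing #UD, while the patched path returns 0 before the
emulator is reached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the vCPU state involved. */
struct toy_vcpu {
	uint64_t rip;        /* current guest RIP */
	uint64_t next_rip;   /* NRIPS value for this exit; 0 if none was provided */
	bool sev_guest;      /* guest memory is encrypted */
	bool ud_queued;      /* set when the model "injects" #UD */
};

/*
 * Stub emulator. For an SEV guest it has no instruction bytes to decode,
 * so it queues #UD and fails, as described in the thread. Otherwise it
 * assumes a 1-byte INT3 (an assumption of this sketch) and steps over it.
 */
static int toy_emulate_skip(struct toy_vcpu *v)
{
	if (v->sev_guest) {
		v->ud_queued = true;
		return 0;
	}
	v->rip += 1;
	return 1;
}

/* Returns 1 if RIP was advanced past the instruction, 0 otherwise. */
static int toy_skip_insn(struct toy_vcpu *v, bool patched)
{
	if (!v->next_rip) {
		/* The proposed check: never enter the emulator for SEV. */
		if (patched && v->sev_guest)
			return 0;
		return toy_emulate_skip(v);
	}

	/* Hardware supplied the next RIP, so no decoding is needed. */
	v->rip = v->next_rip;
	return 1;
}
```

Calling toy_skip_insn() with patched == false on an SEV vCPU whose next_rip
is 0 sets ud_queued, which corresponds to the spurious #UD reported at the
start of the thread; with patched == true the call fails without queuing
anything, leaving the caller to handle the failed skip.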