With CONFIG_KCOV=y and an AMD processor, running the following program crashes
the kernel with no output (I'm testing in a VM, so it's using nested
virtualization):
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
int main()
{
int dev, vm, cpu;
char page[4096] __attribute__((aligned(4096))) = { 0 };
struct kvm_userspace_memory_region memreg = {
.memory_size = 4096,
.userspace_addr = (unsigned long)page,
};
dev = open("/dev/kvm", O_RDONLY);
vm = ioctl(dev, KVM_CREATE_VM, 0);
cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
ioctl(cpu, KVM_RUN, 0);
}
It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
MSR_IA32_SPEC_CTRL"). The bug is apparently that due to the new code for
managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
crash somehow. The following patch fixes it, though I don't know that it's the
right solution; maybe KCOV should be disabled in the function instead, or maybe
there's a more fundamental problem. What do people think?
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1fc05e428aba8..d35ef241e66d8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5652,6 +5652,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
#endif
);
+#ifdef CONFIG_X86_64
+ wrmsrl(MSR_GS_BASE, svm->host.gs_base);
+#else
+ loadsegment(fs, svm->host.fs);
+#ifndef CONFIG_X86_32_LAZY_GS
+ loadsegment(gs, svm->host.gs);
+#endif
+#endif
+
/*
* We do not use IBRS in the kernel. If this vCPU has used the
* SPEC_CTRL MSR it may have left it on; save the value and
@@ -5676,15 +5685,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
/* Eliminate branch target predictions from guest mode */
vmexit_fill_RSB();
-#ifdef CONFIG_X86_64
- wrmsrl(MSR_GS_BASE, svm->host.gs_base);
-#else
- loadsegment(fs, svm->host.fs);
-#ifndef CONFIG_X86_32_LAZY_GS
- loadsegment(gs, svm->host.gs);
-#endif
-#endif
-
reload_tss(vcpu);
local_irq_disable();
Sorry, messed up address for KVM mailing list. See message below.
On Sun, May 13, 2018 at 08:00:07PM -0700, Eric Biggers wrote:
> With CONFIG_KCOV=y and an AMD processor, running the following program crashes
> the kernel with no output (I'm testing in a VM, so it's using nested
> virtualization):
>
> #include <fcntl.h>
> #include <linux/kvm.h>
> #include <sys/ioctl.h>
>
> int main()
> {
> int dev, vm, cpu;
> char page[4096] __attribute__((aligned(4096))) = { 0 };
> struct kvm_userspace_memory_region memreg = {
> .memory_size = 4096,
> .userspace_addr = (unsigned long)page,
> };
> dev = open("/dev/kvm", O_RDONLY);
> vm = ioctl(dev, KVM_CREATE_VM, 0);
> cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
> ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
> ioctl(cpu, KVM_RUN, 0);
> }
>
> It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
> MSR_IA32_SPEC_CTRL"). The bug is apparently that due to the new code for
> managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
> svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
> crash somehow. The following patch fixes it, though I don't know that it's the
> right solution; maybe KCOV should be disabled in the function instead, or maybe
> there's a more fundamental problem. What do people think?
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 1fc05e428aba8..d35ef241e66d8 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -5652,6 +5652,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
> #endif
> );
>
> +#ifdef CONFIG_X86_64
> + wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> +#else
> + loadsegment(fs, svm->host.fs);
> +#ifndef CONFIG_X86_32_LAZY_GS
> + loadsegment(gs, svm->host.gs);
> +#endif
> +#endif
> +
> /*
> * We do not use IBRS in the kernel. If this vCPU has used the
> * SPEC_CTRL MSR it may have left it on; save the value and
> @@ -5676,15 +5685,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
> /* Eliminate branch target predictions from guest mode */
> vmexit_fill_RSB();
>
> -#ifdef CONFIG_X86_64
> - wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> -#else
> - loadsegment(fs, svm->host.fs);
> -#ifndef CONFIG_X86_32_LAZY_GS
> - loadsegment(gs, svm->host.gs);
> -#endif
> -#endif
> -
> reload_tss(vcpu);
>
> local_irq_disable();
On Mon, May 14, 2018 at 5:02 AM, Eric Biggers <[email protected]> wrote:
> Sorry, messed up address for KVM mailing list. See message below.
>
> On Sun, May 13, 2018 at 08:00:07PM -0700, Eric Biggers wrote:
>> With CONFIG_KCOV=y and an AMD processor, running the following program crashes
>> the kernel with no output (I'm testing in a VM, so it's using nested
>> virtualization):
>>
>> #include <fcntl.h>
>> #include <linux/kvm.h>
>> #include <sys/ioctl.h>
>>
>> int main()
>> {
>> int dev, vm, cpu;
>> char page[4096] __attribute__((aligned(4096))) = { 0 };
>> struct kvm_userspace_memory_region memreg = {
>> .memory_size = 4096,
>> .userspace_addr = (unsigned long)page,
>> };
>> dev = open("/dev/kvm", O_RDONLY);
>> vm = ioctl(dev, KVM_CREATE_VM, 0);
>> cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
>> ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
>> ioctl(cpu, KVM_RUN, 0);
>> }
>>
>> It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
>> MSR_IA32_SPEC_CTRL"). The bug is apparently that due to the new code for
>> managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
>> svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
>> crash somehow. The following patch fixes it, though I don't know that it's the
>> right solution; maybe KCOV should be disabled in the function instead, or maybe
>> there's a more fundamental problem. What do people think?
If __sanitizer_cov_trace_pc() crashes, I would expect there must be
few more of them here:
if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
if (svm->spec_ctrl)
native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
Compiler inserts these callbacks into every basic block/edge.. Aren't there?
Unfortunately we don't have an attribute that disables instrumentation
of a single function. This is currently possible only on file level.
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 1fc05e428aba8..d35ef241e66d8 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -5652,6 +5652,15 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>> #endif
>> );
>>
>> +#ifdef CONFIG_X86_64
>> + wrmsrl(MSR_GS_BASE, svm->host.gs_base);
>> +#else
>> + loadsegment(fs, svm->host.fs);
>> +#ifndef CONFIG_X86_32_LAZY_GS
>> + loadsegment(gs, svm->host.gs);
>> +#endif
>> +#endif
>> +
>> /*
>> * We do not use IBRS in the kernel. If this vCPU has used the
>> * SPEC_CTRL MSR it may have left it on; save the value and
>> @@ -5676,15 +5685,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
>> /* Eliminate branch target predictions from guest mode */
>> vmexit_fill_RSB();
>>
>> -#ifdef CONFIG_X86_64
>> - wrmsrl(MSR_GS_BASE, svm->host.gs_base);
>> -#else
>> - loadsegment(fs, svm->host.fs);
>> -#ifndef CONFIG_X86_32_LAZY_GS
>> - loadsegment(gs, svm->host.gs);
>> -#endif
>> -#endif
>> -
>> reload_tss(vcpu);
>>
>> local_irq_disable();
On Mon, May 14, 2018 at 07:14:41AM +0200, Dmitry Vyukov wrote:
> On Mon, May 14, 2018 at 5:02 AM, Eric Biggers <[email protected]> wrote:
> > Sorry, messed up address for KVM mailing list. See message below.
> >
> > On Sun, May 13, 2018 at 08:00:07PM -0700, Eric Biggers wrote:
> >> With CONFIG_KCOV=y and an AMD processor, running the following program crashes
> >> the kernel with no output (I'm testing in a VM, so it's using nested
> >> virtualization):
> >>
> >> #include <fcntl.h>
> >> #include <linux/kvm.h>
> >> #include <sys/ioctl.h>
> >>
> >> int main()
> >> {
> >> int dev, vm, cpu;
> >> char page[4096] __attribute__((aligned(4096))) = { 0 };
> >> struct kvm_userspace_memory_region memreg = {
> >> .memory_size = 4096,
> >> .userspace_addr = (unsigned long)page,
> >> };
> >> dev = open("/dev/kvm", O_RDONLY);
> >> vm = ioctl(dev, KVM_CREATE_VM, 0);
> >> cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
> >> ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
> >> ioctl(cpu, KVM_RUN, 0);
> >> }
> >>
> >> It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
> >> MSR_IA32_SPEC_CTRL"). The bug is apparently that due to the new code for
> >> managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
> >> svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
> >> crash somehow. The following patch fixes it, though I don't know that it's the
> >> right solution; maybe KCOV should be disabled in the function instead, or maybe
> >> there's a more fundamental problem. What do people think?
>
>
> If __sanitizer_cov_trace_pc() crashes, I would expect there must be
> few more of them here:
>
> if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
> svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
>
> if (svm->spec_ctrl)
> native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>
> Compiler inserts these callbacks into every basic block/edge.. Aren't there?
>
> Unfortunately we don't have an attribute that disables instrumentation
> of a single function. This is currently possible only on file level.
>
Yes, due to the code dealing with MSR_IA32_SPEC_CTRL, there were several calls
to __sanitizer_cov_trace_pc() before the write to MSR_GS_BASE. The patch I
tested moves the write to MSR_GS_BASE to before all of them, so it's once again
the first thing after the asm block. Again I'm not sure it's the proper
solution, but it did make it stop crashing.
Also I'm guessing this isn't specific to nested virtualization; I just didn't
have KCOV enabled on the host, thus the host didn't crash.
- Eric
On Mon, May 14, 2018 at 7:25 PM, Eric Biggers <[email protected]> wrote:
> On Mon, May 14, 2018 at 07:14:41AM +0200, Dmitry Vyukov wrote:
>> On Mon, May 14, 2018 at 5:02 AM, Eric Biggers <[email protected]> wrote:
>> > Sorry, messed up address for KVM mailing list. See message below.
>> >
>> > On Sun, May 13, 2018 at 08:00:07PM -0700, Eric Biggers wrote:
>> >> With CONFIG_KCOV=y and an AMD processor, running the following program crashes
>> >> the kernel with no output (I'm testing in a VM, so it's using nested
>> >> virtualization):
>> >>
>> >> #include <fcntl.h>
>> >> #include <linux/kvm.h>
>> >> #include <sys/ioctl.h>
>> >>
>> >> int main()
>> >> {
>> >> int dev, vm, cpu;
>> >> char page[4096] __attribute__((aligned(4096))) = { 0 };
>> >> struct kvm_userspace_memory_region memreg = {
>> >> .memory_size = 4096,
>> >> .userspace_addr = (unsigned long)page,
>> >> };
>> >> dev = open("/dev/kvm", O_RDONLY);
>> >> vm = ioctl(dev, KVM_CREATE_VM, 0);
>> >> cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
>> >> ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
>> >> ioctl(cpu, KVM_RUN, 0);
>> >> }
>> >>
>> >> It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
>> >> MSR_IA32_SPEC_CTRL"). The bug is apparently that due to the new code for
>> >> managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
>> >> svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
>> >> crash somehow. The following patch fixes it, though I don't know that it's the
>> >> right solution; maybe KCOV should be disabled in the function instead, or maybe
>> >> there's a more fundamental problem. What do people think?
>>
>>
>> If __sanitizer_cov_trace_pc() crashes, I would expect there must be
>> few more of them here:
>>
>> if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
>> svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
>>
>> if (svm->spec_ctrl)
>> native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>>
>> Compiler inserts these callbacks into every basic block/edge.. Aren't there?
>>
>> Unfortunately we don't have an attribute that disables instrumentation
>> of a single function. This is currently possible only on file level.
>>
>
> Yes, due to the code dealing with MSR_IA32_SPEC_CTRL, there were several calls
> to __sanitizer_cov_trace_pc() before the write to MSR_GS_BASE. The patch I
> tested moves the write to MSR_GS_BASE to before all of them, so it's once again
> the first thing after the asm block. Again I'm not sure it's the proper
> solution, but it did make it stop crashing.
From KCOV perspective:
This is definitely the simplest and less intrusive solution.
It's somewhat unreliable. But it's hard to tell if/when it will
actually break in practice. Compiler can decide to insert the callback
after asm block, or a branch can be added to wrmsrl (e.g. under some
debug config). More reliable solution would be to restore registers in
asm block itself, or move this to a separate file and disable
instrumentation of that file (though, will not save from non-inlined
wrmsrl). But again, the proposed solution may work well for the next
10 years, so additional complexity may not worth it.
Btw, I don't see anything about fs/gs in vmx_vcpu_run. Is it VMLAUNCH
that saves/restores them?
On Mon, May 14, 2018 at 10:25:08AM -0700, Eric Biggers wrote:
> On Mon, May 14, 2018 at 07:14:41AM +0200, Dmitry Vyukov wrote:
> > On Mon, May 14, 2018 at 5:02 AM, Eric Biggers <[email protected]> wrote:
> > > Sorry, messed up address for KVM mailing list. See message below.
> > >
> > > On Sun, May 13, 2018 at 08:00:07PM -0700, Eric Biggers wrote:
> > >> With CONFIG_KCOV=y and an AMD processor, running the following program crashes
> > >> the kernel with no output (I'm testing in a VM, so it's using nested
> > >> virtualization):
> > >>
> > >> #include <fcntl.h>
> > >> #include <linux/kvm.h>
> > >> #include <sys/ioctl.h>
> > >>
> > >> int main()
> > >> {
> > >> int dev, vm, cpu;
> > >> char page[4096] __attribute__((aligned(4096))) = { 0 };
> > >> struct kvm_userspace_memory_region memreg = {
> > >> .memory_size = 4096,
> > >> .userspace_addr = (unsigned long)page,
> > >> };
> > >> dev = open("/dev/kvm", O_RDONLY);
> > >> vm = ioctl(dev, KVM_CREATE_VM, 0);
> > >> cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
> > >> ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
> > >> ioctl(cpu, KVM_RUN, 0);
> > >> }
> > >>
> > >> It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
> > >> MSR_IA32_SPEC_CTRL"). The bug is apparently that due to the new code for
> > >> managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
> > >> svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
> > >> crash somehow. The following patch fixes it, though I don't know that it's the
> > >> right solution; maybe KCOV should be disabled in the function instead, or maybe
> > >> there's a more fundamental problem. What do people think?
> >
> >
> > If __sanitizer_cov_trace_pc() crashes, I would expect there must be
> > few more of them here:
> >
> > if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
> > svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
> >
> > if (svm->spec_ctrl)
> > native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> >
> > Compiler inserts these callbacks into every basic block/edge.. Aren't there?
> >
> > Unfortunately we don't have an attribute that disables instrumentation
> > of a single function. This is currently possible only on file level.
> >
>
> Yes, due to the code dealing with MSR_IA32_SPEC_CTRL, there were several calls
> to __sanitizer_cov_trace_pc() before the write to MSR_GS_BASE. The patch I
> tested moves the write to MSR_GS_BASE to before all of them, so it's once again
> the first thing after the asm block. Again I'm not sure it's the proper
> solution, but it did make it stop crashing.
>
> Also I'm guessing this isn't specific to nested virtualization; I just didn't
> have KCOV enabled on the host, thus the host didn't crash.
>
Okay, this was (apparently coincidentally) fixed by commit 15e6c22fd8e5a:
"KVM: SVM: Move spec control call after restore of GS". Thanks Thomas!
- Eric
Reviving this old thread because it wasn't fully fixed after all...
On Tue, May 15, 2018 at 07:33:48AM +0200, Dmitry Vyukov wrote:
> On Mon, May 14, 2018 at 7:25 PM, Eric Biggers <[email protected]> wrote:
> > On Mon, May 14, 2018 at 07:14:41AM +0200, Dmitry Vyukov wrote:
> >> On Mon, May 14, 2018 at 5:02 AM, Eric Biggers <[email protected]> wrote:
> >> > Sorry, messed up address for KVM mailing list. See message below.
> >> >
> >> > On Sun, May 13, 2018 at 08:00:07PM -0700, Eric Biggers wrote:
> >> >> With CONFIG_KCOV=y and an AMD processor, running the following program crashes
> >> >> the kernel with no output (I'm testing in a VM, so it's using nested
> >> >> virtualization):
> >> >>
> >> >> #include <fcntl.h>
> >> >> #include <linux/kvm.h>
> >> >> #include <sys/ioctl.h>
> >> >>
> >> >> int main()
> >> >> {
> >> >> int dev, vm, cpu;
> >> >> char page[4096] __attribute__((aligned(4096))) = { 0 };
> >> >> struct kvm_userspace_memory_region memreg = {
> >> >> .memory_size = 4096,
> >> >> .userspace_addr = (unsigned long)page,
> >> >> };
> >> >> dev = open("/dev/kvm", O_RDONLY);
> >> >> vm = ioctl(dev, KVM_CREATE_VM, 0);
> >> >> cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
> >> >> ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
> >> >> ioctl(cpu, KVM_RUN, 0);
> >> >> }
> >> >>
> >> >> It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
> >> >> MSR_IA32_SPEC_CTRL"). The bug is apparently that due to the new code for
> >> >> managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
> >> >> svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
> >> >> crash somehow. The following patch fixes it, though I don't know that it's the
> >> >> right solution; maybe KCOV should be disabled in the function instead, or maybe
> >> >> there's a more fundamental problem. What do people think?
> >>
> >>
> >> If __sanitizer_cov_trace_pc() crashes, I would expect there must be
> >> few more of them here:
> >>
> >> if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
> >> svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
> >>
> >> if (svm->spec_ctrl)
> >> native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> >>
> >> Compiler inserts these callbacks into every basic block/edge.. Aren't there?
> >>
> >> Unfortunately we don't have an attribute that disables instrumentation
> >> of a single function. This is currently possible only on file level.
> >>
> >
> > Yes, due to the code dealing with MSR_IA32_SPEC_CTRL, there were several calls
> > to __sanitizer_cov_trace_pc() before the write to MSR_GS_BASE. The patch I
> > tested moves the write to MSR_GS_BASE to before all of them, so it's once again
> > the first thing after the asm block. Again I'm not sure it's the proper
> > solution, but it did make it stop crashing.
>
> From KCOV perspective:
> This is definitely the simplest and less intrusive solution.
> It's somewhat unreliable. But it's hard to tell if/when it will
> actually break in practice. Compiler can decide to insert the callback
> after asm block, or a branch can be added to wrmsrl (e.g. under some
> debug config). More reliable solution would be to restore registers in
> asm block itself, or move this to a separate file and disable
> instrumentation of that file (though, will not save from non-inlined
> wrmsrl). But again, the proposed solution may work well for the next
> 10 years, so additional complexity may not worth it.
>
> Btw, I don't see anything about fs/gs in vmx_vcpu_run. Is it VMLAUNCH
> that saves/restores them?
So it turns out there *is* a branch in wrmsrl() when CONFIG_PARAVIRT=y &&
CONFIG_PARAVIRT_DEBUG=y, and that causes the same crash: the compiler inserts a
call to __sanitizer_cov_trace_pc() prior to the GS_BASE register being restored
in svm_vcpu_run().
#ifdef CONFIG_PARAVIRT_DEBUG
#define PVOP_TEST_NULL(op) BUG_ON(pv_ops.op == NULL)
#else
#define PVOP_TEST_NULL(op) ((void)pv_ops.op)
#endif
Dmitry, in the long run maybe this should be solved by adding a function
attribute to gcc that disables coverage for a function?
But for now maybe CONFIG_KCOV should depend on !CONFIG_PARAVIRT_DEBUG? Or does
anyone have a better idea? Alternatively as Dmitry suggested, svm_vcpu_run()
could be moved to a separate file and compiled with different flags...
- Eric
On Wed, Feb 27, 2019 at 8:14 AM Eric Biggers <[email protected]> wrote:
>
> Reviving this old thread because it wasn't fully fixed after all...
>
> On Tue, May 15, 2018 at 07:33:48AM +0200, Dmitry Vyukov wrote:
> > On Mon, May 14, 2018 at 7:25 PM, Eric Biggers <[email protected]> wrote:
> > > On Mon, May 14, 2018 at 07:14:41AM +0200, Dmitry Vyukov wrote:
> > >> On Mon, May 14, 2018 at 5:02 AM, Eric Biggers <[email protected]> wrote:
> > >> > Sorry, messed up address for KVM mailing list. See message below.
> > >> >
> > >> > On Sun, May 13, 2018 at 08:00:07PM -0700, Eric Biggers wrote:
> > >> >> With CONFIG_KCOV=y and an AMD processor, running the following program crashes
> > >> >> the kernel with no output (I'm testing in a VM, so it's using nested
> > >> >> virtualization):
> > >> >>
> > >> >> #include <fcntl.h>
> > >> >> #include <linux/kvm.h>
> > >> >> #include <sys/ioctl.h>
> > >> >>
> > >> >> int main()
> > >> >> {
> > >> >> int dev, vm, cpu;
> > >> >> char page[4096] __attribute__((aligned(4096))) = { 0 };
> > >> >> struct kvm_userspace_memory_region memreg = {
> > >> >> .memory_size = 4096,
> > >> >> .userspace_addr = (unsigned long)page,
> > >> >> };
> > >> >> dev = open("/dev/kvm", O_RDONLY);
> > >> >> vm = ioctl(dev, KVM_CREATE_VM, 0);
> > >> >> cpu = ioctl(vm, KVM_CREATE_VCPU, 0);
> > >> >> ioctl(vm, KVM_SET_USER_MEMORY_REGION, &memreg);
> > >> >> ioctl(cpu, KVM_RUN, 0);
> > >> >> }
> > >> >>
> > >> >> It bisects down to commit b2ac58f90540e39 ("KVM/SVM: Allow direct access to
> > >> >> MSR_IA32_SPEC_CTRL"). The bug is apparently that due to the new code for
> > >> >> managing the SPEC_CTRL MSR, __sanitizer_cov_trace_pc() is being called from
> > >> >> svm_vcpu_run() before the host's MSR_GS_BASE has been restored, which causes a
> > >> >> crash somehow. The following patch fixes it, though I don't know that it's the
> > >> >> right solution; maybe KCOV should be disabled in the function instead, or maybe
> > >> >> there's a more fundamental problem. What do people think?
> > >>
> > >>
> > >> If __sanitizer_cov_trace_pc() crashes, I would expect there must be
> > >> few more of them here:
> > >>
> > >> if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
> > >> svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
> > >>
> > >> if (svm->spec_ctrl)
> > >> native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> > >>
> > >> Compiler inserts these callbacks into every basic block/edge.. Aren't there?
> > >>
> > >> Unfortunately we don't have an attribute that disables instrumentation
> > >> of a single function. This is currently possible only on file level.
> > >>
> > >
> > > Yes, due to the code dealing with MSR_IA32_SPEC_CTRL, there were several calls
> > > to __sanitizer_cov_trace_pc() before the write to MSR_GS_BASE. The patch I
> > > tested moves the write to MSR_GS_BASE to before all of them, so it's once again
> > > the first thing after the asm block. Again I'm not sure it's the proper
> > > solution, but it did make it stop crashing.
> >
> > From KCOV perspective:
> > This is definitely the simplest and less intrusive solution.
> > It's somewhat unreliable. But it's hard to tell if/when it will
> > actually break in practice. Compiler can decide to insert the callback
> > after asm block, or a branch can be added to wrmsrl (e.g. under some
> > debug config). More reliable solution would be to restore registers in
> > asm block itself, or move this to a separate file and disable
> > instrumentation of that file (though, will not save from non-inlined
> > wrmsrl). But again, the proposed solution may work well for the next
> > 10 years, so additional complexity may not worth it.
> >
> > Btw, I don't see anything about fs/gs in vmx_vcpu_run. Is it VMLAUNCH
> > that saves/restores them?
>
> So it turns out there *is* a branch in wrmsrl() when CONFIG_PARAVIRT=y &&
> CONFIG_PARAVIRT_DEBUG=y, and that causes the same crash: the compiler inserts a
> call to __sanitizer_cov_trace_pc() prior to the GS_BASE register being restored
> in svm_vcpu_run().
>
> #ifdef CONFIG_PARAVIRT_DEBUG
> #define PVOP_TEST_NULL(op) BUG_ON(pv_ops.op == NULL)
> #else
> #define PVOP_TEST_NULL(op) ((void)pv_ops.op)
> #endif
>
> Dmitry, in the long run maybe this should be solved by adding a function
> attribute to gcc that disables coverage for a function?
>
> But for now maybe CONFIG_KCOV should depend on !CONFIG_PARAVIRT_DEBUG? Or does
> anyone have a better idea? Alternatively as Dmitry suggested, svm_vcpu_run()
> could be moved to a separate file and compiled with different flags...
Re attribute. This is only place this come up so far. No precedents in
user-space and kernel is also quite extensive tested as well. Though
we bulk ignore some files involved in early boot, maybe the attribute
would be useful in some of these files too. E.g.:
https://bugzilla.kernel.org/show_bug.cgi?id=198443
We could add it, but adding an attribute to both compilers is not
something completely trivial and will take time to propagate to users.
I would go with some solution local to SVM for now.
It would be nice to restore segment registers right in the asm block
so that C code always have proper normal context for execution (e.g.
can affect KASAN/KMSAN/KTSAN too). But with paravirt it's not trivial,
right? At least not as trivial as executing WRMSR.