2020-08-19 18:08:24

by Tom Lendacky

Subject: FSGSBASE causing panic on 5.9-rc1

It looks like the FSGSBASE support is crashing my second generation EPYC
system. I was able to bisect it to:

b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")

The panic only happens when using KVM. Doing kernel builds or stress
on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
guest and do a kernel build within the guest, I get the following:

[ 120.360637] BUG: scheduling while atomic: qemu-system-x86/5485/0x00110000
[ 124.041646] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: x86_pmu_handle_irq+0x163/0x170
[ 124.041647] ------------[ cut here ]------------
[ 124.041649] Hardware name: AMD
[ 124.041649] Workqueue: 0x0 (events)
[ 124.041651] Call Trace:
[ 124.041651] ------------[ cut here ]------------
[ 124.041652] corrupted preempt_count: kworker/22:1/1449/0x110000
[ 124.051267] WARNING: CPU: 22 PID: 1449 at kernel/sched/core.c:3595 finish_task_switch+0x289/0x290
[ 124.051268] Modules linked in: tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc fuse amd64_edac_mod edac_mce_amd wmi_bmof kvm_amd kvm irqbypass sg ipmi_ssif ccp k10temp acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq squashfs loop sch_fq_codel parport_pc ppdev lp parport ip_tables raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ast drm_vram_helper drm_ttm_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt libahci fb_sys_fops libata drm e1000e i2c_piix4 wmi i2c_designware_platform i2c_designware_core pinctrl_amd i2c_core
[ 124.051285] CPU: 22 PID: 1449 Comm: kworker/22:1 Tainted: G W 5.9.0-rc1-sos-linux #1
[ 124.051286] Hardware name: AMD
[ 124.051286] Workqueue: 0x0 (events)
[ 124.051287] RIP: 0010:finish_task_switch+0x289/0x290
[ 124.051288] Code: ff 65 48 8b 04 25 c0 7b 01 00 8b 90 a8 08 00 00 48 8d b0 b0 0a 00 00 48 c7 c7 20 10 10 86 c6 05 be aa 55 01 01 e8 89 03 fd ff <0f> 0b e9 6b ff ff ff 55 48 89 e5 41 55 41 54 49 89 fc 53 48 89 f3
[ 124.051288] RSP: 0018:ffffc9001afe7e10 EFLAGS: 00010082
[ 124.051289] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000023
[ 124.051290] RDX: 0000000000000023 RSI: ffffffff86101044 RDI: ffff88900d798bb0
[ 124.051290] RBP: ffffc9001afe7e38 R08: ffff88900d798ba8 R09: 0000000000000005
[ 124.051290] R10: 000000000000000f R11: ffff88900d798d54 R12: ffff88900d7aacc0
[ 124.051291] R13: ffff889bd2308000 R14: 0000000000000000 R15: ffff88900d7aacc0
[ 124.051291] FS: 0000000000000000(0000) GS:ffff88900d780000(0000) knlGS:0000000000000000
[ 124.051292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 124.051292] CR2: 00007ff607620000 CR3: 0000001bcb0d2000 CR4: 0000000000350ee0
[ 124.051293] Call Trace:
[ 124.051293] __schedule+0x348/0x810
[ 124.051293] ? dbs_work_handler+0x47/0x60
[ 124.051294] schedule+0x4a/0xb0
[ 124.051294] worker_thread+0xcf/0x3b0
[ 124.051294] ? process_one_work+0x370/0x370
[ 124.051294] kthread+0xfe/0x140
[ 124.051295] ? kthread_park+0x90/0x90
[ 124.051295] ret_from_fork+0x22/0x30
[ 124.051295] ---[ end trace 7f77ee8ad05caa89 ]---
[ 124.051296] Kernel Offset: disabled

Specifying nofsgsbase avoids the issue. This is very reproducible, so I
can easily test any fixes.

Thanks,
Tom


2020-08-19 18:21:00

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/19/20 1:07 PM, Tom Lendacky wrote:
> It looks like the FSGSBASE support is crashing my second generation EPYC
> system. I was able to bisect it to:
>
> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
>
> The panic only happens when using KVM. Doing kernel builds or stress
> on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
> guest and do a kernel build within the guest, I get the following:

I should clarify that this panic is on the bare-metal system, not in the
guest. And that specifying nofsgsbase on the bare-metal command line fixes
the issue.

Thanks,
Tom

>
> [ 120.360637] BUG: scheduling while atomic: qemu-system-x86/5485/0x00110000
> [ 124.041646] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: x86_pmu_handle_irq+0x163/0x170
> [ 124.041647] ------------[ cut here ]------------
> [ 124.041649] Hardware name: AMD
> [ 124.041649] Workqueue: 0x0 (events)
> [ 124.041651] Call Trace:
> [ 124.041651] ------------[ cut here ]------------
> [ 124.041652] corrupted preempt_count: kworker/22:1/1449/0x110000
> [ 124.051267] WARNING: CPU: 22 PID: 1449 at kernel/sched/core.c:3595 finish_task_switch+0x289/0x290
> [ 124.051268] Modules linked in: tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc fuse amd64_edac_mod edac_mce_amd wmi_bmof kvm_amd kvm irqbypass sg ipmi_ssif ccp k10temp acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq squashfs loop sch_fq_codel parport_pc ppdev lp parport ip_tables raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ast drm_vram_helper drm_ttm_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt libahci fb_sys_fops libata drm e1000e i2c_piix4 wmi i2c_designware_platform i2c_designware_core pinctrl_amd i2c_core
> [ 124.051285] CPU: 22 PID: 1449 Comm: kworker/22:1 Tainted: G W 5.9.0-rc1-sos-linux #1
> [ 124.051286] Hardware name: AMD
> [ 124.051286] Workqueue: 0x0 (events)
> [ 124.051287] RIP: 0010:finish_task_switch+0x289/0x290
> [ 124.051288] Code: ff 65 48 8b 04 25 c0 7b 01 00 8b 90 a8 08 00 00 48 8d b0 b0 0a 00 00 48 c7 c7 20 10 10 86 c6 05 be aa 55 01 01 e8 89 03 fd ff <0f> 0b e9 6b ff ff ff 55 48 89 e5 41 55 41 54 49 89 fc 53 48 89 f3
> [ 124.051288] RSP: 0018:ffffc9001afe7e10 EFLAGS: 00010082
> [ 124.051289] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000023
> [ 124.051290] RDX: 0000000000000023 RSI: ffffffff86101044 RDI: ffff88900d798bb0
> [ 124.051290] RBP: ffffc9001afe7e38 R08: ffff88900d798ba8 R09: 0000000000000005
> [ 124.051290] R10: 000000000000000f R11: ffff88900d798d54 R12: ffff88900d7aacc0
> [ 124.051291] R13: ffff889bd2308000 R14: 0000000000000000 R15: ffff88900d7aacc0
> [ 124.051291] FS: 0000000000000000(0000) GS:ffff88900d780000(0000) knlGS:0000000000000000
> [ 124.051292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 124.051292] CR2: 00007ff607620000 CR3: 0000001bcb0d2000 CR4: 0000000000350ee0
> [ 124.051293] Call Trace:
> [ 124.051293] __schedule+0x348/0x810
> [ 124.051293] ? dbs_work_handler+0x47/0x60
> [ 124.051294] schedule+0x4a/0xb0
> [ 124.051294] worker_thread+0xcf/0x3b0
> [ 124.051294] ? process_one_work+0x370/0x370
> [ 124.051294] kthread+0xfe/0x140
> [ 124.051295] ? kthread_park+0x90/0x90
> [ 124.051295] ret_from_fork+0x22/0x30
> [ 124.051295] ---[ end trace 7f77ee8ad05caa89 ]---
> [ 124.051296] Kernel Offset: disabled
>
> Specifying nofsgsbase avoids the issue. This is very reproducible, so I
> can easily test any fixes.
>
> Thanks,
> Tom
>

2020-08-19 21:26:23

by Andy Lutomirski

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky <[email protected]> wrote:
>
> On 8/19/20 1:07 PM, Tom Lendacky wrote:
> > It looks like the FSGSBASE support is crashing my second generation EPYC
> > system. I was able to bisect it to:
> >
> > b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
> >
> > The panic only happens when using KVM. Doing kernel builds or stress
> > on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
> > guest and do a kernel build within the guest, I get the following:
>
> I should clarify that this panic is on the bare-metal system, not in the
> guest. And that specifying nofsgsbase on the bare-metal command line fixes
> the issue.

I certainly see some oddities:

We have this code:

static void svm_vcpu_put(struct kvm_vcpu *vcpu)
{
        struct vcpu_svm *svm = to_svm(vcpu);
        int i;

        avic_vcpu_put(vcpu);

        ++vcpu->stat.host_state_reload;
        kvm_load_ldt(svm->host.ldt);
#ifdef CONFIG_X86_64
        loadsegment(fs, svm->host.fs);
        wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
        load_gs_index(svm->host.gs);

Surely that should do load_gs_index() *before* wrmsrl(). But that's
not the problem at hand.

There are also some open-coded rdmsr and wrmsrs of MSR_GS_BASE --
surely these should be x86_gsbase_read_cpu() and
x86_gsbase_write_cpu(). (Those functions don't actually exist, but
the fsbase equivalents do, and we should add them.) But that's also
not the problem at hand.
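
For illustration, a sketch of what those helpers might look like, mirroring
the fsbase ones in arch/x86/include/asm/fsgsbase.h (the gsbase names are
the hypothetical ones that don't exist yet):

        /* Hypothetical -- mirrors x86_fsbase_read_cpu()/x86_fsbase_write_cpu() */
        static inline unsigned long x86_gsbase_read_cpu(void)
        {
                unsigned long gsbase;

                if (boot_cpu_has(X86_FEATURE_FSGSBASE))
                        gsbase = rdgsbase();    /* reads the active GS base */
                else
                        rdmsrl(MSR_GS_BASE, gsbase);

                return gsbase;
        }

        static inline void x86_gsbase_write_cpu(unsigned long gsbase)
        {
                if (boot_cpu_has(X86_FEATURE_FSGSBASE))
                        wrgsbase(gsbase);
                else
                        wrmsrl(MSR_GS_BASE, gsbase);
        }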

I haven't actually spotted the bug yet...

--Andy

2020-08-20 00:23:17

by Andy Lutomirski

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]> wrote:
>
> On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky <[email protected]> wrote:
> >
> > On 8/19/20 1:07 PM, Tom Lendacky wrote:
> > > It looks like the FSGSBASE support is crashing my second generation EPYC
> > > system. I was able to bisect it to:
> > >
> > > b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
> > >
> > > The panic only happens when using KVM. Doing kernel builds or stress
> > > on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
> > > guest and do a kernel build within the guest, I get the following:
> >
> > I should clarify that this panic is on the bare-metal system, not in the
> > guest. And that specifying nofsgsbase on the bare-metal command line fixes
> > the issue.
>
> I certainly see some oddities:
>
> We have this code:
>
> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> {
>         struct vcpu_svm *svm = to_svm(vcpu);
>         int i;
>
>         avic_vcpu_put(vcpu);
>
>         ++vcpu->stat.host_state_reload;
>         kvm_load_ldt(svm->host.ldt);
> #ifdef CONFIG_X86_64
>         loadsegment(fs, svm->host.fs);
>         wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
>         load_gs_index(svm->host.gs);
>
> Surely that should do load_gs_index() *before* wrmsrl(). But that's
> not the problem at hand.
>
> There are also some open-coded rdmsr and wrmsrs of MSR_GS_BASE --
> surely these should be x86_gsbase_read_cpu() and
> x86_gsbase_write_cpu(). (Those functions don't actually exist, but
> the fsbase equivalents do, and we should add them.) But that's also
> not the problem at hand.

Make that cpu_kernelmode_gs_base(cpu). Perf win on all CPUs.

But I still don't see the bug.

2020-08-20 13:51:52

by Paolo Bonzini

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 19/08/20 23:25, Andy Lutomirski wrote:
> wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> load_gs_index(svm->host.gs);
>
> Surely that should do load_gs_index() *before* wrmsrl(). But that's
> not the problem at hand.

The wrmsrl is writing the inactive GS base so the ordering between
load_gs_index and wrmsrl(MSR_KERNEL_GS_BASE) should be irrelevant?

Paolo

2020-08-20 15:12:35

by Sean Christopherson

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
> On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]> wrote:
> >
> > On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky <[email protected]> wrote:
> > >
> > > On 8/19/20 1:07 PM, Tom Lendacky wrote:
> > > > It looks like the FSGSBASE support is crashing my second generation EPYC
> > > > system. I was able to bisect it to:
> > > >
> > > > b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
> > > >
> > > > The panic only happens when using KVM. Doing kernel builds or stress
> > > > on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
> > > > guest and do a kernel build within the guest, I get the following:
> > >
> > > I should clarify that this panic is on the bare-metal system, not in the
> > > guest. And that specifying nofsgsbase on the bare-metal command line fixes
> > > the issue.
> >
> > I certainly see some oddities:
> >
> > We have this code:
> >
> > static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> > {
> >         struct vcpu_svm *svm = to_svm(vcpu);
> >         int i;
> >
> >         avic_vcpu_put(vcpu);
> >
> >         ++vcpu->stat.host_state_reload;
> >         kvm_load_ldt(svm->host.ldt);
> > #ifdef CONFIG_X86_64
> >         loadsegment(fs, svm->host.fs);
> >         wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);

Pretty sure current->thread.gsbase can be stale, i.e. this needs:

        current_save_fsgs();
        wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);

On a related topic, we really should consolidate the VMX and SVM code for
these flows, they're both ugly.

> > load_gs_index(svm->host.gs);
> >
> > Surely that should do load_gs_index() *before* wrmsrl(). But that's
> > not the problem at hand.
> >
> > There are also some open-coded rdmsr and wrmsrs of MSR_GS_BASE --
> > surely these should be x86_gsbase_read_cpu() and
> > x86_gsbase_write_cpu(). (Those functions don't actually exist, but
> > the fsbase equivalents do, and we should add them.) But that's also
> > not the problem at hand.
>
> Make that cpu_kernelmode_gs_base(cpu). Perf win on all CPUs.
>
> But I still don't see the bug.


2020-08-20 15:24:47

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/20/20 10:10 AM, Sean Christopherson wrote:
> On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
>> On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]> wrote:
>>>
>>> On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky <[email protected]> wrote:
>>>>
>>>> On 8/19/20 1:07 PM, Tom Lendacky wrote:
>>>>> It looks like the FSGSBASE support is crashing my second generation EPYC
>>>>> system. I was able to bisect it to:
>>>>>
>>>>> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
>>>>>
>>>>> The panic only happens when using KVM. Doing kernel builds or stress
>>>>> on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
>>>>> guest and do a kernel build within the guest, I get the following:
>>>>
>>>> I should clarify that this panic is on the bare-metal system, not in the
>>>> guest. And that specifying nofsgsbase on the bare-metal command line fixes
>>>> the issue.
>>>
>>> I certainly see some oddities:
>>>
>>> We have this code:
>>>
>>> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>>> {
>>>         struct vcpu_svm *svm = to_svm(vcpu);
>>>         int i;
>>>
>>>         avic_vcpu_put(vcpu);
>>>
>>>         ++vcpu->stat.host_state_reload;
>>>         kvm_load_ldt(svm->host.ldt);
>>> #ifdef CONFIG_X86_64
>>>         loadsegment(fs, svm->host.fs);
>>>         wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
>
> Pretty sure current->thread.gsbase can be stale, i.e. this needs:
>
> current_save_fsgs();

I did try adding current_save_fsgs() in svm_vcpu_load(), saving the
current->thread.gsbase value to a new variable in the svm struct. I then
used that variable in the wrmsrl below, but it still crashed.

Thanks,
Tom

> wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
>
> On a related topic, we really should consolidate the VMX and SVM code for
> these flows, they're both ugly.
>
>>> load_gs_index(svm->host.gs);
>>>
>>> Surely that should do load_gs_index() *before* wrmsrl(). But that's
>>> not the problem at hand.
>>>
>>> There are also some open-coded rdmsr and wrmsrs of MSR_GS_BASE --
>>> surely these should be x86_gsbase_read_cpu() and
>>> x86_gsbase_write_cpu(). (Those functions don't actually exist, but
>>> the fsbase equivalents do, and we should add them.) But that's also
>>> not the problem at hand.
>>
>> Make that cpu_kernelmode_gs_base(cpu). Perf win on all CPUs.
>>
>> But I still don't see the bug.
>
>

2020-08-20 16:20:00

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/20/20 10:55 AM, Andy Lutomirski wrote:
> On Thu, Aug 20, 2020 at 8:21 AM Tom Lendacky <[email protected]> wrote:
>>
>> On 8/20/20 10:10 AM, Sean Christopherson wrote:
>>> On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
>>>> On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]> wrote:
>>>>>
>>>>> On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky <[email protected]> wrote:
>>>>>>
>>>>>> On 8/19/20 1:07 PM, Tom Lendacky wrote:
>>>>>>> It looks like the FSGSBASE support is crashing my second generation EPYC
>>>>>>> system. I was able to bisect it to:
>>>>>>>
>>>>>>> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
>>>>>>>
>>>>>>> The panic only happens when using KVM. Doing kernel builds or stress
>>>>>>> on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
>>>>>>> guest and do a kernel build within the guest, I get the following:
>>>>>>
>>>>>> I should clarify that this panic is on the bare-metal system, not in the
>>>>>> guest. And that specifying nofsgsbase on the bare-metal command line fixes
>>>>>> the issue.
>>>>>
>>>>> I certainly see some oddities:
>>>>>
>>>>> We have this code:
>>>>>
>>>>> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>>>>> {
>>>>>         struct vcpu_svm *svm = to_svm(vcpu);
>>>>>         int i;
>>>>>
>>>>>         avic_vcpu_put(vcpu);
>>>>>
>>>>>         ++vcpu->stat.host_state_reload;
>>>>>         kvm_load_ldt(svm->host.ldt);
>>>>> #ifdef CONFIG_X86_64
>>>>>         loadsegment(fs, svm->host.fs);
>>>>>         wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
>>>
>>> Pretty sure current->thread.gsbase can be stale, i.e. this needs:
>>>
>>> current_save_fsgs();
>>
>> I did try adding current_save_fsgs() in svm_vcpu_load(), saving the
>> current->thread.gsbase value to a new variable in the svm struct. I then
>> used that variable in the wrmsrl below, but it still crashed.
>
> Can you try bisecting all the way back to:
>
> commit dd649bd0b3aa012740059b1ba31ecad28a408f7f
> Author: Andy Lutomirski <[email protected]>
> Date: Thu May 28 16:13:48 2020 -0400
>
> x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
>
> and adding the unsafe_fsgsbase command line option while you bisect.

I'll give that a try.

>
> Also, you're crashing when you run a guest, right? Can you try

Right, when the guest is running. The guest boots fine and only when I put
some stress on it (kernel build) does it cause the issue. It might be
worth trying to pin all the vCPUs and see if the crash still happens.

> running the x86 selftests on a bad kernel without running any guests?

I'll give that a try.

Thanks,
Tom

>
> --Andy
>

2020-08-20 16:40:39

by Andy Lutomirski

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Thu, Aug 20, 2020 at 8:21 AM Tom Lendacky <[email protected]> wrote:
>
> On 8/20/20 10:10 AM, Sean Christopherson wrote:
> > On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
> >> On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]> wrote:
> >>>
> >>> On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky <[email protected]> wrote:
> >>>>
> >>>> On 8/19/20 1:07 PM, Tom Lendacky wrote:
> >>>>> It looks like the FSGSBASE support is crashing my second generation EPYC
> >>>>> system. I was able to bisect it to:
> >>>>>
> >>>>> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
> >>>>>
> >>>>> The panic only happens when using KVM. Doing kernel builds or stress
> >>>>> on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
> >>>>> guest and do a kernel build within the guest, I get the following:
> >>>>
> >>>> I should clarify that this panic is on the bare-metal system, not in the
> >>>> guest. And that specifying nofsgsbase on the bare-metal command line fixes
> >>>> the issue.
> >>>
> >>> I certainly see some oddities:
> >>>
> >>> We have this code:
> >>>
> >>> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> >>> {
> >>>         struct vcpu_svm *svm = to_svm(vcpu);
> >>>         int i;
> >>>
> >>>         avic_vcpu_put(vcpu);
> >>>
> >>>         ++vcpu->stat.host_state_reload;
> >>>         kvm_load_ldt(svm->host.ldt);
> >>> #ifdef CONFIG_X86_64
> >>>         loadsegment(fs, svm->host.fs);
> >>>         wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> >
> > Pretty sure current->thread.gsbase can be stale, i.e. this needs:
> >
> > current_save_fsgs();
>
> I did try adding current_save_fsgs() in svm_vcpu_load(), saving the
> current->thread.gsbase value to a new variable in the svm struct. I then
> used that variable in the wrmsrl below, but it still crashed.

Can you try bisecting all the way back to:

commit dd649bd0b3aa012740059b1ba31ecad28a408f7f
Author: Andy Lutomirski <[email protected]>
Date: Thu May 28 16:13:48 2020 -0400

x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE

and adding the unsafe_fsgsbase command line option while you bisect.

Also, you're crashing when you run a guest, right? Can you try
running the x86 selftests on a bad kernel without running any guests?

--Andy

2020-08-20 17:53:57

by Andy Lutomirski

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Thu, Aug 20, 2020 at 6:43 AM Paolo Bonzini <[email protected]> wrote:
>
> On 19/08/20 23:25, Andy Lutomirski wrote:
> > wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> > load_gs_index(svm->host.gs);
> >
> > Surely that should do load_gs_index() *before* wrmsrl(). But that's
> > not the problem at hand.
>
> The wrmsrl is writing the inactive GS base so the ordering between
> load_gs_index and wrmsrl(MSR_KERNEL_GS_BASE) should be irrelevant?

load_gs_index() sets the index between a pair of swapgs's -- it writes
the inactive base, too.
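
Roughly, simplified from native_load_gs_index in arch/x86/entry/entry_64.S
(the real code also handles a faulting segment load):

        swapgs                   /* the inactive base becomes the active one */
        movl    %edi, %gs        /* the segment load replaces the GS base */
        swapgs                   /* swap back: the *inactive* base changed */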

--Andy

2020-08-20 18:41:35

by Jim Mattson

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Thu, Aug 20, 2020 at 11:38 AM Jim Mattson <[email protected]> wrote:
>
> On Thu, Aug 20, 2020 at 11:34 AM Tom Lendacky <[email protected]> wrote:
> >
> > On 8/20/20 11:30 AM, Tom Lendacky wrote:
> > > On 8/20/20 11:17 AM, Tom Lendacky wrote:
> > >> On 8/20/20 10:55 AM, Andy Lutomirski wrote:
> > >>> On Thu, Aug 20, 2020 at 8:21 AM Tom Lendacky <[email protected]>
> > >>> wrote:
> > >>>>
> > >>>> On 8/20/20 10:10 AM, Sean Christopherson wrote:
> > >>>>> On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
> > >>>>>> On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]>
> > >>>>>> wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky
> > >>>>>>> <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>> On 8/19/20 1:07 PM, Tom Lendacky wrote:
> > >>>>>>>>> It looks like the FSGSBASE support is crashing my second
> > >>>>>>>>> generation EPYC
> > >>>>>>>>> system. I was able to bisect it to:
> > >>>>>>>>>
> > >>>>>>>>> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and
> > >>>>>>>>> add a chicken bit")
> > >>>>>>>>>
> > >>>>>>>>> The panic only happens when using KVM. Doing kernel builds or stress
> > >>>>>>>>> on bare-metal appears fine. But if I fire up, in this case, a
> > >>>>>>>>> 64-vCPU
> > >>>>>>>>> guest and do a kernel build within the guest, I get the following:
> > >>>>>>>>
> > >>>>>>>> I should clarify that this panic is on the bare-metal system, not
> > >>>>>>>> in the
> > >>>>>>>> guest. And that specifying nofsgsbase on the bare-metal command
> > >>>>>>>> line fixes
> > >>>>>>>> the issue.
> > >>>>>>>
> > >>>>>>> I certainly see some oddities:
> > >>>>>>>
> > >>>>>>> We have this code:
> > >>>>>>>
> > >>>>>>> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> > >>>>>>> {
> > >>>>>>>         struct vcpu_svm *svm = to_svm(vcpu);
> > >>>>>>>         int i;
> > >>>>>>>
> > >>>>>>>         avic_vcpu_put(vcpu);
> > >>>>>>>
> > >>>>>>>         ++vcpu->stat.host_state_reload;
> > >>>>>>>         kvm_load_ldt(svm->host.ldt);
> > >>>>>>> #ifdef CONFIG_X86_64
> > >>>>>>>         loadsegment(fs, svm->host.fs);
> > >>>>>>>         wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> > >>>>>
> > >>>>> Pretty sure current->thread.gsbase can be stale, i.e. this needs:
> > >>>>>
> > >>>>> current_save_fsgs();
> > >>>>
> > >>>> I did try adding current_save_fsgs() in svm_vcpu_load(), saving the
> > >>>> current->thread.gsbase value to a new variable in the svm struct. I then
> > >>>> used that variable in the wrmsrl below, but it still crashed.
> > >>>
> > >>> Can you try bisecting all the way back to:
> > >>>
> > >>> commit dd649bd0b3aa012740059b1ba31ecad28a408f7f
> > >>> Author: Andy Lutomirski <[email protected]>
> > >>> Date: Thu May 28 16:13:48 2020 -0400
> > >>>
> > >>> x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
> > >>>
> > >>> and adding the unsafe_fsgsbase command line option while you bisect.
> > >>
> > >> I'll give that a try.
> >
> > Bisecting with unsafe_fsgsbase identified:
> >
> > c82965f9e530 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
> >
> > But I'm thinking that could be because it starts using GET_PERCPU_BASE,
> > which on Rome would use RDPID. So is SVM restoring MSR_TSC_AUX too late?
> > That would explain why I don't see the issue on Naples, which doesn't
> > support RDPID.
>
> It looks to me like SVM loads the guest TSC_AUX from vcpu_load to
> vcpu_put, with this comment:
>
> /* This assumes that the kernel never uses MSR_TSC_AUX */
> if (static_cpu_has(X86_FEATURE_RDTSCP))
>         wrmsrl(MSR_TSC_AUX, svm->tsc_aux);

Correction: It never restores TSC_AUX, AFAICT.

2020-08-20 18:46:43

by Chang S. Bae

Subject: Re: FSGSBASE causing panic on 5.9-rc1


> On Aug 20, 2020, at 08:21, Tom Lendacky <[email protected]> wrote:
> On 8/20/20 10:10 AM, Sean Christopherson wrote:
>>
>> Pretty sure current->thread.gsbase can be stale, i.e. this needs:
>> current_save_fsgs();
>
> I did try adding current_save_fsgs() in svm_vcpu_load(), saving the current->thread.gsbase value to a new variable in the svm struct. I then used that variable in the wrmsrl below, but it still crashed.

Then current->thread.gsbase comes from __rdgsbase_inactive(), which is the
user GSBASE.

If you do the wrmsrl below, it overwrites the current GSBASE with the
user value.

>> wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);

Thanks,
Chang

2020-08-20 19:07:15

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1



On 8/20/20 2:04 PM, Tom Lendacky wrote:
> On 8/20/20 1:41 PM, Tom Lendacky wrote:
>> On 8/20/20 1:39 PM, Jim Mattson wrote:
>>> On Thu, Aug 20, 2020 at 11:38 AM Jim Mattson <[email protected]> wrote:
>>>>
>>>> On Thu, Aug 20, 2020 at 11:34 AM Tom Lendacky
>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>> Bisecting with unsafe_fsgsbase identified:
>>>>>
>>>>> c82965f9e530 ("x86/entry/64: Handle FSGSBASE enabled paranoid
>>>>> entry/exit")
>>>>>
>>>>> But I'm thinking that could be because it starts using GET_PERCPU_BASE,
>>>>> which on Rome would use RDPID. So is SVM restoring MSR_TSC_AUX too late?
>>>>> That would explain why I don't see the issue on Naples, which doesn't
>>>>> support RDPID.
>>>>
>>>> It looks to me like SVM loads the guest TSC_AUX from vcpu_load to
>>>> vcpu_put, with this comment:
>>>>
>>>> /* This assumes that the kernel never uses MSR_TSC_AUX */
>>>> if (static_cpu_has(X86_FEATURE_RDTSCP))
>>>>          wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>>>
>>> Correction: It never restores TSC_AUX, AFAICT.
>>
>> It does, it's in the host_save_user_msrs array.
>
> I added a quick hack to save TSC_AUX to a new variable in the SVM struct
> and then restore it right after VMEXIT (just after where GS is restored in
> svm_vcpu_enter_exit()) and my guest is no longer crashing.

Sorry, I mean my host is no longer crashing.

Thanks,
Tom

>
> Thanks,
> Tom
>
>>
>> Thanks,
>> Tom
>>
>>>

2020-08-20 19:07:38

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/20/20 1:41 PM, Tom Lendacky wrote:
> On 8/20/20 1:39 PM, Jim Mattson wrote:
>> On Thu, Aug 20, 2020 at 11:38 AM Jim Mattson <[email protected]> wrote:
>>>
>>> On Thu, Aug 20, 2020 at 11:34 AM Tom Lendacky <[email protected]>
>>> wrote:
>>>>
>>>>
>>>> Bisecting with unsafe_fsgsbase identified:
>>>>
>>>> c82965f9e530 ("x86/entry/64: Handle FSGSBASE enabled paranoid
>>>> entry/exit")
>>>>
>>>> But I'm thinking that could be because it starts using GET_PERCPU_BASE,
>>>> which on Rome would use RDPID. So is SVM restoring MSR_TSC_AUX too late?
>>>> That would explain why I don't see the issue on Naples, which doesn't
>>>> support RDPID.
>>>
>>> It looks to me like SVM loads the guest TSC_AUX from vcpu_load to
>>> vcpu_put, with this comment:
>>>
>>> /* This assumes that the kernel never uses MSR_TSC_AUX */
>>> if (static_cpu_has(X86_FEATURE_RDTSCP))
>>>          wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>>
>> Correction: It never restores TSC_AUX, AFAICT.
>
> It does, it's in the host_save_user_msrs array.

I added a quick hack to save TSC_AUX to a new variable in the SVM struct
and then restore it right after VMEXIT (just after where GS is restored in
svm_vcpu_enter_exit()) and my guest is no longer crashing.
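
Roughly this, as a sketch (the host_tsc_aux field name is made up):

        /* in svm_vcpu_load(), stash the host value: */
        if (static_cpu_has(X86_FEATURE_RDTSCP))
                rdmsrl(MSR_TSC_AUX, svm->host_tsc_aux);

        /*
         * in svm_vcpu_enter_exit(), just after GS is restored, so an NMI
         * doing RDPID no longer sees the guest's TSC_AUX:
         */
        if (static_cpu_has(X86_FEATURE_RDTSCP))
                wrmsrl(MSR_TSC_AUX, svm->host_tsc_aux);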

Thanks,
Tom

>
> Thanks,
> Tom
>
>>

2020-08-20 19:55:01

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/20/20 11:17 AM, Tom Lendacky wrote:
> On 8/20/20 10:55 AM, Andy Lutomirski wrote:
>> On Thu, Aug 20, 2020 at 8:21 AM Tom Lendacky <[email protected]>
>> wrote:
>>>
>>> On 8/20/20 10:10 AM, Sean Christopherson wrote:
>>>> On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
>>>>> On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]> wrote:
>>>>>>
>>>>>> On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> On 8/19/20 1:07 PM, Tom Lendacky wrote:
>>>>>>>> It looks like the FSGSBASE support is crashing my second
>>>>>>>> generation EPYC
>>>>>>>> system. I was able to bisect it to:
>>>>>>>>
>>>>>>>> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and
>>>>>>>> add a chicken bit")
>>>>>>>>
>>>>>>>> The panic only happens when using KVM. Doing kernel builds or stress
>>>>>>>> on bare-metal appears fine. But if I fire up, in this case, a 64-vCPU
>>>>>>>> guest and do a kernel build within the guest, I get the following:
>>>>>>>
>>>>>>> I should clarify that this panic is on the bare-metal system, not
>>>>>>> in the
>>>>>>> guest. And that specifying nofsgsbase on the bare-metal command
>>>>>>> line fixes
>>>>>>> the issue.
>>>>>>
>>>>>> I certainly see some oddities:
>>>>>>
>>>>>> We have this code:
>>>>>>
>>>>>> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>>>>>> {
>>>>>>           struct vcpu_svm *svm = to_svm(vcpu);
>>>>>>           int i;
>>>>>>
>>>>>>           avic_vcpu_put(vcpu);
>>>>>>
>>>>>>           ++vcpu->stat.host_state_reload;
>>>>>>           kvm_load_ldt(svm->host.ldt);
>>>>>> #ifdef CONFIG_X86_64
>>>>>>           loadsegment(fs, svm->host.fs);
>>>>>>           wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
>>>>
>>>> Pretty sure current->thread.gsbase can be stale, i.e. this needs:
>>>>
>>>>        current_save_fsgs();
>>>
>>> I did try adding current_save_fsgs() in svm_vcpu_load(), saving the
>>> current->thread.gsbase value to a new variable in the svm struct. I then
>>> used that variable in the wrmsrl below, but it still crashed.
>>
>> Can you try bisecting all the way back to:
>>
>> commit dd649bd0b3aa012740059b1ba31ecad28a408f7f
>> Author: Andy Lutomirski <[email protected]>
>> Date:   Thu May 28 16:13:48 2020 -0400
>>
>>      x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
>>
>> and adding the unsafe_fsgsbase command line option while you bisect.
>
> I'll give that a try.
>
>>
>> Also, you're crashing when you run a guest, right?  Can you try
>
> Right, when the guest is running. The guest boots fine and only when I put
> some stress on it (kernel build) does it cause the issue. It might be
> worth trying to pin all the vCPUs and see if the crash still happens.
>
>> running the x86 selftests on a bad kernel without running any guests?
>
> I'll give that a try.

All the selftests passed.

Thanks,
Tom

>
> Thanks,
> Tom
>
>>
>> --Andy
>>

2020-08-20 20:51:04

by Paolo Bonzini

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 20/08/20 18:30, Tom Lendacky wrote:
>>> running the x86 selftests on a bad kernel without running any guests?
>>
>> I'll give that a try.
>
> All the selftests passed.

Do the KVM selftests also pass? Especially the dirty_log_test might be
interesting since it can be run for a longer time.

Paolo

2020-08-20 21:20:35

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/20/20 11:30 AM, Tom Lendacky wrote:
> On 8/20/20 11:17 AM, Tom Lendacky wrote:
>> On 8/20/20 10:55 AM, Andy Lutomirski wrote:
>>> On Thu, Aug 20, 2020 at 8:21 AM Tom Lendacky <[email protected]>
>>> wrote:
>>>>
>>>> On 8/20/20 10:10 AM, Sean Christopherson wrote:
>>>>> On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
>>>>>> On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On 8/19/20 1:07 PM, Tom Lendacky wrote:
>>>>>>>>> It looks like the FSGSBASE support is crashing my second
>>>>>>>>> generation EPYC
>>>>>>>>> system. I was able to bisect it to:
>>>>>>>>>
>>>>>>>>> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and
>>>>>>>>> add a chicken bit")
>>>>>>>>>
>>>>>>>>> The panic only happens when using KVM. Doing kernel builds or stress
>>>>>>>>> on bare-metal appears fine. But if I fire up, in this case, a
>>>>>>>>> 64-vCPU
>>>>>>>>> guest and do a kernel build within the guest, I get the following:
>>>>>>>>
>>>>>>>> I should clarify that this panic is on the bare-metal system, not
>>>>>>>> in the
>>>>>>>> guest. And that specifying nofsgsbase on the bare-metal command
>>>>>>>> line fixes
>>>>>>>> the issue.
>>>>>>>
>>>>>>> I certainly see some oddities:
>>>>>>>
>>>>>>> We have this code:
>>>>>>>
>>>>>>> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>>>>>>> {
>>>>>>>           struct vcpu_svm *svm = to_svm(vcpu);
>>>>>>>           int i;
>>>>>>>
>>>>>>>           avic_vcpu_put(vcpu);
>>>>>>>
>>>>>>>           ++vcpu->stat.host_state_reload;
>>>>>>>           kvm_load_ldt(svm->host.ldt);
>>>>>>> #ifdef CONFIG_X86_64
>>>>>>>           loadsegment(fs, svm->host.fs);
>>>>>>>           wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
>>>>>
>>>>> Pretty sure current->thread.gsbase can be stale, i.e. this needs:
>>>>>
>>>>>        current_save_fsgs();
>>>>
>>>> I did try adding current_save_fsgs() in svm_vcpu_load(), saving the
>>>> current->thread.gsbase value to a new variable in the svm struct. I then
>>>> used that variable in the wrmsrl below, but it still crashed.
>>>
>>> Can you try bisecting all the way back to:
>>>
>>> commit dd649bd0b3aa012740059b1ba31ecad28a408f7f
>>> Author: Andy Lutomirski <[email protected]>
>>> Date:   Thu May 28 16:13:48 2020 -0400
>>>
>>>      x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
>>>
>>> and adding the unsafe_fsgsbase command line option while you bisect.
>>
>> I'll give that a try.

Bisecting with unsafe_fsgsbase identified:

c82965f9e530 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")

But I'm thinking that could be because it starts using GET_PERCPU_BASE,
which on Rome would use RDPID. So is SVM restoring MSR_TSC_AUX too late?
That would explain why I don't see the issue on Naples, which doesn't
support RDPID.
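
For reference, GET_PERCPU_BASE in arch/x86/entry/calling.h resolves the
per-CPU base roughly like this (simplified):

        .macro GET_PERCPU_BASE reg:req
                ALTERNATIVE \
                        "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
                        "RDPID  \reg", \
                        X86_FEATURE_RDPID
                andq    $VDSO_CPUNODE_MASK, \reg
                movq    __per_cpu_offset(, \reg, 8), \reg
        .endm

On RDPID-capable parts the CPU number comes straight from TSC_AUX, so an
NMI that lands while TSC_AUX still holds the guest's value would pick the
wrong per-CPU area.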

Thanks,
Tom

>>
>>>
>>> Also, you're crashing when you run a guest, right?  Can you try
>>
>> Right, when the guest is running. The guest boots fine and only when I
>> put some stress on it (kernel build) does it cause the issue. It might
>> be worth trying to pin all the vCPUs and see if the crash still happens.
>>
>>> running the x86 selftests on a bad kernel without running any guests?
>>
>> I'll give that a try.
>
> All the selftests passed.
>
> Thanks,
> Tom
>
>>
>> Thanks,
>> Tom
>>
>>>
>>> --Andy
>>>

2020-08-20 21:21:00

by Jim Mattson

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Thu, Aug 20, 2020 at 11:34 AM Tom Lendacky <[email protected]> wrote:
>
> On 8/20/20 11:30 AM, Tom Lendacky wrote:
> > On 8/20/20 11:17 AM, Tom Lendacky wrote:
> >> On 8/20/20 10:55 AM, Andy Lutomirski wrote:
> >>> On Thu, Aug 20, 2020 at 8:21 AM Tom Lendacky <[email protected]>
> >>> wrote:
> >>>>
> >>>> On 8/20/20 10:10 AM, Sean Christopherson wrote:
> >>>>> On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
> >>>>>> On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky
> >>>>>>> <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> On 8/19/20 1:07 PM, Tom Lendacky wrote:
> >>>>>>>>> It looks like the FSGSBASE support is crashing my second
> >>>>>>>>> generation EPYC
> >>>>>>>>> system. I was able to bisect it to:
> >>>>>>>>>
> >>>>>>>>> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and
> >>>>>>>>> add a chicken bit")
> >>>>>>>>>
> >>>>>>>>> The panic only happens when using KVM. Doing kernel builds or stress
> >>>>>>>>> on bare-metal appears fine. But if I fire up, in this case, a
> >>>>>>>>> 64-vCPU
> >>>>>>>>> guest and do a kernel build within the guest, I get the following:
> >>>>>>>>
> >>>>>>>> I should clarify that this panic is on the bare-metal system, not
> >>>>>>>> in the
> >>>>>>>> guest. And that specifying nofsgsbase on the bare-metal command
> >>>>>>>> line fixes
> >>>>>>>> the issue.
> >>>>>>>
> >>>>>>> I certainly see some oddities:
> >>>>>>>
> >>>>>>> We have this code:
> >>>>>>>
> >>>>>>> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> >>>>>>> {
> >>>>>>>         struct vcpu_svm *svm = to_svm(vcpu);
> >>>>>>>         int i;
> >>>>>>>
> >>>>>>>         avic_vcpu_put(vcpu);
> >>>>>>>
> >>>>>>>         ++vcpu->stat.host_state_reload;
> >>>>>>>         kvm_load_ldt(svm->host.ldt);
> >>>>>>> #ifdef CONFIG_X86_64
> >>>>>>>         loadsegment(fs, svm->host.fs);
> >>>>>>>         wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> >>>>>
> >>>>> Pretty sure current->thread.gsbase can be stale, i.e. this needs:
> >>>>>
> >>>>> current_save_fsgs();
> >>>>
> >>>> I did try adding current_save_fsgs() in svm_vcpu_load(), saving the
> >>>> current->thread.gsbase value to a new variable in the svm struct. I then
> >>>> used that variable in the wrmsrl below, but it still crashed.
> >>>
> >>> Can you try bisecting all the way back to:
> >>>
> >>> commit dd649bd0b3aa012740059b1ba31ecad28a408f7f
> >>> Author: Andy Lutomirski <[email protected]>
> >>> Date: Thu May 28 16:13:48 2020 -0400
> >>>
> >>> x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
> >>>
> >>> and adding the unsafe_fsgsbase command line option while you bisect.
> >>
> >> I'll give that a try.
>
> Bisecting with unsafe_fsgsbase identified:
>
> c82965f9e530 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
>
> But I'm thinking that could be because it starts using GET_PERCPU_BASE,
> which on Rome would use RDPID. So is SVM restoring MSR_TSC_AUX too late?
> That would explain why I don't see the issue on Naples, which doesn't
> support RDPID.

It looks to me like SVM loads the guest TSC_AUX from vcpu_load to
vcpu_put, with this comment:

/* This assumes that the kernel never uses MSR_TSC_AUX */
if (static_cpu_has(X86_FEATURE_RDTSCP))
        wrmsrl(MSR_TSC_AUX, svm->tsc_aux);

We are talking about mainline here, right?

2020-08-20 21:22:18

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/20/20 1:39 PM, Jim Mattson wrote:
> On Thu, Aug 20, 2020 at 11:38 AM Jim Mattson <[email protected]> wrote:
>>
>> On Thu, Aug 20, 2020 at 11:34 AM Tom Lendacky <[email protected]> wrote:
>>>
>>> On 8/20/20 11:30 AM, Tom Lendacky wrote:
>>>> On 8/20/20 11:17 AM, Tom Lendacky wrote:
>>>>> On 8/20/20 10:55 AM, Andy Lutomirski wrote:
>>>>>> On Thu, Aug 20, 2020 at 8:21 AM Tom Lendacky <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 8/20/20 10:10 AM, Sean Christopherson wrote:
>>>>>>>> On Wed, Aug 19, 2020 at 05:21:33PM -0700, Andy Lutomirski wrote:
>>>>>>>>> On Wed, Aug 19, 2020 at 2:25 PM Andy Lutomirski <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 19, 2020 at 11:19 AM Tom Lendacky
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 8/19/20 1:07 PM, Tom Lendacky wrote:
>>>>>>>>>>>> It looks like the FSGSBASE support is crashing my second
>>>>>>>>>>>> generation EPYC
>>>>>>>>>>>> system. I was able to bisect it to:
>>>>>>>>>>>>
>>>>>>>>>>>> b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and
>>>>>>>>>>>> add a chicken bit")
>>>>>>>>>>>>
>>>>>>>>>>>> The panic only happens when using KVM. Doing kernel builds or stress
>>>>>>>>>>>> on bare-metal appears fine. But if I fire up, in this case, a
>>>>>>>>>>>> 64-vCPU
>>>>>>>>>>>> guest and do a kernel build within the guest, I get the following:
>>>>>>>>>>>
>>>>>>>>>>> I should clarify that this panic is on the bare-metal system, not
>>>>>>>>>>> in the
>>>>>>>>>>> guest. And that specifying nofsgsbase on the bare-metal command
>>>>>>>>>>> line fixes
>>>>>>>>>>> the issue.
>>>>>>>>>>
>>>>>>>>>> I certainly see some oddities:
>>>>>>>>>>
>>>>>>>>>> We have this code:
>>>>>>>>>>
>>>>>>>>>> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>>>>>>>>>> {
>>>>>>>>>>         struct vcpu_svm *svm = to_svm(vcpu);
>>>>>>>>>>         int i;
>>>>>>>>>>
>>>>>>>>>>         avic_vcpu_put(vcpu);
>>>>>>>>>>
>>>>>>>>>>         ++vcpu->stat.host_state_reload;
>>>>>>>>>>         kvm_load_ldt(svm->host.ldt);
>>>>>>>>>> #ifdef CONFIG_X86_64
>>>>>>>>>>         loadsegment(fs, svm->host.fs);
>>>>>>>>>>         wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
>>>>>>>>
>>>>>>>> Pretty sure current->thread.gsbase can be stale, i.e. this needs:
>>>>>>>>
>>>>>>>> current_save_fsgs();
>>>>>>>
>>>>>>> I did try adding current_save_fsgs() in svm_vcpu_load(), saving the
>>>>>>> current->thread.gsbase value to a new variable in the svm struct. I then
>>>>>>> used that variable in the wrmsrl below, but it still crashed.
>>>>>>
>>>>>> Can you try bisecting all the way back to:
>>>>>>
>>>>>> commit dd649bd0b3aa012740059b1ba31ecad28a408f7f
>>>>>> Author: Andy Lutomirski <[email protected]>
>>>>>> Date: Thu May 28 16:13:48 2020 -0400
>>>>>>
>>>>>> x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
>>>>>>
>>>>>> and adding the unsafe_fsgsbase command line option while you bisect.
>>>>>
>>>>> I'll give that a try.
>>>
>>> Bisecting with unsafe_fsgsbase identified:
>>>
>>> c82965f9e530 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
>>>
>>> But I'm thinking that could be because it starts using GET_PERCPU_BASE,
>>> which on Rome would use RDPID. So is SVM restoring MSR_TSC_AUX too late?
>>> That would explain why I don't see the issue on Naples, which doesn't
>>> support RDPID.
>>
>> It looks to me like SVM loads the guest TSC_AUX from vcpu_load to
>> vcpu_put, with this comment:
>>
>> /* This assumes that the kernel never uses MSR_TSC_AUX */
>> if (static_cpu_has(X86_FEATURE_RDTSCP))
>>         wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> Correction: It never restores TSC_AUX, AFAICT.

It does, it's in the host_save_user_msrs array.

Thanks,
Tom

>

2020-08-20 21:51:38

by Dave Hansen

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/20/20 12:05 PM, Tom Lendacky wrote:
>> I added a quick hack to save TSC_AUX to a new variable in the SVM
>> struct and then restore it right after VMEXIT (just after where GS is
>> restored in svm_vcpu_enter_exit()) and my guest is no longer crashing.
>
> Sorry, I mean my host is no longer crashing.

Just to make sure I've got this:
1. Older CPUs didn't have X86_FEATURE_RDPID
2. FSGSBASE patches started using RDPID in the NMI entry path when
supported *AND* FSGSBASE was enabled
3. There was a latent SVM bug which did not restore the RDPID data
before NMIs were reenabled after VMEXIT
4. If an NMI comes in the window between VMEXIT and the
wrmsr(TSC_AUX)... boom

If FSGSBASE is reverted or disabled (as Tom did on the command-line), then
the RDPID path isn't hit.

Fun.

2020-08-20 22:17:14

by Sean Christopherson

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Thu, Aug 20, 2020 at 01:36:46PM -0700, Andy Lutomirski wrote:
>
>
> > On Aug 20, 2020, at 1:15 PM, Tom Lendacky <[email protected]> wrote:
> >
> > On 8/20/20 3:07 PM, Dave Hansen wrote:
> >> On 8/20/20 12:05 PM, Tom Lendacky wrote:
> >>>> I added a quick hack to save TSC_AUX to a new variable in the SVM
> >>>> struct and then restore it right after VMEXIT (just after where GS is
> >>>> restored in svm_vcpu_enter_exit()) and my guest is no longer crashing.
> >>>
> >>> Sorry, I mean my host is no longer crashing.
> >> Just to make sure I've got this:
> >> 1. Older CPUs didn't have X86_FEATURE_RDPID
> >> 2. FSGSBASE patches started using RDPID in the NMI entry path when
> >> supported *AND* FSGSBASE was enabled
> >> 3. There was a latent SVM bug which did not restore the RDPID data
> >> before NMIs were reenabled after VMEXIT
> >> 4. If an NMI comes in the window between VMEXIT and the
> >> wrmsr(TSC_AUX)... boom
> >
> > Right, which means that the setting of TSC_AUX to the guest value needs to be moved, too.
> >
>
> Depending on how much of a perf hit this is, we could also skip using RDPID
> in the paranoid path on SVM-capable CPUs.

Doesn't this affect VMX as well? KVM+VMX doesn't restore TSC_AUX until the
kernel returns to userspace. I don't see anything that prevents the NMI
RDPID path from affecting Intel CPUs.

Assuming that's the case, I would strongly prefer this be handled in the
paranoid path. NMIs are unblocked immediately on VMX VM-Exit, which means
using the MSR load lists in the VMCS, and I hate those with a vengeance.

Perf overhead on VMX would be 8-10% for VM-Exits that would normally stay
in KVM's run loop, e.g. ~125 cycles for the WRMSR, ~1300-1500 cycles to
handle the most common VM-Exits. It'd be even higher overhead for the
VMX preemption timer, which is handled without even enabling IRQs and is
a hot path as it's used to emulate the TSC deadline timer for the guest.

2020-08-20 22:19:52

by Andy Lutomirski

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Thu, Aug 20, 2020 at 3:05 PM Sean Christopherson
<[email protected]> wrote:
>
> On Thu, Aug 20, 2020 at 01:36:46PM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Aug 20, 2020, at 1:15 PM, Tom Lendacky <[email protected]> wrote:
> > >
> > > On 8/20/20 3:07 PM, Dave Hansen wrote:
> > >> On 8/20/20 12:05 PM, Tom Lendacky wrote:
> > >>>> I added a quick hack to save TSC_AUX to a new variable in the SVM
> > >>>> struct and then restore it right after VMEXIT (just after where GS is
> > >>>> restored in svm_vcpu_enter_exit()) and my guest is no longer crashing.
> > >>>
> > >>> Sorry, I mean my host is no longer crashing.
> > >> Just to make sure I've got this:
> > >> 1. Older CPUs didn't have X86_FEATURE_RDPID
> > >> 2. FSGSBASE patches started using RDPID in the NMI entry path when
> > >> supported *AND* FSGSBASE was enabled
> > >> 3. There was a latent SVM bug which did not restore the RDPID data
> > >> before NMIs were reenabled after VMEXIT
> > >> 4. If an NMI comes in the window between VMEXIT and the
> > >> wrmsr(TSC_AUX)... boom
> > >
> > > Right, which means that the setting of TSC_AUX to the guest value needs to be moved, too.
> > >
> >
> > Depending on how much of a perf hit this is, we could also skip using RDPID
> > in the paranoid path on SVM-capable CPUs.
>
> Doesn't this affect VMX as well? KVM+VMX doesn't restore TSC_AUX until the
> kernel returns to userspace. I don't see anything that prevents the NMI
> RDPID path from affecting Intel CPUs.
>
> Assuming that's the case, I would strongly prefer this be handled in the
> paranoid path. NMIs are unblocked immediately on VMX VM-Exit, which means
> using the MSR load lists in the VMCS, and I hate those with a vengeance.
>
> Perf overhead on VMX would be 8-10% for VM-Exits that would normally stay
> in KVM's run loop, e.g. ~125 cycles for the WRMSR, ~1300-1500 cycles to
> handle the most common VM-Exits. It'd be even higher overhead for the
> VMX preemption timer, which is handled without even enabling IRQs and is
> a hot path as it's used to emulate the TSC deadline timer for the guest.

I'm fine with that -- let's get rid of RDPID unconditionally in the
paranoid path. Want to send a patch that also adds a comment
explaining why we're not using RDPID?

--Andy

2020-08-20 22:37:21

by Sean Christopherson

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Thu, Aug 20, 2020 at 03:07:10PM -0700, Andy Lutomirski wrote:
> On Thu, Aug 20, 2020 at 3:05 PM Sean Christopherson
> <[email protected]> wrote:
> >
> > On Thu, Aug 20, 2020 at 01:36:46PM -0700, Andy Lutomirski wrote:
> > >
> > >
> > > > On Aug 20, 2020, at 1:15 PM, Tom Lendacky <[email protected]> wrote:
> > > >
> > > > On 8/20/20 3:07 PM, Dave Hansen wrote:
> > > >> On 8/20/20 12:05 PM, Tom Lendacky wrote:
> > > >>>> I added a quick hack to save TSC_AUX to a new variable in the SVM
> > > >>>> struct and then restore it right after VMEXIT (just after where GS is
> > > >>>> restored in svm_vcpu_enter_exit()) and my guest is no longer crashing.
> > > >>>
> > > >>> Sorry, I mean my host is no longer crashing.
> > > >> Just to make sure I've got this:
> > > >> 1. Older CPUs didn't have X86_FEATURE_RDPID
> > > >> 2. FSGSBASE patches started using RDPID in the NMI entry path when
> > > >> supported *AND* FSGSBASE was enabled
> > > >> 3. There was a latent SVM bug which did not restore the RDPID data
> > > >> before NMIs were reenabled after VMEXIT
> > > >> 4. If an NMI comes in the window between VMEXIT and the
> > > >> wrmsr(TSC_AUX)... boom
> > > >
> > > > Right, which means that the setting of TSC_AUX to the guest value needs to be moved, too.
> > > >
> > >
> > > Depending on how much of a perf hit this is, we could also skip using RDPID
> > > in the paranoid path on SVM-capable CPUs.
> >
> > Doesn't this affect VMX as well? KVM+VMX doesn't restore TSC_AUX until the
> > kernel returns to userspace. I don't see anything that prevents the NMI
> > RDPID path from affecting Intel CPUs.
> >
> > Assuming that's the case, I would strongly prefer this be handled in the
> > paranoid path. NMIs are unblocked immediately on VMX VM-Exit, which means
> > using the MSR load lists in the VMCS, and I hate those with a vengeance.
> >
> > Perf overhead on VMX would be 8-10% for VM-Exits that would normally stay
> > in KVM's run loop, e.g. ~125 cycles for the WRMSR, ~1300-1500 cycles to
> > handle the most common VM-Exits. It'd be even higher overhead for the
> > VMX preemption timer, which is handled without even enabling IRQs and is
> > a hot path as it's used to emulate the TSC deadline timer for the guest.
>
> I'm fine with that -- let's get rid of RDPID unconditionally in the
> paranoid path. Want to send a patch that also adds a comment
> explaining why we're not using RDPID?

Sure, though I won't object if Tom beats me to the punch :-)

2020-08-20 23:55:34

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/20/20 3:07 PM, Dave Hansen wrote:
> On 8/20/20 12:05 PM, Tom Lendacky wrote:
>>> I added a quick hack to save TSC_AUX to a new variable in the SVM
>>> struct and then restore it right after VMEXIT (just after where GS is
>>> restored in svm_vcpu_enter_exit()) and my guest is no longer crashing.
>>
>> Sorry, I mean my host is no longer crashing.
>
> Just to make sure I've got this:
> 1. Older CPUs didn't have X86_FEATURE_RDPID
> 2. FSGSBASE patches started using RDPID in the NMI entry path when
> supported *AND* FSGSBASE was enabled
> 3. There was a latent SVM bug which did not restore the RDPID data
> before NMIs were reenabled after VMEXIT
> 4. If an NMI comes in the window between VMEXIT and the
> wrmsr(TSC_AUX)... boom

Right, which means that the setting of TSC_AUX to the guest value needs to
be moved, too.

Thanks,
Tom

>
> If FSGSBASE is reverted or disabled (as Tom did on the command-line), then
> the RDPID path isn't hit.
>
> Fun.
>

2020-08-20 23:56:00

by Andy Lutomirski

Subject: Re: FSGSBASE causing panic on 5.9-rc1



> On Aug 20, 2020, at 1:15 PM, Tom Lendacky <[email protected]> wrote:
>
> On 8/20/20 3:07 PM, Dave Hansen wrote:
>> On 8/20/20 12:05 PM, Tom Lendacky wrote:
>>>> I added a quick hack to save TSC_AUX to a new variable in the SVM
>>>> struct and then restore it right after VMEXIT (just after where GS is
>>>> restored in svm_vcpu_enter_exit()) and my guest is no longer crashing.
>>>
>>> Sorry, I mean my host is no longer crashing.
>> Just to make sure I've got this:
>> 1. Older CPUs didn't have X86_FEATURE_RDPID
>> 2. FSGSBASE patches started using RDPID in the NMI entry path when
>> supported *AND* FSGSBASE was enabled
>> 3. There was a latent SVM bug which did not restore the RDPID data
>> before NMIs were reenabled after VMEXIT
>> 4. If an NMI comes in the window between VMEXIT and the
>> wrmsr(TSC_AUX)... boom
>
> Right, which means that the setting of TSC_AUX to the guest value needs to be moved, too.
>

Depending on how much of a perf hit this is, we could also skip using RDPID in the paranoid path on SVM-capable CPUs.

2020-08-21 00:04:42

by Tom Lendacky

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On 8/20/20 5:34 PM, Sean Christopherson wrote:
> On Thu, Aug 20, 2020 at 03:07:10PM -0700, Andy Lutomirski wrote:
>> On Thu, Aug 20, 2020 at 3:05 PM Sean Christopherson
>> <[email protected]> wrote:
>>>
>>> On Thu, Aug 20, 2020 at 01:36:46PM -0700, Andy Lutomirski wrote:
>>>>
>>>>
>>>>> On Aug 20, 2020, at 1:15 PM, Tom Lendacky <[email protected]> wrote:
>>>>>
>>>>> On 8/20/20 3:07 PM, Dave Hansen wrote:
>>>>>> On 8/20/20 12:05 PM, Tom Lendacky wrote:
>>>>>>>> I added a quick hack to save TSC_AUX to a new variable in the SVM
>>>>>>>> struct and then restore it right after VMEXIT (just after where GS is
>>>>>>>> restored in svm_vcpu_enter_exit()) and my guest is no longer crashing.
>>>>>>>
>>>>>>> Sorry, I mean my host is no longer crashing.
>>>>>> Just to make sure I've got this:
>>>>>> 1. Older CPUs didn't have X86_FEATURE_RDPID
>>>>>> 2. FSGSBASE patches started using RDPID in the NMI entry path when
>>>>>> supported *AND* FSGSBASE was enabled
>>>>>> 3. There was a latent SVM bug which did not restore the RDPID data
>>>>>> before NMIs were reenabled after VMEXIT
>>>>>> 4. If an NMI comes in the window between VMEXIT and the
>>>>>> wrmsr(TSC_AUX)... boom
>>>>>
>>>>> Right, which means that the setting of TSC_AUX to the guest value needs to be moved, too.
>>>>>
>>>>
>>>> Depending on how much of a perf hit this is, we could also skip using RDPID
>>>> in the paranoid path on SVM-capable CPUs.
>>>
>>> Doesn't this affect VMX as well? KVM+VMX doesn't restore TSC_AUX until the
>>> kernel returns to userspace. I don't see anything that prevents the NMI
>>> RDPID path from affecting Intel CPUs.
>>>
>>> Assuming that's the case, I would strongly prefer this be handled in the
>>> paranoid path. NMIs are unblocked immediately on VMX VM-Exit, which means
>>> using the MSR load lists in the VMCS, and I hate those with a vengeance.
>>>
>>> Perf overhead on VMX would be 8-10% for VM-Exits that would normally stay
>>> in KVM's run loop, e.g. ~125 cycles for the WRMSR, ~1300-1500 cycles to
>>> handle the most common VM-Exits. It'd be even higher overhead for the
>>> VMX preemption timer, which is handled without even enabling IRQs and is
>>> a hot path as it's used to emulate the TSC deadline timer for the guest.
>>
>> I'm fine with that -- let's get rid of RDPID unconditionally in the
>> paranoid path. Want to send a patch that also adds a comment
>> explaining why we're not using RDPID?
>
> Sure, though I won't object if Tom beats me to the punch :-)

I can do it, but won't be able to get to it until sometime tomorrow.

Thanks,
Tom

>

2020-08-21 01:59:00

by Sean Christopherson

Subject: Re: FSGSBASE causing panic on 5.9-rc1

On Thu, Aug 20, 2020 at 07:00:16PM -0500, Tom Lendacky wrote:
> On 8/20/20 5:34 PM, Sean Christopherson wrote:
> > On Thu, Aug 20, 2020 at 03:07:10PM -0700, Andy Lutomirski wrote:
> > > On Thu, Aug 20, 2020 at 3:05 PM Sean Christopherson
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, Aug 20, 2020 at 01:36:46PM -0700, Andy Lutomirski wrote:
> > > > >
> > > > > Depending on how much of a perf hit this is, we could also skip using RDPID
> > > > > in the paranoid path on SVM-capable CPUs.
> > > >
> > > > Doesn't this affect VMX as well? KVM+VMX doesn't restore TSC_AUX until the
> > > > kernel returns to userspace. I don't see anything that prevents the NMI
> > > > RDPID path from affecting Intel CPUs.
> > > >
> > > > Assuming that's the case, I would strongly prefer this be handled in the
> > > > paranoid path. NMIs are unblocked immediately on VMX VM-Exit, which means
> > > > using the MSR load lists in the VMCS, and I hate those with a vengeance.
> > > >
> > > > Perf overhead on VMX would be 8-10% for VM-Exits that would normally stay
> > > > in KVM's run loop, e.g. ~125 cycles for the WRMSR, ~1300-1500 cycles to
> > > > handle the most common VM-Exits. It'd be even higher overhead for the
> > > > VMX preemption timer, which is handled without even enabling IRQs and is
> > > > a hot path as it's used to emulate the TSC deadline timer for the guest.
> > >
> > > I'm fine with that -- let's get rid of RDPID unconditionally in the
> > > paranoid path. Want to send a patch that also adds a comment
> > > explaining why we're not using RDPID?
> >
> > Sure, though I won't object if Tom beats me to the punch :-)
>
> I can do it, but won't be able to get to it until sometime tomorrow.

Confirmed VMX goes kaboom when running perf with a VM. Patch incoming.
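
A sketch of the direction being discussed -- drop the RDPID alternative so
paranoid entry always uses the segment-limit lookup (simplified, not the
actual patch):

        .macro GET_PERCPU_BASE reg:req
                /*
                 * No RDPID here: KVM runs with the guest's TSC_AUX loaded
                 * from VM-entry until well after VM-exit, so an NMI in that
                 * window would read the guest's CPU number and pick the
                 * wrong per-CPU area.
                 */
                LOAD_CPU_AND_NODE_SEG_LIMIT \reg
                andq    $VDSO_CPUNODE_MASK, \reg
                movq    __per_cpu_offset(, \reg, 8), \reg
        .endm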