Today tried first time after long break to boot Intel Quark SoC with
most recent linux-next. Got the following warning:
[ 14.714533] WARNING: CPU: 0 PID: 823 at
arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
[ 14.726603] Modules linked in:
[ 14.729910] CPU: 0 PID: 823 Comm: kworker/u2:0 Not tainted
4.5.0-rc7-next-20160310+ #137
[ 14.738307] 00000000 00000000 ce691e20 c12b6fc9 ce691e50 c1049fd1
c1978c6c 00000000
[ 14.747000] 00000337 c196b530 000000a3 c102050c 000000a3 ce587ac0
00000000 ce653000
[ 14.755722] ce691e64 c104a095 00000009 00000000 00000000 ce691e74
c102050c ce587500
[ 14.764468] Call Trace:
[ 14.767172] [<c12b6fc9>] dump_stack+0x16/0x1d
[ 14.771889] [<c1049fd1>] __warn+0xd1/0xf0
[ 14.776253] [<c102050c>] ? fpu__clear+0x8c/0x160
[ 14.781234] [<c104a095>] warn_slowpath_null+0x25/0x30
[ 14.786648] [<c102050c>] fpu__clear+0x8c/0x160
[ 14.791447] [<c101f347>] flush_thread+0x57/0x60
[ 14.796341] [<c113a5cc>] flush_old_exec+0x4cc/0x600
[ 14.801594] [<c117ab20>] load_elf_binary+0x2b0/0x1060
[ 14.807010] [<c1111220>] ? get_user_pages_remote+0x50/0x60
[ 14.812898] [<c12c4687>] ? _copy_from_user+0x37/0x40
[ 14.818236] [<c1139f82>] search_binary_handler+0x62/0x150
[ 14.824007] [<c113b19c>] do_execveat_common+0x45c/0x600
[ 14.829647] [<c113b35f>] do_execve+0x1f/0x30
[ 14.834289] [<c1059941>] call_usermodehelper_exec_async+0x91/0xe0
[ 14.840765] [<c17f2310>] ret_from_kernel_thread+0x20/0x40
[ 14.846540] [<c10598b0>] ? umh_complete+0x40/0x40
[ 14.851626] ---[ end trace 137ff5893f9b85bf ]---
Is it know issue? Or what could I try to fix it?
Reproducibility: 3 of 3.
--
With Best Regards,
Andy Shevchenko
I've Cc:-ed more FPU developers. Mail quoted below. I don't have a Quark system to
test this on, but maybe others have an idea why this warning triggers?
My thinking is that it's related to:
58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
Thanks,
Ingo
* Andy Shevchenko <[email protected]> wrote:
> Today tried first time after long break to boot Intel Quark SoC with
> most recent linux-next. Got the following warning:
>
> [ 14.714533] WARNING: CPU: 0 PID: 823 at
> arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
> [ 14.726603] Modules linked in:
> [ 14.729910] CPU: 0 PID: 823 Comm: kworker/u2:0 Not tainted
> 4.5.0-rc7-next-20160310+ #137
> [ 14.738307] 00000000 00000000 ce691e20 c12b6fc9 ce691e50 c1049fd1
> c1978c6c 00000000
> [ 14.747000] 00000337 c196b530 000000a3 c102050c 000000a3 ce587ac0
> 00000000 ce653000
> [ 14.755722] ce691e64 c104a095 00000009 00000000 00000000 ce691e74
> c102050c ce587500
> [ 14.764468] Call Trace:
> [ 14.767172] [<c12b6fc9>] dump_stack+0x16/0x1d
> [ 14.771889] [<c1049fd1>] __warn+0xd1/0xf0
> [ 14.776253] [<c102050c>] ? fpu__clear+0x8c/0x160
> [ 14.781234] [<c104a095>] warn_slowpath_null+0x25/0x30
> [ 14.786648] [<c102050c>] fpu__clear+0x8c/0x160
> [ 14.791447] [<c101f347>] flush_thread+0x57/0x60
> [ 14.796341] [<c113a5cc>] flush_old_exec+0x4cc/0x600
> [ 14.801594] [<c117ab20>] load_elf_binary+0x2b0/0x1060
> [ 14.807010] [<c1111220>] ? get_user_pages_remote+0x50/0x60
> [ 14.812898] [<c12c4687>] ? _copy_from_user+0x37/0x40
> [ 14.818236] [<c1139f82>] search_binary_handler+0x62/0x150
> [ 14.824007] [<c113b19c>] do_execveat_common+0x45c/0x600
> [ 14.829647] [<c113b35f>] do_execve+0x1f/0x30
> [ 14.834289] [<c1059941>] call_usermodehelper_exec_async+0x91/0xe0
> [ 14.840765] [<c17f2310>] ret_from_kernel_thread+0x20/0x40
> [ 14.846540] [<c10598b0>] ? umh_complete+0x40/0x40
> [ 14.851626] ---[ end trace 137ff5893f9b85bf ]---
>
> Is it know issue? Or what could I try to fix it?
>
> Reproducibility: 3 of 3.
>
> --
> With Best Regards,
> Andy Shevchenko
On Thu, Mar 10, 2016 at 1:19 PM, Ingo Molnar <[email protected]> wrote:
>
> I've Cc:-ed more FPU developers. Mail quoted below. I don't have a Quark system to
> test this on, but maybe others have an idea why this warning triggers?
>
> My thinking is that it's related to:
>
> 58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
He-he, my cursor stays on
if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
while I was having a lunch. So far got from datasheet that some kind
of FPU is present there.
eagerfpu=auto doesn't fix
eagerfpu=off fixes the issue
>> Today tried first time after long break to boot Intel Quark SoC with
>> most recent linux-next. Got the following warning:
>>
>> [ 14.714533] WARNING: CPU: 0 PID: 823 at
>> arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
>> [ 14.726603] Modules linked in:
>> [ 14.729910] CPU: 0 PID: 823 Comm: kworker/u2:0 Not tainted
>> 4.5.0-rc7-next-20160310+ #137
>> [ 14.738307] 00000000 00000000 ce691e20 c12b6fc9 ce691e50 c1049fd1
>> c1978c6c 00000000
>> [ 14.747000] 00000337 c196b530 000000a3 c102050c 000000a3 ce587ac0
>> 00000000 ce653000
>> [ 14.755722] ce691e64 c104a095 00000009 00000000 00000000 ce691e74
>> c102050c ce587500
>> [ 14.764468] Call Trace:
>> [ 14.767172] [<c12b6fc9>] dump_stack+0x16/0x1d
>> [ 14.771889] [<c1049fd1>] __warn+0xd1/0xf0
>> [ 14.776253] [<c102050c>] ? fpu__clear+0x8c/0x160
>> [ 14.781234] [<c104a095>] warn_slowpath_null+0x25/0x30
>> [ 14.786648] [<c102050c>] fpu__clear+0x8c/0x160
>> [ 14.791447] [<c101f347>] flush_thread+0x57/0x60
>> [ 14.796341] [<c113a5cc>] flush_old_exec+0x4cc/0x600
>> [ 14.801594] [<c117ab20>] load_elf_binary+0x2b0/0x1060
>> [ 14.807010] [<c1111220>] ? get_user_pages_remote+0x50/0x60
>> [ 14.812898] [<c12c4687>] ? _copy_from_user+0x37/0x40
>> [ 14.818236] [<c1139f82>] search_binary_handler+0x62/0x150
>> [ 14.824007] [<c113b19c>] do_execveat_common+0x45c/0x600
>> [ 14.829647] [<c113b35f>] do_execve+0x1f/0x30
>> [ 14.834289] [<c1059941>] call_usermodehelper_exec_async+0x91/0xe0
>> [ 14.840765] [<c17f2310>] ret_from_kernel_thread+0x20/0x40
>> [ 14.846540] [<c10598b0>] ? umh_complete+0x40/0x40
>> [ 14.851626] ---[ end trace 137ff5893f9b85bf ]---
>>
>> Is it know issue? Or what could I try to fix it?
>>
>> Reproducibility: 3 of 3.
--
With Best Regards,
Andy Shevchenko
On Thu, Mar 10, 2016 at 02:48:09PM +0200, Andy Shevchenko wrote:
> He-he, my cursor stays on
> if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
> while I was having a lunch. So far got from datasheet that some kind
> of FPU is present there.
>
> eagerfpu=auto doesn't fix
> eagerfpu=off fixes the issue
Looking at the stacktrace, does that quark thing have FXRSTOR?
$ grep -i fxsr /proc/cpuinfo
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Thu, Mar 10, 2016 at 2:56 PM, Borislav Petkov <[email protected]> wrote:
> On Thu, Mar 10, 2016 at 02:48:09PM +0200, Andy Shevchenko wrote:
>> He-he, my cursor stays on
>> if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
>> while I was having a lunch. So far got from datasheet that some kind
>> of FPU is present there.
>>
>> eagerfpu=auto doesn't fix
>> eagerfpu=off fixes the issue
>
> Looking at the stacktrace, does that quark thing have FXRSTOR?
>
> $ grep -i fxsr /proc/cpuinfo
Looks like it lacks that one.
# grep -i fxsr /proc/cpuinfo; echo $?
1
--
With Best Regards,
Andy Shevchenko
On Thu, Mar 10, 2016 at 03:31:43PM +0200, Andy Shevchenko wrote:
> Looks like it lacks that one.
>
> # grep -i fxsr /proc/cpuinfo; echo $?
> 1
Ok, so looking at where the warning comes from:
[ 14.714533] WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
{
int err;
if (config_enabled(CONFIG_X86_32)) {
err = check_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
^^^^^^^^^^^^^^^^^
} else {
...
/* Copying from a kernel buffer to FPU registers should never fail: */
WARN_ON_FPU(err);
and the stacktrace is pretty clear:
flush_thread
|-> fpu__clear(&tsk->thread.fpu);
|-> we are eager by default here:
if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
/* FPU state will be reallocated lazily at the first use. */
fpu__drop(fpu);
} else {
--> we're in that branch.
copy_init_fpstate_to_fpregs();
|-> copy_kernel_to_fxregs()
I think we should use FRSTOR on quark, i.e., copy_kernel_to_fregs().
Does this untested wild guess even work?
---
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index dea8e76d60c6..bbafe5e8a1a6 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -474,8 +474,11 @@ static inline void copy_init_fpstate_to_fpregs(void)
{
if (use_xsave())
copy_kernel_to_xregs(&init_fpstate.xsave, -1);
- else
+ else if (static_cpu_has(X86_FEATURE_FXSR))
copy_kernel_to_fxregs(&init_fpstate.fxsave);
+ else
+ copy_kernel_to_fregs(&init_fpstate.fsave);
+
}
/*
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Thu, Mar 10, 2016 at 4:59 PM, Borislav Petkov <[email protected]> wrote:
> On Thu, Mar 10, 2016 at 03:31:43PM +0200, Andy Shevchenko wrote:
>> Looks like it lacks that one.
>>
>> # grep -i fxsr /proc/cpuinfo; echo $?
>> 1
>
> Ok, so looking at where the warning comes from:
>
> [ 14.714533] WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
>
> static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
> {
> int err;
>
> if (config_enabled(CONFIG_X86_32)) {
> err = check_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
> ^^^^^^^^^^^^^^^^^
> } else {
>
> ...
>
> /* Copying from a kernel buffer to FPU registers should never fail: */
> WARN_ON_FPU(err);
>
>
> and the stacktrace is pretty clear:
>
> flush_thread
> |-> fpu__clear(&tsk->thread.fpu);
> |-> we are eager by default here:
>
> if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
> /* FPU state will be reallocated lazily at the first use. */
> fpu__drop(fpu);
> } else {
>
> --> we're in that branch.
>
> copy_init_fpstate_to_fpregs();
> |-> copy_kernel_to_fxregs()
>
>
> I think we should use FRSTOR on quark, i.e., copy_kernel_to_fregs().
>
> Does this untested wild guess even work?
>
> ---
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index dea8e76d60c6..bbafe5e8a1a6 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -474,8 +474,11 @@ static inline void copy_init_fpstate_to_fpregs(void)
> {
> if (use_xsave())
> copy_kernel_to_xregs(&init_fpstate.xsave, -1);
> - else
> + else if (static_cpu_has(X86_FEATURE_FXSR))
> copy_kernel_to_fxregs(&init_fpstate.fxsave);
> + else
> + copy_kernel_to_fregs(&init_fpstate.fsave);
> +
Obviously redundant line, otherwise it indeed works
Tested-by: Andy Shevchenko <[email protected]>
> }
>
> /*
--
With Best Regards,
Andy Shevchenko
On Thu, 2016-03-10 at 17:22 +0200, Andy Shevchenko wrote:
> On Thu, Mar 10, 2016 at 4:59 PM, Borislav Petkov <[email protected]>
> wrote:
> > On Thu, Mar 10, 2016 at 03:31:43PM +0200, Andy Shevchenko wrote:
> > > Looks like it lacks that one.
> > >
> > > # grep -i fxsr /proc/cpuinfo; echo $?
> > > 1
> >
> > Ok, so looking at where the warning comes from:
> >
> > [ 14.714533] WARNING: CPU: 0 PID: 823 at
> > arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
> >
> > static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
> > {
> > int err;
> >
> > if (config_enabled(CONFIG_X86_32)) {
> > err = check_insn(fxrstor %[fx], "=m" (*fx), [fx]
> > "m" (*fx));
> > ^^^^^^^^^^^^^^^^^
> > } else {
> >
> > ...
> >
> > /* Copying from a kernel buffer to FPU registers should
> > never fail: */
> > WARN_ON_FPU(err);
> >
> >
> > and the stacktrace is pretty clear:
> >
> > flush_thread
> > > -> fpu__clear(&tsk->thread.fpu);
> > |-> we are eager by default here:
> >
> > if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
> > /* FPU state will be reallocated lazily at the
> > first use. */
> > fpu__drop(fpu);
> > } else {
> >
> > --> we're in that branch.
> >
> > copy_init_fpstate_to_fpregs();
> > |-> copy_kernel_to_fxregs()
> >
> >
> > I think we should use FRSTOR on quark, i.e.,
> > copy_kernel_to_fregs().
> >
> > Does this untested wild guess even work?
> >
> > ---
> > diff --git a/arch/x86/kernel/fpu/core.c
> > b/arch/x86/kernel/fpu/core.c
> > index dea8e76d60c6..bbafe5e8a1a6 100644
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -474,8 +474,11 @@ static inline void
> > copy_init_fpstate_to_fpregs(void)
> > {
> > if (use_xsave())
> > copy_kernel_to_xregs(&init_fpstate.xsave, -1);
> > - else
> > + else if (static_cpu_has(X86_FEATURE_FXSR))
> > copy_kernel_to_fxregs(&init_fpstate.fxsave);
> > + else
> > + copy_kernel_to_fregs(&init_fpstate.fsave);
> > +
>
> Obviously redundant line, otherwise it indeed works
>
> Tested-by: Andy Shevchenko <[email protected]>
>
> > }
> >
> > /*
>
>
>
It works but user-space FPU is broken; something's wrong with the
initial state of the FPU regs - it looks as though they aren't being
properly initialized and FPU context in the signal handler is wrong
too.
Linux 3.8.7:
/root@galileo:~# ./fpu
f is 10.000000 g is 10.100000
Double value is 0.000000
Double value is 0.100000
Double value is 0.200000
^Chandler value of variable is 0.300000
Double value is 0.300000
Double value is 0.400000
Linux-next + Boris' fix:
root@galileo:~# ./fpu
f is -nan g is -nan
Double value is 0.000000
Double value is 0.100000
Double value is 0.200000^C
handler value of variable is -nan
Double value is 0.300000
Double value is 0.400000^Z[1]+ Stopped
root@galileo:~# uname -aLinux galileo 4.5.0-rc7-next-20160310+ #185 Thu
Mar 10 15:11:10 GMT 2016 i586 GNU/Linux
On Thu, Mar 10, 2016 at 03:45:21PM +0000, Bryan O'Donoghue wrote:
> It works but user-space FPU is broken; something's wrong with the
> initial state of the FPU regs - it looks as though they aren't being
> properly initialized and FPU context in the signal handler is wrong
> too.
What does your test prog say when you boot linux-next with
"eagerfpu=off" and *without* my fix?
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Thu, 2016-03-10 at 17:49 +0100, Borislav Petkov wrote:
> On Thu, Mar 10, 2016 at 03:45:21PM +0000, Bryan O'Donoghue wrote:
> > It works but user-space FPU is broken; something's wrong with the
> > initial state of the FPU regs - it looks as though they aren't
> > being
> > properly initialized and FPU context in the signal handler is wrong
> > too.
>
> What does your test prog say when you boot linux-next with
> "eagerfpu=off" and *without* my fix?
>
root@galileo:~# ./fpu
f is 10.000000 g is 10.100000
Double value is 0.000000
Double value is 0.100000
Double value is 0.200000
^Chandler value of variable is 0.300000
Double value is 0.300000
Double value is 0.400000
^Z
[1]+ Stopped ./fpu
root@galileo:~#
On Thu, Mar 10, 2016 at 05:15:15PM +0000, Bryan O'Donoghue wrote:
> root@galileo:~# ./fpu
> f is 10.000000 g is 10.100000
Hmm, ok, I can *actually* reproduce it in kvm+qemu with 486 CPU type
(which should be close to quark AFAIK).
Debugging continues...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Thu, Mar 10, 2016 at 7:45 AM, Bryan O'Donoghue
<[email protected]> wrote:
> On Thu, 2016-03-10 at 17:22 +0200, Andy Shevchenko wrote:
>> On Thu, Mar 10, 2016 at 4:59 PM, Borislav Petkov <[email protected]>
>> wrote:
>> > On Thu, Mar 10, 2016 at 03:31:43PM +0200, Andy Shevchenko wrote:
>> > > Looks like it lacks that one.
>> > >
>> > > # grep -i fxsr /proc/cpuinfo; echo $?
>> > > 1
>> >
>> > Ok, so looking at where the warning comes from:
>> >
>> > [ 14.714533] WARNING: CPU: 0 PID: 823 at
>> > arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
>> >
>> > static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
>> > {
>> > int err;
>> >
>> > if (config_enabled(CONFIG_X86_32)) {
>> > err = check_insn(fxrstor %[fx], "=m" (*fx), [fx]
>> > "m" (*fx));
>> > ^^^^^^^^^^^^^^^^^
>> > } else {
>> >
>> > ...
>> >
>> > /* Copying from a kernel buffer to FPU registers should
>> > never fail: */
>> > WARN_ON_FPU(err);
>> >
>> >
>> > and the stacktrace is pretty clear:
>> >
>> > flush_thread
>> > > -> fpu__clear(&tsk->thread.fpu);
>> > |-> we are eager by default here:
>> >
>> > if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
>> > /* FPU state will be reallocated lazily at the
>> > first use. */
>> > fpu__drop(fpu);
>> > } else {
>> >
>> > --> we're in that branch.
>> >
>> > copy_init_fpstate_to_fpregs();
>> > |-> copy_kernel_to_fxregs()
>> >
>> >
>> > I think we should use FRSTOR on quark, i.e.,
>> > copy_kernel_to_fregs().
>> >
>> > Does this untested wild guess even work?
>> >
>> > ---
>> > diff --git a/arch/x86/kernel/fpu/core.c
>> > b/arch/x86/kernel/fpu/core.c
>> > index dea8e76d60c6..bbafe5e8a1a6 100644
>> > --- a/arch/x86/kernel/fpu/core.c
>> > +++ b/arch/x86/kernel/fpu/core.c
>> > @@ -474,8 +474,11 @@ static inline void
>> > copy_init_fpstate_to_fpregs(void)
>> > {
>> > if (use_xsave())
>> > copy_kernel_to_xregs(&init_fpstate.xsave, -1);
>> > - else
>> > + else if (static_cpu_has(X86_FEATURE_FXSR))
>> > copy_kernel_to_fxregs(&init_fpstate.fxsave);
>> > + else
>> > + copy_kernel_to_fregs(&init_fpstate.fsave);
>> > +
>>
>> Obviously redundant line, otherwise it indeed works
>>
>> Tested-by: Andy Shevchenko <[email protected]>
>>
>> > }
>> >
>> > /*
>>
>>
>>
>
> It works but user-space FPU is broken; something's wrong with the
> initial state of the FPU regs - it looks as though they aren't being
> properly initialized and FPU context in the signal handler is wrong
> too.
>
> Linux 3.8.7:
> /root@galileo:~# ./fpu
> f is 10.000000 g is 10.100000
> Double value is 0.000000
> Double value is 0.100000
> Double value is 0.200000
> ^Chandler value of variable is 0.300000
> Double value is 0.300000
> Double value is 0.400000
>
> Linux-next + Boris' fix:
> root@galileo:~# ./fpu
> f is -nan g is -nan
> Double value is 0.000000
> Double value is 0.100000
> Double value is 0.200000^C
> handler value of variable is -nan
> Double value is 0.300000
> Double value is 0.400000^Z[1]+ Stopped
>
Just to check: are you running the exact same compiled binary on both
kernels? Because your test case invokes undefined behavior, and I'm a
bit surprised you get anything sensible from it. That being said, the
f = -nan part is worrisome.
--Andy
On Thu, Mar 10, 2016 at 6:59 AM, Borislav Petkov <[email protected]> wrote:
> On Thu, Mar 10, 2016 at 03:31:43PM +0200, Andy Shevchenko wrote:
>> Looks like it lacks that one.
>>
>> # grep -i fxsr /proc/cpuinfo; echo $?
>> 1
>
> Ok, so looking at where the warning comes from:
>
> [ 14.714533] WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
>
> static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
> {
> int err;
>
> if (config_enabled(CONFIG_X86_32)) {
> err = check_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
> ^^^^^^^^^^^^^^^^^
> } else {
>
> ...
>
> /* Copying from a kernel buffer to FPU registers should never fail: */
> WARN_ON_FPU(err);
>
>
> and the stacktrace is pretty clear:
>
> flush_thread
> |-> fpu__clear(&tsk->thread.fpu);
> |-> we are eager by default here:
>
> if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
> /* FPU state will be reallocated lazily at the first use. */
> fpu__drop(fpu);
> } else {
>
> --> we're in that branch.
>
> copy_init_fpstate_to_fpregs();
> |-> copy_kernel_to_fxregs()
>
>
> I think we should use FRSTOR on quark, i.e., copy_kernel_to_fregs().
>
> Does this untested wild guess even work?
>
> ---
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index dea8e76d60c6..bbafe5e8a1a6 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -474,8 +474,11 @@ static inline void copy_init_fpstate_to_fpregs(void)
> {
> if (use_xsave())
> copy_kernel_to_xregs(&init_fpstate.xsave, -1);
> - else
> + else if (static_cpu_has(X86_FEATURE_FXSR))
> copy_kernel_to_fxregs(&init_fpstate.fxsave);
> + else
> + copy_kernel_to_fregs(&init_fpstate.fsave);
> +
> }
>
This looks wrong, too:
/*
* Once per bootup FPU initialization sequences that will run on most x86 CPUs:
*/
static void __init fpu__init_system_generic(void)
{
/*
* Set up the legacy init FPU context. (xstate init might overwrite this
* with a more modern format, if the CPU supports it.)
*/
fpstate_init_fxstate(&init_fpstate.fxsave); <-- wrong format on
pre-FXSR CPUs
fpu__init_system_mxcsr();
}
* Andy Lutomirski <[email protected]> wrote:
> On Thu, Mar 10, 2016 at 6:59 AM, Borislav Petkov <[email protected]> wrote:
> > On Thu, Mar 10, 2016 at 03:31:43PM +0200, Andy Shevchenko wrote:
> >> Looks like it lacks that one.
> >>
> >> # grep -i fxsr /proc/cpuinfo; echo $?
> >> 1
> >
> > Ok, so looking at where the warning comes from:
> >
> > [ 14.714533] WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
> >
> > static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
> > {
> > int err;
> >
> > if (config_enabled(CONFIG_X86_32)) {
> > err = check_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
> > ^^^^^^^^^^^^^^^^^
> > } else {
> >
> > ...
> >
> > /* Copying from a kernel buffer to FPU registers should never fail: */
> > WARN_ON_FPU(err);
> >
> >
> > and the stacktrace is pretty clear:
> >
> > flush_thread
> > |-> fpu__clear(&tsk->thread.fpu);
> > |-> we are eager by default here:
> >
> > if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
> > /* FPU state will be reallocated lazily at the first use. */
> > fpu__drop(fpu);
> > } else {
> >
> > --> we're in that branch.
> >
> > copy_init_fpstate_to_fpregs();
> > |-> copy_kernel_to_fxregs()
> >
> >
> > I think we should use FRSTOR on quark, i.e., copy_kernel_to_fregs().
> >
> > Does this untested wild guess even work?
> >
> > ---
> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> > index dea8e76d60c6..bbafe5e8a1a6 100644
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -474,8 +474,11 @@ static inline void copy_init_fpstate_to_fpregs(void)
> > {
> > if (use_xsave())
> > copy_kernel_to_xregs(&init_fpstate.xsave, -1);
> > - else
> > + else if (static_cpu_has(X86_FEATURE_FXSR))
> > copy_kernel_to_fxregs(&init_fpstate.fxsave);
> > + else
> > + copy_kernel_to_fregs(&init_fpstate.fsave);
> > +
> > }
> >
>
> This looks wrong, too:
>
> /*
> * Once per bootup FPU initialization sequences that will run on most x86 CPUs:
> */
> static void __init fpu__init_system_generic(void)
> {
> /*
> * Set up the legacy init FPU context. (xstate init might overwrite this
> * with a more modern format, if the CPU supports it.)
> */
> fpstate_init_fxstate(&init_fpstate.fxsave); <-- wrong format on pre-FXSR CPUs
Indeed:
static inline void fpstate_init_fxstate(struct fxregs_state *fx)
{
fx->cwd = 0x37f;
fx->mxcsr = MXCSR_DEFAULT;
}
I assumed that the fxstate init is outside the legacy FPU context area, but they
overlap: fx->cwd is the first two bytes, and fx->mxcsr overlaps the middle of the
legacy area.
We do a later fpstate_init_fstate(), which does:
static inline void fpstate_init_fstate(struct fregs_state *fp)
{
fp->cwd = 0xffff037fu;
fp->swd = 0xffff0000u;
fp->twd = 0xffffffffu;
fp->fos = 0xffff0000u;
}
which accidentally overwrites the cwd bit - but AFAICS fx->mxcsw overlaps the
first legacy FPU register?
So yes, this needs to be fixed too.
Thanks,
Ingo
On Fri, Mar 11, 2016 at 10:08:40AM +0100, Ingo Molnar wrote:
> So yes, this needs to be fixed too.
Yes indeed. So the diff below seems to work with Bryan's simple test
case.
Bryan, can you confirm on your box pls?
---
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index dea8e76d60c6..8e37cc8a539a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -474,8 +474,10 @@ static inline void copy_init_fpstate_to_fpregs(void)
{
if (use_xsave())
copy_kernel_to_xregs(&init_fpstate.xsave, -1);
- else
+ else if (static_cpu_has(X86_FEATURE_FXSR))
copy_kernel_to_fxregs(&init_fpstate.fxsave);
+ else
+ copy_kernel_to_fregs(&init_fpstate.fsave);
}
/*
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index e12cc0ad368e..c835f61d5feb 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -134,7 +134,7 @@ static void __init fpu__init_system_generic(void)
* Set up the legacy init FPU context. (xstate init might overwrite this
* with a more modern format, if the CPU supports it.)
*/
- fpstate_init_fxstate(&init_fpstate.fxsave);
+ fpstate_init(&init_fpstate);
fpu__init_system_mxcsr();
}
---
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Thu, 2016-03-10 at 17:31 -0800, Andy Lutomirski wrote:
> On Thu, Mar 10, 2016 at 7:45 AM, Bryan O'Donoghue
> <[email protected]> wrote:
> > On Thu, 2016-03-10 at 17:22 +0200, Andy Shevchenko wrote:
> > > On Thu, Mar 10, 2016 at 4:59 PM, Borislav Petkov <[email protected]>
> > > wrote:
> > > > On Thu, Mar 10, 2016 at 03:31:43PM +0200, Andy Shevchenko
> > > > wrote:
> > > > > Looks like it lacks that one.
> > > > >
> > > > > # grep -i fxsr /proc/cpuinfo; echo $?
> > > > > 1
> > > >
> > > > Ok, so looking at where the warning comes from:
> > > >
> > > > [ 14.714533] WARNING: CPU: 0 PID: 823 at
> > > > arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
> > > >
> > > > static inline void copy_kernel_to_fxregs(struct fxregs_state
> > > > *fx)
> > > > {
> > > > int err;
> > > >
> > > > if (config_enabled(CONFIG_X86_32)) {
> > > > err = check_insn(fxrstor %[fx], "=m" (*fx),
> > > > [fx]
> > > > "m" (*fx));
> > > > ^^^^^^^^^^^^^^^^^
> > > > } else {
> > > >
> > > > ...
> > > >
> > > > /* Copying from a kernel buffer to FPU registers should
> > > > never fail: */
> > > > WARN_ON_FPU(err);
> > > >
> > > >
> > > > and the stacktrace is pretty clear:
> > > >
> > > > flush_thread
> > > > > -> fpu__clear(&tsk->thread.fpu);
> > > > |-> we are eager by default here:
> > > >
> > > > if (!use_eager_fpu() ||
> > > > !static_cpu_has(X86_FEATURE_FPU)) {
> > > > /* FPU state will be reallocated lazily at the
> > > > first use. */
> > > > fpu__drop(fpu);
> > > > } else {
> > > >
> > > > --> we're in that branch.
> > > >
> > > > copy_init_fpstate_to_fpregs();
> > > > |-> copy_kernel_to_fxregs()
> > > >
> > > >
> > > > I think we should use FRSTOR on quark, i.e.,
> > > > copy_kernel_to_fregs().
> > > >
> > > > Does this untested wild guess even work?
> > > >
> > > > ---
> > > > diff --git a/arch/x86/kernel/fpu/core.c
> > > > b/arch/x86/kernel/fpu/core.c
> > > > index dea8e76d60c6..bbafe5e8a1a6 100644
> > > > --- a/arch/x86/kernel/fpu/core.c
> > > > +++ b/arch/x86/kernel/fpu/core.c
> > > > @@ -474,8 +474,11 @@ static inline void
> > > > copy_init_fpstate_to_fpregs(void)
> > > > {
> > > > if (use_xsave())
> > > > copy_kernel_to_xregs(&init_fpstate.xsave, -1);
> > > > - else
> > > > + else if (static_cpu_has(X86_FEATURE_FXSR))
> > > > copy_kernel_to_fxregs(&init_fpstate.fxsave);
> > > > + else
> > > > + copy_kernel_to_fregs(&init_fpstate.fsave);
> > > > +
> > >
> > > Obviously redundant line, otherwise it indeed works
> > >
> > > Tested-by: Andy Shevchenko <[email protected]>
> > >
> > > > }
> > > >
> > > > /*
> > >
> > >
> > >
> >
> > It works but user-space FPU is broken; something's wrong with the
> > initial state of the FPU regs - it looks as though they aren't
> > being
> > properly initialized and FPU context in the signal handler is wrong
> > too.
> >
> > Linux 3.8.7:
> > /root@galileo:~# ./fpu
> > f is 10.000000 g is 10.100000
> > Double value is 0.000000
> > Double value is 0.100000
> > Double value is 0.200000
> > ^Chandler value of variable is 0.300000
> > Double value is 0.300000
> > Double value is 0.400000
> >
> > Linux-next + Boris' fix:
> > root@galileo:~# ./fpu
> > f is -nan g is -nan
> > Double value is 0.000000
> > Double value is 0.100000
> > Double value is 0.200000^C
> > handler value of variable is -nan
> > Double value is 0.300000
> > Double value is 0.400000^Z[1]+ Stopped
> >
>
> Just to check: are you running the exact same compiled binary on both
> kernels? Because your test case invokes undefined behavior, and I'm
> a
> bit surprised you get anything sensible from it. That being said,
> the
> f = -nan part is worrisome.
>
> --Andy
It's the same binary yes.
On Fri, 2016-03-11 at 10:48 +0100, Borislav Petkov wrote:
> On Fri, Mar 11, 2016 at 10:08:40AM +0100, Ingo Molnar wrote:
> > So yes, this needs to be fixed too.
>
> Yes indeed. So the diff below seems to work with Bryan's simple test
> case.
>
> Bryan, can you confirm on your box pls?
>
> ---
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index dea8e76d60c6..8e37cc8a539a 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -474,8 +474,10 @@ static inline void
> copy_init_fpstate_to_fpregs(void)
> {
> if (use_xsave())
> copy_kernel_to_xregs(&init_fpstate.xsave, -1);
> - else
> + else if (static_cpu_has(X86_FEATURE_FXSR))
> copy_kernel_to_fxregs(&init_fpstate.fxsave);
> + else
> + copy_kernel_to_fregs(&init_fpstate.fsave);
> }
>
> /*
> diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
> index e12cc0ad368e..c835f61d5feb 100644
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -134,7 +134,7 @@ static void __init fpu__init_system_generic(void)
> * Set up the legacy init FPU context. (xstate init might
> overwrite this
> * with a more modern format, if the CPU supports it.)
> */
> - fpstate_init_fxstate(&init_fpstate.fxsave);
> + fpstate_init(&init_fpstate);
>
> fpu__init_system_mxcsr();
> }
>
> ---
>
> Thanks.
>
Hi Boris,
Looks good.
Tested-by: Bryan O'Donoghue <[email protected]>
On Fri, Mar 11, 2016 at 11:02:04AM +0000, Bryan O'Donoghue wrote:
> Looks good.
>
> Tested-by: Bryan O'Donoghue <[email protected]>
Thanks Bryan!
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
486 cores like Intel Quark support only the very old, legacy x87 FPU
(FSAVE/FRSTOR, CPUID bit FXSR is not set). And our FPU code wasn't
handling the saving and restoring there properly. First, Andy Shevchenko
reported a splat:
WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
which was us trying to execute FXRSTOR on those machines even though
they don't support it.
After taking care of that, Bryan O'Donoghue reported that a simple FPU
test still failed because we weren't initializing the FPU state properly
on those machines.
Take care of all that.
Reported-by: Andy Shevchenko <[email protected]>
Reported-and-tested-by: Bryan O'Donoghue <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/fpu/core.c | 4 +++-
arch/x86/kernel/fpu/init.c | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index dea8e76d60c6..8e37cc8a539a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -474,8 +474,10 @@ static inline void copy_init_fpstate_to_fpregs(void)
{
if (use_xsave())
copy_kernel_to_xregs(&init_fpstate.xsave, -1);
- else
+ else if (static_cpu_has(X86_FEATURE_FXSR))
copy_kernel_to_fxregs(&init_fpstate.fxsave);
+ else
+ copy_kernel_to_fregs(&init_fpstate.fsave);
}
/*
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index e12cc0ad368e..c835f61d5feb 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -134,7 +134,7 @@ static void __init fpu__init_system_generic(void)
* Set up the legacy init FPU context. (xstate init might overwrite this
* with a more modern format, if the CPU supports it.)
*/
- fpstate_init_fxstate(&init_fpstate.fxsave);
+ fpstate_init(&init_fpstate);
fpu__init_system_mxcsr();
}
--
2.3.5
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Fri, Mar 11, 2016 at 3:32 AM, Borislav Petkov <[email protected]> wrote:
> 486 cores like Intel Quark support only the very old, legacy x87 FPU
> (FSAVE/FRSTOR, CPUID bit FXSR is not set). And our FPU code wasn't
> handling the saving and restoring there properly. First, Andy Shevchenko
> reported a splat:
>
> WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
>
> which was us trying to execute FXRSTOR on those machines even though
> they don't support it.
>
> After taking care of that, Bryan O'Donoghue reported that a simple FPU
> test still failed because we weren't initializing the FPU state properly
> on those machines.
Obvious Ack to the patch, along with a "how did this ever work
before?" comment..
Linus
On Fri, Mar 11, 2016 at 10:32:43AM -0800, Linus Torvalds wrote:
> Obvious Ack to the patch, along with a "how did this ever work
> before?" comment..
I had a sarcastic sentence in the commit message which I deleted later:
"Apparently no one had tried the kernel on a 486er after the FPU
rewrite. Backwards compatibility is overrated."
:-)
I'm still wondering, though, why didn't the Quark people scream
earlier... And who knows, it was probably b0rked even before the
FPU rewrite.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On 03/11/2016 02:03 PM, Borislav Petkov wrote:
> I'm still wondering, though, why didn't the Quark people scream
> earlier... And who knows, it was probably b0rked even before the
> FPU rewrite.
I've actually got 4.0 running on my Quark board. The FPU rewrite
dropped in just after that iirc.
On Fri, Mar 11, 2016 at 02:07:19PM -0800, Dave Hansen wrote:
> I've actually got 4.0 running on my Quark board. The FPU rewrite
> dropped in just after that iirc.
4.2 or so... Ok, so it looks like we broke it then.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Fri, 2016-03-11 at 23:03 +0100, Borislav Petkov wrote:
> On Fri, Mar 11, 2016 at 10:32:43AM -0800, Linus Torvalds wrote:
> > Obvious Ack to the patch, along with a "how did this ever work
> > before?" comment..
>
> I had a sarcastic sentence in the commit message which I deleted
> later:
>
> "Apparently no one had tried the kernel on a 486er after the FPU
> rewrite. Backwards compatibility is overrated."
>
> :-)
>
> I'm still wondering, though, why didn't the Quark people scream
> earlier... And who knows, it was probably b0rked even before the
> FPU rewrite.
>
Busy with the dayjob :) so I haven't updated on Galileo since
4b696dcb1a55e40648ad0eec4af991c72f945a85 (Feb 28 or so)
but... we'll do a better job keep track of -next to catch this stuff
earlier
---
bod
On Sat, Mar 12, 2016 at 12:04:49PM +0000, Bryan O'Donoghue wrote:
> Busy with the dayjob :) so I haven't updated on Galileo since
>
> 4b696dcb1a55e40648ad0eec4af991c72f945a85 (Feb 28 or so)
That's 4.5-rc5-ish and the FPU rewrite came during 4.2. BUT!,
58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
which is in tip currently, made eagerfpu default and we know that
eagerfpu=off makes the issue go away.
So the bug was there, just it wasn't being hit on Quark. Until
58122bf1d856...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
* Linus Torvalds <[email protected]> wrote:
> On Fri, Mar 11, 2016 at 3:32 AM, Borislav Petkov <[email protected]> wrote:
> > 486 cores like Intel Quark support only the very old, legacy x87 FPU
> > (FSAVE/FRSTOR, CPUID bit FXSR is not set). And our FPU code wasn't
> > handling the saving and restoring there properly. First, Andy Shevchenko
> > reported a splat:
> >
> > WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
> >
> > which was us trying to execute FXRSTOR on those machines even though
> > they don't support it.
> >
> > After taking care of that, Bryan O'Donoghue reported that a simple FPU
> > test still failed because we weren't initializing the FPU state properly
> > on those machines.
>
> Obvious Ack to the patch, along with a "how did this ever work
> before?" comment..
So the window for 'real' breakage was relatively short: this is an older bug but
only became a serious bug with the following upcoming commit:
58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
... but it's nice to have it fixed nevertheless!
Thanks,
Ingo
* Ingo Molnar <[email protected]> wrote:
> * Linus Torvalds <[email protected]> wrote:
>
> > On Fri, Mar 11, 2016 at 3:32 AM, Borislav Petkov <[email protected]> wrote:
> > > 486 cores like Intel Quark support only the very old, legacy x87 FPU
> > > (FSAVE/FRSTOR, CPUID bit FXSR is not set). And our FPU code wasn't
> > > handling the saving and restoring there properly. First, Andy Shevchenko
> > > reported a splat:
> > >
> > > WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
> > >
> > > which was us trying to execute FXRSTOR on those machines even though
> > > they don't support it.
> > >
> > > After taking care of that, Bryan O'Donoghue reported that a simple FPU
> > > test still failed because we weren't initializing the FPU state properly
> > > on those machines.
> >
> > Obvious Ack to the patch, along with a "how did this ever work
> > before?" comment..
>
> So the window for 'real' breakage was relatively short: this is an older bug but
> only became a serious bug with the following upcoming commit:
>
> 58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
And the reason for that is:
void fpu__clear(struct fpu *fpu)
{
WARN_ON_FPU(fpu != ¤t->thread.fpu); /* Almost certainly an anomaly */
if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
/* FPU state will be reallocated lazily at the first use. */
fpu__drop(fpu);
} else {
if (!fpu->fpstate_active) {
fpu__activate_curr(fpu);
user_fpu_begin();
}
copy_init_fpstate_to_fpregs();
}
}
i.e. we only execute the buggy sequence in the !eager_fpu case - and old FPUs were
not eager-FPU, which hid the bug.
The other bug:
@@ -134,7 +134,7 @@ static void __init fpu__init_system_gene
* Set up the legacy init FPU context. (xstate init might overwrite this
* with a more modern format, if the CPU supports it.)
*/
- fpstate_init_fxstate(&init_fpstate.fxsave);
+ fpstate_init(&init_fpstate);
was also hidden by the fact that it only affects eagerfpu case - but all previous
eagerfpu bootups were for post-XSAVE CPUs.
Thanks,
Ingo
* Borislav Petkov <[email protected]> wrote:
> On Fri, Mar 11, 2016 at 10:32:43AM -0800, Linus Torvalds wrote:
> > Obvious Ack to the patch, along with a "how did this ever work
> > before?" comment..
>
> I had a sarcastic sentence in the commit message which I deleted later:
>
> "Apparently no one had tried the kernel on a 486er after the FPU
> rewrite. Backwards compatibility is overrated."
>
> :-)
>
> I'm still wondering, though, why didn't the Quark people scream
> earlier... And who knows, it was probably b0rked even before the
> FPU rewrite.
So I actually tested legacy FPU support during (and after) the big FPU rewrite - I
even tried FPU emu and we made that work (it was broken for like years).
This particular breakage was made prominent recently, with switching old FPU CPUs
over to eagerfpu context switching:
58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
If you check the fix, it only affects eagerfpu code paths. So to see the bug with
Quark (or ancient CPUs) you'd have to have booted it with eagerfpu=on - and most
people avoid twiddling boot parameters if they can avoid it.
Thanks,
Ingo
Commit-ID: 6e6867093de35141f0a76b66ac13f9f2e2c8e77a
Gitweb: http://git.kernel.org/tip/6e6867093de35141f0a76b66ac13f9f2e2c8e77a
Author: Borislav Petkov <[email protected]>
AuthorDate: Fri, 11 Mar 2016 12:32:06 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 12 Mar 2016 16:13:55 +0100
x86/fpu: Fix eager-FPU handling on legacy FPU machines
i486 derived cores like Intel Quark support only the very old,
legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and
our FPU code wasn't handling the saving and restoring there
properly in the 'eagerfpu' case.
So after we made eagerfpu the default for all CPU types:
58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
these old FPU designs broke. First, Andy Shevchenko reported a splat:
WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
which was us trying to execute FXRSTOR on those machines even though
they don't support it.
After taking care of that, Bryan O'Donoghue reported that a simple FPU
test still failed because we weren't initializing the FPU state properly
on those machines.
Take care of all that.
Reported-and-tested-by: Bryan O'Donoghue <[email protected]>
Reported-by: Andy Shevchenko <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Quentin Casasnovas <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yu-cheng <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 4 +++-
arch/x86/kernel/fpu/init.c | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index d25097c..d5804ad 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -409,8 +409,10 @@ static inline void copy_init_fpstate_to_fpregs(void)
{
if (use_xsave())
copy_kernel_to_xregs(&init_fpstate.xsave, -1);
- else
+ else if (static_cpu_has(X86_FEATURE_FXSR))
copy_kernel_to_fxregs(&init_fpstate.fxsave);
+ else
+ copy_kernel_to_fregs(&init_fpstate.fsave);
}
/*
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 9ee7e30..bd08fb7 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -134,7 +134,7 @@ static void __init fpu__init_system_generic(void)
* Set up the legacy init FPU context. (xstate init might overwrite this
* with a more modern format, if the CPU supports it.)
*/
- fpstate_init_fxstate(&init_fpstate.fxsave);
+ fpstate_init(&init_fpstate);
fpu__init_system_mxcsr();
}
On Mar 11, 2016 2:20 PM, "Borislav Petkov" <[email protected]> wrote:
>
> On Fri, Mar 11, 2016 at 02:07:19PM -0800, Dave Hansen wrote:
> > I've actually got 4.0 running on my Quark board. The FPU rewrite
> > dropped in just after that iirc.
>
> 4.2 or so... Ok, so it looks like we broke it then.
>
For reference, what are the QEMU options and boot options you used to
trigger this? I'm asking because I tested eagerfpu=on without fxsr a
few weeks ago in QEMU, and I didn't trigger it. Maybe I needed to
force KVM off or something. Off the top of my head, I'm guessing that
when I wrote "x86/fpu: Fix FNSAVE usage in eagerfpu mode", I was
inadvertently using a bastardize combination of FNSAVE and
FXRSTOR-of-init-state, KVM let the FXRSTOR through despite not
advertising it in CPUID, and it papered over the init issue because
the wrong init state format was hidden by using the wrong instruction
to load it.
Sigh. Yet more reason for Intel to add chicken bits to *turn off* new features.
--Andy
On Sat, Mar 12, 2016 at 09:21:16AM -0800, Andy Lutomirski wrote:
> For reference, what are the QEMU options and boot options you used to
> trigger this? I'm asking because I tested eagerfpu=on without fxsr a
> few weeks ago in QEMU, and I didn't trigger it. Maybe I needed to
> force KVM off or something.
$ qemu-system-i386
-enable-kvm
-gdb tcp::1234
-cpu 486
-m 2048
-hda /home/boris/kvm/debian/sid-i386.img
-boot menu=off,order=c
-localtime
-net nic,model=rtl8139,macaddr=12:35:12:34:56:78
-net user,hostfwd=tcp::1235-:22
-usbdevice tablet
-kernel /home/boris/kernel/linux-2.6/arch/x86/boot/bzImage
-append "root=/dev/sda1 resume=/dev/sdb1 debug ignore_loglevel log_buf_len=16M earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 "
-monitor pty
-soundhw hda
-serial file:/home/boris/kvm/test-i386-1235.log
-snapshot
-name "Debian i386:1235"
-smp 1
-virtfs local,path=/tmp,mount_tag=tmp,security_model=none
I basically tried to get it as close to 486er as possible. I had also
CONFIG_M486=y for the guest kernel.
> Off the top of my head, I'm guessing that when I wrote "x86/fpu: Fix
> FNSAVE usage in eagerfpu mode", I was inadvertently using a bastardize
> combination of FNSAVE and FXRSTOR-of-init-state, KVM let the FXRSTOR
> through despite not advertising it in CPUID, and it papered over the
> init issue because the wrong init state format was hidden by using the
> wrong instruction to load it.
Yeah, I think qemu looks at CPUID bits for some stuff but probably not
all. And I also think we should go and fix such issues there so that we
can have as accurate an emulation as possible.
> Sigh. Yet more reason for Intel to add chicken bits to *turn off* new
> features.
Amen to that.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Fri, 11 Mar 2016, Borislav Petkov wrote:
> > Obvious Ack to the patch, along with a "how did this ever work
> > before?" comment..
>
> I had a sarcastic sentence in the commit message which I deleted later:
>
> "Apparently no one had tried the kernel on a 486er after the FPU
> rewrite. Backwards compatibility is overrated."
People who care do not always have the resources to make regular tests
with their hardware.
I for once have meant to check the original series against my 486 EISA
box since Ingo posted them back in May last year, and I still have them
stashed away for this purpose. Unfortunately I didn't get to wire that
machine for remote console access and power supply control (and I have
since run out of ports too), so I need to physically get to its location
to do any testing.
So it's not unusual for bugs in the less common configurations to only
surface a bit later on, as people get to making an upgrade. At least we
have `git bisect' available and individual changes recorded now, which
makes tracking down regressions a lot easier than it used to be in the old
days.
Maciej