From: Andy Lutomirski
Date: Thu, 10 Mar 2016 17:31:52 -0800
Subject: Re: Got FPU related warning on Intel Quark during boot
To: "Bryan O'Donoghue"
Cc: Andy Shevchenko, Borislav Petkov, Ingo Molnar, "linux-kernel@vger.kernel.org", "x86@kernel.org", Fenghua Yu, Linus Torvalds, "H. Peter Anvin", Thomas Gleixner, Andrew Morton, Dave Hansen, Oleg Nesterov, "Yu, Yu-cheng"
In-Reply-To: <1457624721.5784.0.camel@nexus-software.ie>
References: <20160310111935.GB13102@gmail.com> <20160310125610.GA26708@pd.tnic> <20160310145940.GB26708@pd.tnic> <1457624721.5784.0.camel@nexus-software.ie>

On Thu, Mar 10, 2016 at 7:45 AM, Bryan O'Donoghue wrote:
> On Thu, 2016-03-10 at 17:22 +0200, Andy Shevchenko wrote:
>> On Thu, Mar 10, 2016 at 4:59 PM, Borislav Petkov wrote:
>> > On Thu, Mar 10, 2016 at 03:31:43PM +0200, Andy Shevchenko wrote:
>> > > Looks like it lacks that one.
>> > >
>> > > # grep -i fxsr /proc/cpuinfo; echo $?
>> > > 1
>> >
>> > Ok, so looking at where the warning comes from:
>> >
>> > [ 14.714533] WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
>> >
>> > static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
>> > {
>> > 	int err;
>> >
>> > 	if (config_enabled(CONFIG_X86_32)) {
>> > 		err = check_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
>> > 		      ^^^^^^^^^^^^^^^^^
>> > 	} else {
>> >
>> > ...
>> >
>> > 	/* Copying from a kernel buffer to FPU registers should never fail: */
>> > 	WARN_ON_FPU(err);
>> >
>> >
>> > and the stacktrace is pretty clear:
>> >
>> > flush_thread
>> >  -> fpu__clear(&tsk->thread.fpu);
>> >     |-> we are eager by default here:
>> >
>> > 	if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
>> > 		/* FPU state will be reallocated lazily at the first use. */
>> > 		fpu__drop(fpu);
>> > 	} else {
>> >
>> > --> we're in that branch.
>> >
>> > 	copy_init_fpstate_to_fpregs();
>> > 	|-> copy_kernel_to_fxregs()
>> >
>> >
>> > I think we should use FRSTOR on Quark, i.e., copy_kernel_to_fregs().
>> >
>> > Does this untested wild guess even work?
>> >
>> > ---
>> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>> > index dea8e76d60c6..bbafe5e8a1a6 100644
>> > --- a/arch/x86/kernel/fpu/core.c
>> > +++ b/arch/x86/kernel/fpu/core.c
>> > @@ -474,8 +474,11 @@ static inline void copy_init_fpstate_to_fpregs(void)
>> >  {
>> >  	if (use_xsave())
>> >  		copy_kernel_to_xregs(&init_fpstate.xsave, -1);
>> > -	else
>> > +	else if (static_cpu_has(X86_FEATURE_FXSR))
>> >  		copy_kernel_to_fxregs(&init_fpstate.fxsave);
>> > +	else
>> > +		copy_kernel_to_fregs(&init_fpstate.fsave);
>> > +
>>
>> Obviously redundant line, otherwise it indeed works
>>
>> Tested-by: Andy Shevchenko
>>
>> >  }
>> >
>> >  /*
>>
>>
>
> It works but user-space FPU is broken; something's wrong with the
> initial state of the FPU regs - it looks as though they aren't being
> properly initialized and FPU context in the signal handler is wrong
> too.
>
> Linux 3.8.7:
> root@galileo:~# ./fpu
> f is 10.000000 g is 10.100000
> Double value is 0.000000
> Double value is 0.100000
> Double value is 0.200000
> ^Chandler value of variable is 0.300000
> Double value is 0.300000
> Double value is 0.400000
>
> Linux-next + Boris' fix:
> root@galileo:~# ./fpu
> f is -nan g is -nan
> Double value is 0.000000
> Double value is 0.100000
> Double value is 0.200000^C
> handler value of variable is -nan
> Double value is 0.300000
> Double value is 0.400000^Z
> [1]+ Stopped
>

Just to check: are you running the exact same compiled binary on both
kernels? Because your test case invokes undefined behavior, and I'm a
bit surprised you get anything sensible from it.

That being said, the f = -nan part is worrisome.

--Andy
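Andy Shevchenko's check in the thread, `grep -i fxsr /proc/cpuinfo; echo $?`, is a case-insensitive substring match, and exit status 1 means no line matched. A substring match would also hit a flag such as `fxsr_opt` (reported on some AMD CPUs), so a whole-token test is slightly stricter. The sketch below assumes it is handed a single "flags" line as a string (reading /proc/cpuinfo is left out); `cpuinfo_has_flag` is a hypothetical helper, not a kernel or libc API, and it matches case-sensitively since the kernel prints flags in lowercase.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `flag` appears as a whole
 * space-separated token in a /proc/cpuinfo "flags" line. Unlike
 * `grep -i fxsr`, this does not also match "fxsr_opt".
 */
static bool cpuinfo_has_flag(const char *flags_line, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = strchr(flags_line, ':');

	assert(len > 0);
	/* Skip the "flags\t\t:" prefix if present. */
	p = p ? p + 1 : flags_line;

	while ((p = strstr(p, flag)) != NULL) {
		/* p[-1] is only read when p is past the start of the string. */
		bool starts = (p == flags_line) || p[-1] == ' ' ||
			      p[-1] == '\t' || p[-1] == ':';
		bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

		if (starts && ends)
			return true;
		p += len;	/* partial hit such as "fxsr_opt"; keep looking */
	}
	return false;
}
```

With a synthetic line such as `"flags : fpu fxsr_opt"`, the helper reports no `fxsr`, whereas grep would have exited with status 0.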
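Boris's patch makes copy_init_fpstate_to_fpregs() pick the restore instruction by capability: XRSTOR when XSAVE is in use, FXRSTOR when the CPU has FXSR, and FRSTOR otherwise, which is the Quark case. The same decision can be sketched in user space as below, assuming the inputs are the EDX/ECX values of CPUID leaf 1 (FPU is EDX bit 0, FXSR is EDX bit 24, XSAVE is ECX bit 26). The enum and function names are hypothetical, and the sketch ignores conditions the kernel does check, such as eager FPU mode and whether XSAVE has actually been enabled by the OS.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1 feature bits: FPU and FXSR in EDX, XSAVE in ECX. */
#define CPUID1_EDX_FPU   (UINT32_C(1) << 0)
#define CPUID1_EDX_FXSR  (UINT32_C(1) << 24)
#define CPUID1_ECX_XSAVE (UINT32_C(1) << 26)

static_assert(CPUID1_EDX_FXSR == UINT32_C(0x01000000), "FXSR is EDX bit 24");

/* Hypothetical names for the restore paths discussed in the thread. */
enum fpu_restore_path {
	RESTORE_NONE,		/* no FPU: fpu__clear() drops the state instead */
	RESTORE_XRSTOR,		/* copy_kernel_to_xregs() */
	RESTORE_FXRSTOR,	/* copy_kernel_to_fxregs() */
	RESTORE_FRSTOR,		/* copy_kernel_to_fregs(), e.g. on Quark */
};

/* Same if / else-if / else order as the patched function. */
static enum fpu_restore_path pick_restore_path(uint32_t edx, uint32_t ecx)
{
	if (!(edx & CPUID1_EDX_FPU))
		return RESTORE_NONE;
	if (ecx & CPUID1_ECX_XSAVE)
		return RESTORE_XRSTOR;
	if (edx & CPUID1_EDX_FXSR)
		return RESTORE_FXRSTOR;
	return RESTORE_FRSTOR;
}
```

A CPU reporting the FPU bit but not FXSR, as the empty grep result suggests for Quark, lands on the FRSTOR path; before the patch, that case fell through to FXRSTOR and tripped WARN_ON_FPU().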
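The source of Bryan's ./fpu test is not in the thread, so the exact undefined behavior Andy refers to is an assumption here. Judging by the "handler value of variable is ..." line, the handler appears to read a shared double and call printf(). In ISO C, a handler for an asynchronously delivered signal that refers to a static-storage object (other than by assigning to a volatile sig_atomic_t) or calls most library functions has undefined behavior, and POSIX does not list printf() as async-signal-safe. A minimal well-defined variant, with hypothetical names, only sets a flag in the handler and prints from normal context:

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* The only object a handler may portably write: volatile sig_atomic_t. */
static volatile sig_atomic_t got_sigint = 0;

/* Hypothetical handler: no printf(), no shared doubles, just a flag. */
static void on_sigint(int signo)
{
	(void)signo;
	got_sigint = 1;
}

/*
 * Hypothetical test loop: step a double by 0.1 and, once SIGINT has been
 * seen, report the current value from normal (non-handler) context.
 * Returns how many SIGINTs were reported, or -1 if the handler could
 * not be installed.
 */
static int run_fpu_loop(int iterations, int raise_at)
{
	double value = 0.0;
	int reports = 0;

	assert(iterations >= 0);
	got_sigint = 0;
	/* Re-install on each call: with SysV-style signal() semantics the
	 * handler is reset to SIG_DFL after one delivery. */
	if (signal(SIGINT, on_sigint) == SIG_ERR)
		return -1;

	for (int i = 0; i < iterations; i++) {
		printf("Double value is %f\n", value);
		if (i == raise_at)
			raise(SIGINT);	/* stands in for ^C; handler runs before raise() returns */
		if (got_sigint) {
			got_sigint = 0;
			reports++;
			printf("value when SIGINT arrived is %f\n", value);
		}
		value += 0.1;
	}
	return reports;
}
```

A program like this still depends on the kernel saving and restoring FPU state around signal delivery, so a -nan in its output would point at the kernel rather than at the test.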