From: Rusty Russell
To: Suresh Siddha
Cc: Simon Holm Thøgersen, Vegard Nossum, Patrick McHardy, Linux Kernel Mailinglist, Chuck Ebbert, "x86@kernel.org"
Subject: Re: 2.6.26-git: NULL pointer deref in __switch_to
Date: Wed, 18 Jun 2008 15:34:23 +1000

On Wednesday 18 June 2008 09:50:22 Suresh Siddha wrote:
> On Mon, Jun 16, 2008 at 02:21:23PM -0700, Simon Holm Thøgersen wrote:
> > > Can you please upload it somewhere? I will also try with another guest
> > > image meanwhile.
> >
> > [access provided to Suresh in private email]
>
> Simon, thanks.
>
> Simon, Patrick, I am able to reproduce the oops in __switch_to()
> with lguest. My debugging showed that there is at least one lguest-specific
> issue (which should be present in 2.6.25 and earlier as well), and it got
> exposed as a kernel oops by the recent FPU dynamic allocation patches.
>
> In addition to the previous possible scenario (with fpu_counter), in the
> presence of lguest it is possible that the CPU's TS bit is still set while
> the lguest launcher task's thread_info still has TS_USEDFPU set.
>
> This is because of the way the lguest launcher handles the guest's TS bit
> (look at lguest_set_ts() in lguest_arch_run_guest()). The unlazy_fpu() in
> __switch_to() then takes a DNA (device-not-available) fault, and that fault
> is handled in the context of the new process being switched in, as opposed
> to the context of the lguest launcher/helper process.
>
> This is wrong in both pre- and post-2.6.25 kernels. In the recent
> 2.6.26-rc series, it shows up as NULL pointer dereferences or
> "sleeping function called from atomic context" warnings (from __switch_to()),
> because we now free and dynamically allocate the FPU context for newly
> created threads. Older kernels might show FPU corruption for processes
> running inside lguest.
>
> With the appended patch, my test system has been running for more than
> 50 minutes now, so at least some of your oopses (hopefully all!) should be
> fixed. Please give it a try. I will spend more time on this fix tomorrow.
>
> Apart from the last hunk (the MSR_IA32_SYSENTER_CS change), I believe
> the patch below is needed for 2.6.25 as well.
>
> Thanks.
>
> Signed-off-by: Suresh Siddha
> ---
>
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 6d54833..e2db9ac 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -333,6 +333,7 @@ void flush_thread(void)
>  	/*
>  	 * Forget coprocessor state..
>  	 */
> +	tsk->fpu_counter = 0;
>  	clear_fpu(tsk);
>  	clear_used_math();
>  }
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index ac54ff5..c6eb5c9 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -294,6 +294,7 @@ void flush_thread(void)
>  	/*
>  	 * Forget coprocessor state..
>  	 */
> +	tsk->fpu_counter = 0;
>  	clear_fpu(tsk);
>  	clear_used_math();
>  }
> diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
> index 5126d5d..4a98404 100644
> --- a/drivers/lguest/x86/core.c
> +++ b/drivers/lguest/x86/core.c
> @@ -176,7 +176,7 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
>  	 * we set it now, so we can trap and pass that trap to the Guest if it
>  	 * uses the FPU. */
>  	if (cpu->ts)
> -		lguest_set_ts();
> +		unlazy_fpu(current);
>  
>  	/* SYSENTER is an optimized way of doing system calls. We can't allow
>  	 * it because it always jumps to privilege level 0. A normal Guest
> @@ -196,6 +196,10 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
>  	 * trap made the switcher code come back, and an error code which some
>  	 * traps set. */
>  
> +	/* Restore SYSENTER if it's supposed to be on. */
> +	if (boot_cpu_has(X86_FEATURE_SEP))
> +		wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
> +
>  	/* If the Guest page faulted, then the cr2 register will tell us the
>  	 * bad virtual address. We have to grab this now, because once we
>  	 * re-enable interrupts an interrupt could fault and thus overwrite
> @@ -203,13 +207,12 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
>  	if (cpu->regs->trapnum == 14)
>  		cpu->arch.last_pagefault = read_cr2();
>  	/* Similarly, if we took a trap because the Guest used the FPU,
> -	 * we have to restore the FPU it expects to see. */
> +	 * we have to restore the FPU it expects to see.
> +	 * math_state_restore() may sleep and we may even move off to
> +	 * a different CPU. So all the critical stuff should be done
> +	 * before this. */
>  	else if (cpu->regs->trapnum == 7)
>  		math_state_restore();

Hi Suresh,

Firstly, thanks for figuring this out.

But math_state_restore() has nasty semantics now: it may sleep, and we may even move to a different CPU. lguest currently works only because no code path following this call relies on staying on the same CPU.

So this patch is fine, but I wonder if I should instead force FPU allocation earlier for lguest tasks, so I can avoid this altogether?
Thanks,
Rusty.