Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754569AbYFRN6B (ORCPT ); Wed, 18 Jun 2008 09:58:01 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751352AbYFRN5x (ORCPT ); Wed, 18 Jun 2008 09:57:53 -0400
Received: from smtp.cs.aau.dk ([130.225.194.6]:46094 "EHLO smtp.cs.aau.dk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751194AbYFRN5x (ORCPT ); Wed, 18 Jun 2008 09:57:53 -0400
Subject: Re: 2.6.26-git: NULL pointer deref in __switch_to
From: Simon Holm Thøgersen
To: Suresh Siddha
Cc: Vegard Nossum , Patrick McHardy , Linux Kernel Mailinglist , Chuck Ebbert , "x86@kernel.org" , rusty@rustcorp.com.au
In-Reply-To: <20080617235022.GA23370@linux-os.sc.intel.com>
References: <4852B19E.4010202@trash.net> <19f34abd0806131124w32133715o3ef8c27cb0a9f96e@mail.gmail.com> <20080613224711.GA15084@linux-os.sc.intel.com> <1213611339.2495.11.camel@odie.local> <20080616174948.GB7788@linux-os.sc.intel.com> <1213651283.2495.46.camel@odie.local> <20080617235022.GA23370@linux-os.sc.intel.com>
Content-Type: text/plain; charset=utf-8
Date: Wed, 18 Jun 2008 15:57:54 +0200
Message-Id: <1213797474.8363.6.camel@odie.local>
Mime-Version: 1.0
X-Mailer: Evolution 2.22.2
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2392
Lines: 57

On Tue, 17 Jun 2008 at 16:50 -0700, Suresh Siddha wrote:
> On Mon, Jun 16, 2008 at 02:21:23PM -0700, Simon Holm Thøgersen wrote:
> > > Can you please upload it somewhere? I will also try with another guest
> > > image meanwhile.
> > >
> > [access provided to Suresh in private email]
>
> Simon, Thanks.
>
> Simon, Patrick, I am able to reproduce the oops in __switch_to()
> with lguest.
> My debug showed that there is at least one lguest-specific
> issue (which should be present in 2.6.25 and before as well) and it got
> exposed with a kernel oops with the recent fpu dynamic allocation patches.
>
> In addition to the previous possible scenario (with fpu_counter), in the
> presence of lguest, it is possible that the cpu's TS bit is still set and the
> lguest launcher task's thread_info has TS_USEDFPU still set.
>
> This is because of the way the lguest launcher handles the guest's TS bit
> (look at lguest_set_ts() in lguest_arch_run_guest()). This can result
> in a DNA fault while doing unlazy_fpu() in __switch_to(). This will
> end up causing a DNA fault in the context of the new process that's
> getting context switched in (as opposed to handling the DNA fault in the
> context of the lguest launcher/helper process).
>
> This is wrong in both pre and post 2.6.25 kernels. In the recent
> 2.6.26-rc series, this is showing up as NULL pointer dereferences or
> sleeping function called from atomic context (__switch_to()), as
> we free and dynamically allocate the FPU context for the newly
> created threads. Older kernels might show some FPU corruption for processes
> running inside of lguest.
>
> With the appended patch, my test system has been running for more than
> 50 mins now. So at least some of your oopses (hopefully all!) should get fixed.
> Please give it a try. I will spend more time with this fix tomorrow.
>
> Apart from the last hunk (MSR_IA32_SYSENTER_CS changes), I believe
> the below patch is needed for 2.6.25 as well.
>
> Thanks.
>
> Signed-off-by: Suresh Siddha

Thanks a lot Suresh, this fixes the issue for me. Feel free to add

Tested-by: Simon Holm Thøgersen

Simon
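
The failure mode described in the quoted analysis (TS bit set on the cpu
while the outgoing task still has TS_USEDFPU set) can be illustrated with
a small user-space toy model. This is only a sketch under stated
assumptions: it is not kernel code, every toy_* name is hypothetical, and
it models only the idea that saving FPU state with TS set takes a
device-not-available (DNA) fault in the middle of the switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; every name here is hypothetical, not a kernel API. */
struct toy_cpu {
    bool ts;          /* models the CR0.TS bit */
    int  dna_faults;  /* device-not-available faults taken so far */
};

struct toy_task {
    bool used_fpu;    /* models TS_USEDFPU in thread_info */
};

/* Saving FPU state touches the FPU, which faults if TS is set. */
static void toy_save_fpu(struct toy_cpu *cpu, struct toy_task *task)
{
    assert(cpu != NULL && task != NULL);
    if (cpu->ts) {
        cpu->dna_faults++;   /* fault taken in the middle of the switch */
        cpu->ts = false;     /* the handler clears TS so the save can proceed */
    }
    task->used_fpu = false;
}

/* Models the unlazy step for the outgoing task during a context switch. */
static void toy_unlazy_fpu(struct toy_cpu *cpu, struct toy_task *prev)
{
    if (prev->used_fpu) {
        toy_save_fpu(cpu, prev);
        cpu->ts = true;      /* next FPU use by the incoming task will trap */
    }
}

/* Returns the number of faults taken while switching away from prev. */
static int toy_switch_from(struct toy_cpu *cpu, struct toy_task *prev)
{
    int before = cpu->dna_faults;
    toy_unlazy_fpu(cpu, prev);
    return cpu->dna_faults - before;
}
```

In this model, calling toy_switch_from() with ts clear and used_fpu set
takes no fault, which is the consistent state. Calling it with both ts and
used_fpu set takes one fault during the switch, which corresponds to the
inconsistent state the quoted analysis attributes to the lguest launcher.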