Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757687Ab0GNV1u (ORCPT ); Wed, 14 Jul 2010 17:27:50 -0400
Received: from rcsinet10.oracle.com ([148.87.113.121]:21968 "EHLO
	rcsinet10.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1757013Ab0GNV1s (ORCPT );
	Wed, 14 Jul 2010 17:27:48 -0400
Message-ID: <4C3E2AE6.30406@kernel.org>
Date: Wed, 14 Jul 2010 14:23:50 -0700
From: Yinghai Lu
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100520 SUSE/3.0.5 Thunderbird/3.0.5
MIME-Version: 1.0
To: "H. Peter Anvin" , Ingo Molnar , Don Zickus , Frederic Weisbecker
CC: Thomas Gleixner , Suresh Siddha , "linux-kernel@vger.kernel.org"
Subject: Re: tip/master broken with x2apic and kexec
References: <4C3BD6AA.3070908@kernel.org> <4C3CE210.2030902@zytor.com> <4C3CF650.30905@kernel.org> <4C3E1FA0.9000107@kernel.org>
In-Reply-To: <4C3E1FA0.9000107@kernel.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Source-IP: acsmt355.oracle.com [141.146.40.155]
X-Auth-Type: Internal IP
X-CT-RefId: str=0001.0A090203.4C3E2BB6.0148,ss=1,fgs=0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3344
Lines: 87

On 07/14/2010 01:35 PM, Yinghai Lu wrote:
> On 07/13/2010 04:27 PM, Yinghai Lu wrote:
>> On 07/13/2010 03:00 PM, H. Peter Anvin wrote:
>>> On 07/12/2010 07:59 PM, Yinghai Lu wrote:
>>>> tip/master:
>>>> system1: BIOS enabled x2apic, first kernel boot well, and when kexec second kernel will cause system instant reboot.
>>>>
>>>> system2: BIOS not enable x2apic, first kernel boot well and enable x2apic, and kexec second kernel well. but when kexec third kernel will case system instant reboot.
>>>>
>>>> linus' tree is ok.
>>>>
>>>> but for system2 if boot with nox2apic ,intr-remaping off, iommu off, the kexec loop test will pass.
>>>>
>>>> the problem looks start in recent two or three weeks.
>>>>
>>>> Any idea?
>>>>
>>>> bisecting will take a while, because the system post take a while everytime.
>>>>
>>>> Thanks
>>>>
>>>> Yinghai Lu
>>>
>>> OK, I found the bug... if you could test out the patch which will be
>>> sent out shortly I would very much appreciate it.
>>
>> not sure if your patch is the offending one now.
>>
>> kL: kernel from linus tree
>> kT1: kernel from tip
>> kT2: kernel from tip with reverting your patch
>>
>> BIOS-->kL ---> kL ---> kL....always working
>> BIOS-->kT1 ---> kT1 ---> kT1 : between second one and third one system reset instant...
>> BIOS-->kT2 ---> kT2 ---> kT2 : between second one and third one system reset instant...
>>
>> BIOS-->kL ---> kL ---> kL ---> then kT1 ---> kT1 .... always working
>> BIOS-->kL ---> kL ---> kL ---> then kT2 ---> kT2 .... always working
>>
>
> bisecting said:
>
>> git bisect good
> 58687acba59266735adb8ccd9b5b9aa2c7cd205b is the first bad commit
> commit 58687acba59266735adb8ccd9b5b9aa2c7cd205b
> Author: Don Zickus
> Date: Fri May 7 17:11:44 2010 -0400
>
>     lockup_detector: Combine nmi_watchdog and softlockup detector
>
>     The new nmi_watchdog (which uses the perf event subsystem) is very
>     similar in structure to the softlockup detector. Using Ingo's
>     suggestion, I combined the two functionalities into one file:
>     kernel/watchdog.c.
>
>     Now both the nmi_watchdog (or hardlockup detector) and softlockup
>     detector sit on top of the perf event subsystem, which is run every
>     60 seconds or so to see if there are any lockups.
>
>     To detect hardlockups, cpus not responding to interrupts, I
>     implemented an hrtimer that runs 5 times for every perf event
>     overflow event. If that stops counting on a cpu, then the cpu is
>     most likely in trouble.
>
>     To detect softlockups, tasks not yielding to the scheduler, I used the
>     previous kthread idea that now gets kicked every time the hrtimer fires.
>     If the kthread isn't being scheduled neither is anyone else and the
>     warning is printed to the console.
>
>     I tested this on x86_64 and both the softlockup and hardlockup paths
>     work.
>

With

# CONFIG_LOCKUP_DETECTOR is not set
# CONFIG_HARDLOCKUP_DETECTOR is not set

the kexec loop test passes.

Also, that patch breaks kexec/kdump on systems where x2apic is pre-enabled.

Yinghai
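
For readers following the bisect result, the detection scheme described in the quoted commit message can be sketched roughly as below. This is a user-space Python toy model, not the code in kernel/watchdog.c: the class, function names, and tick-based timing are all hypothetical and chosen only for illustration. The only detail taken from the commit message is that the hrtimer fires 5 times per perf event overflow, that a stalled hrtimer count suggests a hard lockup, and that a watchdog kthread that stops being scheduled suggests a soft lockup.

```python
from dataclasses import dataclass

# From the quoted commit message: the hrtimer runs 5 times per perf overflow.
HRTIMER_TICKS_PER_PERF_EVENT = 5


@dataclass
class CpuState:
    """Per-CPU bookkeeping (hypothetical names, for illustration only)."""
    hrtimer_count: int = 0        # bumped on every hrtimer firing
    hrtimer_count_saved: int = 0  # value observed at the last perf overflow
    last_kthread_tick: int = 0    # last tick at which the watchdog kthread ran


def hrtimer_tick(cpu, now, kthread_ran, soft_thresh_ticks):
    """Model one hrtimer firing; return True if a soft lockup is suspected."""
    cpu.hrtimer_count += 1
    if kthread_ran:
        cpu.last_kthread_tick = now
    # The kthread has not been scheduled for too long: nothing else is either.
    return now - cpu.last_kthread_tick > soft_thresh_ticks


def perf_overflow(cpu):
    """Model one perf event overflow; return True if a hard lockup is suspected."""
    stalled = cpu.hrtimer_count == cpu.hrtimer_count_saved
    cpu.hrtimer_count_saved = cpu.hrtimer_count
    return stalled


def simulate_hard(ticks_alive, perf_periods):
    """Run perf_periods overflows; hrtimer interrupts stop after ticks_alive ticks."""
    cpu = CpuState()
    results = []
    tick = 0
    for _ in range(perf_periods):
        for _ in range(HRTIMER_TICKS_PER_PERF_EVENT):
            if tick < ticks_alive:
                hrtimer_tick(cpu, tick, kthread_ran=True, soft_thresh_ticks=2)
            tick += 1
        results.append(perf_overflow(cpu))
    return results
```

In this model, `simulate_hard(ticks_alive=5, perf_periods=3)` represents a CPU whose timer interrupts stop after the first perf period, so the second and third overflow checks flag it. The sketch says nothing about why kexec fails with the detector enabled; it only illustrates what the bisected commit sets out to do.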