Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965056AbVKOWts (ORCPT ); Tue, 15 Nov 2005 17:49:48 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S965057AbVKOWts (ORCPT ); Tue, 15 Nov 2005 17:49:48 -0500 Received: from e34.co.us.ibm.com ([32.97.110.152]:43947 "EHLO e34.co.us.ibm.com") by vger.kernel.org with ESMTP id S965056AbVKOWtr (ORCPT ); Tue, 15 Nov 2005 17:49:47 -0500 Message-ID: <437A6609.4050803@us.ibm.com> Date: Tue, 15 Nov 2005 14:49:45 -0800 From: Badari Pulavarty User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Andrew Morton CC: linux-kernel@vger.kernel.org, hugh@veritas.com, Dave Airlie Subject: Re: 2.6.14 X spinning in the kernel References: <1132012281.24066.36.camel@localhost.localdomain> <20051114161704.5b918e67.akpm@osdl.org> <1132015952.24066.45.camel@localhost.localdomain> <20051114173037.286db0d4.akpm@osdl.org> In-Reply-To: <20051114173037.286db0d4.akpm@osdl.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2746 Lines: 79 Andrew Morton wrote: > Badari Pulavarty wrote: > >>On Mon, 2005-11-14 at 16:17 -0800, Andrew Morton wrote: >> >>>Badari Pulavarty wrote: >>> >>>>My 2-cpu EM64T machine started showing this problem again on 2.6.14. >>>>On some reboots, X seems to spin in the kernel forever. >>>> >>>>sysrq-t output shows nothing. >>>> >>>>X R running task 0 3607 3589 3903 >>>>(L-TLB) >>>> >>>>top shows: >>>> 3607 root 25 0 0 0 0 R 99.1 0.0 262:04.69 X >>>> >>>> >>>>So, I wrote a module to do smp_call_function() on all CPUs >>>>to show stacks on them. CPU0 seems to be spinning in exit_mmap(). >>>>I did this multiple times to collect stacks few times. >>>> >>>>Is this a known issue ? >>> >>>Nope. Maybe your vma list has a loop in it, in remove_vma()? slab >>>debugging would detect that, due to the repeated >>>kmem_cache_free(vm_area_cachep, vma); >> >>I compiled the kernel with slab debug and rebooted the machine. >>X seems to be spinning again. But this time, it shows completely >>different routines (and seems to be switching CPUs) :( >>Something weird is happening on my machine.. >> >>top: >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 3600 root 25 0 0 0 0 R 99.9 0.0 8:29.18 X >> >> >>... >> >>Then I tried killing it and ran into.. >> >>CPU0: >>ffffffff8053c750 0000000000000000 00000000000018ff ffff81011c9a4230 >> ffff81011c9a4000 ffffffff8053c788 ffffffff8026de8f >>ffffffff8053c7a8 >> ffffffff80119591 ffffffff8053c7a8 >>Call Trace: {showacpu+47} >>{smp_call_function_interrupt+81} >> {call_function_interrupt+132} >>{:radeon:radeon_do_wait_for_idle+117} >> {:radeon:radeon_do_wait_for_idle+134} >> {:radeon:radeon_do_cp_idle+336} >>{:radeon:radeon_do_release+85} >> {:radeon:radeon_driver_pretakedown+9} >> {drm_takedown+74} > > > ah-hah. We've had machines stuck in radeon_do_wait_for_idle() before. In > fact, my workstation was doing it a year or two back. > > Are you able to identify the most recent kernel which didn't do this? I got this machine recently. 2.6.14-rc kernels are the first ones I tried on this box and they have been failing :( Only known good kernel I had so far was RHEL4 (2.6.9 base). Thanks, Badari - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/