Subject: Re: Problems with 2.6.17-rt8
From: Steven Rostedt
To: Robert Crocombe
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Thomas Gleixner, Bill Huey
Date: Thu, 03 Aug 2006 10:27:41 -0400
Message-Id: <1154615261.32264.6.camel@localhost.localdomain>
References: <1154541079.25723.8.camel@localhost.localdomain>

Please don't trim CC lines.  LKML is too big to read all emails.

On Thu, 2006-08-03 at 04:48 -0700, Robert Crocombe wrote:
> On 8/2/06, Steven Rostedt wrote:
> > You mention problems but I don't see you listing what exactly the
> > problems are.  Just saying "the problems exist" doesn't tell us
> > anything.
> >
> > Don't assume that we will go to some web site to figure out what
> > you're talking about.  Please list the problems you are facing.
>
> The machine dies (no alt-sysrq, no keyboard LEDs of any kind: dead in
> the water).  I thought the log would provide more useful information
> without potentially erroneous editorialization by myself.  Here are
> some highlights:
>
> kjournald/1105[CPU#3]: BUG in debug_rt_mutex_unlock at kernel/rtmutex-debug.c:471

Ouch, that looks like kjournald is unlocking a lock that it doesn't own?

> Call Trace:
>  {_raw_spin_lock_irqsave+24}
>  {__WARN_ON+100}
>  {debug_rt_mutex_unlock+199}
>  {rt_lock_slowunlock+25}
>  {__lock_text_start+9}

Hmm, here we are probably having trouble with the percpu slab locks,
which is somewhat of a hack to get slabs working on a per-cpu basis.

>  {kmem_cache_alloc+202}

It would also be nice to know exactly where ffffffff80271e93 is.

>  {mempool_alloc_slab+17}
>  {mempool_alloc+75}
>  {generic_make_request+375}
>  {bio_alloc_bioset+35}
>  {bio_alloc+16}
>  {submit_bh+137}
>  {ll_rw_block+122}
>  {ll_rw_block+161}
>  {journal_commit_transaction+1011}
>  {_raw_spin_unlock_irqrestore+56}
>  {_raw_spin_unlock+46}
>  {rt_lock_slowunlock+65}
>  {__lock_text_start+9}
>  {try_to_del_timer_sync+85}
>  {kjournald+202}
>  {autoremove_wake_function+0}
>  {kjournald+0}
>  {keventd_create_kthread+0}
>  {kthread+219}
>  {schedule_tail+188}
>  {child_rip+8}
>  {keventd_create_kthread+0}
>  {kthread+0}
>  {child_rip+0}
> ---------------------------
> | preempt count: 00000002 ]
> | 2-level deep critical section nesting:
> ----------------------------------------
> .. [] ....    _raw_spin_lock+0x16/0x23
> .....[] ..   ( <= rt_lock_slowunlock+0x11/0x6b)
> .. [] ....    _raw_spin_lock_irqsave+0x18/0x29
> .....[] ..   ( <= __WARN_ON+0x1f/0x82)
>
> Somewhat later:
>
> Kernel BUG at kernel/rtmutex.c:639

The rest was probably caused as a side effect of the above.  The above
is already broken!

You have NUMA configured too, so that is also something to look at.

I still wouldn't ignore the first bug message you got:

----
BUG: scheduling while atomic: udev_run_devd/0x00000001/1568

Call Trace:
 {__schedule+155}
 {_raw_spin_unlock_irqrestore+53}
 {task_blocks_on_rt_mutex+518}
 {free_pages_bulk+39}
 {free_pages_bulk+39}
 ...
----

This could also have a side effect that messes things up.
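For reference, the fields in that message are comm/preempt_count/pid,
so the 0x00000001 means udev_run_devd entered the scheduler with a
leftover preempt count of 1.  The check itself lives in __schedule().
As a rough illustration of the failure mode (a hypothetical module, not
anything from your trace, and the count only behaves like this on a
preemptible or -rt kernel where preempt_disable() really bumps
preempt_count):

#include <linux/module.h>
#include <linux/preempt.h>
#include <linux/sched.h>

static int __init atomic_sched_demo_init(void)
{
	/*
	 * preempt_disable() raises preempt_count to 1, so we are now in
	 * atomic context.  Calling schedule() here trips the in_atomic()
	 * check in __schedule() and prints something like:
	 *
	 *   BUG: scheduling while atomic: insmod/0x00000001/<pid>
	 */
	preempt_disable();
	schedule();
	preempt_enable();
	return 0;
}

static void __exit atomic_sched_demo_exit(void)
{
}

module_init(atomic_sched_demo_init);
module_exit(atomic_sched_demo_exit);
MODULE_LICENSE("GPL");

In your case nothing called preempt_disable() by hand, of course; some
path left the count elevated before the task blocked on an rt_mutex,
and that is the real thing to hunt down.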
Unfortunately, right now I'm assigned to other tasks and I can't spend
much more time on this at the moment.  So hopefully Ingo, Thomas, Bill,
or someone else can help you find the reason for this problem.

-- Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/