Date: Wed, 20 Apr 2011 11:30:47 -0700
From: "Paul E. McKenney"
To: Ingo Molnar
Cc: Dave Jones, Linux Kernel, x86@kernel.org, Peter Zijlstra
Subject: Re: rcu stall.
Message-ID: <20110420183046.GQ2307@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20110420020215.GA30081@redhat.com> <20110420083616.GA1124@elte.hu>
In-Reply-To: <20110420083616.GA1124@elte.hu>

On Wed, Apr 20, 2011 at 10:36:16AM +0200, Ingo Molnar wrote:
>
> * Dave Jones wrote:
>
> > Machine was under heavy load (300 or so running processes
> > calling random system calls). The rcu stall detector kicked in,
> > spewed this, and then the machine completely locked up.
>
> Without having looked at it in detail, isn't this a lockup somewhere in the
> wireless code:
>
> > [] ? simple_release_fs+0x22/0x57
> > [] ? arch_local_irq_restore+0x6/0xd
> > [] lock_acquired+0x20f/0x21e
> > [] _raw_spin_lock+0x62/0x6a
> > [] ? simple_release_fs+0x22/0x57
> > [] ? _raw_spin_unlock+0x28/0x2c
> > [] simple_release_fs+0x22/0x57
> > [] debugfs_remove_recursive+0x11f/0x16b
> > [] ieee80211_debugfs_key_remove+0x1f/0x2e [mac80211]
> > [] __ieee80211_key_destroy+0x61/0x6d [mac80211]
> > [] ieee80211_key_link+0x12c/0x165 [mac80211]
> > [] ieee80211_add_key+0xfb/0x133 [mac80211]
> > [] nl80211_new_key+0xe5/0x106 [cfg80211]
> > [] ? cfg80211_get_dev_from_ifindex+0x72/0x7a [cfg80211]
> > [] genl_rcv_msg+0x1dc/0x207
> > [] ? genl_rcv+0x2d/0x2d
> > [] netlink_rcv_skb+0x43/0x8f
> > [] genl_rcv+0x26/0x2d
> > [] netlink_unicast+0xec/0x156
> > [] netlink_sendmsg+0x27f/0x2c0
> > [] __sock_sendmsg+0x69/0x75
> > [] sock_sendmsg+0xa1/0xb6
> > [] ? lock_release+0x181/0x18e
> > [] ? might_fault+0xa5/0xac
> > [] ? might_fault+0x5c/0xac
> > [] ? copy_from_user+0x2f/0x31
> > [] ? copy_from_user+0x2f/0x31
> > [] ? verify_iovec+0x52/0xa6
> > [] sys_sendmsg+0x23a/0x2b8
> > [] ? lock_acquire+0xec/0xfb
> > [] ? lock_release+0x181/0x18e
> > [] ? mntput+0x26/0x28
> > [] ? fput+0x1e6/0x1f5
> > [] ? path_put+0x1f/0x23
> > [] ? audit_syscall_entry+0x11c/0x148
> > [] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [] system_call_fastpath+0x16/0x1b
>
> RCU stall detector is simply the first thing that noticed the hang. Enabling
> the regular lockup detector would probably have resulted in a similar looking
> hang.

Hello, Dave,

In case the lockup detector would get you better information, you can
prevent RCU from checking for stalls by setting the rcu_cpu_stall_suppress
module parameter to 1, either as a boot parameter or via sysfs.

							Thanx, Paul
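
For anyone wanting to try the sysfs route, here is a rough sketch, not a tested recipe. The message above names only the parameter, rcu_cpu_stall_suppress, and does not say which module owns it, so the sketch searches /sys/module for it rather than assuming a module name. The helper names (find_param_files, set_stall_suppress, boot_args) are made up for illustration. Writing to the real sysfs file needs root, and the boot-parameter form printed at the end assumes the usual <module>.<parameter>=<value> convention for built-in module parameters.

```python
import glob
import os

PARAM_NAME = "rcu_cpu_stall_suppress"  # parameter named in the message above


def find_param_files(sysfs_root="/sys/module"):
    """Return every <module>/parameters/rcu_cpu_stall_suppress under sysfs_root.

    The owning module is not named in the message, so search instead of guessing.
    """
    pattern = os.path.join(sysfs_root, "*", "parameters", PARAM_NAME)
    return sorted(glob.glob(pattern))


def set_stall_suppress(value, sysfs_root="/sys/module"):
    """Write 0 or 1 to each matching parameter file; return the paths written."""
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    paths = find_param_files(sysfs_root)
    for path in paths:
        with open(path, "w") as f:
            f.write(f"{value}\n")
    return paths


def boot_args(sysfs_root="/sys/module"):
    """Build the equivalent boot-parameter strings from the modules found.

    Assumes the <module>.<parameter>=<value> convention for built-in modules.
    """
    args = []
    for path in find_param_files(sysfs_root):
        # path is <root>/<module>/parameters/<param>; go up two levels for the module
        module = os.path.basename(os.path.dirname(os.path.dirname(path)))
        args.append(f"{module}.{PARAM_NAME}=1")
    return args


if __name__ == "__main__":
    written = set_stall_suppress(1)  # needs root on a real system
    if not written:
        print(f"no {PARAM_NAME} parameter found under /sys/module")
    for path in written:
        print(f"set {path} to 1")
    for arg in boot_args():
        print(f"boot parameter equivalent: {arg}")
```

Writing 0 with the same helper turns stall checking back on once the lockup detector has produced its report.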