Date: Wed, 3 Feb 2010 23:52:34 +0100
From: Frederic Weisbecker
To: Alexander Beregalov
Cc: Linux Kernel Mailing List
Subject: Re: reiserfs deadlock
Message-ID: <20100203225232.GI5068@nowhere>
References: <20100203202909.GA5068@nowhere>

On Thu, Feb 04, 2010 at 01:43:53AM +0300, Alexander Beregalov wrote:
> On 3 February 2010 23:29, Frederic Weisbecker wrote:
> > On Wed, Feb 03, 2010 at 10:08:57PM +0300, Alexander Beregalov wrote:
> >> On 3 February 2010 22:03, Alexander Beregalov wrote:
> >> > Hi Frederic,
> >> >
> >> > I do not have the previous messages and do not know how to reproduce it.
> >> > The kernel was 2.6.33-rc5-00237-g9a3cbe3.
> >> >
> >>
> >> Hm, I get the same after a reboot.
> >>
> >> Do you need me to do anything before I try fsck?
> >
> >
> > Yeah. Does your kernel soft lockup again after rebooting?
> Yes, rebooting does not help. I can't even log in; agetty and sshd are frozen.
>
> INFO: task sshd:1863 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> sshd D 6f60ec44 6576 1863 1810 0x00000000
> f633dd78 00000046 ffffffff 6f60ec44 0000000f f7306b30 f73068b0 00000000
> f7306d84 7fffffff 00000000 f633de70 f633dde8 c134da45 00000000 f633dd8c
> c104ca3b 00000000 7fffffff 0000000f 6f618f50 f73068b0 00000000 00000000
> Call Trace:
> [] schedule_timeout+0x125/0x1b0
> [] ? trace_hardirqs_off+0xb/0x10
> [] ? _raw_spin_unlock_irq+0x22/0x30
> [] ? trace_hardirqs_on_caller+0x124/0x170
> [] ? trace_hardirqs_on+0xb/0x10
> [] wait_for_common+0xd0/0x130
> [] ? default_wake_function+0x0/0x10
> [] wait_for_completion+0x12/0x20
> [] call_usermodehelper_exec+0x89/0xb0
> [] ? call_usermodehelper_setup+0x71/0xb0
> [] ? wait_for_common+0x30/0x130
> [] __request_module+0xa2/0xf0
> [] ? new_inode+0x76/0x80
> [] ? _raw_spin_unlock+0x1d/0x20
> [] __sock_create+0x18f/0x1f0
> [] ? might_fault+0x4a/0xa0
> [] sock_create+0x37/0x40
> [] sys_socket+0x3e/0x70
> [] sys_socketcall+0x60/0x270
> [] ? sysenter_exit+0xf/0x18
> [] ? trace_hardirqs_on_thunk+0xc/0x10
> [] sysenter_do_call+0x12/0x36
> no locks held by sshd/1863.
>
> No locks - what does it mean?

It means sshd is not blocked on a lock at all: it is sleeping in
wait_for_completion(). This is the call_usermodehelper_exec() path:
socket creation goes through __request_module(), so the kernel asks
userspace to load a module and waits for the helper to finish. Since
the filesystem is locked up, that helper can't complete, and the wait
never ends.

> >
> > Usually such a softlockup happens because of a lock
> > inversion, in which case you should have a lockdep report
> > before the softlockup.
>
> No, I don't have one. 120 seconds after boot I see these messages on
> the console, but no lockdep reports (lockdep is enabled).

So this is probably the second case: a task waiting for an event that
can't complete.

> >
> > Otherwise, this can also happen when we wait for an event
> > that needs the lock to complete, but that can't happen
> > because we already hold the lock.
> >
> > Task A holds the reiserfs lock and waits for event 1.
> > Task B wants to complete event 1 but needs the reiserfs lock
> > for that => deadlock.
> >
> > This can usually be spotted in a softlockup report: lots of
> > tasks are blocked on reiserfs_write_lock/mutex_lock
> > except one. That one is important because it is probably
> > the waiter: the task that holds the lock and is waiting
> > for another event (which in turn needs the lock to complete).
> >
> > More reports would probably help (by default only the first
> > 10 hung tasks are reported):
> >
> > echo 100 > /proc/sys/kernel/hung_task_warnings
>
> OK, I will modify the rc scripts to do it, since I can't log in.

Thanks!