Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965225AbZLHAbU (ORCPT ); Mon, 7 Dec 2009 19:31:20 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S965199AbZLHAbT (ORCPT ); Mon, 7 Dec 2009 19:31:19 -0500 Received: from mail-ew0-f219.google.com ([209.85.219.219]:64907 "EHLO mail-ew0-f219.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965198AbZLHAbR (ORCPT ); Mon, 7 Dec 2009 19:31:17 -0500 DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=O/vTctMNPb1BrZHNRSCBAJ6HuKyNUdyFh3sQ0+ubZdF3RirUQY1vcwrv3Y76X33bsE Vdikl+mtWNb8ZUu+r4ph37X/ozMVMbMTBhV77yt23lKTqi1ffRdPuQqkNHqH7ZcpcCYL zOsZudoEHfTYHnW7/2+PzRi482LLi6A1xPXqE= Date: Tue, 8 Dec 2009 01:23:53 +0100 From: Richard Zidlicky To: Andrew Morton Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Subject: Re: 2.6.32 regression, hard lock Message-ID: <20091208002353.GA30011@linux-m68k.org> References: <20091206141419.GA6195@linux-m68k.org> <20091207143057.53c6a362.akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20091207143057.53c6a362.akpm@linux-foundation.org> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 10607 Lines: 147 > > first visible symptom is that ppp over UMTS connection "stops working", connection > > did not die. Trying to restart the connection fails and the related processes hang. > > > > After that one after another everything stops working, "telinit 6" does not do its > > job and SysRQ sync-umount-reboot is logged in messages but has no visible effect. > > > > Attaching messages and .config. > > > > I uploaded your dmesg output to > http://userweb.kernel.org/~akpm/stuff/messages.txt > > I'm looking at those traces and am not able to develop a theory to > explain it :( thanks for looking at it. I did revert to 2.31.5 and to my big surprise found a similar thing happened so it is not a regression, at least not from 2.6.31. No idea why it did not occur anytime sooner while running earlier kernels, however the most recently plugged in device is an USB webcam with micro which I am suspecting to play a role in this message (posted more details about it earlier): Dec 6 10:31:39 localhost kernel: [ 162.087836] ======================================================= Dec 6 10:31:39 localhost kernel: [ 162.087839] [ INFO: possible circular locking dependency detected ] Dec 6 10:31:39 localhost kernel: [ 162.087842] 2.6.32v0 #1 Dec 6 10:31:39 localhost kernel: [ 162.087844] ------------------------------------------------------- Dec 6 10:31:39 localhost kernel: [ 162.087846] pulseaudio/4506 is trying to acquire lock: Dec 6 10:31:39 localhost kernel: [ 162.087848] (sysfs_mutex){+.+.+.}, at: [] sysfs_get_dirent+0x15/0x35 Dec 6 10:31:39 localhost kernel: [ 162.087857] Dec 6 10:31:39 localhost kernel: [ 162.087858] but task is already holding lock: Dec 6 10:31:39 localhost kernel: [ 162.087860] (&pcm->open_mutex){+.+.+.}, at: [] snd_pcm_release+0x55/0x9e [snd_pcm] Dec 6 10:31:39 localhost kernel: [ 162.087873] Dec 6 10:31:39 localhost kernel: [ 162.087874] which lock already depends on the new lock. Dec 6 10:31:39 localhost kernel: [ 162.087875] Dec 6 10:31:39 localhost kernel: [ 162.087876] Dec 6 10:31:39 localhost kernel: [ 162.087877] the existing dependency chain (in reverse order) is: Dec 6 10:31:39 localhost kernel: [ 162.087879] Dec 6 10:31:39 localhost kernel: [ 162.087880] -> #2 (&pcm->open_mutex){+.+.+.}: Dec 6 10:31:39 localhost kernel: [ 162.087884] [] __lock_acquire+0xa2a/0xbb5 Dec 6 10:31:39 localhost kernel: [ 162.087890] [] lock_acquire+0x94/0xb1 Dec 6 10:31:39 localhost kernel: [ 162.087893] [] __mutex_lock_common+0x35/0x2dc Dec 6 10:31:39 localhost kernel: [ 162.087898] [] mutex_lock_nested+0x30/0x38 Dec 6 10:31:39 localhost kernel: [ 162.087901] [] snd_pcm_release+0x55/0x9e [snd_pcm] Dec 6 10:31:39 localhost kernel: [ 162.087913] [] __fput+0xf0/0x187 Dec 6 10:31:39 localhost kernel: [ 162.087918] [] fput+0x19/0x1b Dec 6 10:31:39 localhost kernel: [ 162.087921] [] remove_vma+0x3e/0x5d Dec 6 10:31:39 localhost kernel: [ 162.087926] [] do_munmap+0x21c/0x237 Dec 6 10:31:39 localhost kernel: [ 162.087929] [] sys_munmap+0x30/0x3f Dec 6 10:31:39 localhost kernel: [ 162.087932] [] syscall_call+0x7/0xb Dec 6 10:31:39 localhost kernel: [ 162.087937] Dec 6 10:31:39 localhost kernel: [ 162.087938] -> #1 (&mm->mmap_sem){++++++}: Dec 6 10:31:39 localhost kernel: [ 162.087942] [] __lock_acquire+0xa2a/0xbb5 Dec 6 10:31:39 localhost kernel: [ 162.087946] [] lock_acquire+0x94/0xb1 Dec 6 10:31:39 localhost kernel: [ 162.087949] [] might_fault+0x64/0x81 Dec 6 10:31:39 localhost kernel: [ 162.087952] [] copy_to_user+0x2c/0xfc Dec 6 10:31:39 localhost kernel: [ 162.087957] [] filldir+0x78/0xb7 Dec 6 10:31:39 localhost kernel: [ 162.087961] [] sysfs_readdir+0x117/0x14b Dec 6 10:31:39 localhost kernel: [ 162.087964] [] vfs_readdir+0x68/0x94 Dec 6 10:31:39 localhost kernel: [ 162.087968] [] sys_getdents+0x62/0xa1 Dec 6 10:31:39 localhost kernel: [ 162.087971] [] syscall_call+0x7/0xb Dec 6 10:31:39 localhost kernel: [ 162.087975] Dec 6 10:31:39 localhost kernel: [ 162.087975] -> #0 (sysfs_mutex){+.+.+.}: Dec 6 10:31:39 localhost kernel: [ 162.087979] [] __lock_acquire+0x937/0xbb5 Dec 6 10:31:39 localhost kernel: [ 162.087983] [] lock_acquire+0x94/0xb1 Dec 6 10:31:39 localhost kernel: [ 162.087986] [] __mutex_lock_common+0x35/0x2dc Dec 6 10:31:39 localhost kernel: [ 162.087990] [] mutex_lock_nested+0x30/0x38 Dec 6 10:31:39 localhost kernel: [ 162.087993] [] sysfs_get_dirent+0x15/0x35 Dec 6 10:31:39 localhost kernel: [ 162.087996] [] sysfs_remove_group+0x1e/0xa3 Dec 6 10:31:39 localhost kernel: [ 162.088000] [] dpm_sysfs_remove+0x10/0x12 Dec 6 10:31:39 localhost kernel: [ 162.088006] [] device_del+0x33/0x154 Dec 6 10:31:39 localhost kernel: [ 162.088010] [] device_unregister+0x28/0x4b Dec 6 10:31:39 localhost kernel: [ 162.088014] [] usb_remove_ep_devs+0x15/0x1f Dec 6 10:31:39 localhost kernel: [ 162.088018] [] remove_intf_ep_devs+0x21/0x32 Dec 6 10:31:39 localhost kernel: [ 162.088023] [] usb_set_interface+0x10c/0x19f Dec 6 10:31:39 localhost kernel: [ 162.088027] [] snd_usb_capture_close+0x1e/0x37 [snd_usb_audio] Dec 6 10:31:39 localhost kernel: [ 162.088040] [] snd_pcm_release_substream+0x3d/0x66 [snd_pcm] Dec 6 10:31:39 localhost kernel: [ 162.088050] [] snd_pcm_release+0x5c/0x9e [snd_pcm] Dec 6 10:31:39 localhost kernel: [ 162.088058] [] __fput+0xf0/0x187 Dec 6 10:31:39 localhost kernel: [ 162.088062] [] fput+0x19/0x1b Dec 6 10:31:39 localhost kernel: [ 162.088065] [] remove_vma+0x3e/0x5d Dec 6 10:31:39 localhost kernel: [ 162.088069] [] do_munmap+0x21c/0x237 Dec 6 10:31:39 localhost kernel: [ 162.088072] [] sys_munmap+0x30/0x3f Dec 6 10:31:39 localhost kernel: [ 162.088076] [] syscall_call+0x7/0xb Dec 6 10:31:39 localhost kernel: [ 162.088079] Dec 6 10:31:39 localhost kernel: [ 162.088080] other info that might help us debug this: Dec 6 10:31:39 localhost kernel: [ 162.088080] Dec 6 10:31:39 localhost kernel: [ 162.088083] 2 locks held by pulseaudio/4506: Dec 6 10:31:39 localhost kernel: [ 162.088085] #0: (&mm->mmap_sem){++++++}, at: [] sys_munmap+0x23/0x3f Dec 6 10:31:39 localhost kernel: [ 162.088090] #1: (&pcm->open_mutex){+.+.+.}, at: [] snd_pcm_release+0x55/0x9e [snd_pcm] Dec 6 10:31:39 localhost kernel: [ 162.088104] Dec 6 10:31:39 localhost kernel: [ 162.088105] stack backtrace: Dec 6 10:31:39 localhost kernel: [ 162.088108] Pid: 4506, comm: pulseaudio Not tainted 2.6.32v0 #1 Dec 6 10:31:39 localhost kernel: [ 162.088110] Call Trace: Dec 6 10:31:39 localhost kernel: [ 162.088113] [] ? printk+0xf/0x13 Dec 6 10:31:39 localhost kernel: [ 162.088117] [] print_circular_bug+0x91/0x9d Dec 6 10:31:39 localhost kernel: [ 162.088121] [] __lock_acquire+0x937/0xbb5 Dec 6 10:31:39 localhost kernel: [ 162.088124] [] ? save_trace+0x37/0xa3 Dec 6 10:31:39 localhost kernel: [ 162.088128] [] lock_acquire+0x94/0xb1 Dec 6 10:31:39 localhost kernel: [ 162.088131] [] ? sysfs_get_dirent+0x15/0x35 Dec 6 10:31:39 localhost kernel: [ 162.088135] [] __mutex_lock_common+0x35/0x2dc Dec 6 10:31:39 localhost kernel: [ 162.088138] [] ? sysfs_get_dirent+0x15/0x35 Dec 6 10:31:39 localhost kernel: [ 162.088141] [] ? mark_held_locks+0x43/0x5b Dec 6 10:31:39 localhost kernel: [ 162.088145] [] ? trace_hardirqs_on+0xb/0xd Dec 6 10:31:39 localhost kernel: [ 162.088148] [] mutex_lock_nested+0x30/0x38 Dec 6 10:31:39 localhost kernel: [ 162.088151] [] ? sysfs_get_dirent+0x15/0x35 Dec 6 10:31:39 localhost kernel: [ 162.088154] [] sysfs_get_dirent+0x15/0x35 Dec 6 10:31:39 localhost kernel: [ 162.088157] [] sysfs_remove_group+0x1e/0xa3 Dec 6 10:31:39 localhost kernel: [ 162.088161] [] dpm_sysfs_remove+0x10/0x12 Dec 6 10:31:39 localhost kernel: [ 162.088164] [] device_del+0x33/0x154 Dec 6 10:31:39 localhost kernel: [ 162.088168] [] device_unregister+0x28/0x4b Dec 6 10:31:39 localhost kernel: [ 162.088171] [] usb_remove_ep_devs+0x15/0x1f Dec 6 10:31:39 localhost kernel: [ 162.088174] [] remove_intf_ep_devs+0x21/0x32 Dec 6 10:31:39 localhost kernel: [ 162.088178] [] usb_set_interface+0x10c/0x19f Dec 6 10:31:39 localhost kernel: [ 162.088190] [] snd_usb_capture_close+0x1e/0x37 [snd_usb_audio] Dec 6 10:31:39 localhost kernel: [ 162.088199] [] snd_pcm_release_substream+0x3d/0x66 [snd_pcm] Dec 6 10:31:39 localhost kernel: [ 162.088208] [] snd_pcm_release+0x5c/0x9e [snd_pcm] Dec 6 10:31:39 localhost kernel: [ 162.088211] [] __fput+0xf0/0x187 Dec 6 10:31:39 localhost kernel: [ 162.088215] [] fput+0x19/0x1b Dec 6 10:31:39 localhost kernel: [ 162.088218] [] remove_vma+0x3e/0x5d Dec 6 10:31:39 localhost kernel: [ 162.088221] [] do_munmap+0x21c/0x237 Dec 6 10:31:39 localhost kernel: [ 162.088225] [] sys_munmap+0x30/0x3f Dec 6 10:31:39 localhost kernel: [ 162.088228] [] syscall_call+0x7/0xb > There's some libata implication at timestamp 13203.385345 but that > looks like it's just a fluke. hm.. I am running everything on a dm-crypted device with LVM, my ata is a mix of ancient 40/80 wire-flat-cable disks and sata. > btw, why does your kernel say "To Be Filled By O.E.M." in the traces? > The kernel.org kernel doesn't do that. Is that kernel patched at all? strange, this was the first kernel in months that was not patched - even difffed it against kernel.org to make sure. Richard -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/