From: Jiri Slaby
Date: Fri, 08 Sep 2006 11:18:27 +0159
To: Jiri Slaby
CC: Andrew Morton, Linux Kernel, Arjan van de Ven, Ingo Molnar, Dave Jones
Subject: Re: workqueue lockdep bug.
Message-ID: <4501357A.5080501@gmail.com>
In-Reply-To: <44E3C12B.5030905@gmail.com>

Jiri Slaby wrote:
> Andrew Morton wrote:
>> On Mon, 14 Aug 2006 14:33:19 -0400
>> Dave Jones wrote:
>>
>>> Andrew,
>>>  I merged the workqueue changes from -mm into the Fedora-devel kernel
>>> to kill off those billion cpufreq lockdep warnings. The bug has now
>>> mutated into this:
>>>
>>> (Trimmed version of the log from
>>> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202223)
>>
>> I don't get it.
>
> Let me extend the output a little bit:
>
> clock  = mutex_lock(cpu_add_remove_lock)
> wqlock = mutex_lock(workqueue_mutex)
> slock  = mutex_lock(cpu_chain.rwsem)
> and similarly for cunlock, wqunlock and sunlock.
>
> The number after the colon is the line number where the mutex_XXX call
> lies. The prints happen _after_ the mutex_lock and _before_ the
> mutex_unlock calls.
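>
> As a minimal sketch, the instrumentation is just this pattern wrapped
> around each of the existing lock/unlock call sites (the printed names
> and the line numbers come from my local debug patch, not from
> mainline, so the exact placement below is only illustrative):
>
> 	/* e.g. around the cpu_add_remove_lock acquisition in kernel/cpu.c */
> 	mutex_lock(&cpu_add_remove_lock);
> 	printk(KERN_DEBUG "clock: %d\n", __LINE__);	/* after the lock */
>
> 	/* ... critical section ... */
>
> 	printk(KERN_DEBUG "cunlock: %d\n", __LINE__);	/* before the unlock */
> 	mutex_unlock(&cpu_add_remove_lock);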
>
> So here it comes:
>
> [ 30.947289] clock: 268
> [ 30.947340] Disabling non-boot CPUs ...
> [ 30.947392] slock: 334
> [ 30.964622] wqlock: 689
> [ 30.964659] sunlock: 336
>
> Isn't this strange for the validator (lock1-lock2-unlock1 +
> (below) lock1-unlock2-unlock1)?
>
> [ 30.966762] Breaking affinity for irq 0
> [ 30.968116] CPU 1 is now offline
> [ 30.968155] lockdep: not fixing up alternatives.
> [ 30.968200]
> [ 30.968201] =======================================================
> [ 30.968269] [ INFO: possible circular locking dependency detected ]
> [ 30.968307] 2.6.18-rc4-mm1-bug #11
> [ 30.968342] -------------------------------------------------------
>
>>> > Breaking affinity for irq 185
>>> > Breaking affinity for irq 193
>>> > Breaking affinity for irq 209
>>> > CPU 1 is now offline
>>> > lockdep: not fixing up alternatives.
>>> >
>>> > =======================================================
>>> > [ INFO: possible circular locking dependency detected ]
>>> > 2.6.17-1.2548.fc6 #1
>>> > -------------------------------------------------------
>>> > pm-hibernate/4335 is trying to acquire lock:
>>> >  ((cpu_chain).rwsem){..--}, at: [] blocking_notifier_call_chain+0x11/0x2d
>>> >
>>> > but task is already holding lock:
>>> >  (workqueue_mutex){--..}, at: [] mutex_lock+0x21/0x24
>>> >
>>> > which lock already depends on the new lock.
>>> >
>>> > the existing dependency chain (in reverse order) is:
>>> >
>>> > -> #1 (workqueue_mutex){--..}:
>>> >        [] lock_acquire+0x4b/0x6d
>>> >        [] __mutex_lock_slowpath+0xbc/0x20a
>>> >        [] mutex_lock+0x21/0x24
>>> >        [] workqueue_cpu_callback+0xfd/0x1ee
>>> >        [] notifier_call_chain+0x20/0x31
>>> >        [] blocking_notifier_call_chain+0x1d/0x2d
>>> >        [] _cpu_down+0x47/0x1c4
>>> >        [] disable_nonboot_cpus+0x9b/0x11a
>>> >        [] prepare_processes+0xe/0x41
>>> >        [] pm_suspend_disk+0x9/0xf3
>>> >        [] enter_state+0x54/0x1b7
>>> >        [] state_store+0x86/0x9c
>>> >        [] subsys_attr_store+0x20/0x25
>>> >        [] sysfs_write_file+0xab/0xd1
>>> >        [] vfs_write+0xab/0x157
>>> >        [] sys_write+0x3b/0x60
>>> >        [] syscall_call+0x7/0xb
>>
>> cpu_add_remove_lock -> cpu_chain.rwsem -> workqueue_mutex
>>
>>> > -> #0 ((cpu_chain).rwsem){..--}:
>>> >        [] lock_acquire+0x4b/0x6d
>>> >        [] down_read+0x2d/0x40
>>> >        [] blocking_notifier_call_chain+0x11/0x2d
>>> >        [] _cpu_down+0x12c/0x1c4
>>> >        [] disable_nonboot_cpus+0x9b/0x11a
>>> >        [] prepare_processes+0xe/0x41
>>> >        [] pm_suspend_disk+0x9/0xf3
>>> >        [] enter_state+0x54/0x1b7
>>> >        [] state_store+0x86/0x9c
>>> >        [] subsys_attr_store+0x20/0x25
>>> >        [] sysfs_write_file+0xab/0xd1
>>> >        [] vfs_write+0xab/0x157
>>> >        [] sys_write+0x3b/0x60
>>> >        [] syscall_call+0x7/0xb
>>
>> cpu_add_remove_lock -> cpu_chain.rwsem
>>
>>> > other info that might help us debug this:
>>> >
>>> > 2 locks held by pm-hibernate/4335:
>>> >  #0:  (cpu_add_remove_lock){--..}, at: [] mutex_lock+0x21/0x24
>>> >  #1:  (workqueue_mutex){--..}, at: [] mutex_lock+0x21/0x24
>>> >
>>> > stack backtrace:
>>> >  [] show_trace_log_lvl+0x58/0x159
>>> >  [] show_trace+0xd/0x10
>>> >  [] dump_stack+0x19/0x1b
>>> >  [] print_circular_bug_tail+0x59/0x64
>>> >  [] __lock_acquire+0x80d/0x99c
>>> >  [] lock_acquire+0x4b/0x6d
>>> >  [] down_read+0x2d/0x40
>>> >  [] blocking_notifier_call_chain+0x11/0x2d
>>> >  [] _cpu_down+0x12c/0x1c4
>>> >  [] disable_nonboot_cpus+0x9b/0x11a
>>> >  [] prepare_processes+0xe/0x41
>>> >  [] pm_suspend_disk+0x9/0xf3
>>> >  [] enter_state+0x54/0x1b7
>>> >  [] state_store+0x86/0x9c
>>> >  [] subsys_attr_store+0x20/0x25
>>> >  [] sysfs_write_file+0xab/0xd1
>>> >  [] vfs_write+0xab/0x157
>>> >  [] sys_write+0x3b/0x60
>>> >  [] syscall_call+0x7/0xb
>
> [ 30.981176] [] sys_write+0x47/0x6e
> [ 30.981249] [] syscall_call+0x7/0xb
> [ 30.981322] =======================
> [ 30.981378] slock: 334
>
> This is the one that failed.
>
> [ 30.981882] wqunlock: 702
> [ 30.981939] sunlock: 336
> [ 30.981996] CPU1 is down
> [ 30.982036] cunlock: 309
> [ 30.982075] Stopping tasks: ============
> [ 31.149008] ==================|
>
>> cpu_add_remove_lock -> cpu_chain.rwsem
>>
>> I don't see anywhere where this process took workqueue_mutex.
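>
> It took it in the first notifier pass. Spelled out as a sketch (call
> paths abbreviated to the locks involved; this is how I read the two
> chains above, not a verbatim code path):
>
> 	/* chain #1: first notifier pass during _cpu_down() */
> 	mutex_lock(&cpu_add_remove_lock);	/* cpu hotplug entry */
> 	down_read(&cpu_chain.rwsem);		/* blocking_notifier_call_chain() */
> 	mutex_lock(&workqueue_mutex);		/* workqueue_cpu_callback() */
> 	up_read(&cpu_chain.rwsem);		/* workqueue_mutex stays held */
>
> 	/* chain #0: second notifier pass, later in _cpu_down() */
> 	down_read(&cpu_chain.rwsem);		/* taken while workqueue_mutex is
> 						   still held -> the inversion */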
> Hope this helps?

I can't see any difference in 2.6.18-rc6-mm1. Has this been addressed
in the meantime?

thanks,

-- 
http://www.fi.muni.cz/~xslaby/    Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com
gpg pubkey fingerprint: B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E