Message-ID: <491BC4B8.1050406@colorfullife.com>
Date: Thu, 13 Nov 2008 07:10:00 +0100
From: Manfred Spraul
To: Andrew Morton
CC: cboulte@gmail.com, Nadia.Derbey@bull.net, linux-kernel@vger.kernel.org, Ingo Molnar
Subject: Re: [PATCH] SYSVIPC - Fix the ipc structures initialization
References: <20081028145952.620752409@bull.net> <20081028150041.857635775@bull.net> <4f3ee3290810290211y75a2d0eaoe666496e25496260@mail.gmail.com> <20081111141603.f0e7fa8d.akpm@linux-foundation.org>
In-Reply-To: <20081111141603.f0e7fa8d.akpm@linux-foundation.org>

Andrew Morton wrote:
> Time is starting to press on this one. Is there something which we can
> revert which would fix this bug?
>

My previous analysis was bogus; let's start from scratch:

1) The initial oops report:
http://bugzilla.kernel.org/show_bug.cgi?id=11796#c0

- lockdep is enabled; the oops is somewhere in __lock_acquire().
- The instruction that oopses is
>>> lock incl 0x138(%r12)
  and R12 is 0x0038004000000000.

That could be a debug_atomic_inc() in __lock_acquire().
The class pointer in the spinlock_t is not initialized, so the increment goes through a garbage pointer and crashes.
Ingo - is that possible?

2) The latest oops was actually a soft lockup.
It starts with:
> [ 400.393024] INFO: trying to register non-static key.
> [ 400.397005] the code is fine but needs lockdep annotation.
> [ 400.397005] turning off the locking correctness validator.
> [ 400.397005] Pid: 4207, comm: sysv_test2 Not tainted 2.6.27-ipc_lock #1
> [ 400.397005] Call Trace:
> [ 400.397005] [] static_obj+0x60/0x77
> [ 400.397005] [] __lock_acquire+0x1c8/0x779
> [ 400.397005] [] lock_acquire+0x95/0xc2
> [ 400.397005] [] ipc_lock+0x62/0x99
> [ 400.397005] [] _spin_lock+0x2d/0x5a
> [ 400.397005] [] ipc_lock+0x62/0x99
> [ 400.397005] [] ipc_lock+0x62/0x99
> [ 400.397005] [] ipc_lock+0x0/0x99
> [ 400.397005] [] ipc_lock_check+0x8/0x53
> [ 400.397005] [] sys_msgctl+0x188/0x461
> [ 400.397005] [] trace_hardirqs_on_caller+0x100/0x12a
> [ 400.397005] [] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 400.397005] [] trace_hardirqs_on_caller+0x100/0x12a
> [ 400.397005] [] sched_clock+0x5/0x7
> [ 400.397005] [] trace_hardirqs_on_thunk+0x3a/0x3f
> [ 400.397005] [] native_sched_clock+0x8c/0xa5
> [ 400.397005] [] sched_clock+0x5/0x7
> [ 400.397005] [] system_call_fastpath+0x16/0x1b
> [ 400.397005]
> [ 464.933003] BUG: soft lockup - CPU#2 stuck for 61s! [sysv_test2:4207]
> [ 464.933006] Call Trace:
> [ 464.933006] [] _raw_spin_lock+0x98/0x100
> [ 464.933006] [] _spin_lock+0x4e/0x5a
> [ 464.933006] [] ipc_lock+0x62/0x99

To me, it reads like an uninitialized spinlock_t: the static_obj() check in
kernel/lockdep.c notices that something is wrong, and lockdep turns itself off.
But then _raw_spin_lock() tries to acquire the uninitialized spinlock and loops
forever, because no one ever calls spin_unlock(). After 60 seconds, the soft
lockup detector notices the problem and oopses.