Date: Thu, 15 Dec 2005 04:09:13 +0530
From: Dinakar Guniguntala
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar, David Singleton
Subject: Recursion bug in -rt
Message-ID: <20051214223912.GA4716@in.ibm.com>
Reply-To: dino@in.ibm.com

Hi David,

I hit this bug with -rt22-rf11:

==========================================
[ BUG: lock recursion deadlock detected! |
------------------------------------------
already locked:  [f7abbc94] {futex}
.. held by:          testpi-3: 4595 [f7becdd0, 59]
... acquired at:               futex_wait_robust+0x142/0x1f3
------------------------------
| showing all locks held by: |  (testpi-3/4595 [f7becdd0, 59]):
------------------------------

#001:             [f7abbc94] {futex}
... acquired at:               futex_wait_robust+0x142/0x1f3

-{current task's backtrace}----------------->
 [] dump_stack+0x1e/0x20 (20)
 [] check_deadlock+0x2d7/0x334 (44)
 [] task_blocks_on_lock+0x2c/0x224 (36)
 [] __down_interruptible+0x37c/0x95d (160)
 [] down_futex+0xa3/0xe7 (40)
 [] futex_wait_robust+0x142/0x1f3 (72)
 [] do_futex+0x9a/0x109 (40)
 [] sys_futex+0x112/0x11e (68)
 [] sysenter_past_esp+0x54/0x75 (-8116)
------------------------------
| showing all locks held by: |  (testpi-3/4595 [f7becdd0, 59]):
------------------------------

#001:             [f7abbc94] {futex}
... acquired at:               futex_wait_robust+0x142/0x1f3

---------------------------------------------------------------------

futex.c -> futex_wait_robust

        if ((curval & FUTEX_PID) == current->pid) {
                ret = -EAGAIN;
                goto out_unlock;
        }

rt.c    -> down_futex

        if (!owner_task || owner_task == current) {
                up(sem);
                up_read(&current->mm->mmap_sem);
                return -EAGAIN;
        }

I noticed that both of the checks above have been removed in your patch.
I do understand that the futex_wait_robust path has been made similar to
the futex_wait path, but I think we are not taking PI into consideration.
Basically, it looks like we still need to check whether the current task
has become the owner. Or are we missing a lock somewhere?

I added the down_futex check above, and my test has been running for
hours without the oops. Without this check it used to oops within
minutes.

The patch that works for me is attached below. Thoughts?

	-Dinakar

Attachment: check_current.patch

Index: linux-2.6.14-rt22-rayrt5/kernel/rt.c
===================================================================
--- linux-2.6.14-rt22-rayrt5.orig/kernel/rt.c	2005-12-15 02:15:13.000000000 +0530
+++ linux-2.6.14-rt22-rayrt5/kernel/rt.c	2005-12-15 02:18:29.000000000 +0530
@@ -3001,7 +3001,7 @@
 	 * if the owner can't be found return try again.
 	 */
 
-	if (!owner_task) {
+	if (!owner_task || owner_task == current) {
 		up(sem);
 		up_read(&current->mm->mmap_sem);
 		return -EAGAIN;
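
For readers following the thread, here is a minimal user-space sketch of the decision the one-line patch changes. It is not kernel code: `struct task`, `struct pi_lock`, `may_block_on()` and `may_block_on_unpatched()` are hypothetical stand-ins invented for illustration, and only the condition itself is taken from the patch. The assumption, based on the backtrace above, is that a task which passes this check goes on to block on the lock, so letting the current owner through is what would lead to the recursion report.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the real -rt structures. */
struct task {
    int pid;
};

struct pi_lock {
    const struct task *owner;   /* NULL when the owner cannot be found */
};

/*
 * Models only the patched condition in down_futex():
 * return -EAGAIN when there is no owner or when the caller already owns
 * the lock; otherwise return 0, meaning the caller may go on to block.
 */
int may_block_on(const struct pi_lock *lock, const struct task *self)
{
    assert(lock != NULL && self != NULL);

    if (lock->owner == NULL || lock->owner == self)
        return -EAGAIN;     /* blocking here would mean waiting on ourselves */

    return 0;               /* a different task owns the lock */
}

/* The same decision with the pre-patch condition, for comparison. */
int may_block_on_unpatched(const struct pi_lock *lock, const struct task *self)
{
    assert(lock != NULL && self != NULL);

    if (lock->owner == NULL)
        return -EAGAIN;

    return 0;               /* also returned when lock->owner == self */
}
```

With a lock owned by task A, `may_block_on()` returns `-EAGAIN` when A itself asks and 0 when another task asks, while `may_block_on_unpatched()` returns 0 in both cases. That second result corresponds to the situation the patch is meant to rule out.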