Date: Wed, 28 Sep 2005 15:50:55 -0700 (PDT)
From: Christoph Lameter
To: Andrew Morton
cc: Ravikiran G Thirumalai, Petr Vandrovec, alokk@calsoftinc.com,
    linux-kernel@vger.kernel.org, manfred@colorfullife.com,
    "Shai Fultheim (Shai@scalex86.org)", ananth@in.ibm.com, Andi Kleen
Subject: Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849
In-Reply-To: <20050928210245.GA3760@localhost.localdomain>
References: <20050916023005.4146e499.akpm@osdl.org>
    <432AA00D.4030706@vc.cvut.cz>
    <20050916230809.789d6b0b.akpm@osdl.org>
    <432EE103.5020105@vc.cvut.cz>
    <20050919112912.18daf2eb.akpm@osdl.org>
    <20050919122847.4322df95.akpm@osdl.org>
    <20050919221614.6c01c2d1.akpm@osdl.org>
    <43301578.8040305@vc.cvut.cz>
    <20050928210245.GA3760@localhost.localdomain>

On Wed, 28 Sep 2005, Ravikiran G Thirumalai wrote:

> Just might be relevant here, I found a bug with the recent
> x86_64 changes to 2.6.14-rc* which causes the node_to_cpumask[] to go bad for
> the boot processor. This happens on both amd and em64t boxes. I guess the
> kevent/0 cpus_allowed mask might be changed by the bad node_to_cpumask[]
> here?

Andrew, could we add the following patch to the kernel to detect such
conditions in the future? This code will only be compiled in if slab
debugging is enabled.

---

[SLAB] Add additional debugging to detect slabs from the wrong node

This patch adds some stack dumps if the slab logic is processing slab
blocks from the wrong node. This is necessary in order to detect
situations as encountered by Petr.

Signed-off-by: Christoph Lameter

Index: linux-2.6.14-rc2/mm/slab.c
===================================================================
--- linux-2.6.14-rc2.orig/mm/slab.c	2005-09-27 13:22:30.000000000 -0700
+++ linux-2.6.14-rc2/mm/slab.c	2005-09-28 15:46:31.000000000 -0700
@@ -2421,6 +2421,7 @@ retry:
 			next = slab_bufctl(slabp)[slabp->free];
 #if DEBUG
 			slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
+			WARN_ON(numa_node_id() != slabp->nodeid);
 #endif
 			slabp->free = next;
 		}
@@ -2635,8 +2636,10 @@ static void free_block(kmem_cache_t *cac
 		check_spinlock_acquired_node(cachep, node);
 		check_slabp(cachep, slabp);
 
-
 #if DEBUG
+		/* Verify that the slab belongs to the intended node */
+		WARN_ON(slabp->nodeid != node);
+
 		if (slab_bufctl(slabp)[objnr] != BUFCTL_FREE) {
 			printk(KERN_ERR "slab: double free detected in cache "
 				"'%s', objp %p\n", cachep->name, objp);
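
For readers without a kernel tree at hand, the check the patch adds can be
modelled in userspace roughly as follows. This is only a sketch under stated
assumptions: MODEL_WARN_ON, struct model_slab, model_free_block and warn_count
are hypothetical names invented for illustration, not kernel interfaces, and
the macro prints a message and bumps a counter where the real WARN_ON() would
dump a stack trace.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's WARN_ON(): report the failed
 * condition and count it instead of dumping a stack trace. */
static int warn_count;

#define MODEL_WARN_ON(cond)                                          \
	do {                                                         \
		if (cond) {                                          \
			warn_count++;                                \
			fprintf(stderr, "warning: %s at %s:%d\n",    \
				#cond, __FILE__, __LINE__);          \
		}                                                    \
	} while (0)

/* Simplified model of a slab descriptor: only the field the check reads. */
struct model_slab {
	int nodeid;	/* node the slab's memory belongs to */
};

/* Models the check added to free_block(): a slab handled on behalf of
 * 'node' should belong to that node. Returns 1 on match, 0 on mismatch. */
static int model_free_block(const struct model_slab *slabp, int node)
{
	MODEL_WARN_ON(slabp->nodeid != node);
	return slabp->nodeid == node;
}
```

Calling model_free_block() with a slab whose nodeid differs from the node
argument prints one warning and increments warn_count; a matching pair is
silent. The allocation-side hunk applies the same comparison, with the
current node (numa_node_id() in the patch) taking the place of the node
argument.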