Date: Thu, 31 May 2007 11:41:22 -0700 (PDT)
From: Christoph Lameter
To: Andrew Morton
cc: Michal Piotrowski, linux-kernel@vger.kernel.org, "Rafael J. Wysocki", Satoru Takeuchi
Subject: Re: 2.6.22-rc3-mm1
In-Reply-To: <20070531113155.1c6d7185.akpm@linux-foundation.org>
References: <20070530235823.793f00d9.akpm@linux-foundation.org> <465F0B83.2050100@googlemail.com> <20070531113155.1c6d7185.akpm@linux-foundation.org>

On Thu, 31 May 2007, Andrew Morton wrote:

> > l *0xc0181288
> > 0xc0181288 is in add_partial (/home/devel/linux-mm/mm/slub.c:1193).
> > 1188    }
> > 1189
> > 1190    static void add_partial(struct kmem_cache_node *n, struct page *page)
> > 1191    {
> > 1192            spin_lock(&n->list_lock);
> > 1193            n->nr_partial++;
> > 1194            list_add(&page->lru, &n->partial);
> > 1195            spin_unlock(&n->list_lock);
> > 1196    }
> > 1197

add_partial() runs with interrupts disabled. Interrupts are disabled when
we enter SLUB via the allocator or free functions.

> > [ 4972.087650] {in-hardirq-W} state was registered at:
> > [ 4972.092656]   [] mark_lock+0x82/0x557
> > [ 4972.097323]   [] __lock_acquire+0x476/0xd36
> > [ 4972.102562]   [] lock_acquire+0x9e/0xb8
> > [ 4972.107342]   [] _spin_lock+0x38/0x62
> > [ 4972.111993]   [] deactivate_slab+0xb9/0x179
> > [ 4972.117300]   [] flush_slab+0x6d/0x72
> > [ 4972.122063]   [] __flush_cpu_slab+0x31/0x36
> > [ 4972.127335]   [] flush_cpu_slab+0x14/0x17
> > [ 4972.132401]   [] smp_call_function_interrupt+0x3a/0x56
> > [ 4972.138607]   [] call_function_interrupt+0x33/0x38
> > [ 4972.144503]   [] default_idle+0x50/0x69
> > [ 4972.149421]   [] cpu_idle+0xb3/0xf8
> > [ 4972.153889]   [] rest_init+0x56/0x58
> > [ 4972.158402]   [] start_kernel+0x351/0x359
> > [ 4972.163450]   [] 0xffffffff
> > [ 4972.167221] irq event stamp: 2451
>
> Yep, that's a bug in slub. We take that lock in the IPI handler. If a CPU
> is currently holding that lock and then takes the IPI and enters
> add_partial(), it'll deadlock.

A CPU cannot enter an IPI handler while interrupts are disabled, and
interrupts must always be disabled while the list_lock is held.

> Perhaps a suitable fix would be local_irq_disable() in flush_slab().

As far as I can tell, interrupts are always disabled when flush_slab() is
run.

Sometimes we use spin_lock_irqsave() for the list_lock, and at other times
plain spin_lock() if interrupts are already disabled. Is that the problem?
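
To make the argument above concrete, here is a rough userspace model of the
two cases. It is only a sketch and not kernel code: struct toy_cpu and every
toy_* function are hypothetical names made up for illustration, and the
model assumes the IPI handler takes the list_lock with a plain spin_lock(),
as flush_cpu_slab() does in the trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU. Hypothetical names, not kernel APIs. */
struct toy_cpu {
	bool irqs_enabled;    /* models the local interrupt flag */
	bool holds_list_lock; /* models this CPU holding n->list_lock */
};

enum ipi_result { IPI_DEFERRED, IPI_RAN, IPI_DEADLOCK };

/* Models an IPI whose handler takes list_lock with a plain spin_lock(). */
static enum ipi_result toy_deliver_ipi(const struct toy_cpu *cpu)
{
	if (!cpu->irqs_enabled)
		return IPI_DEFERRED; /* handler cannot run until irqs come back */
	if (cpu->holds_list_lock)
		return IPI_DEADLOCK; /* handler would spin on its own CPU's lock */
	return IPI_RAN;
}

/* Models spin_lock_irqsave(): save the irq state, disable irqs, take the lock. */
static bool toy_lock_irqsave(struct toy_cpu *cpu)
{
	bool saved = cpu->irqs_enabled;

	cpu->irqs_enabled = false;
	cpu->holds_list_lock = true;
	return saved;
}

/* Models spin_unlock_irqrestore(): drop the lock, restore the irq state. */
static void toy_unlock_irqrestore(struct toy_cpu *cpu, bool saved)
{
	assert(cpu->holds_list_lock);
	cpu->holds_list_lock = false;
	cpu->irqs_enabled = saved;
}

/* Models plain spin_lock(): the irq state is left untouched. */
static void toy_lock(struct toy_cpu *cpu)
{
	cpu->holds_list_lock = true;
}

/* Models plain spin_unlock(). */
static void toy_unlock(struct toy_cpu *cpu)
{
	assert(cpu->holds_list_lock);
	cpu->holds_list_lock = false;
}
```

In this model the only way to reach IPI_DEADLOCK is to call toy_lock() while
irqs_enabled is still true. So the plain spin_lock() in add_partial() is
safe only if every caller really does arrive with interrupts disabled, which
is the condition in question here.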