Date: Fri, 10 Jan 2014 13:11:43 +0100
From: Peter Zijlstra
To: Dave Jones, linux-kernel@vger.kernel.org, Ingo Molnar, Thomas Gleixner, Steven Rostedt, Oleg Nesterov, Paul McKenney, Linus Torvalds
Subject: Re: [RFC][PATCH] lockdep: Introduce wait-type checks
Message-ID: <20140110121143.GM31570@twins.programming.kicks-ass.net>
In-Reply-To: <20140109173326.GA10105@redhat.com>

On Thu, Jan 09, 2014 at 12:33:26PM -0500, Dave Jones wrote:
> On Thu, Jan 09, 2014 at 12:15:16PM +0100, Peter Zijlstra wrote:
> > Subject: lockdep: Introduce wait-type checks
> > From: Peter Zijlstra
> > Date: Tue, 19 Nov 2013 21:45:48 +0100
> >
> > This patch extends lockdep to validate lock wait-type context.
>
> ooh, a new toy.
>
> *boom*
>
> [ 0.298629] =============================
> [ 0.298732] [ BUG: Invalid wait context ]
> [ 0.298834] 3.13.0-rc7+ #15 Not tainted
> [ 0.298935] -----------------------------
> [ 0.299038] swapper/0/1 is trying to lock:
> [ 0.299135] (&n->list_lock){......}-{3:3}, at: [] get_partial_node.isra.49+0x4d/0x228
> [ 0.299453]
> stack backtrace:
> [ 0.299608] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc7+ #15
> [ 0.299983] 0000000000000001 ffff880243f37a00 ffffffff816dfe5b 0000000000000014
> [ 0.300302] ffff880243f37a78 ffffffff8109f1f7 0000000000000000 ffff880243f37a78
> [ 0.300611] 0000000000000046 ffffffff81189ae3 ffffffff00000000 0000000000000046
> [ 0.300927] Call Trace:
> [ 0.301028] [] dump_stack+0x4e/0x7a
> [ 0.301128] [] __lock_acquire.isra.28+0x3d7/0xd80
> [ 0.301238] [] ? deactivate_slab+0x3c3/0x740
> [ 0.301345] [] lock_acquire+0x8d/0x120
> [ 0.302971] [] ? get_partial_node.isra.49+0x4d/0x228
> [ 0.303077] [] _raw_spin_lock+0x3b/0x50
> [ 0.303183] [] ? get_partial_node.isra.49+0x4d/0x228
> [ 0.303290] [] get_partial_node.isra.49+0x4d/0x228
> [ 0.303397] [] ? __module_text_address+0x12/0x60
> [ 0.303502] [] ? is_module_text_address+0x2f/0x50
> [ 0.303610] [] ? __kernel_text_address+0x58/0x80
> [ 0.303717] [] __slab_alloc+0x1cd/0x562
> [ 0.303821] [] ? alloc_cpumask_var_node+0x1f/0x90
> [ 0.303929] [] kmem_cache_alloc_node_trace+0xda/0x290
> [ 0.304037] [] ? alloc_cpumask_var_node+0x1f/0x90
> [ 0.304145] [] alloc_cpumask_var_node+0x1f/0x90
> [ 0.304250] [] alloc_cpumask_var+0xe/0x10
> [ 0.304357] [] __assign_irq_vector+0x40/0x340
> [ 0.304462] [] __create_irqs+0x151/0x210
> [ 0.304567] [] create_irq+0x22/0x30
> [ 0.304674] [] dmar_set_interrupt+0x2d/0xd0
> [ 0.304784] [] enable_drhd_fault_handling+0x24/0x66
> [ 0.304890] [] irq_remap_enable_fault_handling+0x26/0x30
> [ 0.304999] [] bsp_end_local_APIC_setup+0x18/0x1a
> [ 0.305106] [] native_smp_prepare_cpus+0x35c/0x3d3
> [ 0.305215] [] kernel_init_freeable+0x124/0x26c
> [ 0.305321] [] ? kernel_init+0xe/0x130
> [ 0.305427] [] ? rest_init+0xd0/0xd0
> [ 0.305529] [] kernel_init+0xe/0x130
> [ 0.305627] [] ret_from_fork+0x7c/0xb0
> [ 0.305731] [] ? rest_init+0xd0/0xd0
> [ 0.305836]
> other info that might help us debug this:
> [ 0.305993] 1 lock held by swapper/0/1:
> [ 0.306093] #0: (vector_lock){......}-{2:2}, at: [] __create_irqs+0x10c/0x210
> [ 0.306444]

OK, so whichever way I turn this, we simply cannot allocate memory while holding a raw_spinlock: all the allocator locks, up to and including zone->lock, are regular spinlocks (which turn into sleeping locks on -rt), and we very much want preemptible allocators, so changing that is not an option.

While -rt does appear to turn list_lock into a raw_spinlock, that is at most a band-aid AFAICT, because the list iterations done while holding it aren't bounded in any way.

But even converting list_lock doesn't help, because SLUB (and any of the other slab allocators) will eventually call into alloc_page() and friends, which take zone->lock, and that is very much a regular spinlock, even on -rt.

So we must change this __create_irqs() site to not do this allocation while holding the lock.

/me greps the -rt patches to see if anybody touches that.

Ah, no, -rt simply forbids CPUMASK_OFFSTACK (without it, alloc_cpumask_var() doesn't allocate at all) and side-steps the issue here. Bugger.

Thomas, any clue? __assign_irq_vector() is called rather deep down in the whole IRQ story and appears to be rather stupidly expensive; a sideways reading of it makes it look O(nr_cpus^2), which is surely a complete fail even for !rt kernels.