Date: Tue, 21 Mar 2017 13:44:21 +0900
From: Sergey Senozhatsky
To: Sebastian Ott
Cc: Petr Mladek, Steven Rostedt, Andrew Morton, Thomas Gleixner,
 Peter Zijlstra, Linus Torvalds, Michal Hocko, Heiko Carstens,
 Martin Schwidefsky, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: Re: lockdep warning: console vs. mem hotplug
Message-ID: <20170321044421.GB448@jagdpanzerIV.localdomain>
User-Agent: Mutt/1.8.0 (2017-02-23)
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

Cc Michal, Linus, Peter, Thomas, Andrew, Martin

// I kept only lockdep splats from the original report and cut off .configs
// full version:
// lkml.kernel.org/r/alpine.LFD.2.20.1703201736070.1753@schleppi

Sebastian, thanks.

On (03/20/17 17:43), Sebastian Ott wrote:
[..]
> Since commit f975237b7682 ("printk: use printk_safe buffers in printk")
> I observe lockdep warnings on s390 when doing memory hotplug:
>
> [  132.379220] ======================================================
> [  132.379220] [ INFO: possible circular locking dependency detected ]
> [  132.379221] 4.11.0-rc3 #330 Not tainted
> [  132.379221] -------------------------------------------------------
> [  132.379222] sh/708 is trying to acquire lock:
> [  132.379222]  (&(&sclp_con_lock)->rlock){-.-...}, at: [<000000000074524a>] sclp_console_write+0x4a/0x2b8
> [  132.379224]
> [  132.379224] but task is already holding lock:
> [  132.379224]  (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [  132.379226]
> [  132.379227] which lock already depends on the new lock.
> [  132.379227]
> [  132.379228] the existing dependency chain (in reverse order) is:
> [  132.379228]
> [  132.379229] -> #2 (&(&zone->lock)->rlock){..-...}:
> [  132.379230]        validate_chain.isra.22+0xb36/0xd18
> [  132.379231]        __lock_acquire+0x650/0x880
> [  132.379231]        lock_acquire+0x24e/0x2b0
> [  132.379232]        _raw_spin_lock_irqsave+0x70/0xb8
> [  132.379232]        get_page_from_freelist+0x548/0xf30
> [  132.379233]        __alloc_pages_nodemask+0x1c8/0x14c0
> [  132.379233]        allocate_slab+0xde/0x668
> [  132.379233]        new_slab+0x94/0xa8
> [  132.379234]        ___slab_alloc.constprop.66+0x556/0x578
> [  132.379234]        __slab_alloc.isra.62.constprop.65+0x74/0xa8
> [  132.379235]        kmem_cache_alloc+0x136/0x4b8
> [  132.379235]        __debug_object_init+0xaa/0x448
> [  132.379236]        debug_object_activate+0x182/0x238
> [  132.379236]        mod_timer+0xe6/0x450
> [  132.379236]        prandom_reseed+0x44/0x50
> [  132.379237]        do_one_initcall+0xa2/0x150
> [  132.379237]        kernel_init_freeable+0x228/0x2d8
> [  132.379238]        kernel_init+0x2a/0x138
> [  132.379238]        kernel_thread_starter+0x6/0xc
> [  132.379239]        kernel_thread_starter+0x0/0xc
> [  132.379239]
> [  132.379239] -> #1 (&(&base->lock)->rlock){-.-.-.}:
> [  132.379241]        validate_chain.isra.22+0xb36/0xd18
> [  132.379242]        __lock_acquire+0x650/0x880
> [  132.379242]        lock_acquire+0x24e/0x2b0
> [  132.379242]        _raw_spin_lock_irqsave+0x70/0xb8
> [  132.379243]        lock_timer_base+0x78/0xa8
> [  132.379243]        mod_timer+0xba/0x450
> [  132.379244]        sclp_console_write+0x262/0x2b8
> [  132.379244]        console_unlock+0x4a6/0x658
> [  132.379245]        register_console+0x31c/0x430
> [  132.379245]        sclp_console_init+0x152/0x170
> [  132.379245]        console_init+0x3a/0x58
> [  132.379246]        start_kernel+0x350/0x480
> [  132.379246]        _stext+0x20/0x80
> [  132.379246]
> [  132.379247] -> #0 (&(&sclp_con_lock)->rlock){-.-...}:
> [  132.379249]        check_prev_add+0x150/0x648
> [  132.379249]        validate_chain.isra.22+0xb36/0xd18
> [  132.379249]        __lock_acquire+0x650/0x880
> [  132.379250]        lock_acquire+0x24e/0x2b0
> [  132.379250]        _raw_spin_lock_irqsave+0x70/0xb8
> [  132.379251]        sclp_console_write+0x4a/0x2b8
> [  132.379251]        console_unlock+0x4a6/0x658
> [  132.379252]        vprintk_emit+0x33e/0x350
> [  132.379252]        vprintk_default+0x44/0x58
> [  132.379252]        printk+0x4e/0x58
> [  132.379253]        __offline_isolated_pages+0x1a0/0x340
> [  132.379253]        offline_isolated_pages_cb+0x2e/0x40
> [  132.379254]        walk_system_ram_range+0x96/0xe8
> [  132.379254]        __offline_pages.constprop.43+0x6da/0x918
> [  132.379255]        memory_subsys_offline+0x6a/0xa0
> [  132.379255]        device_offline+0x84/0xd8
> [  132.379255]        store_mem_state+0xfe/0x120
> [  132.379256]        kernfs_fop_write+0x126/0x1d0
> [  132.379256]        __vfs_write+0x46/0x140
> [  132.379257]        vfs_write+0xb8/0x1a8
> [  132.379257]        SyS_write+0x66/0xc0
> [  132.379257]        system_call+0xc4/0x240
> [  132.379258]
> [  132.379258] other info that might help us debug this:
> [  132.379258]
> [  132.379259] Chain exists of:
> [  132.379259]  &(&sclp_con_lock)->rlock --> &(&base->lock)->rlock --> &(&zone->lock)->rlock
> [  132.379262]
> [  132.379262] Possible unsafe locking scenario:
> [  132.379262]
> [  132.379263]        CPU0                    CPU1
> [  132.379263]        ----                    ----
> [  132.379264]   lock(&(&zone->lock)->rlock);
> [  132.379265]
>                                lock(&(&base->lock)->rlock);
> [  132.379266]                                lock(&(&zone->lock)->rlock);
> [  132.379267]   lock(&(&sclp_con_lock)->rlock);

after a rather quick look, as far as I understand it, the problem here is
that we have the following chains:

a) mod_timer() -> __debug_object_init() -> __alloc_pages_nodemask()

   takes zone->lock under timer base->lock, so this adds a
   "base->lock -> zone->lock" dependency

b) console_unlock() -> sclp_console_write() -> lock_timer_base()

   because sclp_console_write() calls add_timer() under the sclp console
   spin_lock. this one adds a "sclp_con_lock -> base->lock" dependency

c) __offline_isolated_pages() -> printk() -> sclp_console_write()

   calls printk() under zone->lock:

	pr_info("remove from free list %lx %d %lx\n",
		pfn, 1 << order, end_pfn);

   so it's a "zone->lock -> sclp_con_lock -> base->lock" dependency,
   which conflicts with (a).

a1) as an additional complication, (a) can also printk() from
    __debug_object_init(), for example see debug_print_object(), which
    probably can result in

	mod_timer() -> printk() -> mod_timer() -> base->lock

	*** deadlock

    so maybe we can consider switching to printk_safe in
    debug_print_object(). but this is not directly related to the
    lockdep splat above.

so we probably can

1) move pr_info() out of zone->lock in __offline_isolated_pages(). meh...

2) switch to printk_deferred() in __offline_isolated_pages(). meh...
   there might be a bunch of other printks done from under zone->lock.

3) move add_timer() out of the sclp_con_lock console lock in
   sclp_console_write(). well, there can be other consoles that do
   something similar.

4) ... something smart.

need to look more at the second lockdep splat, but it looks more or less
similar.
	-ss

> [  132.379268] *** DEADLOCK ***
> [  132.379269]
> [  132.379269] 10 locks held by sh/708:
> [  132.379269]  #0:  (sb_writers#4){.+.+.+}, at: [<0000000000345d4a>] vfs_write+0xa2/0x1a8
> [  132.379272]  #1:  (&of->mutex){+.+.+.}, at: [<00000000003e5e60>] kernfs_fop_write+0xd8/0x1d0
> [  132.379273]  #2:  (s_active#180){.+.+.+}, at: [<00000000003e5e6c>] kernfs_fop_write+0xe4/0x1d0
> [  132.379275]  #3:  (device_hotplug_lock){+.+...}, at: [<000000000069bb58>] lock_device_hotplug_sysfs+0x30/0x68
> [  132.379277]  #4:  (memory_add_remove_lock){+.+...}, at: [<000000000031cfe6>] mem_hotplug_begin+0x3e/0xd0
> [  132.379279]  #5:  (mem_hotplug.lock){++++.+}, at: [<000000000031cfa8>] mem_hotplug_begin+0x0/0xd0
> [  132.379281]  #6:  (mem_hotplug.lock#2){+.+...}, at: [<000000000031d02c>] mem_hotplug_begin+0x84/0xd0
> [  132.379283]  #7:  (&dev->mutex){......}, at: [<000000000069d55c>] device_offline+0x5c/0xd8
> [  132.379285]  #8:  (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [  132.379287]  #9:  (console_lock){+.+...}, at: [<00000000001bdc02>] vprintk_emit+0x332/0x350
>
> Here's another one using a different console:
>
> [   66.522890] ======================================================
> [   66.522890] [ INFO: possible circular locking dependency detected ]
> [   66.522891] 4.11.0-rc2-00237-g2e287c0 #119 Not tainted
> [   66.522891] -------------------------------------------------------
> [   66.522892] sh/822 is trying to acquire lock:
> [   66.522892]  (&console_sch_key){-.-...}, at: [<00000000007573f0>] raw3215_write+0x58/0x208
> [   66.522894] but task is already holding lock:
> [   66.522894]  (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
> [   66.522897] which lock already depends on the new lock.
> [   66.522898] the existing dependency chain (in reverse order) is:
> [   66.522899] -> #4 (&(&zone->lock)->rlock){..-...}:
> [   66.522901]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522901]        __lock_acquire+0x62c/0x850
> [   66.522902]        lock_acquire+0x254/0x2b8
> [   66.522902]        _raw_spin_lock_irqsave+0x70/0xb8
> [   66.522903]        get_page_from_freelist+0x4e2/0xf28
> [   66.522903]        __alloc_pages_nodemask+0x1fa/0x1528
> [   66.522904]        allocate_slab+0xf2/0x6c0
> [   66.522904]        new_slab+0x94/0xa8
> [   66.522905]        ___slab_alloc.constprop.22+0x570/0x590
> [   66.522905]        __slab_alloc.isra.16.constprop.21+0x74/0xa8
> [   66.522906]        kmem_cache_alloc+0x13c/0x4b0
> [   66.522906]        __debug_object_init+0x5c/0x468
> [   66.522907]        hrtimer_init+0x42/0x1d8
> [   66.522907]        init_dl_task_timer+0x3a/0x58
> [   66.522908]        __sched_fork.isra.5+0x7e/0xe8
> [   66.522908]        init_idle+0x7a/0x278
> [   66.522908]        fork_idle+0xa4/0xb8
> [   66.522909]        idle_threads_init+0x64/0xc8
> [   66.522909]        smp_init+0x2e/0x110
> [   66.522910]        kernel_init_freeable+0x16c/0x2e0
> [   66.522910]        kernel_init+0x2a/0x148
> [   66.522911]        kernel_thread_starter+0x6/0xc
> [   66.522911]        kernel_thread_starter+0x0/0xc
> [   66.522912] -> #3 (&rq->lock){-.-.-.}:
> [   66.522914]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522914]        __lock_acquire+0x62c/0x850
> [   66.522915]        lock_acquire+0x254/0x2b8
> [   66.522915]        _raw_spin_lock+0x60/0xa0
> [   66.522916]        task_fork_fair+0x54/0x118
> [   66.522916]        sched_fork+0x13e/0x2a0
> [   66.522916]        copy_process+0x642/0x1e80
> [   66.522917]        _do_fork+0xc2/0x6d0
> [   66.522917]        kernel_thread+0x4e/0x60
> [   66.522918]        rest_init+0x42/0x170
> [   66.522918]        start_kernel+0x464/0x478
> [   66.522918]        _stext+0x20/0x80
> [   66.522919] -> #2 (&p->pi_lock){-.-.-.}:
> [   66.522921]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522922]        __lock_acquire+0x62c/0x850
> [   66.522922]        lock_acquire+0x254/0x2b8
> [   66.522922]        _raw_spin_lock_irqsave+0x70/0xb8
> [   66.522923]        try_to_wake_up+0x4a/0x5f8
> [   66.522924]        autoremove_wake_function+0x2e/0x88
> [   66.522924]        __wake_up_common+0x76/0xc0
> [   66.522924]        __wake_up+0x54/0x68
> [   66.522925]        ccw_device_verify_done+0xae/0x268
> [   66.522925]        ccw_request_handler+0x422/0x560
> [   66.522926]        do_cio_interrupt+0x224/0x2a0
> [   66.522926]        __handle_irq_event_percpu+0x1a6/0x440
> [   66.522927]        handle_irq_event_percpu+0x38/0x88
> [   66.522927]        handle_percpu_irq+0x84/0xb0
> [   66.522928]        generic_handle_irq+0x42/0x60
> [   66.522928]        do_IRQ+0x86/0xc8
> [   66.522929]        io_int_handler+0x104/0x294
> [   66.522929]        enabled_wait+0x72/0x140
> [   66.522930]        enabled_wait+0x5a/0x140
> [   66.522930]        arch_cpu_idle+0x32/0x50
> [   66.522930]        default_idle_call+0x52/0x68
> [   66.522931]        do_idle+0x10e/0x160
> [   66.522931]        cpu_startup_entry+0x3e/0x48
> [   66.522932]        start_kernel+0x464/0x478
> [   66.522932]        _stext+0x20/0x80
> [   66.522933] -> #1 (&priv->wait_q){-.-...}:
> [   66.522935]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522935]        __lock_acquire+0x62c/0x850
> [   66.522935]        lock_acquire+0x254/0x2b8
> [   66.522936]        _raw_spin_lock_irqsave+0x70/0xb8
> [   66.522936]        __wake_up+0x3a/0x68
> [   66.522937]        ccw_device_recog_done+0x28e/0x2c8
> [   66.522937]        snsid_callback+0x324/0x390
> [   66.522938]        ccw_request_handler+0x480/0x560
> [   66.522938]        do_cio_interrupt+0x224/0x2a0
> [   66.522939]        __handle_irq_event_percpu+0x1a6/0x440
> [   66.522939]        handle_irq_event_percpu+0x38/0x88
> [   66.522940]        handle_percpu_irq+0x84/0xb0
> [   66.522940]        generic_handle_irq+0x42/0x60
> [   66.522940]        do_IRQ+0x86/0xc8
> [   66.522941]        io_int_handler+0x104/0x294
> [   66.522941]        _raw_spin_unlock_irq+0x4e/0x78
> [   66.522942]        _raw_spin_unlock_irq+0x4a/0x78
> [   66.522942]        ccw_device_enable_console+0x9a/0x180
> [   66.522943]        con3215_init+0x110/0x1b8
> [   66.522943]        console_init+0x3a/0x58
> [   66.522944]        start_kernel+0x33a/0x478
> [   66.522944]        _stext+0x20/0x80
> [   66.522945] -> #0 (&console_sch_key){-.-...}:
> [   66.522947]        check_prev_add+0x160/0x6e8
> [   66.522947]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522948]        __lock_acquire+0x62c/0x850
> [   66.522948]        lock_acquire+0x254/0x2b8
> [   66.522948]        _raw_spin_lock_irqsave+0x70/0xb8
> [   66.522949]        raw3215_write+0x58/0x208
> [   66.522949]        con3215_write+0x8e/0xf8
> [   66.522950]        console_unlock+0x49a/0x670
> [   66.522950]        vprintk_emit+0x308/0x378
> [   66.522951]        vprintk_default+0x44/0x58
> [   66.522951]        printk+0x4e/0x60
> [   66.522952]        __offline_isolated_pages+0x168/0x370
> [   66.522952]        offline_isolated_pages_cb+0x2e/0x40
> [   66.522953]        walk_system_ram_range+0x92/0xf0
> [   66.522953]        __offline_pages.constprop.9+0x74a/0x988
> [   66.522954]        memory_subsys_offline+0x6c/0xa0
> [   66.522954]        device_offline+0x84/0xe0
> [   66.522954]        store_mem_state+0xfe/0x120
> [   66.522955]        kernfs_fop_write+0x124/0x1c8
> [   66.522955]        __vfs_write+0x36/0x140
> [   66.522956]        vfs_write+0xb8/0x1a0
> [   66.522956]        SyS_write+0x66/0xc0
> [   66.522957]        system_call+0xc4/0x240
>
> [   66.522957] other info that might help us debug this:
> [   66.522958] Chain exists of:
> [   66.522958]  &console_sch_key --> &rq->lock --> &(&zone->lock)->rlock
> [   66.522961] Possible unsafe locking scenario:
> [   66.522962]        CPU0                    CPU1
> [   66.522962]        ----                    ----
> [   66.522963]   lock(&(&zone->lock)->rlock);
> [   66.522964]                                lock(&rq->lock);
> [   66.522965]                                lock(&(&zone->lock)->rlock);
> [   66.522966]   lock(&console_sch_key);
>
> [   66.522967] *** DEADLOCK ***
>
> [   66.522968] 10 locks held by sh/822:
> [   66.522968]  #0:  (sb_writers#5){.+.+.+}, at: [<000000000034f9c2>] vfs_write+0xa2/0x1a0
> [   66.522970]  #1:  (&of->mutex){+.+.+.}, at: [<00000000003f1afe>] kernfs_fop_write+0xd6/0x1c8
> [   66.522972]  #2:  (s_active#53){.+.+.+}, at: [<00000000003f1b0a>] kernfs_fop_write+0xe2/0x1c8
> [   66.522974]  #3:  (device_hotplug_lock){+.+...}, at: [<00000000006adb70>] lock_device_hotplug_sysfs+0x30/0x70
> [   66.522976]  #4:  (memory_add_remove_lock){+.+...}, at: [<0000000000328e16>] mem_hotplug_begin+0x3e/0xd0
> [   66.522978]  #5:  (mem_hotplug.lock){++++.+}, at: [<0000000000328dd8>] mem_hotplug_begin+0x0/0xd0
> [   66.522980]  #6:  (mem_hotplug.lock#2){+.+...}, at: [<0000000000328e5c>] mem_hotplug_begin+0x84/0xd0
> [   66.522982]  #7:  (&dev->mutex){......}, at: [<00000000006af41c>] device_offline+0x5c/0xe0
> [   66.522984]  #8:  (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
> [   66.522986]  #9:  (console_lock){+.+...}, at: [<00000000001c2c84>] vprintk_emit+0x2fc/0x378