Date: Tue, 21 Mar 2017 13:44:21 +0900
From: Sergey Senozhatsky
To: Sebastian Ott
Cc: Petr Mladek, Steven Rostedt, Andrew Morton, Thomas Gleixner,
 Peter Zijlstra, Linus Torvalds, Michal Hocko, Heiko Carstens,
 Martin Schwidefsky, linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: Re: lockdep warning: console vs. mem hotplug
Message-ID: <20170321044421.GB448@jagdpanzerIV.localdomain>
User-Agent: Mutt/1.8.0 (2017-02-23)
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

Cc Michal, Linus, Peter, Thomas, Andrew, Martin

// I kept only lockdep splats from the original report and cut off .configs
// full version:
// lkml.kernel.org/r/alpine.LFD.2.20.1703201736070.1753@schleppi

Sebastian, thanks.

On (03/20/17 17:43), Sebastian Ott wrote:
[..]
> Since commit f975237b7682 ("printk: use printk_safe buffers in printk")
> I observe lockdep warnings on s390 when doing memory hotplug:
>
> [  132.379220] ======================================================
> [  132.379220] [ INFO: possible circular locking dependency detected ]
> [  132.379221] 4.11.0-rc3 #330 Not tainted
> [  132.379221] -------------------------------------------------------
> [  132.379222] sh/708 is trying to acquire lock:
> [  132.379222]  (&(&sclp_con_lock)->rlock){-.-...}, at: [<000000000074524a>] sclp_console_write+0x4a/0x2b8
> [  132.379224]
> [  132.379224] but task is already holding lock:
> [  132.379224]  (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [  132.379226]
> [  132.379227] which lock already depends on the new lock.
> [  132.379227]
> [  132.379228] the existing dependency chain (in reverse order) is:
> [  132.379228]
> [  132.379229] -> #2 (&(&zone->lock)->rlock){..-...}:
> [  132.379230]        validate_chain.isra.22+0xb36/0xd18
> [  132.379231]        __lock_acquire+0x650/0x880
> [  132.379231]        lock_acquire+0x24e/0x2b0
> [  132.379232]        _raw_spin_lock_irqsave+0x70/0xb8
> [  132.379232]        get_page_from_freelist+0x548/0xf30
> [  132.379233]        __alloc_pages_nodemask+0x1c8/0x14c0
> [  132.379233]        allocate_slab+0xde/0x668
> [  132.379233]        new_slab+0x94/0xa8
> [  132.379234]        ___slab_alloc.constprop.66+0x556/0x578
> [  132.379234]        __slab_alloc.isra.62.constprop.65+0x74/0xa8
> [  132.379235]        kmem_cache_alloc+0x136/0x4b8
> [  132.379235]        __debug_object_init+0xaa/0x448
> [  132.379236]        debug_object_activate+0x182/0x238
> [  132.379236]        mod_timer+0xe6/0x450
> [  132.379236]        prandom_reseed+0x44/0x50
> [  132.379237]        do_one_initcall+0xa2/0x150
> [  132.379237]        kernel_init_freeable+0x228/0x2d8
> [  132.379238]        kernel_init+0x2a/0x138
> [  132.379238]        kernel_thread_starter+0x6/0xc
> [  132.379239]        kernel_thread_starter+0x0/0xc
> [  132.379239]
> [  132.379239] -> #1 (&(&base->lock)->rlock){-.-.-.}:
> [  132.379241]        validate_chain.isra.22+0xb36/0xd18
> [  132.379242]        __lock_acquire+0x650/0x880
> [  132.379242]        lock_acquire+0x24e/0x2b0
> [  132.379242]        _raw_spin_lock_irqsave+0x70/0xb8
> [  132.379243]        lock_timer_base+0x78/0xa8
> [  132.379243]        mod_timer+0xba/0x450
> [  132.379244]        sclp_console_write+0x262/0x2b8
> [  132.379244]        console_unlock+0x4a6/0x658
> [  132.379245]        register_console+0x31c/0x430
> [  132.379245]        sclp_console_init+0x152/0x170
> [  132.379245]        console_init+0x3a/0x58
> [  132.379246]        start_kernel+0x350/0x480
> [  132.379246]        _stext+0x20/0x80
> [  132.379246]
> [  132.379247] -> #0 (&(&sclp_con_lock)->rlock){-.-...}:
> [  132.379249]        check_prev_add+0x150/0x648
> [  132.379249]        validate_chain.isra.22+0xb36/0xd18
> [  132.379249]        __lock_acquire+0x650/0x880
> [  132.379250]        lock_acquire+0x24e/0x2b0
> [  132.379250]        _raw_spin_lock_irqsave+0x70/0xb8
> [  132.379251]        sclp_console_write+0x4a/0x2b8
> [  132.379251]        console_unlock+0x4a6/0x658
> [  132.379252]        vprintk_emit+0x33e/0x350
> [  132.379252]        vprintk_default+0x44/0x58
> [  132.379252]        printk+0x4e/0x58
> [  132.379253]        __offline_isolated_pages+0x1a0/0x340
> [  132.379253]        offline_isolated_pages_cb+0x2e/0x40
> [  132.379254]        walk_system_ram_range+0x96/0xe8
> [  132.379254]        __offline_pages.constprop.43+0x6da/0x918
> [  132.379255]        memory_subsys_offline+0x6a/0xa0
> [  132.379255]        device_offline+0x84/0xd8
> [  132.379255]        store_mem_state+0xfe/0x120
> [  132.379256]        kernfs_fop_write+0x126/0x1d0
> [  132.379256]        __vfs_write+0x46/0x140
> [  132.379257]        vfs_write+0xb8/0x1a8
> [  132.379257]        SyS_write+0x66/0xc0
> [  132.379257]        system_call+0xc4/0x240
> [  132.379258]
> [  132.379258] other info that might help us debug this:
> [  132.379258]
> [  132.379259] Chain exists of:
> [  132.379259]  &(&sclp_con_lock)->rlock --> &(&base->lock)->rlock --> &(&zone->lock)->rlock
> [  132.379262]
> [  132.379262] Possible unsafe locking scenario:
> [  132.379262]
> [  132.379263]        CPU0                    CPU1
> [  132.379263]        ----                    ----
> [  132.379264]   lock(&(&zone->lock)->rlock);
> [  132.379265]
>                                lock(&(&base->lock)->rlock);
> [  132.379266]                                lock(&(&zone->lock)->rlock);
> [  132.379267]   lock(&(&sclp_con_lock)->rlock);

after a rather quick look, as far as I understand it, the problem here is
that we have the following chains:

a) mod_timer() -> __debug_object_init() -> __alloc_pages_nodemask()

   takes zone->lock under timer base->lock, so this adds a
   "base->lock -> zone->lock" dependency

b) console_unlock() -> sclp_console_write() -> lock_timer_base()

   because sclp_console_write() calls add_timer() under the sclp console
   spin_lock. this one adds a "sclp_con_lock -> base->lock" dependency

c) __offline_isolated_pages() -> printk() -> sclp_console_write()

   calls printk() under zone->lock:

	pr_info("remove from free list %lx %d %lx\n",
		pfn, 1 << order, end_pfn);

   so it's a "zone->lock -> sclp_con_lock -> base->lock" dependency,
   which conflicts with (a).

a1) as an additional complication, (a) can also printk() from
    __debug_object_init(), for example see debug_print_object(), which
    probably can result in

	mod_timer() -> printk() -> mod_timer() -> base->lock

	*** deadlock

    so maybe we can consider switching to printk_safe in
    debug_print_object(). but this is not directly related to the
    lockdep splat above.

so we probably can

1) move pr_info() out of zone->lock in __offline_isolated_pages(). meh...

2) switch to printk_deferred() in __offline_isolated_pages(). meh...
   there might be a bunch of other printks done from under zone->lock.

3) move add_timer() out of the sclp_con_lock console lock in
   sclp_console_write(). well, there can be other consoles that do
   something similar.

4) ... something smart.

need to look more at the second lockdep splat, but it looks more or less
similar.
	-ss

> [  132.379268] *** DEADLOCK ***
> [  132.379269]
> [  132.379269] 10 locks held by sh/708:
> [  132.379269]  #0:  (sb_writers#4){.+.+.+}, at: [<0000000000345d4a>] vfs_write+0xa2/0x1a8
> [  132.379272]  #1:  (&of->mutex){+.+.+.}, at: [<00000000003e5e60>] kernfs_fop_write+0xd8/0x1d0
> [  132.379273]  #2:  (s_active#180){.+.+.+}, at: [<00000000003e5e6c>] kernfs_fop_write+0xe4/0x1d0
> [  132.379275]  #3:  (device_hotplug_lock){+.+...}, at: [<000000000069bb58>] lock_device_hotplug_sysfs+0x30/0x68
> [  132.379277]  #4:  (memory_add_remove_lock){+.+...}, at: [<000000000031cfe6>] mem_hotplug_begin+0x3e/0xd0
> [  132.379279]  #5:  (mem_hotplug.lock){++++.+}, at: [<000000000031cfa8>] mem_hotplug_begin+0x0/0xd0
> [  132.379281]  #6:  (mem_hotplug.lock#2){+.+...}, at: [<000000000031d02c>] mem_hotplug_begin+0x84/0xd0
> [  132.379283]  #7:  (&dev->mutex){......}, at: [<000000000069d55c>] device_offline+0x5c/0xd8
> [  132.379285]  #8:  (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [  132.379287]  #9:  (console_lock){+.+...}, at: [<00000000001bdc02>] vprintk_emit+0x332/0x350
>
> Here's another one using a different console:
>
> [   66.522890] ======================================================
> [   66.522890] [ INFO: possible circular locking dependency detected ]
> [   66.522891] 4.11.0-rc2-00237-g2e287c0 #119 Not tainted
> [   66.522891] -------------------------------------------------------
> [   66.522892] sh/822 is trying to acquire lock:
> [   66.522892]  (&console_sch_key){-.-...}, at: [<00000000007573f0>] raw3215_write+0x58/0x208
> [   66.522894] but task is already holding lock:
> [   66.522894]  (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
> [   66.522897] which lock already depends on the new lock.
> [   66.522898] the existing dependency chain (in reverse order) is:
> [   66.522899] -> #4 (&(&zone->lock)->rlock){..-...}:
> [   66.522901]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522901]        __lock_acquire+0x62c/0x850
> [   66.522902]        lock_acquire+0x254/0x2b8
> [   66.522902]        _raw_spin_lock_irqsave+0x70/0xb8
> [   66.522903]        get_page_from_freelist+0x4e2/0xf28
> [   66.522903]        __alloc_pages_nodemask+0x1fa/0x1528
> [   66.522904]        allocate_slab+0xf2/0x6c0
> [   66.522904]        new_slab+0x94/0xa8
> [   66.522905]        ___slab_alloc.constprop.22+0x570/0x590
> [   66.522905]        __slab_alloc.isra.16.constprop.21+0x74/0xa8
> [   66.522906]        kmem_cache_alloc+0x13c/0x4b0
> [   66.522906]        __debug_object_init+0x5c/0x468
> [   66.522907]        hrtimer_init+0x42/0x1d8
> [   66.522907]        init_dl_task_timer+0x3a/0x58
> [   66.522908]        __sched_fork.isra.5+0x7e/0xe8
> [   66.522908]        init_idle+0x7a/0x278
> [   66.522908]        fork_idle+0xa4/0xb8
> [   66.522909]        idle_threads_init+0x64/0xc8
> [   66.522909]        smp_init+0x2e/0x110
> [   66.522910]        kernel_init_freeable+0x16c/0x2e0
> [   66.522910]        kernel_init+0x2a/0x148
> [   66.522911]        kernel_thread_starter+0x6/0xc
> [   66.522911]        kernel_thread_starter+0x0/0xc
> [   66.522912] -> #3 (&rq->lock){-.-.-.}:
> [   66.522914]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522914]        __lock_acquire+0x62c/0x850
> [   66.522915]        lock_acquire+0x254/0x2b8
> [   66.522915]        _raw_spin_lock+0x60/0xa0
> [   66.522916]        task_fork_fair+0x54/0x118
> [   66.522916]        sched_fork+0x13e/0x2a0
> [   66.522916]        copy_process+0x642/0x1e80
> [   66.522917]        _do_fork+0xc2/0x6d0
> [   66.522917]        kernel_thread+0x4e/0x60
> [   66.522918]        rest_init+0x42/0x170
> [   66.522918]        start_kernel+0x464/0x478
> [   66.522918]        _stext+0x20/0x80
> [   66.522919] -> #2 (&p->pi_lock){-.-.-.}:
> [   66.522921]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522922]        __lock_acquire+0x62c/0x850
> [   66.522922]        lock_acquire+0x254/0x2b8
> [   66.522922]        _raw_spin_lock_irqsave+0x70/0xb8
> [   66.522923]        try_to_wake_up+0x4a/0x5f8
> [   66.522924]        autoremove_wake_function+0x2e/0x88
> [   66.522924]        __wake_up_common+0x76/0xc0
> [   66.522924]        __wake_up+0x54/0x68
> [   66.522925]        ccw_device_verify_done+0xae/0x268
> [   66.522925]        ccw_request_handler+0x422/0x560
> [   66.522926]        do_cio_interrupt+0x224/0x2a0
> [   66.522926]        __handle_irq_event_percpu+0x1a6/0x440
> [   66.522927]        handle_irq_event_percpu+0x38/0x88
> [   66.522927]        handle_percpu_irq+0x84/0xb0
> [   66.522928]        generic_handle_irq+0x42/0x60
> [   66.522928]        do_IRQ+0x86/0xc8
> [   66.522929]        io_int_handler+0x104/0x294
> [   66.522929]        enabled_wait+0x72/0x140
> [   66.522930]        enabled_wait+0x5a/0x140
> [   66.522930]        arch_cpu_idle+0x32/0x50
> [   66.522930]        default_idle_call+0x52/0x68
> [   66.522931]        do_idle+0x10e/0x160
> [   66.522931]        cpu_startup_entry+0x3e/0x48
> [   66.522932]        start_kernel+0x464/0x478
> [   66.522932]        _stext+0x20/0x80
> [   66.522933] -> #1 (&priv->wait_q){-.-...}:
> [   66.522935]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522935]        __lock_acquire+0x62c/0x850
> [   66.522935]        lock_acquire+0x254/0x2b8
> [   66.522936]        _raw_spin_lock_irqsave+0x70/0xb8
> [   66.522936]        __wake_up+0x3a/0x68
> [   66.522937]        ccw_device_recog_done+0x28e/0x2c8
> [   66.522937]        snsid_callback+0x324/0x390
> [   66.522938]        ccw_request_handler+0x480/0x560
> [   66.522938]        do_cio_interrupt+0x224/0x2a0
> [   66.522939]        __handle_irq_event_percpu+0x1a6/0x440
> [   66.522939]        handle_irq_event_percpu+0x38/0x88
> [   66.522940]        handle_percpu_irq+0x84/0xb0
> [   66.522940]        generic_handle_irq+0x42/0x60
> [   66.522940]        do_IRQ+0x86/0xc8
> [   66.522941]        io_int_handler+0x104/0x294
> [   66.522941]        _raw_spin_unlock_irq+0x4e/0x78
> [   66.522942]        _raw_spin_unlock_irq+0x4a/0x78
> [   66.522942]        ccw_device_enable_console+0x9a/0x180
> [   66.522943]        con3215_init+0x110/0x1b8
> [   66.522943]        console_init+0x3a/0x58
> [   66.522944]        start_kernel+0x33a/0x478
> [   66.522944]        _stext+0x20/0x80
> [   66.522945] -> #0 (&console_sch_key){-.-...}:
> [   66.522947]        check_prev_add+0x160/0x6e8
> [   66.522947]        validate_chain.isra.10+0xb7a/0xda8
> [   66.522948]        __lock_acquire+0x62c/0x850
> [   66.522948]        lock_acquire+0x254/0x2b8
> [   66.522948]        _raw_spin_lock_irqsave+0x70/0xb8
> [   66.522949]        raw3215_write+0x58/0x208
> [   66.522949]        con3215_write+0x8e/0xf8
> [   66.522950]        console_unlock+0x49a/0x670
> [   66.522950]        vprintk_emit+0x308/0x378
> [   66.522951]        vprintk_default+0x44/0x58
> [   66.522951]        printk+0x4e/0x60
> [   66.522952]        __offline_isolated_pages+0x168/0x370
> [   66.522952]        offline_isolated_pages_cb+0x2e/0x40
> [   66.522953]        walk_system_ram_range+0x92/0xf0
> [   66.522953]        __offline_pages.constprop.9+0x74a/0x988
> [   66.522954]        memory_subsys_offline+0x6c/0xa0
> [   66.522954]        device_offline+0x84/0xe0
> [   66.522954]        store_mem_state+0xfe/0x120
> [   66.522955]        kernfs_fop_write+0x124/0x1c8
> [   66.522955]        __vfs_write+0x36/0x140
> [   66.522956]        vfs_write+0xb8/0x1a0
> [   66.522956]        SyS_write+0x66/0xc0
> [   66.522957]        system_call+0xc4/0x240
>
> [   66.522957] other info that might help us debug this:
> [   66.522958] Chain exists of:
> [   66.522958]  &console_sch_key --> &rq->lock --> &(&zone->lock)->rlock
> [   66.522961] Possible unsafe locking scenario:
> [   66.522962]        CPU0                    CPU1
> [   66.522962]        ----                    ----
> [   66.522963]   lock(&(&zone->lock)->rlock);
> [   66.522964]                                lock(&rq->lock);
> [   66.522965]                                lock(&(&zone->lock)->rlock);
> [   66.522966]   lock(&console_sch_key);
>
> [   66.522967] *** DEADLOCK ***
>
> [   66.522968] 10 locks held by sh/822:
> [   66.522968]  #0:  (sb_writers#5){.+.+.+}, at: [<000000000034f9c2>] vfs_write+0xa2/0x1a0
> [   66.522970]  #1:  (&of->mutex){+.+.+.}, at: [<00000000003f1afe>] kernfs_fop_write+0xd6/0x1c8
> [   66.522972]  #2:  (s_active#53){.+.+.+}, at: [<00000000003f1b0a>] kernfs_fop_write+0xe2/0x1c8
> [   66.522974]  #3:  (device_hotplug_lock){+.+...}, at: [<00000000006adb70>] lock_device_hotplug_sysfs+0x30/0x70
> [   66.522976]  #4:  (memory_add_remove_lock){+.+...}, at: [<0000000000328e16>] mem_hotplug_begin+0x3e/0xd0
> [   66.522978]  #5:  (mem_hotplug.lock){++++.+}, at: [<0000000000328dd8>] mem_hotplug_begin+0x0/0xd0
> [   66.522980]  #6:  (mem_hotplug.lock#2){+.+...}, at: [<0000000000328e5c>] mem_hotplug_begin+0x84/0xd0
> [   66.522982]  #7:  (&dev->mutex){......}, at: [<00000000006af41c>] device_offline+0x5c/0xe0
> [   66.522984]  #8:  (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
> [   66.522986]  #9:  (console_lock){+.+...}, at: [<00000000001c2c84>] vprintk_emit+0x2fc/0x378