2017-03-20 17:26:36

by Sebastian Ott

[permalink] [raw]
Subject: lockdep warning: console vs. mem hotplug

Since commit f975237b7682 ("printk: use printk_safe buffers in printk")
I observe lockdep warnings on s390 when doing memory hotplug:

[ 132.379220] ======================================================
[ 132.379220] [ INFO: possible circular locking dependency detected ]
[ 132.379221] 4.11.0-rc3 #330 Not tainted
[ 132.379221] -------------------------------------------------------
[ 132.379222] sh/708 is trying to acquire lock:
[ 132.379222] (&(&sclp_con_lock)->rlock){-.-...}, at: [<000000000074524a>] sclp_console_write+0x4a/0x2b8
[ 132.379224]
[ 132.379224] but task is already holding lock:
[ 132.379224] (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
[ 132.379226]
[ 132.379227] which lock already depends on the new lock.
[ 132.379227]
[ 132.379227]
[ 132.379228] the existing dependency chain (in reverse order) is:
[ 132.379228]
[ 132.379229] -> #2 (&(&zone->lock)->rlock){..-...}:
[ 132.379230] validate_chain.isra.22+0xb36/0xd18
[ 132.379231] __lock_acquire+0x650/0x880
[ 132.379231] lock_acquire+0x24e/0x2b0
[ 132.379232] _raw_spin_lock_irqsave+0x70/0xb8
[ 132.379232] get_page_from_freelist+0x548/0xf30
[ 132.379233] __alloc_pages_nodemask+0x1c8/0x14c0
[ 132.379233] allocate_slab+0xde/0x668
[ 132.379233] new_slab+0x94/0xa8
[ 132.379234] ___slab_alloc.constprop.66+0x556/0x578
[ 132.379234] __slab_alloc.isra.62.constprop.65+0x74/0xa8
[ 132.379235] kmem_cache_alloc+0x136/0x4b8
[ 132.379235] __debug_object_init+0xaa/0x448
[ 132.379236] debug_object_activate+0x182/0x238
[ 132.379236] mod_timer+0xe6/0x450
[ 132.379236] prandom_reseed+0x44/0x50
[ 132.379237] do_one_initcall+0xa2/0x150
[ 132.379237] kernel_init_freeable+0x228/0x2d8
[ 132.379238] kernel_init+0x2a/0x138
[ 132.379238] kernel_thread_starter+0x6/0xc
[ 132.379239] kernel_thread_starter+0x0/0xc
[ 132.379239]
[ 132.379239] -> #1 (&(&base->lock)->rlock){-.-.-.}:
[ 132.379241] validate_chain.isra.22+0xb36/0xd18
[ 132.379242] __lock_acquire+0x650/0x880
[ 132.379242] lock_acquire+0x24e/0x2b0
[ 132.379242] _raw_spin_lock_irqsave+0x70/0xb8
[ 132.379243] lock_timer_base+0x78/0xa8
[ 132.379243] mod_timer+0xba/0x450
[ 132.379244] sclp_console_write+0x262/0x2b8
[ 132.379244] console_unlock+0x4a6/0x658
[ 132.379245] register_console+0x31c/0x430
[ 132.379245] sclp_console_init+0x152/0x170
[ 132.379245] console_init+0x3a/0x58
[ 132.379246] start_kernel+0x350/0x480
[ 132.379246] _stext+0x20/0x80
[ 132.379246]
[ 132.379247] -> #0 (&(&sclp_con_lock)->rlock){-.-...}:
[ 132.379249] check_prev_add+0x150/0x648
[ 132.379249] validate_chain.isra.22+0xb36/0xd18
[ 132.379249] __lock_acquire+0x650/0x880
[ 132.379250] lock_acquire+0x24e/0x2b0
[ 132.379250] _raw_spin_lock_irqsave+0x70/0xb8
[ 132.379251] sclp_console_write+0x4a/0x2b8
[ 132.379251] console_unlock+0x4a6/0x658
[ 132.379252] vprintk_emit+0x33e/0x350
[ 132.379252] vprintk_default+0x44/0x58
[ 132.379252] printk+0x4e/0x58
[ 132.379253] __offline_isolated_pages+0x1a0/0x340
[ 132.379253] offline_isolated_pages_cb+0x2e/0x40
[ 132.379254] walk_system_ram_range+0x96/0xe8
[ 132.379254] __offline_pages.constprop.43+0x6da/0x918
[ 132.379255] memory_subsys_offline+0x6a/0xa0
[ 132.379255] device_offline+0x84/0xd8
[ 132.379255] store_mem_state+0xfe/0x120
[ 132.379256] kernfs_fop_write+0x126/0x1d0
[ 132.379256] __vfs_write+0x46/0x140
[ 132.379257] vfs_write+0xb8/0x1a8
[ 132.379257] SyS_write+0x66/0xc0
[ 132.379257] system_call+0xc4/0x240
[ 132.379258]
[ 132.379258] other info that might help us debug this:
[ 132.379258]
[ 132.379259] Chain exists of:
[ 132.379259] &(&sclp_con_lock)->rlock --> &(&base->lock)->rlock --> &(&zone->lock)->rlock
[ 132.379262]
[ 132.379262] Possible unsafe locking scenario:
[ 132.379262]
[ 132.379263] CPU0 CPU1
[ 132.379263] ---- ----
[ 132.379264] lock(&(&zone->lock)->rlock);
[ 132.379265] lock(&(&base->lock)->rlock);
[ 132.379266] lock(&(&zone->lock)->rlock);
[ 132.379267] lock(&(&sclp_con_lock)->rlock);
[ 132.379268]
[ 132.379268] *** DEADLOCK ***
[ 132.379269]
[ 132.379269] 10 locks held by sh/708:
[ 132.379269] #0: (sb_writers#4){.+.+.+}, at: [<0000000000345d4a>] vfs_write+0xa2/0x1a8
[ 132.379272] #1: (&of->mutex){+.+.+.}, at: [<00000000003e5e60>] kernfs_fop_write+0xd8/0x1d0
[ 132.379273] #2: (s_active#180){.+.+.+}, at: [<00000000003e5e6c>] kernfs_fop_write+0xe4/0x1d0
[ 132.379275] #3: (device_hotplug_lock){+.+...}, at: [<000000000069bb58>] lock_device_hotplug_sysfs+0x30/0x68
[ 132.379277] #4: (memory_add_remove_lock){+.+...}, at: [<000000000031cfe6>] mem_hotplug_begin+0x3e/0xd0
[ 132.379279] #5: (mem_hotplug.lock){++++.+}, at: [<000000000031cfa8>] mem_hotplug_begin+0x0/0xd0
[ 132.379281] #6: (mem_hotplug.lock#2){+.+...}, at: [<000000000031d02c>] mem_hotplug_begin+0x84/0xd0
[ 132.379283] #7: (&dev->mutex){......}, at: [<000000000069d55c>] device_offline+0x5c/0xd8
[ 132.379285] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
[ 132.379287] #9: (console_lock){+.+...}, at: [<00000000001bdc02>] vprintk_emit+0x332/0x350

Here's another one using a different console:

[ 66.522890] ======================================================
[ 66.522890] [ INFO: possible circular locking dependency detected ]
[ 66.522891] 4.11.0-rc2-00237-g2e287c0 #119 Not tainted
[ 66.522891] -------------------------------------------------------
[ 66.522892] sh/822 is trying to acquire lock:
[ 66.522892] (&console_sch_key){-.-...}, at: [<00000000007573f0>] raw3215_write+0x58/0x208
[ 66.522894] but task is already holding lock:
[ 66.522894] (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
[ 66.522897] which lock already depends on the new lock.
[ 66.522898] the existing dependency chain (in reverse order) is:
[ 66.522899] -> #4 (&(&zone->lock)->rlock){..-...}:
[ 66.522901] validate_chain.isra.10+0xb7a/0xda8
[ 66.522901] __lock_acquire+0x62c/0x850
[ 66.522902] lock_acquire+0x254/0x2b8
[ 66.522902] _raw_spin_lock_irqsave+0x70/0xb8
[ 66.522903] get_page_from_freelist+0x4e2/0xf28
[ 66.522903] __alloc_pages_nodemask+0x1fa/0x1528
[ 66.522904] allocate_slab+0xf2/0x6c0
[ 66.522904] new_slab+0x94/0xa8
[ 66.522905] ___slab_alloc.constprop.22+0x570/0x590
[ 66.522905] __slab_alloc.isra.16.constprop.21+0x74/0xa8
[ 66.522906] kmem_cache_alloc+0x13c/0x4b0
[ 66.522906] __debug_object_init+0x5c/0x468
[ 66.522907] hrtimer_init+0x42/0x1d8
[ 66.522907] init_dl_task_timer+0x3a/0x58
[ 66.522908] __sched_fork.isra.5+0x7e/0xe8
[ 66.522908] init_idle+0x7a/0x278
[ 66.522908] fork_idle+0xa4/0xb8
[ 66.522909] idle_threads_init+0x64/0xc8
[ 66.522909] smp_init+0x2e/0x110
[ 66.522910] kernel_init_freeable+0x16c/0x2e0
[ 66.522910] kernel_init+0x2a/0x148
[ 66.522911] kernel_thread_starter+0x6/0xc
[ 66.522911] kernel_thread_starter+0x0/0xc
[ 66.522912] -> #3 (&rq->lock){-.-.-.}:
[ 66.522914] validate_chain.isra.10+0xb7a/0xda8
[ 66.522914] __lock_acquire+0x62c/0x850
[ 66.522915] lock_acquire+0x254/0x2b8
[ 66.522915] _raw_spin_lock+0x60/0xa0
[ 66.522916] task_fork_fair+0x54/0x118
[ 66.522916] sched_fork+0x13e/0x2a0
[ 66.522916] copy_process+0x642/0x1e80
[ 66.522917] _do_fork+0xc2/0x6d0
[ 66.522917] kernel_thread+0x4e/0x60
[ 66.522918] rest_init+0x42/0x170
[ 66.522918] start_kernel+0x464/0x478
[ 66.522918] _stext+0x20/0x80
[ 66.522919] -> #2 (&p->pi_lock){-.-.-.}:
[ 66.522921] validate_chain.isra.10+0xb7a/0xda8
[ 66.522922] __lock_acquire+0x62c/0x850
[ 66.522922] lock_acquire+0x254/0x2b8
[ 66.522922] _raw_spin_lock_irqsave+0x70/0xb8
[ 66.522923] try_to_wake_up+0x4a/0x5f8
[ 66.522924] autoremove_wake_function+0x2e/0x88
[ 66.522924] __wake_up_common+0x76/0xc0
[ 66.522924] __wake_up+0x54/0x68
[ 66.522925] ccw_device_verify_done+0xae/0x268
[ 66.522925] ccw_request_handler+0x422/0x560
[ 66.522926] do_cio_interrupt+0x224/0x2a0
[ 66.522926] __handle_irq_event_percpu+0x1a6/0x440
[ 66.522927] handle_irq_event_percpu+0x38/0x88
[ 66.522927] handle_percpu_irq+0x84/0xb0
[ 66.522928] generic_handle_irq+0x42/0x60
[ 66.522928] do_IRQ+0x86/0xc8
[ 66.522929] io_int_handler+0x104/0x294
[ 66.522929] enabled_wait+0x72/0x140
[ 66.522930] enabled_wait+0x5a/0x140
[ 66.522930] arch_cpu_idle+0x32/0x50
[ 66.522930] default_idle_call+0x52/0x68
[ 66.522931] do_idle+0x10e/0x160
[ 66.522931] cpu_startup_entry+0x3e/0x48
[ 66.522932] start_kernel+0x464/0x478
[ 66.522932] _stext+0x20/0x80
[ 66.522933] -> #1 (&priv->wait_q){-.-...}:
[ 66.522935] validate_chain.isra.10+0xb7a/0xda8
[ 66.522935] __lock_acquire+0x62c/0x850
[ 66.522935] lock_acquire+0x254/0x2b8
[ 66.522936] _raw_spin_lock_irqsave+0x70/0xb8
[ 66.522936] __wake_up+0x3a/0x68
[ 66.522937] ccw_device_recog_done+0x28e/0x2c8
[ 66.522937] snsid_callback+0x324/0x390
[ 66.522938] ccw_request_handler+0x480/0x560
[ 66.522938] do_cio_interrupt+0x224/0x2a0
[ 66.522939] __handle_irq_event_percpu+0x1a6/0x440
[ 66.522939] handle_irq_event_percpu+0x38/0x88
[ 66.522940] handle_percpu_irq+0x84/0xb0
[ 66.522940] generic_handle_irq+0x42/0x60
[ 66.522940] do_IRQ+0x86/0xc8
[ 66.522941] io_int_handler+0x104/0x294
[ 66.522941] _raw_spin_unlock_irq+0x4e/0x78
[ 66.522942] _raw_spin_unlock_irq+0x4a/0x78
[ 66.522942] ccw_device_enable_console+0x9a/0x180
[ 66.522943] con3215_init+0x110/0x1b8
[ 66.522943] console_init+0x3a/0x58
[ 66.522944] start_kernel+0x33a/0x478
[ 66.522944] _stext+0x20/0x80
[ 66.522945] -> #0 (&console_sch_key){-.-...}:
[ 66.522947] check_prev_add+0x160/0x6e8
[ 66.522947] validate_chain.isra.10+0xb7a/0xda8
[ 66.522948] __lock_acquire+0x62c/0x850
[ 66.522948] lock_acquire+0x254/0x2b8
[ 66.522948] _raw_spin_lock_irqsave+0x70/0xb8
[ 66.522949] raw3215_write+0x58/0x208
[ 66.522949] con3215_write+0x8e/0xf8
[ 66.522950] console_unlock+0x49a/0x670
[ 66.522950] vprintk_emit+0x308/0x378
[ 66.522951] vprintk_default+0x44/0x58
[ 66.522951] printk+0x4e/0x60
[ 66.522952] __offline_isolated_pages+0x168/0x370
[ 66.522952] offline_isolated_pages_cb+0x2e/0x40
[ 66.522953] walk_system_ram_range+0x92/0xf0
[ 66.522953] __offline_pages.constprop.9+0x74a/0x988
[ 66.522954] memory_subsys_offline+0x6c/0xa0
[ 66.522954] device_offline+0x84/0xe0
[ 66.522954] store_mem_state+0xfe/0x120
[ 66.522955] kernfs_fop_write+0x124/0x1c8
[ 66.522955] __vfs_write+0x36/0x140
[ 66.522956] vfs_write+0xb8/0x1a0
[ 66.522956] SyS_write+0x66/0xc0
[ 66.522957] system_call+0xc4/0x240

[ 66.522957] other info that might help us debug this:
[ 66.522958] Chain exists of:
[ 66.522958] &console_sch_key --> &rq->lock --> &(&zone->lock)->rlock
[ 66.522961] Possible unsafe locking scenario:
[ 66.522962] CPU0 CPU1
[ 66.522962] ---- ----
[ 66.522963] lock(&(&zone->lock)->rlock);
[ 66.522964] lock(&rq->lock);
[ 66.522965] lock(&(&zone->lock)->rlock);
[ 66.522966] lock(&console_sch_key);

[ 66.522967] *** DEADLOCK ***

[ 66.522968] 10 locks held by sh/822:
[ 66.522968] #0: (sb_writers#5){.+.+.+}, at: [<000000000034f9c2>] vfs_write+0xa2/0x1a0
[ 66.522970] #1: (&of->mutex){+.+.+.}, at: [<00000000003f1afe>] kernfs_fop_write+0xd6/0x1c8
[ 66.522972] #2: (s_active#53){.+.+.+}, at: [<00000000003f1b0a>] kernfs_fop_write+0xe2/0x1c8
[ 66.522974] #3: (device_hotplug_lock){+.+...}, at: [<00000000006adb70>] lock_device_hotplug_sysfs+0x30/0x70
[ 66.522976] #4: (memory_add_remove_lock){+.+...}, at: [<0000000000328e16>] mem_hotplug_begin+0x3e/0xd0
[ 66.522978] #5: (mem_hotplug.lock){++++.+}, at: [<0000000000328dd8>] mem_hotplug_begin+0x0/0xd0
[ 66.522980] #6: (mem_hotplug.lock#2){+.+...}, at: [<0000000000328e5c>] mem_hotplug_begin+0x84/0xd0
[ 66.522982] #7: (&dev->mutex){......}, at: [<00000000006af41c>] device_offline+0x5c/0xe0
[ 66.522984] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
[ 66.522986] #9: (console_lock){+.+...}, at: [<00000000001c2c84>] vprintk_emit+0x2fc/0x378

Config attached.

Regards,
Sebastian


Attachments:
config (53.62 kB)

2017-03-21 04:44:28

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

Cc Michal, Linus, Peter, Thomas, Andrew, Martin


// I kept only lockdep splats from the original report and cut off .configs
// full version:
// lkml.kernel.org/r/alpine.LFD.2.20.1703201736070.1753@schleppi

Sebastian, thanks.


On (03/20/17 17:43), Sebastian Ott wrote:
[..]
> Since commit f975237b7682 ("printk: use printk_safe buffers in printk")
> I observe lockdep warnings on s390 when doing memory hotplug:
>
> [ 132.379220] ======================================================
> [ 132.379220] [ INFO: possible circular locking dependency detected ]
> [ 132.379221] 4.11.0-rc3 #330 Not tainted
> [ 132.379221] -------------------------------------------------------
> [ 132.379222] sh/708 is trying to acquire lock:
> [ 132.379222] (&(&sclp_con_lock)->rlock){-.-...}, at: [<000000000074524a>] sclp_console_write+0x4a/0x2b8
> [ 132.379224]
> [ 132.379224] but task is already holding lock:
> [ 132.379224] (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [ 132.379226]
> [ 132.379227] which lock already depends on the new lock.
> [ 132.379227]
> [ 132.379227]
> [ 132.379228] the existing dependency chain (in reverse order) is:
> [ 132.379228]
> [ 132.379229] -> #2 (&(&zone->lock)->rlock){..-...}:
> [ 132.379230] validate_chain.isra.22+0xb36/0xd18
> [ 132.379231] __lock_acquire+0x650/0x880
> [ 132.379231] lock_acquire+0x24e/0x2b0
> [ 132.379232] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379232] get_page_from_freelist+0x548/0xf30
> [ 132.379233] __alloc_pages_nodemask+0x1c8/0x14c0
> [ 132.379233] allocate_slab+0xde/0x668
> [ 132.379233] new_slab+0x94/0xa8
> [ 132.379234] ___slab_alloc.constprop.66+0x556/0x578
> [ 132.379234] __slab_alloc.isra.62.constprop.65+0x74/0xa8
> [ 132.379235] kmem_cache_alloc+0x136/0x4b8
> [ 132.379235] __debug_object_init+0xaa/0x448
> [ 132.379236] debug_object_activate+0x182/0x238
> [ 132.379236] mod_timer+0xe6/0x450
> [ 132.379236] prandom_reseed+0x44/0x50
> [ 132.379237] do_one_initcall+0xa2/0x150
> [ 132.379237] kernel_init_freeable+0x228/0x2d8
> [ 132.379238] kernel_init+0x2a/0x138
> [ 132.379238] kernel_thread_starter+0x6/0xc
> [ 132.379239] kernel_thread_starter+0x0/0xc
> [ 132.379239]
> [ 132.379239] -> #1 (&(&base->lock)->rlock){-.-.-.}:
> [ 132.379241] validate_chain.isra.22+0xb36/0xd18
> [ 132.379242] __lock_acquire+0x650/0x880
> [ 132.379242] lock_acquire+0x24e/0x2b0
> [ 132.379242] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379243] lock_timer_base+0x78/0xa8
> [ 132.379243] mod_timer+0xba/0x450
> [ 132.379244] sclp_console_write+0x262/0x2b8
> [ 132.379244] console_unlock+0x4a6/0x658
> [ 132.379245] register_console+0x31c/0x430
> [ 132.379245] sclp_console_init+0x152/0x170
> [ 132.379245] console_init+0x3a/0x58
> [ 132.379246] start_kernel+0x350/0x480
> [ 132.379246] _stext+0x20/0x80
> [ 132.379246]
> [ 132.379247] -> #0 (&(&sclp_con_lock)->rlock){-.-...}:
> [ 132.379249] check_prev_add+0x150/0x648
> [ 132.379249] validate_chain.isra.22+0xb36/0xd18
> [ 132.379249] __lock_acquire+0x650/0x880
> [ 132.379250] lock_acquire+0x24e/0x2b0
> [ 132.379250] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379251] sclp_console_write+0x4a/0x2b8
> [ 132.379251] console_unlock+0x4a6/0x658
> [ 132.379252] vprintk_emit+0x33e/0x350
> [ 132.379252] vprintk_default+0x44/0x58
> [ 132.379252] printk+0x4e/0x58
> [ 132.379253] __offline_isolated_pages+0x1a0/0x340
> [ 132.379253] offline_isolated_pages_cb+0x2e/0x40
> [ 132.379254] walk_system_ram_range+0x96/0xe8
> [ 132.379254] __offline_pages.constprop.43+0x6da/0x918
> [ 132.379255] memory_subsys_offline+0x6a/0xa0
> [ 132.379255] device_offline+0x84/0xd8
> [ 132.379255] store_mem_state+0xfe/0x120
> [ 132.379256] kernfs_fop_write+0x126/0x1d0
> [ 132.379256] __vfs_write+0x46/0x140
> [ 132.379257] vfs_write+0xb8/0x1a8
> [ 132.379257] SyS_write+0x66/0xc0
> [ 132.379257] system_call+0xc4/0x240
> [ 132.379258]
> [ 132.379258] other info that might help us debug this:
> [ 132.379258]
> [ 132.379259] Chain exists of:
> [ 132.379259] &(&sclp_con_lock)->rlock --> &(&base->lock)->rlock --> &(&zone->lock)->rlock
> [ 132.379262]
> [ 132.379262] Possible unsafe locking scenario:
> [ 132.379262]
> [ 132.379263] CPU0 CPU1
> [ 132.379263] ---- ----
> [ 132.379264] lock(&(&zone->lock)->rlock);
> [ 132.379265] lock(&(&base->lock)->rlock);
> [ 132.379266] lock(&(&zone->lock)->rlock);
> [ 132.379267] lock(&(&sclp_con_lock)->rlock);


after a rather quick look.


as far as I understand it, the problem here is that we have the
following chains

a) mod_timer() -> __debug_object_init() -> __alloc_pages_nodemask()

takes zone->lock under timer base->lock

so this adds a "base->lock -> zone->lock" dependency


b) console_unlock() -> sclp_console_write() -> lock_timer_base()

because sclp_console_write() calls add_timer() under sclp
console spin_lock.

this one adds a "sclp_con_lock -> base->lock" dependency


c) __offline_isolated_pages() -> printk() -> sclp_console_write()

calls printk under zone->lock. pr_info("remove from free list %lx %d %lx\n", pfn, 1 << order, end_pfn);

so it's a "zone->lock -> sclp_con_lock -> base->lock" dependency.
which conflicts with (a).



a1) as an additional complication, (a) can also printk() from __debug_object_init(),
for example see debug_print_object(). which probably can result in

mod_timer() -> printk() -> mod_timer() -> base->lock *** deadlock

so may be we can consider switching to printk_safe in debug_print_object().
but this is not directly related to the lockdep splat above.



so we probably can


1) move pr_info() out of zone->lock in __offline_isolated_pages().
meh...


2) switch to printk_deferred() in __offline_isolated_pages().
meh.. there might a bunch of other printks done from under zone->lock.


3) move add_timer() out of sclp_con_lock console in sclp_console_write().
well, there can be other consoles that do something similar.


4) ... something smart.


need to look more at the second lockdep splat, but looks more or less
similar.

-ss

> [ 132.379268] *** DEADLOCK ***
> [ 132.379269]
> [ 132.379269] 10 locks held by sh/708:
> [ 132.379269] #0: (sb_writers#4){.+.+.+}, at: [<0000000000345d4a>] vfs_write+0xa2/0x1a8
> [ 132.379272] #1: (&of->mutex){+.+.+.}, at: [<00000000003e5e60>] kernfs_fop_write+0xd8/0x1d0
> [ 132.379273] #2: (s_active#180){.+.+.+}, at: [<00000000003e5e6c>] kernfs_fop_write+0xe4/0x1d0
> [ 132.379275] #3: (device_hotplug_lock){+.+...}, at: [<000000000069bb58>] lock_device_hotplug_sysfs+0x30/0x68
> [ 132.379277] #4: (memory_add_remove_lock){+.+...}, at: [<000000000031cfe6>] mem_hotplug_begin+0x3e/0xd0
> [ 132.379279] #5: (mem_hotplug.lock){++++.+}, at: [<000000000031cfa8>] mem_hotplug_begin+0x0/0xd0
> [ 132.379281] #6: (mem_hotplug.lock#2){+.+...}, at: [<000000000031d02c>] mem_hotplug_begin+0x84/0xd0
> [ 132.379283] #7: (&dev->mutex){......}, at: [<000000000069d55c>] device_offline+0x5c/0xd8
> [ 132.379285] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [ 132.379287] #9: (console_lock){+.+...}, at: [<00000000001bdc02>] vprintk_emit+0x332/0x350
>
> Here's another one using a different console:
>
> [ 66.522890] ======================================================
> [ 66.522890] [ INFO: possible circular locking dependency detected ]
> [ 66.522891] 4.11.0-rc2-00237-g2e287c0 #119 Not tainted
> [ 66.522891] -------------------------------------------------------
> [ 66.522892] sh/822 is trying to acquire lock:
> [ 66.522892] (&console_sch_key){-.-...}, at: [<00000000007573f0>] raw3215_write+0x58/0x208
> [ 66.522894] but task is already holding lock:
> [ 66.522894] (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
> [ 66.522897] which lock already depends on the new lock.
> [ 66.522898] the existing dependency chain (in reverse order) is:
> [ 66.522899] -> #4 (&(&zone->lock)->rlock){..-...}:
> [ 66.522901] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522901] __lock_acquire+0x62c/0x850
> [ 66.522902] lock_acquire+0x254/0x2b8
> [ 66.522902] _raw_spin_lock_irqsave+0x70/0xb8
> [ 66.522903] get_page_from_freelist+0x4e2/0xf28
> [ 66.522903] __alloc_pages_nodemask+0x1fa/0x1528
> [ 66.522904] allocate_slab+0xf2/0x6c0
> [ 66.522904] new_slab+0x94/0xa8
> [ 66.522905] ___slab_alloc.constprop.22+0x570/0x590
> [ 66.522905] __slab_alloc.isra.16.constprop.21+0x74/0xa8
> [ 66.522906] kmem_cache_alloc+0x13c/0x4b0
> [ 66.522906] __debug_object_init+0x5c/0x468
> [ 66.522907] hrtimer_init+0x42/0x1d8
> [ 66.522907] init_dl_task_timer+0x3a/0x58
> [ 66.522908] __sched_fork.isra.5+0x7e/0xe8
> [ 66.522908] init_idle+0x7a/0x278
> [ 66.522908] fork_idle+0xa4/0xb8
> [ 66.522909] idle_threads_init+0x64/0xc8
> [ 66.522909] smp_init+0x2e/0x110
> [ 66.522910] kernel_init_freeable+0x16c/0x2e0
> [ 66.522910] kernel_init+0x2a/0x148
> [ 66.522911] kernel_thread_starter+0x6/0xc
> [ 66.522911] kernel_thread_starter+0x0/0xc
> [ 66.522912] -> #3 (&rq->lock){-.-.-.}:
> [ 66.522914] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522914] __lock_acquire+0x62c/0x850
> [ 66.522915] lock_acquire+0x254/0x2b8
> [ 66.522915] _raw_spin_lock+0x60/0xa0
> [ 66.522916] task_fork_fair+0x54/0x118
> [ 66.522916] sched_fork+0x13e/0x2a0
> [ 66.522916] copy_process+0x642/0x1e80
> [ 66.522917] _do_fork+0xc2/0x6d0
> [ 66.522917] kernel_thread+0x4e/0x60
> [ 66.522918] rest_init+0x42/0x170
> [ 66.522918] start_kernel+0x464/0x478
> [ 66.522918] _stext+0x20/0x80
> [ 66.522919] -> #2 (&p->pi_lock){-.-.-.}:
> [ 66.522921] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522922] __lock_acquire+0x62c/0x850
> [ 66.522922] lock_acquire+0x254/0x2b8
> [ 66.522922] _raw_spin_lock_irqsave+0x70/0xb8
> [ 66.522923] try_to_wake_up+0x4a/0x5f8
> [ 66.522924] autoremove_wake_function+0x2e/0x88
> [ 66.522924] __wake_up_common+0x76/0xc0
> [ 66.522924] __wake_up+0x54/0x68
> [ 66.522925] ccw_device_verify_done+0xae/0x268
> [ 66.522925] ccw_request_handler+0x422/0x560
> [ 66.522926] do_cio_interrupt+0x224/0x2a0
> [ 66.522926] __handle_irq_event_percpu+0x1a6/0x440
> [ 66.522927] handle_irq_event_percpu+0x38/0x88
> [ 66.522927] handle_percpu_irq+0x84/0xb0
> [ 66.522928] generic_handle_irq+0x42/0x60
> [ 66.522928] do_IRQ+0x86/0xc8
> [ 66.522929] io_int_handler+0x104/0x294
> [ 66.522929] enabled_wait+0x72/0x140
> [ 66.522930] enabled_wait+0x5a/0x140
> [ 66.522930] arch_cpu_idle+0x32/0x50
> [ 66.522930] default_idle_call+0x52/0x68
> [ 66.522931] do_idle+0x10e/0x160
> [ 66.522931] cpu_startup_entry+0x3e/0x48
> [ 66.522932] start_kernel+0x464/0x478
> [ 66.522932] _stext+0x20/0x80
> [ 66.522933] -> #1 (&priv->wait_q){-.-...}:
> [ 66.522935] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522935] __lock_acquire+0x62c/0x850
> [ 66.522935] lock_acquire+0x254/0x2b8
> [ 66.522936] _raw_spin_lock_irqsave+0x70/0xb8
> [ 66.522936] __wake_up+0x3a/0x68
> [ 66.522937] ccw_device_recog_done+0x28e/0x2c8
> [ 66.522937] snsid_callback+0x324/0x390
> [ 66.522938] ccw_request_handler+0x480/0x560
> [ 66.522938] do_cio_interrupt+0x224/0x2a0
> [ 66.522939] __handle_irq_event_percpu+0x1a6/0x440
> [ 66.522939] handle_irq_event_percpu+0x38/0x88
> [ 66.522940] handle_percpu_irq+0x84/0xb0
> [ 66.522940] generic_handle_irq+0x42/0x60
> [ 66.522940] do_IRQ+0x86/0xc8
> [ 66.522941] io_int_handler+0x104/0x294
> [ 66.522941] _raw_spin_unlock_irq+0x4e/0x78
> [ 66.522942] _raw_spin_unlock_irq+0x4a/0x78
> [ 66.522942] ccw_device_enable_console+0x9a/0x180
> [ 66.522943] con3215_init+0x110/0x1b8
> [ 66.522943] console_init+0x3a/0x58
> [ 66.522944] start_kernel+0x33a/0x478
> [ 66.522944] _stext+0x20/0x80
> [ 66.522945] -> #0 (&console_sch_key){-.-...}:
> [ 66.522947] check_prev_add+0x160/0x6e8
> [ 66.522947] validate_chain.isra.10+0xb7a/0xda8
> [ 66.522948] __lock_acquire+0x62c/0x850
> [ 66.522948] lock_acquire+0x254/0x2b8
> [ 66.522948] _raw_spin_lock_irqsave+0x70/0xb8
> [ 66.522949] raw3215_write+0x58/0x208
> [ 66.522949] con3215_write+0x8e/0xf8
> [ 66.522950] console_unlock+0x49a/0x670
> [ 66.522950] vprintk_emit+0x308/0x378
> [ 66.522951] vprintk_default+0x44/0x58
> [ 66.522951] printk+0x4e/0x60
> [ 66.522952] __offline_isolated_pages+0x168/0x370
> [ 66.522952] offline_isolated_pages_cb+0x2e/0x40
> [ 66.522953] walk_system_ram_range+0x92/0xf0
> [ 66.522953] __offline_pages.constprop.9+0x74a/0x988
> [ 66.522954] memory_subsys_offline+0x6c/0xa0
> [ 66.522954] device_offline+0x84/0xe0
> [ 66.522954] store_mem_state+0xfe/0x120
> [ 66.522955] kernfs_fop_write+0x124/0x1c8
> [ 66.522955] __vfs_write+0x36/0x140
> [ 66.522956] vfs_write+0xb8/0x1a0
> [ 66.522956] SyS_write+0x66/0xc0
> [ 66.522957] system_call+0xc4/0x240
>
> [ 66.522957] other info that might help us debug this:
> [ 66.522958] Chain exists of:
> [ 66.522958] &console_sch_key --> &rq->lock --> &(&zone->lock)->rlock
> [ 66.522961] Possible unsafe locking scenario:
> [ 66.522962] CPU0 CPU1
> [ 66.522962] ---- ----
> [ 66.522963] lock(&(&zone->lock)->rlock);
> [ 66.522964] lock(&rq->lock);
> [ 66.522965] lock(&(&zone->lock)->rlock);
> [ 66.522966] lock(&console_sch_key);
>
> [ 66.522967] *** DEADLOCK ***
>
> [ 66.522968] 10 locks held by sh/822:
> [ 66.522968] #0: (sb_writers#5){.+.+.+}, at: [<000000000034f9c2>] vfs_write+0xa2/0x1a0
> [ 66.522970] #1: (&of->mutex){+.+.+.}, at: [<00000000003f1afe>] kernfs_fop_write+0xd6/0x1c8
> [ 66.522972] #2: (s_active#53){.+.+.+}, at: [<00000000003f1b0a>] kernfs_fop_write+0xe2/0x1c8
> [ 66.522974] #3: (device_hotplug_lock){+.+...}, at: [<00000000006adb70>] lock_device_hotplug_sysfs+0x30/0x70
> [ 66.522976] #4: (memory_add_remove_lock){+.+...}, at: [<0000000000328e16>] mem_hotplug_begin+0x3e/0xd0
> [ 66.522978] #5: (mem_hotplug.lock){++++.+}, at: [<0000000000328dd8>] mem_hotplug_begin+0x0/0xd0
> [ 66.522980] #6: (mem_hotplug.lock#2){+.+...}, at: [<0000000000328e5c>] mem_hotplug_begin+0x84/0xd0
> [ 66.522982] #7: (&dev->mutex){......}, at: [<00000000006af41c>] device_offline+0x5c/0xe0
> [ 66.522984] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
> [ 66.522986] #9: (console_lock){+.+...}, at: [<00000000001c2c84>] vprintk_emit+0x2fc/0x378

2017-03-21 05:04:57

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

[..]
> 3) move add_timer() out of sclp_con_lock console in sclp_console_write().
^^^^^^^^^^^^
I take it back. Won't improve anything.

-ss

2017-03-24 16:39:47

by Steven Rostedt

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On Mon, 20 Mar 2017 17:43:23 +0100 (CET)
Sebastian Ott <[email protected]> wrote:

> Since commit f975237b7682 ("printk: use printk_safe buffers in printk")
> I observe lockdep warnings on s390 when doing memory hotplug:
>
> [ 132.379220] ======================================================
> [ 132.379220] [ INFO: possible circular locking dependency detected ]
> [ 132.379221] 4.11.0-rc3 #330 Not tainted
> [ 132.379221] -------------------------------------------------------
> [ 132.379222] sh/708 is trying to acquire lock:
> [ 132.379222] (&(&sclp_con_lock)->rlock){-.-...}, at: [<000000000074524a>] sclp_console_write+0x4a/0x2b8
> [ 132.379224]
> [ 132.379224] but task is already holding lock:
> [ 132.379224] (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [ 132.379226]
> [ 132.379227] which lock already depends on the new lock.
> [ 132.379227]
> [ 132.379227]
> [ 132.379228] the existing dependency chain (in reverse order) is:
> [ 132.379228]
> [ 132.379229] -> #2 (&(&zone->lock)->rlock){..-...}:
> [ 132.379230] validate_chain.isra.22+0xb36/0xd18
> [ 132.379231] __lock_acquire+0x650/0x880
> [ 132.379231] lock_acquire+0x24e/0x2b0
> [ 132.379232] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379232] get_page_from_freelist+0x548/0xf30
> [ 132.379233] __alloc_pages_nodemask+0x1c8/0x14c0
> [ 132.379233] allocate_slab+0xde/0x668
> [ 132.379233] new_slab+0x94/0xa8
> [ 132.379234] ___slab_alloc.constprop.66+0x556/0x578
> [ 132.379234] __slab_alloc.isra.62.constprop.65+0x74/0xa8
> [ 132.379235] kmem_cache_alloc+0x136/0x4b8
> [ 132.379235] __debug_object_init+0xaa/0x448
> [ 132.379236] debug_object_activate+0x182/0x238
> [ 132.379236] mod_timer+0xe6/0x450
> [ 132.379236] prandom_reseed+0x44/0x50
> [ 132.379237] do_one_initcall+0xa2/0x150
> [ 132.379237] kernel_init_freeable+0x228/0x2d8
> [ 132.379238] kernel_init+0x2a/0x138
> [ 132.379238] kernel_thread_starter+0x6/0xc
> [ 132.379239] kernel_thread_starter+0x0/0xc
> [ 132.379239]
> [ 132.379239] -> #1 (&(&base->lock)->rlock){-.-.-.}:
> [ 132.379241] validate_chain.isra.22+0xb36/0xd18
> [ 132.379242] __lock_acquire+0x650/0x880
> [ 132.379242] lock_acquire+0x24e/0x2b0
> [ 132.379242] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379243] lock_timer_base+0x78/0xa8
> [ 132.379243] mod_timer+0xba/0x450
> [ 132.379244] sclp_console_write+0x262/0x2b8
> [ 132.379244] console_unlock+0x4a6/0x658
> [ 132.379245] register_console+0x31c/0x430
> [ 132.379245] sclp_console_init+0x152/0x170
> [ 132.379245] console_init+0x3a/0x58
> [ 132.379246] start_kernel+0x350/0x480
> [ 132.379246] _stext+0x20/0x80
> [ 132.379246]
> [ 132.379247] -> #0 (&(&sclp_con_lock)->rlock){-.-...}:
> [ 132.379249] check_prev_add+0x150/0x648
> [ 132.379249] validate_chain.isra.22+0xb36/0xd18
> [ 132.379249] __lock_acquire+0x650/0x880
> [ 132.379250] lock_acquire+0x24e/0x2b0
> [ 132.379250] _raw_spin_lock_irqsave+0x70/0xb8
> [ 132.379251] sclp_console_write+0x4a/0x2b8
> [ 132.379251] console_unlock+0x4a6/0x658
> [ 132.379252] vprintk_emit+0x33e/0x350
> [ 132.379252] vprintk_default+0x44/0x58
> [ 132.379252] printk+0x4e/0x58
> [ 132.379253] __offline_isolated_pages+0x1a0/0x340
> [ 132.379253] offline_isolated_pages_cb+0x2e/0x40
> [ 132.379254] walk_system_ram_range+0x96/0xe8
> [ 132.379254] __offline_pages.constprop.43+0x6da/0x918
> [ 132.379255] memory_subsys_offline+0x6a/0xa0
> [ 132.379255] device_offline+0x84/0xd8
> [ 132.379255] store_mem_state+0xfe/0x120
> [ 132.379256] kernfs_fop_write+0x126/0x1d0
> [ 132.379256] __vfs_write+0x46/0x140
> [ 132.379257] vfs_write+0xb8/0x1a8
> [ 132.379257] SyS_write+0x66/0xc0
> [ 132.379257] system_call+0xc4/0x240
> [ 132.379258]
> [ 132.379258] other info that might help us debug this:
> [ 132.379258]
> [ 132.379259] Chain exists of:
> [ 132.379259] &(&sclp_con_lock)->rlock --> &(&base->lock)->rlock --> &(&zone->lock)->rlock
> [ 132.379262]
> [ 132.379262] Possible unsafe locking scenario:
> [ 132.379262]
> [ 132.379263] CPU0 CPU1
> [ 132.379263] ---- ----
> [ 132.379264] lock(&(&zone->lock)->rlock);
> [ 132.379265] lock(&(&base->lock)->rlock);
> [ 132.379266] lock(&(&zone->lock)->rlock);
> [ 132.379267] lock(&(&sclp_con_lock)->rlock);
> [ 132.379268]
> [ 132.379268] *** DEADLOCK ***
> [ 132.379269]
> [ 132.379269] 10 locks held by sh/708:
> [ 132.379269] #0: (sb_writers#4){.+.+.+}, at: [<0000000000345d4a>] vfs_write+0xa2/0x1a8
> [ 132.379272] #1: (&of->mutex){+.+.+.}, at: [<00000000003e5e60>] kernfs_fop_write+0xd8/0x1d0
> [ 132.379273] #2: (s_active#180){.+.+.+}, at: [<00000000003e5e6c>] kernfs_fop_write+0xe4/0x1d0
> [ 132.379275] #3: (device_hotplug_lock){+.+...}, at: [<000000000069bb58>] lock_device_hotplug_sysfs+0x30/0x68
> [ 132.379277] #4: (memory_add_remove_lock){+.+...}, at: [<000000000031cfe6>] mem_hotplug_begin+0x3e/0xd0
> [ 132.379279] #5: (mem_hotplug.lock){++++.+}, at: [<000000000031cfa8>] mem_hotplug_begin+0x0/0xd0
> [ 132.379281] #6: (mem_hotplug.lock#2){+.+...}, at: [<000000000031d02c>] mem_hotplug_begin+0x84/0xd0
> [ 132.379283] #7: (&dev->mutex){......}, at: [<000000000069d55c>] device_offline+0x5c/0xd8
> [ 132.379285] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
> [ 132.379287] #9: (console_lock){+.+...}, at: [<00000000001bdc02>] vprintk_emit+0x332/0x350

Is there a stack trace of where the lockdep dump happened? That is
useful too. Otherwise we don't see where the inverse happened.

-- Steve

2017-03-24 16:58:02

by Sebastian Ott

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On Fri, 24 Mar 2017, Steven Rostedt wrote:
> Is there a stack trace of where the lockdep dump happened? That is
> useful too. Otherwise we don't see where the inverse happened.

Sure. Here you go.

[ 132.379220] ======================================================
[ 132.379220] [ INFO: possible circular locking dependency detected ]
[ 132.379221] 4.11.0-rc3 #330 Not tainted
[ 132.379221] -------------------------------------------------------
[ 132.379222] sh/708 is trying to acquire lock:
[ 132.379222] (&(&sclp_con_lock)->rlock){-.-...}, at: [<000000000074524a>] sclp_console_write+0x4a/0x2b8
[ 132.379224]
[ 132.379224] but task is already holding lock:
[ 132.379224] (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
[ 132.379226]
[ 132.379227] which lock already depends on the new lock.
[ 132.379227]
[ 132.379227]
[ 132.379228] the existing dependency chain (in reverse order) is:
[ 132.379228]
[ 132.379229] -> #2 (&(&zone->lock)->rlock){..-...}:
[ 132.379230] validate_chain.isra.22+0xb36/0xd18
[ 132.379231] __lock_acquire+0x650/0x880
[ 132.379231] lock_acquire+0x24e/0x2b0
[ 132.379232] _raw_spin_lock_irqsave+0x70/0xb8
[ 132.379232] get_page_from_freelist+0x548/0xf30
[ 132.379233] __alloc_pages_nodemask+0x1c8/0x14c0
[ 132.379233] allocate_slab+0xde/0x668
[ 132.379233] new_slab+0x94/0xa8
[ 132.379234] ___slab_alloc.constprop.66+0x556/0x578
[ 132.379234] __slab_alloc.isra.62.constprop.65+0x74/0xa8
[ 132.379235] kmem_cache_alloc+0x136/0x4b8
[ 132.379235] __debug_object_init+0xaa/0x448
[ 132.379236] debug_object_activate+0x182/0x238
[ 132.379236] mod_timer+0xe6/0x450
[ 132.379236] prandom_reseed+0x44/0x50
[ 132.379237] do_one_initcall+0xa2/0x150
[ 132.379237] kernel_init_freeable+0x228/0x2d8
[ 132.379238] kernel_init+0x2a/0x138
[ 132.379238] kernel_thread_starter+0x6/0xc
[ 132.379239] kernel_thread_starter+0x0/0xc
[ 132.379239]
[ 132.379239] -> #1 (&(&base->lock)->rlock){-.-.-.}:
[ 132.379241] validate_chain.isra.22+0xb36/0xd18
[ 132.379242] __lock_acquire+0x650/0x880
[ 132.379242] lock_acquire+0x24e/0x2b0
[ 132.379242] _raw_spin_lock_irqsave+0x70/0xb8
[ 132.379243] lock_timer_base+0x78/0xa8
[ 132.379243] mod_timer+0xba/0x450
[ 132.379244] sclp_console_write+0x262/0x2b8
[ 132.379244] console_unlock+0x4a6/0x658
[ 132.379245] register_console+0x31c/0x430
[ 132.379245] sclp_console_init+0x152/0x170
[ 132.379245] console_init+0x3a/0x58
[ 132.379246] start_kernel+0x350/0x480
[ 132.379246] _stext+0x20/0x80
[ 132.379246]
[ 132.379247] -> #0 (&(&sclp_con_lock)->rlock){-.-...}:
[ 132.379249] check_prev_add+0x150/0x648
[ 132.379249] validate_chain.isra.22+0xb36/0xd18
[ 132.379249] __lock_acquire+0x650/0x880
[ 132.379250] lock_acquire+0x24e/0x2b0
[ 132.379250] _raw_spin_lock_irqsave+0x70/0xb8
[ 132.379251] sclp_console_write+0x4a/0x2b8
[ 132.379251] console_unlock+0x4a6/0x658
[ 132.379252] vprintk_emit+0x33e/0x350
[ 132.379252] vprintk_default+0x44/0x58
[ 132.379252] printk+0x4e/0x58
[ 132.379253] __offline_isolated_pages+0x1a0/0x340
[ 132.379253] offline_isolated_pages_cb+0x2e/0x40
[ 132.379254] walk_system_ram_range+0x96/0xe8
[ 132.379254] __offline_pages.constprop.43+0x6da/0x918
[ 132.379255] memory_subsys_offline+0x6a/0xa0
[ 132.379255] device_offline+0x84/0xd8
[ 132.379255] store_mem_state+0xfe/0x120
[ 132.379256] kernfs_fop_write+0x126/0x1d0
[ 132.379256] __vfs_write+0x46/0x140
[ 132.379257] vfs_write+0xb8/0x1a8
[ 132.379257] SyS_write+0x66/0xc0
[ 132.379257] system_call+0xc4/0x240
[ 132.379258]
[ 132.379258] other info that might help us debug this:
[ 132.379258]
[ 132.379259] Chain exists of:
[ 132.379259] &(&sclp_con_lock)->rlock --> &(&base->lock)->rlock --> &(&zone->lock)->rlock
[ 132.379262]
[ 132.379262] Possible unsafe locking scenario:
[ 132.379262]
[ 132.379263] CPU0 CPU1
[ 132.379263] ---- ----
[ 132.379264] lock(&(&zone->lock)->rlock);
[ 132.379265] lock(&(&base->lock)->rlock);
[ 132.379266] lock(&(&zone->lock)->rlock);
[ 132.379267] lock(&(&sclp_con_lock)->rlock);
[ 132.379268]
[ 132.379268] *** DEADLOCK ***
[ 132.379269]
[ 132.379269] 10 locks held by sh/708:
[ 132.379269] #0: (sb_writers#4){.+.+.+}, at: [<0000000000345d4a>] vfs_write+0xa2/0x1a8
[ 132.379272] #1: (&of->mutex){+.+.+.}, at: [<00000000003e5e60>] kernfs_fop_write+0xd8/0x1d0
[ 132.379273] #2: (s_active#180){.+.+.+}, at: [<00000000003e5e6c>] kernfs_fop_write+0xe4/0x1d0
[ 132.379275] #3: (device_hotplug_lock){+.+...}, at: [<000000000069bb58>] lock_device_hotplug_sysfs+0x30/0x68
[ 132.379277] #4: (memory_add_remove_lock){+.+...}, at: [<000000000031cfe6>] mem_hotplug_begin+0x3e/0xd0
[ 132.379279] #5: (mem_hotplug.lock){++++.+}, at: [<000000000031cfa8>] mem_hotplug_begin+0x0/0xd0
[ 132.379281] #6: (mem_hotplug.lock#2){+.+...}, at: [<000000000031d02c>] mem_hotplug_begin+0x84/0xd0
[ 132.379283] #7: (&dev->mutex){......}, at: [<000000000069d55c>] device_offline+0x5c/0xd8
[ 132.379285] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002a65a8>] __offline_isolated_pages+0xe0/0x340
[ 132.379287] #9: (console_lock){+.+...}, at: [<00000000001bdc02>] vprintk_emit+0x332/0x350
[ 132.379289]
[ 132.379289] stack backtrace:
[ 132.379290] CPU: 1 PID: 708 Comm: sh Not tainted 4.11.0-rc3 #330
[ 132.379290] Hardware name: IBM 2827 H66 703 (LPAR)
[ 132.379290] Stack:
[ 132.379291] 000000006f5ab458 000000006f5ab4e8 0000000000000003 0000000000000000
[ 132.379293] 000000006f5ab588 000000006f5ab500 000000006f5ab500 0000000000000020
[ 132.379294] 0000000000929180 000000000000000a 0000000000000020 000000000000000a
[ 132.379296] 000000000000000c 000000006f5ab550 0000000000000000 0000000000000000
[ 132.379298] 04000000016e0b18 0000000000111da2 000000006f5ab4e8 000000006f5ab540
[ 132.379300] Call Trace:
[ 132.379300] ([<0000000000111c86>] show_trace+0x8e/0xe0)
[ 132.379301] [<0000000000111d58>] show_stack+0x80/0xd8
[ 132.379301] [<00000000005ff656>] dump_stack+0x96/0xd8
[ 132.379302] [<0000000000291572>] print_circular_bug+0x2f2/0x308
[ 132.379302] [<00000000001a6b98>] check_prev_add+0x150/0x648
[ 132.379303] [<00000000001a7bc6>] validate_chain.isra.22+0xb36/0xd18
[ 132.379303] [<00000000001a9240>] __lock_acquire+0x650/0x880
[ 132.379304] [<00000000001a9bee>] lock_acquire+0x24e/0x2b0
[ 132.379304] [<00000000008dcab0>] _raw_spin_lock_irqsave+0x70/0xb8
[ 132.379304] [<000000000074524a>] sclp_console_write+0x4a/0x2b8
[ 132.379305] [<00000000001bd71e>] console_unlock+0x4a6/0x658
[ 132.379305] [<00000000001bdc0e>] vprintk_emit+0x33e/0x350
[ 132.379306] [<00000000001bddec>] vprintk_default+0x44/0x58
[ 132.379306] [<0000000000291c8e>] printk+0x4e/0x58
[ 132.379307] [<00000000002a6668>] __offline_isolated_pages+0x1a0/0x340
[ 132.379307] [<000000000031cbbe>] offline_isolated_pages_cb+0x2e/0x40
[ 132.379308] [<0000000000147466>] walk_system_ram_range+0x96/0xe8
[ 132.379308] [<00000000008d2bba>] __offline_pages.constprop.43+0x6da/0x918
[ 132.379309] [<00000000006b9b2a>] memory_subsys_offline+0x6a/0xa0
[ 132.379309] [<000000000069d584>] device_offline+0x84/0xd8
[ 132.379310] [<00000000006ba146>] store_mem_state+0xfe/0x120
[ 132.379310] [<00000000003e5eae>] kernfs_fop_write+0x126/0x1d0
[ 132.379311] [<0000000000344b0e>] __vfs_write+0x46/0x140
[ 132.379311] [<0000000000345d60>] vfs_write+0xb8/0x1a8
[ 132.379312] [<000000000034742e>] SyS_write+0x66/0xc0
[ 132.379312] [<00000000008dda2c>] system_call+0xc4/0x240
[ 132.379313] INFO: lockdep is turned off.




[ 377.096545] ======================================================
[ 377.096545] [ INFO: possible circular locking dependency detected ]
[ 377.096546] 4.11.0-rc2-00238-ga722844 #120 Not tainted
[ 377.096546] -------------------------------------------------------
[ 377.096547] sh/848 is trying to acquire lock:
[ 377.096547] ((console_sem).lock){-.....}, at: [<00000000001a594e>] down_trylock+0x2e/0x68
[ 377.096549] but task is already holding lock:
[ 377.096550] (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370
[ 377.096552] which lock already depends on the new lock.
[ 377.096553] the existing dependency chain (in reverse order) is:

[ 377.096554] -> #3 (&(&zone->lock)->rlock){..-...}:
[ 377.096556] validate_chain.isra.10+0xb7a/0xda8
[ 377.096557] __lock_acquire+0x62c/0x850
[ 377.096557] lock_acquire+0x254/0x2b8
[ 377.096558] _raw_spin_lock_irqsave+0x70/0xb8
[ 377.096558] get_page_from_freelist+0x4e2/0xf28
[ 377.096558] __alloc_pages_nodemask+0x1fa/0x1528
[ 377.096559] allocate_slab+0xf2/0x6c0
[ 377.096559] new_slab+0x94/0xa8
[ 377.096560] ___slab_alloc.constprop.22+0x570/0x590
[ 377.096560] __slab_alloc.isra.16.constprop.21+0x74/0xa8
[ 377.096561] kmem_cache_alloc+0x13c/0x4b0
[ 377.096561] __debug_object_init+0x5c/0x468
[ 377.096561] hrtimer_init+0x42/0x1d8
[ 377.096562] init_dl_task_timer+0x3a/0x58
[ 377.096562] __sched_fork.isra.5+0x7e/0xe8
[ 377.096563] init_idle+0x7a/0x278
[ 377.096563] fork_idle+0xa4/0xb8
[ 377.096564] idle_threads_init+0x64/0xc8
[ 377.096564] smp_init+0x2e/0x110
[ 377.096565] kernel_init_freeable+0x16c/0x2e0
[ 377.096565] kernel_init+0x2a/0x148
[ 377.096565] kernel_thread_starter+0x6/0xc
[ 377.096566] kernel_thread_starter+0x0/0xc

[ 377.096567] -> #2 (&rq->lock){-.-.-.}:
[ 377.096568] validate_chain.isra.10+0xb7a/0xda8
[ 377.096569] __lock_acquire+0x62c/0x850
[ 377.096569] lock_acquire+0x254/0x2b8
[ 377.096569] _raw_spin_lock+0x60/0xa0
[ 377.096570] task_fork_fair+0x54/0x118
[ 377.096570] sched_fork+0x13e/0x2a0
[ 377.096571] copy_process+0x642/0x1e80
[ 377.096571] _do_fork+0xc2/0x6d0
[ 377.096571] kernel_thread+0x4e/0x60
[ 377.096572] rest_init+0x42/0x170
[ 377.096572] start_kernel+0x464/0x478
[ 377.096572] _stext+0x20/0x80

[ 377.096573] -> #1 (&p->pi_lock){-.-.-.}:
[ 377.096574] validate_chain.isra.10+0xb7a/0xda8
[ 377.096575] __lock_acquire+0x62c/0x850
[ 377.096575] lock_acquire+0x254/0x2b8
[ 377.096576] _raw_spin_lock_irqsave+0x70/0xb8
[ 377.096576] try_to_wake_up+0x4a/0x5f8
[ 377.096576] up+0x58/0x70
[ 377.096577] __up_console_sem+0x5a/0x90
[ 377.096577] console_unlock+0x610/0x670
[ 377.096578] vprintk_emit+0x308/0x378
[ 377.096578] dev_vprintk_emit+0x1e6/0x200
[ 377.096578] dev_printk_emit+0x4e/0x60
[ 377.096579] __dev_printk+0x8e/0xb0
[ 377.096579] dev_warn+0x68/0x78
[ 377.096579] qeth_dev_portname_store+0x58/0x70
[ 377.096580] kernfs_fop_write+0x124/0x1c8
[ 377.096580] __vfs_write+0x36/0x140
[ 377.096580] vfs_write+0xb8/0x1a0
[ 377.096581] SyS_write+0x66/0xc0
[ 377.096581] system_call+0xc4/0x240

[ 377.096582] -> #0 ((console_sem).lock){-.....}:
[ 377.096583] check_prev_add+0x160/0x6e8
[ 377.096584] validate_chain.isra.10+0xb7a/0xda8
[ 377.096584] __lock_acquire+0x62c/0x850
[ 377.096584] lock_acquire+0x254/0x2b8
[ 377.096585] _raw_spin_lock_irqsave+0x70/0xb8
[ 377.096585] down_trylock+0x2e/0x68
[ 377.096586] __down_trylock_console_sem+0x4a/0xc0
[ 377.096586] console_trylock+0x32/0xc8
[ 377.096586] vprintk_emit+0x2fc/0x378
[ 377.096587] vprintk_default+0x44/0x58
[ 377.096587] printk+0x4e/0x60
[ 377.096587] __offline_isolated_pages+0x168/0x370
[ 377.096588] offline_isolated_pages_cb+0x2e/0x40
[ 377.096588] walk_system_ram_range+0x92/0xf0
[ 377.096589] __offline_pages.constprop.9+0x74a/0x988
[ 377.096589] memory_subsys_offline+0x6c/0xa0
[ 377.096589] device_offline+0x84/0xe0
[ 377.096590] store_mem_state+0xfe/0x120
[ 377.096590] kernfs_fop_write+0x124/0x1c8
[ 377.096590] __vfs_write+0x36/0x140
[ 377.096591] vfs_write+0xb8/0x1a0
[ 377.096591] SyS_write+0x66/0xc0
[ 377.096592] system_call+0xc4/0x240

[ 377.096592] other info that might help us debug this:

[ 377.096593] Chain exists of:
[ 377.096593] (console_sem).lock --> &rq->lock --> &(&zone->lock)->rlock

[ 377.096595] Possible unsafe locking scenario:

[ 377.096596] CPU0 CPU1
[ 377.096596] ---- ----
[ 377.096597] lock(&(&zone->lock)->rlock);
[ 377.096598] lock(&rq->lock);
[ 377.096599] lock(&(&zone->lock)->rlock);
[ 377.096600] lock((console_sem).lock);

[ 377.096601] *** DEADLOCK ***

[ 377.096601] 9 locks held by sh/848:
[ 377.096602] #0: (sb_writers#5){.+.+.+}, at: [<000000000034f9c2>] vfs_write+0xa2/0x1a0
[ 377.096604] #1: (&of->mutex){+.+.+.}, at: [<00000000003f1afe>] kernfs_fop_write+0xd6/0x1c8
[ 377.096605] #2: (s_active#51){.+.+.+}, at: [<00000000003f1b0a>] kernfs_fop_write+0xe2/0x1c8
[ 377.096607] #3: (device_hotplug_lock){+.+.+.}, at: [<00000000006adb70>] lock_device_hotplug_sysfs+0x30/0x70
[ 377.096609] #4: (memory_add_remove_lock){+.+.+.}, at: [<0000000000328e16>] mem_hotplug_begin+0x3e/0xd0
[ 377.096610] #5: (mem_hotplug.lock){++++++}, at: [<0000000000328dd8>] mem_hotplug_begin+0x0/0xd0
[ 377.096612] #6: (mem_hotplug.lock#2){+.+.+.}, at: [<0000000000328e5c>] mem_hotplug_begin+0x84/0xd0
[ 377.096614] #7: (&dev->mutex){......}, at: [<00000000006af41c>] device_offline+0x5c/0xe0
[ 377.096615] #8: (&(&zone->lock)->rlock){..-...}, at: [<00000000002ae3da>] __offline_isolated_pages+0x302/0x370

[ 377.096617] stack backtrace:
[ 377.096618] CPU: 0 PID: 848 Comm: sh Not tainted 4.11.0-rc2-00238-ga722844 #120
[ 377.096618] Hardware name: IBM 2827 H66 703 (z/VM)
[ 377.096619] Stack:
[ 377.096619] 000000003ed0f450 000000003ed0f4e0 0000000000000003 0000000000000000
[ 377.096620] 000000003ed0f580 000000003ed0f4f8 000000003ed0f4f8 0000000000000020
[ 377.096629] 00000000009441c0 0000000000000020 000000000000000a 000000000000000a
[ 377.096631] 000000000000000c 000000003ed0f548 0000000000000000 0000000000000000
[ 377.096633] 04000000016fa078 0000000000112338 000000003ed0f4e0 000000003ed0f538
[ 377.096635] Call Trace:
[ 377.096635] ([<0000000000112212>] show_trace+0x8a/0xe0)
[ 377.096636] [<00000000001122e8>] show_stack+0x80/0xd8
[ 377.096636] [<000000000060f5a6>] dump_stack+0x96/0xd8
[ 377.096637] [<00000000001a9d3c>] print_circular_bug+0x314/0x340
[ 377.096637] [<00000000001aacc0>] check_prev_add+0x160/0x6e8
[ 377.096638] [<00000000001abdc2>] validate_chain.isra.10+0xb7a/0xda8
[ 377.096638] [<00000000001ad7dc>] __lock_acquire+0x62c/0x850
[ 377.096639] [<00000000001ae20c>] lock_acquire+0x254/0x2b8
[ 377.096639] [<00000000008f63b8>] _raw_spin_lock_irqsave+0x70/0xb8
[ 377.096639] [<00000000001a594e>] down_trylock+0x2e/0x68
[ 377.096640] [<00000000001c1262>] __down_trylock_console_sem+0x4a/0xc0
[ 377.096641] [<00000000001c130a>] console_trylock+0x32/0xc8
[ 377.096641] [<00000000001c2c84>] vprintk_emit+0x2fc/0x378
[ 377.096641] [<00000000001c2edc>] vprintk_default+0x44/0x58
[ 377.096642] [<000000000029a24e>] printk+0x4e/0x60
[ 377.096642] [<00000000002ae240>] __offline_isolated_pages+0x168/0x370
[ 377.096643] [<00000000003289d6>] offline_isolated_pages_cb+0x2e/0x40
[ 377.096643] [<00000000001492fa>] walk_system_ram_range+0x92/0xf0
[ 377.096644] [<00000000008ec1fa>] __offline_pages.constprop.9+0x74a/0x988
[ 377.096644] [<00000000006cc9bc>] memory_subsys_offline+0x6c/0xa0
[ 377.096645] [<00000000006af444>] device_offline+0x84/0xe0
[ 377.096645] [<00000000006cd0ae>] store_mem_state+0xfe/0x120
[ 377.096645] [<00000000003f1b4c>] kernfs_fop_write+0x124/0x1c8
[ 377.096646] [<000000000034e736>] __vfs_write+0x36/0x140
[ 377.096646] [<000000000034f9d8>] vfs_write+0xb8/0x1a0
[ 377.096647] [<000000000035110e>] SyS_write+0x66/0xc0
[ 377.096647] [<00000000008f74ac>] system_call+0xc4/0x240
[ 377.096648] INFO: lockdep is turned off.

2017-03-25 00:01:25

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

Hello,

On (03/24/17 12:39), Steven Rostedt wrote:
[..]
> Is there a stack trace of where the lockdep dump happened? That is
> useful too. Otherwise we don't see where the inverse happened.

Steven, isn't it the inversion I describe in [1] (after the first lockdep
warning)?

[1] lkml.kernel.org/r/[email protected]

-ss

2017-03-25 00:06:42

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On (03/21/17 13:44), Sergey Senozhatsky wrote:
[..]
> so we probably can
>
>
> 1) move pr_info() out of zone->lock in __offline_isolated_pages().
> meh...
>
>
> 2) switch to printk_deferred() in __offline_isolated_pages().
> meh.. there might a bunch of other printks done from under zone->lock.
>
>
> 3) move add_timer() out of sclp_con_lock console in sclp_console_write().
> well, there can be other consoles that do something similar.
>
>
> 4) ... something smart.


Sebastian, does this change make lockdep happy?

it removes console drivers from the __offline_isolated_pages(). not the
best solution I can think of, but the simplest one.

---

mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f749b7ff7c50..eb61e6ab5f4f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
BUG_ON(!PageBuddy(page));
order = page_order(page);
#ifdef CONFIG_DEBUG_VM
- pr_info("remove from free list %lx %d %lx\n",
+ printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
pfn, 1 << order, end_pfn);
#endif
list_del(&page->lru);

2017-03-25 01:00:30

by Steven Rostedt

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On Sat, 25 Mar 2017 09:00:05 +0900
Sergey Senozhatsky <[email protected]> wrote:

> Hello,
>
> On (03/24/17 12:39), Steven Rostedt wrote:
> [..]
> > Is there a stack trace of where the lockdep dump happened? That is
> > useful too. Otherwise we don't see where the inverse happened.
>
> Steven, isn't it the inversion I describe in [1] (after the first lockdep
> warning)?
>
> [1] lkml.kernel.org/r/[email protected]
>

Yeah, I believe you are right. I just wanted to make sure. It's the
same backtrace as the "(&(&sclp_con_lock)->rlock){-.-...}:" dump, but I
wanted to make sure.

-- Steve

2017-03-25 01:09:12

by Steven Rostedt

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On Sat, 25 Mar 2017 09:04:42 +0900
Sergey Senozhatsky <[email protected]> wrote:

> On (03/21/17 13:44), Sergey Senozhatsky wrote:
> [..]
> > so we probably can
> >
> >
> > 1) move pr_info() out of zone->lock in __offline_isolated_pages().
> > meh...
> >
> >
> > 2) switch to printk_deferred() in __offline_isolated_pages().
> > meh.. there might a bunch of other printks done from under zone->lock.
> >
> >
> > 3) move add_timer() out of sclp_con_lock console in sclp_console_write().
> > well, there can be other consoles that do something similar.
> >
> >
> > 4) ... something smart.
>
>
> Sebastian, does this change make lockdep happy?
>
> it removes console drivers from the __offline_isolated_pages(). not the
> best solution I can think of, but the simplest one.
>
> ---
>
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f749b7ff7c50..eb61e6ab5f4f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> BUG_ON(!PageBuddy(page));
> order = page_order(page);
> #ifdef CONFIG_DEBUG_VM
> - pr_info("remove from free list %lx %d %lx\n",
> + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
> pfn, 1 << order, end_pfn);
> #endif
> list_del(&page->lru);


My fear is that this will trigger for any printk in page_alloc.c under
the zone lock.

-- Steve

2017-03-25 01:39:12

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On (03/24/17 21:08), Steven Rostedt wrote:
> > Sebastian, does this change make lockdep happy?
> >
> > it removes console drivers from the __offline_isolated_pages(). not the
> > best solution I can think of, but the simplest one.
> >
> > ---
> >
> > mm/page_alloc.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f749b7ff7c50..eb61e6ab5f4f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> > BUG_ON(!PageBuddy(page));
> > order = page_order(page);
> > #ifdef CONFIG_DEBUG_VM
> > - pr_info("remove from free list %lx %d %lx\n",
> > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
> > pfn, 1 << order, end_pfn);
> > #endif
> > list_del(&page->lru);
>
>
> My fear is that this will trigger for any printk in page_alloc.c under
> the zone lock.

absolutely true.

I Cc'd debugobjects, mm and sclp_console maintainers because the
real (smart) solution to the problem is somewhere there.



another problem (not reported) is that we have conflicting dependencies
mod_timer -> sclp_console
sclp_console -> mod_timer

which can result in a deadlock:
mod_timer -> debugobjects -> printk -> sclp_console -> mod_timer


this one, I think, can be addressed by switching to a printk_safe in debugobjects.


// I'm traveling now. there will be delays in replies, sorry.

-ss

2017-03-25 10:59:19

by Sebastian Ott

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On Sat, 25 Mar 2017, Sergey Senozhatsky wrote:

> On (03/21/17 13:44), Sergey Senozhatsky wrote:
> [..]
> > so we probably can
> >
> >
> > 1) move pr_info() out of zone->lock in __offline_isolated_pages().
> > meh...
> >
> >
> > 2) switch to printk_deferred() in __offline_isolated_pages().
> > meh.. there might a bunch of other printks done from under zone->lock.
> >
> >
> > 3) move add_timer() out of sclp_con_lock console in sclp_console_write().
> > well, there can be other consoles that do something similar.
> >
> >
> > 4) ... something smart.
>
>
> Sebastian, does this change make lockdep happy?
>
> it removes console drivers from the __offline_isolated_pages(). not the
> best solution I can think of, but the simplest one.

It does. No complaint from lockdep.

2017-03-28 14:22:42

by Michal Hocko

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On Sat 25-03-17 09:04:42, Sergey Senozhatsky wrote:
> On (03/21/17 13:44), Sergey Senozhatsky wrote:
> [..]
> > so we probably can
> >
> >
> > 1) move pr_info() out of zone->lock in __offline_isolated_pages().
> > meh...
> >
> >
> > 2) switch to printk_deferred() in __offline_isolated_pages().
> > meh.. there might a bunch of other printks done from under zone->lock.
> >
> >
> > 3) move add_timer() out of sclp_con_lock console in sclp_console_write().
> > well, there can be other consoles that do something similar.
> >
> >
> > 4) ... something smart.
>
>
> Sebastian, does this change make lockdep happy?
>
> it removes console drivers from the __offline_isolated_pages(). not the
> best solution I can think of, but the simplest one.
>
> ---
>
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f749b7ff7c50..eb61e6ab5f4f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> BUG_ON(!PageBuddy(page));
> order = page_order(page);
> #ifdef CONFIG_DEBUG_VM
> - pr_info("remove from free list %lx %d %lx\n",
> + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
> pfn, 1 << order, end_pfn);
> #endif
> list_del(&page->lru);

I believe this is not a proper fix. Although this code is ugly and maybe
it doesn't really need zone->lock because that should be the page
allocator internal thing the problem is that printk shouldn't impose
such a subtle dependency on locks. Why does the timer needs to allocate
at all?

--
Michal Hocko
SUSE Labs

2017-03-28 16:00:32

by Petr Mladek

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On Tue 2017-03-28 16:22:27, Michal Hocko wrote:
> On Sat 25-03-17 09:04:42, Sergey Senozhatsky wrote:
> > On (03/21/17 13:44), Sergey Senozhatsky wrote:
> > [..]
> > > so we probably can
> > >
> > >
> > > 1) move pr_info() out of zone->lock in __offline_isolated_pages().
> > > meh...
> > >
> > >
> > > 2) switch to printk_deferred() in __offline_isolated_pages().
> > > meh.. there might a bunch of other printks done from under zone->lock.
> > >
> > >
> > > 3) move add_timer() out of sclp_con_lock console in sclp_console_write().
> > > well, there can be other consoles that do something similar.
> > >
> > >
> > > 4) ... something smart.
> >
> >
> > Sebastian, does this change make lockdep happy?
> >
> > it removes console drivers from the __offline_isolated_pages(). not the
> > best solution I can think of, but the simplest one.
> >
> > ---
> >
> > mm/page_alloc.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f749b7ff7c50..eb61e6ab5f4f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> > BUG_ON(!PageBuddy(page));
> > order = page_order(page);
> > #ifdef CONFIG_DEBUG_VM
> > - pr_info("remove from free list %lx %d %lx\n",
> > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
> > pfn, 1 << order, end_pfn);
> > #endif
> > list_del(&page->lru);
>
> I believe this is not a proper fix. Although this code is ugly and maybe
> it doesn't really need zone->lock because that should be the page
> allocator internal thing the problem is that printk shouldn't impose
> such a subtle dependency on locks. Why does the timer needs to allocate
> at all?

printk/console use timers to postpone flushing of buffers. There are
often more consequent printks. The code wants to wait a bit and flush
them together eventually. At the same time, it wants to set a deadline
for the flushing. It makes sure that they will get flushed in a
reasonable time even when the buffer is not full. It is questionable
but it makes some sense.

In each case, the timer code is used also by scheduler and we probably
need to use scheduler from printk.


Regarding the timer code. The problem seems to be with static
timers. They call debug_object_init() when the timer is used
for the first time. See the special handling of not-found
objects in debug_object_activate().

Now, __debug_object_init() calls fill_pool() that allocates
the memory. A solution would be to either use static
struct kmem_cache for statically defined timers and
avoid the allocation. Or we should call fill_pool()
asynchronously from a safe context.

What do you think?

Best Regards,
Petr

2017-03-29 07:31:58

by Michal Hocko

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On Tue 28-03-17 18:00:16, Petr Mladek wrote:
> On Tue 2017-03-28 16:22:27, Michal Hocko wrote:
> > On Sat 25-03-17 09:04:42, Sergey Senozhatsky wrote:
> > > On (03/21/17 13:44), Sergey Senozhatsky wrote:
> > > [..]
> > > > so we probably can
> > > >
> > > >
> > > > 1) move pr_info() out of zone->lock in __offline_isolated_pages().
> > > > meh...
> > > >
> > > >
> > > > 2) switch to printk_deferred() in __offline_isolated_pages().
> > > > meh.. there might a bunch of other printks done from under zone->lock.
> > > >
> > > >
> > > > 3) move add_timer() out of sclp_con_lock console in sclp_console_write().
> > > > well, there can be other consoles that do something similar.
> > > >
> > > >
> > > > 4) ... something smart.
> > >
> > >
> > > Sebastian, does this change make lockdep happy?
> > >
> > > it removes console drivers from the __offline_isolated_pages(). not the
> > > best solution I can think of, but the simplest one.
> > >
> > > ---
> > >
> > > mm/page_alloc.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index f749b7ff7c50..eb61e6ab5f4f 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> > > BUG_ON(!PageBuddy(page));
> > > order = page_order(page);
> > > #ifdef CONFIG_DEBUG_VM
> > > - pr_info("remove from free list %lx %d %lx\n",
> > > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
> > > pfn, 1 << order, end_pfn);
> > > #endif
> > > list_del(&page->lru);
> >
> > I believe this is not a proper fix. Although this code is ugly and maybe
> > it doesn't really need zone->lock because that should be the page
> > allocator internal thing the problem is that printk shouldn't impose
> > such a subtle dependency on locks. Why does the timer needs to allocate
> > at all?
>
> printk/console use timers to postpone flushing of buffers. There are
> often more consequent printks. The code wants to wait a bit and flush
> them together eventually. At the same time, it wants to set a deadline
> for the flushing. It makes sure that they will get flushed in a
> reasonable time even when the buffer is not full. It is questionable
> but it makes some sense.
>
> In each case, the timer code is used also by scheduler and we probably
> need to use scheduler from printk.
>
> Regarding the timer code. The problem seems to be with static
> timers. They call debug_object_init() when the timer is used
> for the first time. See the special handling of not-found
> objects in debug_object_activate().

Thanks for the clarification!

> Now, __debug_object_init() calls fill_pool() that allocates
> the memory. A solution would be to either use static
> struct kmem_cache

I am not sure what do you mean by that.

> for statically defined timers and avoid the allocation. Or we should
> call fill_pool() asynchronously from a safe context.

I think we should avoid the allocation completely. It is GFP_ATOMIC and
so likely to fail under heavy memory pressure. Async fill will make it
slightly more complicated but still unreliable.

> What do you think?

Why cannot we simply embed this debugging data into the timer itself?
It will make the structure larger (I didn't check how much) but this is
an opt in feature so it should be acceptable.

A subtle dependecny on the allocator is really bad and we should get rid
of it.
--
Michal Hocko
SUSE Labs

2017-03-30 03:59:51

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On (03/28/17 16:22), Michal Hocko wrote:
[..]
> > Sebastian, does this change make lockdep happy?
> >
> > it removes console drivers from the __offline_isolated_pages(). not the
> > best solution I can think of, but the simplest one.
> >
> > ---
> >
> > mm/page_alloc.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f749b7ff7c50..eb61e6ab5f4f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> > BUG_ON(!PageBuddy(page));
> > order = page_order(page);
> > #ifdef CONFIG_DEBUG_VM
> > - pr_info("remove from free list %lx %d %lx\n",
> > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n",
> > pfn, 1 << order, end_pfn);
> > #endif
> > list_del(&page->lru);
>
> I believe this is not a proper fix.

oh, absolutely. I hate it. didn't really propose it as a fix. mostly
did it just to verify that there are no other lock inversions behind
the one that has been reported (lockdep turns off itself when it
detects the first lock dependency inversion).

> Although this code is ugly and maybe it doesn't really need zone->lock
> because that should be the page allocator internal thing the problem is
> that printk shouldn't impose such a subtle dependency on locks. Why does
> the timer needs to allocate at all?

I believe Petr has answered your questions. sorry for the delay.

-ss

2017-03-30 09:45:05

by Petr Mladek

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

On Wed 2017-03-29 09:31:47, Michal Hocko wrote:
> On Tue 28-03-17 18:00:16, Petr Mladek wrote:
> > On Tue 2017-03-28 16:22:27, Michal Hocko wrote:
> > > On Sat 25-03-17 09:04:42, Sergey Senozhatsky wrote:
> > > > On (03/21/17 13:44), Sergey Senozhatsky wrote:
> > > > [..]
> > > > > so we probably can
> > > > >
> > > > >
> > > > > 1) move pr_info() out of zone->lock in __offline_isolated_pages().
> > > > > meh...
> > > > >
> > > > >
> > > > > 2) switch to printk_deferred() in __offline_isolated_pages().
> > > > > meh.. there might a bunch of other printks done from under zone->lock.
> > > > >
> > > > >
> > > > > 3) move add_timer() out of sclp_con_lock console in sclp_console_write().
> > > > > well, there can be other consoles that do something similar.
> > > > >
> > > > >
> > > > > 4) ... something smart.
> > > >
> > > >
> > Regarding the timer code. The problem seems to be with static
> > timers. They call debug_object_init() when the timer is used
> > for the first time. See the special handling of not-found
> > objects in debug_object_activate().
>
> Thanks for the clarification!
>
> > Now, __debug_object_init() calls fill_pool() that allocates
> > the memory. A solution would be to either use static
> > struct kmem_cache
>
> I am not sure what do you mean by that.

The problem is when the timer is defined using
DEFINE_TIMER() and initialized using __TIMER_INITIALIZER().

I had an unclear idea about extending the macro to define
also the needed structure that might later be used by
debug_object_init() instead of taking it from the pool.

I am sorry for the confusion with struct kmem_cache.
It is the type of the pointer that is passed when
allocating the memory in fill_pool(). It seems
we actually need struct debug_obj. But it might
be even more complicated.


> > for statically defined timers and avoid the allocation. Or we should
> > call fill_pool() asynchronously from a safe context.
>
> I think we should avoid the allocation completely. It is GFP_ATOMIC and
> so likely to fail under heavy memory pressure. Async fill will make it
> slightly more complicated but still unreliable.
>
> > What do you think?
>
> Why cannot we simply embed this debugging data into the timer itself?
> It will make the structure larger (I didn't check how much) but this is
> an opt in feature so it should be acceptable.

This sounds reasonable to me.

> A subtle dependecny on the allocator is really bad and
> we should get rid of it.

Yup.

Best Regards,
Petr

2017-07-24 12:46:42

by Sebastian Ott

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

Hello Sergey,

On Tue, 21 Mar 2017, Sergey Senozhatsky wrote:
> // I kept only lockdep splats from the original report and cut off .configs
> // full version:
> // lkml.kernel.org/r/alpine.LFD.2.20.1703201736070.1753@schleppi
>
> On (03/20/17 17:43), Sebastian Ott wrote:
> > Since commit f975237b7682 ("printk: use printk_safe buffers in printk")
> > I observe lockdep warnings on s390 when doing memory hotplug:

When using the sclp console I can no longer recreate that lockdep warning
but the other one is still present:

[ 347.644660] ======================================================
[ 347.644660] WARNING: possible circular locking dependency detected
[ 347.644661] 4.13.0-rc2 #146 Not tainted
[ 347.644661] ------------------------------------------------------
[ 347.644662] sh/770 is trying to acquire lock:
[ 347.644662] (&console_sch_key){-.-...}, at: [<0000000000763d80>] raw3215_write+0x58/0x208
[ 347.644665] but task is already holding lock:
[ 347.644665] (&(&zone->lock)->rlock){-.-...}, at: [<00000000002b693e>] __offline_isolated_pages+0x316/0x388
[ 347.644668] which lock already depends on the new lock.
[ 347.644669] the existing dependency chain (in reverse order) is:
[ 347.644670] -> #4 (&(&zone->lock)->rlock){-.-...}:
[ 347.644672] validate_chain.isra.10+0xb56/0xd88
[ 347.644673] __lock_acquire+0x62c/0x850
[ 347.644673] lock_acquire+0x254/0x2b8
[ 347.644674] _raw_spin_lock_irqsave+0x70/0xb8
[ 347.644674] get_page_from_freelist+0x446/0xf30
[ 347.644675] __alloc_pages_nodemask+0x200/0x1568
[ 347.644675] allocate_slab+0xf0/0x658
[ 347.644676] new_slab+0x94/0xa8
[ 347.644677] ___slab_alloc.constprop.23+0x55e/0x580
[ 347.644677] __slab_alloc.isra.17.constprop.22+0x74/0xa8
[ 347.644678] kmem_cache_alloc+0x13c/0x4b0
[ 347.644678] __debug_object_init+0x5c/0x468
[ 347.644679] hrtimer_init+0x42/0x1d8
[ 347.644679] init_dl_task_timer+0x3a/0x58
[ 347.644680] __sched_fork.isra.2+0x82/0xd8
[ 347.644680] init_idle+0x7a/0x278
[ 347.644681] fork_idle+0xa4/0xb8
[ 347.644681] idle_threads_init+0x6a/0xd0
[ 347.644682] smp_init+0x34/0x110
[ 347.644682] kernel_init_freeable+0x166/0x2d8
[ 347.644683] kernel_init+0x2a/0x148
[ 347.644683] kernel_thread_starter+0x6/0xc
[ 347.644684] kernel_thread_starter+0x0/0xc

[ 347.644684] -> #3 (&rq->lock){-.-.-.}:
[ 347.644686] validate_chain.isra.10+0xb56/0xd88
[ 347.644687] __lock_acquire+0x62c/0x850
[ 347.644687] lock_acquire+0x254/0x2b8
[ 347.644688] _raw_spin_lock+0x60/0xa0
[ 347.644688] task_fork_fair+0x6a/0x160
[ 347.644689] sched_fork+0x13e/0x2a0
[ 347.644689] copy_process+0x676/0x1ec0
[ 347.644690] _do_fork+0xc2/0x6d0
[ 347.644690] kernel_thread+0x4e/0x60
[ 347.644691] rest_init+0x48/0x290
[ 347.644691] start_kernel+0x470/0x480
[ 347.644692] _stext+0x20/0x80

[ 347.644693] -> #2 (&p->pi_lock){-.-.-.}:
[ 347.644695] validate_chain.isra.10+0xb56/0xd88
[ 347.644695] __lock_acquire+0x62c/0x850
[ 347.644696] lock_acquire+0x254/0x2b8
[ 347.644696] _raw_spin_lock_irqsave+0x70/0xb8
[ 347.644697] try_to_wake_up+0x4a/0x600
[ 347.644697] autoremove_wake_function+0x2e/0x88
[ 347.644698] __wake_up_common+0x76/0xc0
[ 347.644698] __wake_up+0x54/0x68
[ 347.644699] ccw_device_verify_done+0xae/0x268
[ 347.644700] ccw_request_handler+0x422/0x560
[ 347.644700] do_cio_interrupt+0x224/0x2a0
[ 347.644701] __handle_irq_event_percpu+0x1a6/0x440
[ 347.644701] handle_irq_event_percpu+0x38/0x88
[ 347.644702] handle_percpu_irq+0x84/0xb0
[ 347.644702] generic_handle_irq+0x42/0x60
[ 347.644703] do_IRQ+0x86/0xc8
[ 347.644703] io_int_handler+0x104/0x2d4
[ 347.644704] enabled_wait+0x72/0x140
[ 347.644704] enabled_wait+0x5a/0x140
[ 347.644705] arch_cpu_idle+0x32/0x50
[ 347.644706] default_idle_call+0x52/0x68
[ 347.644706] do_idle+0x118/0x170
[ 347.644707] cpu_startup_entry+0x3e/0x48
[ 347.644707] smp_start_secondary+0x112/0x120
[ 347.644708] restart_int_handler+0x62/0x78
[ 347.644708] (null)

[ 347.644709] -> #1 (&priv->wait_q){-.....}:
[ 347.644711] validate_chain.isra.10+0xb56/0xd88
[ 347.644711] __lock_acquire+0x62c/0x850
[ 347.644712] lock_acquire+0x254/0x2b8
[ 347.644712] _raw_spin_lock_irqsave+0x70/0xb8
[ 347.644713] __wake_up+0x3a/0x68
[ 347.644713] ccw_device_recog_done+0x28e/0x2c8
[ 347.644714] snsid_callback+0x324/0x390
[ 347.644714] ccw_request_handler+0x480/0x560
[ 347.644715] do_cio_interrupt+0x224/0x2a0
[ 347.644715] __handle_irq_event_percpu+0x1a6/0x440
[ 347.644715] handle_irq_event_percpu+0x38/0x88
[ 347.644716] handle_percpu_irq+0x84/0xb0
[ 347.644716] generic_handle_irq+0x42/0x60
[ 347.644717] do_IRQ+0x86/0xc8
[ 347.644717] io_int_handler+0x104/0x2d4
[ 347.644718] _raw_spin_unlock_irq+0x4e/0x78
[ 347.644718] _raw_spin_unlock_irq+0x4a/0x78
[ 347.644719] ccw_device_enable_console+0xa0/0x188
[ 347.644719] con3215_init+0x116/0x1b8
[ 347.644720] console_init+0x40/0x60
[ 347.644720] start_kernel+0x34c/0x480
[ 347.644720] _stext+0x20/0x80

[ 347.644721] -> #0 (&console_sch_key){-.-...}:
[ 347.644723] check_prev_add+0x160/0x6e8
[ 347.644723] validate_chain.isra.10+0xb56/0xd88
[ 347.644723] __lock_acquire+0x62c/0x850
[ 347.644724] lock_acquire+0x254/0x2b8
[ 347.644724] _raw_spin_lock_irqsave+0x70/0xb8
[ 347.644725] raw3215_write+0x58/0x208
[ 347.644725] con3215_write+0x8e/0xf8
[ 347.644725] console_unlock+0x4d8/0x6a8
[ 347.644726] vprintk_emit+0x308/0x378
[ 347.644726] vprintk_default+0x44/0x58
[ 347.644727] printk+0x4e/0x60
[ 347.644727] __offline_isolated_pages+0x16a/0x388
[ 347.644728] offline_isolated_pages_cb+0x2e/0x40
[ 347.644728] walk_system_ram_range+0x92/0xf0
[ 347.644729] __offline_pages.constprop.6+0x6d2/0x910
[ 347.644729] memory_subsys_offline+0x6c/0xa0
[ 347.644730] device_offline+0x84/0xe0
[ 347.644730] store_mem_state+0xfe/0x120
[ 347.644731] kernfs_fop_write+0x132/0x208
[ 347.644731] __vfs_write+0x36/0x158
[ 347.644731] vfs_write+0xb8/0x1a0
[ 347.644732] SyS_write+0x66/0xc0
[ 347.644732] system_call+0xc4/0x298
[ 347.644733] other info that might help us debug this:
[ 347.644734] Chain exists of:
[ 347.644734] &console_sch_key --> &rq->lock --> &(&zone->lock)->rlock
[ 347.644736] Possible unsafe locking scenario:
[ 347.644737] CPU0 CPU1
[ 347.644738] ---- ----
[ 347.644738] lock(&(&zone->lock)->rlock);
[ 347.644739] lock(&rq->lock);
[ 347.644740] lock(&(&zone->lock)->rlock);
[ 347.644741] lock(&console_sch_key);
[ 347.644742] *** DEADLOCK ***
[ 347.644743] 9 locks held by sh/770:
[ 347.644743] #0: (sb_writers#5){.+.+.+}, at: [<000000000035c64a>] vfs_write+0xa2/0x1a0
[ 347.644745] #1: (&of->mutex){+.+.+.}, at: [<000000000040065a>] kernfs_fop_write+0x1b2/0x208
[ 347.644747] #2: (s_active#46){.+.+.+}, at: [<0000000000400666>] kernfs_fop_write+0x1be/0x208
[ 347.644749] #3: (device_hotplug_lock){+.+...}, at: [<00000000006b9d18>] lock_device_hotplug_sysfs+0x30/0x70
[ 347.644750] #4: (cpu_hotplug_lock.rw_sem){++++++}, at: [<000000000033596a>] mem_hotplug_begin+0x2a/0x40
[ 347.644752] #5: (mem_hotplug_lock.rw_sem){++++.+}, at: [<00000000001a5c78>] percpu_down_write+0x38/0x110
[ 347.644754] #6: (&dev->mutex){......}, at: [<00000000006bb5ac>] device_offline+0x5c/0xe0
[ 347.644756] #7: (&(&zone->lock)->rlock){-.-...}, at: [<00000000002b693e>] __offline_isolated_pages+0x316/0x388
[ 347.644758] #8: (console_lock){+.+...}, at: [<00000000001c2a3c>] vprintk_emit+0x2fc/0x378
[ 347.644760] stack backtrace:
[ 347.644760] CPU: 2 PID: 770 Comm: sh Not tainted 4.13.0-rc2 #146
[ 347.644761] Hardware name: IBM 2827 H66 706 (z/VM 6.3.0)
[ 347.644761] Call Trace:
[ 347.644761] ([<0000000000113a1a>] show_stack+0x8a/0xe0)
[ 347.644762] [<00000000008fe4a6>] dump_stack+0x96/0xd8
[ 347.644762] [<00000000001a95c4>] print_circular_bug+0x314/0x340
[ 347.644763] [<00000000001aa548>] check_prev_add+0x160/0x6e8
[ 347.644763] [<00000000001ab626>] validate_chain.isra.10+0xb56/0xd88
[ 347.644764] [<00000000001ad024>] __lock_acquire+0x62c/0x850
[ 347.644764] [<00000000001adbec>] lock_acquire+0x254/0x2b8
[ 347.644765] [<000000000091e4a8>] _raw_spin_lock_irqsave+0x70/0xb8
[ 347.644766] [<0000000000763d80>] raw3215_write+0x58/0x208
[ 347.644766] [<000000000076416e>] con3215_write+0x8e/0xf8
[ 347.644766] [<00000000001c2570>] console_unlock+0x4d8/0x6a8
[ 347.644767] [<00000000001c2a48>] vprintk_emit+0x308/0x378
[ 347.644767] [<00000000001c2c94>] vprintk_default+0x44/0x58
[ 347.644768] [<00000000001c3c0e>] printk+0x4e/0x60
[ 347.644768] [<00000000002b6792>] __offline_isolated_pages+0x16a/0x388
[ 347.644769] [<0000000000335516>] offline_isolated_pages_cb+0x2e/0x40
[ 347.644769] [<000000000014b092>] walk_system_ram_range+0x92/0xf0
[ 347.644770] [<0000000000913cfa>] __offline_pages.constprop.6+0x6d2/0x910
[ 347.644770] [<00000000006d8cf4>] memory_subsys_offline+0x6c/0xa0
[ 347.644771] [<00000000006bb5d4>] device_offline+0x84/0xe0
[ 347.644771] [<00000000006d93de>] store_mem_state+0xfe/0x120
[ 347.644772] [<00000000004005da>] kernfs_fop_write+0x132/0x208
[ 347.644773] [<000000000035b236>] __vfs_write+0x36/0x158
[ 347.644773] [<000000000035c660>] vfs_write+0xb8/0x1a0
[ 347.644774] [<000000000035dd96>] SyS_write+0x66/0xc0
[ 347.644774] [<000000000091f59c>] system_call+0xc4/0x298
[ 347.644774] INFO: lockdep is turned off.

Sebastian

2017-07-24 13:32:16

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: lockdep warning: console vs. mem hotplug

Hello,

On (07/24/17 14:46), Sebastian Ott wrote:
> [ 347.644660] ======================================================
> [ 347.644660] WARNING: possible circular locking dependency detected
> [ 347.644661] 4.13.0-rc2 #146 Not tainted
> [ 347.644661] ------------------------------------------------------
> [ 347.644662] sh/770 is trying to acquire lock:
> [ 347.644662] (&console_sch_key){-.-...}, at: [<0000000000763d80>] raw3215_write+0x58/0x208
> [ 347.644665] but task is already holding lock:
> [ 347.644665] (&(&zone->lock)->rlock){-.-...}, at: [<00000000002b693e>] __offline_isolated_pages+0x316/0x388
> [ 347.644668] which lock already depends on the new lock.
> [ 347.644669] the existing dependency chain (in reverse order) is:
> [ 347.644670] -> #4 (&(&zone->lock)->rlock){-.-...}:
> [ 347.644672] validate_chain.isra.10+0xb56/0xd88
> [ 347.644673] __lock_acquire+0x62c/0x850
> [ 347.644673] lock_acquire+0x254/0x2b8
> [ 347.644674] _raw_spin_lock_irqsave+0x70/0xb8
> [ 347.644674] get_page_from_freelist+0x446/0xf30
> [ 347.644675] __alloc_pages_nodemask+0x200/0x1568
> [ 347.644675] allocate_slab+0xf0/0x658
> [ 347.644676] new_slab+0x94/0xa8
> [ 347.644677] ___slab_alloc.constprop.23+0x55e/0x580
> [ 347.644677] __slab_alloc.isra.17.constprop.22+0x74/0xa8
> [ 347.644678] kmem_cache_alloc+0x13c/0x4b0
> [ 347.644678] __debug_object_init+0x5c/0x468
> [ 347.644679] hrtimer_init+0x42/0x1d8
> [ 347.644679] init_dl_task_timer+0x3a/0x58
> [ 347.644680] __sched_fork.isra.2+0x82/0xd8
> [ 347.644680] init_idle+0x7a/0x278
> [ 347.644681] fork_idle+0xa4/0xb8
> [ 347.644681] idle_threads_init+0x6a/0xd0
> [ 347.644682] smp_init+0x34/0x110
> [ 347.644682] kernel_init_freeable+0x166/0x2d8
> [ 347.644683] kernel_init+0x2a/0x148
> [ 347.644683] kernel_thread_starter+0x6/0xc
> [ 347.644684] kernel_thread_starter+0x0/0xc
>
> [ 347.644684] -> #3 (&rq->lock){-.-.-.}:
> [ 347.644686] validate_chain.isra.10+0xb56/0xd88
> [ 347.644687] __lock_acquire+0x62c/0x850
> [ 347.644687] lock_acquire+0x254/0x2b8
> [ 347.644688] _raw_spin_lock+0x60/0xa0
> [ 347.644688] task_fork_fair+0x6a/0x160
> [ 347.644689] sched_fork+0x13e/0x2a0
> [ 347.644689] copy_process+0x676/0x1ec0
> [ 347.644690] _do_fork+0xc2/0x6d0
> [ 347.644690] kernel_thread+0x4e/0x60
> [ 347.644691] rest_init+0x48/0x290
> [ 347.644691] start_kernel+0x470/0x480
> [ 347.644692] _stext+0x20/0x80

ok... this part is not exactly clear to me, but we've got

rq->lock -> &(&zone->lock)->rlock

dependency.

> [ 347.644693] -> #2 (&p->pi_lock){-.-.-.}:
> [ 347.644695] validate_chain.isra.10+0xb56/0xd88
> [ 347.644695] __lock_acquire+0x62c/0x850
> [ 347.644696] lock_acquire+0x254/0x2b8
> [ 347.644696] _raw_spin_lock_irqsave+0x70/0xb8
> [ 347.644697] try_to_wake_up+0x4a/0x600
> [ 347.644697] autoremove_wake_function+0x2e/0x88
> [ 347.644698] __wake_up_common+0x76/0xc0
> [ 347.644698] __wake_up+0x54/0x68
> [ 347.644699] ccw_device_verify_done+0xae/0x268
> [ 347.644700] ccw_request_handler+0x422/0x560
> [ 347.644700] do_cio_interrupt+0x224/0x2a0
> [ 347.644701] __handle_irq_event_percpu+0x1a6/0x440
> [ 347.644701] handle_irq_event_percpu+0x38/0x88
> [ 347.644702] handle_percpu_irq+0x84/0xb0
> [ 347.644702] generic_handle_irq+0x42/0x60
> [ 347.644703] do_IRQ+0x86/0xc8
> [ 347.644703] io_int_handler+0x104/0x2d4
> [ 347.644704] enabled_wait+0x72/0x140
> [ 347.644704] enabled_wait+0x5a/0x140
> [ 347.644705] arch_cpu_idle+0x32/0x50
> [ 347.644706] default_idle_call+0x52/0x68
> [ 347.644706] do_idle+0x118/0x170
> [ 347.644707] cpu_startup_entry+0x3e/0x48
> [ 347.644707] smp_start_secondary+0x112/0x120
> [ 347.644708] restart_int_handler+0x62/0x78
> [ 347.644708] (null)
>
> [ 347.644709] -> #1 (&priv->wait_q){-.....}:
> [ 347.644711] validate_chain.isra.10+0xb56/0xd88
> [ 347.644711] __lock_acquire+0x62c/0x850
> [ 347.644712] lock_acquire+0x254/0x2b8
> [ 347.644712] _raw_spin_lock_irqsave+0x70/0xb8
> [ 347.644713] __wake_up+0x3a/0x68
> [ 347.644713] ccw_device_recog_done+0x28e/0x2c8
> [ 347.644714] snsid_callback+0x324/0x390
> [ 347.644714] ccw_request_handler+0x480/0x560
> [ 347.644715] do_cio_interrupt+0x224/0x2a0
> [ 347.644715] __handle_irq_event_percpu+0x1a6/0x440
> [ 347.644715] handle_irq_event_percpu+0x38/0x88
> [ 347.644716] handle_percpu_irq+0x84/0xb0
> [ 347.644716] generic_handle_irq+0x42/0x60
> [ 347.644717] do_IRQ+0x86/0xc8
> [ 347.644717] io_int_handler+0x104/0x2d4
> [ 347.644718] _raw_spin_unlock_irq+0x4e/0x78
> [ 347.644718] _raw_spin_unlock_irq+0x4a/0x78
> [ 347.644719] ccw_device_enable_console+0xa0/0x188
> [ 347.644719] con3215_init+0x116/0x1b8
> [ 347.644720] console_init+0x40/0x60
> [ 347.644720] start_kernel+0x34c/0x480
> [ 347.644720] _stext+0x20/0x80


so do_IRQ()->do_cio_interrupt() does several things


1) it takes sch->lock // also knows as "console_sch_key"
2) calls sch->driver->irq(sch) under sch->lock
3) which invokes __wake_up()
3.1) which locks ->wait_q
3.2) which locks ->pi_lock

so we've got

console_sch_key -> priv->wait_q
console_sch_key -> rq->pi_lock
console_sch_key -> rq->lock
console_sch_key -> .... other scheduler/timekeeping/etc. locks

> [ 347.644721] -> #0 (&console_sch_key){-.-...}:
> [ 347.644723] check_prev_add+0x160/0x6e8
> [ 347.644723] validate_chain.isra.10+0xb56/0xd88
> [ 347.644723] __lock_acquire+0x62c/0x850
> [ 347.644724] lock_acquire+0x254/0x2b8
> [ 347.644724] _raw_spin_lock_irqsave+0x70/0xb8
> [ 347.644725] raw3215_write+0x58/0x208
> [ 347.644725] con3215_write+0x8e/0xf8
> [ 347.644725] console_unlock+0x4d8/0x6a8
> [ 347.644726] vprintk_emit+0x308/0x378
> [ 347.644726] vprintk_default+0x44/0x58
> [ 347.644727] printk+0x4e/0x60
> [ 347.644727] __offline_isolated_pages+0x16a/0x388
> [ 347.644728] offline_isolated_pages_cb+0x2e/0x40
> [ 347.644728] walk_system_ram_range+0x92/0xf0
> [ 347.644729] __offline_pages.constprop.6+0x6d2/0x910
> [ 347.644729] memory_subsys_offline+0x6c/0xa0
> [ 347.644730] device_offline+0x84/0xe0
> [ 347.644730] store_mem_state+0xfe/0x120
> [ 347.644731] kernfs_fop_write+0x132/0x208
> [ 347.644731] __vfs_write+0x36/0x158
> [ 347.644731] vfs_write+0xb8/0x1a0
> [ 347.644732] SyS_write+0x66/0xc0
> [ 347.644732] system_call+0xc4/0x298

here we've got

(&zone->lock)->rlock -> console_sch_key

due to pr_info("remove from free list %lx %d %lx\n",...) under
spin_lock_irqsave(&zone->lock, flags).


so it's
from #4 from #0 from #1
rq->lock -> (&zone->lock)->rlock -> console_sch_key -> rq->lock


need to think what the fix can be...
if anyone has any thoughts I'd be glad to hear.


> [ 347.644733] other info that might help us debug this:
> [ 347.644734] Chain exists of:
> [ 347.644734] &console_sch_key --> &rq->lock --> &(&zone->lock)->rlock
> [ 347.644736] Possible unsafe locking scenario:
> [ 347.644737] CPU0 CPU1
> [ 347.644738] ---- ----
> [ 347.644738] lock(&(&zone->lock)->rlock);
> [ 347.644739] lock(&rq->lock);
> [ 347.644740] lock(&(&zone->lock)->rlock);
> [ 347.644741] lock(&console_sch_key);
> [ 347.644742] *** DEADLOCK ***
> [ 347.644743] 9 locks held by sh/770:
> [ 347.644743] #0: (sb_writers#5){.+.+.+}, at: [<000000000035c64a>] vfs_write+0xa2/0x1a0
> [ 347.644745] #1: (&of->mutex){+.+.+.}, at: [<000000000040065a>] kernfs_fop_write+0x1b2/0x208
> [ 347.644747] #2: (s_active#46){.+.+.+}, at: [<0000000000400666>] kernfs_fop_write+0x1be/0x208
> [ 347.644749] #3: (device_hotplug_lock){+.+...}, at: [<00000000006b9d18>] lock_device_hotplug_sysfs+0x30/0x70
> [ 347.644750] #4: (cpu_hotplug_lock.rw_sem){++++++}, at: [<000000000033596a>] mem_hotplug_begin+0x2a/0x40
> [ 347.644752] #5: (mem_hotplug_lock.rw_sem){++++.+}, at: [<00000000001a5c78>] percpu_down_write+0x38/0x110
> [ 347.644754] #6: (&dev->mutex){......}, at: [<00000000006bb5ac>] device_offline+0x5c/0xe0
> [ 347.644756] #7: (&(&zone->lock)->rlock){-.-...}, at: [<00000000002b693e>] __offline_isolated_pages+0x316/0x388
> [ 347.644758] #8: (console_lock){+.+...}, at: [<00000000001c2a3c>] vprintk_emit+0x2fc/0x378
> [ 347.644760] stack backtrace:
> [ 347.644760] CPU: 2 PID: 770 Comm: sh Not tainted 4.13.0-rc2 #146
> [ 347.644761] Hardware name: IBM 2827 H66 706 (z/VM 6.3.0)
> [ 347.644761] Call Trace:
> [ 347.644761] ([<0000000000113a1a>] show_stack+0x8a/0xe0)
> [ 347.644762] [<00000000008fe4a6>] dump_stack+0x96/0xd8
> [ 347.644762] [<00000000001a95c4>] print_circular_bug+0x314/0x340
> [ 347.644763] [<00000000001aa548>] check_prev_add+0x160/0x6e8
> [ 347.644763] [<00000000001ab626>] validate_chain.isra.10+0xb56/0xd88
> [ 347.644764] [<00000000001ad024>] __lock_acquire+0x62c/0x850
> [ 347.644764] [<00000000001adbec>] lock_acquire+0x254/0x2b8
> [ 347.644765] [<000000000091e4a8>] _raw_spin_lock_irqsave+0x70/0xb8
> [ 347.644766] [<0000000000763d80>] raw3215_write+0x58/0x208
> [ 347.644766] [<000000000076416e>] con3215_write+0x8e/0xf8
> [ 347.644766] [<00000000001c2570>] console_unlock+0x4d8/0x6a8
> [ 347.644767] [<00000000001c2a48>] vprintk_emit+0x308/0x378
> [ 347.644767] [<00000000001c2c94>] vprintk_default+0x44/0x58
> [ 347.644768] [<00000000001c3c0e>] printk+0x4e/0x60
> [ 347.644768] [<00000000002b6792>] __offline_isolated_pages+0x16a/0x388
> [ 347.644769] [<0000000000335516>] offline_isolated_pages_cb+0x2e/0x40
> [ 347.644769] [<000000000014b092>] walk_system_ram_range+0x92/0xf0
> [ 347.644770] [<0000000000913cfa>] __offline_pages.constprop.6+0x6d2/0x910
> [ 347.644770] [<00000000006d8cf4>] memory_subsys_offline+0x6c/0xa0
> [ 347.644771] [<00000000006bb5d4>] device_offline+0x84/0xe0
> [ 347.644771] [<00000000006d93de>] store_mem_state+0xfe/0x120
> [ 347.644772] [<00000000004005da>] kernfs_fop_write+0x132/0x208
> [ 347.644773] [<000000000035b236>] __vfs_write+0x36/0x158
> [ 347.644773] [<000000000035c660>] vfs_write+0xb8/0x1a0
> [ 347.644774] [<000000000035dd96>] SyS_write+0x66/0xc0
> [ 347.644774] [<000000000091f59c>] system_call+0xc4/0x298
> [ 347.644774] INFO: lockdep is turned off.

-ss