Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp460782pxa; Fri, 21 Aug 2020 11:39:54 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwG8NKc2/vxlCrC2j31pvOEcholxisJe9WYyETylIPcUazKvChssqsJ/Y2v7j5M7Jod6V/T X-Received: by 2002:a17:906:8149:: with SMTP id z9mr4175736ejw.356.1598035193843; Fri, 21 Aug 2020 11:39:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1598035193; cv=none; d=google.com; s=arc-20160816; b=uLWioUcFMxfMxSZ+4vSLgR3OCIKBgJk4wIjr2uHv7sIU8Zu6iwG6ZplwoKlXojzlcD x2qli+QUq57mLHfIPS77Hbeug/Xr0YUfx1DWXVbw6/bDUXUTehj/vojgJ1O7fwu7WS1A 6vq/0BYdzUTU7oA1UPrxmDFTTb3h/MWOAPMbxP4xTZDXSwxgYQmv0frA9vwtJSVC7ZhF 3SCIq+kF36+6WTP1+8uPVEMr5KhS1Begq83J3p8BNCo3maJQv66ykt5ooyNWhB+zu0Tc zZCBg1c5sPtquut5luqnG2wxiknFpMCNJSvkfuNViZeAGXZoMH7/WAclDmhWFVBDHQBf CKRQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=6FPEK+ZO8Bte/pKJR1hjG9u/bYsvmgesSBmM46P5T1o=; b=sk8ssfEJ57GVpo05BNfFxKmZGcG9FBTiBL+QlppaQ6AvDzOZD98n1icqneq7dz8QSn +tnUMhH9CGnwQWoEjIz22BJouGhHnLv7cSnfL2eAKbp1y7/k1B37Fia4Mn8GeT0rYb8W U9oyc1scWjaq45QwqvG1pwIj4Y0xNsyICnibD1Nln0/CaJhreekNpMPlZ0mpIvx7OfiC sfJYRwvzjHEnv009fGz3/xiLxTC32nzYvgduAV6u5tQAuQHrhwXTCriKE+n02HqtQ9RF q1c+Lh7mj/gN6MHpK55cM80TAUu74aWKNkoUUifE2IBzLZKESQllQrDDFPoahOryOg9U IZgg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=j8ypDKWk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id r26si2025163ejb.363.2020.08.21.11.39.29; Fri, 21 Aug 2020 11:39:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=j8ypDKWk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726801AbgHUSgb (ORCPT + 99 others); Fri, 21 Aug 2020 14:36:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54374 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725883AbgHUSgN (ORCPT ); Fri, 21 Aug 2020 14:36:13 -0400 Received: from mail-pf1-x442.google.com (mail-pf1-x442.google.com [IPv6:2607:f8b0:4864:20::442]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F4050C061799 for ; Fri, 21 Aug 2020 11:28:52 -0700 (PDT) Received: by mail-pf1-x442.google.com with SMTP id y206so1466698pfb.10 for ; Fri, 21 Aug 2020 11:28:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=6FPEK+ZO8Bte/pKJR1hjG9u/bYsvmgesSBmM46P5T1o=; b=j8ypDKWkGcXr97te0YVGww0y4Ekrl3vNTLOF/dxVDL27nGKI268i7kjL15efo2seqV 4dywvv3Z67Y8J/9NTvCjzD/EG6KX+q2hUVSD6LPxeHk939g53UCcWqsHwyVUngfu+giI QYCHgYj9XnlG3YwaeL6meceFATa4lXWP2ar+4gS5xRJ6+A3LPGMAsuoA1qIIJ1ScHUKQ w9xLA0eOYkU9zbIGfFnFL3opfxqyyna5/ombQ4tXCg4MShLTCf/8zvjpZ8qe3zHNWTZg FWip10rVdmGfg9/K4xwU+M9v93CcC7BJPwKwBV0UOBSQGxrqagX32qKHEseuc85ol+eK JPEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=6FPEK+ZO8Bte/pKJR1hjG9u/bYsvmgesSBmM46P5T1o=; b=ar8/AkKtf/NQHCe8w287HtcjRMoS262e2jXXGVUAVy6vhO/u5NTBuHFAlBTkR7W/6r QO2zMISxYHJJTzL1GXylJVKCQW67Wa6mBOKt0PMU5JlWuJMkkkWlDb+L4l44VDL5jjDN JqVNsFwHMwXFpTzTPzaWB9a9P53TeG1ooBXl/ElS5v/BwQmqpjZpK4EyhXo2IYl9bde3 ANn8fo5N9YbR3zUJ5XdSk3s42vQ0g/CngNlCiREVChL4WX4j7WvktoZPXawbeTL1vuAZ CtjpcMiUb1RMEUCyLg80IVF3ft68agQxRprPcDC2Ssl8m5uP5WhgjnIeurWMpzVKHiER R54g== X-Gm-Message-State: AOAM531qMEWPiXpLnwX0KalUpFMT3qFH+BiS3A63BbgmtW9HSksyMnPK aMNGRV1HCIukK36alsa7myq3VS/TW0qe7QlnFC0pJh0/BS0= X-Received: by 2002:a05:6a00:2295:: with SMTP id f21mr3414502pfe.51.1598034532108; Fri, 21 Aug 2020 11:28:52 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Saravana Kannan Date: Fri, 21 Aug 2020 11:28:16 -0700 Message-ID: Subject: Re: Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion" To: Dong Aisheng Cc: open list , Guenter Roeck , gregkh , Marek Szyprowski , Naresh Kamboju , dl-linux-imx , "moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE" , Dong Aisheng , Shawn Guo , Sascha Hauer Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 20, 2020 at 8:50 PM Dong Aisheng wrote: > > Hi ALL, > > We met the below WARNING during system suspend on an iMX6Q SDB board > with the latest linus/master branch (v5.9-rc1+) and next-20200820. > v5.8 kernel is ok. So i did bisect and finally found it's caused by > the patch below. > Reverting it can get rid of the warning, but I wonder if there may be > other potential issues. > Any ideas? Thanks for the report. I'll look into this more closely after Linux Plumbers (next week). We can't just revert the patch you pointed out because it's fixing another locking issue. I'm not familiar with the code path of the firmware_uevent stuff. But if I were to make a guess, the fix will probably be one of: 1. Not having to go through one of these code paths for the "device link" device. 2. Rewriting the device_del() code in device_link_remove() code path. If anyone has a fix, I'd be happy to review. -Saravana > Defconfig used is: imx_v6_v7_defconfig > > commit 843e600b8a2b01463c4d873a90b2c2ea8033f1f6 > Author: Saravana Kannan > Date: Thu Jul 16 14:45:23 2020 -0700 > > driver core: Fix sleeping in invalid context during device link deletion > > Marek and Guenter reported that commit 287905e68dd2 ("driver core: > Expose device link details in sysfs") caused sleeping/scheduling while > atomic warnings. > > BUG: sleeping function called from invalid context at > kernel/locking/mutex.c:935 > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 12, name: kworker/0:1 > 2 locks held by kworker/0:1/12: > #0: ee8074a8 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: > process_one_work+0x174/0x7dc > #1: ee921f20 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: > process_one_work+0x174/0x7dc > Preemption disabled at: > [] srcu_invoke_callbacks+0xc0/0x154 > ----- 8< ----- SNIP > [] (device_del) from [] (device_unregister+0x24/0x64) > [] (device_unregister) from [] > (srcu_invoke_callbacks+0xcc/0x154) > [] (srcu_invoke_callbacks) from [] > (process_one_work+0x234/0x7dc) > [] (process_one_work) from [] (worker_thread+0x44/0x51c) > [] (worker_thread) from [] (kthread+0x158/0x1a0) > [] (kthread) from [] (ret_from_fork+0x14/0x20) > Exception stack(0xee921fb0 to 0xee921ff8) > > This was caused by the device link device being released in the context > of srcu_invoke_callbacks(). There is no need to wait till the RCU > callback to release the device link device. So release the device > earlier and move the call_srcu() into the device release code. That way, > the memory will get freed only after the device is released AND the RCU > callback is called. > > Fixes: 287905e68dd2 ("driver core: Expose device link details in sysfs") > Reported-by: Marek Szyprowski > Reported-by: Guenter Roeck > Reported-by: Naresh Kamboju > Signed-off-by: Saravana Kannan > Tested-by: Marek Szyprowski > Tested-by: Guenter Roeck > Link: https://lore.kernel.org/r/20200716214523.2924704-1-saravanak@google.com > Signed-off-by: Greg Kroah-Hartman > > Error log: > # echo mem > /sys/power/state > [ 39.111865] PM: suspend entry (deep) > [ 39.148650] Filesystems sync: 0.032 seconds > [ 39.154034] > [ 39.155537] ====================================================== > [ 39.161723] WARNING: possible circular locking dependency detected > [ 39.167911] 5.9.0-rc1-00103-g7eac66d0456f #37 Not tainted > [ 39.173315] ------------------------------------------------------ > [ 39.179500] sh/647 is trying to acquire lock: > [ 39.183862] c15a310c (dpm_list_mtx){+.+.}-{3:3}, at: > dpm_for_each_dev+0x20/0x5c > [ 39.191200] > [ 39.191200] but task is already holding lock: > [ 39.197036] c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4 > [ 39.203582] > [ 39.203582] which lock already depends on the new lock. > [ 39.203582] > [ 39.211763] > [ 39.211763] the existing dependency chain (in reverse order) is: > [ 39.219249] > [ 39.219249] -> #2 (fw_lock){+.+.}-{3:3}: > [ 39.224673] mutex_lock_nested+0x1c/0x24 > [ 39.229126] firmware_uevent+0x18/0xa0 > [ 39.233411] dev_uevent+0xc4/0x1f8 > [ 39.237343] uevent_show+0x98/0x114 > [ 39.241362] dev_attr_show+0x18/0x48 > [ 39.245472] sysfs_kf_seq_show+0x84/0xec > [ 39.249927] seq_read+0x138/0x550 > [ 39.253774] vfs_read+0x94/0x164 > [ 39.257529] ksys_read+0x60/0xe8 > [ 39.261288] ret_fast_syscall+0x0/0x28 > [ 39.265564] 0xbed7c808 > [ 39.268538] > [ 39.268538] -> #1 (kn->active#3){++++}-{0:0}: > [ 39.274391] kernfs_remove_by_name_ns+0x40/0x94 > [ 39.279450] device_del+0x144/0x3fc > [ 39.283467] __device_link_del+0x4c/0x70 > [ 39.287919] device_link_remove+0x5c/0x8c > [ 39.292464] _regulator_put.part.0+0x104/0x1dc > [ 39.297436] regulator_put+0x2c/0x3c > [ 39.299731] regulator regulator.5: Failed to increase supply voltage: -110 > [ 39.301544] release_nodes+0x1b4/0x204 > [ 39.301553] really_probe+0x104/0x3b4 > [ 39.316881] driver_probe_device+0x58/0xb4 > [ 39.321506] device_driver_attach+0x58/0x60 > [ 39.326217] __driver_attach+0x58/0xd0 > [ 39.330499] bus_for_each_dev+0x74/0xbc > [ 39.334863] bus_add_driver+0x150/0x1dc > [ 39.339227] driver_register+0x74/0x108 > [ 39.343599] i2c_register_driver+0x38/0x8c > [ 39.348227] do_one_initcall+0x84/0x3b4 > [ 39.352598] kernel_init_freeable+0x154/0x1e4 > [ 39.357485] kernel_init+0x8/0x118 > [ 39.361415] ret_from_fork+0x14/0x20 > [ 39.365518] 0x0 > [ 39.367883] > [ 39.367883] -> #0 (dpm_list_mtx){+.+.}-{3:3}: > [ 39.373740] lock_acquire+0xe0/0x4ec > [ 39.377848] __mutex_lock+0x94/0x9d0 > [ 39.381952] mutex_lock_nested+0x1c/0x24 > [ 39.386405] dpm_for_each_dev+0x20/0x5c > [ 39.390769] fw_pm_notify+0xa4/0xd4 > [ 39.394795] notifier_call_chain+0x48/0x80 > [ 39.399420] __blocking_notifier_call_chain+0x48/0x64 > [ 39.405003] __pm_notifier_call_chain+0x20/0x3c > [ 39.410063] pm_suspend+0x1ac/0x438 > [ 39.414080] state_store+0x68/0xc8 > [ 39.418013] kernfs_fop_write+0x10c/0x22c > [ 39.419741] cpu cpu0: failed to scale vddpu up: -110 > [ 39.422551] vfs_write+0xbc/0x1d8 > [ 39.422559] ksys_write+0x60/0xe8 > [ 39.427529] cpufreq: __target_index: Failed to change cpu frequency: -110 > [ 39.431362] ret_fast_syscall+0x0/0x28 > [ 39.431368] 0xbeec8958 > [ 39.431372] > [ 39.431372] other info that might help us debug this: > [ 39.431372] > [ 39.431375] Chain exists of: > [ 39.431375] dpm_list_mtx --> kn->active#3 --> fw_lock > [ 39.431375] > [ 39.431390] Possible unsafe locking scenario: > [ 39.431390] > [ 39.431394] CPU0 CPU1 > [ 39.431398] ---- ---- > [ 39.431401] lock(fw_lock); > [ 39.431412] lock(kn->active#3); > [ 39.490528] lock(fw_lock); > [ 39.495934] lock(dpm_list_mtx); > [ 39.499255] > [ 39.499255] *** DEADLOCK *** > [ 39.499255] > [ 39.505181] 6 locks held by sh/647: > [ 39.508675] #0: ecf48a84 (sb_writers#4){.+.+}-{0:0}, at: > vfs_write+0x14c/0x1d8 > [ 39.516007] #1: ed2ced48 (&of->mutex){+.+.}-{3:3}, at: > kernfs_fop_write+0xd0/0x22c > [ 39.523684] #2: ec1ff960 (kn->active#90){.+.+}-{0:0}, at: > kernfs_fop_write+0xd8/0x22c > [ 39.531620] #3: c151c4e8 (system_transition_mutex){+.+.}-{3:3}, > at: pm_suspend+0x11c/0x438 > [ 39.539991] #4: c151e3f4 ((pm_chain_head).rwsem){++++}-{3:3}, at: > __blocking_notifier_call_chain+0x2c/0x64 > [ 39.549753] usb_otg_vbus: disabling > [ 39.549755] #5: c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4 > [ 39.553264] wm8962-supply: disabling > [ 39.560214] > [ 39.560214] stack backtrace: > [ 39.560225] CPU: 0 PID: 647 Comm: sh Not tainted > 5.9.0-rc1-00103-g7eac66d0456f #37 > [ 39.560230] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > [ 39.560260] [] (unwind_backtrace) from [] > (show_stack+0x10/0x14) > [ 39.560287] [] (show_stack) from [] > (dump_stack+0xe4/0x118) > [ 39.597365] [] (dump_stack) from [] > (check_noncircular+0x130/0x1e4) > [ 39.605386] [] (check_noncircular) from [] > (__lock_acquire+0x161c/0x31b0) > [ 39.613923] [] (__lock_acquire) from [] > (lock_acquire+0xe0/0x4ec) > [ 39.621772] [] (lock_acquire) from [] > (__mutex_lock+0x94/0x9d0) > [ 39.629445] [] (__mutex_lock) from [] > (mutex_lock_nested+0x1c/0x24) > [ 39.637463] [] (mutex_lock_nested) from [] > (dpm_for_each_dev+0x20/0x5c) > [ 39.645828] [] (dpm_for_each_dev) from [] > (fw_pm_notify+0xa4/0xd4) > [ 39.653762] [] (fw_pm_notify) from [] > (notifier_call_chain+0x48/0x80) > [ 39.661954] [] (notifier_call_chain) from [] > (__blocking_notifier_call_chain+0x48/0x64) > [ 39.671709] [] (__blocking_notifier_call_chain) from > [] (__pm_notifier_call_chain+0x20/0x3c) > [ 39.679740] cpu cpu0: failed to scale vddpu up: -110 > [ 39.681897] [] (__pm_notifier_call_chain) from > [] (pm_suspend+0x1ac/0x438) > [ 39.686859] cpufreq: __target_index: Failed to change cpu frequency: -110 > [ 39.695473] [] (pm_suspend) from [] > (state_store+0x68/0xc8) > [ 39.695490] [] (state_store) from [] > (kernfs_fop_write+0x10c/0x22c) > [ 39.695506] [] (kernfs_fop_write) from [] > (vfs_write+0xbc/0x1d8) > [ 39.695524] [] (vfs_write) from [] (ksys_write+0x60/0xe8) > [ 39.704844] VGEN1: disabling > [ 39.709627] [] (ksys_write) from [] > (ret_fast_syscall+0x0/0x28) > [ 39.743035] Exception stack(0xed573fa8 to 0xed573ff0) > [ 39.748098] 3fa0: 00000004 01d3caf8 00000001 > 01d3caf8 00000004 00000000 > [ 39.756286] 3fc0: 00000004 01d3caf8 b6f40340 00000004 b6ed6c8c > 00000000 00000000 00000000 > [ 39.764470] 3fe0: 00000004 beec8958 b6e75d4f b6e01d16 > [ 39.770874] VGEN2: disabling > [ 39.776106] VGEN3: disabling > > [ 69.590256] cfg80211: failed to load regulatory.db > [ 69.590312] Freezing user space processes ... (elapsed 0.008 seconds) done. > [ 69.606341] OOM killer disabled. > [ 69.609599] Freezing remaining freezable tasks ... (elapsed 0.001 > seconds) done. > [ 69.619021] printk: Suspending console(s) (use no_console_suspend to debug)