Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751552AbbEBHKS (ORCPT ); Sat, 2 May 2015 03:10:18 -0400 Received: from cantor2.suse.de ([195.135.220.15]:58145 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750785AbbEBHKO (ORCPT ); Sat, 2 May 2015 03:10:14 -0400 Date: Sat, 02 May 2015 09:10:11 +0200 Message-ID: From: Takashi Iwai To: Laura Abbott Cc: Marcel Holtmann , Ming Lei , Laura Abbott , "Gustavo F. Padovan" , Johan Hedberg , Greg Kroah-Hartman , "Rafael J. Wysocki" , Linux Kernel Mailing List , linux-bluetooth@vger.kernel.org Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case In-Reply-To: <5543F194.2090307@redhat.com> References: <1430266288-13755-1-git-send-email-labbott@fedoraproject.org> <47E8D3FC-0E25-4C4C-A6C3-01E6797CC836@holtmann.org> <55413EE6.4020802@redhat.com> <5543E097.7000004@redhat.com> <5543F194.2090307@redhat.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 Emacs/24.4 (x86_64-suse-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 14935 Lines: 302 At Fri, 01 May 2015 14:35:16 -0700, Laura Abbott wrote: > > On 05/01/2015 01:22 PM, Laura Abbott wrote: > > On 04/30/2015 07:43 AM, Takashi Iwai wrote: > >> At Thu, 30 Apr 2015 07:19:47 -0700, > >> Marcel Holtmann wrote: > >>> > >>> Hi Ming, > >>> > >>>>>>>> We've received a number of reports of warnings when coming > >>>>>>>> out of suspend with certain bluetooth firmware configurations: > >>>>>>>> > >>>>>>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126 > >>>>>>>> _request_firmware+0x558/0x810() > >>>>>>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 > >>>>>>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc > >>>>>>>> ebtable_filter > >>>>>>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 > >>>>>>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter > >>>>>>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 > >>>>>>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw > >>>>>>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal > >>>>>>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek > >>>>>>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 > >>>>>>>> iTCO_vendor_support > >>>>>>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul > >>>>>>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel > >>>>>>>> uvcvideo > >>>>>>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm > >>>>>>>> videobuf2_core > >>>>>>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common > >>>>>>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom > >>>>>>>> mmc_core > >>>>>>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd > >>>>>>>> mei_me mei > >>>>>>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 > >>>>>>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core > >>>>>>>> video > >>>>>>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted > >>>>>>>> 3.19.3-200.fc21.x86_64 > >>>>>>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) > >>>>>>>> 10/22/2013 > >>>>>>>> Workqueue: hci0 hci_power_on [bluetooth] > >>>>>>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215 > >>>>>>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a > >>>>>>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40 > >>>>>>>> Call Trace: > >>>>>>>> [] dump_stack+0x45/0x57 > >>>>>>>> [] warn_slowpath_common+0x8a/0xc0 > >>>>>>>> [] warn_slowpath_null+0x1a/0x20 > >>>>>>>> [] _request_firmware+0x558/0x810 > >>>>>>>> [] request_firmware+0x35/0x50 > >>>>>>>> [] btusb_setup_bcm_patchram+0x86/0x590 [btusb] > >>>>>>>> [] ? rpm_idle+0xd6/0x230 > >>>>>>>> [] hci_dev_do_open+0xe1/0xa90 [bluetooth] > >>>>>>>> [] ? ttwu_do_activate.constprop.90+0x5d/0x70 > >>>>>>>> [] hci_power_on+0x40/0x200 [bluetooth] > >>>>>>>> [] process_one_work+0x14c/0x3f0 > >>>>>>>> [] worker_thread+0x53/0x470 > >>>>>>>> [] ? rescuer_thread+0x300/0x300 > >>>>>>>> [] kthread+0xd8/0xf0 > >>>>>>>> [] ? kthread_create_on_node+0x1b0/0x1b0 > >>>>>>>> [] ret_from_fork+0x58/0x90 > >>>>>>>> [] ? kthread_create_on_node+0x1b0/0x1b0 > >>>>>>>> > >>>>>>>> This occurs after every resume. > >>>>>>>> > >>>>>>>> When resuming, the bluetooth driver needs to re-request the > >>>>>>>> firmware. This re-request is happening before usermodehelper > >>>>>>>> is fully enabled. If the firmware load succeeded previously, the > >>>>>>>> caching behavior of the firmware code typically negates the > >>>>>>>> need to call the usermodehelper code again and the request > >>>>>>>> succeeds. If the firmware was never loaded because it isn't > >>>>>>>> actually present in the file system, this results in a call > >>>>>>>> to usermodehelper and a failure warning every resume. Rather > >>>>>>>> than have a WARN clogging up the kernel messages each time, > >>>>>>>> just drop the warn. There is still a dev_err for debugging > >>>>>>>> purposes. > >>>>>>>> > >>>>>>>> Signed-off-by: Laura Abbott > >>>>>>>> --- > >>>>>>>> This might be papering over a real issue but I'm not > >>>>>>>> familiar enough with any of suspend/resume, bluetooth, > >>>>>>>> or firmware loading to identify an alternate fix. > >>>>>>>> The backtrace is from bcm patchram but the problem > >>>>>>>> isn't limited to that hardware. Intel also does a > >>>>>>>> request firmware and I was able to reproduce the > >>>>>>>> same backtrace on that driver by requesting non-existant > >>>>>>>> firmware file. > >>>>>>> > >>>>>>> so here is the thing with Bluetooth firmware. Some of them > >>>>>>> are RAM patches to fix the ROM modules. Others are full firmware > >>>>>>> that are required to be downloaded first. > >>>>>>> > >>>>>>> For ROM modules, the RAM patching procedure is optional. So we > >>>>>>> will proceed even if no firmware is available. This means that > >>>>>>> the kernel will never cache it (since it is not there in the > >>>>>>> first place) and also on every resume we have the same issues. > >>>>>>> So optional firmware is something that happens for Bluetooth USB > >>>>>>> dongles a lot. > >>>>>>> > >>>>>>> In the driver we know which firmwares are optional and which are > >>>>>>> required. So we could tell the firmware class this if this would > >>>>>>> make > >>>>>>> things better and result in clearer errors and warnings. Is that > >>>>>>> something we want here? > >>>>>>> > >>>>>> > >>>>>> The response on another reply was > >>>>>> > >>>>>> "Yes, it is a driver problem, and loading firmware from filesystem > >>>>>> isn't safe during resume, and that is the purpose of the warning." > >>>>>> > >>>>>> It isn't clear if this means request_firmware shouldn't be called > >>>>>> on resume at all or if request_firmware shouldn't be called unless > >>>>>> we can guarantee it won't make a call into the file system. I'd > >>>> > >>>> If the firmware is cached before resume, it is ok to call > >>>> request_firmware() > >>>> during resume. Otherwise it will call filesystem and disks, which may > >>>> be a problem because the disk may not be ready for completing the > >>>> request during resume. > >>>> > >>>>>> be okay with adding another api (request_optional_firmware?) to > >>>>>> represent this if the firmware maintainers aren't against the > >>>>>> concept. If the firmware maintainers are against the concept, > >>>>>> it seems like the only solution is to rework the bluetooth drivers > >>>>>> to not request anything on resume. > >>>> > >>>> So do you just want to work around the warning by introducing a new > >>>> API? > >>>> > >>>>> > >>>>> I think request_optional_firmware concept sounds like an useful > >>>>> addition. > >>>>> > >>>>> However the problem here is that the driver does not know that it > >>>>> is called from resume path. It is easy to say that this is a driver > >>>>> problem, but the driver does not know it. > >>>> > >>>> From USB stack view, one usb driver should know it is in the resume > >>>> path > >>>> because the root entry is the .resume() callback of the USB BT driver. > >>> > >>> have you actually read drivers/bluetooth/btusb.c before making these > >>> statements. I explained how and when request_firmware is actually > >>> called. > >>> > >>>>> If the hci_register_dev is called which in fact triggers > >>>>> hdev->setup to deal with vendor specific firmware path, then it > >>>>> means the driver just went through its probe() phase again. How > >>>>> would the driver differentiate this from any hot plug event. So to > >>>>> say this is a driver problem is just plain stupid. The driver does > >>>>> not know we are ending up in a reset_resume use case or when > >>>>> ACPI/BIOS decides to emulate an USB disconnect. > >>> > >>> The only time request_firmware is called is from hdev->setup(). And > >>> that can only be triggered by hci_register_dev(). Which is only > >>> called from probe() callback of the driver. This has nothing to do > >>> with the resume() callback. > >> > >> FYI, I once fixed a very similar problem in commit > >> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586 > >> PM / sleep: Fix request_firmware() error at resume > >> > >> But the bug seems still alive. The stack trace looks pretty similar > >> as the original bug report. Hmm. > >> > >> My fix above addressed the race between UMH lock changes in > >> thaw_process(). If the UMH lock still fails, it means that the > >> task was triggered even before the UMH protection in thaw_process(). > >> That sounds weird, and might be a more serious problem. > >> > >> Added Rafael to Cc, in case I overlooked something obvious. > >> > > > > Thanks for pointing that out. It does look like a similar if not > > the same problem. In this case though we still haven't closed > > the race: > > > > [ 35.725301] PM: resume of devices complete after 2370.964 msecs > > [ 35.725363] Bluetooth: >>>> Begin testing line 2754 btusb.c > > > > [ 35.725875] ------------[ cut here ]------------ > > [ 35.725879] WARNING: CPU: 7 PID: 2542 at > > drivers/base/firmware_class.c:1118 _request_firmware+0x557/0x810() > > [ 35.725896] Modules linked in: ccm fuse cmac xt_CHECKSUM > > ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns > > nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 > > xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter > > ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 > > ip6table_mangle ip6table_security ip6table_raw ip6table_filter > > ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 > > nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep > > arc4 iwlmvm snd_hda_codec_realtek snd_hda_codec_generic > > snd_hda_codec_hdmi mac80211 snd_hda_intel snd_hda_controller intel_rapl > > iosf_mbi snd_hda_codec x86_pkg_temp_thermal snd_hda_core iwlwifi > > coretemp kvm cfg80211 btusb btbcm btintel bluetooth snd_hwdep snd_seq > > uvcvideo snd_seq_device > > [ 35.725909] snd_pcm videobuf2_vmalloc videobuf2_core > > videobuf2_memops iTCO_wdt v4l2_common iTCO_vendor_support thinkpad_acpi > > videodev mei_me snd_timer rtsx_pci_ms ie31200_edac tpm_tis pcspkr > > serio_raw joydev memstick media edac_core snd tpm mei rfkill shpchp > > lpc_ich i2c_i801 soundcore wmi nfsd auth_rpcgss nfs_acl lockd grace > > sunrpc binfmt_misc dm_crypt i915 i2c_algo_bit drm_kms_helper drm > > rtsx_pci_sdmmc mmc_core e1000e crct10dif_pclmul crc32_pclmul > > crc32c_intel rtsx_pci ptp ghash_clmulni_intel pps_core mfd_core video > > [ 35.725911] CPU: 7 PID: 2542 Comm: kworker/u17:0 Tainted: G W > > 4.0.0-bt_hackery+ #36 > > [ 35.725911] Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS > > GMET62WW (2.10 ) 03/19/2014 > > [ 35.725932] Workqueue: hci0 hci_power_on [bluetooth] > > [ 35.725933] 0000000000000000 00000000923b240b ffff8803fc9d3b48 > > ffffffff8178a76a > > [ 35.725935] 0000000000000000 0000000000000000 ffff8803fc9d3b88 > > ffffffff8109f5fa > > [ 35.725936] 0000000000000000 ffffffffa05480a6 00000000fffffff5 > > ffff88040934d8a0 > > [ 35.725936] Call Trace: > > [ 35.725939] [] dump_stack+0x45/0x57 > > [ 35.725940] [] warn_slowpath_common+0x8a/0xc0 > > [ 35.725943] [] warn_slowpath_null+0x1a/0x20 > > [ 35.725944] [] _request_firmware+0x557/0x810 > > [ 35.725946] [] ? rpm_idle+0x23/0x230 > > [ 35.725948] [] request_firmware+0x35/0x50 > > [ 35.725951] [] btusb_setup_intel+0x68/0x880 [btusb] > > [ 35.725952] [] ? rpm_idle+0xd6/0x230 > > [ 35.725954] [] ? __pm_runtime_idle+0x6a/0x80 > > [ 35.725960] [] hci_dev_do_open+0xe1/0x580 [bluetooth] > > [ 35.725962] [] ? > > ttwu_do_activate.constprop.95+0x5d/0x70 > > [ 35.725967] [] hci_power_on+0x40/0x200 [bluetooth] > > [ 35.725969] [] process_one_work+0x14b/0x400 > > [ 35.725971] [] worker_thread+0x53/0x490 > > [ 35.725972] [] ? rescuer_thread+0x300/0x300 > > [ 35.725974] [] kthread+0xd8/0xf0 > > [ 35.725976] [] ? kthread_create_on_node+0x1b0/0x1b0 > > [ 35.725977] [] ret_from_fork+0x42/0x70 > > [ 35.725979] [] ? kthread_create_on_node+0x1b0/0x1b0 > > [ 35.725980] ---[ end trace 27cd183538666cbc ]--- > > [ 35.725981] bluetooth hci0: firmware: THISISASTEST will not be loaded > > [ 35.726100] PM: Finishing wakeup. > > [ 35.726101] Restarting tasks ... done. > > > > From this, it looks like usermodehelper still isn't set until after > > we've already tried to load the firmware. Even weirder, it looks like > > a thread is running before we've thawed the processes? Am I > > misunderstanding something or is the dmesg lying to me? > > > > Okay I took another look and I noticed hci_register_dev calls > alloc_workqueue and then queue work to call the power on function. > Is it possible that the alloc_workqueue/queue_work is getting run > before thawing is complete? I don't have anymore time today to dig > into that. Thinking of this again, I wonder why request_firmware() (no _nowait version) doesn't wait. Maybe just always waiting the lock would help? A totally untested patch below. Takashi --- diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 171841ad1008..de397404a80e 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -1105,21 +1105,12 @@ _request_firmware(const struct firmware **firmware_p, const char *name, ret = 0; timeout = firmware_loading_timeout(); - if (opt_flags & FW_OPT_NOWAIT) { - timeout = usermodehelper_read_lock_wait(timeout); - if (!timeout) { - dev_dbg(device, "firmware: %s loading timed out\n", - name); - ret = -EBUSY; - goto out; - } - } else { - ret = usermodehelper_read_trylock(); - if (WARN_ON(ret)) { - dev_err(device, "firmware: %s will not be loaded\n", - name); - goto out; - } + timeout = usermodehelper_read_lock_wait(timeout); + if (!timeout) { + dev_dbg(device, "firmware: %s loading timed out\n", + name); + ret = -EBUSY; + goto out; } ret = fw_get_filesystem_firmware(device, fw->priv); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/