Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751304AbbEAVfX (ORCPT ); Fri, 1 May 2015 17:35:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58394 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750717AbbEAVfT (ORCPT ); Fri, 1 May 2015 17:35:19 -0400 Message-ID: <5543F194.2090307@redhat.com> Date: Fri, 01 May 2015 14:35:16 -0700 From: Laura Abbott User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Takashi Iwai , Marcel Holtmann CC: Ming Lei , Laura Abbott , "Gustavo F. Padovan" , Johan Hedberg , Greg Kroah-Hartman , "Rafael J. Wysocki" , Linux Kernel Mailing List , linux-bluetooth@vger.kernel.org Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case References: <1430266288-13755-1-git-send-email-labbott@fedoraproject.org> <47E8D3FC-0E25-4C4C-A6C3-01E6797CC836@holtmann.org> <55413EE6.4020802@redhat.com> <5543E097.7000004@redhat.com> In-Reply-To: <5543E097.7000004@redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 13262 Lines: 261 On 05/01/2015 01:22 PM, Laura Abbott wrote: > On 04/30/2015 07:43 AM, Takashi Iwai wrote: >> At Thu, 30 Apr 2015 07:19:47 -0700, >> Marcel Holtmann wrote: >>> >>> Hi Ming, >>> >>>>>>>> We've received a number of reports of warnings when coming >>>>>>>> out of suspend with certain bluetooth firmware configurations: >>>>>>>> >>>>>>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126 >>>>>>>> _request_firmware+0x558/0x810() >>>>>>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 >>>>>>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc >>>>>>>> ebtable_filter >>>>>>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 >>>>>>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter >>>>>>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 >>>>>>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw >>>>>>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal >>>>>>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek >>>>>>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 >>>>>>>> iTCO_vendor_support >>>>>>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul >>>>>>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel >>>>>>>> uvcvideo >>>>>>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm >>>>>>>> videobuf2_core >>>>>>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common >>>>>>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom >>>>>>>> mmc_core >>>>>>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd >>>>>>>> mei_me mei >>>>>>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 >>>>>>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core >>>>>>>> video >>>>>>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted >>>>>>>> 3.19.3-200.fc21.x86_64 >>>>>>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) >>>>>>>> 10/22/2013 >>>>>>>> Workqueue: hci0 hci_power_on [bluetooth] >>>>>>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215 >>>>>>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a >>>>>>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40 >>>>>>>> Call Trace: >>>>>>>> [] dump_stack+0x45/0x57 >>>>>>>> [] warn_slowpath_common+0x8a/0xc0 >>>>>>>> [] warn_slowpath_null+0x1a/0x20 >>>>>>>> [] _request_firmware+0x558/0x810 >>>>>>>> [] request_firmware+0x35/0x50 >>>>>>>> [] btusb_setup_bcm_patchram+0x86/0x590 [btusb] >>>>>>>> [] ? rpm_idle+0xd6/0x230 >>>>>>>> [] hci_dev_do_open+0xe1/0xa90 [bluetooth] >>>>>>>> [] ? ttwu_do_activate.constprop.90+0x5d/0x70 >>>>>>>> [] hci_power_on+0x40/0x200 [bluetooth] >>>>>>>> [] process_one_work+0x14c/0x3f0 >>>>>>>> [] worker_thread+0x53/0x470 >>>>>>>> [] ? rescuer_thread+0x300/0x300 >>>>>>>> [] kthread+0xd8/0xf0 >>>>>>>> [] ? kthread_create_on_node+0x1b0/0x1b0 >>>>>>>> [] ret_from_fork+0x58/0x90 >>>>>>>> [] ? kthread_create_on_node+0x1b0/0x1b0 >>>>>>>> >>>>>>>> This occurs after every resume. >>>>>>>> >>>>>>>> When resuming, the bluetooth driver needs to re-request the >>>>>>>> firmware. This re-request is happening before usermodehelper >>>>>>>> is fully enabled. If the firmware load succeeded previously, the >>>>>>>> caching behavior of the firmware code typically negates the >>>>>>>> need to call the usermodehelper code again and the request >>>>>>>> succeeds. If the firmware was never loaded because it isn't >>>>>>>> actually present in the file system, this results in a call >>>>>>>> to usermodehelper and a failure warning every resume. Rather >>>>>>>> than have a WARN clogging up the kernel messages each time, >>>>>>>> just drop the warn. There is still a dev_err for debugging >>>>>>>> purposes. >>>>>>>> >>>>>>>> Signed-off-by: Laura Abbott >>>>>>>> --- >>>>>>>> This might be papering over a real issue but I'm not >>>>>>>> familiar enough with any of suspend/resume, bluetooth, >>>>>>>> or firmware loading to identify an alternate fix. >>>>>>>> The backtrace is from bcm patchram but the problem >>>>>>>> isn't limited to that hardware. Intel also does a >>>>>>>> request firmware and I was able to reproduce the >>>>>>>> same backtrace on that driver by requesting non-existant >>>>>>>> firmware file. >>>>>>> >>>>>>> so here is the thing with Bluetooth firmware. Some of them >>>>>>> are RAM patches to fix the ROM modules. Others are full firmware >>>>>>> that are required to be downloaded first. >>>>>>> >>>>>>> For ROM modules, the RAM patching procedure is optional. So we >>>>>>> will proceed even if no firmware is available. This means that >>>>>>> the kernel will never cache it (since it is not there in the >>>>>>> first place) and also on every resume we have the same issues. >>>>>>> So optional firmware is something that happens for Bluetooth USB >>>>>>> dongles a lot. >>>>>>> >>>>>>> In the driver we know which firmwares are optional and which are >>>>>>> required. So we could tell the firmware class this if this would >>>>>>> make >>>>>>> things better and result in clearer errors and warnings. Is that >>>>>>> something we want here? >>>>>>> >>>>>> >>>>>> The response on another reply was >>>>>> >>>>>> "Yes, it is a driver problem, and loading firmware from filesystem >>>>>> isn't safe during resume, and that is the purpose of the warning." >>>>>> >>>>>> It isn't clear if this means request_firmware shouldn't be called >>>>>> on resume at all or if request_firmware shouldn't be called unless >>>>>> we can guarantee it won't make a call into the file system. I'd >>>> >>>> If the firmware is cached before resume, it is ok to call >>>> request_firmware() >>>> during resume. Otherwise it will call filesystem and disks, which may >>>> be a problem because the disk may not be ready for completing the >>>> request during resume. >>>> >>>>>> be okay with adding another api (request_optional_firmware?) to >>>>>> represent this if the firmware maintainers aren't against the >>>>>> concept. If the firmware maintainers are against the concept, >>>>>> it seems like the only solution is to rework the bluetooth drivers >>>>>> to not request anything on resume. >>>> >>>> So do you just want to work around the warning by introducing a new >>>> API? >>>> >>>>> >>>>> I think request_optional_firmware concept sounds like an useful >>>>> addition. >>>>> >>>>> However the problem here is that the driver does not know that it >>>>> is called from resume path. It is easy to say that this is a driver >>>>> problem, but the driver does not know it. >>>> >>>> From USB stack view, one usb driver should know it is in the resume >>>> path >>>> because the root entry is the .resume() callback of the USB BT driver. >>> >>> have you actually read drivers/bluetooth/btusb.c before making these >>> statements. I explained how and when request_firmware is actually >>> called. >>> >>>>> If the hci_register_dev is called which in fact triggers >>>>> hdev->setup to deal with vendor specific firmware path, then it >>>>> means the driver just went through its probe() phase again. How >>>>> would the driver differentiate this from any hot plug event. So to >>>>> say this is a driver problem is just plain stupid. The driver does >>>>> not know we are ending up in a reset_resume use case or when >>>>> ACPI/BIOS decides to emulate an USB disconnect. >>> >>> The only time request_firmware is called is from hdev->setup(). And >>> that can only be triggered by hci_register_dev(). Which is only >>> called from probe() callback of the driver. This has nothing to do >>> with the resume() callback. >> >> FYI, I once fixed a very similar problem in commit >> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586 >> PM / sleep: Fix request_firmware() error at resume >> >> But the bug seems still alive. The stack trace looks pretty similar >> as the original bug report. Hmm. >> >> My fix above addressed the race between UMH lock changes in >> thaw_process(). If the UMH lock still fails, it means that the >> task was triggered even before the UMH protection in thaw_process(). >> That sounds weird, and might be a more serious problem. >> >> Added Rafael to Cc, in case I overlooked something obvious. >> > > Thanks for pointing that out. It does look like a similar if not > the same problem. In this case though we still haven't closed > the race: > > [ 35.725301] PM: resume of devices complete after 2370.964 msecs > [ 35.725363] Bluetooth: >>>> Begin testing line 2754 btusb.c > > [ 35.725875] ------------[ cut here ]------------ > [ 35.725879] WARNING: CPU: 7 PID: 2542 at > drivers/base/firmware_class.c:1118 _request_firmware+0x557/0x810() > [ 35.725896] Modules linked in: ccm fuse cmac xt_CHECKSUM > ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns > nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 > xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter > ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 > ip6table_mangle ip6table_security ip6table_raw ip6table_filter > ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 > nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep > arc4 iwlmvm snd_hda_codec_realtek snd_hda_codec_generic > snd_hda_codec_hdmi mac80211 snd_hda_intel snd_hda_controller intel_rapl > iosf_mbi snd_hda_codec x86_pkg_temp_thermal snd_hda_core iwlwifi > coretemp kvm cfg80211 btusb btbcm btintel bluetooth snd_hwdep snd_seq > uvcvideo snd_seq_device > [ 35.725909] snd_pcm videobuf2_vmalloc videobuf2_core > videobuf2_memops iTCO_wdt v4l2_common iTCO_vendor_support thinkpad_acpi > videodev mei_me snd_timer rtsx_pci_ms ie31200_edac tpm_tis pcspkr > serio_raw joydev memstick media edac_core snd tpm mei rfkill shpchp > lpc_ich i2c_i801 soundcore wmi nfsd auth_rpcgss nfs_acl lockd grace > sunrpc binfmt_misc dm_crypt i915 i2c_algo_bit drm_kms_helper drm > rtsx_pci_sdmmc mmc_core e1000e crct10dif_pclmul crc32_pclmul > crc32c_intel rtsx_pci ptp ghash_clmulni_intel pps_core mfd_core video > [ 35.725911] CPU: 7 PID: 2542 Comm: kworker/u17:0 Tainted: G W > 4.0.0-bt_hackery+ #36 > [ 35.725911] Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS > GMET62WW (2.10 ) 03/19/2014 > [ 35.725932] Workqueue: hci0 hci_power_on [bluetooth] > [ 35.725933] 0000000000000000 00000000923b240b ffff8803fc9d3b48 > ffffffff8178a76a > [ 35.725935] 0000000000000000 0000000000000000 ffff8803fc9d3b88 > ffffffff8109f5fa > [ 35.725936] 0000000000000000 ffffffffa05480a6 00000000fffffff5 > ffff88040934d8a0 > [ 35.725936] Call Trace: > [ 35.725939] [] dump_stack+0x45/0x57 > [ 35.725940] [] warn_slowpath_common+0x8a/0xc0 > [ 35.725943] [] warn_slowpath_null+0x1a/0x20 > [ 35.725944] [] _request_firmware+0x557/0x810 > [ 35.725946] [] ? rpm_idle+0x23/0x230 > [ 35.725948] [] request_firmware+0x35/0x50 > [ 35.725951] [] btusb_setup_intel+0x68/0x880 [btusb] > [ 35.725952] [] ? rpm_idle+0xd6/0x230 > [ 35.725954] [] ? __pm_runtime_idle+0x6a/0x80 > [ 35.725960] [] hci_dev_do_open+0xe1/0x580 [bluetooth] > [ 35.725962] [] ? > ttwu_do_activate.constprop.95+0x5d/0x70 > [ 35.725967] [] hci_power_on+0x40/0x200 [bluetooth] > [ 35.725969] [] process_one_work+0x14b/0x400 > [ 35.725971] [] worker_thread+0x53/0x490 > [ 35.725972] [] ? rescuer_thread+0x300/0x300 > [ 35.725974] [] kthread+0xd8/0xf0 > [ 35.725976] [] ? kthread_create_on_node+0x1b0/0x1b0 > [ 35.725977] [] ret_from_fork+0x42/0x70 > [ 35.725979] [] ? kthread_create_on_node+0x1b0/0x1b0 > [ 35.725980] ---[ end trace 27cd183538666cbc ]--- > [ 35.725981] bluetooth hci0: firmware: THISISASTEST will not be loaded > [ 35.726100] PM: Finishing wakeup. > [ 35.726101] Restarting tasks ... done. > > From this, it looks like usermodehelper still isn't set until after > we've already tried to load the firmware. Even weirder, it looks like > a thread is running before we've thawed the processes? Am I > misunderstanding something or is the dmesg lying to me? > Okay I took another look and I noticed hci_register_dev calls alloc_workqueue and then queue work to call the power on function. Is it possible that the alloc_workqueue/queue_work is getting run before thawing is complete? I don't have anymore time today to dig into that. Thanks, Laura -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/