2015-04-29 00:11:50

by Laura Abbott

[permalink] [raw]
Subject: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

We've received a number of reports of warnings when coming
out of suspend with certain bluetooth firmware configurations:

WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
_request_firmware+0x558/0x810()
Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
video
CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
Workqueue: hci0 hci_power_on [bluetooth]
0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
Call Trace:
[<ffffffff8176e215>] dump_stack+0x45/0x57
[<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
[<ffffffff814dbe78>] _request_firmware+0x558/0x810
[<ffffffff814dc165>] request_firmware+0x35/0x50
[<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
[<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
[<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
[<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
[<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
[<ffffffff810b487c>] process_one_work+0x14c/0x3f0
[<ffffffff810b52f3>] worker_thread+0x53/0x470
[<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
[<ffffffff810ba548>] kthread+0xd8/0xf0
[<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff81774958>] ret_from_fork+0x58/0x90
[<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0

This occurs after every resume.

When resuming, the bluetooth driver needs to re-request the
firmware. This re-request is happening before usermodehelper
is fully enabled. If the firmware load succeeded previously, the
caching behavior of the firmware code typically negates the
need to call the usermodehelper code again and the request
succeeds. If the firmware was never loaded because it isn't
actually present in the file system, this results in a call
to usermodehelper and a failure warning every resume. Rather
than have a WARN clogging up the kernel messages each time,
just drop the warn. There is still a dev_err for debugging
purposes.

Signed-off-by: Laura Abbott <[email protected]>
---
This might be papering over a real issue but I'm not
familiar enough with any of suspend/resume, bluetooth,
or firmware loading to identify an alternate fix.
The backtrace is from bcm patchram but the problem
isn't limited to that hardware. Intel also does a
request firmware and I was able to reproduce the
same backtrace on that driver by requesting non-existant
firmware file.
---
drivers/base/firmware_class.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 171841a..48ce9ac 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -1115,7 +1115,7 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
}
} else {
ret = usermodehelper_read_trylock();
- if (WARN_ON(ret)) {
+ if (ret) {
dev_err(device, "firmware: %s will not be loaded\n",
name);
goto out;
--
2.1.0


2015-04-29 01:38:06

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

Hi Laura,

> We've received a number of reports of warnings when coming
> out of suspend with certain bluetooth firmware configurations:
>
> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
> _request_firmware+0x558/0x810()
> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
> video
> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
> Workqueue: hci0 hci_power_on [bluetooth]
> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
> Call Trace:
> [<ffffffff8176e215>] dump_stack+0x45/0x57
> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
> [<ffffffff814dc165>] request_firmware+0x35/0x50
> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
> [<ffffffff810b52f3>] worker_thread+0x53/0x470
> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
> [<ffffffff810ba548>] kthread+0xd8/0xf0
> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> [<ffffffff81774958>] ret_from_fork+0x58/0x90
> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>
> This occurs after every resume.
>
> When resuming, the bluetooth driver needs to re-request the
> firmware. This re-request is happening before usermodehelper
> is fully enabled. If the firmware load succeeded previously, the
> caching behavior of the firmware code typically negates the
> need to call the usermodehelper code again and the request
> succeeds. If the firmware was never loaded because it isn't
> actually present in the file system, this results in a call
> to usermodehelper and a failure warning every resume. Rather
> than have a WARN clogging up the kernel messages each time,
> just drop the warn. There is still a dev_err for debugging
> purposes.
>
> Signed-off-by: Laura Abbott <[email protected]>
> ---
> This might be papering over a real issue but I'm not
> familiar enough with any of suspend/resume, bluetooth,
> or firmware loading to identify an alternate fix.
> The backtrace is from bcm patchram but the problem
> isn't limited to that hardware. Intel also does a
> request firmware and I was able to reproduce the
> same backtrace on that driver by requesting non-existant
> firmware file.

so here is the thing with Bluetooth firmware. Some of them are RAM patches to fix the ROM modules. Others are full firmware that are required to be downloaded first.

For ROM modules, the RAM patching procedure is optional. So we will proceed even if no firmware is available. This means that the kernel will never cache it (since it is not there in the first place) and also on every resume we have the same issues. So optional firmware is something that happens for Bluetooth USB dongles a lot.

In the driver we know which firmwares are optional and which are required. So we could tell the firmware class this if this would make things better and result in clearer errors and warnings. Is that something we want here?

Regards

Marcel

2015-04-29 04:23:22

by Ming Lei

[permalink] [raw]
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

On Wed, Apr 29, 2015 at 8:11 AM, Laura Abbott <[email protected]> wrote:
> We've received a number of reports of warnings when coming
> out of suspend with certain bluetooth firmware configurations:
>
> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
> _request_firmware+0x558/0x810()
> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
> video
> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
> Workqueue: hci0 hci_power_on [bluetooth]
> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
> Call Trace:
> [<ffffffff8176e215>] dump_stack+0x45/0x57
> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
> [<ffffffff814dc165>] request_firmware+0x35/0x50
> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
> [<ffffffff810b52f3>] worker_thread+0x53/0x470
> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
> [<ffffffff810ba548>] kthread+0xd8/0xf0
> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> [<ffffffff81774958>] ret_from_fork+0x58/0x90
> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>
> This occurs after every resume.
>
> When resuming, the bluetooth driver needs to re-request the
> firmware. This re-request is happening before usermodehelper
> is fully enabled. If the firmware load succeeded previously, the
> caching behavior of the firmware code typically negates the
> need to call the usermodehelper code again and the request
> succeeds. If the firmware was never loaded because it isn't
> actually present in the file system, this results in a call
> to usermodehelper and a failure warning every resume. Rather

Yes, it is a driver problem, and loading firmware from filesystem
isn't safe during resume, and that is the purpose of the warning.

> than have a WARN clogging up the kernel messages each time,
> just drop the warn. There is still a dev_err for debugging
> purposes.
>
> Signed-off-by: Laura Abbott <[email protected]>
> ---
> This might be papering over a real issue but I'm not
> familiar enough with any of suspend/resume, bluetooth,
> or firmware loading to identify an alternate fix.
> The backtrace is from bcm patchram but the problem
> isn't limited to that hardware. Intel also does a
> request firmware and I was able to reproduce the
> same backtrace on that driver by requesting non-existant
> firmware file.
> ---
> drivers/base/firmware_class.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index 171841a..48ce9ac 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -1115,7 +1115,7 @@ _request_firmware(const struct firmware **firmware_p, const char *name,
> }
> } else {
> ret = usermodehelper_read_trylock();
> - if (WARN_ON(ret)) {
> + if (ret) {
> dev_err(device, "firmware: %s will not be loaded\n",
> name);
> goto out;
> --
> 2.1.0
>

2015-04-29 20:28:38

by Laura Abbott

[permalink] [raw]
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

On 04/28/2015 06:37 PM, Marcel Holtmann wrote:
> Hi Laura,
>
>> We've received a number of reports of warnings when coming
>> out of suspend with certain bluetooth firmware configurations:
>>
>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
>> _request_firmware+0x558/0x810()
>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
>> video
>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
>> Workqueue: hci0 hci_power_on [bluetooth]
>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
>> Call Trace:
>> [<ffffffff8176e215>] dump_stack+0x45/0x57
>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
>> [<ffffffff814dc165>] request_firmware+0x35/0x50
>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
>> [<ffffffff810ba548>] kthread+0xd8/0xf0
>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>
>> This occurs after every resume.
>>
>> When resuming, the bluetooth driver needs to re-request the
>> firmware. This re-request is happening before usermodehelper
>> is fully enabled. If the firmware load succeeded previously, the
>> caching behavior of the firmware code typically negates the
>> need to call the usermodehelper code again and the request
>> succeeds. If the firmware was never loaded because it isn't
>> actually present in the file system, this results in a call
>> to usermodehelper and a failure warning every resume. Rather
>> than have a WARN clogging up the kernel messages each time,
>> just drop the warn. There is still a dev_err for debugging
>> purposes.
>>
>> Signed-off-by: Laura Abbott <[email protected]>
>> ---
>> This might be papering over a real issue but I'm not
>> familiar enough with any of suspend/resume, bluetooth,
>> or firmware loading to identify an alternate fix.
>> The backtrace is from bcm patchram but the problem
>> isn't limited to that hardware. Intel also does a
>> request firmware and I was able to reproduce the
>> same backtrace on that driver by requesting non-existant
>> firmware file.
>
> so here is the thing with Bluetooth firmware. Some of them
> are RAM patches to fix the ROM modules. Others are full firmware
> that are required to be downloaded first.
>
> For ROM modules, the RAM patching procedure is optional. So we
> will proceed even if no firmware is available. This means that
> the kernel will never cache it (since it is not there in the
> first place) and also on every resume we have the same issues.
> So optional firmware is something that happens for Bluetooth USB
> dongles a lot.
>
> In the driver we know which firmwares are optional and which are
> required. So we could tell the firmware class this if this would make
> things better and result in clearer errors and warnings. Is that
> something we want here?
>

The response on another reply was

"Yes, it is a driver problem, and loading firmware from filesystem
isn't safe during resume, and that is the purpose of the warning."

It isn't clear if this means request_firmware shouldn't be called
on resume at all or if request_firmware shouldn't be called unless
we can guarantee it won't make a call into the file system. I'd
be okay with adding another api (request_optional_firmware?) to
represent this if the firmware maintainers aren't against the
concept. If the firmware maintainers are against the concept,
it seems like the only solution is to rework the bluetooth drivers
to not request anything on resume.


> Regards
>
> Marcel
>

Thanks,
Laura

2015-04-29 23:04:11

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

Hi Laura,

>>> We've received a number of reports of warnings when coming
>>> out of suspend with certain bluetooth firmware configurations:
>>>
>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
>>> _request_firmware+0x558/0x810()
>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
>>> video
>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
>>> Workqueue: hci0 hci_power_on [bluetooth]
>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
>>> Call Trace:
>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>
>>> This occurs after every resume.
>>>
>>> When resuming, the bluetooth driver needs to re-request the
>>> firmware. This re-request is happening before usermodehelper
>>> is fully enabled. If the firmware load succeeded previously, the
>>> caching behavior of the firmware code typically negates the
>>> need to call the usermodehelper code again and the request
>>> succeeds. If the firmware was never loaded because it isn't
>>> actually present in the file system, this results in a call
>>> to usermodehelper and a failure warning every resume. Rather
>>> than have a WARN clogging up the kernel messages each time,
>>> just drop the warn. There is still a dev_err for debugging
>>> purposes.
>>>
>>> Signed-off-by: Laura Abbott <[email protected]>
>>> ---
>>> This might be papering over a real issue but I'm not
>>> familiar enough with any of suspend/resume, bluetooth,
>>> or firmware loading to identify an alternate fix.
>>> The backtrace is from bcm patchram but the problem
>>> isn't limited to that hardware. Intel also does a
>>> request firmware and I was able to reproduce the
>>> same backtrace on that driver by requesting non-existant
>>> firmware file.
>>
>> so here is the thing with Bluetooth firmware. Some of them
>> are RAM patches to fix the ROM modules. Others are full firmware
>> that are required to be downloaded first.
>>
>> For ROM modules, the RAM patching procedure is optional. So we
>> will proceed even if no firmware is available. This means that
>> the kernel will never cache it (since it is not there in the
>> first place) and also on every resume we have the same issues.
>> So optional firmware is something that happens for Bluetooth USB
> > dongles a lot.
>>
>> In the driver we know which firmwares are optional and which are
> > required. So we could tell the firmware class this if this would make
> > things better and result in clearer errors and warnings. Is that
> > something we want here?
>>
>
> The response on another reply was
>
> "Yes, it is a driver problem, and loading firmware from filesystem
> isn't safe during resume, and that is the purpose of the warning."
>
> It isn't clear if this means request_firmware shouldn't be called
> on resume at all or if request_firmware shouldn't be called unless
> we can guarantee it won't make a call into the file system. I'd
> be okay with adding another api (request_optional_firmware?) to
> represent this if the firmware maintainers aren't against the
> concept. If the firmware maintainers are against the concept,
> it seems like the only solution is to rework the bluetooth drivers
> to not request anything on resume.

I think request_optional_firmware concept sounds like an useful addition.

However the problem here is that the driver does not know that it is called from resume path. It is easy to say that this is a driver problem, but the driver does not know it.

If the hci_register_dev is called which in fact triggers hdev->setup to deal with vendor specific firmware path, then it means the driver just went through its probe() phase again. How would the driver differentiate this from any hot plug event. So to say this is a driver problem is just plain stupid. The driver does not know we are ending up in a reset_resume use case or when ACPI/BIOS decides to emulate an USB disconnect.

Regards

Marcel

2015-04-30 10:02:53

by Ming Lei

[permalink] [raw]
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

On Thu, Apr 30, 2015 at 7:04 AM, Marcel Holtmann <[email protected]> wrote:
> Hi Laura,
>
>>>> We've received a number of reports of warnings when coming
>>>> out of suspend with certain bluetooth firmware configurations:
>>>>
>>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
>>>> _request_firmware+0x558/0x810()
>>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
>>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
>>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
>>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
>>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
>>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
>>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
>>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
>>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
>>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
>>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
>>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
>>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
>>>> video
>>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
>>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
>>>> Workqueue: hci0 hci_power_on [bluetooth]
>>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
>>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
>>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
>>>> Call Trace:
>>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
>>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
>>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
>>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
>>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
>>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
>>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
>>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
>>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
>>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
>>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
>>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
>>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
>>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>>
>>>> This occurs after every resume.
>>>>
>>>> When resuming, the bluetooth driver needs to re-request the
>>>> firmware. This re-request is happening before usermodehelper
>>>> is fully enabled. If the firmware load succeeded previously, the
>>>> caching behavior of the firmware code typically negates the
>>>> need to call the usermodehelper code again and the request
>>>> succeeds. If the firmware was never loaded because it isn't
>>>> actually present in the file system, this results in a call
>>>> to usermodehelper and a failure warning every resume. Rather
>>>> than have a WARN clogging up the kernel messages each time,
>>>> just drop the warn. There is still a dev_err for debugging
>>>> purposes.
>>>>
>>>> Signed-off-by: Laura Abbott <[email protected]>
>>>> ---
>>>> This might be papering over a real issue but I'm not
>>>> familiar enough with any of suspend/resume, bluetooth,
>>>> or firmware loading to identify an alternate fix.
>>>> The backtrace is from bcm patchram but the problem
>>>> isn't limited to that hardware. Intel also does a
>>>> request firmware and I was able to reproduce the
>>>> same backtrace on that driver by requesting non-existant
>>>> firmware file.
>>>
>>> so here is the thing with Bluetooth firmware. Some of them
>>> are RAM patches to fix the ROM modules. Others are full firmware
>>> that are required to be downloaded first.
>>>
>>> For ROM modules, the RAM patching procedure is optional. So we
>>> will proceed even if no firmware is available. This means that
>>> the kernel will never cache it (since it is not there in the
>>> first place) and also on every resume we have the same issues.
>>> So optional firmware is something that happens for Bluetooth USB
>> > dongles a lot.
>>>
>>> In the driver we know which firmwares are optional and which are
>> > required. So we could tell the firmware class this if this would make
>> > things better and result in clearer errors and warnings. Is that
>> > something we want here?
>>>
>>
>> The response on another reply was
>>
>> "Yes, it is a driver problem, and loading firmware from filesystem
>> isn't safe during resume, and that is the purpose of the warning."
>>
>> It isn't clear if this means request_firmware shouldn't be called
>> on resume at all or if request_firmware shouldn't be called unless
>> we can guarantee it won't make a call into the file system. I'd

If the firmware is cached before resume, it is ok to call request_firmware()
during resume. Otherwise it will call filesystem and disks, which may
be a problem because the disk may not be ready for completing the
request during resume.

>> be okay with adding another api (request_optional_firmware?) to
>> represent this if the firmware maintainers aren't against the
>> concept. If the firmware maintainers are against the concept,
>> it seems like the only solution is to rework the bluetooth drivers
>> to not request anything on resume.

So do you just want to work around the warning by introducing a new
API?

>
> I think request_optional_firmware concept sounds like an useful addition.
>
> However the problem here is that the driver does not know that it is called from resume path. It is easy to say that this is a driver problem, but the driver does not know it.

>From USB stack view, one usb driver should know it is in the resume path
because the root entry is the .resume() callback of the USB BT driver.

>
> If the hci_register_dev is called which in fact triggers hdev->setup to deal with vendor specific firmware path, then it means the driver just went through its probe() phase again. How would the driver differentiate this from any hot plug event. So to say this is a driver problem is just plain stupid. The driver does not know we are ending up in a reset_resume use case or when ACPI/BIOS decides to emulate an USB disconnect.
>
> Regards
>
> Marcel
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-04-30 14:19:57

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

Hi Ming,

>>>>> We've received a number of reports of warnings when coming
>>>>> out of suspend with certain bluetooth firmware configurations:
>>>>>
>>>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
>>>>> _request_firmware+0x558/0x810()
>>>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>>>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
>>>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>>>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>>>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>>>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
>>>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
>>>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
>>>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
>>>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
>>>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
>>>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
>>>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
>>>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
>>>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
>>>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
>>>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
>>>>> video
>>>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
>>>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
>>>>> Workqueue: hci0 hci_power_on [bluetooth]
>>>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
>>>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
>>>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
>>>>> Call Trace:
>>>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
>>>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
>>>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
>>>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
>>>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
>>>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
>>>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
>>>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
>>>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
>>>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
>>>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
>>>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
>>>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
>>>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>>>
>>>>> This occurs after every resume.
>>>>>
>>>>> When resuming, the bluetooth driver needs to re-request the
>>>>> firmware. This re-request is happening before usermodehelper
>>>>> is fully enabled. If the firmware load succeeded previously, the
>>>>> caching behavior of the firmware code typically negates the
>>>>> need to call the usermodehelper code again and the request
>>>>> succeeds. If the firmware was never loaded because it isn't
>>>>> actually present in the file system, this results in a call
>>>>> to usermodehelper and a failure warning every resume. Rather
>>>>> than have a WARN clogging up the kernel messages each time,
>>>>> just drop the warn. There is still a dev_err for debugging
>>>>> purposes.
>>>>>
>>>>> Signed-off-by: Laura Abbott <[email protected]>
>>>>> ---
>>>>> This might be papering over a real issue but I'm not
>>>>> familiar enough with any of suspend/resume, bluetooth,
>>>>> or firmware loading to identify an alternate fix.
>>>>> The backtrace is from bcm patchram but the problem
>>>>> isn't limited to that hardware. Intel also does a
>>>>> request firmware and I was able to reproduce the
>>>>> same backtrace on that driver by requesting non-existant
>>>>> firmware file.
>>>>
>>>> so here is the thing with Bluetooth firmware. Some of them
>>>> are RAM patches to fix the ROM modules. Others are full firmware
>>>> that are required to be downloaded first.
>>>>
>>>> For ROM modules, the RAM patching procedure is optional. So we
>>>> will proceed even if no firmware is available. This means that
>>>> the kernel will never cache it (since it is not there in the
>>>> first place) and also on every resume we have the same issues.
>>>> So optional firmware is something that happens for Bluetooth USB
>>>> dongles a lot.
>>>>
>>>> In the driver we know which firmwares are optional and which are
>>>> required. So we could tell the firmware class this if this would make
>>>> things better and result in clearer errors and warnings. Is that
>>>> something we want here?
>>>>
>>>
>>> The response on another reply was
>>>
>>> "Yes, it is a driver problem, and loading firmware from filesystem
>>> isn't safe during resume, and that is the purpose of the warning."
>>>
>>> It isn't clear if this means request_firmware shouldn't be called
>>> on resume at all or if request_firmware shouldn't be called unless
>>> we can guarantee it won't make a call into the file system. I'd
>
> If the firmware is cached before resume, it is ok to call request_firmware()
> during resume. Otherwise it will call filesystem and disks, which may
> be a problem because the disk may not be ready for completing the
> request during resume.
>
>>> be okay with adding another api (request_optional_firmware?) to
>>> represent this if the firmware maintainers aren't against the
>>> concept. If the firmware maintainers are against the concept,
>>> it seems like the only solution is to rework the bluetooth drivers
>>> to not request anything on resume.
>
> So do you just want to work around the warning by introducing a new
> API?
>
>>
>> I think request_optional_firmware concept sounds like an useful addition.
>>
>> However the problem here is that the driver does not know that it is called from resume path. It is easy to say that this is a driver problem, but the driver does not know it.
>
> From USB stack view, one usb driver should know it is in the resume path
> because the root entry is the .resume() callback of the USB BT driver.

have you actually read drivers/bluetooth/btusb.c before making these statements. I explained how and when request_firmware is actually called.

>> If the hci_register_dev is called which in fact triggers hdev->setup to deal with vendor specific firmware path, then it means the driver just went through its probe() phase again. How would the driver differentiate this from any hot plug event. So to say this is a driver problem is just plain stupid. The driver does not know we are ending up in a reset_resume use case or when ACPI/BIOS decides to emulate an USB disconnect.

The only time request_firmware is called is from hdev->setup(). And that can only be triggered by hci_register_dev(). Which is only called from probe() callback of the driver. This has nothing to do with the resume() callback.

Regards

Marcel

2015-05-01 20:22:54

by Laura Abbott

[permalink] [raw]
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

On 04/30/2015 07:43 AM, Takashi Iwai wrote:
> At Thu, 30 Apr 2015 07:19:47 -0700,
> Marcel Holtmann wrote:
>>
>> Hi Ming,
>>
>>>>>>> We've received a number of reports of warnings when coming
>>>>>>> out of suspend with certain bluetooth firmware configurations:
>>>>>>>
>>>>>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
>>>>>>> _request_firmware+0x558/0x810()
>>>>>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>>>>>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
>>>>>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>>>>>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>>>>>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>>>>>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
>>>>>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
>>>>>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
>>>>>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
>>>>>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
>>>>>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
>>>>>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
>>>>>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
>>>>>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
>>>>>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
>>>>>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
>>>>>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
>>>>>>> video
>>>>>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
>>>>>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
>>>>>>> Workqueue: hci0 hci_power_on [bluetooth]
>>>>>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
>>>>>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
>>>>>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
>>>>>>> Call Trace:
>>>>>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
>>>>>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
>>>>>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
>>>>>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
>>>>>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
>>>>>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
>>>>>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
>>>>>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
>>>>>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
>>>>>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
>>>>>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
>>>>>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
>>>>>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
>>>>>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>>>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>>>>>
>>>>>>> This occurs after every resume.
>>>>>>>
>>>>>>> When resuming, the bluetooth driver needs to re-request the
>>>>>>> firmware. This re-request is happening before usermodehelper
>>>>>>> is fully enabled. If the firmware load succeeded previously, the
>>>>>>> caching behavior of the firmware code typically negates the
>>>>>>> need to call the usermodehelper code again and the request
>>>>>>> succeeds. If the firmware was never loaded because it isn't
>>>>>>> actually present in the file system, this results in a call
>>>>>>> to usermodehelper and a failure warning every resume. Rather
>>>>>>> than have a WARN clogging up the kernel messages each time,
>>>>>>> just drop the warn. There is still a dev_err for debugging
>>>>>>> purposes.
>>>>>>>
>>>>>>> Signed-off-by: Laura Abbott <[email protected]>
>>>>>>> ---
>>>>>>> This might be papering over a real issue but I'm not
>>>>>>> familiar enough with any of suspend/resume, bluetooth,
>>>>>>> or firmware loading to identify an alternate fix.
>>>>>>> The backtrace is from bcm patchram but the problem
>>>>>>> isn't limited to that hardware. Intel also does a
>>>>>>> request firmware and I was able to reproduce the
>>>>>>> same backtrace on that driver by requesting non-existant
>>>>>>> firmware file.
>>>>>>
>>>>>> so here is the thing with Bluetooth firmware. Some of them
>>>>>> are RAM patches to fix the ROM modules. Others are full firmware
>>>>>> that are required to be downloaded first.
>>>>>>
>>>>>> For ROM modules, the RAM patching procedure is optional. So we
>>>>>> will proceed even if no firmware is available. This means that
>>>>>> the kernel will never cache it (since it is not there in the
>>>>>> first place) and also on every resume we have the same issues.
>>>>>> So optional firmware is something that happens for Bluetooth USB
>>>>>> dongles a lot.
>>>>>>
>>>>>> In the driver we know which firmwares are optional and which are
>>>>>> required. So we could tell the firmware class this if this would make
>>>>>> things better and result in clearer errors and warnings. Is that
>>>>>> something we want here?
>>>>>>
>>>>>
>>>>> The response on another reply was
>>>>>
>>>>> "Yes, it is a driver problem, and loading firmware from filesystem
>>>>> isn't safe during resume, and that is the purpose of the warning."
>>>>>
>>>>> It isn't clear if this means request_firmware shouldn't be called
>>>>> on resume at all or if request_firmware shouldn't be called unless
>>>>> we can guarantee it won't make a call into the file system. I'd
>>>
>>> If the firmware is cached before resume, it is ok to call request_firmware()
>>> during resume. Otherwise it will call filesystem and disks, which may
>>> be a problem because the disk may not be ready for completing the
>>> request during resume.
>>>
>>>>> be okay with adding another api (request_optional_firmware?) to
>>>>> represent this if the firmware maintainers aren't against the
>>>>> concept. If the firmware maintainers are against the concept,
>>>>> it seems like the only solution is to rework the bluetooth drivers
>>>>> to not request anything on resume.
>>>
>>> So do you just want to work around the warning by introducing a new
>>> API?
>>>
>>>>
>>>> I think request_optional_firmware concept sounds like an useful addition.
>>>>
>>>> However the problem here is that the driver does not know that it is called from resume path. It is easy to say that this is a driver problem, but the driver does not know it.
>>>
>>> From USB stack view, one usb driver should know it is in the resume path
>>> because the root entry is the .resume() callback of the USB BT driver.
>>
>> have you actually read drivers/bluetooth/btusb.c before making these statements. I explained how and when request_firmware is actually called.
>>
>>>> If the hci_register_dev is called which in fact triggers hdev->setup to deal with vendor specific firmware path, then it means the driver just went through its probe() phase again. How would the driver differentiate this from any hot plug event. So to say this is a driver problem is just plain stupid. The driver does not know we are ending up in a reset_resume use case or when ACPI/BIOS decides to emulate an USB disconnect.
>>
>> The only time request_firmware is called is from hdev->setup(). And that can only be triggered by hci_register_dev(). Which is only called from probe() callback of the driver. This has nothing to do with the resume() callback.
>
> FYI, I once fixed a very similar problem in commit
> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586
> PM / sleep: Fix request_firmware() error at resume
>
> But the bug seems still alive. The stack trace looks pretty similar
> as the original bug report. Hmm.
>
> My fix above addressed the race between UMH lock changes in
> thaw_process(). If the UMH lock still fails, it means that the
> task was triggered even before the UMH protection in thaw_process().
> That sounds weird, and might be a more serious problem.
>
> Added Rafael to Cc, in case I overlooked something obvious.
>

Thanks for pointing that out. It does look like a similar if not
the same problem. In this case though we still haven't closed
the race:

[ 35.725301] PM: resume of devices complete after 2370.964 msecs
[ 35.725363] Bluetooth: >>>> Begin testing line 2754 btusb.c

[ 35.725875] ------------[ cut here ]------------
[ 35.725879] WARNING: CPU: 7 PID: 2542 at
drivers/base/firmware_class.c:1118 _request_firmware+0x557/0x810()
[ 35.725896] Modules linked in: ccm fuse cmac xt_CHECKSUM
ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep
arc4 iwlmvm snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_codec_hdmi mac80211 snd_hda_intel snd_hda_controller intel_rapl
iosf_mbi snd_hda_codec x86_pkg_temp_thermal snd_hda_core iwlwifi
coretemp kvm cfg80211 btusb btbcm btintel bluetooth snd_hwdep snd_seq
uvcvideo snd_seq_device
[ 35.725909] snd_pcm videobuf2_vmalloc videobuf2_core
videobuf2_memops iTCO_wdt v4l2_common iTCO_vendor_support thinkpad_acpi
videodev mei_me snd_timer rtsx_pci_ms ie31200_edac tpm_tis pcspkr
serio_raw joydev memstick media edac_core snd tpm mei rfkill shpchp
lpc_ich i2c_i801 soundcore wmi nfsd auth_rpcgss nfs_acl lockd grace
sunrpc binfmt_misc dm_crypt i915 i2c_algo_bit drm_kms_helper drm
rtsx_pci_sdmmc mmc_core e1000e crct10dif_pclmul crc32_pclmul
crc32c_intel rtsx_pci ptp ghash_clmulni_intel pps_core mfd_core video
[ 35.725911] CPU: 7 PID: 2542 Comm: kworker/u17:0 Tainted: G W
4.0.0-bt_hackery+ #36
[ 35.725911] Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS
GMET62WW (2.10 ) 03/19/2014
[ 35.725932] Workqueue: hci0 hci_power_on [bluetooth]
[ 35.725933] 0000000000000000 00000000923b240b ffff8803fc9d3b48
ffffffff8178a76a
[ 35.725935] 0000000000000000 0000000000000000 ffff8803fc9d3b88
ffffffff8109f5fa
[ 35.725936] 0000000000000000 ffffffffa05480a6 00000000fffffff5
ffff88040934d8a0
[ 35.725936] Call Trace:
[ 35.725939] [<ffffffff8178a76a>] dump_stack+0x45/0x57
[ 35.725940] [<ffffffff8109f5fa>] warn_slowpath_common+0x8a/0xc0
[ 35.725943] [<ffffffff8109f72a>] warn_slowpath_null+0x1a/0x20
[ 35.725944] [<ffffffff814ed227>] _request_firmware+0x557/0x810
[ 35.725946] [<ffffffff814e5253>] ? rpm_idle+0x23/0x230
[ 35.725948] [<ffffffff814ed515>] request_firmware+0x35/0x50
[ 35.725951] [<ffffffffa05443d8>] btusb_setup_intel+0x68/0x880 [btusb]
[ 35.725952] [<ffffffff814e5306>] ? rpm_idle+0xd6/0x230
[ 35.725954] [<ffffffff814e54ca>] ? __pm_runtime_idle+0x6a/0x80
[ 35.725960] [<ffffffffa04aa3a1>] hci_dev_do_open+0xe1/0x580 [bluetooth]
[ 35.725962] [<ffffffff810c8afd>] ?
ttwu_do_activate.constprop.95+0x5d/0x70
[ 35.725967] [<ffffffffa04ad440>] hci_power_on+0x40/0x200 [bluetooth]
[ 35.725969] [<ffffffff810b7f9b>] process_one_work+0x14b/0x400
[ 35.725971] [<ffffffff810b8a33>] worker_thread+0x53/0x490
[ 35.725972] [<ffffffff810b89e0>] ? rescuer_thread+0x300/0x300
[ 35.725974] [<ffffffff810bddd8>] kthread+0xd8/0xf0
[ 35.725976] [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
[ 35.725977] [<ffffffff817911a2>] ret_from_fork+0x42/0x70
[ 35.725979] [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
[ 35.725980] ---[ end trace 27cd183538666cbc ]---
[ 35.725981] bluetooth hci0: firmware: THISISASTEST will not be loaded
[ 35.726100] PM: Finishing wakeup.
[ 35.726101] Restarting tasks ... done.

From this, it looks like usermodehelper still isn't set until after
we've already tried to load the firmware. Even weirder, it looks like
a thread is running before we've thawed the processes? Am I
misunderstanding something or is the dmesg lying to me?

>
> Takashi
>

Thanks,
Laura

2015-05-01 21:35:23

by Laura Abbott

[permalink] [raw]
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

On 05/01/2015 01:22 PM, Laura Abbott wrote:
> On 04/30/2015 07:43 AM, Takashi Iwai wrote:
>> At Thu, 30 Apr 2015 07:19:47 -0700,
>> Marcel Holtmann wrote:
>>>
>>> Hi Ming,
>>>
>>>>>>>> We've received a number of reports of warnings when coming
>>>>>>>> out of suspend with certain bluetooth firmware configurations:
>>>>>>>>
>>>>>>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
>>>>>>>> _request_firmware+0x558/0x810()
>>>>>>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>>>>>>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc
>>>>>>>> ebtable_filter
>>>>>>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>>>>>>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>>>>>>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>>>>>>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
>>>>>>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
>>>>>>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
>>>>>>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211
>>>>>>>> iTCO_vendor_support
>>>>>>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
>>>>>>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel
>>>>>>>> uvcvideo
>>>>>>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm
>>>>>>>> videobuf2_core
>>>>>>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
>>>>>>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom
>>>>>>>> mmc_core
>>>>>>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd
>>>>>>>> mei_me mei
>>>>>>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
>>>>>>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
>>>>>>>> video
>>>>>>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted
>>>>>>>> 3.19.3-200.fc21.x86_64
>>>>>>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 )
>>>>>>>> 10/22/2013
>>>>>>>> Workqueue: hci0 hci_power_on [bluetooth]
>>>>>>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
>>>>>>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
>>>>>>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
>>>>>>>> Call Trace:
>>>>>>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
>>>>>>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
>>>>>>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
>>>>>>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
>>>>>>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
>>>>>>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
>>>>>>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
>>>>>>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
>>>>>>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
>>>>>>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
>>>>>>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
>>>>>>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
>>>>>>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
>>>>>>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
>>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>>>>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
>>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>>>>>>
>>>>>>>> This occurs after every resume.
>>>>>>>>
>>>>>>>> When resuming, the bluetooth driver needs to re-request the
>>>>>>>> firmware. This re-request is happening before usermodehelper
>>>>>>>> is fully enabled. If the firmware load succeeded previously, the
>>>>>>>> caching behavior of the firmware code typically negates the
>>>>>>>> need to call the usermodehelper code again and the request
>>>>>>>> succeeds. If the firmware was never loaded because it isn't
>>>>>>>> actually present in the file system, this results in a call
>>>>>>>> to usermodehelper and a failure warning every resume. Rather
>>>>>>>> than have a WARN clogging up the kernel messages each time,
>>>>>>>> just drop the warn. There is still a dev_err for debugging
>>>>>>>> purposes.
>>>>>>>>
>>>>>>>> Signed-off-by: Laura Abbott <[email protected]>
>>>>>>>> ---
>>>>>>>> This might be papering over a real issue but I'm not
>>>>>>>> familiar enough with any of suspend/resume, bluetooth,
>>>>>>>> or firmware loading to identify an alternate fix.
>>>>>>>> The backtrace is from bcm patchram but the problem
>>>>>>>> isn't limited to that hardware. Intel also does a
>>>>>>>> request firmware and I was able to reproduce the
>>>>>>>> same backtrace on that driver by requesting non-existant
>>>>>>>> firmware file.
>>>>>>>
>>>>>>> so here is the thing with Bluetooth firmware. Some of them
>>>>>>> are RAM patches to fix the ROM modules. Others are full firmware
>>>>>>> that are required to be downloaded first.
>>>>>>>
>>>>>>> For ROM modules, the RAM patching procedure is optional. So we
>>>>>>> will proceed even if no firmware is available. This means that
>>>>>>> the kernel will never cache it (since it is not there in the
>>>>>>> first place) and also on every resume we have the same issues.
>>>>>>> So optional firmware is something that happens for Bluetooth USB
>>>>>>> dongles a lot.
>>>>>>>
>>>>>>> In the driver we know which firmwares are optional and which are
>>>>>>> required. So we could tell the firmware class this if this would
>>>>>>> make
>>>>>>> things better and result in clearer errors and warnings. Is that
>>>>>>> something we want here?
>>>>>>>
>>>>>>
>>>>>> The response on another reply was
>>>>>>
>>>>>> "Yes, it is a driver problem, and loading firmware from filesystem
>>>>>> isn't safe during resume, and that is the purpose of the warning."
>>>>>>
>>>>>> It isn't clear if this means request_firmware shouldn't be called
>>>>>> on resume at all or if request_firmware shouldn't be called unless
>>>>>> we can guarantee it won't make a call into the file system. I'd
>>>>
>>>> If the firmware is cached before resume, it is ok to call
>>>> request_firmware()
>>>> during resume. Otherwise it will call filesystem and disks, which may
>>>> be a problem because the disk may not be ready for completing the
>>>> request during resume.
>>>>
>>>>>> be okay with adding another api (request_optional_firmware?) to
>>>>>> represent this if the firmware maintainers aren't against the
>>>>>> concept. If the firmware maintainers are against the concept,
>>>>>> it seems like the only solution is to rework the bluetooth drivers
>>>>>> to not request anything on resume.
>>>>
>>>> So do you just want to work around the warning by introducing a new
>>>> API?
>>>>
>>>>>
>>>>> I think request_optional_firmware concept sounds like an useful
>>>>> addition.
>>>>>
>>>>> However the problem here is that the driver does not know that it
>>>>> is called from resume path. It is easy to say that this is a driver
>>>>> problem, but the driver does not know it.
>>>>
>>>> From USB stack view, one usb driver should know it is in the resume
>>>> path
>>>> because the root entry is the .resume() callback of the USB BT driver.
>>>
>>> have you actually read drivers/bluetooth/btusb.c before making these
>>> statements. I explained how and when request_firmware is actually
>>> called.
>>>
>>>>> If the hci_register_dev is called which in fact triggers
>>>>> hdev->setup to deal with vendor specific firmware path, then it
>>>>> means the driver just went through its probe() phase again. How
>>>>> would the driver differentiate this from any hot plug event. So to
>>>>> say this is a driver problem is just plain stupid. The driver does
>>>>> not know we are ending up in a reset_resume use case or when
>>>>> ACPI/BIOS decides to emulate an USB disconnect.
>>>
>>> The only time request_firmware is called is from hdev->setup(). And
>>> that can only be triggered by hci_register_dev(). Which is only
>>> called from probe() callback of the driver. This has nothing to do
>>> with the resume() callback.
>>
>> FYI, I once fixed a very similar problem in commit
>> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586
>> PM / sleep: Fix request_firmware() error at resume
>>
>> But the bug seems still alive. The stack trace looks pretty similar
>> as the original bug report. Hmm.
>>
>> My fix above addressed the race between UMH lock changes in
>> thaw_process(). If the UMH lock still fails, it means that the
>> task was triggered even before the UMH protection in thaw_process().
>> That sounds weird, and might be a more serious problem.
>>
>> Added Rafael to Cc, in case I overlooked something obvious.
>>
>
> Thanks for pointing that out. It does look like a similar if not
> the same problem. In this case though we still haven't closed
> the race:
>
> [ 35.725301] PM: resume of devices complete after 2370.964 msecs
> [ 35.725363] Bluetooth: >>>> Begin testing line 2754 btusb.c
>
> [ 35.725875] ------------[ cut here ]------------
> [ 35.725879] WARNING: CPU: 7 PID: 2542 at
> drivers/base/firmware_class.c:1118 _request_firmware+0x557/0x810()
> [ 35.725896] Modules linked in: ccm fuse cmac xt_CHECKSUM
> ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns
> nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep
> arc4 iwlmvm snd_hda_codec_realtek snd_hda_codec_generic
> snd_hda_codec_hdmi mac80211 snd_hda_intel snd_hda_controller intel_rapl
> iosf_mbi snd_hda_codec x86_pkg_temp_thermal snd_hda_core iwlwifi
> coretemp kvm cfg80211 btusb btbcm btintel bluetooth snd_hwdep snd_seq
> uvcvideo snd_seq_device
> [ 35.725909] snd_pcm videobuf2_vmalloc videobuf2_core
> videobuf2_memops iTCO_wdt v4l2_common iTCO_vendor_support thinkpad_acpi
> videodev mei_me snd_timer rtsx_pci_ms ie31200_edac tpm_tis pcspkr
> serio_raw joydev memstick media edac_core snd tpm mei rfkill shpchp
> lpc_ich i2c_i801 soundcore wmi nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc binfmt_misc dm_crypt i915 i2c_algo_bit drm_kms_helper drm
> rtsx_pci_sdmmc mmc_core e1000e crct10dif_pclmul crc32_pclmul
> crc32c_intel rtsx_pci ptp ghash_clmulni_intel pps_core mfd_core video
> [ 35.725911] CPU: 7 PID: 2542 Comm: kworker/u17:0 Tainted: G W
> 4.0.0-bt_hackery+ #36
> [ 35.725911] Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS
> GMET62WW (2.10 ) 03/19/2014
> [ 35.725932] Workqueue: hci0 hci_power_on [bluetooth]
> [ 35.725933] 0000000000000000 00000000923b240b ffff8803fc9d3b48
> ffffffff8178a76a
> [ 35.725935] 0000000000000000 0000000000000000 ffff8803fc9d3b88
> ffffffff8109f5fa
> [ 35.725936] 0000000000000000 ffffffffa05480a6 00000000fffffff5
> ffff88040934d8a0
> [ 35.725936] Call Trace:
> [ 35.725939] [<ffffffff8178a76a>] dump_stack+0x45/0x57
> [ 35.725940] [<ffffffff8109f5fa>] warn_slowpath_common+0x8a/0xc0
> [ 35.725943] [<ffffffff8109f72a>] warn_slowpath_null+0x1a/0x20
> [ 35.725944] [<ffffffff814ed227>] _request_firmware+0x557/0x810
> [ 35.725946] [<ffffffff814e5253>] ? rpm_idle+0x23/0x230
> [ 35.725948] [<ffffffff814ed515>] request_firmware+0x35/0x50
> [ 35.725951] [<ffffffffa05443d8>] btusb_setup_intel+0x68/0x880 [btusb]
> [ 35.725952] [<ffffffff814e5306>] ? rpm_idle+0xd6/0x230
> [ 35.725954] [<ffffffff814e54ca>] ? __pm_runtime_idle+0x6a/0x80
> [ 35.725960] [<ffffffffa04aa3a1>] hci_dev_do_open+0xe1/0x580 [bluetooth]
> [ 35.725962] [<ffffffff810c8afd>] ?
> ttwu_do_activate.constprop.95+0x5d/0x70
> [ 35.725967] [<ffffffffa04ad440>] hci_power_on+0x40/0x200 [bluetooth]
> [ 35.725969] [<ffffffff810b7f9b>] process_one_work+0x14b/0x400
> [ 35.725971] [<ffffffff810b8a33>] worker_thread+0x53/0x490
> [ 35.725972] [<ffffffff810b89e0>] ? rescuer_thread+0x300/0x300
> [ 35.725974] [<ffffffff810bddd8>] kthread+0xd8/0xf0
> [ 35.725976] [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
> [ 35.725977] [<ffffffff817911a2>] ret_from_fork+0x42/0x70
> [ 35.725979] [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
> [ 35.725980] ---[ end trace 27cd183538666cbc ]---
> [ 35.725981] bluetooth hci0: firmware: THISISASTEST will not be loaded
> [ 35.726100] PM: Finishing wakeup.
> [ 35.726101] Restarting tasks ... done.
>
> From this, it looks like usermodehelper still isn't set until after
> we've already tried to load the firmware. Even weirder, it looks like
> a thread is running before we've thawed the processes? Am I
> misunderstanding something or is the dmesg lying to me?
>

Okay I took another look and I noticed hci_register_dev calls
alloc_workqueue and then queue work to call the power on function.
Is it possible that the alloc_workqueue/queue_work is getting run
before thawing is complete? I don't have anymore time today to dig
into that.

Thanks,
Laura

2015-05-02 07:10:18

by Takashi Iwai

[permalink] [raw]
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock error case

At Fri, 01 May 2015 14:35:16 -0700,
Laura Abbott wrote:
>
> On 05/01/2015 01:22 PM, Laura Abbott wrote:
> > On 04/30/2015 07:43 AM, Takashi Iwai wrote:
> >> At Thu, 30 Apr 2015 07:19:47 -0700,
> >> Marcel Holtmann wrote:
> >>>
> >>> Hi Ming,
> >>>
> >>>>>>>> We've received a number of reports of warnings when coming
> >>>>>>>> out of suspend with certain bluetooth firmware configurations:
> >>>>>>>>
> >>>>>>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
> >>>>>>>> _request_firmware+0x558/0x810()
> >>>>>>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> >>>>>>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc
> >>>>>>>> ebtable_filter
> >>>>>>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> >>>>>>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> >>>>>>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> >>>>>>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
> >>>>>>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
> >>>>>>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
> >>>>>>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211
> >>>>>>>> iTCO_vendor_support
> >>>>>>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
> >>>>>>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel
> >>>>>>>> uvcvideo
> >>>>>>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm
> >>>>>>>> videobuf2_core
> >>>>>>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
> >>>>>>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom
> >>>>>>>> mmc_core
> >>>>>>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd
> >>>>>>>> mei_me mei
> >>>>>>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
> >>>>>>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
> >>>>>>>> video
> >>>>>>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted
> >>>>>>>> 3.19.3-200.fc21.x86_64
> >>>>>>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 )
> >>>>>>>> 10/22/2013
> >>>>>>>> Workqueue: hci0 hci_power_on [bluetooth]
> >>>>>>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
> >>>>>>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
> >>>>>>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
> >>>>>>>> Call Trace:
> >>>>>>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
> >>>>>>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
> >>>>>>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
> >>>>>>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
> >>>>>>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
> >>>>>>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
> >>>>>>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
> >>>>>>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
> >>>>>>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
> >>>>>>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
> >>>>>>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
> >>>>>>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
> >>>>>>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
> >>>>>>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
> >>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> >>>>>>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
> >>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> >>>>>>>>
> >>>>>>>> This occurs after every resume.
> >>>>>>>>
> >>>>>>>> When resuming, the bluetooth driver needs to re-request the
> >>>>>>>> firmware. This re-request is happening before usermodehelper
> >>>>>>>> is fully enabled. If the firmware load succeeded previously, the
> >>>>>>>> caching behavior of the firmware code typically negates the
> >>>>>>>> need to call the usermodehelper code again and the request
> >>>>>>>> succeeds. If the firmware was never loaded because it isn't
> >>>>>>>> actually present in the file system, this results in a call
> >>>>>>>> to usermodehelper and a failure warning every resume. Rather
> >>>>>>>> than have a WARN clogging up the kernel messages each time,
> >>>>>>>> just drop the warn. There is still a dev_err for debugging
> >>>>>>>> purposes.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Laura Abbott <[email protected]>
> >>>>>>>> ---
> >>>>>>>> This might be papering over a real issue but I'm not
> >>>>>>>> familiar enough with any of suspend/resume, bluetooth,
> >>>>>>>> or firmware loading to identify an alternate fix.
> >>>>>>>> The backtrace is from bcm patchram but the problem
> >>>>>>>> isn't limited to that hardware. Intel also does a
> >>>>>>>> request firmware and I was able to reproduce the
> >>>>>>>> same backtrace on that driver by requesting non-existant
> >>>>>>>> firmware file.
> >>>>>>>
> >>>>>>> so here is the thing with Bluetooth firmware. Some of them
> >>>>>>> are RAM patches to fix the ROM modules. Others are full firmware
> >>>>>>> that are required to be downloaded first.
> >>>>>>>
> >>>>>>> For ROM modules, the RAM patching procedure is optional. So we
> >>>>>>> will proceed even if no firmware is available. This means that
> >>>>>>> the kernel will never cache it (since it is not there in the
> >>>>>>> first place) and also on every resume we have the same issues.
> >>>>>>> So optional firmware is something that happens for Bluetooth USB
> >>>>>>> dongles a lot.
> >>>>>>>
> >>>>>>> In the driver we know which firmwares are optional and which are
> >>>>>>> required. So we could tell the firmware class this if this would
> >>>>>>> make
> >>>>>>> things better and result in clearer errors and warnings. Is that
> >>>>>>> something we want here?
> >>>>>>>
> >>>>>>
> >>>>>> The response on another reply was
> >>>>>>
> >>>>>> "Yes, it is a driver problem, and loading firmware from filesystem
> >>>>>> isn't safe during resume, and that is the purpose of the warning."
> >>>>>>
> >>>>>> It isn't clear if this means request_firmware shouldn't be called
> >>>>>> on resume at all or if request_firmware shouldn't be called unless
> >>>>>> we can guarantee it won't make a call into the file system. I'd
> >>>>
> >>>> If the firmware is cached before resume, it is ok to call
> >>>> request_firmware()
> >>>> during resume. Otherwise it will call filesystem and disks, which may
> >>>> be a problem because the disk may not be ready for completing the
> >>>> request during resume.
> >>>>
> >>>>>> be okay with adding another api (request_optional_firmware?) to
> >>>>>> represent this if the firmware maintainers aren't against the
> >>>>>> concept. If the firmware maintainers are against the concept,
> >>>>>> it seems like the only solution is to rework the bluetooth drivers
> >>>>>> to not request anything on resume.
> >>>>
> >>>> So do you just want to work around the warning by introducing a new
> >>>> API?
> >>>>
> >>>>>
> >>>>> I think request_optional_firmware concept sounds like an useful
> >>>>> addition.
> >>>>>
> >>>>> However the problem here is that the driver does not know that it
> >>>>> is called from resume path. It is easy to say that this is a driver
> >>>>> problem, but the driver does not know it.
> >>>>
> >>>> From USB stack view, one usb driver should know it is in the resume
> >>>> path
> >>>> because the root entry is the .resume() callback of the USB BT driver.
> >>>
> >>> have you actually read drivers/bluetooth/btusb.c before making these
> >>> statements. I explained how and when request_firmware is actually
> >>> called.
> >>>
> >>>>> If the hci_register_dev is called which in fact triggers
> >>>>> hdev->setup to deal with vendor specific firmware path, then it
> >>>>> means the driver just went through its probe() phase again. How
> >>>>> would the driver differentiate this from any hot plug event. So to
> >>>>> say this is a driver problem is just plain stupid. The driver does
> >>>>> not know we are ending up in a reset_resume use case or when
> >>>>> ACPI/BIOS decides to emulate an USB disconnect.
> >>>
> >>> The only time request_firmware is called is from hdev->setup(). And
> >>> that can only be triggered by hci_register_dev(). Which is only
> >>> called from probe() callback of the driver. This has nothing to do
> >>> with the resume() callback.
> >>
> >> FYI, I once fixed a very similar problem in commit
> >> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586
> >> PM / sleep: Fix request_firmware() error at resume
> >>
> >> But the bug seems still alive. The stack trace looks pretty similar
> >> as the original bug report. Hmm.
> >>
> >> My fix above addressed the race between UMH lock changes in
> >> thaw_process(). If the UMH lock still fails, it means that the
> >> task was triggered even before the UMH protection in thaw_process().
> >> That sounds weird, and might be a more serious problem.
> >>
> >> Added Rafael to Cc, in case I overlooked something obvious.
> >>
> >
> > Thanks for pointing that out. It does look like a similar if not
> > the same problem. In this case though we still haven't closed
> > the race:
> >
> > [ 35.725301] PM: resume of devices complete after 2370.964 msecs
> > [ 35.725363] Bluetooth: >>>> Begin testing line 2754 btusb.c
> >
> > [ 35.725875] ------------[ cut here ]------------
> > [ 35.725879] WARNING: CPU: 7 PID: 2542 at
> > drivers/base/firmware_class.c:1118 _request_firmware+0x557/0x810()
> > [ 35.725896] Modules linked in: ccm fuse cmac xt_CHECKSUM
> > ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns
> > nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> > xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
> > ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> > ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> > ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> > nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep
> > arc4 iwlmvm snd_hda_codec_realtek snd_hda_codec_generic
> > snd_hda_codec_hdmi mac80211 snd_hda_intel snd_hda_controller intel_rapl
> > iosf_mbi snd_hda_codec x86_pkg_temp_thermal snd_hda_core iwlwifi
> > coretemp kvm cfg80211 btusb btbcm btintel bluetooth snd_hwdep snd_seq
> > uvcvideo snd_seq_device
> > [ 35.725909] snd_pcm videobuf2_vmalloc videobuf2_core
> > videobuf2_memops iTCO_wdt v4l2_common iTCO_vendor_support thinkpad_acpi
> > videodev mei_me snd_timer rtsx_pci_ms ie31200_edac tpm_tis pcspkr
> > serio_raw joydev memstick media edac_core snd tpm mei rfkill shpchp
> > lpc_ich i2c_i801 soundcore wmi nfsd auth_rpcgss nfs_acl lockd grace
> > sunrpc binfmt_misc dm_crypt i915 i2c_algo_bit drm_kms_helper drm
> > rtsx_pci_sdmmc mmc_core e1000e crct10dif_pclmul crc32_pclmul
> > crc32c_intel rtsx_pci ptp ghash_clmulni_intel pps_core mfd_core video
> > [ 35.725911] CPU: 7 PID: 2542 Comm: kworker/u17:0 Tainted: G W
> > 4.0.0-bt_hackery+ #36
> > [ 35.725911] Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS
> > GMET62WW (2.10 ) 03/19/2014
> > [ 35.725932] Workqueue: hci0 hci_power_on [bluetooth]
> > [ 35.725933] 0000000000000000 00000000923b240b ffff8803fc9d3b48
> > ffffffff8178a76a
> > [ 35.725935] 0000000000000000 0000000000000000 ffff8803fc9d3b88
> > ffffffff8109f5fa
> > [ 35.725936] 0000000000000000 ffffffffa05480a6 00000000fffffff5
> > ffff88040934d8a0
> > [ 35.725936] Call Trace:
> > [ 35.725939] [<ffffffff8178a76a>] dump_stack+0x45/0x57
> > [ 35.725940] [<ffffffff8109f5fa>] warn_slowpath_common+0x8a/0xc0
> > [ 35.725943] [<ffffffff8109f72a>] warn_slowpath_null+0x1a/0x20
> > [ 35.725944] [<ffffffff814ed227>] _request_firmware+0x557/0x810
> > [ 35.725946] [<ffffffff814e5253>] ? rpm_idle+0x23/0x230
> > [ 35.725948] [<ffffffff814ed515>] request_firmware+0x35/0x50
> > [ 35.725951] [<ffffffffa05443d8>] btusb_setup_intel+0x68/0x880 [btusb]
> > [ 35.725952] [<ffffffff814e5306>] ? rpm_idle+0xd6/0x230
> > [ 35.725954] [<ffffffff814e54ca>] ? __pm_runtime_idle+0x6a/0x80
> > [ 35.725960] [<ffffffffa04aa3a1>] hci_dev_do_open+0xe1/0x580 [bluetooth]
> > [ 35.725962] [<ffffffff810c8afd>] ?
> > ttwu_do_activate.constprop.95+0x5d/0x70
> > [ 35.725967] [<ffffffffa04ad440>] hci_power_on+0x40/0x200 [bluetooth]
> > [ 35.725969] [<ffffffff810b7f9b>] process_one_work+0x14b/0x400
> > [ 35.725971] [<ffffffff810b8a33>] worker_thread+0x53/0x490
> > [ 35.725972] [<ffffffff810b89e0>] ? rescuer_thread+0x300/0x300
> > [ 35.725974] [<ffffffff810bddd8>] kthread+0xd8/0xf0
> > [ 35.725976] [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
> > [ 35.725977] [<ffffffff817911a2>] ret_from_fork+0x42/0x70
> > [ 35.725979] [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
> > [ 35.725980] ---[ end trace 27cd183538666cbc ]---
> > [ 35.725981] bluetooth hci0: firmware: THISISASTEST will not be loaded
> > [ 35.726100] PM: Finishing wakeup.
> > [ 35.726101] Restarting tasks ... done.
> >
> > From this, it looks like usermodehelper still isn't set until after
> > we've already tried to load the firmware. Even weirder, it looks like
> > a thread is running before we've thawed the processes? Am I
> > misunderstanding something or is the dmesg lying to me?
> >
>
> Okay I took another look and I noticed hci_register_dev calls
> alloc_workqueue and then queue work to call the power on function.
> Is it possible that the alloc_workqueue/queue_work is getting run
> before thawing is complete? I don't have anymore time today to dig
> into that.

Thinking of this again, I wonder why request_firmware() (no _nowait
version) doesn't wait. Maybe just always waiting the lock would help?
A totally untested patch below.


Takashi

---
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 171841ad1008..de397404a80e 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -1105,21 +1105,12 @@ _request_firmware(const struct firmware **firmware_p, const char *name,

ret = 0;
timeout = firmware_loading_timeout();
- if (opt_flags & FW_OPT_NOWAIT) {
- timeout = usermodehelper_read_lock_wait(timeout);
- if (!timeout) {
- dev_dbg(device, "firmware: %s loading timed out\n",
- name);
- ret = -EBUSY;
- goto out;
- }
- } else {
- ret = usermodehelper_read_trylock();
- if (WARN_ON(ret)) {
- dev_err(device, "firmware: %s will not be loaded\n",
- name);
- goto out;
- }
+ timeout = usermodehelper_read_lock_wait(timeout);
+ if (!timeout) {
+ dev_dbg(device, "firmware: %s loading timed out\n",
+ name);
+ ret = -EBUSY;
+ goto out;
}

ret = fw_get_filesystem_firmware(device, fw->priv);

2015-05-06 00:06:12

by Laura Abbott

[permalink] [raw]
Subject: [PATCH] Bluetooth: Make request workqueue freezable

We've received a number of reports of warnings when coming
out of suspend with certain bluetooth firmware configurations:

WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
_request_firmware+0x558/0x810()
Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
video
CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
Workqueue: hci0 hci_power_on [bluetooth]
0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
Call Trace:
[<ffffffff8176e215>] dump_stack+0x45/0x57
[<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
[<ffffffff814dbe78>] _request_firmware+0x558/0x810
[<ffffffff814dc165>] request_firmware+0x35/0x50
[<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
[<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
[<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
[<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
[<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
[<ffffffff810b487c>] process_one_work+0x14c/0x3f0
[<ffffffff810b52f3>] worker_thread+0x53/0x470
[<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
[<ffffffff810ba548>] kthread+0xd8/0xf0
[<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff81774958>] ret_from_fork+0x58/0x90
[<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0

This occurs after every resume.

When resuming, the bluetooth stack calls hci_register_dev,
allocates a new workqueue, and immediately schedules the
power_on on the newly created workqueue. Since the new
workqueue is not freezable, the work runs immediately and
triggers the warning since resume is still happening and
usermodehelper has not yet been re-enabled. Fix this by
making the request workqueue freezable. This ensures
the work will not run until unfreezing occurs and usermodehelper
is re-enabled.

Signed-off-by: Laura Abbott <[email protected]>
---
I think this should be fixing the actual root cause.
---
net/bluetooth/hci_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 476709b..87f2e48 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3131,7 +3131,8 @@ int hci_register_dev(struct hci_dev *hdev)
}

hdev->req_workqueue = alloc_workqueue("%s", WQ_HIGHPRI | WQ_UNBOUND |
- WQ_MEM_RECLAIM, 1, hdev->name);
+ WQ_MEM_RECLAIM | WQ_FREEZABLE,
+ 1, hdev->name);
if (!hdev->req_workqueue) {
destroy_workqueue(hdev->workqueue);
error = -ENOMEM;
--
2.1.0