2017-08-16 15:26:36

by Chris Murphy

[permalink] [raw]
Subject: Re: WARNING: CPU: 3 PID: 2952 at drivers/base/firmware_class.c:1225 _request_firmware+0x329/0x890

Found this thread google searching.


4.13.0-0.rc4.git1.1.fc27.x86_64 Problem sometimes happens with this kernel.
This translates to git 623ce3456671.

4.13.0-0.rc3.git0.1.fc27.x86_64 But not this kernel, which translates
to the actual rc3 release (clean).

It seems to be a regression between those two. The problem only
happens when waking from suspend to RAM, but it doesn't always happen.


Product Name: HP Spectre Notebook
SKU Number: W2K28UA#ABA


dmesg, with bluetooth firmware warning and call trace only
https://drive.google.com/open?id=0B_2Asp8DGjJ9eDFsbHNMelVzelE

dmesg, normal boot
https://drive.google.com/open?id=0B_2Asp8DGjJ9X3gzYTdTNE4ydm8


--
Chris Murphy


2017-08-29 09:59:20

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: WARNING: CPU: 3 PID: 2952 at drivers/base/firmware_class.c:1225 _request_firmware+0x329/0x890

Hi Marcel,

On Mon, Aug 28, 2017 at 10:17 PM, Marcel Holtmann <[email protected]> wrote:
> Hi Kai-Heng,
>
>>
>> My theory is that if you request firmware when firmware_class is in
>> cache mode consecutively, and the device is changed, you will hit this
>> issue.
>> Can you give this patch a try?
>>
>> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
>> index bfbe1e154128..910dce498f53 100644
>> --- a/drivers/base/firmware_class.c
>> +++ b/drivers/base/firmware_class.c
>> @@ -1177,6 +1177,9 @@ _request_firmware_prepare(struct firmware
>> **firmware_p, const char *name,
>>
>> ret = fw_lookup_and_allocate_buf(name, &fw_cache, &buf, dbuf, size);
>>
>> + if (!firmware_enabled() && device && (ret == 0 || ret == 1))
>> + fw_add_devm_name(device, buf->fw_id);
>> +
>> /*
>> * bind with 'buf' now to avoid warning in failure path
>> * of requesting firmware.
>
> if a fix in firmware_class can handle this, then I assume we also do not need all the per driver / hardware workarounds for firmware caching? Or are these other patches also needed?

There are two issues:
The first one, which can be quite easily to reproduce after a warm
boot, is similar to this one.

The driver checks the update status of the device which is always on
during reboot, finds that it's already updated hence no need for
firmware, happily ends the setup function after that.
In this case, request_firmware() never gets called, so firmware_class
doesn't know the firmware is required for caching.
My patches for btusb.c/ath3k.c is to address this issue.
I just find out that btusb_setup_intel_new() also skips calling
request_firmware(), so I should send new patches to address the issue.


The second case [1], which is a corner case, is found when I ran some
stress tests for the first case. The scenario is like this:
1. The driver called request_firmware(). firmware_request() calls
fw_add_devm_name() to add the firmware name to device's revres.
2. After system suspends/resumes, the device might be treated as a new device.
3. request_firmware() can find the firmware because it's in the cache.
But fw_add_devm_name() for the new 'device' is not being called.
4. The second time system suspends, the firmware will not be cached
because the firmware name is not in the devres list anymore.
5. The system resumes, request_firmware() cannot find firmware in the
cache, hit the WARN_ON.

So these two patches are for separate issues.

[1] https://lkml.org/lkml/2017/8/22/123

>
> Regards
>
> Marcel
>

2017-08-28 14:17:37

by Marcel Holtmann

[permalink] [raw]
Subject: Re: WARNING: CPU: 3 PID: 2952 at drivers/base/firmware_class.c:1225 _request_firmware+0x329/0x890

Hi Kai-Heng,

>> Found this thread google searching.
>>
>>
>> 4.13.0-0.rc4.git1.1.fc27.x86_64 Problem sometimes happens with this kernel.
>> This translates to git 623ce3456671.
>>
>> 4.13.0-0.rc3.git0.1.fc27.x86_64 But not this kernel, which translates
>> to the actual rc3 release (clean).
>>
>> It seems to be a regression between those two. The problem only
>> happens when waking from suspend to RAM, but it doesn't always happen.
>
> My theory is that if you request firmware when firmware_class is in
> cache mode consecutively, and the device is changed, you will hit this
> issue.
> Can you give this patch a try?
>
> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
> index bfbe1e154128..910dce498f53 100644
> --- a/drivers/base/firmware_class.c
> +++ b/drivers/base/firmware_class.c
> @@ -1177,6 +1177,9 @@ _request_firmware_prepare(struct firmware
> **firmware_p, const char *name,
>
> ret = fw_lookup_and_allocate_buf(name, &fw_cache, &buf, dbuf, size);
>
> + if (!firmware_enabled() && device && (ret == 0 || ret == 1))
> + fw_add_devm_name(device, buf->fw_id);
> +
> /*
> * bind with 'buf' now to avoid warning in failure path
> * of requesting firmware.

if a fix in firmware_class can handle this, then I assume we also do not need all the per driver / hardware workarounds for firmware caching? Or are these other patches also needed?

Regards

Marcel


2017-08-17 09:28:01

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: WARNING: CPU: 3 PID: 2952 at drivers/base/firmware_class.c:1225 _request_firmware+0x329/0x890

On Wed, Aug 16, 2017 at 11:26 PM, Chris Murphy <[email protected]> wrote:
> Found this thread google searching.
>
>
> 4.13.0-0.rc4.git1.1.fc27.x86_64 Problem sometimes happens with this kernel.
> This translates to git 623ce3456671.
>
> 4.13.0-0.rc3.git0.1.fc27.x86_64 But not this kernel, which translates
> to the actual rc3 release (clean).
>
> It seems to be a regression between those two. The problem only
> happens when waking from suspend to RAM, but it doesn't always happen.

My theory is that if you request firmware when firmware_class is in
cache mode consecutively, and the device is changed, you will hit this
issue.
Can you give this patch a try?

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index bfbe1e154128..910dce498f53 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -1177,6 +1177,9 @@ _request_firmware_prepare(struct firmware
**firmware_p, const char *name,

ret = fw_lookup_and_allocate_buf(name, &fw_cache, &buf, dbuf, size);

+ if (!firmware_enabled() && device && (ret == 0 || ret == 1))
+ fw_add_devm_name(device, buf->fw_id);
+
/*
* bind with 'buf' now to avoid warning in failure path
* of requesting firmware.


>
>
> Product Name: HP Spectre Notebook
> SKU Number: W2K28UA#ABA
>
>
> dmesg, with bluetooth firmware warning and call trace only
> https://drive.google.com/open?id=0B_2Asp8DGjJ9eDFsbHNMelVzelE
>
> dmesg, normal boot
> https://drive.google.com/open?id=0B_2Asp8DGjJ9X3gzYTdTNE4ydm8
>
>
> --
> Chris Murphy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2017-09-16 04:52:00

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: WARNING: CPU: 3 PID: 2952 at drivers/base/firmware_class.c:1225 _request_firmware+0x329/0x890

Hi Marcel,

On Tue, Aug 29, 2017 at 2:59 AM, Kai-Heng Feng
<[email protected]> wrote:
> Hi Marcel,
>
> On Mon, Aug 28, 2017 at 10:17 PM, Marcel Holtmann <[email protected]> wrote:
>> Hi Kai-Heng,
>>
>>
>> if a fix in firmware_class can handle this, then I assume we also do not need all the per driver / hardware workarounds for firmware caching? Or are these other patches also needed?
>
> There are two issues:
> The first one, which can be quite easily to reproduce after a warm
> boot, is similar to this one.
>
> The driver checks the update status of the device which is always on
> during reboot, finds that it's already updated hence no need for
> firmware, happily ends the setup function after that.
> In this case, request_firmware() never gets called, so firmware_class
> doesn't know the firmware is required for caching.
> My patches for btusb.c/ath3k.c is to address this issue.
> I just find out that btusb_setup_intel_new() also skips calling
> request_firmware(), so I should send new patches to address the issue.
>
>
> The second case [1], which is a corner case, is found when I ran some
> stress tests for the first case. The scenario is like this:
> 1. The driver called request_firmware(). firmware_request() calls
> fw_add_devm_name() to add the firmware name to device's revres.
> 2. After system suspends/resumes, the device might be treated as a new device.
> 3. request_firmware() can find the firmware because it's in the cache.
> But fw_add_devm_name() for the new 'device' is not being called.
> 4. The second time system suspends, the firmware will not be cached
> because the firmware name is not in the devres list anymore.
> 5. The system resumes, request_firmware() cannot find firmware in the
> cache, hit the WARN_ON.
>
> So these two patches are for separate issues.
>
> [1] https://lkml.org/lkml/2017/8/22/123

Does the explanation make the situation clear?
Do you any concern about the approach?

>
>>
>> Regards
>>
>> Marcel
>>