2023-11-28 23:47:48

by Bagas Sanjaya

Subject: Fwd: block/badblocks.c warning in 6.7-rc2

Hi,

I notice a regression report that is rather well-handled on Bugzilla [1].
Quoting from it:

>
> when booting from 6.7-rc2, compiled with clang, I get this warning on one of my 3 bcachefs volumes:
> WARNING: CPU: 3 PID: 712 at block/badblocks.c:1284 badblocks_check (block/badblocks.c:1284)
> The reason why isn't clear, but the stack trace points to an error in md error handling.
> This bug didn't happen in 6.6
> there are 3 commits in 6.7-rc2 which may cause them,
> in attachment:
> - decoded stacktrace of dmesg
> - kernel .config

The culprit author then replied:

> The warning is from this line of code in _badblocks_check(),
> 1284 WARN_ON(bb->shift < 0 || sectors == 0);
>
> It means the caller sent an invalid range to check. From the oops information,
> "RDX: 0000000000000000" means parameter 'sectors' is 0.
>
> So the question is, why does md raid code send a 0-length range for badblocks check? Is this behavior on purpose, or improper?
> ...
> IMHO, it doesn't make sense for caller to check a zero-length LBA range. The warning works as expect to detect improper call to badblocks_check().

See Bugzilla for the full thread and attached decoded dmesg and kernel config.
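
For readers without the source at hand, the condition quoted above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel's implementation: `struct bb_sketch` and `range_is_invalid()` are hypothetical names, and only the expression `bb->shift < 0 || sectors == 0` comes from the quoted line.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's struct badblocks.
 * Only the one field read by the quoted condition is modelled.
 */
struct bb_sketch {
	int shift;	/* assumed: a negative value means bad-block tracking is off */
};

/*
 * Returns true when a request would trip the quoted WARN_ON() condition:
 * either tracking is disabled (shift < 0) or the range is empty (sectors == 0).
 */
static bool range_is_invalid(const struct bb_sketch *bb,
			     unsigned long long sectors)
{
	return bb->shift < 0 || sectors == 0;
}
```

With `shift` set to 0, a request of 8 sectors passes the check, while a request of 0 sectors is flagged, which matches the "RDX: 0000000000000000" observation in the report.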

Anyway, I'm adding this regression to regzbot:

#regzbot introduced: 3ea3354cb9f03e https://bugzilla.kernel.org/show_bug.cgi?id=218184
#regzbot title: badblocks_check regression (md error handling) on bcachefs volume

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218184

--
An old man doll... just what I always wanted! - Clara


2023-11-29 08:08:59

by Coly Li

Subject: Re: block/badblocks.c warning in 6.7-rc2



> On Nov 29, 2023, at 07:47, Bagas Sanjaya <[email protected]> wrote:
>
> Hi,
>
> I notice a regression report that is rather well-handled on Bugzilla [1].
> Quoting from it:
>
>>
>> when booting from 6.7-rc2, compiled with clang, I get this warning on one of my 3 bcachefs volumes:
>> WARNING: CPU: 3 PID: 712 at block/badblocks.c:1284 badblocks_check (block/badblocks.c:1284)
>> The reason why isn't clear, but the stack trace points to an error in md error handling.
>> This bug didn't happen in 6.6
>> there are 3 commits in 6.7-rc2 which may cause them,
>> in attachment:
>> - decoded stacktrace of dmesg
>> - kernel .config
>
> The culprit author then replied:
>
>> The warning is from this line of code in _badblocks_check(),
>> 1284 WARN_ON(bb->shift < 0 || sectors == 0);
>>
>> It means the caller sent an invalid range to check. From the oops information,
>> "RDX: 0000000000000000" means parameter 'sectors' is 0.
>>
>> So the question is, why does md raid code send a 0-length range for badblocks check? Is this behavior on purpose, or improper?
>> ...
>> IMHO, it doesn't make sense for caller to check a zero-length LBA range. The warning works as expect to detect improper call to badblocks_check().
>
> See Bugzilla for the full thread and attached decoded dmesg and kernel config.
>
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot introduced: 3ea3354cb9f03e https://bugzilla.kernel.org/show_bug.cgi?id=218184
> #regzbot title: badblocks_check regression (md error handling) on bcachefs volume
>
> Thanks.
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=218184

It seems the improved bad blocks code caught a zero-size bio request from the upper layer; this improper behavior was silently ignored before. It might be too early or too simplistic to decide this is a regression, especially since Janpieter has closed the report for now.

Thanks.

Coly Li

2023-12-04 12:11:19

by Thorsten Leemhuis

Subject: Re: block/badblocks.c warning in 6.7-rc2

On 29.11.23 09:08, Coly Li wrote:
>> On Nov 29, 2023, at 07:47, Bagas Sanjaya <[email protected]> wrote:
>>
>> I notice a regression report that is rather well-handled on Bugzilla [1].
>> Quoting from it:
>>
>>>
>>> when booting from 6.7-rc2, compiled with clang, I get this warning on one of my 3 bcachefs volumes:
>>> WARNING: CPU: 3 PID: 712 at block/badblocks.c:1284 badblocks_check (block/badblocks.c:1284)
>>> The reason why isn't clear, but the stack trace points to an error in md error handling.
>>> This bug didn't happen in 6.6
>>> there are 3 commits in 6.7-rc2 which may cause them,
>>> in attachment:
>>> - decoded stacktrace of dmesg
>>> - kernel .config
>> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=218184
>
> It seems the improved bad blocks code caught a zero-size bio request
> from the upper layer; this improper behavior was silently ignored before.
> It might be too early or too simplistic to decide this is a regression,

Well, it's often better to add an issue to the tracking even if there is
a chance that it's not a real regression, as the issue might otherwise
fall through the cracks. But given...

> especially since Janpieter has closed the report for now.

...this, I agree that it is likely not worth tracking, hence:

#regzbot inconclusive: maybe not a regression and report can not
reproduce it anymore

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.