2021-09-30 07:50:03

by Tor Vic

Subject: [BUG] kernel BUG at mm/slub.c - possible BFQ issue?

Hello,

I encounter a hard freeze on both 5.14 and 5.15 when using BFQ.
Unfortunately, I do not have a full error log, because the computer
totally freezes and slightly corrupts the display, so it's
impossible to read the entire message.

However, what I could get is the following:

kernel BUG at mm/slub.c:379
invalid opcode: 0000 [#1]
RIP: 0010:__slab_free
[...]
Call Trace:
bfq_set_next_ioprio_data
[...]
bfq_put_queue
bfq_insert_requests
[...]

This issue appears more or less randomly and it sometimes takes a
little while to reproduce it (running fio helps).
The call trace always contains references to BFQ, but they are not
always the exact same. Once, I could see on the corrupted display
the message "general protection fault".
I could reproduce this issue on two computers.

Not quite sure but I *think* the issue first appeared somewhere around
5.14.5 or 5.14.6, during which time BFQ only got the following commit:

(88013a0c5d99) block, bfq: honor already-setup queue merges

5.13 doesn't seem to be affected AFAICS.

Does anyone have an idea what is going on?
I will now revert the above commit and see if that helps...

Thanks,
Tor


2021-10-01 15:04:40

by Tor Vic

Subject: Re: [BUG] kernel BUG at mm/slub.c - possible BFQ issue?

> On 30.09.2021 09:44, [email protected] wrote:
>
>
> Hello,
>
> I encounter a hard freeze on both 5.14 and 5.15 when using BFQ.
> Unfortunately, I do not have a full error log, because the computer
> totally freezes and slightly corrupts the display, so it's
> impossible to read the entire message.
>
> However, what I could get is the following:
>
> kernel BUG at mm/slub.c:379
> invalid opcode: 0000 [#1]
> RIP: 0010:__slab_free
> [...]
> Call Trace:
> bfq_set_next_ioprio_data
> [...]
> bfq_put_queue
> bfq_insert_requests
> [...]
>
> This issue appears more or less randomly and it sometimes takes a
> little while to reproduce it (running fio helps).
> The call trace always contains references to BFQ, but they are not
> always the exact same. Once, I could see on the corrupted display
> the message "general protection fault".
> I could reproduce this issue on two computers.
>
> Not quite sure but I *think* the issue first appeared somewhere around
> 5.14.5 or 5.14.6, during which time BFQ only got the following commit:
>
> (88013a0c5d99) block, bfq: honor already-setup queue merges

I have now reverted the above commit and launched some heavy I/O like
e.g. git kernel, fio, xz compression, and so far, no freezes anymore!
Too early to say that this commit really is the cause though.
Would be great if someone could have a look at it.

>
> 5.13 doesn't seem to be affected AFAICS.
>
> Does anyone have an idea what is going on?
> I will now revert the above commit and see if that helps...
>
> Thanks,
> Tor

2021-10-01 15:06:07

by Jens Axboe

Subject: Re: [BUG] kernel BUG at mm/slub.c - possible BFQ issue?

On 10/1/21 9:01 AM, [email protected] wrote:
>> On 30.09.2021 09:44, [email protected] wrote:
>>
>>
>> Hello,
>>
>> I encounter a hard freeze on both 5.14 and 5.15 when using BFQ.
>> Unfortunately, I do not have a full error log, because the computer
>> totally freezes and slightly corrupts the display, so it's
>> impossible to read the entire message.
>>
>> However, what I could get is the following:
>>
>> kernel BUG at mm/slub.c:379
>> invalid opcode: 0000 [#1]
>> RIP: 0010:__slab_free
>> [...]
>> Call Trace:
>> bfq_set_next_ioprio_data
>> [...]
>> bfq_put_queue
>> bfq_insert_requests
>> [...]
>>
>> This issue appears more or less randomly and it sometimes takes a
>> little while to reproduce it (running fio helps).
>> The call trace always contains references to BFQ, but they are not
>> always the exact same. Once, I could see on the corrupted display
>> the message "general protection fault".
>> I could reproduce this issue on two computers.
>>
>> Not quite sure but I *think* the issue first appeared somewhere around
>> 5.14.5 or 5.14.6, during which time BFQ only got the following commit:
>>
>> (88013a0c5d99) block, bfq: honor already-setup queue merges
>
> I have now reverted the above commit and launched some heavy I/O like
> e.g. git kernel, fio, xz compression, and so far, no freezes anymore!
> Too early to say that this commit really is the cause though.
> Would be great if someone could have a look at it.

It's known buggy, and a revert has been queued up since earlier this
week. It'll go to Linus for 5.15-rc4, and will hit 5.14 stable shortly
thereafter.

--
Jens Axboe

2021-10-05 15:10:23

by Paolo Valente

Subject: Re: [BUG] kernel BUG at mm/slub.c - possible BFQ issue?



> On 1 Oct 2021, at 17:01, [email protected] wrote:
>
>> On 30.09.2021 09:44, [email protected] wrote:
>>
>>
>> Hello,
>>
>> I encounter a hard freeze on both 5.14 and 5.15 when using BFQ.
>> Unfortunately, I do not have a full error log, because the computer
>> totally freezes and slightly corrupts the display, so it's
>> impossible to read the entire message.
>>
>> However, what I could get is the following:
>>
>> kernel BUG at mm/slub.c:379
>> invalid opcode: 0000 [#1]
>> RIP: 0010:__slab_free
>> [...]
>> Call Trace:
>> bfq_set_next_ioprio_data
>> [...]
>> bfq_put_queue
>> bfq_insert_requests
>> [...]
>>
>> This issue appears more or less randomly and it sometimes takes a
>> little while to reproduce it (running fio helps).
>> The call trace always contains references to BFQ, but they are not
>> always the exact same. Once, I could see on the corrupted display
>> the message "general protection fault".
>> I could reproduce this issue on two computers.
>>
>> Not quite sure but I *think* the issue first appeared somewhere around
>> 5.14.5 or 5.14.6, during which time BFQ only got the following commit:
>>
>> (88013a0c5d99) block, bfq: honor already-setup queue merges
>
> I have now reverted the above commit and launched some heavy I/O like
> e.g. git kernel, fio, xz compression, and so far, no freezes anymore!
> Too early to say that this commit really is the cause though.
> Would be great if someone could have a look at it.
>

Hi,
sorry for the delay, and thank you very much for reporting this crash.
I have prepared a dev version of BFQ, to try to solve this problem.
It's based on 5.12.0, and should hopefully provide more information
upon failure. Could you please give it a try? You can find it here:
https://github.com/Algodev-github/bfq-mq/tree/dev-bfq-on-5.12

Thanks for your help,
Paolo


>>
>> 5.13 doesn't seem to be affected AFAICS.
>>
>> Does anyone have an idea what is going on?
>> I will now revert the above commit and see if that helps...
>>
>> Thanks,
>> Tor