LinuxLists.cc - RIP: + BUG: with 6.8.11 and BTRFS

2024-05-26 09:09:04

Subject: RIP: + BUG: with 6.8.11 and BTRFS

Attached is the output of

grep -A 200 -e BUG: -e RIP: messages > splats.txt

from a Gentoo hardened Linux server that has been running on bare metal for
3 years. The system contained 2x 3.84 TiB NVMe drives; / was a raid1,
/data was configured as raid0.

I upgraded yesterday from kernel 6.8.10 to 6.8.11.

The system does not recover from a reboot at the moment.
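
For readers who want to pull the same splats out of a log without grep, here is a minimal Python sketch of what the command above does, assuming the markers are plain substrings. The names `lines_with_context` and `write_splats` are made up for this illustration, and unlike grep the sketch does not print `--` separators between non-adjacent groups.

```python
from typing import Iterable, Iterator

# Substrings that mark the start of a kernel splat (same as the two -e patterns).
MARKERS = ("BUG:", "RIP:")


def lines_with_context(lines: Iterable[str], after: int = 200) -> Iterator[str]:
    """Yield each line containing a marker plus the `after` lines that follow it.

    A new match inside a running context window restarts the window,
    which mirrors how `grep -A` extends overlapping contexts.
    """
    remaining = 0
    for line in lines:
        if any(marker in line for marker in MARKERS):
            remaining = after + 1  # the matching line itself plus `after` more
        if remaining > 0:
            yield line
            remaining -= 1


def write_splats(src_path: str = "messages", dst_path: str = "splats.txt",
                 after: int = 200) -> None:
    """Hypothetical helper: filter src_path into dst_path."""
    with open(src_path, errors="replace") as src, open(dst_path, "w") as dst:
        dst.writelines(lines_with_context(src, after))
```

Calling `write_splats()` in the directory that holds `messages` should produce a file comparable to the attached splats.txt, minus the group separators.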

--
Toralf

2024-05-26 09:13:10

by Toralf Förster

Subject: Re: RIP: + BUG: with 6.8.11 and BTRFS

On 5/26/24 11:08, Toralf Förster wrote:
> Attached is the output of
>
> grep -A 200 -e BUG: -e RIP: messages > splats.txt
>
> from a Gentoo hardened Linux server that has been running on bare metal for
> 3 years. The system contained 2x 3.84 TiB NVMe drives; / was a raid1,
> /data was configured as raid0.
>
> I upgraded yesterday from kernel 6.8.10 to 6.8.11.
>
> The system does not recover from a reboot at the moment.
>
And here's the attachment.

--
Toralf

Attachments:

splats.txt (945.43 kB)

2024-05-26 14:47:09

by Toralf Förster

Subject: Re: RIP: + BUG: with 6.8.11 and BTRFS

On 5/26/24 11:08, Toralf Förster wrote:
>
> I upgraded yesterday from kernel 6.8.10 to 6.8.11.
>
> The system does not recover from a reboot at the moment.

It recovered eventually. I switched to 6.9.2, which runs fine so far.
But there are new log messages:

May 26 13:44:06 mr-fox kernel: WARNING: stack recursion on stack type 4
May 26 13:44:06 mr-fox kernel: WARNING: can't access registers at syscall_return_via_sysret+0x64/0xc2
May 26 13:44:06 mr-fox sSMTP[29464]: Creating SSL connection to host
May 26 13:44:06 mr-fox sSMTP[29464]: SSL connection using TLS_AES_256_GCM_SHA384
May 26 13:44:07 mr-fox kernel: perf: interrupt took too long (2635 > 2500), lowering kernel.perf_event_max_sample_rate to 75750
May 26 13:44:07 mr-fox kernel: perf: interrupt took too long (3323 > 3293), lowering kernel.perf_event_max_sample_rate to 60000
May 26 13:44:07 mr-fox kernel: perf: interrupt took too long (4168 > 4153), lowering kernel.perf_event_max_sample_rate to 47750
May 26 13:44:07 mr-fox kernel: perf: interrupt took too long (5273 > 5210), lowering kernel.perf_event_max_sample_rate to 37750
May 26 13:44:07 mr-fox kernel: perf: interrupt took too long (6600 > 6591), lowering kernel.perf_event_max_sample_rate to 30250
May 26 13:44:07 mr-fox kernel: perf: interrupt took too long (8318 > 8250), lowering kernel.perf_event_max_sample_rate to 24000
May 26 13:44:07 mr-fox kernel: perf: interrupt took too long (10415 > 10397), lowering kernel.perf_event_max_sample_rate to 19000
May 26 13:44:09 mr-fox kernel: perf: interrupt took too long (13048 > 13018), lowering kernel.perf_event_max_sample_rate to 15250
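
To see at a glance how quickly the sample rate was throttled, the perf lines can be pulled apart with a small script. This is only a sketch: `parse_perf_throttling` is a made-up name, the sample text is three entries copied from the log above, and whitespace is collapsed first so that lines wrapped by a mail client still match.

```python
import re

# Three entries copied from the log above, including the mail-client line wraps.
LOG = """\
May 26 13:44:07 mr-fox kernel: perf: interrupt took too long (2635 >
2500), lowering kernel.perf_event_max_sample_rate to 75750
May 26 13:44:07 mr-fox kernel: perf: interrupt took too long (3323 >
3293), lowering kernel.perf_event_max_sample_rate to 60000
May 26 13:44:09 mr-fox kernel: perf: interrupt took too long (13048 >
13018), lowering kernel.perf_event_max_sample_rate to 15250
"""

PERF_RE = re.compile(
    r"perf: interrupt took too long \((\d+) > (\d+)\), "
    r"lowering kernel\.perf_event_max_sample_rate to (\d+)"
)


def parse_perf_throttling(text):
    """Return a list of (measured, threshold, new_sample_rate) integer tuples."""
    # Collapse all runs of whitespace (including newlines) to single spaces
    # so wrapped and unwrapped log lines are handled the same way.
    flat = " ".join(text.split())
    return [tuple(int(group) for group in match.groups())
            for match in PERF_RE.finditer(flat)]


for measured, threshold, rate in parse_perf_throttling(LOG):
    print(f"{measured:>6} > {threshold:<6} -> sample rate {rate}")
```

Run against the full log, the same function would list all eight throttling steps in order.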

--
Toralf

2024-05-27 09:35:41

by Qu Wenruo

Subject: Re: RIP: + BUG: with 6.8.11 and BTRFS

On 2024/5/27 00:16, Toralf Förster wrote:
> On 5/26/24 11:08, Toralf Förster wrote:
>>
>> I upgraded yesterday from kernel 6.8.10 to 6.8.11.
>>
>> The system does not recover from a reboot at the moment.
>
> It recovered eventually. I switched to 6.9.2, which runs fine so far.
> But these are new log messages:

That looks exactly like the one Linus recently reported
(https://lore.kernel.org/linux-btrfs/CAHk-=wgt362nGfScVOOii8cgKn2LVVHeOvOA7OBwg1OwbuJQcw@mail.gmail.com/)

Unfortunately, he is reproducing it with the latest master, so I'm not sure
if v6.9 is any better.

Meanwhile, if you can reproduce the problem reliably, I can craft several
debug patches for you to test, but I'm afraid it's not that reproducible...

Thanks,
Qu

2024-05-27 16:19:55

by David Sterba

Subject: Re: RIP: + BUG: with 6.8.11 and BTRFS

On Mon, May 27, 2024 at 07:02:56PM +0930, Qu Wenruo wrote:
>
>
> On 2024/5/27 00:16, Toralf Förster wrote:
> > On 5/26/24 11:08, Toralf Förster wrote:
> >>
> >> I upgraded yesterday from kernel 6.8.10 to 6.8.11.
> >>
> >> The system does not recover from a reboot at the moment.
> >
> > It recovered eventually. I switched to 6.9.2, which runs fine so far.
> > But these are new log messages:
>
> That looks exactly like the one Linus recently reported
> (https://lore.kernel.org/linux-btrfs/CAHk-=wgt362nGfScVOOii8cgKn2LVVHeOvOA7OBwg1OwbuJQcw@mail.gmail.com/)

It could be similar to what was fixed in ef1e68236b91 ("btrfs: fix race
in read_extent_buffer_pages()"), which was also hard to reproduce. The stack
traces are the only clues we have for now.