2024-06-13 23:41:17

by Bert Karwatzki

Subject: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613

Since linux-next-20240613 firefox-esr crashes after several minutes of browsing
giving the following error messages in dmesg:
[ T2343] BUG: Bad rss-counter state mm:00000000babe0c39 type:MM_ANONPAGES val:86
[ T4063] show_signal_msg: 16 callbacks suppressed
[ T4063] Isolated Web Co[4063]: segfault at 396d1686c000 ip 0000396d1686c000 sp 00007ffd767b30a8 error 14 likely on CPU 7 (core 3, socket 0)
[ T4063] Code: Unable to access opcode bytes at 0x396d1686bfd6.
[ T4211] BUG: Bad rss-counter state mm:00000000cd9fc541 type:MM_ANONPAGES val:817
[ T3798] BUG: Bad rss-counter state mm:00000000432d87c2 type:MM_ANONPAGES val:181
[ T5548] BUG: Bad rss-counter state mm:00000000034aa27a type:MM_ANONPAGES val:242
[ T3823] BUG: Bad rss-counter state mm:0000000099734197 type:MM_ANONPAGES val:137
[ T1] BUG: Bad rss-counter state mm:000000005e5e2f2f type:MM_ANONPAGES val:28

(these are the error messages of several crashes and the error seems to affect
other processes, too (T1))

The crash can be provoked to appear in ~1min by opening large numbers of tabs in
firefox-esr (by holding down ctrl+t for some time). With this I bisected the
error to commit "1c29a32ce65f mm/mmap: use split munmap calls for MAP_FIXED" and
reverting this commit in linux-next-20240613 fixes the issue for me.

Bert Karwatzki

PS. Please CC me when answering, I'm not subscribed to the lists.


2024-06-14 00:04:00

by Andrew Morton

Subject: Re: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613

On Fri, 14 Jun 2024 01:40:54 +0200 Bert Karwatzki <[email protected]> wrote:

> Since linux-next-20240613 firefox-esr crashes after several minutes of browsing
> giving the following error messages in dmesg:
> [ T2343] BUG: Bad rss-counter state mm:00000000babe0c39 type:MM_ANONPAGES val:86
> [ T4063] show_signal_msg: 16 callbacks suppressed
> [ T4063] Isolated Web Co[4063]: segfault at 396d1686c000 ip 0000396d1686c000 sp
> 00007ffd767b30a8 error 14 likely on CPU 7 (core 3, socket 0)
> [ T4063] Code: Unable to access opcode bytes at 0x396d1686bfd6.
> [ T4211] BUG: Bad rss-counter state mm:00000000cd9fc541 type:MM_ANONPAGES
> val:817
> [ T3798] BUG: Bad rss-counter state mm:00000000432d87c2 type:MM_ANONPAGES
> val:181
> [ T5548] BUG: Bad rss-counter state mm:00000000034aa27a type:MM_ANONPAGES
> val:242
> [ T3823] BUG: Bad rss-counter state mm:0000000099734197 type:MM_ANONPAGES
> val:137
> [ T1] BUG: Bad rss-counter state mm:000000005e5e2f2f type:MM_ANONPAGES val:28

Let's hope Linus doesn't read this. Why are we nuking the entire
planet just because some counter went wonky?

> (these are the error messages of several crashes and the error seems to affect
> other processes, too (T1))
>
> The crash can be provoked to appear in ~1min by opening large numbers of tabs in
> firefox-esr (by holding down ctrl+t for some time). With this I bisected the
> error to commit "1c29a32ce65f mm/mmap: use split munmap calls for MAP_FIXED" and
> reverting this commit in linux-next-20240613 fixes the issue for me.

Thanks, that must have taken a lot of time.

2024-06-14 08:02:59

by Jiri Olsa

Subject: Re: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613

On Fri, Jun 14, 2024 at 01:40:54AM +0200, Bert Karwatzki wrote:
> Since linux-next-20240613 firefox-esr crashes after several minutes of browsing
> giving the following error messages in dmesg:
> [ T2343] BUG: Bad rss-counter state mm:00000000babe0c39 type:MM_ANONPAGES val:86
> [ T4063] show_signal_msg: 16 callbacks suppressed
> [ T4063] Isolated Web Co[4063]: segfault at 396d1686c000 ip 0000396d1686c000 sp
> 00007ffd767b30a8 error 14 likely on CPU 7 (core 3, socket 0)
> [ T4063] Code: Unable to access opcode bytes at 0x396d1686bfd6.
> [ T4211] BUG: Bad rss-counter state mm:00000000cd9fc541 type:MM_ANONPAGES
> val:817
> [ T3798] BUG: Bad rss-counter state mm:00000000432d87c2 type:MM_ANONPAGES
> val:181
> [ T5548] BUG: Bad rss-counter state mm:00000000034aa27a type:MM_ANONPAGES
> val:242
> [ T3823] BUG: Bad rss-counter state mm:0000000099734197 type:MM_ANONPAGES
> val:137
> [ T1] BUG: Bad rss-counter state mm:000000005e5e2f2f type:MM_ANONPAGES val:28
>
> (these are the error messages of several crashes and the error seems to affect
> other processes, too (T1))
>
> The crash can be provoked to appear in ~1min by opening large numbers of tabs in
> firefox-esr (by holding down ctrl+t for some time). With this I bisected the
> error to commit "1c29a32ce65f mm/mmap: use split munmap calls for MAP_FIXED" and
> reverting this commit in linux-next-20240613 fixes the issue for me.

+1, bpf selftests are failing for me because mmap fails with:
mmap(0x7f9361bc9000, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 4, 0) = -1 EBUSY (Device or resource busy)

did not get to the cause, but reverting the 1c29a32ce65f fixes it for me

thanks,
jirka

>
> Bert Karwatzki
>
> PS. Please CC me when answering, I'm not subscribed to the lists.
>
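
For readers who want to see what kind of call is failing: the strace line above
maps one page of a file at a fixed address with MAP_SHARED|MAP_FIXED. With
MAP_FIXED, any mapping already present at that address is expected to be
replaced, and that is the path the bisected commit ("use split munmap calls for
MAP_FIXED") touches. The sketch below is not the bpf selftest. It is a minimal
Python/ctypes illustration that assumes 64-bit Linux, MAP_FIXED == 0x10, off_t
being a C long, and a hypothetical helper name remap_fixed_over_existing(). On
an unaffected kernel the second mmap() should return the requested address;
whether it reproduces the EBUSY above has not been checked.

```python
import ctypes
import mmap
import os
import tempfile

MAP_FIXED = 0x10          # assumption: value on x86-64/arm64 Linux
PAGE = mmap.PAGESIZE

# Load the C library symbols of the running process (Linux).
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

MAP_FAILED = ctypes.c_void_p(-1).value   # (void *)-1 as an unsigned integer


def remap_fixed_over_existing():
    """Map an anonymous page, then map a file page over it with MAP_FIXED."""
    prot = mmap.PROT_READ | mmap.PROT_WRITE

    # Step 1: let the kernel choose an address for a private anonymous page.
    addr = libc.mmap(None, PAGE, prot,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if addr == MAP_FAILED:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    with tempfile.TemporaryFile() as f:
        # One page of file data to map; starts with a recognizable marker.
        f.write(b"test" + b"\0" * (PAGE - 4))
        f.flush()

        # Step 2: same shape as the call in the strace line, placed on top of
        # the existing mapping. MAP_FIXED should replace what is there.
        fixed = libc.mmap(addr, PAGE, prot,
                          mmap.MAP_SHARED | MAP_FIXED, f.fileno(), 0)
        if fixed == MAP_FAILED:
            err = ctypes.get_errno()
            libc.munmap(addr, PAGE)
            raise OSError(err, os.strerror(err))

        data = ctypes.string_at(fixed, 4)   # read through the new mapping
        libc.munmap(fixed, PAGE)
    return addr, fixed, data


if __name__ == "__main__":
    requested, got, head = remap_fixed_over_existing()
    print(hex(requested), hex(got), head)
```

If the second mmap() fails, the helper raises OSError with the errno reported by
the kernel, so an EBUSY like the one above would surface as an exception rather
than a return value.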

2024-06-14 08:36:47

by Bert Karwatzki

Subject: Re: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613

On Thursday, 13.06.2024 at 17:03 -0700, Andrew Morton wrote:
> Let's hope Linus doesn't read this.  Why are we nuking the entire
> planet just because some counter went wonky?

It's not just the wonky counter, the error always comes with a segfault which
causes a firefox tab to crash, though I've not yet figured out how this is
related to the BUG message or commit 1c29a32ce.

[ 179.393488] [ T2278] show_signal_msg: 16 callbacks suppressed
[ 179.393492] [ T2278] Privileged Cont[2278]: segfault at 22cddf91d3a0 ip 000022cde1aad010 sp 00007ffc616851a8 error 7 likely on CPU 15 (core 7, socket 0)
[ 179.393504] [ T2278] Code: Unable to access opcode bytes at 0x22cde1aacfe6.
[ 179.498173] [ T2289] BUG: Bad rss-counter state mm:00000000d0a3f682 type:MM_ANONPAGES val:1885

Bert Karwatzki


2024-06-14 12:30:43

by Liam R. Howlett

Subject: Re: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613

* Andrew Morton <[email protected]> [240613 20:03]:
> On Fri, 14 Jun 2024 01:40:54 +0200 Bert Karwatzki <[email protected]> wrote:
>
> > Since linux-next-20240613 firefox-esr crashes after several minutes of browsing
> > giving the following error messages in dmesg:
> > [ T2343] BUG: Bad rss-counter state mm:00000000babe0c39 type:MM_ANONPAGES val:86
> > [ T4063] show_signal_msg: 16 callbacks suppressed
> > [ T4063] Isolated Web Co[4063]: segfault at 396d1686c000 ip 0000396d1686c000 sp
> > 00007ffd767b30a8 error 14 likely on CPU 7 (core 3, socket 0)
> > [ T4063] Code: Unable to access opcode bytes at 0x396d1686bfd6.
> > [ T4211] BUG: Bad rss-counter state mm:00000000cd9fc541 type:MM_ANONPAGES
> > val:817
> > [ T3798] BUG: Bad rss-counter state mm:00000000432d87c2 type:MM_ANONPAGES
> > val:181
> > [ T5548] BUG: Bad rss-counter state mm:00000000034aa27a type:MM_ANONPAGES
> > val:242
> > [ T3823] BUG: Bad rss-counter state mm:0000000099734197 type:MM_ANONPAGES
> > val:137
> > [ T1] BUG: Bad rss-counter state mm:000000005e5e2f2f type:MM_ANONPAGES val:28
>
> Let's hope Linus doesn't read this. Why are we nuking the entire
> planet just because some counter went wonky?

I think I know what's going on, and it's more than just the counters
being off here. The counters are the symptom of what is happening.

>
> > (these are the error messages of several crashes and the error seems to affect
> > other processes, too (T1))
> >
> > The crash can be provoked to appear in ~1min by opening large numbers of tabs in
> > firefox-esr (by holding down ctrl+t for some time). With this I bisected the
> > error to commit "1c29a32ce65f mm/mmap: use split munmap calls for MAP_FIXED" and
> > reverting this commit in linux-next-20240613 fixes the issue for me.
>
> Thanks, that must have taken a lot of time.

Yes, thank you for all that work, and apologies for creating this
frustrating situation.

Andrew, please drop the set from your branch. I need to write some more
tests, but I suspect I will need to do some work around the vma_merge()
function, which is never a fun endeavor.

Regards,
Liam
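
As background on the message itself: the "Bad rss-counter state" line is printed
when an mm is torn down and one of its per-type page counters (here
MM_ANONPAGES) is not zero, i.e. the accounting says pages are still resident
after everything should have been unmapped. The toy Python model below only
illustrates that idea. The class ToyMM and its methods are hypothetical, and
forget() is an invented stand-in for "bookkeeping and page accounting getting
out of sync"; it is not a claim about what commit 1c29a32ce65f actually does
wrong, which this thread has not established.

```python
from collections import Counter


class ToyMM:
    """Toy stand-in for an mm: tracks mappings and per-type resident pages."""

    def __init__(self):
        self.rss = Counter()   # page type -> resident page count
        self.mapped = {}       # start address -> (page type, page count)

    def map_pages(self, addr, kind, pages):
        """Record a mapping and account its pages."""
        self.mapped[addr] = (kind, pages)
        self.rss[kind] += pages

    def unmap(self, addr):
        """Correct path: drop the mapping and un-account its pages."""
        kind, pages = self.mapped.pop(addr)
        self.rss[kind] -= pages

    def forget(self, addr):
        """Hypothetical faulty path: the mapping disappears from the
        bookkeeping, but its pages are never un-accounted."""
        self.mapped.pop(addr)

    def teardown(self):
        """Unmap everything still tracked, then return non-zero counters."""
        for addr in list(self.mapped):
            self.unmap(addr)
        return {kind: val for kind, val in self.rss.items() if val != 0}


def demo():
    healthy = ToyMM()
    healthy.map_pages(0x1000, "MM_ANONPAGES", 86)
    ok_report = healthy.teardown()

    broken = ToyMM()
    broken.map_pages(0x1000, "MM_ANONPAGES", 86)
    broken.forget(0x1000)          # pages stay accounted with no mapping left
    bad_report = broken.teardown()
    return ok_report, bad_report


if __name__ == "__main__":
    ok_report, bad_report = demo()
    print("healthy:", ok_report)
    for kind, val in bad_report.items():
        print(f"Bad rss-counter state type:{kind} val:{val}")
```

In this model a leftover count means pages were lost track of somewhere earlier,
which is why a non-zero counter at teardown is treated as a symptom rather than
as the problem itself.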

2024-06-14 12:31:31

by Liam R. Howlett

Subject: Re: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613

* Jiri Olsa <[email protected]> [240614 03:54]:
> On Fri, Jun 14, 2024 at 01:40:54AM +0200, Bert Karwatzki wrote:
> > Since linux-next-20240613 firefox-esr crashes after several minutes of browsing
> > giving the following error messages in dmesg:
> > [ T2343] BUG: Bad rss-counter state mm:00000000babe0c39 type:MM_ANONPAGES val:86
> > [ T4063] show_signal_msg: 16 callbacks suppressed
> > [ T4063] Isolated Web Co[4063]: segfault at 396d1686c000 ip 0000396d1686c000 sp
> > 00007ffd767b30a8 error 14 likely on CPU 7 (core 3, socket 0)
> > [ T4063] Code: Unable to access opcode bytes at 0x396d1686bfd6.
> > [ T4211] BUG: Bad rss-counter state mm:00000000cd9fc541 type:MM_ANONPAGES
> > val:817
> > [ T3798] BUG: Bad rss-counter state mm:00000000432d87c2 type:MM_ANONPAGES
> > val:181
> > [ T5548] BUG: Bad rss-counter state mm:00000000034aa27a type:MM_ANONPAGES
> > val:242
> > [ T3823] BUG: Bad rss-counter state mm:0000000099734197 type:MM_ANONPAGES
> > val:137
> > [ T1] BUG: Bad rss-counter state mm:000000005e5e2f2f type:MM_ANONPAGES val:28
> >
> > (these are the error messages of several crashes and the error seems to affect
> > other processes, too (T1))
> >
> > The crash can be provoked to appear in ~1min by opening large numbers of tabs in
> > firefox-esr (by holding down ctrl+t for some time). With this I bisected the
> > error to commit "1c29a32ce65f mm/mmap: use split munmap calls for MAP_FIXED" and
> > reverting this commit in linux-next-20240613 fixes the issue for me.
>
> +1, bpf selftests are failing for me because mmap fails with:
> mmap(0x7f9361bc9000, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 4, 0) = -1 EBUSY (Device or resource busy)
>
> did not get to the cause, but reverting the 1c29a32ce65f fixes it for me
>

Thanks for the information, this sounds like a much easier way to
recreate the issue.

Regards,
Liam

2024-06-14 17:21:57

by Andrew Morton

Subject: Re: commit 1c29a32ce65f4cd0f1c causes Bad rss-counter state and firefox-esr crash in linux-next-20240613

On Fri, 14 Jun 2024 08:30:20 -0400 "Liam R. Howlett" <[email protected]> wrote:

> >
> > > (these are the error messages of several crashes and the error seems to affect
> > > other processes, too (T1))
> > >
> > > The crash can be provoked to appear in ~1min by opening large numbers of tabs in
> > > firefox-esr (by holding down ctrl+t for some time). With this I bisected the
> > > error to commit "1c29a32ce65f mm/mmap: use split munmap calls for MAP_FIXED" and
> > > reverting this commit in linux-next-20240613 fixes the issue for me.
> >
> > Thanks, that must have taken a lot of time.
>
> Yes, thank you for all that work, and apologies for creating this
> frustrating situation.
>
> Andrew, please drop the set from your branch. I need to write some more
> tests, but I suspect I will need to do some work around the vma_merge()
> function, which is never a fun endeavor.

Dropped, thanks.