2023-06-21 02:31:01

by syzbot

[permalink] [raw]
Subject: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

Hello,

syzbot found the following issue on:

HEAD commit: 40f71e7cd3c6 Merge tag 'net-6.4-rc7' of git://git.kernel.o..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=158b99d3280000
kernel config: https://syzkaller.appspot.com/x/.config?x=7ff8f87c7ab0e04e
dashboard link: https://syzkaller.appspot.com/bug?extid=9d0b0d54a8bd799f6ae4
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ab4537280000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=148326ef280000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/2dc89d5fee38/disk-40f71e7c.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/0ced5a475218/vmlinux-40f71e7c.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d543a4f69684/bzImage-40f71e7c.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/e2012b787a31/mount_0.gz

The issue was bisected to:

commit e0a8de7da35e5b22b44fa1013ccc0716e17b0c14
Author: Dave Chinner <[email protected]>
Date: Mon Jun 5 04:48:15 2023 +0000

xfs: fix agf/agfl verification on v4 filesystems

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10bb665b280000
final oops: https://syzkaller.appspot.com/x/report.txt?x=12bb665b280000
console output: https://syzkaller.appspot.com/x/log.txt?x=14bb665b280000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: e0a8de7da35e ("xfs: fix agf/agfl verification on v4 filesystems")

XFS (loop0): WARNING: Reset corrupted AGFL on AG 0. 4 blocks leaked. Please unmount and run xfs_repair.
XFS (loop0): Internal error !ino_ok at line 213 of file fs/xfs/libxfs/xfs_dir2.c. Caller xfs_dir_ino_validate+0x2c/0x90 fs/xfs/libxfs/xfs_dir2.c:220
CPU: 1 PID: 46 Comm: kworker/u4:3 Not tainted 6.4.0-rc6-syzkaller-00195-g40f71e7cd3c6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Workqueue: xfs_iwalk-4998 xfs_pwork_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
xfs_error_report fs/xfs/xfs_error.c:384 [inline]
xfs_corruption_error+0x11d/0x170 fs/xfs/xfs_error.c:401
xfs_dir_ino_validate+0x5f/0x90 fs/xfs/libxfs/xfs_dir2.c:213
xfs_dir2_sf_verify+0x487/0x990 fs/xfs/libxfs/xfs_dir2_sf.c:779
xfs_ifork_verify_local_data fs/xfs/libxfs/xfs_inode_fork.c:706 [inline]
xfs_iformat_data_fork+0x4bf/0x6d0 fs/xfs/libxfs/xfs_inode_fork.c:256
xfs_inode_from_disk+0xbbf/0x1070 fs/xfs/libxfs/xfs_inode_buf.c:245
xfs_iget_cache_miss fs/xfs/xfs_icache.c:639 [inline]
xfs_iget+0xf08/0x3050 fs/xfs/xfs_icache.c:777
xfs_qm_dqusage_adjust+0x228/0x670 fs/xfs/xfs_qm.c:1157
xfs_iwalk_ag_recs+0x486/0x7c0 fs/xfs/xfs_iwalk.c:220
xfs_iwalk_run_callbacks+0x25b/0x490 fs/xfs/xfs_iwalk.c:376
xfs_iwalk_ag+0xad6/0xbd0 fs/xfs/xfs_iwalk.c:482
xfs_iwalk_ag_work+0xfb/0x1b0 fs/xfs/xfs_iwalk.c:624
xfs_pwork_work+0x7c/0x190 fs/xfs/xfs_pwork.c:47
process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2405
worker_thread+0xa63/0x1210 kernel/workqueue.c:2552
kthread+0x2b8/0x350 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
XFS (loop0): Corruption detected. Unmount and run xfs_repair
XFS (loop0): Invalid inode number 0x24
XFS (loop0): Metadata corruption detected at xfs_dir2_sf_verify+0x767/0x990 fs/xfs/libxfs/xfs_dir2_sf.c:774, inode 0x23 data fork
XFS (loop0): Unmount and run xfs_repair
XFS (loop0): First 32 bytes of corrupted metadata buffer:
00000000: 02 00 00 00 00 20 05 00 30 66 69 6c 65 30 01 00 ..... ..0file0..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2023-06-21 08:08:53

by Eric Biggers

[permalink] [raw]
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

Hi Dave,

On Wed, Jun 21, 2023 at 05:07:15PM +1000, 'Dave Chinner' via syzkaller-bugs wrote:
> On Tue, Jun 20, 2023 at 07:10:19PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 40f71e7cd3c6 Merge tag 'net-6.4-rc7' of git://git.kernel.o..
> > git tree: upstream
> > console+strace: https://syzkaller.appspot.com/x/log.txt?x=158b99d3280000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=7ff8f87c7ab0e04e
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9d0b0d54a8bd799f6ae4
> > compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ab4537280000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=148326ef280000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/2dc89d5fee38/disk-40f71e7c.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/0ced5a475218/vmlinux-40f71e7c.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/d543a4f69684/bzImage-40f71e7c.xz
> > mounted in repro: https://storage.googleapis.com/syzbot-assets/e2012b787a31/mount_0.gz
> >
> > The issue was bisected to:
> >
> > commit e0a8de7da35e5b22b44fa1013ccc0716e17b0c14
> > Author: Dave Chinner <[email protected]>
> > Date: Mon Jun 5 04:48:15 2023 +0000
> >
> > xfs: fix agf/agfl verification on v4 filesystems
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10bb665b280000
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=12bb665b280000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14bb665b280000
>
> WTAF?
>
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
> > Fixes: e0a8de7da35e ("xfs: fix agf/agfl verification on v4 filesystems")
> >
> > XFS (loop0): WARNING: Reset corrupted AGFL on AG 0. 4 blocks leaked. Please unmount and run xfs_repair.
> > XFS (loop0): Internal error !ino_ok at line 213 of file fs/xfs/libxfs/xfs_dir2.c. Caller xfs_dir_ino_validate+0x2c/0x90 fs/xfs/libxfs/xfs_dir2.c:220
> > CPU: 1 PID: 46 Comm: kworker/u4:3 Not tainted 6.4.0-rc6-syzkaller-00195-g40f71e7cd3c6 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> > Workqueue: xfs_iwalk-4998 xfs_pwork_work
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > xfs_error_report fs/xfs/xfs_error.c:384 [inline]
> > xfs_corruption_error+0x11d/0x170 fs/xfs/xfs_error.c:401
> > xfs_dir_ino_validate+0x5f/0x90 fs/xfs/libxfs/xfs_dir2.c:213
> > xfs_dir2_sf_verify+0x487/0x990 fs/xfs/libxfs/xfs_dir2_sf.c:779
> > xfs_ifork_verify_local_data fs/xfs/libxfs/xfs_inode_fork.c:706 [inline]
> > xfs_iformat_data_fork+0x4bf/0x6d0 fs/xfs/libxfs/xfs_inode_fork.c:256
> > xfs_inode_from_disk+0xbbf/0x1070 fs/xfs/libxfs/xfs_inode_buf.c:245
> > xfs_iget_cache_miss fs/xfs/xfs_icache.c:639 [inline]
> > xfs_iget+0xf08/0x3050 fs/xfs/xfs_icache.c:777
> > xfs_qm_dqusage_adjust+0x228/0x670 fs/xfs/xfs_qm.c:1157
> > xfs_iwalk_ag_recs+0x486/0x7c0 fs/xfs/xfs_iwalk.c:220
> > xfs_iwalk_run_callbacks+0x25b/0x490 fs/xfs/xfs_iwalk.c:376
> > xfs_iwalk_ag+0xad6/0xbd0 fs/xfs/xfs_iwalk.c:482
> > xfs_iwalk_ag_work+0xfb/0x1b0 fs/xfs/xfs_iwalk.c:624
> > xfs_pwork_work+0x7c/0x190 fs/xfs/xfs_pwork.c:47
> > process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2405
> > worker_thread+0xa63/0x1210 kernel/workqueue.c:2552
> > kthread+0x2b8/0x350 kernel/kthread.c:379
> > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
> > </TASK>
> > XFS (loop0): Corruption detected. Unmount and run xfs_repair
> > XFS (loop0): Invalid inode number 0x24
> > XFS (loop0): Metadata corruption detected at xfs_dir2_sf_verify+0x767/0x990 fs/xfs/libxfs/xfs_dir2_sf.c:774, inode 0x23 data fork
> > XFS (loop0): Unmount and run xfs_repair
> > XFS (loop0): First 32 bytes of corrupted metadata buffer:
> > 00000000: 02 00 00 00 00 20 05 00 30 66 69 6c 65 30 01 00 ..... ..0file0..
>
> syzbot corrupted a v4 filesystem.
>
> Syzbot corrupted the superblock, XFS detected and corrected that.
>
> Syzbot corrupted the AGI. XFS detected that.
>
> Syzbot corrupted the AGF and AGFL. XFS detected and corrected that,
> allowing operations to continue.
>
> Syzbot also corrupted a directory inode. XFS detected that and
> warned about it.
>
> Test finished.
>
> At no point did the kernel crash, oops, do anything bad like a UAF
> or OOB read. All XFS did was catch the corruptions, fix some of them
> so it could continue operating, and warn the user that they need to
> unmount and run repair.
>
> So exactly what is syzbot complaining about here? There's no kernel
> issue here at all.
>
> Also, I cannot tell syzbot "don't ever report this as a bug again",
> so the syzbot developers are going to have to triage and fix this
> syzbot problem themselves so it doesn't keep getting reported to
> us...

I think the problem here was that XFS logged a message beginning with
"WARNING:", followed by a stack trace. In the log that looks like a warning
generated by the WARN_ON() macro, which is meant for reporting recoverable
kernel bugs. It's difficult for any program to understand the log in cases like
this. This is why include/asm-generic/bug.h contains the following comment:

* Do not include "BUG"/"WARNING" in format strings manually to make these
* conditions distinguishable from kernel issues.

If you have a constructive suggestion of how all programs that parse the kernel
log can identify real warnings reliably without getting confused by cases like
this, I'm sure that would be appreciated. It would need to be documented and
then the guidance in bug.h could then be removed. But until then, the above is
the current guidance.

- Eric

2023-06-21 08:39:03

by Dave Chinner

[permalink] [raw]
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

On Tue, Jun 20, 2023 at 07:10:19PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 40f71e7cd3c6 Merge tag 'net-6.4-rc7' of git://git.kernel.o..
> git tree: upstream
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=158b99d3280000
> kernel config: https://syzkaller.appspot.com/x/.config?x=7ff8f87c7ab0e04e
> dashboard link: https://syzkaller.appspot.com/bug?extid=9d0b0d54a8bd799f6ae4
> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ab4537280000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=148326ef280000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/2dc89d5fee38/disk-40f71e7c.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/0ced5a475218/vmlinux-40f71e7c.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/d543a4f69684/bzImage-40f71e7c.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/e2012b787a31/mount_0.gz
>
> The issue was bisected to:
>
> commit e0a8de7da35e5b22b44fa1013ccc0716e17b0c14
> Author: Dave Chinner <[email protected]>
> Date: Mon Jun 5 04:48:15 2023 +0000
>
> xfs: fix agf/agfl verification on v4 filesystems
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10bb665b280000
> final oops: https://syzkaller.appspot.com/x/report.txt?x=12bb665b280000
> console output: https://syzkaller.appspot.com/x/log.txt?x=14bb665b280000

WTAF?

> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
> Fixes: e0a8de7da35e ("xfs: fix agf/agfl verification on v4 filesystems")
>
> XFS (loop0): WARNING: Reset corrupted AGFL on AG 0. 4 blocks leaked. Please unmount and run xfs_repair.
> XFS (loop0): Internal error !ino_ok at line 213 of file fs/xfs/libxfs/xfs_dir2.c. Caller xfs_dir_ino_validate+0x2c/0x90 fs/xfs/libxfs/xfs_dir2.c:220
> CPU: 1 PID: 46 Comm: kworker/u4:3 Not tainted 6.4.0-rc6-syzkaller-00195-g40f71e7cd3c6 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> Workqueue: xfs_iwalk-4998 xfs_pwork_work
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> xfs_error_report fs/xfs/xfs_error.c:384 [inline]
> xfs_corruption_error+0x11d/0x170 fs/xfs/xfs_error.c:401
> xfs_dir_ino_validate+0x5f/0x90 fs/xfs/libxfs/xfs_dir2.c:213
> xfs_dir2_sf_verify+0x487/0x990 fs/xfs/libxfs/xfs_dir2_sf.c:779
> xfs_ifork_verify_local_data fs/xfs/libxfs/xfs_inode_fork.c:706 [inline]
> xfs_iformat_data_fork+0x4bf/0x6d0 fs/xfs/libxfs/xfs_inode_fork.c:256
> xfs_inode_from_disk+0xbbf/0x1070 fs/xfs/libxfs/xfs_inode_buf.c:245
> xfs_iget_cache_miss fs/xfs/xfs_icache.c:639 [inline]
> xfs_iget+0xf08/0x3050 fs/xfs/xfs_icache.c:777
> xfs_qm_dqusage_adjust+0x228/0x670 fs/xfs/xfs_qm.c:1157
> xfs_iwalk_ag_recs+0x486/0x7c0 fs/xfs/xfs_iwalk.c:220
> xfs_iwalk_run_callbacks+0x25b/0x490 fs/xfs/xfs_iwalk.c:376
> xfs_iwalk_ag+0xad6/0xbd0 fs/xfs/xfs_iwalk.c:482
> xfs_iwalk_ag_work+0xfb/0x1b0 fs/xfs/xfs_iwalk.c:624
> xfs_pwork_work+0x7c/0x190 fs/xfs/xfs_pwork.c:47
> process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2405
> worker_thread+0xa63/0x1210 kernel/workqueue.c:2552
> kthread+0x2b8/0x350 kernel/kthread.c:379
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
> </TASK>
> XFS (loop0): Corruption detected. Unmount and run xfs_repair
> XFS (loop0): Invalid inode number 0x24
> XFS (loop0): Metadata corruption detected at xfs_dir2_sf_verify+0x767/0x990 fs/xfs/libxfs/xfs_dir2_sf.c:774, inode 0x23 data fork
> XFS (loop0): Unmount and run xfs_repair
> XFS (loop0): First 32 bytes of corrupted metadata buffer:
> 00000000: 02 00 00 00 00 20 05 00 30 66 69 6c 65 30 01 00 ..... ..0file0..

syzbot corrupted a v4 filesystem.

Syzbot corrupted the superblock, XFS detected and corrected that.

Syzbot corrupted the AGI. XFS detected that.

Syzbot corrupted the AGF and AGFL. XFS detected and corrected that,
allowing operations to continue.

Syzbot also corrupted a directory inode. XFS detected that and
warned about it.

Test finished.

At no point did the kernel crash, oops, do anything bad like a UAF
or OOB read. All XFS did was catch the corruptions, fix some of them
so it could continue operating, and warn the user that they need to
unmount and run repair.

So exactly what is syzbot complaining about here? There's no kernel
issue here at all.

Also, I cannot tell syzbot "don't ever report this as a bug again",
so the syzbot developers are going to have to triage and fix this
syzbot problem themselves so it doesn't keep getting reported to
us...

-Dave.
--
Dave Chinner
[email protected]

2023-06-22 09:32:02

by Dave Chinner

[permalink] [raw]
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

On Wed, Jun 21, 2023 at 12:54:21AM -0700, Eric Biggers wrote:
> On Wed, Jun 21, 2023 at 05:07:15PM +1000, 'Dave Chinner' via syzkaller-bugs wrote:
> > On Tue, Jun 20, 2023 at 07:10:19PM -0700, syzbot wrote:
> > So exactly what is syzbot complaining about here? There's no kernel
> > issue here at all.
> >
> > Also, I cannot tell syzbot "don't ever report this as a bug again",
> > so the syzbot developers are going to have to triage and fix this
> > syzbot problem themselves so it doesn't keep getting reported to
> > us...
>
> I think the problem here was that XFS logged a message beginning with
> "WARNING:", followed by a stack trace. In the log that looks like a warning
> generated by the WARN_ON() macro, which is meant for reporting recoverable
> kernel bugs. It's difficult for any program to understand the log in cases like
> this. This is why include/asm-generic/bug.h contains the following comment:
>
> * Do not include "BUG"/"WARNING" in format strings manually to make these
> * conditions distinguishable from kernel issues.

Nice.

Syzbot author doesn't like log messages using certain key words
because it's hard for syzbot to work out what went wrong.

Gets new rule added to kernel in a comment in some header file that
almost nobody doing kernel development work ever looks at.

Nothing was added to the coding style rules or checkpatch so nobody
is likely to accidentally trip over this new rule that nobody has
been told about.

Syzbot maintainer also fails to do an audit of the kernel to remove
all existing "WARNING" keywords from existing log messages so leaves
landmines for subsystems to have to handle at some time in the
future.

Five years later, syzbot trips over a log message containing WARNING
in it that was in code introduced before the rule was "introduced".
Subsystem maintainers are blamed for not know the rule existed.

Result: *yet again* we are being told that our only option is
to *change code that is not broken* just to *shut up some fucking
bot* we have no control over and could happily live without.

> If you have a constructive suggestion of how all programs that
> parse the kernel log can identify real warnings reliably without
> getting confused by cases like this, I'm sure that would be
> appreciated. It would need to be documented and then the guidance
> in bug.h could then be removed. But until then, the above is the
> current guidance.

That is so not the problem here, Eric.

--
Dave Chinner
[email protected]

2023-06-23 01:03:36

by Eric Biggers

[permalink] [raw]
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

On Thu, Jun 22, 2023 at 06:59:56PM +1000, Dave Chinner wrote:
> On Wed, Jun 21, 2023 at 12:54:21AM -0700, Eric Biggers wrote:
> > On Wed, Jun 21, 2023 at 05:07:15PM +1000, 'Dave Chinner' via syzkaller-bugs wrote:
> > > On Tue, Jun 20, 2023 at 07:10:19PM -0700, syzbot wrote:
> > > So exactly what is syzbot complaining about here? There's no kernel
> > > issue here at all.
> > >
> > > Also, I cannot tell syzbot "don't ever report this as a bug again",
> > > so the syzbot developers are going to have to triage and fix this
> > > syzbot problem themselves so it doesn't keep getting reported to
> > > us...
> >
> > I think the problem here was that XFS logged a message beginning with
> > "WARNING:", followed by a stack trace. In the log that looks like a warning
> > generated by the WARN_ON() macro, which is meant for reporting recoverable
> > kernel bugs. It's difficult for any program to understand the log in cases like
> > this. This is why include/asm-generic/bug.h contains the following comment:
> >
> > * Do not include "BUG"/"WARNING" in format strings manually to make these
> > * conditions distinguishable from kernel issues.
>
> Nice.
>
> Syzbot author doesn't like log messages using certain key words
> because it's hard for syzbot to work out what went wrong.
>
> Gets new rule added to kernel in a comment in some header file that
> almost nobody doing kernel development work ever looks at.
>
> Nothing was added to the coding style rules or checkpatch so nobody
> is likely to accidentally trip over this new rule that nobody has
> been told about.
>
> Syzbot maintainer also fails to do an audit of the kernel to remove
> all existing "WARNING" keywords from existing log messages so leaves
> landmines for subsystems to have to handle at some time in the
> future.
>
> Five years later, syzbot trips over a log message containing WARNING
> in it that was in code introduced before the rule was "introduced".
> Subsystem maintainers are blamed for not know the rule existed.
>
> Result: *yet again* we are being told that our only option is
> to *change code that is not broken* just to *shut up some fucking
> bot* we have no control over and could happily live without.
>
> > If you have a constructive suggestion of how all programs that
> > parse the kernel log can identify real warnings reliably without
> > getting confused by cases like this, I'm sure that would be
> > appreciated. It would need to be documented and then the guidance
> > in bug.h could then be removed. But until then, the above is the
> > current guidance.
>
> That is so not the problem here, Eric.
>

Grepping for "WARNING:" is how other kernel testing systems find WARN_ON's in
the log too. For example, see _check_dmesg() in common/rc in xfstests.
xfstests fails tests if "WARNING:" is logged. You might be aware of this, as
you reviewed and applied xfstests commit 47e5d7d2bb17 which added the code.

I understand it's frustrating that Dmitry's attempt to do something about this
problem was incomplete. I don't think it is helpful to then send a reflexive,
adversarial response that shifts the blame for this longstanding problem with
the kernel logs entirely onto syzbot and even Dmitry personally. That just
causes confusion about the problem that needs to be solved.

Anyway, either everything that parses the kernel logs needs to be smarter about
identifying real WARN_ON's, or all instances of "WARNING:" need to be eliminated
from the log (with existing code, coding style guidelines, and checkpatch
updated as you mentioned). I think I'm leaning towards the position that fake
"WARNING:"s should be eliminated. It does seem like a hack, but it makes the
"obvious" log pattern matching that everyone tends to write work as expected...

If you don't want to help, fine, but at least please try not to be obstructive.

- Eric

2023-06-23 03:23:50

by Eric Sandeen

[permalink] [raw]
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

On 6/22/23 7:56 PM, Eric Biggers wrote:
> On Thu, Jun 22, 2023 at 06:59:56PM +1000, Dave Chinner wrote:
>> On Wed, Jun 21, 2023 at 12:54:21AM -0700, Eric Biggers wrote:
>>> On Wed, Jun 21, 2023 at 05:07:15PM +1000, 'Dave Chinner' via syzkaller-bugs wrote:
>>>> On Tue, Jun 20, 2023 at 07:10:19PM -0700, syzbot wrote:
>>>> So exactly what is syzbot complaining about here? There's no kernel
>>>> issue here at all.
>>>>
>>>> Also, I cannot tell syzbot "don't ever report this as a bug again",
>>>> so the syzbot developers are going to have to triage and fix this
>>>> syzbot problem themselves so it doesn't keep getting reported to
>>>> us...
>>>
>>> I think the problem here was that XFS logged a message beginning with
>>> "WARNING:", followed by a stack trace. In the log that looks like a warning
>>> generated by the WARN_ON() macro, which is meant for reporting recoverable
>>> kernel bugs. It's difficult for any program to understand the log in cases like
>>> this. This is why include/asm-generic/bug.h contains the following comment:
>>>
>>> * Do not include "BUG"/"WARNING" in format strings manually to make these
>>> * conditions distinguishable from kernel issues.
>>
>> Nice.
>>
>> Syzbot author doesn't like log messages using certain key words
>> because it's hard for syzbot to work out what went wrong.
>>
>> Gets new rule added to kernel in a comment in some header file that
>> almost nobody doing kernel development work ever looks at.
>>
>> Nothing was added to the coding style rules or checkpatch so nobody
>> is likely to accidentally trip over this new rule that nobody has
>> been told about.
>>
>> Syzbot maintainer also fails to do an audit of the kernel to remove
>> all existing "WARNING" keywords from existing log messages so leaves
>> landmines for subsystems to have to handle at some time in the
>> future.
>>
>> Five years later, syzbot trips over a log message containing WARNING
>> in it that was in code introduced before the rule was "introduced".
>> Subsystem maintainers are blamed for not know the rule existed.
>>
>> Result: *yet again* we are being told that our only option is
>> to *change code that is not broken* just to *shut up some fucking
>> bot* we have no control over and could happily live without.
>>
>>> If you have a constructive suggestion of how all programs that
>>> parse the kernel log can identify real warnings reliably without
>>> getting confused by cases like this, I'm sure that would be
>>> appreciated. It would need to be documented and then the guidance
>>> in bug.h could then be removed. But until then, the above is the
>>> current guidance.
>>
>> That is so not the problem here, Eric.
>>
>
> Grepping for "WARNING:" is how other kernel testing systems find WARN_ON's in
> the log too. For example, see _check_dmesg() in common/rc in xfstests.
> xfstests fails tests if "WARNING:" is logged. You might be aware of this, as
> you reviewed and applied xfstests commit 47e5d7d2bb17 which added the code.
>
> I understand it's frustrating that Dmitry's attempt to do something about this
> problem was incomplete. I don't think it is helpful to then send a reflexive,
> adversarial response that shifts the blame for this longstanding problem with
> the kernel logs entirely onto syzbot and even Dmitry personally. That just
> causes confusion about the problem that needs to be solved.
>
> Anyway, either everything that parses the kernel logs needs to be smarter about
> identifying real WARN_ON's, or all instances of "WARNING:" need to be eliminated
> from the log (with existing code, coding style guidelines, and checkpatch
> updated as you mentioned). I think I'm leaning towards the position that fake
> "WARNING:"s should be eliminated. It does seem like a hack, but it makes the
> "obvious" log pattern matching that everyone tends to write work as expected...
>
> If you don't want to help, fine, but at least please try not to be obstructive.

I didn't read Dave's reply as "obstructive." There's been a trend lately
of ever-growing hoards of people (with machines behind them) generating
ever-more work for a very small and fixed number of developers who are
burning out. It's not sustainable. The work-generators need to help make
things better, or the whole system is going to break.

Dave being frustrated that he has to deal with "bug reports" about a
printk phrase is valid, IMHO. There are many straws breaking the camel's
back these days.

You had asked for a constructive suggestion.

My specific suggestion is that the people who decided that
printk("WARNING") merits must-fix syzbot reports should submit patches
to any subsystem they plan to test, to replace printk("WARNING") with
something that will not trigger syzbot reports. Don't spread that pain
onto every subsystem developer who already has to deal with legitimate
and pressing work. Or, work out some other reliable way to discern
WARN_ON from WARNING.

And add it to checkpatch etc, as Dave suggested.

This falls into the "help us help you" category. Early on, syszbot
filesystem reports presented filesystems only as a giant array of hex in
a C file, leaving it to the poor developer to work out how to use
standard filesystem tools to analyze the input. Now we get standard
images. That's an improvement, with some effort on the syzbot side that
saves time and effort for every filesystem developer forever more. Find
more ways to make these reports more relevant, more accurate, and more
efficient to triage.

That's my constructive suggestion.

-Eric


2023-06-23 04:40:57

by Eric Biggers

[permalink] [raw]
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

On Thu, Jun 22, 2023 at 10:09:55PM -0500, Eric Sandeen wrote:
> > Grepping for "WARNING:" is how other kernel testing systems find WARN_ON's in
> > the log too. For example, see _check_dmesg() in common/rc in xfstests.
> > xfstests fails tests if "WARNING:" is logged. You might be aware of this, as
> > you reviewed and applied xfstests commit 47e5d7d2bb17 which added the code.
> >
> > I understand it's frustrating that Dmitry's attempt to do something about this
> > problem was incomplete. I don't think it is helpful to then send a reflexive,
> > adversarial response that shifts the blame for this longstanding problem with
> > the kernel logs entirely onto syzbot and even Dmitry personally. That just
> > causes confusion about the problem that needs to be solved.
> >
> > Anyway, either everything that parses the kernel logs needs to be smarter about
> > identifying real WARN_ON's, or all instances of "WARNING:" need to be eliminated
> > from the log (with existing code, coding style guidelines, and checkpatch
> > updated as you mentioned). I think I'm leaning towards the position that fake
> > "WARNING:"s should be eliminated. It does seem like a hack, but it makes the
> > "obvious" log pattern matching that everyone tends to write work as expected...
> >
> > If you don't want to help, fine, but at least please try not to be obstructive.
>
> I didn't read Dave's reply as "obstructive." There's been a trend lately of
> ever-growing hoards of people (with machines behind them) generating
> ever-more work for a very small and fixed number of developers who are
> burning out. It's not sustainable. The work-generators need to help make
> things better, or the whole system is going to break.
>
> Dave being frustrated that he has to deal with "bug reports" about a printk
> phrase is valid, IMHO. There are many straws breaking the camel's back these
> days.
>
> You had asked for a constructive suggestion.
>
> My specific suggestion is that the people who decided that printk("WARNING")
> merits must-fix syzbot reports should submit patches to any subsystem they
> plan to test, to replace printk("WARNING") with something that will not
> trigger syzbot reports. Don't spread that pain onto every subsystem
> developer who already has to deal with legitimate and pressing work. Or,
> work out some other reliable way to discern WARN_ON from WARNING.
>
> And add it to checkpatch etc, as Dave suggested.
>
> This falls into the "help us help you" category. Early on, syszbot
> filesystem reports presented filesystems only as a giant array of hex in a C
> file, leaving it to the poor developer to work out how to use standard
> filesystem tools to analyze the input. Now we get standard images. That's an
> improvement, with some effort on the syzbot side that saves time and effort
> for every filesystem developer forever more. Find more ways to make these
> reports more relevant, more accurate, and more efficient to triage.
>
> That's my constructive suggestion.
>

I went ahead and filed an issue against syzkaller for this:
https://github.com/google/syzkaller/issues/3980

I still would like to emphasize that other testing systems such as xfstests do
the same "dmesg | grep WARNING:" thing and therefore have the same problem, at
least in principle. (Whether a test actually finds anything depends on the code
covered, of course.) Again, I'm mentioning this not to try to absolve syzkaller
of responsibility, but rather because it's important that everyone agrees on the
problem here, and ideally its solution too. If people continue operating under
the mistaken belief that this is a syzkaller specific issue, it might be hard to
get kernel patches merged to fix it, especially if those patches involve changes
to checkpatch.pl, CodingStyle, and several dozen different kernel subsystems.
Or, the syzkaller people might go off on their own and find and implement some
way to parse the log reliably, without other the testing systems being fixed...

- Eric

2023-06-30 02:05:24

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

On Thu, Jun 22, 2023 at 09:36:23PM -0700, Eric Biggers wrote:
> On Thu, Jun 22, 2023 at 10:09:55PM -0500, Eric Sandeen wrote:
> > > Grepping for "WARNING:" is how other kernel testing systems find WARN_ON's in
> > > the log too. For example, see _check_dmesg() in common/rc in xfstests.
> > > xfstests fails tests if "WARNING:" is logged. You might be aware of this, as
> > > you reviewed and applied xfstests commit 47e5d7d2bb17 which added the code.
> > >
> > > I understand it's frustrating that Dmitry's attempt to do something about this
> > > problem was incomplete. I don't think it is helpful to then send a reflexive,
> > > adversarial response that shifts the blame for this longstanding problem with
> > > the kernel logs entirely onto syzbot and even Dmitry personally. That just
> > > causes confusion about the problem that needs to be solved.
> > >
> > > Anyway, either everything that parses the kernel logs needs to be smarter about
> > > identifying real WARN_ON's, or all instances of "WARNING:" need to be eliminated
> > > from the log (with existing code, coding style guidelines, and checkpatch
> > > updated as you mentioned). I think I'm leaning towards the position that fake
> > > "WARNING:"s should be eliminated. It does seem like a hack, but it makes the
> > > "obvious" log pattern matching that everyone tends to write work as expected...
> > >
> > > If you don't want to help, fine, but at least please try not to be obstructive.
> >
> > I didn't read Dave's reply as "obstructive." There's been a trend lately of
> > ever-growing hoards of people (with machines behind them) generating
> > ever-more work for a very small and fixed number of developers who are
> > burning out. It's not sustainable. The work-generators need to help make
> > things better, or the whole system is going to break.
> >
> > Dave being frustrated that he has to deal with "bug reports" about a printk
> > phrase is valid, IMHO. There are many straws breaking the camel's back these
> > days.
> >
> > You had asked for a constructive suggestion.
> >
> > My specific suggestion is that the people who decided that printk("WARNING")
> > merits must-fix syzbot reports should submit patches to any subsystem they
> > plan to test, to replace printk("WARNING") with something that will not
> > trigger syzbot reports. Don't spread that pain onto every subsystem
> > developer who already has to deal with legitimate and pressing work. Or,
> > work out some other reliable way to discern WARN_ON from WARNING.
> >
> > And add it to checkpatch etc, as Dave suggested.
> >
> > This falls into the "help us help you" category. Early on, syszbot
> > filesystem reports presented filesystems only as a giant array of hex in a C
> > file, leaving it to the poor developer to work out how to use standard
> > filesystem tools to analyze the input. Now we get standard images. That's an
> > improvement, with some effort on the syzbot side that saves time and effort
> > for every filesystem developer forever more. Find more ways to make these
> > reports more relevant, more accurate, and more efficient to triage.
> >
> > That's my constructive suggestion.
> >
>
> I went ahead and filed an issue against syzkaller for this:
> https://github.com/google/syzkaller/issues/3980
>
> I still would like to emphasize that other testing systems such as xfstests do
> the same "dmesg | grep WARNING:" thing and therefore have the same problem, at
> least in principle. (Whether a test actually finds anything depends on the code

I was /about/ to comment that fstests allows test authors to disable
various parts of the dmesg reporting when they /know/ the code they're
exercising will spit out logging messages with "WARNING" in them, but
then I decided to read the syzkaller issue you filed...

> covered, of course.) Again, I'm mentioning this not to try to absolve syzkaller
> of responsibility, but rather because it's important that everyone agrees on the
> problem here, and ideally its solution too. If people continue operating under
> the mistaken belief that this is a syzkaller specific issue, it might be hard to
> get kernel patches merged to fix it, especially if those patches involve changes
> to checkpatch.pl, CodingStyle, and several dozen different kernel subsystems.

...and observed your claim that bug.h tells you not to use the strings
"BUG" or "WARNING" in a printk message. That's not a rule that /I/ have
ever heard of at any point during my 20 year career writing kernel code.

Assuming this is the commente that you were referring to:

96c6a32ccb55a (Dmitry Vyukov 2018-08-21 21:55:24 -0700 85) * Do not include "BUG"/"WARNING" in format strings manually to make these

It was quietly added to a header file five years ago...

$ git grep BUG.*WARNING Documentation/
$ git grep WARNING.*BUG Documentation/
$

...and isn't documented anywhere. Let me lazily try to figure out how
many printks there are that spit out "BUG:" or "WARNING:"

$ git grep printk.*BUG: | grep -v -E '(DEBUG|KERN_DEBUG)' | wc -l
23
$ git grep printk.*WARNING: | grep -v KERN_WARNING | wc -l
16

No enforcement via checkpatch.pl either. Why /do/ I keep running into
linux conventions that I've never heard of, aren't documented, and
are not checked by any of the automation? I just tripped over rst
heading style complaints a few weeks ago -- same problem!

No wonder you suggested "Follow through with "conversion" to
BUG/WARNING-free log in the kernel." I had wondered if you could at
least use dmesg -r to find KERN_WARNING.*WARNING, but I suppose XFS
does xfs_warn("WARNING...") so that wouldn't work anyway.

Ugh, string parsing. At this point I'm going to back away slowly into
the hedge. Sorry, man, everything is a mess. :(

(That said, if someone sent me a patch changing xfs_warn("WARNING...")
into xfs_notice("NOTICE:") for these "we patched your fs back together"
things, I think I'd be ok with that.)

--D

> Or, the syzkaller people might go off on their own and find and implement some
> way to parse the log reliably, without other the testing systems being fixed...
>
> - Eric

2024-02-07 00:24:17

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

syzbot suspects this issue was fixed by commit:

commit 6f861765464f43a71462d52026fbddfc858239a5
Author: Jan Kara <[email protected]>
Date: Wed Nov 1 17:43:10 2023 +0000

fs: Block writes to mounted block devices

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1110ad7be80000
start commit: 40f71e7cd3c6 Merge tag 'net-6.4-rc7' of git://git.kernel.o..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=7ff8f87c7ab0e04e
dashboard link: https://syzkaller.appspot.com/bug?extid=9d0b0d54a8bd799f6ae4
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ab4537280000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=148326ef280000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: fs: Block writes to mounted block devices

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

2024-02-07 21:18:18

by Dave Chinner

[permalink] [raw]
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

On Tue, Feb 06, 2024 at 04:24:04PM -0800, syzbot wrote:
> syzbot suspects this issue was fixed by commit:
>
> commit 6f861765464f43a71462d52026fbddfc858239a5
> Author: Jan Kara <[email protected]>
> Date: Wed Nov 1 17:43:10 2023 +0000
>
> fs: Block writes to mounted block devices
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1110ad7be80000
> start commit: 40f71e7cd3c6 Merge tag 'net-6.4-rc7' of git://git.kernel.o..
> git tree: upstream
> kernel config: https://syzkaller.appspot.com/x/.config?x=7ff8f87c7ab0e04e
> dashboard link: https://syzkaller.appspot.com/bug?extid=9d0b0d54a8bd799f6ae4
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ab4537280000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=148326ef280000
>
> If the result looks correct, please mark the issue as fixed by replying with:
>
> #syz fix: fs: Block writes to mounted block devices

Hahahahahaha!

syzbot, you're drunk. Go home.

-Dave
--
Dave Chinner
[email protected]