Hi all,
syzkaller reports lots of crashes with the mainline kernel. The latest
I have seen, based on the current ToT, is attached. Backtraces are not
always the same, but this one is typical.
I have not tried to analyze this further.
Guenter
---
scsi 0:0:1:0: Direct-Access Google PersistentDisk 1 PQ: 0 ANSI: 6
BUG: unable to handle page fault for address: ffffdc0000000001
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 11826067
P4D 11826067
PUD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 41 Comm: kworker/u4:2 Not tainted 5.19.0-syzkaller-14374-g5d6a0f4da927 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:__hlist_del include/linux/list.h:886 [inline]
RIP: 0010:hlist_del include/linux/list.h:898 [inline]
RIP: 0010:__alloc_object lib/debugobjects.c:213 [inline]
RIP: 0010:alloc_object lib/debugobjects.c:253 [inline]
RIP: 0010:__debug_object_init+0x761/0xd10 lib/debugobjects.c:570
Code: 00 00 4d 8b 47 08 4c 89 c7 48 c1 ef 03 80 3c 0f 00 0f 85 74 04 00 00 48 85 f6 49 89 30 74 1a 48 8d 7e 08 49 89 fa 49 c1 ea 03 <41> 80 3c 0a 00 0f 85 2f 04 00 00 4c 89 46 08 48 b8 00 01 00 00 00
RSP: 0000:ffffc90000d2f670 EFLAGS: 00010802
RAX: 0000000000000000 RBX: 0000000000001c37 RCX: dffffc0000000000
RDX: ffff8880b9b36c20 RSI: ffff000000000000 RDI: ffff000000000008
RBP: 0000000000001c3b R08: ffffffff91340700 R09: 0000000000001c2b
R10: 1fffe00000000001 R11: 0000000000000215 R12: ffff8880b9b36c20
R13: 0000000000000209 R14: ffffed1017366d85 R15: ffff8881472293f0
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffdc0000000001 CR3: 000000000bc8e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__init_work+0x48/0x50 kernel/workqueue.c:523
pm_runtime_init+0x1fe/0x400 drivers/base/power/runtime.c:1717
device_pm_init drivers/base/power/power.h:168 [inline]
device_initialize+0x23f/0x540 drivers/base/core.c:2985
scsi_sysfs_device_initialize+0x257/0xa60 drivers/scsi/scsi_sysfs.c:1627
scsi_alloc_sdev+0x9b4/0xcc0 drivers/scsi/scsi_scan.c:364
scsi_probe_and_add_lun+0x22be/0x3660 drivers/scsi/scsi_scan.c:1191
__scsi_scan_target+0x21f/0xdb0 drivers/scsi/scsi_scan.c:1673
scsi_scan_channel drivers/scsi/scsi_scan.c:1761 [inline]
scsi_scan_channel+0x148/0x1e0 drivers/scsi/scsi_scan.c:1737
scsi_scan_host_selected+0x2df/0x3b0 drivers/scsi/scsi_scan.c:1790
do_scsi_scan_host+0x1e8/0x260 drivers/scsi/scsi_scan.c:1929
do_scan_async+0x3e/0x500 drivers/scsi/scsi_scan.c:1939
async_run_entry_fn+0x98/0x530 kernel/async.c:127
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
Modules linked in:
CR2: ffffdc0000000001
---[ end trace 0000000000000000 ]---
RIP: 0010:__hlist_del include/linux/list.h:886 [inline]
RIP: 0010:hlist_del include/linux/list.h:898 [inline]
RIP: 0010:__alloc_object lib/debugobjects.c:213 [inline]
RIP: 0010:alloc_object lib/debugobjects.c:253 [inline]
RIP: 0010:__debug_object_init+0x761/0xd10 lib/debugobjects.c:570
Code: 00 00 4d 8b 47 08 4c 89 c7 48 c1 ef 03 80 3c 0f 00 0f 85 74 04 00 00 48 85 f6 49 89 30 74 1a 48 8d 7e 08 49 89 fa 49 c1 ea 03 <41> 80 3c 0a 00 0f 85 2f 04 00 00 4c 89 46 08 48 b8 00 01 00 00 00
RSP: 0000:ffffc90000d2f670 EFLAGS: 00010802
RAX: 0000000000000000 RBX: 0000000000001c37 RCX: dffffc0000000000
RDX: ffff8880b9b36c20 RSI: ffff000000000000 RDI: ffff000000000008
RBP: 0000000000001c3b R08: ffffffff91340700 R09: 0000000000001c2b
R10: 1fffe00000000001 R11: 0000000000000215 R12: ffff8880b9b36c20
R13: 0000000000000209 R14: ffffed1017366d85 R15: ffff8881472293f0
FS: 0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffdc0000000001 CR3: 000000000bc8e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 00 00 add %al,(%rax)
2: 4d 8b 47 08 mov 0x8(%r15),%r8
6: 4c 89 c7 mov %r8,%rdi
9: 48 c1 ef 03 shr $0x3,%rdi
d: 80 3c 0f 00 cmpb $0x0,(%rdi,%rcx,1)
11: 0f 85 74 04 00 00 jne 0x48b
17: 48 85 f6 test %rsi,%rsi
1a: 49 89 30 mov %rsi,(%r8)
1d: 74 1a je 0x39
1f: 48 8d 7e 08 lea 0x8(%rsi),%rdi
23: 49 89 fa mov %rdi,%r10
26: 49 c1 ea 03 shr $0x3,%r10
* 2a: 41 80 3c 0a 00 cmpb $0x0,(%r10,%rcx,1) <-- trapping instruction
2f: 0f 85 2f 04 00 00 jne 0x464
35: 4c 89 46 08 mov %r8,0x8(%rsi)
39: 48 rex.W
3a: b8 00 01 00 00 mov $0x100,%eax
On Sun, Aug 14, 2022 at 2:26 PM Guenter Roeck <[email protected]> wrote:
>
> Hi all,
>
> syzkaller reports lots of crashes with the mainline kernel. The latest
> I have seen, based on the current ToT, is attached. Backtraces are not
> always the same, but this one is typical.
>
> I have not tried to analyze this further.
This smells like the same issue that Andres Freund reported:
https://lore.kernel.org/all/[email protected]/
He blamed the virtio changes, mainly based on the upstream merges
between his bad tree and his last good tree, i.e.
git log --merges --author=torvalds --oneline 7ebfc85e2cd7..69dac8e431af
and assuming those end-points are accurate, that does seem to be the
most likely area.
Andres was going to try to bisect it, but if you have some idea where
this all started, that would help.
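A rough sketch of what that bisection could look like, assuming a git new
enough for first-parent bisection (2.29 or later) and using the endpoints
above:

git bisect start --first-parent
git bisect bad 69dac8e431af     # the bad tree
git bisect good 7ebfc85e2cd7    # the last good tree
# build/boot each candidate merge, then mark it "git bisect good" or "git bisect bad"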
Linus
On Sun, Aug 14, 2022 at 3:37 PM Andres Freund <[email protected]> wrote:
>
> That range had different symptoms, I think (networking not working, but not
> oopsing). I hit similar issues when I tried to reproduce the issue
> interactively, to produce more details, and unwisely did git pull instead of
> checking out the precise revision, ending up with aea23e7c464b. That's when
> symptoms look similar to the above. So it'd be 69dac8e431af..aea23e7c464b
> that I'd be more suspicious of in the context of this thread.
Ok.
> Which would make me look at the following first:
> e140f731f980 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> abe7a481aac9 Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block
> 1da8cf961bb1 Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block
All right, that makes sense. The reported oopses seem to be about block
requests. Some of them were scsi in particular.
Let's bring in Jens and the SCSI people. Maybe that host reference
counting? There's quite a lot of "move freeing around" in that late
scsi pull, even if it was touted as "mostly small bug fixes and
trivial updates".
Here's the two threads:
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
but I guess I'll do an -rc1 regardless of this, because I need to
close the merge window.
Linus
Hi,
On 2022-08-14 14:40:25 -0700, Linus Torvalds wrote:
> On Sun, Aug 14, 2022 at 2:26 PM Guenter Roeck <[email protected]> wrote:
> >
> > Hi all,
> >
> > syzkaller reports lots of crashes with the mainline kernel. The latest
> > I have seen, based on the current ToT, is attached. Backtraces are not
> > always the same, but this one is typical.
> >
> > I have not tried to analyze this further.
>
> This smells like the same issue that Andres Freund reported:
>
> https://lore.kernel.org/all/[email protected]/
>
> He blamed the virtio changes, mainly based on the upstream merges
> between his bad tree and his last good tree, i.e.
>
> git log --merges --author=torvalds --oneline 7ebfc85e2cd7..69dac8e431af
>
> and assuming those end-points are accurate, that does seem to be the
> most likely area.
That range had different symptoms, I think (networking not working, but not
oopsing). I hit similar issues when I tried to reproduce the issue
interactively, to produce more details, and unwisely did git pull instead of
checking out the precise revision, ending up with aea23e7c464b. That's when
symptoms look similar to the above. So it'd be 69dac8e431af..aea23e7c464b
that I'd be more suspicious of in the context of this thread.
Which would make me look at the following first:
e140f731f980 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
abe7a481aac9 Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block
1da8cf961bb1 Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block
But I'm not a kernel developer...
> Andres was going to try to bisect it, but if you have some idea where
> this all started, that would help.
I started, but it's one of those days.
First I got sidetracked by the interactive "serial console" of Google
Cloud apparently not working during grub, making it harder to recover from a
bad kernel.
Then by a bug in git bisect. For some reason debian sid's git is stuck at:
clear_distance (list=0x5616f56656a0) at ./bisect.c:71
71 ./bisect.c: No such file or directory.
(gdb) bt
#0 clear_distance (list=0x5616f56656a0) at ./bisect.c:71
#1 do_find_bisection (bisect_flags=0, weights=0x561701309e50, nr=1120570, list=<optimized out>) at ./bisect.c:332
#2 find_bisection (commit_list=commit_list@entry=0x7ffe32540d50, reaches=reaches@entry=0x7ffe32540c90, all=all@entry=0x7ffe32540c94,
bisect_flags=bisect_flags@entry=0) at ./bisect.c:428
#3 0x00005616b63636ae in bisect_next_all (r=<optimized out>, prefix=prefix@entry=0x0) at ./bisect.c:1047
#4 0x00005616b628bad7 in bisect_next (terms=terms@entry=0x7ffe32542720, prefix=prefix@entry=0x0) at builtin/bisect--helper.c:637
#5 0x00005616b628be42 in bisect_auto_next (terms=terms@entry=0x7ffe32542720, prefix=0x0) at builtin/bisect--helper.c:656
#6 0x00005616b628c2c0 in bisect_state (terms=terms@entry=0x7ffe32542720, argv=<optimized out>, argv@entry=0x7ffe32543be0, argc=<optimized out>, argc@entry=2)
at builtin/bisect--helper.c:971
#7 0x00005616b628cc18 in cmd_bisect__helper (argc=2, argv=0x7ffe32543be0, prefix=<optimized out>) at builtin/bisect--helper.c:1356
#8 0x00005616b627f3aa in run_builtin (argv=0x7ffe32543be0, argc=4, p=0x5616b65b6818 <commands+120>) at ./git.c:466
#9 handle_builtin (argc=4, argv=argv@entry=0x7ffe32543be0) at ./git.c:720
#10 0x00005616b6280662 in run_argv (argv=0x7ffe325438e0, argcp=0x7ffe325438ec) at ./git.c:787
#11 cmd_main (argc=<optimized out>, argc@entry=5, argv=<optimized out>, argv@entry=0x7ffe32543bd8) at ./git.c:920
#12 0x00005616b627f085 in main (argc=5, argv=0x7ffe32543bd8) at ./common-main.c:56
I suspect it's somehow related to starting out with a shallow clone and then
fetching the full history. So my next step is going to be to start with a fresh
clone. But it's dinner prep time now, so it'll have to wait a bit.
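If the shallow history really is what confuses bisect, a smaller fix than a
fresh clone might be to just un-shallow the existing one - a sketch, assuming
the remote is called origin:

git fetch --unshallow origin    # turn the shallow clone into a full one
git bisect reset                # drop the possibly-confused bisect state
git bisect start
git bisect bad <bad-rev>        # re-mark the endpoints and continue from there
git bisect good <good-rev>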
Greetings,
Andres Freund
On Sun, Aug 14, 2022 at 03:47:44PM -0700, Linus Torvalds wrote:
> On Sun, Aug 14, 2022 at 3:37 PM Andres Freund <[email protected]> wrote:
> >
> > That range had different symptoms, I think (networking not working, but not
> > oopsing). I hit similar issues when I tried to reproduce the issue
> > interactively, to produce more details, and unwisely did git pull instead of
> > checking out the precise revision, ending up with aea23e7c464b. That's when
> > symptoms look similar to the above. So it'd be 69dac8e431af..aea23e7c464b
> > that I'd be more suspicious of in the context of this thread.
>
> Ok.
>
> > Which would make me look at the following first:
> > e140f731f980 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> > abe7a481aac9 Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block
> > 1da8cf961bb1 Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block
>
> All right, that makes sense. The reported oopses seem to be about block
> requests. Some of them were scsi in particular.
>
> Let's bring in Jens and the SCSI people. Maybe that host reference
> counting? There's quite a lot of "move freeing around" in that late
> scsi pull, even if it was touted as "mostly small bug fixes and
> trivial updates".
>
I may be wrong, but I don't think it is SCSI. The crashes are all over the
place. Here is another one that just came in. I can dig up more tomorrow.
Guenter
---
list_add corruption. next->prev should be prev (ffff8881401c0a00), but was ffff000000000000. (next=ffff88801fb50308).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:27
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 24 Comm: kdevtmpfs Not tainted 5.19.0-syzkaller-14374-g5d6a0f4da927 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
RIP: 0010:__list_add_valid.cold+0xf/0x58 lib/list_debug.c:27
Code: 48 c7 c6 00 ec 48 8a 48 89 ef 49 c7 c7 ea ff ff ff e8 5b 63 05 00 e9 c2 7d b6 fa 4c 89 e1 48 c7 c7 a0 f2 48 8a e8 95 f2 f0 ff <0f> 0b 48 c7 c7 40 f2 48 8a e8 87 f2 f0 ff 0f 0b 48 c7 c7 a0 f1 48
RSP: 0018:ffffc900001efc10 EFLAGS: 00010286
RAX: 0000000000000075 RBX: ffff8881401c0000 RCX: 0000000000000000
RDX: ffff888012620000 RSI: ffffffff8161f148 RDI: fffff5200003df74
RBP: ffff88801db8b588 R08: 0000000000000075 R09: 0000000000000000
R10: 0000000080000001 R11: 0000000000000000 R12: ffff88801fb50308
R13: ffff88801fb50308 R14: ffff8881401c0000 R15: ffff88801db8b588
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000000bc8e000 CR4: 0000000000350ef0
Call Trace:
<TASK>
__list_add include/linux/list.h:69 [inline]
list_add include/linux/list.h:88 [inline]
inode_sb_list_add fs/inode.c:495 [inline]
new_inode+0x114/0x270 fs/inode.c:1049
shmem_get_inode+0x19b/0xe00 mm/shmem.c:2306
shmem_mknod+0x5a/0x1f0 mm/shmem.c:2873
vfs_mknod+0x4d2/0x7e0 fs/namei.c:3892
handle_create+0x340/0x4b3 drivers/base/devtmpfs.c:226
handle drivers/base/devtmpfs.c:391 [inline]
devtmpfs_work_loop drivers/base/devtmpfs.c:406 [inline]
devtmpfsd+0x1a4/0x2a3 drivers/base/devtmpfs.c:448
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_add_valid.cold+0xf/0x58 lib/list_debug.c:27
Code: 48 c7 c6 00 ec 48 8a 48 89 ef 49 c7 c7 ea ff ff ff e8 5b 63 05 00 e9 c2 7d b6 fa 4c 89 e1 48 c7 c7 a0 f2 48 8a e8 95 f2 f0 ff <0f> 0b 48 c7 c7 40 f2 48 8a e8 87 f2 f0 ff 0f 0b 48 c7 c7 a0 f1 48
RSP: 0018:ffffc900001efc10 EFLAGS: 00010286
RAX: 0000000000000075 RBX: ffff8881401c0000 RCX: 0000000000000000
RDX: ffff888012620000 RSI: ffffffff8161f148 RDI: fffff5200003df74
RBP: ffff88801db8b588 R08: 0000000000000075 R09: 0000000000000000
R10: 0000000080000001 R11: 0000000000000000 R12: ffff88801fb50308
R13: ffff88801fb50308 R14: ffff8881401c0000 R15: ffff88801db8b588
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000000bc8e000 CR4: 0000000000350ef0
Hi,
On 2022-08-14 19:04:22 -0600, Jens Axboe wrote:
> I took a quick look and added more SCSI bits to my vm images, but
> haven't been able to hit it.
Didn't immediately hit anything locally in a vm either...
> Sounds like Andres is already bisecting this, so I guess we'll be wiser
> soon enough.
I started bisecting the network issue, as it occurred first, and who knows
what else it could affect. Will bisect the other range after.
Due to the serial console issue mentioned upthread it's pretty slow
going. Each iteration I create a new gcp snapshot and vm. Adds like ~10min.
It doesn't help that some intermediary steps don't boot (with different symptoms)
and another doesn't immediately build...
Some of the symptoms could be related to the issue in this thread, hence
listing them here
a34bed37fc9d (skipped):
[ 3.128807] systemd[1]: Hostname set to <debian>.
[ 3.240524] general protection fault, probably for non-canonical address 0xffff003800000800: 0000 [#1] PREEMPT SMP PTI
[ 3.242591] CPU: 1 PID: 194 Comm: systemd-rc-loca Not tainted 5.19.0-bisect3-00075-ga34bed37fc9d #2
[ 3.244337] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[ 3.246262] RIP: 0010:kmem_cache_alloc+0xc2/0x3f0
[ 3.247272] Code: 6c a1 d4 57 48 8b 50 08 48 83 78 10 00 4c 8b 20 0f 84 d3 02 00 00 4d 85 e4 0f 84 ca 02 00 00 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 8b 45 28
[ 3.247275] RSP: 0018:ffffbb538032bdd0 EFLAGS: 00010286
[ 3.247278] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000286101
[ 3.247280] RDX: 00000000002860c1 RSI: 0000000000000cc0 RDI: 000000000002f9a0
[ 3.247282] RBP: ffffa43c4011e100 R08: 0000000000000000 R09: 0000000000000000
[ 3.247283] R10: 0000000000000000 R11: 0000000000000000 R12: ffff003800000000
[ 3.247285] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffffa82f80d6
[ 3.247290] FS: 00007f4036169940(0000) GS:ffffa43d77d00000(0000) knlGS:0000000000000000
[ 3.247292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.247294] CR2: 00007f40370015e2 CR3: 000000010032e004 CR4: 00000000003706e0
[ 3.247296] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3.247297] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3.247298] Call Trace:
[ 3.247301] <TASK>
[ 3.247304] getname_flags.part.0+0x26/0x1c0
[ 3.267827] user_path_at_empty+0x1b/0x50
[ 3.269300] user_statfs+0x4f/0xd0
[ 3.270092] __do_sys_statfs+0x35/0x70
[ 3.270836] do_syscall_64+0x37/0x90
[ 3.271499] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 3.272612] RIP: 0033:0x7f4036cb93a7
[ 3.273360] Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 05 61 8a 0f 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 b8 89 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 8a 0f 00 f7 d8 64 89 02 b8
[ 3.277027] RSP: 002b:00007ffff2133a18 EFLAGS: 00000286 ORIG_RAX: 0000000000000089
[ 3.278410] RAX: ffffffffffffffda RBX: 00007f40370015e2 RCX: 00007f4036cb93a7
[ 3.279980] RDX: 00007f403616a360 RSI: 00007ffff2133a20 RDI: 00007f403701b2e5
[ 3.281688] RBP: 00007ffff2133a20 R08: 0000000000000000 R09: 00007f40370d8aa0
[ 3.283211] R10: 00007f40370d5518 R11: 0000000000000286 R12: 00007ffff2133d08
[ 3.284602] R13: 00007ffff2133b70 R14: 0000000000000000 R15: 00007f4037107020
[ 3.286087] </TASK>
[ 3.286672] Modules linked in:
[ 3.287490] ---[ end trace 0000000000000000 ]---
repeats across boots
6b60b9c008e5, doesn't build without warnings (circumvented by CONFIG_WERROR=n):
drivers/virtio/virtio_ring.c:1118:12: error: ‘virtqueue_resize_split’ defined but not used [-Werror=unused-function]
1118 | static int virtqueue_resize_split(struct virtqueue *_vq, u32 num)
| ^~~~~~~~~~~~~~~~~~~~~~
6e33fa094f1f, first attempt didn't boot with below oops. Second boot showed
the network issue.
[ 5.809314] BUG: unable to handle page fault for address: ffffa17537d00038
[ 5.810318] #PF: supervisor read access in kernel mode
[ 5.811040] #PF: error_code(0x0000) - not-present page
[ 5.811820] PGD 100013067 P4D 100013067 PUD 0
[ 5.812479] Oops: 0000 [#1] PREEMPT SMP PTI
[ 5.813089] CPU: 1 PID: 1 Comm: systemd Not tainted 5.19.0-bisect6-00059-g6e33fa094f1f #6
[ 5.814215] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[ 5.815473] RIP: 0010:cgroup_rstat_updated+0x1d/0xc0
[ 5.816294] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 41 54 48 c7 c2 64 e6 01 00 55 48 63 ee 48 8b 0c ed a0 58 58 a1 53 48 8b 87 08 03 00 00 <48> 83 7c 01 38 00 74 09 5b 5d 41 5c c3 cc cc cc cc 4c 8d 24 11 48
[ 5.818885] RSP: 0018:ffff97f30001fb80 EFLAGS: 00010092
[ 5.819885] RAX: 00000fff00000000 RBX: 0000000000000001 RCX: ffff917637d00000
[ 5.820913] RDX: 000000000001e664 RSI: 0000000000000001 RDI: ffff917503463800
[ 5.821957] RBP: 0000000000000001 R08: ffffffffa2056600 R09: 0000000000000000
[ 5.822935] R10: ffffffffa2057e20 R11: 0000000000000003 R12: ffff9175035fa000
[ 5.824071] R13: 0000000000000cc0 R14: ffff9175035fa000 R15: 0000000000000001
[ 5.825223] FS: 00007f0541dfe940(0000) GS:ffff917637d00000(0000) knlGS:0000000000000000
[ 5.826428] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.827293] CR2: ffffa17537d00038 CR3: 0000000100324001 CR4: 00000000003706e0
[ 5.828315] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5.829350] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5.830313] Call Trace:
[ 5.830670] <TASK>
[ 5.830986] __mod_memcg_state+0x3c/0xa0
[ 5.831576] memcg_account_kmem+0x19/0x60
[ 5.832159] obj_cgroup_charge+0xc6/0x190
[ 5.832780] ? alloc_inode+0x54/0xc0
[ 5.833296] kmem_cache_alloc_lru+0xba/0x500
[ 5.833901] alloc_inode+0x54/0xc0
[ 5.834381] iget_locked+0xdc/0x200
[ 5.834876] kernfs_get_inode+0x13/0x110
[ 5.835423] kernfs_iop_lookup+0x6f/0xd0
[ 5.835973] path_openat+0x715/0x1250
[ 5.836487] ? path_lookupat+0x92/0x190
[ 5.837035] do_filp_open+0xaa/0x150
[ 5.837538] do_sys_openat2+0xaa/0x160
[ 5.838065] __x64_sys_openat+0x65/0x90
[ 5.838602] do_syscall_64+0x37/0x90
[ 5.839108] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 5.839802] RIP: 0033:0x7f05429637d7
[ 5.840305] Code: 25 00 00 41 00 3d 00 00 41 00 74 47 64 8b 04 25 18 00 00 00 85 c0 75 6b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 95 00 00 00 48 8b 54 24 28 64 48 2b 14 25
[ 5.843341] RSP: 002b:00007ffe006571f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 5.844355] RAX: ffffffffffffffda RBX: 000055bacd0216f0 RCX: 00007f05429637d7
[ 5.845579] RDX: 0000000000080241 RSI: 000055bacd12dd00 RDI: 00000000ffffff9c
[ 5.846580] RBP: 000055bacd12dd00 R08: 0000000000000004 R09: 0000000000000001
[ 5.847974] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000080241
[ 5.848963] R13: 000055bacd0216f0 R14: 0000000000000001 R15: 0000000000000001
[ 5.849987] </TASK>
[ 5.850308] Modules linked in:
[ 5.850782] CR2: ffffa17537d00038
[ 5.851297] ---[ end trace 0000000000000000 ]---
[ 5.852030] RIP: 0010:cgroup_rstat_updated+0x1d/0xc0
[ 5.852805] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 41 54 48 c7 c2 64 e6 01 00 55 48 63 ee 48 8b 0c ed a0 58 58 a1 53 48 8b 87 08 03 00 00 <48> 83 7c 01 38 00 74 09 5b 5d 41 5c c3 cc cc cc cc 4c 8d 24 11 48
[ 5.855252] RSP: 0018:ffff97f30001fb80 EFLAGS: 00010092
[ 5.855964] RAX: 00000fff00000000 RBX: 0000000000000001 RCX: ffff917637d00000
[ 5.857004] RDX: 000000000001e664 RSI: 0000000000000001 RDI: ffff917503463800
[ 5.857961] RBP: 0000000000000001 R08: ffffffffa2056600 R09: 0000000000000000
[ 5.858953] R10: ffffffffa2057e20 R11: 0000000000000003 R12: ffff9175035fa000
[ 5.859967] R13: 0000000000000cc0 R14: ffff9175035fa000 R15: 0000000000000001
[ 5.860958] FS: 00007f0541dfe940(0000) GS:ffff917637d00000(0000) knlGS:0000000000000000
[ 5.862043] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.862822] CR2: ffffa17537d00038 CR3: 0000000100324001 CR4: 00000000003706e0
[ 5.863916] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5.864896] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5.865911] BUG: unable to handle page fault for address: ffffa17537d00000
[ 5.866933] #PF: supervisor write access in kernel mode
[ 5.867748] #PF: error_code(0x0002) - not-present page
[ 5.868462] PGD 100013067 P4D 100013067 PUD 0
[ 5.869078] Oops: 0002 [#2] PREEMPT SMP PTI
[ 5.869681] CPU: 1 PID: 1 Comm: systemd Tainted: G D 5.19.0-bisect6-00059-g6e33fa094f1f #6
[ 5.870978] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[ 5.873226] RIP: 0010:__cgroup_account_cputime_field+0x3c/0x90
[ 5.874086] Code: 53 e8 b8 99 f7 ff 49 8b 9c 24 08 03 00 00 e8 9b 5b 95 00 89 c0 48 03 1c c5 a0 58 58 a1 83 fd 01 76 3a 83 ed 02 83 fd 02 77 03 <4c> 01 2b e8 7c 5b 95 00 4c 89 e7 89 c6 e8 c2 fc ff ff bf 01 00 00
[ 5.876610] RSP: 0018:ffff97f3000cce90 EFLAGS: 00010093
[ 5.877501] RAX: 0000000000000001 RBX: ffffa17537d00000 RCX: 0000000080010001
[ 5.878495] RDX: 0000000000000000 RSI: ffffffffa14ebec1 RDI: ffffffffa14f66a5
[ 5.879453] RBP: 0000000000000000 R08: 0000000000010000 R09: 000000000000000b
[ 5.880430] R10: 0000000000000022 R11: 00000000fffb9332 R12: ffff917503463800
[ 5.881453] R13: 00000000000f177d R14: ffff917637d1e040 R15: ffff917637d1e540
[ 5.883045] FS: 00007f0541dfe940(0000) GS:ffff917637d00000(0000) knlGS:0000000000000000
[ 5.884195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.884973] CR2: ffffa17537d00000 CR3: 0000000100324001 CR4: 00000000003706e0
[ 5.886588] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5.887577] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5.888536] Call Trace:
[ 5.888892] <IRQ>
[ 5.889193] account_system_index_time+0x8c/0xa0
[ 5.889833] update_process_times+0x1f/0xb0
[ 5.890416] ? update_wall_time+0x7/0x20
[ 5.890964] tick_sched_handle+0x2f/0x40
[ 5.891509] tick_sched_timer+0x6a/0x80
[ 5.892043] ? tick_sched_do_timer+0x90/0x90
[ 5.892634] __hrtimer_run_queues+0x122/0x2c0
[ 5.893331] hrtimer_interrupt+0xf9/0x210
[ 5.893979] __sysvec_apic_timer_interrupt+0x77/0x160
[ 5.894706] sysvec_apic_timer_interrupt+0x85/0xb0
[ 5.895387] </IRQ>
[ 5.895712] <TASK>
[ 5.896117] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 5.897454] RIP: 0010:_raw_spin_unlock_irq+0x9/0x30
[ 5.898124] Code: 05 9c 1e 34 5f 85 c0 74 05 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 c6 07 00 fb bf 01 00 00 00 <e8> 82 9f 61 ff 65 8b 05 6b 1e 34 5f 85 c0 74 05 c3 cc cc cc cc 0f
[ 5.900937] RSP: 0018:ffff97f30001fee0 EFLAGS: 00000202
[ 5.901729] RAX: ffff9175001c0000 RBX: ffff9175001f8000 RCX: 0000000000000000
[ 5.902720] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
[ 5.903676] RBP: 0000000000000000 R08: 00000000ffffdfff R09: 00000000ffffdfff
[ 5.904677] R10: ffffffffa1856fa0 R11: ffffffffa1856fa0 R12: 0000000000000009
[ 5.905692] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
[ 5.906826] do_exit+0x5f/0xa70
[ 5.907297] make_task_dead+0x4c/0x50
[ 5.907892] rewind_stack_and_make_dead+0x17/0x17
[ 5.908586] RIP: 0033:0x7f05429637d7
[ 5.909167] Code: 25 00 00 41 00 3d 00 00 41 00 74 47 64 8b 04 25 18 00 00 00 85 c0 75 6b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 95 00 00 00 48 8b 54 24 28 64 48 2b 14 25
[ 5.911947] RSP: 002b:00007ffe006571f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 5.913130] RAX: ffffffffffffffda RBX: 000055bacd0216f0 RCX: 00007f05429637d7
[ 5.914121] RDX: 0000000000080241 RSI: 000055bacd12dd00 RDI: 00000000ffffff9c
[ 5.915078] RBP: 000055bacd12dd00 R08: 0000000000000004 R09: 0000000000000001
[ 5.916039] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000080241
[ 5.916997] R13: 000055bacd0216f0 R14: 0000000000000001 R15: 0000000000000001
[ 5.917966] </TASK>
[ 5.918357] Modules linked in:
[ 5.918792] CR2: ffffa17537d00000
[ 5.919262] ---[ end trace 0000000000000000 ]---
Greetings,
Andres Freund
On 8/14/22 4:47 PM, Linus Torvalds wrote:
> On Sun, Aug 14, 2022 at 3:37 PM Andres Freund <[email protected]> wrote:
>>
>> That range had different symptoms, I think (networking not working, but not
>> oopsing). I hit similar issues when I tried to reproduce the issue
>> interactively, to produce more details, and unwisely did git pull instead of
>> checking out the precise revision, ending up with aea23e7c464b. That's when
>> symptoms look similar to the above. So it'd be 69dac8e431af..aea23e7c464b
>> that I'd be more suspicious of in the context of this thread.
>
> Ok.
>
>> Which would make me look at the following first:
>> e140f731f980 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>> abe7a481aac9 Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block
>> 1da8cf961bb1 Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block
>
> All right, that makes sense. The reported oopses seem to be about block
> requests. Some of them were scsi in particular.
>
> Let's bring in Jens and the SCSI people. Maybe that host reference
> counting? There's quite a lot of "move freeing around" in that late
> scsi pull, even if it was touted as "mostly small bug fixes and
> trivial updates".
>
> Here's the two threads:
>
> https://lore.kernel.org/all/[email protected]/
> https://lore.kernel.org/all/[email protected]/
>
> but I guess I'll do an -rc1 regardless of this, because I need to
> close the merge window.
I took a quick look and added more SCSI bits to my vm images, but
haven't been able to hit it. But if this is happening after the
above-mentioned merges, it does seem like it's more SCSI related. The block side
is only really an error handling fix on that front, the rest is just
nvme. Seems unlikely that'd be the culprit.
Sounds like Andres is already bisecting this, so I guess we'll be wiser
soon enough.
--
Jens Axboe
On 8/14/22 7:17 PM, Guenter Roeck wrote:
> On Sun, Aug 14, 2022 at 03:47:44PM -0700, Linus Torvalds wrote:
>> On Sun, Aug 14, 2022 at 3:37 PM Andres Freund <[email protected]> wrote:
>>>
>>> That range had different symptoms, I think (networking not working, but not
>>> oopsing). I hit similar issues when I tried to reproduce the issue
>>> interactively, to produce more details, and unwisely did git pull instead of
>>> checking out the precise revision, ending up with aea23e7c464b. That's when
>>> symptoms look similar to the above. So it'd be 69dac8e431af..aea23e7c464b
>>> that I'd be more suspicious of in the context of this thread.
>>
>> Ok.
>>
>>> Which would make me look at the following first:
>>> e140f731f980 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>>> abe7a481aac9 Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block
>>> 1da8cf961bb1 Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block
>>
>> All right, that makes sense. The reported oopses seem to be about block
>> requests. Some of them were scsi in particular.
>>
>> Let's bring in Jens and the SCSI people. Maybe that host reference
>> counting? There's quite a lot of "move freeing around" in that late
>> scsi pull, even if it was touted as "mostly small bug fixes and
>> trivial updates".
>>
>
> I may be wrong, but I don't think it is SCSI. The crashes are all over the
> place. Here is another one that just came in. I can dig up more tomorrow.
I guess it could be generic; I was mostly reacting to the three pull
requests listed. I haven't seen this on any of my test boxes, but I
guess it could depend on the configured kernel options.
--
Jens Axboe
On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <[email protected]> wrote:
>
> Some of the symptoms could be related to the issue in this thread, hence
> listing them here
Smells like slab corruption to me, and the problems may then end up being
largely random, just depending on who ends up using the allocation
that gets trampled on.
I wouldn't be surprised if it's all the same thing - including your
network issue.
Both you and Guenter seem to be seeing this on that google cloud VM
setup, which may or may not be related. I haven't seen any reports
from anybody else, but that might still be just coincidence.
But it makes me go back to your original report, where you suspected
the virtio pull.
Looking at that original report, and doing
gitk 7ebfc85e2cd7..69dac8e431af
nothing else looks all that relevant, although obviously any random
error path could end up doing a double free and messing up the slab
cache that way. But a lot of the non-virtio stuff there is things like
RISC-V and devicetree that definitely shouldn't have any effect.
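Roughly the same view on the command line, narrowed to the virtio side of
that range - the path filter here is just a guess at the relevant directories:

git log --oneline 7ebfc85e2cd7..69dac8e431af -- drivers/virtio drivers/net/virtio_net.c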
Adding Michael Tsirkin to this thread too, just so he knows.
Linus
On Sun, Aug 14, 2022 at 06:36:51PM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-08-14 19:04:22 -0600, Jens Axboe wrote:
> > I took a quick look and added more SCSI bits to my vm images, but
> > haven't been able to hit it.
>
> Didn't immediately hit anything locally in a vm either...
>
>
> > Sounds like Andres is already bisecting this, so I guess we'll be wiser
> > soon enough.
>
> I started bisecting the network issue, as it occurred first, and who knows
> what else it could affect. Will bisect the other range after.
>
> Due to the serial console issue mentioned upthread it's pretty slow
> going. Each iteration I create a new gcp snapshot and vm. Adds like ~10min.
> Doesn't help if intermediary steps don't boot with different symptoms and
> another doesn't immediately build...
Just so we can stop pestering everyone, could you try
git revert --no-edit 0b6fd46ec5f5..a335b33f4f35 ?
Equivalently I pushed it here:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git test
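For reference, one quick way to build and boot exactly that branch (the local
branch name below is just a placeholder):

git fetch https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git test
git checkout -b vhost-test FETCH_HEAD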
--
MST
Hi,
On 2022-08-14 20:18:44 -0700, Linus Torvalds wrote:
> On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <[email protected]> wrote:
> >
> > Some of the symptoms could be related to the issue in this thread, hence
> > listing them here
>
> Smells like slab corruption to me, and the problems may end up being
> then largely random just depending on who ends up using the allocation
> that gets trampled on.
>
> I wouldn't be surprised if it's all the same thing - including your
> network issue.
Yea. As I just wrote in
https://postgr.es/m/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de I
bisected it down to one commit (762faee5a267). With that commit I only see the
networking issue across a few reboots, but with ebcce4926365 some boots oops
badly and other times it's "just" the network not working.
[ 2.447668] general protection fault, probably for non-canonical address 0xffff000000000800: 0000 [#1] PREEMPT SMP PTI
[ 2.449168] CPU: 1 PID: 109 Comm: systemd-udevd Not tainted 5.19.0-bisect8-00051-gebcce4926365 #8
[ 2.450397] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[ 2.451670] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
[ 2.452399] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
[ 2.455454] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
[ 2.456181] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
[ 2.457195] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
[ 2.458211] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
[ 2.459183] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
[ 2.460268] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
[ 2.461368] FS: 00007fd02b8b88c0(0000) GS:ffff8da8b7d00000(0000) knlGS:0000000000000000
[ 2.462605] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.463436] CR2: 000055a42d2ee250 CR3: 0000000100328001 CR4: 00000000003706e0
[ 2.464527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.465520] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.466509] Call Trace:
[ 2.466882] <TASK>
[ 2.467218] copy_process+0x1eb/0x1a00
[ 2.467827] ? _raw_spin_unlock_irqrestore+0x16/0x30
[ 2.468578] kernel_clone+0xba/0x400
[ 2.470455] __do_sys_clone+0x78/0xa0
[ 2.471006] do_syscall_64+0x37/0x90
[ 2.471526] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 2.472267] RIP: 0033:0x7fd02bf98cb3
[ 2.472889] Code: 1f 84 00 00 00 00 00 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 41 89 c0 85 c0 75 2a 64 48 8b 04 25 10 00
[ 2.475504] RSP: 002b:00007ffc6a3abf08 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 2.476565] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd02bf98cb3
[ 2.477554] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[ 2.478574] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 2.479608] R10: 00007fd02b8b8b90 R11: 0000000000000246 R12: 0000000000000001
[ 2.480675] R13: 00007ffc6a3ac0c0 R14: 0000000000000000 R15: 0000000000000001
[ 2.481686] </TASK>
[ 2.482119] Modules linked in:
[ 2.482704] ---[ end trace 0000000000000000 ]---
[ 2.483456] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
[ 2.484282] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
[ 2.487024] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
[ 2.487817] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
[ 2.488805] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
[ 2.489869] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
[ 2.490842] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
[ 2.491905] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
[ 2.492975] FS: 00007fd02b8b88c0(0000) GS:ffff8da8b7d00000(0000) knlGS:0000000000000000
[ 2.494140] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.495082] CR2: 000055a42d2ee250 CR3: 0000000100328001 CR4: 00000000003706e0
[ 2.496080] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.497084] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.498524] systemd-udevd (109) used greatest stack depth: 13688 bytes left
[ 2.503905] general protection fault, probably for non-canonical address 0xffff000000000000: 0000 [#2] PREEMPT SMP PTI
[ 2.505504] CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G D 5.19.0-bisect8-00051-gebcce4926365 #8
[ 2.507037] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[ 2.508313] RIP: 0010:rcu_core+0x280/0x920
[ 2.508968] Code: 3f 00 00 48 89 c2 48 85 c0 0f 84 2b 03 00 00 49 89 dd 48 83 c3 01 0f 1f 44 00 00 48 8b 42 08 48 89 d7 48 c7 42 08 00 00 00 00 <ff> d0 0f 1f 00 65 8b 05 64 f5 ad 6c f6 c4 01 75 97 be 00 02 00 00
[ 2.511684] RSP: 0000:ffffa2b40007fe20 EFLAGS: 00010202
[ 2.512410] RAX: ffff000000000000 RBX: 0000000000000002 RCX: 0000000080170011
[ 2.513497] RDX: ffff8da783372a20 RSI: 0000000080170011 RDI: ffff8da783372a20
[ 2.514604] RBP: ffff8da8b7c2b940 R08: 0000000000000001 R09: ffffffff9353b752
[ 2.515667] R10: ffffffff94a060c0 R11: 000000000009b776 R12: ffff8da78020c000
[ 2.516650] R13: 0000000000000001 R14: ffff8da8b7c2b9b8 R15: 0000000000000000
[ 2.517628] FS: 0000000000000000(0000) GS:ffff8da8b7c00000(0000) knlGS:0000000000000000
[ 2.518840] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.519645] CR2: 0000557194db70f8 CR3: 0000000100364006 CR4: 00000000003706f0
[ 2.520641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.521629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.522592] Call Trace:
[ 2.522963] <TASK>
[ 2.523299] __do_softirq+0xe1/0x2ec
[ 2.523883] ? sort_range+0x20/0x20
[ 2.524404] run_ksoftirqd+0x25/0x30
[ 2.524944] smpboot_thread_fn+0x180/0x220
[ 2.525519] kthread+0xe1/0x110
[ 2.526001] ? kthread_complete_and_exit+0x20/0x20
[ 2.526673] ret_from_fork+0x1f/0x30
[ 2.527182] </TASK>
[ 2.527518] Modules linked in:
[ 2.528005] ---[ end trace 0000000000000000 ]---
[ 2.528662] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
[ 2.529524] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
[ 2.532396] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
[ 2.533201] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
[ 2.534376] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
[ 2.535398] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
Begin: Loading e[ 2.536401] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
[ 2.537641] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
ssential drivers[ 2.538737] FS: 0000000000000000(0000) GS:ffff8da8b7c00000(0000) knlGS:0000000000000000
[ 2.540028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
... done.[ 2.540843] CR2: 0000557194db70f8 CR3: 000000015080c002 CR4: 00000000003706f0
[ 2.541953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.542924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.543902] Kernel panic - not syncing: Fatal exception in interrupt
[ 2.544967] Kernel Offset: 0x12400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2.546637] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
If somebody knowledgeable staring at 762faee5a267 doesn't surface something, I
can create a kernel with some more debugging stuff enabled, if somebody tells
me what'd work best here.
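One plausible set of extra memory-debugging options, flipped on top of an
existing .config with scripts/config - purely a sketch, not necessarily the
right knobs for this particular bug:

./scripts/config --enable KASAN --enable DEBUG_LIST --enable DEBUG_OBJECTS \
    --enable DEBUG_OBJECTS_WORK --enable SLUB_DEBUG_ON
make olddefconfig    # let the config machinery fill in the dependent options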
Greetings,
Andres Freund
Hi,
On August 14, 2022 11:36:55 PM PDT, "Michael S. Tsirkin" <[email protected]> wrote:
>On Sun, Aug 14, 2022 at 06:36:51PM -0700, Andres Freund wrote:
>> Hi,
>>
>> On 2022-08-14 19:04:22 -0600, Jens Axboe wrote:
>> > I took a quick look and added more SCSI bits to my vm images, but
>> > haven't been able to hit it.
>>
>> Didn't immediately hit anything locally in a vm either...
>>
>>
>> > Sounds like Andres is already bisecting this, so I guess we'll be wiser
>> > soon enough.
>>
>> I started bisecting the network issue, as it occurred first, and who knows
>> what else it could affect. Will bisect the other range after.
>>
>> Due to the serial console issue mentioned upthread it's pretty slow
>> going. Each iteration I create a new gcp snapshot and vm. Adds like ~10min.
>> Doesn't help if intermediary steps don't boot with different symptoms and
>> another doesn't immediately build...
>
>Just so we can stop pestering everyone, could you try
>
>git revert --no-edit 0b6fd46ec5f5..a335b33f4f35 ?
>
>Equivalently I pushed it here:
>
>https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git test
Saw this too late - I assume you don't need this anymore now that I pinpointed the one commit?
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Mon, Aug 15, 2022 at 12:11:43AM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-08-14 20:18:44 -0700, Linus Torvalds wrote:
> > On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <[email protected]> wrote:
> > >
> > > Some of the symptoms could be related to the issue in this thread, hence
> > > listing them here
> >
> > Smells like slab corruption to me, and the problems may end up being
> > then largely random just depending on who ends up using the allocation
> > that gets trampled on.
> >
> > I wouldn't be surprised if it's all the same thing - including your
> > network issue.
>
> Yea. As I just wrote in
> https://postgr.es/m/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de I
> bisected it down to one commit (762faee5a267). With that commit I only see the
> networking issue across a few reboots, but with ebcce4926365 some boots oops
> badly and other times it's "just" the network not working.
>
>
> [oopses]
>
>
> If somebody knowledgeable staring at 762faee5a267 doesn't surface something, I
> can create a kernel with some more debugging stuff enabled, if somebody tells
> me what'd work best here.
>
>
> Greetings,
>
> Andres Freund
Thanks a lot for the work!
Just a small clarification:
So IIUC you see several issues, right?
With 762faee5a2678559d3dc09d95f8f2c54cd0466a7 you see networking issues.
With ebcce492636506443e4361db6587e6acd1a624f9 you see crashes.
--
MST
On Mon, Aug 15, 2022 at 12:17:44AM -0700, Andres Freund wrote:
> Hi,
>
> On August 14, 2022 11:36:55 PM PDT, "Michael S. Tsirkin" <[email protected]> wrote:
> >On Sun, Aug 14, 2022 at 06:36:51PM -0700, Andres Freund wrote:
> >> Hi,
> >>
> >> On 2022-08-14 19:04:22 -0600, Jens Axboe wrote:
> >> > I took a quick look and added more SCSI bits to my vm images, but
> >> > haven't been able to hit it.
> >>
> >> Didn't immediately hit anything locally in a vm either...
> >>
> >>
> >> > Sounds like Andres is already bisecting this, so I guess we'll be wiser
> >> > soon enough.
> >>
> >> I started bisecting the network issue, as it occurred first, and who knows
> >> what else it could affect. Will bisect the other range after.
> >>
> >> Due to the serial console issue mentioned upthread it's pretty slow
> >> going. Each iteration I create a new gcp snapshot and vm. Adds like ~10min.
> >> Doesn't help if intermediary steps don't boot with different symptoms and
> >> another doesn't immediately build...
> >
> >Just so we can stop pestering everyone, could you try
> >
> >git revert --no-edit 0b6fd46ec5f5..a335b33f4f35 ?
> >
> >Equivalently I pushed it here:
> >
> >https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git test
>
> Saw this too late - I assume you don't need this anymore now that I pinpointed the one commit?
>
> Andres
Right, at least not yet - could you clarify the following please:
So IIUC you see several issues, right?
With 762faee5a2678559d3dc09d95f8f2c54cd0466a7 you see networking issues.
With ebcce492636506443e4361db6587e6acd1a624f9 you see crashes.
I just want to make sure the issue is virtio specific and then we can drop
Linus et al and move this to the virtio ML.
Thanks!
--
MST
Hi,
On Mon, Aug 15, 2022, at 00:29, Michael S. Tsirkin wrote:
> On Mon, Aug 15, 2022 at 12:11:43AM -0700, Andres Freund wrote:
>> Hi,
>>
>> On 2022-08-14 20:18:44 -0700, Linus Torvalds wrote:
>> > On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <[email protected]> wrote:
>> > >
>> > > Some of the symptoms could be related to the issue in this thread, hence
>> > > listing them here
>> >
>> > Smells like slab corruption to me, and the problems may end up being
>> > then largely random just depending on who ends up using the allocation
>> > that gets trampled on.
>> >
>> > I wouldn't be surprised if it's all the same thing - including your
>> > network issue.
>>
>> Yea. As I just wrote in
>> https://postgr.es/m/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de I
>> bisected it down to one commit (762faee5a267). With that commit I only see the
>> networking issue across a few reboots, but with ebcce4926365 some boots oops
>> badly and other times it's "just" the network not working.
>>
>>
>> [oopses]
>> If somebody knowledgeable staring at 762faee5a267 doesn't surface something, I
>> can create a kernel with some more debugging stuff enabled, if somebody tells
>> me what'd work best here.
>>
>>
>> Greetings,
>>
>> Andres Freund
>
> Thanks a lot for the work!
> Just a small clarification:
>
> So IIUC you see several issues, right?
Yes, although they might be related, as theorized by Linus upthread.
> With 762faee5a2678559d3dc09d95f8f2c54cd0466a7 you see networking issues.
Yes.
> With ebcce492636506443e4361db6587e6acd1a624f9 you see crashes.
It changed between reboots: sometimes the network issue, sometimes the crashes in the email you're replying to.
> MST
On Mon, Aug 15, 2022 at 12:11:43AM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-08-14 20:18:44 -0700, Linus Torvalds wrote:
> > On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <[email protected]> wrote:
> > >
> > > Some of the symptoms could be related to the issue in this thread, hence
> > > listing them here
> >
> > Smells like slab corruption to me, and the problems may end up being
> > then largely random just depending on who ends up using the allocation
> > that gets trampled on.
> >
> > I wouldn't be surprised if it's all the same thing - including your
> > network issue.
>
> Yea. As I just wrote in
> https://postgr.es/m/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de I
> bisected it down to one commit (762faee5a267). With that commit I only see the
> networking issue across a few reboots, but with ebcce4926365 some boots oops
> badly and other times it's "just" the network not working.
>
>
> [ 2.447668] general protection fault, probably for non-canonical address 0xffff000000000800: 0000 [#1] PREEMPT SMP PTI
> [ 2.449168] CPU: 1 PID: 109 Comm: systemd-udevd Not tainted 5.19.0-bisect8-00051-gebcce4926365 #8
> [ 2.450397] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
> [ 2.451670] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
> [ 2.452399] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
> [ 2.455454] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
> [ 2.456181] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
> [ 2.457195] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
> [ 2.458211] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
> [ 2.459183] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
> [ 2.460268] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
> [ 2.461368] FS: 00007fd02b8b88c0(0000) GS:ffff8da8b7d00000(0000) knlGS:0000000000000000
> [ 2.462605] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.463436] CR2: 000055a42d2ee250 CR3: 0000000100328001 CR4: 00000000003706e0
> [ 2.464527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2.465520] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2.466509] Call Trace:
> [ 2.466882] <TASK>
> [ 2.467218] copy_process+0x1eb/0x1a00
> [ 2.467827] ? _raw_spin_unlock_irqrestore+0x16/0x30
> [ 2.468578] kernel_clone+0xba/0x400
> [ 2.470455] __do_sys_clone+0x78/0xa0
> [ 2.471006] do_syscall_64+0x37/0x90
> [ 2.471526] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 2.472267] RIP: 0033:0x7fd02bf98cb3
> [ 2.472889] Code: 1f 84 00 00 00 00 00 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 41 89 c0 85 c0 75 2a 64 48 8b 04 25 10 00
> [ 2.475504] RSP: 002b:00007ffc6a3abf08 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
> [ 2.476565] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd02bf98cb3
> [ 2.477554] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
> [ 2.478574] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 2.479608] R10: 00007fd02b8b8b90 R11: 0000000000000246 R12: 0000000000000001
> [ 2.480675] R13: 00007ffc6a3ac0c0 R14: 0000000000000000 R15: 0000000000000001
> [ 2.481686] </TASK>
> [ 2.482119] Modules linked in:
> [ 2.482704] ---[ end trace 0000000000000000 ]---
> [ 2.483456] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
> [ 2.484282] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
> [ 2.487024] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
> [ 2.487817] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
> [ 2.488805] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
> [ 2.489869] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
> [ 2.490842] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
> [ 2.491905] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
> [ 2.492975] FS: 00007fd02b8b88c0(0000) GS:ffff8da8b7d00000(0000) knlGS:0000000000000000
> [ 2.494140] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.495082] CR2: 000055a42d2ee250 CR3: 0000000100328001 CR4: 00000000003706e0
> [ 2.496080] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2.497084] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2.498524] systemd-udevd (109) used greatest stack depth: 13688 bytes left
> [ 2.503905] general protection fault, probably for non-canonical address 0xffff000000000000: 0000 [#2] PREEMPT SMP PTI
> [ 2.505504] CPU: 0 PID: 13 Comm: ksoftirqd/0 Tainted: G D 5.19.0-bisect8-00051-gebcce4926365 #8
> [ 2.507037] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
> [ 2.508313] RIP: 0010:rcu_core+0x280/0x920
> [ 2.508968] Code: 3f 00 00 48 89 c2 48 85 c0 0f 84 2b 03 00 00 49 89 dd 48 83 c3 01 0f 1f 44 00 00 48 8b 42 08 48 89 d7 48 c7 42 08 00 00 00 00 <ff> d0 0f 1f 00 65 8b 05 64 f5 ad 6c f6 c4 01 75 97 be 00 02 00 00
> [ 2.511684] RSP: 0000:ffffa2b40007fe20 EFLAGS: 00010202
> [ 2.512410] RAX: ffff000000000000 RBX: 0000000000000002 RCX: 0000000080170011
> [ 2.513497] RDX: ffff8da783372a20 RSI: 0000000080170011 RDI: ffff8da783372a20
> [ 2.514604] RBP: ffff8da8b7c2b940 R08: 0000000000000001 R09: ffffffff9353b752
> [ 2.515667] R10: ffffffff94a060c0 R11: 000000000009b776 R12: ffff8da78020c000
> [ 2.516650] R13: 0000000000000001 R14: ffff8da8b7c2b9b8 R15: 0000000000000000
> [ 2.517628] FS: 0000000000000000(0000) GS:ffff8da8b7c00000(0000) knlGS:0000000000000000
> [ 2.518840] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2.519645] CR2: 0000557194db70f8 CR3: 0000000100364006 CR4: 00000000003706f0
> [ 2.520641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2.521629] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2.522592] Call Trace:
> [ 2.522963] <TASK>
> [ 2.523299] __do_softirq+0xe1/0x2ec
> [ 2.523883] ? sort_range+0x20/0x20
> [ 2.524404] run_ksoftirqd+0x25/0x30
> [ 2.524944] smpboot_thread_fn+0x180/0x220
> [ 2.525519] kthread+0xe1/0x110
> [ 2.526001] ? kthread_complete_and_exit+0x20/0x20
> [ 2.526673] ret_from_fork+0x1f/0x30
> [ 2.527182] </TASK>
> [ 2.527518] Modules linked in:
> [ 2.528005] ---[ end trace 0000000000000000 ]---
> [ 2.528662] RIP: 0010:kmem_cache_alloc_node+0x2b4/0x430
> [ 2.529524] Code: 01 00 0f 84 e7 fe ff ff 48 8b 50 48 48 8d 7a ff 83 e2 01 48 0f 45 c7 49 89 c7 e9 d0 fe ff ff 8b 45 28 48 8b 7d 00 48 8d 4a 40 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 cd fd ff
> [ 2.532396] RSP: 0018:ffffa2b40040bd60 EFLAGS: 00010246
> [ 2.533201] RAX: 0000000000000800 RBX: 0000000000000cc0 RCX: 0000000000001741
> [ 2.534376] RDX: 0000000000001701 RSI: 0000000000000cc0 RDI: 000000000002f820
> [ 2.535398] RBP: ffff8da7800ed500 R08: 0000000000000000 R09: 0000000000000011
> Begin: Loading e[ 2.536401] R10: 00007fd02b8b8b90 R11: 0000000000000000 R12: ffff000000000000
> [ 2.537641] R13: 0000000000000000 R14: 0000000000000cc0 R15: ffffffff934bde4b
> ssential drivers[ 2.538737] FS: 0000000000000000(0000) GS:ffff8da8b7c00000(0000) knlGS:0000000000000000
> [ 2.540028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> ... done.[ 2.540843] CR2: 0000557194db70f8 CR3: 000000015080c002 CR4: 00000000003706f0
> [ 2.541953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>
> [ 2.542924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2.543902] Kernel panic - not syncing: Fatal exception in interrupt
> [ 2.544967] Kernel Offset: 0x12400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 2.546637] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>
>
> If somebody knowledgeable staring at 762faee5a267 doesn't surface somebody I
> can create a kernel with some more debugging stuff enabled, if somebody tells
> me what'd work best here.
>
>
> Greetings,
>
> Andres Freund
It is possible that GCP gets confused if ring size is smaller than the
device maximum simply because no one did it in the past.
So I pushed just the revert of 762faee5a267 to the test branch.
Could you give it a spin?
--
MST
On Mon, Aug 15, 2022 at 12:46:36AM -0700, Andres Freund wrote:
> Hi,
>
> On Mon, Aug 15, 2022, at 00:29, Michael S. Tsirkin wrote:
> > On Mon, Aug 15, 2022 at 12:11:43AM -0700, Andres Freund wrote:
> >> Hi,
> >>
> >> On 2022-08-14 20:18:44 -0700, Linus Torvalds wrote:
> >> > On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <[email protected]> wrote:
> >> > >
> >> > > Some of the symptoms could be related to the issue in this thread, hence
> >> > > listing them here
> >> >
> >> > Smells like slab corruption to me, and the problems may end up being
> >> > then largely random just depending on who ends up using the allocation
> >> > that gets trampled on.
> >> >
> >> > I wouldn't be surprised if it's all the same thing - including your
> >> > network issue.
> >>
> >> Yea. As I just wrote in
> >> https://postgr.es/m/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de I
> >> bisected it down to one commit (762faee5a267). With that commit I only see the
> >> networking issue across a few reboots, but with ebcce4926365 some boots oops
> >> badly and other times it' "just" network not working.
> >>
> >>
> >> [oopses]
>
> >> If somebody knowledgeable staring at 762faee5a267 doesn't surface somebody I
> >> can create a kernel with some more debugging stuff enabled, if somebody tells
> >> me what'd work best here.
> >>
> >>
> >> Greetings,
> >>
> >> Andres Freund
> >
> > Thanks a lot for the work!
> > Just a small clarification:
> >
> > So IIUC you see several issues, right?
>
> Yes, although they might be related, as theorized by Linus upthread.
>
> > With 762faee5a2678559d3dc09d95f8f2c54cd0466a7 you see networking issues.
>
> Yes.
>
>
> > With ebcce492636506443e4361db6587e6acd1a624f9 you see crashes.
>
> Changed between rebooting. Sometimes the network issue, sometimes the crashes in the email you're replying to.
>
OK I think we can safely drop the others on the Cc, and I will add
a bunch of people and relevant lists instead.
I want to fix this quickly so testing can proceed.
> > MST
On Mon, Aug 15, 2022 at 01:15:27AM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-08-15 03:51:34 -0400, Michael S. Tsirkin wrote:
> > It is possible that GCP gets confused if ring size is smaller than the
> > device maximum simply because no one did it in the past.
> >
> > So I pushed just the revert of 762faee5a267 to the test branch.
> > Could you give it a spin?
>
> Seems to fix the issue, at least to the extent I can determine at 1am... :)
>
> Greetings,
>
> Andres Freund
So you tested this:
commit 13df5a7eaeb22561d39354b576bc98a7e2c389f9 (HEAD, kernel.org/test)
Author: Michael S. Tsirkin <[email protected]>
Date: Mon Aug 15 03:44:38 2022 -0400
Revert "virtio_net: set the default max ring size by find_vqs()"
This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7.
Signed-off-by: Michael S. Tsirkin <[email protected]>
and it fixes both issues, right? No crashes, no networking issue?
Just making sure before I send this upstream as a stop-gap.
We need to figure out what is going on, since I think there's a GCP
hypervisor bug that we need to find a way to detect and work around, but
that can wait.
--
MST
Hi,
On 2022-08-15 03:51:34 -0400, Michael S. Tsirkin wrote:
> It is possible that GCP gets confused if ring size is smaller than the
> device maximum simply because no one did it in the past.
>
> So I pushed just the revert of 762faee5a267 to the test branch.
> Could you give it a spin?
Seems to fix the issue, at least to the extent I can determine at 1am... :)
Greetings,
Andres Freund
On Mon, Aug 15, 2022 at 12:46:36AM -0700, Andres Freund wrote:
> Hi,
>
> On Mon, Aug 15, 2022, at 00:29, Michael S. Tsirkin wrote:
> > On Mon, Aug 15, 2022 at 12:11:43AM -0700, Andres Freund wrote:
> >> Hi,
> >>
> >> On 2022-08-14 20:18:44 -0700, Linus Torvalds wrote:
> >> > On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <[email protected]> wrote:
> >> > >
> >> > > Some of the symptoms could be related to the issue in this thread, hence
> >> > > listing them here
> >> >
> >> > Smells like slab corruption to me, and the problems may end up being
> >> > then largely random just depending on who ends up using the allocation
> >> > that gets trampled on.
> >> >
> >> > I wouldn't be surprised if it's all the same thing - including your
> >> > network issue.
> >>
> >> Yea. As I just wrote in
> >> https://postgr.es/m/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de I
> >> bisected it down to one commit (762faee5a267). With that commit I only see the
> >> networking issue across a few reboots, but with ebcce4926365 some boots oops
> >> badly and other times it' "just" network not working.
> >>
> >>
> >> [oopses]
>
> >> If somebody knowledgeable staring at 762faee5a267 doesn't surface somebody I
> >> can create a kernel with some more debugging stuff enabled, if somebody tells
> >> me what'd work best here.
> >>
> >>
> >> Greetings,
> >>
> >> Andres Freund
> >
> > Thanks a lot for the work!
> > Just a small clarification:
> >
> > So IIUC you see several issues, right?
>
> Yes, although they might be related, as theorized by Linus upthread.
>
> > With 762faee5a2678559d3dc09d95f8f2c54cd0466a7 you see networking issues.
>
> Yes.
>
>
> > With ebcce492636506443e4361db6587e6acd1a624f9 you see crashes.
>
> Changed between rebooting. Sometimes the network issue, sometimes the crashes in the email you're replying to.
>
OK just adding:
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Jason Wang <[email protected]>
L: [email protected]
L: [email protected]
I think we can drop the original Cc list:
Cc: Linus Torvalds <[email protected]>,
Jens Axboe <[email protected]>,
James Bottomley <[email protected]>,
"Martin K. Petersen" <[email protected]>,
Guenter Roeck <[email protected]>, [email protected],
Greg Kroah-Hartman <[email protected]>
but I'm not sure, maybe they want to be informed.
>
> > MST
Hi,
On August 15, 2022 1:28:29 AM PDT, "Michael S. Tsirkin" <[email protected]> wrote:
>On Mon, Aug 15, 2022 at 01:15:27AM -0700, Andres Freund wrote:
>> Hi,
>>
>> On 2022-08-15 03:51:34 -0400, Michael S. Tsirkin wrote:
>> > It is possible that GCP gets confused if ring size is smaller than the
>> > device maximum simply because no one did it in the past.
>> >
>> > So I pushed just the revert of 762faee5a267 to the test branch.
>> > Could you give it a spin?
>>
>> Seems to fix the issue, at least to the extent I can determine at 1am... :)
>>
>> Greetings,
>>
>> Andres Freund
>
>So you tested this:
>
>commit 13df5a7eaeb22561d39354b576bc98a7e2c389f9 (HEAD, kernel.org/test)
>Author: Michael S. Tsirkin <[email protected]>
>Date: Mon Aug 15 03:44:38 2022 -0400
>
> Revert "virtio_net: set the default max ring size by find_vqs()"
>
> This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
>
>and it fixes both issues right? No crashes no networking issue?
Correct. I only did limited testing, but it's survived far longer / more reboots than anything since the commit.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sun, Aug 14, 2022 at 02:26:10PM -0700, Guenter Roeck wrote:
> Hi all,
>
> syscaller reports lots of crashes with the mainline kernel. The latest
> I have seen, based on the current ToT, is attached. Backtraces are not
> always the same, but this one is typical.
>
> I have not tried to analyze this further.
>
> Guenter
Thanks a lot for the report!
Could you please try the latest vhost tree:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
which has been reported by Andres Freund to fix a similar issue?
Thanks!
--
MST
On Mon, Aug 15, 2022 at 01:34:41AM -0700, Andres Freund wrote:
> Hi,
>
> On August 15, 2022 1:28:29 AM PDT, "Michael S. Tsirkin" <[email protected]> wrote:
> >On Mon, Aug 15, 2022 at 01:15:27AM -0700, Andres Freund wrote:
> >> Hi,
> >>
> >> On 2022-08-15 03:51:34 -0400, Michael S. Tsirkin wrote:
> >> > It is possible that GCP gets confused if ring size is smaller than the
> >> > device maximum simply because no one did it in the past.
> >> >
> >> > So I pushed just the revert of 762faee5a267 to the test branch.
> >> > Could you give it a spin?
> >>
> >> Seems to fix the issue, at least to the extent I can determine at 1am... :)
> >>
> >> Greetings,
> >>
> >> Andres Freund
> >
> >So you tested this:
> >
> >commit 13df5a7eaeb22561d39354b576bc98a7e2c389f9 (HEAD, kernel.org/test)
> >Author: Michael S. Tsirkin <[email protected]>
> >Date: Mon Aug 15 03:44:38 2022 -0400
> >
> > Revert "virtio_net: set the default max ring size by find_vqs()"
> >
> > This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7.
> >
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> >
> >and it fixes both issues right? No crashes no networking issue?
>
> Correct. I only did limited testing, but it's survived far longer / more reboots than anything since the commit.
>
> Andres
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
OK so this gives us a quick revert as a solution for now.
Next, I would appreciate it if you just try this simple hack.
If it crashes, we either have a long-standing problem in virtio
code or, more likely, a GCP bug where it can't handle smaller
rings than what the device requests.
Thanks!
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index f7965c5dd36b..bdd5f481570b 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -314,6 +314,9 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
if (!size || size > num)
size = num;
+ if (size > 1024)
+ size = 1024;
+
if (size & (size - 1)) {
dev_warn(&vp_dev->pci_dev->dev, "bad queue size %u", size);
return ERR_PTR(-EINVAL);
--
MST
On Mon, Aug 15, 2022 at 08:49:20AM -0700, Guenter Roeck wrote:
> On Mon, Aug 15, 2022 at 05:43:19AM -0400, Michael S. Tsirkin wrote:
> > On Sun, Aug 14, 2022 at 02:26:10PM -0700, Guenter Roeck wrote:
> > > Hi all,
> > >
> > > syscaller reports lots of crashes with the mainline kernel. The latest
> > > I have seen, based on the current ToT, is attached. Backtraces are not
> > > always the same, but this one is typical.
> > >
> > > I have not tried to analyze this further.
> > >
> > > Guenter
> >
> > Thanks a lot for the report!
> >
> > Could you please try the latest vhost tree:
> > https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
> >
> > which has been reported by Andres Freund to fix a similar issue?
> >
>
> I have now literally seen dozens of reports with syzcaller crashing
> in various code locations, but we are still trying to actually reproduce
> the problem. We'll respond on the revert patch once we can reproduce the
> issue and thus the fix.
>
> Thanks,
> Guenter
OK I think I will give it a day in next and then send it to Linus.
It would be nice to know before that - I'm on vacation next week, so I'd
like to be reasonably sure things are nice and quiet and everyone can
proceed with testing.
--
MST
On Mon, Aug 15, 2022 at 05:43:19AM -0400, Michael S. Tsirkin wrote:
> On Sun, Aug 14, 2022 at 02:26:10PM -0700, Guenter Roeck wrote:
> > Hi all,
> >
> > syscaller reports lots of crashes with the mainline kernel. The latest
> > I have seen, based on the current ToT, is attached. Backtraces are not
> > always the same, but this one is typical.
> >
> > I have not tried to analyze this further.
> >
> > Guenter
>
> Thanks a lot for the report!
>
> Could you please try the latest vhost tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
>
> which has been reported by Andres Freund to fix a similar issue?
>
I have now literally seen dozens of reports of syzkaller crashing
in various code locations, but we are still trying to actually reproduce
the problem. We'll respond on the revert patch once we can reproduce the
issue and thus confirm the fix.
Thanks,
Guenter
Hi,
On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> OK so this gives us a quick revert as a solution for now.
> Next, I would appreciate it if you just try this simple hack.
> If it crashes we either have a long standing problem in virtio
> code or more likely a gcp bug where it can't handle smaller
> rings than what device requestes.
> Thanks!
I applied the below and the problem persists.
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index f7965c5dd36b..bdd5f481570b 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -314,6 +314,9 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> if (!size || size > num)
> size = num;
>
> + if (size > 1024)
> + size = 1024;
> +
> if (size & (size - 1)) {
> dev_warn(&vp_dev->pci_dev->dev, "bad queue size %u", size);
> return ERR_PTR(-EINVAL);
>
>
[ 1.165162] virtio_net virtio1 enp0s4: renamed from eth0
[ 1.177815] general protection fault, probably for non-canonical address 0xffff000000000400: 0000 [#1] PREEMPT SMP PTI
[ 1.179565] CPU: 1 PID: 125 Comm: systemd-udevd Not tainted 6.0.0-rc1-bisect14-dirty #14
[ 1.180785] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[ 1.182475] RIP: 0010:__kmalloc_node_track_caller+0x19e/0x380
[ 1.183365] Code: 2b 04 25 28 00 00 00 0f 85 f8 01 00 00 48 83 c4 18 48 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 4a 40 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 0b ff
[ 1.186208] RSP: 0018:ffff9c470021b860 EFLAGS: 00010246
[ 1.187194] RAX: ffff000000000000 RBX: 00000000000928c0 RCX: 0000000000000400
[ 1.188634] RDX: 0000000000005781 RSI: 00000000000928c0 RDI: 000000000002e0f0
[ 1.190177] RBP: ffff908380042c00 R08: 0000000000000600 R09: ffff908380b665e4
[ 1.191256] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000928c0
[ 1.192269] R13: 0000000000000740 R14: 00000000ffffffff R15: 0000000000000000
[ 1.193368] FS: 00007f746702a8c0(0000) GS:ffff9084b7d00000(0000) knlGS:0000000000000000
[ 1.194846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.195661] CR2: 00007ffc010df980 CR3: 0000000103826005 CR4: 00000000003706e0
[ 1.196912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.198216] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.199367] Call Trace:
[ 1.199815] <TASK>
[ 1.200138] ? netlink_trim+0x85/0xb0
[ 1.200754] pskb_expand_head+0x92/0x340
[ 1.202512] netlink_trim+0x85/0xb0
[ 1.203069] netlink_unicast+0x54/0x390
[ 1.203630] rtnl_getlink+0x366/0x410
[ 1.204155] ? __d_alloc+0x24/0x1d0
[ 1.204668] rtnetlink_rcv_msg+0x146/0x3b0
[ 1.205256] ? _raw_spin_unlock+0xd/0x30
[ 1.205867] ? __d_add+0xf2/0x1b0
[ 1.206600] ? rtnl_calcit.isra.0+0x130/0x130
[ 1.207221] netlink_rcv_skb+0x49/0xf0
[ 1.207904] netlink_unicast+0x23a/0x390
[ 1.208585] netlink_sendmsg+0x23b/0x4b0
[ 1.209203] sock_sendmsg+0x57/0x60
[ 1.210118] __sys_sendto+0x117/0x170
[ 1.210694] ? __wake_up_common_lock+0x83/0xc0
[ 1.211420] __x64_sys_sendto+0x1b/0x30
[ 1.211992] do_syscall_64+0x37/0x90
[ 1.212497] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1.213407] RIP: 0033:0x7f74677404e6
[ 1.213973] Code: 69 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 41 54 48 83 ec 30 44 89 4c 24 2c 4c
[ 1.217098] RSP: 002b:00007ffc010daa78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 1.219539] RAX: ffffffffffffffda RBX: 000000000011bc98 RCX: 00007f74677404e6
[ 1.220552] RDX: 0000000000000020 RSI: 0000563160679570 RDI: 0000000000000005
[ 1.222378] RBP: 00005631606796b0 R08: 00007ffc010daaf0 R09: 0000000000000080
[ 1.223692] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ 1.224793] R13: 0000000000000000 R14: 0000000000000000 R15: 00005631606794b0
[ 1.226228] </TASK>
[ 1.226775] Modules linked in:
[ 1.227414] ---[ end trace 0000000000000000 ]---
Greetings,
Andres Freund
On Mon, Aug 15, 2022 at 09:45:03AM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > OK so this gives us a quick revert as a solution for now.
> > Next, I would appreciate it if you just try this simple hack.
> > If it crashes we either have a long standing problem in virtio
> > code or more likely a gcp bug where it can't handle smaller
> > rings than what device requestes.
> > Thanks!
>
> I applied the below and the problem persists.
>
> > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > index f7965c5dd36b..bdd5f481570b 100644
> > --- a/drivers/virtio/virtio_pci_modern.c
> > +++ b/drivers/virtio/virtio_pci_modern.c
> > @@ -314,6 +314,9 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> > if (!size || size > num)
> > size = num;
> >
> > + if (size > 1024)
> > + size = 1024;
> > +
> > if (size & (size - 1)) {
> > dev_warn(&vp_dev->pci_dev->dev, "bad queue size %u", size);
> > return ERR_PTR(-EINVAL);
> >
> >
>
> [ 1.165162] virtio_net virtio1 enp0s4: renamed from eth0
> [ 1.177815] general protection fault, probably for non-canonical address 0xffff000000000400: 0000 [#1] PREEMPT SMP PTI
> [ 1.179565] CPU: 1 PID: 125 Comm: systemd-udevd Not tainted 6.0.0-rc1-bisect14-dirty #14
> [ 1.180785] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
> [ 1.182475] RIP: 0010:__kmalloc_node_track_caller+0x19e/0x380
> [ 1.183365] Code: 2b 04 25 28 00 00 00 0f 85 f8 01 00 00 48 83 c4 18 48 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 4a 40 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 0b ff
> [ 1.186208] RSP: 0018:ffff9c470021b860 EFLAGS: 00010246
> [ 1.187194] RAX: ffff000000000000 RBX: 00000000000928c0 RCX: 0000000000000400
> [ 1.188634] RDX: 0000000000005781 RSI: 00000000000928c0 RDI: 000000000002e0f0
> [ 1.190177] RBP: ffff908380042c00 R08: 0000000000000600 R09: ffff908380b665e4
> [ 1.191256] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000928c0
> [ 1.192269] R13: 0000000000000740 R14: 00000000ffffffff R15: 0000000000000000
> [ 1.193368] FS: 00007f746702a8c0(0000) GS:ffff9084b7d00000(0000) knlGS:0000000000000000
> [ 1.194846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1.195661] CR2: 00007ffc010df980 CR3: 0000000103826005 CR4: 00000000003706e0
> [ 1.196912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1.198216] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1.199367] Call Trace:
> [ 1.199815] <TASK>
> [ 1.200138] ? netlink_trim+0x85/0xb0
> [ 1.200754] pskb_expand_head+0x92/0x340
> [ 1.202512] netlink_trim+0x85/0xb0
> [ 1.203069] netlink_unicast+0x54/0x390
> [ 1.203630] rtnl_getlink+0x366/0x410
> [ 1.204155] ? __d_alloc+0x24/0x1d0
> [ 1.204668] rtnetlink_rcv_msg+0x146/0x3b0
> [ 1.205256] ? _raw_spin_unlock+0xd/0x30
> [ 1.205867] ? __d_add+0xf2/0x1b0
> [ 1.206600] ? rtnl_calcit.isra.0+0x130/0x130
> [ 1.207221] netlink_rcv_skb+0x49/0xf0
> [ 1.207904] netlink_unicast+0x23a/0x390
> [ 1.208585] netlink_sendmsg+0x23b/0x4b0
> [ 1.209203] sock_sendmsg+0x57/0x60
> [ 1.210118] __sys_sendto+0x117/0x170
> [ 1.210694] ? __wake_up_common_lock+0x83/0xc0
> [ 1.211420] __x64_sys_sendto+0x1b/0x30
> [ 1.211992] do_syscall_64+0x37/0x90
> [ 1.212497] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 1.213407] RIP: 0033:0x7f74677404e6
> [ 1.213973] Code: 69 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 41 54 48 83 ec 30 44 89 4c 24 2c 4c
> [ 1.217098] RSP: 002b:00007ffc010daa78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> [ 1.219539] RAX: ffffffffffffffda RBX: 000000000011bc98 RCX: 00007f74677404e6
> [ 1.220552] RDX: 0000000000000020 RSI: 0000563160679570 RDI: 0000000000000005
> [ 1.222378] RBP: 00005631606796b0 R08: 00007ffc010daaf0 R09: 0000000000000080
> [ 1.223692] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> [ 1.224793] R13: 0000000000000000 R14: 0000000000000000 R15: 00005631606794b0
> [ 1.226228] </TASK>
> [ 1.226775] Modules linked in:
> [ 1.227414] ---[ end trace 0000000000000000 ]---
>
> Greetings,
>
> Andres Freund
Okay! And just to be 100% sure, can you try the following on top of 5.19:
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 623906b4996c..6f4e54a618bc 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -208,6 +208,9 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
return ERR_PTR(-EINVAL);
}
+ if (num > 1024)
+ num = 1024;
+
info->msix_vector = msix_vec;
/* create the vring */
--
MST
Hi,
On 2022-08-15 12:50:52 -0400, Michael S. Tsirkin wrote:
> On Mon, Aug 15, 2022 at 09:45:03AM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > OK so this gives us a quick revert as a solution for now.
> > > Next, I would appreciate it if you just try this simple hack.
> > > If it crashes we either have a long standing problem in virtio
> > > code or more likely a gcp bug where it can't handle smaller
> > > rings than what device requestes.
> > > Thanks!
> >
> > I applied the below and the problem persists.
> >
> > [...]
>
> Okay!
Just checking - I applied and tested this atop 6.0-rc1, correct? Or did you
want me to test it with 762faee5a267 reverted? I guess if what you're trying
to test is a smaller queue than what's requested, you'd want to do so without
the problematic patch applied...
> And just to be 100% sure, can you try the following on top of 5.19:
> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index 623906b4996c..6f4e54a618bc 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -208,6 +208,9 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> return ERR_PTR(-EINVAL);
> }
>
> + if (num > 1024)
> + num = 1024;
> +
> info->msix_vector = msix_vec;
>
> /* create the vring */
>
> --
Either way, I did this, and there are no issues that I could observe. No
oopses, no broken networking. But:
To make sure it actually does something, I added a debugging printk - which
doesn't show up. I assume this is at a point where at least earlyprintk should
work (I see it getting enabled via serial)?
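For reference, the line I added was roughly the following (paraphrasing from
memory - the exact message text is a hypothetical reconstruction, it was just
something greppable in the modern setup_vq()):

	/* hypothetical reconstruction of the debug line, not the exact one used */
	dev_info(&vp_dev->pci_dev->dev, "adebug: modern setup_vq: num %u size %u\n",
		 num, size);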
Greetings,
Andres Freund
On Mon, Aug 15, 2022 at 12:01:56PM -0400, Michael S. Tsirkin wrote:
>
> OK I think I will give it a day in next and then send it to Linus.
> Would be nice to know before that - vacation next week so I'd like to be
> reasonably sure things are nice and quiet and everyone can proceed with
> testing.
I ended up publishing several of the syzkaller reports because I could
not figure out how to test the proposed patch otherwise. Sorry for the
noise on the mailing lists, but I just could not find another way to
trigger test runs without making the reports public. Various test runs
are now queued. Hopefully we will get results soon.
Guenter
On Mon, Aug 15, 2022 at 11:22 AM Guenter Roeck <[email protected]> wrote:
>
> I ended up publishing several of the syzkaller reports because I could
> not figure out how to test the proposed patch otherwise. Sorry for the
> noise on the mailing lists, but I just could not figure out how to
> trigger test runs without making the reports public. Various test runs
> are now queued. Hopefully we should get results soon.
I actually think that the best "test" is to just test the revert and
see if the problems go away.
You may not have found a way to reliably trigger one particular
problem, but apparently you can easily enough trigger some random
issue with enough tests.
If it's some random memory corruption, there may simply not be a great
reproducer, with the oopses just depending on random layout and
timing.
Linus
On Mon, Aug 15, 2022 at 10:46:17AM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-08-15 12:50:52 -0400, Michael S. Tsirkin wrote:
> > On Mon, Aug 15, 2022 at 09:45:03AM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > OK so this gives us a quick revert as a solution for now.
> > > > Next, I would appreciate it if you just try this simple hack.
> > > > If it crashes we either have a long standing problem in virtio
> > > > code or more likely a gcp bug where it can't handle smaller
> > > > rings than what device requestes.
> > > > Thanks!
> > >
> > > I applied the below and the problem persists.
> > >
> > > [...]
> >
> > Okay!
>
> Just checking - I applied and tested this atop 6.0-rc1, correct? Or did you
> want me to test it with the 762faee5a267 reverted? I guess what you're trying
> to test if a smaller queue than what's requested you'd want to do so without
> the problematic patch applied...
>
>
> > And just to be 100% sure, can you try the following on top of 5.19:
>
> > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > index 623906b4996c..6f4e54a618bc 100644
> > --- a/drivers/virtio/virtio_pci_modern.c
> > +++ b/drivers/virtio/virtio_pci_modern.c
> > @@ -208,6 +208,9 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> > return ERR_PTR(-EINVAL);
> > }
> >
> > + if (num > 1024)
> > + num = 1024;
> > +
> > info->msix_vector = msix_vec;
> >
> > /* create the vring */
> >
> > --
>
> Either way, I did this, and there are no issues that I could observe. No
> oopses, no broken networking. But:
>
> To make sure it does something I added a debugging printk - which doesn't show
> up. I assume this is at a point at least earlyprintk should work (which I see
> getting enabled via serial)?
>
> Greetings,
>
> Andres Freund
Sorry if I was unclear. I wanted to know whether the change somehow
exposes a driver bug or a GCP bug. So what I wanted was to test
this patch on top of *5.19*, not on top of the revert.
The idea is that if we reduce the size and it starts crashing, then
we know it's GCP's fault; if not, then GCP can handle smaller sizes
and it's one of the driver changes.
It will apply on top of the revert but won't do much.
Yes I think printk should work here.
--
MST
Hi,
On 2022-08-15 16:21:51 -0400, Michael S. Tsirkin wrote:
> On Mon, Aug 15, 2022 at 10:46:17AM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2022-08-15 12:50:52 -0400, Michael S. Tsirkin wrote:
> > > On Mon, Aug 15, 2022 at 09:45:03AM -0700, Andres Freund wrote:
> > > > Hi,
> > > >
> > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > OK so this gives us a quick revert as a solution for now.
> > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > If it crashes we either have a long standing problem in virtio
> > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > rings than what device requestes.
> > > > > Thanks!
> > > >
> > > > I applied the below and the problem persists.
> > > >
> > > > [...]
> > >
> > > Okay!
> >
> > Just checking - I applied and tested this atop 6.0-rc1, correct? Or did you
> > want me to test it with the 762faee5a267 reverted? I guess what you're trying
> > to test if a smaller queue than what's requested you'd want to do so without
> > the problematic patch applied...
> >
> >
> > Either way, I did this, and there are no issues that I could observe. No
> > oopses, no broken networking. But:
> >
> > To make sure it does something I added a debugging printk - which doesn't show
> > up. I assume this is at a point at least earlyprintk should work (which I see
> > getting enabled via serial)?
> >
> Sorry if I was unclear. I wanted to know whether the change somehow
> exposes a driver bug or a GCP bug. So what I wanted to do is to test
> this patch on top of *5.19*, not on top of the revert.
Right, the 5.19 part was clear, just the earlier test:
> > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > OK so this gives us a quick revert as a solution for now.
> > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > If it crashes we either have a long standing problem in virtio
> > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > Thanks!
I wasn't sure about.
After I didn't see any effect on 5.19 + your patch, I grew a bit suspicious
and added the printks.
> Yes I think printk should work here.
The reason the debug patch didn't change anything, and that my debug printk
didn't show up, is that GCP uses the legacy paths...
If there were a bug in the legacy path, it'd explain why the problem only
shows up on GCP, and not in other situations.
I'll queue testing the legacy path with the equivalent change.
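Concretely, the plan is to clamp the ring size in virtio_pci_legacy.c's
setup_vq(), mirroring the modern-path hack above - roughly like this
(untested sketch on top of 5.19, line numbers omitted; treat it as
illustrative, not the exact patch I'll run):

--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ setup_vq()
 	if (!num || vp_legacy_get_queue_enable(&vp_dev->ldev, index))
 		return ERR_PTR(-ENOENT);
 
+	if (num > 1024)
+		num = 1024;
+
 	info->msix_vector = msix_vec;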
- Andres
Greetings,
Andres Freund
On Mon, Aug 15, 2022 at 11:37:54AM -0700, Linus Torvalds wrote:
> On Mon, Aug 15, 2022 at 11:22 AM Guenter Roeck <[email protected]> wrote:
> >
> > I ended up publishing several of the syzkaller reports because I could
> > not figure out how to test the proposed patch otherwise. Sorry for the
> > noise on the mailing lists, but I just could not figure out how to
> > trigger test runs without making the reports public. Various test runs
> > are now queued. Hopefully we should get results soon.
>
> I actually think that the best "test" is to just test the revert and
> see if the problems go away.
>
> You may not have found a way to reliably trigger one particular
> problem, but apparently you can easily enough trigger some random
> issue with enough tests.
>
> If it's some random memory corruption, there may simply not ba a great
> reproducer, with the oopses just depending on random layout and
> timing.
>
... and the environment. So far we have only been able to reproduce
the problem on Google Cloud VMs (GCE), and there it seems to happen
pretty much all the time.
Guenter
Michael,
On Mon, Aug 15, 2022 at 12:50:52PM -0400, Michael S. Tsirkin wrote:
[ ...]
>
> Okay! And just to be 100% sure, can you try the following on top of 5.19:
>
You should now be able to test any patches using the syzkaller
infrastructure. Pick any (or all) of the now-published syzkaller
reports from the linux-kernel mailing list, reply with:
#syz test git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.19
and provide your patch as an attachment.
Guenter
Hi,
On 2022-08-15 14:04:37 -0700, Andres Freund wrote:
> Booting with the equivalent change, atop 5.19, in the legacy setup_vq()
> reliably causes boot to hang:
I don't know much about virtio, so take this with a rock of salt:
Legacy setup_vq() doesn't tell the host about the queue size. The modern one
does:
vp_modern_set_queue_size(mdev, index, virtqueue_get_vring_size(vq));
but the legacy one doesn't.
I assume this means the host will assume the queue is of the size suggested by
vp_legacy_get_queue_size(). If the host continues to write into the space
after the "assumed end" of the queue, but the guest puts other stuff in that
space, well, I'd expect fun roughly like the stuff we've been seeing in this
and related threads.
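For contrast, the only thing the legacy path tells the device about the ring
is its location, via (copied from the legacy setup_vq(), as far as I can see):

	/* legacy: the device learns where the ring lives, but never how big
	 * the driver actually made it */
	vp_legacy_set_queue_address(&vp_dev->ldev, index, q_pfn);

so the host has no choice but to keep using the vp_legacy_get_queue_size()
value.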
Greetings,
Andres Freund
On Mon, Aug 15, 2022 at 01:53:30PM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-08-15 16:21:51 -0400, Michael S. Tsirkin wrote:
> > On Mon, Aug 15, 2022 at 10:46:17AM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2022-08-15 12:50:52 -0400, Michael S. Tsirkin wrote:
> > > > On Mon, Aug 15, 2022 at 09:45:03AM -0700, Andres Freund wrote:
> > > > > Hi,
> > > > >
> > > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > > OK so this gives us a quick revert as a solution for now.
> > > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > > If it crashes we either have a long standing problem in virtio
> > > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > > rings than what device requestes.
> > > > > > Thanks!
> > > > >
> > > > > I applied the below and the problem persists.
> > > > >
> > > > > [...]
> > > >
> > > > Okay!
> > >
> > > Just checking - I applied and tested this atop 6.0-rc1, correct? Or did you
> > > want me to test it with the 762faee5a267 reverted? I guess what you're trying
> > > to test if a smaller queue than what's requested you'd want to do so without
> > > the problematic patch applied...
> > >
> > >
> > > Either way, I did this, and there are no issues that I could observe. No
> > > oopses, no broken networking. But:
> > >
> > > To make sure it does something I added a debugging printk - which doesn't show
> > > up. I assume this is at a point at least earlyprintk should work (which I see
> > > getting enabled via serial)?
> > >
>
> > Sorry if I was unclear. I wanted to know whether the change somehow
> > exposes a driver bug or a GCP bug. So what I wanted to do is to test
> > this patch on top of *5.19*, not on top of the revert.
>
> Right, the 5.19 part was clear, just the earlier test:
>
> > > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > > OK so this gives us a quick revert as a solution for now.
> > > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > > If it crashes we either have a long standing problem in virtio
> > > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > > Thanks!
>
> I wasn't sure about.
>
> After I didn't see any effect on 5.19 + your patch, I grew a bit suspicious
> and added the printks.
>
>
> > Yes I think printk should work here.
>
> The reason the debug patch didn't change anything, and that my debug printk
> didn't show, is that gcp uses the legacy paths...
Wait a second. Eureka I think!
So I think GCP is not broken.
I think what's broken is this patch:
commit cdb44806fca2d0ad29ca644cbf1505433902ee0c
Author: Xuan Zhuo <[email protected]>
Date: Mon Aug 1 14:38:54 2022 +0800
virtio_pci: support the arg sizes of find_vqs()
Specifically:
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 2257f1b3d8ae..d75e5c4e637f 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -112,6 +112,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
unsigned int index,
void (*callback)(struct virtqueue *vq),
const char *name,
+ u32 size,
bool ctx,
u16 msix_vec)
{
@@ -125,10 +126,13 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
if (!num || vp_legacy_get_queue_enable(&vp_dev->ldev, index))
return ERR_PTR(-ENOENT);
+ if (!size || size > num)
+ size = num;
+
info->msix_vector = msix_vec;
/* create the vring */
- vq = vring_create_virtqueue(index, num,
+ vq = vring_create_virtqueue(index, size,
VIRTIO_PCI_VRING_ALIGN, &vp_dev->vdev,
true, false, ctx,
vp_notify, callback, name);
So if you pass the size parameter for a legacy device, it will
try to make the ring smaller, and that is not legal with
legacy at all. But the driver treats legacy and modern
the same: it allocates a smaller queue anyway.
Lo and behold, I pass disable-modern=on to qemu and it happily
corrupts memory exactly the same as GCP does.
So the new find_vqs API is actually completely broken: it cannot work for
legacy at all, and for added fun there's no way to find out
that it's legacy. Maybe we should interpret the patch
So I think I will also revert
04ca0b0b16f11faf74fa92468dab51b8372586cd..fe3dc04e31aa51f91dc7f741a5f76cc4817eb5b4
> If there were a bug in the legacy path, it'd explain why the problem only
> shows on gcp, and not in other situations.
>
> I'll queue testing the legacy path with the equivalent change.
>
> - Andres
>
>
> Greetings,
>
> Andres Freund
On 2022-08-15 13:53:31 -0700, Andres Freund wrote:
> The reason the debug patch didn't change anything, and that my debug printk
> didn't show, is that gcp uses the legacy paths...
>
> If there were a bug in the legacy path, it'd explain why the problem only
> shows on gcp, and not in other situations.
>
> I'll queue testing the legacy path with the equivalent change.
Booting with the equivalent change, atop 5.19, in the legacy setup_vq()
reliably causes boot to hang:
[ 0.718768] ACPI: button: Sleep Button [SLPF]
[ 0.721989] ACPI: \_SB_.LNKC: Enabled at IRQ 11
[ 0.722688] adebug: use legacy: 0
[ 0.722724] virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
[ 0.724286] adebug: probe modern: -19
[ 0.727353] ACPI: \_SB_.LNKD: Enabled at IRQ 10
[ 0.728719] adebug: use legacy: 0
[ 0.728766] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
[ 0.730422] adebug: probe modern: -19
[ 0.733552] ACPI: \_SB_.LNKA: Enabled at IRQ 10
[ 0.734923] adebug: use legacy: 0
[ 0.734957] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver
[ 0.736426] adebug: probe modern: -19
[ 0.739039] ACPI: \_SB_.LNKB: Enabled at IRQ 11
[ 0.740350] adebug: use legacy: 0
[ 0.740390] virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
[ 0.742142] adebug: probe modern: -19
[ 0.747627] adebug: legacy setup_vq
[ 0.748243] virtio-pci 0000:00:05.0: adebug: legacy: not limiting queue size, only 256
[ 0.751081] adebug: legacy setup_vq
[ 0.751110] virtio-pci 0000:00:05.0: adebug: legacy: not limiting queue size, only 256
[ 0.754028] adebug: legacy setup_vq
[ 0.754059] virtio-pci 0000:00:05.0: adebug: legacy: not limiting queue size, only 1
[ 0.757760] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.759135] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 0.760399] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
[ 0.761610] 00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A
[ 0.762923] 00:06: ttyS3 at I/O 0x2e8 (irq = 7, base_baud = 115200) is a 16550A
[ 0.764222] Non-volatile memory driver v1.3
[ 0.768857] adebug: legacy setup_vq
[ 0.768882] virtio-pci 0000:00:06.0: adebug: legacy: not limiting queue size, only 256
[ 0.773002] Linux agpgart interface v0.103
[ 0.775424] loop: module loaded
[ 0.780513] adebug: legacy setup_vq
[ 0.780538] virtio-pci 0000:00:03.0: adebug: legacy: limiting queue size from 8192 to 1024
[ 0.784075] adebug: legacy setup_vq
[ 0.784104] virtio-pci 0000:00:03.0: adebug: legacy: limiting queue size from 8192 to 1024
[ 0.787073] adebug: legacy setup_vq
[ 0.787101] virtio-pci 0000:00:03.0: adebug: legacy: limiting queue size from 8192 to 1024
[ 0.790379] scsi host0: Virtio SCSI HBA
[ 0.795968] Freeing initrd memory: 7236K
Greetings,
Andres Freund
On Mon, 15 Aug 2022 17:32:06 -0400, "Michael S. Tsirkin" <[email protected]> wrote:
> On Mon, Aug 15, 2022 at 01:53:30PM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2022-08-15 16:21:51 -0400, Michael S. Tsirkin wrote:
> > > On Mon, Aug 15, 2022 at 10:46:17AM -0700, Andres Freund wrote:
> > > > Hi,
> > > >
> > > > On 2022-08-15 12:50:52 -0400, Michael S. Tsirkin wrote:
> > > > > On Mon, Aug 15, 2022 at 09:45:03AM -0700, Andres Freund wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > > > OK so this gives us a quick revert as a solution for now.
> > > > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > > > If it crashes we either have a long standing problem in virtio
> > > > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > > > rings than what device requestes.
> > > > > > > Thanks!
> > > > > >
> > > > > > I applied the below and the problem persists.
> > > > > >
> > > > > > [...]
> > > > >
> > > > > Okay!
> > > >
> > > > Just checking - I applied and tested this atop 6.0-rc1, correct? Or did you
> > > > want me to test it with the 762faee5a267 reverted? I guess what you're trying
> > > > to test if a smaller queue than what's requested you'd want to do so without
> > > > the problematic patch applied...
> > > >
> > > >
> > > > Either way, I did this, and there are no issues that I could observe. No
> > > > oopses, no broken networking. But:
> > > >
> > > > To make sure it does something I added a debugging printk - which doesn't show
> > > > up. I assume this is at a point at least earlyprintk should work (which I see
> > > > getting enabled via serial)?
> > > >
> >
> > > Sorry if I was unclear. I wanted to know whether the change somehow
> > > exposes a driver bug or a GCP bug. So what I wanted to do is to test
> > > this patch on top of *5.19*, not on top of the revert.
> >
> > Right, the 5.19 part was clear, just the earlier test:
> >
> > > > > > On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> > > > > > > OK so this gives us a quick revert as a solution for now.
> > > > > > > Next, I would appreciate it if you just try this simple hack.
> > > > > > > If it crashes we either have a long standing problem in virtio
> > > > > > > code or more likely a gcp bug where it can't handle smaller
> > > > > > > Thanks!
> >
> > I wasn't sure about.
> >
> > After I didn't see any effect on 5.19 + your patch, I grew a bit suspicious
> > and added the printks.
> >
> >
> > > Yes I think printk should work here.
> >
> > The reason the debug patch didn't change anything, and that my debug printk
> > didn't show, is that gcp uses the legacy paths...
>
> Wait a second. Eureka I think!
>
> So I think GCP is not broken.
> I think what's broken is this patch:
>
> commit cdb44806fca2d0ad29ca644cbf1505433902ee0c
> Author: Xuan Zhuo <[email protected]>
> Date: Mon Aug 1 14:38:54 2022 +0800
>
> virtio_pci: support the arg sizes of find_vqs()
>
>
> Specifically:
>
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index 2257f1b3d8ae..d75e5c4e637f 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -112,6 +112,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> unsigned int index,
> void (*callback)(struct virtqueue *vq),
> const char *name,
> + u32 size,
> bool ctx,
> u16 msix_vec)
> {
> @@ -125,10 +126,13 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> if (!num || vp_legacy_get_queue_enable(&vp_dev->ldev, index))
> return ERR_PTR(-ENOENT);
>
> + if (!size || size > num)
> + size = num;
> +
> info->msix_vector = msix_vec;
>
> /* create the vring */
> - vq = vring_create_virtqueue(index, num,
> + vq = vring_create_virtqueue(index, size,
> VIRTIO_PCI_VRING_ALIGN, &vp_dev->vdev,
> true, false, ctx,
> vp_notify, callback, name);
>
>
>
> So if you pass the size parameter for a legacy device it will
> try to make the ring smaller and that is not legal with
> legacy at all. But the driver treats legacy and modern
> the same, it allocates a smaller queue anyway.
>
>
> Lo and behold, I pass disable-modern=on to qemu and it happily
> corrupts memory exactly the same as GCP does.
Yes, I think you are right.
Thank you very much.
>
>
> So the new find_vqs API is actually completely broken, it can not work for
> legacy at all and for added fun there's no way to find out
> that it's legacy. Maybe we should interpret the patch
>
> So I think I will also revert
>
> 04ca0b0b16f11faf74fa92468dab51b8372586cd..fe3dc04e31aa51f91dc7f741a5f76cc4817eb5b4
>
>
>
>
>
>
>
> > If there were a bug in the legacy path, it'd explain why the problem only
> > shows on gcp, and not in other situations.
> >
> > I'll queue testing the legacy path with the equivalent change.
> >
> > - Andres
> >
> >
> > Greetings,
> >
> > Andres Freund
>
On Mon, 15 Aug 2022 17:32:06 -0400, Michael wrote:
> So if you pass the size parameter for a legacy device it will
> try to make the ring smaller and that is not legal with
> legacy at all. But the driver treats legacy and modern
> the same, it allocates a smaller queue anyway.
>
> Lo and behold, I pass disable-modern=on to qemu and it happily
> corrupts memory exactly the same as GCP does.
Ouch!
I understand that the host does the actual corruption,
but could you think of any additional debug checking in the guest
that would catch this in the future? Potentially only when KASAN
is enabled, which can verify the validity of memory ranges.
Some kind of additional layer of sanity checking.
This caused a bit of havoc for syzbot, with almost 100 unique
crash signatures, so it would be useful to catch such issues more
reliably in the future.
Thanks
On Wed, 17 Aug 2022 08:13:59 +0200, Dmitry Vyukov <[email protected]> wrote:
> On Mon, 15 Aug 2022 17:32:06 -0400, Michael wrote:
> > So if you pass the size parameter for a legacy device it will
> > try to make the ring smaller and that is not legal with
> > legacy at all. But the driver treats legacy and modern
> > the same, it allocates a smaller queue anyway.
> >
> > Lo and behold, I pass disable-modern=on to qemu and it happily
> > corrupts memory exactly the same as GCP does.
>
> Ouch!
>
> I understand that the host does the actual corruption,
> but could you think of any additional debug checking in the guest
> that would caught this in future? Potentially only when KASAN
> is enabled which can verify validity of memory ranges.
> Some kind of additional layer of sanity checking.
>
> This caused a bit of a havoc for syzbot with almost 100 unique
> crash signatures, so would be useful to catch such issues more
> reliably in future.
We can add a check of the vring size before calling vp_legacy_set_queue_address().
Checking the memory range directly is a bit cumbersome.
Thanks.
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 2257f1b3d8ae..0673831f45b6 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -146,6 +146,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
goto out_del_vq;
}
+ BUG_ON(num != virtqueue_get_vring_size(vq));
+
/* activate the queue */
vp_legacy_set_queue_address(&vp_dev->ldev, index, q_pfn);
>
> Thanks
On Wed, Aug 17, 2022 at 02:36:31PM +0800, Xuan Zhuo wrote:
> On Wed, 17 Aug 2022 08:13:59 +0200, Dmitry Vyukov <[email protected]> wrote:
> > On Mon, 15 Aug 2022 17:32:06 -0400, Michael wrote:
> > > So if you pass the size parameter for a legacy device it will
> > > try to make the ring smaller and that is not legal with
> > > legacy at all. But the driver treats legacy and modern
> > > the same, it allocates a smaller queue anyway.
> > >
> > > Lo and behold, I pass disable-modern=on to qemu and it happily
> > > corrupts memory exactly the same as GCP does.
> >
> > Ouch!
> >
> > I understand that the host does the actual corruption,
> > but could you think of any additional debug checking in the guest
> > that would caught this in future? Potentially only when KASAN
> > is enabled which can verify validity of memory ranges.
> > Some kind of additional layer of sanity checking.
> >
> > This caused a bit of a havoc for syzbot with almost 100 unique
> > crash signatures, so would be useful to catch such issues more
> > reliably in future.
>
> We can add a check to vring size before calling vp_legacy_set_queue_address().
> Checking the memory range directly is a bit cumbersome.
>
> Thanks.
With a comment along the lines of
/* Legacy virtio pci has no way to communicate a change in vq size to
 * the hypervisor. If ring sizes don't match, the hypervisor will happily
* corrupt memory.
*/
> diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
> index 2257f1b3d8ae..0673831f45b6 100644
> --- a/drivers/virtio/virtio_pci_legacy.c
> +++ b/drivers/virtio/virtio_pci_legacy.c
> @@ -146,6 +146,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
> goto out_del_vq;
> }
>
> + BUG_ON(num != virtqueue_get_vring_size(vq));
> +
> /* activate the queue */
> vp_legacy_set_queue_address(&vp_dev->ldev, index, q_pfn);
>
>
> >
> > Thanks
On Tue, Aug 16, 2022 at 11:47 PM Xuan Zhuo <[email protected]> wrote:
>
> + BUG_ON(num != virtqueue_get_vring_size(vq));
> +
Please, no more BUG_ON.
Add a WARN_ON_ONCE() and return an error.
Linus
On Sun, Aug 14, 2022 at 2:26 PM Guenter Roeck <[email protected]> wrote:
>
> syscaller reports lots of crashes with the mainline kernel. The latest
> I have seen, based on the current ToT, is attached. Backtraces are not
> always the same, but this one is typical.
Ok, I just pushed out the pull of the virtio fixes, so hopefully this
is all behind us.
Looking forward to (not really ;^) seeing what the google cloud
automation reports with this fixed.
Thanks,
Linus
Hi,
On 2022-08-17 10:12:18 -0700, Linus Torvalds wrote:
> On Sun, Aug 14, 2022 at 2:26 PM Guenter Roeck <[email protected]> wrote:
> >
> > syscaller reports lots of crashes with the mainline kernel. The latest
> > I have seen, based on the current ToT, is attached. Backtraces are not
> > always the same, but this one is typical.
>
> Ok, I just pushed out the pull of the virtio fixes, so hopefully this
> is all behind us.
FWIW, my much smaller automation turned green, building 274a2eebf80c
https://cirrus-ci.com/task/6357530606567424?logs=build_image#L3814
Previously it failed to come back up after a reboot at that point.
And the first run of postgres' tests using that newly built image also passed,
so I'm reasonably confident that the issue is fixed, rather than made less
common. Not that that is as likely to find problems as syzkaller.
Greetings,
Andres Freund
On Wed, 17 Aug 2022 08:58:20 -0700, Linus Torvalds <[email protected]> wrote:
> On Tue, Aug 16, 2022 at 11:47 PM Xuan Zhuo <[email protected]> wrote:
> >
> > + BUG_ON(num != virtqueue_get_vring_size(vq));
> > +
>
> Please, no more BUG_ON.
>
> Add a WARN_ON_ONCE() and return an error.
OK, I will post v2 with WARN_ON_ONCE().
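Probably something along these lines (untested sketch; the err variable,
the -EINVAL value and the goto target are placeholders - the exact error
unwinding will depend on what setup_vq() already has at that point):

	/* Legacy virtio pci has no way to communicate a change in vq size to
	 * the hypervisor. If ring sizes don't match, the hypervisor will
	 * happily corrupt memory.
	 */
	if (WARN_ON_ONCE(num != virtqueue_get_vring_size(vq))) {
		err = -EINVAL;
		goto out_del_vq;
	}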
Thanks.
>
> Linus