2022-09-25 12:24:00

by bugzilla-daemon

Subject: [Bug 216529] New: [fstests generic/048] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0

https://bugzilla.kernel.org/show_bug.cgi?id=216529

Bug ID: 216529
Summary: [fstests generic/048] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0
Product: File System
Version: 2.5
Kernel Version: 6.0.0-rc6+
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: ext4
Assignee: [email protected]
Reporter: [email protected]
Regression: No

Hit a panic on ppc64le, by running generic/048 with 1k block size:

[ 4638.919160] run fstests generic/048 at 2022-09-23 21:00:41
[ 4641.700564] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota
mode: none.
[ 4641.710999] EXT4-fs (sda3): shut down requested (1)
[ 4641.718544] Aborting journal on device sda3-8.
[ 4641.740342] EXT4-fs (sda3): unmounting filesystem.
[ 4643.000415] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota
mode: none.
[ 4681.230907] BUG: Kernel NULL pointer dereference at 0x00000069
[ 4681.230922] Faulting instruction address: 0xc00000000068ee0c
[ 4681.230929] Oops: Kernel access of bad area, sig: 11 [#1]
[ 4681.230934] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 4681.230942] Modules linked in: dm_flakey ext2 dm_snapshot dm_bufio dm_zero
dm_mod loop ext4 mbcache jbd2 bonding rfkill tls sunrpc pseries_rng drm fuse
drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth
scsi_transport_srp vmx_crypto
[ 4681.230991] CPU: 0 PID: 82 Comm: kswapd0 Kdump: loaded Not tainted
6.0.0-rc6+ #1
[ 4681.230999] NIP: c00000000068ee0c LR: c00000000068f2b8 CTR:
0000000000000000
[ 4681.238525] REGS: c000000006c0b560 TRAP: 0380 Not tainted (6.0.0-rc6+)
[ 4681.238532] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR:
24028242 XER: 00000000
[ 4681.238556] CFAR: c00000000068edf4 IRQMASK: 0
[ 4681.238556] GPR00: c00000000068f2b8 c000000006c0b800 c000000002cf1700
c00c00000042f1c0
[ 4681.238556] GPR04: c000000006c0b860 0000000000000000 0000000000000002
0000000000000000
[ 4681.238556] GPR08: c000000002d404b0 0000000000000000 c00c00000042f1c0
0000000000000000
[ 4681.238556] GPR12: c0000000001cf080 c000000005100000 c000000000194298
c0000001fff9c480
[ 4681.238556] GPR16: c000000048cdb850 0000000000000007 0000000000000000
0000000000000000
[ 4681.238556] GPR20: 0000000000000001 c000000006c0b8f8 c00000000146b9d8
5deadbeef0000100
[ 4681.238556] GPR24: 5deadbeef0000122 c000000048cdb800 c000000006c0bc00
c000000006c0b8e8
[ 4681.238556] GPR28: c000000006c0b860 c00c00000042f1c0 0000000000000009
0000000000000009
[ 4681.238634] NIP [c00000000068ee0c] drop_buffers.constprop.0+0x4c/0x1c0
[ 4681.238643] LR [c00000000068f2b8] try_to_free_buffers+0x128/0x150
[ 4681.238650] Call Trace:
[ 4681.238654] [c000000006c0b800] [c000000006c0b880] 0xc000000006c0b880
(unreliable)
[ 4681.238663] [c000000006c0b840] [c000000006c0bc00] 0xc000000006c0bc00
[ 4681.238670] [c000000006c0b890] [c000000000498708]
filemap_release_folio+0x88/0xb0
[ 4681.238679] [c000000006c0b8b0] [c0000000004c51c0]
shrink_active_list+0x490/0x750
[ 4681.238688] [c000000006c0b9b0] [c0000000004c9f88] shrink_lruvec+0x3f8/0x430
[ 4681.238697] [c000000006c0baa0] [c0000000004ca1f4]
shrink_node_memcgs+0x234/0x290
[ 4681.238704] [c000000006c0bb10] [c0000000004ca3c4] shrink_node+0x174/0x6b0
[ 4681.238711] [c000000006c0bbc0] [c0000000004cacf0] balance_pgdat+0x3f0/0x970
[ 4681.238718] [c000000006c0bd20] [c0000000004cb440] kswapd+0x1d0/0x450
[ 4681.238726] [c000000006c0bdc0] [c0000000001943d8] kthread+0x148/0x150
[ 4681.238735] [c000000006c0be10] [c00000000000cbe4]
ret_from_kernel_thread+0x5c/0x64
[ 4681.238745] Instruction dump:
[ 4681.238749] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018
60000000
[ 4681.238765] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c
7d295378
[ 4681.238782] ---[ end trace 0000000000000000 ]---
[ 4681.270607]
[ 4681.337460] Kernel attempted to read user page (6a) - exploit attempt? (uid:
0)
[ 4681.337469] BUG: Kernel NULL pointer dereference on read at 0x0000006a
[ 4681.337474] Faulting instruction address: 0xc00000000068ee0c
[ 4681.337478] Oops: Kernel access of bad area, sig: 11 [#2]
[ 4681.337481] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 4681.337486] Modules linked in: dm_flakey ext2 dm_snapshot dm_bufio dm_zero
dm_mod loop ext4 mbcache jbd2 bonding rfkill tls sunrpc pseries_rng drm fuse
drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth
scsi_transport_srp vmx_crypto
[ 4681.337517] CPU: 2 PID: 704157 Comm: xfs_io Kdump: loaded Tainted: G D
6.0.0-rc6+ #1
[ 4681.337523] NIP: c00000000068ee0c LR: c00000000068f2b8 CTR:
0000000000000000
[ 4681.337527] REGS: c000000036006ef0 TRAP: 0300 Tainted: G D
(6.0.0-rc6+)
[ 4681.337532] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR:
28428242 XER: 00000001
[ 4681.337546] CFAR: c00000000000c80c DAR: 000000000000006a DSISR: 40000000
IRQMASK: 0
[ 4681.337546] GPR00: c00000000068f2b8 c000000036007190 c000000002cf1700
c00c000000424740
[ 4681.337546] GPR04: c0000000360071f0 0000000000000000 0000000000000002
0000000000000000
[ 4681.337546] GPR08: c000000002d404b0 0000000000000000 c00c000000424740
0000000000000002
[ 4681.337546] GPR12: 0000000000000000 c00000000ffce400 0000000000000000
c0000001fff9c480
[ 4681.337546] GPR16: c00000004960e050 0000000000000007 0000000000000000
0000000000000000
[ 4681.337546] GPR20: 0000000000000001 c000000036007288 c00000000146b9d8
5deadbeef0000100
[ 4681.337546] GPR24: 5deadbeef0000122 c00000004960e000 c000000036007678
c000000036007278
[ 4681.337546] GPR28: c0000000360071f0 c00c000000424740 000000000000000a
000000000000000a
[ 4681.337602] NIP [c00000000068ee0c] drop_buffers.constprop.0+0x4c/0x1c0
[ 4681.337608] LR [c00000000068f2b8] try_to_free_buffers+0x128/0x150
[ 4681.337613] Call Trace:
[ 4681.337616] [c000000036007190] [c000000036007210] 0xc000000036007210
(unreliable)
[ 4681.337622] [c0000000360071d0] [c000000036007678] 0xc000000036007678
[ 4681.337627] [c000000036007220] [c000000000498708]
filemap_release_folio+0x88/0xb0
[ 4681.337633] [c000000036007240] [c0000000004c51c0]
shrink_active_list+0x490/0x750
[ 4681.337640] [c000000036007340] [c0000000004c9f88] shrink_lruvec+0x3f8/0x430
[ 4681.337645] [c000000036007430] [c0000000004ca1f4]
shrink_node_memcgs+0x234/0x290
[ 4681.337651] [c0000000360074a0] [c0000000004ca3c4] shrink_node+0x174/0x6b0
[ 4681.337656] [c000000036007550] [c0000000004cbd34]
shrink_zones.constprop.0+0xd4/0x3e0
[ 4681.337661] [c0000000360075d0] [c0000000004cc158]
do_try_to_free_pages+0x118/0x470
[ 4681.337667] [c000000036007650] [c0000000004cd084]
try_to_free_pages+0x194/0x4c0
[ 4681.337673] [c000000036007720] [c00000000054cca4]
__alloc_pages_slowpath.constprop.0+0x4f4/0xd80
[ 4681.337680] [c000000036007880] [c00000000054d95c] __alloc_pages+0x42c/0x580
[ 4681.337686] [c000000036007910] [c000000000587d88] alloc_pages+0xd8/0x1d0
[ 4681.337692] [c000000036007960] [c000000000587eb4] folio_alloc+0x34/0x90
[ 4681.337698] [c000000036007990] [c000000000498bc0]
filemap_alloc_folio+0x40/0x60
[ 4681.337703] [c0000000360079b0] [c0000000004a0f54]
__filemap_get_folio+0x224/0x790
[ 4681.337709] [c000000036007ab0] [c0000000004b4830]
pagecache_get_page+0x30/0xb0
[ 4681.337715] [c000000036007ae0] [c008000003a9e4dc]
ext4_da_write_begin+0x1a4/0x4f0 [ext4]
[ 4681.337742] [c000000036007b70] [c000000000498e54]
generic_perform_write+0xf4/0x2b0
[ 4681.337748] [c000000036007c20] [c008000003a7d190]
ext4_buffered_write_iter+0xa8/0x1a0 [ext4]
[ 4681.337770] [c000000036007c70] [c000000000615fc8] vfs_write+0x358/0x4b0
[ 4681.337776] [c000000036007d40] [c0000000006161f4] sys_pwrite64+0xd4/0x120
[ 4681.337782] [c000000036007da0] [c0000000000318d0]
system_call_exception+0x180/0x430
[ 4681.337788] [c000000036007e10] [c00000000000be68]
system_call_vectored_common+0xe8/0x278
[ 4681.337795] --- interrupt: 3000 at 0x7fff95651da4
[ 4681.337799] NIP: 00007fff95651da4 LR: 0000000000000000 CTR:
0000000000000000
[ 4681.337803] REGS: c000000036007e80 TRAP: 3000 Tainted: G D
(6.0.0-rc6+)
[ 4681.337807] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR:
48082402 XER: 00000000
[ 4681.337822] IRQMASK: 0
[ 4681.337822] GPR00: 00000000000000b4 00007ffffaa52530 00007fff95767200
0000000000000003
[ 4681.337822] GPR04: 0000010031ac0000 0000000000010000 0000000000490000
00007fff9581a5a0
[ 4681.337822] GPR08: 00007fff95812e68 0000000000000000 0000000000000000
0000000000000000
[ 4681.337822] GPR12: 0000000000000000 00007fff9581a5a0 0000000000a00000
ffffffffffffffff
[ 4681.337822] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 4681.337822] GPR20: 0000000000000000 0000000000000000 0000000000000000
0000000000490000
[ 4681.337822] GPR24: 0000000000000049 0000000000000000 0000000000000000
0000000000010000
[ 4681.337822] GPR28: 0000010031ac0000 0000000000000003 0000000000000000
0000000000490000
[ 4681.337875] NIP [00007fff95651da4] 0x7fff95651da4
[ 4681.337878] LR [0000000000000000] 0x0
[ 4681.337881] --- interrupt: 3000
[ 4681.337884] Instruction dump:
[ 4681.337887] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018
60000000
[ 4681.337897] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c
7d295378
[ 4681.337908] ---[ end trace 0000000000000000 ]---

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


2022-09-26 05:04:10

by bugzilla-daemon

Subject: [Bug 216529] [fstests generic/048] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0

https://bugzilla.kernel.org/show_bug.cgi?id=216529

--- Comment #1 from Theodore Tso ([email protected]) ---
On Sun, Sep 25, 2022 at 11:55:29AM +0000, [email protected] wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216529
>
>
> Hit a panic on ppc64le, by running generic/048 with 1k block size:

Hmm, does this reproduce reliably for you? I test with a 1k block
size on x86_64 as a proxy for 4k block sizes on PPC64, where the blocksize
< pagesize... and this isn't reproducing for me on x86, and I don't
have access to a PPC64LE system.

Ritesh, is this something you can take a look at? Thanks!

- Ted

2022-09-27 00:48:27

by bugzilla-daemon

Subject: [Bug 216529] [fstests generic/048] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0

https://bugzilla.kernel.org/show_bug.cgi?id=216529

--- Comment #2 from Zorro Lang ([email protected]) ---
(In reply to Theodore Tso from comment #1)
> On Sun, Sep 25, 2022 at 11:55:29AM +0000, [email protected] wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216529
> >
> >
> > Hit a panic on ppc64le, by running generic/048 with 1k block size:
>
> Hmm, does this reproduce reliably for you? I test with a 1k block
> size on x86_64 as a proxy for 4k block sizes on PPC64, where the blocksize
> < pagesize... and this isn't reproducing for me on x86, and I don't
> have access to a PPC64LE system.

Hi Ted,

Yes, it's reproducible for me. I just reproduced it again on another ppc64le
(P8) machine [1]. But it's not easy to reproduce by running generic/048 (maybe
there's a better way to reproduce it).

And this time the call trace is a little different. Might it be a folio [mm]
related bug? Maybe I should cc the linux-mm list to get more eyes on it?

Thanks,
Zorro

[ 1254.857035] run fstests generic/048 at 2022-09-26 12:12:26
[ 1257.651002] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota
mode: none.
[ 1257.666754] EXT4-fs (sda3): shut down requested (1)
[ 1257.666773] Aborting journal on device sda3-8.
[ 1257.696046] EXT4-fs (sda3): unmounting filesystem.
[ 1259.216580] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota
mode: none.
[ 1273.042962] restraintd[2251]: *** Current Time: Mon Sep 26 12:12:45 2022
Localwatchdog at: Wed Sep 28 11:54:44 2022
[ 1333.319238] restraintd[2251]: *** Current Time: Mon Sep 26 12:13:45 2022
Localwatchdog at: Wed Sep 28 11:54:44 2022
[ 1394.828503] restraintd[2251]: *** Current Time: Mon Sep 26 12:14:47 2022
Localwatchdog at: Wed Sep 28 11:54:44 2022
[ 1403.799008] BUG: Kernel NULL pointer dereference at 0x00000062
[ 1403.799218] Faulting instruction address: 0xc00000000068edfc
[ 1403.799228] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1403.799233] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 1403.799241] Modules linked in: ext4 mbcache jbd2 bonding tls rfkill sunrpc
pseries_rng drm fuse drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi
sg ibmvscsi ibmveth scsi_transport_srp vmx_crypto
[ 1403.799280] CPU: 4 PID: 82 Comm: kswapd0 Kdump: loaded Not tainted 6.0.0-rc7
#1
[ 1403.799293] NIP: c00000000068edfc LR: c00000000068f2a8 CTR:
0000000000000000
[ 1403.799300] REGS: c00000000a44b560 TRAP: 0380 Not tainted (6.0.0-rc7)
[ 1403.799308] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR:
28028244 XER: 00000001
[ 1403.799327] CFAR: c00000000068ede4 IRQMASK: 0
[ 1403.799327] GPR00: c00000000068f2a8 c00000000a44b800 c000000002cf1700
c00c0000001c0bc0
[ 1403.799327] GPR04: c00000000a44b860 0000000000000002 00000003fb290000
c000000002de7dc8
[ 1403.799327] GPR08: 0000000ae4f08f42 0000000000000000 c00c0000001c0bc0
0000000000008000
[ 1403.799327] GPR12: 00000003fb290000 c00000000ffcc080 c000000000194288
c0000003fff9c480
[ 1403.799327] GPR16: c000000069d30050 0000000000000007 0000000000000000
0000000000000000
[ 1403.799327] GPR20: 0000000000000001 c00000000a44b8f8 c00000000146bad8
5deadbeef0000100
[ 1403.799327] GPR24: 5deadbeef0000122 c000000069d30000 c00000000a44bc00
c00000000a44b8e8
[ 1403.799327] GPR28: c00000000a44b860 c00c0000001c0bc0 0000000000000002
0000000000000002
[ 1403.799413] NIP [c00000000068edfc] drop_buffers.constprop.0+0x4c/0x1c0
[ 1403.799423] LR [c00000000068f2a8] try_to_free_buffers+0x128/0x150
[ 1403.799431] Call Trace:
[ 1403.799434] [c00000000a44b840] [c00000000a44bc00] 0xc00000000a44bc00
[ 1403.799443] [c00000000a44b890] [c0000000004986f8]
filemap_release_folio+0x88/0xb0
[ 1403.799452] [c00000000a44b8b0] [c0000000004c51b0]
shrink_active_list+0x490/0x750
[ 1403.799462] [c00000000a44b9b0] [c0000000004c9f78] shrink_lruvec+0x3f8/0x430
[ 1403.799470] [c00000000a44baa0] [c0000000004ca1e4]
shrink_node_memcgs+0x234/0x290
[ 1403.799478] [c00000000a44bb10] [c0000000004ca3b4] shrink_node+0x174/0x6b0
[ 1403.799486] [c00000000a44bbc0] [c0000000004cace0] balance_pgdat+0x3f0/0x970
[ 1403.799494] [c00000000a44bd20] [c0000000004cb430] kswapd+0x1d0/0x450
[ 1403.799501] [c00000000a44bdc0] [c0000000001943c8] kthread+0x148/0x150
[ 1403.799510] [c00000000a44be10] [c00000000000cbe4]
ret_from_kernel_thread+0x5c/0x64
[ 1403.799520] Instruction dump:
[ 1403.799525] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018
60000000
[ 1403.799540] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c
7d295378
[ 1403.799554] ---[ end trace 0000000000000000 ]---
[ 1403.806330]
[-- MARK -- Mon Sep 26 16:15:00 2022]
[ 1415.093395] EXT4-fs (sda3): shut down requested (2)
[ 1415.093410] Aborting journal on device sda3-8.
[ 1429.107188] EXT4-fs (sda3): unmounting filesystem.
[ 1429.926262] EXT4-fs (sda3): recovery complete
[ 1429.983938] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota
mode: none.
[ 1429.988189] EXT4-fs (sda3): unmounting filesystem.
[ 1430.166549] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota
mode: none.
[ 1453.015796] restraintd[2251]: *** Current Time: Mon Sep 26 12:15:45 2022
Localwatchdog at: Wed Sep 28 11:54:44 2022
[ 1454.708150] EXT4-fs (sda5): unmounting filesystem.
[ 1455.225112] EXT4-fs (sda3): unmounting filesystem.
[ 1456.128026] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota
mode: none.
[ 1456.139102] EXT4-fs (sda3): unmounting filesystem.
[ 1456.396367] EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota
mode: none.
[ 1462.317449] EXT4-fs (sda3): mounted filesystem with ordered data mode. Quota
mode: none.
[ 1462.326680] EXT4-fs (sda3): unmounting filesystem.
[ 1462.427320] EXT4-fs (sda5): unmounting filesystem.
[ 1463.259690] EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota
mode: none.


>
> Ritesh, is this something you can take a look at? Thanks!
>
> - Ted

2022-09-27 18:28:35

by bugzilla-daemon

Subject: [Bug 216529] [fstests generic/048] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0

https://bugzilla.kernel.org/show_bug.cgi?id=216529

--- Comment #3 from Theodore Tso ([email protected]) ---
On Tue, Sep 27, 2022 at 12:47:02AM +0000, [email protected] wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216529
>
> Yes, it's reproducible for me, I just reproduced it again on another ppc64le
> (P8) machine [1]. But it's not easy to reproduce by running generic/048 (maybe
> there's a better way to reproduce it).

Can you give a rough percentage of how often it reproduces? e.g.,
does it reproduce 10% of the time? 50% of the time? 2-3 times after
100 tries, so 2-3%? etc. If it reproduces but rarely, it'll be a lot
harder to try to bisect.
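
To put numbers on why a low reproduction rate makes bisection harder: if each
run fails independently with probability p, a kernel can only be called good
after enough clean runs that a failure would very likely have shown up. A rough
Python sketch of that arithmetic follows. The names reproduction_rate and
runs_needed are made up for illustration, run_once stands in for whatever
launches generic/048 and reports whether the oops was hit, and the independence
of runs is an assumption, not something established in this thread.

```python
import math
from typing import Callable


def reproduction_rate(run_once: Callable[[], bool], tries: int) -> float:
    """Call run_once() `tries` times and return the fraction of runs that failed.

    run_once is a hypothetical callable that returns True when the run hit
    the oops and False when it completed cleanly.
    """
    if tries <= 0:
        raise ValueError("tries must be positive")
    hits = sum(1 for _ in range(tries) if run_once())
    return hits / tries


def runs_needed(rate: float, confidence: float = 0.95) -> int:
    """Clean runs needed before declaring a kernel good.

    Assumes runs are independent with per-run failure probability `rate`.
    Solves (1 - rate) ** n <= 1 - confidence for the smallest integer n.
    """
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


if __name__ == "__main__":
    for rate in (0.5, 0.1, 0.02):
        print(f"rate {rate:.0%}: {runs_needed(rate)} clean runs per bisect step")
```

Under those assumptions a 50% reproducer needs about 5 clean runs per bisect
step for 95% confidence, a 10% reproducer about 29, and a 2% reproducer
about 149.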

Something perhaps to try is to enable KASAN, since both stack traces
seem to involve a null pointer dereference while trying to free
buffers. Maybe that will give us some hints towards the cause....
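
The dereference mentioned above can be narrowed down from the oops text itself.
The bracketed word in each instruction dump, <815f0060>, is the faulting
instruction, and all three oopses posted so far show the same word. A small
Python sketch, assuming the standard PowerPC D-form layout (6-bit opcode, 5-bit
RT, 5-bit RA, 16-bit signed displacement) and taking the GPR31 values and fault
addresses from the register dumps earlier in this thread; the helper names are
hypothetical:

```python
LWZ_OPCODE = 32  # PowerPC primary opcode for "load word and zero" (D-form)


def decode_dform(word: int) -> tuple[int, int, int, int]:
    """Split a 32-bit D-form instruction word into (opcode, rt, ra, d)."""
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    d = word & 0xFFFF
    if d & 0x8000:  # the displacement is a signed 16-bit value
        d -= 0x10000
    return opcode, rt, ra, d


def effective_address(word: int, gprs: dict[int, int]) -> int:
    """Compute the address a D-form load accesses, given register values."""
    _, _, ra, d = decode_dform(word)
    base = 0 if ra == 0 else gprs[ra]  # RA == 0 means a literal zero base
    return (base + d) & 0xFFFFFFFFFFFFFFFF


FAULTING_WORD = 0x815F0060

# (GPR31 value, reported fault address) for the three oopses in this thread
OOPSES = [(0x9, 0x69), (0xA, 0x6A), (0x2, 0x62)]

if __name__ == "__main__":
    opcode, rt, ra, d = decode_dform(FAULTING_WORD)
    print(f"opcode={opcode} rt=r{rt} ra=r{ra} d={d:#x}")
    for gpr31, fault in OOPSES:
        ea = effective_address(FAULTING_WORD, {31: gpr31})
        print(f"r31={gpr31:#x} -> {ea:#x} (reported {fault:#x})")
```

The word decodes as lwz r10,0x60(r31), and in each oops GPR31 + 0x60 equals the
reported fault address (0x9 -> 0x69, 0xa -> 0x6a, 0x2 -> 0x62). So the base
pointer is a small bogus value rather than exactly NULL. Which structure field
lives at offset 0x60 is not established in this thread.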

Thanks,

- Ted

2022-09-27 18:30:04

by Ritesh Harjani

Subject: Re: [Bug 216529] New: [fstests generic/048] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0

On 22/09/26 01:02AM, Theodore Ts'o wrote:
> On Sun, Sep 25, 2022 at 11:55:29AM +0000, [email protected] wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216529
> >
> >
> > Hit a panic on ppc64le, by running generic/048 with 1k block size:
>
> Hmm, does this reproduce reliably for you? I test with a 1k block
> size on x86_64 as a proxy for 4k block sizes on PPC64, where the blocksize
> < pagesize... and this isn't reproducing for me on x86, and I don't
> have access to a PPC64LE system.
>
> Ritesh, is this something you can take a look at? Thanks!

I was away for some personal work for the last few days, but I am back to work from
today. Sure, I will take a look at this and will get back.

I did give this test a couple of runs though, but wasn't able to reproduce it.
But let me try a few more things along with more iterations. Will update
accordingly.

-ritesh

2022-10-10 17:02:42

by Ritesh Harjani

Subject: Re: [Bug 216529] New: [fstests generic/048] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0

On 22/09/27 11:40PM, Ritesh Harjani (IBM) wrote:
> On 22/09/26 01:02AM, Theodore Ts'o wrote:
> > On Sun, Sep 25, 2022 at 11:55:29AM +0000, [email protected] wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216529
> > >
> > >
> > > Hit a panic on ppc64le, by running generic/048 with 1k block size:
> >
> > Hmm, does this reproduce reliably for you? I test with a 1k block
> > size on x86_64 as a proxy for 4k block sizes on PPC64, where the blocksize
> > < pagesize... and this isn't reproducing for me on x86, and I don't
> > have access to a PPC64LE system.
> >
> > Ritesh, is this something you can take a look at? Thanks!
>
> I was away for some personal work for the last few days, but I am back to work from
> today. Sure, I will take a look at this and will get back.
>
> I did give this test a couple of runs though, but wasn't able to reproduce it.
> But let me try a few more things along with more iterations. Will update
> accordingly.

I thought I had updated this. But I guess I forgot to update this mail
thread...

I tested this for quite some time in a loop and also gave it an overnight run,
but I couldn't hit this issue. I used a guest with little memory, so that we
could see more reclaim activity (which I also confirmed by running perf trace to see
whether we go down that path while the test is running).
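
When running the test in a loop like this, it helps to detect automatically
whether a given run hit this particular oops. A minimal Python sketch, assuming
the kernel log for a run has been captured as text (for example from dmesg); the
function names and the Oops tuple are illustrative and not part of fstests. The
patterns are modeled on the oops lines posted earlier in this thread:

```python
import re
from typing import NamedTuple


class Oops(NamedTuple):
    timestamp: float  # seconds since boot, from the "[ 4681.230907]" prefix
    address: int      # faulting data address


# Matches both "dereference at 0x..." and "dereference on read at 0x...".
_OOPS_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+BUG: Kernel NULL pointer dereference"
    r"(?: on read)? at (0x[0-9a-fA-F]+)"
)

# Matches the symbolized "NIP [addr] symbol+0xoff/0xsize" line of an oops.
_NIP_RE = re.compile(r"NIP \[[0-9a-f]+\] ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def find_oopses(log_text: str) -> list[Oops]:
    """Return every NULL-pointer oops header found in the log text."""
    return [
        Oops(float(m.group(1)), int(m.group(2), 16))
        for m in _OOPS_RE.finditer(log_text)
    ]


def nip_symbols(log_text: str) -> list[str]:
    """Return the function names that the symbolized NIP lines point at."""
    return _NIP_RE.findall(log_text)


def hit_drop_buffers(log_text: str) -> bool:
    """True if some oops in the log faulted inside drop_buffers()."""
    return any(sym.startswith("drop_buffers") for sym in nip_symbols(log_text))
```

A loop driver could save the log after each iteration and stop as soon as
hit_drop_buffers() returns True, keeping that log for the report.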

I am not sure whether this could be a timing issue or something else. Maybe if you could
share your defconfig, I could give it a try on my setup.

-ritesh
