2023-09-28 16:36:30

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH] nfs: decrement nrequests counter before releasing the req

On Thu, Sep 28, 2023 at 9:31 AM Benjamin Coddington <[email protected]> wrote:
>
> On 28 Sep 2023, at 9:04, Benjamin Coddington wrote:
>
> > On 27 Sep 2023, at 17:16, Scott Mayhew wrote:
> >
> >> On Wed, 27 Sep 2023, Trond Myklebust wrote:
> >>
> >>> On Tue, 2023-09-19 at 09:17 -0400, Jeff Layton wrote:
> >>>> I hit this panic in testing:
> >>>>
> >>>> [ 6235.500016] run fstests generic/464 at 2023-09-18 22:51:24
> >>>> [ 6288.410761] BUG: kernel NULL pointer dereference, address:
> >>>> 0000000000000000
> >>>> [ 6288.412174] #PF: supervisor read access in kernel mode
> >>>> [ 6288.413160] #PF: error_code(0x0000) - not-present page
> >>>> [ 6288.413992] PGD 0 P4D 0
> >>>> [ 6288.414603] Oops: 0000 [#1] PREEMPT SMP PTI
> >>>> [ 6288.415419] CPU: 0 PID: 340798 Comm: kworker/u18:8 Not tainted
> >>>> 6.6.0-rc1-gdcf620ceebac #95
> >>>> [ 6288.416538] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> >>>> BIOS 1.16.2-1.fc38 04/01/2014
> >>>> [ 6288.417701] Workqueue: nfsiod rpc_async_release [sunrpc]
> >>>> [ 6288.418676] RIP: 0010:nfs_inode_remove_request+0xc8/0x150 [nfs]
> >>>> [ 6288.419836] Code: ff ff 48 8b 43 38 48 8b 7b 10 a8 04 74 5b 48 85
> >>>> ff 74 56 48 8b 07 a9 00 00 08 00 74 58 48 8b 07 f6 c4 10 74 50 e8 c8
> >>>> 44 b3 d5 <48> 8b 00 f0 48 ff 88 30 ff ff ff 5b 5d 41 5c c3 cc cc cc
> >>>> cc 48 8b
> >>>> [ 6288.422389] RSP: 0018:ffffbd618353bda8 EFLAGS: 00010246
> >>>> [ 6288.423234] RAX: 0000000000000000 RBX: ffff9a29f9a25280 RCX:
> >>>> 0000000000000000
> >>>> [ 6288.424351] RDX: ffff9a29f9a252b4 RSI: 000000000000000b RDI:
> >>>> ffffef41448e3840
> >>>> [ 6288.425345] RBP: ffffef41448e3840 R08: 0000000000000038 R09:
> >>>> ffffffffffffffff
> >>>> [ 6288.426334] R10: 0000000000033f80 R11: ffff9a2a7fffa000 R12:
> >>>> ffff9a29093f98c4
> >>>> [ 6288.427353] R13: 0000000000000000 R14: ffff9a29230f62e0 R15:
> >>>> ffff9a29230f62d0
> >>>> [ 6288.428358] FS: 0000000000000000(0000) GS:ffff9a2a77c00000(0000)
> >>>> knlGS:0000000000000000
> >>>> [ 6288.429513] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>> [ 6288.430427] CR2: 0000000000000000 CR3: 0000000264748002 CR4:
> >>>> 0000000000770ef0
> >>>> [ 6288.431553] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> >>>> 0000000000000000
> >>>> [ 6288.432715] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> >>>> 0000000000000400
> >>>> [ 6288.433698] PKRU: 55555554
> >>>> [ 6288.434196] Call Trace:
> >>>> [ 6288.434667] <TASK>
> >>>> [ 6288.435132] ? __die+0x1f/0x70
> >>>> [ 6288.435723] ? page_fault_oops+0x159/0x450
> >>>> [ 6288.436389] ? try_to_wake_up+0x98/0x5d0
> >>>> [ 6288.437044] ? do_user_addr_fault+0x65/0x660
> >>>> [ 6288.437728] ? exc_page_fault+0x7a/0x180
> >>>> [ 6288.438368] ? asm_exc_page_fault+0x22/0x30
> >>>> [ 6288.439137] ? nfs_inode_remove_request+0xc8/0x150 [nfs]
> >>>> [ 6288.440112] ? nfs_inode_remove_request+0xa0/0x150 [nfs]
> >>>> [ 6288.440924] nfs_commit_release_pages+0x16e/0x340 [nfs]
> >>>> [ 6288.441700] ? __pfx_call_transmit+0x10/0x10 [sunrpc]
> >>>> [ 6288.442475] ? _raw_spin_lock_irqsave+0x23/0x50
> >>>> [ 6288.443161] nfs_commit_release+0x15/0x40 [nfs]
> >>>> [ 6288.443926] rpc_free_task+0x36/0x60 [sunrpc]
> >>>> [ 6288.444741] rpc_async_release+0x29/0x40 [sunrpc]
> >>>> [ 6288.445509] process_one_work+0x171/0x340
> >>>> [ 6288.446135] worker_thread+0x277/0x3a0
> >>>> [ 6288.446724] ? __pfx_worker_thread+0x10/0x10
> >>>> [ 6288.447376] kthread+0xf0/0x120
> >>>> [ 6288.447903] ? __pfx_kthread+0x10/0x10
> >>>> [ 6288.448500] ret_from_fork+0x2d/0x50
> >>>> [ 6288.449078] ? __pfx_kthread+0x10/0x10
> >>>> [ 6288.449665] ret_from_fork_asm+0x1b/0x30
> >>>> [ 6288.450283] </TASK>
> >>>> [ 6288.450688] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> >>>> dns_resolver nfs lockd grace sunrpc nls_iso8859_1 nls_cp437 vfat fat
> >>>> 9p netfs ext4 kvm_intel crc16 mbcache jbd2 joydev kvm xfs irqbypass
> >>>> virtio_net pcspkr net_failover psmouse failover 9pnet_virtio cirrus
> >>>> drm_shmem_helper virtio_balloon drm_kms_helper button evdev drm loop
> >>>> dm_mod zram zsmalloc crct10dif_pclmul crc32_pclmul
> >>>> ghash_clmulni_intel sha512_ssse3 sha512_generic virtio_blk nvme
> >>>> aesni_intel crypto_simd cryptd nvme_core t10_pi i6300esb
> >>>> crc64_rocksoft_generic crc64_rocksoft crc64 virtio_pci virtio
> >>>> virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring serio_raw
> >>>> btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor
> >>>> raid6_pq autofs4
> >>>> [ 6288.460211] CR2: 0000000000000000
> >>>> [ 6288.460787] ---[ end trace 0000000000000000 ]---
> >>>> [ 6288.461571] RIP: 0010:nfs_inode_remove_request+0xc8/0x150 [nfs]
> >>>> [ 6288.462500] Code: ff ff 48 8b 43 38 48 8b 7b 10 a8 04 74 5b 48 85
> >>>> ff 74 56 48 8b 07 a9 00 00 08 00 74 58 48 8b 07 f6 c4 10 74 50 e8 c8
> >>>> 44 b3 d5 <48> 8b 00 f0 48 ff 88 30 ff ff ff 5b 5d 41 5c c3 cc cc cc
> >>>> cc 48 8b
> >>>> [ 6288.465136] RSP: 0018:ffffbd618353bda8 EFLAGS: 00010246
> >>>> [ 6288.465963] RAX: 0000000000000000 RBX: ffff9a29f9a25280 RCX:
> >>>> 0000000000000000
> >>>> [ 6288.467035] RDX: ffff9a29f9a252b4 RSI: 000000000000000b RDI:
> >>>> ffffef41448e3840
> >>>> [ 6288.468093] RBP: ffffef41448e3840 R08: 0000000000000038 R09:
> >>>> ffffffffffffffff
> >>>> [ 6288.469121] R10: 0000000000033f80 R11: ffff9a2a7fffa000 R12:
> >>>> ffff9a29093f98c4
> >>>> [ 6288.470109] R13: 0000000000000000 R14: ffff9a29230f62e0 R15:
> >>>> ffff9a29230f62d0
> >>>> [ 6288.471106] FS: 0000000000000000(0000) GS:ffff9a2a77c00000(0000)
> >>>> knlGS:0000000000000000
> >>>> [ 6288.472216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>> [ 6288.473059] CR2: 0000000000000000 CR3: 0000000264748002 CR4:
> >>>> 0000000000770ef0
> >>>> [ 6288.474096] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> >>>> 0000000000000000
> >>>> [ 6288.475097] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> >>>> 0000000000000400
> >>>> [ 6288.476148] PKRU: 55555554
> >>>> [ 6288.476665] note: kworker/u18:8[340798] exited with irqs disabled
> >>>>
> >>>> Once we've released "req", it's not safe to dereference it anymore.
> >>>> Decrement the nrequests counter before dropping the reference.
> >>>>
> >>>> Signed-off-by: Jeff Layton <[email protected]>
> >>>> ---
> >>>> I've only hit this once after a lot of testing, so I can't confirm
> >>>> that
> >>>> this fixes anything. It seems like the right thing to do, however.
> >>>> ---
> >>>> fs/nfs/write.c | 2 +-
> >>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> >>>> index 8c1ee1a1a28f..7720b5e43014 100644
> >>>> --- a/fs/nfs/write.c
> >>>> +++ b/fs/nfs/write.c
> >>>> @@ -802,8 +802,8 @@ static void nfs_inode_remove_request(struct
> >>>> nfs_page *req)
> >>>> }
> >>>>
> >>>> if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> >>>> - nfs_release_request(req);
> >>>> atomic_long_dec(&NFS_I(nfs_page_to_inode(req))-
> >>>>> nrequests);
> >>>> + nfs_release_request(req);
> >>>> }
> >>>> }
> >>>>
> >>>
> >>> Isn't this the same issue that Scott Mayhew posted a patch ("NFS: Fix
> >>> potential oops in nfs_inode_remove_request()") on July 25th?
> >>>
> >>> At the time I argued for something like the above patch, but both Scott
> >>> and you argued that it was insufficient to fix the problem.
> >>>
> >>> So which patch is the one we should apply?
> >>
> >> This one. If there's a separate bug, I've had zero luck in
> >> reproducing it.
> >
> > Would you say you've had zero luck reproducing this one too?
> >
> > I think your patch on July 25th fixes a real bug based on that vmcore we
> > have, but it's a pretty crazy race. I'll try again to reproduce it.
> >
> > Ben

Okay, I'll plan on taking this patch for 6.6-rc now and if we
determine there is still a bug we can add an additional patch on top.

Anna

>


2023-10-02 20:15:21

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH] nfs: decrement nrequests counter before releasing the req

On 28 Sep 2023, at 12:23, Anna Schumaker wrote:
>>
>> On 28 Sep 2023, at 9:04, Benjamin Coddington wrote:
>>>
>>> I think your patch on July 25th fixes a real bug based on that vmcore we
>>> have, but it's a pretty crazy race. I'll try again to reproduce it.
>>>
>>> Ben
>
> Okay, I'll plan on taking this patch for 6.6-rc now and if we
> determine there is still a bug we can add an additional patch on top.

I can reproduce the crash Scott reported in July:
https://lore.kernel.org/linux-nfs/[email protected]/

I used a three-way race between lock(), write(), and syncfs() with a
mapped file and no delegation, and I can reliably hit it by inserting a few
small delays into the kernel.

Scott's patch fixes it and his analysis is correct. I can resend that patch
with more information, or provide more details, if needed. Basically the
issue is that the lock/unlock path sets NFS_INO_INVALID_DATA, the write path
invalidates on that, and syncfs() (or a rarer writeback flush) swoops in and
dereferences the null folio->mapping.

I think Jeff's fix here is also correct, both problems need a fix.

Ben

(Splat included because.. it was hard to make it.)

[ 966.743865] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 966.743879] Mem abort info:
[ 966.743884] ESR = 0x0000000096000004
[ 966.743890] EC = 0x25: DABT (current EL), IL = 32 bits
[ 966.743898] SET = 0, FnV = 0
[ 966.743904] EA = 0, S1PTW = 0
[ 966.743910] FSC = 0x04: level 0 translation fault
[ 966.743918] Data abort info:
[ 966.743923] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 966.743931] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 966.743938] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 966.743946] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000100a03000
[ 966.743955] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 966.743965] Internal error: Oops: 0000000096000004 [#1] SMP
[ 966.743974] Modules linked in: nfsv4(OE) nfs(OE) nfsd nfs_acl rpcsec_gss_krb5 auth_rpcgss dns_resolver lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock sunrpc vfat fat snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep uvcvideo snd_seq uvc videobuf2_vmalloc snd_seq_device videobuf2_memops videobuf2_v4l2 snd_pcm videobuf2_common videodev snd_timer snd mc joydev soundcore vmw_vmci loop zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sha3_ce sha512_ce vmwgfx sha512_arm64 e1000e nvme nvme_core nvme_common drm_ttm_helper ttm uhci_hcd scsi_dh_rdac scsi_dh_emc scsi_dh_alua fuse dm_multipath
[ 966.744003] Unloaded tainted modules: nfs(OE):1 nfsv4(OE):1 [last unloaded: nfs(OE)]
[ 966.744060] CPU: 6 PID: 3666 Comm: kworker/u27:0 Tainted: G OE 6.6.0-rc1.nks #191
[ 966.744307] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.20904234.BA64.2212051119 12/05/2022
[ 966.744504] Workqueue: nfsiod rpc_async_release [sunrpc]
[ 966.744725] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 966.744915] pc : nfs_inode_remove_request+0xc0/0x220 [nfs]
[ 966.745123] lr : nfs_inode_remove_request+0x9c/0x220 [nfs]
[ 966.745323] sp : ffff8000848fbc90
[ 966.745499] x29: ffff8000848fbc90 x28: 0000000000000000 x27: ffff8000823aec50
[ 966.745673] x26: ffff800081c0e008 x25: ffff0000c7afe2d0 x24: ffff0000c7afe2e0
[ 966.745849] x23: 0000000000000000 x22: ffff80007b92bea8 x21: ffff0000cc137d00
[ 966.746021] x20: 0000000000418958 x19: 0000000000000000 x18: 0000000000000014
[ 966.746193] x17: 1d00000000000000 x16: 4100000000000000 x15: 00000003989a4603
[ 966.746365] x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
[ 966.746536] x11: 7f7f7f7f7f7f7f7f x10: fefefefefefefeff x9 : ffff80007bb02f4c
[ 966.746708] x8 : ffff8000848fbbd0 x7 : 0000000000000000 x6 : 0000000000000000
[ 966.746881] x5 : 0000000000000800 x4 : ffff8000848fbc70 x3 : ffff8000848fbc90
[ 966.747052] x2 : 0000000000000002 x1 : 002fffff00000128 x0 : 0000000000000000
[ 966.747222] Call trace:
[ 966.747390] nfs_inode_remove_request+0xc0/0x220 [nfs]
[ 966.747574] nfs_commit_release_pages+0x218/0x338 [nfs]
[ 966.747756] nfs_commit_release+0x28/0x58 [nfs]
[ 966.747934] rpc_free_task+0x3c/0x78 [sunrpc]
[ 966.748128] rpc_async_release+0x34/0x60 [sunrpc]
[ 966.748315] process_one_work+0x170/0x3d8
[ 966.748482] worker_thread+0x2bc/0x3e0
[ 966.748648] kthread+0xf8/0x110
[ 966.748813] ret_from_fork+0x10/0x20
[ 966.748977] Code: 36980741 f9400001 36600701 952221f7 (f9400000)
[ 966.749143] ---[ end trace 0000000000000000 ]---