On Fri, 2024-03-01 at 11:49 -0500, Josef Bacik wrote:
> In production we have been hitting the following warning consistently
>
> ------------[ cut here ]------------
> refcount_t: underflow; use-after-free.
> WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28
> refcount_warn_saturate+0x9c/0xe0
> Workqueue: nfsiod nfs_direct_write_schedule_work [nfs]
> RIP: 0010:refcount_warn_saturate+0x9c/0xe0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __warn+0x9f/0x130
> ? refcount_warn_saturate+0x9c/0xe0
> ? report_bug+0xcc/0x150
> ? handle_bug+0x3d/0x70
> ? exc_invalid_op+0x16/0x40
> ? asm_exc_invalid_op+0x16/0x20
> ? refcount_warn_saturate+0x9c/0xe0
> nfs_direct_write_schedule_work+0x237/0x250 [nfs]
> process_one_work+0x12f/0x4a0
> worker_thread+0x14e/0x3b0
> ? ZSTD_getCParams_internal+0x220/0x220
> kthread+0xdc/0x120
> ? __btf_name_valid+0xa0/0xa0
> ret_from_fork+0x1f/0x30
>
> This is because we're completing the nfs_direct_request twice in a
> row.
>
> The source of this is when we have our commit requests to submit, we
> process them and send them off, and then in the completion path for
> the
> commit requests we have
>
> if (nfs_commit_end(cinfo.mds))
> nfs_direct_write_complete(dreq);
>
> However since we're submitting asynchronous requests we sometimes
> have
> one that completes before we submit the next one, so we end up
> calling
> complete on the nfs_direct_request twice.
>
> The only other place we use nfs_generic_commit_list() is in
> __nfs_commit_inode, which wraps this call in a
>
> nfs_commit_begin();
> nfs_commit_end();
>
> Which is a common pattern for this style of completion handling, one
> that is also repeated in the direct code with get_dreq()/put_dreq()
> calls around where we process events as well as in the completion
> paths.
>
> Fix this by using the same pattern for the commit requests.
>
> Before with my 200 node rocksdb stress running this warning would pop
> every 10ish minutes. With my patch the stress test has been running
> for
> several hours without popping.
>
> Signed-off-by: Josef Bacik <[email protected]>
I think this deserves to be a stable patch. Thanks for finding it!
> ---
> fs/nfs/direct.c | 11 +++++++++--
> fs/nfs/write.c | 2 +-
> include/linux/nfs_fs.h | 1 +
> 3 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index befcc167e25f..6b8798d01e3a 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -672,10 +672,17 @@ static void nfs_direct_commit_schedule(struct
> nfs_direct_req *dreq)
> LIST_HEAD(mds_list);
>
> nfs_init_cinfo_from_dreq(&cinfo, dreq);
> + nfs_commit_begin(cinfo.mds);
> nfs_scan_commit(dreq->inode, &mds_list, &cinfo);
> res = nfs_generic_commit_list(dreq->inode, &mds_list, 0,
> &cinfo);
> - if (res < 0) /* res == -ENOMEM */
> - nfs_direct_write_reschedule(dreq);
> + if (res < 0) { /* res == -ENOMEM */
> + spin_lock(&dreq->lock);
> + if (dreq->flags == 0)
> + dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
> + spin_unlock(&dreq->lock);
> + }
> + if (nfs_commit_end(cinfo.mds))
> + nfs_direct_write_complete(dreq);
> }
>
> static void nfs_direct_write_clear_reqs(struct nfs_direct_req *dreq)
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index bb79d3a886ae..5d9dc6c05325 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1650,7 +1650,7 @@ static int wait_on_commit(struct
> nfs_mds_commit_info *cinfo)
> !atomic_read(&cinfo-
> >rpcs_out));
> }
>
> -static void nfs_commit_begin(struct nfs_mds_commit_info *cinfo)
> +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo)
> {
> atomic_inc(&cinfo->rpcs_out);
> }
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index f5ce7b101146..d59116ac8209 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -611,6 +611,7 @@ int nfs_wb_folio_cancel(struct inode *inode,
> struct folio *folio);
> extern int nfs_commit_inode(struct inode *, int);
> extern struct nfs_commit_data *nfs_commitdata_alloc(void);
> extern void nfs_commit_free(struct nfs_commit_data *data);
> +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo);
> bool nfs_commit_end(struct nfs_mds_commit_info *cinfo);
>
> static inline bool nfs_have_writebacks(const struct inode *inode)
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]