2024-03-04 13:36:20

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH 2/2] nfs: fix UAF in direct writes

On Fri, 2024-03-01 at 11:49 -0500, Josef Bacik wrote:
> In production we have been hitting the following warning consistently
>
> ------------[ cut here ]------------
> refcount_t: underflow; use-after-free.
> WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0
> Workqueue: nfsiod nfs_direct_write_schedule_work [nfs]
> RIP: 0010:refcount_warn_saturate+0x9c/0xe0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __warn+0x9f/0x130
> ? refcount_warn_saturate+0x9c/0xe0
> ? report_bug+0xcc/0x150
> ? handle_bug+0x3d/0x70
> ? exc_invalid_op+0x16/0x40
> ? asm_exc_invalid_op+0x16/0x20
> ? refcount_warn_saturate+0x9c/0xe0
> nfs_direct_write_schedule_work+0x237/0x250 [nfs]
> process_one_work+0x12f/0x4a0
> worker_thread+0x14e/0x3b0
> ? ZSTD_getCParams_internal+0x220/0x220
> kthread+0xdc/0x120
> ? __btf_name_valid+0xa0/0xa0
> ret_from_fork+0x1f/0x30
>
> This is because we're completing the nfs_direct_request twice in a row.
>
> The source of this is when we have our commit requests to submit, we
> process them and send them off, and then in the completion path for the
> commit requests we have
>
> if (nfs_commit_end(cinfo.mds))
> nfs_direct_write_complete(dreq);
>
> However since we're submitting asynchronous requests we sometimes have
> one that completes before we submit the next one, so we end up calling
> complete on the nfs_direct_request twice.
>
> The only other place we use nfs_generic_commit_list() is in
> __nfs_commit_inode, which wraps this call in a
>
> nfs_commit_begin();
> nfs_commit_end();
>
> Which is a common pattern for this style of completion handling, one
> that is also repeated in the direct code with get_dreq()/put_dreq()
> calls around where we process events as well as in the completion paths.
>
> Fix this by using the same pattern for the commit requests.
>
> Before with my 200 node rocksdb stress running this warning would pop
> every 10ish minutes. With my patch the stress test has been running for
> several hours without popping.
>
> Signed-off-by: Josef Bacik <[email protected]>
> ---
> fs/nfs/direct.c | 11 +++++++++--
> fs/nfs/write.c | 2 +-
> include/linux/nfs_fs.h | 1 +
> 3 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index befcc167e25f..6b8798d01e3a 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -672,10 +672,17 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
> LIST_HEAD(mds_list);
>
> nfs_init_cinfo_from_dreq(&cinfo, dreq);
> + nfs_commit_begin(cinfo.mds);
> nfs_scan_commit(dreq->inode, &mds_list, &cinfo);
> res = nfs_generic_commit_list(dreq->inode, &mds_list, 0, &cinfo);
> - if (res < 0) /* res == -ENOMEM */
> - nfs_direct_write_reschedule(dreq);
> + if (res < 0) { /* res == -ENOMEM */
> + spin_lock(&dreq->lock);
> + if (dreq->flags == 0)
> + dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
> + spin_unlock(&dreq->lock);
> + }
> + if (nfs_commit_end(cinfo.mds))
> + nfs_direct_write_complete(dreq);


Before, it was calling nfs_direct_write_reschedule directly, but with
the above change it's now just setting NFS_ODIRECT_RESCHED_WRITES and
then queueing the completion workqueue job if it's the last reference
(which should then go and reschedule the job, I think).

That may be a reasonable change to make, but I think it merits some
justification and mention in the changelog. Was that change necessary
for some reason? If so, why?


> }
>
> static void nfs_direct_write_clear_reqs(struct nfs_direct_req *dreq)
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index bb79d3a886ae..5d9dc6c05325 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1650,7 +1650,7 @@ static int wait_on_commit(struct nfs_mds_commit_info *cinfo)
> !atomic_read(&cinfo->rpcs_out));
> }
>
> -static void nfs_commit_begin(struct nfs_mds_commit_info *cinfo)
> +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo)
> {
> atomic_inc(&cinfo->rpcs_out);
> }
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index f5ce7b101146..d59116ac8209 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -611,6 +611,7 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio);
> extern int nfs_commit_inode(struct inode *, int);
> extern struct nfs_commit_data *nfs_commitdata_alloc(void);
> extern void nfs_commit_free(struct nfs_commit_data *data);
> +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo);
> bool nfs_commit_end(struct nfs_mds_commit_info *cinfo);
>
> static inline bool nfs_have_writebacks(const struct inode *inode)

--
Jeff Layton <[email protected]>