2024-03-04 17:57:48

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 2/2] nfs: fix UAF in direct writes

On Fri, 2024-03-01 at 11:49 -0500, Josef Bacik wrote:
> In production we have been hitting the following warning consistently
>
> ------------[ cut here ]------------
> refcount_t: underflow; use-after-free.
> WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28
> refcount_warn_saturate+0x9c/0xe0
> Workqueue: nfsiod nfs_direct_write_schedule_work [nfs]
> RIP: 0010:refcount_warn_saturate+0x9c/0xe0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  ? __warn+0x9f/0x130
>  ? refcount_warn_saturate+0x9c/0xe0
>  ? report_bug+0xcc/0x150
>  ? handle_bug+0x3d/0x70
>  ? exc_invalid_op+0x16/0x40
>  ? asm_exc_invalid_op+0x16/0x20
>  ? refcount_warn_saturate+0x9c/0xe0
>  nfs_direct_write_schedule_work+0x237/0x250 [nfs]
>  process_one_work+0x12f/0x4a0
>  worker_thread+0x14e/0x3b0
>  ? ZSTD_getCParams_internal+0x220/0x220
>  kthread+0xdc/0x120
>  ? __btf_name_valid+0xa0/0xa0
>  ret_from_fork+0x1f/0x30
>
> This is because we're completing the nfs_direct_request twice in a
> row.
>
> The source of this is when we have our commit requests to submit, we
> process them and send them off, and then in the completion path for
> the
> commit requests we have
>
> if (nfs_commit_end(cinfo.mds))
> nfs_direct_write_complete(dreq);
>
> However since we're submitting asynchronous requests we sometimes
> have
> one that completes before we submit the next one, so we end up
> calling
> complete on the nfs_direct_request twice.
>
> The only other place we use nfs_generic_commit_list() is in
> __nfs_commit_inode, which wraps this call in a
>
> nfs_commit_begin();
> nfs_commit_end();
>
> Which is a common pattern for this style of completion handling, one
> that is also repeated in the direct code with get_dreq()/put_dreq()
> calls around where we process events as well as in the completion
> paths.
>
> Fix this by using the same pattern for the commit requests.
>
> Before with my 200 node rocksdb stress running this warning would pop
> every 10ish minutes.  With my patch the stress test has been running
> for
> several hours without popping.
>
> Signed-off-by: Josef Bacik <[email protected]>

I think this deserves to be a stable patch. Thanks for finding it!

> ---
>  fs/nfs/direct.c        | 11 +++++++++--
>  fs/nfs/write.c         |  2 +-
>  include/linux/nfs_fs.h |  1 +
>  3 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index befcc167e25f..6b8798d01e3a 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -672,10 +672,17 @@ static void nfs_direct_commit_schedule(struct
> nfs_direct_req *dreq)
>   LIST_HEAD(mds_list);
>  
>   nfs_init_cinfo_from_dreq(&cinfo, dreq);
> + nfs_commit_begin(cinfo.mds);
>   nfs_scan_commit(dreq->inode, &mds_list, &cinfo);
>   res = nfs_generic_commit_list(dreq->inode, &mds_list, 0,
> &cinfo);
> - if (res < 0) /* res == -ENOMEM */
> - nfs_direct_write_reschedule(dreq);
> + if (res < 0) { /* res == -ENOMEM */
> + spin_lock(&dreq->lock);
> + if (dreq->flags == 0)
> + dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
> + spin_unlock(&dreq->lock);
> + }
> + if (nfs_commit_end(cinfo.mds))
> + nfs_direct_write_complete(dreq);
>  }
>  
>  static void nfs_direct_write_clear_reqs(struct nfs_direct_req *dreq)
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index bb79d3a886ae..5d9dc6c05325 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1650,7 +1650,7 @@ static int wait_on_commit(struct
> nfs_mds_commit_info *cinfo)
>          !atomic_read(&cinfo-
> >rpcs_out));
>  }
>  
> -static void nfs_commit_begin(struct nfs_mds_commit_info *cinfo)
> +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo)
>  {
>   atomic_inc(&cinfo->rpcs_out);
>  }
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index f5ce7b101146..d59116ac8209 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -611,6 +611,7 @@ int nfs_wb_folio_cancel(struct inode *inode,
> struct folio *folio);
>  extern int  nfs_commit_inode(struct inode *, int);
>  extern struct nfs_commit_data *nfs_commitdata_alloc(void);
>  extern void nfs_commit_free(struct nfs_commit_data *data);
> +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo);
>  bool nfs_commit_end(struct nfs_mds_commit_info *cinfo);
>  
>  static inline bool nfs_have_writebacks(const struct inode *inode)

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]